SMP Multi-thread code basics
I want to learn to implement code using SMP threads but I'm having trouble understanding how to achieve this with the SDK functions - especially without some complete sample code. Like most of the SDK it's a baffling puzzle that is only simple to understand if you already know the answer.
The references I have found on multi-threaded code use their own API-specific calls, and their examples are usually written in an application context with a main thread. The Carrara SDK functions I can find are different and almost seem incomplete.
I'm aware that it is a complex and more dangerous area of programming. Perhaps somebody can fill in the gaps for me or give me a kick in the right direction...
1. Creating and launching a thread is clear enough, but without a 'WaitForThread' or 'WaitForMultipleObjects' how do we know when its ::Work() is done? Must all the synchronizing be done through IShSemaphore, and if so, exactly how?
2. How are large amounts of data in TMCArray(s) accessed and processed by a thread in ::Work()? We can't use globals, and obviously we don't want to copy them. Does all the large data have to use the LocalStorage? How are TMCSMPArray and TMCSMPArrayRequest used?
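A minimal sketch of the pattern both questions are circling, written with the C++11 standard library rather than the Carrara SDK (no TBasicSmpThread, IShSemaphore or gShellSMPUtilities calls, since their exact behaviour isn't confirmed here). Semaphore, ScaleWorker and RunScaleWorker are hypothetical names. The worker object is handed pointers to caller-owned arrays once, so nothing is global and nothing large is copied, and the last thing its Work() does is signal a semaphore that the launching thread waits on in place of a WaitForThread call.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Counting semaphore built from a mutex and a condition variable.
// Hypothetical stand-in for IShSemaphore; this is NOT the Carrara API.
class Semaphore {
public:
    explicit Semaphore(int count) : count_(count) {}
    void Signal() {
        std::lock_guard<std::mutex> lock(m_);
        ++count_;
        cv_.notify_one();
    }
    void Wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};

// Hypothetical worker: gets pointers to caller-owned arrays once,
// so nothing is global and nothing large is copied.
struct ScaleWorker {
    const std::vector<double>* in = nullptr;
    std::vector<double>* out = nullptr;
    Semaphore* done = nullptr;

    void Work() {
        for (std::size_t i = 0; i < in->size(); ++i)
            (*out)[i] = (*in)[i] * 2.0;
        done->Signal();  // last step: tell the launcher this Work() has finished
    }
};

std::vector<double> RunScaleWorker(const std::vector<double>& input) {
    std::vector<double> output(input.size());  // sized before the thread starts
    Semaphore done(0);
    ScaleWorker worker;
    worker.in = &input;
    worker.out = &output;
    worker.done = &done;

    std::thread t(&ScaleWorker::Work, &worker);
    done.Wait();  // blocks until the worker signals, like a WaitForThread
    t.join();     // release the std::thread object itself
    return output;
}
```

Assuming IShSemaphore behaves like a counting semaphore, the same shape should carry over: create it with a count of zero, pass a pointer to it along with the data pointers, signal it at the end of ::Work() and wait on it in the deformer. That is an assumption about the SDK, not something its documentation confirms.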
To present an example: how can I move this section of code, in the context of a Deformer call, out into a thread?
class MYTHREAD: public TBasicSmpThread
{
public:
    virtual MCCOMErr MCCOMAPI Work();
};
MCCOMErr MyDeformer::DeformFacetMesh(real lod,FacetMesh* MeshIn,FacetMesh** MeshOut)
{
    ...
    TMCArray<TVector3> OffsetVertices;
    TVector3 vOffset;
    uint32 uVertex,uTotalVertices;
    vOffset.x=0.0;
    vOffset.y=1.0;
    vOffset.z=2.0;
    uTotalVertices=MeshIn->VerticesNbr();
    OffsetVertices.SetElemCount(uTotalVertices);
    for(uVertex=0;uVertex<uTotalVertices;uVertex++)
        OffsetVertices[uVertex]=MeshIn->fVertices[uVertex]+vOffset;
    ...
    // MYTHREAD MyThread;
    int32 iThreadID;
    // iThreadID=gShellSMPUtilities->LaunchSMPThread(&MyThread,0,NULL,kHighPriority,&bAbortThread);
    // wait for thread to finish...
    ...
    // rebuild the facet offset mesh
}
MCCOMErr MCCOMAPI MYTHREAD::Work()
{
    // ! MOVE THE ABOVE CODE TO HERE !
    // signal that the work is done...
    return MC_S_OK;
}
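One way the example above could look with standard C++ threads instead of the SDK calls, assuming a plain Vec3 struct as a stand-in for TVector3 and std::vector in place of TMCArray. It splits the offset loop into one contiguous range per hardware thread and uses std::thread::join() as the "wait for thread to finish..." step. OffsetRange and OffsetVerticesParallel are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for TVector3.
struct Vec3 { double x, y, z; };

// Thread body: offsets vertices [begin, end). Each thread owns a
// disjoint slice of `out`, so the writes need no locking.
static void OffsetRange(const std::vector<Vec3>* in, std::vector<Vec3>* out,
                        Vec3 offset, std::size_t begin, std::size_t end) {
    for (std::size_t v = begin; v < end; ++v) {
        (*out)[v].x = (*in)[v].x + offset.x;
        (*out)[v].y = (*in)[v].y + offset.y;
        (*out)[v].z = (*in)[v].z + offset.z;
    }
}

std::vector<Vec3> OffsetVerticesParallel(const std::vector<Vec3>& in, Vec3 offset) {
    std::vector<Vec3> out(in.size());  // sized once, before any thread starts
    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::size_t chunk = (in.size() + n - 1) / n;

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < n; ++t) {
        std::size_t begin = t * chunk;
        if (begin >= in.size()) break;
        std::size_t end = std::min(in.size(), begin + chunk);
        threads.emplace_back(OffsetRange, &in, &out, offset, begin, end);
    }
    for (std::thread& th : threads) th.join();  // "wait for thread to finish..."
    return out;
}
```

Because each thread writes only its own slice, no critical section is needed, and since the output array is never resized while the threads run, its storage never moves under them.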
If that isn't too hard, the next example could be how to launch a thread for each iteration of the loop, get several running simultaneously and wait for them all to finish. Another example to clear things up might be to have 3 different threads, each processing x, y or z separately.
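A sketch of the "three threads for x, y and z" variant, again with the standard library instead of the SDK and a hypothetical Vec3 struct; OffsetComponent and OffsetXYZInThreeThreads are invented names. A pointer-to-member selects which component each thread updates. Different members of the same struct are separate memory locations, so the three threads do not race with each other.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for TVector3.
struct Vec3 { double x, y, z; };

// Adds `delta` to one component of every vertex; `member` selects x, y or z.
static void OffsetComponent(std::vector<Vec3>* verts, double Vec3::*member, double delta) {
    for (Vec3& v : *verts) v.*member += delta;
}

std::vector<Vec3> OffsetXYZInThreeThreads(std::vector<Vec3> verts, Vec3 offset) {
    std::thread tx(OffsetComponent, &verts, &Vec3::x, offset.x);
    std::thread ty(OffsetComponent, &verts, &Vec3::y, offset.y);
    std::thread tz(OffsetComponent, &verts, &Vec3::z, offset.z);
    tx.join();  // wait for all three before using verts again
    ty.join();
    tz.join();
    return verts;
}
```

This is correct, but the three threads write to neighbouring bytes of the same cache lines (false sharing) and can never use more than three CPUs, so splitting the vertex range between threads usually scales better than splitting by component.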
Comments
(...a few years later...)
I needed more speed from a plugin and figured it was worth coming back to this (a bit wiser) and working it out through trial and error, comparing the Carrara SDK to how SMP is normally implemented, and I got it to work. I was reluctant to use OS-specific multi-threading because I didn't know how it would interact with the rest of the SDK classes, but that is what I would have tried next.
I'll go over what I did and share the method. If anyone knows better, please correct me. Some plugins, shaders for example, don't need multi-threading, because each render tile is already processed by whichever CPU is available.
In a deformer plugin I have a loop that checks every deformed vertex to see whether it will intersect that same facet mesh. The deformed vertex positions have already been stored in outMesh before the loop starts, so by cloning the input mesh again I can check against the original facets and vertices.
So, to use all the available CPU power, I can move the inner loop into threads: while the first iteration is being tested, the second can proceed in another thread. Each thread only needs to be passed pointers to the loop's data once; after that, it just needs to be given a new loop counter each time it finishes processing one iteration.
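The per-iteration counter can also be handled without the main loop handing out indices: each worker receives the data pointers once, then claims the next iteration from a shared std::atomic counter until none are left. This is a different technique from the ready-flag design described next, shown as a standard-C++ sketch with hypothetical names (ClaimLoop, SquareAll); squaring an integer is only a stand-in for the per-vertex intersection test.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Each worker gets the data pointers once, then repeatedly claims the
// next unprocessed index from a shared atomic counter.
static void ClaimLoop(std::atomic<std::size_t>* next, std::size_t count,
                      const std::vector<int>* in, std::vector<int>* out) {
    for (;;) {
        std::size_t i = next->fetch_add(1);  // claim one iteration
        if (i >= count) return;              // nothing left to do
        (*out)[i] = (*in)[i] * (*in)[i];     // stand-in for the per-vertex test
    }
}

std::vector<int> SquareAll(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::atomic<std::size_t> next(0);
    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < n; ++t)
        pool.emplace_back(ClaimLoop, &next, in.size(), &in, &out);
    for (std::thread& th : pool) th.join();  // wait for every worker to run dry
    return out;
}
```

The trade-off is that the main thread gives up control of the order in which iterations start, which is fine when, as here, each iteration touches only its own vertex.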
Before the loop begins, the threads are created and launched. The number of CPUs varies from system to system, so the threads need to be kept in an array, and there is no point in having more threads than there are CPUs. A semaphore, signalled by each thread when it becomes free, tells the loop when a thread is available. Initially all the threads are free but have no data to process, so flags are used to control that. An abort flag is also needed, and it must be false for the thread to run. Even after a thread has finished executing it still exists, and we will also want to check for a user abort in the loop.
Now the code from the loop needs to go into the thread's ::Work() function. As an ordinary method of the class, Work() would begin processing when the thread is launched and then return, which we don't want. Instead I put the thread's code in a while loop, so it keeps running as long as it's needed but doesn't process any data until it is ready. Initially the ready flag is false, so the thread doesn't enter that inner loop. As soon as the work is done, the thread resets its ready flag and signals the semaphore. When multiple threads are running and changing the same data, the critical section class is used to prevent race conditions. In this loop the original facet mesh doesn't change and each iteration changes only one vertex. In the loop that deformed the vertices (not shown here), I used a critical section when a triangle's vertices were being read, in case they were changed partway through, and whenever any vertex was being written.
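For the critical-section part, std::mutex with std::lock_guard is the standard C++ counterpart of entering and leaving a critical section. In this hypothetical sketch (HitList, FindHits, CollectHits and the "below a limit" test are all invented for illustration), two threads scan disjoint halves of a read-only array, which needs no lock, and only the push_back into the shared result vector, which may reallocate, is done while holding the mutex.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Shared result list plus the mutex that plays the role of a critical section.
struct HitList {
    std::mutex lock;
    std::vector<std::size_t> hits;
};

// Reading the shared, unchanging input needs no lock; only the write to
// the shared vector (push_back may reallocate it) is guarded.
static void FindHits(const std::vector<double>* heights, std::size_t begin,
                     std::size_t end, double limit, HitList* out) {
    for (std::size_t v = begin; v < end; ++v) {
        if ((*heights)[v] < limit) {  // hypothetical stand-in for an intersection test
            std::lock_guard<std::mutex> guard(out->lock);
            out->hits.push_back(v);
        }
    }
}

std::vector<std::size_t> CollectHits(const std::vector<double>& heights, double limit) {
    HitList list;
    std::size_t half = heights.size() / 2;
    std::thread a(FindHits, &heights, std::size_t(0), half, limit, &list);
    std::thread b(FindHits, &heights, half, heights.size(), limit, &list);
    a.join();
    b.join();
    std::sort(list.hits.begin(), list.hits.end());  // thread interleaving is not deterministic
    return list.hits;
}
```

Keeping the locked region to the single shared write is what stops the mutex from serialising the threads, which is the slowdown the post worries about.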
In the main loop, the semaphore is used to halt at that point and wait until a thread is free. The semaphore doesn't know which thread is free, so that is determined by checking each thread's ready flag. The free thread can then be given the next iteration counter of the loop and any data it needs.
So that takes the code from inside the for loop and puts it into threads that run on all the available CPUs. When the next iteration comes around, the loop waits for a free thread, then gives it the next vertex to process. I'm not totally clear yet about what really needs to go into a critical section. I know that using too many will slow things down, so I used only a few, as a precaution, when setting the thread control flags where more than one variable was changing: by the time one thread has been flagged as ready, another might already have signalled.
When the main loop has finished, the thread functions need to exit and the threads be aborted, or they will remain active. For this, I wait until all the threads are free again and then set their running flags to false to exit the outer while loop in each thread. After that, the flag to abort all threads is set to true to tell the system to end them.
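Put together, the design from the last few paragraphs (one persistent thread per CPU, a ready flag and a running flag per thread, a semaphore counting free threads, a dispatch loop that waits on the semaphore and then scans for the free thread, and a shutdown that waits for every thread to go idle before clearing the running flags) might look like this in standard C++11. It is a sketch, not the Carrara SDK: Semaphore, Worker and RunPool are hypothetical, adding 1 stands in for the per-vertex work, and a condition variable per worker replaces polling the ready flag, so idle workers sleep instead of spinning.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Counting semaphore (hypothetical stand-in for IShSemaphore).
class Semaphore {
public:
    explicit Semaphore(int count) : count_(count) {}
    void Signal() { std::lock_guard<std::mutex> l(m_); ++count_; cv_.notify_one(); }
    void Wait() {
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};

// One persistent worker: `ready` = "has a job", `running` keeps the outer loop alive.
struct Worker {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false, running = true;
    std::size_t index = 0;                 // loop iteration handed over by the dispatcher
    const std::vector<int>* in = nullptr;  // data pointers, set once before launch
    std::vector<int>* out = nullptr;
    Semaphore* freeSlots = nullptr;

    void Work() {
        for (;;) {
            std::size_t i;
            {
                std::unique_lock<std::mutex> l(m);
                cv.wait(l, [this] { return ready || !running; });
                if (!ready) return;        // shutdown: running == false, no job pending
                i = index;
            }
            (*out)[i] = (*in)[i] + 1;      // stand-in for one iteration of the original loop
            { std::lock_guard<std::mutex> l(m); ready = false; }
            freeSlots->Signal();           // tell the dispatcher a worker is free
        }
    }
};

std::vector<int> RunPool(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    const unsigned n = std::max(1u, std::thread::hardware_concurrency());  // one per CPU
    Semaphore freeSlots(static_cast<int>(n));                             // all start free
    std::unique_ptr<Worker[]> workers(new Worker[n]);
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < n; ++t) {
        workers[t].in = &in;
        workers[t].out = &out;
        workers[t].freeSlots = &freeSlots;
        threads.emplace_back(&Worker::Work, &workers[t]);
    }
    for (std::size_t i = 0; i < in.size(); ++i) {  // the original for loop
        freeSlots.Wait();                            // block until some worker is free
        for (unsigned t = 0; t < n; ++t) {           // find out which one it is
            std::lock_guard<std::mutex> l(workers[t].m);
            if (!workers[t].ready) {
                workers[t].index = i;
                workers[t].ready = true;
                workers[t].cv.notify_one();
                break;
            }
        }
    }
    for (unsigned t = 0; t < n; ++t) freeSlots.Wait();  // wait until every worker is idle
    for (unsigned t = 0; t < n; ++t) {                  // then end the outer loops
        std::lock_guard<std::mutex> l(workers[t].m);
        workers[t].running = false;
        workers[t].cv.notify_one();
    }
    for (std::thread& th : threads) th.join();
    return out;
}
```

The user-abort check the post mentions would go at the top of the dispatch loop; breaking out of it early still falls through to the same shutdown code, so no thread is left running.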
When I run this and check the Windows Task Manager, the CPU usage goes to 100%, the application's thread count increases and the plugin runs significantly faster.
What I did have trouble with was a progress bar: when I tried to bring one up and increment it inside the loop, two of them would appear. One worked, but the upper one would hang and crash Carrara. A progress bar is always better than a spinning circle when the processing takes more than a few seconds.
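The post doesn't establish why the second progress bar appeared, so this is only one commonly used pattern, assuming the progress calls were being made from more than one thread: worker threads only increment a std::atomic counter, and the thread that started the work polls that counter and is the only one that touches the progress display. `report` is a hypothetical stand-in for whatever progress-bar call the SDK provides; RunWithProgress and CountedWork are likewise invented names.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Workers only bump atomic counters; they never touch the progress UI.
static void CountedWork(std::atomic<std::size_t>* next, std::atomic<std::size_t>* done,
                        std::size_t total) {
    for (;;) {
        std::size_t i = next->fetch_add(1);  // claim iteration i
        if (i >= total) return;
        // ... the per-vertex work for index i would go here ...
        done->fetch_add(1);                  // record one finished iteration
    }
}

// `report` is only ever called from the thread that called RunWithProgress.
std::size_t RunWithProgress(std::size_t total,
                            const std::function<void(std::size_t)>& report) {
    std::atomic<std::size_t> next(0), done(0);
    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < n; ++t)
        pool.emplace_back(CountedWork, &next, &done, total);
    std::size_t shown = 0;
    while (shown < total) {  // poll instead of letting workers update the UI
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        std::size_t now = done.load();
        if (now != shown) { shown = now; report(shown); }
    }
    for (std::thread& th : pool) th.join();
    return shown;
}
```

Polling every few milliseconds also keeps the number of UI updates small, no matter how many iterations finish in between.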