Hi All,
I'm having no success with this (simple?) multithreading problem on my Core i7 processor, using CVI 9.0 (32-bit compiler).
In the code snippets below, I have a node-level structure of 5 integers. I make 32 calls to calloc(), each allocating one block of 128*128 (16K) nodes, and store the returned pointers in a global array.
Node size = 20 bytes, block size = 327,680 bytes (320 KiB), total allocated size = 10,485,760 bytes (10 MiB).
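As a sanity check on those figures, here is a minimal sketch of the arithmetic in plain C. It assumes a 4-byte unsigned int and no padding inside the struct; the helper names (node_bytes, block_bytes, total_bytes) are made up for illustration and are not part of my real code.

```c
#include <stddef.h>

/* Assumed layout: 5 unsigned ints per node, 128*128 nodes per block, 32 blocks. */
enum { NODE_FIELDS = 5, NODES_PER_BLOCK = 128 * 128, BLOCK_COUNT = 32 };

/* Bytes per node, assuming the compiler adds no padding. */
size_t node_bytes(void)
{
    return NODE_FIELDS * sizeof(unsigned int);
}

/* Bytes requested by one calloc() call. */
size_t block_bytes(void)
{
    return (size_t)NODES_PER_BLOCK * node_bytes();
}

/* Bytes requested across all 32 blocks. */
size_t total_bytes(void)
{
    return (size_t)BLOCK_COUNT * block_bytes();
}
```

At roughly 10 MiB in total, the working set is probably larger than the last-level cache on most Core i7 parts, which is part of why I suspect memory bandwidth.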
I then spawn 32 threads, each of which is passed a unique index into the "node_space" pointer array (see code below), so each thread reads and writes its own separate 16K block of nodes.
This should be thread safe and scale with the number of threads, because each thread addresses a different memory block (with no overlap), yet the multithreaded version runs barely faster, if at all, than a single thread.
I've tried various threadpool sizes, padding nodes to 16 and 64 byte boundaries, all to no avail.
Is this a memory bandwidth problem due to the size of the arrays? Does each thread somehow load the whole 32 blocks? Any help appreciated.
struct Nodes
{
unsigned int a;
unsigned int b;
unsigned int c;
unsigned int d;
unsigned int e;
} ;
typedef struct Nodes Nodes;
typedef Nodes *Node_Ptr;
Node_Ptr node_space[32]; /* pointers to 32 separate blocks, each loaded by its own calloc() call */
.... Thread Spawning ....
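static int thread_index[32]; /* one stable slot per thread; passing &index would hand every thread the same, still-changing loop variable */
for (index = 0; index < 32; ++index)
{
    thread_index[index] = index;
    CmtScheduleThreadPoolFunction(my_thread_pool_handle, My_Thread_Function, &thread_index[index], NULL);
}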
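To make the setup concrete, here is a minimal sketch in plain C of the allocation step and of what each thread does with its index. It is a simplified stand-in and not my actual code: allocate_blocks and process_block are hypothetical names, the CVI-specific callback decoration is left out, and the loop body is placeholder work. The worker copies the index out of the pointer it receives as its first step, then touches only its own block.

```c
#include <stdlib.h>

#define BLOCK_COUNT 32
#define NODES_PER_BLOCK (128 * 128)

typedef struct Nodes { unsigned int a, b, c, d, e; } Nodes;

static Nodes *node_space[BLOCK_COUNT];

/* Allocate each block with its own calloc() call.
   Returns 0 on success, -1 if any allocation fails (earlier blocks are freed). */
int allocate_blocks(void)
{
    int i;
    for (i = 0; i < BLOCK_COUNT; ++i) {
        node_space[i] = calloc(NODES_PER_BLOCK, sizeof(Nodes));
        if (node_space[i] == NULL) {
            while (i-- > 0) {
                free(node_space[i]);
                node_space[i] = NULL;
            }
            return -1;
        }
    }
    return 0;
}

/* Hypothetical worker body: function_data points at this thread's block index. */
int process_block(void *function_data)
{
    int block = *(int *)function_data;  /* copy the index before doing anything else */
    Nodes *nodes = node_space[block];
    int n;

    /* Placeholder work: read and write only the nodes of this one block. */
    for (n = 0; n < NODES_PER_BLOCK; ++n) {
        nodes[n].a = (unsigned int)block;
        nodes[n].e = nodes[n].a + (unsigned int)n;
    }
    return 0;
}
```

Called as process_block(&some_index), this modifies only node_space[some_index], so as long as every thread really receives a different index there should be no sharing between threads.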