A note on Cython Parallelism
Instead of using thread-local ndarrays, a very convenient method is to create an array shaped in
(num_of_threads, <whatever the structure>)
to make the changes thread-safe. For example:cdef np.ndarray[np.double_t, ndim=2] ret
num_threads = multiprocessing.cpu_count()
ret = np.zeros((num_threads, nbins), dtype=np.float)
and in the prange
loop:thread_num = openmp.omp_get_thread_num()
ret[thread_num, l]+=1
...
Afterwards, ret.sum(axis=0)
is returned. The loop is set to use num_threads
threads: with nogil, parallel(num_threads=num_threads):
. By default, I set the prange
with schedule='dynamic'
option, on my laptop (ArchLinux), with openmp=201511
and gcc=9.2.0
, I found interesting outputs, the summation of ret
is exactly num_threads
times the desired results, i.e. ret.sum(axis=0)[0]
should be 200, but if 4 threads are used in parallel computation, then the output becomes 800. However, this does not happen on my server (CentOS 7.6) with gcc=4.2.0
and openmp=201107
. All others are the same (Python, Cython, NumPy...) including .c files generated by Cython, for I use Anaconda. This wired problem is solved after setting schedule='static'
on my laptop. I think this is some new feature of openmp, I shall read the docs to figure this out when I have some time.
Comments
Post a Comment