### RTFM, pls RTFM!!!

I was trying to write a parallel version C-extension with Cython several days ago, to calculate Coulomb energy of $n$ points. I wrote the code as follows:
@cython.cdivision(True)
@cython.boundscheck(False)
@cython.wraparound(False)
def u(double[:,:] x):
cdef long i
cdef long j,k
cdef double ret=0,tmp
with nogil, parallel():
for i in prange(x.shape[0]-1, schedule='static'):
for j in range(i + 1, x.shape[0]):
tmp = 0
for k in range(x.shape[1]):
tmp = tmp + pow(x[i,k]-x[j,k],2)
ret = ret + 1 / sqrt(tmp)
return ret


It passed the compilation but produced an NaN result. I Googled about half an hour with "cython parallel yield nan" but no answers were found, I also tried to add variables tracing the tmp variable... After about 1 desperate hour of aimlessly Googling with every possible combinations of meaningless keywords, reading the sample code in Cython doc over and over or seperating the modulus part as an independent function... suddenly I found on the top of the doc, under "Thread-locality and reductions.." line it said clearly about how the variables work in/out the prange loop. Therefore I tried to modify the code:
@cython.cdivision(True)
@cython.boundscheck(False)
@cython.wraparound(False)
def u(double[:,:] x):
cdef long i
cdef long j,k
cdef double ret=0,tmp
with nogil, parallel():
for i in prange(x.shape[0]-1, schedule='static'):
for j in range(i + 1, x.shape[0]):
tmp = 0
for k in range(x.shape[1]):
tmp = tmp + pow(x[i,k]-x[j,k],2)
ret += 1 / sqrt(tmp)
return ret

Just by changing ret = ret + ... to ret += ..., then the problem was solved. What was the meaning of wasting all this time of aimless trials? Why not just READ THE F**KING MANNUL?