`Cython`

several days ago, to calculate Coulomb energy of $n$ points. I wrote the code as follows:```
@cython.cdivision(True)
@cython.boundscheck(False)
@cython.wraparound(False)
def u(double[:,:] x):
cdef long i
cdef long j,k
cdef double ret=0,tmp
with nogil, parallel():
for i in prange(x.shape[0]-1, schedule='static'):
for j in range(i + 1, x.shape[0]):
tmp = 0
for k in range(x.shape[1]):
tmp = tmp + pow(x[i,k]-x[j,k],2)
ret = ret + 1 / sqrt(tmp)
return ret
```

It passed the compilation but produced an NaN result. I Googled about half an hour with "cython parallel yield nan" but no answers were found, I also tried to add variables tracing the

`tmp`

variable... After about 1 desperate hour of aimlessly Googling with every possible combinations of meaningless keywords, reading the sample code in Cython doc over and over or seperating the modulus part as an independent function... suddenly I found on the top of the doc, under "Thread-locality and reductions.." line it said clearly about how the variables work in/out the `prange`

loop. Therefore I tried to modify the code:```
@cython.cdivision(True)
@cython.boundscheck(False)
@cython.wraparound(False)
def u(double[:,:] x):
cdef long i
cdef long j,k
cdef double ret=0,tmp
with nogil, parallel():
for i in prange(x.shape[0]-1, schedule='static'):
for j in range(i + 1, x.shape[0]):
tmp = 0
for k in range(x.shape[1]):
tmp = tmp + pow(x[i,k]-x[j,k],2)
ret += 1 / sqrt(tmp)
return ret
```

Just by changing `ret = ret + ...`

to `ret += ...`

, then the problem was solved. What was the meaning of wasting all this time of aimless trials? Why not just READ THE F**KING MANNUL?
## No comments:

## Post a Comment