Tuesday, October 1, 2019

Dealing with aggregates: I don't like tedious work

I used to study nanoparticles (NPs) self-assembly in polymer matrices before 2017 by coarse-grained simulation method. Analyzing morphologies of self-assemblies is a tedious work: one needs to repeatedly watch frames from simulation trajectories, finding appropriate view points, counting number of aggregates, measure size of aggregates, etc. Obtaining integrate aggregates under periodic boundary conditions is also a challenging work. In addition, one usually needs to run plenty of simulations under different conditions to find a desired morphology; selecting desired morphologies automatically is even more challenging: most of the morphologies are ill-defined, it is hard to use some common  characterization methods, e.g., $g(r)$ or $S(q)$ of NPs give little differences amongst percolated morphologies.
Based on such demands, I designed a co-pilot which can:
  1. Automatically clustering the clusters;
  2. Remove periodic boundary conditions and make the center-of-mass at $(0,0,0)^T$;
  3. Adjust the view vector along the minor axis of the aggregate;
  4. Classify the aggregate.
In step 1, firstly, I calculate the $g(r)$ of NP centers and find $r$ where $g(r)$ reaches its 1st valley, simultaneously, the distance matrix under periodic boundary condition. Using this distance matrix and $r$, a DBSCAN or HDBSCAN algorithm is performed to cluster NPs. In simulations, usually systems with very low loading of NPs are considered, DBSCAN is efficient enough; for some co-polymer self-assembly systems, a coarse-grained method is preferred: divide simulation box into grids and DBSCAN on the grids with density as weights, then perform DBSCAN on each coarse-grained clusters. DFS or BFS + Linked Cell List is also a good choice if particles are too many. All these struggles are due to that  kd-tree is not working under periodic boundary conditions.

After obtaining clusters in step 1, we must remove periodic boundary conditions of the cluster. If in step 1, one uses BFS or DFS + Linked Cell List method, then one can remove periodic boundary condition during clustering; but this method has limitations, it does not work properly if the cluster is percolated throughout the box. Therefore, in this step, I use circular statistics to deal with the clusters. In periodic boundary condition simulation box, distance between an NP in an aggregate and the midpoint will never exceed $L/2$ in corresponding box dimension. Midpoint here is not center-of-mass, e.g., distance between a nozzle point and center-of-mass of an ear-syringe-bulb is clearly larger than its half length; midpoint is a actually a "de-duplicated" center-of-mass. Besides, circular mean also puts most points in the center in case of percolation. Therefore, in part 2, we have following steps:
  1. Choose a $r_\text{cut}$ to test whether the aggregate is percolate;
  2. If the aggregate is percolate, evaluate the circular mean of all points $r_c$;
  3. Set all coordinates $r$ as $r\to pbc(r-r_c)$;
  4. If the aggregate is not percolate, midpoint is evaluated by calculating circular mean of coordinates $r$ where $\rho(r)>0$, $\rho(r)$ is calculated using bin size that smaller than $r_\text{cut}$ used in step 1;
  5. Same as step 3, update coordinates;
  6. After step 5, the aggregates are unwrapped from the box, set $r\to r-\overline{r}$ to set center-of-mass at $(0,0,0)^T$
Circular mean $\alpha$ of samples $\lbrace\alpha_i\rbrace$ is defined as the minimization of summation of a circular metric function $d(\alpha,\beta):=1-\cos(\alpha-\beta)$
$$\alpha=\underset{\beta}{\operatorname{argmin}}\sum_i (1-\cos(\alpha_i-\beta))$$
Adjusting the view vector is simple, evaluate the eigenspace of gyration tensor as $rr^T/n$ and sort the eigenvectors by eigenvalue, i.e., $\lambda_1\ge\lambda_2\ge\lambda_3$, then the minor axis is corresponding eigenvector $v_3$, the aggregate then can be rotated by $[v_1, v_2, v_3]$ so that the minor axis is $z$-axis.

The last step is a bit more tricky, the best trail I attempted was to use SVC, a binary classification method. I used about 20 samples labeled as "desired", these 20 samples were extended to 100 samples by adding some noises to the samples, e.g., moving coordinates a little bit, adding several NPs into the aggregate or removing several NPs randomly, without "breaking" the category of the morphology. Together with 100 "undesired" samples, I trained the SVC with a Gaussian kernel. The result turned out to be pretty good. I also tried to use ANN to classify all 5 categories of morphologies obtained from simulations, but ANN model did not work very well, perhaps the reason was lack of samples or the model I built was too rough. I didn't try other multi-class methods, anyway, that part of work was done, I stopped developing this co-pilot long time ago.

Eigenvalues of circulant matrices

A circulant matrix is defined as
$$C=\begin{bmatrix}c_{0}&c_{n-1}&\dots &c_{2}&c_{1}\\c_{1}&c_{0}&c_{n-1}&&c_{2}\\\vdots &c_{1}&c_{0}&\ddots &\vdots \\c_{n-2}&&\ddots &\ddots &c_{n-1}\\c_{n-1}&c_{n-2}&\dots &c_{1}&c_{0}\\\end{bmatrix}$$
where $C_{j, k}=c_{j-k \mod n}$, the $k$-th eignenvalue $\lambda_k$ and eigenvector $x_k$ satisfy $C\cdot x_k=\lambda_k x_k$, or, equivalently, $n$ equations as:
$$\sum_{j=0}^{m-1}c_{m-j}x_j+\sum_{j=m}^{n-1}c_{n-j+m}x_j=\lambda_m x_m\quad m=0,1,\dots,n-1$$
with $c_n=c_0$. Changing the dummy summing ($j\to m-j$ and $j\to n-j+m$) variables yields
$$\sum_{j=1}^{m}c_j x_{m-j} +\sum_{j=m+1}^{n}c_j x_{n+m-j}=\lambda_m x_m$$
with $m=0,1,2,\dots,n-1$. One can "guess" a solution that $x_j=\omega^j$, therefore the equation above turns into
$$\begin{align}&\sum_{j=1}^{m}c_j \omega^{m-j} +\sum_{j=m+1}^{n}c_j \omega^{n+m-j}=\lambda_m \omega^m\\ \leftrightarrow &\sum_{j=1}^{m}c_j \omega^{-j} +\omega^{n}\sum_{j=m+1}^{n}c_j \omega^{-j}=\lambda_m\end{align}$$
Let $\omega$ be one of the $n$-th square root of unity, i.e., $\omega = \exp(-2\pi\mathbb{i}/n)$,  then $\omega^{n}=1$; we have an eigenvalue
$$\lambda = \sum_{j=0}^{n-1}\omega^{-j} c_j$$
with corresponding eigenvector
$$x= (1, \exp(2\pi\mathbb{i}/n), \exp(2\pi\mathbb{i}/n)^2,\dots,\exp(2\pi\mathbb{i}/n)^{n-1})^T$$
Choosing $\omega_m = \omega^{-m} = \exp(2\pi m\mathbb{i}/n)$ with $m\in [0,n-1]$, yields the $m$-th eigenvalue and eigenvector:
$$\lambda_m = \sum_{j=0}^{n-1}c_j \exp(2\pi m\mathbb{i}/n)$$
and
$$x_m = (1, \exp(2\pi m\mathbb{i}/n), \exp(2\pi m\mathbb{i}/n)^2,\dots,\exp(2\pi m\mathbb{i}/n)^{n-1})^T$$
The eigenspace is just the DFT matrix, and ALL circulant matrices share same eigenspace, it is easily to verify that circulant matrices have following properties:
If $A$ and $B$ are circulant matrices, then
  1. $AB=BA=W^\ast\Gamma W$ with $W$ is the DFT matrix and $\Gamma=\Gamma_A\Gamma_B$, $\Gamma_i$ represents diagonal matrix consists of eigenvalues of $i$; $\Gamma_A = WAW^\ast$;
  2. $B+A=A+B=W^\ast\Omega W$, $\Omega=\Gamma_A+\Gamma_B$;
  3. If $\mathrm{det}(A)\ne 0$, then $A^{-1}=W^\ast \Gamma_A^{-1}W$;
The proof is straightforward:
  1. $AB=W^\ast \Gamma_AW W^\ast\Gamma_BW= W^\ast \Gamma_A\Gamma_BW=W^\ast \Gamma_B\Gamma_AW=BA$
  2. $W(A+B)W^\ast=\Gamma_A+\Gamma_B$; 
  3. $AW^\ast \Gamma_A^{-1}W=W^\ast \Gamma_AW W^\ast \Gamma_A^{-1}W=\mathbb{I}$

Sunday, September 22, 2019

Toeplitz, Circulant matrix and discrete convolution theorem

A topelitz matrix is a matrix with form of
$$A=\begin{bmatrix} a & b & c & d & e \\ f & a & b & c & d \\ g & f & a & b & c \\ h & g & f & a & b \\ i & h & g & f & a\end{bmatrix}$$
Generally, the elements are denoted as $a_{i-j}$: $A_{ij}=A_{i+1, j+1}=a_{i-j}$. A circulant matrix is a special kind of Toeplitz matrix where each row vector is rotated one element:
$$C=\begin{bmatrix}c_{0}&c_{n-1}&\dots &c_{2}&c_{1}\\c_{1}&c_{0}&c_{n-1}&&c_{2}\\\vdots &c_{1}&c_{0}&\ddots &\vdots \\c_{n-2}&&\ddots &\ddots &c_{n-1}\\c_{n-1}&c_{n-2}&\dots &c_{1}&c_{0}\\\end{bmatrix}$$

Circulant matrices are closely connected to discrete convolution:
$$y=h\ast x=\begin{bmatrix}h_{1}&0&\cdots &0&0\\h_{2}&h_{1}&&\vdots &\vdots \\h_{3}&h_{2}&\cdots &0&0\\\vdots &h_{3}&\cdots &h_{1}&0\\h_{m-1}&\vdots &\ddots &h_{2}&h_{1}\\h_{m}&h_{m-1}&&\vdots &h_{2}\\0&h_{m}&\ddots &h_{m-2}&\vdots \\0&0&\cdots &h_{m-1}&h_{m-2}\\\vdots &\vdots &&h_{m}&h_{m-1}\\0&0&0&\cdots &h_{m}\end{bmatrix}
\begin{bmatrix}x_{1}\\x_{2}\\x_{3}\\\vdots \\x_{n}\end{bmatrix}$$
The above matrix is $(m,n)$, the size of full version of the matrix of RHS should be $(m+n-1,m+n-1)$ and the circulant matrix is generated by an $m+n-1$ row vector $(h_1, h_2,...,h_m,0,0,...,0)$, the full version of vector of $x$ of RHS should also be an $m+n-1$ vector $(x_1, x_2,...,x_n,0,...,0)^T$. For $(a \ast v)[n] = \sum_{m = -\infty}^{\infty} a[m] v[n - m]$(from help(np.convolve)). If the original arrays are of size $m, n$, then the output array is $m+n-1$. Here I present a simple proof of discrete convolution theorem using DFT:
$y=h\ast x=\mathrm{IDFT}\{\mathrm{DFT}\{h\}\mathrm{DFT}\{x\}\}$
Proof:
the DFT matrix is:
$$W=\frac {1}{\sqrt {N}}\begin{bmatrix}1&1&1&1&\cdots &1\\1&\omega &\omega ^{2}&\omega ^{3}&\cdots &\omega ^{N-1}\\1&\omega ^{2}&\omega ^{4}&\omega ^{6}&\cdots &\omega ^{2(N-1)}\\1&\omega ^{3}&\omega ^{6}&\omega ^{9}&\cdots &\omega ^{3(N-1)}\\\vdots &\vdots &\vdots &\vdots &&\vdots \\1&\omega ^{N-1}&\omega ^{2(N-1)}&\omega ^{3(N-1)}&\cdots &\omega ^{(N-1)(N-1)}\\\end{bmatrix}$$
With $W_{ij}=\omega^{i\times j}$ and $\omega=\exp(\frac{2\pi \mathbb{i}}{N})$. Assume $\hat{h}=\mathrm{DFT}\{h\}=W\cdot h$, and assume $\hat{c}_j=\mathrm{DFT}\{h\}_j\mathrm{DFT}\{x\}_j$, we have:
$$\hat{c}_j=\sum_{k_1=0,k_2=0}^{N-1}\omega^{j\times(k_1+k_2)}h_{k_1}x_{k_2}$$
The IDFT matrix is the complex conjugate of $W$, therefore, $W_{ij}^\ast=\omega^{-i\times j}$; therefore $y_i=W_{ij}^\ast \hat{c}_{j}=\sum_{k_0=0,k_1=0,j=0}^{N-1}\omega^{j\times(k_1+k_2-i)}h_{k_1}x_{k_2}$. With $\omega$ is the $N$-th roots of unity. Summing over $j$ first, we thus obtain
$$\begin{equation}\sum_{j=0}^{N-1}\omega^{(k_1+k_2-i)\times j}=\begin{cases}N\quad &k_1+k_2=i\\ \frac{1-\omega^{(k_1+k_2-i)\times N}}{1-\omega^{k_1+k_2-i}}=0 &k_1+k_2\ne i\end{cases}\end{equation}$$
Which gives $y[i]=\sum_{k_1+k_2=i}a_{k_1}b_{k_2}=\sum_j a[j]b[i-j]$ being the discrete convolution. $Q.E.D.$
Therefore, the matrix multiplication can also be accelerated by FFT method if circulant matrices are involved. The extra $1/\sqrt{N}$ comes from IDFT.
In[1]: a = np.random.random((5,))
In[2]: b = np.random.random((5,))
In[3]: np.allclose(np.fft.ifft(np.fft.fft(a,n=9, norm='ortho')*np.fft.fft(b, n=9,norm='ortho'), 
    ....:     n=9,norm='ortho').real*9**0.5,np.convolve(a,b,'full'))
Out[3]: True
The $W^\ast \cdot W=\sum_{k=0}^{N-1}\omega^{k(i-j)}=\delta_{ij}=\mathbb{I}$ where $W^\ast$ represents the inverse DFT operator, is the complex conjugation transpose of $W$. Appling twice of FT on a function $f(x)$ yields $f(-x)$: $W^2_{ij}=\sum_{k=0}^{N-1}\omega^{k(i+j)}=\delta_{i+j\equiv0 \bmod N},i,j\in[0,N-1]$, which means $\mathrm{FT}\{\mathrm{FT}\{f(x)\}\}=f(-x)$, e.g., for $5\times5$ sized DFT matrix:
$$W^2=\begin{bmatrix}
1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 1 & 0\\
0 & 0 & 1 & 0 & 0\\
0 & 1 & 0 & 0 & 0\\
\end{bmatrix}$$
Clearly, $W^2\cdot (x_0, x_1, x_2, x_3, x_4)^T=(x_0,x_4,x_3,x_2,x_1)^T$. In fact, it is easily to verify that the eigenvectors of a circulant matrix are Fourier modes: $v_j=1/\sqrt{N}(1, \omega_j, \omega_j^2,...,\omega_j^{N-1})^T$ for $\omega_j = \exp(2 \pi j \mathbb{i}/n)$, and corresponding eigenvalue $\lambda_j=c_0+c_{N-1}\omega_j+c_{N-2}\omega_j^2+...+c_1\omega_j^{N-1}$, with $C\cdot v_j=\lambda_j v_j$, the 'circulant' property comes from that $\omega$ is generator of a cyclic group.

Wednesday, July 24, 2019

Distribution of segments on Gaussian chain

The probability distribution function of $P_i(\mathbf{r}_i-\mathbf{r}_\mathrm{cm})$ of ideal chain is evaluated, for $P_i$ represents the probability distribution fucntion of $i$th segment with respect to its centre of mass on the ideal chain. An ideal chain is modeled as multidimensional random walk, where the steps are independent, and the distribution of a step at mean length $b$ is given by $P(\mathbf{r})\sim\mathcal{N}(0,b^2)$. Let $\mathbf{b}_i=\mathbf{r}_{i+1}-\mathbf{r}_{i}$ be the $i$th bond vector, then we have

$$\mathbf{r}_i=\sum_{j=1}^{i-1} \mathbf{b}_j$$

and the centre of mass $\mathbf{r}_\mathrm{cm}=\frac{1}{N}\sum_i \mathbf{r}_i$ is

$$\mathbf{r}_\mathrm{cm}=\frac{1}{N}\sum_{j=1}^{N-1}(N-j)\mathbf{b}_j$$

then

$$\mathbf{r}_i-\mathbf{r}_\mathrm{cm}=\sum_{j=1}^{i-1}\frac{j}{N}\mathbf{b}_j+\sum_{j=i}^{N-1}\frac{N-j}{N}\mathbf{b}_j$$

If variable $X$ is a Gaussian random variable with scale of $\sigma^2$, then $aX$ is a Gaussian random variable with scale of $a^2\sigma^2$, we can then write the characteristic function for $P_i(\mathbf{r}_i-\mathbf{r}_\mathrm{cm})$ of a $d$ dimensional ideal chain:

$$\phi_i(\mathbf{q})=\Pi_{j} \phi_{\mathbf{b}_j'}(\mathbf{q})=\exp\left(-\frac{1}{2}\mathbf{q}^T\left(\sum_{j=1}^{N-1}\Sigma_j\right)\mathbf{q}\right)$$

with $\mathbf{b}'_j=\frac{j}{N}\mathbf{b}_j$ for $j\le i-1$ and $\frac{N-j}{N}\mathbf{b}_j$ for $i\le j \le N-1$; $\phi_{\mathbf{b}_j'}=-\exp(-0.5\mathbf{q}^T\Sigma\mathbf{q})$ is the characteristic function of the probability distribution fucntion of bond $j$. $\Sigma_j=\frac{j^2 b^2}{d N^2}\mathbb{I}_d$ for $j\le i-1$ and $\Sigma_j=\frac{(N-j)^2 b^2}{d N^2}\mathbb{I}_d$ for $i\le j \le N-1$, and $\mathbb{I}_d$ is the $d$-dimensional identical matrix, and therefore: (For convenience, I will set $b=1$ in the later calculations.)

$$\phi_i(\mathbf{q})=\exp \left(-\frac{\left(6 i^2-6 i (N+1)+2 N^2+3 N+1\right) \left(q_x^2+q_y^2+q_z^2\right)}{36 N}\right)$$

The corresponding distribution of this characteristic function is still a Gaussian distribution with $\Sigma=\frac{b^2}{3} \mathbb{I}_3$, where the equivalent bond length $b^2=\frac{\left(6 i^2-6 i (N+1)+2 N^2+3 N+1\right)}{6 N}$. The 6th moment is $\frac{1}{N}\sum_{i=1}^N \langle(\mathbf{r}_i-\mathbf{r}_\mathrm{cm})^6\rangle=\frac{58 N^6-273 N^4+462 N^2-247}{1944 N^3}$. At large $N$, only leading term matters, which is $\frac{29}{972} N^3$. For $N=20$, the result is $235.886$, which is in accord with the simulation. Another example is for $N=5$, the $R_g^2$ is $0.8$ rather than $5/6=0.8\dot{3}$ in this case.

Here is the simulation code:
ch = np.random.normal(size=(100000,20,3),scale=1/3**0.5)
ch[:,0,:]=0
ch = ch.cumsum(axis=1)
ch -= ch.mean(axis=1,keepdims=1)
m6 = np.mean(np.linalg.norm(ch,axis=-1)**6)

Monday, July 22, 2019

Simple Examples of Parallel Computing of Cython

It is more convenient to calculate some properties during a molecular simulation process by accessing data from API of the molecular simulation program than calculating after the whole simulation progress and dumping all the coordinates. Especially, for sharply fluctuated properties such as Virial, RDF, etc., require massive frames to calculate. It is expensive to dump densely. For some Python-friendly molecular simulation softwares, e.g., lammps, hoomd-blue, galamost, etc.; it is easily to embed customized Python functions into the simulation control scripts. However, Python has a poor performance in massive calculations, numba.cuda.jit does dramatically boost up the program, however, due to unknown reasons, GPU-accelerated molecular simulation softwares like hoomd-blue and galamost produce errors during simulations if one uses numba.cuda.jit compiled functions. Therefore, I try to use Cython to generate C-functions to accelerate the calculations. Here I put a simple example of a pairwise property calculation function (RDF) using Cython-parallel method, it's also a learning note of mine:

# cython: language_level=3, boundscheck=False, wraparound=False, cdivision=True
import numpy as np
cimport numpy as np
import cython
from cython.parallel import prange, parallel
cimport openmp
from libc.math cimport floor,sqrt,pow
from libc.stdlib cimport malloc, free
import multiprocessing


cdef long * unravel_index_f(long i, long[:] dim) nogil:
    cdef long k, d
    d = dim.shape[0]
    cdef long * ret = <long *> malloc(d * sizeof(long))
    for k in range(d):
        ret[k] = i % dim[k]
        i = (i - ret[k]) / dim[k]
    return ret


cdef long ravel_index_f(long * vec, long[:] dim) nogil:
    cdef long ret, d, tmp, k
    d = dim.shape[0]
    ret = (vec[0] + dim[0]) % dim[0]
    tmp = dim[0]
    for k in range(1,d):
        ret += ((vec[k] + dim[k]) % dim[k]) * tmp
        tmp *= dim[k]
    return ret


cdef long jth_neighbour(long * veci, long * vecj, long[:] dim) nogil:
    cdef long ret, d, tmp, k
    cdef long * tmpi
    d = dim.shape[0]
    ret = (veci[0] + vecj[0] - 1 + dim[0]) % dim[0]
    # re-ravel tmpi + tmpj - 1 to cell_j
    # -1 for indices from -1, -1, -1 to 1, 1, 1 rather than 0,0,0 to 2,2,2
    tmp = dim[0]
    for k in range(1, d):
        ret += ((veci[k] + vecj[k] - 1 + dim[k]) % dim[k]) * tmp
        tmp *= dim[k]
    return ret


cdef double pbc_dist(double[:] x, double[:] y, double[:] b) nogil:
    cdef long i, d
    cdef double tmp=0, r=0
    d = b.shape[0]
    for i in range(d):
        tmp = x[i]-y[i]
        tmp = tmp - b[i] * floor(tmp/b[i]+0.5)
        r = r + pow(tmp, 2)
    return sqrt(r)


cdef long cell_id(double[:] p, double[:] box, long[:] ibox) nogil:
    # In the Fortran way
    cdef long ret, tmp, i, n
    n = p.shape[0]
    ret = <long> floor((p[0] / box[0] + 0.5) * ibox[0])
    tmp = ibox[0]
    for i in range(1, n):
        ret = ret + tmp * <long> floor((p[i] / box[i] + 0.5) * ibox[i])
        tmp = tmp * ibox[i]
    return ret


cdef void linked_cl(double[:, :] pos, double[:] box, long[:] ibox, long[:] head, long[:] body) nogil:
    cdef long i, n, ic
    n = pos.shape[0]
    for i in range(n):
        ic = cell_id(pos[i], box, ibox)
        body[i] = head[ic]
        head[ic] = i


def rdf(double[:,:] x, double[:,:] y, double[:] box, double bs, long nbins):
    cdef long i, j, k, l, n, m, d3, d, ic, jc
    cdef np.ndarray[np.double_t, ndim=2] ret
    cdef long[:] head, body, ibox, dim
    cdef double r, r_cut
    cdef long ** j_vecs
    cdef long * veci
    cdef int num_threads, thread_num
    num_threads = multiprocessing.cpu_count()
    r_cut = nbins * bs
    n = x.shape[0]
    d = x.shape[1]
    m = y.shape[0]
    d3 = 3 ** d
    ibox = np.zeros((d,), dtype=np.int64)
    for i in range(d):
        ibox[i] = <long> floor(box[i] / r_cut + 0.5)
    head = np.zeros(np.multiply.reduce(ibox), dtype=np.int64) - 1
    body = np.zeros((m,), dtype=np.int64) - 1
    linked_cl(y, box, ibox, head, body)
    ret = np.zeros((num_threads, nbins), dtype=np.float)
    dim = np.zeros((d,), dtype=np.int64) + 3
    j_vecs = <long **> malloc(sizeof(long *) * d3)
    for i in range(d3):
        j_vecs[i] = unravel_index_f(i, dim)  
    with nogil, parallel(num_threads=num_threads):
        for i in prange(n, schedule='dynamic'):
            ic = cell_id(x[i], box, ibox)
            thread_num = openmp.omp_get_thread_num()
            veci = unravel_index_f(ic, ibox)
            for j in range(d3):
                jc = jth_neighbour(veci, j_vecs[j], ibox)
                k = head[jc]
                while k != -1:
                    r = pbc_dist(x[i], y[k], box)
                    if r < r_cut:
                        l = <long> (r/bs)
                        ret[thread_num, l]+=1
                    k = body[k]
    return np.sum(ret, axis=0)

Notes:

  1. def means Python function and cdef means C functions;
  2. Arrays declared by double[:] etc. are memoryslice and can be conveniently initialized by np.zeros or reading an np.ndarray as function parameter;
  3. Without GIL, any calling of Python function is forbidden, the array must be initialized and declared in the C method;
  4. In prange loop, variables are thread-local (see the generated .c file, in section #pragma omp parallel), there is no explicit way to call an atomic operation in Cython (using with gil results in very low efficiency), hence the result is generated as ret = np.zeros((n, nbins), ... so that in each thread i that ret[i] are independent, or generate ret as (num_threads, data structure...), just keep independent amongst threads and in ith loop, call openmp.omp_get_thread_num() to get the current thread's id;
  5. <long> is a type casting.

This RDF function uses a linked cell list algorithm to reduce the calculation from $O(n^2)$ to $O(n)$, a small ranged RDF ($r<r_\mathrm{cut}$) is calculated, the RDF on $r\ge r_\mathrm{cut}$ can be calculated using an FFT algorithm by setting $r_\mathrm{bin}<\frac{r_\mathrm{cut}}{2}$

The second example is calculation of histogram N-d array by modulus of indices: $\int f(\mathbf{r})\delta(|\mathbf{r}|-r)\mathrm{d}\mathbf{r}$:

# cython: language_level=3, boundscheck=False, wraparound=False, cdivision=True
import numpy as np
cimport numpy as np
import cython
from cython.parallel import prange, parallel
cimport openmp
from libc.math cimport floor,sqrt,pow
from libc.stdlib cimport malloc, free
import multiprocessing


cdef double unravel_index_f_r(long i, long[:] dim) nogil:
    # unravel in Fortran way
    cdef long k, d, tmp
    cdef double r
    d = dim.shape[0]
    for k in range(d):
        tmp = i % dim[k]
        r += <double> (tmp)**2
        i = (i - tmp) / dim[k]
    return sqrt(r)



def hist_to_r(double[:] x, long[:] shape, double dr, double bs, double rc):
    cdef long i, n, j, n_bins
    cdef np.ndarray[np.double_t, ndim=2] ret, count
    cdef int num_threads, thread_num
    n_bins = <long> (rc / bs)
    num_threads = multiprocessing.cpu_count()
    ret = np.zeros((num_threads, n_bins), dtype=np.float64)
    count = np.zeros((num_threads, n_bins), dtype=np.float64)
    n = x.shape[0]
    with nogil, parallel(num_threads=num_threads):
        for i in prange(n, schedule='dynamic'):
            j = <long> (unravel_index_f_r(i, shape)*dr/bs)
            thread_num = openmp.omp_get_thread_num()
            if j < n_bins:
                ret[thread_num, j] += x[i]
                count[thread_num, j] += 1
    return np.sum(ret,axis=0), np.sum(count,axis=0)

In function unravel_index_f_r, the index is unraveled in the Fortran way, therefore, calling hist_to_r requires x.ravel(order='F').
^ Back to Top