### Di-block copolymer analysis: GPC

Gel Permeation Chromatography is a well known method of measuring molecular weight distribution of polymers. For di-block copolymers, a proper calibration curve can be constructed by using Mark-Houwink parameters of homopolymers (see here). Usually, di-block copolymers are obtained by synthesis one block first then "grow" the other block from the end of preceding block. The molecular weight distributions of final di-block copolymer and preceding block are accessible by GPC, the molecular weight distribution of the 2nd block, however, is hard to obtain.

Here I present a simple method to calculate the molecular weight distribution of 2nd block by GPC data of preceding block and final di-block copolymer. Assuming that we already have the molecular weight distribution of the preceding block, $P_1(n)$, it is generally that the length of the 2nd block "grow" from the depends on the preceding chain length $n$, therefore, the pdf of the 2 blocks should be a joint pdf $Q(n, m)=P_1(n)P_2(m, n)$, and the pdf of the final di-block copolymer is straightforward:
$$P(x)=\int Q(n, x-n) \mathrm{d}n$$
However, it is reasonable to assume that the $P_2(m, n)$ can be expressed as some $f(m)g(n)$ form by dropping some higher order correlations between $m,n$; hence, the $g(n)$ can be considered as a "fixing parameter" on the distribution of $P_1(n)$, the calculation of $P(x)$ is therefore a convolution. The simplest case is $g(n)=1$, which means the $P_1(n)$ and $P_2(m)$ are independent, the growth of 2nd block on the end of the 1st block is not effected by the length of the preceding block; or $g(n)\sim n^{-1}$ by considering the diffusion of the preceding block: small preceding chains tend to "grow" more 2nd block.

The evaluation of pdf of the 2nd block follows 3 steps:

1. Determine the range of molecular weight: $(x_{min}-n_{max}, x_{max}-n_{min})$, where $x$ is the chain length of the di-block copolymer and $n$ is the length of the preceding block;
2. Interpolate the distributions obtained from GPC into equally spaced molecular weight, negative part is simply 0;
3. Deconvolute $P(x)$ with $P_1(n)$.

### A note on Cython Parallelism

Instead of using thread-local ndarrays, a very convenient method is to create an array shaped in (num_of_threads, <whatever the structure>) to make the changes thread-safe. For example:
cdef np.ndarray[np.double_t, ndim=2] ret
ret = np.zeros((num_threads, nbins), dtype=np.float)
and in the prange loop:
thread_num = openmp.omp_get_thread_num()
...
Afterwards, ret.sum(axis=0) is returned. The loop is set to use num_threads threads: with nogil, parallel(num_threads=num_threads):. By default, I set the prange with schedule='dynamic' option, on my laptop (ArchLinux), with openmp=201511 and gcc=9.2.0, I found interesting outputs, the summation of ret is exactly num_threads times the desired results, i.e. ret.sum(axis=0)[0] should be 200, but if 4 threads are used in parallel computation, then the output becomes 800. However, this does not happen on my server (CentOS 7.6) with gcc=4.2.0 and openmp=201107. All others are the same (Python, Cython, NumPy...) including .c files generated by Cython, for I use Anaconda. This wired problem is solved after setting schedule='static' on my laptop. I think this is some new feature of openmp, I shall read the docs to figure this out when I have some time.

### Notes on scipy.optimize.minimize

Problem: evenly distributed points on ellipsoid surfaces.
Solution: numerically minimizing energy of charges constrained on the ellipsoid surface.
Assuming the ellipsoid satisfies equation:
$$\left(\frac{x}{a}\right)^2+\left(\frac{y}{b}\right)^2+\left(\frac{z}{c}\right)^2=1$$
the ratios of 3 axes of the ellipsoid surface are $(a,b,c)^T$. The minimization process is:
1. let
$$\mathbf{x}^\prime:=\frac{1}{\sqrt{\left(\frac{x}{a}\right)^2+\left(\frac{y}{b}\right)^2+\left(\frac{z}{c}\right)^2}}(x, y, z)^T$$
then $\mathbf{x}^\prime$ satisfy the ellipsoid equation;
2. minimize the energy function
$$u=\sum_i \sum_{j>i} \frac{1}{\sqrt{(x^\prime_i-x^\prime_j)^2+(y^\prime_i-y^\prime_j)^2+(z^\prime_i-z^\prime_j)^2}}$$
3. the gradient vector is, e.g. the $x$ component of ith particle:
$$\partial u/\partial x_i=\sum_{j\ne i} - ((x^{\prime 2}_i/a^2-1)(x^\prime_j-x^\prime_i) +$$ $$(x^{\prime}_i y^{\prime}_i(y^{\prime}_j-y^{\prime}_i)+x^{\prime}_i z^{\prime}_i (z^{\prime}_j-z^{\prime}_i))/a^2)/d^3$$ with $d=\sqrt{(x^{\prime}_i -x^{\prime}_j)^2+(y^{\prime}_i -y^{\prime}_j)^2+(z^{\prime}_i -z^{\prime}_j)^2}$
***NOTE THE GRADIENT IS CALCULATED WITH RESPECT TO $\mathbf{x}$, $\nabla_\mathbf{x}U$, NOT $\mathbf{x}^\prime$, $\nabla_{\mathbf{x}^\prime}U$***
The scipy.optimize.minimize function takes jac=True parameter if the function provides gradient vector, this would boost up the program dramatically.

### Dealing with aggregates: I don't like tedious work

I used to study nanoparticles (NPs) self-assembly in polymer matrices before 2017 by coarse-grained simulation method. Analyzing morphologies of self-assemblies is a tedious work: one needs to repeatedly watch frames from simulation trajectories, finding appropriate view points, counting number of aggregates, measure size of aggregates, etc. Obtaining integrate aggregates under periodic boundary conditions is also a challenging work. In addition, one usually needs to run plenty of simulations under different conditions to find a desired morphology; selecting desired morphologies automatically is even more challenging: most of the morphologies are ill-defined, it is hard to use some common  characterization methods, e.g., $g(r)$ or $S(q)$ of NPs give little differences amongst percolated morphologies.
Based on such demands, I designed a co-pilot which can:
1. Automatically clustering the clusters;
2. Remove periodic boundary conditions and make the center-of-mass at $(0,0,0)^T$;
3. Adjust the view vector along the minor axis of the aggregate;
4. Classify the aggregate.
In step 1, firstly, I calculate the $g(r)$ of NP centers and find $r$ where $g(r)$ reaches its 1st valley, simultaneously, the distance matrix under periodic boundary condition. Using this distance matrix and $r$, a DBSCAN or HDBSCAN algorithm is performed to cluster NPs. In simulations, usually systems with very low loading of NPs are considered, DBSCAN is efficient enough; for some co-polymer self-assembly systems, a coarse-grained method is preferred: divide simulation box into grids and DBSCAN on the grids with density as weights, then perform DBSCAN on each coarse-grained clusters. DFS or BFS + Linked Cell List is also a good choice if particles are too many. All these struggles are due to that  kd-tree is not working under periodic boundary conditions.

After obtaining clusters in step 1, we must remove periodic boundary conditions of the cluster. If in step 1, one uses BFS or DFS + Linked Cell List method, then one can remove periodic boundary condition during clustering; but this method has limitations, it does not work properly if the cluster is percolated throughout the box. Therefore, in this step, I use circular statistics to deal with the clusters. In periodic boundary condition simulation box, distance between an NP in an aggregate and the midpoint will never exceed $L/2$ in corresponding box dimension. Midpoint here is not center-of-mass, e.g., distance between a nozzle point and center-of-mass of an ear-syringe-bulb is clearly larger than its half length; midpoint is a actually a "de-duplicated" center-of-mass. Besides, circular mean also puts most points in the center in case of percolation. Therefore, in part 2, we have following steps:
1. Choose a $r_\text{cut}$ to test whether the aggregate is percolate;
2. If the aggregate is percolate, evaluate the circular mean of all points $r_c$;
3. Set all coordinates $r$ as $r\to pbc(r-r_c)$;
4. If the aggregate is not percolate, midpoint is evaluated by calculating circular mean of coordinates $r$ where $\rho(r)>0$, $\rho(r)$ is calculated using bin size that smaller than $r_\text{cut}$ used in step 1;
5. Same as step 3, update coordinates;
6. After step 5, the aggregates are unwrapped from the box, set $r\to r-\overline{r}$ to set center-of-mass at $(0,0,0)^T$
Circular mean $\alpha$ of samples $\lbrace\alpha_i\rbrace$ is defined as the minimization of summation of a circular metric function $d(\alpha,\beta):=1-\cos(\alpha-\beta)$
$$\alpha=\underset{\beta}{\operatorname{argmin}}\sum_i (1-\cos(\alpha_i-\beta))$$
Adjusting the view vector is simple, evaluate the eigenspace of gyration tensor as $rr^T/n$ and sort the eigenvectors by eigenvalue, i.e., $\lambda_1\ge\lambda_2\ge\lambda_3$, then the minor axis is corresponding eigenvector $v_3$, the aggregate then can be rotated by $[v_1, v_2, v_3]$ so that the minor axis is $z$-axis.

The last step is a bit more tricky, the best trail I attempted was to use SVC, a binary classification method. I used about 20 samples labeled as "desired", these 20 samples were extended to 100 samples by adding some noises to the samples, e.g., moving coordinates a little bit, adding several NPs into the aggregate or removing several NPs randomly, without "breaking" the category of the morphology. Together with 100 "undesired" samples, I trained the SVC with a Gaussian kernel. The result turned out to be pretty good. I also tried to use ANN to classify all 5 categories of morphologies obtained from simulations, but ANN model did not work very well, perhaps the reason was lack of samples or the model I built was too rough. I didn't try other multi-class methods, anyway, that part of work was done, I stopped developing this co-pilot long time ago.

### Eigenvalues of circulant matrices

A circulant matrix is defined as
$$C=\begin{bmatrix}c_{0}&c_{n-1}&\dots &c_{2}&c_{1}\\c_{1}&c_{0}&c_{n-1}&&c_{2}\\\vdots &c_{1}&c_{0}&\ddots &\vdots \\c_{n-2}&&\ddots &\ddots &c_{n-1}\\c_{n-1}&c_{n-2}&\dots &c_{1}&c_{0}\\\end{bmatrix}$$
where $C_{j, k}=c_{j-k \mod n}$, the $k$-th eigenvalue $\lambda_k$ and eigenvector $x_k$ satisfy $C\cdot x_k=\lambda_k x_k$, or, equivalently, $n$ equations as:
$$\sum_{j=0}^{m-1}c_{m-j}x_j+\sum_{j=m}^{n-1}c_{n-j+m}x_j=\lambda_k x_m\quad m=0,1,\dots,n-1$$
with $c_n=c_0$, $x_m$ is the $m$-th compoent of an eigenvector $x_k$. Changing the dummy summing ($j\to m-j$ and $j\to n-j+m$) variables yields
$$\sum_{j=1}^{m}c_j x_{m-j} +\sum_{j=m+1}^{n}c_j x_{n+m-j}=\lambda_k x_m$$
with $m=0,1,2,\dots,n-1$. One can "guess" a solution that $x_j=\omega^j$, therefore the equation above turns into
\begin{align}&\sum_{j=1}^{m}c_j \omega^{m-j} +\sum_{j=m+1}^{n}c_j \omega^{n+m-j}=\lambda_k \omega^m\\ \leftrightarrow &\sum_{j=1}^{m}c_j \omega^{-j} +\omega^{n}\sum_{j=m+1}^{n}c_j \omega^{-j}=\lambda_k\end{align}
Let $\omega$ be one of the $n$-th square root of unity, i.e., $\omega = \exp(-2\pi\mathbb{i}/n)$,  then $\omega^{n}=1$; we have an eigenvalue
$$\lambda = \sum_{j=0}^{n-1}\omega^{-j} c_j$$
with corresponding eigenvector
$$x= (1, \exp(2\pi\mathbb{i}/n), \exp(2\pi\mathbb{i}/n)^2,\dots,\exp(2\pi\mathbb{i}/n)^{n-1})^T$$
The $k$-th eigenvalue is generated from $\omega^{-k} = \exp(2\pi k\mathbb{i}/n)$ with $k\in [0,n-1]$, yields the $k$-th eigenvalue and eigenvector:
$$\lambda_k = \sum_{j=0}^{n-1}c_j \exp(2\pi k\mathbb{i}/n)$$
and
$$x_k = (1, \exp(2\pi k\mathbb{i}/n), \exp(2\pi k\mathbb{i}/n)^2,\dots,\exp(2\pi k\mathbb{i}/n)^{n-1})^T$$
The eigenspace is just the DFT matrix, and ALL circulant matrices share same eigenspace, it is easily to verify that circulant matrices have following properties:
If $A$ and $B$ are circulant matrices, then
1. $AB=BA=W^\ast\Gamma W$ with $W$ is the DFT matrix and $\Gamma=\Gamma_A\Gamma_B$, $\Gamma_i$ represents diagonal matrix consists of eigenvalues of $i$; $\Gamma_A = WAW^\ast$;
2. $B+A=A+B=W^\ast\Omega W$, $\Omega=\Gamma_A+\Gamma_B$;
3. If $\mathrm{det}(A)\ne 0$, then $A^{-1}=W^\ast \Gamma_A^{-1}W$;
The proof is straightforward:
1. $AB=W^\ast \Gamma_AW W^\ast\Gamma_BW=$
$W^\ast\Gamma_A\Gamma_BW=W^\ast \Gamma_B\Gamma_AW=BA$
2. $W(A+B)W^\ast=\Gamma_A+\Gamma_B$;
3. $AW^\ast \Gamma_A^{-1}W=W^\ast \Gamma_AW W^\ast \Gamma_A^{-1}W=\mathbb{I}$

### Toeplitz, Circulant matrix and discrete convolution theorem

A topelitz matrix is a matrix with form of
$$A=\begin{bmatrix} a & b & c & d & e \\ f & a & b & c & d \\ g & f & a & b & c \\ h & g & f & a & b \\ i & h & g & f & a\end{bmatrix}$$
Generally, the elements are denoted as $a_{i-j}$: $A_{ij}=A_{i+1, j+1}=a_{i-j}$. A circulant matrix is a special kind of Toeplitz matrix where each row vector is rotated one element:
$$C=\begin{bmatrix}c_{0}&c_{n-1}&\dots &c_{2}&c_{1}\\c_{1}&c_{0}&c_{n-1}&&c_{2}\\\vdots &c_{1}&c_{0}&\ddots &\vdots \\c_{n-2}&&\ddots &\ddots &c_{n-1}\\c_{n-1}&c_{n-2}&\dots &c_{1}&c_{0}\\\end{bmatrix}$$

Circulant matrices are closely connected to discrete convolution:
$$y=h\ast x=\begin{bmatrix}h_{1}&0&\cdots &0&0\\h_{2}&h_{1}&&\vdots &\vdots \\h_{3}&h_{2}&\cdots &0&0\\\vdots &h_{3}&\cdots &h_{1}&0\\h_{m-1}&\vdots &\ddots &h_{2}&h_{1}\\h_{m}&h_{m-1}&&\vdots &h_{2}\\0&h_{m}&\ddots &h_{m-2}&\vdots \\0&0&\cdots &h_{m-1}&h_{m-2}\\\vdots &\vdots &&h_{m}&h_{m-1}\\0&0&0&\cdots &h_{m}\end{bmatrix} \begin{bmatrix}x_{1}\\x_{2}\\x_{3}\\\vdots \\x_{n}\end{bmatrix}$$
The above matrix is $(m,n)$, the size of full version of the matrix of RHS should be $(m+n-1,m+n-1)$ and the circulant matrix is generated by an $m+n-1$ row vector $(h_1, h_2,...,h_m,0,0,...,0)$, the full version of vector of $x$ of RHS should also be an $m+n-1$ vector $(x_1, x_2,...,x_n,0,...,0)^T$. For $(a \ast v)[n] = \sum_{m = -\infty}^{\infty} a[m] v[n - m]$(from help(np.convolve)). If the original arrays are of size $m, n$, then the output array is $m+n-1$. Here I present a simple proof of discrete convolution theorem using DFT:
$y=h\ast x=\mathrm{IDFT}\{\mathrm{DFT}\{h\}\mathrm{DFT}\{x\}\}$
Proof:
the DFT matrix is:
$$W=\frac {1}{\sqrt {N}}\begin{bmatrix}1&1&1&1&\cdots &1\\1&\omega &\omega ^{2}&\omega ^{3}&\cdots &\omega ^{N-1}\\1&\omega ^{2}&\omega ^{4}&\omega ^{6}&\cdots &\omega ^{2(N-1)}\\1&\omega ^{3}&\omega ^{6}&\omega ^{9}&\cdots &\omega ^{3(N-1)}\\\vdots &\vdots &\vdots &\vdots &&\vdots \\1&\omega ^{N-1}&\omega ^{2(N-1)}&\omega ^{3(N-1)}&\cdots &\omega ^{(N-1)(N-1)}\\\end{bmatrix}$$
With $W_{ij}=\omega^{i\times j}$ and $\omega=\exp(\frac{2\pi \mathbb{i}}{N})$. Assume $\hat{h}=\mathrm{DFT}\{h\}=W\cdot h$, and assume $\hat{c}_j=\mathrm{DFT}\{h\}_j\mathrm{DFT}\{x\}_j$, we have:
$$\hat{c}_j=\sum_{k_1=0,k_2=0}^{N-1}\omega^{j\times(k_1+k_2)}h_{k_1}x_{k_2}$$
The IDFT matrix is the complex conjugate of $W$, therefore, $W_{ij}^\ast=\omega^{-i\times j}$; therefore $y_i=W_{ij}^\ast \hat{c}_{j}=\sum_{k_0=0,k_1=0,j=0}^{N-1}\omega^{j\times(k_1+k_2-i)}h_{k_1}x_{k_2}$. With $\omega$ is the $N$-th roots of unity. Summing over $j$ first, we thus obtain
$$$$\sum_{j=0}^{N-1}\omega^{(k_1+k_2-i)\times j}=\begin{cases}N\quad &k_1+k_2=i\\ \frac{1-\omega^{(k_1+k_2-i)\times N}}{1-\omega^{k_1+k_2-i}}=0 &k_1+k_2\ne i\end{cases}$$$$
Which gives $y[i]=\sum_{k_1+k_2=i}a_{k_1}b_{k_2}=\sum_j a[j]b[i-j]$ being the discrete convolution. $Q.E.D.$
Therefore, the matrix multiplication can also be accelerated by FFT method if circulant matrices are involved. The extra $1/\sqrt{N}$ comes from IDFT.
In[1]: a = np.random.random((5,))
In[2]: b = np.random.random((5,))
In[3]: np.allclose(np.fft.ifft(np.fft.fft(a,n=9, norm='ortho')*np.fft.fft(b, n=9,norm='ortho'),
....:     n=9,norm='ortho').real*9**0.5,np.convolve(a,b,'full'))
Out[3]: True
The $W^\ast \cdot W=\sum_{k=0}^{N-1}\omega^{k(i-j)}=\delta_{ij}=\mathbb{I}$ where $W^\ast$ represents the inverse DFT operator, is the complex conjugation transpose of $W$. Appling twice of FT on a function $f(x)$ yields $f(-x)$: $W^2_{ij}=\sum_{k=0}^{N-1}\omega^{k(i+j)}=\delta_{i+j\equiv0 \bmod N},i,j\in[0,N-1]$, which means $\mathrm{FT}\{\mathrm{FT}\{f(x)\}\}=f(-x)$, e.g., for $5\times5$ sized DFT matrix:
$$W^2=\begin{bmatrix} 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 1 & 0 & 0\\ 0 & 1 & 0 & 0 & 0\\ \end{bmatrix}$$
Clearly, $W^2\cdot (x_0, x_1, x_2, x_3, x_4)^T=(x_0,x_4,x_3,x_2,x_1)^T$. In fact, it is easily to verify that the eigenvectors of a circulant matrix are Fourier modes: $v_j=1/\sqrt{N}(1, \omega_j, \omega_j^2,...,\omega_j^{N-1})^T$ for $\omega_j = \exp(2 \pi j \mathbb{i}/n)$, and corresponding eigenvalue $\lambda_j=c_0+c_{N-1}\omega_j+c_{N-2}\omega_j^2+...+c_1\omega_j^{N-1}$, with $C\cdot v_j=\lambda_j v_j$, the 'circulant' property comes from that $\omega$ is generator of a cyclic group.

### Distribution of segments on Gaussian chain

The probability distribution function of $P_i(\mathbf{r}_i-\mathbf{r}_\mathrm{cm})$ of ideal chain is evaluated, for $P_i$ represents the probability distribution fucntion of $i$th segment with respect to its centre of mass on the ideal chain. An ideal chain is modeled as multidimensional random walk, where the steps are independent, and the distribution of a step at mean length $b$ is given by $P(\mathbf{r})\sim\mathcal{N}(0,b^2)$. Let $\mathbf{b}_i=\mathbf{r}_{i+1}-\mathbf{r}_{i}$ be the $i$th bond vector, then we have

$$\mathbf{r}_i=\sum_{j=1}^{i-1} \mathbf{b}_j$$

and the centre of mass $\mathbf{r}_\mathrm{cm}=\frac{1}{N}\sum_i \mathbf{r}_i$ is

$$\mathbf{r}_\mathrm{cm}=\frac{1}{N}\sum_{j=1}^{N-1}(N-j)\mathbf{b}_j$$

then

$$\mathbf{r}_i-\mathbf{r}_\mathrm{cm}=\sum_{j=1}^{i-1}\frac{j}{N}\mathbf{b}_j+\sum_{j=i}^{N-1}\frac{N-j}{N}\mathbf{b}_j$$

If variable $X$ is a Gaussian random variable with scale of $\sigma^2$, then $aX$ is a Gaussian random variable with scale of $a^2\sigma^2$, we can then write the characteristic function for $P_i(\mathbf{r}_i-\mathbf{r}_\mathrm{cm})$ of a $d$ dimensional ideal chain:

$$\phi_i(\mathbf{q})=\Pi_{j} \phi_{\mathbf{b}_j'}(\mathbf{q})=\exp\left(-\frac{1}{2}\mathbf{q}^T\left(\sum_{j=1}^{N-1}\Sigma_j\right)\mathbf{q}\right)$$

with $\mathbf{b}'_j=\frac{j}{N}\mathbf{b}_j$ for $j\le i-1$ and $\frac{N-j}{N}\mathbf{b}_j$ for $i\le j \le N-1$; $\phi_{\mathbf{b}_j'}=-\exp(-0.5\mathbf{q}^T\Sigma\mathbf{q})$ is the characteristic function of the probability distribution fucntion of bond $j$. $\Sigma_j=\frac{j^2 b^2}{d N^2}\mathbb{I}_d$ for $j\le i-1$ and $\Sigma_j=\frac{(N-j)^2 b^2}{d N^2}\mathbb{I}_d$ for $i\le j \le N-1$, and $\mathbb{I}_d$ is the $d$-dimensional identical matrix, and therefore: (For convenience, I will set $b=1$ in the later calculations.)

$$\phi_i(\mathbf{q})=$$ $$\exp \left(-\frac{\left(6 i^2-6 i (N+1)+2 N^2+3 N+1\right) \left(q_x^2+q_y^2+q_z^2\right)}{36 N}\right)$$

The corresponding distribution of this characteristic function is still a Gaussian distribution with $\Sigma=\frac{b^2}{3} \mathbb{I}_3$, where the equivalent bond length $b^2=\frac{\left(6 i^2-6 i (N+1)+2 N^2+3 N+1\right)}{6 N}$. The 6th moment is $\frac{1}{N}\sum_{i=1}^N \langle(\mathbf{r}_i-\mathbf{r}_\mathrm{cm})^6\rangle=\frac{58 N^6-273 N^4+462 N^2-247}{1944 N^3}$. At large $N$, only leading term matters, which is $\frac{29}{972} N^3$. For $N=20$, the result is $235.886$, which is in accord with the simulation. Another example is for $N=5$, the $R_g^2$ is $0.8$ rather than $5/6=0.8\dot{3}$ in this case.

Here is the simulation code:
ch = np.random.normal(size=(100000,20,3),scale=1/3**0.5)
ch[:,0,:]=0
ch = ch.cumsum(axis=1)
ch -= ch.mean(axis=1,keepdims=1)
m6 = np.mean(np.linalg.norm(ch,axis=-1)**6)


### Simple Examples of Parallel Computing of Cython

It is more convenient to calculate some properties during a molecular simulation process by accessing data from API of the molecular simulation program than calculating after the whole simulation progress and dumping all the coordinates. Especially, for sharply fluctuated properties such as Virial, RDF, etc., require massive frames to calculate. It is expensive to dump densely. For some Python-friendly molecular simulation softwares, e.g., lammps, hoomd-blue, galamost, etc.; it is easily to embed customized Python functions into the simulation control scripts. However, Python has a poor performance in massive calculations, numba.cuda.jit does dramatically boost up the program, however, due to unknown reasons, GPU-accelerated molecular simulation softwares like hoomd-blue and galamost produce errors during simulations if one uses numba.cuda.jit compiled functions. Therefore, I try to use Cython to generate C-functions to accelerate the calculations. Here I put a simple example of a pairwise property calculation function (RDF) using Cython-parallel method, it's also a learning note of mine:

# cython: language_level=3, boundscheck=False, wraparound=False, cdivision=True
import numpy as np
cimport numpy as np
import cython
from cython.parallel import prange, parallel
cimport openmp
from libc.math cimport floor,sqrt,pow
from libc.stdlib cimport malloc, free
import multiprocessing

cdef long * unravel_index_f(long i, long[:] dim) nogil:
cdef long k, d
d = dim.shape[0]
cdef long * ret = <long *> malloc(d * sizeof(long))
for k in range(d):
ret[k] = i % dim[k]
i = (i - ret[k]) / dim[k]
return ret

cdef long ravel_index_f(long * vec, long[:] dim) nogil:
cdef long ret, d, tmp, k
d = dim.shape[0]
ret = (vec[0] + dim[0]) % dim[0]
tmp = dim[0]
for k in range(1,d):
ret += ((vec[k] + dim[k]) % dim[k]) * tmp
tmp *= dim[k]
return ret

cdef long jth_neighbour(long * veci, long * vecj, long[:] dim) nogil:
cdef long ret, d, tmp, k
d = dim.shape[0]
ret = (veci[0] + vecj[0] - 1 + dim[0]) % dim[0]
# re-ravel tmpi + tmpj - 1 to cell_j
# -1 for indices from -1, -1, -1 to 1, 1, 1 rather than 0,0,0 to 2,2,2
tmp = dim[0]
for k in range(1, d):
ret += ((veci[k] + vecj[k] - 1 + dim[k]) % dim[k]) * tmp
tmp *= dim[k]
return ret

cdef double pbc_dist(double[:] x, double[:] y, double[:] b) nogil:
cdef long i, d
cdef double tmp=0, r=0
d = b.shape[0]
for i in range(d):
tmp = x[i]-y[i]
tmp = tmp - b[i] * floor(tmp/b[i]+0.5)
r = r + pow(tmp, 2)
return sqrt(r)

cdef long cell_id(double[:] p, double[:] box, long[:] ibox) nogil:
# In the Fortran way
cdef long ret, tmp, i, n
n = p.shape[0]
ret = <long> floor((p[0] / box[0] + 0.5) * ibox[0])
tmp = ibox[0]
for i in range(1, n):
ret = ret + tmp * <long> floor((p[i] / box[i] + 0.5) * ibox[i])
tmp = tmp * ibox[i]
return ret

cdef void linked_cl(double[:, :] pos, double[:] box, long[:] ibox, long[:] head, long[:] body) nogil:
cdef long i, n, ic
n = pos.shape[0]
for i in range(n):
ic = cell_id(pos[i], box, ibox)

def rdf(double[:,:] x, double[:,:] y, double[:] box, double bs, long nbins):
cdef long i, j, k, l, n, m, d3, d, ic, jc
cdef np.ndarray[np.double_t, ndim=2] ret
cdef long[:] head, body, ibox, dim
cdef double r, r_cut
cdef long ** j_vecs
cdef long * veci
r_cut = nbins * bs
n = x.shape[0]
d = x.shape[1]
m = y.shape[0]
d3 = 3 ** d
ibox = np.zeros((d,), dtype=np.int64)
for i in range(d):
ibox[i] = <long> floor(box[i] / r_cut + 0.5)
head = np.zeros(np.multiply.reduce(ibox), dtype=np.int64) - 1
body = np.zeros((m,), dtype=np.int64) - 1
ret = np.zeros((num_threads, nbins), dtype=np.float)
dim = np.zeros((d,), dtype=np.int64) + 3
j_vecs = <long **> malloc(sizeof(long *) * d3)
for i in range(d3):
j_vecs[i] = unravel_index_f(i, dim)
for i in prange(n, schedule='dynamic'):
ic = cell_id(x[i], box, ibox)
veci = unravel_index_f(ic, ibox)
for j in range(d3):
jc = jth_neighbour(veci, j_vecs[j], ibox)
while k != -1:
r = pbc_dist(x[i], y[k], box)
if r < r_cut:
l = <long> (r/bs)
k = body[k]
free(veci)  # arrays created by malloc should be freed. (in unravel_index_f)
free(j_vecs)   # without calling free(), memory leak would happen.
return np.sum(ret, axis=0)


Notes:

1. def means Python function and cdef means C functions;
2. Arrays declared by double[:] etc. are memoryslice and can be conveniently initialized by np.zeros or reading an np.ndarray as function parameter;
3. Without GIL, any calling of Python function is forbidden, the array must be initialized and declared in the C method;
4. In prange loop, variables are thread-local (see the generated .c file, in section #pragma omp parallel), there is no explicit way to call an atomic operation in Cython (using with gil results in very low efficiency), hence the result is generated as ret = np.zeros((n, nbins), ... so that in each thread i that ret[i] are independent, or generate ret as (num_threads, data structure...), just keep independent amongst threads and in ith loop, call openmp.omp_get_thread_num() to get the current thread's id;
5. <long> is a type casting.

This RDF function uses a linked cell list algorithm to reduce the calculation from $O(n^2)$ to $O(n)$, a small ranged RDF ($r<r_\mathrm{cut}$) is calculated, the RDF on $r\ge r_\mathrm{cut}$ can be calculated using an FFT algorithm by setting $r_\mathrm{bin}<\frac{r_\mathrm{cut}}{2}$

The second example is calculation of histogram N-d array by modulus of indices: $\int f(\mathbf{r})\delta(|\mathbf{r}|-r)\mathrm{d}\mathbf{r}$:

# cython: language_level=3, boundscheck=False, wraparound=False, cdivision=True
import numpy as np
cimport numpy as np
import cython
from cython.parallel import prange, parallel
cimport openmp
from libc.math cimport floor,sqrt,pow
from libc.stdlib cimport malloc, free
import multiprocessing

cdef double unravel_index_f_r(long i, long[:] dim) nogil:
# unravel in Fortran way
cdef long k, d, tmp
cdef double r
d = dim.shape[0]
for k in range(d):
tmp = i % dim[k]
r += <double> (tmp)**2
i = (i - tmp) / dim[k]
return sqrt(r)

def hist_to_r(double[:] x, long[:] shape, double dr, double bs, double rc):
cdef long i, n, j, n_bins
cdef np.ndarray[np.double_t, ndim=2] ret, count
n_bins = <long> (rc / bs)
ret = np.zeros((num_threads, n_bins), dtype=np.float64)
count = np.zeros((num_threads, n_bins), dtype=np.float64)
n = x.shape[0]
for i in prange(n, schedule='dynamic'):
j = <long> (unravel_index_f_r(i, shape)*dr/bs)
if j < n_bins:
ret[thread_num, j] += x[i]
count[thread_num, j] += 1
return np.sum(ret,axis=0), np.sum(count,axis=0)


In function unravel_index_f_r, the index is unraveled in the Fortran way, therefore, calling hist_to_r requires x.ravel(order='F').

### Anisotropy of ideal chain

The Gaussian chain is isotropic when averaged over conformation space and all orientations. Therefore, a Gaussian chain is dealt as sphere with radius of $R_g$, it’s radius of gyration. In the 2nd chapter of Rubinstein’s Polymer Physics, an exercise shows that the $R_g^2$ is asymmetric if one set the coordinate frame on its end-to-end vector, $\mathbf{R}_{ee}$. i.e., the $\mathbf{R}_{ee}$ vector is set as the $x$-axis therefore $\mathbf{R}_{ee}=(R,0,0)^T$. Then the 3 components of $\mathbf{R}_{ee}$ are $\frac{Nb^2}{36}$, $\frac{Nb^2}{36}$ on $y$, $z$ direction and $\frac{Nb^2}{36}+\frac{R^2}{12}$ on $x$ direction. Here I make a very simple proof.

Consider a Gaussian chain is fixed between $(0,0,0)^T$ and $\mathbf{R}_{ee}$. It’s actually a Brownian bridge, and the distribution is a multivariate Gaussian with mean at $\frac{i}{N}\mathbf{R}$ with variance $\frac{i(N-i)}{N}b^2$, the proof is simple:

$$P_{0\mathbf{R}}(\mathbf{r},n)=\frac{G(\mathbf{r},0,n)G(\mathbf{R},\mathbf{r},N-n)}{G(0,\mathbf{R},N)}$$

$G(a,b,n)$ represents distribution of a Gaussian chain ends at $a,b$ with segment length $n$. The meaning is straightforward: it’s the probability of a length $n$ Gaussian chain start from $0$ and end at $\mathbf{r}$ connected with another $N-n$ Gaussian chain which started at $\mathbf{R}$ and stopped at $\mathbf{r}$, and the whole chain is an Gaussian chain with length $N$ and $\mathbf{R}_{ee}=\mathbf{R}$. It is easily to show the distribution of such chain

$$P_{0\mathbf{R}}(\mathbf{r},n)=G\left(\mathbf{r}-\frac{n}{N}\mathbf{R}, 0,\frac{n(N-n)}{N}\right)$$

is equivalent to a Gaussian chain segment ends with $\mathbf{r}$ and $\frac{n}{N}\mathbf{R}$ with equivalent length $\frac{n(N-n)}{N}$. The $R_g^2$ is then

\begin{align}R_g^2=&\frac{1}{2N^2}\int_0^N\langle(\mathbf{r}_i-\mathbf{r_j})^2\rangle\mathrm{d}i\mathrm{d}j\\ =&\frac{1}{N^2}\int_0^N\int_j^N \frac{(i-j)^2 R^2}{N^2}+\frac{(i-j)(N-(i-j))}{N}b^2\mathrm{d}i\mathrm{d}j\\ =&\frac{R^2+nb^2}{12}\end{align}

In this equivalent method, $\langle (\mathbf{r}_i-\mathbf{r}_j)^2\rangle$ is considered as $(\frac{i}{N}-\frac{j}{N})^2R^2+\frac{|i-j|(N-|i-j|)}{N}$ from the equivalent Gaussian chain. This is because despite the chain is ‘fixed’, it is still a Gaussian chain, which means translation invariance, $\langle (\mathbf{r}_i-\mathbf{r}_j)^2\rangle$ depends only on $|i-j|$, therefore, $P_{0\mathbf{R}}(\mathbf{r},n)$ gives the probability of segment $\mathbf{r}_n-\mathbf{r}_0$, which is any $n$-segment on the Gaussian chain. I tried calculating $P(\mathbf{r}_i-\mathbf{r}_j)$ from the convolution of $P_{0\mathbf{R}}$. This is incorrect because $\mathbf{r}_i$, $\mathbf{r}_j$ are correlated, convolution of $P_{0\mathbf{R}}$ results in dependence of $i$ and $j$ in $\langle (\mathbf{r}_i-\mathbf{r}_j)^2\rangle$, violates the translation invariance of the chain.

Now if $R^2=Nb^2$, we see that $R_g^2=\frac{1}{6}Nb^2$; and we have $\frac{Nb^2}{36}+\frac{R^2_{x,y,z}}{12}=\frac{Nb^2+R^2}{36}$ for each dimension, especially, if $\mathbf{R}_{ee}=(R,0,0)^T$, we have $\frac{Nb^2}{36}$ in $y$ and $z$ direction and $\frac{Nb^2}{36}+\frac{R^2}{12}$ in $x$-direction.

Simple simulation code:

rg2_ree = rg2 = 0
for _ in  range(5000):
ch = np.random.normal(size=(1000,3))
ch = np.append(np.zeros((1,3)),np.cumsum(ch, axis=0),axis=0)
ch = ch - ch.mean(axis=0)
ree = ch[-1]-ch[0]
ree = ree/np.linalg.norm(ree)
rg2_ree += np.sum(ch.dot(ree)**2)
rg2 += np.sum(ch**2)
rg2_ree/rg2
# Result is 0.666...


### Flory characteristic ratio of hindered rotation chain

This is an exercise on Rubinstein's textbook of Polymer Physics.

1. Definition of Flory characteristic ratio:
$$C_\infty:=\lim_{n\to\infty}\frac{\langle \mathbf{R}^2 \rangle}{nb^2}$$
where $\langle \mathbf{R}^2 \rangle$ is the mean square end-to-end vector, $b$ is Kuhn length of the polymer and $n$ is corresponding chain length.
2. Calculation of $\langle \mathbf{R}^2\rangle$:
Let $\mathbf{r}_i$ be the bond vector between $i$ and $(i-1)$th particle, and $\mathbf{r}_1=\mathbf{R}_1-\mathbf{R}_0\equiv0$; for an $n$-bead chain,:
\begin{align}\langle \mathbf{R}^2 \rangle&=\left\langle\left(\sum_{i=1}^n \mathbf{r}_i^2\right)\cdot\left(\sum_{j=1}^n \mathbf{r}_j^2\right)\right\rangle\\ &=\sum_i\sum_j\langle \mathbf{r}_i\cdot\mathbf{r}_j\rangle\end{align}
3. Calculation of $\langle\mathbf{r}_i\cdot\mathbf{r}_j\rangle$:
Now consider a local coordinate frame that $\mathbf{r}_i=(b,0,0)^T$, then $\mathbf{r}_{i+1}$ could be considered as rotating $\mathbf{r}_i$ $\theta$ by $z$-axis for bond angle and $\phi_{i}$ by $x$-axis for torsion. The rotation matrix is thus
$$\mathbf{T}_i:=\left( \begin{array}{ccc} \cos (\theta ) & -\sin (\theta ) & 0 \\ \sin (\theta ) \cos (\phi_i) & \cos (\theta ) \cos (\phi_i) & -\sin (\phi_i) \\ \sin (\theta ) \sin (\phi_i ) & \cos (\theta ) \sin (\phi_i ) & \cos (\phi_i ) \\ \end{array}\right)$$
or $\pi-\theta$, $\pi - \phi$ according to the definitions of bond angle $\theta$ and torsion angle $\phi$. In every reference coordinate frame $R_i$, the bond vector is $(b,0,0)^T$, and in $R_i$, $\mathbf{r}_{i+1}=\mathbf{T}_i\cdot\mathbf{r}_i=(b \cos (\theta ),b \sin (\theta ) \cos (\phi_i ),b \sin (\theta ) \sin (\phi_i ))^T$. Now consider a simple case of $\mathbf{r}_i\cdot\mathbf{r}_{i+2}$, the $\mathbf{r}_{i+2}$ is calculated in the $(i+1)$th reference frame $R_{i+1}$ where $\mathbf{r}_{i+1}$ is $(b,0,0)^T$ and $\mathbf{r}_{i+2}=\mathbf{T}_{i+1}\cdot\mathbf{r}_{i+1}$, here we must 'rotate' one of the vector into the other frame to do the inner product, i.e. rotate $\mathbf{r}_i=(b,0,0)^T$ from $R_i$ to $R_{i+1}$ or vice versa. Now note that $\mathbf{r}_{i+1}$ in the $R_i$ frame is $(b \cos (\theta ),b \sin (\theta ) \cos (\phi_i ),b \sin (\theta ) \sin (\phi_i ))^T$ and $(b,0,0)^T$ in the $R_{i+1}$ frame, therefore if we want to transform some vector in $R_i$ into $R_{i+1}$, we need a transform matrix to transform the $(b \cos (\theta ),b \sin (\theta ) \cos (\phi_i ),b \sin (\theta ) \sin (\phi_i ))^T$ into $(b,0,0)^T$ which is $\mathbf{T}_i^{-1}$, or $\mathbf{T}_i^T$, because in frame $R_i$, we obtain $\mathbf{r}_{i+1}$ by $\mathbf{T}_i\cdot(b,0,0)^T$ and $\mathbf{T}_i$ is a rotation matrix, which is orthonormal. By transform $\mathbf{r}_i$ into $R_{i+1}$, the inner product $\mathbf{r}_i\cdot\mathbf{r}_{i+2}$ is
$$\langle\mathbf{T}_i^T\mathbf{r}_i,\mathbf{T}_{i+1}\mathbf{r}_{i+1}\rangle=$$ $$\langle\mathbf{r}_i,\mathbf{T}_i\mathbf{T}_{i+1}\mathbf{r}_{i+1}\rangle=(b,0,0)\mathbf{T}_i\mathbf{T}_{i+1}(b,0,0)^T$$
By induction, $\mathbf{r}_i\cdot\mathbf{r}_j=(b,0,0)\mathbf{T}_i\mathbf{T}_{i+1}\mathbf{T}_{i+2}\cdots\mathbf{T}_{j-1}(b,0,0)^T$, since $\theta$ for every bonds are same and $\phi_{1,2,\dots,n}$ are independently distributed, let $\mathbf{T}$ be the average transform matrix of ${\mathbf{T}_i}$:
\begin{align}\mathbf{T}&:=\frac{\int_{-\pi}^{\pi}\mathbf{T}_i(\phi)\exp{(-U(\phi)/k_BT)}\mathrm{d}\phi}{\int_{-\pi}^{\pi}\exp{(-U(\phi)/k_BT)}\mathrm{d}\phi}\\ &=\left( \begin{array}{ccc} \cos (\theta ) & -\sin (\theta ) & 0 \\ \sin (\theta ) \langle\cos (\phi) \rangle & \cos (\theta )\langle \cos (\phi)\rangle &0 \\ 0& 0& \langle\cos (\phi)\rangle \\ \end{array}\right)\end{align}
We therefore have $\langle\mathbf{r}_i\cdot\mathbf{r}_j\rangle=(b,0,0)\mathbf{T}^{|j-i|}(b,0,0^T)=b^2(\mathbf{T}^{|j-i|})_{11}$. Here every $\sin(\phi)$ term vanishes in the integral because $U$ is even.
4. Now the Flory characteristic ratio
\begin{align}C_\infty&=\lim_{n\to\infty}\frac{1}{nb^2}\sum_i\sum_j \mathbf{r}_i\cdot\mathbf{r}_j\\ &=\lim_{n\to\infty}\frac{1}{n}(\sum_i\sum_j T^{|j-i|})_{11}\\ &=\lim_{n\to\infty}\left(-\frac{1}{n}\frac{-2 \mathbf{T}^{n+1}+\mathbf{T}^2 n+2 \mathbf{T}-n\mathbf{I}}{(\mathbf{I}-\mathbf{T})^2}\right)_{11}\\ &=\left(\frac{\mathbf{I}+\mathbf{T}}{\mathbf{I}-\mathbf{T}}\right)_{11}\\ &=\frac{1+\cos(\theta)}{1-\cos(\theta)}\frac{1-\langle\cos(\phi)\rangle}{1+\langle\cos(\phi)\rangle}\end{align}
Here we have $\frac{\bullet}{n}=0$ and $\mathbf{T}^n=0$ at $n\to\infty$, because $\mathbf{T}$ is constant and correlation between 2 beads goes to $0$ as distance goes to infinity.

### RTFM, pls RTFM!!!

I was trying to write a parallel version C-extension with Cython several days ago, to calculate Coulomb energy of $n$ points. I wrote the code as follows:
@cython.cdivision(True)
@cython.boundscheck(False)
@cython.wraparound(False)
def u(double[:,:] x):
cdef long i
cdef long j,k
cdef double ret=0,tmp
with nogil, parallel():
for i in prange(x.shape[0]-1, schedule='static'):
for j in range(i + 1, x.shape[0]):
tmp = 0
for k in range(x.shape[1]):
tmp = tmp + pow(x[i,k]-x[j,k],2)
ret = ret + 1 / sqrt(tmp)
return ret


### Intersects of 2 functions

Assuming function $f$ and $g$, scipy.optimize.fsolve would give an elegant solution:

from scipy.optimize import fsolve
def f(xy):
x, y = xy
return (y - x ** 2, y - np.sin(x))
fsolve(f, [1.0, 1.0])


gives array([0.87672622, 0.76864886]) is the intersect of $y=x^2$ and $y=\sin(x)$.

### Intersects of 2 arrays

There is a function called numpy.intersect1d that returns the sorted, uniq values in both arrays, and reduce(np.intersect1d, list_of_arrays) gives intersect of all arrays in the list. Index can also be returned by return_indices=True option.

If the 2 arrays are data values, using np.flatnonzero(np.abs(arr1-arr2))<tol) seems to be a good solution; this method returns the indices of values in arr1 and arr2 which are closed by tol.

Another method is based on idea that there is a change in sign of $\lim_{x\to x_0^+}f(x_0)-g(x_0)$ and $\lim_{x\to x_0^-}f(x_0)-g(x_0)$, therefore, if $\mathrm{d}x$ is small enough, this method would give a very good estimation of the intersects.

def intersects(x, arr1, arr2):
return x[np.flatnonzero(np.diff(np.sign(arr1 - arr2))]


Singular Plural
Nom. -a -ae
Gen. -ae -ārum
Dat. -ae -īs
Acc. -am -ās
Abl. -īs
Voc. -a -ae
Back2Top ^