Posts

Showing posts from 2019

Di-block copolymer analysis: GPC

Gel Permeation Chromatography (GPC) is a well-known method for measuring the molecular weight distribution of polymers. For di-block copolymers, a proper calibration curve can be constructed using the Mark-Houwink parameters of the homopolymers (see here). Usually, a di-block copolymer is obtained by synthesizing one block first and then "growing" the other block from the end of the preceding block. The molecular weight distributions of the final di-block copolymer and of the preceding block are accessible by GPC; the molecular weight distribution of the 2nd block, however, is hard to obtain. Here I present a simple method to calculate the molecular weight distribution of the 2nd block from the GPC data of the preceding block and the final di-block copolymer. Assuming that we already have the molecular weight distribution of the preceding block, $P_1(n)$, in general the length of the 2nd block grown from the end depends on the preceding chain length $n$; therefore, the two block lengths should be described by a joint pdf …
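Before going to the joint pdf, a crude baseline (my sketch, not the post's method): if one ignored that dependence and assumed the two block lengths were independent, the total-length distribution would be the discrete convolution $P=P_1\ast P_2$, and $P_2$ could be recovered by FFT deconvolution:

    import numpy as np

    def second_block_pdf(p_total, p_first, eps=1e-12):
        # under the (crude) independence assumption, P_total = P_1 * P_2 as a
        # discrete convolution, so P_2 = IFFT(FFT(P_total) / FFT(P_1));
        # eps regularizes near-zero denominators
        n = len(p_total)
        f2 = np.fft.rfft(p_total, n) / (np.fft.rfft(p_first, n) + eps)
        p2 = np.clip(np.fft.irfft(f2, n), 0.0, None)
        return p2 / p2.sum()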

A note on Cython Parallelism

Instead of using thread-local ndarrays, a very convenient method is to create an array shaped (num_of_threads, <whatever the structure is>) so that each thread only writes to its own row, which makes the updates thread-safe. For example:

    cdef np.ndarray[np.double_t, ndim=2] ret
    num_threads = multiprocessing.cpu_count()
    ret = np.zeros((num_threads, nbins), dtype=np.float)

and in the prange loop:

    thread_num = openmp.omp_get_thread_num()
    ret[thread_num, l] += 1
    ...

Afterwards, ret.sum(axis=0) is returned. The loop is set to use num_threads threads: with nogil, parallel(num_threads=num_threads):. By default I set prange with the schedule='dynamic' option. On my laptop (Arch Linux, openmp=201511, gcc=9.2.0) I found an interesting output: the summation of ret is exactly num_threads times the desired result, i.e. ret.sum(axis=0)[0] should be 200, but if 4 threads are used in the parallel computation, the output becomes 800. However, this does not happen on my server (CentOS 7.6) with gcc=4.2.0 and openmp=201…
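Assembled into one piece, the pattern looks roughly like this (a sketch: the histogram body and names such as nbins are illustrative, not the post's original file; compile with OpenMP, e.g. extra_compile_args=['-fopenmp']):

    import multiprocessing
    import numpy as np
    cimport numpy as np
    cimport cython
    cimport openmp
    from cython.parallel import prange, parallel

    @cython.boundscheck(False)
    @cython.wraparound(False)
    def histogram(double[:] x, double xmin, double xmax, int nbins):
        cdef int num_threads = multiprocessing.cpu_count()
        cdef np.ndarray[np.double_t, ndim=2] ret = np.zeros((num_threads, nbins))
        cdef double dx = (xmax - xmin) / nbins
        cdef Py_ssize_t i
        cdef int l, thread_num
        with nogil, parallel(num_threads=num_threads):
            for i in prange(x.shape[0], schedule='dynamic'):
                l = <int>((x[i] - xmin) / dx)
                if 0 <= l < nbins:
                    thread_num = openmp.omp_get_thread_num()
                    ret[thread_num, l] += 1.0   # each thread writes only its own row
        return ret.sum(axis=0)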

Notes on scipy.optimize.minimize

Problem: evenly distributed points on ellipsoid surfaces. Solution: numerically minimize the energy of charges constrained on the ellipsoid surface. Assuming the ellipsoid satisfies the equation $$\left(\frac{x}{a}\right)^2+\left(\frac{y}{b}\right)^2+\left(\frac{z}{c}\right)^2=1$$ the three semi-axes of the ellipsoid are in the ratio $(a,b,c)^T$. The minimization process is: let $$\mathbf{x}^\prime:=\frac{1}{\sqrt{\left(\frac{x}{a}\right)^2+\left(\frac{y}{b}\right)^2+\left(\frac{z}{c}\right)^2}}(x, y, z)^T$$ then $\mathbf{x}^\prime$ satisfies the ellipsoid equation; minimize the energy function $$u=\sum_i \sum_{j>i} \frac{1}{\sqrt{(x^\prime_i-x^\prime_j)^2+(y^\prime_i-y^\prime_j)^2+(z^\prime_i-z^\prime_j)^2}}$$ The gradient is, e.g. for the $x$ component of the $i$th particle, $$\partial u/\partial x_i=\sum_{j\ne i} - ((x^{\prime 2}_i/a^2-1)(x^\prime_j-x^\prime_i) + (x^{\prime}_i y^{\prime}_i(y^{\prime}_j-y^{\prime}_i)+x^{\prime}_i z^{\prime}_i (z^{\prime}_j-z^{\prime}_i))/a^2)/d^3$$ where $d$ is the distance between $\mathbf{x}^\prime_i$ and $\mathbf{x}^\prime_j$ …
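A minimal sketch of this procedure with scipy.optimize.minimize (my illustration; it uses the default finite-difference gradient rather than the analytic one above, and n and the axes (a, b, c) are example values):

    import numpy as np
    from scipy.optimize import minimize

    def project(p, abc):
        # rescale each row of p onto the ellipsoid (x/a)^2+(y/b)^2+(z/c)^2=1
        return p / np.sqrt(((p / abc) ** 2).sum(axis=1, keepdims=True))

    def energy(flat, abc, n):
        p = project(flat.reshape(n, 3), abc)
        d = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)
        return (1.0 / d[np.triu_indices(n, k=1)]).sum()   # sum_i sum_{j>i} 1/d_ij

    n, abc = 32, np.array([1.0, 1.0, 2.0])
    rng = np.random.default_rng(0)
    res = minimize(energy, rng.standard_normal(3 * n), args=(abc, n))
    points = project(res.x.reshape(n, 3), abc)    # the distributed points

Passing the analytic gradient via the jac= argument would speed this up considerably for large n.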

Dealing with aggregates: I don't like tedious work

I used to study nanoparticle (NP) self-assembly in polymer matrices before 2017 by coarse-grained simulation methods. Analyzing the morphologies of self-assemblies is tedious work: one needs to repeatedly watch frames from simulation trajectories, find appropriate viewpoints, count the number of aggregates, measure the size of aggregates, etc. Obtaining intact aggregates under periodic boundary conditions is also challenging. In addition, one usually needs to run plenty of simulations under different conditions to find a desired morphology; selecting desired morphologies automatically is even more challenging: most of the morphologies are ill-defined, so common characterization methods are of little help, e.g., the $g(r)$ or $S(q)$ of the NPs shows little difference amongst percolated morphologies. Based on such demands, I designed a co-pilot which can:

- automatically cluster the particles into aggregates;
- remove periodic boundary conditions and put the center of mass at $(0,0,0)^T$ (one trick for this is sketched below);
- adjust …
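For the center-of-mass step, one standard trick (my illustration, not necessarily the co-pilot's actual implementation) is the circular-mean method of Bai & Breen (2008): map each periodic coordinate onto a circle, average there, and map back:

    import numpy as np

    def com_pbc(x, box):
        # x: (n_particles, 3) wrapped coordinates, box: (3,) box lengths
        theta = 2.0 * np.pi * x / box
        xi = np.cos(theta).mean(axis=0)
        zeta = np.sin(theta).mean(axis=0)
        # the circular mean is a PBC-consistent center of mass
        return box * (np.arctan2(-zeta, -xi) + np.pi) / (2.0 * np.pi)

Shifting all coordinates by -com_pbc(x, box) and re-wrapping then places the aggregate's center of mass near the origin.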

Eigenvalues of circulant matrices

A circulant matrix is defined as $$C=\begin{bmatrix}c_{0}&c_{n-1}&\dots &c_{2}&c_{1}\\c_{1}&c_{0}&c_{n-1}&&c_{2}\\\vdots &c_{1}&c_{0}&\ddots &\vdots \\c_{n-2}&&\ddots &\ddots &c_{n-1}\\c_{n-1}&c_{n-2}&\dots &c_{1}&c_{0}\\\end{bmatrix}$$ where $C_{j,k}=c_{(j-k) \bmod n}$. The $k$-th eigenvalue $\lambda_k$ and eigenvector $x_k$ satisfy $C\cdot x_k=\lambda_k x_k$, or, equivalently, the $n$ equations $$\sum_{j=0}^{m-1}c_{m-j}x_j+\sum_{j=m}^{n-1}c_{n-j+m}x_j=\lambda_k x_m\quad m=0,1,\dots,n-1$$ with $c_n=c_0$; here $x_m$ is the $m$-th component of the eigenvector $x_k$. Changing the dummy summation variables ($j\to m-j$ and $j\to n-j+m$) yields $$\sum_{j=1}^{m}c_j x_{m-j} +\sum_{j=m+1}^{n}c_j x_{n+m-j}=\lambda_k x_m$$ with $m=0,1,2,\dots,n-1$. One can "guess" a solution of the form $x_j=\omega^j$; the equation above then turns into $$\sum_{j=1}^{m}c_j \omega^{m-j} +\sum_{j=m+1}^{n}c_j \omega^{n+m-j}=\lambda_k \omega^m$$ …
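The guess leads to the well-known result that the eigenvalues of a circulant matrix are (as a set) the DFT of its first column, which is easy to check numerically (a quick sketch):

    import numpy as np
    from scipy.linalg import circulant

    c = np.random.rand(8)
    C = circulant(c)                   # C[j, k] = c[(j - k) % 8]
    lam = np.linalg.eigvals(C)
    # compare the eigenvalue set against the DFT of the first column
    print(np.allclose(np.sort_complex(lam), np.sort_complex(np.fft.fft(c))))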

Toeplitz, Circulant matrix and discrete convolution theorem

A Toeplitz matrix is a matrix of the form $$A=\begin{bmatrix} a & b & c & d & e \\ f & a & b & c & d \\ g & f & a & b & c \\ h & g & f & a & b \\ i & h & g & f & a\end{bmatrix}$$ Generally, the elements are denoted $a_{i-j}$: $A_{ij}=A_{i+1, j+1}=a_{i-j}$. A circulant matrix is a special kind of Toeplitz matrix where each row vector is rotated by one element: $$C=\begin{bmatrix}c_{0}&c_{n-1}&\dots &c_{2}&c_{1}\\c_{1}&c_{0}&c_{n-1}&&c_{2}\\\vdots &c_{1}&c_{0}&\ddots &\vdots \\c_{n-2}&&\ddots &\ddots &c_{n-1}\\c_{n-1}&c_{n-2}&\dots &c_{1}&c_{0}\\\end{bmatrix}$$ Circulant matrices are closely connected to discrete convolution: $$y=h\ast x=\begin{bmatrix}h_{1}&0&\cdots &0&0\\h_{2}&h_{1}&&\vdots &\vdots \\h_{3}&h_{2}&\cdots &0&0\\\vdots &h_{3}&\cdots &h_{1}&0\\h_{m-1}&\vdots &\ddots &h_{2}&h_{1}\\h_{m}&h_{m-1}&&\vdots &h_{2}\\0&h_{m}&\ddots &h_{m-2}&\vdots \\0&0&\cdots &h_{m-1}&h_{m-2}\\\vdots &\vdots &&h_{m}&h_{m-1}\\0&0&0&\cdots &h_{m}\end{bmatrix}\begin{bmatrix}x_{1}\\x_{2}\\x_{3}\\\vdots \\x_{n}\end{bmatrix}$$ …
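The connection is easy to verify numerically: multiplying by the circulant matrix built from $h$ performs a circular convolution, which the DFT diagonalizes (a quick sketch):

    import numpy as np
    from scipy.linalg import circulant

    h = np.array([1.0, 2.0, 3.0, 0.0, 0.0])   # kernel, zero-padded to length n
    x = np.random.rand(5)
    y_mat = circulant(h) @ x                  # circular convolution as C x
    y_fft = np.fft.ifft(np.fft.fft(h) * np.fft.fft(x)).real   # convolution theorem
    print(np.allclose(y_mat, y_fft))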

Distribution of segments on Gaussian chain

The probability distribution function $P_i(\mathbf{r}_i-\mathbf{r}_\mathrm{cm})$ of an ideal chain is evaluated, where $P_i$ is the probability distribution function of the $i$th segment with respect to the centre of mass of the chain. An ideal chain is modeled as a multidimensional random walk, where the steps are independent and a step of mean length $b$ is distributed as $P(\mathbf{r})\sim\mathcal{N}(0,b^2)$. Let $\mathbf{b}_i=\mathbf{r}_{i+1}-\mathbf{r}_{i}$ be the $i$th bond vector; then $$\mathbf{r}_i=\sum_{j=1}^{i-1} \mathbf{b}_j$$ and the centre of mass $\mathbf{r}_\mathrm{cm}=\frac{1}{N}\sum_i \mathbf{r}_i$ is $$\mathbf{r}_\mathrm{cm}=\frac{1}{N}\sum_{j=1}^{N-1}(N-j)\mathbf{b}_j$$ so that $$\mathbf{r}_i-\mathbf{r}_\mathrm{cm}=\sum_{j=1}^{i-1}\frac{j}{N}\mathbf{b}_j-\sum_{j=i}^{N-1}\frac{N-j}{N}\mathbf{b}_j$$ If a variable $X$ is Gaussian with variance $\sigma^2$, then $aX$ is Gaussian with variance $a^2\sigma^2$ …
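The decomposition is easy to verify by Monte Carlo (a sketch; since the Cartesian components are independent, one component suffices, and N, b, ib, M are example values):

    import numpy as np

    N, b, ib, M = 100, 1.0, 30, 50000              # beads 1..N, bead index ib, samples M
    bonds = np.random.normal(0.0, b, (M, N - 1))
    r = np.hstack([np.zeros((M, 1)), bonds.cumsum(axis=1)])   # r_1 = 0
    d = r[:, ib - 1] - r.mean(axis=1)              # ib-th segment w.r.t. centre of mass
    j = np.arange(1, N)
    w = np.where(j <= ib - 1, j / N, (N - j) / N)  # coefficients of b_j in the sum above
    print(d.var(), b**2 * (w**2).sum())            # should agree within MC error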

Simple Examples of Parallel Computing of Cython

It is more convenient to calculate some properties during a molecular simulation by accessing data from the API of the simulation program than to dump all the coordinates and calculate after the whole simulation has finished. In particular, sharply fluctuating properties such as the virial, RDF, etc., require massive numbers of frames to calculate, and it is expensive to dump that densely. For Python-friendly molecular simulation software, e.g., lammps, hoomd-blue, galamost, etc., it is easy to embed customized Python functions into the simulation control scripts. However, Python has poor performance in massive calculations. numba.cuda.jit does dramatically speed up the program; however, for unknown reasons, GPU-accelerated simulation software like hoomd-blue and galamost produces errors during simulations if one uses numba.cuda.jit-compiled functions. Therefore, I try to use Cython to generate C functions to accelerate the calculations. Here I put a simple example of a parallel calculation …
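The example itself is cut off in this excerpt; as a stand-in, here is a minimal Cython kernel of the kind described (my illustration: the function and the mean-square-displacement observable are assumptions, not the post's code), built with OpenMP:

    # msd.pyx -- compile with -fopenmp
    cimport cython
    from cython.parallel import prange

    @cython.boundscheck(False)
    @cython.wraparound(False)
    def msd(double[:, :] x0, double[:, :] x1):
        """Mean-square displacement between two frames."""
        cdef Py_ssize_t i, k
        cdef double acc = 0.0, tmp
        for i in prange(x0.shape[0], nogil=True, schedule='static'):
            tmp = 0.0
            for k in range(x0.shape[1]):
                tmp = tmp + (x1[i, k] - x0[i, k]) ** 2
            acc += tmp                  # in-place op: inferred OpenMP reduction
        return acc / x0.shape[0]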

Anisotropy of ideal chain

The Gaussian chain is isotropic when averaged over conformation space and all orientations. Therefore, a Gaussian chain is treated as a sphere with radius $R_g$, its radius of gyration. In the 2nd chapter of Rubinstein's Polymer Physics, an exercise shows that $R_g^2$ is asymmetric if one sets the coordinate frame on the end-to-end vector $\mathbf{R}_{ee}$, i.e., $\mathbf{R}_{ee}$ is taken as the $x$-axis so that $\mathbf{R}_{ee}=(R,0,0)^T$. Then the three components of $R_g^2$ are $\frac{Nb^2}{36}$ in the $y$ and $z$ directions and $\frac{Nb^2}{36}+\frac{R^2}{12}$ in the $x$ direction. Here I give a very simple proof. Consider a Gaussian chain fixed between $(0,0,0)^T$ and $\mathbf{R}_{ee}$. It is actually a Brownian bridge, and the distribution of the $n$th segment is Gaussian with mean $\frac{n}{N}\mathbf{R}$ and variance $\frac{n(N-n)}{N}b^2$; the proof is simple: $$P_{0\mathbf{R}}(\mathbf{r},n)=\frac{G(\mathbf{r},0,n)G(\mathbf{R},\mathbf{r},N-n)}{G(\mathbf{R},0,N)}$$ …
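Both components are easy to verify by Monte Carlo using exactly this bridge construction (a sketch; N, b, R are example values): pin the walk at $0$ and $R$ along $x$, and at $0$ and $0$ transversely.

    import numpy as np

    N, b, R, M = 200, 1.0, 10.0, 5000
    t = np.arange(N + 1) / N

    def bridge(end):
        w = np.hstack([np.zeros((M, 1)),
                       np.random.normal(0, b, (M, N)).cumsum(axis=1)])
        return w - t * w[:, -1:] + t * end     # pin the walk at 0 and `end`

    for end, ref in [(R, N * b**2 / 36 + R**2 / 12), (0.0, N * b**2 / 36)]:
        x = bridge(end)
        rg2 = ((x - x.mean(axis=1, keepdims=True)) ** 2).mean()
        print(rg2, ref)                        # MC value vs Rubinstein's result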

Flory characteristic ratio of hindered rotation chain

This is an exercise from Rubinstein's textbook Polymer Physics. Definition of the Flory characteristic ratio: $$C_\infty:=\lim_{n\to\infty}\frac{\langle \mathbf{R}^2 \rangle}{nb^2}$$ where $\langle \mathbf{R}^2 \rangle$ is the mean-square end-to-end vector, $b$ is the Kuhn length of the polymer and $n$ is the corresponding chain length. Calculation of $\langle \mathbf{R}^2\rangle$: let $\mathbf{r}_i$ be the bond vector between the $i$th and $(i-1)$th particles, with $\mathbf{r}_1=\mathbf{R}_1-\mathbf{R}_0\equiv0$; for an $n$-bead chain: $$\begin{align}\langle \mathbf{R}^2 \rangle&=\left\langle\left(\sum_{i=1}^n \mathbf{r}_i\right)\cdot\left(\sum_{j=1}^n \mathbf{r}_j\right)\right\rangle\\ &=\sum_i\sum_j\langle \mathbf{r}_i\cdot\mathbf{r}_j\rangle\end{align}$$ Calculation of $\langle\mathbf{r}_i\cdot\mathbf{r}_j\rangle$: now consider a local coordinate frame in which $\mathbf{r}_i=(b,0,0)^T$; then $\mathbf{r}_{i+1}$ can be considered as $\mathbf{r}_i$ rotated by $\theta$ about the $z$-axis …
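As a quick numerical sanity check of this setup in the free-rotation limit (no hindrance, a special case rather than the full hindered-rotation result), $\langle\mathbf{r}_i\cdot\mathbf{r}_j\rangle=b^2\cos^{|i-j|}\theta$ and $C_\infty=(1+\cos\theta)/(1-\cos\theta)$; $\theta=68^\circ$ is just an example value:

    import numpy as np

    b, theta, n = 1.0, np.deg2rad(68.0), 5000
    c = np.cos(theta)
    k = np.arange(1, n)
    R2 = n * b**2 + 2 * b**2 * ((n - k) * c**k).sum()   # sum over all pairs i, j
    print(R2 / (n * b**2), (1 + c) / (1 - c))           # C_n vs the C_infty limit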

RTFM, pls RTFM!!!

I was trying to write a parallel C extension with Cython several days ago, to calculate the Coulomb energy of $n$ points. I wrote the code as follows:

    @cython.cdivision(True)
    @cython.boundscheck(False)
    @cython.wraparound(False)
    def u(double[:,:] x):
        cdef long i
        cdef long j, k
        cdef double ret = 0, tmp
        with nogil, parallel():
            for i in prange(x.shape[0] - 1, schedule='static'):
                for j in range(i + 1, x.shape[0]):
                    tmp = 0
                    for k in range(x.shape[1]):
                        tmp = tmp + pow(x[i, k] - x[j, k], 2)
                    ret = ret + 1 / sqrt(tmp)
        return ret
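The excerpt ends here; for context, Cython's parallelism documentation states that a variable becomes an OpenMP reduction only when it is updated with an in-place operator, so the plain ret = ret + ... above makes ret thread-private instead of reduced. A corrected kernel (my sketch, not necessarily the post's final version) would be:

    cimport cython
    from cython.parallel import prange
    from libc.math cimport sqrt, pow

    @cython.cdivision(True)
    @cython.boundscheck(False)
    @cython.wraparound(False)
    def u(double[:, :] x):
        cdef long i, j, k
        cdef double ret = 0, tmp
        for i in prange(x.shape[0] - 1, nogil=True, schedule='static'):
            for j in range(i + 1, x.shape[0]):
                tmp = 0
                for k in range(x.shape[1]):
                    tmp = tmp + pow(x[i, k] - x[j, k], 2)
                ret += 1 / sqrt(tmp)     # in-place op: inferred OpenMP reduction
        return ret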

Find intersections numerically -- Using NumPy

Intersections of 2 functions: assuming functions $f$ and $g$, scipy.optimize.fsolve gives an elegant solution:

    import numpy as np
    from scipy.optimize import fsolve

    def f(xy):
        x, y = xy
        return (y - x ** 2, y - np.sin(x))

    fsolve(f, [1.0, 1.0])

gives array([0.87672622, 0.76864886]), the intersection of $y=x^2$ and $y=\sin(x)$. Intersections of 2 arrays: there is a function called numpy.intersect1d that returns the sorted, unique values present in both arrays, and reduce(np.intersect1d, list_of_arrays) gives the intersection of all arrays in the list. Indices can also be returned with the return_indices=True option. If the 2 arrays are data values, np.flatnonzero(np.abs(arr1 - arr2) < tol) seems to be a good solution; it returns the indices at which the values of arr1 and arr2 agree within tol. Another method is based on the idea that the sign of $f(x)-g(x)$ changes across an intersection $x_0$, i.e. $\lim_{x\to x_0^+}(f-g)$ and $\lim_{x\to x_0^-}(f-g)$ have opposite signs; therefore, if $\mathrm{d}x$ is small enough, this method gives a very good estimate …
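The sign-change method as code (a sketch): sample on a fine grid, locate the bracketing intervals, then refine each by linear interpolation:

    import numpy as np

    x = np.linspace(-1.0, 2.0, 10001)
    d = x ** 2 - np.sin(x)                        # f - g
    i = np.flatnonzero(np.diff(np.sign(d)) != 0)  # indices where f - g changes sign
    roots = x[i] - d[i] * (x[i + 1] - x[i]) / (d[i + 1] - d[i])
    print(roots)                                  # ~ [0.0, 0.8767...]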

CAPVT II -- notes on Wheelock's Latin

First declension nouns and adjectives

Endings:

    Case    Singular    Plural
    Nom.    -a          -ae
    Gen.    -ae         -ārum
    Dat.    -ae         -īs
    Acc.    -am         -ās
    Abl.    -ā          -īs
    Voc.    -a          -ae