## Saturday, January 18, 2020

### Not SQ again...

Structure factor (SQ) is a characterization quantity which is frequently calculated in molecular simulations. It is easily to code the calculation program according to the definition. In most molecular simulations, periodic boundary condition is adopted therefore
$$S(\mathbf{q}):=\mathcal{FT}\lbrace\langle\rho(\mathbf{r})\rho(\mathbf{0})\rangle\rbrace(\mathbf{q})$$
is actually a circular convolution, where FFT can give a dramatic boost in contrast to calculate directly. The steps using FFT are:
1. Calculate $\rho(\mathbf{r})$, which is a summation of Dirac-Delta function that can be estimated as a hisotgram;
2. Calculate $\hat{\rho}(\mathbf{q})$ by FFT;
3. $S(\mathbf{q})=|\hat{\rho}(\mathbf{q})|^2$;
4. Calculate mean over modulus: $S(q)=\int S(\mathbf{q})\delta(|\mathbf{q}|-q)\mathrm{d}\mathbf{q}/\int \delta(|\mathbf{q}|-q)\mathrm{d}\mathbf{q}$
Efficiency
If the simulation box is divided into $N$ (in 3D systems, $N=N_xN_yN_z$ for example) bins, the FFT gives $O(N\log(N))$ complexity and step 4 is $O(N)$. Generally, in comparison with the direct method, one needs at least loop over number of particles and $N$ bins for $\mathbf{q}$, the number of particles is obviously, way larger than $\log(N)$. For data of most simulations are real numbers, rFFT designed for real inputs could further boost the program. Here is code for a 3D example, i, j, k represent $N_x$, $N_y$, $N_z$ respectively.
fftn(a) == np.concatenate([rfftn(a), conj(rfftn(a))[-np.arange(i),-np.arange(j),np.arange(k-k//2-1,0,-1)]], axis=-1)

Accuracy
1. Binsize effect.
2. The binsize in histogram should be smaller than half of the minimun of the pair distances in the system according to Nyquist-Shannon sampling theorem. For example, in Lennard-Jones systems, $0.4\sigma$ is a good choice. However, if values of only some small $q$ are concerned, i.e., in some self-assembly systems, there is no need to calculate in the "pair accuracy", the sampling distance smaller than half of the interested domain size is fine.
3. Randomness in the bin
4. Positions in the bin are randomly distributed, especially for large binsize mentioned above. The summation $\sum_i \exp(-\mathbb{I}\mathbf{q}\cdot\mathbf{r}_i)$ can be decomposed into $$\sum_\mathrm{bin} \exp(-\mathbb{I}\mathbf{q}\cdot\mathbf{r}_\mathrm{bin})\left(\sum_i \exp( -\mathbb{I}\mathbf{q}\cdot\delta\mathbf{r}_i)\right)$$ The idea is simple, if some particles are inside a bin, the position can be decomposed as $\mathbf{r}=\mathbf{r}_\mathrm{bin}+\delta\mathbf{r}$, and for particles in the bin, $\delta\mathbf{r}$ is assumed uniformly distributed in the bin. The latter summation in the decomposition is thus represented as $n_\mathrm{bin}\overline{\exp(-\mathbb{I}\mathbf{q}\cdot\delta\mathbf{r})}$, the average is approximated according to the distribution of $\delta\mathbf{r}$. If we consider $\delta r$ (1D case, for example) uniformly distributed in $(-L/2, L/2)$ with $L$ be the binsize, the average is $\mathrm{sinc}(qL/2)$. Multiply the sinc function during step 4, the $S(q)$ will be corrected.

## Saturday, January 11, 2020

### Free energy calculation: umbrella integration

Formulae of umbrella sampling method can be found on wikipedia. In umbrella sampling, a reaction coordinate $\xi$ is pre-defined from the atomic coordinates, a bias potential is added to the atoms of interest to keep $\xi$ of the system at a specific window $\xi_w$. The bias form is usually a harmonic potential:
$$u^b_w(\xi)=0.5k_w(\xi-\xi_w)^2$$
Therefore, the energy of biased system $A^b_w = A^{ub}_w + u^b_w(\xi)$. The superscript $ub$ is short for "un-biased". In the simulations, we can sample the reaction coordinate in each window and evaluate their distribution $P^b(\xi)$, since the free energy $A=-k_BT\ln(P)$, we have:
$$A_w^{ub}(\xi) = -k_BT\ln(P^b_w(\xi))-u^b_w(\xi)-F_w$$
with $F_w$ is a reference free energy of each window and remains an unknown constant. One method to derive $F_w$ is WHAM, in year 2005, Kästner et al. (The Journal of Chemical Physics 123, no. 14 (October 8, 2005): 144104.) have proposed a new method whose idea is to take derivative of $A^u_w$ with respect to $\xi$ to eliminate $F_w$, In this method, $P(\xi)$ is assumed to be analytic, e.g., a Gaussian distribution. The general idea is to expand $A(\xi)$ into Taylor series (at $\langle \xi \rangle_w$):
$$A(\xi)=a_1 \xi + a_2 \xi^2 + a_3 \xi^3 +\ldots$$
if $a_i=0$ with $i\ge 3$, the Gaussian form is restored. For systems with higher order terms, the distributions, for example, generally have a non-zero skewness. The crucial step is to determine $\lbrace a_i\rbrace$. In practice, the probability with form of $\exp(-\sum_i a_i \xi^i)$ is difficult to determine, in year 2012, Kästner et al. (The Journal of Chemical Physics 136, no. 23 (June 21, 2012): 234102) studied cases with order $4$ and small $a_3, a_4$. I have tried another more "numerical" method to calculate $\lbrace a_i\rbrace$ when datasets are poorer: fit $\exp(-\sum_i a_i \xi ^i)$ with a Gaussian KDE to find $\lbrace a_i \rbrace$. For some "poor" datasets, higher-order terms of expansion of free energy are required. The normalization factor of the distribution is estimated as $n=\int_{\xi_\mathrm{min}}^{\xi_\mathrm{max}} P(\xi)\mathrm{d}\xi$, the fitting range should be carefully chosen to ensure the convergence of $n$.