Not SQ again...

Structure factor (SQ) is a characterization quantity which is frequently calculated in molecular simulations. It is easily to code the calculation program according to the definition. In most molecular simulations, periodic boundary condition is adopted therefore
$$S(\mathbf{q}):=\mathcal{FT}\lbrace\langle\rho(\mathbf{r})\rho(\mathbf{0})\rangle\rbrace(\mathbf{q})$$
is actually a circular convolution, where FFT can give a dramatic boost in contrast to calculate directly. The steps using FFT are:
  1. Calculate $\rho(\mathbf{r})$, which is a summation of Dirac-Delta function that can be estimated as a hisotgram;
  2. Calculate $\hat{\rho}(\mathbf{q})$ by FFT;
  3. $S(\mathbf{q})=|\hat{\rho}(\mathbf{q})|^2$;
  4. Calculate mean over modulus: $S(q)=\int S(\mathbf{q})\delta(|\mathbf{q}|-q)\mathrm{d}\mathbf{q}/\int \delta(|\mathbf{q}|-q)\mathrm{d}\mathbf{q}$
Efficiency
If the simulation box is divided into $N$ (in 3D systems, $N=N_xN_yN_z$ for example) bins, the FFT gives $O(N\log(N))$ complexity and step 4 is $O(N)$. Generally, in comparison with the direct method, one needs at least loop over number of particles and $N$ bins for $\mathbf{q}$, the number of particles is obviously, way larger than $\log(N)$. For data of most simulations are real numbers, rFFT designed for real inputs could further boost the program. Here is code for a 3D example, i, j, k represent $N_x$, $N_y$, $N_z$ respectively.
fftn(a) == np.concatenate([rfftn(a), conj(rfftn(a))[-np.arange(i),-np.arange(j),np.arange(k-k//2-1,0,-1)]], axis=-1)

Accuracy
  1. Binsize effect.
  2. The binsize in histogram should be smaller than half of the minimun of the pair distances in the system according to Nyquist-Shannon sampling theorem. For example, in Lennard-Jones systems, $0.4\sigma$ is a good choice. However, if values of only some small $q$ are concerned, i.e., in some self-assembly systems, there is no need to calculate in the "pair accuracy", the sampling distance smaller than half of the interested domain size is fine. 
  3. Randomness in the bin
  4. Particles in bins are randomly distributed, especially for large binsize mentioned above. The summation $\sum_i \exp(-\mathbb{I}\mathbf{q}\cdot\mathbf{r}_i)$ can be decomposed into $$\sum_\mathrm{bin} \exp(-\mathbb{I}\mathbf{q}\cdot\mathbf{r}_\mathrm{bin})\left(\sum_i \exp( -\mathbb{I}\mathbf{q}\cdot\delta\mathbf{r}_i)\right)$$ The idea is straightforward: for particles inside a bin, i.e., for particle $i$,  position $\mathbf{r}_i$ can be decomposed as $\mathbf{r}_i=\mathbf{r}_\mathrm{bin}+\delta\mathbf{r}_i$, and for particles in the bin, $\delta\mathbf{r}$ is assumed randomly distributed in the bin. The latter summation in the decomposition is thus represented as $n_\mathrm{bin}\overline{\exp(-\mathbb{I}\mathbf{q}\cdot\delta\mathbf{r})}$, the average is approximated according to the distribution of $\delta\mathbf{r}$. If we consider $\delta r$ (1D case, for example) uniformly distributed in $(-L/2, L/2)$ with $L$ be the binsize, the average is $\mathrm{sinc}(qL/2)$. Multiply the sinc function during step 4, the $S(q)$ will be corrected.

Comments

Popular posts from this blog

Toeplitz, Circulant matrix and discrete convolution theorem

Veni, vidi, vici

Di-block copolymer analysis: GPC