The structure factor, commonly denoted as $S(\mathbf{q})$, is an essential characterization quantity in molecular simulations. Its calculation is straightforward based on its definition. Since periodic boundary conditions are typically used in these simulations, we have
$$S(\mathbf{q}) := \mathcal{FT}\lbrace\langle\rho(\mathbf{r})\rho(\mathbf{0})\rangle\rbrace(\mathbf{q})$$
which actually represents a circular convolution. This means that using the Fast Fourier Transform (FFT) can dramatically accelerate the computation compared to performing the calculation directly.
The FFT-based procedure proceeds as follows:
- Compute the density field $\rho(\mathbf{r})$, which is a sum of Dirac delta functions approximated by a histogram.
- Use the FFT to calculate the Fourier transform $\hat{\rho}(\mathbf{q})$.
- Obtain the structure factor by taking the modulus squared of the transform:
$$ S(\mathbf{q}) = |\hat{\rho}(\mathbf{q})|^2. $$ - Average over the modulus to yield
$$ S(q) = \frac{\int S(\mathbf{q})\delta(|\mathbf{q}|-q)\,\mathrm{d}\mathbf{q}}{\int\delta(|\mathbf{q}|-q)\,\mathrm{d}\mathbf{q}}. $$
In terms of efficiency, if the simulation box is divided into $N$ bins (for example, in 3D, $N = N_x N_y N_z$), the FFT has a computational complexity of $O(N\log(N))$, and the averaging step is $O(N)$. By contrast, a direct method would involve a loop over all particles and $N$ bins in $\mathbf{q}$ space—and typically, the number of particles is much larger than $\log(N)$. Moreover, because most simulation data are real-valued, employing an FFT designed for real inputs (rFFT) can further improve performance.
Below is an example code for a 3D case, where the indices i, j, and k correspond to $N_x$, $N_y$, and $N_z$, respectively.
fftn(a) == np.concatenate([rfftn(a), conj(rfftn(a))[-np.arange(i),-np.arange(j),np.arange(k-k//2-1,0,-1)]], axis=-1)
Accuracy Considerations: Effect of Bin Size
The histogram bin size should be smaller than half the minimum pairwise distance in the system, in accordance with the Nyquist-Shannon sampling theorem. For instance, in Lennard-Jones systems, a bin size of approximately $0.4\sigma$ is advisable. However, if you are only interested in small $q$ values—as is often the case in self-assembly systems—full pairwise resolution is not necessary; a sampling interval that is just smaller than half of the domain of interest may suffice.
Randomness Within Bins
Particles are assumed to be randomly distributed within each bin, especially when employing large bins. The Fourier transform of the particle positions is given by
$$\sum_i \exp(-\mathrm{i}\mathbf{q}\cdot\mathbf{r}_i)$$
This sum can be decomposed as
$$\sum_\mathrm{bin} \exp(-\mathrm{i}\mathbf{q}\cdot\mathbf{r}_\mathrm{bin})\left(\sum_i \exp(-\mathrm{i}\mathbf{q}\cdot\delta\mathbf{r}_i)\right)$$
where each particle’s position is written as
$$\mathbf{r}_i = \mathbf{r}_\mathrm{bin} + \delta\mathbf{r}_i$$
Assuming the displacements $\delta\mathbf{r}_i$ are randomly distributed within the bin, the inner sum can be approximated by
$$n_\mathrm{bin}\langle\exp(-\mathrm{i}\mathbf{q}\cdot\delta\mathbf{r})\rangle$$
with the average taken over the distribution of $\delta\mathbf{r}$. For a one-dimensional example, if $\delta r$ is uniformly distributed in the interval $(-L/2, L/2)$ (where $L$ is the bin size), the average is given by $\mathrm{sinc}(qL/2)$. Multiplying by this sinc correction during step 4 yields a properly corrected $S(q)$.
This strategy ensures that both efficiency and accuracy are maintained when computing the structure factor in molecular simulations.
No Comments.
You can use <pre><code class="language-xxx">...<code><pre> to post codes and $...$, $$...$$ to post LaTeX inline/standalone equations.