### Free energy calculation: umbrella integration

Formulae of umbrella sampling method can be found on wikipedia. In umbrella sampling, a reaction coordinate $\xi$ is pre-defined from the atomic coordinates, a bias potential is added to the atoms of interest to keep $\xi$ of the system at a specific window $\xi_w$. The bias form is usually a harmonic potential:
$$u^b_w(\xi)=0.5k_w(\xi-\xi_w)^2$$
Therefore, the energy of biased system $A^b_w = A^{ub}_w + u^b_w(\xi)$. The superscript $ub$ is short for "un-biased". In the simulations, we can sample the reaction coordinate in each window and evaluate their distribution $P^b(\xi)$, since the free energy $A=-k_BT\ln(P)$, we have:
$$A_w^{ub}(\xi) = -k_BT\ln(P^b_w(\xi))-u^b_w(\xi)-F_w$$
with $F_w$ is a reference free energy of each window and remains an unknown constant. One method to derive $F_w$ is WHAM, in year 2005, Kästner et al. (The Journal of Chemical Physics 123, no. 14 (October 8, 2005): 144104.) have proposed a new method whose idea is to take derivative of $A^u_w$ with respect to $\xi$ to eliminate $F_w$, In this method, $P(\xi)$ is assumed to be analytic, e.g., a Gaussian distribution. The general idea is to expand $A(\xi)$ into Taylor series (at $\langle \xi \rangle_w$):
$$A(\xi)=a_1 \xi + a_2 \xi^2 + a_3 \xi^3 +\ldots$$
if $a_i=0$ with $i\ge 3$, the Gaussian form is restored. For systems with higher order terms, the distributions, for example, generally have a non-zero skewness. The crucial step is to determine $\lbrace a_i\rbrace$. In practice, the probability with form of $\exp(-\sum_i a_i \xi^i)$ is difficult to determine, in year 2012, Kästner et al. (The Journal of Chemical Physics 136, no. 23 (June 21, 2012): 234102) studied cases with order $4$ and small $a_3, a_4$. I have tried another more "numerical" method to calculate $\lbrace a_i\rbrace$ when datasets are poorer: fit $\exp(-\sum_i a_i \xi ^i)$ with a Gaussian KDE to find $\lbrace a_i \rbrace$. For some "poor" datasets, higher-order terms of expansion of free energy are required. The normalization factor of the distribution is estimated as $n=\int_{\xi_\mathrm{min}}^{\xi_\mathrm{max}} P(\xi)\mathrm{d}\xi$, the fitting range should be carefully chosen to ensure the convergence of $n$.