The entropy is defined as

$$S:=-\int_{\Omega} p(x)\log p(x) $$

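where $p(x)$ is a probability density on $\Omega$ (the integration measure $\mathrm{d}x$ is left implicit throughout). Naturally, $p(x)$ is constrained by the normalization $\int_\Omega p(x)=1$. If this is the sole constraint, the entropy-maximizing $p(x)$ is the uniform distribution. To see this, optimize $S$ with a Lagrange multiplier: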

$$J[p]:=-S-\lambda \left(\int_\Omega p(x) -1\right)$$

$$\frac{\delta J}{\delta p}=\log p(x) + 1 -\lambda=0$$

yields

$$p(x)=\exp(\lambda-1)$$

which is a constant. If $p(x)$ is defined on an interval $\Omega=[a,b]$, normalization gives $p(x)=1/(b-a)$.
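As a quick numerical sanity check of the uniform result, the sketch below compares the entropy of the uniform density on an interval with two other normalized densities on the same interval. The interval endpoints and the two comparison densities (`triangular`, `tilted`) are arbitrary choices made for illustration, and the integral is approximated with a simple midpoint rule.

```python
import math

def entropy(density, a, b, n=20000):
    """Differential entropy -integral of p log p on [a, b], midpoint rule."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        p = density(a + (k + 0.5) * h)
        if p > 0.0:  # p log p -> 0 as p -> 0
            total -= p * math.log(p) * h
    return total

# Hypothetical interval chosen for illustration.
a, b = 2.0, 5.0
width = b - a

def uniform(x):
    return 1.0 / width

def triangular(x):
    # Symmetric triangle on [a, b] with peak 2/width at the midpoint; integrates to 1.
    mid = 0.5 * (a + b)
    return (2.0 / width) * (1.0 - abs(x - mid) / (0.5 * width))

def tilted(x):
    # Linear ramp from 0.5/width to 1.5/width; integrates to 1.
    return (0.5 + (x - a) / width) / width

s_uniform = entropy(uniform, a, b)
s_triangular = entropy(triangular, a, b)
s_tilted = entropy(tilted, a, b)

print("uniform   :", s_uniform, "(log(b-a) =", math.log(width), ")")
print("triangular:", s_triangular)
print("tilted    :", s_tilted)
```

The uniform entropy should match $\log(b-a)$, and both alternatives should come out lower. This only tests two competitors, so it illustrates the result rather than proving it.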

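For distributions on $\Omega=\mathbb{R}$ with prescribed mean and variance $(\mu, \sigma^2)$, the multiplier of the mean constraint turns out to vanish, so only the variance constraint is written and $J$ becomes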

$$\begin{align} J[p]=&-S-\lambda_0 \left(\int_\Omega p(x) -1\right)\\-&\lambda_1 \left(\int_\Omega(x-\mu)^2 p(x)-\sigma^2\right) \end{align}$$

$$\frac{\delta J}{\delta p}=\log p(x) + 1 - \lambda_0 - \lambda_1 (x-\mu)^2=0$$

yields

$$p(x) = \exp\left(\lambda_0-1 +\lambda_1(x-\mu)^2\right)$$

which is a Gaussian distribution provided $\lambda_1<0$. The multipliers follow from the constraints: $\lambda_1=-1/(2\sigma^2)$ and $\lambda_0=1-\log \sqrt{2\pi\sigma^2}$, so that $p(x)=\exp\left(-(x-\mu)^2/(2\sigma^2)\right)/\sqrt{2\pi\sigma^2}$. Notably, $\delta^2 J/\delta p^2 = 1/p$ is always positive, so $J$ is minimized, i.e., $S$ is maximized.
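The claim that the Gaussian has the largest entropy at fixed variance can be checked numerically. The sketch below compares a Gaussian with a Laplace and a uniform density, all tuned to the same mean and variance. The values of `MU` and `SIGMA`, the choice of competitors, and the integration window of 20 standard deviations are assumptions made for illustration.

```python
import math

# Hypothetical mean and standard deviation chosen for illustration.
MU, SIGMA = 1.0, 0.7
LO, HI = MU - 20.0 * SIGMA, MU + 20.0 * SIGMA

def integrate(f, n=40000):
    """Midpoint rule on [LO, HI]; tails beyond 20 sigma are negligible here."""
    h = (HI - LO) / n
    return sum(f(LO + (k + 0.5) * h) for k in range(n)) * h

def gaussian(x):
    return math.exp(-(x - MU) ** 2 / (2.0 * SIGMA ** 2)) / math.sqrt(2.0 * math.pi * SIGMA ** 2)

def laplace(x):
    scale = SIGMA / math.sqrt(2.0)  # a Laplace density has variance 2 * scale**2
    return math.exp(-abs(x - MU) / scale) / (2.0 * scale)

def uniform(x):
    half = SIGMA * math.sqrt(3.0)  # a uniform density has variance half**2 / 3
    return 1.0 / (2.0 * half) if abs(x - MU) <= half else 0.0

def summary(p):
    """Return (normalization, variance about MU, entropy) of the density p."""
    norm = integrate(p)
    var = integrate(lambda x: (x - MU) ** 2 * p(x))
    ent = integrate(lambda x: -p(x) * math.log(p(x)) if p(x) > 0.0 else 0.0)
    return norm, var, ent

results = {name: summary(p) for name, p in
           [("gaussian", gaussian), ("laplace", laplace), ("uniform", uniform)]}
for name, (norm, var, ent) in results.items():
    print(f"{name:9s} norm={norm:.5f} var={var:.5f} S={ent:.5f}")

# Closed-form entropy of a Gaussian, for comparison with the numerical value.
s_closed_form = 0.5 * math.log(2.0 * math.pi * math.e * SIGMA ** 2)
print("gaussian closed form S =", s_closed_form)
```

All three densities should report a normalization near 1 and a variance near $\sigma^2$, with the Gaussian entropy matching $\frac{1}{2}\log(2\pi e\sigma^2)$ and exceeding the other two.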

For a series of constraints, e.g., when the expectations $\langle f_i(x)\rangle$, $i=1,2,\dots,n$, are given, there are $n$ conjugate Lagrange multipliers, which leads to

$$J = -S - \lambda_0 \left(\int_\Omega p(x)-1\right)-\sum_i\lambda_i \left(\int_\Omega f_i(x) p(x)-\langle f_i\rangle\right)$$

$$\frac{\delta J}{\delta p}=\log p + 1-\lambda_0 - \sum_i f_i \lambda_i=0$$

yields

$$p(x)=\exp(\lambda_0-1)\exp\left(\sum_i \lambda_i f_i(x)\right)$$
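To make the exponential form concrete, here is a minimal discrete sketch with a single constraint: a six-sided die whose mean face value is prescribed. The integral becomes a sum over faces, the factor $\exp(\lambda_0-1)$ is fixed by normalization, and the remaining multiplier is found by bisection. The target mean of 4.5 and the helper names are hypothetical choices for illustration.

```python
import math

FACES = [1, 2, 3, 4, 5, 6]   # values of f(x) = x on a six-sided die
TARGET_MEAN = 4.5            # hypothetical prescribed <f>

def maxent_distribution(lam):
    """p_i proportional to exp(lam * f_i); normalization plays the role of exp(lambda_0 - 1)."""
    weights = [math.exp(lam * f) for f in FACES]
    z = sum(weights)
    return [w / z for w in weights]

def mean(lam):
    return sum(p * f for p, f in zip(maxent_distribution(lam), FACES))

def solve_multiplier(target, lo=-10.0, hi=10.0, tol=1e-12):
    """Bisection on lam; mean(lam) is increasing because its derivative is Var(f) > 0."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam = solve_multiplier(TARGET_MEAN)
probs = maxent_distribution(lam)

print("lambda =", lam)
print("probabilities =", [round(p, 4) for p in probs])
print("mean =", mean(lam))
```

Because the target mean lies above the unbiased value of 3.5, the multiplier comes out positive and the probabilities increase with the face value. With several constraints the same idea applies, but a multidimensional root finder would be needed in place of bisection.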

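Substituting the maximizing $p(x)$ into $S$ gives $S_{max}=1-\lambda_0 -\sum_i \lambda_i\langle f_i\rangle$, and, with the sign convention for $J$ used above, $\partial S_{max}/\partial \langle f_i\rangle=-\lambda_i$. In particular, for a constraint on the mean energy $\langle E\rangle$ with multiplier $\lambda$,

$$ -\lambda = \frac{\partial S}{\partial \langle E\rangle }:=\frac{1}{T}$$

is the definition of temperature (in units where $k_B=1$), and $p(x)\propto\exp(-E(x)/T)$ is the Boltzmann distribution.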
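The relation $\partial S/\partial\langle E\rangle = 1/T$ can be checked numerically on a small discrete system. The sketch below assumes units with $k_B=1$ and a hypothetical set of four energy levels; it moves along the Boltzmann family by varying $\beta=1/T$ and estimates the derivative with a central finite difference.

```python
import math

# Hypothetical energy levels in arbitrary units (k_B = 1 assumed).
ENERGIES = [0.0, 1.0, 2.5, 4.0]

def boltzmann(beta):
    """Boltzmann probabilities p_i proportional to exp(-beta * E_i)."""
    weights = [math.exp(-beta * e) for e in ENERGIES]
    z = sum(weights)
    return [w / z for w in weights]

def mean_energy(beta):
    return sum(p * e for p, e in zip(boltzmann(beta), ENERGIES))

def entropy(beta):
    return -sum(p * math.log(p) for p in boltzmann(beta))

def ds_du(beta, d=1e-5):
    """Central finite difference of S with respect to <E> along the Boltzmann family."""
    ds = entropy(beta + d) - entropy(beta - d)
    du = mean_energy(beta + d) - mean_energy(beta - d)
    return ds / du

for temperature in (0.5, 1.0, 2.0):
    beta = 1.0 / temperature
    print(f"T={temperature}: dS/d<E> = {ds_du(beta):.8f}, 1/T = {beta:.8f}")
```

For each temperature the finite-difference slope should agree with $1/T$ up to the discretization error of the difference quotient.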