# Shae's Ramblings

Stuff I find slightly meaningful and close to achievable

## Approximating the differential entropy

This post summarizes the main ideas of a method for approximating the differential entropy of a continuous random variable $X$ with density $p$. Recall the definition of $h(X)$:

$$h(X) \triangleq \;-\int_\mathcal{X} p(x)\log p(x)\,\mathrm{d}x$$
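As a quick sanity check of the definition (not part of the method itself), $h(X)$ can be computed numerically when the density is known. A minimal NumPy sketch, using a hand-rolled trapezoidal rule, evaluates $-\int p \log p$ for the standard normal and can be compared against the closed form $\frac12\log(2\pi e) \approx 1.4189$:

```python
import numpy as np

def differential_entropy(p, lo, hi, n=200_001):
    """Approximate h(X) = -integral of p(x) log p(x) dx on [lo, hi]
    via the trapezoidal rule."""
    x = np.linspace(lo, hi, n)
    px = p(x)
    # Guard 0 * log 0, which is taken to be 0 by convention
    integrand = np.where(px > 0, -px * np.log(np.where(px > 0, px, 1.0)), 0.0)
    dx = x[1] - x[0]
    return np.sum((integrand[:-1] + integrand[1:]) / 2) * dx

# Standard normal density; exact entropy is (1/2) log(2*pi*e)
phi = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
h_num = differential_entropy(phi, -10.0, 10.0)
h_exact = 0.5 * np.log(2 * np.pi * np.e)
```

The integration interval $[-10, 10]$ is wide enough here that the truncated Gaussian tails contribute negligibly.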

## Maximum entropy

If we have information about the distribution $p$ in the form of equality constraints

$$\mathrm{E}_p[G_i(X)] = \int_\mathcal{X} p(x)G_i(x)\,\mathrm{d}x = c_i\,,\qquad i\in \{1, \dots, n \}$$

then the distribution $p_0$ which maximizes the entropy $h$ subject to these constraints must be of the form:

$$p_0(x) = A\,\exp\left(\sum_{i=1}^n a_iG_i(x)\right)$$

where $A$ and the $a_i$ are constants determined by the $c_i$, and are in general hard to compute.
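A classical worked example: take $n = 2$ with $G_1(x) = x$, $G_2(x) = x^2$ and constraints $c_1 = 0$, $c_2 = 1$ (zero mean, unit variance). The maximum-entropy form is

$$p_0(x) = A\,\exp\left(a_1 x + a_2 x^2\right)$$

and matching the two constraints together with normalization gives $a_1 = 0$, $a_2 = -\frac12$, $A = \frac{1}{\sqrt{2\pi}}$. In other words, the standard normal is the maximum-entropy density among all densities with zero mean and unit variance.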

## Simplifying the candidate distribution

Let $\phi$ be the standard normal density. The authors of the paper assume the candidate distribution $p$ is not far from $\phi$.
Thus, they standardize the data and add the extra constraints $G_{n+1}(x) = x$ with $c_{n+1} = 0$ and $G_{n+2}(x) = x^2$ with $c_{n+2} = 1$.
They further assume the constraint functions $G_i$ are orthonormal with respect to the inner product $\langle f,g \rangle \triangleq \mathrm{E}_{\phi}[fg]$; this can always be arranged by running the Gram-Schmidt algorithm on the original $G_i$.
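To illustrate the orthonormalization step (a sketch of the idea, not the paper's code), Gram-Schmidt under this inner product can be run numerically: since $\langle f,g \rangle = \mathrm{E}_\phi[fg]$, the inner product is a Gaussian-weighted integral, which Gauss-Hermite quadrature computes exactly for polynomials. Orthonormalizing the monomials recovers the normalized probabilists' Hermite polynomials:

```python
import numpy as np

# Inner product <f, g> = E_phi[f g], via Gauss-Hermite quadrature
# (nodes/weights for weight e^{-t^2}, rescaled to the standard normal phi)
nodes, weights = np.polynomial.hermite.hermgauss(40)
x_q = nodes * np.sqrt(2.0)        # change of variables t = x / sqrt(2)
w_q = weights / np.sqrt(np.pi)    # weights now sum to 1 (a probability measure)

def inner(f, g):
    return float(np.sum(w_q * f(x_q) * g(x_q)))

def gram_schmidt(funcs):
    """Orthonormalize a list of functions w.r.t. <f, g> = E_phi[f g]."""
    ortho = []
    for f in funcs:
        g = f
        for e in ortho:
            c = inner(f, e)  # subtract the projection of f onto e
            g = (lambda x, g=g, e=e, c=c: g(x) - c * e(x))
        norm = np.sqrt(inner(g, g))
        ortho.append(lambda x, g=g, norm=norm: g(x) / norm)
    return ortho

# Orthonormalizing 1, x, x^2, x^3 recovers the normalized probabilists'
# Hermite polynomials: 1, x, (x^2 - 1)/sqrt(2), (x^3 - 3x)/sqrt(6)
basis = gram_schmidt([lambda x: np.ones_like(x),
                      lambda x: x,
                      lambda x: x**2,
                      lambda x: x**3])
```

With 40 quadrature nodes the inner products are exact for polynomials up to degree 79, so the recovered basis matches the Hermite polynomials to floating-point precision.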

Near-Gaussianity implies that in the above expression for $p_0$ the coefficients $a_i$ are near zero for $i \leq n+1$, while $a_{n+2} \approx -\frac12$, since

\begin{align}
p_0(x) &= A\,\exp\left(\sum_{i=1}^{n+2} a_iG_i(x)\right) \\
&= A\,\exp\left(a_{n+2}x^2 + a_{n+1}x + \sum_{i=1}^n a_iG_i(x)\right) \\
& \approx \frac{1}{\sqrt{2\pi}}\exp\left( -\frac12x^2 \right) = \phi(x)
\end{align}

Therefore, writing $\delta(x) = \sum_{i=1}^n a_iG_i(x) \approx 0$ and using $e^{\delta} \approx 1 + \delta$, we can expand $p_0$ to first order (this is related to Edgeworth expansions, more on this later) as

$$p_0(x) \approx \phi(x)\left(1 + \sum_{i=1}^nc_iG_i(x)\right)$$
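As a small numerical sketch of this final approximation (the gamma-distributed source, the truncation at two correction terms, and the names `p0`, `phi`, `G` are my illustrative choices, not the paper's): since the constraints say $\mathrm{E}_p[G_i(X)] = c_i$, the coefficients can be estimated as sample means of $G_i$, and the density approximation assembled directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative mildly non-Gaussian data: a standardized gamma sample
# (an assumption for this demo, zero mean and unit variance by construction)
raw = rng.gamma(shape=20.0, scale=1.0, size=200_000)
x = (raw - raw.mean()) / raw.std()

phi = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)

# Two orthonormal polynomials beyond x and x^2 (normalized probabilists'
# Hermite polynomials), playing the role of the G_i
G = [lambda t: (t**3 - 3 * t) / np.sqrt(6),
     lambda t: (t**4 - 6 * t**2 + 3) / np.sqrt(24)]

# c_i = E[G_i(X)], estimated from the sample
c = [g(x).mean() for g in G]

def p0(t):
    """First-order approximation phi(t) * (1 + sum_i c_i G_i(t))."""
    return phi(t) * (1 + sum(ci * g(t) for ci, g in zip(c, G)))
```

Because $\mathrm{E}_\phi[G_i] = 0$ for these $G_i$, the correction terms integrate to zero against $\phi$ and `p0` still integrates to one, whatever the estimated $c_i$; the price of the first-order truncation is that `p0` can dip slightly below zero in the far tails.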