machine learning, computer science, and more

Computing the product of Gaussian distributions

Suppose you have \(K\) multivariate Gaussian distributions, each of dimensionality \(N\). It turns out that the product of these distributions, after normalization, is also multivariate Gaussian. What is the algorithmic complexity to compute this product?

Let's first assume that each \(k\) of the \( K\) Gaussian distributions is parameterized by mean \(\mu_k\) and covariance \(\Sigma_k\). The product is proportional to a Gaussian with mean \(\mu\) and covariance \(\Sigma\), where

\[\begin{aligned} \mu =& \Big(\sum^K_{k=1}\Sigma_k^{-1}\Big)^{-1} \Big(\sum^K_{k=1} \Sigma_k^{-1} \mu_k\Big), \\
\Sigma =& \Big(\sum^K_{k=1}\Sigma_k^{-1}\Big)^{-1}. \end{aligned} \]

Assuming we have memory to store and reuse all intermediate results, we will need to perform \(K+1\) matrix inversions and to solve \(K+1\) linear systems. Thus, the runtime complexity is \(O(KN^3 + KN^2)=O(KN^3)\).

Now, let's instead assume that each \(k\) of the \( K\) Gaussian distributions is parameterized by mean \(\mu_k\) and precision (i.e. inverse covariance) \(\Lambda_k\). The product is proportional to a Gaussian with mean \(\mu\) and precision \(\Lambda,\) where

\[\begin{aligned} \mu =& \Big(\sum^K_{k=1}\Lambda_k\Big)^{-1} \Big(\sum^K_{k=1} \Lambda_k \mu_k\Big), \\
\Lambda =& \Big(\sum^K_{k=1}\Lambda_k\Big). \end{aligned} \]

Here, we only need to perform \(K\) matrix-vector products and solve \(1\) linear system. So the runtime complexity is \(O(KN^2)\). But this analysis understates the likely speedup from using the precision matrices rather than covariance matrices. Precision matrices are often sparse but with dense inverses; in such cases this latter approach is faster and requires much less memory.

#ML