
Mapping between two Gaussians using optimal transport and the KL-divergence

Suppose you have two multivariate Gaussian distributions $p_1$ and $p_2$, parameterized as $\mathcal{N}(\mu_1, \Sigma_1)$ and $\mathcal{N}(\mu_2, \Sigma_2)$. How do you linearly transform $x \sim p_1$ so that the transformed vectors have distribution $p_2$? Is there an optimal way to do this? The field of optimal transport (OT) provides an answer. If we choose the transport cost as the type-2 Wasserstein distance between probability measures, then we apply the following linear function:

$$T(x) = \mu_2 + A (x - \mu_1),$$

where

$$A = \Sigma_1^{-1/2} \left( \Sigma_1^{1/2} \Sigma_2 \Sigma_1^{1/2} \right)^{1/2} \Sigma_1^{-1/2}.$$

For more details, see Remark 2.31 in “Computational Optimal Transport” by Peyré & Cuturi (available on arXiv here).
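
If you want to play with this, here is a minimal numpy/scipy sketch of the map. The helper name `gaussian_ot_map` and the random toy Gaussians are my own illustration, not part of the original derivation:

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_ot_map(mu1, Sigma1, mu2, Sigma2):
    """W2-optimal linear map from N(mu1, Sigma1) to N(mu2, Sigma2):
    T(x) = mu2 + A @ (x - mu1)."""
    S1_half = sqrtm(Sigma1)                        # Sigma_1^{1/2}
    S1_half_inv = np.linalg.inv(S1_half)           # Sigma_1^{-1/2}
    inner = sqrtm(S1_half @ Sigma2 @ S1_half)      # (Sigma_1^{1/2} Sigma_2 Sigma_1^{1/2})^{1/2}
    A = S1_half_inv @ inner @ S1_half_inv
    return np.real(A)                              # sqrtm can leave a tiny imaginary part

# Toy example: two random Gaussians in R^3 (numbers chosen only for illustration).
rng = np.random.default_rng(0)
d = 3
mu1, mu2 = rng.normal(size=d), rng.normal(size=d)
B1, B2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
Sigma1, Sigma2 = B1 @ B1.T + np.eye(d), B2 @ B2.T + np.eye(d)

A = gaussian_ot_map(mu1, Sigma1, mu2, Sigma2)
x = rng.multivariate_normal(mu1, Sigma1, size=100_000)  # samples from p_1
y = mu2 + (x - mu1) @ A.T                               # transported samples T(x)
```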

But we might instead want to find the transformation $T(x) = A x + b$ which minimizes the Kullback-Leibler divergence between $p_2$ and the transformed $p_1$. We will use the fact that the transformed vector will also come from a Gaussian distribution, with mean and covariance given by

$$\mathbb{E}[A x + b] = A \mu_1 + b, \qquad \mathrm{Cov}[A x + b] = A \Sigma_1 A^\top.$$
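You can see this fact concretely with a tiny standalone simulation; the particular values of `A0`, `b0`, `mu0`, `Sigma0` below are arbitrary numbers chosen only for illustration:

```python
import numpy as np

# Standalone check that y = A0 x + b0, with x ~ N(mu0, Sigma0),
# has mean A0 mu0 + b0 and covariance A0 Sigma0 A0^T.
rng = np.random.default_rng(1)
mu0 = np.array([1.0, -2.0])
Sigma0 = np.array([[2.0, 0.5],
                   [0.5, 1.0]])
A0 = np.array([[0.3, -1.2],
               [0.8,  0.4]])
b0 = np.array([0.5, 2.0])

x = rng.multivariate_normal(mu0, Sigma0, size=200_000)
y = x @ A0.T + b0

print(y.mean(axis=0), A0 @ mu0 + b0)                          # empirical vs. exact mean
print(np.cov(y, rowvar=False), A0 @ Sigma0 @ A0.T, sep="\n")  # empirical vs. exact covariance
```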
We then set up an optimization problem:

$$\min_{A, \, b} \; D_{\mathrm{KL}}\!\left( \mathcal{N}(\mu_2, \Sigma_2) \,\big\|\, \mathcal{N}(A \mu_1 + b, \; A \Sigma_1 A^\top) \right).$$
This leads to the following nasty-looking objective:

$$\frac{1}{2} \left[ \operatorname{tr}\!\left( (A \Sigma_1 A^\top)^{-1} \Sigma_2 \right) + (A \mu_1 + b - \mu_2)^\top (A \Sigma_1 A^\top)^{-1} (A \mu_1 + b - \mu_2) - d + \ln \frac{\det(A \Sigma_1 A^\top)}{\det \Sigma_2} \right].$$
But we don't actually need to work through all this algebra, because the optimal transport solution also minimizes the KL-divergence. The KL-divergence reaches a minimum of 0 when its two arguments are equal, so we only need to verify that the optimal transport transformation above produces samples with distribution $p_2 = \mathcal{N}(\mu_2, \Sigma_2)$.

First checking the mean, we verify that $\mathbb{E}[T(x)] = \mu_2 + A(\mathbb{E}[x] - \mu_1) = \mu_2 + A(\mu_1 - \mu_1) = \mu_2$. Next, checking the covariance, we have

$$\begin{aligned}
\mathrm{Cov}[T(x)] = A \Sigma_1 A^\top
&= \Sigma_1^{-1/2} \left( \Sigma_1^{1/2} \Sigma_2 \Sigma_1^{1/2} \right)^{1/2} \Sigma_1^{-1/2} \, \Sigma_1 \, \Sigma_1^{-1/2} \left( \Sigma_1^{1/2} \Sigma_2 \Sigma_1^{1/2} \right)^{1/2} \Sigma_1^{-1/2} \\
&= \Sigma_1^{-1/2} \left( \Sigma_1^{1/2} \Sigma_2 \Sigma_1^{1/2} \right)^{1/2} \left( \Sigma_1^{1/2} \Sigma_2 \Sigma_1^{1/2} \right)^{1/2} \Sigma_1^{-1/2} \\
&= \Sigma_1^{-1/2} \, \Sigma_1^{1/2} \Sigma_2 \Sigma_1^{1/2} \, \Sigma_1^{-1/2} \\
&= \Sigma_2 ,
\end{aligned}$$

where we used the fact that $A$ is symmetric, so that $A^\top = A$.

We've verified that $T(x) \sim \mathcal{N}(\mu_2, \Sigma_2)$, which means that our optimal transport solution also gives us the KL-divergence minimizer.
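
Continuing the code sketches above (reusing `gaussian_ot_map`, `gaussian_kl`, and the toy `mu1`, `Sigma1`, `mu2`, `Sigma2` from the first snippet), the same verification can be done numerically:

```python
# Plug the OT map into the KL objective: the transported covariance matches Sigma_2
# and the KL-divergence is zero up to floating-point error.
A = gaussian_ot_map(mu1, Sigma1, mu2, Sigma2)
b = mu2 - A @ mu1                                  # so that A x + b == mu2 + A (x - mu1)

print(np.allclose(A @ Sigma1 @ A.T, Sigma2))       # True
print(gaussian_kl(mu2, Sigma2, A @ mu1 + b, A @ Sigma1 @ A.T))  # ~0, e.g. on the order of 1e-13
```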

I'm using this fact in my ongoing research on domain adaptation under confounding. See the arXiv preprint here.


This was originally published here: https://calvinmccarter.wordpress.com/2022/03/29/mapping-between-two-gaussians-using-optimal-transport-and-the-kl-divergence/

#ML