Stuff I find slightly meaningful and close to achievable

Error propagation in the Logit-Normal family

I recently came across the problem of computing posteriors and priors according to basic sum-product rules while having uncertainty on them. Suppose you have four probabilities \(p,q,r,k\) and all of them have some level of uncertainty that's framed as some sort of \(\text{LogitNormal}(\mu,\sigma)\).

As each of those variables are outputs of different regressions, it seems only fair to give them each parameters \((\mu_s, \sigma_s)\) with \( s\in\{p,q,r,k\}\).

Another twist is that I am given an Odds-Ratio as a way to estimate the uncertainty of \(p,q\) through \(p/q\).

In my case I found it simpler to express all of them as gaussians \(Z_s\) compressed through a logit:

p/q = \dfrac{\text{logit}(Z_p)}{\text{logit}(Z_q)} = \dfrac{1+\exp(-Z_q)}{1+\exp(-Z_p)} = \exp\left(Z_\text{OR}\right)

I also know that \(k = rq + (1-r)p \) because \(p,q\) are conditional distributions.

This seems very abstract, but just know I have specific values for some of these quantities: I know that \(\mu_\text{OR} = 5.5\) and I have values for the confidence interval \(\text{OR}\in (4.1, 7.28)\).
Let's estimate the mean \(\mu_\star\) and variance \(\sigma_\star\) of the variable \(Z_\text{OR}\).

From Normal CI's to LogNormal CI's

As it is laid out in this stack exchange post,
if your variable \(X\sim\text{Normal}(\mu, \sigma)\) and \(Y = \exp X\), a \(95\%\) CI for \(Y\) is obtained by solving

\mathbb{P}(a\leq Y \leq b) \triangleq \Phi\left(\dfrac{\log(b)-\mu}{\sigma}\right) – \Phi\left(\dfrac{\log(a)-\mu}{\sigma}\right) = 0.95

With the additional knowledge that the normal CI's are attained at \(2\Phi(1.96) – 1\) I am left solving the equations

$$ 4.1 = \exp( \mu\,-\,1.96\sigma)\qquad\text{and}\qquad 7.28 = \exp( \mu + 1.96\sigma)$$

$$\mu_\star = \dfrac{\log(a) + \log(b)}{2}\qquad\text{and}\qquad\sigma_\star = \dfrac{\log(b) – \log(a)}{2\cdot 1.96}$$

Which yields \(\mu_\star = 0.74269 \) and \(\sigma_\star = 0.0609\)

Getting the distribution of every variable

Now we can guess the distribution of \(p,q\) because

1+\exp(-Z_q) = \exp\left(Z_\text{OR}\right) + \exp\left(Z_\text{OR}\right)\exp(-Z_p)

So I'm back at estimating the distribution of sums/multiplications of a bunch of \(\text{LogNormal}\) RVs. According to the wikipedia page

Sum of Log-Normals

If \(X,Y\) are log-normal then a reasonable approximation of \(S \triangleq X+Y\) is given by a log-normal with parameters

\sigma_S^2 = \dfrac{\exp\left(2\mu_X + \sigma_X^2/2\right)(\exp(\sigma_X^2)-1) + \exp\left(2\mu_Y + \sigma_Y^2/2\right)(\exp(\sigma_Y^2)-1) + 1}{\left[\exp\left(2\mu_X + \sigma_X^2/2\right) + \exp\left(2\mu_Y + \sigma_Y^2/2\right)\right]^2}

and \(\mu_S = \log\left[\exp\left(2\mu_X + \sigma_X^2/2\right) + \exp\left(2\mu_Y + \sigma_Y^2/2\right)\right] – \sigma_S^2/2\)

Product of Log-Normals

When you have to compute \( P = XY\) it's much easier, it's just a \(\text{LogNormal}\left(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2\right)\)

Wrapping everything up

It is conceptually easy to solve for our parameters, but doing the computations requires layers upon layers of tedious arithmetics.

Thankfully I have implemented this ugly beast in python. All it required was a symbolic expressions simplifier and some well-executed implementation of a nonlinear optimization algorithm called BFGS.

Here's a piece of the code:

    mu_q, mu_outcome, mu_condition, mu_or = sp.symbols(
        'mu_q mu_outcome mu_condition mu_or'
    sigma_q, sigma_outcome, sigma_condition, sigma_or = sp.symbols(
        'sigma_q sigma_outcome sigma_condition sigma_or', positive=True
    params_q = (mu_q, sigma_q)
    params_outcome = (mu_outcome, sigma_outcome)
    params_condition = (mu_condition, sigma_condition)
    params_or = (mu_or, sigma_or)

    mu_lhs, sigma_lhs = sum_lognormals(
        params_outcome, prod_lognormals(
            params_q, params_condition,
    mu_rhs, sigma_rhs = sum_lognormals(
            params_q, params_or, params_condition,

    return sp.simplify(mu_lhs - mu_rhs), sp.simplify(sigma_lhs - sigma_rhs)

The results !

If we truncate to 4 decimal places, our trusty algorithm finds that if

Then the values

imply that

One last twist: let's not forget we didn't really have the expression \(\exp(Z_p)/\exp(Z_q)\), but rather the expression \(\dfrac{1+\exp(-Z_p)}{1+\exp(-Z_q)}\)

This is easily remedied according the so-called Three-parameter log-normal distribution reparametrization :

Let's summarize

Variable name Log-Normal mean \(\mu\) Log-Normal std \(\sigma\)
\(r\in (0.004, 0.013) \) \(\mu_r = -4.9321\) \(\sigma_r = 0.2769\)
\(k \in (0.01, 0.02) \) \(\mu_k = -4.2585\) \(\sigma_k = 0.21238\)
\(q = ?\) \(\mu_q = 3.3583\) \(\sigma_q = 0.4212\)
\(p = \text{OR}\cdot q\) \(\mu_p = 2.61561\) \(\sigma_p = 0.42558\)