## Error propagation in the Logit-Normal family

I recently came across the problem of computing posteriors and priors according to basic sum-product rules while having uncertainty on them. Suppose you have four probabilities \(p,q,r,k\) and all of them have some level of uncertainty that's framed as some sort of \(\text{LogitNormal}(\mu,\sigma)\).

As each of those variables are outputs of different regressions, it seems only fair to give them each parameters \((\mu_s, \sigma_s)\) with \( s\in\{p,q,r,k\}\).

Another twist is that I am given an Odds-Ratio as a way to estimate the uncertainty of \(p,q\) through \(p/q\).

In my case I found it simpler to express all of them as gaussians \(Z_s\) compressed through a logit:

$$

p/q = \dfrac{\text{logit}(Z_p)}{\text{logit}(Z_q)} = \dfrac{1+\exp(-Z_q)}{1+\exp(-Z_p)} = \exp\left(Z_\text{OR}\right)

$$

I also know that \(k = rq + (1-r)p \) because \(p,q\) are conditional distributions.

This seems very abstract, but just know I have specific values for *some* of these quantities: I know that \(\mu_\text{OR} = 5.5\) and I have values for the confidence interval \(\text{OR}\in (4.1, 7.28)\).

Let's estimate the mean \(\mu_\star\) and variance \(\sigma_\star\) of the variable \(Z_\text{OR}\).

## From Normal CI's to LogNormal CI's

As it is laid out in this stack exchange post,

if your variable \(X\sim\text{Normal}(\mu, \sigma)\) and \(Y = \exp X\), a \(95\%\) CI for \(Y\) is obtained by solving

$$

\mathbb{P}(a\leq Y \leq b) \triangleq \Phi\left(\dfrac{\log(b)-\mu}{\sigma}\right) – \Phi\left(\dfrac{\log(a)-\mu}{\sigma}\right) = 0.95

$$

With the additional knowledge that the normal CI's are attained at \(2\Phi(1.96) – 1\) I am left solving the equations

$$ 4.1 = \exp( \mu\,-\,1.96\sigma)\qquad\text{and}\qquad 7.28 = \exp( \mu + 1.96\sigma)$$

Hence

$$\mu_\star = \dfrac{\log(a) + \log(b)}{2}\qquad\text{and}\qquad\sigma_\star = \dfrac{\log(b) – \log(a)}{2\cdot 1.96}$$

Which yields \(\mu_\star = 0.74269 \) and \(\sigma_\star = 0.0609\)

## Getting the distribution of every variable

Now we can guess the distribution of \(p,q\) because

$$

1+\exp(-Z_q) = \exp\left(Z_\text{OR}\right) + \exp\left(Z_\text{OR}\right)\exp(-Z_p)

$$

So I'm back at estimating the distribution of sums/multiplications of a bunch of \(\text{LogNormal}\) RVs. According to the wikipedia page

### Sum of Log-Normals

If \(X,Y\) are log-normal then a reasonable approximation of \(S \triangleq X+Y\) is given by a log-normal with parameters

$$

\sigma_S^2 = \dfrac{\exp\left(2\mu_X + \sigma_X^2/2\right)(\exp(\sigma_X^2)-1) + \exp\left(2\mu_Y + \sigma_Y^2/2\right)(\exp(\sigma_Y^2)-1) + 1}{\left[\exp\left(2\mu_X + \sigma_X^2/2\right) + \exp\left(2\mu_Y + \sigma_Y^2/2\right)\right]^2}

$$

and \(\mu_S = \log\left[\exp\left(2\mu_X + \sigma_X^2/2\right) + \exp\left(2\mu_Y + \sigma_Y^2/2\right)\right] – \sigma_S^2/2\)

### Product of Log-Normals

When you have to compute \( P = XY\) it's much easier, it's just a \(\text{LogNormal}\left(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2\right)\)

### Wrapping everything up

It is conceptually easy to solve for our parameters, but doing the computations requires layers upon layers of tedious arithmetics.

Thankfully I have implemented this ugly beast in python. All it required was a symbolic expressions simplifier and some well-executed implementation of a nonlinear optimization algorithm called BFGS.

Here's a piece of the code:

```
mu_q, mu_outcome, mu_condition, mu_or = sp.symbols(
'mu_q mu_outcome mu_condition mu_or'
)
sigma_q, sigma_outcome, sigma_condition, sigma_or = sp.symbols(
'sigma_q sigma_outcome sigma_condition sigma_or', positive=True
)
params_q = (mu_q, sigma_q)
params_outcome = (mu_outcome, sigma_outcome)
params_condition = (mu_condition, sigma_condition)
params_or = (mu_or, sigma_or)
mu_lhs, sigma_lhs = sum_lognormals(
params_outcome, prod_lognormals(
params_q, params_condition,
symbolic=True
),
symbolic=True
)
mu_rhs, sigma_rhs = sum_lognormals(
params_q,
prod_lognormals(
params_q, params_or, params_condition,
symbolic=True
),
symbolic=True
)
return sp.simplify(mu_lhs - mu_rhs), sp.simplify(sigma_lhs - sigma_rhs)
```

### The results !

If we truncate to 4 decimal places, our trusty algorithm finds that if

- The condition \( k = p(1-r) + rq \) holds,
- The condition \( p = \text{OR}\cdot q\) holds for some lognormal odds-ratio \(\text{OR}\)
- Each \(k,p,q,r\) are lognormals

Then the values

- \( \text{OR}\in (4.1, 7.28)\)
- \(k\in (0.1, 0.2) \)
- \(r\in (0.004, 0.013)\)

imply that

- \(\mu_k = -4.2585,\quad\sigma_k = 0.21238 \)
- \(\mu_r = -4.9321,\quad \sigma_r = 0.2769\)
- \(\mu_q = -4.3583,\quad \sigma_q = 0.4212\)
- finally \(\mu_p = \mu_q + \mu_\star,\quad \sigma_p = \sqrt{\sigma_q + \sigma_\star}\)

**One last twist:** let's not forget we didn't really have the expression \(\exp(Z_p)/\exp(Z_q)\), but rather the expression \(\dfrac{1+\exp(-Z_p)}{1+\exp(-Z_q)}\)

This is easily remedied according the so-called Three-parameter log-normal distribution reparametrization :

- \( \mu_p \gets (-\mu_p\;-\;1)\;,\quad \sigma_p \gets \sigma_p\)
- \( \mu_q \gets (-\mu_q\;-\;1)\;,\quad \sigma_q \gets \sigma_q\)

### Let's summarize

Variable name | Log-Normal mean \(\mu\) | Log-Normal std \(\sigma\) |
---|---|---|

\(r\in (0.004, 0.013) \) | \(\mu_r = -4.9321\) | \(\sigma_r = 0.2769\) |

\(k \in (0.01, 0.02) \) | \(\mu_k = -4.2585\) | \(\sigma_k = 0.21238\) |

\(q = ?\) | \(\mu_q = 3.3583\) | \(\sigma_q = 0.4212\) |

\(p = \text{OR}\cdot q\) | \(\mu_p = 2.61561\) | \(\sigma_p = 0.42558\) |