# Shae's Ramblings

Stuff I find slightly meaningful and close to achievable

## Error propagation in the Logit-Normal family

I recently came across the problem of computing posteriors and priors according to basic sum-product rules while having uncertainty on them. Suppose you have four probabilities $p,q,r,k$ and all of them have some level of uncertainty that's framed as some sort of $\text{LogitNormal}(\mu,\sigma)$.

As each of those variables are outputs of different regressions, it seems only fair to give them each parameters $(\mu_s, \sigma_s)$ with $s\in\{p,q,r,k\}$.

Another twist is that I am given an Odds-Ratio as a way to estimate the uncertainty of $p,q$ through $p/q$.

In my case I found it simpler to express all of them as gaussians $Z_s$ compressed through a logit:

$$p/q = \dfrac{\text{logit}(Z_p)}{\text{logit}(Z_q)} = \dfrac{1+\exp(-Z_q)}{1+\exp(-Z_p)} = \exp\left(Z_\text{OR}\right)$$

I also know that $k = rq + (1-r)p$ because $p,q$ are conditional distributions.

This seems very abstract, but just know I have specific values for some of these quantities: I know that $\mu_\text{OR} = 5.5$ and I have values for the confidence interval $\text{OR}\in (4.1, 7.28)$.
Let's estimate the mean $\mu_\star$ and variance $\sigma_\star$ of the variable $Z_\text{OR}$.

## From Normal CI's to LogNormal CI's

As it is laid out in this stack exchange post,
if your variable $X\sim\text{Normal}(\mu, \sigma)$ and $Y = \exp X$, a $95\%$ CI for $Y$ is obtained by solving

$$\mathbb{P}(a\leq Y \leq b) \triangleq \Phi\left(\dfrac{\log(b)-\mu}{\sigma}\right) – \Phi\left(\dfrac{\log(a)-\mu}{\sigma}\right) = 0.95$$

With the additional knowledge that the normal CI's are attained at $2\Phi(1.96) – 1$ I am left solving the equations

$$4.1 = \exp( \mu\,-\,1.96\sigma)\qquad\text{and}\qquad 7.28 = \exp( \mu + 1.96\sigma)$$

Hence
$$\mu_\star = \dfrac{\log(a) + \log(b)}{2}\qquad\text{and}\qquad\sigma_\star = \dfrac{\log(b) – \log(a)}{2\cdot 1.96}$$

Which yields $\mu_\star = 0.74269$ and $\sigma_\star = 0.0609$

## Getting the distribution of every variable

Now we can guess the distribution of $p,q$ because

$$1+\exp(-Z_q) = \exp\left(Z_\text{OR}\right) + \exp\left(Z_\text{OR}\right)\exp(-Z_p)$$

So I'm back at estimating the distribution of sums/multiplications of a bunch of $\text{LogNormal}$ RVs. According to the wikipedia page

### Sum of Log-Normals

If $X,Y$ are log-normal then a reasonable approximation of $S \triangleq X+Y$ is given by a log-normal with parameters

$$\sigma_S^2 = \dfrac{\exp\left(2\mu_X + \sigma_X^2/2\right)(\exp(\sigma_X^2)-1) + \exp\left(2\mu_Y + \sigma_Y^2/2\right)(\exp(\sigma_Y^2)-1) + 1}{\left[\exp\left(2\mu_X + \sigma_X^2/2\right) + \exp\left(2\mu_Y + \sigma_Y^2/2\right)\right]^2}$$

and $\mu_S = \log\left[\exp\left(2\mu_X + \sigma_X^2/2\right) + \exp\left(2\mu_Y + \sigma_Y^2/2\right)\right] – \sigma_S^2/2$

### Product of Log-Normals

When you have to compute $P = XY$ it's much easier, it's just a $\text{LogNormal}\left(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2\right)$

### Wrapping everything up

It is conceptually easy to solve for our parameters, but doing the computations requires layers upon layers of tedious arithmetics.

Thankfully I have implemented this ugly beast in python. All it required was a symbolic expressions simplifier and some well-executed implementation of a nonlinear optimization algorithm called BFGS.

Here's a piece of the code:

    mu_q, mu_outcome, mu_condition, mu_or = sp.symbols(
'mu_q mu_outcome mu_condition mu_or'
)
sigma_q, sigma_outcome, sigma_condition, sigma_or = sp.symbols(
'sigma_q sigma_outcome sigma_condition sigma_or', positive=True
)

params_q = (mu_q, sigma_q)
params_outcome = (mu_outcome, sigma_outcome)
params_condition = (mu_condition, sigma_condition)
params_or = (mu_or, sigma_or)

mu_lhs, sigma_lhs = sum_lognormals(
params_outcome, prod_lognormals(
params_q, params_condition,
symbolic=True
),
symbolic=True
)
mu_rhs, sigma_rhs = sum_lognormals(
params_q,
prod_lognormals(
params_q, params_or, params_condition,
symbolic=True
),
symbolic=True
)

return sp.simplify(mu_lhs - mu_rhs), sp.simplify(sigma_lhs - sigma_rhs)



### The results !

If we truncate to 4 decimal places, our trusty algorithm finds that if

• The condition $k = p(1-r) + rq$ holds,
• The condition $p = \text{OR}\cdot q$ holds for some lognormal odds-ratio $\text{OR}$
• Each $k,p,q,r$ are lognormals

Then the values

• $\text{OR}\in (4.1, 7.28)$
• $k\in (0.1, 0.2)$
• $r\in (0.004, 0.013)$

imply that

• $\mu_k = -4.2585,\quad\sigma_k = 0.21238$
• $\mu_r = -4.9321,\quad \sigma_r = 0.2769$
• $\mu_q = -4.3583,\quad \sigma_q = 0.4212$
• finally $\mu_p = \mu_q + \mu_\star,\quad \sigma_p = \sqrt{\sigma_q + \sigma_\star}$

One last twist: let's not forget we didn't really have the expression $\exp(Z_p)/\exp(Z_q)$, but rather the expression $\dfrac{1+\exp(-Z_p)}{1+\exp(-Z_q)}$

This is easily remedied according the so-called Three-parameter log-normal distribution reparametrization :

• $\mu_p \gets (-\mu_p\;-\;1)\;,\quad \sigma_p \gets \sigma_p$
• $\mu_q \gets (-\mu_q\;-\;1)\;,\quad \sigma_q \gets \sigma_q$

### Let's summarize

Variable name Log-Normal mean $\mu$ Log-Normal std $\sigma$
$r\in (0.004, 0.013)$ $\mu_r = -4.9321$ $\sigma_r = 0.2769$
$k \in (0.01, 0.02)$ $\mu_k = -4.2585$ $\sigma_k = 0.21238$
$q = ?$ $\mu_q = 3.3583$ $\sigma_q = 0.4212$
$p = \text{OR}\cdot q$ $\mu_p = 2.61561$ $\sigma_p = 0.42558$