# Shae's Ramblings

Stuff I find slightly meaningful and close to achievable

## Bayesian linear regression in three languages

We have already covered the basics of bayesian linear regression in a previous post. Let's now implement our own regression routine in three languages !

## Python

### From prior to posterior

We only have to implement two routines: one for the posterior mean and variance, and one for the predictive distribution. The code for the posterior is pretty straightfoward when implemented naïvely:

def posterior(sigma_noise: Union[float, jnp.ndarray],
weight_covariance: Union[List[List[float]], jnp.ndarray],
X: jnp.ndarray,
y: jnp.ndarray) -> Tuple[jnp.ndarray, jnp.ndarray]:

'''
Returns the posterior distribution on the weights of the linear model y ~ x
'''
Q_inv = weight_covariance + X.T @ X / sigma_noise ** 2
Q = jnp.linalg.inv(Q_inv)
mean = Q @ X.T @ y / sigma_noise ** 2
return mean, Q


Let us visualize priors and posteriors for $3$ different random initializations of the weight prior covariance $\pmb{\Sigma}$: the three priors are plotted below:

After computing each posterior, we observe that they all neatly converge to similar distributions:

so that the prior $\pmb{\Sigma}$ is not crucial as long as it is not extremely biased.

## Predictive distribution

Recall that if the posterior is $\mathcal{N}(\pmb{\mu}, \mathbf{Q})$ then the predictive distribution $f_\star \mid \mathbf{x}_\star$ is $\mathcal{N}(\pmb{\mu}^\mathsf{T}\mathbf{x}_\star, \mathbf{x}_\star^\mathsf{T}\mathbf{Q}\mathbf{x}_\star)$.
Because our input is two-dimensional, we cannot easily plot the predictive distribution as a function of the test input $\mathbf{x}_\star$. Instead, we'll plot the distribution of the output as $\mathbf{x}_\star$ changes on the unit circle: $\mathbf{x}_\star = \left[\cos\theta\;\sin\theta\right]$ for $\theta\in [-\pi, \pi]$.

The results for our three initializations are illustrated below

notice that although our model is linear in $\mathbf{x}$, it is not linear as a function of $\theta$. The three models are similar as expected.

## Julia

The code snippet to implement the posterior is similar:

using LinearAlgebra

function posterior(
sigma_noise::Float64,
weight_covariance::Matrix{Float64},
X::Matrix{Float64},
y::Matrix{Float64}
)::Tuple{Matrix{Float64}, Matrix{Float64}}
"""
Computes posterior mean and covariance for the
linear regression weights.
"""
Q_inv = weight_covariance + transpose(X) * X / ( sigma_noise ^ 2 )
Q = inv(Q_inv)
mean = Q * transpose(X) * y / ( sigma_noise ^ 2 )

return mean, Q
end


## R

In R, we can update the posterior through a function:

posterior <- function(
sigma_noise,
weight_covariance,
X,
y
) {
Q_inv <- weight_covariance + t(X) %*% X / (sigma_noise ^ 2)
Q <- solve(Q_inv)
mean <- Q %*% t(X) %*% y / (sigma_noise ^ 2)

results <- list("mean" = mean, "cov" = Q)
return(results)
}