Bayesian linear regression in three languages

We have already covered the basics of Bayesian linear regression in a previous post. Let's now implement our own regression routine in three languages!

Python

From prior to posterior

We only have to implement two routines: one for the posterior mean and covariance, and one for the predictive distribution. The code for the posterior is pretty straightforward when implemented naïvely:

from typing import List, Tuple, Union

import jax.numpy as jnp


def posterior(sigma_noise: Union[float, jnp.ndarray],
              weight_covariance: Union[List[List[float]], jnp.ndarray],
              X: jnp.ndarray,
              y: jnp.ndarray) -> Tuple[jnp.ndarray, jnp.ndarray]:
    '''
    Returns the posterior distribution on the weights of the linear model y ~ x.
    '''
    weight_covariance = jnp.asarray(weight_covariance)
    # Posterior precision: inverse of the prior covariance plus the scaled Gram matrix.
    Q_inv = jnp.linalg.inv(weight_covariance) + X.T @ X / sigma_noise ** 2
    Q = jnp.linalg.inv(Q_inv)
    mean = Q @ X.T @ y / sigma_noise ** 2
    return mean, Q
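
As a quick sanity check, here is one way the routine could be exercised on synthetic two-dimensional data; this is only a sketch, and the names (true_w, sigma, etc.) are illustrative rather than taken from the original post:

import jax.numpy as jnp
from jax import random

key_x, key_eps = random.split(random.PRNGKey(0))

true_w = jnp.array([1.5, -0.5])            # weights we try to recover
sigma = 0.3                                # observation noise standard deviation
X = random.normal(key_x, (100, 2))         # 100 two-dimensional inputs
y = X @ true_w + sigma * random.normal(key_eps, (100,))

mu, Q = posterior(sigma, jnp.eye(2), X, y)
# mu should be close to true_w, and Q shrinks as more data is observed.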

Let us visualize priors and posteriors for three different random initializations of the weight prior covariance \(\pmb{\Sigma}\). The three priors are plotted below:

[Figure: the three weight priors (Python)]

After computing each posterior, we observe that they all neatly converge to similar distributions:

[Figure: the corresponding weight posteriors (Python)]

The choice of the prior covariance \(\pmb{\Sigma}\) is therefore not crucial, as long as it is not extremely biased.
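
For illustration, random positive-definite prior covariances could be drawn as \(\mathbf{A}\mathbf{A}^\mathsf{T}\) plus a small jitter and fed to the routine above; this is only a sketch of the experiment, not necessarily the exact recipe behind the plots:

from jax import random
import jax.numpy as jnp

# Reusing sigma, X and y from the synthetic-data snippet above.
for k in random.split(random.PRNGKey(1), 3):
    A = random.normal(k, (2, 2))
    prior_cov = A @ A.T + 1e-3 * jnp.eye(2)   # a random positive-definite prior covariance
    mu, Q = posterior(sigma, prior_cov, X, y)
    print(mu)                                 # the three posterior means remain close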

Predictive distribution

Recall that if the posterior is \(\mathcal{N}(\pmb{\mu}, \mathbf{Q})\) then the predictive distribution \(f_\star \mid \mathbf{x}_\star\) is \(\mathcal{N}(\pmb{\mu}^\mathsf{T}\mathbf{x}_\star, \mathbf{x}_\star^\mathsf{T}\mathbf{Q}\mathbf{x}_\star)\). Because our input is two-dimensional, we cannot easily plot the predictive distribution as a function of the test input \(\mathbf{x}_\star\). Instead, we'll plot the distribution of the output as \(\mathbf{x}_\star\) changes on the unit circle: \(\mathbf{x}_\star = \left[\cos\theta\;\sin\theta\right]\) for \(\theta\in [-\pi, \pi]\).
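
A minimal sketch of this predictive step and of its evaluation on the unit circle is given below; the function predictive and the variable names are mine, not from the original code:

def predictive(mean: jnp.ndarray,
               Q: jnp.ndarray,
               x_star: jnp.ndarray) -> Tuple[jnp.ndarray, jnp.ndarray]:
    '''
    Mean and variance of f_star | x_star when the weight posterior is N(mean, Q).
    '''
    return x_star @ mean, x_star @ Q @ x_star

# Evaluate the predictive distribution on the unit circle,
# reusing mu, Q returned by a call to posterior above.
thetas = jnp.linspace(-jnp.pi, jnp.pi, 200)
X_star = jnp.stack([jnp.cos(thetas), jnp.sin(thetas)], axis=1)   # points on the unit circle
f_means = X_star @ mu                                            # predictive means
f_vars = jnp.einsum('ni,ij,nj->n', X_star, Q, X_star)            # predictive variances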

The results for our three initializations are illustrated below:

[Figure: the predictive distributions over the unit circle (Python)]

Notice that although our model is linear in \(\mathbf{x}\), it is not linear as a function of \(\theta\). As expected, the three models are very similar.

Julia

The code snippet to implement the posterior is similar:

using LinearAlgebra

"""
    posterior(sigma_noise, weight_covariance, X, y)

Computes the posterior mean and covariance for the
linear regression weights.
"""
function posterior(
        sigma_noise::Float64,
        weight_covariance::Matrix{Float64},
        X::Matrix{Float64},
        y::Matrix{Float64}
        )::Tuple{Matrix{Float64}, Matrix{Float64}}
    # Posterior precision: inverse of the prior covariance plus the scaled Gram matrix.
    Q_inv = inv(weight_covariance) + transpose(X) * X / (sigma_noise ^ 2)
    Q = inv(Q_inv)
    mean = Q * transpose(X) * y / (sigma_noise ^ 2)

    return mean, Q
end

To be added.

R

In R, we compute the posterior with a similar function:

posterior <- function(
  sigma_noise,
  weight_covariance,
  X,
  y
) {
  # Posterior precision: inverse of the prior covariance plus the scaled Gram matrix.
  Q_inv <- solve(weight_covariance) + t(X) %*% X / (sigma_noise ^ 2)
  Q <- solve(Q_inv)
  mean <- Q %*% t(X) %*% y / (sigma_noise ^ 2)

  results <- list("mean" = mean, "cov" = Q)
  return(results)
}

To be added.