
Element-wise derivative of matrix inverse

This short blog post is about a formula that is useful in statistics, machine learning and control theory. In these fields, one often has a matrix $\mathbf{A}$ which is parametrized element-wise by a scalar $\theta$, so that each matrix entry is a scalar function of the parameter:

$$
\mathbf{A}(\theta) =
\begin{bmatrix}
a_{11}(\theta) & a_{12}(\theta) & \ldots & a_{1j}(\theta) & \ldots & a_{1n}(\theta) \\
a_{21}(\theta) & a_{22}(\theta) & \ldots & a_{2j}(\theta) & \ldots & a_{2n}(\theta) \\
\vdots & \vdots & \ddots & \vdots & & \vdots \\
a_{i1}(\theta) & a_{i2}(\theta) & \ldots & a_{ij}(\theta) & \ldots & a_{in}(\theta) \\
\vdots & \vdots & & \vdots & \ddots & \vdots \\
a_{n1}(\theta) & a_{n2}(\theta) & \ldots & a_{nj}(\theta) & \ldots & a_{nn}(\theta)
\end{bmatrix}
$$

We then naturally define the derivative $\dfrac{\partial \mathbf{A}}{\partial \theta}$ as the matrix with entries $\left[\dfrac{\partial a_{ij}}{\partial \theta} \right]_{ij}$.
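To make this definition concrete, here is a small numerical sketch (the particular $2 \times 2$ matrix is a hypothetical example of my own choosing): we write down a matrix of scalar functions of $\theta$, its element-wise derivative, and check the derivative against a central finite difference.

```python
import numpy as np

# Hypothetical example matrix A(theta): each entry is a scalar function of theta.
def A(theta):
    return np.array([[np.cos(theta), theta**2],
                     [np.exp(theta), 1.0]])

# Element-wise derivative dA/dtheta: differentiate each entry separately.
def dA(theta):
    return np.array([[-np.sin(theta), 2 * theta],
                     [np.exp(theta), 0.0]])

theta, h = 0.7, 1e-6
# Central finite difference, applied entry by entry, should match dA(theta).
fd = (A(theta + h) - A(theta - h)) / (2 * h)
print(np.max(np.abs(fd - dA(theta))))
```

The printed maximum deviation is tiny, confirming that differentiating a parametrized matrix element-wise is nothing more than ordinary scalar differentiation applied entry by entry.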
Assume that $\mathbf{A}^{-1}$ exists for a specific value of $\theta$. To find the derivative $\dfrac{\partial \mathbf{A}^{-1}}{\partial \theta}$ of the inverse, we first differentiate each side of the equality

$$
\mathbf{A}^{-1}\mathbf{A} = \mathbf{A}\mathbf{A}^{-1} = \mathbf{I}_n
$$

First, however, we need to figure out the derivative of a product $\mathbf{A}\mathbf{B}$ of two matrices.

Derivative of a product of matrices

Suppose both $\mathbf{A}$ and $\mathbf{B}$ depend on the parameter $\theta$. The $ij$-th entry of the product $\mathbf{A}\mathbf{B}$ is:

$$
\left[\mathbf{A}\mathbf{B}\right]_{ij} = \sum_{k=1}^n a_{ik}b_{kj}
$$

We therefore only need to differentiate this expression for every entry of the product $\mathbf{A}\mathbf{B}$. Each entry is a sum of products of scalar functions, so the ordinary product rule applies:

$$
\begin{aligned}
\left[\dfrac{\partial}{\partial \theta}\mathbf{A}\mathbf{B}\right]_{ij} &= \dfrac{\partial}{\partial \theta}\sum_{k=1}^n a_{ik}b_{kj} \\
&= \sum_{k=1}^n \dfrac{\partial}{\partial \theta}(a_{ik}b_{kj}) \\
&= \sum_{k=1}^n \left(\dfrac{\partial a_{ik}}{\partial \theta}b_{kj} + a_{ik}\dfrac{\partial b_{kj}}{\partial \theta}\right) \\
&= \sum_{k=1}^n \dfrac{\partial a_{ik}}{\partial \theta}b_{kj} + \sum_{p=1}^n a_{ip}\dfrac{\partial b_{pj}}{\partial \theta} \\
&= \left[\dfrac{\partial \mathbf{A}}{\partial \theta}\mathbf{B} + \mathbf{A}\dfrac{\partial \mathbf{B}}{\partial \theta}\right]_{ij}
\end{aligned}
$$
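The product rule above can be sanity-checked numerically. In this sketch I pick matrices that depend linearly on $\theta$ (an arbitrary choice, made so the derivatives $\partial\mathbf{A}/\partial\theta$ and $\partial\mathbf{B}/\partial\theta$ are known in closed form) and compare the formula against a finite difference of the product:

```python
import numpy as np

rng = np.random.default_rng(0)
# Arbitrary fixed coefficient matrices for this check.
A0, A1 = rng.standard_normal((2, 3, 3))
B0, B1 = rng.standard_normal((2, 3, 3))

A = lambda t: A0 + t * A1  # dA/dtheta = A1
B = lambda t: B0 + t * B1  # dB/dtheta = B1

theta, h = 0.3, 1e-6
# Central finite difference of the matrix product A(theta) B(theta).
fd = (A(theta + h) @ B(theta + h) - A(theta - h) @ B(theta - h)) / (2 * h)
# Product rule: (dA/dtheta) B + A (dB/dtheta).
exact = A1 @ B(theta) + A(theta) @ B1
print(np.max(np.abs(fd - exact)))
```

The two matrices agree to numerical precision, as the derivation predicts.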

Final derivation

We can apply this product rule to $\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}_n$. Since the identity matrix does not depend on $\theta$, its derivative is the zero matrix, which gives the equality:

$$
\dfrac{\partial \mathbf{A}}{\partial \theta}\mathbf{A}^{-1} + \mathbf{A}\dfrac{\partial \mathbf{A}^{-1}}{\partial \theta} = \mathbf{0}_n
$$

Multiplying both sides on the left by $\mathbf{A}^{-1}$ and rearranging yields the formula:

$$
\dfrac{\partial \mathbf{A}^{-1}}{\partial \theta} = -\mathbf{A}^{-1}\dfrac{\partial \mathbf{A}}{\partial \theta}\mathbf{A}^{-1}
$$
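As a final sanity check, the formula can be verified numerically. Here $\mathbf{A}(\theta)$ again depends linearly on $\theta$ (an arbitrary choice), with a diagonal shift added so the matrix stays invertible near the chosen $\theta$:

```python
import numpy as np

rng = np.random.default_rng(1)
A0, A1 = rng.standard_normal((2, 4, 4))
# Diagonal shift keeps A(theta) comfortably invertible near theta = 0.2.
A = lambda t: A0 + t * A1 + 5.0 * np.eye(4)  # dA/dtheta = A1

theta, h = 0.2, 1e-6
Ainv = np.linalg.inv(A(theta))
# The formula derived above: d(A^-1)/dtheta = -A^-1 (dA/dtheta) A^-1.
exact = -Ainv @ A1 @ Ainv
# Central finite difference of the inverse itself.
fd = (np.linalg.inv(A(theta + h)) - np.linalg.inv(A(theta - h))) / (2 * h)
print(np.max(np.abs(fd - exact)))
```

The finite difference of $\mathbf{A}^{-1}(\theta)$ matches $-\mathbf{A}^{-1}\dfrac{\partial \mathbf{A}}{\partial \theta}\mathbf{A}^{-1}$ to numerical precision.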