
The Newton-Schulz iteration for matrix inversion

The Newton-Schulz method is well-known, and the proof of convergence is widely available on the internet. Yet the derivation of the method itself is more obscure. Here it is:

We seek the zero of \( f: \mathbb{R}^{n \times n} \rightarrow \mathbb{R}^{n \times n}\), defined as follows:
\( f(X) = X^{-1} - A\).
Its unique zero is \( X = A^{-1} \), the matrix we want.

The derivative of \(f\) at \(X\) is a linear map; applied to a matrix \(B\), it acts as follows:
\[ \begin{array}{l}
f'(X)[B] = -X^{-1} B X^{-1}.
\end{array} \]
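
As a quick sanity check (not part of the original derivation), here is a small numpy sketch that compares this formula against a finite-difference approximation; the matrix size, random test point, and step size are arbitrary illustrative choices:

```python
import numpy as np

# Sanity check: compare f'(X)[B] = -X^{-1} B X^{-1} against a finite difference.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, n)) + n * np.eye(n)  # a well-conditioned test point
B = rng.standard_normal((n, n))                  # an arbitrary direction

def f(X):
    return np.linalg.inv(X) - A

eps = 1e-6
finite_diff = (f(X + eps * B) - f(X)) / eps
analytic = -np.linalg.inv(X) @ B @ np.linalg.inv(X)
print(np.max(np.abs(finite_diff - analytic)))  # small, on the order of eps
```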

We can then show that the inverse of this derivative, \( f'^{-1} \) at \( X \), acts on a matrix \( B \) as follows:
\[ \begin{array}{l}
f'^{-1}(X)[B] = -X B X.
\end{array} \]

To see this, notice that applying \( f'(X) \) and then \( f'^{-1}(X) \) recovers \( B \):
\[ \begin{array}{l}
f'^{-1}(X)\Big[f'(X)[B]\Big] \\
= -X \Big[-X^{-1} B X^{-1}\Big] X \\
= B.
\end{array} \]
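
The same identity can be checked numerically. The sketch below (with arbitrarily chosen random \(X\) and \(B\)) applies \( f'(X) \) and then \( f'^{-1}(X) \) and confirms that \(B\) comes back:

```python
import numpy as np

# Numerical version of the algebra above: apply f'(X), then f'^{-1}(X),
# and confirm that B is recovered. X and B are arbitrary test matrices.
rng = np.random.default_rng(1)
n = 4
X = rng.standard_normal((n, n)) + n * np.eye(n)
B = rng.standard_normal((n, n))

Xinv = np.linalg.inv(X)
fprime_B = -Xinv @ B @ Xinv            # f'(X)[B]
recovered = -X @ fprime_B @ X          # f'^{-1}(X)[ f'(X)[B] ]
print(np.max(np.abs(recovered - B)))   # close to machine precision
```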

Newton's method for root finding computes, at each iteration:
\[ \begin{array}{l}
X_{t+1} \\
= X_t - f'^{-1}(X_t)\Big[f(X_t)\Big]\\
= X_t - f'^{-1}(X_t)\Big[X_t^{-1} - A\Big] \\
= X_t + X_t\Big[X_t^{-1} - A\Big] X_t \\
= X_t - \Big[-X_t + X_t A X_t\Big]\\
= 2 X_t - X_t A X_t.
\end{array} \]
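
For concreteness, here is a minimal numpy sketch of the resulting iteration. The initialization \( X_0 = A^\top / (\|A\|_1 \|A\|_\infty) \) is one standard choice that guarantees convergence for nonsingular \(A\); it is an assumption added here, not part of the derivation above:

```python
import numpy as np

def newton_schulz_inverse(A, num_iters=30):
    """Approximate A^{-1} via the iteration X_{t+1} = 2 X_t - X_t A X_t."""
    # One standard initialization that guarantees convergence for nonsingular A
    # (an added assumption, not part of the derivation above).
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    for _ in range(num_iters):
        X = 2 * X - X @ A @ X
    return X

# Example usage on a well-conditioned random matrix.
rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5)) + 5 * np.eye(5)
X = newton_schulz_inverse(A)
print(np.max(np.abs(X @ A - np.eye(5))))  # close to machine precision
```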


This was originally published here: https://calvinmccarter.wordpress.com/2021/11/18/the-newton-schulz-iteration-for-matrix-inversion/

#ML