Condition number of a matrix: basics

A special kind of norm for \(n\times n\) square matrices \(\mathrm{A}\) is called the subordinate (or induced) norm. If we have a norm \(\lVert\cdot\rVert_\alpha\) on the input space, and a second norm \(\lVert\cdot\rVert_\beta\) on the output space, the subordinate norm is

$$ \lVert\mathrm{A}\rVert_{\alpha, \beta} = \sup_{x \neq 0}\dfrac{\;\lVert\mathrm{A}x\rVert_{\beta}}{\;\lVert x\rVert_\alpha} $$

Submultiplicative property

An important property of these induced norms is that for any third norm \(\lVert\cdot\rVert_{\gamma}\), we have:

$$ \lVert\mathrm{AB}\rVert_{\alpha, \beta} \leq \lVert\mathrm{A}\rVert_{\gamma, \beta} \lVert\mathrm{B}\rVert_{\alpha, \gamma} $$

This is easily proven via the computations

$$ \dfrac{\;\lVert\mathrm{ABx}\rVert_{\beta}}{\;\lVert x\rVert_\alpha} = \dfrac{\;\lVert\mathrm{A}y\rVert_{\beta}}{\;\lVert y\rVert_\gamma} \dfrac{\;\lVert\mathrm{B}x\rVert_{\gamma}}{\;\lVert x\rVert_\gamma} $$ for \(y = \mathrm{B}x\). As each supremum on the right is an upper bound, we get the inequality

$$ \dfrac{\;\lVert\mathrm{ABx}\rVert_{\beta}}{\;\lVert x\rVert_\alpha} \leq \lVert\mathrm{A}\rVert_{\gamma, \beta} \lVert\mathrm{B}\rVert_{\alpha, \gamma} $$

we get the final equality because the supremum is the least upper bound. A special case is when taking \(\alpha = \beta\), which yields that $$\lVert\mathrm{AB}\rVert_{\alpha, \alpha} \leq \lVert\mathrm{A}\rVert_{\alpha, \alpha} \lVert\mathrm{B}\rVert_{\alpha, \alpha} $$ we denote this norm \(\lVert\cdot\rVert_\alpha\).

Condition number

The condition number \(\kappa_\alpha(\mathrm{A})\) of a matrix is the quantity $$\kappa_\alpha(\mathrm{A}) = \lVert\mathrm{A}\rVert_\alpha\lVert\mathrm{A}^{-1}\rVert_\alpha $$ which is larger than \(1\) because $$ 1 = \lVert\mathrm{I}\rVert_\alpha = \lVert\mathrm{A}\mathrm{A}^{-1}\rVert_\alpha \leq \lVert\mathrm{A}\rVert_\alpha\lVert\mathrm{A}^{-1}\rVert_\alpha $$ this number is large when either \(\mathrm{A}\) or its inverse creates large distortions as the input varies on the unit circle.

Stability of a similarity

In linear algebra, a matrix \(\mathrm{B}\) is said to be similar to \(\mathrm{A}\) when there is an invertible transformation \(\mathrm{X}\) such that \(\mathrm{B} = \mathrm{X}\mathrm{A}\mathrm{X}^{-1}\). In numerical analyses, it is often that we wish to know the robustness of an operation \(f\colon x \mapsto f(x)\) with respect to a small perturbation of its input: if \(y = f(x)\), what is \(f(x+\varepsilon)\) ? For real functions, continuity tells us that as long as \(\lvert \varepsilon \rvert < \delta\), the perturbed output \(f(x+\varepsilon)\) is \(\tilde{\varepsilon}\)-close to \(y\) for some small \(\tilde{\varepsilon} > 0\). What is the equivalent bound on the error of the matrix function \(f_{\mathrm{X}}(\mathrm{A}) = \mathrm{X}\mathrm{A}\mathrm{X}^{-1}\) ? Suppose \(\mathrm{A}\) is perturbed by a matrix \(\mathrm{E}\) with small magnitude: \(\lVert\mathrm{E}\rVert_\alpha < \delta\) Then, the value of the function at this perturbed output is

$$ f_{\mathrm{X}}(\mathrm{A}+ \mathrm{E}) = \mathrm{X}(\mathrm{A}+ \mathrm{E})\mathrm{X}^{-1} = f_{\mathrm{X}}(\mathrm{A}) + f_{\mathrm{X}}(\mathrm{E}) $$

thus, the norm of the error \(\mathrm{B}\, – f_{\mathrm{X}}(\mathrm{A}+ \mathrm{E})\) is given by

$$ \lVert\mathrm{B}\,– f_{\mathrm{X}}(\mathrm{A}+ \mathrm{E})\rVert_\alpha = \lVert\mathrm{X}\mathrm{E}\mathrm{X}^{-1}\rVert_\alpha \leq \kappa_\alpha(\mathrm{X})\,\lVert\mathrm{E}\rVert_\alpha $$

therefore, the condition number \(\kappa_\alpha\) serves as a measure of continuity for the similarity operation \(f_\mathrm{X}(\cdot)\). If the condition number is small, the matrix is said to be “well-behaved”, so that for small input errors \(\mathrm{E}\) the output is only perturbed at most by the small amout \(\kappa_\alpha(\mathrm{X})\,\lVert\mathrm{E}\rVert_\alpha\). If the condition number is large, even small input errors cause large deviations in the output.

Orthogonal matrices are stable

The norm induced by setting \(\alpha = 2\) reveals that orthogonal matrices (the matrices \(\mathrm{Q}\) for which \(\mathrm{Q}^\mathsf{T}\mathrm{Q} = \mathrm{I}\)) are computationally very stable. Indeed, the perturbation created by the similarity \(\mathrm{A}\mapsto \mathrm{Q}\mathrm{A}\mathrm{Q}^\mathsf{T}\) is equal to \(\lVert\mathrm{Q}\mathrm{E}\mathrm{Q}^\mathsf{T}\rVert_2 = \lVert\mathrm{E}\rVert_2\)

These transformations introduce no extra perturbations !

Stability of a time-varying linear system

Consider the time-varying \(n\times n\) system \(\mathrm{A}(t)\, x(t) = b(t)\). Here, the notion of “stable” is different: a time-varying linear system is unstable if small durations cause important variations in the state \(x\). A stable system has therefore a small relative variation \(\lVert \dot{x} \rVert / \lVert x \rVert\).

Differentiating this equality yields

$$ \dot{x}(t) =\mathrm{A}(t)^{-1}\dot{b}(t)\, – \mathrm{A}(t)^{-1}\dot{\mathrm{A}}(t)\,x(t) $$

Therefore the stability of the system can be upper-bounded (dropping the \(t\) for simplicity)

$$ \dfrac{\lVert\dot{x}\rVert}{\lVert x \rVert} \leq \kappa(\mathrm{A}) \left(\dfrac{\lVert \dot{b} \rVert}{\lVert \mathrm{A}\rVert \lVert x\rVert } + \dfrac{\lVert \dot{\mathrm{A}}\rVert}{\lVert \mathrm{A}\rVert}\right) $$

therefore, a system whose matrix has a small condition number at time \(t\) will stay relatively close to its current position in the near future.