Condition number of a matrix: basics
A special kind of norm for \(n\times n\) matrices \(\mathrm{A}\) is the subordinate (or induced) norm.
If we have a norm \(\lVert\cdot\rVert_\alpha\) on the input space, and a second norm \(\lVert\cdot\rVert_\beta\) on the output space, the subordinate norm is
$$
\lVert\mathrm{A}\rVert_{\alpha, \beta} = \sup_{x \neq 0}\dfrac{\;\lVert\mathrm{A}x\rVert_{\beta}}{\;\lVert x\rVert_\alpha}
$$
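To make the definition concrete, here is a minimal numpy sketch (the test matrix and the sampling scheme are illustrative assumptions, not part of the theory): it estimates the supremum for \(\alpha = \beta = 2\) by sampling random directions and compares the estimate with the induced 2-norm computed by numpy, which equals the largest singular value.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))  # an arbitrary test matrix

# Estimate sup ||Ax||_2 / ||x||_2 by sampling many random directions x.
xs = rng.standard_normal((4, 100_000))
ratios = np.linalg.norm(A @ xs, axis=0) / np.linalg.norm(xs, axis=0)

print(ratios.max())          # sampled lower bound on the supremum
print(np.linalg.norm(A, 2))  # induced 2-norm: the largest singular value
```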
Submultiplicative property
An important property of these induced norms is that for any third norm \(\lVert\cdot\rVert_{\gamma}\), we have:
$$
\lVert\mathrm{AB}\rVert_{\alpha, \beta} \leq \lVert\mathrm{A}\rVert_{\gamma, \beta} \lVert\mathrm{B}\rVert_{\alpha, \gamma}
$$
This is easily proven via the computation
$$
\dfrac{\;\lVert\mathrm{AB}x\rVert_{\beta}}{\;\lVert x\rVert_\alpha}
=
\dfrac{\;\lVert\mathrm{A}y\rVert_{\beta}}{\;\lVert y\rVert_\gamma}
\dfrac{\;\lVert\mathrm{B}x\rVert_{\gamma}}{\;\lVert x\rVert_\alpha}
$$
for \(y = \mathrm{B}x\) (when \(\mathrm{B}x = 0\), the left-hand side vanishes and the bound below holds trivially).
As each factor on the right is bounded above by the corresponding supremum, we get, for every \(x \neq 0\),
$$
\dfrac{\;\lVert\mathrm{AB}x\rVert_{\beta}}{\;\lVert x\rVert_\alpha}
\leq
\lVert\mathrm{A}\rVert_{\gamma, \beta}
\lVert\mathrm{B}\rVert_{\alpha, \gamma}
$$
Taking the supremum over \(x\) on the left then gives the claimed inequality, because the supremum is the least upper bound.
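As a sanity check, here is a small numpy sketch of the mixed-norm inequality with \(\alpha = 1\), \(\beta = \infty\), \(\gamma = 2\); the closed-form expressions used for these mixed norms are standard identities, assumed here rather than derived.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))

# Closed forms for the mixed subordinate norms (taken as given):
#   ||B||_{1,2}   : largest 2-norm among the columns of B
#   ||A||_{2,inf} : largest 2-norm among the rows of A
#   ||M||_{1,inf} : largest absolute entry of M
norm_B_1_2 = np.linalg.norm(B, axis=0).max()
norm_A_2_inf = np.linalg.norm(A, axis=1).max()
norm_AB_1_inf = np.abs(A @ B).max()

# The mixed submultiplicative inequality with gamma = 2:
assert norm_AB_1_inf <= norm_A_2_inf * norm_B_1_2
print(norm_AB_1_inf, norm_A_2_inf * norm_B_1_2)
```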
A special case arises when taking \(\alpha = \beta = \gamma\), which yields
$$\lVert\mathrm{AB}\rVert_{\alpha, \alpha} \leq \lVert\mathrm{A}\rVert_{\alpha, \alpha} \lVert\mathrm{B}\rVert_{\alpha, \alpha}
$$
We abbreviate this norm as \(\lVert\cdot\rVert_\alpha\).
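A quick numerical check of this special case, using the induced norms that numpy provides directly (the random test matrices are an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6))
B = rng.standard_normal((6, 6))

for p in (1, 2, np.inf):  # orders for which numpy computes the induced norm
    lhs = np.linalg.norm(A @ B, p)
    rhs = np.linalg.norm(A, p) * np.linalg.norm(B, p)
    assert lhs <= rhs * (1 + 1e-12)  # submultiplicativity, up to rounding
    print(p, lhs, rhs)
```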
Condition number
The condition number \(\kappa_\alpha(\mathrm{A})\) of an invertible matrix \(\mathrm{A}\) is the quantity
$$\kappa_\alpha(\mathrm{A}) = \lVert\mathrm{A}\rVert_\alpha\lVert\mathrm{A}^{-1}\rVert_\alpha $$
which is at least \(1\) because
$$
1 = \lVert\mathrm{I}\rVert_\alpha = \lVert\mathrm{A}\mathrm{A}^{-1}\rVert_\alpha \leq \lVert\mathrm{A}\rVert_\alpha\lVert\mathrm{A}^{-1}\rVert_\alpha
$$
This number is large when either \(\mathrm{A}\) or its inverse creates large distortions as the input varies over the unit sphere.
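In numpy, this quantity can be computed either from the definition or with the built-in `np.linalg.cond`; a minimal sketch for \(\alpha = 2\), on an arbitrary random matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))  # assumed invertible (true almost surely)

# kappa_2(A) from the definition: ||A||_2 * ||A^{-1}||_2.
kappa = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
print(kappa)
print(np.linalg.cond(A, 2))  # numpy's built-in, same value up to rounding
```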
Stability of a similarity
In linear algebra, a matrix \(\mathrm{B}\) is said to be similar to \(\mathrm{A}\) when there is an invertible transformation \(\mathrm{X}\) such that \(\mathrm{B} = \mathrm{X}\mathrm{A}\mathrm{X}^{-1}\).
In numerical analysis, we often wish to know how robust an operation \(f\colon x \mapsto f(x)\) is with respect to a small perturbation of its input: if \(y = f(x)\), what is \(f(x+\varepsilon)\)?
For real functions, continuity tells us that for every tolerance \(\tilde{\varepsilon} > 0\) there is a \(\delta > 0\) such that, as long as \(\lvert \varepsilon \rvert < \delta\), the perturbed output \(f(x+\varepsilon)\) is \(\tilde{\varepsilon}\)-close to \(y\).
What is the equivalent bound on the error of the matrix function \(f_{\mathrm{X}}(\mathrm{A}) = \mathrm{X}\mathrm{A}\mathrm{X}^{-1}\)?
Suppose \(\mathrm{A}\) is perturbed by a matrix \(\mathrm{E}\) with small magnitude: \(\lVert\mathrm{E}\rVert_\alpha < \delta\).
Then, the value of the function at this perturbed input is
$$
f_{\mathrm{X}}(\mathrm{A}+ \mathrm{E}) = \mathrm{X}(\mathrm{A}+ \mathrm{E})\mathrm{X}^{-1} = f_{\mathrm{X}}(\mathrm{A}) + f_{\mathrm{X}}(\mathrm{E})
$$
thus, the norm of the error \(\mathrm{B} - f_{\mathrm{X}}(\mathrm{A} + \mathrm{E})\), where \(\mathrm{B} = f_{\mathrm{X}}(\mathrm{A})\), is given by
$$
\lVert\mathrm{B} - f_{\mathrm{X}}(\mathrm{A} + \mathrm{E})\rVert_\alpha = \lVert\mathrm{X}\mathrm{E}\mathrm{X}^{-1}\rVert_\alpha \leq \kappa_\alpha(\mathrm{X})\,\lVert\mathrm{E}\rVert_\alpha
$$
where the inequality follows from the submultiplicative property applied twice;
therefore, the condition number \(\kappa_\alpha\) serves as a measure of continuity for the similarity operation \(f_\mathrm{X}(\cdot)\). If the condition number is small, the matrix is said to be “well-behaved”: for small input errors \(\mathrm{E}\), the output is perturbed by at most the small amount \(\kappa_\alpha(\mathrm{X})\,\lVert\mathrm{E}\rVert_\alpha\).
If the condition number is large, even small input errors cause large deviations in the output.
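The bound is easy to observe numerically; a minimal sketch, assuming \(\alpha = 2\) and random test matrices:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
X = rng.standard_normal((n, n))         # assumed invertible (true a.s.)
A = rng.standard_normal((n, n))
E = 1e-6 * rng.standard_normal((n, n))  # small perturbation of A

X_inv = np.linalg.inv(X)
f = lambda M: X @ M @ X_inv             # the similarity f_X

error = f(A + E) - f(A)                 # equals X E X^{-1} by linearity
bound = np.linalg.cond(X, 2) * np.linalg.norm(E, 2)

print(np.linalg.norm(error, 2))         # stays below the bound
print(bound)
```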
Orthogonal matrices are stable
The norm induced by setting \(\alpha = 2\) reveals that orthogonal matrices (the matrices \(\mathrm{Q}\) for which \(\mathrm{Q}^\mathsf{T}\mathrm{Q} = \mathrm{I}\)) are computationally very stable.
Indeed, the norm of the perturbation created by the similarity \(\mathrm{A}\mapsto \mathrm{Q}\mathrm{A}\mathrm{Q}^\mathsf{T}\) is \(\lVert\mathrm{Q}\mathrm{E}\mathrm{Q}^\mathsf{T}\rVert_2 = \lVert\mathrm{E}\rVert_2\), since multiplication by an orthogonal matrix preserves Euclidean lengths, so \(\kappa_2(\mathrm{Q}) = 1\).
These transformations introduce no extra perturbation!
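A short numpy check, where the orthogonal matrix is drawn (by assumption) from the QR factorization of a random Gaussian matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
# An orthogonal matrix from the QR factorization of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
E = rng.standard_normal((6, 6))

print(np.linalg.cond(Q, 2))            # 1.0 up to rounding
print(np.linalg.norm(Q @ E @ Q.T, 2))  # these two 2-norms coincide:
print(np.linalg.norm(E, 2))            # the perturbation passes through unchanged
```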
Stability of a time-varying linear system
Consider the time-varying \(n\times n\) system \(\mathrm{A}(t)\, x(t) = b(t)\).
Here, the notion of “stable” is different: a time-varying linear system is unstable if small time increments cause large variations in the state \(x\).
A stable system therefore has a small relative variation \(\lVert \dot{x} \rVert / \lVert x \rVert\).
Differentiating the equality \(\mathrm{A}(t)\,x(t) = b(t)\) with respect to \(t\) and solving for \(\dot{x}\) yields
$$
\dot{x}(t) = \mathrm{A}(t)^{-1}\dot{b}(t) - \mathrm{A}(t)^{-1}\dot{\mathrm{A}}(t)\,x(t)
$$
Taking norms, applying the submultiplicative property, and dividing by \(\lVert x \rVert\), the relative variation of the state can be upper-bounded (dropping the \(t\) for simplicity)
$$
\dfrac{\lVert\dot{x}\rVert}{\lVert x \rVert} \leq \kappa(\mathrm{A}) \left(\dfrac{\lVert \dot{b} \rVert}{\lVert \mathrm{A}\rVert \lVert x\rVert } + \dfrac{\lVert \dot{\mathrm{A}}\rVert}{\lVert \mathrm{A}\rVert}\right)
$$
Therefore, a system whose matrix has a small condition number at time \(t\) will stay relatively close to its current state in the near future.
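To illustrate, here is a numpy sketch with an assumed affine-in-time system \(\mathrm{A}(t) = \mathrm{A}_0 + t\,\mathrm{A}_1\) and \(b(t) = b_0 + t\,b_1\), whose derivatives are available in closed form; it evaluates both sides of the bound at a single instant:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4
A0, A1 = rng.standard_normal((2, n, n))  # A(t) = A0 + t * A1
b0, b1 = rng.standard_normal((2, n))     # b(t) = b0 + t * b1

t = 0.3
A, A_dot = A0 + t * A1, A1               # A(t) and its derivative
b, b_dot = b0 + t * b1, b1               # b(t) and its derivative

A_inv = np.linalg.inv(A)
x = A_inv @ b                              # state at time t
x_dot = A_inv @ b_dot - A_inv @ A_dot @ x  # the derivative formula above

lhs = np.linalg.norm(x_dot) / np.linalg.norm(x)
norm_A = np.linalg.norm(A, 2)
rhs = np.linalg.cond(A, 2) * (
    np.linalg.norm(b_dot) / (norm_A * np.linalg.norm(x))
    + np.linalg.norm(A_dot, 2) / norm_A
)
print(lhs, rhs)  # lhs <= rhs, as the bound predicts
```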