3.5 The Cayley–Hamilton theorem
Before proving this, let us see what it tells us. Let
\(\seteqnumber{0}{3.}{6}\)\begin{equation*} A= \begin{pmatrix} a&b\\c&d \end {pmatrix}\in M_2(\F ). \end{equation*}
Then
\(\seteqnumber{0}{3.}{6}\)\begin{equation*} \Delta _A= \begin{vmatrix} a-x&b\\c&d-x \end {vmatrix}=x^2-(a+d)x+(ad-bc). \end{equation*}
So the Cayley–Hamilton theorem is telling us that
\(\seteqnumber{0}{3.}{6}\)\begin{equation*} A^2-(a+d)A+(ad-bc)I_2=0, \end{equation*}
that is,
\(\seteqnumber{0}{3.}{6}\)\begin{equation*} \begin{pmatrix} a^2+bc&ab+bd\\ca+dc&cb+d^2 \end {pmatrix} -(a+d) \begin{pmatrix} a&b\\c&d \end {pmatrix} + \begin{pmatrix} ad-bc&0\\0&ad-bc \end {pmatrix} = \begin{pmatrix} 0&0\\0&0 \end {pmatrix}. \end{equation*}
This is certainly true (check it!) but is far from obvious! If you are not yet convinced, work out what the theorem says for \(A\in M_3(\F )\).
-
Proof of Theorem 3.12. We will prove the matrix version. So let \(A\in M_n(\F )\) and write
\(\seteqnumber{0}{3.}{6}\)\begin{equation*} \Delta _{A}=a_0+\dots +a_nx^n. \end{equation*}
Thus, our mission is to show that
\(\seteqnumber{0}{3.}{6}\)\begin{equation*} a_0I_n+a_1A+\dots +a_nA^n=0. \end{equation*}
The key is the adjugate formula from Algebra 1B5:
\(\seteqnumber{0}{3.}{6}\)\begin{equation} \label {eq:11} \mathrm {adj}(A-xI_{n})(A-xI_{n})=\det (A-xI_{n})I_{n}. \end{equation}
Each entry of \(\mathrm {adj}(A-xI_n)\) is a polynomial in \(x\) of degree at most \(n-1\) so we write
\(\seteqnumber{0}{3.}{7}\)\begin{equation*} \mathrm {adj}(A-xI_n)=B_0+B_1x+\dots +B_{n-1}x^{n-1}, \end{equation*}
with each \(B_k\in M_n(\F )\). Substitute this into (3.7) to get
\(\seteqnumber{0}{3.}{7}\)\begin{equation*} (B_0+B_1x+\dots +B_{n-1}x^{n-1})(A-xI)=(a_0+\dots +a_nx^n)I_n \end{equation*}
and compare coefficients of \(x^{k}\) to get
\(\seteqnumber{0}{3.}{7}\)\begin{equation} \label {eq:12} B_kA-B_{k-1}=a_kI_{n}, \end{equation}
for \(0\leq k\leq n\), where we have set \(B_{-1}=B_n=0\in M_n(\F )\).
Multiply (3.8) by \(A^k\) on the right to get
\(\seteqnumber{0}{3.}{8}\)\begin{equation*} B_kA^{k+1}-B_{k-1}A^k=a_kA^k \end{equation*}
and sum:
\(\seteqnumber{0}{3.}{8}\)\begin{equation*} \Delta _{A}(A)=\sum _{k=0}^na_kA^k=\sum _{k=0}^n(B_kA^{k+1}-B_{k-1}A^k)=B_nA^{n+1}-B_{-1}=0 \end{equation*}
because nearly all terms in the penultimate sum cancel. □
-
Corollary 3.13. Let \(\phi \in L(V)\) be a linear operator on a finite-dimensional vector space over a field \(\F \).
-
(1) \(m_{\phi }\) divides \(\Delta _{\phi }\). Equivalently, \(m_A\) divides \(\Delta _A\), for any \(A\in M_n(\F )\).
-
(2) The roots of \(m_{\phi }\) are exactly the eigenvalues of \(\phi \).
-
Let us summarise the situation when \(\F =\C \) so that any polynomial is a product of linear factors. So let \(\phi \in L(V)\) be a linear operator on a finite-dimensional complex vector space with distinct eigenvalues \(\lst \lambda 1k\). Then
\(\seteqnumber{0}{3.}{8}\)\begin{align*} \Delta _{\phi }&=\pm \prod _{i=1}^k(x-\lambda _i)^{r_i}\\ m_{\phi }&=\prod _{i=1}^k(x-\lambda _i)^{s_i}, \end{align*} where \(r_i=\am (\lambda _i)\) and \(\bw 1{s_i}{r_i}\), for \(\bw 1ik\).
This gives us another way to find \(m_{\phi }\) if we can factorise \(\Delta _{\phi }\): \(m_{\phi }\) will be of the form \(p=\prod _{i=1}^{k}(x-\lambda _{i})^{s_i}\), with each \(\bw 1{s_i}{r_i}\), so evaluate \(p(\phi )\) to find the one of lowest degree with \(p(\phi )=0\).
-
Examples. Let us find \(m_A\) in the following cases:
-
(1) Take
\(\seteqnumber{0}{3.}{8}\)\begin{equation*} A= \begin{pmatrix*}[r] 1&1&2\\0&1&1\\0&0&2 \end {pmatrix*}. \end{equation*}
Since \(A\) is upper triangular, we immediately see that \(\Delta _A=-(x-1)^2(x-2)\) so that \(m_A\) is either \((x-1)(x-2)\) or \((x-1)^2(x-2)\).
We try the first of these:
\(\seteqnumber{0}{3.}{8}\)\begin{equation*} (A-I_3)(A-2I_3)= \begin{pmatrix} 0&1&2\\0&0&1\\0&0&1 \end {pmatrix} \begin{pmatrix*}[r] -1&-1&2\\0&-1&1\\0&0&0 \end {pmatrix*} = \begin{pmatrix*}[r] 0&-1&1\\0&0&0\\0&0&0 \end {pmatrix*}\neq 0. \end{equation*}
We conclude that \(m_A=(x-1)^2(x-2)\).
-
(2) Let us try again with
\(\seteqnumber{0}{3.}{8}\)\begin{equation*} A= \begin{pmatrix} 1&0&3\\0&1&2\\0&0&2 \end {pmatrix} \end{equation*}
which also has \(\Delta _A=-(x-1)^2(x-2)\) so that \(m_A\) is either \((x-1)(x-2)\) or \((x-1)^2(x-2)\).
However, this time
\(\seteqnumber{0}{3.}{8}\)\begin{equation*} (A-I_3)(A-2I_3)= \begin{pmatrix} 0&0&3\\0&0&2\\0&0&1 \end {pmatrix} \begin{pmatrix*}[r] -1&0&3\\0&-1&2\\0&0&0 \end {pmatrix*} =0 \end{equation*}
so that \(m_A=(x-1)(x-2)\).
-