3.5 The Cayley–Hamilton theorem

  • Theorem 3.12 (Cayley–Hamilton4 Theorem). Let \(\phi \in L(V)\) be a linear operator on a finite-dimensional vector space over a field \(\F \).

    Then \(\Delta _{\phi }(\phi )=0\).

    Equivalently, for any \(A\in M_n(\F )\), \(\Delta _A(A)=0\).

4 Arthur Cayley, 1821–1895; William Rowan Hamilton, 1805–1865.

Before proving this, let us see what it tells us. Let

\begin{equation*} A= \begin{pmatrix} a&b\\c&d \end {pmatrix}\in M_2(\F ). \end{equation*}

Then

\begin{equation*} \Delta _A= \begin{vmatrix} a-x&b\\c&d-x \end {vmatrix}=x^2-(a+d)x+(ad-bc). \end{equation*}

So the Cayley–Hamilton theorem is telling us that

\begin{equation*} A^2-(a+d)A+(ad-bc)I_2=0, \end{equation*}

that is,

\begin{equation*} \begin{pmatrix} a^2+bc&ab+bd\\ca+dc&cb+d^2 \end {pmatrix} -(a+d) \begin{pmatrix} a&b\\c&d \end {pmatrix} + \begin{pmatrix} ad-bc&0\\0&ad-bc \end {pmatrix} = \begin{pmatrix} 0&0\\0&0 \end {pmatrix}. \end{equation*}

This is certainly true (check it!) but is far from obvious! If you are not yet convinced, work out what the theorem says for \(A\in M_3(\F )\).
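
If you would rather let a machine do the bookkeeping, here is a minimal sympy sketch (ours, purely an illustration) which checks the \(2\times 2\) identity above symbolically and then repeats the computation for a generic \(3\times 3\) matrix, as suggested.

\begin{verbatim}
from sympy import Matrix, eye, zeros, symbols

a, b, c, d, x = symbols('a b c d x')

# The 2x2 identity displayed above: A^2 - (a+d)A + (ad-bc)I_2 = 0.
A = Matrix([[a, b], [c, d]])
print((A**2 - (a + d)*A + (a*d - b*c)*eye(2)).expand())   # zero matrix

# The same check for a generic 3x3 matrix: form Delta_B = det(B - x I_3)
# and substitute B for x, with the constant term times I_3.
B = Matrix(3, 3, list(symbols('b1:10')))
coeffs = (B - x*eye(3)).det().as_poly(x).all_coeffs()     # highest degree first
n = len(coeffs) - 1
Delta_of_B = sum((c*B**(n - k) for k, c in enumerate(coeffs)), zeros(3, 3))
print(Delta_of_B.expand())                                # zero matrix again
\end{verbatim}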

  • Proof of Theorem 3.12. We will prove the matrix version. So let \(A\in M_n(\F )\) and write

    \begin{equation*} \Delta _{A}=a_0+\dots +a_nx^n. \end{equation*}

    Thus, our mission is to show that

    \begin{equation*} a_0I_n+a_1A+\dots +a_nA^n=0. \end{equation*}

    The key is the adjugate formula from Algebra 1B5:

    \begin{equation} \label {eq:11} \mathrm {adj}(A-xI_{n})(A-xI_{n})=\det (A-xI_{n})I_{n}. \end{equation}

    Each entry of \(\mathrm {adj}(A-xI_n)\) is a polynomial in \(x\) of degree at most \(n-1\) so we write

    \begin{equation*} \mathrm {adj}(A-xI_n)=B_0+B_1x+\dots +B_{n-1}x^{n-1}, \end{equation*}

    with each \(B_k\in M_n(\F )\). Substitute this into (3.7) to get

    \begin{equation*} (B_0+B_1x+\dots +B_{n-1}x^{n-1})(A-xI_n)=(a_0+\dots +a_nx^n)I_n \end{equation*}

    and compare coefficients of \(x^{k}\) to get

    \begin{equation} \label {eq:12} B_kA-B_{k-1}=a_kI_{n}, \end{equation}

    for \(0\leq k\leq n\), where we have set \(B_{-1}=B_n=0\in M_n(\F )\).

    Multiply (3.8) by \(A^k\) on the right to get

    \begin{equation*} B_kA^{k+1}-B_{k-1}A^k=a_kA^k \end{equation*}

    and sum:

    \begin{equation*} \Delta _{A}(A)=\sum _{k=0}^na_kA^k=\sum _{k=0}^n(B_kA^{k+1}-B_{k-1}A^k)=B_nA^{n+1}-B_{-1}=0 \end{equation*}

    because the penultimate sum telescopes: all the intermediate terms cancel, and \(B_n=B_{-1}=0\).  □

5 Theorem 2.4.6
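
The two ingredients of the proof, the adjugate identity (3.7) and the recursion (3.8), are easy to check symbolically for a generic \(2\times 2\) matrix. The following sympy sketch is only an illustration of the proof, not part of it; the variable names are ours.

\begin{verbatim}
from sympy import Matrix, eye, symbols

a, b, c, d, x = symbols('a b c d x')
A = Matrix([[a, b], [c, d]])
M = A - x*eye(2)

# (3.7): adj(A - x I_2)(A - x I_2) = det(A - x I_2) I_2.
print((M.adjugate()*M - M.det()*eye(2)).expand())       # zero matrix

# Write adj(A - x I_2) = B_0 + B_1 x (it is linear in x) and read off
# the coefficients a_0, a_1, a_2 of Delta_A = det(A - x I_2).
B0, B1 = M.adjugate().subs(x, 0), M.adjugate().diff(x)
a0, a1, a2 = reversed(M.det().as_poly(x).all_coeffs())

# (3.8): B_k A - B_{k-1} = a_k I_2, with B_{-1} = B_2 = 0.
print((B0*A - a0*eye(2)).expand())                      # k = 0: zero matrix
print((B1*A - B0 - a1*eye(2)).expand())                 # k = 1: zero matrix
print((-B1 - a2*eye(2)).expand())                       # k = 2: zero matrix
\end{verbatim}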

  • Corollary 3.13. Let \(\phi \in L(V)\) be a linear operator on a finite-dimensional vector space over a field \(\F \).

    • (1) \(m_{\phi }\) divides \(\Delta _{\phi }\). Equivalently, \(m_A\) divides \(\Delta _A\), for any \(A\in M_n(\F )\).

    • (2) The roots of \(m_{\phi }\) are exactly the eigenvalues of \(\phi \).

  • Proof. By Theorem 3.12, \(\Delta _{\phi }(\phi )=0\) so \(m_{\phi }\) divides \(\Delta _{\phi }\) by Proposition 3.7. As a result, any root of \(m_{\phi }\) is a root of \(\Delta _{\phi }\) and so an eigenvalue. Conversely, any eigenvalue is a root of \(m_{\phi }\) by Corollary 3.11.  □

Let us summarise the situation when \(\F =\C \) so that any polynomial is a product of linear factors. So let \(\phi \in L(V)\) be a linear operator on a finite-dimensional complex vector space with distinct eigenvalues \(\lambda _1,\dots ,\lambda _k\). Then

\begin{align*} \Delta _{\phi }&=\pm \prod _{i=1}^k(x-\lambda _i)^{r_i}\\ m_{\phi }&=\prod _{i=1}^k(x-\lambda _i)^{s_i}, \end{align*} where \(r_i\) is the algebraic multiplicity of \(\lambda _i\) and \(1\leq s_i\leq r_i\), for \(1\leq i\leq k\).

This gives us another way to find \(m_{\phi }\) if we can factorise \(\Delta _{\phi }\): \(m_{\phi }\) will be of the form \(p=\prod _{i=1}^{k}(x-\lambda _{i})^{s_i}\), with each \(1\leq s_i\leq r_i\), so evaluate \(p(\phi )\) to find the one of lowest degree with \(p(\phi )=0\).
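
Here is a minimal sympy sketch of this search, assuming the factorisation of \(\Delta _A\) is known; the function name and its interface are ours, chosen purely for illustration.

\begin{verbatim}
from itertools import product
from sympy import Matrix, eye, zeros

def minimal_poly_exponents(A, factorisation):
    """Given the factorisation of Delta_A as a list of (eigenvalue, r_i)
    pairs, test the candidates p = prod (x - lambda_i)^{s_i} with
    1 <= s_i <= r_i, lowest degree first, and return the exponents s_i
    of the first candidate with p(A) = 0."""
    n = A.shape[0]
    lambdas = [lam for lam, _ in factorisation]
    for ss in sorted(product(*(range(1, r + 1) for _, r in factorisation)),
                     key=sum):
        p_of_A = eye(n)
        for lam, s in zip(lambdas, ss):
            p_of_A *= (A - lam*eye(n))**s
        if p_of_A == zeros(n, n):
            return dict(zip(lambdas, ss))

# For instance, the first matrix of the examples below:
A = Matrix([[1, 1, 2], [0, 1, 1], [0, 0, 2]])
print(minimal_poly_exponents(A, [(1, 2), (2, 1)]))
# {1: 2, 2: 1}, that is, m_A = (x - 1)^2 (x - 2).
\end{verbatim}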

  • Examples. Let us find \(m_A\) in the following cases:

    • (1) Take

      \begin{equation*} A= \begin{pmatrix*}[r] 1&1&2\\0&1&1\\0&0&2 \end {pmatrix*}. \end{equation*}

      Since \(A\) is upper triangular, we immediately see that \(\Delta _A=-(x-1)^2(x-2)\) so that \(m_A\) is either \((x-1)(x-2)\) or \((x-1)^2(x-2)\).

      We try the first of these:

      \begin{equation*} (A-I_3)(A-2I_3)= \begin{pmatrix} 0&1&2\\0&0&1\\0&0&1 \end {pmatrix} \begin{pmatrix*}[r] -1&1&2\\0&-1&1\\0&0&0 \end {pmatrix*} = \begin{pmatrix*}[r] 0&-1&1\\0&0&0\\0&0&0 \end {pmatrix*}\neq 0. \end{equation*}

      We conclude that \(m_A=(x-1)^2(x-2)\).

    • (2) Let us try again with

      \begin{equation*} A= \begin{pmatrix} 1&0&3\\0&1&2\\0&0&2 \end {pmatrix} \end{equation*}

      which also has \(\Delta _A=-(x-1)^2(x-2)\) so that \(m_A\) is either \((x-1)(x-2)\) or \((x-1)^2(x-2)\).

      However, this time

      \begin{equation*} (A-I_3)(A-2I_3)= \begin{pmatrix} 0&0&3\\0&0&2\\0&0&1 \end {pmatrix} \begin{pmatrix*}[r] -1&0&3\\0&-1&2\\0&0&0 \end {pmatrix*} =0 \end{equation*}

      so that \(m_A=(x-1)(x-2)\).
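
Both hand computations are easily confirmed with a few lines of sympy; the following check is ours, purely as a sanity test.

\begin{verbatim}
from sympy import Matrix, eye, zeros

A1 = Matrix([[1, 1, 2], [0, 1, 1], [0, 0, 2]])   # example (1)
A2 = Matrix([[1, 0, 3], [0, 1, 2], [0, 0, 2]])   # example (2)

for A in (A1, A2):
    print((A - eye(3))*(A - 2*eye(3)) == zeros(3, 3))
# False for example (1), True for example (2): so m_{A_1} = (x-1)^2(x-2)
# while m_{A_2} = (x-1)(x-2).
\end{verbatim}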