3.5 The Cayley–Hamilton theorem

  • Theorem 3.12 (Cayley–Hamilton2 Theorem). Let \(\phi \in L(V)\) be a linear operator on a finite-dimensional vector space over a field \(\F \).

    Then \(\Delta _{\phi }(\phi )=0\).

    Equivalently, for any \(A\in M_n(\F )\), \(\Delta _A(A)=0\).

2 Arthur Cayley, 1821–1895; William Rowan Hamilton, 1805–1865.

Before proving this, let us see what it tells us. Let

\begin{equation*} A= \begin{pmatrix} a&b\\c&d \end {pmatrix}\in M_2(\F ). \end{equation*}

Then

\begin{equation*} \Delta _A= \begin{vmatrix} a-x&b\\c&d-x \end {vmatrix}=x^2-(a+d)x+(ad-bc). \end{equation*}

So the Cayley–Hamilton theorem is telling us that

\begin{equation*} A^2-(a+d)A+(ad-bc)I_2=0, \end{equation*}

that is,

\begin{equation*} \begin{pmatrix} a^2+bc&ab+bd\\ca+dc&cb+d^2 \end {pmatrix} -(a+d) \begin{pmatrix} a&b\\c&d \end {pmatrix} + \begin{pmatrix} ad-bc&0\\0&ad-bc \end {pmatrix} = \begin{pmatrix} 0&0\\0&0 \end {pmatrix}. \end{equation*}

This is certainly true (check it!) but is far from obvious! If you are not yet convinced, work out what the theorem says for \(A\in M_3(\F )\).
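The \(2\times 2\) identity above is easy to check symbolically; here is a minimal sketch using Python's sympy, with the matrix entries \(a,b,c,d\) as in the display:

```python
from sympy import Matrix, symbols, expand

# Symbolic entries of the 2x2 matrix from the text.
a, b, c, d = symbols('a b c d')
A = Matrix([[a, b], [c, d]])
I2 = Matrix.eye(2)

# Cayley-Hamilton for 2x2: A^2 - (a+d) A + (ad - bc) I_2 = 0.
result = (A**2 - (a + d) * A + (a * d - b * c) * I2).applyfunc(expand)
print(result)  # prints Matrix([[0, 0], [0, 0]])
```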

  • Proof of Theorem 3.12. We will prove the matrix version; the operator version then follows by taking the matrix of \(\phi \) with respect to any basis of \(V\). So let \(A\in M_n(\F )\) and write

    \begin{equation*} \Delta _{A}=a_0+\dots +a_nx^n. \end{equation*}

    Thus, our mission is to show that

    \begin{equation*} a_0I_n+a_1A+\dots +a_nA^n=0. \end{equation*}

    The key is the adjugate formula from Algebra 1B3:

    \begin{equation} \label {eq:9} \mathrm {adj}(A-xI_{n})(A-xI_{n})=\det (A-xI_{n})I_{n}. \end{equation}

    Each entry of \(\mathrm {adj}(A-xI_n)\) is a polynomial in \(x\) of degree at most \(n-1\) so we write

    \begin{equation*} \mathrm {adj}(A-xI_n)=B_0+B_1x+\dots +B_{n-1}x^{n-1}, \end{equation*}

    with each \(B_k\in M_n(\F )\). Substitute this into (3.6) to get

    \begin{equation*} (B_0+B_1x+\dots +B_{n-1}x^{n-1})(A-xI_n)=(a_0+\dots +a_nx^n)I_n \end{equation*}

    and compare coefficients of \(x^{k}\) to get

    \begin{equation} \label {eq:10} B_kA-B_{k-1}=a_kI_{n}, \end{equation}

    for \(0\leq k\leq n\), where we have set \(B_{-1}=B_n=0\in M_n(\F )\).

    Multiply (3.7) by \(A^k\) on the right to get

    \begin{equation*} B_kA^{k+1}-B_{k-1}A^k=a_kA^k \end{equation*}

    and sum:

    \begin{equation*} \Delta _{A}(A)=\sum _{k=0}^na_kA^k=\sum _{k=0}^n(B_kA^{k+1}-B_{k-1}A^k)=B_nA^{n+1}-B_{-1}=0 \end{equation*}

    because the penultimate sum telescopes: consecutive terms cancel in pairs, leaving only \(B_nA^{n+1}-B_{-1}\), and both \(B_n\) and \(B_{-1}\) are zero.  □

3 Theorem 2.4.6
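The bookkeeping in the proof can be checked concretely. The sketch below (Python with sympy, applied to an arbitrarily chosen \(3\times 3\) matrix) extracts the matrix coefficients \(B_k\) of \(\mathrm{adj}(A-xI_n)\) and verifies both the coefficient recursion \(B_kA-B_{k-1}=a_kI_n\) and the final telescoping sum:

```python
from sympy import Matrix, Poly, eye, symbols, zeros

x = symbols('x')
A = Matrix([[1, 1, 2], [0, 1, 1], [0, 0, 2]])  # arbitrary sample matrix
n = A.shape[0]

M = A - x * eye(n)
a = Poly(M.det(), x).all_coeffs()[::-1]   # a_0, ..., a_n in ascending order
adj = M.adjugate()                        # entries are polynomials in x
# B_k = coefficient of x^k in adj(A - x I_n), padded with B_{-1} = B_n = 0.
B = [zeros(n, n)] + \
    [adj.applyfunc(lambda e: Poly(e, x).nth(k)) for k in range(n)] + \
    [zeros(n, n)]

# The recursion from the proof: B_k A - B_{k-1} = a_k I_n for k = 0, ..., n.
for k in range(n + 1):
    assert B[k + 1] * A - B[k] == a[k] * eye(n)

# Telescoping: Delta_A(A) = sum a_k A^k = 0.
assert sum((a[k] * A**k for k in range(n + 1)), zeros(n, n)) == zeros(n, n)
print("recursion and Cayley-Hamilton verified")
```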

  • Corollary 3.13. Let \(\phi \in L(V)\) be a linear operator on a finite-dimensional vector space over a field \(\F \).

    • (1) \(m_{\phi }\) divides \(\Delta _{\phi }\). Equivalently, \(m_A\) divides \(\Delta _A\), for any \(A\in M_n(\F )\).

    • (2) The roots of \(m_{\phi }\) are exactly the eigenvalues of \(\phi \).

  • Proof. By Theorem 3.12, \(\Delta _{\phi }(\phi )=0\) so \(m_{\phi }\) divides \(\Delta _{\phi }\) by Proposition 3.7. As a result, any root of \(m_{\phi }\) is a root of \(\Delta _{\phi }\) and so an eigenvalue. Conversely, any eigenvalue is a root of \(m_{\phi }\) by Corollary 3.11.  □

Let us summarise the situation when \(\F =\C \) so that any polynomial is a product of linear factors. So let \(\phi \in L(V)\) be a linear operator on a finite-dimensional complex vector space with distinct eigenvalues \(\lst \lambda 1k\). Then

\begin{align*} \Delta _{\phi }&=\pm \prod _{i=1}^k(x-\lambda _i)^{r_i}\\ m_{\phi }&=\prod _{i=1}^k(x-\lambda _i)^{s_i}, \end{align*} where \(r_i=\am (\lambda _i)\) and \(\bw 1{s_i}{r_i}\), for \(\bw 1ik\).

This gives us a new method for computing the minimum polynomial of an operator so long as we can factorise its characteristic polynomial. If \(\Delta _{\phi }=\prod _{i=1}^k(x-\lambda _i)^{r_i}\), we try \(p=\prod _{i=1}^k(x-\lambda _{i})^{a_i}\), for \(\bw 1{a_i}{r_i}\), in increasing degree, until we find one with \(p(\phi )=0\).
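This search is easy to mechanise. A minimal sketch in Python (sympy), assuming the eigenvalues and their algebraic multiplicities have already been found by factorising \(\Delta _{\phi }\); the function name and interface are illustrative, not from the text:

```python
from itertools import product
from sympy import Matrix, eye, zeros

def min_poly_exponents(A, mult):
    """Given mult = {eigenvalue: algebraic multiplicity r_i}, try the
    candidates p = prod (x - lam_i)^{a_i}, 1 <= a_i <= r_i, in order of
    increasing degree, and return the exponents of the first p(A) = 0."""
    n = A.shape[0]
    lams = list(mult)
    candidates = sorted(product(*(range(1, mult[l] + 1) for l in lams)),
                        key=sum)
    for exps in candidates:
        P = eye(n)
        for lam, e in zip(lams, exps):
            P *= (A - lam * eye(n))**e
        if P == zeros(n, n):
            return dict(zip(lams, exps))

A = Matrix([[1, 1, 2], [0, 1, 1], [0, 0, 2]])
print(min_poly_exponents(A, {1: 2, 2: 1}))  # {1: 2, 2: 1}
```

The search cannot fail: at worst \(a_i=r_i\) for every \(i\), and that candidate annihilates \(A\) by the Cayley–Hamilton theorem.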

  • Examples.

    • (1) Find \(m_A\) where

      \begin{equation*} A= \begin{pmatrix} 1&1&2\\0&1&1\\0&0&2 \end {pmatrix}. \end{equation*}

      Since \(A-xI_3\) is upper triangular, its determinant is just the product of its diagonal entries, so \(\Delta _A=(1-x)^2(2-x)=-(x-1)^2(x-2)\). By Corollary 3.13, \(m_A\) divides \(\Delta _A\) and has both \(1\) and \(2\) as roots, which leaves just two possibilities: \(m_A\) is either \((x-1)(x-2)\) or \((x-1)^2(x-2)\). We try the degree \(2\) candidate:

      \begin{equation*} (A-I_3)(A-2I_3)= \begin{pmatrix} 0&1&2\\0&0&1\\0&0&1 \end {pmatrix} \begin{pmatrix*}[r] -1&1&2\\0&-1&1\\0&0&0 \end {pmatrix*}= \begin{pmatrix*}[r] 0&-1&1\\0&0&0\\0&0&0 \end {pmatrix*}\neq 0. \end{equation*}

      Thus we must have \(m_A=(x-1)^2(x-2)\).

    • (2) Repeat the analysis when

      \begin{equation*} A= \begin{pmatrix} 1&0&3\\0&1&2\\0&0&2 \end {pmatrix}. \end{equation*}

      Again we have \(\Delta _A=-(x-1)^2(x-2)\) and so the same two candidates for \(m_A\). This time, however,

      \begin{equation*} (A-I_3)(A-2I_3)= \begin{pmatrix} 0&0&3\\0&0&2\\0&0&1 \end {pmatrix} \begin{pmatrix*}[r] -1&0&3\\0&-1&2\\0&0&0 \end {pmatrix*}=0. \end{equation*}

      Thus \(m_A=(x-1)(x-2)\).
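Both hand computations can be confirmed directly; a quick check in Python with sympy, using the matrices from examples (1) and (2):

```python
from sympy import Matrix, eye, zeros

A1 = Matrix([[1, 1, 2], [0, 1, 1], [0, 0, 2]])  # example (1)
A2 = Matrix([[1, 0, 3], [0, 1, 2], [0, 0, 2]])  # example (2)
I3, Z = eye(3), zeros(3, 3)

# Example (1): the degree-2 candidate fails, the degree-3 one succeeds.
print((A1 - I3) * (A1 - 2 * I3) == Z)       # False
print((A1 - I3)**2 * (A1 - 2 * I3) == Z)    # True

# Example (2): the degree-2 candidate already annihilates A2.
print((A2 - I3) * (A2 - 2 * I3) == Z)       # True
```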