\(\newcommand{\footnotename}{footnote}\) \(\def \LWRfootnote {1}\) \(\newcommand {\footnote }[2][\LWRfootnote ]{{}^{\mathrm {#1}}}\) \(\newcommand {\footnotemark }[1][\LWRfootnote ]{{}^{\mathrm {#1}}}\) \(\let \LWRorighspace \hspace \) \(\renewcommand {\hspace }{\ifstar \LWRorighspace \LWRorighspace }\) \(\newcommand {\TextOrMath }[2]{#2}\) \(\newcommand {\mathnormal }[1]{{#1}}\) \(\newcommand \ensuremath [1]{#1}\) \(\newcommand {\LWRframebox }[2][]{\fbox {#2}} \newcommand {\framebox }[1][]{\LWRframebox } \) \(\newcommand {\setlength }[2]{}\) \(\newcommand {\addtolength }[2]{}\) \(\newcommand {\setcounter }[2]{}\) \(\newcommand {\addtocounter }[2]{}\) \(\newcommand {\arabic }[1]{}\) \(\newcommand {\number }[1]{}\) \(\newcommand {\noalign }[1]{\text {#1}\notag \\}\) \(\newcommand {\cline }[1]{}\) \(\newcommand {\directlua }[1]{\text {(directlua)}}\) \(\newcommand {\luatexdirectlua }[1]{\text {(directlua)}}\) \(\newcommand {\protect }{}\) \(\def \LWRabsorbnumber #1 {}\) \(\def \LWRabsorbquotenumber "#1 {}\) \(\newcommand {\LWRabsorboption }[1][]{}\) \(\newcommand {\LWRabsorbtwooptions }[1][]{\LWRabsorboption }\) \(\def \mathchar {\ifnextchar "\LWRabsorbquotenumber \LWRabsorbnumber }\) \(\def \mathcode #1={\mathchar }\) \(\let \delcode \mathcode \) \(\let \delimiter \mathchar \) \(\def \oe {\unicode {x0153}}\) \(\def \OE {\unicode {x0152}}\) \(\def \ae {\unicode {x00E6}}\) \(\def \AE {\unicode {x00C6}}\) \(\def \aa {\unicode {x00E5}}\) \(\def \AA {\unicode {x00C5}}\) \(\def \o {\unicode {x00F8}}\) \(\def \O {\unicode {x00D8}}\) \(\def \l {\unicode {x0142}}\) \(\def \L {\unicode {x0141}}\) \(\def \ss {\unicode {x00DF}}\) \(\def \SS {\unicode {x1E9E}}\) \(\def \dag {\unicode {x2020}}\) \(\def \ddag {\unicode {x2021}}\) \(\def \P {\unicode {x00B6}}\) \(\def \copyright {\unicode {x00A9}}\) \(\def \pounds {\unicode {x00A3}}\) \(\let \LWRref \ref \) \(\renewcommand {\ref }{\ifstar \LWRref \LWRref }\) \( \newcommand {\multicolumn }[3]{#3}\) \(\require {textcomp}\) \(\newcommand {\intertext }[1]{\text {#1}\notag \\}\) \(\let \Hat \hat \) \(\let \Check \check \) \(\let \Tilde \tilde \) \(\let \Acute \acute \) \(\let \Grave \grave \) \(\let \Dot \dot \) \(\let \Ddot \ddot \) \(\let \Breve \breve \) \(\let \Bar \bar \) \(\let \Vec \vec \) \(\require {mathtools}\) \(\newcommand {\vcentcolon }{\mathrel {\unicode {x2236}}}\) \(\newcommand {\approxcolon }{\approx \vcentcolon }\) \(\newcommand {\Approxcolon }{\approx \dblcolon }\) \(\newcommand {\simcolon }{\sim \vcentcolon }\) \(\newcommand {\Simcolon }{\sim \dblcolon }\) \(\newcommand {\dashcolon }{\mathrel {-}\vcentcolon }\) \(\newcommand {\Dashcolon }{\mathrel {-}\dblcolon }\) \(\newcommand {\colondash }{\vcentcolon \mathrel {-}}\) \(\newcommand {\Colondash }{\dblcolon \mathrel {-}}\) \(\newenvironment {crampedsubarray}[1]{}{}\) \(\newcommand {\smashoperator }[2][]{#2\limits }\) \(\newcommand {\SwapAboveDisplaySkip }{}\) \(\newcommand {\LaTeXunderbrace }[1]{\underbrace {#1}}\) \(\newcommand {\LaTeXoverbrace }[1]{\overbrace {#1}}\) \(\Newextarrow \xLongleftarrow {10,10}{0x21D0}\) \(\Newextarrow \xLongrightarrow {10,10}{0x21D2}\) \(\let \xlongleftarrow \xleftarrow \) \(\let \xlongrightarrow \xrightarrow \) \(\newcommand {\LWRmultlined }[1][]{\begin {multline*}}\) \(\newenvironment {multlined}[1][]{\LWRmultlined }{\end {multline*}}\) \(\let \LWRorigshoveleft \shoveleft \) \(\renewcommand {\shoveleft }[1][]{\LWRorigshoveleft }\) \(\let \LWRorigshoveright \shoveright \) \(\renewcommand {\shoveright }[1][]{\LWRorigshoveright }\) \(\newcommand {\shortintertext }[1]{\text {#1}\notag \\}\) \(\newcommand {\bigzero }{\smash {\text {\huge 0}}}\) \(\newcommand {\am }{\mathrm {am}}\) \(\newcommand {\gm }{\mathrm {gm}}\) \(\newcommand {\id }{\operatorname {id}}\) \(\newcommand {\GL }{\operatorname {GL}}\) \(\newcommand {\im }{\operatorname {im}}\) \(\newcommand {\rank }{\operatorname {rank}}\) \(\newcommand {\sol }{\operatorname {sol}}\) \(\newcommand {\ann }{\operatorname {ann}}\) \(\newcommand {\rO }{\operatorname {O}}\) \(\newcommand {\rU }{\operatorname {U}}\) \(\newcommand {\rSU }{\operatorname {SU}}\) \(\newcommand {\ev }{\operatorname {ev}}\) \(\newcommand {\bil }{\operatorname {Bil}}\) \(\newcommand {\rad }{\operatorname {rad}}\) \(\newcommand {\Span }[1]{\operatorname {span}\{#1\}}\) \(\newcommand {\R }{\mathbb {R}}\) \(\newcommand {\C }{\mathbb {C}}\) \(\newcommand {\Z }{\mathbb {Z}}\) \(\newcommand {\F }{\mathbb {F}}\) \(\newcommand {\Q }{\mathbb {Q}}\) \(\newcommand {\N }{\mathbb {N}}\) \(\renewcommand {\P }{\mathbb {P}}\) \(\newcommand {\I }{\mathrm {I}}\) \(\newcommand {\half }{\tfrac 12}\) \(\newcommand {\rel }{\mathrel {\mathrm {rel}}}\) \(\renewcommand {\vec }[3]{(#1_{#2},\dots ,#1_{#3})}\) \(\newcommand {\lst }[3]{#1_{#2},\dots ,#1_{#3}}\) \(\newcommand {\plst }[3]{#1_{#2}+\dots +#1_{#3}}\) \(\newcommand {\oplst }[3]{#1_{#2}\oplus \dots \oplus #1_{#3}}\) \(\newcommand {\pplst }[3]{#1_{#2}\times \dots \times #1_{#3}}\) \(\newcommand {\dlst }[3]{\lst {#1^{*}}{#2}{#3}}\) \(\newcommand {\dlc }[4]{\lc #1{#2^{*}}#3#4}\) \(\newcommand {\hmg }[3]{[#1_{#2},\dots ,#1_{#3}]}\) \(\newcommand {\rng }[2]{#1,\dots ,#2}\) \(\newcommand {\lc }[4]{#1_{#3}#2_{#3}+\dots +#1_{#4}#2_{#4}}\) \(\newcommand {\plus }[2]{#1+\dots +#2}\) \(\newcommand {\set }[1]{\{#1\}}\) \(\newcommand {\abs }[1]{\lvert #1\rvert }\) \(\newcommand {\ip }[1]{\langle #1\rangle }\) \(\newcommand {\norm }[1]{\|#1\|}\) \(\newcommand {\bx }{\mathbf {x}}\) \(\newcommand {\be }{\mathbf {e}}\) \(\newcommand {\bq }{\mathbf {q}}\) \(\newcommand {\bu }{\mathbf {u}}\) \(\newcommand {\by }{\mathbf {y}}\) \(\newcommand {\bv }{\mathbf {v}}\) \(\newcommand {\E }{\mathbb {E}}\) \(\newcommand {\cI }{\mathcal {I}}\) \(\newcommand {\cB }{\mathcal {B}}\) \(\newcommand {\sub }{\subseteq }\) \(\newcommand {\st }{\mathrel {|}}\) \(\newcommand {\bw }[3]{#1\leq #2\leq #3}\) \(\newcommand {\col }[3]{(#1_{#2})_{#2\in #3}}\) \(\newcommand {\supp }{\mathrm {supp}}\) \(\newcommand {\restr }[1]{_{|#1}}\)

3.5 The Cayley–Hamilton theorem

  • Theorem 3.12 (Cayley–Hamilton4 Theorem). Let \(\phi \in L(V)\) be a linear operator on a finite-dimensional vector space over a field \(\F \).

    Then \(\Delta _{\phi }(\phi )=0\).

    Equivalently, for any \(A\in M_n(\F )\), \(\Delta _A(A)=0\).

4 Arthur Cayley, 1821–1895; William Rowan Hamilton, 1805–1865.

Before proving this, let us see what it tells us. Let

\begin{equation*} A= \begin{pmatrix} a&b\\c&d \end {pmatrix}\in M_2(\F ). \end{equation*}

Then

\begin{equation*} \Delta _A= \begin{vmatrix} a-x&b\\c&d-x \end {vmatrix}=x^2-(a+d)x+(ad-bc). \end{equation*}

So the Cayley–Hamilton theorem is telling us that

\begin{equation*} A^2-(a+d)A+(ad-bc)I_2=0, \end{equation*}

that is,

\begin{equation*} \begin{pmatrix} a^2+bc&ab+bd\\ca+dc&cb+d^2 \end {pmatrix} -(a+d) \begin{pmatrix} a&b\\c&d \end {pmatrix} + \begin{pmatrix} ad-bc&0\\0&ad-bc \end {pmatrix} = \begin{pmatrix} 0&0\\0&0 \end {pmatrix}. \end{equation*}

This is certainly true (check it!) but is far from obvious! If you are not yet convinced, work out what the theorem says for \(A\in M_3(\F )\).

  • Proof of Theorem 3.12. We will prove the matrix version. So let \(A\in M_n(\F )\) and write

    \begin{equation*} \Delta _{A}=a_0+\dots +a_nx^n. \end{equation*}

    Thus, our mission is to show that

    \begin{equation*} a_0I_n+a_1A+\dots +a_nA^n=0. \end{equation*}

    The key is the adjugate formula from Algebra 1B5:

    \begin{equation} \label {eq:11} \mathrm {adj}(A-xI_{n})(A-xI_{n})=\det (A-xI_{n})I_{n}. \end{equation}

    Each entry of \(\mathrm {adj}(A-xI_n)\) is a polynomial in \(x\) of degree at most \(n-1\) so we write

    \begin{equation*} \mathrm {adj}(A-xI_n)=B_0+B_1x+\dots +B_{n-1}x^{n-1}, \end{equation*}

    with each \(B_k\in M_n(\F )\). Substitute this into (3.7) to get

    \begin{equation*} (B_0+B_1x+\dots +B_{n-1}x^{n-1})(A-xI)=(a_0+\dots +a_nx^n)I_n \end{equation*}

    and compare coefficients of \(x^{k}\) to get

    \begin{equation} \label {eq:12} B_kA-B_{k-1}=a_kI_{n}, \end{equation}

    for \(0\leq k\leq n\), where we have set \(B_{-1}=B_n=0\in M_n(\F )\).

    Multiply (3.8) by \(A^k\) on the right to get

    \begin{equation*} B_kA^{k+1}-B_{k-1}A^k=a_kA^k \end{equation*}

    and sum:

    \begin{equation*} \Delta _{A}(A)=\sum _{k=0}^na_kA^k=\sum _{k=0}^n(B_kA^{k+1}-B_{k-1}A^k)=B_nA^{n+1}-B_{-1}=0 \end{equation*}

    because nearly all terms in the penultimate sum cancel.

5 Theorem 2.4.6

  • Corollary 3.13. Let \(\phi \in L(V)\) be a linear operator on a finite-dimensional vector space over a field \(\F \).

    • (1) \(m_{\phi }\) divides \(\Delta _{\phi }\). Equivalently, \(m_A\) divides \(\Delta _A\), for any \(A\in M_n(\F )\).

    • (2) The roots of \(m_{\phi }\) are exactly the eigenvalues of \(\phi \).

  • Proof. By Theorem 3.12, \(\Delta _{\phi }(\phi )=0\) so \(m_{\phi }\) divides \(\Delta _{\phi }\) by Theorem 3.7. As a result, any root of \(m_{\phi }\) is a root of \(\Delta _{\phi }\) and so an eigenvalue. Conversely, any eigenvalue is a root of \(m_{\phi }\) by Theorem 3.11.

Let us summarise the situation when \(\F =\C \) so that any polynomial is a product of linear factors. So let \(\phi \in L(V)\) be a linear operator on a finite-dimensional complex vector space with distinct eigenvalues \(\lst \lambda 1k\). Then

\begin{align*} \Delta _{\phi }&=\pm \prod _{i=1}^k(x-\lambda _i)^{r_i}\\ m_{\phi }&=\prod _{i=1}^k(x-\lambda _i)^{s_i}, \end{align*} where \(r_i=\am (\lambda _i)\) and \(\bw 1{s_i}{r_i}\), for \(\bw 1ik\).

This gives us another way to find \(m_{\phi }\) if we can factorise \(\Delta _{\phi }\): \(m_{\phi }\) will be of the form \(p=\prod _{i=1}^{k}(x-\lambda _{i})^{s_i}\), with each \(\bw 1{s_i}{r_i}\), so evaluate \(p(\phi )\) to find the one of lowest degree with \(p(\phi )=0\).

  • Examples. Let us find \(m_A\) in the following cases:

    • (1) Take

      \begin{equation*} A= \begin{pmatrix*}[r] 1&1&2\\0&1&1\\0&0&2 \end {pmatrix*}. \end{equation*}

      Since \(A\) is upper triangular, we immediately see that \(\Delta _A=-(x-1)^2(x-2)\) so that \(m_A\) is either \((x-1)(x-2)\) or \((x-1)^2(x-2)\).

      We try the first of these:

      \begin{equation*} (A-I_3)(A-2I_3)= \begin{pmatrix} 0&1&2\\0&0&1\\0&0&1 \end {pmatrix} \begin{pmatrix*}[r] -1&-1&2\\0&-1&1\\0&0&0 \end {pmatrix*} = \begin{pmatrix*}[r] 0&-1&1\\0&0&0\\0&0&0 \end {pmatrix*}\neq 0. \end{equation*}

      We conclude that \(m_A=(x-1)^2(x-2)\).

    • (2) Let us try again with

      \begin{equation*} A= \begin{pmatrix} 1&0&3\\0&1&2\\0&0&2 \end {pmatrix} \end{equation*}

      which also has \(\Delta _A=-(x-1)^2(x-2)\) so that \(m_A\) is either \((x-1)(x-2)\) or \((x-1)^2(x-2)\).

      However, this time

      \begin{equation*} (A-I_3)(A-2I_3)= \begin{pmatrix} 0&0&3\\0&0&2\\0&0&1 \end {pmatrix} \begin{pmatrix*}[r] -1&0&3\\0&-1&2\\0&0&0 \end {pmatrix*} =0 \end{equation*}

      so that \(m_A=(x-1)(x-2)\).