Chapter 4 Solving nonlinear equations
4.1 Root finding
Root finding problem. For a given function \(f:\mathbb{R}\to\mathbb{R}\), determine a solution \(x\) of the equation \(f(x)=0\). A solution \(x\) is known as a root of \(f\).
For most \(f\), there is no formula giving \(x\) explicitly and numerical methods are required. For example, we may use the bisection method: choose an interval \([a,b]\) that contains a root by checking that \(f(a)>0\) and \(f(b)<0\) or vice versa (if \(f\) is continuous and changes sign, it has a root in the interval). Bisecting the interval and choosing a subinterval at whose endpoints \(f\) changes sign gives a new interval containing a root. We then iterate to find successively smaller intervals and better approximations to the root. Given a suitable initial interval, this bisection method is simple to apply, but it does not generalise easily to higher dimensions. Instead, we focus on fixed-point iterations.
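The bisection step just described can be sketched in a few lines of Python (a minimal illustration; the function name `bisect`, the tolerance, and the iteration cap are choices made here, not part of the text):

```python
def bisect(f, a, b, tol=1e-10, max_iter=100):
    """Bisection: f continuous with f(a), f(b) of opposite signs."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        m = (a + b) / 2
        fm = f(m)
        if fm == 0 or (b - a) / 2 < tol:
            return m
        if fa * fm < 0:      # f changes sign on [a, m]
            b, fb = m, fm
        else:                # f changes sign on [m, b]
            a, fa = m, fm
    return (a + b) / 2

# Example: the root of x^3 + 4x^2 - 10 in [1, 2].
root = bisect(lambda x: x**3 + 4*x**2 - 10, 1, 2)
```

Each step halves the interval, so the error decreases by a factor of two per iteration regardless of \(f\).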
4.2 Fixed-point iteration
Definition 4.1 (root, fixed point) We say \(x\) is a root of a function \(f\) if \(f(x)=0\), and \(x\) is a fixed point (FP) of a function \(g\) if \(g(x)=x\).
Often, root-finding problems can be replaced by fixed-point (FP) problems:
Example 4.1 Let \(f(x)=x^3+4x^2-10\). There are many ways of posing this as a FP problem \(g(x)=x\).
Let \(g_1(x)=x-x^3-4x^2+10\). Then, it is easy to check that \(g_1(x)=x\) if and only if \(f(x)=0\).
Let \(g_2(x)=\frac 12 (10-x^3)^{1/2}\) (positive root). For \(x>0\), \(g_2(x)=x\) if and only if \(f(x)=0\).
Let \(g_3(x)= \left({\frac{10}{4+x}}\right)^{1/2}\), which is again a FP problem for the root-finding problem for \(f\).
Define the sequence \(x_n\) by \(x_{n+1}=g(x_n)\), given an initial condition \(x_0\). Under what conditions does \(x_n\) converge to a fixed point of \(g\), and can this be used to compute the root? To answer this question, we have the fixed-point theorem. We use the term 'smooth' to mean that the function has enough continuous derivatives.
Theorem 4.1 (Convergence of FP iteration) Let \(g: [a,b]\to\mathbb{R}\) be a smooth function. Then, if (i) \(g(x)\in [a,b]\) for \(x\in [a,b]\) and (ii) \(|g'(x)|\le \lambda<1\) for \(x\in [a,b]\), then the sequence \(x_n\) defined by \(x_{n+1}=g(x_n)\) for any \(x_0\in[a,b]\) converges to the unique fixed point \(x\) of \(g\). Further, \[ \lvert x_{n}-x\rvert \le \lambda^n \lvert x-x_0 \rvert. \]
As well as convergence of the FP iteration, this theorem also gives existence and uniqueness of the FP of \(g\) in \([a,b]\).
Example 4.2 We look back at the fixed point problems in Example 4.1:
\(g_1(x)=x-x^3-4x^2+10\) and \([a,b]=[1,2]\). Then \(g_1(1)=6\) and condition (i) fails. The FP theorem does not apply to the iteration based on \(g_1\).
\(g_2(x)=\frac 12 \left({10-x^3}\right)^{1/2}\) and \[\begin{align*} g_2'(x) &=\frac 14 \left({10-x^3}\right)^{-1/2} (-3x^2) =\frac{-3x^2 }{4(10-x^3)^{1/2}},\\ g_2'(2) &=\frac{-3\cdot 4}{4(10-8)^{1/2}} =\frac{-3}{\sqrt{2}} \approx -2.12. \end{align*}\] Hence, \(\lvert g'_2(x) \rvert >1\) near \(x=2\) and condition (ii) fails on \([1,2]\). Again, the FP theorem does not apply.
\(g_3(x)=\left({\frac{10}{4+x}}\right)^{1/2}\) and \[ g'_3(x)=\frac 12 \left({\frac{10}{4+x}}\right)^{-1/2} \left({\frac{-10}{(4+x)^2}}\right) = \frac{-5}{(4+x)^{3/2}\sqrt{10}}. \] As \(g_3\) is decreasing, with \(g_3(1)=\sqrt 2\in[1,2]\) and \(g_3(2)=\sqrt{10/6}\in[1,2]\), condition (i) holds. Further, as \(4+x\ge 5\), \[ \lvert g'_3(x) \rvert \le \frac{5}{\sqrt{10}} \frac{1}{5^{3/2}} <1\qquad \text{for $x\in [1,2]$}. \] Hence (ii) holds. The FP theorem applies and \(x_n\to x\), the unique fixed point of \(g_3\) in \([1,2]\) and the root of \(f\): try it, \[ x_0=1.5,\qquad x_1=g_3(x_0)\approx 1.3484,\qquad x_2\approx 1.3674,\qquad x_3\approx 1.365. \] We see that the first three digits of \(x_n\) have already converged and that \(f(1.365)=-0.0038\), indicating that \(1.365\) is close to the root of \(f\).
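The iteration with \(g_3\) can be reproduced with a few lines of Python (a sketch; the helper `fixed_point` is an illustrative name, not from the text):

```python
def fixed_point(g, x0, n):
    """Return the iterates x_0, x_1, ..., x_n of x_{k+1} = g(x_k)."""
    xs = [x0]
    for _ in range(n):
        xs.append(g(xs[-1]))
    return xs

# g_3 from Example 4.1, started at 1.5 as in the text.
g3 = lambda x: (10 / (4 + x)) ** 0.5
xs = fixed_point(g3, 1.5, 4)
```

Printing `xs` shows the iterates settling on \(1.365\ldots\), consistent with the values above.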
For the proof of the fixed point theorem, we use the mean-value theorem from MA12001.
Theorem 4.2 (mean-value theorem) Let \(f: [a,b]\to\mathbb{R}\) be smooth. There exists \(\xi \in (a,b)\) such that \[ \frac{f(b) - f(a)}{b-a} = f'(\xi) . \]
We now prove Theorem 4.1:
Proof. Let \(h(x)=g(x)-x\) (using a new name to avoid clashing with the root-finding function \(f\)). Then, by (i), \(h(a)=g(a)-a\ge 0\) and \(h(b)=g(b)-b\le 0\). By the intermediate-value theorem, there exists \(x\in [a,b]\) such that \(h(x)=0\). In other words, there exists \(x\in[a,b]\) so that \(g(x)=x\).
Consider the iteration \(x_{n+1}=g(x_n)\) and the fixed-point equation \(x=g(x)\). Then, \[ x_{n+1}-x =g(x_n)-g(x). \] By the mean-value theorem, there exists \(\xi\in(a,b)\) so that \[ x_{n+1}-x =g'(\xi)(x_{n}-x) \] (as \(g\) is smooth). Now \(|g'(\xi)|\le \lambda\) and \[ \lvert x_{n+1}-x \rvert \le \lambda \lvert x_n-x \rvert. \] By a simple induction argument, this implies that \(|x_{n}-x|\le \lambda^n |x_0-x|\).
Finally, to show uniqueness, consider two fixed points \(x,y\) of \(g\). Then \(g(x)=x\) and \(g(y)=y\), and by the mean-value theorem, for some \(\xi\in(a,b)\), \[ x-y =g(x)-g(y) =g'(\xi)(x-y). \] As \(\lvert g'(\xi) \rvert \le \lambda\), we see that \[ \lvert x-y \rvert \le \lambda \lvert x-y \rvert. \] As \(\lambda<1\), it must hold that \(x=y\) and there is only one fixed point of \(g\) in \([a,b]\).
4.3 Newton’s method
The best-known example of a fixed-point iteration is Newton’s method. This is the iteration \[ x_{n+1}=g(x_n),\qquad g(x):= x-\frac{f(x)}{f'(x)}, \] where we assume that \(f'(x)\ne 0\). Clearly, \(f(x)=0\) if and only if \(g(x)=x\).
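A minimal Newton iteration in Python might look as follows (the function name `newton` and the stopping rule are illustrative choices, not prescribed here), applied to \(f(x)=x^3+4x^2-10\) from Example 4.1:

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton's method x_{n+1} = x_n - f(x_n)/f'(x_n); assumes f'(x_n) != 0."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x = x - step
        if abs(step) < tol:   # stop when the update is negligible
            break
    return x

# Root of f(x) = x^3 + 4x^2 - 10, starting at x0 = 1.5.
r = newton(lambda x: x**3 + 4*x**2 - 10, lambda x: 3*x**2 + 8*x, 1.5)
```

Note that the derivative must be supplied by the user, in contrast to bisection or the iteration with \(g_3\).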
We show the FP theorem applies:
Theorem 4.3 (local convergence of Newton's method) Suppose that \(f\) is smooth and that \(f(x)=0\) and \(f'(x)\ne 0\). Then, there exists \(\epsilon>0\) so that Newton’s method converges to the root \(x\) of \(f\) if the initial guess \(x_0\in [x-\epsilon,x+\epsilon]\).
Proof. With \(g(x)=x-f(x)/f'(x)\), we have \[\begin{equation} \tag{4.1} g'(x) =1-\frac{f'(x)}{f'(x)} + \frac{f(x)f''(x)}{f'(x)^2}=0 \end{equation}\] as \(f(x)=0\). As \(f\) and \(g\) are smooth, we have \(|g'(y)|\le \lambda:= \frac 12\) for \(y\in[x-\epsilon,x+\epsilon]\) by choosing \(\epsilon\) sufficiently small. This gives (ii) of the FP theorem for \(a=x-\epsilon\) and \(b=x+\epsilon\). For (i), note that \[ |g(y)-x| =|g(y)-g(x)| \le |g'(\xi)|\,|x-y| \le \frac 12 \epsilon, \] for any \(y\in [a,b]\) and some \(\xi\in(a,b)\) by the mean-value theorem. Clearly, then \(g(y)\in [x-\epsilon, x+\epsilon]=[a,b]\) and (i) of the FP theorem holds. We conclude that Newton’s method converges for initial conditions close (as given by \(\epsilon\)) to \(x\).
This theorem is problematic for the practitioner: it says that Newton’s method converges if we can start close enough to the root! We don’t usually know the root and we don’t usually know what \(\epsilon\) is (i.e., what close enough means). However, Newton’s method is often effective and, when it works, it is often very fast.
To quantify the speed of convergence, we define the order of convergence.
Definition 4.2 (order of convergence) Consider a sequence \(x_n\) approximating \(x\). Let \(e_n:= \lvert{x_n-x}\rvert\), the error in the approximation. We say that \(x_n\) converges to \(x\) with order \(r\) if
case \(r=1\) (linear convergence): \(e_{n+1} \le K e_n\) for all \(n\in \mathbb{N}\), for some \(K<1\);
case \(r>1\): \(e_{n+1} \le K e_n^r\) for all \(n\in \mathbb{N}\), for some \(K>0\). The case \(r=2\) is known as quadratic convergence.
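The order can be estimated from computed errors: if \(e_{n+1}\approx K e_n^r\), then \(r \approx \log(e_{n+1}/e_n)/\log(e_n/e_{n-1})\). A sketch using Newton's method for \(x^2-2\) (an illustrative test problem with root \(\sqrt 2\)), where the estimates should approach \(2\):

```python
import math

# Newton iterates for f(x) = x^2 - 2, starting from 1.5.
x, xs = 1.5, [1.5]
for _ in range(4):
    x = x - (x * x - 2) / (2 * x)   # Newton step
    xs.append(x)

root = math.sqrt(2)
e = [abs(xn - root) for xn in xs]

# If e_{n+1} ~ K e_n^r, then r ~ log(e_{n+1}/e_n) / log(e_n/e_{n-1}).
# Skip errors at machine-precision level, where this model breaks down.
rates = [math.log(e[n + 1] / e[n]) / math.log(e[n] / e[n - 1])
         for n in range(1, len(e) - 1) if e[n + 1] > 1e-14]
```

The computed `rates` are close to \(2\), consistent with the quadratic convergence established below for Newton's method.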
One of the most useful tools in numerical analysis is Taylor’s theorem from MA12001:
Theorem 4.4 (Taylor's theorem) Suppose that \(f: (a,b) \to \mathbb{R}\) is \((m+1)\)-times continuously differentiable and suppose \(x_0, x \in (a,b)\) with \(x_0 \not = x\). Then, \[ f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{ f''(x_0)}{2}(x-x_0)^2 + \dots + \frac{f^{(m)} (x_0)}{m!}(x - x_0)^m + R_m (x), \] where \(f^{(m)}\) denotes the \(m\)-th derivative of \(f\) and \[ R_m(x) = \frac{f^{(m+1)} (\xi)}{(m+1)!}(x - x_0)^{m+1}, \] for some \(\xi\) lying strictly between \(x\) and \(x_0\).
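The remainder bound is easy to verify numerically. For \(f=\sin\) about \(x_0=0\) with \(m=3\), every derivative of \(\sin\) is bounded by \(1\), so \(\lvert R_3(x)\rvert \le \lvert x\rvert^4/4!\) (a sketch; the evaluation point \(x=0.5\) is an arbitrary choice):

```python
import math

# Degree-3 Taylor polynomial of sin about x0 = 0: p3(x) = x - x^3/6.
x = 0.5
p3 = x - x**3 / 6
err = abs(math.sin(x) - p3)

# |R_3(x)| <= max|sin^(4)| / 4! * |x|^4 <= |x|^4 / 24.
bound = abs(x)**4 / math.factorial(4)
```

The actual error is comfortably below the bound, since \(\sin^{(4)}(\xi)=\sin(\xi)\) is much smaller than \(1\) near \(0\).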
Theorem 4.5 (Newton's method) If \(f\) is smooth and the initial guess \(x_0\) is sufficiently close to the root \(x\), then Newton’s method converges quadratically; that is, for some \(K>0\), \[ e_{n+1}= |x_{n+1}-x|\le K |x_n-x|^2=K e_n^2, \] where \(e_n=\lvert{x_n-x}\rvert\) represents the error at step \(n\).
Proof. Use Taylor’s theorem to write \[ g(y) =g(x)+g'(x)(y-x) +\frac 12 g''(\xi)(y-x)^2 \] for some \(\xi\) between \(x\) and \(y\). We know from the calculation in (4.1) that \(g'(x)=0\) and, as \(g(x)=x\), \[ g(y)-g(x) =g(y)-x = \frac 12 g''(\xi) (y-x)^2. \] We know that the FP theorem applies in some interval \([a,b]=[x-\epsilon,x+\epsilon]\). Hence, if \(x_0\in[a,b]\), then \(x_n\in[a,b]\) for all \(n\in\mathbb{N}\), so it is enough to take \(y\in[a,b]\), in which case also \(\xi\in[a,b]\). Let \(K:= \frac 12 \max_{\xi\in[a,b]} |g''(\xi)|\), which is finite as \(g\) is smooth. Then, with \(y=x_n\), we have \[ \lvert{x_{n+1}-x}\rvert \le K \lvert{x_n-x}\rvert^2, \] concluding the proof.
Example 4.3 Note that \(f(\pi)=0\) for \(f(x)=\sin(x)\). Newton’s method is the iteration
\[
x_{n+1} =x_n - \frac{\sin(x_n)}{\cos(x_n)} =x_n-\tan(x_n).
\]
Then, we take an initial condition \(x_0=3\) and
\[
x_1= 3.142546543074278,\quad x_2= 3.141592653300477,\quad x_3= 3.141592653589793
\]
by iterating the Python code `x = x - np.tan(x)` (with `import numpy as np`).
The example nicely illustrates quadratic convergence: the number of correct digits increases rapidly (3, then 10, then all 16 digits shown; further doubling is prevented only by rounding error).
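A runnable version of this iteration (using the standard-library `math.tan` rather than NumPy; the choice is immaterial here):

```python
import math

x = 3.0                    # initial guess near the root pi
iterates = []
for _ in range(3):
    x = x - math.tan(x)    # Newton step for f = sin, since sin/cos = tan
    iterates.append(x)
```

Printing `iterates` reproduces the three values quoted above, with the final iterate agreeing with \(\pi\) to machine precision.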