Chapter 6 Solution of Initial-Value Problems (IVPs)
An IVP in standard form is a system of \(N\) differential equations: \[\begin{equation} \tag{6.1} \frac{d\vec y}{dt} = f(\vec y,t),\quad t\geq 0, \end{equation}\] where \(f : \mathbb{R}^{N+1} \to \mathbb{R}^N\) is a given function and \(\vec y (t) \in \mathbb{R}^N\) is the solution (for \(t>0\)), to be found subject to initial conditions of the form: \[\begin{equation} \tag{6.2} \vec y(0)=\vec y_0, \end{equation}\] with initial data \(\vec y_0 \in \mathbb{R}^N\) given.
Example 6.1 (N = 1) Let \(f(y,t) = y^2\) and \(y_0 = 1\). Then
\[ \frac{d y}{d t}=y^2, \quad \text{subject to} \quad y(0)=1. \] It is easy to see that the exact solution is \(y(t)=1/(1-t).\) (Please check!) Usually the exact solution is not so easy to find.
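The check can also be carried out symbolically; here is a minimal Python sketch (assuming the SymPy library is available) that verifies both the equation and the initial condition.

```python
# Symbolic check (assumes SymPy) that y(t) = 1/(1 - t) solves
# dy/dt = y^2 with y(0) = 1.
import sympy as sp

t = sp.symbols('t')
y = 1 / (1 - t)

print(sp.simplify(sp.diff(y, t) - y**2))  # prints 0: the ODE holds
print(y.subs(t, 0))                       # prints 1: the initial condition holds
```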
Not all problems are presented in the standard form of Example 6.1; the next example shows how to rewrite one that is not.
Example 6.2 A simple pendulum is released from rest at angle \(\alpha\). The angle \(\theta = \theta(t)\) of the pendulum satisfies the second-order equation: \[\begin{equation} \tag{6.3} \frac{d^2\theta}{dt^2}=\frac ga\sin\theta, \end{equation}\] with \(g\) and \(a\) constants, subject to two initial conditions: \[ \theta(0) = \alpha, \quad \frac{d \theta}{d t}(0) = 0. \] To put these equations in standard form, introduce the new variable \(\phi\) defined by \(\phi = d\theta/dt\).
Then (6.3) becomes \[\begin{align*} \frac{d \theta}{dt}&=\phi,\\ \frac{ d \phi}{dt} &= \frac ga\sin\theta.\\ \end{align*}\] which has the form (6.1) with \[ \vec y =\begin{bmatrix}\theta\\ \phi \end{bmatrix} \qquad \text{and} \qquad f(\vec y , t) =\begin{bmatrix}\phi\\ \frac ga\sin\theta \end{bmatrix}. \]
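In code, the reduction to standard form just packages \(\theta\) and \(\phi\) into a single vector. A minimal Python sketch (assuming NumPy, with illustrative values for \(g\), \(a\) and \(\alpha\)):

```python
import numpy as np

# The pendulum system of Example 6.2 in the standard form (6.1):
# y = [theta, phi], with phi = dtheta/dt.
g, a = 9.81, 1.0            # illustrative values for the constants in (6.3)

def f(y, t):
    theta, phi = y
    return np.array([phi, (g / a) * np.sin(theta)])

y0 = np.array([0.3, 0.0])   # released from rest at angle alpha = 0.3
```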
6.1 Euler’s method
Given a problem in the standard form (6.1)–(6.2), suppose we want to find \(\vec y (t)\) at \(t=h\) for some small \(h>0\). Integrating both sides of (6.1) over \([0,h]\) gives
\[\begin{equation} \vec y(h) - \vec y(0) = \int^h_0\frac{d \vec y}{d t}\,d t = \int^h_0 f(\vec y(t),t)\,d t. \tag{6.4} \end{equation}\]
The integral on the right-hand side is still unknown, but we can approximate it, for example, by replacing the integrand by its value at \(t = 0\) (a reasonable approximation when \(h\) is small). This yields \[\begin{equation} \vec y(h) \approx \vec y(0) + h f(\vec y(0),0) = \vec y_0 + h f(\vec y_0,0). \end{equation}\]
Everything on the right-hand side is known, so we can compute an approximation of \(\vec y(h)\). Then \(\vec y(2h)\) can be approximated by taking another step, and so on. This is Euler’s Method, which computes a sequence of approximations \(\vec Y_j\) to \(\vec y(t_j)\), where \(t_j = jh\), \(j = 1,2,\ldots\), by \[\begin{equation} \begin{aligned} \vec Y_0& = \vec y_0,\\ \vec Y_{j}& = \vec Y_{j-1} + h f(\vec Y_{j-1},t_{j-1}), \qquad \text{for all } j\geq 1. \end{aligned} \end{equation}\]
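A minimal Python implementation of Euler's method (the function name `euler` and the NumPy dependency are our own choices; `f(y, t)` is assumed to return an array of the same shape as `y`):

```python
import numpy as np

def euler(f, y0, h, n):
    """Take n steps of Euler's method with step h; return [Y_0, ..., Y_n]."""
    Y = np.empty((n + 1,) + np.shape(y0))
    Y[0] = y0
    for j in range(1, n + 1):
        Y[j] = Y[j - 1] + h * f(Y[j - 1], (j - 1) * h)
    return Y
```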
Example 6.3 Solve \[\begin{equation} \tag{6.5} \frac{d y}{d t}=y^2, \quad \quad y(0) = 1 \end{equation}\] using Euler’s method with \(h=0.1\); that is, \[\begin{align*} Y_0&=y_0= 1\\ Y_1&=Y_0 + hf(Y_0)=1+0.1(1^2) = 1.1\\ Y_2&=Y_1+ hf(Y_1)=1.1+0.1(1.1)^2=1.221. \end{align*}\]
Using a program implementing Euler’s method (such as the sketch above), we approximate the solution of (6.5) at the time \(T = 1/2\) for various \(h\). The number of steps taken in Euler’s method is then \(n = T/h\). Since we know from Example 6.1 that \(y(1/2) = 2\) in this case, we can compute the error exactly.
Results:
| \(n = T/h\) | \(h\) | \(\lvert y(1/2) - Y_n \rvert\) | Ratio |
|---|---|---|---|
| 4 | 1/8 | 0.2338 | 0.59 |
| 8 | 1/16 | 0.1389 | 0.55 |
| 16 | 1/32 | 0.07696 | 0.53 |
| 32 | 1/64 | 0.04073 | |
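Assuming the `euler()` sketch above, the table can be reproduced as follows:

```python
# Errors and successive error ratios for dy/dt = y^2, y(0) = 1,
# at T = 1/2, where the exact value is y(1/2) = 2.
T, y_half = 0.5, 2.0
errors = []
for n in [4, 8, 16, 32]:
    Y = euler(lambda y, t: y**2, 1.0, T / n, n)
    errors.append(abs(y_half - Y[-1]))
    print(n, errors[-1])
for e0, e1 in zip(errors, errors[1:]):
    print(f"ratio: {e1 / e0:.2f}")   # ratios tend to 1/2, i.e. O(h)
```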
The ratio of each error to the previous one approaches \(1/2\) as \(h\) is halved, which suggests that \(|y(t_n) - Y_n| = \mathcal{O}(h)\) as \(h \rightarrow 0\). This is what we prove in the next section.
6.2 Convergence of one-step methods
We now restrict to the simplified version of (6.1) with \(N = 1\) and \(f(y,t) = f(y)\): \[\begin{equation} \tag{6.6} \frac{d y}{d t}=f(y) \end{equation}\] subject to the initial condition \[\begin{equation} y(0)=y_0 . \tag{6.7} \end{equation}\]
We shall consider “one-step” methods of the form: \[\begin{align} Y_0& = y_0 \nonumber\\ Y_{j}&= Y_{j-1} + h F_h(Y_{j-1}), \qquad \text{for all } j\geq 1, \tag{6.8} \end{align}\] where \(F_h\) is a function to be specified.
An example is Euler’s method, where \(F_h \equiv f\). The error at the \(j\)th step is defined to be \[ e_j = y(t_j)-Y_j \quad \text{where} \quad t_j = jh. \]
Definition 6.1 If \(y\) solves (6.6),(6.7), the local truncation error for (6.8) is defined to be \[ \tau_j = \frac{y(t_{j})-y(t_{j-1})}{h}-F_h(y(t_{j-1})) \, ,\qquad \text{where} \quad t_j=jh. \] (i.e., \(h \tau_j\) is the discrepancy when the true solution is substituted into (6.8)).
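On Example 6.1, where the exact solution is known, the local truncation error of Euler’s method (\(F_h \equiv f\)) can be evaluated directly; a short Python sketch:

```python
# Local truncation error of Euler's method on Example 6.1, where
# y(t) = 1/(1 - t) and f(y) = y^2; here tau_1 works out to h/(1 - h).
y = lambda t: 1.0 / (1.0 - t)
f = lambda v: v**2

for h in [0.1, 0.05, 0.025]:
    tau_1 = (y(h) - y(0)) / h - f(y(0))
    print(h, tau_1)   # tau_1 roughly halves with h, consistent with O(h)
```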
The convergence analysis proceeds by (i) bounding the error in the computed solutions \(Y_j\) in terms of the local truncation errors and (ii) estimating the local truncation error using Taylor’s theorem.
Definition 6.2 A continuous function \(g: \mathbb{R} \rightarrow \mathbb{R}\) is called Lipschitz continuous with Lipschitz constant \(L>0\) if \[ | \,g(Y) - g(Z)\, | \leq L \, | Y - Z | \qquad \text{for all $Y , Z \in \mathbb{R}$.} \]
Theorem 6.1 Suppose \(F_h\) is Lipschitz continuous with Lipschitz constant \(L\) independent of \(h\). Then the error \(e_j\) in (6.8) satisfies \[\begin{equation} \tag{6.9} |e_{j}|\leq (1 + hL) | e_{j-1} | + h | \tau_j| , \quad j = 1 ,2, 3, \ldots . \end{equation}\] Moreover, for all fixed \(T\) and all \(n \in \mathbb{N}\) satisfying \(n h \leq T\), \[ | e_n | \leq \frac{\exp(TL)-1}{L} \max_{1 \leq j \leq n} | \tau_j|. \]
Proof. By definition of \(\tau_j\), \[\begin{equation} \tag{6.10} y(t_{j}) =y(t_{j-1}) + h F_h(y(t_{j-1})) + h\tau_j \, . \end{equation}\] So with \(e_j = y(t_j) - Y_j\), we have by subtracting (6.8) from (6.10): \[ e_{j}=e_{j-1}+h\big(F_h(y(t_{j-1})) - F_h(Y_{j-1})\big) +h\tau_j. \] By the triangle inequality and the Lipschitz continuity of \(F_h\), \[\begin{align*} | e_{j} | & \leq | e_{j-1} | + h | F_h(y(t_{j-1})) - F_h(Y_{j-1}) | + h | \tau_j | \\ & \leq (1 + hL) | e_{j-1} | + h | \tau_j |. \end{align*}\] We see that (6.9) holds.
Now we shall prove by induction that, for all \(n \geq 1\), \[\begin{equation} \tag{6.11} | e_n | \leq h \sum_{j = 0} ^{n-1} (1 + hL)^j \, |\tau_{n-j}|. \end{equation}\] Clearly (6.11) holds for \(n =1\), since the first equation in (6.8) implies \(| e_0| = 0\) and (6.9) then gives \(| e_1 | \leq h | \tau_1|\). Now if (6.11) holds for some \(n\), then (6.9) implies \[\begin{align*} | e_{n+1} | & \leq h \sum_{j =0}^{n-1} (1 + hL)^{j+1} \, | \tau_{n-j}| + h | \tau_{n+1} | \\ & = h \sum_{j =1}^{n} (1 + h L)^{j} \, | \tau_{n+1-j}| + h | \tau_{n+1} |\\ & = h \sum_{j =0}^{n} (1 + hL)^{j} \, | \tau_{n+1-j}|. \end{align*}\] Hence, (6.11) holds for \(n+1\), and for all \(n\) by induction.
Finally, from (6.11) we have, since \(\displaystyle \sum_{j=0}^{n-1} q^j = \frac{q^n-1}{q-1}\), \[ | e_n | \leq \frac{(1+ hL)^{n}-1}{L} \max_{1 \le j \le n} | \tau_j | \leq \frac{\exp(nhL)-1}{L} \max_{1 \le j \le n} | \tau_j |, \] since \(1 + x \leq \exp x \,\), for all \(x \geq 0\). And so the result follows for all \(nh\leq T\).
Remark. This theorem shows that the error in the approximation computed by (6.8) at the point \(t_n = nh\) approaches \(0\) as \(h \rightarrow 0\), provided all the local truncation errors \(\tau_j\), \(j=1,\ldots,n\), approach \(0\).
Normally the local truncation error is estimated by applying Taylor’s theorem.
Example 6.4 Euler’s method is (6.8) with \(F_h (Y) := f(Y)\). If we assume that \(f\) is Lipschitz continuous, then Theorem 6.1 applies, and to prove convergence it remains to estimate \(\tau_j\).
To do this we write (using the definition of \(\tau_j\)): \[ \tau_j = \frac{y(t_{j} ) - y(t_{j-1})}{h} - f(y(t_{j-1})) = \frac{y(t_{j-1} + h ) - y(t_{j-1})}{h} - f(y(t_{j-1})), \] which we can expand via Taylor’s Theorem, with \(\xi_j \in (t_{j-1} , t_{j})\), to obtain \[\begin{align*} \tau_j & = \frac{\displaystyle \bigg(y(t_{j-1}) + h \frac{dy}{dt}(t_{j-1}) + \frac{h^2}{2} \frac{d^2y}{dt^2} (\xi_j) - y(t_{j-1})\bigg)}{h} - f(y(t_{j-1})) \\ & = \left[ \frac{dy}{dt}(t_{j-1}) - f(y(t_{j-1}))\right] + \frac{h}{2} \, \frac{d^2y}{dt^2} (\xi_j). \end{align*}\]
Now, since \(y(t)\) is the solution of (6.6), the bracketed term vanishes, and we have, for all \(j = 1, \ldots, n\): \[\begin{equation} | \tau_j| \leq \frac{1}{2} \max_{t \in [0,T]} \bigg|\frac{d^2 y }{dt^2} (t)\bigg| \, h. \end{equation}\] Hence if \[\begin{equation} \tag{6.12} \bigg|\frac{d^2 y }{dt^2} (t)\bigg| \qquad \text{is bounded for $t \in \mathbb{R}$}, \end{equation}\]
then Theorem 6.1 implies that \[\begin{equation} \tag{6.13} | e_n | \leq C(T) h , \end{equation}\]
where \(C(T)\) is a constant depending on \(T\); and, for fixed \(T\), we have convergence (i.e., \(e_n \rightarrow 0\) as \(h \rightarrow 0\)).
Often one just assumes (6.12) and then concludes (6.13). To show (6.12) rigorously, we need to make some assumptions on the given function \(f\). In particular, assume that \[\begin{align*} | f(x) | & \leq M, \quad x \in \mathbb{R} , \\[1.5ex] \text{ and } \quad | f'(x) | & \leq L , \quad x \in \mathbb{R}. \end{align*}\] Then \(f\) is Lipschitz continuous with Lipschitz constant \(L\) (by the mean value theorem), and by (6.6) and the chain rule, \[ \left| \frac{d^2y}{dt^2} (t) \right| = \left| \frac{d}{dt}f(y(t))\right| = \left| f'(y(t)) \frac{dy}{dt}(t) \right| = \left| f'(y(t))\right| \left| f(y(t)) \right| \leq LM . \]
6.3 Higher-order methods
We saw in (6.13) that Euler’s Method converges with \(\mathcal{O}(h)\). This is relatively slow. Higher-order methods can be found by employing higher-order quadrature in (6.4). For simplicity, restrict again to the case \(N=1\) and \(f(y,t) = f(y)\): Then (6.4) is \[ y(h)-y(0)=\int^h_0f(y(t))\,d t. \] Instead of approximating the right-hand side by the one-point quadrature rule at \(0\), consider instead the trapezium rule, to obtain
\[\begin{equation} y(h)\approx y(0)+\frac h2 \Big(f(y(0))+f(y(h))\Big). \tag{6.14} \end{equation}\] This motivates the Crank–Nicolson (or Trapezoidal) Method for solving (6.6): \[\begin{equation} \begin{aligned} Y_0& = y_0\\ Y_{j}&= Y_{j-1} + \frac h2 \Big(f(Y_{j-1})+f(Y_{j})\Big), \qquad \text{for all $j\geq 1$.} \end{aligned} \tag{6.15} \end{equation}\] This method has one significant disadvantage: to find \(Y_{j}\) from \(Y_{j-1}\) in (6.15), we have to solve the (possibly nonlinear) equation \[ Y_{j}-\frac h2 f(Y_{j}) = Y_{j-1}+\frac h2 f(Y_{j-1}) \, . \]
If there are \(N\) differential equations, then there are \(N\) (possibly nonlinear) equations to solve at each timestep. Despite this extra cost such “implicit” methods are often preferred because of their good stability properties. (Stability is an advanced topic.)
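As an illustration, here is a minimal Python sketch of (6.15) for a scalar equation of the form (6.6), with the nonlinear equation for \(Y_j\) solved by simple fixed-point iteration (which converges when \(hL/2 < 1\)); a production code might use Newton's method instead.

```python
def crank_nicolson(f, y0, h, n, fp_iters=30):
    """n steps of the Crank-Nicolson method (6.15) for dy/dt = f(y).

    At each step Y_j solves Y_j - (h/2) f(Y_j) = Y_{j-1} + (h/2) f(Y_{j-1});
    here that equation is solved by fixed-point iteration.
    """
    Y = [y0]
    for _ in range(n):
        rhs = Y[-1] + 0.5 * h * f(Y[-1])
        y_new = Y[-1]                    # initial guess: previous value
        for _ in range(fp_iters):
            y_new = rhs + 0.5 * h * f(y_new)
        Y.append(y_new)
    return Y
```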
To estimate the local truncation error for (6.15), write the method as \[ \frac{Y_{j} - Y_{j-1}}{h} = \frac{1}{2} \Big(f(Y_{j-1})+f(Y_{j})\Big). \]
Then using (6.6) and applying Taylor’s theorem to both \(y\) and \(dy/dt\) we get: \[\begin{align*} \tau_j&=\frac{y(t_{j})-y(t_{j-1})}{h}-\frac{1}{2}\Big(f(y(t_{j-1}))+f(y(t_{j}))\Big) \\[3pt] &= \frac{dy}{dt}(t_{j-1}) + \frac{h}{2} \frac{d^2y}{dt^2}(t_{j-1}) + \frac{h^2}{6} \frac{d^3y}{dt^3}(t_{j-1}) - \frac{1}{2}\bigg(\frac{dy}{dt}(t_{j-1}) + \frac{dy}{dt}(t_{j})\bigg) + \mathcal{O}(h^3)\\[3pt] &= \bigg(\frac{dy}{dt} + \frac{h}{2} \frac{d^2y}{dt^2} + \frac{h^2}{6} \frac{d^3y}{dt^3} - \frac{dy}{dt} - \frac{h}{2} \frac{d^2y}{dt^2} - \frac{h^2}{4} \frac{d^3y}{dt^3}\bigg)(t_{j-1}) + \mathcal{O}(h^3)\\[3pt] &= - \frac{1}{12} \frac{d^3y}{dt^3}(t_{j-1})\,h^2 + \mathcal{O}(h^3). \end{align*}\] Hence, provided \(y\) has three bounded derivatives, \(|\tau_j| = \mathcal{O}(h^2)\). Note that \[ \frac{d^3y}{dt^3} = \frac{d^2}{dt^2} f(y) = \frac{d}{dt} \big(f'(y) y'\big) = \frac{d}{dt} \big(f'(y) f(y)\big) = f''(y) (f(y))^2 + (f'(y))^2 f(y)\,. \]
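The leading term can be checked numerically on Example 6.1, where \(y(t) = 1/(1-t)\) and hence \(d^3y/dt^3(0) = 6\); a short sketch:

```python
# tau_1 for the trapezoidal rule on Example 6.1, compared with the
# predicted leading term -(h^2/12) * y'''(0) = -h^2/2.
y = lambda t: 1.0 / (1.0 - t)
f = lambda v: v**2

for h in [0.1, 0.01, 0.001]:
    tau_1 = (y(h) - y(0)) / h - 0.5 * (f(y(0)) + f(y(h)))
    print(h, tau_1, -h**2 / 2.0)   # the last two columns agree as h -> 0
```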
Theorem 6.2 Suppose \(f\) is Lipschitz continuous with Lipschitz constant \(L\) independent of \(h\). If \(hL \le 1\), then the error \(e_j = y(t_j) - Y_j\) for the Crank–Nicolson method in (6.15) satisfies \[\begin{equation} |e_{j}|\leq (1 + hL)^2 \, | e_{j-1} | + h(1 + hL) \, | \tau_j| , \quad j = 1 ,2, 3, \ldots . \tag{6.16} \end{equation}\] Moreover, for all fixed \(T\) and all \(n \in \mathbb{N}\) satisfying \(n h \leq T\), \[ | e_n | \leq \frac{\exp(2TL)-1}{L} \max_{1 \leq j \leq n} | \tau_j|. \]
Proof. From (6.15), \[ Y_j = Y_{j-1} + \frac{h}{2} \, \Big(f(Y_{j-1}) + f(Y_{j}) \Big), \] and by definition of the truncation error, \[ y(t_j) = y(t_{j-1}) + \frac{h}{2} \, \Big(f(y(t_{j-1})) + f(y(t_{j}))\Big) + h \tau_j\,. \] Hence, subtracting, \[ e_j = e_{j-1} + \frac{h}{2} \, \Big(f(y(t_{j-1})) - f(Y_{j-1})\Big) + \frac{h}{2} \, \Big(f(y(t_{j})) - f(Y_{j})\Big) + h \tau_j, \] and, using the triangle inequality and Lipschitz continuity of \(f\), we have \[ |e_j| \le | e_{j-1} | + \frac{hL}{2} \, | e_{j-1} | + \frac{hL}{2} \, | e_{j} | + h |\tau_j |, \] which we rearrange as follows: \[\begin{align*} \bigg(1 - \frac{hL}{2}\bigg) |e_j| &\le \bigg(1 + \frac{hL}{2}\bigg) | e_{j-1} | + h |\tau_j | \\ \Rightarrow \qquad |e_j| &\le \bigg(1 - \frac{hL}{2}\bigg)^{-1} \bigg(\bigg(1 + \frac{hL}{2}\bigg)| e_{j-1}| + h |\tau_j |\bigg). \end{align*}\] We note that, for any \(0 \le x \le \frac{1}{2}\), \[ (1-x)^{-1} \le 1+2x, \] and thus for \(hL \le 1\), \[ \bigg(1-\frac{hL}{2}\bigg)^{-1} \le 1+hL. \] Therefore, since \(\big(1 + \frac{hL}{2}\big)(1+hL) \le (1+hL)^2\), \[ |e_j| \le ( 1 + hL )^2 | e_{j-1} | + h( 1 + hL ) \, |\tau_j |. \]
The rest of the proof is Problem E8.3.
6.3.1 Higher-order explicit methods
There are ways of achieving higher order without using implicitness. Assume that we have computed an approximation \(Y_{j-1}\) to \(y(t_{j-1})\). The improved Euler method first uses the standard Euler method to get an approximation \(\widehat Y_{j}\) to \(y(t_{j})\) and then the trapezoidal rule to improve it: \[\begin{align*} \widehat Y_{j} & = Y_{j-1}+hf(Y_{j-1})\qquad\qquad\qquad\quad \text{“prediction"}\\ Y_{j} & = Y_{j-1}+\frac h2\Big(f(Y_{j-1})+f(\widehat{Y}_j)\Big) \qquad \text{“correction"} \end{align*}\] This method fits into the framework of Theorem 6.1 because it can be written as \[ Y_{j} = Y_{j-1} + \frac h2\Big(f(Y_{j-1}) + f\big(Y_{j-1} + hf(Y_{j-1})\big)\Big), \] which is of the form (6.8) with \(F_h(Y) := \frac{1}{2} f(Y) + \frac 12 f\big(Y + hf(Y)\big).\) The truncation error is \[ \tau_j = \frac{y(t_{j}) - y(t_{j-1})}{h} - \frac{1}{2} \Big(f(y(t_{j-1})) + f\big(y(t_{j-1}) + h f(y(t_{j-1}))\big)\Big), \] and it turns out that \(|\tau_j| = \mathcal{O}(h^2)\) for sufficiently smooth \(f\) (see Problem E8.2).
Moreover if \(f\) is Lipschitz then so is \(F_h\), since \[\begin{align*} | F_h(Y) - F_h(Z)| &\leq \frac 12 | f(Y) - f(Z) | + \frac 12 | f(Y + hf(Y)) - f(Z + h f(Z)) |\\ &\leq \frac L2 | Y - Z | + \frac L2 | (Y + hf(Y)) - (Z + h f(Z)) |\\ &\leq L | Y - Z | + \frac {hL}2 | f(Y) - f(Z) | \leq \bigg(L + \frac{1}{2} hL^2\bigg)| Y - Z |. \end{align*}\]
So under these conditions, Theorem 6.1 implies that the improved Euler method converges with order \(\mathcal{O}(h^2)\), at the small additional cost of an extra evaluation of \(f\) at each time step.
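A minimal Python sketch of the improved Euler method (the function name `improved_euler` is ours), with a check of the rate on Example 6.1:

```python
def improved_euler(f, y0, h, n):
    """n steps of the improved Euler (predictor-corrector) method."""
    Y = [y0]
    for _ in range(n):
        y_hat = Y[-1] + h * f(Y[-1])                       # prediction
        Y.append(Y[-1] + 0.5 * h * (f(Y[-1]) + f(y_hat)))  # correction
    return Y

for n in [4, 8, 16, 32]:
    err = abs(2.0 - improved_euler(lambda v: v**2, 1.0, 0.5 / n, n)[-1])
    print(n, err)   # errors shrink by about 1/4 per halving of h: O(h^2)
```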
Higher-order methods can be built up using more evaluations of \(f\). In general these methods are called Runge-Kutta methods.
Example 6.5 Let \[\begin{equation*} \left. \begin{aligned} K_1&=f(Y_0),\\ K_2&=f\Big(Y_0+\frac h2 K_1\Big),\\ K_3&=f\Big(Y_0+\frac h2 K_2\Big),\\[3pt] K_4&=f\big(Y_0+hK_3\big),\\[2pt] Y_1&=Y_0+\frac{h}{6}\big(K_1+2K_2+2K_3+K_4\big). \end{aligned} \right\} \end{equation*}\]
This is a fourth-order Runge-Kutta method: its local truncation error is \(\mathcal{O}(h^4)\), although proving this requires some lengthy analysis.
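A minimal Python sketch of Example 6.5, iterated over \(n\) steps and again checked on Example 6.1:

```python
def rk4(f, y0, h, n):
    """n steps of the classical fourth-order Runge-Kutta method."""
    Y = [y0]
    for _ in range(n):
        K1 = f(Y[-1])
        K2 = f(Y[-1] + 0.5 * h * K1)
        K3 = f(Y[-1] + 0.5 * h * K2)
        K4 = f(Y[-1] + h * K3)
        Y.append(Y[-1] + (h / 6.0) * (K1 + 2 * K2 + 2 * K3 + K4))
    return Y

for n in [4, 8, 16]:
    err = abs(2.0 - rk4(lambda v: v**2, 1.0, 0.5 / n, n)[-1])
    print(n, err)   # errors shrink by about 1/16 per halving of h: O(h^4)
```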