Chapter 6 Non-examinable proofs
Proofs (non-examinable) for some of the results in the notes are provided below.
6.1 Proof of Theorem 3.1
Reminder: Theorem 3.1 states that for i.i.d. normal samples, \(\overline{X}\) and \(S^{2}\) are independent and \((n-1)S^{2}/\sigma^{2} \sim \chi^{2}_{n-1}.\)
Proof: We first show that \(\overline{X}\) and \(S^{2}\) are independent random variables. To do so, we express \(S^{2}\) as a function of the \(n-1\) deviations from the mean \(X_{2}-\overline{X},..,X_{n}-\overline{X}\): \[\begin{eqnarray} S^{2} & = & \frac{1}{n-1} \sum^{n}_{i=1} (X_{i} - \overline{X})^{2} \nonumber \\ & = & \frac{1}{n-1} \left[(X_{1}-\overline{X})^{2} + \sum^{n}_{i=2} (X_{i}-\overline{X})^{2} \right] \tag{6.1} \end{eqnarray}\] Since \(\sum^{n}_{i=1} (X_{i}-\overline{X}) = 0\), we have \[\begin{eqnarray*} X_{1}-\overline{X} & = & -\sum^{n}_{i=2} (X_{i}-\overline{X}) \end{eqnarray*}\] and so \[\begin{eqnarray*} (X_{1}-\overline{X})^{2} & = & \left[-\sum^{n}_{i=2} (X_{i}-\overline{X}) \right]^{2} \\ &=& \left[\sum^{n}_{i=2} (X_{i}-\overline{X}) \right]^{2} \end{eqnarray*}\] Substituting this into Equation (6.1) we have \[\begin{eqnarray*} S^{2} &=& \frac{1}{n-1} \left[\left[\sum^{n}_{i=2} (X_{i}-\overline{X}) \right]^{2} + \sum^{n}_{i=2} (X_{i}-\overline{X})^{2} \right] \end{eqnarray*}\] Thus \(S^{2}\) can be expressed as a function of \((X_{2}-\overline{X},..,X_{n}-\overline{X})\). We will show that this \((n-1)\)-dimensional random vector is independent of \(\overline{X}\), and hence that \(S^{2}\) is independent of \(\overline{X}\). For simplicity we will assume that \(\mu=0\) and \(\sigma^{2}=1\) – it being (hopefully) clear that these choices will not affect the dependence (if any) between \(\overline{X}\) and \(S^{2}\). The joint density of \(X_{1},..,X_{n}\) in this case is \[\begin{eqnarray*} f(X_{1},..,X_{n}) = \frac{1}{(2\pi)^{n/2}} \exp\left[ - \frac{1}{2} \sum^{n}_{i=1} X_{i}^{2} \right] \end{eqnarray*}\] We now define a transformation of \((X_{1},..,X_{n})\) to \((Y_{1},..,Y_{n})\) via: \[\begin{eqnarray*} Y_{1} &=& \overline{X} \\ Y_{2} &=& X_{2} - \overline{X} \\ &\vdots & \\ Y_{n} &=& X_{n} - \overline{X} \end{eqnarray*}\] The theorem that gives the density of a random variable defined as a transformation of another extends to the vector case; see, for example, Casella and Berger (page 185). Application of this result implies that the joint density function of \(Y_{1},..,Y_{n}\) is given by \[\begin{eqnarray*} f(Y_{1},..,Y_{n}) &=&\frac{n}{(2\pi)^{n/2}} e^{-\frac{1}{2}(Y_{1}-\sum^{n}_{i=2} Y_{i})^{2}} e^{-\frac{1}{2}\sum^{n}_{i=2} (Y_{i}+Y_{1})^{2}} \\ &=& \left[ \left(\frac{n}{2\pi} \right)^{1/2} e^{(-nY_{1}^{2})/2} \right] \left[\frac{n^{1/2}}{(2\pi)^{(n-1)/2}} e^{-(1/2)\left[ \sum^{n}_{i=2} Y_{i}^{2} + (\sum^{n}_{i=2}Y_{i})^{2} \right]} \right] \end{eqnarray*}\] This shows that the joint density of \(Y_{1},..,Y_{n}\) can be factorized into the product of the density of \(Y_{1}\) and the density of \((Y_{2},..,Y_{n})\), from which it follows that \(Y_{1}\) is independent of \((Y_{2},..,Y_{n})\). Recalling the definition of the random variables \(Y_{1},..,Y_{n}\), this means that \(\overline{X}\) is independent of the \(n-1\) deviations which determine \(S^{2}\), and hence \(\overline{X}\) and \(S^{2}\) are independent. We now show that \[\begin{eqnarray*} \frac{(n-1)S^{2}}{\sigma^{2}} \sim \chi^{2}_{n-1} \end{eqnarray*}\] where the earlier simplifying assumption that \(\mu=0\) and \(\sigma^{2}=1\) is no longer used. We use an induction argument. Let \(\overline{X}_{n}\) and \(S^{2}_{n}\) denote the corresponding statistics based on the ‘first’ \(n\) observations; the ordering is arbitrary and is introduced only to set up the induction.
After some tedious algebra, one can show that \[\begin{align} (n-1)S^{2}_{n} = (n-2)S^{2}_{n-1} + \frac{n-1}{n} (X_{n} - \overline{X}_{n-1})^{2} \tag{6.2} \end{align}\] Consider \(n=2\). Defining \(0 \times S^{2}_{1}=0\), we have \[\begin{eqnarray*} \frac{S^{2}_{2}}{\sigma^{2}} &=& \frac{1}{2\sigma^{2}} (X_{2}-X_{1})^{2} \\ &=& \left[\frac{1}{\sqrt{2} \sigma} (X_{2}-X_{1}) \right]^{2} \end{eqnarray*}\] where we use that \(\overline{X}_{1}=X_{1}\). Due to independence of the \(X_{i}\), the difference \(X_{2}-X_{1} \sim N(0,2\sigma^{2})\). Therefore \[\begin{eqnarray*} \frac{1}{\sqrt{2} \sigma} (X_{2}-X_{1}) \sim N(0, 1) \end{eqnarray*}\] By the definition of the \(\chi^{2}\) distribution, we then have that \[\begin{eqnarray*} \frac{S^{2}_{2}}{\sigma^{2}} \sim \chi^{2}_{1} \end{eqnarray*}\] Next, suppose that for \(n=k\), \(\frac{(k-1)S^{2}_{k}}{\sigma^{2}} \sim \chi^{2}_{k-1}\). For \(n=k+1\) we have using Equation (6.2) that \[\begin{eqnarray*} k S^{2}_{k+1} = (k-1) S^{2}_{k} + \frac{k}{k+1} (X_{k+1}-\overline{X}_{k})^{2} \end{eqnarray*}\] and dividing through by \(\sigma^{2}\) that \[\begin{eqnarray} \frac{k S^{2}_{k+1}}{\sigma^{2}} & = & \frac{(k-1) S^{2}_{k}}{\sigma^{2}} + \frac{1}{\sigma^{2}} \frac{k}{k+1} (X_{k+1}-\overline{X}_{k})^{2} \nonumber \\ & = & \frac{(k-1) S^{2}_{k}}{\sigma^{2}} + \left[\frac{1}{\sigma}\sqrt{\frac{k}{k+1}}(X_{k+1}-\overline{X}_{k})\right]^{2} \tag{6.3} \end{eqnarray}\] The first term on the right hand side is distributed \(\chi^{2}_{k-1}\) by the induction assumption. Thus to complete the proof we need to show that the second term on the right hand side is the square of an independent standard normal. The difference \(X_{k+1}-\overline{X}_{k}\) is normally distributed, since it is the difference of two independent normal random variables, and it has mean zero since both terms have mean \(\mu\). Since \(X_{k+1}\) is independent of \(\overline{X}_{k}\), its variance is \[\begin{eqnarray*} \text{Var}(X_{k+1}-\overline{X}_{k}) &=& \text{Var}(X_{k+1}) + \text{Var}(\overline{X}_{k}) \\ &=& \sigma^{2} + \frac{\sigma^{2}}{k} = \sigma^{2}(1+k^{-1}) \end{eqnarray*}\] and thus \[\begin{eqnarray*} \text{Var}\left[\frac{1}{\sigma}\sqrt{\frac{k}{k+1}}(X_{k+1}-\overline{X}_{k}) \right] &=& \frac{k}{\sigma^{2}(k+1)} \sigma^{2}(1+k^{-1}) = 1 \end{eqnarray*}\] The second term on the right hand side of Equation (6.3) is thus indeed distributed \(\chi^{2}_{1}\), as it is the square of a standard normal. All that remains is to show that it is independent of the first term on the right hand side of Equation (6.3). First, from the first part of the proof we have that \(S^{2}_{k}\) is independent of \(\overline{X}_{k}\). Then since \(X_{k+1}\) is independent of both of these, \(S^{2}_{k}\) is independent of the pair \((\overline{X}_{k}, X_{k+1})\), and hence of the difference \(X_{k+1}-\overline{X}_{k}\). Since the sum of independent \(\chi^{2}_{k-1}\) and \(\chi^{2}_{1}\) random variables has a \(\chi^{2}_{k}\) distribution, Equation (6.3) gives \(\frac{k S^{2}_{k+1}}{\sigma^{2}} \sim \chi^{2}_{k}\), completing the induction. The proof is thus complete. \(\square\)
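As a quick numerical sanity check of the theorem and of the updating identity in Equation (6.2), the following minimal simulation sketch (assuming numpy and scipy are available; the parameter values are illustrative) compares the empirical distribution of \((n-1)S^{2}/\sigma^{2}\) with \(\chi^{2}_{n-1}\), checks that \(\overline{X}\) and \(S^{2}\) are empirically uncorrelated, and verifies Equation (6.2) on a single sample.

```python
# Monte Carlo sanity check of Theorem 3.1 and Equation (6.2) -- illustrative sketch only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n, reps = 2.0, 3.0, 10, 50_000

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)                 # sample variance S^2

# (n-1) S^2 / sigma^2 should behave like a chi^2 with n-1 degrees of freedom
q = (n - 1) * s2 / sigma**2
print("mean, var of (n-1)S^2/sigma^2:", q.mean(), q.var())            # approx n-1, 2(n-1)
print("KS distance to chi^2_{n-1}:", stats.kstest(q, "chi2", args=(n - 1,)).statistic)

# independence of Xbar and S^2 implies (in particular) zero correlation
print("corr(Xbar, S^2):", np.corrcoef(xbar, s2)[0, 1])                # approx 0

# check Equation (6.2): (n-1) S^2_n = (n-2) S^2_{n-1} + ((n-1)/n) (X_n - Xbar_{n-1})^2
x = samples[0]
lhs = (n - 1) * np.var(x, ddof=1)
rhs = (n - 2) * np.var(x[:-1], ddof=1) + (n - 1) / n * (x[-1] - x[:-1].mean()) ** 2
print("Equation (6.2) check (should agree):", lhs, rhs)
```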
6.2 Proof of Propositions 3.1 and 3.2
Throughout this section, let \(X_{1},..,X_{n}\) be i.i.d. from a distribution with finite mean \(\mu\) and finite variance \(\sigma^{2}.\)
Proposition 3.1 states that, for a consistent estimator \(\hat{\sigma}\) of \(\sigma,\) \[\begin{eqnarray*} \frac{(\overline{X}_{n}-\mu)}{\hat{\sigma}/\sqrt{n}} \xrightarrow{L} N(0,1). \end{eqnarray*}\]
Proof: First we express the statistic as \[\begin{eqnarray*} \frac{\overline{X}_{n}-\mu}{\hat{\sigma}/\sqrt{n}} &=& \frac{\overline{X}_{n}-\mu}{\sigma/\sqrt{n}} \times \frac{\sigma}{\hat{\sigma}} \end{eqnarray*}\] The first term in the product converges in law to \(N(0,1)\) by the Central Limit Theorem (Theorem 3.3). By assumption \(\hat{\sigma} \xrightarrow{P} \sigma.\) Since the function \(g(x)=\sigma/x\) is continuous, by the Continuous Mapping Theorem (Theorem 2.4), \[\begin{eqnarray*} \frac{\sigma}{\hat{\sigma}} \xrightarrow{P} \sigma/\sigma = 1 \end{eqnarray*}\] Finally, by Slutsky’s Theorem (Theorem 3.4), we have that \[\begin{eqnarray*} \frac{\overline{X}_{n}-\mu}{\hat{\sigma}/\sqrt{n}} &=& \frac{\overline{X}_{n}-\mu}{\sigma/\sqrt{n}} \times \frac{\sigma}{\hat{\sigma}} \\ & \xrightarrow{L} & Z \times 1 \end{eqnarray*}\] where \(Z \sim N(0,1).\) Hence \[\begin{eqnarray*} \frac{(\overline{X}_{n}-\mu)}{\hat{\sigma}/\sqrt{n}} \xrightarrow{L} N(0,1). \quad \square \end{eqnarray*}\]
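To illustrate Proposition 3.1, here is a minimal simulation sketch (assuming numpy and scipy; the exponential population and sample size are illustrative choices, not part of the result) that studentizes the sample mean of non-normal data using the sample standard deviation as the consistent estimator \(\hat{\sigma}\), and compares the result to a standard normal.

```python
# Illustration of Proposition 3.1: the studentized mean is approximately N(0,1)
# even for non-normal (here exponential) data, once n is moderately large.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu, n, reps = 2.0, 200, 20_000                  # exponential mean, sample size, replications

x = rng.exponential(scale=mu, size=(reps, n))
sigma_hat = x.std(axis=1, ddof=1)               # consistent estimator of sigma
z = (x.mean(axis=1) - mu) / (sigma_hat / np.sqrt(n))

print("mean, sd of studentized statistic:", z.mean(), z.std())        # approx 0, 1
print("KS distance to N(0,1):", stats.kstest(z, "norm").statistic)    # small
```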
Proposition 3.2 states that if \(S\) is a consistent estimator of \(\sigma,\) then \[\begin{equation*} \left( \overline{X} - z_{1-\alpha/2} \frac{S}{\sqrt{n}} , \overline{X} + z_{1-\alpha/2} \frac{S}{\sqrt{n}} \right) \end{equation*}\] is asymptotically a \(100 \times (1-\alpha)\)% confidence interval for \(\mu.\)
Proof: It follows from Proposition 3.1, applied with \(\hat{\sigma}=S,\) that \[\begin{eqnarray*} \frac{\sqrt{n}(\overline{X}_{n}-\mu)}{S} \xrightarrow{L} N(0,1) \end{eqnarray*}\] As such, \(\frac{\sqrt{n}(\overline{X}_{n}-\mu)}{S}\) is (approximately) a pivot for \(\mu,\) and \[\begin{eqnarray*} P \left( \overline{X}_n - z_{1-\alpha/2} \frac{S}{\sqrt{n}} < \mu < \overline{X}_n + z_{1-\alpha/2} \frac{S}{\sqrt{n}} \right) \rightarrow 1-\alpha \end{eqnarray*}\] as \(n \rightarrow \infty.\) \(\square\)
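The following sketch (again assuming numpy and scipy, with an illustrative non-normal population) estimates the coverage of the interval in Proposition 3.2 by simulation; for moderately large \(n\) the empirical coverage should be close to \(1-\alpha\).

```python
# Empirical coverage of the large-sample confidence interval in Proposition 3.2.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mu, n, alpha, reps = 5.0, 100, 0.05, 20_000
z_crit = stats.norm.ppf(1 - alpha / 2)

x = rng.gamma(shape=2.0, scale=mu / 2.0, size=(reps, n))   # non-normal data with mean mu
xbar = x.mean(axis=1)
half_width = z_crit * x.std(axis=1, ddof=1) / np.sqrt(n)

covered = (xbar - half_width < mu) & (mu < xbar + half_width)
print("empirical coverage:", covered.mean())               # approx 1 - alpha = 0.95
```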
6.3 Proof of Equations 5.3 and 5.4
Throughout this section we assume we have two independent samples: \(X_1, \dots, X_n\) are independent and identically distributed draws from some \(f_X(x \mid \theta_x)\) with \(E(X)\), \(\text{Var}(X) < \infty\), and \(Y_1, \dots, Y_m\) are independent and identically distributed draws from some \(f_Y(y \mid \theta_y)\) with \(E(Y)\), \(\text{Var}(Y) < \infty\). We will additionally assume that as \(n \rightarrow \infty\) and \(m \rightarrow \infty\), \(\frac{m}{n+m} \rightarrow \rho\) for some \(0<\rho<1\).
First we prove a lemma.
Lemma 6.1 Suppose \(\text{Var}(X)= \sigma^{2}_{X}\) and \(\text{Var}(Y)=\sigma^{2}_{Y}\) are known. Then \[\begin{eqnarray*} \frac{\overline{X}_{n} - \overline{Y}_{m} - (\mu_{X}-\mu_{Y})} {\sqrt{\frac{\sigma^{2}_{X}}{n}+\frac{\sigma^{2}_{Y}}{m}}} & \xrightarrow{L} & N\left(0, 1 \right). \end{eqnarray*}\]
Proof: Let \(N=n+m\), so that \(m/N \rightarrow \rho\) and \(n/N \rightarrow 1-\rho\) as \(m\) and \(n\) tend to infinity. Then, since \(\sqrt{N/n} \rightarrow (1-\rho)^{-1/2}\), the Central Limit Theorem (Theorem 3.3) and Slutsky’s Theorem (Theorem 3.4) allow us to express the asymptotic behavior of \(\overline{X}_{n}\) and \(\overline{Y}_{m}\) in terms of the overall sample size \(N=n+m\): \[\begin{eqnarray*} \sqrt{N}(\overline{X}_{n} - \mu_{X}) &=& \sqrt{n}(\overline{X}_{n} - \mu_{X}) \times \sqrt{\frac{N}{n}} \xrightarrow{L} A \times (1-\rho)^{-1/2} \end{eqnarray*}\] where \(A \sim N(0,\sigma^{2}_{X})\). Then \(\text{Var}(A (1-\rho)^{-1/2}) = \sigma^{2}_{X} (1-\rho)^{-1}\) and thus we have \[\begin{eqnarray*} \sqrt{N}(\overline{X}_{n} - \mu_{X}) \xrightarrow{L} N(0, \sigma^{2}_{X}/(1-\rho)) \end{eqnarray*}\] and similarly \[\begin{eqnarray*} \sqrt{N}(\overline{Y}_{m} - \mu_{Y}) \xrightarrow{L} N(0, \sigma^{2}_{Y}/\rho) \end{eqnarray*}\] We now want to examine the asymptotic distribution of the difference of these two quantities. A result from probability theory (Lemma 3.1.1 of Lehmann) says that if \(U_{N}\) and \(V_{N}\) are independent sequences of random variables, and \(U\) and \(V\) are independent random variables for which \(U_{N} \xrightarrow{L} U\) and \(V_{N} \xrightarrow{L} V\), then \(U_{N} \pm V_{N} \xrightarrow{L} U \pm V\). From this result it follows that \[\begin{eqnarray*} \sqrt{N}\left[(\overline{X}_{n} - \overline{Y}_{m}) - (\mu_{X}-\mu_{Y})\right] & \xrightarrow{L} & N\left(0, \frac{\sigma^{2}_{X}}{1-\rho}+\frac{\sigma^{2}_{Y}}{\rho} \right) \end{eqnarray*}\] Finally, since \(N\left(\frac{\sigma^{2}_{X}}{n}+\frac{\sigma^{2}_{Y}}{m}\right) = \frac{N}{n}\sigma^{2}_{X}+\frac{N}{m}\sigma^{2}_{Y} \rightarrow \frac{\sigma^{2}_{X}}{1-\rho}+\frac{\sigma^{2}_{Y}}{\rho}\), dividing by \(\sqrt{\frac{\sigma^{2}_{X}}{n}+\frac{\sigma^{2}_{Y}}{m}}\) rather than multiplying by \(\sqrt{N}\) and applying Slutsky’s Theorem once more gives, equivalently, \[\begin{eqnarray*} \frac{(\overline{X}_{n} - \overline{Y}_{m}) - (\mu_{X}-\mu_{Y})} {\sqrt{\frac{\sigma^{2}_{X}}{n}+\frac{\sigma^{2}_{Y}}{m}}} & \xrightarrow{L} & N\left(0, 1 \right). \square \end{eqnarray*}\]
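A minimal simulation sketch of Lemma 6.1 (assuming numpy and scipy; the exponential and uniform populations are illustrative choices with known variances): the standardized difference of the two sample means should be approximately standard normal even though neither population is normal.

```python
# Illustration of Lemma 6.1 with known variances and non-normal populations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, m, reps = 150, 100, 20_000
mu_x, mu_y = 1.0, 2.0                      # means of Exp(1) and Uniform(1,3)
var_x, var_y = 1.0, 4.0 / 12.0             # the corresponding known variances

x = rng.exponential(scale=1.0, size=(reps, n))
y = rng.uniform(1.0, 3.0, size=(reps, m))

z = (x.mean(axis=1) - y.mean(axis=1) - (mu_x - mu_y)) / np.sqrt(var_x / n + var_y / m)
print("mean, sd:", z.mean(), z.std())                                 # approx 0, 1
print("KS distance to N(0,1):", stats.kstest(z, "norm").statistic)
```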
Now we formally restate the claim in Equation (5.3).
Proposition 6.1 Suppose that \(\text{Var}(X)=\text{Var}(Y)=\sigma^{2}\) and let \(\widehat \sigma^{2}_{X}\) and \(\widehat \sigma^{2}_{Y}\) be consistent estimators of this common variance \(\sigma^{2}\).
Then \[\begin{eqnarray*} \frac{\overline{X}_{n} - \overline{Y}_{m} - (\mu_{X}-\mu_{Y})}{S_{p}\sqrt{\displaystyle \frac{1}{n} + \frac{1}{m}}} \xrightarrow{L} N(0,1) \end{eqnarray*}\]
where \[\begin{eqnarray*} S_{p}^{2} = \frac{(n-1)\widehat \sigma_{X}^{2} + (m-1)\widehat \sigma_{Y}^{2}}{n+m-2}. \end{eqnarray*}\]
Proof: From Lemma 6.1, with \(\sigma^{2}_{X}=\sigma^{2}_{Y}=\sigma^{2},\) we have that \[\begin{eqnarray*} \frac{\overline{X}_{n} - \overline{Y}_{m} - (\mu_{X}-\mu_{Y})}{\sigma \sqrt{\frac{1}{n}+\frac{1}{m}}} & \xrightarrow{L} & N\left(0, 1 \right) \end{eqnarray*}\] Now we can express \[\begin{eqnarray*} \frac{\overline{X}_{n} - \overline{Y}_{m} - (\mu_{X}-\mu_{Y})}{S_{p}\sqrt{\displaystyle \frac{1}{n} + \frac{1}{m}}} = \frac{\overline{X}_{n} - \overline{Y}_{m} - (\mu_{X}-\mu_{Y})}{\sigma \sqrt{\frac{1}{n}+\frac{1}{m}}} \frac{\sigma}{S_{p}} \end{eqnarray*}\] Thus by Slutsky’s Theorem (Theorem 3.4), this will converge in distribution to \(N(0,1)\) provided \(\sigma / S_{p}\) converges in probability to one. To show this, note that \((n-1)/(n+m-2) \rightarrow 1-\rho\) and \((m-1)/(n+m-2) \rightarrow \rho,\) so by Theorem 2.5 \[\begin{eqnarray*} S^{2}_{p} = \frac{(n-1)\widehat \sigma^{2}_{X} + (m-1)\widehat \sigma^{2}_{Y}}{n+m-2} \xrightarrow{P} (1-\rho) \sigma^{2} + \rho \sigma^{2} = \sigma^{2}. \end{eqnarray*}\] Thus \(S_{p}\) is consistent for \(\sigma\) and then, by the Continuous Mapping Theorem (Theorem 2.4), \(\sigma/S_{p}\) converges in probability to one. \(\square\)
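The sketch below (assuming numpy; the sample sizes and parameter values are illustrative) computes the pooled statistic of Proposition 6.1 by simulation when the two populations share a common variance, using the sample variances as the consistent estimators \(\widehat \sigma^{2}_{X}\) and \(\widehat \sigma^{2}_{Y}\); its empirical mean and standard deviation should be close to 0 and 1.

```python
# Illustration of Proposition 6.1: pooled-variance statistic is approx N(0,1) when Var(X) = Var(Y).
import numpy as np

rng = np.random.default_rng(5)
n, m, reps = 120, 80, 20_000
mu_x, mu_y, sigma = 0.0, 1.0, 2.0          # common standard deviation sigma

x = rng.normal(mu_x, sigma, size=(reps, n))
y = rng.normal(mu_y, sigma, size=(reps, m))

s2_x = x.var(axis=1, ddof=1)               # consistent for sigma^2
s2_y = y.var(axis=1, ddof=1)
s2_p = ((n - 1) * s2_x + (m - 1) * s2_y) / (n + m - 2)    # pooled variance S_p^2

t = (x.mean(axis=1) - y.mean(axis=1) - (mu_x - mu_y)) / np.sqrt(s2_p * (1 / n + 1 / m))
print("mean, sd of pooled statistic:", t.mean(), t.std())  # approx 0, 1
```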
Now we formally restate the claim in Equation (5.4).
Proposition 6.2 Suppose that \(\text{Var}(X) = \sigma^{2}_{X}\) and \(\text{Var}(Y) = \sigma^{2}_{Y}\), which need not be equal, and let \(\widehat \sigma^{2}_{X}\) and \(\widehat \sigma^{2}_{Y}\) be consistent estimators of \(\sigma^{2}_{X}\) and \(\sigma^{2}_{Y}\) respectively. Then
\[\begin{eqnarray*} \frac{\overline{X}_{n}-\overline{Y}_{m} - (\mu_{X} - \mu_{Y})}{\sqrt{\frac{\widehat \sigma^{2}_{X}}{n} + \frac{\widehat \sigma^{2}_{Y}}{m}}} & \xrightarrow{L} & N\left(0, 1 \right) \end{eqnarray*}\]
Proof: We rewrite the quantity whose asymptotic distribution we are considering as \[\begin{eqnarray*} \frac{\overline{X}_{n}-\overline{Y}_{m} - (\mu_{X} - \mu_{Y})}{\sqrt{\frac{\sigma^{2}_{X}}{n} + \frac{\sigma^{2}_{Y}}{m}}} \frac{\sqrt{\frac{\sigma^{2}_{X}}{n} + \frac{\sigma^{2}_{Y}}{m}}}{\sqrt{\frac{\widehat \sigma^{2}_{X}}{n} + \frac{\widehat \sigma^{2}_{Y}}{m}}} \end{eqnarray*}\] By Lemma 6.1, the first term converges in distribution to \(N(0,1)\). The second term in this expression can be written as \[\begin{eqnarray*} \sqrt{\frac{m \sigma^{2}_{X} + n \sigma^{2}_{Y}}{m\widehat \sigma^{2}_{X} + n \widehat \sigma^{2}_{Y}}} = \sqrt{\frac{(m/N) \sigma^{2}_{X} + (n/N) \sigma^{2}_{Y}}{(m/N) \widehat \sigma^{2}_{X} + (n/N) \widehat \sigma^{2}_{Y}}} \end{eqnarray*}\] where \(N=m+n\). Since \(m/N \rightarrow \rho\) as \(n\) and \(m\) go to infinity (by assumption), the numerator converges to \(\rho \sigma^{2}_{X} + (1-\rho) \sigma^{2}_{Y}\), and the denominator converges in probability to the same quantity by Theorem 2.5. Through another application of Theorem 2.5, the term overall converges in probability to one, and then by Slutsky’s Theorem (Theorem 3.4), we have that \[\begin{eqnarray*} \frac{(\overline{X}_{n}-\overline{Y}_{m}) - (\mu_{X} - \mu_{Y})}{\sqrt{\frac{\widehat \sigma^{2}_{X}}{n} + \frac{\widehat \sigma^{2}_{Y}}{m}}} \xrightarrow{L} N(0,1). \quad \square \end{eqnarray*}\]
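Finally, a minimal simulation sketch of Proposition 6.2 (assuming numpy; the parameter values are illustrative, and normal populations are used here only for simplicity): with clearly unequal variances, standardizing by the estimated standard error \(\sqrt{\widehat \sigma^{2}_{X}/n + \widehat \sigma^{2}_{Y}/m}\) still yields an approximately standard normal statistic.

```python
# Illustration of Proposition 6.2: Welch-type statistic with unequal variances.
import numpy as np

rng = np.random.default_rng(6)
n, m, reps = 150, 90, 20_000
mu_x, mu_y, sigma_x, sigma_y = 0.0, 3.0, 1.0, 4.0   # unequal standard deviations

x = rng.normal(mu_x, sigma_x, size=(reps, n))
y = rng.normal(mu_y, sigma_y, size=(reps, m))

se_hat = np.sqrt(x.var(axis=1, ddof=1) / n + y.var(axis=1, ddof=1) / m)
z = (x.mean(axis=1) - y.mean(axis=1) - (mu_x - mu_y)) / se_hat
print("mean, sd of statistic:", z.mean(), z.std())   # approx 0, 1
```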
6.4 Proof of Proposition 5.2
Reminder: Proposition 5.2 states that for comparing the means of two independent normal samples with unequal variances, \[\begin{eqnarray*} \frac{\overline{X}_{n}-\overline{Y}_{m} - (\mu_{X} - \mu_{Y})}{\sqrt{\frac{S^{2}_{X}}{n} + \frac{S^{2}_{Y}}{m}}} & \xrightarrow{L} & N\left(0, 1 \right). \end{eqnarray*}\]
Proof: This is a special case of Proposition 6.2 with \(\widehat \sigma^{2}_{X} = S^2_X\) and \(\widehat \sigma^{2}_{Y} = S^2_Y\). We need to establish that \(S^{2}_{X}\) and \(S^{2}_{Y}\) are consistent estimators of \(\sigma^{2}_{X}\) and \(\sigma^{2}_{Y}\). We showed this is true for the normal distribution case in Example 2.16. \(\square\)