Solution Sheet Six

Question 1

Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\theta = (\mu, \sigma^{2})\). Suppose that \(X_{i} \, | \, \theta \sim N(\mu, \sigma^{2})\). It is judged that the improper joint prior distribution \(f(\mu, \sigma^{2}) \propto 1/\sigma^{2}\) is appropriate.

  1. Show that the likelihood \(f(x \, | \, \mu, \sigma^{2})\), where \(x = (x_{1}, \ldots, x_{n})\), can be expressed as \[\begin{eqnarray*} f(x \, | \, \mu, \sigma^{2}) & = & \left(2\pi \sigma^{2}\right)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\sigma^{2}}\left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]\right\}, \end{eqnarray*}\] where \(\overline{x} = \frac{1}{n} \sum_{i=1}^{n} x_{i}\), \(s^{2} = \frac{1}{n-1} \sum_{i=1}^{n} (x_{i} - \overline{x})^{2}\) are respectively the sample mean and variance. Hence, explain why \(\overline{X}\) and \(S^{2}\) are sufficient for \(X = (X_{1}, \ldots, X_{n})\) for learning about \(\theta\).

\[\begin{eqnarray*} f(x \, | \, \mu, \sigma^{2}) & = & \prod_{i=1}^{n} \frac{1}{\sqrt{2 \pi}\sigma} \exp\left\{-\frac{1}{2\sigma^{2}}(x_{i} - \mu)^{2}\right\} \\ & = & \left(2\pi\sigma^{2}\right)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n} (x_{i} - \mu)^{2}\right\} \\ & = & \left(2\pi\sigma^{2}\right)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n} \left((x_{i} - \overline{x}) + (\overline{x} - \mu)\right)^{2}\right\} \\ & = & \left(2\pi\sigma^{2}\right)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\sigma^{2}}\left[\sum_{i=1}^{n}(x_{i} - \overline{x})^{2} + n(\overline{x} - \mu)^{2}\right]\right\} \\ & = & \left(2\pi \sigma^{2}\right)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\sigma^{2}}\left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]\right\}, \end{eqnarray*}\] where the cross term \(2(\overline{x} - \mu)\sum_{i=1}^{n}(x_{i} - \overline{x})\) vanishes because \(\sum_{i=1}^{n}(x_{i} - \overline{x}) = 0\). Recall that a statistic \(t(X)\) is said to be sufficient for \(X\) for learning about \(\theta\) if we can write \[\begin{eqnarray*} f(x \, | \, \theta) & = & g(t, \theta)h(x) \end{eqnarray*}\] where \(g(t, \theta)\) depends upon \(t(x)\) and \(\theta\) and \(h(x)\) does not depend upon \(\theta\) but may depend upon \(x\). In this case we have that \[\begin{eqnarray*} f(x \, | \, \mu, \sigma^{2}) & = & g(\overline{x}, s^{2}, \mu, \sigma^{2})\left(2\pi \right)^{-\frac{n}{2}} \end{eqnarray*}\] so that \(\overline{X}\) and \(S^{2}\) are sufficient.
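As an illustrative aside, the decomposition can be checked numerically; the following is a minimal Python sketch, assuming numpy is available, with a simulated sample and an arbitrary value of \(\mu\).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=25)  # simulated sample, for illustration

n = x.size
xbar = x.mean()
s2 = x.var(ddof=1)  # sample variance with the n - 1 divisor
mu = 0.5            # arbitrary value of mu

lhs = np.sum((x - mu) ** 2)
rhs = (n - 1) * s2 + n * (xbar - mu) ** 2
print(np.isclose(lhs, rhs))  # True: the cross term vanishes
```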

  1. Find, up to a constant of integration, the posterior distribution of \(\theta\) given \(x\).

\[\begin{eqnarray*} f(\mu, \sigma^{2} \, | \, x) & \propto & f(x \, | \, \mu, \sigma^{2})f(\mu, \sigma^{2}) \\ & \propto & \left(2\pi \sigma^{2}\right)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\sigma^{2}}\left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]\right\} \frac{1}{\sigma^{2}} \\ & \propto & \left(\sigma^{2}\right)^{-\left(\frac{n}{2}+1\right)}\exp\left\{-\frac{1}{2\sigma^{2}}\left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]\right\}. \end{eqnarray*}\]

  1. Show that \(\mu \, | \, \sigma^{2}, x \sim N(\overline{x}, \sigma^{2}/n)\). Hence, explain why, in this case, the chosen prior distribution for \(\theta\) is noninformative.

\[\begin{eqnarray*} f(\mu \, | \, \sigma^{2}, x) & = & \frac{f(\mu, \sigma^{2} \, | \, x)}{f(\sigma^{2} \, | \, x)} \\ & \propto & f(\mu, \sigma^{2} \, | \, x) \ \ \ \ \mbox{(with respect to $\mu$)} \\ & \propto & \exp\left\{-\frac{n}{2\sigma^{2}}(\overline{x} - \mu)^{2}\right\}. \end{eqnarray*}\] We recognise this as a kernel of \(N(\overline{x}, \sigma^{2}/n)\) so \(\mu \, | \, \sigma^{2}, x \sim N(\overline{x}, \sigma^{2}/n)\). In the classical model for \(\mu\) when \(\sigma^{2}\) is known, the mle is \(\overline{x}\) with variance \(\sigma^{2}/n\), so that the standard error is \(\sigma/\sqrt{n}\). The posterior distribution (given \(\sigma^{2}\)) is driven entirely by the data, which is why the prior is noninformative in this case. Notice that a symmetric \(100(1-\alpha)\%\) credible interval for \(\mu \, | \, \sigma^{2}, x\) is \[\begin{eqnarray*} \left(\overline{x} - z_{\left(1 - \frac{\alpha}{2}\right)}\frac{\sigma}{\sqrt{n}}, \overline{x} + z_{\left(1 - \frac{\alpha}{2}\right)}\frac{\sigma}{\sqrt{n}}\right) \end{eqnarray*}\] which agrees with the \(100(1-\alpha)\%\) confidence interval for \(\mu\) when \(\sigma^{2}\) is known and \(X_{1}, \ldots, X_{n}\) are iid \(N(\mu, \sigma^{2})\).
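As an aside, this interval is simple to compute; a minimal Python sketch, assuming scipy is available and using purely hypothetical data values:

```python
import numpy as np
from scipy import stats

# Hypothetical data and a known value of sigma, purely for illustration
x = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.4])
sigma = 1.0
alpha = 0.05

n, xbar = x.size, x.mean()
z = stats.norm.ppf(1 - alpha / 2)  # z_{(1 - alpha/2)}
half_width = z * sigma / np.sqrt(n)
print(xbar - half_width, xbar + half_width)  # symmetric 95% credible interval
```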

  1. By integrating \(f(\mu, \sigma^{2} \, | \, x)\) over \(\sigma^{2}\), show that \[\begin{eqnarray*} f(\mu \, | \, x) & \propto & \left[1 + \frac{1}{n-1}\left(\frac{\overline{x} - \mu}{s/\sqrt{n}}\right)^{2}\right]^{-\frac{n}{2}}. \end{eqnarray*}\] Thus, explain why \(\mu \, | \, x \sim t_{n-1}(\overline{x}, s^{2}/n)\), the non-central \(t\)-distribution with \(n-1\) degrees of freedom, location parameter \(\overline{x}\) and squared scale parameter \(s^{2}/n\). How does this result relate to the classical problem of making inferences about \(\mu\) when \(\sigma^{2}\) is also unknown?

\[\begin{eqnarray} f(\mu \, | \, x) & = & \int_{-\infty}^{\infty} f(\mu, \sigma^{2} \, | \, x) \, d\sigma^{2} \nonumber \\ & \propto & \int_{0}^{\infty} \left(\sigma^{2}\right)^{-\left(\frac{n}{2}+1\right)}\exp\left\{-\frac{1}{2\sigma^{2}}\left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]\right\} \, d\sigma^{2} \tag{1} \end{eqnarray}\] as \(f(\mu, \sigma^{2} \, | \, x) = 0\) for \(\sigma^{2} < 0\). We recognise the integrand in equation (1) as a kernel of \(\mbox{Inv-Gamma}\left(\frac{n}{2}, \frac{1}{2}\left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]\right)\) so that \[\begin{eqnarray*} f(\mu \, | \, x) & \propto & \Gamma(n/2)\left(\frac{1}{2}\left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]\right)^{-\frac{n}{2}}\\ & \propto & \left(\frac{1}{2}\left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]\right)^{-\frac{n}{2}}\\ & \propto & \left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]^{-\frac{n}{2}}\\ & \propto & \left[1 + \frac{1}{n-1}\left(\frac{\overline{x} - \mu}{s/\sqrt{n}}\right)^{2}\right]^{-\frac{n}{2}}, \end{eqnarray*}\] where the final line follows on dividing through by \((n-1)s^{2}\). We recognise this as a kernel of \(t_{n-1}(\overline{x}, s^{2}/n)\) so that \(\mu \, | \, x \sim t_{n-1}(\overline{x}, s^{2}/n)\). This gives further insight into why the prior distribution is noninformative. Inference about \(\mu\) will mirror the classical approach when \(\mu\) and \(\sigma^{2}\) are unknown and \(\frac{\overline{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}\) where \(t_{n-1} = t_{n-1}(0, 1)\). For example, a symmetric \(100(1-\alpha)\%\) credible interval for \(\mu \, | \, x\) is \[\begin{eqnarray*} \left(\overline{x} - t_{n-1, \left(1 - \frac{\alpha}{2}\right)}\frac{s}{\sqrt{n}}, \overline{x} + t_{n-1, \left(1 - \frac{\alpha}{2}\right)}\frac{s}{\sqrt{n}}\right), \end{eqnarray*}\] where \(t_{n-1, \left(1 - \frac{\alpha}{2}\right)}\) denotes the \(\left(1 - \frac{\alpha}{2}\right)\) quantile of \(t_{n-1}\), which agrees with the \(100(1-\alpha)\%\) confidence interval for \(\mu\) when \(\sigma^{2}\) is unknown and \(X_{1}, \ldots, X_{n}\) are iid \(N(\mu, \sigma^{2})\).
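A corresponding sketch for the unknown-\(\sigma^{2}\) case, again with hypothetical data; the final line checks the interval against the location-scale \(t\) distribution directly:

```python
import numpy as np
from scipy import stats

# The same hypothetical data, with sigma^2 now unknown
x = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.4])
alpha = 0.05

n, xbar, s = x.size, x.mean(), x.std(ddof=1)
tq = stats.t.ppf(1 - alpha / 2, df=n - 1)  # (1 - alpha/2) quantile of t_{n-1}
half_width = tq * s / np.sqrt(n)
print(xbar - half_width, xbar + half_width)  # symmetric 95% credible interval

# The same interval from the location-scale t posterior directly
print(stats.t.interval(1 - alpha, df=n - 1, loc=xbar, scale=s / np.sqrt(n)))
```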

Question 2

Let \(X_{1}, \ldots, X_{n}\) be exchangeable with \(X_{i} \, | \, \theta \sim Bern(\theta)\).

  1. Using the improper prior distribution \(f(\theta) \propto \theta^{-1}(1-\theta)^{-1}\) find the posterior distribution of \(\theta \, | \, x\) where \(x = (x_{1}, \ldots, x_{n})\). Find a normal approximation about the mode to this distribution.

The likelihood is \[\begin{eqnarray*} f(x \, | \, \theta) \ = \ \prod_{i=1}^{n} \theta^{x_{i}}(1-\theta)^{1-x_{i}} \ = \ \theta^{n\bar{x}}(1-\theta)^{n-n\bar{x}}, \end{eqnarray*}\] where \(x = (x_{1}, \ldots, x_{n})\). With the given prior, the posterior is \[\begin{eqnarray*} f(\theta \, | \, x) \ \propto \ f(x \, | \, \theta)f(\theta) \ \propto \ \theta^{n\bar{x}-1}(1-\theta)^{n-n\bar{x}-1} \end{eqnarray*}\] which we recognise as a kernel of a \(Beta(n\bar{x}, n-n\bar{x})\) density. Thus, \(\theta \, | \, x \sim Beta(n\bar{x}, n-n\bar{x})\). The mode of a \(Beta(\alpha, \beta)\) distribution with \(\alpha > 1\) and \(\beta > 1\) is \(\frac{\alpha-1}{\alpha + \beta -2}\) so, provided \(1 < n\bar{x} < n-1\), the mode of \(\theta \, | \, x\) is \[\begin{eqnarray*} \tilde{\theta} & = & \frac{n\bar{x} - 1}{n-2}. \end{eqnarray*}\] The observed information is \[\begin{eqnarray*} I(\theta \, | \, x) & = & -\frac{\partial^{2}}{\partial \theta^{2}}\log f(\theta \, | \, x) \\ & = & -\frac{\partial^{2}}{\partial \theta^{2}}\left\{\log \frac{\Gamma(n)}{\Gamma(n\bar{x})\Gamma(n -n\bar{x})} + (n\bar{x}-1) \log \theta + (n-n\bar{x}-1)\log (1-\theta)\right\} \\ & = & \frac{n\bar{x}-1}{\theta^{2}} + \frac{n-n\bar{x}-1}{(1-\theta)^{2}}. \end{eqnarray*}\] So, evaluating the observed information at the mode, \[\begin{eqnarray*} I(\tilde{\theta} \, | \, x) & = & \frac{n\bar{x}-1}{\tilde{\theta}^{2}} + \frac{n-n\bar{x}-1}{(1-\tilde{\theta})^{2}}. \end{eqnarray*}\] Noting that \(1-\tilde{\theta} = \frac{n-n\bar{x}-1}{n-2}\) we have that \[\begin{eqnarray*} I(\tilde{\theta} \, | \, x) & = & \frac{(n-2)^{2}}{n\bar{x}-1} + \frac{(n-2)^{2}}{n-n\bar{x}-1} \\ & = & \frac{(n-2)^{3}}{(n\bar{x} -1)(n -n\bar{x} -1)}. \end{eqnarray*}\] So, approximately, \(\theta \, | \, x \sim N(\tilde{\theta}, I^{-1}(\tilde{\theta} \, | \, x))\), that is, approximately, \[\begin{eqnarray*} \theta \, | \, x & \sim & N\left(\frac{n\bar{x} - 1}{n-2}, \frac{(n\bar{x} -1)(n -n\bar{x} -1)}{(n-2)^{3}}\right). \end{eqnarray*}\]
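As an illustration of how close the approximation is, the following sketch compares the exact Beta posterior with the normal approximation, assuming scipy and a hypothetical data summary \(n = 40\), \(\bar{x} = 0.35\):

```python
import numpy as np
from scipy import stats

# Hypothetical data summary, purely for illustration
n, xbar = 40, 0.35
a, b = n * xbar, n - n * xbar              # theta | x ~ Beta(n xbar, n - n xbar)

mode = (n * xbar - 1) / (n - 2)            # posterior mode
info = (n - 2) ** 3 / ((n * xbar - 1) * (n - n * xbar - 1))
approx = stats.norm(loc=mode, scale=np.sqrt(1 / info))

print(stats.beta(a, b).sf(0.5))            # exact P(theta > 0.5 | x)
print(approx.sf(0.5))                      # normal approximation about the mode
```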

  1. Show that the prior distribution \(f(\theta)\) is equivalent to a uniform prior on \[\begin{eqnarray*} \beta & = & \log \left(\frac{\theta}{1-\theta}\right) \end{eqnarray*}\] and find the posterior distribution of \(\beta \, | \, x\). Find a normal approximation about the mode to this distribution.

We have \(\beta = g(\theta)\), which we invert to find \(\theta = g^{-1}(\beta)\): \[\begin{eqnarray*} \theta & = & \frac{e^{\beta}}{1 + e^{\beta}}. \end{eqnarray*}\] The prior \(f_{\beta}(\beta)\) for \(\beta\) is given by \[\begin{eqnarray*} f_{\beta}(\beta) & = & \left|\frac{\partial \theta}{\partial \beta} \right| f_{\theta}(\theta) \\ & = & \left|\frac{e^{\beta}}{(1 + e^{\beta})^{2}}\right| \left(\frac{e^{\beta}}{1 + e^{\beta}}\right)^{-1}\left(1 - \frac{e^{\beta}}{1 + e^{\beta}}\right)^{-1} \\ & = & \frac{e^{\beta}}{(1 + e^{\beta})^{2}} \times \frac{1 + e^{\beta}}{e^{\beta}} \times (1 + e^{\beta}) \ = \ 1, \end{eqnarray*}\] which is the (improper) uniform prior on \(\beta\). The posterior is \[\begin{eqnarray*} f(\beta \, | \, x) & \propto & f(x \, | \, \beta) f(\beta) \\ & \propto & \theta^{n\bar{x}}(1-\theta)^{n-n\bar{x}} \\ & = & \left(\frac{e^{\beta}}{1 + e^{\beta}}\right)^{n\bar{x}}\left(1-\frac{e^{\beta}}{1 + e^{\beta}}\right)^{n-n\bar{x}} \ = \ \frac{e^{\beta n \bar{x}}}{(1+e^{\beta})^{n}}. \end{eqnarray*}\] Hence, \(f(\beta \, | \, x) = \frac{ce^{\beta n \bar{x}}}{(1+e^{\beta})^{n}}\) where \(c\) is the constant of integration. For the normal approximation about the mode, we first need to find the mode of \(\beta \, | \, x\). The mode is the maximum of \(f(\beta \, | \, x)\) which is, equivalently, the maximum of \(\log f(\beta \, | \, x)\). We have \[\begin{eqnarray*} \log f(\beta \, | \, x) & = & \log c + \beta n \bar{x} -n \log (1+e^{\beta}) \\ \Rightarrow \frac{\partial}{\partial \beta} \log f(\beta \, | \, x) & = & n \bar{x} - \frac{ne^{\beta}}{1 + e^{\beta}}. \end{eqnarray*}\] The mode \(\tilde{\beta}\) satisfies \[\begin{eqnarray*} n \bar{x} - \frac{ne^{\tilde{\beta}}}{1 + e^{\tilde{\beta}}} & = & 0 \\ \Rightarrow e^{\tilde{\beta}} & = & \frac{n\bar{x}}{n -n \bar{x}} \\ \Rightarrow \tilde{\beta} & = & \log \left(\frac{\bar{x}}{1-\bar{x}}\right). \end{eqnarray*}\] The observed information is \[\begin{eqnarray*} I(\beta \, | \, x) & = & - \frac{\partial^{2}}{\partial \beta^{2}} \log f(\beta \, | \, x) \\ & = & \frac{ne^{\beta}}{(1 + e^{\beta})^{2}}. \end{eqnarray*}\] Noting that \(1 + e^{\tilde{\beta}} = \frac{1}{1-\bar{x}}\) we have \[\begin{eqnarray*} I(\tilde{\beta} \, | \, x) & = & n \times \frac{\bar{x}}{1-\bar{x}} \times (1-\bar{x})^{2} \ = \ n\bar{x}(1-\bar{x}). \end{eqnarray*}\] Hence, approximately, \[\begin{eqnarray*} \beta \, | \, x & \sim & N\left(\log \frac{\bar{x}}{1-\bar{x}}, \frac{1}{n\bar{x}(1-\bar{x})}\right). \end{eqnarray*}\]
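The analogous sketch on the \(\beta\) scale, with the same hypothetical summary; note that \(P(\theta > 0.5 \, | \, x) = P(\beta > 0 \, | \, x)\), so the two parameterisations can be compared on the same event:

```python
import numpy as np
from scipy import stats

n, xbar = 40, 0.35                         # same hypothetical summary
mode = np.log(xbar / (1 - xbar))           # posterior mode of beta
info = n * xbar * (1 - xbar)               # observed information at the mode
approx = stats.norm(loc=mode, scale=np.sqrt(1 / info))

# P(theta > 0.5 | x) = P(beta > 0 | x); compare with the exact Beta posterior
print(approx.sf(0.0))
print(stats.beta(n * xbar, n - n * xbar).sf(0.5))
```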

  1. For which parameterisation does it make more sense to use a normal approximation?

Whilst we can find a normal approximation about the mode either on the scale of \(\theta\) or of \(\beta\), it makes more sense for \(\beta\). We have \(0 < \theta < 1\) and \(-\infty < \beta < \infty\) so only \(\beta\) has a sample space which agrees with the normal distribution.

Question 3

In viewing a section through the pancreas, doctors see what are called “islands”. Suppose that \(X_{i}\) denotes the number of islands observed in the \(i\)th patient, \(i =1, \ldots, n\), and we judge that \(X_{1}, \ldots, X_{n}\) are exchangeable with \(X_{i} \, | \, \theta \sim Po(\theta)\). A doctor believes that for healthy patients \(\theta\) will be on average around 2; he thinks it is unlikely that \(\theta\) is greater than 3. The number of islands seen in each of 100 patients is summarised in the following table. \[\begin{eqnarray*} & \begin{array}{|l|rrrrrrr|} \hline \mbox{Number of islands} & 0 & 1 & 2 & 3 & 4 & 5 & \geq 6 \\ \hline \mbox{Frequency} & 20 & 30 & 28 & 14 & 7 & 1 & 0 \\ \hline \end{array} \end{eqnarray*}\]

  1. Express the doctor’s prior beliefs as a normal distribution for \(\theta\). You may interpret the term “unlikely” as meaning “with probability 0.01”.

The doctor thus asserts \(\theta \sim N(\mu, \sigma^{2})\) with \(E(\theta) = 2\) and \(P(\theta > 3) = 0.01\). Note that, as \(\theta\) is continuous, this is equivalent to \(P(\theta \geq 3) = 0.01\). We use these two pieces of information to obtain \(\mu\) and \(\sigma^{2}\). Firstly, \(E(\theta) = 2 \Rightarrow \mu = 2\). Secondly, \[\begin{eqnarray*} P(\theta > 3) \ = \ 0.01 \ \Rightarrow \ P\left(\frac{\theta - 2}{\sigma} > \frac{1}{\sigma}\right) \ = \ 0.01. \end{eqnarray*}\] As \(\frac{\theta - 2}{\sigma} \sim N(0, 1)\), and the \(0.99\) quantile of the standard normal is approximately \(2.33\), we have that \[\begin{eqnarray*} \frac{1}{\sigma} \ = \ 2.33 \ \Rightarrow \ \sigma^{2} \ = \ 5.4289^{-1} \ = \ 0.1842. \end{eqnarray*}\] Hence, \(\theta \sim N(2, 5.4289^{-1})\).
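As a sketch of this elicitation in Python (assuming scipy is available): note that the solution rounds the \(0.99\) quantile to \(2.33\), whereas scipy returns \(2.3263\ldots\), so the resulting variance differs slightly from \(0.1842\).

```python
from scipy import stats

mean, q, p = 2.0, 3.0, 0.99          # E(theta) = 2 and P(theta <= 3) = 0.99
z = stats.norm.ppf(p)                # 2.3263..., rounded to 2.33 in the text
sigma = (q - mean) / z
print(sigma ** 2)                    # about 0.1848 (0.1842 with z = 2.33)
```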

  1. Find, up to a constant of proportionality, the posterior distribution \(\theta \, | \, x\) where \(x = (x_{1}, \ldots, x_{100})\).

As \(X_{i} \, | \, \theta \sim Po(\theta)\), the likelihood is \[\begin{eqnarray*} f(x \, | \, \theta) & = & \prod_{i=1}^{n} \frac{\theta^{x_{i}}e^{-\theta}}{x_{i}!} \ = \ \frac{\theta^{n\bar{x}}e^{-n\theta}}{\prod_{i=1}^{n} x_{i}!}. \end{eqnarray*}\] As \(\theta \sim N(2, 5.4289^{-1})\), the prior is \[\begin{eqnarray*} f(\theta) & = & \frac{2.33}{\sqrt{2 \pi}} \exp \left\{ - \frac{5.4289}{2}(\theta - 2)^{2} \right\}. \end{eqnarray*}\] The posterior is thus \[\begin{eqnarray*} f(\theta \, | \, x) & \propto & f(x \, | \, \theta)f(\theta) \\ & \propto & \theta^{n\bar{x}}e^{-n\theta} \times \exp \left\{ - \frac{5.4289}{2}(\theta^{2} - 4 \theta)\right\} \\ & = & \theta^{n\bar{x}} \exp\left\{ - \frac{5.4289}{2}(\theta^{2} - 4 \theta) - n\theta\right\}, \end{eqnarray*}\] where the constant factor \(\exp\{-2(5.4289)\}\) arising from expanding \((\theta - 2)^{2}\) has been absorbed into the proportionality. For the explicit data we have \(n = 100\) and \[\begin{eqnarray*} \sum_{i=1}^{100} x_{i} = (0 \times 20)+(1 \times 30) + (2 \times 28) + (3 \times 14) + (4 \times 7) + (5 \times 1) = 161. \end{eqnarray*}\] The posterior is thus \[\begin{eqnarray*} f(\theta \, | \, x) & = & c\theta^{161} \exp\left\{- \frac{5.4289}{2}(\theta^{2} - 4 \theta) - 100\theta\right\} \end{eqnarray*}\] where \(c\) is the constant of proportionality.

  1. Find a normal approximation to the posterior about the mode. Thus, estimate the posterior probability that the average number of islands is greater than 2.

We find the mode of \(\theta \, | \, x\) by maximising \(f(\theta \, | \, x)\) or, equivalently, \(\log f(\theta \, | \, x)\). We have \[\begin{eqnarray*} \frac{\partial}{\partial \theta} \log f(\theta \, | \, x) & = & \frac{\partial}{\partial \theta} \left\{ \log c + 161 \log \theta - \frac{5.4289}{2}(\theta^{2} - 4 \theta) - 100\theta\right\} \\ & = & \frac{161}{\theta} - 5.4289\theta + 2(5.4289) - 100. \end{eqnarray*}\] So, the mode \(\tilde{\theta}\) satisfies \[\begin{eqnarray*} 5.4289\tilde{\theta}^{2} + \{100-2(5.4289)\}\tilde{\theta} - 161 & = & 0 \ \Rightarrow \\ 5.4289\tilde{\theta}^{2} + 89.1422\tilde{\theta} - 161 & = & 0. \end{eqnarray*}\] Hence, as \(\tilde{\theta} > 0\), \[\begin{eqnarray*} \tilde{\theta} \ = \ \frac{-89.1422 + \sqrt{89.1422^{2} + 4(5.4289)(161)}}{2(5.4289)} \ = \ 1.6419. \end{eqnarray*}\] The observed information is \[\begin{eqnarray*} I(\theta \, | \, x) & = & - \frac{\partial^{2}}{\partial \theta^{2}} \log f(\theta \, | \, x) \ = \ \frac{161}{\theta^{2}} + 5.4289 \end{eqnarray*}\] so that \[\begin{eqnarray*} I(\tilde{\theta} \, | \, x) & = & \frac{161}{1.6419^{2}} + 5.4289 \ = \ 65.1506. \end{eqnarray*}\] So, approximately, \(\theta \, | \, x \sim N(1.6419, 65.1506^{-1})\). Thus, \[\begin{eqnarray*} P(\theta > 2 \, | \, x) & = & P\{Z > \sqrt{65.1506}(2 - 1.6419)\} \ = \ 0.0019. \end{eqnarray*}\]
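The whole calculation can be reproduced numerically; a sketch assuming numpy and scipy are available:

```python
import numpy as np
from scipy import stats

# Data from the table and the elicited prior precision 1/sigma^2 = 2.33^2
freqs = {0: 20, 1: 30, 2: 28, 3: 14, 4: 7, 5: 1}
n = sum(freqs.values())                           # 100
sx = sum(k * v for k, v in freqs.items())         # 161
prec = 2.33 ** 2                                  # 5.4289

# Mode: the positive root of prec*t^2 + (n - 2*prec)*t - sx = 0
a, b, c = prec, n - 2 * prec, -sx
mode = (-b + np.sqrt(b ** 2 - 4 * a * c)) / (2 * a)
info = sx / mode ** 2 + prec                      # observed information at mode

print(mode, info)                                 # 1.6419, 65.1506
print(stats.norm.sf(np.sqrt(info) * (2 - mode)))  # P(theta > 2 | x), about 0.0019
```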

  1. Why might you prefer to express the doctor’s prior beliefs as a normal distribution on some other parameterisation \(\phi = g(\theta)\)? Suggest an appropriate choice of \(g(\cdot)\) in this case. Now express the doctor’s beliefs using a normal prior for \(\phi\); that for healthy patients \(\phi\) will be on average around \(g(2)\) and it is “unlikely” that \(\phi\) is greater than \(g(3)\). Give an expression for the density of \(\phi \, | \, x\) up to a constant of proportionality.

By definition \(\theta > 0\) but both the specified normal prior distribution and the normal approximation for the posterior \(\theta \, | \, x\) have a sample space of \((-\infty, \infty)\), so we might want to use some other parameterisation which has the same sample space as the normal distribution. An obvious choice is \(\phi = g(\theta) = \log \theta\) since \(\theta > 0\) implies \(-\infty < \log \theta < \infty\). We assert \(\phi \sim N(\mu_{0}, \sigma_{0}^{2})\) with \(E(\phi) = \log 2\) and \(P(\phi > \log 3) = 0.01\). So, \(E(\phi) = \log 2 \Rightarrow \mu_{0} = \log 2\) and \[\begin{eqnarray*} P(\phi > \log 3) \ = \ 0.01 & \Rightarrow P\left(\frac{\phi - \log 2}{\sigma_{0}} > \frac{\log 3 - \log 2}{\sigma_{0}}\right) \ = \ 0.01. \end{eqnarray*}\]
As \(\frac{\phi - \log 2}{\sigma_{0}} \sim N(0, 1)\) we have that \[\begin{eqnarray*} \frac{1}{\sigma_{0}} \ = \ \frac{2.33}{\log \frac{3}{2}} & \Rightarrow & \sigma_{0}^{2} \ = \ 0.0303. \end{eqnarray*}\] As \(\phi = \log \theta\) then \(\theta = e^{\phi}\). The likelihood is thus, using part (b), \[\begin{eqnarray*} f(x \, | \, \phi) & \propto & (e^{\phi})^{n\bar{x}} e^{-ne^{\phi}} \ = \ e^{161\phi } e^{-100e^{\phi}} \end{eqnarray*}\] for the given data. The posterior is thus \[\begin{eqnarray*} f(\phi \, | \, x) & \propto & e^{161\phi } e^{-100e^{\phi}} \times \exp\left\{-\frac{1}{0.0606}\left(\phi^{2} - (\log 4) \phi\right)\right\} \\ & = & \exp\left\{-100e^{\phi} -16.5017 \phi^{2} + 183.8761 \phi\right\}, \end{eqnarray*}\] noting that \(2\sigma_{0}^{2} = 0.0606\) and \(2 \log 2 = \log 4\).
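If desired, the mode of \(\phi \, | \, x\) can be located numerically from this expression; a sketch assuming scipy, with the bracketing interval \((-2, 2)\) chosen simply to contain the mode:

```python
import numpy as np
from scipy import optimize

# Negative log posterior of phi, up to a constant, from the expression above
def neg_log_post(phi):
    return 100 * np.exp(phi) + 16.5017 * phi ** 2 - 183.8761 * phi

res = optimize.minimize_scalar(neg_log_post, bounds=(-2.0, 2.0), method="bounded")
print(res.x, np.exp(res.x))  # mode of phi | x, and the same point on the theta scale
```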

Question 4

Let \(X_{1}, \ldots, X_{10}\) be the lengths of time between arrivals at an ATM, and assume that the \(X_{i}\)s may be viewed as exchangeable with \(X_{i} \, | \, \lambda \sim Exp(\lambda)\) where \(\lambda\) is the rate per minute at which people arrive at the machine. Suppose we observe \(\sum_{i=1}^{10} x_{i} = 4\). Suppose that the prior distribution for \(\lambda\) is given by \[\begin{eqnarray*} f(\lambda) & = & \left\{\begin{array}{ll} c\exp\{-20(\lambda-0.25)^{2}\} & \lambda \geq 0, \\ 0 & \mbox{otherwise} \end{array} \right. \end{eqnarray*}\] where \(c\) is a known constant.

  1. Find, up to a constant \(k\) of proportionality, the posterior distribution \(\lambda \, | \, x\) where \(x = (x_{1}, \ldots, x_{10})\). Find also an expression for \(k\) which you need not evaluate.

The likelihood is \[\begin{eqnarray*} f(x \, | \, \lambda) & = & \prod_{i=1}^{10} \lambda e^{-\lambda x_{i}} \ = \ \lambda^{10}e^{-4\lambda} \end{eqnarray*}\] with the given data. The posterior is thus \[\begin{eqnarray*} f(\lambda \, | \, x) & \propto & \lambda^{10}e^{-4\lambda} \times \exp\{-20(\lambda - 0.25)^{2}\} \ = \ \lambda^{10} \exp\{-20(\lambda^{2} - 0.5\lambda) - 4\lambda\} \end{eqnarray*}\] so that, making the fact that \(\lambda > 0\) explicit, \[\begin{eqnarray*} f(\lambda \, | \, x) & = & \left\{\begin{array}{ll} k \lambda^{10} \exp\{-20\lambda^{2} + 6\lambda\} & \lambda > 0 \\ 0 & \mbox{otherwise}, \end{array}\right. \end{eqnarray*}\] where \[\begin{eqnarray*} k^{-1} & = & \int_{0}^{\infty} \lambda^{10} \exp\{-20\lambda^{2} + 6\lambda\} \, d\lambda. \end{eqnarray*}\]
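Although the question does not require evaluating \(k\), a quadrature sketch (assuming scipy is available) shows how this could be done:

```python
import numpy as np
from scipy import integrate

# k^{-1} as the integral of the posterior kernel over (0, infinity)
kernel = lambda lam: lam ** 10 * np.exp(-20 * lam ** 2 + 6 * lam)
k_inv, _ = integrate.quad(kernel, 0, np.inf)
print(1 / k_inv)   # the normalising constant k
```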

  1. Find a normal approximation to this posterior distribution about the mode.

We find the mode of \(\lambda \, | \, x\) by maximising \(f(\lambda \, | \, x)\) or, equivalently, \(\log f(\lambda \, | \, x)\). We have \[\begin{eqnarray*} \frac{\partial}{\partial \lambda} \log f(\lambda \, | \, x) & = & \frac{\partial}{\partial \lambda} \left\{10 \log \lambda -20\lambda^{2} + 6\lambda\right\} \\ & = & \frac{10}{\lambda} - 40\lambda + 6. \end{eqnarray*}\] So, the mode \(\tilde{\lambda}\) satisfies \[\begin{eqnarray*} 40\tilde{\lambda}^{2} - 6\tilde{\lambda} - 10 \ = \ 0 & \Rightarrow & 20\tilde{\lambda}^{2} - 3\tilde{\lambda} - 5 \ = \ 0. \end{eqnarray*}\] Hence, as \(\tilde{\lambda} > 0\), \[\begin{eqnarray*} \tilde{\lambda} & = & \frac{3 + \sqrt{9 + 4(20)(5)}}{2(20)} \ = \ 0.5806. \end{eqnarray*}\] The observed information is \[\begin{eqnarray*} I(\lambda \, | \, x) & = & - \frac{\partial^{2}}{\partial \lambda^{2}} \log f(\lambda \, | \, x) \ = \ \frac{10}{\lambda^{2}} + 40 \end{eqnarray*}\] so that \[\begin{eqnarray*} I(\tilde{\lambda} \, | \, x) & = & \frac{10}{0.5806^{2}} + 40 \ = \ 69.6651. \end{eqnarray*}\] So, approximately, \(\lambda \, | \, x \sim N(0.5806, 69.6651^{-1})\).
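As a check on the quality of the approximation, the following sketch compares the exact posterior density, normalised with the numerically evaluated \(k\) from part (a), against the normal approximation at a few points (assuming scipy):

```python
import numpy as np
from scipy import integrate, stats

kernel = lambda lam: lam ** 10 * np.exp(-20 * lam ** 2 + 6 * lam)
k = 1 / integrate.quad(kernel, 0, np.inf)[0]      # normalising constant, as in (a)

approx = stats.norm(loc=0.5806, scale=np.sqrt(1 / 69.6651))
for lam in (0.4, 0.5806, 0.8):
    print(lam, k * kernel(lam), approx.pdf(lam))  # exact versus approximate density
```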

  1. Let \(Z_{i}\), \(i = 1, \ldots, N\) be a sequence of independent and identically distributed standard Normal random variables. Assuming the normalising constant \(k\) is known, explain carefully how the \(Z_{i}\) may be used to obtain estimates of the mean of \(\lambda \, | \, x\).

We shall use importance sampling. If we wish to estimate some \(E\{g(\lambda) \, | \, X\}\) with posterior density \(f(\lambda \, | \, x)\) and can generate independent samples \(\lambda_{1}, \ldots, \lambda_{N}\) from some \(q(\lambda)\), an approximation of \(f(\lambda \, | \, x)\), then \[\begin{eqnarray*} \hat{I} & = & \frac{1}{N} \sum_{i=1}^{N} \frac{g(\lambda_{i})f(\lambda_{i} \, | \, x)}{q(\lambda_{i})} \end{eqnarray*}\] is an unbiased estimate of \(E\{g(\lambda) \, | \, X\}\), provided that \(q(\lambda) > 0\) whenever \(g(\lambda)f(\lambda \, | \, x) \neq 0\).

As \(Z_{i} \sim N(0, 1)\) then \(\lambda_{i} = 69.6651^{-\frac{1}{2}}Z_{i} + 0.5806 \sim N(0.5806, 69.6651^{-1})\) so that we can generate an independent and identically distributed sample from the \(N(0.5806, 69.6651^{-1})\) which is an approximation to the posterior of \(\lambda\). Letting \(g(\lambda) = \lambda\), \[\begin{eqnarray*} f(\lambda \, | \, x) & = & \left\{\begin{array}{ll} k \lambda^{10} \exp\{-20\lambda^{2} + 6\lambda\} & \lambda > 0 \\ 0 & \mbox{otherwise}, \end{array} \right. \\ q(\lambda) & = & \frac{\sqrt{69.6651}}{\sqrt{2 \pi}} \exp\left\{-\frac{69.6651}{2}(\lambda - 0.5806)^{2}\right\} \end{eqnarray*}\] then \[\begin{eqnarray*} \hat{I} & = & \frac{1}{N} \sum_{i=1}^{N} \frac{k\lambda_{i}^{11}\exp\{-20\lambda_{i}^{2} + 6\lambda_{i}\}\mathbb{I}_{\{\lambda_{i} > 0\}}}{\frac{\sqrt{69.6651}}{\sqrt{2 \pi}} \exp\left\{-\frac{69.6651}{2}(\lambda_{i} - 0.5806)^{2}\right\}} \end{eqnarray*}\] is an unbiased estimate of the posterior mean of \(\lambda\) with \(\mathbb{I}_{\{\lambda_{i} > 0\}}\) denoting the indicator function for the event \(\{\lambda_{i} > 0\}\).
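A sketch of the whole procedure in Python, assuming numpy and scipy are available; since the estimator above treats \(k\) as known, it is evaluated here by quadrature:

```python
import numpy as np
from scipy import integrate

rng = np.random.default_rng(1)
N = 100_000

# Posterior kernel; the question treats k as known, so here it is found by quadrature
kernel = lambda lam: lam ** 10 * np.exp(-20 * lam ** 2 + 6 * lam)
k = 1 / integrate.quad(kernel, 0, np.inf)[0]

mode, info = 0.5806, 69.6651                 # normal approximation from part (b)
z = rng.standard_normal(N)                   # the Z_i
lam = z / np.sqrt(info) + mode               # lambda_i = info^{-1/2} Z_i + mode

q = np.sqrt(info / (2 * np.pi)) * np.exp(-0.5 * info * (lam - mode) ** 2)
f = np.where(lam > 0, k * kernel(lam), 0.0)  # indicator enforces lambda_i > 0
print(np.mean(lam * f / q))                  # estimate of the posterior mean of lambda
```

For large \(N\) the estimate settles near the posterior mean of \(\lambda\), which lies close to the mode \(0.5806\) since the posterior is approximately normal.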