## Solution Sheet Six

### Question 1

Let $$X_{1}, \ldots, X_{n}$$ be exchangeable so that the $$X_{i}$$ are conditionally independent given a parameter $$\theta = (\mu, \sigma^{2})$$. Suppose that $$X_{i} \, | \, \theta \sim N(\mu, \sigma^{2})$$. It is judged that the improper joint prior distribution $$f(\mu, \sigma^{2}) \propto 1/\sigma^{2}$$ is appropriate.

1. Show that the likelihood $$f(x \, | \, \mu, \sigma^{2})$$, where $$x = (x_{1}, \ldots, x_{n})$$, can be expressed as $\begin{eqnarray*} f(x \, | \, \mu, \sigma^{2}) & = & \left(2\pi \sigma^{2}\right)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\sigma^{2}}\left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]\right\}, \end{eqnarray*}$ where $$\overline{x} = \frac{1}{n} \sum_{i=1}^{n} x_{i}$$, $$s^{2} = \frac{1}{n-1} \sum_{i=1}^{n} (x_{i} - \overline{x})^{2}$$ are respectively the sample mean and variance. Hence, explain why $$\overline{X}$$ and $$S^{2}$$ are sufficient for $$X = (X_{1}, \ldots, X_{n})$$ for learning about $$\theta$$.

$\begin{eqnarray*} f(x \, | \, \mu, \sigma^{2}) & = & \prod_{i=1}^{n} \frac{1}{\sqrt{2 \pi}\sigma} \exp\left\{-\frac{1}{2\sigma^{2}}(x_{i} - \mu)^{2}\right\} \\ & = & \left(2\pi\sigma^{2}\right)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n} (x_{i} - \mu)^{2}\right\} \\ & = & \left(2\pi\sigma^{2}\right)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n} \left((x_{i} - \overline{x}) + (\overline{x} - \mu)\right)^{2}\right\} \\ & = & \left(2\pi \sigma^{2}\right)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\sigma^{2}}\left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]\right\}. \end{eqnarray*}$ Recall that a statistic $$t(X)$$ is said to be sufficient for $$X$$ for learning about $$\theta$$ if we can write $\begin{eqnarray*} f(x \, | \, \theta) & = & g(t, \theta)h(x) \end{eqnarray*}$ where $$g(t, \theta)$$ depends upon $$t(x)$$ and $$\theta$$ and $$h(x)$$ does not depend upon $$\theta$$ but may depend upon $$x$$. In this case we have that $\begin{eqnarray*} f(x \, | \, \mu, \sigma^{2}) & = & g(\overline{x}, s^{2}, \mu, \sigma^{2})\left(2\pi \right)^{-\frac{n}{2}} \end{eqnarray*}$ so that $$\overline{X}$$ and $$S^{2}$$ are sufficient.
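
The key algebraic step is the decomposition $$\sum_{i=1}^{n}(x_{i} - \mu)^{2} = (n-1)s^{2} + n(\overline{x} - \mu)^{2}$$, which can be checked numerically. The following sketch (not part of the sheet; the sample and the value of $$\mu$$ are arbitrary) verifies it.

```python
import numpy as np

# Numerical check of the identity used above:
# sum_i (x_i - mu)^2 = (n - 1)*s^2 + n*(xbar - mu)^2,
# for an arbitrary sample x and an arbitrary mu.
rng = np.random.default_rng(1)
x = rng.normal(size=25)
mu = 0.7

n = len(x)
xbar = x.mean()
s2 = x.var(ddof=1)  # divisor n - 1, matching the definition of s^2

lhs = np.sum((x - mu) ** 2)
rhs = (n - 1) * s2 + n * (xbar - mu) ** 2
```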

2. Find, up to a constant of integration, the posterior distribution of $$\theta$$ given $$x$$.


$\begin{eqnarray*} f(\mu, \sigma^{2} \, | \, x) & \propto & f(x \, | \, \mu, \sigma^{2})f(\mu, \sigma^{2}) \\ & \propto & \left(2\pi \sigma^{2}\right)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\sigma^{2}}\left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]\right\} \frac{1}{\sigma^{2}} \\ & \propto & \left(\sigma^{2}\right)^{-\left(\frac{n}{2}+1\right)}\exp\left\{-\frac{1}{2\sigma^{2}}\left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]\right\}. \end{eqnarray*}$

3. Show that $$\mu \, | \, \sigma^{2}, x \sim N(\overline{x}, \sigma^{2}/n)$$. Hence, explain why, in this case, the chosen prior distribution for $$\theta$$ is noninformative.

$\begin{eqnarray*} f(\mu \, | \, \sigma^{2}, x) & = & \frac{f(\mu, \sigma^{2} \, | \, x)}{f(\sigma^{2} \, | \, x)} \\ & \propto & f(\mu, \sigma^{2} \, | \, x) \ \ \ \ \mbox{(as a function of $\mu$)} \\ & \propto & \exp\left\{-\frac{n}{2\sigma^{2}}(\overline{x} - \mu)^{2}\right\}. \end{eqnarray*}$ We recognise this as a kernel of $$N(\overline{x}, \sigma^{2}/n)$$ so $$\mu \, | \, \sigma^{2}, x \sim N(\overline{x}, \sigma^{2}/n)$$. In the classical model for $$\mu$$ when $$\sigma^{2}$$ is known, the maximum likelihood estimate is $$\overline{x}$$ with variance $$\sigma^{2}/n$$ (standard error $$\sigma/\sqrt{n}$$). The posterior distribution is driven entirely by the data (given $$\sigma^{2}$$), which is the sense in which the prior is noninformative here. Notice that a symmetric $$100(1-\alpha)\%$$ credible interval for $$\mu \, | \, \sigma^{2}, x$$ is $\begin{eqnarray*} \left(\overline{x} - z_{\left(1 - \frac{\alpha}{2}\right)}\frac{\sigma}{\sqrt{n}}, \overline{x} + z_{\left(1 - \frac{\alpha}{2}\right)}\frac{\sigma}{\sqrt{n}}\right) \end{eqnarray*}$ which agrees with the $$100(1-\alpha)\%$$ confidence interval for $$\mu$$ when $$\sigma^{2}$$ is known and $$X_{1}, \ldots, X_{n}$$ are iid $$N(\mu, \sigma^{2})$$.
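
The agreement between the credible and confidence intervals can be illustrated numerically; the values of $$\overline{x}$$, $$\sigma$$ and $$n$$ below are invented for illustration.

```python
import numpy as np
from scipy import stats

# The symmetric 95% credible interval for mu | sigma^2, x under
# N(xbar, sigma^2/n) coincides with the classical z-interval.
xbar, sigma, n, alpha = 10.0, 2.0, 16, 0.05
post = stats.norm(loc=xbar, scale=sigma / np.sqrt(n))
lo, hi = post.ppf(alpha / 2), post.ppf(1 - alpha / 2)

z = stats.norm.ppf(1 - alpha / 2)  # z_{(1 - alpha/2)}
classical = (xbar - z * sigma / np.sqrt(n), xbar + z * sigma / np.sqrt(n))
```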

4. By integrating $$f(\mu, \sigma^{2} \, | \, x)$$ over $$\sigma^{2}$$, show that $\begin{eqnarray*} f(\mu \, | \, x) & \propto & \left[1 + \frac{1}{n-1}\left(\frac{\overline{x} - \mu}{s/\sqrt{n}}\right)^{2}\right]^{-\frac{n}{2}}. \end{eqnarray*}$ Thus, explain why $$\mu \, | \, x \sim t_{n-1}(\overline{x}, s^{2}/n)$$, the location-scale $$t$$-distribution with $$n-1$$ degrees of freedom, location parameter $$\overline{x}$$ and squared scale parameter $$s^{2}/n$$. How does this result relate to the classical problem of making inferences about $$\mu$$ when $$\sigma^{2}$$ is also unknown?

$\begin{eqnarray} f(\mu \, | \, x) & = & \int_{-\infty}^{\infty} f(\mu, \sigma^{2} \, | \, x) \, d\sigma^{2} \nonumber \\ & \propto & \int_{0}^{\infty} \left(\sigma^{2}\right)^{-\left(\frac{n}{2}+1\right)}\exp\left\{-\frac{1}{2\sigma^{2}}\left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]\right\} \, d\sigma^{2} \tag{1} \end{eqnarray}$ as $$f(\mu, \sigma^{2} \, | \, x) = 0$$ for $$\sigma^{2} < 0$$. We recognise the integrand in equation (1) as a kernel of $$\mbox{Inv-Gamma}\left(\frac{n}{2}, \frac{1}{2}\left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]\right)$$ so that $\begin{eqnarray*} f(\mu \, | \, x) & \propto & \Gamma(n/2)\left(\frac{1}{2}\left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]\right)^{-\frac{n}{2}}\\ & \propto & \left(\frac{1}{2}\left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]\right)^{-\frac{n}{2}}\\ & \propto & \left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]^{-\frac{n}{2}}\\ & \propto & \left[1 + \frac{1}{n-1}\left(\frac{\overline{x} - \mu}{s/\sqrt{n}}\right)^{2}\right]^{-\frac{n}{2}}. \end{eqnarray*}$ We recognise this as a kernel of $$t_{n-1}(\overline{x}, s^{2}/n)$$ so that $$\mu \, | \, x \sim t_{n-1}(\overline{x}, s^{2}/n)$$. This gives a further insight into how the prior distribution is noninformative. Inference about $$\mu$$ will mirror the classical approach when $$\mu$$ and $$\sigma^{2}$$ are unknown and $$\frac{\overline{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$ where $$t_{n-1} = t_{n-1}(0, 1)$$. For example, a symmetric $$100(1-\alpha)\%$$ credible interval for $$\mu \, | \, x$$ is $\begin{eqnarray*} \left(\overline{x} - t_{n-1, \frac{\alpha}{2}}\frac{s}{\sqrt{n}}, \overline{x} + t_{n-1, \frac{\alpha}{2}}\frac{s}{\sqrt{n}}\right) \end{eqnarray*}$ which agrees with the $$100(1-\alpha)\%$$ confidence interval for $$\mu$$ when $$\sigma^{2}$$ is unknown and $$X_{1}, \ldots, X_{n}$$ are iid $$N(\mu, \sigma^{2})$$.
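
The scaled $$t_{n-1}$$ form of $$\mu \, | \, x$$ can also be checked by simulation, drawing $$\sigma^{2} \, | \, x$$ from its inverse-gamma marginal, $$\mbox{Inv-Gamma}\left(\frac{n-1}{2}, \frac{(n-1)s^{2}}{2}\right)$$ (the standard marginal implied by the joint posterior above), and then $$\mu \, | \, \sigma^{2}, x$$ from the conditional normal of part 3. A sketch, with invented data summaries:

```python
import numpy as np
from scipy import stats

# Monte Carlo check that mu | x has the scaled t_{n-1} form.
# n, xbar, s2 are invented for illustration.
rng = np.random.default_rng(0)
n, xbar, s2 = 12, 5.0, 4.0

# sigma^2 | x ~ Inv-Gamma((n-1)/2, (n-1)s^2/2)
sigma2 = stats.invgamma.rvs(a=(n - 1) / 2, scale=(n - 1) * s2 / 2,
                            size=200_000, random_state=rng)
# mu | sigma^2, x ~ N(xbar, sigma^2/n)
mu = rng.normal(xbar, np.sqrt(sigma2 / n))

# Standardised draws should match the t_{n-1} distribution.
t_draws = (mu - xbar) / np.sqrt(s2 / n)
q_mc = np.quantile(t_draws, 0.975)
q_t = stats.t.ppf(0.975, df=n - 1)
```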

### Question 2

Let $$X_{1}, \ldots, X_{n}$$ be exchangeable with $$X_{i} \, | \, \theta \sim Bern(\theta)$$.

1. Using the improper prior distribution $$f(\theta) \propto \theta^{-1}(1-\theta)^{-1}$$ find the posterior distribution of $$\theta \, | \, x$$ where $$x = (x_{1}, \ldots, x_{n})$$. Find a normal approximation about the mode to this distribution.

The likelihood is $\begin{eqnarray*} f(x \, | \, \theta) \ = \ \prod_{i=1}^{n} \theta^{x_{i}}(1-\theta)^{1-x_{i}} \ = \ \theta^{n\bar{x}}(1-\theta)^{n-n\bar{x}}, \end{eqnarray*}$ where $$x = (x_{1}, \ldots, x_{n})$$. With the given prior the posterior is $\begin{eqnarray*} f(\theta \, | \, x) \ \propto \ f(x \, | \, \theta)f(\theta) \ \propto \ \theta^{n\bar{x}-1}(1-\theta)^{n-n\bar{x}-1} \end{eqnarray*}$ which we recognise as a kernel of a $$Beta(n\bar{x}, n-n\bar{x})$$ density. Thus, $$\theta \, | \, x \sim Beta(n\bar{x}, n-n\bar{x})$$. The mode of a $$Beta(\alpha, \beta)$$ distribution is $$\frac{\alpha-1}{\alpha + \beta -2}$$ so for $$\theta \, | \, x$$ the mode is $\begin{eqnarray*} \tilde{\theta} & = & \frac{n\bar{x} - 1}{n-2}. \end{eqnarray*}$ The observed information is $\begin{eqnarray*} I(\theta \, | \, x) & = & -\frac{\partial^{2}}{\partial \theta^{2}}\log f(\theta \, | \, x) \\ & = & -\frac{\partial^{2}}{\partial \theta^{2}}\left\{\log \frac{\Gamma(n)}{\Gamma(n\bar{x})\Gamma(n -n\bar{x})} + (n\bar{x}-1) \log \theta + (n-n\bar{x}-1)\log (1-\theta)\right\} \\ & = & \frac{n\bar{x}-1}{\theta^{2}} + \frac{n-n\bar{x}-1}{(1-\theta)^{2}}. \end{eqnarray*}$ So, evaluating the observed information at the mode, $\begin{eqnarray*} I(\tilde{\theta} \, | \, x) & = & \frac{n\bar{x}-1}{\tilde{\theta}^{2}} + \frac{n-n\bar{x}-1}{(1-\tilde{\theta})^{2}}. \end{eqnarray*}$ Noting that $$1-\tilde{\theta} = \frac{n-n\bar{x}-1}{n-2}$$ we have that $\begin{eqnarray*} I(\tilde{\theta} \, | \, x) & = & \frac{(n-2)^{2}}{n\bar{x}-1} + \frac{(n-2)^{2}}{n-n\bar{x}-1} \\ & = & \frac{(n-2)^{3}}{(n\bar{x} -1)(n -n\bar{x} -1)}. \end{eqnarray*}$ So, approximately, $$\theta \, | \, x \sim N(\tilde{\theta}, I^{-1}(\tilde{\theta} \, | \, x))$$, that is, approximately, $\begin{eqnarray*} \theta \, | \, x & \sim & N\left(\frac{n\bar{x} - 1}{n-2}, \frac{(n\bar{x} -1)(n -n\bar{x} -1)}{(n-2)^{3}}\right). \end{eqnarray*}$
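
We can compare the exact Beta posterior with this normal approximation numerically; the values of $$n$$ and $$\bar{x}$$ below are invented for illustration.

```python
import numpy as np
from scipy import stats

# Exact Beta(n*xbar, n - n*xbar) posterior versus the normal
# approximation about the mode derived above.
n, xbar = 50, 0.4
a, b = n * xbar, n - n * xbar

mode = (n * xbar - 1) / (n - 2)
info = (n - 2) ** 3 / ((n * xbar - 1) * (n - n * xbar - 1))
approx = stats.norm(loc=mode, scale=np.sqrt(1 / info))
exact = stats.beta(a, b)

# Near the mode the two densities agree closely.
ratio = exact.pdf(mode) / approx.pdf(mode)
```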

2. Show that the prior distribution $$f(\theta)$$ is equivalent to a uniform prior on $\begin{eqnarray*} \beta & = & \log \left(\frac{\theta}{1-\theta}\right) \end{eqnarray*}$ and find the posterior distribution of $$\beta \, | \, x$$. Find a normal approximation about the mode to this distribution.

We have $$\beta = g(\theta)$$. We invert to find $$\theta = g^{-1}(\beta)$$. We find $\begin{eqnarray*} \theta & = & \frac{e^{\beta}}{1 + e^{\beta}}. \end{eqnarray*}$ The prior $$f_{\beta}(\beta)$$ for $$\beta$$ is given by $\begin{eqnarray*} f_{\beta}(\beta) & = & \left|\frac{\partial \theta}{\partial \beta} \right| f_{\theta}(\theta) \\ & = & \left|\frac{e^{\beta}}{(1 + e^{\beta})^{2}}\right| \left(\frac{e^{\beta}}{1 + e^{\beta}}\right)^{-1}\left(1 - \frac{e^{\beta}}{1 + e^{\beta}}\right)^{-1} \\ & = & \frac{e^{\beta}}{(1 + e^{\beta})^{2}} \times \frac{1 + e^{\beta}}{e^{\beta}} \times (1 + e^{\beta}) \ = \ 1, \end{eqnarray*}$ which is equivalent to the (improper) uniform on $$\beta$$. The posterior is $\begin{eqnarray*} f(\beta \, | \, x) & \propto & f(x \, | \, \beta) f(\beta) \\ & \propto & \theta^{n\bar{x}}(1-\theta)^{n-n\bar{x}} \\ & = & \left(\frac{e^{\beta}}{1 + e^{\beta}}\right)^{n\bar{x}}\left(1-\frac{e^{\beta}}{1 + e^{\beta}}\right)^{n-n\bar{x}} \ = \ \frac{e^{\beta n \bar{x}}}{(1+e^{\beta})^{n}}. \end{eqnarray*}$ Hence, $$f(\beta \, | \, x) = \frac{ce^{\beta n \bar{x}}}{(1+e^{\beta})^{n}}$$ where $$c$$ is the constant of integration. For the normal approximation about the mode, we first need to find the mode of $$\beta \, | \, x$$. The mode is the maximum of $$f(\beta \, | \, x)$$ which is, equivalently, the maximum of $$\log f(\beta \, | \, x)$$. We have $\begin{eqnarray*} \log f(\beta \, | \, x) & = & \log c + \beta n \bar{x} -n \log (1+e^{\beta}) \\ \Rightarrow \frac{\partial}{\partial \beta} \log f(\beta \, | \, x) & = & n \bar{x} - \frac{ne^{\beta}}{1 + e^{\beta}}. \end{eqnarray*}$ The mode $$\tilde{\beta}$$ satisfies $\begin{eqnarray*} n \bar{x} - \frac{ne^{\tilde{\beta}}}{1 + e^{\tilde{\beta}}} & = & 0 \\ \Rightarrow e^{\tilde{\beta}} & = & \frac{n\bar{x}}{n -n \bar{x}} \\ \Rightarrow \tilde{\beta} & = & \log \left(\frac{\bar{x}}{1-\bar{x}}\right). \end{eqnarray*}$ The observed information is $\begin{eqnarray*} I(\beta \, | \, x) & = & - \frac{\partial^{2}}{\partial \beta^{2}} \log f(\beta \, | \, x) \\ & = & \frac{ne^{\beta}}{(1 + e^{\beta})^{2}}. \end{eqnarray*}$ Noting that $$1 + e^{\tilde{\beta}} = \frac{1}{1-\bar{x}}$$ we have $\begin{eqnarray*} I(\tilde{\beta} \, | \, x) & = & n \times \frac{\bar{x}}{1-\bar{x}} \times (1-\bar{x})^{2} \ = \ n\bar{x}(1-\bar{x}). \end{eqnarray*}$ Hence, approximately, $\begin{eqnarray*} \beta \, | \, x & \sim & N\left(\log \frac{\bar{x}}{1-\bar{x}}, \frac{1}{n\bar{x}(1-\bar{x})}\right). \end{eqnarray*}$
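
The mode and information can be confirmed numerically by maximising the log-kernel directly; the sketch below (not part of the sheet) uses invented values of $$n$$ and $$\bar{x}$$.

```python
import numpy as np
from scipy import optimize

# Maximise the log-kernel beta*n*xbar - n*log(1 + e^beta) numerically
# and compare with the closed-form mode log(xbar/(1-xbar)) and the
# observed information n*xbar*(1-xbar).
n, xbar = 40, 0.3

def neg_log_kernel(beta):
    return -(beta * n * xbar - n * np.log1p(np.exp(beta)))

res = optimize.minimize_scalar(neg_log_kernel)
beta_mode = res.x
info = n * np.exp(beta_mode) / (1 + np.exp(beta_mode)) ** 2
```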

3. For which parameterisation does it make more sense to use a normal approximation?

Whilst we can find a normal approximation about the mode on either the scale of $$\theta$$ or of $$\beta$$, it makes more sense for $$\beta$$. We have $$0 < \theta < 1$$ and $$-\infty < \beta < \infty$$, so only $$\beta$$ has the same support as the normal distribution.

### Question 3

In viewing a section through the pancreas, doctors see what are called “islands”. Suppose that $$X_{i}$$ denotes the number of islands observed in the $$i$$th patient, $$i =1, \ldots, n$$, and we judge that $$X_{1}, \ldots, X_{n}$$ are exchangeable with $$X_{i} \, | \, \theta \sim Po(\theta)$$. A doctor believes that for healthy patients $$\theta$$ will be on average around 2; he thinks it is unlikely that $$\theta$$ is greater than 3. The number of islands seen in 100 patients are summarised in the following table. $\begin{eqnarray*} & \begin{array}{|l|rrrrrrr|} \hline \mbox{Number of islands} & 0 & 1 & 2 & 3 & 4 & 5 & \geq 6 \\ \hline \mbox{Frequency} & 20 & 30 & 28 & 14 & 7 & 1 & 0 \\ \hline \end{array} \end{eqnarray*}$

1. Express the doctor’s prior beliefs as a normal distribution for $$\theta$$. You may interpret the term “unlikely” as meaning “with probability 0.01”.

The doctor thus asserts $$\theta \sim N(\mu, \sigma^{2})$$ with $$E(\theta) = 2$$ and $$P(\theta > 3) = 0.01$$. Note that, as $$\theta$$ is continuous, this is equivalent to $$P(\theta \geq 3) = 0.01$$. We use these two pieces of information to obtain $$\mu$$ and $$\sigma^{2}$$. Firstly, $$E(\theta) = 2 \Rightarrow \mu = 2$$. Secondly, $\begin{eqnarray*} P(\theta > 3) \ = \ 0.01 \ \Rightarrow \ P\left(\frac{\theta - 2}{\sigma} > \frac{1}{\sigma}\right) \ = \ 0.01. \end{eqnarray*}$ As $$\frac{\theta - 2}{\sigma} \sim N(0, 1)$$ we have that $\begin{eqnarray*} \frac{1}{\sigma} \ = \ 2.33 \ \Rightarrow \ \sigma^{2} \ = \ 5.4289^{-1} \ = \ 0.1842. \end{eqnarray*}$ Hence, $$\theta \sim N(2, 5.4289^{-1})$$.
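
The elicitation can be reproduced with the exact normal percentile; the sheet rounds the 99th percentile $$2.3263$$ to $$2.33$$, which is why its $$\sigma^{2} = 0.1842$$ differs slightly from the value below.

```python
import numpy as np
from scipy import stats

# E(theta) = 2 gives mu = 2; P(theta > 3) = 0.01 gives 1/sigma equal to
# the 99th percentile of the standard normal.
z99 = stats.norm.ppf(0.99)       # approx 2.3263
mu, sigma = 2.0, 1 / z99

# P(theta > 3) is 0.01 by construction.
p = stats.norm(mu, sigma).sf(3)
```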

2. Find, up to a constant of proportionality, the posterior distribution $$\theta \, | \, x$$ where $$x = (x_{1}, \ldots, x_{100})$$.

As $$X_{i} \, | \, \theta \sim Po(\theta)$$, the likelihood is $\begin{eqnarray*} f(x \, | \, \theta) & = & \prod_{i=1}^{n} \frac{\theta^{x_{i}}e^{-\theta}}{x_{i}!} \ = \ \frac{\theta^{n\bar{x}}e^{-n\theta}}{\prod_{i=1}^{n} x_{i}!}. \end{eqnarray*}$ As $$\theta \sim N(2, 5.4289^{-1})$$, the prior is $\begin{eqnarray*} f(\theta) & = & \frac{2.33}{\sqrt{2 \pi}} \exp \left\{ - \frac{5.4289}{2}(\theta - 2)^{2} \right\}. \end{eqnarray*}$ The posterior is thus $\begin{eqnarray*} f(\theta \, | \, x) & \propto & f(x \, | \, \theta)f(\theta) \\ & \propto & \theta^{n\bar{x}}e^{-n\theta} \times \exp \left\{ - \frac{5.4289}{2}(\theta^{2} - 4 \theta)\right\} \\ & = & \theta^{n\bar{x}} \exp\left\{ - \frac{5.4289}{2}(\theta^{2} - 4 \theta) - n\theta\right\} \end{eqnarray*}$ For the explicit data we have $$n = 100$$ and $\begin{eqnarray*} \sum_{i=1}^{100} x_{i} = (0 \times 20)+(1 \times 30) + (2 \times 28) + (3 \times 14) + (4 \times 7) + (5 \times 1) = 161. \end{eqnarray*}$ The posterior is thus $\begin{eqnarray*} f(\theta \, | \, x) & = & c\theta^{161} \exp\left\{- \frac{5.4289}{2}(\theta^{2} - 4 \theta) - 100\theta\right\} \end{eqnarray*}$ where $$c$$ is the constant of proportionality.
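
The frequency-table arithmetic can be checked directly:

```python
# Check of the data summary: n = 100 patients and a total island
# count of 161, from the frequency table above.
freq = {0: 20, 1: 30, 2: 28, 3: 14, 4: 7, 5: 1}
n = sum(freq.values())
total = sum(k * f for k, f in freq.items())
```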

3. Find a normal approximation to the posterior about the mode. Thus, estimate the posterior probability that the average number of islands is greater than 2.

We find the mode of $$\theta \, | \, x$$ by maximising $$f(\theta \, | \, x)$$ or, equivalently, $$\log f(\theta \, | \, x)$$. We have $\begin{eqnarray*} \frac{\partial}{\partial \theta} \log f(\theta \, | \, x) & = & \frac{\partial}{\partial \theta} \left\{ \log c + 161 \log \theta - \frac{5.4289}{2}(\theta^{2} - 4 \theta) - 100\theta\right\} \\ & = & \frac{161}{\theta} - 5.4289\theta + 2(5.4289) - 100. \end{eqnarray*}$ So, the mode $$\tilde{\theta}$$ satisfies $\begin{eqnarray*} 5.4289\tilde{\theta}^{2} + \{100-2(5.4289)\}\tilde{\theta} - 161 & = & 0 \ \Rightarrow \\ 5.4289\tilde{\theta}^{2} + 89.1422\tilde{\theta} - 161 & = & 0. \end{eqnarray*}$ Hence, as $$\tilde{\theta} > 0$$, $\begin{eqnarray*} \tilde{\theta} \ = \ \frac{-89.1422 + \sqrt{89.1422^{2} + 4(5.4289)(161)}}{2(5.4289)} \ = \ 1.6419. \end{eqnarray*}$ The observed information is $\begin{eqnarray*} I(\theta \, | \, x) & = & - \frac{\partial^{2}}{\partial \theta^{2}} \log f(\theta \, | \, x) \ = \ \frac{161}{\theta^{2}} + 5.4289 \end{eqnarray*}$ so that $\begin{eqnarray*} I(\tilde{\theta} \, | \, x) & = & \frac{161}{1.6419^{2}} + 5.4289 \ = \ 65.1506. \end{eqnarray*}$ So, approximately, $$\theta \, | \, x \sim N(1.6419, 65.1506^{-1})$$. Thus, $\begin{eqnarray*} P(\theta > 2 \, | \, x) & = & P\{Z > \sqrt{65.1506}(2 - 1.6419)\} \ = \ 0.0019. \end{eqnarray*}$
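
The numerical steps above can be reproduced as follows (small discrepancies in the last decimal place reflect the sheet's intermediate rounding):

```python
import numpy as np
from scipy import stats

# Solve the quadratic for the mode, evaluate the observed information
# there, and compute P(theta > 2 | x) under the normal approximation.
a, b, c = 5.4289, 100 - 2 * 5.4289, -161.0
theta_mode = (-b + np.sqrt(b ** 2 - 4 * a * c)) / (2 * a)
info = 161 / theta_mode ** 2 + 5.4289
p = stats.norm.sf(np.sqrt(info) * (2 - theta_mode))
```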

4. Why might you prefer to express the doctor’s prior beliefs as a normal distribution on some other parameterisation $$\phi = g(\theta)$$? Suggest an appropriate choice of $$g(\cdot)$$ in this case. Now express the doctor’s beliefs using a normal prior for $$\phi$$: that for healthy patients $$\phi$$ will be on average around $$g(2)$$ and it is “unlikely” that $$\phi$$ is greater than $$g(3)$$. Give an expression for the density of $$\phi \, | \, x$$ up to a constant of proportionality.

By definition $$\theta > 0$$, but both the specified normal prior distribution and the normal approximation for the posterior $$\theta \, | \, x$$ have support $$(-\infty, \infty)$$, so we might prefer a parameterisation whose support matches that of the normal distribution. An obvious choice is $$\phi = g(\theta) = \log \theta$$ since if $$\theta > 0$$ then $$-\infty < \log \theta < \infty$$. We assert $$\phi \sim N(\mu_{0}, \sigma_{0}^{2})$$ with $$E(\phi) = \log 2$$ and $$P(\phi > \log 3) = 0.01$$. So, $$E(\phi) = \log 2 \Rightarrow \mu_{0} = \log 2$$ and $\begin{eqnarray*} P(\phi > \log 3) \ = \ 0.01 & \Rightarrow & P\left(\frac{\phi - \log 2}{\sigma_{0}} > \frac{\log 3 - \log 2}{\sigma_{0}}\right) \ = \ 0.01. \end{eqnarray*}$
As $$\frac{\phi - \log 2}{\sigma_{0}} \sim N(0, 1)$$ we have that $\begin{eqnarray*} \frac{1}{\sigma_{0}} \ = \ \frac{2.33}{\log \frac{3}{2}} & \Rightarrow & \sigma_{0}^{2} \ = \ 0.0303. \end{eqnarray*}$ As $$\phi = \log \theta$$ then $$\theta = e^{\phi}$$. Using the likelihood found in part 2, the likelihood is thus $\begin{eqnarray*} f(x \, | \, \phi) & \propto & (e^{\phi})^{n\bar{x}} e^{-ne^{\phi}} \ = \ e^{161\phi } e^{-100e^{\phi}} \end{eqnarray*}$ for the given data. The posterior is thus $\begin{eqnarray*} f(\phi \, | \, x) & \propto & e^{161\phi } e^{-100e^{\phi}} \times \exp\left\{-\frac{1}{0.0606}\left(\phi^{2} - (\log 4) \phi\right)\right\} \\ & = & \exp\left\{-100e^{\phi} -16.5017 \phi^{2} + 183.8761 \phi\right\}. \end{eqnarray*}$
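
The coefficients in the final exponent can be checked numerically; small differences from the sheet's $$16.5017$$ and $$183.8761$$ arise from its rounding of $$\sigma_{0}^{2}$$ to $$0.0303$$.

```python
import numpy as np

# The quadratic coefficient is 1/(2*sigma0^2) and the linear
# coefficient is 161 + (log 4)/(2*sigma0^2).
sigma0 = np.log(3 / 2) / 2.33
quad_coef = 1 / (2 * sigma0 ** 2)
lin_coef = 161 + quad_coef * np.log(4)
```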

### Question 4

Let $$X_{1}, \ldots, X_{10}$$ be the lengths of time between arrivals at an ATM, and assume that the $$X_{i}$$s may be viewed as exchangeable with $$X_{i} \, | \, \lambda \sim Exp(\lambda)$$ where $$\lambda$$ is the rate per minute at which people arrive at the machine. Suppose we observe $$\sum_{i=1}^{10} x_{i} = 4$$. Suppose that the prior distribution for $$\lambda$$ is given by $\begin{eqnarray*} f(\lambda) & = & \left\{\begin{array}{ll} c\exp\{-20(\lambda-0.25)^{2}\} & \lambda \geq 0, \\ 0 & \mbox{otherwise} \end{array} \right. \end{eqnarray*}$ where $$c$$ is a known constant.

1. Find, up to a constant $$k$$ of proportionality, the posterior distribution $$\lambda \, | \, x$$ where $$x = (x_{1}, \ldots, x_{10})$$. Find also an expression for $$k$$ which you need not evaluate.

The likelihood is $\begin{eqnarray*} f(x \, | \, \lambda) & = & \prod_{i=1}^{10} \lambda e^{-\lambda x_{i}} \ = \ \lambda^{10}e^{-4\lambda} \end{eqnarray*}$ with the given data. The posterior is thus $\begin{eqnarray*} f(\lambda \, | \, x) & \propto & \lambda^{10}e^{-4\lambda} \times \exp\{-20(\lambda - 0.25)^{2}\} \ = \ \lambda^{10} \exp\{-20(\lambda^{2} - 0.5\lambda) - 4\lambda\} \end{eqnarray*}$ so that, making the fact that $$\lambda > 0$$ explicit, $\begin{eqnarray*} f(\lambda \, | \, x) & = & \left\{\begin{array}{ll} k \lambda^{10} \exp\{-20\lambda^{2} + 6\lambda\} & \lambda > 0 \\ 0 & \mbox{otherwise}, \end{array}\right. \end{eqnarray*}$ where $\begin{eqnarray*} k^{-1} & = & \int_{0}^{\infty} \lambda^{10} \exp\{-20\lambda^{2} + 6\lambda\} \, d\lambda. \end{eqnarray*}$
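
Although the sheet leaves $$k^{-1}$$ as an unevaluated integral, it is straightforward to evaluate it numerically:

```python
import numpy as np
from scipy import integrate

# k^{-1} = int_0^inf lambda^10 exp(-20 lambda^2 + 6 lambda) d lambda.
def kernel(lam):
    return lam ** 10 * np.exp(-20 * lam ** 2 + 6 * lam)

k_inv, _ = integrate.quad(kernel, 0, np.inf)
k = 1 / k_inv

# Sanity check: with this k the posterior density integrates to one.
total, _ = integrate.quad(lambda lam: k * kernel(lam), 0, np.inf)
```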

2. Find a normal approximation to this posterior distribution about the mode.

We find the mode of $$\lambda \, | \, x$$ by maximising $$f(\lambda \, | \, x)$$ or, equivalently, $$\log f(\lambda \, | \, x)$$. We have $\begin{eqnarray*} \frac{\partial}{\partial \lambda} \log f(\lambda \, | \, x) & = & \frac{\partial}{\partial \lambda} \left\{10 \log \lambda -20\lambda^{2} + 6\lambda\right\} \\ & = & \frac{10}{\lambda} - 40\lambda + 6. \end{eqnarray*}$ So, the mode $$\tilde{\lambda}$$ satisfies $\begin{eqnarray*} 40\tilde{\lambda}^{2} - 6\tilde{\lambda} - 10 \ = \ 0 & \Rightarrow & 20\tilde{\lambda}^{2} - 3\tilde{\lambda} - 5 \ = \ 0. \end{eqnarray*}$ Hence, as $$\tilde{\lambda} > 0$$, $\begin{eqnarray*} \tilde{\lambda} & = & \frac{3 + \sqrt{9 + 4(20)(5)}}{2(20)} \ = \ 0.5806. \end{eqnarray*}$ The observed information is $\begin{eqnarray*} I(\lambda \, | \, x) & = & - \frac{\partial^{2}}{\partial \lambda^{2}} \log f(\lambda \, | \, x) \ = \ \frac{10}{\lambda^{2}} + 40 \end{eqnarray*}$ so that $\begin{eqnarray*} I(\tilde{\lambda} \, | \, x) & = & \frac{10}{0.5806^{2}} + 40 \ = \ 69.6651. \end{eqnarray*}$ So, approximately, $$\lambda \, | \, x \sim N(0.5806, 69.6651^{-1})$$.
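
The arithmetic above can be checked directly:

```python
import numpy as np

# The mode is the positive root of 20*lam^2 - 3*lam - 5 = 0 and the
# observed information is 10/lam^2 + 40 evaluated there.
lam_mode = (3 + np.sqrt(9 + 4 * 20 * 5)) / (2 * 20)
info = 10 / lam_mode ** 2 + 40
```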

3. Let $$Z_{i}$$, $$i = 1, \ldots, N$$ be a sequence of independent and identically distributed standard Normal random quantities. Assuming the normalising constant $$k$$ is known, explain carefully how the $$Z_{i}$$ may be used to obtain estimates of the mean of $$\lambda \, | \, x$$.

We shall use importance sampling. If we wish to estimate some $$E\{g(\lambda) \, | \, X\}$$ with posterior density $$f(\lambda \, | \, x)$$ and can generate independent samples $$\lambda_{1}, \ldots, \lambda_{N}$$ from some $$q(\lambda)$$, an approximation of $$f(\lambda \, | \, x)$$, then $\begin{eqnarray*} \hat{I} & = & \frac{1}{N} \sum_{i=1}^{N} \frac{g(\lambda_{i})f(\lambda_{i} \, | \, x)}{q(\lambda_{i})} \end{eqnarray*}$ is an unbiased estimate of $$E\{g(\lambda) \, | \, X\}$$.

As $$Z_{i} \sim N(0, 1)$$ then $$\lambda_{i} = 69.6651^{-\frac{1}{2}}Z_{i} + 0.5806 \sim N(0.5806, 69.6651^{-1})$$ so that we can generate an independent and identically distributed sample from the $$N(0.5806, 69.6651^{-1})$$ which is an approximation to the posterior of $$\lambda$$. Letting $$g(\lambda) = \lambda$$, $\begin{eqnarray*} f(\lambda \, | \, x) & = & \left\{\begin{array}{ll} k \lambda^{10} \exp\{-20\lambda^{2} + 6\lambda\} & \lambda > 0 \\ 0 & \mbox{otherwise}, \end{array} \right. \\ q(\lambda) & = & \frac{\sqrt{69.6651}}{\sqrt{2 \pi}} \exp\left\{-\frac{69.6651}{2}(\lambda - 0.5806)^{2}\right\} \end{eqnarray*}$ then $\begin{eqnarray*} \hat{I} & = & \frac{1}{N} \sum_{i=1}^{N} \frac{k\lambda_{i}^{11}\exp\{-20\lambda_{i}^{2} + 6\lambda_{i}\}\mathbb{I}_{\{\lambda_{i} > 0\}}}{\frac{\sqrt{69.6651}}{\sqrt{2 \pi}} \exp\left\{-\frac{69.6651}{2}(\lambda_{i} - 0.5806)^{2}\right\}} \end{eqnarray*}$ is an unbiased estimate of the posterior mean of $$\lambda$$ with $$\mathbb{I}_{\{\lambda_{i} > 0\}}$$ denoting the indicator function for the event $$\{\lambda_{i} > 0\}$$.
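
The scheme above can be run as the following sketch. The question assumes $$k$$ is known; here it is obtained by numerical integration, and the sample size $$N$$ and the seed are illustrative choices.

```python
import numpy as np
from scipy import integrate, stats

rng = np.random.default_rng(42)
N = 100_000

def kernel(lam):
    return lam ** 10 * np.exp(-20 * lam ** 2 + 6 * lam)

k = 1 / integrate.quad(kernel, 0, np.inf)[0]

# q: the N(0.5806, 69.6651^{-1}) normal approximation found above.
mu_q, sd_q = 0.5806, 69.6651 ** -0.5
z = rng.standard_normal(N)   # the Z_i
lam = sd_q * z + mu_q        # lambda_i = 69.6651^{-1/2} Z_i + 0.5806

# Importance weights f(lambda_i | x)/q(lambda_i), zero for lambda_i <= 0.
w = np.where(lam > 0,
             k * kernel(lam) / stats.norm.pdf(lam, mu_q, sd_q),
             0.0)
I_hat = np.mean(lam * w)     # estimate of E(lambda | x)

# For comparison, the posterior mean by direct numerical integration.
direct = k * integrate.quad(lambda l: l * kernel(l), 0, np.inf)[0]
```

Because $$q$$ closely approximates the posterior, the importance weights stay near one over the sampled range and the estimator has low variance.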