Solution Sheet Six

Question 1

Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\theta = (\mu, \sigma^{2})\). Suppose that \(X_{i} \, | \, \theta \sim N(\mu, \sigma^{2})\). It is judged that the improper joint prior distribution \(f(\mu, \sigma^{2}) \propto 1/\sigma^{2}\) is appropriate.

  1. Show that the likelihood \(f(x \, | \, \mu, \sigma^{2})\), where \(x = (x_{1}, \ldots, x_{n})\), can be expressed as \[\begin{eqnarray*} f(x \, | \, \mu, \sigma^{2}) & = & \left(2\pi \sigma^{2}\right)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\sigma^{2}}\left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]\right\}, \end{eqnarray*}\] where \(\overline{x} = \frac{1}{n} \sum_{i=1}^{n} x_{i}\), \(s^{2} = \frac{1}{n-1} \sum_{i=1}^{n} (x_{i} - \overline{x})^{2}\) are respectively the sample mean and variance. Hence, explain why \(\overline{X}\) and \(S^{2}\) are sufficient for \(X = (X_{1}, \ldots, X_{n})\) for learning about \(\theta\).

\[\begin{eqnarray*} f(x \, | \, \mu, \sigma^{2}) & = & \prod_{i=1}^{n} \frac{1}{\sqrt{2 \pi}\sigma} \exp\left\{-\frac{1}{2\sigma^{2}}(x_{i} - \mu)^{2}\right\} \\ & = & \left(2\pi\sigma^{2}\right)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n} (x_{i} - \mu)^{2}\right\} \\ & = & \left(2\pi\sigma^{2}\right)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n} \left((x_{i} - \overline{x}) + (\overline{x} - \mu)\right)^{2}\right\} \\ & = & \left(2\pi\sigma^{2}\right)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\sigma^{2}}\left[\sum_{i=1}^{n}(x_{i} - \overline{x})^{2} + n(\overline{x} - \mu)^{2}\right]\right\} \\ & = & \left(2\pi \sigma^{2}\right)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\sigma^{2}}\left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]\right\}, \end{eqnarray*}\] where the cross term \(2(\overline{x} - \mu)\sum_{i=1}^{n}(x_{i} - \overline{x})\) vanishes because \(\sum_{i=1}^{n}(x_{i} - \overline{x}) = 0\). Recall that a statistic \(t(X)\) is said to be sufficient for \(X\) for learning about \(\theta\) if we can write \[\begin{eqnarray*} f(x \, | \, \theta) & = & g(t, \theta)h(x) \end{eqnarray*}\] where \(g(t, \theta)\) depends upon \(t(x)\) and \(\theta\) and \(h(x)\) does not depend upon \(\theta\) but may depend upon \(x\). In this case we have that \[\begin{eqnarray*} f(x \, | \, \mu, \sigma^{2}) & = & g(\overline{x}, s^{2}, \mu, \sigma^{2})\left(2\pi \right)^{-\frac{n}{2}} \end{eqnarray*}\] so that \(\overline{X}\) and \(S^{2}\) are sufficient.
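As an illustrative aside, the decomposition can be checked numerically; the following is a minimal Python sketch, assuming numpy is available, with a simulated sample and an arbitrary value of \(\mu\).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=25)  # simulated sample, for illustration

n = x.size
xbar = x.mean()
s2 = x.var(ddof=1)  # sample variance with the n - 1 divisor
mu = 0.5            # arbitrary value of mu

lhs = np.sum((x - mu) ** 2)
rhs = (n - 1) * s2 + n * (xbar - mu) ** 2
print(np.isclose(lhs, rhs))  # True: the cross term vanishes
```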

  1. Find, up to a constant of integration, the posterior distribution of \(\theta\) given \(x\).

\[\begin{eqnarray*} f(\mu, \sigma^{2} \, | \, x) & \propto & f(x \, | \, \mu, \sigma^{2})f(\mu, \sigma^{2}) \\ & \propto & \left(2\pi \sigma^{2}\right)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\sigma^{2}}\left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]\right\} \frac{1}{\sigma^{2}} \\ & \propto & \left(\sigma^{2}\right)^{-\left(\frac{n}{2}+1\right)}\exp\left\{-\frac{1}{2\sigma^{2}}\left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]\right\}. \end{eqnarray*}\]

  1. Show that \(\mu \, | \, \sigma^{2}, x \sim N(\overline{x}, \sigma^{2}/n)\). Hence, explain why, in this case, the chosen prior distribution for \(\theta\) is noninformative.

\[\begin{eqnarray*} f(\mu \, | \, \sigma^{2}, x) & = & \frac{f(\mu, \sigma^{2} \, | \, x)}{f(\sigma^{2} \, | \, x)} \\ & \propto & f(\mu, \sigma^{2} \, | \, x) \ \ \ \ \mbox{(with respect to $\mu$)} \\ & \propto & \exp\left\{-\frac{n}{2\sigma^{2}}(\overline{x} - \mu)^{2}\right\}. \end{eqnarray*}\] We recognise this as a kernel of \(N(\overline{x}, \sigma^{2}/n)\) so \(\mu \, | \, \sigma^{2}, x \sim N(\overline{x}, \sigma^{2}/n)\). In the classical model for \(\mu\) when \(\sigma^{2}\) is known, the mle is \(\overline{x}\) with variance \(\sigma^{2}/n\), so that the standard error is \(\sigma/\sqrt{n}\). The posterior distribution (given \(\sigma^{2}\)) is driven entirely by the data, which is why the prior is noninformative in this case. Notice that a symmetric \(100(1-\alpha)\%\) credible interval for \(\mu \, | \, \sigma^{2}, x\) is \[\begin{eqnarray*} \left(\overline{x} - z_{\left(1 - \frac{\alpha}{2}\right)}\frac{\sigma}{\sqrt{n}}, \overline{x} + z_{\left(1 - \frac{\alpha}{2}\right)}\frac{\sigma}{\sqrt{n}}\right) \end{eqnarray*}\] which agrees with the \(100(1-\alpha)\%\) confidence interval for \(\mu\) when \(\sigma^{2}\) is known and \(X_{1}, \ldots, X_{n}\) are iid \(N(\mu, \sigma^{2})\).
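As an aside, this interval is simple to compute; a minimal Python sketch, assuming scipy is available and using purely hypothetical data values:

```python
import numpy as np
from scipy import stats

# Hypothetical data and a known value of sigma, purely for illustration
x = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.4])
sigma = 1.0
alpha = 0.05

n, xbar = x.size, x.mean()
z = stats.norm.ppf(1 - alpha / 2)  # z_{(1 - alpha/2)}
half_width = z * sigma / np.sqrt(n)
print(xbar - half_width, xbar + half_width)  # symmetric 95% credible interval
```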

  1. By integrating \(f(\mu, \sigma^{2} \, | \, x)\) over \(\sigma^{2}\), show that \[\begin{eqnarray*} f(\mu \, | \, x) & \propto & \left[1 + \frac{1}{n-1}\left(\frac{\overline{x} - \mu}{s/\sqrt{n}}\right)^{2}\right]^{-\frac{n}{2}}. \end{eqnarray*}\] Thus, explain why \(\mu \, | \, x \sim t_{n-1}(\overline{x}, s^{2}/n)\), the non-central \(t\)-distribution with \(n-1\) degrees of freedom, location parameter \(\overline{x}\) and squared scale parameter \(s^{2}/n\). How does this result relate to the classical problem of making inferences about \(\mu\) when \(\sigma^{2}\) is also unknown?

\[\begin{eqnarray} f(\mu \, | \, x) & = & \int_{-\infty}^{\infty} f(\mu, \sigma^{2} \, | \, x) \, d\sigma^{2} \nonumber \\ & \propto & \int_{0}^{\infty} \left(\sigma^{2}\right)^{-\left(\frac{n}{2}+1\right)}\exp\left\{-\frac{1}{2\sigma^{2}}\left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]\right\} \, d\sigma^{2} \tag{1} \end{eqnarray}\] as \(f(\mu, \sigma^{2} \, | \, x) = 0\) for \(\sigma^{2} < 0\). We recognise the integrand in equation (1) as a kernel of \(\mbox{Inv-Gamma}\left(\frac{n}{2}, \frac{1}{2}\left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]\right)\) so that \[\begin{eqnarray*} f(\mu \, | \, x) & \propto & \Gamma(n/2)\left(\frac{1}{2}\left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]\right)^{-\frac{n}{2}}\\ & \propto & \left(\frac{1}{2}\left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]\right)^{-\frac{n}{2}}\\ & \propto & \left[(n-1)s^{2} + n(\overline{x} - \mu)^{2}\right]^{-\frac{n}{2}}\\ & \propto & \left[1 + \frac{1}{n-1}\left(\frac{\overline{x} - \mu}{s/\sqrt{n}}\right)^{2}\right]^{-\frac{n}{2}}, \end{eqnarray*}\] where the final line follows on dividing through by \((n-1)s^{2}\). We recognise this as a kernel of \(t_{n-1}(\overline{x}, s^{2}/n)\) so that \(\mu \, | \, x \sim t_{n-1}(\overline{x}, s^{2}/n)\). This gives further insight into why the prior distribution is noninformative. Inference about \(\mu\) will mirror the classical approach when \(\mu\) and \(\sigma^{2}\) are unknown and \(\frac{\overline{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}\) where \(t_{n-1} = t_{n-1}(0, 1)\). For example, a symmetric \(100(1-\alpha)\%\) credible interval for \(\mu \, | \, x\) is \[\begin{eqnarray*} \left(\overline{x} - t_{n-1, \left(1 - \frac{\alpha}{2}\right)}\frac{s}{\sqrt{n}}, \overline{x} + t_{n-1, \left(1 - \frac{\alpha}{2}\right)}\frac{s}{\sqrt{n}}\right), \end{eqnarray*}\] where \(t_{n-1, \left(1 - \frac{\alpha}{2}\right)}\) denotes the \(\left(1 - \frac{\alpha}{2}\right)\) quantile of \(t_{n-1}\), which agrees with the \(100(1-\alpha)\%\) confidence interval for \(\mu\) when \(\sigma^{2}\) is unknown and \(X_{1}, \ldots, X_{n}\) are iid \(N(\mu, \sigma^{2})\).
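A corresponding sketch for the unknown-\(\sigma^{2}\) case, again with hypothetical data; the final line checks the interval against the location-scale \(t\) distribution directly:

```python
import numpy as np
from scipy import stats

# The same hypothetical data, with sigma^2 now unknown
x = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.4])
alpha = 0.05

n, xbar, s = x.size, x.mean(), x.std(ddof=1)
tq = stats.t.ppf(1 - alpha / 2, df=n - 1)  # (1 - alpha/2) quantile of t_{n-1}
half_width = tq * s / np.sqrt(n)
print(xbar - half_width, xbar + half_width)  # symmetric 95% credible interval

# The same interval from the location-scale t posterior directly
print(stats.t.interval(1 - alpha, df=n - 1, loc=xbar, scale=s / np.sqrt(n)))
```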

Question 2

Let \(X_{1}, \ldots, X_{n}\) be exchangeable with \(X_{i} \, | \, \theta \sim Bern(\theta)\).

  1. Using the improper prior distribution \(f(\theta) \propto \theta^{-1}(1-\theta)^{-1}\) find the posterior distribution of \(\theta \, | \, x\) where \(x = (x_{1}, \ldots, x_{n})\). Find a normal approximation about the mode to this distribution.

The likelihood is \[\begin{eqnarray*} f(x \, | \, \theta) \ = \ \prod_{i=1}^{n} \theta^{x_{i}}(1-\theta)^{1-x_{i}} \ = \ \theta^{n\bar{x}}(1-\theta)^{n-n\bar{x}}, \end{eqnarray*}\] where \(x = (x_{1}, \ldots, x_{n})\). With the given prior, the posterior is \[\begin{eqnarray*} f(\theta \, | \, x) \ \propto \ f(x \, | \, \theta)f(\theta) \ \propto \ \theta^{n\bar{x}-1}(1-\theta)^{n-n\bar{x}-1} \end{eqnarray*}\] which we recognise as a kernel of a \(Beta(n\bar{x}, n-n\bar{x})\) density. Thus, \(\theta \, | \, x \sim Beta(n\bar{x}, n-n\bar{x})\). The mode of a \(Beta(\alpha, \beta)\) distribution with \(\alpha > 1\) and \(\beta > 1\) is \(\frac{\alpha-1}{\alpha + \beta -2}\) so, provided \(1 < n\bar{x} < n-1\), the mode of \(\theta \, | \, x\) is \[\begin{eqnarray*} \tilde{\theta} & = & \frac{n\bar{x} - 1}{n-2}. \end{eqnarray*}\] The observed information is \[\begin{eqnarray*} I(\theta \, | \, x) & = & -\frac{\partial^{2}}{\partial \theta^{2}}\log f(\theta \, | \, x) \\ & = & -\frac{\partial^{2}}{\partial \theta^{2}}\left\{\log \frac{\Gamma(n)}{\Gamma(n\bar{x})\Gamma(n -n\bar{x})} + (n\bar{x}-1) \log \theta + (n-n\bar{x}-1)\log (1-\theta)\right\} \\ & = & \frac{n\bar{x}-1}{\theta^{2}} + \frac{n-n\bar{x}-1}{(1-\theta)^{2}}. \end{eqnarray*}\] So, evaluating the observed information at the mode, \[\begin{eqnarray*} I(\tilde{\theta} \, | \, x) & = & \frac{n\bar{x}-1}{\tilde{\theta}^{2}} + \frac{n-n\bar{x}-1}{(1-\tilde{\theta})^{2}}. \end{eqnarray*}\] Noting that \(1-\tilde{\theta} = \frac{n-n\bar{x}-1}{n-2}\) we have that \[\begin{eqnarray*} I(\tilde{\theta} \, | \, x) & = & \frac{(n-2)^{2}}{n\bar{x}-1} + \frac{(n-2)^{2}}{n-n\bar{x}-1} \\ & = & \frac{(n-2)^{3}}{(n\bar{x} -1)(n -n\bar{x} -1)}. \end{eqnarray*}\] So, approximately, \(\theta \, | \, x \sim N(\tilde{\theta}, I^{-1}(\tilde{\theta} \, | \, x))\), that is, approximately, \[\begin{eqnarray*} \theta \, | \, x & \sim & N\left(\frac{n\bar{x} - 1}{n-2}, \frac{(n\bar{x} -1)(n -n\bar{x} -1)}{(n-2)^{3}}\right). \end{eqnarray*}\]
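As an illustration of how close the approximation is, the following sketch compares the exact Beta posterior with the normal approximation, assuming scipy and a hypothetical data summary \(n = 40\), \(\bar{x} = 0.35\):

```python
import numpy as np
from scipy import stats

# Hypothetical data summary, purely for illustration
n, xbar = 40, 0.35
a, b = n * xbar, n - n * xbar              # theta | x ~ Beta(n xbar, n - n xbar)

mode = (n * xbar - 1) / (n - 2)            # posterior mode
info = (n - 2) ** 3 / ((n * xbar - 1) * (n - n * xbar - 1))
approx = stats.norm(loc=mode, scale=np.sqrt(1 / info))

print(stats.beta(a, b).sf(0.5))            # exact P(theta > 0.5 | x)
print(approx.sf(0.5))                      # normal approximation about the mode
```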

  1. Show that the prior distribution \(f(\theta)\) is equivalent to a uniform prior on \[\begin{eqnarray*} \beta & = & \log \left(\frac{\theta}{1-\theta}\right) \end{eqnarray*}\] and find the posterior distribution of \(\beta \, | \, x\). Find a normal approximation about the mode to this distribution.

We have \(\beta = g(\theta)\), which we invert to find \(\theta = g^{-1}(\beta)\): \[\begin{eqnarray*} \theta & = & \frac{e^{\beta}}{1 + e^{\beta}}. \end{eqnarray*}\] The prior \(f_{\beta}(\beta)\) for \(\beta\) is given by \[\begin{eqnarray*} f_{\beta}(\beta) & = & \left|\frac{\partial \theta}{\partial \beta} \right| f_{\theta}(\theta) \\ & = & \left|\frac{e^{\beta}}{(1 + e^{\beta})^{2}}\right| \left(\frac{e^{\beta}}{1 + e^{\beta}}\right)^{-1}\left(1 - \frac{e^{\beta}}{1 + e^{\beta}}\right)^{-1} \\ & = & \frac{e^{\beta}}{(1 + e^{\beta})^{2}} \times \frac{1 + e^{\beta}}{e^{\beta}} \times (1 + e^{\beta}) \ = \ 1, \end{eqnarray*}\] which is the (improper) uniform prior on \(\beta\). The posterior is \[\begin{eqnarray*} f(\beta \, | \, x) & \propto & f(x \, | \, \beta) f(\beta) \\ & \propto & \theta^{n\bar{x}}(1-\theta)^{n-n\bar{x}} \\ & = & \left(\frac{e^{\beta}}{1 + e^{\beta}}\right)^{n\bar{x}}\left(1-\frac{e^{\beta}}{1 + e^{\beta}}\right)^{n-n\bar{x}} \ = \ \frac{e^{\beta n \bar{x}}}{(1+e^{\beta})^{n}}. \end{eqnarray*}\] Hence, \(f(\beta \, | \, x) = \frac{ce^{\beta n \bar{x}}}{(1+e^{\beta})^{n}}\) where \(c\) is the constant of integration. For the normal approximation about the mode, we first need to find the mode of \(\beta \, | \, x\). The mode is the maximum of \(f(\beta \, | \, x)\) which is, equivalently, the maximum of \(\log f(\beta \, | \, x)\). We have \[\begin{eqnarray*} \log f(\beta \, | \, x) & = & \log c + \beta n \bar{x} -n \log (1+e^{\beta}) \\ \Rightarrow \frac{\partial}{\partial \beta} \log f(\beta \, | \, x) & = & n \bar{x} - \frac{ne^{\beta}}{1 + e^{\beta}}. \end{eqnarray*}\] The mode \(\tilde{\beta}\) satisfies \[\begin{eqnarray*} n \bar{x} - \frac{ne^{\tilde{\beta}}}{1 + e^{\tilde{\beta}}} & = & 0 \\ \Rightarrow e^{\tilde{\beta}} & = & \frac{n\bar{x}}{n -n \bar{x}} \\ \Rightarrow \tilde{\beta} & = & \log \left(\frac{\bar{x}}{1-\bar{x}}\right). \end{eqnarray*}\] The observed information is \[\begin{eqnarray*} I(\beta \, | \, x) & = & - \frac{\partial^{2}}{\partial \beta^{2}} \log f(\beta \, | \, x) \\ & = & \frac{ne^{\beta}}{(1 + e^{\beta})^{2}}. \end{eqnarray*}\] Noting that \(1 + e^{\tilde{\beta}} = \frac{1}{1-\bar{x}}\) we have \[\begin{eqnarray*} I(\tilde{\beta} \, | \, x) & = & n \times \frac{\bar{x}}{1-\bar{x}} \times (1-\bar{x})^{2} \ = \ n\bar{x}(1-\bar{x}). \end{eqnarray*}\] Hence, approximately, \[\begin{eqnarray*} \beta \, | \, x & \sim & N\left(\log \frac{\bar{x}}{1-\bar{x}}, \frac{1}{n\bar{x}(1-\bar{x})}\right). \end{eqnarray*}\]
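The analogous sketch on the \(\beta\) scale, with the same hypothetical summary; note that \(P(\theta > 0.5 \, | \, x) = P(\beta > 0 \, | \, x)\), so the two parameterisations can be compared on the same event:

```python
import numpy as np
from scipy import stats

n, xbar = 40, 0.35                         # same hypothetical summary
mode = np.log(xbar / (1 - xbar))           # posterior mode of beta
info = n * xbar * (1 - xbar)               # observed information at the mode
approx = stats.norm(loc=mode, scale=np.sqrt(1 / info))

# P(theta > 0.5 | x) = P(beta > 0 | x); compare with the exact Beta posterior
print(approx.sf(0.0))
print(stats.beta(n * xbar, n - n * xbar).sf(0.5))
```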

  1. For which parameterisation does it make more sense to use a normal approximation?

Whilst we can find a normal approximation about the mode either on the scale of \(\theta\) or of \(\beta\), it makes more sense for \(\beta\). We have \(0 < \theta < 1\) and \(-\infty < \beta < \infty\) so only \(\beta\) has a sample space which agrees with the normal distribution.

Question 3

In viewing a section through the pancreas, doctors see what are called “islands”. Suppose that \(X_{i}\) denotes the number of islands observed in the \(i\)th patient, \(i =1, \ldots, n\), and we judge that \(X_{1}, \ldots, X_{n}\) are exchangeable with \(X_{i} \, | \, \theta \sim Po(\theta)\). A doctor believes that for healthy patients \(\theta\) will be on average around 2; he thinks it is unlikely that \(\theta\) is greater than 3. The number of islands seen in each of 100 patients is summarised in the following table. \[\begin{eqnarray*} & \begin{array}{|l|rrrrrrr|} \hline \mbox{Number of islands} & 0 & 1 & 2 & 3 & 4 & 5 & \geq 6 \\ \hline \mbox{Frequency} & 20 & 30 & 28 & 14 & 7 & 1 & 0 \\ \hline \end{array} \end{eqnarray*}\]

  1. Express the doctor’s prior beliefs as a normal distribution for \(\theta\). You may interpret the term “unlikely” as meaning “with probability 0.01”.

The doctor thus asserts \(\theta \sim N(\mu, \sigma^{2})\) with \(E(\theta) = 2\) and \(P(\theta > 3) = 0.01\). Note that, as \(\theta\) is continuous, this is equivalent to \(P(\theta \geq 3) = 0.01\). We use these two pieces of information to obtain \(\mu\) and \(\sigma^{2}\). Firstly, \(E(\theta) = 2 \Rightarrow \mu = 2\). Secondly, \[\begin{eqnarray*} P(\theta > 3) \ = \ 0.01 \ \Rightarrow \ P\left(\frac{\theta - 2}{\sigma} > \frac{1}{\sigma}\right) \ = \ 0.01. \end{eqnarray*}\] As \(\frac{\theta - 2}{\sigma} \sim N(0, 1)\), and the \(0.99\) quantile of the standard normal is approximately \(2.33\), we have that \[\begin{eqnarray*} \frac{1}{\sigma} \ = \ 2.33 \ \Rightarrow \ \sigma^{2} \ = \ 5.4289^{-1} \ = \ 0.1842. \end{eqnarray*}\] Hence, \(\theta \sim N(2, 5.4289^{-1})\).
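As a sketch of this elicitation in Python (assuming scipy is available): note that the solution rounds the \(0.99\) quantile to \(2.33\), whereas scipy returns \(2.3263\ldots\), so the resulting variance differs slightly from \(0.1842\).

```python
from scipy import stats

mean, q, p = 2.0, 3.0, 0.99          # E(theta) = 2 and P(theta <= 3) = 0.99
z = stats.norm.ppf(p)                # 2.3263..., rounded to 2.33 in the text
sigma = (q - mean) / z
print(sigma ** 2)                    # about 0.1848 (0.1842 with z = 2.33)
```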

  1. Find, up to a constant of proportionality, the posterior distribution \(\theta \, | \, x\) where \(x = (x_{1}, \ldots, x_{100})\).

As \(X_{i} \, | \, \theta \sim Po(\theta)\), the likelihood is \[\begin{eqnarray*} f(x \, | \, \theta) & = & \prod_{i=1}^{n} \frac{\theta^{x_{i}}e^{-\theta}}{x_{i}!} \ = \ \frac{\theta^{n\bar{x}}e^{-n\theta}}{\prod_{i=1}^{n} x_{i}!}. \end{eqnarray*}\] As \(\theta \sim N(2, 5.4289^{-1})\), the prior is \[\begin{eqnarray*} f(\theta) & = & \frac{2.33}{\sqrt{2 \pi}} \exp \left\{ - \frac{5.4289}{2}(\theta - 2)^{2} \right\}. \end{eqnarray*}\] The posterior is thus \[\begin{eqnarray*} f(\theta \, | \, x) & \propto & f(x \, | \, \theta)f(\theta) \\ & \propto & \theta^{n\bar{x}}e^{-n\theta} \times \exp \left\{ - \frac{5.4289}{2}(\theta^{2} - 4 \theta)\right\} \\ & = & \theta^{n\bar{x}} \exp\left\{ - \frac{5.4289}{2}(\theta^{2} - 4 \theta) - n\theta\right\}, \end{eqnarray*}\] where the constant factor \(\exp\{-2(5.4289)\}\) arising from expanding \((\theta - 2)^{2}\) has been absorbed into the proportionality. For the explicit data we have \(n = 100\) and \[\begin{eqnarray*} \sum_{i=1}^{100} x_{i} = (0 \times 20)+(1 \times 30) + (2 \times 28) + (3 \times 14) + (4 \times 7) + (5 \times 1) = 161. \end{eqnarray*}\] The posterior is thus \[\begin{eqnarray*} f(\theta \, | \, x) & = & c\theta^{161} \exp\left\{- \frac{5.4289}{2}(\theta^{2} - 4 \theta) - 100\theta\right\} \end{eqnarray*}\] where \(c\) is the constant of proportionality.

  1. Find a normal approximation to the posterior about the mode. Thus, estimate the posterior probability that the average number of islands is greater than 2.

We find the mode of \(\theta \, | \, x\) by maximising \(f(\theta \, | \, x)\) or, equivalently, \(\log f(\theta \, | \, x)\). We have \[\begin{eqnarray*} \frac{\partial}{\partial \theta} \log f(\theta \, | \, x) & = & \frac{\partial}{\partial \theta} \left\{ \log c + 161 \log \theta - \frac{5.4289}{2}(\theta^{2} - 4 \theta) - 100\theta\right\} \\ & = & \frac{161}{\theta} - 5.4289\theta + 2(5.4289) - 100. \end{eqnarray*}\] So, the mode \(\tilde{\theta}\) satisfies \[\begin{eqnarray*} 5.4289\tilde{\theta}^{2} + \{100-2(5.4289)\}\tilde{\theta} - 161 & = & 0 \ \Rightarrow \\ 5.4289\tilde{\theta}^{2} + 89.1422\tilde{\theta} - 161 & = & 0. \end{eqnarray*}\] Hence, as \(\tilde{\theta} > 0\), \[\begin{eqnarray*} \tilde{\theta} \ = \ \frac{-89.1422 + \sqrt{89.1422^{2} + 4(5.4289)(161)}}{2(5.4289)} \ = \ 1.6419. \end{eqnarray*}\] The observed information is \[\begin{eqnarray*} I(\theta \, | \, x) & = & - \frac{\partial^{2}}{\partial \theta^{2}} \log f(\theta \, | \, x) \ = \ \frac{161}{\theta^{2}} + 5.4289 \end{eqnarray*}\] so that \[\begin{eqnarray*} I(\tilde{\theta} \, | \, x) & = & \frac{161}{1.6419^{2}} + 5.4289 \ = \ 65.1506. \end{eqnarray*}\] So, approximately, \(\theta \, | \, x \sim N(1.6419, 65.1506^{-1})\). Thus, \[\begin{eqnarray*} P(\theta > 2 \, | \, x) & = & P\{Z > \sqrt{65.1506}(2 - 1.6419)\} \ = \ 0.0019. \end{eqnarray*}\]
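The whole calculation can be reproduced numerically; a sketch assuming numpy and scipy are available:

```python
import numpy as np
from scipy import stats

# Data from the table and the elicited prior precision 1/sigma^2 = 2.33^2
freqs = {0: 20, 1: 30, 2: 28, 3: 14, 4: 7, 5: 1}
n = sum(freqs.values())                           # 100
sx = sum(k * v for k, v in freqs.items())         # 161
prec = 2.33 ** 2                                  # 5.4289

# Mode: the positive root of prec*t^2 + (n - 2*prec)*t - sx = 0
a, b, c = prec, n - 2 * prec, -sx
mode = (-b + np.sqrt(b ** 2 - 4 * a * c)) / (2 * a)
info = sx / mode ** 2 + prec                      # observed information at mode

print(mode, info)                                 # 1.6419, 65.1506
print(stats.norm.sf(np.sqrt(info) * (2 - mode)))  # P(theta > 2 | x), about 0.0019
```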

  1. Why might you prefer to express the doctor’s prior beliefs as a normal distribution on some other parameterisation \(\phi = g(\theta)\)? Suggest an appropriate choice of \(g(\cdot)\) in this case. Now express the doctor’s beliefs using a normal prior for \(\phi\); that for healthy patients \(\phi\) will be on average around \(g(2)\) and it is “unlikely” that \(\phi\) is greater than \(g(3)\). Give an expression for the density of \(\phi \, | \, x\) up to a constant of proportionality.

By definition \(\theta > 0\) but both the specified normal prior distribution and the normal approximation for the posterior \(\theta \, | \, x\) have a sample space of \((-\infty, \infty)\), so we might want to use some other parameterisation which has the same sample space as the normal distribution. An obvious choice is \(\phi = g(\theta) = \log \theta\) since \(\theta > 0\) implies \(-\infty < \log \theta < \infty\). We assert \(\phi \sim N(\mu_{0}, \sigma_{0}^{2})\) with \(E(\phi) = \log 2\) and \(P(\phi > \log 3) = 0.01\). So, \(E(\phi) = \log 2 \Rightarrow \mu_{0} = \log 2\) and \[\begin{eqnarray*} P(\phi > \log 3) \ = \ 0.01 & \Rightarrow P\left(\frac{\phi - \log 2}{\sigma_{0}} > \frac{\log 3 - \log 2}{\sigma_{0}}\right) \ = \ 0.01. \end{eqnarray*}\]
As \(\frac{\phi - \log 2}{\sigma_{0}} \sim N(0, 1)\) we have that \[\begin{eqnarray*} \frac{1}{\sigma_{0}} \ = \ \frac{2.33}{\log \frac{3}{2}} & \Rightarrow & \sigma_{0}^{2} \ = \ 0.0303. \end{eqnarray*}\] As \(\phi = \log \theta\) then \(\theta = e^{\phi}\). The likelihood is thus, using part (b), \[\begin{eqnarray*} f(x \, | \, \phi) & \propto & (e^{\phi})^{n\bar{x}} e^{-ne^{\phi}} \ = \ e^{161\phi } e^{-100e^{\phi}} \end{eqnarray*}\] for the given data. The posterior is thus \[\begin{eqnarray*} f(\phi \, | \, x) & \propto & e^{161\phi } e^{-100e^{\phi}} \times \exp\left\{-\frac{1}{0.0606}\left(\phi^{2} - (\log 4) \phi\right)\right\} \\ & = & \exp\left\{-100e^{\phi} -16.5017 \phi^{2} + 183.8761 \phi\right\}, \end{eqnarray*}\] noting that \(2\sigma_{0}^{2} = 0.0606\) and \(2 \log 2 = \log 4\).
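If desired, the mode of \(\phi \, | \, x\) can be located numerically from this expression; a sketch assuming scipy, with the bracketing interval \((-2, 2)\) chosen simply to contain the mode:

```python
import numpy as np
from scipy import optimize

# Negative log posterior of phi, up to a constant, from the expression above
def neg_log_post(phi):
    return 100 * np.exp(phi) + 16.5017 * phi ** 2 - 183.8761 * phi

res = optimize.minimize_scalar(neg_log_post, bounds=(-2.0, 2.0), method="bounded")
print(res.x, np.exp(res.x))  # mode of phi | x, and the same point on the theta scale
```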

Question 4

Let \(X_{1}, \ldots, X_{10}\) be the lengths of time between arrivals at an ATM, and assume that the \(X_{i}\)s may be viewed as exchangeable with \(X_{i} \, | \, \lambda \sim Exp(\lambda)\) where \(\lambda\) is the rate per minute at which people arrive at the machine. Suppose we observe \(\sum_{i=1}^{10} x_{i} = 4\). Suppose that the prior distribution for \(\lambda\) is given by \[\begin{eqnarray*} f(\lambda) & = & \left\{\begin{array}{ll} c\exp\{-20(\lambda-0.25)^{2}\} & \lambda \geq 0, \\ 0 & \mbox{otherwise} \end{array} \right. \end{eqnarray*}\] where \(c\) is a known constant.

  1. Find, up to a constant \(k\) of proportionality, the posterior distribution \(\lambda \, | \, x\) where \(x = (x_{1}, \ldots, x_{10})\). Find also an expression for \(k\) which you need not evaluate.

The likelihood is \[\begin{eqnarray*} f(x \, | \, \lambda) & = & \prod_{i=1}^{10} \lambda e^{-\lambda x_{i}} \ = \ \lambda^{10}e^{-4\lambda} \end{eqnarray*}\] with the given data. The posterior is thus \[\begin{eqnarray*} f(\lambda \, | \, x) & \propto & \lambda^{10}e^{-4\lambda} \times \exp\{-20(\lambda - 0.25)^{2}\} \ = \ \lambda^{10} \exp\{-20(\lambda^{2} - 0.5\lambda) - 4\lambda\} \end{eqnarray*}\] so that, making the fact that \(\lambda > 0\) explicit, \[\begin{eqnarray*} f(\lambda \, | \, x) & = & \left\{\begin{array}{ll} k \lambda^{10} \exp\{-20\lambda^{2} + 6\lambda\} & \lambda > 0 \\ 0 & \mbox{otherwise}, \end{array}\right. \end{eqnarray*}\] where \[\begin{eqnarray*} k^{-1} & = & \int_{0}^{\infty} \lambda^{10} \exp\{-20\lambda^{2} + 6\lambda\} \, d\lambda. \end{eqnarray*}\]
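Although the question does not require evaluating \(k\), a quadrature sketch (assuming scipy is available) shows how this could be done:

```python
import numpy as np
from scipy import integrate

# k^{-1} as the integral of the posterior kernel over (0, infinity)
kernel = lambda lam: lam ** 10 * np.exp(-20 * lam ** 2 + 6 * lam)
k_inv, _ = integrate.quad(kernel, 0, np.inf)
print(1 / k_inv)   # the normalising constant k
```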

  1. Find a normal approximation to this posterior distribution about the mode.

We find the mode of \(\lambda \, | \, x\) by maximising \(f(\lambda \, | \, x)\) or, equivalently, \(\log f(\lambda \, | \, x)\). We have \[\begin{eqnarray*} \frac{\partial}{\partial \lambda} \log f(\lambda \, | \, x) & = & \frac{\partial}{\partial \lambda} \left\{10 \log \lambda -20\lambda^{2} + 6\lambda\right\} \\ & = & \frac{10}{\lambda} - 40\lambda + 6. \end{eqnarray*}\] So, the mode \(\tilde{\lambda}\) satisfies \[\begin{eqnarray*} 40\tilde{\lambda}^{2} - 6\tilde{\lambda} - 10 \ = \ 0 & \Rightarrow & 20\tilde{\lambda}^{2} - 3\tilde{\lambda} - 5 \ = \ 0. \end{eqnarray*}\] Hence, as \(\tilde{\lambda} > 0\), \[\begin{eqnarray*} \tilde{\lambda} & = & \frac{3 + \sqrt{9 + 4(20)(5)}}{2(20)} \ = \ 0.5806. \end{eqnarray*}\] The observed information is \[\begin{eqnarray*} I(\lambda \, | \, x) & = & - \frac{\partial^{2}}{\partial \lambda^{2}} \log f(\lambda \, | \, x) \ = \ \frac{10}{\lambda^{2}} + 40 \end{eqnarray*}\] so that \[\begin{eqnarray*} I(\tilde{\lambda} \, | \, x) & = & \frac{10}{0.5806^{2}} + 40 \ = \ 69.6651. \end{eqnarray*}\] So, approximately, \(\lambda \, | \, x \sim N(0.5806, 69.6651^{-1})\).
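As a check on the quality of the approximation, the following sketch compares the exact posterior density, normalised with the numerically evaluated \(k\) from part (a), against the normal approximation at a few points (assuming scipy):

```python
import numpy as np
from scipy import integrate, stats

kernel = lambda lam: lam ** 10 * np.exp(-20 * lam ** 2 + 6 * lam)
k = 1 / integrate.quad(kernel, 0, np.inf)[0]      # normalising constant, as in (a)

approx = stats.norm(loc=0.5806, scale=np.sqrt(1 / 69.6651))
for lam in (0.4, 0.5806, 0.8):
    print(lam, k * kernel(lam), approx.pdf(lam))  # exact versus approximate density
```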

  1. Let \(Z_{i}\), \(i = 1, \ldots, N\) be a sequence of independent and identically distributed standard Normal random variables. Assuming the normalising constant \(k\) is known, explain carefully how the \(Z_{i}\) may be used to obtain estimates of the mean of \(\lambda \, | \, x\).

We shall use importance sampling. If we wish to estimate some \(E\{g(\lambda) \, | \, X\}\) with posterior density \(f(\lambda \, | \, x)\) and can generate independent samples \(\lambda_{1}, \ldots, \lambda_{N}\) from some \(q(\lambda)\), an approximation of \(f(\lambda \, | \, x)\), then \[\begin{eqnarray*} \hat{I} & = & \frac{1}{N} \sum_{i=1}^{N} \frac{g(\lambda_{i})f(\lambda_{i} \, | \, x)}{q(\lambda_{i})} \end{eqnarray*}\] is an unbiased estimate of \(E\{g(\lambda) \, | \, X\}\), provided that \(q(\lambda) > 0\) whenever \(g(\lambda)f(\lambda \, | \, x) \neq 0\).

As \(Z_{i} \sim N(0, 1)\) then \(\lambda_{i} = 69.6651^{-\frac{1}{2}}Z_{i} + 0.5806 \sim N(0.5806, 69.6651^{-1})\) so that we can generate an independent and identically distributed sample from the \(N(0.5806, 69.6651^{-1})\) which is an approximation to the posterior of \(\lambda\). Letting \(g(\lambda) = \lambda\), \[\begin{eqnarray*} f(\lambda \, | \, x) & = & \left\{\begin{array}{ll} k \lambda^{10} \exp\{-20\lambda^{2} + 6\lambda\} & \lambda > 0 \\ 0 & \mbox{otherwise}, \end{array} \right. \\ q(\lambda) & = & \frac{\sqrt{69.6651}}{\sqrt{2 \pi}} \exp\left\{-\frac{69.6651}{2}(\lambda - 0.5806)^{2}\right\} \end{eqnarray*}\] then \[\begin{eqnarray*} \hat{I} & = & \frac{1}{N} \sum_{i=1}^{N} \frac{k\lambda_{i}^{11}\exp\{-20\lambda_{i}^{2} + 6\lambda_{i}\}\mathbb{I}_{\{\lambda_{i} > 0\}}}{\frac{\sqrt{69.6651}}{\sqrt{2 \pi}} \exp\left\{-\frac{69.6651}{2}(\lambda_{i} - 0.5806)^{2}\right\}} \end{eqnarray*}\] is an unbiased estimate of the posterior mean of \(\lambda\) with \(\mathbb{I}_{\{\lambda_{i} > 0\}}\) denoting the indicator function for the event \(\{\lambda_{i} > 0\}\).
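A sketch of the whole procedure in Python, assuming numpy and scipy are available; since the estimator above treats \(k\) as known, it is evaluated here by quadrature:

```python
import numpy as np
from scipy import integrate

rng = np.random.default_rng(1)
N = 100_000

# Posterior kernel; the question treats k as known, so here it is found by quadrature
kernel = lambda lam: lam ** 10 * np.exp(-20 * lam ** 2 + 6 * lam)
k = 1 / integrate.quad(kernel, 0, np.inf)[0]

mode, info = 0.5806, 69.6651                 # normal approximation from part (b)
z = rng.standard_normal(N)                   # the Z_i
lam = z / np.sqrt(info) + mode               # lambda_i = info^{-1/2} Z_i + mode

q = np.sqrt(info / (2 * np.pi)) * np.exp(-0.5 * info * (lam - mode) ** 2)
f = np.where(lam > 0, k * kernel(lam), 0.0)  # indicator enforces lambda_i > 0
print(np.mean(lam * f / q))                  # estimate of the posterior mean of lambda
```

For large \(N\) the estimate settles near the posterior mean of \(\lambda\), which lies close to the mode \(0.5806\) since the posterior is approximately normal.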