Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\theta = (\mu, \sigma^{2})\). Suppose that \(X_{i} \, | \, \theta \sim N(\mu, \sigma^{2})\). It is judged that the improper joint prior distribution \(f(\mu, \sigma^{2}) \propto 1/\sigma^{2}\) is appropriate.
Show that the likelihood \(f(x
\, | \, \mu, \sigma^{2})\), where \(x =
(x_{1}, \ldots, x_{n})\), can be expressed as \[\begin{eqnarray*}
f(x \, | \, \mu, \sigma^{2}) & = & \left(2\pi
\sigma^{2}\right)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\sigma^{2}}\left[(n-1)s^{2}
+ n(\overline{x} - \mu)^{2}\right]\right\},
\end{eqnarray*}\] where \(\overline{x} = \frac{1}{n} \sum_{i=1}^{n}
x_{i}\) and \(s^{2} = \frac{1}{n-1}
\sum_{i=1}^{n} (x_{i} - \overline{x})^{2}\) are respectively the
sample mean and variance. Hence, explain why \(\overline{X}\) and \(S^{2}\) are sufficient for \(X = (X_{1}, \ldots, X_{n})\) for learning
about \(\theta\).
\[\begin{eqnarray*}
f(x \, | \, \mu, \sigma^{2}) & = & \prod_{i=1}^{n}
\frac{1}{\sqrt{2 \pi}\sigma} \exp\left\{-\frac{1}{2\sigma^{2}}(x_{i} -
\mu)^{2}\right\} \\
& = &
\left(2\pi\sigma^{2}\right)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}
(x_{i} - \mu)^{2}\right\} \\
& = &
\left(2\pi\sigma^{2}\right)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}
\left((x_{i} - \overline{x}) + (\overline{x} - \mu)\right)^{2}\right\}
\\
& = & \left(2\pi
\sigma^{2}\right)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\sigma^{2}}\left[\sum_{i=1}^{n}(x_{i} - \overline{x})^{2}
+ 2(\overline{x} - \mu)\sum_{i=1}^{n}(x_{i} - \overline{x}) + n(\overline{x} - \mu)^{2}\right]\right\} \\
& = & \left(2\pi
\sigma^{2}\right)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\sigma^{2}}\left[(n-1)s^{2}
+ n(\overline{x} - \mu)^{2}\right]\right\},
\end{eqnarray*}\] since \(\sum_{i=1}^{n}(x_{i} - \overline{x}) = 0\) so that the cross term vanishes. Recall that a statistic \(t(X)\) is said to be sufficient for \(X\) for learning about \(\theta\) if we can write \[\begin{eqnarray*}
f(x \, | \, \theta) & = & g(t, \theta)h(x)
\end{eqnarray*}\] where \(g(t,
\theta)\) depends upon \(t(x)\)
and \(\theta\) and \(h(x)\) does not depend upon \(\theta\) but may depend upon \(x\). In this case we have that \[\begin{eqnarray*}
f(x \, | \, \mu, \sigma^{2}) & = & g(\overline{x}, s^{2}, \mu,
\sigma^{2})\left(2\pi \right)^{-\frac{n}{2}}
\end{eqnarray*}\] so that \(\overline{X}\) and \(S^{2}\) are sufficient.
Find, up to a constant of integration, the posterior
distribution of \(\theta\) given \(x\).
\[\begin{eqnarray*}
f(\mu, \sigma^{2} \, | \, x) & \propto & f(x \, | \, \mu,
\sigma^{2})f(\mu, \sigma^{2}) \\
& \propto & \left(2\pi
\sigma^{2}\right)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\sigma^{2}}\left[(n-1)s^{2}
+ n(\overline{x} - \mu)^{2}\right]\right\} \frac{1}{\sigma^{2}} \\
& \propto &
\left(\sigma^{2}\right)^{-\left(\frac{n}{2}+1\right)}\exp\left\{-\frac{1}{2\sigma^{2}}\left[(n-1)s^{2}
+ n(\overline{x} - \mu)^{2}\right]\right\}.
\end{eqnarray*}\]
Show that \(\mu \, | \,
\sigma^{2}, x \sim N(\overline{x}, \sigma^{2}/n)\). Hence,
explain why, in this case, the chosen prior distribution for \(\theta\) is noninformative.
\[\begin{eqnarray*}
f(\mu \, | \, \sigma^{2}, x) & = & \frac{f(\mu, \sigma^{2} \, |
\, x)}{f(\sigma^{2} \, | \, x)} \\
& \propto & f(\mu, \sigma^{2} \, | \, x) \ \ \ \ \mbox{(with
respect to $\mu$)} \\
& \propto & \exp\left\{-\frac{n}{2\sigma^{2}}(\overline{x} -
\mu)^{2}\right\}.
\end{eqnarray*}\] We recognise this as a kernel of \(N(\overline{x}, \sigma^{2}/n)\) so \(\mu \, | \, \sigma^{2}, x \sim N(\overline{x},
\sigma^{2}/n)\). In the classical model for \(\mu\) when \(\sigma^{2}\) is known, the mle is \(\overline{x}\) and its variance is
\(\sigma^{2}/n\), so the standard error is \(\sigma/\sqrt{n}\). The posterior distribution is
driven entirely by the data (given \(\sigma^{2}\)), reflecting the noninformative
prior in this case. Notice that a symmetric \(100(1-\alpha)\%\) credible interval for
\(\mu \, | \, \sigma^{2}, x\) is \[\begin{eqnarray*}
\left(\overline{x} - z_{\left(1 -
\frac{\alpha}{2}\right)}\frac{\sigma}{\sqrt{n}}, \overline{x} +
z_{\left(1 - \frac{\alpha}{2}\right)}\frac{\sigma}{\sqrt{n}}\right)
\end{eqnarray*}\] which agrees with the \(100(1-\alpha)\%\) confidence interval for
\(\mu\) when \(\sigma^{2}\) is known and \(X_{1}, \ldots, X_{n}\) are iid \(N(\mu, \sigma^{2})\).
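As a quick numerical illustration of the agreement between the credible and confidence intervals, the interval can be computed with Python's standard library; the summary values \(n\), \(\overline{x}\) and \(\sigma\) below are hypothetical, not taken from the text.

```python
from statistics import NormalDist

# Hypothetical summary values for illustration (not from the text).
n, xbar, sigma = 25, 10.0, 2.0
alpha = 0.05

z = NormalDist().inv_cdf(1 - alpha / 2)   # upper 1 - alpha/2 point of N(0, 1)
half_width = z * sigma / n ** 0.5
lower, upper = xbar - half_width, xbar + half_width
```

For these values the 95\% interval is roughly \((9.22, 10.78)\); the classical \(z\)-interval uses exactly the same quantile, so the two coincide.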
By integrating \(f(\mu,
\sigma^{2} \, | \, x)\) over \(\sigma^{2}\), show that \[\begin{eqnarray*}
f(\mu \, | \, x) & \propto & \left[1 +
\frac{1}{n-1}\left(\frac{\overline{x} -
\mu}{s/\sqrt{n}}\right)^{2}\right]^{-\frac{n}{2}}.
\end{eqnarray*}\] Thus, explain why \(\mu \, | \, x \sim t_{n-1}(\overline{x},
s^{2}/n)\), the \(t\)-distribution with \(n-1\) degrees of freedom, location
parameter \(\overline{x}\) and squared
scale parameter \(s^{2}/n\). How does
this result relate to the classical problem of making inferences about
\(\mu\) when \(\sigma^{2}\) is also unknown?
\[\begin{eqnarray}
f(\mu \, | \, x) & = & \int_{-\infty}^{\infty} f(\mu, \sigma^{2}
\, | \, x) \, d\sigma^{2} \nonumber \\
& \propto & \int_{0}^{\infty}
\left(\sigma^{2}\right)^{-\left(\frac{n}{2}+1\right)}\exp\left\{-\frac{1}{2\sigma^{2}}\left[(n-1)s^{2}
+ n(\overline{x} - \mu)^{2}\right]\right\} \, d\sigma^{2} \tag{1}
\end{eqnarray}\] as \(f(\mu, \sigma^{2}
\, | \, x) = 0\) for \(\sigma^{2} <
0\). We recognise the integrand in equation (1) as a kernel of
\(\mbox{Inv-Gamma}\left(\frac{n}{2},
\frac{1}{2}\left[(n-1)s^{2} + n(\overline{x} -
\mu)^{2}\right]\right)\) so that \[\begin{eqnarray*}
f(\mu \, | \, x) & \propto &
\Gamma(n/2)\left(\frac{1}{2}\left[(n-1)s^{2} + n(\overline{x} -
\mu)^{2}\right]\right)^{-\frac{n}{2}}\\
& \propto & \left(\frac{1}{2}\left[(n-1)s^{2} + n(\overline{x} -
\mu)^{2}\right]\right)^{-\frac{n}{2}}\\
& \propto & \left[(n-1)s^{2} + n(\overline{x} -
\mu)^{2}\right]^{-\frac{n}{2}}\\
& \propto & \left[1 + \frac{1}{n-1}\left(\frac{\overline{x} -
\mu}{s/\sqrt{n}}\right)^{2}\right]^{-\frac{n}{2}}.
\end{eqnarray*}\] We recognise this as a kernel of \(t_{n-1}(\overline{x}, s^{2}/n)\) so that
\(\mu \, | \, x \sim t_{n-1}(\overline{x},
s^{2}/n)\). This gives a further insight into how the prior
distribution is noninformative. Inference about \(\mu\) will mirror the classical approach
when \(\mu\) and \(\sigma^{2}\) are unknown and \(\frac{\overline{X} - \mu}{S/\sqrt{n}} \sim
t_{n-1}\) where \(t_{n-1} = t_{n-1}(0,
1)\). For example, a symmetric \(100(1-\alpha)\%\) credible interval for
\(\mu \, | \, x\) is \[\begin{eqnarray*}
\left(\overline{x} - t_{n-1, \frac{\alpha}{2}}\frac{s}{\sqrt{n}},
\overline{x} + t_{n-1, \frac{\alpha}{2}}\frac{s}{\sqrt{n}}\right)
\end{eqnarray*}\] which agrees with the \(100(1-\alpha)\%\) confidence interval for
\(\mu\) when \(\sigma^{2}\) is unknown and \(X_{1}, \ldots, X_{n}\) are iid \(N(\mu, \sigma^{2})\).
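As a sketch of why the final kernel really is a \(t_{n-1}(\overline{x}, s^{2}/n)\) density, we can check numerically that it integrates to the known normalising constant of that density; the summary values below are hypothetical, not from the text.

```python
import math

# Hypothetical summary values (not from the text).
n, xbar, s = 12, 5.0, 1.5
nu = n - 1                      # degrees of freedom
scale = s / math.sqrt(n)        # scale parameter of the t kernel

def kernel(mu):
    # The unnormalised posterior density of mu | x derived above.
    return (1 + ((xbar - mu) / scale) ** 2 / nu) ** (-n / 2)

# Trapezoid-rule integral of the kernel over a wide grid.
lo, hi, m = xbar - 60 * scale, xbar + 60 * scale, 120001
h = (hi - lo) / (m - 1)
vals = [kernel(lo + i * h) for i in range(m)]
numeric = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

# Closed-form normalising constant of a t_nu(xbar, scale^2) density.
closed = scale * math.sqrt(nu * math.pi) * math.gamma(nu / 2) / math.gamma((nu + 1) / 2)
```

The quadrature value and the closed-form constant agree closely, confirming the exponent \(-\frac{n}{2} = -\frac{\nu+1}{2}\) with \(\nu = n - 1\).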
Let \(X_{1}, \ldots, X_{n}\) be exchangeable with \(X_{i} \, | \, \theta \sim \mbox{Bern}(\theta)\).
Using the improper prior distribution \(f(\theta) \propto
\theta^{-1}(1-\theta)^{-1}\) find the posterior distribution of
\(\theta \, | \, x\) where \(x = (x_{1}, \ldots, x_{n})\). Find a normal
approximation about the mode to this distribution.
The
likelihood is \[\begin{eqnarray*}
f(x \, | \, \theta) \ = \ \prod_{i=1}^{n}
\theta^{x_{i}}(1-\theta)^{1-x_{i}} \ = \
\theta^{n\bar{x}}(1-\theta)^{n-n\bar{x}},
\end{eqnarray*}\] where \(x = (x_{1},
\ldots, x_{n})\). With the given prior the posterior is \[\begin{eqnarray*}
f(\theta \, | \, x) \ \propto \ f(x \, | \, \theta)f(\theta) \ \propto \
\theta^{n\bar{x}-1}(1-\theta)^{n-n\bar{x}-1}
\end{eqnarray*}\] which we recognise as a kernel of a \(\mbox{Beta}(n\bar{x}, n-n\bar{x})\)
density. Thus, \(\theta \, | \, x \sim
\mbox{Beta}(n\bar{x}, n-n\bar{x})\). The mode of a \(\mbox{Beta}(\alpha, \beta)\) distribution
is \(\frac{\alpha-1}{\alpha + \beta
-2}\) so for \(\theta \, | \,
x\) the mode is \[\begin{eqnarray*}
\tilde{\theta} & = & \frac{n\bar{x} - 1}{n-2}.
\end{eqnarray*}\] The observed information is \[\begin{eqnarray*}
I(\theta \, | \, x) & = & -\frac{\partial^{2}}{\partial
\theta^{2}}\log f(\theta \, | \, x) \\
& = & -\frac{\partial^{2}}{\partial \theta^{2}}\left\{\log
\frac{\Gamma(n)}{\Gamma(n\bar{x})\Gamma(n -n\bar{x})} + (n\bar{x}-1)
\log \theta + (n-n\bar{x}-1)\log (1-\theta)\right\} \\
& = & \frac{n\bar{x}-1}{\theta^{2}} +
\frac{n-n\bar{x}-1}{(1-\theta)^{2}}.
\end{eqnarray*}\] So, evaluating the observed information at the
mode, \[\begin{eqnarray*}
I(\tilde{\theta} \, | \, x) & = &
\frac{n\bar{x}-1}{\tilde{\theta}^{2}} +
\frac{n-n\bar{x}-1}{(1-\tilde{\theta})^{2}}.
\end{eqnarray*}\] Noting that \(1-\tilde{\theta} =
\frac{n-n\bar{x}-1}{n-2}\) we have that \[\begin{eqnarray*}
I(\tilde{\theta} \, | \, x) & = & \frac{(n-2)^{2}}{n\bar{x}-1} +
\frac{(n-2)^{2}}{n-n\bar{x}-1} \\
& = & \frac{(n-2)^{3}}{(n\bar{x} -1)(n -n\bar{x} -1)}.
\end{eqnarray*}\] So, approximately, \(\theta \, | \, x \sim N(\tilde{\theta},
I^{-1}(\tilde{\theta} \, | \, x))\), that is, approximately,
\[\begin{eqnarray*}
\theta \, | \, x & \sim & N\left(\frac{n\bar{x} - 1}{n-2},
\frac{(n\bar{x} -1)(n -n\bar{x} -1)}{(n-2)^{3}}\right).
\end{eqnarray*}\]
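A quick numerical sanity check of the mode and of the simplified closed form of the observed information, using hypothetical counts (not from the text):

```python
# Hypothetical data summary (not from the text): n trials with sample mean xbar.
n, xbar = 50, 0.3
a, b = n * xbar, n - n * xbar    # Beta(n*xbar, n - n*xbar) posterior parameters

mode = (a - 1) / (n - 2)         # posterior mode derived above

# Observed information at the mode: direct evaluation vs the closed form.
info_direct = (a - 1) / mode ** 2 + (b - 1) / (1 - mode) ** 2
info_closed = (n - 2) ** 3 / ((a - 1) * (b - 1))
```

The two evaluations of the information agree to floating-point precision, confirming the algebraic simplification.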
Show that the prior distribution \(f(\theta)\) is equivalent to a uniform
prior on \[\begin{eqnarray*}
\beta & = & \log \left(\frac{\theta}{1-\theta}\right)
\end{eqnarray*}\] and find the posterior distribution of
\(\beta \, | \, x\). Find a normal
approximation about the mode to this distribution.
We
have \(\beta = g(\theta)\). We invert
to find \(\theta = g^{-1}(\beta)\). We
find\[\begin{eqnarray*}
\theta & = & \frac{e^{\beta}}{1 + e^{\beta}}.
\end{eqnarray*}\] The prior \(f_{\beta}(\beta)\) for \(\beta\) is given by \[\begin{eqnarray*}
f_{\beta}(\beta) & = & \left|\frac{\partial \theta}{\partial
\beta} \right| f_{\theta}(\theta) \\
& = & \left|\frac{e^{\beta}}{(1 + e^{\beta})^{2}}\right|
\left(\frac{e^{\beta}}{1 + e^{\beta}}\right)^{-1}\left(1 -
\frac{e^{\beta}}{1 + e^{\beta}}\right)^{-1} \\
& = & \frac{e^{\beta}}{(1 + e^{\beta})^{2}} \times \frac{1 +
e^{\beta}}{e^{\beta}} \times (1 + e^{\beta}) \ = \ 1,
\end{eqnarray*}\] which is equivalent to the (improper) uniform
on \(\beta\). The posterior is \[\begin{eqnarray*}
f(\beta \, | \, x) & \propto & f(x \, | \, \beta) f(\beta) \\
& \propto & \theta^{n\bar{x}}(1-\theta)^{n-n\bar{x}} \\
& = & \left(\frac{e^{\beta}}{1 +
e^{\beta}}\right)^{n\bar{x}}\left(1-\frac{e^{\beta}}{1 +
e^{\beta}}\right)^{n-n\bar{x}} \ = \ \frac{e^{\beta n
\bar{x}}}{(1+e^{\beta})^{n}}.
\end{eqnarray*}\] Hence, \(f(\beta \, |
\, x) = \frac{ce^{\beta n \bar{x}}}{(1+e^{\beta})^{n}}\) where
\(c\) is the constant of integration.
For the normal approximation about the mode, we first need to find the
mode of \(\beta \, | \, x\). The mode
is the maximum of \(f(\beta \, | \,
x)\) which is, equivalently, the maximum of \(\log f(\beta \, | \, x)\). We have \[\begin{eqnarray*}
\log f(\beta \, | \, x) & = & \log c + \beta n \bar{x} -n \log
(1+e^{\beta}) \\
\Rightarrow \frac{\partial}{\partial \beta} \log f(\beta \, | \, x)
& = & n \bar{x} - \frac{ne^{\beta}}{1 + e^{\beta}}.
\end{eqnarray*}\] The mode \(\tilde{\beta}\) satisfies \[\begin{eqnarray*}
n \bar{x} - \frac{ne^{\tilde{\beta}}}{1 + e^{\tilde{\beta}}} & =
& 0 \\
\Rightarrow e^{\tilde{\beta}} & = & \frac{n\bar{x}}{n -n
\bar{x}} \\
\Rightarrow \tilde{\beta} & = & \log
\left(\frac{\bar{x}}{1-\bar{x}}\right).
\end{eqnarray*}\] The observed information is \[\begin{eqnarray*}
I(\beta \, | \, x) & = & - \frac{\partial^{2}}{\partial
\beta^{2}} \log f(\beta \, | \, x) \\
& = & \frac{ne^{\beta}}{(1 + e^{\beta})^{2}}.
\end{eqnarray*}\] Noting that \(1 +
e^{\tilde{\beta}} = \frac{1}{1-\bar{x}}\) we have \[\begin{eqnarray*}
I(\tilde{\beta} \, | \, x) & = & n \times
\frac{\bar{x}}{1-\bar{x}} \times (1-\bar{x})^{2} \ = \
n\bar{x}(1-\bar{x}).
\end{eqnarray*}\] Hence, approximately, \[\begin{eqnarray*}
\beta \, | \, x & \sim & N\left(\log \frac{\bar{x}}{1-\bar{x}},
\frac{1}{n\bar{x}(1-\bar{x})}\right).
\end{eqnarray*}\]
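The mode and the information for \(\beta\) can be checked numerically by finite differences on \(\log f(\beta \, | \, x)\); the data summary below is hypothetical, not from the text.

```python
import math

# Hypothetical data summary (not from the text).
n, xbar = 40, 0.35

beta_mode = math.log(xbar / (1 - xbar))   # mode derived above

def log_post(beta):
    # log f(beta | x) up to an additive constant
    return beta * n * xbar - n * math.log(1 + math.exp(beta))

# Central-difference checks: the score vanishes at the mode and the
# curvature matches the closed-form information n*xbar*(1 - xbar).
h = 1e-4
score = (log_post(beta_mode + h) - log_post(beta_mode - h)) / (2 * h)
info = -(log_post(beta_mode + h) - 2 * log_post(beta_mode)
         + log_post(beta_mode - h)) / h ** 2
```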
For which parameterisation does it make more sense to use
a normal approximation?
Whilst we can find a normal
approximation about the mode either on the scale of \(\theta\) or of \(\beta\), it makes more sense for \(\beta\). We have \(0 < \theta < 1\) and \(-\infty < \beta < \infty\) so only
\(\beta\) has a sample space which
agrees with the normal distribution.
In viewing a section through the pancreas, doctors see what are called “islands”. Suppose that \(X_{i}\) denotes the number of islands observed in the \(i\)th patient, \(i =1, \ldots, n\), and we judge that \(X_{1}, \ldots, X_{n}\) are exchangeable with \(X_{i} \, | \, \theta \sim \mbox{Po}(\theta)\). A doctor believes that for healthy patients \(\theta\) will be on average around 2; he thinks it is unlikely that \(\theta\) is greater than 3. The number of islands seen in 100 patients are summarised in the following table. \[\begin{eqnarray*} & \begin{array}{|l|rrrrrrr|} \hline \mbox{Number of islands} & 0 & 1 & 2 & 3 & 4 & 5 & \geq 6 \\ \hline \mbox{Frequency} & 20 & 30 & 28 & 14 & 7 & 1 & 0 \\ \hline \end{array} \end{eqnarray*}\]
Express the doctor’s prior beliefs as a normal
distribution for \(\theta\). You may
interpret the term “unlikely” as meaning “with probability
0.01”.
The doctor thus asserts \(\theta \sim N(\mu, \sigma^{2})\) with \(E(\theta) = 2\) and \(P(\theta > 3) = 0.01\). Note that, as
\(\theta\) is continuous, this is
equivalent to \(P(\theta \geq 3) =
0.01\). We use these two pieces of information to obtain \(\mu\) and \(\sigma^{2}\). Firstly, \(E(\theta) = 2 \Rightarrow \mu = 2\).
Secondly, \[\begin{eqnarray*}
P(\theta > 3) \ = \ 0.01 \ \Rightarrow \ P\left(\frac{\theta -
2}{\sigma} > \frac{1}{\sigma}\right) \ = \ 0.01.
\end{eqnarray*}\] As \(\frac{\theta -
2}{\sigma} \sim N(0, 1)\) we have that \[\begin{eqnarray*}
\frac{1}{\sigma} \ = \ 2.33 \ \Rightarrow \ \sigma^{2} \ = \ 5.4289^{-1}
\ = \ 0.1842.
\end{eqnarray*}\] Hence, \(\theta \sim
N(2, 5.4289^{-1})\).
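A sketch of the elicitation with Python's standard library; note that the exact 99th percentile of \(N(0,1)\) is 2.3263 rather than the tabulated 2.33 used above, so \(\sigma^{2}\) comes out as 0.1848 rather than 0.1842.

```python
from statistics import NormalDist

z99 = NormalDist().inv_cdf(0.99)          # ~2.3263; the text rounds to 2.33
sigma = 1 / z99                           # from P(theta > 3) = 0.01 with mu = 2
prior = NormalDist(mu=2, sigma=sigma)
tail = 1 - prior.cdf(3)                   # recovers the asserted 0.01
```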
Find, up to a constant of proportionality, the posterior
distribution \(\theta \, | \, x\) where
\(x = (x_{1}, \ldots,
x_{100})\).
As \(X_{i} \, | \, \theta \sim
\mbox{Po}(\theta)\), the likelihood is \[\begin{eqnarray*}
f(x \, | \, \theta) & = & \prod_{i=1}^{n}
\frac{\theta^{x_{i}}e^{-\theta}}{x_{i}!} \ = \
\frac{\theta^{n\bar{x}}e^{-n\theta}}{\prod_{i=1}^{n} x_{i}!}.
\end{eqnarray*}\] As \(\theta \sim N(2,
5.4289^{-1})\), the prior is \[\begin{eqnarray*}
f(\theta) & = & \frac{2.33}{\sqrt{2 \pi}} \exp \left\{ -
\frac{5.4289}{2}(\theta - 2)^{2} \right\}.
\end{eqnarray*}\] The posterior is thus \[\begin{eqnarray*}
f(\theta \, | \, x) & \propto & f(x \, | \, \theta)f(\theta) \\
& \propto & \theta^{n\bar{x}}e^{-n\theta} \times \exp \left\{ -
\frac{5.4289}{2}(\theta^{2} - 4 \theta)\right\} \\
& = & \theta^{n\bar{x}} \exp\left\{ -
\frac{5.4289}{2}(\theta^{2} - 4 \theta) - n\theta\right\}.
\end{eqnarray*}\] For the explicit data we have \(n = 100\) and \[\begin{eqnarray*}
\sum_{i=1}^{100} x_{i} = (0 \times 20)+(1 \times 30) + (2 \times 28) +
(3 \times 14) + (4 \times 7) + (5 \times 1) = 161.
\end{eqnarray*}\] The posterior is thus \[\begin{eqnarray*}
f(\theta \, | \, x) & = & c\theta^{161} \exp\left\{-
\frac{5.4289}{2}(\theta^{2} - 4 \theta) - 100\theta\right\}
\end{eqnarray*}\] where \(c\) is
the constant of proportionality.
Find a normal approximation to the posterior about the
mode. Thus, estimate the posterior probability that the average number
of islands is greater than 2.
We find the mode of
\(\theta \, | \, x\) by maximising
\(f(\theta \, | \, x)\) or,
equivalently, \(\log f(\theta \, | \,
x)\). We have \[\begin{eqnarray*}
\frac{\partial}{\partial \theta} \log f(\theta \, | \, x) & = &
\frac{\partial}{\partial \theta} \left\{ \log c + 161 \log \theta -
\frac{5.4289}{2}(\theta^{2} - 4 \theta) - 100\theta\right\} \\
& = & \frac{161}{\theta} - 5.4289\theta + 2(5.4289) - 100.
\end{eqnarray*}\] So, the mode \(\tilde{\theta}\) satisfies \[\begin{eqnarray*}
5.4289\tilde{\theta}^{2} + \{100-2(5.4289)\}\tilde{\theta} - 161 & =
& 0 \ \Rightarrow \\ 5.4289\tilde{\theta}^{2} +
89.1422\tilde{\theta} - 161 & = & 0.
\end{eqnarray*}\] Hence, as \(\tilde{\theta} > 0\), \[\begin{eqnarray*}
\tilde{\theta} \ = \ \frac{-89.1422 + \sqrt{89.1422^{2} +
4(5.4289)(161)}}{2(5.4289)} \ = \ 1.6419.
\end{eqnarray*}\] The observed information is \[\begin{eqnarray*}
I(\theta \, | \, x) & = & - \frac{\partial^{2}}{\partial
\theta^{2}} \log f(\theta \, | \, x) \ = \ \frac{161}{\theta^{2}} +
5.4289
\end{eqnarray*}\] so that \[\begin{eqnarray*}
I(\tilde{\theta} \, | \, x) & = & \frac{161}{1.6419^{2}} +
5.4289 \ = \ 65.1506.
\end{eqnarray*}\] So, approximately, \(\theta \, | \, x \sim N(1.6419,
65.1506^{-1})\). Thus, \[\begin{eqnarray*}
P(\theta > 2 \, | \, x) & = & P\{Z > \sqrt{65.1506}(2 -
1.6419)\} \ = \ 0.0019.
\end{eqnarray*}\]
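The numerical steps above can be reproduced in Python (retaining the text's rounded prior precision 5.4289), as a check on the mode, the observed information and the tail probability:

```python
import math
from statistics import NormalDist

prec = 5.4289                 # prior precision 1/sigma^2, as rounded in the text
n, sum_x = 100, 161

# Mode: positive root of prec*t^2 + (n - 2*prec)*t - sum_x = 0.
a, b, c = prec, n - 2 * prec, -sum_x
mode = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

info = sum_x / mode ** 2 + prec          # observed information at the mode
approx = NormalDist(mu=mode, sigma=info ** -0.5)
p_gt_2 = 1 - approx.cdf(2)               # P(theta > 2 | x) under the approximation
```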
Why might you prefer to express the doctor’s prior
beliefs as a normal distribution on some other parameterisation \(\phi = g(\theta)\)? Suggest an appropriate
choice of \(g(\cdot)\) in this case.
Now express the doctor’s beliefs using a normal prior for \(\phi\): for healthy patients \(\phi\) will be on average around \(g(2)\), and it is “unlikely” that \(\phi\) is greater than \(g(3)\). Give an expression for the density
of \(\phi \, | \, x\) up to a constant
of proportionality.
By definition \(\theta > 0\) but both the specified
normal prior distribution and the normal approximation for the posterior
\(\theta \, | \, x\) have a sample
space of \((-\infty, \infty)\) so we
might want to use some other parametrisation which has the same sample
space as the normal distribution. An obvious choice is to use \(\phi = g(\theta) = \log \theta\) as if
\(\theta > 0\) then \(-\infty < \log \theta < \infty\). We
assert \(\phi \sim N(\mu_{0},
\sigma_{0}^{2})\) with \(E(\phi) = \log
2\) and \(P(\phi > \log 3) =
0.01\). So, \(E(\phi) = \log 2
\Rightarrow \mu_{0} = \log 2\) and \[\begin{eqnarray*}
P(\phi > \log 3) \ = \ 0.01 & \Rightarrow & P\left(\frac{\phi -
\log 2}{\sigma_{0}} > \frac{\log 3 - \log 2}{\sigma_{0}}\right) \ = \
0.01.
\end{eqnarray*}\]
As \(\frac{\phi - \log 2}{\sigma_{0}} \sim
N(0, 1)\) we have that \[\begin{eqnarray*}
\frac{1}{\sigma_{0}} \ = \ \frac{2.33}{\log \frac{3}{2}} &
\Rightarrow & \sigma_{0}^{2} \ = \ 0.0303.
\end{eqnarray*}\] As \(\phi = \log
\theta\) then \(\theta =
e^{\phi}\). The likelihood is thus, using (b), \[\begin{eqnarray*}
f(x \, | \, \phi) & \propto & (e^{\phi})^{n\bar{x}}
e^{-ne^{\phi}} \ = \ e^{161\phi } e^{-100e^{\phi}}
\end{eqnarray*}\] for the given data. The posterior is thus \[\begin{eqnarray*}
f(\phi \, | \, x) & \propto & e^{161\phi } e^{-100e^{\phi}}
\times \exp\left\{-\frac{1}{0.0606}\left(\phi^{2} - (\log 4) \phi\right)\right\} \\
& = & \exp\left\{-100e^{\phi} -16.5017 \phi^{2} + 183.8761
\phi\right\}.
\end{eqnarray*}\]
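A quick check of the two coefficients in the exponent, using the text's rounded value \(\sigma_{0}^{2} = 0.0303\):

```python
import math

sigma0_sq = 0.0303            # rounded prior variance for phi, as in the text
n_xbar = 161

coef_quad = 1 / (2 * sigma0_sq)                      # multiplies -phi^2
coef_lin = n_xbar + math.log(4) / (2 * sigma0_sq)    # multiplies +phi
```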
Let \(X_{1}, \ldots, X_{10}\) be the lengths of time between arrivals at an ATM, and assume that the \(X_{i}\)s may be viewed as exchangeable with \(X_{i} \, | \, \lambda \sim \mbox{Exp}(\lambda)\) where \(\lambda\) is the rate at which people arrive at the machine in one-minute intervals. Suppose we observe \(\sum_{i=1}^{10} x_{i} = 4\). Suppose that the prior distribution for \(\lambda\) is given by \[\begin{eqnarray*} f(\lambda) & = & \left\{\begin{array}{ll} c\exp\{-20(\lambda-0.25)^{2}\} & \lambda \geq 0, \\ 0 & \mbox{otherwise} \end{array} \right. \end{eqnarray*}\] where \(c\) is a known constant.
Find, up to a constant \(k\) of proportionality, the posterior
distribution \(\lambda \, | \, x\)
where \(x = (x_{1}, \ldots, x_{10})\).
Find also an expression for \(k\) which
you need not evaluate.
The likelihood is \[\begin{eqnarray*}
f(x \, | \, \lambda) & = & \prod_{i=1}^{10} \lambda e^{-\lambda
x_{i}} \ = \ \lambda^{10}e^{-4\lambda}
\end{eqnarray*}\] with the given data. The posterior is thus
\[\begin{eqnarray*}
f(\lambda \, | \, x) & \propto & \lambda^{10}e^{-4\lambda}
\times \exp\{-20(\lambda - 0.25)^{2}\} \ = \ \lambda^{10}
\exp\{-20(\lambda^{2} - 0.5\lambda) - 4\lambda\}
\end{eqnarray*}\] so that, making the fact that \(\lambda > 0\) explicit, \[\begin{eqnarray*}
f(\lambda \, | \, x) & = & \left\{\begin{array}{ll} k
\lambda^{10} \exp\{-20\lambda^{2} + 6\lambda\} & \lambda > 0 \\ 0
& \mbox{otherwise}, \end{array}\right.
\end{eqnarray*}\] where \[\begin{eqnarray*}
k^{-1} & = & \int_{0}^{\infty} \lambda^{10} \exp\{-20\lambda^{2}
+ 6\lambda\} \, d\lambda.
\end{eqnarray*}\]
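Although \(k\) need not be evaluated here, the integral is straightforward to approximate by quadrature; the sketch below truncates at \(\lambda = 3\), a judgment that the kernel is negligible beyond that point.

```python
import math

def kernel(lam):
    # Unnormalised posterior density for lambda > 0.
    return lam ** 10 * math.exp(-20 * lam ** 2 + 6 * lam)

# Trapezoid rule on (0, 3]; the kernel is negligible beyond lambda ~ 3.
m = 300001
h = 3 / (m - 1)
vals = [kernel(i * h) for i in range(m)]
k_inv = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
k = 1 / k_inv
```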
Find a normal approximation to this posterior
distribution about the mode.
We find the mode of \(\lambda \, | \, x\) by maximising \(f(\lambda \, | \, x)\) or, equivalently,
\(\log f(\lambda \, | \, x)\). We have
\[\begin{eqnarray*}
\frac{\partial}{\partial \lambda} \log f(\lambda \, | \, x) & =
& \frac{\partial}{\partial \lambda} \left\{10 \log \lambda
-20\lambda^{2} + 6\lambda\right\} \\
& = & \frac{10}{\lambda} - 40\lambda + 6.
\end{eqnarray*}\] So, the mode \(\tilde{\lambda}\) satisfies \[\begin{eqnarray*}
40\tilde{\lambda}^{2} - 6\tilde{\lambda} - 10 \ = \ 0 & \Rightarrow
& 20\tilde{\lambda}^{2} - 3\tilde{\lambda} - 5 \ = \ 0.
\end{eqnarray*}\] Hence, as \(\tilde{\lambda} > 0\), \[\begin{eqnarray*}
\tilde{\lambda} & = & \frac{3 + \sqrt{9 + 4(20)(5)}}{2(20)} \ =
\ 0.5806.
\end{eqnarray*}\] The observed information is \[\begin{eqnarray*}
I(\lambda \, | \, x) & = & - \frac{\partial^{2}}{\partial
\lambda^{2}} \log f(\lambda \, | \, x) \ = \ \frac{10}{\lambda^{2}} + 40
\end{eqnarray*}\] so that \[\begin{eqnarray*}
I(\tilde{\lambda} \, | \, x) & = & \frac{10}{0.5806^{2}} + 40 \
= \ 69.6651.
\end{eqnarray*}\] So, approximately, \(\lambda \, | \, x \sim N(0.5806,
69.6651^{-1})\).
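A quick numerical check of the mode and the observed information:

```python
import math

# Positive root of 20t^2 - 3t - 5 = 0, as derived above.
lam_mode = (3 + math.sqrt(9 + 4 * 20 * 5)) / 40
info = 10 / lam_mode ** 2 + 40          # observed information at the mode
```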
Let \(Z_{i}\), \(i = 1, \ldots, N\) be a sequence of
independent and identically distributed standard Normal random
variables. Assuming the normalising constant \(k\) is known, explain carefully how the
\(Z_{i}\) may be used to obtain
estimates of the mean of \(\lambda \, | \,
x\).
We shall use importance sampling. If we
wish to estimate some \(E\{g(\lambda) \, | \,
X\}\) with posterior density \(f(\lambda \, | \, x)\) and can generate
independent samples \(\lambda_{1}, \ldots,
\lambda_{N}\) from some density \(q(\lambda)\) whose support contains that of \(f(\lambda \, | \, x)\), typically chosen to approximate \(f(\lambda \, | \, x)\), then \[\begin{eqnarray*}
\hat{I} & = & \frac{1}{N} \sum_{i=1}^{N}
\frac{g(\lambda_{i})f(\lambda_{i} \, | \, x)}{q(\lambda_{i})}
\end{eqnarray*}\] is an unbiased estimate of \(E\{g(\lambda) \, | \, X\}\).
As
\(Z_{i} \sim N(0, 1)\) then \(\lambda_{i} = 69.6651^{-\frac{1}{2}}Z_{i} + 0.5806
\sim N(0.5806, 69.6651^{-1})\) so that we can generate an
independent and identically distributed sample from the \(N(0.5806, 69.6651^{-1})\) which is an
approximation to the posterior of \(\lambda\). Letting \(g(\lambda) = \lambda\), \[\begin{eqnarray*}
f(\lambda \, | \, x) & = & \left\{\begin{array}{ll} k
\lambda^{10} \exp\{-20\lambda^{2} + 6\lambda\} & \lambda > 0 \\ 0
& \mbox{otherwise}, \end{array} \right. \\
q(\lambda) & = & \frac{\sqrt{69.6651}}{\sqrt{2 \pi}}
\exp\left\{-\frac{69.6651}{2}(\lambda - 0.5806)^{2}\right\}
\end{eqnarray*}\] then \[\begin{eqnarray*}
\hat{I} & = & \frac{1}{N} \sum_{i=1}^{N}
\frac{k\lambda_{i}^{11}\exp\{-20\lambda_{i}^{2} +
6\lambda_{i}\}\mathbb{I}_{\{\lambda_{i} >
0\}}}{\frac{\sqrt{69.6651}}{\sqrt{2 \pi}}
\exp\left\{-\frac{69.6651}{2}(\lambda_{i} - 0.5806)^{2}\right\}}
\end{eqnarray*}\] is an unbiased estimate of the posterior mean
of \(\lambda\) with \(\mathbb{I}_{\{\lambda_{i} > 0\}}\)
denoting the indicator function for the event \(\{\lambda_{i} > 0\}\).
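The estimator can be sketched in Python; this is a minimal illustration rather than a definitive implementation, and it evaluates \(k\) by quadrature (the text only assumes \(k\) is known). The proposal is the normal approximation \(N(0.5806, 69.6651^{-1})\) from the previous part, and the quadrature value of the mean is computed alongside for comparison.

```python
import math
import random
from statistics import NormalDist

random.seed(20)

MODE, INFO = 0.5806, 69.6651
SD = INFO ** -0.5
q = NormalDist(mu=MODE, sigma=SD)      # proposal: the normal approximation

def f_unnorm(lam):
    # Posterior kernel without k; the indicator of lam > 0 is built in.
    return lam ** 10 * math.exp(-20 * lam ** 2 + 6 * lam) if lam > 0 else 0.0

# Evaluate k by quadrature, since the estimator needs the normalised density.
m = 300001
h = 3 / (m - 1)
vals = [f_unnorm(i * h) for i in range(m)]
k = 1 / (h * (sum(vals) - 0.5 * (vals[0] + vals[-1])))

# Importance-sampling estimate of E(lambda | x), i.e. g(lambda) = lambda.
N = 100_000
total = 0.0
for _ in range(N):
    lam = random.gauss(MODE, SD)       # lam = INFO**(-1/2) * Z + MODE
    total += lam * k * f_unnorm(lam) / q.pdf(lam)
post_mean = total / N

# Deterministic quadrature value of the mean, for comparison.
quad_mean = k * h * sum((i * h) * v for i, v in enumerate(vals))
```

Because the proposal closely approximates the posterior, the importance weights are well-behaved and the Monte Carlo estimate agrees with the quadrature value to a few decimal places.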