Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\theta\). For each of the following distributions for \(X_{i} \, | \, \theta\) find the Jeffreys prior and the corresponding posterior distribution for \(\theta\).
\(X_{i} \, | \, \theta \sim
\mbox{Bern}(\theta)\).
The likelihood is \[\begin{eqnarray*}
f(x \, | \, \theta) & = & \prod_{i=1}^{n} \theta^{x_{i}}(1 -
\theta)^{1 - x_{i}} \\
& = & \theta^{n\bar{x}}(1-\theta)^{n-n\bar{x}}.
\end{eqnarray*}\] As \(\theta\)
is univariate \[\begin{eqnarray*}
\frac{\partial^{2}}{\partial \theta^{2}} \log f(x \, | \, \theta) &
= & \frac{\partial^{2}}{\partial \theta^{2}} \left\{n\bar{x} \log
\theta + (n-n\bar{x})\log (1-\theta)\right\} \\
& = & -\frac{n\bar{x}}{\theta^{2}} -
\frac{(n-n\bar{x})}{(1-\theta)^{2}}.
\end{eqnarray*}\] The Fisher information is thus \[\begin{eqnarray*}
I(\theta) & = & -E\left.\left\{-\frac{n\bar{X}}{\theta^{2}} -
\frac{(n-n\bar{X})}{(1-\theta)^{2}} \, \right| \, \theta \right\} \\
& = & \frac{nE(\bar{X} \, | \, \theta)}{\theta^{2}} +
\frac{\{n-nE(\bar{X} \, | \, \theta)\}}{(1-\theta)^{2}} \\
& = & \frac{n}{\theta} + \frac{n}{1-\theta} \ = \
\frac{n}{\theta(1-\theta)}.
\end{eqnarray*}\] The Jeffreys prior is then \[\begin{eqnarray*}
f(\theta) & \propto & \sqrt{\frac{n}{\theta(1-\theta)}} \
\propto \ \theta^{-\frac{1}{2}}(1-\theta)^{-\frac{1}{2}}.
\end{eqnarray*}\] This is the kernel of a \(\mbox{Beta}(\frac{1}{2}, \frac{1}{2})\)
distribution. This makes it straightforward to obtain the posterior
using the conjugacy of the Beta distribution relative to the Bernoulli
likelihood. The posterior is \(\mbox{Beta}(\frac{1}{2} + n\bar{x}, \frac{1}{2} +
n - n\bar{x})\). (If you are not sure why, see Solution Sheet
Four Question 1 (a) with \(\alpha = \beta =
\frac{1}{2}\).)
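As a quick numerical sanity check (a minimal sketch, assuming `numpy` and `scipy` are available; all values are illustrative), we can compare the grid-normalised product of the Jeffreys prior and the Bernoulli likelihood against the claimed Beta posterior:

```python
# Sketch: verify numerically that Jeffreys prior x Bernoulli likelihood
# matches the Beta(1/2 + n*xbar, 1/2 + n - n*xbar) posterior.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
theta_true, n = 0.3, 50            # illustrative values
x = rng.binomial(1, theta_true, size=n)
s = x.sum()                        # n * xbar, the number of successes

grid = np.linspace(1e-4, 1 - 1e-4, 2000)
unnorm = stats.beta.pdf(grid, 0.5, 0.5) * grid**s * (1 - grid)**(n - s)
numeric = unnorm / (unnorm.sum() * (grid[1] - grid[0]))   # normalise on the grid
exact = stats.beta.pdf(grid, 0.5 + s, 0.5 + n - s)        # conjugate posterior

print(np.max(np.abs(numeric - exact)))   # small, up to grid error
```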
\(X_{i} \, | \, \theta \sim
\mbox{Po}(\theta)\).
The likelihood is \[\begin{eqnarray*}
f(x \, | \, \theta) & = & \prod_{i=1}^{n}
\frac{\theta^{x_{i}}e^{-\theta}}{x_{i}!} \\
& = & \frac{\theta^{n\bar{x}}e^{-n\theta}}{\prod_{i=1}^{n}
x_{i}!}.
\end{eqnarray*}\] As \(\theta\)
is univariate \[\begin{eqnarray*}
\frac{\partial^{2}}{\partial \theta^{2}} \log f(x \, | \, \theta) &
= & \frac{\partial^{2}}{\partial \theta^{2}} \left(n\bar{x}\log
\theta - n \theta - \sum_{i=1}^{n} \log x_{i}! \right) \\
& = & -\frac{n\bar{x}}{\theta^{2}}.
\end{eqnarray*}\] The Fisher information is thus \[\begin{eqnarray*}
I(\theta) & = & - E\left.\left(-\frac{n\bar{X}}{\theta^{2}} \,
\right| \, \theta \right) \\
& = & \frac{nE(\bar{X} \, | \, \theta)}{\theta^{2}} \ = \
\frac{n}{\theta}.
\end{eqnarray*}\] The Jeffreys prior is then \[\begin{eqnarray*}
f(\theta) \ \propto \ \sqrt{\frac{n}{\theta}} \ \propto \
\theta^{-\frac{1}{2}}.
\end{eqnarray*}\] This is often expressed as the improper \(\mbox{Gamma}(\frac{1}{2}, 0)\)
distribution. This makes it straightforward to obtain the posterior
using the conjugacy of the Gamma distribution relative to the Poisson
likelihood. The posterior is \(\mbox{Gamma}(\frac{1}{2} + n\bar{x}, n)\).
(If you are not sure why, see Solution Sheet Four Question 4 (a) with
\(\alpha = \frac{1}{2}\), \(\beta = 0\).)
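The same calculation can be reproduced symbolically; a minimal sketch, assuming `sympy` is available, with the \(x_{i}!\) terms dropped since they vanish under differentiation:

```python
# Sketch: symbolic Fisher information for the Poisson model.
import sympy as sp

theta, n, xbar = sp.symbols('theta n xbar', positive=True)
loglik = n*xbar*sp.log(theta) - n*theta       # log-likelihood up to constants
second = sp.diff(loglik, theta, 2)            # -n*xbar/theta**2
info = -second.subs(xbar, theta)              # E(Xbar | theta) = theta
print(sp.simplify(info))                      # n/theta
```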
\(X_{i} \, | \, \theta \sim
\mbox{Maxwell}(\theta)\), the Maxwell distribution with parameter
\(\theta\) so that \[\begin{eqnarray*}
f(x_{i} \, | \, \theta) =
\left(\frac{2}{\pi}\right)^{\frac{1}{2}}\theta^{\frac{3}{2}}x_{i}^{2}\exp\left\{-\frac{\theta
x_{i}^{2}}{2}\right\}, \ \ x_{i} > 0
\end{eqnarray*}\] and \(E(X_{i}
\, | \, \theta) = 2\sqrt{\frac{2}{\pi \theta}}\), \(Var(X_{i} \, | \, \theta) = \frac{3\pi - 8}{\pi
\theta}\).
The likelihood is \[\begin{eqnarray*}
f(x \, | \, \theta) & = & \prod_{i=1}^{n}
\left(\frac{2}{\pi}\right)^{\frac{1}{2}}\theta^{\frac{3}{2}}x_{i}^{2}\exp\left\{-\frac{\theta
x_{i}^{2}}{2}\right\} \\
& = &
\left(\frac{2}{\pi}\right)^{\frac{n}{2}}\theta^{\frac{3n}{2}}\left(
\prod_{i=1}^{n} x_{i}^{2}\right)\exp\left\{-\frac{\theta}{2}
\sum_{i=1}^{n} x_{i}^{2}\right\}.
\end{eqnarray*}\] As \(\theta\)
is univariate \[\begin{eqnarray*}
\frac{\partial^{2}}{\partial \theta^{2}} \log f(x \, | \, \theta) &
= & \frac{\partial^{2}}{\partial \theta^{2}} \left(\frac{n}{2} \log
\frac{2}{\pi} + \frac{3n}{2} \log \theta + \sum_{i=1}^{n} \log x_{i}^{2}
- \frac{\theta}{2} \sum_{i=1}^{n} x_{i}^{2}\right) \\
& = & - \frac{3n}{2\theta^{2}}.
\end{eqnarray*}\] The Fisher information is \[\begin{eqnarray*}
I(\theta) \ = \ -E\left.\left(-\frac{3n}{2\theta^{2}} \, \right| \,
\theta \right) \ = \ \frac{3n}{2\theta^{2}}.
\end{eqnarray*}\] The Jeffreys prior is then \[\begin{eqnarray*}
f(\theta) & \propto & \sqrt{\frac{3n}{2\theta^{2}}} \ \propto \
\theta^{-1}.
\end{eqnarray*}\] This is often expressed as the improper \(\mbox{Gamma}(0, 0)\) distribution. This
makes it straightforward to obtain the posterior using the conjugacy of
the Gamma distribution relative to the Maxwell likelihood. The posterior
is \(\mbox{Gamma}(\frac{3n}{2},
\frac{1}{2}\sum_{i=1}^{n} x_{i}^{2})\). (If you are not sure why,
see Solution Sheet Four Question 1 (c) with \(\alpha = \beta = 0\).)
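A grid check along the same lines as before (a sketch, assuming `scipy`; note that `scipy.stats.maxwell` is parameterised by a scale \(a\) with \(\theta = a^{-2}\) in our notation):

```python
# Sketch: verify that the improper prior 1/theta combined with the
# Maxwell likelihood gives a Gamma(3n/2, sum(x^2)/2) posterior.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
theta_true, n = 2.0, 40                       # illustrative values
x = stats.maxwell.rvs(scale=1/np.sqrt(theta_true), size=n, random_state=rng)
ss = np.sum(x**2)

grid = np.linspace(1e-3, 8, 4000)
logpost = (3*n/2 - 1)*np.log(grid) - grid*ss/2          # log{prior x likelihood}
unnorm = np.exp(logpost - logpost.max())
numeric = unnorm / (unnorm.sum() * (grid[1] - grid[0]))
exact = stats.gamma.pdf(grid, a=3*n/2, scale=2/ss)      # shape 3n/2, rate ss/2

print(np.max(np.abs(numeric - exact)))        # small, up to grid error
```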
Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\lambda\). Suppose that \(X_{i} \, | \, \lambda \sim \mbox{Exp}(\lambda)\) where \(\lambda\) represents the rate so that \(E(X_{i} \, | \, \lambda) = \lambda^{-1}\).
Show that \(X_{i} \, | \,
\lambda \sim \mbox{Exp}(\lambda)\) is a member of the \(1\)-parameter exponential family. Hence,
write down a sufficient statistic \(t(X)\) for \(X =
(X_{1}, \ldots, X_{n})\) for learning about \(\lambda\).
We write \[\begin{eqnarray*}
f(x_{i} \, | \, \lambda) \ = \ \lambda e^{-\lambda x_{i}} \ = \
\exp\left\{-\lambda x_{i} + \log \lambda \right\}
\end{eqnarray*}\] which is of the form \(\exp\{\phi_{1}(\lambda) u_{1}(x_{i}) + g(\lambda)
+ h(x_{i})\}\) where \(\phi_{1}(\lambda) = - \lambda\), \(u_{1}(x_{i}) = x_{i}\), \(g(\lambda) = \log \lambda\) and \(h(x_{i}) = 0\). From the Proposition given
in Lecture 11 (see p26 of the on-line notes) a sufficient statistic is
\[\begin{eqnarray*}
t(X) & = & \left[n, \sum_{i=1}^{n} u_{1}(X_{i})\right] \ = \
\left[n, \sum_{i=1}^{n} X_{i}\right].
\end{eqnarray*}\]
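The sufficiency claim is easy to see in practice: any two data sets sharing \((n, \sum_{i=1}^{n} x_{i})\) give the same likelihood function for \(\lambda\). A minimal sketch with illustrative data:

```python
# Sketch: the Exp(lambda) likelihood depends on x only through (n, sum(x)).
import numpy as np

def exp_loglik(lam, x):
    """Log-likelihood of i.i.d. Exp(lam) data (rate parameterisation)."""
    return len(x)*np.log(lam) - lam*np.sum(x)

x1 = np.array([0.5, 1.5, 2.0])   # n = 3, sum = 4.0
x2 = np.array([1.0, 1.0, 2.0])   # same sufficient statistic (3, 4.0)

lams = np.linspace(0.1, 5.0, 50)
print(np.allclose([exp_loglik(l, x1) for l in lams],
                  [exp_loglik(l, x2) for l in lams]))   # True
```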
Find the Jeffreys prior and comment upon whether or not
it is improper. Find the posterior distribution for this
prior.
The likelihood is \[\begin{eqnarray}
f(x \, | \, \lambda) \ = \ \prod_{i=1}^{n} \lambda e^{-\lambda x_{i}} \
= \ \lambda^{n} e^{- \lambda n \bar{x}}. \tag{1}
\end{eqnarray}\] As \(\lambda\)
is univariate \[\begin{eqnarray*}
\frac{\partial^{2}}{\partial \lambda^{2}} \log f(x \, | \, \lambda) &
= & \frac{\partial^{2}}{\partial \lambda^{2}} \left(n \log \lambda -
\lambda n \bar{x} \right) \\
& = & -\frac{n}{\lambda^{2}}.
\end{eqnarray*}\] The Fisher information is \[\begin{eqnarray*}
I(\lambda) \ = \ -E\left.\left(-\frac{n}{\lambda^{2}} \, \right| \,
\lambda \right) \ = \ \frac{n}{\lambda^{2}}.
\end{eqnarray*}\] The Jeffreys prior is thus \[\begin{eqnarray}
f(\lambda) \ \propto \ \sqrt{\frac{n}{\lambda^{2}}} \ \propto \
\lambda^{-1}. \tag{2}
\end{eqnarray}\] This is the improper \(\mbox{Gamma}(0, 0)\) distribution. The
posterior can be obtained using the conjugacy of the Gamma distribution
relative to the Exponential likelihood. It is \(\mbox{Gamma}(n, n\bar{x})\). (If you are
not sure why, see Solution Sheet Three Question 1 (a) with \(\alpha = \beta = 0\).)
Consider the transformation \(\phi = \log \lambda\). Find, directly, the
Jeffreys prior for \(\phi\) and confirm that it agrees with that obtained by
transforming the prior (2).
In terms of \(\phi\) the likelihood (1) becomes \[\begin{eqnarray*}
f(x \, | \, \phi) \ = \ e^{n\phi} \exp\left(-e^{\phi} n \bar{x}\right)
\end{eqnarray*}\] so that \[\begin{eqnarray*}
\frac{\partial^{2}}{\partial \phi^{2}} \log f(x \, | \, \phi) \ = \
\frac{\partial^{2}}{\partial \phi^{2}} \left(n\phi - e^{\phi} n \bar{x}\right)
\ = \ -e^{\phi} n \bar{x}.
\end{eqnarray*}\] As \(E(\bar{X} \, | \, \phi) = e^{-\phi}\) the Fisher
information is \[\begin{eqnarray*}
I(\phi) \ = \ e^{\phi} n E(\bar{X} \, | \, \phi) \ = \ n
\end{eqnarray*}\] so that the Jeffreys prior is \(f(\phi) \propto \sqrt{n}
\propto 1\), the improper uniform prior. This agrees with transforming the
prior (2) directly: \(f(\phi) \propto f(\lambda)
\left|\frac{d\lambda}{d\phi}\right| = \lambda^{-1} \times \lambda = 1\),
illustrating the invariance of the Jeffreys prior under reparameterisation.
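The invariance can also be checked symbolically; a sketch, assuming `sympy`:

```python
# Sketch: with phi = log(lambda), the Fisher information for phi is the
# constant n, so the Jeffreys prior for phi is flat.
import sympy as sp

phi, n, xbar = sp.symbols('phi n xbar', positive=True)
lam = sp.exp(phi)
loglik = n*sp.log(lam) - lam*n*xbar                 # likelihood (1) in terms of phi
info = -sp.diff(loglik, phi, 2).subs(xbar, 1/lam)   # E(Xbar | lambda) = 1/lambda
print(sp.simplify(info))                            # n
```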
The Jeffreys prior for Normal distributions. In Lecture 13 we showed that for an exchangeable collection \(X = (X_{1}, \ldots, X_{n})\) with \(X_{i} \, | \, \theta \sim N(\theta, \sigma^{2})\) where \(\sigma^{2}\) is known the Jeffreys prior for \(\theta\) is \(f(\theta) \propto 1\).
Consider, instead, that \(X_{i}
\, | \, \theta \sim N(\mu, \theta)\) where \(\mu\) is known. Find the Jeffreys prior for
\(\theta\).
The
likelihood is \[\begin{eqnarray*}
f(x \, | \, \theta) & = & \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi
\theta}} \exp\left\{-\frac{1}{2\theta} (x_{i} - \mu)^{2}\right\} \\
& = & (2 \pi)^{-\frac{n}{2}} \theta^{-\frac{n}{2}} \exp
\left\{-\frac{1}{2 \theta} \sum_{i=1}^{n} (x_{i} - \mu)^{2}\right\}.
\end{eqnarray*}\] As \(\theta\)
is univariate \[\begin{eqnarray*}
\frac{\partial^{2}}{\partial \theta^{2}} \log f(x \, | \, \theta) &
= & \frac{\partial^{2}}{\partial \theta^{2}} \left\{-\frac{n}{2}
\log 2 \pi - \frac{n}{2} \log \theta - \frac{1}{2 \theta} \sum_{i=1}^{n}
(x_{i} - \mu)^{2} \right\} \\
& = & \frac{n}{2 \theta^{2}} -
\frac{1}{\theta^{3}} \sum_{i=1}^{n} (x_{i} - \mu)^{2}.
\end{eqnarray*}\] The Fisher information is \[\begin{eqnarray*}
I(\theta) & = & - E\left.\left\{\frac{n}{2 \theta^{2}} -
\frac{1}{\theta^{3}} \sum_{i=1}^{n} (X_{i} - \mu)^{2} \, \right| \,
\theta \right\} \\
& = & - \frac{n}{2 \theta^{2}} + \frac{1}{\theta^{3}}
\sum_{i=1}^{n} E\left\{(X_{i} - \mu)^{2} \, | \, \theta\right\} \\
& = & - \frac{n}{2 \theta^{2}} + \frac{n}{\theta^{2}} \ = \
\frac{n}{2 \theta^{2}}.
\end{eqnarray*}\] The Jeffreys prior is then \[\begin{eqnarray*}
f(\theta) & \propto & \sqrt{\frac{n}{2 \theta^{2}}} \ \propto \
\theta^{-1}.
\end{eqnarray*}\]
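A Monte Carlo sanity check of this calculation (a sketch with illustrative values): the variance of the score should equal \(I(\theta) = n/(2\theta^{2})\).

```python
# Sketch: Var(score) ≈ n/(2*theta^2) for the N(mu, theta) model.
import numpy as np

rng = np.random.default_rng(2)
mu, theta, n, reps = 1.0, 2.0, 10, 200_000    # illustrative values
x = rng.normal(mu, np.sqrt(theta), size=(reps, n))
score = -n/(2*theta) + ((x - mu)**2).sum(axis=1)/(2*theta**2)
print(score.var(), n/(2*theta**2))            # both close to 1.25
```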
Now suppose that \(X_{i} \, |
\, \theta \sim N(\mu, \sigma^{2})\) where \(\theta = (\mu, \sigma^{2})\). Find the
Jeffreys prior for \(\theta\).
The likelihood
is \[\begin{eqnarray*}
f(x \, | \, \theta) & = & (2 \pi)^{-\frac{n}{2}}
(\sigma^{2})^{-\frac{n}{2}} \exp \left\{-\frac{1}{2 \sigma^{2}}
\sum_{i=1}^{n} (x_{i} - \mu)^{2}\right\}
\end{eqnarray*}\] so that \[\begin{eqnarray*}
\log f(x \, | \, \theta) & = & -\frac{n}{2} \log 2 \pi -
\frac{n}{2} \log \sigma^{2} - \frac{1}{2 \sigma^{2}} \sum_{i=1}^{n}
(x_{i} - \mu)^{2}.
\end{eqnarray*}\] As \(\theta = (\mu,
\sigma^{2})\) then the Fisher Information matrix is \[\begin{eqnarray*}
I(\theta) & = & -\left(\begin{array}{ll}
E\left.\left\{\frac{\partial^{2}}{\partial \mu^{2}} \log f(x \, | \,
\theta) \, \right| \, \theta \right\} &
E\left.\left\{\frac{\partial^{2}}{\partial \mu \partial \sigma^{2}} \log
f(x \, | \, \theta) \, \right| \, \theta \right\}\\
E\left.\left\{\frac{\partial^{2}}{\partial \mu \partial \sigma^{2}} \log
f(x \, | \, \theta) \, \right| \, \theta \right\} &
E\left.\left\{\frac{\partial^{2}}{\partial (\sigma^{2})^{2}} \log f(x \,
| \, \theta) \, \right| \, \theta \right\}\end{array}\right)
\end{eqnarray*}\] Now, \[\begin{eqnarray*}
\frac{\partial}{\partial \mu} \log f(x \, | \, \theta) & = &
\frac{1}{\sigma^{2} } \sum_{i=1}^{n} (x_{i}- \mu); \\
\frac{\partial}{\partial (\sigma^{2})} \log f(x \, | \, \theta) & =
& -\frac{n}{2\sigma^{2}} + \frac{1}{2\sigma^{4}}\sum_{i=1}^{n}
(x_{i} - \mu)^{2};
\end{eqnarray*}\] so that \[\begin{eqnarray*}
\frac{\partial^{2}}{\partial \mu^{2}} \log f(x \, | \, \theta) & =
& -\frac{n}{\sigma^{2}}; \\
\frac{\partial^{2}}{\partial \mu \partial (\sigma^{2})} \log f(x \, | \,
\theta) & = & - \frac{1}{\sigma^{4}} \sum_{i=1}^{n} (x_{i} -
\mu); \\
\frac{\partial^{2}}{\partial (\sigma^{2})^{2}} \log f(x \, | \, \theta)
& = & \frac{n}{2\sigma^{4}} - \frac{1}{\sigma^{6}}\sum_{i=1}^{n}
(x_{i} - \mu)^{2}.
\end{eqnarray*}\] Noting that \(E(X_{i}
- \mu \, | \, \theta) = 0\) and \(E\{(X_{i} - \mu)^{2} \, | \, \theta\} =
\sigma^{2}\), the Fisher information matrix is \[\begin{eqnarray*}
I(\theta) & = & -\left(\begin{array}{cc} -\frac{n}{\sigma^{2}}
& 0 \\
0 & \frac{n}{2 \sigma^{4}} - \frac{n}{\sigma^{4}} \end{array}
\right) \ = \ \left(\begin{array}{cc} \frac{n}{\sigma^{2}} & 0 \\
0 & \frac{n}{2 \sigma^{4}} \end{array} \right).
\end{eqnarray*}\] The Jeffreys prior is \[\begin{eqnarray*}
f(\theta) & \propto & \sqrt{|I(\theta)|} \\
& = & \sqrt{\frac{n^{2}}{2\sigma^{6}}} \ \propto \ \sigma^{-3}.
\end{eqnarray*}\]
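The matrix calculation can be reproduced symbolically; a sketch assuming `sympy`, taking expectations by substituting \(E(x_{i} - \mu) = 0\) and \(E\{(x_{i} - \mu)^{2}\} = \sigma^{2}\):

```python
# Sketch: Fisher information matrix for N(mu, sigma2), both unknown.
import sympy as sp

mu, s2, n = sp.symbols('mu sigma2 n', positive=True)
x = sp.symbols('x_i', real=True)
logf = -sp.Rational(1, 2)*sp.log(2*sp.pi*s2) - (x - mu)**2/(2*s2)

d_mm = sp.diff(logf, mu, 2)                          # -1/sigma2 (constant)
d_ms = sp.diff(logf, mu, s2).subs(x, mu)             # E(x - mu) = 0
d_ss = sp.diff(logf, s2, 2).subs((x - mu)**2, s2)    # E{(x - mu)^2} = sigma2

info = -n*sp.Matrix([[d_mm, d_ms], [d_ms, d_ss]])
print(sp.simplify(info))        # diag(n/sigma2, n/(2*sigma2**2))
print(sp.sqrt(info.det()))      # ∝ sigma2**(-3/2), i.e. sigma**(-3)
```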
Comment upon your answers for these three Normal
cases.
Suppose \(\theta =
(\mu, \sigma^{2})\) and \(X_{i} \, | \,
\theta \sim N(\mu, \sigma^{2})\). If \(\mu\) is unknown and \(\sigma^{2}\) known then the Jeffreys prior
is \(f(\mu) \propto 1\). If \(\mu\) is known and \(\sigma^{2}\) is unknown then the Jeffreys
prior is \(f(\sigma^{2}) \propto
\sigma^{-2}\). When both \(\mu\)
and \(\sigma^{2}\) are unknown then the
Jeffreys prior is \(f(\mu, \sigma^{2}) \propto
\sigma^{-3}\). Jeffreys himself found this inconsistent, arguing
that \(f(\mu, \sigma^{2}) \propto
\sigma^{-2}\), the product of the priors \(f(\mu)\) and \(f(\sigma^{2})\). Jeffreys’ argument was
that ignorance about \(\mu\) and \(\sigma^{2}\) should be represented by
independent ignorance priors for \(\mu\) and \(\sigma^{2}\) separately. However, it is not
clear under what circumstances this prior judgement of independence
should be imposed.
Consider, given \(\theta\), a sequence of independent Bernoulli trials with parameter \(\theta\). We wish to make inferences about \(\theta\) and consider two possible methods. In the first, we carry out \(n\) trials and let \(X\) denote the total number of successes in these trials. Thus, \(X \, | \, \theta \sim \mbox{Bin}(n, \theta)\) with \[\begin{eqnarray*} f_{X}(x \, | \, \theta) & = & \binom{n}{x} \theta^{x}(1- \theta)^{n-x}, \ \ x = 0, 1, \ldots, n. \end{eqnarray*}\] In the second method, we count the total number \(Y\) of trials up to and including the \(r\)th success so that \(Y \, | \, \theta \sim \mbox{Nbin}(r, \theta)\), the negative binomial distribution, with \[\begin{eqnarray*} f_{Y}(y \, | \, \theta) & = & \binom{y-1}{r-1} \theta^{r}(1- \theta)^{y-r}, \ \ y = r, r+1, \ldots. \end{eqnarray*}\]
Obtain the Jeffreys prior distribution for each of the
two methods. You may find it useful to note that \(E(Y \, | \, \theta) =
\frac{r}{\theta}\).
For \(X \, | \, \theta \sim \mbox{Bin}(n,
\theta)\) we have \[\begin{eqnarray*}
\log f_{X}(x \, | \, \theta) & = & \log \binom{n}{x} + x \log
\theta + (n-x) \log (1 - \theta)
\end{eqnarray*}\] so that \[\begin{eqnarray*}
\frac{\partial^{2}}{\partial \theta^{2}} \log f_{X}(x \, | \, \theta)
& = & - \frac{x}{\theta^{2}} - \frac{(n-x)}{(1-\theta)^{2}}.
\end{eqnarray*}\] The Fisher information is \[\begin{eqnarray*}
I(\theta) & = & -E\left.\left\{- \frac{X}{\theta^{2}} -
\frac{(n-X)}{(1-\theta)^{2}} \, \right| \, \theta \right\} \\
& = & \frac{n}{\theta} + \frac{n}{1-\theta} \ = \
\frac{n}{\theta(1-\theta)}.
\end{eqnarray*}\] The Jeffreys prior is thus \[\begin{eqnarray*}
f(\theta) & \propto & \sqrt{\frac{n}{\theta(1-\theta)}} \
\propto \ \theta^{-\frac{1}{2}}(1-\theta)^{-\frac{1}{2}}
\end{eqnarray*}\] which is a kernel of the \(\mbox{Beta}(\frac{1}{2}, \frac{1}{2})\)
distribution.
For \(Y \, | \, \theta
\sim \mbox{Nbin}(r, \theta)\) we have \[\begin{eqnarray*}
\log f_{Y}(y \, | \, \theta) & = & \log \binom{y-1}{r-1} + r
\log \theta + (y-r) \log (1- \theta)
\end{eqnarray*}\] so that \[\begin{eqnarray*}
\frac{\partial^{2}}{\partial \theta^{2}} \log f_{Y}(y \, | \, \theta)
& = & - \frac{r}{\theta^{2}} - \frac{(y-r)}{(1-\theta)^{2}}.
\end{eqnarray*}\] The Fisher information is \[\begin{eqnarray*}
I(\theta) & = & -E\left.\left\{- \frac{r}{\theta^{2}} -
\frac{(Y-r)}{(1-\theta)^{2}} \, \right| \, \theta \right\} \\
& = & \frac{r}{\theta^{2}} + \frac{r(\frac{1}{\theta} -
1)}{(1-\theta)^{2}} \\
& = & \frac{r}{\theta^{2}} + \frac{r}{\theta(1-\theta)} \ = \
\frac{r}{\theta^{2}(1-\theta)}.
\end{eqnarray*}\] The Jeffreys prior is thus \[\begin{eqnarray*}
f(\theta) & \propto & \sqrt{\frac{r}{\theta^{2}(1-\theta)}} \
\propto \ \theta^{-1}(1-\theta)^{-\frac{1}{2}}
\end{eqnarray*}\] which can be viewed as the improper \(\mbox{Beta}(0, \frac{1}{2})\)
distribution.
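Again a symbolic check is straightforward (a sketch, assuming `sympy` and using \(E(Y \, | \, \theta) = r/\theta\)):

```python
# Sketch: Fisher information for the negative binomial model.
import sympy as sp

theta, r, y = sp.symbols('theta r y', positive=True)
loglik = r*sp.log(theta) + (y - r)*sp.log(1 - theta)   # binomial coefficient drops out
info = -sp.diff(loglik, theta, 2).subs(y, r/theta)     # E(Y | theta) = r/theta
print(sp.simplify(info))        # simplifies to r/(theta**2*(1 - theta))
```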
Suppose we observe \(x =
r\) and \(y = n\). For each
method, calculate the posterior distribution for \(\theta\) with the Jeffreys prior. Comment
upon your answers.
We summarise the results in a
table. \[\begin{eqnarray*}
& \begin{array}{cccc}
\mbox{Prior} & \mbox{Sampling model} & \mbox{Likelihood kernel} &
\mbox{Posterior} \\ \hline
\mbox{Beta}(\frac{1}{2}, \frac{1}{2}) & X \, | \, \theta \sim
\mbox{Bin}(n, \theta) & \theta^{x}(1-\theta)^{n-x} &
\mbox{Beta}(\frac{1}{2} + x, \frac{1}{2} + n -x)\\
\mbox{Beta}(0, \frac{1}{2}) & Y \, | \, \theta \sim \mbox{Nbin}(r,
\theta) & \theta^{r}(1-\theta)^{y-r} & \mbox{Beta}(r,
\frac{1}{2} + y -r) \\
\end{array}
\end{eqnarray*}\] Notice that if \(x =
r\) and \(y = n\) then the two
approaches have proportional likelihoods: in both cases we observed
\(x\) successes in \(n\) trials but \(\theta \, | \, x \sim \mbox{Beta}(\frac{1}{2} + x,
\frac{1}{2} + n -x)\) and \(\theta \, |
\, y \sim \mbox{Beta}(x, \frac{1}{2} + n -x)\). Although the
observed data are the same, Jeffreys’ approach yields different posterior
distributions, which seems to contradict the notion of a noninformative
prior. This occurs because Jeffreys’ prior violates the
likelihood principle. In short, this principle states
that all of the information the data \(x\) provide about \(\theta\) is contained in the likelihood function, so that
two likelihoods contain the same information if they are proportional. In this case, the two
likelihoods are proportional but yield different posterior
distributions. Many classical methods, such as confidence intervals,
violate the likelihood principle but Bayesian statistics (using proper
prior distributions) does not.
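A short numerical illustration of this point (a sketch with illustrative values):

```python
# Sketch: proportional likelihoods (x = r successes in y = n trials) but
# different Jeffreys posteriors under the two sampling schemes.
from scipy import stats

n, x = 12, 3                 # binomial: x successes in n trials
r, y = x, n                  # negative binomial: y trials to the r-th success

binom_post = stats.beta(0.5 + x, 0.5 + n - x)   # Beta(3.5, 9.5)
nbin_post = stats.beta(r, 0.5 + y - r)          # Beta(3.0, 9.5)

print(binom_post.mean(), nbin_post.mean())      # ~0.269 versus 0.240
```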