Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\theta\). For each of the following distributions for \(X_{i} \, | \, \theta\) find the Jeffreys prior and the corresponding posterior distribution for \(\theta\).
The likelihood is \[\begin{eqnarray*} f(x \, | \, \theta) & = & \prod_{i=1}^{n} \theta^{x_{i}}(1 - \theta)^{1 - x_{i}} \\ & = & \theta^{n\bar{x}}(1-\theta)^{n-n\bar{x}}. \end{eqnarray*}\] As \(\theta\) is univariate \[\begin{eqnarray*} \frac{\partial^{2}}{\partial \theta^{2}} \log f(x \, | \, \theta) & = & \frac{\partial^{2}}{\partial \theta^{2}} \left\{n\bar{x} \log \theta + (n-n\bar{x})\log (1-\theta)\right\} \\ & = & -\frac{n\bar{x}}{\theta^{2}} - \frac{(n-n\bar{x})}{(1-\theta)^{2}}. \end{eqnarray*}\] The Fisher information is thus \[\begin{eqnarray*} I(\theta) & = & -E\left.\left\{-\frac{n\bar{X}}{\theta^{2}} - \frac{(n-n\bar{X})}{(1-\theta)^{2}} \, \right| \, \theta \right\} \\ & = & \frac{nE(\bar{X} \, | \, \theta)}{\theta^{2}} + \frac{\{n-nE(\bar{X} \, | \, \theta)\}}{(1-\theta)^{2}} \\ & = & \frac{n}{\theta} + \frac{n}{1-\theta} \ = \ \frac{n}{\theta(1-\theta)}. \end{eqnarray*}\] The Jeffreys prior is then \[\begin{eqnarray*} f(\theta) & \propto & \sqrt{\frac{n}{\theta(1-\theta)}} \ \propto \ \theta^{-\frac{1}{2}}(1-\theta)^{-\frac{1}{2}}. \end{eqnarray*}\] This is the kernel of a \(Beta(\frac{1}{2}, \frac{1}{2})\) distribution. This makes it straightforward to obtain the posterior using the conjugacy of the Beta distribution relative to the Bernoulli likelihood. The posterior is \(Beta(\frac{1}{2} + n\bar{x}, \frac{1}{2} + n - n\bar{x})\). (If you are not sure why, see Solution Sheet Four Question 1 (a) with \(\alpha = \beta = \frac{1}{2}\).)
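Although not required for the question, the algebra can be checked symbolically. The following is a minimal sketch, assuming the Python package sympy is available; the symbol names are ours.

```python
import sympy as sp

# theta in (0, 1), n trials, xbar = sample mean of the Bernoulli observations
theta, n, xbar = sp.symbols('theta n xbar', positive=True)

# log-likelihood n*xbar*log(theta) + (n - n*xbar)*log(1 - theta)
loglik = n * xbar * sp.log(theta) + (n - n * xbar) * sp.log(1 - theta)

# Fisher information: minus the expected second derivative, using E(Xbar | theta) = theta
d2 = sp.diff(loglik, theta, 2)
info = sp.simplify(-d2.subs(xbar, theta))
print(info)           # equals n/(theta*(1 - theta))
print(sp.sqrt(info))  # Jeffreys prior kernel, proportional to theta**(-1/2) * (1 - theta)**(-1/2)
```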
The likelihood is \[\begin{eqnarray*} f(x \, | \, \theta) & = & \prod_{i=1}^{n} \frac{\theta^{x_{i}}e^{-\theta}}{x_{i}!} \\ & = & \frac{\theta^{n\bar{x}}e^{-n\theta}}{\prod_{i=1}^{n} x_{i}!}. \end{eqnarray*}\] As \(\theta\) is univariate \[\begin{eqnarray*} \frac{\partial^{2}}{\partial \theta^{2}} \log f(x \, | \, \theta) & = & \frac{\partial^{2}}{\partial \theta^{2}} \left(n\bar{x}\log \theta - n \theta - \sum_{i=1}^{n} \log x_{i}! \right) \\ & = & -\frac{n\bar{x}}{\theta^{2}}. \end{eqnarray*}\] The Fisher information is thus \[\begin{eqnarray*} I(\theta) & = & - E\left.\left(-\frac{n\bar{X}}{\theta^{2}} \, \right| \, \theta \right) \\ & = & \frac{nE(\bar{X} \, | \, \theta)}{\theta^{2}} \ = \ \frac{n}{\theta}. \end{eqnarray*}\] The Jeffreys prior is then \[\begin{eqnarray*} f(\theta) \ \propto \ \sqrt{\frac{n}{\theta}} \ \propto \ \theta^{-\frac{1}{2}}. \end{eqnarray*}\] This is often expressed as the improper \(Gamma(\frac{1}{2}, 0)\) distribution. This makes it straightforward to obtain the posterior using the conjugacy of the Gamma distribution relative to the Poisson likelihood. The posterior is \(Gamma(\frac{1}{2} + n\bar{x}, n)\). (If you are not sure why, see Solution Sheet Four Question 4 (a) with \(\alpha = \frac{1}{2}\), \(\beta = 0\).)
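The conjugate update can also be checked numerically. The sketch below assumes numpy and scipy, and the simulated data, seed and parameter values are purely illustrative: it compares the \(Gamma(\frac{1}{2} + n\bar{x}, n)\) posterior with a direct normalisation of prior times likelihood on a grid.

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(1)
theta_true, n = 3.0, 50                  # illustrative values only
x = rng.poisson(theta_true, n)

# conjugate update: Jeffreys prior Gamma(1/2, 0) gives the Gamma(1/2 + n*xbar, n) posterior
a_post, b_post = 0.5 + x.sum(), n

# brute-force check: normalise theta^(-1/2) times the Poisson likelihood on a grid
grid = np.linspace(1e-4, 8, 4000)
log_unnorm = -0.5 * np.log(grid) + x.sum() * np.log(grid) - n * grid
unnorm = np.exp(log_unnorm - log_unnorm.max())
unnorm /= unnorm.sum() * (grid[1] - grid[0])

# the grid density and the conjugate Gamma posterior should agree closely
print(np.max(np.abs(unnorm - gamma(a_post, scale=1 / b_post).pdf(grid))))
```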
The likelihood is \[\begin{eqnarray*} f(x \, | \, \theta) & = & \prod_{i=1}^{n} \left(\frac{2}{\pi}\right)^{\frac{1}{2}}\theta^{\frac{3}{2}}x_{i}^{2}\exp\left\{-\frac{\theta x_{i}^{2}}{2}\right\} \\ & = & \left(\frac{2}{\pi}\right)^{\frac{n}{2}}\theta^{\frac{3n}{2}}\left( \prod_{i=1}^{n} x_{i}^{2}\right)\exp\left\{-\frac{\theta}{2} \sum_{i=1}^{n} x_{i}^{2}\right\}. \end{eqnarray*}\] As \(\theta\) is univariate \[\begin{eqnarray*} \frac{\partial^{2}}{\partial \theta^{2}} \log f(x \, | \, \theta) & = & \frac{\partial^{2}}{\partial \theta^{2}} \left(\frac{n}{2} \log \frac{2}{\pi} + \frac{3n}{2} \log \theta + \sum_{i=1}^{n} \log x_{i}^{2} - \frac{\theta}{2} \sum_{i=1}^{n} x_{i}^{2}\right) \\ & = & - \frac{3n}{2\theta^{2}}. \end{eqnarray*}\] The Fisher information is \[\begin{eqnarray*} I(\theta) \ = \ -E\left.\left(-\frac{3n}{2\theta^{2}} \, \right| \, \theta \right) \ = \ \frac{3n}{2\theta^{2}}. \end{eqnarray*}\] The Jeffreys prior is then \[\begin{eqnarray*} f(\theta) & \propto & \sqrt{\frac{3n}{2\theta^{2}}} \ \propto \ \theta^{-1}. \end{eqnarray*}\] This is often expressed as the improper \(Gamma(0, 0)\) distribution. This makes it straightforward to obtain the posterior using the conjugacy of the Gamma distribution relative to the Maxwell likelihood. The posterior is \(Gamma(\frac{3n}{2}, \frac{1}{2}\sum_{i=1}^{n} x_{i}^{2})\). (If you are not sure why, see Solution Sheet Four Question 1 (c) with \(\alpha = \beta = 0\).)
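A similar symbolic sketch (again assuming sympy; the symbol \(S\) standing for \(\sum_{i=1}^{n} x_{i}^{2}\) is ours) confirms that the stated Maxwell density integrates to one and that the Fisher information leads to the \(\theta^{-1}\) prior.

```python
import sympy as sp

x, theta, n, S = sp.symbols('x theta n S', positive=True)  # S stands for sum of x_i^2

# sanity check: the stated Maxwell density integrates to one over (0, infinity)
maxwell = sp.sqrt(2 / sp.pi) * theta**sp.Rational(3, 2) * x**2 * sp.exp(-theta * x**2 / 2)
print(sp.simplify(sp.integrate(maxwell, (x, 0, sp.oo))))  # 1

# theta-dependent part of the log-likelihood; its second derivative is free of the data
loglik = sp.Rational(3, 2) * n * sp.log(theta) - theta * S / 2
info = -sp.diff(loglik, theta, 2)
print(info, sp.sqrt(info))  # 3*n/(2*theta**2); square root proportional to 1/theta
```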
Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\lambda\). Suppose that \(X_{i} \, | \, \lambda \sim Exp(\lambda)\) where \(\lambda\) represents the rate so that \(E(X_{i} \, | \, \lambda) = \lambda^{-1}\).
We write \[\begin{eqnarray*} f(x_{i} \, | \, \lambda) \ = \ \lambda e^{-\lambda x_{i}} \ = \ \exp\left\{-\lambda x_{i} + \log \lambda \right\} \end{eqnarray*}\] which is of the form \(\exp\{\phi_{1}(\lambda) u_{1}(x_{i}) + g(\lambda) + h(x_{i})\}\) where \(\phi_{1}(\lambda) = - \lambda\), \(u_{1}(x_{i}) = x_{i}\), \(g(\lambda) = \log \lambda\) and \(h(x_{i}) = 0\). From the Proposition given in Lecture 9 (see p26 of the on-line notes) a sufficient statistic is \[\begin{eqnarray*} t(X) & = & \sum_{i=1}^{n} u_{1}(X_{i}) \ = \ \sum_{i=1}^{n} X_{i}. \end{eqnarray*}\]
The likelihood is \[\begin{eqnarray} f(x \, | \, \lambda) \ = \ \prod_{i=1}^{n} \lambda e^{-\lambda x_{i}} \ = \ \lambda^{n} e^{- \lambda n \bar{x}}. \tag{1} \end{eqnarray}\] As \(\lambda\) is univariate \[\begin{eqnarray*} \frac{\partial^{2}}{\partial \lambda^{2}} \log f(x \, | \, \lambda) & = & \frac{\partial^{2}}{\partial \lambda^{2}} \left(n \log \lambda - \lambda n \bar{x} \right) \\ & = & -\frac{n}{\lambda^{2}}. \end{eqnarray*}\] The Fisher information is \[\begin{eqnarray*} I(\lambda) \ = \ -E\left.\left(-\frac{n}{\lambda^{2}} \, \right| \, \lambda \right) \ = \ \frac{n}{\lambda^{2}}. \end{eqnarray*}\] The Jeffreys prior is thus \[\begin{eqnarray} f(\lambda) \ \propto \ \sqrt{\frac{n}{\lambda^{2}}} \ \propto \ \lambda^{-1}. \tag{2} \end{eqnarray}\] This is the improper \(Gamma(0, 0)\) distribution. The posterior can be obtained using the conjugacy of the Gamma distribution relative to the Exponential likelihood. It is \(Gamma(n, n\bar{x})\). (If you are not sure why, see Solution Sheet Three Question 1 (a) with \(\alpha = \beta = 0\).)
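As a numerical aside (assuming numpy and scipy; the data are simulated purely for illustration), the \(Gamma(n, n\bar{x})\) posterior has mean \(n/(n\bar{x}) = 1/\bar{x}\), the maximum likelihood estimate, which gives a quick sanity check of the conjugate update.

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(2)
lam_true, n = 0.5, 100                   # illustrative rate and sample size
x = rng.exponential(scale=1 / lam_true, size=n)

# Jeffreys prior lambda^(-1) times the likelihood lambda^n exp(-lambda*n*xbar)
# gives the Gamma(n, n*xbar) posterior
post = gamma(n, scale=1 / (n * x.mean()))
print(post.mean(), 1 / x.mean())         # posterior mean coincides with the MLE 1/xbar
```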
Consider the transformation \(\phi = \log \lambda\).
Inverting \(\phi = \log \lambda\) we have \(\lambda = e^{\phi}\). Substituting this into (1) we have \[\begin{eqnarray*} L(\phi) & = & e^{n\phi} \exp\left\{-n\bar{x}e^{\phi}\right\}. \end{eqnarray*}\] We have \[\begin{eqnarray*} \frac{\partial^{2}}{\partial \phi^{2}} \log L(\phi) & = & \frac{\partial^{2}}{\partial \phi^{2}}\left(n\phi - n\bar{x}e^{\phi}\right) \\ & = & - n\bar{x}e^{\phi}. \end{eqnarray*}\] The Fisher information is \[\begin{eqnarray*} I(\phi) & = & -E\left.\left(- n\bar{X}e^{\phi} \, \right| \, \phi \right) \\ & = & nE(X_{i} \, | \, \phi)e^{\phi}. \end{eqnarray*}\] Now \(E(X_{i} \, | \, \phi) = E(X_{i} \, | \, \lambda) = \lambda^{-1} = e^{-\phi}\) so that \(I(\phi) = n\). The Jeffreys prior is then \[\begin{eqnarray} f_{\phi}(\phi) \ \propto \ \sqrt{n} \ \propto \ 1 \tag{3} \end{eqnarray}\] which is the improper uniform distribution on \((-\infty, \infty)\).
From (2) we have that \(f_{\lambda}(\lambda) \propto \lambda^{-1}\). Using the familiar change of variables formula, \[\begin{eqnarray*} f_{\phi}(\phi) & = & \left|\frac{\partial \lambda}{\partial \phi}\right| f_{\lambda}(e^{\phi}) \end{eqnarray*}\] with \(\frac{\partial \lambda}{\partial \phi} = e^{\phi}\) as \(\lambda = e^{\phi}\), we have that \[\begin{eqnarray*} f_{\phi}(\phi) & \propto & |e^{\phi}|e^{-\phi} \ = \ 1 \end{eqnarray*}\] which agrees with (3). This is an illustration of the invariance to reparameterisation of the Jeffreys prior.
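The invariance can also be illustrated numerically. The sketch below (assuming numpy and scipy; the simulated data and seed are illustrative) transforms draws from the \(Gamma(n, n\bar{x})\) posterior for \(\lambda\) to the \(\phi = \log \lambda\) scale and compares them with the posterior obtained directly on the \(\phi\) scale under the flat prior (3).

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(3)
lam_true, n = 0.5, 100
x = rng.exponential(scale=1 / lam_true, size=n)

# route 1: posterior for lambda under the Jeffreys prior (2) is Gamma(n, n*xbar);
# transform draws to the phi = log(lambda) scale
lam_draws = gamma(n, scale=1 / (n * x.mean())).rvs(100_000, random_state=rng)
phi_draws = np.log(lam_draws)

# route 2: posterior for phi under the flat Jeffreys prior (3) is
# proportional to exp(n*phi - n*xbar*exp(phi)); normalise it on a grid
grid = np.linspace(phi_draws.min() - 0.5, phi_draws.max() + 0.5, 2000)
log_post = n * grid - n * x.mean() * np.exp(grid)
post = np.exp(log_post - log_post.max())
post /= post.sum() * (grid[1] - grid[0])

# the two routes should agree, for example in their posterior means for phi
print(phi_draws.mean(), np.sum(grid * post) * (grid[1] - grid[0]))
```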
The Jeffreys prior for Normal distributions. In Lecture 12 we showed that for an exchangeable collection \(X = (X_{1}, \ldots, X_{n})\) with \(X_{i} \, | \, \theta \sim N(\theta, \sigma^{2})\) where \(\sigma^{2}\) is known the Jeffreys prior for \(\theta\) is \(f(\theta) \propto 1\).
The likelihood is \[\begin{eqnarray*} f(x \, | \, \theta) & = & \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi \theta}} \exp\left\{-\frac{1}{2\theta} (x_{i} - \mu)^{2}\right\} \\ & = & (2 \pi)^{-\frac{n}{2}} \theta^{-\frac{n}{2}} \exp \left\{-\frac{1}{2 \theta} \sum_{i=1}^{n} (x_{i} - \mu)^{2}\right\}. \end{eqnarray*}\] As \(\theta\) is univariate \[\begin{eqnarray*} \frac{\partial^{2}}{\partial \theta^{2}} \log f(x \, | \, \theta) & = & \frac{\partial^{2}}{\partial \theta^{2}} \left\{-\frac{n}{2} \log 2 \pi - \frac{n}{2} \log \theta - \frac{1}{2 \theta} \sum_{i=1}^{n} (x_{i} - \mu)^{2} \right\} \\ & = & \frac{n}{2 \theta^{2}} - \frac{1}{\theta^{3}} \sum_{i=1}^{n} (x_{i} - \mu)^{2}. \end{eqnarray*}\] The Fisher information is \[\begin{eqnarray*} I(\theta) & = & - E\left.\left\{\frac{n}{2 \theta^{2}} - \frac{1}{\theta^{3}} \sum_{i=1}^{n} (X_{i} - \mu)^{2} \, \right| \, \theta \right\} \\ & = & - \frac{n}{2 \theta^{2}} + \frac{1}{\theta^{3}} \sum_{i=1}^{n} E\left\{(X_{i} - \mu)^{2} \, | \, \theta\right\} \\ & = & - \frac{n}{2 \theta^{2}} + \frac{n}{\theta^{2}} \ = \ \frac{n}{2 \theta^{2}}. \end{eqnarray*}\] The Jeffreys prior is then \[\begin{eqnarray*} f(\theta) & \propto & \sqrt{\frac{n}{2 \theta^{2}}} \ \propto \ \theta^{-1}. \end{eqnarray*}\]
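Again, a symbolic sketch (assuming sympy; the symbol \(S\) standing for \(\sum_{i=1}^{n}(x_{i}-\mu)^{2}\) is ours) reproduces the calculation.

```python
import sympy as sp

theta, n, S = sp.symbols('theta n S', positive=True)  # S stands for sum of (x_i - mu)^2

# theta-dependent part of the log-likelihood for the N(mu, theta) sample, mu known
loglik = -sp.Rational(1, 2) * n * sp.log(theta) - S / (2 * theta)

# Fisher information: minus the expected second derivative, using E(S | theta) = n*theta
d2 = sp.diff(loglik, theta, 2)
info = sp.simplify(-d2.subs(S, n * theta))
print(info, sp.sqrt(info))  # n/(2*theta**2); square root proportional to 1/theta
```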
The likelihood is \[\begin{eqnarray*} f(x \, | \, \theta) & = & (2 \pi)^{-\frac{n}{2}} (\sigma^{2})^{-\frac{n}{2}} \exp \left\{-\frac{1}{2 \sigma^{2}} \sum_{i=1}^{n} (x_{i} - \mu)^{2}\right\} \end{eqnarray*}\] so that \[\begin{eqnarray*} \log f(x \, | \, \theta) & = & -\frac{n}{2} \log 2 \pi - \frac{n}{2} \log \sigma^{2} - \frac{1}{2 \sigma^{2}} \sum_{i=1}^{n} (x_{i} - \mu)^{2}. \end{eqnarray*}\] As \(\theta = (\mu, \sigma^{2})\) is bivariate, the Fisher information matrix is \[\begin{eqnarray*} I(\theta) & = & -\left(\begin{array}{ll} E\left.\left\{\frac{\partial^{2}}{\partial \mu^{2}} \log f(x \, | \, \theta) \, \right| \, \theta \right\} & E\left.\left\{\frac{\partial^{2}}{\partial \mu \partial \sigma^{2}} \log f(x \, | \, \theta) \, \right| \, \theta \right\}\\ E\left.\left\{\frac{\partial^{2}}{\partial \mu \partial \sigma^{2}} \log f(x \, | \, \theta) \, \right| \, \theta \right\} & E\left.\left\{\frac{\partial^{2}}{\partial (\sigma^{2})^{2}} \log f(x \, | \, \theta) \, \right| \, \theta \right\}\end{array}\right). \end{eqnarray*}\] Now, \[\begin{eqnarray*} \frac{\partial}{\partial \mu} \log f(x \, | \, \theta) & = & \frac{1}{\sigma^{2} } \sum_{i=1}^{n} (x_{i}- \mu); \\ \frac{\partial}{\partial (\sigma^{2})} \log f(x \, | \, \theta) & = & -\frac{n}{2\sigma^{2}} + \frac{1}{2\sigma^{4}}\sum_{i=1}^{n} (x_{i} - \mu)^{2}; \end{eqnarray*}\] so that \[\begin{eqnarray*} \frac{\partial^{2}}{\partial \mu^{2}} \log f(x \, | \, \theta) & = & -\frac{n}{\sigma^{2}}; \\ \frac{\partial^{2}}{\partial \mu \partial (\sigma^{2})} \log f(x \, | \, \theta) & = & - \frac{1}{\sigma^{4}} \sum_{i=1}^{n} (x_{i} - \mu); \\ \frac{\partial^{2}}{\partial (\sigma^{2})^{2}} \log f(x \, | \, \theta) & = & \frac{n}{2\sigma^{4}} - \frac{1}{\sigma^{6}}\sum_{i=1}^{n} (x_{i} - \mu)^{2}. \end{eqnarray*}\] Noting that \(E(X_{i} - \mu \, | \, \theta) = 0\) and \(E\{(X_{i} - \mu)^{2} \, | \, \theta\} = \sigma^{2}\), the Fisher information matrix is \[\begin{eqnarray*} I(\theta) & = & -\left(\begin{array}{cc} -\frac{n}{\sigma^{2}} & 0 \\ 0 & \frac{n}{2 \sigma^{4}} - \frac{n}{\sigma^{4}} \end{array} \right) \ = \ \left(\begin{array}{cc} \frac{n}{\sigma^{2}} & 0 \\ 0 & \frac{n}{2 \sigma^{4}} \end{array} \right). \end{eqnarray*}\] The Jeffreys prior is \[\begin{eqnarray*} f(\theta) & \propto & \sqrt{|I(\theta)|} \\ & = & \sqrt{\frac{n^{2}}{2\sigma^{6}}} \ \propto \ \sigma^{-3}. \end{eqnarray*}\]
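The bivariate calculation can be checked in the same way. The sketch below (assuming sympy; the placeholder symbols \(T_{1}\) for \(\sum_{i=1}^{n} x_{i}\) and \(T_{2}\) for \(\sum_{i=1}^{n} x_{i}^{2}\) are ours) builds the expected Hessian and recovers \(\sqrt{|I(\theta)|} \propto \sigma^{-3}\).

```python
import sympy as sp

mu = sp.Symbol('mu', real=True)
s2, n, T1, T2 = sp.symbols('sigma2 n T1 T2', positive=True)
# T1 and T2 are placeholders for sum(x_i) and sum(x_i^2); as data they are constant
# with respect to the parameters (mu, sigma2)

loglik = (-sp.Rational(1, 2) * n * sp.log(2 * sp.pi) - sp.Rational(1, 2) * n * sp.log(s2)
          - (T2 - 2 * mu * T1 + n * mu**2) / (2 * s2))

# expected Hessian, using E(T1 | theta) = n*mu and E(T2 | theta) = n*(sigma2 + mu^2)
hess = sp.hessian(loglik, (mu, s2))
exp_hess = hess.subs({T1: n * mu, T2: n * (s2 + mu**2)})

info = (-exp_hess).applyfunc(sp.simplify)
print(info)                              # Matrix([[n/sigma2, 0], [0, n/(2*sigma2**2)]])
print(sp.sqrt(sp.simplify(info.det())))  # proportional to sigma2**(-3/2), i.e. sigma**(-3)
```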
Suppose \(\theta = (\mu, \sigma^{2})\) and \(X_{i} \, | \, \theta \sim N(\mu, \sigma^{2})\). If \(\mu\) is unknown and \(\sigma^{2}\) known then the Jeffreys prior is \(f(\mu) \propto 1\). If \(\mu\) is known and \(\sigma^{2}\) is unknown then the Jeffreys prior is \(f(\sigma^{2}) \propto \sigma^{-2}\). When both \(\mu\) and \(\sigma^{2}\) are unknown then the Jeffreys prior is \(f(\mu, \sigma^{2}) \propto \sigma^{-3}\). Jeffreys himself found this inconsistent, arguing that \(f(\mu, \sigma^{2}) \propto \sigma^{-2}\), the product of the priors \(f(\mu)\) and \(f(\sigma^{2})\). Jeffreys’ argument was that ignorance about \(\mu\) and \(\sigma^{2}\) should be represented by independent ignorance priors for \(\mu\) and \(\sigma^{2}\) separately. However, it is not clear under what circumstances this prior judgement of independence should be imposed.
Consider, given \(\theta\), a sequence of independent Bernoulli trials with parameter \(\theta\). We wish to make inferences about \(\theta\) and consider two possible methods. In the first, we carry out \(n\) trials and let \(X\) denote the total number of successes in these trials. Thus, \(X \, | \, \theta \sim Bin(n, \theta)\) with \[\begin{eqnarray*} f_{X}(x \, | \, \theta) & = & \binom{n}{x} \theta^{x}(1- \theta)^{n-x}, \ \ x = 0, 1, \ldots, n. \end{eqnarray*}\] In the second method, we count the total number \(Y\) of trials up to and including the \(r\)th success so that \(Y \, | \, \theta \sim Nbin(r, \theta)\), the negative binomial distribution, with \[\begin{eqnarray*} f_{Y}(y \, | \, \theta) & = & \binom{y-1}{r-1} \theta^{r}(1- \theta)^{y-r}, \ \ y = r, r+1, \ldots. \end{eqnarray*}\]
For \(X \, | \, \theta \sim Bin(n, \theta)\) we have \[\begin{eqnarray*} \log f_{X}(x \, | \, \theta) & = & \log \binom{n}{x} + x \log \theta + (n-x) \log (1 - \theta) \end{eqnarray*}\] so that \[\begin{eqnarray*} \frac{\partial^{2}}{\partial \theta^{2}} \log f_{X}(x \, | \, \theta) & = & - \frac{x}{\theta^{2}} - \frac{(n-x)}{(1-\theta)^{2}}. \end{eqnarray*}\] The Fisher information is \[\begin{eqnarray*} I(\theta) & = & -E\left.\left\{- \frac{X}{\theta^{2}} - \frac{(n-X)}{(1-\theta)^{2}} \, \right| \, \theta \right\} \\ & = & \frac{n}{\theta} + \frac{n}{1-\theta} \ = \ \frac{n}{\theta(1-\theta)}. \end{eqnarray*}\] The Jeffreys prior is thus \[\begin{eqnarray*} f(\theta) & \propto & \sqrt{\frac{n}{\theta(1-\theta)}} \ \propto \ \theta^{-\frac{1}{2}}(1-\theta)^{-\frac{1}{2}} \end{eqnarray*}\] which is a kernel of the \(Beta(\frac{1}{2}, \frac{1}{2})\) distribution.
For \(Y \, | \, \theta \sim Nbin(r, \theta)\) we have \[\begin{eqnarray*} \log f_{Y}(y \, | \, \theta) & = & \log \binom{y-1}{r-1} + r \log \theta + (y-r) \log (1- \theta) \end{eqnarray*}\] so that \[\begin{eqnarray*} \frac{\partial^{2}}{\partial \theta^{2}} \log f_{Y}(y \, | \, \theta) & = & - \frac{r}{\theta^{2}} - \frac{(y-r)}{(1-\theta)^{2}}. \end{eqnarray*}\] The Fisher information is \[\begin{eqnarray*} I(\theta) & = & -E\left.\left\{- \frac{r}{\theta^{2}} - \frac{(Y-r)}{(1-\theta)^{2}} \, \right| \, \theta \right\} \\ & = & \frac{r}{\theta^{2}} + \frac{r(\frac{1}{\theta} - 1)}{(1-\theta)^{2}} \\ & = & \frac{r}{\theta^{2}} + \frac{r}{\theta(1-\theta)} \ = \ \frac{r}{\theta^{2}(1-\theta)}. \end{eqnarray*}\] The Jeffreys prior is thus \[\begin{eqnarray*} f(\theta) & \propto & \sqrt{\frac{r}{\theta^{2}(1-\theta)}} \ \propto \ \theta^{-1}(1-\theta)^{-\frac{1}{2}} \end{eqnarray*}\] which can be viewed as the improper \(Beta(0, \frac{1}{2})\) distribution.
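Both Fisher information calculations can be verified symbolically in one pass (assuming sympy; the symbol names are ours), using \(E(X \, | \, \theta) = n\theta\) for the binomial and \(E(Y \, | \, \theta) = r/\theta\) for the negative binomial.

```python
import sympy as sp

theta, n, r, x, y = sp.symbols('theta n r x y', positive=True)

# binomial: theta-dependent part of log f_X(x | theta); then E(X | theta) = n*theta
ll_bin = x * sp.log(theta) + (n - x) * sp.log(1 - theta)
info_bin = sp.simplify(-sp.diff(ll_bin, theta, 2).subs(x, n * theta))

# negative binomial: theta-dependent part of log f_Y(y | theta); then E(Y | theta) = r/theta
ll_nbin = r * sp.log(theta) + (y - r) * sp.log(1 - theta)
info_nbin = sp.simplify(-sp.diff(ll_nbin, theta, 2).subs(y, r / theta))

print(info_bin)   # equals n/(theta*(1 - theta))
print(info_nbin)  # equals r/(theta**2*(1 - theta))
```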
We summarise the results in a table. \[\begin{eqnarray*} & \begin{array}{cccc} \mbox{Prior} & \mbox{Likelihood} & \mbox{Likelihood kernel} & \mbox{Posterior} \\ \hline Beta(\frac{1}{2}, \frac{1}{2}) & X \, | \, \theta \sim Bin(n, \theta) & \theta^{x}(1-\theta)^{n-x} & Beta(\frac{1}{2} + x, \frac{1}{2} + n -x)\\ Beta(0, \frac{1}{2}) & Y \, | \, \theta \sim Nbin(r, \theta) & \theta^{r}(1-\theta)^{y-r} & Beta(r, \frac{1}{2} + y -r) \\ \end{array} \end{eqnarray*}\] Notice that if \(x = r\) and \(y = n\) then the two approaches have proportional likelihoods: in both cases we observe \(x\) successes in \(n\) trials, yet \(\theta \, | \, x \sim Beta(\frac{1}{2} + x, \frac{1}{2} + n -x)\) while \(\theta \, | \, y \sim Beta(x, \frac{1}{2} + n -x)\). Although the observed data are the same, Jeffreys’ approach yields different posterior distributions, which seems to contradict the notion of a noninformative prior. This occurs because Jeffreys’ prior violates the likelihood principle. In short, this principle states that the likelihood function contains all the information provided by the data \(x\) about \(\theta\), so that two likelihoods which are proportional carry the same information. In this case, the two likelihoods are proportional but yield different posterior distributions. Many classical methods, such as confidence intervals, violate the likelihood principle, but Bayesian statistics (using proper prior distributions) does not.
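The discrepancy is easy to exhibit numerically. In the sketch below (assuming scipy; the values \(x = r = 3\) and \(y = n = 12\) are purely illustrative) the same record of 3 successes in 12 trials gives two different posteriors.

```python
from scipy.stats import beta

x, n = 3, 12          # 3 successes in 12 trials, so r = x and y = n

post_bin = beta(0.5 + x, 0.5 + n - x)   # posterior under the binomial sampling model
post_nbin = beta(x, 0.5 + n - x)        # posterior under the negative binomial model

# same data and proportional likelihoods, yet different posterior summaries
print(post_bin.mean(), post_nbin.mean())
print(post_bin.interval(0.95), post_nbin.interval(0.95))
```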