Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\theta\). For each of the following distributions for \(X_{i} \, | \, \theta\) find the Jeffreys prior and the corresponding posterior distribution for \(\theta\).
\(X_{i} \, | \, \theta \sim
\mbox{Bern}(\theta)\).
The likelihood is \[\begin{eqnarray*}
f(x \, | \, \theta) & = & \prod_{i=1}^{n} \theta^{x_{i}}(1 -
\theta)^{1 - x_{i}} \\
& = & \theta^{n\bar{x}}(1-\theta)^{n-n\bar{x}}.
\end{eqnarray*}\] As \(\theta\)
is univariate \[\begin{eqnarray*}
\frac{\partial^{2}}{\partial \theta^{2}} \log f(x \, | \, \theta) &
= & \frac{\partial^{2}}{\partial \theta^{2}} \left\{n\bar{x} \log
\theta + (n-n\bar{x})\log (1-\theta)\right\} \\
& = & -\frac{n\bar{x}}{\theta^{2}} -
\frac{(n-n\bar{x})}{(1-\theta)^{2}}.
\end{eqnarray*}\] The Fisher information is thus \[\begin{eqnarray*}
I(\theta) & = & -E\left.\left\{-\frac{n\bar{X}}{\theta^{2}} -
\frac{(n-n\bar{X})}{(1-\theta)^{2}} \, \right| \, \theta \right\} \\
& = & \frac{nE(\bar{X} \, | \, \theta)}{\theta^{2}} +
\frac{\{n-nE(\bar{X} \, | \, \theta)\}}{(1-\theta)^{2}} \\
& = & \frac{n}{\theta} + \frac{n}{1-\theta} \ = \
\frac{n}{\theta(1-\theta)}.
\end{eqnarray*}\] The Jeffreys prior is then \[\begin{eqnarray*}
f(\theta) & \propto & \sqrt{\frac{n}{\theta(1-\theta)}} \
\propto \ \theta^{-\frac{1}{2}}(1-\theta)^{-\frac{1}{2}}.
\end{eqnarray*}\] This is the kernel of a \(\mbox{Beta}(\frac{1}{2}, \frac{1}{2})\)
distribution. This makes it straightforward to obtain the posterior
using the conjugacy of the Beta distribution relative to the Bernoulli
likelihood. The posterior is \(\mbox{Beta}(\frac{1}{2} + n\bar{x}, \frac{1}{2} +
n - n\bar{x})\). (If you are not sure why, see Solution Sheet
Four Question 1 (a) with \(\alpha = \beta =
\frac{1}{2}\).)
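As a quick numerical sanity check (a minimal sketch, assuming `numpy` and `scipy` are available; all values are illustrative), we can compare the grid-normalised product of the Jeffreys prior and the Bernoulli likelihood against the claimed Beta posterior:

```python
# Sketch: verify numerically that Jeffreys prior x Bernoulli likelihood
# matches the Beta(1/2 + n*xbar, 1/2 + n - n*xbar) posterior.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
theta_true, n = 0.3, 50            # illustrative values
x = rng.binomial(1, theta_true, size=n)
s = x.sum()                        # n * xbar, the number of successes

grid = np.linspace(1e-4, 1 - 1e-4, 2000)
unnorm = stats.beta.pdf(grid, 0.5, 0.5) * grid**s * (1 - grid)**(n - s)
numeric = unnorm / (unnorm.sum() * (grid[1] - grid[0]))   # normalise on the grid
exact = stats.beta.pdf(grid, 0.5 + s, 0.5 + n - s)        # conjugate posterior

print(np.max(np.abs(numeric - exact)))   # small, up to grid error
```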
\(X_{i} \, | \, \theta \sim
\mbox{Po}(\theta)\).
The likelihood is \[\begin{eqnarray*}
f(x \, | \, \theta) & = & \prod_{i=1}^{n}
\frac{\theta^{x_{i}}e^{-\theta}}{x_{i}!} \\
& = & \frac{\theta^{n\bar{x}}e^{-n\theta}}{\prod_{i=1}^{n}
x_{i}!}.
\end{eqnarray*}\] As \(\theta\)
is univariate \[\begin{eqnarray*}
\frac{\partial^{2}}{\partial \theta^{2}} \log f(x \, | \, \theta) &
= & \frac{\partial^{2}}{\partial \theta^{2}} \left(n\bar{x}\log
\theta - n \theta - \sum_{i=1}^{n} \log x_{i}! \right) \\
& = & -\frac{n\bar{x}}{\theta^{2}}.
\end{eqnarray*}\] The Fisher information is thus \[\begin{eqnarray*}
I(\theta) & = & - E\left.\left(-\frac{n\bar{X}}{\theta^{2}} \,
\right| \, \theta \right) \\
& = & \frac{nE(\bar{X} \, | \, \theta)}{\theta^{2}} \ = \
\frac{n}{\theta}.
\end{eqnarray*}\] The Jeffreys prior is then \[\begin{eqnarray*}
f(\theta) \ \propto \ \sqrt{\frac{n}{\theta}} \ \propto \
\theta^{-\frac{1}{2}}.
\end{eqnarray*}\] This is often expressed as the improper \(\mbox{Gamma}(\frac{1}{2}, 0)\)
distribution. This makes it straightforward to obtain the posterior
using the conjugacy of the Gamma distribution relative to the Poisson
likelihood. The posterior is \(\mbox{Gamma}(\frac{1}{2} + n\bar{x}, n)\).
(If you are not sure why, see Solution Sheet Four Question 4 (a) with
\(\alpha = \frac{1}{2}\), \(\beta = 0\).)
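The same calculation can be reproduced symbolically; a minimal sketch, assuming `sympy` is available, with the \(x_{i}!\) terms dropped since they vanish under differentiation:

```python
# Sketch: symbolic Fisher information for the Poisson model.
import sympy as sp

theta, n, xbar = sp.symbols('theta n xbar', positive=True)
loglik = n*xbar*sp.log(theta) - n*theta       # log-likelihood up to constants
second = sp.diff(loglik, theta, 2)            # -n*xbar/theta**2
info = -second.subs(xbar, theta)              # E(Xbar | theta) = theta
print(sp.simplify(info))                      # n/theta
```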
\(X_{i} \, | \, \theta \sim
\mbox{Maxwell}(\theta)\), the Maxwell distribution with parameter
\(\theta\) so that \[\begin{eqnarray*}
f(x_{i} \, | \, \theta) =
\left(\frac{2}{\pi}\right)^{\frac{1}{2}}\theta^{\frac{3}{2}}x_{i}^{2}\exp\left\{-\frac{\theta
x_{i}^{2}}{2}\right\}, \ \ x_{i} > 0
\end{eqnarray*}\] and \(E(X_{i}
\, | \, \theta) = 2\sqrt{\frac{2}{\pi \theta}}\), \(Var(X_{i} \, | \, \theta) = \frac{3\pi - 8}{\pi
\theta}\).
The likelihood is \[\begin{eqnarray*}
f(x \, | \, \theta) & = & \prod_{i=1}^{n}
\left(\frac{2}{\pi}\right)^{\frac{1}{2}}\theta^{\frac{3}{2}}x_{i}^{2}\exp\left\{-\frac{\theta
x_{i}^{2}}{2}\right\} \\
& = &
\left(\frac{2}{\pi}\right)^{\frac{n}{2}}\theta^{\frac{3n}{2}}\left(
\prod_{i=1}^{n} x_{i}^{2}\right)\exp\left\{-\frac{\theta}{2}
\sum_{i=1}^{n} x_{i}^{2}\right\}.
\end{eqnarray*}\] As \(\theta\)
is univariate \[\begin{eqnarray*}
\frac{\partial^{2}}{\partial \theta^{2}} \log f(x \, | \, \theta) &
= & \frac{\partial^{2}}{\partial \theta^{2}} \left(\frac{n}{2} \log
\frac{2}{\pi} + \frac{3n}{2} \log \theta + \sum_{i=1}^{n} \log x_{i}^{2}
- \frac{\theta}{2} \sum_{i=1}^{n} x_{i}^{2}\right) \\
& = & - \frac{3n}{2\theta^{2}}.
\end{eqnarray*}\] The Fisher information is \[\begin{eqnarray*}
I(\theta) \ = \ -E\left.\left(-\frac{3n}{2\theta^{2}} \, \right| \,
\theta \right) \ = \ \frac{3n}{2\theta^{2}}.
\end{eqnarray*}\] The Jeffreys prior is then \[\begin{eqnarray*}
f(\theta) & \propto & \sqrt{\frac{3n}{2\theta^{2}}} \ \propto \
\theta^{-1}.
\end{eqnarray*}\] This is often expressed as the improper \(\mbox{Gamma}(0, 0)\) distribution. This
makes it straightforward to obtain the posterior using the conjugacy of
the Gamma distribution relative to the Maxwell likelihood. The posterior
is \(\mbox{Gamma}(\frac{3n}{2},
\frac{1}{2}\sum_{i=1}^{n} x_{i}^{2})\). (If you are not sure why,
see Solution Sheet Four Question 1 (c) with \(\alpha = \beta = 0\).)
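A grid check along the same lines as before (a sketch, assuming `scipy`; note that `scipy.stats.maxwell` is parameterised by a scale \(a\) with \(\theta = a^{-2}\) in our notation):

```python
# Sketch: verify that the improper prior 1/theta combined with the
# Maxwell likelihood gives a Gamma(3n/2, sum(x^2)/2) posterior.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
theta_true, n = 2.0, 40                       # illustrative values
x = stats.maxwell.rvs(scale=1/np.sqrt(theta_true), size=n, random_state=rng)
ss = np.sum(x**2)

grid = np.linspace(1e-3, 8, 4000)
logpost = (3*n/2 - 1)*np.log(grid) - grid*ss/2          # log{prior x likelihood}
unnorm = np.exp(logpost - logpost.max())
numeric = unnorm / (unnorm.sum() * (grid[1] - grid[0]))
exact = stats.gamma.pdf(grid, a=3*n/2, scale=2/ss)      # shape 3n/2, rate ss/2

print(np.max(np.abs(numeric - exact)))        # small, up to grid error
```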
Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\lambda\). Suppose that \(X_{i} \, | \, \lambda \sim \mbox{Exp}(\lambda)\) where \(\lambda\) represents the rate so that \(E(X_{i} \, | \, \lambda) = \lambda^{-1}\).
Show that \(X_{i} \, | \,
\lambda \sim \mbox{Exp}(\lambda)\) is a member of the \(1\)-parameter exponential family. Hence,
write down a sufficient statistic \(t(X)\) for \(X =
(X_{1}, \ldots, X_{n})\) for learning about \(\lambda\).
We write \[\begin{eqnarray*}
f(x_{i} \, | \, \lambda) \ = \ \lambda e^{-\lambda x_{i}} \ = \
\exp\left\{-\lambda x_{i} + \log \lambda \right\}
\end{eqnarray*}\] which is of the form \(\exp\{\phi_{1}(\lambda) u_{1}(x_{i}) + g(\lambda)
+ h(x_{i})\}\) where \(\phi_{1}(\lambda) = - \lambda\), \(u_{1}(x_{i}) = x_{i}\), \(g(\lambda) = \log \lambda\) and \(h(x_{i}) = 0\). From the Proposition given
in Lecture 11 (see p26 of the on-line notes) a sufficient statistic is
\[\begin{eqnarray*}
t(X) & = & \left[n, \sum_{i=1}^{n} u_{1}(X_{i})\right] \ = \
\left[n, \sum_{i=1}^{n} X_{i}\right].
\end{eqnarray*}\]
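The sufficiency claim is easy to see in practice: any two data sets sharing \((n, \sum_{i=1}^{n} x_{i})\) give the same likelihood function for \(\lambda\). A minimal sketch with illustrative data:

```python
# Sketch: the Exp(lambda) likelihood depends on x only through (n, sum(x)).
import numpy as np

def exp_loglik(lam, x):
    """Log-likelihood of i.i.d. Exp(lam) data (rate parameterisation)."""
    return len(x)*np.log(lam) - lam*np.sum(x)

x1 = np.array([0.5, 1.5, 2.0])   # n = 3, sum = 4.0
x2 = np.array([1.0, 1.0, 2.0])   # same sufficient statistic (3, 4.0)

lams = np.linspace(0.1, 5.0, 50)
print(np.allclose([exp_loglik(l, x1) for l in lams],
                  [exp_loglik(l, x2) for l in lams]))   # True
```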
Find the Jeffreys prior and comment upon whether or not
it is improper. Find the posterior distribution for this
prior.
The likelihood is \[\begin{eqnarray}
f(x \, | \, \lambda) \ = \ \prod_{i=1}^{n} \lambda e^{-\lambda x_{i}} \
= \ \lambda^{n} e^{- \lambda n \bar{x}}. \tag{1}
\end{eqnarray}\] As \(\lambda\)
is univariate \[\begin{eqnarray*}
\frac{\partial^{2}}{\partial \lambda^{2}} \log f(x \, | \, \lambda) &
= & \frac{\partial^{2}}{\partial \lambda^{2}} \left(n \log \lambda -
\lambda n \bar{x} \right) \\
& = & -\frac{n}{\lambda^{2}}.
\end{eqnarray*}\] The Fisher information is \[\begin{eqnarray*}
I(\lambda) \ = \ -E\left.\left(-\frac{n}{\lambda^{2}} \, \right| \,
\lambda \right) \ = \ \frac{n}{\lambda^{2}}.
\end{eqnarray*}\] The Jeffreys prior is thus \[\begin{eqnarray}
f(\lambda) \ \propto \ \sqrt{\frac{n}{\lambda^{2}}} \ \propto \
\lambda^{-1}. \tag{2}
\end{eqnarray}\] This is the improper \(\mbox{Gamma}(0, 0)\) distribution. The
posterior can be obtained using the conjugacy of the Gamma distribution
relative to the Exponential likelihood. It is \(\mbox{Gamma}(n, n\bar{x})\). (If you are
not sure why, see Solution Sheet Three Question 1 (a) with \(\alpha = \beta = 0\).)
Consider the transformation \(\phi = \log \lambda\). Find, directly, the
Jeffreys prior for \(\phi\) and confirm that it agrees with that obtained by
transforming the prior (2).
In terms of \(\phi\) the likelihood (1) becomes \[\begin{eqnarray*}
f(x \, | \, \phi) \ = \ e^{n\phi} \exp\left(-e^{\phi} n \bar{x}\right)
\end{eqnarray*}\] so that \[\begin{eqnarray*}
\frac{\partial^{2}}{\partial \phi^{2}} \log f(x \, | \, \phi) \ = \
\frac{\partial^{2}}{\partial \phi^{2}} \left(n\phi - e^{\phi} n \bar{x}\right)
\ = \ -e^{\phi} n \bar{x}.
\end{eqnarray*}\] As \(E(\bar{X} \, | \, \phi) = e^{-\phi}\) the Fisher
information is \[\begin{eqnarray*}
I(\phi) \ = \ e^{\phi} n E(\bar{X} \, | \, \phi) \ = \ n
\end{eqnarray*}\] so that the Jeffreys prior is \(f(\phi) \propto \sqrt{n}
\propto 1\), the improper uniform prior. This agrees with transforming the
prior (2) directly: \(f(\phi) \propto f(\lambda)
\left|\frac{d\lambda}{d\phi}\right| = \lambda^{-1} \times \lambda = 1\),
illustrating the invariance of the Jeffreys prior under reparameterisation.
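The invariance can also be checked symbolically; a sketch, assuming `sympy`:

```python
# Sketch: with phi = log(lambda), the Fisher information for phi is the
# constant n, so the Jeffreys prior for phi is flat.
import sympy as sp

phi, n, xbar = sp.symbols('phi n xbar', positive=True)
lam = sp.exp(phi)
loglik = n*sp.log(lam) - lam*n*xbar                 # likelihood (1) in terms of phi
info = -sp.diff(loglik, phi, 2).subs(xbar, 1/lam)   # E(Xbar | lambda) = 1/lambda
print(sp.simplify(info))                            # n
```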
The Jeffreys prior for Normal distributions. In Lecture 13 we showed that for an exchangeable collection \(X = (X_{1}, \ldots, X_{n})\) with \(X_{i} \, | \, \theta \sim N(\theta, \sigma^{2})\) where \(\sigma^{2}\) is known the Jeffreys prior for \(\theta\) is \(f(\theta) \propto 1\).
Consider, instead, that \(X_{i}
\, | \, \theta \sim N(\mu, \theta)\) where \(\mu\) is known. Find the Jeffreys prior for
\(\theta\).
The
likelihood is \[\begin{eqnarray*}
f(x \, | \, \theta) & = & \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi
\theta}} \exp\left\{-\frac{1}{2\theta} (x_{i} - \mu)^{2}\right\} \\
& = & (2 \pi)^{-\frac{n}{2}} \theta^{-\frac{n}{2}} \exp
\left\{-\frac{1}{2 \theta} \sum_{i=1}^{n} (x_{i} - \mu)^{2}\right\}.
\end{eqnarray*}\] As \(\theta\)
is univariate \[\begin{eqnarray*}
\frac{\partial^{2}}{\partial \theta^{2}} \log f(x \, | \, \theta) &
= & \frac{\partial^{2}}{\partial \theta^{2}} \left\{-\frac{n}{2}
\log 2 \pi - \frac{n}{2} \log \theta - \frac{1}{2 \theta} \sum_{i=1}^{n}
(x_{i} - \mu)^{2} \right\} \\
& = & \frac{n}{2 \theta^{2}} -
\frac{1}{\theta^{3}} \sum_{i=1}^{n} (x_{i} - \mu)^{2}.
\end{eqnarray*}\] The Fisher information is \[\begin{eqnarray*}
I(\theta) & = & - E\left.\left\{\frac{n}{2 \theta^{2}} -
\frac{1}{\theta^{3}} \sum_{i=1}^{n} (X_{i} - \mu)^{2} \, \right| \,
\theta \right\} \\
& = & - \frac{n}{2 \theta^{2}} + \frac{1}{\theta^{3}}
\sum_{i=1}^{n} E\left\{(X_{i} - \mu)^{2} \, | \, \theta\right\} \\
& = & - \frac{n}{2 \theta^{2}} + \frac{n}{\theta^{2}} \ = \
\frac{n}{2 \theta^{2}}.
\end{eqnarray*}\] The Jeffreys prior is then \[\begin{eqnarray*}
f(\theta) & \propto & \sqrt{\frac{n}{2 \theta^{2}}} \ \propto \
\theta^{-1}.
\end{eqnarray*}\]
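A Monte Carlo sanity check of this calculation (a sketch with illustrative values): the variance of the score should equal \(I(\theta) = n/(2\theta^{2})\).

```python
# Sketch: Var(score) ≈ n/(2*theta^2) for the N(mu, theta) model.
import numpy as np

rng = np.random.default_rng(2)
mu, theta, n, reps = 1.0, 2.0, 10, 200_000    # illustrative values
x = rng.normal(mu, np.sqrt(theta), size=(reps, n))
score = -n/(2*theta) + ((x - mu)**2).sum(axis=1)/(2*theta**2)
print(score.var(), n/(2*theta**2))            # both close to 1.25
```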
Now suppose that \(X_{i} \, |
\, \theta \sim N(\mu, \sigma^{2})\) where \(\theta = (\mu, \sigma^{2})\). Find the
Jeffreys prior for \(\theta\).
The likelihood
is \[\begin{eqnarray*}
f(x \, | \, \theta) & = & (2 \pi)^{-\frac{n}{2}}
(\sigma^{2})^{-\frac{n}{2}} \exp \left\{-\frac{1}{2 \sigma^{2}}
\sum_{i=1}^{n} (x_{i} - \mu)^{2}\right\}
\end{eqnarray*}\] so that \[\begin{eqnarray*}
\log f(x \, | \, \theta) & = & -\frac{n}{2} \log 2 \pi -
\frac{n}{2} \log \sigma^{2} - \frac{1}{2 \sigma^{2}} \sum_{i=1}^{n}
(x_{i} - \mu)^{2}.
\end{eqnarray*}\] As \(\theta = (\mu,
\sigma^{2})\) then the Fisher Information matrix is \[\begin{eqnarray*}
I(\theta) & = & -\left(\begin{array}{ll}
E\left.\left\{\frac{\partial^{2}}{\partial \mu^{2}} \log f(x \, | \,
\theta) \, \right| \, \theta \right\} &
E\left.\left\{\frac{\partial^{2}}{\partial \mu \partial \sigma^{2}} \log
f(x \, | \, \theta) \, \right| \, \theta \right\}\\
E\left.\left\{\frac{\partial^{2}}{\partial \mu \partial \sigma^{2}} \log
f(x \, | \, \theta) \, \right| \, \theta \right\} &
E\left.\left\{\frac{\partial^{2}}{\partial (\sigma^{2})^{2}} \log f(x \,
| \, \theta) \, \right| \, \theta \right\}\end{array}\right)
\end{eqnarray*}\] Now, \[\begin{eqnarray*}
\frac{\partial}{\partial \mu} \log f(x \, | \, \theta) & = &
\frac{1}{\sigma^{2} } \sum_{i=1}^{n} (x_{i}- \mu); \\
\frac{\partial}{\partial (\sigma^{2})} \log f(x \, | \, \theta) & =
& -\frac{n}{2\sigma^{2}} + \frac{1}{2\sigma^{4}}\sum_{i=1}^{n}
(x_{i} - \mu)^{2};
\end{eqnarray*}\] so that \[\begin{eqnarray*}
\frac{\partial^{2}}{\partial \mu^{2}} \log f(x \, | \, \theta) & =
& -\frac{n}{\sigma^{2}}; \\
\frac{\partial^{2}}{\partial \mu \partial (\sigma^{2})} \log f(x \, | \,
\theta) & = & - \frac{1}{\sigma^{4}} \sum_{i=1}^{n} (x_{i} -
\mu); \\
\frac{\partial^{2}}{\partial (\sigma^{2})^{2}} \log f(x \, | \, \theta)
& = & \frac{n}{2\sigma^{4}} - \frac{1}{\sigma^{6}}\sum_{i=1}^{n}
(x_{i} - \mu)^{2}.
\end{eqnarray*}\] Noting that \(E(X_{i}
- \mu \, | \, \theta) = 0\) and \(E\{(X_{i} - \mu)^{2} \, | \, \theta\} =
\sigma^{2}\), the Fisher information matrix is \[\begin{eqnarray*}
I(\theta) & = & -\left(\begin{array}{cc} -\frac{n}{\sigma^{2}}
& 0 \\
0 & \frac{n}{2 \sigma^{4}} - \frac{n}{\sigma^{4}} \end{array}
\right) \ = \ \left(\begin{array}{cc} \frac{n}{\sigma^{2}} & 0 \\
0 & \frac{n}{2 \sigma^{4}} \end{array} \right).
\end{eqnarray*}\] The Jeffreys prior is \[\begin{eqnarray*}
f(\theta) & \propto & \sqrt{|I(\theta)|} \\
& = & \sqrt{\frac{n^{2}}{2\sigma^{6}}} \ \propto \ \sigma^{-3}.
\end{eqnarray*}\]
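The matrix calculation can be reproduced symbolically; a sketch assuming `sympy`, taking expectations by substituting \(E(x_{i} - \mu) = 0\) and \(E\{(x_{i} - \mu)^{2}\} = \sigma^{2}\):

```python
# Sketch: Fisher information matrix for N(mu, sigma2), both unknown.
import sympy as sp

mu, s2, n = sp.symbols('mu sigma2 n', positive=True)
x = sp.symbols('x_i', real=True)
logf = -sp.Rational(1, 2)*sp.log(2*sp.pi*s2) - (x - mu)**2/(2*s2)

d_mm = sp.diff(logf, mu, 2)                          # -1/sigma2 (constant)
d_ms = sp.diff(logf, mu, s2).subs(x, mu)             # E(x - mu) = 0
d_ss = sp.diff(logf, s2, 2).subs((x - mu)**2, s2)    # E{(x - mu)^2} = sigma2

info = -n*sp.Matrix([[d_mm, d_ms], [d_ms, d_ss]])
print(sp.simplify(info))        # diag(n/sigma2, n/(2*sigma2**2))
print(sp.sqrt(info.det()))      # ∝ sigma2**(-3/2), i.e. sigma**(-3)
```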
Comment upon your answers for these three Normal
cases.
Suppose \(\theta =
(\mu, \sigma^{2})\) and \(X_{i} \, | \,
\theta \sim N(\mu, \sigma^{2})\). If \(\mu\) is unknown and \(\sigma^{2}\) known then the Jeffreys prior
is \(f(\mu) \propto 1\). If \(\mu\) is known and \(\sigma^{2}\) is unknown then the Jeffreys
prior is \(f(\sigma^{2}) \propto
\sigma^{-2}\). When both \(\mu\)
and \(\sigma^{2}\) are unknown then the
Jeffreys prior is \(f(\mu, \sigma^{2}) \propto
\sigma^{-3}\). Jeffreys himself found this inconsistent, arguing
that \(f(\mu, \sigma^{2}) \propto
\sigma^{-2}\), the product of the priors \(f(\mu)\) and \(f(\sigma^{2})\). Jeffreys’ argument was
that ignorance about \(\mu\) and \(\sigma^{2}\) should be represented by
independent ignorance priors for \(\mu\) and \(\sigma^{2}\) separately. However, it is not
clear under what circumstances this prior judgement of independence
should be imposed.
Consider, given \(\theta\), a sequence of independent Bernoulli trials with parameter \(\theta\). We wish to make inferences about \(\theta\) and consider two possible methods. In the first, we carry out \(n\) trials and let \(X\) denote the total number of successes in these trials. Thus, \(X \, | \, \theta \sim \mbox{Bin}(n, \theta)\) with \[\begin{eqnarray*} f_{X}(x \, | \, \theta) & = & \binom{n}{x} \theta^{x}(1- \theta)^{n-x}, \ \ x = 0, 1, \ldots, n. \end{eqnarray*}\] In the second method, we count the total number \(Y\) of trials up to and including the \(r\)th success so that \(Y \, | \, \theta \sim \mbox{Nbin}(r, \theta)\), the negative binomial distribution, with \[\begin{eqnarray*} f_{Y}(y \, | \, \theta) & = & \binom{y-1}{r-1} \theta^{r}(1- \theta)^{y-r}, \ \ y = r, r+1, \ldots. \end{eqnarray*}\]
Obtain the Jeffreys prior distribution for each of the
two methods. You may find it useful to note that \(E(Y \, | \, \theta) =
\frac{r}{\theta}\).
For \(X \, | \, \theta \sim \mbox{Bin}(n,
\theta)\) we have \[\begin{eqnarray*}
\log f_{X}(x \, | \, \theta) & = & \log \binom{n}{x} + x \log
\theta + (n-x) \log (1 - \theta)
\end{eqnarray*}\] so that \[\begin{eqnarray*}
\frac{\partial^{2}}{\partial \theta^{2}} \log f_{X}(x \, | \, \theta)
& = & - \frac{x}{\theta^{2}} - \frac{(n-x)}{(1-\theta)^{2}}.
\end{eqnarray*}\] The Fisher information is \[\begin{eqnarray*}
I(\theta) & = & -E\left.\left\{- \frac{X}{\theta^{2}} -
\frac{(n-X)}{(1-\theta)^{2}} \, \right| \, \theta \right\} \\
& = & \frac{n}{\theta} + \frac{n}{1-\theta} \ = \
\frac{n}{\theta(1-\theta)}.
\end{eqnarray*}\] The Jeffreys prior is thus \[\begin{eqnarray*}
f(\theta) & \propto & \sqrt{\frac{n}{\theta(1-\theta)}} \
\propto \ \theta^{-\frac{1}{2}}(1-\theta)^{-\frac{1}{2}}
\end{eqnarray*}\] which is a kernel of the \(\mbox{Beta}(\frac{1}{2}, \frac{1}{2})\)
distribution.
For \(Y \, | \, \theta
\sim \mbox{Nbin}(r, \theta)\) we have \[\begin{eqnarray*}
\log f_{Y}(y \, | \, \theta) & = & \log \binom{y-1}{r-1} + r
\log \theta + (y-r) \log (1- \theta)
\end{eqnarray*}\] so that \[\begin{eqnarray*}
\frac{\partial^{2}}{\partial \theta^{2}} \log f_{Y}(y \, | \, \theta)
& = & - \frac{r}{\theta^{2}} - \frac{(y-r)}{(1-\theta)^{2}}.
\end{eqnarray*}\] The Fisher information is \[\begin{eqnarray*}
I(\theta) & = & -E\left.\left\{- \frac{r}{\theta^{2}} -
\frac{(Y-r)}{(1-\theta)^{2}} \, \right| \, \theta \right\} \\
& = & \frac{r}{\theta^{2}} + \frac{r(\frac{1}{\theta} -
1)}{(1-\theta)^{2}} \\
& = & \frac{r}{\theta^{2}} + \frac{r}{\theta(1-\theta)} \ = \
\frac{r}{\theta^{2}(1-\theta)}.
\end{eqnarray*}\] The Jeffreys prior is thus \[\begin{eqnarray*}
f(\theta) & \propto & \sqrt{\frac{r}{\theta^{2}(1-\theta)}} \
\propto \ \theta^{-1}(1-\theta)^{-\frac{1}{2}}
\end{eqnarray*}\] which can be viewed as the improper \(\mbox{Beta}(0, \frac{1}{2})\)
distribution.
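Again a symbolic check is straightforward (a sketch, assuming `sympy` and using \(E(Y \, | \, \theta) = r/\theta\)):

```python
# Sketch: Fisher information for the negative binomial model.
import sympy as sp

theta, r, y = sp.symbols('theta r y', positive=True)
loglik = r*sp.log(theta) + (y - r)*sp.log(1 - theta)   # binomial coefficient drops out
info = -sp.diff(loglik, theta, 2).subs(y, r/theta)     # E(Y | theta) = r/theta
print(sp.simplify(info))        # simplifies to r/(theta**2*(1 - theta))
```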
Suppose we observe \(x =
r\) and \(y = n\). For each
method, calculate the posterior distribution for \(\theta\) with the Jeffreys prior. Comment
upon your answers.
We summarise the results in a
table. \[\begin{eqnarray*}
& \begin{array}{cccc}
\mbox{Prior} & \mbox{Sampling model} & \mbox{Likelihood kernel} &
\mbox{Posterior} \\ \hline
\mbox{Beta}(\frac{1}{2}, \frac{1}{2}) & X \, | \, \theta \sim
\mbox{Bin}(n, \theta) & \theta^{x}(1-\theta)^{n-x} &
\mbox{Beta}(\frac{1}{2} + x, \frac{1}{2} + n -x)\\
\mbox{Beta}(0, \frac{1}{2}) & Y \, | \, \theta \sim \mbox{Nbin}(r,
\theta) & \theta^{r}(1-\theta)^{y-r} & \mbox{Beta}(r,
\frac{1}{2} + y -r) \\
\end{array}
\end{eqnarray*}\] Notice that if \(x =
r\) and \(y = n\) then the two
approaches have proportional likelihoods: in both cases we observed
\(x\) successes in \(n\) trials but \(\theta \, | \, x \sim \mbox{Beta}(\frac{1}{2} + x,
\frac{1}{2} + n -x)\) and \(\theta \, |
\, y \sim \mbox{Beta}(x, \frac{1}{2} + n -x)\). Although the
observed data are the same, Jeffreys’ approach yields different posterior
distributions, which seems to contradict the notion of a noninformative
prior. This occurs because Jeffreys’ prior violates the
likelihood principle. In short, this principle states
that all of the information the data \(x\) provide about \(\theta\) is contained in the likelihood function, so that
two likelihoods contain the same information if they are proportional. In this case, the two
likelihoods are proportional but yield different posterior
distributions. Many classical methods, such as confidence intervals,
violate the likelihood principle but Bayesian statistics (using proper
prior distributions) does not.
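A short numerical illustration of this point (a sketch with illustrative values):

```python
# Sketch: proportional likelihoods (x = r successes in y = n trials) but
# different Jeffreys posteriors under the two sampling schemes.
from scipy import stats

n, x = 12, 3                 # binomial: x successes in n trials
r, y = x, n                  # negative binomial: y trials to the r-th success

binom_post = stats.beta(0.5 + x, 0.5 + n - x)   # Beta(3.5, 9.5)
nbin_post = stats.beta(r, 0.5 + y - r)          # Beta(3.0, 9.5)

print(binom_post.mean(), nbin_post.mean())      # ~0.269 versus 0.240
```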