## Solution Sheet Four

### Question 1

Let $$X_{1}, \ldots, X_{n}$$ be exchangeable so that the $$X_{i}$$ are conditionally independent given a parameter $$\theta$$. For each of the following distributions:

1. $$X_{i} \, | \, \theta \sim Bern(\theta)$$.

(a) Show that $$f(x_{i} \, | \, \theta)$$ belongs to the 1-parameter exponential family and for $$X = (X_{1}, \ldots, X_{n})$$ state the sufficient statistic for learning about $$\theta$$.

Notice that we can write $\begin{eqnarray*} f(x_{i} \, | \, \theta) & = & \theta^{x_{i}}(1-\theta)^{1-x_{i}} \\ & = & \exp \left\{\left( \log \frac{\theta}{1-\theta}\right) x_{i} + \log (1 - \theta)\right\} \end{eqnarray*}$ so that $$f(x_{i} \, | \, \theta)$$ belongs to the $$1$$-parameter exponential family with $$\phi_{1}(\theta) = \log \frac{\theta}{1-\theta}$$, $$u_{1}(x_{i}) = x_{i}$$, $$g(\theta) = \log (1 - \theta)$$ and $$h(x_{i}) = 0$$. Notice that, from Proposition 1 (see Lecture 11), $$t_{n} = [n, \sum_{i=1}^{n} X_{i}]$$ is a sufficient statistic.

(b) By viewing the likelihood as a function of $$\theta$$, which generic family of distributions (over $$\theta$$) is the likelihood a kernel of?

The likelihood, without expressing it in the explicit exponential family form, is $\begin{eqnarray*} f(x \, | \, \theta) & = & \theta^{n\bar{x}}(1 - \theta)^{n - n\bar{x}} \end{eqnarray*}$ which, viewed as a function of $$\theta$$, we immediately recognise as a Beta kernel (in particular, that of a $$Beta(n\bar{x}+1, n - n\bar{x}+1)$$).

(c) By first finding the corresponding posterior distribution for $$\theta$$ given $$x = (x_{1}, \ldots, x_{n})$$, show that this family of distributions is conjugate with respect to the likelihood $$f(x \, | \, \theta)$$.

Taking $$\theta \sim Beta(\alpha, \beta)$$ we have that
$\begin{eqnarray*} f(\theta \, | \, x) & \propto & \theta^{n\bar{x}}(1 - \theta)^{n - n\bar{x}} \times \theta^{\alpha - 1}(1 - \theta)^{\beta - 1} \\ & = & \theta^{\alpha + n\bar{x} - 1}(1-\theta)^{\beta + n - n\bar{x} - 1} \end{eqnarray*}$ so that $$\theta \, | \, x \sim Beta(\alpha + n\bar{x}, \beta + n - n\bar{x})$$. Thus, the prior and the posterior are in the same family giving conjugacy.
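The conjugacy can be sanity-checked numerically: the log of (likelihood $$\times$$ prior) should differ from the log of the stated Beta posterior density by the same constant (the log marginal likelihood) at every $$\theta$$. A minimal Python sketch, where the data counts and hyperparameter values are purely illustrative:

```python
import math

def beta_logpdf(theta, a, b):
    """Log density of a Beta(a, b) distribution."""
    log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta) - log_B

# Illustrative data: n = 10 Bernoulli trials with s = 7 successes,
# and hypothetical prior hyperparameters alpha, beta.
n, s = 10, 7
alpha, beta = 3.0, 2.0
a_post, b_post = alpha + s, beta + n - s

# Conjugacy: log(likelihood) + log(prior) - log(posterior) should be
# the same constant (the log marginal likelihood) at every theta.
diffs = [s * math.log(t) + (n - s) * math.log(1 - t)
         + beta_logpdf(t, alpha, beta) - beta_logpdf(t, a_post, b_post)
         for t in (0.1, 0.3, 0.5, 0.7, 0.9)]
assert max(diffs) - min(diffs) < 1e-10
```

The constant difference is precisely $$\log f(x)$$, which cancels in the normalised posterior.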

**Deriving the results directly from the exponential family representation**

Expressed in the 1-parameter exponential family form the likelihood is $\begin{eqnarray*} f(x \, | \, \theta) & = & \exp \left\{\left( \log \frac{\theta}{1-\theta}\right) \sum_{i=1}^{n} x_{i} + n\log (1 - \theta)\right\} \end{eqnarray*}$ from which we immediately observe the sufficient statistic $$t_{n} = [n, \sum_{i=1}^{n} x_{i}]$$. Viewing $$f(x \, | \, \theta)$$ as a function of $$\theta$$ the natural conjugate prior is a member of the $$2$$-parameter exponential family of the form $\begin{eqnarray*} f(\theta) & = & \exp \left\{a\left( \log \frac{\theta}{1-\theta}\right) + d\log (1 - \theta) + c(a, d)\right\} \end{eqnarray*}$ where $$c(a, d)$$ is the normalising constant. Hence, $\begin{eqnarray*} f(\theta) & \propto & \exp \left\{a\left( \log \frac{\theta}{1-\theta}\right) + d\log (1 - \theta) \right\} \nonumber \\ & = & \theta^{a}(1-\theta)^{d-a} \label{eq1a3} \end{eqnarray*}$ which we recognise as a kernel of a Beta distribution. The convention is to label the hyperparameters as $$\alpha$$ and $$\beta$$ so that we put $$\alpha = \alpha(a, d) = a + 1$$ and $$\beta = \beta(a, d) = d - a +1$$ (equivalently, $$a = a(\alpha, \beta) = \alpha - 1$$, $$d = d(\alpha, \beta) = \beta + \alpha -2$$). The conjugate prior distribution is $$\theta \sim Beta(\alpha, \beta)$$.

2. Let $$X_{i} \, | \, \theta \sim N(\mu, \theta)$$ with $$\mu$$ known.

(a) Show that $$f(x_{i} \, | \, \theta)$$ belongs to the 1-parameter exponential family and for $$X = (X_{1}, \ldots, X_{n})$$ state the sufficient statistic for learning about $$\theta$$.

Writing the normal density as an exponential family (the parameter is $$\theta$$ since $$\mu$$ is a known constant) we have $\begin{eqnarray*} f(x_{i} \, | \, \theta) & = & \exp\left\{-\frac{1}{2\theta} (x_{i} - \mu)^{2} - \frac{1}{2}\log \theta - \log \sqrt{2\pi} \right\} \end{eqnarray*}$ so that $$f(x_{i} \, | \, \theta)$$ belongs to the 1-parameter exponential family. The sufficient statistic is $$t_{n} = [n, \sum_{i=1}^{n}(x_{i} - \mu)^{2}]$$. Note that, expressed explicitly as a 1-parameter exponential family, the likelihood for $$x = (x_{1}, \ldots, x_{n})$$ is $\begin{eqnarray*} f(x \, | \, \theta) & = & \exp\left\{-\frac{1}{2\theta} \sum_{i=1}^{n} (x_{i} - \mu)^{2} - \frac{n}{2}\log \theta - n\log \sqrt{2\pi} \right\} \end{eqnarray*}$ so that the natural conjugate prior has the form $\begin{eqnarray*} f(\theta) & = & \exp\left\{-a \frac{1}{\theta} - d \log \theta + c(a, d)\right\} \\ & \propto & \theta^{-d}\exp\left\{-a\frac{1}{\theta}\right\} \end{eqnarray*}$ which we recognise as a kernel of an Inverse-Gamma distribution.

(b) By viewing the likelihood as a function of $$\theta$$, which generic family of distributions (over $$\theta$$) is the likelihood a kernel of?

In conventional form, $\begin{eqnarray*} f(x \, | \, \theta) & \propto & \theta^{-\frac{n}{2}} \exp \left\{-\frac{1}{2\theta}\sum_{i=1}^{n} (x_{i}- \mu)^{2}\right\} \end{eqnarray*}$ which, viewing $$f(x \, | \, \theta)$$ as a function of $$\theta$$, we recognise as a kernel of an Inverse-Gamma distribution (in particular, an $$\mbox{Inv-gamma}(\frac{n-2}{2}, \frac{1}{2}\sum_{i=1}^{n} (x_{i}- \mu)^{2})$$).

(c) By first finding the corresponding posterior distribution for $$\theta$$ given $$x = (x_{1}, \ldots, x_{n})$$, show that this family of distributions is conjugate with respect to the likelihood $$f(x \, | \, \theta)$$.

Taking $$\theta \sim \mbox{Inv-gamma}(\alpha, \beta)$$ we have $\begin{eqnarray*} f(\theta \, | \, x) & \propto & \theta^{-\frac{n}{2}} \exp \left\{-\frac{1}{2\theta}\sum_{i=1}^{n} (x_{i}- \mu)^{2}\right\} \times \theta^{-(\alpha + 1)}\exp\left\{-\frac{\beta}{\theta}\right\} \\ & = & \theta^{-(\alpha + \frac{n}{2} + 1)}\exp\left\{-\left(\beta + \frac{1}{2}\sum_{i=1}^{n} (x_{i} - \mu)^{2}\right)\frac{1}{\theta}\right\} \end{eqnarray*}$ which we recognise as a kernel of an Inverse-Gamma distribution so that $$\theta \, | \, x \sim \mbox{Inv-gamma}(\alpha + \frac{n}{2}, \beta + \frac{1}{2} \sum_{i=1}^{n} (x_{i} - \mu)^{2})$$. Hence, the prior and posterior are in the same family giving conjugacy.
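This update can be verified numerically: the likelihood times the prior must match the stated Inverse-Gamma posterior up to a constant in $$\theta$$. A Python sketch with simulated data and hypothetical hyperparameter values:

```python
import math
import random

def invgamma_logpdf(theta, a, b):
    """Log density of an Inv-gamma(a, b) distribution."""
    return a * math.log(b) - math.lgamma(a) - (a + 1) * math.log(theta) - b / theta

random.seed(1)
mu, n = 2.0, 20                       # mu is the known mean
x = [random.gauss(mu, 1.5) for _ in range(n)]   # illustrative data
ss = sum((xi - mu) ** 2 for xi in x)

alpha, beta = 3.0, 4.0                # hypothetical prior hyperparameters
a_post, b_post = alpha + n / 2, beta + ss / 2

# log(likelihood) + log(prior) - log(posterior) is constant in theta
diffs = [-(n / 2) * math.log(t) - ss / (2 * t)
         + invgamma_logpdf(t, alpha, beta) - invgamma_logpdf(t, a_post, b_post)
         for t in (0.5, 1.0, 2.0, 4.0)]
assert max(diffs) - min(diffs) < 1e-9
```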

3. Let $$X_{i} \, | \, \theta \sim Maxwell(\theta)$$, the Maxwell distribution with parameter $$\theta$$ so that $\begin{eqnarray*} f(x_{i} \, | \, \theta) = \left(\frac{2}{\pi}\right)^{\frac{1}{2}}\theta^{\frac{3}{2}}x_{i}^{2}\exp\left\{-\frac{\theta x_{i}^{2}}{2}\right\}, \ \ x_{i} > 0 \end{eqnarray*}$ and $$E(X_{i} \, | \, \theta) = 2\sqrt{\frac{2}{\pi \theta}}$$, $$Var(X_{i} \, | \, \theta) = \frac{3\pi - 8}{\pi \theta}$$.

(a) Show that $$f(x_{i} \, | \, \theta)$$ belongs to the 1-parameter exponential family and for $$X = (X_{1}, \ldots, X_{n})$$ state the sufficient statistic for learning about $$\theta$$.

Writing the Maxwell density in exponential family form we have $\begin{eqnarray*} f(x_{i} \, | \, \theta) & = & \exp\left\{-\theta \frac{x_{i}^{2}}{2} + \frac{3}{2} \log \theta + \log x_{i}^{2} + \frac{1}{2} \log \frac{2}{\pi}\right\} \end{eqnarray*}$ so that $$f(x_{i} \, | \, \theta)$$ belongs to the 1-parameter exponential family. The sufficient statistic is $$t_{n} = [n, \sum_{i=1}^{n}x_{i}^{2}]$$. Note that, expressed explicitly as a 1-parameter exponential family, the likelihood for $$x = (x_{1}, \ldots, x_{n})$$ is $\begin{eqnarray*} f(x \, | \, \theta) & = & \exp\left\{-\theta \sum_{i=1}^{n} \frac{x_{i}^{2}}{2} + \frac{3n}{2} \log \theta + \sum_{i=1}^{n}\log x_{i}^{2} + \frac{n}{2} \log \frac{2}{\pi}\right\} \end{eqnarray*}$ so that the natural conjugate prior has the form $\begin{eqnarray*} f(\theta) & = & \exp\left\{-a \theta + d \log \theta + c(a, d)\right\} \\ & \propto & \theta^{d} e^{-a \theta} \end{eqnarray*}$ which we recognise as a kernel of a Gamma distribution.

(b) By viewing the likelihood as a function of $$\theta$$, which generic family of distributions (over $$\theta$$) is the likelihood a kernel of?

In conventional form, $\begin{eqnarray*} f(x \, | \, \theta) & = & \left(\frac{2}{\pi}\right)^{\frac{n}{2}}\theta^{\frac{3n}{2}}\left(\prod_{i=1}^{n} x_{i}^{2}\right)\exp\left\{-\left(\frac{\sum_{i=1}^{n} x_{i}^{2}}{2}\right)\theta\right\} \\ & \propto & \theta^{\frac{3n}{2}}\exp\left\{-\left(\frac{\sum_{i=1}^{n} x_{i}^{2}}{2}\right)\theta\right\} \end{eqnarray*}$ which, viewing $$f(x \, | \, \theta)$$ as a function of $$\theta$$, we recognise as a kernel of a Gamma distribution (in particular, $$\mbox{Gamma}(\frac{3n+2}{2}, \frac{1}{2}\sum_{i=1}^{n} x_{i}^{2})$$).

(c) By first finding the corresponding posterior distribution for $$\theta$$ given $$x = (x_{1}, \ldots, x_{n})$$, show that this family of distributions is conjugate with respect to the likelihood $$f(x \, | \, \theta)$$.

Taking $$\theta \sim \mbox{Gamma}(\alpha, \beta)$$ we have $\begin{eqnarray*} f(\theta \, | \, x) & \propto & \theta^{\frac{3n}{2}}\exp\left\{-\left(\frac{\sum_{i=1}^{n} x_{i}^{2}}{2}\right)\theta\right\} \times \theta^{\alpha -1}e^{-\beta \theta} \\ & = & \theta^{\alpha + \frac{3n}{2} - 1} \exp\left\{-\left(\beta + \frac{1}{2}\sum_{i=1}^{n} x_{i}^{2}\right)\theta\right\} \end{eqnarray*}$ which, of course, is a kernel of a Gamma distribution so that $$\theta \, | \, x \sim \mbox{Gamma}(\alpha + \frac{3n}{2}, \beta + \frac{1}{2}\sum_{i=1}^{n}x_{i}^{2})$$. The prior and the posterior are in the same family giving conjugacy.
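A quick numerical check of the Gamma update, using illustrative observations and hypothetical hyperparameter values:

```python
import math

def gamma_logpdf(theta, a, b):
    """Log density of a Gamma(a, b) distribution (rate parameterisation)."""
    return a * math.log(b) - math.lgamma(a) + (a - 1) * math.log(theta) - b * theta

x = [0.8, 1.3, 2.1, 0.5, 1.7]         # illustrative Maxwell observations
n, s2 = len(x), sum(xi ** 2 for xi in x)
alpha, beta = 2.0, 1.0                # hypothetical prior hyperparameters
a_post, b_post = alpha + 3 * n / 2, beta + s2 / 2

# log(likelihood) + log(prior) - log(posterior) is constant in theta
diffs = [(3 * n / 2) * math.log(t) - t * s2 / 2
         + gamma_logpdf(t, alpha, beta) - gamma_logpdf(t, a_post, b_post)
         for t in (0.2, 0.5, 1.0, 2.0)]
assert max(diffs) - min(diffs) < 1e-10
```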

### Question 2

Let $$X_{1}, \ldots, X_{n}$$ be exchangeable so that the $$X_{i}$$ are conditionally independent given a parameter $$\theta$$. Suppose that $$X_{i} \, | \, \theta$$ is geometrically distributed with probability mass function $\begin{eqnarray*} f(x_{i} \, | \, \theta) & = & (1-\theta)^{x_{i}-1}\theta, \ \ x_{i} = 1, 2, \ldots. \end{eqnarray*}$

(a) Show that $$f(x \, | \, \theta)$$, where $$x = (x_{1}, \ldots, x_{n})$$, belongs to the $$1$$-parameter exponential family. Hence, or otherwise, find the conjugate prior distribution and corresponding posterior distribution for $$\theta$$.

As the $$X_{i}$$ are conditionally independent given $$\theta$$ then $\begin{eqnarray*} f(x \, | \, \theta) & = & \prod_{i=1}^{n} f(x_{i} \, | \, \theta) \\ & = & \prod_{i=1}^{n} (1-\theta)^{x_{i}-1}\theta \\ & = & (1 - \theta)^{n\bar{x} -n}\theta^{n} \\ & = & \exp\left\{(n\bar{x} - n)\log (1 - \theta) + n \log \theta \right\} \end{eqnarray*}$ and so belongs to the $$1$$-parameter exponential family. The conjugate prior is of the form $\begin{eqnarray*} f(\theta) & \propto & \exp\left\{ a\log (1 - \theta) + b \log \theta \right\} \\ & = & \theta^{b}(1-\theta)^{a} \end{eqnarray*}$ which is a kernel of a Beta distribution. Letting $$\alpha = b+1$$ and $$\beta = a+1$$ we have $$\theta \sim Beta(\alpha, \beta)$$. Then $\begin{eqnarray*} f(\theta \, | \, x) & \propto & f(x \, | \, \theta)f(\theta) \\ & \propto & \theta^{n}(1-\theta)^{(n\bar{x} - n)}\theta^{\alpha-1}(1-\theta)^{\beta-1} \end{eqnarray*}$ which is a kernel of a $$Beta(\alpha + n, \beta + n\bar{x} - n)$$ so that $$\theta \, | \, x \sim Beta(\alpha + n, \beta + n\bar{x} - n)$$.
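As a numerical check, likelihood times prior should match the stated Beta posterior up to a constant in $$\theta$$; a sketch with illustrative observations and hypothetical hyperparameters:

```python
import math

x = [1, 3, 2, 5, 1, 2]                # illustrative geometric observations
n, s = len(x), sum(x)
alpha, beta = 2.0, 2.0                # hypothetical prior hyperparameters
a_post, b_post = alpha + n, beta + s - n

def beta_logkernel(theta, a, b):
    """Log kernel of a Beta(a, b) distribution (normalising constant dropped)."""
    return (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)

# log(likelihood) + log(prior kernel) - log(posterior kernel) is constant
diffs = [n * math.log(t) + (s - n) * math.log(1 - t)
         + beta_logkernel(t, alpha, beta) - beta_logkernel(t, a_post, b_post)
         for t in (0.1, 0.3, 0.5, 0.8)]
assert max(diffs) - min(diffs) < 1e-12
```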

(b) Show that the posterior mean for $$\theta$$ can be written as a weighted average of the prior mean of $$\theta$$ and the maximum likelihood estimate, $$\bar{x}^{-1}$$.

$\begin{eqnarray*} E(\theta \, | \, X) & = & \frac{\alpha + n}{(\alpha + n) + (\beta + n\bar{x} - n)} \\ & = & \frac{\alpha + n}{\alpha + \beta + n\bar{x}} \\ & = & \left(\frac{\alpha + \beta}{\alpha + \beta + n\bar{x}}\right)\left(\frac{\alpha}{\alpha+\beta}\right) + \left(\frac{n\bar{x}}{\alpha + \beta + n\bar{x}}\right)\left(\frac{1}{\bar{x}}\right) \\ & = & \lambda E(\theta) + (1-\lambda)\bar{x}^{-1} \end{eqnarray*}$ where $$\lambda = \frac{\alpha + \beta}{\alpha + \beta + n\bar{x}}$$ and $$E(\theta) = \frac{\alpha}{\alpha + \beta}$$ is the prior mean.
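The identity can be confirmed numerically for arbitrary (illustrative) hyperparameter and data values:

```python
alpha, beta = 2.0, 3.0                # hypothetical prior hyperparameters
n, xbar = 12, 2.5                     # illustrative data summary

post_mean = (alpha + n) / (alpha + beta + n * xbar)
lam = (alpha + beta) / (alpha + beta + n * xbar)
prior_mean = alpha / (alpha + beta)
mle = 1 / xbar                        # maximum likelihood estimate 1/xbar

# Posterior mean equals the lambda-weighted average of prior mean and MLE
assert abs(post_mean - (lam * prior_mean + (1 - lam) * mle)) < 1e-12
assert 0 < lam < 1                    # so it is genuinely a weighted average
```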

(c) Suppose now that the prior for $$\theta$$ is instead given by the probability density function $\begin{eqnarray*} f(\theta) & = & \frac{1}{2B(\alpha+1, \beta)}\theta^{\alpha}(1-\theta)^{\beta - 1} + \frac{1}{2B(\alpha, \beta+1)}\theta^{\alpha-1}(1-\theta)^{\beta}, \end{eqnarray*}$ where $$B(\alpha, \beta)$$ denotes the Beta function evaluated at $$\alpha$$ and $$\beta$$. Show that the posterior probability density function can be written as $\begin{eqnarray*} f(\theta \, | \, x) & = & \lambda f_{1}(\theta) + (1 - \lambda) f_{2}(\theta) \end{eqnarray*}$ where $\begin{eqnarray*} \lambda & = & \frac{(\alpha + n)\beta}{(\alpha + n)\beta + (\beta -n + \sum_{i=1}^{n} x_{i})\alpha} \end{eqnarray*}$ and $$f_{1}(\theta)$$ and $$f_{2}(\theta)$$ are probability density functions.

$\begin{eqnarray*} f(\theta \, | \, x) & \propto & f(x \, | \, \theta)f(\theta) \\ & = & \theta^{n}(1-\theta)^{(n\bar{x} - n)}\left\{\frac{\theta^{\alpha}(1-\theta)^{\beta-1}}{B(\alpha+1, \beta)} + \frac{\theta^{\alpha-1}(1-\theta)^{\beta}}{B(\alpha, \beta+1)}\right\} \\ & = & \frac{\theta^{\alpha_{1}}(1-\theta)^{\beta_{1}-1}}{B(\alpha+1, \beta)} + \frac{\theta^{\alpha_{1}-1}(1-\theta)^{\beta_{1}}}{B(\alpha, \beta+1)} \end{eqnarray*}$ where $$\alpha_{1} = \alpha +n$$ and $$\beta_{1} = \beta + n\bar{x} -n$$. Finding the constant of proportionality we observe that $$\theta^{\alpha_{1}}(1-\theta)^{\beta_{1}-1}$$ is a kernel of a $$Beta(\alpha_{1}+1, \beta_{1})$$ and $$\theta^{\alpha_{1}-1}(1-\theta)^{\beta_{1}}$$ is a kernel of a $$Beta(\alpha_{1},\beta_{1}+1)$$. So, $\begin{eqnarray*} f(\theta \, | \, x) & = & c\left\{\frac{B(\alpha_{1}+1,\beta_{1})}{B(\alpha+1,\beta)}f_{1}(\theta) + \frac{B(\alpha_{1},\beta_{1}+1)}{B(\alpha,\beta+1)}f_{2}(\theta)\right\} \end{eqnarray*}$ where $$f_{1}(\theta)$$ is the density function of $$Beta(\alpha_{1}+1, \beta_{1})$$ and $$f_{2}(\theta)$$ the density function of $$Beta(\alpha_{1},\beta_{1}+1)$$. 
Hence, $\begin{eqnarray*} c^{-1} & = & \frac{B(\alpha_{1}+1,\beta_{1})}{B(\alpha+1,\beta)} + \frac{B(\alpha_{1},\beta_{1}+1)}{B(\alpha,\beta+1)} \end{eqnarray*}$ so that $$f(\theta \, | \, x) = \lambda f_{1}(\theta) + (1-\lambda)f_{2}(\theta)$$ with $\begin{eqnarray*} \lambda & = & \frac{\frac{B(\alpha_{1}+1,\beta_{1})}{B(\alpha+1,\beta)}}{\frac{B(\alpha_{1}+1,\beta_{1})}{B(\alpha+1,\beta)} + \frac{B(\alpha_{1},\beta_{1}+1)}{B(\alpha,\beta+1)}} \\ & = & \frac{\frac{\alpha_{1}(\alpha + \beta)B(\alpha_{1},\beta_{1})}{\alpha(\alpha_{1}+ \beta_{1})B(\alpha,\beta)}}{\frac{\alpha_{1}(\alpha + \beta)B(\alpha_{1},\beta_{1})}{\alpha(\alpha_{1}+\beta_{1})B(\alpha,\beta)} + \frac{\beta_{1}(\alpha+\beta)B(\alpha_{1},\beta_{1})}{\beta(\alpha_{1}+\beta_{1})B(\alpha,\beta)}} \\ & = & \frac{\alpha_{1}\beta}{\alpha_{1}\beta + \beta_{1}\alpha} \\ & = & \frac{(\alpha + n)\beta}{(\alpha + n)\beta + (\beta + \sum_{i=1}^{n} x_{i}-n)\alpha}. \end{eqnarray*}$
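The closed form for $$\lambda$$ can be checked against the ratio of Beta functions directly; a sketch with illustrative values of the hyperparameters and data:

```python
import math

def B(a, b):
    """Beta function, computed on the log scale for numerical stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

alpha, beta = 2.0, 3.0                # hypothetical prior hyperparameters
n, sx = 5, 17                         # illustrative n and sum of the x_i
a1, b1 = alpha + n, beta + sx - n     # alpha_1 and beta_1 above

# Mixture weights as ratios of Beta functions, as in the derivation
w1 = B(a1 + 1, b1) / B(alpha + 1, beta)
w2 = B(a1, b1 + 1) / B(alpha, beta + 1)
lam_from_ratios = w1 / (w1 + w2)

# The stated closed form for lambda
lam_closed_form = ((alpha + n) * beta
                   / ((alpha + n) * beta + (beta + sx - n) * alpha))
assert abs(lam_from_ratios - lam_closed_form) < 1e-9
```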

### Question 3

Let $$X_{1}, \ldots, X_{n}$$ be exchangeable so that the $$X_{i}$$ are conditionally independent given a parameter $$\theta$$. Suppose that $$X_{i} \, | \, \theta$$ is distributed as a double-exponential distribution with probability density function $\begin{eqnarray*} f(x_{i} \, | \, \theta) & = & \frac{1}{2\theta} \exp \left\{- \frac{|x_{i}|}{\theta}\right\}, \ \ -\infty < x_{i} < \infty \end{eqnarray*}$ for $$\theta > 0$$.

(a) Find the conjugate prior distribution and corresponding posterior distribution for $$\theta$$ following observation of $$x = (x_{1}, \ldots, x_{n})$$.

$\begin{eqnarray*} f(x \, | \, \theta) & = & \prod_{i=1}^{n} \frac{1}{2\theta} \exp \left\{- \frac{|x_{i}|}{\theta}\right\} \\ & \propto & \frac{1}{\theta^{n}} \exp \left\{- \frac{1}{\theta} \sum_{i=1}^{n}|x_{i}| \right\} \end{eqnarray*}$ which, when viewed as a function of $$\theta$$, is a kernel of $$Inv\mbox{-}gamma(n-1, \sum_{i=1}^{n} |x_{i}|)$$. We thus take $$\theta \sim Inv\mbox{-}gamma(\alpha, \beta)$$ as the prior so that $\begin{eqnarray*} f(\theta \, | \, x) & \propto & \frac{1}{\theta^{n}} \exp \left\{- \frac{1}{\theta} \sum_{i=1}^{n}|x_{i}| \right\}\frac{1}{\theta^{\alpha + 1}}\exp\left\{-\frac{\beta}{\theta}\right\} \\ & = & \frac{1}{\theta^{\alpha + n + 1}}\exp\left\{- \frac{1}{\theta}\left(\beta + \sum_{i=1}^{n}|x_{i}| \right)\right\} \end{eqnarray*}$ which is a kernel of $$Inv\mbox{-}gamma(\alpha + n, \beta + \sum_{i=1}^{n} |x_{i}|)$$. Thus, with respect to $$X \, | \, \theta$$, the prior and posterior are in the same family, showing conjugacy, with $$\theta \, | \, x \sim Inv\mbox{-}gamma(\alpha + n, \beta + \sum_{i=1}^{n} |x_{i}|)$$.

(b) Consider the transformation $$\phi = \theta^{-1}$$. Find the posterior distribution of $$\phi \, | \, x$$.

We have $$\phi = g(\theta)$$ where $$g(\theta) = \theta^{-1}$$ so that $$\theta = g^{-1}(\phi) = \phi^{-1}$$. Transforming $$f_{\theta}(\theta \, | \, x)$$ to $$f_{\phi}(\phi \, | \, x)$$ we have $\begin{eqnarray*} f_{\phi}(\phi \, | \, x) & = & \left|\frac{\partial \theta}{\partial \phi}\right| f_{\theta}(g^{-1}(\phi) \, | \, x) \\ & \propto & \left|-\frac{1}{\phi^{2}}\right| \phi^{\alpha + n + 1}\exp\left\{- \phi\left(\beta + \sum_{i=1}^{n}|x_{i}| \right)\right\} \\ & = & \phi^{\alpha + n - 1}\exp\left\{- \phi\left(\beta + \sum_{i=1}^{n}|x_{i}| \right)\right\} \end{eqnarray*}$ which is a kernel of a $$Gamma(\alpha + n, \beta + \sum_{i=1}^{n} |x_{i}|)$$ distribution. That is, $$\phi \, | \, x \sim Gamma(\alpha + n, \beta + \sum_{i=1}^{n} |x_{i}|)$$. The result highlights the relationship between the Gamma and Inv-gamma distributions shown in question 3(b)(i) of Question Sheet Two.
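The Gamma/Inverse-Gamma relationship can be illustrated by simulation: drawing $$\phi$$ from the Gamma posterior and inverting should reproduce Inverse-Gamma behaviour. A standard-library sketch with illustrative parameter values (note that `random.gammavariate` takes shape and scale, so a rate $$b$$ enters as scale $$1/b$$):

```python
import random

random.seed(2)
a, b = 7.0, 9.1                       # e.g. alpha + n and beta + sum |x_i|

# random.gammavariate takes shape and SCALE, so rate b means scale 1/b.
phi = [random.gammavariate(a, 1.0 / b) for _ in range(200_000)]
theta = [1.0 / p for p in phi]        # theta = phi^{-1}

# If phi ~ Gamma(a, b) then theta ~ Inv-gamma(a, b), with mean b / (a - 1).
emp_mean = sum(theta) / len(theta)
assert abs(emp_mean - b / (a - 1)) < 0.02
```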

### Question 4

Let $$X_{1}, \ldots, X_{n}$$ be a finite subset of a sequence of infinitely exchangeable random quantities with joint density function $\begin{eqnarray*} f(x_{1}, \ldots, x_{n}) & = & n! \left(1 + \sum_{i=1}^{n} x_{i}\right)^{-(n+1)}. \end{eqnarray*}$ Show that they can be represented as conditionally independent and exponentially distributed.

Using de Finetti’s Representation Theorem (Theorem 2 of the on-line notes), the joint distribution has an integral representation of the form $\begin{eqnarray*} f(x_{1}, \ldots, x_{n}) & = & \int_{\theta}\left\{\prod_{i=1}^{n} f(x_{i} \, | \, \theta)\right\} f(\theta) \, d\theta. \end{eqnarray*}$ If $$X_{i} \, | \, \theta \sim \mbox{Exp}(\theta)$$ then $\begin{eqnarray*} \prod_{i=1}^{n} f(x_{i} \, | \, \theta) \ = \ \prod_{i=1}^{n} \theta \exp\left(-\theta x_{i} \right) \ = \ \theta^{n} \exp\left(-\theta \sum_{i=1}^{n} x_{i} \right). \end{eqnarray*}$ Notice that, viewed as a function of $$\theta$$, this looks like a kernel of $$\mbox{Gamma}(n+1, \sum_{i=1}^{n} x_{i})$$. The result holds if we can find an $$f(\theta)$$ such that $\begin{eqnarray*} n! \left(1 + \sum_{i=1}^{n} x_{i}\right)^{-(n+1)} & = & \int_{\theta} \theta^{n} \exp\left(-\theta \sum_{i=1}^{n} x_{i} \right) f(\theta) \, d\theta. \end{eqnarray*}$ The left hand side looks like the normalising constant of a $$\mbox{Gamma}(n+1, 1 + \sum_{i=1}^{n} x_{i})$$ (as $$n! = \Gamma(n+1)$$) and if $$f(\theta) = \exp(-\theta)$$ then the integrand on the right hand side is a kernel of a $$\mbox{Gamma}(n+1, 1 + \sum_{i=1}^{n} x_{i})$$. So, if $$\theta \sim \mbox{Gamma}(1, 1)$$ then $$f(\theta) = \exp(-\theta)$$ and we have the desired representation.
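The representation can be verified numerically: with $$f(\theta) = e^{-\theta}$$, integrating out $$\theta$$ should reproduce $$n! \left(1 + \sum_{i=1}^{n} x_{i}\right)^{-(n+1)}$$. A crude quadrature sketch with illustrative values of $$n$$ and $$\sum x_{i}$$:

```python
import math

n, S = 4, 3.0                         # illustrative n and S = sum of the x_i

# Left-hand side: the given joint density evaluated through its closed form.
lhs = math.factorial(n) * (1 + S) ** (-(n + 1))

# Right-hand side: integrate theta^n exp(-theta S) f(theta) over theta with
# f(theta) = exp(-theta), i.e. a Gamma(1, 1) mixing distribution.
h, upper = 0.001, 60.0
rhs = h * sum(t ** n * math.exp(-t * S) * math.exp(-t)
              for t in (i * h for i in range(1, int(upper / h))))

assert abs(lhs - rhs) < 1e-6
```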

### Question 5

Let $$X_{1}, \ldots, X_{n}$$ be exchangeable so that the $$X_{i}$$ are conditionally independent given a parameter $$\theta$$. Suppose that $$X_{i} \, | \, \theta$$ is distributed as a Poisson distribution with mean $$\theta$$.

(a) Show that, with respect to this Poisson likelihood, the gamma family of distributions is conjugate.

$\begin{eqnarray*} f(x \, | \, \theta) & = & \prod_{i=1}^{n} P(X_{i} = x_{i} \, | \, \theta) \\ & \propto & \prod_{i=1}^{n} \theta^{x_{i}} \exp\left\{-\theta\right\} \\ & = & \theta^{n\bar{x}}\exp\left\{-n\theta\right\}. \end{eqnarray*}$ Taking $$\theta \sim Gamma(\alpha, \beta)$$ as the prior we have $\begin{eqnarray*} f(\theta \, | \, x) & \propto & f(x \, | \, \theta)f(\theta) \\ & \propto & \theta^{n\bar{x}}\exp\left\{-n\theta\right\} \theta^{\alpha -1}\exp\left\{-\beta \theta \right\} \\ & = & \theta^{\alpha + n\bar{x} -1}\exp\left\{-(\beta +n) \theta \right\} \end{eqnarray*}$ which is a kernel of a $$Gamma(\alpha + n\bar{x}, \beta + n)$$ distribution. Hence, the prior and posterior are in the same family giving conjugacy.
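A numerical check of the Poisson-Gamma update, with illustrative counts and hypothetical hyperparameters:

```python
import math

def gamma_logpdf(theta, a, b):
    """Log density of a Gamma(a, b) distribution (rate parameterisation)."""
    return a * math.log(b) - math.lgamma(a) + (a - 1) * math.log(theta) - b * theta

x = [3, 1, 4, 0, 2, 5]                # illustrative Poisson counts
n, s = len(x), sum(x)                 # here s = n * xbar
alpha, beta = 2.0, 0.5                # hypothetical prior hyperparameters
a_post, b_post = alpha + s, beta + n

# log(likelihood) + log(prior) - log(posterior) is constant in theta
diffs = [s * math.log(t) - n * t
         + gamma_logpdf(t, alpha, beta) - gamma_logpdf(t, a_post, b_post)
         for t in (0.5, 1.5, 3.0, 6.0)]
assert max(diffs) - min(diffs) < 1e-10
```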

(b) Interpret the posterior mean of $$\theta$$ paying particular attention to the cases when we may have weak prior information and strong prior information.

$\begin{eqnarray*} E(\theta \, | \, X) & = & \frac{\alpha + n\bar{x}}{\beta + n} \\ & = & \frac{\beta\left(\frac{\alpha}{\beta}\right) + n\bar{x}}{\beta + n} \\ & = & \lambda \left(\frac{\alpha}{\beta}\right) + (1-\lambda)\bar{x} \end{eqnarray*}$ where $$\lambda = \frac{\beta}{\beta + n}$$. Hence, the posterior mean is a weighted average of the prior mean, $$\frac{\alpha}{\beta}$$, and the data mean, $$\bar{x}$$, which is also the maximum likelihood estimate.

Weak prior information corresponds to a large variance of $$\theta$$ which can be viewed as small $$\beta$$ ($$\beta$$ is the inverse scale parameter). In this case, more weight is attached to $$\bar{x}$$ than $$\frac{\alpha}{\beta}$$ in the posterior mean.

Strong prior information corresponds to a small variance of $$\theta$$ which can be viewed as large $$\beta$$ (once again, $$\beta$$ is the inverse scale parameter). In this case, more weight is attached to $$\frac{\alpha}{\beta}$$ than $$\bar{x}$$ in the posterior mean which thus favours the prior mean.
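The effect of weak versus strong prior information can be illustrated numerically: holding the prior mean $$\frac{\alpha}{\beta}$$ fixed and varying $$\beta$$ (all values hypothetical):

```python
prior_mean = 2.0                      # fixed prior mean alpha / beta
n, xbar = 25, 3.2                     # illustrative data summary

def post_mean(beta):
    """Posterior mean (alpha + n*xbar) / (beta + n) with alpha = prior_mean * beta."""
    alpha = prior_mean * beta
    return (alpha + n * xbar) / (beta + n)

weak, strong = post_mean(0.1), post_mean(1000.0)
assert abs(weak - xbar) < 0.01          # weak prior: posterior mean near xbar
assert abs(strong - prior_mean) < 0.05  # strong prior: near the prior mean
```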

(c) Suppose now that the prior for $$\theta$$ is given hierarchically. Given $$\lambda$$, $$\theta$$ is judged to follow an exponential distribution with mean $$\frac{1}{\lambda}$$ and $$\lambda$$ is given the improper distribution $$f(\lambda) \propto 1$$ for $$\lambda > 0$$. Show that $\begin{eqnarray*} f(\lambda \, | \, x) & \propto & \frac{\lambda}{(n+\lambda)^{n\bar{x}+1}} \end{eqnarray*}$ where $$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_{i}$$.

$$\theta \, | \, \lambda \sim Exp(\lambda)$$ so $$f(\theta \, | \, \lambda) = \lambda \exp\{-\lambda \theta\}$$. $\begin{eqnarray*} f(\lambda, \theta \, | \, x) & \propto & f(x \, | \, \theta, \lambda)f(\theta, \lambda) \\ & = & f(x \, | \, \theta) f(\theta \, | \, \lambda)f(\lambda) \\ & \propto & \left(\theta^{n\bar{x}}\exp\left\{-n\theta\right\}\right)\left( \lambda \exp\left\{-\lambda \theta\right\}\right) \\ & = & \lambda \theta^{n\bar{x}}\exp\left\{-(n+\lambda)\theta\right\}. \end{eqnarray*}$ Thus, integrating out $$\theta$$, $\begin{eqnarray*} f(\lambda \, | \, x) & \propto & \int_{0}^{\infty} \lambda \theta^{n\bar{x}}\exp\left\{-(n+\lambda)\theta\right\} d\theta \\ & = & \lambda \int_{0}^{\infty} \theta^{n\bar{x}}\exp\left\{-(n+\lambda)\theta\right\} d\theta \end{eqnarray*}$ As the integrand is a kernel of a $$Gamma(n\bar{x}+1, n+\lambda)$$ distribution we thus have $\begin{eqnarray*} f(\lambda \, | \, x) & \propto & \frac{\lambda \Gamma(n\bar{x} + 1)}{(n+\lambda)^{n\bar{x}+1}} \\ & \propto & \frac{\lambda}{(n+\lambda)^{n\bar{x}+1}}. \end{eqnarray*}$
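As a quadrature check that integrating $$\theta$$ out really leaves something proportional to $$\lambda (n+\lambda)^{-(n\bar{x}+1)}$$: the ratio of the numerical integral to the stated kernel should be the same constant (analytically $$\Gamma(n\bar{x}+1)$$) for every $$\lambda$$. A sketch with illustrative values of $$n$$ and $$n\bar{x}$$:

```python
import math

n, nxbar = 6, 14                      # illustrative n and n * xbar = sum x_i

def marginal(lam, h=0.001, upper=20.0):
    """Numerically integrate lam * theta^{n*xbar} * exp(-(n + lam) * theta)."""
    return lam * h * sum(t ** nxbar * math.exp(-(n + lam) * t)
                         for t in (i * h for i in range(1, int(upper / h))))

# Ratio to the stated kernel lam / (n + lam)^{n*xbar + 1}: constant in lam
ratios = [marginal(lam) / (lam / (n + lam) ** (nxbar + 1))
          for lam in (0.5, 1.0, 2.0, 4.0)]
assert max(ratios) / min(ratios) < 1 + 1e-6
```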