Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\theta\). Suppose that
\(X_{i} \, | \, \theta \sim \mbox{Bern}(\theta)\).
Show that \(f(x_{i} \, | \,
\theta)\) belongs to the 1-parameter exponential family and for
\(X = (X_{1}, \ldots, X_{n})\) state
the sufficient statistic for learning about \(\theta\).
Notice that we
can write \[\begin{eqnarray*}
f(x_{i} \, | \, \theta) & = & \theta^{x_{i}}(1-\theta)^{1-x_{i}}
\\
& = & \exp \left\{\left( \log \frac{\theta}{1-\theta}\right)
x_{i} + \log (1 - \theta)\right\}
\end{eqnarray*}\] so that \(f(x_{i} \,
| \, \theta)\) belongs to the \(1\)-parameter exponential family with \(\phi_{1}(\theta) = \log
\frac{\theta}{1-\theta}\), \(u_{1}(x_{i}) = x_{i}\), \(g(\theta) = \log (1 - \theta)\) and \(h(x_{i}) = 0\). Notice that, from
Proposition 1 (see Lecture 11), \(t_{n} = [n,
\sum_{i=1}^{n} X_{i}]\) is a sufficient statistic.
By viewing the likelihood as a function of \(\theta\), which generic family of
distributions (over \(\theta\)) is the
likelihood a kernel of?
The likelihood, without
expressing in the explicit exponential family form, is \[\begin{eqnarray*}
f(x \, | \, \theta) & = & \theta^{n\bar{x}}(1 - \theta)^{n -
n\bar{x}}
\end{eqnarray*}\] which, viewed as a function of \(\theta\), we immediately recognise as a
Beta kernel (in particular, a \(\mbox{Beta}(n\bar{x}+1, n -
n\bar{x}+1)\)).
By first finding the corresponding posterior distribution
for \(\theta\) given \(x = (x_{1}, \ldots, x_{n})\), show that
this family of distributions is conjugate with respect to the likelihood
\(f(x \, | \, \theta)\).
Taking \(\theta \sim \mbox{Beta}(\alpha,
\beta)\) we have that
\[\begin{eqnarray*}
f(\theta \, | \, x) & \propto & \theta^{n\bar{x}}(1 - \theta)^{n
- n\bar{x}} \times \theta^{\alpha - 1}(1 - \theta)^{\beta - 1} \\
& = & \theta^{\alpha + n\bar{x} - 1}(1-\theta)^{\beta + n -
n\bar{x} - 1}
\end{eqnarray*}\] so that \(\theta \, |
\, x \sim \mbox{Beta}(\alpha + n\bar{x}, \beta + n - n\bar{x})\).
Thus, the prior and the posterior are in the same family giving
conjugacy.
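This update can be sanity-checked numerically. The Python sketch below is not part of the solution; the data \(x\) and the hyperparameters \(\alpha = 2\), \(\beta = 3\) are illustrative assumptions. If \(\theta \, | \, x \sim \mbox{Beta}(\alpha + n\bar{x}, \beta + n - n\bar{x})\), the unnormalised log posterior differs from that log density by the same constant (the log marginal likelihood) at every \(\theta\):

```python
import math

# Illustrative data and prior hyperparameters (assumptions, not from the text).
alpha, beta = 2.0, 3.0
x = [1, 0, 1, 1, 0, 1]          # s = 4 successes out of n = 6
n, s = len(x), sum(x)

def log_beta_pdf(theta, a, b):
    """Log density of a Beta(a, b) distribution at theta."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)

def log_post_unnorm(theta):
    """Unnormalised log posterior: log likelihood plus log prior."""
    loglik = s * math.log(theta) + (n - s) * math.log(1 - theta)
    return loglik + log_beta_pdf(theta, alpha, beta)

# Conjugacy says theta | x ~ Beta(alpha + s, beta + n - s); the difference
# below should therefore be constant across the grid of theta values.
diffs = [log_post_unnorm(t) - log_beta_pdf(t, alpha + s, beta + n - s)
         for t in (0.1, 0.3, 0.5, 0.7, 0.9)]
assert max(diffs) - min(diffs) < 1e-12
```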
We can also derive these results directly from the exponential family
representation. Expressed in
1-parameter exponential family form, the likelihood is \[\begin{eqnarray*}
f(x \, | \, \theta) & = & \exp \left\{\left( \log
\frac{\theta}{1-\theta}\right) \sum_{i=1}^{n} x_{i} + n\log (1 -
\theta)\right\}
\end{eqnarray*}\] from which we immediately observe the
sufficient statistic \(t_{n} = [n,
\sum_{i=1}^{n} x_{i}]\). Viewing \(f(x
\, | \, \theta)\) as a function of \(\theta\) the natural conjugate prior is a
member of the \(2\)-parameter
exponential family of the form \[\begin{eqnarray*}
f(\theta) & = & \exp \left\{a\left( \log
\frac{\theta}{1-\theta}\right) + d\log (1 - \theta) + c(a, d)\right\}
\end{eqnarray*}\] where \(c(a,
d)\) is the normalising constant. Hence, \[\begin{eqnarray*}
f(\theta) & \propto & \exp \left\{a\left( \log
\frac{\theta}{1-\theta}\right) + d\log (1 - \theta) \right\} \nonumber
\\
& = & \theta^{a}(1-\theta)^{d-a} \label{eq1a3}
\end{eqnarray*}\] which we recognise as a kernel of a Beta
distribution. The convention is to label the hyperparameters as \(\alpha\) and \(\beta\) so that we put \(\alpha = \alpha(a, d) = a + 1\) and \(\beta = \beta(a, d) = d - a +1\)
(equivalently, \(a = a(\alpha, \beta) = \alpha
- 1\), \(d = d(\alpha, \beta) = \beta +
\alpha -2\)). The conjugate prior distribution is \(\theta \sim \mbox{Beta}(\alpha,
\beta)\).
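The hyperparameter mapping \(a = \alpha - 1\), \(d = \alpha + \beta - 2\) can be verified numerically; in the following Python sketch the values of \(\alpha\) and \(\beta\) are illustrative assumptions:

```python
import math

# Illustrative hyperparameters (assumptions) and the mapping derived above.
alpha, beta = 2.5, 4.0
a, d = alpha - 1, alpha + beta - 2

# The exponential-family prior exp{a log(theta/(1-theta)) + d log(1-theta)}
# should equal the Beta(alpha, beta) kernel theta^(alpha-1)(1-theta)^(beta-1).
max_err = max(
    abs(math.exp(a * math.log(t / (1 - t)) + d * math.log(1 - t))
        - t ** (alpha - 1) * (1 - t) ** (beta - 1))
    for t in (0.2, 0.5, 0.8)
)
assert max_err < 1e-12
```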
Let \(X_{i} \, | \, \theta \sim N(\mu, \theta)\) with \(\mu\) known.
Show that \(f(x_{i} \, | \,
\theta)\) belongs to the 1-parameter exponential family and for
\(X = (X_{1}, \ldots, X_{n})\) state
the sufficient statistic for learning about \(\theta\).
Writing the
normal density as an exponential family (parameter \(\theta\) as \(\mu\) is a known constant) we have \[\begin{eqnarray*}
f(x_{i} \, | \, \theta) & = & \exp\left\{-\frac{1}{2\theta}
(x_{i} - \mu)^{2} - \frac{1}{2}\log \theta - \log \sqrt{2\pi} \right\}
\end{eqnarray*}\] so that \(f(x_{i} \,
| \, \theta)\) belongs to the 1-parameter exponential family. The
sufficient statistic is \(t_{n} = [n,
\sum_{i=1}^{n}(x_{i} - \mu)^{2}]\). Note that, expressed
explicitly as a 1-parameter exponential family, the likelihood for \(x = (x_{1}, \ldots, x_{n})\) is \[\begin{eqnarray*}
f(x \, | \, \theta) & = & \exp\left\{-\frac{1}{2\theta}
\sum_{i=1}^{n} (x_{i} - \mu)^{2} - \frac{n}{2}\log \theta - n\log
\sqrt{2\pi} \right\}
\end{eqnarray*}\] so that the natural conjugate prior has the
form \[\begin{eqnarray*}
f(\theta) & = & \exp\left\{-a \frac{1}{\theta} - d \log \theta +
c(a, d)\right\} \\
& \propto & \theta^{-d}\exp\left\{-a\frac{1}{\theta}\right\}
\end{eqnarray*}\] which we recognise as a kernel of an
Inverse-Gamma distribution.
By viewing the likelihood as a function of \(\theta\), which generic family of
distributions (over \(\theta\)) is the
likelihood a kernel of?
In conventional form, \[\begin{eqnarray*}
f(x \, | \, \theta) & \propto & \theta^{-\frac{n}{2}} \exp
\left\{-\frac{1}{2\theta}\sum_{i=1}^{n} (x_{i}- \mu)^{2}\right\}
\end{eqnarray*}\] which, viewed as a function of \(\theta\), we recognise as a kernel of an
Inverse-Gamma distribution (in particular, an \(\mbox{Inv-gamma}(\frac{n-2}{2},
\frac{1}{2}\sum_{i=1}^{n} (x_{i}- \mu)^{2})\)).
By first finding the corresponding posterior distribution
for \(\theta\) given \(x = (x_{1}, \ldots, x_{n})\), show that
this family of distributions is conjugate with respect to the likelihood
\(f(x \, | \, \theta)\).
Taking \(\theta \sim
\mbox{Inv-gamma}(\alpha, \beta)\) we have \[\begin{eqnarray*}
f(\theta \, | \, x) & \propto & \theta^{-\frac{n}{2}} \exp
\left\{-\frac{1}{2\theta}\sum_{i=1}^{n} (x_{i}- \mu)^{2}\right\} \times
\theta^{-(\alpha + 1)}\exp\left\{-\frac{\beta}{\theta}\right\} \\
& = & \theta^{-(\alpha + \frac{n}{2} +
1)}\exp\left\{-\left(\beta + \frac{1}{2}\sum_{i=1}^{n} (x_{i} -
\mu)^{2}\right)\frac{1}{\theta}\right\}
\end{eqnarray*}\] which we recognise as a kernel of an
Inverse-Gamma distribution so that \(\theta \,
| \, x \sim \mbox{Inv-gamma}(\alpha + \frac{n}{2}, \beta + \frac{1}{2}
\sum_{i=1}^{n} (x_{i} - \mu)^{2})\). Hence, the prior and
posterior are in the same family giving conjugacy.
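This Inverse-Gamma update can also be checked numerically. In the Python sketch below the known mean \(\mu\), the hyperparameters and the simulated data are illustrative assumptions:

```python
import math, random

# Illustrative setup (assumptions): known mean mu, Inv-gamma prior, fake data.
mu, alpha, beta = 1.0, 3.0, 2.0
random.seed(0)
x = [random.gauss(mu, 1.5) for _ in range(10)]
n = len(x)
ss = sum((xi - mu) ** 2 for xi in x)

def log_invgamma_pdf(theta, a, b):
    """Log density of an Inverse-Gamma(a, b) distribution at theta."""
    return a * math.log(b) - math.lgamma(a) - (a + 1) * math.log(theta) - b / theta

def log_post_unnorm(theta):
    """Unnormalised log posterior for the normal variance with known mean."""
    loglik = -0.5 * n * math.log(2 * math.pi * theta) - ss / (2 * theta)
    return loglik + log_invgamma_pdf(theta, alpha, beta)

# Conjugacy says theta | x ~ Inv-gamma(alpha + n/2, beta + ss/2); the
# difference should be the same constant at every theta.
diffs = [log_post_unnorm(t) - log_invgamma_pdf(t, alpha + n / 2, beta + ss / 2)
         for t in (0.5, 1.0, 2.0, 4.0)]
assert max(diffs) - min(diffs) < 1e-10
```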
Let \(X_{i} \, | \, \theta \sim \mbox{Maxwell}(\theta)\), the Maxwell distribution with parameter \(\theta\), so that \[\begin{eqnarray*} f(x_{i} \, | \, \theta) = \left(\frac{2}{\pi}\right)^{\frac{1}{2}}\theta^{\frac{3}{2}}x_{i}^{2}\exp\left\{-\frac{\theta x_{i}^{2}}{2}\right\}, \ \ x_{i} > 0 \end{eqnarray*}\] with \(E(X_{i} \, | \, \theta) = 2\sqrt{\frac{2}{\pi \theta}}\) and \(Var(X_{i} \, | \, \theta) = \frac{3\pi - 8}{\pi \theta}\).
Show that \(f(x_{i} \, | \,
\theta)\) belongs to the 1-parameter exponential family and for
\(X = (X_{1}, \ldots, X_{n})\) state
the sufficient statistic for learning about \(\theta\).
Writing the
Maxwell density in exponential family form we have \[\begin{eqnarray*}
f(x_{i} \, | \, \theta) & = & \exp\left\{-\theta
\frac{x_{i}^{2}}{2} + \frac{3}{2} \log \theta + \log x_{i}^{2} +
\frac{1}{2} \log \frac{2}{\pi}\right\}
\end{eqnarray*}\] so that \(f(x_{i} \,
| \, \theta)\) belongs to the 1-parameter exponential family. The
sufficient statistic is \(t_{n} = [n,
\sum_{i=1}^{n}x_{i}^{2}]\). Note that, expressed explicitly as a
1-parameter exponential family, the likelihood for \(x = (x_{1}, \ldots, x_{n})\) is \[\begin{eqnarray*}
f(x \, | \, \theta) & = & \exp\left\{-\theta \sum_{i=1}^{n}
\frac{x_{i}^{2}}{2} + \frac{3n}{2} \log \theta + \sum_{i=1}^{n}\log
x_{i}^{2} + \frac{n}{2} \log \frac{2}{\pi}\right\}
\end{eqnarray*}\] so that the natural conjugate prior has the
form \[\begin{eqnarray*}
f(\theta) & = & \exp\left\{-a \theta + d \log \theta + c(a,
d)\right\} \\
& \propto & \theta^{d} e^{-a \theta}
\end{eqnarray*}\] which we recognise as a kernel of a Gamma
distribution.
By viewing the likelihood as a function of \(\theta\), which generic family of
distributions (over \(\theta\)) is the
likelihood a kernel of?
In conventional form, \[\begin{eqnarray*}
f(x \, | \, \theta) & = &
\left(\frac{2}{\pi}\right)^{\frac{n}{2}}\theta^{\frac{3n}{2}}\left(\prod_{i=1}^{n}
x_{i}^{2}\right)\exp\left\{-\left(\frac{\sum_{i=1}^{n}
x_{i}^{2}}{2}\right)\theta\right\} \\
& \propto &
\theta^{\frac{3n}{2}}\exp\left\{-\left(\frac{\sum_{i=1}^{n}
x_{i}^{2}}{2}\right)\theta\right\}
\end{eqnarray*}\] which, viewed as a function of \(\theta\), we recognise as a kernel of a
Gamma distribution (in particular, \(\mbox{Gamma}(\frac{3n+2}{2},
\frac{1}{2}\sum_{i=1}^{n} x_{i}^{2})\)).
By first finding the corresponding posterior distribution
for \(\theta\) given \(x = (x_{1}, \ldots, x_{n})\), show that
this family of distributions is conjugate with respect to the likelihood
\(f(x \, | \, \theta)\).
Taking \(\theta \sim \mbox{Gamma}(\alpha,
\beta)\) we have \[\begin{eqnarray*}
f(\theta \, | \, x) & \propto &
\theta^{\frac{3n}{2}}\exp\left\{-\left(\frac{\sum_{i=1}^{n}
x_{i}^{2}}{2}\right)\theta\right\} \times \theta^{\alpha -1}e^{-\beta
\theta} \\
& = & \theta^{\alpha + \frac{3n}{2} - 1} \exp\left\{-\left(\beta
+ \frac{1}{2}\sum_{i=1}^{n} x_{i}^{2}\right)\theta\right\}
\end{eqnarray*}\] which, of course, is a kernel of a Gamma
distribution so that \(\theta \, | \, x \sim
\mbox{Gamma}(\alpha + \frac{3n}{2}, \beta +
\frac{1}{2}\sum_{i=1}^{n}x_{i}^{2})\). The prior and the
posterior are in the same family giving conjugacy.
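The Maxwell–Gamma update can be confirmed numerically in the same way; the data and the hyperparameters \(\alpha\), \(\beta\) in this Python sketch are illustrative assumptions:

```python
import math

# Illustrative Gamma prior hyperparameters and Maxwell-type data (assumptions).
alpha, beta = 2.0, 1.5
x = [0.8, 1.2, 0.5, 2.0, 1.1]
n = len(x)
ssq = sum(xi ** 2 for xi in x)

def log_gamma_pdf(theta, a, b):
    """Log density of a Gamma(a, b) distribution (rate b) at theta."""
    return a * math.log(b) - math.lgamma(a) + (a - 1) * math.log(theta) - b * theta

def log_post_unnorm(theta):
    """Unnormalised log posterior under the Maxwell likelihood."""
    loglik = sum(0.5 * math.log(2 / math.pi) + 1.5 * math.log(theta)
                 + 2 * math.log(xi) - theta * xi ** 2 / 2 for xi in x)
    return loglik + log_gamma_pdf(theta, alpha, beta)

# Conjugacy says theta | x ~ Gamma(alpha + 3n/2, beta + ssq/2).
diffs = [log_post_unnorm(t) - log_gamma_pdf(t, alpha + 1.5 * n, beta + ssq / 2)
         for t in (0.3, 0.7, 1.5, 3.0)]
assert max(diffs) - min(diffs) < 1e-12
```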
Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\theta\). Suppose that \(X_{i} \, | \, \theta\) is geometrically distributed with probability density function \[\begin{eqnarray*} f(x_{i} \, | \, \theta) & = & (1-\theta)^{x_{i}-1}\theta, \ \ x_{i} = 1, 2, \ldots. \end{eqnarray*}\]
Show that \(f(x \, | \,
\theta)\), where \(x = (x_{1}, \ldots,
x_{n})\), belongs to the \(1\)-parameter exponential family. Hence, or
otherwise, find the conjugate prior distribution and corresponding
posterior distribution for \(\theta\).
As the \(X_{i}\) are conditionally independent given \(\theta\), \[\begin{eqnarray*}
f(x \, | \, \theta) & = & \prod_{i=1}^{n} f(x_{i} \, | \,
\theta) \\
& = & \prod_{i=1}^{n} (1-\theta)^{x_{i}-1}\theta \\
& = & (1 - \theta)^{n\bar{x} -n}\theta^{n} \\
& = & \exp\left\{(n\bar{x} - n)\log (1 - \theta) + n \log \theta
\right\}
\end{eqnarray*}\] and so belongs to the \(1\)-parameter exponential family. The
conjugate prior is of the form \[\begin{eqnarray*}
f(\theta) & \propto & \exp\left\{ a\log (1 - \theta) + b \log
\theta \right\} \\
& = & \theta^{b}(1-\theta)^{a}
\end{eqnarray*}\] which is a kernel of a Beta distribution.
Letting \(\alpha = b+1\) and \(\beta = a+1\), we have \(\theta \sim \mbox{Beta}(\alpha, \beta)\).
\[\begin{eqnarray*}
f(\theta \, | \, x) & \propto & f(x \, | \, \theta)f(\theta) \\
& \propto & \theta^{n}(1-\theta)^{(n\bar{x} -
n)}\theta^{\alpha-1}(1-\theta)^{\beta-1}
\end{eqnarray*}\] which is a kernel of a \(\mbox{Beta}(\alpha + n, \beta + n\bar{x} -
n)\) so that \(\theta \, | \, x \sim
\mbox{Beta}(\alpha + n, \beta + n\bar{x} - n)\).
Show that the posterior mean for \(\theta\) can be written as a weighted
average of the prior mean of \(\theta\)
and the maximum likelihood estimate, \(\bar{x}^{-1}\).
\[\begin{eqnarray*}
E(\theta \, | \, X) & = & \frac{\alpha + n}{(\alpha + n) +
(\beta + n\bar{x} - n)} \\
& = & \frac{\alpha + n}{\alpha + \beta + n\bar{x}} \\
& = & \left(\frac{\alpha + \beta}{\alpha + \beta +
n\bar{x}}\right)\left(\frac{\alpha}{\alpha+\beta}\right) +
\left(\frac{n\bar{x}}{\alpha + \beta +
n\bar{x}}\right)\left(\frac{1}{\bar{x}}\right) \\
& = & \lambda E(\theta) + (1-\lambda)\bar{x}^{-1}.
\end{eqnarray*}\]
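The weighted-average identity can be checked directly; in the Python sketch below the geometric data and the hyperparameters are illustrative assumptions:

```python
# Illustrative Beta(alpha, beta) prior and geometric data x_i >= 1 (assumptions).
alpha, beta = 2.0, 3.0
x = [3, 1, 4, 2, 2]
n = len(x)
xbar = sum(x) / n

# Posterior mean of Beta(alpha + n, beta + n*xbar - n) ...
post_mean = (alpha + n) / (alpha + beta + n * xbar)

# ... should equal lambda * prior mean + (1 - lambda) * MLE, with
# lambda = (alpha + beta)/(alpha + beta + n*xbar) and MLE 1/xbar.
lam = (alpha + beta) / (alpha + beta + n * xbar)
weighted = lam * alpha / (alpha + beta) + (1 - lam) / xbar
assert abs(post_mean - weighted) < 1e-12
```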
Suppose now that the prior for \(\theta\) is instead given by the
probability density function
\[\begin{eqnarray*}
f(\theta) & = & \frac{1}{2B(\alpha+1,
\beta)}\theta^{\alpha}(1-\theta)^{\beta - 1} + \frac{1}{2B(\alpha,
\beta+1)}\theta^{\alpha-1}(1-\theta)^{\beta},
\end{eqnarray*}\] where \(B(\alpha, \beta)\) denotes the Beta
function evaluated at \(\alpha\) and
\(\beta\). Show that the posterior
probability density function can be written as \[\begin{eqnarray*}
f(\theta \, | \, x) & = & \lambda f_{1}(\theta) + (1 - \lambda)
f_{2}(\theta)
\end{eqnarray*}\] where \[\begin{eqnarray*}
\lambda & = & \frac{(\alpha + n)\beta}{(\alpha + n)\beta +
(\beta -n + \sum_{i=1}^{n} x_{i})\alpha}
\end{eqnarray*}\] and \(f_{1}(\theta)\) and \(f_{2}(\theta)\) are probability density
functions.
\[\begin{eqnarray*}
f(\theta \, | \, x) & \propto & f(x \, | \, \theta)f(\theta) \\
& = & \theta^{n}(1-\theta)^{(n\bar{x} -
n)}\left\{\frac{\theta^{\alpha}(1-\theta)^{\beta-1}}{B(\alpha+1, \beta)}
+ \frac{\theta^{\alpha-1}(1-\theta)^{\beta}}{B(\alpha, \beta+1)}\right\}
\\
& = &
\frac{\theta^{\alpha_{1}}(1-\theta)^{\beta_{1}-1}}{B(\alpha+1, \beta)} +
\frac{\theta^{\alpha_{1}-1}(1-\theta)^{\beta_{1}}}{B(\alpha, \beta+1)}
\end{eqnarray*}\] where \(\alpha_{1} =
\alpha +n\) and \(\beta_{1} = \beta +
n\bar{x} -n\). Finding the constant of proportionality we observe
that \(\theta^{\alpha_{1}}(1-\theta)^{\beta_{1}-1}\)
is a kernel of a \(\mbox{Beta}(\alpha_{1}+1,
\beta_{1})\) and \(\theta^{\alpha_{1}-1}(1-\theta)^{\beta_{1}}\)
is a kernel of a \(\mbox{Beta}(\alpha_{1},\beta_{1}+1)\). So,
\[\begin{eqnarray*}
f(\theta \, | \, x) & = &
c\left\{\frac{B(\alpha_{1}+1,\beta_{1})}{B(\alpha+1,\beta)}f_{1}(\theta)
+
\frac{B(\alpha_{1},\beta_{1}+1)}{B(\alpha,\beta+1)}f_{2}(\theta)\right\}
\end{eqnarray*}\] where \(f_{1}(\theta)\) is the density function of
\(\mbox{Beta}(\alpha_{1}+1,
\beta_{1})\) and \(f_{2}(\theta)\) the density function of
\(\mbox{Beta}(\alpha_{1},\beta_{1}+1)\).
Hence, \[\begin{eqnarray*}
c^{-1} & = & \frac{B(\alpha_{1}+1,\beta_{1})}{B(\alpha+1,\beta)}
+ \frac{B(\alpha_{1},\beta_{1}+1)}{B(\alpha,\beta+1)}
\end{eqnarray*}\] so that \(f(\theta \,
| \, x) = \lambda f_{1}(\theta) + (1-\lambda)f_{2}(\theta)\) with
\[\begin{eqnarray*}
\lambda & = &
\frac{\frac{B(\alpha_{1}+1,\beta_{1})}{B(\alpha+1,\beta)}}{\frac{B(\alpha_{1}+1,\beta_{1})}{B(\alpha+1,\beta)}
+ \frac{B(\alpha_{1},\beta_{1}+1)}{B(\alpha,\beta+1)}} \\
& = & \frac{\frac{\alpha_{1}(\alpha +
\beta)B(\alpha_{1},\beta_{1})}{\alpha(\alpha_{1}+
\beta_{1})B(\alpha,\beta)}}{\frac{\alpha_{1}(\alpha +
\beta)B(\alpha_{1},\beta_{1})}{\alpha(\alpha_{1}+\beta_{1})B(\alpha,\beta)}
+
\frac{\beta_{1}(\alpha+\beta)B(\alpha_{1},\beta_{1})}{\beta(\alpha_{1}+\beta_{1})B(\alpha,\beta)}}
\\
& = & \frac{\alpha_{1}\beta}{\alpha_{1}\beta + \beta_{1}\alpha}
\\
& = & \frac{(\alpha + n)\beta}{(\alpha + n)\beta + (\beta +
\sum_{i=1}^{n} x_{i}-n)\alpha}.
\end{eqnarray*}\]
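The simplification of \(\lambda\) from the ratio of Beta functions to the closed form can be verified numerically; the hyperparameters and data in this Python sketch are illustrative assumptions:

```python
import math

def B(a, b):
    """Beta function via log-gamma for numerical stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

# Illustrative prior hyperparameters and geometric data (assumptions).
alpha, beta = 2.0, 3.0
x = [3, 1, 4, 2, 2]
n, s = len(x), sum(x)
a1, b1 = alpha + n, beta + s - n   # alpha_1 and beta_1 from the derivation

# Mixture weight from the ratio of Beta functions ...
w1 = B(a1 + 1, b1) / B(alpha + 1, beta)
w2 = B(a1, b1 + 1) / B(alpha, beta + 1)
lam_beta = w1 / (w1 + w2)

# ... and the simplified closed form derived above.
lam_closed = (alpha + n) * beta / ((alpha + n) * beta + (beta + s - n) * alpha)
assert abs(lam_beta - lam_closed) < 1e-12
```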
Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\theta\). Suppose that \(X_{i} \, | \, \theta\) is distributed as a double-exponential distribution with probability density function \[\begin{eqnarray*} f(x_{i} \, | \, \theta) & = & \frac{1}{2\theta} \exp \left\{- \frac{|x_{i}|}{\theta}\right\}, \ \ -\infty < x_{i} < \infty \end{eqnarray*}\] for \(\theta > 0\).
Find the conjugate prior distribution and corresponding
posterior distribution for \(\theta\)
following observation of \(x = (x_{1}, \ldots,
x_{n})\).
\[\begin{eqnarray*}
f(x \, | \, \theta) & = & \prod_{i=1}^{n} \frac{1}{2\theta} \exp
\left\{- \frac{|x_{i}|}{\theta}\right\} \\
& \propto & \frac{1}{\theta^{n}} \exp \left\{- \frac{1}{\theta}
\sum_{i=1}^{n}|x_{i}| \right\}
\end{eqnarray*}\] which, when viewed as a function of \(\theta\), is a kernel of \(\mbox{Inv-gamma}(n-1, \sum_{i=1}^{n}
|x_{i}|)\). We thus take \(\theta \sim
\mbox{Inv-gamma}(\alpha, \beta)\) as the prior so that \[\begin{eqnarray*}
f(\theta \, | \, x) & \propto & \frac{1}{\theta^{n}} \exp
\left\{- \frac{1}{\theta} \sum_{i=1}^{n}|x_{i}|
\right\}\frac{1}{\theta^{\alpha +
1}}\exp\left\{-\frac{\beta}{\theta}\right\} \\
& = & \frac{1}{\theta^{\alpha + n + 1}}\exp\left\{-
\frac{1}{\theta}\left(\beta + \sum_{i=1}^{n}|x_{i}| \right)\right\}
\end{eqnarray*}\] which is a kernel of \(\mbox{Inv-gamma}(\alpha + n, \beta +
\sum_{i=1}^{n} |x_{i}|)\). Thus, with respect to \(X \, | \, \theta\), the prior and posterior
are in the same family, showing conjugacy, with \(\theta \, | \, x \sim \mbox{Inv-gamma}(\alpha + n,
\beta + \sum_{i=1}^{n} |x_{i}|)\).
Consider the transformation \(\phi = \theta^{-1}\). Find the posterior
distribution of \(\phi \, | \,
x\).
We have \(\phi =
g(\theta)\) where \(g(\theta) =
\theta^{-1}\) so that \(\theta =
g^{-1}(\phi) = \phi^{-1}\). Transforming \(f_{\theta}(\theta \, | \, x)\) to \(f_{\phi}(\phi \, | \, x)\) we have \[\begin{eqnarray*}
f_{\phi}(\phi \, | \, x) & = & \left|\frac{\partial
\theta}{\partial \phi}\right| f_{\theta}(g^{-1}(\phi) \, | \, x) \\
& \propto & \left|\frac{-1}{\phi^{2}}\right|
\left(\frac{1}{\phi}\right)^{-(\alpha + n + 1)}\exp\left\{-
\phi\left(\beta + \sum_{i=1}^{n}|x_{i}|
\right)\right\} \\
& = & \phi^{\alpha + n - 1}\exp\left\{- \phi\left(\beta +
\sum_{i=1}^{n}|x_{i}| \right)\right\}
\end{eqnarray*}\] which is a kernel of a \(\mbox{Gamma}(\alpha + n, \beta + \sum_{i=1}^{n}
|x_{i}|)\) distribution. That is \(\phi
\, | \, x \sim \mbox{Gamma}(\alpha + n, \beta + \sum_{i=1}^{n}
|x_{i}|)\). The result highlights the relationship between the
Gamma and Inverse-Gamma distributions shown in question 3(b)(i) of Question
Sheet Two.
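This change-of-variables result can itself be checked numerically: if \(\theta \sim \mbox{Inv-gamma}(a, b)\) then \(\phi = \theta^{-1} \sim \mbox{Gamma}(a, b)\). The hyperparameters in the Python sketch below are illustrative assumptions:

```python
import math

# Illustrative hyperparameters (assumptions).
a, b = 4.0, 2.5

def invgamma_pdf(theta):
    """Inverse-Gamma(a, b) density at theta."""
    return (b ** a / math.gamma(a)) * theta ** (-(a + 1)) * math.exp(-b / theta)

def gamma_pdf(phi):
    """Gamma(a, b) density (rate b) at phi."""
    return (b ** a / math.gamma(a)) * phi ** (a - 1) * math.exp(-b * phi)

# f_phi(phi) = |d theta/d phi| f_theta(1/phi) with Jacobian 1/phi^2.
max_err = max(abs(gamma_pdf(p) - invgamma_pdf(1 / p) / p ** 2)
              for p in (0.3, 0.8, 1.5, 3.0))
assert max_err < 1e-12
```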
Let \(X_{1}, \ldots, X_{n}\)
be a finite subset of a sequence of infinitely exchangeable random
variables with joint density function \[\begin{eqnarray*}
f(x_{1}, \ldots, x_{n}) & = & n! \left(1 + \sum_{i=1}^{n}
x_{i}\right)^{-(n+1)}.
\end{eqnarray*}\] Show that they can be represented as
conditionally independent and exponentially distributed.
Using de Finetti’s Representation Theorem (Theorem 2 of the on-line
notes), the joint distribution has an integral representation of the
form \[\begin{eqnarray*}
f(x_{1}, \ldots, x_{n}) & = &
\int_{\theta}\left\{\prod_{i=1}^{n} f(x_{i} \, | \, \theta)\right\}
f(\theta) \, d\theta.
\end{eqnarray*}\] If \(X_{i} \, | \,
\theta \sim \mbox{Exp}(\theta)\) then \[\begin{eqnarray*}
\prod_{i=1}^{n} f(x_{i} \, | \, \theta) \ = \ \prod_{i=1}^{n} \theta
\exp\left(-\theta x_{i} \right) \ = \ \theta^{n} \exp\left(-\theta
\sum_{i=1}^{n} x_{i} \right).
\end{eqnarray*}\] Notice that, viewed as a function of \(\theta\), this looks like a kernel of \(\mbox{Gamma}(n+1, \sum_{i=1}^{n} x_{i})\).
The result holds if we can find an \(f(\theta)\) such that \[\begin{eqnarray*}
n! \left(1 + \sum_{i=1}^{n} x_{i}\right)^{-(n+1)} & = &
\int_{\theta} \theta^{n} \exp\left(-\theta \sum_{i=1}^{n} x_{i} \right)
f(\theta) \, d\theta.
\end{eqnarray*}\] The left hand side looks like the normalising
constant of a \(\mbox{Gamma}(n+1, 1 +
\sum_{i=1}^{n} x_{i})\) (as \(n! =
\Gamma(n+1)\)) and if \(f(\theta) =
\exp(-\theta)\) then the integrand on the right hand side is a
kernel of a \(\mbox{Gamma}(n+1, 1 +
\sum_{i=1}^{n} x_{i})\). So, if \(\theta \sim \mbox{Gamma}(1, 1)\) then \(f(\theta) = \exp(-\theta)\) and we have the
desired representation.
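The integral identity underlying this representation can be checked by crude numerical integration. The data values in the Python sketch below are illustrative assumptions:

```python
import math

# Illustrative data (assumptions).
x = [0.4, 1.3, 0.7]
n, sx = len(x), sum(x)

# With f(theta) = exp(-theta), i.e. theta ~ Gamma(1, 1), the mixture integral
# int_0^inf theta^n exp(-theta * sx) exp(-theta) d theta should recover the
# stated joint density n! (1 + sx)^(-(n+1)). Left Riemann sum over theta.
h, upper = 1e-3, 50.0
integral = sum(
    (h * k) ** n * math.exp(-(h * k) * (1 + sx)) * h
    for k in range(1, int(upper / h))
)
target = math.factorial(n) * (1 + sx) ** (-(n + 1))
assert abs(integral - target) / target < 1e-3
```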
Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\theta\). Suppose that \(X_{i} \, | \, \theta\) is distributed as a Poisson distribution with mean \(\theta\).
Show that, with respect to this Poisson likelihood, the
gamma family of distributions is conjugate.
\[\begin{eqnarray*}
f(x \, | \, \theta) & = & \prod_{i=1}^{n} P(X_{i} = x_{i} \, |
\, \theta) \\
& \propto & \prod_{i=1}^{n} \theta^{x_{i}}
\exp\left\{-\theta\right\} \\
& = & \theta^{n\bar{x}}\exp\left\{-n\theta\right\}.
\end{eqnarray*}\] Taking \(\theta \sim
\mbox{Gamma}(\alpha, \beta)\) we have \[\begin{eqnarray*}
f(\theta \, | \, x) & \propto & f(x \, | \, \theta)f(\theta) \\
& \propto & \theta^{n\bar{x}}\exp\left\{-n\theta\right\}
\theta^{\alpha -1}\exp\left\{-\beta \theta \right\} \\
& = & \theta^{\alpha + n\bar{x} -1}\exp\left\{-(\beta +n) \theta
\right\}
\end{eqnarray*}\] which is a kernel of a \(\mbox{Gamma}(\alpha + n\bar{x}, \beta +
n)\) distribution. Hence, the prior and posterior are in the same
family giving conjugacy.
Interpret the posterior mean of \(\theta\) paying particular attention to the
cases when we may have weak prior information and strong prior
information.
\[\begin{eqnarray*}
E(\theta \, | \, X) & = & \frac{\alpha + n\bar{x}}{\beta + n} \\
& = & \frac{\beta\left(\frac{\alpha}{\beta}\right) +
n\bar{x}}{\beta + n} \\
& = & \lambda \left(\frac{\alpha}{\beta}\right) +
(1-\lambda)\bar{x}
\end{eqnarray*}\] where \(\lambda =
\frac{\beta}{\beta + n}\). Hence, the posterior mean is a
weighted average of the prior mean, \(\frac{\alpha}{\beta}\), and the data mean,
\(\bar{x}\), which is also the maximum
likelihood estimate.
Weak prior information corresponds to a
large variance of \(\theta\) which can
be viewed as small \(\beta\) (\(\beta\) is the inverse scale parameter). In
this case, more weight is attached to \(\bar{x}\) than \(\frac{\alpha}{\beta}\) in the posterior
mean.
Strong prior information corresponds to a small variance
of \(\theta\) which can be viewed as
large \(\beta\) (once again, \(\beta\) is the inverse scale parameter). In
this case, more weight is attached to \(\frac{\alpha}{\beta}\) than \(\bar{x}\) in the posterior mean which thus
favours the prior mean.
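Both the weighted-average identity and the weak/strong prior behaviour can be illustrated numerically; the Poisson data and hyperparameters in this Python sketch are illustrative assumptions:

```python
# Illustrative Gamma(alpha, beta) prior and Poisson counts (assumptions).
alpha, beta = 3.0, 2.0
x = [4, 2, 5, 3, 6, 4]
n = len(x)
xbar = sum(x) / n

# Posterior mean (alpha + n*xbar)/(beta + n) as a weighted average of the
# prior mean alpha/beta and the data mean xbar, with lambda = beta/(beta + n).
post_mean = (alpha + n * xbar) / (beta + n)
lam = beta / (beta + n)
assert abs(post_mean - (lam * alpha / beta + (1 - lam) * xbar)) < 1e-12

# Weak prior information (small beta) puts little weight on the prior mean;
# strong prior information (large beta) puts most weight on it.
lam_weak = 0.1 / (0.1 + n)
lam_strong = 100.0 / (100.0 + n)
assert lam_weak < 0.5 < lam_strong
```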
Suppose now that the prior for \(\theta\) is given hierarchically. Given
\(\lambda\), \(\theta\) is judged to follow an exponential
distribution with mean \(\frac{1}{\lambda}\) and \(\lambda\) is given the improper
distribution \(f(\lambda) \propto 1\)
for \(\lambda > 0\). Show
that \[\begin{eqnarray*}
f(\lambda \, | \, x) & \propto &
\frac{\lambda}{(n+\lambda)^{n\bar{x}+1}}
\end{eqnarray*}\] where \(\bar{x} = \frac{1}{n} \sum_{i=1}^{n}
x_{i}\).
\(\theta \,
| \, \lambda \sim \mbox{Exp}(\lambda)\) so \(f(\theta \, | \, \lambda) = \lambda \exp\{-\lambda
\theta\}\). \[\begin{eqnarray*}
f(\lambda, \theta \, | \, x) & \propto & f(x \, | \, \theta,
\lambda)f(\theta, \lambda) \\
& = & f(x \, | \, \theta) f(\theta \, | \, \lambda)f(\lambda) \\
& \propto &
\left(\theta^{n\bar{x}}\exp\left\{-n\theta\right\}\right)\left( \lambda
\exp\left\{-\lambda \theta\right\}\right) \\
& = & \lambda
\theta^{n\bar{x}}\exp\left\{-(n+\lambda)\theta\right\}.
\end{eqnarray*}\] Thus, integrating out \(\theta\), \[\begin{eqnarray*}
f(\lambda \, | \, x) & \propto & \int_{0}^{\infty} \lambda
\theta^{n\bar{x}}\exp\left\{-(n+\lambda)\theta\right\} d\theta \\
& = & \lambda \int_{0}^{\infty}
\theta^{n\bar{x}}\exp\left\{-(n+\lambda)\theta\right\} d\theta.
\end{eqnarray*}\] As the integrand is a kernel of a \(\mbox{Gamma}(n\bar{x}+1, n+\lambda)\)
distribution, we have \[\begin{eqnarray*}
f(\lambda \, | \, x) & \propto & \frac{\lambda \Gamma(n\bar{x} +
1)}{(n+\lambda)^{n\bar{x}+1}} \\
& \propto & \frac{\lambda}{(n+\lambda)^{n\bar{x}+1}}.
\end{eqnarray*}\]
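The marginalisation over \(\theta\) can be checked by numerical integration: the integral should be proportional to \(\lambda/(n+\lambda)^{n\bar{x}+1}\), with proportionality constant \(\Gamma(n\bar{x}+1)\). The count data in the Python sketch below are illustrative assumptions:

```python
import math

# Illustrative Poisson counts (assumptions).
x = [4, 2, 5, 3]
n, s = len(x), sum(x)           # s = n * xbar

def integrated(lam, h=1e-3, upper=40.0):
    """Left-Riemann approximation to int_0^inf lam * theta^s * exp(-(n+lam)theta)."""
    return sum(lam * (h * k) ** s * math.exp(-(n + lam) * h * k) * h
               for k in range(1, int(upper / h)))

def target(lam):
    """Claimed shape of f(lambda | x), up to a constant."""
    return lam / (n + lam) ** (s + 1)

# The ratio integrated/target should equal Gamma(s + 1) for every lambda.
ratios = [integrated(lam) / target(lam) for lam in (0.5, 1.0, 2.0)]
assert all(abs(r - math.gamma(s + 1)) / math.gamma(s + 1) < 1e-3 for r in ratios)
```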