Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\theta\). Suppose that \(X_{i} \, | \, \theta \sim Bern(\theta)\).
Notice that we can write \[\begin{eqnarray*} f(x_{i} \, | \, \theta) & = & \theta^{x_{i}}(1-\theta)^{1-x_{i}} \\ & = & \exp \left\{\left( \log \frac{\theta}{1-\theta}\right) x_{i} + \log (1 - \theta)\right\} \end{eqnarray*}\] so that \(f(x_{i} \, | \, \theta)\) belongs to the \(1\)-parameter exponential family with \(\phi_{1}(\theta) = \log \frac{\theta}{1-\theta}\), \(u_{1}(x_{i}) = x_{i}\), \(g(\theta) = \log (1 - \theta)\) and \(h(x_{i}) = 0\). Notice that, from Proposition 1 (see Lecture 11), \(t_{n} = [n, \sum_{i=1}^{n} X_{i}]\) is a sufficient statistic.
The likelihood, without expressing in the explicit exponential family form, is \[\begin{eqnarray*} f(x \, | \, \theta) & = & \theta^{n\bar{x}}(1 - \theta)^{n - n\bar{x}} \end{eqnarray*}\] which, viewing as a function of \(\theta\), we immediately recognise as a Beta kernel (in particular, a \(Beta(n\bar{x}+1, n - n\bar{x}+1)\)).
Taking \(\theta \sim Beta(\alpha, \beta)\) we have that \[\begin{eqnarray*} f(\theta \, | \, x) & \propto & \theta^{n\bar{x}}(1 - \theta)^{n - n\bar{x}} \times \theta^{\alpha - 1}(1 - \theta)^{\beta - 1} \\ & = & \theta^{\alpha + n\bar{x} - 1}(1-\theta)^{\beta + n - n\bar{x} - 1} \end{eqnarray*}\] so that \(\theta \, | \, x \sim Beta(\alpha + n\bar{x}, \beta + n - n\bar{x})\). Thus, the prior and the posterior are in the same family, giving conjugacy.
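As a quick numerical illustration of this update (a minimal sketch: the sample size, seed and hyperparameter values below are arbitrary choices, not values from the notes):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative values: true theta and Beta(alpha, beta) prior hyperparameters.
theta_true, alpha, beta_prior = 0.3, 2.0, 2.0
x = rng.binomial(1, theta_true, size=50)        # n Bernoulli(theta) observations

n, s = len(x), int(x.sum())                     # sufficient statistic [n, sum x_i]
posterior = stats.beta(alpha + s, beta_prior + n - s)  # Beta(alpha + n*xbar, beta + n - n*xbar)
print(posterior.mean(), posterior.interval(0.95))
```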
Deriving the results directly from the exponential family representation
Expressed in the 1-parameter exponential family form the likelihood is \[\begin{eqnarray*} f(x \, | \, \theta) & = & \exp \left\{\left( \log \frac{\theta}{1-\theta}\right) \sum_{i=1}^{n} x_{i} + n\log (1 - \theta)\right\} \end{eqnarray*}\] from which we immediately observe the sufficient statistic \(t_{n} = [n, \sum_{i=1}^{n} x_{i}]\). Viewing \(f(x \, | \, \theta)\) as a function of \(\theta\), the natural conjugate prior is a member of the \(2\)-parameter exponential family of the form \[\begin{eqnarray*} f(\theta) & = & \exp \left\{a\left( \log \frac{\theta}{1-\theta}\right) + d\log (1 - \theta) + c(a, d)\right\} \end{eqnarray*}\] where \(c(a, d)\) is the normalising constant. Hence, \[\begin{eqnarray*} f(\theta) & \propto & \exp \left\{a\left( \log \frac{\theta}{1-\theta}\right) + d\log (1 - \theta) \right\} \\ & = & \theta^{a}(1-\theta)^{d-a} \end{eqnarray*}\] which we recognise as a kernel of a Beta distribution. The convention is to label the hyperparameters as \(\alpha\) and \(\beta\), so we put \(\alpha = \alpha(a, d) = a + 1\) and \(\beta = \beta(a, d) = d - a + 1\) (equivalently, \(a = a(\alpha, \beta) = \alpha - 1\), \(d = d(\alpha, \beta) = \alpha + \beta - 2\)). The conjugate prior distribution is \(\theta \sim Beta(\alpha, \beta)\).
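The reparameterisation \((a, d) \leftrightarrow (\alpha, \beta)\) is just the linear map above; a minimal sketch (the function names are mine, for illustration only):

```python
def natural_to_beta(a, d):
    # (a, d) are the natural-conjugate hyperparameters; returns (alpha, beta).
    return a + 1.0, d - a + 1.0

def beta_to_natural(alpha, beta):
    # Inverse map: (alpha, beta) -> (a, d) = (alpha - 1, alpha + beta - 2).
    return alpha - 1.0, alpha + beta - 2.0

# Round trip: the two parameterisations agree.
assert natural_to_beta(*beta_to_natural(3.0, 5.0)) == (3.0, 5.0)
```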
Let \(X_{i} \, | \, \theta \sim N(\mu, \theta)\) with \(\mu\) known, so that \(\theta\) is the variance.
Writing the normal density as an exponential family (the parameter is \(\theta\) since \(\mu\) is a known constant) we have \[\begin{eqnarray*} f(x_{i} \, | \, \theta) & = & \exp\left\{-\frac{1}{2\theta} (x_{i} - \mu)^{2} - \frac{1}{2}\log \theta - \log \sqrt{2\pi} \right\} \end{eqnarray*}\] so that \(f(x_{i} \, | \, \theta)\) belongs to the 1-parameter exponential family. The sufficient statistic is \(t_{n} = [n, \sum_{i=1}^{n}(x_{i} - \mu)^{2}]\). Note that, expressed explicitly as a 1-parameter exponential family, the likelihood for \(x = (x_{1}, \ldots, x_{n})\) is \[\begin{eqnarray*} f(x \, | \, \theta) & = & \exp\left\{-\frac{1}{2\theta} \sum_{i=1}^{n} (x_{i} - \mu)^{2} - \frac{n}{2}\log \theta - n\log \sqrt{2\pi} \right\} \end{eqnarray*}\] so that the natural conjugate prior has the form \[\begin{eqnarray*} f(\theta) & = & \exp\left\{-a \frac{1}{\theta} - d \log \theta + c(a, d)\right\} \\ & \propto & \theta^{-d}\exp\left\{-a\frac{1}{\theta}\right\} \end{eqnarray*}\] which we recognise as a kernel of an Inverse-Gamma distribution.
In conventional form, \[\begin{eqnarray*} f(x \, | \, \theta) & \propto & \theta^{-\frac{n}{2}} \exp \left\{-\frac{1}{2\theta}\sum_{i=1}^{n} (x_{i}- \mu)^{2}\right\} \end{eqnarray*}\] which, viewing \(f(x \, | \, \theta)\) as a function of \(\theta\), we recognise as a kernel of an Inverse-Gamma distribution (in particular, an \(\mbox{Inv-gamma}(\frac{n-2}{2}, \frac{1}{2}\sum_{i=1}^{n} (x_{i}- \mu)^{2})\)).
Taking \(\theta \sim \mbox{Inv-gamma}(\alpha, \beta)\) we have \[\begin{eqnarray*} f(\theta \, | \, x) & \propto & \theta^{-\frac{n}{2}} \exp \left\{-\frac{1}{2\theta}\sum_{i=1}^{n} (x_{i}- \mu)^{2}\right\} \times \theta^{-(\alpha + 1)}\exp\left\{-\frac{\beta}{\theta}\right\} \\ & = & \theta^{-(\alpha + \frac{n}{2} + 1)}\exp\left\{-\left(\beta + \frac{1}{2}\sum_{i=1}^{n} (x_{i} - \mu)^{2}\right)\frac{1}{\theta}\right\} \end{eqnarray*}\] which we recognise as a kernel of an Inverse-Gamma distribution so that \(\theta \, | \, x \sim \mbox{Inv-gamma}(\alpha + \frac{n}{2}, \beta + \frac{1}{2} \sum_{i=1}^{n} (x_{i} - \mu)^{2})\). Hence, the prior and posterior are in the same family giving conjugacy.
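Again a brief numerical sketch (all values below are illustrative; scipy's `invgamma` with shape \(a\) and `scale` \(b\) matches the \(\mbox{Inv-gamma}(a, b)\) parameterisation used here):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Illustrative values: known mean mu, true variance theta, Inv-gamma(alpha, beta) prior.
mu, theta_true, alpha, beta_prior = 0.0, 4.0, 3.0, 2.0
x = rng.normal(mu, np.sqrt(theta_true), size=40)

ss = np.sum((x - mu) ** 2)                      # sufficient statistic: sum (x_i - mu)^2
posterior = stats.invgamma(alpha + len(x) / 2, scale=beta_prior + ss / 2)
print(posterior.mean())                         # posterior mean of the variance theta
```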
Let \(X_{i} \, | \, \theta \sim Maxwell(\theta)\), the Maxwell distribution with parameter \(\theta\) so that \[\begin{eqnarray*} f(x_{i} \, | \, \theta) = \left(\frac{2}{\pi}\right)^{\frac{1}{2}}\theta^{\frac{3}{2}}x_{i}^{2}\exp\left\{-\frac{\theta x_{i}^{2}}{2}\right\}, \ \ x_{i} > 0 \end{eqnarray*}\] and \(E(X_{i} \, | \, \theta) = 2\sqrt{\frac{2}{\pi \theta}}\), \(Var(X_{i} \, | \, \theta) = \frac{3\pi - 8}{\pi \theta}\).
Writing the Maxwell density in exponential family form we have \[\begin{eqnarray*} f(x_{i} \, | \, \theta) & = & \exp\left\{-\theta \frac{x_{i}^{2}}{2} + \frac{3}{2} \log \theta + \log x_{i}^{2} + \frac{1}{2} \log \frac{2}{\pi}\right\} \end{eqnarray*}\] so that \(f(x_{i} \, | \, \theta)\) belongs to the 1-parameter exponential family. The sufficient statistic is \(t_{n} = [n, \sum_{i=1}^{n}x_{i}^{2}]\). Note that, expressed explicitly as a 1-parameter exponential family, the likelihood for \(x = (x_{1}, \ldots, x_{n})\) is \[\begin{eqnarray*} f(x \, | \, \theta) & = & \exp\left\{-\theta \sum_{i=1}^{n} \frac{x_{i}^{2}}{2} + \frac{3n}{2} \log \theta + \sum_{i=1}^{n}\log x_{i}^{2} + \frac{n}{2} \log \frac{2}{\pi}\right\} \end{eqnarray*}\] so that the natural conjugate prior has the form \[\begin{eqnarray*} f(\theta) & = & \exp\left\{-a \theta + d \log \theta + c(a, d)\right\} \\ & \propto & \theta^{d} e^{-a \theta} \end{eqnarray*}\] which we recognise as a kernel of a Gamma distribution.
In conventional form, \[\begin{eqnarray*} f(x \, | \, \theta) & = & \left(\frac{2}{\pi}\right)^{\frac{n}{2}}\theta^{\frac{3n}{2}}\left(\prod_{i=1}^{n} x_{i}^{2}\right)\exp\left\{-\left(\frac{\sum_{i=1}^{n} x_{i}^{2}}{2}\right)\theta\right\} \\ & \propto & \theta^{\frac{3n}{2}}\exp\left\{-\left(\frac{\sum_{i=1}^{n} x_{i}^{2}}{2}\right)\theta\right\} \end{eqnarray*}\] which, viewing \(f(x \, | \, \theta)\) as a function of \(\theta\), we recognise as a kernel of a Gamma distribution (in particular, \(\mbox{Gamma}(\frac{3n+2}{2}, \frac{1}{2}\sum_{i=1}^{n} x_{i}^{2})\)).
Taking \(\theta \sim \mbox{Gamma}(\alpha, \beta)\) we have \[\begin{eqnarray*} f(\theta \, | \, x) & \propto & \theta^{\frac{3n}{2}}\exp\left\{-\left(\frac{\sum_{i=1}^{n} x_{i}^{2}}{2}\right)\theta\right\} \times \theta^{\alpha -1}e^{-\beta \theta} \\ & = & \theta^{\alpha + \frac{3n}{2} - 1} \exp\left\{-\left(\beta + \frac{1}{2}\sum_{i=1}^{n} x_{i}^{2}\right)\theta\right\} \end{eqnarray*}\] which, of course, is a kernel of a Gamma distribution so that \(\theta \, | \, x \sim \mbox{Gamma}(\alpha + \frac{3n}{2}, \beta + \frac{1}{2}\sum_{i=1}^{n}x_{i}^{2})\). The prior and the posterior are in the same family giving conjugacy.
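A numerical sketch with illustrative values. Note the parameterisations assumed here: scipy's `maxwell` uses a scale \(a\) with \(\theta = 1/a^{2}\), and its `gamma` takes a shape and a scale equal to the reciprocal of the rate:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Illustrative values; scipy's maxwell has scale a with theta = 1/a^2,
# so Maxwell(theta) samples use scale = 1/sqrt(theta).
theta_true, alpha, beta_prior = 2.0, 1.0, 1.0
x = stats.maxwell.rvs(scale=1 / np.sqrt(theta_true), size=60, random_state=rng)

n, ss = len(x), np.sum(x ** 2)                  # sufficient statistic [n, sum x_i^2]
posterior = stats.gamma(alpha + 3 * n / 2, scale=1 / (beta_prior + ss / 2))
print(posterior.mean())                         # should be close to theta_true
```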
Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\theta\). Suppose that \(X_{i} \, | \, \theta\) is geometrically distributed with probability mass function \[\begin{eqnarray*} f(x_{i} \, | \, \theta) & = & (1-\theta)^{x_{i}-1}\theta, \ \ x_{i} = 1, 2, \ldots. \end{eqnarray*}\]
As the \(X_{i}\) are exchangeable then \[\begin{eqnarray*} f(x \, | \, \theta) & = & \prod_{i=1}^{n} f(x_{i} \, | \, \theta) \\ & = & \prod_{i=1}^{n} (1-\theta)^{x_{i}-1}\theta \\ & = & (1 - \theta)^{n\bar{x} -n}\theta^{n} \\ & = & \exp\left\{(n\bar{x} - n)\log (1 - \theta) + n \log \theta \right\} \end{eqnarray*}\] and so belongs to the \(1\)-parameter exponential family. The conjugate prior is of the form \[\begin{eqnarray*} f(\theta) & \propto & \exp\left\{ a\log (1 - \theta) + b \log \theta \right\} \\ & = & \theta^{b}(1-\theta)^{a} \end{eqnarray*}\] which is a kernel of a Beta distribution. Letting \(\alpha = b+1\), \(\beta = a+1\) then we have \(\theta \sim Beta(\alpha, \beta)\). \[\begin{eqnarray*} f(\theta \, | \, x) & \propto & f(x \, | \, \theta)f(\theta) \\ & \propto & \theta^{n}(1-\theta)^{(n\bar{x} - n)}\theta^{\alpha-1}(1-\theta)^{\beta-1} \end{eqnarray*}\] which is a kernel of a \(Beta(\alpha + n, \beta + n\bar{x} - n)\) so that \(\theta \, | \, x \sim Beta(\alpha + n, \beta + n\bar{x} - n)\).
The posterior mean is \[\begin{eqnarray*} E(\theta \, | \, X) & = & \frac{\alpha + n}{(\alpha + n) + (\beta + n\bar{x} - n)} \\ & = & \frac{\alpha + n}{\alpha + \beta + n\bar{x}} \\ & = & \left(\frac{\alpha + \beta}{\alpha + \beta + n\bar{x}}\right)\left(\frac{\alpha}{\alpha+\beta}\right) + \left(\frac{n\bar{x}}{\alpha + \beta + n\bar{x}}\right)\left(\frac{1}{\bar{x}}\right) \\ & = & \lambda E(\theta) + (1-\lambda)\bar{x}^{-1} \end{eqnarray*}\] where \(\lambda = \frac{\alpha + \beta}{\alpha + \beta + n\bar{x}}\): a weighted average of the prior mean \(E(\theta) = \frac{\alpha}{\alpha + \beta}\) and the maximum likelihood estimate \(\bar{x}^{-1}\).
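A one-line numerical confirmation of this weighted-average form (the hyperparameters and data summary below are arbitrary):

```python
# Check E(theta | x) = lambda * E(theta) + (1 - lambda) / xbar numerically
# (alpha, beta, n and xbar are illustrative values).
alpha, beta_prior, n, xbar = 2.0, 3.0, 10, 2.5

post_mean = (alpha + n) / (alpha + beta_prior + n * xbar)
lam = (alpha + beta_prior) / (alpha + beta_prior + n * xbar)
assert abs(post_mean - (lam * alpha / (alpha + beta_prior) + (1 - lam) / xbar)) < 1e-12
```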
Now suppose that the prior is the equally weighted mixture of \(Beta(\alpha+1, \beta)\) and \(Beta(\alpha, \beta+1)\) densities. Then \[\begin{eqnarray*} f(\theta \, | \, x) & \propto & f(x \, | \, \theta)f(\theta) \\ & \propto & \theta^{n}(1-\theta)^{(n\bar{x} - n)}\left\{\frac{\theta^{\alpha}(1-\theta)^{\beta-1}}{B(\alpha+1, \beta)} + \frac{\theta^{\alpha-1}(1-\theta)^{\beta}}{B(\alpha, \beta+1)}\right\} \\ & = & \frac{\theta^{\alpha_{1}}(1-\theta)^{\beta_{1}-1}}{B(\alpha+1, \beta)} + \frac{\theta^{\alpha_{1}-1}(1-\theta)^{\beta_{1}}}{B(\alpha, \beta+1)} \end{eqnarray*}\] where \(\alpha_{1} = \alpha +n\) and \(\beta_{1} = \beta + n\bar{x} -n\). Finding the constant of proportionality we observe that \(\theta^{\alpha_{1}}(1-\theta)^{\beta_{1}-1}\) is a kernel of a \(Beta(\alpha_{1}+1, \beta_{1})\) and \(\theta^{\alpha_{1}-1}(1-\theta)^{\beta_{1}}\) is a kernel of a \(Beta(\alpha_{1},\beta_{1}+1)\). So, \[\begin{eqnarray*} f(\theta \, | \, x) & = & c\left\{\frac{B(\alpha_{1}+1,\beta_{1})}{B(\alpha+1,\beta)}f_{1}(\theta) + \frac{B(\alpha_{1},\beta_{1}+1)}{B(\alpha,\beta+1)}f_{2}(\theta)\right\} \end{eqnarray*}\] where \(f_{1}(\theta)\) is the density function of \(Beta(\alpha_{1}+1, \beta_{1})\) and \(f_{2}(\theta)\) the density function of \(Beta(\alpha_{1},\beta_{1}+1)\). Hence, \[\begin{eqnarray*} c^{-1} & = & \frac{B(\alpha_{1}+1,\beta_{1})}{B(\alpha+1,\beta)} + \frac{B(\alpha_{1},\beta_{1}+1)}{B(\alpha,\beta+1)} \end{eqnarray*}\] so that \(f(\theta \, | \, x) = \lambda f_{1}(\theta) + (1-\lambda)f_{2}(\theta)\) with \[\begin{eqnarray*} \lambda & = & \frac{\frac{B(\alpha_{1}+1,\beta_{1})}{B(\alpha+1,\beta)}}{\frac{B(\alpha_{1}+1,\beta_{1})}{B(\alpha+1,\beta)} + \frac{B(\alpha_{1},\beta_{1}+1)}{B(\alpha,\beta+1)}} \\ & = & \frac{\frac{\alpha_{1}(\alpha + \beta)B(\alpha_{1},\beta_{1})}{\alpha(\alpha_{1}+ \beta_{1})B(\alpha,\beta)}}{\frac{\alpha_{1}(\alpha + \beta)B(\alpha_{1},\beta_{1})}{\alpha(\alpha_{1}+\beta_{1})B(\alpha,\beta)} + \frac{\beta_{1}(\alpha+\beta)B(\alpha_{1},\beta_{1})}{\beta(\alpha_{1}+\beta_{1})B(\alpha,\beta)}} \\ & = & \frac{\alpha_{1}\beta}{\alpha_{1}\beta + \beta_{1}\alpha} \\ & = & \frac{(\alpha + n)\beta}{(\alpha + n)\beta + (\beta + \sum_{i=1}^{n} x_{i}-n)\alpha}. \end{eqnarray*}\]
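The mixture weight can be computed stably on the log scale via `betaln`; a sketch with illustrative values (note that geometric data require \(\sum_{i=1}^{n} x_{i} \geq n\)):

```python
import numpy as np
from scipy.special import betaln

# Posterior mixture weight lambda (hyperparameters and data are illustrative;
# geometric data require sx = sum_i x_i >= n).
alpha, beta_prior, n, sx = 2.0, 3.0, 10, 25
a1, b1 = alpha + n, beta_prior + sx - n          # alpha_1, beta_1

# Ratios of Beta functions, computed on the log scale for numerical stability.
w1 = np.exp(betaln(a1 + 1, b1) - betaln(alpha + 1, beta_prior))
w2 = np.exp(betaln(a1, b1 + 1) - betaln(alpha, beta_prior + 1))
lam = w1 / (w1 + w2)

# Agrees with the closed form alpha_1*beta / (alpha_1*beta + beta_1*alpha).
assert np.isclose(lam, a1 * beta_prior / (a1 * beta_prior + b1 * alpha))
```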
Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\theta\). Suppose that \(X_{i} \, | \, \theta\) is distributed as a double-exponential distribution with probability density function \[\begin{eqnarray*} f(x_{i} \, | \, \theta) & = & \frac{1}{2\theta} \exp \left\{- \frac{|x_{i}|}{\theta}\right\}, \ \ -\infty < x_{i} < \infty \end{eqnarray*}\] for \(\theta > 0\).
\[\begin{eqnarray*} f(x \, | \, \theta) & = & \prod_{i=1}^{n} \frac{1}{2\theta} \exp \left\{- \frac{|x_{i}|}{\theta}\right\} \\ & \propto & \frac{1}{\theta^{n}} \exp \left\{- \frac{1}{\theta} \sum_{i=1}^{n}|x_{i}| \right\} \end{eqnarray*}\] which, when viewed as a function of \(\theta\), is a kernel of \(Inv\mbox{-}gamma(n-1, \sum_{i=1}^{n} |x_{i}|)\). We thus take \(\theta \sim Inv\mbox{-}gamma(\alpha, \beta)\) as the prior so that \[\begin{eqnarray*} f(\theta \, | \, x) & \propto & \frac{1}{\theta^{n}} \exp \left\{- \frac{1}{\theta} \sum_{i=1}^{n}|x_{i}| \right\}\frac{1}{\theta^{\alpha + 1}}\exp\left\{-\frac{\beta}{\theta}\right\} \\ & = & \frac{1}{\theta^{\alpha + n + 1}}\exp\left\{- \frac{1}{\theta}\left(\beta + \sum_{i=1}^{n}|x_{i}| \right)\right\} \end{eqnarray*}\] which is a kernel of \(Inv\mbox{-}gamma(\alpha + n, \beta + \sum_{i=1}^{n} |x_{i}|)\). Thus, with respect to \(X \, | \, \theta\), the prior and posterior are in the same family, showing conjugacy, with \(\theta \, | \, x \sim Inv\mbox{-}gamma(\alpha + n, \beta + \sum_{i=1}^{n} |x_{i}|)\).
We have \(\phi = g(\theta)\) where \(g(\theta) = \theta^{-1}\) so that \(\theta = g^{-1}(\phi) = \phi^{-1}\). Transforming \(f_{\theta}(\theta \, | \, x)\) to \(f_{\phi}(\phi \, | \, x)\) we have \[\begin{eqnarray*} f_{\phi}(\phi \, | \, x) & = & \left|\frac{\partial \theta}{\partial \phi}\right| f_{\theta}(g^{-1}(\phi) \, | \, x) \\ & \propto & \left|\frac{-1}{\phi^{2}}\right| \phi^{\alpha + n + 1}\exp\left\{- \phi\left(\beta + \sum_{i=1}^{n}|x_{i}| \right)\right\} \\ & = & \phi^{\alpha + n - 1}\exp\left\{- \phi\left(\beta + \sum_{i=1}^{n}|x_{i}| \right)\right\} \end{eqnarray*}\] which is a kernel of a \(Gamma(\alpha + n, \beta + \sum_{i=1}^{n} |x_{i}|)\) distribution. That is, \(\phi \, | \, x \sim Gamma(\alpha + n, \beta + \sum_{i=1}^{n} |x_{i}|)\). The result highlights the relationship between the Gamma and Inv-gamma distributions shown in question 3(b)(i) of Question Sheet Two.
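A quick Monte Carlo check of this reciprocal relationship (the shape and rate values are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# If theta ~ Inv-gamma(a, b) then phi = 1/theta ~ Gamma(a, b); a and b are illustrative.
a, b = 5.0, 2.0
theta = stats.invgamma.rvs(a, scale=b, size=100_000, random_state=rng)
phi = 1.0 / theta

# Compare sample moments of phi with Gamma(a, b) moments: mean a/b, variance a/b^2.
print(phi.mean(), a / b)
print(phi.var(), a / b ** 2)
```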
Let \(X_{1}, \ldots, X_{n}\) be a finite subset of a sequence of infinitely exchangeable random quantities with joint density function \[\begin{eqnarray*} f(x_{1}, \ldots, x_{n}) & = & n! \left(1 + \sum_{i=1}^{n} x_{i}\right)^{-(n+1)}. \end{eqnarray*}\] Show that they can be represented as conditionally independent and exponentially distributed.
Using de Finetti’s Representation Theorem (Theorem 2 of the on-line notes), the joint distribution has an integral representation of the form \[\begin{eqnarray*} f(x_{1}, \ldots, x_{n}) & = & \int_{\theta}\left\{\prod_{i=1}^{n} f(x_{i} \, | \, \theta)\right\} f(\theta) \, d\theta. \end{eqnarray*}\] If \(X_{i} \, | \, \theta \sim \mbox{Exp}(\theta)\) then \[\begin{eqnarray*} \prod_{i=1}^{n} f(x_{i} \, | \, \theta) \ = \ \prod_{i=1}^{n} \theta \exp\left(-\theta x_{i} \right) \ = \ \theta^{n} \exp\left(-\theta \sum_{i=1}^{n} x_{i} \right). \end{eqnarray*}\] Notice that, viewed as a function of \(\theta\), this looks like a kernel of \(\mbox{Gamma}(n+1, \sum_{i=1}^{n} x_{i})\). The result holds if we can find an \(f(\theta)\) such that \[\begin{eqnarray*} n! \left(1 + \sum_{i=1}^{n} x_{i}\right)^{-(n+1)} & = & \int_{\theta} \theta^{n} \exp\left(-\theta \sum_{i=1}^{n} x_{i} \right) f(\theta) \, d\theta. \end{eqnarray*}\] The left hand side looks like the normalising constant of a \(\mbox{Gamma}(n+1, 1 + \sum_{i=1}^{n} x_{i})\) (as \(n! = \Gamma(n+1)\)) and if \(f(\theta) = \exp(-\theta)\) then the integrand on the right hand side is a kernel of a \(\mbox{Gamma}(n+1, 1 + \sum_{i=1}^{n} x_{i})\). So, if \(\theta \sim \mbox{Gamma}(1, 1)\) then \(f(\theta) = \exp(-\theta)\) and we have the desired representation.
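The representation can also be verified numerically for any particular data vector by quadrature (the data values below are arbitrary):

```python
import numpy as np
from scipy import integrate
from scipy.special import gammaln

# Compare the integral of theta^n exp(-theta*(1 + sum x)) over theta with
# n! (1 + sum x)^{-(n+1)}; the data vector x is arbitrary.
x = np.array([0.7, 1.2, 0.4])
n, sx = len(x), x.sum()

lhs, _ = integrate.quad(lambda th: th ** n * np.exp(-th * (1.0 + sx)), 0, np.inf)
rhs = np.exp(gammaln(n + 1) - (n + 1) * np.log(1.0 + sx))
print(lhs, rhs)        # agree to quadrature accuracy
```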
Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\theta\). Suppose that \(X_{i} \, | \, \theta\) is distributed as a Poisson distribution with mean \(\theta\).
\[\begin{eqnarray*} f(x \, | \, \theta) & = & \prod_{i=1}^{n} P(X_{i} = x_{i} \, | \, \theta) \\ & \propto & \prod_{i=1}^{n} \theta^{x_{i}} \exp\left\{-\theta\right\} \\ & = & \theta^{n\bar{x}}\exp\left\{-n\theta\right\}. \end{eqnarray*}\] Taking \(\theta \sim Gamma(\alpha, \beta)\) we have \[\begin{eqnarray*} f(\theta \, | \, x) & \propto & f(x \, | \, \theta)f(\theta) \\ & \propto & \theta^{n\bar{x}}\exp\left\{-n\theta\right\} \theta^{\alpha -1}\exp\left\{-\beta \theta \right\} \\ & = & \theta^{\alpha + n\bar{x} -1}\exp\left\{-(\beta +n) \theta \right\} \end{eqnarray*}\] which is a kernel of a \(Gamma(\alpha + n\bar{x}, \beta + n)\) distribution, so that \(\theta \, | \, x \sim Gamma(\alpha + n\bar{x}, \beta + n)\). Hence, the prior and posterior are in the same family giving conjugacy.
\[\begin{eqnarray*} E(\theta \, | \, X) & = & \frac{\alpha + n\bar{x}}{\beta + n} \\ & = & \frac{\beta\left(\frac{\alpha}{\beta}\right) + n\bar{x}}{\beta + n} \\ & = & \lambda \left(\frac{\alpha}{\beta}\right) + (1-\lambda)\bar{x} \end{eqnarray*}\] where \(\lambda = \frac{\beta}{\beta + n}\). Hence, the posterior mean is a weighted average of the prior mean, \(\frac{\alpha}{\beta}\), and the data mean, \(\bar{x}\), which is also the maximum likelihood estimate.
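A numerical illustration of this shrinkage (all values arbitrary): with \(n\) large relative to \(\beta\), \(\lambda\) is small and the posterior mean sits close to \(\bar{x}\).

```python
import numpy as np

# Poisson-Gamma posterior mean as a weighted average (illustrative values).
alpha, beta_prior, n, xbar = 3.0, 2.0, 20, 4.5

lam = beta_prior / (beta_prior + n)
post_mean = (alpha + n * xbar) / (beta_prior + n)
assert np.isclose(post_mean, lam * alpha / beta_prior + (1 - lam) * xbar)
print(lam, post_mean)  # lambda is small here, so the data mean dominates
```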
Weak prior information corresponds to a large variance of \(\theta\) which can be viewed as small \(\beta\) (\(\beta\) is the inverse scale parameter). In this case, more weight is attached to \(\bar{x}\) than \(\frac{\alpha}{\beta}\) in the posterior mean.
Strong prior information corresponds to a small variance of \(\theta\) which can be viewed as large \(\beta\) (once again, \(\beta\) is the inverse scale parameter). In this case, more weight is attached to \(\frac{\alpha}{\beta}\) than \(\bar{x}\) in the posterior mean which thus favours the prior mean.
\(\theta \, | \, \lambda \sim Exp(\lambda)\) so \(f(\theta \, | \, \lambda) = \lambda \exp\{-\lambda \theta\}\). Taking \(f(\lambda) \propto 1\), so that it is absorbed into the constant of proportionality, \[\begin{eqnarray*} f(\lambda, \theta \, | \, x) & \propto & f(x \, | \, \theta, \lambda)f(\theta, \lambda) \\ & = & f(x \, | \, \theta) f(\theta \, | \, \lambda)f(\lambda) \\ & \propto & \left(\theta^{n\bar{x}}\exp\left\{-n\theta\right\}\right)\left( \lambda \exp\left\{-\lambda \theta\right\}\right) \\ & = & \lambda \theta^{n\bar{x}}\exp\left\{-(n+\lambda)\theta\right\}. \end{eqnarray*}\] Thus, integrating out \(\theta\), \[\begin{eqnarray*} f(\lambda \, | \, x) & \propto & \int_{0}^{\infty} \lambda \theta^{n\bar{x}}\exp\left\{-(n+\lambda)\theta\right\} \, d\theta \\ & = & \lambda \int_{0}^{\infty} \theta^{n\bar{x}}\exp\left\{-(n+\lambda)\theta\right\} \, d\theta. \end{eqnarray*}\] As the integrand is a kernel of a \(Gamma(n\bar{x}+1, n+\lambda)\) distribution we thus have \[\begin{eqnarray*} f(\lambda \, | \, x) & \propto & \frac{\lambda \Gamma(n\bar{x} + 1)}{(n+\lambda)^{n\bar{x}+1}} \\ & \propto & \frac{\lambda}{(n+\lambda)^{n\bar{x}+1}}. \end{eqnarray*}\]
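This posterior for \(\lambda\) can be normalised numerically by quadrature; a minimal sketch with arbitrary \(n\) and \(\bar{x}\):

```python
import numpy as np
from scipy import integrate

# Normalise f(lambda | x) proportional to lambda / (n + lambda)^(n*xbar + 1)
# and compute the posterior mean of lambda (n and xbar are illustrative).
n, xbar = 10, 3.0

def kernel(lam):
    return lam / (n + lam) ** (n * xbar + 1)

Z, _ = integrate.quad(kernel, 0, np.inf)                             # normalising constant
post_mean, _ = integrate.quad(lambda lam: lam * kernel(lam) / Z, 0, np.inf)
print(post_mean)
```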