Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\theta\). Suppose that
\(X_{i} \, | \, \theta \sim \mbox{Bern}(\theta)\).
Show that \(f(x_{i} \, | \,
\theta)\) belongs to the 1-parameter exponential family and for
\(X = (X_{1}, \ldots, X_{n})\) state
the sufficient statistic for learning about \(\theta\).
Notice that we
can write \[\begin{eqnarray*}
f(x_{i} \, | \, \theta) & = & \theta^{x_{i}}(1-\theta)^{1-x_{i}}
\\
& = & \exp \left\{\left( \log \frac{\theta}{1-\theta}\right)
x_{i} + \log (1 - \theta)\right\}
\end{eqnarray*}\] so that \(f(x_{i} \,
| \, \theta)\) belongs to the \(1\)-parameter exponential family with \(\phi_{1}(\theta) = \log
\frac{\theta}{1-\theta}\), \(u_{1}(x_{i}) = x_{i}\), \(g(\theta) = \log (1 - \theta)\) and \(h(x_{i}) = 0\). Notice that, from
Proposition 1 (see Lecture 11), \(t_{n} = [n,
\sum_{i=1}^{n} X_{i}]\) is a sufficient statistic.
By viewing the likelihood as a function of \(\theta\), which generic family of
distributions (over \(\theta\)) is the
likelihood a kernel of?
The likelihood, without
expressing in the explicit exponential family form, is \[\begin{eqnarray*}
f(x \, | \, \theta) & = & \theta^{n\bar{x}}(1 - \theta)^{n -
n\bar{x}}
\end{eqnarray*}\] which, viewed as a function of \(\theta\), we immediately recognise as a
Beta kernel (in particular, a \(\mbox{Beta}(n\bar{x}+1, n -
n\bar{x}+1)\)).
By first finding the corresponding posterior distribution
for \(\theta\) given \(x = (x_{1}, \ldots, x_{n})\), show that
this family of distributions is conjugate with respect to the likelihood
\(f(x \, | \, \theta)\).
Taking \(\theta \sim \mbox{Beta}(\alpha,
\beta)\) we have that
\[\begin{eqnarray*}
f(\theta \, | \, x) & \propto & \theta^{n\bar{x}}(1 - \theta)^{n
- n\bar{x}} \times \theta^{\alpha - 1}(1 - \theta)^{\beta - 1} \\
& = & \theta^{\alpha + n\bar{x} - 1}(1-\theta)^{\beta + n -
n\bar{x} - 1}
\end{eqnarray*}\] so that \(\theta \, |
\, x \sim \mbox{Beta}(\alpha + n\bar{x}, \beta + n - n\bar{x})\).
Thus, the prior and the posterior are in the same family giving
conjugacy.
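This update can be sanity-checked numerically. The Python sketch below is not part of the solution; the data \(x\) and the hyperparameters \(\alpha = 2\), \(\beta = 3\) are illustrative assumptions. If \(\theta \, | \, x \sim \mbox{Beta}(\alpha + n\bar{x}, \beta + n - n\bar{x})\), the unnormalised log posterior differs from that log density by the same constant (the log marginal likelihood) at every \(\theta\):

```python
import math

# Illustrative data and prior hyperparameters (assumptions, not from the text).
alpha, beta = 2.0, 3.0
x = [1, 0, 1, 1, 0, 1]          # s = 4 successes out of n = 6
n, s = len(x), sum(x)

def log_beta_pdf(theta, a, b):
    """Log density of a Beta(a, b) distribution at theta."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)

def log_post_unnorm(theta):
    """Unnormalised log posterior: log likelihood plus log prior."""
    loglik = s * math.log(theta) + (n - s) * math.log(1 - theta)
    return loglik + log_beta_pdf(theta, alpha, beta)

# Conjugacy says theta | x ~ Beta(alpha + s, beta + n - s); the difference
# below should therefore be constant across the grid of theta values.
diffs = [log_post_unnorm(t) - log_beta_pdf(t, alpha + s, beta + n - s)
         for t in (0.1, 0.3, 0.5, 0.7, 0.9)]
assert max(diffs) - min(diffs) < 1e-12
```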
We can also derive these results directly from the exponential family
representation. Expressed in
1-parameter exponential family form, the likelihood is \[\begin{eqnarray*}
f(x \, | \, \theta) & = & \exp \left\{\left( \log
\frac{\theta}{1-\theta}\right) \sum_{i=1}^{n} x_{i} + n\log (1 -
\theta)\right\}
\end{eqnarray*}\] from which we immediately observe the
sufficient statistic \(t_{n} = [n,
\sum_{i=1}^{n} x_{i}]\). Viewing \(f(x
\, | \, \theta)\) as a function of \(\theta\) the natural conjugate prior is a
member of the \(2\)-parameter
exponential family of the form \[\begin{eqnarray*}
f(\theta) & = & \exp \left\{a\left( \log
\frac{\theta}{1-\theta}\right) + d\log (1 - \theta) + c(a, d)\right\}
\end{eqnarray*}\] where \(c(a,
d)\) is the normalising constant. Hence, \[\begin{eqnarray*}
f(\theta) & \propto & \exp \left\{a\left( \log
\frac{\theta}{1-\theta}\right) + d\log (1 - \theta) \right\} \nonumber
\\
& = & \theta^{a}(1-\theta)^{d-a} \label{eq1a3}
\end{eqnarray*}\] which we recognise as a kernel of a Beta
distribution. The convention is to label the hyperparameters as \(\alpha\) and \(\beta\) so that we put \(\alpha = \alpha(a, d) = a + 1\) and \(\beta = \beta(a, d) = d - a +1\)
(equivalently, \(a = a(\alpha, \beta) = \alpha
- 1\), \(d = d(\alpha, \beta) = \beta +
\alpha -2\)). The conjugate prior distribution is \(\theta \sim \mbox{Beta}(\alpha,
\beta)\).
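The hyperparameter mapping \(a = \alpha - 1\), \(d = \alpha + \beta - 2\) can be verified numerically; in the following Python sketch the values of \(\alpha\) and \(\beta\) are illustrative assumptions:

```python
import math

# Illustrative hyperparameters (assumptions) and the mapping derived above.
alpha, beta = 2.5, 4.0
a, d = alpha - 1, alpha + beta - 2

# The exponential-family prior exp{a log(theta/(1-theta)) + d log(1-theta)}
# should equal the Beta(alpha, beta) kernel theta^(alpha-1)(1-theta)^(beta-1).
max_err = max(
    abs(math.exp(a * math.log(t / (1 - t)) + d * math.log(1 - t))
        - t ** (alpha - 1) * (1 - t) ** (beta - 1))
    for t in (0.2, 0.5, 0.8)
)
assert max_err < 1e-12
```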
Let \(X_{i} \, | \, \theta \sim N(\mu, \theta)\) with \(\mu\) known.
Show that \(f(x_{i} \, | \,
\theta)\) belongs to the 1-parameter exponential family and for
\(X = (X_{1}, \ldots, X_{n})\) state
the sufficient statistic for learning about \(\theta\).
Writing the
normal density as an exponential family (parameter \(\theta\) as \(\mu\) is a known constant) we have \[\begin{eqnarray*}
f(x_{i} \, | \, \theta) & = & \exp\left\{-\frac{1}{2\theta}
(x_{i} - \mu)^{2} - \frac{1}{2}\log \theta - \log \sqrt{2\pi} \right\}
\end{eqnarray*}\] so that \(f(x_{i} \,
| \, \theta)\) belongs to the 1-parameter exponential family. The
sufficient statistic is \(t_{n} = [n,
\sum_{i=1}^{n}(x_{i} - \mu)^{2}]\). Note that, expressed
explicitly as a 1-parameter exponential family, the likelihood for \(x = (x_{1}, \ldots, x_{n})\) is \[\begin{eqnarray*}
f(x \, | \, \theta) & = & \exp\left\{-\frac{1}{2\theta}
\sum_{i=1}^{n} (x_{i} - \mu)^{2} - \frac{n}{2}\log \theta - n\log
\sqrt{2\pi} \right\}
\end{eqnarray*}\] so that the natural conjugate prior has the
form \[\begin{eqnarray*}
f(\theta) & = & \exp\left\{-a \frac{1}{\theta} - d \log \theta +
c(a, d)\right\} \\
& \propto & \theta^{-d}\exp\left\{-a\frac{1}{\theta}\right\}
\end{eqnarray*}\] which we recognise as a kernel of an
Inverse-Gamma distribution.
By viewing the likelihood as a function of \(\theta\), which generic family of
distributions (over \(\theta\)) is the
likelihood a kernel of?
In conventional form, \[\begin{eqnarray*}
f(x \, | \, \theta) & \propto & \theta^{-\frac{n}{2}} \exp
\left\{-\frac{1}{2\theta}\sum_{i=1}^{n} (x_{i}- \mu)^{2}\right\}
\end{eqnarray*}\] which, viewed as a function of \(\theta\), we recognise as a kernel of an
Inverse-Gamma distribution (in particular, an \(\mbox{Inv-gamma}(\frac{n-2}{2},
\frac{1}{2}\sum_{i=1}^{n} (x_{i}- \mu)^{2})\)).
By first finding the corresponding posterior distribution
for \(\theta\) given \(x = (x_{1}, \ldots, x_{n})\), show that
this family of distributions is conjugate with respect to the likelihood
\(f(x \, | \, \theta)\).
Taking \(\theta \sim
\mbox{Inv-gamma}(\alpha, \beta)\) we have \[\begin{eqnarray*}
f(\theta \, | \, x) & \propto & \theta^{-\frac{n}{2}} \exp
\left\{-\frac{1}{2\theta}\sum_{i=1}^{n} (x_{i}- \mu)^{2}\right\} \times
\theta^{-(\alpha + 1)}\exp\left\{-\frac{\beta}{\theta}\right\} \\
& = & \theta^{-(\alpha + \frac{n}{2} +
1)}\exp\left\{-\left(\beta + \frac{1}{2}\sum_{i=1}^{n} (x_{i} -
\mu)^{2}\right)\frac{1}{\theta}\right\}
\end{eqnarray*}\] which we recognise as a kernel of an
Inverse-Gamma distribution so that \(\theta \,
| \, x \sim \mbox{Inv-gamma}(\alpha + \frac{n}{2}, \beta + \frac{1}{2}
\sum_{i=1}^{n} (x_{i} - \mu)^{2})\). Hence, the prior and
posterior are in the same family giving conjugacy.
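This Inverse-Gamma update can also be checked numerically. In the Python sketch below the known mean \(\mu\), the hyperparameters and the simulated data are illustrative assumptions:

```python
import math, random

# Illustrative setup (assumptions): known mean mu, Inv-gamma prior, fake data.
mu, alpha, beta = 1.0, 3.0, 2.0
random.seed(0)
x = [random.gauss(mu, 1.5) for _ in range(10)]
n = len(x)
ss = sum((xi - mu) ** 2 for xi in x)

def log_invgamma_pdf(theta, a, b):
    """Log density of an Inverse-Gamma(a, b) distribution at theta."""
    return a * math.log(b) - math.lgamma(a) - (a + 1) * math.log(theta) - b / theta

def log_post_unnorm(theta):
    """Unnormalised log posterior for the normal variance with known mean."""
    loglik = -0.5 * n * math.log(2 * math.pi * theta) - ss / (2 * theta)
    return loglik + log_invgamma_pdf(theta, alpha, beta)

# Conjugacy says theta | x ~ Inv-gamma(alpha + n/2, beta + ss/2); the
# difference should be the same constant at every theta.
diffs = [log_post_unnorm(t) - log_invgamma_pdf(t, alpha + n / 2, beta + ss / 2)
         for t in (0.5, 1.0, 2.0, 4.0)]
assert max(diffs) - min(diffs) < 1e-10
```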
Let \(X_{i} \, | \, \theta \sim \mbox{Maxwell}(\theta)\), the Maxwell distribution with parameter \(\theta\), so that \[\begin{eqnarray*} f(x_{i} \, | \, \theta) = \left(\frac{2}{\pi}\right)^{\frac{1}{2}}\theta^{\frac{3}{2}}x_{i}^{2}\exp\left\{-\frac{\theta x_{i}^{2}}{2}\right\}, \ \ x_{i} > 0 \end{eqnarray*}\] with \(E(X_{i} \, | \, \theta) = 2\sqrt{\frac{2}{\pi \theta}}\) and \(Var(X_{i} \, | \, \theta) = \frac{3\pi - 8}{\pi \theta}\).
Show that \(f(x_{i} \, | \,
\theta)\) belongs to the 1-parameter exponential family and for
\(X = (X_{1}, \ldots, X_{n})\) state
the sufficient statistic for learning about \(\theta\).
Writing the
Maxwell density in exponential family form we have \[\begin{eqnarray*}
f(x_{i} \, | \, \theta) & = & \exp\left\{-\theta
\frac{x_{i}^{2}}{2} + \frac{3}{2} \log \theta + \log x_{i}^{2} +
\frac{1}{2} \log \frac{2}{\pi}\right\}
\end{eqnarray*}\] so that \(f(x_{i} \,
| \, \theta)\) belongs to the 1-parameter exponential family. The
sufficient statistic is \(t_{n} = [n,
\sum_{i=1}^{n}x_{i}^{2}]\). Note that, expressed explicitly as a
1-parameter exponential family, the likelihood for \(x = (x_{1}, \ldots, x_{n})\) is \[\begin{eqnarray*}
f(x \, | \, \theta) & = & \exp\left\{-\theta \sum_{i=1}^{n}
\frac{x_{i}^{2}}{2} + \frac{3n}{2} \log \theta + \sum_{i=1}^{n}\log
x_{i}^{2} + \frac{n}{2} \log \frac{2}{\pi}\right\}
\end{eqnarray*}\] so that the natural conjugate prior has the
form \[\begin{eqnarray*}
f(\theta) & = & \exp\left\{-a \theta + d \log \theta + c(a,
d)\right\} \\
& \propto & \theta^{d} e^{-a \theta}
\end{eqnarray*}\] which we recognise as a kernel of a Gamma
distribution.
By viewing the likelihood as a function of \(\theta\), which generic family of
distributions (over \(\theta\)) is the
likelihood a kernel of?
In conventional form, \[\begin{eqnarray*}
f(x \, | \, \theta) & = &
\left(\frac{2}{\pi}\right)^{\frac{n}{2}}\theta^{\frac{3n}{2}}\left(\prod_{i=1}^{n}
x_{i}^{2}\right)\exp\left\{-\left(\frac{\sum_{i=1}^{n}
x_{i}^{2}}{2}\right)\theta\right\} \\
& \propto &
\theta^{\frac{3n}{2}}\exp\left\{-\left(\frac{\sum_{i=1}^{n}
x_{i}^{2}}{2}\right)\theta\right\}
\end{eqnarray*}\] which, viewed as a function of \(\theta\), we recognise as a kernel of a
Gamma distribution (in particular, \(\mbox{Gamma}(\frac{3n+2}{2},
\frac{1}{2}\sum_{i=1}^{n} x_{i}^{2})\)).
By first finding the corresponding posterior distribution
for \(\theta\) given \(x = (x_{1}, \ldots, x_{n})\), show that
this family of distributions is conjugate with respect to the likelihood
\(f(x \, | \, \theta)\).
Taking \(\theta \sim \mbox{Gamma}(\alpha,
\beta)\) we have \[\begin{eqnarray*}
f(\theta \, | \, x) & \propto &
\theta^{\frac{3n}{2}}\exp\left\{-\left(\frac{\sum_{i=1}^{n}
x_{i}^{2}}{2}\right)\theta\right\} \times \theta^{\alpha -1}e^{-\beta
\theta} \\
& = & \theta^{\alpha + \frac{3n}{2} - 1} \exp\left\{-\left(\beta
+ \frac{1}{2}\sum_{i=1}^{n} x_{i}^{2}\right)\theta\right\}
\end{eqnarray*}\] which, of course, is a kernel of a Gamma
distribution so that \(\theta \, | \, x \sim
\mbox{Gamma}(\alpha + \frac{3n}{2}, \beta +
\frac{1}{2}\sum_{i=1}^{n}x_{i}^{2})\). The prior and the
posterior are in the same family giving conjugacy.
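The Maxwell–Gamma update can be confirmed numerically in the same way; the data and the hyperparameters \(\alpha\), \(\beta\) in this Python sketch are illustrative assumptions:

```python
import math

# Illustrative Gamma prior hyperparameters and Maxwell-type data (assumptions).
alpha, beta = 2.0, 1.5
x = [0.8, 1.2, 0.5, 2.0, 1.1]
n = len(x)
ssq = sum(xi ** 2 for xi in x)

def log_gamma_pdf(theta, a, b):
    """Log density of a Gamma(a, b) distribution (rate b) at theta."""
    return a * math.log(b) - math.lgamma(a) + (a - 1) * math.log(theta) - b * theta

def log_post_unnorm(theta):
    """Unnormalised log posterior under the Maxwell likelihood."""
    loglik = sum(0.5 * math.log(2 / math.pi) + 1.5 * math.log(theta)
                 + 2 * math.log(xi) - theta * xi ** 2 / 2 for xi in x)
    return loglik + log_gamma_pdf(theta, alpha, beta)

# Conjugacy says theta | x ~ Gamma(alpha + 3n/2, beta + ssq/2).
diffs = [log_post_unnorm(t) - log_gamma_pdf(t, alpha + 1.5 * n, beta + ssq / 2)
         for t in (0.3, 0.7, 1.5, 3.0)]
assert max(diffs) - min(diffs) < 1e-12
```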
Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\theta\). Suppose that \(X_{i} \, | \, \theta\) is geometrically distributed with probability density function \[\begin{eqnarray*} f(x_{i} \, | \, \theta) & = & (1-\theta)^{x_{i}-1}\theta, \ \ x_{i} = 1, 2, \ldots. \end{eqnarray*}\]
Show that \(f(x \, | \,
\theta)\), where \(x = (x_{1}, \ldots,
x_{n})\), belongs to the \(1\)-parameter exponential family. Hence, or
otherwise, find the conjugate prior distribution and corresponding
posterior distribution for \(\theta\).
As the \(X_{i}\) are conditionally independent given \(\theta\), \[\begin{eqnarray*}
f(x \, | \, \theta) & = & \prod_{i=1}^{n} f(x_{i} \, | \,
\theta) \\
& = & \prod_{i=1}^{n} (1-\theta)^{x_{i}-1}\theta \\
& = & (1 - \theta)^{n\bar{x} -n}\theta^{n} \\
& = & \exp\left\{(n\bar{x} - n)\log (1 - \theta) + n \log \theta
\right\}
\end{eqnarray*}\] and so belongs to the \(1\)-parameter exponential family. The
conjugate prior is of the form \[\begin{eqnarray*}
f(\theta) & \propto & \exp\left\{ a\log (1 - \theta) + b \log
\theta \right\} \\
& = & \theta^{b}(1-\theta)^{a}
\end{eqnarray*}\] which is a kernel of a Beta distribution.
Letting \(\alpha = b+1\) and \(\beta = a+1\), we have \(\theta \sim \mbox{Beta}(\alpha, \beta)\).
\[\begin{eqnarray*}
f(\theta \, | \, x) & \propto & f(x \, | \, \theta)f(\theta) \\
& \propto & \theta^{n}(1-\theta)^{(n\bar{x} -
n)}\theta^{\alpha-1}(1-\theta)^{\beta-1}
\end{eqnarray*}\] which is a kernel of a \(\mbox{Beta}(\alpha + n, \beta + n\bar{x} -
n)\) so that \(\theta \, | \, x \sim
\mbox{Beta}(\alpha + n, \beta + n\bar{x} - n)\).
Show that the posterior mean for \(\theta\) can be written as a weighted
average of the prior mean of \(\theta\)
and the maximum likelihood estimate, \(\bar{x}^{-1}\).
\[\begin{eqnarray*}
E(\theta \, | \, X) & = & \frac{\alpha + n}{(\alpha + n) +
(\beta + n\bar{x} - n)} \\
& = & \frac{\alpha + n}{\alpha + \beta + n\bar{x}} \\
& = & \left(\frac{\alpha + \beta}{\alpha + \beta +
n\bar{x}}\right)\left(\frac{\alpha}{\alpha+\beta}\right) +
\left(\frac{n\bar{x}}{\alpha + \beta +
n\bar{x}}\right)\left(\frac{1}{\bar{x}}\right) \\
& = & \lambda E(\theta) + (1-\lambda)\bar{x}^{-1}.
\end{eqnarray*}\]
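The weighted-average identity can be checked directly; in the Python sketch below the geometric data and the hyperparameters are illustrative assumptions:

```python
# Illustrative Beta(alpha, beta) prior and geometric data x_i >= 1 (assumptions).
alpha, beta = 2.0, 3.0
x = [3, 1, 4, 2, 2]
n = len(x)
xbar = sum(x) / n

# Posterior mean of Beta(alpha + n, beta + n*xbar - n) ...
post_mean = (alpha + n) / (alpha + beta + n * xbar)

# ... should equal lambda * prior mean + (1 - lambda) * MLE, with
# lambda = (alpha + beta)/(alpha + beta + n*xbar) and MLE 1/xbar.
lam = (alpha + beta) / (alpha + beta + n * xbar)
weighted = lam * alpha / (alpha + beta) + (1 - lam) / xbar
assert abs(post_mean - weighted) < 1e-12
```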
Suppose now that the prior for \(\theta\) is instead given by the
probability density function
\[\begin{eqnarray*}
f(\theta) & = & \frac{1}{2B(\alpha+1,
\beta)}\theta^{\alpha}(1-\theta)^{\beta - 1} + \frac{1}{2B(\alpha,
\beta+1)}\theta^{\alpha-1}(1-\theta)^{\beta},
\end{eqnarray*}\] where \(B(\alpha, \beta)\) denotes the Beta
function evaluated at \(\alpha\) and
\(\beta\). Show that the posterior
probability density function can be written as \[\begin{eqnarray*}
f(\theta \, | \, x) & = & \lambda f_{1}(\theta) + (1 - \lambda)
f_{2}(\theta)
\end{eqnarray*}\] where \[\begin{eqnarray*}
\lambda & = & \frac{(\alpha + n)\beta}{(\alpha + n)\beta +
(\beta -n + \sum_{i=1}^{n} x_{i})\alpha}
\end{eqnarray*}\] and \(f_{1}(\theta)\) and \(f_{2}(\theta)\) are probability density
functions.
\[\begin{eqnarray*}
f(\theta \, | \, x) & \propto & f(x \, | \, \theta)f(\theta) \\
& = & \theta^{n}(1-\theta)^{(n\bar{x} -
n)}\left\{\frac{\theta^{\alpha}(1-\theta)^{\beta-1}}{B(\alpha+1, \beta)}
+ \frac{\theta^{\alpha-1}(1-\theta)^{\beta}}{B(\alpha, \beta+1)}\right\}
\\
& = &
\frac{\theta^{\alpha_{1}}(1-\theta)^{\beta_{1}-1}}{B(\alpha+1, \beta)} +
\frac{\theta^{\alpha_{1}-1}(1-\theta)^{\beta_{1}}}{B(\alpha, \beta+1)}
\end{eqnarray*}\] where \(\alpha_{1} =
\alpha +n\) and \(\beta_{1} = \beta +
n\bar{x} -n\). Finding the constant of proportionality we observe
that \(\theta^{\alpha_{1}}(1-\theta)^{\beta_{1}-1}\)
is a kernel of a \(\mbox{Beta}(\alpha_{1}+1,
\beta_{1})\) and \(\theta^{\alpha_{1}-1}(1-\theta)^{\beta_{1}}\)
is a kernel of a \(\mbox{Beta}(\alpha_{1},\beta_{1}+1)\). So,
\[\begin{eqnarray*}
f(\theta \, | \, x) & = &
c\left\{\frac{B(\alpha_{1}+1,\beta_{1})}{B(\alpha+1,\beta)}f_{1}(\theta)
+
\frac{B(\alpha_{1},\beta_{1}+1)}{B(\alpha,\beta+1)}f_{2}(\theta)\right\}
\end{eqnarray*}\] where \(f_{1}(\theta)\) is the density function of
\(\mbox{Beta}(\alpha_{1}+1,
\beta_{1})\) and \(f_{2}(\theta)\) the density function of
\(\mbox{Beta}(\alpha_{1},\beta_{1}+1)\).
Hence, \[\begin{eqnarray*}
c^{-1} & = & \frac{B(\alpha_{1}+1,\beta_{1})}{B(\alpha+1,\beta)}
+ \frac{B(\alpha_{1},\beta_{1}+1)}{B(\alpha,\beta+1)}
\end{eqnarray*}\] so that \(f(\theta \,
| \, x) = \lambda f_{1}(\theta) + (1-\lambda)f_{2}(\theta)\) with
\[\begin{eqnarray*}
\lambda & = &
\frac{\frac{B(\alpha_{1}+1,\beta_{1})}{B(\alpha+1,\beta)}}{\frac{B(\alpha_{1}+1,\beta_{1})}{B(\alpha+1,\beta)}
+ \frac{B(\alpha_{1},\beta_{1}+1)}{B(\alpha,\beta+1)}} \\
& = & \frac{\frac{\alpha_{1}(\alpha +
\beta)B(\alpha_{1},\beta_{1})}{\alpha(\alpha_{1}+
\beta_{1})B(\alpha,\beta)}}{\frac{\alpha_{1}(\alpha +
\beta)B(\alpha_{1},\beta_{1})}{\alpha(\alpha_{1}+\beta_{1})B(\alpha,\beta)}
+
\frac{\beta_{1}(\alpha+\beta)B(\alpha_{1},\beta_{1})}{\beta(\alpha_{1}+\beta_{1})B(\alpha,\beta)}}
\\
& = & \frac{\alpha_{1}\beta}{\alpha_{1}\beta + \beta_{1}\alpha}
\\
& = & \frac{(\alpha + n)\beta}{(\alpha + n)\beta + (\beta +
\sum_{i=1}^{n} x_{i}-n)\alpha}.
\end{eqnarray*}\]
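The simplification of \(\lambda\) from the ratio of Beta functions to the closed form can be verified numerically; the hyperparameters and data in this Python sketch are illustrative assumptions:

```python
import math

def B(a, b):
    """Beta function via log-gamma for numerical stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

# Illustrative prior hyperparameters and geometric data (assumptions).
alpha, beta = 2.0, 3.0
x = [3, 1, 4, 2, 2]
n, s = len(x), sum(x)
a1, b1 = alpha + n, beta + s - n   # alpha_1 and beta_1 from the derivation

# Mixture weight from the ratio of Beta functions ...
w1 = B(a1 + 1, b1) / B(alpha + 1, beta)
w2 = B(a1, b1 + 1) / B(alpha, beta + 1)
lam_beta = w1 / (w1 + w2)

# ... and the simplified closed form derived above.
lam_closed = (alpha + n) * beta / ((alpha + n) * beta + (beta + s - n) * alpha)
assert abs(lam_beta - lam_closed) < 1e-12
```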
Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\theta\). Suppose that \(X_{i} \, | \, \theta\) is distributed as a double-exponential distribution with probability density function \[\begin{eqnarray*} f(x_{i} \, | \, \theta) & = & \frac{1}{2\theta} \exp \left\{- \frac{|x_{i}|}{\theta}\right\}, \ \ -\infty < x_{i} < \infty \end{eqnarray*}\] for \(\theta > 0\).
Find the conjugate prior distribution and corresponding
posterior distribution for \(\theta\)
following observation of \(x = (x_{1}, \ldots,
x_{n})\).
\[\begin{eqnarray*}
f(x \, | \, \theta) & = & \prod_{i=1}^{n} \frac{1}{2\theta} \exp
\left\{- \frac{|x_{i}|}{\theta}\right\} \\
& \propto & \frac{1}{\theta^{n}} \exp \left\{- \frac{1}{\theta}
\sum_{i=1}^{n}|x_{i}| \right\}
\end{eqnarray*}\] which, when viewed as a function of \(\theta\), is a kernel of \(\mbox{Inv-gamma}(n-1, \sum_{i=1}^{n}
|x_{i}|)\). We thus take \(\theta \sim
\mbox{Inv-gamma}(\alpha, \beta)\) as the prior so that \[\begin{eqnarray*}
f(\theta \, | \, x) & \propto & \frac{1}{\theta^{n}} \exp
\left\{- \frac{1}{\theta} \sum_{i=1}^{n}|x_{i}|
\right\}\frac{1}{\theta^{\alpha +
1}}\exp\left\{-\frac{\beta}{\theta}\right\} \\
& = & \frac{1}{\theta^{\alpha + n + 1}}\exp\left\{-
\frac{1}{\theta}\left(\beta + \sum_{i=1}^{n}|x_{i}| \right)\right\}
\end{eqnarray*}\] which is a kernel of \(\mbox{Inv-gamma}(\alpha + n, \beta +
\sum_{i=1}^{n} |x_{i}|)\). Thus, with respect to \(X \, | \, \theta\), the prior and posterior
are in the same family, showing conjugacy, with \(\theta \, | \, x \sim \mbox{Inv-gamma}(\alpha + n,
\beta + \sum_{i=1}^{n} |x_{i}|)\).
Consider the transformation \(\phi = \theta^{-1}\). Find the posterior
distribution of \(\phi \, | \,
x\).
We have \(\phi =
g(\theta)\) where \(g(\theta) =
\theta^{-1}\) so that \(\theta =
g^{-1}(\phi) = \phi^{-1}\). Transforming \(f_{\theta}(\theta \, | \, x)\) to \(f_{\phi}(\phi \, | \, x)\) we have \[\begin{eqnarray*}
f_{\phi}(\phi \, | \, x) & = & \left|\frac{\partial
\theta}{\partial \phi}\right| f_{\theta}(g^{-1}(\phi) \, | \, x) \\
& \propto & \left|\frac{-1}{\phi^{2}}\right|
\left(\frac{1}{\phi}\right)^{-(\alpha + n + 1)}\exp\left\{-
\phi\left(\beta + \sum_{i=1}^{n}|x_{i}|
\right)\right\} \\
& = & \phi^{\alpha + n - 1}\exp\left\{- \phi\left(\beta +
\sum_{i=1}^{n}|x_{i}| \right)\right\}
\end{eqnarray*}\] which is a kernel of a \(\mbox{Gamma}(\alpha + n, \beta + \sum_{i=1}^{n}
|x_{i}|)\) distribution. That is \(\phi
\, | \, x \sim \mbox{Gamma}(\alpha + n, \beta + \sum_{i=1}^{n}
|x_{i}|)\). The result highlights the relationship between the
Gamma and Inverse-Gamma distributions shown in question 3(b)(i) of Question
Sheet Two.
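This change-of-variables result can itself be checked numerically: if \(\theta \sim \mbox{Inv-gamma}(a, b)\) then \(\phi = \theta^{-1} \sim \mbox{Gamma}(a, b)\). The hyperparameters in the Python sketch below are illustrative assumptions:

```python
import math

# Illustrative hyperparameters (assumptions).
a, b = 4.0, 2.5

def invgamma_pdf(theta):
    """Inverse-Gamma(a, b) density at theta."""
    return (b ** a / math.gamma(a)) * theta ** (-(a + 1)) * math.exp(-b / theta)

def gamma_pdf(phi):
    """Gamma(a, b) density (rate b) at phi."""
    return (b ** a / math.gamma(a)) * phi ** (a - 1) * math.exp(-b * phi)

# f_phi(phi) = |d theta/d phi| f_theta(1/phi) with Jacobian 1/phi^2.
max_err = max(abs(gamma_pdf(p) - invgamma_pdf(1 / p) / p ** 2)
              for p in (0.3, 0.8, 1.5, 3.0))
assert max_err < 1e-12
```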
Let \(X_{1}, \ldots, X_{n}\)
be a finite subset of a sequence of infinitely exchangeable random
variables with joint density function \[\begin{eqnarray*}
f(x_{1}, \ldots, x_{n}) & = & n! \left(1 + \sum_{i=1}^{n}
x_{i}\right)^{-(n+1)}.
\end{eqnarray*}\] Show that they can be represented as
conditionally independent and exponentially distributed.
Using de Finetti’s Representation Theorem (Theorem 2 of the on-line
notes), the joint distribution has an integral representation of the
form \[\begin{eqnarray*}
f(x_{1}, \ldots, x_{n}) & = &
\int_{\theta}\left\{\prod_{i=1}^{n} f(x_{i} \, | \, \theta)\right\}
f(\theta) \, d\theta.
\end{eqnarray*}\] If \(X_{i} \, | \,
\theta \sim \mbox{Exp}(\theta)\) then \[\begin{eqnarray*}
\prod_{i=1}^{n} f(x_{i} \, | \, \theta) \ = \ \prod_{i=1}^{n} \theta
\exp\left(-\theta x_{i} \right) \ = \ \theta^{n} \exp\left(-\theta
\sum_{i=1}^{n} x_{i} \right).
\end{eqnarray*}\] Notice that, viewed as a function of \(\theta\), this looks like a kernel of \(\mbox{Gamma}(n+1, \sum_{i=1}^{n} x_{i})\).
The result holds if we can find an \(f(\theta)\) such that \[\begin{eqnarray*}
n! \left(1 + \sum_{i=1}^{n} x_{i}\right)^{-(n+1)} & = &
\int_{\theta} \theta^{n} \exp\left(-\theta \sum_{i=1}^{n} x_{i} \right)
f(\theta) \, d\theta.
\end{eqnarray*}\] The left hand side looks like the normalising
constant of a \(\mbox{Gamma}(n+1, 1 +
\sum_{i=1}^{n} x_{i})\) (as \(n! =
\Gamma(n+1)\)) and if \(f(\theta) =
\exp(-\theta)\) then the integrand on the right hand side is a
kernel of a \(\mbox{Gamma}(n+1, 1 +
\sum_{i=1}^{n} x_{i})\). So, if \(\theta \sim \mbox{Gamma}(1, 1)\) then \(f(\theta) = \exp(-\theta)\) and we have the
desired representation.
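The integral identity underlying this representation can be checked by crude numerical integration. The data values in the Python sketch below are illustrative assumptions:

```python
import math

# Illustrative data (assumptions).
x = [0.4, 1.3, 0.7]
n, sx = len(x), sum(x)

# With f(theta) = exp(-theta), i.e. theta ~ Gamma(1, 1), the mixture integral
# int_0^inf theta^n exp(-theta * sx) exp(-theta) d theta should recover the
# stated joint density n! (1 + sx)^(-(n+1)). Left Riemann sum over theta.
h, upper = 1e-3, 50.0
integral = sum(
    (h * k) ** n * math.exp(-(h * k) * (1 + sx)) * h
    for k in range(1, int(upper / h))
)
target = math.factorial(n) * (1 + sx) ** (-(n + 1))
assert abs(integral - target) / target < 1e-3
```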
Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\theta\). Suppose that \(X_{i} \, | \, \theta\) is distributed as a Poisson distribution with mean \(\theta\).
Show that, with respect to this Poisson likelihood, the
gamma family of distributions is conjugate.
\[\begin{eqnarray*}
f(x \, | \, \theta) & = & \prod_{i=1}^{n} P(X_{i} = x_{i} \, |
\, \theta) \\
& \propto & \prod_{i=1}^{n} \theta^{x_{i}}
\exp\left\{-\theta\right\} \\
& = & \theta^{n\bar{x}}\exp\left\{-n\theta\right\}.
\end{eqnarray*}\] Taking \(\theta \sim
\mbox{Gamma}(\alpha, \beta)\) we have \[\begin{eqnarray*}
f(\theta \, | \, x) & \propto & f(x \, | \, \theta)f(\theta) \\
& \propto & \theta^{n\bar{x}}\exp\left\{-n\theta\right\}
\theta^{\alpha -1}\exp\left\{-\beta \theta \right\} \\
& = & \theta^{\alpha + n\bar{x} -1}\exp\left\{-(\beta +n) \theta
\right\}
\end{eqnarray*}\] which is a kernel of a \(\mbox{Gamma}(\alpha + n\bar{x}, \beta +
n)\) distribution. Hence, the prior and posterior are in the same
family giving conjugacy.
Interpret the posterior mean of \(\theta\) paying particular attention to the
cases when we may have weak prior information and strong prior
information.
\[\begin{eqnarray*}
E(\theta \, | \, X) & = & \frac{\alpha + n\bar{x}}{\beta + n} \\
& = & \frac{\beta\left(\frac{\alpha}{\beta}\right) +
n\bar{x}}{\beta + n} \\
& = & \lambda \left(\frac{\alpha}{\beta}\right) +
(1-\lambda)\bar{x}
\end{eqnarray*}\] where \(\lambda =
\frac{\beta}{\beta + n}\). Hence, the posterior mean is a
weighted average of the prior mean, \(\frac{\alpha}{\beta}\), and the data mean,
\(\bar{x}\), which is also the maximum
likelihood estimate.
Weak prior information corresponds to a
large variance of \(\theta\) which can
be viewed as small \(\beta\) (\(\beta\) is the inverse scale parameter). In
this case, more weight is attached to \(\bar{x}\) than \(\frac{\alpha}{\beta}\) in the posterior
mean.
Strong prior information corresponds to a small variance
of \(\theta\) which can be viewed as
large \(\beta\) (once again, \(\beta\) is the inverse scale parameter). In
this case, more weight is attached to \(\frac{\alpha}{\beta}\) than \(\bar{x}\) in the posterior mean which thus
favours the prior mean.
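Both the weighted-average identity and the weak/strong prior behaviour can be illustrated numerically; the Poisson data and hyperparameters in this Python sketch are illustrative assumptions:

```python
# Illustrative Gamma(alpha, beta) prior and Poisson counts (assumptions).
alpha, beta = 3.0, 2.0
x = [4, 2, 5, 3, 6, 4]
n = len(x)
xbar = sum(x) / n

# Posterior mean (alpha + n*xbar)/(beta + n) as a weighted average of the
# prior mean alpha/beta and the data mean xbar, with lambda = beta/(beta + n).
post_mean = (alpha + n * xbar) / (beta + n)
lam = beta / (beta + n)
assert abs(post_mean - (lam * alpha / beta + (1 - lam) * xbar)) < 1e-12

# Weak prior information (small beta) puts little weight on the prior mean;
# strong prior information (large beta) puts most weight on it.
lam_weak = 0.1 / (0.1 + n)
lam_strong = 100.0 / (100.0 + n)
assert lam_weak < 0.5 < lam_strong
```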
Suppose now that the prior for \(\theta\) is given hierarchically. Given
\(\lambda\), \(\theta\) is judged to follow an exponential
distribution with mean \(\frac{1}{\lambda}\) and \(\lambda\) is given the improper
distribution \(f(\lambda) \propto 1\)
for \(\lambda > 0\). Show
that \[\begin{eqnarray*}
f(\lambda \, | \, x) & \propto &
\frac{\lambda}{(n+\lambda)^{n\bar{x}+1}}
\end{eqnarray*}\] where \(\bar{x} = \frac{1}{n} \sum_{i=1}^{n}
x_{i}\).
\(\theta \,
| \, \lambda \sim \mbox{Exp}(\lambda)\) so \(f(\theta \, | \, \lambda) = \lambda \exp\{-\lambda
\theta\}\). \[\begin{eqnarray*}
f(\lambda, \theta \, | \, x) & \propto & f(x \, | \, \theta,
\lambda)f(\theta, \lambda) \\
& = & f(x \, | \, \theta) f(\theta \, | \, \lambda)f(\lambda) \\
& \propto &
\left(\theta^{n\bar{x}}\exp\left\{-n\theta\right\}\right)\left( \lambda
\exp\left\{-\lambda \theta\right\}\right) \\
& = & \lambda
\theta^{n\bar{x}}\exp\left\{-(n+\lambda)\theta\right\}.
\end{eqnarray*}\] Thus, integrating out \(\theta\), \[\begin{eqnarray*}
f(\lambda \, | \, x) & \propto & \int_{0}^{\infty} \lambda
\theta^{n\bar{x}}\exp\left\{-(n+\lambda)\theta\right\} d\theta \\
& = & \lambda \int_{0}^{\infty}
\theta^{n\bar{x}}\exp\left\{-(n+\lambda)\theta\right\} d\theta.
\end{eqnarray*}\] As the integrand is a kernel of a \(\mbox{Gamma}(n\bar{x}+1, n+\lambda)\)
distribution, we have \[\begin{eqnarray*}
f(\lambda \, | \, x) & \propto & \frac{\lambda \Gamma(n\bar{x} +
1)}{(n+\lambda)^{n\bar{x}+1}} \\
& \propto & \frac{\lambda}{(n+\lambda)^{n\bar{x}+1}}.
\end{eqnarray*}\]
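The marginalisation over \(\theta\) can be checked by numerical integration: the integral should be proportional to \(\lambda/(n+\lambda)^{n\bar{x}+1}\), with proportionality constant \(\Gamma(n\bar{x}+1)\). The count data in the Python sketch below are illustrative assumptions:

```python
import math

# Illustrative Poisson counts (assumptions).
x = [4, 2, 5, 3]
n, s = len(x), sum(x)           # s = n * xbar

def integrated(lam, h=1e-3, upper=40.0):
    """Left-Riemann approximation to int_0^inf lam * theta^s * exp(-(n+lam)theta)."""
    return sum(lam * (h * k) ** s * math.exp(-(n + lam) * h * k) * h
               for k in range(1, int(upper / h)))

def target(lam):
    """Claimed shape of f(lambda | x), up to a constant."""
    return lam / (n + lam) ** (s + 1)

# The ratio integrated/target should equal Gamma(s + 1) for every lambda.
ratios = [integrated(lam) / target(lam) for lam in (0.5, 1.0, 2.0)]
assert all(abs(r - math.gamma(s + 1)) / math.gamma(s + 1) < 1e-3 for r in ratios)
```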