Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\theta\). Suppose that \(X_{i} \, | \, \theta \sim \mbox{Inv-gamma}(\alpha, \theta)\), where \(\alpha\) is known, and we judge that \(\theta \sim \mbox{Gamma}(\alpha_{0}, \beta_{0})\), where \(\alpha_{0}\) and \(\beta_{0}\) are known.
Show that \(\theta \, | \, x
\sim \mbox{Gamma}(\alpha_{n}, \beta_{n})\) where \(\alpha_{n} = \alpha_{0} + n \alpha\), \(\beta_{n} = \beta_{0} + \sum_{i=1}^{n}
\frac{1}{x_{i}}\), and \(x = (x_{1},
\ldots, x_{n})\).
As \(X_{i} \, | \, \theta \sim \mbox{Inv-gamma}(\alpha,
\theta)\) then \[\begin{eqnarray*}
f(x \, | \, \theta) & = & \prod_{i=1}^{n}
\frac{\theta^{\alpha}}{\Gamma(\alpha)} x_{i}^{-(\alpha+1)} \exp
\left(-\frac{\theta}{x_{i}}\right) \\
& \propto & \theta^{n\alpha} \exp \left(-\theta
\sum_{i=1}^{n}\frac{1}{x_{i}}\right).
\end{eqnarray*}\] Similarly, as \(\theta \sim \mbox{Gamma}(\alpha_{0},
\beta_{0})\), \[\begin{eqnarray*}
f(\theta) & \propto & \theta^{\alpha_{0}-1}\exp \left(-\theta
\beta_{0}\right).
\end{eqnarray*}\] Hence, \[\begin{eqnarray*}
f(\theta \, | \, x) & \propto & f(x \, | \, \theta)f(\theta) \\
& \propto & \theta^{(\alpha_{0}+n\alpha)-1}\exp
\left\{-\theta\left(\beta_{0} + \sum_{i=1}^{n}
\frac{1}{x_{i}}\right)\right\}
\end{eqnarray*}\] which we recognise as a kernel of a \(\mbox{Gamma}(\alpha_{0} + n \alpha, \beta_{0} + \sum_{i=1}^{n} \frac{1}{x_{i}})\) density. Thus, \(\theta \, | \, x \sim \mbox{Gamma}(\alpha_{n},
\beta_{n})\) as required.
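As a quick illustration (not part of the original solution), the conjugate update can be computed directly from the data; the Python sketch below is a minimal example in which the function name, data and hyperparameter values are all illustrative choices.

```python
# A minimal sketch, assuming numpy; the data and hyperparameter values in the
# example call are illustrative, not taken from the question.
import numpy as np

def gamma_posterior_params(x, alpha, alpha0, beta0):
    """Conjugate update: theta | x ~ Gamma(alpha0 + n*alpha, beta0 + sum(1/x_i))."""
    x = np.asarray(x, dtype=float)
    alpha_n = alpha0 + len(x) * alpha
    beta_n = beta0 + np.sum(1.0 / x)
    return alpha_n, beta_n

# Example: alpha_n = 1 + 3*2 = 7, beta_n = 1 + 1/0.8 + 1/1.3 + 1/2.1
alpha_n, beta_n = gamma_posterior_params([0.8, 1.3, 2.1], alpha=2.0,
                                         alpha0=1.0, beta0=1.0)
```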
We wish to use the Metropolis-Hastings algorithm to sample from the posterior distribution \(\theta \, | \, x\) using a normal distribution with mean \(\theta\) and chosen variance \(\sigma^{2}\) as the symmetric proposal distribution.
Suppose that, at time \(t\), the proposed value \(\theta^{*} \leq 0\). Briefly explain why
the corresponding acceptance probability is zero for such a \(\theta^{*}\) and thus that the values generated by the algorithm are never less than zero.
As the proposal distribution is symmetric then the Metropolis-Hastings
algorithm reduces to the Metropolis algorithm with acceptance
probability \[\begin{eqnarray*}
\alpha(\theta^{(t-1)}, \theta^{*}) = \min \left(1, \frac{f(\theta^{*} \,
| \, x)}{f(\theta^{(t-1)} \, | \, x)}\right)
\end{eqnarray*}\] where \(\theta \, |
\, x \sim \mbox{Gamma}(\alpha_{n}, \beta_{n})\). Now, assuming
\(\theta^{(t-1)} > 0\), if \(\theta^{*} \leq 0\) then \(f(\theta^{*} \, | \, x) = 0\) and \(\alpha(\theta^{(t-1)}, \theta^{*}) = 0\) so
the move to \(\theta^{*} \leq 0\) is
rejected and \(\theta^{(t)} =
\theta^{(t-1)}\). So, provided \(\theta^{(0)} > 0\), the values generated by the algorithm are never less than zero.
Describe how the Metropolis-Hastings algorithm works for
this example, giving the acceptance probability in its simplest
form.
The algorithm is performed as follows.
\(\mbox{1}\). Choose a starting point
\(\theta^{(0)}\) for which \(f(\theta^{(0)} \, | \, x) > 0\). This
will hold for any \(\theta^{(0)} >
0\) as \(\theta \, | \, x \sim
\mbox{Gamma}(\alpha_{n}, \beta_{n})\).
\(\mbox{2}\). At time \(t\)
\(\bullet\) Sample \(\theta^{*} \sim N(\theta^{(t-1)},
\sigma^{2})\).
\(\bullet\) Calculate the acceptance
probability \[\begin{eqnarray*}
\alpha(\theta^{(t-1)}, \theta^{*}) & = & \min \left(1,
\frac{f(\theta^{*} \, | \, x)}{f(\theta^{(t-1)} \, | \, x)}\right) \\
& = & \left\{\begin{array}{cl} \min\left(1,
\frac{(\theta^{*})^{\alpha_{n}-1}\exp(-\beta_{n}\theta^{*})}{(\theta^{(t-1)})^{\alpha_{n}-1}\exp(-\beta_{n}\theta^{(t-1)})}\right)
& \theta^{*} > 0 \\
0 & \theta^{*} \leq 0 \end{array}\right.\\
& = & \left\{\begin{array}{cl} \min\left(1,
\left(\frac{\theta^{*}}{\theta^{(t-1)}}\right)^{\alpha_{n} - 1}
\exp\left\{-\beta_{n}\left(\theta^{*} -
\theta^{(t-1)}\right)\right\}\right) & \theta^{*} > 0 \\
0 & \theta^{*} \leq 0 \end{array}\right.
\end{eqnarray*}\] where \(\alpha_{n} = \alpha_{0} + n \alpha\), \(\beta_{n} = \beta_{0} + \sum_{i=1}^{n}
\frac{1}{x_{i}}\).
\(\bullet\) Generate \(U \sim U(0, 1)\).
\(\bullet\) If \(U
\leq \alpha(\theta^{(t-1)}, \theta^{*})\) accept the move, \(\theta^{(t)} = \theta^{*}\). Otherwise
reject the move, \(\theta^{(t)}=
\theta^{(t-1)}\).
\(\mbox{3}\). Repeat step 2.
The algorithm will produce a Markov chain with stationary distribution \(f(\theta \, | \, x)\). After a sufficiently long time, \(b\) say, to allow for convergence, the values \(\{\theta^{(t)}\}\) for \(t > b\) may be considered as a sample from \(f(\theta \, | \, x)\), that is, a sample from \(\mbox{Gamma}(\alpha_{n}, \beta_{n})\), where the samples \(\{\theta^{(t)}\}\) for \(t \leq b\) are the “burn-in”.
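A minimal Python sketch of this algorithm is given below; it uses the simplified acceptance probability derived above. The function name, default parameter values and the use of numpy are illustrative assumptions, not part of the original solution.

```python
# Random-walk Metropolis sampler for theta | x ~ Gamma(alpha_n, beta_n);
# a sketch only, with illustrative defaults.
import numpy as np

def metropolis_gamma(alpha_n, beta_n, sigma, n_iter=10000, theta0=1.0, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.empty(n_iter + 1)
    theta[0] = theta0                               # step 1: any theta^(0) > 0
    for t in range(1, n_iter + 1):
        prop = rng.normal(theta[t - 1], sigma)      # step 2: theta* ~ N(theta^(t-1), sigma^2)
        if prop <= 0:
            accept = 0.0                            # posterior density is zero here
        else:
            ratio = (prop / theta[t - 1]) ** (alpha_n - 1) \
                    * np.exp(-beta_n * (prop - theta[t - 1]))
            accept = min(1.0, ratio)
        # accept or reject the proposed move
        theta[t] = prop if rng.uniform() <= accept else theta[t - 1]
    return theta

# e.g. keep draws = metropolis_gamma(7.0, 3.5, sigma=1.0)[2001:] after a burn-in of 2000
```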
Suppose that \(X \, | \, \theta \sim N(\theta, \sigma^{2})\) and \(Y \, | \, \theta, \delta \sim N(\theta - \delta, \sigma^{2})\), where \(\sigma^{2}\) is a known constant and \(X\) and \(Y\) are conditionally independent given \(\theta\) and \(\delta\). It is judged that the improper noninformative joint prior distribution \(f(\theta, \delta) \propto 1\) is appropriate.
Show that the joint posterior distribution of \(\theta\) and \(\delta\) given \(x\) and \(y\) is bivariate normal with mean vector
\(\mu\) and variance matrix \(\Sigma\) where \[\begin{eqnarray*}
\mu \ = \ \left(\begin{array}{c}
E(\theta \, | \, X, Y) \\ E(\delta \, | \, X, Y) \end{array} \right) \ =
\ \left(\begin{array}{c}
x \\ x-y \end{array} \right); \ \ \Sigma \ = \ \left(\begin{array}{cc}
\sigma^{2} & \sigma^{2} \\
\sigma^{2} & 2\sigma^{2} \end{array} \right).
\end{eqnarray*}\]
As \(X \, | \,
\theta \sim N(\theta, \sigma^{2})\) then \[\begin{eqnarray*}
f(x \, | \, \theta) & \propto &
\exp\left\{-\frac{1}{2\sigma^{2}} (\theta^{2} - 2x\theta)\right\}.
\end{eqnarray*}\] As \(Y \, | \,
\theta, \delta \sim N(\theta - \delta, \sigma^{2})\) then \[\begin{eqnarray*}
f(y \, | \, \theta, \delta) & \propto &
\exp\left\{-\frac{1}{2\sigma^{2}}[(\theta-\delta)^{2} - 2y(\theta -
\delta)]\right\}.
\end{eqnarray*}\] Now, by conditional independence, \[\begin{eqnarray*}
f(x, y \, | \, \theta, \delta) & = & f(x \, | \, \theta)f(y \, |
\, \theta, \delta) \\
& \propto & \exp \left\{-\frac{1}{2\sigma^{2}}[\theta^{2} -
2x\theta + (\theta-\delta)^{2} - 2y(\theta - \delta)]\right\}.
\end{eqnarray*}\] So, \[\begin{eqnarray*}
f(\theta, \delta \, | \, x, y) & \propto & f(x, y \, | \,
\theta, \delta)f(\theta, \delta) \\
& \propto & f(x, y \, | \, \theta, \delta) \\
& \propto & \exp \left\{-\frac{1}{2\sigma^{2}}[\theta^{2} -
2x\theta + (\theta-\delta)^{2} - 2y(\theta - \delta)]\right\} \\
& \propto & \exp \left\{-\frac{1}{2\sigma^{2}}[2\theta^{2}
-2(x+y)\theta - 2\theta\delta + \delta^{2} + 2y\delta]\right\}.
\end{eqnarray*}\] The exponent is quadratic in \(\theta\) and \(\delta\), so \(\theta, \delta \, | \, x, y\) is bivariate normal. We can identify the exact normal distribution by completing the square or, in this case, by using the information given in the question. We have \[\begin{eqnarray*}
\Sigma^{-1} & = & \frac{1}{\sigma^{2}}\left(\begin{array}{cc}
2 & -1 \\
-1 & 1 \end{array} \right)
\end{eqnarray*}\] so that the density of the \(N_{2}(\mu, \Sigma)\) is \[\begin{eqnarray*}
\exp\left\{-\frac{1}{2} \left(\theta - x \ \ \delta - (x-y)\right)
\frac{1}{\sigma^{2}}\left(\begin{array}{cc}
2 & -1 \\
-1 & 1 \end{array} \right)\left(\begin{array}{c} \theta - x \\
\delta - (x-y) \end{array}\right) \right\} & \propto &
\exp\left\{-\frac{1}{2\sigma^{2}}[2\theta^{2} - 2(2x-(x-y))\theta -
2\theta \delta + \delta^{2} - 2((x-y)-x)\delta]\right\} \\
& = & \exp\left\{-\frac{1}{2\sigma^{2}}[2\theta^{2} -
2(x+y)\theta - 2\theta \delta + \delta^{2} + 2y\delta]\right\} \\
& \propto & f(\theta, \delta \, | \, x, y).
\end{eqnarray*}\]
Describe how the Gibbs sampler may be used to sample from
the posterior distribution \(\theta, \delta \,
| \, x , y\), deriving all required conditional
distributions.
The Gibbs sampler requires the
conditional distributions \(\theta \, | \,
\delta, x, y\) and \(\delta \, | \,
\theta, x, y\). Now, with proportionality with respect to \(\theta\), \[\begin{eqnarray*}
f(\theta \, | \, \delta, x, y) & \propto & f(\theta, \delta \, |
\, x, y) \\
& \propto & \exp\left\{-\frac{1}{2\sigma^{2}}[2\theta^{2} -
2(x+y+\delta)\theta]\right\} \\
& = &
\exp\left\{-\frac{1}{2}\left(\frac{2}{\sigma^{2}}\right)\left[\theta^{2}
- 2\left(\frac{x+y+\delta}{2}\right)\theta\right]\right\}
\end{eqnarray*}\] which is a kernel of a normal density. Thus,
\(\theta \, | \, \delta, x, y \sim
N(\frac{x+y+\delta}{2}, \frac{\sigma^{2}}{2})\). Now, with
proportionality with respect to \(\delta\), \[\begin{eqnarray*}
f(\delta \, | \, \theta, x, y) & \propto & f(\theta, \delta \, |
\, x, y) \\
& \propto & \exp\left\{-\frac{1}{2\sigma^{2}}[\delta^{2} -
2(\theta-y)\delta]\right\}
\end{eqnarray*}\] which is a kernel of a normal density. Thus,
\(\delta \, | \, \theta, x, y \sim N(\theta -
y, \sigma^{2})\). The Gibbs sampler algorithm is
\(\mbox{1}\). Choose a starting value \((\theta^{(0)}, \delta^{(0)})\) for which
\(f(\theta^{(0)}, \delta^{(0)} \, | \, x, y)
> 0\).
\(\mbox{2}\).
At iteration \(t\) generate the new
values \((\theta^{(t)}, \delta^{(t)})\)
as follows:
\(\bullet\) draw \(\theta^{(t)}\) from \(N(\frac{x+y+\delta^{(t-1)}}{2},
\frac{\sigma^{2}}{2})\), the distribution of \(\theta^{(t)} \, | \, \delta^{(t-1)}, x,
y\).
\(\bullet\) draw \(\delta^{(t)}\) from \(N(\theta^{(t)}-y, \sigma^{2})\), the
distribution of \(\delta^{(t)} \, | \,
\theta^{(t)}, x, y\).
\(\mbox{3}\). Repeat step 2.
The algorithm produces a Markov chain with stationary distribution \(f(\theta, \delta \, | \, x, y)\). After a sufficiently long time, \(b\) say, to allow for convergence, the values \(\{\theta^{(t)}, \delta^{(t)}\}\) for \(t > b\) may be considered as a sample from \(f(\theta, \delta \, | \, x, y)\), where the samples \(\{\theta^{(t)}, \delta^{(t)}\}\) for \(t \leq b\) are the “burn-in”.
Suppose that \(x=2\),
\(y=1\) and \(\sigma^{2} = 1\). Sketch the contours of
the joint posterior distribution. Starting from the origin, add to your
sketch the first four steps of a typical Gibbs sampler
path.
For \(x=2\),
\(y=1\) and \(\sigma^{2} = 1\) we shall use the Gibbs
sampler to sample from \(N_{2}(\mu,
\Sigma)\) where \(\mu =
\left(\begin{array}{c} 2 \\ 1 \end{array}\right)\) and \(\Sigma = \left(\begin{array}{cc} 1 & 1 \\ 1
& 2 \end{array}\right)\). We draw \(\theta^{(t)}\) from \(N(\frac{3+\delta^{(t-1)}}{2},
\frac{1}{2})\) and \(\delta^{(t)}\) from \(N(\theta^{(t)}-1, 1)\). Suppose that we
start from \((1,1)\) and sample \(\theta^{(1)} = 2.216715\), \(\delta^{(1)} = 1.475738\), \(\theta^{(2)} = 2.086383\) and \(\delta^{(2)} = 1.617760\). The trajectory of these points is shown in the following figure.
A trajectory path of the Gibbs sampler sampling from \(N_{2}(\mu, \Sigma)\).
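The figure can be reproduced with a short simulation; the sketch below (numpy assumed; the seed and number of iterations are arbitrary choices, not from the original solution) implements the two conditional draws for \(x=2\), \(y=1\) and \(\sigma^{2}=1\).

```python
# Gibbs sampler for (theta, delta) | x, y with x = 2, y = 1, sigma^2 = 1;
# a sketch only, starting from (1, 1) as in the worked example.
import numpy as np

def gibbs_theta_delta(x=2.0, y=1.0, sigma2=1.0, n_iter=4, start=(1.0, 1.0), seed=0):
    rng = np.random.default_rng(seed)
    theta = np.empty(n_iter + 1); delta = np.empty(n_iter + 1)
    theta[0], delta[0] = start
    for t in range(1, n_iter + 1):
        # theta^(t) | delta^(t-1), x, y ~ N((x + y + delta^(t-1))/2, sigma^2/2)
        theta[t] = rng.normal((x + y + delta[t - 1]) / 2, np.sqrt(sigma2 / 2))
        # delta^(t) | theta^(t), x, y ~ N(theta^(t) - y, sigma^2)
        delta[t] = rng.normal(theta[t] - y, np.sqrt(sigma2))
    return theta, delta

theta, delta = gibbs_theta_delta()   # the successive (theta, delta) pairs trace the path
```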
Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\theta = (\mu, \lambda)\). Suppose that \(X_{i} \, | \, \theta \sim N(\mu, 1/\lambda)\) so that \(\mu\) is the mean and \(\lambda\) the precision of the distribution. Suppose that we judge that \(\mu\) and \(\lambda\) are independent with \(\mu \sim N(\mu_{0}, 1/\tau)\), where \(\mu_{0}\) and \(\tau\) are known, and \(\lambda \sim \mbox{Gamma}(\alpha, \beta)\), where \(\alpha\) and \(\beta\) are known.
Show that the posterior density \(f(\mu, \lambda \, | \, x)\), where \(x = (x_{1}, \ldots, x_{n})\), can be
expressed as \[\begin{eqnarray*}
f(\mu, \lambda \, | \, x) & \propto & \lambda^{\alpha +
\frac{n}{2}
-1}\exp\left\{-\frac{\lambda}{2}\sum_{i=1}^{n}(x_{i}-\mu)^{2} -
\frac{\tau}{2}\mu^{2} + \tau \mu_{0} \mu - \beta \lambda\right\}.
\end{eqnarray*}\]
As \(X_{i} \, |
\, \theta \sim N(\mu, 1/\lambda)\) and \(\theta = (\mu, \lambda)\) then \[\begin{eqnarray*}
f(x \, | \, \mu, \lambda) & \propto & \prod_{i=1}^{n}
\lambda^{\frac{1}{2}}
\exp\left\{-\frac{\lambda}{2}(x_{i}-\mu)^{2}\right\} \\
& = & \lambda^{\frac{n}{2}}
\exp\left\{-\frac{\lambda}{2}\sum_{i=1}^{n}(x_{i}-\mu)^{2}\right\}.
\end{eqnarray*}\] \(\mu \sim N(\mu_{0},
1/\tau)\) so that \[\begin{eqnarray*}
f(\mu) & \propto & \exp\left\{- \frac{\tau}{2}(\mu -
\mu_{0})^{2}\right\} \\
& \propto & \exp\left\{- \frac{\tau}{2}\mu^{2} + \tau \mu_{0}
\mu\right\}.
\end{eqnarray*}\] Finally, as \(\lambda
\sim \mbox{Gamma}(\alpha, \beta)\) then \[\begin{eqnarray*}
f(\lambda) & \propto & \lambda^{\alpha -1} \exp\left\{-\beta
\lambda\right\}.
\end{eqnarray*}\] Hence, as \(\mu\) and \(\lambda\) are independent, \[\begin{eqnarray*}
f(\mu, \lambda \, | \, x) & \propto & f(x \, | \, \mu,
\lambda)f(\mu)f(\lambda) \\
& \propto & \lambda^{\alpha + \frac{n}{2}
-1}\exp\left\{-\frac{\lambda}{2}\sum_{i=1}^{n}(x_{i}-\mu)^{2} -
\frac{\tau}{2}\mu^{2} + \tau \mu_{0} \mu - \beta \lambda\right\}.
\end{eqnarray*}\]
Hence show that \[\begin{eqnarray*}
\lambda \, | \, \mu, x \sim \mbox{Gamma}\left(\alpha + \frac{n}{2},
\beta + \frac{1}{2} \sum_{i=1}^{n} (x_{i} - \mu)^{2}\right).
\end{eqnarray*}\]
We have \[\begin{eqnarray*}
f(\lambda \, | \, \mu, x) & = & \frac{f(\lambda, \mu \, | \,
x)}{f(\mu \, | \, x)} \\
& \propto & f(\lambda, \mu \, | \, x)
\end{eqnarray*}\] where the proportionality is with respect to
\(\lambda\). So, from 2.(a) we have
\[\begin{eqnarray*}
f(\lambda \, | \, \mu, x) & \propto & \lambda^{\alpha +
\frac{n}{2}
-1}\exp\left\{-\frac{\lambda}{2}\sum_{i=1}^{n}(x_{i}-\mu)^{2} -
\frac{\tau}{2}\mu^{2} + \tau \mu_{0} \mu - \beta \lambda\right\} \\
& \propto & \lambda^{\alpha + \frac{n}{2} -1}\exp\left\{-\lambda
\left(\beta + \frac{1}{2}\sum_{i=1}^{n}(x_{i}-\mu)^{2}\right)\right\}
\end{eqnarray*}\] which is a kernel of a \(\mbox{Gamma}\left(\alpha + \frac{n}{2}, \beta +
\frac{1}{2} \sum_{i=1}^{n} (x_{i} - \mu)^{2}\right)\) density.
Given that \(\mu \, | \,
\lambda, x \sim N(\frac{\tau\mu_{0} + n\lambda \bar{x}}{\tau + n
\lambda}, \frac{1}{\tau + n \lambda})\), where \(\bar{x} = \frac{1}{n} \sum_{i=1}^{n}
x_{i}\), describe how the Gibbs sampler may be used to sample
from the posterior distribution \(\mu, \lambda
\, | \, x\). Give a sensible estimate of \(Var(\lambda \, | \, x)\).
The Gibbs sampler requires the conditional distributions \(\lambda \, | \, \mu, x\) and \(\mu \, | \, \lambda, x\) which we have. The
algorithm is
\(\mbox{1}\).
Choose a starting value \((\mu^{(0)},
\lambda^{(0)})\) for which \(f(\mu^{(0)}, \lambda^{(0)} \, | \, x) >
0\).
\(\mbox{2}\). At
iteration \(t\) generate the new values
\((\mu^{(t)}, \lambda^{(t)})\) as
follows:
\(\bullet\)
draw \(\mu^{(t)}\) from \(N(\frac{\tau\mu_{0} + n\lambda^{(t-1)}
\bar{x}}{\tau + n \lambda^{(t-1)}}, \frac{1}{\tau + n
\lambda^{(t-1)}})\), the distribution of \(\mu^{(t)} \, | \, \lambda^{(t-1)}, x\).
\(\bullet\) draw \(\lambda^{(t)}\) from \(\mbox{Gamma}\left(\alpha + \frac{n}{2}, \beta +
\frac{1}{2} \sum_{i=1}^{n} (x_{i} - \mu^{(t)})^{2}\right)\), the
distribution of \(\lambda^{(t)} \, | \,
\mu^{(t)}, x\).
\(\mbox{3}\). Repeat step 2.
The algorithm will produce a Markov chain with stationary distribution \(f(\mu, \lambda \, | \, x)\). After a sufficiently long time, \(b\) say, to allow for convergence, the values \(\{\mu^{(t)}, \lambda^{(t)}\}\) for \(t > b\) may be considered as a sample from \(f(\mu, \lambda \, | \, x)\), where the samples \(\{\mu^{(t)}, \lambda^{(t)}\}\) for \(t \leq b\) are the “burn-in”.
Ergodic averages converge to the
desired expectations under the target distribution so that, for example,
\[\begin{eqnarray*}
\frac{1}{N} \sum_{t=1}^{N} g(\lambda^{(t)}) & \rightarrow &
E(g(\lambda) \, | \, X)
\end{eqnarray*}\] as \(N \rightarrow
\infty\). As this includes the observations prior to convergence, it is better to estimate using only the observations after the “burn-in”. Noting that \(Var(\lambda \, | \,
X)\) corresponds to \(g(\lambda) =
(\lambda - E(\lambda \, | \, X))^{2}\) then any sensible sample
variance will suffice. For example, we could estimate \(Var(\lambda \, | \, X) = E(\lambda^{2} \, | \, X)
- E^{2}(\lambda \, | \, X)\) by \[\begin{eqnarray*}
\frac{1}{N-b} \sum_{t=b+1}^{N} (\lambda^{(t)} - \bar{\lambda})^{2} &
= & \left(\frac{1}{N-b} \sum_{t=b+1}^{N} (\lambda^{(t)})^{2}\right)
-\bar{\lambda}^{2}
\end{eqnarray*}\] where our sample is \(\{\mu^{(t)}, \lambda^{(t)} : t = b+1, \ldots,
N\}\) and \(\bar{\lambda} =
\frac{1}{N-b} \sum_{t=b+1}^{N} \lambda^{(t)}\), the sample
mean.
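The sampler and the variance estimate can be sketched in a few lines of Python (numpy assumed; the hyperparameter values and burn-in length in the usage comment are illustrative, not from the question). Note that numpy parameterises the gamma distribution by shape and scale, so the rate \(\beta + \frac{1}{2}\sum_{i}(x_{i}-\mu)^{2}\) enters as its reciprocal.

```python
# Gibbs sampler for (mu, lambda) | x; a sketch under the stated assumptions.
import numpy as np

def gibbs_mu_lambda(x, mu0, tau, alpha, beta, n_iter=10000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n, xbar = len(x), x.mean()
    mu = np.empty(n_iter + 1); lam = np.empty(n_iter + 1)
    mu[0], lam[0] = xbar, 1.0                 # any starting point with positive density
    for t in range(1, n_iter + 1):
        # mu^(t) | lambda^(t-1), x ~ N((tau*mu0 + n*lam*xbar)/(tau + n*lam), 1/(tau + n*lam))
        prec = tau + n * lam[t - 1]
        mu[t] = rng.normal((tau * mu0 + n * lam[t - 1] * xbar) / prec, np.sqrt(1 / prec))
        # lambda^(t) | mu^(t), x ~ Gamma(alpha + n/2, beta + 0.5*sum((x_i - mu^(t))^2))
        rate = beta + 0.5 * np.sum((x - mu[t]) ** 2)
        lam[t] = rng.gamma(alpha + n / 2, 1.0 / rate)
    return mu, lam

# With burn-in b (say 1000), estimate Var(lambda | x) by the sample variance:
# mu, lam = gibbs_mu_lambda(x, mu0=0.0, tau=1.0, alpha=2.0, beta=1.0)
# var_lambda_hat = lam[1001:].var()
```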
Consider a Poisson hierarchical model. At the first stage we have observations \(s_{j}\) which are Poisson with mean \(t_{j}\lambda_{j}\) for \(j = 1, \ldots, p\), where each \(t_{j}\) is known. We assume that the \(\lambda_{j}\) are independent and identically distributed with \(\mbox{Gamma}(\alpha, \beta)\) prior distributions. The parameter \(\alpha\) is known but \(\beta\) is unknown and is given a \(\mbox{Gamma}(\gamma, \delta)\) distribution where \(\gamma\) and \(\delta\) are known. The \(s_{j}\) are assumed to be conditionally independent given the unknown parameters.
Find, up to a constant of integration, the joint
posterior distribution of the unknown parameters given \(s = (s_{1}, \ldots, s_{p})\).
Let \(\lambda = (\lambda_{1}, \ldots,
\lambda_{p})\) then the unknown parameters are \(\theta = (\lambda_{1}, \ldots, \lambda_{p}, \beta)
= (\lambda, \beta)\). The prior is \[\begin{eqnarray*}
f(\lambda, \beta) & = & f(\lambda \, | \, \beta)f(\beta) \\
& = & \left\{\prod_{j=1}^{p} f(\lambda_{j} \, | \,
\beta)\right\}f(\beta) \\
& = & \left\{\prod_{j=1}^{p}
\frac{\beta^{\alpha}}{\Gamma(\alpha)}
\lambda_{j}^{\alpha-1}e^{-\beta\lambda_{j}}\right\} \times
\frac{\delta^{\gamma}}{\Gamma(\gamma)} \beta^{\gamma-1}e^{-\delta \beta}
\\
& \propto & \left(\prod_{j=1}^{p}
\lambda_{j}^{\alpha-1}\right)\beta^{(p\alpha+\gamma)-1}\exp\left\{-\beta\left(\delta+\sum_{j=1}^{p}\lambda_{j}\right)\right\}.
\end{eqnarray*}\] The likelihood is \[\begin{eqnarray*}
f(s \, | \, \lambda, \beta) & = & \prod_{j=1}^{p}
\frac{(t_{j}\lambda_{j})^{s_{j}}e^{-t_{j}\lambda_{j}}}{s_{j}!} \\
& \propto & \left(\prod_{j=1}^{p} \lambda_{j}^{s_{j}}\right)
\exp\left(-\sum_{j=1}^{p}t_{j}\lambda_{j} \right).
\end{eqnarray*}\] The posterior is thus \[\begin{eqnarray*}
f(\lambda, \beta \, | \, s) & \propto & f(s \, | \, \lambda,
\beta)f(\lambda, \beta) \\
& \propto & \left(\prod_{j=1}^{p}
\lambda_{j}^{\alpha+s_{j}-1}\right)\beta^{(p\alpha+\gamma)-1}\exp\left\{-\beta\left(\delta+\sum_{j=1}^{p}\lambda_{j}\right)
-\sum_{j=1}^{p}t_{j}\lambda_{j} \right\}.
\end{eqnarray*}\]
Describe how the Gibbs sampler may be used to sample from
the posterior distribution, deriving all required conditional
distributions.
For each \(j\), let \(\lambda_{-j} = \lambda \setminus
\lambda_{j}\). For the Gibbs sampler we need to find the
distributions of \(\lambda_{j} \, | \,
\lambda_{-j}, \beta, s\), \(j = 1,
\ldots, p\), and \(\beta \, | \,
\lambda, s\). Now, \[\begin{eqnarray*}
f(\lambda_{j} \, | \, \lambda_{-j}, \beta, s) & = &
\frac{f(\lambda, \beta \, | \, s)}{f(\lambda_{-j}, \beta \, | \, s)} \\
& \propto & f(\lambda, \beta \, | \, s) \ \ \ \ \mbox{(taken
with respect to $\lambda_{j}$)} \\
& = & \lambda_{j}^{\alpha+s_{j}-1}
\exp\left\{-(\beta+t_{j})\lambda_{j}\right\}
\end{eqnarray*}\] which is a kernel of a \(\mbox{Gamma}(\alpha+s_{j}, \beta+t_{j})\)
distribution. Hence, \(\lambda_{j} \, | \,
\lambda_{-j}, \beta, s \sim \mbox{Gamma}(\alpha+s_{j},
\beta+t_{j})\). \[\begin{eqnarray*}
f(\beta \, | \, \lambda, s) & = & \frac{f(\lambda, \beta \, | \,
s)}{f(\lambda \, | \, s)} \\
& \propto & f(\lambda, \beta \, | \, s) \ \ \ \ \mbox{(taken
with respect to $\beta$)} \\
& = &
\beta^{(p\alpha+\gamma)-1}\exp\left\{-\left(\delta+\sum_{j=1}^{p}\lambda_{j}\right)\beta\right\}
\end{eqnarray*}\] which is a kernel of a \(\mbox{Gamma}(p\alpha+\gamma,\delta+\sum_{j=1}^{p}\lambda_{j})\)
distribution. So, \(\beta \, | \, \lambda, s
\sim
\mbox{Gamma}(p\alpha+\gamma,\delta+\sum_{j=1}^{p}\lambda_{j})\).
The Gibbs sampler is
\(\mbox{1}\). Choose an arbitrary starting
point for which \(f(\lambda^{(0)}, \beta^{(0)}
\, | \, s) > 0\). As \(\lambda_{j} > 0\) for \(j=1, \ldots, p\) and \(\beta > 0\), any collection of \(p+1\) positive numbers will suffice.
\(\mbox{2}\). At time
\(t\)
\(\bullet\) Obtain \(\lambda_{1}^{(t)}\) by sampling from the
distribution of \(\lambda_{1} \, | \,
\lambda_{-1}^{(t-1)}, \beta^{(t-1)}, s\) which is \(\mbox{Gamma}(\alpha + s_{1}, \beta^{(t-1)} +
t_{1})\).
\(\bullet\) \(\vdots\)
\(\bullet\) Obtain \(\lambda_{j}^{(t)}\) by sampling from the
distribution of \(\lambda_{j} \, | \,
\lambda_{1}^{(t)}, \ldots, \lambda_{j-1}^{(t)},
\lambda_{j+1}^{(t-1)}\), \(\ldots,
\lambda_{p}^{(t-1)}, \beta^{(t-1)}, s\) which is \(\mbox{Gamma}(\alpha + s_{j}, \beta^{(t-1)} +
t_{j})\).
\(\bullet\) \(\vdots\)
\(\bullet\) Obtain \(\lambda_{p}^{(t)}\) by sampling from the
distribution of \(\lambda_{p} \, | \,
\lambda_{-p}^{(t)}, \beta^{(t-1)}, s\) which is \(\mbox{Gamma}(\alpha + s_{p}, \beta^{(t-1)} +
t_{p})\).
\(\bullet\) Obtain \(\beta^{(t)}\) by sampling from the distribution of \(\beta \, | \, \lambda^{(t)}, s\), which is \(\mbox{Gamma}(p\alpha + \gamma, \delta+\sum_{j=1}^{p} \lambda_{j}^{(t)})\).
\(\mbox{3}\). Repeat step 2.
The algorithm is run until convergence is reached. Once convergence has been reached, each resulting value \((\lambda^{(t)}, \beta^{(t)})\) is a realisation from the posterior \(\lambda, \beta \, | \, s\).
Let \(\{\lambda_{1}^{(t)},
\ldots, \lambda_{p}^{(t)}, \beta^{(t)}; t = 1, \ldots, N\}\),
with \(N\) large, be a realisation of
the Gibbs sampler described above. Give sensible estimates of \(E(\lambda_{j} \, | \, s)\), \(Var(\beta \, | \, s)\) and \(E(\lambda_{j} \, | \, a \leq \beta \leq b,
s)\) where \(0 < a < b\)
are given constants.
We first remove the burn-in
samples. Suppose we determine the burn-in time \(B\), where we assume that the Markov chain
has sufficiently converged for \(t >
B\). (We assume that \(B <
N\).) Then we discard observations \(\{\lambda^{(t)}, \beta^{(t)}; t \leq B\}\)
and use the remaining sample \(\{\lambda^{(t)}, \beta^{(t)}; t = B+1, \ldots
N\}\) to estimate posterior summaries. Here \(\lambda^{(t)} = (\lambda_{1}^{(t)}, \ldots,
\lambda_{p}^{(t)})\).
The posterior mean of \(\lambda_{j} \, | \, s\) can be estimated by
the sample mean of \(\{\lambda_{j}^{(t)}, t =
B+1, \ldots, N\}\) so that our estimate of \(E(\lambda_{j} \, | \, s)\) is \(\frac{1}{N-B} \sum_{t = B+1}^{N}
\lambda_{j}^{(t)}\).
The posterior variance of \(\beta \, | \, s\) can be estimated by the
sample variance of \(\{\beta^{(t)}, t = B+1,
\ldots, N\}\) so that our estimate of \(Var(\beta \, | \, s)\) is \(\frac{1}{N-B} \sum_{t = B+1}^{N}
(\beta^{(t)}-\bar{\beta})^{2}\) \(=
(\frac{1}{N-B} \sum_{t = B+1}^{N} (\beta^{(t)})^{2}) -
\bar{\beta}^{2}\) where \(\bar{\beta} =
\frac{1}{N-B} \sum_{t = B+1}^{N} \beta^{(t)}\).
For
estimating \(E(\lambda_{j} \, | \, a \leq
\beta \leq b, s)\) we not only discard the burn-in observations
but also those elements \(\lambda_{j}^{(t)}\) for which \(\beta^{(t)} < a\) or \(\beta^{(t)} > b\) for each \(t = B+1, \ldots, N\). So, letting \(\mathbb{I}_{[a \leq \beta^{(t)} \leq b]}\)
denote the indicator function which is one if \(a \leq \beta^{(t)} \leq b\) and zero otherwise, the number of
observations after the burn-in with \(a \leq
\beta^{(t)} \leq b\) is \(M = \sum_{t =
B+1}^{N} \mathbb{I}_{[a \leq \beta^{(t)} \leq b]}\) and our
estimate of \(E(\lambda_{j} \, | \, a \leq
\beta \leq b, s)\) is \(\frac{1}{M}
\sum_{t=B+1}^{N} \lambda_{j}^{(t)}\mathbb{I}_{[a \leq \beta^{(t)} \leq
b]}\).
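These estimators are straightforward to code. The Python sketch below (numpy assumed; the hyperparameter values, burn-in \(B\), index \(j\) and bounds \(a\), \(b\) in the usage comment are illustrative choices, not from the question) runs the Gibbs sampler described above and forms the three estimates. Because each \(\lambda_{j}\) conditional depends only on \(\beta\) and not on \(\lambda_{-j}\), the \(p\) updates can be drawn in one vectorised call.

```python
# Gibbs sampler for the Poisson hierarchical model and post burn-in estimates;
# a sketch only. numpy's gamma takes shape and scale = 1/rate.
import numpy as np

def gibbs_poisson_hierarchy(s, t, alpha, gamma_, delta, n_iter=10000, seed=0):
    rng = np.random.default_rng(seed)
    s, t = np.asarray(s, float), np.asarray(t, float)
    p = len(s)
    lam = np.ones((n_iter + 1, p))            # lambda^(0): any positive values
    beta = np.ones(n_iter + 1)                # beta^(0) > 0
    for it in range(1, n_iter + 1):
        # lambda_j^(t) | beta^(t-1), s ~ Gamma(alpha + s_j, beta^(t-1) + t_j), j = 1, ..., p
        lam[it] = rng.gamma(alpha + s, 1.0 / (beta[it - 1] + t))
        # beta^(t) | lambda^(t), s ~ Gamma(p*alpha + gamma, delta + sum_j lambda_j^(t))
        beta[it] = rng.gamma(p * alpha + gamma_, 1.0 / (delta + lam[it].sum()))
    return lam, beta

# lam, beta = gibbs_poisson_hierarchy(s, t, alpha=2.0, gamma_=1.0, delta=1.0)
# B = 1000; j = 0; a, b = 0.5, 1.5
# E_lambda_j   = lam[B + 1:, j].mean()                      # estimate of E(lambda_j | s)
# Var_beta     = beta[B + 1:].var()                         # estimate of Var(beta | s)
# keep         = (a <= beta[B + 1:]) & (beta[B + 1:] <= b)
# E_lambda_j_c = lam[B + 1:, j][keep].mean()                # estimate of E(lambda_j | a<=beta<=b, s)
```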
Show that the Gibbs sampler for sampling from a distribution \(\pi(\theta)\) where \(\theta = (\theta_{1}, \ldots, \theta_{d})\) can be viewed as a special case of the Metropolis-Hastings algorithm where each iteration \(t\) consists of \(d\) Metropolis-Hastings steps each with an acceptance probability of 1.
For the Gibbs sampler, we update each \(\theta_{p}\) one at a time by obtaining \(\theta^{(t)}_{p}\) from the conditional distribution \(\pi(\theta_{p} \, | \, \theta_{1}^{(t)}, \ldots, \theta_{p-1}^{(t)}, \theta_{p+1}^{(t-1)}, \ldots, \theta_{d}^{(t-1)})\). Thus, we move from \(\theta_{p}^{(t-1)}\) to \(\theta_{p}^{(t)}\) leaving the remaining \(d-1\) parameters fixed.
We now demonstrate that this update step in the Gibbs sampler is equivalent to a single Metropolis-Hastings step with acceptance probability 1. Let \(\theta^{(t-1)}_{[p]} = (\theta_{1}^{(t)}, \ldots, \theta_{p-1}^{(t)},\) \(\theta_{p}^{(t-1)}, \theta_{p+1}^{(t-1)}, \ldots, \theta_{d}^{(t-1)})\) and \(\theta^{(t)}_{[p]} = (\theta_{1}^{(t)}, \ldots, \theta_{p-1}^{(t)}, \theta_{p}^{(t)}, \theta_{p+1}^{(t-1)}, \ldots, \theta_{d}^{(t-1)})\). Consider the move from \(\theta^{(t-1)}_{[p]}\) to \(\theta^{(t)}_{[p]}\). As \(\theta^{(t)}_{[p]}\) is obtained by drawing \(\theta_{p}^{(t)}\) from \(\pi(\theta_{p} \, | \, \theta_{1}^{(t)}, \ldots, \theta_{p-1}^{(t)}, \theta_{p+1}^{(t-1)}, \ldots, \theta_{d}^{(t-1)})\) then we shall use this for the proposal distribution in the Metropolis-Hastings algorithm so that \[\begin{eqnarray} q(\theta^{(t)}_{[p]} \, | \, \theta^{(t-1)}_{[p]}) & = & \pi(\theta_{p}^{(t)} \, | \, \theta_{1}^{(t)}, \ldots, \theta_{p-1}^{(t)}, \theta_{p+1}^{(t-1)}, \ldots, \theta_{d}^{(t-1)}). \tag{1} \end{eqnarray}\] If, instead, we consider the move from \(\theta^{(t)}_{[p]}\) to \(\theta^{(t-1)}_{[p]}\) then, using Gibbs sampling, \(\theta^{(t-1)}_{[p]}\) is obtained by drawing \(\theta_{p}^{(t-1)}\) from \(\pi(\theta_{p} \, | \, \theta_{1}^{(t)}, \ldots, \theta_{p-1}^{(t)}, \theta_{p+1}^{(t-1)}, \ldots, \theta_{d}^{(t-1)})\). Thus, we set \[\begin{eqnarray} q(\theta^{(t-1)}_{[p]} \, | \, \theta^{(t)}_{[p]}) & = & \pi(\theta_{p}^{(t-1)} \, | \, \theta_{1}^{(t)}, \ldots, \theta_{p-1}^{(t)}, \theta_{p+1}^{(t-1)}, \ldots, \theta_{d}^{(t-1)}). \tag{2} \end{eqnarray}\] Now, \[\begin{eqnarray} \pi(\theta^{(t-1)}_{[p]} ) \ = \ \pi(\theta_{1}^{(t)}, \ldots, \theta_{p-1}^{(t)}, \theta_{p}^{(t-1)}, \theta_{p+1}^{(t-1)}, \ldots, \theta_{d}^{(t-1)}) & = & \pi(\theta_{p}^{(t-1)} \, | \, \theta_{1}^{(t)}, \ldots, \theta_{p-1}^{(t)}, \theta_{p+1}^{(t-1)}, \ldots, \theta_{d}^{(t-1)})\pi(\theta_{1}^{(t)}, \ldots, \theta_{p-1}^{(t)}, \theta_{p+1}^{(t-1)}, \ldots, \theta_{d}^{(t-1)}) \tag{3} \\ & = & q(\theta^{(t-1)}_{[p]} \, | \, \theta^{(t)}_{[p]}) \pi(\theta_{1}^{(t)}, \ldots, \theta_{p-1}^{(t)}, \theta_{p+1}^{(t-1)}, \ldots, \theta_{d}^{(t-1)}) \tag{4} \end{eqnarray}\] where (4) follows by substituting (2) into (3). In an identical fashion, \[\begin{eqnarray} \pi(\theta^{(t)}_{[p]}) \ = \ \pi(\theta_{1}^{(t)}, \ldots, \theta_{p-1}^{(t)}, \theta_{p}^{(t)}, \theta_{p+1}^{(t-1)}, \ldots, \theta_{d}^{(t-1)}) & = & \pi(\theta_{p}^{(t)} \, | \, \theta_{1}^{(t)}, \ldots, \theta_{p-1}^{(t)}, \theta_{p+1}^{(t-1)}, \ldots, \theta_{d}^{(t-1)})\pi(\theta_{1}^{(t)}, \ldots, \theta_{p-1}^{(t)}, \theta_{p+1}^{(t-1)}, \ldots, \theta_{d}^{(t-1)}) \tag{5} \\ & = & q(\theta^{(t)}_{[p]} \, | \, \theta^{(t-1)}_{[p]}) \pi(\theta_{1}^{(t)}, \ldots, \theta_{p-1}^{(t)}, \theta_{p+1}^{(t-1)}, \ldots, \theta_{d}^{(t-1)}) \tag{6} \end{eqnarray}\] where (6) follows by substituting (1) into (5). Dividing (6) by (4) gives \[\begin{eqnarray*} \frac{\pi(\theta^{(t)}_{[p]})}{\pi(\theta^{(t-1)}_{[p]})} & = & \frac{q(\theta^{(t)}_{[p]} \, | \, \theta^{(t-1)}_{[p]})}{q(\theta^{(t-1)}_{[p]} \, | \, \theta^{(t)}_{[p]})} \end{eqnarray*}\] so that \[\begin{eqnarray} \frac{\pi(\theta^{(t)}_{[p]})q(\theta^{(t-1)}_{[p]} \, | \, \theta^{(t)}_{[p]})}{\pi(\theta^{(t-1)}_{[p]})q(\theta^{(t)}_{[p]} \, | \, \theta^{(t-1)}_{[p]})} & = & 1.
\tag{7} \end{eqnarray}\] Thus, a Metropolis-Hastings algorithm for sampling from \(\pi(\theta)\) for a proposed move from \(\theta^{(t-1)}_{[p]}\) to \(\theta^{(t)}_{[p]}\) using the proposal distribution given by (1) has an acceptance ratio of \[\begin{eqnarray} \alpha(\theta^{(t-1)}_{[p]}, \theta^{(t)}_{[p]}) & = & \min \left(1, \frac{\pi(\theta^{(t)}_{[p]})q(\theta^{(t-1)}_{[p]} \, | \, \theta^{(t)}_{[p]})}{\pi(\theta^{(t-1)}_{[p]})q(\theta^{(t)}_{[p]} \, | \, \theta^{(t-1)}_{[p]})} \right) \tag{8} \\ & = & 1 \tag{9} \end{eqnarray}\] where (9) follows from (8) using (7).
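As a numerical sanity check (not part of the original argument), the Python sketch below evaluates the ratio in (7) for a Gibbs-style proposal in the bivariate normal posterior \(N_{2}(\mu, \Sigma)\) from the earlier question, taking the full conditional of \(\theta\) as the proposal; the ratio is 1 up to floating-point error. The current-state values are arbitrary.

```python
# Check that proposing from a full conditional gives a Metropolis-Hastings
# ratio of 1; uses the N_2 posterior with x = 2, y = 1, sigma^2 = 1.
import numpy as np
from scipy.stats import multivariate_normal, norm

x, y, sigma2 = 2.0, 1.0, 1.0
target = multivariate_normal(mean=[x, x - y],
                             cov=[[sigma2, sigma2], [sigma2, 2 * sigma2]])

rng = np.random.default_rng(0)
theta_old, delta = 0.5, 1.2                                    # arbitrary current state
cond_mean, cond_sd = (x + y + delta) / 2, np.sqrt(sigma2 / 2)  # theta | delta, x, y
theta_new = rng.normal(cond_mean, cond_sd)                     # Gibbs-style proposal

num = target.pdf([theta_new, delta]) * norm.pdf(theta_old, cond_mean, cond_sd)
den = target.pdf([theta_old, delta]) * norm.pdf(theta_new, cond_mean, cond_sd)
print(num / den)                                               # prints 1.0 (to rounding)
```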