Let \(X_{1}, \ldots, X_{n}\) be conditionally independent given \(\theta\), so \(f(x \, | \, \theta) = \prod_{i=1}^{n} f(x_{i} \, | \, \theta)\) where \(x = (x_{1}, \ldots, x_{n})\), with each \(X_{i} \, | \, \theta \sim Po(\theta)\). Suppose we judge that \(\theta \sim Gamma(\alpha, \beta)\).
Find the distribution of \(\theta \, | \, x\).
As
\(\theta \sim Gamma(\alpha, \beta)\)
then \[\begin{eqnarray}
f(\theta) & = &
\frac{\beta^{\alpha}}{\Gamma(\alpha)}\theta^{\alpha-1}e^{-\beta \theta}
\tag{1}
\end{eqnarray}\] whilst as \(X_{i} \, |
\, \theta \sim Po(\theta)\) \[\begin{eqnarray}
f(x \, | \, \theta) \ = \ \prod_{i=1}^{n}
\frac{\theta^{x_{i}}e^{-\theta}}{x_{i}!} \ = \
\frac{\theta^{n\bar{x}}e^{-n\theta}}{\prod_{i=1}^{n} x_{i}!}. \tag{2}
\end{eqnarray}\] From (1) and (2), the posterior density satisfies \[\begin{eqnarray}
f(\theta \, | \, x) & \propto &
\frac{\theta^{n\bar{x}}e^{-n\theta}}{\prod_{i=1}^{n} x_{i}!} \times
\frac{\beta^{\alpha}}{\Gamma(\alpha)}\theta^{\alpha-1}e^{-\beta \theta}
\nonumber \\
& \propto & \theta^{(\alpha + n\bar{x})-1}e^{-(\beta + n)\theta}
\nonumber
\end{eqnarray}\] which is a kernel of a \(Gamma(\alpha + n\bar{x}, \beta + n)\)
density, so \(\theta \, | \, x \sim
Gamma(\alpha + n\bar{x}, \beta + n)\).
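The conjugate update above amounts to adding the data total to \(\alpha\) and the sample size to \(\beta\). A minimal numerical sketch (the counts and hyperparameters below are illustrative, not from the question):

```python
# Gamma-Poisson conjugate update: prior theta ~ Gamma(alpha, beta),
# data x_1, ..., x_n are observed Poisson counts.
def posterior_params(alpha, beta, xs):
    # posterior is Gamma(alpha + sum(xs), beta + n)
    return alpha + sum(xs), beta + len(xs)

# illustrative (made-up) prior and data
a_post, b_post = posterior_params(2.0, 1.0, [3, 0, 2, 4, 1])
print(a_post, b_post)  # 12.0 6
```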
Show that the posterior mean can be written as a weighted
average of the prior mean, denoted \(\theta_{0}\), and the maximum likelihood
estimate of \(\theta\), \(\overline{x}\).
As \(\theta \, | \, x \sim Gamma(\alpha + n\bar{x},
\beta + n)\), \[\begin{eqnarray*}
E(\theta \, | \, x) \ = \ \frac{\alpha + n\bar{x}}{\beta + n} & =
& \frac{\beta \left(\frac{\alpha}{\beta}\right) + n\bar{x}}{\beta +
n} \\
& = & \lambda \theta_{0} + (1 - \lambda) \bar{x}
\end{eqnarray*}\] where \(\lambda =
\frac{\beta}{\beta + n}\) and \(\theta_{0} = \frac{\alpha}{\beta} =
E(\theta)\) as \(\theta \sim
Gamma(\alpha, \beta)\). Thus, the posterior mean is a weighted
average of the prior mean and the maximum likelihood estimate.
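The shrinkage identity can be verified numerically: the direct posterior mean and the weighted-average form agree for any inputs (the values below are illustrative):

```python
def posterior_mean(alpha, beta, n, xbar):
    # E(theta | x) = (alpha + n*xbar) / (beta + n)
    return (alpha + n * xbar) / (beta + n)

def shrinkage_mean(alpha, beta, n, xbar):
    lam = beta / (beta + n)   # weight lambda on the prior mean
    theta0 = alpha / beta     # prior mean E(theta)
    return lam * theta0 + (1 - lam) * xbar

# the two expressions agree (illustrative inputs)
print(posterior_mean(3.0, 2.0, 10, 1.7))
print(shrinkage_mean(3.0, 2.0, 10, 1.7))
```

As \(n\) grows, \(\lambda \to 0\) and the posterior mean moves from the prior mean towards \(\bar{x}\).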
Let \(Z\) be a future
(unobserved) observation. Find the mean and variance of the predictive
distribution \(Z \, | \, x\). You may
assume that \(X\) and \(Z\) are conditionally independent given
\(\theta\) and that \(Z \, | \, \theta \sim
Po(\theta)\).
We do not need to find the full predictive distribution. As \(X\) and \(Z\) are conditionally independent given \(\theta\), we can use the expectation and variance formulas given in lectures, so \[\begin{eqnarray}
E(Z \, | \, X) & = & E(E(Z \, | \, \theta) \, | \, X) \nonumber
\\
& = & E(\theta \, | \, X) \tag{3} \\
& = & \frac{\alpha + n\bar{x}}{\beta + n} \tag{4}
\end{eqnarray}\] where (3) follows as \(Z \, | \, \theta \sim Po(\theta)\) and (4)
as \(\theta \, | \, x \sim Gamma(\alpha +
n\bar{x}, \beta + n)\). Similarly, \[\begin{eqnarray}
Var(Z \, | \, X) & = & Var(E(Z \, | \, \theta) \, | \, X) +
E(Var(Z \, | \, \theta) \, | \, X) \nonumber \\
& = & Var(\theta \, | \, X) + E(\theta \, | \, X) \nonumber \\
& = & \frac{\alpha + n\bar{x}}{(\beta + n)^{2}} + \frac{\alpha +
n\bar{x}}{\beta + n} \ = \ \frac{(\alpha + n\bar{x})(\beta + n +
1)}{(\beta + n)^{2}}. \tag{5}
\end{eqnarray}\]
The data in the table below are the number of fatal
accidents on scheduled airline flights between 1976 and 1985.
\[\begin{eqnarray*}
\begin{array}{cccccccccc}
1976 & 1977 & 1978 & 1979 & 1980 & 1981 & 1982
& 1983 & 1984 & 1985 \\
24 & 25 & 31 & 31 & 22 & 21 & 26 & 20 &
16 & 22
\end{array}
\end{eqnarray*}\] Let \(Z\) be the number of fatal accidents in
1986 and \(X_{i}\) the number of fatal
accidents in 1975\(+ i\). Adopting the
model given above, with \(\alpha =
101\) and \(\beta = 5\) and
assuming that \(Z \, | \, x\) may be
approximated by a \(N(E(Z \, | \, X), Var(Z \,
| \, X))\), find an approximate 95% prediction interval for the
number of accidents in 1986, that is an interval \((z_{L}, z_{U})\) such that \(P(z_{L} < Z < z_{U} \, | \, x) =
0.95\).
Taking \(\alpha = 101\) and \(\beta = 5\) we observe \(\sum_{i=1}^{10} x_{i} = 238\) with \(n = 10\). Hence, \(\alpha + n\bar{x} = 101 + 238 = 339\) and
\(\beta + n = 5 + 10 = 15\) so that
\(\theta \, | \, x \sim Gamma(339,
15)\). From (4) and (5) we have \(E(Z
\, | \, X) = \frac{339}{15} = \frac{113}{5}\) and \(Var(Z \, | \, X) = \frac{339\times 16}{15^{2}} =
\frac{1808}{75}\) so we use \(Z \, | \,
x \sim N(\frac{113}{5}, \frac{1808}{75})\) approximately.
Recalling that if \(Y \sim N(0, 1)\)
then \(P(-1.96 < Y < 1.96) =
0.95\) the (approximate) 95% prediction interval is \[\begin{eqnarray*}
\frac{113}{5} \pm 1.96\sqrt{\frac{1808}{75}} & = & (12.98,
32.22).
\end{eqnarray*}\]
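The numerical steps above can be reproduced directly from the table (data and hyperparameters as given in the question):

```python
import math

# posterior parameters from the accident data, 1976-1985
alpha, beta, n = 101, 5, 10
total = 24 + 25 + 31 + 31 + 22 + 21 + 26 + 20 + 16 + 22   # = 238
a_post, b_post = alpha + total, beta + n                  # Gamma(339, 15)

mean = a_post / b_post                       # E(Z | x) = 339/15 = 22.6
var = a_post * (b_post + 1) / b_post**2      # Var(Z | x) = 1808/75
half = 1.96 * math.sqrt(var)
print(round(mean - half, 2), round(mean + half, 2))  # 12.98 32.22
```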
Let \(X_{1}, \ldots, X_{n}\) be a finitely exchangeable sequence of random quantities and consider any \(i \neq j \in \{1, \ldots, n\}\).
Explain why the marginal distribution of any \(X_{i}\) does not depend upon \(i\) so that \(E(X_{i})\) and \(Var(X_{i})\) do not depend upon \(i\).
If \(X_{1}, \ldots, X_{n}\) are finitely
exchangeable then their joint density satisfies \[\begin{eqnarray*}
f(x_{1}, \ldots, x_{n}) & = & f(x_{\pi(1)}, \ldots, x_{\pi(n)})
\end{eqnarray*}\] for all permutations \(\pi\) defined on the set \(\{1, \ldots, n\}\). For a given permutation
there exists \(j \in \{1, \ldots, n\}\)
such that \(x_{\pi(j)} = x_{i}\). Let
\(x_{-i}\) denote the set \(\{x_{1}, \ldots, x_{n}\} \setminus \{x_{i}\}\).
The marginal distribution of \(X_{i}\)
is given by \[\begin{eqnarray*}
f_{X_{i}}(x_{i}) & = & \int_{x_{-i}} f(x_{1}, \ldots, x_{n}) \,
dx_{-i} \\
& = & \int_{x_{-i}} f(x_{\pi(1)}, \ldots, x_{\pi(j-1)},
x_{\pi(j)}, x_{\pi(j+1)}, \ldots, x_{\pi(n)}) \, dx_{-i} \\
& = & \int_{x_{-i}} f(x_{\pi(1)}, \ldots, x_{\pi(j-1)}, x_{i},
x_{\pi(j+1)}, \ldots, x_{\pi(n)}) \, dx_{-i} \\
& = & f_{X_{j}}(x_{i}).
\end{eqnarray*}\] As this holds for all permutations then it
holds for all \(j \in \{1, \ldots,
n\}\). Thus, the marginal distribution of \(X_{i}\) does not depend upon \(i\). As \(E(X_{i})\) and \(Var(X_{i})\) are constructed over the
marginal of \(X_{i}\) then it follows
immediately that these also do not depend upon \(i\).
Explain why the joint distribution of any \(X_{i}\), \(X_{j}\) does not depend upon either \(i\) or \(j\) so that \(Cov(X_{i}, X_{j})\) does not depend on
either \(i\) or \(j\).
For \(i \neq j\), let \(x_{-i, -j}\) denote the set \(\{x_{1}, \ldots, x_{n}\} \setminus \{x_{i},
x_{j}\}\). For a given permutation there exists \(k \neq l \in \{1, \ldots, n\}\) such that
\(x_{\pi(k)} = x_{i}\) and \(x_{\pi(l)} = x_{j}\). Suppose without loss
of generality that \(k < l\). The
joint distribution of \(X_{i}\) and
\(X_{j}\) is given by \[\begin{eqnarray*}
f_{X_{i}, X_{j}}(x_{i}, x_{j}) & = & \int_{x_{-i, -j}} f(x_{1},
\ldots, x_{n}) \, dx_{-i, -j} \\
& = & \int_{x_{-i, -j}} f(x_{\pi(1)}, \ldots, x_{\pi(k-1)},
x_{\pi(k)}, x_{\pi(k+1)}, \ldots, x_{\pi(l-1)}, x_{\pi(l)},
x_{\pi(l+1)}, \ldots, x_{\pi(n)}) \, dx_{-i, -j} \\
& = & \int_{x_{-i, -j}} f(x_{\pi(1)}, \ldots, x_{\pi(k-1)},
x_{i}, x_{\pi(k+1)}, \ldots, x_{\pi(l-1)}, x_{j}, x_{\pi(l+1)}, \ldots,
x_{\pi(n)}) \, dx_{-i, -j} \\
& = & f_{X_{k}, X_{l}}(x_{i}, x_{j}).
\end{eqnarray*}\] As this holds for all permutations then it
holds for all \(k \neq l \in \{1, \ldots,
n\}\). Thus, the joint distribution of \(X_{i}\) and \(X_{j}\) does not depend upon \(i \neq j\). As \(Cov(X_{i}, X_{j})\) is constructed over the
joint distribution of \(X_{i}\) and
\(X_{j}\) then it follows immediately
that \(Cov(X_{i}, X_{j})\) does not
depend on either \(i\) or \(j\).
Let \(Y = \sum_{i=1}^{n}
X_{i}\). By considering \(Var(Y) \geq
0\), or otherwise, show that \[\begin{eqnarray*}
Corr(X_{i}, X_{j}) & = & \frac{Cov(X_{i},
X_{j})}{\sqrt{Var(X_{i})Var(X_{j})}} \ \geq \ \frac{-1}{n-1}.
\end{eqnarray*}\]
Taking the variance of \(Y = \sum_{i=1}^{n} X_{i}\) we have \[\begin{eqnarray*}
Var(Y) & = & Var\left(\sum_{i=1}^{n} X_{i}\right) \\
& = & \sum_{i=1}^{n} Var(X_{i}) + \sum_{i=1}^{n}\sum_{j \neq
i}^{n} Cov(X_{i}, X_{j}).
\end{eqnarray*}\] Now, from part (a), \(Var(X_{i})\) does not depend upon \(i\) and, from part (b), \(Cov(X_{i}, X_{j})\) does not depend upon
either \(i\) or \(j\). Thus, \[\begin{eqnarray*}
Var(Y) & = & nVar(X_{i}) + n(n-1)Cov(X_{i}, X_{j}).
\end{eqnarray*}\] As \(Var(Y) \geq
0\) it follows that \(nVar(X_{i}) +
n(n-1)Cov(X_{i}, X_{j}) \geq 0\). Rearranging this inequality
gives \[\begin{eqnarray*}
\frac{Cov(X_{i}, X_{j})}{Var(X_{i})} & \geq & \frac{-1}{n-1}.
\end{eqnarray*}\] As \(Var(X_{i})\) does not depend upon \(i\) then \(Var(X_{i}) = \sqrt{Var(X_{i})Var(X_{j})}\)
so that \[\begin{eqnarray*}
Corr(X_{i}, X_{j}) & = & \frac{Cov(X_{i},
X_{j})}{\sqrt{Var(X_{i})Var(X_{j})}} \ \geq \ \frac{-1}{n-1}.
\end{eqnarray*}\] Notice that an immediate consequence of this
result is that if \(X_{1}, X_{2},
\ldots\) are infinitely exchangeable then \(Corr(X_{i}, X_{j}) \geq 0\) as we require
\(Corr(X_{i}, X_{j}) \geq
\frac{-1}{n-1}\) for all \(n\).
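At the boundary, the common covariance \(c = -v/(n-1)\) makes \(Var(Y)\) exactly zero; any smaller value of \(c\) would force a negative variance for the sum. A small sketch (with illustrative \(n\) and common variance \(v\)):

```python
def var_sum(n, v, c):
    # Var(Y) for n exchangeable variables with common variance v
    # and common pairwise covariance c
    return n * v + n * (n - 1) * c

n, v = 4, 2.0
c_min = -v / (n - 1)               # corresponds to Corr = -1/(n-1)
print(var_sum(n, v, c_min))        # essentially zero: the bound is attained
print(var_sum(n, v, c_min - 0.1))  # negative: such a covariance is impossible
```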
Suppose \(X \, | \, \mu \sim N(\mu, \sigma^{2})\) and \(Y \, | \, \mu, \delta \sim N(\mu + \delta, \sigma^{2})\), where \(\sigma^{2}\) is known and \(X\) and \(Y\) are conditionally independent given \(\mu\) and \(\delta\).
Find the joint distribution of \(X\) and \(Y\) given \(\mu\) and \(\delta\).
As \(X\) and \(Y\) are conditionally independent given
\(\mu\) and \(\delta\) then \[\begin{eqnarray*}
f(x, y \, | \, \mu, \delta) & = & f(x \, | \, \mu, \delta)f(y \,
| \, \mu, \delta) \\
& = & f(x \, | \, \mu)f(y \, | \, \mu, \delta) \\
& = & \frac{1}{2\pi \sigma^{2}} \exp \left\{ -
\frac{1}{2\sigma^{2}}[(x-\mu)^{2} + (y - \mu - \delta)^{2}]\right\}.
\end{eqnarray*}\]
Consider the improper noninformative joint prior
distribution \[\begin{eqnarray*}
f(\mu, \delta) & \propto & 1.
\end{eqnarray*}\] Find, up to a constant of
proportionality, the joint posterior distribution of \(\mu\) and \(\delta\) given \(x\) and \(y\). Are \(\mu \,
| \, x, y\) and \(\delta \, | \, x,
y\) independent?
As \(f(\mu, \delta) \propto 1\) then \[\begin{eqnarray}
f(\mu, \delta \, | \, x, y) & \propto & f(x, y \, | \, \mu,
\delta) \nonumber \\
& \propto & \exp \left\{ - \frac{1}{2\sigma^{2}}[(x-\mu)^{2} +
(y - \mu - \delta)^{2}]\right\} \nonumber \\
& \propto & \exp \left\{ - \frac{1}{2\sigma^{2}}[\mu^{2} - 2x\mu
+ (\mu + \delta)^{2} - 2y(\mu + \delta)]\right\} \nonumber \\
& = & \exp \left\{ - \frac{1}{2\sigma^{2}}[2\mu^{2} -2(x+y)\mu +
2\mu\delta +\delta^{2} -2y\delta]\right\}. \tag{6}
\end{eqnarray}\] Thus, as a consequence of the \(\mu\delta\) term in (6), we cannot write
\(f(\mu, \delta \, | \, x, y) \propto g(\mu \,
| \, x, y)h(\delta \, | \, x, y)\) so that \(\mu \, | \, x, y\) and \(\delta \, | \, x, y\) are not
independent.
Note that \(f(\mu,
\delta \, | \, x, y)\) is quadratic in \(\mu\) and \(\delta\) so that \[\begin{eqnarray*}
f(\mu, \delta \, | \, x, y) & \propto & \exp \left\{ -
\frac{1}{2\sigma^{2}} \left(\mu-x \ \ \delta -
(y-x)\right)\left(\begin{array}{cc} 2 & 1 \\ 1 & 1
\end{array}\right)\left(\begin{array}{c} \mu-x \\ \delta -
(y-x)\end{array}\right)\right\} \\
& = & \exp \left\{ - \frac{1}{2\sigma^{2}} \left(\mu-x \ \
\delta - (y-x)\right)\left(\begin{array}{rr} 1 & -1 \\ -1 & 2
\end{array}\right)^{-1}\left(\begin{array}{c} \mu-x \\ \delta -
(y-x)\end{array}\right)\right\}
\end{eqnarray*}\] which is the kernel of a bivariate Normal
distribution. We have \[\begin{eqnarray*}
\mu, \delta \, | \, x, y & \sim &
N_{2}\left(\left(\begin{array}{c} x \\y-x \end{array}\right),
\left(\begin{array}{rr} \sigma^{2} & -\sigma^{2} \\ -\sigma^{2}
& 2\sigma^{2} \end{array}\right)\right).
\end{eqnarray*}\] Using the properties of multivariate Normal
distributions, we can read off the marginal distributions so \(\mu \, | \, x, y \sim N(x, \sigma^{2})\)
and \(\delta \, | \, x, y \sim N(y-x,
2\sigma^{2})\). We shall derive these marginal distributions directly in parts (c) and (d), as the derivations illustrate the techniques used when the marginals are less straightforward to obtain.
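The only linear algebra used above is that the \(2 \times 2\) matrix in the quadratic form and the claimed covariance matrix are inverses of one another; a quick hand-rolled check (entries in units of \(1/\sigma^{2}\) and \(\sigma^{2}\) respectively):

```python
def mat2_mul(A, B):
    # product of two 2x2 matrices stored as nested tuples
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

precision = ((2, 1), (1, 1))     # matrix in the posterior quadratic form
covariance = ((1, -1), (-1, 2))  # claimed inverse
print(mat2_mul(precision, covariance))  # ((1, 0), (0, 1)): the identity
```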
Find the marginal posterior distribution \(f(\delta \, | \, x, y)\).
\[\begin{eqnarray*}
f(\delta \, | \, x, y) & = & \int_{-\infty}^{\infty} f(\mu,
\delta \, | \, x, y) \, d\mu \\
& \propto & \int_{-\infty}^{\infty} \exp \left\{ -
\frac{1}{2\sigma^{2}}[2\mu^{2} -2(x+y)\mu + 2\mu\delta +\delta^{2}
-2y\delta]\right\} \, d\mu \\
& = & \exp\left\{- \frac{1}{2\sigma^{2}}[\delta^{2}
-2y\delta]\right\}\int_{-\infty}^{\infty} \exp \left\{ -
\frac{1}{2\sigma^{2}}[2\mu^{2} -2(x+y-\delta)\mu]\right\} \, d\mu \\
& = & \exp\left\{- \frac{1}{2\sigma^{2}}[\delta^{2}
-2y\delta]\right\}\times \\
& & \hspace{1cm}\int_{-\infty}^{\infty} \exp \left\{ -
\frac{1}{\sigma^{2}}\left[\left(\mu - \frac{(x+y-\delta)}{2}\right)^{2}
- \frac{(x+y-\delta)^{2}}{4}\right]\right\} \, d\mu \\
& = & \exp\left\{- \frac{1}{4\sigma^{2}}[2\delta^{2} -4y\delta -
(x+y-\delta)^{2}]\right\}\times \\
& & \hspace{3.8cm}\int_{-\infty}^{\infty} \exp \left\{ -
\frac{1}{\sigma^{2}}\left(\mu -
\frac{(x+y-\delta)}{2}\right)^{2}\right\} \, d\mu \\
& \propto & \exp\left\{- \frac{1}{4\sigma^{2}}[2\delta^{2}
-4y\delta - (x+y-\delta)^{2}]\right\} \\
& \propto & \exp\left\{-
\frac{1}{4\sigma^{2}}[\delta^{2}-2(2y-(x+y))\delta]\right\}
\end{eqnarray*}\] which is a kernel of a \(N(y-x, 2\sigma^{2})\) density. Hence, \(\delta \, | \, x, y \sim N(y-x,
2\sigma^{2})\).
Find the marginal posterior distribution \(f(\mu \, | \, x, y)\).
\[\begin{eqnarray*}
f(\mu \, | \, x, y) & = & \int_{-\infty}^{\infty} f(\mu, \delta
\, | \, x, y) \, d\delta \\
& \propto & \int_{-\infty}^{\infty} \exp \left\{ -
\frac{1}{2\sigma^{2}}[2\mu^{2} -2(x+y)\mu + 2\mu\delta +\delta^{2}
-2y\delta]\right\} \, d\delta \\
& = & \exp\left\{- \frac{1}{2\sigma^{2}}[2\mu^{2} -
2(x+y)\mu]\right\}\times \\
& & \hspace{3.6cm} \int_{-\infty}^{\infty} \exp \left\{ -
\frac{1}{2\sigma^{2}}[\delta^{2} -2(y-\mu)\delta]\right\} \, d\delta \\
& = & \exp\left\{- \frac{1}{2\sigma^{2}}[2\mu^{2} -
2(x+y)\mu]\right\}\times \\
& & \hspace{1.6cm}\int_{-\infty}^{\infty} \exp \left\{ -
\frac{1}{2\sigma^{2}}[\delta -(y-\mu)]^{2} +
\frac{1}{2\sigma^{2}}(y-\mu)^{2}\right\} \, d\delta \\
& = & \exp\left\{- \frac{1}{2\sigma^{2}}[2\mu^{2} - 2(x+y)\mu -
(y-\mu)^{2}]\right\}\times \\
& & \hspace{4.5cm}\int_{-\infty}^{\infty} \exp \left\{ -
\frac{1}{2\sigma^{2}}[\delta -(y-\mu)]^{2}\right\} \, d\delta \\
& \propto & \exp\left\{- \frac{1}{2\sigma^{2}}[2\mu^{2} -
2(x+y)\mu - (y-\mu)^{2}]\right\} \\
& \propto & \exp\left\{- \frac{1}{2\sigma^{2}}[\mu^{2}
-2(x+y-y)\mu]\right\}
\end{eqnarray*}\] which is a kernel of a \(N(x, \sigma^{2})\) density. Hence, \(\mu \, | \, x, y \sim N(x,
\sigma^{2})\).
Consider a future observation \(Z\) where \(Z \,
| \, \mu, \delta \sim N(\mu - \delta, \sigma^{2})\) and \(Z\) is conditionally independent of \(X\) and \(Y\) given \(\mu\) and \(\delta\). Find the predictive distribution
\(f(z \, | \, x, y)\).
Notice that \[\begin{eqnarray*}
f(z \, | \, \mu, \delta) & \propto & \exp
\left\{-\frac{1}{2\sigma^{2}}[z^{2} - 2(\mu - \delta)z + (\mu -
\delta)^{2}]\right\}
\end{eqnarray*}\] so that, as \(Z\) is conditionally independent of \(X\) and \(Y\) given \(\mu\) and \(\delta\) and using (6), \[\begin{eqnarray*}
f(z \, | \, x, y) & = & \int_{-\infty}^{\infty}
\int_{-\infty}^{\infty} f(z \, | \, \mu, \delta)f(\mu, \delta \, | \, x,
y) \, d\mu \, d\delta \\
& \propto & \int_{-\infty}^{\infty} \int_{-\infty}^{\infty}
\exp\left\{-\frac{1}{2\sigma^{2}}[z^{2} - 2(\mu - \delta)z + (\mu -
\delta)^{2}]\right\} \times \\
& & \hspace{2cm}\exp\left\{-\frac{1}{2\sigma^{2}}[2\mu^{2}
-2(x+y)\mu + 2\mu\delta +\delta^{2} -2y\delta]\right\} \, d\mu \,
d\delta \\
& = & \int_{-\infty}^{\infty} \int_{-\infty}^{\infty}
\exp\left\{-\frac{1}{2\sigma^{2}}[z^{2} + 3\mu^{2} -2(x+y+z)\mu +
2\delta^{2} -2(y-z)\delta]\right\} \, d\mu \, d\delta \\
& = & \exp\left\{-\frac{1}{2\sigma^{2}}z^{2} \right\}
\int_{-\infty}^{\infty} \exp\left\{-\frac{1}{2\sigma^{2}}[3\mu^{2}
-2(x+y+z)\mu]\right\} \, d\mu \\
& & \hspace{5.18cm} \times \int_{-\infty}^{\infty}
\exp\left\{-\frac{1}{2\sigma^{2}}[2\delta^{2} -2(y-z)\delta]\right\} \,
d\delta \\
& = & \exp\left\{-\frac{z^{2}}{2\sigma^{2}} \right\}
\int_{-\infty}^{\infty} \exp\left\{-\frac{3}{2\sigma^{2}}\left[\left(\mu
- \frac{(x+y+z)}{3}\right)^{2} - \frac{(x+y+z)^{2}}{9}\right]\right\} \,
d\mu \\
& & \hspace{3.18cm}\times \int_{-\infty}^{\infty}
\exp\left\{-\frac{1}{\sigma^{2}}\left[\left(\delta -
\frac{(y-z)}{2}\right)^{2} - \frac{(y-z)^{2}}{4}\right]\right\} \,
d\delta \\
& \propto & \exp\left\{-\frac{1}{2\sigma^{2}}z^{2}
\right\}\exp\left\{\frac{3}{2\sigma^{2}}
\frac{(x+y+z)^{2}}{9}\right\}\exp\left\{\frac{1}{\sigma^{2}}\frac{(y-z)^{2}}{4}\right\} \\
& \propto &
\exp\left\{-\frac{1}{12\sigma^{2}}[z^{2}-4(x+y)z+6yz]\right\} \\
& = & \exp\left\{-\frac{1}{12\sigma^{2}}[z^{2} -
2(2x-y)z]\right\}
\end{eqnarray*}\] which is a kernel of a \(N(2x-y, 6\sigma^{2})\) density. Hence,
\(Z \, | \, x, y \sim N(2x-y,
6\sigma^{2})\).
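A sanity check on the moments: writing \(Z = (\mu - \delta) + \epsilon\) with \(\epsilon \sim N(0, \sigma^{2})\) independent of \((\mu, \delta) \, | \, x, y\), the predictive mean is \(E(\mu \, | \, x, y) - E(\delta \, | \, x, y) = x - (y - x) = 2x - y\) and the predictive variance is \(Var(\mu \, | \, x, y) + Var(\delta \, | \, x, y) - 2Cov(\mu, \delta \, | \, x, y) + \sigma^{2} = \sigma^{2} + 2\sigma^{2} + 2\sigma^{2} + \sigma^{2} = 6\sigma^{2}\). In code (with illustrative \(x\), \(y\), \(\sigma^{2}\)):

```python
def predictive_moments(x, y, sigma2):
    # posterior moments of (mu, delta) given x, y, from the bivariate Normal
    mean_mu, mean_delta = x, y - x
    var_mu, var_delta, cov = sigma2, 2 * sigma2, -sigma2
    mean_z = mean_mu - mean_delta                   # E(mu - delta | x, y)
    var_z = var_mu + var_delta - 2 * cov + sigma2   # Var(mu - delta) + sigma^2
    return mean_z, var_z

print(predictive_moments(3.0, 1.0, 0.5))  # (5.0, 3.0): N(2x - y, 6 sigma^2)
```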
Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\theta\). Suppose that each \(X_{i} \, | \, \theta \sim U(0, \theta)\).
Show that \(M =
\max_{i}(X_{i})\) is a sufficient statistic for \(X = (X_{1}, \ldots, X_{n})\).
As \(X_{i} \, | \, \theta \sim U(0,
\theta)\) then \[\begin{eqnarray*}
f(x_{i} \, | \, \theta) & = & \left\{\begin{array}{ll}
\frac{1}{\theta} & 0 \leq x_{i} \leq \theta; \\
0 & \mbox{otherwise}.
\end{array}\right.
\end{eqnarray*}\] Let \(\mathbb{I}_{(0,
\theta)}(x)\) denote the indicator function for the event \(0 \leq x \leq \theta\), so \(\mathbb{I}_{(0, \theta)}(x) = 1\) if \(0 \leq x \leq \theta\) and \(0\) otherwise. Then we can write \(f(x_{i} \, | \, \theta) = \frac{1}{\theta}
\mathbb{I}_{(0, \theta)}(x_{i})\) so that \[\begin{eqnarray*}
f(x \, | \, \theta) & = & \prod_{i=1}^{n} \frac{1}{\theta}
\mathbb{I}_{(0, \theta)}(x_{i}) \\
& = & \frac{1}{\theta^{n}} \prod_{i=1}^{n}
\mathbb{I}_{(0, \theta)}(x_{i}).
\end{eqnarray*}\] Let \(\mathbb{I}_{\{a
\geq b\}}\) denote the indicator function for the event \(a \geq b\) so \(\mathbb{I}_{\{a \geq b\}} = 1\) if \(a \geq b\) and \(0\) otherwise. Then \[\begin{eqnarray*}
\prod_{i=1}^{n} \mathbb{I}_{(0, \theta)}(x_{i}) & = &
\mathbb{I}_{\{\min_{i}(x_{i}) \geq 0\}}\mathbb{I}_{\{\theta \geq
\max_{i}(x_{i})\}}
\end{eqnarray*}\] so that \[\begin{eqnarray*}
f(x \, | \, \theta) & = & \frac{1}{\theta^{n}}
\mathbb{I}_{\{\min_{i}(x_{i}) \geq 0\}}\mathbb{I}_{\{\theta \geq
\max_{i}(x_{i})\}} \\
& = &
\frac{1}{\theta^{n}}\mathbb{I}_{\{\theta \geq
m\}}\mathbb{I}_{\{\min_{i}(x_{i}) \geq 0\}}
\end{eqnarray*}\] where \(m =
\max_{i}(x_{i})\). Thus, \(f(x \, | \,
\theta) = g(\theta, m)h(x)\) where \(g(\theta, m) =
\frac{1}{\theta^{n}}\mathbb{I}_{\{\theta \geq m\}}\) and \(h(x) = \mathbb{I}_{\{\min_{i}(x_{i}) \geq
0\}}\) so that \(M =
\max_{i}(X_{i})\) is sufficient for \(X
= (X_{1}, \ldots, X_{n})\) for learning about \(\theta\).
Show that the Pareto distribution, \(\theta \sim Pareto(a, b)\), \[\begin{eqnarray*}
f(\theta) & = & \frac{a b^{a}}{\theta^{a+1}}, \ \ \theta \geq b
\end{eqnarray*}\] is a conjugate prior
distribution.
Utilising an indicator function, the
prior may be expressed as \[\begin{eqnarray*}
f(\theta) & = & \frac{ab^{a}}{\theta^{a+1}}\mathbb{I}_{\{\theta
\geq b\}}.
\end{eqnarray*}\] The posterior is \[\begin{eqnarray*}
f(\theta \, | \, x) & \propto & \frac{1}{\theta^{n}}
\mathbb{I}_{\{\theta \geq m\}} \times
\frac{ab^{a}}{\theta^{a+1}}\mathbb{I}_{\{\theta \geq b\}} \\
& \propto & \frac{1}{\theta^{a+n+1}} \mathbb{I}_{\{\theta \geq
m\}}\mathbb{I}_{\{\theta \geq b\}} \\
& = & \frac{1}{\theta^{a+n+1}} \mathbb{I}_{\{\theta \geq \max(b,
m)\}}
\end{eqnarray*}\] which is a kernel of a \(Pareto(a+n, \max(b, m))\) density so that
\(\theta \, | \, x \sim Pareto(a+n, \max(b,
m))\). Thus, the prior and posterior are in the same family and
so, relative to the \(U(0, \theta)\)
likelihood, the Pareto distribution is a conjugate family.
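The conjugate update for this family is again a two-line computation: add the sample size to \(a\) and replace \(b\) by \(\max(b, m)\). A minimal sketch (the prior values and data below are illustrative):

```python
# Pareto(a, b) prior for theta with U(0, theta) likelihood:
# posterior is Pareto(a + n, max(b, max(xs)))
def pareto_update(a, b, xs):
    return a + len(xs), max(b, max(xs))

# illustrative (made-up) prior and data
print(pareto_update(2.0, 1.5, [0.4, 2.2, 1.1]))  # (5.0, 2.2)
```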