Let \(X_{1}, \ldots, X_{n}\) be conditionally independent given \(\theta\), so \(f(x \, | \, \theta) = \prod_{i=1}^{n} f(x_{i} \, | \, \theta)\) where \(x = (x_{1}, \ldots, x_{n})\), with each \(X_{i} \, | \, \theta \sim Po(\theta)\). Suppose we judge that \(\theta \sim Gamma(\alpha, \beta)\).
As \(\theta \sim Gamma(\alpha, \beta)\) then \[\begin{eqnarray} f(\theta) & = & \frac{\beta^{\alpha}}{\Gamma(\alpha)}\theta^{\alpha-1}e^{-\beta \theta} \tag{1} \end{eqnarray}\] whilst as \(X_{i} \, | \, \theta \sim Po(\theta)\) \[\begin{eqnarray} f(x \, | \, \theta) \ = \ \prod_{i=1}^{n} \frac{\theta^{x_{i}}e^{-\theta}}{x_{i}!} \ = \ \frac{\theta^{n\bar{x}}e^{-n\theta}}{\prod_{i=1}^{n} x_{i}!}. \tag{2} \end{eqnarray}\] From (1) and (2) the posterior density \[\begin{eqnarray} f(\theta \, | \, x) & \propto & \frac{\theta^{n\bar{x}}e^{-n\theta}}{\prod_{i=1}^{n} x_{i}!} \times \frac{\beta^{\alpha}}{\Gamma(\alpha)}\theta^{\alpha-1}e^{-\beta \theta} \nonumber \\ & \propto & \theta^{(\alpha + n\bar{x})-1}e^{-(\beta + n)\theta} \nonumber \end{eqnarray}\] which is a kernel of a \(Gamma(\alpha + n\bar{x}, \beta + n)\) density, so \(\theta \, | \, x \sim Gamma(\alpha + n\bar{x}, \beta + n)\).
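As a numerical sanity check of this conjugate update, the following Python sketch (assuming `numpy` and `scipy` are available, with hypothetical hyperparameters and data) compares a grid approximation to the posterior, formed directly from the prior and the Poisson likelihood, with the \(Gamma(\alpha + n\bar{x}, \beta + n)\) density.

```python
import numpy as np
from scipy import stats

# Hypothetical hyperparameters and data, chosen only for illustration.
alpha, beta = 2.0, 1.0
x = np.array([3, 1, 4, 2, 5])
n, xbar = len(x), x.mean()

# Unnormalised log posterior on a grid: Gamma log prior plus Poisson log likelihood.
theta = np.linspace(0.01, 10, 1000)
log_post = (stats.gamma.logpdf(theta, a=alpha, scale=1 / beta)
            + stats.poisson.logpmf(x[:, None], theta).sum(axis=0))
post = np.exp(log_post - log_post.max())
post /= post.sum() * (theta[1] - theta[0])  # normalise numerically

# Conjugate result: Gamma(alpha + n * xbar, beta + n), rate parametrisation.
conj = stats.gamma.pdf(theta, a=alpha + n * xbar, scale=1 / (beta + n))
print(np.abs(post - conj).max())  # should be close to zero
```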
As \(\theta \, | \, x \sim Gamma(\alpha + n\bar{x}, \beta + n)\), \[\begin{eqnarray*} E(\theta \, | \, x) \ = \ \frac{\alpha + n\bar{x}}{\beta + n} & = & \frac{\beta \left(\frac{\alpha}{\beta}\right) + n\bar{x}}{\beta + n} \\ & = & \lambda \theta_{0} + (1 - \lambda) \bar{x} \end{eqnarray*}\] where \(\lambda = \frac{\beta}{\beta + n}\) and \(\theta_{0} = \frac{\alpha}{\beta} = E(\theta)\) as \(\theta \sim Gamma(\alpha, \beta)\). Thus, the posterior mean is a weighted average of the prior mean and the maximum likelihood estimate.
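For example, a quick numerical check of this weighted-average identity, with hypothetical values of \(\alpha\), \(\beta\), \(n\) and \(\bar{x}\):

```python
# Numerical check of the weighted-average identity, with hypothetical values.
alpha, beta, n, xbar = 2.0, 1.0, 5, 3.0
lam = beta / (beta + n)
posterior_mean = (alpha + n * xbar) / (beta + n)
weighted_average = lam * (alpha / beta) + (1 - lam) * xbar
print(posterior_mean, weighted_average)  # both equal 17/6
```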
We do not need to find the full predictive distribution. As \(X\) and \(Z\) are conditionally independent given \(\theta\) we can use the expectation and variance formulas given in lectures so \[\begin{eqnarray} E(Z \, | \, X) & = & E(E(Z \, | \, \theta) \, | \, X) \nonumber \\ & = & E(\theta \, | \, X) \tag{3} \\ & = & \frac{\alpha + n\bar{x}}{\beta + n} \tag{4} \end{eqnarray}\] where (3) follows as \(Z \, | \, \theta \sim Po(\theta)\) and (4) as \(\theta \, | \, x \sim Gamma(\alpha + n\bar{x}, \beta + n)\). Similarly, \[\begin{eqnarray} Var(Z \, | \, X) & = & Var(E(Z \, | \, \theta) \, | \, X) + E(Var(Z \, | \, \theta) \, | \, X) \nonumber \\ & = & Var(\theta \, | \, X) + E(\theta \, | \, X) \nonumber \\ & = & \frac{\alpha + n\bar{x}}{(\beta + n)^{2}} + \frac{\alpha + n\bar{x}}{\beta + n} \ = \ \frac{(\alpha + n\bar{x})(\beta + n + 1)}{(\beta + n)^{2}}. \tag{5} \end{eqnarray}\]
Taking \(\alpha = 101\) and \(\beta = 5\) we observe \(\sum_{i=1}^{10} x_{i} = 238\) with \(n = 10\). Hence, \(\alpha + n\bar{x} = 101 + 238 = 339\) and \(\beta + n = 5 + 10 = 15\) so that \(\theta \, | \, x \sim Gamma(339, 15)\). From (4) and (5) we have \(E(Z \, | \, X) = \frac{339}{15} = \frac{113}{5}\) and \(Var(Z \, | \, X) = \frac{339\times 16}{15^{2}} = \frac{1808}{75}\) so we use \(Z \, | \, x \sim N(\frac{113}{5}, \frac{1808}{75})\) approximately. Recalling that if \(Y \sim N(0, 1)\) then \(P(-1.96 < Y < 1.96) = 0.95\) the (approximate) 95% prediction interval is \[\begin{eqnarray*} \frac{113}{5} \pm 1.96\sqrt{\frac{1808}{75}} & = & (12.98, 32.22). \end{eqnarray*}\]
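As a check on the approximation, we could simulate the posterior predictive distribution directly by drawing \(\theta \, | \, x \sim Gamma(339, 15)\) and then \(Z \, | \, \theta \sim Po(\theta)\); a minimal Python sketch (assuming `numpy`) is given below.

```python
import numpy as np

rng = np.random.default_rng(1)

# Posterior Gamma(339, 15) from above, rate parametrisation (scale = 1/15).
a_post, b_post = 339, 15

# Simulate the posterior predictive: draw theta | x, then Z | theta ~ Po(theta).
theta = rng.gamma(shape=a_post, scale=1 / b_post, size=1_000_000)
z = rng.poisson(theta)

print(z.mean())                        # close to 339/15 = 22.6
print(z.var())                         # close to 1808/75 = 24.11
print(np.percentile(z, [2.5, 97.5]))   # compare with (12.98, 32.22)
```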
Let \(X_{1}, \ldots, X_{n}\) be a finitely exchangeable sequence of random quantities and consider any \(i \neq j \in \{1, \ldots, n\}\).
If \(X_{1}, \ldots, X_{n}\) are finitely exchangeable then their joint density satisfies \[\begin{eqnarray*} f(x_{1}, \ldots, x_{n}) & = & f(x_{\pi(1)}, \ldots, x_{\pi(n)}) \end{eqnarray*}\] for all permutations \(\pi\) defined on the set \(\{1, \ldots, n\}\). For a given permutation there exists \(j \in \{1, \ldots, n\}\) such that \(x_{\pi(j)} = x_{i}\). Let \(x_{-i}\) denote the set \(\{x_{1}, \ldots, x_{n}\} \setminus x_{i}\). The marginal distribution of \(X_{i}\) is given by \[\begin{eqnarray*} f_{X_{i}}(x_{i}) & = & \int_{x_{-i}} f(x_{1}, \ldots, x_{n}) \, dx_{-i} \\ & = & \int_{x_{-i}} f(x_{\pi(1)}, \ldots, x_{\pi(j-1)}, x_{\pi(j)}, x_{\pi(j+1)}, \ldots, x_{\pi(n)}) \, dx_{-i} \\ & = & \int_{x_{-i}} f(x_{\pi(1)}, \ldots, x_{\pi(j-1)}, x_{i}, x_{\pi(j+1)}, \ldots, x_{\pi(n)}) \, dx_{-i} \\ & = & f_{X_{j}}(x_{i}). \end{eqnarray*}\] As this holds for all permutations then it holds for all \(j \in \{1, \ldots, n\}\). Thus, the marginal distribution of \(X_{i}\) does not depend upon \(i\). As \(E(X_{i})\) and \(Var(X_{i})\) are constructed over the marginal of \(X_{i}\) then it follows immediately that these also do not depend upon \(i\).
For \(i \neq j\), let \(x_{-i, -j}\) denote the set \(\{x_{1}, \ldots, x_{n}\} \setminus \{x_{i}, x_{j}\}\). For a given permutation there exists \(k \neq l \in \{1, \ldots, n\}\) such that \(x_{\pi(k)} = x_{i}\) and \(x_{\pi(l)} = x_{j}\). Suppose without loss of generality that \(k < l\). The joint distribution of \(X_{i}\) and \(X_{j}\) is given by \[\begin{eqnarray*} f_{X_{i}, X_{j}}(x_{i}, x_{j}) & = & \int_{x_{-i, -j}} f(x_{1}, \ldots, x_{n}) \, dx_{-i, -j} \\ & = & \int_{x_{-i, -j}} f(x_{\pi(1)}, \ldots, x_{\pi(k-1)}, x_{\pi(k)}, x_{\pi(k+1)}, \ldots, x_{\pi(l-1)}, x_{\pi(l)}, x_{\pi(l+1)}, \ldots, x_{\pi(n)}) \, dx_{-i, -j} \\ & = & \int_{x_{-i, -j}} f(x_{\pi(1)}, \ldots, x_{\pi(k-1)}, x_{i}, x_{\pi(k+1)}, \ldots, x_{\pi(l-1)}, x_{j}, x_{\pi(l+1)}, \ldots, x_{\pi(n)}) \, dx_{-i, -j} \\ & = & f_{X_{k}, X_{l}}(x_{i}, x_{j}). \end{eqnarray*}\] As this holds for all permutations then it holds for all \(k \neq l \in \{1, \ldots, n\}\). Thus, the joint distribution of \(X_{i}\) and \(X_{j}\) does not depend upon \(i \neq j\). As \(Cov(X_{i}, X_{j})\) is constructed over the joint distribution of \(X_{i}\) and \(X_{j}\) then it follows immediately that \(Cov(X_{i}, X_{j})\) does not depend on either \(i\) or \(j\).
Taking the variance of \(Y = \sum_{i=1}^{n} X_{i}\) we have \[\begin{eqnarray*} Var(Y) & = & Var\left(\sum_{i=1}^{n} X_{i}\right) \\ & = & \sum_{i=1}^{n} Var(X_{i}) + \sum_{i=1}^{n}\sum_{j \neq i}^{n} Cov(X_{i}, X_{j}). \end{eqnarray*}\] Now, from part (a), \(Var(X_{i})\) does not depend upon \(i\) and, from part (b), \(Cov(X_{i}, X_{j})\) does not depend upon either \(i\) or \(j\). Thus, \[\begin{eqnarray*} Var(Y) & = & nVar(X_{i}) + n(n-1)Cov(X_{i}, X_{j}). \end{eqnarray*}\] As \(Var(Y) \geq 0\) it follows that \(nVar(X_{i}) + n(n-1)Cov(X_{i}, X_{j}) \geq 0\). Rearranging this inequality, assuming \(Var(X_{i}) > 0\), gives \[\begin{eqnarray*} \frac{Cov(X_{i}, X_{j})}{Var(X_{i})} & \geq & \frac{-1}{n-1}. \end{eqnarray*}\] As \(Var(X_{i})\) does not depend upon \(i\) then \(Var(X_{i}) = \sqrt{Var(X_{i})Var(X_{j})}\) so that \[\begin{eqnarray*} Corr(X_{i}, X_{j}) & = & \frac{Cov(X_{i}, X_{j})}{\sqrt{Var(X_{i})Var(X_{j})}} \ \geq \ \frac{-1}{n-1}. \end{eqnarray*}\] Notice that an immediate consequence of this result is that if \(X_{1}, X_{2}, \ldots\) are infinitely exchangeable then \(Corr(X_{i}, X_{j}) \geq 0\) as we require \(Corr(X_{i}, X_{j}) \geq \frac{-1}{n-1}\) for all \(n\).
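The lower bound is attainable: centring \(n\) independent draws about their sample mean yields an exchangeable vector whose components sum to zero and whose pairwise correlation is exactly \(\frac{-1}{n-1}\). A brief Python simulation (hypothetical setup) illustrates this.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Centring i.i.d. draws gives an exchangeable vector whose components sum to
# zero, and its pairwise correlation attains the bound -1/(n-1).
z = rng.normal(size=(200_000, n))
x = z - z.mean(axis=1, keepdims=True)

corr = np.corrcoef(x, rowvar=False)
print(corr[0, 1], -1 / (n - 1))  # both approximately -0.25
```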
Suppose \(X \, | \, \mu \sim N(\mu, \sigma^{2})\) and \(Y \, | \, \mu, \delta \sim N(\mu + \delta, \sigma^{2})\), where \(\sigma^{2}\) is known and \(X\) and \(Y\) are conditionally independent given \(\mu\) and \(\delta\).
As \(X\) and \(Y\) are conditionally independent given \(\mu\) and \(\delta\) then \[\begin{eqnarray*} f(x, y \, | \, \mu, \delta) & = & f(x \, | \, \mu, \delta)f(y \, | \, \mu, \delta) \\ & = & f(x \, | \, \mu)f(y \, | \, \mu, \delta) \\ & = & \frac{1}{2\pi \sigma^{2}} \exp \left\{ - \frac{1}{2\sigma^{2}}[(x-\mu)^{2} + (y - \mu - \delta)^{2}]\right\}. \end{eqnarray*}\]
As \(f(\mu, \delta) \propto 1\) then \[\begin{eqnarray} f(\mu, \delta \, | \, x, y) & \propto & f(x, y \, | \, \mu, \delta) \nonumber \\ & \propto & \exp \left\{ - \frac{1}{2\sigma^{2}}[(x-\mu)^{2} + (y - \mu - \delta)^{2}]\right\} \nonumber \\ & \propto & \exp \left\{ - \frac{1}{2\sigma^{2}}[\mu^{2} - 2x\mu + (\mu + \delta)^{2} - 2y(\mu + \delta)]\right\} \nonumber \\ & = & \exp \left\{ - \frac{1}{2\sigma^{2}}[2\mu^{2} -2(x+y)\mu + 2\mu\delta +\delta^{2} -2y\delta]\right\}. \tag{6} \end{eqnarray}\] Thus, as a consequence of the \(\mu\delta\) term in (6), we cannot write \(f(\mu, \delta \, | \, x, y) \propto g(\mu \, | \, x, y)h(\delta \, | \, x, y)\) so that \(\mu \, | \, x, y\) and \(\delta \, | \, x, y\) are not independent.
Note that the exponent of \(f(\mu, \delta \, | \, x, y)\) is quadratic in \(\mu\) and \(\delta\) so that \[\begin{eqnarray*} f(\mu, \delta \, | \, x, y) & \propto & \exp \left\{ - \frac{1}{2\sigma^{2}} \left(\mu-x \ \ \delta - (y-x)\right)\left(\begin{array}{cc} 2 & 1 \\ 1 & 1 \end{array}\right)\left(\begin{array}{c} \mu-x \\ \delta - (y-x)\end{array}\right)\right\} \\ & = & \exp \left\{ - \frac{1}{2\sigma^{2}} \left(\mu-x \ \ \delta - (y-x)\right)\left(\begin{array}{rr} 1 & -1 \\ -1 & 2 \end{array}\right)^{-1}\left(\begin{array}{c} \mu-x \\ \delta - (y-x)\end{array}\right)\right\} \end{eqnarray*}\] which is the kernel of a bivariate Normal distribution. We have \[\begin{eqnarray*} \mu, \delta \, | \, x, y & \sim & N_{2}\left(\left(\begin{array}{c} x \\y-x \end{array}\right), \left(\begin{array}{rr} \sigma^{2} & -\sigma^{2} \\ -\sigma^{2} & 2\sigma^{2} \end{array}\right)\right). \end{eqnarray*}\] Using the properties of multivariate Normal distributions, we can read off the marginal distributions so \(\mu \, | \, x, y \sim N(x, \sigma^{2})\) and \(\delta \, | \, x, y \sim N(y-x, 2\sigma^{2})\). We shall derive these marginal distributions directly in parts (c) and (d), as this gives insight into the techniques needed when the marginals are not so straightforward to obtain.
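As a check on this posterior, the following Python sketch (with hypothetical values of \(x\), \(y\) and \(\sigma^{2}\)) samples from the bivariate Normal above and compares the sample moments with \(N(x, \sigma^{2})\) and \(N(y-x, 2\sigma^{2})\); the sampled correlation of about \(-1/\sqrt{2}\) reflects the dependence between \(\mu\) and \(\delta\) noted above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical observed values and known variance, for illustration only.
x_obs, y_obs, sigma2 = 1.3, 2.1, 0.5

mean = np.array([x_obs, y_obs - x_obs])
cov = sigma2 * np.array([[1.0, -1.0],
                         [-1.0, 2.0]])

draws = rng.multivariate_normal(mean, cov, size=500_000)
mu, delta = draws[:, 0], draws[:, 1]

print(mu.mean(), mu.var())           # approximately x_obs and sigma2
print(delta.mean(), delta.var())     # approximately y_obs - x_obs and 2*sigma2
print(np.corrcoef(mu, delta)[0, 1])  # approximately -1/sqrt(2): not independent
```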
\[\begin{eqnarray*} f(\delta \, | \, x, y) & = & \int_{-\infty}^{\infty} f(\mu, \delta \, | \, x, y) \, d\mu \\ & \propto & \int_{-\infty}^{\infty} \exp \left\{ - \frac{1}{2\sigma^{2}}[2\mu^{2} -2(x+y)\mu + 2\mu\delta +\delta^{2} -2y\delta]\right\} \, d\mu \\ & = & \exp\left\{- \frac{1}{2\sigma^{2}}[\delta^{2} -2y\delta]\right\}\int_{-\infty}^{\infty} \exp \left\{ - \frac{1}{2\sigma^{2}}[2\mu^{2} -2(x+y-\delta)\mu]\right\} \, d\mu \\ & = & \exp\left\{- \frac{1}{2\sigma^{2}}[\delta^{2} -2y\delta]\right\}\times \\ & & \hspace{1cm}\int_{-\infty}^{\infty} \exp \left\{ - \frac{1}{\sigma^{2}}\left[\left(\mu - \frac{(x+y-\delta)}{2}\right)^{2} - \frac{(x+y-\delta)^{2}}{4}\right]\right\} \, d\mu \\ & = & \exp\left\{- \frac{1}{4\sigma^{2}}[2\delta^{2} -4y\delta - (x+y-\delta)^{2}]\right\}\times \\ & & \hspace{3.8cm}\int_{-\infty}^{\infty} \exp \left\{ - \frac{1}{\sigma^{2}}\left(\mu - \frac{(x+y-\delta)}{2}\right)^{2}\right\} \, d\mu \\ & \propto & \exp\left\{- \frac{1}{4\sigma^{2}}[2\delta^{2} -4y\delta - (x+y-\delta)^{2}]\right\} \\ & \propto & \exp\left\{- \frac{1}{4\sigma^{2}}[\delta^{2}-2(2y-(x+y))\delta]\right\} \end{eqnarray*}\] which is a kernel of a \(N(y-x, 2\sigma^{2})\) density. Hence, \(\delta \, | \, x, y \sim N(y-x, 2\sigma^{2})\).
\[\begin{eqnarray*} f(\mu \, | \, x, y) & = & \int_{-\infty}^{\infty} f(\mu, \delta \, | \, x, y) \, d\delta \\ & \propto & \int_{-\infty}^{\infty} \exp \left\{ - \frac{1}{2\sigma^{2}}[2\mu^{2} -2(x+y)\mu + 2\mu\delta +\delta^{2} -2y\delta]\right\} \, d\delta \\ & = & \exp\left\{- \frac{1}{2\sigma^{2}}[2\mu^{2} - 2(x+y)\mu]\right\}\times \\ & & \hspace{3.6cm} \int_{-\infty}^{\infty} \exp \left\{ - \frac{1}{2\sigma^{2}}[\delta^{2} -2(y-\mu)\delta]\right\} \, d\delta \\ & = & \exp\left\{- \frac{1}{2\sigma^{2}}[2\mu^{2} - 2(x+y)\mu]\right\}\times \\ & & \hspace{1.6cm}\int_{-\infty}^{\infty} \exp \left\{ - \frac{1}{2\sigma^{2}}[\delta -(y-\mu)]^{2} + \frac{1}{2\sigma^{2}}(y-\mu)^{2}\right\} \, d\delta \\ & = & \exp\left\{- \frac{1}{2\sigma^{2}}[2\mu^{2} - 2(x+y)\mu - (y-\mu)^{2}]\right\}\times \\ & & \hspace{4.5cm}\int_{-\infty}^{\infty} \exp \left\{ - \frac{1}{2\sigma^{2}}[\delta -(y-\mu)]^{2}\right\} \, d\delta \\ & \propto & \exp\left\{- \frac{1}{2\sigma^{2}}[2\mu^{2} - 2(x+y)\mu - (y-\mu)^{2}]\right\} \\ & \propto & \exp\left\{- \frac{1}{2\sigma^{2}}[\mu^{2} -2(x+y-y)\mu]\right\} \end{eqnarray*}\] which is a kernel of a \(N(x, \sigma^{2})\) density. Hence, \(\mu \, | \, x, y \sim N(x, \sigma^{2})\).
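Both marginals can also be checked without assuming the bivariate Normal form derived above: discretise the joint kernel in (6) on a grid and sum out one variable at a time. A Python sketch with hypothetical values:

```python
import numpy as np
from scipy import stats

# Hypothetical values, used only to check the algebra numerically.
x_obs, y_obs, sigma2 = 1.3, 2.1, 0.5

mu = np.linspace(-4, 6, 801)
delta = np.linspace(-5, 7, 961)
M, D = np.meshgrid(mu, delta, indexing="ij")

# Joint posterior kernel from (6), up to proportionality.
log_kernel = -(2 * M**2 - 2 * (x_obs + y_obs) * M + 2 * M * D
               + D**2 - 2 * y_obs * D) / (2 * sigma2)
kernel = np.exp(log_kernel - log_kernel.max())

# Sum out delta, then compare with the N(x, sigma^2) kernel for mu.
f_mu = kernel.sum(axis=1)
target_mu = stats.norm.pdf(mu, loc=x_obs, scale=np.sqrt(sigma2))
print(np.abs(f_mu / f_mu.max() - target_mu / target_mu.max()).max())  # small

# Sum out mu, then compare with the N(y - x, 2 sigma^2) kernel for delta.
f_delta = kernel.sum(axis=0)
target_delta = stats.norm.pdf(delta, loc=y_obs - x_obs, scale=np.sqrt(2 * sigma2))
print(np.abs(f_delta / f_delta.max() - target_delta / target_delta.max()).max())  # small
```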
Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\theta\). Suppose that each \(X_{i} \, | \, \theta \sim U(0, \theta)\).
As \(X_{i} \, | \, \theta \sim U(0, \theta)\) then \[\begin{eqnarray*} f(x_{i} \, | \, \theta) & = & \left\{\begin{array}{ll} \frac{1}{\theta} & 0 \leq x_{i} \leq \theta; \\ 0 & \mbox{otherwise}. \end{array}\right. \end{eqnarray*}\] Let \(\mathbb{I}_{(0, \theta)}(x)\) denote the indicator function for the event \(0 \leq x \leq \theta\), so \(\mathbb{I}_{(0, \theta)}(x) = 1\) if \(0 \leq x \leq \theta\) and \(0\) otherwise. Then we can write \(f(x_{i} \, | \, \theta) = \frac{1}{\theta} \mathbb{I}_{(0, \theta)}(x_{i})\) so that \[\begin{eqnarray*} f(x \, | \, \theta) & = & \prod_{i=1}^{n} \frac{1}{\theta} \mathbb{I}_{(0, \theta)}(x_{i}) \\ & = & \frac{1}{\theta^{n}} \prod_{i=1}^{n} \mathbb{I}_{(0, \theta)}(x_{i}). \end{eqnarray*}\] Let \(\mathbb{I}_{\{a \geq b\}}\) denote the indicator function for the event \(a \geq b\) so \(\mathbb{I}_{\{a \geq b\}} = 1\) if \(a \geq b\) and \(0\) otherwise. Then \[\begin{eqnarray*} \prod_{i=1}^{n} \mathbb{I}_{(0, \theta)}(x_{i}) & = & \mathbb{I}_{\{\min_{i}(x_{i}) \geq 0\}}\mathbb{I}_{\{\theta \geq \max_{i}(x_{i})\}} \end{eqnarray*}\] so that \[\begin{eqnarray*} f(x \, | \, \theta) & = & \frac{1}{\theta^{n}} \mathbb{I}_{\{\min_{i}(x_{i}) \geq 0\}}\mathbb{I}_{\{\theta \geq \max_{i}(x_{i})\}} \\ & = & \frac{1}{\theta^{n}}\mathbb{I}_{\{\theta \geq m\}}\mathbb{I}_{\{\min_{i}(x_{i}) \geq 0\}} \end{eqnarray*}\] where \(m = \max_{i}(x_{i})\). Thus, \(f(x \, | \, \theta) = g(\theta, m)h(x)\) where \(g(\theta, m) = \frac{1}{\theta^{n}}\mathbb{I}_{\{\theta \geq m\}}\) and \(h(x) = \mathbb{I}_{\{\min_{i}(x_{i}) \geq 0\}}\) so that \(M = \max_{i}(X_{i})\) is sufficient for \(X = (X_{1}, \ldots, X_{n})\) for learning about \(\theta\).
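This factorisation can also be seen numerically: as a function of \(\theta\), the likelihood is identical for any two samples of the same size sharing the same maximum. A short Python illustration with made-up data:

```python
import numpy as np

def uniform_likelihood(theta, x):
    """Likelihood of theta when X_i | theta ~ U(0, theta), for theta > 0."""
    x = np.asarray(x, dtype=float)
    return np.where(theta >= x.max(), theta ** -len(x), 0.0)

theta = np.linspace(0.1, 10, 500)

# Two hypothetical samples with the same size and the same maximum.
x1 = [0.2, 1.7, 3.4, 0.9]
x2 = [3.4, 3.1, 0.1, 2.8]

# The likelihood curves coincide: the data enter only through n and max(x_i).
print(np.allclose(uniform_likelihood(theta, x1), uniform_likelihood(theta, x2)))
```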
Utilising an indicator function, the prior may be expressed as \[\begin{eqnarray*} f(\theta) & = & \frac{ab^{a}}{\theta^{a+1}}\mathbb{I}_{\{\theta \geq b\}}. \end{eqnarray*}\] The posterior is \[\begin{eqnarray*} f(\theta \, | \, x) & \propto & \frac{1}{\theta^{n}} \mathbb{I}_{\{\theta \geq m\}} \times \frac{ab^{a}}{\theta^{a+1}}\mathbb{I}_{\{\theta \geq b\}} \\ & \propto & \frac{1}{\theta^{a+n+1}} \mathbb{I}_{\{\theta \geq m\}}\mathbb{I}_{\{\theta \geq b\}} \\ & = & \frac{1}{\theta^{a+n+1}} \mathbb{I}_{\{\theta \geq \max(b, m)\}} \end{eqnarray*}\] which is a kernel of a \(Pareto(a+n, \max(b, m))\) density so that \(\theta \, | \, x \sim Pareto(a+n, \max(b, m))\). Thus, the prior and posterior are in the same family and so, relative to the \(U(0, \theta)\) likelihood, the Pareto distribution is a conjugate family.
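In practice the conjugate update is the two-line rule \(a \mapsto a + n\), \(b \mapsto \max(b, m)\); a minimal Python sketch with hypothetical prior values and data:

```python
import numpy as np

def pareto_posterior(a, b, x):
    """Conjugate update for a Pareto(a, b) prior with a U(0, theta) likelihood."""
    x = np.asarray(x, dtype=float)
    return a + len(x), max(b, x.max())

# Hypothetical prior hyperparameters and data, for illustration only.
a_post, b_post = pareto_posterior(a=3, b=2.0, x=[0.4, 1.9, 2.6, 1.1])
print(a_post, b_post)  # Pareto(7, 2.6): the sample maximum exceeds b
```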