Let \(X_{1}, \ldots, X_{n}\) be conditionally independent given \(\theta\), so \(f(x \, | \, \theta) = \prod_{i=1}^{n} f(x_{i} \, | \, \theta)\) where \(x = (x_{1}, \ldots, x_{n})\), with each \(X_{i} \, | \, \theta \sim Po(\theta)\). Suppose we judge that \(\theta \sim Gamma(\alpha, \beta)\).
As \(\theta \sim Gamma(\alpha, \beta)\) then \[\begin{eqnarray} f(\theta) & = & \frac{\beta^{\alpha}}{\Gamma(\alpha)}\theta^{\alpha-1}e^{-\beta \theta} \tag{1} \end{eqnarray}\] whilst as \(X_{i} \, | \, \theta \sim Po(\theta)\) \[\begin{eqnarray} f(x \, | \, \theta) \ = \ \prod_{i=1}^{n} \frac{\theta^{x_{i}}e^{-\theta}}{x_{i}!} \ = \ \frac{\theta^{n\bar{x}}e^{-n\theta}}{\prod_{i=1}^{n} x_{i}!}. \tag{2} \end{eqnarray}\] From (1) and (2) the posterior density \[\begin{eqnarray} f(\theta \, | \, x) & \propto & \frac{\theta^{n\bar{x}}e^{-n\theta}}{\prod_{i=1}^{n} x_{i}!} \times \frac{\beta^{\alpha}}{\Gamma(\alpha)}\theta^{\alpha-1}e^{-\beta \theta} \nonumber \\ & \propto & \theta^{(\alpha + n\bar{x})-1}e^{-(\beta + n)\theta} \nonumber \end{eqnarray}\] which is a kernel of a \(Gamma(\alpha + n\bar{x}, \beta + n)\) density, so \(\theta \, | \, x \sim Gamma(\alpha + n\bar{x}, \beta + n)\).
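As a numerical sanity check of this conjugate update, the following Python sketch (assuming `numpy` and `scipy` are available, with hypothetical hyperparameters and data) compares a grid approximation to the posterior, formed directly from the prior and the Poisson likelihood, with the \(Gamma(\alpha + n\bar{x}, \beta + n)\) density.

```python
import numpy as np
from scipy import stats

# Hypothetical hyperparameters and data, chosen only for illustration.
alpha, beta = 2.0, 1.0
x = np.array([3, 1, 4, 2, 5])
n, xbar = len(x), x.mean()

# Unnormalised log posterior on a grid: Gamma log prior plus Poisson log likelihood.
theta = np.linspace(0.01, 10, 1000)
log_post = (stats.gamma.logpdf(theta, a=alpha, scale=1 / beta)
            + stats.poisson.logpmf(x[:, None], theta).sum(axis=0))
post = np.exp(log_post - log_post.max())
post /= post.sum() * (theta[1] - theta[0])  # normalise numerically

# Conjugate result: Gamma(alpha + n * xbar, beta + n), rate parametrisation.
conj = stats.gamma.pdf(theta, a=alpha + n * xbar, scale=1 / (beta + n))
print(np.abs(post - conj).max())  # should be close to zero
```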
As \(\theta \, | \, x \sim Gamma(\alpha + n\bar{x}, \beta + n)\), \[\begin{eqnarray*} E(\theta \, | \, x) \ = \ \frac{\alpha + n\bar{x}}{\beta + n} & = & \frac{\beta \left(\frac{\alpha}{\beta}\right) + n\bar{x}}{\beta + n} \\ & = & \lambda \theta_{0} + (1 - \lambda) \bar{x} \end{eqnarray*}\] where \(\lambda = \frac{\beta}{\beta + n}\) and \(\theta_{0} = \frac{\alpha}{\beta} = E(\theta)\) as \(\theta \sim Gamma(\alpha, \beta)\). Thus, the posterior mean is a weighted average of the prior mean and the maximum likelihood estimate.
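For example, a quick numerical check of this weighted-average identity, with hypothetical values of \(\alpha\), \(\beta\), \(n\) and \(\bar{x}\):

```python
# Numerical check of the weighted-average identity, with hypothetical values.
alpha, beta, n, xbar = 2.0, 1.0, 5, 3.0
lam = beta / (beta + n)
posterior_mean = (alpha + n * xbar) / (beta + n)
weighted_average = lam * (alpha / beta) + (1 - lam) * xbar
print(posterior_mean, weighted_average)  # both equal 17/6
```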
We do not need to find the full predictive distribution. As \(X\) and \(Z\) are conditionally independent given \(\theta\) we can use the expectation and variance formulas given in lectures so \[\begin{eqnarray} E(Z \, | \, X) & = & E(E(Z \, | \, \theta) \, | \, X) \nonumber \\ & = & E(\theta \, | \, X) \tag{3} \\ & = & \frac{\alpha + n\bar{x}}{\beta + n} \tag{4} \end{eqnarray}\] where (3) follows as \(Z \, | \, \theta \sim Po(\theta)\) and (4) as \(\theta \, | \, x \sim Gamma(\alpha + n\bar{x}, \beta + n)\). Similarly, \[\begin{eqnarray} Var(Z \, | \, X) & = & Var(E(Z \, | \, \theta) \, | \, X) + E(Var(Z \, | \, \theta) \, | \, X) \nonumber \\ & = & Var(\theta \, | \, X) + E(\theta \, | \, X) \nonumber \\ & = & \frac{\alpha + n\bar{x}}{(\beta + n)^{2}} + \frac{\alpha + n\bar{x}}{\beta + n} \ = \ \frac{(\alpha + n\bar{x})(\beta + n + 1)}{(\beta + n)^{2}}. \tag{5} \end{eqnarray}\]
Taking \(\alpha = 101\) and \(\beta = 5\) we observe \(\sum_{i=1}^{10} x_{i} = 238\) with \(n = 10\). Hence, \(\alpha + n\bar{x} = 101 + 238 = 339\) and \(\beta + n = 5 + 10 = 15\) so that \(\theta \, | \, x \sim Gamma(339, 15)\). From (4) and (5) we have \(E(Z \, | \, X) = \frac{339}{15} = \frac{113}{5}\) and \(Var(Z \, | \, X) = \frac{339\times 16}{15^{2}} = \frac{1808}{75}\) so we use \(Z \, | \, x \sim N(\frac{113}{5}, \frac{1808}{75})\) approximately. Recalling that if \(Y \sim N(0, 1)\) then \(P(-1.96 < Y < 1.96) = 0.95\) the (approximate) 95% prediction interval is \[\begin{eqnarray*} \frac{113}{5} \pm 1.96\sqrt{\frac{1808}{75}} & = & (12.98, 32.22). \end{eqnarray*}\]
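As a check on the approximation, we could simulate the posterior predictive distribution directly by drawing \(\theta \, | \, x \sim Gamma(339, 15)\) and then \(Z \, | \, \theta \sim Po(\theta)\); a minimal Python sketch (assuming `numpy`) is given below.

```python
import numpy as np

rng = np.random.default_rng(1)

# Posterior Gamma(339, 15) from above, rate parametrisation (scale = 1/15).
a_post, b_post = 339, 15

# Simulate the posterior predictive: draw theta | x, then Z | theta ~ Po(theta).
theta = rng.gamma(shape=a_post, scale=1 / b_post, size=1_000_000)
z = rng.poisson(theta)

print(z.mean())                        # close to 339/15 = 22.6
print(z.var())                         # close to 1808/75 = 24.11
print(np.percentile(z, [2.5, 97.5]))   # compare with (12.98, 32.22)
```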
Let \(X_{1}, \ldots, X_{n}\) be a finitely exchangeable sequence of random quantities and consider any \(i \neq j \in \{1, \ldots, n\}\).
If \(X_{1}, \ldots, X_{n}\) are finitely exchangeable then their joint density satisfies \[\begin{eqnarray*} f(x_{1}, \ldots, x_{n}) & = & f(x_{\pi(1)}, \ldots, x_{\pi(n)}) \end{eqnarray*}\] for all permutations \(\pi\) defined on the set \(\{1, \ldots, n\}\). For a given permutation there exists \(j \in \{1, \ldots, n\}\) such that \(x_{\pi(j)} = x_{i}\). Let \(x_{-i}\) denote the set \(\{x_{1}, \ldots, x_{n}\} \setminus x_{i}\). The marginal distribution of \(X_{i}\) is given by \[\begin{eqnarray*} f_{X_{i}}(x_{i}) & = & \int_{x_{-i}} f(x_{1}, \ldots, x_{n}) \, dx_{-i} \\ & = & \int_{x_{-i}} f(x_{\pi(1)}, \ldots, x_{\pi(j-1)}, x_{\pi(j)}, x_{\pi(j+1)}, \ldots, x_{\pi(n)}) \, dx_{-i} \\ & = & \int_{x_{-i}} f(x_{\pi(1)}, \ldots, x_{\pi(j-1)}, x_{i}, x_{\pi(j+1)}, \ldots, x_{\pi(n)}) \, dx_{-i} \\ & = & f_{X_{j}}(x_{i}). \end{eqnarray*}\] As this holds for all permutations then it holds for all \(j \in \{1, \ldots, n\}\). Thus, the marginal distribution of \(X_{i}\) does not depend upon \(i\). As \(E(X_{i})\) and \(Var(X_{i})\) are constructed over the marginal of \(X_{i}\) then it follows immediately that these also do not depend upon \(i\).
For \(i \neq j\), let \(x_{-i, -j}\) denote the set \(\{x_{1}, \ldots, x_{n}\} \setminus \{x_{i}, x_{j}\}\). For a given permutation there exists \(k \neq l \in \{1, \ldots, n\}\) such that \(x_{\pi(k)} = x_{i}\) and \(x_{\pi(l)} = x_{j}\). Suppose without loss of generality that \(k < l\). The joint distribution of \(X_{i}\) and \(X_{j}\) is given by \[\begin{eqnarray*} f_{X_{i}, X_{j}}(x_{i}, x_{j}) & = & \int_{x_{-i, -j}} f(x_{1}, \ldots, x_{n}) \, dx_{-i, -j} \\ & = & \int_{x_{-i, -j}} f(x_{\pi(1)}, \ldots, x_{\pi(k-1)}, x_{\pi(k)}, x_{\pi(k+1)}, \ldots, x_{\pi(l-1)}, x_{\pi(l)}, x_{\pi(l+1)}, \ldots, x_{\pi(n)}) \, dx_{-i, -j} \\ & = & \int_{x_{-i, -j}} f(x_{\pi(1)}, \ldots, x_{\pi(k-1)}, x_{i}, x_{\pi(k+1)}, \ldots, x_{\pi(l-1)}, x_{j}, x_{\pi(l+1)}, \ldots, x_{\pi(n)}) \, dx_{-i, -j} \\ & = & f_{X_{k}, X_{l}}(x_{i}, x_{j}). \end{eqnarray*}\] As this holds for all permutations then it holds for all \(k \neq l \in \{1, \ldots, n\}\). Thus, the joint distribution of \(X_{i}\) and \(X_{j}\) does not depend upon \(i \neq j\). As \(Cov(X_{i}, X_{j})\) is constructed over the joint distribution of \(X_{i}\) and \(X_{j}\) then it follows immediately that \(Cov(X_{i}, X_{j})\) does not depend on either \(i\) or \(j\).
Taking the variance of \(Y = \sum_{i=1}^{n} X_{i}\) we have \[\begin{eqnarray*} Var(Y) & = & Var\left(\sum_{i=1}^{n} X_{i}\right) \\ & = & \sum_{i=1}^{n} Var(X_{i}) + \sum_{i=1}^{n}\sum_{j \neq i}^{n} Cov(X_{i}, X_{j}). \end{eqnarray*}\] Now, from part (a), \(Var(X_{i})\) does not depend upon \(i\) and, from part (b), \(Cov(X_{i}, X_{j})\) does not depend upon either \(i\) or \(j\). Thus, \[\begin{eqnarray*} Var(Y) & = & nVar(X_{i}) + n(n-1)Cov(X_{i}, X_{j}). \end{eqnarray*}\] As \(Var(Y) \geq 0\) it follows that \(nVar(X_{i}) + n(n-1)Cov(X_{i}, X_{j}) \geq 0\). Rearranging this inequality, assuming \(Var(X_{i}) > 0\), gives \[\begin{eqnarray*} \frac{Cov(X_{i}, X_{j})}{Var(X_{i})} & \geq & \frac{-1}{n-1}. \end{eqnarray*}\] As \(Var(X_{i})\) does not depend upon \(i\) then \(Var(X_{i}) = \sqrt{Var(X_{i})Var(X_{j})}\) so that \[\begin{eqnarray*} Corr(X_{i}, X_{j}) & = & \frac{Cov(X_{i}, X_{j})}{\sqrt{Var(X_{i})Var(X_{j})}} \ \geq \ \frac{-1}{n-1}. \end{eqnarray*}\] Notice that an immediate consequence of this result is that if \(X_{1}, X_{2}, \ldots\) are infinitely exchangeable then \(Corr(X_{i}, X_{j}) \geq 0\) as we require \(Corr(X_{i}, X_{j}) \geq \frac{-1}{n-1}\) for all \(n\).
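The lower bound is attainable: centring \(n\) independent draws about their sample mean yields an exchangeable vector whose components sum to zero and whose pairwise correlation is exactly \(\frac{-1}{n-1}\). A brief Python simulation (hypothetical setup) illustrates this.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Centring i.i.d. draws gives an exchangeable vector whose components sum to
# zero, and its pairwise correlation attains the bound -1/(n-1).
z = rng.normal(size=(200_000, n))
x = z - z.mean(axis=1, keepdims=True)

corr = np.corrcoef(x, rowvar=False)
print(corr[0, 1], -1 / (n - 1))  # both approximately -0.25
```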
Suppose \(X \, | \, \mu \sim N(\mu, \sigma^{2})\) and \(Y \, | \, \mu, \delta \sim N(\mu + \delta, \sigma^{2})\), where \(\sigma^{2}\) is known and \(X\) and \(Y\) are conditionally independent given \(\mu\) and \(\delta\).
As \(X\) and \(Y\) are conditionally independent given \(\mu\) and \(\delta\) then \[\begin{eqnarray*} f(x, y \, | \, \mu, \delta) & = & f(x \, | \, \mu, \delta)f(y \, | \, \mu, \delta) \\ & = & f(x \, | \, \mu)f(y \, | \, \mu, \delta) \\ & = & \frac{1}{2\pi \sigma^{2}} \exp \left\{ - \frac{1}{2\sigma^{2}}[(x-\mu)^{2} + (y - \mu - \delta)^{2}]\right\}. \end{eqnarray*}\]
As \(f(\mu, \delta) \propto 1\) then \[\begin{eqnarray} f(\mu, \delta \, | \, x, y) & \propto & f(x, y \, | \, \mu, \delta) \nonumber \\ & \propto & \exp \left\{ - \frac{1}{2\sigma^{2}}[(x-\mu)^{2} + (y - \mu - \delta)^{2}]\right\} \nonumber \\ & \propto & \exp \left\{ - \frac{1}{2\sigma^{2}}[\mu^{2} - 2x\mu + (\mu + \delta)^{2} - 2y(\mu + \delta)]\right\} \nonumber \\ & = & \exp \left\{ - \frac{1}{2\sigma^{2}}[2\mu^{2} -2(x+y)\mu + 2\mu\delta +\delta^{2} -2y\delta]\right\}. \tag{6} \end{eqnarray}\] Thus, as a consequence of the \(\mu\delta\) term in (6), we cannot write \(f(\mu, \delta \, | \, x, y) \propto g(\mu \, | \, x, y)h(\delta \, | \, x, y)\) so that \(\mu \, | \, x, y\) and \(\delta \, | \, x, y\) are not independent.
Note that the exponent of \(f(\mu, \delta \, | \, x, y)\) is quadratic in \(\mu\) and \(\delta\) so that \[\begin{eqnarray*} f(\mu, \delta \, | \, x, y) & \propto & \exp \left\{ - \frac{1}{2\sigma^{2}} \left(\mu-x \ \ \delta - (y-x)\right)\left(\begin{array}{cc} 2 & 1 \\ 1 & 1 \end{array}\right)\left(\begin{array}{c} \mu-x \\ \delta - (y-x)\end{array}\right)\right\} \\ & = & \exp \left\{ - \frac{1}{2\sigma^{2}} \left(\mu-x \ \ \delta - (y-x)\right)\left(\begin{array}{rr} 1 & -1 \\ -1 & 2 \end{array}\right)^{-1}\left(\begin{array}{c} \mu-x \\ \delta - (y-x)\end{array}\right)\right\} \end{eqnarray*}\] which is the kernel of a bivariate Normal distribution. We have \[\begin{eqnarray*} \mu, \delta \, | \, x, y & \sim & N_{2}\left(\left(\begin{array}{c} x \\y-x \end{array}\right), \left(\begin{array}{rr} \sigma^{2} & -\sigma^{2} \\ -\sigma^{2} & 2\sigma^{2} \end{array}\right)\right). \end{eqnarray*}\] Using the properties of multivariate Normal distributions, we can read off the marginal distributions so \(\mu \, | \, x, y \sim N(x, \sigma^{2})\) and \(\delta \, | \, x, y \sim N(y-x, 2\sigma^{2})\). We shall derive these marginal distributions directly in parts (c) and (d), as this gives insight into the techniques needed when the marginals are not so straightforward to obtain.
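As a check on this posterior, the following Python sketch (with hypothetical values of \(x\), \(y\) and \(\sigma^{2}\)) samples from the bivariate Normal above and compares the sample moments with \(N(x, \sigma^{2})\) and \(N(y-x, 2\sigma^{2})\); the sampled correlation of about \(-1/\sqrt{2}\) reflects the dependence between \(\mu\) and \(\delta\) noted above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical observed values and known variance, for illustration only.
x_obs, y_obs, sigma2 = 1.3, 2.1, 0.5

mean = np.array([x_obs, y_obs - x_obs])
cov = sigma2 * np.array([[1.0, -1.0],
                         [-1.0, 2.0]])

draws = rng.multivariate_normal(mean, cov, size=500_000)
mu, delta = draws[:, 0], draws[:, 1]

print(mu.mean(), mu.var())           # approximately x_obs and sigma2
print(delta.mean(), delta.var())     # approximately y_obs - x_obs and 2*sigma2
print(np.corrcoef(mu, delta)[0, 1])  # approximately -1/sqrt(2): not independent
```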
\[\begin{eqnarray*} f(\delta \, | \, x, y) & = & \int_{-\infty}^{\infty} f(\mu, \delta \, | \, x, y) \, d\mu \\ & \propto & \int_{-\infty}^{\infty} \exp \left\{ - \frac{1}{2\sigma^{2}}[2\mu^{2} -2(x+y)\mu + 2\mu\delta +\delta^{2} -2y\delta]\right\} \, d\mu \\ & = & \exp\left\{- \frac{1}{2\sigma^{2}}[\delta^{2} -2y\delta]\right\}\int_{-\infty}^{\infty} \exp \left\{ - \frac{1}{2\sigma^{2}}[2\mu^{2} -2(x+y-\delta)\mu]\right\} \, d\mu \\ & = & \exp\left\{- \frac{1}{2\sigma^{2}}[\delta^{2} -2y\delta]\right\}\times \\ & & \hspace{1cm}\int_{-\infty}^{\infty} \exp \left\{ - \frac{1}{\sigma^{2}}\left[\left(\mu - \frac{(x+y-\delta)}{2}\right)^{2} - \frac{(x+y-\delta)^{2}}{4}\right]\right\} \, d\mu \\ & = & \exp\left\{- \frac{1}{4\sigma^{2}}[2\delta^{2} -4y\delta - (x+y-\delta)^{2}]\right\}\times \\ & & \hspace{3.8cm}\int_{-\infty}^{\infty} \exp \left\{ - \frac{1}{\sigma^{2}}\left(\mu - \frac{(x+y-\delta)}{2}\right)^{2}\right\} \, d\mu \\ & \propto & \exp\left\{- \frac{1}{4\sigma^{2}}[2\delta^{2} -4y\delta - (x+y-\delta)^{2}]\right\} \\ & \propto & \exp\left\{- \frac{1}{4\sigma^{2}}[\delta^{2}-2(2y-(x+y))\delta]\right\} \end{eqnarray*}\] which is a kernel of a \(N(y-x, 2\sigma^{2})\) density. Hence, \(\delta \, | \, x, y \sim N(y-x, 2\sigma^{2})\).
\[\begin{eqnarray*} f(\mu \, | \, x, y) & = & \int_{-\infty}^{\infty} f(\mu, \delta \, | \, x, y) \, d\delta \\ & \propto & \int_{-\infty}^{\infty} \exp \left\{ - \frac{1}{2\sigma^{2}}[2\mu^{2} -2(x+y)\mu + 2\mu\delta +\delta^{2} -2y\delta]\right\} \, d\delta \\ & = & \exp\left\{- \frac{1}{2\sigma^{2}}[2\mu^{2} - 2(x+y)\mu]\right\}\times \\ & & \hspace{3.6cm} \int_{-\infty}^{\infty} \exp \left\{ - \frac{1}{2\sigma^{2}}[\delta^{2} -2(y-\mu)\delta]\right\} \, d\delta \\ & = & \exp\left\{- \frac{1}{2\sigma^{2}}[2\mu^{2} - 2(x+y)\mu]\right\}\times \\ & & \hspace{1.6cm}\int_{-\infty}^{\infty} \exp \left\{ - \frac{1}{2\sigma^{2}}[\delta -(y-\mu)]^{2} + \frac{1}{2\sigma^{2}}(y-\mu)^{2}\right\} \, d\delta \\ & = & \exp\left\{- \frac{1}{2\sigma^{2}}[2\mu^{2} - 2(x+y)\mu - (y-\mu)^{2}]\right\}\times \\ & & \hspace{4.5cm}\int_{-\infty}^{\infty} \exp \left\{ - \frac{1}{2\sigma^{2}}[\delta -(y-\mu)]^{2}\right\} \, d\delta \\ & \propto & \exp\left\{- \frac{1}{2\sigma^{2}}[2\mu^{2} - 2(x+y)\mu - (y-\mu)^{2}]\right\} \\ & \propto & \exp\left\{- \frac{1}{2\sigma^{2}}[\mu^{2} -2(x+y-y)\mu]\right\} \end{eqnarray*}\] which is a kernel of a \(N(x, \sigma^{2})\) density. Hence, \(\mu \, | \, x, y \sim N(x, \sigma^{2})\).
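Both marginals can also be checked without assuming the bivariate Normal form derived above: discretise the joint kernel in (6) on a grid and sum out one variable at a time. A Python sketch with hypothetical values:

```python
import numpy as np
from scipy import stats

# Hypothetical values, used only to check the algebra numerically.
x_obs, y_obs, sigma2 = 1.3, 2.1, 0.5

mu = np.linspace(-4, 6, 801)
delta = np.linspace(-5, 7, 961)
M, D = np.meshgrid(mu, delta, indexing="ij")

# Joint posterior kernel from (6), up to proportionality.
log_kernel = -(2 * M**2 - 2 * (x_obs + y_obs) * M + 2 * M * D
               + D**2 - 2 * y_obs * D) / (2 * sigma2)
kernel = np.exp(log_kernel - log_kernel.max())

# Sum out delta, then compare with the N(x, sigma^2) kernel for mu.
f_mu = kernel.sum(axis=1)
target_mu = stats.norm.pdf(mu, loc=x_obs, scale=np.sqrt(sigma2))
print(np.abs(f_mu / f_mu.max() - target_mu / target_mu.max()).max())  # small

# Sum out mu, then compare with the N(y - x, 2 sigma^2) kernel for delta.
f_delta = kernel.sum(axis=0)
target_delta = stats.norm.pdf(delta, loc=y_obs - x_obs, scale=np.sqrt(2 * sigma2))
print(np.abs(f_delta / f_delta.max() - target_delta / target_delta.max()).max())  # small
```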
Let \(X_{1}, \ldots, X_{n}\) be exchangeable so that the \(X_{i}\) are conditionally independent given a parameter \(\theta\). Suppose that each \(X_{i} \, | \, \theta \sim U(0, \theta)\).
As \(X_{i} \, | \, \theta \sim U(0, \theta)\) then \[\begin{eqnarray*} f(x_{i} \, | \, \theta) & = & \left\{\begin{array}{ll} \frac{1}{\theta} & 0 \leq x_{i} \leq \theta; \\ 0 & \mbox{otherwise}. \end{array}\right. \end{eqnarray*}\] Let \(\mathbb{I}_{(0, \theta)}(x)\) denote the indicator function for the event \(0 \leq x \leq \theta\), so \(\mathbb{I}_{(0, \theta)}(x) = 1\) if \(0 \leq x \leq \theta\) and \(0\) otherwise. Then we can write \(f(x_{i} \, | \, \theta) = \frac{1}{\theta} \mathbb{I}_{(0, \theta)}(x_{i})\) so that \[\begin{eqnarray*} f(x \, | \, \theta) & = & \prod_{i=1}^{n} \frac{1}{\theta} \mathbb{I}_{(0, \theta)}(x_{i}) \\ & = & \frac{1}{\theta^{n}} \prod_{i=1}^{n} \mathbb{I}_{(0, \theta)}(x_{i}). \end{eqnarray*}\] Let \(\mathbb{I}_{\{a \geq b\}}\) denote the indicator function for the event \(a \geq b\) so \(\mathbb{I}_{\{a \geq b\}} = 1\) if \(a \geq b\) and \(0\) otherwise. Then \[\begin{eqnarray*} \prod_{i=1}^{n} \mathbb{I}_{(0, \theta)}(x_{i}) & = & \mathbb{I}_{\{\min_{i}(x_{i}) \geq 0\}}\mathbb{I}_{\{\theta \geq \max_{i}(x_{i})\}} \end{eqnarray*}\] so that \[\begin{eqnarray*} f(x \, | \, \theta) & = & \frac{1}{\theta^{n}} \mathbb{I}_{\{\min_{i}(x_{i}) \geq 0\}}\mathbb{I}_{\{\theta \geq \max_{i}(x_{i})\}} \\ & = & \frac{1}{\theta^{n}}\mathbb{I}_{\{\theta \geq m\}}\mathbb{I}_{\{\min_{i}(x_{i}) \geq 0\}} \end{eqnarray*}\] where \(m = \max_{i}(x_{i})\). Thus, \(f(x \, | \, \theta) = g(\theta, m)h(x)\) where \(g(\theta, m) = \frac{1}{\theta^{n}}\mathbb{I}_{\{\theta \geq m\}}\) and \(h(x) = \mathbb{I}_{\{\min_{i}(x_{i}) \geq 0\}}\) so that \(M = \max_{i}(X_{i})\) is sufficient for \(X = (X_{1}, \ldots, X_{n})\) for learning about \(\theta\).
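This factorisation can also be seen numerically: as a function of \(\theta\), the likelihood is identical for any two samples of the same size sharing the same maximum. A short Python illustration with made-up data:

```python
import numpy as np

def uniform_likelihood(theta, x):
    """Likelihood of theta when X_i | theta ~ U(0, theta), for theta > 0."""
    x = np.asarray(x, dtype=float)
    return np.where(theta >= x.max(), theta ** -len(x), 0.0)

theta = np.linspace(0.1, 10, 500)

# Two hypothetical samples with the same size and the same maximum.
x1 = [0.2, 1.7, 3.4, 0.9]
x2 = [3.4, 3.1, 0.1, 2.8]

# The likelihood curves coincide: the data enter only through n and max(x_i).
print(np.allclose(uniform_likelihood(theta, x1), uniform_likelihood(theta, x2)))
```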
Utilising an indicator function, the prior may be expressed as \[\begin{eqnarray*} f(\theta) & = & \frac{ab^{a}}{\theta^{a+1}}\mathbb{I}_{\{\theta \geq b\}}. \end{eqnarray*}\] The posterior is \[\begin{eqnarray*} f(\theta \, | \, x) & \propto & \frac{1}{\theta^{n}} \mathbb{I}_{\{\theta \geq m\}} \times \frac{ab^{a}}{\theta^{a+1}}\mathbb{I}_{\{\theta \geq b\}} \\ & \propto & \frac{1}{\theta^{a+n+1}} \mathbb{I}_{\{\theta \geq m\}}\mathbb{I}_{\{\theta \geq b\}} \\ & = & \frac{1}{\theta^{a+n+1}} \mathbb{I}_{\{\theta \geq \max(b, m)\}} \end{eqnarray*}\] which is a kernel of a \(Pareto(a+n, \max(b, m))\) density so that \(\theta \, | \, x \sim Pareto(a+n, \max(b, m))\). Thus, the prior and posterior are in the same family and so, relative to the \(U(0, \theta)\) likelihood, the Pareto distribution is a conjugate family.
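In practice the conjugate update is the two-line rule \(a \mapsto a + n\), \(b \mapsto \max(b, m)\); a minimal Python sketch with hypothetical prior values and data:

```python
import numpy as np

def pareto_posterior(a, b, x):
    """Conjugate update for a Pareto(a, b) prior with a U(0, theta) likelihood."""
    x = np.asarray(x, dtype=float)
    return a + len(x), max(b, x.max())

# Hypothetical prior hyperparameters and data, for illustration only.
a_post, b_post = pareto_posterior(a=3, b=2.0, x=[0.4, 1.9, 2.6, 1.1])
print(a_post, b_post)  # Pareto(7, 2.6): the sample maximum exceeds b
```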