Solution Sheet One

Question 1

For each of the following distributions, write down the probability density function and find a corresponding kernel. Your kernel should be as simple as possible.

  a. \(X \, | \, \theta \sim Po(\theta)\).

\[\begin{eqnarray*} f(x \, | \, \theta) \ = \ P(X = x \, | \, \theta) & = & \frac{\theta^{x} e^{-\theta}}{x!}, \ x = 0, 1, \ldots \\ & \propto & \frac{\theta^{x}}{x!}. \end{eqnarray*}\]

  b. \(Y \, | \, \theta, \beta \sim Beta(\beta \theta, \beta)\).

\[\begin{eqnarray*} f(y \, | \, \theta, \beta) & = & \frac{\Gamma(\beta \theta + \beta)}{\Gamma(\beta \theta) \Gamma(\beta)}y^{\beta \theta - 1}(1-y)^{\beta - 1}, \ y \in [0, 1] \\& \propto & y^{\beta \theta - 1}(1-y)^{\beta - 1}. \end{eqnarray*}\]

  c. \(\theta \, | \, \alpha, \beta, x \sim Gamma(\alpha + x + 1, \beta - 3x)\).

\[\begin{eqnarray*} f(\theta \, | \, \alpha, \beta, x) & = & \frac{(\beta - 3x)^{\alpha + x + 1}}{\Gamma(\alpha + x + 1)}\theta^{\alpha + x}e^{-(\beta - 3x)\theta}, \ \theta > 0 \\& \propto & \theta^{\alpha + x}e^{-(\beta - 3x)\theta}. \end{eqnarray*}\]

  d. \(\phi \, | \, \mu, \overline{x}, \tau \sim N(\tau \mu + (1 - \tau)\overline{x}, \overline{x}^{2}\tau^{-2})\).

\[\begin{eqnarray*} f(\phi \, | \, \mu, \overline{x}, \tau) & = & \frac{\tau}{\sqrt{2 \pi}\overline{x}}\exp\left\{-\frac{\tau^{2}}{2 \overline{x}^{2}}\left(\phi - \tau \mu - (1 - \tau)\overline{x}\right)^{2}\right\}, \ -\infty < \phi < \infty \\ & \propto & \exp\left\{-\frac{\tau^{2}}{2 \overline{x}^{2}}\left(\phi - \tau \mu - (1 - \tau)\overline{x}\right)^{2}\right\} \\ & \propto & \exp\left\{-\frac{\tau^{2}}{2 \overline{x}^{2}}\left[\phi^{2} - 2(\tau \mu + (1 - \tau)\overline{x})\phi\right]\right\}. \end{eqnarray*}\]
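As a quick numerical sanity check on all four answers (not part of the written solution), the Python sketch below confirms that each stated density sums or integrates to one. It assumes `numpy` and `scipy` are available; the parameter values are arbitrary illustrative choices satisfying the stated constraints (in particular \(\beta > 3x\) for part (c)).

```python
import math
import numpy as np
from scipy import integrate, stats

# Arbitrary illustrative values satisfying the constraints (note beta > 3x).
theta, beta, alpha, x = 2.5, 4.0, 2.0, 1.0
mu, xbar, tau = 0.5, 1.5, 0.7

# (a) The Poisson pmf sums to 1.
print(sum(theta**k * math.exp(-theta) / math.factorial(k) for k in range(100)))

# (b) The Beta(beta * theta, beta) density integrates to 1.
print(integrate.quad(lambda y: stats.beta.pdf(y, beta * theta, beta), 0, 1)[0])

# (c) Gamma(alpha + x + 1, beta - 3x) uses the rate parametrisation,
#     so scipy needs scale = 1 / rate.
rate = beta - 3 * x
print(integrate.quad(lambda t: stats.gamma.pdf(t, a=alpha + x + 1, scale=1 / rate),
                     0, np.inf)[0])

# (d) Normal with mean tau*mu + (1 - tau)*xbar and standard deviation xbar/tau.
m = tau * mu + (1 - tau) * xbar
print(integrate.quad(lambda p: stats.norm.pdf(p, loc=m, scale=xbar / tau),
                     -np.inf, np.inf)[0])
```

Each printed value should be approximately 1.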

Question 2

In each of the following, state the distribution of the corresponding random variable.

  a. \(f(x \, | \, \theta) \propto \frac{\theta^{x}}{x!}\), \(x = 0, 1, \ldots\), \(\theta > 0\).

From Question 1(a) this is a kernel of a \(Po(\theta)\) random quantity, so \(X \, | \, \theta \sim Po(\theta)\).

  b. \(f(x) \propto e^{-2x}\), \(x > 0\).

A kernel of the \(Exp(\lambda)\) distribution is \(e^{-\lambda x}\) so that \(X \sim Exp(2)\). Notice that the \(Exp(\lambda)\) distribution is also the \(Gamma(1, \lambda)\) distribution so that, equivalently, \(X \sim Gamma(1, 2)\).
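A two-line numerical confirmation of the equivalence (a sketch assuming `scipy`; note that `scipy` parametrises both distributions by a scale equal to \(1/\lambda\)):

```python
import numpy as np
from scipy import stats

# Exp(2) and Gamma(1, 2) have identical densities; scipy uses scale = 1/rate.
x = np.linspace(0.01, 5, 50)
print(np.allclose(stats.expon.pdf(x, scale=1/2),
                  stats.gamma.pdf(x, a=1, scale=1/2)))   # True
```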

  c. \(f(x) \propto 1\), \(0 \leq x \leq 1\).

If \(X \, | \, a, b \sim U(a, b)\) then a kernel is \(1\) for \(a \leq x \leq b\). Here we have \(0 \leq x \leq 1\) so that \(X \sim U(0, 1)\).

  d. \(f(x \, | \, \alpha, \beta) \propto x^{-\alpha}e^{-2\beta/x}\), \(x > 0\), \(\alpha > 1\), \(\beta > 0\).

A kernel of an inverse gamma distribution with parameters \(a\) and \(b\), written \(Inv\mbox{-}gamma(a, b)\), is \(x^{-(a+1)}e^{-b/x}\). Matching exponents, \(a + 1 = \alpha\) and \(b = 2\beta\), so \(X \, | \, \alpha, \beta \sim Inv\mbox{-}gamma(\alpha - 1, 2\beta)\).
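This identification can be checked numerically by normalising the kernel with quadrature and comparing it to `scipy`'s inverse gamma density (the values \(\alpha = 3\), \(\beta = 1.5\) are arbitrary illustrative choices):

```python
import numpy as np
from scipy import integrate, stats

alpha, beta = 3.0, 1.5                   # illustrative values with alpha > 1
a, b = alpha - 1, 2 * beta               # claimed inverse gamma parameters

kernel = lambda x: x**(-alpha) * np.exp(-2 * beta / x)
const, _ = integrate.quad(kernel, 0, np.inf)   # normalising constant

x = np.linspace(0.1, 10, 50)
print(np.allclose(kernel(x) / const, stats.invgamma.pdf(x, a, scale=b)))  # True
```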

  e. \(f(x \, | \, m) \propto (1-x)^{(m-1)/2}\), \(0 \leq x \leq 1\), \(m > -1\).

A kernel of a \(Beta(\alpha, \beta)\) distribution is \(x^{\alpha-1}(1-x)^{\beta-1}\). Matching exponents, \(\alpha - 1 = 0\) and \(\beta - 1 = \frac{m-1}{2}\), so \(X \, | \, m \sim Beta(1, \frac{m+1}{2})\).
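The same style of check works here (an illustrative sketch with the arbitrary choice \(m = 4\)):

```python
import numpy as np
from scipy import integrate, stats

m = 4.0                                   # arbitrary value with m > -1
kernel = lambda x: (1 - x)**((m - 1) / 2)
const, _ = integrate.quad(kernel, 0, 1)   # normalising constant

x = np.linspace(0, 0.99, 50)
print(np.allclose(kernel(x) / const, stats.beta.pdf(x, 1, (m + 1) / 2)))  # True
```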

  f. \(f(x \, | \, \theta, \phi) \propto \exp\left\{-\frac{\theta}{\phi}x^{2} + (\phi + 1)x \right\}\), \(-\infty < x < \infty\), \(\theta > 0\), \(\phi > 0\).

If \(X \, | \, \mu, \sigma^{2} \sim N(\mu, \sigma^{2})\) then a kernel is \(\exp\left\{-\frac{1}{2\sigma^{2}}(x^{2} - 2\mu x)\right\}\). Matching coefficients, \(\frac{1}{2\sigma^{2}} = \frac{\theta}{\phi}\) and \(\frac{\mu}{\sigma^{2}} = \phi + 1\), so \(X \, | \, \theta, \phi \sim N\left(\frac{\phi(\phi+1)}{2\theta}, \frac{\phi}{2\theta}\right)\).
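Since this completing-the-square step is easy to get wrong, here is a numerical check (illustrative values \(\theta = 1.2\), \(\phi = 0.8\)): normalise the kernel by quadrature and compare its mean and variance to the claimed \(\frac{\phi(\phi+1)}{2\theta}\) and \(\frac{\phi}{2\theta}\).

```python
import numpy as np
from scipy import integrate

theta, phi = 1.2, 0.8                    # illustrative positive values
kernel = lambda x: np.exp(-(theta / phi) * x**2 + (phi + 1) * x)

const, _ = integrate.quad(kernel, -np.inf, np.inf)
mean, _ = integrate.quad(lambda x: x * kernel(x) / const, -np.inf, np.inf)
var, _ = integrate.quad(lambda x: (x - mean)**2 * kernel(x) / const,
                        -np.inf, np.inf)

print(mean, phi * (phi + 1) / (2 * theta))   # both approximately 0.6
print(var, phi / (2 * theta))                # both approximately 1/3
```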

Question 3

Find the following sums and integrals by identifying a kernel of a probability density function and using properties of probability density functions.

  a. \(\sum_{x=0}^{\infty} \frac{\theta^{x}}{x!}\), \(\theta > 0\).

From Question 2(a), \(\frac{\theta^{x}}{x!}\) is a kernel of a \(Po(\theta)\). Hence, \[\begin{eqnarray*} \sum_{x=0}^{\infty} \frac{\theta^{x} e^{-\theta}}{x!} \ = \ 1 & \Rightarrow & \sum_{x=0}^{\infty} \frac{\theta^{x}}{x!} \ = \ e^{\theta}. \end{eqnarray*}\]

  b. \(\sum_{x=0}^{\infty} \frac{x\theta^{x}}{x!}\), \(\theta > 0\).

\(\frac{x\theta^{x}}{x!}\) is \(x\) times the kernel of a \(Po(\theta)\). So, thinking of the expectation of a \(Po(\theta)\), we have \[\begin{eqnarray*} E(X \, | \, \theta) \ = \ \theta \ = \ \sum_{x=0}^{\infty} \frac{x\theta^{x} e^{-\theta}}{x!} & \Rightarrow & \sum_{x=0}^{\infty} \frac{x\theta^{x}}{x!} \ = \ \theta e^{\theta}. \end{eqnarray*}\] Alternatively, \[\begin{eqnarray*} \sum_{x=0}^{\infty} \frac{x\theta^{x}}{x!} \ = \ \sum_{x=1}^{\infty} \frac{\theta^{x}}{(x-1)!} & = & \theta \sum_{x=1}^{\infty} \frac{\theta^{x-1}}{(x-1)!} \\ & = & \theta \sum_{y=0}^{\infty} \frac{\theta^{y}}{y!} \ = \ \theta e^{\theta} \end{eqnarray*}\] where \(y = x-1\) and the final equation follows from Question 3(a).
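Both sums are easy to confirm numerically by truncating at a large number of terms (an illustrative sketch with the arbitrary choice \(\theta = 2.5\)):

```python
import math

theta = 2.5                              # any theta > 0
s0 = sum(theta**x / math.factorial(x) for x in range(100))
s1 = sum(x * theta**x / math.factorial(x) for x in range(100))

print(s0, math.exp(theta))               # both approximately e^theta
print(s1, theta * math.exp(theta))       # both approximately theta * e^theta
```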

  c. \(\int_{0}^{1} x^{\alpha - 1}(1-x)^{\beta - 1} \, dx\), \(\alpha > 0\), \(\beta > 0\).

\(x^{\alpha - 1}(1-x)^{\beta - 1}\) is a kernel of a \(Beta(\alpha, \beta)\) distribution so \[\begin{eqnarray*} \int_{0}^{1} \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha - 1}(1-x)^{\beta - 1} \, dx & = & 1 \\ \Rightarrow \ \int_{0}^{1} x^{\alpha - 1}(1-x)^{\beta - 1} \, dx & = & \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}. \end{eqnarray*}\]

  d. \(\int_{0}^{1} \frac{1}{2}(\beta+3)(\beta+2)(\beta+1)x(1-x)^{\beta} \, dx\), \(\beta > -1\).

\(x(1-x)^{\beta}\) is a kernel of a \(Beta(2, \beta+1)\) distribution so, from Question 3(c), \[\begin{eqnarray*} \int_{0}^{1} x(1-x)^{\beta} \, dx & = & \frac{\Gamma(2)\Gamma(\beta+1)}{\Gamma(\beta+3)}. \end{eqnarray*}\] Now, the Gamma function has the property that \(\Gamma(z+1) = z\Gamma(z)\) for \(z > 0\) so that \(\frac{\Gamma(2)\Gamma(\beta+1)}{\Gamma(\beta+3)} = \frac{1}{(\beta+2)(\beta+1)}\). Thus, \[\begin{eqnarray*} \int_{0}^{1} \frac{1}{2}(\beta+3)(\beta+2)(\beta+1)x(1-x)^{\beta} \, dx & = & \frac{(\beta+3)(\beta+2)(\beta+1)}{2(\beta+2)(\beta+1)} \\ & = & \frac{1}{2}(\beta+3). \end{eqnarray*}\]
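A quadrature check of parts (c) and (d) together (the values \(\alpha = 2.7\), \(\beta = 1.3\) are arbitrary illustrative choices):

```python
from scipy import integrate, special

alpha, beta = 2.7, 1.3                   # illustrative values

# 3(c): the integral equals the Beta function B(alpha, beta).
val, _ = integrate.quad(lambda x: x**(alpha - 1) * (1 - x)**(beta - 1), 0, 1)
print(val, special.beta(alpha, beta))    # the two values agree

# 3(d): the full integral collapses to (beta + 3) / 2.
c = 0.5 * (beta + 3) * (beta + 2) * (beta + 1)
val, _ = integrate.quad(lambda x: c * x * (1 - x)**beta, 0, 1)
print(val, (beta + 3) / 2)               # the two values agree
```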

  e. \(\int_{-\infty}^{\infty} 4\mu^{(a-1)}\left(\frac{\sigma^{2}+\tau^{2}}{\sigma\tau}\right)^{\frac{1}{2}}\exp\left\{-\left(\frac{\sigma^{2}+\tau^{2}}{\sigma\tau}\right)(\theta - \mu)^{2}\right\} \, d\theta\), \(\sigma > 0\), \(\tau > 0\).

\(\exp\left\{-\left(\frac{\sigma^{2}+\tau^{2}}{\sigma\tau}\right)(\theta - \mu)^{2}\right\}\) is a kernel of a \(N\left(\mu, \frac{\sigma\tau}{2(\sigma^{2}+\tau^{2})}\right)\) distribution. Thus, \[\begin{eqnarray*} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \left(\frac{2(\sigma^{2}+\tau^{2})}{\sigma\tau}\right)^{\frac{1}{2}}\exp\left\{-\left(\frac{\sigma^{2}+\tau^{2}}{\sigma\tau}\right)(\theta - \mu)^{2}\right\} \, d\theta & = & 1 \\ \Rightarrow \ \int_{-\infty}^{\infty} \left(\frac{(\sigma^{2}+\tau^{2})}{\sigma\tau}\right)^{\frac{1}{2}}\exp\left\{-\left(\frac{\sigma^{2}+\tau^{2}}{\sigma\tau}\right)(\theta - \mu)^{2}\right\} \, d\theta & = & \sqrt{\pi} \\ \Rightarrow \int_{-\infty}^{\infty} 4\mu^{(a-1)}\left(\frac{\sigma^{2}+\tau^{2}}{\sigma\tau}\right)^{\frac{1}{2}}\exp\left\{-\left(\frac{\sigma^{2}+\tau^{2}}{\sigma\tau}\right)(\theta - \mu)^{2}\right\} \, d\theta & = & 4\mu^{(a-1)}\sqrt{\pi}. \end{eqnarray*}\]
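Again this can be confirmed by quadrature (all parameter values below are arbitrary illustrative choices; \(a\) enters only through the constant \(4\mu^{a-1}\)):

```python
import numpy as np
from scipy import integrate

mu, a, sigma, tau = 1.3, 2.0, 1.0, 0.5   # illustrative values
c = (sigma**2 + tau**2) / (sigma * tau)

val, _ = integrate.quad(
    lambda t: 4 * mu**(a - 1) * np.sqrt(c) * np.exp(-c * (t - mu)**2),
    -np.inf, np.inf)
print(val, 4 * mu**(a - 1) * np.sqrt(np.pi))   # the two values agree
```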

Question 4

Suppose that we are interested in \(\theta\), the probability that a coin will yield a ‘head’ when spun in a specified manner. We judge that the prior distribution is \(\theta \sim Beta(4, 4)\). The coin is spun ten times (you do not witness the spins) and, rather than being told how many heads were seen, you are only told that the number is less than three.

  a. Find the posterior distribution up to proportionality, and show that the normalising constant \(k\) is given by \[\begin{eqnarray*} k & = & \frac{\Gamma(18)}{1536 \times \Gamma(4) \Gamma(12)}. \end{eqnarray*}\]

In this case, for \(X \, | \, \theta \sim Bin(n, \theta)\), \[\begin{eqnarray} f(x \, | \, \theta) & = & P(X < 3 \, | \, \theta) \nonumber \\ & = & \sum_{x=0}^{2} \binom{n}{x} \theta^{x}(1 - \theta)^{n-x}. \tag{4.1} \end{eqnarray}\] For \(\theta \sim Beta(\alpha, \beta)\) we have \[\begin{eqnarray} f(\theta) & \propto & \theta^{\alpha - 1}(1-\theta)^{\beta - 1}. \tag{4.2} \end{eqnarray}\] Now, since \(f(\theta \, | \, x) \propto f(x \, | \, \theta)f(\theta)\), it follows from (4.1) and (4.2) that \[\begin{eqnarray} f(\theta \, | \, x) & \propto & \left\{ \sum_{x=0}^{2} \binom{n}{x} \theta^{x}(1 - \theta)^{n-x}\right\}\theta^{\alpha - 1}(1-\theta)^{\beta - 1} \nonumber \\ & = & \sum_{x=0}^{2} \binom{n}{x} \theta^{\alpha + x - 1}(1 - \theta)^{\beta + n - x - 1} \nonumber \end{eqnarray}\] so that \[\begin{eqnarray} f(\theta \, | \, x) & = & k\left\{\sum_{x=0}^{2} \binom{n}{x} \theta^{\alpha + x - 1}(1 - \theta)^{\beta + n - x - 1}\right\}. \tag{4.3} \end{eqnarray}\] Now, \(\int_{0}^{1} f(\theta \, | \, x) \, d\theta = 1\) so, from (4.3), \[\begin{eqnarray} k^{-1} & = & \int_{0}^{1} \left\{\sum_{x=0}^{2} \binom{n}{x} \theta^{\alpha + x - 1}(1 - \theta)^{\beta + n - x - 1}\right\} \, d\theta \nonumber \\ & = & \sum_{x=0}^{2} \binom{n}{x} \int_{0}^{1} \theta^{\alpha + x - 1}(1 - \theta)^{\beta + n - x - 1} \, d\theta \nonumber \\ & = & \sum_{x=0}^{2} \binom{n}{x} B(\alpha + x, \beta + n - x) \tag{4.4} \end{eqnarray}\] where (4.4) follows since, from Question 3(c), \[\begin{eqnarray*} \int_{0}^{1} \theta^{\alpha + x - 1}(1 - \theta)^{\beta + n - x - 1} \, d\theta & = & \frac{\Gamma(\alpha + x)\Gamma(\beta + n - x)}{\Gamma(\alpha + \beta + n)} \\ & = & B(\alpha + x, \beta + n - x). \end{eqnarray*}\] Substituting \(n = 10\), \(\alpha = 4\), \(\beta = 4\) into (4.4) we have that \[\begin{eqnarray*} k^{-1} & = & \sum_{x=0}^{2} \binom{10}{x} B(4 + x, 14 - x) \\ & = & B(4, 14) + 10B(5, 13) + 45B(6, 12) \\ & = & \frac{\Gamma(4)\Gamma(14) + 10\Gamma(5)\Gamma(13) + 45\Gamma(6)\Gamma(12)}{\Gamma(18)} \\ & = & \frac{\Gamma(4)\Gamma(12)}{\Gamma(18)}\left\{(13 \times 12) + 10(4 \times 12) + 45(5 \times 4)\right\} \\ & = & \frac{1536 \times \Gamma(4)\Gamma(12)}{\Gamma(18)}. \end{eqnarray*}\]
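A numerical check of the closed form for \(k^{-1}\) (a sketch assuming `scipy` is available):

```python
from scipy import special

# k^{-1} = sum_{x=0}^{2} C(10, x) B(4 + x, 14 - x)
k_inv = sum(special.comb(10, x, exact=True) * special.beta(4 + x, 14 - x)
            for x in range(3))

# Closed form: 1536 * Gamma(4) * Gamma(12) / Gamma(18)
print(k_inv, 1536 * special.gamma(4) * special.gamma(12) / special.gamma(18))
```

The two printed values agree.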

  b. Show that the posterior mean is \(\frac{39}{128}\).

From (4.3) we have that \[\begin{eqnarray} E(\theta \, | \, X) & = & k\sum_{x=0}^{2} \binom{n}{x} \int_{0}^{1} \theta \times \theta^{\alpha + x - 1}(1 - \theta)^{\beta + n - x - 1} \, d\theta \nonumber \\ & = & k\sum_{x=0}^{2} \binom{n}{x} B(\alpha + x + 1, \beta + n - x). \tag{4.5} \end{eqnarray}\] Substituting \(n = 10\), \(\alpha = 4\), \(\beta = 4\) into (4.5) we have that \[\begin{eqnarray} E(\theta \, | \, X) & = & k\sum_{x=0}^{2} \binom{n}{x} B(5 + x, 14 - x) \nonumber \\ & = & k\{B(5, 14) + 10B(6, 13) + 45B(7, 12)\} \nonumber \\ & = & k\left\{\frac{\Gamma(5)\Gamma(14) + 10\Gamma(6)\Gamma(13) + 45\Gamma(7)\Gamma(12)}{\Gamma(19)}\right\} \nonumber \\ & = & k\frac{\Gamma(5)\Gamma(12)}{\Gamma(19)}\left\{(13 \times 12) + 10(5 \times 12) + 45(6 \times 5)\right\} \nonumber\\ & = & k\frac{2106 \Gamma(5) \Gamma(12)}{\Gamma(19)}. \tag{4.6} \end{eqnarray}\] Substituting for \(k\) in (4.6) we have \[\begin{eqnarray*} E(\theta \, | \, X) & = & \frac{\Gamma(18)}{1536 \times \Gamma(4) \Gamma(12)} \times \frac{2106 \Gamma(5) \Gamma(12)}{\Gamma(19)} \\ & = & \frac{2106 \times 4}{1536 \times 18} \ = \ \frac{39}{128}. \end{eqnarray*}\]
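As a check, the posterior mean can also be computed directly by normalising the kernel in (4.3) numerically:

```python
from scipy import integrate, special

def post_kernel(t):
    # Unnormalised posterior from (4.3) with n = 10, alpha = beta = 4.
    return sum(special.comb(10, x, exact=True) * t**(3 + x) * (1 - t)**(13 - x)
               for x in range(3))

norm, _ = integrate.quad(post_kernel, 0, 1)
mean, _ = integrate.quad(lambda t: t * post_kernel(t), 0, 1)
print(mean / norm, 39 / 128)             # both approximately 0.3047
```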

Question 5

Consider the random variables \(X\) and \(Y\) which, for convenience, you may assume are continuous. Recall that the conditional expectation of \(g(X)\) given \(Y\) is defined as \[\begin{eqnarray*} E(g(X) \, | \, Y) & = & \int_{X} g(x)f(x \, | \, y) \, dx \end{eqnarray*}\] for any function \(g(\cdot)\), where \(f(x \, | \, y)\) is the conditional density of \(X\) given \(Y\). Prove that

  a. \(E(X) = E(E(X \, | \, Y))\).

Note that \(E(X \, | \, Y)\) is a function of \(Y\) so that \[\begin{eqnarray} E(E(X \, | \, Y)) & = & \int_{Y} E(X \, | \, Y)f(y) \, dy \nonumber \\ & = & \int_{Y} \left\{\int_{X} xf(x \, | \, y) \, dx \right\}f(y) \, dy \tag{5.1} \\ & = & \int_{X} x \left\{\int_{Y} f(x \, | \, y)f(y) \, dy \right\} \, dx \nonumber \\ & = & \int_{X} xf(x) \, dx \ = \ E(X). \nonumber \end{eqnarray}\]

  b. \(Var(X) = Var(E(X \, | \, Y)) + E(Var(X \, | \, Y))\).

Notice that Question 5(a) is easily generalised, by substituting \(g(x)\) for \(x\) in (5.1), to \(E(g(X)) = E(E(g(X) \, | \, Y))\). Now, \[\begin{eqnarray} Var(E(X \, | \, Y)) & = & E(E^{2}(X \, | \, Y)) - E^{2}(E(X \, | \, Y)) \nonumber \\ & = & E(E^{2}(X \, | \, Y)) - E^{2}(X), \tag{5.2} \end{eqnarray}\] where (5.2) follows from Question 5(a). Similarly, \[\begin{eqnarray} E(Var(X \, | \, Y)) & = & E(E(X^{2} \, | \, Y)) - E(E^{2}(X \, | \, Y)) \nonumber \\ & = & E(X^{2}) - E(E^{2}(X \, | \, Y)), \tag{5.3} \end{eqnarray}\] where (5.3) follows from the generalisation of Question 5(a) with \(g(X) = X^{2}\). Adding (5.2) and (5.3) gives the result.
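Both identities can be illustrated by Monte Carlo on a toy hierarchical model (this is a sketch, not part of the proof; the choice \(Y \sim Gamma(3, 1)\), \(X \, | \, Y \sim N(Y, Y^{2})\) is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Toy hierarchy: Y ~ Gamma(3, 1), then X | Y ~ N(Y, Y^2).
y = rng.gamma(3.0, 1.0, size=n)
x = rng.normal(loc=y, scale=y)

# (a) E(X) = E(E(X | Y)) = E(Y)
print(x.mean(), y.mean())

# (b) Var(X) = Var(E(X | Y)) + E(Var(X | Y)) = Var(Y) + E(Y^2)
print(x.var(), y.var() + (y**2).mean())
```

Up to Monte Carlo error, each pair of printed values agrees.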

Question 6

Consider the random variables \(\theta\) and \(X\). Suppose that \(t(X)\) is an estimator of \(\theta\) and that we measure the quality of an estimator by its mean squared error, the expected squared distance between the estimator and \(\theta\). Show that, given \(X\), the choice \(t = E(\theta \, | \, X)\) minimises this mean squared error.

We seek to minimise \(E((t - \theta)^{2} \, | \, X)\) where, given \(X = x\), \(t = t(X)\) is a constant. Now \[\begin{eqnarray*} E((t - \theta)^{2} \, | \, X) & = & E(t^{2}-2t\theta + \theta^{2} \, | \, X) \\ & = & t^{2} - 2tE(\theta \, | \, X) + E(\theta^{2} \, | \, X) \\ & = & t^{2} - 2tE(\theta \, | \, X) + E^{2}(\theta \, | \, X) + Var(\theta \, | \, X) \\ & = & \{t-E(\theta \, | \, X)\}^{2} + Var(\theta \, | \, X). \end{eqnarray*}\] Now \(Var(\theta \, | \, X)\) does not depend upon \(t\) so \(E((t - \theta)^{2} \, | \, X)\) is minimised if we take \(t = E(\theta \, | \, X)\).
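A small simulation illustrates the result: approximating the posterior by draws (here, arbitrarily, a \(Beta(4, 4)\)), the grid value of \(t\) minimising the Monte Carlo estimate of \(E((t - \theta)^{2} \, | \, X)\) coincides with the posterior mean.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = rng.beta(4, 4, size=1_000_000)   # stand-in posterior draws for theta

ts = np.linspace(0, 1, 201)              # candidate values of t
mse = [np.mean((t - theta)**2) for t in ts]
print(ts[np.argmin(mse)], theta.mean())  # both approximately 0.5
```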