Let \(X_{1}, \ldots, X_{n}\) be conditionally independent given \(\lambda\) so that \(f(x \, | \, \lambda) = \prod_{i=1}^{n} f(x_{i} \, | \, \lambda)\) where \(x = (x_{1}, \ldots, x_{n})\). Suppose that \(\lambda \sim Gamma(\alpha, \beta)\) and \(X_{i} \, | \, \lambda \sim Exp(\lambda)\) where \(\lambda\) represents the rate so that \(E(X_{i} \, | \, \lambda) = \lambda^{-1}\).
Show that \(\lambda \, | \, x
\sim Gamma(\alpha +n, \beta + n \bar{x})\).
As
\(X_{i} \, | \, \lambda \sim
Exp(\lambda)\) then, with \(x = (x_{1},
\ldots, x_{n})\), \[\begin{eqnarray*}
f(x \, | \, \lambda) & = & \lambda^{n} e^{-n\bar{x}\lambda}
\end{eqnarray*}\] which, when viewed as a function of \(\lambda\), is a kernel of a Gamma
distribution. We expect that if the prior is a general member of the
Gamma family we will have conjugacy. We now confirm this using \(\lambda \sim Gamma(\alpha, \beta)\). The
posterior is \[\begin{eqnarray*}
f(\lambda \, | \, x) & \propto & \lambda^{n}
e^{-n\bar{x}\lambda} \times \lambda^{\alpha - 1}e^{-\beta \lambda} \\
& = & \lambda^{\alpha + n - 1}e^{-(\beta + n\bar{x})\lambda}
\end{eqnarray*}\] which is a kernel of a \(Gamma(\alpha +n, \beta + n \bar{x})\)
density, that is \(\lambda \, | \, x \sim
Gamma(\alpha +n, \beta + n \bar{x})\).
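As a supplementary numerical check (not part of the solution itself), the conjugate update rule can be coded directly; the function name `gamma_exp_update` is our own choice, and the data are those given later in this question.

```python
# Conjugate update for an Exp(lambda) likelihood with a Gamma(alpha, beta) prior:
# the posterior is Gamma(alpha + n, beta + n * xbar), i.e. add n to the shape
# and the sum of the observations to the rate.
def gamma_exp_update(alpha, beta, x):
    return alpha + len(x), beta + sum(x)

# Example with the failure-time data used later in the question.
a_post, b_post = gamma_exp_update(11, 88, [8.2, 9.2, 11.2, 9.8, 10.1])
print(a_post, b_post)  # 16 and (up to floating point) 136.5
```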
Show that the posterior mean for the failure rate \(\lambda\) can be written as a weighted
average of the prior mean of \(\lambda\) and the maximum likelihood
estimate, \(\bar{x}^{-1}\), of \(\lambda\).
\[\begin{eqnarray*}
E(\lambda \, | \, X) & = & \frac{\alpha + n}{\beta + n \bar{x}}
\\
& = & \frac{\beta\left(\frac{\alpha}{\beta}\right) + n \bar{x}
\left(\frac{1}{\bar{x}}\right)}{\beta + n \bar{x}} \ = \ c
\frac{\alpha}{\beta} + (1 - c)\frac{1}{\bar{x}}
\end{eqnarray*}\] where \(c =
\frac{\beta}{\beta + n \bar{x}}\). Hence \(E(\lambda \, | \, X)\) is a weighted
average of the prior mean, \(E(\lambda) =
\frac{\alpha}{\beta}\), and the maximum likelihood estimate,
\(\bar{x}^{-1}\), of \(\lambda\).
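The weighted-average decomposition can be verified numerically; the values below are the first group's prior and the sample statistics that appear later in this question.

```python
# Check that (alpha + n) / (beta + n * xbar) equals
# c * (alpha/beta) + (1 - c) * (1/xbar) with c = beta / (beta + n * xbar).
alpha, beta, n, xbar = 11.0, 88.0, 5, 9.7
post_mean = (alpha + n) / (beta + n * xbar)
c = beta / (beta + n * xbar)
weighted = c * (alpha / beta) + (1 - c) * (1.0 / xbar)
print(post_mean, weighted)  # the two expressions agree
```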
A water company is interested in the failure rate of water pipes. They ask two groups of engineers about their prior beliefs about the failure rate. The first group believe the mean failure rate is \(\frac{1}{8}\) with coefficient of variation \(\frac{1}{\sqrt{11}}\), whilst the second group believe the mean is \(\frac{1}{11}\) with coefficient of variation \(\frac{1}{2}\). [Note: The coefficient of variation is the standard deviation divided by the mean.] Let \(X_{i}\) be the time until water pipe \(i\) fails and assume that the \(X_{i}\) follow the exponential likelihood model described above. A sample of five pipes is taken and the following times to failure are observed: \(8.2, \ 9.2, \ 11.2, \ 9.8, \ 10.1.\)
Find the members of the Gamma family that the
prior statements of the two groups of engineers represent. In each case
find the posterior mean and variance. Approximating the posterior by
\(N(E(\lambda \, | \, x), Var(\lambda \, | \,
x))\), where \(x = (x_{1}, \ldots,
x_{5})\), estimate, in each case, the probability that the
failure rate is less than 0.1.
For a \(Gamma(\alpha, \beta)\) distribution, the
mean is \(\frac{\alpha}{\beta}\) whilst
the coefficient of variation is \(\sqrt{\frac{\alpha}{\beta^{2}}} \times
\frac{\beta}{\alpha} = \frac{1}{\sqrt{\alpha}}\).
Suppose that the first group of engineers choose the prior \(\lambda \sim Gamma(\alpha_{1}, \beta_{1})\)
then \(\frac{\alpha_{1}}{\beta_{1}} =
\frac{1}{8}\) and \(\frac{1}{\sqrt{\alpha_{1}}} =
\frac{1}{\sqrt{11}}\). Hence, \(\alpha_{1} = 11\) and \(\beta_{1} = 88\).
If the second
group of engineers choose the prior \(\lambda
\sim Gamma(\alpha_{2}, \beta_{2})\) then \(\frac{\alpha_{2}}{\beta_{2}} =
\frac{1}{11}\) and \(\frac{1}{\sqrt{\alpha_{2}}} =
\frac{1}{2}\). Hence, \(\alpha_{2} =
4\) and \(\beta_{2} = 44\).
We observe \(n = 5\) and \(\bar{x} = 9.7\) so that the first group of
engineers have a posterior given by \(Gamma(11
+ 5, 88 + 5(9.7)) = Gamma(16, 136.5)\) whilst the second group of
engineers have a posterior given by \(Gamma(4
+ 5, 44 + 5(9.7)) = Gamma(9, 92.5)\). We summarise the results in
the following table.
\[\begin{eqnarray*}
\begin{array}{ccccc}
\mbox{Prior} & \mbox{Posterior} & E(\lambda \, | \, X) &
Var(\lambda \, | \, X) & P(\lambda < 0.1 \, | \, X) \\ \hline
Gamma(11, 88) & Gamma(16, 136.5) & 0.11722 & 0.00086 &
0.2776 \\
Gamma(4, 44) & Gamma(9, 92.5) & 0.09730 & 0.00105 &
0.5319\end{array}
\end{eqnarray*}\]
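The table can be reproduced with a few lines of Python (a supplementary check, not part of the solution); only the standard library's `math.erf` is needed for the normal approximation. The final digit of each probability differs slightly from the table because the hand calculation rounds the \(z\)-value.

```python
import math

def normal_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

results = []
for a, b in [(16, 136.5), (9, 92.5)]:  # posterior Gamma(a, b) for each group
    mean = a / b                        # posterior mean a/b
    var = a / b**2                      # posterior variance a/b^2
    prob = normal_cdf((0.1 - mean) / math.sqrt(var))  # P(lambda < 0.1), normal approx.
    results.append((mean, var, prob))
    print(round(mean, 5), round(var, 5), round(prob, 4))
```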
How do you expect any differences between the engineers
to be reconciled as more data becomes available?
The
maximum likelihood estimate of \(\lambda\) is \(\frac{1}{9.7} = 0.10309\) which is closest
to the second group of engineers who had a higher initial variance,
corresponding to more uncertainty, for \(\lambda\) than the first group. Thus, the
posterior for the second group is closer to the likelihood than for the
first. We can also observe this by calculating the corresponding weights
of prior mean and maximum likelihood estimate in the posterior means as
per part (b). For the first group, we have \(c
= \frac{\beta_{1}}{\beta_{1} + n\bar{x}} = \frac{88}{136.5} =
0.64469\) so that the posterior mean attaches a heavier weight to
the prior mean whereas for the second group of engineers the weight is
\(c = \frac{\beta_{2}}{\beta_{2} + n\bar{x}} =
\frac{44}{92.5} = 0.47568\) so that a heavier weight is attached
to the maximum likelihood estimate.
As more and more data are
collected, the influence of the prior in the posterior will decrease and
the data will take over. This can be seen clearly in the posterior mean
calculation in part (b). Assuming \(\bar{x}\) stabilises then, as \(n \rightarrow \infty\), \(c \rightarrow 0\) and the posterior mean
will tend towards the maximum likelihood estimate.
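This limiting behaviour is easy to see numerically; here we hold \(\bar{x} = 9.7\) fixed and use the first group's prior, letting \(n\) grow (the specific values of \(n\) are illustrative).

```python
# The weight c = beta / (beta + n * xbar) on the prior mean shrinks as n grows,
# so the posterior mean is pulled towards the maximum likelihood estimate 1/xbar.
beta, xbar = 88.0, 9.7
weights = [beta / (beta + n * xbar) for n in (5, 50, 500)]
print(weights)  # decreasing towards zero
```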
Let \(x\) be the number of successes in \(n\) independent Bernoulli trials, each one having unknown probability \(\theta\) of success. It is judged that \(\theta\) may be modelled by a \(Unif(0, 1)\) distribution so \[\begin{eqnarray*} f(\theta) & = & 1, \ \ \ \ \ 0 < \theta < 1. \end{eqnarray*}\] An extra trial, \(z\), is performed, independent of the first \(n\) given \(\theta\), but with probability \(\frac{\theta}{2}\) of success. The full data is thus \((x, z)\) where \(z = 1\) if the extra trial is a success and \(0\) otherwise.
Show that \[\begin{eqnarray*}
f(\theta \, | \, x, z=0) & = & c\{\theta^{\alpha -
1}(1-\theta)^{\beta - 1} + \theta^{\alpha -1}(1 - \theta)^{\beta}\}
\end{eqnarray*}\] where \(\alpha = x+1\), \(\beta = n-x+1\) and \(c = \frac{1}{B(\alpha, \beta) + B(\alpha,
\beta+1)}\).
As \(X
\, | \, \theta \sim Bin(n, \theta)\) then \[\begin{eqnarray}
f(x \, | \, \theta) & = & \binom{n}{x} \theta^{x}(1-
\theta)^{n-x} \tag{1}
\end{eqnarray}\] whilst as \(Z \, | \,
\theta \sim Bernoulli(\frac{\theta}{2})\) then \[\begin{eqnarray}
f(z=0 \, | \, \theta) \ = \ P(Z = 0 \, | \, \theta) & = & 1 -
\frac{\theta}{2} \nonumber \\
& = & \frac{1}{2}(2 - \theta) \ = \
\frac{1}{2}\{1+(1-\theta)\}. \tag{2}
\end{eqnarray}\] As \(X\) and
\(Z\) are conditionally independent
given \(\theta\), from (1) and (2), we
have \[\begin{eqnarray}
f(x, z=0 \, | \, \theta) & = & f(x \, | \, \theta)f(z=0 \, | \,
\theta) \nonumber \\
& = & \binom{n}{x} \theta^{x}(1- \theta)^{n-x} \times
\frac{1}{2}\{1+(1-\theta)\}. \tag{3}
\end{eqnarray}\] As \(f(\theta) =
1\) then, using (3), the posterior is \[\begin{eqnarray*}
f(\theta \, | \, x, z=0) & \propto & f(x, z=0 \, | \,
\theta)f(\theta) \nonumber \\
& \propto & \theta^{x}(1- \theta)^{n-x} + \theta^{x}(1-
\theta)^{n-x+1}.
\end{eqnarray*}\] Letting \(\alpha =
x+1\), \(\beta = n-x+1\) we have
\[\begin{eqnarray}
f(\theta \, | \, x, z=0) & = & c \{\theta^{\alpha-1}(1-
\theta)^{\beta-1} + \theta^{\alpha-1}(1- \theta)^{\beta}\} \tag{4}
\end{eqnarray}\] where \[\begin{eqnarray}
c^{-1} & = & \int_{0}^{1} \{\theta^{\alpha-1}(1-
\theta)^{\beta-1} + \theta^{\alpha-1}(1- \theta)^{\beta}\} \, d\theta
\nonumber \\
& = & \int_{0}^{1} \theta^{\alpha-1}(1- \theta)^{\beta-1} \,
d\theta + \int_{0}^{1} \theta^{\alpha-1}(1- \theta)^{\beta} \, d\theta
\nonumber \\
& = & B(\alpha, \beta) + B(\alpha, \beta+1). \tag{5}
\end{eqnarray}\]
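That \(c\) in (5) really normalises the posterior can be confirmed by direct numerical integration; the values \(n = 10\), \(x = 3\) and the midpoint rule are our illustrative choices.

```python
import math

def beta_fn(a, b):
    # Beta function B(a, b) via gamma functions.
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

n, x = 10, 3                         # illustrative values (our choice)
alpha, beta = x + 1, n - x + 1
c = 1.0 / (beta_fn(alpha, beta) + beta_fn(alpha, beta + 1))

def posterior(t):
    # The density in (4).
    return c * (t**(alpha - 1) * (1 - t)**(beta - 1)
                + t**(alpha - 1) * (1 - t)**beta)

N = 200_000                          # midpoint rule over (0, 1)
total = sum(posterior((i + 0.5) / N) for i in range(N)) / N
print(total)  # close to 1
```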
Hence show that \[\begin{eqnarray*}
E(\theta \, | \, X, Z = 0) & = &
\frac{(x+1)(2n-x+4)}{(n+3)(2n-x+3)}.
\end{eqnarray*}\] [Hint: Show that \(c = \frac{\alpha + \beta}{B(\alpha, \beta)(\alpha
+ 2\beta)}\) and work with \(\alpha\) and \(\beta\).]
From (4) we
have \[\begin{eqnarray}
E(\theta \, | \, X, Z = 0) & = & \int_{0}^{1} \theta \times c
\{\theta^{\alpha-1}(1- \theta)^{\beta-1} + \theta^{\alpha-1}(1-
\theta)^{\beta}\} \, d\theta \nonumber \\
& = & c\left\{\int_{0}^{1} \theta^{\alpha}(1- \theta)^{\beta-1}
\, d\theta + \int_{0}^{1} \theta^{\alpha}(1- \theta)^{\beta} \, d\theta
\right\} \nonumber \\
& = & c\{B(\alpha + 1,\beta) + B(\alpha+1, \beta+1)\}. \tag{6}
\end{eqnarray}\] Note that \(B(\alpha,
\beta+1) = \frac{\Gamma(\alpha)\Gamma(\beta +1)}{\Gamma(\alpha + \beta +
1)} = \frac{\beta}{\alpha + \beta} B(\alpha, \beta)\) so that,
from (5), \[\begin{eqnarray}
c \ = \ \frac{1}{B(\alpha, \beta) + B(\alpha, \beta+1)} & = &
\frac{1}{B(\alpha, \beta) + \frac{\beta}{\alpha + \beta} B(\alpha,
\beta)} \nonumber \\
& = & \frac{\alpha + \beta}{B(\alpha, \beta)(\alpha + 2\beta)}.
\tag{7}
\end{eqnarray}\] Now, \(B(\alpha +
1,\beta) = \frac{\alpha}{\alpha + \beta} B(\alpha, \beta)\) and
\(B(\alpha+1, \beta+1) = \frac{\alpha
\beta}{(\alpha + \beta + 1)(\alpha + \beta)}B(\alpha, \beta)\) so
that \[\begin{eqnarray}
B(\alpha + 1,\beta) + B(\alpha+1, \beta+1) & = &
\left\{\frac{\alpha}{\alpha + \beta} + \frac{\alpha \beta}{(\alpha +
\beta + 1)(\alpha + \beta)}\right\}B(\alpha, \beta) \nonumber \\
& = & \frac{\alpha(\alpha + 2\beta + 1)}{(\alpha + \beta +
1)(\alpha + \beta)}B(\alpha, \beta). \tag{8}
\end{eqnarray}\] Substituting (8) and (7) into (6) gives \[\begin{eqnarray}
E(\theta \, | \, X, Z = 0) \ = \ \frac{\alpha(\alpha + 2\beta +
1)}{(\alpha + \beta + 1)(\alpha + 2\beta)} \ = \
\frac{(x+1)(2n-x+4)}{(n+3)(2n-x+3)} \nonumber
\end{eqnarray}\] as \(\alpha =
x+1\) and \(\beta = n - x +
1\).
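The closed form for the posterior mean can be checked against a direct numerical integration of \(\theta\) times the posterior (4); again \(n = 10\), \(x = 3\) are our illustrative values.

```python
import math

def beta_fn(a, b):
    # Beta function B(a, b) via gamma functions.
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

n, x = 10, 3                         # illustrative values (our choice)
alpha, beta = x + 1, n - x + 1
c = 1.0 / (beta_fn(alpha, beta) + beta_fn(alpha, beta + 1))
closed_form = (x + 1) * (2*n - x + 4) / ((n + 3) * (2*n - x + 3))

def posterior(t):
    # The density in (4).
    return c * (t**(alpha - 1) * (1 - t)**(beta - 1)
                + t**(alpha - 1) * (1 - t)**beta)

N = 200_000                          # midpoint rule over (0, 1)
numeric = sum(((i + 0.5) / N) * posterior((i + 0.5) / N) for i in range(N)) / N
print(closed_form, numeric)  # the two agree to integration accuracy
```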
Show that, for all \(x\), \(E(\theta
\, | \, X, Z = 0)\) is less than \(E(\theta \, | \, X, Z = 1)\).
In this case \(f(z = 1 \, | \, \theta) =
\frac{\theta}{2}\) so that \[\begin{eqnarray}
f(x, z=1 \, | \, \theta) & = & \binom{n}{x} \theta^{x}(1-
\theta)^{n-x} \times \frac{1}{2}\theta. \tag{9}
\end{eqnarray}\] Now, as \(f(\theta)
\propto 1\), \(f(\theta \, | \, x, z=1)
\propto f(x, z=1 \, | \, \theta)\) which, from viewing (9) as a
function of \(\theta\), we observe as a
kernel of a \(Beta(x+2, n-x+1)\)
distribution, so that \(\theta \, | \, x, z=1
\sim Beta(x+2, n-x+1)\). Hence, \(E(\theta \, | \, X, Z = 1) =
\frac{x+2}{n+3}\). Now, from part (b), \[\begin{eqnarray*}
E(\theta \, | \, X, Z = 0) \ = \ \frac{(x+1)(2n-x+4)}{(n+3)(2n-x+3)} \ =
\ \frac{x+1}{n+3}\left(1 + \frac{1}{2n-x+3}\right).
\end{eqnarray*}\] Hence \[\begin{eqnarray*}
E(\theta \, | \, X, Z = 0) \ < \ E(\theta \, | \, X, Z = 1) &
\Leftrightarrow & \frac{x+1}{2n-x+3} \ < \ 1 \\
& \Leftrightarrow & x \ < \ n+1
\end{eqnarray*}\] which is true as \(x
\in \{0, 1, \ldots, n\}\).
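The inequality can also be confirmed exhaustively for a given \(n\) (the value \(n = 25\) below is an arbitrary illustration).

```python
# Verify E(theta | X, Z=0) < E(theta | X, Z=1) for every x in {0, ..., n}.
n = 25  # arbitrary illustrative sample size
ok = all(
    (x + 1) * (2*n - x + 4) / ((n + 3) * (2*n - x + 3)) < (x + 2) / (n + 3)
    for x in range(n + 1)
)
print(ok)
```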
Let \(X_{1}, \ldots, X_{n}\) be conditionally independent given \(\theta\), so \(f(x \, | \, \theta) = \prod_{i=1}^{n} f(x_{i} \, | \, \theta)\) where \(x = (x_{1}, \ldots, x_{n})\), with each \(X_{i} \, | \, \theta \sim N(\mu, \theta)\) where \(\mu\) is known.
Let \(s(x) = \sum_{i=1}^{n} (x_{i} - \mu)^{2}\). Show that we can write \[\begin{eqnarray*} f(x \, | \, \theta) & = & g(s, \theta)h(x) \end{eqnarray*}\] where \(g(s, \theta)\) depends upon \(s(x)\) and \(\theta\) and \(h(x)\) does not depend upon \(\theta\) but may depend upon \(x\). The equation shows that \(s(X) = \sum_{i=1}^{n} (X_{i} - \mu)^{2}\) is sufficient for \(X_{1}, \ldots, X_{n}\) for learning about \(\theta\). \[\begin{eqnarray*} f(x \, | \, \theta) & = & \prod_{i=1}^{n} \frac{1}{\sqrt{2 \pi \theta}} \exp \left\{ - \frac{1}{2\theta}(x_{i} - \mu)^{2} \right\} \\ & = & (2 \pi \theta)^{-\frac{n}{2}} \exp \left\{ - \frac{1}{2\theta} \sum_{i=1}^{n} (x_{i} - \mu)^{2} \right\} \\ & = & (2 \pi \theta)^{-\frac{n}{2}} \exp \left\{ - \frac{1}{2\theta} s(x) \right\} \\ & = & g(s, \theta)h(x) \end{eqnarray*}\] where \(g(s, \theta) = \theta^{-\frac{n}{2}}\exp \left\{ - \frac{1}{2\theta} s(x) \right\}\) and \(h(x) = (2 \pi)^{-\frac{n}{2}}\).
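The factorisation can be checked numerically by comparing the product of the individual normal densities with \(g(s, \theta)h(x)\); the values of \(\mu\), \(\theta\) and the data below are our illustrative choices.

```python
import math

mu, theta = 2.0, 1.5                 # illustrative known mean and variance (our choice)
x = [1.2, 2.7, 1.9, 2.4]             # illustrative data
n = len(x)
s = sum((xi - mu)**2 for xi in x)    # the sufficient statistic s(x)

# Full likelihood: product of N(mu, theta) densities.
full = math.prod(
    math.exp(-(xi - mu)**2 / (2 * theta)) / math.sqrt(2 * math.pi * theta)
    for xi in x
)
# Factorised form g(s, theta) * h(x).
g = theta**(-n / 2) * math.exp(-s / (2 * theta))
h = (2 * math.pi)**(-n / 2)
print(full, g * h)  # the two agree
```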
An inverse-gamma distribution with known parameters \(\alpha, \beta > 0\) is judged to be the prior distribution for \(\theta\). So, \[\begin{eqnarray*} f(\theta) & = & \frac{\beta^{\alpha}}{\Gamma(\alpha)}\theta^{-(\alpha+1)}e^{-\beta/\theta}, \ \ \ \ \ \theta > 0. \end{eqnarray*}\]
Find the posterior distribution of \(\theta\) given \(x = (x_{1}, \ldots, x_{n})\).
\[\begin{eqnarray*}
f(\theta \, | \, x) & \propto & f(x \, | \, \theta) f(\theta) \\
& = & (2 \pi \theta)^{-\frac{n}{2}} \exp \left\{ -
\frac{1}{2\theta} s(x) \right\} \times
\frac{\beta^{\alpha}}{\Gamma(\alpha)}\theta^{-(\alpha+1)}e^{-\beta/\theta}
\\
& \propto & \theta^{-\frac{n}{2}}\exp \left\{ -
\frac{1}{2\theta} s(x) \right\} \times
\theta^{-(\alpha+1)}e^{-\beta/\theta} \\
& = & \theta^{-\left(\alpha + \frac{n}{2} + 1 \right)}
\exp\left\{-\frac{\beta + \frac{s(x)}{2}}{\theta}\right\}
\end{eqnarray*}\] which is a kernel of an \(Inv-Gamma(\alpha + \frac{n}{2}, \beta +
\frac{s(x)}{2})\). Hence, we have the posterior \(\theta \, | \, x \sim Inv-Gamma(\alpha +
\frac{n}{2}, \beta + \frac{s(x)}{2})\).
Show that the posterior mean for \(\theta\) can be written as a weighted
average of the prior mean of \(\theta\)
and the maximum likelihood estimate, \(s(x)/n\), of \(\theta\).
As \(\theta \, | \, x \sim Inv-Gamma(\alpha +
\frac{n}{2}, \beta + \frac{s(x)}{2})\) we have that
\[\begin{eqnarray*}
E(\theta \, | \, X) & = & \frac{\beta + \frac{s(x)}{2}}{\alpha +
\frac{n}{2} - 1} \\
& = & \frac{(\alpha-1) \frac{\beta}{\alpha-1} +
\frac{n}{2}\frac{s(x)}{n}}{\alpha + \frac{n}{2} - 1} \ = \
c\left(\frac{\beta}{\alpha-1}\right)+(1-c)\left(\frac{s(x)}{n}\right)
\end{eqnarray*}\] where \(c =
\frac{\alpha-1}{\alpha + \frac{n}{2}-1}\).
Note that we
need \(\alpha + \frac{n}{2} > 1\)
for \(E(\theta \, | \, X)\) to be
finite. For any \(\alpha > 0\), taking
\(n \geq 2\) ensures this. For the
decomposition as a weighted average we implicitly assume \(\alpha > 1\) so that \(E(\theta)\) is finite.
Hence \(E(\theta \, | \, X)\) is a weighted average
of the prior mean, \(E(\theta) =
\frac{\beta}{\alpha-1}\), and the maximum likelihood estimate,
\(\frac{s(x)}{n}\), of \(\theta\).
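The decomposition can again be checked numerically; the values \(\alpha = 3\), \(\beta = 10\), \(n = 8\), \(s(x) = 20\) are our illustrative choices (with \(\alpha > 1\) so the prior mean is finite).

```python
# Posterior mean of an Inv-Gamma(alpha + n/2, beta + s/2) versus its
# weighted-average form c * (beta/(alpha-1)) + (1-c) * (s/n).
alpha, beta, n, s = 3.0, 10.0, 8, 20.0   # illustrative values (our choice)
post_mean = (beta + s / 2) / (alpha + n / 2 - 1)
c = (alpha - 1) / (alpha + n / 2 - 1)
weighted = c * (beta / (alpha - 1)) + (1 - c) * (s / n)
print(post_mean, weighted)  # the two agree
```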
Suppose that \(X_{1}, \ldots, X_{n}\) are identically distributed discrete random variables taking \(k\) possible values with probabilities \(\theta_{1}, \ldots, \theta_{k}\). Inference is required about \(\theta = (\theta_{1}, \ldots, \theta_{k})\) where \(\sum_{j=1}^{k} \theta_{j} = 1\).
Assuming that the \(X_{i}\)s are independent given \(\theta\), explain why \[\begin{eqnarray}
f(x \, | \, \theta) & \propto & \prod_{j=1}^{k}
\theta_{j}^{n_{j}} \tag{10}
\end{eqnarray}\] where \(x =
(x_{1}, \ldots, x_{n})\) and \(n_{j}\) is the number of \(x_{i}\)s observed to take the \(j\)th possible value.
This is just a generalisation of Bernoulli trials. Let \(I_{\{x_{i}, j\}}\) denote the indicator
function which is equal to one if \(x_{i}\) is in the \(j\)th class and zero otherwise. Then \[\begin{eqnarray*}
P(X_{i} = x_{i} \, | \, \theta) & = & \prod_{j=1}^{k}
\theta_{j}^{I_{\{x_{i}, j\}}}
\end{eqnarray*}\] and, by the conditional independence, \[\begin{eqnarray*}
f(x \, | \, \theta) \ = \ \prod_{i=1}^{n} P(X_{i} = x_{i} \, | \,
\theta) \ = \ \prod_{i=1}^{n} \prod_{j=1}^{k} \theta_{j}^{I_{\{x_{i},
j\}}}
\end{eqnarray*}\] which gives (10) as we thus get a contribution
of \(\theta_{j}\) for each \(x_{i}\) which lands in the \(j\)th class and there are a total of \(n_{j}\) of these, \(j = 1, \ldots, k\). Notice that if we are
only told the \(n_{j}\)s rather than
which specific \(x_{i}\) contributed to
each \(n_{j}\) then the likelihood is
\[\begin{eqnarray*}
f(x \, | \, \theta) & = & \frac{n!}{\prod_{j=1}^{k} n_{j}!}
\prod_{j=1}^{k} \theta_{j}^{n_{j}}
\end{eqnarray*}\] which is the Multinomial distribution.
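The counting argument behind (10) is easy to illustrate: multiplying one factor of \(\theta_{j}\) per observation gives the same kernel as raising each \(\theta_{j}\) to the count \(n_{j}\). The class probabilities and data below are our illustrative choices.

```python
from collections import Counter

theta = [0.5, 0.2, 0.3]              # k = 3 class probabilities (our choice)
obs = [1, 3, 1, 2, 1, 3]             # illustrative observations labelled 1..k

# One factor of theta_j per observation falling in class j...
lik_individual = 1.0
for xi in obs:
    lik_individual *= theta[xi - 1]

# ...equals the product of theta_j raised to the counts n_j.
counts = Counter(obs)
lik_counts = 1.0
for j, nj in counts.items():
    lik_counts *= theta[j - 1] ** nj
print(lik_individual, lik_counts)  # the two agree
```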
Suppose that the prior for \(\theta\) is Dirichlet distributed with
known parameters \(a = (a_{1}, \ldots,
a_{k})\) so \[\begin{eqnarray*}
f(\theta) & = & \frac{1}{B(a)} \prod_{j=1}^{k}
\theta_{j}^{a_{j}-1}
\end{eqnarray*}\] where \(B(a)
= B(a_{1}, \ldots, a_{k}) = \frac{\prod_{j=1}^{k}
\Gamma(a_{j})}{\Gamma(\sum_{j=1}^{k} a_{j})}\). Show that the
posterior for \(\theta\) given \(x\) is Dirichlet with parameters \(a + n = (a_{1} + n_{1}, \ldots, a_{k} +
n_{k})\).
\[\begin{eqnarray*}
f(\theta \, | \, x) & \propto & f( x \, | \, \theta) f(\theta)
\\
& \propto & \left\{\prod_{j=1}^{k} \theta_{j}^{n_{j}}\right\}
\times \left\{\prod_{j=1}^{k} \theta_{j}^{a_{j}-1}\right\} \\
& = & \prod_{j=1}^{k} \theta_{j}^{a_{j}+n_{j}-1}
\end{eqnarray*}\] which is a kernel of the Dirichlet with
parameter \(a + n = (a_{1} + n_{1}, \ldots,
a_{k} + n_{k})\) so that the posterior \(\theta \, | \, x\) is Dirichlet with
parameter \(a + n\).
The
Multinomial distribution is the multivariate generalisation of the
Binomial (\(k = 2\) for the Multinomial
gives the Binomial) and the Dirichlet the multivariate generalisation of
the Beta (\(k = 2\) for the Dirichlet
gives the Beta). It is straightforward to obtain the moments of the
Dirichlet distribution. For example, \[\begin{eqnarray}
E\left(\prod_{j=1}^{k} \theta_{j}^{m_{j}}\right) & = &
\int_{\theta} \prod_{j=1}^{k} \theta_{j}^{m_{j}} \times \frac{1}{B(a)}
\prod_{j=1}^{k} \theta_{j}^{a_{j}-1} \, d\theta \tag{11} \\
& = & \frac{1}{B(a)} \int_{\theta} \prod_{j=1}^{k}
\theta_{j}^{a_{j}+m_{j}-1} \, d\theta \tag{12} \\
& = & \frac{B(a+m)}{B(a)} \tag{13}
\end{eqnarray}\] where \(a + m = (a_{1}
+ m_{1}, \ldots, a_{k} + m_{k})\). Notice that the integral in
(11) is \(k\)-dimensional (as \(\theta\) is \(k\)-dimensional) and (13) follows as the
integral in (12) is a kernel of the Dirichlet distribution with
parameter \(a+m\). In particular,
taking \(m_{j} = 1\) and \(m_{j'} = 0\) for \(j' \neq j\) we have \[\begin{eqnarray*}
E(\theta_{j}) \ = \ \frac{B(a+m)}{B(a)} & = &
\frac{\left\{\prod_{j' \neq j}^{k} \Gamma
(a_{j'})\right\}\Gamma(a_{j}+1)}{\Gamma((\sum_{j=1}^{k} a_{j})+1)}
\times \frac{\Gamma(\sum_{j=1}^{k} a_{j})}{\prod_{j =1}^{k} \Gamma
(a_{j})} \\
& = & \frac{\Gamma(a_{j} + 1)}{\Gamma((\sum_{j=1}^{k} a_{j})+1)}
\times \frac{\Gamma(\sum_{j=1}^{k} a_{j})}{\Gamma (a_{j})} \\
& = & \frac{a_{j}}{\sum_{j=1}^{k} a_{j}}.
\end{eqnarray*}\] As \(\theta \, | \,
x\) is Dirichlet with parameter \(a +
n\) we have \[\begin{eqnarray*}
E(\theta_{j} \, | \, x) & = & \frac{a_{j} +
n_{j}}{\sum_{j=1}^{k} a_{j} + \sum_{j=1}^{k} n_{j}} \\
& = &
\frac{\tilde{a}}{\tilde{a}+\tilde{n}}\left(\frac{a_{j}}{\tilde{a}}\right)
+
\frac{\tilde{n}}{\tilde{a}+\tilde{n}}\left(\frac{n_{j}}{\tilde{n}}\right)
\end{eqnarray*}\] where \(\tilde{a} =
\sum_{j=1}^{k} a_{j}\) and \(\tilde{n}
= \sum_{j=1}^{k} n_{j}\). The posterior mean for \(\theta_{j}\) is a weighted average of its
prior mean, \(\frac{a_{j}}{\tilde{a}}\), and the
classical maximum likelihood estimate of \(\theta_{j}\), \(\frac{n_{j}}{\tilde{n}}\). Notice that
\(\tilde{a}\) controls the weight of
the prior mean in the posterior mean and is often said to represent the
prior strength.
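As a final supplementary check, the weighted-average form of the Dirichlet posterior mean can be verified numerically; the prior parameters and counts below are our illustrative choices.

```python
# Posterior mean (a_j + n_j) / (a~ + n~) versus its weighted-average form.
a = [2.0, 3.0, 5.0]                  # illustrative Dirichlet prior (our choice)
counts = [10, 4, 6]                  # observed counts n_j
a_tot, n_tot = sum(a), sum(counts)   # a~ and n~
post_means = []
for aj, nj in zip(a, counts):
    post = (aj + nj) / (a_tot + n_tot)
    weighted = (a_tot / (a_tot + n_tot)) * (aj / a_tot) \
               + (n_tot / (a_tot + n_tot)) * (nj / n_tot)
    post_means.append((post, weighted))
print(post_means)  # each pair agrees, and the posterior means sum to 1
```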