Consider Birnbaum’s Theorem, \((\mbox{WIP} \wedge \mbox{WCP} \, ) \leftrightarrow \mbox{SLP}\). In lectures, we showed that \((\mbox{WIP} \wedge \mbox{WCP} \,) \rightarrow \mbox{SLP}\) but not the converse. Complete the proof of the theorem by showing that \(\mbox{SLP} \rightarrow \mbox{WIP}\) and \(\mbox{SLP} \rightarrow \mbox{WCP}\).
Suppose that we have two discrete experiments \(\mathcal{E}_{1} = \{\mathcal{X}_{1}, \Theta, f_{X_{1}}(x_{1} \, | \, \theta)\}\) and \(\mathcal{E}_{2} = \{\mathcal{X}_{2}, \Theta, f_{X_{2}}(x_{2} \, | \, \theta)\}\) and that, for \(x_{1}' \in \mathcal{X}_{1}\) and \(x_{2}' \in \mathcal{X}_{2}\), \[\begin{eqnarray} f_{X_{1}}(x_{1}' \, | \, \theta) \ = \ cf_{X_{2}}(x_{2}' \, | \, \theta) \tag{1} \end{eqnarray}\] for all \(\theta\) where \(c\) is a positive constant not depending upon \(\theta\) (but which may depend on \(x_{1}', x_{2}'\)) and \(f_{X_{1}}(x_{1}' \, | \, \theta) > 0\). We wish to consider estimation of \(\theta\) under a loss function \(L(\theta, d)\) which is strictly convex in \(d\) for each \(\theta\). Thus, for all \(d_{1} \neq d_{2} \in \mathcal{D}\), the decision space, and \(\alpha \in (0, 1)\), \[\begin{eqnarray*} L(\theta, \alpha d_{1} + (1-\alpha)d_{2}) & < & \alpha L(\theta, d_{1}) + (1-\alpha)L(\theta, d_{2}). \end{eqnarray*}\] For the experiment \(\mathcal{E}_{j}\), \(j = 1, 2\), for the observation \(x_{j}\) we will use the decision rule \(\delta_{j}(x_{j})\) as our estimate of \(\theta\) so that \[\begin{eqnarray*} \mbox{Ev}(\mathcal{E}_{j}, x_{j}) & = & \delta_{j}(x_{j}). \end{eqnarray*}\] Suppose that the inference violates the strong likelihood principle so that, whilst equation (1) holds, \(\delta_{1}(x_{1}') \neq \delta_{2}(x_{2}')\).
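As a concrete illustration of equation (1) (not part of the question), one may take the familiar binomial / negative binomial pair: for instance, \(X_{1} \sim \mbox{Bin}(12, \theta)\) with \(x_{1}' = 3\) successes observed, and \(X_{2}\) the number of failures before the third success, with \(x_{2}' = 9\) failures observed. A quick check in R that the two likelihoods are proportional as functions of \(\theta\):

```r
# Illustration: dbinom(3, 12, theta) and dnbinom(9, 3, theta) are proportional
# as functions of theta; here c = choose(12, 3) / choose(11, 9) = 4.
theta <- seq(0.1, 0.9, by = 0.1)
dbinom(3, 12, theta) / dnbinom(9, 3, theta)   # constant in theta, equal to 4
```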
Let \(\mathcal{E}^{*}\) be the mixture of the experiments \(\mathcal{E}_{1}\) and \(\mathcal{E}_{2}\) according to mixture probabilities \(1/2\) and \(1/2\). For the outcome \((j, x_{j})\) the decision rule is \(\delta(j, x_{j})\). If the Weak Conditionality Principle (WCP) applies to \(\mathcal{E}^{*}\) show that \[\begin{eqnarray*} \delta(1, x_{1}') & \neq & \delta(2, x_{2}'). \end{eqnarray*}\]
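Purely as an illustration of what the mixture experiment \(\mathcal{E}^{*}\) means (reusing the hypothetical binomial / negative binomial pair sketched above), a single outcome \((j, x_{j})\) could be simulated in R as follows:

```r
# Simulate one outcome (j, x_j) from E*: first pick the component experiment
# with probability 1/2 each, then observe from that experiment.
theta <- 0.3                                   # illustrative parameter value
j <- sample(1:2, size = 1, prob = c(0.5, 0.5))
x <- if (j == 1) rbinom(1, size = 12, prob = theta) else rnbinom(1, size = 3, prob = theta)
c(j = j, x = x)
```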
An alternative decision rule for \(\mathcal{E}^{*}\) is \[\begin{eqnarray*} \delta^{*}(j, x_{j}) & = & \left\{\begin{array}{ll} \frac{c}{c+1}\delta(1, x_{1}') + \frac{1}{c+1}\delta(2, x_{2}') & \mbox{if $x_{j} = x_{j}'$ for $j = 1, 2$}, \\ \delta(j, x_{j}) & \mbox{otherwise}. \end{array} \right. \end{eqnarray*}\] Show that if the WCP applies to \(\mathcal{E}^{*}\) then \(\delta^{*}\) dominates \(\delta\) so that \(\delta\) is inadmissible. [Hint: First show that \(R(\theta, \delta^{*}) = \frac{1}{2}\mathbb{E}[L(\theta, \delta^{*}(1, X_{1})) \, | \, \theta] + \frac{1}{2}\mathbb{E}[L(\theta, \delta^{*}(2, X_{2})) \, | \, \theta]\).]
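The following is not a proof, but a numerical check of the claimed dominance in the hypothetical binomial / negative binomial setting above, with squared-error loss (one strictly convex choice) and arbitrary estimates at the two special points; since \(\delta^{*}\) and \(\delta\) agree everywhere else, only those points contribute to the risk difference.

```r
# Numerical check (not a proof): R(theta, delta*) - R(theta, delta) < 0 for
# every theta on a grid, using the risk decomposition in the hint.
theta  <- seq(0.05, 0.95, by = 0.05)
delta1 <- 0.2; delta2 <- 0.4                              # hypothetical delta(1, x1') != delta(2, x2')
cc     <- dbinom(3, 12, theta) / dnbinom(9, 3, theta)     # the constant c (= 4 here)
dstar  <- (cc * delta1 + delta2) / (cc + 1)               # the combined estimate used by delta*
loss   <- function(theta, d) (theta - d)^2                # a strictly convex loss
risk_diff <- 0.5 * dbinom(3, 12, theta) * (loss(theta, dstar) - loss(theta, delta1)) +
             0.5 * dnbinom(9, 3, theta) * (loss(theta, dstar) - loss(theta, delta2))
all(risk_diff < 0)                                        # TRUE
```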
Comment on the result of part b.
Suppose we have a hypothesis test of two simple hypotheses \[\begin{eqnarray*} H_{0}: X \sim f_{0} \, \mbox{versus} \, H_{1}: X \sim f_{1} \end{eqnarray*}\] so that if \(H_{i}\) is true then \(X\) has distribution \(f_{i}(x)\). It is proposed to choose between \(H_{0}\) and \(H_{1}\) using the following loss function. \[\begin{eqnarray*} \begin{array}{cc|cc} & & \mbox{Decision} \\ & & H_{0} & H_{1} \\ \hline \mbox{Outcome} & \begin{array}{c} H_{0} \\ H_{1} \end{array} & \begin{array}{c} c_{00} \\ c_{10} \end{array} & \begin{array}{c} c_{01} \\ c_{11} \end{array} \end{array} \end{eqnarray*}\] where \(c_{00} < c_{01}\) and \(c_{11} < c_{10}\). Thus, \(c_{ij} = L(H_{i}, H_{j})\) is the loss when the “true” hypothesis is \(H_{i}\) and the decision \(H_{j}\) is taken. Show that a decision rule \(\delta(x)\) for choosing between \(H_{0}\) and \(H_{1}\) is admissible if and only if \[\begin{eqnarray*} \delta(x) & = & \left\{\begin{array}{cl} H_{0} & \mbox{if } \dfrac{f_{0}(x)}{f_{1}(x)} > c, \\ H_{1} & \mbox{if } \dfrac{f_{0}(x)}{f_{1}(x)} < c, \\ \mbox{either } H_{0} \mbox{ or } H_{1} & \mbox{if } \dfrac{f_{0}(x)}{f_{1}(x)} = c, \end{array}\right. \end{eqnarray*}\] for some critical value \(c > 0\). [Hint: Consider Wald’s Complete Class Theorem and a prior distribution \(\pi = (\pi_{0}, \pi_{1})\) where \(\pi_{i} = \mathbb{P}(H_{i}) > 0\). You may assume that for all \(x \in \mathcal{X}\), \(f_{i}(x) > 0\).]
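For orientation only, the form of decision rule the question asks you to characterise is easy to write down in R for a hypothetical pair of simple hypotheses, say \(f_{0} = N(0, 1)\) and \(f_{1} = N(1, 1)\), with an arbitrarily chosen critical value:

```r
# Illustration of a likelihood-ratio rule (hypothetical densities and cval).
f0   <- function(x) dnorm(x, mean = 0, sd = 1)
f1   <- function(x) dnorm(x, mean = 1, sd = 1)
cval <- 1
delta <- function(x) ifelse(f0(x) / f1(x) > cval, "H0", "H1")   # ties occur with probability 0 here
delta(c(-1, 0, 2))                                              # "H0" "H0" "H1"
```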
Let \(X_{1}, \ldots, X_{n}\) be exchangeable random variables so that, conditional upon a parameter \(\theta\), the \(X_{i}\) are independent. Suppose that \(X_{i} \, | \, \theta \sim N(\theta, \sigma^{2})\) where the variance \(\sigma^{2}\) is known, and that \(\theta \sim N(\mu_{0}, \sigma_{0}^{2})\) where the mean \(\mu_{0}\) and variance \(\sigma_{0}^{2}\) are known. We wish to produce a point estimate \(d\) for \(\theta\), with loss function \[\begin{eqnarray} L(\theta, d) & = & 1 - \exp\left\{-\frac{1}{2}(\theta - d)^{2} \right\}. \tag{2} \end{eqnarray}\]
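Before attempting the later parts it may help to see the shape of this loss; the short sketch below (illustrative only, base R graphics) plots \(L(\theta, d)\) as a function of \(\theta - d\), alongside half the squared error for comparison.

```r
# Plot of the loss (2) as a function of theta - d: it is bounded above by 1,
# and close to (theta - d)^2 / 2 when theta - d is small.
z <- seq(-4, 4, length.out = 200)
plot(z, 1 - exp(-z^2 / 2), type = "l", ylim = c(0, 1.2),
     xlab = expression(theta - d), ylab = "loss")
lines(z, z^2 / 2, lty = 2)
legend("top", legend = c("1 - exp(-z^2/2)", "z^2/2"), lty = 1:2, bty = "n")
```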
Let \(f(\theta)\) denote the probability density function of \(\theta \sim N(\mu_{0}, \sigma_{0}^{2})\). Show that \(\rho(f, d)\), the risk of \(d\) under \(f(\theta)\), can be expressed as \[\begin{eqnarray*} \rho(f, d) & = & 1 - \frac{1}{\sqrt{1+\sigma_{0}^{2}}}\exp\left\{-\frac{1}{2(1+\sigma_{0}^{2})}(d - \mu_{0})^{2} \right\}. \end{eqnarray*}\] [Hint: You may use, without proof, the result that \[\begin{eqnarray*} (\theta - a)^{2} + b(\theta - c)^{2} & = & (1+b)\left(\theta - \frac{a+bc}{1+b}\right)^{2} + \left(\frac{b}{1+b}\right)(a-c)^{2} \end{eqnarray*}\] for any \(a,b,c \in \mathbb{R}\) with \(b \neq -1\).]
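Independently of the derivation, the stated expression can be sanity-checked numerically in R (the values of \(\mu_{0}\), \(\sigma_{0}\) and \(d\) below are arbitrary):

```r
# Numerical check of rho(f, d): compare direct integration of the expected
# loss against the closed-form expression in the question.
mu0 <- 1; sigma0 <- 2; d <- 0.5
rho_numeric <- integrate(function(theta)
  (1 - exp(-0.5 * (theta - d)^2)) * dnorm(theta, mean = mu0, sd = sigma0),
  lower = -Inf, upper = Inf)$value
rho_formula <- 1 - exp(-0.5 * (d - mu0)^2 / (1 + sigma0^2)) / sqrt(1 + sigma0^2)
c(rho_numeric, rho_formula)   # agree to numerical accuracy
```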
Using part a, show that the Bayes rule of an immediate decision is \(d^{*} = \mu_{0}\) and find the corresponding Bayes risk.
Find the Bayes rule and Bayes risk after observing \(x = (x_{1}, \ldots, x_{n})\). Express the Bayes rule as a weighted average of \(d^{*}\) and the maximum likelihood estimate of \(\theta\), \(\overline{x} = \frac{1}{n} \sum_{i=1}^{n} x_{i}\), and interpret the weights. [Hint: Consider conjugacy.]
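The hint refers to the standard normal-normal conjugate update from lectures; a small R helper (variable names illustrative) writes out that update, supplying the posterior distribution to which part a can then be applied:

```r
# Normal-normal conjugate update: prior N(mu0, sigma02), observations with
# known variance sigma2 and sample mean xbar from n observations.
posterior_normal <- function(xbar, n, sigma2, mu0, sigma02) {
  sigma12 <- 1 / (1 / sigma02 + n / sigma2)                  # posterior variance
  mu1     <- sigma12 * (mu0 / sigma02 + n * xbar / sigma2)   # posterior mean
  c(mean = mu1, var = sigma12)
}
posterior_normal(xbar = 1.2, n = 10, sigma2 = 1, mu0 = 0, sigma02 = 4)
```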
Suppose now, given data \(y\), the parameter \(\theta\) has the general posterior distribution \(f(\theta \, | \, y)\). We wish to use the loss function \(L(\theta, d)\), as given in equation (2), to find a point estimate \(d\) for \(\theta\). By considering an approximation of \(L(\theta, d)\), or otherwise, what can you say about the corresponding Bayes rule?
Show that if \(p\) is a family of significance procedures then \[\begin{eqnarray*} p(x; \Theta_{0}) & = & \sup_{\theta \in \Theta_{0}} p(x; \theta) \end{eqnarray*}\] is a significance procedure for the null hypothesis \(\Theta_{0} \subset \Theta\); that is, \(p(X; \Theta_{0})\) is super-uniform for every \(\theta \in \Theta_{0}\).
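A simulation sketch may make the claim concrete. Purely as an assumption for illustration, take a single observation \(X \sim N(\theta, 1)\) with \(p(x; \theta) = \mathbb{P}(X \geq x \, | \, \theta)\) and \(\Theta_{0} = \{\theta \leq 0\}\), so that the supremum over \(\Theta_{0}\) is attained at \(\theta = 0\):

```r
# Simulate p(X; Theta_0) under a true theta inside Theta_0 and check that it
# looks super-uniform: P(p <= u) should not exceed u.
set.seed(1)
x <- rnorm(1e5, mean = -0.5)                     # a true theta = -0.5 in Theta_0
p <- pnorm(x, mean = 0, lower.tail = FALSE)      # sup over Theta_0 attained at theta = 0
mean(p <= 0.05)                                  # well below 0.05
```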
Suppose that, given \(\theta\), \(X_{1}, \ldots, X_{n}\) are independent and identically distributed \(N(\theta, 1)\) random variables so that, given \(\theta\), \(\overline{X} = \frac{1}{n} \sum_{i=1}^{n} X_{i} \sim N(\theta, 1/n)\).
Consider the test of the hypotheses \[\begin{eqnarray*} H_{0}: \theta = 0 \, \mbox{versus} \, H_{1}: \theta = 1 \end{eqnarray*}\] using the statistic \(\overline{X}\) so that large observed values \(\overline{x}\) support \(H_{1}\). For a given \(n\), the corresponding \(p\)-value is \[\begin{eqnarray*} p_{n}(\overline{x}; 0) & = & \mathbb{P}(\overline{X} \geq \overline{x} \, | \, \theta = 0). \end{eqnarray*}\] We wish to investigate how, for a fixed \(p\)-value, the likelihood ratio for \(H_{0}\) versus \(H_{1}\), \[\begin{eqnarray*} LR(H_{0}, H_{1}) & := & \frac{f(\overline{x} \, | \, \theta = 0)}{f(\overline{x} \, | \, \theta = 1)} \end{eqnarray*}\] changes as \(n\) increases.
Use `R` to create a plot of \(LR(H_{0}, H_{1})\) for each \(n \in \{1, \ldots, 20\}\) where, for each \(n\), \(\overline{x}\) is the value which corresponds to a \(p\)-value of 0.05. [Hint: You may need to utilise the `qnorm` and `dnorm` functions. The look of the plot may be improved by using a log-scale on the axes.]
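A minimal sketch of one possible approach in R, using base graphics (variable names are illustrative):

```r
# For each n, find the value of xbar with p-value 0.05 under H0 and evaluate
# the likelihood ratio LR(H0, H1) at that xbar.
n    <- 1:20
xbar <- qnorm(0.95, mean = 0, sd = 1 / sqrt(n))   # P(Xbar >= xbar | theta = 0) = 0.05
lr   <- dnorm(xbar, mean = 0, sd = 1 / sqrt(n)) /
        dnorm(xbar, mean = 1, sd = 1 / sqrt(n))
plot(n, lr, type = "b", log = "y",
     xlab = "n", ylab = expression(LR(H[0], H[1])))
```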
Consider the test of the hypotheses \[\begin{eqnarray*} H_{0}: \theta = 0 \, \mbox{versus} \, H_{1}: \theta > 0 \end{eqnarray*}\] using once again \(\overline{X}\) as the test statistic.
For the origins of the use of 0.05 see Cowles, M. and C. Davis (1982). On the origins of the .05 level of statistical significance. American Psychologist 37(5), 553-558.