Chapter 3 Statistical models

A statistical model is an artefact to link our beliefs about things which we can measure, or observe, to things we would like to know. For example, we might suppose that \(X\) denotes the value of things we can observe and \(Y\) the values of the things that we would like to know. Prior to making any observations, both \(X\) and \(Y\) are unknown: they are random variables. In a statistical approach, we quantify our uncertainty about them by specifying a probability distribution for \((X, Y)\). Then, if we observe \(X = x\), we can consider the conditional probability of \(Y\) given \(X = x\); that is, we can consider predictions about \(Y\).

In this context, artefact denotes an object made by a human, for example, you or me. There are no statistical models that don’t originate inside our minds. So there is no arbiter to determine the “true” statistical model for \((X, Y)\): we may expect to disagree about the statistical model for \((X, Y)\), between ourselves, and even within ourselves from one time-point to another. In common with all other scientists, statisticians do not require their models to be true: as Box (1979) writes:

it would be very remarkable if any system existing in the real world could be exactly represented by any simple model. However, cunningly chosen parsimonious models often do provide remarkably useful approximations \(\ldots\) for such a model there is no need to ask the question “Is the model true?” If “truth” is to be the “whole truth” the answer must be “No.” The only question of interest is “Is the model illuminating and useful?”

Statistical models exist to make prediction feasible.

Maybe it would be helpful to say a little more about this. Here is the usual procedure in “public” Science, sanitised and compressed:

  1. Given an interesting question, formulate it as a problem with a solution.
  2. Using experience, imagination, and technical skill, make some simplifying assumptions to move the problem into the mathematical domain, and solve it.
  3. Contemplate the simplified solution in the light of the assumptions, e.g. in terms of robustness. Maybe iterate a few times.
  4. Publish your simplified solution (including, of course, all of your assumptions), and your recommendation for the original question, if you have one. Prepare for criticism.

MacKay (2009) provides a masterclass in this procedure. The statistical model represents a statistician’s “simplifying assumptions.”

A statistical model for a random variable \(X\) is created by ruling out many possible probability distributions. This is most clearly seen in the case when the set of possible outcomes is finite.

Example 3.1 Let \(\mathcal{X} = \{x^{(1)}, \ldots, x^{(k)}\}\) denote the set of possible outcomes of \(X\) so that the sample space consists of \(|\mathcal{X}| = k\) elements. The set of possible probability distributions for \(X\) is \[\begin{eqnarray*} \mathcal{P} & = & \left\{p \in \mathbb{R}^{k} \, : \, p_{i} \geq 0 \, \forall i, \sum_{i=1}^{k} p_{i} = 1\right\}, \end{eqnarray*}\] where \(p_{i} = \mathbb{P}(X = x^{(i)})\). A statistical model may be created by considering a family of distributions \(\mathcal{F}\) which is a subset of \(\mathcal{P}\). We will typically consider families where the functional form of the probability mass function is specified but a finite number of parameters \(\theta\) are unknown. That is \[\begin{eqnarray*} \mathcal{F} & = & \left\{p \in \mathcal{P} \, : \, p_{i} = f_{X}(x^{(i)} \, | \, \theta) \mbox{ for some $\theta \in \Theta$}\right\}. \end{eqnarray*}\]
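As a hedged illustration of Example 3.1 (the sample space and family below are my own choices, not from the text), the following sketch takes \(X\) to be the number of heads in three coin tosses, so \(k = 4\), and lets \(\mathcal{F}\) be the binomial family indexed by \(\theta \in [0, 1]\). Each value of \(\theta\) picks out a single point of the simplex \(\mathcal{P}\):

```python
from math import comb

def binomial_pmf(theta, n=3):
    """Return the point p = (p_1, ..., p_k) in P picked out by theta,
    with p_i = f_X(x^(i) | theta) for the Binomial(n, theta) family."""
    return [comb(n, i) * theta**i * (1 - theta)**(n - i) for i in range(n + 1)]

def in_simplex(p, tol=1e-12):
    """Check membership of P: p_i >= 0 for all i and sum_i p_i = 1."""
    return all(pi >= 0 for pi in p) and abs(sum(p) - 1) < tol

# Each theta indexes one member of F, itself a single element of P.
for theta in (0.1, 0.5, 0.9):
    assert in_simplex(binomial_pmf(theta))
```

The family \(\mathcal{F}\) here is a one-dimensional curve inside the three-dimensional simplex \(\mathcal{P}\): ruling out every distribution off that curve is exactly what creates the model.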

We shall proceed by assuming that our statistical model can be expressed as a parametric model.

Definition 3.1 (Parametric model) A parametric model for a random variable \(X\) is the triple \(\mathcal{E} = \{\mathcal{X}, \Theta, f_{X}(x \, | \, \theta)\}\) where only the finite dimensional parameter \(\theta \in \Theta\) is unknown.

Thus, the model specifies the sample space \(\mathcal{X}\) of the quantity to be observed \(X\), the parameter space \(\Theta\), and a family of distributions, \(\mathcal{F}\) say, where \(f_{X}(x \, | \, \theta)\) is the distribution for \(X\) when \(\theta\) is the value of the parameter. In this general framework, both \(X\) and \(\theta\) may be multivariate and we use \(f_{X}\) to represent the density function irrespective of whether \(X\) is continuous or discrete. If it is discrete then \(f_{X}(x \, | \, \theta)\) gives the probability of an individual value \(x\). Typically, \(\theta\) is continuous-valued.
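One way to make the triple concrete is to represent it directly as a data structure. The sketch below is illustrative only (the class and field names are assumptions, not notation from the text), instantiated with the Bernoulli model \(f_{X}(x \, | \, \theta) = \theta^{x}(1 - \theta)^{1 - x}\):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ParametricModel:
    """A hypothetical container for the triple E = {X, Theta, f_X(x | theta)}."""
    sample_space: object      # the set X of possible observations
    parameter_space: object   # the set Theta of possible parameter values
    density: Callable         # f_X(x | theta), a pmf or pdf

# The Bernoulli model: X in {0, 1}, theta in [0, 1] unknown.
bernoulli = ParametricModel(
    sample_space={0, 1},
    parameter_space=(0.0, 1.0),
    density=lambda x, theta: theta**x * (1 - theta)**(1 - x),
)

assert bernoulli.density(1, 0.3) == 0.3
```

The point of the representation is that the density is known up to \(\theta\): evaluating it requires supplying a parameter value, which is exactly the unknown that inference targets.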

The method by which a statistician chooses the family of distributions \(\mathcal{F}\) and then the parametric model \(\mathcal{E}\) is hard to codify, although experience and precedent are obviously relevant; Davison (2003) offers a book-length treatment with many useful examples. However, once the model has been specified, our primary focus is to make inferences about the parameter \(\theta\). That is, we wish to use the observation \(X = x\) to update our knowledge about \(\theta\) so that we may, for example, estimate a function of \(\theta\) or make predictions about a random variable \(Y\) whose distribution depends upon \(\theta\).

Definition 3.2 (Statistic; estimator) Any function of a random variable \(X\) is termed a statistic. If \(T\) is a statistic then \(T = t(X)\) is a random variable and \(t = t(x)\) is the corresponding value of the random variable when \(X = x\). In general, \(T\) is a vector. A statistic designed to estimate \(\theta\) is termed an estimator.

Typically, estimators can be divided into two types.

  1. A point estimator which maps from the sample space \(\mathcal{X}\) to a point in the parameter space \(\Theta\).
  2. A set estimator which maps from \(\mathcal{X}\) to a set in \(\Theta\).
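The two types can be illustrated with a small sketch, assuming the model \(X_{1}, \ldots, X_{n} \sim N(\theta, 1)\) with \(\theta\) unknown (the model and the interval width are my choices, not from the text): the sample mean is a point estimator, and the usual mean \(\pm \, z/\sqrt{n}\) interval is a set estimator.

```python
import math
import statistics

def point_estimator(x):
    """Point estimator: maps the data x in X to a single point of Theta."""
    return statistics.mean(x)

def set_estimator(x, z=1.96):
    """Set estimator: maps x to a subset of Theta, here the interval
    mean +/- z / sqrt(n) appropriate for known unit variance."""
    n = len(x)
    centre = statistics.mean(x)
    half = z / math.sqrt(n)
    return (centre - half, centre + half)

x = [1.2, 0.4, 1.9, 0.8]
theta_hat = point_estimator(x)   # a point in Theta
lo, hi = set_estimator(x)        # an interval in Theta
assert lo < theta_hat < hi
```

Both maps take the same input, an observed \(x \in \mathcal{X}\); they differ only in whether the output is a single element of \(\Theta\) or a subset of it.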

For prediction, we consider a parametric model for \((X, Y)\), \(\mathcal{E} = \{\mathcal{X}\times \mathcal{Y}, \Theta, f_{X, Y}(x, y \, | \, \theta)\}\) from which we can calculate the predictive model \(\mathcal{E}^{*} = \{\mathcal{Y}, \Theta, f_{Y | X}(y \, | \, x, \theta)\}\) where \[\begin{eqnarray} f_{Y | X}(y \, | \, x, \theta) \ = \ \frac{f_{X, Y}(x, y \, | \, \theta)}{f_{X} (x \, | \, \theta)} \ = \ \frac{f_{X, Y}(x, y \, | \, \theta)}{\int_{\mathcal{Y}} f_{X, Y}(x, y \, | \, \theta) \, dy}. \tag{3.1} \end{eqnarray}\]
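In the discrete case the integral in (3.1) becomes a sum over \(\mathcal{Y}\), and the predictive density can be computed directly. The sketch below uses a toy joint pmf of my own devising (the tilted form and the parameter value are assumptions, purely illustrative):

```python
def joint(x, y, theta):
    """A toy joint pmf f_{X,Y}(x, y | theta) on {0,1} x {0,1}:
    uniform margins tilted towards agreement by theta (|theta| < 0.25)."""
    return 0.25 + (theta if x == y else -theta)

def marginal_x(x, theta, ys=(0, 1)):
    """f_X(x | theta) = sum over y of f_{X,Y}(x, y | theta): the
    discrete analogue of the integral in the denominator of (3.1)."""
    return sum(joint(x, y, theta) for y in ys)

def predictive(y, x, theta, ys=(0, 1)):
    """f_{Y|X}(y | x, theta) = f_{X,Y}(x, y | theta) / f_X(x | theta)."""
    return joint(x, y, theta) / marginal_x(x, theta, ys)

theta = 0.1
# The conditional pmf sums to one over y, as (3.1) requires.
total = sum(predictive(y, x=1, theta=theta) for y in (0, 1))
assert abs(total - 1.0) < 1e-12
```

Observing \(X = 1\) shifts the predictive distribution towards \(Y = 1\) whenever \(\theta > 0\), which is the sense in which the model makes prediction feasible: the joint distribution, once specified, determines every conditional.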