2 November 2016

## Goals

• We are interested in some probability distribution $\theta \sim \pi(\theta), \, \theta \in \Theta$

• We want some summaries of this distribution $E_{\pi}g(\theta) = \int_{\Theta} g(\theta) \pi(\theta) d\theta$

• In many cases $$\pi(\theta)$$ is the posterior distribution from some Bayesian model: $\pi(\theta) = p(\theta|y) \propto p(y|\theta) p(\theta)$

## Monte Carlo (according to a statistician)

• If we can generate independent samples
$\theta_i \sim_{iid} \pi, \, i = 1, \dots, n$
• Then under fairly general conditions $\sqrt n \left(\frac{1}{n} \sum_{i=1}^n g(\theta_i) - E_{\pi}g(\theta) \right) \rightarrow_d N(0, \sigma^2_g)$

• We can get a similar result even if we can only sample $$\theta_i$$ via a Markov chain with stationary distribution $$\pi(\theta)$$
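A minimal sketch of the plain Monte Carlo idea with iid samples. The choices here (target $\pi = N(0,1)$, summary $g(\theta) = \theta^2$, so $E_{\pi}g(\theta) = 1$) are illustrative, not from the slides:

```python
import random

rng = random.Random(0)

# iid samples theta_i ~ pi = N(0, 1); g(theta) = theta^2, so E_pi g(theta) = 1
n = 100_000
theta = [rng.gauss(0.0, 1.0) for _ in range(n)]
g_vals = [t * t for t in theta]

estimate = sum(g_vals) / n

# CLT-based standard error: sigma_g / sqrt(n), with sigma_g estimated from the sample
mean_g = estimate
var_g = sum((g - mean_g) ** 2 for g in g_vals) / (n - 1)
se = (var_g / n) ** 0.5
print(estimate, se)
```

The CLT on the previous slide is what justifies reporting `estimate` together with the standard error `se`.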

## Markov chains

A sequence $$\theta_1, \theta_2,\dots$$ of random elements of $$\Theta$$ is a Markov chain if $p(\theta_{t+1}| \theta_1,\dots, \theta_t) =p(\theta_{t+1}| \theta_t)$

• $$p(\theta_{t+1}= y| \theta_t = x) = P_{xy}$$ (finite state space) or
• $$p(\theta_{t+1}\in B| \theta_t = x)$$ (general state space) is called the transition distribution/kernel
• Assume $$p(\theta_{t+1}| \theta_t)$$ is independent of $$t$$: "stationary transition probabilities" or "time independent transition probabilities"
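For a finite state space the transition kernel is just a matrix $P_{xy}$. A toy two-state chain (transition matrix chosen here for illustration) shows empirical visit frequencies settling at the stationary distribution, which solves $\pi P = \pi$:

```python
import random

rng = random.Random(1)

# Two-state chain with time-independent transition matrix P[x][y] = P_xy
P = [[0.9, 0.1],
     [0.2, 0.8]]

state, visits, n = 0, [0, 0], 200_000
for _ in range(n):
    # move to state 0 with probability P[state][0], else to state 1
    state = 0 if rng.random() < P[state][0] else 1
    visits[state] += 1

freq = [v / n for v in visits]
# Solving pi P = pi by hand for this P gives pi = (2/3, 1/3)
print(freq)
```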

## Some desirable properties

Want to construct a stationary Markov chain: $p(\theta_t) = \pi(\theta_t)$

• Detailed balance $\pi(x) P_{xy} = \pi(y) P_{yx}, \, \forall x,y \in \Theta$

• Irreducibility: with positive probability, the chain can get from any state to any other state in a finite number of moves

• Aperiodicity: return times to a particular state can be irregular (the chain does not cycle through states with a fixed period)
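Detailed balance is easy to check numerically when $\pi$ and $P$ are known. Reusing the toy two-state chain from above (my example, not the slides'), with stationary distribution $\pi = (2/3, 1/3)$:

```python
# Detailed balance check: pi(x) P_xy == pi(y) P_yx for all x, y
pi = [2 / 3, 1 / 3]                  # stationary distribution of this P
P = [[0.9, 0.1],
     [0.2, 0.8]]

balanced = all(
    abs(pi[x] * P[x][y] - pi[y] * P[y][x]) < 1e-12
    for x in range(2) for y in range(2)
)
print(balanced)
```

(Any two-state chain is reversible with respect to its stationary distribution; in higher dimensions detailed balance is a genuine restriction that Metropolis–Hastings is built to satisfy.)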

## Example

(Gaussian) Random-Walk Metropolis–Hastings

Propose $\theta^\star \sim q(\cdot \mid \theta_t) = N(\theta_t, s)$

Set $$\theta_{t+1} = \theta^\star$$ ('accept the proposed move') with probability $\min\left\{ 1 , \frac{\pi(\theta^\star) q(\theta_t| \theta^\star)}{\pi(\theta_t) q(\theta^\star|\theta_t)}\right\}$. Otherwise set $$\theta_{t+1} = \theta_t$$ ('reject').
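A sketch of the algorithm above, with `s` playing the role of the proposal scale. Because the Gaussian proposal is symmetric, the $q$ terms cancel and only the ratio $\pi(\theta^\star)/\pi(\theta_t)$ remains. The standard normal target used below is my choice for illustration:

```python
import math
import random

def rw_metropolis(log_pi, theta0, s, n, seed=0):
    """Gaussian random-walk Metropolis: propose theta* ~ N(theta_t, s^2).
    Symmetric proposal, so the q terms cancel in the acceptance ratio."""
    rng = random.Random(seed)
    theta, chain = theta0, []
    for _ in range(n):
        prop = theta + s * rng.gauss(0.0, 1.0)
        # accept with prob min(1, pi(prop)/pi(theta)), computed on the log scale
        if math.log(rng.random()) < log_pi(prop) - log_pi(theta):
            theta = prop
        chain.append(theta)
    return chain

# Illustration: standard normal target, log pi(t) = -t^2/2 up to a constant
chain = rw_metropolis(lambda t: -0.5 * t * t, theta0=0.0, s=2.0, n=50_000)
mean = sum(chain) / len(chain)
```

Working on the log scale avoids numerical underflow when $\pi$ is very small far from its mode.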

## Simple 1-D model

$Y_i \sim N(\theta, 10), \quad \text{prior}(\theta) = N(0,100)$

$\min\left\{ 1 , \frac{\pi(\theta^\star) {\color{green}q(\theta_t| \theta^\star)}}{\pi(\theta_t) {\color{red} q(\theta^\star|\theta_t)}}\right\}$ — for the symmetric Gaussian random-walk proposal the ${\color{green}\text{green}}$ and ${\color{red}\text{red}}$ terms cancel.
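This model is conjugate, so the exact posterior mean is available and can be compared against the RW-MH estimate. The sketch below assumes the 10 and 100 in the model are variances, and uses simulated data and a proposal scale of my choosing:

```python
import math
import random

rng = random.Random(3)

# Simulated data, assuming 10 and 100 in the slide are variances
theta_true, var_y, var_prior = 2.0, 10.0, 100.0
y = [theta_true + math.sqrt(var_y) * rng.gauss(0, 1) for _ in range(50)]

def log_post(t):
    # log p(theta | y) up to a constant: log-likelihood + log-prior
    return (-sum((yi - t) ** 2 for yi in y) / (2 * var_y)
            - t ** 2 / (2 * var_prior))

# Conjugacy gives the exact posterior mean for comparison
prec = len(y) / var_y + 1 / var_prior
post_mean = (sum(y) / var_y) / prec

theta, draws = 0.0, []
for _ in range(50_000):
    prop = theta + 1.0 * rng.gauss(0, 1)   # symmetric proposal: q-ratio cancels
    if math.log(rng.random()) < log_post(prop) - log_post(theta):
        theta = prop
    draws.append(theta)

burned = draws[5_000:]                     # discard burn-in
mcmc_mean = sum(burned) / len(burned)
```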

## Convergence 1-D example

RW-MH with $s = 0.1$ and $s = 1$ (trace plots not shown).

## Convergence

• Want to quickly reduce influence of our initial state $$\theta_1$$

• Reduce auto-correlation and move around $$\Theta$$ efficiently

• Mixing: time until multiple chains with different initial states look identical (i.e., all have converged to the stationary distribution)

• Burn-in: early iterations that are discarded because of obvious dependence on $$\theta_1$$
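Autocorrelation is the usual numerical handle on mixing. As a stand-in for two MCMC traces (my construction, not from the slides), an AR(1) process with large $\rho$ mimics a slowly mixing chain and one with small $\rho$ a fast one; burn-in is discarded before computing the diagnostic:

```python
import random

def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(xs)
    m = sum(xs) / n
    num = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(n - 1))
    den = sum((x - m) ** 2 for x in xs)
    return num / den

rng = random.Random(7)

def ar1(rho, n):
    """AR(1) trace: higher rho = stickier chain = slower mixing."""
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0, 1)
        out.append(x)
    return out

slow = ar1(0.95, 20_000)[1_000:]   # discard burn-in
fast = ar1(0.10, 20_000)[1_000:]
```

High lag-1 autocorrelation means each draw carries little new information, so more iterations are needed for the same Monte Carlo accuracy.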

## Problems with RW-MH

• Need to balance size of jumps (controlled by s) and achieving a good acceptance rate (average probability of moving to $$\theta^\star$$)

• Difficult to accept 'long distance' proposals if $$\Theta$$ is high dimensional
• Good to use a proposal that adapts to the local features of $$\pi$$ in these settings, e.g., Hamiltonian Monte Carlo
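The trade-off in the first bullet can be made concrete by measuring the acceptance rate of RW-MH at different step sizes. A sketch on a standard normal target (the target and the step sizes are my choices; $s \approx 2.4$ is a common one-dimensional rule of thumb):

```python
import math
import random

def acc_rate(s, n=20_000, seed=11):
    """Empirical acceptance rate of RW-MH on a standard normal target."""
    rng = random.Random(seed)
    theta, accepts = 0.0, 0
    for _ in range(n):
        prop = theta + s * rng.gauss(0, 1)
        # log pi(prop) - log pi(theta) = (theta^2 - prop^2) / 2 for N(0,1)
        if math.log(rng.random()) < 0.5 * (theta * theta - prop * prop):
            theta = prop
            accepts += 1
    return accepts / n

rates = {s: acc_rate(s) for s in (0.1, 2.4, 20.0)}
# Tiny steps are almost always accepted but barely move;
# huge steps are almost always rejected; neither explores Theta efficiently
```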

## Reparameterisations/Transformations?

• McMC is not an intrinsic method

• Target $$\pi(\theta)$$; in many cases $$\theta$$ is only a label for another object (a probability distribution)

• We can work with any $$\eta=T(\theta)$$, obtain samples of $$\eta$$, and then "go back" via $$\theta=T^{-1}(\eta)$$

• McMC by itself is invariant to reparametrisation, but practical efficiency (convergence, proposal choice) may depend on the parametrisation (e.g., centering in hierarchical models)

• Reparametrisation is part of the "repertoire" of McMC in statistics
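A standard instance of "work with $\eta = T(\theta)$ and go back": a positive parameter sampled on the log scale. The target below, $\pi(\theta) = e^{-\theta}$ on $\theta > 0$ (Exponential(1), mean 1), is my toy choice; the transformed density must include the Jacobian of $T^{-1}$:

```python
import math
import random

rng = random.Random(5)

# Target pi(theta) = exp(-theta) on theta > 0. With eta = log(theta),
# pi_eta(eta) = pi(exp(eta)) * exp(eta), so log pi_eta(eta) = eta - exp(eta)
def log_pi_eta(eta):
    return eta - math.exp(eta)      # the leading 'eta' is the log-Jacobian

eta, thetas = 0.0, []
for _ in range(100_000):
    prop = eta + 1.5 * rng.gauss(0, 1)
    if math.log(rng.random()) < log_pi_eta(prop) - log_pi_eta(eta):
        eta = prop
    thetas.append(math.exp(eta))    # "go back": theta = T^{-1}(eta) = exp(eta)

mean_theta = sum(thetas) / len(thetas)   # Exponential(1) has mean 1
```

Sampling on the unconstrained $\eta$ scale means the random walk never proposes an invalid $\theta \le 0$, which is exactly the practical benefit the bullet points describe.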

## Geometry

• If $$\theta$$ labels other objects, then samples of $$\theta$$ are samples of those objects, not just samples of numbers

• Step behaviour should not be measured only by length and direction in terms of $$\theta$$

• Beneficial to measure step behaviour via some Geometry of the objects

• HMC is an example in this direction

• Clever reparametrisations are another option. Geometry might suggest good parametrisations (e.g., geodesics)

• Example: the distance between N(0,1) and N(1,1) is larger than that between N(0,2) and N(1,2) (hyperbolic geometry of the Gaussian family)
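The last bullet can be checked numerically. One common convention for the Fisher–Rao distance between univariate Gaussians maps $N(\mu, \sigma^2)$ to the point $(\mu/\sqrt{2}, \sigma)$ in the Poincaré half-plane and rescales the hyperbolic distance by $\sqrt{2}$ (conventions vary by a constant factor, which does not affect the comparison):

```python
import math

def fisher_rao(mu1, sig1, mu2, sig2):
    """Fisher-Rao distance between N(mu1, sig1^2) and N(mu2, sig2^2),
    via the Poincare half-plane embedding (mu/sqrt(2), sigma)."""
    x1, x2 = mu1 / math.sqrt(2), mu2 / math.sqrt(2)
    c = 1 + ((x2 - x1) ** 2 + (sig2 - sig1) ** 2) / (2 * sig1 * sig2)
    return math.sqrt(2) * math.acosh(c)

d_narrow = fisher_rao(0, 1, 1, 1)   # N(0,1) vs N(1,1)
d_wide = fisher_rao(0, 2, 1, 2)     # same mean shift, larger spread
```

The same unit shift in the mean moves the distribution "further" when the spread is small, which is the hyperbolic behaviour the slide alludes to.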