# Autumn 2017

### Organiser: Kari Heine

Date: 3 October 2017, CB 5.8, 13:15

Inference in generative models using the Wasserstein distance

Mathieu Gerber (University of Bristol)

Abstract: A growing range of generative statistical models have likelihood functions whose numerical evaluation is intractable. Approximate Bayesian computation and indirect inference have become popular approaches to overcome this issue, simulating synthetic data given parameters and comparing summaries of these simulations with the corresponding observed values. We propose to avoid these summaries and the ensuing loss of information through the use of Wasserstein distances between empirical distributions of observed and synthetic data. We describe how the approach can be used in the setting of dependent data such as time series, and how approximations of the Wasserstein distance allow the method to scale to large data sets. In particular, we propose a new approximation to the optimal assignment problem using the Hilbert space-filling curve. We provide an in-depth theoretical study, including consistency in the number of simulated data sets for a fixed number of observations and posterior concentration rates. The approach is illustrated with various examples, including a multivariate g-and-k distribution, a toggle switch model from systems biology, a queueing model, and a Lévy-driven stochastic volatility model. (Joint work with E. Bernton, P. E. Jacob and C. P. Robert.)
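To make the core idea concrete: for univariate data sets of equal size, the Wasserstein distance between two empirical distributions needs no optimisation, because the optimal assignment in one dimension simply pairs the sorted values. The Python sketch below assumes equally sized univariate samples and a p-th power cost; the function name `wasserstein_1d` and the example values are illustrative, not taken from the talk. Roughly speaking, the Hilbert-curve approximation described above extends this sort-and-match step to several dimensions by ordering points along the curve.

```python
def wasserstein_1d(x, y, p=1):
    """p-Wasserstein distance between two equally sized univariate samples.

    In one dimension the optimal assignment pairs the i-th smallest value of x
    with the i-th smallest value of y, so sorting solves the transport problem.
    """
    if len(x) != len(y) or not x:
        raise ValueError("samples must be non-empty and of equal size")
    xs, ys = sorted(x), sorted(y)
    cost = sum(abs(a - b) ** p for a, b in zip(xs, ys)) / len(xs)
    return cost ** (1.0 / p)


# Hypothetical observed data versus synthetic data simulated at some parameter.
observed = [0.2, 1.5, -0.3, 0.9]
synthetic = [1.0, -0.1, 0.4, 2.0]
print(wasserstein_1d(observed, synthetic, p=1))  # approximately 0.25
```

In an approximate Bayesian computation setting, a distance of this kind would take the place of a comparison between summary statistics of observed and synthetic data.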

Date: 10 October 2017, CB 5.8, 13:15

Tensor Train algorithms for stochastic PDE problems

Sergey Dolgov (University of Bath)

Abstract: Surrogate modelling is becoming a popular technique to reduce the computational burden of forward and inverse uncertainty quantification problems. In this talk we use the Tensor Train (TT) decomposition to approximate the forward solution map of the stochastic diffusion equation, as well as the posterior density function in the Bayesian inverse problem. The TT decomposition is based on the separation of variables, hence multivariate integration factorises into a set of one-dimensional quadratures. For sufficiently smooth functions, the storage cost of the TT decomposition grows much more slowly as the required accuracy increases than the cost implied by the Monte Carlo rate. The TT decomposition of a multivariate function can be constructed from adaptively chosen fibres of samples along each variable (the so-called TT cross interpolation), with the number of function evaluations proportional to the (small) number of unknowns in the TT representation. In turn, the TT approximation of the probability density function allows efficient computation of the Rosenblatt transform, and hence a fast method for proposing almost uncorrelated MCMC samples. We show that for smooth PDE coefficients the TT approach can be faster than quasi-Monte Carlo and adaptive Metropolis techniques.

Date: 17 October 2017, CB 5.8, 13:15

Stein variational Quasi-Newton algorithm: sampling by sequential transport of particles

Gianluca Detommaso (University of Bath)

Abstract: In many statistical applications and real-world situations, it is of fundamental importance to be able to quantify the uncertainty in estimates of interest. However, whenever the underlying probability distribution is complicated or unknown, this cannot be done directly and sampling algorithms are typically needed. A recently introduced sampling algorithm is Stein variational gradient descent [Q. Liu, D. Wang, 2016], in which a cloud of particles is sequentially transported towards the target distribution. This is accomplished by a functional gradient descent that minimises the Kullback–Leibler divergence between the current distribution of the particles and the target. In collaboration with Dr. A. Spantini (MIT) and T. Cui (Monash, AU), we are currently working on accelerating this algorithm. From a transport maps perspective, we derive second-order information to replace gradient descent with Quasi-Newton algorithms, potentially accelerating convergence substantially. Furthermore, we replace the simple kernel used in the original algorithm with more sophisticated ones that better represent the interaction between the particles and accelerate their spread along the support of the target distribution.
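The particle update at the heart of Stein variational gradient descent can be sketched briefly. This Python example is a minimal one-dimensional version of the original first-order algorithm of Liu and Wang with a fixed-bandwidth RBF kernel; it does not implement the Quasi-Newton variant or the alternative kernels proposed in the talk, and the names `svgd_step`, `step` and `h`, as well as the N(2, 1) target, are illustrative assumptions.

```python
import math
import random


def svgd_step(particles, grad_log_p, step=0.1, h=1.0):
    """One Stein variational gradient descent update for 1-D particles.

    Kernel: k(x, y) = exp(-(x - y)**2 / h). Each particle moves along
    phi(x) = mean_j [k(x_j, x) * grad_log_p(x_j) + d/dx_j k(x_j, x)].
    """
    n = len(particles)
    grads = [grad_log_p(x) for x in particles]
    updated = []
    for x in particles:
        phi = 0.0
        for xj, gj in zip(particles, grads):
            k = math.exp(-((xj - x) ** 2) / h)
            # Driving term: kernel-weighted score, pulls towards high density.
            # Repulsive term: d/dxj k(xj, x) = -2 (xj - x) / h * k.
            phi += k * gj - 2.0 * (xj - x) / h * k
        updated.append(x + step * phi / n)
    return updated


def target_score(x):
    """Score (gradient of the log density) of a hypothetical N(2, 1) target."""
    return -(x - 2.0)


rng = random.Random(0)
particles = [rng.gauss(-3.0, 0.5) for _ in range(40)]
for _ in range(800):
    particles = svgd_step(particles, target_score, step=0.1, h=1.0)

mean = sum(particles) / len(particles)
sd = math.sqrt(sum((x - mean) ** 2 for x in particles) / len(particles))
print(f"mean={mean:.3f}, sd={sd:.3f}")  # expected to be close to 2 and 1
```

The kernel-gradient term is what distinguishes SVGD from kernel-smoothed gradient ascent: without it, the particles would tend to collapse towards the mode instead of spreading over the target.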

Date: 24 October 2017, CB 5.8, 13:15

Fast computation for latent Gaussian models with a multivariate link function

Birgir Hrafnkelsson (University of Iceland)

Árni Víðir Jóhannesson (University of Iceland)

Abstract: Latent Gaussian models (LGMs) form a frequently used class within Bayesian hierarchical models. In this class, the density of the observed data conditioned on the latent parameters can be any parametric density, and the prior density of the latent parameters is Gaussian. Typically, the link function is univariate, i.e., it is only a function of the location parameter. Here the focus is on LGMs with a multivariate link function, e.g., LGMs structured such that the location parameter, the scale parameter and the shape parameter of an observation are transformed into three latent parameters. These three latent parameters are modeled with a linear model at the latent level. The parameters within the linear model are also defined as latent parameters and are thus assigned a Gaussian prior density. To facilitate fast posterior computation, a Gaussian approximation is proposed for the likelihood function of the parameters. This approximation, along with the a priori assumption of Gaussian latent parameters, allows straightforward sampling from the posterior density. One benefit of this approach is, e.g., that it enables subset selection at the latent level. The computational approach is applied to annual maximum peak flow series from the UK.

Date: 31 October 2017, CB 5.8, 13:15

Automated formative and summative assessment: the R solution with the "exams" package

Abstract: Increased student numbers in the Department of Mathematical Sciences at the University of Bath mean that the workload involved in providing feedback and marking for homework, coursework and exams is becoming undesirably heavy. An obvious way to relieve pressure on academic staff is to implement one or more automated types of formative and summative assessment. In this seminar I will discuss the approach offered by the R package "exams", initially tested at the Wirtschaftsuniversität Wien (WU Wien) in 2007. With "exams" it is possible to generate, for each type of problem/question, hundreds of versions with, for instance, different numerical values. These versions can be imported into the Moodle question bank and used to create random quizzes for students. Once a student completes and submits a quiz, the final mark is returned together with the detailed correct solution, without any intervention from the instructor. The approach to automated assessment described here is particularly, but not exclusively, suited to statistics subjects. Questions that involve various algebraic manipulations can be better handled with alternative systems, such as STACK (C. J. Sangwin and M. J. Grove, 2006), which is currently being investigated by the Department.

Date: 7 November 2017, CB 5.8, 13:15

Time-dependent feature allocation models via Poisson Random Fields

Paul Jenkins (University of Warwick)

Abstract: In a feature allocation model, each data point is described by a collection of latent features, possibly unobserved. For example, we might classify a corpus of texts by describing each document via a set of topics; the topics then determine a distribution over words for that document. In a Bayesian nonparametric setting, the Indian Buffet Process (IBP) is a popular prior model in which the number of topics is unknown a priori. However, the IBP is static in that it does not account for the change in popularity of topics over time. I will introduce the Wright-Fisher Indian Buffet Process (WF-IBP), a probabilistic model for collections of time-stamped documents. By adapting the Wright-Fisher diffusion from population genetics, we derive a stochastic process with appealing properties, including that (i) each feature popularity evolves independently as a diffusion and (ii) marginal observations at a fixed timepoint are given by the original IBP. We describe a Markov chain Monte Carlo algorithm for exact posterior simulation and illustrate our construction by analysing the topics of NIPS conference papers over 12 years. This is joint work with Valerio Perrone (Warwick), Dario Spano (Warwick), and Yee Whye Teh (Oxford).

Date: 14 November 2017, CB 5.13, 13:15 (Computer Group Work Room)

Introduction to Python for R Users

Julian Faraway (University of Bath)

Abstract: Python is a popular programming language that is widely used in Machine Learning and Data Science. While it can be used for Statistics, its real value to Statisticians lies in its extensive range of other capabilities. It can form a valuable complement to the statistical strengths of R. This hands-on introduction in a computer lab will help you get started in Python and will focus on the ways it differs from R.

Date: 21 November 2017, CB 5.8, 13:15

Kernel methods for spatiotemporal learning in criminology (or, the methods behind our winning entry in the US National Institute of Justice's crime forecasting challenge)

Seth Flaxman (Imperial College London)

Abstract: In this talk I will highlight the statistical machine learning methods that I am developing to address public policy questions in criminology. We develop a scalable inference method for the log-Gaussian Cox Process, and show that an expressive kernel parameterisation can learn space/time structure in a large point pattern dataset [Flaxman et al, ICML 2015]. Our approach has nearly linear scaling, allowing us to efficiently fit a point pattern dataset of n = 233,088 crime events over a decade in Chicago and discover spatially varying multiscale seasonal trends and produce highly accurate long-range local area forecasts. Building on this work, we use scalable approximate kernel methods to provide a winning solution to the US National Institute of Justice "Real-Time Crime Forecasting Challenge," providing forecasts of four types of crime at a very local level (less than 1 square mile) 1 week, 1 month, and 3 months into the future.

In another line of work, we use a Hawkes process model to quantify the spatial and temporal scales over which shooting events diffuse in Washington, DC, using data collected by an acoustic gunshot locator system, in order to assess the hypothesis that crime is an infectious process. While we find robust evidence for spatiotemporal diffusion, the spatial and temporal scales are extremely short (126 meters and 10 minutes), and thus more likely to be consistent with a discrete gun fight, lasting for a matter of minutes, than with a diffusing, infectious process linking violent events across hours, days, or weeks [Loeffler and Flaxman, Journal of Quantitative Criminology 2017].

Papers and replication code are available at www.sethrf.com

Date: 28 November 2017, CB 5.8, 13:15

Using forest eco-system monitoring data to model tree survival for investigating climate change effects

Nicole Augustin (University of Bath)

Alice Davis (University of Bath)

Abstract: Forests are economically, recreationally and ecologically important, providing timber and wildlife habitat and acting as a carbon sink, among many other ecosystem services. They are therefore extremely valuable to society, and it is crucial to ensure that they remain healthy. Forest health is monitored in Europe by The International Co-operative Programme on Assessment and Monitoring of Air Pollution Effects (ICP Forests) in cooperation with the European Union. More recently, climate change has contributed to the decline in forest health, and these data are increasingly being used to investigate the effects of climate change on forests in order to decide on forest management strategies for mitigation. Here we model extensive yearly data on tree mortality and crown defoliation, an indicator of tree health, from a monitoring survey carried out in Baden-Württemberg, Germany, since 1983, which includes part of the ICP transnational grid. On a changing irregular grid, defoliation, mortality and other tree- and site-specific variables are recorded. In some cases grid locations are no longer observed, which leads to censored data; trees are also recruited throughout the study as new grid points are added. We model tree survival as a function of predictor variables on climate, soil characteristics and deposition. We are interested in the process leading to tree mortality rather than in prediction, and this requires including all potential drivers of tree mortality in the model. We use a semiparametric shared frailty model, fitted as a Cox regression model, which allows for random effects (frailties) that account for dependence between neighbouring trees, and for non-linear smooth functions of time-varying predictors and functional predictors. At each of 2385 locations, 24 trees were observed between 1983 and 2016, with not all locations being observed every year. Altogether, 80000 trees were observed, making the analysis computationally challenging.

Date: 5 December 2017, CB 5.8, 13:15

Personalised dynamic prediction of survival using patient registry data: An anatomy of a landmarking analysis

Ruth Keogh (London School of Hygiene & Tropical Medicine)

Abstract: In ‘dynamic’ prediction of survival we make updated predictions of individuals’ survival as new longitudinal measures of health status become available. Landmarking is an attractive and flexible method for dynamic prediction. In this talk I will take the audience through a dynamic prediction analysis using data from the UK Cystic Fibrosis Registry. Challenges arise due to a large number of potential predictors, use of age as the timescale, and occurrence of intermediate events. Various modelling options are possible, and choices have to be made concerning time-varying effects and landmark-specific effects. I will outline how different model selection procedures were investigated; how models were assessed and compared using bootstrapping; and how predictions and their uncertainties can be obtained for a new individual.

Date: 12 December 2017, CB 5.8, 13:15

TBA

Moved to spring term ()

Abstract: TBA

# Spring 2018

### Organiser: Kari Heine

Date: 6 February 2018, CB 3.7, 14:15

Supermassive black hole growth

Carolin Villforth (University of Bath)

Abstract: Supermassive black holes are found in the centers of all massive galaxies. While they are usually quiescent, all supermassive black holes go through phases of accretion during which they can outshine the galaxy they reside in. Black holes gain the majority of their mass during accretion events. It was also discovered that supermassive black hole masses correlate well with the properties of their host galaxies. This has raised an important question: how do supermassive black holes grow, and how is this growth connected to the evolution of galaxies? Addressing this question requires disentangling emission from the galaxy and from the accreting black hole, as well as isolating signatures of different coevolution models in large populations.

Date: 13 February 2018, CB 3.7, 14:15

On local orthogonality and parameter meaning preservation

Karim Anaya-Izquierdo (University of Bath)

Abstract: This is an informal talk about work very much in progress. The work is motivated by the following two issues that appear when enlarging parametric models: (1) a working parametric model is enlarged to account for relevant (but not of direct interest) structures in the data, but the meaning of some of the parameters is lost in that process. A simple example is when the interpretation of a parameter as a treatment effect (marginal mean difference, mean ratio or hazard ratio) is lost when including correlated random effects to account for spatial dependence. (2) More frivolously, we might want the estimation of the enlarged and uninteresting parts of the model to interfere as little as possible with the estimation of the important bits. A well-known example is the partial likelihood in the Cox model, which allows estimation of target regression parameters without having to worry about any unknown baseline distribution. Local orthogonality techniques go in this direction. Despite the interesting-looking examples above, the talk will be, disappointingly, about trivial examples, concepts, modelling ideas and with very little (if any) data. Inevitably, being me, I will introduce some useful geometric tools in this context.

Date: 20 February 2018, CB 3.7, 14:15

Multivariate Output Analysis for Markov Chain Monte Carlo

Dootika Vats (University of Warwick)

Abstract: Markov chain Monte Carlo (MCMC) produces a correlated sample for estimating expectations with respect to a target distribution. A fundamental question is when sampling should stop so that we have good estimates of the desired quantities. The key to answering this question lies in assessing the Monte Carlo error through a multivariate Markov chain central limit theorem (CLT). We present a multivariate framework for terminating simulation in MCMC. We define a multivariate effective sample size, estimating which requires strongly consistent estimators of the covariance matrix in the Markov chain CLT, a property we show for the multivariate batch means estimator. We then provide a lower bound on the minimum number of effective samples required for a desired level of precision. This lower bound depends on the problem only through the dimension of the expectation being estimated, and not on the underlying stochastic process. This result is obtained by drawing a connection between terminating simulation via effective sample size and terminating simulation using a relative standard deviation fixed-volume sequential stopping rule, which we demonstrate is an asymptotically valid procedure. The finite sample properties of the proposed method are then demonstrated through a simple motivating example. This work is joint with Galin Jones (U of Minnesota) and James Flegal (UC Riverside).
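The multivariate effective sample size in this line of work takes the form mESS = n (det Λ / det Σ)^(1/p), where Λ is the sample covariance of the chain and Σ is the multivariate batch means estimate of the covariance matrix in the Markov chain CLT. The Python sketch below assumes batches of size floor(√n) and uses only the standard library; the helper names and the AR(1) test chains are illustrative, and the lower bound on the required effective sample size is not computed here.

```python
import math
import random


def _det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    size = len(a)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det


def _scatter(rows, centre, scale):
    """scale * sum over rows of (row - centre)(row - centre)^T."""
    p = len(centre)
    out = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [row[i] - centre[i] for i in range(p)]
        for i in range(p):
            for j in range(p):
                out[i][j] += d[i] * d[j]
    return [[v * scale for v in r] for r in out]


def multivariate_ess(chain):
    """mESS = n * (det(Lambda) / det(Sigma)) ** (1 / p) for a list of p-vectors."""
    n, p = len(chain), len(chain[0])
    b = math.isqrt(n)  # batch size b_n = floor(sqrt(n))
    a = n // b         # number of full batches
    mean = [sum(x[i] for x in chain) / n for i in range(p)]
    lam = _scatter(chain, mean, 1.0 / (n - 1))  # sample covariance
    batch_means = [
        [sum(x[i] for x in chain[k * b:(k + 1) * b]) / b for i in range(p)]
        for k in range(a)
    ]
    sigma = _scatter(batch_means, mean, b / (a - 1))  # batch means estimate
    return n * (_det(lam) / _det(sigma)) ** (1.0 / p)


def ar1_chain(rho, n, p, rng):
    """Hypothetical test chain: p independent AR(1) coordinates."""
    x, out = [0.0] * p, []
    for _ in range(n):
        x = [rho * xi + rng.gauss(0.0, 1.0) for xi in x]
        out.append(x)
    return out


rng = random.Random(1)
sticky = multivariate_ess(ar1_chain(0.9, 10_000, 2, rng))
independent = multivariate_ess(ar1_chain(0.0, 10_000, 2, rng))
print(round(sticky), round(independent))
```

A strongly autocorrelated chain yields a much smaller mESS than an independent chain of the same length, which is the behaviour a stopping rule based on effective sample size relies on.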

Date: 27 February 2018, CB 3.7, 14:15 CANCELLED

Designing an adaptive trial with treatment selection and a survival endpoint

Chris Jennison (University of Bath)

Abstract: In some Phase III clinical trials, more than one new treatment is compared to the control treatment. Such a trial requires a larger sample size than a two-arm trial. However, this sample size can be reduced by choosing to focus on one of the new treatments part way through the trial. We consider a clinical trial in which two versions of a new treatment are compared against control, with the primary endpoint of overall survival. At an interim analysis, mid-way through the trial, one of the two treatments is selected, based on the short-term response of progression-free survival. In the remainder of the trial, new patients are randomised between the selected treatment and the control. For such an adaptive design, the familywise type I error rate can be protected by use of a closed testing procedure to deal with the two null hypotheses and combination tests to combine data from before and after the interim analysis. However, with the primary endpoint of overall survival, there is still a danger of inflating the type I error rate: we present a way of applying the combination test that solves this problem simply and effectively. With the methodology in place, we then assess the potential benefits of treatment selection in this adaptive trial design.

Date: 6 March 2018, CB 3.7, 14:15 CANCELLED

TBA

David Robertson (MRC Biostatistics Unit, University of Cambridge)

Abstract: TBA

Date: 13 March 2018, CB 3.7, 14:15 CANCELLED

TBA

Maria De Iorio (University College London)

Abstract: TBA

Date: 20 March 2018, CB 3.7, 14:15

Progress on the connection between spectral embedding and network models used by the probability, statistics and machine-learning communities

Patrick Rubin-Delanchy (University of Bristol)

Abstract: In this talk, I give theoretical and methodological results, based on work spanning Johns Hopkins, the Heilbronn Institute for Mathematical Research, Imperial and Bristol, regarding the connection between various graph spectral methods and commonly used network models which are popular in the probability, statistics and machine-learning communities. An attractive feature of the results is that they lead to very simple take-home messages for network data analysis: a) when using spectral embedding, consider eigenvectors from both ends of the spectrum; b) when implementing spectral clustering, use Gaussian mixture models, not k-means; c) when interpreting spectral embedding, think of "mixtures of behaviour" rather than "distance". Results are illustrated with cyber-security applications.

Date: 10 April 2018, CB 3.7, 14:15

A Dirichlet Process Tour

Tom Fincham Haines (University of Bath)

Abstract: I will introduce the Dirichlet process, as used in non-parametric Bayesian models when you want them to dynamically adjust how many discrete elements are used, depending on the data. This talk will demonstrate its use to solve a variety of machine learning and computer vision problems using both Gibbs sampling (MCMC) and mean field variational techniques.
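One common way to see how a Dirichlet process lets the number of components grow with the data is its predictive rule, the Chinese restaurant process. This Python sketch draws cluster assignments from the prior only, assuming a concentration parameter `alpha`; it is not the Gibbs sampler or the mean field variational scheme covered in the talk, and the function name is illustrative.

```python
import random


def chinese_restaurant_process(n, alpha, seed=0):
    """Sample cluster labels for n items from a CRP with concentration alpha.

    When i items have been seated, the next item joins existing cluster k with
    probability size_k / (i + alpha) and opens a new cluster with probability
    alpha / (i + alpha).
    """
    rng = random.Random(seed)
    sizes = []   # sizes[k] = number of items already in cluster k
    labels = []
    for _ in range(n):
        weights = sizes + [alpha]  # existing clusters first, new cluster last
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
        labels.append(k)
    return labels, sizes


labels, sizes = chinese_restaurant_process(200, alpha=2.0, seed=42)
print(len(sizes), "clusters with sizes", sizes)
```

Larger values of `alpha` make new clusters more likely; for fixed `alpha`, the expected number of clusters grows roughly logarithmically in the number of items.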

Date: 17 April 2018, CB 3.7, 14:15

A regional random effects model for peaks-over-threshold flood events

Emma Eastoe (Lancaster University)

Abstract: Statistical models for the extreme events of environmental data sets must often account for temporal non-stationarity. In this talk we look at peaks-over-threshold river flow data, which consist of the times and sizes of the peak flows of flooding events. Our goal is to model the event sizes whilst accounting for non-stationarity. If event sizes are assumed to be stationary over time, then an appropriate statistical model is given by the generalised Pareto distribution. However, the assumption of stationarity is mostly invalid, since the behaviour of event sizes varies across years under the influence of other climate-related processes, e.g. precipitation. If observations have been made on these underlying processes, then regression methods can be used. However, such observations are rarely available and, even if they are, it is often not clear which combination of covariates to include in the model. We develop a regional random effects model which accounts for non-stationarity in event sizes without the need for any measurements on underlying processes. This model can be used to predict both unconditional extreme events, such as the m-year maximum, and extreme events that condition on the value of the random effect. The random effects also provide information on likely candidates for which underlying climate-related processes cause variability in flood magnitudes. The model is applied to UK flood data.

Date: 24 April 2018, CB 3.7, 14:15

Geometric MCMC for infinite-dimensional inverse problems

Alex Beskos (University College London)

Abstract: Bayesian inverse problems often involve sampling posterior distributions on infinite-dimensional function spaces. Traditional Markov chain Monte Carlo (MCMC) algorithms are characterized by deteriorating mixing times upon mesh-refinement, when the finite-dimensional approximations become more accurate. Such methods are typically forced to reduce step-sizes as the discretization gets finer, and thus are expensive as a function of dimension. Recently, a new class of MCMC methods with mesh-independent convergence times has emerged. However, few of them take into account the geometry of the posterior informed by the data. At the same time, recently developed geometric MCMC algorithms have been found to be powerful in exploring complicated distributions that deviate significantly from elliptic Gaussian laws, but are in general computationally intractable for models defined in infinite dimensions. In this work, we combine geometric methods on a finite-dimensional subspace with mesh-independent infinite-dimensional approaches. Our objective is to speed up MCMC mixing times, without significantly increasing the computational cost per step (for instance, in comparison with the vanilla preconditioned Crank–Nicolson (pCN) method). This is achieved by using ideas from geometric MCMC to probe the complex structure of an intrinsic finite-dimensional subspace where most data information concentrates, while retaining robust mixing times as the dimension grows by using pCN-like methods in the complementary subspace. The resulting algorithms are demonstrated in the context of three challenging inverse problems arising in subsurface flow, heat conduction and incompressible flow control. The algorithms exhibit up to two orders of magnitude improvement in sampling efficiency when compared with the pCN method.
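For reference, the vanilla pCN baseline mentioned above is short to write down: the proposal v = √(1 − β²) u + β ξ, with ξ drawn from the Gaussian prior, leaves the prior invariant, so the acceptance probability min(1, exp(Φ(u) − Φ(v))) involves only the negative log-likelihood Φ. The Python sketch below assumes a diagonal Gaussian prior on a truncated, discretised unknown and a toy potential observing one coordinate; it shows the baseline only, not the geometric subspace methods of the talk, and all names and values are illustrative.

```python
import math
import random


def pcn_sample(potential, prior_sd, n_steps, beta=0.2, seed=0):
    """Preconditioned Crank-Nicolson MCMC.

    Target: posterior with density proportional to exp(-potential(u)) with
    respect to a Gaussian prior N(0, diag(prior_sd**2)) on a discretised u.
    """
    rng = random.Random(seed)
    u = [rng.gauss(0.0, s) for s in prior_sd]  # start from a prior draw
    phi_u = potential(u)
    root = math.sqrt(1.0 - beta ** 2)
    samples, accepted = [], 0
    for _ in range(n_steps):
        # The proposal leaves the prior invariant: v = sqrt(1 - beta^2) u + beta xi.
        v = [root * ui + beta * rng.gauss(0.0, s) for ui, s in zip(u, prior_sd)]
        phi_v = potential(v)
        # Only the potential (negative log-likelihood) enters the acceptance step.
        if rng.random() < math.exp(min(0.0, phi_u - phi_v)):
            u, phi_u = v, phi_v
            accepted += 1
        samples.append(u)
    return samples, accepted / n_steps


def toy_potential(u):
    """Hypothetical potential: one noisy observation (value 1.0, sd 0.1) of u[0]."""
    return (u[0] - 1.0) ** 2 / (2 * 0.1 ** 2)


prior_sd = [1.0 / j for j in range(1, 21)]  # hypothetical decaying prior spectrum
samples, acc_rate = pcn_sample(toy_potential, prior_sd, n_steps=20_000, seed=3)
kept = samples[2_000:]
post_mean = sum(s[0] for s in kept) / len(kept)
print(acc_rate, post_mean)  # exact posterior mean of u[0] is 100/101
```

Because the prior part of the target cancels in the acceptance ratio, refining the discretisation (adding more coordinates to `prior_sd`) does not by itself drive the acceptance rate to zero, which is the mesh-independence property the abstract refers to.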

Date: 1 May 2018, CB 3.7, 14:15

Detection & attribution of large scale drivers for flood risk in the UK

Abstract: In this talk, we investigate the attribution of trends in peak river flows to large-scale climate drivers such as the North Atlantic Oscillation (NAO) and the Eastern Atlantic (EA) Index. We focus on a set of near-natural "benchmark" catchments in the UK in order to detect trends not caused by anthropogenic changes, and aim to attribute trends in peak river flows to some climate indices. To improve the power of our approach compared with at-site testing, we propose modelling all stations together in a Bayesian framework. This approach leads to the detection of a clear countrywide time trend. Additionally, the EA appears to have a considerable association with peak river flows, particularly in the south east of the UK, while the effect of NAO appears to be minor across the UK. When a multivariate approach is taken to detect collinearity between climate indices and time, the association between NAO and peak flows disappears, while the association with EA remains strong.

A Decision Theoretic Approach for Phase II/III Programmes

Robbie Peck (University of Bath)

Abstract: Drug development involves the problem of identifying potentially efficacious treatments or doses (phase II), and providing evidence of their efficacy (phase III). This may be done in a phase II/III programme. The design may be adaptive, meaning that the design of the rest of the programme may be changed based on previously observed data. One can apply a Bayesian decision theoretic framework to make Bayes decisions at each stage in the programme based upon the data observed so far. These decisions may include the choice of treatment/dose and the number of patients required. The framework requires a choice of gain function, and each Bayes decision maximises the expected value of this gain function. I shall illustrate this approach through three case studies.

Date: 8 May 2018, CB 3.7, 14:15

Renewable energy analytics: how to model uncertainties in the energy demand and supply

Jooyoung Jeon (University of Bath)

Abstract: The future electrical grid will have unprecedented complexity and uncertainty. The cost of low-carbon technologies, such as PV, wind, electric vehicles and battery storage, is rapidly decreasing, and they are increasingly being connected to the edge of the grid. The rise of prosumers, for whom the distinction between energy buyer and seller is increasingly blurred, is far beyond the capability of the current market and system operation framework. In view of this, this research proposes an uncertainty quantification modelling framework for a prototype peer-to-peer energy trading/sharing (P2P-ETS) platform, which will lead to a unique scalable marketplace for mass prosumers to buy/sell/share energy themselves. In detail, the research explores how to model and evaluate the forecast uncertainties (1) in the energy supply from wind, using hierarchical density forecasting techniques, and (2) in the energy demand measured by smart meters, using stochastic optimisation based on density forecasts.

# Spring 2017

### Organiser: Karim Anaya-Izquierdo

Date: 14 February 2017, CB 3.7, 14:15

Kate Button & Michelle St Clair (Bath) (Joint seminar)

(Kate Button) Personalising psychological care

Abstract: Depression and anxiety are leading causes of disability in the UK. Improving access to psychological therapies (IAPT) aims to reduce this disability by making ‘talking therapies’ available through the NHS. IAPT has been a success, providing therapy to those who would have otherwise not had access, and half of patients referred make a full recovery. However, we can still do better. The aim of this research is to use routinely collected IAPT data to identify optimal care regimes for a given patient. By providing evidence to tailor psychological care to the individual, we aim to further improve recovery rates.

(Michelle St Clair) Statistics and Human Development: Characterising developmental trajectories and (causal) pathways through childhood, adolescence and adulthood

Abstract: I will give a short overview of my research on using large-scale longitudinal projects and/or longitudinal cohort databases to evaluate developmental trajectories with complex multivariate person-centred and variable-centred statistical techniques. I will also discuss some work that looks at possible causal pathways or relationships between experiences in early life and outcomes in later life using longitudinal cohort data.

Date: 17 Feb 2017, 4W 1.7, 15:15 (Landscapes seminar)

Surfaces, shapes and anatomy

Abstract: Three-dimensional surface imaging, through laser-scanning or stereo-photogrammetry, provides high-resolution data defining the surface shape of objects. In an anatomical setting this can provide invaluable quantitative information, for example on the success of surgery. Two particular applications are assessing the success of facial surgery and studying developmental issues with associated facial shapes. An initial challenge is to extract suitable information from these images, to characterise the surface shape in an informative manner. Landmarks are traditionally used to good effect, but they clearly do not adequately represent the much richer information present in each digitised image.

Curves with clear anatomical meaning provide a good compromise between informative representations of shape and simplicity of structure, as well as providing guiding information for full surface representations. Some of the issues involved in analysing data of this type will be discussed and illustrated. Modelling issues include the measurement of asymmetry and longitudinal patterns of growth.

Date: 14 March 2017, CB 3.7, 13:15

Kari Heine (Bath)

TBA

Date: 28 March 2017, CB 3.7, 13:15

Paul Northrop (UCL)

Extreme value threshold selection

Abstract: A common form of extreme value modelling involves modelling excesses of a threshold by a generalised Pareto (GP) distribution. The GP model arises by considering the possible limiting distributions of excesses as the threshold increases. Selecting too low a threshold leads to bias from model mis-specification; raising the threshold increases the variance of estimators: a bias-variance trade-off. Many existing threshold selection methods do not address this trade-off directly, but rather aim to select the lowest threshold above which the GP model is judged to hold approximately. We use Bayesian cross-validation to address the trade-off by comparing thresholds based on predictive ability at extreme levels. Extremal inferences can be sensitive to the choice of a single threshold. We use Bayesian model averaging to combine inferences from many thresholds, thereby reducing sensitivity to the choice of a single threshold. The methodology is illustrated using significant wave height datasets from the North Sea and from the Gulf of Mexico.
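The threshold u enters directly into extrapolation from a fitted generalised Pareto model. Under the standard parameterisation, with scale σ, shape ξ and threshold exceedance probability ζ_u, the level exceeded on average once every m observations is u + (σ/ξ)[(m ζ_u)^ξ − 1], or u + σ log(m ζ_u) when ξ = 0. The Python sketch below simply evaluates this formula; the Bayesian cross-validation and model averaging over thresholds discussed in the talk are not shown, and the parameter values in the example are made up for illustration.

```python
import math


def gp_return_level(m, threshold, scale, shape, exceed_prob):
    """Level exceeded on average once every m observations under a GP tail model.

    Assumes excesses of `threshold` follow a generalised Pareto distribution with
    scale sigma > 0 and shape xi, and that an observation exceeds the threshold
    with probability exceed_prob (zeta_u). Meaningful when m * exceed_prob > 1.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    rate = m * exceed_prob
    if abs(shape) < 1e-12:
        return threshold + scale * math.log(rate)  # xi = 0 (exponential tail) limit
    return threshold + scale / shape * (rate ** shape - 1.0)


# Made-up parameter values, e.g. from fits at two candidate thresholds.
print(gp_return_level(100, threshold=10.0, scale=2.0, shape=0.1, exceed_prob=0.1))
print(gp_return_level(100, threshold=12.0, scale=2.4, shape=0.05, exceed_prob=0.04))
```

Comparing such extrapolated levels across candidate thresholds is one simple way to see how sensitive extremal inferences can be to the choice of a single threshold.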

Date: 25 April 2017, CB 3.7, 13:15

Heather Battey (Imperial)

Exploring and exploiting new structured classes of covariance and inverse covariance matrices

Abstract: Estimation of covariance and inverse covariance (precision) matrices is an essential ingredient to virtually every modern statistical procedure. When the dimension, p, of the covariance matrix is large relative to the sample size, the sample covariance matrix is inconsistent in non-trivial matrix norms, and its non-invertibility renders many techniques in multivariate analysis impossible. Structural assumptions are necessary in order to restrain the estimation error, even if this comes at the expense of some approximation error if the structural assumptions fail to hold. I will introduce new structured model classes for estimation of large covariance and precision matrices. These model classes result from imposing sparsity in the domain of the matrix logarithm. After studying the structure induced in the original and inverse domains, I will then introduce estimators of both the covariance and precision matrix that exploit this structure. I derive the convergence rates of these estimators and show that they achieve a new minimax lower bound over classes of covariance and precision matrices whose matrix logarithm is sparse. The implication of this result is that the estimators are efficient and the minimax lower bound is sharp.
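A small NumPy sketch of the model class described above, assuming symmetric matrices so that the matrix exponential and logarithm can be computed from an eigendecomposition. The banded log-covariance is an arbitrary illustrative choice, and this is not the speaker's estimator. It shows two facts: sparsity of $$\log\Sigma$$ does not make $$\Sigma$$ sparse, and since $$\Sigma^{-1} = \exp(-\log\Sigma)$$, the precision matrix has the same log-domain sparsity pattern.

```python
import numpy as np


def sym_expm(a):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.exp(w)) @ v.T


def sym_logm(s):
    """Matrix logarithm of a symmetric positive definite matrix."""
    w, v = np.linalg.eigh(s)
    return (v * np.log(w)) @ v.T


p = 6
# Illustrative sparse log-covariance: diagonal plus a single band of off-diagonal entries
log_sigma = np.diag(np.linspace(-0.5, 0.5, p))
idx = np.arange(p - 1)
log_sigma[idx, idx + 1] = log_sigma[idx + 1, idx] = 0.3

sigma = sym_expm(log_sigma)       # symmetric positive definite for any symmetric log_sigma
precision = np.linalg.inv(sigma)  # inverse covariance

print(np.count_nonzero(log_sigma), "non-zeros in log(Sigma)")
print(np.count_nonzero(np.abs(sigma) > 1e-12), "non-zeros in Sigma")
# Sigma^{-1} = exp(-log Sigma): the precision matrix is sparse in the same log domain
print(np.allclose(sym_logm(precision), -log_sigma))
```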

Date: 2 May 2017, CB 3.7, 13:15

Anthony Lee (Warwick)

TBA

Date: 9 May 2017, CB 3.7, 13:15

Tiago de Paula Peixoto (Bath)

TBA

Date: 16 May 2017, CB 3.7, 13:15

Stats PhD students (Bath)

TBA

# Autumn 2016

Date: 11 October 2016, CB 5.8, 13:15

Daniel Falush (Bath)

The painting palettes of human ancestry

Abstract: Genomic technology is advancing at a remarkable pace and provides a great deal of information on our origins, but analysing these data requires new statistical technology. I will describe our chromosome painting approach to summarizing ancestry information. A Hidden Markov Model is used to fit each individual as a mosaic of the other individuals in the sample. A summary of this painting is used to subdivide the sample into populations with discrete ancestry profiles, using a merge-split sampler. I illustrate the application of this method to subdivide the British Isles into 17 regions with distinct ancestry profiles. Historical admixture events can be explored using mixture modelling. I show how non-linear least squares and curve fitting can be used to estimate global admixture events in the last 3,000 years.

Date: 25 October 2016, CB 5.8, 13:15

Francisco Javier Rubio (LSHTM)

Tractable Bayesian variable selection: beyond normality

Abstract: Bayesian variable selection for continuous outcomes often assumes normality, and so do its theoretical studies. There are sound reasons behind this assumption, particularly for large $$p$$: ease of interpretation, analytical and computational convenience. More flexible frameworks exist, including semi- or non-parametric models, often at the cost of losing some computational or theoretical tractability. We propose a simple extension of the Normal model that allows for skewness and thicker-than-normal tails but preserves its tractability. We show that a classical strategy to induce asymmetric Normal and Laplace errors via two-piece distributions leads to easy interpretation and a log-concave likelihood that greatly facilitates optimization and integration. We also characterize asymptotically its maximum likelihood estimator and Bayes factor rates under model misspecification. Our work focuses on the likelihood and can thus be combined with any likelihood penalty or prior, but here we adopt non-local priors, a family that induces extra sparsity and which we characterize under misspecification for the first time. Under suitable conditions Bayes factor rates are of the same order as those that would be obtained under the correct model, but we point out a potential loss of sensitivity to detect truly active covariates. Our examples show how a novel approach to infer the error distribution leads to substantial gains in sensitivity, thus warranting the effort to go beyond normality, whereas for near-normal data one can get substantial speedups relative to assuming unnecessarily flexible models.

The methodology is available as part of R package mombf.

Joint work with David Rossell.
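The abstract does not spell out the parameterisation, so the sketch below assumes one common two-piece normal form (an overall scale $$\sigma$$ and a skewness parameter $$\alpha \in (-1,1)$$, with different scales on each side of the mode); it is not taken from the `mombf` package. It checks numerically the two properties the abstract relies on: the density is proper, and its logarithm is concave, since both halves are quadratic in $$x$$ and meet with zero slope at the mode.

```python
import numpy as np
from scipy import integrate


def two_piece_normal_logpdf(x, sigma, alpha):
    """Log-density of a two-piece normal distribution (assumed parameterisation).

    sigma > 0 is an overall scale and alpha in (-1, 1) controls skewness: the
    half-line x < 0 uses scale sigma*(1+alpha), the half-line x >= 0 uses
    sigma*(1-alpha). alpha = 0 recovers N(0, sigma^2).
    """
    x = np.asarray(x, dtype=float)
    scale = np.where(x < 0, sigma * (1 + alpha), sigma * (1 - alpha))
    return -np.log(sigma) - 0.5 * np.log(2 * np.pi) - 0.5 * (x / scale) ** 2


sigma, alpha = 1.5, 0.4


def pdf(t):
    """Scalar density, for use with scipy.integrate.quad."""
    return float(np.exp(two_piece_normal_logpdf(t, sigma, alpha)))


# Proper density: the two halves carry mass (1+alpha)/2 and (1-alpha)/2
left, _ = integrate.quad(pdf, -np.inf, 0.0)
right, _ = integrate.quad(pdf, 0.0, np.inf)
print(left, right, left + right)

# Log-concavity: second differences of the log-density are negative on a grid
grid = np.linspace(-10, 10, 2001)
second_diff = np.diff(two_piece_normal_logpdf(grid, sigma, alpha), n=2)
print(bool(np.all(second_diff < 0)))
```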

Date: 1 November 2016, CB 5.8, 13:15

Keming Yu (Brunel)

Abstract: The tail index is an important measure of the heavy-tailed behavior of a distribution, and its estimation from various types of data has become increasingly important. Tail-index regression is used when covariate information is available. Inference for tail-index regression faces two challenges: small-sample bias when analysing small to moderate-sized data sets, and storage and computational efficiency when dealing with massive data. In this paper we derive new statistical inference for tail-index regression based on Pareto-type and Burr XII distributions.

Date: 15 November 2016, CB 5.8, 13:15

Theresa Smith (Bath)

Age-period-cohort models for cancer incidence

Abstract: Age-period-cohort models have been used to examine and forecast cancer incidence and mortality for over three decades. However, the fitting and interpretation of these models require great care because of a well-known identifiability problem: given any two of age, period and cohort, the third is determined.

In this talk I introduce APC models and the identifiability problem. I examine proposed “solutions” to this problem and approaches based on an identifiable parameterization. I conclude with an analysis of cancer incidence data from Washington State and a discussion of future research directions.

Date: 21 November 2016, CB 3.16, 13:15 (Note different time and venue)

The iterated auxiliary particle filter

Abstract: We present an offline, iterated particle filter to facilitate statistical inference in general state space hidden Markov models. Given a model and a sequence of observations, the associated marginal likelihood $$L$$ is central to likelihood-based inference for unknown statistical parameters. We define a class of “twisted” models: each member is specified by a sequence of positive functions $$\psi$$ and has an associated $$\psi$$-auxiliary particle filter that provides unbiased estimates of $$L$$. We identify a sequence $$\psi^*$$ that is optimal in the sense that the $$\psi^*$$-auxiliary particle filter’s estimate of $$L$$ has zero variance. In practical applications, $$\psi^*$$ is unknown, so the $$\psi^*$$-auxiliary particle filter cannot straightforwardly be implemented. We use an iterative scheme to approximate $$\psi^*$$, and demonstrate empirically that the resulting iterated auxiliary particle filter significantly outperforms the bootstrap particle filter in challenging settings. Applications include parameter estimation using a particle Markov chain Monte Carlo algorithm and approximation of conditioned diffusion sample paths. [arXiv: 1511.06286]

Joint work with Pieralberto Guarniero and Anthony Lee

Date: 22 November 2016, CB 5.8, 13:15

Chris Jennison (Bath)

Search and Jump Algorithm for Markov Chain Monte Carlo Sampling

Abstract: MCMC sampling is now established as a fundamental tool in statistical inference but there are still problems to solve. MCMC samplers can mix slowly when the target distribution has multiple modes. A more insidious problem arises when sampling a distribution that is concentrated on a thin sub-region of a high-dimensional sample space. I shall present a new approach to mode-jumping and show how this can be used to sample from some challenging “thin” distributions.

Joint work with Adriana Ibrahim, University of Malaya.

Date: 6 December 2016, CB 5.8, 13:15

Sam Livingstone (Bristol)

Some recent advances in dynamics-based Markov chain Monte Carlo

Abstract: Markov chain Monte Carlo methods based on continuous-time dynamics such as Langevin diffusions and Hamiltonian flow are among the state of the art when performing inference for challenging models in many application areas. I will talk about some statistical models which Markov chains produced by these methods can explore well, and others for which they often struggle to do so. I’ll discuss some existing and new algorithms that use gradient information and/or exploit the geometry of the space through an appropriate Riemannian metric, and how these inputs can both positively and negatively affect exploration, using the notion of geometric ergodicity for Markov chains.

Date: 13 December 2016, CB 5.8, 13:15

Ilaria Prosdocimi (Bath)

A statistician’s wander into flood hydrology

Abstract: In the design and maintenance of structures such as dams or drainage networks, it is essential to be able to obtain reliable estimates of the magnitude and frequency of extreme events such as high river flow and rainfall totals. This talk will discuss methods to perform such estimation, focusing on the similarities and differences between the approaches developed by statisticians and by civil engineers. Rather than presenting final results, the talk will focus on the open challenges in statistical methods for flood frequency estimation and will suggest possible future research avenues.

# Spring 2016

Date: 19 April 2016, CB 5.1, 14:15

Nick Whiteley (Bristol)

Variance estimation in the particle filter

Abstract: Particle filters provide sampling-based approximations of marginal likelihoods and filtering expectations in hidden Markov models. However, estimating the Monte Carlo variance of these approximations, without generating multiple independent realizations of the approximations themselves, is not straightforward. We present an unbiased estimator of the variance of the marginal likelihood approximation, and consistent estimators of the asymptotic variance of the approximations of the marginal likelihood and filtering expectations. These estimators are byproducts of a single run of a particle filter and have no added computational complexity or storage requirements. With additional storage requirements, one can also consistently estimate higher-order terms in the non-asymptotic variance. This information can be used to approximate the variance-optimal allocation of particle numbers.

Joint work with Anthony Lee, University of Warwick
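To make the setting concrete, here is a minimal bootstrap particle filter for an assumed linear Gaussian AR(1)-plus-noise model, written with NumPy, where the exact marginal likelihood is available from a Kalman filter for comparison. The model, parameter values and function names are illustrative choices. The variance is computed here by brute force over 20 independent runs, which is precisely the repeated-realization approach the talk's single-run estimators are designed to avoid; those estimators are not implemented here.

```python
import numpy as np


def simulate(T, phi, sv, sw, rng):
    """Simulate y_1..y_T from x_t = phi*x_{t-1} + sv*v_t, y_t = x_t + sw*w_t (stationary start)."""
    x = np.empty(T)
    x[0] = rng.normal(0.0, sv / np.sqrt(1 - phi**2))
    for t in range(1, T):
        x[t] = phi * x[t - 1] + sv * rng.normal()
    return x + sw * rng.normal(size=T)


def kalman_loglik(y, phi, sv, sw):
    """Exact log marginal likelihood via the Kalman filter, used as a reference value."""
    m, P = 0.0, sv**2 / (1 - phi**2)          # predictive mean and variance of x_1
    ll = 0.0
    for yt in y:
        S = P + sw**2                           # predictive variance of y_t
        ll += -0.5 * (np.log(2 * np.pi * S) + (yt - m) ** 2 / S)
        K = P / S                               # Kalman gain
        m, P = m + K * (yt - m), (1 - K) * P    # filtered moments of x_t
        m, P = phi * m, phi**2 * P + sv**2      # predictive moments of x_{t+1}
    return ll


def bootstrap_pf_loglik(y, phi, sv, sw, N, rng):
    """Log of the bootstrap particle filter's (unbiased) marginal likelihood estimate."""
    x = rng.normal(0.0, sv / np.sqrt(1 - phi**2), size=N)  # particles from p(x_1)
    ll = 0.0
    for t, yt in enumerate(y):
        if t > 0:
            x = phi * x + sv * rng.normal(size=N)       # propagate through the dynamics
        logw = -0.5 * (np.log(2 * np.pi * sw**2) + (yt - x) ** 2 / sw**2)
        shift = logw.max()                              # stabilise the exponentials
        w = np.exp(logw - shift)
        ll += shift + np.log(w.mean())                  # log of the mean weight
        x = x[rng.choice(N, size=N, p=w / w.sum())]     # multinomial resampling
    return ll


rng = np.random.default_rng(0)
y = simulate(50, 0.9, 1.0, 1.0, rng)
exact = kalman_loglik(y, 0.9, 1.0, 1.0)
# Brute force: repeat the filter to see the Monte Carlo spread of the estimate
runs = np.array([bootstrap_pf_loglik(y, 0.9, 1.0, 1.0, 1000, rng) for _ in range(20)])
print(f"exact {exact:.3f}, PF mean {runs.mean():.3f}, PF variance {runs.var():.4f}")
```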

Date: 26 April 2016, CB 5.1, 14:15

Thomai Tsiftsi (Bath)

Statistical shape analysis in a Bayesian framework for shapes in two and three dimensions

Abstract: Shape analysis is an integral part of object classification and has been used as a tool by many branches of science such as computer vision, pattern recognition and shape classification. In this talk I will present a novel shape classification method which is embedded in the Bayesian paradigm and draws on geometrical statistics as well as differential geometry. I will focus on the statistical classification of planar shapes, using techniques which replace some previous approximate results by analytic calculations in closed form. This gives rise to a new Bayesian shape classification algorithm whose efficiency was tested on available shape databases. Finally, I will conclude by demonstrating the extension of the proposed classification algorithm to shapes in three dimensions.

Date: 3 May 2016, CB 5.1, 14:15

Germain Van Bever (Open University)

Hilbertian Fourth Order Blind Identification

Abstract: In the classical Independent Component (IC) model, the observations $$X_1,\dots,X_n$$ are assumed to satisfy $$X_i=\Omega Z_i$$, $$i=1,\dots,n$$, where the $$Z_i$$’s are i.i.d. random vectors with independent marginals and $$\Omega$$ is the mixing matrix. Independent component analysis (ICA) encompasses the set of all methods aiming at unmixing $$X=(X_1,\dots,X_n)$$, that is, estimating a (non-unique) unmixing matrix $$\Gamma$$ such that $$\Gamma X_i$$, $$i=1,\dots,n$$, has independent components. Cardoso (1989) introduced the celebrated Fourth Order Blind Identification (FOBI) procedure, in which an estimate of $$\Gamma$$ is obtained from the regular covariance matrix and a scatter matrix based on fourth moments. Building on robustness considerations and generalizing FOBI, Invariant Coordinate Selection (ICS; Tyler et al., 2009) was originally introduced as an exploratory tool generating an affine invariant coordinate system. The resulting coordinates, however, are proved to be independent in most IC models.

Nowadays, functional data (FD) occur more and more often in practice, yet relatively few statistical techniques have been developed to analyze this type of data (see, for example, Ramsay and Silverman, 2006). Functional PCA (FPCA) is one such technique, which aims only at dimension reduction, with very few theoretical considerations. In this talk, we propose an extension of the FOBI methodology to the case of Hilbertian data, FD being the go-to example used throughout. When dealing with distributions on Hilbert spaces, two major problems arise: (i) the scatter operator is, in general, non-invertible and (ii) there may not exist two different affine equivariant scatter functionals. Projections on finite-dimensional subspaces and Karhunen-Loève expansions are used to overcome these issues and provide an alternative to FPCA. More importantly, we show that the proposed construction is Fisher consistent for the independent components of an appropriate Hilbertian IC model.

Affine invariance properties of the resulting FOBI components will be discussed and potential extension to a FICS procedure will be sketched. Simulated and real data are analyzed throughout the presentation to illustrate the properties and the potential benefits of the new tools.

This work is supported by the EPSRC grant EP/L010429/1.

References

• J.-F. Cardoso (1989), Source separation using higher moments. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2109-2112.

• D. Tyler, F. Critchley, L. Dümbgen and H. Oja (2009), Invariant co-ordinate selection. J. R. Statist. Soc. B, 71, 549-592.

• J. Ramsay and B. W. Silverman (2006), Functional Data Analysis, 2nd edn. Springer, New York.
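For readers unfamiliar with the finite-dimensional starting point, the classical FOBI procedure of Cardoso (1989) can be sketched in a few lines of NumPy: whiten with the covariance matrix, then rotate using the eigenvectors of the fourth-moment scatter matrix $$E[\|z\|^2 z z^\top]$$ of the whitened data. This sketch covers only the classical vector case, not the Hilbertian extension proposed in the talk; the sources and mixing matrix are illustrative. Components are recovered up to order, sign and scale, and only when their kurtoses are distinct.

```python
import numpy as np


def fobi(X):
    """Classical FOBI unmixing matrix for data X of shape (n, p), one observation per row.

    Returns Gamma (p x p) such that the columns of (X - mean) @ Gamma.T estimate
    the independent components, up to order and sign, provided their kurtoses differ.
    """
    Xc = X - X.mean(axis=0)
    w, v = np.linalg.eigh(np.cov(Xc, rowvar=False))
    cov_inv_sqrt = (v / np.sqrt(w)) @ v.T        # symmetric Cov^{-1/2}
    Y = Xc @ cov_inv_sqrt                          # whitened data
    r2 = np.sum(Y**2, axis=1)                      # squared norms ||y_i||^2
    B = (Y * r2[:, None]).T @ Y / Y.shape[0]       # fourth-moment scatter: mean of ||y||^2 y y^T
    _, U = np.linalg.eigh(B)                       # its eigenvectors rotate to the components
    return U.T @ cov_inv_sqrt


# Illustrative IC model X_i = Omega Z_i; the columns of S play the role of the Z_i
rng = np.random.default_rng(3)
n = 20_000
S = np.column_stack([rng.uniform(-1, 1, n), rng.normal(size=n), rng.laplace(size=n)])
Omega = np.array([[1.0, 0.5, 0.2], [0.3, 1.0, 0.4], [0.2, 0.1, 1.0]])
X = S @ Omega.T

Gamma = fobi(X)
S_hat = (X - X.mean(axis=0)) @ Gamma.T
# |correlation| between each recovered and each true component: close to a permutation matrix
corr = np.abs(np.corrcoef(S_hat.T, S.T)[:3, 3:])
print(np.round(corr, 2))
```

The uniform, normal and Laplace sources were chosen because their kurtoses differ; with two sources of equal kurtosis the corresponding eigenvalues of the fourth-moment scatter coincide and FOBI cannot separate them.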

# Autumn 2015

Date: 13 October 2015, 8W 2.13, 13:15

Evangelos Evangelou (Bath)

Writing and publishing your own R package: Some techniques and useful tools.

Abstract: Publishing an R package requires quality code but also adherence to CRAN policies. I will present some techniques for automating the process of creating and maintaining an R package, and some good practices from my experience as a package author.

Date: 20 October 2015, 8W 2.13, 13:15

Robin Evans (Oxford)

Causal Models and How to Refute Them

Abstract: Directed acyclic graph models (DAG models, also called Bayesian networks) are widely used in the context of causal inference, and they can be manipulated to represent the consequences of intervention in a causal system. However, DAGs cannot fully represent causal models with confounding; other classes of graphs, such as ancestral graphs and ADMGs, have been introduced to deal with this using additional kinds of edge, but we show that these are not sufficiently rich to capture the range of possible models. In fact, no mixed graph over the observed variables is rich enough, regardless of how many edges are used. Instead we introduce mDAGs, a class of hyper-graphs appropriate for representing causal models when some of the variables are unobserved. Results on the Markov equivalence of these models show that when interpreted causally, mDAGs are the minimal class of graphs which can be sensibly used. Understanding such equivalences is critical for the use of automatic causal structure learning methods, a topic in which there is considerable interest. We elucidate the state of the art as well as some open problems.

Date: 27 October 2015, 8W 2.13, 13:15

Jonty Rougier (Bristol)

Predicting large explosive eruptions for individual volcanoes

Abstract: Large explosive volcanic eruptions can be devastating, given that many volcanoes capable of such eruptions are close to cities. But data, on which predictions could be based, is very limited. Globally, such eruptions happen about once every two years, but the record is rapidly thinned going backwards in time, where the rate of under-recording depends not just on time, but also on location and magnitude. I describe our approach to assessing the under-recording rate, and to making predictions for sets of volcanoes with similar recorded histories, based on an exchangeable model of eruption rates. This is part of our larger project to provide a return period curve for each volcano. This is joint work with volcanologists Profs Steve Sparks and Kathy Cashman.

Date: 10 November 2015, 8W 2.13, 13:15

Haakon Bakka (NTNU)

A spatial random effect with one range for each region (the Difficult Terrain model component)

Abstract: Classical models in spatial statistics assume that the correlation between two points depends only on the distance between them (i.e. the models are stationary). In practice, however, the shortest distance may not be appropriate. Real life is not stationary! For example, when modelling fish near the shore, correlation should not take the shortest path going across land, but should travel along the shoreline. In ecology, animal movement depends on the terrain or the existence of animal corridors. We will show how this kind of information can be included in a spatial non-stationary model, by defining a different spatial range (distance) in each region.

We will answer the following questions:

• How to make a model with one range in each region?
• Is the algorithm fast enough for real data? (Hint: Yes!)
• How to avoid overfitting with flexible random effects?
• How to interpret the inference when you have flexible random effects?
• How do we model a point process with different cluster sizes in different regions, without changing the average number of points?

Date: 17 November 2015, 8W 2.13, 13:15

Daniel Williamson (Exeter)

Earth system models and probabilistic Bayesian calibration: a screw meets a hammer?

Abstract: The design and analysis of computer experiments, now called “Uncertainty Quantification” or “UQ” has been an active area of statistical research for 25 years. One of the most high profile methodologies, that of calibrating a complex computer code using the Bayesian solution to the inverse problem as described by Kennedy and O’Hagan’s seminal paper in 2001, has become something of a default approach to tackling applications in UQ and has over 1200 citations. However, is this always wise? Though the method is well tested and arguably appropriate for many types of model, particularly those for which large amounts of data are readily available and in which the limitations of the underlying mathematical expressions and solvers are well understood, many models, such as those found in climate simulation, go far beyond those successfully studied in terms of non-linearity, run time, output size and complexity of the underlying mathematics.

Have we really solved the calibration problem? To what extent is our “off the shelf approach” appropriate for the problems faced in fields such as Earth system modelling? In this talk we will discuss some of the known limitations of the Bayesian calibration framework (and some perhaps unknown) and we explore the extent to which the conditions in which calibration is known to fail are met in climate model problems. We will then present and argue for an alternative approach to the problem and apply it to an ocean GCM known as NEMO.

Date: 24 November 2015, 8W 2.13, 14:15 (Note the kick-off time of 2:15)

Georg Lindgren (Lund)

Stochastic models for ocean waves - Gaussian fields made more realistic by Rice formula and some physics

Abstract: Gaussian fields were introduced in the early fifties as models for irregular ocean waves and they have been used in ocean engineering ever since. A simple modification leads to the more realistic stochastic Lagrange models, which account for horizontal and vertical movements of individual water particles, leading to realistic asymmetry of the generated waves.

Rice's formula for the expected number of level crossings, and its modern extension to “level sets”, make it possible to derive exact statistical distributions for many important wave characteristics, like steepness and asymmetry. In the talk I will describe the stochastic Lagrange model and some of its statistical properties.
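For reference, the classical one-dimensional form of Rice's formula mentioned above: for a zero-mean, stationary, mean-square differentiable Gaussian process $$X(t)$$ with variance $$\lambda_0 = \mathrm{Var}\,X(t)$$ and derivative variance $$\lambda_2 = \mathrm{Var}\,X'(t)$$, the expected number of upcrossings $$N_u^+(T)$$ of a level $$u$$ in a time interval of length $$T$$ is

$$E\left[N_u^+(T)\right] = \frac{T}{2\pi}\sqrt{\frac{\lambda_2}{\lambda_0}}\,\exp\left(-\frac{u^2}{2\lambda_0}\right).$$

Counting downcrossings as well doubles this. The Lagrange-model and level-set results described in the talk build on, and go beyond, this basic case.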

Date: 9 December 2015, CB 4.1, 13:15 (Note the different day (Wednesday) and venue)

John Copas (Warwick)

Model choice and goodness of fit

Abstract: How do we know whether a statistical model is sensible? The usual answer is to check that the model gives a reasonable fit to the data. The seminar will look at the variation between inferences based on different models, and show that this can be extremely large, even amongst models which seem to fit the data equally well. What does this tell us about most applications of statistics which completely ignore the problem of model uncertainty? What does it tell us about formal methods of model selection and model averaging which all, directly or indirectly, depend on model fit?