[1] "Created: Wed Apr  1 16:44:31 2015"

See the introduction for an overview.

library(lme4)
library(ggplot2)
options(digits=5,show.signif.stars=FALSE)

Load up and look at the data:

data(pulp, package="faraway")
pulp
   bright operator
1    59.8        a
2    60.0        a
3    60.8        a
4    60.8        a
5    59.8        a
6    59.8        b
7    60.2        b
8    60.4        b
9    59.9        b
10   60.0        b
11   60.7        c
12   60.7        c
13   60.5        c
14   60.9        c
15   60.3        c
16   61.0        d
17   60.8        d
18   60.6        d
19   60.5        d
20   60.5        d
ggplot(pulp, aes(x=operator, y=bright))+geom_point(position = position_jitter(width=0.1, height=0.0))

plot of chunk unnamed-chunk-3

Model

Fit a model with a fixed intercept and random effect for the operators:

mmod <- lmer(bright ~ 1+(1|operator), pulp)
summary(mmod)
Linear mixed model fit by REML ['lmerMod']
Formula: bright ~ 1 + (1 | operator)
   Data: pulp

REML criterion at convergence: 18.6

Scaled residuals: 
   Min     1Q Median     3Q    Max 
-1.467 -0.759 -0.124  0.628  1.601 

Random effects:
 Groups   Name        Variance Std.Dev.
 operator (Intercept) 0.0681   0.261   
 Residual             0.1062   0.326   
Number of obs: 20, groups:  operator, 4

Fixed effects:
            Estimate Std. Error t value
(Intercept)   60.400      0.149     404

Fixed Effects

There is only an intercept term which we are not normally interested in testing. We can use a normal approximation to the t-statistic given in the output of 404 to test the hypothesis that the intercept is zero. Clearly the intercept is very far from zero.

We can produce confidence intervals using the profile likelihood (which is based on the likelihood ratio test):

confint(mmod)
               2.5 %   97.5 %
.sig01       0.00000  0.61790
.sigma       0.23891  0.48218
(Intercept) 60.07130 60.72870

This also produces confidence intervals for the two random effect parameters.

Mixed Effects

The RLRsim package offers a convenient way of using the parametric bootstrap method:

library(RLRsim)
smod <- lmer(bright ~ 1+(1|operator), pulp, REML=FALSE)
nullmod <- lm(bright ~ 1, pulp)
exactLRT(smod, nullmod)

    simulated finite sample distribution of LRT. (p-value based on
    10000 simulated values)

data:  
LRT = 2.5684, p-value = 0.0232

We that the random effect term, operator is statistically significant.

The lmerTest package also allows a test on the random effect term:

library(lmerTest)
smod <- lmer(bright ~ 1+(1|operator), pulp, REML=FALSE)
rand(smod)
Analysis of Random effects Table:
         Chi.sq Chi.DF p.value
operator   3.47      1    0.06

This test uses the twice the differences in the log-likelihoods to construct a test statistic with a chi-squared null distribution to generate the p-value. As explained in the text, this test is not accurate because the null hypothesis that the variance is zero lies on the boundary of the parameter space. This makes the asymptotic approximation of the chi-square invalid.

MCMCglmm

We can also take a Bayesian approach to analyzing this data using the MCMCglmm package.

We fit the model using all the default settings which involve almost uninformative priors with 10,000 iterations and sampling every 10th case:

library(MCMCglmm)
bmod <- MCMCglmm(bright ~ 1, random= ~operator, data=pulp, verbose=FALSE)

Before we can start making any conclusions we need to verify that the MCMC has converged. We need to check for stability in the iterations:

First check the intercept term:

xyplot(bmod$Sol)

plot of chunk unnamed-chunk-9

This looks stable. We should also check the autocorrelation:

autocorr.plot(bmod$Sol)

plot of chunk unnamed-chunk-10

There no concern with correlation.

The same checks need to be made for the two variance terms:

xyplot(log(bmod$VCV))

plot of chunk unnamed-chunk-11

This works better on a log scale for variances. There is more concern for the operator variance although there is no trend. For the operator, the chain oscillates between a state of larger variance and one of very small variance.

autocorr.plot(bmod$VCV)

plot of chunk unnamed-chunk-12

The stronger correlation is apparent but not sufficient to stop us from proceeding.

summary(bmod)

 Iterations = 3001:12991
 Thinning interval  = 10
 Sample size  = 1000 

 DIC: 21.71 

 G-structure:  ~operator

         post.mean l-95% CI u-95% CI eff.samp
operator    0.0574 2.94e-17    0.263      246

 R-structure:  ~units

      post.mean l-95% CI u-95% CI eff.samp
units     0.159   0.0567    0.277      244

 Location effects: bright ~ 1 

            post.mean l-95% CI u-95% CI eff.samp  pMCMC
(Intercept)      60.4     60.1     60.7     1000 <0.001

For the interecept term, the posterior mean of 60.4 is identical with the lme4 fit. The credible (not confidence) interval is also similar to our previous findings. For the variance terms the results differ more from the lme4-based results.

Previously we tested the hypothesis that the operator variance was zero. This same question does not make sense in the Bayesian analysis although we can gain some insight by studying the posterior distribution of the operator variance. It is easier to understand as an SD rather than variance:

hist(sqrt(bmod$VCV[,1]),100, xlab="operator SD", main="posterior distribution")

plot of chunk unnamed-chunk-14

We can see that there is a high probability that the operator SD is rather small. Notice that the response is only measured to one decimal digit. Hence an SD of 0.1, meaning a variance of 0.01, would be hardly noticeable. We can estimate the probability that the operator variance is less than 0.01 as

mean(bmod$VCV[,1] < 0.01)
[1] 0.683

which is quite high. So we are quite confident that the operator variance is small especially compared to the error variance:

hist(sqrt(bmod$VCV[,2]),100, xlab="error SD", main="posterior distribution")

plot of chunk unnamed-chunk-16

In the previous frequentist based analysis, we tested the hypothesis that the operator variance is zero. The p-value as calculated by the more reliable parametric bootstrap method was smaller than 0.05 and so we reject this hypothesis. However, it is seems quite a strong claim to suggest that the operator variance is exactly zero. A weaker claim is that is small enough to ignore. The Bayesian analysis suggests that this claim is has considerable weight but there remains a reasonable chance that the operator variance is large enough that we cannot ignore it.

Although the MCMCglmm-based analysis provides new insights, there remains some concern over the model fitting. We need to be confident that the MCMC has converged correctly. We have shown how this can be checked but this a continuing concern. Furthermore, we have simply used the default choice of prior. It is sensible to see whether the results are sensitive to this choice.

Prior specification

We have used the default priors in the analysis above. We can experiment with different choices for the prior. The variance parameters have inverse Wishart priors which simplify to inverse gamma in one dimension. The parameter V and nu are related to the shape alpha and scale beta parameters in standard specification of the inverse gamma by: \[ \alpha = \nu/2 \qquad \beta=\nu V/2 \]

We define the inverse gamma density:

dinvgamma <- function(x,alpha=1,beta=1) beta^alpha*x^(-alpha-1)*exp(-beta/x)/gamma(alpha)

The default values are V=1, nu=0.02. This corresponds to a density of:

x <- (1:300)/100
plot(x, dinvgamma(x,0.01, 0.01), ylab="density", yaxs="i",type="l")

plot of chunk unnamed-chunk-18

As can be seen, this apparently puts more prior mass on smaller values of the variance. However, this is somewhat misleading when viewed on an additive scale.

If we increase nu to 0.5:

plot(x, dinvgamma(x,0.25, 0.25), ylab="density", yaxs="i",type="l")

plot of chunk unnamed-chunk-19

This makes the distribution more concentrated on moderate values of the variance.

Now decrease V to 0.2

plot(x, dinvgamma(x,0.25, 0.05), ylab="density", yaxs="i",type="l")

plot of chunk unnamed-chunk-20

This reduces the mean.

In this particular example, a more informative prior might suggest that there is going to be some variation between and within operators but we believe this variation is not going to be particular large or small. We can achieve this by setting mu to be higher than the default choice of 0.02 - we try 0.5. In MCMCglmm, R refers to the residual variation and G to the group variation.

iprior <- list(R=list(V=1,nu=0.5),G=list(G1=list(V=1,nu=0.5)))
bmodi <- MCMCglmm(bright ~ 1, random= ~operator, data=pulp,  verbose=FALSE, prior=iprior)

Diagnostics:

xyplot(bmodi$Sol)

plot of chunk unnamed-chunk-22

xyplot(log(bmodi$VCV))

plot of chunk unnamed-chunk-22

As can be seen, the chain mixing for the operator variance is much more satisfactory.

Summary output

summary(bmodi)

 Iterations = 3001:12991
 Thinning interval  = 10
 Sample size  = 1000 

 DIC: 18.2 

 G-structure:  ~operator

         post.mean l-95% CI u-95% CI eff.samp
operator     0.499   0.0372     1.48     1000

 R-structure:  ~units

      post.mean l-95% CI u-95% CI eff.samp
units      0.15   0.0695     0.27     1000

 Location effects: bright ~ 1 

            post.mean l-95% CI u-95% CI eff.samp  pMCMC
(Intercept)      60.4     59.7     61.1     1151 <0.001

There is a considerable difference with the previous MCMCglmm analysis. The operator variance is much larger and there is more variation in the intercept term.

Posterior distribution of the operator SD

hist(sqrt(bmodi$VCV[,1]),100, xlab="operator SD", main="posterior distribution")

plot of chunk unnamed-chunk-24

This is markedly different from the results using the defaut prior and puts more mass on larger values of the operator variance.

We can check again on the probability that the variance is less than 0.01 (or SD less than 0.1):

mean(bmodi$VCV[,1] < 0.01)
[1] 0

Hence the results are considerably different from before and we see that the outcome is sensitive to the choice of prior. But this should not be surprising as we have a small dataset with only 20 observations overall with only 4 different operators with which to estimate the operator variance. It is not surprising that the prior is influential in this example. Furthermore, if we examine the density of the prior we have used as plotted above, we see that this puts very little weight on very small values of the variance so this comes through in the posterior distribution.

Parameter Expansion

The problem with the default choice of prior was that this resulted in poor mixing properties in the MCMC. This instability is common when the variance approaches zero. When we explicitly set the prior to put little weight on near zero values for the variance, the problem was avoided but at the price of making a strong assumption about the size of the variance which we might not wish to make.

It is possible to avoid the mixing problems while retaining a prior that allows some weight to small variances. We can use a technique called parameter expansion which is explained in more detail in Hadfield (2010). We can specify an non-central scaled F-distribution prior for the variance. For example, here is a possible prior for the operator variance:

alpha.V <- 1000
alpha.mu <- 0
nu <- 0.02
plot(x,df(x/alpha.V, df1 = 1, df2 = nu, ncp = (alpha.mu^2)/alpha.V),type="l",ylab="Density",yaxs="i")

plot of chunk unnamed-chunk-26

We can see that it is comparable to the default prior in its shape. We can use this:

eprior <- list(R=list(V=1,nu=0.02),G=list(G1=list(V=1,nu=0.02,alpha.V=1000)))
bmode <- MCMCglmm(bright ~ 1, random= ~operator, data=pulp,  verbose=FALSE, prior=eprior)

and check the mixing properties:

xyplot(log(bmode$VCV))

plot of chunk unnamed-chunk-28

These are satisfactory. We now check the model summary:

summary(bmode)

 Iterations = 3001:12991
 Thinning interval  = 10
 Sample size  = 1000 

 DIC: 18.074 

 G-structure:  ~operator

         post.mean l-95% CI u-95% CI eff.samp
operator     0.623 1.29e-05     2.42      439

 R-structure:  ~units

      post.mean l-95% CI u-95% CI eff.samp
units     0.126   0.0536    0.241     1000

 Location effects: bright ~ 1 

            post.mean l-95% CI u-95% CI eff.samp  pMCMC
(Intercept)      60.4     59.8     61.3     1000 <0.001

Posterior distribution of the operator SD:

hist(sqrt(bmode$VCV[,1]),100, xlab="operator SD", main="posterior distribution")

plot of chunk unnamed-chunk-30

This is more like the second posterior above than the default prior-based version.

We can check again on the probability that the variance is less than 0.01 (or SD less than 0.1):

mean(bmode$VCV[,1] < 0.01)
[1] 0.034

We see a relatively small chance of a negligible operator SD.

We can compare the error and operator SDs:

plot(sqrt(operator) ~ sqrt(units), data=bmode$VCV, xlab="error SD",  ylab="operator SD")
abline(0,1)

plot of chunk unnamed-chunk-32

mean(bmode$VCV[,1] > bmode$VCV[,2])
[1] 0.554

We see that in slightly more than half the samples, the operator SD is larger than the error SD.

Random effect posterior distributions

In the Bayesian framework, it is natural to think of the operator random effects as distributions. We need to modify the MCMCglmm call to add pr=TRUE if we want to examine these posteriors:

bmode <- MCMCglmm(bright ~ 1, random= ~operator, data=pulp, verbose=FALSE, prior=eprior, pr=TRUE)
colMeans(bmode$Sol)
(Intercept)  operator.a  operator.b  operator.c  operator.d 
   60.38927    -0.11140    -0.25890     0.18911     0.23759 

These are the posterior means for the operator effects. We see these are similar to the BLUPs computed from the lme4 fit:

ranef(mmod)
$operator
  (Intercept)
a    -0.12194
b    -0.25912
c     0.16767
d     0.21340

We can also look at the distributions of these effects:

library(reshape2)
ref <- data.frame(bmode$Sol[,-1])
colnames(ref) <- letters[1:4]
rdf <- melt(ref)
ggplot(data=rdf,aes(x=value, color=variable))+geom_density()

plot of chunk unnamed-chunk-35

There is substantial overlap in the four distributions so there is no real evidence of a difference between any pair of operators.

Check on versions of R packages used:

sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] reshape2_1.2.2   MCMCglmm_2.21    ape_3.1-4        coda_0.16-1     
 [5] lattice_0.20-29  lmerTest_2.0-20  RLRsim_3.0       ggplot2_0.9.3.1 
 [9] lme4_1.1-7       Rcpp_0.11.3      Matrix_1.1-4     knitr_1.6       
[13] rmarkdown_0.2.68

loaded via a namespace (and not attached):
 [1] bitops_1.0-6        caTools_1.16        cluster_1.15.2     
 [4] colorspace_1.2-4    corpcor_1.6.7       dichromat_2.0-0    
 [7] digest_0.6.4        evaluate_0.5.3      formatR_0.10       
[10] Formula_1.1-1       gdata_2.13.3        gplots_2.13.0      
[13] grid_3.1.1          gtable_0.1.2        gtools_3.3.1       
[16] Hmisc_3.14-3        htmltools_0.2.4     KernSmooth_2.23-12 
[19] labeling_0.2        latticeExtra_0.6-26 MASS_7.3-33        
[22] minqa_1.2.3         munsell_0.4.2       nlme_3.1-117       
[25] nloptr_1.0.4        numDeriv_2012.9-1   plyr_1.8.1         
[28] proto_0.3-10        RColorBrewer_1.0-5  scales_0.2.3       
[31] splines_3.1.1       stringr_0.6.2       survival_2.37-7    
[34] tensorA_0.36        tools_3.1.1         yaml_2.1.11        

Summary

The various analysis are agreed on the fixed effect term (which is only the intercept). There is considerably more uncertainty in the magnitudes of the operator and error variances. We have enough information to be more precise about the error variance but there is still some uncertainty. It is harder to pin down the operator variance which is hardly surprising since we have effectively only four observations of operators.

References

Hadfield, Jarrod D. 2010. “MCMC Methods for Multi-Response Generalized Linear Mixed Models: The MCMCglmm R Package.” Journal of Statistical Software 33 (2): 1–22. http://www.jstatsoft.org/v33/i02/.