Nested Models

Fit the model
RLRsim
lmerTest
MCMCglmm

[1] "Created: Wed Apr  1 16:43:54 2015"

See the introduction for an overview.

library(lme4)
library(ggplot2)
options(digits=5,show.signif.stars=FALSE)

Load in and plot the data:

data(eggs, package="faraway")
summary(eggs)

      Fat         Lab    Technician Sample
 Min.   :0.060   I  :8   one:24     G:24  
 1st Qu.:0.307   II :8   two:24     H:24  
 Median :0.370   III:8                    
 Mean   :0.388   IV :8                    
 3rd Qu.:0.430   V  :8                    
 Max.   :0.800   VI :8

ggplot(eggs, aes(y=Fat, x=Lab, color=Technician, shape=Sample)) + geom_point(position = position_jitter(width=0.1, height=0.0))

plot of chunk unnamed-chunk-3

Fit the model

cmod <- lmer(Fat ~ 1 + (1|Lab) + (1|Lab:Technician) + (1|Lab:Technician:Sample), data=eggs)
summary(cmod)

Linear mixed model fit by REML ['lmerMod']
Formula: 
Fat ~ 1 + (1 | Lab) + (1 | Lab:Technician) + (1 | Lab:Technician:Sample)
   Data: eggs

REML criterion at convergence: -64.2

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.0410 -0.4658  0.0093  0.5971  1.5428 

Random effects:
 Groups                Name        Variance Std.Dev.
 Lab:Technician:Sample (Intercept) 0.00306  0.0554  
 Lab:Technician        (Intercept) 0.00698  0.0835  
 Lab                   (Intercept) 0.00592  0.0769  
 Residual                          0.00720  0.0848  
Number of obs: 48, groups:  
Lab:Technician:Sample, 24; Lab:Technician, 12; Lab, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)    0.388      0.043    9.02

Confidence intervals may be obtained:

confint(cmod)

               2.5 %  97.5 %
.sig01      0.000000 0.11520
.sig02      0.000000 0.17453
.sig03      0.000000 0.17943
.sigma      0.065502 0.11588
(Intercept) 0.296516 0.47848

These are based on a profile likelihood calculation. Notice that that all the random effect component SDs intervals include zero which suggest that none of these terms is marginally significant. However, the closeness to zero can cause problems in the calculation so it is worth confirming these calculations with a parametric bootstrap-based version:

confint(cmod, method="boot")

                                        2.5 %   97.5 %
sd_(Intercept)|Lab:Technician:Sample 0.000000 0.094151
sd_(Intercept)|Lab:Technician        0.000000 0.136405
sd_(Intercept)|Lab                   0.000000 0.158784
sigma                                0.060266 0.104808
(Intercept)                          0.301267 0.464150

The results are similar confirming our first impression regarding the variance components.

RLRsim

RLRsim only works for a single random effect. However, it is not difficult to use the parametric bootstrap to test the random effects as demonstrated in the textbook.

lmerTest

lmerTest restores some of the output seen in the textbook:

cmod <- lmer(Fat ~ 1 + (1|Lab) + (1|Lab:Technician) + (1|Lab:Technician:Sample), data=eggs)
summary(cmod)

Linear mixed model fit by REML ['lmerMod']
Formula: 
Fat ~ 1 + (1 | Lab) + (1 | Lab:Technician) + (1 | Lab:Technician:Sample)
   Data: eggs

REML criterion at convergence: -64.2

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.0410 -0.4658  0.0093  0.5971  1.5428 

Random effects:
 Groups                Name        Variance Std.Dev.
 Lab:Technician:Sample (Intercept) 0.00306  0.0554  
 Lab:Technician        (Intercept) 0.00698  0.0835  
 Lab                   (Intercept) 0.00592  0.0769  
 Residual                          0.00720  0.0848  
Number of obs: 48, groups:  
Lab:Technician:Sample, 24; Lab:Technician, 12; Lab, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)    0.388      0.043    9.02

In this case, the only addition is the test of the intercept term (which is of little interest in this example).

We can also test the random effects with this package:

cmodr <- lmer(Fat ~ 1 + (1|Lab) + (1|Lab:Technician), data=eggs)
anova(cmod, cmodr)

Data: eggs
Models:
cmodr: Fat ~ 1 + (1 | Lab) + (1 | Lab:Technician)
cmod: Fat ~ 1 + (1 | Lab) + (1 | Lab:Technician) + (1 | Lab:Technician:Sample)
      Df   AIC   BIC logLik deviance Chisq Chi Df Pr(>Chisq)
cmodr  4 -59.2 -51.7   33.6    -67.2                        
cmod   5 -58.8 -49.4   34.4    -68.8   1.6      1       0.21

We are reminded that we should use ML rather than REML estimates when constructing the test. Nevertheless, we would prefer to use the parametric bootstrap test shown earlier because of the difficulties of the LRT asymptotic distribution when the null hypothesis lies on the boundary.

MCMCglmm

The syntax of MCMCglmm requires that nested factors be uniquely labeled to distinguish, in this case, between technician one at lab one and lab two (who are different people). We use the expanded prior for the reasons established in our one random effect example.

library(MCMCglmm)
eggs$labtech <- paste0(eggs$Lab,eggs$Technician)
eggs$labtechsamp <- paste0(eggs$Lab,eggs$Technician,eggs$Sample)
eprior <- list(R=list(V=1,nu=0.02),G=list(G1=list(V=1,nu=0.02,alpha.V=1000),G2=list(V=1,nu=0.02,alpha.V=1000),G3=list(V=1,nu=0.02,alpha.V=1000)))
bmod <- MCMCglmm(Fat ~ 1,random=~Lab+labtech+labtechsamp,data=eggs,verbose=FALSE,prior=eprior)
xyplot(log(bmod$VCV))

plot of chunk unnamed-chunk-9

The diagnostics are satisfactory

summary(bmod)


 Iterations = 3001:12991
 Thinning interval  = 10
 Sample size  = 1000 

 DIC: -79.006 

 G-structure:  ~Lab

    post.mean l-95% CI u-95% CI eff.samp
Lab    0.0202 1.63e-07   0.0682      528

               ~labtech

        post.mean l-95% CI u-95% CI eff.samp
labtech     0.011 3.03e-06   0.0305      784

               ~labtechsamp

            post.mean l-95% CI u-95% CI eff.samp
labtechsamp   0.00401  3.2e-09   0.0118      908

 R-structure:  ~units

      post.mean l-95% CI u-95% CI eff.samp
units    0.0088   0.0045   0.0138     1000

 Location effects: Fat ~ 1 

            post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept)     0.385    0.253    0.534     1000 0.002

Looking at the posterior means, we see declining variance from the labs to the technician to the samples with the error variance falling somewhere between. The credible intervals indicate considerable uncertainty in these assessments.

We can view the posterior distributions for the SDs corresponding to the four components of variation:

library(reshape2)
ref <- data.frame(bmod$VCV)
rdf <- melt(ref, value.name="fat")
ggplot(rdf, aes(x=sqrt(fat),color=variable))+geom_density()

plot of chunk unnamed-chunk-11

We see that the error SD distribution is fairly compact but we find it hard to seperate the lab, technician and sample variations.

We can also examine bivariate posteriors. For example:

ggplot(ref, aes(x=sqrt(Lab),y=sqrt(labtech)))+geom_density2d()+geom_abline(int=0,slope=1)

plot of chunk unnamed-chunk-12

Here we see that while both the lab and technician variation might be close to zero seperately, it is very unlikely that they are both zero.

Check on versions of R packages used:

sessionInfo()

R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] reshape2_1.2.2   MCMCglmm_2.21    ape_3.1-4        coda_0.16-1     
 [5] lattice_0.20-29  ggplot2_0.9.3.1  lme4_1.1-7       Rcpp_0.11.3     
 [9] Matrix_1.1-4     knitr_1.6        rmarkdown_0.2.68

loaded via a namespace (and not attached):
 [1] boot_1.3-11        colorspace_1.2-4   corpcor_1.6.7     
 [4] dichromat_2.0-0    digest_0.6.4       evaluate_0.5.3    
 [7] formatR_0.10       grid_3.1.1         gtable_0.1.2      
[10] htmltools_0.2.4    labeling_0.2       MASS_7.3-33       
[13] minqa_1.2.3        munsell_0.4.2      nlme_3.1-117      
[16] nloptr_1.0.4       plyr_1.8.1         proto_0.3-10      
[19] RColorBrewer_1.0-5 scales_0.2.3       splines_3.1.1     
[22] stringr_0.6.2      tensorA_0.36       tools_3.1.1       
[25] yaml_2.1.11

Nested Models

Julian Faraway

Fit the model

RLRsim

lmerTest

MCMCglmm