[1] "Created: Wed Apr 1 16:43:54 2015"
See the introduction for an overview.
library(lme4)
library(ggplot2)
options(digits=5,show.signif.stars=FALSE)
Load in and plot the data:
data(eggs, package="faraway")
summary(eggs)
Fat Lab Technician Sample
Min. :0.060 I :8 one:24 G:24
1st Qu.:0.307 II :8 two:24 H:24
Median :0.370 III:8
Mean :0.388 IV :8
3rd Qu.:0.430 V :8
Max. :0.800 VI :8
ggplot(eggs, aes(y=Fat, x=Lab, color=Technician, shape=Sample)) + geom_point(position = position_jitter(width=0.1, height=0.0))
cmod <- lmer(Fat ~ 1 + (1|Lab) + (1|Lab:Technician) + (1|Lab:Technician:Sample), data=eggs)
summary(cmod)
Linear mixed model fit by REML ['lmerMod']
Formula:
Fat ~ 1 + (1 | Lab) + (1 | Lab:Technician) + (1 | Lab:Technician:Sample)
Data: eggs
REML criterion at convergence: -64.2
Scaled residuals:
Min 1Q Median 3Q Max
-2.0410 -0.4658 0.0093 0.5971 1.5428
Random effects:
Groups Name Variance Std.Dev.
Lab:Technician:Sample (Intercept) 0.00306 0.0554
Lab:Technician (Intercept) 0.00698 0.0835
Lab (Intercept) 0.00592 0.0769
Residual 0.00720 0.0848
Number of obs: 48, groups:
Lab:Technician:Sample, 24; Lab:Technician, 12; Lab, 6
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.388 0.043 9.02
Confidence intervals may be obtained:
confint(cmod)
2.5 % 97.5 %
.sig01 0.000000 0.11520
.sig02 0.000000 0.17453
.sig03 0.000000 0.17943
.sigma 0.065502 0.11588
(Intercept) 0.296516 0.47848
These are based on a profile likelihood calculation. Notice that that all the random effect component SDs intervals include zero which suggest that none of these terms is marginally significant. However, the closeness to zero can cause problems in the calculation so it is worth confirming these calculations with a parametric bootstrap-based version:
confint(cmod, method="boot")
2.5 % 97.5 %
sd_(Intercept)|Lab:Technician:Sample 0.000000 0.094151
sd_(Intercept)|Lab:Technician 0.000000 0.136405
sd_(Intercept)|Lab 0.000000 0.158784
sigma 0.060266 0.104808
(Intercept) 0.301267 0.464150
The results are similar confirming our first impression regarding the variance components.
RLRsim
only works for a single random effect. However, it is not difficult to use the parametric bootstrap to test the random effects as demonstrated in the textbook.
lmerTest
restores some of the output seen in the textbook:
cmod <- lmer(Fat ~ 1 + (1|Lab) + (1|Lab:Technician) + (1|Lab:Technician:Sample), data=eggs)
summary(cmod)
Linear mixed model fit by REML ['lmerMod']
Formula:
Fat ~ 1 + (1 | Lab) + (1 | Lab:Technician) + (1 | Lab:Technician:Sample)
Data: eggs
REML criterion at convergence: -64.2
Scaled residuals:
Min 1Q Median 3Q Max
-2.0410 -0.4658 0.0093 0.5971 1.5428
Random effects:
Groups Name Variance Std.Dev.
Lab:Technician:Sample (Intercept) 0.00306 0.0554
Lab:Technician (Intercept) 0.00698 0.0835
Lab (Intercept) 0.00592 0.0769
Residual 0.00720 0.0848
Number of obs: 48, groups:
Lab:Technician:Sample, 24; Lab:Technician, 12; Lab, 6
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.388 0.043 9.02
In this case, the only addition is the test of the intercept term (which is of little interest in this example).
We can also test the random effects with this package:
cmodr <- lmer(Fat ~ 1 + (1|Lab) + (1|Lab:Technician), data=eggs)
anova(cmod, cmodr)
Data: eggs
Models:
cmodr: Fat ~ 1 + (1 | Lab) + (1 | Lab:Technician)
cmod: Fat ~ 1 + (1 | Lab) + (1 | Lab:Technician) + (1 | Lab:Technician:Sample)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
cmodr 4 -59.2 -51.7 33.6 -67.2
cmod 5 -58.8 -49.4 34.4 -68.8 1.6 1 0.21
We are reminded that we should use ML rather than REML estimates when constructing the test. Nevertheless, we would prefer to use the parametric bootstrap test shown earlier because of the difficulties of the LRT asymptotic distribution when the null hypothesis lies on the boundary.
The syntax of MCMCglmm requires that nested factors be uniquely labeled to distinguish, in this case, between technician one at lab one and lab two (who are different people). We use the expanded prior for the reasons established in our one random effect example.
library(MCMCglmm)
eggs$labtech <- paste0(eggs$Lab,eggs$Technician)
eggs$labtechsamp <- paste0(eggs$Lab,eggs$Technician,eggs$Sample)
eprior <- list(R=list(V=1,nu=0.02),G=list(G1=list(V=1,nu=0.02,alpha.V=1000),G2=list(V=1,nu=0.02,alpha.V=1000),G3=list(V=1,nu=0.02,alpha.V=1000)))
bmod <- MCMCglmm(Fat ~ 1,random=~Lab+labtech+labtechsamp,data=eggs,verbose=FALSE,prior=eprior)
xyplot(log(bmod$VCV))
The diagnostics are satisfactory
summary(bmod)
Iterations = 3001:12991
Thinning interval = 10
Sample size = 1000
DIC: -79.006
G-structure: ~Lab
post.mean l-95% CI u-95% CI eff.samp
Lab 0.0202 1.63e-07 0.0682 528
~labtech
post.mean l-95% CI u-95% CI eff.samp
labtech 0.011 3.03e-06 0.0305 784
~labtechsamp
post.mean l-95% CI u-95% CI eff.samp
labtechsamp 0.00401 3.2e-09 0.0118 908
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 0.0088 0.0045 0.0138 1000
Location effects: Fat ~ 1
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 0.385 0.253 0.534 1000 0.002
Looking at the posterior means, we see declining variance from the labs to the technician to the samples with the error variance falling somewhere between. The credible intervals indicate considerable uncertainty in these assessments.
We can view the posterior distributions for the SDs corresponding to the four components of variation:
library(reshape2)
ref <- data.frame(bmod$VCV)
rdf <- melt(ref, value.name="fat")
ggplot(rdf, aes(x=sqrt(fat),color=variable))+geom_density()
We see that the error SD distribution is fairly compact but we find it hard to seperate the lab, technician and sample variations.
We can also examine bivariate posteriors. For example:
ggplot(ref, aes(x=sqrt(Lab),y=sqrt(labtech)))+geom_density2d()+geom_abline(int=0,slope=1)
Here we see that while both the lab and technician variation might be close to zero seperately, it is very unlikely that they are both zero.
Check on versions of R packages used:
sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] reshape2_1.2.2 MCMCglmm_2.21 ape_3.1-4 coda_0.16-1
[5] lattice_0.20-29 ggplot2_0.9.3.1 lme4_1.1-7 Rcpp_0.11.3
[9] Matrix_1.1-4 knitr_1.6 rmarkdown_0.2.68
loaded via a namespace (and not attached):
[1] boot_1.3-11 colorspace_1.2-4 corpcor_1.6.7
[4] dichromat_2.0-0 digest_0.6.4 evaluate_0.5.3
[7] formatR_0.10 grid_3.1.1 gtable_0.1.2
[10] htmltools_0.2.4 labeling_0.2 MASS_7.3-33
[13] minqa_1.2.3 munsell_0.4.2 nlme_3.1-117
[16] nloptr_1.0.4 plyr_1.8.1 proto_0.3-10
[19] RColorBrewer_1.0-5 scales_0.2.3 splines_3.1.1
[22] stringr_0.6.2 tensorA_0.36 tools_3.1.1
[25] yaml_2.1.11