[1] "Created: Wed Apr 1 16:47:20 2015"
See the introduction for an overview.
library(lme4)
library(ggplot2)
options(digits=5,show.signif.stars=FALSE)
Load in and summarize the data:
data(vision, package="faraway")
vision$npower <- rep(1:4,14)
summary(vision)
     acuity       power        eye     subject     npower
 Min.   : 94   6/6 :14   left :28    1:8      Min.   :1.00
 1st Qu.:110   6/18:14   right:28    2:8      1st Qu.:1.75
 Median :115   6/36:14               3:8      Median :2.50
 Mean   :113   6/60:14               4:8      Mean   :2.50
 3rd Qu.:118                         5:8      3rd Qu.:3.25
 Max.   :124                         6:8      Max.   :4.00
                                     7:8
Plot the data:
ggplot(vision, aes(y=acuity, x=npower, linetype=eye))+geom_line()+facet_wrap(~ subject)
mmod <- lmer(acuity~power+(1|subject)+(1|subject:eye),vision)
print(summary(mmod),corr=FALSE)
Linear mixed model fit by REML ['lmerMod']
Formula: acuity ~ power + (1 | subject) + (1 | subject:eye)
Data: vision
REML criterion at convergence: 328.7
Scaled residuals:
Min 1Q Median 3Q Max
-3.424 -0.323 0.011 0.441 2.466
Random effects:
Groups Name Variance Std.Dev.
subject:eye (Intercept) 10.3 3.21
subject (Intercept) 21.5 4.64
Residual 16.6 4.07
Number of obs: 56, groups: subject:eye, 14; subject, 7
Fixed effects:
Estimate Std. Error t value
(Intercept) 112.643 2.235 50.4
power6/18 0.786 1.540 0.5
power6/36 -1.000 1.540 -0.6
power6/60 3.286 1.540 2.1
anova(mmod)
Analysis of Variance Table
Df Sum Sq Mean Sq F value
power 3 141 46.9 2.83
We get estimates but no inferential output, as expected with lme4.
Parametric bootstrap-based confidence intervals:
confint(mmod, method="boot")
2.5 % 97.5 %
sd_(Intercept)|subject:eye 0.00000 5.3654
sd_(Intercept)|subject 0.00000 8.0743
sigma 3.16913 4.9847
(Intercept) 108.27293 117.2798
power6/18 -2.11527 3.9079
power6/36 -3.98635 2.1837
power6/60 0.31575 6.5202
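The zero lower limits for the random effect SDs arise from bootstrap refits where the variance estimate hits the boundary. The default uses 500 bootstrap samples; a sketch (not run here) of increasing this and fixing the seed for reproducibility:
set.seed(123)
confint(mmod, method="boot", nsim=1000)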
We execute the Kenward-Roger (KR) adjusted F-test in this example:
library(pbkrtest)
mmod <- lmer(acuity~power+(1|subject)+(1|subject:eye),vision,REML=FALSE)
nmod <- lmer(acuity~1+(1|subject)+(1|subject:eye),vision,REML=FALSE)
KRmodcomp(mmod, nmod)
F-test with Kenward-Roger approximation; computing time: 0.16 sec.
large : acuity ~ power + (1 | subject) + (1 | subject:eye)
small : acuity ~ 1 + (1 | subject) + (1 | subject:eye)
stat ndf ddf F.scaling p.value
Ftest 2.83 3.00 39.00 1 0.051
The p-value exceeds 5%, which is interesting since it contrasts with the previous results falling just below 5%. In truth, the result is borderline either way but, of course, which side of the fence you land on will matter greatly to some. We might also try the parametric bootstrap.
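The pbkrtest package also implements a parametric bootstrap test for the fixed effect via PBmodcomp, which simulates the null distribution of the LRT statistic. A sketch (not run here, since it takes some time):
set.seed(123)
PBmodcomp(mmod, nmod, nsim=1000)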
RLRsim cannot be used because there is more than one random effect.
We can implement the parametric bootstrap directly to test whether there is any variance in the eye-within-subject random effect term.
First we compute the observed LRT statistic:
mmod <- lmer(acuity~power+(1|subject)+(1|subject:eye),vision,REML=FALSE)
rmod <- lmer(acuity~power+(1|subject),vision,REML=FALSE)
(olrt <- 2*(logLik(mmod)-logLik(rmod)))
'log Lik.' 6.9608 (df=7)
Now we resample 1000 times and compute the p-value:
lrstat <- numeric(1000)
for(i in 1:1000){
racuity <- unlist(simulate(rmod))
bmod <- refit(mmod, racuity)
smod <- refit(rmod, racuity)
lrstat[i] <- 2*(logLik(bmod)-logLik(smod))
}
mean(lrstat > olrt)
[1] 0.003
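The bootstrap p-value is itself subject to Monte Carlo error; a binomial approximation gives its standard error:
pval <- mean(lrstat > olrt)
sqrt(pval*(1-pval)/1000)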
It looks like there is some variation between the eyes within subjects.
library(lmerTest)
mmod <- lmer(acuity~power+(1|subject)+(1|subject:eye),vision)
print(summary(mmod),corr=FALSE)
Linear mixed model fit by REML t-tests use Satterthwaite approximations
to degrees of freedom [merModLmerTest]
Formula: acuity ~ power + (1 | subject) + (1 | subject:eye)
Data: vision
REML criterion at convergence: 328.7
Scaled residuals:
Min 1Q Median 3Q Max
-3.424 -0.323 0.011 0.441 2.466
Random effects:
Groups Name Variance Std.Dev.
subject:eye (Intercept) 10.3 3.21
subject (Intercept) 21.5 4.64
Residual 16.6 4.07
Number of obs: 56, groups: subject:eye, 14; subject, 7
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 112.643 2.235 8.800 50.40 3.7e-12
power6/18 0.786 1.540 39.000 0.51 0.613
power6/36 -1.000 1.540 39.000 -0.65 0.520
power6/60 3.286 1.540 39.000 2.13 0.039
anova(mmod)
Analysis of Variance Table of type 3 with Satterthwaite
approximation for degrees of freedom
Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)
power 141 46.9 3 39 2.83 0.051
We can test the random effects:
rand(mmod)
Analysis of Random effects Table:
Chi.sq Chi.DF p.value
subject 2.96 1 0.09
subject:eye 5.98 1 0.01
but we prefer the parametric bootstrap result because of problems with the asymptotic distribution of the LRT in such cases.
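For completeness, the same parametric bootstrap recipe could be applied to the subject term, refitting both models by ML along the lines of the earlier loop (a sketch, not run here):
amod <- lmer(acuity~power+(1|subject)+(1|subject:eye),vision,REML=FALSE)
rmod2 <- lmer(acuity~power+(1|subject:eye),vision,REML=FALSE)
lrstat2 <- numeric(1000)
for(i in 1:1000){
  racuity <- unlist(simulate(rmod2))
  lrstat2[i] <- 2*(logLik(refit(amod,racuity))-logLik(refit(rmod2,racuity)))
}
mean(lrstat2 > 2*(logLik(amod)-logLik(rmod2)))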
Pairwise differences may also be interesting:
difflsmeans(mmod)
Differences of LSMEANS:
                   Estimate Standard Error   DF t-value Lower CI Upper CI p-value
power 6/6 - 6/18       -0.8          1.540 39.0   -0.51    -3.90    2.329   0.613
power 6/6 - 6/36        1.0          1.540 39.0    0.65    -2.12    4.115   0.520
power 6/6 - 6/60       -3.3          1.540 39.0   -2.13    -6.40   -0.171   0.039
power 6/18 - 6/36       1.8          1.540 39.0    1.16    -1.33    4.901   0.253
power 6/18 - 6/60      -2.5          1.540 39.0   -1.62    -5.62    0.615   0.113
power 6/36 - 6/60      -4.3          1.540 39.0   -2.78    -7.40   -1.171   0.008
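These p-values are unadjusted for the six comparisons. If multiplicity is a concern, the multcomp package (not used above) can compute Tukey-adjusted comparisons from the lmer fit, although it relies on a normal approximation rather than the Satterthwaite degrees of freedom. A sketch:
library(multcomp)
summary(glht(mmod, linfct=mcp(power="Tukey")))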
The syntax of MCMCglmm requires that nested factors be uniquely labeled to distinguish, in this case, the left eye of subject one from the left eye of subject two (as these are different eyes!). We use the parameter-expanded prior for the reasons established in our single random effect example.
library(MCMCglmm)
vision$eyesub <- paste0(vision$eye,vision$subject)
eprior <- list(R=list(V=1,nu=0.02),G=list(G1=list(V=1,nu=0.02,alpha.V=1000),G2=list(V=1,nu=0.02,alpha.V=1000)))
bmod <- MCMCglmm(acuity ~ power,random=~subject+eyesub,data=vision,verbose=FALSE,prior=eprior,pr=TRUE)
xyplot(log(bmod$VCV))
The diagnostics are satisfactory.
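We can back up the trace plots with numerical checks from the coda package (already loaded as a dependency of MCMCglmm), such as effective sample sizes and autocorrelations:
effectiveSize(bmod$VCV)
autocorr.diag(bmod$VCV)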
summary(bmod)
Iterations = 3001:12991
Thinning interval = 10
Sample size = 1000
DIC: 334.26
G-structure: ~subject
post.mean l-95% CI u-95% CI eff.samp
subject 30.7 0.00383 98.3 1000
~eyesub
post.mean l-95% CI u-95% CI eff.samp
eyesub 17.9 0.00336 47 1000
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 18 11.1 27.6 1000
Location effects: acuity ~ power
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 112.6641 107.8443 118.1216 1000 <0.001
power6/18 0.7473 -2.3682 3.8104 1000 0.638
power6/36 -0.9847 -4.3784 2.0154 813 0.536
power6/60 3.2479 -0.0462 6.0731 625 0.046
The results are quite similar to the lme4 fit. We see that the highest power level is the only one with much evidence of a difference from the rest.
We can examine the posterior distributions of the random effects terms.
library(reshape2)
ref <- data.frame(sqrt(bmod$VCV))
rdf <- melt(ref, value.name="acuity")
ggplot(rdf, aes(x=acuity,color=variable))+geom_density()
These form the usual pattern where the error SD is more precisely estimated. We see there is slightly more variation between subjects than between eyes within subjects. Perhaps one might expect the latter to be significantly smaller but we don’t quite see that in this fit.
Let’s look at the posterior distributions for the contrasts with the lowest power level:
ref <- data.frame(bmod$Sol[,2:4])
colnames(ref) <- c("18-6","36-6","60-6")
rdf <- melt(ref, value.name="acuity", variable.name="power")
ggplot(rdf, aes(x=acuity, color=power))+geom_density()
We see that only the 6/60 vs. 6/6 contrast has a distribution that indicates a clear difference from zero.
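We can quantify this impression with posterior tail probabilities, the proportion of the posterior samples of each contrast that falls above zero:
colMeans(ref > 0)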
It is also interesting to look at the posterior distributions of the random effects. We focus on the eye effects and label whether eye is left or right.
ref <- data.frame(bmod$Sol)
rdf <- melt(ref[,grep("eye",colnames(ref))],value.name="acuity",variable.name="eyesub")
rdf$hand <- substring(rdf$eyesub,8,8)
ggplot(rdf, aes(x=acuity,group=eyesub, color=hand))+geom_density()
We see there is not much distinction between left and right eyes (unlike hands). We also see one case where there is a large difference. We can figure out which case this is:
posterior.mode(bmod$Sol)
(Intercept) power6/18 power6/36 power6/60 subject.1
112.9392405 0.5903626 -1.2241856 4.0310278 3.6405395
subject.2 subject.3 subject.4 subject.5 subject.6
0.0667272 3.9021509 -0.1296614 -0.1785706 -7.5305802
subject.7 eyesub.left1 eyesub.left2 eyesub.left3 eyesub.left4
-0.0033437 3.0636185 -0.0716572 0.0239942 0.2224728
eyesub.left5 eyesub.left6 eyesub.left7 eyesub.right1 eyesub.right2
-0.3153598 1.5774931 -0.4528449 0.2237597 -1.6250469
eyesub.right3 eyesub.right4 eyesub.right5 eyesub.right6 eyesub.right7
2.0656315 0.8727361 0.0602976 -6.3729642 -1.3415210
We see that it corresponds to the right eye of subject 6. We have already noticed this problem earlier. Either there is a problem with the measurement of this eye or the subject's vision is quite asymmetrical.
The Bayesian approach provides a predictive distribution rather than just a point estimate. We can summarise the distribution with a posterior mean. As with the lme4 approach, there are different kinds of predicted values depending on how we use the random effects.
Suppose we were interested in a new individual, not one of the subjects in the study. In this case, we need to marginalize out the subject and eye effects:
vision$pv0 <- predict(bmod,marginal=~subject+eyesub)
p <- ggplot(vision, aes(y=acuity, x=npower, linetype=eye))+geom_line()+facet_wrap(~ subject)
p+geom_line(aes(y=pv0,color="red"))
We see that this produces the same predicted value for all subjects and eyes.
We might know the subject but not which eye. To achieve this kind of predicted value:
vision$pvsub <- predict(bmod,marginal=~eyesub)
p <- ggplot(vision, aes(y=acuity, x=npower, linetype=eye))+geom_line()+facet_wrap(~ subject)
p+geom_line(aes(y=pvsub,color="red"))
Now we can see that the predicted values vary (by a constant amount) across subjects but not across eyes.
Finally, we can specify the subject and eye. This is sometimes called the full predictive distribution:
vision$pvfull <- predict(bmod,marginal=NULL)
p <- ggplot(vision, aes(y=acuity, x=npower, linetype=eye))+geom_line()+facet_wrap(~ subject)
p+geom_line(aes(y=pvfull,color="red"))
Now we see that the predicted value varies by subject and eye (with constant differences).
Having produced predicted values, we can also compute some residuals:
vision$res <- with(vision, acuity-pvfull)
We can make the standard residual vs. fitted plot:
ggplot(vision, aes(x=pvfull,y=res)) + geom_point()+xlab("Predicted")+ylab("Residuals")+geom_hline(yintercept=0)
We can see the outlier which we have previously identified. Otherwise everything looks fine.
A Q-Q plot of the residuals:
qqnorm(vision$res)
qqline(vision$res)
Again we see the outlier but otherwise we have approximate normality.
These diagnostics do provide something useful in checking the model but they are not entirely appropriate for a Bayesian-style analysis. Assuming we recognize the value in model checking (I certainly do!), we should check that data simulated from the fitted model looks like the data we actually observed. This idea can be implemented in various ways; one simple version is sketched below.
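As a minimal sketch, using the lme4 fit rather than the posterior for convenience, we might compare the observed minimum acuity (which is sensitive to the outlier) with its distribution under the fitted model; a small tail proportion would indicate the model rarely generates a point as extreme as the one observed:
set.seed(123)
simmin <- apply(simulate(mmod, nsim=1000), 2, min)
mean(simmin <= min(vision$acuity))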
Check on versions of R packages used:
sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] reshape2_1.2.2 MCMCglmm_2.21 ape_3.1-4 coda_0.16-1
[5] lattice_0.20-29 lmerTest_2.0-20 pbkrtest_0.4-1 MASS_7.3-33
[9] ggplot2_0.9.3.1 lme4_1.1-7 Rcpp_0.11.3 Matrix_1.1-4
[13] knitr_1.6 rmarkdown_0.2.68
loaded via a namespace (and not attached):
[1] bitops_1.0-6 boot_1.3-11 caTools_1.16
[4] cluster_1.15.2 colorspace_1.2-4 corpcor_1.6.7
[7] dichromat_2.0-0 digest_0.6.4 evaluate_0.5.3
[10] formatR_0.10 Formula_1.1-1 gdata_2.13.3
[13] gplots_2.13.0 grid_3.1.1 gtable_0.1.2
[16] gtools_3.3.1 Hmisc_3.14-3 htmltools_0.2.4
[19] KernSmooth_2.23-12 labeling_0.2 latticeExtra_0.6-26
[22] minqa_1.2.3 munsell_0.4.2 nlme_3.1-117
[25] nloptr_1.0.4 numDeriv_2012.9-1 parallel_3.1.1
[28] plyr_1.8.1 proto_0.3-10 RColorBrewer_1.0-5
[31] scales_0.2.3 splines_3.1.1 stringr_0.6.2
[34] survival_2.37-7 tensorA_0.36 tools_3.1.1
[37] yaml_2.1.11