[1] "Created: Wed Apr 1 16:45:08 2015"
See the introduction for an overview.
library(lme4)
library(ggplot2)
options(digits=5,show.signif.stars=FALSE)
Load in and summarize the data:
data(jsp, package="faraway")
jspr <- jsp[jsp$year==2,]
mjspr <- data.frame(rbind(jspr[,1:6],jspr[,1:6]),subject=factor(rep(c("english","math"),c(953,953))),score=c(jspr$english/100,jspr$math/40))
summary(mjspr)
school class gender social raven
48 : 126 1:1134 boy :916 4 :738 Min. : 4.0
33 : 88 2: 584 girl:990 9 :276 1st Qu.:22.0
42 : 82 3: 108 2 :252 Median :26.0
31 : 70 4: 80 5 :170 Mean :25.3
47 : 66 3 :160 3rd Qu.:30.0
21 : 56 6 :116 Max. :36.0
(Other):1418 (Other):194
id subject score
1 : 2 english:953 Min. :0.000
3 : 2 math :953 1st Qu.:0.390
4 : 2 Median :0.625
6 : 2 Mean :0.594
7 : 2 3rd Qu.:0.825
8 : 2 Max. :1.000
(Other):1894
Plot the data
ggplot(mjspr, aes(x=raven, y=score))+geom_jitter(alpha=0.25)+facet_grid(gender ~ subject)
mjspr$craven <- mjspr$raven-mean(mjspr$raven)
mmod <- lmer(score ~ subject*gender+craven*subject+social+(1|school)+(1|school:class)+(1|school:class:id),mjspr)
print(summary(mmod),corr=FALSE)
Linear mixed model fit by REML ['lmerMod']
Formula:
score ~ subject * gender + craven * subject + social + (1 | school) +
(1 | school:class) + (1 | school:class:id)
Data: mjspr
REML criterion at convergence: -1741.6
Scaled residuals:
Min 1Q Median 3Q Max
-2.6654 -0.5692 0.0072 0.5641 2.5870
Random effects:
Groups Name Variance Std.Dev.
school:class:id (Intercept) 0.010252 0.1013
school:class (Intercept) 0.000582 0.0241
school (Intercept) 0.002231 0.0472
Residual 0.013592 0.1166
Number of obs: 1906, groups:
school:class:id, 953; school:class, 90; school, 48
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.441578 0.026459 16.7
subjectmath 0.366565 0.007710 47.5
gendergirl 0.063351 0.010254 6.2
craven 0.017390 0.000925 18.8
social2 0.013754 0.027230 0.5
social3 -0.020768 0.028972 -0.7
social4 -0.070708 0.025868 -2.7
social5 -0.050474 0.028818 -1.8
social6 -0.087852 0.030672 -2.9
social7 -0.099408 0.031607 -3.1
social8 -0.081623 0.042352 -1.9
social9 -0.047337 0.027445 -1.7
subjectmath:gendergirl -0.059194 0.010706 -5.5
subjectmath:craven -0.003720 0.000930 -4.0
anova(mmod)
Analysis of Variance Table
Df Sum Sq Mean Sq F value
subject 1 53.7 53.7 3953.67
gender 1 0.1 0.1 7.46
craven 1 6.0 6.0 444.63
social 8 0.7 0.1 6.47
subject:gender 1 0.4 0.4 28.23
subject:craven 1 0.2 0.2 15.99
Estimates but no inferential output as expected
confint(mmod, method="boot")
2.5 % 97.5 %
sd_(Intercept)|school:class:id 0.0917857 0.1097861
sd_(Intercept)|school:class 0.0000000 0.0423214
sd_(Intercept)|school 0.0294989 0.0613863
sigma 0.1113921 0.1212957
(Intercept) 0.3909352 0.4935450
subjectmath 0.3518251 0.3805216
gendergirl 0.0422825 0.0833143
craven 0.0155256 0.0192694
social2 -0.0391574 0.0662833
social3 -0.0729605 0.0302451
social4 -0.1167668 -0.0194739
social5 -0.1045879 0.0040000
social6 -0.1432620 -0.0305644
social7 -0.1599951 -0.0331508
social8 -0.1636875 0.0050874
social9 -0.0989284 0.0055644
subjectmath:gendergirl -0.0795496 -0.0385646
subjectmath:craven -0.0054501 -0.0017955
Again some evidence that there is little variation between the classes at a school.
We can reconstruct an F-test of the subject by craven term in the model using the KR adjustment.
library(pbkrtest)
mmod <- lmer(score ~ subject*gender+craven*subject+social+ (1|school)+(1|school:class)+(1|school:class:id),mjspr, REML=FALSE)
mmodr <- lmer(score ~ subject*gender+craven+subject+social+(1|school)+(1|school:class)+(1|school:class:id),mjspr, REML=FALSE)
KRmodcomp(mmod, mmodr)
F-test with Kenward-Roger approximation; computing time: 0.61 sec.
large : score ~ subject + gender + craven + social + (1 | school) + (1 |
school:class) + (1 | school:class:id) + subject:gender +
subject:craven
small : score ~ subject * gender + craven + subject + social + (1 | school) +
(1 | school:class) + (1 | school:class:id)
stat ndf ddf F.scaling p.value
Ftest 16 1 950 1 6.9e-05
The outcome is very similar to the printed result. This is not surprising given the larger size of the sample.
library(lmerTest)
mmod <- lmer(score ~ subject*gender+craven*subject+social+(1|school)+(1|school:class)+(1|school:class:id),mjspr)
print(summary(mmod),corr=FALSE)
Linear mixed model fit by REML t-tests use Satterthwaite approximations
to degrees of freedom [merModLmerTest]
Formula:
score ~ subject * gender + craven * subject + social + (1 | school) +
(1 | school:class) + (1 | school:class:id)
Data: mjspr
REML criterion at convergence: -1741.6
Scaled residuals:
Min 1Q Median 3Q Max
-2.6654 -0.5692 0.0072 0.5641 2.5870
Random effects:
Groups Name Variance Std.Dev.
school:class:id (Intercept) 0.010252 0.1013
school:class (Intercept) 0.000582 0.0241
school (Intercept) 0.002231 0.0472
Residual 0.013592 0.1166
Number of obs: 1906, groups:
school:class:id, 953; school:class, 90; school, 48
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 4.42e-01 2.65e-02 8.36e+02 16.69 < 2e-16
subjectmath 3.67e-01 7.71e-03 9.50e+02 47.54 < 2e-16
gendergirl 6.34e-02 1.03e-02 1.52e+03 6.18 8.3e-10
craven 1.74e-02 9.25e-04 1.49e+03 18.81 < 2e-16
social2 1.38e-02 2.72e-02 9.27e+02 0.51 0.6136
social3 -2.08e-02 2.90e-02 9.29e+02 -0.72 0.4737
social4 -7.07e-02 2.59e-02 9.36e+02 -2.73 0.0064
social5 -5.05e-02 2.88e-02 9.34e+02 -1.75 0.0802
social6 -8.79e-02 3.07e-02 9.35e+02 -2.86 0.0043
social7 -9.94e-02 3.16e-02 9.39e+02 -3.15 0.0017
social8 -8.16e-02 4.24e-02 9.22e+02 -1.93 0.0543
social9 -4.73e-02 2.74e-02 9.37e+02 -1.72 0.0849
subjectmath:gendergirl -5.92e-02 1.07e-02 9.50e+02 -5.53 4.2e-08
subjectmath:craven -3.72e-03 9.30e-04 9.50e+02 -4.00 6.9e-05
anova(mmod)
Analysis of Variance Table of type 3 with Satterthwaite
approximation for degrees of freedom
Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)
subject 54.0 54.0 1 950 3975 < 2e-16
gender 0.2 0.2 1 915 15 0.00012
craven 5.1 5.1 1 923 378 < 2e-16
social 0.7 0.1 8 927 6 3.2e-08
subject:gender 0.4 0.4 1 950 31 4.2e-08
subject:craven 0.2 0.2 1 950 16 6.9e-05
Can test the random effects:
rand(mmod)
Analysis of Random effects Table:
Chi.sq Chi.DF p.value
school 8.63 1 0.003
school:class 1.44 1 0.230
school:class:id 186.70 1 <2e-16
We need to create unique labels for the classes within schools and the students within classes within schools. The reason for this is explained in the nested example.
library(MCMCglmm)
mjspr$classch <- paste(mjspr$school,mjspr$class,sep=".")
mjspr$idclassch <- paste(mjspr$school,mjspr$class,mjspr$id,sep=".")
We need to use an expanded prior as in previous examples.
eprior <- list(R=list(V=1,nu=0.02),G=list(G1=list(V=1,nu=0.02,alpha.V=1000),G2=list(V=1,nu=0.02,alpha.V=1000),G3=list(V=1,nu=0.02,alpha.V=1000)))
bmod <- MCMCglmm(score ~ subject*gender+craven*subject+social, random=~school+classch+idclassch,data=mjspr, verbose=FALSE, prior=eprior)
xyplot(log(bmod$VCV))
The diagnostics for the variance components look satisfactory.
summary(bmod)
Iterations = 3001:12991
Thinning interval = 10
Sample size = 1000
DIC: -2179.3
G-structure: ~school
post.mean l-95% CI u-95% CI eff.samp
school 0.00231 0.000613 0.00415 599
~classch
post.mean l-95% CI u-95% CI eff.samp
classch 0.000692 9.23e-10 0.00195 371
~idclassch
post.mean l-95% CI u-95% CI eff.samp
idclassch 0.0103 0.00859 0.012 1000
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 0.0136 0.0124 0.0149 1000
Location effects: score ~ subject * gender + craven * subject + social
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 0.44059 0.38982 0.49213 1000 <0.001
subjectmath 0.36694 0.35075 0.38002 1000 <0.001
gendergirl 0.06345 0.04297 0.08194 1000 <0.001
craven 0.01732 0.01558 0.01918 1000 <0.001
social2 0.01489 -0.03251 0.07277 1000 0.584
social3 -0.02000 -0.07337 0.03750 1000 0.506
social4 -0.06980 -0.12007 -0.01788 1000 0.006
social5 -0.04887 -0.10700 0.00441 1000 0.098
social6 -0.08775 -0.14386 -0.02943 1000 0.004
social7 -0.09884 -0.15992 -0.03638 1000 0.002
social8 -0.07940 -0.15918 0.00969 1000 0.064
social9 -0.04632 -0.09927 0.00603 1000 0.096
subjectmath:gendergirl -0.05937 -0.08086 -0.04009 1000 <0.001
subjectmath:craven -0.00370 -0.00550 -0.00203 804 <0.001
The results are quite similar to the lme4
fit with all the fixed effects looking significant. Again the class variation seems to be smallest of three components of variation in the heirarchy.
We can view the posterior distributions for the SDs corresponding to the four components of variation:
library(reshape2)
ref <- data.frame(sqrt(bmod$VCV))
rdf <- melt(ref, value.name="score")
ggplot(rdf, aes(x=score,color=variable))+geom_density()
We can clearly see the range for these components. As usual, we are most sure about the error SD which will represent the difference between maths and english scores for the same student. Interestingly, this variation is the largest.
Similar plots can be made of the fixed effect posteriors.
Check on versions of R packages used:
sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] reshape2_1.2.2 MCMCglmm_2.21 ape_3.1-4 coda_0.16-1
[5] lattice_0.20-29 lmerTest_2.0-20 pbkrtest_0.4-1 MASS_7.3-33
[9] ggplot2_0.9.3.1 lme4_1.1-7 Rcpp_0.11.3 Matrix_1.1-4
[13] knitr_1.6 rmarkdown_0.2.68
loaded via a namespace (and not attached):
[1] bitops_1.0-6 boot_1.3-11 caTools_1.16
[4] cluster_1.15.2 colorspace_1.2-4 corpcor_1.6.7
[7] dichromat_2.0-0 digest_0.6.4 evaluate_0.5.3
[10] formatR_0.10 Formula_1.1-1 gdata_2.13.3
[13] gplots_2.13.0 grid_3.1.1 gtable_0.1.2
[16] gtools_3.3.1 Hmisc_3.14-3 htmltools_0.2.4
[19] KernSmooth_2.23-12 labeling_0.2 latticeExtra_0.6-26
[22] minqa_1.2.3 munsell_0.4.2 nlme_3.1-117
[25] nloptr_1.0.4 numDeriv_2012.9-1 parallel_3.1.1
[28] plyr_1.8.1 proto_0.3-10 RColorBrewer_1.0-5
[31] scales_0.2.3 splines_3.1.1 stringr_0.6.2
[34] survival_2.37-7 tensorA_0.36 tools_3.1.1
[37] yaml_2.1.11