[1] "Created: Wed Apr 1 16:28:54 2015"
See the introduction for an overview.
library(lme4)
library(ggplot2)
options(digits=5,show.signif.stars=FALSE)
Load in and plot the data:
data(irrigation, package="faraway")
summary(irrigation)
field irrigation variety yield
f1 :2 i1:4 v1:8 Min. :34.8
f2 :2 i2:4 v2:8 1st Qu.:37.6
f3 :2 i3:4 Median :40.1
f4 :2 i4:4 Mean :40.2
f5 :2 3rd Qu.:42.7
f6 :2 Max. :47.6
(Other):4
ggplot(irrigation, aes(y=yield, x=field, shape=variety, color=irrigation)) + geom_point()
mmod <- lmer(yield ~ irrigation * variety + (1|field),data=irrigation)
print(summary(mmod),correlation=FALSE)
Linear mixed model fit by REML ['lmerMod']
Formula: yield ~ irrigation * variety + (1 | field)
Data: irrigation
REML criterion at convergence: 45.4
Scaled residuals:
Min 1Q Median 3Q Max
-0.745 -0.551 0.000 0.551 0.745
Random effects:
Groups Name Variance Std.Dev.
field (Intercept) 16.20 4.02
Residual 2.11 1.45
Number of obs: 16, groups: field, 8
Fixed effects:
Estimate Std. Error t value
(Intercept) 38.50 3.03 12.73
irrigationi2 1.20 4.28 0.28
irrigationi3 0.70 4.28 0.16
irrigationi4 3.50 4.28 0.82
varietyv2 0.60 1.45 0.41
irrigationi2:varietyv2 -0.40 2.05 -0.19
irrigationi3:varietyv2 -0.20 2.05 -0.10
irrigationi4:varietyv2 1.20 2.05 0.58
anova(mmod)
Analysis of Variance Table
Df Sum Sq Mean Sq F value
irrigation 3 2.46 0.818 0.39
variety 1 2.25 2.250 1.07
irrigation:variety 3 1.55 0.517 0.25
No inferential information is provided although confidence intervals can be computed:
confint(mmod)
2.5 % 97.5 %
.sig01 1.76705 5.22153
.sigma 0.67350 1.84234
(Intercept) 33.80068 43.19933
irrigationi2 -5.44585 7.84585
irrigationi3 -5.94585 7.34585
irrigationi4 -3.14585 10.14585
varietyv2 -1.67942 2.87942
irrigationi2:varietyv2 -0.37198 0.43796
irrigationi3:varietyv2 -0.17198 0.63796
irrigationi4:varietyv2 1.22802 2.03796
The model can be fit using aov
:
lmod <- aov(yield ~ irrigation*variety + Error(field), irrigation)
summary(lmod)
Error: field
Df Sum Sq Mean Sq F value Pr(>F)
irrigation 3 40.2 13.4 0.39 0.77
Residuals 4 138.0 34.5
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
variety 1 2.25 2.250 1.07 0.36
irrigation:variety 3 1.55 0.517 0.25 0.86
Residuals 4 8.43 2.107
The analysis takes account of the fact that the irrigation does not vary within the field. Note that the F-statistics are the same as the ANOVA table obtained originally from lmer
.
We can test the interaction effect:
library(pbkrtest)
lmodr <- lmer(yield ~ irrigation * variety + (1|field),data=irrigation)
lmoda <- lmer(yield ~ irrigation + variety + (1|field),data=irrigation)
KRmodcomp(lmodr, lmoda)
F-test with Kenward-Roger approximation; computing time: 0.13 sec.
large : yield ~ irrigation * variety + (1 | field)
small : yield ~ irrigation + variety + (1 | field)
stat ndf ddf F.scaling p.value
Ftest 0.25 3.00 4.00 1 0.86
Because of the balance in the data, the F-test requires no adjustment and the outcome is identical with that presented in the printed textbook. We can also test the main effect terms although we are not able to exactly reproduce the results in the text because we must frame the test as model comparisons in contrast to the ANOVA table in text.
First we test the variety term by removing it from the main effects model and making the comparison.
lmodi <- lmer(yield ~ irrigation + (1|field),data=irrigation)
KRmodcomp(lmoda, lmodi)
F-test with Kenward-Roger approximation; computing time: 0.16 sec.
large : yield ~ irrigation + variety + (1 | field)
small : yield ~ irrigation + (1 | field)
stat ndf ddf F.scaling p.value
Ftest 1.58 1.00 7.00 1 0.25
The test-statistic and p-value are not identical with the text because the ANOVA table uses the denominator mean square and degrees of freedom from the full model with the interaction. We can test the irrigation effect in the same way:
lmodv <- lmer(yield ~ variety + (1|field),data=irrigation)
KRmodcomp(lmoda, lmodv)
F-test with Kenward-Roger approximation; computing time: 0.06 sec.
large : yield ~ irrigation + variety + (1 | field)
small : yield ~ variety + (1 | field)
stat ndf ddf F.scaling p.value
Ftest 0.39 3.00 4.00 1 0.77
In this case, it is clear that neither irrigation or variety have an impact on the yield. The parametric bootstrap method can also be used here to test the fixed effects although in this case, due the balance of data, it is not going to provide any advantage over the KR-adjusted result.
We can test the field effect like this:
library(RLRsim)
pmod <- lmer(yield ~ irrigation + variety + (1|field),data=irrigation, REML=FALSE)
lmod <- lm(yield ~ irrigation + variety,data=irrigation)
exactLRT(pmod, lmod)
simulated finite sample distribution of LRT. (p-value based on
10000 simulated values)
data:
LRT = 11.042, p-value = 4e-04
Clearly the fields do vary as is already apparent from the plot above.
library(lmerTest)
mmod <- lmer(yield ~ irrigation * variety + (1|field),data=irrigation)
print(summary(mmod),correlation=FALSE)
Linear mixed model fit by REML t-tests use Satterthwaite approximations
to degrees of freedom [merModLmerTest]
Formula: yield ~ irrigation * variety + (1 | field)
Data: irrigation
REML criterion at convergence: 45.4
Scaled residuals:
Min 1Q Median 3Q Max
-0.745 -0.551 0.000 0.551 0.745
Random effects:
Groups Name Variance Std.Dev.
field (Intercept) 16.20 4.02
Residual 2.11 1.45
Number of obs: 16, groups: field, 8
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 38.50 3.03 4.49 12.73 0.00011
irrigationi2 1.20 4.28 4.49 0.28 0.79159
irrigationi3 0.70 4.28 4.49 0.16 0.87716
irrigationi4 3.50 4.28 4.49 0.82 0.45458
varietyv2 0.60 1.45 4.00 0.41 0.70058
irrigationi2:varietyv2 -0.40 2.05 4.00 -0.19 0.85502
irrigationi3:varietyv2 -0.20 2.05 4.00 -0.10 0.92708
irrigationi4:varietyv2 1.20 2.05 4.00 0.58 0.59026
anova(mmod)
Analysis of Variance Table of type 3 with Satterthwaite
approximation for degrees of freedom
Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)
irrigation 2.46 0.818 3 4 0.388 0.77
variety 2.25 2.250 1 4 1.068 0.36
irrigation:variety 1.55 0.517 3 4 0.245 0.86
The p-values match those previously obtained.
We can test the random effect term like this:
rand(mmod)
Analysis of Random effects Table:
Chi.sq Chi.DF p.value
field 6.11 1 0.01
We see that the field effect is significant although we would prefer the RLRsim
version of this test as being more accurate.
See the One way anova analysis for an introduction to MCMCglmm. We proceed directly with the final prior (using parameter expansion):
library(MCMCglmm)
eprior <- list(R=list(V=1,nu=0.02),G=list(G1=list(V=1,nu=0.02,alpha.V=1000)))
bmod <- MCMCglmm(yield ~ irrigation + variety, ~field, data=irrigation,verbose=FALSE,prior=eprior,pr=TRUE)
xyplot(bmod$Sol[,1:5])
xyplot(bmod$Sol[,6:13])
xyplot(log(bmod$VCV))
The diagnostics are satisfactory
summary(bmod)
Iterations = 3001:12991
Thinning interval = 10
Sample size = 1000
DIC: 61.554
G-structure: ~field
post.mean l-95% CI u-95% CI eff.samp
field 36.1 0.163 115 1000
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 2.01 0.406 4.61 1000
Location effects: yield ~ irrigation + variety
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 38.491 31.056 47.836 1181 <0.001
irrigationi2 0.823 -11.532 11.328 1000 0.87
irrigationi3 0.497 -11.672 12.944 1137 0.93
irrigationi4 3.947 -7.948 14.630 1000 0.41
varietyv2 0.715 -0.730 2.169 1000 0.28
The output shows pMCMC
which is twice the posterior probability observed on the other side of zero from the posterior mean. This is the same way in which a bootstrap p-value would be calculated and might be viewed as a “Bayesian p-value” (a dubious analogy). These suggest no strong evidence of a difference in the irrigation methods or varieties.
We can plot the posterior distributions.
library(reshape2)
ref <- data.frame(bmod$Sol[,2:4])
colnames(ref) <- c("2-1","3-1","4-1")
rdf <- melt(ref)
colnames(rdf) <- c("irrigation","yield")
ggplot(rdf, aes(x=yield,color=irrigation))+geom_density()
These are differences with the method 1 irrigation level. As we see all three overlap zero substantially hence the large p-value above. Furthermore, the three distributions all overlap considerably so there is little evidence of any difference between the four irrigation methods.
We check the variety contrast between 1 and 2:
ref <- data.frame(bmod$Sol[,5])
colnames(ref) <- "yield"
ggplot(ref,aes(x=yield))+geom_density()
We can also look at the field posterior distributions:
ref <- data.frame(bmod$Sol[,6:13])
colnames(ref) <- as.character(1:8)
rdf <- melt(ref)
colnames(rdf) <- c("field","yield")
ggplot(rdf, aes(x=yield,color=field))+geom_density()
We see two kinds of field posterior distribution - (1,2,3,4) and (5,6,7,8). We can look at the posterior distribution for the blend SD:
hist(sqrt(bmod$VCV[,1]), 50, xlab="yield", main="Field SD")
and also for the error SD:
hist(sqrt(bmod$VCV[,2]), 50, xlab="yield", main="Error SD")
We can also look at the joint distribution:
rdf <- data.frame(bmod$VCV)
ggplot(rdf, aes(x=sqrt(field),y=sqrt(units)))+geom_density2d()+geom_abline(int=0,slope=1)
Clear that the field variation is much greater than the error variation.
Check on versions of R packages used:
sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] reshape2_1.2.2 MCMCglmm_2.21 ape_3.1-4 coda_0.16-1
[5] lattice_0.20-29 lmerTest_2.0-20 RLRsim_3.0 pbkrtest_0.4-1
[9] MASS_7.3-33 ggplot2_0.9.3.1 lme4_1.1-7 Rcpp_0.11.3
[13] Matrix_1.1-4 knitr_1.6 rmarkdown_0.2.68
loaded via a namespace (and not attached):
[1] bitops_1.0-6 caTools_1.16 cluster_1.15.2
[4] colorspace_1.2-4 corpcor_1.6.7 dichromat_2.0-0
[7] digest_0.6.4 evaluate_0.5.3 formatR_0.10
[10] Formula_1.1-1 gdata_2.13.3 gplots_2.13.0
[13] grid_3.1.1 gtable_0.1.2 gtools_3.3.1
[16] Hmisc_3.14-3 htmltools_0.2.4 KernSmooth_2.23-12
[19] labeling_0.2 latticeExtra_0.6-26 minqa_1.2.3
[22] munsell_0.4.2 nlme_3.1-117 nloptr_1.0.4
[25] numDeriv_2012.9-1 parallel_3.1.1 plyr_1.8.1
[28] proto_0.3-10 RColorBrewer_1.0-5 scales_0.2.3
[31] splines_3.1.1 stringr_0.6.2 survival_2.37-7
[34] tensorA_0.36 tools_3.1.1 yaml_2.1.11