[1] "Created: Wed Apr 1 16:39:44 2015"
See the introduction for an overview.
library(lme4)
library(ggplot2)
options(digits=5,show.signif.stars=FALSE)
Load in and summarize the data:
data(psid, package="faraway")
head(psid)
age educ sex income year person
1 31 12 M 6000 68 1
2 31 12 M 5300 69 1
3 31 12 M 5200 70 1
4 31 12 M 6900 71 1
5 31 12 M 7500 72 1
6 31 12 M 8000 73 1
summary(psid)
age educ sex income year
Min. :25.0 Min. : 3.0 F:732 Min. : 3 Min. :68.0
1st Qu.:28.0 1st Qu.:10.0 M:929 1st Qu.: 4300 1st Qu.:73.0
Median :34.0 Median :12.0 Median : 9000 Median :78.0
Mean :32.2 Mean :11.8 Mean : 13575 Mean :78.6
3rd Qu.:36.0 3rd Qu.:13.0 3rd Qu.: 18050 3rd Qu.:84.0
Max. :39.0 Max. :16.0 Max. :180000 Max. :90.0
person
Min. : 1.0
1st Qu.:20.0
Median :42.0
Mean :42.4
3rd Qu.:63.0
Max. :85.0
Construct some plots:
library(dplyr)
psid20 <- filter(psid, person <= 20)
ggplot(psid20, aes(x=year, y=income))+geom_line()+facet_wrap(~ person)
ggplot(psid20, aes(x=year, y=income+100, group=person)) +geom_line()+facet_wrap(~ sex)+scale_y_log10()
Fit the textbook model:
psid$cyear <- psid$year-78
mmod <- lmer(log(income) ~ cyear*sex +age+educ+(cyear|person),psid)
print(summary(mmod), corr=FALSE)
Linear mixed model fit by REML ['lmerMod']
Formula: log(income) ~ cyear * sex + age + educ + (cyear | person)
Data: psid
REML criterion at convergence: 3819.8
Scaled residuals:
Min 1Q Median 3Q Max
-10.231 -0.213 0.079 0.415 2.825
Random effects:
Groups Name Variance Std.Dev. Corr
person (Intercept) 0.2817 0.531
cyear 0.0024 0.049 0.19
Residual 0.4673 0.684
Number of obs: 1661, groups: person, 85
Fixed effects:
Estimate Std. Error t value
(Intercept) 6.6742 0.5433 12.28
cyear 0.0853 0.0090 9.48
sexM 1.1503 0.1213 9.48
age 0.0109 0.0135 0.81
educ 0.1042 0.0214 4.86
cyear:sexM -0.0263 0.0122 -2.15
anova(mmod)
Analysis of Variance Table
Df Sum Sq Mean Sq F value
cyear 1 61.6 61.6 131.76
sex 1 45.8 45.8 97.92
age 1 0.0 0.0 0.00
educ 1 10.9 10.9 23.41
cyear:sex 1 2.2 2.2 4.62
As expected, we get parameter estimates but no p-values or other inferential output.
Compute parametric bootstrap confidence intervals:
confint(mmod, method="boot")
2.5 % 97.5 %
sd_(Intercept)|person 0.448599 0.6177670
cor_cyear.(Intercept)|person -0.073465 0.4363037
sd_cyear|person 0.039640 0.0583016
sigma 0.660695 0.7104605
(Intercept) 5.585035 7.7072513
cyear 0.068175 0.1032825
sexM 0.920625 1.3845276
age -0.015304 0.0378592
educ 0.062602 0.1491223
cyear:sexM -0.050078 -0.0037928
The confidence interval for the correlation between the slope and intercept random effects contains zero, so there is some indication that this correlation is not needed. The interval for the year-by-sex interaction only just excludes zero, so the evidence for this interaction is not strong.
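For illustration (this refit is not part of the original analysis), we could drop that correlation by giving the intercept and slope independent random effects and then inspect the estimated variance components:
# Sketch: refit with uncorrelated intercept and slope random effects
mmodi <- lmer(log(income) ~ cyear*sex + age + educ + (1|person) + (0+cyear|person), psid)
VarCorr(mmodi)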
We can test the year-by-sex interaction term in the model using the Kenward-Roger (KR) adjustment:
library(pbkrtest)
psid$cyear <- psid$year-78
mmod <- lmer(log(income) ~ cyear*sex +age+educ+(cyear|person),psid, REML=FALSE)
mmodr <- lmer(log(income) ~ cyear + sex +age+educ+(cyear|person),psid, REML=FALSE)
KRmodcomp(mmod,mmodr)
F-test with Kenward-Roger approximation; computing time: 0.30 sec.
large : log(income) ~ cyear + sex + age + educ + (cyear | person) + cyear:sex
small : log(income) ~ cyear + sex + age + educ + (cyear | person)
stat ndf ddf F.scaling p.value
Ftest 4.61 1.00 81.33 1 0.035
The p-value is very close to that obtained from the t-test of this term in the textbook output, although it differs slightly from the bootstrap confidence interval result.
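As a cross-check on the KR approximation, pbkrtest can also compare the same nested models with a parametric bootstrap; a minimal sketch (nsim=500 is an arbitrary illustrative choice):
# Sketch: parametric bootstrap test of the cyear:sex interaction;
# more simulations would give a more precise p-value
set.seed(123)
PBmodcomp(mmod, mmodr, nsim=500)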
Using RLRsim is not possible here because the model has multiple random effects per group. Instead, we can implement the parametric bootstrap directly to test whether there is any variation in the slope random effects.
First we compute the observed likelihood ratio test (LRT) statistic:
mmod <- lmer(log(income) ~ cyear*sex +age+educ+(cyear|person),psid, REML=FALSE)
rmod <- lmer(log(income) ~ cyear*sex +age+educ+(1|person),psid, REML=FALSE)
(olrt <- 2*(logLik(mmod)-logLik(rmod)))
'log Lik.' 142.3 (df=10)
Now we resample 1000 times. Notice that the simulated response is already on the log scale, so we should not take logs again:
lrstat <- numeric(1000)
for(i in 1:1000){
  # simulate a new response from the null (random intercept only) model
  rincome <- unlist(simulate(rmod))
  # refit both models to the simulated response and store the LRT statistic
  bmod <- refit(mmod,rincome)
  smod <- refit(rmod,rincome)
  lrstat[i] <- 2*(logLik(bmod)-logLik(smod))
}
mean(lrstat > olrt)
[1] 0
None of the 1000 simulated LRT statistics exceeded the observed value, so there is strong evidence of variation in the slopes.
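When the Monte Carlo p-value comes out exactly zero, a common adjustment is to add one to the numerator and denominator, which gives an upper bound rather than a p-value of zero; a minimal sketch:
# Sketch: add-one correction so the estimated p-value is never exactly zero
(sum(lrstat >= olrt) + 1)/(length(lrstat) + 1)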
We can also obtain approximate t- and F-tests using the Satterthwaite method in lmerTest:
library(lmerTest)
mmod <- lmer(log(income) ~ cyear*sex +age+educ+(cyear|person),psid)
print(summary(mmod), corr=FALSE)
Linear mixed model fit by REML t-tests use Satterthwaite approximations
to degrees of freedom [merModLmerTest]
Formula: log(income) ~ cyear * sex + age + educ + (cyear | person)
Data: psid
REML criterion at convergence: 3819.8
Scaled residuals:
Min 1Q Median 3Q Max
-10.231 -0.213 0.079 0.415 2.825
Random effects:
Groups Name Variance Std.Dev. Corr
person (Intercept) 0.2817 0.531
cyear 0.0024 0.049 0.19
Residual 0.4673 0.684
Number of obs: 1661, groups: person, 85
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 6.6742 0.5433 81.2000 12.28 < 2e-16
cyear 0.0853 0.0090 78.9000 9.48 1.2e-14
sexM 1.1503 0.1213 81.8000 9.48 8.0e-15
age 0.0109 0.0135 80.8000 0.81 0.421
educ 0.1042 0.0214 80.7000 4.86 5.7e-06
cyear:sexM -0.0263 0.0122 78.0000 -2.15 0.035
anova(mmod)
Analysis of Variance Table of type 3 with Satterthwaite
approximation for degrees of freedom
Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)
cyear 65.0 65.0 1 78.0 139.1 < 2e-16
sex 42.0 42.0 1 81.8 89.9 8.0e-15
age 0.3 0.3 1 80.8 0.7 0.421
educ 11.0 11.0 1 80.7 23.6 5.7e-06
cyear:sex 2.2 2.2 1 78.0 4.6 0.035
We can also test the random effects:
rand(mmod)
Analysis of Random effects Table:
Chi.sq Chi.DF p.value
cyear:person 145 2 <2e-16
However, we prefer the parametric bootstrap result because the asymptotic distribution of the LRT is unreliable when variance parameters are tested on the boundary of the parameter space, as they are here.
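For reference, a commonly used asymptotic correction for this boundary problem takes the null distribution of the LRT for dropping the random slope (and its correlation with the intercept) to be an equal mixture of chi-squared distributions with 1 and 2 degrees of freedom; a minimal sketch using the observed statistic computed above:
# Sketch: boundary-corrected asymptotic p-value (standard mixture
# approximation, not part of the original analysis)
0.5*pchisq(as.numeric(olrt), 1, lower.tail=FALSE) + 0.5*pchisq(as.numeric(olrt), 2, lower.tail=FALSE)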
We now fit a Bayesian version of the model using MCMCglmm. We use the default priors with an unstructured covariance matrix for the random effects, consisting of the intercept and slope for each person:
library(MCMCglmm)
library(lattice)  # provides xyplot for the MCMC diagnostic plots
bmod <- MCMCglmm(log(income) ~ cyear*sex+age+educ, random=~us(1+cyear):person, data=psid, verbose=FALSE)
xyplot(bmod$VCV)
The diagnostics appear satisfactory. Notice that the covariance term between slope and intercept appears twice (identically). We can investigate this correlation:
yrpercor <- bmod$VCV[,2]/sqrt(bmod$VCV[,1]*bmod$VCV[,4])
ggplot(data.frame(corr=yrpercor), aes(x=corr)) + geom_density() + xlab("Correlation")
2*mean(yrpercor < 0)
[1] 0.15
It would not be unreasonable to drop the correlation, allowing a simplified model:
bmod <- MCMCglmm(log(income) ~ cyear*sex+age+educ, random=~idh(1+cyear):person, data=psid, verbose=FALSE)
xyplot(bmod$VCV)
Again the diagnostics appear satisfactory. Check the summary output:
summary(bmod)
Iterations = 3001:12991
Thinning interval = 10
Sample size = 1000
DIC: 3595.4
G-structure: ~idh(1 + cyear):person
post.mean l-95% CI u-95% CI eff.samp
(Intercept).person 0.28924 0.18870 0.3831 1000
cyear.person 0.00249 0.00164 0.0037 1195
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 0.468 0.434 0.499 1000
Location effects: log(income) ~ cyear * sex + age + educ
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 6.633254 5.558990 7.806761 1182 <0.001
cyear 0.085062 0.067315 0.104848 1000 <0.001
sexM 1.153738 0.933499 1.410487 909 <0.001
age 0.010706 -0.015798 0.041155 1148 0.428
educ 0.108139 0.066923 0.150871 1248 <0.001
cyear:sexM -0.025771 -0.049605 -0.000359 1000 0.036
We see that the age term may not be important, but the other fixed-effect terms make stronger contributions. Both random-effect variances appear appreciable. We can check the posteriors, first for the intercept variation:
ref <- data.frame(bmod$VCV)
colnames(ref) <- c("intercept","slope","error")
ggplot(ref, aes(x=intercept))+geom_density()
and now for the slope:
ggplot(ref, aes(x=slope))+geom_density()
and finally for the error term:
ggplot(ref, aes(x=error))+geom_density()
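If numerical interval summaries are preferred to density plots, highest posterior density intervals can be read off directly with coda's HPDinterval; a minimal sketch:
# Sketch: 95% HPD intervals for the variance components and fixed effects
HPDinterval(bmod$VCV)
HPDinterval(bmod$Sol)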
Check the versions of R and the packages used:
sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] MCMCglmm_2.21 ape_3.1-4 coda_0.16-1 lattice_0.20-29
[5] lmerTest_2.0-20 pbkrtest_0.4-1 MASS_7.3-33 dplyr_0.4.1
[9] ggplot2_0.9.3.1 lme4_1.1-7 Rcpp_0.11.3 Matrix_1.1-4
[13] knitr_1.6 rmarkdown_0.2.68
loaded via a namespace (and not attached):
[1] assertthat_0.1 bitops_1.0-6 boot_1.3-11
[4] caTools_1.16 cluster_1.15.2 colorspace_1.2-4
[7] corpcor_1.6.7 DBI_0.3.1 dichromat_2.0-0
[10] digest_0.6.4 evaluate_0.5.3 formatR_0.10
[13] Formula_1.1-1 gdata_2.13.3 gplots_2.13.0
[16] grid_3.1.1 gtable_0.1.2 gtools_3.3.1
[19] Hmisc_3.14-3 htmltools_0.2.4 KernSmooth_2.23-12
[22] labeling_0.2 latticeExtra_0.6-26 lazyeval_0.1.10
[25] magrittr_1.0.1 minqa_1.2.3 munsell_0.4.2
[28] nlme_3.1-117 nloptr_1.0.4 numDeriv_2012.9-1
[31] parallel_3.1.1 plyr_1.8.1 proto_0.3-10
[34] RColorBrewer_1.0-5 reshape2_1.2.2 scales_0.2.3
[37] splines_3.1.1 stringr_0.6.2 survival_2.37-7
[40] tensorA_0.36 tools_3.1.1 yaml_2.1.11