[1] "Created: Wed Apr  1 16:39:44 2015"See the introduction for an overview.
library(lme4)
library(ggplot2)
options(digits=5, show.signif.stars=FALSE)
Load in and summarize the data:
data(psid, package="faraway")
head(psid)
  age educ sex income year person
1  31   12   M   6000   68      1
2  31   12   M   5300   69      1
3  31   12   M   5200   70      1
4  31   12   M   6900   71      1
5  31   12   M   7500   72      1
6  31   12   M   8000   73      1
summary(psid)
      age            educ      sex         income            year     
 Min.   :25.0   Min.   : 3.0   F:732   Min.   :     3   Min.   :68.0  
 1st Qu.:28.0   1st Qu.:10.0   M:929   1st Qu.:  4300   1st Qu.:73.0  
 Median :34.0   Median :12.0           Median :  9000   Median :78.0  
 Mean   :32.2   Mean   :11.8           Mean   : 13575   Mean   :78.6  
 3rd Qu.:36.0   3rd Qu.:13.0           3rd Qu.: 18050   3rd Qu.:84.0  
 Max.   :39.0   Max.   :16.0           Max.   :180000   Max.   :90.0  
     person    
 Min.   : 1.0  
 1st Qu.:20.0  
 Median :42.0  
 Mean   :42.4  
 3rd Qu.:63.0  
 Max.   :85.0  
Construct some plots:
library(dplyr)
psid20 <- filter(psid, person <= 20)
ggplot(psid20, aes(x=year, y=income)) + geom_line() + facet_wrap(~ person)
ggplot(psid20, aes(x=year, y=income+100, group=person)) + geom_line() + facet_wrap(~ sex) + scale_y_log10()
Fit the textbook model:
psid$cyear <- psid$year-78
mmod <- lmer(log(income) ~ cyear*sex +age+educ+(cyear|person),psid)
print(summary(mmod), corr=FALSE)
Linear mixed model fit by REML ['lmerMod']
Formula: log(income) ~ cyear * sex + age + educ + (cyear | person)
   Data: psid
REML criterion at convergence: 3819.8
Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-10.231  -0.213   0.079   0.415   2.825 
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 person   (Intercept) 0.2817   0.531        
          cyear       0.0024   0.049    0.19
 Residual             0.4673   0.684        
Number of obs: 1661, groups:  person, 85
Fixed effects:
            Estimate Std. Error t value
(Intercept)   6.6742     0.5433   12.28
cyear         0.0853     0.0090    9.48
sexM          1.1503     0.1213    9.48
age           0.0109     0.0135    0.81
educ          0.1042     0.0214    4.86
cyear:sexM   -0.0263     0.0122   -2.15
anova(mmod)
Analysis of Variance Table
          Df Sum Sq Mean Sq F value
cyear      1   61.6    61.6  131.76
sex        1   45.8    45.8   97.92
age        1    0.0     0.0    0.00
educ       1   10.9    10.9   23.41
cyear:sex  1    2.2     2.2    4.62
We get estimates but, as expected, no inferential output.
Compute parametric bootstrap confidence intervals:
confint(mmod, method="boot")                                 2.5 %     97.5 %
sd_(Intercept)|person         0.448599  0.6177670
cor_cyear.(Intercept)|person -0.073465  0.4363037
sd_cyear|person               0.039640  0.0583016
sigma                         0.660695  0.7104605
(Intercept)                   5.585035  7.7072513
cyear                         0.068175  0.1032825
sexM                          0.920625  1.3845276
age                          -0.015304  0.0378592
educ                          0.062602  0.1491223
cyear:sexM                   -0.050078 -0.0037928
There is some indication of no correlation between the slope and intercept random effects. Perhaps there is also no year-by-sex interaction, although that interval just excludes zero.
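Given that the interval for the correlation covers zero, one could refit the model without that correlation; a minimal sketch (not part of the original analysis) using lme4's double-bar syntax, with mmod_uncor a name introduced here:
## Sketch: independent intercept and slope random effects (no correlation term)
mmod_uncor <- lmer(log(income) ~ cyear*sex + age + educ + (cyear || person), psid)
## compare to the correlated fit; refit=FALSE keeps both fits as REML
anova(mmod_uncor, mmod, refit=FALSE)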
We can test the interaction term between year and sex in the model using the KR adjustment:
library(pbkrtest)
psid$cyear <- psid$year-78
mmod <- lmer(log(income) ~ cyear*sex +age+educ+(cyear|person),psid, REML=FALSE)
mmodr <- lmer(log(income) ~ cyear + sex +age+educ+(cyear|person),psid, REML=FALSE)
KRmodcomp(mmod, mmodr)
F-test with Kenward-Roger approximation; computing time: 0.30 sec.
large : log(income) ~ cyear + sex + age + educ + (cyear | person) + cyear:sex
small : log(income) ~ cyear + sex + age + educ + (cyear | person)
       stat   ndf   ddf F.scaling p.value
Ftest  4.61  1.00 81.33         1   0.035
The result is very close to the p-value obtained for the t-test of this term in the textbook output, although it differs slightly from the bootstrapped confidence interval result.
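As an alternative to the KR approximation, pbkrtest can compare the same pair of models with a parametric bootstrap; a minimal sketch (not run in the original; nsim chosen only for illustration):
## Sketch: parametric bootstrap comparison of the models with and without cyear:sex
pbtest <- PBmodcomp(mmod, mmodr, nsim=500)
summary(pbtest)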
Using RLRsim is not possible here because the model has multiple random effects. Instead, we can implement the parametric bootstrap directly to test whether there is any variation in the random slopes.
First we compute the observed LRT statistic:
mmod <- lmer(log(income) ~ cyear*sex + age + educ + (cyear|person), psid, REML=FALSE)
rmod <- lmer(log(income) ~ cyear*sex + age + educ + (1|person), psid, REML=FALSE)
(olrt <- 2*(logLik(mmod)-logLik(rmod)))
'log Lik.' 142.3 (df=10)
Now we resample 1000 times. (Notice that the simulated response is already on the log scale, so we should not take logs again.)
lrstat <- numeric(1000)
for(i in 1:1000){
  # simulate a response from the null (random-intercept only) model
  rincome <- unlist(simulate(rmod))
  # refit both models to the simulated response and store the LRT statistic
  bmod <- refit(mmod, rincome)
  smod <- refit(rmod, rincome)
  lrstat[i] <- 2*(logLik(bmod)-logLik(smod))
}
mean(lrstat > olrt)
[1] 0
It looks like there is some variation in the slopes.
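Since none of the 1000 simulated statistics exceeded the observed value, the estimated p-value comes out as exactly zero; a minimal sketch (not part of the original analysis) of the usual (x+1)/(n+1) correction, which reports it as at most about 1/1000 instead:
## Sketch: bound the bootstrap p-value away from exactly zero
(sum(lrstat > olrt) + 1)/(length(lrstat) + 1)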
We can also use the lmerTest package, which adds Satterthwaite-approximation tests to the standard output:
library(lmerTest)
mmod <- lmer(log(income) ~ cyear*sex +age+educ+(cyear|person),psid)
print(summary(mmod), corr=FALSE)
Linear mixed model fit by REML t-tests use Satterthwaite approximations
  to degrees of freedom [merModLmerTest]
Formula: log(income) ~ cyear * sex + age + educ + (cyear | person)
   Data: psid
REML criterion at convergence: 3819.8
Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-10.231  -0.213   0.079   0.415   2.825 
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 person   (Intercept) 0.2817   0.531        
          cyear       0.0024   0.049    0.19
 Residual             0.4673   0.684        
Number of obs: 1661, groups:  person, 85
Fixed effects:
            Estimate Std. Error      df t value Pr(>|t|)
(Intercept)   6.6742     0.5433 81.2000   12.28  < 2e-16
cyear         0.0853     0.0090 78.9000    9.48  1.2e-14
sexM          1.1503     0.1213 81.8000    9.48  8.0e-15
age           0.0109     0.0135 80.8000    0.81    0.421
educ          0.1042     0.0214 80.7000    4.86  5.7e-06
cyear:sexM   -0.0263     0.0122 78.0000   -2.15    0.035
anova(mmod)
Analysis of Variance Table of type 3 with Satterthwaite 
approximation for degrees of freedom
          Sum Sq Mean Sq NumDF DenDF F.value  Pr(>F)
cyear       65.0    65.0     1  78.0   139.1 < 2e-16
sex         42.0    42.0     1  81.8    89.9 8.0e-15
age          0.3     0.3     1  80.8     0.7   0.421
educ        11.0    11.0     1  80.7    23.6 5.7e-06
cyear:sex    2.2     2.2     1  78.0     4.6   0.035
We can also test the random effects:
rand(mmod)
Analysis of Random effects Table:
             Chi.sq Chi.DF p.value
cyear:person    145      2  <2e-16
However, we prefer the parametric bootstrap result because of problems with the asymptotic distribution of the LRT in such cases.
We can also take a Bayesian approach using MCMCglmm. We use the default priors in this instance, with an unstructured covariance matrix for the random effects (the intercept and slope for each person):
library(MCMCglmm)
bmod <- MCMCglmm(log(income) ~ cyear*sex+age+educ, random=~us(1+cyear):person, data=psid, verbose=FALSE)
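If the default priors were a concern, an explicit weakly informative prior could be supplied instead; a minimal sketch (not run here; prior1 and bmodp are names introduced only for illustration):
## Sketch: inverse-Wishart style priors for the residual (R) and
## random-effect (G) covariance structures
prior1 <- list(R=list(V=1, nu=0.002),
               G=list(G1=list(V=diag(2), nu=0.002)))
bmodp <- MCMCglmm(log(income) ~ cyear*sex+age+educ, random=~us(1+cyear):person,
                  prior=prior1, data=psid, verbose=FALSE)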
xyplot(bmod$VCV)
The diagnostics appear satisfactory. Notice that the covariance term between the slope and intercept appears twice (identically). We can investigate this correlation:
yrpercor <- bmod$VCV[,2]/sqrt(bmod$VCV[,1]*bmod$VCV[,4])
ggplot(data.frame(corr=as.numeric(yrpercor)), aes(x=corr)) + geom_density() + xlab("Correlation")
2*mean(yrpercor < 0)
[1] 0.15
It seems it would not be unreasonable to drop the correlation, allowing a simplified model:
bmod <- MCMCglmm(log(income) ~ cyear*sex+age+educ, random=~idh(1+cyear):person, data=psid, verbose=FALSE)
xyplot(bmod$VCV)
Again the diagnostics appear satisfactory. Check the summary output:
summary(bmod)
 Iterations = 3001:12991
 Thinning interval  = 10
 Sample size  = 1000 
 DIC: 3595.4 
 G-structure:  ~idh(1 + cyear):person
                   post.mean l-95% CI u-95% CI eff.samp
(Intercept).person   0.28924  0.18870   0.3831     1000
cyear.person         0.00249  0.00164   0.0037     1195
 R-structure:  ~units
      post.mean l-95% CI u-95% CI eff.samp
units     0.468    0.434    0.499     1000
 Location effects: log(income) ~ cyear * sex + age + educ 
            post.mean  l-95% CI  u-95% CI eff.samp  pMCMC
(Intercept)  6.633254  5.558990  7.806761     1182 <0.001
cyear        0.085062  0.067315  0.104848     1000 <0.001
sexM         1.153738  0.933499  1.410487      909 <0.001
age          0.010706 -0.015798  0.041155     1148  0.428
educ         0.108139  0.066923  0.150871     1248 <0.001
cyear:sexM  -0.025771 -0.049605 -0.000359     1000  0.036
We see that the age term may not be important, but the other terms make a stronger contribution. Both random-effect variances appear appreciable. We can check the posteriors, first for the intercept variation:
ref <- data.frame(bmod$VCV)
colnames(ref) <- c("intercept","slope","error")
ggplot(ref, aes(x=intercept)) + geom_density()
and now for the slope:
ggplot(ref, aes(x=slope)) + geom_density()
and finally for the error term:
ggplot(ref, aes(x=error)) + geom_density()
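The density plots could be complemented with numerical summaries; a minimal sketch using coda's HPDinterval (coda is loaded along with MCMCglmm):
## Sketch: 95% highest posterior density intervals for the variance components
## and for the fixed effects
HPDinterval(bmod$VCV)
HPDinterval(bmod$Sol)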
Check on versions of R packages used:
sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
 [1] MCMCglmm_2.21    ape_3.1-4        coda_0.16-1      lattice_0.20-29 
 [5] lmerTest_2.0-20  pbkrtest_0.4-1   MASS_7.3-33      dplyr_0.4.1     
 [9] ggplot2_0.9.3.1  lme4_1.1-7       Rcpp_0.11.3      Matrix_1.1-4    
[13] knitr_1.6        rmarkdown_0.2.68
loaded via a namespace (and not attached):
 [1] assertthat_0.1      bitops_1.0-6        boot_1.3-11        
 [4] caTools_1.16        cluster_1.15.2      colorspace_1.2-4   
 [7] corpcor_1.6.7       DBI_0.3.1           dichromat_2.0-0    
[10] digest_0.6.4        evaluate_0.5.3      formatR_0.10       
[13] Formula_1.1-1       gdata_2.13.3        gplots_2.13.0      
[16] grid_3.1.1          gtable_0.1.2        gtools_3.3.1       
[19] Hmisc_3.14-3        htmltools_0.2.4     KernSmooth_2.23-12 
[22] labeling_0.2        latticeExtra_0.6-26 lazyeval_0.1.10    
[25] magrittr_1.0.1      minqa_1.2.3         munsell_0.4.2      
[28] nlme_3.1-117        nloptr_1.0.4        numDeriv_2012.9-1  
[31] parallel_3.1.1      plyr_1.8.1          proto_0.3-10       
[34] RColorBrewer_1.0-5  reshape2_1.2.2      scales_0.2.3       
[37] splines_3.1.1       stringr_0.6.2       survival_2.37-7    
[40] tensorA_0.36        tools_3.1.1         yaml_2.1.11