[1] "Created: Wed Apr 1 16:44:31 2015"
See the introduction for an overview.
library(lme4)
library(ggplot2)
options(digits=5,show.signif.stars=FALSE)
Load up and look at the data:
data(pulp, package="faraway")
pulp
bright operator
1 59.8 a
2 60.0 a
3 60.8 a
4 60.8 a
5 59.8 a
6 59.8 b
7 60.2 b
8 60.4 b
9 59.9 b
10 60.0 b
11 60.7 c
12 60.7 c
13 60.5 c
14 60.9 c
15 60.3 c
16 61.0 d
17 60.8 d
18 60.6 d
19 60.5 d
20 60.5 d
ggplot(pulp, aes(x=operator, y=bright))+geom_point(position = position_jitter(width=0.1, height=0.0))
Fit a model with a fixed intercept and random effect for the operators:
mmod <- lmer(bright ~ 1+(1|operator), pulp)
summary(mmod)
Linear mixed model fit by REML ['lmerMod']
Formula: bright ~ 1 + (1 | operator)
Data: pulp
REML criterion at convergence: 18.6
Scaled residuals:
Min 1Q Median 3Q Max
-1.467 -0.759 -0.124 0.628 1.601
Random effects:
Groups Name Variance Std.Dev.
operator (Intercept) 0.0681 0.261
Residual 0.1062 0.326
Number of obs: 20, groups: operator, 4
Fixed effects:
Estimate Std. Error t value
(Intercept) 60.400 0.149 404
There is only an intercept term which we are not normally interested in testing. We can use a normal approximation to the t-statistic given in the output of 404 to test the hypothesis that the intercept is zero. Clearly the intercept is very far from zero.
We can produce confidence intervals using the profile likelihood (which is based on the likelihood ratio test):
confint(mmod)
2.5 % 97.5 %
.sig01 0.00000 0.61790
.sigma 0.23891 0.48218
(Intercept) 60.07130 60.72870
This also produces confidence intervals for the two random effect parameters.
The RLRsim
package offers a convenient way of using the parametric bootstrap method:
library(RLRsim)
smod <- lmer(bright ~ 1+(1|operator), pulp, REML=FALSE)
nullmod <- lm(bright ~ 1, pulp)
exactLRT(smod, nullmod)
simulated finite sample distribution of LRT. (p-value based on
10000 simulated values)
data:
LRT = 2.5684, p-value = 0.0232
We that the random effect term, operator
is statistically significant.
The lmerTest
package also allows a test on the random effect term:
library(lmerTest)
smod <- lmer(bright ~ 1+(1|operator), pulp, REML=FALSE)
rand(smod)
Analysis of Random effects Table:
Chi.sq Chi.DF p.value
operator 3.47 1 0.06
This test uses the twice the differences in the log-likelihoods to construct a test statistic with a chi-squared null distribution to generate the p-value. As explained in the text, this test is not accurate because the null hypothesis that the variance is zero lies on the boundary of the parameter space. This makes the asymptotic approximation of the chi-square invalid.
We can also take a Bayesian approach to analyzing this data using the MCMCglmm
package.
We fit the model using all the default settings which involve almost uninformative priors with 10,000 iterations and sampling every 10th case:
library(MCMCglmm)
bmod <- MCMCglmm(bright ~ 1, random= ~operator, data=pulp, verbose=FALSE)
Before we can start making any conclusions we need to verify that the MCMC has converged. We need to check for stability in the iterations:
First check the intercept term:
xyplot(bmod$Sol)
This looks stable. We should also check the autocorrelation:
autocorr.plot(bmod$Sol)
There no concern with correlation.
The same checks need to be made for the two variance terms:
xyplot(log(bmod$VCV))
This works better on a log scale for variances. There is more concern for the operator variance although there is no trend. For the operator, the chain oscillates between a state of larger variance and one of very small variance.
autocorr.plot(bmod$VCV)
The stronger correlation is apparent but not sufficient to stop us from proceeding.
summary(bmod)
Iterations = 3001:12991
Thinning interval = 10
Sample size = 1000
DIC: 21.71
G-structure: ~operator
post.mean l-95% CI u-95% CI eff.samp
operator 0.0574 2.94e-17 0.263 246
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 0.159 0.0567 0.277 244
Location effects: bright ~ 1
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 60.4 60.1 60.7 1000 <0.001
For the interecept term, the posterior mean of 60.4 is identical with the lme4
fit. The credible (not confidence) interval is also similar to our previous findings. For the variance terms the results differ more from the lme4
-based results.
Previously we tested the hypothesis that the operator variance was zero. This same question does not make sense in the Bayesian analysis although we can gain some insight by studying the posterior distribution of the operator variance. It is easier to understand as an SD rather than variance:
hist(sqrt(bmod$VCV[,1]),100, xlab="operator SD", main="posterior distribution")
We can see that there is a high probability that the operator SD is rather small. Notice that the response is only measured to one decimal digit. Hence an SD of 0.1, meaning a variance of 0.01, would be hardly noticeable. We can estimate the probability that the operator variance is less than 0.01 as
mean(bmod$VCV[,1] < 0.01)
[1] 0.683
which is quite high. So we are quite confident that the operator variance is small especially compared to the error variance:
hist(sqrt(bmod$VCV[,2]),100, xlab="error SD", main="posterior distribution")
In the previous frequentist based analysis, we tested the hypothesis that the operator variance is zero. The p-value as calculated by the more reliable parametric bootstrap method was smaller than 0.05 and so we reject this hypothesis. However, it is seems quite a strong claim to suggest that the operator variance is exactly zero. A weaker claim is that is small enough to ignore. The Bayesian analysis suggests that this claim is has considerable weight but there remains a reasonable chance that the operator variance is large enough that we cannot ignore it.
Although the MCMCglmm-based analysis provides new insights, there remains some concern over the model fitting. We need to be confident that the MCMC has converged correctly. We have shown how this can be checked but this a continuing concern. Furthermore, we have simply used the default choice of prior. It is sensible to see whether the results are sensitive to this choice.
We have used the default priors in the analysis above. We can experiment with different choices for the prior. The variance parameters have inverse Wishart priors which simplify to inverse gamma in one dimension. The parameter V and nu are related to the shape alpha and scale beta parameters in standard specification of the inverse gamma by: \[ \alpha = \nu/2 \qquad \beta=\nu V/2 \]
We define the inverse gamma density:
dinvgamma <- function(x,alpha=1,beta=1) beta^alpha*x^(-alpha-1)*exp(-beta/x)/gamma(alpha)
The default values are V=1, nu=0.02. This corresponds to a density of:
x <- (1:300)/100
plot(x, dinvgamma(x,0.01, 0.01), ylab="density", yaxs="i",type="l")
As can be seen, this apparently puts more prior mass on smaller values of the variance. However, this is somewhat misleading when viewed on an additive scale.
If we increase nu to 0.5:
plot(x, dinvgamma(x,0.25, 0.25), ylab="density", yaxs="i",type="l")
This makes the distribution more concentrated on moderate values of the variance.
Now decrease V to 0.2
plot(x, dinvgamma(x,0.25, 0.05), ylab="density", yaxs="i",type="l")
This reduces the mean.
In this particular example, a more informative prior might suggest that there is going to be some variation between and within operators but we believe this variation is not going to be particular large or small. We can achieve this by setting mu to be higher than the default choice of 0.02 - we try 0.5. In MCMCglmm
, R refers to the residual variation and G to the group variation.
iprior <- list(R=list(V=1,nu=0.5),G=list(G1=list(V=1,nu=0.5)))
bmodi <- MCMCglmm(bright ~ 1, random= ~operator, data=pulp, verbose=FALSE, prior=iprior)
Diagnostics:
xyplot(bmodi$Sol)
xyplot(log(bmodi$VCV))
As can be seen, the chain mixing for the operator variance is much more satisfactory.
Summary output
summary(bmodi)
Iterations = 3001:12991
Thinning interval = 10
Sample size = 1000
DIC: 18.2
G-structure: ~operator
post.mean l-95% CI u-95% CI eff.samp
operator 0.499 0.0372 1.48 1000
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 0.15 0.0695 0.27 1000
Location effects: bright ~ 1
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 60.4 59.7 61.1 1151 <0.001
There is a considerable difference with the previous MCMCglmm analysis. The operator variance is much larger and there is more variation in the intercept term.
Posterior distribution of the operator SD
hist(sqrt(bmodi$VCV[,1]),100, xlab="operator SD", main="posterior distribution")
This is markedly different from the results using the defaut prior and puts more mass on larger values of the operator variance.
We can check again on the probability that the variance is less than 0.01 (or SD less than 0.1):
mean(bmodi$VCV[,1] < 0.01)
[1] 0
Hence the results are considerably different from before and we see that the outcome is sensitive to the choice of prior. But this should not be surprising as we have a small dataset with only 20 observations overall with only 4 different operators with which to estimate the operator variance. It is not surprising that the prior is influential in this example. Furthermore, if we examine the density of the prior we have used as plotted above, we see that this puts very little weight on very small values of the variance so this comes through in the posterior distribution.
The problem with the default choice of prior was that this resulted in poor mixing properties in the MCMC. This instability is common when the variance approaches zero. When we explicitly set the prior to put little weight on near zero values for the variance, the problem was avoided but at the price of making a strong assumption about the size of the variance which we might not wish to make.
It is possible to avoid the mixing problems while retaining a prior that allows some weight to small variances. We can use a technique called parameter expansion which is explained in more detail in Hadfield (2010). We can specify an non-central scaled F-distribution prior for the variance. For example, here is a possible prior for the operator variance:
alpha.V <- 1000
alpha.mu <- 0
nu <- 0.02
plot(x,df(x/alpha.V, df1 = 1, df2 = nu, ncp = (alpha.mu^2)/alpha.V),type="l",ylab="Density",yaxs="i")
We can see that it is comparable to the default prior in its shape. We can use this:
eprior <- list(R=list(V=1,nu=0.02),G=list(G1=list(V=1,nu=0.02,alpha.V=1000)))
bmode <- MCMCglmm(bright ~ 1, random= ~operator, data=pulp, verbose=FALSE, prior=eprior)
and check the mixing properties:
xyplot(log(bmode$VCV))
These are satisfactory. We now check the model summary:
summary(bmode)
Iterations = 3001:12991
Thinning interval = 10
Sample size = 1000
DIC: 18.074
G-structure: ~operator
post.mean l-95% CI u-95% CI eff.samp
operator 0.623 1.29e-05 2.42 439
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 0.126 0.0536 0.241 1000
Location effects: bright ~ 1
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 60.4 59.8 61.3 1000 <0.001
Posterior distribution of the operator SD:
hist(sqrt(bmode$VCV[,1]),100, xlab="operator SD", main="posterior distribution")
This is more like the second posterior above than the default prior-based version.
We can check again on the probability that the variance is less than 0.01 (or SD less than 0.1):
mean(bmode$VCV[,1] < 0.01)
[1] 0.034
We see a relatively small chance of a negligible operator SD.
We can compare the error and operator SDs:
plot(sqrt(operator) ~ sqrt(units), data=bmode$VCV, xlab="error SD", ylab="operator SD")
abline(0,1)
mean(bmode$VCV[,1] > bmode$VCV[,2])
[1] 0.554
We see that in slightly more than half the samples, the operator SD is larger than the error SD.
In the Bayesian framework, it is natural to think of the operator random effects as distributions. We need to modify the MCMCglmm
call to add pr=TRUE
if we want to examine these posteriors:
bmode <- MCMCglmm(bright ~ 1, random= ~operator, data=pulp, verbose=FALSE, prior=eprior, pr=TRUE)
colMeans(bmode$Sol)
(Intercept) operator.a operator.b operator.c operator.d
60.38927 -0.11140 -0.25890 0.18911 0.23759
These are the posterior means for the operator effects. We see these are similar to the BLUPs computed from the lme4
fit:
ranef(mmod)
$operator
(Intercept)
a -0.12194
b -0.25912
c 0.16767
d 0.21340
We can also look at the distributions of these effects:
library(reshape2)
ref <- data.frame(bmode$Sol[,-1])
colnames(ref) <- letters[1:4]
rdf <- melt(ref)
ggplot(data=rdf,aes(x=value, color=variable))+geom_density()
There is substantial overlap in the four distributions so there is no real evidence of a difference between any pair of operators.
Check on versions of R packages used:
sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] reshape2_1.2.2 MCMCglmm_2.21 ape_3.1-4 coda_0.16-1
[5] lattice_0.20-29 lmerTest_2.0-20 RLRsim_3.0 ggplot2_0.9.3.1
[9] lme4_1.1-7 Rcpp_0.11.3 Matrix_1.1-4 knitr_1.6
[13] rmarkdown_0.2.68
loaded via a namespace (and not attached):
[1] bitops_1.0-6 caTools_1.16 cluster_1.15.2
[4] colorspace_1.2-4 corpcor_1.6.7 dichromat_2.0-0
[7] digest_0.6.4 evaluate_0.5.3 formatR_0.10
[10] Formula_1.1-1 gdata_2.13.3 gplots_2.13.0
[13] grid_3.1.1 gtable_0.1.2 gtools_3.3.1
[16] Hmisc_3.14-3 htmltools_0.2.4 KernSmooth_2.23-12
[19] labeling_0.2 latticeExtra_0.6-26 MASS_7.3-33
[22] minqa_1.2.3 munsell_0.4.2 nlme_3.1-117
[25] nloptr_1.0.4 numDeriv_2012.9-1 plyr_1.8.1
[28] proto_0.3-10 RColorBrewer_1.0-5 scales_0.2.3
[31] splines_3.1.1 stringr_0.6.2 survival_2.37-7
[34] tensorA_0.36 tools_3.1.1 yaml_2.1.11
The various analysis are agreed on the fixed effect term (which is only the intercept). There is considerably more uncertainty in the magnitudes of the operator and error variances. We have enough information to be more precise about the error variance but there is still some uncertainty. It is harder to pin down the operator variance which is hardly surprising since we have effectively only four observations of operators.
Hadfield, Jarrod D. 2010. “MCMC Methods for Multi-Response Generalized Linear Mixed Models: The MCMCglmm R Package.” Journal of Statistical Software 33 (2): 1–22. http://www.jstatsoft.org/v33/i02/.