[1] "Created: Wed Sep  3 16:38:19 2014"

We repeat the computation except using a different kernel. We choose the Matern Kernel of order 3/2 as this is perhaps the next most popular choice after the squared exponential.

load("lcdb.rda")
source("funcs.R")
GPR <- TRUE

Set the kernel type and choose an invers ewidth which gives similar fits to the squared exponential kernel.

kerntype <- "matern"
wvec <- 2e-2

Now compute all the measures. Only those measures that depend on the GP fit will be changed. The Richards measures are unchanged.

firstdate <- 53464
daterange <- 2764
detection.limit <- 20.5

Compute all the measures

uids <- lcdbi$id
nuids <- length(uids)
statmat <- NULL
types <- NULL
ids <- NULL
for(i in 1:nuids){
  olc <- lcdb[[i]]
  type <- as.character(lcdbi$type[i])
  id <- as.character(lcdbi$id[i])
  aa <- lcstats(olc,GPR,kerntype,wvec,daterange=daterange,firstdate=firstdate,detection.limit=detection.limit)
  if(is.na(aa[1])) next
  statmat <- rbind(statmat,aa)
  types <- c(types,type)
  ids <- c(ids,id)
}
colnames(statmat) <- c("meds", "iqr","shov", "maxdiff", "dscore","wander", "moveloc","nobs", "totvar", "quadvar", "famp", "fslope","trend",
"outl", "skewres", "std", "shapwilk", "lsd", "gscore", "mdev", "gtvar", "rm.amplitude", "rm.mean",
"rm.stddev", "rm.beyond1std", "rm.fpr20", "rm.fpr35", "rm.fpr50",
"rm.fpr65", "rm.fpr80", "rm.lintrend", "rm.maxslope", "rm.mad",
"rm.medbuf", "rm.pairslope", "rm.peramp", "rm.pdfp", "rm.skew",
"rm.kurtosis", "rm.std", "rm.rcorbor")
cmdb <- data.frame(statmat,type=types,id=ids,row.names=ids)
cmdb$lsd[is.na(cmdb$lsd) | is.infinite(cmdb$lsd)] <- min(cmdb$lsd[!is.infinite(cmdb$lsd)],na.rm=TRUE)-1
cmdb$fslope[is.infinite(cmdb$fslope)] <- 1

Split the data into a training(2/3) and a test(1/3) sample in the same way as before. The split will not be identical to that used on the Richards measures because of the problem that not all the Richards stats are computed on the NV group resulting in the discard of some objects from the calculation.

set.seed(123)
n <- nrow(cmdb)
isel <- sample(1:n,round(n/3))
trains <- cmdb[-isel,]
tests <- cmdb[isel,]

There are 1240 observations in the test set and 2480 observations in the training set.

Redo the classification for this set of measures using the Matern kernel:

opts_chunk$set(comment=NA, warning=FALSE, message=FALSE, error=FALSE)
load("feat.rda")
source("funcs.R")
require(MASS)
require(nnet)
require(ggplot2)
require(rpart)
require(rpart.plot)
require(xtable)
require(kernlab)
require(randomForest)

Set up the predictors that we will use throughout. Note that I have transformed some of the variables as seen in the setup.

predform <- "shov + maxdiff + dscore + log(totvar) + log(quadvar) + log(famp) + log(fslope) + log(outl) + gscore + lsd + nudlog(gtvar) + rm.amplitude  + rm.beyond1std + rm.fpr20 +rm.fpr35 + rm.fpr50 + rm.fpr80 + log(rm.maxslope) + rm.mad +asylog(rm.medbuf) + rm.pairslope + log(rm.peramp) + log(rm.pdfp) + rm.skew + log(rm.kurtosis)+ rm.std + dublog(rm.rcorbor)"
preds <- c("shov","maxdiff","dscore","totvar","quadvar","famp","fslope","outl","gscore","lsd","gtvar","rm.amplitude","rm.beyond1std","rm.fpr20","rm.fpr35","rm.fpr50","rm.fpr80","rm.maxslope","rm.mad","rm.medbuf","rm.pairslope","rm.peramp","rm.pdfp","rm.skew","rm.kurtosis","rm.std","rm.rcorbor")
tpredform <- paste(preds,collapse="+")

Creation of model formulae and subset testing and training sets

Formula for classification of all types:

allform <- as.formula(paste("type ~",predform))

transients vs. non-variables:

trains$vtype <- ifelse(trains$type == "nv","nv","tr")
trains$vtype <- factor(trains$vtype)
tests$vtype <- ifelse(tests$type == "nv","nv","tr")
tests$vtype <- factor(tests$vtype)
vtform <- as.formula(paste("vtype ~",predform))

transients only,

trains$ttype <- trains$type
trains$ttype[trains$ttype == "nv"] <- NA
trains$ttype <- factor(trains$ttype)
tests$ttype <- tests$type
tests$ttype[tests$ttype == "nv"] <- NA
tests$ttype <- factor(tests$ttype)
trform <- as.formula(paste("ttype ~",predform))
cmat <- matrix(NA,nrow=4,ncol=5)
dimnames(cmat) <- list(c("All","TranNoTran","Tranonly","Heirarch"),c("LDA","RPart","SVM","NN","Forest"))

All types

LDA

Linear Discriminant analysis using the default options.

We produce the cross-classification between predicted and observed class. Note that the default priors are the proportions found in the training set.

ldamod <- lda(allform ,data=trains)
pv <- predict(ldamod, tests)
cm <- xtabs( ~ pv$class + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 29 2 0 3 0 3 0 8
blazar 0 24 13 13 0 2 0 8
cv 0 3 85 32 1 2 1 14
downes 1 4 14 33 0 8 9 1
flare 1 0 2 3 11 10 1 1
nv 12 2 13 29 11 525 0 17
rrlyr 0 2 1 20 0 1 92 0
sn 1 2 17 2 1 7 0 143

Same as above but now expressed a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 66 5 0 2 0 1 0 4
blazar 0 62 9 10 0 0 0 4
cv 0 8 59 24 4 0 1 7
downes 2 10 10 24 0 1 9 1
flare 2 0 1 2 46 2 1 1
nv 27 5 9 21 46 94 0 9
rrlyr 0 5 1 15 0 0 89 0
sn 2 5 12 1 4 1 0 74

The overall classification rate is 0.7597.

Recursive Partitioning

roz <- rpart(allform ,data=trains)
rpart.plot(roz,type=1,extra=1)

plot of chunk unnamed-chunk-19

pv <- predict(roz,newdata=tests,type="class")
cm <- xtabs( ~ pv + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 27 2 3 6 1 6 0 7
blazar 0 17 6 19 0 1 4 2
cv 1 6 80 35 0 0 2 3
downes 0 6 8 9 0 2 1 5
flare 0 0 0 0 0 0 0 0
nv 11 1 19 50 22 528 3 32
rrlyr 0 4 1 13 0 7 93 5
sn 5 3 28 3 1 14 0 138

Same as above but now expressed a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 61 5 2 4 4 1 0 4
blazar 0 44 4 14 0 0 4 1
cv 2 15 55 26 0 0 2 2
downes 0 15 6 7 0 0 1 3
flare 0 0 0 0 0 0 0 0
nv 25 3 13 37 92 95 3 17
rrlyr 0 10 1 10 0 1 90 3
sn 11 8 19 2 4 3 0 72

The overall classification rate is 0.7194.

Support Vector Machines

Use the default choice of setting from the kernlab R package for this:

svmod <- ksvm(allform, data=trains)
Using automatic sigma estimation (sigest) for RBF or laplace kernel 
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 29 3 0 4 0 1 0 1
blazar 0 23 4 7 0 0 0 0
cv 1 5 102 36 1 3 1 13
downes 1 2 8 43 0 5 8 0
flare 0 0 0 1 7 4 0 0
nv 12 0 8 33 14 540 1 20
rrlyr 0 2 0 7 0 1 93 0
sn 1 4 23 4 2 4 0 158

Same as above but now expressed a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 66 8 0 3 0 0 0 1
blazar 0 59 3 5 0 0 0 0
cv 2 13 70 27 4 1 1 7
downes 2 5 6 32 0 1 8 0
flare 0 0 0 1 29 1 0 0
nv 27 0 6 24 58 97 1 10
rrlyr 0 5 0 5 0 0 90 0
sn 2 10 16 3 8 1 0 82

The overall classification rate is 0.8024.

Neural Net

Use the multinom() function from the nnet R package. Might work better with some scaling.

svmod <- multinom(allform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 32 2 0 3 0 1 0 3
blazar 1 24 4 11 0 1 0 6
cv 3 4 93 27 0 5 3 19
downes 0 6 18 59 0 11 7 1
flare 0 0 0 1 7 4 0 1
nv 7 0 9 25 12 531 0 14
rrlyr 0 0 0 7 2 0 93 0
sn 1 3 21 2 3 5 0 148

Same as above but now expressed a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 73 5 0 2 0 0 0 2
blazar 2 62 3 8 0 0 0 3
cv 7 10 64 20 0 1 3 10
downes 0 15 12 44 0 2 7 1
flare 0 0 0 1 29 1 0 1
nv 16 0 6 19 50 95 0 7
rrlyr 0 0 0 5 8 0 90 0
sn 2 8 14 1 12 1 0 77

The overall classification rate is 0.796.

Random Forest

Use the randomForest package with the default settings:

tallform <- as.formula(paste("type ~",tpredform))
fmod <- randomForest(tallform, data=trains)
pv <- predict(fmod, newdata=tests)
cm <- xtabs( ~ pv + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 30 1 0 3 0 1 0 2
blazar 0 21 3 7 0 0 0 0
cv 3 7 92 27 0 3 1 15
downes 0 4 17 57 0 7 8 0
flare 0 0 0 2 8 1 0 0
nv 8 1 10 31 15 540 1 18
rrlyr 0 2 0 5 0 1 93 0
sn 3 3 23 3 1 5 0 157

Same as above but now expressed a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 68 3 0 2 0 0 0 1
blazar 0 54 2 5 0 0 0 0
cv 7 18 63 20 0 1 1 8
downes 0 10 12 42 0 1 8 0
flare 0 0 0 1 33 0 0 0
nv 18 3 7 23 62 97 1 9
rrlyr 0 5 0 4 0 0 90 0
sn 7 8 16 2 4 1 0 82

The overall classification rate is 0.8048.

Transients vs. non-variables

LDA

Linear Discriminant analysis using the default options.

We produce the cross-classification between predicted and observed class. Note that the default priors are the proportions found in the training set.

ldamod <- lda(vtform ,data=trains)
pv <- predict(ldamod, tests)
cm <- xtabs( ~ pv$class + tests$vtype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 523 84
tr 35 598

Same as above but now expressed a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 94 12
tr 6 88

The overall classification rate is 0.904.

Recursive Partitioning

roz <- rpart(vtform ,data=trains)
rpart.plot(roz,type=1,extra=1)