[1] "Created: Tue Jun 24 14:25:43 2014"

We have introduced several types of new measure. Some of these are based on our GP fitting but some are data-based statistics rather like the Richards measures. Here we repeat the calculations using only the GP based measures.

Measures that are omitted here are shov, maxdiff, dscore, gscore, lsd, gtvar.

See featlcdb.Rmd for how the data feat.rda was prepared for this analysis.

opts_chunk$set(comment=NA, warning=FALSE, message=FALSE, error=FALSE)
load("feat.rda")
source("funcs.R")
require(MASS)
require(nnet)
require(ggplot2)
require(rpart)
require(rpart.plot)
require(xtable)
require(kernlab)
require(randomForest)

Set up the predictors that we will use throughout. Note that I have transformed some of the variables as seen in the setup.

predform <- "log(totvar) + log(quadvar) + log(famp) + log(fslope) + log(outl) +  rm.amplitude  + rm.beyond1std + rm.fpr20 +rm.fpr35 + rm.fpr50 + rm.fpr80 + log(rm.maxslope) + rm.mad +asylog(rm.medbuf) + rm.pairslope + log(rm.peramp) + log(rm.pdfp) + rm.skew + log(rm.kurtosis)+ rm.std + dublog(rm.rcorbor)"
preds <- c("totvar","quadvar","famp","fslope","outl","rm.amplitude","rm.beyond1std","rm.fpr20","rm.fpr35","rm.fpr50","rm.fpr80","rm.maxslope","rm.mad","rm.medbuf","rm.pairslope","rm.peramp","rm.pdfp","rm.skew","rm.kurtosis","rm.std","rm.rcorbor")
tpredform <- paste(preds,collapse="+")

Creation of model formulae and subset testing and training sets

Formula for classification of all types:

allform <- as.formula(paste("type ~",predform))

transients vs. non-variables:

trains$vtype <- ifelse(trains$type == "nv","nv","tr")
trains$vtype <- factor(trains$vtype)
tests$vtype <- ifelse(tests$type == "nv","nv","tr")
tests$vtype <- factor(tests$vtype)
vtform <- as.formula(paste("vtype ~",predform))

transients only,

trains$ttype <- trains$type
trains$ttype[trains$ttype == "nv"] <- NA
trains$ttype <- factor(trains$ttype)
tests$ttype <- tests$type
tests$ttype[tests$ttype == "nv"] <- NA
tests$ttype <- factor(tests$ttype)
trform <- as.formula(paste("ttype ~",predform))
cmat <- matrix(NA,nrow=4,ncol=5)
dimnames(cmat) <- list(c("All","TranNoTran","Tranonly","Heirarch"),c("LDA","RPart","SVM","NN","Forest"))

All types

LDA

Linear Discriminant analysis using the default options.

We produce the cross-classification between predicted and observed class. Note that the default priors are the proportions found in the training set.

ldamod <- lda(allform ,data=trains)
pv <- predict(ldamod, tests)
cm <- xtabs( ~ pv$class + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 25 4 3 1 0 3 0 3
blazar 3 20 9 14 0 5 1 8
cv 0 4 79 32 2 7 1 15
downes 0 8 13 29 0 5 12 3
flare 1 0 5 4 10 12 1 1
nv 14 1 16 41 12 521 1 29
rrlyr 0 0 0 11 0 2 87 0
sn 1 2 20 3 0 3 0 133

Same as above but now expressed a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 57 10 2 1 0 1 0 2
blazar 7 51 6 10 0 1 1 4
cv 0 10 54 24 8 1 1 8
downes 0 21 9 21 0 1 12 2
flare 2 0 3 3 42 2 1 1
nv 32 3 11 30 50 93 1 15
rrlyr 0 0 0 8 0 0 84 0
sn 2 5 14 2 0 1 0 69

The overall classification rate is 0.729.

Recursive Partitioning

roz <- rpart(allform ,data=trains)
rpart.plot(roz,type=1,extra=1)

plot of chunk unnamed-chunk-13

pv <- predict(roz,newdata=tests,type="class")
cm <- xtabs( ~ pv + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 34 1 1 3 0 8 1 3
blazar 0 0 0 0 0 0 0 0
cv 1 31 94 58 0 3 5 10
downes 0 0 0 0 0 0 0 0
flare 0 0 0 0 0 0 0 0
nv 8 1 17 45 14 513 2 29
rrlyr 0 0 1 10 0 0 86 1
sn 1 6 32 19 10 34 9 149

Same as above but now expressed a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 77 3 1 2 0 1 1 2
blazar 0 0 0 0 0 0 0 0
cv 2 79 65 43 0 1 5 5
downes 0 0 0 0 0 0 0 0
flare 0 0 0 0 0 0 0 0
nv 18 3 12 33 58 92 2 15
rrlyr 0 0 1 7 0 0 83 1
sn 2 15 22 14 42 6 9 78

The overall classification rate is 0.7065.

Support Vector Machines

Use the default choice of setting from the kernlab R package for this:

svmod <- ksvm(allform, data=trains)
Using automatic sigma estimation (sigest) for RBF or laplace kernel 
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 27 2 1 7 0 3 0 3
blazar 0 18 4 6 0 0 0 0
cv 0 5 94 36 2 5 1 17
downes 1 4 8 35 0 4 7 1
flare 0 0 0 1 5 4 0 0
nv 16 0 15 40 16 529 3 18
rrlyr 0 2 0 8 0 0 90 0
sn 0 8 23 2 1 13 2 153

Same as above but now expressed a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 61 5 1 5 0 1 0 2
blazar 0 46 3 4 0 0 0 0
cv 0 13 65 27 8 1 1 9
downes 2 10 6 26 0 1 7 1
flare 0 0 0 1 21 1 0 0
nv 36 0 10 30 67 95 3 9
rrlyr 0 5 0 6 0 0 87 0
sn 0 21 16 1 4 2 2 80

The overall classification rate is 0.7669.

Neural Net

Use the multinom() function from the nnet R package. Might work better with some scaling.

svmod <- multinom(allform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 28 2 1 4 0 2 0 3
blazar 2 20 4 6 0 0 0 4
cv 1 7 79 33 1 8 1 19
downes 0 6 21 44 2 15 8 2
flare 0 0 0 1 4 5 0 1
nv 11 1 14 34 15 526 2 24
rrlyr 0 1 0 9 2 0 92 0
sn 2 2 26 4 0 2 0 139

Same as above but now expressed a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 64 5 1 3 0 0 0 2
blazar 5 51 3 4 0 0 0 2
cv 2 18 54 24 4 1 1 10
downes 0 15 14 33 8 3 8 1
flare 0 0 0 1 17 1 0 1
nv 25 3 10 25 62 94 2 12
rrlyr 0 3 0 7 8 0 89 0
sn 5 5 18 3 0 0 0 72

The overall classification rate is 0.7516.

Random Forest

Use the randomForest package with the default settings:

tallform <- as.formula(paste("type ~",tpredform))
fmod <- randomForest(tallform, data=trains)
pv <- predict(fmod, newdata=tests)
cm <- xtabs( ~ pv + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 31 1 0 3 0 5 0 2
blazar 0 19 2 9 0 1 0 0
cv 2 8 82 31 1 1 1 16
downes 0 5 19 44 0 8 6 2
flare 0 0 0 1 7 5 0 0
nv 10 1 15 33 15 527 2 21
rrlyr 0 1 0 7 0 0 92 1
sn 1 4 27 7 1 11 2 150

Same as above but now expressed a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 70 3 0 2 0 1 0 1
blazar 0 49 1 7 0 0 0 0
cv 5 21 57 23 4 0 1 8
downes 0 13 13 33 0 1 6 1
flare 0 0 0 1 29 1 0 0
nv 23 3 10 24 62 94 2 11
rrlyr 0 3 0 5 0 0 89 1
sn 2 10 19 5 4 2 2 78

The overall classification rate is 0.7677.

Transients vs. non-variables

LDA

Linear Discriminant analysis using the default options.

We produce the cross-classification between predicted and observed class. Note that the default priors are the proportions found in the training set.

ldamod <- lda(vtform ,data=trains)
pv <- predict(ldamod, tests)
cm <- xtabs( ~ pv$class + tests$vtype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 515 86
tr 43 596

Same as above but now expressed a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 92 13
tr 8 87

The overall classification rate is 0.896.

Recursive Partitioning

roz <- rpart(vtform ,data=trains)
rpart.plot(roz,type=1,extra=1)

plot of chunk unnamed-chunk-28

pv <- predict(roz,newdata=tests,type="class")
cm <- xtabs( ~ pv + tests$vtype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 500 107
tr 58 575

Same as above but now expressed a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 90 16
tr 10 84

The overall classification rate is 0.8669.

Support Vector Machines

Use the default choice of setting from the kernlab R package for this:

svmod <- ksvm(vtform, data=trains)
Using automatic sigma estimation (sigest) for RBF or laplace kernel 
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$vtype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 507 81
tr 51 601

Same as above but now expressed a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 91 12
tr 9 88

The overall classification rate is 0.8935.

Neural Net

Use the multinom() function from the nnet R package. Might work better with some scaling.

svmod <- multinom(vtform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$vtype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 507 74
tr 51 608

Same as above but now expressed a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 91 11
tr 9 89