[1] "Created: Wed Sep  3 16:21:25 2014"

The purpose of this analysis is to establish a performance baseline for our measures. I have deliberately chosen the default versions of five classification methods:

  1. LDA
  2. Recursive Partitioning
  3. SVM
  4. NNET
  5. Random Forest

It is quite possible that better results could be obtained by tuning these methods or by using different classification methods altogether. That is worth trying, but my main objective is the generation of new features which I hope will improve on the Richards features. I need to use the same classification methods for both sets of features in order to make a fair comparison.

See featlcdb.Rmd for how the data feat.rda was prepared for this analysis.

load("feat.rda")
source("funcs.R")
require(MASS)
require(nnet)
require(ggplot2)
require(rpart)
require(rpart.plot)
require(xtable)
require(kernlab)
require(randomForest)

Richards set of predictors

Set up the predictors that we will use throughout. Note that I have transformed some of the variables as seen in the setup.

predform <- "rm.amplitude + rm.beyond1std + rm.fpr20 + rm.fpr35 + rm.fpr50 + rm.fpr80 + log(rm.maxslope) + rm.mad + asylog(rm.medbuf) + rm.pairslope + log(rm.peramp) + log(rm.pdfp) + rm.skew + log(rm.kurtosis) + rm.std + dublog(rm.rcorbor)"
preds <- c("rm.amplitude","rm.beyond1std","rm.fpr20","rm.fpr35","rm.fpr50","rm.fpr80","rm.maxslope","rm.mad","rm.medbuf","rm.pairslope","rm.peramp","rm.pdfp","rm.skew","rm.kurtosis","rm.std","rm.rcorbor")
tpredform <- paste(preds,collapse="+")
trform <- as.formula(paste("type ~",predform))

Creation of model formulae and subsetting of the testing and training sets

Formula for classification of all types:

allform <- as.formula(paste("type ~",predform))

transients vs. non-variables:

trains$vtype <- ifelse(trains$type == "nv","nv","tr")
trains$vtype <- factor(trains$vtype)
tests$vtype <- ifelse(tests$type == "nv","nv","tr")
tests$vtype <- factor(tests$vtype)
vtform <- as.formula(paste("vtype ~",predform))

transients only:

trains$ttype <- trains$type
trains$ttype[trains$ttype == "nv"] <- NA
trains$ttype <- factor(trains$ttype)
tests$ttype <- tests$type
tests$ttype[tests$ttype == "nv"] <- NA
tests$ttype <- factor(tests$ttype)
trform <- as.formula(paste("ttype ~",predform))
cmat <- matrix(NA,nrow=4,ncol=5)
dimnames(cmat) <- list(c("All","TranNoTran","Tranonly","Hierarch"),c("LDA","RPart","SVM","NN","Forest"))
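Throughout, the quoted overall classification rate is the proportion of test cases on the diagonal of the confusion matrix. The helper is presumably defined in funcs.R; a minimal version would be:

```r
# Overall classification rate: fraction of cases where the predicted
# class (rows of cm) matches the actual class (columns of cm).
crate <- function(cm) sum(diag(cm)) / sum(cm)
```

For example, `crate(cm)` applied after any of the `xtabs()` calls below reproduces the reported rate.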

All types

LDA

Linear discriminant analysis using the default options.

We produce the cross-classification between predicted and observed class. Note that the default priors are the proportions found in the training set.

ldamod <- lda(allform, data=trains)
pv <- predict(ldamod, tests)
cm <- xtabs( ~ pv$class + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 2 0 0 0 0 1 0 0
blazar 9 19 11 15 0 9 0 7
cv 0 4 81 34 2 10 1 22
downes 0 1 7 12 1 5 9 4
flare 1 0 7 8 9 10 1 1
nv 32 15 25 66 11 489 65 89
rrlyr 0 0 3 0 0 18 26 4
sn 0 0 11 0 1 16 1 65

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 5 0 0 0 0 0 0 0
blazar 20 49 8 11 0 2 0 4
cv 0 10 56 25 8 2 1 11
downes 0 3 5 9 4 1 9 2
flare 2 0 5 6 38 2 1 1
nv 73 38 17 49 46 88 63 46
rrlyr 0 0 2 0 0 3 25 2
sn 0 0 8 0 4 3 1 34

The overall classification rate is 0.5669.
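As noted above, lda() defaults to priors equal to the training-set class proportions. If equal priors were preferred (an alternative, not used here), MASS::lda accepts a prior argument:

```r
# Hypothetical variant: equal prior weight on each of the 8 classes
ldamod.eq <- lda(allform, data=trains, prior=rep(1/8, 8))
pv.eq <- predict(ldamod.eq, tests)
```

This would tend to favour the rare classes at some cost to the overall rate.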

Recursive Partitioning

roz <- rpart(allform, data=trains)
rpart.plot(roz,type=1,extra=1)

[Plot: rpart classification tree]

pv <- predict(roz,newdata=tests,type="class")
cm <- xtabs( ~ pv + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 0 0 0 0 0 0 0 0
blazar 0 0 0 0 0 0 0 0
cv 5 26 95 53 4 20 3 27
downes 0 0 0 0 0 0 0 0
flare 0 0 0 0 0 0 0 0
nv 31 7 26 73 16 497 47 80
rrlyr 0 1 2 3 0 12 53 3
sn 8 5 22 6 4 29 0 82

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 0 0 0 0 0 0 0 0
blazar 0 0 0 0 0 0 0 0
cv 11 67 66 39 17 4 3 14
downes 0 0 0 0 0 0 0 0
flare 0 0 0 0 0 0 0 0
nv 70 18 18 54 67 89 46 42
rrlyr 0 3 1 2 0 2 51 2
sn 18 13 15 4 17 5 0 43

The overall classification rate is 0.5863.
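Tuning was deliberately skipped, but for rpart the standard first step would be to inspect the cross-validation table and prune at the complexity parameter with the lowest cross-validated error (a sketch, not run here):

```r
# Inspect the cp table from the fit above and prune at the best cp
printcp(roz)
best.cp <- roz$cptable[which.min(roz$cptable[, "xerror"]), "CP"]
roz.pruned <- prune(roz, cp=best.cp)
```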

Support Vector Machines

Use the default settings from the kernlab R package:

svmod <- ksvm(allform, data=trains)
Using automatic sigma estimation (sigest) for RBF or laplace kernel 
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 11 2 1 5 0 5 2 3
blazar 0 14 4 8 0 2 0 1
cv 1 4 95 37 3 11 2 23
downes 0 4 6 10 0 3 4 0
flare 0 0 0 0 3 4 0 0
nv 31 9 19 66 17 508 18 61
rrlyr 0 1 0 4 0 8 77 1
sn 1 5 20 5 1 17 0 103

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 25 5 1 4 0 1 2 2
blazar 0 36 3 6 0 0 0 1
cv 2 10 66 27 12 2 2 12
downes 0 10 4 7 0 1 4 0
flare 0 0 0 0 12 1 0 0
nv 70 23 13 49 71 91 17 32
rrlyr 0 3 0 3 0 1 75 1
sn 2 13 14 4 4 3 0 54

The overall classification rate is 0.6621.
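The sigest message above reveals the defaults: ksvm() fits a C-svc with an RBF (rbfdot) kernel, C = 1, and the kernel width sigma estimated automatically. The same fit written out explicitly (a sketch) is:

```r
# The default ksvm() call with its defaults made explicit
svmod <- ksvm(allform, data=trains, type="C-svc",
              kernel="rbfdot", kpar="automatic", C=1)
```

Tuning C and sigma, for instance over a grid with the cross argument for k-fold cross-validation, would be the natural next step.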

Neural Net

Use the multinom() function from the nnet R package. It might work better with some scaling of the predictors.

svmod <- multinom(allform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 13 2 0 7 0 5 0 2
blazar 0 12 4 7 0 1 0 2
cv 1 10 82 39 1 10 2 18
downes 1 1 16 11 1 4 3 8
flare 0 0 0 1 4 4 0 0
nv 28 12 29 56 12 508 10 95
rrlyr 1 1 0 14 4 5 88 0
sn 0 1 14 0 2 21 0 67

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 30 5 0 5 0 1 0 1
blazar 0 31 3 5 0 0 0 1
cv 2 26 57 29 4 2 2 9
downes 2 3 11 8 4 1 3 4
flare 0 0 0 1 17 1 0 0
nv 64 31 20 41 50 91 10 49
rrlyr 2 3 0 10 17 1 85 0
sn 0 3 10 0 8 4 0 35

The overall classification rate is 0.6331.
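The scaling suggested above could be tried by standardizing the raw predictor columns with training-set statistics before fitting (a sketch only; the log() terms in predform would need rethinking, since centered columns can become non-positive):

```r
# Standardize predictors using training means/sds (illustrative)
ctr <- scale(trains[, preds])
strains <- trains; strains[, preds] <- ctr
stests <- tests
stests[, preds] <- scale(tests[, preds],
                         center=attr(ctr, "scaled:center"),
                         scale=attr(ctr, "scaled:scale"))
smod <- multinom(as.formula(paste("type ~", tpredform)),
                 data=strains, trace=FALSE, maxit=1000, decay=5e-4)
```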

Random Forest

Use the randomForest package with the default settings:

tallform <- as.formula(paste("type ~",tpredform))
fmod <- randomForest(tallform, data=trains)
pv <- predict(fmod, newdata=tests)
cm <- xtabs( ~ pv + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 21 1 0 5 0 8 0 2
blazar 4 16 3 7 1 5 0 4
cv 2 7 82 35 3 11 1 17
downes 0 1 15 16 0 8 6 2
flare 0 0 0 2 9 3 0 1
nv 15 7 18 61 8 496 17 52
rrlyr 0 1 1 5 0 7 78 1
sn 2 6 26 4 3 20 1 113

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 48 3 0 4 0 1 0 1
blazar 9 41 2 5 4 1 0 2
cv 5 18 57 26 12 2 1 9
downes 0 3 10 12 0 1 6 1
flare 0 0 0 1 38 1 0 1
nv 34 18 12 45 33 89 17 27
rrlyr 0 3 1 4 0 1 76 1
sn 5 15 18 3 12 4 1 59

The overall classification rate is 0.6702.
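randomForest can also report variable importance, a useful check on which of the Richards features carry the signal (a sketch; importance=TRUE requests the permutation-based measure):

```r
# Refit requesting permutation importance, then plot it
fmod.imp <- randomForest(tallform, data=trains, importance=TRUE)
varImpPlot(fmod.imp)
```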

Transients vs. non-variables

LDA

Linear discriminant analysis using the default options.

We produce the cross-classification between predicted and observed class. Note that the default priors are the proportions found in the training set.

ldamod <- lda(vtform, data=trains)
pv <- predict(ldamod, tests)
cm <- xtabs( ~ pv$class + tests$vtype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 414 171
tr 144 511

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 74 25
tr 26 75

The overall classification rate is 0.746.

Recursive Partitioning

roz <- rpart(vtform, data=trains)
rpart.plot(roz,type=1,extra=1)

[Plot: rpart classification tree]

pv <- predict(roz,newdata=tests,type="class")
cm <- xtabs( ~ pv + tests$vtype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 416 112
tr 142 570

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 75 16
tr 25 84

The overall classification rate is 0.7952.

Support Vector Machines

Use the default settings from the kernlab R package:

svmod <- ksvm(vtform, data=trains)
Using automatic sigma estimation (sigest) for RBF or laplace kernel 
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$vtype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 441 120
tr 117 562

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 79 18
tr 21 82

The overall classification rate is 0.8089.

Neural Net

Use the multinom() function from the nnet R package. It might work better with some scaling of the predictors.

svmod <- multinom(vtform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$vtype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 410 160
tr 148 522

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 73 23
tr 27 77

The overall classification rate is 0.7516.

Random Forest

Use the randomForest package with the default settings:

tallform <- as.formula(paste("vtype ~",tpredform))
fmod <- randomForest(tallform, data=trains)
pv <- predict(fmod, newdata=tests)
cm <- xtabs( ~ pv + tests$vtype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 446 102
tr 112 580

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 80 15
tr 20 85

The overall classification rate is 0.8274.

Transients only

LDA

Linear discriminant analysis using the default options.

We produce the cross-classification between predicted and observed class. Note that the default priors are the proportions found in the training set.

ldamod <- lda(trform, data=trains)
pv <- predict(ldamod, tests)
cm <- xtabs( ~ pv$class + tests$ttype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare rrlyr sn
agn 15 4 0 10 0 0 8
blazar 8 14 7 10 1 0 4
cv 0 7 87 37 2 1 16
downes 0 7 10 43 9 20 16
flare 1 0 4 5 11 1 0
rrlyr 1 0 12 15 0 69 15
sn 19 7 25 15 1 12 133

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare rrlyr sn
agn 34 10 0 7 0 0 4
blazar 18 36 5 7 4 0 2
cv 0 18 60 27 8 1 8
downes 0 18 7 32 38 19 8
flare 2 0 3 4 46 1 0
rrlyr 2 0 8 11 0 67 8
sn 43 18 17 11 4 12 69

The overall classification rate is 0.5455.

Recursive Partitioning

roz <- rpart(trform, data=trains)
rpart.plot(roz,type=1,extra=1)

[Plot: rpart classification tree]

pv <- predict(roz,newdata=tests,type="class")
cm <- xtabs( ~ pv + tests$ttype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare rrlyr sn
agn 28 5 2 15 0 0 5
blazar 2 12 4 4 0 0 1
cv 2 10 90 41 7 4 15
downes 1 0 2 27 3 2 4
flare 0 0 0 0 0 0 0
rrlyr 4 1 2 23 9 84 6
sn 7 11 45 25 5 13 161

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare rrlyr sn
agn 64 13 1 11 0 0 3
blazar 5 31 3 3 0 0 1
cv 5 26 62 30 29 4 8
downes 2 0 1 20 12 2 2
flare 0 0 0 0 0 0 0
rrlyr 9 3 1 17 38 82 3
sn 16 28 31 19 21 13 84

The overall classification rate is 0.5894.

Support Vector Machines

Use the default settings from the kernlab R package:

svmod <- ksvm(trform, data=trains)
Using automatic sigma estimation (sigest) for RBF or laplace kernel 
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$ttype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare rrlyr sn
agn 36 6 1 19 0 2 7
blazar 0 13 4 5 1 0 1
cv 2 5 99 44 5 3 23
downes 1 5 4 42 4 10 3
flare 0 0 0 1 11 0 1
rrlyr 1 1 2 14 0 86 3
sn 4 9 35 10 3 2 154

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare rrlyr sn
agn 82 15 1 14 0 2 4
blazar 0 33 3 4 4 0 1
cv 5 13 68 33 21 3 12
downes 2 13 3 31 17 10 2
flare 0 0 0 1 46 0 1
rrlyr 2 3 1 10 0 83 2
sn 9 23 24 7 12 2 80

The overall classification rate is 0.6466.

Neural Net

Use the multinom() function from the nnet R package. It might work better with some scaling of the predictors.

svmod <- multinom(trform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$ttype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare rrlyr sn
agn 29 2 0 18 0 0 4
blazar 2 15 5 6 1 0 4
cv 1 10 85 41 1 1 18
downes 3 4 17 36 11 9 18
flare 0 0 0 2 6 1 1
rrlyr 1 1 2 22 3 92 0
sn 8 7 36 10 2 0 147

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare rrlyr sn
agn 66 5 0 13 0 0 2
blazar 5 38 3 4 4 0 2
cv 2 26 59 30 4 1 9
downes 7 10 12 27 46 9 9
flare 0 0 0 1 25 1 1
rrlyr 2 3 1 16 12 89 0
sn 18 18 25 7 8 0 77

The overall classification rate is 0.6012.

Random Forest

Use the randomForest package with the default settings:

tallform <- as.formula(paste("ttype ~",tpredform))
fmod <- randomForest(tallform, data=na.omit(trains))
pv <- predict(fmod, newdata=na.omit(tests))
cm <- xtabs( ~ pv + na.omit(tests)$ttype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare rrlyr sn
agn 29 5 0 12 0 1 7
blazar 5 16 5 9 1 0 7
cv 2 8 85 37 3 1 18
downes 3 2 15 57 5 10 8
flare 0 0 0 2 11 0 1
rrlyr 1 1 2 12 0 86 3
sn 4 7 38 6 4 5 148

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare rrlyr sn
agn 66 13 0 9 0 1 4
blazar 11 41 3 7 4 0 4
cv 5 21 59 27 12 1 9
downes 7 5 10 42 21 10 4
flare 0 0 0 1 46 0 1
rrlyr 2 3 1 9 0 83 2
sn 9 18 26 4 17 5 77

The overall classification rate is 0.6334.

Hierarchical Classification

First we classify into transient and non-variable; the cases classified as transient are then classified into type of transient. The transient classification here differs from the one above in the data used: above, all the data are known to be transients, whereas here some cases from the non-variable set will have been classified as transient at the first stage.
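Each method below repeats the same two-stage pattern; abstractly, it can be sketched as a single helper (hypothetical, mirroring the code that follows):

```r
# Two-stage sketch: fit1 separates nv from tr; fit2, trained on the
# training cases that stage 1 calls transient, assigns transient type.
hier.classify <- function(fit1, fit2, predfun, tests) {
  predc <- as.character(predfun(fit1, tests))   # stage 1: nv vs tr
  istr <- predc != "nv"
  predc[istr] <- as.character(predfun(fit2, tests[istr, , drop=FALSE]))
  factor(predc)                                 # merged two-stage result
}
```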

LDA

ldamod <- lda(vtform, data=trains)
pv <- predict(ldamod, tests)
utests <- subset(tests, pv$class != 'nv')
pvt <- predict(ldamod, trains)
utrains <- subset(trains, pvt$class != 'nv')
ldamod <- lda(trform, data=utrains)
predc <- as.character(pv$class)
predc[predc != 'nv'] <- as.character(predict(ldamod, utests)$class)
cm <- xtabs( ~ predc + as.character(tests$type))

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 18 2 1 9 0 21 4 8
blazar 6 13 4 9 0 6 0 1
cv 0 12 88 38 1 11 1 17
downes 0 1 7 7 2 3 3 6
flare 1 0 3 5 8 9 1 0
nv 12 6 12 56 9 414 25 50
rrlyr 1 0 6 4 0 26 54 13
sn 6 5 24 7 4 68 15 97

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 41 5 1 7 0 4 4 4
blazar 14 33 3 7 0 1 0 1
cv 0 31 61 28 4 2 1 9
downes 0 3 5 5 8 1 3 3
flare 2 0 2 4 33 2 1 0
nv 27 15 8 41 38 74 24 26
rrlyr 2 0 4 3 0 5 52 7
sn 14 13 17 5 17 12 15 51

The overall classification rate is 0.5637.

Recursive Partitioning

roz <- rpart(vtform, data=trains)
pv <- predict(roz, tests, type="class")
utests <- subset(tests, pv != 'nv')
pvt <- predict(roz, trains, type="class")
utrains <- subset(trains, pvt != 'nv')
roz <- rpart(trform, data=utrains)
predc <- as.character(pv)
predc[predc != 'nv'] <- as.character(predict(roz, utests, type="class"))
predc <- factor(predc, levels=sort(levels(trains$type)))
cm <- xtabs( ~ predc + as.character(tests$type))

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 26 3 2 12 0 25 0 4
blazar 2 12 4 4 0 3 0 1
cv 2 10 90 43 9 16 4 15
downes 0 0 0 0 0 0 0 0
flare 0 0 0 0 0 0 0 0
nv 8 1 3 47 4 416 26 23
rrlyr 1 1 1 11 6 17 61 5
sn 5 12 45 18 5 81 12 144

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 59 8 1 9 0 4 0 2
blazar 5 31 3 3 0 1 0 1
cv 5 26 62 32 38 3 4 8
downes 0 0 0 0 0 0 0 0
flare 0 0 0 0 0 0 0 0
nv 18 3 2 35 17 75 25 12
rrlyr 2 3 1 8 25 3 59 3
sn 11 31 31 13 21 15 12 75

The overall classification rate is 0.604.

Support Vector Machines

svmod <- ksvm(vtform, data=trains)
Using automatic sigma estimation (sigest) for RBF or laplace kernel 
pv <- predict(svmod, tests)
utests <- subset(tests, pv != 'nv')
pvt <- predict(svmod, trains)
utrains <- subset(trains, pvt != 'nv')
svmod <- ksvm(trform, data=utrains)
Using automatic sigma estimation (sigest) for RBF or laplace kernel 
predc <- as.character(pv)
predc[predc != 'nv'] <- as.character(predict(svmod, utests))
predc <- factor(predc, levels=sort(levels(trains$type)))
cm <- xtabs( ~ predc + as.character(tests$type))

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 23 5 1 8 0 24 2 8
blazar 0 15 4 6 0 2 0 1
cv 2 5 98 42 5 17 3 22
downes 0 2 4 9 0 6 6 1
flare 0 0 0 2 9 4 0 0
nv 17 2 6 52 5 442 10 28
rrlyr 0 1 3 7 1 12 81 2
sn 2 9 29 9 4 51 1 130

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 52 13 1 6 0 4 2 4
blazar 0 38 3 4 0 0 0 1
cv 5 13 68 31 21 3 3 11
downes 0 5 3 7 0 1 6 1
flare 0 0 0 1 38 1 0 0
nv 39 5 4 39 21 79 10 15
rrlyr 0 3 2 5 4 2 79 1
sn 5 23 20 7 17 9 1 68

The overall classification rate is 0.6508.

Neural Net

svmod <- multinom(vtform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(svmod, tests)
utests <- subset(tests, pv != 'nv')
pvt <- predict(svmod, trains)
utrains <- subset(trains, pvt != 'nv')
svmod <- multinom(trform, data=utrains, trace=FALSE, maxit=1000, decay=5e-4)
predc <- as.character(pv)
predc[predc != 'nv'] <- as.character(predict(svmod, utests))
predc <- factor(predc, levels=sort(levels(trains$type)))
cm <- xtabs( ~ predc + as.character(tests$type))

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 26 1 0 11 0 21 0 4
blazar 2 15 5 5 0 2 0 3
cv 1 10 85 41 1 21 1 18
downes 2 4 16 11 5 10 4 14
flare 0 0 0 2 6 9 0 0
nv 6 4 11 55 9 410 28 47
rrlyr 1 0 2 8 1 13 70 0
sn 6 5 26 2 2 72 0 106

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 59 3 0 8 0 4 0 2
blazar 5 38 3 4 0 0 0 2
cv 2 26 59 30 4 4 1 9
downes 5 10 11 8 21 2 4 7
flare 0 0 0 1 25 2 0 0
nv 14 10 8 41 38 73 27 24
rrlyr 2 0 1 6 4 2 68 0
sn 14 13 18 1 8 13 0 55

The overall classification rate is 0.5879.

Random Forest

tallform <- as.formula(paste("vtype ~",tpredform))
svmod <- randomForest(tallform, data=trains)
pv <- predict(svmod, tests)
utests <- subset(tests, pv != 'nv')
pvt <- predict(svmod, trains)
utrains <- subset(trains, pvt != 'nv')
tallform <- as.formula(paste("ttype ~",tpredform))
svmod <- randomForest(tallform, data=na.omit(utrains))
predc <- as.character(pv)
predc[predc != 'nv'] <- as.character(predict(svmod, utests))
predc <- factor(predc, levels=sort(levels(trains$type)))
cm <- xtabs( ~ predc + as.character(tests$type))

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 26 5 0 8 0 21 0 6
blazar 3 16 4 8 1 7 0 8
cv 1 8 84 34 4 12 1 14
downes 2 1 14 23 1 14 9 5
flare 0 0 0 2 10 5 0 2
nv 9 1 13 44 4 450 7 30
rrlyr 0 1 2 10 0 14 82 1
sn 3 7 28 6 4 35 4 126

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 59 13 0 6 0 4 0 3
blazar 7 41 3 6 4 1 0 4
cv 2 21 58 25 17 2 1 7
downes 5 3 10 17 4 3 9 3
flare 0 0 0 1 42 1 0 1
nv 20 3 9 33 17 81 7 16
rrlyr 0 3 1 7 0 3 80 1
sn 7 18 19 4 17 6 4 66

The overall classification rate is 0.6589.

Summary of results

Summary of percentage classification rates across tests:

print(xtable(100*cmat,digits=2),type="html")
LDA RPart SVM NN Forest
All 56.69 58.63 66.21 63.31 67.02
TranNoTran 74.60 79.52 80.89 75.16 82.74
Tranonly 54.55 58.94 64.66 60.12 63.34
Hierarch 56.37 60.40 65.08 58.79 65.89
cmatRICH <- cmat

Ours+Richards set of predictors

Set up the predictors that we will use throughout. Note that I have transformed some of the variables as seen in the setup.

predform <- "shov + maxdiff + dscore + log(totvar) + log(quadvar) + log(famp) + log(fslope) + log(outl) + gscore + lsd + nudlog(gtvar) + rm.amplitude + rm.beyond1std + rm.fpr20 + rm.fpr35 + rm.fpr50 + rm.fpr80 + log(rm.maxslope) + rm.mad + asylog(rm.medbuf) + rm.pairslope + log(rm.peramp) + log(rm.pdfp) + rm.skew + log(rm.kurtosis) + rm.std + dublog(rm.rcorbor)"
preds <- c("shov","maxdiff","dscore","totvar","quadvar","famp","fslope","outl","gscore","lsd","gtvar","rm.amplitude","rm.beyond1std","rm.fpr20","rm.fpr35","rm.fpr50","rm.fpr80","rm.maxslope","rm.mad","rm.medbuf","rm.pairslope","rm.peramp","rm.pdfp","rm.skew","rm.kurtosis","rm.std","rm.rcorbor")
tpredform <- paste(preds,collapse="+")

Creation of model formulae and subsetting of the testing and training sets

Formula for classification of all types:

allform <- as.formula(paste("type ~",predform))

transients vs. non-variables:

trains$vtype <- ifelse(trains$type == "nv","nv","tr")
trains$vtype <- factor(trains$vtype)
tests$vtype <- ifelse(tests$type == "nv","nv","tr")
tests$vtype <- factor(tests$vtype)
vtform <- as.formula(paste("vtype ~",predform))

transients only:

trains$ttype <- trains$type
trains$ttype[trains$ttype == "nv"] <- NA
trains$ttype <- factor(trains$ttype)
tests$ttype <- tests$type
tests$ttype[tests$ttype == "nv"] <- NA
tests$ttype <- factor(tests$ttype)
trform <- as.formula(paste("ttype ~",predform))
cmat <- matrix(NA,nrow=4,ncol=5)
dimnames(cmat) <- list(c("All","TranNoTran","Tranonly","Hierarch"),c("LDA","RPart","SVM","NN","Forest"))

All types

LDA

Linear discriminant analysis using the default options.

We produce the cross-classification between predicted and observed class. Note that the default priors are the proportions found in the training set.

ldamod <- lda(allform, data=trains)
pv <- predict(ldamod, tests)
cm <- xtabs( ~ pv$class + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 29 2 0 3 0 3 0 8
blazar 0 24 13 13 0 2 0 8
cv 0 3 85 32 1 2 1 14
downes 1 4 14 33 0 8 9 1
flare 1 0 2 3 11 10 1 1
nv 12 2 13 29 11 525 0 17
rrlyr 0 2 1 20 0 1 92 0
sn 1 2 17 2 1 7 0 143

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 66 5 0 2 0 1 0 4
blazar 0 62 9 10 0 0 0 4
cv 0 8 59 24 4 0 1 7
downes 2 10 10 24 0 1 9 1
flare 2 0 1 2 46 2 1 1
nv 27 5 9 21 46 94 0 9
rrlyr 0 5 1 15 0 0 89 0
sn 2 5 12 1 4 1 0 74

The overall classification rate is 0.7597.

Recursive Partitioning

roz <- rpart(allform, data=trains)
rpart.plot(roz,type=1,extra=1)

[Plot: rpart classification tree]

pv <- predict(roz,newdata=tests,type="class")
cm <- xtabs( ~ pv + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 27 2 3 6 1 6 0 7
blazar 0 17 6 19 0 1 4 2
cv 1 6 80 35 0 0 2 3
downes 0 6 8 9 0 2 1 5
flare 0 0 0 0 0 0 0 0
nv 11 1 19 50 22 528 3 32
rrlyr 0 4 1 13 0 7 93 5
sn 5 3 28 3 1 14 0 138

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 61 5 2 4 4 1 0 4
blazar 0 44 4 14 0 0 4 1
cv 2 15 55 26 0 0 2 2
downes 0 15 6 7 0 0 1 3
flare 0 0 0 0 0 0 0 0
nv 25 3 13 37 92 95 3 17
rrlyr 0 10 1 10 0 1 90 3
sn 11 8 19 2 4 3 0 72

The overall classification rate is 0.7194.

Support Vector Machines

Use the default settings from the kernlab R package:

svmod <- ksvm(allform, data=trains)
Using automatic sigma estimation (sigest) for RBF or laplace kernel 
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 29 3 0 4 0 0 0 1
blazar 0 22 4 8 0 0 0 0
cv 1 6 102 34 1 3 1 14
downes 1 2 8 43 0 5 8 0
flare 0 0 0 0 7 3 0 0
nv 12 0 8 34 14 542 1 20
rrlyr 0 2 0 8 0 1 93 0
sn 1 4 23 4 2 4 0 157

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 66 8 0 3 0 0 0 1
blazar 0 56 3 6 0 0 0 0
cv 2 15 70 25 4 1 1 7
downes 2 5 6 32 0 1 8 0
flare 0 0 0 0 29 1 0 0
nv 27 0 6 25 58 97 1 10
rrlyr 0 5 0 6 0 0 90 0
sn 2 10 16 3 8 1 0 82

The overall classification rate is 0.8024.

Neural Net

Use the multinom() function from the nnet R package. It might work better with some scaling of the predictors.

svmod <- multinom(allform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 32 2 0 3 0 1 0 3
blazar 1 24 4 11 0 1 0 6
cv 3 4 93 27 0 5 3 19
downes 0 6 18 59 0 11 7 1
flare 0 0 0 1 7 4 0 1
nv 7 0 9 25 12 531 0 14
rrlyr 0 0 0 7 2 0 93 0
sn 1 3 21 2 3 5 0 148

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 73 5 0 2 0 0 0 2
blazar 2 62 3 8 0 0 0 3
cv 7 10 64 20 0 1 3 10
downes 0 15 12 44 0 2 7 1
flare 0 0 0 1 29 1 0 1
nv 16 0 6 19 50 95 0 7
rrlyr 0 0 0 5 8 0 90 0
sn 2 8 14 1 12 1 0 77

The overall classification rate is 0.796.

Random Forest

Use the randomForest package with the default settings:

tallform <- as.formula(paste("type ~",tpredform))
fmod <- randomForest(tallform, data=trains)
pv <- predict(fmod, newdata=tests)
cm <- xtabs( ~ pv + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 31 1 0 3 0 2 0 2
blazar 0 20 3 8 0 0 0 0
cv 2 7 92 27 0 3 1 14
downes 0 4 17 54 0 4 6 0
flare 0 0 0 2 8 2 0 0
nv 8 1 10 31 15 541 1 19
rrlyr 0 2 0 6 0 1 95 0
sn 3 4 23 4 1 5 0 157

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 70 3 0 2 0 0 0 1
blazar 0 51 2 6 0 0 0 0
cv 5 18 63 20 0 1 1 7
downes 0 10 12 40 0 1 6 0
flare 0 0 0 1 33 0 0 0
nv 18 3 7 23 62 97 1 10
rrlyr 0 5 0 4 0 0 92 0
sn 7 10 16 3 4 1 0 82

The overall classification rate is 0.8048.

Transients vs. non-variables

LDA

Linear discriminant analysis using the default options.

We produce the cross-classification between predicted and observed class. Note that the default priors are the proportions found in the training set.

ldamod <- lda(vtform, data=trains)
pv <- predict(ldamod, tests)
cm <- xtabs( ~ pv$class + tests$vtype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 523 84
tr 35 598

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 94 12
tr 6 88

The overall classification rate is 0.904.

Recursive Partitioning

roz <- rpart(vtform, data=trains)
rpart.plot(roz,type=1,extra=1)

[Figure: rpart classification tree for the transient vs. non-variable classification]

pv <- predict(roz,newdata=tests,type="class")
cm <- xtabs( ~ pv + tests$vtype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 513 99
tr 45 583

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 92 15
tr 8 85

The overall classification rate is 0.8839.
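
The default rpart settings (notably cp = 0.01) limit how far the tree grows; rpart.control and prune() are the standard way to tune this. A hedged sketch on rpart's built-in kyphosis data (a stand-in for trains), not the tuning actually used here:

```r
library(rpart)

# Grow a larger tree than the default cp = 0.01 allows, then prune it
# back using the cross-validated error recorded in the cp table.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             control = rpart.control(cp = 0.001, minsplit = 5))

# Pick the cp value minimising the cross-validated error (xerror).
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)
```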

Support Vector Machines

Use the default settings from the kernlab R package for this:

svmod <- ksvm(vtform, data=trains)
Using automatic sigma estimation (sigest) for RBF or laplace kernel 
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$vtype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 524 64
tr 34 618

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 94 9
tr 6 91

The overall classification rate is 0.921.

Neural Net

Use the multinom() function from the nnet R package. It might work better with some scaling of the predictors.

nnmod <- multinom(vtform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(nnmod, tests)
cm <- xtabs( ~ pv + tests$vtype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 520 66
tr 38 616

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 93 10
tr 7 90

The overall classification rate is 0.9161.
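
The scaling mentioned above could be tried by standardising the predictors with the training-set centre and spread before fitting multinom(). A minimal sketch on iris as a stand-in for trains/tests:

```r
library(nnet)

# Standardise predictors to mean 0, sd 1 using the training data only;
# the same centring and scaling must then be applied to the test data.
train_idx <- seq(1, nrow(iris), by = 2)
X_train <- scale(iris[train_idx, 1:4])
X_test  <- scale(iris[-train_idx, 1:4],
                 center = attr(X_train, "scaled:center"),
                 scale  = attr(X_train, "scaled:scale"))

fit <- multinom(y ~ ., data = data.frame(y = iris$Species[train_idx], X_train),
                trace = FALSE, maxit = 1000, decay = 5e-4)
pv  <- predict(fit, newdata = data.frame(X_test))
mean(pv == iris$Species[-train_idx])  # holdout accuracy
```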

Random Forest

Use the randomForest package with the default settings:

tallform <- as.formula(paste("vtype ~",tpredform))
fmod <- randomForest(tallform, data=trains)
pv <- predict(fmod, newdata=tests)
cm <- xtabs( ~ pv + tests$vtype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 519 61
tr 39 621

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 93 9
tr 7 91

The overall classification rate is 0.9194.

Transients only

LDA

Linear discriminant analysis using the default options.

We produce the cross-classification between the predicted and observed classes. Note that the default priors are the class proportions found in the training set.

ldamod <- lda(trform ,data=trains)
pv <- predict(ldamod, tests)
cm <- xtabs( ~ pv$class + tests$ttype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare rrlyr sn
agn 37 3 4 7 4 0 11
blazar 0 23 7 13 0 0 6
cv 1 3 91 27 1 1 13
downes 3 7 17 65 5 9 2
flare 1 0 1 3 11 1 1
rrlyr 0 0 0 17 0 92 0
sn 2 3 25 3 3 0 159

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare rrlyr sn
agn 84 8 3 5 17 0 6
blazar 0 59 5 10 0 0 3
cv 2 8 63 20 4 1 7
downes 7 18 12 48 21 9 1
flare 2 0 1 2 46 1 1
rrlyr 0 0 0 13 0 89 0
sn 5 8 17 2 12 0 83

The overall classification rate is 0.7009.

Recursive Partitioning

roz <- rpart(trform ,data=trains)
rpart.plot(roz,type=1,extra=1)

[Figure: rpart classification tree for the transient-type classification]

pv <- predict(roz,newdata=tests,type="class")
cm <- xtabs( ~ pv + tests$ttype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare rrlyr sn
agn 30 2 2 8 1 0 3
blazar 0 12 4 15 0 4 1
cv 2 6 88 37 5 3 6
downes 6 7 9 52 6 12 6
flare 1 0 0 1 6 0 3
rrlyr 0 3 0 10 0 84 1
sn 5 9 42 12 6 0 172

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare rrlyr sn
agn 68 5 1 6 4 0 2
blazar 0 31 3 11 0 4 1
cv 5 15 61 27 21 3 3
downes 14 18 6 39 25 12 3
flare 2 0 0 1 25 0 2
rrlyr 0 8 0 7 0 82 1
sn 11 23 29 9 25 0 90

The overall classification rate is 0.651.

Support Vector Machines

Use the default settings from the kernlab R package for this:

svmod <- ksvm(trform, data=trains)
Using automatic sigma estimation (sigest) for RBF or laplace kernel 
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$ttype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare rrlyr sn
agn 37 2 0 11 0 0 6
blazar 0 21 3 7 0 0 0
cv 2 8 108 36 2 2 17
downes 1 2 9 70 4 8 1
flare 0 0 0 1 12 0 1
rrlyr 0 2 0 6 0 93 0
sn 4 4 25 4 6 0 167

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare rrlyr sn
agn 84 5 0 8 0 0 3
blazar 0 54 2 5 0 0 0
cv 5 21 74 27 8 2 9
downes 2 5 6 52 17 8 1
flare 0 0 0 1 50 0 1
rrlyr 0 5 0 4 0 90 0
sn 9 10 17 3 25 0 87

The overall classification rate is 0.7449.

Neural Net

Use the multinom() function from the nnet R package. It might work better with some scaling of the predictors.

nnmod <- multinom(trform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(nnmod, tests)
cm <- xtabs( ~ pv + tests$ttype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare rrlyr sn
agn 35 1 1 5 0 0 3
blazar 0 22 4 11 0 0 3
cv 3 5 93 26 0 1 18
downes 3 8 23 80 6 8 5
flare 0 0 0 2 11 1 4
rrlyr 0 0 0 7 2 93 0
sn 3 3 24 4 5 0 159

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare rrlyr sn
agn 80 3 1 4 0 0 2
blazar 0 56 3 8 0 0 2
cv 7 13 64 19 0 1 9
downes 7 21 16 59 25 8 3
flare 0 0 0 1 46 1 2
rrlyr 0 0 0 5 8 90 0
sn 7 8 17 3 21 0 83

The overall classification rate is 0.7229.

Random Forest

Use the randomForest package with the default settings. Since randomForest does not accept missing values, rows with missing values are dropped with na.omit:

tallform <- as.formula(paste("ttype ~",tpredform))
fmod <- randomForest(tallform, data=na.omit(trains))
pv <- predict(fmod, newdata=na.omit(tests))
cm <- xtabs( ~ pv + na.omit(tests)$ttype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare rrlyr sn
agn 35 2 2 5 0 0 2
blazar 0 20 3 8 0 0 0
cv 2 7 95 29 2 1 14
downes 3 5 18 74 4 6 6
flare 0 0 0 5 13 1 0
rrlyr 0 2 0 5 0 95 0
sn 4 3 27 9 5 0 170

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare rrlyr sn
agn 80 5 1 4 0 0 1
blazar 0 51 2 6 0 0 0
cv 5 18 66 21 8 1 7
downes 7 13 12 55 17 6 3
flare 0 0 0 4 54 1 0
rrlyr 0 5 0 4 0 92 0
sn 9 8 19 7 21 0 89

The overall classification rate is 0.7361.

Hierarchical Classification

First we classify each case as transient or non-variable. The cases classified as transient are then classified into a type of transient. This second-stage transient classification differs from the transients-only analysis above in the data it uses: above, all cases were known transients, whereas here some cases from the non-variable set will have been misclassified as transient at the first stage and carried forward.
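
The scheme just described can be sketched generically; the sections below are the concrete instances. A toy illustration with MASS::lda on iris, where "setosa" plays the role of the non-variable ("nv") class and, mirroring the construction of utrains below, the second stage is trained on cases the first stage predicts as non-base:

```r
library(MASS)

# Hypothetical two-stage classifier on iris as a stand-in:
# stage 1 separates the base class from the rest; stage 2 refines the rest.
iris2 <- iris
iris2$grp <- factor(ifelse(iris2$Species == "setosa", "setosa", "other"))

# Stage 1: base class vs. everything else.
stage1 <- lda(grp ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
              data = iris2)

# Stage 2: trained only on cases that stage 1 predicts as non-base.
pvt <- predict(stage1, iris2)$class
sub <- subset(iris2, pvt != "setosa")
sub$Species <- droplevels(sub$Species)
stage2 <- lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
              data = sub)

# Combined prediction: refine only the cases not classed as base.
p1 <- as.character(predict(stage1, iris2)$class)
hit <- p1 != "setosa"
p1[hit] <- as.character(predict(stage2, iris2[hit, ])$class)
table(predicted = p1, actual = iris2$Species)
```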

LDA

ldamod <- lda(vtform ,data=trains)
pv <- predict(ldamod, tests)
utests <- subset(tests, pv$class != 'nv')
pvt <- predict(ldamod, trains)
utrains <- subset(trains, pvt$class != 'nv')
ldamod <- lda(trform, data=utrains)
predc <- as.character(pv$class)
predc[predc != 'nv'] <- as.character(predict(ldamod, utests)$class)
cm <- xtabs( ~ predc + as.character(tests$type))

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 28 2 3 2 2 2 0 8
blazar 0 24 6 12 0 0 0 5
cv 3 4 84 25 0 7 0 17
downes 0 6 22 46 0 7 11 1
flare 0 0 1 2 3 8 1 0
nv 12 0 9 30 16 523 0 17
rrlyr 0 0 0 17 0 2 91 0
sn 1 3 20 1 3 9 0 144

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 64 5 2 1 8 0 0 4
blazar 0 62 4 9 0 0 0 3
cv 7 10 58 19 0 1 0 9
downes 0 15 15 34 0 1 11 1
flare 0 0 1 1 12 1 1 0
nv 27 0 6 22 67 94 0 9
rrlyr 0 0 0 13 0 0 88 0
sn 2 8 14 1 12 2 0 75

The overall classification rate is 0.7605.

RPART

roz <- rpart(vtform ,data=trains)
pv <- predict(roz, tests, type="class")
utests <- subset(tests, pv != 'nv')
pvt <- predict(roz, trains, type="class")
utrains <- subset(trains, pvt != 'nv')
roz <- rpart(trform, data=utrains)
predc <- as.character(pv)
predc[predc != 'nv'] <- as.character(predict(roz, utests,
type="class"))
predc <- factor(predc, levels=sort(levels(trains$type)))
cm <- xtabs( ~ predc + as.character(tests$type))

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 33 2 0 5 3 17 0 11
blazar 0 15 5 10 0 0 0 2
cv 1 6 83 35 0 2 1 5
downes 0 9 11 34 0 7 17 9
flare 0 0 0 0 0 0 0 0
nv 7 1 16 36 14 513 2 23
rrlyr 0 3 0 7 0 1 83 1
sn 3 3 30 8 7 18 0 141

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 75 5 0 4 12 3 0 6
blazar 0 38 3 7 0 0 0 1
cv 2 15 57 26 0 0 1 3
downes 0 23 8 25 0 1 17 5
flare 0 0 0 0 0 0 0 0
nv 16 3 11 27 58 92 2 12
rrlyr 0 8 0 5 0 0 81 1
sn 7 8 21 6 29 3 0 73

The overall classification rate is 0.7274.

SVM

svmod <- ksvm(vtform ,data=trains)
Using automatic sigma estimation (sigest) for RBF or laplace kernel 
pv <- predict(svmod, tests)
utests <- subset(tests, pv != 'nv')
pvt <- predict(svmod, trains)
utrains <- subset(trains, pvt != 'nv')
svmod <- ksvm(trform, data=utrains)
Using automatic sigma estimation (sigest) for RBF or laplace kernel 
predc <- as.character(pv)
predc[predc != 'nv'] <- as.character(predict(svmod, utests))
predc <- factor(predc, levels=sort(levels(trains$type)))
cm <- xtabs( ~ predc + as.character(tests$type))

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 32 2 0 6 0 5 0 3
blazar 0 21 3 7 0 0 0 0
cv 3 7 102 35 2 6 2 14
downes 0 2 9 48 0 7 8 1
flare 0 0 0 1 8 4 0 0
nv 7 0 6 28 11 524 0 13
rrlyr 0 2 0 6 0 1 93 0
sn 2 5 25 4 3 11 0 161

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 73 5 0 4 0 1 0 2
blazar 0 54 2 5 0 0 0 0
cv 7 18 70 26 8 1 2 7
downes 0 5 6 36 0 1 8 1
flare 0 0 0 1 33 1 0 0
nv 16 0 4 21 46 94 0 7
rrlyr 0 5 0 4 0 0 90 0
sn 5 13 17 3 12 2 0 84

The overall classification rate is 0.7976.

NNET

nnmod <- multinom(vtform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(nnmod, tests)
utests <- subset(tests, pv != 'nv')
pvt <- predict(nnmod, trains)
utrains <- subset(trains, pvt != 'nv')
nnmod <- multinom(trform, data=utrains, trace=FALSE, maxit=1000, decay=5e-4)
predc <- as.character(pv)
predc[predc != 'nv'] <- as.character(predict(nnmod, utests))
predc <- factor(predc, levels=sort(levels(trains$type)))
cm <- xtabs( ~ predc + as.character(tests$type))

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 31 1 0 3 0 2 0 3
blazar 0 22 4 11 0 1 0 3
cv 3 5 91 24 0 6 1 18
downes 2 8 23 64 0 13 8 5
flare 0 0 0 0 6 6 1 3
nv 7 0 6 24 15 520 0 14
rrlyr 0 0 0 6 0 0 93 0
sn 1 3 21 3 3 10 0 146

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 70 3 0 2 0 0 0 2
blazar 0 56 3 8 0 0 0 2
cv 7 13 63 18 0 1 1 9
downes 5 21 16 47 0 2 8 3
flare 0 0 0 0 25 1 1 2
nv 16 0 4 18 62 93 0 7
rrlyr 0 0 0 4 0 0 90 0
sn 2 8 14 2 12 2 0 76

The overall classification rate is 0.7847.

Random Forest

tallform <- as.formula(paste("vtype ~",tpredform))
rfmod <- randomForest(tallform, data=trains)
pv <- predict(rfmod, tests)
utests <- subset(tests, pv != 'nv')
pvt <- predict(rfmod, trains)
utrains <- subset(trains, pvt != 'nv')
tallform <- as.formula(paste("ttype ~",tpredform))
rfmod <- randomForest(tallform, data=na.omit(utrains))
predc <- as.character(pv)
predc[predc != 'nv'] <- as.character(predict(rfmod, utests))
predc <- factor(predc, levels=sort(levels(trains$type)))
cm <- xtabs( ~ predc + as.character(tests$type))

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 31 2 0 4 0 3 0 2
blazar 0 21 3 8 0 0 0 0
cv 2 7 92 27 0 3 1 13
downes 1 4 18 57 0 12 6 2
flare 0 0 0 3 9 4 1 0
nv 7 0 5 23 12 520 0 12
rrlyr 0 2 0 6 0 1 95 0
sn 3 3 27 7 3 15 0 163

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 70 5 0 3 0 1 0 1
blazar 0 54 2 6 0 0 0 0
cv 5 18 63 20 0 1 1 7
downes 2 10 12 42 0 2 6 1
flare 0 0 0 2 38 1 1 0
nv 16 0 3 17 50 93 0 6
rrlyr 0 5 0 4 0 0 92 0
sn 7 8 19 5 12 3 0 85

The overall classification rate is 0.7968.

Summary of results

Summary of percentage classification rates across tests:

print(xtable(100*cmat,digits=2),type="html")
LDA RPart SVM NN Forest
All 75.97 71.94 80.24 79.60 80.48
TranNoTran 90.40 88.39 92.10 91.61 91.94
TranOnly 70.09 65.10 74.49 72.29 73.61
Hierarch 76.05 72.74 79.76 78.47 79.68
cmatGPR <- cmat

Comparison of performance

Richards set

dimnames(cmatRICH) <- list(c("All","TranNoTran","TranOnly","Hierarch"),  c("LDA","RP","SVM","NN","RF"))
round(cmatRICH*100,1)
            LDA   RP  SVM   NN   RF
All        56.7 58.6 66.2 63.3 67.0
TranNoTran 74.6 79.5 80.9 75.2 82.7
TranOnly   54.5 58.9 64.7 60.1 63.3
Hierarch   56.4 60.4 65.1 58.8 65.9

Ours plus the Richards set

dimnames(cmatGPR) <- list(c("All","TranNoTran","TranOnly","Hierarch"),  c("LDA","RP","SVM","NN","RF"))
round(cmatGPR*100,1)
            LDA   RP  SVM   NN   RF
All        76.0 71.9 80.2 79.6 80.5
TranNoTran 90.4 88.4 92.1 91.6 91.9
TranOnly   70.1 65.1 74.5 72.3 73.6
Hierarch   76.0 72.7 79.8 78.5 79.7
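
The gain from adding our features can be read off by differencing the two summary matrices; a sketch reconstructing them from the rates printed above (entries in percentage points):

```r
# Rebuild the two summary matrices from the reported rates and
# difference them: positive entries favour ours + Richards.
rn <- c("All", "TranNoTran", "TranOnly", "Hierarch")
cn <- c("LDA", "RP", "SVM", "NN", "RF")
cmatRICH <- matrix(c(56.7, 58.6, 66.2, 63.3, 67.0,
                     74.6, 79.5, 80.9, 75.2, 82.7,
                     54.5, 58.9, 64.7, 60.1, 63.3,
                     56.4, 60.4, 65.1, 58.8, 65.9),
                   nrow = 4, byrow = TRUE, dimnames = list(rn, cn))
cmatGPR  <- matrix(c(76.0, 71.9, 80.2, 79.6, 80.5,
                     90.4, 88.4, 92.1, 91.6, 91.9,
                     70.1, 65.1, 74.5, 72.3, 73.6,
                     76.0, 72.7, 79.8, 78.5, 79.7),
                   nrow = 4, byrow = TRUE, dimnames = list(rn, cn))
round(cmatGPR - cmatRICH, 1)  # improvement in percentage points
```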