[1] "Created: Wed Sep 3 16:21:25 2014"
The purpose of this analysis is to establish a performance baseline for our measures. I have deliberately chosen the default versions of five different classification methods: linear discriminant analysis (MASS), recursive partitioning (rpart), support vector machines (kernlab), a multinomial log-linear model (nnet) and random forests (randomForest).
It is quite possible that better results could be obtained by tuning these methods or by using different classifiers altogether. That is worth trying, but my main objective is the generation of new features which I hope will improve on the Richards features. I need to use the same classification methods for both sets of features in order to make a fair comparison.
See featlcdb.Rmd for how the data feat.rda was prepared for this analysis.
load("feat.rda")
source("funcs.R")
require(MASS)
require(nnet)
require(ggplot2)
require(rpart)
require(rpart.plot)
require(xtable)
require(kernlab)
require(randomForest)
Set up the predictors that we will use throughout. Note that some of the variables are transformed, as can be seen in the setup below.
predform <- "rm.amplitude + rm.beyond1std + rm.fpr20 +rm.fpr35 + rm.fpr50 + rm.fpr80 + log(rm.maxslope) + rm.mad +asylog(rm.medbuf) + rm.pairslope + log(rm.peramp) + log(rm.pdfp) + rm.skew + log(rm.kurtosis)+ rm.std + dublog(rm.rcorbor)"
preds <- c("rm.amplitude","rm.beyond1std","rm.fpr20","rm.fpr35","rm.fpr50","rm.fpr80","rm.maxslope","rm.mad","rm.medbuf","rm.pairslope","rm.peramp","rm.pdfp","rm.skew","rm.kurtosis","rm.std","rm.rcorbor")
tpredform <- paste(preds,collapse="+")
trform <- as.formula(paste("type ~",predform))
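The transforms asylog and dublog (and nudlog in the expanded predictor set later on) are custom helpers defined in funcs.R, which is not shown here. Purely as an illustration of the kind of guarded log transform such wrappers might provide for features that can be zero or negative, something like the sketch below would do; these definitions are assumptions, not the actual funcs.R code.
asylog <- function(x) sign(x) * log1p(abs(x))  # assumed: sign-preserving log for features of either sign
dublog <- function(x) log1p(abs(x))            # assumed: log of a shifted magnitude, defined at zero
nudlog <- function(x) log(x + 1e-6)            # assumed: log nudged away from exact zeros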
Formula for classification of all types:
allform <- as.formula(paste("type ~",predform))
Transients vs. non-variables:
trains$vtype <- ifelse(trains$type == "nv","nv","tr")
trains$vtype <- factor(trains$vtype)
tests$vtype <- ifelse(tests$type == "nv","nv","tr")
tests$vtype <- factor(tests$vtype)
vtform <- as.formula(paste("vtype ~",predform))
Transients only:
trains$ttype <- trains$type
trains$ttype[trains$ttype == "nv"] <- NA
trains$ttype <- factor(trains$ttype)
tests$ttype <- tests$type
tests$ttype[tests$ttype == "nv"] <- NA
tests$ttype <- factor(tests$ttype)
trform <- as.formula(paste("ttype ~",predform))
cmat <- matrix(NA,nrow=4,ncol=5)
dimnames(cmat) <- list(c("All","TranNoTran","TranOnly","Hierarch"),c("LDA","RPart","SVM","NN","Forest"))
Linear discriminant analysis using the default options.
We produce the cross-classification between predicted and observed class. Note that the default priors are the proportions found in the training set.
ldamod <- lda(allform ,data=trains)
pv <- predict(ldamod, tests)
cm <- xtabs( ~ pv$class + tests$type)
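As an aside on the priors noted above: lda() also accepts a prior argument, so equal priors could be requested instead of the training-set proportions. This is not done anywhere in this analysis; the line below is only an illustration of the option, and ldamod.eq is a hypothetical name.
ldamod.eq <- lda(allform, data=trains, prior=rep(1/nlevels(trains$type), nlevels(trains$type)))  # equal priors over the 8 classes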
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 2 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
blazar | 9 | 19 | 11 | 15 | 0 | 9 | 0 | 7 |
cv | 0 | 4 | 81 | 34 | 2 | 10 | 1 | 22 |
downes | 0 | 1 | 7 | 12 | 1 | 5 | 9 | 4 |
flare | 1 | 0 | 7 | 8 | 9 | 10 | 1 | 1 |
nv | 32 | 15 | 25 | 66 | 11 | 489 | 65 | 89 |
rrlyr | 0 | 0 | 3 | 0 | 0 | 18 | 26 | 4 |
sn | 0 | 0 | 11 | 0 | 1 | 16 | 1 | 65 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
blazar | 20 | 49 | 8 | 11 | 0 | 2 | 0 | 4 |
cv | 0 | 10 | 56 | 25 | 8 | 2 | 1 | 11 |
downes | 0 | 3 | 5 | 9 | 4 | 1 | 9 | 2 |
flare | 2 | 0 | 5 | 6 | 38 | 2 | 1 | 1 |
nv | 73 | 38 | 17 | 49 | 46 | 88 | 63 | 46 |
rrlyr | 0 | 0 | 2 | 0 | 0 | 3 | 25 | 2 |
sn | 0 | 0 | 8 | 0 | 4 | 3 | 1 | 34 |
The overall classification rate is 0.5669.
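This overall rate (and the corresponding figures below) is the proportion of test cases on the diagonal of the predicted-by-actual table. The code computing it and filling the summary matrix cmat is not shown in this report; a minimal sketch would be the following, where the helper name crate and the exact assignment are assumptions rather than the original code.
crate <- function(cm) sum(diag(cm)) / sum(cm)  # correctly classified / total test cases
crate(cm)
cmat["All","LDA"] <- crate(cm)  # assumed bookkeeping behind the summary table at the end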
roz <- rpart(allform ,data=trains)
rpart.plot(roz,type=1,extra=1)
pv <- predict(roz,newdata=tests,type="class")
cm <- xtabs( ~ pv + tests$type)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
blazar | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
cv | 5 | 26 | 95 | 53 | 4 | 20 | 3 | 27 |
downes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
flare | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
nv | 31 | 7 | 26 | 73 | 16 | 497 | 47 | 80 |
rrlyr | 0 | 1 | 2 | 3 | 0 | 12 | 53 | 3 |
sn | 8 | 5 | 22 | 6 | 4 | 29 | 0 | 82 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
blazar | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
cv | 11 | 67 | 66 | 39 | 17 | 4 | 3 | 14 |
downes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
flare | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
nv | 70 | 18 | 18 | 54 | 67 | 89 | 46 | 42 |
rrlyr | 0 | 3 | 1 | 2 | 0 | 2 | 51 | 2 |
sn | 18 | 13 | 15 | 4 | 17 | 5 | 0 | 43 |
The overall classification rate is 0.5863.
Use the default settings from the kernlab R package for this:
svmod <- ksvm(allform, data=trains)
Using automatic sigma estimation (sigest) for RBF or laplace kernel
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$type)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 11 | 2 | 1 | 5 | 0 | 5 | 2 | 3 |
blazar | 0 | 14 | 4 | 8 | 0 | 2 | 0 | 1 |
cv | 1 | 4 | 95 | 37 | 3 | 11 | 2 | 23 |
downes | 0 | 4 | 6 | 10 | 0 | 3 | 4 | 0 |
flare | 0 | 0 | 0 | 0 | 3 | 4 | 0 | 0 |
nv | 31 | 9 | 19 | 66 | 17 | 508 | 18 | 61 |
rrlyr | 0 | 1 | 0 | 4 | 0 | 8 | 77 | 1 |
sn | 1 | 5 | 20 | 5 | 1 | 17 | 0 | 103 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 25 | 5 | 1 | 4 | 0 | 1 | 2 | 2 |
blazar | 0 | 36 | 3 | 6 | 0 | 0 | 0 | 1 |
cv | 2 | 10 | 66 | 27 | 12 | 2 | 2 | 12 |
downes | 0 | 10 | 4 | 7 | 0 | 1 | 4 | 0 |
flare | 0 | 0 | 0 | 0 | 12 | 1 | 0 | 0 |
nv | 70 | 23 | 13 | 49 | 71 | 91 | 17 | 32 |
rrlyr | 0 | 3 | 0 | 3 | 0 | 1 | 75 | 1 |
sn | 2 | 13 | 14 | 4 | 4 | 3 | 0 | 54 |
The overall classification rate is 0.6621.
Use the multinom() function from the nnet R package. It might work better with some scaling of the predictors (a sketch of one possibility follows these results).
svmod <- multinom(allform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$type)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 13 | 2 | 0 | 7 | 0 | 5 | 0 | 2 |
blazar | 0 | 12 | 4 | 7 | 0 | 1 | 0 | 2 |
cv | 1 | 10 | 82 | 39 | 1 | 10 | 2 | 18 |
downes | 1 | 1 | 16 | 11 | 1 | 4 | 3 | 8 |
flare | 0 | 0 | 0 | 1 | 4 | 4 | 0 | 0 |
nv | 28 | 12 | 29 | 56 | 12 | 508 | 10 | 95 |
rrlyr | 1 | 1 | 0 | 14 | 4 | 5 | 88 | 0 |
sn | 0 | 1 | 14 | 0 | 2 | 21 | 0 | 67 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 30 | 5 | 0 | 5 | 0 | 1 | 0 | 1 |
blazar | 0 | 31 | 3 | 5 | 0 | 0 | 0 | 1 |
cv | 2 | 26 | 57 | 29 | 4 | 2 | 2 | 9 |
downes | 2 | 3 | 11 | 8 | 4 | 1 | 3 | 4 |
flare | 0 | 0 | 0 | 1 | 17 | 1 | 0 | 0 |
nv | 64 | 31 | 20 | 41 | 50 | 91 | 10 | 49 |
rrlyr | 2 | 3 | 0 | 10 | 17 | 1 | 85 | 0 |
sn | 0 | 3 | 10 | 0 | 8 | 4 | 0 | 35 |
The overall classification rate is 0.6331.
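On the scaling remark above: nothing is rescaled in this analysis, but one possibility would be to standardize the transformed predictors before refitting. The sketch below is only that, a sketch; the names X, Xt, strain, stest and smod are illustrative. It uses model.matrix to obtain the transformed predictors and scales them with the training means and standard deviations, since centring the raw columns directly would break the log() terms in the formula.
X <- model.matrix(allform, data=trains)[,-1]   # transformed predictors, training set
Xt <- model.matrix(allform, data=tests)[,-1]   # same transformation applied to the test set
ctr <- colMeans(X); sds <- apply(X, 2, sd)
strain <- data.frame(type=trains$type, scale(X, center=ctr, scale=sds))
stest <- data.frame(scale(Xt, center=ctr, scale=sds))
smod <- multinom(type ~ ., data=strain, trace=FALSE, maxit=1000, decay=5e-4)
spv <- predict(smod, stest)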
Use the randomForest package with the default settings:
tallform <- as.formula(paste("type ~",tpredform))
fmod <- randomForest(tallform, data=trains)
pv <- predict(fmod, newdata=tests)
cm <- xtabs( ~ pv + tests$type)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 21 | 1 | 0 | 5 | 0 | 8 | 0 | 2 |
blazar | 4 | 16 | 3 | 7 | 1 | 5 | 0 | 4 |
cv | 2 | 7 | 82 | 35 | 3 | 11 | 1 | 17 |
downes | 0 | 1 | 15 | 16 | 0 | 8 | 6 | 2 |
flare | 0 | 0 | 0 | 2 | 9 | 3 | 0 | 1 |
nv | 15 | 7 | 18 | 61 | 8 | 496 | 17 | 52 |
rrlyr | 0 | 1 | 1 | 5 | 0 | 7 | 78 | 1 |
sn | 2 | 6 | 26 | 4 | 3 | 20 | 1 | 113 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 48 | 3 | 0 | 4 | 0 | 1 | 0 | 1 |
blazar | 9 | 41 | 2 | 5 | 4 | 1 | 0 | 2 |
cv | 5 | 18 | 57 | 26 | 12 | 2 | 1 | 9 |
downes | 0 | 3 | 10 | 12 | 0 | 1 | 6 | 1 |
flare | 0 | 0 | 0 | 1 | 38 | 1 | 0 | 1 |
nv | 34 | 18 | 12 | 45 | 33 | 89 | 17 | 27 |
rrlyr | 0 | 3 | 1 | 4 | 0 | 1 | 76 | 1 |
sn | 5 | 15 | 18 | 3 | 12 | 4 | 1 | 59 |
The overall classification rate is 0.6702.
Linear discriminant analysis using the default options.
We produce the cross-classification between predicted and observed class. Note that the default priors are the proportions found in the training set.
ldamod <- lda(vtform ,data=trains)
pv <- predict(ldamod, tests)
cm <- xtabs( ~ pv$class + tests$vtype)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | nv | tr |
---|---|---|
nv | 414 | 171 |
tr | 144 | 511 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | nv | tr |
---|---|---|
nv | 74 | 25 |
tr | 26 | 75 |
The overall classification rate is 0.746.
roz <- rpart(vtform ,data=trains)
rpart.plot(roz,type=1,extra=1)
pv <- predict(roz,newdata=tests,type="class")
cm <- xtabs( ~ pv + tests$vtype)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | nv | tr |
---|---|---|
nv | 416 | 112 |
tr | 142 | 570 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | nv | tr |
---|---|---|
nv | 75 | 16 |
tr | 25 | 84 |
The overall classification rate is 0.7952.
Use the default settings from the kernlab R package for this:
svmod <- ksvm(vtform, data=trains)
Using automatic sigma estimation (sigest) for RBF or laplace kernel
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$vtype)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | nv | tr |
---|---|---|
nv | 441 | 120 |
tr | 117 | 562 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | nv | tr |
---|---|---|
nv | 79 | 18 |
tr | 21 | 82 |
The overall classification rate is 0.8089.
Use the multinom() function from the nnet R package. It might work better with some scaling of the predictors.
svmod <- multinom(vtform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$vtype)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | nv | tr |
---|---|---|
nv | 410 | 160 |
tr | 148 | 522 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | nv | tr |
---|---|---|
nv | 73 | 23 |
tr | 27 | 77 |
The overall classification rate is 0.7516.
Use the randomForest package with the default settings:
tallform <- as.formula(paste("vtype ~",tpredform))
fmod <- randomForest(tallform, data=trains)
pv <- predict(fmod, newdata=tests)
cm <- xtabs( ~ pv + tests$vtype)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | nv | tr |
---|---|---|
nv | 446 | 102 |
tr | 112 | 580 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | nv | tr |
---|---|---|
nv | 80 | 15 |
tr | 20 | 85 |
The overall classification rate is 0.8274.
Linear discriminant analysis using the default options.
We produce the cross-classification between predicted and observed class. Note that the default priors are the proportions found in the training set.
ldamod <- lda(trform ,data=trains)
pv <- predict(ldamod, tests)
cm <- xtabs( ~ pv$class + tests$ttype)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | rrlyr | sn |
---|---|---|---|---|---|---|---|
agn | 15 | 4 | 0 | 10 | 0 | 0 | 8 |
blazar | 8 | 14 | 7 | 10 | 1 | 0 | 4 |
cv | 0 | 7 | 87 | 37 | 2 | 1 | 16 |
downes | 0 | 7 | 10 | 43 | 9 | 20 | 16 |
flare | 1 | 0 | 4 | 5 | 11 | 1 | 0 |
rrlyr | 1 | 0 | 12 | 15 | 0 | 69 | 15 |
sn | 19 | 7 | 25 | 15 | 1 | 12 | 133 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | rrlyr | sn |
---|---|---|---|---|---|---|---|
agn | 34 | 10 | 0 | 7 | 0 | 0 | 4 |
blazar | 18 | 36 | 5 | 7 | 4 | 0 | 2 |
cv | 0 | 18 | 60 | 27 | 8 | 1 | 8 |
downes | 0 | 18 | 7 | 32 | 38 | 19 | 8 |
flare | 2 | 0 | 3 | 4 | 46 | 1 | 0 |
rrlyr | 2 | 0 | 8 | 11 | 0 | 67 | 8 |
sn | 43 | 18 | 17 | 11 | 4 | 12 | 69 |
The overall classification rate is 0.5455.
roz <- rpart(trform ,data=trains)
rpart.plot(roz,type=1,extra=1)
pv <- predict(roz,newdata=tests,type="class")
cm <- xtabs( ~ pv + tests$ttype)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | rrlyr | sn |
---|---|---|---|---|---|---|---|
agn | 28 | 5 | 2 | 15 | 0 | 0 | 5 |
blazar | 2 | 12 | 4 | 4 | 0 | 0 | 1 |
cv | 2 | 10 | 90 | 41 | 7 | 4 | 15 |
downes | 1 | 0 | 2 | 27 | 3 | 2 | 4 |
flare | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
rrlyr | 4 | 1 | 2 | 23 | 9 | 84 | 6 |
sn | 7 | 11 | 45 | 25 | 5 | 13 | 161 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | rrlyr | sn |
---|---|---|---|---|---|---|---|
agn | 64 | 13 | 1 | 11 | 0 | 0 | 3 |
blazar | 5 | 31 | 3 | 3 | 0 | 0 | 1 |
cv | 5 | 26 | 62 | 30 | 29 | 4 | 8 |
downes | 2 | 0 | 1 | 20 | 12 | 2 | 2 |
flare | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
rrlyr | 9 | 3 | 1 | 17 | 38 | 82 | 3 |
sn | 16 | 28 | 31 | 19 | 21 | 13 | 84 |
The overall classification rate is 0.5894.
Use the default settings from the kernlab R package for this:
svmod <- ksvm(trform, data=trains)
Using automatic sigma estimation (sigest) for RBF or laplace kernel
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$ttype)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | rrlyr | sn |
---|---|---|---|---|---|---|---|
agn | 36 | 6 | 1 | 19 | 0 | 2 | 7 |
blazar | 0 | 13 | 4 | 5 | 1 | 0 | 1 |
cv | 2 | 5 | 99 | 44 | 5 | 3 | 23 |
downes | 1 | 5 | 4 | 42 | 4 | 10 | 3 |
flare | 0 | 0 | 0 | 1 | 11 | 0 | 1 |
rrlyr | 1 | 1 | 2 | 14 | 0 | 86 | 3 |
sn | 4 | 9 | 35 | 10 | 3 | 2 | 154 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | rrlyr | sn |
---|---|---|---|---|---|---|---|
agn | 82 | 15 | 1 | 14 | 0 | 2 | 4 |
blazar | 0 | 33 | 3 | 4 | 4 | 0 | 1 |
cv | 5 | 13 | 68 | 33 | 21 | 3 | 12 |
downes | 2 | 13 | 3 | 31 | 17 | 10 | 2 |
flare | 0 | 0 | 0 | 1 | 46 | 0 | 1 |
rrlyr | 2 | 3 | 1 | 10 | 0 | 83 | 2 |
sn | 9 | 23 | 24 | 7 | 12 | 2 | 80 |
The overall classification rate is 0.6466.
Use the multinom() function from the nnet R package. It might work better with some scaling of the predictors.
svmod <- multinom(trform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$ttype)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | rrlyr | sn |
---|---|---|---|---|---|---|---|
agn | 29 | 2 | 0 | 18 | 0 | 0 | 4 |
blazar | 2 | 15 | 5 | 6 | 1 | 0 | 4 |
cv | 1 | 10 | 85 | 41 | 1 | 1 | 18 |
downes | 3 | 4 | 17 | 36 | 11 | 9 | 18 |
flare | 0 | 0 | 0 | 2 | 6 | 1 | 1 |
rrlyr | 1 | 1 | 2 | 22 | 3 | 92 | 0 |
sn | 8 | 7 | 36 | 10 | 2 | 0 | 147 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | rrlyr | sn |
---|---|---|---|---|---|---|---|
agn | 66 | 5 | 0 | 13 | 0 | 0 | 2 |
blazar | 5 | 38 | 3 | 4 | 4 | 0 | 2 |
cv | 2 | 26 | 59 | 30 | 4 | 1 | 9 |
downes | 7 | 10 | 12 | 27 | 46 | 9 | 9 |
flare | 0 | 0 | 0 | 1 | 25 | 1 | 1 |
rrlyr | 2 | 3 | 1 | 16 | 12 | 89 | 0 |
sn | 18 | 18 | 25 | 7 | 8 | 0 | 77 |
The overall classification rate is 0.6012.
Use the randomForest package with the default settings:
tallform <- as.formula(paste("ttype ~",tpredform))
fmod <- randomForest(tallform, data=na.omit(trains))
pv <- predict(fmod, newdata=na.omit(tests))
cm <- xtabs( ~ pv + na.omit(tests)$ttype)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | rrlyr | sn |
---|---|---|---|---|---|---|---|
agn | 29 | 5 | 0 | 12 | 0 | 1 | 7 |
blazar | 5 | 16 | 5 | 9 | 1 | 0 | 7 |
cv | 2 | 8 | 85 | 37 | 3 | 1 | 18 |
downes | 3 | 2 | 15 | 57 | 5 | 10 | 8 |
flare | 0 | 0 | 0 | 2 | 11 | 0 | 1 |
rrlyr | 1 | 1 | 2 | 12 | 0 | 86 | 3 |
sn | 4 | 7 | 38 | 6 | 4 | 5 | 148 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | rrlyr | sn |
---|---|---|---|---|---|---|---|
agn | 66 | 13 | 0 | 9 | 0 | 1 | 4 |
blazar | 11 | 41 | 3 | 7 | 4 | 0 | 4 |
cv | 5 | 21 | 59 | 27 | 12 | 1 | 9 |
downes | 7 | 5 | 10 | 42 | 21 | 10 | 4 |
flare | 0 | 0 | 0 | 1 | 46 | 0 | 1 |
rrlyr | 2 | 3 | 1 | 9 | 0 | 83 | 2 |
sn | 9 | 18 | 26 | 4 | 17 | 5 | 77 |
The overall classification rate is 0.6334.
First we classify into transient and non-variable. The cases classified as transient at this first stage are then classified into a type of transient. The transient-only classification here differs from the one above in the data used: above, all the cases are known to be transients, whereas here some cases from the non-variable set will have been classified as transient at the first stage.
ldamod <- lda(vtform ,data=trains)
pv <- predict(ldamod, tests)
utests <- subset(tests, pv$class != 'nv')
pvt <- predict(ldamod, trains)
utrains <- subset(trains, pvt$class != 'nv')
ldamod <- lda(trform, data=utrains)
predc <- as.character(pv$class)
predc[predc != 'nv'] <- as.character(predict(ldamod, utests)$class)
cm <- xtabs( ~ predc + as.character(tests$type))
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 18 | 2 | 1 | 9 | 0 | 21 | 4 | 8 |
blazar | 6 | 13 | 4 | 9 | 0 | 6 | 0 | 1 |
cv | 0 | 12 | 88 | 38 | 1 | 11 | 1 | 17 |
downes | 0 | 1 | 7 | 7 | 2 | 3 | 3 | 6 |
flare | 1 | 0 | 3 | 5 | 8 | 9 | 1 | 0 |
nv | 12 | 6 | 12 | 56 | 9 | 414 | 25 | 50 |
rrlyr | 1 | 0 | 6 | 4 | 0 | 26 | 54 | 13 |
sn | 6 | 5 | 24 | 7 | 4 | 68 | 15 | 97 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 41 | 5 | 1 | 7 | 0 | 4 | 4 | 4 |
blazar | 14 | 33 | 3 | 7 | 0 | 1 | 0 | 1 |
cv | 0 | 31 | 61 | 28 | 4 | 2 | 1 | 9 |
downes | 0 | 3 | 5 | 5 | 8 | 1 | 3 | 3 |
flare | 2 | 0 | 2 | 4 | 33 | 2 | 1 | 0 |
nv | 27 | 15 | 8 | 41 | 38 | 74 | 24 | 26 |
rrlyr | 2 | 0 | 4 | 3 | 0 | 5 | 52 | 7 |
sn | 14 | 13 | 17 | 5 | 17 | 12 | 15 | 51 |
The overall classification rate is 0.5637.
roz <- rpart(vtform ,data=trains)
pv <- predict(roz, tests, type="class")
utests <- subset(tests, pv != 'nv')
pvt <- predict(roz, trains, type="class")
utrains <- subset(trains, pvt != 'nv')
roz <- rpart(trform, data=utrains)
predc <- as.character(pv)
predc[predc != 'nv'] <- as.character(predict(roz, utests,
type="class"))
predc <- factor(predc, levels=sort(levels(trains$type)))
cm <- xtabs( ~ predc + as.character(tests$type))
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 26 | 3 | 2 | 12 | 0 | 25 | 0 | 4 |
blazar | 2 | 12 | 4 | 4 | 0 | 3 | 0 | 1 |
cv | 2 | 10 | 90 | 43 | 9 | 16 | 4 | 15 |
downes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
flare | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
nv | 8 | 1 | 3 | 47 | 4 | 416 | 26 | 23 |
rrlyr | 1 | 1 | 1 | 11 | 6 | 17 | 61 | 5 |
sn | 5 | 12 | 45 | 18 | 5 | 81 | 12 | 144 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 59 | 8 | 1 | 9 | 0 | 4 | 0 | 2 |
blazar | 5 | 31 | 3 | 3 | 0 | 1 | 0 | 1 |
cv | 5 | 26 | 62 | 32 | 38 | 3 | 4 | 8 |
downes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
flare | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
nv | 18 | 3 | 2 | 35 | 17 | 75 | 25 | 12 |
rrlyr | 2 | 3 | 1 | 8 | 25 | 3 | 59 | 3 |
sn | 11 | 31 | 31 | 13 | 21 | 15 | 12 | 75 |
The overall classification rate is 0.604.
svmod <- ksvm(vtform ,data=trains)
Using automatic sigma estimation (sigest) for RBF or laplace kernel
pv <- predict(svmod, tests)
utests <- subset(tests, pv != 'nv')
pvt <- predict(svmod, trains)
utrains <- subset(trains, pvt != 'nv')
svmod <- ksvm(trform, data=utrains)
Using automatic sigma estimation (sigest) for RBF or laplace kernel
predc <- as.character(pv)
predc[predc != 'nv'] <- as.character(predict(svmod, utests))
predc <- factor(predc, levels=sort(levels(trains$type)))
cm <- xtabs( ~ predc + as.character(tests$type))
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 23 | 5 | 1 | 8 | 0 | 24 | 2 | 8 |
blazar | 0 | 15 | 4 | 6 | 0 | 2 | 0 | 1 |
cv | 2 | 5 | 98 | 42 | 5 | 17 | 3 | 22 |
downes | 0 | 2 | 4 | 9 | 0 | 6 | 6 | 1 |
flare | 0 | 0 | 0 | 2 | 9 | 4 | 0 | 0 |
nv | 17 | 2 | 6 | 52 | 5 | 442 | 10 | 28 |
rrlyr | 0 | 1 | 3 | 7 | 1 | 12 | 81 | 2 |
sn | 2 | 9 | 29 | 9 | 4 | 51 | 1 | 130 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 52 | 13 | 1 | 6 | 0 | 4 | 2 | 4 |
blazar | 0 | 38 | 3 | 4 | 0 | 0 | 0 | 1 |
cv | 5 | 13 | 68 | 31 | 21 | 3 | 3 | 11 |
downes | 0 | 5 | 3 | 7 | 0 | 1 | 6 | 1 |
flare | 0 | 0 | 0 | 1 | 38 | 1 | 0 | 0 |
nv | 39 | 5 | 4 | 39 | 21 | 79 | 10 | 15 |
rrlyr | 0 | 3 | 2 | 5 | 4 | 2 | 79 | 1 |
sn | 5 | 23 | 20 | 7 | 17 | 9 | 1 | 68 |
The overall classification rate is 0.6508.
svmod <- multinom(vtform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(svmod, tests)
utests <- subset(tests, pv != 'nv')
pvt <- predict(svmod, trains)
utrains <- subset(trains, pvt != 'nv')
svmod <- multinom(trform, data=utrains, trace=FALSE, maxit=1000, decay=5e-4)
predc <- as.character(pv)
predc[predc != 'nv'] <- as.character(predict(svmod, utests))
predc <- factor(predc, levels=sort(levels(trains$type)))
cm <- xtabs( ~ predc + as.character(tests$type))
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 26 | 1 | 0 | 11 | 0 | 21 | 0 | 4 |
blazar | 2 | 15 | 5 | 5 | 0 | 2 | 0 | 3 |
cv | 1 | 10 | 85 | 41 | 1 | 21 | 1 | 18 |
downes | 2 | 4 | 16 | 11 | 5 | 10 | 4 | 14 |
flare | 0 | 0 | 0 | 2 | 6 | 9 | 0 | 0 |
nv | 6 | 4 | 11 | 55 | 9 | 410 | 28 | 47 |
rrlyr | 1 | 0 | 2 | 8 | 1 | 13 | 70 | 0 |
sn | 6 | 5 | 26 | 2 | 2 | 72 | 0 | 106 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 59 | 3 | 0 | 8 | 0 | 4 | 0 | 2 |
blazar | 5 | 38 | 3 | 4 | 0 | 0 | 0 | 2 |
cv | 2 | 26 | 59 | 30 | 4 | 4 | 1 | 9 |
downes | 5 | 10 | 11 | 8 | 21 | 2 | 4 | 7 |
flare | 0 | 0 | 0 | 1 | 25 | 2 | 0 | 0 |
nv | 14 | 10 | 8 | 41 | 38 | 73 | 27 | 24 |
rrlyr | 2 | 0 | 1 | 6 | 4 | 2 | 68 | 0 |
sn | 14 | 13 | 18 | 1 | 8 | 13 | 0 | 55 |
The overall classification rate is 0.5879.
tallform <- as.formula(paste("vtype ~",tpredform))
svmod <- randomForest(tallform, data=trains)
pv <- predict(svmod, tests)
utests <- subset(tests, pv != 'nv')
pvt <- predict(svmod, trains)
utrains <- subset(trains, pvt != 'nv')
tallform <- as.formula(paste("ttype ~",tpredform))
svmod <- randomForest(tallform, data=na.omit(trains))
predc <- as.character(pv)
predc[predc != 'nv'] <- as.character(predict(svmod, utests))
predc <- factor(predc, levels=sort(levels(trains$type)))
cm <- xtabs( ~ predc + as.character(tests$type))
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 26 | 5 | 0 | 8 | 0 | 21 | 0 | 6 |
blazar | 3 | 16 | 4 | 8 | 1 | 7 | 0 | 8 |
cv | 1 | 8 | 84 | 34 | 4 | 12 | 1 | 14 |
downes | 2 | 1 | 14 | 23 | 1 | 14 | 9 | 5 |
flare | 0 | 0 | 0 | 2 | 10 | 5 | 0 | 2 |
nv | 9 | 1 | 13 | 44 | 4 | 450 | 7 | 30 |
rrlyr | 0 | 1 | 2 | 10 | 0 | 14 | 82 | 1 |
sn | 3 | 7 | 28 | 6 | 4 | 35 | 4 | 126 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 59 | 13 | 0 | 6 | 0 | 4 | 0 | 3 |
blazar | 7 | 41 | 3 | 6 | 4 | 1 | 0 | 4 |
cv | 2 | 21 | 58 | 25 | 17 | 2 | 1 | 7 |
downes | 5 | 3 | 10 | 17 | 4 | 3 | 9 | 3 |
flare | 0 | 0 | 0 | 1 | 42 | 1 | 0 | 1 |
nv | 20 | 3 | 9 | 33 | 17 | 81 | 7 | 16 |
rrlyr | 0 | 3 | 1 | 7 | 0 | 3 | 80 | 1 |
sn | 7 | 18 | 19 | 4 | 17 | 6 | 4 | 66 |
The overall classification rate is 0.6589.
Summary of the percentage classification rates across the tests above:
print(xtable(100*cmat,digits=2),type="html")
Task | LDA | RPart | SVM | NN | Forest |
---|---|---|---|---|---|
All | 56.69 | 58.63 | 66.21 | 63.31 | 67.02 |
TranNoTran | 74.60 | 79.52 | 80.89 | 75.16 | 82.74 |
TranOnly | 54.55 | 58.94 | 64.66 | 60.12 | 63.34 |
Hierarch | 56.37 | 60.40 | 65.08 | 58.79 | 65.89 |
cmatRICH <- cmat
Set up the expanded set of predictors, our new features together with the Richards features used above, which we will use throughout this part. Note that some of the variables are transformed, as can be seen in the setup below.
predform <- "shov + maxdiff + dscore + log(totvar) + log(quadvar) + log(famp) + log(fslope) + log(outl) + gscore + lsd + nudlog(gtvar) + rm.amplitude + rm.beyond1std + rm.fpr20 +rm.fpr35 + rm.fpr50 + rm.fpr80 + log(rm.maxslope) + rm.mad +asylog(rm.medbuf) + rm.pairslope + log(rm.peramp) + log(rm.pdfp) + rm.skew + log(rm.kurtosis)+ rm.std + dublog(rm.rcorbor)"
preds <- c("shov","maxdiff","dscore","totvar","quadvar","famp","fslope","outl","gscore","lsd","gtvar","rm.amplitude","rm.beyond1std","rm.fpr20","rm.fpr35","rm.fpr50","rm.fpr80","rm.maxslope","rm.mad","rm.medbuf","rm.pairslope","rm.peramp","rm.pdfp","rm.skew","rm.kurtosis","rm.std","rm.rcorbor")
tpredform <- paste(preds,collapse="+")
Formula for classification of all types:
allform <- as.formula(paste("type ~",predform))
Transients vs. non-variables:
trains$vtype <- ifelse(trains$type == "nv","nv","tr")
trains$vtype <- factor(trains$vtype)
tests$vtype <- ifelse(tests$type == "nv","nv","tr")
tests$vtype <- factor(tests$vtype)
vtform <- as.formula(paste("vtype ~",predform))
Transients only:
trains$ttype <- trains$type
trains$ttype[trains$ttype == "nv"] <- NA
trains$ttype <- factor(trains$ttype)
tests$ttype <- tests$type
tests$ttype[tests$ttype == "nv"] <- NA
tests$ttype <- factor(tests$ttype)
trform <- as.formula(paste("ttype ~",predform))
cmat <- matrix(NA,nrow=4,ncol=5)
dimnames(cmat) <- list(c("All","TranNoTran","TranOnly","Hierarch"),c("LDA","RPart","SVM","NN","Forest"))
Linear discriminant analysis using the default options.
We produce the cross-classification between predicted and observed class. Note that the default priors are the proportions found in the training set.
ldamod <- lda(allform ,data=trains)
pv <- predict(ldamod, tests)
cm <- xtabs( ~ pv$class + tests$type)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 29 | 2 | 0 | 3 | 0 | 3 | 0 | 8 |
blazar | 0 | 24 | 13 | 13 | 0 | 2 | 0 | 8 |
cv | 0 | 3 | 85 | 32 | 1 | 2 | 1 | 14 |
downes | 1 | 4 | 14 | 33 | 0 | 8 | 9 | 1 |
flare | 1 | 0 | 2 | 3 | 11 | 10 | 1 | 1 |
nv | 12 | 2 | 13 | 29 | 11 | 525 | 0 | 17 |
rrlyr | 0 | 2 | 1 | 20 | 0 | 1 | 92 | 0 |
sn | 1 | 2 | 17 | 2 | 1 | 7 | 0 | 143 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 66 | 5 | 0 | 2 | 0 | 1 | 0 | 4 |
blazar | 0 | 62 | 9 | 10 | 0 | 0 | 0 | 4 |
cv | 0 | 8 | 59 | 24 | 4 | 0 | 1 | 7 |
downes | 2 | 10 | 10 | 24 | 0 | 1 | 9 | 1 |
flare | 2 | 0 | 1 | 2 | 46 | 2 | 1 | 1 |
nv | 27 | 5 | 9 | 21 | 46 | 94 | 0 | 9 |
rrlyr | 0 | 5 | 1 | 15 | 0 | 0 | 89 | 0 |
sn | 2 | 5 | 12 | 1 | 4 | 1 | 0 | 74 |
The overall classification rate is 0.7597.
roz <- rpart(allform ,data=trains)
rpart.plot(roz,type=1,extra=1)
pv <- predict(roz,newdata=tests,type="class")
cm <- xtabs( ~ pv + tests$type)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 27 | 2 | 3 | 6 | 1 | 6 | 0 | 7 |
blazar | 0 | 17 | 6 | 19 | 0 | 1 | 4 | 2 |
cv | 1 | 6 | 80 | 35 | 0 | 0 | 2 | 3 |
downes | 0 | 6 | 8 | 9 | 0 | 2 | 1 | 5 |
flare | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
nv | 11 | 1 | 19 | 50 | 22 | 528 | 3 | 32 |
rrlyr | 0 | 4 | 1 | 13 | 0 | 7 | 93 | 5 |
sn | 5 | 3 | 28 | 3 | 1 | 14 | 0 | 138 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 61 | 5 | 2 | 4 | 4 | 1 | 0 | 4 |
blazar | 0 | 44 | 4 | 14 | 0 | 0 | 4 | 1 |
cv | 2 | 15 | 55 | 26 | 0 | 0 | 2 | 2 |
downes | 0 | 15 | 6 | 7 | 0 | 0 | 1 | 3 |
flare | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
nv | 25 | 3 | 13 | 37 | 92 | 95 | 3 | 17 |
rrlyr | 0 | 10 | 1 | 10 | 0 | 1 | 90 | 3 |
sn | 11 | 8 | 19 | 2 | 4 | 3 | 0 | 72 |
The overall classification rate is 0.7194.
Use the default settings from the kernlab R package for this:
svmod <- ksvm(allform, data=trains)
Using automatic sigma estimation (sigest) for RBF or laplace kernel
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$type)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 29 | 3 | 0 | 4 | 0 | 0 | 0 | 1 |
blazar | 0 | 22 | 4 | 8 | 0 | 0 | 0 | 0 |
cv | 1 | 6 | 102 | 34 | 1 | 3 | 1 | 14 |
downes | 1 | 2 | 8 | 43 | 0 | 5 | 8 | 0 |
flare | 0 | 0 | 0 | 0 | 7 | 3 | 0 | 0 |
nv | 12 | 0 | 8 | 34 | 14 | 542 | 1 | 20 |
rrlyr | 0 | 2 | 0 | 8 | 0 | 1 | 93 | 0 |
sn | 1 | 4 | 23 | 4 | 2 | 4 | 0 | 157 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 66 | 8 | 0 | 3 | 0 | 0 | 0 | 1 |
blazar | 0 | 56 | 3 | 6 | 0 | 0 | 0 | 0 |
cv | 2 | 15 | 70 | 25 | 4 | 1 | 1 | 7 |
downes | 2 | 5 | 6 | 32 | 0 | 1 | 8 | 0 |
flare | 0 | 0 | 0 | 0 | 29 | 1 | 0 | 0 |
nv | 27 | 0 | 6 | 25 | 58 | 97 | 1 | 10 |
rrlyr | 0 | 5 | 0 | 6 | 0 | 0 | 90 | 0 |
sn | 2 | 10 | 16 | 3 | 8 | 1 | 0 | 82 |
The overall classification rate is 0.8024.
Use the multinom() function from the nnet R package. It might work better with some scaling of the predictors.
svmod <- multinom(allform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$type)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 32 | 2 | 0 | 3 | 0 | 1 | 0 | 3 |
blazar | 1 | 24 | 4 | 11 | 0 | 1 | 0 | 6 |
cv | 3 | 4 | 93 | 27 | 0 | 5 | 3 | 19 |
downes | 0 | 6 | 18 | 59 | 0 | 11 | 7 | 1 |
flare | 0 | 0 | 0 | 1 | 7 | 4 | 0 | 1 |
nv | 7 | 0 | 9 | 25 | 12 | 531 | 0 | 14 |
rrlyr | 0 | 0 | 0 | 7 | 2 | 0 | 93 | 0 |
sn | 1 | 3 | 21 | 2 | 3 | 5 | 0 | 148 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 73 | 5 | 0 | 2 | 0 | 0 | 0 | 2 |
blazar | 2 | 62 | 3 | 8 | 0 | 0 | 0 | 3 |
cv | 7 | 10 | 64 | 20 | 0 | 1 | 3 | 10 |
downes | 0 | 15 | 12 | 44 | 0 | 2 | 7 | 1 |
flare | 0 | 0 | 0 | 1 | 29 | 1 | 0 | 1 |
nv | 16 | 0 | 6 | 19 | 50 | 95 | 0 | 7 |
rrlyr | 0 | 0 | 0 | 5 | 8 | 0 | 90 | 0 |
sn | 2 | 8 | 14 | 1 | 12 | 1 | 0 | 77 |
The overall classification rate is 0.796.
Use the randomForest package with the default settings:
tallform <- as.formula(paste("type ~",tpredform))
fmod <- randomForest(tallform, data=trains)
pv <- predict(fmod, newdata=tests)
cm <- xtabs( ~ pv + tests$type)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 31 | 1 | 0 | 3 | 0 | 2 | 0 | 2 |
blazar | 0 | 20 | 3 | 8 | 0 | 0 | 0 | 0 |
cv | 2 | 7 | 92 | 27 | 0 | 3 | 1 | 14 |
downes | 0 | 4 | 17 | 54 | 0 | 4 | 6 | 0 |
flare | 0 | 0 | 0 | 2 | 8 | 2 | 0 | 0 |
nv | 8 | 1 | 10 | 31 | 15 | 541 | 1 | 19 |
rrlyr | 0 | 2 | 0 | 6 | 0 | 1 | 95 | 0 |
sn | 3 | 4 | 23 | 4 | 1 | 5 | 0 | 157 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 70 | 3 | 0 | 2 | 0 | 0 | 0 | 1 |
blazar | 0 | 51 | 2 | 6 | 0 | 0 | 0 | 0 |
cv | 5 | 18 | 63 | 20 | 0 | 1 | 1 | 7 |
downes | 0 | 10 | 12 | 40 | 0 | 1 | 6 | 0 |
flare | 0 | 0 | 0 | 1 | 33 | 0 | 0 | 0 |
nv | 18 | 3 | 7 | 23 | 62 | 97 | 1 | 10 |
rrlyr | 0 | 5 | 0 | 4 | 0 | 0 | 92 | 0 |
sn | 7 | 10 | 16 | 3 | 4 | 1 | 0 | 82 |
The overall classification rate is 0.8048.
Linear discriminant analysis using the default options.
We produce the cross-classification between predicted and observed class. Note that the default priors are the proportions found in the training set.
ldamod <- lda(vtform ,data=trains)
pv <- predict(ldamod, tests)
cm <- xtabs( ~ pv$class + tests$vtype)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | nv | tr |
---|---|---|
nv | 523 | 84 |
tr | 35 | 598 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | nv | tr |
---|---|---|
nv | 94 | 12 |
tr | 6 | 88 |
The overall classification rate is 0.904.
roz <- rpart(vtform ,data=trains)
rpart.plot(roz,type=1,extra=1)
pv <- predict(roz,newdata=tests,type="class")
cm <- xtabs( ~ pv + tests$vtype)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | nv | tr |
---|---|---|
nv | 513 | 99 |
tr | 45 | 583 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | nv | tr |
---|---|---|
nv | 92 | 15 |
tr | 8 | 85 |
The overall classification rate is 0.8839.
Use the default settings from the kernlab R package for this:
svmod <- ksvm(vtform, data=trains)
Using automatic sigma estimation (sigest) for RBF or laplace kernel
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$vtype)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | nv | tr |
---|---|---|
nv | 524 | 64 |
tr | 34 | 618 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | nv | tr |
---|---|---|
nv | 94 | 9 |
tr | 6 | 91 |
The overall classification rate is 0.921.
Use the multinom() function from the nnet R package. It might work better with some scaling of the predictors.
svmod <- multinom(vtform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$vtype)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | nv | tr |
---|---|---|
nv | 520 | 66 |
tr | 38 | 616 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | nv | tr |
---|---|---|
nv | 93 | 10 |
tr | 7 | 90 |
The overall classification rate is 0.9161.
Use the randomForest package with the default settings:
tallform <- as.formula(paste("vtype ~",tpredform))
fmod <- randomForest(tallform, data=trains)
pv <- predict(fmod, newdata=tests)
cm <- xtabs( ~ pv + tests$vtype)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | nv | tr |
---|---|---|
nv | 519 | 61 |
tr | 39 | 621 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | nv | tr |
---|---|---|
nv | 93 | 9 |
tr | 7 | 91 |
The overall classification rate is 0.9194.
Linear discriminant analysis using the default options.
We produce the cross-classification between predicted and observed class. Note that the default priors are the proportions found in the training set.
ldamod <- lda(trform ,data=trains)
pv <- predict(ldamod, tests)
cm <- xtabs( ~ pv$class + tests$ttype)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | rrlyr | sn |
---|---|---|---|---|---|---|---|
agn | 37 | 3 | 4 | 7 | 4 | 0 | 11 |
blazar | 0 | 23 | 7 | 13 | 0 | 0 | 6 |
cv | 1 | 3 | 91 | 27 | 1 | 1 | 13 |
downes | 3 | 7 | 17 | 65 | 5 | 9 | 2 |
flare | 1 | 0 | 1 | 3 | 11 | 1 | 1 |
rrlyr | 0 | 0 | 0 | 17 | 0 | 92 | 0 |
sn | 2 | 3 | 25 | 3 | 3 | 0 | 159 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | rrlyr | sn |
---|---|---|---|---|---|---|---|
agn | 84 | 8 | 3 | 5 | 17 | 0 | 6 |
blazar | 0 | 59 | 5 | 10 | 0 | 0 | 3 |
cv | 2 | 8 | 63 | 20 | 4 | 1 | 7 |
downes | 7 | 18 | 12 | 48 | 21 | 9 | 1 |
flare | 2 | 0 | 1 | 2 | 46 | 1 | 1 |
rrlyr | 0 | 0 | 0 | 13 | 0 | 89 | 0 |
sn | 5 | 8 | 17 | 2 | 12 | 0 | 83 |
The overall classification rate is 0.7009.
roz <- rpart(trform ,data=trains)
rpart.plot(roz,type=1,extra=1)
pv <- predict(roz,newdata=tests,type="class")
cm <- xtabs( ~ pv + tests$ttype)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | rrlyr | sn |
---|---|---|---|---|---|---|---|
agn | 30 | 2 | 2 | 8 | 1 | 0 | 3 |
blazar | 0 | 12 | 4 | 15 | 0 | 4 | 1 |
cv | 2 | 6 | 88 | 37 | 5 | 3 | 6 |
downes | 6 | 7 | 9 | 52 | 6 | 12 | 6 |
flare | 1 | 0 | 0 | 1 | 6 | 0 | 3 |
rrlyr | 0 | 3 | 0 | 10 | 0 | 84 | 1 |
sn | 5 | 9 | 42 | 12 | 6 | 0 | 172 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | rrlyr | sn |
---|---|---|---|---|---|---|---|
agn | 68 | 5 | 1 | 6 | 4 | 0 | 2 |
blazar | 0 | 31 | 3 | 11 | 0 | 4 | 1 |
cv | 5 | 15 | 61 | 27 | 21 | 3 | 3 |
downes | 14 | 18 | 6 | 39 | 25 | 12 | 3 |
flare | 2 | 0 | 0 | 1 | 25 | 0 | 2 |
rrlyr | 0 | 8 | 0 | 7 | 0 | 82 | 1 |
sn | 11 | 23 | 29 | 9 | 25 | 0 | 90 |
The overall classification rate is 0.651.
Use the default settings from the kernlab R package for this:
svmod <- ksvm(trform, data=trains)
Using automatic sigma estimation (sigest) for RBF or laplace kernel
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$ttype)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | rrlyr | sn |
---|---|---|---|---|---|---|---|
agn | 37 | 2 | 0 | 11 | 0 | 0 | 6 |
blazar | 0 | 21 | 3 | 7 | 0 | 0 | 0 |
cv | 2 | 8 | 108 | 36 | 2 | 2 | 17 |
downes | 1 | 2 | 9 | 70 | 4 | 8 | 1 |
flare | 0 | 0 | 0 | 1 | 12 | 0 | 1 |
rrlyr | 0 | 2 | 0 | 6 | 0 | 93 | 0 |
sn | 4 | 4 | 25 | 4 | 6 | 0 | 167 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | rrlyr | sn |
---|---|---|---|---|---|---|---|
agn | 84 | 5 | 0 | 8 | 0 | 0 | 3 |
blazar | 0 | 54 | 2 | 5 | 0 | 0 | 0 |
cv | 5 | 21 | 74 | 27 | 8 | 2 | 9 |
downes | 2 | 5 | 6 | 52 | 17 | 8 | 1 |
flare | 0 | 0 | 0 | 1 | 50 | 0 | 1 |
rrlyr | 0 | 5 | 0 | 4 | 0 | 90 | 0 |
sn | 9 | 10 | 17 | 3 | 25 | 0 | 87 |
The overall classification rate is 0.7449.
Use the multinom() function from the nnet R package. It might work better with some scaling of the predictors.
svmod <- multinom(trform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$ttype)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | rrlyr | sn |
---|---|---|---|---|---|---|---|
agn | 35 | 1 | 1 | 5 | 0 | 0 | 3 |
blazar | 0 | 22 | 4 | 11 | 0 | 0 | 3 |
cv | 3 | 5 | 93 | 26 | 0 | 1 | 18 |
downes | 3 | 8 | 23 | 80 | 6 | 8 | 5 |
flare | 0 | 0 | 0 | 2 | 11 | 1 | 4 |
rrlyr | 0 | 0 | 0 | 7 | 2 | 93 | 0 |
sn | 3 | 3 | 24 | 4 | 5 | 0 | 159 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | rrlyr | sn |
---|---|---|---|---|---|---|---|
agn | 80 | 3 | 1 | 4 | 0 | 0 | 2 |
blazar | 0 | 56 | 3 | 8 | 0 | 0 | 2 |
cv | 7 | 13 | 64 | 19 | 0 | 1 | 9 |
downes | 7 | 21 | 16 | 59 | 25 | 8 | 3 |
flare | 0 | 0 | 0 | 1 | 46 | 1 | 2 |
rrlyr | 0 | 0 | 0 | 5 | 8 | 90 | 0 |
sn | 7 | 8 | 17 | 3 | 21 | 0 | 83 |
The overall classification rate is 0.7229.
Use the randomForest package with the default settings:
tallform <- as.formula(paste("ttype ~",tpredform))
fmod <- randomForest(tallform, data=na.omit(trains))
pv <- predict(fmod, newdata=na.omit(tests))
cm <- xtabs( ~ pv + na.omit(tests)$ttype)
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | rrlyr | sn |
---|---|---|---|---|---|---|---|
agn | 35 | 2 | 2 | 5 | 0 | 0 | 2 |
blazar | 0 | 20 | 3 | 8 | 0 | 0 | 0 |
cv | 2 | 7 | 95 | 29 | 2 | 1 | 14 |
downes | 3 | 5 | 18 | 74 | 4 | 6 | 6 |
flare | 0 | 0 | 0 | 5 | 13 | 1 | 0 |
rrlyr | 0 | 2 | 0 | 5 | 0 | 95 | 0 |
sn | 4 | 3 | 27 | 9 | 5 | 0 | 170 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | rrlyr | sn |
---|---|---|---|---|---|---|---|
agn | 80 | 5 | 1 | 4 | 0 | 0 | 1 |
blazar | 0 | 51 | 2 | 6 | 0 | 0 | 0 |
cv | 5 | 18 | 66 | 21 | 8 | 1 | 7 |
downes | 7 | 13 | 12 | 55 | 17 | 6 | 3 |
flare | 0 | 0 | 0 | 4 | 54 | 1 | 0 |
rrlyr | 0 | 5 | 0 | 4 | 0 | 92 | 0 |
sn | 9 | 8 | 19 | 7 | 21 | 0 | 89 |
The overall classification rate is 0.7361.
First we classify into transient and non-variable. The cases classified as transient at this first stage are then classified into a type of transient. The transient-only classification here differs from the one above in the data used: above, all the cases are known to be transients, whereas here some cases from the non-variable set will have been classified as transient at the first stage.
ldamod <- lda(vtform ,data=trains)
pv <- predict(ldamod, tests)
utests <- subset(tests, pv$class != 'nv')
pvt <- predict(ldamod, trains)
utrains <- subset(trains, pvt$class != 'nv')
ldamod <- lda(trform, data=utrains)
predc <- as.character(pv$class)
predc[predc != 'nv'] <- as.character(predict(ldamod, utests)$class)
cm <- xtabs( ~ predc + as.character(tests$type))
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 28 | 2 | 3 | 2 | 2 | 2 | 0 | 8 |
blazar | 0 | 24 | 6 | 12 | 0 | 0 | 0 | 5 |
cv | 3 | 4 | 84 | 25 | 0 | 7 | 0 | 17 |
downes | 0 | 6 | 22 | 46 | 0 | 7 | 11 | 1 |
flare | 0 | 0 | 1 | 2 | 3 | 8 | 1 | 0 |
nv | 12 | 0 | 9 | 30 | 16 | 523 | 0 | 17 |
rrlyr | 0 | 0 | 0 | 17 | 0 | 2 | 91 | 0 |
sn | 1 | 3 | 20 | 1 | 3 | 9 | 0 | 144 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 64 | 5 | 2 | 1 | 8 | 0 | 0 | 4 |
blazar | 0 | 62 | 4 | 9 | 0 | 0 | 0 | 3 |
cv | 7 | 10 | 58 | 19 | 0 | 1 | 0 | 9 |
downes | 0 | 15 | 15 | 34 | 0 | 1 | 11 | 1 |
flare | 0 | 0 | 1 | 1 | 12 | 1 | 1 | 0 |
nv | 27 | 0 | 6 | 22 | 67 | 94 | 0 | 9 |
rrlyr | 0 | 0 | 0 | 13 | 0 | 0 | 88 | 0 |
sn | 2 | 8 | 14 | 1 | 12 | 2 | 0 | 75 |
The overall classification rate is 0.7605.
roz <- rpart(vtform ,data=trains)
pv <- predict(roz, tests, type="class")
utests <- subset(tests, pv != 'nv')
pvt <- predict(roz, trains, type="class")
utrains <- subset(trains, pvt != 'nv')
roz <- rpart(trform, data=utrains)
predc <- as.character(pv)
predc[predc != 'nv'] <- as.character(predict(roz, utests,
type="class"))
predc <- factor(predc, levels=sort(levels(trains$type)))
cm <- xtabs( ~ predc + as.character(tests$type))
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 33 | 2 | 0 | 5 | 3 | 17 | 0 | 11 |
blazar | 0 | 15 | 5 | 10 | 0 | 0 | 0 | 2 |
cv | 1 | 6 | 83 | 35 | 0 | 2 | 1 | 5 |
downes | 0 | 9 | 11 | 34 | 0 | 7 | 17 | 9 |
flare | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
nv | 7 | 1 | 16 | 36 | 14 | 513 | 2 | 23 |
rrlyr | 0 | 3 | 0 | 7 | 0 | 1 | 83 | 1 |
sn | 3 | 3 | 30 | 8 | 7 | 18 | 0 | 141 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 75 | 5 | 0 | 4 | 12 | 3 | 0 | 6 |
blazar | 0 | 38 | 3 | 7 | 0 | 0 | 0 | 1 |
cv | 2 | 15 | 57 | 26 | 0 | 0 | 1 | 3 |
downes | 0 | 23 | 8 | 25 | 0 | 1 | 17 | 5 |
flare | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
nv | 16 | 3 | 11 | 27 | 58 | 92 | 2 | 12 |
rrlyr | 0 | 8 | 0 | 5 | 0 | 0 | 81 | 1 |
sn | 7 | 8 | 21 | 6 | 29 | 3 | 0 | 73 |
The overall classification rate is 0.7274.
svmod <- ksvm(vtform ,data=trains)
Using automatic sigma estimation (sigest) for RBF or laplace kernel
pv <- predict(svmod, tests)
utests <- subset(tests, pv != 'nv')
pvt <- predict(svmod, trains)
utrains <- subset(trains, pvt != 'nv')
svmod <- ksvm(trform, data=utrains)
Using automatic sigma estimation (sigest) for RBF or laplace kernel
predc <- as.character(pv)
predc[predc != 'nv'] <- as.character(predict(svmod, utests))
predc <- factor(predc, levels=sort(levels(trains$type)))
cm <- xtabs( ~ predc + as.character(tests$type))
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 32 | 2 | 0 | 6 | 0 | 5 | 0 | 3 |
blazar | 0 | 21 | 3 | 7 | 0 | 0 | 0 | 0 |
cv | 3 | 7 | 102 | 35 | 2 | 6 | 2 | 14 |
downes | 0 | 2 | 9 | 48 | 0 | 7 | 8 | 1 |
flare | 0 | 0 | 0 | 1 | 8 | 4 | 0 | 0 |
nv | 7 | 0 | 6 | 28 | 11 | 524 | 0 | 13 |
rrlyr | 0 | 2 | 0 | 6 | 0 | 1 | 93 | 0 |
sn | 2 | 5 | 25 | 4 | 3 | 11 | 0 | 161 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 73 | 5 | 0 | 4 | 0 | 1 | 0 | 2 |
blazar | 0 | 54 | 2 | 5 | 0 | 0 | 0 | 0 |
cv | 7 | 18 | 70 | 26 | 8 | 1 | 2 | 7 |
downes | 0 | 5 | 6 | 36 | 0 | 1 | 8 | 1 |
flare | 0 | 0 | 0 | 1 | 33 | 1 | 0 | 0 |
nv | 16 | 0 | 4 | 21 | 46 | 94 | 0 | 7 |
rrlyr | 0 | 5 | 0 | 4 | 0 | 0 | 90 | 0 |
sn | 5 | 13 | 17 | 3 | 12 | 2 | 0 | 84 |
The overall classification rate is 0.7976.
svmod <- multinom(vtform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(svmod, tests)
utests <- subset(tests, pv != 'nv')
pvt <- predict(svmod, trains)
utrains <- subset(trains, pvt != 'nv')
svmod <- multinom(trform, data=utrains, trace=FALSE, maxit=1000, decay=5e-4)
predc <- as.character(pv)
predc[predc != 'nv'] <- as.character(predict(svmod, utests))
predc <- factor(predc, levels=sort(levels(trains$type)))
cm <- xtabs( ~ predc + as.character(tests$type))
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 31 | 1 | 0 | 3 | 0 | 2 | 0 | 3 |
blazar | 0 | 22 | 4 | 11 | 0 | 1 | 0 | 3 |
cv | 3 | 5 | 91 | 24 | 0 | 6 | 1 | 18 |
downes | 2 | 8 | 23 | 64 | 0 | 13 | 8 | 5 |
flare | 0 | 0 | 0 | 0 | 6 | 6 | 1 | 3 |
nv | 7 | 0 | 6 | 24 | 15 | 520 | 0 | 14 |
rrlyr | 0 | 0 | 0 | 6 | 0 | 0 | 93 | 0 |
sn | 1 | 3 | 21 | 3 | 3 | 10 | 0 | 146 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 70 | 3 | 0 | 2 | 0 | 0 | 0 | 2 |
blazar | 0 | 56 | 3 | 8 | 0 | 0 | 0 | 2 |
cv | 7 | 13 | 63 | 18 | 0 | 1 | 1 | 9 |
downes | 5 | 21 | 16 | 47 | 0 | 2 | 8 | 3 |
flare | 0 | 0 | 0 | 0 | 25 | 1 | 1 | 2 |
nv | 16 | 0 | 4 | 18 | 62 | 93 | 0 | 7 |
rrlyr | 0 | 0 | 0 | 4 | 0 | 0 | 90 | 0 |
sn | 2 | 8 | 14 | 2 | 12 | 2 | 0 | 76 |
The overall classification rate is 0.7847.
tallform <- as.formula(paste("vtype ~",tpredform))
svmod <- randomForest(tallform, data=trains)
pv <- predict(svmod, tests)
utests <- subset(tests, pv != 'nv')
pvt <- predict(svmod, trains)
utrains <- subset(trains, pvt != 'nv')
tallform <- as.formula(paste("ttype ~",tpredform))
svmod <- randomForest(tallform, data=na.omit(trains))
predc <- as.character(pv)
predc[predc != 'nv'] <- as.character(predict(svmod, utests))
predc <- factor(predc, levels=sort(levels(trains$type)))
cm <- xtabs( ~ predc + as.character(tests$type))
This table shows the predicted type in the rows by the actual type in the columns.
print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 31 | 2 | 0 | 4 | 0 | 3 | 0 | 2 |
blazar | 0 | 21 | 3 | 8 | 0 | 0 | 0 | 0 |
cv | 2 | 7 | 92 | 27 | 0 | 3 | 1 | 13 |
downes | 1 | 4 | 18 | 57 | 0 | 12 | 6 | 2 |
flare | 0 | 0 | 0 | 3 | 9 | 4 | 1 | 0 |
nv | 7 | 0 | 5 | 23 | 12 | 520 | 0 | 12 |
rrlyr | 0 | 2 | 0 | 6 | 0 | 1 | 95 | 0 |
sn | 3 | 3 | 27 | 7 | 3 | 15 | 0 | 163 |
Same as above but now expressed as a percentage within each column:
print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Predicted | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
---|---|---|---|---|---|---|---|---|
agn | 70 | 5 | 0 | 3 | 0 | 1 | 0 | 1 |
blazar | 0 | 54 | 2 | 6 | 0 | 0 | 0 | 0 |
cv | 5 | 18 | 63 | 20 | 0 | 1 | 1 | 7 |
downes | 2 | 10 | 12 | 42 | 0 | 2 | 6 | 1 |
flare | 0 | 0 | 0 | 2 | 38 | 1 | 1 | 0 |
nv | 16 | 0 | 3 | 17 | 50 | 93 | 0 | 6 |
rrlyr | 0 | 5 | 0 | 4 | 0 | 0 | 92 | 0 |
sn | 7 | 8 | 19 | 5 | 12 | 3 | 0 | 85 |
The overall classification rate is 0.7968.
Summary of the percentage classification rates across the tests above:
print(xtable(100*cmat,digits=2),type="html")
Task | LDA | RPart | SVM | NN | Forest |
---|---|---|---|---|---|
All | 75.97 | 71.94 | 80.24 | 79.60 | 80.48 |
TranNoTran | 90.40 | 88.39 | 92.10 | 91.61 | 91.94 |
TranOnly | 70.09 | 65.10 | 74.49 | 72.29 | 73.61 |
Hierarch | 76.05 | 72.74 | 79.76 | 78.47 | 79.68 |
cmatGPR <- cmat
The Richards feature set:
dimnames(cmatRICH) <- list(c("All","TranNoTran","TranOnly","Hierarch"), c("LDA","RP","SVM","NN","RF"))
round(cmatRICH*100,1)
LDA RP SVM NN RF
All 56.7 58.6 66.2 63.3 67.0
TranNoTran 74.6 79.5 80.9 75.2 82.7
TranOnly 54.5 58.9 64.7 60.1 63.3
Hierarch 56.4 60.4 65.1 58.8 65.9
Our features plus the Richards set:
dimnames(cmatGPR) <- list(c("All","TranNoTran","TranOnly","Hierarch"), c("LDA","RP","SVM","NN","RF"))
round(cmatGPR*100,1)
LDA RP SVM NN RF
All 76.0 71.9 80.2 79.6 80.5
TranNoTran 90.4 88.4 92.1 91.6 91.9
TranOnly 70.1 65.1 74.5 72.3 73.6
Hierarch 76.0 72.7 79.8 78.5 79.7
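Not computed in the original, but a single further line would show the percentage-point gain from adding our features to the Richards set:
round((cmatGPR - cmatRICH)*100, 1)  # positive entries indicate an improvement over the Richards-only features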