Evaluation of our GP-based features using Gaussian Process Regression smoothing

[1] "Created: Tue Jun 24 14:25:43 2014"

We have introduced several new types of measure. Some of these are based on our GP fitting, while others are data-based statistics rather like the Richards measures. Here we repeat the calculations using only the GP-based measures.

The measures omitted here are shov, maxdiff, dscore, gscore, lsd and gtvar.

See featlcdb.Rmd for how the data feat.rda was prepared for this analysis.

opts_chunk$set(comment=NA, warning=FALSE, message=FALSE, error=FALSE)
load("feat.rda")
source("funcs.R")
require(MASS)
require(nnet)
require(ggplot2)
require(rpart)
require(rpart.plot)
require(xtable)
require(kernlab)
require(randomForest)

Set up the predictors that we will use throughout. Note that I have transformed some of the variables, as seen in the setup below.

predform <- "log(totvar) + log(quadvar) + log(famp) + log(fslope) + log(outl) +  rm.amplitude  + rm.beyond1std + rm.fpr20 +rm.fpr35 + rm.fpr50 + rm.fpr80 + log(rm.maxslope) + rm.mad +asylog(rm.medbuf) + rm.pairslope + log(rm.peramp) + log(rm.pdfp) + rm.skew + log(rm.kurtosis)+ rm.std + dublog(rm.rcorbor)"
preds <- c("totvar","quadvar","famp","fslope","outl","rm.amplitude","rm.beyond1std","rm.fpr20","rm.fpr35","rm.fpr50","rm.fpr80","rm.maxslope","rm.mad","rm.medbuf","rm.pairslope","rm.peramp","rm.pdfp","rm.skew","rm.kurtosis","rm.std","rm.rcorbor")
tpredform <- paste(preds,collapse="+")

Creation of model formulae and subset testing and training sets

Formula for classification of all types:

allform <- as.formula(paste("type ~",predform))
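As a minimal sketch of how these formulae are assembled, the snippet below pastes a response name onto a predictor string and converts the result with as.formula(). The shortened predictor string and the names predform_demo and allform_demo are hypothetical stand-ins, not the full set used in this report.

```r
# Hypothetical, shortened stand-in for predform
predform_demo <- "log(totvar) + log(quadvar) + rm.skew"

# Build the formula text, then convert it to a formula object
allform_demo <- as.formula(paste("type ~", predform_demo))

# all.vars() lists the variables the formula refers to (functions such as log are skipped)
vars <- all.vars(allform_demo)
print(allform_demo)
print(vars)
```

Swapping the response name ("type", "vtype" or "ttype") while reusing the same predictor string keeps the classification tasks directly comparable.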

Transients vs. non-variables:

trains$vtype <- ifelse(trains$type == "nv","nv","tr")
trains$vtype <- factor(trains$vtype)
tests$vtype <- ifelse(tests$type == "nv","nv","tr")
tests$vtype <- factor(tests$vtype)
vtform <- as.formula(paste("vtype ~",predform))

Transients only:

trains$ttype <- trains$type
trains$ttype[trains$ttype == "nv"] <- NA
trains$ttype <- factor(trains$ttype)
tests$ttype <- tests$type
tests$ttype[tests$ttype == "nv"] <- NA
tests$ttype <- factor(tests$ttype)
trform <- as.formula(paste("ttype ~",predform))

cmat <- matrix(NA,nrow=4,ncol=5)
dimnames(cmat) <- list(c("All","TranNoTran","Tranonly","Heirarch"),c("LDA","RPart","SVM","NN","Forest"))

All types

LDA

Linear Discriminant analysis using the default options.

We produce the cross-classification between predicted and observed class. Note that the default priors are the proportions found in the training set.
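The remark about default priors can be checked on a small example. The sketch below uses the built-in iris data purely as a stand-in (it is not the data analysed here) and compares the priors stored by lda() with the class proportions in the training data.

```r
library(MASS)

# iris is a stand-in data set; it is not the data used in this report
fit <- lda(Species ~ ., data = iris)

# Class proportions in the training data
props <- prop.table(table(iris$Species))

# With no 'prior' argument, lda() uses these proportions as the priors
print(fit$prior)
print(props)
same <- isTRUE(all.equal(as.numeric(fit$prior), as.numeric(props)))
```

When classes are unbalanced, as with the large nv group here, these priors pull borderline cases towards the more common classes.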

ldamod <- lda(allform ,data=trains)
pv <- predict(ldamod, tests)
cm <- xtabs( ~ pv$class + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	nv	rrlyr	sn
agn	25	4	3	1	0	3	0	3
blazar	3	20	9	14	0	5	1	8
cv	0	4	79	32	2	7	1	15
downes	0	8	13	29	0	5	12	3
flare	1	0	5	4	10	12	1	1
nv	14	1	16	41	12	521	1	29
rrlyr	0	0	0	11	0	2	87	0
sn	1	2	20	3	0	3	0	133

Same as above, but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	nv	rrlyr	sn
agn	57	10	2	1	0	1	0	2
blazar	7	51	6	10	0	1	1	4
cv	0	10	54	24	8	1	1	8
downes	0	21	9	21	0	1	12	2
flare	2	0	3	3	42	2	1	1
nv	32	3	11	30	50	93	1	15
rrlyr	0	0	0	8	0	0	84	0
sn	2	5	14	2	0	1	0	69

The overall classification rate is 0.729.
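The chunk that computes this rate is not shown. A common way to obtain it, assumed here, is to divide the sum of the diagonal of the cross-classification table by the total count. The sketch below re-enters the counts from the LDA table above under the illustrative name cm_demo.

```r
# Counts copied from the LDA table above (rows = predicted, columns = actual)
types <- c("agn", "blazar", "cv", "downes", "flare", "nv", "rrlyr", "sn")
cm_demo <- matrix(c(
  25,  4,  3,  1,  0,   3,  0,   3,
   3, 20,  9, 14,  0,   5,  1,   8,
   0,  4, 79, 32,  2,   7,  1,  15,
   0,  8, 13, 29,  0,   5, 12,   3,
   1,  0,  5,  4, 10,  12,  1,   1,
  14,  1, 16, 41, 12, 521,  1,  29,
   0,  0,  0, 11,  0,   2, 87,   0,
   1,  2, 20,  3,  0,   3,  0, 133),
  nrow = 8, byrow = TRUE,
  dimnames = list(predicted = types, actual = types))

# Correct classifications lie on the diagonal
rate <- sum(diag(cm_demo)) / sum(cm_demo)
print(round(rate, 4))
```

This calculation relies on the table being square with rows and columns in the same order, which holds for the tables in this report.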

Recursive Partitioning

roz <- rpart(allform ,data=trains)
rpart.plot(roz,type=1,extra=1)

[Figure: classification tree for all types, drawn with rpart.plot]

pv <- predict(roz,newdata=tests,type="class")
cm <- xtabs( ~ pv + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	nv	rrlyr	sn
agn	34	1	1	3	0	8	1	3
blazar	0	0	0	0	0	0	0	0
cv	1	31	94	58	0	3	5	10
downes	0	0	0	0	0	0	0	0
flare	0	0	0	0	0	0	0	0
nv	8	1	17	45	14	513	2	29
rrlyr	0	0	1	10	0	0	86	1
sn	1	6	32	19	10	34	9	149

Same as above, but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	nv	rrlyr	sn
agn	77	3	1	2	0	1	1	2
blazar	0	0	0	0	0	0	0	0
cv	2	79	65	43	0	1	5	5
downes	0	0	0	0	0	0	0	0
flare	0	0	0	0	0	0	0	0
nv	18	3	12	33	58	92	2	15
rrlyr	0	0	1	7	0	0	83	1
sn	2	15	22	14	42	6	9	78

The overall classification rate is 0.7065.

Support Vector Machines

Use the default settings of ksvm() from the kernlab R package:

svmod <- ksvm(allform, data=trains)

Using automatic sigma estimation (sigest) for RBF or laplace kernel

pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	nv	rrlyr	sn
agn	27	2	1	7	0	3	0	3
blazar	0	18	4	6	0	0	0	0
cv	0	5	94	36	2	5	1	17
downes	1	4	8	35	0	4	7	1
flare	0	0	0	1	5	4	0	0
nv	16	0	15	40	16	529	3	18
rrlyr	0	2	0	8	0	0	90	0
sn	0	8	23	2	1	13	2	153

Same as above, but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	nv	rrlyr	sn
agn	61	5	1	5	0	1	0	2
blazar	0	46	3	4	0	0	0	0
cv	0	13	65	27	8	1	1	9
downes	2	10	6	26	0	1	7	1
flare	0	0	0	1	21	1	0	0
nv	36	0	10	30	67	95	3	9
rrlyr	0	5	0	6	0	0	87	0
sn	0	21	16	1	4	2	2	80

The overall classification rate is 0.7669.

Neural Net

Use the multinom() function from the nnet R package. This might work better with some scaling of the predictors.

svmod <- multinom(allform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	nv	rrlyr	sn
agn	28	2	1	4	0	2	0	3
blazar	2	20	4	6	0	0	0	4
cv	1	7	79	33	1	8	1	19
downes	0	6	21	44	2	15	8	2
flare	0	0	0	1	4	5	0	1
nv	11	1	14	34	15	526	2	24
rrlyr	0	1	0	9	2	0	92	0
sn	2	2	26	4	0	2	0	139

Same as above, but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	nv	rrlyr	sn
agn	64	5	1	3	0	0	0	2
blazar	5	51	3	4	0	0	0	2
cv	2	18	54	24	4	1	1	10
downes	0	15	14	33	8	3	8	1
flare	0	0	0	1	17	1	0	1
nv	25	3	10	25	62	94	2	12
rrlyr	0	3	0	7	8	0	89	0
sn	5	5	18	3	0	0	0	72

The overall classification rate is 0.7516.
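The earlier remark that multinom() might work better with scaling could be tried along the following lines. This is only a sketch on the built-in iris data, used as a stand-in; it centres and scales the predictors with training-set statistics and applies the same shift and scale to the test set. No such scaled fit was run for the results reported here.

```r
library(nnet)

set.seed(1)
idx <- sample(nrow(iris), 100)          # iris is a stand-in data set
num <- sapply(iris, is.numeric)

# Centre and scale using training-set statistics only
tr_x <- scale(iris[idx, num])
te_x <- scale(iris[-idx, num],
              center = attr(tr_x, "scaled:center"),
              scale  = attr(tr_x, "scaled:scale"))

tr <- data.frame(tr_x, Species = iris$Species[idx])
te <- data.frame(te_x, Species = iris$Species[-idx])

# Same multinom() settings as used in this report
fit <- multinom(Species ~ ., data = tr, trace = FALSE, maxit = 1000, decay = 5e-4)
pred <- predict(fit, te)
rate <- mean(pred == te$Species)
print(rate)
```

Because the decay penalty acts on the size of the coefficients, putting the predictors on a common scale makes the penalty treat them more evenly.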

Random Forest

Use the randomForest package with the default settings:

tallform <- as.formula(paste("type ~",tpredform))
fmod <- randomForest(tallform, data=trains)
pv <- predict(fmod, newdata=tests)
cm <- xtabs( ~ pv + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	nv	rrlyr	sn
agn	31	1	0	3	0	5	0	2
blazar	0	19	2	9	0	1	0	0
cv	2	8	82	31	1	1	1	16
downes	0	5	19	44	0	8	6	2
flare	0	0	0	1	7	5	0	0
nv	10	1	15	33	15	527	2	21
rrlyr	0	1	0	7	0	0	92	1
sn	1	4	27	7	1	11	2	150

Same as above, but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	nv	rrlyr	sn
agn	70	3	0	2	0	1	0	1
blazar	0	49	1	7	0	0	0	0
cv	5	21	57	23	4	0	1	8
downes	0	13	13	33	0	1	6	1
flare	0	0	0	1	29	1	0	0
nv	23	3	10	24	62	94	2	11
rrlyr	0	3	0	5	0	0	89	1
sn	2	10	19	5	4	2	2	78

The overall classification rate is 0.7677.

Transients vs. non-variables

LDA

Linear Discriminant analysis using the default options.

We produce the cross-classification between predicted and observed class. Note that the default priors are the proportions found in the training set.

ldamod <- lda(vtform ,data=trains)
pv <- predict(ldamod, tests)
cm <- xtabs( ~ pv$class + tests$vtype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	nv	tr
nv	515	86
tr	43	596

Same as above, but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	nv	tr
nv	92	13
tr	8	87
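The column percentages can be reproduced from the counts. The sketch below re-enters the two-class LDA counts from the table above under the illustrative name cm_demo; prop.table() with margin 2 divides each cell by its column total, so each column shows how the cases of one actual class were distributed over the predicted classes.

```r
# Counts copied from the two-class LDA table (rows = predicted, columns = actual)
cm_demo <- matrix(c(515,  86,
                     43, 596),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(predicted = c("nv", "tr"),
                                  actual    = c("nv", "tr")))

# Margin 2 = columns, so every column sums to 100 before rounding
pct <- round(100 * prop.table(cm_demo, 2))
print(pct)
```

The diagonal of this table gives the per-class rates: the percentage of true non-variables and of true transients that were correctly identified.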

The overall classification rate is 0.896.

Recursive Partitioning

roz <- rpart(vtform ,data=trains)
rpart.plot(roz,type=1,extra=1)

[Figure: classification tree for transients vs. non-variables, drawn with rpart.plot]

pv <- predict(roz,newdata=tests,type="class")
cm <- xtabs( ~ pv + tests$vtype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	nv	tr
nv	500	107
tr	58	575

Same as above, but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	nv	tr
nv	90	16
tr	10	84

The overall classification rate is 0.8669.

Support Vector Machines

Use the default settings of ksvm() from the kernlab R package:

svmod <- ksvm(vtform, data=trains)

Using automatic sigma estimation (sigest) for RBF or laplace kernel

pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$vtype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	nv	tr
nv	507	81
tr	51	601

Same as above, but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	nv	tr
nv	91	12
tr	9	88

The overall classification rate is 0.8935.

Neural Net

Use the multinom() function from the nnet R package. This might work better with some scaling of the predictors.

svmod <- multinom(vtform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$vtype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	nv	tr
nv	507	74
tr	51	608

Same as above, but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	nv	tr
nv	91	11
tr	9	89

The overall classification rate is 0.8992.

Random Forest

Use the randomForest package with the default settings:

tallform <- as.formula(paste("vtype ~",tpredform))
fmod <- randomForest(tallform, data=trains)
pv <- predict(fmod, newdata=tests)
cm <- xtabs( ~ pv + tests$vtype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	nv	tr
nv	503	69
tr	55	613

Same as above, but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	nv	tr
nv	90	10
tr	10	90

The overall classification rate is 0.9.

Transients only

LDA

Linear Discriminant analysis using the default options.

We produce the cross-classification between predicted and observed class. Note that the default priors are the proportions found in the training set.

ldamod <- lda(trform ,data=trains)
pv <- predict(ldamod, tests)
cm <- xtabs( ~ pv$class + tests$ttype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	rrlyr	sn
agn	36	5	5	8	4	0	8
blazar	2	15	8	11	0	0	10
cv	0	8	82	29	2	0	20
downes	3	9	16	68	6	14	4
flare	1	0	2	4	11	1	1
rrlyr	0	0	1	11	0	88	0
sn	2	2	31	4	1	0	149

Same as above, but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	rrlyr	sn
agn	82	13	3	6	17	0	4
blazar	5	38	6	8	0	0	5
cv	0	21	57	21	8	0	10
downes	7	23	11	50	25	14	2
flare	2	0	1	3	46	1	1
rrlyr	0	0	1	8	0	85	0
sn	5	5	21	3	4	0	78

The overall classification rate is 0.6584.

Recursive Partitioning

roz <- rpart(trform ,data=trains)
rpart.plot(roz,type=1,extra=1)

[Figure: classification tree for transients only, drawn with rpart.plot]

pv <- predict(roz,newdata=tests,type="class")
cm <- xtabs( ~ pv + tests$ttype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	rrlyr	sn
agn	28	2	0	7	0	1	3
blazar	0	20	7	14	0	1	1
cv	1	10	86	44	0	4	14
downes	3	1	6	39	4	4	3
flare	0	0	0	3	8	2	0
rrlyr	0	0	1	10	0	86	1
sn	12	6	45	18	12	5	170

Same as above, but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	rrlyr	sn
agn	64	5	0	5	0	1	2
blazar	0	51	5	10	0	1	1
cv	2	26	59	33	0	4	7
downes	7	3	4	29	17	4	2
flare	0	0	0	2	33	2	0
rrlyr	0	0	1	7	0	83	1
sn	27	15	31	13	50	5	89

The overall classification rate is 0.6408.

Support Vector Machines

Use the default settings of ksvm() from the kernlab R package:

svmod <- ksvm(trform, data=trains)

Using automatic sigma estimation (sigest) for RBF or laplace kernel

pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$ttype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	rrlyr	sn
agn	38	2	1	12	0	0	6
blazar	0	16	4	9	0	0	0
cv	1	11	99	39	2	3	16
downes	3	3	6	63	4	6	1
flare	0	0	0	1	12	0	1
rrlyr	0	1	0	7	0	92	0
sn	2	6	35	4	6	2	168

Same as above, but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	rrlyr	sn
agn	86	5	1	9	0	0	3
blazar	0	41	3	7	0	0	0
cv	2	28	68	29	8	3	8
downes	7	8	4	47	17	6	1
flare	0	0	0	1	50	0	1
rrlyr	0	3	0	5	0	89	0
sn	5	15	24	3	25	2	88

The overall classification rate is 0.7155.

Neural Net

Use the multinom() function from the nnet R package. This might work better with some scaling of the predictors.

svmod <- multinom(trform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$ttype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	rrlyr	sn
agn	37	2	2	11	1	0	4
blazar	2	20	5	6	1	0	4
cv	1	7	81	33	1	0	21
downes	1	6	20	70	9	9	5
flare	0	0	2	3	9	1	0
rrlyr	0	1	0	8	1	93	0
sn	3	3	35	4	2	0	158

Same as above, but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	rrlyr	sn
agn	84	5	1	8	4	0	2
blazar	5	51	3	4	4	0	2
cv	2	18	56	24	4	0	11
downes	2	15	14	52	38	9	3
flare	0	0	1	2	38	1	0
rrlyr	0	3	0	6	4	90	0
sn	7	8	24	3	8	0	82

The overall classification rate is 0.6862.

Random Forest

Use the randomForest package with the default settings:

tallform <- as.formula(paste("ttype ~",tpredform))
fmod <- randomForest(tallform, data=na.omit(trains))
pv <- predict(fmod, newdata=na.omit(tests))
cm <- xtabs( ~ pv + na.omit(tests)$ttype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	rrlyr	sn
agn	35	1	1	4	0	0	3
blazar	0	18	3	10	0	0	2
cv	2	10	94	33	2	1	20
downes	4	5	20	67	4	7	4
flare	0	0	0	4	13	1	1
rrlyr	0	1	0	6	0	91	1
sn	3	4	27	11	5	3	161

Same as above, but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	rrlyr	sn
agn	80	3	1	3	0	0	2
blazar	0	46	2	7	0	0	1
cv	5	26	65	24	8	1	10
downes	9	13	14	50	17	7	2
flare	0	0	0	3	54	1	1
rrlyr	0	3	0	4	0	88	1
sn	7	10	19	8	21	3	84

The overall classification rate is 0.7023.

Hierarchical Classification

First we classify into transient and non-variable. The cases which are classified as transient are then classified into type of transient. The transient classification here is different from the one above in the data used. Above, all the data are known to be transients whereas here some cases from the non-variable set will have been classified as transient at the first stage.
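The two-stage scheme can be sketched on a small example. The code below uses the built-in iris data as a stand-in, with "setosa" playing the role of the non-variable class; the data and all object names are illustrative assumptions and not part of this analysis. Stage 1 separates "setosa" from everything else, stage 2 is refitted on the training cases that stage 1 did not call "setosa", and the two sets of predictions are then merged.

```r
library(MASS)

set.seed(1)
idx <- sample(nrow(iris), 100)   # iris is a stand-in; "setosa" plays the role of "nv"
tr <- iris[idx, ]
te <- iris[-idx, ]
preds <- "Sepal.Length + Sepal.Width + Petal.Length + Petal.Width"

# Stage 1: "setosa" versus everything else
tr$stage1 <- factor(ifelse(tr$Species == "setosa", "setosa", "other"))
m1 <- lda(as.formula(paste("stage1 ~", preds)), data = tr)
p1_te <- as.character(predict(m1, te)$class)
p1_tr <- as.character(predict(m1, tr)$class)

# Stage 2: refit on training cases that stage 1 called "other";
# true "setosa" cases are dropped, mirroring the NA coding of ttype
tr2 <- droplevels(subset(tr, p1_tr == "other" & Species != "setosa"))
m2 <- lda(as.formula(paste("Species ~", preds)), data = tr2)

# Combine: keep stage-1 "setosa" calls, fill in stage-2 calls elsewhere
pred <- p1_te
other <- pred == "other"
pred[other] <- as.character(predict(m2, te[other, ])$class)

rate <- mean(pred == as.character(te$Species))
print(table(predicted = pred, actual = as.character(te$Species)))
```

A test case wrongly passed on by stage 1 can never be recovered at stage 2, because the second model has no class for it. This is why the hierarchical tables still contain a full nv row and column.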

LDA

ldamod <- lda(vtform ,data=trains)
pv <- predict(ldamod, tests)
utests <- subset(tests, pv$class != 'nv')
pvt <- predict(ldamod, trains)
utrains <- subset(trains, pvt$class != 'nv')
ldamod <- lda(trform, data=utrains)
predc <- as.character(pv$class)
predc[predc != 'nv'] <- as.character(predict(ldamod, utests)$class)
cm <- xtabs( ~ predc + as.character(tests$type))

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	nv	rrlyr	sn
agn	31	5	4	4	2	8	0	7
blazar	2	15	6	9	0	2	0	9
cv	1	8	79	26	2	7	0	19
downes	1	9	19	36	1	12	10	1
flare	0	0	1	2	6	10	1	0
nv	8	0	13	34	13	515	1	17
rrlyr	0	0	0	20	0	2	91	0
sn	1	2	23	4	0	2	0	139

Same as above, but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	nv	rrlyr	sn
agn	70	13	3	3	8	1	0	4
blazar	5	38	4	7	0	0	0	5
cv	2	21	54	19	8	1	0	10
downes	2	23	13	27	4	2	10	1
flare	0	0	1	1	25	2	1	0
nv	18	0	9	25	54	92	1	9
rrlyr	0	0	0	15	0	0	88	0
sn	2	5	16	3	0	0	0	72

The overall classification rate is 0.7355.

RPART

roz <- rpart(vtform ,data=trains)
pv <- predict(roz, tests, type="class")
utests <- subset(tests, pv != 'nv')
pvt <- predict(roz, trains, type="class")
utrains <- subset(trains, pvt != 'nv')
roz <- rpart(trform, data=utrains)
predc <- as.character(pv)
predc[predc != 'nv'] <- as.character(predict(roz, utests, type="class"))
predc <- factor(predc, levels=sort(levels(trains$type)))
cm <- xtabs( ~ predc + as.character(tests$type))

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	nv	rrlyr	sn
agn	27	2	0	5	3	13	2	1
blazar	2	20	8	14	0	3	1	3
cv	1	10	87	42	1	1	4	10
downes	0	1	3	3	0	1	0	0
flare	0	0	0	0	0	0	0	0
nv	7	1	17	44	10	500	2	26
rrlyr	0	0	1	11	0	0	88	1
sn	7	5	29	16	10	40	6	151

Same as above, but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	nv	rrlyr	sn
agn	61	5	0	4	12	2	2	1
blazar	5	51	6	10	0	1	1	2
cv	2	26	60	31	4	0	4	5
downes	0	3	2	2	0	0	0	0
flare	0	0	0	0	0	0	0	0
nv	16	3	12	33	42	90	2	14
rrlyr	0	0	1	8	0	0	85	1
sn	16	13	20	12	42	7	6	79

The overall classification rate is 0.7065.

SVM

svmod <- ksvm(vtform ,data=trains)

Using automatic sigma estimation (sigest) for RBF or laplace kernel

pv <- predict(svmod, tests)
utests <- subset(tests, pv != 'nv')
pvt <- predict(svmod, trains)
utrains <- subset(trains, pvt != 'nv')
svmod <- ksvm(trform, data=utrains)

Using automatic sigma estimation (sigest) for RBF or laplace kernel

predc <- as.character(pv)
predc[predc != 'nv'] <- as.character(predict(svmod, utests))
predc <- factor(predc, levels=sort(levels(trains$type)))
cm <- xtabs( ~ predc + as.character(tests$type))

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	nv	rrlyr	sn
agn	33	1	0	5	0	9	0	4
blazar	0	20	4	10	0	0	0	0
cv	2	10	94	35	2	9	3	15
downes	0	2	7	39	0	6	5	1
flare	0	0	0	1	9	5	0	1
nv	8	0	12	34	10	507	2	14
rrlyr	0	2	0	7	0	1	92	0
sn	1	4	28	4	3	21	1	157

Same as above, but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	nv	rrlyr	sn
agn	75	3	0	4	0	2	0	2
blazar	0	51	3	7	0	0	0	0
cv	5	26	65	26	8	2	3	8
downes	0	5	5	29	0	1	5	1
flare	0	0	0	1	38	1	0	1
nv	18	0	8	25	42	91	2	7
rrlyr	0	5	0	5	0	0	89	0
sn	2	10	19	3	12	4	1	82

The overall classification rate is 0.7669.

NNET

svmod <- multinom(vtform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(svmod, tests)
utests <- subset(tests, pv != 'nv')
pvt <- predict(svmod, trains)
utrains <- subset(trains, pvt != 'nv')
svmod <- multinom(trform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
predc <- as.character(pv)
predc[predc != 'nv'] <- as.character(predict(svmod, utests))
predc <- factor(predc, levels=sort(levels(trains$type)))
cm <- xtabs( ~ predc + as.character(tests$type))

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	nv	rrlyr	sn
agn	32	2	1	7	0	4	0	4
blazar	2	20	5	6	1	2	0	4
cv	1	7	81	33	0	10	0	20
downes	0	6	20	52	3	19	8	3
flare	0	0	0	1	5	8	1	0
nv	7	0	13	25	14	507	1	14
rrlyr	0	1	0	8	0	0	93	0
sn	2	3	25	3	1	8	0	147

Same as above, but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	nv	rrlyr	sn
agn	73	5	1	5	0	1	0	2
blazar	5	51	3	4	4	0	0	2
cv	2	18	56	24	0	2	0	10
downes	0	15	14	39	12	3	8	2
flare	0	0	0	1	21	1	1	0
nv	16	0	9	19	58	91	1	7
rrlyr	0	3	0	6	0	0	90	0
sn	5	8	17	2	4	1	0	77

The overall classification rate is 0.7556.

Random Forest

tallform <- as.formula(paste("vtype ~",tpredform))
svmod <- randomForest(tallform, data=trains)
pv <- predict(svmod, tests)
utests <- subset(tests, pv != 'nv')
pvt <- predict(svmod, trains)
utrains <- subset(trains, pvt != 'nv')
tallform <- as.formula(paste("ttype ~",tpredform))
svmod <- randomForest(tallform, data=na.omit(trains))
predc <- as.character(pv)
predc[predc != 'nv'] <- as.character(predict(svmod, utests))
predc <- factor(predc, levels=sort(levels(trains$type)))
cm <- xtabs( ~ predc + as.character(tests$type))

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	nv	rrlyr	sn
agn	34	1	1	2	0	7	0	3
blazar	0	18	3	10	0	2	0	2
cv	2	9	82	30	2	5	2	15
downes	1	5	19	46	0	12	6	1
flare	0	0	0	2	9	7	0	1
nv	6	1	14	27	10	503	1	13
rrlyr	0	1	0	7	0	0	92	1
sn	1	4	26	11	3	22	2	156

Same as above, but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")

Actual
	agn	blazar	cv	downes	flare	nv	rrlyr	sn
agn	77	3	1	1	0	1	0	2
blazar	0	46	2	7	0	0	0	1
cv	5	23	57	22	8	1	2	8
downes	2	13	13	34	0	2	6	1
flare	0	0	0	1	38	1	0	1
nv	14	3	10	20	42	90	1	7
rrlyr	0	3	0	5	0	0	89	1
sn	2	10	18	8	12	4	2	81

The overall classification rate is 0.7581.

Summary of results

Summary of percentage classification rates across tests:

print(xtable(100*cmat,digits=2),type="html")

	LDA	RPart	SVM	NN	Forest
All	72.90	70.65	76.69	75.16	76.77
TranNoTran	89.60	86.69	89.35	89.92	90.00
Tranonly	65.84	64.08	71.55	68.62	70.23
Heirarch	73.55	70.65	76.69	75.56	75.81

cmatredGPR <- cmat
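To read the summary table at a glance, the sketch below picks out the highest-scoring classifier in each row. The percentages are copied from the table above; cmat_demo and best are illustrative names and the snippet is not part of the original analysis.

```r
# Percentages copied from the summary table (row labels as printed there)
cmat_demo <- matrix(c(
  72.90, 70.65, 76.69, 75.16, 76.77,
  89.60, 86.69, 89.35, 89.92, 90.00,
  65.84, 64.08, 71.55, 68.62, 70.23,
  73.55, 70.65, 76.69, 75.56, 75.81),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("All", "TranNoTran", "Tranonly", "Heirarch"),
                  c("LDA", "RPart", "SVM", "NN", "Forest")))

# Column name of the largest entry in each row
best <- colnames(cmat_demo)[apply(cmat_demo, 1, which.max)]
names(best) <- rownames(cmat_demo)
print(best)
```

The margins between the leading methods are small in every row, so this ranking is a convenience for reading the table rather than evidence that one method is reliably better.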

Julian Faraway
