[1] "Created: Wed Sep  3 16:21:25 2014"

The purpose of this analysis is to establish a performance baseline for our measures. I have deliberately chosen the default versions of five different classification methods:

  1. LDA
  2. Recursive Partitioning
  3. SVM
  4. NNET
  5. Random Forest

It is quite possible that better results could be obtained by tuning these methods or by using different classification methods altogether. That is worth trying, but my main objective is the generation of new features which I hope will improve on the Richards features. I need to use the same classification methods for both sets of features in order to make a fair comparison.

See featlcdb.Rmd for how the data feat.rda was prepared for this analysis.

load("feat.rda")
source("funcs.R")
require(MASS)
require(nnet)
require(ggplot2)
require(rpart)
require(rpart.plot)
require(xtable)
require(kernlab)
require(randomForest)

Richards set predictors

Set up the predictors that we will use throughout. Note that some of the variables are transformed, as seen in the formula below.

predform <- "rm.amplitude  + rm.beyond1std + rm.fpr20 +rm.fpr35 + rm.fpr50 + rm.fpr80 + log(rm.maxslope) + rm.mad +asylog(rm.medbuf) + rm.pairslope + log(rm.peramp) + log(rm.pdfp) + rm.skew + log(rm.kurtosis)+ rm.std + dublog(rm.rcorbor)"
preds <- c("rm.amplitude","rm.beyond1std","rm.fpr20","rm.fpr35","rm.fpr50","rm.fpr80","rm.maxslope","rm.mad","rm.medbuf","rm.pairslope","rm.peramp","rm.pdfp","rm.skew","rm.kurtosis","rm.std","rm.rcorbor")
tpredform <- paste(preds,collapse="+")
trform <- as.formula(paste("type ~",predform))
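
As a quick sanity check (not part of the original analysis), the transformed formula can be compared against the raw predictor list; asylog() and dublog() are custom transforms, presumably defined in funcs.R, and are not reproduced here.

# Illustrative check: the transformed formula should reference exactly the same variables as preds
setequal(all.vars(as.formula(paste("~", predform))), preds)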

Creation of model formulae and subsetting of the training and testing sets

Formula for classification of all types:

allform <- as.formula(paste("type ~",predform))

Transients vs. non-variables:

trains$vtype <- ifelse(trains$type == "nv","nv","tr")
trains$vtype <- factor(trains$vtype)
tests$vtype <- ifelse(tests$type == "nv","nv","tr")
tests$vtype <- factor(tests$vtype)
vtform <- as.formula(paste("vtype ~",predform))
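
A quick look at the resulting two-class balance can be useful before fitting; purely illustrative, output not shown:

# Class balance of the transient/non-variable response
table(trains$vtype)
table(tests$vtype)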

Transients only:

trains$ttype <- trains$type
trains$ttype[trains$ttype == "nv"] <- NA
trains$ttype <- factor(trains$ttype)
tests$ttype <- tests$type
tests$ttype[tests$ttype == "nv"] <- NA
tests$ttype <- factor(tests$ttype)
trform <- as.formula(paste("ttype ~",predform))
cmat <- matrix(NA,nrow=4,ncol=5)
dimnames(cmat) <- list(c("All","TranNoTran","Tranonly","Hierarch"),c("LDA","RPart","SVM","NN","Forest"))

All types

LDA

Linear discriminant analysis using the default options.

We produce the cross-classification between predicted and observed class. Note that the default priors are the proportions found in the training set.

ldamod <- lda(allform, data=trains)
pv <- predict(ldamod, tests)
cm <- xtabs( ~ pv$class + tests$type)
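
Since the default priors are the training-set proportions, a fit with equal priors could be obtained by passing the prior argument to lda(); a sketch, not used in this baseline (the object name is illustrative):

# Illustrative only: refit with equal priors across the eight classes
ldaeq <- lda(allform, data=trains, prior=rep(1/nlevels(trains$type), nlevels(trains$type)))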

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 2 0 0 0 0 1 0 0
blazar 9 19 11 15 0 9 0 7
cv 0 4 81 34 2 10 1 22
downes 0 1 7 12 1 5 9 4
flare 1 0 7 8 9 10 1 1
nv 32 15 25 66 11 489 65 89
rrlyr 0 0 3 0 0 18 26 4
sn 0 0 11 0 1 16 1 65

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 5 0 0 0 0 0 0 0
blazar 20 49 8 11 0 2 0 4
cv 0 10 56 25 8 2 1 11
downes 0 3 5 9 4 1 9 2
flare 2 0 5 6 38 2 1 1
nv 73 38 17 49 46 88 63 46
rrlyr 0 0 2 0 0 3 25 2
sn 0 0 8 0 4 3 1 34

The overall classification rate is 0.5669.
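
The rate quoted here is presumably the proportion of correctly classified test cases, i.e. the diagonal of the confusion matrix over its total, stored in cmat with one row per task and one column per classifier. A sketch of the bookkeeping (the exact code is not shown in this document):

# Overall classification rate = correct predictions / total predictions (illustrative)
cmat["All","LDA"] <- sum(diag(cm))/sum(cm)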

Recursive Partitioning

roz <- rpart(allform, data=trains)
rpart.plot(roz,type=1,extra=1)

[Figure: classification tree for the all-types model, drawn with rpart.plot]
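
One quick diagnostic for the default tree is its complexity-parameter table, which shows how the cross-validated error changes with tree size; illustrative only, not part of the baseline:

# Complexity-parameter table of the fitted tree (illustrative)
printcp(roz)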

pv <- predict(roz,newdata=tests,type="class")
cm <- xtabs( ~ pv + tests$type)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 0 0 0 0 0 0 0 0
blazar 0 0 0 0 0 0 0 0
cv 5 26 95 53 4 20 3 27
downes 0 0 0 0 0 0 0 0
flare 0 0 0 0 0 0 0 0
nv 31 7 26 73 16 497 47 80
rrlyr 0 1 2 3 0 12 53 3
sn 8 5 22 6 4 29 0 82

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 0 0 0 0 0 0 0 0
blazar 0 0 0 0 0 0 0 0
cv 11 67 66 39 17 4 3 14
downes 0 0 0 0 0 0 0 0
flare 0 0 0 0 0 0 0 0
nv 70 18 18 54 67 89 46 42
rrlyr 0 3 1 2 0 2 51 2
sn 18 13 15 4 17 5 0 43

The overall classification rate is 0.5863.

Support Vector Machines

Use the default choice of settings from the kernlab R package for this:

svmod <- ksvm(allform, data=trains)
Using automatic sigma estimation (sigest) for RBF or laplace kernel 
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$type)
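
The automatic sigma estimation above chooses the RBF kernel width from the data; if explicit control were wanted, ksvm() also accepts the kernel parameters and cost directly. A sketch with purely illustrative values:

# Illustrative only: explicit RBF width and cost instead of the defaults
svmod2 <- ksvm(allform, data=trains, kernel="rbfdot", kpar=list(sigma=0.05), C=10)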

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 11 2 1 5 0 5 2 3
blazar 0 14 4 8 0 2 0 1
cv 1 4 95 37 3 11 2 23
downes 0 4 6 10 0 3 4 0
flare 0 0 0 0 3 4 0 0
nv 31 9 19 66 17 508 18 61
rrlyr 0 1 0 4 0 8 77 1
sn 1 5 20 5 1 17 0 103

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 25 5 1 4 0 1 2 2
blazar 0 36 3 6 0 0 0 1
cv 2 10 66 27 12 2 2 12
downes 0 10 4 7 0 1 4 0
flare 0 0 0 0 12 1 0 0
nv 70 23 13 49 71 91 17 32
rrlyr 0 3 0 3 0 1 75 1
sn 2 13 14 4 4 3 0 54

The overall classification rate is 0.6621.

Neural Net

Use the multinom() function from the nnet R package. It might work better with some scaling of the predictors (see the sketch below).

nnmod <- multinom(allform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(nnmod, tests)
cm <- xtabs( ~ pv + tests$type)
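
One simple version of the scaling idea mentioned above is to centre and scale the raw predictors (using the training means and standard deviations for both sets) and refit on the untransformed formula. This is a sketch only, with illustrative object names, and it drops the log transforms used elsewhere:

# Illustrative only: refit the multinomial model on centred and scaled raw predictors
sc <- scale(trains[, preds])                  # training means and standard deviations
strains <- trains; strains[, preds] <- sc
stests <- tests
stests[, preds] <- scale(tests[, preds], center=attr(sc, "scaled:center"), scale=attr(sc, "scaled:scale"))
rawform <- as.formula(paste("type ~", tpredform))
nnsc <- multinom(rawform, data=strains, trace=FALSE, maxit=1000, decay=5e-4)
pvsc <- predict(nnsc, stests)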

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 13 2 0 7 0 5 0 2
blazar 0 12 4 7 0 1 0 2
cv 1 10 82 39 1 10 2 18
downes 1 1 16 11 1 4 3 8
flare 0 0 0 1 4 4 0 0
nv 28 12 29 56 12 508 10 95
rrlyr 1 1 0 14 4 5 88 0
sn 0 1 14 0 2 21 0 67

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 30 5 0 5 0 1 0 1
blazar 0 31 3 5 0 0 0 1
cv 2 26 57 29 4 2 2 9
downes 2 3 11 8 4 1 3 4
flare 0 0 0 1 17 1 0 0
nv 64 31 20 41 50 91 10 49
rrlyr 2 3 0 10 17 1 85 0
sn 0 3 10 0 8 4 0 35

The overall classification rate is 0.6331.

Random Forest

Use the randomForest package with the default settings:

tallform <- as.formula(paste("type ~",tpredform))
fmod <- randomForest(tallform, data=trains)
pv <- predict(fmod, newdata=tests)
cm <- xtabs( ~ pv + tests$type)
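
Random forests also report variable importance, which could be examined to see which of the Richards features carry most of the signal; illustrative only, not part of the baseline:

# Variable importance from the fitted forest (illustrative)
importance(fmod)
varImpPlot(fmod)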

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 21 1 0 5 0 8 0 2
blazar 4 16 3 7 1 5 0 4
cv 2 7 82 35 3 11 1 17
downes 0 1 15 16 0 8 6 2
flare 0 0 0 2 9 3 0 1
nv 15 7 18 61 8 496 17 52
rrlyr 0 1 1 5 0 7 78 1
sn 2 6 26 4 3 20 1 113

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
agn blazar cv downes flare nv rrlyr sn
agn 48 3 0 4 0 1 0 1
blazar 9 41 2 5 4 1 0 2
cv 5 18 57 26 12 2 1 9
downes 0 3 10 12 0 1 6 1
flare 0 0 0 1 38 1 0 1
nv 34 18 12 45 33 89 17 27
rrlyr 0 3 1 4 0 1 76 1
sn 5 15 18 3 12 4 1 59

The overall classification rate is 0.6702.

Transients vs. non-variables

LDA

Linear discriminant analysis using the default options.

We produce the cross-classification between predicted and observed class. Note that the default priors are the proportions found in the training set.

ldamod <- lda(vtform, data=trains)
pv <- predict(ldamod, tests)
cm <- xtabs( ~ pv$class + tests$vtype)

This table shows the predicted type in the rows by the actual type in the columns.

print(xtable(cm,digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 414 171
tr 144 511

Same as above but now expressed as a percentage within each column:

print(xtable(round(100*prop.table(cm, 2)),digits=0,caption="Actual"),type="html",caption.placement="top")
Actual
nv tr
nv 74 25
tr 26 75

The overall classification rate is 0.746.

Recursive Partitioning

roz <- rpart(vtform, data=trains)
rpart.plot(roz,type=1,extra=1)