``[1] "Created: Tue Jun 24 14:25:43 2014"``

We have introduced several new types of measures. Some of these are based on our GP fitting, while others are data-based statistics similar to the Richards measures. Here we repeat the calculations using only the GP-based measures.

The measures omitted here are `shov`, `maxdiff`, `dscore`, `gscore`, `lsd` and `gtvar`.

See `featlcdb.Rmd` for how the data in `feat.rda` were prepared for this analysis.

```r
opts_chunk$set(comment=NA, warning=FALSE, message=FALSE, error=FALSE)
source("funcs.R")
require(MASS)
require(nnet)
require(ggplot2)
require(rpart)
require(rpart.plot)
require(xtable)
require(kernlab)
require(randomForest)
```

Set up the predictors that we will use throughout. Note that some of the variables have been transformed, as seen in the setup.

```r
predform <- "log(totvar) + log(quadvar) + log(famp) + log(fslope) + log(outl) + rm.amplitude + rm.beyond1std + rm.fpr20 + rm.fpr35 + rm.fpr50 + rm.fpr80 + log(rm.maxslope) + rm.mad + asylog(rm.medbuf) + rm.pairslope + log(rm.peramp) + log(rm.pdfp) + rm.skew + log(rm.kurtosis) + rm.std + dublog(rm.rcorbor)"
# preds (a character vector of the predictor names) is defined in funcs.R
tpredform <- paste(preds, collapse="+")
```

### Creation of model formulae and of the training and testing subsets

Formula for classification of all types:

```r
allform <- as.formula(paste("type ~", predform))
```

Transients vs. non-variables:

```r
trains$vtype <- ifelse(trains$type == "nv", "nv", "tr")
trains$vtype <- factor(trains$vtype)
tests$vtype <- ifelse(tests$type == "nv", "nv", "tr")
tests$vtype <- factor(tests$vtype)
vtform <- as.formula(paste("vtype ~", predform))
```

Transients only:

```r
trains$ttype <- trains$type
trains$ttype[trains$ttype == "nv"] <- NA
trains$ttype <- factor(trains$ttype)
tests$ttype <- tests$type
tests$ttype[tests$ttype == "nv"] <- NA
tests$ttype <- factor(tests$ttype)
trform <- as.formula(paste("ttype ~", predform))
```
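Note that setting the `nv` entries to `NA` and then calling `factor()` drops the unused `nv` level, so the transient-only models see only the seven transient classes. A minimal base-R illustration of this behaviour:

```r
# Recoding a value to NA and refactoring drops the now-unused level
x <- c("nv", "sn", "nv", "agn")
x[x == "nv"] <- NA
f <- factor(x)
levels(f)      # "agn" "sn" -- the "nv" level is gone
sum(is.na(f))  # 2
```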
```r
# Holds the overall classification rate for each combination of
# classification task (rows) and method (columns)
cmat <- matrix(NA, nrow=4, ncol=5)
dimnames(cmat) <- list(c("All","TranNoTran","TranOnly","Hierarch"),
                       c("LDA","RPart","SVM","NN","Forest"))
```
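The overall classification rates quoted below are simply the proportion of correct predictions: the sum of the diagonal of each confusion matrix divided by its total. A small helper in that spirit (the `classrate` name is ours, not from `funcs.R`):

```r
# Overall classification rate from a confusion matrix
# (predicted type in rows, actual type in columns)
classrate <- function(cm) sum(diag(cm)) / sum(cm)

# Toy 2x2 example using the counts from the LDA transient/non-variable table
cm <- matrix(c(515, 43, 86, 596), nrow = 2)
round(classrate(cm), 3)  # 0.896
```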

## All types

### LDA

Linear discriminant analysis using the default options.

We produce the cross-classification between predicted and observed class. Note that the default priors are the proportions found in the training set.

```r
ldamod <- lda(allform, data=trains)
pv <- predict(ldamod, tests)
cm <- xtabs( ~ pv$class + tests$type)
```

This table shows the predicted type in the rows by the actual type in the columns.

```r
print(xtable(cm, digits=0, caption="Actual"), type="html", caption.placement="top")
```

| Predicted \ Actual | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| agn | 25 | 4 | 3 | 1 | 0 | 3 | 0 | 3 |
| blazar | 3 | 20 | 9 | 14 | 0 | 5 | 1 | 8 |
| cv | 0 | 4 | 79 | 32 | 2 | 7 | 1 | 15 |
| downes | 0 | 8 | 13 | 29 | 0 | 5 | 12 | 3 |
| flare | 1 | 0 | 5 | 4 | 10 | 12 | 1 | 1 |
| nv | 14 | 1 | 16 | 41 | 12 | 521 | 1 | 29 |
| rrlyr | 0 | 0 | 0 | 11 | 0 | 2 | 87 | 0 |
| sn | 1 | 2 | 20 | 3 | 0 | 3 | 0 | 133 |

Same as above but now expressed as a percentage within each column:

```r
print(xtable(round(100*prop.table(cm, 2)), digits=0, caption="Actual"), type="html", caption.placement="top")
```

| Predicted \ Actual | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| agn | 57 | 10 | 2 | 1 | 0 | 1 | 0 | 2 |
| blazar | 7 | 51 | 6 | 10 | 0 | 1 | 1 | 4 |
| cv | 0 | 10 | 54 | 24 | 8 | 1 | 1 | 8 |
| downes | 0 | 21 | 9 | 21 | 0 | 1 | 12 | 2 |
| flare | 2 | 0 | 3 | 3 | 42 | 2 | 1 | 1 |
| nv | 32 | 3 | 11 | 30 | 50 | 93 | 1 | 15 |
| rrlyr | 0 | 0 | 0 | 8 | 0 | 0 | 84 | 0 |
| sn | 2 | 5 | 14 | 2 | 0 | 1 | 0 | 69 |

The overall classification rate is 0.729.
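The default priors noted above are the training-set class proportions; if those are unrepresentative, `lda` accepts an explicit `prior` argument. A hedged sketch on the built-in `iris` data so that it runs standalone (three equal priors here stand in for our eight classes):

```r
require(MASS)
# Equal priors instead of the training-set class proportions
m <- lda(Species ~ ., data = iris, prior = rep(1/3, 3))
pv <- predict(m, iris)
mean(pv$class == iris$Species)  # training accuracy
```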

### Recursive Partitioning

```r
roz <- rpart(allform, data=trains)
rpart.plot(roz, type=1, extra=1)
```

```r
pv <- predict(roz, newdata=tests, type="class")
cm <- xtabs( ~ pv + tests$type)
```

This table shows the predicted type in the rows by the actual type in the columns.

```r
print(xtable(cm, digits=0, caption="Actual"), type="html", caption.placement="top")
```

| Predicted \ Actual | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| agn | 34 | 1 | 1 | 3 | 0 | 8 | 1 | 3 |
| blazar | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| cv | 1 | 31 | 94 | 58 | 0 | 3 | 5 | 10 |
| downes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| flare | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| nv | 8 | 1 | 17 | 45 | 14 | 513 | 2 | 29 |
| rrlyr | 0 | 0 | 1 | 10 | 0 | 0 | 86 | 1 |
| sn | 1 | 6 | 32 | 19 | 10 | 34 | 9 | 149 |

Same as above but now expressed as a percentage within each column:

```r
print(xtable(round(100*prop.table(cm, 2)), digits=0, caption="Actual"), type="html", caption.placement="top")
```

| Predicted \ Actual | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| agn | 77 | 3 | 1 | 2 | 0 | 1 | 1 | 2 |
| blazar | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| cv | 2 | 79 | 65 | 43 | 0 | 1 | 5 | 5 |
| downes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| flare | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| nv | 18 | 3 | 12 | 33 | 58 | 92 | 2 | 15 |
| rrlyr | 0 | 0 | 1 | 7 | 0 | 0 | 83 | 1 |
| sn | 2 | 15 | 22 | 14 | 42 | 6 | 9 | 78 |

The overall classification rate is 0.7065.

### Support Vector Machines

Use the default settings from the kernlab R package:

```r
svmod <- ksvm(allform, data=trains)
```

```
Using automatic sigma estimation (sigest) for RBF or laplace kernel
```

```r
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$type)
```

This table shows the predicted type in the rows by the actual type in the columns.

```r
print(xtable(cm, digits=0, caption="Actual"), type="html", caption.placement="top")
```

| Predicted \ Actual | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| agn | 27 | 2 | 1 | 7 | 0 | 3 | 0 | 3 |
| blazar | 0 | 18 | 4 | 6 | 0 | 0 | 0 | 0 |
| cv | 0 | 5 | 94 | 36 | 2 | 5 | 1 | 17 |
| downes | 1 | 4 | 8 | 35 | 0 | 4 | 7 | 1 |
| flare | 0 | 0 | 0 | 1 | 5 | 4 | 0 | 0 |
| nv | 16 | 0 | 15 | 40 | 16 | 529 | 3 | 18 |
| rrlyr | 0 | 2 | 0 | 8 | 0 | 0 | 90 | 0 |
| sn | 0 | 8 | 23 | 2 | 1 | 13 | 2 | 153 |

Same as above but now expressed as a percentage within each column:

```r
print(xtable(round(100*prop.table(cm, 2)), digits=0, caption="Actual"), type="html", caption.placement="top")
```

| Predicted \ Actual | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| agn | 61 | 5 | 1 | 5 | 0 | 1 | 0 | 2 |
| blazar | 0 | 46 | 3 | 4 | 0 | 0 | 0 | 0 |
| cv | 0 | 13 | 65 | 27 | 8 | 1 | 1 | 9 |
| downes | 2 | 10 | 6 | 26 | 0 | 1 | 7 | 1 |
| flare | 0 | 0 | 0 | 1 | 21 | 1 | 0 | 0 |
| nv | 36 | 0 | 10 | 30 | 67 | 95 | 3 | 9 |
| rrlyr | 0 | 5 | 0 | 6 | 0 | 0 | 87 | 0 |
| sn | 0 | 21 | 16 | 1 | 4 | 2 | 2 | 80 |

The overall classification rate is 0.7669.
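The kernlab defaults relied on above amount to a Gaussian (RBF) kernel with `sigma` chosen by `sigest` and cost `C = 1`. A sketch making those choices explicit, shown on the built-in `iris` data rather than our training set so it runs standalone:

```r
require(kernlab)
# Explicit form of the ksvm defaults: RBF kernel, sigma via sigest, C = 1
m <- ksvm(Species ~ ., data = iris,
          kernel = "rbfdot", kpar = "automatic", C = 1)
mean(predict(m, iris) == iris$Species)  # training accuracy
```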

### Neural Net

Use the `multinom()` function from the nnet R package. This might work better with some scaling of the predictors.

```r
nnmod <- multinom(allform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(nnmod, tests)
cm <- xtabs( ~ pv + tests$type)
```

This table shows the predicted type in the rows by the actual type in the columns.

```r
print(xtable(cm, digits=0, caption="Actual"), type="html", caption.placement="top")
```

| Predicted \ Actual | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| agn | 28 | 2 | 1 | 4 | 0 | 2 | 0 | 3 |
| blazar | 2 | 20 | 4 | 6 | 0 | 0 | 0 | 4 |
| cv | 1 | 7 | 79 | 33 | 1 | 8 | 1 | 19 |
| downes | 0 | 6 | 21 | 44 | 2 | 15 | 8 | 2 |
| flare | 0 | 0 | 0 | 1 | 4 | 5 | 0 | 1 |
| nv | 11 | 1 | 14 | 34 | 15 | 526 | 2 | 24 |
| rrlyr | 0 | 1 | 0 | 9 | 2 | 0 | 92 | 0 |
| sn | 2 | 2 | 26 | 4 | 0 | 2 | 0 | 139 |

Same as above but now expressed as a percentage within each column:

```r
print(xtable(round(100*prop.table(cm, 2)), digits=0, caption="Actual"), type="html", caption.placement="top")
```

| Predicted \ Actual | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| agn | 64 | 5 | 1 | 3 | 0 | 0 | 0 | 2 |
| blazar | 5 | 51 | 3 | 4 | 0 | 0 | 0 | 2 |
| cv | 2 | 18 | 54 | 24 | 4 | 1 | 1 | 10 |
| downes | 0 | 15 | 14 | 33 | 8 | 3 | 8 | 1 |
| flare | 0 | 0 | 0 | 1 | 17 | 1 | 0 | 1 |
| nv | 25 | 3 | 10 | 25 | 62 | 94 | 2 | 12 |
| rrlyr | 0 | 3 | 0 | 7 | 8 | 0 | 89 | 0 |
| sn | 5 | 5 | 18 | 3 | 0 | 0 | 0 | 72 |

The overall classification rate is 0.7516.
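The scaling suggested above could be done by standardizing the numeric predictors before fitting; `scale()` centres each column and divides by its standard deviation. A sketch on synthetic data (not the `trains` data from this analysis):

```r
# Sketch: centre and scale numeric predictor columns before fitting multinom.
# 'df' is a synthetic stand-in data frame with very different column scales.
set.seed(1)
df <- data.frame(a = rnorm(10, mean = 100, sd = 5),
                 b = runif(10, 0, 1000))
sdf <- as.data.frame(scale(df))  # each column now has mean 0, sd 1
colMeans(sdf)                    # both effectively 0
apply(sdf, 2, sd)                # both 1
```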

### Random Forest

Use the randomForest package with the default settings:

```r
tallform <- as.formula(paste("type ~", tpredform))
fmod <- randomForest(tallform, data=trains)
pv <- predict(fmod, newdata=tests)
cm <- xtabs( ~ pv + tests$type)
```

This table shows the predicted type in the rows by the actual type in the columns.

```r
print(xtable(cm, digits=0, caption="Actual"), type="html", caption.placement="top")
```

| Predicted \ Actual | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| agn | 31 | 1 | 0 | 3 | 0 | 5 | 0 | 2 |
| blazar | 0 | 19 | 2 | 9 | 0 | 1 | 0 | 0 |
| cv | 2 | 8 | 82 | 31 | 1 | 1 | 1 | 16 |
| downes | 0 | 5 | 19 | 44 | 0 | 8 | 6 | 2 |
| flare | 0 | 0 | 0 | 1 | 7 | 5 | 0 | 0 |
| nv | 10 | 1 | 15 | 33 | 15 | 527 | 2 | 21 |
| rrlyr | 0 | 1 | 0 | 7 | 0 | 0 | 92 | 1 |
| sn | 1 | 4 | 27 | 7 | 1 | 11 | 2 | 150 |

Same as above but now expressed as a percentage within each column:

```r
print(xtable(round(100*prop.table(cm, 2)), digits=0, caption="Actual"), type="html", caption.placement="top")
```

| Predicted \ Actual | agn | blazar | cv | downes | flare | nv | rrlyr | sn |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| agn | 70 | 3 | 0 | 2 | 0 | 1 | 0 | 1 |
| blazar | 0 | 49 | 1 | 7 | 0 | 0 | 0 | 0 |
| cv | 5 | 21 | 57 | 23 | 4 | 0 | 1 | 8 |
| downes | 0 | 13 | 13 | 33 | 0 | 1 | 6 | 1 |
| flare | 0 | 0 | 0 | 1 | 29 | 1 | 0 | 0 |
| nv | 23 | 3 | 10 | 24 | 62 | 94 | 2 | 11 |
| rrlyr | 0 | 3 | 0 | 5 | 0 | 0 | 89 | 1 |
| sn | 2 | 10 | 19 | 5 | 4 | 2 | 2 | 78 |

The overall classification rate is 0.7677.
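The randomForest defaults used above are `ntree = 500` trees and, for classification, `mtry = floor(sqrt(p))` candidate predictors per split; adding `importance = TRUE` would additionally record which measures drive the classification. A sketch with those choices made explicit, on the built-in `iris` data so it runs standalone:

```r
require(randomForest)
set.seed(1)
# Explicit defaults: ntree = 500, mtry = floor(sqrt(p)) for classification
m <- randomForest(Species ~ ., data = iris, ntree = 500, importance = TRUE)
importance(m)  # per-class and overall variable importance
```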

## Transients vs. non-variables

### LDA

Linear discriminant analysis using the default options.

We produce the cross-classification between predicted and observed class. Note that the default priors are the proportions found in the training set.

```r
ldamod <- lda(vtform, data=trains)
pv <- predict(ldamod, tests)
cm <- xtabs( ~ pv$class + tests$vtype)
```

This table shows the predicted type in the rows by the actual type in the columns.

```r
print(xtable(cm, digits=0, caption="Actual"), type="html", caption.placement="top")
```

| Predicted \ Actual | nv | tr |
|---|---:|---:|
| nv | 515 | 86 |
| tr | 43 | 596 |

Same as above but now expressed as a percentage within each column:

```r
print(xtable(round(100*prop.table(cm, 2)), digits=0, caption="Actual"), type="html", caption.placement="top")
```

| Predicted \ Actual | nv | tr |
|---|---:|---:|
| nv | 92 | 13 |
| tr | 8 | 87 |

The overall classification rate is 0.896.

### Recursive Partitioning

```r
roz <- rpart(vtform, data=trains)
rpart.plot(roz, type=1, extra=1)
```

```r
pv <- predict(roz, newdata=tests, type="class")
cm <- xtabs( ~ pv + tests$vtype)
```

This table shows the predicted type in the rows by the actual type in the columns.

```r
print(xtable(cm, digits=0, caption="Actual"), type="html", caption.placement="top")
```

| Predicted \ Actual | nv | tr |
|---|---:|---:|
| nv | 500 | 107 |
| tr | 58 | 575 |

Same as above but now expressed as a percentage within each column:

```r
print(xtable(round(100*prop.table(cm, 2)), digits=0, caption="Actual"), type="html", caption.placement="top")
```

| Predicted \ Actual | nv | tr |
|---|---:|---:|
| nv | 90 | 16 |
| tr | 10 | 84 |

The overall classification rate is 0.8669.

### Support Vector Machines

Use the default settings from the kernlab R package:

```r
svmod <- ksvm(vtform, data=trains)
```

```
Using automatic sigma estimation (sigest) for RBF or laplace kernel
```

```r
pv <- predict(svmod, tests)
cm <- xtabs( ~ pv + tests$vtype)
```

This table shows the predicted type in the rows by the actual type in the columns.

```r
print(xtable(cm, digits=0, caption="Actual"), type="html", caption.placement="top")
```

| Predicted \ Actual | nv | tr |
|---|---:|---:|
| nv | 507 | 81 |
| tr | 51 | 601 |

Same as above but now expressed as a percentage within each column:

```r
print(xtable(round(100*prop.table(cm, 2)), digits=0, caption="Actual"), type="html", caption.placement="top")
```

| Predicted \ Actual | nv | tr |
|---|---:|---:|
| nv | 91 | 12 |
| tr | 9 | 88 |

The overall classification rate is 0.8935.

### Neural Net

Use the `multinom()` function from the nnet R package. This might work better with some scaling of the predictors.

```r
nnmod <- multinom(vtform, data=trains, trace=FALSE, maxit=1000, decay=5e-4)
pv <- predict(nnmod, tests)
cm <- xtabs( ~ pv + tests$vtype)
```

This table shows the predicted type in the rows by the actual type in the columns.

```r
print(xtable(cm, digits=0, caption="Actual"), type="html", caption.placement="top")
```

| Predicted \ Actual | nv | tr |
|---|---:|---:|
| nv | 507 | 74 |
| tr | 51 | 608 |

Same as above but now expressed as a percentage within each column:

```r
print(xtable(round(100*prop.table(cm, 2)), digits=0, caption="Actual"), type="html", caption.placement="top")
```

| Predicted \ Actual | nv | tr |
|---|---:|---:|
| nv | 91 | 11 |
| tr | 9 | 89 |