---
title: Fitting GPR and generation of the measures
author: Julian Faraway
output:
html_document:
toc: true
theme: cosmo
---
```{r global_options, include=FALSE}
library(knitr)
opts_chunk$set(comment=NA, fig.path='/tmp/Figs/', warning=FALSE, message=FALSE)
```
```{r echo=FALSE}
paste("Created:",date())
```
```{r}
load("lcdb.rda")
source("funcs.R")
GPR <- TRUE
```
Set the kernel type (the squared exponential, labelled `"exponential"` in the code) and set the inverse width:
```{r}
kerntype <- "exponential"
wvec <- 2e-4
```
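The kernel itself is defined in `funcs.R`; as a point of reference, a squared-exponential kernel with inverse (squared) width `w` — the role `wvec` plays above, on our reading — can be sketched as:

```r
# Hedged sketch (not the implementation in funcs.R): the squared-exponential
# kernel k(t, t') = exp(-w * (t - t')^2), with w the inverse (squared) width.
sqexp_kernel <- function(t1, t2, w = 2e-4) {
  d <- outer(t1, t2, "-")   # matrix of pairwise time differences
  exp(-w * d^2)             # covariance matrix
}
# Nearby times are strongly correlated; distant times much less so:
K <- sqexp_kernel(c(0, 1, 100), c(0, 1, 100))
round(K, 3)
```

A small `w` (such as `2e-4`) gives a wide kernel, so the fitted curve varies slowly over the time scale of the observations.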
## Definition of measures
Measures can be classified into groups.

1. **Whole curve measures**
    - *meds* median magnitude (mag)
    - *iqr* interquartile range of mag
    - *shov* mean of absolute differences of successive observed magnitudes
    - *maxdiff* maximum absolute difference between magnitudes
    - *dscore* mean value of phi((mag - median)/sderr)
    - all the *Richards measures*
2. **Fitted curve measures** (smooth the data first, then compute)
    - *totvar* total variation
    - *quadvar* quadratic variation
    - *famp* range of the fitted curve
    - *fslope* maximum derivative of the fitted curve
    - *trend* linear trend of the fitted curve
3. **Residual-from-fit measures**
    - *outl* maximum studentized residual from the smoothed fit
    - *std* SD of the residuals
    - *skewres* skewness of the residuals
    - *shapwilk* Shapiro-Wilk statistic of the residuals (a test for normality)
4. **Cluster measures** (based on groups of up to 4 measurements taken within 30 minutes)
    - *lsd* fit the means within the groups (of up to 4 measurements) and take the logged SD of the residuals from this fit
    - *gscore* mean value of phi((mag - mean(mag))/sd), computed within clusters
    - *mdev* maximum absolute residual from this fit
    - *gtvar* total variation of the curve based on group means, scaled by the range of observation
5. **Other** (cannot be used for discriminant purposes)
    - *wander* mean movement of the object in (ra, dec) space
    - *moveloc* range of motion of the object in (ra, dec) space
    - *nobs* number of measurements in the group
We have not used all of these measures for classification. Some, such as *nobs*, were excluded because they would distort classification performance, as explained in the article. Others were failed experiments: they did not incrementally improve classification performance. These are not reported in the article but are kept here for future reference.
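To make the definitions concrete, here is a hedged sketch of a few of the whole-curve measures on a toy magnitude vector (the authoritative definitions are in `funcs.R`; `sderr` and the successive-difference reading of *maxdiff* are illustrative assumptions):

```r
mag   <- c(17.2, 17.5, 17.1, 17.8, 17.3)  # toy magnitudes
sderr <- 0.1                              # toy measurement-error SD

meds    <- median(mag)                     # *meds*
iqr     <- IQR(mag)                        # *iqr*
shov    <- mean(abs(diff(mag)))            # *shov*: successive absolute differences
maxdiff <- max(abs(diff(mag)))             # *maxdiff* (one plausible reading)
dscore  <- mean(pnorm((mag - meds)/sderr)) # *dscore*, taking phi as the normal CDF
c(meds = meds, iqr = iqr, shov = shov, maxdiff = maxdiff, dscore = dscore)
```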
```{r}
firstdate <- 53464        # date of the first observation
daterange <- 2764         # span of the observation dates (days)
detection.limit <- 20.5   # detection limit (mag)
```
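These constants presumably locate the observations in time; for example, an observation date can be rescaled onto [0, 1] over the survey span (a hedged sketch — the actual use is in `meascomp.Rmd`):

```r
# Hedged sketch: rescale a date onto [0, 1] relative to the survey span,
# using the firstdate and daterange constants set above.
scale_date <- function(date, first = 53464, range = 2764) {
  (date - first) / range
}
scale_date(c(53464, 54846, 56228))  # start, midpoint, end of span
```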
```{r child="meascomp.Rmd",eval=TRUE}
```
## Numerical summary of the measures
```{r}
summary(cmdb)
```
## Plots of the measures
```{r}
mnames <- names(cmdb)
for(i in 1:(ncol(cmdb)-2)){
  vname <- mnames[i]
  tranf <- functran[[match(vname, names(functran))]]  # transformation name for this measure
  y <- sapply(cmdb[,i], tranf)                        # apply the transformation
  ylab <- ifelse(tranf == "identity", vname, paste0(tranf, "(", vname, ")"))
  plot(y ~ cmdb$type, ylab = ylab, xlab = "Type")     # plot the transformed values by type
}
```
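The plotting loop assumes `functran` (defined in `funcs.R`) is a named list mapping each measure to the name of its transformation; a hedged sketch of the structure it expects (the entries here are illustrative):

```r
# Hedged sketch of functran's shape: each entry is a function name, and
# "identity" entries are plotted untransformed. sapply accepts the name
# as a character string (it is resolved via match.fun).
functran <- list(meds = "identity", totvar = "log", std = "log")
sapply(c(1, exp(1)), functran[["totvar"]])
```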
## Split data into training and test
Split the data into a training (2/3) and a test (1/3) sample in the same way as before. The split will not be identical to the one used for the Richards measures, because not all of the Richards statistics can be computed on the NV group, which results in some objects being discarded from that calculation.
```{r}
set.seed(123)
n <- nrow(cmdb)
isel <- sample(1:n,round(n/3))
trains <- cmdb[-isel,]
tests <- cmdb[isel,]
```
There are `r round(n/3)` observations in the test set and `r (n-round(n/3))` observations in the training set.
Save for future use.
```{r}
save(cmdb,trains,tests,file="feat.rda")
```