This is the online supplement, which reproduces the plots and calculations presented in the paper, together with some additional material.

Note about R markdown

The scripts are presented in R markdown format. To run them, install the rmarkdown package. A script can then be run using render("file.Rmd") to produce the HTML output. See the package website for other options.
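
For example, from the R console:

    install.packages("rmarkdown")
    library(rmarkdown)
    render("file.Rmd")   # writes the HTML output in the working directory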

Exploratory analysis

The data can be found in the R database lcdb.rda. The accompanying R functions are also needed.
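
In a fresh R session, the setup might look like the following, where funcs.R is a hypothetical stand-in for whatever the functions file is actually called:

    load("lcdb.rda")    # the light-curve database
    source("funcs.R")   # hypothetical filename for the supporting R functions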

The exploratory data analysis producing Table 1 and Figures 1 and 2 uses this R markdown script.

Gaussian Process Regression

The Gaussian process regression fit used to generate Figure 3 is produced by this R markdown script.
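
For readers who want to see the idea outside the script, here is a minimal sketch of a GP fit with a squared exponential kernel; the variable names and hyperparameter values are placeholders, not those used in the paper.

    # Squared exponential covariance for time differences d
    sqexp <- function(d, ell = 10, sigf = 1) sigf^2 * exp(-0.5 * (d / ell)^2)
    # Posterior mean of the GP evaluated on the grid tgrid
    gpmean <- function(time, mag, tgrid, ell = 10, sigf = 1, signoise = 0.1) {
      K  <- sqexp(outer(time, time, "-"), ell, sigf)
      Ks <- sqexp(outer(tgrid, time, "-"), ell, sigf)
      Ky <- K + diag(signoise^2, length(time))
      drop(Ks %*% solve(Ky, mag - mean(mag))) + mean(mag)
    }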

Computing our derived measures

We compute the new derived measures and the Richards measures using this R markdown script, which gives this output. The computation produces an R datafile that is used in the subsequent calculations.

The Richards measures were originally calculated using a Python script. We reimplemented them in R and found some mistakes in the original implementation. Our implementation corresponds to the description in Richards et al.'s paper. Here is a list of the Richards set that we have used. Note that rcorbor was not included in the original Richards set but can be found in later iterations.
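
As an illustration of the kind of measure in this set, beyond1std (the fraction of points lying more than one standard deviation from the mean magnitude) might be computed like this; this is a sketch, not our actual implementation:

    # Fraction of observations more than one SD from the mean (sketch)
    beyond1std <- function(mag) mean(abs(mag - mean(mag)) > sd(mag))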

Classification using our measures in conjunction with Richards measures

These new measures are evaluated in conjunction with the Richards measures using this R markdown script, giving this output. Table 3 can be found at the end of this output. This requires the same supporting script for the computations and a script for the tabular displays.
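
In outline, the evaluation follows a standard train/test pattern. The sketch below uses randomForest with a hypothetical data frame measures holding the computed measures plus a type factor:

    library(randomForest)
    train <- sample(nrow(measures), floor(2 * nrow(measures) / 3))
    rf <- randomForest(type ~ ., data = measures[train, ])
    pred <- predict(rf, measures[-train, ])
    table(pred, measures$type[-train])   # test-set confusion matrix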

Variable Importance

Random Forest allows us to determine the relative importance of variables to the classification. Here is the R markdown script giving this output. The first of these plots is Figure 4 from our paper.
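
Continuing the sketch above, the importance scores and a plot in the style of Figure 4 would come from:

    imp <- importance(rf)                  # mean decrease in Gini for each measure
    imp[order(-imp[, 1]), , drop = FALSE]  # measures ranked by importance
    varImpPlot(rf)                         # importance plot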

Matern Kernel

Our calculations above use a squared exponential kernel for the GP fitting. We repeat the computation of the measures and the discrimination using a Matern kernel. Here is the R markdown script giving this output. The results are similar.
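
For reference, a Matern covariance can be written as a drop-in replacement for the squared exponential sketched earlier; the 5/2 smoothness used here is illustrative, not necessarily the choice made in the script:

    # Matern 5/2 covariance for time differences d (sketch)
    matern52 <- function(d, ell = 10, sigf = 1) {
      r <- sqrt(5) * abs(d) / ell
      sigf^2 * (1 + r + r^2 / 3) * exp(-r)
    }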

Using GP-based measures without our other suggested additional measures

We have proposed adding four kinds of new measures. Two of these are based on the GP fit: one set on the fitted values and the other on the residuals. We have also proposed some new general data-based measures, which extend the Richards set into areas where there appears to be a gap, and a group of cluster-based measures. We investigated how much value the GP-based measures add to the Richards set by repeating the discrimination tests without our other two groups of measures. Here is the R markdown script giving this output. The results show that the GP-based measures add substantial discriminatory value beyond the Richards set, but that there is some additional value in the other two groups of measures.
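
Such a comparison amounts to refitting the classifier on a subset of the columns. As a sketch, with richvars and gpvars as hypothetical vectors of column names for the Richards and GP-based measures:

    rfsub <- randomForest(type ~ ., data = measures[train, c(richvars, gpvars, "type")])
    mean(predict(rfsub, measures[-train, ]) != measures$type[-train])   # test error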

Alternative splits

In the paper, we consider only one random split of the data into 2/3 training and 1/3 testing. To verify that the conclusions of the paper are not sensitive to this particular random split, we show the results from 10 more random splits. Here is the R markdown script giving this output. The results show that the particular split is not important.
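
The repetition itself is straightforward; a sketch under the same assumptions as the earlier example:

    errs <- replicate(10, {
      tr  <- sample(nrow(measures), floor(2 * nrow(measures) / 3))
      fit <- randomForest(type ~ ., data = measures[tr, ])
      mean(predict(fit, measures[-tr, ]) != measures$type[-tr])
    })
    summary(errs)   # spread of the test error across the 10 splits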

Classification on a new test set of transients

A new set of objects was assembled from recent observations. This includes only the AGN, Blazar, CV, Flare and SNe types. These objects are found in the R database newtran13.rda.

Because this new set of objects was collected using improved methods and protocols compared to the original data, we correspondingly reassembled the objects of the five transient types; the revised data are in the R database oldtranrev.rda.

We train a classification rule on the revised version of the original data and test it on the new set of data.

Before we compute the measures, we need to look at the revised original data to check the range of dates and so on. Here is the setup and computation of the measures with R markdown code, which produces an R datafile for subsequent calculations. The R datafile for the new2013 set of measures will also be needed.

Classification rules are trained on the revised original data and tested on the new data using this R markdown script, giving this output.
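
Schematically, with oldmeas and newmeas as hypothetical data frames of the measures computed from oldtranrev.rda and newtran13.rda:

    rfold <- randomForest(type ~ ., data = oldmeas)
    table(predict(rfold, newmeas), newmeas$type)   # performance on the new transients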

Classification into variable and non-variable - a new set of 100k lightcurves

A new set of 100,000 lightcurves labelled as variable and non-variable has been assembled. Note that this labelling is not necessarily perfect. Note also that the variable set contains objects which are not transients, or which may be transients of a type not considered previously.

The data can be found in the R database.

Before we compute the measures, we need to look at the new data to check the range of dates and so on. Here is the setup and computation of the measures with R markdown code, which produces an R datafile for subsequent calculations.

We divide the data randomly, with one half used to train a classification rule and the other half used to test it. Here is the R markdown script for computing the classification, giving this output. The classification performance can be seen in summary form at the bottom of the file.
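
The split-and-test step mirrors the earlier sketches; with vmeas as a hypothetical data frame of the 100k measures and a two-level varstat factor:

    half <- sample(nrow(vmeas), floor(nrow(vmeas) / 2))
    rfv  <- randomForest(varstat ~ ., data = vmeas[half, ])
    table(predict(rfv, vmeas[-half, ]), vmeas$varstat[-half])   # test-half performance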