---
title: "Introduction to R for Python Users"
author: "Julian Faraway"
output:
  html_document:
    theme: cosmo
    toc: yes
---
```{r global_options, include=FALSE}
knitr::opts_chunk$set(cache=FALSE,comment=NA, fig.path="/tmp/Figs/", warning=FALSE, message=FALSE)
options(digits=5,show.signif.stars=FALSE)
```
## Default packages
R attaches several packages at startup. Without them, far less functionality would be available:
```{r}
sessionInfo()
```
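As a small complement to `sessionInfo()`, the sketch below lists the packages R is configured to attach at startup and the current search path. Both calls are base R; the exact output depends on your installation and on what has been loaded so far.

```{r}
# Packages attached automatically when R starts
getOption("defaultPackages")

# The search path: where R looks up names, in order
search()
```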
## Multiplying a list
You cannot multiply a list by two, because arithmetic operators are not defined for lists
```{r error=TRUE}
x = list(1,2,3)
x * 2
```
but there is no problem for a vector, where arithmetic is applied elementwise
```{r}
x = c(1,2,3)
x * 2
```
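If the data really is in a list, one option is to apply the operation to each element or to flatten the list first. This is only a sketch; the name `lst` is introduced here for illustration so that `x` above is left untouched. (In Python, by contrast, `[1, 2, 3] * 2` repeats the list rather than raising an error.)

```{r}
lst = list(1, 2, 3)

# Apply the multiplication to each element; the result is still a list
lapply(lst, function(e) e * 2)

# Or flatten to a numeric vector first, then use ordinary vector arithmetic
unlist(lst) * 2
```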
## Namespace clashes
The availability of functions depends on which packages are loaded:
```{r error=TRUE}
select()
```
`MASS` has a `select` function (used for smoothing parameter selection in ridge regression):
```{r}
library(MASS)
select
```
But `dplyr` also has a select:
```{r}
library(dplyr)
select
```
This now masks the original `select` rather than overwriting it. To get the `MASS` version back, qualify the name with its package:
```{r}
MASS::select
```
This is what Python does routinely, by qualifying names with their module, to avoid this *namespace* problem.
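To see which attached packages define a name, and which one a bare call would currently use, something like the following can help. It is a sketch using base and `utils` functions only; the output depends on what is attached in your session.

```{r}
library(MASS)  # a recommended package, shipped with standard R installations

# Every attached package that defines an object called "select"
find("select")

# The namespace that a bare `select` resolves to right now
environmentName(environment(select))
```

With both `MASS` and `dplyr` attached, `find()` reports both, and the package attached most recently is the one that wins.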
## Call by reference versus value
Set `y = x` and then modify `x`:
```{r}
y = x
x[3] = 10
x
```
This does not change `y`, since `y` holds its own copy of the values:
```{r}
y
```
Write a function to add one to the vector:
```{r}
addone = function(x) {
  x = x + 1
  x
}
addone(x)
```
This does not change `x` in the calling environment:
```{r}
x
```
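Because the function works on its own copy, the usual way to keep the result is to assign it back. A minimal sketch, using the hypothetical names `v` and `add_one` so that the objects above are not disturbed:

```{r}
v = c(1, 2, 3)

add_one = function(vec) {
  vec = vec + 1  # modifies only the function's local copy
  vec
}

add_one(v)      # the result is printed, but v itself is untouched
v
v = add_one(v)  # assign the returned value back to keep the change
v
```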
## Pipes
With a pipe, the output of the previous stage becomes the input to the next stage (as its first argument):
```{r}
x %>% mean
```
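The "first argument" rule can be checked directly by comparing a piped call with the equivalent nested call. This sketch loads `magrittr`, the package that `dplyr` re-exports `%>%` from; `vals` is a name made up for the example.

```{r}
library(magrittr)  # source of the %>% operator

vals = c(1.234, 5.678)

piped  = vals %>% round(1)  # vals is passed as the first argument of round()
nested = round(vals, 1)     # the same call written without a pipe

identical(piped, nested)
```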
A more common use of the pipe operator is with `dplyr` verbs:
```{r}
iris %>% group_by(Species) %>% summarise(mean(Sepal.Length), mean(Sepal.Width))
```
## Regression
```{r}
gala = read.table("http://people.bath.ac.uk/jjf23/data/gala.dat",header=TRUE)
head(gala)
```
Summary output is reasonably compact:
```{r}
lmod = lm(Species ~ Area + Elevation + Nearest + Scruz + Adjacent, gala)
summary(lmod)
```
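An exactly collinear predictor is dropped in a noticeable way: its coefficient is reported as `NA`, and the summary notes that it is not defined because of singularities: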
```{r}
lmod = lm(Species ~ Area + Elevation + Nearest + Scruz + Adjacent + I(Area + Adjacent), gala)
summary(lmod)
```
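The same behaviour can be detected programmatically rather than by reading the summary. The sketch below uses the built-in `mtcars` data instead of `gala` so that it runs without a network connection; the model and the name `fit` are illustrative only.

```{r}
# Add a predictor that is an exact linear combination of two others
fit = lm(mpg ~ wt + hp + I(wt + hp), data = mtcars)

coef(fit)                            # the aliased term has an NA coefficient
names(coef(fit))[is.na(coef(fit))]   # which terms were dropped
fit$rank                             # rank is below the number of coefficients
```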