R attaches several packages by default. Without these, there would be much less functionality:
sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.4.1 backports_1.1.1 magrittr_1.5 rprojroot_1.2
[5] tools_3.4.1 htmltools_0.3.6 yaml_2.1.14 Rcpp_0.12.13
[9] stringi_1.1.5 rmarkdown_1.6 knitr_1.17 stringr_1.2.0
[13] digest_0.6.12 evaluate_0.10.1
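The same information can be queried directly with base functions. In this sketch, search() shows the search path, .packages() returns the attached packages, and the defaultPackages option lists what R attaches at startup. Exact output depends on the session.

```r
# search() lists the search path: the global environment, each attached
# package as "package:<name>", then Autoloads and package:base
print(search())

# .packages() returns the names of attached packages (invisibly, so print it)
attached <- .packages()
print(attached)

# Packages other than base that R attaches at startup; this default can be
# changed with the R_DEFAULT_PACKAGES environment variable
print(getOption("defaultPackages"))
```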
You cannot multiply a list by two:
x = list(1,2,3)
x * 2
Error in x * 2: non-numeric argument to binary operator
but multiplying a vector works fine:
x = c(1,2,3)
x * 2
[1] 2 4 6
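If the data arrive as a list, there are two common ways to make the arithmetic work. You can apply the operation element by element with lapply(), or you can flatten the list with unlist() and use ordinary vector arithmetic. The variable names below are illustrative.

```r
x <- list(1, 2, 3)

# Option 1: apply the operation to each element; the result is still a list
doubled_list <- lapply(x, function(v) v * 2)

# Option 2: flatten to an atomic numeric vector, then use vectorised arithmetic
doubled_vec <- unlist(x) * 2
print(doubled_vec)
```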
Which functions are available depends on which packages are loaded:
select()
Error in select(): could not find function "select"
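You can check whether a name is currently visible before calling it. This sketch uses exists(), and uses environmentName() to report which package namespace a function comes from. The expected values in the comments assume a fresh session with only the default packages attached.

```r
# exists() looks the name up from the global environment through every
# attached package on the search path
has_select <- exists("select")  # expected FALSE with only the default packages
has_median <- exists("median")  # TRUE: median() comes from the attached stats package
print(c(select = has_select, median = has_median))

# environment(f) is the namespace a function was defined in
print(environmentName(environment(median)))  # "stats"
```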
MASS has a select function (used for smoothing-parameter selection in ridge regression):
library(MASS)
select
function (obj)
UseMethod("select")
<bytecode: 0x7fc0be4d2510>
<environment: namespace:MASS>
But dplyr also has a select:
library(dplyr)
select
function (.data, ...)
{
UseMethod("select")
}
<environment: namespace:dplyr>
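Both select functions printed above are S3 generics: their bodies just call UseMethod("select"), which dispatches on the class of the first argument. Here is a minimal sketch of the same mechanism using a hypothetical generic called describe (not part of any package).

```r
# A hypothetical S3 generic built the same way as the select() functions above
describe <- function(obj, ...) UseMethod("describe")

# Methods are ordinary functions named <generic>.<class>
describe.default <- function(obj, ...) "some object"
describe.data.frame <- function(obj, ...) paste("a data frame with", nrow(obj), "rows")

print(describe(1:3))   # no integer method, so describe.default is used
print(describe(iris))  # iris is a data.frame, so describe.data.frame is used
```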
Attaching dplyr masks the MASS version of select: it is not deleted, just found later on the search path. To get it back, qualify the name with its package:
MASS::select
function (obj)
UseMethod("select")
<bytecode: 0x7fc0be4d2510>
<environment: namespace:MASS>
Qualifying names with their package like this is what Python does all the time (module.function) to avoid this namespace problem.
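Masking happens whenever two places on the search path define the same name. The sketch below reproduces it with base R alone, using a hypothetical mean() that masks the real one. It then shows the Python-like habit of loading a namespace without attaching it, which only runs if MASS is installed.

```r
# A hypothetical function that masks base::mean, because the global
# environment is searched before package:base
mean <- function(x, ...) "not the real mean"

masked_result <- mean(c(1, 2, 10))        # finds the global-environment version
real_result   <- base::mean(c(1, 2, 10))  # `::` goes straight to the base namespace
print(masked_result)
print(real_result)

# find() lists every place on the search path where the name is defined
print(find("mean"))  # expected: ".GlobalEnv" "package:base"

# Removing the masking copy restores the normal lookup
rm(mean)

# Python-like habit: load a namespace without attaching it, then always use ::
if (requireNamespace("MASS", quietly = TRUE)) {
  print(MASS::select)
}
```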
Set y = x and then edit x:
y = x
x[3] = 10
x
[1] 1 2 10
This does not change y:
y
[1] 1 2 3
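An important exception to this copy behaviour is environments. Assigning an environment copies a reference, so both names refer to the same mutable object. The names e1 and e2 below are illustrative.

```r
# Ordinary vectors: y keeps its own value after x is modified
x <- c(1, 2, 3)
y <- x
x[3] <- 10

# Environments: e2 and e1 refer to the same object
e1 <- new.env()
e1$val <- 1
e2 <- e1
e1$val <- 99

print(y)       # unchanged: 1 2 3
print(e2$val)  # 99, because e2 and e1 are the same environment
```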
Write a function to add one to the vector:
addone = function(x){
  x = x + 1
  x
}
addone(x)
[1] 2 3 11
This does not change x:
x
[1] 1 2 10
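Because the function works on its own copy, the usual way to keep the change is to assign the returned value back. This is a minimal sketch reusing addone from above, written with the result as the last expression.

```r
addone <- function(x) {
  x + 1
}

x <- c(1, 2, 10)
addone(x)       # returns a new vector; x itself is untouched
x <- addone(x)  # to keep the change, assign the result back to x
print(x)
```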
The pipe operator %>% (from the magrittr package, re-exported by dplyr) makes the output of each stage the input to the next stage, passed as its first argument:
x %>% mean
[1] 4.3333
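The first-argument rule means a pipeline is just a rewritten nested call. This sketch assumes R 4.1 or later, which includes the base pipe |>. Unlike %>%, the base pipe requires explicit parentheses on the right-hand side.

```r
# Requires R >= 4.1 for the base pipe |>
x <- c(1, 2, 10)

piped  <- x |> sqrt() |> sum()  # same as sum(sqrt(x))
nested <- sum(sqrt(x))
print(identical(piped, nested))  # TRUE

# Further arguments are supplied as usual after the piped-in first argument
rounded <- x |> mean() |> round(digits = 2)
print(rounded)
```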
iris %>% group_by(Species) %>% summarise(mean(Sepal.Length), mean(Sepal.Width))
# A tibble: 3 x 3
Species `mean(Sepal.Length)` `mean(Sepal.Width)`
<fctr> <dbl> <dbl>
1 setosa 5.006 3.428
2 versicolor 5.936 2.770
3 virginica 6.588 2.974
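For comparison, the same grouped means can be computed without dplyr using base R's aggregate(). In the formula, the summarised columns go on the left, the grouping variable goes on the right, and FUN is applied within each group.

```r
# Base-R counterpart of group_by(Species) %>% summarise(mean(...), mean(...))
res <- aggregate(cbind(Sepal.Length, Sepal.Width) ~ Species, data = iris, FUN = mean)
print(res)
```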
gala = read.table("http://people.bath.ac.uk/jjf23/data/gala.dat",header=TRUE)
head(gala)
Species Endemics Area Elevation Nearest Scruz Adjacent
Baltra 58 23 25.09 346 0.6 0.6 1.84
Bartolome 31 21 1.24 109 0.6 26.3 572.33
Caldwell 3 3 0.21 114 2.8 58.7 0.78
Champion 25 9 0.10 46 1.9 47.4 0.18
Coamano 2 1 0.05 77 1.9 1.9 903.82
Daphne.Major 18 11 0.34 119 8.0 8.0 1.84
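Judging from the output, the header line of gala.dat has one fewer field than the data rows. When that happens, read.table() uses the first column as row names, which is why the island names appear as row labels. The sketch below reproduces this offline with read.table(text = ...), using the first two rows printed above.

```r
# Header has 7 names, data rows have 8 fields, so column 1 becomes row names
txt <- "Species Endemics Area Elevation Nearest Scruz Adjacent
Baltra 58 23 25.09 346 0.6 0.6 1.84
Bartolome 31 21 1.24 109 0.6 26.3 572.33"

small <- read.table(text = txt, header = TRUE)
print(rownames(small))  # "Baltra" "Bartolome"
print(ncol(small))      # 7
```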
Summary output is reasonably compact:
lmod = lm(Species ~ Area + Elevation + Nearest + Scruz + Adjacent, gala)
summary(lmod)
Call:
lm(formula = Species ~ Area + Elevation + Nearest + Scruz + Adjacent,
data = gala)
Residuals:
Min 1Q Median 3Q Max
-111.68 -34.90 -7.86 33.46 182.58
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.06822 19.15420 0.37 0.7154
Area -0.02394 0.02242 -1.07 0.2963
Elevation 0.31946 0.05366 5.95 3.8e-06
Nearest 0.00914 1.05414 0.01 0.9932
Scruz -0.24052 0.21540 -1.12 0.2752
Adjacent -0.07480 0.01770 -4.23 0.0003
Residual standard error: 61 on 24 degrees of freedom
Multiple R-squared: 0.766, Adjusted R-squared: 0.717
F-statistic: 15.7 on 5 and 24 DF, p-value: 6.84e-07
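The estimates are least-squares coefficients, and the residual standard error is sqrt(RSS / (n - p)). For gala, that means 30 islands minus 6 coefficients, giving the 24 degrees of freedom shown. The sketch below checks both quantities by hand. It uses the built-in trees data as an offline stand-in for gala, and solves the normal equations for illustration only; lm() itself uses a QR decomposition.

```r
# Built-in trees data as an offline stand-in for gala
fit <- lm(Volume ~ Girth + Height, data = trees)

X <- model.matrix(fit)  # design matrix, including the intercept column
y <- trees$Volume

# Least-squares estimates from the normal equations (X'X) b = X'y
beta_hat <- drop(solve(crossprod(X), crossprod(X, y)))

# Residual standard error: sqrt(RSS / (n - p))
rse <- sqrt(sum(residuals(fit)^2) / (nrow(X) - ncol(X)))

print(cbind(lm = coef(fit), normal_equations = beta_hat))
print(c(manual = rse, summary = summary(fit)$sigma))
```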
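Adding a predictor that is an exact linear combination of others, I(Area + Adjacent), does not fail silently. R drops it, reports its coefficient as NA, and flags the singularity: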
lmod = lm(Species ~ Area + Elevation + Nearest + Scruz + Adjacent + I(Area + Adjacent), gala)
summary(lmod)
Call:
lm(formula = Species ~ Area + Elevation + Nearest + Scruz + Adjacent +
I(Area + Adjacent), data = gala)
Residuals:
Min 1Q Median 3Q Max
-111.68 -34.90 -7.86 33.46 182.58
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.06822 19.15420 0.37 0.7154
Area -0.02394 0.02242 -1.07 0.2963
Elevation 0.31946 0.05366 5.95 3.8e-06
Nearest 0.00914 1.05414 0.01 0.9932
Scruz -0.24052 0.21540 -1.12 0.2752
Adjacent -0.07480 0.01770 -4.23 0.0003
I(Area + Adjacent) NA NA NA NA
Residual standard error: 61 on 24 degrees of freedom
Multiple R-squared: 0.766, Adjusted R-squared: 0.717
F-statistic: 15.7 on 5 and 24 DF, p-value: 6.84e-07
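The rank deficiency can also be inspected programmatically. This sketch again uses the built-in trees data as a stand-in for gala: fit$rank gives the number of estimable coefficients, and alias() shows how the dropped term depends on the others. The fitted values are unaffected by the redundant column.

```r
# I(Girth + Height) is an exact linear combination of two existing predictors
fit2 <- lm(Volume ~ Girth + Height + I(Girth + Height), data = trees)

print(coef(fit2))                # last coefficient is NA
print(fit2$rank)                 # 3 estimable coefficients ...
print(ncol(model.matrix(fit2)))  # ... out of 4 columns

# alias() reports the linear dependency behind the NA
print(alias(fit2))

# The fit itself is the same as without the redundant column
fit1 <- lm(Volume ~ Girth + Height, data = trees)
print(all.equal(fitted(fit1), fitted(fit2)))  # TRUE
```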