# Python for R users

Some things do not work as expected

In [None]:
2+2

In [None]:
try:
    log(1)
except NameError:
    # log is not a built-in function; it lives in modules such as math and numpy
    print("Did not work")

Make a list

In [None]:
z = [1, 2, 3]
z

In [None]:
type(z)

Multiplying a list by two repeats the list rather than doubling each element

In [None]:
z * 2
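
A minimal sketch of the difference, using only built-in lists. The variable names `repeated` and `doubled` are just for illustration.

```python
z = [1, 2, 3]

# The * operator on a list repeats it, much like rep(z, 2) in R
repeated = z * 2

# Element-wise doubling of a plain list needs an explicit comprehension
doubled = [2 * value for value in z]

print(repeated)  # [1, 2, 3, 1, 2, 3]
print(doubled)   # [2, 4, 6]
```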

## Numpy

[Numpy package](http://www.numpy.org) - fundamental numerical computing package.

Usually want to import this package:

In [None]:
import numpy
numpy.log(1)

Can shorten in a standard way

In [None]:
import numpy as np
np.log(1)

Multiplication now works as expected

In [None]:
x = np.array(z)
x * 2

In [None]:
type(x)

Can get the previous behaviour if desired

In [None]:
np.tile(x,2)

Can get help with `?`. The difficulty is knowing the name of the function that does what you want.

In [None]:
?np.tile

Indices start from zero. The end index of a slice is excluded, so `x[0:2]` returns the first two elements.

In [None]:
x[1]

In [None]:
x[0:2]
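
A short sketch of how some common R indexing idioms translate. The array values are made up for illustration.

```python
import numpy as np

x = np.array([10, 20, 30, 40, 50])

first = x[0]    # R: x[1]
head = x[0:2]   # R: x[1:2]; the element at index 2 is excluded
last = x[-1]    # negative indices count from the end (in R, x[-1] drops an element)
tail = x[-2:]   # the last two elements

print(first, head, last, tail)
```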

Here y and x refer to the same array. Changing the array changes both x and y.

In [None]:
y = x
x[2] = 10
x

In [None]:
y

Can test if x and y refer to the same thing

In [None]:
x is y

Need to create a copy if you want the R-like behaviour

In [None]:
x = np.arange(0,3)
x

In [None]:
y = np.copy(x)
x[2] = 10
x

In [None]:
y
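
The same issue arises with slices: basic slicing of a NumPy array returns a view onto the original data, not a copy. A small sketch, with illustrative variable names:

```python
import numpy as np

x = np.arange(5)
view = x[1:4]            # basic slicing returns a view onto x's data
snapshot = x[1:4].copy() # an explicit, independent copy

x[2] = 99

# The view sees the change, the copy does not
print(view)      # [ 1 99  3]
print(snapshot)  # [1 2 3]
print(np.shares_memory(x, view), np.shares_memory(x, snapshot))  # True False
```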

Some data types, such as numbers and strings, are immutable (meaning they can't be changed in place). Assigning a new value to `a` rebinds the name to a different object and leaves `b` referring to the original. Hence we see the following:

In [None]:
a = 1
b = a
a = 2
b
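
The `is` operator makes the rebinding visible. A minimal sketch; `same_before` and `same_after` are illustrative names.

```python
a = 1
b = a
same_before = a is b   # both names refer to the same int object

a = 2                  # rebinds the name a; the object 1 is untouched
same_after = a is b

print(same_before, same_after, b)  # True False 1
```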

Arguments are passed as references to objects, so a function that modifies a mutable object in place also changes it for the caller

In [None]:
def addone(a):
    a += 1
    return a

In [None]:
x = np.array([1, 2])
addone(x)

In [None]:
x
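
To get the R-like behaviour, where the argument is left untouched, the function can build a new array instead of updating in place. A sketch with two hypothetical function names, `addone_inplace` and `addone_copy`:

```python
import numpy as np

def addone_inplace(a):
    a += 1          # in-place update: modifies the caller's array
    return a

def addone_copy(a):
    return a + 1    # builds a new array; the argument is left alone

x = np.array([1, 2])

y = addone_copy(x)
x_after_copy = x.copy()   # x is still [1, 2] at this point

w = addone_inplace(x)     # now x itself becomes [2, 3]

print(x_after_copy, y, x, w is x)
```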

## Scipy

[Scientific Computing](https://docs.scipy.org/doc/scipy/reference/index.html#)

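Make 10 standard normal random numbers. Older code used `sp.randn`, but the NumPy aliases in the top-level scipy namespace were deprecated and are not available in recent releases, so the NumPy function is called directly.

In [None]:
import scipy as sp
z = np.random.randn(10)
z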

Scipy has several subpackages. The stats subpackage can be imported.

In [None]:
from scipy import stats
stats.norm.cdf(1.645)
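
A rough mapping from R's `p`, `q` and `d` distribution functions to methods in `scipy.stats`, as a sketch. The particular arguments are chosen only for illustration.

```python
from scipy import stats

p = stats.norm.cdf(1.645)            # R: pnorm(1.645)
q = stats.norm.ppf(0.95)             # R: qnorm(0.95)
d = stats.norm.pdf(0.0)              # R: dnorm(0)
t_crit = stats.t.ppf(0.975, df=10)   # R: qt(0.975, df = 10)

print(round(p, 3), round(q, 3), round(d, 3), round(t_crit, 3))
```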

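Older SciPy releases re-exported NumPy functions, so `sp.mean(z)` and `np.mean(z)` were identical. These aliases were deprecated and are not available in recent releases, so call NumPy directly:

In [None]:
np.mean(z)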

Many calculations are also available as methods, which chain with the . operator somewhat like a pipe in R:

In [None]:
z.mean()

## Pandas

[Pandas website](http://pandas.pydata.org)

For R data frame like functionality

In [None]:
import pandas as pd
gala = pd.read_table('http://people.bath.ac.uk/jjf23/data/gala.dat', index_col=0, sep=r'\s+')
gala.head()

In [None]:
type(gala)

In [None]:
gala.describe().round(1)

Can do R data frame like operations. There are two ways to access a column by name, and `iloc` accesses a row by position

In [None]:
gala.Species

In [None]:
gala['Species']

In [None]:
gala.iloc[0]

Accessing rows and columns with numeric specifications requires iloc.

In [None]:
gala.iloc[-2:,0:3]

In [None]:
gala[gala.Species == gala.Endemics]
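
A compact sketch of label-based, position-based and boolean selection on a small toy data frame. The row labels and values here are made up and are not the gala data.

```python
import pandas as pd

# Toy data frame with hypothetical values
df = pd.DataFrame(
    {"Species": [58, 31, 3], "Endemics": [23, 21, 3], "Area": [25.0, 1.2, 0.2]},
    index=["A", "B", "C"],
)

by_label = df.loc["B", "Area"]               # label-based, like df["B", "Area"] in R
by_position = df.iloc[1, 2]                  # position-based, zero-indexed
equal_rows = df[df.Species == df.Endemics]   # boolean mask keeps the matching rows
subset = df.loc[df.Area > 1, ["Species", "Area"]]  # filter rows and pick columns

print(by_label, by_position)
print(equal_rows)
print(subset)
```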

In [None]:
import seaborn as sns
iris = sns.load_dataset('iris')
iris.head()

In [None]:
iris.groupby('species').agg({'sepal_length': 'mean', 'sepal_width': 'mean'})
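
Grouped summaries can also name their output columns, roughly like `summarise` in dplyr. A sketch on a small made-up data frame (not the iris data); the output column names `mean_length` and `max_width` are illustrative.

```python
import pandas as pd

# Toy data with hypothetical values
df = pd.DataFrame({
    "species": ["a", "a", "b", "b"],
    "sepal_length": [5.0, 5.2, 6.0, 6.4],
    "sepal_width": [3.4, 3.6, 2.8, 3.0],
})

# Named aggregation: new_column=(source_column, function)
summary = df.groupby("species").agg(
    mean_length=("sepal_length", "mean"),
    max_width=("sepal_width", "max"),
)

print(summary)
```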

## Matplotlib

[Matplotlib website](https://matplotlib.org)

Similar functionality to the base plotting functions in R

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

np.linspace creates, by default, 50 evenly spaced values in the specified range

In [None]:
x = np.linspace(0, 2 * np.pi)
y = np.sin(x)
plt.plot(x, y)
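
For comparison with R's `seq`, a small sketch of how `np.linspace` and `np.arange` treat the end of the range. Variable names are illustrative.

```python
import numpy as np

default = np.linspace(0, 1)        # 50 points, both endpoints included
five = np.linspace(0, 1, num=5)    # like seq(0, 1, length.out = 5) in R
stepped = np.arange(0, 1, 0.25)    # fixed step; the end value 1 is excluded

print(default.size)  # 50
print(five)
print(stepped)
```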

In [None]:
plt.scatter(gala.Area, gala.Elevation)
plt.xlabel("Area")
plt.ylabel("Elevation")

## Statsmodels

[Statsmodels](http://www.statsmodels.org/stable/index.html)

Notice that `sm.OLS` does not include an intercept by default, so `sm.add_constant` is used to add one. The summary is verbose compared with R.

In [None]:
import statsmodels.api as sm
x = gala.iloc[:,2:]
y = gala.Species
lmod = sm.OLS(y,sm.add_constant(x)).fit()
lmod.summary()
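
What adding a constant does can be sketched with NumPy alone: a column of ones is placed in the design matrix so that least squares can estimate an intercept. This is an illustration on made-up data, not how statsmodels is implemented.

```python
import numpy as np

# Made-up data lying exactly on the line y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

X_no_const = x.reshape(-1, 1)                    # slope only, line forced through the origin
X_const = np.column_stack([np.ones_like(x), x])  # column of ones plus the predictor

beta_no_const, *_ = np.linalg.lstsq(X_no_const, y, rcond=None)
beta_const, *_ = np.linalg.lstsq(X_const, y, rcond=None)

print(beta_no_const)  # a single slope, biased upwards by the missing intercept
print(beta_const)     # intercept and slope, close to [1, 2]
```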

Can use R style model formulae

In [None]:
import statsmodels.formula.api as smf
lmod = smf.ols('Species ~ Area + Elevation + Nearest + Scruz + Adjacent', data=gala).fit()
lmod.summary()

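Introduce a collinear predictor. R would drop the aliased term and report NA for its coefficient; statsmodels still returns estimates for every term (the default fit uses a pseudo-inverse) and flags the problem in the notes beneath the summary.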

In [None]:
lmod2 = smf.ols('Species ~ Area + Elevation + Nearest + Scruz + Adjacent + I(Area + Adjacent)', data=gala).fit()
lmod2.summary()
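
The collinearity can be checked directly from the rank of the design matrix. A sketch with NumPy on simulated predictors; the names `area` and `adjacent` only echo the formula above and the values are random, not the gala data.

```python
import numpy as np

rng = np.random.default_rng(0)
area = rng.normal(size=20)
adjacent = rng.normal(size=20)

# The last column is the sum of the two before it, so it adds no new information
X = np.column_stack([np.ones(20), area, adjacent, area + adjacent])
rank = np.linalg.matrix_rank(X)

# Least squares still returns a (minimum-norm) solution, but it is not unique
y = 1.0 + 2.0 * area + 3.0 * adjacent
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(X.shape[1], rank)            # 4 columns but rank 3
print(np.allclose(X @ beta, y))    # the fitted values still reproduce y
```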

Can extract the usual quantities e.g. the parameter estimates

In [None]:
lmod.params

Standard diagnostics

In [None]:
plt.scatter(lmod.fittedvalues, lmod.resid)
plt.axhline(0)

QQ plot for the residuals

In [None]:
sm.qqplot(lmod.resid, line='r')