Overview of MA20226 Statistics 2A

Syllabus

This unit introduces classical estimation and hypothesis-testing principles. The unit is divided into four main topics:

Point estimation.
Confidence intervals.
Hypothesis testing.
Inference for relationships.

Within each topic we will develop the theory behind common statistical inference procedures and apply the theory to real case studies. Students will use the statistical programming package R to investigate the properties of these procedures and analyse data.

Learning outcomes: After taking this unit, students should be able to:

Perform standard estimation procedures and tests on normal data.
Carry out goodness-of-fit tests and analyse contingency tables.
Use R to calculate estimates, carry out hypothesis tests and compute confidence intervals.

Pre-requisites: Before taking this unit you must take MA10211 Probability & Statistics 1A and MA10212 Probability & Statistics 1B. I recommend reviewing the following topics from these units so that you have definitions and properties at your fingertips during lectures and tutorials:

MA10211: Properties of expectation and covariance. Independence. Properties of the binomial and Poisson distributions (mean, variance, mass functions).
MA10212: Properties of the normal and exponential distributions (mean, variance, density functions). Cumulative distribution functions, central limit theorem. Random sampling from the distributions above, generating simple graphical and numeric summaries, and writing simple for loops in R.

Towards the end of MA10212, you covered parameter estimation in simple cases via the method of moments and maximum likelihood, as well as sampling distributions of sample means. Our first major task in this unit will be to review these ideas and develop them further in Chapter 2.
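
As a reminder of the kind of calculation involved, here is a minimal sketch for a random sample from an exponential distribution. The choice of distribution and the rate parametrisation $\lambda$ are assumptions made for illustration and may differ from the examples used in MA10212 or in Chapter 2.

```latex
% Assumed setup (illustrative): X_1, ..., X_n i.i.d. Exponential with rate \lambda > 0
\[
  f(x;\lambda) = \lambda e^{-\lambda x}, \quad x > 0,
  \qquad \mathbb{E}[X_i] = \frac{1}{\lambda},
  \qquad \operatorname{Var}(X_i) = \frac{1}{\lambda^{2}} .
\]

% Method of moments: equate the population mean to the sample mean \bar{x}
\[
  \frac{1}{\hat{\lambda}_{\mathrm{MM}}} = \bar{x}
  \quad\Longrightarrow\quad
  \hat{\lambda}_{\mathrm{MM}} = \frac{1}{\bar{x}} .
\]

% Maximum likelihood: maximise the log-likelihood \ell(\lambda)
\[
  \ell(\lambda) = n \log \lambda - \lambda \sum_{i=1}^{n} x_i ,
  \qquad
  \ell'(\lambda) = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0
  \quad\Longrightarrow\quad
  \hat{\lambda}_{\mathrm{ML}} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}} ,
\]
\[
  \ell''(\lambda) = -\frac{n}{\lambda^{2}} < 0
  \quad \text{(so the stationary point is a maximum).}
\]

% Sampling distribution of the sample mean
\[
  \mathbb{E}[\bar{X}] = \frac{1}{\lambda},
  \qquad
  \operatorname{Var}(\bar{X}) = \frac{1}{n\lambda^{2}},
  \qquad
  \bar{X} \;\overset{\text{approx.}}{\sim}\; N\!\left(\frac{1}{\lambda}, \frac{1}{n\lambda^{2}}\right)
  \ \text{for large } n \text{ (central limit theorem).}
\]
```

In this particular case the two estimators coincide; that is a feature of the example rather than a general rule.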

Timetable

Lectures:

Tuesday 12:15-13:05 3WN2.1
Thursday 09:15-10:05 EB1.1

Problem classes:

Mondays 11:15-12:05 via Zoom.
https://bath-ac-uk.zoom.us/j/99555352535?pwd=WlRJKzM0QytBVXlqbUFPQmJodVJnQT09
Meeting ID: 995 5535 2535
Passcode: 122399

Office hours: There will be a dedicated in-person office hour on Tuesday 13:15-14:05 in my office, 4W4.10. However, I am happy to discuss any matters relating to the course at other times, either via email or one-to-one. If you would like to meet, just send me an email with a list of proposed times and whether you wish to meet in person or on Teams.

Tutorials: You will be assigned to a small group which will meet weekly to go over a mixture of problems using R and ‘pen and paper’. The R components are particularly important for developing the skills you will need for completing the coursework.

Group 1: Friday 14:15-15:05, EB0.9, tutored by Ruchen Liu
Group 2: Friday 13:15-14:05, 3E3.1, tutored by Melina Del Angel
Group 3: Friday 13:15-14:05, EB0.9, tutored by Ruchen Liu
Group 4: Thursday 12:15-13:05, 3E3.1, tutored by Matthew Pawley
Group 5: Thursday 13:15-14:05, 3E3.1, tutored by Simon Shaw
Group 6: Thursday 11:15-12:05, 3E3.1, tutored by Matthew Pawley
Group 7: Friday 12:15-13:05, EB0.7, tutored by Melina Del Angel

Moodle and Panopto Resources

Recordings: Recordings of the lectures and problem classes will be made available on Panopto.

Moodle page: The Moodle page will contain links for all lecture content and the lab sheets.

Lecture notes: A comprehensive set of lecture notes is available through the unit Moodle page. The notes are available in two formats (HTML and PDF) with identical content.

Assessment and feedback

Summative assessment

Coursework: 25% of unit mark, individual electronic submission via Moodle. Set in Week 7 (Monday 13th November) and due in Week 9 (Wednesday 29th November).
Exam: 75% of unit mark.

The 2023/4 exam will be an in-person, closed-book assessment. What this means:

In person: You will sit the exam at a fixed time on a fixed day in a venue at the University. The exam will be invigilated.
Closed-book: You will not be allowed to have any revision materials with you or any access to the internet. The exam paper will be tailored to this setting.

The exam will be designed to take 2 hours and there will be a total of 60 marks available. It will have two sections.

Section A (worth 24 marks, corresponding to 40%) contains a number of short questions.
Section B (worth 36 marks, corresponding to 60%) contains three longer questions, each worth 18 marks.
You should answer ALL questions from Section A and TWO questions from Section B. If you submit solutions to more than two questions in Section B, only the BEST two of these solutions will contribute to the assessment.

You will be permitted to use a university calculator and the University Formula Book in the exam.

Formative assessment

Lab sheets with a mix of programming and ‘pen and paper’ exercises will be set in each lab session, with written exercises to be submitted one week later. Any work submitted by the hand-in deadline will be marked and returned to you with personal feedback. Full solutions to all exercises and general feedback sheets will also be made available.

Some useful books

This unit is self-contained in the sense that you will not strictly need to consult textbooks. However, you may find the following books useful.

Background reading:

Peter Dalgaard, Introductory statistics with R. 2nd edition. Springer. The full text is available as an e-book, either by following the link from the library here or directly here. The first half of this book is a useful refresher on R. Chapters 5 and 8 cover the R implementation of the material in Chapter 5.
John A Rice, Mathematical statistics and data analysis. 3rd edition. Duxbury. Multiple copies of either this edition or earlier editions are available in the library. Chapters 1 – 5 were covered in Probability and Statistics 1A & 1B. Material relevant to this unit can be found primarily in Chapters 6, 8, 9, 11, and 13.
Yudi Pawitan. In All Likelihood. Oxford University Press. The full text is available as an e-book, either by following the link from the library here or directly here. Chapters 2, 4, and 5 are most relevant, though some of the material in those chapters is more advanced.

Sources for omitted proofs: I will occasionally reference the following books for details of proofs omitted from these lecture notes:

George Casella and Roger L Berger. Statistical Inference, 2nd edition. Brooks Cole/Cengage Learning. A very nice book; Chapters 7, 8, and 9 are most relevant.
Erich L Lehmann. Elements of Large Sample Theory. Springer. The full text is available as an e-book, either by following the link from the library here or directly here. Chapters 2, 3, 4, 5, and 7 contain relevant material.
Peter J Bickel and Kjell A Doksum. Mathematical statistics. Volume 1, Basic ideas and selected topics, 2nd edition. Pearson Prentice Hall. The appendices contain excellent detail on the probability limit results used in the course.

O’Reilly Learning Online: The University has a subscription to O’Reilly Learning Online, giving you access to a wide range of online courses and textbooks on topics including statistics and data science. Our research librarians have compiled a list of high-quality courses for R and RStudio: https://library.bath.ac.uk/research-software/R-RStudio. I highly recommend the first two chapters of the Learn R Programming course for a quick refresher on R.

Acknowledgements

These notes have been developed with input from past lecturers of the unit, including Drs Theresa Smith, Jonathan Bartlett, and Karim Anaya-Izquierdo. Please report any errors to Simon Shaw.