HPC Doctoral Taught Course Centre
JHD's slides from Tuesday 10 September.
JHD's Notes on C types from 12 September.
The C declaration explainer.
Case Study: Matrix Multiplication
We consider what is probably the simplest HPC task, multiplying matrices, and for simplicity we consider only a sequential program running on one core of a dedicated node on Bath's HPC. This has 64KB of L1 cache and 6MB of L2 cache. We consider three fixed matrices A, B and C and compute C := C + A*B.
There are three basic codes.
- Obvious (known as DGEMM in the code): c_ij := c_ij + sum_k a_ik * b_kj
- Bad Transpose (known as DGEMMtr in the code): c_ij := c_ij + sum_k aT_ki * b_kj, where aT is the transpose of a
- Good Transpose (known as DGEMMxtr in the code): c_ij := c_ij + sum_k a_ik * bT_jk, where bT is the transpose of b
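The three formulas above can be sketched in C as follows. This is a minimal illustration, not the course's actual code: the function names dgemm_obvious, dgemm_tr and dgemm_xtr, the AT macro and the argument order are all hypothetical, and we assume row-major storage of doubles with a leading dimension ld (the row length of the pre-allocated array that holds the n x n sub-matrix).

```c
#include <assert.h>
#include <stddef.h>

/* Element (i,j) of a row-major matrix whose rows are ld doubles apart.
   (Hypothetical helper, not taken from the course code.) */
#define AT(m, ld, i, j) ((m)[(size_t)(i) * (ld) + (j)])

/* Obvious: c_ij += sum_k a_ik * b_kj.
   In the inner loop a is read along a row (unit stride),
   but b is read down a column (stride ld). */
void dgemm_obvious(size_t n, size_t ld,
                   const double *a, const double *b, double *c)
{
    assert(n <= ld);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += AT(a, ld, i, k) * AT(b, ld, k, j);
            AT(c, ld, i, j) += sum;
        }
}

/* Bad transpose: at holds the transpose of a, so a_ik is at[k][i].
   Both inner-loop reads now go down a column (stride ld). */
void dgemm_tr(size_t n, size_t ld,
              const double *at, const double *b, double *c)
{
    assert(n <= ld);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += AT(at, ld, k, i) * AT(b, ld, k, j);
            AT(c, ld, i, j) += sum;
        }
}

/* Good transpose: bt holds the transpose of b, so b_kj is bt[j][k].
   Both inner-loop reads now go along a row (unit stride). */
void dgemm_xtr(size_t n, size_t ld,
               const double *a, const double *bt, double *c)
{
    assert(n <= ld);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += AT(a, ld, i, k) * AT(bt, ld, j, k);
            AT(c, ld, i, j) += sum;
        }
}
```

Under the row-major assumption, the three variants do the same arithmetic and differ only in how the inner loop strides through memory, which is what the names "Bad" and "Good" refer to.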
We show here the performance, as a fraction of the potential peak performance, when the matrices are all sub-matrices of pre-allocated 2048x2048 matrices.
If instead we use chunks of a 2049x2049 matrix, we get different results, shown comparatively as this:
2048 and 2049 results
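The only thing that changes between the two experiments is the row length of the pre-allocated array, and so the distance in memory between vertically adjacent elements. The sketch below makes that arithmetic explicit. It is an illustration only: the function names and the 4096-byte window are hypothetical and are not the real cache geometry of the machine, and the source does not itself say that stride is the cause of the difference. Row-major storage of doubles is again assumed.

```c
#include <stddef.h>

/* Bytes between elements (i,j) and (i+1,j) of a row-major
   array of doubles whose rows are ld elements long. */
size_t row_stride_bytes(size_t ld)
{
    return ld * sizeof(double);
}

/* Walk `steps` elements down one column and count how many distinct
   byte offsets modulo `window` are visited. `window` is an
   illustrative parameter, not a property of any particular cache. */
size_t distinct_offsets(size_t ld, size_t steps, size_t window)
{
    size_t count = 0;
    for (size_t k = 0; k < steps; k++) {
        size_t off = (k * row_stride_bytes(ld)) % window;
        int seen = 0;
        /* Has an earlier step already landed on this offset? */
        for (size_t m = 0; m < k && !seen; m++)
            seen = ((m * row_stride_bytes(ld)) % window == off);
        if (!seen)
            count++;
    }
    return count;
}
```

With 8-byte doubles, a row length of 2048 gives a stride of 16384 bytes, a power of two, so every element of a column lands on the same offset within a 4096-byte window. A row length of 2049 gives 16392 bytes, so successive elements land on different offsets.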
Factorial programs, C sheet 1.3.
Machine epsilon programs, NA sheet.
Argv example programs, C exercises.