HPC Doctoral Taught Course Centre

JHD's slides from Tuesday 10 September.
JHD's Notes on C types from 12 September.
The C declaration explainer.

Case Study: Matrix Multiplication

We consider what is probably the simplest HPC task, multiplying matrices, and for simplicity we consider only a sequential program running on one core of a dedicated node on Bath's HPC. This has 64KB of L1 cache and 6MB of L2 cache. We consider three fixed matrices A, B and C and compute C:=C+A*B.

There are three basic codes.

Obvious (known as DGEMM in the code): c_ij := c_ij + sum_k a_ik b_kj
Bad Transpose (known as DGEMMtr in the code): c_ij := c_ij + sum_k aT_ki b_kj, where aT is the transpose of a
Good Transpose (known as DGEMMxtr in the code): c_ij := c_ij + sum_k a_ik bT_jk, where bT is the transpose of b

We show here the performance, as a fraction of the potential peak performance, when the matrices are all sub-matrices of pre-allocated 2048x2048 matrices.

If instead we use chunks of a 2049x2049 matrix, we get different results, shown comparatively here: 2048 and 2049 results

C programs

Factorial programs, C sheet 1.3.
Machine epsilon programs, NA sheet.
Argv example programs, C exercises.