General areas of interest:

- computer algebra
- parallelism
- networks
- cryptography

Though I won't be supervising projects next year, I shall leave these suggestions here for your inspiration.

### Parallel Computer Algebra

Joint with Prof Davenport

There are many interesting algorithms in Computer Algebra. This project is to take your favourite CA algorithm and make it run in parallel.

This could be done on Isambard, the regional (Tier 2) HPC resource (10,496 Arm cores).

The choice of parallel architecture (shared memory, distributed memory, etc.), programming approach (pthreads, TBB, OpenMP, MPI, etc.) and language (C, C++, etc.) is yours, as appropriate for the CA algorithm.

Pre-requisite knowledge: computer algebra, parallelism, programming, CM30070, CM30078.

Suggested topics:

- From PASCO 2007:
  - "Parallel computation of the rank of large sparse matrices from algebraic K-theory", Dumas (millions of rows; Coppersmith, Wiedemann, etc.)
  - "Multithreaded parallel implementation of arithmetic operations modulo a triangular set", Moreno Maza (parallelises FFT)
  - "Component-level parallelization of triangular decompositions", Moreno Maza (large shared memory)
  - "On the parallel computation of comprehensive Gröbner systems", Shutaro Inoue, Yosuke Sato
  - "POSIX threads polynomials (PTPol): a scalable implementation of univariate arithmetic operations", Mohab Safey El Din, Philippe Trébuchet (C library, OpenMP)
- From PASCO 2010:
  - "A complete modular resultant algorithm targeted for realization on graphics hardware", Pavel Emeliyanenko (resultants on GPUs, CUDA)
  - "Parallel Gaussian elimination for Gröbner bases computations in finite fields", Faugère (mod a large prime, C)
  - "Parallel sparse polynomial division using heaps", Monagan (fine-grain parallelism, C)
  - "Parallel sparse polynomial interpolation over finite fields", Monagan (probabilistic)
  - "Parallelising the computational algebra system GAP", R. Behrends, A. Konovalov, S. Linton, F. Lübeck, M. Neunhöffer (report on progress)
- From PASCO 2015:
  - "Modular GCD", Monagan (written in Cilk; will need translating to OpenMP or TBB)
  - "Polynomial multiplication mod p", Monagan (Cilk)
  - "Faugere F4", Monagan (Cilk)
- From PASCO 2017:
  - "Fast Parallel Multi-point Evaluation of Sparse Polynomials", Michael Monagan (Cilk)
  - "Computing Tropical Prevarieties in Parallel", Anders Jensen, Jeff Sommars, Jan Verschelde (GMP, Parma Polyhedral Library (PPL))
  - "Meataxe64: High performance linear algebra over finite fields", Richard Parker (ab initio code for large matrices mod p)
  - "An Algorithm For Splitting Polynomial Systems Based On F4", Monagan (mod p, in C)
- Others: basic algorithms
  - parallel bignums (modular or digit-by-digit)
  - parallel polynomials
  - parallel GCD (bignum or polynomial), and so on

### Comparison of Parallelism Techniques

As we move into the world of multicore processors, programmers are having problems making the best use of the CPU cycles available, so several programming paradigms have arisen. They include

- Pthreads
- OpenMP
- MPI
- Intel Threading Building Blocks
- OpenCL
- CUDA
- Grand Central Dispatch (GCD)

This project is to take as many of the above as feasible, implement some algorithms in each, and compare the simplicity of use, accuracy of code, speed of the final code and anything else interesting. (This project is *not* about parallel speedups of code.)

Pre-requisite knowledge: C programming, parallelism

Indicative reading: Above.

### Grand Central Dispatch (GCD) Ergonomics

GCD is a "new" approach to parallelism put forward by Apple. Rather than using locks to protect critical regions, it has serial queues of *blocks* (closures). Apple's thesis is that queues are simpler, more lightweight and easier to program with than traditional threads and locks.

This project is to investigate these claims. You should implement a wide variety of programs in both "traditional" POSIX pthreads and in GCD, then compare the experience of using both: for example, ease of use; conceptual hurdles in learning; rate of occurrence of parallelism bugs; ease of detection of parallelism bugs; efficiency of code; and so on.

Pre-requisite knowledge: programming using GCD and pthreads, parallelism

Indicative reading: various places on the Web

### University Route Planner

The University of Bath is a fairly complicated place: even though there is a simple East/West compass arrangement of buildings, many people find it difficult to find individual rooms. The many interconnecting corridors are particularly confusing.

This project is to create a route planner phone app (Android/iPhone/both) that will find a route from the user's position to a requested room.

Additional features might include

- Live instruction en route
- A wet weather option that finds an indoor route
- A traffic option that tries to avoid busy routes
- A version for users with mobility (or other) impairments

Pre-requisite knowledge: programming, app creation

### CUDA/OpenCL back ends for other Parallel Paradigms

Graphics cards contain some very powerful processors these days. Lately the idea has arisen of using these processors for general computing, not just graphics: this is the premise of GPGPU (general-purpose computing on graphics processors).

CUDA and OpenCL are APIs that allow us to tap into the power of such coprocessors: the intent is that programs written to such APIs will be easily portable to different graphics cards.

This project is to make libraries that give other parallel paradigms a CUDA/OpenCL back end. For example

- OpenMP
- Generic Pthreads

And so on. A program written using OpenMP would then be runnable (though likely much less efficiently) on a graphics card, etc. This would be useful for developing and debugging such programs.

Pre-requisite knowledge: Programming skills, parallelism

Indicative reading: see

*Comparison of Parallelism Techniques*, above.

### CUDA compatibility layer for OpenCL

There are two main languages in current use for programming GPUs: CUDA and OpenCL. While OpenCL is the "standard", it is CUDA that nearly everybody uses. It would be useful to enable CUDA programs to run on OpenCL.

This project is to

- Identify a (fairly small) subset of CUDA to work on
- Implement a compatibility layer in OpenCL that helps a CUDA programmer use OpenCL in the style to which they have become accustomed (i.e., implement your subset of CUDA functionality using OpenCL)

- Write a document explaining how to use the compatibility layer, with example programs or anything else that might help

Pre-requisite knowledge: Programming skills, parallelism

### CUDA to OpenCL transformer

There are two main languages in current use for programming GPUs: CUDA and OpenCL. While OpenCL is the "standard", it is CUDA that many people use. It would be useful to enable CUDA programs to run on OpenCL. Such a translator exists, but it seems primitive (much is left to the programmer to do) and unmaintained. Another, CU2CL, has progressed further and needs investigating.

This project is to

- Write a document comparing the two languages
- Implement a source-to-source transformer that reads CUDA programs and produces the equivalent OpenCL programs
- Produce some demonstrator examples of CUDA programs running on non-CUDA architectures, such as AMD and ARM

Pre-requisite knowledge: Programming skills, parallelism, parsing, compiling

### C++ AMP

C++ Accelerated Massive Parallelism is a modified C++ intended to make writing parallel code easier. This uses C++-style mechanisms to hide the complexities.

This project is to compare writing in C++ AMP with a few existing methods, including at least OpenCL and CUDA.

You will

- write a document that is a simple introduction to programming in C++ AMP
- write a document comparing it to other parallel techniques with emphasis on the relative ease of writing correct programs
- provide a library of code that emulates the C++ AMP API using traditional pthreads

Pre-requisite knowledge: C++ programming, parallel computing.

### Double-single precision arithmetic for GPUs

GPUs are very good at massive floating point computations. At least, *single precision* (`float`) computations. Double precision (`double`) is much less well supported. This is due to a couple of reasons:

- the majority of GPUs are still used for graphics, where single precision is perfectly adequate
- vendors (like Nvidia) like to segment the market into "high performance" (i.e., supercomputing) and everybody else, and can then charge different prices for the different graphics cards.

Thus, for example, the latest consumer GPUs from Nvidia have double precision performance 1/32 of their single precision performance.

This project is to revive the ancient technique of "double-single" floating point: namely taking *two* single precision numbers and gluing them together to make a (nearly) double precision number.

The project will code the basic arithmetic operations and some transcendental functions in a library in CUDA (similar to C++), then compare the implementation against true double precision floating point to see whether it can beat the 1/32 slug factor.

Pre-requisite knowledge: C++ programming

Indicative reading: the IEEE floating point standards, existing double-double code, and existing GPU implementations of the technique.

### Tcpdump Visualisation and Analysis

The output from tcpdump of network traffic between two hosts is somewhat difficult to read. This project is to take such an output (live or from a dump) and produce an arrows diagram such as we see in Stevens' book. This would be annotated with (a configurable amount of) information, e.g., new connection, dup ACK, etc. More advanced would be to spot Nagle and other TCP strategies.

Pre-requisite knowledge: programming skills, Networking course

Indicative reading: Stevens "TCP/IP Illustrated", a standard Unix graphics system, such as Qt or FLTK.

### Unifying Difference and Differential Calculi

Recently there has been growing interest in a method that unifies discrete and continuous systems in a single framework called "Time Scales". This project will involve writing a comprehensible introduction to Time Scales and implementing a selection of generic algorithms (probably using Maple).
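
The object that does the unifying is the delta derivative. In the notation standard in the time-scales literature (forward jump operator σ(t), graininess μ(t) = σ(t) − t), it can be sketched as:

```latex
% Delta derivative of f at t on a time scale \mathbb{T}, where
% \sigma(t) = \inf\{s \in \mathbb{T} : s > t\} and \mu(t) = \sigma(t) - t
f^{\Delta}(t) =
  \begin{cases}
    \dfrac{f(\sigma(t)) - f(t)}{\mu(t)}, & \mu(t) > 0, \\[1.5ex]
    \displaystyle\lim_{s \to t} \frac{f(t) - f(s)}{t - s}, & \mu(t) = 0.
  \end{cases}
```

Taking 𝕋 = ℝ (so μ ≡ 0) recovers the ordinary derivative, while 𝕋 = ℤ (so μ ≡ 1) recovers the forward difference f(t+1) − f(t), which is the unification the project would explain and implement.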

Pre-requisite knowledge: Mathematics, programming skills.

Indicative reading: Stefan Hilger. "Differential and difference calculus - unified". In Nonlinear Analysis, Theory, Methods and Applications, 30, 2683-2694, 1997. Available from WebCat E-Journals.

### Summation of Divergent Series

On the face of it, infinite series like

1-1+1-1+...

and

1-2+4-8+16-...

do not have meaningful sums: they are *divergent*. Such sums appear in various parts of physics, particularly quantum mechanics (where the technique is called regularization). It has been found that it is possible to extend the definition of "summation" to give them precise and useful values, e.g., 1-1+1-1+... "sums" to 1/2.

This project is to take some of the forms of extended summation and implement them in Maple to provide summation for a reasonable range of divergent series. Thus

`abel_sum((-1)^n,n);` will return `1/2`.

Pre-requisite knowledge: Computer Algebra; CM30070

Indicative reading: "Divergent Series", G. H. Hardy, 512.95 HAR; Wikipedia, MathWorld, PlanetMath and many other places on the Web.