The table for EES as given in the paper does not match its definition: the third-to-last block (eb64749a) should be last, and the next two blocks shifted up to fill the gap.
The description is big endian, in contrast with all the others, which are little endian.
Lack of test vectors. Thanks to Brian Gladman for getting some, and for many discussions on endianness.
Uses 64-bit words and multiplies: awkward on current machines, but should be better on future machines.
Use of multiple word sizes (32, 64, 128, etc.) is confusing for endian reasons.
Difficult parts: generating the Sboxes. The E function was merely fiddly (computing the bitmask M).
I notice that test vectors and corrections have now appeared.
Lack of test vectors for complete cipher.
Q boxes easy: correct first time. A couple of auxiliary programs to compute the qs, and to compute lookup tables for the MDS multiples.
Would like more test data: using all zeros for a key meant that various matrix multiplies produced correct answers (i.e., zero) even though entries were incorrect. Some example Sboxes would have been useful.
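To illustrate the all-zeros problem, here is a minimal sketch in C. The matrix entries, the function names and the reduction polynomial 0x169 are all assumptions made up for the example, not values taken from any cipher description. It multiplies a 4x4 matrix by a byte vector over GF(2^8) and shows that a zero input gives the same (zero) output for a right and a wrong matrix, whereas a single non-zero byte exposes the bad entry.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Multiply a by b in GF(2^8), reducing by the 9-bit polynomial `poly`.
   The value 0x169 used below is only an illustrative assumption. */
static uint8_t gf_mul(uint8_t a, uint8_t b, unsigned poly)
{
    unsigned acc = 0, x = a;
    while (b) {
        if (b & 1)
            acc ^= x;      /* add (XOR) the current multiple of a */
        x <<= 1;           /* multiply by x */
        if (x & 0x100)
            x ^= poly;     /* reduce back to 8 bits */
        b >>= 1;
    }
    return (uint8_t)acc;
}

/* out = m * in over GF(2^8); m is a 4x4 matrix stored row by row. */
static void mat_mul4(const uint8_t m[16], const uint8_t in[4],
                     uint8_t out[4], unsigned poly)
{
    for (int r = 0; r < 4; r++) {
        uint8_t s = 0;
        for (int c = 0; c < 4; c++)
            s ^= gf_mul(m[4 * r + c], in[c], poly);
        out[r] = s;
    }
}

/* A zero input cannot tell a right matrix from a wrong one. */
static void demo_zero_input(void)
{
    /* Made-up matrices: `wrong` differs from `right` in one entry. */
    const uint8_t right[16] = { 1, 2, 3, 4,  5, 6, 7, 8,
                                9, 10, 11, 12,  13, 14, 15, 16 };
    uint8_t wrong[16];
    const uint8_t zero[4] = { 0, 0, 0, 0 };
    const uint8_t unit[4] = { 1, 0, 0, 0 };
    uint8_t a[4], b[4];

    memcpy(wrong, right, sizeof wrong);
    wrong[0] = 0x99;                    /* deliberately incorrect entry */

    mat_mul4(right, zero, a, 0x169);
    mat_mul4(wrong, zero, b, 0x169);
    assert(memcmp(a, b, 4) == 0);       /* error is invisible */

    mat_mul4(right, unit, a, 0x169);
    mat_mul4(wrong, unit, b, 0x169);
    assert(memcmp(a, b, 4) != 0);       /* error shows up */
}
```

A test key with a few distinct non-zero bytes would exercise every matrix column in the same way.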
After getting a correct version, inlining various parts by hand produced a simpler one. Producing a full-keyed version was also straightforward.
Various ways of converting from words to bytes were considered (e.g., unions), until I settled on simple casts (hidden by macros) as the simplest solution. Big vs little endian: developed both at the same time.
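As a rough sketch of hiding the word/byte conversion behind macros, the following C fragment uses shifts rather than the casts mentioned above, so that one source file gives the same answer on big and little endian hosts. The macro names are hypothetical and are not those used in the actual implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro names. p is a pointer to uint8_t; the byte order is
   made explicit by the shifts, so it does not depend on the host. */
#define LOAD32_LE(p)                                        \
    (((uint32_t)(p)[0])       | ((uint32_t)(p)[1] << 8) |   \
     ((uint32_t)(p)[2] << 16) | ((uint32_t)(p)[3] << 24))

#define LOAD32_BE(p)                                        \
    (((uint32_t)(p)[0] << 24) | ((uint32_t)(p)[1] << 16) |  \
     ((uint32_t)(p)[2] << 8)  | ((uint32_t)(p)[3]))

#define STORE32_LE(p, v)                      \
    do {                                      \
        (p)[0] = (uint8_t)((v));              \
        (p)[1] = (uint8_t)((v) >> 8);         \
        (p)[2] = (uint8_t)((v) >> 16);        \
        (p)[3] = (uint8_t)((v) >> 24);        \
    } while (0)

#define STORE32_BE(p, v)                      \
    do {                                      \
        (p)[0] = (uint8_t)((v) >> 24);        \
        (p)[1] = (uint8_t)((v) >> 16);        \
        (p)[2] = (uint8_t)((v) >> 8);         \
        (p)[3] = (uint8_t)((v));              \
    } while (0)

/* Reads the same four bytes both ways and checks a store/load round trip. */
static void demo_endian(void)
{
    uint8_t buf[4] = { 0x01, 0x02, 0x03, 0x04 };
    uint32_t w = 0xA1B2C3D4u;

    assert(LOAD32_LE(buf) == 0x04030201u);
    assert(LOAD32_BE(buf) == 0x01020304u);

    STORE32_LE(buf, w);
    assert(buf[0] == 0xD4 && LOAD32_LE(buf) == w);
    STORE32_BE(buf, w);
    assert(buf[0] == 0xA1 && LOAD32_BE(buf) == w);
}
```

The cast approach is likely faster where the host byte order already matches the cipher's, but it needs a separate macro definition per byte order and assumes suitably aligned buffers.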
Algorithm description: good, but involved lots of flipping back and forth during implementation.
Then simple to extend to other word-length variants (no test data, so cannot confirm correctness).
No attempt at optimisation yet: there's a lot less to fiddle with, so there's not much chance of any significant optimisations.
Algorithm description: good. The key schedule is hidden away in an appendix, though.