(People with non-English names may have problems with parts (a) and (b) and should instead use MIME quoted-printable.)
As hexadecimal bytes the name "Russell" becomes 52 75 73 73 65 6C 6C (assuming ASCII).
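The conversion is easy to check for any name. The following is a minimal sketch in Python, assuming the name is encoded as plain ASCII; the variable names are only illustrative.

```python
# Print each byte of an ASCII-encoded name as two hexadecimal digits.
name = "Russell"
hex_bytes = " ".join(f"{b:02X}" for b in name.encode("ascii"))
print(hex_bytes)  # 52 75 73 73 65 6C 6C
```

A name containing non-ASCII characters makes encode("ascii") raise UnicodeEncodeError, which is the problem the note above alludes to.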
The intent of this exercise is to see whether there could be a reasonable mechanism for specifying the interpretation of blocks of bytes: for example, a MIME block containing English followed by a MIME block containing Japanese. Such mechanisms will probably be quite cumbersome.
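One way to picture such a mechanism is a multipart MIME message in which every part carries its own charset label. The sketch below uses Python's standard email package; the sample texts and the choice of us-ascii and shift_jis are assumptions made purely for illustration.

```python
from email.message import EmailMessage

# First block: English text labelled as US-ASCII.
msg = EmailMessage()
msg.set_content("Hello in English.", charset="us-ascii")

# Second block: Japanese text labelled as Shift_JIS.
# Adding it turns the message into multipart/mixed.
msg.add_attachment("日本語のテキスト", charset="shift_jis", disposition="inline")

# Each part declares how its own bytes are to be interpreted.
for part in msg.iter_parts():
    print(part.get_content_charset(), part.get_content().strip())
```

Every block needs its own headers and a boundary line, which gives some feel for why such mechanisms are cumbersome.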
This presages the big-endian vs little-endian debate: in Jonathan Swift's Gulliver's Travels, nations go to war over which end of an egg you should start on. The story was actually a parody of a prevailing religious debate, but it still serves us nicely in the computer age.
Basic code is given in the book. More interesting is the transmission of compound data structures like vectors or linked lists. There are many example programs to be found using Google.
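As a hedged illustration of sending a compound structure, the sketch below flattens a vector of integers into a length-prefixed byte string in network (big-endian) byte order. This is not the book's code: the wire format (a 32-bit count followed by signed 32-bit values) and the function names pack_vector and unpack_vector are assumptions chosen for the example.

```python
import struct

def pack_vector(values):
    """Encode a list of ints as: unsigned 32-bit count, then signed 32-bit items.

    The "!" prefix selects network (big-endian) byte order, so sender and
    receiver agree regardless of their native endianness.
    """
    return struct.pack(f"!I{len(values)}i", len(values), *values)

def unpack_vector(data):
    """Decode bytes produced by pack_vector back into a list of ints."""
    (count,) = struct.unpack_from("!I", data, 0)
    return list(struct.unpack_from(f"!{count}i", data, 4))

wire = pack_vector([1, -2, 300])
print(wire.hex(" "))
print(unpack_vector(wire))
```

A linked list can be sent the same way by walking it into a flat sequence first, since pointers have no meaning on the receiving machine.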
There are a variety of approaches. Some declare the character set for the document in a header, e.g.,
<meta http-equiv="Content-Type" content="text/html; charset=shift_jis"> for an HTML header (Shift_JIS is a 2-byte encoding of Japanese characters). Similarly for a MIME header. Then a pair of bytes like 93 FA is interpreted as the character 日 (Unicode code point U+65E5).
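The effect of the declared character set can be seen by decoding the same two bytes in different ways. This is a small sketch that relies on Python's built-in shift_jis codec, in which 日 is the byte pair 93 FA; 65 E5 is the character's Unicode code point rather than its Shift_JIS encoding.

```python
# The same two bytes mean different things under different declared charsets.
raw = bytes([0x93, 0xFA])

as_sjis = raw.decode("shift_jis")    # one Japanese character
as_latin1 = raw.decode("latin-1")    # two unrelated Latin-1 characters
code_point = f"{ord(as_sjis):04X}"   # Unicode code point of the decoded character

print(as_sjis, code_point, len(as_latin1))
```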
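Some use Unicode, often via UTF-8. This is declared in the header just as above. UTF-8 has the edge when most of the document is ASCII, since each ASCII character takes one byte, but less so otherwise: most Japanese characters take three bytes in UTF-8 against two in Shift_JIS.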
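The size trade-off can be measured directly. The sketch below compares encoded lengths for an ASCII string and a short Japanese string; the helper name sizes and the sample strings are illustrative assumptions.

```python
def sizes(text):
    """Return the encoded length in bytes of text under each charset."""
    return {enc: len(text.encode(enc)) for enc in ("shift_jis", "utf-8")}

english = "plain ASCII text"
japanese = "日本語"

print(sizes(english))   # same size in both encodings
print(sizes(japanese))  # UTF-8 needs more bytes than Shift_JIS here
```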
Others insert HTML character references, such as &#26085; (or &#x65E5; in hexadecimal), which is rendered as the same character 日. This is OK for the occasional non-ASCII character, but impractical for continuous text.
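Numeric character references can be produced and decoded mechanically. The sketch below uses Python's standard html module for decoding; to_refs is a hypothetical helper written for this example.

```python
import html

def to_refs(text):
    """Replace every non-ASCII character with a decimal HTML character reference."""
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in text)

encoded = to_refs("日")
print(encoded)                 # &#26085;
print(html.unescape(encoded))  # 日
```

Each Japanese character grows to eight ASCII characters, which is why this approach suits only the occasional non-ASCII character.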
Still others insert images of the characters. This has many problems: the text cannot be selected and copied, read by a screen reader, or read by a Web indexer, it is not easily modifiable, and so on.
Quoted-printable: good for the occasional special character, but not so good for continuous special text. Each special byte is encoded as 3 bytes (an equals sign followed by two hexadecimal digits), but if such bytes are rare, there is not much expansion. There is also a little encapsulation overhead.
Base64: this has an expansion rate of 4/3 (3 input bytes become 4 encoded bytes), plus a little encapsulation overhead. So, if there are a lot of special characters (such as binary data, or a non-ASCII character set), this is a much better encoding than quoted-printable.
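The two expansion rates can be compared with Python's standard quopri and base64 modules. This is only a sketch: the helper encoded_sizes and the sample inputs are assumptions, and the MIME encapsulation overhead is not counted.

```python
import base64
import quopri

def encoded_sizes(data):
    """Return (quoted-printable length, base64 length) for the given bytes."""
    return len(quopri.encodestring(data)), len(base64.b64encode(data))

# Mostly ASCII text with one special character: quoted-printable is smaller.
mostly_ascii = "The café is open.".encode("iso-8859-15")
# Nothing but non-ASCII bytes: base64 is smaller.
all_special = bytes(range(128, 256))

print(len(mostly_ascii), encoded_sizes(mostly_ascii))
print(len(all_special), encoded_sizes(all_special))
```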
RG9uJ3QgYmUgYmxhc+kgYWJvdXQgcHJlc2VudGF0aW9uLg== into the ISO-8859-15 character set.
No prizes! There are many tools to help, such as the Perl MIME::Base64 module.
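The same decoding can be sketched in Python with the standard base64 module, as an alternative to the Perl module mentioned above. The Base64 string is the one given in the exercise; the variable names are illustrative.

```python
import base64

encoded = "RG9uJ3QgYmUgYmxhc+kgYWJvdXQgcHJlc2VudGF0aW9uLg=="

# Base64 only recovers raw bytes; the declared character set says how to read them.
raw = base64.b64decode(encoded)
decoded = raw.decode("iso-8859-15")
print(decoded)  # Don't be blasé about presentation.
```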
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.