Hints for Exercises

Hints for Exercises: Chapter 11

Encode your name in

(a) ASCII
(b) EBCDIC
(c) UCS
(d) UTF-8

(People with non-English names may have problems with parts (a) and (b) and should instead use MIME quoted-printable.)
As hexadecimal bytes, the name "Russell" becomes:
- (a) 52 75 73 73 65 6C 6C
- (b) D9 A4 A2 A2 85 93 93
- (c) 00 00 00 52 00 00 00 75 00 00 00 73 00 00 00 73 00 00 00 65 00 00 00 6C 00 00 00 6C (taking UCS to mean 4-byte UCS-4, big-endian)
- (d) 52 75 73 73 65 6C 6C (same as ASCII for me)
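The byte sequences above can be reproduced with a short Python sketch. It assumes that the `cp037` codec (one common EBCDIC code page) stands in for EBCDIC and that `utf-32-be` stands in for 4-byte big-endian UCS; other EBCDIC code pages or UCS-2 would give different bytes.

```python
# Encode a name in the four encodings from the exercise.
# Assumptions: "cp037" stands in for EBCDIC, "utf-32-be" for UCS.
name = "Russell"


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


encodings = {
    "ASCII": "ascii",
    "EBCDIC": "cp037",
    "UCS": "utf-32-be",
    "UTF-8": "utf-8",
}

# Map each label to the hex dump of the encoded name.
encoded = {
    label: hex_bytes(name.encode(codec))
    for label, codec in encodings.items()
}

for label, text in encoded.items():
    print(f"{label:7} {text}")
```

A name containing characters outside ASCII would raise `UnicodeEncodeError` for the `ascii` codec, which is the problem that the note about non-English names refers to.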
Investigate the use of multiple character sets as a solution to representing non-European characters.
The intent of this exercise is to see whether there could be a reasonable mechanism that could allow us to specify the interpretation of blocks of bytes. For example, a MIME block containing English followed by a MIME block containing Japanese. Such mechanisms will probably be quite cumbersome.
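As a rough illustration of such a mechanism, the sketch below builds a multipart MIME message whose two parts declare different character sets, then decodes each part according to its own declaration. The choice of `us-ascii` and `iso-2022-jp`, and the sample text, are assumptions made for the example.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# One MIME block per character set: English first, then Japanese.
message = MIMEMultipart("mixed")
message.attach(MIMEText("Hello", "plain", "us-ascii"))
message.attach(MIMEText("こんにちは", "plain", "iso-2022-jp"))

# A receiver must read each part's charset before interpreting its bytes.
decoded_parts = []
for part in message.get_payload():
    charset = part.get_content_charset()
    raw = part.get_payload(decode=True)  # bytes as carried in the part
    decoded_parts.append((charset, raw.decode(charset)))

for charset, text in decoded_parts:
    print(charset, text)
```

Even in this small case every block carries its own headers, which gives a feel for how cumbersome the approach becomes when the character set changes often within a document.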
Read Jonathan Swift's Gulliver's Travels (in particular the adventure in the Empire of Blefuscu). Explain the relevance.
This presages the big-endian vs little-endian debate: two nations go to war over which end of an egg you should open first. Swift intended it as a parody of a prevailing religious debate, but it still serves us nicely in the computer age.
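The computing version of the dispute can be seen in a minimal sketch using Python's `struct` module; the value `0x12345678` is an arbitrary choice for illustration.

```python
import struct

value = 0x12345678

# Big-endian: most significant byte first.
big_endian = struct.pack(">I", value)
# Little-endian: least significant byte first.
little_endian = struct.pack("<I", value)

print(big_endian.hex(), little_endian.hex())

# Reading big-endian bytes with the wrong convention gives a different number.
(misread,) = struct.unpack("<I", big_endian)
print(hex(misread))
```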
Write some pairs of XDR programs that transmit various kinds of data, including floating point numbers, strings and more complicated data structures.
Basic code is given in the book. More interesting is the transmission of compound data structures like vectors or linked lists. There are many example programs to be found using Google.
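For a feel of what goes on the wire, here is a hand-rolled sketch of XDR-style encoding in Python. It is not the book's code and does not use an XDR library; it assumes the usual XDR conventions (big-endian fields, a 4-byte length before strings and variable-length arrays, padding to a multiple of 4 bytes), and all function names are hypothetical.

```python
import struct


def pack_double(value: float) -> bytes:
    """A double is 8 bytes, IEEE 754, big-endian."""
    return struct.pack(">d", value)


def pack_string(text: str) -> bytes:
    """A string is a 4-byte length, the bytes, then zero padding to 4n."""
    data = text.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def pack_double_array(values) -> bytes:
    """A variable-length array is a 4-byte count followed by the elements."""
    return struct.pack(">I", len(values)) + b"".join(
        pack_double(v) for v in values
    )


def unpack_string(buffer: bytes, offset: int):
    """Return the decoded string and the offset just past its padding."""
    (length,) = struct.unpack_from(">I", buffer, offset)
    start = offset + 4
    text = buffer[start:start + length].decode("ascii")
    padded = length + (4 - length % 4) % 4
    return text, start + padded


def unpack_double_array(buffer: bytes, offset: int):
    """Return the list of doubles and the offset just past the array."""
    (count,) = struct.unpack_from(">I", buffer, offset)
    values = list(struct.unpack_from(f">{count}d", buffer, offset + 4))
    return values, offset + 4 + 8 * count


# "Sender": a record made of a name and a vector of doubles.
record = pack_string("Russell") + pack_double_array([1.5, -2.25, 3.0])

# "Receiver": decode the fields in the same order.
name, offset = unpack_string(record, 0)
vector, offset = unpack_double_array(record, offset)
print(name, vector, len(record))
```

A linked list could be sent the same way as the vector, since the receiver only needs the count and the elements to rebuild it.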
Look at the source of Web pages from countries that use a non-Western alphabet. Classify the ways that they approach the presentation problem.
A variety of approaches. Some declare the character set for the document in a header, e.g.,
```
<meta http-equiv="Content-Type" content="text/html; charset=shift_jis">
```
for an HTML header (Shift_JIS is a 2-byte encoding of Japanese characters). A MIME header works similarly. Then a pair of bytes like 93 FA is interpreted as the character 日.
Some use Unicode, often via UTF-8. This is declared in the header just as above. UTF-8 is compact when most of the document is ASCII, but loses that edge when most of it is not: a Japanese character, for example, takes 3 bytes in UTF-8 against 2 in Shift_JIS.
Others insert HTML numeric character references, such as &#x65E5;, which is rendered as the same character 日. This is OK for the occasional non-ASCII character, but impractical for continuous text.
Still others use image inserts. This has many problems: the text cannot be selected and copied, read by a screen reader, or read by a Web indexer; it is not easily modifiable; and so on.
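The text-based approaches can be compared with a small Python sketch. The sample string is an assumption chosen for the example; the codecs and `html.unescape` come from the standard library.

```python
import html

ch = "日"

# The same character under two declared character sets.
shift_jis = ch.encode("shift_jis")
utf8 = ch.encode("utf-8")

# The same character as an HTML numeric character reference.
reference = f"&#x{ord(ch):X};"

print(shift_jis.hex(), utf8.hex(), reference)

# Size of a short run of continuous Japanese text under each approach.
sample = "日本語"
sizes = {
    "shift_jis": len(sample.encode("shift_jis")),
    "utf-8": len(sample.encode("utf-8")),
    "references": len("".join(f"&#x{ord(c):X};" for c in sample)),
}
print(sizes)
```

The reference form is several times larger than either byte encoding, which is why it only suits the occasional character.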
MIME is used in many contexts. Look at the MIME encapsulation used in several applications, such as email and Web pages. What does MIME do to the size of a chunk of data?
Quoted-printable: good for the occasional special character, but not so good for continuous special text. Each special character is encoded as 3 bytes, but if such characters are rare, there is not much expansion. There is also a little encapsulation overhead.
Base64: this has an expansion rate of 4/3 (3 data bytes become 4 encoded bytes), plus a little encapsulation overhead. So, if there are a lot of special characters (such as binary data, or a non-ASCII character set), this is a much better encoding than quoted-printable.
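These expansion rates can be checked with Python's `base64` and `quopri` modules. The sample data below is made up for the illustration, and the figures ignore the MIME headers themselves.

```python
import base64
import quopri

# 300 bytes of arbitrary binary data.
binary = bytes(range(256)) + bytes(44)
b64 = base64.b64encode(binary)          # bare base64: 4 bytes out per 3 in
mime_b64 = base64.encodebytes(binary)   # MIME style, with line breaks added

# Mostly-ASCII text with two special characters.
sparse = "A naïve café menu".encode("iso8859_15")
qp_sparse = quopri.encodestring(sparse)

# Text made entirely of special characters.
dense = b"\xe9" * 20
qp_dense = quopri.encodestring(dense)
b64_dense = base64.b64encode(dense)

print(len(binary), len(b64), len(mime_b64))
print(len(sparse), len(qp_sparse))
print(len(dense), len(qp_dense), len(b64_dense))
```

For the dense case quoted-printable triples the size while base64 grows it by roughly a third, matching the comparison above.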
Decode the base64 encoded message
RG9uJ3QgYmUgYmxhc+kgYWJvdXQgcHJlc2VudGF0aW9uLg==
into the ISO-8859-15 character set.
No prizes! There are many tools to help, such as the Perl MIME::Base64 module.
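For readers without Perl, a comparable sketch in Python uses the standard `base64` module and the `iso8859_15` codec. The decoded text is deliberately not shown here, so running it is left to the reader.

```python
import base64

encoded = "RG9uJ3QgYmUgYmxhc+kgYWJvdXQgcHJlc2VudGF0aW9uLg=="

# Step 1: undo the base64 transfer encoding to recover the raw bytes.
raw = base64.b64decode(encoded)

# Step 2: interpret those bytes in the ISO-8859-15 character set.
message = raw.decode("iso8859_15")

print(message)
```

The two steps are separate on purpose: base64 only yields bytes, and the character set is what turns the one non-ASCII byte into a character.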

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.