
The Lexer

Both MathML and OpenMath are based on the structures defined by XML. The lexer must therefore check that its input is well-formed XML markup and extract the necessary tokens from the successive characters of the input source.

Our lexer must therefore tokenize XML elements and determine the attributes, and their values, that an element may carry. Both requirements must be met in order to retrieve the attributes contained in MathML elements, or to find out which symbol and content dictionary an OpenMath <OMS> tag refers to.
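As an illustration only, the following Python sketch shows one way a single tag could be split into its kind, element name and attributes. It is not the code used by this package; the names TAG, ATTR and lex_tag are hypothetical, and the regular expressions cover only simple tags with quoted attribute values.

```python
import re

# Hypothetical patterns (assumptions for this sketch, not a full XML grammar).
# TAG matches an opening, closing, or empty-element tag.
TAG = re.compile(r'<\s*(/?)\s*([A-Za-z_][\w:.-]*)([^<>]*?)(/?)\s*>')
# ATTR matches name="value" or name='value'.
ATTR = re.compile(r'([A-Za-z_][\w:.-]*)\s*=\s*("([^"]*)"|\'([^\']*)\')')

def lex_tag(text):
    """Return (kind, name, attributes) for the tag at the start of text."""
    m = TAG.match(text)
    if m is None:
        raise ValueError("not a tag: %r" % text)
    closing, name, attr_text, empty = m.groups()
    attrs = {}
    for a in ATTR.finditer(attr_text):
        # group(3) holds a double-quoted value, group(4) a single-quoted one.
        value = a.group(3) if a.group(3) is not None else a.group(4)
        attrs[a.group(1)] = value
    if closing:
        kind = 'close'
    elif empty:
        kind = 'empty'
    else:
        kind = 'open'
    return kind, name, attrs

# The symbol and content dictionary of an OpenMath <OMS> tag.
kind, name, attrs = lex_tag('<OMS cd="arith1" name="plus"/>')
print(kind, name, attrs['cd'], attrs['name'])
```

With the token in this form, the content dictionary and symbol name of an <OMS> tag can be read directly from the cd and name entries of the attribute table.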

An XML lexer must also be flexible about whitespace, ignoring any number of spaces or carriage returns contained in the input source.
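The whitespace requirement can be sketched in Python as follows. This is a minimal example under stated assumptions, not the lexer described here: TOKEN and tokens are hypothetical names, and the sketch simplifies by collapsing whitespace everywhere inside a token, including inside attribute values.

```python
import re

# Hypothetical pattern: skip leading whitespace, then take either a tag
# or a run of character data that starts with a non-space character.
TOKEN = re.compile(r'\s*(<[^<>]*>|[^<>\s][^<>]*)')

def tokens(source):
    """Split source into tag and text tokens, ignoring layout whitespace."""
    out = []
    pos = 0
    while True:
        m = TOKEN.match(source, pos)
        if m is None:
            break
        # Collapse runs of spaces, tabs and line breaks inside the token.
        out.append(' '.join(m.group(1).split()))
        pos = m.end()
    if source[pos:].strip():
        raise ValueError("unexpected input at position %d" % pos)
    return out

spaced = '<OMA>\r\n   <OMS cd="arith1"\n       name="plus"/>\n  <OMI> 3 </OMI>\r\n</OMA>\n'
compact = '<OMA><OMS cd="arith1" name="plus"/><OMI>3</OMI></OMA>'
print(tokens(spaced) == tokens(compact))
```

Both inputs produce the same token list, so later stages never see the spaces or line breaks used to lay out the source.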