Compiler Structure

A classical compiler can be broken into four phases:

scanner

groups the characters of the source text into tokens (the character string matched for each token is its lexeme)

parser

recognizes sentences (statements) in the sequence of tokens

semantic processing

checks consistency of meaning across statements (for example, variable usage, agreement between a function's arity and its calls, type consistency)

code generation

the complexity of this stage can vary widely, so it is broken down further as follows:

intermediate code generation: simple code generation based on a translation rule for each construct of the language (avoid premature optimization)
machine independent optimization: removes redundant code arising as a consequence of the previous phase, using techniques such as basic block construction and common sub-expression elimination
native code generation: simple generation of target machine code using a translation rule from each intermediate construct to a sequence of machine instructions
machine dependent optimization: optimization of the code from the previous phase, usually by peephole optimization
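
The first phases can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method these notes go on to develop: the tiny assignment language, the tuple-based tree, the (dest, op, left, right) intermediate form and all the names (scan, Parser, check, gen) are invented for the example.

```python
import re

# Token pattern for a tiny assignment language (an assumption for this sketch).
TOKEN_RE = re.compile(
    r"(?P<NUM>[0-9]+)|(?P<ID>[A-Za-z_][A-Za-z_0-9]*)|(?P<OP>[+*()=])|(?P<BAD>\S)"
)

def scan(text):
    """Scanner: convert characters to (kind, lexeme) tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        if kind == "BAD":
            raise SyntaxError(f"unexpected character {match.group()!r}")
        # Operators use their own lexeme as the token kind.
        tokens.append((match.group() if kind == "OP" else kind, match.group()))
    tokens.append(("EOF", ""))
    return tokens

class Parser:
    """Parser: recognize the statement  ID = expr  by recursive descent."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos][0]

    def take(self, kind):
        found, lexeme = self.tokens[self.pos]
        if found != kind:
            raise SyntaxError(f"expected {kind}, found {found}")
        self.pos += 1
        return lexeme

    def statement(self):
        target = self.take("ID")
        self.take("=")
        tree = ("assign", target, self.expr())
        self.take("EOF")
        return tree

    def expr(self):  # expr -> term ('+' term)*
        node = self.term()
        while self.peek() == "+":
            self.take("+")
            node = ("+", node, self.term())
        return node

    def term(self):  # term -> factor ('*' factor)*
        node = self.factor()
        while self.peek() == "*":
            self.take("*")
            node = ("*", node, self.factor())
        return node

    def factor(self):  # factor -> NUM | ID | '(' expr ')'
        if self.peek() == "NUM":
            return ("num", int(self.take("NUM")))
        if self.peek() == "ID":
            return ("var", self.take("ID"))
        self.take("(")
        node = self.expr()
        self.take(")")
        return node

def check(node, declared):
    """Semantic processing: each variable must be declared before it is used."""
    kind = node[0]
    if kind == "var" and node[1] not in declared:
        raise NameError(f"undeclared variable {node[1]!r}")
    if kind in ("+", "*"):
        check(node[1], declared)
        check(node[2], declared)
    if kind == "assign":
        check(node[2], declared)
        declared.add(node[1])  # the target counts as declared from here on

def gen(node, code):
    """Intermediate code generation: one translation rule per construct."""
    kind = node[0]
    if kind in ("num", "var"):
        return str(node[1])
    if kind == "assign":
        code.append((node[1], "copy", gen(node[2], code), None))
        return node[1]
    left = gen(node[1], code)
    right = gen(node[2], code)
    temp = f"t{len(code)}"  # fresh temporary; assumes no source variable is named tN
    code.append((temp, kind, left, right))
    return temp
```

For the input x = (a + b) * (a + b), gen emits the sum a + b twice. That is deliberate: the translation rules stay simple, and removing the redundancy is left to the machine independent optimization phase.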

The languages we will examine fall into three categories: source, intermediate and machine instructions. The dividing lines between source and intermediate, and between intermediate and machine instructions, are blurred: some intermediate languages could be viewed as source languages and some as machine instructions. We will now examine each of these phases in detail (this will take several weeks).
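
To make the step from intermediate language to machine instructions concrete, here is a minimal sketch of the three later phases applied to the intermediate code for x = (a + b) * (a + b). Everything in it is an assumption made for illustration: the (dest, op, left, right) intermediate form, the hypothetical stack machine with PUSH, POP, STORE, ADD and MUL (STORE copies the top of the stack to memory without popping it), and the function names. The small interpreter is there only to check that the optimizations preserve the result.

```python
# Intermediate code as (dest, op, left, right) tuples (format assumed for this sketch).
INTERMEDIATE = [
    ("t0", "+", "a", "b"),
    ("t1", "+", "a", "b"),
    ("t2", "*", "t0", "t1"),
    ("x", "copy", "t2", None),
]

def eliminate_common_subexpressions(code):
    """Machine independent optimization: local common sub-expression
    elimination over one basic block, assuming each name is assigned once."""
    seen = {}   # (op, left, right) -> name that already holds that value
    alias = {}  # removed temporary -> surviving name
    out = []
    for dest, op, left, right in code:
        left, right = alias.get(left, left), alias.get(right, right)
        key = (op, left, right)
        if op != "copy" and key in seen:
            alias[dest] = seen[key]  # reuse the earlier result, emit nothing
            continue
        seen[key] = dest
        out.append((dest, op, left, right))
    return out

def to_machine(code):
    """Native code generation: a fixed instruction sequence per intermediate op."""
    opcodes = {"+": "ADD", "*": "MUL"}
    out = []
    for dest, op, left, right in code:
        out.append(("PUSH", left))
        if op != "copy":
            out.append(("PUSH", right))
            out.append((opcodes[op], None))
        out.append(("POP", dest))
    return out

def peephole(machine):
    """Machine dependent optimization: within a two-instruction window,
    rewrite  POP t; PUSH t  as the single instruction  STORE t."""
    out = []
    for instr in machine:
        if out and out[-1][0] == "POP" and instr == ("PUSH", out[-1][1]):
            out[-1] = ("STORE", out[-1][1])
        else:
            out.append(instr)
    return out

def run(machine, memory):
    """Interpret the hypothetical stack machine and return the final memory."""
    memory, stack = dict(memory), []
    for opcode, arg in machine:
        if opcode == "PUSH":
            stack.append(int(arg) if arg.isdigit() else memory[arg])
        elif opcode == "POP":
            memory[arg] = stack.pop()
        elif opcode == "STORE":
            memory[arg] = stack[-1]
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if opcode == "ADD" else left * right)
    return memory
```

Calling to_machine(INTERMEDIATE) gives 14 instructions, while peephole(to_machine(eliminate_common_subexpressions(INTERMEDIATE))) gives 8, and run computes the same value of x for both.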

masjap@
Fri Oct 21 18:42:03 BST 1994