
Compiler Structure

A classical compiler can be broken into four phases:

scanner
converts the stream of characters into tokens (the character sequence matched by each token is its lexeme)

parser
groups sequences of tokens into the sentences (statements) of the language

semantic processing
checks consistency of meaning across statements (for example, variable usage, agreement between a function's arity and its calls, type consistency)

code generation
complexity of this stage can vary widely, so it is broken down further as follows:
intermediate code generation
simple code generation based on a translation rule for each construct of the language (avoiding premature optimization)

machine-independent optimization
removes redundant code arising as a consequence of the previous phase, for example by basic block construction and common sub-expression elimination

native code generation
simple generation of target machine code using a translation rule from each intermediate construct to a sequence of machine instructions

machine-dependent optimization
optimization of the code from the previous phase, usually by peephole optimization.
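
The first three phases and intermediate code generation can be sketched for a single assignment statement such as x = 1 + 2 * y. This is only a minimal illustration, not the course's implementation: the grammar (an identifier, =, then sums and products of numbers and identifiers), the tuple shapes, and the function names scan, parse, check and gen are all assumptions made for the example.

```python
import re

# One token per match: a number, an identifier, or any other single character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")

def scan(text):
    """Scanner: convert characters to (kind, lexeme) tokens."""
    tokens = []
    for number, name, op in TOKEN_RE.findall(text):
        if number:
            tokens.append(("num", number))
        elif name:
            tokens.append(("id", name))
        elif op.strip():                      # ignore stray whitespace
            tokens.append(("op", op))
    return tokens

def parse(tokens):
    """Parser: recognize  id = expr  and build a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("eof", "")

    def take(kind, lexeme=None):
        nonlocal pos
        tok = peek()
        if tok[0] != kind or (lexeme is not None and tok[1] != lexeme):
            raise SyntaxError(f"unexpected token {tok!r}")
        pos += 1
        return tok[1]

    def factor():                             # factor := num | id
        if peek()[0] == "num":
            return ("num", int(take("num")))
        return ("var", take("id"))

    def term():                               # term := factor ('*' factor)*
        node = factor()
        while peek() == ("op", "*"):
            take("op")
            node = ("*", node, factor())
        return node

    def expr():                               # expr := term ('+' term)*
        node = term()
        while peek() == ("op", "+"):
            take("op")
            node = ("+", node, term())
        return node

    target = take("id")
    take("op", "=")
    tree = ("assign", target, expr())
    take("eof")                               # nothing may follow the statement
    return tree

def check(tree, defined):
    """Semantic processing: every variable read must already be defined."""
    def visit(node):
        if node[0] == "var" and node[1] not in defined:
            raise NameError(f"undefined variable {node[1]!r}")
        if node[0] in ("+", "*"):
            visit(node[1])
            visit(node[2])
    visit(tree[2])
    defined.add(tree[1])                      # the target is defined from now on

def gen(tree):
    """Intermediate code generation: one translation rule per construct."""
    code = []
    def visit(node):
        if node[0] in ("num", "var"):
            return str(node[1])
        left, right = visit(node[1]), visit(node[2])
        temp = f"t{len(code) + 1}"            # fresh temporary for each operator
        code.append((temp, node[0], left, right))
        return temp
    code.append((tree[1], "copy", visit(tree[2]), None))
    return code

tree = parse(scan("x = 1 + 2 * y"))
check(tree, {"y"})
print(gen(tree))
# [('t1', '*', '2', 'y'), ('t2', '+', '1', 't1'), ('x', 'copy', 't2', None)]
```

Note that gen makes no attempt to avoid the extra copy through t2; in the spirit of the breakdown above, removing such redundancy is left to the later optimization phases.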

The languages we will examine fall into three categories: source, intermediate and machine instructions. The dividing lines between source and intermediate, and between intermediate and machine code, are blurred: some intermediate languages could be viewed as source languages and some as machine instructions. We now examine each of these phases in detail, which will take several weeks.
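
To make the three categories concrete, the sketch below takes the statement x = 1 + 2 * y from an assumed intermediate form to instructions for a hypothetical single-accumulator machine, then applies one peephole rule. The tuple layout, the mnemonics LOAD, ADD, MUL and STORE, and the function names native and peephole are invented for illustration and do not describe any real instruction set.

```python
# Source form:        x = 1 + 2 * y
# Intermediate form:  assumed three-address tuples (dest, op, left, right)
INTERMEDIATE = [
    ("t1", "*", "2", "y"),
    ("t2", "+", "1", "t1"),
    ("x", "copy", "t2", None),
]

# Hypothetical mnemonics; operands may be literals or variable names.
OPCODES = {"+": "ADD", "*": "MUL"}

def native(code):
    """Native code generation: a fixed instruction sequence per construct."""
    out = []
    for dest, op, left, right in code:
        out.append(("LOAD", left))            # accumulator := left
        if op != "copy":
            out.append((OPCODES[op], right))  # accumulator := accumulator op right
        out.append(("STORE", dest))           # dest := accumulator
    return out

def peephole(instrs):
    """Machine-dependent optimization: drop a LOAD that directly follows
    a STORE of the same name, since the accumulator already holds it."""
    out = []
    for ins in instrs:
        if ins[0] == "LOAD" and out and out[-1] == ("STORE", ins[1]):
            continue
        out.append(ins)
    return out

for instruction in peephole(native(INTERMEDIATE)):
    print(*instruction)
```

The naive translation produces eight instructions; the peephole rule removes the LOAD t2 that follows STORE t2, leaving seven. The intermediate tuples here already look close to machine instructions, which is one way the dividing line between the categories becomes blurred.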


masjap@
Fri Oct 21 18:42:03 BST 1994