A classical compiler can be broken into four phases:
- scanner: converts characters to tokens (the character strings matched
  for each token are called lexemes)
- parser: groups sequences of tokens into the sentences (statements) of
  the language
- semantic processing: checks consistency of meaning across statements
  (for example, variable usage, agreement of function arity with each
  call, type consistency)
- code generation: the complexity of this stage can vary widely, so it
  is broken down further as follows:
  - intermediate code generation: simple code generation based on a
    translation rule for each construct of the language (avoid
    premature optimization)
  - machine independent optimization: removes redundant code arising as
    a consequence of the previous phase, using techniques such as basic
    block construction and common sub-expression elimination
  - native code generation: simple generation of target machine code
    using a translation rule from each intermediate construct to a
    sequence of machine instructions
  - machine dependent optimization: optimization of the code from the
    previous phase, usually by peephole optimization
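
To make the division of labour concrete, here is a minimal sketch of the phases up to machine independent optimization, written in Python for a toy language of sums and products. Everything in it is an assumption made for illustration and not part of the course material: the grammar, the tuple representation of tokens, trees and three-address instructions, and the function names `scan`, `parse`, `check`, `gen` and `eliminate_common_subexpressions`. Native code generation and peephole optimization are left out to keep the sketch short.

```python
import re

# One token per match: a number, an identifier, or any single other character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")

def scan(text):
    """Scanner: convert characters to (kind, value) tokens."""
    tokens = []
    for number, name, op in TOKEN_RE.findall(text.strip()):
        if number:
            tokens.append(("num", int(number)))
        elif name:
            tokens.append(("id", name))
        elif op in "+*()":
            tokens.append((op, op))
        else:
            raise SyntaxError(f"unexpected character {op!r}")
    return tokens

def parse(tokens):
    """Parser: expr := term ('+' term)*, term := factor ('*' factor)*."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def take(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, found {peek()}")
        pos += 1
        return tokens[pos - 1][1]

    def factor():
        if peek() == "(":
            take("(")
            node = expr()
            take(")")
            return node
        if peek() == "num":
            return ("num", take("num"))
        return ("id", take("id"))

    def term():
        node = factor()
        while peek() == "*":
            take("*")
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            take("+")
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()}")
    return tree

def check(tree, declared):
    """Semantic processing: every identifier used must be declared."""
    if tree[0] == "id" and tree[1] not in declared:
        raise NameError(f"undeclared variable {tree[1]}")
    if tree[0] in ("+", "*"):
        check(tree[1], declared)
        check(tree[2], declared)

def gen(tree, code):
    """Intermediate code generation: one naive rule per construct."""
    if tree[0] in ("num", "id"):
        return str(tree[1])
    left, right = gen(tree[1], code), gen(tree[2], code)
    temp = f"t{len(code)}"
    code.append((temp, tree[0], left, right))
    return temp

def eliminate_common_subexpressions(code):
    """Machine independent optimization within a single basic block."""
    seen, alias, out = {}, {}, []
    for temp, op, left, right in code:
        key = (op, alias.get(left, left), alias.get(right, right))
        if key in seen:
            alias[temp] = seen[key]      # reuse the earlier temporary
        else:
            seen[key] = temp
            out.append((temp,) + key)
    return out
```

For the input `(a + b) * (a + b)`, `gen` emits three instructions because it translates each `+` separately, and the optimizer then reduces them to two by reusing the first sum. This is the sense in which the intermediate phase can stay simple and leave the cleanup to a later phase.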

The languages we will examine fall into three categories: source,
intermediate and machine instructions. The dividing lines between
source and intermediate, and between intermediate and machine
instructions, are blurred: some intermediate languages could be viewed
as source languages and others as machine instructions. We now examine
each of these phases in detail, which will take several weeks.