Next: Basic Code Generation Up: C73-Compiler Techniques Previous: Parsing

Semantic Processing

Symbol table construction: data structure revision (association lists, balanced trees, hash tables). Storage of attributes, checking declarations, type compatibility, structure accesses (is a structure, field exists), number of arguments matches definition, type checking, type inference. Two approaches to semantic processing.

Type checking

none: assembler, BCPL
static: textual (PASCAL) or type inference (ML)
dynamic: tagged or boxed (Lisp, Spitbol, Basic, PASCAL (variant records)) or multiple homogenous heaps (address ranges). Cover later under run-time systems.

Brief discussion of textual type checking. Simple structure comparison: by equality of declared type name as stored in symbol table (monomorphic typing).

Type Inference

Or type synthesis-type of program is constructed mechanically. Must place restrictions on language to make it decidable. Aim is to provide polymorphism to avoid the tiresome replication of code demanded by textual systems. But first, we need to revise unification.

Notes for type inference algorithm and rules: A is a set of axioms, such as that 0 is an integer, + is a function from integer x integer -> integer, etc. The notation A.x:sigma means the extension of the set of axioms by the assertion that x has type sigma. First few rules correspond to syntactic structures of the target language. The rec rule is for recursive definitions and spec and gen are meta-rules, where generalize permits the deduction of a quantified type (ie. polymorphism) and specialize permits the replacement of a quantifed type by a specific type (an example follows). In the length example, we have the set of axioms that :alpha list, 0:int, 1:int, +:int * int -> int, tl:alpha list -> alpha list.

However, there are problems:

- (fn(f) => (f 3, f true))(fn(x) => x);
Type clash  in:  (f true)
Looking  for a:  int
I have found a:  bool

This fails because it specializes too early and restricts the type of f to int -> int. While in:

- let val f = fn(x) => x in (f 3, f true) end;
> (3,true) : int * bool

The type of f is finalized as alpha -> alpha before it starts to type-check the body of the let.

Simplest algorithm is Hindley-Milner, which can only handle shallow types (types which are universally quantified at the top-level and quantifiers cannot occur nested) and which has been shown to be exponential worst case, although average case behaviour is good. See handout for informal description of inference operation and second handout for Cardelli's set of rules. Use SPEC to deduce a specialised type of a function involving type variables and GEN to deduce the quantified type.

Syntax-directed translation

Construct attributed tree representation of the program-for example, if node, procedure node, block node, assignment node etc.. Semantic checking (as above) carried out by looking up and down tree. Can simultaneously do code-generation (incremental-Cornell Program Synthesizer), however cannot do high quality native CG at this stage, more common is an intermediate representation (IR, for example Diana) or an idealised instruction set for later optimisation and expansion.

Action symbols

Symbols are inserted into the grammar to indicate where and when an action routine should be called. The parse stack contains one entry per symbol matched-the semantic stack is a parallel stack of semantic information for manipulation by the semantic routines. These can also be used for code generation, but that aspect will be covered in the next major section.

For example, consider the semantic processing of an if statement:

S = 'if', E, #startif 'then', {S}, #dothen 'else', {S}, #doelse 'fi', #finishif

LL:

When an action symbol appears on the parse stack the corresponding routine is called. But there is a problem of access to the semantic stack since an LL parser does leftmost derivation and because the semantic information is needed after it has been parsed the semantic stack is more a queue or trail with dirty access.

LR:

Problem is that productions are only identified (reduced) when bullet reaches the right hand end, therefore cannot trigger action routines in the middle, therefore we rewrite the above production:

S = if head, then part, else part, #finishif;
if head = 'if', E, #startif;
then part = 'then', {S}, #dothen;
else part = 'else', {S}, #doelse;

clearly, this is very simple and can be done automatically, which YACC does.

The semantic stack entries can be manipulated in YACC by the dollar variables: $1, ... ,$n refer to the rhs symbols matched and $$ to the lhd non-terminal, for example:

PAIR_BEGIN inner_data more_data { $$=Fn_cons($2,$3); }

Next: Basic Code Generation Up: C73-Compiler Techniques Previous: Parsing

masjap@
Fri Oct 21 18:42:03 BST 1994