Symbol table construction: data structure revision (association lists, balanced trees, hash tables). Storage of attributes; checking declarations; type compatibility; structure accesses (is it a structure, does the field exist); number of arguments matches the definition; type checking; type inference. Two approaches to semantic processing.
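A minimal sketch of a block-structured symbol table, given in Python for concreteness; it assumes a hash-table-per-scope design, and the attribute names are illustrative, not from any particular compiler.

```python
class SymbolTable:
    def __init__(self):
        self.scopes = [{}]          # stack of hash tables; innermost scope last

    def enter_scope(self):
        self.scopes.append({})

    def leave_scope(self):
        self.scopes.pop()

    def declare(self, name, attrs):
        scope = self.scopes[-1]
        if name in scope:           # duplicate declaration in the same scope
            raise KeyError(f"duplicate declaration of {name}")
        scope[name] = attrs         # attributes: type, arity, kind, ...

    def lookup(self, name):
        # search innermost to outermost, so inner declarations shadow outer ones
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise KeyError(f"{name} not declared")

st = SymbolTable()
st.declare("x", {"type": "int"})
st.enter_scope()
st.declare("x", {"type": "bool"})   # inner x shadows outer x
assert st.lookup("x")["type"] == "bool"
st.leave_scope()
assert st.lookup("x")["type"] == "int"
```

The same interface works over association lists or balanced trees; only the per-scope representation (and hence the lookup cost) changes.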
Or type synthesis: the type of the program is constructed mechanically. Restrictions must be placed on the language to make inference decidable. The aim is to provide polymorphism, avoiding the tiresome replication of code demanded by textual systems. But first, we need to revise unification.
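As revision, here is a sketch of first-order unification over type terms, in Python for concreteness; it assumes a compact encoding where a type variable is a string and a constructed type is a tuple such as ("->", a, b) or ("int",).

```python
def resolve(t, subst):
    # follow variable bindings to a representative term
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    # does variable v occur in term t?  (prevents infinite types)
    t = resolve(t, subst)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, a, subst) for a in t[1:])
    return False

def unify(a, b, subst):
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return subst
    if isinstance(a, str):                 # a is an unbound variable
        if occurs(a, b, subst):
            raise TypeError("occurs check failed")
        return {**subst, a: b}
    if isinstance(b, str):
        return unify(b, a, subst)
    if a[0] != b[0] or len(a) != len(b):   # constructor clash
        raise TypeError(f"cannot unify {a} and {b}")
    for x, y in zip(a[1:], b[1:]):
        subst = unify(x, y, subst)
    return subst

# unifying alpha -> alpha with int -> beta binds both variables to int
s = unify(("->", "a", "a"), ("->", ("int",), "b"), {})
assert resolve("b", s) == ("int",)
```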
Notes for the type inference algorithm and rules: A is a set of axioms, such as that 0 is an integer, + is a function from integer x integer -> integer, etc. The notation A.x:sigma means the extension of the set of axioms by the assertion that x has type sigma. The first few rules correspond to syntactic structures of the target language. The rec rule is for recursive definitions; spec and gen are meta-rules, where generalize permits the deduction of a quantified type (ie. polymorphism) and specialize permits the replacement of a quantified type by a specific type (an example follows). In the length example, we have the axioms that the argument has type alpha list, 0 : int, 1 : int, + : int * int -> int, tl : alpha list -> alpha list.
However, there are problems:
- (fn(f) => (f 3, f true))(fn(x) => x);
  Type clash in: (f true)
  Looking for a: int
  I have found a: bool
This fails because the algorithm specializes too early and restricts the type of f to int -> int. While in:
- let val f = fn(x) => x in (f 3, f true) end;
  > (3, true) : int * bool
The type of f is generalized to alpha -> alpha before type-checking of the body of the let begins, so each use of f can be specialized independently.
The simplest algorithm is Hindley-Milner, which can only handle shallow types (types universally quantified at the top level; quantifiers cannot occur nested) and which has been shown to be exponential in the worst case, although average-case behaviour is good. See the handout for an informal description of the inference operation and the second handout for Cardelli's set of rules. Use SPEC to deduce a specialized type of a function involving type variables and GEN to deduce the quantified type.
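The behaviour of the two examples above can be reproduced with a small Hindley-Milner sketch, again in Python for concreteness: lambda-bound names get a monomorphic type, let-bound names are generalized (GEN) and each use is instantiated with fresh variables (SPEC). The term encoding is illustrative, not a real compiler's.

```python
import itertools

fresh = map("t{}".format, itertools.count())

def resolve(t, s):
    while isinstance(t, str) and t in s:
        t = s[t]
    return t

def zonk(t, s):                     # fully apply the substitution to a term
    t = resolve(t, s)
    return t if isinstance(t, str) else (t[0], *[zonk(a, s) for a in t[1:]])

def occurs(v, t, s):
    t = resolve(t, s)
    return t == v if isinstance(t, str) else any(occurs(v, a, s) for a in t[1:])

def unify(a, b, s):
    a, b = resolve(a, s), resolve(b, s)
    if a == b:
        return
    if isinstance(a, str):
        if occurs(a, b, s):
            raise TypeError("occurs check failed")
        s[a] = b
    elif isinstance(b, str):
        unify(b, a, s)
    elif a[0] != b[0] or len(a) != len(b):
        raise TypeError(f"type clash: {a} vs {b}")
    else:
        for x, y in zip(a[1:], b[1:]):
            unify(x, y, s)

def free_vars(t, s):
    t = resolve(t, s)
    if isinstance(t, str):
        return {t}
    return set().union(set(), *[free_vars(a, s) for a in t[1:]])

def instantiate(scheme, s):         # SPEC: quantified vars become fresh vars
    qvars, t = scheme
    ren = {q: next(fresh) for q in qvars}
    def walk(t):
        t = resolve(t, s)
        return ren.get(t, t) if isinstance(t, str) else (t[0], *map(walk, t[1:]))
    return walk(t)

def infer(env, term, s):
    kind = term[0]
    if kind == "lit":
        return ("bool",) if isinstance(term[1], bool) else ("int",)
    if kind == "var":
        return instantiate(env[term[1]], s)
    if kind == "lam":               # lambda-bound names stay monomorphic
        a = next(fresh)
        body = infer({**env, term[1]: ([], a)}, term[2], s)
        return ("->", a, body)
    if kind == "app":
        f, x, r = infer(env, term[1], s), infer(env, term[2], s), next(fresh)
        unify(f, ("->", x, r), s)
        return r
    if kind == "pair":
        return ("*", infer(env, term[1], s), infer(env, term[2], s))
    if kind == "let":               # GEN: quantify vars not free in the env
        t1 = infer(env, term[2], s)
        env_fv = set().union(set(), *[free_vars(t, s) - set(q)
                                      for q, t in env.values()])
        scheme = (sorted(free_vars(t1, s) - env_fv), t1)
        return infer({**env, term[1]: scheme}, term[3], s)
    raise ValueError(term)

# let val f = fn x => x in (f 3, f true) end  -- f is generalized, so it checks
s = {}
good = ("let", "f", ("lam", "x", ("var", "x")),
        ("pair", ("app", ("var", "f"), ("lit", 3)),
                 ("app", ("var", "f"), ("lit", True))))
assert zonk(infer({}, good, s), s) == ("*", ("int",), ("bool",))

# (fn f => (f 3, f true))(fn x => x)  -- f stays monomorphic, so it clashes
bad = ("app", ("lam", "f", ("pair", ("app", ("var", "f"), ("lit", 3)),
                                    ("app", ("var", "f"), ("lit", True)))),
       ("lam", "x", ("var", "x")))
try:
    infer({}, bad, {})
    raised = False
except TypeError:
    raised = True
assert raised
```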
Construct an attributed tree representation of the program: for example, if node, procedure node, block node, assignment node, etc. Semantic checking (as above) is carried out by looking up and down the tree. Code generation can be done simultaneously (incrementally, as in the Cornell Program Synthesizer); however, high-quality native code generation is not possible at this stage, so it is more common to produce an intermediate representation (IR, for example Diana) or an idealised instruction set for later optimisation and expansion.
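A sketch of an attributed-tree check in Python, assuming a generic node class decorated bottom-up with a type attribute; the node kinds and attribute names are illustrative (not Diana's).

```python
class Node:
    def __init__(self, kind, children=(), **attrs):
        self.kind, self.children, self.attrs = kind, list(children), attrs

def check(node, env):
    # walk the tree, filling in a 'type' attribute bottom-up
    if node.kind == "int_const":
        node.attrs["type"] = "int"
    elif node.kind == "ident":
        node.attrs["type"] = env[node.attrs["name"]]   # declaration lookup
    elif node.kind == "assign":
        lhs, rhs = node.children
        check(lhs, env)
        check(rhs, env)
        if lhs.attrs["type"] != rhs.attrs["type"]:
            raise TypeError("type clash in assignment")
        node.attrs["type"] = "void"
    elif node.kind == "if":
        cond, then_part, else_part = node.children
        check(cond, env)
        if cond.attrs["type"] != "bool":
            raise TypeError("condition must be bool")
        check(then_part, env)
        check(else_part, env)
        node.attrs["type"] = "void"
    return node

tree = Node("assign", [Node("ident", name="x"), Node("int_const", value=1)])
check(tree, {"x": "int"})
assert tree.attrs["type"] == "void"
```

An incremental system re-runs such a walk on the subtree affected by an edit; a batch compiler would instead emit IR while (or after) decorating the tree.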
Symbols are inserted into the grammar to indicate where and when an action routine should be called. The parse stack contains one entry per symbol matched; the semantic stack is a parallel stack of semantic information for manipulation by the semantic routines. These can also be used for code generation, but that aspect will be covered in the next major section.
For example, consider the semantic processing of an if statement:
S = 'if', E, #startif, 'then', {S}, #dothen, 'else', {S}, #doelse, 'fi', #finishif;
S = if head, then part, else part, #finishif;
if head = 'if', E, #startif;
then part = 'then', {S}, #dothen;
else part = 'else', {S}, #doelse;
Clearly, this transformation is very simple and can be done automatically, which is what YACC does.
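The stack discipline can be sketched in Python: as each E or {S} is matched its value lands on the semantic stack, and #finishif pops the parts and pushes the assembled node. This is illustrative only; in a real compiler #startif, #dothen and #doelse would also do work (e.g. recording positions or emitting labels), which is omitted here.

```python
semantic = []                      # parallel semantic stack

def matched(value):                # a grammar symbol was matched: push its value
    semantic.append(value)

def finishif():                    # #finishif: pop else, then, cond; push node
    else_part = semantic.pop()
    then_part = semantic.pop()
    cond = semantic.pop()
    semantic.append(("if", cond, then_part, else_part))

# simulate matching: 'if' E 'then' {S} 'else' {S} 'fi' #finishif
matched(("var", "flag"))           # E
matched(("assign", "x", 1))        # then part
matched(("assign", "x", 2))        # else part
finishif()
assert semantic == [("if", ("var", "flag"),
                     ("assign", "x", 1), ("assign", "x", 2))]
```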
The semantic stack entries can be manipulated in YACC by the dollar variables: $1, ..., $n refer to the values of the rhs symbols matched and $$ to that of the lhs non-terminal, for example:
PAIR_BEGIN inner_data more_data { $$=Fn_cons($2,$3); }
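The dollar-variable mechanism can be sketched in Python: a reduction pops one semantic value per rhs symbol ($1 ... $n, i.e. vals[0] ... vals[n-1] below) and pushes the computed $$. Fn_cons here is a stand-in for whatever helper the real action calls.

```python
def Fn_cons(head, tail):
    return [head] + tail

semantic = []                                   # parallel semantic stack

def reduce(rhs_len, action):
    vals = semantic[len(semantic) - rhs_len:]   # vals[i-1] plays the role of $i
    del semantic[len(semantic) - rhs_len:]      # pop the rhs values
    semantic.append(action(vals))               # push $$

# simulate: PAIR_BEGIN inner_data more_data { $$ = Fn_cons($2, $3); }
semantic.extend(["(", 1, [2, 3]])               # $1, $2, $3
reduce(3, lambda v: Fn_cons(v[1], v[2]))        # v[1] is $2, v[2] is $3
assert semantic == [[1, 2, 3]]
```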