Next: Reflection Up: C73-Compiler Techniques Previous: Functional Languages

Advanced Analysis

Abstract Interpretation

Interpretation of a program where the values are replaced by their classes in order to determine information about which parts of the program are ``necessary''.

Technique is to annotate the program so that it will execute in a preferred way, but which does not change the semantics. The primary application of abstract interpretation is strictness analysis (Mycroft, 1981). Although the informal idea is intuitive, the proof of correctness is usually very complicated. Origins in [Cousot \& Cousot, 1977] which describes an imperative model, but most use in functional languages at present.

A clarifying example: consider the set of integers, [tex2html_wrap725], and the function * and that we wish to now the sign of a particular product computation. One solution is to calculate the result and check the sign. A more realistic approach is to use the ``rule of signs''. This rule is simple kind of abstract interpretation using the set [tex2html_wrap727] and associated with it are the abstraction function, [tex2html_wrap729] and we can then define the abstract multiplication function.

However, if we turn to + we do not always know the signgn of the result, so this is denoted by [tex2html_wrap731]. Although this ensures safeness, we are throwing information away whenever we map to [tex2html_wrap733].

Definition: function is strict in [tex2html_wrap735]th argument if when that is [tex2html_wrap737] the result of the function is [tex2html_wrap739]. For the purpose of strictness analysis, we define an abstract domain for [tex2html_wrap741], called [tex2html_wrap743], where [tex2html_wrap745] is its bottom element associated with [tex2html_wrap747] and [tex2html_wrap749] is associated with [tex2html_wrap751] and [tex2html_wrap753]. We represent definitely non-terminating expressions by [tex2html_wrap755] and those which may (but may not) terminate by [tex2html_wrap757], so that the abstraction function is defined by:

[tex2html_wrap719] [tex2html_wrap720]

Convenient to extend domain for tuples, but will not detail, so that we can simplify the notion of strictness to [tex2html_wrap759], where [tex2html_wrap761] appears as the [tex2html_wrap763]th argument on the lhs. This means that [tex2html_wrap765] will sometimes be associated with a non-terminating computation and so certain strict argumetns will not be detected as such. To complete our environment for abstract interpretation, we now have to define abstract versions of the primitive functions of the language.

First consider the binary arithmetic operators. Because they are strict in both arguments, their abstract versions are given by logical conjunction over [tex2html_wrap767].

Second, consider conditional expressions. If the condition does not terminate, then the whole expression does not terminate. If the condition does terminate, then termination rests on the consequent and the alternative since we cannot detect which will be chosen. Thus [tex2html_wrap769]. Again, for simplicity, we assume we are working with first order functions, so that all are named and are therefore either user-defined or primitive. Extension to high-order is fairly straightforward.

An example:

fun f(x,y,z) = if (x=0) then (y+z) else (x-y);

which, when made into its abstract counterpart is

[tex2html_wrap721]

where [tex2html_wrap771] represents [tex2html_wrap773] etc.. Examining the internal expressions, we have [tex2html_wrap775] since = is binary strict and [tex2html_wrap779] is [tex2html_wrap781] because it is a constant. Similarly for the other expressions, obtaining:

[eqnarray394]

because the result is independent of [tex2html_wrap783]. Now we are in a position to determine which parameters are ``necessary'', since [tex2html_wrap785] and [tex2html_wrap787], from which we concludes that [tex2html_wrap789] is strict in its first two parameters, but not the third.

The next step is to be able to handle recursion abstractly. Mechanism for recursive functions is the same as for the non-recursive case, obtaining a recursive abstract function, but because its domain is small (two point!) we take advantage of it to calculate the least fixed point directly

A second example:

fun f(x,y) = if (x=0) then y else f(x-1,y);

which abstracted is:

[eqnarray402]

To find the least fixed point of [tex2html_wrap791] we use the following iteration (general form without proof):

[eqnarray412]

Applied to our function:

[eqnarray420]

so the least fixed point is [tex2html_wrap793], telling us that [tex2html_wrap795] is strict in both arguments. Note that it is not in general (with [tex2html_wrap797] arguments) sufficient to stop when consecutive iterations are the same, rather the whole matrix must be computed and termination is when the whole matrix is unchanged.

Note: this is a highly simplified description; the proofs of safeness involve some pretty heavy notation and algebra.

Continuation-Passing Style

Revise continuation-function of one argument, embodying the ``rest of the program''. An intermediate language, which has a number of features:

  1. No control stack is required, since no function ever returns, instead it calls its continuation, which is passed as the first argument (by convention).
  2. Temporary (intermediate) values are made explicit-thus making code generation simpler.
  3. Order of evaluation is fixed (by the transformation), since no non-trivial expressions remain.
  4. CPS version of program is executable, since CPS form is a functional subset of the source language (makes verification of transformation easier).
  5. The number of kinds of expression (classes of AST nodes) is small (depends on implementation, but, say, 3: lambda expression, combination, variable-just like the lambda calculus).
Conversion to CPS can be expressed in five rules-see EuLisp handout-depending on the syntactic structures to be handled (e.g. labels can be a macro for let and Y. Note that we avoid distinguished CPS nodes for if by the introduction of the extensional if ef and for progn (sequencing) because it can be expressed as a sequence of combinations. Note that the above is just one particular CPS translation-an infinite variety is possible depending on argument order and the choice of primitives.

Why do this? Environment and closure analysis is simpler-lifetimes affect allocation decisions; code generation is simpler. Closures come in three forms:

  1. Full closures-generated when the lambda expression (or the variable bound to it) appears in a non-operator position.
  2. Partial closures-generated when the lambda expression (or variable) occurs only in an operator position within other fully or partially closed functions. This means the environment has to be constructed, but there is no need for a code pointer.
  3. No closures-when an operator anywhere else.
References: [Steele, 1978], [Kranz et al, 1986].

Single Static Assignment Form

Compare with single assignment languages (note: add section on dataflow technology?).

Constant Propagation

Equality of Values

Partial Evaluation


Next: Reflection Up: C73-Compiler Techniques Previous: Functional Languages


masjap@
Fri Oct 21 18:42:03 BST 1994