Next: Global Optimisation Up: C73-Compiler Techniques Previous: Run-time Systems

Native Code Generation

Aim to keep the greater part of the compiler machine independent. Native CG can be avoided completely by interpreting the IR, either in software or in microcode.

Expanding the IR into native code gives poor quality because decisions are localised, in particular redundant store/load sequences result. A practical solution is peephole optimisation (see later). Expansion has three tasks:

  1. instruction selection
  2. addressing mode selection-potentially complex on CISC
  3. register allocation-sophisticated techniques for RISC
Lots of possibilities: exhaustive enumeration is impractical and messy. The best strategy is driven off the state of the operands (where each value is now).
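A minimal sketch of expansion driven off operand state (the opcodes, the IR tuple shape and the single "reg" descriptor are all invented for illustration): an address descriptor records where each value currently lives, so the expander avoids reloading values already in registers.

```python
# Sketch: expansion driven off the state of operands.  The address
# descriptor 'where' records whether a value is currently in a
# register; instruction names (LOAD, ADD...) are hypothetical.

def expand(ir):
    where = {}                         # value -> "reg" (absent = in memory)
    code = []

    def in_reg(v):
        if where.get(v) != "reg":      # load only if not already in a register
            code.append(f"LOAD {v}")
            where[v] = "reg"

    for op, dest, a, b in ir:          # IR tuples: (op, dest, src1, src2)
        in_reg(a)
        in_reg(b)
        code.append(f"{op.upper()} {dest},{a},{b}")
        where[dest] = "reg"            # result is left in a register
    return code

print(expand([("add", "t1", "x", "y"), ("add", "t2", "t1", "x")]))
# x and y are each loaded once; t1 and x are then reused from registers
```

Note how the second add emits no loads at all, which is exactly the redundancy a naive expander would introduce.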

Register Allocation

Should temporaries be held in regs or in memory? Three classes of register:

  1. allocatable: e.g. parameter regs
  2. reserved: pc, sp, fp, nil, 0...
  3. volatile: very short lifetime, such as needed on a RISC for transit from one memory location to another.

Easy with an infinite supply of registers, which is what most front-ends assume. In mapping the infinite set to a finite set it may be necessary to save some values to memory (spilling). Questions are: which reg to allocate? and which reg to spill? Two techniques:

value numbering
An incremental method (highly localised decision making). Enumerate each value computed (by symbolic execution of the code) and spill the register whose next reference is most distant.
register colouring
More sophisticated (makes decisions for a whole code sequence). Construct a conflict graph: an arc from v-i to v-j indicates that those values must co-exist and therefore that a different register is required for each. Register allocation is then reduced to colouring the graph so that no two adjacent nodes have the same colour. If more colours are needed than there are registers, spill by splitting the graph (heuristically) and recolouring.
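A toy illustration of the colouring approach (a greedy highest-degree-first heuristic rather than a production algorithm; the graph is invented):

```python
# Sketch: colour a conflict graph with nregs colours; values that
# cannot be coloured would be spilled (split and recoloured).

def colour(conflicts, nregs):
    """conflicts: value -> set of values it must co-exist with."""
    assignment, spilled = {}, []
    # heuristic: colour highest-degree nodes first
    for v in sorted(conflicts, key=lambda v: -len(conflicts[v])):
        used = {assignment[w] for w in conflicts[v] if w in assignment}
        free = [r for r in range(nregs) if r not in used]
        if free:
            assignment[v] = free[0]
        else:
            spilled.append(v)          # no colour left: spill this value
    return assignment, spilled

# a, b, c mutually conflict (a triangle needs 3 colours); d only with a
g = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(colour(g, 2))   # with 2 registers one value of the triangle spills
```

The triangle a-b-c needs three colours, so with two registers one of those values is reported as spilled, while d can share b's register because they never conflict.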

Peephole Optimisation

Can be applied to the IR and to native code: 2/3 tuples or machine instructions are examined together. Specification is in terms of rules/patterns falling into seven classes:

  1. constant folding-arithmetic with constant arguments; a constant added to a register can be turned into a register-index operation.
  2. strength reduction-replace a slow operation with an equivalent fast one: rewriting multiplication with shifts and adds (a*b with b=2^i is equivalent to a<<i; with b = 2^i + 2^j + ... it is equivalent to a<<i + a<<j + ...), and quotient or remainder by a power of two with a shift or a logical and respectively.
  3. null sequences-removal of useless operations such as addition of 0, multiplication by 1 and division by 1.
  4. combine operations-adjacent branch collapsing: BZ L1; BR L2 (where L1 immediately follows) becomes BNZ L2.
  5. algebraic laws-use commutativity (not associativity because of rounding problems) to reorder arguments canonically (gathering literals) so that constant folding can be applied.
  6. special cases-machine dependent: addition of 1 becomes increment (subtraction of 1, decrement); load 0 becomes clr; etc.
  7. addressing mode operations-move arithmetic into addressing mode (CISC), previous case plus array index computation.
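Several of these classes (null sequences, strength reduction, branch collapsing) can be sketched as one toy pass over symbolic instruction tuples; the opcodes are invented:

```python
# Sketch: peephole pass over (opcode, ...) tuples, repeated to a
# fixed point.  Rules look at a window of one or two instructions.

def peephole(code):
    out, changed = list(code), True
    while changed:
        changed = False
        for i, ins in enumerate(out):
            op = ins[0]
            if op in ("add", "mul"):
                dest, src, k = ins[1], ins[2], ins[3]
                # null sequences: x+0 and x*1 become plain moves
                if (op == "add" and k == 0) or (op == "mul" and k == 1):
                    out[i] = ("mov", dest, src); changed = True
                # strength reduction: multiply by 2**i -> shift left by i
                elif op == "mul" and isinstance(k, int) and k > 1 \
                        and k & (k - 1) == 0:
                    out[i] = ("shl", dest, src, k.bit_length() - 1)
                    changed = True
            # branch collapsing: BZ L1; BR L2 (L1 next) -> BNZ L2
            elif op == "bz" and i + 1 < len(out) and out[i + 1][0] == "br":
                out[i:i + 2] = [("bnz", out[i + 1][1])]
                changed = True
                break                  # list length changed: rescan
    return out

print(peephole([("mul", "t", "a", 8), ("add", "u", "t", 0),
                ("bz", "L1"), ("br", "L2")]))
```

Iterating to a fixed point matters: one rewrite (the multiply becoming a shift) can expose another pattern to a later pass over the same window.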

Code Generator Generators

Code generation can also be automated to a large degree. Target code is selected by a set of rules; new rules are added to accommodate new instructions or architectures. There are several obstacles to general purpose CGGs:

  1. lack of uniformity in computer architecture makes defining a universal description language difficult
  2. generality leads to sub-optimality-nevertheless competitive
  3. often there are several ways to compute something-this must not confuse the CGG (TWIG uses dynamic programming; others use backtracking (not good) or Graham-Glanville heuristics)
  4. hard to interface with machine dependent optimisers
  5. complexity makes them slow

Description driven: the implementor describes the machine in a formal language and the generator chooses the instructions. Heuristic techniques can lead to blocking (dead-ends) and back-tracking (to be avoided).

Grammar-based CGG-Graham-Glanville

Derived from an analogy with parsing: convert the IR to prefix form and parse it with a grammar whose productions correspond to target instructions, emitting code on reduction. Like LR but with a different treatment of ambiguity: a shift-reduce conflict always shifts, delaying code generation as late as possible (when as much information as possible is available); a reduce-reduce conflict is resolved by a separate set of rules. Quality can equal that of ordinary non-optimising compilers.
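A much-simplified sketch of the idea, using recursive descent over the prefix form rather than a real LR parser (the instruction names are invented): the choice between the rules for '+' is delayed until both operands are known, echoing the delay-as-late-as-possible treatment of conflicts.

```python
# Sketch: code generation from prefix-form IR.  A more specific rule
# (reg + constant -> ADDI) is preferred over the general one (ADD).

def gen(tokens, code):
    """Consume one prefix expression; return the name holding its value."""
    t = tokens.pop(0)
    if t == "+":
        a = gen(tokens, code)          # left operand
        b = gen(tokens, code)          # right operand
        if b.isdigit():                # delayed choice: constant operand?
            code.append(f"ADDI {a},{b}")
        else:
            code.append(f"ADD {a},{b}")
        return a                       # result accumulates with a
    return t                           # leaf: variable or constant

code = []
gen(["+", "+", "a", "b", "4"], code)   # (a + b) + 4 in prefix form
print(code)
```

The inner sum uses the general register-register rule, while the outer addition of the literal 4 selects the immediate form.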

Tree-matching-TWIG

Full description in Aho, Ganapathi and Tjiang. Post Graham-Glanville. Twig is a tree manipulation language based on pattern matching and dynamic programming. It has been applied to code generation and to cell synthesis in VLSI layout.

The problem is that more than one match may exist (indeed it would be very hard to write a set of patterns in which there were no ambiguities), so dynamic programming is used to determine the minimum cost cover of the subtree.

Algorithm:

  1. Depth-first traversal of the tree, matching subtrees of the IR against the rules.

  2. At each node use the cost function to determine the best match and replace the subtree in the IR by the rule's left-hand side.

  3. Replacement can be delayed until the cost of a larger enclosing match is determined. Hence a minimum cost cover of the whole IR tree is achieved.

  4. Second depth-first traversal executing the actions associated with the replacements, emitting target m/c instructions.
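A toy version of the matching and costing steps: match patterns against an expression tree and compare alternative covers to pick the cheapest (the patterns, costs and instruction names are all made up).

```python
# Sketch: minimum cost cover of an expression tree.  A tree is
# ("op", left, right) or a leaf string; "_" in a pattern is a hole
# whose subtree must be covered separately.

PATTERNS = [                           # (name, pattern, cost) - invented
    ("addri", ("+", "_", "const"), 1), # reg + immediate
    ("addrr", ("+", "_", "_"), 2),     # reg + reg
    ("loadi", "const", 1),             # load immediate
    ("load",  "var", 1),               # load variable
]

def match(pat, tree):
    """Return the subtrees bound to '_' holes, or None on mismatch."""
    if pat == "_":
        return [tree]
    if isinstance(pat, tuple):
        if not isinstance(tree, tuple) or pat[0] != tree[0]:
            return None
        holes = []
        for p, t in zip(pat[1:], tree[1:]):
            m = match(p, t)
            if m is None:
                return None
            holes += m
        return holes
    if pat == "const":                 # leaf patterns
        return [] if isinstance(tree, str) and tree.isdigit() else None
    if pat == "var":
        return [] if isinstance(tree, str) and tree.isalpha() else None
    return None

def cover(tree):
    """Return (min cost, patterns used) for this subtree."""
    best = None
    for name, pat, c in PATTERNS:
        m = match(pat, tree)
        if m is None:
            continue
        cost, names = c, [name]
        for sub in m:                  # add the cost of filling each hole
            sc, sn = cover(sub)
            cost, names = cost + sc, sn + names
        if best is None or cost < best[0]:
            best = (cost, names)
    return best

print(cover(("+", ("+", "a", "b"), "4")))
# the cheap addri pattern wins at the root over addrr plus loadi
```

Both addri and addrr match the root; keeping both candidates and comparing total costs (rather than committing to the first match) is what yields the minimum cost cover.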

See slide of figure 1.

Rules are analogous to the productions in a context-free grammar and the pruning of the tree is akin to parsing. As with Graham-Glanville, the problem is ambiguity. The difference is that there is no left-to-right bias and the use of dynamic programming concurrent with selection leads to an optimal covering.

The tree matching algorithm is based on Aho+Corasick's linear-time keyword matching algorithm. Problem: find all substrings of an input string that belong to a given set of keywords (which paraphrases to CG pattern matching). Solution: construct a trie from the keywords, convert it into a pattern matching automaton (O(n) in the total length of the keywords) and then use it to search for the keywords in the input.
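A compact version of that automaton (trie plus failure links built by BFS), run on the classic "ushers" example:

```python
# Sketch: Aho-Corasick matching.  build() makes a trie over the
# keywords and adds failure links; find() then scans the text once.
from collections import deque

def build(keywords):
    trie, out, fail = [{}], [set()], [0]
    for kw in keywords:                      # insert keyword into trie
        n = 0
        for ch in kw:
            if ch not in trie[n]:
                trie.append({}); out.append(set()); fail.append(0)
                trie[n][ch] = len(trie) - 1
            n = trie[n][ch]
        out[n].add(kw)
    q = deque(trie[0].values())              # depth-1 nodes fail to root
    while q:                                 # BFS to set failure links
        n = q.popleft()
        for ch, child in trie[n].items():
            f = fail[n]
            while f and ch not in trie[f]:
                f = fail[f]
            fail[child] = trie[f].get(ch, 0)
            out[child] |= out[fail[child]]   # inherit shorter suffix matches
            q.append(child)
    return trie, out, fail

def find(text, keywords):
    trie, out, fail = build(keywords)
    n, hits = 0, []
    for i, ch in enumerate(text):
        while n and ch not in trie[n]:       # follow failure links
            n = fail[n]
        n = trie[n].get(ch, 0)
        hits += [(i - len(kw) + 1, kw) for kw in out[n]]
    return hits

print(sorted(find("ushers", ["he", "she", "his", "hers"])))
# -> [(1, 'she'), (2, 'he'), (2, 'hers')]
```

Because each character of the text is consumed once and failure links only move towards the root, the scan is linear in the length of the input.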

Dynamic programming is based on earlier work by Aho+Johnson on optimal CG from trees. Given a tree T,

  1. Compute an array of costs C for each node n, where C[i] is the optimal cost of computing the subtree rooted at n into a register, assuming i registers are available. This is O(n) in the size of the tree.

  2. Traverse T using cost vectors to decide which nodes should be computed into memory (rather than registers), and

  3. Traverse subtrees, generating code, both of which are also O(n).
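The flavour of the cost computation can be illustrated, in much-simplified form, by Sethi-Ullman numbering (a special case of such per-node costs, and an assumption here rather than Aho+Johnson's actual vectors): it computes how many registers each subtree needs when evaluated without stores.

```python
# Sketch: Sethi-Ullman register need.  Trees are ("op", l, r) or a
# leaf string; need(t) = registers to evaluate t with no spilling.

def need(t):
    if not isinstance(t, tuple):
        return 1                       # a leaf is loaded into one register
    _, l, r = t
    nl, nr = need(l), need(r)
    # evaluate the needier side first; its result then occupies one
    # register while the other side is evaluated
    return max(nl, nr) if nl != nr else nl + 1

print(need(("*", ("+", "a", "b"), ("+", "c", "d"))))
# each sum needs 2 registers, so the product needs 3
```

Subtrees whose need exceeds the available registers are exactly the candidates for computing into memory in step 2.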

Twig pattern-action rules are of the form:

label-id: pattern [cost] [=action], where label-id is an identifier; pattern is a parenthesised prefix expression (the rule matches when the pattern matches and the cost code does not abort); cost is code which computes (i) the cost and (ii) whether to call the action; and action is also code. See slide of example 2 from the paper.

Results: in comparison with another C compiler, generated code ranged between 10% and 35% better, but with slightly higher overheads due to dynamic storage management; in comparison with another Pascal compiler, it was on average 40% better than an existing hand-coded generator. Practical advantages of twig are that it allows incremental development of the code generator, that ambiguities are not a problem because dynamic programming resolves conflicts, and that quality is high (provably optimal covering).

PO Generators

Definition of the effect of instructions at register transfer level (below assembler), including memory modifications; most instructions are composite under this rewriting (RISC approach). The difference from hand-coded POs is that a generated PO is exhaustive whilst a hand-coded one is not. Quality is very high. The strategy is embodied in the diagram:

          naive----------------> optimized
          code                   code
           |                     ^
substitute |                     | synthesize
           |                     |
           V                     |
          RT/                   RT/
          mem-ops-------------->mem-ops
                   optimize

However, repeated exhaustive search would not be very fast. The POG is trained (cf. neural nets) by feeding it code samples to generate sets of common optimisations which are then used in the production PO (effectively a specialisation of the POG).




masjap@
Fri Oct 21 18:42:03 BST 1994