Aim to keep the greater part of the compiler machine independent. Native CG can be avoided completely by interpreting the IR, either in software or in microcode.
Expanding the IR into native code gives poor quality because each tuple is expanded in isolation, producing in particular redundant store/load sequences. A practical solution is peephole optimization (see later). Expansion has three tasks:
Should temporaries be held in regs or in memory? Three classes of register:
Easy with an infinite supply of registers, which is what most front ends assume. In mapping the infinite set onto the machine's finite set it may be necessary to save some values to memory (spilling). The questions are: which register to allocate, and which register to spill? Two techniques:
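(The two techniques themselves are elided above. As one concrete illustration, not from the notes: a minimal linear-scan-style allocator in Python, with K and the interval data invented; reload code for spilled values is omitted.)

    K = 2  # number of physical registers (chosen for illustration)

    def linear_scan(intervals):
        """Linear scan over live intervals (vreg, first_use, last_use)."""
        intervals = sorted(intervals, key=lambda iv: iv[1])   # by first use
        free = ["r%d" % i for i in range(K)]
        active = []                                  # (last_use, vreg), live now
        alloc = {}
        for vreg, start, end in intervals:
            for last, v in list(active):             # expire finished intervals
                if last < start:
                    active.remove((last, v))
                    free.append(alloc[v])
            if free:
                alloc[vreg] = free.pop()
                active.append((end, vreg))
            else:                                    # no register: spill someone
                active.sort()
                last, victim = active[-1]            # furthest last use
                if last > end:                       # victim outlives us: take its reg
                    active.pop()
                    alloc[vreg] = alloc[victim]
                    alloc[victim] = "spill"
                    active.append((end, vreg))
                else:
                    alloc[vreg] = "spill"
        return alloc

    print(linear_scan([("t1", 0, 9), ("t2", 1, 3), ("t3", 2, 5)]))
    # -> {'t1': 'spill', 't2': 'r0', 't3': 'r1'}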
Can be applied both to IR and to native code: 2/3 tuples or machine instructions are examined together. Specification is in terms of rules/patterns falling into seven classes:
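(For flavour, a hedged sketch of such a pass; the two rewrite rules and the instruction tuples are invented. A real optimiser re-scans until no rule applies.)

    def peephole(code):
        """Slide a 2-instruction window, applying rewrite rules."""
        out, i = [], 0
        while i < len(code):
            a = code[i]
            b = code[i + 1] if i + 1 < len(code) else None
            # rule: ("store", r, m) ; ("load", r, m)  ->  keep only the store
            if b and a[0] == "store" and b[0] == "load" and a[1:] == b[1:]:
                out.append(a)
                i += 2
                continue
            # rule: ("add", r, 0)  ->  drop (adding zero is a no-op)
            if a[0] == "add" and a[2] == 0:
                i += 1
                continue
            out.append(a)
            i += 1
        return out

    print(peephole([("store", "r1", "x"), ("load", "r1", "x"), ("add", "r2", 0)]))
    # -> [('store', 'r1', 'x')]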
Code generation can also be automated to a large degree. Target code is selected by a set of rules; new rules are added to accommodate new instructions or architectures. There are several obstacles to general-purpose CGGs:
Description driven: the implementor describes the machine in a formal language and the generator chooses the instructions. Heuristic techniques can lead to blocking (dead ends) and back-tracking (to be avoided).
Derived from an analogy with parsing: convert the IR to prefix form and let grammar productions drive CG, emitting code as rules are reduced. Like LR but with a different treatment of ambiguity: shift-reduce conflicts always shift, delaying code generation until as late as possible (when as much information as possible will be available); reduce-reduce conflicts are resolved by a separate set of rules. Quality can equal that of ordinary non-optimising compilers.
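(A toy illustration of the idea, not the real table-driven machinery; the rules and mnemonics are invented. Note this sketch reduces eagerly, whereas a true Graham-Glanville generator drives an LR table and resolves shift-reduce conflicts by shifting, as above.)

    RULES = [                       # (stack pattern, result, instruction)
        (["+", "reg", "reg"], "reg", "ADD"),
        (["load", "mem"],     "reg", "LOAD"),
    ]

    def select(prefix_ir):
        stack, code = [], []
        for tok in prefix_ir:
            stack.append(tok)                        # shift
            changed = True
            while changed:                           # reduce while possible
                changed = False
                for rhs, lhs, insn in RULES:
                    if stack[-len(rhs):] == rhs:
                        del stack[-len(rhs):]        # pop the matched pattern
                        stack.append(lhs)
                        code.append(insn)            # emit on each reduction
                        changed = True
                        break
        return stack, code

    # prefix form of  (load x) + (load y), addresses elided:
    print(select(["+", "load", "mem", "load", "mem"]))
    # -> (['reg'], ['LOAD', 'LOAD', 'ADD'])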
Full description in Aho, Ganapathi and Tjiang. Post-Graham-Glanville. Twig is a tree-manipulation language based on pattern matching and dynamic programming. It has been applied to code generation and to cell synthesis in VLSI layout.
The problem is that more than one match may exist (indeed it would be very hard to write a set of patterns in which there were no ambiguities), so dynamic programming is used to determine the minimum cost cover of the subtree.
Algorithm:
Rules are analogous to the productions of a context-free grammar, and the pruning of the tree is akin to parsing. As with Graham-Glanville, the problem is ambiguity. The differences are that there is no left-to-right bias, and the use of dynamic programming concurrently with selection leads to an optimal covering.
The tree-matching algorithm is based on Aho and Corasick's linear-time keyword-matching algorithm. Problem: find all occurrences in an input string of keywords drawn from a given set (for CG, read trees for strings and patterns for keywords). Solution: construct a trie from the keywords, convert it into a pattern-matching automaton (O(n) in the total length of the keywords), and then use the automaton to search the input.
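(A sketch of the standard Aho-Corasick automaton itself, independent of the tree generalisation: trie, failure links by BFS, then a linear scan of the input.)

    from collections import deque

    def build(keywords):
        goto, fail, out = [{}], [0], [set()]         # node 0 is the root
        for kw in keywords:                          # 1. build the trie
            s = 0
            for ch in kw:
                if ch not in goto[s]:
                    goto.append({}); fail.append(0); out.append(set())
                    goto[s][ch] = len(goto) - 1
                s = goto[s][ch]
            out[s].add(kw)
        q = deque(goto[0].values())                  # 2. failure links by BFS
        while q:
            s = q.popleft()
            for ch, t in goto[s].items():
                q.append(t)
                f = fail[s]
                while f and ch not in goto[f]:
                    f = fail[f]
                fail[t] = goto[f][ch] if (ch in goto[f] and goto[f][ch] != t) else 0
                out[t] |= out[fail[t]]               # inherit shorter matches
        return goto, fail, out

    def search(text, keywords):                      # 3. linear scan of input
        goto, fail, out = build(keywords)
        s, hits = 0, []
        for i, ch in enumerate(text):
            while s and ch not in goto[s]:
                s = fail[s]
            s = goto[s].get(ch, 0)
            for kw in out[s]:
                hits.append((i - len(kw) + 1, kw))
        return hits

    print(search("ushers", ["he", "she", "his", "hers"]))
    # finds 'she' at 1, 'he' at 2 and 'hers' at 2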
Dynamic programming is based on earlier work by Aho and Johnson on optimal CG from trees. Given a tree T, the algorithm computes bottom-up, for each node, the minimal costs of evaluating its subtree under the possible resource assumptions (e.g. into a register with a given number of registers free, or into memory), and then emits optimal code from the recorded choices.
label-id: pattern [cost][=action], where label-id is an identifier; pattern is a parenthesised prefix expression (the rule matches when the pattern matches and the cost computation does not abort); cost is code which computes (i) the cost and (ii) whether to call the action; and action is also code. See slide of example 2 from paper.
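(To make the selection core concrete, a hedged sketch: the rule table, costs and mnemonics are invented, and chain rules are omitted. Bottom-up dynamic programming keeps, for each node and each nonterminal, the cheapest rule deriving it, yielding a minimum-cost cover despite ambiguity.)

    RULES = [
        # (op, child nonterminals, result, cost, instruction or None)
        ("mem", (),             "reg", 2, "LOAD"),
        ("mem", (),             "mem", 0, None),      # leave operand in memory
        ("+",   ("reg", "reg"), "reg", 1, "ADD"),
        ("+",   ("reg", "mem"), "reg", 1, "ADDM"),    # register-memory add
    ]

    def cover(tree):
        """Bottom-up DP: {nonterminal: (cost, code)} for cheapest derivations."""
        op, kids = tree[0], [cover(t) for t in tree[1:]]
        best = {}
        for pop, cnts, nt, c, insn in RULES:
            if pop != op or len(cnts) != len(kids):
                continue
            if any(n not in k for n, k in zip(cnts, kids)):
                continue                              # a child can't yield that nt
            cost = c + sum(k[n][0] for n, k in zip(cnts, kids))
            code = [i for n, k in zip(cnts, kids) for i in k[n][1]]
            if insn:
                code.append(insn)
            if nt not in best or cost < best[nt][0]:
                best[nt] = (cost, code)               # keep the cheapest rule
        return best

    print(cover(("+", ("mem",), ("mem",)))["reg"])
    # -> (3, ['LOAD', 'ADDM'])   (beats LOAD, LOAD, ADD at cost 5)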
Results: in comparison with another C compiler, code ranged between 10% and 35% better, but with slightly higher overheads due to dynamic storage management; in comparison with another Pascal compiler, it was on average 40% better than an existing hand-coded generator. Practical advantages of twig are that it allows incremental development of the code generator, ambiguities are not a problem because dynamic programming resolves conflicts, and quality is high (provably optimal covering).
Definition of the effect of instructions at register-transfer level (below assembler), including memory modifications; most instructions are composite under this rewriting (RISC approach). The difference from hand-coded POs is that a generated PO is exhaustive whilst a hand-coded one is not. Quality is very high. Strategy embodied in the diagram:
    naive code ---------------------------> optimized code
        |                                         ^
        | substitute                              | synthesize
        v                                         |
    RT/mem-ops --------------------------> RT/mem-ops
                       optimize
However, repeated exhaustive search would be too slow. The POG is trained (c.f. neural nets) by feeding it code samples to generate a set of common optimisations, which is then used in the production PO (effectively a specialization of the POG).
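(A hedged sketch of the substitute/synthesize step from the diagram; the register-transfer encoding and the two-entry machine description are invented. Adjacent transfers are combined by substitution and the combined effect is looked up among single-instruction effects; a real POG must also check that the intermediate register, r9 here, is dead afterwards.)

    def substitute(rt1, rt2):
        """Combine register transfers (dest, expr): use rt1's effect in rt2."""
        d1, e1 = rt1
        d2, e2 = rt2
        def sub(e):
            if e == d1:                      # reference to rt1's destination
                return e1                    # replace it by rt1's effect
            if isinstance(e, tuple):
                return tuple(sub(x) for x in e)
            return e
        return (d2, sub(e2))

    MACHINE = {   # combined effect -> single instruction realising it
        ("r1", ("add", "r2", "r3")):          "ADD  r1,r2,r3",
        ("r1", ("add", "r2", ("mem", "x"))):  "ADDM r1,r2,x",
    }

    naive = [("r9", ("mem", "x")),            # r9 := M[x]
             ("r1", ("add", "r2", "r9"))]     # r1 := r2 + r9
    combined = substitute(naive[0], naive[1])
    print(combined, "->", MACHINE.get(combined))
    # ('r1', ('add', 'r2', ('mem', 'x'))) -> ADDM r1,r2,x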