Intermediate code or native code: a qualitative decision. For the former:
The communication medium between front-end and back-end is the machine-dependency boundary. In theory it could be a 0-, 1-, 2- or 3-address idealised machine; in practice it is usually 0 or 3, and occasionally 2:
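Purely as an illustration (not from the handout), here is the flavour of the difference in C: each statement below is one 3-address operation, and the comment gives a 0-address (stack machine) sequence for the same expression.

#include <stdio.h>

int main(void)
{
    int A = 7, B = 2, C = 3;

    /* 3-address style: every operation names two operands and a result */
    int t1 = A - B;                 /* SUB A, B  -> t1 */
    int t2 = t1 + C;                /* ADD t1, C -> t2 */

    /* 0-address (stack machine) version of the same computation:
       PUSH A; PUSH B; SUB; PUSH C; ADD;   result is left on top of the stack */
    printf("(A-B)+C = %d\n", t2);   /* prints 8 */
    return 0;
}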
See the handout for Fischer and LeBlanc's examples of the annotated if:
if E1           code for E1
then            JUMP-FALSE else-label        ;; #if_test
    S1          code for S1
        ---->   JUMP end-if-label            ;; #gen_jump
else            LABEL else-label             ;; #gen_else_label
    S2          code for S2
end if          LABEL end-if-label           ;; #start_if, #gen_out_label
and for:
for I1 in E1..E2    code for E1
                    STORE lowerbound
                    LT lowerbound I1
                    JUMP-FALSE end-for-label
                    code for E2
                    STORE upperbound
            ---->   GT I1 upperbound
                    JUMP-FALSE end-for-label
                    LABEL begin-for-label     ;; #init_loop
loop
    S1              code for S1
end loop            ADD I1 1 I1
                    EQ I1 10
                    JUMP-FALSE begin-for-label
                    LABEL end-for-label       ;; #finish_loop
Similar, obvious expansions hold for the other syntactic elements of a programming language. Remark on case density and jump tables versus chained tests.
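For illustration only (the function below is made up): a dense switch such as this is the kind a compiler turns into a jump table, whereas a sparse set of cases (say 1, 100, 10000) would normally become chained tests.

#include <stdio.h>

/* Dense cases 0..6: a good candidate for a jump table.
   Sparse cases (e.g. 1, 100, 10000) would typically compile to chained tests. */
static const char *day_name(int d)
{
    switch (d) {
    case 0: return "Sun";
    case 1: return "Mon";
    case 2: return "Tue";
    case 3: return "Wed";
    case 4: return "Thu";
    case 5: return "Fri";
    case 6: return "Sat";
    default: return "?";
    }
}

int main(void)
{
    printf("%s\n", day_name(3));    /* prints Wed */
    return 0;
}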
Most languages provide call-by-value (although Bliss and FORTRAN are exceptions) and for most programming that is sufficient. Call-by-need is essential to the efficiency of functional languages.
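For illustration (the names are invented, not from the notes), the two parameter models discussed next as they appear to a C programmer: call-by-value, and call-by-reference simulated by passing a pointer.

#include <stdio.h>

static void by_value(int x)  { x = 99; }    /* callee sees only a copy               */
static void by_ref(int *x)   { *x = 99; }   /* callee updates the caller's variable  */

int main(void)
{
    int a = 1, b = 1;
    by_value(a);
    by_ref(&b);
    printf("a=%d b=%d\n", a, b);            /* prints a=1 b=99 */
    return 0;
}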
By implication, the only semantic models we are supporting are by value and by reference. The protocol chosen has a high impact on run-time efficiency. Pass parameters on the stack or in registers? This trades memory traffic against debuggability (is the traffic obviated by caching?). Caller or callee saves? The options:
[table: the numbered caller/callee register-save options]
Options 1, 3 and 5 are preferred because they avoid replicating the register-saving code: it appears only in the callee (once) rather than in the callers (many). On the other hand, 1 and 2 are preferred because they save fewer registers. However, it is not an open-and-shut case for 1: callee save-and-restore makes non-local exits hard because the restoration information is distributed, whereas with caller save/restore it is relatively straightforward (consider longjmp/setjmp and the jmp_buf).
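A minimal sketch of the non-local exit just mentioned (the function names are invented): setjmp records the escape point in a jmp_buf and longjmp unwinds past all the intervening callers, which is exactly where distributed callee-saved state becomes awkward.

#include <setjmp.h>
#include <stdio.h>

static jmp_buf escape;                     /* the jmp_buf recording the escape point */

static void deep(int n)
{
    if (n == 0)
        longjmp(escape, 1);                /* non-local exit past every intervening frame */
    deep(n - 1);
}

int main(void)
{
    if (setjmp(escape) == 0) {             /* 0 on the initial call ...                */
        deep(5);
        printf("never reached\n");
    } else {                               /* ... non-zero when longjmp transfers here */
        printf("escaped via longjmp\n");
    }
    return 0;
}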
An important optimisation in functional languages, and in those in which the programmer cannot take the address of a local variable (since that prevents the optimisation). It is easiest in the caller-saves model. In summary: the callee's parameters are prepared, the caller's frame is discarded, and control is transferred to the callee. The consequence is a significant saving in total stack depth, which is very important in languages that encourage recursive solutions; the effect is to transform a recursive function consuming unbounded resources (amounts of stack) into a loop consuming finite resources. It is hard in C: sophisticated analysis is required to verify safety (or no parameters and no local variables!).
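A minimal sketch of the transformation (the factorial is illustrative, not from the notes): a tail-recursive, accumulator-style function and the loop a tail-call-optimising compiler can turn it into, running in constant stack space.

#include <stdio.h>

static unsigned long fact_rec(unsigned long n, unsigned long acc)
{
    if (n <= 1)
        return acc;
    return fact_rec(n - 1, n * acc);       /* tail call: nothing remains to do afterwards */
}

static unsigned long fact_loop(unsigned long n, unsigned long acc)
{
    for (;;) {                             /* the same function after the transformation  */
        if (n <= 1)
            return acc;
        acc = n * acc;                     /* the callee's parameters are prepared ...     */
        n   = n - 1;
    }                                      /* ... the frame is reused; control returns to the top */
}

int main(void)
{
    printf("%lu %lu\n", fact_rec(10, 1), fact_loop(10, 1));   /* both print 3628800 */
    return 0;
}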
We have covered some control expressions and function calling, so this is all that is left.
Source code is turned into an (attributed) tree, and the classical IR is a linearisation of that tree. We can generate code directly from the tree by a post-order traversal: this produces valid code but not necessarily the best code, since we want to minimise spilling. For example, (A-B)+((C+D)+(E*F)) requires 1 register for A-B and 2 for the rest, making a total of three; but if we calculate the sub-expressions in the reverse order, at most 2 registers are required.
An algorithm to minimise the number of registers required for a tree: label the nodes with their Sethi-Ullman numbers, working from the bottom up and using combination rules at the interior nodes.
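A minimal sketch of the labelling (the struct and the driver are illustrative, and it assumes a machine where the right operand of an instruction may come straight from memory, so a leaf needs a register only when it is a left operand): a leaf gets 1 or 0 accordingly, and an interior node gets l+1 if its children's labels are equal, otherwise their maximum.

#include <stdio.h>

struct node {
    struct node *left, *right;             /* both NULL for a leaf */
};

static int su_label(const struct node *n, int is_left_operand)
{
    if (n->left == NULL && n->right == NULL)
        return is_left_operand ? 1 : 0;                 /* leaf rule */
    int l = su_label(n->left, 1);
    int r = su_label(n->right, 0);
    return (l == r) ? l + 1 : (l > r ? l : r);          /* interior-node rule */
}

int main(void)
{
    /* (A-B)+((C+D)+(E*F)) from above, built bottom-up */
    struct node A = {0,0}, B = {0,0}, C = {0,0}, D = {0,0}, E = {0,0}, F = {0,0};
    struct node sub = {&A,&B}, add1 = {&C,&D}, mul = {&E,&F};
    struct node add2 = {&add1,&mul}, root = {&sub,&add2};
    printf("registers needed: %d\n", su_label(&root, 1));   /* prints 2: evaluate the right subtree first */
    return 0;
}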
Relatively easy to perform: it ignores control flow and can be integrated with code generation. It starts with the construction of basic blocks. A basic block is a linear code sequence starting with a label (or just after a conditional branch) and finishing with either a label or any kind of branch.
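A minimal sketch of the partitioning (the instruction records and flags are made up): a new block starts at the first instruction, at any label, or just after any branch.

#include <stdio.h>

struct instr { const char *text; int is_label; int is_branch; };

int main(void)
{
    struct instr code[] = {
        { "LABEL l1",      1, 0 },
        { "ADD A B t1",    0, 0 },
        { "JUMP-FALSE l2", 0, 1 },
        { "STORE t1 X",    0, 0 },
        { "LABEL l2",      1, 0 },
        { "STORE t1 Y",    0, 0 },
    };
    int n = (int)(sizeof code / sizeof code[0]);
    int block = 0;

    for (int i = 0; i < n; i++) {
        /* leaders: first instruction, a label, or the instruction after a branch */
        if (i == 0 || code[i].is_label || code[i - 1].is_branch)
            printf("-- basic block %d --\n", block++);
        printf("    %s\n", code[i].text);
    }
    return 0;
}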
The aims of optimisation are to preserve the meaning of the program (not always possible: unsafe optimisations), to avoid redundant computation (for example, an address calculation), to minimise load/store operations, and to prevent the destruction of useful values.
Particularly important is common sub-expression elimination: use value numbering (Cocke and Schwartz) to perform a symbolic execution of the IR in which each new value is enumerated; redundant computations can then be detected by comparing value numbers (if a particular combination of v_i and v_j is repeated then the corresponding value is reused; hash tables make the lookup cheap). The same information can also be used for register allocation/spilling (see later).
This approach is applicable to whole basic blocks, rather than single expressions. We create a DAG (a tree, except that nodes may have in-degree greater than 1) by merging the tree nodes of a basic block to reflect sharing, and hence resolve CSEs. Thus (A+B)+(A+B) is
      +
     / \
     \ /
      +
     / \
    A   B

We now look at a detailed example from Fischer and LeBlanc. There is a practical constraint on what can be achieved due to complexity. If a value is shared, the technique aims to keep it in a register until all references have been satisfied. However, even assuming an infinite number of registers, generating optimal code is exponential in the size of the DAG. The difficulty stems from left operands usually being destroyed, while sharing requires copying before use as a left operand. The aim, therefore, is to order evaluation to minimise the number of copies.
Heuristic Scheduling Algorithm:
Register allocation: try to use the same virtual register all the way down the left branch (use + destroy) to minimise copying. This succeeds as long as the operator is the last use of the operand (i.e. it has a low schedule number).
Mapping virtual to physical registers: define the span of a virtual register as the range of schedule numbers of the nodes that use the register. Two virtual registers cannot be implemented by the same physical register if their spans overlap. V1 has a span of 1..6, V2 of 1..3 and V3 of 4..5; therefore V1 maps to R1, while V2 and V3 both map to R2.
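A tiny sketch (greedy, not the book's algorithm) that assigns each virtual register the first physical register whose current users' spans do not overlap its own, using the spans just quoted:

#include <stdio.h>

struct span { int lo, hi; };

int main(void)
{
    struct span v[] = { {1,6}, {1,3}, {4,5} };    /* V1, V2, V3 */
    int n = 3, phys[3];

    for (int i = 0; i < n; i++) {
        int r = 0;
        for (int used = 1; used; r++) {           /* find the first free physical register */
            used = 0;
            for (int j = 0; j < i; j++)
                if (phys[j] == r && v[j].lo <= v[i].hi && v[i].lo <= v[j].hi)
                    used = 1;                     /* spans overlap: r is taken here */
        }
        phys[i] = r - 1;
        printf("V%d -> R%d\n", i + 1, phys[i] + 1);   /* prints V1->R1, V2->R2, V3->R2 */
    }
    return 0;
}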
Output values: that is, the updates that must be made at the end of the basic block to make succeeding blocks correct. The code must update certain variables (G, C, A) at the end. However, we must not store too soon, that is, before all uses are complete; thus we delay storing until a register is about to be destroyed. The final code sequence is:
ADD A,B,R1
MUL C,R1,R2
ADD R1,R2,G
SUB E,F,R2
STO R1,C
MUL D,R1,R1
ADD R1,R2,A
The heuristics can be improved by using commuting information, or by taking completely different approaches. Aliasing complicates matters significantly.
This section is based on p559 of Fischer and LeBlanc, but the technique goes back to Cocke and Schwartz in 1970.
The idea is to avoid redundant computation by looking for a previous computation of the same value within the basic block. If one is found, the generation of code is suppressed. The example here ignores (a) aliasing of array elements, (b) reference parameters and (c) pointers. If two operands have not changed since the result was defined, then the previous value of the temporary can be used. To handle addresses, one number is assigned for the location and another for the contents. The code fragment in question is:
A(I,J)   = A(I,J) + B + C
A(I,J+1) = A(I,J) + B + D
A(I,J)   = A(I,J) + B

See the table on the overhead: tuples marked with R are redundant, bringing the length of the code sequence down from 21 to 11. Aliasing makes the process more complicated because it is very difficult to ensure that all access paths are taken into account. Note that a more careful analysis would not mark tuple 20 as redundant, because the update of A(I,J+1) kills all values of A(I,J). Although we know they are not aliases, recognising (in general) that J <> J+n is more profound than value numbering is capable of being.
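A minimal sketch of the mechanism (ignoring addresses and aliasing exactly as above; the fixed-size linear tables stand in for the hash tables, and all names are illustrative): operands and (op, value-number, value-number) tuples each get a value number, and a repeated tuple is flagged as redundant so its code can be suppressed.

#include <stdio.h>
#include <string.h>

#define MAXV 64

static char var_name[MAXV][8];
static int  var_vn[MAXV];
static int  nvars;

static char tup_op[MAXV];
static int  tup_l[MAXV], tup_r[MAXV], tup_vn[MAXV];
static int  ntups;

static int  next_vn;

/* value number of a named operand, creating one on first sight */
static int vn_of_var(const char *v)
{
    for (int i = 0; i < nvars; i++)
        if (strcmp(var_name[i], v) == 0)
            return var_vn[i];
    strcpy(var_name[nvars], v);
    var_vn[nvars] = next_vn++;
    return var_vn[nvars++];
}

/* value number of (op, l, r); sets *redundant if this combination was seen before */
static int vn_of_tuple(char op, int l, int r, int *redundant)
{
    for (int i = 0; i < ntups; i++)
        if (tup_op[i] == op && tup_l[i] == l && tup_r[i] == r) {
            *redundant = 1;
            return tup_vn[i];
        }
    *redundant = 0;
    tup_op[ntups] = op;
    tup_l[ntups]  = l;
    tup_r[ntups]  = r;
    tup_vn[ntups] = next_vn++;
    return tup_vn[ntups++];
}

int main(void)
{
    int red;
    int a = vn_of_var("A"), b = vn_of_var("B"), c = vn_of_var("C");

    int ab = vn_of_tuple('+', a, b, &red);    /* A + B : new                               */
    printf("A+B     redundant? %d\n", red);
    vn_of_tuple('+', ab, c, &red);            /* (A+B) + C : new                           */
    printf("(A+B)+C redundant? %d\n", red);
    vn_of_tuple('+', a, b, &red);             /* A + B again : redundant, code suppressed  */
    printf("A+B     redundant? %d\n", red);
    return 0;
}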