Compiler Organization
Before we can discuss optimization profitably, we need to understand how a
compiler carries out its task.
Compiler structure
A compiler has three major components:
[Discuss-
- Abstract syntax tree is specific to the source language, and reflects
the structure of the source program
- Abstract target tree is specific to the target machine, and reflects
the structure of the target program
- Translation is dependent on both the source language
and the target machine
]
Example
Let's take a specific example and show the various stages in the
compilation:
The source language is a subset of C called C--:
int
GCD(int x, int y)
{ while (!(x == y))
if (x > y) x = x - y; else y = y - x;
return x;
}
[Discuss-
- C-- has no != operator
- A C-- program is a single function, with arguments supplied from
standard input and the result written to standard output
]
Abstract syntax tree
Here is the abstract syntax tree for this program:
[Discuss-
- The form of the abstract syntax tree is determined by the source
program
- Convention: Nodes with names ending in ``List'' are variadic (i.e. they
have an arbitrary number of children, including none at all)
- Distinction between symbol name and node name
- Nodes with the same signature can have different names
]
The structure of a tree node
Each tree fragment is represented by a block of storage with three basic
components:
- A code indicating the rule represented by the fragment
- Information specific to the symbol on the left-hand side of the rule
- Information specific to the rule
Here are some examples:
[Discuss-
- The dotted line indicates the tree fragment represented by the block of
storage
- Computations are associated with each rule
- Information is filled in by traversing the tree, executing computations
for each rule
- Because the rules constitute a context-free grammar, only the
symbol on the left-hand side of a child is known
- The role of a symbol as an interface between rule computations
]
The target machine
In order to provide a concrete example, we need to choose a specific target
machine.
I'll use the MIPS R2000 initially, because it has a straightforward
architecture for which it is easy to generate code.
The important characteristics are:
- A byte-addressed memory
- A central processor that carries out all control operations and integer
arithmetic
- A coprocessor that carries out all floating-point operations
- 32, 32-bit CPU registers
- 32, 32-bit FPU registers, used in pairs
- All computation done on registers with results to registers, except
that certain integer instructions may have a 16-bit constant as the right
operand of the computation
- Load and store instructions that form memory addresses by adding a
16-bit literal to a register
- Comparison instructions that set a register to 0 or 1
- Conditional branch instructions
- Register 0 always has the value 0, and register 31 is set to the return
address by a jal instruction
- Extended operations provided by the assembler, which uses register 1 as
a scratch register
[Discuss-
- The limitation to 16-bit constant operands is removed by the assembler,
which constructs longer constants in register 1
- There is no specialized condition code, and Boolean operations can be
implemented either to transfer control or to yield an integer value
- The available memory addressing function is imm(register)
- The assembler synthesizes (register), imm,
symbol, symbol+imm, symbol-imm,
symbol+imm(register), symbol-imm(register).
- There is no hardware stack, even when an interrupt occurs
]
Abstract target tree
The abstract target tree is stated in terms of target language concepts
instead of source language concepts.
It is a sequence of items, reflecting the execution order required by the
source language semantics.
Many of the items are themselves trees, reflecting the fact that the source
language does not constrain the order of evaluation within most
expressions.
Here are several rules describing target tree nodes:
RULE Label: Item ::= Symbol END;
RULE j: Item ::= Symbol END;
RULE CJump: Item ::= Cond Symbol END;
RULE IgnoreInt: Item ::= IntReg END;
RULE ble: Cond ::= IntReg IntReg END;
RULE sub: IntReg ::= IntReg IntReg END;
RULE sw: IntReg ::= Variable IntReg END;
RULE lw: IntReg ::= Variable END;
[Discuss-
- Symbol is a program label, Variable a program variable
- IntReg is an integer-valued expression whose result should be
left in a register
- Cond is a conditional expression whose result should be a
control transfer
]
Translation
The translation from abstract syntax tree to abstract target tree can be
described largely in terms of tree transformation rules like the following:
IntReg: MinusExpr(IntReg,IntReg) -> Mksub;
JmpFalse: GreaterExpr(IntReg,IntReg) -> Mkble;
JmpTrue: GreaterExpr(IntReg,IntReg) -> Mkbgt;
Here MinusExpr and GreaterExpr are rules describing
fragments of the abstract syntax tree.
The signature of one possible abstract target tree node corresponding to
MinusExpr fragments has an IntReg result and
two IntReg operands.
That node of the abstract target tree is built by the function Mksub
(the corresponding rule is the sub rule above).
One possible abstract target tree node corresponding to
GreaterExpr is a conditional jump that transfers when the
condition ``greater than'' between two integer values is false.
That node of the abstract target tree is built by the function
Mkble (the corresponding rule is the ble rule above).
[Discuss-
- The tree transformation process can be implemented very efficiently by
a generator (see Todd A. Proebsting, "Simple and Efficient BURS Table
Generation", SIGPLAN Notices, 27 (July, 1992) 331-340)
- Not all of the nonterminals appearing in patterns need be symbols in
the target program tree (e.g. JmpFalse and JmpTrue both
correspond to Cond)
- The patterns need not correspond to target tree nodes, but may carry
out computations such as constant folding
]
Abstract target tree for the GCD example
Here is the abstract target tree for the GCD program, described in a
textual notation that is more compact than the block diagram:
Mips(
Entry(8),
ReadInt(x),
ReadInt(y),
j(L1),
Label(L2),
CJump(ble(lw(x),lw(y)),L3),
IgnoreInt(sw(x,sub(lw(x),lw(y)))),
j(L4),
Label(L3),
IgnoreInt(sw(y,sub(lw(y),lw(x)))),
Label(L4),
Label(L1),
CJump(bne(lw(x),lw(y)),L2),
ReturnInt(lw(x)),
j(L5),
Label(L5),
PrintInt(),
Exit())
The block diagram for a portion of the tree is:
[Discuss-
- Target tree nodes have the usual form for tree nodes, storing a rule
name, information associated with the left-hand side symbol, and
information associated with the rule
- Nodes with a mixture of terminal and nonterminal information (e.g.
CJump and sw)
- Sequence that reflects the control flow
- Trees that represent the data dependence within expressions
- Expression nodes corresponding to MIPS instructions
]
Competent code generation
The target program tree represents the target program's algorithm, but is
not yet a sequence of target machine instructions.
Three tasks must be carried out in order to obtain those instructions:
- The order of execution of the individual operations within each tree
must be determined
- The resource requirements (in terms of registers and temporary storage
locations) of each subtree must be determined
- Instructions must be selected to carry out each operation
[Discuss-
- These tasks interact (for example, certain execution orders require
more resources than others)
- A non-optimizing compiler simply codes each Item of the tree
individually
- The most common strategy is to
- Use Sethi-Ullman numbering to determine the resource requirements
- Use most-resource-intensive-first evaluation order for binary operators
- Allocate a register to hold the first operand computed at a binary
operator
- Code each instruction on the basis of its operand and result registers
- Compose the instructions according to the postulated execution order
]
Results of competent code generation of the GCD program
main: sub $sp,$sp,4
sw $fp,($sp)
move $fp,$sp
sub $sp,$sp,8
li $2,5
syscall
sw $2,-4($fp)
li $2,5
syscall
sw $2,-8($fp)
j L1
L2:
lw $4,-8($fp)
lw $5,-4($fp)
ble $5,$4,L3
lw $4,-8($fp)
lw $5,-4($fp)
sub $4,$5,$4
sw $4,-4($fp)
j L4
L3:
lw $4,-4($fp)
lw $5,-8($fp)
sub $4,$5,$4
sw $4,-8($fp)
L4:
L1:
lw $4,-8($fp)
lw $5,-4($fp)
bne $5,$4,L2
lw $4,-4($fp)
j L5
L5:
li $2,1
syscall
move $sp,$fp
lw $fp,($sp)
add $sp,$sp,4
j $ra
[Discuss-
- x and y are stored in the activation record at
offsets -4 and -8 respectively
- Because each item is coded individually, and because each item is quite
small, this strategy cannot really take advantage of the large register set
- The technique works well for CISC machines in which operations can be
carried out with memory operands
- Even on a CISC machine, however, the code looks pretty bad for a program
with lots of array references because array references generate common
subexpressions
]