Compiler Organization

Before we can discuss optimization profitably, we need to understand how a compiler carries out its task.

Compiler structure

A compiler has three major components:

[Discuss-

]

Example

Let's take a specific example and show the various stages in the compilation: The source language is a subset of C called C--:
int
GCD(int x, int y)
{ while (!(x == y))
    if (x > y) x = x - y; else y = y - x;
  return x;
}

[Discuss-

]

Abstract syntax tree

Here is the abstract syntax tree for this program:

[Discuss-

]

The structure of a tree node

Each tree fragment is represented by a block of storage with three basic components: Here are some examples:

[Discuss-

]

The target machine

In order to provide a concrete example, we need to choose a specific target machine. I'll use the MIPS R2000 initially, because it has a straightforward architecture for which it is easy to generate code. The important characteristics are:

[Discuss-

]

Abstract target tree

The abstract target tree is stated in terms of target language concepts instead of source language concepts. It is a sequence of items, reflecting the execution order required by the source language semantics. Many of the items are themselves trees, reflecting the fact that the source language does not constrain the order of evaluation within most expressions.

Here are several rules describing target tree nodes:

RULE Label:     Item ::=      Symbol END;
RULE j:         Item ::=      Symbol END;
RULE CJump:     Item ::= Cond Symbol END;
RULE IgnoreInt: Item ::= IntReg      END;

RULE ble: Cond ::= IntReg IntReg END;

RULE sub: IntReg ::= IntReg   IntReg END;
RULE sw:  IntReg ::= Variable IntReg END;
RULE lw:  IntReg ::= Variable        END;

[Discuss-

]

Translation

The translation from abstract syntax tree to abstract target tree can be described largely in terms of tree transformation rules like the following:
IntReg:   MinusExpr(IntReg,IntReg)   -> Mksub;
JmpFalse: GreaterExpr(IntReg,IntReg) -> Mkble;
JmpTrue:  GreaterExpr(IntReg,IntReg) -> Mkbgt;
Here MinusExpr and GreaterExpr are rules describing fragments of the abstract syntax tree. The signature of one possible abstract target tree node corresponding to MinusExpr fragments has an IntReg result and two IntReg operands. That node of the abstract target tree is built by the function Mksub (the corresponding rule is the sub rule above).

One possible abstract target tree node corresponding to GreaterExpr is a conditional jump that transfers when the condition ``greater than'' between two integer values is false. That node of the abstract target tree is built by the function Mkble (the corresponding rule is the ble rule above).

[Discuss-

]

Abstract target tree for the GCD example

Here is the abstract target tree for the GCD program, described in a textual notation that is more compact than the block diagram:
Mips(
  Entry(8),
  ReadInt(x),
  ReadInt(y),
  j(L1),
  Label(L2),
  CJump(ble(lw(x),lw(y)),L3),
  IgnoreInt(sw(x,sub(lw(x),lw(y)))),
  j(L4),
  Label(L3),
  IgnoreInt(sw(y,sub(lw(y),lw(x)))),
  Label(L4),
  Label(L1),
  CJump(bne(lw(x),lw(y)),L2),
  ReturnInt(lw(x)),
  j(L5),
  Label(L5),
  PrintInt(),
  Exit())
The block diagram for a portion of the tree is:

[Discuss-

]

Competent code generation

The target program tree represents the target program's algorithm, but is not yet a sequence of target machine instructions. Three tasks must be carried out in order to obtain those instructions:

[Discuss-

]

Results of competent code generation of the GCD program

main:   sub     $sp,$sp,4
        sw      $fp,($sp)
        move    $fp,$sp
        sub     $sp,$sp,8
        li      $2,5
        syscall
        sw      $2,-4($fp)
        li      $2,5
        syscall
        sw      $2,-8($fp)
        j       L1
L2:
        lw      $4,-8($fp)
        lw      $5,-4($fp)
        ble     $5,$4,L3
        lw      $4,-8($fp)
        lw      $5,-4($fp)
        sub     $4,$5,$4
        sw      $4,-4($fp)
        j       L4
L3:
        lw      $4,-4($fp)
        lw      $5,-8($fp)
        sub     $4,$5,$4
        sw      $4,-8($fp)
L4:
L1:
        lw      $4,-8($fp)
        lw      $5,-4($fp)
        bne     $5,$4,L2
        lw      $4,-4($fp)
        j       L5
L5:
        li      $2,1
        syscall
        move    $sp,$fp
        lw      $fp,($sp)
        add     $sp,$sp,4
        j       $ra

[Discuss-

]


waite@isa.informatik.th-darmstadt.de
Revision $Revision: 1.3 $