Global Register Allocation

Current main memory chips have access times on the order of 60ns to 70ns. When one adds the time it takes for a memory request to pass from the processor through the system bus and then the memory controllers and decode logic, the memory access time can increase to 100ns or more.

A processor that runs with a 100MHz clock has a cycle time of 10ns. If we assume that an addition can be executed in a single processor cycle, then an ADD instruction that takes one of its operands from main memory might spend 100ns waiting for that operand and only 10ns doing the addition. The overall time required to complete a program would then be determined almost entirely by the memory access time; increasing the processor speed would have very little effect.

Hardware designers mitigate this problem by providing caches on the processor chip to buffer memory references. Although the cache does an excellent job of reducing the cost of memory operations, it is vital for the compiler to make best use of the available register set to shorten execution time still further.

We have already seen how a non-optimizing compiler allocates registers within a computation to minimize the number of registers used. This strategy is appropriate when the target machine has few registers; in the case of a machine with many registers (like the MIPS processor assumed in the earlier example), an optimizing compiler must take a broader view of the problem.

Simple example

Let's return to the example illustrating competent code generation using only local information:
int
GCD(int x, int y)
{ while (!(x == y))
    if (x > y) x = x - y; else y = y - x;
  return x;
}
The final MIPS assembly code generated for this program was:
main:   sub     $sp,$sp,4
        sw      $fp,($sp)
        move    $fp,$sp
        sub     $sp,$sp,8
        li      $2,5
        syscall
        sw      $2,-4($fp)
        li      $2,5
        syscall
        sw      $2,-8($fp)
        j       L1
L2:
        lw      $4,-8($fp)
        lw      $5,-4($fp)
        ble     $5,$4,L3
        lw      $4,-8($fp)
        lw      $5,-4($fp)
        sub     $4,$5,$4
        sw      $4,-4($fp)
        j       L4
L3:
        lw      $4,-4($fp)
        lw      $5,-8($fp)
        sub     $4,$5,$4
        sw      $4,-8($fp)
L4:
L1:
        lw      $4,-8($fp)
        lw      $5,-4($fp)
        bne     $5,$4,L2
        lw      $4,-4($fp)
        j       L5
L5:
        li      $2,1
        syscall
        move    $sp,$fp
        lw      $fp,($sp)
        add     $sp,$sp,4
        j       $ra

[Discuss-

]

The result of global register allocation

It's clear by inspection that we could implement the program using only two registers (one for x and the other for y):
main:   sub     $sp,$sp,4
        sw      $fp,($sp)
        move    $fp,$sp
        sub     $sp,$sp,8
        li      $2,5
        syscall
	add	$4,$2,0
        li      $2,5
        syscall
	add	$5,$2,0
        j       L1
L2:
        ble     $4,$5,L3
        sub     $4,$4,$5
        j       L4
L3:
        sub     $5,$5,$4
        sw      $4,-8($fp)
L4:
L1:
        bne     $4,$5,L2
        j       L5
L5:
        li      $2,1
        syscall
        move    $sp,$fp
        lw      $fp,($sp)
        add     $sp,$sp,4
        j       $ra

[Discuss-

]

Restrictions on allocation

We can only allocate a variable to a register if it fits into a register and if we have control over all accesses of that variable. A rule of thumb is to consider only simple local variables whose addresses are never computed.

[Discuss-

]

Allocation philosophy

There are two basic philosophies underlying register allocation:
  1. Variables have ``home'' memory locations, but we try to keep them in registers for as much of their lifetimes as possible
  2. Variables are kept in registers, but from time to time it may be necessary to ``spill'' them to memory

[Discuss-

]

Economics

Allocating a variable to a register needs to make sense in terms of the program's execution time. We can characterize the costs for a particular situation in terms of three parameters:
LODSAVE
The amount of execution time saved for a reference if the entity referenced is in a register
STRSAVE
The amount of execution time saved for a definition if the entity defined is in a register
MOVCOST
The cost of moving a value between a register and memory (if the costs of loads and stores are different, that must be accounted for)
For more information about these parameters, see The Priority-Based Coloring Approach to Register Allocation, ACM Transactions on Programming Languages and Systems 12 (1990) 501-536.

[Discuss-

]

Register allocation as graph coloring

Global register allocation considers a complete procedure and a set of entities to be allocated among k registers such that two entities that exist at the same time are allocated to different registers. This problem can be interpreted as one of coloring an interference graph: Adjacent nodes must be given distinct colors; the colors assigned to the nodes are the registers.

[Discuss-

]

Register allocation algorithm

  1. Find the set of units making up the live range for each entity.
  2. Construct the interference graph with one node for each live range
  3. Delete nodes with fewer than k neighbors, pushing them onto a stack. If nodes remain, select one, delete it from the graph and push it onto the stack.
  4. Pop nodes from the stack, allocating them to registers so that a node does not get allocated the same register as any of its neighbors. If this cannot be done, split the live range.
  5. If any live range was split in step (4), go to step (1).

[Discuss-

]

Register allocation data structure

Each live range has four properties: A live unit is the intersection of a live range and a basic block. It summarizes the properties of the basic block that are important for this live range: Each basic block also has a list of the live units making it up, so the information for a live entity can be accessed either from its live range or its basic block.

[Discuss-

]

Compute the set of basic blocks constituting each live range

This is the first step of the global register allocation algorithm. We know what the entities that we want to allocate are, so we can begin by creating a live range object for each. The set of basic blocks belonging to the live range are then determined by the following function:
LiveRange(x) =
  {b | for some p in b,
    (used(p,x) or
     def(p,x) or
     (Live(p,x) and Reaching(p,x)))}
As the basic blocks making up the live range are determined, live unit information should be computed for them and added to the live unit list for the live range.

[Discuss-

]

The GCD example

For this example, we'll start with the abstract target tree and interpret the lw and sw nodes as markers for variable uses and variable definitions respectively:
Mips(
  Entry(8),
  ReadInt(x),
  ReadInt(y),
  j(L1),
  Label(L2),
  CJump(ble(lw(x),lw(y)),L3),
  IgnoreInt(sw(x,sub(lw(x),lw(y)))),
  j(L4),
  Label(L3),
  IgnoreInt(sw(y,sub(lw(y),lw(x)))),
  Label(L4),
  Label(L1),
  CJump(bne(lw(x),lw(y)),L2),
  ReturnInt(lw(x)),
  j(L5),
  Label(L5),
  PrintInt(),
  Exit())
The entities assigned to registers will be the variables and the units expressing a live range will be basic blocks.

BlockPredSuccused(x)used(y)def(x)def(y)Content
1 Entry 5 false false true false ReadInt(x)
false false false true ReadInt(y)
2 5 3,4 true true false false CJump(ble(lw(x),lw(y)),L3)
3 2 5 true true true false IgnoreInt(sw(x,sub(lw(x),lw(y))))
4 2 5 true true false true IgnoreInt(sw(y,sub(lw(y),lw(x))))
5 1,3,4 2,6 true true false false CJump(bne(lw(x),lw(y)),L2)
6 5 Exit true false false false ReturnInt(lw(x))

[Discuss-

]

Construct the interference graph

This is the second step of the global register allocation algorithm. At this point each live range object has the complete set of basic blocks available, so all we need to do is to go through and intersect each pair of sets. If the intersection is not empty, an interference arc is added by placing each live range on the others' list of interfering live ranges.

Select a k-neighbor node to delete

This may be required during the third step of the global register allocation algorithm.

A priority (The Priority-Based Coloring Approach to Register Allocation, ACM Transactions on Programming Languages and Systems 12 (1990) 501-536) can be attached to each live range, showing which of them is estimated to give the best results when allocated to a register. When we need to choose a k-neighbor node to delete from the graph and push onto the stack, we should select the live range with the lowest priority.

The savings we get when a variable is in a register during a basic block is determined from the machine-dependent parameters:

SavingsLiveUnit = LODSAVE*uses + STRSAVE*defs - MOVCOST*n

The savings must be weighted by the frequency, since we're interested in minimizing the execution time:

WeightLiveUnit = 5Loop nesting depthLiveUnit

The priority is then the normalized, weighted savings over all live units in the live range:

PrLiveRange =
  (sum over LiveUnit in LiveRange
    (SavingsLiveUnit * WeightLiveUnit)
  ) / NumberOfLiveUnitsLiveRange

[Discuss-

]

Splitting a live range

This may be required during the fourth step of the global register allocation algorithm.

Initially, a single register is allocated to an entity during the whole of its existence. Perhaps fewer conflicts would result if its lifetime were split into two distinct pieces and different registers used in each piece.

When splitting a live range, each piece will ideally start with a definition of the entity (so that there will be no need for an extra load), and will consist of a set of contiguously-executed basic blocks. The following algorithm is taken from The Priority-Based Coloring Approach to Register Allocation, ACM Transactions on Programming Languages and Systems 12 (1990) 501-536:

  1. If there is a live unit in the range to be split that has free registers and begins with a definition of the entity corresponding to the range, select it and go to step 3.
  2. If there is a live unit in the range to be split that has free registers, select it and go to step 3. Otherwise discard this live range as a candidate for register allocation.
  3. Initialize the set of registers used to the set of registers used in the basic block of the selected live unit.
  4. Select successor live units, breadth-first. For each selection, check that the union of the set of registers used and the set of registers used in the basic block has fewer than k members. Go to step 5 when no such selection is possible.
  5. Remove the live units selected in steps 1, 2 and 4 from the live range. The selected live units constitute one of the new live ranges, the live units remaining in the original live range constitute the other. Update the information in a live unit description about whether the entity must be loaded on entry and/or stored on exit.

[Discuss-

]

Allocating registers to values instead of variables

The set of variables corresponding to live ranges was constrained by what kind of variable it was. Different constraints apply to the selection of values as candidates for live ranges:

[Discuss-

]

Rematerialization

One problem with allocating registers to values instead of variables is that the global value numbering process will identify a large number of simple addressing computations as being redundant. Depending upon the circumstances, it may be cheaper to redo these computations (``rematerialize their values'') than it is to save and restore their values. For a more complete discussion, see Rematerialization, SIGPLAN Notices 27, 7 (July, 1992) 311-321.

[Discuss-

]


waite@isa.informatik.th-darmstadt.de
Revision $Revision: 1.1 $