# Code Motion

Global value numbering enables us to decide whether a computation at a given program point is redundant or not, and therefore whether to eliminate it. Another way of reducing the number of computations a program carries out is to move them from frequently-executed parts of the program to parts that are executed less often. There are, of course, constraints on such moves that must be checked by checking global relationships among computations.

## Example

Embedding our array calculation in a loop illustrates the problems and potential profits:
```i = l1 + 1;
do {
j = l2 + 1;
do {
k = l3 + 1;
do {
a[i,j,k] = ( a[i-1,j,k] + a[i+1,j,k] +
a[i,j-1,k] + a[i,j+1,k] +
a[i,j,k-1] + a[i,j,k+1] ) / 6.0;
k++;
} while (k < u3);
j++;
} while (j < u2);
i++;
} while (i < u1);
```
Recall that the array assignment computed i*C1 + j*C2 + k*C3 seven times.

### [Discuss-

• Apologies for the shameful mixture of languages
• All of the recomputations are eliminated by global value numbering
• i*C1 + j*C2 + k*C3 still appears in the innermost loop
• i*C1 really only needs to be computed once for each trip around the outer loop
• Thus if the array were 50 x 50 x 50, i*C1 would be computed 125,000 times instead of 50 times

## Control flow graph and dominator tree

The dominance frontier of each dominator tree node is given in square brackets beside the node: Here are the tuples, grouped into basic blocks numbered according to the diagram. Only the ones we've already seen from the array assignment are shown, and all program point numbers are omitted. Moreover, the value numbers of the original tuples have been preserved and new ones assigned to the new tuples. Finally, i, j and k without subscripts represent the initial variables; subscripted occurrences represent renamings due to assignments within the for-loops:

```1:
V11:=l1+1
i1:=V11
Jump(2)

2:
i2:=phi(i1,i3);
j1:=phi(j,j4);
k1:=phi(k,k5);
V21:=l2+1
j2:=V21
Jump(3)

3:
j3:=phi(j2,j4);
k2:=phi(k1,k5);
V31:=l3+1
k3:=V31
Jump(4)

4:
k4:=phi(k3,k5);
V1:=i2
V2:=V1*C1
V3:=j3
V4:=V3*C2
V5:=V2+V4
V6:=k4
V7:=V6*C3
V8:=V5+V7
V9:=V8-C01
V10:=a[V9]
...
V41:=V6+1
k5:=V41
V42:=V41<u3
JumpT(4,V42)
JumpF(5,V42)

5:
V51:=V3+1
j4:=V51
V52:=V51<u2
JumpT(3,V52)
JumpF(6,V52)

6:
V61:=V1+1
i3=V61
V62:=V61<u1
JumpT(2,V62)
JumpF(exit,V62)
```

### [Discuss-

• Note that numbers like l1+1 are constants -- the addition can be done at compile time
• Linkage among the basic blocks is explicit

## Constraints on code motion

In general, one cannot move a computation
• backwards past the computation of any of its operands
• forwards past any computation using its value
• onto a path along which it would not have been computed by the original program.

### [Discuss-

• It may be possible to move the computation of an operand backward, or the use of a result forward, and thus gain more freedom
• Moving a computation where it wouldn't have been done is usually a ``pessimization'' of the program
• If a computation that could lead to an error is moved onto a path along which it would not have been computed by the original program, the optimized program can fail when the unoptimized one does not
• It is not necessary to preserve the failure position, so we can move computations that may fail
• Optimizations might delete a computation that fails, creating an optimized program that runs where the unoptimized one fails.
• On some machines, speculative execution of code that will not fail is useful -- it covers time required to access memory for a new address

## Example

Consider V1 and V2, which are computed in block 4:
• Every path from blocks 1, 2 and 3 to the exit passes through block 2, so the computation of V1 and V2 could be moved to any of these blocks without moving it onto a path along which it might not be computed
• One of the operands of V1 is computed in block 2, so V1 and V2 cannot be moved into block 1 Thus the lumped computation could be carried out in any of the three starred blocks.

### [Discuss-

• ``Earliest'' placement is in block 2
• If the computation is done in block 2, then it need not be done in block 4
• ``Code motion'' is really insert and delete

## A more complex example

This example is taken from Lazy code motion, ACM SIGPLAN Notices, 27 (July, 1992) 224-234. ### [Discuss-

• Find the earliest computation points for v (2 and 5)
• Note that this results in the fewest computations of v
• Discuss the lifetime of v and the concept of register pressure
• Note that the same number of computations of v will be executed if computations are done in 2, 7, 14 and 15
• Reducing register pressure may avoid spills, so the extra computations may cost nothing

## Critical edges

A critical edge is an edge in the flow graph leading from a node with multiple successors to a node with multiple predecessors. Critical edges can cause problems for code motion.

Consider the implementation of

```if (i != 0)
while (j > (k / i)) {
...  /* No changes made to i or k here */
j--;
}

1:
V11:=i
V12:=V11!=0
JumpT(exit,V12)
JumpF(2,V12)

2:
j1:=phi(j,j2)
V21:=k
V22:=V21/V11
V23:=j1
V24:=V23>V22
JumpT(3,V24)
JumpF(exit,V24)

3:
...
V32:=j1-1
j2:=V32
Jump(2)
```
Both V21 and V22 are constant within the loop, yet neither can be moved to block 1 because that would move them onto a path along which they would not have been computed by the original program.

### [Discuss-

• Introducing the computation into block 1 would allow a division by 0

## Removing critical edges

The problem can be easily resolved: Break the critical edge by introducing an empty basic block: ### [Discuss-

• The constant calculations in the loop exist on all paths from block 0 to the exit
• Moving them to block 0 gets them out of the loop
• We assume that critical edges will always be broken in this way
• Simple transformation of the control flow graph before any further processing is done

## Pre-tested loops

The original example of the array calculation could have been written using for statements instead of do-while:
```for (i = l1 + 1; i < u1; i++)
for (j = l2 + 1; j < u2; j++)
for (k = l3 + 1; k < u3; k++)
a[i,j,k] = ( a[i-1,j,k] + a[i+1,j,k] +
a[i,j-1,k] + a[i,j+1,k] +
a[i,j,k-1] + a[i,j,k+1] ) / 6.0;
```
Here is the definition of a for statement:
```for (i = l1 + 1; i < u1; i++) S;

i = l1 + 1;
while (i < u1) {
S;
i++;
}
``` ### [Discuss-

• Test at the beginning of the loop
• Initialization and test must be separate basic blocks
• Increment may belong to the same basic block as stuff from S

## Control flow graph and dominator tree

The dominance frontier of each dominator tree node is given in square brackets beside the node: Here are the tuples, grouped into basic blocks numbered according to the diagram:

```1:
V11:=l1+1
i1:=V11
Jump(2)

2:
i2:=phi(i1,i3)
j1:=phi(j,j3)
V21:=i2
V22:=V21<u3
JumpT(3,V22)
JumpF(exit,V22)

3:
V31:=l2+1
j2:=V31
Jump(4)

4:
j3:=phi(j2,j4)
k1:=phi(k,k3)
V41:=j3
V42:=V41<u3
JumpT(5,V42)
JumpF(9,V42)

5:
V51:=l3+1
k2:=V51
Jump(6)

6:
k3:=phi(k2,k4)
V61:=k3
V62:=V61<u3
JumpT(7,V62)
JumpF(8,V62)

7:
V1:=i2
V2:=V1*C1
...
V71:=V6+1
k4:=V71
Jump(6)

8:
V81:=V41+1
j4:=V81
Jump(4)

9:
V61:=V21+1
i3=V61
Jump(2)
```

### [Discuss-

• The computations V1 and V2 cannot be moved in this diagram because any move would place them on a path along which they would not have been computed by the original program
• There are no critical edges
• Problem is that a pre-tested loop may not be executed at all

### ]

A partial solution to the problem is for the translator to modify the implementation of the for statement by duplicating the test: ### [Discuss-

• This implementation produces a critical edge, from the first instance of the test to the body
• The critical edge removal will insert an empty basic block on this edge
• The inserted block will act as a ``landing pad'' for calculations that are constant within the loop
• Only the innermost loop will be affected, because the pre-test of the innermost loop blocks motion out of the landing pad

### ]

 waite@isa.informatik.th-darmstadt.de Revision \$Revision: 1.4 \$