i = lRecall that the array assignment computed_{1}+ 1; do { j = l_{2}+ 1; do { k = l_{3}+ 1; do { a[i,j,k] = ( a[i-1,j,k] + a[i+1,j,k] + a[i,j-1,k] + a[i,j+1,k] + a[i,j,k-1] + a[i,j,k+1] ) / 6.0; k++; } while (k < u_{3}); j++; } while (j < u_{2}); i++; } while (i < u_{1});

- Apologies for the shameful mixture of languages
- All of the recomputations are eliminated by global value numbering
`i*C`still appears in the innermost loop_{1}+ j*C_{2}+ k*C_{3}`i*C`really only needs to be computed once for each trip around the_{1}*outer*loop- Thus if the array were 50 x 50 x 50,
`i*C`would be computed 125,000 times instead of 50 times_{1}

Here are the tuples, grouped into basic blocks numbered according to the
diagram.
Only the ones we've already seen from the array assignment are shown,
and all program point numbers are omitted.
Moreover, the value numbers of the original tuples have been preserved and
new ones assigned to the new tuples.
Finally, `i`, `j` and `k` without subscripts represent
the *initial* variables; subscripted occurrences represent renamings
due to assignments within the for-loops:

1: V_{11}:=l_{1}+1 i_{1}:=V_{11}Jump(2) 2: i_{2}:=phi(i_{1},i_{3}); j_{1}:=phi(j,j_{4}); k_{1}:=phi(k,k_{5}); V_{21}:=l_{2}+1 j_{2}:=V_{21}Jump(3) 3: j_{3}:=phi(j_{2},j_{4}); k_{2}:=phi(k_{1},k_{5}); V_{31}:=l_{3}+1 k_{3}:=V_{31}Jump(4) 4: k_{4}:=phi(k_{3},k_{5}); V_{1}:=i_{2}V_{2}:=V_{1}*C_{1}V_{3}:=j_{3}V_{4}:=V_{3}*C_{2}V_{5}:=V_{2}+V_{4}V_{6}:=k_{4}V_{7}:=V_{6}*C_{3}V_{8}:=V_{5}+V_{7}V_{9}:=V_{8}-C_{01}V_{10}:=a[V_{9}] ... V_{41}:=V_{6}+1 k_{5}:=V_{41}V_{42}:=V_{41}<u_{3}JumpT(4,V_{42}) JumpF(5,V_{42}) 5: V_{51}:=V_{3}+1 j_{4}:=V_{51}V_{52}:=V_{51}<u_{2}JumpT(3,V_{52}) JumpF(6,V_{52}) 6: V_{61}:=V_{1}+1 i_{3}=V_{61}V_{62}:=V_{61}<u_{1}JumpT(2,V_{62}) JumpF(exit,V_{62})

- Note that numbers like
`l`are constants -- the addition can be done at compile time_{1}+1 - Linkage among the basic blocks is explicit

- backwards past the computation of any of its operands
- forwards past any computation using its value
- onto a path along which it would not have been computed by the original program.

- It may be possible to move the computation of an operand backward, or the use of a result forward, and thus gain more freedom
- Moving a computation where it wouldn't have been done is usually a ``pessimization'' of the program
- If a computation that could lead to an error is moved onto a path along which it would not have been computed by the original program, the optimized program can fail when the unoptimized one does not
- It is not necessary to preserve the failure position, so we can move computations that may fail
- Optimizations might
*delete*a computation that fails, creating an optimized program that runs where the unoptimized one fails. - On some machines,
*speculative*execution of code that will not fail is useful -- it covers time required to access memory for a new address

- Every path from blocks 1, 2 and 3 to the exit passes through block 2,
so the computation of
`V`and_{1}`V`could be moved to any of these blocks without moving it onto a path along which it might not be computed_{2} - One of the operands of
`V`is computed in block 2, so_{1}`V`and_{1}`V`cannot be moved into block 1_{2}

Thus the lumped computation could be carried out in any of the three starred blocks.

- ``Earliest'' placement is in block 2
- If the computation is done in block 2, then it need not be done in block 4
- ``Code motion'' is really insert and delete

- Find the earliest computation points for
`v`(2 and 5) - Note that this results in the
*fewest*computations of`v` - Discuss the lifetime of
`v`and the concept of*register pressure* - Note that the same number of computations of
`v`will be*executed*if computations are done in 2, 7, 14 and 15 - Reducing register pressure may avoid spills, so the extra computations may cost nothing

Consider the implementation of

if (i != 0) while (j > (k / i)) { ... /* No changes made to i or k here */ j--; } 1: VBoth_{11}:=i V_{12}:=V_{11}!=0 JumpT(exit,V_{12}) JumpF(2,V_{12}) 2: j_{1}:=phi(j,j_{2}) V_{21}:=k V_{22}:=V_{21}/V_{11}V_{23}:=j_{1}V_{24}:=V_{23}>V_{22}JumpT(3,V_{24}) JumpF(exit,V_{24}) 3: ... V_{32}:=j_{1}-1 j_{2}:=V_{32}Jump(2)

- Introducing the computation into block 1 would allow a division by 0

- The constant calculations in the loop exist on all paths from block 0 to the exit
- Moving them to block 0 gets them out of the loop
- We assume that critical edges will
*always*be broken in this way - Simple transformation of the control flow graph before any further processing is done

for (i = lHere is the definition of a_{1}+ 1; i < u_{1}; i++) for (j = l_{2}+ 1; j < u_{2}; j++) for (k = l_{3}+ 1; k < u_{3}; k++) a[i,j,k] = ( a[i-1,j,k] + a[i+1,j,k] + a[i,j-1,k] + a[i,j+1,k] + a[i,j,k-1] + a[i,j,k+1] ) / 6.0;

for (i = l_{1}+ 1; i < u_{1}; i++) S; i = l_{1}+ 1; while (i < u_{1}) { S; i++; }

- Test at the beginning of the loop
- Initialization and test must be separate basic blocks
- Increment may belong to the same basic block as stuff from
`S`

Here are the tuples, grouped into basic blocks numbered according to the diagram:

1: V_{11}:=l_{1}+1 i_{1}:=V_{11}Jump(2) 2: i_{2}:=phi(i_{1},i_{3}) j_{1}:=phi(j,j_{3}) V_{21}:=i_{2}V_{22}:=V_{21}<u_{3}JumpT(3,V_{22}) JumpF(exit,V_{22}) 3: V_{31}:=l_{2}+1 j_{2}:=V_{31}Jump(4) 4: j_{3}:=phi(j_{2},j_{4}) k_{1}:=phi(k,k_{3}) V_{41}:=j_{3}V_{42}:=V_{41}<u_{3}JumpT(5,V_{42}) JumpF(9,V_{42}) 5: V_{51}:=l_{3}+1 k_{2}:=V_{51}Jump(6) 6: k_{3}:=phi(k_{2},k_{4}) V_{61}:=k_{3}V_{62}:=V_{61}<u_{3}JumpT(7,V_{62}) JumpF(8,V_{62}) 7: V_{1}:=i_{2}V_{2}:=V_{1}*C_{1}... V_{71}:=V_{6}+1 k_{4}:=V_{71}Jump(6) 8: V_{81}:=V_{41}+1 j_{4}:=V_{81}Jump(4) 9: V_{61}:=V_{21}+1 i_{3}=V_{61}Jump(2)

- The computations
`V`and_{1}`V`cannot be moved in this diagram because any move would place them on a path along which they would not have been computed by the original program_{2} - There are no critical edges
- Problem is that a pre-tested loop may not be executed at all

- This implementation produces a critical edge, from the first instance of the test to the body
- The critical edge removal will insert an empty basic block on this edge
- The inserted block will act as a ``landing pad'' for calculations that are constant within the loop
- Only the
*innermost*loop will be affected, because the pre-test of the innermost loop blocks motion out of the landing pad

waite@isa.informatik.th-darmstadt.de | Revision $Revision: 1.4 $ |