Data Flow Analysis
We've seen how computations can be moved to other parts of the program so
that redundancy in time is reduced.
These moves were relatively straightforward for a human to carry out, but
we need the optimizer to do them without human intervention.
Data flow analysis is the general set of techniques that the
compiler uses.
A typical data flow analysis problem
For a more complete discussion of this problem, see
Optimal Code Motion: Theory and Practice,
ACM Transactions on Programming Languages and Systems, 16 (1994) 1117-1155.
One of the constraints placed on code motion is not to move a computation
to a path where it would not have been executed in the original program.
To check this constraint, the compiler might compute a predicate
d-safe(p,v), which is true if a computation of v at
program point p does not introduce a new value on a terminating
path starting at p.
[Discuss-
- Characterization of a data flow problem by a function of a program
point and a value
- This particular analysis only makes sense if global value numbering
has been done previously
- The need to compute that function for each value at each program point
- When the function result is Boolean, we can often compute it in
parallel for a number of values by using a bit vector.
]
Evaluating d-safe(p,v)
We can infer the following from the intended meaning of d-safe(p,v):
- If v is used at program point p then
d-safe(p,v) is true
- If p does not compute any operand of v,
and d-safe(p',v) is true for all of the successors p' of
p then d-safe(p,v) is true
The local facts about what happens to a value at a program point can be
captured by two local predicates:
used(p,v): v appears in the tuple at p
through(p,v): no operand of v appears before :=
in the tuple at p
Using these predicates, our inferred conditions can be described by:
d-safe(p,v) =
used(p,v) or
through(p,v) and
for all successors p' of p, d-safe(p',v)
Let's try computing d-safe(p,v) for the following graph:
[Discuss-
- Working through from the back
- d-safe(17,v) must be false, because a computation of
v in 17 introduces a new value on the path Entry, 1, 3,
4, 6, 17, Exit
- Therefore d-safe(exit,v) must be assumed to be false
- d-safe(11,v) and d-safe(12,v) are both clearly true,
but the formal description above would also allow them both to be false
- A fixedpoint of a computation is a set of values such that
if you compute all of the functions using them the answers don't change
- The computation of d-safe(p,v) for this graph, using the given
definition, has two fixedpoints
- In this case, we get the one we want by assuming
d-safe(p,v) = true for all p = 1,...,17 initially
- Any value can only change once, so the computation must reach a
fixedpoint
]
Solution algorithm
The description above provides a function for computing the value of
d-safe(p,v) at a point, given the values at successor points;
it does not give an algorithm for computing the values at all
points in the graph.
Here's an algorithm:
- Put all p onto a work list, and set all values of
d-safe(p,v) to ``true''
- Take an element off of the work list and compute its value
- If the new value differs from the old, put all p depending on
this value onto the work list if they are not there already
- If the work list is not empty, go to step 2
[Discuss-
- This algorithm is linear in the number of values:
- A value can change at most once
- A value can be put back on the list at most once (when some
successor becomes false)
- Those values dependent on a value are simply its predecessors in the
graph
- Note that in practice we combine the first computation with putting the
value on the work list
- A quadratic algorithm simply scans all p repeatedly, stopping
when none changes
- The quadratic algorithm could be approximately linear in
practice, if a good order happened to be chosen
]
Earliest computation point
The earliest program point, p, at which value v can be
computed is one for which d-safe(p,v) is true, but where there is
some path from the start node to p along which there is no point
p' for which d-safe(p',v) is true.
Recall that the earliest placement of a computation minimized the code
size.
To determine this placement, the compiler might compute a predicate
earliest(p,v), which is true at all nodes reached from the start
node via paths along which there is no point p' for which
d-safe(p',v) is true.
The earliest placement would then be at a point for which both
d-safe(p',v) and earliest(p,v) were true.
[Discuss-
]
Evaluating earliest(p,v)
Using the local predicates, we can say:
earliest(p,v) =
there exists a predecessor p' of p such that
(not d-safe(p',v) and earliest(p',v))
We can compute this predicate by using the same algorithm as before, but
- Setting all values to false initially
- Assuming that earliest(entry,v) = true
[Discuss-
- Do the computation for the sample graph, showing that the predicate is
true for nodes 1, 2, 3, 4, 5, 6 and 17
- The function and initial value have changed, but the algorithm remains
the same
- Those values dependent on a value are simply its successors in the graph
]
Delay
We argued that, although the earliest computation point theoretically
yielded the minimum code size, there might not be enough registers to hold
all of the values.
This would require spill code, which would increase both the execution time
and the code size.
A computation can be delayed to a node without execution-time cost
so long as none of the predecessors of that node uses the value computed.
In terms of predicates:
delay(p,v) =
(d-safe(p,v) and earliest(p,v)) or
for all predecessors, p', of p
(not used(p',v) and delay(p',v))
We can compute this predicate by using our general-purpose algorithm, but
- Setting all values to true initially
- Assuming that delay(entry,v) = false
[Discuss-
- Do the computation for the sample graph, showing that the predicate is
true for nodes 2, 5, 7, 8, 11, 12, 14 and 15
- Those values dependent on a value are simply its successors in the
graph
]
Latest placement
The final placement that we decided on in our example is determined by the
node beyond which we cannot delay a computation without execution-time
cost:
latest(p,v) =
delay(p,v) and
for some successor, p', of p)
(used(p,v) or not delay(p',v)
[Discuss-
- We don't need the general-purpose algorithm here because
latest(p,v) is not defined in terms of itself
- Do the computation for the sample graph, showing that the predicate is
true for nodes 2, 7, 14 and 15
]
Characteristics of data flow problems
The examples of data flow problems that we've seen are solved by Boolean
functions that involve:
- Successors of a node (e.g. d-safe)
- Predecessors of a node (e.g. earliest, delay)
- Universal quantifiers (e.g. d-safe, delay)
- Existential quantifiers (e.g. earliest)
Even though they appear to be quite different, all were actually solved by
the same algorithm with slightly different parameterizations.
We would like to capture this essential similarity in a model, which we can
then use to recognize and solve other data flow problems.
[Discuss-
- We use models for many compiler construction tasks
(e.g. parsing, name analysis, tree transformation)
- A model allows us to understand new problems in terms of old ones
- A model allows us to prove things like termination once
- There is the possibility of providing a tool that will generate
processors to solve problems fitting the model
]
Monotone data flow analysis
The model we'll use is the monotone analysis framework, which
constitutes:
- A set of inferences we make about the program, described as a complete
semilattice L = (A, top, bottom, meet) with height d,
where:
- A set, F, of k-ary monotone functions from lattice elements to
lattice elements containing the identity function and closed under pointwise
meet
- A mapping from operators of the program to elements of F
A function is monotone if, for all a, b in
A, a<=b implies f(a)<=f(b)
See A unified approach to path problems,
Journal of the ACM 28 (July, 1981)
577-593, for an explanation of the theory behind this class of problems.
[Discuss-
- The operator <= is defined on the lattice; it isn't
the normal integer operator
- d must be bounded by a constant (usually small -- 2 or 3)
- The arity of the functions is also usually no more than 3
- Our general algorithm is applied by first setting all of the values to
top
]
Example
Let's see how the problem of d-safety fits the monotone analysis framework:
- The inferences we make about the program are Safe and
Unsafe; Safe is the top element of the lattice and
Unsafe is the bottom element.
- The longest chain in the set of inferences is 1, so d = 1
- The meet operation is defined by the following table:
| meet | Safe | Unsafe
|
|---|
| Safe
| Safe
| Unsafe
|
|---|
| Unsafe
| Unsafe
| Unsafe
|
|---|
It's easy to see by complete enumeration that this operation satisfies all
of the conditions on meet.
- The set F contains three functions,
two 0-ary (constant) and one 1-ary:
- f0 = Safe
- f1 = Unsafe
- f2(a) = a
- The mapping from a program point p to an element of F
is:
used(p,v) ? f0 : through(p,v) ? f2 : f1
[Discuss-
- A function is monotone if the table describing its operation can be
written so that the the values in every row are monotone decreasing from
left to right and the values in every column are monotone decreasing from
top to bottom
- meet is applied to the successors of a node if there is more
than one
- f2(a) is applied to the result of the meet (a
single value summarizing the successors)
- Show how the computation works for the sample graph:
- Nodes 2, 9, 14, 15 and 16 get f0
- Node 1 gets f1
- All other nodes get f2
]
Proving termination
Here is a general method for proving that any iteration
terminates:
- Determine a predicate, B, that defines continuation (i.e. the
iteration stops if B is false)
- Determine an integer value, t, such that B implies
implies t>0 and t is decreased by each iteration
Existence of a t with these properties guarantees termination
(see The Science of Programming, Springer-Verlag, 1981).
Example:
while (fread(...) != EOF) {...}
B: We haven't gotten to the end of the file
t: Number of characters remaining in the file
[Discuss-
- Show that B and t meet the requirements
- Note that t need not be computable by the program
]
Proof of termination in a monotone analysis framework
Assume we use the work list algorithm:
B: The work list is empty
t: (sum of argument levels)*(total number of values) +
(size of work list)
``Argument level'' is the index of the row or column of the function's table,
with the origin at the lower right corner.
[Discuss-
- t>0 if the work list isn't empty
- If a value doesn't change when it's computed, that value is taken off
the work list, so t decreases
- If a value does change when it's computed, an arbitrary number
of computations may be put onto the work list but the total size of the
work list never exceeds the total number of values
- If a value changes, it must lower the level of some argument, so
t decreases
]
Global constant propagation
We can compute constant values at compile time, so it is useful to be able
to track them as they are assigned to variables and then used later.
This requires that we be able to mark every value as one of:
- Constancy undetermined
- A particular constant value
- Not a constant
[Discuss-
- Global constant propagation depends on the SSA representation, but
not on global value numbering; in fact, it should be done earlier,
to eliminate computations
- All values are initially undetermined as to constancy
- Values of external variables are always ``not a constant''
- Need functions to evaluate at least some operations
- Result of any operation with at least one ``not a constant'' operand is
``not a constant''
- At a phi function, all operands must be the same value for the result
to be constant
]
Constant propagation as monotone data flow analysis
For a more complete discussion, see Combining analyses, combining
optimizations,
ACM Transactions on Programming Languages and Systems 17 (March, 1995) 181-196.