Values
A computer program calculates a number of values.
One way to improve its performance is to reduce the number of values it
calculates.
This is done primarily by avoiding repeated calculation of the
same value.
The problem for the compiler is then to determine when two computations
yield the same value.
Redundant computations
A programmer generally does not knowingly write the code to compute the
same value several times.
The programming language may, however, conceal such recomputations in such
a way that the programmer has no choice in the matter.
Address calculations are the primary culprit.
Suppose that the array a has bounds
l1,
u1,
l2,
u2,
l3,
u3.
Assuming that
the array is stored by rows starting at address A,
and
that the size of each element is e,
the address of a[i,j,k] can be expressed as:
A + e * ( (i-l1) +
(j-l2) * (u1-l1) +
(k-l3) * (u1-l1) * (u2-l2)
)
Let di = (ui-li), and
after a little algebra:
A - e * (l1 + l2*d1 + l3*d1*d2)
+ i*e + j*e*d1 + k*e*d1*d2)
[Discuss-
- The programmer has no knowledge of or control over these computations
- The first line is a constant that depends only on the array dimensions
- d1*d2 is a constant
- The computation to be done at run time is
A - C0 + i*C1 + j*C2 + k*C3
- A similar expression can be obtained if the array is stored by columns
]
Example
Consider the following expression that is part of a typical scientific
computation:
a[i,j,k] := ( a[i-1,j,k] + a[i+1,j,k] +
a[i,j-1,k] + a[i,j+1,k] +
a[i,j,k-1] + a[i,j,k+1] ) / 6.0;
The address of a[i,j,k] was given above.
The address of a[i-1,j,k] is:
A - C0 + (i-1)*C1 + j*C2 + k*C3
= A - C0 - C1 + i*C1 + j*C2 + k*C3
= A - C01 + i*C1 + j*C2 + k*C3
[Discuss-
- i*C1 + j*C2 + k*C3
appears in every index computation
- Only the ``base address'' changes
- There is no practical way for the programmer to specify that
i*C1 + j*C2 + k*C3
should only be computed once
- A non-optimizing compiler will translate each array access
independently, thus computing this same value 7 times
]
How are values determined?
The normal code generator transforms the expression tree into a sequence of
instructions.
Considering the MIPS R2000, each instruction does one of the following
things:
- Loads a register
- Computes a value into a register from values in registers
- Stores a register
- Transfers control
[Discuss-
- RISC machine only computes values into registers
- Operands of a computation are always registers
- Specific operations that read and write memory, and do nothing else
- Strong distinction between registers and memory
]
Optimizer conventions
The optimizer represents the program as a sequence of actions that are
basically identical to RISC instructions.
[Discuss-
- Each action is a ``tuple''; p is a ``program point''
- Distinction between values and variables
- Possibility that variable addresses are computed
- Values are identified by integers
- Distinct integers represent distinct values
]
When are values identical?
It is important to note that, for the optimizer, two values are identical
if and only if they are obtained from the same values by use of the same
operator.
Consider the following sequence:
pi: Vj:=Vm+Vm
pi+1: Vk:=Vm*2
pi+2: Vj:=Vm+Vm
Vj and Vk are not identical, even
though they represent the same numerical result.
The computations at program points i and i+2, on the
other hand, involve identical operations on identical values and thus lead
to identical values.
[Discuss-
- The same computation can appear at any number of program points
- We say that values are identical if their defining tuples are congruent
- Congruence is used because mathematical equality is undecidable in
general and complicated even in decidable cases
- It's common to extend the notion of congruence to encompass some simple
identities:
- v + 0 is congruent to v
- v * 1 is congruent to v
- v2 op v1 is congruent to
v1 op v2 when op is commutative
- ...
The length of the list depends on the time the compiler designer wishes to
spend analyzing congruence.
]
Variables
A value is a specific value that gets computed at a particular program
point and never changes.
A variable, on the other hand, can be changed by writing to it.
- If we see a read tuple for a variable whose value is unknown, the value
delivered must be considered new -- not identical to any previous value
- Unless we can track the values of variables, we'll never
find identical values
[Discuss-
- The abstract program is a sequence of trees
- There is no subtree all of whose leaves are constants
- Non-constant leaves must be variable references
- If all variable references yield distinct values, then by
induction on the tree height we will never find any congruent tuples
]
Example
The first part of the tuple sequence for the array assignment mentioned
above might be:
p1: V1:=i
p2: V2:=V1*C1
p3: V3:=j
p4: V4:=V3*C2
p5: V5:=V2+V4
p6: V6:=k
p7: V7:=V6*C3
p8: V8:=V5+V7
p9: V9:=V8-C01
p10: V10:=a[V9]
If we didn't know that the variables i, j and k
were unchanged by this tuple sequence, then we would continue with a
sequence having the same form but different values.
Actually, however, the next 8 tuples will be identical to those at program
points 1-8 and can therefore be omitted.
[Discuss-
- Need to keep track of the value of a variable
- Need to have some sort of tuple lookup to test congruence
- Writes to a variable are included in this solution
- Need to be able to reliably distinguish variables -- no aliasing
]
Basic blocks
A basic block is a sequence of tuples containing no transfers of
control:
All of the tuples are executed, in order, if any of them
are executed.
This technique for tracking the values of variables works within a single
basic block.
It also enables us to implement an arbitrarily complex basic block as one
multiple assignment:
- Each variable used before being assigned in the block is read exactly
once
- When a write tuple is encountered, the value is noted
- At the end of the basic block, write tuples are generated for all
variables that were assigned
- The value assigned is the one currently shown as being the variable's
value
[Discuss-
- Theoretically interesting
- Demonstrates that all variable reads can be at the beginning of a block
and all variable writes at the end
- Reads and writes are ``boundary conditions''
]
Regions larger than a basic block
It is possible to identify several regions larger than a basic block that
have useful properties for optimization:
- Extended basic block
- A maximal tree of code sequences
- Strongly-connected region
- A collection of program points in which there is a path from every
program point to every other program point
- Procedure
- A collection of program points constituting a unit of abstraction
[Discuss-
- Our previous technique for tracking variable values still works in an
extended basic block but not in any larger region
- Computations yielding fixed values within a strongly connected region
should be moved out of that region
- Modularity allows procedures to be optimized independently
]
Tracking variables outside of extended basic blocks
The cornerstone of our analysis of values is our ability to track the
values stored in variables.
Let's look at the desired result in an extended basic block and then a
region that is neither a basic block nor an extended basic block.
Here the source language statements are to the right of the box
representing a basic block.
The top three boxes, enclosed by the dotted line, form an extended basic
block.
With the addtion of the fourth basic block, the structure no longer
satisfies the definition of an extended basic block.
[Discuss-
- Variable tracking within the extended basic block
- Addition of the join block
- Congruence does not apply to read and write tuples any more
- The symbol is a key that we use to obtain the variable's value
- Our method would work if an new variable name were used
instead of x in the join block
]
Static single assignment (SSA)
Use a new symbol for a variable whenever it is assigned or its value
becomes unknown because of a join.
Here's our example, with the source code altered so that it is SSA form.
The phi function expresses the fact that any of the possible
values reaching the beginning of the join box could be the value of the new
variable.
It permits some higher-level analysis of value propagation.
[Discuss-
- Every variable has exactly one value
- There may be no assignment to a symbol, in which case a new
value is created when the variable is accessed
- phi never has associated tuples -- it's an artifact of the
analysis
]
Converting a program to SSA form
Of course the program is not originally written in SSA form, so the
compiler must transform it.
There are two tasks that must be carried out:
- Find the places where variables are assigned (including joins) and
establish the new symbol names
- Consistently rename the variables so that each use has the same symbol
as the assignment that establishes its value
For a complete description of the process, see Efficiently Computing
Static Single Assignment Form and the Control Dependence Graph,
ACM Transactions on Programming Languages and Systems, 13 (1991) 451-490.
(Their treatment does not use the standard consistent renaming technique.)
[Discuss-
- Explicit assignments are easy to spot, but which variables are
arguments of phi functions at joins?
- Consistent renaming is a well-known compiler process, but what is the
``scope'' of an assignment?
- It turns out that these two questions are related.
]
Control flow graph
Here is an example of a control flow graph for complete procedure that
illustrates most of the issues.
It is slightly modifed from the cited SSA paper.
The numbered nodes represent the code of the procedure.
Entry and Exit represent the entry and exit operations of
the procedure.
[Discuss-
- Each numbered node of the graph corresponds to a basic block
- Directed edges represent the successor relation
- For the moment, the actual code is uninteresting
]
Dominator trees
A node X in the control flow graph dominates a node
Y if X lies on all paths from the entry node to
Y.
The dominance relation is reflexive and transitive.
We say that X strictly dominates Y when
X dominates Y and the two are different nodes.
Strict dominance can be described by a tree.
For our example:
See A Fast Algorithm for Finding Dominators in a Flowgraph
(ACM Transactions on Programming Languages and Systems 1 (1979) 121-141).
[Discuss-
- Each node of the dominator tree is a node of the control flow graph
- Linkage among the nodes is the only difference between the dominator
tree and the control flow graph
- Show that this tree really does describe strict dominance for this
control flow graph
]
Expansion of basic blocks
We've already seen that all of the assignments made within a basic
block can be collected at the end.
Also, if there is an assignment due to a phi function, it occurs at the
beginning of the basic block.
Thus we can regard each node of the dominator tree as consisting of three
nodes:
[Discuss-
- This expansion merely emphsizes the relations among assignments and
computations
- No processing algorithm will actually expand the nodes in this way, but
the expansion is reflected in attributes that decorate the tree
]
The scope of an assignment
In order to use standard consistent renaming algorithms (see
Modularity and Reusability in Attribute Grammars, Acta
Informatica, 31(1994) 601-627), we need to identify three roles:
- Defining occurrences
- Scopes
- Applied occurrence
An assignment (either in the original program or as the result of a phi
function) is a defining occurrence.
If a variable is not assigned, its defining occurrence is its first use.
If a variable use is not a defining occurrence, it's an applied occurrence.
The scope of a defining occurrence is the subtree of the dominator tree
rooted in the node containing that defining occurrence, less any subtrees
rooted in nodes containing defining occurrences of the same variable.
[Discuss-
- A use of a variable preceding any assignment establishes a value for
that variable
- An assignment in a node of the dominator tree precedes all uses in
nodes of the subtree
- If another assignment intervenes, the use will be in a subtree of
the node containing that assignment
- If there is a join, and a variable has been assigned in the blocks joined,
there will be an assignment via a phi function at the beginning of the join
block
]
Example
Let's return to the simple example we used earlier:
Here's the expanded dominator tree:
The dotted box on the right encloses the expansion of the basic block at
the bottom of the previous figure to include the assignment to x
via the phi function.
(Only the non-empty elements of each expansion are shown.)
[Discuss-
- All occurrences of x have been systematically renamed in the
tuples of both pictures
- Notice how the scope of each variable is related to the structure of
the dominator tree
- y is still valid in the lower left block because there was no
assignment to it
]
Dominance frontiers
The dominance frontier DF(X) of a control flow graph node
X is the set of all nodes Y such that X
dominates a predecessor of Y but does not strictly dominate
Y.
If there is an assignment to a variable in node X, then there must
be an assignment to that variable via a phi function at every node in
DF(X).
Consider a node Y in DF(X)
- X dominates a predecessor of Y, so there is a path
from X to Y
- X does not strictly dominate Y, so there is a path
from Entry to Y that doesn't go through X
- Therefore we can arrive at Y with either of two values for the
variable assigned in X
There is a linear algorithm for computing all dominance frontiers.
(See Efficiently Computing Static Single Assignment Form and the
Control Dependence Graph, ACM Transactions on Programming Languages
and Systems, 13(1991) 451-490.)
[Discuss-
- Strictly is important here because it allows a node to lie on
its own dominance frontier
]
Example
The dominance frontier for each node of our sample control flow graph is
given next to that node, enclosed in square brackets:
[Discuss-
- The dominance frontiers must be determined from a combination
of the dominantor tree and the control flow graph
- Notice that none of these algorithms require the tuples
]
Global value numbering
We can now determine which tuples result in identical values over the
entire procedure:
- Determine the basic blocks, along with their predecessors and
successors (this defines the control flow graph)
- Construct the dominator tree
- Add phi functions as necessary to the basic blocks
- Consistently rename the variables
- Generate the tuples in a depth-first, left-to-right traversal of the
dominator tree
[Discuss-
- Note that we end up with a tree representation of the program
- Consistent renaming can be combined with tuple generation
- When tuples are being generated, don't generate duplicates
within a basic block
- Note that the whole tuple need not be stored at each program point --
the tuple identification is sufficient
]