If a programming language permitted each identifier to have only one meaning, there would be no need for complex name analysis algorithms. One subtask of name analysis, consistent renaming, essentially re-writes the program by replacing identifiers with definition table keys. Each key has only one meaning, so the compiler is back to the simpler situation. Section 5.1 of Appel gives a general characterization of the mechanisms used in consistent renaming.
Three considerations influence the design of a compiler's name analyzer: the scope rules of the source language, general approaches to computations over trees, and available implementation strategies.
Variables R, P and T of the identification procedure (and their primed versions) denote regions of text. A region of text corresponds to a subtree of the abstract program tree, and the root of that subtree can be used to represent the entire region. Thus the identification procedure can be mapped onto a computation over the abstract program tree.
As stated, the identification procedure establishes a relationship between an applied occurrence and a defining occurrence. If we assume that the entity represented by a defining occurrence is stored at the tree node corresponding to that defining occurrence, then a relationship between an applied occurrence and a defining occurrence allows the compiler to retrieve the entity represented by the applied occurrence. Since the relationship between applied and defining occurrence is implicit in the fact that both represent the same entity, the name analyzer need only store that entity at both the defining and applied occurrences.
Section 7.1.1 of Waite explains how the search components of an identification procedure can be encapsulated in a module that is independent of particular scope rules. The scope rules of the language are embodied in the order in which the module's operations are invoked. Here is the C-- identification procedure restated in terms of the operations of that module:
You should convince yourself that this formulation of the identification procedure actually yields the same result as that of Section A.1.2.3 of Waite. The key point is that the Defineldn operation has a side effect on its environment argument, adding a binding for the identifier. This side effect changes the environment's response to subsequent KeylnEnv operations.
After step (1) has been carried out, steps (2) and (3) are largely mechanical. Thus the difficult tasks when designing tree computations are to decide what properties of program constructs are interesting, and what computations will yield the values of those properties.
Experience has led to a two-step approach to specifying computations of a property of program constructs. After stating this approach, I'll give some examples. Implementation issues, including determination of the tree traversal and allocation of intermediate storage, will be considered in the next section.
|Each lower context of the symbol requires a different computation of the property.||Design an appropriate computation for each rule having the symbol on the left-hand side.|
|Each upper context of the symbol requires a different computation of the property.||Design an appropriate computation for each rule having the symbol on the right-hand side.|
|The property can be computed independently of the particular context, using attributes of the symbol itself or remote-access attributes.||Design an appropriate lower-context symbol computation.|
|The property can be computed almost independently of the particular context, using attributes of the symbol itself or remote-access attributes. There are a few exceptions where a lower context of the symbol requires a different computation.||Design an appropriate lower-context symbol computation and override it in the exceptional contexts.|
|The property can be computed almost independently of the particular context, using attributes of the symbol itself or remote-access attributes. There are a few exceptions where an upper context of the symbol requires a different computation.||Design an appropriate upper-context symbol computation and override it in the exceptional contexts.|
Consider, for example, the design of the interpreter in Tree Abstractions for Programs. To interpret a program in Appel's straight-line programming language, the value property of expression constructs must be computed. Since Appel's language allows only integer expressions, int is an appropriate type to encode expression values. Thus an attribute of type int, named Value, was introduced. Value was associated with the symbols Exp and Binop, since they represented the only program constructs having expression values. This completes step (1) of the two-step approach to specifying tree computations.
(Note that a different, but perfectly valid, interpreter design would have resulted had the Value attribute not been associated with symbol Binop. Instead, an ``evaluation function'' property of the operator constructs could have been computed. That evaluation function would have been used to compute the value resulting from an expression like 2+3. The type int (*EvalFn)(int,int) could represent an evaluation function, and an attribute Evaluator of type EvalFn could be introduced. Evaluator would be associated with the symbol Binop, since that symbol represents all of the program constructs that would have the evaluation function property.)
As an example of the second step of the two-step process, consider the Exp symbol in the abstract syntax for Appel's language. It appears in a number of different rules, and each rule represents a distinct context for Exp. Rules in which Exp appears on the left-hand side represent lower contexts of Exp: The rule describes a part of the tree involving an Exp node and its children. A rule in which Exp appears on the right-hand side represents an upper context of Exp because it describes a part of the tree involving an Exp node and its parent and siblings. Thus an abstract syntax determines a fixed set of lower contexts for each nonterminal symbol and a fixed set of upper contexts for each nonterminal symbol.
The solution to a tree computation problem provides a computation for every attribute of every symbol. All of the computations for a particular attribute of a particular symbol could be in the lower contexts of that symbol, or all of them could be in the upper contexts. If some computations for a particular attribute of a particular symbol appear in upper contexts of that symbol and others appear in lower contexts of that symbol, then there will be trees in which the value of an attribute is ambiguous. This situation is therefore not allowed.
Computation of the Value attribute of Exp will always take place in a lower context: In a NumExp context, the Value attribute of the Exp appearing on the left-hand side is the value of the num terminal appearing on the right-hand side; in an OpExp context, the Value attribute of the Exp appearing on the left-hand side is the value of the Value attribute of the Binop appearing on the right-hand side. Because each lower context of Exp requires a different computation of the Value attribute, this problem satisfies the computation pattern of the first row of the table. It was indeed solved by the strategy of designing a computation for each rule having Exp on the left-hand side.
The three implementation techniques discussed in this section are mutually exclusive with respect to one task. For example, you will choose one of the three techniques for your implementation of the name analysis task. It is quite common to mix LIDO coding and the use of the Eli library when implementing a complete compiler, because the library does not support all of the tasks that must be carried out. A C++ implementation often calls low-level modules like the environment module discussed in Chapter 7 of Waite, but it is unusual to use Eli-generated code as part of a C++ implementation.
In the case of the C-- name analysis, the identification procedure as restated in terms of the operations defined by Waite's Figure 7.5 identified three computational roles: those played by a range, a defining occurrence and an applied occurrence. The role played by a range involved two computations, establishing a new current environment and restoring the old one, whereas the other two roles involved one computation each.
An attribution library module is made up of a number of named computational roles. To use the module, it must first be instantiated. Then appropriate symbols of the abstract syntax must be specified to inherit computational roles of the module. The description of each module shows how that module should be instantiated and explains the computational roles it provides.
The Eli library provides an attribution library module for solving the name analysis problem with C-like scope rules. To you use this module, you need only instantiate it and specify which symbols of the C-abstract syntax inherit computational roles of the module.
Define a new type-lido file for the inheritance specifications rather than adding them to existing files. This enhances the modularity of your specification, making it easier to understand and maintain. Each LIDO rule can appear an arbitrary number of times in an arbitrary number of type-lido files. If you want to specify the rule's name, you need do it only in one place; Eli considers rules to be the same if they have the same signature (the grammar rule is the signature) and do not have different user-specified names.
If you use the environment module, you will need to make the dependence of your computations on side effects explicit. Remember that the Defineldn operation has a side effect on its environment argument, and the identification procedure discussed above made use of that side effect to implement the C-- scope rules.
When you attach computations to rules and symbols, you leave the evaluation order for Eli to figure out. Eli knows that it must obtain the value of an argument before calling the function using that argument. That is what is called a functional dependence. When an operation has a side effect on some data structure that will influence the value of a later operation, the later operation is certainly dependent upon the earlier. That dependence is not explicit in the specification, however. Thus additional information must be provided. You've already seen an example of dependence due to side effects in the specification of the left-to-right evaluation required by the informal semantics of the straight-line programming language. The LIDO CHAIN construct is the mechanism for describing this kind of dependence.
Although it is theoretically possible to associate visitors with the classes corresponding to symbols of the abstract syntax, some thought will show that there is no advantage to be gained by doing that. The only way of reducing the amount of code to be written is by using procedures or macros that are invoked in the methods.
You must also determine the appropriate tree-traversal strategy and code that into your visitor. In the case of the C-- name analysis, the identification procedure as restated in terms of the operations defined by Waite's Figure 7.5 requires a depth-first, left-to-right traversal. Thus the strategy used for the interpreter for the straight-line programming language can be followed.
You must also consider how to store both the intermediate environment values and the final key values. The intermediate values are local to your visitor, but the final key values must be accessed by later visitors implementing other computations. Also, they are associated with specific nodes of the tree and therefore should probably be fields of classes implementing the abstract program tree.
Attributes are not really variables, because they can be given exactly one value. This behavior can be described in C++ by using a method to set the field and having that method signal an error if it is invoked more than once. You might decide, however, to simply implement the attribute as a public field and forego the error checking.
Submit your group's answers to the following questions to email@example.com before the beginning of class on the due date:
Complete your personal postmortem for this assignment and your personal project plan for the next assignment, and submit them via the web before class on the due date of this assignment.
|Instructor||Revision 1.3 (1998/09/12 20:38:50)|