Scope Rules

Purpose

To focus your attention on the issues central to any computation over a tree, to specifically explore the needs of name analysis, and to develop an implementation strategy that will serve as the basis for your approach to the next assignment:

Due dates

You must complete the PEP tasks (individually), and then hand in the answers to the questions in this exercise (as a group) at the beginning of class on Monday, September 28. This assignment is worth 30 points.

Background

Name analysis must be performed by a compiler for any language allowing user-defined identifiers. It uses the scope rules of the source language to associate each occurrence of an identifier with the definition of the entity (e.g. constant, variable, type, procedure, etc.) that occurrence represents. Because the scope rules are stated in terms of program structure, name analysis is defined in terms of the source program tree.

If a programming language permitted each identifier to have only one meaning, there would be no need for complex name analysis algorithms. One subtask of name analysis, consistent renaming, essentially re-writes the program by replacing identifiers with definition table keys. Each key has only one meaning, so the compiler is back to the simpler situation. Section 5.1 of Appel gives a general characterization of the mechanisms used in consistent renaming.

Three considerations influence the design of a compiler's name analyzer: the scope rules of the source language, general approaches to computations over trees, and available implementation strategies.

The scope rules of C--

C-- is defined in Waite's Appendix A. Section A.l.2.3 spells out the C-- visibility rules in terms of an identification procedure for finding the defining occurrence identified by a specific applied occurrence. Step (4) of that procedure makes the scope of a definition the program text from the definition to the end of the smallest containing range. This ``C-like'' scope rule contrasts with the ``ALGOL-like'' scope rule in which the scope of a definition is the entire range containing it. (Waite discusses name analysis for languages using ALGOL scope rules in Chapter 7.)

Variables R, P and T of the identification procedure (and their primed versions) denote regions of text. A region of text corresponds to a subtree of the abstract program tree, and the root of that subtree can be used to represent the entire region. Thus the identification procedure can be mapped onto a computation over the abstract program tree.

As stated, the identification procedure establishes a relationship between an applied occurrence and a defining occurrence. If we assume that the entity represented by a defining occurrence is stored at the tree node corresponding to that defining occurrence, then a relationship between an applied occurrence and a defining occurrence allows the compiler to retrieve the entity represented by the applied occurrence. Since the relationship between applied and defining occurrence is implicit in the fact that both represent the same entity, the name analyzer need only store that entity at both the defining and applied occurrences.

Section 7.1.1 of Waite explains how the search components of an identification procedure can be encapsulated in a module that is independent of particular scope rules. The scope rules of the language are embodied in the order in which the module's operations are invoked. Here is the C-- identification procedure restated in terms of the operations of that module:

  1. Initially, every identifier occurrence has an associated integer value, its Sym attribute, giving the index of the identifier in the string table. Each textually distinct identifier has exactly one index.

  2. NewEnv is invoked to obtain a value of type Environment. This value becomes the current environment.

  3. The text is traversed in the order in which it was written, performing the following actions at the indicated constructs:

    Beginning of a range
    Obtain a new environment by applying NewScope to the current environment. The current environment becomes the new environment until the end of the range is reached.
    Defining occurrence
    Obtain a value for the Key attribute of the defining occurrence by applying Defineldn to the current environment and the Sym attribute of the defining occurrence.
    Applied occurrence
    Obtain a value for the Key attribute of the applied occurrence by applying KeylnEnv to the current environment and the Sym attribute of the applied occurrence.
    End of a range
    Reset the current environment to the value it had before the application of NewScope at the beginning of the range.

  4. The Key attribute of each identifier occurrence defines the entity represented by that occurrence.

You should convince yourself that this formulation of the identification procedure actually yields the same result as that of Section A.1.2.3 of Waite. The key point is that the Defineldn operation has a side effect on its environment argument, adding a binding for the identifier. This side effect changes the environment's response to subsequent KeylnEnv operations.

Designing tree computations

Section 7.1 of Waite describes the name analysis problem as an example of a general tree computation. Solutions to tree computation problems have three distinct components:

  1. The computations themselves, associated with particular tree contexts.
  2. The tree traversal needed to perform the computations in the appropriate order.
  3. The intermediate storage needed for communication among the computations.

After step (1) has been carried out, steps (2) and (3) are largely mechanical. Thus the difficult tasks when designing tree computations are to decide what properties of program constructs are interesting, and what computations will yield the values of those properties.

Experience has led to a two-step approach to specifying computations of a property of program constructs. After stating this approach, I'll give some examples. Implementation issues, including determination of the tree traversal and allocation of intermediate storage, will be considered in the next section.

  1. Determine the type of value to encode the property, and introduce an attribute of that type. Associate that attribute with all symbols representing program constructs having the property.

  2. Consider each symbol having the property, and select a computation strategy for the attribute by using the following table: .

    Computation patternStrategy
    Each lower context of the symbol requires a different computation of the property. Design an appropriate computation for each rule having the symbol on the left-hand side.
    Each upper context of the symbol requires a different computation of the property. Design an appropriate computation for each rule having the symbol on the right-hand side.
    The property can be computed independently of the particular context, using attributes of the symbol itself or remote-access attributes. Design an appropriate lower-context symbol computation.
    The property can be computed almost independently of the particular context, using attributes of the symbol itself or remote-access attributes. There are a few exceptions where a lower context of the symbol requires a different computation. Design an appropriate lower-context symbol computation and override it in the exceptional contexts.
    The property can be computed almost independently of the particular context, using attributes of the symbol itself or remote-access attributes. There are a few exceptions where an upper context of the symbol requires a different computation. Design an appropriate upper-context symbol computation and override it in the exceptional contexts.

    Consider, for example, the design of the interpreter in Tree Abstractions for Programs. To interpret a program in Appel's straight-line programming language, the value property of expression constructs must be computed. Since Appel's language allows only integer expressions, int is an appropriate type to encode expression values. Thus an attribute of type int, named Value, was introduced. Value was associated with the symbols Exp and Binop, since they represented the only program constructs having expression values. This completes step (1) of the two-step approach to specifying tree computations.

    (Note that a different, but perfectly valid, interpreter design would have resulted had the Value attribute not been associated with symbol Binop. Instead, an ``evaluation function'' property of the operator constructs could have been computed. That evaluation function would have been used to compute the value resulting from an expression like 2+3. The type int (*EvalFn)(int,int) could represent an evaluation function, and an attribute Evaluator of type EvalFn could be introduced. Evaluator would be associated with the symbol Binop, since that symbol represents all of the program constructs that would have the evaluation function property.)

    As an example of the second step of the two-step process, consider the Exp symbol in the abstract syntax for Appel's language. It appears in a number of different rules, and each rule represents a distinct context for Exp. Rules in which Exp appears on the left-hand side represent lower contexts of Exp: The rule describes a part of the tree involving an Exp node and its children. A rule in which Exp appears on the right-hand side represents an upper context of Exp because it describes a part of the tree involving an Exp node and its parent and siblings. Thus an abstract syntax determines a fixed set of lower contexts for each nonterminal symbol and a fixed set of upper contexts for each nonterminal symbol.

    The solution to a tree computation problem provides a computation for every attribute of every symbol. All of the computations for a particular attribute of a particular symbol could be in the lower contexts of that symbol, or all of them could be in the upper contexts. If some computations for a particular attribute of a particular symbol appear in upper contexts of that symbol and others appear in lower contexts of that symbol, then there will be trees in which the value of an attribute is ambiguous. This situation is therefore not allowed.

    Computation of the Value attribute of Exp will always take place in a lower context: In a NumExp context, the Value attribute of the Exp appearing on the left-hand side is the value of the num terminal appearing on the right-hand side; in an OpExp context, the Value attribute of the Exp appearing on the left-hand side is the value of the Value attribute of the Binop appearing on the right-hand side. Because each lower context of Exp requires a different computation of the Value attribute, this problem satisfies the computation pattern of the first row of the table. It was indeed solved by the strategy of designing a computation for each rule having Exp on the left-hand side.

    Available implementation strategies

    The design of a tree computation involves identifying the properties to be determined, giving them types, defining attributes, and associating computations with particular tree nodes. Some kinds of computation (like name analysis) occur frequently and have sufficiently few variants that standard implementations are available in the Eli library and can be re-used. Computations that occur less frequently, or have many variants, can be specified and associated with tree nodes using LIDO. When an implementor wishes to maintain tight control over all aspects of an implementation, or must add computations to an existing C++ program, the Visitor design pattern can be used and C++ code written.

    The three implementation techniques discussed in this section are mutually exclusive with respect to one task. For example, you will choose one of the three techniques for your implementation of the name analysis task. It is quite common to mix LIDO coding and the use of the Eli library when implementing a complete compiler, because the library does not support all of the tasks that must be carried out. A C++ implementation often calls low-level modules like the environment module discussed in Chapter 7 of Waite, but it is unusual to use Eli-generated code as part of a C++ implementation.

    Eli name analysis library

    The two-step process for specifying computations over a tree associates computations with rules or symbols of the abstract syntax describing the tree. A given task usually involves several properties of program constructs. Attributes and computations for these properties can be gathered together into a module. Within a module, the attributes and computations can be divided into a number of computational roles played by different symbols.

    In the case of the C-- name analysis, the identification procedure as restated in terms of the operations defined by Waite's Figure 7.5 identified three computational roles: those played by a range, a defining occurrence and an applied occurrence. The role played by a range involved two computations, establishing a new current environment and restoring the old one, whereas the other two roles involved one computation each.

    An attribution library module is made up of a number of named computational roles. To use the module, it must first be instantiated. Then appropriate symbols of the abstract syntax must be specified to inherit computational roles of the module. The description of each module shows how that module should be instantiated and explains the computational roles it provides.

    The Eli library provides an attribution library module for solving the name analysis problem with C-like scope rules. To you use this module, you need only instantiate it and specify which symbols of the C-abstract syntax inherit computational roles of the module.

    Define a new type-lido file for the inheritance specifications rather than adding them to existing files. This enhances the modularity of your specification, making it easier to understand and maintain. Each LIDO rule can appear an arbitrary number of times in an arbitrary number of type-lido files. If you want to specify the rule's name, you need do it only in one place; Eli considers rules to be the same if they have the same signature (the grammar rule is the signature) and do not have different user-specified names.

    LIDO coding

    The two-step process for specifying computations over a tree allows you to associate specific computations with symbols and/or rules of the C-- abstract syntax. You can code these computations in LIDO, using the operations exported by the environment module whose interface is given in Waite's Figure 7.5. That module must also be made available by placing the line $/Name/envmod.specs in a type-specs file (as was done in the Eli specification for the straight-line language).

    If you use the environment module, you will need to make the dependence of your computations on side effects explicit. Remember that the Defineldn operation has a side effect on its environment argument, and the identification procedure discussed above made use of that side effect to implement the C-- scope rules.

    When you attach computations to rules and symbols, you leave the evaluation order for Eli to figure out. Eli knows that it must obtain the value of an argument before calling the function using that argument. That is what is called a functional dependence. When an operation has a side effect on some data structure that will influence the value of a later operation, the later operation is certainly dependent upon the earlier. That dependence is not explicit in the specification, however. Thus additional information must be provided. You've already seen an example of dependence due to side effects in the specification of the left-to-right evaluation required by the informal semantics of the straight-line programming language. The LIDO CHAIN construct is the mechanism for describing this kind of dependence.

    C++ coding

    When you code in C++, all computations must be associated with rules because every object in the tree is an instance of a rule class. If the computation pattern matches any of lines 3-5 of the table, the symbol computation must be copied into the visitor method of each class associated with a rule constituting the appropriate context of the symbol.

    Although it is theoretically possible to associate visitors with the classes corresponding to symbols of the abstract syntax, some thought will show that there is no advantage to be gained by doing that. The only way of reducing the amount of code to be written is by using procedures or macros that are invoked in the methods.

    You must also determine the appropriate tree-traversal strategy and code that into your visitor. In the case of the C-- name analysis, the identification procedure as restated in terms of the operations defined by Waite's Figure 7.5 requires a depth-first, left-to-right traversal. Thus the strategy used for the interpreter for the straight-line programming language can be followed.

    You must also consider how to store both the intermediate environment values and the final key values. The intermediate values are local to your visitor, but the final key values must be accessed by later visitors implementing other computations. Also, they are associated with specific nodes of the tree and therefore should probably be fields of classes implementing the abstract program tree.

    Attributes are not really variables, because they can be given exactly one value. This behavior can be described in C++ by using a method to set the field and having that method signal an error if it is invoked more than once. You might decide, however, to simply implement the attribute as a public field and forego the error checking.

    Task

    Hand in your group's written answers to the following questions at the beginning of class on the due date:

    1. ALGOL 60, one of the first ``higher level'' programming languages, defined the scope of a definition to be the range immediately surrounding the definition, less any enclosed ranges containing a definition of the same identifier. A whole family of languages based on ALGOL 60, including Pascal, inherited this definition of scope. Although Pascal uses ALGOL 60's definition of scope, the Pascal standard prohibits an identifier from being used before it has been defined. Some people claim that therefore Pascal and C have the same scope rules. Do you agree? Explain briefly.

    2. We have made the assumption that name analysis is carried out by a computation over an abstract program tree.
      1. Would it be possible to carry out consistent renaming on a C-- program during lexical and syntactic analysis? Explain briefly.
      2. Would it be possible to carry out consistent renaming on a Pascal program during lexical and syntactic analysis? Explain briefly.
      3. Would it be possible to carry out consistent renaming on an ALGOL 60 program during lexical and syntactic analysis? Explain briefly.

    3. Briefly characterize the similarities and differences between the bindings advocated by Appel's Section 5.2 and Waite's Section 7.2.

    4. Apply the two-step design process discussed above to the consistent renaming problem for C--.
      1. What properties of program constructs must be computed?
      2. What type of value will you use to encode each?
      3. What attributes will you introduce?
      4. What symbol(s) of the C-- abstract syntax will these attributes be associated with?
      5. For each symbol and attribute, which line of the table matches the necessary computation pattern?
      6. For each match, state the computation.
      7. Is the direct dependence implied by the arguments of these computations sufficient to describe all dependence present in the problem? Explain briefly.

    5. Suppose that the language in the previous question were changed from ``C--'' to ``ALGOL 60''. Would any of your answers change? Explain briefly.

    Submit your group's answers to the following questions to lrc@sei.cmu.edu before the beginning of class on the due date:

    1. Briefly discuss the costs and benefits of each of the available implementation strategies outlined above. Try to assess each cost and benefit objectively, regardless of the strategy you think you will ultimately choose.
    2. Select one of the strategies on the basis of your assessment. Write a brief description of the tasks to be carried out by each team member for the next assignment, giving a rationale for your partitioning.

    Complete your personal postmortem for this assignment and your personal project plan for the next assignment, and submit them via the web before class on the due date of this assignment.


    Instructor
    Revision 1.3 (1998/09/12 20:38:50)