Overload Resolution

Purpose

To understand how information about entities is stored and made available, and how operator overloading can be resolved using such information:

Due dates

You must complete the PEP tasks (individually), and then hand in the answers to the questions in this exercise (as a group) at the beginning of class on Monday, October 12. This assignment is worth 30 points.

Background

The semantic analysis task of a compiler must verify that the source program satisfies any context conditions imposed by the language definition. Many of these context conditions are stated in terms of the type of the result of an expression. For example, Section A.1.4.1 of Waite states that the Expression component of a C-- conditional must yield a value of type int.

To verify that such context conditions are satisfied, the compiler must determine the result type of each expression. This computation uses information from the language definition about the set of legal operators, and information about the types of identifiers and denotations. The type of a denotation can be determined by the kind of basic symbol, but the type of an identifier is determined by its declaration and that information must be conveyed from the point of declaration to the point of use.

Type consistency checks therefore involve several distinct components:

Entities and properties

A variable may be referred to in many places in a program, and it has an existence that is independent of those references. Variables have properties that are determined by some of the references and used by others. Section 7.2 of Waite describes how a compiler uses a definition table to store the properties of entities like variables.

The definition table is a general database, in which each entity is represented by a unique definition table key. Properties of the entity are accessed via operations taking the definition table key as an argument. Each property has its own set of query and update operations. Figure 7.12 shows two basic access operations, which illustrate possible interfaces; an implementation of these interfaces appears in Figure 7.15.

Eli also supports this paradigm with a definition table module implemented as described in Section 7.2. Section 7.2.3 notes that implementation of the property-dependent code for access functions is trivial, but it is obvious that this implementation is burdensome when the number of properties is large. Thus Eli provides a simple property definition language (PDL). When a user describes the desired properties in PDL, Eli can generate a set of basic access functions. More complex behavior, including arbitrary access functions and property initialization, can be specified in PDL.

Dependence through side effects

Update and query operations depend on each other because of their side effects on the definition table itself. Recall that such dependence must be made explicit, either by selecting a particular tree traversal or through the use of VOID attributes.

The most critical aspect of dependence on side effects is recognizing its presence. When we write programs, we seldom consider side effects explicitly because we account for them implicitly in the sequence of operations. Traversal strategies in a compiler are sufficiently complex that we cannot rely on implicit ordering of code. Whether we write code or specifications, we must carefully consider the question of dependence on side effects in our design. Here is a proven strategy:

  1. Determine those contexts where the property is updated and select the updating computations according to step 2 of the two-step approach.
  2. Determine those contexts where the property is queried and select the querying computations according to step 2 of the two-step approach.
  3. Determine precondition relations between (1) and (2) and encode them in either a traversal strategy or a set of computations involving VOID attributes.

In most cases the precondition relations determined in step (3) can be stated as either ``do all operations in the order in which their contexts appear in the program'' or ``do all updates before doing any queries''.

The precondition relations for the Value property of a variable in Appel's straight-line programming language can be stated as ``do all operations in the order in which their contexts appear in the program'': Assignments to a variable update its Value property and uses of a variable query its Value property. The left-to-right semantics of the language then demand these precondition relations. Recall that they were encoded in C++ as a depth-first, left-to-right traversal implemented by a single visitor, and they were specified in Eli by a VOID chain.

The precondition relations for the EntityClass property of an identifier (Waite, Section 7.2.1) can be stated as ``do all updates before doing any queries'': At each defining occurrence, the EntityClass property is set to some classification (like VariableName) if it has not been set before, and to MultiplyDefined if it has been set before. It is also examined at each defining occurrence and an error report is issued if its value is MultiplyDefined. The precondition relations ``do all updates before doing any queries'' are required to be able to issue the error report at every defining occurrence (including the first) of a multiply-defined identifier. They can be encoded in a traversal strategy by placing all of the updating computations in one traversal and all of the querying computations in a later traversal. In Eli, they can be specified as follows:

SYMBOL ROOTCLASS COMPUTE
  SYNT.AllEntityClassesDone=CONSTITUENTS DefiningOccurrence.EntityClassDone;
END;

SYMBOL DefiningOccurrence COMPUTE SYNT.EntityClassDone=SetEntityClass(THIS.Key,THIS.Kind,MultiplyDefined); IF(EQ(GetEntityClass(THIS.Key,NotDefined),MultiplyDefined), message(ERROR,"Multiply defined",0,COORDREF)) <- INCLUDING ROOTCLASS.AllEntityClassesDone; END;

This example uses the basic query and update operations of the Eli definition table module. ROOTCLASS is a predefined LIDO symbol implicitly inherited by the root of the tree grammar. Thus AllEntityClassesDone is an attribute of the root, and CONSTITUENTS DefiningOccurrence.EntityClassDone ensures that AllEntityClassesDone depends on all of the SetEntityClass update operations. In turn, INCLUDING ROOTCLASS.AllEntityClassesDone ensures that each of the GetEntityClass query operations depends on the AllEntityClassesDone attribute of the root. Since dependence is transitive, the precondition relation ``do all updates before doing any queries'' is guaranteed.

Operator identification

Waite's Table A.1 defines the operator identification problem for C--. The left-hand column of the table lists the basic symbols that represent operators. Most of these symbols are overloaded, having more than one possible meaning. For example, = could mean integer assignment, truncating assignment, or floating assignment.

Generally speaking, the compiler must identify the operator of an expression in order to determine the type of the result of that expression. Operator identification in C-- involves matching the pattern represented by the ``Indication'' and ``Operand Type'' columns in Table A.1 to the information present at a tree node. Each pattern uniquely identifies one of the operations in the rightmost column of the table, and hence fixes the result type. The problem is complicated by the presence of coercions that allow the compiler to convert values of one type into values of another.

Appel's Section 5.3 shows how a specification like Waite's Table A.1 can be implemented by writing switch statements and conditionals. This strategy allows the implementor to take advantage of regularities in the patterns to simplify the code. For example, the results of all of the overloadings of ==, < and> have type int in C-- and therefore no tests are needed for those indications.

Appel's strategy can be easily applied to C-- type analysis using either C++ code or Eli specifications. The necessary operations would be carried out by a visitor in a C++ implementation, or by rule computations in a LIDO implementation. As Appel points out in Section 5.3, certain cases are common enough to warrant additional function definitions. When using Eli, those functions would be written in C and invoked from the LIDO rule computations. The actual design is straightforward, and essentially independent of whether C++ code or an Eli specification is ultimately produced.

C-- presents a simple type analysis problem, having only two types and one coercion (see Waite, Sections A.1.5.1 and A.1.5.2). As the number of types and coercions increases, or user-defined types are allowed, Appel's strategy becomes less appealing. A better approach under those circumstances is to implement general operations that can be applied to data encoding the set of types and coercions. This kind of tradeoff, between solving a problem with specialized code and solving it with generalized code operating on specialized data, is a common one in software design.

Section 10.1 of Waite presents a data-driven solution to the type analysis problem. The presentation is concerned with two distinct questions:

  1. Is the use of this particular operator symbol in this particular context consistent with the rules of the source language?
  2. How should this source language construct be implemented as a target program tree fragment?

Although it is possible to use a single process to answer both questions, this unnecessarily complicates both the explanation and the implementation. In this course we split the two questions and answer them in different, but related, ways. Even with the split, however, all of the material in Section 10.1 is applicable to type analysis. The only difference is that the sets of types and operators are determined solely by the source language, rather depending on a mixture of source language and target machine considerations as described at the beginning of Section 10.1.1.

Eli provides an implementation of the operator identification module sketched in Section 10.3. Its interface differs somewhat from that described in Figure 10.3 because the module is implemented in C, but it provides the same functionality. It also supports simpler operator identification operations that can be used when operators can be identified without considering the contexts in which they occur (C-- satisfies this constraint). Finally, Eli provides a specification language for describing the relationships among types, coercions and operators. This language allows one to specify the data for the operator identification module in a form very much like that of Table A.1.

Task

Answer the following questions:

  1. Assume that you have some Eli specification, and want to add a property named Type, whose values are of type DefTableKey. (Assume that the existing specification has no property named Type). What is the minimum set of changes you must make? Explain briefly.

  2. Waite's Section 7.2.2 argues that what he calls the ``single-record approach'' to implementing a definition table entry has a number of disadvantages. Consider a C++ implementation following the strategy outlined in the first paragraph on page 178: There is a class to implement a definition table entry, and it has subclasses for variables, types, etc. The type subclass, in turn, has subclasses for enumerated types, record types, and so on. Would this implementation avoid any or all of the disadvantages Waite lists for the single-record approach? Explain briefly.

  3. Consider the semantics of C-- expressions described by Waite in Section A.1.5.
    1. Suppose that the language definition described no coercions, but the semantics remained unchanged. How many operation rows would appear in the corresponding version of Table A.l? Explain briefly.
    2. Explain briefly why C-- must have at least three assignment operations to implement its semantics of assignment.

  4. List all of the circumstances under which a type error can occur in C--.

  5. In order to verify type consistency, the compild must establish a type for every expression node. Apply the two-step design process to this problem:
    1. What type of value will you use to encode the type?
    2. What attributes will you introduce?
    3. What symbol(s) of the C-- abstract syntax will these attributes be associated with?
    4. For each symbol and attribute, which line of the table matches the necessary computation pattern ?
    5. For each match, state the computation.
    6. Is the direct dependence implied by the arguments of these computations sufficient to describe all dependence present in the problem? Explain briefly.

    Write a brief description of the tasks to be carried out by each team member for the next assignment, giving a rationale for your partitioning. Submit this description to lrc@sei.cmu.edu before the beginning of class on the due date.

    Complete your personal postmortem for this assignment and your personal project plan for the next assignment, and submit them via the web before class on the due date of this assignment.


    Instructor
    Revision 1.6 (1998/09/19 21:21:24)