A compiler is a program that accepts text in some source language and produces equivalent text in some target language. Its behavior can be roughly described as follows:
The structures of the source and target texts are often represented within the compiler by trees, because many properties of a construct depend on properties of that construct's components. Section 1.3 of the book Modern Compiler Construction in Java by Andrew W. Appel (Cambridge University Press, 1998) uses a simple, straight-line language to illustrate the relationship between an input text and the abstracting tree. Unfortunately, there are a number of inconsistencies in the example that obscure the nature of the relationship.
Appel's Figure 1.4 depicts the identifier strings and numbers as tree nodes. In his Grammar 1.3, however, these entities are represented by terminal symbols set in a special typeface. The word ``print'' is also set in the special typeface, but it does not appear in Figure 1.4. Finally, the operator characters (like ``+'') seem to play the same role in the grammar as ``print'', but their typeface differs from that of ``print''.
In Appel's Program 1.5, classes based on Stm and Exp are used to represent all of the interior nodes of Figure 1.4. Figure 1.4's leaves are all instances of other classes not defined in Program 1.5. Thus the classes defined in Program 1.5 correspond to some of the symbols and rules in Grammar 1.3, but not all: No class is defined for the symbol Binop, and hence there can be no classes to represent the rules having Binop on the left-hand side.
The consequence of these inconsistencies is that Appel's Grammar 1.3 does not constitute a reliable specification of the abstraction. From a software engineering point of view, that's a disaster. What it means is that the implementor of the specification is allowed to vary the code according to unstated criteria, and anyone attempting to maintain that code must examine it to understand it.
At the end of Chapter 1, Appel sets a programming exercise: implement a simple program analyzer and interpreter for the straight-line programming language. Since he does not want to worry about parsing the language at this point, he builds a tree for the following program by writing the necessary data constructors explicitly:
a := 5 + 3; b := ( print(a, a - 1), 10 * a ); print(b)
Each print statement results in a single line containing the values of the operand expressions, separated by spaces. Thus the output of the program should be:
8 7 80
This example is straightforward because no print statement has a nested print statement. The following example is more challenging for an interpreter:
a := 5 + 3; b := ( print(a, ( print(a / 4), a - 1 )), 10 * a ); print(b)
The output of this program should be:
2 8 7 80
Section 2 of this paper solves the programming exercise in C++, using the second program as a test. The C++ code was obtained by applying the following consistent implementation procedure to Appel's Grammar 1.3:
Literal symbols appearing on the right-hand side of a grammar rule carry no information, because they are fixed once the rule is known. They are used only in rules corresponding to phrases of some input text, to show the relationship between the textual delimiters and the components of the construct.
Once one has a notation for specifications and a consistent implementation procedure for that notation, a tool can be written to verify the specifications and apply the implementation procedure mechanically. This approach guarantees that the specification is complete and consistent, and that the implementation actually realizes it. Moreover, maintenance is shifted to a smaller, more readable description of the problem.
Eli provides tools that verify grammars describing trees and implement them as C data structures. Computations can be attached to a grammar specification, and the tools will generate code to carry out those computations on the data structure implementing the tree.
Section 2's solution to Appel's programming exercise is specified using these techniques in Section 3 of this paper. Because Section 2 and Section 3 solve the same problem in the same way, they provide a direct comparison between systematic hand coding and generation. Both solutions are executable, so that one can guarantee that no details have been omitted in either case.
Appel sketches a more ad-hoc solution to this problem based on the same general implementation technique, using Java as the implementation language. The inconsistencies mentioned earlier mean that his solution could differ slightly from the one presented here. Whether those differences would constitute enough of an improvement to offset the dangers of unsystematic hand coding is difficult to predict.
This paper was written using the literate programming style, in which code fragments are represented in the text by macros. Macros are numbered in order, and the macro(s) by which each macro is invoked is specified at the point of definition. Each macro invocation is a link to its definition, and the invoking macro numbers with a definition are links to the invoking macros. Some of the macros are listed as being ``attached to a product file'' or ``attached to a non-product file''. The names of those macros are the names of the files that will be generated from the document. (The product/non-product distinction has to do with the way these files are treated by Eli.)
FunnelWeb was used to process this document.