Simple Structural Analysis

Purpose

To illustrate the construction of an abstract source tree and to introduce the maptool:

Due dates

You must hand in your specification at the beginning of class on Monday, September 10. This assignment is worth 10 points.

Background

A tree is an appropriate data structure for representing input text because the meaning of that text is generally described in terms of its components, and a tree embodies the ``component of'' relationship. If Eli is to generate a program that creates such a data structure, we need to provide it with specifications of four things: Sometimes one or more of these specifications can be omitted because Eli can deduce it from the others.

We can verify that the tree structure is correct either by using an execution monitor or an unparser.

Specifying Tree Structure

The set of allowable tree nodes is described by a set of LIDO rules, specified in a file of type .lido. The tree structure is normally chosen to minimize the number of distinct symbols representing node classes. Each distinct symbol has semantics that clearly differentiate it from any other symbol.

Specifying the Phrase Structure

Phrase structure is described by a context-free grammar. The grammar must provide an unambiguous description of the structure, and Eli guarantees that this will be the case by requiring that the grammar satisfy a somewhat more stringent property called LALR(1). If the grammar is not LALR(1), Eli will report conflicts that must be resolved by the user.

When describing the phrase structure, one needs to keep the desired tree structure in mind, because the phrases correspond roughly to tree nodes. In simple cases, the correspondence is exact and one can simply omit the context-free grammar entirely. Eli will then generate a context-free grammar from the tree description. Unfortunately, however, the generated grammar will often be ambiguous because the text needs additional rules or delimiters to bound the phrases. You saw one example of this in the set specification, where you needed to either bracket the set elements or separate them by commas to resolve ambiguity; neither the braces nor commas appeared in the LIDO rules. Another important example is the use of operator precedence and association rules to disambiguate expressions.

Specifying the phrase/tree relationship

The relationship between the phrase structure and the tree structure can be determined to a great extent by the system itself. Eli allows a user to provide partial specifications for each of these structures, and then it tries to combine the two descriptions and come up with complete specifications of both. Pattern matching is the basic strategy: A LIDO rule and a context-free rule that have the same signature correspond.

If there is a LIDO rule that does not correspond to any context-free rule, a corresponding context-free rule is usually generated. Similarly, LIDO rules are usually generated for context-free rules that have no corresponding LIDO rules. Chain rules, rules with only one non-terminal symbol on the right-hand side, are the exceptions. Full details can be found in the Syntactic Analysis reference manual.

Specifying the Lexical Structure

Regular expressions are the mechanism on which a description of the lexical structure of the text is based. They can be used either completely on their own, or in conjunction with C routines called auxiliary scanners.

Specifying an Unparser

An ``unparser'' is a routine that takes a data structure and renders it as text. Eli is capable of producing two different kinds of unparser from the specification of a tree.

Task

Write a specification for a processor that accepts an arithmetic expression and prints the tree structure for that expression via a tree unparser.

Your processor should accept the usual notation for arithmetic expressions involving addition, subtraction, multiplication, division and negation. All denotations must be expressed in decimal, using either integer or floating-point notation, and normal precedence and associativity rules can be used to avoid parenthesization. Parentheses may also be used in the normal way, and redundant parentheses must be accepted.

The semantics of the expression should be those of normal arithmetic.

Minimize the number of distinct symbols that appear in your attribute grammar; each distinct symbol should have semantics that clearly differentiate it from any other symbol.


ecen5523@schof.colorado.edu
Revision 1.5 (1998/09/01 20:37:39)