Simple Structural Analysis
Purpose
To illustrate the construction of an abstract source tree
and to introduce the maptool:
- Defining a tree structure
- Defining a phrase structure
- Describing the relationship between phrase and tree structures
- Defining a lexical analyzer
Due dates
You must hand in your specification
at the beginning of class on
Monday, September 10.
This assignment is worth 10 points.
Background
A tree is an appropriate data structure for representing input text because
the meaning of that text is generally described in terms of its components,
and a tree embodies the ``component of'' relationship.
If Eli is to generate a program that creates such a data structure, we need
to provide it with specifications of four things:
- The allowable nodes of the tree
- The phrase structure of the input
- The relationship between phrases and tree nodes
- The lexical structure of the input
Sometimes one or more of these specifications can be omitted because Eli
can deduce it from the others.
We can verify that the tree structure is correct either by using an
execution monitor or an
unparser.
Specifying Tree Structure
The set of allowable tree nodes is described by a set of
LIDO rules, specified in a
file of type .lido.
The tree structure is normally chosen to minimize the number of distinct
symbols representing node classes.
Each distinct symbol has semantics that clearly differentiate it from any
other symbol.
Specifying the Phrase Structure
Phrase structure is described by a
context-free grammar.
The grammar must provide an unambiguous description of the structure, and
Eli guarantees that this will be the case by requiring that the grammar
satisfy a somewhat more stringent property called LALR(1).
If the grammar is not LALR(1), Eli will report
conflicts that must be resolved by the user.
When describing the phrase structure, one needs to keep the desired tree
structure in mind, because the phrases correspond roughly to tree nodes.
In simple cases, the correspondence is exact and one can simply omit the
context-free grammar entirely.
Eli will then generate a context-free grammar from the tree description.
Unfortunately, however, the generated grammar will often be ambiguous
because the text needs additional rules or delimiters to bound the phrases.
You saw one example of this in the
set specification, where you needed to either bracket the set elements or
separate them by commas to resolve ambiguity; neither the braces nor commas
appeared in the LIDO rules.
Another important example is the use of
operator precedence and association rules to disambiguate expressions.
Specifying the phrase/tree relationship
The
relationship between the phrase structure and the tree structure
can be determined to a great extent by the system itself.
Eli allows a user to provide partial specifications for each of these
structures, and then it tries to combine the two descriptions and come up
with complete specifications of both.
Pattern matching is the basic strategy:
A LIDO rule and a context-free rule that have the same signature correspond.
If there is a LIDO rule that does not correspond to any context-free rule,
a corresponding context-free rule is usually generated.
Similarly, LIDO rules are usually generated for context-free rules that
have no corresponding LIDO rules.
Chain rules, rules with only one non-terminal symbol on the
right-hand side, are the exceptions.
Full details can be found in the
Syntactic Analysis reference manual.
Specifying the Lexical Structure
Regular expressions are the mechanism on which a description of the lexical
structure of the text is based.
They can be used either completely on their own, or in conjunction with
C routines called auxiliary scanners.
Specifying an Unparser
An ``unparser'' is a routine that takes a data structure and renders it as
text.
Eli is capable of producing
two different kinds of unparser from the specification of a tree.
Task
Write a specification for a processor that accepts an arithmetic
expression and prints the tree structure for that expression via a tree
unparser.
Your processor should accept the usual notation for arithmetic expressions
involving addition, subtraction, multiplication, division and negation.
All denotations must be expressed in decimal,
using either integer or floating-point notation,
and normal precedence and associativity rules can be used to avoid
parenthesization.
Parentheses may also be used in the normal way, and redundant parentheses
must be accepted.
The semantics of the expression should be those of normal arithmetic.
Minimize the number of distinct symbols that appear in your attribute grammar;
each distinct symbol should have semantics that clearly differentiate it
from any other symbol.