Building a Class Web Site

Purpose

To solve a simple structural analysis problem:

Due dates

You must submit your solution via electronic mail by 9:00am on Tuesday, October 11. Your submission must be an attachment that is a zip file named ``firstName.lastName.zip''. (If you are submitting as a group, the file should be named by the person actually submitting it.) To ensure proper credit, please be certain that every one of the specification files contains the names of all of the members of the group. Place these names in C-style comments at the head of all Eli specification files except FunnelWeb files. Place them in the first block of text in a FunnelWeb file.

This assignment is worth 20 points.

Background

Structural analysis is the task of scanning and parsing the source text, detecting lexical and syntactic errors, and building an abstract syntax tree (AST) to represent that text. The AST structure is complete after structural analysis, and each leaf corresponding to a denotation or an identifier contains information specifying that denotation or identifier.

Lexical analysis

Lexical analysis groups sequences of input characters into basic symbols and comments. GLA, the language Eli uses to specify the structure of the basic symbols and comments, is defined in the Eli Lexical Analysis manual. You have seen examples of GLA specifications in lab and in the FunnelWeb file supporting the calculator example. Eli provides a library of canned descriptions, which enable you to obtain complete specifications of common basic symbols and comments simply by stating their names.

Sometimes a particular basic symbol or comment is difficult or impossible to describe with a regular expression. In such cases GLA allows you to use an auxiliary scanner written in C. A number of auxiliary scanners for common situations are available in the library, and if none of these are satisfactory you can supply your own.

Finally, the specific text of a basic symbol may have to be stored in the tree as the value of a terminal. If this is the case, you must specify a token processor written in C. It is almost never necessary for you to actually write your own token processor, because the ones available in the library cover all of the usual cases. In fact, the token processor mkidn can be used for almost any basic symbol. It stores the basic symbol's character sequence uniquely in the string table, and makes the index of that sequence the value of the corresponding terminal. Since the character sequence embodies all of the information that is available about the basic symbol, nothing is lost by this representation. Any further interpretation can be carried out by a tree computation that uses the character sequence.

Syntactic analysis

Syntactic analysis determines the phrase structure of the input text: how the basic symbols relate to one another in a meaningful hierarchy. The phrase structure of the text is described by a concrete grammar that is distinct from the abstract grammar used to describe the structure of the tree. Clearly the concrete and abstract grammars are related, but they are generally not identical: The abstract grammar is designed to simplify the desired tree computation, while the concrete grammar is designed to unambiguously determine the relationships among the basic symbols and the resulting hierarchy.

EBNF, the language Eli uses to specify the concrete grammar, is defined in the Eli Syntactic Analysis manual. You have seen examples of EBNF specifications in the FunnelWeb file supporting the calculator example. (EBNF specifications are provided to Eli in files whose names end with the .con tag.)

It is important to understand that Eli may not require a complete specification of the concrete grammar if an abstract grammar describing the desired tree structure is available. Only the information that cannot be deduced from the tree grammar must be provided. See the discussion of typical patterns of syntax development for further information. In this assignment, you are given an abstract grammar in the form of a LIDO file describing the nodes of the tree.

Your challenge in this situation lies in creating a concrete grammar that is parsable and compatible with the desired tree structure. (Click here for a detailed discussion of the relationship between phrases in the input text and nodes in the tree.) If the abstract grammar itself is parsable, then it can also serve as the concrete grammar and no question of compatibility arises. If the abstract grammar is not parsable, then a reasonable strategy is to define EBNF rules that resolve each conflict, ensuring that they remain compatible with the existing tree structure. This step-by-step approach keeps the amount of work to a minimum.

Suppose that file try.specs defines the abstract grammar (and possibly other things, like tree computations) for the problem of interest. Here are the steps that we would take in this development process:

  1. Make the following Eli request:
    -> try.specs :parsable <
    
    If Eli reports that the grammar is LALR(1), you are done with this aspect of the specification.

  2. If the grammar is not LALR(1), the :parsable product shows you sample derivations that contain the conflicts. A detailed example is given below, in the description of the task for this assignment. Analyze the output, and determine a concrete grammar rule or rules that will solve at least one of the conflicts reported. (Click here for hints about resolving a number of common conflicts.)

  3. Augment any existing .con specification to solve the conflict. If the added rule or rules introduces a new name that is semantically equivalent to a tree symbol, augment any .map specification accordingly. (Click here for a detailed discussion of .map specifications.)

  4. Make the following Eli request:
    -> try.specs :abstree <
    
    This will show you Eli's understanding of the current abstract grammar. It should be the same as the abstract grammar you are trying to implement. If the concrete rule(s) you have added are incompatible with the abstract grammar, you will see rules with names rule_001, rule_002, ... at the beginning of the output. These are rules from the concrete syntax that could not be matched to rules of the abstract syntax. (Eli generates the names rule_1, rule_2, ... for unnamed rules of the abstract grammar, and the names rule_001, rule_002, ... for rules introduced because concrete grammar rules had no counterparts in the abstract grammar.)

  5. Go to step (1).

Structural errors

Structural errors are violations of the definition of the source language that affect the structural analysis task. Error reporting and recovery strategies are built into Eli-generated scanners and parsers. Thus no effort is necessary on the part of the user to detect or report structural errors.

Task

File Site.zip contains the directory Site. It's similar to the directories in which I build web sites for my courses. Click here to obtain a copy of the zip file, unzip it, and change to the directory Site. You will see the following files and directories:
Site.specs
The raw material from which Eli will generate your processor. You should add the names of any specification files you create to this file.

Comp.fw
A specification for the processor output and most of the tree computations. You should not need to change this file, but you might be interested in it as a more complex example of tree computation. If so, obtain a PDF file for it by the command:
-> Comp.fw :fwTex :pdf > Comp.pdf

schedule
The current schedule specification for the course. This file can be used as an exhaustive sample input for your generated processor.

DHW
The directory into which your processor will write the ``dummy homework'' files (the homework files saying that the assignment is under development).

top
The directory into which your processor will write the ``Homework Assignments'' page.

Content
The directory into which your processor will write the ``Class Schedule'' page.

Here is the abstract grammar defined by Comp.fw:

RULE: Syllabus ::=    Id Sect URL Text Topics Weeks    END; 

RULE: Weeks    LISTOF Week                             END;
RULE: Week     ::=    Lecture Lecture Lecture Homework END;

RULE: Lecture  ::=    Date Text Readings Topics        END;
RULE: Lecture  ::=                                     END;

RULE: Homework ::=    Id Text Topics                   END;
RULE: Homework ::=                                     END;
  
RULE: Readings LISTOF Reading                          END;
RULE: Reading  ::=    URL Text                         END;
RULE: Reading  ::=    '(' Id Reflist ')'               END;
  
RULE: Topics   LISTOF Topic                            END;
RULE: Topic    ::=    Text                             END;
  
RULE: Reflist  LISTOF Ref                              END;
RULE: Ref      ::=    Id                               END;
RULE: Ref      ::=    Sect                             END;
RULE: Ref      ::=    Id Id                            END;
RULE: Ref      ::=    Id Sect                          END;

You cannot generate a complete processor from Comp.fw because it has no specification for basic symbols, and the abstract grammar is not parsable. Your task is to supply these missing pieces.

Basic symbols

The abstract grammar has five terminals. The structures of the corresponding basic symbols can be determined from file schedule:
Date
Two numbers separated by a slash.
Id
A normal identifier.
Sect
Normal book sections like ``1.2'', ``11-3-5'', or ``A.1.12''.
Text
Arbitrary character sequence beginning with a tab and ending at the end of the line.
URL
Arbitrary URL enclosed in quotes.

You must write a GLA specification that defines these basic symbols formally, and creates an appropriate value for the terminal. A terminal's value is ``appropriate'' if it is consistent with its use in the tree computations. For example, a Lecture has an LDate attribute that is computed as follows:

SYMBOL Lecture: LDate: CharPtr;

RULE: Lecture ::= Date Text Readings Topics
COMPUTE
  ...
  Lecture.LDate=StringTable(Date);
  ...
END;
The representation that you choose for the Date terminal must be compatible with this use or you must replace the tree computation.

Parsability

If you ask Eli to show you whether the abstract grammar is parsable (step (1) above), you will be shown a number of examples of conflicts. Here are the first two:
************************************************************************
   *** reduce-reduce conflict on: EOF

Syllabus EOF
|
Id Sect URL Text Topics Weeks
                        LST_Weeks .  [REDUCE] Weeks -> LST_Weeks  {EOF} ?

Syllabus EOF
|
Id Sect URL Text Topics Weeks
                        LST_Weeks
                        LST_Weeks Week
                                  Lecture Lecture Lecture Homework
                                  .  [REDUCE] Lecture ->  {EOF} ?

************************************************************************
   *** shift-reduce conflict on: Date

Syllabus EOF
Id Sect URL Text Topics Weeks
                        LST_Weeks
                        LST_Weeks Week
                                  Lecture Lecture Lecture Homework
                                  |       Date Text Readings Topics
                                  |
                                  .  [REDUCE] Lecture ->  {Date} ?

Syllabus EOF
Id Sect URL Text Topics Weeks
                        LST_Weeks
                        LST_Weeks Week
                                  Lecture Lecture Lecture Homework
                                  . Date Text Readings Topics  [SHIFT] Lecture -> Date . Text Readings Topics  ?

************************************************************************

Each derivation begins with Syllabus EOF. Succeeding lines are the result of rewriting a single nonterminal symbol by applying some production of the grammar. The result of a rewriting step is aligned with the symbol being rewritten. Thus we can rewrite Syllabus to Id Sect URL Text Topics Weeks, and then rewrite Weeks to LST_Weeks.

The lines adjacent to the vertical bar show how the lookahead symbol can be derived: In the second example, the second Lecture is rewritten to Date Text Readings Topics. Thus Date is the lookahead symbol that we should see after recognizing all of the components of the first Lecture.

Below the vertical bar, the main derivation continues by rewriting the symbol at the top of the bar. In this case, Lecture is rewritten as the empty string (because a lecture can be omitted). The final line in each derivation shows the action that the parser would take at that point: In this case, reducing an empty string to a Lecture in the presence of the lookahead symbol Date.

The first example shows that with an empty input, the parser could do two things:

  1. It could determine that the list of weeks was empty (the first case, reducing LST_Weeks to Weeks in the presence of EOF).
  2. It could determine that the first week began with an empty lecture (the second case, reducing empty to Lecture in the presence of EOF).
Here the parser knows that when it reaches the EOF it has gotten to the end of something, but it can't decide what. This is the characteristic of a reduce-reduce conflict.

The second example shows that when the parser sees a Date, it doesn't know whether the date is for the first Lecture of the week or whether the first Lecture is empty and this is the date for the second Lecture of the week. Here it doesn't know when it sees the Date whether it has gotten to the end of something (in this case the first Lecture) or not. This is the characteristic of a shift-reduce conflict.

To solve this conflict, we can say that we are describing class periods, and those periods will exist whether we actually hold classes during them or not. If we don't hold a class on a given date, the schedule should still show that period but with a notation like ``Labor Day'' or ``Thanksgiving''. The only periods that can legitimately be omitted are at the end of the semester. Thus we can describe the phrase structure with a concrete grammar that distinguishes the last week from all of the other weeks by using a different symbol for it. Only the last week can have empty lectures, and once one lecture has been omitted then all following lectures must be empty. Thus we can define yet another symbol to denote an empty lecture in the concrete grammar.

Semantically, however, the last week is just like any other week and an empty lecture is just like any other lecture. Thus we need to map the symbol denoting the last week in the concrete grammar to Week, and the symbol denoting an empty lecture in the concrete grammar to Lecture. This will allow us to solve the conflict and build the appropriate tree.


Instructor
Revision 1.1 (2005/09/25 17:36:04)