Code Sequences

Purpose

To design the representation of source language programs as target program trees:

Due dates

You must complete the PEP tasks (individually), and then hand in the answers to the questions in this exercise (as a group) at the beginning of class on Monday, November 16. This assignment is worth 30 points.

Background

Action mapping is the task of converting abstract source program trees into abstract target program trees.

Code to carry out the action mapping task is designed on the basis of an exhaustive case analysis in which source tree constructs are implemented by sequences of target machine assembly code instructions. This case analysis is used to ensure that every source program can, in fact, be implemented correctly by some target program. The goal should be a clean, systematic translation that is easy to understand and to verify. Special-case optimization should not be included in the initial design, because it distracts the designer and sometimes leads to more convoluted translations.

Once the case analysis has been completed, the assembly code sequences must be abstracted and represented by constructs of the abstract target program tree. The purpose of this abstraction is to defer decisions like register allocation, evaluation order within expressions, and instruction selection. It abstracts the appropriate characteristics of the target machine, and is independent of any particular source language. Because of that independence, there may be no adequate abstraction for some of the assembly language sequences needed to implement particular source language constructs. In that case the designer must decide whether to extend the target abstraction used for their compiler. Such extensions are common, but should be carefully justified.

The final design is a mapping from source tree constructs to target tree constructs that specifies sets of translations and when each is applicable. Standard techniques can then be used to generate code that carries out the mapping.

It is always important to go to the best sources for information about the problem: the source language standard and the target machine manual. Textbooks can provide overviews and discussion, but the details are sometimes missing or incorrect. The target machine for homework in this course is the SPIM simulator for the MIPS R2000. An additional reference discussing the MIPS hardware is the book by Kane. Use the SPIM Manual as your primary reference for the target machine and the Kane book as backup for providing insight.

Case Analysis

The initial case analysis involves assembly language code sequences. For every source construct requiring a run-time action, write a sequence of assembly code instructions that will carry out the required action. It is very important to be systematic about this initial case analysis. You might organize the set of sequences to parallel the C-- abstract syntax, so that you can be certain you didn't omit anything. Be sure to write separate sequences for operator overloadings, and to include sequences for coercions if necessary (some coercions don't require code). Verify that each of your assembly language sequences actually has the effect defined for the corresponding C-- language construct.

Peer review is a good way to weed out errors. You can also try out the effects of your assembly language sequences by stepping through them on the SPIM simulator. The commands spim (providing a command line interface) and xspim (providing an X-windows interface) can be used to run SPIM on either rintintin or nag, or you can download a copy and implement it on some other machine of your choice.

One obvious starting point for the case analysis is the source language arithmetic and logical operators. These operators are usually easy to understand, and correspond to very short assembly language sequences (often just single instructions). Try to write the most general sequence in each case.

As an example, consider the C-- integer addition operation (see Waite, Section A 1.5.2 and Table AI). This operation can be implemented by a single SPIM instruction:

add	rl,r2,r3

Note that specific register numbers aren't given. The reason is that the register numbers must be determined by the context in which the integer addition operation appears.

The implementation given in the previous paragraph is the most general one. SPIM also provides an integer add instruction in which r3 is replaced by an integer constant. That instruction is less general, because its right operand is restricted. It is also unnecessary for a complete mapping, because SPIM has an instruction to load an integer constant into a register. Thus a C-- integer addition operation can be carried out by a register add instruction even when the right operand is an integer denotation.

The C-- complement operator illustrates the care that must be taken to ensure that the assembly language sequence actually has the effect defined for the source language construct. According to the C-- definition (Waite Section A.1.5.2), the effect of the expression !e is the same as the effect of the expression:

e == 0 ? 1 : 0

A search of the SPIM instruction set reveals that the following will implement !e correctly if the value of e is in r2:

seq	rl,r2,r0

Specific implementations of the arithmetic and logical operators impose requirements in addition to those stemming from the source language definition. For example, the sequences presented above require that operands be loaded into registers and that results reside in registers. Thus a SPIM li instruction would be needed to implement a C-- IntVal construct.

Sometimes a construct simply provides an environment for code, as in the case of DyadicExp: The code corresponding to this construct is actually the code corresponding to its Binop child. DyadicExp only represents the ``glue'' binding instructions like add and sub together.

Other constructs, like OneSided (Waite Section A.1.4.1), embody components implemented by code sequences of arbitrary length. When doing the case analysis, simply write descriptions of those component sequences:

	Code to evaluate the component Expression into r
	beqz	r,0,L
	Code to execute the component Body 
L:

The SPIM Abstraction

Section 9.1 of Waite explains the general characteristics of abstract target program trees, and how they are used to embody certain decisions and leave other decisions open. An understanding of these general characteristics aids in deciding how to represent specific source program constructs in terms of a specific target program abstraction.

The examples given below use the abstraction described in An Encoder for SPIM; you may use it for your translation. It is important to note that this abstraction is given, and is not a result of the action mapping design process.

Instruction selection decisions, described in Section 9.1.2 of Waite, are generally trivial for SPIM because only one instruction is available to implement each machine operation. Thus each abstract SPIM node alone provides enough information to support the code generator's instruction selection decision.

A target machine abstraction supports deferred decisions about resource allocation by prohibiting the specification of register numbers. Each value created by the program is represented by a node in the tree, and is used exactly once in the context specified by its parent. For example, consider the C-- expression x+y*z. Abstract SPIM constructs that could be used to represent this expression are:

RULE iRegrr: IntReg ::= Mop IntReg IntReg  END;
RULE iLoad:  IntReg ::= Mop Memory         END;
RULE Local:  Memory ::=            Denoter END;

If x were stored at offset -4 in the activation record, y at offset -8, and z at offset -12, then the target program tree fragment would be:

iRegrr(add,
  iLoad(lw,Local(-4)),
  iRegrr(mul,
    iLoad(lw,Local(-8)),
    iLoad(lw,Local(-12))
  )
)

(Here each Mop value is written as the corresponding SPIM assembly language operation code and each Denoter value as the number it denotes.)

The abstraction supports deferred decisions about execution order by using a tree rather than a sequence. An instruction represented by a construct must be executed after all of the instructions represented by its children have been executed, but there is generally no restriction on the order in which the children's instructions are carried out. Assuming appropriate register allocations, either of the following sequences would correctly implement the computation of x+y*z:

Sequence using 3 registersSequence using 2 registers
lw$3,-4($fp)
lw$4,-8($fp)
lw$5,-12($fp)
mul$4,$4,$5
add$3,$3,$4
lw$3,-8($fp)
lw$4,-12($fp)
mul$3,$3,$4
lw$4,-4($fp)
add$3,$3,$4

Capturing the Action Mapping Design

Once you have completed your case analysis, and obtained the assembly language sequences implementing the source constructs, you need to express those sequences in terms of the SPIM abstraction. The examples given above show some of the possibilities. It's important to write down the complete mapping in an orderly way so that you can verify that you have covered all of the constructs and so that you can design a computation over the source program tree to build the target program tree. One reasonable notation is to annotate the assembly language sequences with appropriate factory method calls.

Here is an example of a tabular format, applied to the sequences described above:

Integer addition add rl,r2,r3 add_iRegrr(el,e2)
Complement seq rl,r2,0 seq-iRegri(e, 0)
One Sided
	Code to evaluate Expression->r
	beqz r,L
	Code to execute Body
L:
do_branch(
  beqz_Branchr(e),
  L)
Items coding Body
define_label(L)

The first column of the table is the name of a source language construct, the second column is the assembly code sequence implementing that construct, and the third is the (sequence of) factory method calls needed to build the target program abstraction of the assembly code sequence. 0 stands for a specific Denoter, while L stands for a definition table key representing the label. Abstract target program trees implementing expressions are represented by e, el and e2.

The implementation of OneSided consists of an arbitrary sequence of trees. That sequence begins with the tree implementing the conditional transfer of control, whose destination is determined by the translator. A list of items that implement the actions of the body of the OneSided follow, and the sequence ends with the item implementing the label definition.

Task

Answer the following questions:
  1. Design an action mapping from C-- to SPIM. Capture your design in a form similar to that described above. If you need to extend the SPIM abstraction to accommodate C-- semantics, define the necessary extension constructs, operation codes and factory methods in the same style as that used in the original definition. Write a short justification explaining why each extension is required.

  2. Figure A.10 of Waite gives a VAX assembly code implementation of the C-- GCD program. Rewrite this program in SPIM assembly language, using the mapping you designed. Run your program on the SPIM simulator and hand in a listing of the source code.

  3. Figure A.9 shows an abstract VAX target program tree for the C-- GCD program. Draw an abstract SPIM target program tree for the C-- GCD program, using the mapping you designed. Hand in your drawing.

  4. Execution order and resource allocation decisions should be deferred to the code generator if the source language does not constrain the order of evaluation of expression operands. Quote information from Appendix A showing that this is true for C--.
Write a brief description of the tasks to be carried out by each team member for the next assignment, giving a rationale for your partitioning. This description forms a part of your group's project plan, but it should be submitted by electronic mail to lrc@sei.cmu.edu rather than being entered into the PEP system.

Complete your personal postmortem for this assignment and your personal project plan for the next assignment, and submit them via the web before class on the due date of this assignment. Your group must hand in one set of answers to the questions at the beginning of class on the due date.


Instructor
Revision 1.4 (1998/11/1417:18:23)