MASSACHVSETTS INSTITVTE OF TECHNOLOGY

Department of Electrical Engineering and Computer Science
6.001 -- Structure and Interpretation of Computer Programs

Spring Semester, 1997

Problem Set 9

Issued: Thursday April 17, 1997

Written solutions due: Wednesday, April 30 for sections meeting at 10 and 11; Friday, May 2 for sections meeting at 9, 12, 1 and 2.

Reading: Attached code (java-eval.scm, and java-support.scm). Note also that java.scm, and java-syntax.scm can be read online in the 6.001 Web page, if you are interested.

Note: The overall programming framework will be discussed in lecture on Thursday, April 17, 1997.


IMPORTANT NOTICE:
This handout should serve both as a set of lecture notes for today's lecture and as the background material for the problem set itself.

This is a brand new problem set and may very well be buggy. We will take this into account when grading. Please report problems to the lab TAs so that they can be fixed over the summer.

All of the code, as well as the HTML for this handout, is at http://www-swiss.ai.mit.edu/~u6001/ST97/JMiller/ps9-java/problem-set


Decaf Java-In-Scheme: An Introduction

First, a note about the name "Decaf Java". As we will see, the goal of this lecture and problem set is to explore the implementation of an evaluator for one computer language, using the evaluator for a second computer language as a foundation. Here, we are going to implement an evaluator for a watered down version of Java, by building on our understanding of Scheme's evaluator.

Why are we doing this, i.e. what's the point of writing an interpreter for a language when there are perfectly good commercial compilers and development environments available? In order to understand any complicated system it's often worthwhile to build and study a small model. The Decaf Java-In-Scheme interpreter is the computer equivalent of a scale model of a jet airplane, and can serve the same purposes:

  1. We'll see how the fundamental structure of the interpreter is very similar to the structure of Scheme's interpreter. In fact, it's not too much of an exaggeration to say that all interpreters for computer languages have roughly the same structure.
  2. We'll see where the languages differ fundamentally, not just on the surface. For example, why does Java have a complicated mechanism for resolving method names when Scheme's is so much simpler?
  3. We'll learn a good deal about Java design decisions by seeing how our little interpreter does the job. For example, we'll have to understand exactly what the Java rules for coercion, casting, and conversion are intended to accomplish.
  4. We'll learn some restrictions on any implementation of Java. For example, these same rules about casting and so forth imply some very strong requirements for building any Java virtual machine or interpreter.
  5. We'll learn about interactive programming environments, since we've had to extend Java to allow it to be used in a read-eval-print loop like Scheme's.

Decaf Java-In-Scheme: Syntax and Simplifications

Designing and building a model is a difficult engineering task. We have to decide what questions we want to study and what issues we're willing to ignore. Then we have to build the model so that we can study what we want and argue that our results would apply even if we'd built the entire system (i.e. that there aren't any significant cross-couplings in the real world).

In designing Decaf Java-In-Scheme, our main goal was to make the interpreter small enough that it can be understood by someone familiar with Scheme interpreters with about a week of study. That's a strong constraint, and in order to meet it, we've done a good deal of simplifying. Here are some of the major decisions that you will need to understand before examining the code:

Parsing
Java's syntax is intended to be parsed by a fairly complex program that reads characters and builds what is known in the trade as a "parse tree". Systems that automatically generate parse trees are interesting pieces of code and would probably form the basis for a nice problem set on their own. We are going to sidestep the associated syntax issues and impose a major restriction on our language. In particular, we are going to require that we write our programs as Scheme data structures that are already parse trees. The language we actually implement is really one that takes as input a Java parse tree and interprets it, rather than one that takes Java source files. Thus, you will act as the parse tree generator, by using a set of rules to convert Java code to a representation that is suitable for input to a Scheme program that is simulating the Java interpreter -- hence our use of the name DECAF JAVA rather than Java.
Missing Features
Java has a number of features that are important for use in a large scale programming project, but which take a lot of time and code to implement. Many of these are in the class libraries of Java, but some are built into the language itself. We've decided to omit several important features of Java in order to make the implementation simpler; adding back the features is hard work, but (we argue) isn't really going to change the structure of the implementation very much. Here's a partial list of what we've omitted:
Simplifications
A lot of what we did was done just to make the implementation job simpler. For example, we didn't implement any of the Java class libraries (we added the two built-in operations print and newline to compensate for the most obvious problem this causes). We unified all of Java's different numeric types into one new type, number. Note: dumping the class libraries actually does have an impact on our implementation, since some of the required libraries allow you to explore the class structure and so forth. That means that some information must be available at runtime to a Java program that our system may not actually maintain.
Dirty Tricks
Frankly, some of the things in Java are either hard to implement in the new framework or are things we didn't like to begin with. In particular, the built-in operations in Decaf Java-In-Scheme (like + and so forth) don't correctly coerce their arguments; this would have been hard and uninteresting code. We've also dropped the distinction between the classes int and Int because, while important in the real world, it's hard to implement and is something that we'd really rather didn't have to be there at all.

Decaf Java-In-Scheme: Details of the Syntax

Let's begin by looking at the manner in which we will represent Decaf Java expressions as equivalent Scheme expressions. The file java-syntax.scm contains procedures to test for all of the grammatical constructs that occur in our version of Java. You can refer to this code for any details you require, but in general the code here isn't very interesting and you needn't do more than scan it to get the basic idea of how the syntax is handled. Again, we stress that the Scheme procedures in this file are expecting as input Scheme expressions that represent the equivalent Java expressions, but you will be responsible for providing this translation. In other words, this body of code represents our interface between Java and the Scheme based parse trees of our interpreter.

The best way to understand Decaf Java-In-Scheme's syntax is to compare it with Java directly. Just as in Java (and, indeed, most modern programming languages), there are three major kinds of constructs: expressions which can be used to compute values; statements which can be executed for side-effects (but which don't compute values); and declarations which are used to define new names (in Java, declarations define classes and interfaces). Notice that any expression can be used where a statement is expected (but not the reverse).

Decaf Java-In-Scheme Expressions

Let's begin by exploring the expressions of the language. The procedure expr? tells us what are legal expressions, its body is:

  (or (literal? obj)
      (java-name? obj)
      (new? obj)
      (dot? obj)
      (call? obj)
      (cast? obj)
      (instanceof? obj)
      (built-in-expr? obj)
      (assignment? obj))
and thus each of these predicates defines a different legal expression.

We can build a chart that compares Java expressions and the corresponding Decaf Java representations, which we will examine in detail below:
JAVA DECAF JAVA-IN-SCHEME
3 3
foo foo
new foo() (new foo)
this.F (dot this f)
F(3) (call f 3)
this.F(3, "x") (call (dot this f) 3 "x")
(String) x + y (cast string (+ x y))
predicate ? consequent : alternative (? predicate consequent alternative)
x = 3 (= x 3)

Now let's look at each of these kinds of expressions.

literal expressions are defined as (Scheme) numbers, booleans, characters, strings, and the special names FALSE, NULL, and TRUE. A java-name (java-name?), which is the Decaf Java-In-Scheme equivalent of a Java variable, is any symbol that isn't one of the "magic" words of the Decaf Java-In-Scheme language. For reference, here are the names you can not use for variables (or anything else, for that matter) in Decaf Java-In-Scheme:

!= * *= *BREAK-LABEL* *CONTINUE-LABEL* *RETURN-LABEL* + += - -= / /= < <= = == > >= ? ABSTRACT AND BLOCK CALL CAST CLASS DO DOT EXTENDS FALSE FIELD FINAL FOR IF IMPLEMENTS INSTANCEOF INTERFACE METHOD NEW NEWLINE NULL OR PRINT PRIVATE PROTECTED PUBLIC STATIC SUPER THIS TRUE VARIABLES WHILE

A new expression is used to create new objects, and looks like (NEW <type name>). In Java, arguments can be used to choose which of several different "object constructors" should be called; but this is one of the simplifications we've made: since we don't support the declaration of explicit object constructors, we don't support the syntax for calling them, either.

A dot expression is used for Java constructs like this.F or super.X or A.B.C.X. In Decaf Java-In-Scheme these are written as (DOT this F), (dot super X), or (Dot A b c x) respectively (remember that Decaf Java-In-Scheme doesn't care about upper/lower case distinctions like Java does).

A call represents Java's method invocation. In Decaf Java-In-Scheme we write (CALL F 3) when in Java we would write F(3);. Similarly, we write (CALL (dot this F) 3 "x") when in Java it would be this.F(3, "x");. A cast is written as (CAST String (+ X Y)) instead of Java's (String) X+Y;. Similarly, we write (INSTANCEOF (+ X Y) String) instead of X+Y instanceof String.

A built-in corresponds to Java's set of "infix operators". In Decaf Java-In-Scheme, we support the following binary operators, corresponding directly to their Java counterparts: != * + - / < <= == > >=. As mentioned earlier, Decaf Java-In-Scheme doesn't handle coercion of operands for these built in operations, so be sure to cast the operands if that is required. The following is a legal Decaf Java-In-Scheme expression: (+ X (* Y Z)). Notice that (+ X Y Z) is legal in Scheme, but not in Decaf Java-In-Scheme (Decaf Java-In-Scheme's + operator is binary, while Scheme's takes any number of arguments). Also, + in Decaf Java-In-Scheme is like Java's +, not Scheme's: it will work when both arguments are numbers or when both arguments are strings.

In addition, we support (? <predicate expr> <consequent expr> <alternative expr>) corresponding to Java's <predicate expr> ? <consequent expr> : <alternative expr>. We also support (AND ...) where there can be any number of expressions for ..., corresponding to Java's && operator as well as (OR ...) for Java's || operator. We've added two built-in operators, (newline) and (print <expr>).

Finally, an assignment expression allows us to modify the value of an existing variable (like Scheme's SET!, but with a defined value returned). Decaf Java-In-Scheme, like Java, supports a number of different assignment operations. In Decaf Java-In-Scheme these are = *= /= += -=. For example, (= X 3) changes the value of the variable X to 3 and returns the value 3.

Decaf Java-In-Scheme Statements

The procedure statement? specifies what constitutes a Java statement, that is, what Decaf Java translations into Scheme represent legal statements:

(define (statement? obj)
  (or (expr? obj)
      (and (pair? obj)
           (case (car obj)
             ((BLOCK) (all? statement? (list-tail obj 1)))
             ((BREAK) (break? obj))
             ((CONTINUE) (continue? obj))
             ((DO) (do? obj))
             ((FOR) (for? obj))
             ((IF) (if? obj))
             ((LABEL) (label? obj))
             ((RETURN) (return? obj))
             ((VARIABLES) (variables? obj))
             ((WHILE) (while? obj))
             (else #F)))))

In addition to expressions, the following syntaxes are supported. Notice that Java's switch isn't supported (yet...)

We stress once more that the descriptions below identify what you write in Decaf Java-In-Scheme as being equivalent to Java expressions, since our interpreter requires Scheme based syntax as input.

(BLOCK <statement> ...)
A block of statements to be executed in order. This is more-or-less equivalent to Java's use of braces ("{" and "}") to group statements together. It's really used as a trick to make our parse tree (i.e. the Scheme representation of a Java program) a little simpler. Instead of having lots of places where any number of statements can appear, we just have a slot for a single statement and allow that statement to be a BLOCK.
(BREAK) or (BREAK <name>)
Java's break command, for exiting loops or named blocks of code.
(CONTINUE) or (CONTINUE <name>)
Java's continue command, for continuing around a loop or named block of code.
(DO <statement> <expr>)
Java's do <statement> while <expr>; command. It executes the statement, then evaluates the expr to see if it should repeat.
(FOR <init statement> <boolean expr> <update> <body statement>) or (FOR <variable declaration> <boolean expr> <update> <body statement>)
Java's for statement. In the first case, the init statement is executed, then the boolean expr, then (conditionally) the body statement followed by the update. Then the boolean expr is evaluated again, and so forth as long as it remains true. The second case is almost the same, but instead of executing the init statement, a new environment is created with the specified variables initialized and the remaining expressions are evaluated in that environment. (Note to Java pedants: the update part of a for isn't really allowed to be an arbitrary statement, but is restricted to a "StatementExpressionList". We've ignored this restriction in order to make the syntax checking simpler. And, because we've added BLOCK as a kind of statement we don't need to have a special case for statement lists in the init statement as in real Java.)
(IF <predicate expr> <consequent statement> <alternative statement>)
Java's if statement. Unlike the conditional expression (Java's ? : expression, or Decaf Java-In-Scheme's (? ...), or Scheme's (IF ...)), this is a statement, not an expression. It doesn't return a value.
(LABEL <name> <statement>)
Java allows statements to be labelled by prefixing them with a name and a :. The LABEL statement in Decaf Java-In-Scheme serves the same purpose, but it requires you to package all of the labelled information as a single statement (which is easy to do using the BLOCK construct).
(RETURN) or (RETURN <expr>)
Java's return statement. (RETURN) can only be used to return from a method that is declared to return the type void. The other form must be used to return a specific value of the correct type from a method.
(VARIABLES <type> (<variable declaration> ...) <statement>)
Java allows variables to be declared at the beginning of a block of code. In the parse tree (i.e. in Decaf Java-In-Scheme) this is reflected by having a different construct for blocks that have such declarations in them. A single VARIABLES node introduces any number of names (one for each variable declaration) but they all have the same type. They can be nested to introduce variables of different types. Each variable declaration is either just a java-name (i.e. it has no specific initialization) or is of the form (= <name> <expr>) to specify the code that must be executed to initialize the variable.
(WHILE <boolean expr> <statement>)
Java's while statement. Similar to the DO statement, but the value of the boolean expression is tested before the statement is executed rather than after.

Decaf Java-In-Scheme Declarations

There are only two declarations in Decaf Java-In-Scheme, one for classes and one for interfaces, and in this problem set we are ignoring interfaces. The following Decaf Java expression corresponds to the same construct in Java:


(CLASS [(modifier ...)] <name>
       [(EXTENDS <name> ...)]
       [(IMPLEMENTS <name> ...)]
       <body form> ...)

(INTERFACE [(modifier ...)] <name>
           [(EXTENDS <name> ...)]
           <body form> ...)

The entries inside brackets ("[", "]") are optional -- you can leave them out if you want. The ... means that you can repeat the thing before it as often as you like.

The modifer must be one of the set allowed in Java: PUBLIC PROTECTED PRIVATE STATIC ABSTRACT FINAL. (It doesn't make sense to repeat a modifier, but it doesn't matter and Decaf Java-In-Scheme won't complain.) Note: In order to make it easier to deal with these complicated syntaxes, the procedures in java-syntax.scm are a little different from those in the book. Several of them take in the syntax that we're showing you here and return a more stylized syntax that includes entries for everything you could have put in, whether you put it there or not. This is called "canonicalizing" the parse tree and is quite common practice in compilers and interpreters. It makes the predicates for the syntax a bit more complex (they have to do the canonicalization) but the result is that the selectors are simpler and faster.

A body form can be one of three things:

  1. A method declaration (in a class). These look like:
    
    (METHOD [(<modifier> ...))]
            <type name> <method name> (<parameter> ...)
      <statement>)
    
    where a parameter is (<type name> <parameter name>).

    For example, here is a Decaf Java method:

    (METHOD (public static) void main ()
      (call (DOT System out println) "Hello, world"))
    

    The corresponding Java code is:

    public static void main()
    {
       System.out.println("Hello, world");
    }
    
  2. A body form can declare a field. Fields can be static (shared by all instances of the class) or dynamic (a new field is created for each instance of the class). A field is declared using this syntax:
    
    (FIELD [(<modifier> ...)]
           <type name> (<variable declaration> ...))
    
    . Here is a Decaf Java-In-Scheme example:
    (class parent
      (field string ((= bar "Parent bar")))
      (method void foo ((string s))
        (block
         (print (+ s (+ ": Parent foo, " (dot this bar))))
         (newline)))
      (method void foo2 ((string s))
        (block
         (print (+ s (+ ": Parent foo2 called, "
                        (dot this bar))))
         (newline)
         (call (dot this foo) s))))
    

    and here is the corresponding Java code:

    class Parent
    {
        String bar = "Parent bar;
    
        void foo (String S)
        {
            System.out.println(S + ": Parent foo,  " + this.bar);
        }
    
        void foo2 (String S)
        {
            System.out.println(S + ": Parent foo2 called,  " + this.bar);
            this.foo(S);
        }
    }
    
  3. Finally, a body form can be a static initialization expression in the form (STATIC <statement>). These are not permitted inside of interfaces, only classes.
Important: You will be responsible for converting Java programs into Decaf Java-In-Scheme. So let's get started with a simple Tutorial Exercise: As soon as youhave read the rest of this handout, convert the following small Java Class into a Decaf Java-In-Scheme equivalent parse tree. WRite the code needed to test out (in both J++ and Java-In-Scheme) and be prepared to show your output in tutorial. In J++ you will have to make another class that contains the test code as part of public static void main(String args[]). In Java-In-Scheme you will ned to use global to make a gloabl variable of type NumericFun and then use call to call the methods.
class NumericFun
{  int TheNumber;

  public void SetNumber(int n)
  {  TheNumber = n;
  }

  private int Fib(int n)
  {  if (n==0) return 0;
     else if (n==1) return 1;
          else return Fib(n-1)+Fib(n-2);
  }

  public int Fib()
  {  return Fib(TheNumber);
  }
}
To help you get started in the Java-In-Scheme version, here's a partial template for you to complete:
(define NumericFun
  '(class NumericFun
     (field number (TheNumber))
     (method (public) void SetNumber ((number n))
        (= TheNumber n))
     (method ...)   ; Fib (n)
     (method ...))) ; Fib ()

(run-exprs
  (list NumericFun
       '(global NumericFun
                (= Nine (new NumericFun))
                (= Ten (new NumericFun)))
       '(call (dot Nine SetNumber) 9)
       '(call ...)
       '(print (call ...))  ; (fib 9)
       '(newline)
       '(print (call ...))  ; (fib 10)
       '(newline)))

Problem 0

To get you some experience in converting Java expressions into the equivalent Decaf Java-In-Scheme expressions, here are a couple of warm up exercises.
  1. Here is a Decaf Java expression. What is the corresponding Java code?
    (method number fact-fn ((number n))
      (variables number ((= result 1))
        (block
         (for (number (= i 2)) (<= i n) (+= i 1)
              (*= result i))
         (return result))))
    
  2. Here is some simple Java code. What is the corresponding Decaf Java-In-Scheme translation?
    class Rotating
    {
        int start = 0;  // variable to hold state of object
        int end;        // variable to determine last value before
                        // going back to 0
    
        Rotating(int limit)
        {
           end = limit;
        }
    
        void Shift()
        {
        
         start += 1;
         if (start > end)
         { 
            start = 0;
          }
        }
    
        int Status()
        {
        
         return start;
    
        }
    }
    

Decaf Java-In-Scheme: The Interpreter

Okay, so we've seen how to translate Java code into the Decaf Java equivalent so that we can now have our interpreter evaluate such code. Let's take a look at the code in java-eval.scm. This is the basic evaluator for our Decaf Java-In-Scheme language. Just like the Scheme interpreter in the book, it consists of a main entry point where an expression is analyzed and we then dispatch to handlers for each kind of expression in the language. Notice one small but important difference from the Scheme interpreter in Sussman and Abelson -- here, each of the evaluator procedures takes three arguments:
exp
The expression to be evaluated. This will have the syntax described above, and that syntax can be manipulated by the selectors provided in the file java-syntax.scm.
env
The environment in which to evaluate the expression. As in Scheme, this is used to find the values of variables. It is also used to find methods and to interpret the meaning of this and super. As we'll see later, in Java, unlike Scheme, method names aren't looked up the same way variables are.
next
This is the part that differs from the Scheme interpreters we have seen so far. Unlike Scheme, Java (and hence Decaf Java-In-Scheme) has statements that alter the "flow of control": break, continue, and return (in a full implementation of Java, exceptions also alter the flow of control). In order to implement these statements, we provide this additional procedure which must be called by the evaluation procedure when it has finished calculating the value of the expression instead of returning an answer as they do in the Scheme interpreter.

If this is done consistently throughout the interpreter the flow-altering operations just call some procedure other than the one they are told to call, and the effect is that the program runs in a different order. This is quite subtle, and you don't have to worry about understanding it in detail. Just bear in mind that evaluation procedures (including j-eval and j-eval-in-order) do not return values to the procedure that called them, but call their next procedure instead.

The main procedure of the Decaf Java-In-Scheme interpreter is j-eval:

(define (j-eval exp env next)
  ;; EXP is a Decaf Java-In-Scheme expression.
  ;; ENV is an environment (see abstraction below)
  ;; (NEXT value) is called when the value has been computed
  (cond
   ;; Expressions
   ((literal? exp) (j-eval-literal exp env next))
   ((variable? exp) (j-eval-variable exp env next))
   ...
   ;; Statements
   ((variables? exp) (j-eval-variables exp env next))
   ((if? exp) (j-eval-if exp env next))
   ...
   ;; Handle declarations specially
   ((global? exp) (j-eval-global exp env next))
   ((quick-class-test? exp)
    (let ((full-class (class? exp)))
      (if full-class
          (j-eval-class full-class env next)
          (error "Bad syntax for CLASS" exp env))))
   ((quick-interface-test? exp)
    (let ((full-interface (interface? exp)))
      (if full-interface
          (j-eval-interface full-interface env next)
          (error "Bad syntax for INTERFACE" exp env))))
   (else (error "Unknown type of expression" exp env))))
For the most part, this just calls the syntax testing procedures and dispatches to the appropriate handling procedure. The only exception is the handling of class and interface where the syntax procedure (as mentioned earlier) canonicalizes its input. In this case, the canonicalized input is passed to the handler instead of the original expression.

Let's take a look at a few representative evaluation procedures. We'll start with the very simplest, j-eval-literal:

(define (j-eval-literal exp env next)
  (next (if (eq? exp 'NULL) '() exp)))
All it has to do is figure out the value of the expression exp and call next with that value. That's easy: for any literal except the NULL the value of the literal is itself. We represent Decaf Java-In-Scheme's NULL with Scheme's empty list, as shown.

OK, how about something harder? Let's take a look at how if is handled:

(define (j-eval-if exp env next)
  ;; (IF <test> <consequent> <alternative>)
  (j-eval (if.predicate exp) env
    (lambda (pred)    ; (1)
      (cond ((eq? pred #T)
             (j-eval (if.consequent exp) env next))
            ((eq? pred #F)
             (j-eval (if.alternative exp) env next))
            (else (error "IF: Not a boolean value"))))))

The structure is slightly different from the version we've used for Scheme. Remember that j-eval doesn't return a value. Instead, it calls its third argument with the value it has computed. So how do we read this procedure? It starts by calling j-eval to compute the value of the (if.predicate exp) in the current environment. The value of this expression is passed to the procedure that's marked (1) as the parameter pred. Notice how this defines a new next procedure, thereby passing the evaluation along. This procedure checks to see if pred is true, false, or something else. If it's true, then we call j-eval again to compute the value of the (if.consequent exp) in the same environment, and tell it to pass the value on to the procedure named next. This is important: remember that j-eval won't return a value -- it will call next instead.

Does that make sense? Let's see. We were supposed to compute the value of some big if expression and call next with the answer when we figure it out. So first we found out that the predicate of the if was true. Then we decided to call j-eval to find the value of the consequent and we've told j-eval that it should pass that answer on to next. Aha! That's clever. That means that we don't get back into the loop of handling the result any more -- we've simplified our original problem (finding the value of the if) to the problem of finding the value of the consequent of the if, and we can turn that completely over to someone else (j-eval) to take care of. This is what is known as reducing the problem to a simpler case. In this style of interpreter, we always try to reduce problems when we can, and we know that we've reduce it when we can call j-eval with our own next procedure as its next procedure (technically, these procedures are known as "continuations").

Now let's take a look at an evaluation procedure for a statement. Here's the one for block:

(define (j-eval-block exp env next)
  ;; (BLOCK <statement> ...)
  (define (loop exprs)
    (if (null? exprs)
        (next "End of block")
        (j-eval (car exprs) env
                (lambda (val) (loop (cdr exprs))))))
  (loop (cdr exp)))
Notice that when it calls j-eval it does not use its own continuation (next); instead it uses the procedure created by (lambda (val) (loop (cdr exprs))). This means that evaluating each statement within the block is a subproblem of the original problem. It isn't until after the last statement is evaluated that we actually use our own continuation to return a value.

Let's take a look at a few more evaluation procedures and then we can start the actual problem set (you can't wait, right?). Let's take a look at how labels work in this interpreter. First, how do we create a label? That's what the Decaf Java-In-Scheme (LABEL <statement>) syntax is for. Here's how it is implemented:

(define (j-eval-label exp env next)
  ;; (LABEL <name> <statement>)
  (let ((name (label.name exp))
        (labels *labels*))             ; (1)
    (define (return-here)              ; (2)
      (set! *labels* labels)
      (next "End of labelled block"))
    (add-label! name return-here)      ; (3)
    (j-eval (label.statement exp) env  ; (4)
            (lambda (val) (return-here)))))
  1. To understand how labels work, you need to know that there is a global variable, *labels*, that holds all of the labels that are currently known. It is just a Scheme list onto which new labels are added when the labelled block is entered and then it is reset when the block is exited. So to execute a LABEL statement we need to start by remembering the current contents of that list.
  2. return-here is the procedure that will be executed whenever we exit this label block. All it does is restore the list of labels (*labels*) to the value it had when we got here, and then returns a value of "End of labelled block" to whoever asked the block to be evaluated. The value, of course, isn't really used -- it's just an artifact of the way we write our interpreter that everything (including statements) returns a value.
  3. Here's where we add the new label to the global list of labels. The code for add-label! is found in java-support.scm:
    (define *labels* '())
    
    (define (add-label! name label)
      (set! *labels*
        `(,(make-binding name '*LABEL* label)
          ,@*labels*)))
    
    It is a bit ugly, but all it really does is change the value of the global variable *labels* by adding one new binding to the front of the list -- a binding from the label name to the procedure we specify (return-here in our case) with a type of *LABEL*.
  4. Having added the label, we can go evaluate the statement that's been labelled. This is done by calling j-eval, of course, but notice the continuation we are using: it is not our own continuation (that would be next). Instead we are telling j-eval that when it is done it should call the procedure created by (lambda (val) (return-here)). That is, when it finishes, it calls back to this procedure with the value (which is ignored) and then calls return-here. Return-here, of course, just restores the global list of labels (so it removes our labels and any others that might have been added) and calls our continuation (next) to report that we're finished.
So far, so good. But how do we use the label that we just created? Here's how (BREAK <label name>) works:
(define (j-eval-break exp env next)
  ;; (BREAK) or (BREAK )
  (exit exp '*BREAK-LABEL*))

(define (exit exp default-label)
  ;; Evaluate the procedure associated with a given label, supplying a
  ;; default label if none is specified to use.  Used to implement
  ;; (BREAK) and (BREAK <label>), etc.
  (let ((name (if (null? (list-tail exp 1))
                  default-label
                  (second exp))))
    ((lookup-label name))))             ; (1)

(define (lookup-label name)
  (lookup-global name *labels* "No such label"))

(define (lookup-global name global-var error-msg)
  ;; Find the value of a name in a global name space
  (let ((entry (assq name global-var))) ; (2)
    (if entry
        (binding.value entry)           ; (3)
        (error error-msg name))))
  1. Here's where the real work is done: notice that it says ((lookup-label name)) and not just (lookup-label name). So it looks up the label we're trying to find and then it calls the procedure associated with that label. If the label was made by the code in j-eval-label then the procedure will be one version or another of return-here: the procedure that removes all the newer labels and then returns to whoever evaluated the labelled block! This is what alters the flow of control: we call a continuation, but not the one that is expected -- that would have been the procedure next that is passed in to j-eval-break. Instead, we're calling the next that was passed to j-eval-label when the label was created.
  2. To find a label, we use Scheme's built in procedure assq to find it in an appropriate global list (*labels* in this case). To make this work, we had to arrange that bindings were in an appropriate format to be used with assq.
  3. Once we find the entry in the global list (*labels*) we still have to get the value part out -- the full entry is a binding so we want the binding.value part.

Using Decaf Java-In-Scheme

Before we start the main part of problem set proper, let's take a look at how you can use the interpreter. In the file java-support you will find several small programs (all of them compute factorial). Take a careful look at them to get the idea of how you translate a Java program into Decaf Java-In-Scheme.

Pay particular attention to how things hook into the read-eval-print loop. You can start it by calling (driver-loop) or you can use go to specify some code to evaluate before the loop begins. A simple model you might have is that when you type into the loop, is that you are actually typing code that is part of a static method defined in an unnamed class that's global to the whole program. This idea lets you access other classes, and so forth.

To make it a bit more convenient, we've added a new statement to the Decaf Java-In-Scheme language:


(GLOBAL <type> <VariableDecl> ...)
This allows you to declare new variables that are visible from the read-eval-print loop. With this in place, you can run a simple factorial example like the one in test2:
(define (test2 n)
  (run-exprs
   `((CLASS fact
            (METHOD (public static) number fact-fn ((number n))
              (VARIABLES number ((= result 1))
                (BLOCK
                 (FOR (number (= i 2)) (<= i n) (+= i 1)
                      (*= result i))
                 (RETURN result)))))
     (global fact (= x (new fact)))
     (call (dot x fact-fn) ,n))))
That's roughly equivalent to the following Java program:
class fact
{ method (public static) int fact-fn (int n)
  { number result=1;
    for (int i=2; i <= n; i += 1) result *= i;
    return result;
  }
}

class REPLoop
{ fact x = new fact();
  x.fact-fn (n);
}
You can use it like:
(/ (test2 10) (test2 9)) ==> 10

And the Main Part of the Problem Set Begins ...

Let's take a look at one more evaluation procedure, the one for while:
(define (j-eval-while exp env next)
  ;; (WHILE <boolean expr> <statement>)
  (let ((labels *labels*)              ; (1)
        (bool-expr (while.predicate exp))
        (statement (while.statement exp)))
    (define (done)                     ; (2)
      (set! *labels* labels)
      (next "End of WHILE"))
    (define (again)                    ; (3)
      (j-eval bool-expr env            ; (4)
        (lambda (bool)                 ; (5)
          (cond ((eq? bool #T)
                 (j-eval statement env
                   (lambda (ignored-value) (again))))
                ((eq? bool #F (done)))
                (else (error "WHILE: Not a boolean value"))))))
    (add-label! '*BREAK-LABEL* done)   ; (6)
    (add-label! '*CONTINUE-LABEL* again)
    (again)))                          ; (7)
  1. As with j-eval-label, we have to remember what labels existed when we started here. We also extract the various parts of the WHILE expression for use later.
  2. Done is like the return-here procedure for labels. It just restores the global label list to what it was before we got here and returns an uninteresting value as an answer (notice that it uses next, its own continuation, to deliver the answer).
  3. Again is the main loop of the while procedure. It is what needs to be repeated if there is a continue executed inside of the code.
  4. First, evaluate the boolean expression by calling j-eval.
  5. Notice that the continuation we pass will be called when the value of the boolean expression has been computed.
  6. Here's the main body of the procedure j-eval-while. It creates two labels: *break-label* is used by the break statement to exit the loop, so that it will call done; *continue-label* is used by the continue statement to go around the loop again, so it will call again.
  7. We have to start the loop ourselves, so we call (again).

Problem 1 -- And now it's your turn.

We've written the code to handle not only while, but also do. There used to be code in both java-syntax (for procedures do?, do.statement, and do.predicate) and in java-eval (for j-eval-do). It's been moved to java-working and we aren't distributing the source code for that. Write your own versions in the file java-testing and reload java. This should restart the whole system, with your definitions overriding ours. Turn in:

Variables vs. Methods

Let's take a close look at how Decaf Java-In-Scheme (and, fundamentally, Java) handle variables and methods. In Scheme we have Mantra 2: "to find the value of a combination, evaluate all of the subexpressions, in any order, and then apply the first to the rest." So, from Scheme's point of view, the operator of an expression and the operands of an expression are all evaluated in the same way: a variable is a variable is a variable, and it doesn't matter whether we plan to pass the value of the variable as an argument to another procedure or use its value as a procedure to call.

But what happens in Java? If you read the reference manual carefully you will see that the rules for variables and the rules for methods are very different. Finding the value of a variable is very similar to what Scheme does: there is a chain of environment frames, and they are searched in order to find the value of the variable. In Decaf Java-In-Scheme there are four ways for variables to be created and given values:

Method invocation
This binds the names of the method's parameters to values. This is very similar to what happens in Scheme when we apply a procedure to arguments, and for similar reasons. In our implementation, these frames are called bindings frames and are tagged with the keyword BINDINGS.
Variable declaration
Variables can be declared either inside of a method body (in Decaf Java-In-Scheme this is what the VARIABLES syntax does) or in the initialization part of a FOR statement. In our implementation these frames are tagged as VARIABLES and FOR frames, respectively.
Static (class) or dynamic (object) fields
The fields that are body forms in a class declaration can be referenced as variables in Java. Within a method, these fields are referenced either directly by their name or using the "dot" notation (like this.X or A.B.C).
Global declarations
We've added the GLOBAL syntax to Decaf Java-In-Scheme to allow variables to be declared in the global (read-eval-print) environment.
To find the value of a variable we use the idea of a chain of frames -- an environment -- just as we do in Scheme. In Scheme all environments end at the same frame (the global environment); but this isn't true of Decaf Java-In-Scheme. Instead, there are four kinds of frames that can serve as the end of a simple chain:
The Global Frame
This is just like Scheme's global environment frame. It is the frame in which the GLOBAL syntax creates or modifies variables.
Class Frames
These are frames that represent Java classes. When a static method is invoked, it creates an environment chain that ends in a Class Frame representing the class whose method is being called. Class frames are represented by the ev-class structure defined in the file java-support. Class frames are created when the class is declared (see the code for j-eval-class). Class frames have many parts, but the most important ones for our current purposes are the lists of bindings. A class has three separate lists of bindings: names of methods to the methods themselves; names of static (class) fields to their values; and names of dynamic (object) fields to the expressions that will be used to initialize them when an instance of the class is created.
Object Frames
These are frames that represent a Java object. They have a part that serves exactly as a Scheme frame, mapping names to values. The names are those of the "direct dynamic fields" of the object's class. That is, they are the dynamic fields that are declared in the object's direct class (the most specific class of which it is an instance). When a dynamic method is invoked, it creates an environment chain that ends in an Object Frame representing the object whose method is being called.

Object frames are created by the new expression. The other three parts of the object frame are the object's class (an ev-class structure); the "superobject" (that is, an object frame that has the dynamic fields of this object's class's direct superclass); and the object itself (i.e. the value of this inside any of this object's own methods). We won't be concerned with these other parts yet. So how do we actually search for the value of a variable (a similar set of rules applies for handling dot references). We start at the current frame and search it (using assq) for a binding for the name. If it's there, the binding will supply both the type and the value of the variable. If it's not there, we proceed to the parent frame. This continues until we get to the "base frame". For each kind of base frame there is a special rule that determines how to continue the search.

The code to do all of this is in the procedure find-binding in the file java-support:
(define (find-binding var env next)
  ;; This searches only the runtime bindings created by calling
  ;; methods and the fields (not methods) from classes and interfaces.
  (let ((the-object #F))
    (define (loop env)
      (let ((frame (first-frame env)))
        (let ((entry (assq var (frame.bindings frame))))
          (or entry
              (case (first frame)
                ((BINDINGS FOR VARIABLES)
                 ;; Simplest case: try the next frame in the chain
                 (loop (rest-of-frames env)))
                ((GLOBAL)
                 ;; Base case 1: Global environment, variable not found
                 #F)
                ((CLASS)
                 ;; Base case 2: Class, first look at the interfaces,
                 ;; then either the superclass or the superobject
                 (or
                  (any loop
                       (map basic-environment (ev-class.implements frame)))
                  (let ((super (if the-object
                                   (object.superobject the-object)
                                   (ev-class.superclass frame))))
                    (and super
                         (not (predefined-class? super))
                         (loop super)))))
                ((INTERFACE)
                 ;; Base case 3: Interface, look at inherited interfaces
                 (any loop (map basic-environment
                                (interface.implements frame))))
                ((OBJECT)
                 ;; Base case 4: Object, look at the class, but
                 ;; remember that we've found an object so that parent
                 ;; dynamic fields are considered as we go up the
                 ;; chain of superclasses
                 (set! the-object frame)
                 (loop (basic-environment (object.class the-object)))))))))
    (let ((result (loop env)))
      (if result
          (next result)
          (error "Variable not found" var)))))
How is method lookup handled? Take a look at the following code, the main part of method-lookup in java-support:
(define (method-lookup method-name arg-types env next)
  ...definitions...
  (find-object-initial-class-and-name   ; (1)
   (lambda (object initial-class name)  ; (2)
     (find-all-matching-methods object initial-class name  ; (3)
       (lambda (methods signatures classes) ; (4)
         (find-best-method object methods signatures classes)))))) ; (5)
  1. Find-object-initial-class-and-name uses method-name to decide where the search should start. If the method-name is a variable, then we start by finding the value of that variable (whose value must be an object of some kind). If the method-name is a (DOT ...) expression, then we find the value of the first component and look within that for the other names until we reach the next-to-the-last name -- this is the method name we're really looking up.
  2. In either case above we determine an object, class, and name that will be use for the rest of the search (if we don't have an object then we can only look for static methods of the class).
  3. Now search in the class for all of the methods that have the specified name, take the right number of arguments, and have the right modifiers (i.e. dynamic methods aren't allowed if we don't have an object to start with).
  4. As part of this search, we have to collect three parallel lists: the methods to consider, their signatures (i.e. the types of the input arguments, the type of the result, and the modifiers -- see make-signature in java-support), and the classes from which they were extracted.
  5. Finally, pick the best method from amongst these choices.

Problem 2 -- Your turn.

We've supplied about 85% of the code needed to make find-best-method work. But there are two procedures that you have to write on your own: supertype-loop and decide. You can look at the code for method-lookup and find-best-method in file java-support to see how these are used.
(define (supertype-loop desired-type type count)
  ;; Returns 0 for same type, 1 for one step away, etc., or #F
  ;; if type-name isn't a supertype of type
  (if (eq? type desired-type)
      ...
      (let ((super (ev-class.superclass type)))
	(if (ev-class? super)
	    ...
	    ...))))

(define (decide new best decision)
  ;; We start out TIEd, and only change that if we find
  ;; something definitely better or definitely worse.  If we
  ;; have to change our mind from NO to YES (or vice versa)
  ;; then it's really a TIE.
  ...)
Complete the code and put it into java-testing. Hand in the code and evidence that it works. You might want to try (method-test) with the version of the code we've supplied and then compare the results to what you get.

Hints:

  1. supertype-loop may look tricky, but it's just a simple counting loop like ones you've written many times before. Read the comment at the top and just fill in the parts marked ....
  2. new and best are both lists of numbers, generated by supertype-loop. These numbers represent how far it is in the class hierarchy from the actual type of argument being passed to the type the method is expecting; that means that a smaller number (closer together) is better than a larger number.

Implementing VARIABLES

Let's take a look at the code for implementing VARIABLES. This statement is in many ways analagous to Scheme's LET expression: both introduce new variables for the duration of their body. There are some differences, however: in Scheme, LET is an expression and returns a meaningful value; in Java variables are introduced as parts of statements (which don't have values). Also, in Java, the values must first be computed and then coerced to the type that was declared for the variables (this is known as "assignment conversion").
(define (j-eval-variables exp env next)
  ;; Moved to JAVA-WORKING

  ;; (VARIABLES <type> (<VariableDecl> ...) <statement>)
  (let ((vdecls (variables.var-decls exp))
        (type-name (variables.type exp)))
    (let ((names (map var-decl.name vdecls))
          (type (lookup-class type-name))
          (init-exprs
           (map (get-initializer type-name) vdecls)))
      ;; INIT-EXPRS is a list of Decaf Java-In-Scheme expressions that must
      ;; be evaluated to initialize the new variables.
      (j-eval-in-order init-exprs env
        (lambda (init-vals)
          ;; Extend the environment by creating bindings for the new
          ;; variables (including "assignment conversion" of the
          ;; values), and then in this extended environment evaluate
          ;; the statement.
	  .......... YOUR CODE HERE ...........)))))

Problem 3 -- Your turn.

In order to write the missing code, you will need to understand at least the following procedures:

Another Look at Method Lookup

If you read the Java reference manual (available in the on-line Help for J++) you will find that method lookup works in a series of steps, part of which takes place at compile time and part at run time. At compile time, the Java compiler computes what is called the signature of the method that will be called at runtime: the type of each of the input arguments and the type of the result. It does this by finding all methods of the given name, looking up the types of all of the operands (these can all be computed by the compiler using the types that the programmer provides) and comparing these with the types of the arguments that are being supplied (which can also be computed at compile time). The compiler puts this signature into the output file, and at runtime the computer must find a method that matches this signature. This all sounds straightforward, but there's an interesting issue buried in it. Can the compiler really know the type of the arguments that will be supplied at runtime? The answer to this is yes but comes with a warning: the type is known by the compiler, but not the class of the argument. That is, the argument is guaranteed to be of a class that is a subtype of the type the compiler knows, but it isn't guaranteed to be exactly the type the compiler knows. Consider the following Java program:
class SuperClass
{ int X;
}

class SubClass extends SuperClass
{ void Test2(SuperClass x)
  { System.out.println("Superclass arg.");
  }
  void Test2(SubClass x)
  { System.out.println("Subclass arg.");
  }
}

class Lynn
{ public static void main( String[] args )
  { SubClass sub = new SubClass();
    SuperClass sup = new SuperClass(),
               subAsSup = sub;

    System.out.print("Test2(sup)"); sub.Test2(sup);
    System.out.print("Test2(sub)"); sub.Test2(sub);
    System.out.print("Test2(subAsSup)");
    sub.Test2(subAsSup);
  }
}
This produces the following output:
Superclass arg.
Subclass arg.
Superclass arg.
It is this final line that's odd. The variable subAsSup is declared to have type SuperClass, so the compiler decides that it must call a procedure whose signature is "method from an object of type SubClass with an argument of type SuperClass". But at runtime, the actual value of the argument is SubClass, not SuperClass. But the compiler decided to call the method designed for SuperClass objects, and that's what gets called.

If you try this in Java-In-Scheme, you'll get the same result. But we could make the argument (and many object-oriented languages have done so) that this is the wrong behavior. We really want to call the method that best matches the argument being passed: the one that would pring "Subclass arg." in this case.

Problem 4--Your turn again.

There is a very simple change to the code for j-eval-call that will change Java-In-Scheme so that it uses the actual type of the argument, rather than its apparent type (i.e. the type to which it has been cast or coerced), to compute the method to call. The change affects only one line of the code.
  1. Show the line that must be changed, as well as the actual change.
  2. Show that this changes the behavior of Java-In-Scheme for this test program.
  3. Explain, as simply as you can, why the "apparent type" of an object can be used in our Java-In-Scheme interpreter instead of keeping track of the actual data type of expressions, as would be done in a real Java compiler. Then explain why your change works.