MASSACHVSETTS INSTITVTE OF TECHNOLOGY
Department of Electrical
Engineering and Computer Science
6.001 -- Structure and Interpretation of Computer Programs
Spring Semester, 1997
Problem Set 9
Issued: Thursday April 17, 1997
Written solutions due: Wednesday, April 30 for sections meeting
at 10 and 11; Friday, May 2 for sections meeting at 9, 12, 1 and 2.
Reading: Attached code
(java-eval.scm,
and java-support.scm). Note also that
java.scm, and java-syntax.scm can be read online in
the 6.001 Web page, if you are interested.
Note: The overall programming framework will be discussed in
lecture on Thursday, April 17, 1997.
IMPORTANT NOTICE:
This handout should serve both as a set of lecture notes for today's
lecture and as the background material for the problem set itself.
This is a brand new problem set and may very well be buggy. We will
take this into account when grading. Please report problems to the
lab TAs so that they can be fixed over the summer.
All of the code, as well as the HTML for this handout, is at
http://www-swiss.ai.mit.edu/~u6001/ST97/JMiller/ps9-java/problem-set
Decaf Java-In-Scheme: An Introduction
First, a note about the name "Decaf Java". As we will see, the goal
of this lecture and problem set is to explore the implementation of an
evaluator for one computer language, using the evaluator for a second
computer language as a foundation. Here, we are going to implement an
evaluator for a watered down version of Java, by building on our
understanding of Scheme's evaluator.
Why are we doing this, i.e. what's the point of writing an interpreter for a language when there
are perfectly good commercial compilers and development environments
available? In order to understand any complicated system it's often
worthwhile to build and study a small model. The Decaf Java-In-Scheme
interpreter is the computer equivalent of a scale model of a jet
airplane, and can serve the same purposes:
- We'll see how the fundamental structure of the interpreter is very
similar to the structure of Scheme's interpreter. In fact, it's not
too much of an exaggeration to say that all interpreters for
computer languages have roughly the same structure.
- We'll see where the languages differ fundamentally, not just on
the surface. For example, why does Java have a complicated mechanism
for resolving method names when Scheme's is so much simpler?
- We'll learn a good deal about Java design decisions by seeing how
our little interpreter does the job. For example, we'll have to
understand exactly what the Java rules for coercion, casting,
and conversion are intended to accomplish.
- We'll learn some restrictions on any implementation of
Java. For example, these same rules about casting and so forth imply
some very strong requirements for building any Java virtual machine or
interpreter.
- We'll learn about interactive programming environments, since
we've had to extend Java to allow it to be used in a read-eval-print
loop like Scheme's.
Decaf Java-In-Scheme: Syntax and Simplifications
Designing and building a model is a difficult engineering task. We
have to decide what questions we want to study and what issues we're
willing to ignore. Then we have to build the model so that we can
study what we want and argue that our results would apply even if we'd
built the entire system (i.e. that there aren't any significant
cross-couplings in the real world).
In designing Decaf Java-In-Scheme, our main goal was to make the
interpreter small enough that it can be understood by someone familiar
with Scheme interpreters with about a week of study. That's a strong
constraint, and in order to meet it, we've done a good deal of
simplifying. Here are some of the major decisions that you will need
to understand before examining the code:
- Parsing
- Java's syntax is intended to be parsed by a fairly complex program
that reads characters and builds what is known in the trade as a
"parse tree". Systems that automatically generate parse
trees are interesting pieces of code and would probably form the basis
for a nice problem set on their own. We are going to sidestep the
associated syntax issues and impose a major restriction on our
language. In particular, we are going to require that we write our programs
as Scheme data structures that are already parse trees.
The language we actually implement is really one that takes as
input a Java parse tree and interprets it, rather than one that takes
Java source files. Thus, you will act as the parse tree generator, by
using a set of rules to convert Java code to a representation that is
suitable for input to a Scheme program that is simulating the Java
interpreter -- hence our use of the name DECAF JAVA rather than
Java.
- Missing Features
- Java has a number of features that are important for use in a
large scale programming project, but which take a lot of time and code
to implement. Many of these are in the class libraries of Java, but
some are built into the language itself. We've decided to omit
several important features of Java in order to make the implementation
simpler; adding back the features is hard work, but (we argue) isn't
really going to change the structure of the implementation very much.
Here's a partial list of what we've omitted:
- Visibility Rules (i.e. handling public, private etc.)
- Exceptions, including throw and atch .
- Threads and related issues (synchronization and locking).
- Some built-in operations. Mostly we dropped the "bit
diddle" operations, leaving intact the standard numeric and
logical operations. We also removed the pre- and post-fix operations
(++, --).
- Static type checking, although the syntax matches Java's so it can
be added later.
- Because Scheme isn't case sensitive, our Decaf Java-In-Scheme isn't case
sensitive either. That means that translating real Java into
Decaf Java-In-Scheme can be tricky because Int and int
must be translated differently somehow (but we've muddled up that
distinction anyway, see below).
- Arrays, packages, explicit object constructors, importing of
names. We've completely removed these, not even allowing the
syntax to describe them.
- No switch statement. We had thought we'd assign this as part of
the problem set, but it is just tedious to do and has no intellectual
value. It's very useful if someone else implements it, of
course.
- No interfaces. Most of the code for interfaces is there, but the
code for actually declaring an interface isn't complete and none of
the rest of the code has been tested. Next year's 6.001 students may
find the situation different...
- Simplifications
- A lot of what we did was done just to make the implementation job
simpler. For example, we didn't implement any of the Java
class libraries (we added the two built-in operations print
and newline to compensate for the most obvious problem this
causes). We unified all of Java's different numeric types into one
new type, number. Note: dumping the class libraries actually
does have an impact on our implementation, since some of the
required libraries allow you to explore the class structure and so
forth. That means that some information must be available at runtime
to a Java program that our system may not actually maintain.
- Dirty Tricks
- Frankly, some of the things in Java are either hard to implement
in the new framework or are things we didn't like to begin with. In
particular, the built-in operations in Decaf Java-In-Scheme (like +
and so forth) don't correctly coerce their arguments; this would have
been hard and uninteresting code. We've also dropped the distinction
between the classes int and Int because, while
important in the real world, it's hard to implement and is something
that we'd really rather didn't have to be there at all.
Decaf Java-In-Scheme: Details of the Syntax
Let's begin by looking at the manner in which we will represent Decaf
Java expressions as equivalent Scheme expressions.
The file java-syntax.scm contains procedures to test for all
of the grammatical constructs that occur in our version of Java. You
can refer to this code for any details you require, but in general the
code here isn't very interesting and you needn't do more than scan it
to get the basic idea of how the syntax is handled. Again, we stress
that the Scheme procedures in this file are expecting as input Scheme
expressions that represent the equivalent Java expressions, but you
will be responsible for providing this translation. In other words,
this body of code represents our interface between Java and the Scheme
based parse trees of our interpreter.
The best way to understand Decaf Java-In-Scheme's syntax is to compare it
with Java directly. Just as in Java (and, indeed, most modern
programming languages), there are three major kinds of constructs:
expressions which can be used to compute values;
statements which can be executed for side-effects (but which
don't compute values); and declarations which are used to
define new names (in Java, declarations define classes and
interfaces). Notice that any expression can be used
where a statement is expected (but not the reverse).
Decaf Java-In-Scheme Expressions
Let's begin by exploring the expressions of the language. The
procedure expr? tells us what are legal expressions, its body
is:
(or (literal? obj)
(java-name? obj)
(new? obj)
(dot? obj)
(call? obj)
(cast? obj)
(instanceof? obj)
(built-in-expr? obj)
(assignment? obj))
and thus each of these predicates defines a different legal
expression.
We can build a chart that compares Java expressions and the
corresponding Decaf Java representations, which we will examine in
detail below:
JAVA |
DECAF JAVA-IN-SCHEME |
3 |
3 |
foo |
foo |
new foo() |
(new foo) |
this.F |
(dot this f) |
F(3) |
(call f 3) |
this.F(3, "x") |
(call (dot this f) 3 "x") |
(String) x + y |
(cast string (+ x y)) |
predicate ? consequent : alternative |
(? predicate consequent alternative) |
x = 3 |
(= x 3) |
Now let's look at each of these kinds of expressions.
literal expressions are defined as (Scheme) numbers,
booleans, characters, strings, and the special names FALSE, NULL, and
TRUE. A java-name (java-name?), which is the
Decaf Java-In-Scheme equivalent of a Java variable, is any symbol that isn't
one of the "magic" words of the Decaf Java-In-Scheme language.
For reference, here are the names you can not use for
variables (or anything else, for that matter) in Decaf Java-In-Scheme:
!= * *= *BREAK-LABEL* *CONTINUE-LABEL* *RETURN-LABEL* + += - -=
/ /= < <= = == > >= ? ABSTRACT AND BLOCK CALL CAST CLASS
DO DOT EXTENDS FALSE FIELD FINAL FOR IF IMPLEMENTS INSTANCEOF INTERFACE
METHOD NEW NEWLINE NULL OR PRINT PRIVATE PROTECTED PUBLIC STATIC SUPER THIS
TRUE VARIABLES WHILE
A new expression is used to create new objects, and looks
like (NEW <type name>). In Java,
arguments can be used to choose which of several different
"object constructors" should be called; but this is one of
the simplifications we've made: since we don't support the declaration
of explicit object constructors, we don't support the syntax for
calling them, either.
A dot expression is used for Java constructs like
this.F or super.X or
A.B.C.X. In Decaf Java-In-Scheme these are written as
(DOT this F), (dot super X), or
(Dot A b c x) respectively (remember that
Decaf Java-In-Scheme doesn't care about upper/lower case distinctions like
Java does).
A call represents Java's method invocation. In
Decaf Java-In-Scheme we write (CALL F 3) when in Java we
would write F(3);. Similarly, we write (CALL
(dot this F) 3 "x") when in Java it would be this.F(3,
"x");. A cast is written as (CAST String
(+ X Y)) instead of Java's (String) X+Y;.
Similarly, we write (INSTANCEOF (+ X Y) String)
instead of X+Y instanceof String.
A built-in corresponds to Java's set of "infix
operators". In Decaf Java-In-Scheme, we support the following binary
operators, corresponding directly to their Java counterparts:
!= * + - / < <= == > >=. As mentioned
earlier, Decaf Java-In-Scheme doesn't handle coercion of operands for these
built in operations, so be sure to cast the operands if that is
required. The following is a legal Decaf Java-In-Scheme expression:
(+ X (* Y Z)). Notice that (+ X Y Z)
is legal in Scheme, but not in Decaf Java-In-Scheme
(Decaf Java-In-Scheme's + operator is binary, while Scheme's takes
any number of arguments). Also, + in Decaf Java-In-Scheme is like
Java's +, not Scheme's: it will work when both arguments are
numbers or when both arguments are strings.
In addition, we support (? <predicate expr>
<consequent expr> <alternative
expr>) corresponding to Java's
<predicate expr> ? <consequent
expr> : <alternative expr>. We also
support (AND ...) where there can be any number of
expressions for ..., corresponding to Java's
&& operator as well as (OR ...) for
Java's || operator. We've added two built-in operators,
(newline) and (print
<expr>).
Finally, an assignment expression allows us to modify the
value of an existing variable (like Scheme's SET!, but with a
defined value returned). Decaf Java-In-Scheme, like Java, supports a number
of different assignment operations. In Decaf Java-In-Scheme these are
= *= /= += -=. For example, (= X 3)
changes the value of the variable X to 3 and returns the
value 3.
Decaf Java-In-Scheme Statements
The procedure statement? specifies what constitutes a Java
statement, that is, what Decaf Java translations into Scheme represent
legal statements:
(define (statement? obj)
(or (expr? obj)
(and (pair? obj)
(case (car obj)
((BLOCK) (all? statement? (list-tail obj 1)))
((BREAK) (break? obj))
((CONTINUE) (continue? obj))
((DO) (do? obj))
((FOR) (for? obj))
((IF) (if? obj))
((LABEL) (label? obj))
((RETURN) (return? obj))
((VARIABLES) (variables? obj))
((WHILE) (while? obj))
(else #F)))))
In addition to expressions, the following syntaxes are supported.
Notice that Java's switch isn't supported (yet...)
We stress once more that the descriptions below identify what you
write in Decaf Java-In-Scheme as being equivalent to Java expressions,
since our interpreter requires Scheme based syntax as input.
- (BLOCK <statement> ...)
- A block of statements to be executed in order. This is
more-or-less equivalent to Java's use of braces ("{" and "}") to group
statements together. It's really used as a trick to make our parse
tree (i.e. the Scheme representation of a Java program) a little
simpler. Instead of having lots of places where any number of
statements can appear, we just have a slot for a single statement and
allow that statement to be a BLOCK.
- (BREAK) or (BREAK
<name>)
- Java's break command, for exiting loops or named
blocks of code.
- (CONTINUE) or (CONTINUE
<name>)
- Java's continue command, for continuing around a
loop or named block of code.
- (DO <statement> <expr>)
- Java's do <statement> while
<expr>; command. It executes the
statement, then evaluates the expr to see if it
should repeat.
- (FOR <init statement> <boolean expr>
<update> <body statement>) or
(FOR <variable declaration> <boolean expr>
<update> <body statement>)
- Java's for statement. In the first case, the
init statement is executed, then the boolean expr,
then (conditionally) the body statement followed by the
update. Then the boolean expr is evaluated again,
and so forth as long as it remains true. The second case is
almost the same, but instead of executing the init statement,
a new environment is created with the specified variables initialized
and the remaining expressions are evaluated in that environment.
(Note to Java pedants: the update part of a for isn't
really allowed to be an arbitrary statement, but is restricted to a
"StatementExpressionList". We've ignored this restriction
in order to make the syntax checking simpler. And, because we've
added BLOCK as a kind of statement we don't need to have a
special case for statement lists in the init statement as in
real Java.)
- (IF <predicate expr> <consequent statement>
<alternative statement>)
- Java's if statement. Unlike the conditional expression
(Java's ? : expression, or Decaf Java-In-Scheme's (? ...),
or Scheme's (IF ...)), this is a statement, not an
expression. It doesn't return a value.
- (LABEL <name> <statement>)
- Java allows statements to be labelled by prefixing them with a
name and a :. The LABEL statement in
Decaf Java-In-Scheme serves the same purpose, but it requires you to package
all of the labelled information as a single statement (which is easy
to do using the BLOCK construct).
- (RETURN) or (RETURN
<expr>)
- Java's return statement. (RETURN) can only
be used to return from a method that is declared to return the type
void. The other form must be used to return a specific value
of the correct type from a method.
- (VARIABLES <type> (<variable
declaration> ...) <statement>)
- Java allows variables to be declared at the beginning of a block
of code. In the parse tree (i.e. in Decaf Java-In-Scheme) this is reflected
by having a different construct for blocks that have such declarations
in them. A single VARIABLES node introduces any number of
names (one for each variable declaration) but they all have
the same type. They can be nested to introduce variables of
different types. Each variable declaration is either just a
java-name (i.e. it has no specific initialization) or is of
the form (= <name> <expr>)
to specify the code that must be executed to initialize the variable.
- (WHILE <boolean expr>
<statement>)
- Java's while statement. Similar to the DO
statement, but the value of the boolean expression is tested
before the statement is executed rather than after.
Decaf Java-In-Scheme Declarations
There are only two declarations in Decaf Java-In-Scheme, one for
classes and one for interfaces, and in this problem
set we are ignoring interfaces. The following Decaf Java expression
corresponds to the same construct in Java:
(CLASS [(modifier ...)] <name>
[(EXTENDS <name> ...)]
[(IMPLEMENTS <name> ...)]
<body form> ...)
(INTERFACE [(modifier ...)] <name>
[(EXTENDS <name> ...)]
<body form> ...)
The entries inside brackets ("[", "]") are
optional -- you can leave them out if you want. The ... means
that you can repeat the thing before it as often as you like.
The modifer must be one of the set allowed in Java:
PUBLIC PROTECTED PRIVATE STATIC ABSTRACT FINAL. (It
doesn't make sense to repeat a modifier, but it doesn't matter and
Decaf Java-In-Scheme won't complain.)
Note: In order to make it easier to deal with these complicated
syntaxes, the procedures in java-syntax.scm are a little
different from those in the book. Several of them take in the syntax
that we're showing you here and return a more stylized syntax that
includes entries for everything you could have put in,
whether you put it there or not. This is called
"canonicalizing" the parse tree and is quite common practice
in compilers and interpreters. It makes the predicates for the syntax
a bit more complex (they have to do the canonicalization) but the
result is that the selectors are simpler and faster.
A body form can be one of three things:
- A method declaration (in a class). These look like:
(METHOD [(<modifier> ...))]
<type name> <method name> (<parameter> ...)
<statement>)
where a parameter is (<type name>
<parameter name>).
For example, here is a Decaf Java method:
(METHOD (public static) void main ()
(call (DOT System out println) "Hello, world"))
The corresponding Java code is:
public static void main()
{
System.out.println("Hello, world");
}
- A body form can declare a field. Fields can be static
(shared by all instances of the class) or dynamic (a new field is
created for each instance of the class). A field is declared using
this syntax:
(FIELD [(<modifier> ...)]
<type name> (<variable declaration> ...))
.
Here is a Decaf Java-In-Scheme example:
(class parent
(field string ((= bar "Parent bar")))
(method void foo ((string s))
(block
(print (+ s (+ ": Parent foo, " (dot this bar))))
(newline)))
(method void foo2 ((string s))
(block
(print (+ s (+ ": Parent foo2 called, "
(dot this bar))))
(newline)
(call (dot this foo) s))))
and here is the corresponding Java code:
class Parent
{
String bar = "Parent bar;
void foo (String S)
{
System.out.println(S + ": Parent foo, " + this.bar);
}
void foo2 (String S)
{
System.out.println(S + ": Parent foo2 called, " + this.bar);
this.foo(S);
}
}
- Finally, a body form can be a static initialization
expression in the form (STATIC
<statement>). These are not permitted inside
of interfaces, only classes.
Important: You will be responsible for converting Java
programs into Decaf Java-In-Scheme. So let's get started with a
simple
Tutorial Exercise:
As soon as youhave read the rest of this handout, convert the
following small Java Class into a Decaf Java-In-Scheme equivalent
parse tree. WRite the code needed to test out (in both J++ and
Java-In-Scheme) and be prepared to show your output in tutorial. In
J++ you will have to make another class that contains the test code as
part of public static void main(String args[]). In
Java-In-Scheme you will ned to use global to make a gloabl
variable of type NumericFun and then use call to
call the methods.
class NumericFun
{ int TheNumber;
public void SetNumber(int n)
{ TheNumber = n;
}
private int Fib(int n)
{ if (n==0) return 0;
else if (n==1) return 1;
else return Fib(n-1)+Fib(n-2);
}
public int Fib()
{ return Fib(TheNumber);
}
}
To help you get started in the Java-In-Scheme version, here's a
partial template for you to complete:
(define NumericFun
'(class NumericFun
(field number (TheNumber))
(method (public) void SetNumber ((number n))
(= TheNumber n))
(method ...) ; Fib (n)
(method ...))) ; Fib ()
(run-exprs
(list NumericFun
'(global NumericFun
(= Nine (new NumericFun))
(= Ten (new NumericFun)))
'(call (dot Nine SetNumber) 9)
'(call ...)
'(print (call ...)) ; (fib 9)
'(newline)
'(print (call ...)) ; (fib 10)
'(newline)))
Problem 0
To get you some experience in converting Java expressions into the
equivalent Decaf Java-In-Scheme expressions, here are a couple of warm
up exercises.
- Here is a Decaf Java expression. What is the corresponding
Java code?
(method number fact-fn ((number n))
(variables number ((= result 1))
(block
(for (number (= i 2)) (<= i n) (+= i 1)
(*= result i))
(return result))))
- Here is some simple Java code. What is the corresponding Decaf
Java-In-Scheme translation?
class Rotating
{
int start = 0; // variable to hold state of object
int end; // variable to determine last value before
// going back to 0
Rotating(int limit)
{
end = limit;
}
void Shift()
{
start += 1;
if (start > end)
{
start = 0;
}
}
int Status()
{
return start;
}
}
Decaf Java-In-Scheme: The Interpreter
Okay, so we've seen how to translate Java code into the Decaf Java
equivalent so that we can now have our interpreter evaluate such code.
Let's take a look at the code in java-eval.scm. This is the
basic evaluator for our Decaf Java-In-Scheme language. Just like the Scheme
interpreter in the book, it consists of a main entry point where an
expression is analyzed and we then dispatch to handlers for each kind
of expression in the language. Notice one small but important
difference from the Scheme interpreter in Sussman and Abelson -- here, each of
the evaluator procedures takes three arguments:
- exp
- The expression to be evaluated. This will have the syntax
described above, and that syntax can be manipulated by the selectors
provided in the file java-syntax.scm.
- env
- The environment in which to evaluate the expression. As in
Scheme, this is used to find the values of variables. It is also used
to find methods and to interpret the meaning of this
and super. As we'll see later, in Java, unlike Scheme,
method names aren't looked up the same way variables are.
- next
- This is the part that differs from the Scheme interpreters we have
seen so far. Unlike Scheme, Java (and hence Decaf Java-In-Scheme) has
statements that alter the "flow of control": break,
continue, and return (in a full implementation of
Java, exceptions also alter the flow of control). In order to
implement these statements, we provide this additional procedure which
must be called by the evaluation procedure when it has finished
calculating the value of the expression instead of returning
an answer as they do in the Scheme interpreter.
If this is done consistently throughout the interpreter the
flow-altering operations just call some procedure other than
the one they are told to call, and the effect is that the program runs
in a different order. This is quite subtle, and you don't have to
worry about understanding it in detail. Just bear in mind that
evaluation procedures (including j-eval and
j-eval-in-order) do not return values to the
procedure that called them, but call their next procedure
instead.
The main procedure of the Decaf Java-In-Scheme interpreter is
j-eval:
(define (j-eval exp env next)
;; EXP is a Decaf Java-In-Scheme expression.
;; ENV is an environment (see abstraction below)
;; (NEXT value) is called when the value has been computed
(cond
;; Expressions
((literal? exp) (j-eval-literal exp env next))
((variable? exp) (j-eval-variable exp env next))
...
;; Statements
((variables? exp) (j-eval-variables exp env next))
((if? exp) (j-eval-if exp env next))
...
;; Handle declarations specially
((global? exp) (j-eval-global exp env next))
((quick-class-test? exp)
(let ((full-class (class? exp)))
(if full-class
(j-eval-class full-class env next)
(error "Bad syntax for CLASS" exp env))))
((quick-interface-test? exp)
(let ((full-interface (interface? exp)))
(if full-interface
(j-eval-interface full-interface env next)
(error "Bad syntax for INTERFACE" exp env))))
(else (error "Unknown type of expression" exp env))))
For the most part, this just calls the syntax testing procedures and
dispatches to the appropriate handling procedure. The only exception
is the handling of class and interface where the
syntax procedure (as mentioned earlier) canonicalizes its
input. In this case, the canonicalized input is passed to the handler
instead of the original expression.
Let's take a look at a few representative evaluation procedures.
We'll start with the very simplest, j-eval-literal:
(define (j-eval-literal exp env next)
(next (if (eq? exp 'NULL) '() exp)))
All it has to do is figure out the value of the expression
exp and call next with that value. That's easy: for
any literal except the NULL the value of the literal is
itself. We represent Decaf Java-In-Scheme's NULL with Scheme's
empty list, as shown.
OK, how about something harder? Let's take a look at how
if is handled:
(define (j-eval-if exp env next)
;; (IF <test> <consequent> <alternative>)
(j-eval (if.predicate exp) env
(lambda (pred) ; (1)
(cond ((eq? pred #T)
(j-eval (if.consequent exp) env next))
((eq? pred #F)
(j-eval (if.alternative exp) env next))
(else (error "IF: Not a boolean value"))))))
The structure is slightly different from the version we've used for
Scheme. Remember that j-eval doesn't return a value.
Instead, it calls its third argument with the value it has computed.
So how do we read this procedure? It starts by calling
j-eval to compute the value of the (if.predicate
exp) in the current environment. The value of this
expression is passed to the procedure that's marked (1) as
the parameter pred. Notice how this defines a new
next procedure, thereby passing the evaluation along.
This procedure checks to see if
pred is true, false, or something else. If
it's true, then we call j-eval again to compute the value of
the (if.consequent exp) in the same environment,
and tell it to pass the value on to the procedure named
next. This is important: remember that j-eval
won't return a value -- it will call next instead.
Does that make sense? Let's see. We were supposed to compute the
value of some big if expression and call next with
the answer when we figure it out. So first we found out that the
predicate of the if was true. Then we decided to call
j-eval to find the value of the consequent and we've told
j-eval that it should pass that answer on to next.
Aha! That's clever. That means that we don't get back into the loop
of handling the result any more -- we've simplified our original
problem (finding the value of the if) to the problem of
finding the value of the consequent of the if, and
we can turn that completely over to someone else (j-eval) to
take care of. This is what is known as reducing the problem
to a simpler case. In this style of interpreter, we always
try to reduce problems when we can, and we know that we've reduce it
when we can call j-eval with our own next procedure
as its next procedure (technically, these procedures are
known as "continuations").
Now let's take a look at an evaluation procedure for a statement.
Here's the one for block:
(define (j-eval-block exp env next)
;; (BLOCK <statement> ...)
(define (loop exprs)
(if (null? exprs)
(next "End of block")
(j-eval (car exprs) env
(lambda (val) (loop (cdr exprs))))))
(loop (cdr exp)))
Notice that when it calls j-eval it does not use its
own continuation (next); instead it uses the procedure
created by (lambda (val) (loop (cdr exprs))). This means
that evaluating each statement within the block is a subproblem
of the original problem. It isn't until after the last statement is
evaluated that we actually use our own continuation to return a value.
Let's take a look at a few more evaluation procedures and then we can
start the actual problem set (you can't wait, right?). Let's take a
look at how labels work in this interpreter. First, how do we create
a label? That's what the Decaf Java-In-Scheme (LABEL
<statement>) syntax is for. Here's how it is
implemented:
(define (j-eval-label exp env next)
;; (LABEL <name> <statement>)
(let ((name (label.name exp))
(labels *labels*)) ; (1)
(define (return-here) ; (2)
(set! *labels* labels)
(next "End of labelled block"))
(add-label! name return-here) ; (3)
(j-eval (label.statement exp) env ; (4)
(lambda (val) (return-here)))))
- To understand how labels work, you need to know that there
is a global variable, *labels*, that holds all of the labels
that are currently known. It is just a Scheme list onto which new
labels are added when the labelled block is entered and then it is
reset when the block is exited. So to execute a LABEL
statement we need to start by remembering the current contents of that
list.
- return-here is the procedure that will be executed
whenever we exit this label block. All it does is restore the list of
labels (*labels*) to the value it had when we got here, and
then returns a value of "End of labelled block" to whoever
asked the block to be evaluated. The value, of course, isn't really
used -- it's just an artifact of the way we write our interpreter that
everything (including statements) returns a value.
- Here's where we add the new label to the global list of
labels. The code for add-label! is found in
java-support.scm:
(define *labels* '())
(define (add-label! name label)
(set! *labels*
`(,(make-binding name '*LABEL* label)
,@*labels*)))
It is a bit ugly, but all it really does is change the value of the
global variable *labels* by adding one new binding to the
front of the list -- a binding from the label name to the procedure we
specify (return-here in our case) with a type of
*LABEL*.
- Having added the label, we can go evaluate the statement
that's been labelled. This is done by calling j-eval, of
course, but notice the continuation we are using: it is not
our own continuation (that would be next). Instead we are
telling j-eval that when it is done it should call the
procedure created by (lambda (val) (return-here)). That is,
when it finishes, it calls back to this procedure with the value (which
is ignored) and then calls return-here.
Return-here, of course, just restores the global list of
labels (so it removes our labels and any others that might have been
added) and calls our continuation (next) to report that we're
finished.
So far, so good. But how do we use the label that we just created?
Here's how (BREAK <label name>) works:
(define (j-eval-break exp env next)
;; (BREAK) or (BREAK )
(exit exp '*BREAK-LABEL*))
(define (exit exp default-label)
;; Evaluate the procedure associated with a given label, supplying a
;; default label if none is specified to use. Used to implement
;; (BREAK) and (BREAK <label>), etc.
(let ((name (if (null? (list-tail exp 1))
default-label
(second exp))))
((lookup-label name)))) ; (1)
(define (lookup-label name)
(lookup-global name *labels* "No such label"))
(define (lookup-global name global-var error-msg)
;; Find the value of a name in a global name space
(let ((entry (assq name global-var))) ; (2)
(if entry
(binding.value entry) ; (3)
(error error-msg name))))
- Here's where the real work is done: notice that it says
((lookup-label name)) and not just (lookup-label
name). So it looks up the label we're trying to find and
then it calls the procedure associated with that label. If
the label was made by the code in j-eval-label then the
procedure will be one version or another of return-here: the
procedure that removes all the newer labels and then returns to
whoever evaluated the labelled block! This is what alters the flow of
control: we call a continuation, but not the one that is
expected -- that would have been the procedure next that is
passed in to j-eval-break. Instead, we're calling the
next that was passed to j-eval-label when the label
was created.
- To find a label, we use Scheme's built in procedure
assq to find it in an appropriate global list
(*labels* in this case). To make this work, we had to
arrange that bindings were in an appropriate format to be used with
assq.
- Once we find the entry in the global list
(*labels*) we still have to get the value part out -- the
full entry is a binding so we want the binding.value
part.
Using Decaf Java-In-Scheme
Before we start the main part of problem set proper, let's take a look at how you
can use the interpreter. In the file java-support you will
find several small programs (all of them compute factorial). Take a
careful look at them to get the idea of how you translate a Java
program into Decaf Java-In-Scheme.
Pay particular attention to how things hook into the
read-eval-print loop. You can start it by calling
(driver-loop) or you can use go to specify some
code to evaluate before the loop begins. A simple model you might
have is that when you type into the loop, is that you are actually
typing code that is part of a static method defined in an unnamed
class that's global to the whole program. This idea lets you access other
classes, and so forth.
To make it a bit more convenient, we've added a new statement to
the Decaf Java-In-Scheme language:
(GLOBAL <type> <VariableDecl> ...)
This allows you to declare new variables that are visible from the
read-eval-print loop. With this in place, you can run a simple
factorial example like the one in test2:
(define (test2 n)
(run-exprs
`((CLASS fact
(METHOD (public static) number fact-fn ((number n))
(VARIABLES number ((= result 1))
(BLOCK
(FOR (number (= i 2)) (<= i n) (+= i 1)
(*= result i))
(RETURN result)))))
(global fact (= x (new fact)))
(call (dot x fact-fn) ,n))))
That's roughly equivalent to the following Java program:
class fact
{ method (public static) int fact-fn (int n)
{ number result=1;
for (int i=2; i <= n; i += 1) result *= i;
return result;
}
}
class REPLoop
{ fact x = new fact();
x.fact-fn (n);
}
You can use it like:
(/ (test2 10) (test2 9)) ==> 10
And the Main Part of the Problem Set Begins ...
Let's take a look at one more evaluation procedure, the one for
while:
(define (j-eval-while exp env next)
;; (WHILE <boolean expr> <statement>)
(let ((labels *labels*) ; (1)
(bool-expr (while.predicate exp))
(statement (while.statement exp)))
(define (done) ; (2)
(set! *labels* labels)
(next "End of WHILE"))
(define (again) ; (3)
(j-eval bool-expr env ; (4)
(lambda (bool) ; (5)
(cond ((eq? bool #T)
(j-eval statement env
(lambda (ignored-value) (again))))
((eq? bool #F (done)))
(else (error "WHILE: Not a boolean value"))))))
(add-label! '*BREAK-LABEL* done) ; (6)
(add-label! '*CONTINUE-LABEL* again)
(again))) ; (7)
- As with j-eval-label, we have to remember what
labels existed when we started here. We also extract the various
parts of the WHILE expression for use later.
- Done is like the return-here procedure
for labels. It just restores the global label list to what it was
before we got here and returns an uninteresting value as an answer
(notice that it uses next, its own continuation, to deliver
the answer).
- Again is the main loop of the while
procedure. It is what needs to be repeated if there is a
continue executed inside of the code.
- First, evaluate the boolean expression by calling
j-eval.
- Notice that the continuation we pass will be called
when the value of the boolean expression has been computed.
- If it is true, then we evaluate the body of the loop (that is,
statement). Notice that we specify that when the value
is computed j-eval should call (lambda
(ignored-value) (again)) with the value. The value, of
course, is just ignored. But we then restart the loop by
calling (again).
- If it is false, then we exit by calling (done),
which removes any labels added while we were executing and calls
our own continuation (next) to return a (not very
intersting) value to the caller.
- If it isn't either true or false, we generate an error. Notice
that a real Java system is guaranteed that this can't
happen, because the compiler (or analyzer if we had an analyzing
evaluator) would know in advance that a non-boolean value might
be returned.
- Here's the main body of the procedure
j-eval-while. It creates two labels: *break-label*
is used by the break statement to exit the loop, so that it
will call done; *continue-label* is used by the
continue statement to go around the loop again, so it will
call again.
- We have to start the loop ourselves, so we call
(again).
Problem 1 -- And now it's your turn.
We've written the code to handle not
only while, but also do. There used to be code in
both java-syntax (for procedures do?,
do.statement, and do.predicate) and in
java-eval (for j-eval-do). It's been moved to
java-working and we aren't distributing the source code for
that. Write your own versions in the file java-testing and
reload java. This should restart the whole system, with your
definitions overriding ours.
Turn in:
- A very small Java program that uses do ... while
that you've tested using J++.
- The translation of that Java program into Decaf Java-In-Scheme.
- A transcript showing that the program works in Decaf Java-In-Scheme with
your new definitions.
- Your four procedure definitions: (do? exp) should test
whether a given Scheme data structure represents a Decaf
Java-In-Scheme do expression; (do.statement exp)
extracts the statement (body) of the do statement (you
mayassume that the expression (do? exp) is #t;
(do.predicate exp) extracts the predicate (test) of the
do statement (you may assume that the expression (do?
exp) is #t; (j-eval-do exp env next) should
evaluate the exp, which is guaranteed to be a do
statement. That is, it should first execute the body, then evalute the
test, and repeat this until the test returns #f. It needs to
bind labels like while; break should exit the loop
(just as though the test had been #f) and continue
should go immediately to the test part of the loop.
Variables vs. Methods
Let's take a close look at how Decaf Java-In-Scheme (and, fundamentally,
Java) handle variables and methods. In Scheme we have Mantra 2: "to
find the value of a combination, evaluate all of the subexpressions,
in any order, and then apply the first to the rest." So, from
Scheme's point of view, the operator of an expression and the operands
of an expression are all evaluated in the same way: a variable is a
variable is a variable, and it doesn't matter whether we plan to pass
the value of the variable as an argument to another procedure or use
its value as a procedure to call.
But what happens in Java? If you read the reference manual
carefully you will see that the rules for variables and the rules for
methods are very different. Finding the value of a variable is very
similar to what Scheme does: there is a chain of environment frames,
and they are searched in order to find the value of the variable. In
Decaf Java-In-Scheme there are four ways for variables to be created and
given values:
- Method invocation
- This binds the names of the method's parameters to values. This
is very similar to what happens in Scheme when we apply a procedure to
arguments, and for similar reasons. In our implementation, these
frames are called bindings frames and are tagged with the
keyword BINDINGS.
- Variable declaration
- Variables can be declared either inside of a method body (in
Decaf Java-In-Scheme this is what the VARIABLES syntax does) or in
the initialization part of a FOR statement. In our
implementation these frames are tagged as VARIABLES and
FOR frames, respectively.
- Static (class) or dynamic (object) fields
- The fields
that are body forms in a class declaration can be referenced
as variables in Java. Within a method, these fields are referenced
either directly by their name or using the "dot" notation
(like this.X or A.B.C).
- Global declarations
- We've added the GLOBAL syntax to Decaf Java-In-Scheme to allow
variables to be declared in the global (read-eval-print) environment.
To find the value of a variable we use the idea of a chain of frames
-- an environment -- just as we do in Scheme. In Scheme all
environments end at the same frame (the global environment); but this
isn't true of Decaf Java-In-Scheme. Instead, there are four kinds of frames
that can serve as the end of a simple chain:
- The Global Frame
- This is just like Scheme's global environment frame. It is the
frame in which the GLOBAL syntax creates or modifies
variables.
- Class Frames
- These are frames that represent Java classes. When a static
method is invoked, it creates an environment chain that ends in a
Class Frame representing the class whose method is being called.
Class frames are represented by the ev-class structure
defined in the file java-support. Class frames are created
when the class is declared (see the code for j-eval-class).
Class frames have many parts, but the most important ones for our
current purposes are the lists of bindings. A class has three
separate lists of bindings: names of methods to the methods
themselves; names of static (class) fields to their values; and names
of dynamic (object) fields to the expressions that will be used to
initialize them when an instance of the class is created.
- Object Frames
- These are frames that represent a Java object. They have a part
that serves exactly as a Scheme frame, mapping names to values. The
names are those of the "direct dynamic fields" of the
object's class. That is, they are the dynamic fields that are
declared in the object's direct class (the most specific class of
which it is an instance). When a dynamic method is invoked, it
creates an environment chain that ends in an Object Frame representing
the object whose method is being called.
Object frames are created by the new expression. The
other three parts of the object frame are the object's class (an
ev-class structure); the "superobject" (that is, an
object frame that has the dynamic fields of this object's class's
direct superclass); and the object itself (i.e. the value of
this inside any of this object's own methods). We won't be
concerned with these other parts yet.
So how do we actually search for the value of a variable (a similar
set of rules applies for handling dot references). We start
at the current frame and search it (using assq) for a binding
for the name. If it's there, the binding will supply both the type
and the value of the variable. If it's not there, we proceed to the
parent frame. This continues until we get to the "base
frame". For each kind of base frame there is a special rule that
determines how to continue the search.
- In the global frame, which will be the base frame when we
type expressions from the read-eval-print loop, there is nowhere else
to go so we get an error.
- In a class frame we first search the interface frames for
the interfaces it inherits (there is a list of these frames in the
ev-class data structure). If it can't be found there, we
need to look at the superclass, which can also be found from the
ev-class data structure. This same process repeats until we
get to the end of the superclass chain (we will either reach a
null? or a primitive class like *object*?).
- In an object frame we first search the immediate class of
the object, as described above. But instead of going on to the
superclass, we search the superobject instead. This means that we
alternate between searching dynamic fields and static fields as we
walk up the object / class chain.
- In an interface frame we search the interface frames from
which it inherits.
The code to do all of this is in the procedure find-binding
in the file java-support:
(define (find-binding var env next)
;; This searches only the runtime bindings created by calling
;; methods and the fields (not methods) from classes and interfaces.
(let ((the-object #F))
(define (loop env)
(let ((frame (first-frame env)))
(let ((entry (assq var (frame.bindings frame))))
(or entry
(case (first frame)
((BINDINGS FOR VARIABLES)
;; Simplest case: try the next frame in the chain
(loop (rest-of-frames env)))
((GLOBAL)
;; Base case 1: Global environment, variable not found
#F)
((CLASS)
;; Base case 2: Class, first look at the interfaces,
;; then either the superclass or the superobject
(or
(any loop
(map basic-environment (ev-class.implements frame)))
(let ((super (if the-object
(object.superobject the-object)
(ev-class.superclass frame))))
(and super
(not (predefined-class? super))
(loop super)))))
((INTERFACE)
;; Base case 3: Interface, look at inherited interfaces
(any loop (map basic-environment
(interface.implements frame))))
((OBJECT)
;; Base case 4: Object, look at the class, but
;; remember that we've found an object so that parent
;; dynamic fields are considered as we go up the
;; chain of superclasses
(set! the-object frame)
(loop (basic-environment (object.class the-object)))))))))
(let ((result (loop env)))
(if result
(next result)
(error "Variable not found" var)))))
How is method lookup handled? Take a look at the following code, the
main part of method-lookup in java-support:
(define (method-lookup method-name arg-types env next)
...definitions...
(find-object-initial-class-and-name ; (1)
(lambda (object initial-class name) ; (2)
(find-all-matching-methods object initial-class name ; (3)
(lambda (methods signatures classes) ; (4)
(find-best-method object methods signatures classes)))))) ; (5)
- Find-object-initial-class-and-name uses
method-name to decide where the search should start. If the
method-name is a variable, then we start by finding the value
of that variable (whose value must be an object of some kind). If the
method-name is a (DOT ...) expression, then we find
the value of the first component and look within that for the other
names until we reach the next-to-the-last name -- this is the method
name we're really looking up.
- In either case above we determine an object, class, and
name that will be use for the rest of the search (if we don't have an
object then we can only look for static methods of the class).
- Now search in the class for all of the methods that have
the specified name, take the right number of arguments, and have the
right modifiers (i.e. dynamic methods aren't allowed if we don't have
an object to start with).
- As part of this search, we have to collect three parallel
lists: the methods to consider, their signatures (i.e. the types of the
input arguments, the type of the result, and the modifiers -- see
make-signature in java-support), and the classes
from which they were extracted.
- Finally, pick the best method from amongst these choices.
Problem 2 -- Your turn.
We've supplied about 85% of the code needed to make
find-best-method work. But there are two procedures that you
have to write on your own: supertype-loop and
decide. You can look at the code for method-lookup
and find-best-method in file java-support to see how
these are used.
(define (supertype-loop desired-type type count)
;; Returns 0 for same type, 1 for one step away, etc., or #F
;; if type-name isn't a supertype of type
(if (eq? type desired-type)
...
(let ((super (ev-class.superclass type)))
(if (ev-class? super)
...
...))))
(define (decide new best decision)
;; We start out TIEd, and only change that if we find
;; something definitely better or definitely worse. If we
;; have to change our mind from NO to YES (or vice versa)
;; then it's really a TIE.
...)
Complete the code and put it into java-testing. Hand in the
code and evidence that it works. You might want to try
(method-test) with the version of the code we've supplied and
then compare the results to what you get.
Hints:
- supertype-loop may look tricky, but it's just a simple
counting loop like ones you've written many times before. Read the
comment at the top and just fill in the parts marked ....
- new and best are both lists of numbers,
generated by supertype-loop. These numbers represent how far
it is in the class hierarchy from the actual type of argument being
passed to the type the method is expecting; that means that a smaller
number (closer together) is better than a larger number.
Implementing VARIABLES
Let's take a look at the code for implementing VARIABLES.
This statement is in many ways analagous to Scheme's LET
expression: both introduce new variables for the duration of their
body. There are some differences, however: in Scheme, LET is
an expression and returns a meaningful value; in Java variables are
introduced as parts of statements (which don't have values). Also, in
Java, the values must first be computed and then coerced to the type
that was declared for the variables (this is known as "assignment
conversion").
(define (j-eval-variables exp env next)
;; Moved to JAVA-WORKING
;; (VARIABLES <type> (<VariableDecl> ...) <statement>)
(let ((vdecls (variables.var-decls exp))
(type-name (variables.type exp)))
(let ((names (map var-decl.name vdecls))
(type (lookup-class type-name))
(init-exprs
(map (get-initializer type-name) vdecls)))
;; INIT-EXPRS is a list of Decaf Java-In-Scheme expressions that must
;; be evaluated to initialize the new variables.
(j-eval-in-order init-exprs env
(lambda (init-vals)
;; Extend the environment by creating bindings for the new
;; variables (including "assignment conversion" of the
;; values), and then in this extended environment evaluate
;; the statement.
.......... YOUR CODE HERE ...........)))))
Problem 3 -- Your turn.
In order to write the missing code, you will need to understand at
least the following procedures:
- The data abstraction for variables in file
java-syntax.
- coerce, whose first argument is a value and whose second
argument is the desired type of that value.
- extend-environment which takes as its first argument a
frame and as its second an environment.
- j-eval, described above, the Decaf Java-In-Scheme interpreter.
- make-bindings which takes three lists as arguments. The
first is a list of names; the second is a corresponding list of type
names; and the third of which is a corresponding list of values.
- make-frame which takes as its first argument a tag for
the new frame (you will need to make a frame with the tag
VARIABLES) and as its second argument a set of bindings.
Another Look at Method Lookup
If you read the Java reference manual (available in the on-line Help
for J++) you will find that method lookup works in a series of steps,
part of which takes place at compile time and part at run time. At
compile time, the Java compiler computes what is called the
signature of the method that will be called at runtime: the
type of each of the input arguments and the type of the result. It
does this by finding all methods of the given name, looking up the
types of all of the operands (these can all be computed by the
compiler using the types that the programmer provides) and comparing
these with the types of the arguments that are being supplied (which
can also be computed at compile time). The compiler puts this
signature into the output file, and at runtime the computer must find
a method that matches this signature.
This all sounds straightforward, but there's an interesting issue
buried in it. Can the compiler really know the type of the
arguments that will be supplied at runtime? The answer to this is
yes but comes with a warning: the type is known by the
compiler, but not the class of the argument. That is, the
argument is guaranteed to be of a class that is a subtype of the type
the compiler knows, but it isn't guaranteed to be exactly the type the
compiler knows. Consider the following Java program:
class SuperClass
{ int X;
}
class SubClass extends SuperClass
{ void Test2(SuperClass x)
{ System.out.println("Superclass arg.");
}
void Test2(SubClass x)
{ System.out.println("Subclass arg.");
}
}
class Lynn
{ public static void main( String[] args )
{ SubClass sub = new SubClass();
SuperClass sup = new SuperClass(),
subAsSup = sub;
System.out.print("Test2(sup)"); sub.Test2(sup);
System.out.print("Test2(sub)"); sub.Test2(sub);
System.out.print("Test2(subAsSup)");
sub.Test2(subAsSup);
}
}
This produces the following output:
Superclass arg.
Subclass arg.
Superclass arg.
It is this final line that's odd. The variable subAsSup is
declared to have type SuperClass, so the compiler decides
that it must call a procedure whose signature is "method from an
object of type SubClass with an argument of type SuperClass".
But at runtime, the actual value of the argument is SubClass, not
SuperClass. But the compiler decided to call the method designed for
SuperClass objects, and that's what gets called.
If you try this in Java-In-Scheme, you'll get the same result. But
we could make the argument (and many object-oriented languages have
done so) that this is the wrong behavior. We really want to call the
method that best matches the argument being passed: the one that would
pring "Subclass arg." in this case.
Problem 4--Your turn again.
There is a
very simple change to the code for j-eval-call that
will change Java-In-Scheme so that it uses the actual type of the
argument, rather than its apparent type (i.e. the type to which it has
been cast or coerced), to compute the method to call. The change
affects only one line of the code.
- Show the line that must be changed, as well as the actual change.
- Show that this changes the behavior of Java-In-Scheme for this
test program.
- Explain, as simply as you can, why the "apparent type"
of an object can be used in our Java-In-Scheme interpreter instead of
keeping track of the actual data type of expressions, as would be done
in a real Java compiler. Then explain why your change works.