[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Scheme's Formal Semantics



The Verified Programming Language Implementation project has developed
a formally verified implementation of the Scheme programming language,
called Vlisp.  The contributors were Joshua D. Guttman, John D.
Ramsdell, William M. Farmer, Leonard G. Monk, and Vipin Swarup, of The
MITRE Corporation, and Mitchell Wand and Dino Oliva of Northeastern
University.  

We selected Scheme for our verification effort because it is a
programming language in active use, with a concise and relatively well
worked out formal semantics, and because there are a variety of good
public domain implementations that could be used as blueprints.  Of
course, having some Scheme enthusiasts in house probably made a
difference too.  The aim of this note is to report on problems we
discovered with the current formulation of Scheme's formal semantics
and to suggest changes.

MEMORY EXHAUSTION: The official semantics always tests whether the
store is out of memory (fully allocated) before it allocates a new
location.  We recommend removing these tests and introduce the
assumption that the store is infinite.  We give two main reasons:

  Any implementation must use its memory for various purposes that are
  not visible in the official denotational semantics.  Thus, the
  denotational semantics cannot give a reliable prediction about when 
  an implementation will run out of memory.  

  It simplifies reasoning to treat all possible memory exhaustion
  problems uniformly at a low level in the refinement process.

For instance, all Scheme implementations must use memory to record the
return point for procedure calls that are not tail-recursive, and
every implementation can exhaust memory in this way.  However, the
Scheme semantics for procedure call predicts that memory cannot be
exhausted by nested procedure calls.

LETREC EXPRESSIONS AND TAGS ON PROCEDURE VALUES: A procedure object is
treated in the semantics as a pair, consisting of a store location and
a functional value.  The latter represents the behavior of the
procedure, taking the sequence of actual parameters, an expression
continuation, and a store as its arguments, and returning a
computational answer.  The location is used as a tag, in order to
decide whether two procedure objects are equivalent in the sense of
the Scheme standard procedure EQV?.  The exact tag associated with a
procedure value depends on the exact order of events when it was
created.  Similarly, the locations of other objects will depend on
whether a location was previously allocated to serve as the tag for a
procedure object.

In Scheme, LETREC expressions are derived expressions which are
specified using assignment.  Reasoning about program transformations
in the presence of tags and side-effecting LETREC expressions is very
hard.

The language used to implement the Vlisp byte code interpreter is
called Vlisp PreScheme.  The idea for PreScheme came from Jonathan
Rees and Richard Kelsey's Scheme48 system.  PreScheme is a subset of
Scheme that a compiler can transform into a form which requires only a
C-like run time system, that is, no run time tags, no garbage
collection, and no closures.

The formal semantics for Vlisp PreScheme is much like Scheme's except
there are no tags on procedure values, and LETREC expressions are not
derived expressions, instead their initializers are required to be
LAMBDA expressions, and their semantics are given by fixed point
expressions.

As a consequence, many transformations can be proved correct for Vlisp
PreScheme, but the corresponding arguments for Scheme would not be
sound.  This fact allowed the verification of the Vlisp PreScheme
compiler's transformations.  On the other hand, in PreScheme, as in
ML, procedures cannot be tested for equality.  We prefer this
tradeoff: it seems unfortunate to sacrifice the possibility of doing a
wide variety of verifiable optimizations just to have a test for
equality on procedures and a side-effecting semantics for LETREC.  The
more so as the equality test in Scheme is fairly unpredictable (see
page 13 of R3RS).

SEMANTICS OF CONSTANTS: The official Scheme semantics contains a
semantic function K which maps constants--expressions in the abstract
syntax--to denotational values.  But K is simply characterized as
``intentionally omitted.''  In some cases, its behavior seems
straightforward: for instance, its behavior for small numerals.
However, it is far from clear how one would define K for constants
that require storage.

There are other concerns we have about the semantics of primitives,
and some apparently missing normality conditions, which are described
in the Vlisp Guide.  This report, along with seven other reports on
the Vlisp project are available in Ozan Yigit's Scheme Repository in
nexus.yorku.ca:/pub/scheme/txt/vlisp.tar.Z.

John