
Syntactic extensions to Scheme



    Date: 7 Nov 1985 1112-CST
    From: David Bartley <Bartley%CSL60%ti-csl.csnet at CSNET-RELAY.ARPA>

    ... I'm particularly interested in experiments such as T's with extensive
    support for syntax tables.

You ask hard questions.  Therefore this message is very long.

Before answering the questions, I'll summarize how T syntax tables work:

The evaluator and compiler each take a syntax table argument.  A syntax
table maps symbols to objects called "syntax descriptors".  A syntax
descriptor is basically either a primitive token, for primitive special
forms like QUOTE and LAMBDA, or a macro expansion function.  Syntax
descriptors print as #[Syntax ...], where ... somehow identifies the
descriptor (name or whatever).

When the compiler processes a form whose car is a symbol, the first
thing it does is look up the symbol in the syntax table.  If there is an
entry in the table, say (syntax-table-entry table 'foo) => #[Syntax
BAR], the form is processed as if the form (#[Syntax BAR] ...) had been
seen.  When a form whose car is a syntax descriptor is processed, the
appropriate thing happens; if the descriptor is a macro expander, the
form is expanded; if it's primitive, the form is interpreted as a QUOTE
or LAMBDA or whatever expression.
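
To make that dispatch concrete, here is a toy model in garden-variety
Scheme.  This is not T's implementation or its real API; the names and
representations below are mine, invented purely for illustration.

  ;; A syntax table here is a headed alist from symbols to descriptors.
  (define (make-syntax-table) (list '*syntax-table*))

  (define (syntax-table-entry table sym)
    (cond ((assq sym (cdr table)) => cdr)
          (else #f)))

  (define (set-syntax-entry! table sym descriptor)
    (set-cdr! table (cons (cons sym descriptor) (cdr table))))

  ;; A descriptor is either (primitive . handler) or (expander . proc).
  (define (make-primitive handler) (cons 'primitive handler))
  (define (make-expander proc)     (cons 'expander proc))
  (define (descriptor? x)
    (and (pair? x) (memq (car x) '(primitive expander))))

  ;; PROCESS plays the role of the compiler (or evaluator) front end.
  (define (process form table)
    (cond ((and (pair? form)
                (symbol? (car form))
                (syntax-table-entry table (car form)))
           ;; Symbol with a table entry: proceed as if the descriptor
           ;; itself had appeared in the car of the form.
           => (lambda (descr) (process (cons descr (cdr form)) table)))
          ((and (pair? form) (descriptor? (car form)))
           (case (caar form)
             ((expander)  (process ((cdar form) form) table))
             ((primitive) ((cdar form) form table))))
          (else form)))    ; atoms and ordinary combinations, left alone

  ;; For instance, a table in which SWAP! is a macro:
  (define *table* (make-syntax-table))
  (set-syntax-entry! *table* 'swap!
    (make-expander
      (lambda (form)       ; form = (#[descriptor] a b)
        (let ((a (cadr form)) (b (caddr form)))
          `(let ((tmp ,a)) (set! ,a ,b) (set! ,b tmp))))))

  (process '(swap! x y) *table*)
  ;;  => (let ((tmp x)) (set! x y) (set! y tmp))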

Users can create syntax tables, fetch and store entries in them, and
pass them to the evaluator or compiler.  A macro (macro-expander ...)
creates a descriptor for a macro expander.  If no syntax table argument
is supplied to EVAL or LOAD, a syntax table is obtained from the
environment argument by accessing the value of a lexical environment
variable with a special internal name (let's suppose the name is
&&syntax-table&&; in principle it's something untypable).  There is
a special form DEFINE-SYNTAX which basically does a RUNTIME side-effect
of storing a descriptor into &&syntax-table&&.  (DEFINE-SYNTAX has
no effect at compile time.)
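
In the same toy model (again, the names are invented, not T's), the
default table lives in the environment under a key nobody can type, and
DEFINE-SYNTAX is just a runtime store into that table:

  ;; An environment, in this toy, is an alist; its syntax table hangs
  ;; off a key that can't be written as ordinary source text (here it
  ;; is just a fresh object only this code holds on to).
  (define *untypable-key* (list '&&syntax-table&&))

  (define (make-toy-env)
    (list (cons *untypable-key* (make-syntax-table))))

  (define (env-syntax-table env)
    (cond ((assq *untypable-key* env) => cdr)
          (else (error "environment carries no syntax table"))))

  ;; EVAL (and LOAD) default the table from the environment argument
  ;; when no table is passed explicitly.
  (define (toy-eval form env . maybe-table)
    (let ((table (if (null? maybe-table)
                     (env-syntax-table env)
                     (car maybe-table))))
      (process form table)))

  ;; At run time, (DEFINE-SYNTAX FOO <expander>) amounts to:
  (define (toy-define-syntax! env name expander)
    (set-syntax-entry! (env-syntax-table env)
                       name
                       (make-expander expander)))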

In addition, DEFINE-LOCAL-SYNTAX defines a macro local to a file or
expression, analogous to MACROLET in Common Lisp.  This has no effect at
runtime.  (DEFINE-MACRO (FOO ...) ...) is a horrible dual-purpose form
which is the same as
  (BEGIN (DEFINE-SYNTAX (FOO ...) ...)
         (DEFINE-LOCAL-SYNTAX (FOO ...) ...)).

End of summary, now for the questions.

    What are their strengths and weaknesses?

Strength: the ability to make "absolute references" to standard syntactic
forms - i.e. (#[Syntax LAMBDA] (X) (+ X 5)) - is a boon when implementing
embedded, incompatible languages.  E.g. suppose you wanted to define
a syntax table where DEFINE was a macro which expanded into something
which wanted to use the usual Scheme DEFINE.  It wouldn't work to do
  (define-syntax (define ...) `( ... (define ...) ...))
since this is circular.  But you could do
  (define-syntax (define ...) `( ... (#[Syntax DEFINE] ...) ...))
where #[Syntax DEFINE] denotes the standard syntax descriptor for
DEFINE.
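
In the toy model above there is nothing mysterious about this: the
"absolute reference" is just the descriptor object itself sitting in the
expansion where the symbol would have been, so the table is never
consulted for it.  (STANDARD-TABLE, MY-TABLE, and RECORD-DEFINITION!
below are hypothetical; only the shape matters.)

  ;; Grab the stock descriptor once, outside any table lookup.
  ;; (STANDARD-TABLE and MY-TABLE are assumed to be two tables that
  ;; already exist; RECORD-DEFINITION! is a made-up helper.)
  (define standard-define (syntax-table-entry standard-table 'define))

  ;; A DEFINE that records names and then defers to the standard DEFINE:
  (set-syntax-entry! my-table 'define
    (make-expander
      (lambda (form)                 ; form = (#[descriptor] name value)
        (let ((name (cadr form)) (value (caddr form)))
          `(,standard-define ,name
             (begin (record-definition! ',name)
                    ,value))))))

  ;; The car of the expansion is already a descriptor, so PROCESS never
  ;; looks DEFINE up in MY-TABLE again - no circularity.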

This is how I was able to write implementations of RRRS and even a sort
of bastard Common Lisp in T, even though there were multiple
incompatibilities.  And because everything is scoped, one environment's
macros aren't seen in different environments, so one can easily debug
programs different parts of which are written using completely
different syntax.

The standard system macros all do the absolute reference trick, and the
effect is that a macro can expand a form into a new form which uses a
different macro, without having to worry about whether the macro in the
expansion was defined in the syntax table being used to process the
expansion.  Thus descriptors can be moved from one syntax table to
another without worrying too much about how they're defined - they act
almost like closures.  (Note that this problem doesn't come up in Common
Lisp because one "closes" over an environment (package) at read time.
Macros in Scheme have to be more complicated since Scheme did the right
thing with symbols.)

Weaknesses: there are the usual kinks.  Best to give an illustration.
Suppose a Scheme program (file) contains the following forms:

    (define-macro (foo form) (bar form))

    (define (bar form) `(#[syntax lambda] () (baz ,form)))

    (define (baz obj) (cons (cdr obj) (car obj)))

    ((foo (cons 'a 'b)))

If your model of macro expansion is that it's part of what the evaluator
does, then this "works": the last form evaluates to (b . a).  This is
the simplest explanation of what macros do, and it's what I think most
naive users believe happens.  Indeed it is what happens in many Lisp
interpreters.

However, the code is obviously sensitive to what happens at compile time
and what happens at run time.  Presumably the compiler does not load the
file.  So when the FOO form gets expanded by the file compiler, the
expander tries to call BAR, which is undefined.

T is badly engineered in the sense that this code will run interpreted
and not compiled.  This is because the environment in which the macro
expansion function is closed is the same as the one in which the
expansion will run.  (Common Lisp has the same problem.)  Users should
have to work much harder than this to run into such inconsistencies.  A
small improvement would be if LOAD "macroexpanded" the whole file before
running any of it; then FOO would be undefined at expand time.  But this
isn't a very general solution.  What if the forms occur in different
files?  What if the code is reloaded into the same environment - will it
work the second time, using the definition of FOO from the first LOAD?
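
A sketch of that small improvement, still in toy terms: read the whole
file, expand everything while none of it has run yet, and only then
evaluate.  (EXPAND-FULLY is assumed to be a recursive driver over
PROCESS, and REALLY-EVAL stands for actual evaluation; neither is part
of the toy shown earlier.)

  (define (load-expanding-first filename env)
    (let ((forms (call-with-input-file filename
                   (lambda (port)
                     (let loop ((acc '()))
                       (let ((form (read port)))
                         (if (eof-object? form)
                             (reverse acc)
                             (loop (cons form acc)))))))))
      ;; Pass 1: expand every form before any of the file has run, so
      ;; expanders can't lean on definitions made by the file itself.
      (let ((expanded (map (lambda (form)
                             (expand-fully form (env-syntax-table env)))
                           forms)))
        ;; Pass 2: only now evaluate, in order.
        (for-each (lambda (form) (really-eval form env)) expanded))))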

In the best of all possible worlds, macros and the auxiliary functions
they call to perform expansions should only have to exist for the
purposes of compilation (and EVAL, debugging, etc.); at runtime they can
go away or be garbage collected.  (Consider using a Scheme compiler to
build a small stand-alone system which wants to run in minimal address
space.  Expansion functions are superfluous at runtime.)  This suggests
that macros should always inhabit separate modules in their own separate
environments.  These modules can be loaded for compilation purposes, if
a client requests, but needn't exist at runtime.  Users should have to
go to the extra trouble of announcing that they are defining a
compilation-support module to cause DEFINE-SYNTAX itself to become
available at all.
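
Under such a discipline the earlier FOO/BAR/BAZ file might split roughly
as follows, phrased with the toy helpers from above; the module
boundaries are the part being wished for, so they appear only as
comments.

  ;; --- compilation-support "module": loaded only to expand clients ---
  (define (bar form) `(lambda () (baz ,form)))
  (define (install-foo-syntax! table)
    (set-syntax-entry! table 'foo
      (make-expander (lambda (form) (bar (cadr form))))))

  ;; --- runtime "module": the only part clients need when they run ---
  (define (baz obj) (cons (cdr obj) (car obj)))

  ;; A compiler could load the first chunk, call INSTALL-FOO-SYNTAX! on
  ;; its table, compile the client, and then drop BAR and the expander;
  ;; only BAZ has to survive into the delivered image.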

I haven't even talked about the problem of the scoping of BAZ.  As it
is, any client of the FOO macro must know that the variable BAZ occurs
free in the expanded form, and suffer the consequences.  I don't see any
way around this if out-of-core compilation is going to work.  A listing
of what variables are free (in principle one needs at most one such) must
simply be part of the documentation of FOO.  (One can play tricks with
system primitives like CAR, but the problem remains when the expansion
needs a user-defined function.)
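
The "at most one" remark deserves an illustration: FOO can funnel
everything its expansions need at run time through a single documented
free variable, say FOO-RUNTIME (a name I'm making up here).

  ;; The single free variable a client of FOO must provide:
  (define (foo-runtime op . args)
    (case op
      ((baz) (let ((obj (car args)))
               (cons (cdr obj) (car obj))))
      (else (error "unknown FOO-RUNTIME operation" op))))

  ;; BAR, rewritten so the expansion calls through the funnel rather
  ;; than referring to BAZ (or anything else) free:
  (define (bar form) `(lambda () (foo-runtime 'baz ,form)))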

Whew...  obviously this is much too complicated.  I have sympathies for
the restricted Indiana approach to macros, although I'm not sure it gets
around all the various problems, and I would resist giving up the
ability to write arbitrary expansion procedures.  But I am certainly not
asking anyone to adopt T's syntax mechanism, since the specification is
so bug-ridden.

    What do they cost in terms of compiler or runtime complexity?

In spite of all this wind, the implementation is very straightforward.
One just puts hooks into EVAL and the compiler in the right place, and
passes the syntax tables around as appropriate.

    Do users make effective use of them?

As long as macros exist, users will abuse them.  But I think they have
their place, and a small number of people know how to use them well.  It
has been suggested that a licensing agency be established.

As far as syntax tables go, I don't know of anyone who has done anything
with them quite as hairy as what I've described, but I think that the
people who exploit multiple environments (there aren't many) implicitly
use the fact that macros get scoped appropriately; they know there won't
be name conflicts, the same way they know that there won't be name
conflicts for variables.

    How flexible are they---would they suffice for major language changes?

I think what I said above about embedded languages answers this for the
most part - in the affirmative.  Clearly they wouldn't be effective e.g.
to change scoping or evaluation order rules, but they work well
to change the meaning of (LAMBDA ...).

    Should syntactic extensions be specified entirely in source terms, or
    should the user have access to internal representations?

I have a running argument with the MIT Scheme folks about this; I think
that Scheme makes a perfectly good representation for programs.  There
are some scoping problems to be addressed, e.g.  any macro expander
which wants to understand anything about a subexpression must know what
syntax table the subexpression is to be interpreted relative to, but as
long as some convention is established (there are a couple of
alternatives) and programmers are made aware of these issues, I don't
think we need to go to the trouble of defining new data types for
representing programs.  That would be very complicated and there'd be
little hope of making the interface portable.

    Should simple source-to-source optimizations be communicated to the
    compiler using the same mechanism, or is something else more
    appropriate?

Definitely not.  I'm not sure that users should ever specify
optimizations to a compiler; perhaps it would be acceptable if the
compiler had some mechanism by which it could prove the correctness of
such transformations before deciding it was okay to do them.  Also, the
situation with free variables can be very complicated.  But in any case,
just for pedagogical reasons one shouldn't do anything to lead users to
believe that macros and procedures have any similarities at all.  It's
bad enough that they are syntactically similar.

[By the way, even though T and MIT Scheme agreed some years back that
(LET ((IF LIST)) (IF 1 2 3)) should evaluate to (1 2 3), and I used to think
this was the only acceptable semantics, now I'm not convinced that
this is the right thing; I think Kent Pitman has a coherent semantics
which would allow it to evaluate to 2, without sacrificing
compilability or purity.  But I'll let him explain that.  I'm not
convinced he's right either.]