[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

a modest macro proposal



			  Macros for Scheme
			    Jonathan Rees
			    28 March 1987

Primary objectives:

  - Macros are scoped, so users won't step on each others' toes.

  - The client of a macro need not know anything about the macro's
    implementation.  In particular, capture problems must be avoidable
    both for syntactic keywords and variables.

Secondary objectives:

  - Consistent with the "expansion passing style" described
    in [1].

  - Consistent with the spirit of the macro facilities provided by
    MIT Scheme and T.

This is a rough draft, and contains more questions than answers, but I
want to get feedback, and answers to the questions, so here it is.



Overview:

  1. Fundamental mechanism
        Describes the basic ideas of syntax tables and preprocessed
	expressions.

  2. Defining macros
        Describes two new expression types that introduce scoped macro
	definitions.

  3. Avoiding free variable capture
        Describes ways to circumvent capture problems.

  4. Convenience features
        Discusses higher-level layers that could make macro writing
	easier.

  5. Notes and questions

  Appendix. An implementation



1. Fundamental mechanism

Two abstractions are introduced, "syntax table" and "preprocessed
expression".  A "syntax table" describes a particular mapping from
concrete syntax (expressions) to abstract syntax (preprocessed
expressions).  When a user defines a macro, he implicitly defines a
variant on the language and therefore a new mapping from expressions
to preprocessed expressions.


1.1. Reference guide

(For the purposes of this discussion, the term "expression" means
"s-expression", or more precisely:
    - Symbols, numbers, booleans, characters, strings, and empty lists
      are expressions.
    - If E1 and E2 are expressions then the pair (E1 . E2) is an
      expression.  (In particular, lists of expressions are
      expressions.)
    - If E0, ... En are expressions, then the vector #(E1 ... En) is
      an expression.
    - A preprocessed expression is an expression.
    - There are no other expressions.)

(PREPROCESS expression syntax-table)

  PREPROCESS preprocesses expression according to syntax-table,
  and returns a preprocessed expression.  The manner in which
  the preprocessed expression is determined depends entirely on the
  syntax table argument (see the various ways to create syntax tables,
  below), with the following exception: (PREPROCESS p syntax-table)
  always returns p if p is already a preprocessed expression.

  It is an error if expression is not syntactically valid according
  to syntax-table.

SCHEME-SYNTAX-TABLE

  The value of SCHEME-SYNTAX-TABLE is a syntax table that corresponds
  to a language conforming to the Revised^3 Report.  In gory detail,
  this means: let E be an expression, and let P be the preprocessed
  expression that results from calling
  (PREPROCESS E SCHEME-SYNTAX-TABLE).

    - If E is a number, boolean, string, or character, then P denotes
      an appropriate literal expression.
    - If E is a symbol, and E is not a Scheme syntactic keyword (QUOTE,
      LAMBDA, etc.), then P denotes a variable reference.
    - If E is a pair whose car is a syntactic keyword, then P denotes
      an appriopriate expression (unless it contins a syntax error).
    - If E is a nonempty list whose car is an expression, then P denotes a
      combination (unless some subexpression contains a syntax error).
    - If E is already a preprocessed expression, E is equal to P.
    - Otherwise E is not syntactically valid.

(ADD-KEYWORD syntax-table symbol expansion-proc)

  Expansion-proc must be a procedure of two arguments, an expression
  and a syntax table.  Expansion-proc must return a preprocessed
  expression.

  ADD-KEYWORD returns a new syntax table according to which
  expressions of the form (symbol ...) are preprocessed by
  expansion-proc.  That is, PREPROCESS will call expansion-proc and
  return what it returns.  The arguments passed to expansion-proc will
  be the expression and syntax-table that were passed to PREPROCESS.

  Any other expression E is preprocessed the same way it would have
  been preprocessed according to syntax-table.  If this means that it
  is to be preprocessed according to SCHEME-SYNTAX-TABLE, then any
  subexpressions of the expression will be preprocessed according to
  the syntax table that was originally passed to PREPROCESS, not
  according to SCHEME-SYNTAX-TABLE.

(REMOVE-KEYWORD syntax-table symbol)

  This returns a syntax table in which an expression of the form
  (symbol ...) denotes a combination.  If symbol had an associated
  expansion procedure in syntax-table, that expansion procedure will
  be ignored in the new syntax table.


1.2. Discussion

The following may be a helpful analogy:

   (EVAL                      (PREPROCESS
      lambda-expression	         expression
      environment)		 syntax-table)
    => closure		       => preprocessed-expression

EVAL (or the ENCLOSE of the Revised Report) takes a lambda-expression,
which is context-dependent or "open" because it contains free
variables, and turns it into something that's context-independent or
"closed", namely a closure.  PREPROCESS takes an expression, which is
context-dependent because the meanings of subexpressions depend on
what macros are in effect, and returns something that is
context-independent and therefore immune to the vagaries of the macro
context into which it may be placed.

Preprocessed expressions may legitimately appear as subexpressions of
expressions to be passed to PREPROCESS.  For example, if M1 and M2 are
preprocessed-expressions, then `(AND ,M1 ,M2) is a valid expression
that can be passed again to PREPROCESS (assuming AND has its usual
meaning).  The effect of this is the same as if the expression had
been an AND-expression whose subexpressions were expressions that
would have been preprocessed as M1 and M2 in whatever syntax-table was
the second argument to the call to PREPROCESS.

The nature of "preprocessed-expression" objects is not specified here;
they may or may not be lists, vectors, procedures, etc., or objects of
some new data type.  This proposal does not provide any explicit
operations on preprocessed expressions, but it doesn't preclude such
operations, either.  Presumably LOAD, EVAL, and compilers know how to
manipulate preprocessed-expressions.  Similarly, there may be
operations on syntax tables other than the ones given here; in
particular the clever tricks in [1] could easily work in this
framework.

Note that the syntax table passed to an expansion procedure is not
necessarily the same as the syntax table returned by the call to
ADD-KEYWORD that defined its keyword.  The syntax table is the
appropriate one to use in processing subexpressions.  The syntax table
argument serves the same purpose as the expansion procedure passed to
expanders in [1].

Detail: definitions, as well as expressions, may be passed to
PREPROCESS.


1.3. Examples

Example 1: The following evaluates to a syntax table that is the
same as that for R^3R Scheme except that FOO is a syntactic keyword and
(FOO x) means the same as (QUOTE x).

  (add-keyword scheme-syntax-table 'foo
    (lambda (e st)
      (preprocess `(quote ,(cadr e)) scheme-syntax-table)))

This illustrates the general principle that a more complicated
syntax-table can be defined in terms of a simpler one.  An expression E
written in a more complicated language L (not even known at macro
definition time) is transformed into a new expression (the expansion),
and then the preprocessed version of the new expression is determined
according to a syntax-table that is known by the expansion procedure to
support the QUOTE keyword in the expected way.

Example 2: The following procedure will augment a given syntax table
with a definition of a simple LET macro.

  (define (add-let st)
    (add-keyword st 'let
      (lambda (exp st)
	(let ((bindings (cadr exp))
	      (body (cddr exp)))
	  (preprocess `((lambda ,(map car bindings)
			  ,@(map (lambda (exp)
				   (preprocess exp st))
				 body))
			,@(map (lambda (binding)
				 (preprocess (cadr binding) st))
			       bindings))
		      scheme-syntax-table)))))

The fact that preprocessed-expressions act like normal forms permits the
use of ordinary list constructors (like backquote) in constructing
partially preprocessed expressions.

Note that PREPROCESS is used within expansion procedures for two
distinct purposes:

(a) To compute a preprocessed expression, in the current syntax table,
    for each sub-expressions of the expression being expanded.

(b) To preprocess, according to some known syntax table, an expression
    that has been determined to be equivalent to the original expression.

Why are syntax tables immutable?  This aids (but doesn't guarantee)
consistency between compiled and interpreted code.


2. Defining macros

2.1. LET-SYNTAX

(LET-SYNTAX (((keyword exp-var st-var) . expansion) ...) . body)

  LET-SYNTAX is used to define a macro that is local to a single
  expression (in practice it is often wrapped around most of a file).
  (T and MIT Scheme both have constructs like this.)

  For example,

    (let-syntax (((foo exp st)
		  (preprocess `(quote ,(cadr exp)) scheme-syntax-table)))
      (foo (a b c)))

     =>  (a b c)

  LET-SYNTAX need not be primitive, assuming there exists an EVAL
  procedure and some environment EXPANDER-ENV in which to close
  expansion procedures.  The following adds a LET-SYNTAX expression
  type to any syntax table S:

  (add-keyword S 'let-syntax
    (lambda (exp st)
      (do ((specs (cadr exp) (cdr specs))
	   (st st
	       (let ((spec (car specs)))
		 (add-keyword st (caar spec)
			      (eval `(lambda ,(cdar spec)
				       ,@(cdr spec))
				    expander-env)))))
	  ((null? specs)
	   (preprocess `(begin ,@(map (lambda (exp)
					(preprocess exp st))
				      (cddr exp)))
		       scheme-syntax-table)))))

  In order to reduce the possibility that a macro could accidentally
  or intentionally depend on some run-time binding, it is
  strongly advised to make the environment in which expanders
  are closed be disjoint from the environment in which the expanded
  code will be run.  Otherwise one could find oneself in the embarrasssing
  situation of having code that "works" in an incrementally compiled
  implementation but not in a block- or cross-compiled implementation.


2.2. USING-SYNTAX

  (USING-SYNTAX syntax-table-exp . body)

  USING-SYNTAX lets one make use of some specific macro environment.

    (using-syntax scheme-syntax-table (quote yow))  =>  yow

  (add-keyword syntax-table 'using-syntax
    (lambda (exp syntax-table)
      ;; ignore syntax-table
      (let ((syntax-table (eval (cadr exp) expander-env)))
	(preprocess `(begin ,@(map (lambda (exp)
				     (preprocess exp syntax-table))
			     (cddr exp)))
		    scheme-syntax-table))))

Subtle point:

For these two forms, it might make just as much sense, and perhaps more,
to say

  (eval (preprocess foo syntax-table) expander-env)
as
  (eval	            foo               expander-env) --

i.e. macros can be written using the macros in effect where the text of
the macro definition occurs, even if they can't make use of the lexical
environment.



3. Avoiding free variable capture

3.1. Free variables introduced into expansions

We want to be able to do things like

  (let ((cons +))
    `(a ,(cons 1 2) b))

and not lose when QUASIQUOTE expanding into a call to CONS.  It
doesn't work to write (as Dan Friedman and others have suggested)

  `(',car ,z)

in the definition of QUASIQUOTE because this presents horrible questions
about the meaning of cross-compilation that no one is prepared to
answer right now.

Kohlbecker's solution amounts to performing alpha-conversion and macro
expansion at the same time.  This is a lot of mechanism and breaks
down in a few places.  Here is a much simpler, low-tech solution.

The solution is for the expander to introduce special expressions into
the expansion that represent "absolute" or "global" references.
Such references are not sensitive to the lexical environment.

(ABSOLUTE node1 node2 ...)         [syntax]

  Finds a value in an implementation-dependent, tree-structured
  namespace.  Each node_i should be an identifier; this is to be
  considered analogous to a Multics-style pathname >node1>node2>....

  In order to make it as easy as possible, Scheme implementors are
  urged to cooperate in apportioning sections of this namespace so that
  there no conflicts can arise.  This is an aministrative problem 
  analgous to domain naming on the Internet, and perhaps solvable by
  similar means.

  Only one portion of the namespace is defined here, namely that the
  top-level SCHEME-ENV node has as subnodes all the names in the
  initial R^3R Scheme environment.  E.g.

    (let ((+ -))
      ((absolute scheme-env +) 1 2))    =>  3

(Note that ABSOLUTE must be a new kind of expression -- a procedure
can't so the trick, since that would beg the question of how to name
THAT procedure.)


3.2. Bound variables introduced into the expansion

The flip side of this problem is that macros often want to introduce
new bound variables into expansions, and we don't want these names to
accidentally conflict with names already used in the client's code.

Common Lisp (Maclisp, etc.) programmers don't consider this to be a
problem, since GENSYM and GENTEMP exist.  T has GENERATE-SYMBOL (?) and
MIT Scheme has GENERATE-UNINTERNED-SYMBOL.  I think something like
this would do the trick.  However, I would very much like to
preserve the invariant

  (EQ? SYM (STRING->SYMBOL (SYMBOL->STRING SYM))).

which is violated by GENSYM (and GENERATE-UNINTERNED-SYMBOL).

One solution, with a well-defined semantics, would be to have a
procedure that returns a symbol not ocurring in a given expression
(or expressions):

  (SYMBOL-NOT-OCCURRING-IN exp)  =>  symbol

This has a nice functional flavor to it, but it could be implemented
nondeterministically, in such a way that only the symbol table need be
examined, not exp itself.  (I think T3 does this.)

Another possibility would be to apportion some subset of the set of
all symbols for use as "unique identifiers", e.g. all symbols starting
with some "obscure" prefix, not necessarily even readable (although
read/print symmetry is also a nice feature...).

I don't want to make a concrete proposal at this time.


4. Convenience features

Writing correct macros using ADD-KEYWORD is possible, but cumbersome
and error prone.  One must remember to call PREPROCESS on
sub-expressions and on the final output, passing the correct syntax
table to each.

There are several possible ways to address this problem.  One is to
say that we should not be in the business of making it easy to write
macros, but instead should do what we can to discourage users from
writing macros, or at least make them recognize the pitfalls.  In this
view, the complexity of the process is good.

A second solution is Kohlbecker's "hygienic expansion", which makes it
easy to write correct macros.  I suspect this mechanism could be
implemented in terms of the low-level primitives given above, but I
think it has some drawbacks; there are many useful kinds of macros that
can't be written.

I am working on a third solution that, loosely speaking, makes use of
a syntactic description (BNF-like) of the expression type in order to
preprocess subexpressions before handing them to the expansion procedure.
The result is more verbose than hygienic macros and only a little more
verbose than Common Lisp's macros.


5. Notes and open questions

5.1. Compatibility notes

T and MIT Scheme already have syntax tables, but they're mutable.
Expansion procedures are called "syntax descriptors" in T.  T has a
MACRO-EXPANDER macro that creates expansion procedures.  MIT Scheme has
a MACRO macro for the same purpose.

In MIT Scheme, the syntax table is passed implicitly as a fluid-bound
variable.  In T, it is possible to get at the syntax table, as an extra
argument to an expander, but it's painful.  In Common Lisp, the
syntax-table corresponds roughly to the &environment argument to macros
(except that in CL you are forbidden to redefine a special
form -- this proposal permits that).

PREPROCESS is similar to the SYNTAX procedure in MIT Scheme, and
vaguely similar to STANDARD-COMPILER in T.

ABSOLUTE is similar to MIT Scheme's ACCESS.  In ACCESS, the last
subform is evaluated, which isn't quite what we want, since that makes
it context-sensitive again (although this greatly reduces
opportunities for lossage).  [Also, I find the argument order to
ACCESS confusing; it's backwards from the way filenames are usually
written (on Multics and Unix at least) and also backwards from things
like VECTOR-REF, where the aggregate or superior object comes first.]



5.2. Syntax table used by LOAD and/or command loop

1. Which syntax table is used to process forms read by LOAD?

2. Which syntax table is used to process forms typed at a
   read-eval-print loop?

3. How can one perform definitions in the environment that will
   be seen by USING-SYNTAX?

Here is one conservative proposal, although there are many
possibilities and variations:

The top level syntax-table for any file is initially
SCHEME-SYNTAX-TABLE.  Changes must be made explicitly via USING-SYNTAX
or LET-SYNTAX, which should be wrapped around the enire file if
necessary.

The syntax table used at the read-eval-print loop is changed in some
implementation-dependent manner (there's nothing that even says there
IS a read-eval-print loop).  E.g. there could be a procedure
(SET-CURRENT-SYNTAX-TABLE! syntax-table).


5.3. Keywords and variables

Several people have complained that

  (let ((if list))
    (if 1 2 3))

ought, according to the rules of lexical scope, to evaluate to (1 2 3).
It is possible in this framework to make syntax-tables in which variable
bindings shadow syntax bindings, but it requires cooperation from
every macro that binds variables (LET, LETREC, LAMBDA, etc.):

  (add-syntax foo 'lambda
    (lambda (exp st)
      ... (do ((vars vars (cdr vars))
	       (st st (remove-syntax (car vars) st)))
	      ...) ...))

I'd rather not raise this question here since it's really orthogonal to
the rest of this proposal.


5.4. Macros that expand into multiple definitions

The syntax of <program> should be extended to include sequence
expressions:

      <program>  -->  <top>*
      <top>	 -->  <definition>
		   |  <expression>
		   |  (begin <top>+)

This is so that macros at top level can expand into multiple
definitions: (begin (define foo ...) (define bar ...)).

There is an ambiguity here in that (begin <expression>+) can be parsed
in either of two ways, but the meaning is the same in either case, so
this isn't a grave problem.

Should the syntax of a <body> be similarly extended to allow
expansions like
   (lambda (...) (begin (define ...) (define ...)) ...)?
What about
   (lambda (...) (begin (define ...) (compute ...)) ...)?
What about 
   (lambda (...) (begin (compute ...) (define ...)) ...)?


5.5. Delayed expansion

Some implementations may want to delay macro expansion (preprocessing)
so that the expression tree is processed breadth-first instead of
depth-first.  This could be handy for any number of purposes, e.g. in
preventing propagation of syntax errors, in performing
alpha-conversion in parallel with macro expansion, or to speed up file
loading.  This should be explicitly permitted by the proposal.  The
only way it would make a difference is if a macro expander could
observe or perform a side-effect.


5.6. Analysis of subexpressions

I think it's a bad idea for macros to go snooping into their
subexpressions.  This should be unnecessary for "optimization" (the
main reason people did this in Maclisp); it's hard to come
up with valid reasons to want to do it.

On the other hand, it would be nice if someone writing a compiler
could portably use PREPROCESS as a front end.  This would mandate having
operations for decomposing preprocessed expressions.  One possibility
would be to define a set of accessors and predicates, as MIT Scheme
does.  Another way to do it would be to have one or more coercion
functions to do the inverse of PREPROCESS, e.g. (UNPREPROCESS p-e)
would return an expression e such that

  (PREPROCESS e SCHEME-SYNTAX-TABLE)

would return something equivalent to p-e; then one could use CAR and
CDR to take the result apart.

Either way you have to answer sticky questions, however, such as
whether derived expressions like LETREC should be expanded out or
preserved.


5.7. Other ways to manipulate the keyword/expander association

Maybe we also want MOVE-KEYWORD or RENAME-KEYWORD?




References.

[1] Dybvig, Friedman, and Haynes.  Expansion-passing style:  Beyond
    conventional macros.  1986 ACM Lisp & FP Conference.

[2] Kohlbecker's thesis.

[3] T manual.

[4] Common Lisp.

[5] Revised^3 Scheme Report.



---------------------

Appendix: a rudimentary implementation.

;;; Preprocessed expressions

(define (preprocessed? obj)
  (and (vector? obj)
       (= (vector-length obj) 2)
       (eq? (vector-ref obj 0) 'preprocessed)))

(define (make-preprocessed core-exp)
  (if (preprocessed? core-exp)
      core-exp
      (vector 'preprocessed core-exp)))

;;; ->CORE translates a preprocessed expression into the core language.

(define (->core exp)
  (if (preprocessed? exp)
      (vector-ref exp 1)
      (error "not a preprocessed expression" exp)))

;;; A syntax table is a procedure, and PREPROCESS is FUNCALL.

(define (preprocess exp st)
  (st exp st))

(define (add-keyword st0 keyword proc)
  (lambda (exp st)
    (if (and (pair? exp) (eq? (car exp) keyword))
	(proc exp st)
	(st0 exp st))))

;;; An empty syntax table; defines no special expression types.

(define empty-syntax-table
  (lambda (exp st)
    (cond ((symbol? exp)
	   (make-preprocessed exp))
	  ((or (boolean? exp) (number? exp) (char? exp) (string? exp))
	   (make-preprocessed exp))
	  ((preprocessed? exp)
	   exp)				;Idempotent!
	  ((not (pair? exp))
	   (error "not a syntactically valid expression" exp))
	  (else
	   ;; Combination
	   ;; (There is a small bug here if REMOVE-KEYWORD exists)
	   (make-preprocessed (map (lambda (arg)
				     (->core (preprocess arg st)))
				   exp))))))

;;; Core syntax table.  Understands the primitive expression types, but
;;; not the derived ones.

(define core-syntax-table
  (do ((st empty-syntax-table
	   (add-keyword st (caar z) (cadar z)))
       (z `((quote
	     ,(lambda (exp st)
		(make-preprocessed exp)))
	    (lambda
	     ,(lambda (exp st)
		(make-preprocessed
		  `(lambda ,(cadr exp)
		     ,(->core (preprocess (caddr exp) st))))))
	    (set!
	     ,(lambda (exp st)
		(make-preprocessed
		  `(set! ,(cadr exp)
			 ,(->core (preprocess (caddr exp) st))))))
	    (define
	     ,(lambda (exp st)
		(make-preprocessed
		  `(define ,(cadr exp)
		     ,(->core (preprocess (caddr exp) st))))))
	    (if
	     ,(lambda (exp st)
		(make-preprocessed
		  `(if ,(->core (preprocess (cadr exp) st))
		       ,(->core (preprocess (caddr exp) st))
		       ,(->core (preprocess (cadddr exp) st))))))
	    (begin
	     ,(lambda (exp st)
		(make-preprocessed
		  `(begin ,@(map (lambda (exp)
				   (->core (preprocess exp st)))
				 (cdr exp))))))
	    (absolute
	     ,(lambda (exp st)
		(make-preprocessed exp))))
	  (cdr z)))
      ((null? z) st)))

;;; The scheme syntax table defines the derived expression types.

(define scheme-syntax-table
  (do ((st core-syntax-table
	   (add-keyword st (caar z) (cadar z)))
       (z `((and
	     ,(lambda (exp st)
		(let ((forms (cdr exp))
		      (j (lambda (exp) (preprocess exp st))))
		  (cond ((null? forms) `#t)
			((null? (cdr forms)) (j (car forms)))
			(else
			 (preprocess
			    `((lambda (p th)
				(if p (th) p))
			      ,(car forms)
			      (lambda () (and ,@(map j (cdr forms)))))
			    scheme-syntax-table))))))
	    ;; ...
	    (lambda
	     ,(lambda (exp st)
		(preprocess
		 `(lambda ,(cadr exp)
		    ,(preprocess-body (cddr exp) st))
		 core-syntax-table)))
	    ;; (letrec ,...)
	    ;; ...
	    ;; (quasiquote ,... (absolute scheme-env cons) ...)
	    )
	  (cdr z)))
      ((null? z) st)))

;;; Implements implicit begin and internal defines for lambda bodies.

(define (preprocess-body body st)
  (let ((definition? (lambda (exp)
		       (and (pair? exp) (eq? (car exp) 'define))))
	(definition-lhs cadr)
	(definition-rhs caddr))
    (let loop ((l (map (lambda (exp)
			 (preprocess exp st))
		       body))
	       (d '()))
      (if (null? l)
	  (error "no non-definitions in body" body)
	  (let ((exp (->core (car l))))  ;Analyze
	    (if (not (definition? exp))
		(preprocess (if (null? d)
				`(begin ,@l)
				`(letrec ,(reverse d) ,@l))
			    scheme-syntax-table)
		(loop (cdr l)
		      (cons `(,(definition-lhs exp)
			      ,(make-preprocessed (definition-rhs exp)))
			    d))))))))

; Fin