[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Better late than never

Here, as promised at the Macro Summit Meeting, is a "Modified Macro
Proposal".  Apologies for not mailing this out sooner, Jonathan and I will
share the blame for not being prompt about this.

This is not a complete proposal any more than Jonathan's previous macro
proposal was.  There are many issues of left unaddressed.  There is no
simple interface for ordinary users to use, that must wait for a separate
proposal.  There are many minor details left unspecified.  The only real
difference between this proposal and the last, is that it includes a
mechanism that addresses the various variable capture problems.

I will assume that the reader is already familiar with Jonathan's previous
proposal.  (If you care about this stuff, you probably still have a copy.)
I will only describe the differences between the two.

The basic idea is that the thing we usually call an "environment" (a map
from identifiers to locations), can be viewed as the composition of two
separate maps:  First, a map from identifiers to "variables", followed by a
second map from variables to locations.  The first map is statically
determined by the program text.  It is the mapping that compiler writers
normally call "alpha conversion".  The second map is dynamically determined
at runtime.  It is the mapping that is actually stored in closures.

In other words, consider:

    (F X
       (LAMBDA (X)
	 (F X))))

Although there is one identifier named "X" here, occurring in four places,
we all recognize that there are really two separate variables named "X",
each occurring just twice.  The mapping that tells you which variable named
"X" is meant by a particular occurrence of the identifier named "X", varies
from place to place in the source code, as determined by the scoping rules
of the language.

Now the idea is to make this static mapping from identifiers to variables
part of syntax tables.  This has a nice appeal to it since it makes syntax
tables contain -all- of the static, contextual information necessary for
interpreting the meaning of a particular piece of code.  In one package you
get both the mapping from keywords to their current meanings, as well as
the mapping from identifiers to their current meanings.

To see how this deals with the various capture problems, here is how one
might define a vanilla PUSH macro:

      (LAMBDA (SYNTAX-TABLE EXPR)	; (PUSH val var)
	    `(SET! ,VAR (CONS ,VAL ,VAR)))))))

(There are some minor incompatibilities in argument-order here with
Jonathan's original, but they should mostly be obvious.  The only real
difference is the second argument to PREPROCESS, which is new, and will be
explained shortly.  For the moment, just ignore the '()s.)

As in Jonathan's original modest proposal, the writer need not be concerned
that the definition of the keyword SET! might be locally redefined in the
location where the PUSH-expression was used, because he uses a known syntax
table to preprocess the SET! expression he constructs.  In this modified
proposal, he needn't worry about any local rebindings of CAR either,
because the mapping from the identifier named "CAR" found in
SCHEME-SYNTAX-TABLE will be the global variable named "CAR", rather than
any local variables that happen to have the same name.

Thus we don't need to introduce a new ABSOLUTE special form to allow macro
writers to make references to known variables.  Syntax tables can be used
to resolve identifiers into the particular variables whose values are to be
accessed.  (There might, of course, be -other- reasons for wanting ABSOLUTE.)

To illustrate how another kind of capture is avoided, here is a definition
of a simple two-operand version of OR:

      (LAMBDA (SYNTAX-TABLE EXPR)	; (OR2 op1 op2)
	    `((LAMBDA (TEMP)

As usual the writer doesn't need to worry about local redefinitions of the
keywords LAMBDA and IF, but notice that he doesn't have to worry that his
use of a variable named "TEMP" will accidentally capture any variables of
the same name in the second operand.  This is because the second operand is
first preprocessed in the syntax table that was current where the
OR2-expression occurred, and thus any identifiers it may have contained
named "TEMP" have already been resolved to the correct variable named

Thus there is no need to introduce "Gensyms" into macroexpansions in order
to avoid inadvertent capture.

Finally, there are situations where the programmer -wants- a capture to
occur.  For example it writing the LET macro, he wants certain variables in
the body of the LET-expression to be captured.  Here is where the new
argument to PREPROCESS comes in; it gives the user control over the context
sensitivity of preprocessed expressions.  Specifically, it is a list of
identifiers (and keywords) which are to be left syntactically free in the
resulting preprocessed expression.

To illustrate, here is a simple single variable version of LET:

      (LAMBDA (SYNTAX-TABLE EXPR)	; (LET1 <id> <val> <expr>)
	    `((LAMBDA (,ID)

Here the expression in the body of a LET1 is preprocessed using the syntax
table current where the LET1-expression occurred, with the
identifier-to-be-bound excepted.  Thus the static meaning of all the
identifiers and keywords in the expression will be correctly determined,
while the identifier in question will be left free to be captured by the
LAMBDA-expression it is embedded in.

Now the way this new argument to PREPROCESS works may strike you as a
little odd at first.  I used to agree with you.  However, the more I work
with it, the better I like it.  

Perhaps the reason it works so well can been seen by thinking of the thing
returned by PREPROCESS as a "syntactic closure", similar in nature to the
closures returned by LAMBDA-expressions.  In both cases you have an
environment of some kind, a list of identifiers, and an expression.  In
both cases all identifiers in the expression are to be taken relative to
the environment, -except- those in the given list.  The identifiers in the
list are to have their meanings determined later.  In both cases it is a
way of "parameterizing" the expression.

The difference is that the LAMBDA-expression closure is invoked with
positional arguments, while the syntactic closure is invoked in a kind of
"call-by-context" fashion.  But even that seems natural in a situation
where you are constructing expressions out of other expressions; such
context-dependence is just the normal way expressions are combigned!

[ I can't resist pointing out that John Lamping is working on something
  that looks very much like this call-by-context stuff, except for ordinary
  closures.  Anyone who doesn't know about his ideas should look into
  it, its definitely interesting. ]

An implementation of all of this follows.

; A grammar for the core language used as the target of our simple little
; compiler. 
;  core-exp  ::=  core-var
;	       |  (QUOTE datum)
;	       |  (IF core-exp core-exp core-exp)
;	       |  (SET! core-var core-exp)
;	       |  (CALL core-exp core-exp ...)
;	       |  (BEGIN core-exp core-exp ...)
;	       |  (LAMBDA (bound-var ...) core-exp)
;  core-var  ::=  bound-var
;	       |  (FREE name)
;	       |  (ABSOLUTE -path-)
;  bound-var  ::=  (VARIABLE name counter)
;  The bound-vars are all unique, and none are ultimately free.
; Preprocessed expressions

; Calling PREPROCESS just makes a "syntactic closure".  ST is a syntax
; table, SYMS is a list of identifiers and keywords that are to remain
; syntactically free in the resulting preprocessed expression.
(define (preprocess st syms exp)
  (vector 'preprocessed st syms exp))

(define (ppexp->st ppexp) (vector-ref (check-ppexp ppexp) 1))
(define (ppexp->syms ppexp) (vector-ref (check-ppexp ppexp) 2))
(define (ppexp->exp ppexp) (vector-ref (check-ppexp ppexp) 3))

(define (preprocessed? obj)
  (and (vector? obj)
       (= (vector-length obj) 4)
       (eq? (vector-ref obj 0) 'preprocessed)))

(define (check-ppexp ppexp)
  (if (preprocessed? ppexp)
      (error "not a preprocessed expression" ppexp)))
; Primitive Syntax table manipulation

; ->CORE is used to compile an expression into the core language.
; Syntax tables are just procedures.  Note that a syntax table now returns
; a core language expression.  (Previously, a preprocessed expression was
; returned.)
(define (->core st exp)
  (st st exp))

; Keywords

(define (add-keyword st0 keyword proc)
  (lambda (st exp)
    (if (and (pair? exp) (eq? (car exp) keyword))
	;; Macros must return context-insensitive expressions:
	;; (We -could- decide that it would be convenient to use
	;;  scheme-syntax-table here; as long as it is a documented
	;;  context.)
	(->core empty-syntax-table (proc st exp))
	(st0 st exp))))

; Identifiers

(define *counter* 0)

(define (add-identifier st0 id)
  (set! *counter* (+ *counter* 1))
  (let ((var `(variable ,id ,*counter*)))
    (lambda (st exp)
      (if (eq? id exp)
	  (st0 st exp)))))

(define (add-identifiers st ids)
  (if (null? ids)
      (add-identifier (add-identifiers st (cdr ids))
		      (car ids))))

; Make a new syntax table in which the identifiers and keywords in SYMS are
; found in SYMS-ST, and all others are found in ELSE-ST.
(define (filter-syntax-table syms syms-st else-st)
  (lambda (st exp)
    (if (if (pair? exp)
	    (memq (car exp) syms)
	    (memq exp syms))
	(syms-st st exp)
	(else-st st exp))))
; Syntax tables

; An empty syntax table; defines no keywords or identifiers.

(define (empty-syntax-table st exp)
  (cond ((preprocessed? exp)
	 (->core (filter-syntax-table (ppexp->syms exp) st (ppexp->st exp))
		 (ppexp->exp exp)))
	((or (boolean? exp) (number? exp) (char? exp) (string? exp))
	 `(quote ,exp))
	((pair? exp)
	 `(call ,@(map (lambda (arg) (->core st arg)) exp)))
	 (error "not a syntactically valid expression" exp))))

; Core syntax table.  Understands the primitive expression types, but
; not the derived ones.

(define (core-syntax-table st exp)
  (if (symbol? exp)
      `(free ,exp)	; or `(absolute ,exp) ?
      (case (and (pair? exp) (car exp))
	((quote absolute) exp)
	 (let ((st (add-identifiers st (cadr exp))))
	   `(lambda ,(map (lambda (var) (->core st var))
			  (cadr exp))
	      ,(->core st (caddr exp)))))
	 `(set! ,(->core st (cadr exp))
		,(->core st (caddr exp))))
	 `(if ,(->core st (cadr exp))
	      ,(->core st (caddr exp))
	      ,(->core st (cadddr exp))))
	 `(begin ,@(map (lambda (exp) (->core st exp)) (cdr exp))))
	 (empty-syntax-table st exp)))))
; The scheme syntax table defines the derived expression types.

(define scheme-syntax-table
  (do ((st core-syntax-table
	   (add-keyword st (caar z) (cadar z)))
       (z `((and
	     ,(lambda (st exp)
		(let ((forms (cdr exp))
		      (j (lambda (exp) (preprocess st '() exp))))
		  (cond ((null? forms) `#t)
			((null? (cdr forms)) (j (car forms)))
			 (preprocess scheme-syntax-table '()
			   `((lambda (p)
			       (if p (and ,@(map j (cdr forms))) p))
			     ,(j (car forms)))))))))
	    ;; ...
	     ,(lambda (st exp)
		(preprocess core-syntax-table '()
		  `(lambda ,(cadr exp)
		     ,(preprocess-body st (cadr exp) (cddr exp))))))
	    ;; (letrec ,...)
	    ;; ...
	    ;; (quasiquote ,... (absolute scheme-env cons) ...)
	  (cdr z)))
      ((null? z) st)))
; Implements implicit begin and internal defines for lambda bodies.

(define (preprocess-body st ids body)
  (let ((definition? (lambda (exp)
		       (and (pair? exp) (eq? (car exp) 'define))))
	(definition-lhs cadr)
	(definition-rhs caddr))
    (let loop ((l body)
	       (lhss '())
	       (rhss '()))
      (if (null? l)
	  (error (if (null? lhss)
		     "empty body"
		     "no non-definitions in body")
	  (let ((exp (car l)))
	    (if (not (definition? exp))
		(let ((all-ids (append lhss ids)))
		  (let ((body (map (lambda (exp)
				     (preprocess st all-ids exp))
		    (preprocess scheme-syntax-table ids
		      (if (null? lhss)
			  `(begin ,@body)
			  `(letrec ,(map (lambda (lhs rhs)
					     ,(preprocess st all-ids rhs)))
					 (reverse lhss)
					 (reverse rhss))
		(loop (cdr l)
		      (cons (definition-lhs exp) lhss)
		      (cons (definition-rhs exp) rhss))))))))
(define test-syntax-table
  (do ((st scheme-syntax-table
	   (add-keyword st (caar z) (cadar z)))
       (z `((push
	     ,(lambda (st exp)
		(let ((var (preprocess st '() (caddr exp)))
		      (val (preprocess st '() (cadr exp))))
		  (preprocess test-syntax-table '()
		    `(set! ,var (cons ,val ,var))))))
	    ;; Your macros here!
	  (cdr z)))
      ((null? z) st)))