
Environments and Macros



There has previously been a debate among Chris Haynes, David Bartley,
and myself (Gary Brooks) about the nature of environment structure in
Scheme and its interaction with macros.

We disagree with almost all of what David Bartley said and with much, if
not all, of what Chris Haynes said.  Not only do we disagree with the
principles advocated in their proposals, we are horrified by their
complexity.  Instead of analyzing each proposal point by point, we
propose yet another framework that we contend is simple and
straightforward, and still allows one to express the kinds of things one
wants to express.


Namespaces
----------

First a bit of nomenclature.  The term 'namespace' has been bandied
around a lot in the last few notes.  But what is a namespace?  As far as
I can tell, it is just an association of identifiers to values, or, in
other words, an environment!  Why use two words when one will do?  Don't --
**poof** no more namespaces, just environments.


The Environment Problem
-----------------------

The environment problem is concerned with determining the environment in which
an identifier (symbol) is to be interpreted.  In the following       
discussion, reserved words are viewed as constituting an
environment which is given priority over other environments.

The crux of the problem, which can be found in Scheme84 as well as the
two other proposals, is the proliferation of environments.  For example,
if we examine Scheme84 we find that we have a (core) reserved word
environment, a global macro environment, a global primitive environment, a
global environment for everything else, a lexical environment, and a
fluid environment.  When the environment of an identifier is not textually
specified (as it is for fluids), problems arise when we try to determine the
precedence of each environment in the process of interpreting an
identifier.

The problem of determining environment precedence is further aggravated
by making preferencing (scope rules) dependent on the position of an
identifier in an application, as the two prior proposals advocated.
For example, in Scheme84, the reserved word and macro environments
are given precedence for identifiers in functional position.  However,
the lexical and global (initial lexical) environments are given precedence
in argument positions.  The complicated and unintuitive scope rules
resulting from position-sensitive precedence are intolerable.

Usually, free (non-local) identifiers are interpreted globally (i.e.  in the
initial lexical environment).  If, while looking up identifiers, we give
priority to any of the other global environments (e.g.  macro) over that
of lexical scope (which was advocated in previous proposals), we
contaminate our notion of lexical scope.  Given the elegance and
simplicity of the traditional lambda calculus based lexical scoping, this
type of preferencing is a "feature" we are reluctant to incorporate.

In general, no matter what preferencing scheme one adopts in the presence
of multiple global environments, too many rules are needed.  However, if
we simply abolish the proliferation of global environments and rely
solely on lexical scope (augmented with an initial lexical environment,
which we view as containing a binding for all possible identifiers) we
can avoid the problems encountered by preferencing environments.

This immediately raises the question of how macros, primitives, core
items, etc.  are distinguished.  We propose that they be distinguished
by type (i.e., we introduce a new kind of typed object for each class of
items).  This emphasizes the point that the name of an entity is not
important; the type of an entity is!  From a compiler's point of view,
determining whether an identifier is a macro, primitive, or core item is
simple.  The compiler need only check the type of (the binding of)
each free identifier in the initial environment.
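
To make this concrete, here is a minimal sketch of the single
lookup-and-classify step such a compiler might perform on a free
identifier.  It is illustrative only: we model typed entities as tagged
pairs and the initial environment as an association list, and the names
below (make-macro-object, entity-type, classify-free-identifier) are our
own inventions, not part of any existing system.

	;; A sketch only: typed entities modeled as tagged pairs, the initial
	;; environment as an association list of (identifier . entity) pairs.
	(define (make-macro-object transformer)  (cons 'macro transformer))
	(define (make-primitive-object proc)     (cons 'primitive proc))
	(define (make-core-object handler)       (cons 'core handler))

	(define (entity-type binding)
	   (if (pair? binding) (car binding) 'ordinary-value))

	;; The compiler's question about a free identifier reduces to one
	;; lookup in the initial environment plus a type check -- no extra
	;; global environments, no position-dependent precedence rules.
	(define (classify-free-identifier id initial-env)
	   (let [[entry (assq id initial-env)]]
	      (if entry
	          (entity-type (cdr entry))
	          'ordinary-value)))   ; conceptually, every identifier is bound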

Some may object to this scenario on the grounds that it allows an
identifier bound to an item of one type (e.g.  macros, primitives, and core
items) to be redefined to be another type.  In particular, some people
are concerned that a crucial macro (e.g.  let) may be redefined as
something else (e.g.  a closure).  I contend that this redefinition
problem is mistakenly concerned with the type of object being redefined.
For example, even if the appropriate machinery prohibited the
redefinition of someone's favorite macro as a closure, there would still
be no conceivable way of preventing his favorite macro from being
redefined as another totally random macro.  Furthermore, one might
reasonably want to redefine a macro as a closure to, say, effect a
different space/time tradeoff.

Multiple Initial Environments
-----------------------------

A related environment problem is how one accounts for the multiple
"global" (initial) environments required in large scale software systems.
While there is much controversy on this issue, we propose a simple
framework that provides for multiple initial environments and allows for
user specified (programmed) inheritance between them.  The perspective
that we advocate is based on the principle that the relationship between
a free identifier and its L-value is determined at compile-time.  We
model initial environments as maps between free identifiers and L-values.
The compiler interface is modified by requiring the compiler to take an
additional argument, an initial environment.  
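
A minimal sketch of what this interface might look like follows; the
procedure name compile, the location representation, and the sample
environments are our own inventions, not a description of any existing
implementation.

	;; Model an L-value (location) as a one-slot vector.
	(define (make-location v) (vector v))

	(define base-env
	   (list (cons 'car  (make-location car))
	         (cons 'cons (make-location cons))))

	(define graphics-env
	   (list (cons 'line-width (make-location 1))))

	;; Inheritance by construction: a new initial environment is simply
	;; built out of existing ones (earlier entries shadow later ones).
	(define user-env (append graphics-env base-env))

	;; The compiler now takes the initial environment as an explicit
	;; argument; 'compile' is the assumed entry point.
	;; (compile '(cons line-width (car lst)) user-env)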

We point out that this scenario allows for most inheritance schemes
(e.g., packages) by simply constructing a new environment out of existing
ones.  However, this scenario does prohibit dynamic (i.e., runtime)
resolution of free identifiers.  For those who desire such capabilities,
we advocate treating them in a manner similar to fluids, rather than
conflating them with initial environments.  For example, someone who
desires to create a Unix-like shell environment for an operating system
need only define the relevant macros for defining and side-effecting such
"shell" variables.


Binding and Variable Injection
------------------------------

Lexical identifier problems arise when a lexical identifier or a lexical
identifier binding is injected into an existing lexical contour within a
code segment as the result of a macro expansion.  

Variable injection:

In the first case, a macro may inject an identifier into an existing lexical 
contour.  While this may be the desired intention, often it is not.  For
example, consider the following code fragment:

		(lambda (and)
		   (frob a b d))

It is transformed (via the macro frob) to:

		(lambda (and)
		   (and a (or b d)))

Note that what is (presumably) a reference to the global macro 'and' in the
definition of frob has been shadowed by the lambda variable 'and' in a
user program.  In such a case, we say that the injected variable 'and' is
captured by an existing binding.

The problem is centered on the injection of the symbol 'and' into the
lambda body.  Using the normal lexical/global lookup of a symbol, we don't
get the binding of 'and' that the macro writer intended.  To see why, we
must look at the way macros are defined.  Consider the definition of frob:
(We use typed macros as advocated above.  Also, 'macro' is like lambda, but
constructs a macro object instead of a closure.)

	(define frob
	   (macro (x y z)
		  `(and ,x (or ,y ,z))))
	
While this might look ok, it suffers from the injected variable problem.
The cause of the problem is the quotation of 'and'.  I contend that when
the macro writer writes the macro (at macro definition time), he/she expects
the value of 'and' to be the value of 'and' in the macro environment
(i.e., the environment in which the macro 'frob' is expanded).  Since, in
past versions of Lisp or Scheme, macros have not been typed objects,
there has been no way to do this properly.

However, if we allow the injection of typed objects directly into a piece
of program "text" (structure), we can avoid the aforementioned capturing
problem.  Doing so only requires the compiler to recognize semantic
objects, like closures or macros, as constants (i.e., we inject semantic
domains into the syntactic domains).  We can then achieve the macro
writer's intention by writing frob as:

	(define frob
	   (macro (x y z)
		  `(,and ,x (,or ,y ,z))))

In this case the global binding of 'and' (presumably a macro object) at
the time the macro is invoked, and NOT the symbol 'and', will be included
in the transformed code, thus avoiding this category of variable capture.

For those who object to the predominance of commas in the definition of
'frob', let me postulate four new reader macros:

	^  Constructs the corresponding list, but evaluates its
	   atomic subcomponents.
	,  Used to evaluate non-atomic subcomponents (i.e. an
	   application).
	`  When used inside the scope of a "^", inhibits evaluation 
	   of the following (atomic) subcomponent.
	@  Splices the following subcomponent into a list.

Frob could thus be written as:

	(define frob
	   (macro (x y z)
		  ^(and x (or y z))))
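
For completeness, here is a hypothetical illustration (ours, not part of
the frob example) of the remaining two reader macros under the semantics
sketched above: a backquote inside the scope of "^" leaves a symbol in
the expansion rather than its current binding, and "@" splices a whole
argument list into the constructed form.

	;; `or leaves the symbol or in the expansion, to be resolved wherever
	;; the expansion lands, while and is replaced by its current binding.
	(define frob2
	   (macro (x y z)
	      ^(and x (`or y z))))

	;; @args splices the whole argument list into the constructed form;
	;; macro is assumed to take a "rest" parameter just as lambda does.
	(define and-all
	   (macro args
	      ^(and @args)))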

Lastly, we point out that the technique of including semantic domains in
syntactic domains has merit in its own right.  It is crucial in order to
define macros which require data structures injected into the code, as in
the definition of own variables or evaluate-once.  It is also useful
in inserting macro "help" functions directly into macro-expanded code, so
as to share code between different macro-expansions of the same macro.
We believe that this technique of "higher-dimensional programming" has
much to offer, but has, to date, barely been explored.
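
As one hedged illustration of the first use, consider a possible
evaluate-once macro.  The self-pointing-cell trick and the particular
primitives injected below are our own choices; the point is only that the
macro allocates a fresh data structure at expansion time and injects it
directly, as a constant, into the expanded code.

	(define evaluate-once
	   (macro (exp)
	      ;; A fresh cell per expansion site; its cdr initially points
	      ;; back at the cell itself, which marks "not yet evaluated".
	      (let [[cell (cons '() '())]]
	         (set-cdr! cell cell)
	         `(,begin (,if (,eq? (,cdr ,cell) ,cell)
	                       (,set-cdr! ,cell ,exp))
	                  (,cdr ,cell)))))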

Binding injection:

A similar problem arises when a lexical binding is injected into an
existing lexical context because of a macro expansion.  Consider the
following two argument version of the macro or:

		(define or
		   (macro (exp1 exp2)
			  `(let [[v ,exp1]]
			      (if v v ,exp2))))
		
This works fine, unless the second expression passed to or is 'v'.  In
this case:


		(lambda (v)
		   (or exp1 v))

expands to:

		(lambda (v)
		   (let [[v exp1]]
		      (if v v v)))


We then say that the user variable v (i.e., exp2) is captured by
the binding of v injected by the macro.  We point out that such captures
may actually be desired, although this technique is often
frowned upon.

The capture of variables due to injected bindings can be easily avoided.
All that is required is that, whenever a macro injects a binding, the
macro guarantees that the name of the injected bound variable is unique.
Unique names can easily be created by a gensym-like function.

Currently, there are two techniques that guarantee unique identifiers in
injected bindings.  One technique requires explicit programmer
construction of unique names in the text of a macro.  For example, the
'or' macro above could be defined as

	(define or 
	   (macro (exp1 exp2)
	      (let [[new-var (gen-variable "or-var")]]
	        ^(let [[new-var exp1]]
		   (if new-var new-var exp2)))))

Here (gen-variable <string>) constructs a unique, uninterned symbol
whose print name consists of <string> concatenated with a (preferably)
unique integer.
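
A possible definition of gen-variable (a sketch only; we assume
primitives such as string->uninterned-symbol, string-append, and
number->string are available):

	(define gen-variable
	   (let [[counter 0]]
	      (lambda (prefix)
	         (set! counter (+ counter 1))
	         (string->uninterned-symbol
	            (string-append prefix (number->string counter))))))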

Macro expansion of:

		(lambda (v)
		   (or <exp1> v))

would become:

		(lambda (v)
		   (<let-macro> [[or-var0001 <exp1>]]
		      (<core-if> or-var0001 or-var0001 v)))

which once again avoids capture.



This category of variable capture due to injected bindings is addressed
from a different perspective by Eugene Kohlbecker's note "Position
Statement on Macros".  Both approaches attack the unpleasant aspects
of "freezing" and "thawing" to get around name clashes by guaranteeing
the uniqueness of injected binding identifiers.  The Beta expansion which
Eugene advocates differs from the approach above by 1) providing a
framework which alleviates the user's concern about the variable clashes
created by injected bindings, and 2) performing minimal "gensyming" of
identifiers.


				-- brooks