[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

EQ? and procedures etc

Will has raised some very good points.  I need to be more precise about
what I mean.

						   ... Guy's informal
    distinction is probably based on some preconceived semantics for EQ?.  That
    would be ok if we could formalize that preconceived semantics and then
    sanction particular classes of deviations from it, which is in effect what I
    propose above.
    (EQ? + +) --> #!FALSE implies no more of a side effect than (EQ? 1000000
    1000000) --> #!FALSE or (<? 3 3) --> #!FALSE or (+ 3 3) --> 6.  All we're
    talking about is the value returned by a procedure.  Since procedures can't
    be side effected, we can't look to the primitive mutators for guidance, and
    we must fall back on the rule that indistinguishable objects should be EQ?.

Indeed, I have a preconceived semantics for EQ?, namely the semantics
attributed to "=" in logic: the identity function.  A logic may have "="
in it or not, but it has to be included specially.

I view EQ? as a primitive notion: it is the very definition of what it
means for two things to be the same.  I want other parts of the language
to satisfy certain relationships with respect to EQ?.

I define the notion of side effect in terms of EQ?.  If I take two
formally identical computations (that is, expressions that are identical
in every way), and use them as the two arguments to EQ?, then if EQ?
returns #!FALSE I say that a side effect has occurred.  (If EQ? returns
true then that result says nothing one way or the other about whether a
side effect has occurred.)

One of the properties I want in my language is that name-reference be
free of side effects.  That is, if "x" is any name, I expect evaluation
of the form (EQ? x x) to produce #!TRUE.

Note that I have not yet insisted that (EQ? f f) return #!TRUE for any
kind of form f other than a name.  In particular, I have not insisted
	(EQ? (LAMBDA (X) X) (LAMBDA (X) X))
return #!TRUE.  This is very different from saying that
	(LET ((Z (LAMBDA (X) X))) (EQ? Z Z))
must return #!TRUE, and indeed I do insist on this latter case.

Note too that I do not insist that beta-conversion be a valid
transformation (and this is why we must be careful to distinguish
	(EQ? (LAMBDA (X) X) (LAMBDA (X) X))
	(LET ((Z (LAMBDA (X) X))) (EQ? Z Z)),
for example: we have not yet established substitutability of expressions
for names that represent their values).  Rather, one of the requirements
for beta-conversion is that the resulting program fragment be externally
indistinguishable from the original with respect to side effects (as defined

However, I do have a rule for LET: if x and y are two names (possibly
the same), then (LET ((x y)) (EQ? x y)) returns #!TRUE.  (That is,
LET-binding does not "split".  There is an analogous rule for binding of
LAMBDA-parameters.  This rule can be generalized for arbitrary
side-effect-free expressions y.)

Once again, I insist that referring to a name be free of side effects.
It is also convenient to insist that referring to a constant (numbers
and quoted objects, for example) be free of side effect, but for now we
sidestep the question of whether the expression (+ 6 6) contains the
"same" constant 6 twice or two different constants representing the
mathematical value 6.  I insist that the operation of function calling
per se be free of side effect; if evaluation of a function-call form has
a side effect, then necessarily either the body of the invoked function
has a side effect or one of the argument forms has a side effect.  (This
is why I want references to constant to be side-effect-free: so that I
can claim that if (+ 6 6) has a side effect, it must be in the body of +
because it doesn't come from the references to the name + or to the
constant(s) 6.)

Now, what things in the language shall be permitted to have side
effects?  We could consistently allow a call to + to result in a side
effect, with the result that (EQ? (+ 6 6) (+ 6 6)) might return #!FALSE.
(Since I so far allow the 6's to be distinct, perhaps it is more to the
point to say that (LET ((X 6)) (EQ? (+ X X) (+ X X))) might return
#!FALSE.)  We could interpret this in familiar implementational terms by
saying that + allocates a new location associated with the result, but
the results do have the same mathematical values, and EQUAL? will
compare these values without regard to the location.

But because numerical equality is decidable we might as well say that
(EQ? m n) iff (EQUAL? m n) for numbers m and n and let it go at that.
Such a definition will maintain the desired properties of naming with
respect to EQ?.

On the other hand, because equality of functions is not decidable in
general, we cannot take such an easy way out.  We insist that for two
functions f and g, (EQ? f g) must return #!FALSE if f and g are
operationally distinguishable (that is, if they have differences of
behavior that are detectable within the language).  On the other hand,
it must be the case that (LET ((X f)) (EQ? x x)) returns #!TRUE for any
f, even a function.  Consider the dilemma of poor EQ?: confronted with
two mathematical functions (for which operational equality is
undecidable in general), it could take the safe way out for the first
condition by returning #!FALSE if it can't decide, or it could take the
safe way out for the second condition by returning #!TRUE if it can't
decide; but it cannot do both.  One solution, as Will has pointed out,
is to package a "hint" with every function object, in the form of a
"location".  One should perhaps think of this not as indicating a
location in *memory*, in which the closure is somehow stored, but rather
as an indication of the *(dynamic) location in the code* whose
evaluation produced the function object.  Two compare two closures,
then, one first compares the locations; if they match, then they
resulted from the same evaluation, and so EQ? must return #!TRUE to
satisfy the naming properties.  Otherwise, EQ? looks at that
mathematical functions themselves; and it may return #!TRUE if it can
prove the functions are operationally indistinguishable, but it is
always safe to return #!FALSE in this case.

To summarize: I take EQ? as primitive; I define the notion of side
effect in terms of EQ?; I require the language to obey certain
properties defined in terms of EQ? (and this is why EQ? is special:
because structural properties of the language, such as naming, are
specified in terms of it); and I show that, because of the
undecidability of equality of functions, the use of pure mathematical
functions to represent closures results in a semantically inadequate
model because two requirements on the language cannot be satisfied
simultaneously by the implementation without additional information
(such as a location).  So the use of locations is not a wart, but
critical to the semantics.  Locations are not necessary for mathematical
objects whose equality is decidable, such as numbers.

					    Thus if EQ? is supposed to express
    something like total behavioral equivalence, then it does a much better job
    with pairs than with procedures.  Hence the argument for making EQ? work on
    procedures is weaker than the corresponding argument for pairs.

I propose that two things might happen to be behaviorally equivalent
without being EQ?.  It is precisely because of the undecidability
problem that EQ? cannot determine equivalence; I say only that EQ?
returning #!TRUE implies behavioral equivalence.  (This is why splitting
is bad: I might determine (EQ? X Y) => #!TRUE and later find that X and
Y behave differently, possibly only with respect to EQ? itsefl!)

    In case anyone is worried, my proposal that EQ? can say that two procedure
    values are identical whenever their E* --> K --> C parts are identical
    is sufficient to ensure that

	    (EQ? (LET ((N 0)) (LAMBDA () (SET! N (1+ N)) N))
		 (LET ((N 0)) (LAMBDA () (SET! N (1+ N)) N)))

    is false (ignoring the fact that a compiler could determine that neither
    procedure is going anywhere).  It would not guarantee that

	    (LET ((N 0))
	      (EQ? (LAMBDA () (SET! N (1+ N)) N)
		   (LAMBDA () (SET! N (1+ N)) N)))

    is false (with the same caveat).  In other words, my new proposal seems to
    me to do the right thing with respect to object-oriented programming.

I think this is exactly right.