[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
EQ? and procedures etc
Ok, it's pretty clear that people want (EQ? X X) to be true. I bow to the
majority, but I want to discuss some of the points that have been raised.
This is a long message.
----------------------------------------------------------------
Will:
The current wording of RRRS forces a formal semantics of Scheme to associate
locations with procedure values and numbers. These locations have no
purpose other than to make the semantics of EQ? work as in RRRS, and they
make the semantics much uglier. The ugliness of the semantics results in
more complex and less effective optimizing compilers.
Kent:
I have yet to see a convincing example of this claim.... However, for now
perhaps you could just enumerate some of the troublesome cases for me to
respond to more specifically.
Will:
I made about four claims in the paragraph Kent cites, all of which are
debatable, and I'm not sure which one(s) he doesn't believe so I will run
through all of them. First of all, neither the Muchnik-Pleban semantics nor
the semantics I wrote for my 1984 Lisp conference paper can handle the
semantics of EQ? as in RRRS. The problem is that the domain equation for
expressed values is something like
E = {#!true, #!false, #!null} +
F + ; procedure values
R + ; numbers
Q + ; symbols
LxL + ; pairs
L* + ; vectors
L* ; strings
where L is the domain of locations and + is a disjoint union. The equation
for procedure values is F = E* --> K --> C, so the EQ? procedure sees a
procedure as a higher order (mathematical) function, not as a pointer to a
data structure encoding the higher order function. The obvious fix is to
change the equation for procedure values to be F = L x (E* --> K --> C).
This fix is necessary in order that an implementation be *allowed* to say
for example that (EQ? (lambda (x) x) (lambda (y) y)) is false.
If EQ? does indeed say that the two identity procedures are different, then
we have an interesting example of Guy's "splitting" (which he claims to
dislike): both arguments are the identity procedure, and in all respects
they behave exactly the same except that EQ? says they're different. Now,
however, if I naively go through the semantics and make the changes implied
by the new domain equation for F, I find that implementations are *required*
to say that (EQ? (lambda (x) x) (lambda (y) y)) is false -- so splitting is
forced upon us. That naive semantics can be fixed, provided that we agree
on a precise formulation of the circumstances in which an implementation can
say that two procedures are EQ?. My suggestion that EQ? be left unspecified
on procedure values was prompted by a disagreement over precisely what those
circumstances should be. Since people don't like that, I now propose that
it's ok to say that two procedure values are EQ? iff the second components
(i.e., elements of E* --> K --> C) are equal. This is of course undecidable
in general but implementations should be allowed to take advantage of
whatever simple cases they can recognize. (I expect the MIT philosophy will
disagree.)
Similarly R must have a structure akin to R = L x (mathematical numbers),
but this isn't as troublesome because we seem to agree that it's ok for EQ?
to return true on numbers whenever the difference of the two numbers is
zero, and that EQ? must return false whenever the difference is nonzero.
I think the location-ridden semantics is uglier, but that's a matter of
taste. I admit that the location-ridden semantics doesn't affect compilers
provided my new suggestion is adopted (that EQ? can use the second component
of procedure values), because most of us were doing precisely the same thing
with the old semantics. If on the other hand optimizations such as those
raised by David Bartley are disallowed, then compilers are going to be less
effective. Furthermore the ambitious compilers will be more complex,
because they will still attempt those optimizations whenever the compiler
can prove that the procedures never get passed to the EQ? procedure. (To
the objection that the compiler could never prove such a thing because the
break key provides access to internals at arbitrary times, I reply that the
compiler has the freedom to define critical sections during which such
interrupts are deferred.)
----------------------------------------------------------------
Kent:
...I cite as precedent the case of CALL-WITH-CURRENT-CONTINUATION. Everyone
seemed to agree that it was easier to compile/optimize code which doesn't
allow returning these upward. The argument for pushing it through was not
that it made the implementation simpler, since in fact it complicated it. It
was instead the fact that it was (a) at least possible to implement and (b)
much simpler to some write programs when this feature was available. It
seems to me that the situation is similar for EQ? -- I do not want to give
up this feature. You (the compiler writer) have to solve this problem only
once. I (the programmer) will have to solve it repeatedly if you do not
solve it once for me.
Will:
First class escape procedures greatly simplified the semantics. Their
semantics is so beautiful you don't even have to see them. Downward-only
escape procedures have a really ugly semantics. The same holds for first
class procedures as opposed to downward-only procedures. Semantic
simplicity often does not correlate with implementational simplicity. My
complaint about the current definition of EQ? is not that it is hard to
implement -- it isn't -- but that I think its semantics are ugly. I haven't
always thought that -- I too once thought that (EQ? X X) should always be
true.
(I stand by my comment about semantic ugliness turning into compiler
complexity in this particular case, but in truth I said that for the sake of
people who think of me as a semantic purist who doesn't care about
implementations. If people now think of me as a hacker who doesn't care
about semantics, well, that's great, my disguise must be working.)
----------------------------------------------------------------
Guy:
....may two things that you might expect to be different turn out to be the
same? (Call this property "coalescence".) ....may one thing turn out to be
two different things? (Call this property "splitting".)
Will:
The problem is that we don't have a precise and independent definition of
what we mean by "expect to be different" and "one thing" until we have a
semantics for EQ?, so we can't use these notions to define EQ?. (Likewise,
when the RRRS says of certain objects that they "are identical to themselves
and are different from everything else...", it says nothing at all.) My
earlier reference to (EQ? (LAMBDA (X) X) (LAMBDA (Y) Y)) --> #!FALSE as a
splitting was intended in part to demonstrate that Guy's informal
distinction is probably based on some preconceived semantics for EQ?. That
would be ok if we could formalize that preconceived semantics and then
sanction particular classes of deviations from it, which is in effect what I
propose above.
----------------------------------------------------------------
Guy:
I feel that splitting is a semantically bad thing to have in a LISP-like
language, because I think name-reference should be free of side effects!
Will:
(EQ? + +) --> #!FALSE implies no more of a side effect than (EQ? 1000000
1000000) --> #!FALSE or (<? 3 3) --> #!FALSE or (+ 3 3) --> 6. All we're
talking about is the value returned by a procedure. Since procedures can't
be side effected, we can't look to the primitive mutators for guidance, and
we must fall back on the rule that indistinguishable objects should be EQ?.
----------------------------------------------------------------
Guy:
If two objects cannot be distinguished, then it should be legitimate to
merge them. If there are no RPLACA or RPLACD operations on CONS cells, then
hash-consing is a legitimate optimization (and, indeed, perhaps EQ should be
required to behave as EQUAL on CONS cells in the absence of such side
effects!)...
Will:
Hence (EQ? (LAMBDA (X) X) (LAMBDA (Y) Y)) --> #!TRUE should be ok, but we
can't require it (as we could for hashed pairs) because the problem is
undecidable. The challenge lies in writing a semantics that will allow it
but not require it. I have stated my belief that the cleanest such
semantics leaves EQ? unspecified on procedures, numbers, etc, but I am
willing to adopt what I believe is an uglier semantics if that's what people
want.
----------------------------------------------------------------
Gerry:
I think it is essential that (EQ? x x) be true of procedure values. We
should be able to think of compound data structures as LAMBDA calculus
entities (as at the end of section 3.3.1 of Abelson and Sussman -- "Mutation
is just assignment" -- p 207), and surely we want (EQ? x x) to be true of
CONSs (else assignment to components such as SET-CAR! will make no sense).
Will:
The lambda calculus has no side effects and no primitive or definable (in
the language) equality predicate that satisfies the rule that equal objects
are behaviorally indistinguishable. My feeling is that if the lambda
calculus can get along without EQ?, then Scheme should be able to get along
without it also except for those objects that can be stored into.
Nonetheless I think Gerry has expressed the best argument for making EQ? do
something reliable on procedures.
Let me observe, however, that the value of (LET ((FOO FOO)) (LAMBDA ARGS
(APPLY FOO ARGS))) is behaviorally identical to the value stored in FOO when
that value is a procedure object, but there is no similar way to take a pair
and get from it a behaviorally identical but non-EQ? pair without redefining
all the primitive operations on pairs. Thus if EQ? is supposed to express
something like total behavioral equivalence, then it does a much better job
with pairs than with procedures. Hence the argument for making EQ? work on
procedures is weaker than the corresponding argument for pairs.
In case anyone is worried, my proposal that EQ? can say that two procedure
values are identical whenever their E* --> K --> C parts are identical
is sufficient to ensure that
(EQ? (LET ((N 0)) (LAMBDA () (SET! N (1+ N)) N))
(LET ((N 0)) (LAMBDA () (SET! N (1+ N)) N)))
is false (ignoring the fact that a compiler could determine that neither
procedure is going anywhere). It would not guarantee that
(LET ((N 0))
(EQ? (LAMBDA () (SET! N (1+ N)) N)
(LAMBDA () (SET! N (1+ N)) N)))
is false (with the same caveat). In other words, my new proposal seems to
me to do the right thing with respect to object-oriented programming.