[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: A (Low-Level) Opaque Object Proposal for R5RS



>--------------------------------------------------------------------------
> > >--------------------------------------------------------------------------
> > > (1) with the passkey
> > > mechanism, it will be very difficult to determine all points where
> > > access to an object may occur,
> > >--------------------------------------------------------------------------
>
> > How is this any more or less difficult than determining all points where a
> > slot may be accessed using your record proposal?  Let me be very concrete
> > in case there's something subtle I'm overlooking.
>--------------------------------------------------------------------------
> First, you have not specified (as far as I've seen) how passkeys are
> matched.  In fact, you explicitly left this open in your proposal.
> Assuming a string passkey, as in your example, if matching is by
> string=?, say, then all bets are off.  Can we agree on this point?  If
> so, let's assume that matching is by eqv? and go on, although passkeys
> lose some of their utility under this assumption.
>--------------------------------------------------------------------------

I will not agree to this because it is not true.  If passkey matching is by
STRING=? (or even by EQUAL?) then it is still necessarily true that two
passkeys that are EQ? are also STRING=? (and, stronger, EQUAL? too).  The
reverse, of course, is _not_ true: two STRING=? strings are _not_
necessarily EQ?.  For my purposes, though, EQ? ==> EQUAL? is the point.

Imagine for just a moment that PASSKEY-MATCH? were implemented as follows:

(define (passkey-match? passkey-1 passkey-2)
  (or (eq? passkey-1 passkey-2)
      (%more-expensive-implementation-dependent-general-passkey-match?
           passkey-1 passkey-2)))

Since PASSKEY-MATCH? is intentionally not prescribed in my proposal (as you
noted), this is a fine definition to work from and the compiler can assume
this is how it is implemented (for purposes of optimizing VEIL and UNVEIL
calls).  It also supports the full utility of passkey-based opacity.
Agreed?

My point was that in the code I posted (included again below) the passkeys
_were_ in fact EQ? and this could be decided efficiently at compile time
using USE/DEF/ESCAPE analysis (or whatever you want to call it) of the
<hidden-passkey>.  Moreover, the <hidden-underlying-opaque-type> predicates
can also be optimized out (in-lined) by employing standard Orbit-style
closure-analysis [from David Kranz's Yale Ph.D. thesis].

Look carefully at the code I posted before:
--------------------------------------------------------------------------
(DEFINE-RECORD <record-id> <record-type-name> <field-id>...)

==>

`(BEGIN (DEFINE make-,<record-id>             'TBA) ; abused backquote
        (DEFINE ,<record-id>?                 'TBA) ; abused backquote
        (DEFINE ,<record-id>-,<field-id>      'TBA) ; abused backquote
        ...
        (DEFINE set-,<record-id>-,<field-id>! 'TBA) ; abused backquote, etc
        ...
  ;;
  ;; `record's will be opaque vectors tagged by the <record-type-name>
  ;;
  ,(let* ((<hidden-passkey>            ; Magic secret hidden passkey THAT
            (number->string (random))) ;  THE COMPILER COULD COMPILE OUT!!!!
          (         <field-name-list> (list <field-id>...))
          (<permuted-field-name-list> (permute <field-name-list>))
          (list-elt-number (lambda (lst elt) ; NB: elt must be in list
                             (if (eq? (car lst) elt)
                                 1
                                 (+ 1 (list-elt-number (cdr lst) elt))))))
     `(LET ((<hidden-underlying-opaque-type>
              (MAKE-OPAQUE-TYPE ,<hidden-passkey>)))
         (SET! make-,<record-id>
               (LAMBDA ,<field-name-list>
                 (OPAQUE-VEIL <hidden-underlying-opaque-type> ; opaque
                              (VECTOR ,<record-type-name>     ; tagged vect
                                      ,@<permuted-field-name-list>)
                              ,<hidden-passkey>)))
         (SET! ,(STRING->SYMBOL  ; non-abuse of backquote, for example
                  (STRING-APPEND (SYMBOL->STRING ,<record-id>)
                                 "?"))
               (LAMBDA (object)  ; recall: an `op type' is a predicate
                  (<hidden-underlying-opaque-type> object
                                                   ,<hidden-passkey>)))
         (SET! ,<record-id>-,<field-id>
               (LAMBDA (opaque-record ,<field-id>)
                 (VECTOR-REF (OPAQUE-UNVEIL opaque-record
                                            ,<hidden-passkey>)
                             ,(list-elt-number <permuted-field-name-list>
                                               <field-id>))))
         ...
         (SET! set-,<record-id>-,<field-id>!
               (LAMBDA (opaque-record ,<field-id> new-value)
                 (VECTOR-SET! (OPAQUE-UNVEIL opaque-record
                                             ,<hidden-passkey>)
                              ,(list-elt-number <permuted-field-name-list>
                                                <field-id>)
                              new-value)))
         ...)))
-------------------------------------------------------------------------------

Look at where the <hidden-passkey> is generated.  It results in a compile-
time string literal, which is then distributed throughout the body of the
LET.  Perhaps in your Scheme implementation distinct textual instances of
constant string literals are not EQ?  Fine.  Then reposition the binding
for <hidden-passkey> inside the backquoted LET below it, remove the comma
from in front of the <hidden-passkey> occurrences, and appeal to the
EQ-ness of a variable binding rather than to the EQ?-ness of the string
literal it is bound to.

This is standard USE/DEF analysis that even C and Fortran compilers do.
Any compiler that claims to be an `optimizing' compiler surely must be
doing USE/DEF/ESCAPE analysis.

Moreover, each textual occurrence of the <hidden-underlying-opaque-type> in
the code can likewise be compile-time proven to be EQ?.  Therefore, all
VEIL/UNVEIL calls in the code above can be flattened out to not bother
checking the passkey at run time, and the <record-id>? predicate can
likewise be partial evaluated into a flattened test that also does not
bother to check the passkey at run time (since it is compile-time provable
that <hidden-passkey> _is_ the passkey for this opaque type).

This requires only Orbit-style closure analysis.  Such closure analysis has
been a standard technique for at least a decade.  It (among other things)
is the bread and butter of LISP optimization.  Any Scheme compiler that
claims to be an `optimizing' compiler surely must be doing closure
analysis.

Combined, these two techniques constitute a classic case of allowing a
compiler to optimize aggressively for the common case at compile time while
still leaving a trap-door for the general case to be handled at run time.
To me, this is the sort of engineering principle that typifies Lisp
compilers and distinguishes it from mundane, common place C and Fortran
compilers. Simple EQ? tracking of variable references and closure folding
is all that is required, and they are standard Scheme optimization
techniques.

This is why I cannot agree that ``Assuming a string passkey, as in your
example, if matching is by string=?, say, then all bets are off.''  EQ?
implies STRING=? and EQ?-ness analysis is cheap and sufficient in this
case. Had you just overlooked this possibility?

It is true that in the general case, when you cannot prove compile-time
EQ?-ness, then code would have to resort to run-time STRING=? (or
whatever).  But we are not talking about the general case.  We are talking
about the very specific small piece of code I posted.  That is why I went
to the trouble of posting it. I repeated it again above. Please look at it
carefully this time.  It is the whole point, and the only point, that I was
trying to make.

>--------------------------------------------------------------------------
> Second, what we [now] have is a two-level analysis problem, which is at
> least more expensive and conceptually more complicated than with an
> opaque define-record.  After recognizing the pattern produced by your
> portable macro, we first must establish that the passkey does not
> escape, at which point we can transform the pattern into an opaque
> version of define-record.
>--------------------------------------------------------------------------

You have entirely misconstrued what I was trying to say.  I have not been
sufficiently clear, obviously.  Mea culpa. I apologize.  Let me try again
to clarify:

I am not talking about pattern recognition.  I am talking about doing
compile-time optimization of the code fragment that results _AFTER_ macro
expansion.  I couched it in terms of a macro and macro expansion because I
was trying to demonstrate _exactly_ how your proposed standardized
syntactic interface to DEFINE-RECORD could be implemented using an
underlying MAKE-OPAQUE-TYPE mechanism and standard LISP optimization
techniques (closure analysis, escape analysis, and so on).

Please forget I ever mentioned macros.  You seem to have fixated on this as
some sort of hint that I intended some pattern-based template recognition
hack.  Please imagine instead that the pseudo-macro above completely
textually expands at compile time so that the resulting source code that
the compiler sees and optimizes is just the resulting:

  (BEGIN (DEFINE ...)
         ...
         (LET ((...))
           (SET! ...)
           ...))

This is therefore _not_ a two-level anything.  It is a one-level program
analysis of a BEGIN with a LET-encapsulated sequence of SET!'s.

What I was imagining was that you run your code analysis on the source code
that results _after_ the macro expansion has taken place.  This is not
two-level analysis.  This is pure and simple macro expansion (compile-time
syntactic manipulation) _followed_ by normal every-day one-level
compile-time code analysis.

>--------------------------------------------------------------------------
>  While this can be done, I'm not enthusiastic about doing it or
> the cost associated with doing it, and I'll bet that enough other
> implementations won't do it that programmers will learn that the
> portable define-record is not efficient enough for their purposes.
>--------------------------------------------------------------------------

There will always be toy Scheme implementations with light-weight compilers
(or no compiler) for which portable code will not run efficiently.  I am
not trying to mandate that DEFINE-RECORD must be a portable macro. I am not
trying to legislate that DEFINE-RECORD be non-opaque and that a separate
MAKE-OPAQUE-OBJECT mechanism must be used to build an opaque version.  It
is completely reasonable and desirable for Scheme implementations to
implement their own native-level opaque DEFINE-RECORD according to your
proposed syntactic interface specification in order to make it wizzy fast
despite the ineffectiveness of any given compiler.

I see no difference, for example, between this and the fact that there
exist Scheme implementations that do not `efficiently' implement the full
Scheme number tower (bignums, flonums, ratnums, slownums (a.k.a. complex
arithmetic)...).  Some systems choose to implement them `efficiently' at
the native machine level; others choose a portable, `inefficient' style.

I have never seen an argument that ratnums or complex nums _must_ be
opaque, for example.  From the way you have insisted on the opacity of
RECORDs, I would imagine that if we were to try to standardize Scheme's
number hierarchy today that someone would likewise insist on opacity there.

All I was aiming at was to defuse people saying categorically that they
will veto an `X' proposal unless the proposed standard `X' interface
includes verbiage that either explicitly or implicitly require `opacity' of
compliant implementations of `X', for some reasonable definition of
`opacity'.

That is why I said the following:

>--------------------------------------------------------------------------
> > (Of course, one could also build a non-opaque, record-like efficient
> > access-by-slot-name mechanism just like your DEFINE-RECORD mechanism,
> > (called something like DEFINE-TRANSLUCENT-RECORD) that is just like the
> > above opaque macro-based DEFINE-RECORD, only without an underlying opacity
> > being involved.  My agenda being that if/when we get around to a module
> > proposal or the next neat thing, we won't deadlock on any opacity concerns
> > since those could be ironed out later after a non-opaque interface and
> > semantics is agreed upon.  In theory, any opaque version could be
> > implemented portably by augmenting a non-opaque implementation with the
> > right hidden VEIL/UNVEIL stuff.
>--------------------------------------------------------------------------

Your response:
>--------------------------------------------------------------------------
> I wish it could be this simple, but as an implementor, I will not be
> happy with a standardized variant of records (or, likely, modules)
> being so open as to prevent fruitful program analysis.
>--------------------------------------------------------------------------

What I proposed does not ``prevent fruitful program analysis''. It also
does not prevent you from implementing your own native-level opaque
DEFINE-RECORD.

A specification of how a thing behaves semantically is not a prescription
for how one must implement it.

Let me try one last time to truly understand your position.  Imagine a
worst-case scenario where the RnRS authors never agree on a DEFINE-RECORD
proposal that pacifies your demand for `opacity'.  Now imagine someone
counter-proposes a DEFINE-TRANSLUCENT-RECORD standardized interface so that
even if we cannot have a standard opaque mechanism we can _at_least_ have a
standard access-by-slot-name interface for non-opaque `records'.  Would you
veto such a DEFINE-TRANSLUCENT-RECORD proposal solely because it is not
(and does not attempt to be) opaque?

I have every hope that your opaque DEFINE-RECORD proposal will converge
into something we can all accept.  I am just trying to anticipate the next
thing that comes along that might get delayed in a similar manner.

I am trying to generate consensus while also giving us all a useful common
set of issues to keep in mind when anyone brings up the term `opacity'
again.  If nothing else comes out of this MAKE-OPAQUE-TYPE proposal than
that we have all at least learned a little more about what people are
worrying about when they worry about `opacity', then something good will
have come from it.

For example, it has become more clear to me now that you and Will Clinger
and Aubrey Jaffer all have subtly different sets of ideas in mind when you
invoke the term `opacity'.  I was just trying to create a broad proposal
that might capture all these ideas under a single, clearly defined
mechanism.  If this proposal ever only exists on paper then it will at
least have served as a useful tool for thinking about these different
issues more clearly, if just for me alone.

You see, I believe that computer languages are not just for writing
programs to be executed by a machine, they are also for expressing ideas
and communicating in a formal medium... blah blah blah.

>--------------------------------------------------------------------------
>  I want my users
> to employ the more opaque variants that are more likely to result in
> faster and more easily maintained programs.  This means that I will
> remain inclined to standardize opaque variants, and arguments of the
> form "but you can always define opaque X in terms of Y and transparent
> X" will not satisfy me.  On the other hand, I'm happy to provide
> portable access to what's left after optimization as long as I'm not
> required to retain anything that isn't otherwise needed.
>--------------------------------------------------------------------------

This certainly sounds reasonable. I too would be happy with an opaque
DEFINE-RECORD.  I also would be mostly content with a non-opaque
DEFINE-RECORD too, which merely offered a short-hand for defining a series
of slot-access-by-slot-name procedures and a new unique data type
satisfying the predicate RECORD? (and not VECTOR? or LIST? or ...).

But if the final approval hinges on acceptable provisions for your demand
for opacity and if that final approval fails for just this one reason, then
I will be very disappointed (again).  Disappointed not because it will deny
us a standard RECORD interface, but more importantly because it will be the
harbinger of dead-lock on any and all proposals for anything else that ever
raises the spector of the `opacity' albatross.

Maybe we differ on this.

>--------------------------------------------------------------------------
> > Huh?  You've totally lost me here.  The weak pointer question that Aubrey
> > posted, and Bill's reply to it, were a complete operational side-track
> > (read: red herring).  Although an interesting detail that an implementation
> > might use to reclaim storage in the general case, weak pointers are no more
> > necessary to an `efficient' opaque object mechanism than they are to an
> > `efficient' record mechanism.
> 
> Bill described how lambda + weak pointers = opaque objects, so if we
> were to standardize on weak pointers, we could have a portable opaque
> object implementation (with which we could have a portable opaque
> record implementation).
>--------------------------------------------------------------------------

Actually, what Bill said was more like: opaque lambda + weakly keyed hash
tables = opaque objects.  Maybe you were not following closely what Aubrey
asked and how Bill's response addressed it.

Bill was not suggesting that this is the _only_ way one could implement
opaque objects.  He was not arguing that weak pointers would be _necessary_
for opaque types. He merely suggested it as _one_ way for Aubrey to get GC
of opaque types and opaque objects based on GC of passkeys as indices into
a weak hash, which Aubrey seemed to be interested in for some reason.

Bill's idea was that passkeys could be bignums rather than strings.  These
bignums could be indices into a hash table that holds the veiled datum. The
opaque object could then be a (disjoint) record which holds the hash table
index rather than directly holding the the veiled datum. The VEIL/UNVEIL
procedures would then use the index to unhash the veiled datum, signalling
an error if the (weakly held) datum is no longer hashed. By separating the
veiled datum of an opaque object into a hidden weak hash table inside the
implementation, no special tagging would be needed for DISPLAY/WRITE to not
reveal the hidden opaque datum.

This is just one possible way to implement GC-able opaque data. It is very
clever.  But it is not the _only_ way to implement them.  Therefore, to
hold the opaque object proposal hostage until we have standardized weak
pointers would, in my opinion, be entirely inappropriate.

When you then said:
>--------------------------------------------------------------------------
> It may be that we should consider including a separate, secure opaque
> object facility.  If so, I propose that we first add some form of weak
> pointer and allow programmers to implement their own opaque object
> facilities in the manner Bill has outlined.  If the facilities appear
> to be widely used, we can then include some variant in a standard
> library or build it more directly into the language.
>--------------------------------------------------------------------------

... I really think you missed the point.  For you to suggest that a
proposal for OPAQUE-TYPEs should not be considered until we've accumulated
experience with one particular implementation strategy (weak pointer), and
even then it might be best to simply relegate it to a standard library,
well... I must admit I was pretty surprised and astounded.

You seem to have been arguing vehemently for opacity in RECORDs solely so
you can pull clever tricks with eliding slots and so forth.  Tell me, how
is this different conceptually from implementing slots using some clever
weak pointer mechanism?  Yes, yes, I know there are other ways of doing
it...  but I insist you address an implementation strategy using weak
pointers.

It may be that we should consider including a separate, opaque record
facility (different from a non-opaque record facility), in which one
could play clever games of dropping unused slots.  If so, I propose that
we first add some form of weak vector slot and allow programmers... etc.

You get the idea.  I think your proposal that we delay any discussion of
standardizing an opaque object facility until we've standardized on weak
pointers is no less ludicrous.

Of course, I won't be contrite and actually veto a standardized opaque
RECORD interface.  I can see that it is a useful thing that many people
seem to want, even though most who want it already have their own non-
standardized version in whatever Scheme implementation they are using
already anyway.  Come to think of it, maybe it should just be part of a
standardized library.

>--------------------------------------------------------------------------
> > (I did casually browse your web page in search of some hint as to the
> > program analysis you have in mind, but I didn't see any titles in your
> > on-line publications that suggested such an analysis was discussed.
> > ...
> 
> Click on Mike Ashley's name under "Ph.D. Students" on my home page, and
> go from there to his 1996 POPL paper.  (You can go directly to Mike's
> page at http://www.eecs.ukans.edu/~jashley.)  This should give you a
> hint about where we're headed, although no explicit mention of records
> is made.  His dissertation will be out soon with more.
>--------------------------------------------------------------------------

Thanks.  I hope to get around to looking at this in the next several weeks.
I appreciate the pointer.