[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

What I learned from building Dylan/Thomas



I've been maintaining an unintentional "radio silence" for the last
several weeks while working with Ron Weiss and Matt Birkholz to build
the Thomas system.  Now that I'm out of that hole, I'd like to update
others on some of what I learned in the process and respond to a few
of the questions raised earlier on these mailing lists.

I. Despite all of the hairy stuff involving keywords, specializers,
multiple inheritance and the rest of it, the generic dispatch system
of Dylan (maybe also CLOS?) was remarkably simple to build and
understand.  The key procedure is in the file generic.scm, and is
simplified here (I removed support for multiple values, error
checking, and renamed some functions for clarity):

   (define (generic-dispatch original-args generic-function)
     (let ((applicable-methods
	    (sorted-applicable-methods
	     (generic-function.methods generic-function)
	     original-args)))
       (let next-method-loop ((remaining-methods applicable-methods)
			      (current-args original-args))
	 (apply (car remaining-methods)
		(if (null? (cdr remaining-methods))
		    #F
		    (lambda (next-method . these-args)
		      (next-method-loop (cdr remaining-methods)
					(if (null? these-args)
					    current-args
					    these-args))))
		current-args))))

This requires one non-trivial support routine:

  sorted-applicable-methods finds the applicable methods and sorts
    them from most- to least-specific.  We used a topological sort of
    all of the classes in the system.  This is used to provide an
    ordering of the classes for comparing with individual arguments to
    the method, which seems logical enough:

    (define (match-specializer? object specializer)
      ;; Computes a distance value from the object to the specializer.
      ;; This distance is either
      ;;   #F  -- doesn't match the specializer
      ;;    0  -- the specializer is a singleton for exactly this object
      ;;   <n> -- ordering of the class representing this specializer in
      ;;          the ordering of all classes (always >= 1)
      (if (singleton? specializer)
	  (if (eq? object (singleton.object specializer)) 0 #F)
	  (if (subclass? (get-type object) specializer)
	      (class.generality specializer)
	      #F)))

    We then used lexicographic ordering to break ties for multiple
    argument dispatch, which is specified in Dylan but seems basically
    arbitrary.  This latter is a place where I'd expect a Scheme
    system to say "unspecified".

  I won't reproduce the intervening code, but if you look for a while
  you'll see that sorted-applicable-methods requires two things:

  a) A way, given a method under consideration, to find the
     specializers that restrict the types of arguments to which it can
     be applied.  In response to one of the questions on the Scheme
     list, this is where the need for WEAK-CONS arises.  Since methods
     are first-class in Dylan (Dylan/method <=> Scheme/lambda) and can
     be attached at any time to a generic function, it is necessary at
     runtime to be able to locate the specializers for a method.  This
     requires building a data structure (an a-list or hash table, for
     example) that uses methods as keys to locate their specializers.
     But such a table would grow indefinitely in size unless the
     garbage collector removes entries that correspond to methods that
     have been eliminated from the system.  [There is a similar need
     to locate a description of a generic function at runtime, used
     for the error detection mechanism I eliminated from this
     simplified code.]

  b) A mechanism for determining the most specific type of an object
     at runtime.  The procedure "get-type" (in the file
     runtime-top.scm) performs the operation.  It is a kludge and very
     order sensitive.  I found this function rather unpleasant to
     maintain as the system grew:

     (define (get-type obj)
       (cond ((instance? obj) (instance.class obj))
	     ((number? obj)			; Might be wrong
	      (if (real? obj)
		  (if (exact? obj)
		      (if (integer? obj)
			  <integer>
			  <ratio>)
		      <float>)
		  <complex>))
	     ((class? obj) <class>)
	     ((singleton? obj) <singleton>)
	     ((null? obj) <empty-list>)
	     ((slot? obj) <slot-descriptor>)
	     ((pair? obj) <pair>)
	     ((vector? obj) <simple-object-vector>)
	     ((string? obj) <byte-string>)
	     ((char? obj) <character>)
	     ((procedure? obj)
	      (cond ((dylan::generic-function? obj) <generic-function>)
		    ((dylan::method? obj) <method>)
		    (else <function>)))
	     ((keyword? obj) <keyword>)
	     ((symbol? obj) <symbol>)
	     (else <object>)))

     This observation has lead me back to a conversation we had about
     the "portable record proposal" which went unresolved at our last
     meeting.  The issue revolved around the pair of requirements for
     disjointness of types and opacity of representation.  Several of
     us discussed this at some length at the LISP conference and I
     came away convinced of the need for a set of operations, separate
     from the record package, for manufacturing disjoint user types.
     At the Authors meeting, Jonathan Rees pointed out that he had
     made such a proposal some time ago but that it had not been acted
     upon.  I'd like to have us act on it: Jonathan, can you find it
     and resubmit it?

II. On the question about teaching generic dispatch as opposed to
object dispatch, I find myself for the first time disagreeing with
Brian Harvey.  While I haven't actually tried to teach this material
for two years now, I seem to recollect that I taught numbers very
early in the course.  They are the one place in Scheme where generic
operation is deeply built into the system, and it provides a simple
and logical place to start when explaining generic dispatch.  The
symbolic algebra system at the end of Chapter 2 of Structure and
Interpretation takes this much farther, but perhaps at the expense of
the gut-level understanding of the issue that comes from simply asking
the question "How does Scheme do that with its numbers, anyway?".  The
very simple example of adding integers to real numbers explains a lot
to anyone familiar with representations.  For those unfamiliar with
numerical representations, the introduction of generic sequence
operations (like the ones in Dylan, but simplified) is another obvious
starting point.  I find the INITIAL-STATE, NEXT-STATE, CURRENT-ELEMENT
iteration protocol very elegant (although my taste would add
CURRENT-KEY to all collections, not just explicit key sequences; even
at the cost of an extra cons-cell for the state of a list).

I, personally, have always found the object dispatch method both
limiting as a programming style and consequently rather unpleasant to
teach.  Its main advantage, from what I've seen so far, is one of
modularity.  The class heterarchy is conceptually elegant and works
well, but unfortunately leads to a number of questions about where an
object gets its behavior from, which inevitably leads to questions
about HOW the object was implemented; this is precisely the
abstraction violation that I thought object systems were intending to
avoid.

III. I've written my first large, portable Scheme program.  It
required exactly two extensions to the language (WEAK operations, a
minimal error system), but is modularized in a way that makes the port
easy.  I like this way of writing my programs, and I think it gives me
a new way to think about the issue of portability.  The Scheme
language has the flexibility to allow most reasonable programs to be
divided into an implementation-independent part which calls procedures
defined in a implementation-dependent part.  This isn't new; but this
is the first time I've actually tried to make it work and it did!  The
ease of porting this system from one Scheme to another contrasts
rather sharply with my attempts to port MIT Scheme's implementation
from one version of C to another, and to similar experiences in
porting Pascal and BCPL programs in an earlier existence.

IV. To the Authors:

I'd really like to see some hooks into the garbage collector
standardized.  This implementation experience, as well as a paper I
wrote several years ago that touches on some of the other things one
can do using a garbage collector, convinced me that it isn't going to
be that hard to figure out "the right set".  I see two "must do" items
and two optional ones:

   MUST DO 1: Have a way to specify a set of "precious" items which
     the garbage collector is not at liberty to remove, but which it
     will attempt to monitor and report when they are no longer
     required for any other reason.  The two optional sets are
     specific implementations of this idea which exist in several
     current Scheme implementations, but which may not be the best
     abstraction of the service.

   MUST DO 2: Have a way to explicitly invoke the equivalent of the
     garbage collector's structure-preserving walk given an arbitrary
     starting object.  One form this could take is a procedure that
     takes the root object and a vector; it stores into the vector all
     the items found from walking that object and returns a count of
     all of the objects encountered (this allows the operation to
     occur (conceptually, at least) without CONSing).  An alternate
     formulation is to provide a function which takes an object and
     returns a collection of the sub-components of that object
     sufficient to allow a full structure walk from source-level
     Scheme.

   OPTIONAL 1: Have a way to run a procedure at the end of every
     garbage collection before returning to user code.  This is how
     the semi-portable code in Thomas implements populations and one
     dimensional "weak" tables on top of WEAK-CONS.  It is one way to
     perform the report function in MUST DO 1, but not the only
     possible way.  All three of the Schemes I used for Thomas (MIT,
     Scheme->C, and Gambit) have some such hook, but all have slightly
     different names and signatures for the function that installs the
     GC daemon(s).

   OPTIONAL 2: The WEAK-CONS set of operations (WEAK-PAIR?, WEAK-CONS,
     WEAK-C{A,D}R, SET-WEAK-C{A,D}R!) are now also available in all
     three Scheme systems.  It took Joel Bartlett only one day to add
     them to his conservative generational collector; they are nice
     because of their simplicity.  They lack the generality of MUST DO
     1, and would require something like the finalization proposal
     that was circulated before our last meeting (but not discussed).

I'd also like to see some people take a good, hard look at the Thomas
code for generic operations and the class hierarchy.  It is very
portable and provides some very good capabilities.  I'd like someone
to take an independent look at that code and see if it can be packaged
nicely and put into the (putative) Scheme library.  If we can do that,
we should be able to add a lot of the Thomas runtime system to the
same Scheme library and have a very good start at a powerful
programming library -- one thing that (IMHO) has long held up the
adoption of Scheme as a "real" language.

--Jim