[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
What I learned from building Dylan/Thomas
I've been maintaining an unintentional "radio silence" for the last
several weeks while working with Ron Weiss and Matt Birkholz to build
the Thomas system. Now that I'm out of that hole, I'd like to update
others on some of what I learned in the process and respond to a few
of the questions raised earlier on these mailing lists.
I. Despite all of the hairy stuff involving keywords, specializers,
multiple inheritance and the rest of it, the generic dispatch system
of Dylan (maybe also CLOS?) was remarkably simple to build and
understand. The key procedure is in the file generic.scm, and is
simplified here (I removed support for multiple values, error
checking, and renamed some functions for clarity):
(define (generic-dispatch original-args generic-function)
(let ((applicable-methods
(sorted-applicable-methods
(generic-function.methods generic-function)
original-args)))
(let next-method-loop ((remaining-methods applicable-methods)
(current-args original-args))
(apply (car remaining-methods)
(if (null? (cdr remaining-methods))
#F
(lambda (next-method . these-args)
(next-method-loop (cdr remaining-methods)
(if (null? these-args)
current-args
these-args))))
current-args))))
This requires one non-trivial support routine:
sorted-applicable-methods finds the applicable methods and sorts
them from most- to least-specific. We used a topological sort of
all of the classes in the system. This is used to provide an
ordering of the classes for comparing with individual arguments to
the method, which seems logical enough:
(define (match-specializer? object specializer)
;; Computes a distance value from the object to the specializer.
;; This distance is either
;; #F -- doesn't match the specializer
;; 0 -- the specializer is a singleton for exactly this object
;; <n> -- ordering of the class representing this specializer in
;; the ordering of all classes (always >= 1)
(if (singleton? specializer)
(if (eq? object (singleton.object specializer)) 0 #F)
(if (subclass? (get-type object) specializer)
(class.generality specializer)
#F)))
We then used lexicographic ordering to break ties for multiple
argument dispatch, which is specified in Dylan but seems basically
arbitrary. This latter is a place where I'd expect a Scheme
system to say "unspecified".
I won't reproduce the intervening code, but if you look for a while
you'll see that sorted-applicable-methods requires two things:
a) A way, given a method under consideration, to find the
specializers that restrict the types of arguments to which it can
be applied. In response to one of the questions on the Scheme
list, this is where the need for WEAK-CONS arises. Since methods
are first-class in Dylan (Dylan/method <=> Scheme/lambda) and can
be attached at any time to a generic function, it is necessary at
runtime to be able to locate the specializers for a method. This
requires building a data structure (an a-list or hash table, for
example) that uses methods as keys to locate their specializers.
But such a table would grow indefinitely in size unless the
garbage collector removes entries that correspond to methods that
have been eliminated from the system. [There is a similar need
to locate a description of a generic function at runtime, used
for the error detection mechanism I eliminated from this
simplified code.]
b) A mechanism for determining the most specific type of an object
at runtime. The procedure "get-type" (in the file
runtime-top.scm) performs the operation. It is a kludge and very
order sensitive. I found this function rather unpleasant to
maintain as the system grew:
(define (get-type obj)
(cond ((instance? obj) (instance.class obj))
((number? obj) ; Might be wrong
(if (real? obj)
(if (exact? obj)
(if (integer? obj)
<integer>
<ratio>)
<float>)
<complex>))
((class? obj) <class>)
((singleton? obj) <singleton>)
((null? obj) <empty-list>)
((slot? obj) <slot-descriptor>)
((pair? obj) <pair>)
((vector? obj) <simple-object-vector>)
((string? obj) <byte-string>)
((char? obj) <character>)
((procedure? obj)
(cond ((dylan::generic-function? obj) <generic-function>)
((dylan::method? obj) <method>)
(else <function>)))
((keyword? obj) <keyword>)
((symbol? obj) <symbol>)
(else <object>)))
This observation has lead me back to a conversation we had about
the "portable record proposal" which went unresolved at our last
meeting. The issue revolved around the pair of requirements for
disjointness of types and opacity of representation. Several of
us discussed this at some length at the LISP conference and I
came away convinced of the need for a set of operations, separate
from the record package, for manufacturing disjoint user types.
At the Authors meeting, Jonathan Rees pointed out that he had
made such a proposal some time ago but that it had not been acted
upon. I'd like to have us act on it: Jonathan, can you find it
and resubmit it?
II. On the question about teaching generic dispatch as opposed to
object dispatch, I find myself for the first time disagreeing with
Brian Harvey. While I haven't actually tried to teach this material
for two years now, I seem to recollect that I taught numbers very
early in the course. They are the one place in Scheme where generic
operation is deeply built into the system, and it provides a simple
and logical place to start when explaining generic dispatch. The
symbolic algebra system at the end of Chapter 2 of Structure and
Interpretation takes this much farther, but perhaps at the expense of
the gut-level understanding of the issue that comes from simply asking
the question "How does Scheme do that with its numbers, anyway?". The
very simple example of adding integers to real numbers explains a lot
to anyone familiar with representations. For those unfamiliar with
numerical representations, the introduction of generic sequence
operations (like the ones in Dylan, but simplified) is another obvious
starting point. I find the INITIAL-STATE, NEXT-STATE, CURRENT-ELEMENT
iteration protocol very elegant (although my taste would add
CURRENT-KEY to all collections, not just explicit key sequences; even
at the cost of an extra cons-cell for the state of a list).
I, personally, have always found the object dispatch method both
limiting as a programming style and consequently rather unpleasant to
teach. Its main advantage, from what I've seen so far, is one of
modularity. The class heterarchy is conceptually elegant and works
well, but unfortunately leads to a number of questions about where an
object gets its behavior from, which inevitably leads to questions
about HOW the object was implemented; this is precisely the
abstraction violation that I thought object systems were intending to
avoid.
III. I've written my first large, portable Scheme program. It
required exactly two extensions to the language (WEAK operations, a
minimal error system), but is modularized in a way that makes the port
easy. I like this way of writing my programs, and I think it gives me
a new way to think about the issue of portability. The Scheme
language has the flexibility to allow most reasonable programs to be
divided into an implementation-independent part which calls procedures
defined in a implementation-dependent part. This isn't new; but this
is the first time I've actually tried to make it work and it did! The
ease of porting this system from one Scheme to another contrasts
rather sharply with my attempts to port MIT Scheme's implementation
from one version of C to another, and to similar experiences in
porting Pascal and BCPL programs in an earlier existence.
IV. To the Authors:
I'd really like to see some hooks into the garbage collector
standardized. This implementation experience, as well as a paper I
wrote several years ago that touches on some of the other things one
can do using a garbage collector, convinced me that it isn't going to
be that hard to figure out "the right set". I see two "must do" items
and two optional ones:
MUST DO 1: Have a way to specify a set of "precious" items which
the garbage collector is not at liberty to remove, but which it
will attempt to monitor and report when they are no longer
required for any other reason. The two optional sets are
specific implementations of this idea which exist in several
current Scheme implementations, but which may not be the best
abstraction of the service.
MUST DO 2: Have a way to explicitly invoke the equivalent of the
garbage collector's structure-preserving walk given an arbitrary
starting object. One form this could take is a procedure that
takes the root object and a vector; it stores into the vector all
the items found from walking that object and returns a count of
all of the objects encountered (this allows the operation to
occur (conceptually, at least) without CONSing). An alternate
formulation is to provide a function which takes an object and
returns a collection of the sub-components of that object
sufficient to allow a full structure walk from source-level
Scheme.
OPTIONAL 1: Have a way to run a procedure at the end of every
garbage collection before returning to user code. This is how
the semi-portable code in Thomas implements populations and one
dimensional "weak" tables on top of WEAK-CONS. It is one way to
perform the report function in MUST DO 1, but not the only
possible way. All three of the Schemes I used for Thomas (MIT,
Scheme->C, and Gambit) have some such hook, but all have slightly
different names and signatures for the function that installs the
GC daemon(s).
OPTIONAL 2: The WEAK-CONS set of operations (WEAK-PAIR?, WEAK-CONS,
WEAK-C{A,D}R, SET-WEAK-C{A,D}R!) are now also available in all
three Scheme systems. It took Joel Bartlett only one day to add
them to his conservative generational collector; they are nice
because of their simplicity. They lack the generality of MUST DO
1, and would require something like the finalization proposal
that was circulated before our last meeting (but not discussed).
I'd also like to see some people take a good, hard look at the Thomas
code for generic operations and the class hierarchy. It is very
portable and provides some very good capabilities. I'd like someone
to take an independent look at that code and see if it can be packaged
nicely and put into the (putative) Scheme library. If we can do that,
we should be able to add a lot of the Thomas runtime system to the
same Scheme library and have a very good start at a powerful
programming library -- one thing that (IMHO) has long held up the
adoption of Scheme as a "real" language.
--Jim