Go to the first, previous, next, last section, table of contents.

Associations

MIT Scheme provides several mechanisms for associating objects with one another. Each of these mechanisms creates a link between one or more objects, called keys, and some other object, called a datum. Beyond this common idea, however, each of the mechanisms has various different properties that make it appropriate in different situations:

Association lists are one of Lisp's oldest association mechanisms. Because they are made from ordinary pairs, they are easy to build and manipulate, and very flexible in use. However, the average lookup time for an association list is linear in the number of associations.
1D tables have a very simple interface, making them easy to use, and offer the feature that they do not prevent their keys from being reclaimed by the garbage collector. Like association lists, their average lookup time is linear in the number of associations; but 1D tables aren't as flexible.
The association table is MIT Scheme's equivalent to the property lists of Lisp. It has the advantages that the keys may be any type of object and that it does not prevent the keys from being reclaimed by the garbage collector. However, two linear-time lookups must be performed, one for each key, whereas for traditional property lists only one is lookup required for both keys.
Hash tables are a powerful mechanism with constant-time access to large amounts of data. Hash tables are not as flexible as association lists, but because their access times are independent of the number of associations in the table, for most applications they are the mechanism of choice.
Balanced binary trees are another association mechanism that is useful for applications in which the keys are ordered. Binary trees have access times that are proportional to the logarithm of the number of associations in the tree. While they aren't as fast as hash tables, they offer the advantage that the contents of the tree can be converted to a sorted alist in linear time. Additionally, two trees can be compared for equality in worst-case linear time.
Red-Black trees are a kind of balanced binary tree. The implementation supports destructive insertion and deletion operations with a good constant factor.
Weight-Balanced trees are a kind of balanced binary tree. The implementation provides non-destructive operations. There is a comprehensive set of operations, including: a constant-time size operation; many high-level operations such as the set operations union, intersection and difference; and indexing of elements by position.

Association Lists

An association list, or alist, is a data structure used very frequently in Scheme. An alist is a list of pairs, each of which is called an association. The car of an association is called the key.

An advantage of the alist representation is that an alist can be incrementally augmented simply by adding new entries to the front. Moreover, because the searching procedures assv et al. search the alist in order, new entries can "shadow" old entries. If an alist is viewed as a mapping from keys to data, then the mapping can be not only augmented but also altered in a non-destructive manner by adding new entries to the front of the alist.(14)

procedure+: alist? object: Returns #t if object is an association list (including the empty list); otherwise returns #f. Any object satisfying this predicate also satisfies list?.

procedure: assq object alist

procedure: assv object alist

procedure: assoc object alist

These procedures find the first pair in alist whose car field is object, and return that pair; the returned pair is always an element of alist, not one of the pairs from which alist is composed. If no pair in alist has object as its car, #f (n.b.: not the empty list) is returned. assq uses eq? to compare object with the car fields of the pairs in alist, while assv uses eqv? and assoc uses equal?.(15)

(define e '((a 1) (b 2) (c 3)))
(assq 'a e)                             =>  (a 1)
(assq 'b e)                             =>  (b 2)
(assq 'd e)                             =>  #f
(assq (list 'a) '(((a)) ((b)) ((c))))   =>  #f
(assoc (list 'a) '(((a)) ((b)) ((c))))  =>  ((a))
(assq 5 '((2 3) (5 7) (11 13)))         =>  unspecified
(assv 5 '((2 3) (5 7) (11 13)))         =>  (5 7)

procedure+: association-procedure predicate selector

Returns an association procedure that is similar to assv, except that selector (a procedure of one argument) is used to select the key from the association, and predicate (an equivalence predicate) is used to compare the key to the given item. This can be used to make association lists whose elements are, say, vectors instead of pairs (also see section Searching Lists).

For example, here is how assv could be implemented:

(define assv (association-procedure eqv? car))

Another example is a "reverse association" procedure:

(define rassv (association-procedure eqv? cdr))

procedure+: del-assq object alist

procedure+: del-assv object alist

procedure+: del-assoc object alist

These procedures return a newly allocated copy of alist in which all associations with keys equal to object have been removed. Note that while the returned copy is a newly allocated list, the association pairs that are the elements of the list are shared with alist, not copied. del-assq uses eq? to compare object with the keys, while del-assv uses eqv? and del-assoc uses equal?.

(define a
  '((butcher . "231 e22nd St.")
    (baker . "515 w23rd St.")
    (hardware . "988 Lexington Ave.")))

(del-assq 'baker a)
     =>
     ((butcher . "231 e22nd St.")
      (hardware . "988 Lexington Ave."))

procedure+: del-assq! object alist
procedure+: del-assv! object alist
procedure+: del-assoc! object alist: These procedures remove from alist all associations with keys equal to object. They return the resulting list. del-assq! uses eq? to compare object with the keys, while del-assv! uses eqv? and del-assoc! uses equal?. These procedures are like del-assq, del-assv, and del-assoc, respectively, except that they destructively modify alist.

procedure+: delete-association-procedure deletor predicate selector

This returns a deletion procedure similar to del-assv or del-assq!. The predicate and selector arguments are the same as those for association-procedure, while the deletor argument should be either the procedure list-deletor (for non-destructive deletions), or the procedure list-deletor! (for destructive deletions).

For example, here is a possible implementation of del-assv:

(define del-assv 
  (delete-association-procedure list-deletor eqv? car))

procedure+: alist-copy alist

Returns a newly allocated copy of alist. This is similar to list-copy except that the "association" pairs, i.e. the elements of the list alist, are also copied. alist-copy could have been implemented like this:

(define (alist-copy alist)
  (if (null? alist)
      '()
      (cons (cons (car (car alist)) (cdr (car alist)))
            (alist-copy (cdr alist)))))

1D Tables

1D tables ("one-dimensional" tables) are similar to association lists. In a 1D table, unlike an association list, the keys of the table are held weakly: if a key is garbage-collected, its associated value in the table is removed. 1D tables compare their keys for equality using eq?.

1D tables can often be used as a higher-performance alternative to the two-dimensional association table (see section The Association Table). If one of the keys being associated is a compound object such as a vector, a 1D table can be stored in one of the vector's slots. Under these circumstances, accessing items in a 1D table will be comparable in performance to using a property list in a conventional Lisp.

procedure+: make-1d-table: Returns a newly allocated empty 1D table.

procedure+: 1d-table? object: Returns #t if object is a 1D table, otherwise returns #f. Any object that satisfies this predicate also satisfies list?.

procedure+: 1d-table/put! 1d-table key datum: Creates an association between key and datum in 1d-table. Returns an unspecified value.

procedure+: 1d-table/remove! 1d-table key: Removes any association for key in 1d-table and returns an unspecified value.

procedure+: 1d-table/get 1d-table key default: Returns the datum associated with key in 1d-table. If there is no association for key, default is returned.

procedure+: 1d-table/lookup 1d-table key if-found if-not-found: If-found must be a procedure of one argument, and if-not-found must be a procedure of no arguments. If 1d-table contains an association for key, if-found is invoked on the datum of the association. Otherwise, if-not-found is invoked with no arguments. In either case, the result of the invoked procedure is returned as the result of 1d-table/lookup.

procedure+: 1d-table/alist 1d-table: Returns a newly allocated association list that contains the same information as 1d-table.

The Association Table

MIT Scheme provides a generalization of the property-list mechanism found in most other implementations of Lisp: a global two-dimensional association table. This table is indexed by two keys, called x-key and y-key in the following procedure descriptions. These keys and the datum associated with them can be arbitrary objects. eq? is used to discriminate keys.

Think of the association table as a matrix: a single datum can be accessed using both keys, a column using x-key only, and a row using y-key only.

procedure+: 2d-put! x-key y-key datum: Makes an entry in the association table that associates datum with x-key and y-key. Returns an unspecified result.

procedure+: 2d-remove! x-key y-key: If the association table has an entry for x-key and y-key, it is removed. Returns an unspecified result.

procedure+: 2d-get x-key y-key: Returns the datum associated with x-key and y-key. Returns #f if no such association exists.

procedure+: 2d-get-alist-x x-key

Returns an association list of all entries in the association table that are associated with x-key. The result is a list of (y-key . datum) pairs. Returns the empty list if no entries for x-key exist.

(2d-put! 'foo 'bar 5)
(2d-put! 'foo 'baz 6)
(2d-get-alist-x 'foo)                   =>  ((baz . 6) (bar . 5))

procedure+: 2d-get-alist-y y-key

Returns an association list of all entries in the association table that are associated with y-key. The result is a list of (x-key . datum) pairs. Returns the empty list if no entries for y-key exist.

(2d-put! 'bar 'foo 5)
(2d-put! 'baz 'foo 6)
(2d-get-alist-y 'foo)                   =>  ((baz . 6) (bar . 5))

Hash Tables

Hash tables are a fast, powerful mechanism for storing large numbers of associations. MIT Scheme's hash tables feature automatic resizing, customizable growth parameters, and customizable hash procedures.

The average times for the insertion, deletion, and lookup operations on a hash table are bounded by a constant. The space required by the table is proportional to the number of associations in the table; the constant of proportionality is described below (see section Resizing of Hash Tables).

The hash-table implementation is a run-time-loadable option. To use hash tables, execute

(load-option 'hash-table)

once before calling any of the procedures defined here.

Construction of Hash Tables

The next few procedures are hash-table constructors. All hash table constructors are procedures that accept one optional argument, initial-size, and return a newly allocated hash table. If initial-size is given, it must be an exact non-negative integer or #f. The meaning of initial-size is discussed below (see section Resizing of Hash Tables).

Hash tables are normally characterized by two things: the equivalence predicate that is used to compare keys, and whether or not the table allows its keys to be reclaimed by the garbage collector. If a table prevents its keys from being reclaimed by the garbage collector, it is said to hold its keys strongly; otherwise it holds its keys weakly (see section Weak Pairs).

procedure+: make-eq-hash-table [initial-size]

Returns a newly allocated hash table that accepts arbitrary objects as keys, and compares those keys with eq?. The keys are held weakly. These are the fastest of the standard hash tables.

For compatibility with old code, make-symbol-hash-table is a synonym for this procedure.

procedure+: make-eqv-hash-table [initial-size]

Returns a newly allocated hash table that accepts arbitrary objects as keys, and compares those keys with eqv?. The keys are held weakly, except that booleans, characters, and numbers are held strongly. These hash tables are a little slower than those made by make-eq-hash-table.

For compatibility with old code, make-object-hash-table is a synonym for this procedure.

procedure+: make-equal-hash-table [initial-size]: Returns a newly allocated hash table that accepts arbitrary objects as keys, and compares those keys with equal?. The keys are held strongly. These hash tables are quite a bit slower than those made by make-eq-hash-table.

procedure+: make-string-hash-table [initial-size]: Returns a newly allocated hash table that accepts character strings as keys, and compares them with string=?. The keys are held strongly.

The next two procedures are used to create new hash-table constructors. All of the above hash table constructors, with the exception of make-eqv-hash-table, could have been created by calls to these "constructor-constructors"; see the examples below.

procedure+: strong-hash-table/constructor key-hash key=? [rehash-after-gc?]

procedure+: weak-hash-table/constructor key-hash key=? [rehash-after-gc?]

Each of these procedures accepts two arguments and returns a hash-table constructor. The key=? argument is an equivalence predicate for the keys of the hash table. The key-hash argument is a procedure that computes a hash number. Specifically, key-hash accepts two arguments, a key and an exact positive integer (the modulus), and returns an exact non-negative integer that is less than the modulus.

The optional argument rehash-after-gc?, if true, says that the values returned by key-hash might change after a garbage collection. If so, the hash-table implementation arranges for the table to be rehashed when necessary. (See section Address Hashing, for information about hash procedures that have this property.) Otherwise, it is assumed that key-hash always returns the same value for the same arguments. The default value of this argument is #f.

The constructors returned by strong-hash-table/constructor make hash tables that hold their keys strongly. The constructors returned by weak-hash-table/constructor make hash tables that hold their keys weakly.

Some examples showing how some standard hash-table constructors could have been defined:

(define make-eq-hash-table
  (weak-hash-table/constructor eq-hash-mod eq? #t))

(define make-equal-hash-table
  (strong-hash-table/constructor equal-hash-mod equal? #t))

(define make-string-hash-table
  (strong-hash-table/constructor string-hash-mod string=? #f))

The following procedure is sometimes useful in conjunction with weak hash tables. Normally it is not needed, because such hash tables clean themselves automatically as they are used.

procedure+: hash-table/clean! hash-table: If hash-table is a type of hash table that holds its keys weakly, this procedure recovers any space that was being used to record associations for objects that have been reclaimed by the garbage collector. Otherwise, this procedure does nothing. In either case, it returns an unspecified result.

Basic Hash Table Operations

The procedures described in this section are the basic operations on hash tables. They provide the functionality most often needed by programmers. Subsequent sections describe other operations that provide additional functionality needed by some applications.

procedure+: hash-table? object: Returns #t if object is a hash table, otherwise returns #f.

procedure+: hash-table/put! hash-table key datum: Associates datum with key in hash-table and returns an unspecified result. The average time required by this operation is bounded by a constant.

procedure+: hash-table/get hash-table key default: Returns the datum associated with key in hash-table. If there is no association for key, default is returned. The average time required by this operation is bounded by a constant.

procedure+: hash-table/remove! hash-table key: If hash-table has an association for key, removes it. Returns an unspecified result. The average time required by this operation is bounded by a constant.

procedure+: hash-table/clear! hash-table: Removes all associations in hash-table and returns an unspecified result. The average and worst-case times required by this operation are bounded by a constant.

procedure+: hash-table/count hash-table: Returns the number of associations in hash-table as an exact non-negative integer. If hash-table holds its keys weakly, this is a conservative upper bound that may count some associations whose keys have recently been reclaimed by the garbage collector. The average and worst-case times required by this operation are bounded by a constant.

procedure+: hash-table->alist hash-table: Returns the contents of hash-table as a newly allocated alist. Each element of the alist is a pair (key . datum) where key is one of the keys of hash-table, and datum is its associated datum. The average and worst-case times required by this operation are linear in the number of associations in the table.

procedure+: hash-table/key-list hash-table: Returns a newly allocated list of the keys in hash-table. The average and worst-case times required by this operation are proportional to the number of associations in the table.

procedure+: hash-table/datum-list hash-table: Returns a newly allocated list of the datums in hash-table. Each element of the list corresponds to one of the associations in hash-table; if the table contains multiple associations with the same datum, so will this list. The average and worst-case times required by this operation are proportional to the number of associations in the table.

procedure+: hash-table/for-each hash-table procedure: Procedure must be a procedure of two arguments. Invokes procedure once for each association in hash-table, passing the association's key and datum as arguments, in that order. Returns an unspecified result. Procedure must not modify hash-table, with one exception: it is permitted to call hash-table/remove! to remove the association being processed.

The following procedure is an alternate form of hash-table/get that is useful in some situations. Usually, hash-table/get is preferable because it is faster.

procedure+: hash-table/lookup hash-table key if-found if-not-found: If-found must be a procedure of one argument, and if-not-found must be a procedure of no arguments. If hash-table contains an association for key, if-found is invoked on the datum of the association. Otherwise, if-not-found is invoked with no arguments. In either case, the result yielded by the invoked procedure is returned as the result of hash-table/lookup (hash-table/lookup reduces into the invoked procedure, i.e. calls it tail-recursively). The average time required by this operation is bounded by a constant.

Resizing of Hash Tables

Normally, hash tables automatically resize themselves according to need. Because of this, the programmer need not be concerned with management of the table's size. However, some limited control over the table's size is provided, which will be discussed below. This discussion involves two concepts, usable size and physical size, which we will now define.

The usable size of a hash table is the number of associations that the table can hold at a given time. If the number of associations in the table exceeds the usable size, the table will automatically grow, increasing the usable size to a new value that is sufficient to hold the associations.

The physical size is an abstract measure of a hash table that specifies how much space is allocated to hold the associations of the table. The physical size is always greater than or equal to the usable size. The physical size is not interesting in itself; it is interesting only for its effect on the performance of the hash table. While the average performance of a hash-table lookup is bounded by a constant, the worst-case performance is not. For a table containing a given number of associations, increasing the physical size of the table decreases the probability that worse-than-average performance will occur.

The physical size of a hash table is statistically related to the number of associations. However, it is possible to place bounds on the physical size, and from this to estimate the amount of space used by the table:

(define (hash-table-space-bounds count rehash-size rehash-threshold)
  (let ((tf (/ 1 rehash-threshold)))
    (values (if (exact-integer? rehash-size)
                (- (* count (+ 4 tf))
                   (* tf (+ rehash-size rehash-size)))
                (* count (+ 4 (/ tf (* rehash-size rehash-size)))))
            (* count (+ 4 tf)))))

What this formula shows is that, for a "normal" rehash size (that is, not an exact integer), the amount of space used by the hash table is proportional to the number of associations in the table. The constant of proportionality varies statistically, with the low bound being

(+ 4 (/ (/ 1 rehash-threshold) (* rehash-size rehash-size)))

and the high bound being

(+ 4 (/ 1 rehash-threshold))

which, for the default values of these parameters, are 4.25 and 5, respectively. Reducing the rehash size will tighten these bounds, but increases the amount of time spent resizing, so you can see that the rehash size gives some control over the time-space tradeoff of the table.

The programmer can control the size of a hash table by means of three parameters:

Each table's initial-size may be specified when the table is created.
Each table has a rehash size that specifies how the size of the table is changed when it is necessary to grow or shrink the table.
Each table has a rehash threshold that specifies the relationship of the table's physical size to its usable size.

If the programmer knows that the table will initially contain a specific number of items, initial-size can be given when the table is created. If initial-size is an exact non-negative integer, it specifies the initial usable size of the hash table; the table will not change size until the number of items in the table exceeds initial-size, after which automatic resizing is enabled and initial-size no longer has any effect. Otherwise, if initial-size is not given or is #f, the table is initialized to an unspecified size and automatic resizing is immediately enabled.

The rehash size specifies how much to increase the usable size of the hash table when it becomes full. It is either an exact positive integer, or a real number greater than one. If it is an integer, the new size is the sum of the old size and the rehash size. Otherwise, it is a real number, and the new size is the product of the old size and the rehash size. Increasing the rehash size decreases the average cost of an insertion, but increases the average amount of space used by the table. The rehash size of a table may be altered dynamically by the application in order to optimize the resizing of the table; for example, if the table will grow quickly for a known period and afterwards will not change size, performance might be improved by using a large rehash size during the growth phase and a small one during the static phase. The default rehash size of a newly constructed hash table is 2.0.

Note well: The use of an exact positive integer for a rehash size is almost always undesirable; this option is provided solely for compatibility with the Common Lisp hash-table mechanism. The reason for this has to do with the time penalty for resizing the hash table. The time needed to resize a hash table is proportional to the number of associations in the table. This resizing cost is amortized across the insertions required to fill the table to the point where it needs to grow again. If the table grows by an amount proportional to the number of associations, then the cost of resizing and the increase in size are both proportional to the number of associations, so the amortized cost of an insertion operation is still bounded by a constant. However, if the table grows by a constant amount, this is not true: the amortized cost of an insertion is not bounded by a constant. Thus, using a constant rehash size means that the average cost of an insertion increases proportionally to the number of associations in the hash table.

The rehash threshold is a real number, between zero exclusive and one inclusive, that specifies the ratio between a hash table's usable size and its physical size. Decreasing the rehash threshold decreases the probability of worse-than-average insertion, deletion, and lookup times, but increases the physical size of the table for a given usable size. The default rehash threshold of a newly constructed hash table is 1.

procedure+: hash-table/size hash-table: Returns the usable size of hash-table as an exact positive integer. This is the number of associations that hash-table can hold before it will grow.

procedure+: hash-table/rehash-size hash-table: Returns the rehash size of hash-table.

procedure+: set-hash-table/rehash-size! hash-table x: X must be either an exact positive integer, or a real number that is greater than one. Sets the rehash size of hash-table to x and returns an unspecified result. This operation adjusts the "shrink threshold" of the table; the table might shrink if the number of associations is less than the new threshold.

procedure+: hash-table/rehash-threshold hash-table: Returns the rehash threshold of hash-table.

procedure+: set-hash-table/rehash-threshold! hash-table x: X must be a real number between zero exclusive and one inclusive. Sets the rehash threshold of hash-table to x and returns an unspecified result. This operation does not change the usable size of the table, but it usually changes the physical size of the table, which causes the table to be rehashed.

Address Hashing

The procedures described in this section may be used to make very efficient key-hashing procedures for arbitrary objects. All of these procedures are based on address hashing, which uses the address of an object as its hash number. The great advantage of address hashing is that converting an arbitrary object to a hash number is extremely fast and takes the same amount of time for any object.

The disadvantage of address hashing is that the garbage collector changes the addresses of most objects. The hash-table implementation compensates for this disadvantage by automatically rehashing tables that use address hashing when garbage collections occur. Thus, in order to use these procedures for key hashing, it is necessary to tell the hash-table implementation (by means of the rehash-after-gc? argument to the "constructor-constructor" procedure) that the hash numbers computed by your key-hashing procedure must be recomputed after a garbage collection.

procedure+: eq-hash object
procedure+: eqv-hash object
procedure+: equal-hash object: These procedures return a hash number for object. The result is always a non-negative integer, and in the case of eq-hash, a non-negative fixnum. Two objects that are equivalent according to eq?, eqv?, or equal?, respectively, will produce the same hash number when passed as arguments to these procedures, provided that the garbage collector does not run during or between the two calls.

The following procedures are the key-hashing procedures used by the standard address-hash-based hash tables.

procedure+: eq-hash-mod object modulus: This procedure is the key-hashing procedure used by make-eq-hash-table.

procedure+: eqv-hash-mod object modulus: This procedure is the key-hashing procedure used by make-eqv-hash-table.

procedure+: equal-hash-mod object modulus: This procedure is the key-hashing procedure used by make-equal-hash-table.

Low-Level Hash Table Operations

The procedures in this section allow the programmer to control some of the internal structure of a hash table. Normally, hash tables maintain associations between keys and datums using pairs or weak pairs. These procedures allow the programmer to specify the use of some other data structure to maintain the association. In this section, the data structure that represents an association in a hash table is called an entry.

procedure+: hash-table/constructor key-hash key=? make-entry entry-valid? entry-key entry-datum set-entry-datum! [rehash-after-gc?]

Creates and returns a hash-table constructor procedure (see section Construction of Hash Tables). The arguments to hash-table/constructor define the characteristics of the hash table as follows:

key-hash: The hashing procedure. A procedure that accepts two arguments, a key and an exact positive integer (the modulus), and returns an exact non-negative integer that is less than the modulus.
key=?: A equivalence predicate that accepts two keys and is true iff they are the same key. If this predicate is true of two keys, then key-hash must return the same value for each of these keys (given the same modulus in both cases).
make-entry: A procedure that accepts a key and a datum as arguments and returns a newly allocated entry.
entry-valid?: A procedure that accepts an entry and returns #f iff the entry's key has been reclaimed by the garbage collector. Instead of a procedure, this may be #t, which is equivalent to (lambda (entry) #t).
entry-key: A procedure that accepts an entry as an argument and returns the entry's key.
entry-datum: A procedure that accepts an entry as an argument and returns the entry's datum.
set-entry-datum!: A procedure that accepts an entry and an object as arguments, modifies the entry's datum to be the object, and returns an unspecified result.
rehash-after-gc?: An optional argument that, if true, says the values returned by key-hash might change after a garbage collection. If so, the hash-table implementation arranges for the table to be rehashed when necessary. (See section Address Hashing, for information about hash procedures that have this property.) Otherwise, it is assumed that key-hash always returns the same value for the same arguments. The default value of this argument is #f.

For example, here is how the constructors for ordinary hash tables could be defined:

(define (strong-hash-table/constructor key-hash key=?
                                       #!optional rehash-after-gc?)
  (hash-table/constructor key-hash key=? cons #t car cdr set-cdr!
                          (if (default-object? rehash-after-gc?)
                              #f
                              rehash-after-gc?)))

(define (weak-hash-table/constructor key-hash key=?
                                     #!optional rehash-after-gc?)
  (hash-table/constructor key-hash key=? weak-cons weak-pair/car?
                          weak-car weak-cdr weak-set-cdr!
                          (if (default-object? rehash-after-gc?)
                              #f
                              rehash-after-gc?)))

procedure+: hash-table/key-hash hash-table
procedure+: hash-table/key=? hash-table
procedure+: hash-table/make-entry hash-table
procedure+: hash-table/entry-valid? hash-table
procedure+: hash-table/entry-key hash-table
procedure+: hash-table/entry-datum hash-table
procedure+: hash-table/set-entry-datum! hash-table: Each of these procedures corresponds to an argument of hash-table/constructor. When called, each procedure returns the value of the corresponding argument that was used to construct hash-table.

The following procedures return the contents of a hash table as a collection of entries. While the data structure holding the entries is newly allocated, the entries themselves are not copied. Since hash table operations can modify these entries, the entries should be copied if it is desired to keep them while continuing to modify the table.

procedure+: hash-table/entries-list hash-table: Returns a newly allocated list of the entries in hash-table.

procedure+: hash-table/entries-vector hash-table

Returns a newly allocated vector of the entries in hash-table. Equivalent to

(list->vector (hash-table/entries-list hash-table))

Object Hashing

The MIT Scheme object-hashing facility provides a mechanism for generating a unique hash number for an arbitrary object. This hash number, unlike an object's address, is unchanged by garbage collection. The object-hashing facility is useful in conjunction with hash tables, but it may be used for other things as well. In particular, it is used in the generation of the written representation for some objects (see section Custom Output).

All of these procedures accept an optional argument called table; this table contains the object-integer associations. If given, this argument must be an object-hash table as constructed by hash-table/make (see below). If not given, a default table is used.

procedure+: hash object [table]: hash associates an exact non-negative integer with object and returns that integer. If hash was previously called with object as its argument, the integer returned is the same as was returned by the previous call. hash guarantees that distinct objects (in the sense of eq?) are associated with distinct integers.

procedure+: unhash k [table]: unhash takes an exact non-negative integer k and returns the object associated with that integer. If there is no object associated with k, or if the object previously associated with k has been reclaimed by the garbage collector, an error of type condition-type:bad-range-argument is signalled. In other words, if hash previously returned k for some object, and that object has not been reclaimed, it is the value of the call to unhash.

An object that is passed to hash as an argument is not protected from being reclaimed by the garbage collector. If all other references to that object are eliminated, the object will be reclaimed. Subsequently calling unhash with the hash number of the (now reclaimed) object will signal an error.

(define x (cons 0 0))           =>  unspecified
(hash x)                        =>  77
(eqv? (hash x) (hash x))        =>  #t
(define x 0)                    =>  unspecified
(gc-flip)                       ;force a garbage collection
(unhash 77)                     error-->

procedure+: object-hashed? object [table]: This predicate is true if object has an associated hash number. Otherwise it is false.

procedure+: valid-hash-number? k [table]: This predicate is true if k is the hash number associated with some object. Otherwise it is false.

The following two procedures provide a lower-level interface to the object-hashing mechanism.

procedure+: object-hash object [table [insert?]]

object-hash is like hash, except that it accepts an additional optional argument, insert?. If insert? is supplied and is #f, object-hash will return an integer for object only if there is already an association in the table; otherwise, it will return #f. If insert? is not supplied, or is not #f, object-hash always returns an integer, creating an association in the table if necessary.

object-hash additionally treats #f differently than does hash. Calling object-hash with #f as its argument will return an integer that, when passed to unhash, will signal an error rather than returning #f. Likewise, valid-hash-number? will return #f for this integer.

procedure+: object-unhash k [table]: object-unhash is like unhash, except that when k is not associated with any object or was previously associated with an object that has been reclaimed, object-unhash returns #f. This means that there is an ambiguity in the value returned by object-unhash: if #f is returned, there is no way to tell if k is associated with #f or is not associated with any object at all.

Finally, this procedure makes new object-hash tables:

procedure+: hash-table/make: This procedure creates and returns a new, empty object-hash table that is suitable for use as the optional table argument to the above procedures. The returned table contains no associations.

Red-Black Trees

Balanced binary trees are a useful data structure for maintaining large sets of associations whose keys are ordered. While most applications involving large association sets should use hash tables, some applications can benefit from the use of binary trees. Binary trees have two advantages over hash tables:

The contents of a binary tree can be converted to an alist, sorted by key, in time proportional to the number of associations in the tree. A hash table can be converted into an unsorted alist in linear time; sorting it requires additional time.
Two binary trees can be compared for equality in linear time. Hash tables, on the other hand, cannot be compared at all; they must be converted to alists before comparison can be done, and alist comparison is quadratic unless the alists are sorted.

MIT Scheme provides an implementation of red-black trees. The red-black tree-balancing algorithm provides generally good performance because it doesn't try to keep the tree very closely balanced. At any given node in the tree, one side of the node can be twice as high as the other in the worst case. With typical data the tree will remain fairly well balanced anyway.

A red-black tree takes space that is proportional to the number of associations in the tree. For the current implementation, the constant of proportionality is eight words per association.

Red-black trees hold their keys strongly. In other words, if a red-black tree contains an association for a given key, that key cannot be reclaimed by the garbage collector.

The red-black tree implementation is a run-time-loadable option. To use red-black trees, execute

(load-option 'rb-tree)

once before calling any of the procedures defined here.

procedure+: make-rb-tree key=? key<?: This procedure creates and returns a newly allocated red-black tree. The tree contains no associations. Key=? and key<? are predicates that compare two keys and determine whether they are equal to or less than one another, respectively. For any two keys, at most one of these predicates is true.

procedure+: rb-tree? object: Returns #t if object is a red-black tree, otherwise returns #f.

procedure+: rb-tree/insert! rb-tree key datum: Associates datum with key in rb-tree and returns an unspecified value. If rb-tree already has an association for key, that association is replaced. The average and worst-case times required by this operation are proportional to the logarithm of the number of assocations in rb-tree.

procedure+: rb-tree/lookup rb-tree key default: Returns the datum associated with key in rb-tree. If rb-tree doesn't contain an association for key, default is returned. The average and worst-case times required by this operation are proportional to the logarithm of the number of assocations in rb-tree.

procedure+: rb-tree/delete! rb-tree key: If rb-tree contains an association for key, removes it. Returns an unspecified value. The average and worst-case times required by this operation are proportional to the logarithm of the number of assocations in rb-tree.

procedure+: rb-tree->alist rb-tree: Returns the contents of rb-tree as a newly allocated alist. Each element of the alist is a pair (key . datum) where key is one of the keys of rb-tree, and datum is its associated datum. The alist is sorted by key according to the key<? argument used to construct rb-tree. The time required by this operation is proportional to the number of associations in the tree.

procedure+: rb-tree/key-list rb-tree: Returns a newly allocated list of the keys in rb-tree. The list is sorted by key according to the key<? argument used to construct rb-tree. The time required by this operation is proportional to the number of associations in the tree.

procedure+: rb-tree/datum-list rb-tree

Returns a newly allocated list of the datums in rb-tree. Each element of the list corresponds to one of the associations in rb-tree, so if the tree contains multiple associations with the same datum, so will this list. The list is sorted by the keys of the associations, even though they do not appear in the result. The time required by this operation is proportional to the number of associations in the tree.

This procedure is equivalent to:

(lambda (rb-tree) (map cdr (rb-tree->alist rb-tree)))

procedure+: rb-tree/equal? rb-tree-1 rb-tree-2 datum=?: Compares rb-tree-1 and rb-tree-2 for equality, returning #t iff they are equal and #f otherwise. The trees must have been constructed with the same equality and order predicates (same in the sense of eq?). The keys of the trees are compared using the key=? predicate used to build the trees, while the datums of the trees are compared using the equivalence predicate datum=?. The worst-case time required by this operation is proportional to the number of associations in the tree.

procedure+: rb-tree/empty? rb-tree: Returns #t iff rb-tree contains no associations. Otherwise returns #f.

procedure+: rb-tree/size rb-tree: Returns the number of associations in rb-tree, an exact non-negative integer. The average and worst-case times required by this operation are proportional to the number of associations in the tree.

procedure+: rb-tree/height rb-tree

Returns the height of rb-tree, an exact non-negative integer. This is the length of the longest path from a leaf of the tree to the root. The average and worst-case times required by this operation are proportional to the number of associations in the tree.

The returned value satisfies the following:

(lambda (rb-tree)
  (let ((size (rb-tree/size rb-tree))
        (lg (lambda (x) (/ (log x) (log 2)))))
    (<= (lg size)
        (rb-tree/height rb-tree)
        (* 2 (lg (+ size 1))))))

procedure+: rb-tree/copy rb-tree: Returns a newly allocated copy of rb-tree. The copy is identical to rb-tree in all respects, except that changes to rb-tree do not affect the copy, and vice versa. The time required by this operation is proportional to the number of associations in the tree.

procedure+: alist->rb-tree alist key=? key<?

Returns a newly allocated red-black tree that contains the same associations as alist. This procedure is equivalent to:

(lambda (alist key=? key<?)
  (let ((tree (make-rb-tree key=? key<?)))
    (for-each (lambda (association)
                (rb-tree/insert! tree
                                 (car association)
                                 (cdr association)))
              alist)
    tree))

Weight-Balanced Trees

Balanced binary trees are a useful data structure for maintaining large sets of ordered objects or sets of associations whose keys are ordered. MIT Scheme has a comprehensive implementation of weight-balanced binary trees which has several advantages over the other data structures for large aggregates:

In addition to the usual element-level operations like insertion, deletion and lookup, there is a full complement of collection-level operations, like set intersection, set union and subset test, all of which are implemented with good orders of growth in time and space. This makes weight-balanced trees ideal for rapid prototyping of functionally derived specifications.
An element in a tree may be indexed by its position under the ordering of the keys, and the ordinal position of an element may be determined, both with reasonable efficiency.
Operations to find and remove minimum element make weight-balanced trees simple to use for priority queues.
The implementation is functional rather than imperative. This means that operations like `inserting' an association in a tree do not destroy the old tree, in much the same way that (+ 1 x) modifies neither the constant 1 nor the value bound to x. The trees are referentially transparent thus the programmer need not worry about copying the trees. Referential transparency allows space efficiency to be achieved by sharing subtrees.

These features make weight-balanced trees suitable for a wide range of applications, especially those that require large numbers of sets or discrete maps. Applications that have a few global databases and/or concentrate on element-level operations like insertion and lookup are probably better off using hash tables or red-black trees.

The size of a tree is the number of associations that it contains. Weight-balanced binary trees are balanced to keep the sizes of the subtrees of each node within a constant factor of each other. This ensures logarithmic times for single-path operations (like lookup and insertion). A weight-balanced tree takes space that is proportional to the number of associations in the tree. For the current implementation, the constant of proportionality is six words per association.

Weight-balanced trees can be used as an implementation for either discrete sets or discrete maps (associations). Sets are implemented by ignoring the datum that is associated with the key. Under this scheme if an association exists in the tree this indicates that the key of the association is a member of the set. Typically a value such as (), #t or #f is associated with the key.

Many operations can be viewed as computing a result that, depending on whether the tree arguments are thought of as sets or maps, is known by two different names. An example is wt-tree/member?, which, when regarding the tree argument as a set, computes the set membership operation, but, when regarding the tree as a discrete map, wt-tree/member? is the predicate testing if the map is defined at an element in its domain. Most names in this package have been chosen based on interpreting the trees as sets, hence the name wt-tree/member? rather than wt-tree/defined-at?.

The weight-balanced tree implementation is a run-time-loadable option. To use weight-balanced trees, execute

(load-option 'wt-tree)

once before calling any of the procedures defined here.

Construction of Weight-Balanced Trees

Binary trees require there to be a total order on the keys used to arrange the elements in the tree. Weight-balanced trees are organized by types, where the type is an object encapsulating the ordering relation. Creating a tree is a two-stage process. First a tree type must be created from the predicate that gives the ordering. The tree type is then used for making trees, either empty or singleton trees or trees from other aggregate structures like association lists. Once created, a tree `knows' its type and the type is used to test compatibility between trees in operations taking two trees. Usually a small number of tree types are created at the beginning of a program and used many times throughout the program's execution.

procedure+: make-wt-tree-type key<?

This procedure creates and returns a new tree type based on the ordering predicate key<?. Key<? must be a total ordering, having the property that for all key values a, b and c:

(key<? a a)                         => #f
(and (key<? a b) (key<? b a))       => #f
(if (and (key<? a b) (key<? b c))
    (key<? a c)
    #t)                             => #t

Two key values are assumed to be equal if neither is less than the other by key<?.

Each call to make-wt-tree-type returns a distinct value, and trees are only compatible if their tree types are eq?. A consequence is that trees that are intended to be used in binary-tree operations must all be created with a tree type originating from the same call to make-wt-tree-type.

variable+: number-wt-type

A standard tree type for trees with numeric keys. Number-wt-type could have been defined by

(define number-wt-type (make-wt-tree-type  <))

variable+: string-wt-type

A standard tree type for trees with string keys. String-wt-type could have been defined by

(define string-wt-type (make-wt-tree-type  string<?))

procedure+: make-wt-tree wt-tree-type: This procedure creates and returns a newly allocated weight-balanced tree. The tree is empty, i.e. it contains no associations. Wt-tree-type is a weight-balanced tree type obtained by calling make-wt-tree-type; the returned tree has this type.

procedure+: singleton-wt-tree wt-tree-type key datum: This procedure creates and returns a newly allocated weight-balanced tree. The tree contains a single association, that of datum with key. Wt-tree-type is a weight-balanced tree type obtained by calling make-wt-tree-type; the returned tree has this type.

procedure+: alist->wt-tree tree-type alist

Returns a newly allocated weight-balanced tree that contains the same associations as alist. This procedure is equivalent to:

(lambda (type alist)
  (let ((tree (make-wt-tree type)))
    (for-each (lambda (association)
                (wt-tree/add! tree
                              (car association)
                              (cdr association)))
              alist)
    tree))

Basic Operations on Weight-Balanced Trees

This section describes the basic tree operations on weight-balanced trees. These operations are the usual tree operations for insertion, deletion and lookup, some predicates and a procedure for determining the number of associations in a tree.

procedure+: wt-tree? object: Returns #t if object is a weight-balanced tree, otherwise returns #f.

procedure+: wt-tree/empty? wt-tree: Returns #t if wt-tree contains no associations, otherwise returns #f.

procedure+: wt-tree/size wt-tree: Returns the number of associations in wt-tree, an exact non-negative integer. This operation takes constant time.

procedure+: wt-tree/add wt-tree key datum: Returns a new tree containing all the associations in wt-tree and the association of datum with key. If wt-tree already had an association for key, the new association overrides the old. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in wt-tree.

procedure+: wt-tree/add! wt-tree key datum: Associates datum with key in wt-tree and returns an unspecified value. If wt-tree already has an association for key, that association is replaced. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in wt-tree.

procedure+: wt-tree/member? key wt-tree: Returns #t if wt-tree contains an association for key, otherwise returns #f. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in wt-tree.

procedure+: wt-tree/lookup wt-tree key default: Returns the datum associated with key in wt-tree. If wt-tree doesn't contain an association for key, default is returned. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in wt-tree.

procedure+: wt-tree/delete wt-tree key: Returns a new tree containing all the associations in wt-tree, except that if wt-tree contains an association for key, it is removed from the result. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in wt-tree.

procedure+: wt-tree/delete! wt-tree key: If wt-tree contains an association for key the association is removed. Returns an unspecified value. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in wt-tree.

Advanced Operations on Weight-Balanced Trees

In the following the size of a tree is the number of associations that the tree contains, and a smaller tree contains fewer associations.

procedure+: wt-tree/split< wt-tree bound: Returns a new tree containing all and only the associations in wt-tree that have a key that is less than bound in the ordering relation of the tree type of wt-tree. The average and worst-case times required by this operation are proportional to the logarithm of the size of wt-tree.

procedure+: wt-tree/split> wt-tree bound: Returns a new tree containing all and only the associations in wt-tree that have a key that is greater than bound in the ordering relation of the tree type of wt-tree. The average and worst-case times required by this operation are proportional to the logarithm of the size of wt-tree.

procedure+: wt-tree/union wt-tree-1 wt-tree-2: Returns a new tree containing all the associations from both trees. This operation is asymmetric: when both trees have an association for the same key, the returned tree associates the datum from wt-tree-2 with the key. Thus if the trees are viewed as discrete maps then wt-tree/union computes the map override of wt-tree-1 by wt-tree-2. If the trees are viewed as sets the result is the set union of the arguments. The worst-case time required by this operation is proportional to the sum of the sizes of both trees. If the minimum key of one tree is greater than the maximum key of the other tree then the worst-case time required is proportional to the logarithm of the size of the larger tree.

procedure+: wt-tree/intersection wt-tree-1 wt-tree-2: Returns a new tree containing all and only those associations from wt-tree-1 that have keys appearing as the key of an association in wt-tree-2. Thus the associated data in the result are those from wt-tree-1. If the trees are being used as sets the result is the set intersection of the arguments. As a discrete map operation, wt-tree/intersection computes the domain restriction of wt-tree-1 to (the domain of) wt-tree-2. The worst-case time required by this operation is proportional to the sum of the sizes of the trees.

procedure+: wt-tree/difference wt-tree-1 wt-tree-2: Returns a new tree containing all and only those associations from wt-tree-1 that have keys that do not appear as the key of an association in wt-tree-2. If the trees are viewed as sets the result is the asymmetric set difference of the arguments. As a discrete map operation, it computes the domain restriction of wt-tree-1 to the complement of (the domain of) wt-tree-2. The worst-case time required by this operation is proportional to the sum of the sizes of the trees.

procedure+: wt-tree/subset? wt-tree-1 wt-tree-2

Returns #t iff the key of each association in wt-tree-1 is the key of some association in wt-tree-2, otherwise returns #f. Viewed as a set operation, wt-tree/subset? is the improper subset predicate. A proper subset predicate can be constructed:

(define (proper-subset? s1 s2)
  (and (wt-tree/subset? s1 s2)
       (< (wt-tree/size s1) (wt-tree/size s2))))

As a discrete map operation, wt-tree/subset? is the subset test on the domain(s) of the map(s). In the worst-case the time required by this operation is proportional to the size of wt-tree-1.

procedure+: wt-tree/set-equal? wt-tree-1 wt-tree-2

Returns #t iff for every association in wt-tree-1 there is an association in wt-tree-2 that has the same key, and vice versa.

Viewing the arguments as sets, wt-tree/set-equal? is the set equality predicate. As a map operation it determines if two maps are defined on the same domain.

This procedure is equivalent to

(lambda (wt-tree-1 wt-tree-2)
  (and (wt-tree/subset? wt-tree-1 wt-tree-2
       (wt-tree/subset? wt-tree-2 wt-tree-1)))

In the worst case the time required by this operation is proportional to the size of the smaller tree.

procedure+: wt-tree/fold combiner initial wt-tree

This procedure reduces wt-tree by combining all the associations, using an reverse in-order traversal, so the associations are visited in reverse order. Combiner is a procedure of three arguments: a key, a datum and the accumulated result so far. Provided combiner takes time bounded by a constant, wt-tree/fold takes time proportional to the size of wt-tree.

A sorted association list can be derived simply:

(wt-tree/fold (lambda (key datum list)
                (cons (cons key datum) list))
              '()
              wt-tree))

The data in the associations can be summed like this:

(wt-tree/fold (lambda (key datum sum) (+ sum datum))
              0
              wt-tree)

procedure+: wt-tree/for-each action wt-tree

This procedure traverses wt-tree in order, applying action to each association. The associations are processed in increasing order of their keys. Action is a procedure of two arguments that takes the key and datum respectively of the association. Provided action takes time bounded by a constant, wt-tree/for-each takes time proportional to the size of wt-tree. The example prints the tree:

(wt-tree/for-each (lambda (key value)
                    (display (list key value)))
                  wt-tree))

procedure+: wt-tree/union-merge wt-tree-1 wt-tree-2 merge

Returns a new tree containing all the associations from both trees. If both trees have an association for the same key, the datum associated with that key in the result tree is computed by applying the procedure merge to the key, the value from wt-tree-1 and the value from wt-tree-2. Merge is of the form

(lambda (key datum-1 datum-2) ...)

If some key occurs only in one tree, that association will appear in the result tree without being processed by merge, so for this operation to make sense, either merge must have both a right and left identity that correspond to the association being absent in one of the trees, or some guarantee must be made, for example, all the keys in one tree are known to occur in the other.

These are all reasonable procedures for merge

(lambda (key val1 val2) (+ val1 val2))
(lambda (key val1 val2) (append val1 val2))
(lambda (key val1 val2) (wt-tree/union val1 val2))

However, a procedure like

(lambda (key val1 val2) (- val1 val2))

would result in a subtraction of the data for all associations with keys occuring in both trees but associations with keys occuring in only the second tree would be copied, not negated, as is presumably be intent. The programmer might ensure that this never happens.

This procedure has the same time behavior as wt-tree/union but with a slightly worse constant factor. Indeed, wt-tree/union might have been defined like this:

(define (wt-tree/union tree1 tree2)
  (wt-tree/union-merge tree1 tree2 (lambda (key val1 val2) val2)))

The merge procedure takes the key as a parameter in case the data are not independent of the key.

Indexing Operations on Weight-Balanced Trees

Weight-balanced trees support operations that view the tree as sorted sequence of associations. Elements of the sequence can be accessed by position, and the position of an element in the sequence can be determined, both in logarthmic time.

procedure+: wt-tree/index wt-tree index

procedure+: wt-tree/index-datum wt-tree index

procedure+: wt-tree/index-pair wt-tree index

Returns the 0-based indexth association of wt-tree in the sorted sequence under the tree's ordering relation on the keys. wt-tree/index returns the indexth key, wt-tree/index-datum returns the datum associated with the indexth key and wt-tree/index-pair returns a new pair (key . datum) which is the cons of the indexth key and its datum. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in the tree.

These operations signal a condition of type condition-type:bad-range-argument if index<0 or if index is greater than or equal to the number of associations in the tree. If the tree is empty, they signal an anonymous error.

Indexing can be used to find the median and maximum keys in the tree as follows:

median:   (wt-tree/index wt-tree (quotient (wt-tree/size wt-tree) 2))
maximum:  (wt-tree/index wt-tree (- (wt-tree/size wt-tree) 1))

procedure+: wt-tree/rank wt-tree key: Determines the 0-based position of key in the sorted sequence of the keys under the tree's ordering relation, or #f if the tree has no association with for key. This procedure returns either an exact non-negative integer or #f. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in the tree.

procedure+: wt-tree/min wt-tree

procedure+: wt-tree/min-datum wt-tree

procedure+: wt-tree/min-pair wt-tree

Returns the association of wt-tree that has the least key under the tree's ordering relation. wt-tree/min returns the least key, wt-tree/min-datum returns the datum associated with the least key and wt-tree/min-pair returns a new pair (key . datum) which is the cons of the minimum key and its datum. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in the tree.

These operations signal an error if the tree is empty. They could be written

(define (wt-tree/min tree)        (wt-tree/index tree 0))
(define (wt-tree/min-datum tree)  (wt-tree/index-datum tree 0))
(define (wt-tree/min-pair tree)   (wt-tree/index-pair tree 0))

procedure+: wt-tree/delete-min wt-tree

Returns a new tree containing all of the associations in wt-tree except the association with the least key under the wt-tree's ordering relation. An error is signalled if the tree is empty. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in the tree. This operation is equivalent to

(wt-tree/delete wt-tree (wt-tree/min wt-tree))

procedure+: wt-tree/delete-min! wt-tree

Removes the association with the least key under the wt-tree's ordering relation. An error is signalled if the tree is empty. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in the tree. This operation is equivalent to

(wt-tree/delete! wt-tree (wt-tree/min wt-tree))

Go to the first, previous, next, last section, table of contents.