-*- Text -*- $Header: scode.txt,v 1.1 87/12/04 10:19:34 GMT cph Rel $ Scheme Datatypes Chris Hanson Scheme, like most other object oriented languages, has a set of datatypes which form the foundation for any embedded subsystem. The purpose of this document is to describe the following for each datatype in the set: (*) its relationship to the interpreter, (*) its relationship to other datatypes, and (*) the operations available to manipulate instances of the type. In addition to the datatype descriptions, there are some tools available for manipulating typed objects in general. [] Overview ... Note: Scheme contains an organizational heirarchy for types, which associates a special object with each datatype in the system. This heirarchy will be discussed at length below, but for now it is useful to know that such type objects exist and are useful. [] Typed Objects Most of this document will describe abstract datatypes, but this section will briefly go over the physical datatype structure of the system. Each Scheme object is represented by a bit string in the computer, which can be broken up into three parts: (1) a type code (at least 6 bits worth, currently 8), (2) a datum (for pointers this is a memory address), and (3) a danger bit (used for marking by various parts of the system). Type codes in the Scheme system are broken up into a few classes, based on the actions taken by the garbage collector. The class of an object is normally referred to as the "GC class" (or gc tyupe) of the object. The current GC classes are (the indices correspond to the values returned by PRIMITIVE-GC-TYPE): (0) Non-pointer objects, which can be represented in a single word, (1) Singletons (or cells), which point to a single memory location, (2) Pairs, which point to a pair of memory locations, (3) Triples, which point to a triple of memory locations, (4) Quadruples, which point to four memory locations, (-2) Special objects. Each of these is handled differently by the garbage collector. Most of them are for internal use of the allocator itself, (-3) Vectors, which point to an n-tuple of memory locations, and (-4) Compiled entry points, which are implementation dependent. It is quite conceivable that other classes will be implemented in the future, e.g. quintuples. [] Operations on Typed Objects The following operations will manipulate typed objects directly: (PRIMITIVE-SET-TYPE type object) This operation will construct a new object with the same datum as OBJECT, the given TYPE, and a cleared danger bit. Note that it will not return an object with a different GC class than OBJECT. (PRIMITIVE-TYPE object) (PRIMITIVE-DATUM object) Returns the type/datum of OBJECT as a non-negative integer. Currently CScheme has a bug in that `primitive-datum' returns a negative integer in some cases. (PRIMITIVE-TYPE? type object) Is a predicate which is true iff (= (PRIMITIVE-TYPE OBJECT) TYPE). TYPE must be a non-negative integer in the appropriate range (which is of course implementation dependent). (MAKE-NON-POINTER-OBJECT datum) Returns a non-pointer object with the given datum. DATUM must be a non-negative integer in the appropriate (implementation dependent) range. (OBJECT-DANGEROUS? object) Is true iff the danger bit of the given object is set. (MAKE-OBJECT-SAFE object) (MAKE-OBJECT-DANGEROUS object) Return a new object which has the same type and datum as OBJECT, but with the danger bit cleared/set, respectively. [] Type Table The facilitate thinking about type codes on a more abstract level, there are a set of operations that convert between type codes and symbolic names for those code. NUMBER-OF-MICROCODE-TYPES If the implementation you are using has N bits of type code, this number is 2^N. (MICROCODE-TYPE name) Returns the type code corresponding to NAME, or #F if there is no such type. (MICROCODE-TYPE-NAME code) Returns the canonical name corresponding to the given type code. There may in general be several names for the same code. (MICROCODE-TYPE-PREDICATE name) Returns a procedure which is true only of that type of object. (OBJECT-TYPE object) Similar to MICROCODE-TYPE-NAME, but extracts the type code form object, and remaps some related microcode types to different names. [] EQ? Hashing A useful facility provided by Scheme is the object hashing facility. This associates non-negative integers with objects such that two objects are EQ? iff their hash numbers are =. There are two sets of operations for performing object hashing, which differ only in the form of the numbers which are manipulated, not in the values of those numbers. Thus the two types of operations do not exclude one another. The most useful fact about hash numbers is that they can be used as "weak pointers", that is, as pointers that do not prevent the object pointed at from being garbage-collected. Thus is may be the case that the object associated with a particular hash number no longer exists; however, that hash number will never be reused for another object. (HASH object) Returns the hash number for OBJECT as an integer. (UNHASH hash-number) Takes a hash number represented as an integer and returns the object associated with it. If it is not a valid hash number, signals an error. [] Environments The Scheme environment structure is central to any discussion of interpretation. It is a mutable mapping from symbols to objects which is characterized by a block structure which is inherent in the lexical scoping of the language. Each Scheme environment can be thought of as a chain of frames; each frame in the chain represents a single block in the language. For example, the expression (LAMBDA (X) (LAMBDA (Y) ...)) would create two frames, the outer one containing a binding for X, the inner a binding for Y. The body of the inner lambda expression would see bindings for both X and Y, but the frame containing Y would be "the" environment frame for that expression. The frame containing X is called the "parent" of that frame. There is one environment frame that is distinguished from all of the others; this is called the "global" environment frame. The existence of this distinguished frame is due to an efficiency hack which will not be discussed here. However, it is useful to know that the value cells for this environment are located in symbols. The global environment (as the name implies) has no parent. It is also possible to have other environments without parents, but these are in general not used. A name is said to be "unbound" in an environment if none of the frames in the environment has an entry for that name. Otherwise it is said to be "bound in the frame" in which that entry occurs. The "binding" for such a name is the value of the entry in the innermost frame in which it is bound. If a name is bound, but has no value, it is called "unassigned". Such bindings can be created in several different ways; see the Scheme manual for details. Unassigned bindings are intended to be used in conjunction with side-effects (typically FLUID-LET). The following operations work on any environment; ENVIRONMENT should be a chain, and NAME should be a symbol. The binding for NAME is found by searching through all of the frames in ENVIRONMENT. (LEXICAL-UNBOUND? environment name) (LEXICAL-UNASSIGNED? environment name) These are true iff the given name is unbound or unassigned, respectively, in the given environment. (LEXICAL-UNREFERENCEABLE? environment name) This is true iff the given name is either unbound or unassigned in the given environment. (LEXICAL-REFERENCE environment name) Returns the value of NAME in ENVIRONMENT. Causes an error if the symbol is unbound or unassigned. (LEXICAL-ASSIGNMENT environment name object) Modifies the value of NAME in ENVIRONMENT to OBJECT. Causes an error if the symbol is unbound. This is defined to return the old value of the variable, but usually this should not be depended on. (LOCAL-ASSIGNMENT environment name object) This operation defines NAME to have OBJECT as its binding within the innermost frame of ENVIRONMENT. If NAME already has a binding in that frame, it is modified (whether or not it is assigned); otherwise a new binding is created with the appropriate value. SYSTEM-GLOBAL-ENVIRONMENT Is the (unique) global environment; it can be compared using EQ?. The following operations are defined on non-global environments: ENVIRONMENT-TYPE This is the type object for non-global environment structures. (ENVIRONMENT? object) Is true only of non-global environment structures. (ENVIRONMENT-PROCEDURE environment) Returns the procedure which was called to produce ENVIRONMENT; it is guaranteed to be a compound procedure. (ENVIRONMENT-HAS-PARENT? environment) Is true iff the procedure which created ENVIRONMENT was closed in a non-global environment. (ENVIRONMENT-PARENT environment) Returns the environment which ENVIRONMENT-HAS-PARENT? indicates the existence of. (ENVIRONMENT-BINDINGS environment) Returns an alist of all the variables bound by the given frame. Entries in the alist are pairs, the car of which is the (symbolic) name of the variable, and the cadr of which is the value. If a variable is unassigned, the cdr of the entry will be '(). (ENVIRONMENT-ARGUMENTS environment) Returns a list of the arguments which were passed to the procedure which created ENVIRONMENT. [] Procedures Procedures can be separated into three classes in Scheme: PRIMITIVE, COMPOUND, and COMPILED. Primitive procedures are provided by the implementation, and are typically built into the system at a low level; an example is the CAR operation. Compound procedures are so-called because they can be broken up into two parts, a LAMBDA expression and an ENVIRONMENT. Compiled procedures are really compound procedures whose lambda expressions have been compiled (into native code), and whose environments have potentially been optimized. Due to these optimizations, many of the procedure utilities do not work on compiled procedures. In some sense the distinction between primitive and compound procedures is arbitrary, since it would be perfectly possible to implement many primitive procedures as compound procedures. Indeed, it is difficult to distinguish between the two classes in a system that is compiled rather than interpreted; compound procedures that have been compiled differ little or not at all from primitive procedures. However, in this document we will mostly be concerned with the following aspects of compound procedures, which can readily be used to discriminate between the classes: * Compound procedures create new environment frames. Compiled procedures may or may not, depending on the degree of optimization achieved in the environment structure. Primitive procedures never create new frames. * Compound procedures have SCODE definitions associated with them which can be examined. The following are the standard operations on procedures: PRIMITIVE-PROCEDURE-TYPE COMPOUND-PROCEDURE-TYPE COMPILED-PROCEDURE-TYPE PROCEDURE-TYPE These type objects correspond to primitive procedures, compound procedures, compiled procedures, and all procedures, respectively. (PRIMITIVE-PROCEDURE? object) (COMPOUND-PROCEDURE? object) (COMPILED-PROCEDURE? object) (PROCEDURE? object) These predicates are true only of the appropriate objects. (PROCEDURE-ENVIRONMENT procedure) Returns the definition environment of the procedure; for primitive procedures this is the global environment. This operation is undefined for compiled procedures. (PROCEDURE-LAMBDA compound-procedure) Returns the LAMBDA object which is the definition of the given compound procedure. (PROCEDURE-COMPONENTS compound-procedure receiver) Calls RECEIVER on the LAMBDA and ENVIRONMENT of the given compound procedure. The description of each construct in the SCODE language consists of several parts. * The name of the construct, and the names of each of the subcomponents of the construct. * A description of the actions performed when an instance of the construct is executed by the Scheme interpreter. * A description of the Scheme operations which create, identify, and manipulate such constructs. The descriptions for these operations follow a general rule which is rarely deviated from; we will describe only those deviations. Here is the template for one such type: FOO: BAR BAZ BLETCH FOO is a type with component parts BAR, BAZ and BLETCH. Operations: FOO-TYPE is the type object for FOO objects. (FOO? object) is the type predicate for FOO objects. (MAKE-FOO bar baz bletch) makes an instance of a FOO object. (FOO-COMPONENTS foo receiver) calls RECEIVER on the components of FOO. That is, RECEIVER is a procedure of three arguments which is called on the three respective components of FOO. It is always true that (FOO-COMPONENTS foo MAKE-FOO) will return a new (different) object with the same components as FOO. (FOO-BAR foo) (FOO-BAZ foo) (FOO-BLETCH foo) return the components of FOO individually. * There is sometimes a short note on some implementation-specifics for the 68000 Scheme implementation. Note: Within this document, we often refer to the subexpressions of an SCODE expression as being either subproblems or reductions. Each class of subexpression is executed identically; the distinction between the two classes lies in the usage of the value. The value of a "subproblem" is an intermediate value in the computation of the value of the overall expression. The value of a "reduction" is the value of the overall expression, which is said to be "reduced to" the given subexpression. [] VARIABLE: NAME Execution: The named variable is looked up in the current environment. Operations: VARIABLE-TYPE (VARIABLE? object) (MAKE-VARIABLE name) (VARIABLE-COMPONENTS variable receiver) (VARIABLE-NAME variable) Implementation: The current interpreter system uses a incremental compilation mechanism for variable lookup which side-effects the variable object to contain information regarding the location of that variable within a particular class of environments. Variables should not be shared between arbitrary environments. MAKE-VARIABLE returns an uncompiled variable, which should only be used within a particular block of a particular piece of code. [] LAMBDA: NAME, REQUIRED, OPTIONAL, REST, AUXILIARY, DECLARATIONS, BODY Corresponds to the LAMBDA special form. Execution: A PROCEDURE object is created by closing the LAMBDA expression with the execution environment. When the procedure is applied, a new environment frame is created and the BODY executed within it; the value of the body is the value of the application, so the application is "reduced to" the body. The standard types of parameters are all accepted: REQUIRED parameters must be present when the procedure is applied, or an error will be signalled. OPTIONAL parameters may or may not be present; the REST parameter will get any remaining arguments after the REQUIRED and OPTIONAL parameters have been satisfied; any OPTIONAL parameters that are not supplied will be unassigned when the procedure is applied. AUXILIARY parameters are extra slots in the environment that are initialized to the unassigned value. Operations: (MAKE-LAMBDA name required optional rest auxiliary declarations body) Creates a new lambda object as per the given arguments. REQUIRED, OPTIONAL, and AUXILIARY should be lists of symbols; NAME should be a symbol; REST should be a symbol or #F; and BODY must be an SCODE expression. LAMBDA-TYPE (LAMBDA? object) (LAMBDA-COMPONENTS lambda receiver) (LAMBDA-BODY lambda) (LAMBDA-BOUND lambda) Returns a list of the names of all of the variables bound by LAMBDA. Implementation: Lambdas are currently implemented using 3 different SCODE forms. The different versions support the above fields differently. DECLARATIONS are not supported directly by any type, they are handled by the abstract interface above and hidden in the body of the lambda expressions. EXTENDED-LAMBDA is the general case. It supports all the options directly except declarations. LAMBDA is the common case. It only supports NAME and REQUIRED directly. It does not support OPTIONAL, REST at all, and supports AUXILIARY indirectly. LEXPR is very rare. It supports NAME and REQUIRED directly, OPTIONAL, REST, and AUXILIARY indirectly. [] COMBINATION: OPERATOR, OPERANDS Execution: The OPERATOR and the OPERANDS (an ordered sequence) are executed as subproblems in any order, and the values collected. Then the value of the OPERATOR is applied to the values of the OPERANDS. Operations: (MAKE-COMBINATION operator operands) COMBINATION-TYPE (COMBINATION? object) (COMBINATION-COMPONENTS combination receiver) (COMBINATION-OPERATOR combination) (COMBINATION-OPERANDS combination) Implementation: Combinations are currently implemented using 7 different SCODE forms: COMBINATION is the generalized form, which is implemented as a vector of subexpressions. COMBINATION-1 and COMBINATION-2 are short fixed length combinations which contain 1 and 2 operands, respectively. They were implemented to save code space, increase speed, and reduce execution storage. PRIMITIVE-COMBINATION-0, PRIMITIVE-COMBINATION-1, PRIMITIVE-COMBINATION-2, and PRIMITIVE-COMBINATION-3 are fixed length combinations whose operators are Scheme primitive procedures. A primitive is a self-evaluating constant and thus requires no execution to produce its value; this results in extra evlis-type tail-recursion. [] CONDITIONAL: PREDICATE, CONSEQUENT, ALTERNATIVE Corresponds directly to the IF special form. Execution: First, the PREDICATE subproblem is executed; if the value is false, the conditional is reduced to the ALTERNATIVE subexpression; otherwise it is reduced to the CONSEQUENT subexpression. Operations: (MAKE-CONDITIONAL predicate consequent alternative) CONDITIONAL-TYPE (CONDITIONAL? object) (CONDITIONAL-COMPONENTS conditional receiver) (CONDITIONAL-PREDICATE conditional) (CONDITIONAL-CONSEQUENT conditional) (CONDITIONAL-ALTERNATIVE conditional) [] SEQUENCE: ACTIONS Corresponds directly to the SEQUENCE special form. Execution: ACTIONS is an ordered set of subexpressions. Each of the subexpressions is executed in order. The last subexpression is a reduction; the others are subproblems and their values are discarded. Operations: (MAKE-SEQUENCE actions) SEQUENCE-TYPE (SEQUENCE? object) (SEQUENCE-COMPONENTS sequence receiver) (SEQUENCE-ACTIONS sequence) Implementation: Currently SEQUENCE is implemented as two SCODE forms, SEQUENCE-2 and SEQUENCE-3, which are fixed length sequences of 2 or 3 elements. Larger sequences are constructed by chaining these forms together. This was done because (1) sequences are usually very short in good Scheme code, and (2) this implementation was simpler, faster, and required less intermediate storage than a direct implementation of arbitrary length sequences. [] DELAY: EXPRESSION Corresponds directly to the DELAY special form. Execution: A PROMISE object is created by combining EXPRESSION with the current environment. When this object is forced, the expression is executed in the environment, and the answer is memoized in the PROMISE object. Operations: (MAKE-DELAY expression) DELAY-TYPE (DELAY? object) (DELAY-COMPONENTS delay receiver) (DELAY-EXPRESSION delay) [] THE-ENVIRONMENT Corresponds directly to the THE-ENVIRONMENT special form. Execution: The environment in which this expression is executed is returned as the value of the execution. Operations: (MAKE-THE-ENVIRONMENT) THE-ENVIRONMENT-TYPE (THE-ENVIRONMENT? object) [] QUOTATION: EXPRESSION Corresponds directly to the SCODE-QUOTE special form. Execution: The QUOTATION expression evaluates to its EXPRESSION component. This is used for quoting pieces of code. Operations: (MAKE-QUOTATION expression) QUOTATION-TYPE (QUOTATION? object) (QUOTATION-EXPRESSION quotation) [] COMMENT: TEXT, EXPRESSION [] DECLARATION: TEXT, EXPRESSION (Note: Both COMMENT and DECLARATION have the same behavior; only COMMENT is described here. Do a string substitution to get the documentation for DECLARATION.) Execution: The COMMENT is reduced to the value of EXPRESSION. TEXT is completely ignored; this is for use by programs that analyze code. Operations: (MAKE-COMMENT text expression) COMMENT-TYPE (COMMENT? object) (COMMENT-COMPONENTS comment receiver) (COMMENT-TEXT comment) (COMMENT-EXPRESSION comment) (SET-COMMENT-TEXT! comment text) (SET-COMMENT-EXPRESSION! comment expression) Modify the component parts of COMMENT. Efficiency Hacks These SCODE expressions exists primarily to save space in programs. In some cases they also save time. [] DISJUNCTION: PREDICATE, ALTERNATIVE Corresponds directly to the DISJUNCTION special form (this is the OR special form in most other LISP implementations). Execution: The predicate subproblem is executed; if it has a non-false value, that value is the result, otherwise the disjunction is reduced to the ALTERNATIVE subexpression. Operations: (MAKE-DISJUNCTION predicate alternative) DISJUNCTION-TYPE (DISJUNCTION? object) (DISJUNCTION-COMPONENTS disjunction receiver) (DISJUNCTION-PREDICATE disjunction) (DISJUNCTION-ALTERNATIVE disjunction) [] DEFINITION: NAME, VALUE Corresponds directly to the DEFINE special form. Execution: The variable NAME is defined to have as its value the result of executing the VALUE subproblem. This definition is added to the current environment frame, shadowing any other definitions in the environment. If such a variable already exists in the current frame, it is altered. Operations: (MAKE-DEFINITION name value) DEFINITION-TYPE (DEFINITION? object) (DEFINITION-COMPONENTS definition receiver) (DEFINITION-NAME definition) (DEFINITION-VALUE definition)