[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

*To*: scheme@MIT-MC*Subject*: Numbers Committee Report*From*: GJS%MIT-EECS@MIT-MC.ARPA*Date*: Sun 10 Mar 85 16:53:46-EST

Sorry it took so long, but I was afraid to do a half-assed job. The following report raises 10^10 issues which should be thought about by all of us. I believe (after a bit of study) that other language people have been inadequately careful about the status of numbers in their languages. Numbers are probably the most complicated data structures we ever deal with, so we must try to do a good job on them. The report is entirely the responsibility of GJS (so throw stones at him), though Bill Rozas, Jon Rees, Guy Steele, Chris Hanson, Hal Abelson, Yekta Gursel, and Chris Lindblad have made significant contributions -- most of their stones have been thrown and absorbed (some hurt!). Scheme Numbers Intent Numerical computation has traditionally been neglected by the Lisp community. Until Common Lisp there has been no carefully thought out strategy for organizing numerical computation; and with the exception of the MacLisp system there has been little effort to efficiently execute numerical code. We applaud the excellent work of the Common Lisp committee and we accept many of their recommendations. In some ways we simplify and generalize their proposals in a manner consistent with the purposes of Scheme. Scheme numerical operations treat numbers as abstract data, as independent of the machine representation as is possible. Thus, the casual user should be able to write simple programs without having to know that the implementation may use fixed-point, floating-point, and perhaps other representations for his data. Unfortunately, this illusion of uniformity can only be sustained approximately -- the implementation of numbers will leak out of our abstraction whenever the user must be in control of precision, or accuracy, or when he must construct especially efficient computations. Thus we also must provide escape mechanisms so that a sophisticated programmer can exercise more control of the execution of his code and the representation of his data when it is essential to do so. We separate out several apparently related issues of representation -- the abstract numbers, their machine representation, and the input/output formats of the symbolic expressions which denote numerical constants. We will use mathematical words such as NUMBER, COMPLEX, REAL, RATIONAL, and INTEGER for properties of the abstract numbers, data-structure names such as FIXNUM, BIGNUM, RATNUM, and FLONUM for machine representations, and names like INT, FIX, FLO, SCI, RAT, POLAR, and RECT for input/output formats. Numbers A Scheme system provides data of type NUMBER, which is the most general numerical type supported by that system. NUMBER is likely to be a complicated union type implemented in terms of FIXNUMs, BIGNUMS, FLONUMS, etc. but this should not be apparent to a naive user. What the user sees is that the obvious operations on numbers produce the mathematically expected results, within the limits of the implementation. Thus if the user divides 3 by 2, he should get something like 1.5 (or, the exact fraction 3/2). If he adds that result to itself, and if the implementation can swing it, he should get an exact 3. Mathematically, numbers may be arranged into a tower of subtypes with natural projections and injections relating adjacent members of the tower: NUMBER COMPLEX REAL RATIONAL INTEGER We impose a uniform rule of downward coercion -- a number of one type is also of a lower type if the injection (up) of the projection (down) of a number leaves the number unchanged. Since this tower is a real mathematical entity, Scheme provides predicates and procedures that access the tower. Not all Scheme implementations must provide the whole tower, but they are required to implement a coherent subset, consistent with the purposes of Scheme. Exactness Numbers are either EXACT or INEXACT. A number is exact if it was derived from EXACT numbers using only EXACT operations. A number is INEXACT if it models a quantity known only approximately, if it was derived using INEXACT ingredients, or if it was derived using INEXACT operations. Thus INEXACTness is a contagious property of a number. Some operations, such as the square root (of non-square numbers) must be INEXACT because of the finite precision of our representations. Other operations are inexact because of implementation requirements. We emphasize that exactness is independent of the position of the number on the tower. It is perfectly possible to have an INEXACT INTEGER or an EXACT REAL; 355/113 may be an EXACT RATIONAL or it may be an INEXACT RATIONAL approximation to pi, depending on the application. Operationally, it is the system's responsibility to combine EXACT numbers using exact methods, such as infinite precision integer and rational arithmetic, where possible. An implementation may not be able to do this (if it does not use infinite precision integers and rationals), but if a number becomes inexact for implementation reasons there is probably an important error condition, such as integer overflow, to be reported. Arithmetic on INEXACT numbers is not so constrained. The system may use floating point and other flaky representation strategies for INEXACT numbers. This is not to say that implementors need not use the best known algorithms for INEXACT computations -- only that high-quality approximate methods are allowed. In a system which cannot explicitly distinguish exact from inexact numbers the system must do its best to maintain precision. Scheme systems must not burden users with numerical operations described in terms of hardware and operating-system dependent representations such as FIXNUM and FLONUM. These representation issues should not be germane to the user's problems. We highly recommend that the IEEE 32-bit and 64-bit floating-point standards be adopted for implementations that use floating-point internal representations. To minimize loss of precision we adopt the following rules: If an implementation uses several different size floating-point formats, the results of any operation with a floating-point result must be expressed in the largest format used to express any of the floating-point arguments to that operation. It is desirable (but not required) for potentially irrational operations such as SQRT, when applied to EXACT arguments, to produce EXACT answers when that is possible (eg. sqrt[4]=2, exactly). If an EXACT number (or an INEXACT number represented as a FIXNUM, a BIGNUM, or a RATNUM) is operated upon so as to produce an INEXACT result (as by SQRT), then if the result is represented as a FLONUM, then the largest available format FLONUM must be used; but if the result is expressed as a RATNUM then the rational approximation must have at least as much precision as the largest available format FLONUM. (Note that this last rule is much stronger than that used in Common Lisp. Consider the result of the square root of a rational, for example.) Numerical Operations Scheme provides the usual set of operations for manipulation of numbers. In the specification below we illustrate each built in numerical operation with an expression using the default operator symbol which is bound to that operation in the global environment. In general, numerical operations require numerical arguments. We use the following symbols to represent the various types of object in our descriptions. It is an error for an operation to be presented with an argument that it is not specified to handle. obj any arbitrary object z, z1, ... zi, ... {complex, real, rational, integer} x, x1, ... xi, ... {real, rational, integer} q, q1, ... qi, ... {rational, integer} n, n1, ... ni, ... {integer} In our descriptions some arguments are required and others are optionally allowed. For example, numerical comparisons are allowed to have any number of arguments, but they must have at least two. We indicate this by putting in explicit argument symbols for each required argument and a "..." indicating the rest. A. Numerical type predicates can be applied to any kind of argument. They return true if the object is of the named type. In general, if a type predicate is true of a number then all higher type predicates are also true of that number. Not every system supports all of these types. It is entirely possible to have a Scheme system which only has INTEGERs, however, it must have all of these predicates. (NUMBER? obj) (COMPLEX? obj) (REAL? obj) (RATIONAL? obj) (INTEGER? obj) B. Numerical predicates test a number for a particular property. They return a boolean value. (ZERO? z) (POSITIVE? x) (NEGATIVE? x) (ODD? n) (EVEN? n) (EXACT? z) (INEXACT? z) C. Numerical comparisons require all of their arguments to be numbers. They optionally take many arguments, as in Common Lisp, to facilitate encoding of range checks. Not all implementations support this extension. These predicates have redundant names (with and without the terminal "?") to make all populations happy. Warning: On INEXACT numbers equality tests will give unreliable results; other numerical comparisons are only heuristically useful (ask a numerical analyst about this!) (= z1 z2 ... zi) all arguments are numerically equal (=? z1 z2 ... zi) (< x1 x2 ... xi) arguments are monotonically increasing (<? x1 x2 ... xi) (> x1 x2 ... xi) arguments are monotonically decreasing (>? x1 x2 ... xi) (<= x1 x2 ... xi) arguments are monotonically nondecreasing (<=? x1 x2 ... xi) (>= x1 x2 ... xi) arguments are monotonically nonincreasing (>=? x1 x2 ... xi) (max x1 ... xi) (min x1 ... xi) D. Arithmetic operations require all arguments to be numbers. Generalization to arbitrary numbers of arguments is implementation dependent. (+ z1 ... zi) ==> z1 + ... + zi (+ z) ==> z (+) ==> 0 (the identity) (* z1 ... zi) ==> z1 * ... * zi (* z) ==> z (*) ==> 1 (the identity) (- z1 ... zi) ==> z1 - z2 - ... - zi (- z) ==> -n (/ z1 ... zi) ==> z1 / z2 / ... / zi (/ z) ==> 1/z (-1+ z) ==> -1 + z (1+ z) ==> 1 + z (abs z) E. Integer division operations are only defined on integers. (quotient n1 n2) ==> n3 (remainder n1 n2) ==> n4 (modulo n1 n2) ==> n4 In general, these are intended to implement number-theoretic division, where for positive arguments, n1 and n2 n1 = n3*n2 + n4 and 0 <= n4 < d. REMAINDER and MODULO differ on negative arguments as in the Common Lisp REM and MOD procedures -- the remainder always has the sign of the dividend, the modulo always has the sign of the divisor: (modulo 13 4) ==> 1 (remainder 13 4) ==> 1 (modulo -13 4) ==> 3 (remainder -13 4) ==> -1 (modulo 13 -4) ==> -3 (remainder 13 -4) ==> 1 (modulo -13 -4) ==> -1 (remainder -13 -4) ==> -1 F. The following procedures are also only defined on integers. The result is always non-negative. (gcd n1 ... ni) Greatest common divisor (gcd) ==> 0 (the identity) (lcm n1 ... ni) Least common multiple (lcm) ==> 1 (the identity) G. Integers and rationals can be made using the following procedures. Note, this does not make the answer EXACT -- in fact, the resulting number is clearly INEXACT, it can be made EXACT with an exactness coercion. (floor x) The largest integer not larger than x (ceiling x) The smallest integer not smaller than x (truncate x) The integer of maximal absolute value not larger than x (round x) The closest integer to n. Rounds to even when halfway. (rationalize x y) Produces the best rational approximation to x within the tolerance specified by y. (rationalize x) Produces the best rational approximation to x, preserving all of the precision in its representation. H. In implementations which support real numbers the following are defined to conform with the Common Lisp standard as to meaning (be careful of the branch cuts if complex numbers are allowed). (exp z) (log z) (expt z1 z2) (sqrt z) (sin z) (cos z) (tan z) (asin z) (acos z) (atan z1 z2) I. In implementations which support complex numbers we also provide (make-rectangular x1 x2) ==> x1 + x2i (make-polar x3 x4) ==> x3*e^x4i (real-part z) (imag-part z) (magnitude z) (angle z) J. Exactness coercions. (exact->inexact z) Makes an inexact representation of z. Pretty harmless. (inexact->exact z) Makes an exact representation of z. I hope you know what you are doing here! K. Type-restricted operations are useful for error checking, or as advice to a compiler. Scheme optionally supplies such type-restricted operations: If op is defined on objects of numerical type, t1, and if t2 is any type lower on the tower than t1, then (t2 op) is the restricted operator for objects of type t2. For example, the following are expressions using restricted operators: ((real sqrt) 5) ; Gets the right thing. ((real sqrt) -1) ; Will signal an error. ((integer sqrt) 10) ; Will signal an error -- You probably wanted to compute (truncate ((real sqrt) 10)) For completeness, the system also supplies the most general operations ((number +) 3 4) The default operations, initially bound to the customary operators in the global environment, are these most general operations. The type-restriction procedures REAL, INTEGER, ... are simple maps which map the default operations to their restricted cousins. Some systems may want to enhance this facility, to allow integer subranges or other mathematically meaningful restrictions such as modular arithmetic, for example: (((integer-subrange 3 5) +) 2 3 4) ;Will signal an error. Numerical Input and Output Scheme allows all of the traditional ways of expressing numerical constants, though only some may be supported by any particular implementation. These expressions are intended to be purely notational; any kind of number may be expressed in any form that the user deems convenient. Of course, expressing 1/7 as a limited-precision decimal fraction will not exactly express the number, but this approximate expression may be just what the user wants to see. The expressions of numerical constants can be specified using formats. The system provides a procedure, NUMBER->STRING, which takes a number and a format and which produces a string which is the printed expression of the given number in the given format. (number->string <number> <format>) ==> <string> This procedure will mostly be used by sophisticated users and in system programs. In general, a naive user will need to know nothing about the formats because the system printer will have reasonable default formats for all types of NUMBERs. The system reader will construct reasonable default numerical types for numbers expressed in each of the formats it recognizes. If a user needs control of the coercion from strings to numbers he will use STRING->NUMBER, which takes a string, an exactness, and a radix and which produces a number of the maximally precise applicable type expressed by the given string. (string->number <string> E|I B|O|D|X) ==> <number> Formats may have parameters. For example, the "(SCI 5 2)" format specifies that a number is to be expressed in FORTRAN scientific format with 5 significant places and two places after the radix point. The following are all numerical constant expressions. The comment shows the format that was used to produce the expression: 123 +123 -123 ; (INT) format 12345678901234567890123456789 ; A big one 355/113 -355/113 +355/113 ; (RAT) format +123.45 -123.45 ; (FIX 2) format 3.14159265358979 ; (FIX 14) format 3.14159265358979 ; (FLO 15) format 123.450 ; (FLO 6) format -123.45E-1 ; (SCI 5 2) format 123E3 123E-3 -123E-3 ; (SCI 3 0) format -1+2i ; (RECT (INT) (INT)) format 1.2@1.570796 ; (POLAR (FIX 1) (FLO 7)) format A numerical constant may be specified with an explicit radix by a prefix. The prefixes are: #B (binary), #O (octal), #D (decimal), #X (hex). A format may specify that a number be expressed in a particular radix. The radix prefix may be suppressed. For example, one may express a complex number in polar form with the magnitude in octal and the angle in decimal as follows: #o1.2@#d1.570796327 ;(POLAR (FLO 2 (RADIX O)) (FLO (RADIX D))) format #o1.2@1.570796327 ;(POLAR (FLO 2 (RADIX O)) (FLO (RADIX D S)) format A numerical constant may be specified to be either EXACT or INEXACT by a prefix. The prefixes are: #I (inexact), #E (exact). An exactness prefix may appear before or after any radix prefix that is used. A format may specify that a number be expressed with an explicit exactness prefix, or it may force the exactness to be suppressed. For example, the following are ways to output an inexact value for pi: #I355/113 ; (RAT (EXACTNESS)) 355/113 ; (RAT (EXACTNESS S)) #I3.1416 ; (FIX 4 (EXACTNESS)) An attempt to produce more digits than are available in the internal machine representation of a number will be marked with a "#" filling the extra digits. This is not a statement that the implementation knows or keeps track of the significance of a number, just that the machine will flag attempts to produce 20 digits of a number which has only 15 digits of machine representation: 3.14158265358979##### ; (FLO 20 (EXACTNESS S)) In systems with both single and double precision FLONUMs one may want to specify which size we want to use to internally represent a constant. For example, we may want a constant whidh is pi rounded to the single precision length, or we might want a long number which has the value 6/10. In either case, we are specifying an explicit way to represent an INEXACT number. For this purpose, we allow one to express a number with a prefix which indicates short or long FLONUM representation: #S3.14159265358979 ; Round to short -- 3.141593 #L.6 ; Extend to long -- .600000000000000 Details of formats The format of a number is notated as a list beginning with a format descriptor, such as SCI. Following the descriptor are parameters used by that descriptor, such as the number of significant digits to be used. Parameters which are omitted are defaulted. Next, one may specify modifiers, such as RADIX or EXACTNESS, which themselves may be parameterized. The format descriptors are: (INT) Express as an integer. The radix point is implicit. If there are not enough significant places, as in trying to express 6.0238E23 (internally represented as a 7 digit FLONUM) as an integer we would get "6023800################". (RAT n) Express as a rational fraction. n specifies the largest denominator to be used in constructing a rational approximation to the number being expressed. If n is omitted it defaults to infinity. (FIX n) Express with a fixed radix point. n specifies the number of places to right of the radix point. n defaults to the size of a single-precision FLONUM. If there are not enough significant places, as in trying to express 6.0238E23 (internally represented as a 7 digit FLONUM) as a (FIX 2) we would get "6023800################.##". (FLO n) Express with a floating radix point. n specifies the total number of places to be displayed. n defaults to the size of a single- precision FLONUM. If the number is out of range, it is converted to (SCI). (FLO H) allows the system to heuristically express a FLO for human consumption (as in MacLisp printer). (SCI n m) Express in exponential notation. n specifies the total number of places to be displayed. n defaults to the size of a single- precision FLONUM. m specifies the number of places to the right of the radix point. m defaults to n-1. (SCI H) does heuristic expression. (RECT r i) Express as a rectangular form complex number. r and i are formats for the real and imaginary parts respectively. They default to (HEUR). (POLAR m a) Express as a polar form complex number. m and a are formats for the magnitude and angle respectively. m and a default to (HEUR). (HEUR) Express heuristically, as in the MacLisp printer (see Steele), using the minimum number of digits required to get an expression which when coerced back to a number produces the original machine representation. EXACT numbers are expressed as (INT) or (RAT). INEXACT numbers are expressed as (FLO H) or (SCI H) depending on their range. Complex numbers are expressed in (RECT). This is the normal default of the system printer. The following modifiers may be added to a numerical format specification: (EXACTNESS s) This controls the expression of the exactness label of a number. s indicates whether the exactness is to be E (expressed) or S (suppressed). s defaults to E. If no exactness modifier is specified for a format the exactness is by default not expressed. (RADIX r s) This forces a number to be expressed in the radix r. r may be B (binary), O (octal), D (decimal), or X (hex). s indicates whether the radix label is to be E (expressed) or S (suppressed). s defaults to E. If no radix modifier is specified the default is decimal and the label is suppressed. -------

- Prev by Date:
**Draft delayed to 15 March** - Next by Date:
**I/O proposal** - Prev by thread:
**Draft delayed to 15 March** - Next by thread:
**Re: Numbers Committee Report** - Index(es):