MIN and MAX (long message)



I agree with almost everything Alan Bawden and Mitch Wand have said
concerning MIN and MAX, so I will try to avoid repeating their
arguments in this note.  The purpose of this note is to argue:

  *  Although the concept of exactness is a new and useful idea, it
     is not a radical new idea.  Programmers who naively assume that
     inexact numbers are just another name for floating point numbers
     are unlikely to get into any trouble they wouldn't already be in
     with floating point.  Furthermore the concept of exactness does
     not entail any loss of efficiency.

  *  The controversy concerning the semantics of MIN and MAX has nothing
     to do with the fact that Scheme does not require inexact numbers to
     be represented in floating point format.  It has to do with the
     relationship between exact and inexact numbers (by whatever names
     you want to call them) in a generic arithmetic system.

  *  No major programming language adopts either semantics that we are
     considering for MIN and MAX.  In particular, I think Guy was pulling
     our collective leg when he claimed that MIN and MAX act as
     conditionals in C.

  *  Certain statements that have been made in this discussion deserve
     correction, qualification, or elaboration.

--------------------------------------------------------------------------------

Alan Bawden:
    I've never heard users asking for the notion of exactness/inexactness at
    all.  Ask them what they want, and they will probably tell you that they
    want IEEE floating point.  As I have said before, I think it would be 100%
    reasonable for Scheme to do what every other language does, and adopt some
    reasonable set of rules for the behavior of floating point.  Unfortunately,
    Scheme decided to go in a different direction....

You might be shocked by how little "every other language" has to say about
what floating point is and how it behaves.  A lot of machines don't support
IEEE floating point very well, so language designers have to be pretty vague
about floating point if they want their language to run well on popular
hardware.

At Brandeis we asked ourselves what was distinctive about floating point,
and why we needed it.  The answer was that floating point was inexact, and
we needed it for efficient (though approximate) calculations with real
numbers.  Instead of writing a vague description of how floating point
arithmetic behaves in Scheme, we wrote a vague description of how
inexact arithmetic behaves.

Although the Scheme report explicitly recognizes that non-floating-point
implementations of inexact arithmetic are possible, I suspect that most
language standards are vague enough that non-floating-point implementations
of their "real" or "float" types would be just as legal as in Scheme.  So
I don't think we've done anything radical by allowing non-floating-point
implementations of inexact arithmetic.

Furthermore I don't think many programmers would notice the difference if
a good non-floating-point implementation of inexact numbers were used.
People doing serious calculations would notice and gripe, but they might
gripe about IEEE arithmetic as well (because, for example, its roundoff
characteristics are different from those of the Cray their programs have
been written for).  If people doing serious calculations have technical
complaints about the non-floating-point implementation, well, maybe the
implementors should listen to them.  (Now that would be radical!)

Kent Pitman:
    ....If Scheme numbers are to persist
    in their current form, I would like to be able to speak as confidently
    about their demonstrated usefulness, and I don't now feel that I have
    the ammunition to do so.

Here are four benefits, arranged in decreasing order of importance by
my estimate.  The basic argument in favor of the R3.95RS semantics for
MIN and MAX is that any other semantics would weaken the most important
benefit by adding yet another hypothesis.

  *  If a calculation yields an exact result, and the calculation did
     not involve any predicates on inexact numbers, and did not call
     INEXACT->EXACT, then the result is correct.  (See the sketch
     after this list.)
  *  The rules for calculating with exact numbers are such that the
     extra hypotheses in the sentence above can often be checked
     statically.
  *  The integers are a subset of the rationals, which are a subset of
     the reals, which are a subset of the complex numbers.
  *  The ODD?, EVEN?, QUOTIENT, REMAINDER, MODULO, GCD, LCM, NUMERATOR,
     and DENOMINATOR procedures accept inexact as well as exact
     arguments.
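
To make the first of these benefits concrete, here is a small sketch
(assuming an implementation that supports exact rational arithmetic;
the results are what the report's rules imply):

    (+ 1/3 1/3 1/3)             ==>  1     ; exact, hence trustworthy
    (EXACT? (+ 1/3 1/3 1/3))    ==>  #t
    (+ 1/3 0.5)                 ==>  an inexact approximation of 5/6
    (EXACT? (+ 1/3 0.5))        ==>  #f    ; inexactness is contagious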

Kent Pitman:
    ....It would be one thing if it -also-
    provided floats, so that people could get real work done while they
    waited for the other stuff to get field tested, but my own cynical
    viewpoint is that--for quite a while yet, anyway-- implementations are
    likely to be either correct or efficient, but not both.  At least with
    floats the fact that games are being played is above board, and you
    don't have to fail to conform in order to use standard hardware.

Since floating point is not only a permitted representation for inexact
numbers but is in fact the typical representation, I don't understand
this argument at all.  Arithmetic in Scheme is as efficient as generic
arithmetic in, say, Common Lisp.

I agree that it will take a year or two for people to get the bugs out
of their systems.  In MacScheme 2.02, for example, the comparison
operators aren't transitive yet, the NUMBER->STRING, STRING->NUMBER, and
EXACT->INEXACT procedures aren't always good to the last bit, and the
MIN and MAX procedures still act as selectors.  These are the only
arithmetic bugs I know of, six months after product release.  None have
to do with exactness or inexactness, since they would remain bugs even
if Scheme were to require floats instead of inexact numbers.

Guy Steele Jr:
    ...every C weenie has the formula
      #define max(x,y) ((x) > (y)) ? (x) : (y)
    engraved in his little weenie brain.  See?  MAX really is a conditional.

I'm not much of a C weenie, and I don't have a description of the
proposed (or actual?) ANSI standard, but I think I know enough C to
suspect that Guy didn't mean for us Scheme weenies to take this seriously.
When examined more closely, Guy's example shows that C weenies think more
like me than like Guy.  (This doesn't surprise me, unfortunately.)

C is a statically typed language, with very little polymorphism, that
regards ints and floats as different types.  (I'm going to speak as
though all the different integer types were collapsed into one type, and
all the different float types into one type.)  The two alternatives of
a conditional expression must have the same type, or else be convertible
to the same type using C's conversion rules.  If you define max as Guy
suggested and then write max(2.5, 1000) as an expression, then the result
of that expression will be a float (1000.0), not an int.

If you write max(2.5, 1000) in a context that expects an int, then the
floating point 1000.0 will be converted back into an int (1000).  (It
isn't wise to regard that conversion as part of the expression's
semantics, because you would then have to believe that max(2, 1000.5)
also evaluates to an int.)  Hence max(2.5, 1000) in an int context is
really the equivalent of (INEXACT->EXACT (TRUNCATE (MAX 2.5 1000))) in
Scheme.

So even if MAX really is a conditional in C, (MAX 2.5 1000) ==> 1000.0
is consistent with that.  It seems that Guy's example actually supports
the R3.95RS semantics he was arguing against.
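
For what it is worth, the R3.95RS behavior is easy to picture as an
ordinary Scheme procedure.  Here is a minimal two-argument sketch (the
name MY-MAX and the definition are mine, purely for illustration):

    (DEFINE (MY-MAX X Y)
      (LET ((M (IF (> X Y) X Y)))          ; select the larger argument
        (IF (AND (EXACT? X) (EXACT? Y))
            M                              ; all arguments exact => exact
            (EXACT->INEXACT M))))          ; otherwise inexact contagion

    (MY-MAX 2.5 1000)   ==>  1000.0
    (MY-MAX 2 1000)     ==>  1000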

To my knowledge, no major programming language defines MAX so that
MAX of a floating point 2.5 and an exact integer 1000 is guaranteed
to be an (exact) integer.  Common Lisp, which explicitly leaves it up
to the implementation, comes the closest.  [My original posting to the
effect that Common Lisp requires (MAX 2.5 1000) to be 1000.0 was
incorrect, as Guy pointed out.]

With most C compilers, it is true that the result of max(x, y) will
always be equal to one of the arguments.  This essentially results
from the poverty of C's representations.  The R3.95RS semantics of
MAX allows this property to be achieved in Scheme the same way it
is achieved by C:

  *  Don't implement exact arithmetic for any types except integers.
  *  Limit the range of exact integers to +-9007199254740992.
     (Any smaller range will work too.)
  *  Represent inexact reals as IEEE 64-bit floats.
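
Under those three restrictions every exact integer is exactly
representable as an IEEE 64-bit float (9007199254740992 is 2^53), so
coercion to inexact cannot change an exact integer's value.  Contagion
may change the exactness of the result of MAX, but the result remains
numerically equal to one of the arguments:

    (MAX 2.5 1000)              ==>  1000.0
    (= (MAX 2.5 1000) 1000)     ==>  #t
    (EQV? (MAX 2.5 1000) 1000)  ==>  #f    ; exactness differs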

Hence one could say that the R3.95RS semantics, like that of Common
Lisp, allows (but does not require) MAX to return a result that is not
equal to any of its arguments, but saying so would be misleading.
[Apologies to RMN.]

Guy Steele Jr:
    (1) I see no reason why the rules of inexact contagion should
    apply to MAX.  As Pavel has observed, >= does not return an
    inexact boolean.  (If >= were to be three-valued (yes/no/maybe)
    then it would be more consistent.)

MIN and MAX don't return booleans.  With the R3.95RS semantics, my
faith in an exact result is threatened only by predicates and by
INEXACT->EXACT.  I'm not happy about having to remember those
exceptions, but I'd be even less happy if I had to remember them as
"predicates, INEXACT->EXACT, MIN, and MAX".  The predicates and
INEXACT->EXACT are pretty easy to remember because they seem so
inevitable, but MIN and MAX would look like they made the list
because someone hadn't been thinking very clearly.

I'm not convinced by the argument that MIN and MAX are mere comparisons,
not computations, possibly because I know how much computation a mere
comparison is likely to take in Scheme when one argument is exact and
the other inexact.
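
To see why, consider what a careful implementation has to do to compare
an exact integer with an inexact real without losing precision: it must
move the comparison into the exact domain.  A sketch (the helper name is
hypothetical, and exact rational arithmetic is assumed):

    (DEFINE (EXACT-LESS-THAN-INEXACT? N X)
      ;; N is exact, X is inexact; compare exactly by converting X
      ;; to the exact rational it denotes
      (< N (INEXACT->EXACT X)))

    (EXACT-LESS-THAN-INEXACT? #e1e100 1.4)   ==>  #f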


Alan Bawden:
    ....If the multiplication overflows the range of exact integers in
    some implementation and returns an inexact, then the division will
    return an inexact as well, and then VECTOR-REF will signal an error....

Guy Steele Jr:
    Depends on why I put the MIN in there.  If I put it in precisely to
    guard against indexing off the end of the vector when running in
    broken implementations that think 32 bits is enough to count anything
    interesting at all, then the program is doing its job, and MIN is
    exactly what I wanted.

VECTOR-REF isn't guaranteed to signal an error if its second argument is
inexact.  Section 1.3.2 says "Implementations may extend a procedure's
domain of definition to include such arguments".
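
So whether Guy's guard keeps working is up to the implementation.  A
hypothetical illustration:

    (VECTOR-REF '#(a b c) 1.0)
        ==>  b, in an implementation that extends VECTOR-REF to accept
             inexact indexes
        ==>  otherwise the call is an error, which the implementation
             is not obliged to detect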


Will Clinger:
    ...e.g. (MAX 1.4 #e1e100) ==> 9.999999999999998e99.

Bill Rozas:
    As far as I understand it (and GJS agrees with me), the example Will
    shows could only be correct in an implementation where 
    (>= 9.999999999999998e99 #e1e100) is true.  If it isn't, the
    implementation of MAX/SUP is in error.

Not as I understand it.  I intended for this example to illustrate
what happens in an implementation that uses IEEE 64-bit floating
point to represent inexact reals, but I goofed (because of a bug
in the least significant bit of MacScheme's implementation of
EXACT->INEXACT for large numbers).  A better example is

    (MAX 1.4 #e1e200) ==> 9.99999999999999969733e199

I think this example is correct in the implementation described, even
though (>= 9.99999999999999969733e199 #e1e200) is false.  The entire
specification of MAX and MIN is that they "return the maximum or minimum
of their arguments", but the implementation described will be unable to
meet this spec because it doesn't have enough precision to represent
10^200 as an inexact number without loss of accuracy.  The implementation
has exactly the same problem with + and *, which are supposed to "return
the sum or product of their arguments" but can't in examples like
(+ 1.5 1e-30).  In such cases "it is the duty of each implementation
to make the result as close as practical to the mathematically
ideal result".  It happens that 9.99999999999999969733e199 is the
inexact number that is closest to 10^200 in this implementation.
The next closest inexact number is 1.00000000000000013969e200,
which is apparently the result desired by Rozas and GJS:

    ...A good implementation would coerce to floating point
    rounding towards infinity (ie. add 1 at the least significant
    mantissa bit after truncating the bits) in order to return a
    value >= than all the inputs.

This isn't obvious, and I think it's untrue.  The property you
desire (that the result be greater than or equal to all inputs)
is in conflict with another nice property, namely that the result
of MAX be as close as possible to the mathematically ideal result.
Something has to give.
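
Using the two neighboring inexact values quoted above (and assuming
IEEE 64-bit floats), the choice is between

    (MAX 1.4 #e1e200)  ==>  9.99999999999999969733e199
                            ; closest to the ideal result, but < 10^200

and

    (MAX 1.4 #e1e200)  ==>  1.00000000000000013969e200
                            ; >= every argument, but not the closest

No single answer has both properties.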

I think that requiring MAX to round up would be like requiring
(+ x y) to return a number greater than x whenever y is positive.
It sounds plausible at first, but would tend to make computations
less accurate.  I have to admit, given the difficulty of coming up
with any real-world examples where MAX returns a value that is not
equal to one of its arguments, that a difference in the least
significant bit in such cases would probably be of very little
consequence, and it might make someone's mother happy.


Jim Miller:
        (a) They are selectors from a set of numbers and thus there is
            a legitimate expectation that (memq (max <set>) <set>) ==>
            #t.  (Yes, I DO mean memq not memv.)

I think you should have meant MEMV, not MEMQ.  There is only one reason
why EQ? is in Scheme at all: it is an efficiency hack for performing an
EQV? comparison when one of the arguments is known to be of a type for
which EQ? coincides with EQV?.  Consider these facts:  The EQ? procedure
is nothing but a more implementation-dependent version of EQV?.  It is
perfectly legal for an implementation to make EQ? be exactly the same
procedure as EQV?.  It is also legal for EQ? to return #F whenever it is
passed a number or a character.  EQ? is, in short, a real crock whose
role we should be eliminating rather than expanding.
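
For example, all of the following are legal under the report; the EQV?
and MEMV results are required, while the EQ? and MEMQ results are
merely permitted:

    (EQV? 100000000000 100000000000)   ==>  #t
    (EQ?  100000000000 100000000000)   ==>  #f         ; permitted
    (EQV? #\a #\a)                     ==>  #t
    (EQ?  #\a #\a)                     ==>  #f         ; permitted
    (MEMV (MAX 2 5 3) '(2 5 3))        ==>  (5 3)
    (MEMQ (MAX 2 5 3) '(2 5 3))        ==>  #f         ; permitted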

(The fact that EQ? is permitted to be universally false when passed a
number or character would not be affected by the R3.99RS's new requirement
that EQV?-ness must be preserved by storing and fetching from data
structures.  Strengthening this to preservation of EQ?-ness would decrease
the performance of some Scheme implementations, which could only be
justified by some offsetting benefit.  The efficiency argument against
preserving EQ?-ness cannot simply be dismissed, because efficiency is also
the only argument in favor of having EQ? in the language.  The only
important objects for which EQ? can differ from EQV? are numbers,
characters, and empty strings and vectors, so any benefits obtained from
preserving EQ?-ness as well as EQV?-ness must turn on the desirability
of making EQ? more reliable on these objects.  It will be hard to argue
that EQ? needs to be more reliable on these objects, because anyone who
thinks they need the additional reliability could just use EQV? instead.)


Guy Steele Jr:
    ....maybe inexact numbers should not possess object identity.

The behavior of EQ? on numbers (exact or inexact) is unspecified,
except that it must always return a boolean and (EQ? x y) implies
(EQV? x y).  The behavior of EQV? on numbers is defined in terms
of = and EXACT?.  It therefore seems to me that Scheme numbers in
general already lack object identity, unless what you mean by that
is either (1) you want calling EQ? or EQV? on an inexact number to
be an error; or (2) you want the Scheme procedure = to be unspecified
or erroneous on inexact numbers.
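
Concretely, the numeric part of EQV? comes down to this:

    (EQV? 2 2)      ==>  #t      ; numerically equal, both exact
    (EQV? 2.0 2.0)  ==>  #t      ; numerically equal, both inexact
    (EQV? 2 2.0)    ==>  #f      ; numerically equal, different exactness
    (=    2 2.0)    ==>  #t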


Alan Bawden:
    ....I am a believer in object identity even for inexact numbers....

I haven't the slightest idea what you are talking about, and it makes
me feel like Michael Dukakis.  Could somebody please tell me what
object identity is, and why it inspires such passionate allegiance?


Jim Miller:
    (e) None of the above.  Please state your position.

See page 33 of "'Toons for our Times" by Berke Breathed.

Peace, Will