
readers & tokenizers

    Date: Thu, 9 Apr 87 14:11:51 PDT
    From: andy at hobbes.ads.ARPA (Andy Cromarty)

    One of the difficulties with conventional read tables is that
    they become very dangerous in a "real" production software

    I have no objection to an improved input system if it is stateless.

Just for the record, I feel obliged to point out that T has a
parameterized reader (i.e. readtables), and at the same time it is
completely safe and stateless.  There is no such thing as *READTABLE* in
T; the global default reader parameters are immutable.  There is a
version of READ (called READ-OBJECT) which takes a read table as an
argument.  READ extracts a read table out of the port being read from
and then calls READ-OBJECT.  A port's readtable is initially the
standard readtable when the port is opened, but it can be set to be
something else after the port is opened; thus readtables are lexically
scoped.  A user can create new readtables and mutate them (e.g. define a
read macro or alter the input radix), but the standard readtable is
immutable, so there's no way you can accidentally step on a readtable
that someone else will see.
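The following minimal Python model of the T arrangement described above may make the data flow clearer. Everything in it is hypothetical: `Readtable`, `STANDARD_READTABLE`, `Port`, `read_object`, `read`, `define_read_macro`, and `set_radix` are illustrative names loosely patterned on T's READ and READ-OBJECT, not T's actual API. The "reader" only splits on whitespace, dispatches on macro characters, and parses integers in the table's radix.

```python
import io


class Readtable:
    """Hypothetical T-style readtable: read-macro characters plus an input radix."""

    def __init__(self, macros=None, radix=10, mutable=True):
        self.macros = dict(macros or {})
        self.radix = radix
        self.mutable = mutable

    def copy(self):
        # New readtables are mutable copies, never aliases of an existing table.
        return Readtable(self.macros, self.radix, mutable=True)

    def _check_mutable(self):
        # In this sketch immutability is enforced only through the mutator methods.
        if not self.mutable:
            raise PermissionError("the standard readtable is immutable")

    def define_read_macro(self, char, fn):
        self._check_mutable()
        self.macros[char] = fn

    def set_radix(self, radix):
        self._check_mutable()
        self.radix = radix


# The global default is immutable, so nobody can change what other ports see.
STANDARD_READTABLE = Readtable(mutable=False)


class Port:
    def __init__(self, text):
        self.stream = io.StringIO(text)
        # Every port starts out with the standard readtable; it may be replaced later.
        self.readtable = STANDARD_READTABLE


def read_object(port, readtable):
    """Read one object from PORT using an explicitly supplied readtable."""
    s = port.stream
    ch = s.read(1)
    while ch and ch.isspace():  # skip leading whitespace
        ch = s.read(1)
    if not ch:
        return None  # end of input (a simplification for this sketch)
    if ch in readtable.macros:  # dispatch to a read macro
        return readtable.macros[ch](port, readtable)
    token = ch
    ch = s.read(1)
    while ch and not ch.isspace():  # collect a whitespace-delimited token
        token += ch
        ch = s.read(1)
    try:
        return int(token, readtable.radix)
    except ValueError:
        return token  # anything that is not a number is treated as a symbol name


def read(port):
    """READ: fetch the readtable from the port, then delegate to READ-OBJECT."""
    return read_object(port, port.readtable)
```

Because `read` always consults the port's own table and new tables begin life as copies, installing a hex-radix table on one port cannot change how any other port reads.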

I do not propose this for Scheme, but I thought you should know that
stateless doesn't mean you have to throw away the possibility of something
higher level than the scanner you suggest, or even read macros.

I think a function of the sort you propose would be fine, that's
basically the minimum I had in mind for a "tokenizer".  We might
consider introducing "character sets" as in Icon (and MIT Scheme??  I
vaguely remember seeing something of the sort) to help make this go even
faster (since that's the point of the proposal).  One procedure could
coerce either a list of characters or a predicate on characters to a
character set; the character set specifying the delimiters could then be
passed to READ-LINE (actually READ-STRING is probably a better name).
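Here is a sketch of the two procedures suggested above, written in Python for concreteness. The names `CharSet`, `char_set`, and `read_string` are hypothetical stand-ins for the proposed Scheme procedures, and an `io.StringIO` object plays the role of a port. Nothing here is an existing Scheme, Icon, or MIT Scheme API.

```python
import io


class CharSet:
    """Hypothetical Icon-style character set: just a membership test."""

    def __init__(self, test):
        self._test = test

    def __contains__(self, ch):
        return self._test(ch)


def char_set(spec):
    """Coerce a list (or string) of characters, or a predicate, to a CharSet."""
    if isinstance(spec, CharSet):
        return spec
    if callable(spec):
        return CharSet(spec)
    members = frozenset(spec)  # precompute once so each test is a hash lookup
    return CharSet(members.__contains__)


def read_string(port, delimiters):
    """Read characters from PORT up to, but not including, the first delimiter.

    The delimiter is left unread so the caller can inspect it; at end of input
    the characters collected so far (possibly "") are returned.
    """
    chunks = []
    while True:
        pos = port.tell()
        ch = port.read(1)
        if not ch:
            break
        if ch in delimiters:
            port.seek(pos)  # push the delimiter back onto the port
            break
        chunks.append(ch)
    return "".join(chunks)
```

Coercing the delimiter specification once, before any reading happens, is what would let an implementation substitute a cheap per-character test such as a bit vector. The `frozenset` here only approximates that.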

    Finally, I would advocate one extension to the reader itself.
    That is the inclusion of #+ and #-.  We already have added these
    to the ADS Scheme reader, because we found that it had a tremendous
    impact on portability of code from one LISP environment to another.
    Again, this sort of capability is critical for large development
    efforts or for the production of commercial tools that are intended
    to work in multiple LISP dialects.  I further suggest that "scheme"
    specifically be recognized by #+ and #- and that "#+scheme" be true
    in R?RS Scheme.

I have used read-time conditionalization in the past and have concluded
that it is not a good idea.  If a language must have conditionalization,
it must have run-time semantics (although of course a compiler can
optimize it into load-time or compile-time, depending on what it knows
about the target system).  The problem with read-time conditionalization
is that it interacts extremely poorly with cross-compilation, static
code analysis, and macros.
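The following toy Python model makes the cross-compilation problem concrete. A "source file" is a list of (feature, form) pairs: a feature of None means the form is unconditional, and ("unix", form) stands for `#+unix form`. The functions `read_time_expand`, `run_time_expand`, and `fold_for_target`, and the `when-feature` marker, are all hypothetical. Under read-time semantics, the feature set belongs to whoever runs the reader. Under run-time semantics, every form survives reading, and the test is resolved against the target, possibly by a compiler that knows the target's features.

```python
# Hypothetical source: (feature, form) pairs; feature None means unconditional.
SOURCE = [
    (None, "(define (greet) (display \"hi\"))"),
    ("unix", "(define path-separator #\\/)"),
    ("vms", "(define path-separator #\\])"),
]


def read_time_expand(source, reader_features):
    """#+ semantics: the decision is made by the process doing the READ.
    Discarded forms vanish before macros, analyzers, or compilers see them."""
    return [form for feat, form in source if feat is None or feat in reader_features]


def run_time_expand(source):
    """Run-time semantics: every form is read and kept; a conditional form
    becomes an ordinary test (marked 'when-feature') evaluated on the target."""
    return [form if feat is None else ("when-feature", feat, form)
            for feat, form in source]


def fold_for_target(code, target_features):
    """Optional compiler optimization: with known target features, fold the tests."""
    out = []
    for item in code:
        if isinstance(item, tuple):
            _, feat, form = item
            if feat in target_features:
                out.append(form)
        else:
            out.append(item)
    return out
```

Suppose a Unix host reads this file on behalf of a VMS target. In that case `read_time_expand(SOURCE, {"unix"})` keeps the Unix definition, which is wrong for the target. By contrast, `fold_for_target(run_time_expand(SOURCE), {"vms"})` keeps the VMS one, and an analyzer working on `run_time_expand(SOURCE)` sees all three forms.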

What I always do is encapsulate implementation dependencies by defining
an interface, and then write multiple implementations of the same
interface.  In my experience this leads to code that's much prettier,
more modular, AND more portable.  Why won't this work for you?
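Below is a small illustration of the interface-plus-implementations approach, written in Python. The `FileSystem` interface, its two implementations, `make_file_system`, and `config_path` are invented for this example, and the VMS path syntax is simplified. The point is that client code names only the interface. The single implementation-dependent choice is made in one place, rather than scattered through the source as `#+`/`#-` tags.

```python
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """Hypothetical interface hiding one implementation dependency."""

    @abstractmethod
    def join(self, directory: str, name: str) -> str:
        ...


class UnixFileSystem(FileSystem):
    def join(self, directory, name):
        return directory.rstrip("/") + "/" + name


class VmsFileSystem(FileSystem):
    def join(self, directory, name):
        # Simplified VMS-style "[dir]name" syntax, for illustration only.
        return "[" + directory.strip("[]") + "]" + name


def make_file_system(kind: str) -> FileSystem:
    """The one place that knows which implementation goes with which system."""
    implementations = {"unix": UnixFileSystem, "vms": VmsFileSystem}
    return implementations[kind]()


def config_path(fs: FileSystem, home: str) -> str:
    # Portable client code: depends only on the FileSystem interface.
    return fs.join(home, "init.scm")
```

Adding a new system then means writing one more class and one more table entry; `config_path` and every other client stay untouched.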