
readers & tokenizers



    Date: Thu, 9 Apr 87 09:32:58 est
    From: jinx at GENEVA.AI.MIT.EDU (Guillermo J. Rozas)
    To:   bartley%home%ti-csl.csnet at RELAY.CS.NET
    cc:   rrrs-authors at MC.LCS.MIT.EDU,
          Bartley%home%ti-csl.csnet at RELAY.CS.NET
    Re:   towards an agenda

    I feel very strongly against customizable readers.  I think they are
    not really very useful, and on the other hand are abused immensely.
    I've seen plenty of code which I can no longer recognize as Lisp code
    because of all the reader extensions used in it.
    I think it is a bad idea to standardize on one.

I sympathize with this.  However, the argument I see in favor is as
follows: many Scheme implementations internally have high-speed
tokenizers which either are, or could easily be made, table-driven.
If for some reason one writes a Scheme program which needs a tokenizer,
a (Pascal, Prolog, ...) compiler for example, one has a choice between
writing slow portable code and writing fast unportable code.  The
speedup can be quite significant; how many implementations have readers
that are written using only the I/O primitives in the report?
(The reader in the meta-circular Scheme implementation I'm working on
comes close, except that it uses PEEK-CHAR.)

I'm not going to be precise about what I mean by "tokenizer", but at
the very least it means something akin to READ-LINE: a procedure that
stops when it comes upon a "delimiter", and perhaps does some filtering
or parsing (e.g. case normalization, escape sequence handling) along
the way.
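
To make that concrete, here is roughly the kind of thing I have in
mind, written using nothing beyond READ-CHAR, PEEK-CHAR, and the
standard character and list operations.  This is only a sketch; the
names READ-TOKEN and TOKEN-DELIMITER?, the delimiter set, and the case
normalization are made up for illustration, not proposals.

(define token-delimiters
  (list #\space #\newline #\( #\) #\" #\;))

(define (token-delimiter? char)
  (memv char token-delimiters))

;; Collect characters up to (but not including) the next delimiter or
;; end of file, normalizing case along the way.  This is the slow
;; portable version; an implementation would presumably substitute its
;; internal table-driven tokenizer.
(define (read-token port)
  (let loop ((chars '()))
    (let ((next (peek-char port)))
      (if (or (eof-object? next) (token-delimiter? next))
          (list->string (reverse chars))
          (begin
            (read-char port)
            (loop (cons (char-downcase next) chars)))))))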

Of course, you can in this case use the technique I described in a
previous message: define your own interface and then make a different,
bummed implementation of it for each Scheme implementation you actually
use.  But any situation like this is a candidate for standardization if
more than one person is likely to use it.
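
For example (sticking with the made-up names from the sketch above),
the rest of the program refers only to the agreed-upon name, and never
needs to know whether it got the portable fallback or a bummed,
implementation-specific version loaded in its place:

;; Written against the interface only; works with either the portable
;; READ-TOKEN above or a native, bummed replacement for it.
(define (read-identifier port)
  (string->symbol (read-token port)))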

For things like this (the category also includes multiple values #1,
hash tables, Chris's string-manipulation primitives, and other things I
have mentioned before), we are really in the business of standardizing
on interfaces to libraries of things that can be written portably but
are often better not written that way.  We're not talking about changes
to the language; we're talking about making life easier for those who
use these features and for the people who read their code.

As for defining "read macros" for Scheme programs, that's another story.
I've never felt much need for this except when emulating one Lisp
dialect in another, and that application has certainly diminished in
importance now that there is some degree of standardization.  In cases
where I feel I just can't do without, e.g. if I'm playing with some idea
for a language design which for some bizarre reason really needs a new
read macro, I probably wouldn't mind too terribly writing my own reader,
which isn't such a difficult thing, especially if there's already a
tokenizer handy.  You probably want to do that if you're emulating other
dialects (like Common Lisp), too.
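
To give an idea of how little is involved once a tokenizer is handy,
here is a toy reader with a one-entry read-macro table.  Again the
names (MY-READ, MY-READ-MACROS, MY-READ-LIST) are made up, it leans on
the READ-TOKEN sketch from earlier, and it ignores numbers, strings,
dotted pairs, and error handling entirely.

;; One read macro: ' turns the next datum into a QUOTE form.
(define my-read-macros
  (list (cons #\' (lambda (port) (list 'quote (my-read port))))))

(define (my-read port)
  (let ((next (peek-char port)))
    (cond ((eof-object? next) next)
          ((char-whitespace? next) (read-char port) (my-read port))
          ((char=? next #\() (read-char port) (my-read-list port))
          ((assv next my-read-macros)
           => (lambda (entry) (read-char port) ((cdr entry) port)))
          (else (string->symbol (read-token port))))))

;; Accumulate list elements until the closing parenthesis.
(define (my-read-list port)
  (let ((next (peek-char port)))
    (cond ((char=? next #\)) (read-char port) '())
          ((char-whitespace? next) (read-char port) (my-read-list port))
          (else (let ((head (my-read port)))
                  (cons head (my-read-list port)))))))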

The peculiar thing about the Common Lisp situation is that it supports
read macros but doesn't give you any control over the tokenizer.  For
example, if you're implementing C and need to distinguish case, you're
out of luck.  If you want 1A to be an error, or colon to be alphabetic,
or ... to be a valid symbol, give up.  But then generality was
explicitly not one of their design goals in this case.

Jonathan