customizable reader



Here's a proposal for the yellow pages: a customizable reader patterned
after Guy Steele's in CLtL.  It adds yet another optional argument to
read, but at least it doesn't have any global state.  If you like global
state you can, of course, redefine the READ and READ-DELIMITED-LIST
procedures to use a different default readtable.

The proposal does not support the following Common Lisp features:
non-preservation of whitespace, eof-error-p, eof-value, recursive-p,
zero return values, packages.

Since the status of #\| and #\\ is not specified, it is not specified
whether there are any single or multiple escape characters in the
standard readtable.  Et cetera.  I haven't bothered to specify the token
parser used by the standard readtable because the number syntax is still
a topic of discussion.

The description assumes two new types: the readtable and syntax types.
These types do not have to be distinct from existing types.  In my
prototype implementation I use vectors.  If we had structures, they
would undoubtedly be structures.

================================================================

(read)							--> object
(read port)						--> object
(read port readtable)					--> object

    The readtable argument defaults to the standard readtable.
    By specifying an explicit readtable you can inflict your own
    crazy syntax on the characters read from the port.

(read-delimited-list char)				--> object
(read-delimited-list char port)				--> object
(read-delimited-list char port readtable)		--> object

    Following CLtL.  This packages up most of the hair involved with
    handling comments.  I observe that CLtL does not quite give enough
    information for a user to write the #\( and #\) macro procedures,
    which must deal with the dotted pair syntax in addition to comments.
    Neither does this proposal:  I think #\(, #\), and #\; have to be
    magic, which makes it hard for a user to make #\[ and #\] behave
    exactly like #\( and #\).

(copy-readtable)						--> readtable
(copy-readtable readtable)				--> readtable

    The argument defaults to the standard readtable.  Returns a copy of
    its argument.  Calls to set-token-parser!, set-character-syntax!,
    and set-dispatch-character! on the copy do not affect the original.
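    To illustrate the copy semantics, here is a minimal sketch of a
    vector-based readtable.  The layout and every toy- name are
    hypothetical, made up for this example; it is not the prototype
    implementation, only one way the "copy does not affect the original"
    rule could be met.

```scheme
;; Hypothetical layout (an assumption, not part of the proposal):
;;   slot 0: token parser
;;   slot 1: vector of syntax entries, indexed by character code 0..255
;;   slot 2: vector of dispatch procs, indexed by character code 0..255

;; Portable shallow copy of a vector.
(define (copy-vector v)
  (let* ((n (vector-length v))
         (new (make-vector n)))
    (do ((k 0 (+ k 1)))
        ((= k n) new)
      (vector-set! new k (vector-ref v k)))))

(define (make-toy-readtable parser)
  (vector parser
          (make-vector 256 'constituent)
          (make-vector 256 #f)))

;; The inner tables are copied too, so later mutation of the copy
;; cannot reach the original.
(define (toy-copy-readtable rt)
  (vector (vector-ref rt 0)
          (copy-vector (vector-ref rt 1))
          (copy-vector (vector-ref rt 2))))

;; Only characters with codes 0..255 fit in this toy table.
(define (toy-get-character-syntax rt char)
  (vector-ref (vector-ref rt 1) (char->integer char)))

(define (toy-set-character-syntax! rt char syntax)
  (vector-set! (vector-ref rt 1) (char->integer char) syntax))
```

    Copying only the outer vector would leave both readtables sharing
    the same syntax table, which is exactly what the description above
    rules out.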

(get-token-parser readtable)				--> parser

    Returns the readtable's token parser.  The token parser is a procedure
    of three arguments s, i, and j.  The token parser must return a token
    parsed from (substring s i j) or else signal an error.  The token
    parser used by the standard readtable always returns a symbol or
    number if it doesn't signal an error.  The token parser is bypassed
    for tokens that contain single or multiple escape characters, since
    these always indicate a symbol.  Neither is the token parser called
    for whitespace, illegal, macro, or dispatch characters.

    Passing a string and indexes instead of just a string is an efficiency
    hack suggested by David Bartley.  To prevent the token parser from
    depending on the contents of its string argument outside of
    (substring s i j), I propose that every token parser foo be equivalent
    to
        (lambda (s i j) (foo (substring s i j) 0 (- j i)))

    This also implies that the token parser is not allowed to mutate
    its string argument.
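    As a sketch of what such a parser might look like, the following
    toy token parser accepts unsigned decimal integers and turns
    everything else into a symbol.  The names digits->integer,
    example-token-parser, and normalize are made up for illustration,
    and this is not the standard readtable's parser, which is left
    unspecified above.

```scheme
;; Return the integer spelled by s[i..j) if that range is a nonempty
;; run of ASCII decimal digits, else #f.  Scans in place, without
;; allocating a substring.
(define (digits->integer s i j)
  (let loop ((k i) (n 0))
    (if (= k j)
        (if (> j i) n #f)
        (let ((c (string-ref s k)))
          (if (and (char>=? c #\0) (char<=? c #\9))
              (loop (+ k 1)
                    (+ (* 10 n)
                       (- (char->integer c) (char->integer #\0))))
              #f)))))

;; A token parser of three arguments s, i, and j.  It looks only at
;; s[i..j) and never mutates s.
(define (example-token-parser s i j)
  (or (digits->integer s i j)
      (string->symbol (substring s i j))))

;; The wrapper from the equivalence requirement: a conforming parser
;; must behave the same with or without it.
(define (normalize parser)
  (lambda (s i j) (parser (substring s i j) 0 (- j i))))
```

    For example, (example-token-parser "(foo 42)" 5 7) parses the
    characters "42", and ((normalize example-token-parser) "(foo 42)" 5 7)
    has to give the same answer.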

(set-token-parser! readtable parser)			--> #!unspecified

    Sets the token parser of readtable to parser.  By changing the token
    parser and making every character either constituent or whitespace,
    you can take complete control of the parser.  More realistically,
    this hook lets you experiment with weird number syntaxes et cetera.

(get-character-syntax readtable char)			--> syntax

    Returns the character syntax associated with the given character
    in the given readtable.

(set-character-syntax! readtable char syntax)		--> #!unspecified

    Changes the character syntax of the given character in the given
    readtable.

(make-character-syntax 'constituent)			--> syntax
(make-character-syntax 'whitespace)			--> syntax
(make-character-syntax 'illegal)			--> syntax
(make-character-syntax 'single-escape)			--> syntax
(make-character-syntax 'multiple-escape)		--> syntax
(make-character-syntax 'non-terminating-macro proc)	--> syntax
(make-character-syntax 'terminating-macro proc)		--> syntax
(make-character-syntax 'macro proc)			--> syntax

    Manufactures syntax objects.  The 'macro option is an alias for
    the 'terminating-macro option.  The proc is a procedure of three
    arguments: the macro character that has just been consumed (this
    is so you can use one proc for several related macro characters),
    a port, and a readtable.  (It would gross me out if any of these
    arguments were optional.)  See CLtL for explanation of what the
    options mean.
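    Here is a sketch of a macro proc with the three-argument shape
    described above.  It reads a run of hexadecimal digits from the port
    and returns the number, as one might want for a hypothetical $ff
    syntax.  The names hex-digit-value and read-hex-macro and the $
    syntax are inventions for this example.  It uses only peek-char and
    read-char, and it assumes string ports (open-input-string) are
    available for trying it out.

```scheme
;; Value of a hexadecimal digit character, or #f if c is not one.
(define (hex-digit-value c)
  (cond ((and (char>=? c #\0) (char<=? c #\9))
         (- (char->integer c) (char->integer #\0)))
        ((and (char>=? c #\a) (char<=? c #\f))
         (+ 10 (- (char->integer c) (char->integer #\a))))
        ((and (char>=? c #\A) (char<=? c #\F))
         (+ 10 (- (char->integer c) (char->integer #\A))))
        (else #f)))

;; Macro proc: char is the macro character just consumed, port is
;; where the rest of the input comes from, and readtable is accepted
;; but ignored here.  Stops at the first non-hex character and leaves
;; it in the port.
(define (read-hex-macro char port readtable)
  (let loop ((n 0))
    (let ((c (peek-char port)))
      (if (and (char? c) (hex-digit-value c))
          (begin (read-char port)
                 (loop (+ (* 16 n) (hex-digit-value c))))
          n))))
```

    Under the proposal this proc would be installed with something like
    (set-character-syntax! rt #\$ (make-character-syntax 'macro read-hex-macro)).
    It can be tried without any readtable at all, since it ignores that
    argument: (read-hex-macro #\$ (open-input-string "ff)") #f).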

(get-dispatch-character readtable char)			--> proc

    Returns the proc associated with the dispatch macro character formed
    by an octathorpe (#) followed by char.  The proc takes three
    arguments: the macro character just consumed, a port, and a
    readtable.

(set-dispatch-character! readtable char proc)		--> #!unspecified

    Defines a new octathorpe macro character: in the given readtable,
    an octathorpe (#) followed by char is handled by proc.