customizable reader
Here's a proposal for the yellow pages: a customizable reader patterned
after Guy Steele's in CLtL. It adds yet another optional argument to
read, but at least it doesn't have any global state. If you like global
state you can, of course, redefine the READ and READ-DELIMITED-LIST
procedures to use a different default readtable.
The proposal does not support the following Common Lisp features:
non-preservation of whitespace, eof-error-p, eof-value, recursive-p,
zero return values, packages.
Since the status of #\| and #\\ is not specified, it is not specified
whether there are any single or multiple escape characters in the
standard readtable. Et cetera. I haven't bothered to specify the token
parser used by the standard readtable because the number syntax is still
a topic of discussion.
The description assumes two new types: the readtable and syntax types.
These types do not have to be distinct from existing types. In my
prototype implementation I use vectors. If we had structures, they
would undoubtedly be structures.
================================================================
(read) --> object
(read port) --> object
(read port readtable) --> object
The readtable argument defaults to the standard readtable.
By specifying an explicit readtable you can inflict your own
crazy syntax on the characters read from the port.
(read-delimited-list char) --> object
(read-delimited-list char port) --> object
(read-delimited-list char port readtable) --> object
Following CLtL. This packages up most of the hair involved with
handling comments. I observe that CLtL does not quite give enough
information for a user to write the #\( and #\) macro procedures,
which must deal with the dotted pair syntax in addition to comments.
Neither does this proposal: I think #\(, #\), and #\; have to be
magic, which makes it hard for a user to make #\[ and #\] behave
exactly like #\( and #\).
(copy-readtable) --> readtable
(copy-readtable readtable) --> readtable
The argument defaults to the standard readtable. Returns a copy of
its argument. Calls to set-token-parser!, set-character-syntax!,
and set-dispatch-character! on the copy do not affect the original.
(get-token-parser readtable) --> parser
Returns the readtable's token parser. The token parser is a procedure
of three arguments s, i, and j. The token parser must return a token
parsed from (substring s i j) or else signal an error. The token
parser used by the standard readtable always returns a symbol or
number if it doesn't signal an error. The token parser is bypassed
for tokens that contain single or multiple escape characters, since
these always indicate a symbol. Neither is the token parser called
for whitespace, illegal, macro, or dispatch characters.
Passing a string and indexes instead of just a string is an efficiency
hack suggested by David Bartley. To prevent the token parser from
depending on the contents of its string argument outside of
(substring s i j), I propose that every token parser foo be equivalent
to
(lambda (s i j) (foo (substring s i j) 0 (- j i)))
This also implies that the token parser is not allowed to mutate
its string argument.
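A minimal sketch of the equivalence above, assuming an R7RS-style Scheme.
The names sample-token-parser and isolate are hypothetical and are not part
of the proposal; the sample parser is only one plausible token parser, not
the one used by the standard readtable.

```scheme
;; Hypothetical token parser: return a number if the token text parses
;; as one, otherwise return a symbol.  It looks only at (substring s i j).
(define (sample-token-parser s i j)
  (let ((text (substring s i j)))
    (or (string->number text)
        (string->symbol text))))

;; The wrapper from the proposal: the wrapped parser sees only a fresh
;; copy of the token text, never the rest of the buffer.
(define (isolate parser)
  (lambda (s i j)
    (parser (substring s i j) 0 (- j i))))

(define isolated-parser (isolate sample-token-parser))

(define line "(foo 42 bar)")

(sample-token-parser line 5 7)   ; token text "42", yields 42
(isolated-parser line 5 7)       ; same result, as the rule requires
(sample-token-parser line 1 4)   ; token text "foo", yields a symbol
```

A parser that peeked at characters outside the range i..j, or that stored
into s, would behave differently once wrapped, which is exactly what the
rule forbids.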
(set-token-parser! readtable parser) --> #!unspecified
This changes the token parser of the first argument to be the second
argument. By changing the token parser and changing all characters
to be either constituent or whitespace, you can have complete control
over the parser. More realistically, this hook lets you experiment
with weird number syntaxes et cetera.
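As an illustration of a weird number syntax, here is a sketch of a token
parser that accepts underscores as digit separators. It assumes an
R7RS-style Scheme; strip-underscores, underscore-number-parser, and
my-readtable are hypothetical names. The call to set-token-parser! is left
as a comment because that procedure exists only in this proposal.

```scheme
;; Return a copy of text with every #\_ removed.
(define (strip-underscores text)
  (list->string
   (let loop ((chars (string->list text)))
     (cond ((null? chars) '())
           ((char=? (car chars) #\_) (loop (cdr chars)))
           (else (cons (car chars) (loop (cdr chars))))))))

;; Token parser of three arguments, as the proposal requires.
;; "1_000" parses as the number 1000; anything that is not a number
;; once the underscores are removed becomes a symbol, underscores intact.
(define (underscore-number-parser s i j)
  (let* ((text (substring s i j))
         (n (string->number (strip-underscores text))))
    (or n (string->symbol text))))

(underscore-number-parser "1_000_000" 0 9)   ; yields 1000000
(underscore-number-parser "foo_bar" 0 7)     ; yields the symbol foo_bar

;; Installing it, under the proposed API (not runnable today):
;; (define my-readtable (copy-readtable))
;; (set-token-parser! my-readtable underscore-number-parser)
```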
(get-character-syntax readtable char) --> syntax
Returns the character syntax associated with the given character
in the given readtable.
(set-character-syntax! readtable char syntax) --> #!unspecified
Changes the character syntax of the given character in the given
readtable.
(make-character-syntax 'constituent) --> syntax
(make-character-syntax 'whitespace) --> syntax
(make-character-syntax 'illegal) --> syntax
(make-character-syntax 'single-escape) --> syntax
(make-character-syntax 'multiple-escape) --> syntax
(make-character-syntax 'non-terminating-macro proc) --> syntax
(make-character-syntax 'terminating-macro proc) --> syntax
(make-character-syntax 'macro proc) --> syntax
Manufactures syntax objects. The 'macro option is an alias for
the 'terminating-macro option. The proc is a procedure of three
arguments: the macro character that has just been consumed (this
is so you can use one proc for several related macro characters),
a port, and a readtable. (It would gross me out if any of these
arguments were optional.) See CLtL for explanation of what the
options mean.
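A sketch of what a macro proc might look like, assuming an R7RS-style
Scheme with open-input-string. The proc reads text up to the next
occurrence of the macro character that triggered it and returns that text
as a string, so one proc can serve several delimiter characters. The name
read-until-same-char and the readtable my-readtable are hypothetical; the
registration is left as a comment because it uses the proposed API.

```scheme
;; Macro proc of three arguments: the macro character just consumed,
;; a port, and a readtable.  The readtable is ignored here because
;; this proc never calls the reader recursively.
(define (read-until-same-char char port readtable)
  (let loop ((chars '()))
    (let ((c (read-char port)))
      (cond ((eof-object? c)
             (error "end of input inside delimited text" char))
            ((char=? c char)
             (list->string (reverse chars)))
            (else (loop (cons c chars)))))))

;; Simulate the reader having just consumed a #\| from the port.
(define port (open-input-string "hello world| rest"))
(read-until-same-char #\| port #f)   ; yields "hello world"

;; Registering it, under the proposed API (not runnable today):
;; (set-character-syntax! my-readtable #\|
;;   (make-character-syntax 'terminating-macro read-until-same-char))
```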
(get-dispatch-character readtable char) --> proc
Returns the proc that the given readtable associates with the
octathorpe macro character char, that is, with # followed by char.
The proc takes three arguments: the macro character just consumed, a
port, and a readtable.
(set-dispatch-character! readtable char proc) --> #!unspecified
Defines a new octathorpe macro character in the given readtable:
afterwards, # followed by char invokes proc.