
Re: Oversight in R^4RS: `@' as an ``extended alphabetic character''

   Date: Tue, 6 Jan 1998 23:13:14 -0500
   From: Alan Bawden <Alan@lcs.mit.edu>

      From: "Michael R. Blair" <ziggy@martigny.ai.mit.edu>
      Date: Tue, 6 Jan 1998 15:48:14 -0500
      | [] Add @ to the list of extended alphabetics....

   Yes please.  That one's been bugging me for years.

I concur with the need to fix this.

Btw, if you do this, is it necessary to add a note saying how to parse
",@foo", i.e., as ",@ foo" vs ", @foo"?  Or is the grammar description strong
enough to force a good interpretation?  Also, is it worth reminding implementors
that the pretty printer has to special case this, or do we prefer a nice and
short standard that lets each implementation find and fix this bug on its own?
:-)  We have other multi-char tokens, like #(...) but they aren't ambiguous
because there is no meaning to # by itself.
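
To make the ambiguity concrete, here is a minimal sketch (in Python, not
taken from the report) of a longest-match tokenizer and the printer special
case, assuming `@' were allowed to start an identifier.  The names TOKEN,
tokenize and write_unquote are hypothetical, and the identifier rule is
deliberately simplified.

```python
import re

# Hypothetical, simplified token pattern.  The two-character token ",@"
# is tried before the one-character "," so the longest match wins.
TOKEN = re.compile(r"""
    \s*(
        ,@            # unquote-splicing
      | [()'`,]       # single-character tokens, including plain unquote
      | [^\s()'`,]+   # anything else is treated as an identifier here
    )""", re.VERBOSE)


def tokenize(text):
    """Split text into tokens, reading ",@" as one token when adjacent."""
    return TOKEN.findall(text)


def write_unquote(name):
    """Write (unquote name); insert a space when the name starts with '@'
    so that it does not read back as unquote-splicing."""
    if name.startswith("@"):
        return ", " + name
    return "," + name
```

Under these assumptions tokenize(",@foo") yields the tokens ",@" and "foo",
while tokenize(", @foo") yields "," and "@foo"; a printer that omitted the
space would silently turn the second form into the first.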

Making "@" be a "special subsequent" instead of a "special initial" might
be an easy way to keep this from being an issue.  Then again, @foo might be
a popular name for some variable holding an indirect quantity or a host name,
and that would eliminate it as a possible identifier.

   It would also be nice if the report said something about the status of `|'.
   Perhaps it belongs in the same category with `[', `]', `{' and `}'?

That seems right.  Btw, this list occurs only in the early, informal 
description of syntax.  Could we have this remark repeated in the 
formal section?

Also, if we're thinking of retiring this standard, perhaps the remark about
future standards should be weakened to say that these characters are reserved
from use by users, but are available to implementations for experimentation
in non-portable extensions that may become part of future standards.  I feel
bad that no one is using them when we realistically never will
either.  This would also legitimize CL-compatible uses of | as a part of a
symbol's notation--something implementations hosted in CL are likely to just
accept anyway, whether we allow it or not.

   Other than `@' and `|', all the rest of the ASCII printing characters are
   mentioned.  (I won't suggest expanding the coverage to all of ISO-8859-1...)

Perhaps we ought to write some "weasel words" admitting that extensions 
in this area are ok.  Seems horribly ethnocentric these days not to.

- - - -

Hey, I'm looking at what I think is r^4rs and it uses (in the formal 
syntax stuff) the notation "..." for two different purposes (perhaps in 
different fonts?) ... one in enumerating letters, and another in the
"peculiar tokens".  Is there any way to make that notation clearer?
Is it clearer in some later rev?

I wouldn't mind seeing the notation the SGML spec uses, where, when talking
about characters or character sequences, one uses 'a' or 'foo' or '#('
or "a" or "foo" or "#(" so that you can use '"' and "'" to refer to the
quote characters.  That means you don't have to rely on fonts to 
communicate the token-ness.
 '+' | '-' | '...'
 'a' | 'b' | ... | 'z'
would have no ambiguity to me even in plaintext like this.