Revisions to String Proposal



I have revised the earlier proposal I submitted for String operations.
The revision was somewhat of a hatchet job, but I think that the
important functionality remains.

What follows are some notes referring to the earlier proposal.  After
that I have written a small bit of commentary about mutation of
strings.

----------------------------------------------------------------------

			      Characters

The basic operations on characters are the following; they should be
essential. 

  (CHAR? <object>)
    True iff <object> is a character.

  (CHAR->INTEGER <char>)
  (INTEGER->CHAR <integer>)
    Maps to coerce characters to integers and vice versa.  If two
    characters have a certain relationship in the character ordering,
    then the corresponding integers must have the same relationship in
    the integer ordering.  (If I recall right, this means CHAR->INTEGER
    is an order isomorphism between the characters and the set of
    integers it maps them to, with INTEGER->CHAR as its inverse.)
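
As a small illustration of the order-preservation requirement, here is
a sketch in Python rather than Scheme.  Python's built-in ord and chr
stand in for CHAR->INTEGER and INTEGER->CHAR, and the helper name
preserves_order is made up for this note; it is not part of the
proposal.

```python
from itertools import product

def preserves_order(chars):
    """Return True if ord() keeps every pairwise relationship among chars."""
    for a, b in product(chars, repeat=2):
        # "<" and "=" on characters must agree with "<" and "=" on integers.
        if (a < b) != (ord(a) < ord(b)):
            return False
        if (a == b) != (ord(a) == ord(b)):
            return False
    return True

sample = "AZaz09 "
print(preserves_order(sample))                # order is carried over
print(all(chr(ord(c)) == c for c in sample))  # the two maps are inverses here
```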

I would like to propose that the following be essential.  These are
the case sensitive order predicates for characters; they define the
character ordering.  I think that the Common Lisp restrictions as to
the ordering should be adopted.

The names have been changed to conform with Common Lisp, after Don
Oxley pointed out that my CHAR-EQUAL? was case sensitive, whereas in
CL CHAR-EQUAL is the case insensitive version.  I hope that this will
clear things up a bit.

Each of the following accepts two character objects, and compares them
in the obvious way.  Optionally, they may take more arguments, as do
the corresponding numeric predicates.

    CHAR=?  CHAR<?  CHAR<=?  CHAR>?  CHAR>=?
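
To make the optional extra arguments concrete, here is a sketch in
Python (not Scheme) of predicates that require two characters and
accept more, comparing each adjacent pair the way the numeric
predicates do.  The names make_char_predicate, char_eq, char_lt and so
on are illustrative stand-ins, and using ord as the character ordering
is an assumption of the sketch.

```python
import operator

def make_char_predicate(compare):
    """Build a predicate over two or more characters from a binary test."""
    def predicate(first, second, *rest):
        chars = (first, second) + rest
        # True only if every adjacent pair satisfies the comparison.
        return all(compare(ord(a), ord(b)) for a, b in zip(chars, chars[1:]))
    return predicate

char_eq = make_char_predicate(operator.eq)   # models CHAR=?
char_lt = make_char_predicate(operator.lt)   # models CHAR<?
char_le = make_char_predicate(operator.le)   # models CHAR<=?
char_gt = make_char_predicate(operator.gt)   # models CHAR>?
char_ge = make_char_predicate(operator.ge)   # models CHAR>=?

print(char_lt("a", "b", "c"))   # every adjacent pair is increasing
print(char_lt("a", "c", "b"))   # the last pair is not
```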

The following should be optional; they are the case insensitive
versions of the above.

    CHAR-CI=?  CHAR-CI<?  CHAR-CI<=?  CHAR-CI>?  CHAR-CI>=?
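
One plausible reading of "case insensitive" is to fold both arguments
to a single case before comparing.  The Python sketch below assumes
that reading and assumes ASCII letters only; the function names are
hypothetical stand-ins for CHAR-CI=? and CHAR-CI<?.

```python
def _fold(char):
    """Fold an ASCII character to lower case and return its code."""
    return ord(char.lower())

def char_ci_eq(first, second, *rest):
    """Model of CHAR-CI=?: equality that ignores case."""
    codes = [_fold(c) for c in (first, second) + rest]
    return all(a == b for a, b in zip(codes, codes[1:]))

def char_ci_lt(first, second, *rest):
    """Model of CHAR-CI<?: strict ordering that ignores case."""
    codes = [_fold(c) for c in (first, second) + rest]
    return all(a < b for a, b in zip(codes, codes[1:]))

# In ASCII "B" sorts before "a", so the case sensitive and case
# insensitive orderings disagree on this pair.
print("a" < "B")             # case sensitive comparison
print(char_ci_lt("a", "B"))  # case insensitive comparison
```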

These character class predicates should be optional, with the meaning
described in the Common Lisp manual.  They each take one argument,
which must be a character.

    CHAR-UPPER-CASE?
    CHAR-LOWER-CASE?
    CHAR-ALPHABETIC?
    CHAR-NUMERIC?
    CHAR-ALPHANUMERIC?
    CHAR-WHITESPACE?
    CHAR-GRAPHIC?

The following should be optional; each takes a character object and
returns another character object.  They perform case conversion if the
argument is lower or upper case, respectively.

    CHAR-UPCASE
    CHAR-DOWNCASE
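
The intended behaviour, as I read it, is that a character of the
"wrong" case is converted and anything else is returned unchanged.  A
Python sketch, assuming ASCII letters only and using made-up names
char_upcase and char_downcase:

```python
def _require_char(char):
    """Reject anything that is not a single character."""
    if not isinstance(char, str) or len(char) != 1:
        raise TypeError("expected a single character")

def char_upcase(char):
    """Model of CHAR-UPCASE: convert only lower case ASCII letters."""
    _require_char(char)
    return char.upper() if "a" <= char <= "z" else char

def char_downcase(char):
    """Model of CHAR-DOWNCASE: convert only upper case ASCII letters."""
    _require_char(char)
    return char.lower() if "A" <= char <= "Z" else char

print(char_upcase("a"), char_upcase("A"), char_upcase("7"))
print(char_downcase("Q"), char_downcase("q"), char_downcase("?"))
```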

----------------------------------------------------------------------

			       Strings

I think that the following should be essential, as described in my
earlier proposal:

    STRING?  MAKE-STRING  STRING-LENGTH  STRING-REF  STRING->LIST
    LIST->STRING  SUBSTRING  STRING-APPEND  STRING-NULL?

As with the character predicates, the following should
be essential for two arguments, and optionally should take more.

    STRING=?  STRING<?  STRING<=?  STRING>?  STRING>=?

Again, these case insensitive versions are optional.

    STRING-CI=?  STRING-CI<?  STRING-CI<=?  STRING-CI>?  STRING-CI>=?

Next, these optional procedures provide a small set of operations for
mutable strings. 

    STRING-ALLOCATE  STRING-COPY  STRING-SET!  SUBSTRING-FILL!
    SUBSTRING-MOVE-RIGHT!  SUBSTRING-MOVE-LEFT!

The remaining operations, while useful, are probably not important
enough to standardize on.  As I have already demonstrated, all of them
can be implemented given the above operations.


Here is the corrected text for the -MOVE- operations:

(SUBSTRING-MOVE-RIGHT! STRING1 START1 END1 STRING2 START2)
(SUBSTRING-MOVE-LEFT! STRING1 START1 END1 STRING2 START2)

These operations destructively copy the substring <STRING1, START1,
END1> to the string STRING2 starting at the index START2.  It must be
the case that <STRING2, START2, (+ START2 (- END1 START1))> is a
substring; this latter substring is destructively modified to contain
the contents of the former substring.

The operations differ only when the two substrings overlap, i.e. when
STRING1 and STRING2 are EQ? and the index sets of the substrings are
not disjoint.  In this case, the operations are defined to copy the
elements of the first substring serially.  SUBSTRING-MOVE-RIGHT!
copies the first substring starting with the rightmost element,
proceeding to the left, while SUBSTRING-MOVE-LEFT! starts with the
leftmost element, proceeding to the right.  This has the effect that
the two operations can be used to shift groups of characters right or
left, respectively, within a given string.
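
The difference between the two operations is easiest to see by running
them on overlapping substrings.  Below is a sketch in Python rather
than Scheme, in which a mutable string is modelled as a list of
one-character strings.  The function names and the bounds check are
assumptions of the sketch; only the copy direction comes from the text
above.

```python
def _check(string1, start1, end1, string2, start2):
    """Both the source and the destination ranges must be substrings."""
    if not 0 <= start1 <= end1 <= len(string1):
        raise IndexError("source range is not a substring")
    if not 0 <= start2 <= start2 + (end1 - start1) <= len(string2):
        raise IndexError("destination range is not a substring")

def substring_move_left(string1, start1, end1, string2, start2):
    """Copy serially, starting with the leftmost element."""
    _check(string1, start1, end1, string2, start2)
    for offset in range(end1 - start1):
        string2[start2 + offset] = string1[start1 + offset]

def substring_move_right(string1, start1, end1, string2, start2):
    """Copy serially, starting with the rightmost element."""
    _check(string1, start1, end1, string2, start2)
    for offset in reversed(range(end1 - start1)):
        string2[start2 + offset] = string1[start1 + offset]

# Shifting "abcd" two places to the right within one string.
buf = list("abcdef")
substring_move_right(buf, 0, 4, buf, 2)
print("".join(buf))   # ababcd: the four characters arrive intact

buf = list("abcdef")
substring_move_left(buf, 0, 4, buf, 2)
print("".join(buf))   # ababab: the source was overwritten before it was read

# Shifting "cdef" two places to the left within one string.
buf = list("abcdef")
substring_move_left(buf, 2, 6, buf, 0)
print("".join(buf))   # cdefef
```

So for an overlapping shift to the right, -MOVE-RIGHT! is the safe
choice, and for a shift to the left, -MOVE-LEFT! is.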

----------------------------------------------------------------------

                               Mutation

I have tried to be careful about the issue of mutable strings.  None
of the operations which I have proposed as "essential" mutate strings.
I have provided a small set of operations, marked as "optional", which
DO mutate strings.  I believe that mutation of strings is basically
reasonable, although I understand that in some very important cases,
in particular the names of interned symbols, there should be some
guarantee that a string cannot be mutated.  I believe that this can be
solved by one of these simple methods:

1.  A string could have an internal bit which, when set, would prevent
mutation.

2.  There could be two types of strings.  In this case, it would be
reasonable to decide that the read/print syntax for mutable strings
need not be the same as for non-mutable strings.
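
Method 1 could look something like the following Python sketch, in
which each string carries a flag that the mutators consult.  The class
name LockableString, its method names, and the choice of raising an
error on a locked string are all assumptions made for illustration;
the proposal does not specify any of them.

```python
class LockableString:
    """A mutable string with an internal bit that can forbid mutation."""

    def __init__(self, text):
        self._chars = list(text)
        self._locked = False      # the "internal bit", initially clear

    def lock(self):
        """Set the bit; there is deliberately no way to clear it again."""
        self._locked = True

    def string_ref(self, index):
        """Model of STRING-REF: reading is always allowed."""
        return self._chars[index]

    def string_set(self, index, char):
        """Model of STRING-SET!: refuse to mutate a locked string."""
        if self._locked:
            raise TypeError("string is locked against mutation")
        self._chars[index] = char

    def __str__(self):
        return "".join(self._chars)

name = LockableString("foo")
name.string_set(0, "b")   # allowed while the bit is clear
name.lock()               # e.g. when the string becomes a symbol's name
print(name)               # boo
```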

Anyway, I have chosen to have all strings be mutable in the MIT
implementation, because that is the simplest choice providing the most
power.  To the best of my knowledge, no one has ever been screwed by
this decision, and it seems unlikely that anyone ever would.