Strings (Revised(7) Report on the Algorithmic Language Scheme)

6.7 Strings

Strings are sequences of characters. Strings are written as sequences of characters enclosed within quotation marks ("). Within a string literal, various escape sequences represent characters other than themselves. Escape sequences always start with a backslash (\):

\a : alarm, U+0007
\b : backspace, U+0008
\t : character tabulation, U+0009
\n : linefeed, U+000A
\r : return, U+000D
\" : double quote, U+0022
\\ : backslash, U+005C
\| : vertical line, U+007C
\⟨intraline whitespace⟩*⟨line ending⟩ ⟨intraline whitespace⟩* : nothing
\x⟨hex scalar value⟩; : specified character (note the terminating semicolon).

The result is unspecified if any other character in a string occurs after a backslash.

Except for a line ending, any character outside of an escape sequence stands for itself in the string literal. A line ending which is preceded by \⟨intraline whitespace⟩ expands to nothing (along with any trailing intraline whitespace), and can be used to indent strings for improved legibility. Any other line ending has the same effect as inserting a \n character into the string.

Examples:

"The word \"recursion\" has many meanings."
"Another example:\ntwo lines of text"
"Here's text \
   containing just one line"
"\x03B1; is named GREEK SMALL LETTER ALPHA."

The length of a string is the number of characters that it contains. This number is an exact, non-negative integer that is fixed when the string is created. The valid indexes of a string are the exact non-negative integers less than the length of the string. The first character of a string has index 0, the second has index 1, and so on.

Some of the procedures that operate on strings ignore the difference between upper and lower case. The names of the versions that ignore case end with ‘-ci’ (for “case insensitive”).

Implementations may forbid certain characters from appearing in strings. However, with the exception of #\null, ASCII characters must not be forbidden. For example, an implementation might support the entire Unicode repertoire, but only allow characters U+0001 to U+00FF (the Latin-1 repertoire without #\null) in strings.

It is an error to pass such a forbidden character to make-string, string, string-set!, or string-fill!, as part of the list passed to list->string, or as part of the vector passed to vector->string (see Vectors), or in UTF-8 encoded form within a bytevector passed to utf8->string (see Bytevectors). It is also an error for a procedure passed to string-map (see Control features) to return a forbidden character, or for read-string (see Input) to attempt to read one.

procedure: string? obj

Returns #t if obj is a string, otherwise returns #f.

procedure: make-string k
procedure: make-string k char

The make-string procedure returns a newly allocated string of length k. If char is given, then all the characters of the string are initialized to char, otherwise the contents of the string are unspecified.

procedure: string char…

Returns a newly allocated string composed of the arguments. It is analogous to list.
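The two constructors above can be illustrated with a short sketch. It assumes an R7RS implementation providing (scheme base); the specific arguments are chosen only for illustration.

```scheme
(import (scheme base))

;; make-string with a fill character: every position holds that character.
(make-string 3 #\-)              ; ⇒ "---"

;; Without a fill character only the length is guaranteed;
;; the contents are unspecified.
(string-length (make-string 4))  ; ⇒ 4

;; string builds a string from its character arguments, as list builds a list.
(string #\a #\b #\c)             ; ⇒ "abc"
(string)                         ; ⇒ ""
```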

procedure: string-length string

Returns the number of characters in the given string.

procedure: string-ref string k

It is an error if k is not a valid index of string.

The string-ref procedure returns character k of string using zero-origin indexing.

There is no requirement for this procedure to execute in constant time.
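As a small sketch of zero-origin indexing, assuming (scheme base) is available (the variable name s is illustrative): the valid indexes of a string of length 5 are 0 through 4.

```scheme
(import (scheme base))

(define s "hello")

(string-length s)                       ; ⇒ 5
(string-ref s 0)                        ; ⇒ #\h  (first character)
(string-ref s (- (string-length s) 1))  ; ⇒ #\o  (last character)

;; (string-ref s 5) is an error: 5 is not a valid index of s.
```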

procedure: string-set! string k char

It is an error if k is not a valid index of string.

The string-set! procedure stores char in element k of string. There is no requirement for this procedure to execute in constant time.

(define (f) (make-string 3 #\*))
(define (g) "***")
(string-set! (f) 0 #\?) ⇒ unspecified
(string-set! (g) 0 #\?) ⇒ error
(string-set! (symbol->string 'immutable)
             0
             #\?) ⇒ error

procedure: string=? string₁ string₂ string₃…

Returns #t if all the strings are the same length and contain exactly the same characters in the same positions, otherwise returns #f.

char library procedure: string-ci=? string₁ string₂ string₃…

Returns #t if, after case-folding, all the strings are the same length and contain the same characters in the same positions, otherwise returns #f. Specifically, these procedures behave as if string-foldcase were applied to their arguments before comparing them.
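A brief sketch of the two equality predicates, assuming both (scheme base) and (scheme char) are available; the sample strings are illustrative.

```scheme
(import (scheme base) (scheme char))

(string=? "abc" "abc" "abc")              ; ⇒ #t
(string=? "abc" "abd")                    ; ⇒ #f  (differ at index 2)
(string=? "abc" "abcd")                   ; ⇒ #f  (different lengths)
(string=? "Scheme" "scheme")              ; ⇒ #f  (case matters)

;; string-ci=? compares as if string-foldcase had been applied first.
(string-ci=? "Scheme" "SCHEME" "scheme")  ; ⇒ #t
```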

procedure: string<? string₁ string₂ string₃…

char library procedure: string-ci<? string₁ string₂ string₃…

procedure: string>? string₁ string₂ string₃…

char library procedure: string-ci>? string₁ string₂ string₃…

procedure: string<=? string₁ string₂ string₃…

char library procedure: string-ci<=? string₁ string₂ string₃…

procedure: string>=? string₁ string₂ string₃…

char library procedure: string-ci>=? string₁ string₂ string₃…

These procedures return #t if their arguments are (respectively): monotonically increasing, monotonically decreasing, monotonically non-decreasing, or monotonically non-increasing.

These predicates are required to be transitive.

These procedures compare strings in an implementation-defined way. One approach is to make them the lexicographic extensions to strings of the corresponding orderings on characters. In that case, string<? would be the lexicographic ordering on strings induced by the ordering char<? on characters, and if the two strings differ in length but are the same up to the length of the shorter string, the shorter string would be considered to be lexicographically less than the longer string. However, it is also permitted to use the natural ordering imposed by the implementation’s internal representation of strings, or a more complex locale-specific ordering.

In all cases, a pair of strings must satisfy exactly one of string<?, string=?, and string>?, and must satisfy string<=? if and only if they do not satisfy string>? and string>=? if and only if they do not satisfy string<?.

The ‘-ci’ procedures behave as if they applied string-foldcase to their arguments before invoking the corresponding procedures without ‘-ci’.
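Because the ordering itself is implementation-defined, portable code can rely only on the consistency rules stated above. The following sketch checks those rules for one pair of strings; count-true and consistent-ordering? are hypothetical helper names introduced here, not part of the report.

```scheme
(import (scheme base))

;; Hypothetical helper: how many of the given booleans are true?
(define (count-true . bools)
  (let loop ((bs bools) (n 0))
    (cond ((null? bs) n)
          ((car bs) (loop (cdr bs) (+ n 1)))
          (else (loop (cdr bs) n)))))

;; Hypothetical helper: do a and b obey the required relationships?
(define (consistent-ordering? a b)
  (and
   ;; exactly one of <, =, > holds
   (= 1 (count-true (string<? a b) (string=? a b) (string>? a b)))
   ;; <= holds if and only if > does not
   (boolean=? (string<=? a b) (not (string>? a b)))
   ;; >= holds if and only if < does not
   (boolean=? (string>=? a b) (not (string<? a b)))))

(consistent-ordering? "apple" "apricot")  ; ⇒ #t
(consistent-ordering? "same" "same")      ; ⇒ #t
```

Which of string<? or string>? holds for "apple" and "apricot" depends on the implementation's ordering, so the sketch deliberately does not assert it.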

char library procedure: string-upcase string

char library procedure: string-downcase string

char library procedure: string-foldcase string

These procedures apply the Unicode full string uppercasing, lowercasing, and case-folding algorithms to their arguments and return the result. In certain cases, the result differs in length from the argument. If the result is equal to the argument in the sense of string=?, the argument may be returned. Note that language-sensitive mappings and foldings are not used.

The Unicode Standard prescribes special treatment of the Greek letter Σ, whose normal lower-case form is σ but which becomes ς at the end of a word. See UAX #44 [uax44] (part of the Unicode Standard) for details. However, implementations of string-downcase are not required to provide this behavior, and may choose to change Σ to σ in all cases.
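A minimal sketch of the case-conversion procedures on ASCII input, assuming (scheme char) is available. The non-ASCII case in the final comment assumes an implementation that supports the relevant Unicode characters in strings, which is not guaranteed.

```scheme
(import (scheme base) (scheme char))

(string-upcase "hello")    ; ⇒ "HELLO"
(string-downcase "HeLLo")  ; ⇒ "hello"
(string-foldcase "HeLLo")  ; ⇒ "hello"

;; Under Unicode full uppercasing the result can be longer than the
;; argument, for example:
;;   (string-upcase "Straße")  would give "STRASSE"
;; in an implementation with full Unicode support.
```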

procedure: substring string start end

The substring procedure returns a newly allocated string formed from the characters of string beginning with index start (inclusive) and ending with index end (exclusive). This is equivalent to calling string-copy with the same arguments, but is provided for backward compatibility and stylistic flexibility.
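The index convention can be sketched as follows, assuming (scheme base); the sample string is illustrative. The character at start is included and the character at end is not, so the result has length end minus start.

```scheme
(import (scheme base))

(define s "hello world")

(substring s 0 5)                  ; ⇒ "hello"
(substring s 6 (string-length s))  ; ⇒ "world"
(substring s 3 3)                  ; ⇒ ""  (start = end gives an empty string)

;; Same result as string-copy with the same arguments.
(string=? (substring s 0 5) (string-copy s 0 5))  ; ⇒ #t
```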

procedure: string-append string…

Returns a newly allocated string whose characters are the concatenation of the characters in the given strings.
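For illustration, assuming (scheme base), string-append accepts any number of strings, including none:

```scheme
(import (scheme base))

(string-append "foo" "bar" "baz")  ; ⇒ "foobarbaz"
(string-append "solo")             ; ⇒ "solo"
(string-append)                    ; ⇒ ""
```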

procedure: string->list string

procedure: string->list string start

procedure: string->list string start end

procedure: list->string list

It is an error if any element of list is not a character.

The string->list procedure returns a newly allocated list of the characters of string between start and end. list->string returns a newly allocated string formed from the elements in the list list. In both procedures, order is preserved. string->list and list->string are inverses so far as equal? is concerned.
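A short sketch of both conversions, assuming (scheme base). As with the other procedures that take a range, start is inclusive and end is exclusive.

```scheme
(import (scheme base))

(string->list "apple")         ; ⇒ (#\a #\p #\p #\l #\e)
(string->list "apple" 2)       ; ⇒ (#\p #\l #\e)
(string->list "apple" 1 3)     ; ⇒ (#\p #\p)

(list->string (list #\a #\b))  ; ⇒ "ab"

;; Round trip: the result is equal? to the original string.
(equal? (list->string (string->list "apple")) "apple")  ; ⇒ #t
```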

procedure: string-copy string
procedure: string-copy string start
procedure: string-copy string start end

Returns a newly allocated copy of the part of the given string between start and end.
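The following sketch, assuming (scheme base), shows that the copy is independent of the original: mutating one does not affect the other. The names original and copy are illustrative.

```scheme
(import (scheme base))

(define original (string-copy "abcde"))  ; a mutable string
(define copy (string-copy original))

(string-set! copy 0 #\X)
copy      ; ⇒ "Xbcde"
original  ; ⇒ "abcde"  (unchanged)

(string-copy original 2)    ; ⇒ "cde"
(string-copy original 1 3)  ; ⇒ "bc"
```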

procedure: string-copy! to at from

procedure: string-copy! to at from start

procedure: string-copy! to at from start end

It is an error if at is less than zero or greater than the length of to. It is also an error if (- (string-length to) at) is less than (- end start).

Copies the characters of string from between start and end to string to, starting at at. The order in which characters are copied is unspecified, except that if the source and destination overlap, copying takes place as if the source is first copied into a temporary string and then into the destination. This can be achieved without allocating storage by making sure to copy in the correct direction in such circumstances.

(define a "12345")
(define b (string-copy "abcde"))
(string-copy! b 1 a 0 2)
b ⇒ "a12de"
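The overlapping case can be sketched in the same way, assuming (scheme base), with the same string as both source and destination. The result matches what copying through a temporary string would produce, in either direction. The names s and u are illustrative.

```scheme
(import (scheme base))

;; Shift right: copy "abcd" (indexes 0 to 3) to positions 2 to 5.
(define s (string-copy "abcdef"))
(string-copy! s 2 s 0 4)
s  ; ⇒ "ababcd"

;; Shift left: copy "cdef" (index 2 to the end) to positions 0 to 3.
(define u (string-copy "abcdef"))
(string-copy! u 0 u 2)
u  ; ⇒ "cdefef"
```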

procedure: string-fill! string fill

procedure: string-fill! string fill start

procedure: string-fill! string fill start end

It is an error if fill is not a character.

The string-fill! procedure stores fill in the elements of string between start and end.
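A minimal sketch, assuming (scheme base): with a range, only the positions from start (inclusive) to end (exclusive) are overwritten; without one, the whole string is filled. The name s is illustrative.

```scheme
(import (scheme base))

(define s (make-string 5 #\.))

(string-fill! s #\x 1 3)
s  ; ⇒ ".xx.."

(string-fill! s #\-)
s  ; ⇒ "-----"
```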