Characters (Revised(7) Report on the Algorithmic Language Scheme)

6.6 Characters

Characters are objects that represent printed characters such as letters and digits. All Scheme implementations must support at least the ASCII character repertoire: that is, Unicode characters U+0000 through U+007F. Implementations may support any other Unicode characters they see fit, and may also support non-Unicode characters as well. Except as otherwise specified, the result of applying any of the following procedures to a non-Unicode character is implementation-dependent.

Characters are written using the notation #\⟨character⟩ or #\⟨character name⟩ or #\x⟨hex scalar value⟩.

The following character names must be supported by all implementations with the given values. Implementations may add other names provided they cannot be interpreted as hex scalar values preceded by x.

#\alarm      ; U+0007
#\backspace  ; U+0008
#\delete     ; U+007F
#\escape     ; U+001B
#\newline    ; the linefeed character, U+000A
#\null       ; the null character, U+0000
#\return     ; the return character, U+000D
#\space      ; the preferred way to write a space
#\tab        ; the tab character, U+0009

Here are some additional examples:

#\a      ; lower case letter
#\A      ; upper case letter
#\(      ; left parenthesis
#\       ; the space character
#\x03BB  ; λ (if character is supported)
#\iota   ; ι (if character and name are supported)

Case is significant in #\⟨character⟩, and in #\⟨character name⟩, but not in #\x⟨hex scalar value⟩. If ⟨character⟩ in #\⟨character⟩ is alphabetic, then any character immediately following ⟨character⟩ cannot be one that can appear in an identifier. This rule resolves the ambiguous case where, for example, the sequence of characters ‘#\space’ could be taken to be either a representation of the space character or a representation of the character #\s followed by a representation of the symbol pace.

Characters written in the #\ notation are self-evaluating. That is, they do not have to be quoted in programs.

Some of the procedures that operate on characters ignore the difference between upper case and lower case. The procedures that ignore case have ‘-ci’ (for “case insensitive”) embedded in their names.

procedure: char? obj ¶: Returns #t if obj is a character, otherwise returns #f.

procedure: char=? char₁ char₂ char₃… ¶

procedure: char<? char₁ char₂ char₃… ¶

procedure: char>? char₁ char₂ char₃… ¶

procedure: char<=? char₁ char₂ char₃… ¶

procedure: char>=? char₁ char₂ char₃… ¶

These procedures return #t if the results of passing their arguments to char->integer are respectively equal, monotonically increasing, monotonically decreasing, monotonically non-decreasing, or monotonically non-increasing.

These predicates are required to be transitive.

char library procedure: char-ci=? char₁ char₂ char₃… ¶

char library procedure: char-ci<? char₁ char₂ char₃… ¶

char library procedure: char-ci>? char₁ char₂ char₃… ¶

char library procedure: char-ci<=? char₁ char₂ char₃… ¶

char library procedure: char-ci>=? char₁ char₂ char₃… ¶

These procedures are similar to char=? et cetera, but they treat upper case and lower case letters as the same. For example, (char-ci=? #\A #\a) returns #t.

Specifically, these procedures behave as if char-foldcase were applied to their arguments before they were compared.

char library procedure: char-alphabetic? char ¶

char library procedure: char-numeric? char ¶

char library procedure: char-whitespace? char ¶

char library procedure: char-upper-case? letter ¶

char library procedure: char-lower-case? letter ¶

These procedures return #t if their arguments are alphabetic, numeric, whitespace, upper case, or lower case characters, respectively, otherwise they return #f.

Specifically, they must return #t when applied to characters with the Unicode properties Alphabetic, Numeric_Type=Decimal, White_Space, Uppercase, and Lowercase respectively, and #f when applied to any other Unicode characters. Note that many Unicode characters are alphabetic but neither upper nor lower case.

char library procedure: digit-value char ¶

This procedure returns the numeric value (0 to 9) of its argument if it is a numeric digit (that is, if char-numeric? returns #t), or #f on any other character.

(digit-value #\3)     ⇒ 3
(digit-value #\x0664) ⇒ 4
(digit-value #\x0AE6) ⇒ 0
(digit-value #\x0EA6) ⇒ #f

procedure: char->integer char ¶

procedure: integer->char n ¶

Given a Unicode character, char->integer returns an exact integer between 0 and #xD7FF or between #xE000 and #x10FFFF which is equal to the Unicode scalar value of that character. Given a non-Unicode character, it returns an exact integer greater than #x10FFFF. This is true independent of whether the implementation uses the Unicode representation internally.

Given an exact integer that is the value returned by a character when char->integer is applied to it, integer->char returns that character.

char library procedure: char-upcase char ¶

char library procedure: char-downcase char ¶

char library procedure: char-foldcase char ¶

The char-upcase procedure, given an argument that is the lowercase part of a Unicode casing pair, returns the uppercase member of the pair, provided that both characters are supported by the Scheme implementation. Note that language-sensitive casing pairs are not used. If the argument is not the lowercase member of such a pair, it is returned.

The char-downcase procedure, given an argument that is the uppercase part of a Unicode casing pair, returns the lowercase member of the pair, provided that both characters are supported by the Scheme implementation. Note that language-sensitive casing pairs are not used. If the argument is not the uppercase member of such a pair, it is returned.

The char-foldcase procedure applies the Unicode simple case-folding algorithm to its argument and returns the result. Note that language-sensitive folding is not used. If the character that results from folding is not supported by the implementation, the argument is returned. See UAX #29 [uax29] (part of the Unicode Standard) for details.

Note that many Unicode lowercase characters do not have uppercase equivalents.