; Improve documentation of locale-specific string comparison
* doc/lispref/strings.texi (Text Comparison): Mention the Unicode collation rules and buffer-local case-tables.
parent ea44d7ddfc
commit 2b31e667be
1 changed file with 14 additions and 5 deletions
@@ -564,11 +564,19 @@ equal with respect to collation rules. A collation rule is not only
 determined by the lexicographic order of the characters contained in
 @var{string1} and @var{string2}, but also further rules about
 relations between these characters. Usually, it is defined by the
-@var{locale} environment Emacs is running with.
+@var{locale} environment Emacs is running with and by the Standard C
+library against which Emacs was linked@footnote{
+For more information about collation rules and their locale
+dependencies, see @uref{https://unicode.org/reports/tr10/, The Unicode
+Collation Algorithm}. Some Standard C libraries, such as the
+@acronym{GNU} C Library (a.k.a.@: @dfn{glibc}) implement large
+portions of the Unicode Collation Algorithm and use the associated
+locale data, Common Locale Data Repository, or @acronym{CLDR}.
+}.
 
-For example, characters with different coding points but
-the same meaning might be considered as equal, like different grave
-accent Unicode characters:
+For example, characters with different code points but the same
+meaning, like different grave accent Unicode characters, might, in
+some locales, be considered as equal:
 
 @example
 @group
@@ -756,7 +764,8 @@ The strings are compared by the numeric values of their characters.
 For instance, @var{str1} is considered less than @var{str2} if
 its first differing character has a smaller numeric value. If
 @var{ignore-case} is non-@code{nil}, characters are converted to
-upper-case before comparing them. Unibyte strings are converted to
+upper-case, using the current buffer's case-table (@pxref{Case
+Tables}), before comparing them. Unibyte strings are converted to
 multibyte for comparison (@pxref{Text Representations}), so that a
 unibyte string and its conversion to multibyte are always regarded as
 equal.
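The second hunk's point about @var{ignore-case} can be illustrated with a minimal Emacs Lisp sketch. The helper name `my-strings-equal-ignoring-case-p` is hypothetical and not part of Emacs; the sketch assumes only the documented behavior that `compare-strings` returns `t` when the compared portions are equal. Because the upper-casing step uses the current buffer's case table, the same call may give a different answer in a buffer with a non-standard case table.

```elisp
;; Hypothetical helper, not part of Emacs: wraps `compare-strings'
;; with IGNORE-CASE set, as described in the second hunk.
(defun my-strings-equal-ignoring-case-p (s1 s2)
  "Return non-nil if S1 and S2 are equal when case is ignored.
Characters are upper-cased with the current buffer's case table
before they are compared."
  ;; `compare-strings' returns t when the strings match; otherwise it
  ;; returns a number describing the first differing position.
  (eq t (compare-strings s1 nil nil s2 nil nil t)))

(my-strings-equal-ignoring-case-p "Emacs" "EMACS")   ; non-nil
(my-strings-equal-ignoring-case-p "Emacs" "Emacz")   ; nil
```

Passing `nil` for the start and end arguments compares the whole of each string; the final `t` is the @var{ignore-case} flag.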