;Improve documentation of locale-specific string comparison

* doc/lispref/strings.texi (Text Comparison): Mention the Unicode
collation rules and buffer-local case-tables.
Eli Zaretskii 2022-07-21 09:53:45 +03:00
parent ea44d7ddfc
commit 2b31e667be


@@ -564,11 +564,19 @@ equal with respect to collation rules. A collation rule is not only
 determined by the lexicographic order of the characters contained in
 @var{string1} and @var{string2}, but also further rules about
 relations between these characters. Usually, it is defined by the
-@var{locale} environment Emacs is running with.
-For example, characters with different coding points but
-the same meaning might be considered as equal, like different grave
-accent Unicode characters:
+@var{locale} environment Emacs is running with and by the Standard C
+library against which Emacs was linked@footnote{
+For more information about collation rules and their locale
+dependencies, see @uref{https://unicode.org/reports/tr10/, The Unicode
+Collation Algorithm}. Some Standard C libraries, such as the
+@acronym{GNU} C Library (a.k.a.@: @dfn{glibc}) implement large
+portions of the Unicode Collation Algorithm and use the associated
+locale data, Common Locale Data Repository, or @acronym{CLDR}.
+}.
+For example, characters with different code points but the same
+meaning, like different grave accent Unicode characters, might, in
+some locales, be considered as equal:
 @example
 @group
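The first hunk makes collation depend on the Standard C library as well as the locale. As a rough, hypothetical illustration (not Emacs code), the Python sketch below tests whether two strings collate as equal under a given LC_COLLATE locale. CPython's locale.strcoll delegates to the C library's wcscoll, so its answer comes from the same libc collation data the text describes. The helper name collate_equal is my own, and the demo assumes an en_US.UTF-8 locale may or may not be installed.

```python
import locale
from typing import Optional


def collate_equal(s1: str, s2: str, loc: Optional[str] = None) -> bool:
    """Return True if S1 and S2 collate as equal under locale LOC.

    Hypothetical helper: a loose analogue of locale-aware string
    equality.  The comparison is delegated to the C library via
    locale.strcoll.  If LOC is None, the current LC_COLLATE is used.
    """
    if loc is None:
        return locale.strcoll(s1, s2) == 0
    # Save the current LC_COLLATE setting so it can be restored.
    saved = locale.setlocale(locale.LC_COLLATE)
    try:
        # Raises locale.Error if LOC is not installed on this system.
        locale.setlocale(locale.LC_COLLATE, loc)
        return locale.strcoll(s1, s2) == 0
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


if __name__ == "__main__":
    # Two grave-accent-like characters with different code points:
    # U+FF40 FULLWIDTH GRAVE ACCENT and U+1FEF GREEK VARIA.
    a, b = "\uFF40", "\u1FEF"
    print("C locale:", collate_equal(a, b, "C"))
    try:
        print("en_US.UTF-8:", collate_equal(a, b, "en_US.UTF-8"))
    except locale.Error:
        print("en_US.UTF-8 locale is not installed")
```

In the "C" locale, glibc's wide-character collation typically falls back to plain code-point order, so the two characters do not compare equal there. Whether they compare equal under en_US.UTF-8 depends on the installed C library and its locale data, which is exactly the dependency the new footnote points out.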
@@ -756,7 +764,8 @@ The strings are compared by the numeric values of their characters.
 For instance, @var{str1} is considered less than @var{str2} if
 its first differing character has a smaller numeric value. If
 @var{ignore-case} is non-@code{nil}, characters are converted to
-upper-case before comparing them. Unibyte strings are converted to
+upper-case, using the current buffer's case-table (@pxref{Case
+Tables}), before comparing them. Unibyte strings are converted to
 multibyte for comparison (@pxref{Text Representations}), so that a
 unibyte string and its conversion to multibyte are always regarded as
 equal.
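The rule in the second hunk compares characters by numeric value and, when ignore-case is set, upcases them through the current buffer's case table first. The Python sketch below models that rule under stated assumptions; it is not the Emacs implementation. A plain dict (the hypothetical ASCII_UPCASE) stands in for a case table, the -1/0/1 return convention is my own, a string that is a proper prefix of the other is assumed to sort first, and the unibyte-to-multibyte conversion is left out because Python strings are always Unicode.

```python
from typing import Dict, Optional

# Hypothetical stand-in for a buffer's case table: maps a character to
# its upper-case counterpart.  Characters absent from the table map to
# themselves.  This default covers only ASCII letters.
ASCII_UPCASE: Dict[str, str] = {
    chr(c): chr(c - 32) for c in range(ord("a"), ord("z") + 1)
}


def compare_by_code_points(str1: str, str2: str, ignore_case: bool = False,
                           upcase_table: Optional[Dict[str, str]] = None) -> int:
    """Return -1, 0 or 1 as STR1 is less than, equal to, or greater than STR2."""
    table = upcase_table if upcase_table is not None else ASCII_UPCASE
    for c1, c2 in zip(str1, str2):
        if ignore_case:
            # Upcase both characters through the (stand-in) case table.
            c1 = table.get(c1, c1)
            c2 = table.get(c2, c2)
        if c1 != c2:
            # The first differing character decides, by numeric value.
            return -1 if ord(c1) < ord(c2) else 1
    # All compared characters agree: the shorter string sorts first.
    return (len(str1) > len(str2)) - (len(str1) < len(str2))
```

A per-character table is used instead of str.upper because str.upper may map one character to several (for example "ß" to "SS"), while a case table maps each character to a single character. Passing a different table, such as {"é": "É"}, changes the result for "é" versus "É" with ignore_case=True, which mirrors how a buffer-local case table affects the comparison.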