lispref/searching.texi small edits

* doc/lispref/searching.texi (Regular Expressions, Regexp Special):
(Regexp Backslash, Regexp Example): Copyedits.
(Regexp Special): Mention collation.
Clarify char classes with an example.
Glenn Morris 2012-03-28 00:57:42 -07:00
parent 425df10c7b
commit d14daa28e4
2 changed files with 33 additions and 22 deletions


@ -1,3 +1,10 @@
2012-03-28 Glenn Morris <rgm@gnu.org>
* searching.texi (Regular Expressions, Regexp Special):
(Regexp Backslash, Regexp Example): Copyedits.
(Regexp Special): Mention collation.
Clarify char classes with an example.
2012-03-27 Martin Rudalics <rudalics@gmx.at>
* windows.texi (Window History): Describe new option


@ -241,7 +241,7 @@ regexps; the following section says how to search for them.
@findex re-builder
@cindex regular expressions, developing
For convenient interactive development of regular expressions, you
For interactive development of regular expressions, you
can use the @kbd{M-x re-builder} command. It provides a convenient
interface for creating regular expressions, by giving immediate visual
feedback in a separate buffer. As you edit the regexp, all its
@ -318,6 +318,7 @@ possible. Thus, @samp{o*} matches any number of @samp{o}s (including no
expression. Thus, @samp{fo*} has a repeating @samp{o}, not a repeating
@samp{fo}. It matches @samp{f}, @samp{fo}, @samp{foo}, and so on.
@cindex backtracking and regular expressions
The matcher processes a @samp{*} construct by matching, immediately, as
many repetitions as can be found. Then it continues with the rest of
the pattern. If that fails, backtracking occurs, discarding some of the
@ -387,7 +388,12 @@ Ranges may be intermixed freely with individual characters, as in
@samp{[a-z$%.]}, which matches any lower case @acronym{ASCII} letter
or @samp{$}, @samp{%} or period.
Note that the usual regexp special characters are not special inside a
If @code{case-fold-search} is non-@code{nil}, @samp{[a-z]} also
matches upper-case letters. Note that a range like @samp{[a-z]} is
not affected by the locale's collation sequence; it always represents
a sequence in @acronym{ASCII} order.
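The effect of case-fold-search on a range can be checked directly with string-match. The following is a minimal sketch, not part of the commit; the helper name my-range-matches-p is hypothetical and does not exist in Emacs.

```elisp
;; Hypothetical helper (not part of Emacs): return t if REGEXP matches
;; STRING when `case-fold-search' is bound to FOLD.  The `let' binding
;; keeps the setting local to this call.
(defun my-range-matches-p (regexp string fold)
  (let ((case-fold-search fold))
    (and (string-match regexp string) t)))

;; With case folding off, the range covers lower-case letters only.
(my-range-matches-p "[a-z]" "Q" nil)   ; => nil
;; With case folding on, the same range also matches upper case.
(my-range-matches-p "[a-z]" "Q" t)     ; => t
```

Binding the variable with let, rather than setting it, avoids changing search behavior elsewhere in the session.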
Note also that the usual regexp special characters are not special inside a
character alternative. A completely different set of characters is
special inside character alternatives: @samp{]}, @samp{-} and @samp{^}.
@ -395,23 +401,27 @@ To include a @samp{]} in a character alternative, you must make it the
first character. For example, @samp{[]a]} matches @samp{]} or @samp{a}.
To include a @samp{-}, write @samp{-} as the first or last character of
the character alternative, or put it after a range. Thus, @samp{[]-]}
matches both @samp{]} and @samp{-}.
matches both @samp{]} and @samp{-}. (As explained below, you cannot
use @samp{\]} to include a @samp{]} inside a character alternative,
since @samp{\} is not special there.)
To include @samp{^} in a character alternative, put it anywhere but at
the beginning.
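The placement rules above for ], - and ^ can be tried out with string-match. This is an illustrative sketch only; the regexps are written as Lisp strings, so a backslash in the regexp appears doubled.

```elisp
;; `]' as the first character of the alternative is literal.
(string-match "[]a]" "]")      ; => 0
;; `]' first and `-' last: both are literal.
(string-match "[]-]" "-")      ; => 0
;; `^' anywhere but the beginning is literal.
(string-match "[a^]" "^")      ; => 0
;; The string "[\\]]" is the regexp [\]], which reads as the
;; alternative [\] followed by a literal `]'.  It therefore does not
;; match a lone `]' ...
(string-match "[\\]]" "]")     ; => nil
;; ... but it does match a backslash followed by `]'.
(string-match "[\\]]" "\\]")   ; => 0
```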
@c What if it starts with a multibyte and ends with a unibyte?
@c That doesn't seem to match anything...?
If a range starts with a unibyte character @var{c} and ends with a
multibyte character @var{c2}, the range is divided into two parts: one
is @samp{@var{c}..?\377}, the other is @samp{@var{c1}..@var{c2}}, where
@var{c1} is the first character of the charset to which @var{c2}
belongs.
spans the unibyte characters @samp{@var{c}..?\377}, the other the
multibyte characters @samp{@var{c1}..@var{c2}}, where @var{c1} is the
first character of the charset to which @var{c2} belongs.
A character alternative can also specify named character classes
(@pxref{Char Classes}). This is a POSIX feature whose syntax is
@samp{[:@var{class}:]}. Using a character class is equivalent to
mentioning each of the characters in that class; but the latter is not
feasible in practice, since some classes include thousands of
different characters.
(@pxref{Char Classes}). This is a POSIX feature. For example,
@samp{[[:ascii:]]} matches any @acronym{ASCII} character.
Using a character class is equivalent to mentioning each of the
characters in that class; but the latter is not feasible in practice,
since some classes include thousands of different characters.
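As a brief illustration of named classes (a sketch, not taken from the commit): the class name is written inside the brackets of a character alternative, which is why the brackets appear doubled, and it can be combined with ordinary characters or negated. The non-ASCII test character is written with a \u escape so the example does not depend on the file's encoding.

```elisp
;; "\u00e9" is e-acute, which is outside ASCII.
(string-match "[[:ascii:]]" "\u00e9")         ; => nil
;; A negated alternative finds the first non-ASCII character.
(string-match "[^[:ascii:]]" "na\u00efve")    ; => 2
;; A class can share an alternative with individual characters.
(string-match "[[:digit:]x]+" "ab12x3")       ; => 2
(match-end 0)                                 ; => 6
```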
@item @samp{[^ @dots{} ]}
@cindex @samp{^} in regexp
@ -812,7 +822,7 @@ with a symbol-constituent character.
@kindex invalid-regexp
Not every string is a valid regular expression. For example, a string
that ends inside a character alternative without terminating @samp{]}
that ends inside a character alternative without a terminating @samp{]}
is invalid, and so is a string that ends with a single @samp{\}. If
an invalid regular expression is passed to any of the search functions,
an @code{invalid-regexp} error is signaled.
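One way to see this error in practice is to trap it with condition-case. The sketch below assumes a hypothetical helper name, my-regexp-valid-p, which is not an Emacs function; it simply tries the regexp against an empty string and reports whether invalid-regexp was signaled.

```elisp
;; Hypothetical helper (not part of Emacs): return t if REGEXP is
;; accepted by the matcher, nil if it signals `invalid-regexp'.
(defun my-regexp-valid-p (regexp)
  (condition-case nil
      (progn (string-match-p regexp "") t)
    (invalid-regexp nil)))

(my-regexp-valid-p "[a-z]")   ; => t
;; Ends inside a character alternative, with no terminating `]'.
(my-regexp-valid-p "[a-z")    ; => nil
;; Ends with a single backslash (doubled here by Lisp string syntax).
(my-regexp-valid-p "foo\\")   ; => nil
```

Using string-match-p rather than string-match leaves the match data untouched, which matters if the check runs in the middle of other search code.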
@ -827,19 +837,13 @@ follows. (Nowadays Emacs uses a similar but more complex default
regexp constructed by the function @code{sentence-end}.
@xref{Standard Regexps}.)
First, we show the regexp as a string in Lisp syntax to distinguish
spaces from tab characters. The string constant begins and ends with a
Below, we show first the regexp as a string in Lisp syntax (to
distinguish spaces from tab characters), and then the result of
evaluating it. The string constant begins and ends with a
double-quote. @samp{\"} stands for a double-quote as part of the
string, @samp{\\} for a backslash as part of the string, @samp{\t} for a
tab and @samp{\n} for a newline.
@example
"[.?!][]\"')@}]*\\($\\| $\\|\t\\|@ @ \\)[ \t\n]*"
@end example
@noindent
In contrast, if you evaluate this string, you will see the following:
@example
@group
"[.?!][]\"')@}]*\\($\\| $\\|\t\\|@ @ \\)[ \t\n]*"
@ -849,7 +853,7 @@ In contrast, if you evaluate this string, you will see the following:
@end example
@noindent
In this output, tab and newline appear as themselves.
In the output, tab and newline appear as themselves.
This regular expression contains four parts in succession and can be
deciphered as follows: