Call them “bracket expressions” more consistently
Emacs comments and doc were inconsistent about the name used for regexps like [a-z]. Sometimes it called them “character alternatives”, sometimes “character sets”, sometimes “bracket expressions”. Prefer “bracket expressions” as it is less confusing: POSIX and most other programs’ doc uses “bracket expressions”, “alternative” is also used in the Emacs documentation to talk about ...\|... in regexps, and “character set” normally has a different meaning in Emacs.
parent 5dfe3f21d1
commit 94d8eeeff4
4 changed files with 45 additions and 45 deletions
@@ -950,8 +950,8 @@ features used mainly in Lisp programs.
 @dfn{special constructs} and the rest are @dfn{ordinary}. An ordinary
 character matches that same character and nothing else. The special
 characters are @samp{$^.*+?[\}. The character @samp{]} is special if
-it ends a character alternative (see below). The character @samp{-}
-is special inside a character alternative. Any other character
+it ends a bracket expression (see below). The character @samp{-}
+is special inside a bracket expression. Any other character
 appearing in a regular expression is ordinary, unless a @samp{\}
 precedes it. (When you use regular expressions in a Lisp program,
 each @samp{\} must be doubled, see the example near the end of this
@@ -1033,11 +1033,11 @@ you search for @samp{a.*?$} against the text @samp{abbab} followed by
 a newline, it matches the whole string. Since it @emph{can} match
 starting at the first @samp{a}, it does.
 
+@cindex bracket expression
 @cindex set of alternative characters, in regular expressions
 @cindex character set, in regular expressions
 @item @kbd{[ @dots{} ]}
-is a @dfn{set of alternative characters}, or a @dfn{character set},
-beginning with @samp{[} and terminated by @samp{]}.
+is a @dfn{bracket expression}, which matches one of a set of characters.
 
 In the simplest case, the characters between the two brackets are what
 this set can match. Thus, @samp{[ad]} matches either one @samp{a} or
@@ -1057,7 +1057,7 @@ Greek letters.
 @cindex character classes, in regular expressions
 You can also include certain special @dfn{character classes} in a
 character set. A @samp{[:} and balancing @samp{:]} enclose a
-character class inside a set of alternative characters. For instance,
+character class inside a bracket expression. For instance,
 @samp{[[:alnum:]]} matches any letter or digit. @xref{Char Classes,,,
 elisp, The Emacs Lisp Reference Manual}, for a list of character
 classes.
@@ -1125,7 +1125,7 @@ no preceding expression on which the @samp{*} can act. It is poor practice
 to depend on this behavior; it is better to quote the special character anyway,
 regardless of where it appears.
 
-As a @samp{\} is not special inside a set of alternative characters, it can
+As a @samp{\} is not special inside a bracket expression, it can
 never remove the special meaning of @samp{-}, @samp{^} or @samp{]}.
 You should not quote these characters when they have no special
 meaning. This would not clarify anything, since backslashes

@@ -278,10 +278,10 @@ character is a simple regular expression that matches that character
 and nothing else. The special characters are @samp{.}, @samp{*},
 @samp{+}, @samp{?}, @samp{[}, @samp{^}, @samp{$}, and @samp{\}; no new
 special characters will be defined in the future. The character
-@samp{]} is special if it ends a character alternative (see later).
-The character @samp{-} is special inside a character alternative. A
+@samp{]} is special if it ends a bracket expression (see later).
+The character @samp{-} is special inside a bracket expression. A
 @samp{[:} and balancing @samp{:]} enclose a character class inside a
-character alternative. Any other character appearing in a regular
+bracket expression. Any other character appearing in a regular
 expression is ordinary, unless a @samp{\} precedes it.
 
 For example, @samp{f} is not a special character, so it is ordinary, and
@@ -374,19 +374,19 @@ expression @samp{c[ad]*?a}, applied to that same string, matches just
 permits the whole expression to match is @samp{d}.)
 
 @item @samp{[ @dots{} ]}
-@cindex character alternative (in regexp)
+@cindex bracket expression (in regexp)
 @cindex @samp{[} in regexp
 @cindex @samp{]} in regexp
-is a @dfn{character alternative}, which begins with @samp{[} and is
+is a @dfn{bracket expression}, which begins with @samp{[} and is
 terminated by @samp{]}. In the simplest case, the characters between
-the two brackets are what this character alternative can match.
+the two brackets are what this bracket expression can match.
 
 Thus, @samp{[ad]} matches either one @samp{a} or one @samp{d}, and
 @samp{[ad]*} matches any string composed of just @samp{a}s and @samp{d}s
 (including the empty string). It follows that @samp{c[ad]*r}
 matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc.
 
-You can also include character ranges in a character alternative, by
+You can also include character ranges in a bracket expression, by
 writing the starting and ending characters with a @samp{-} between them.
 Thus, @samp{[a-z]} matches any lower-case @acronym{ASCII} letter.
 Ranges may be intermixed freely with individual characters, as in
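For the simple cases in the hunk above (literal members and ranges, with no backslashes or character classes), Python's `re` module happens to agree with Emacs regexps, so it can be used to spot-check the documented examples. This is a quick illustration in Python rather than Emacs Lisp; `matches` is a hypothetical helper, not part of either manual.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True if PATTERN matches all of TEXT, not just a prefix of it."""
    return re.fullmatch(pattern, text) is not None


# The c[ad]*r examples from the text, plus one word that should fail.
results = {w: matches("c[ad]*r", w) for w in ["cr", "car", "cdr", "caddaar", "cbr"]}
print(results)
# {'cr': True, 'car': True, 'cdr': True, 'caddaar': True, 'cbr': False}

# [ad]* also matches the empty string, and [a-z] covers only lower case.
print(matches("[ad]*", ""), matches("[a-z]", "q"), matches("[a-z]", "Q"))
# True True False
```

Once backslashes or character classes appear, the two regexp dialects diverge (for example, `\` is special inside a Python character set), so this shortcut only holds for cases like these.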
@@ -395,7 +395,7 @@ or @samp{$}, @samp{%} or period. However, the ending character of one
 range should not be the starting point of another one; for example,
 @samp{[a-m-z]} should be avoided.
 
-A character alternative can also specify named character classes
+A bracket expression can also specify named character classes
 (@pxref{Char Classes}). For example, @samp{[[:ascii:]]} matches any
 @acronym{ASCII} character. Using a character class is equivalent to
 mentioning each of the characters in that class; but the latter is not
@@ -404,9 +404,9 @@ different characters. A character class should not appear as the
 lower or upper bound of a range.
 
 The usual regexp special characters are not special inside a
-character alternative. A completely different set of characters is
+bracket expression. A completely different set of characters is
 special: @samp{]}, @samp{-} and @samp{^}.
-To include @samp{]} in a character alternative, put it at the
+To include @samp{]} in a bracket expression, put it at the
 beginning. To include @samp{^}, put it anywhere but at the beginning.
 To include @samp{-}, put it at the end. Thus, @samp{[]^-]} matches
 all three of these special characters. You cannot use @samp{\} to
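The placement rules for `]`, `^` and `-` described in the hunk above are precise enough to model directly. The following is a minimal Python sketch, assuming only single characters, ranges and a leading `^`; named character classes such as `[:ascii:]` are deliberately left out. `parse_bracket` and `bracket_matches` are hypothetical names, and this is not how Emacs itself parses regexps.

```python
def parse_bracket(expr: str) -> tuple[bool, list[tuple[str, str]], int]:
    """Parse the bracket expression that starts at expr[0].

    Returns (complemented, ranges, end): ranges holds (low, high) pairs,
    with a single character stored as (c, c), and end is the index just
    past the closing ']'.  Named character classes are not handled.
    """
    if not expr.startswith("["):
        raise ValueError("a bracket expression starts with '['")
    i, n = 1, len(expr)
    complemented = i < n and expr[i] == "^"
    if complemented:
        i += 1
    ranges: list[tuple[str, str]] = []
    first = True  # the member right after '[' or '[^' counts as first
    while i < n:
        c = expr[i]
        if c == "]" and not first:
            return complemented, ranges, i + 1  # this ']' ends the expression
        first = False
        # "c-d" is a range unless the '-' is the last member before ']'.
        if i + 2 < n and expr[i + 1] == "-" and expr[i + 2] != "]":
            ranges.append((c, expr[i + 2]))
            i += 3
        else:
            ranges.append((c, c))  # a backslash lands here too: not special
            i += 1
    raise ValueError("unterminated bracket expression")


def bracket_matches(expr: str, ch: str) -> bool:
    """True if the single character CH matches the bracket expression EXPR."""
    complemented, ranges, end = parse_bracket(expr)
    if end != len(expr):
        raise ValueError("text follows the bracket expression")
    inside = any(lo <= ch <= hi for lo, hi in ranges)
    return inside != complemented
```

With this sketch, `bracket_matches("[]^-]", c)` is true for each of `]`, `^` and `-`, and `[*--]` covers `*`, `+`, `,` and `-`, matching the style notes elsewhere in the patch. Because a backslash gets no special treatment, `[^\]` (written `"[^\\]"` in Python source, as in Lisp) matches every character except a backslash.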
@@ -444,7 +444,7 @@ characters and raw 8-bit bytes, but not non-ASCII characters. This
 feature is intended for searching text in unibyte buffers and strings.
 @end enumerate
 
-Some kinds of character alternatives are not the best style even
+Some kinds of bracket expressions are not the best style even
 though they have a well-defined meaning in Emacs. They include:
 
 @enumerate
@@ -458,7 +458,7 @@ Unicode character escapes can help here; for example, for most programmers
 @samp{[ก-ฺ฿-๛]} is less clear than @samp{[\u0E01-\u0E3A\u0E3F-\u0E5B]}.
 
 @item
-Although a character alternative can include duplicates, it is better
+Although a bracket expression can include duplicates, it is better
 style to avoid them. For example, @samp{[XYa-yYb-zX]} is less clear
 than @samp{[XYa-z]}.
 
@@ -469,30 +469,30 @@ is simpler to list the characters. For example,
 than @samp{[ij]}, and @samp{[i-k]} is less clear than @samp{[ijk]}.
 
 @item
-Although a @samp{-} can appear at the beginning of a character
-alternative or as the upper bound of a range, it is better style to
-put @samp{-} by itself at the end of a character alternative. For
+Although a @samp{-} can appear at the beginning of a bracket
+expression or as the upper bound of a range, it is better style to
+put @samp{-} by itself at the end of a bracket expression. For
 example, although @samp{[-a-z]} is valid, @samp{[a-z-]} is better
 style; and although @samp{[*--]} is valid, @samp{[*+,-]} is clearer.
 @end enumerate
 
 @item @samp{[^ @dots{} ]}
 @cindex @samp{^} in regexp
-@samp{[^} begins a @dfn{complemented character alternative}. This
+@samp{[^} begins a @dfn{complemented bracket expression}. This
 matches any character except the ones specified. Thus,
 @samp{[^a-z0-9A-Z]} matches all characters @emph{except} ASCII letters and
 digits.
 
-@samp{^} is not special in a character alternative unless it is the first
+@samp{^} is not special in a bracket expression unless it is the first
 character. The character following the @samp{^} is treated as if it
 were first (in other words, @samp{-} and @samp{]} are not special there).
 
-A complemented character alternative can match a newline, unless newline is
+A complemented bracket expression can match a newline, unless newline is
 mentioned as one of the characters not to match. This is in contrast to
 the handling of regexps in programs such as @code{grep}.
 
-You can specify named character classes, just like in character
-alternatives. For instance, @samp{[^[:ascii:]]} matches any
+You can specify named character classes, just like in bracket
+expressions. For instance, @samp{[^[:ascii:]]} matches any
 non-@acronym{ASCII} character. @xref{Char Classes}.
 
 @item @samp{^}
@@ -556,7 +556,7 @@ that matches only empty strings, as Emacs has bugs in this area.
 For example, it is unwise to use @samp{\b*}, which can be omitted
 without changing the documented meaning of the regular expression.
 
-As a @samp{\} is not special inside a character alternative, it can
+As a @samp{\} is not special inside a bracket expression, it can
 never remove the special meaning of @samp{-}, @samp{^} or @samp{]}.
 You should not quote these characters when they have no special
 meaning. This would not clarify anything, since backslashes
@@ -565,23 +565,23 @@ special meaning, as in @samp{[^\]} (@code{"[^\\]"} for Lisp string
 syntax), which matches any single character except a backslash.
 
 In practice, most @samp{]} that occur in regular expressions close a
-character alternative and hence are special. However, occasionally a
+bracket expression and hence are special. However, occasionally a
 regular expression may try to match a complex pattern of literal
 @samp{[} and @samp{]}. In such situations, it sometimes may be
 necessary to carefully parse the regexp from the start to determine
-which square brackets enclose a character alternative. For example,
-@samp{[^][]]} consists of the complemented character alternative
+which square brackets enclose a bracket expression. For example,
+@samp{[^][]]} consists of the complemented bracket expression
 @samp{[^][]} (which matches any single character that is not a square
 bracket), followed by a literal @samp{]}.
 
 The exact rules are that at the beginning of a regexp, @samp{[} is
 special and @samp{]} not. This lasts until the first unquoted
-@samp{[}, after which we are in a character alternative; @samp{[} is
+@samp{[}, after which we are in a bracket expression; @samp{[} is
 no longer special (except when it starts a character class) but @samp{]}
 is special, unless it immediately follows the special @samp{[} or that
 @samp{[} followed by a @samp{^}. This lasts until the next special
-@samp{]} that does not end a character class. This ends the character
-alternative and restores the ordinary syntax of regular expressions;
+@samp{]} that does not end a character class. This ends the bracket
+expression and restores the ordinary syntax of regular expressions;
 an unquoted @samp{[} is special again and a @samp{]} not.
 
 @node Char Classes
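The "exact rules" paragraph in the hunk above amounts to a small two-state scanner. The Python sketch below follows that description to report which brackets delimit bracket expressions. It is illustrative only: `bracket_spans` is a hypothetical name, it assumes that any `[:` with a later `:]` starts a character class, and it does not validate class names or report other errors.

```python
def bracket_spans(regexp: str) -> list[tuple[int, int]]:
    """Return (start, end) index pairs for the bracket expressions in REGEXP.

    Outside a bracket expression an unquoted '[' is special and ']' is not.
    Inside, '[' matters only when it starts a character class such as
    [:alnum:], and ']' closes the expression unless it comes first
    (right after '[' or '[^').
    """
    spans = []
    i, n = 0, len(regexp)
    while i < n:
        c = regexp[i]
        if c == "\\":
            i += 2  # outside brackets, a backslash quotes the next character
            continue
        if c != "[":
            i += 1  # ']' and other characters are ordinary here
            continue
        start = i
        i += 1
        if i < n and regexp[i] == "^":
            i += 1  # complemented bracket expression
        if i < n and regexp[i] == "]":
            i += 1  # a leading ']' is a member, not the terminator
        while True:
            if i >= n:
                raise ValueError(f"unterminated bracket expression at {start}")
            if regexp.startswith("[:", i):
                close = regexp.find(":]", i + 2)
                if close != -1:
                    i = close + 2  # assumption: skip the whole class
                    continue
            if regexp[i] == "]":
                break
            i += 1  # backslashes are not special in here
        spans.append((start, i + 1))
        i += 1
    return spans
```

For the manual's example, `bracket_spans("[^][]]")` reports a single span covering `[^][]`, leaving the final `]` as an ordinary character outside any bracket expression.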
@@ -592,8 +592,8 @@ an unquoted @samp{[} is special again and a @samp{]} not.
 @cindex alpha character class, regexp
 @cindex xdigit character class, regexp
 
-Below is a table of the classes you can use in a character
-alternative, and what they mean. Note that the @samp{[} and @samp{]}
+Below is a table of the classes you can use in a bracket
+expression, and what they mean. Note that the @samp{[} and @samp{]}
 characters that enclose the class name are part of the name, so a
 regular expression using these classes needs one more pair of
 brackets. For example, a regular expression matching a sequence of
@@ -920,7 +920,7 @@ with a symbol-constituent character.
 
 @kindex invalid-regexp
 Not every string is a valid regular expression. For example, a string
-that ends inside a character alternative without a terminating @samp{]}
+that ends inside a bracket expression without a terminating @samp{]}
 is invalid, and so is a string that ends with a single @samp{\}. If
 an invalid regular expression is passed to any of the search functions,
 an @code{invalid-regexp} error is signaled.
@@ -957,7 +957,7 @@ deciphered as follows:
 
 @table @code
 @item [.?!]
-The first part of the pattern is a character alternative that matches
+The first part of the pattern is a bracket expression that matches
 any one of three characters: period, question mark, and exclamation
 mark. The match must begin with one of these three characters. (This
 is one point where the new default regexp used by Emacs differs from
@@ -969,7 +969,7 @@ The second part of the pattern matches any closing braces and quotation
 marks, zero or more of them, that may follow the period, question mark
 or exclamation mark. The @code{\"} is Lisp syntax for a double-quote in
 a string. The @samp{*} at the end indicates that the immediately
-preceding regular expression (a character alternative, in this case) may be
+preceding regular expression (a bracket expression, in this case) may be
 repeated zero or more times.
 
 @item \\($\\|@ $\\|\t\\|@ @ \\)
@@ -1920,7 +1920,7 @@ attempts. Other zero-width assertions may also bring benefits by
 causing a match to fail early.
 
 @item
-Avoid or-patterns in favor of character alternatives: write
+Avoid or-patterns in favor of bracket expressions: write
 @samp{[ab]} instead of @samp{a\|b}. Recall that @samp{\s-} and @samp{\sw}
 are equivalent to @samp{[[:space:]]} and @samp{[[:word:]]}, respectively.
 
@@ -3012,7 +3012,7 @@ but does not support all the Emacs escapes.
 @item
 In POSIX BREs, it is an implementation option whether @samp{^} is special
 after @samp{\(}; GNU @command{grep} treats it like Emacs does.
-In POSIX EREs, @samp{^} is always special outside of character alternatives,
+In POSIX EREs, @samp{^} is always special outside of bracket expressions,
 which means the ERE @samp{x^} never matches.
 In Emacs regular expressions, @samp{^} is special only at the
 beginning of the regular expression, or after @samp{\(}, @samp{\(?:}
@@ -3021,7 +3021,7 @@ or @samp{\|}.
 @item
 In POSIX BREs, it is an implementation option whether @samp{$} is special
 before @samp{\)}; GNU @command{grep} treats it like Emacs does.
-In POSIX EREs, @samp{$} is always special outside of character alternatives,
+In POSIX EREs, @samp{$} is always special outside of bracket expressions,
 which means the ERE @samp{$x} never matches.
 In Emacs regular expressions, @samp{$} is special only at the
 end of the regular expression, or before @samp{\)} or @samp{\|}.
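Python's `re` module is not a POSIX ERE implementation, but for this particular point it behaves the same way: an unescaped `^` or `$` outside a bracket expression is an anchor wherever it appears. That makes it a convenient way to see why the ERE patterns `x^` and `$x` never match, whereas, going by the text above, Emacs would treat both characters as ordinary in those positions. This is an illustration in Python, not a test of Emacs or of any ERE engine.

```python
import re

# An anchor in the middle of a pattern can never be satisfied here:
# '^' needs the start of the string, '$' needs the end.
never_matches = [re.search(p, s) for p, s in [(r"x^", "x^"), (r"$x", "$x")]]

# Escaping the character makes it an ordinary, literal character again.
literal_matches = [re.search(p, s) for p, s in [(r"x\^", "x^"), (r"\$x", "$x")]]

print(never_matches)                          # [None, None]
print([m.group() for m in literal_matches])   # ['x^', '$x']
```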
@@ -3049,8 +3049,8 @@ character classes @samp{[:ascii:]}, @samp{[:multibyte:]},
 @samp{[:nonascii:]}, @samp{[:unibyte:]}, and @samp{[:word:]}.
 
 @item
-BRE and ERE alternatives can contain collating symbols and equivalence
-class expressions, e.g., @samp{[[.ch.]d[=a=]]}.
+BREs and EREs can contain collating symbols and equivalence
+class expressions within bracket expressions, e.g., @samp{[[.ch.]d[=a=]]}.
 Emacs regular expressions do not support this.
 
 @item

@@ -1453,7 +1453,7 @@ and initial semicolons."
 ;; are buffer-local, but we avoid changing them so that they can be set
 ;; to make `forward-paragraph' and friends do something the user wants.
 ;;
-;; `paragraph-start': The `(' in the character alternative and the
+;; `paragraph-start': The `(' in the bracket expression and the
 ;; left-singlequote plus `(' sequence after the \\| alternative prevent
 ;; sexps and backquoted sexps that follow a docstring from being filled
 ;; with the docstring. This setting has the consequence of inhibiting

@@ -383,7 +383,7 @@ Interactively, ARG is the numeric argument, and defaults to 1."
 The syntax for this variable is like the syntax used inside of `[...]'
 in a regular expression--but without the `[' and the `]'.
 It is NOT a regular expression, and should follow the usual
-rules for the contents of a character alternative.
+rules for the contents of a bracket expression.
 It defines a set of \"interesting characters\" to look for when setting
 \(or searching for) tab stops, initially \"!-~\" (all printing characters).
 For example, suppose that you are editing a table which is formatted thus:
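The docstring above describes a variable whose value is the contents of a bracket expression without the surrounding brackets, with a default of `!-~`. A hedged Python sketch of how such a value turns into a one-character test once the brackets are added back; `TAB_STOP_CHARS` and `is_interesting` are hypothetical names, not the Emacs variable or any Emacs function, and the default contains no backslash or class, so Python's `re` reads it the same way Emacs would.

```python
import re

# Hypothetical stand-in for the variable in the docstring: the *contents*
# of a bracket expression, without the surrounding '[' and ']'.
TAB_STOP_CHARS = "!-~"

# Adding the brackets back yields a regexp matching one such character.
TAB_STOP_RE = re.compile("[" + TAB_STOP_CHARS + "]")


def is_interesting(ch: str) -> bool:
    """True if CH is one of the characters named by TAB_STOP_CHARS."""
    return TAB_STOP_RE.fullmatch(ch) is not None


# The range runs from '!' (0x21) through '~' (0x7E): every printing
# ASCII character except space, 94 in all.
print(sum(is_interesting(chr(code)) for code in range(128)))  # 94
```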