Translate or-patterns that (even partially) match single characters
into character alternatives which are more efficient in matching,
sometimes algorithmically so. Example:
(or "%" (in "a-z") space)
was previously translated to
"%\\|[a-z]\\|[[:space:]]"
but now becomes
"[%a-z[:space:]]"
Single-char patterns include `nonl` and `anychar`, which now can also
be used in set operations (union, complement and intersection), and
character classes. For example, `(or nonl "\n")` is now equivalent to
`anychar`.
* lisp/emacs-lisp/rx.el (rx--expand-def): Remove, split into...
(rx--expand-def-form, rx--expand-def-symbol): ...these.
(rx--translate-compat-symbol-entry)
(rx--translate-compat-form-entry): New functions for handling the
legacy extension mechanism.
(rx--normalise-or-arg): Renamed to...
(rx--normalise-char-pattern): ...this, and rewrite.
(rx--all-string-or-args): Remove, split into...
(rx--all-string-branches-p, rx--collect-or-strings): ...these.
(rx--char-alt-union, rx--intersection-intervals)
(rx--reduce-to-char-alt, rx--optimise-or-args)
(rx--translate-char-alt, rx--human-readable): New.
(rx--translate-or, rx--translate-not, rx--translate-intersection):
Rewrite.
(rx--charset-p, rx--intervals-to-alt, rx--charset-intervals)
(rx--charset-union, rx--charset-intersection, rx--charset-all)
(rx--translate-union): Remove.
(rx--generate-alt): Decide whether to generate a negated character
alternative.
(rx--complement-intervals, rx--intersect-intervals)
(rx--union-intervals): Rename to...
(rx--interval-set-complement, rx--interval-set-intersection)
(rx--interval-set-union): ...these.
(rx--translate-symbol, rx--translate-form): Refactor extension
processing. Handle synthetic `rx--char-alt` form.
* test/lisp/emacs-lisp/rx-tests.el (rx-or, rx-char-any-raw-byte)
(rx-any, rx-charset-or): Adapt to changes and extend.
* test/lisp/emacs-lisp/rx-tests.el (rx--complement-intervals)
(rx--union-intervals, rx--intersect-intervals): Rename to...
(rx--interval-set-complement, rx--interval-set-union)
(rx--interval-set-intersection): ...these.
* lisp/emacs-lisp/rx.el:
Add tables of legacy syntax.
(rx--translate-symbol):
Translate the legacy construct `not-wordchar` as (not wordchar), which
is more intuitively obvious.
* lisp/emacs-lisp/rx.el (rx--translate-syntax):
Generate the shorter \w and \W instead of \sw and \Sw.
* test/lisp/emacs-lisp/rx-tests.el (rx-atoms, rx-syntax, rx-not):
Adapt tests.
* lisp/emacs-lisp/rx.el (rx--generate-alt):
Treat the intervals and classes lists separately without joining,
to reduce allocation. Handle special cases first.
(rx--union-intervals):
Implement directly instead of using intersection and complement.
* test/lisp/emacs-lisp/rx-tests.el (rx-any): Adapt test, as some
character alternatives are now slightly different.
(rx--complement-intervals, rx--union-intervals)
(rx--intersect-intervals): New unit tests.
(rx (in (?^ . ?a))) was incorrectly translated to "[^-a]".
Change it so that we get "[_-a^]" instead.
* lisp/emacs-lisp/rx.el (rx--generate-alt): Split ranges starting with
`^` occurring first in a non-negated character alternative.
* test/lisp/emacs-lisp/rx-tests.el (rx-any): Add and adapt tests.
(cherry picked from commit 5f5d668ac7)
(rx (in (?^ . ?a))) was incorrectly translated to "[^-a]".
Change it so that we get "[_-a^]" instead.
* lisp/emacs-lisp/rx.el (rx--generate-alt): Split ranges starting with
`^` occurring first in a non-negated character alternative.
* test/lisp/emacs-lisp/rx-tests.el (rx-any): Add and adapt tests.
The Emacs regexp engine interprets character ranges from ASCII to raw
bytes, such as [a-\xfe], as not including non-ASCII Unicode at all;
ranges from non-ACII Unicode to raw bytes, such as [ü-\x91], are
ignored entirely.
To make rx produce a translation that works as intended, split ranges
that that go from ordinary characters to raw bytes. Such ranges may
appear from set manipulation and regexp optimisation.
* lisp/emacs-lisp/rx.el (rx--generate-alt): Split intervals that
straddle the char-raw boundary when rendering a string regexp from an
interval set.
* test/lisp/emacs-lisp/rx-tests.el (rx-char-any-raw-byte):
Add test cases.
Reported by Daniel Pittman.
* lisp/emacs-lisp/rx.el (rx): Move binding of rx--local-definitions...
(rx--to-expr): ...here.
* test/lisp/emacs-lisp/rx-tests.el (rx-let-pcase): New test.
* lisp/emacs-lisp/rx.el (rx--to-expr, rx--pcase-transform):
Don't autoload.
(rx--pcase-macroexpander): Extract body into...
(rx--pcase-expand): ...a separate function, which is autoloaded.
* lisp/emacs-lisp/rx.el: `>=` and `=` are much more likely functions
than RX constructs and the indentation machinery currently has
no way to tell them apart.
Suggested by Michael Herdeegen.
* doc/lispref/searching.texi (Extending Rx): Add example illustrating
how to define a user-defined Rx form that performs computation,
from a discussion with Michael Herdeegen (bug#50136).
* lisp/emacs-lisp/rx.el (rx): Clarify evaluation time for `eval`.
pcase 'rx' patterns with a single named submatch, like
(rx (let x "a"))
would always succeed because of an over-optimistic transformation.
Patterns with 0 or more than 1 named submatches were not affected.
Reported by Philipp Stephani.
* lisp/emacs-lisp/rx.el (rx--pcase-macroexpander):
Special case for a single named submatch.
* test/lisp/emacs-lisp/rx-tests.el (rx-pcase): Add tests.
The pcase 'rx' pattern would in some cases allow the match data to be
clobbered before it is read. For example:
(pcase "PQR"
((and (rx (let a nonl)) (rx ?z)) (list 'one a))
((rx (let b ?Q)) (list 'two b)))
The above returned (two "P") instead of the correct (two "Q").
This occurred because the calls to string-match and match-string were
presented as separate patterns to pcase, which would interleave them
with other patterns.
As a remedy, combine string matching and match-data extraction into a
single pcase pattern. This introduces a slight inefficiency for two
or more submatches as they are grouped into a list structure which
then has to be destructured.
Found by Stefan Monnier. See discussion at
https://lists.gnu.org/archive/html/emacs-devel/2021-02/msg02010.html
* lisp/emacs-lisp/rx.el (rx--reduce-right): New helper.
(rx [pcase macro]): Combine string-match and match-string calls into a
single pcase pattern.
* test/lisp/emacs-lisp/rx-tests.el (rx-pcase): Add test cases.
Two unrelated bugs: A missing type check caused an error in rx
patterns for non-string match targets, and rx patterns did not work at
all in pcase-let or pcase-let*.
Second bug reported by Basil Contovounesios and Ag Ibragimov; fixes
proposed by Stefan Monnier. Discussion and explanation in thread at
https://lists.gnu.org/archive/html/emacs-devel/2021-02/msg01924.html
* lisp/emacs-lisp/rx.el (rx): Add (pred stringp) to avoid type errors,
and replace the `pred` clause for the actual match with something that
works with pcase-let(*) without being optimised away.
* test/lisp/emacs-lisp/rx-tests.el (rx-pcase): Add test cases.
The argument of the rx `regexp` form is assumed to evaluate to a valid
regexp, but certain kinds of deprecated but still accepted usage were
not handled correctly, such as unescaped literal (special) characters:
(rx "a" (regexp "*")) => "a*" which is wrong.
Handle these cases; there is no extra trouble.
* lisp/emacs-lisp/rx.el (rx--translate-regexp): Force bracketing
of single special characters.
* test/lisp/emacs-lisp/rx-tests.el (rx-regexp): Add test case.
This is a regression from Emacs 26.
Reported by Phillip Stephani.
* lisp/emacs-lisp/rx.el (rx--pcase-transform): Process ? and ?? correctly.
* test/lisp/emacs-lisp/rx-tests.el (rx-pcase): Add test case.
* lisp/battery.el: Mention BSD support in Commentary. Don't load
preloaded lisp/emacs-lisp/timer.el.
(battery--files): New function.
(battery--find-linux-sysfs-batteries): Use it and make fewer
syscalls.
(battery-status-function): Perform GNU/Linux checks in increasing
order of obsolescence: sysfs, ACPI, and then APM. Simplify Darwin
check. Add :version tag now that battery-upower is the default.
(battery-echo-area-format, battery-mode-line-format): Mention %s.
(battery-load-low, battery-load-critical): New faces.
(battery-update): Display battery-mode-line-format even if
percentage is N/A. Apply faces battery-load-low or
battery-load-critical according to the percentage, but append them
so they don't override user customizations. Update all mode lines
since we are in global-mode-string.
(battery-linux-proc-apm-regexp): Mark as obsolete, replacing with...
(battery--linux-proc-apm): ...this new rx definition.
(battery-linux-proc-apm): Use it. Fix indentation. Simplify.
(battery--acpi-rate, battery--acpi-capacity): New rx definitions.
(battery-linux-proc-acpi): Use them. Fix pathological whitespace
regexps. Simplify.
(battery-linux-sysfs): Fix docstring and indentation. Reduce number
of file searches. Simplify.
(battery-bsd-apm): Fix docstring. Simplify.
(battery-pmset): Fix docstring. Simplify ID regexp.
* lisp/emacs-lisp/rx.el (rx-define): Indent as a defun.
* test/lisp/battery-tests.el (battery-linux-proc-apm-regexp): Test
new battery--linux-proc-apm rx definition.
(battery-acpi-rate-regexp, battery-acpi-capacity-regexp): New tests.
The ? and ?? rx operators are special in that they can be written as
characters (space and '?' respectively). This confused the definition
look-up mechanism in rare cases.
* lisp/emacs-lisp/rx.el (rx--expand-def): Don't look up non-symbols.
* test/lisp/emacs-lisp/rx-tests.el (rx-charset-or): Test.
Perform 'regexp-opt' on nested 'or' forms, and after expansion of
user-defined and 'eval' forms. Characters are now turned into strings
for wider 'regexp-opt' scope. This preserves the longest-match
semantics for string in 'or' forms over composition.
* doc/lispref/searching.texi (Rx Constructs): Document.
* lisp/emacs-lisp/rx.el (rx--normalise-or-arg)
(rx--all-string-or-args): New.
(rx--translate-or): Normalise arguments first, and check for strings
in subforms.
(rx--expand-eval): Extracted from rx--translate-eval.
(rx--translate-eval): Call rx--expand-eval.
* test/lisp/emacs-lisp/rx-tests.el (rx-or, rx-def-in-or): Add tests.
* etc/NEWS: Announce.
Revert to the Emacs 26 semantics that always gave the longest match
for rx 'or' forms with only string arguments. This guarantee was
never well documented, but it is useful and people likely have come to
rely on it. For example, prior to this change,
(rx (or ">" ">="))
matched ">" even if the text contained ">=".
* lisp/emacs-lisp/rx.el (rx--translate-or): Don't tell regexp-opt to
preserve the matching order.
* doc/lispref/searching.texi (Rx Constructs): Document the
longest-match guarantee for all-string 'or' forms.
* test/lisp/emacs-lisp/rx-tests.el (rx-or): Update test.
The `not' and `intersection' forms, and `or' inside these forms,
now accept characters and single-character strings as arguments.
Previously, they had to be wrapped in `any' forms.
This does not add expressive power but is a convenience and is easily
understood.
* doc/lispref/searching.texi (Rx Constructs): Amend the documentation.
* etc/NEWS: Announce the change.
* lisp/emacs-lisp/rx.el (rx--charset-p, rx--translate-not)
(rx--charset-intervals, rx): Accept characters and 1-char strings in
more places.
* test/lisp/emacs-lisp/rx-tests.el (rx-not, rx-charset-or)
(rx-def-in-charset-or, rx-intersection): Test the change.
These character set operations, together with `not' for set
complement, improve the compositionality of rx, and reduce duplication
in complicated cases. Named character classes are not permitted in
set operations.
* lisp/emacs-lisp/rx.el (rx--translate-any): Split into multiple
functions.
(rx--foldl, rx--parse-any, rx--generate-alt, rx--intervals-to-alt)
(rx--complement-intervals, rx--intersect-intervals)
(rx--union-intervals, rx--charset-intervals, rx--charset-union)
(rx--charset-all, rx--charset-intersection, rx--translate-union)
(rx--translate-intersection): New.
(rx--translate-not, rx--translate-form, rx--builtin-forms, rx):
Add `union' and `intersection'.
* test/lisp/emacs-lisp/rx-tests.el (rx-union ,rx-def-in-union)
(rx-intersection, rx-def-in-intersection): New tests.
* doc/lispref/searching.texi (Rx Constructs):
* etc/NEWS:
Document `union' and `intersection'.
Although 'push' returns the modified list, it isn't actually
documented to do so, so don't rely on it.
* lisp/emacs-lisp/rx.el (rx--translate-any): Add progn.
For example, (any digit digit) should produce "[[:digit:]]",
not "[[:digit:][:digit:]]".
* lisp/emacs-lisp/rx.el (rx--translate-any): Deduplicate character classes.
* test/lisp/emacs-lisp/rx-tests.el (rx-any): Add test case.
* lisp/emacs-lisp/rx.el (rx--translate-symbol):
* test/lisp/emacs-lisp/rx-tests.el (rx-any, rx-atoms):
Use [^z-a] instead of ".\\|\n" for anychar.
The new expression is faster (about 2×) and does not allocate regexp
stack space. For example, (0+ anychar) now matches strings of any
size (bug#37659).