Prevent over-eager rx character range condensation
`rx' incorrectly considers character ranges between ASCII and raw bytes to cover all codes in-between, which includes all non-ASCII Unicode chars. This causes (any "\000-\377" ?Å) to be simplified to (any "\000-\377"), which is not at all the same thing: [\000-\377] really means [\000-\177\200-\377] (Bug#34492). * lisp/emacs-lisp/rx.el (rx-any-condense-range): Split ranges going from ASCII to raw bytes. * test/lisp/emacs-lisp/rx-tests.el (rx-char-any-raw-byte): Add test case. * etc/NEWS: Mention the overall change (Bug#33205).
This commit is contained in:
parent
aff0c58506
commit
478bbf7c80
3 changed files with 20 additions and 1 deletions
8
etc/NEWS
8
etc/NEWS
|
@ -1101,6 +1101,14 @@ subexpression.
|
|||
When there is no menu for a mode, display the mode name after the
|
||||
indicator instead of just the indicator (which is sometimes cryptic).
|
||||
|
||||
** rx
|
||||
|
||||
---
|
||||
*** rx now handles raw bytes in character alternatives correctly,
|
||||
when given in a string. Previously, '(any "\x80-\xff")' would match
|
||||
characters U+0080...U+00FF. Now the expression matches raw bytes in
|
||||
the 128...255 range, as expected.
|
||||
|
||||
|
||||
* New Modes and Packages in Emacs 27.1
|
||||
|
||||
|
|
|
@ -429,6 +429,13 @@ Only both edges of each range is checked."
|
|||
;; set L list of all ranges
|
||||
(mapc (lambda (e) (cond ((stringp e) (push e str))
|
||||
((numberp e) (push (cons e e) l))
|
||||
;; Ranges between ASCII and raw bytes are split,
|
||||
;; to prevent accidental inclusion of Unicode
|
||||
;; characters later on.
|
||||
((and (<= (car e) #x7f)
|
||||
(>= (cdr e) #x3fff80))
|
||||
(push (cons (car e) #x7f) l)
|
||||
(push (cons #x3fff80 (cdr e)) l))
|
||||
(t (push e l))))
|
||||
args)
|
||||
;; condense overlapped ranges in L
|
||||
|
|
|
@ -53,7 +53,11 @@
|
|||
;; Range of raw characters, multibyte.
|
||||
(should (equal (string-match-p (rx (any "Å\211\326-\377\177"))
|
||||
"XY\355\177\327")
|
||||
2)))
|
||||
2))
|
||||
;; Split range; \177-\377ÿ should not be optimised to \177-\377.
|
||||
(should (equal (string-match-p (rx (any "\177-\377" ?ÿ))
|
||||
"ÿA\310B")
|
||||
0)))
|
||||
|
||||
(ert-deftest rx-pcase ()
|
||||
(should (equal (pcase "a 1 2 3 1 1 b"
|
||||
|
|
Loading…
Add table
Reference in a new issue