New sort keyword arguments (bug#69709)

Add the :key, :lessp, :reverse and :in-place keyword arguments.
The old calling style remains available and is unchanged.

* src/fns.c (sort_list, sort_vector, Fsort):
* src/sort.c (tim_sort):
Add keyword arguments with associated new features.
All callers of Fsort adapted.
* test/src/fns-tests.el (fns-tests--shuffle-vector, fns-tests-sort-kw):
New test.
* doc/lispref/sequences.texi (Sequence Functions): Update manual.
* etc/NEWS: Announce.
Mattias Engdegård 2024-03-19 13:03:47 +01:00
parent a52f1121a3
commit ae5f2c02bd
8 changed files with 229 additions and 87 deletions

@ -350,94 +350,99 @@ encouraged to treat strings as immutable even when they are mutable.
@end defun
@defun sort sequence predicate
@defun sort sequence &rest keyword-args
@cindex stable sort
@cindex sorting lists
@cindex sorting vectors
This function sorts @var{sequence} stably. Note that this function doesn't work
for all sequences; it may be used only for lists and vectors. If @var{sequence}
is a list, it is modified destructively. This functions returns the sorted
@var{sequence} and compares elements using @var{predicate}. A stable sort is
one in which elements with equal sort keys maintain their relative order before
and after the sort. Stability is important when successive sorts are used to
order elements according to different criteria.
This function sorts @var{sequence}, which must be a list or vector, and
returns a sorted sequence of the same type.
The sort is stable, which means that elements with equal sort keys maintain
their relative order. It takes the following optional keyword arguments:
The argument @var{predicate} must be a function that accepts two
arguments. It is called with two elements of @var{sequence}. To get an
increasing order sort, the @var{predicate} should return non-@code{nil} if the
first element is ``less'' than the second, or @code{nil} if not.
@table @asis
@item :key @var{keyfunc}
Use @var{keyfunc}, a function that takes a single element from
@var{sequence} and returns its key value, to generate the keys used in
comparison. If this argument is absent or if @var{keyfunc} is
@code{nil} then @code{identity} is assumed; that is, the elements
themselves are used as sorting keys.
The comparison function @var{predicate} must give reliable results for
any given pair of arguments, at least within a single call to
@code{sort}. It must be @dfn{antisymmetric}; that is, if @var{a} is
less than @var{b}, @var{b} must not be less than @var{a}. It must be
@dfn{transitive}---that is, if @var{a} is less than @var{b}, and @var{b}
is less than @var{c}, then @var{a} must be less than @var{c}. If you
use a comparison function which does not meet these requirements, the
result of @code{sort} is unpredictable.
@item :lessp @var{predicate}
Use @var{predicate} to order the keys. @var{predicate} is a function
that takes two sort keys as arguments and returns non-@code{nil} if the
first should come before the second. If this argument is absent or
@var{predicate} is @code{nil}, then @code{value<} is used, which
is applicable to many different Lisp types and generally sorts in
ascending order (@pxref{definition of value<}).
The destructive aspect of @code{sort} for lists is that it reuses the
cons cells forming @var{sequence} by changing their contents, possibly
rearranging them in a different order. This means that the value of
the input list is undefined after sorting; only the list returned by
@code{sort} has a well-defined value. Example:
For consistency, any predicate must obey the following rules:
@itemize @bullet
@item
It must be @dfn{antisymmetric}: it cannot both order @var{a} before
@var{b} and @var{b} before @var{a}.
@item
It must be @dfn{transitive}: if it orders @var{a} before @var{b} and
@var{b} before @var{c}, then it must also order @var{a} before @var{c}.
@end itemize
@item :reverse @var{flag}
If @var{flag} is non-@code{nil}, the sorting order is reversed. With
the default @code{:lessp} predicate this means sorting in descending order.
@item :in-place @var{flag}
If @var{flag} is non-@code{nil}, then @var{sequence} is sorted in-place
(destructively) and returned. If @code{nil}, or if this argument is not
given, a sorted copy of the input is returned and @var{sequence} itself
remains unmodified. In-place sorting is slightly faster, but the
original sequence is lost.
@end table
If the default behaviour is not suitable for your needs, it is usually
easier and faster to supply a new @code{:key} function than a different
@code{:lessp} predicate. For example, consider sorting these strings:
@example
@group
(setq nums (list 2 1 4 3 0))
(sort nums #'<)
@result{} (0 1 2 3 4)
; nums is unpredictable at this point
@end group
(setq numbers '("one" "two" "three" "four" "five" "six"))
(sort numbers)
@result{} ("five" "four" "one" "six" "three" "two")
@end example
Most often we store the result back into the variable that held the
original list:
You can sort the strings by length instead by supplying a different key
function:
@example
(setq nums (sort nums #'<))
(sort numbers :key #'length)
@result{} ("one" "two" "six" "four" "five" "three")
@end example
If you wish to make a sorted copy without destroying the original,
copy it first and then sort:
Note how strings of the same length keep their original order, thanks to
the sorting stability. Now suppose you want to sort by length, but use
the string contents to break ties. The easiest way is to specify a key
function that transforms an element to a value that is sorted this way.
Since @code{value<} orders compound objects (conses, lists,
vectors and records) lexicographically, you could do:
@example
@group
(setq nums (list 2 1 4 3 0))
(sort (copy-sequence nums) #'<)
@result{} (0 1 2 3 4)
@end group
@group
nums
@result{} (2 1 4 3 0)
@end group
(sort numbers :key (lambda (x) (cons (length x) x)))
@result{} ("one" "six" "two" "five" "four" "three")
@end example
For the better understanding of what stable sort is, consider the following
vector example. After sorting, all items whose @code{car} is 8 are grouped
at the beginning of @code{vector}, but their relative order is preserved.
All items whose @code{car} is 9 are grouped at the end of @code{vector},
but their relative order is also preserved:
because @code{(3 . "six")} is ordered before @code{(3 . "two")} and so on.
For compatibility with old versions of Emacs, the @code{sort} function
can also be called using the fixed two-argument form
@example
@group
(setq
vector
(vector '(8 . "xxx") '(9 . "aaa") '(8 . "bbb") '(9 . "zzz")
'(9 . "ppp") '(8 . "ttt") '(8 . "eee") '(9 . "fff")))
@result{} [(8 . "xxx") (9 . "aaa") (8 . "bbb") (9 . "zzz")
(9 . "ppp") (8 . "ttt") (8 . "eee") (9 . "fff")]
@end group
@group
(sort vector (lambda (x y) (< (car x) (car y))))
@result{} [(8 . "xxx") (8 . "bbb") (8 . "ttt") (8 . "eee")
(9 . "aaa") (9 . "zzz") (9 . "ppp") (9 . "fff")]
@end group
(@code{sort} @var{sequence} @var{predicate})
@end example
where @var{predicate} is the @code{:lessp} argument. When using this
form, sorting is always done in-place.
@end defun
@cindex comparing values
@cindex standard sorting order
@anchor{definition of value<}
@defun value< a b
This function returns non-@code{nil} if @var{a} comes before @var{b} in
the standard sorting order; this means that it returns @code{nil} when

@ -1770,6 +1770,31 @@ lexicographically.
It is intended as a convenient ordering predicate for sorting, and is
likely to be faster than hand-written Lisp functions.
+++
** New 'sort' arguments and features.
The 'sort' function can now be called using the signature
(sort SEQ &rest KEYWORD-ARGUMENTS)
where arguments after the first are keyword/value pairs, all optional:
':key' specifies a function that produces the sorting key from an element,
':lessp' specifies the ordering predicate, defaulting to 'value<',
':reverse' is used to reverse the sorting order,
':in-place' is used for in-place sorting, as the default is now to
sort a copy of the input.
The new signature is less error-prone and reduces the need to write
ordering predicates by hand. We recommend that you use the ':key'
argument instead of ':lessp' unless a suitable ordering predicate is
already available. This can also be used for multi-key sorting:
(sort seq :key (lambda (x) (list (age x) (size x) (cost x))))
sorts by the return value of 'age', then by 'size', then by 'cost'.
The old signature, '(sort SEQ PREDICATE)', can still be used and sorts
its input in-place as before.
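
As a rough illustration of what the multi-key ':key' example above relies
on, the sketch below models lexicographic comparison of composite keys in
C.  It is not Emacs code: 'struct thing', 'thing_key', 'key_less' and
'thing_less' are hypothetical names, and plain ints stand in for whatever
'age', 'size' and 'cost' would return.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record with three sortable fields.  */
struct thing { int age; int size; int cost; };

/* Build the composite key (age size cost) for T, most significant
   component first.  */
void
thing_key (const struct thing *t, int key[3])
{
  key[0] = t->age;
  key[1] = t->size;
  key[2] = t->cost;
}

/* Return true if key A orders before key B lexicographically: the
   first differing component decides; equal keys are not "less".  */
bool
key_less (const int *a, const int *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (a[i] != b[i])
      return a[i] < b[i];
  return false;
}

/* Ordering predicate on records, derived entirely from the key.  */
bool
thing_less (const struct thing *x, const struct thing *y)
{
  int kx[3], ky[3];
  thing_key (x, kx);
  thing_key (y, ky);
  return key_less (kx, ky, 3);
}
```

With this ordering, a record with a smaller age always comes first; size
is consulted only when the ages are equal, and cost only when both age
and size are equal.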
** New function 'sort-on'.
This function implements the Schwartzian transform, and is appropriate
for sorting lists when the computation of the sort key of a list

@ -351,7 +351,7 @@ directory_files_internal (Lisp_Object directory, Lisp_Object full,
specpdl_ptr = specpdl_ref_to_ptr (count);
if (NILP (nosort))
list = Fsort (Fnreverse (list),
list = CALLN (Fsort, Fnreverse (list),
attrs ? Qfile_attributes_lessp : Qstring_lessp);
(void) directory_volatile;

@ -2353,7 +2353,8 @@ See also the function `nreverse', which is used more often. */)
is destructively reused to hold the sorted result. */
static Lisp_Object
sort_list (Lisp_Object list, Lisp_Object predicate, Lisp_Object keyfunc)
sort_list (Lisp_Object list, Lisp_Object predicate, Lisp_Object keyfunc,
bool reverse)
{
ptrdiff_t length = list_length (list);
if (length < 2)
@ -2369,7 +2370,7 @@ sort_list (Lisp_Object list, Lisp_Object predicate, Lisp_Object keyfunc)
result[i] = Fcar (tail);
tail = XCDR (tail);
}
tim_sort (predicate, keyfunc, result, length);
tim_sort (predicate, keyfunc, result, length, reverse);
ptrdiff_t i = 0;
tail = list;
@ -2388,27 +2389,86 @@ sort_list (Lisp_Object list, Lisp_Object predicate, Lisp_Object keyfunc)
algorithm. */
static void
sort_vector (Lisp_Object vector, Lisp_Object predicate, Lisp_Object keyfunc)
sort_vector (Lisp_Object vector, Lisp_Object predicate, Lisp_Object keyfunc,
bool reverse)
{
ptrdiff_t length = ASIZE (vector);
if (length < 2)
return;
tim_sort (predicate, keyfunc, XVECTOR (vector)->contents, length);
tim_sort (predicate, keyfunc, XVECTOR (vector)->contents, length, reverse);
}
DEFUN ("sort", Fsort, Ssort, 2, 2, 0,
doc: /* Sort SEQ, stably, comparing elements using PREDICATE.
Returns the sorted sequence. SEQ should be a list or vector. SEQ is
modified by side effects. PREDICATE is called with two elements of
SEQ, and should return non-nil if the first element should sort before
the second. */)
(Lisp_Object seq, Lisp_Object predicate)
DEFUN ("sort", Fsort, Ssort, 1, MANY, 0,
doc: /* Sort SEQ, stably, and return the sorted sequence.
SEQ should be a list or vector.
Optional arguments are specified as keyword/argument pairs. The following
arguments are defined:
:key FUNC -- FUNC is a function that takes a single element from SEQ and
returns the key value to be used in comparison. If absent or nil,
`identity' is used.
:lessp FUNC -- FUNC is a function that takes two arguments and returns
non-nil if the first element should come before the second.
If absent or nil, `value<' is used.
:reverse BOOL -- if BOOL is non-nil, the sorting order implied by FUNC is
reversed. This does not affect stability: equal elements still retain
their order in the input sequence.
:in-place BOOL -- if BOOL is non-nil, SEQ is sorted in-place and returned.
Otherwise, a sorted copy of SEQ is returned and SEQ remains unmodified;
this is the default.
For compatibility, the calling convention (sort SEQ LESSP) can also be used;
in this case, sorting is always done in-place.
usage: (sort SEQ &key KEY LESSP REVERSE IN-PLACE) */)
(ptrdiff_t nargs, Lisp_Object *args)
{
Lisp_Object seq = args[0];
Lisp_Object key = Qnil;
Lisp_Object lessp = Qnil;
bool inplace = false;
bool reverse = false;
if (nargs == 2)
{
/* old-style invocation without keywords */
lessp = args[1];
inplace = true;
}
else if ((nargs & 1) == 0)
error ("Invalid argument list");
else
for (ptrdiff_t i = 1; i < nargs - 1; i += 2)
{
if (EQ (args[i], QCkey))
key = args[i + 1];
else if (EQ (args[i], QClessp))
lessp = args[i + 1];
else if (EQ (args[i], QCin_place))
inplace = !NILP (args[i + 1]);
else if (EQ (args[i], QCreverse))
reverse = !NILP (args[i + 1]);
else
signal_error ("Invalid keyword argument", args[i]);
}
if (NILP (lessp))
/* FIXME: normalise it as Qnil instead, and special-case it in tim_sort?
That would remove the funcall overhead for the common case. */
lessp = Qvaluelt;
/* FIXME: for lists it may be slightly faster to make the copy after
sorting? Measure. */
if (!inplace)
seq = Fcopy_sequence (seq);
if (CONSP (seq))
seq = sort_list (seq, predicate, Qnil);
seq = sort_list (seq, lessp, key, reverse);
else if (VECTORP (seq))
sort_vector (seq, predicate, Qnil);
sort_vector (seq, lessp, key, reverse);
else if (!NILP (seq))
wrong_type_argument (Qlist_or_vector_p, seq);
return seq;
@ -6860,4 +6920,10 @@ For best results this should end in a space. */);
DEFSYM (Qfrom__tty_menu_p, "from--tty-menu-p");
DEFSYM (Qyes_or_no_p, "yes-or-no-p");
DEFSYM (Qy_or_n_p, "y-or-n-p");
DEFSYM (QCkey, ":key");
DEFSYM (QClessp, ":lessp");
DEFSYM (QCin_place, ":in-place");
DEFSYM (QCreverse, ":reverse");
DEFSYM (Qvaluelt, "value<");
}
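
The argument handling in Fsort above tells the old two-argument call
apart from the keyword form by argument count alone.  The following is a
simplified sketch of that dispatch, assuming C strings in place of Lisp
objects (with the string "nil" standing for nil).  The names
'struct sort_options', 'non_nil' and 'parse_sort_args' are hypothetical
and do not exist in Emacs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of parsing; NULL means "not given".  */
struct sort_options
{
  const char *key;
  const char *lessp;
  bool in_place;
  bool reverse;
};

/* Model of a Lisp value test: the string "nil" counts as nil.  */
const char *
non_nil (const char *val)
{
  return strcmp (val, "nil") == 0 ? NULL : val;
}

/* Parse ARGS, where ARGS[0] is the sequence.  Return false for an
   invalid argument list, true otherwise.  */
bool
parse_sort_args (size_t nargs, const char **args, struct sort_options *opts)
{
  assert (nargs >= 1);
  opts->key = NULL;
  opts->lessp = NULL;
  opts->in_place = false;
  opts->reverse = false;

  if (nargs == 2)
    {
      /* Old-style call: (sort SEQ LESSP), always in-place.  */
      opts->lessp = non_nil (args[1]);
      opts->in_place = true;
    }
  else if ((nargs & 1) == 0)
    return false;               /* a keyword without a value */
  else
    for (size_t i = 1; i < nargs - 1; i += 2)
      {
        const char *kw = args[i];
        const char *val = non_nil (args[i + 1]);
        if (strcmp (kw, ":key") == 0)
          opts->key = val;
        else if (strcmp (kw, ":lessp") == 0)
          opts->lessp = val;
        else if (strcmp (kw, ":in-place") == 0)
          opts->in_place = val != NULL;
        else if (strcmp (kw, ":reverse") == 0)
          opts->reverse = val != NULL;
        else
          return false;         /* unknown keyword */
      }

  if (opts->lessp == NULL)
    opts->lessp = "value<";     /* default ordering predicate */
  return true;
}
```

In this model, a call with exactly two arguments is always taken as the
old form, so the keyword form needs an odd total argument count: the
sequence plus complete keyword/value pairs.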

@ -4299,7 +4299,8 @@ extern void syms_of_fns (void);
extern void mark_fns (void);
/* Defined in sort.c */
extern void tim_sort (Lisp_Object, Lisp_Object, Lisp_Object *, const ptrdiff_t);
extern void tim_sort (Lisp_Object, Lisp_Object, Lisp_Object *, const ptrdiff_t,
bool);
/* Defined in floatfns.c. */
verify (FLT_RADIX == 2 || FLT_RADIX == 16);

@ -3368,7 +3368,7 @@ dump_sort_copied_objects (struct dump_context *ctx)
file and the copy into Emacs in-order, where prefetch will be
most effective. */
ctx->copied_queue =
Fsort (Fnreverse (ctx->copied_queue),
CALLN (Fsort, Fnreverse (ctx->copied_queue),
Qdump_emacs_portable__sort_predicate_copied);
}
@ -3935,7 +3935,7 @@ drain_reloc_list (struct dump_context *ctx,
{
struct dump_flags old_flags = ctx->flags;
ctx->flags.pack_objects = true;
Lisp_Object relocs = Fsort (Fnreverse (*reloc_list),
Lisp_Object relocs = CALLN (Fsort, Fnreverse (*reloc_list),
Qdump_emacs_portable__sort_predicate);
*reloc_list = Qnil;
dump_align_output (ctx, max (alignof (struct dump_reloc),
@ -4057,7 +4057,7 @@ static void
dump_do_fixups (struct dump_context *ctx)
{
dump_off saved_offset = ctx->offset;
Lisp_Object fixups = Fsort (Fnreverse (ctx->fixups),
Lisp_Object fixups = CALLN (Fsort, Fnreverse (ctx->fixups),
Qdump_emacs_portable__sort_predicate);
Lisp_Object prev_fixup = Qnil;
ctx->fixups = Qnil;

@ -1072,11 +1072,11 @@ resolve_fun (Lisp_Object fun)
}
/* Sort the array SEQ with LENGTH elements in the order determined by
PREDICATE. */
PREDICATE (where Qnil means value<) and KEYFUNC (where Qnil means identity),
optionally reversed. */
void
tim_sort (Lisp_Object predicate, Lisp_Object keyfunc,
Lisp_Object *seq, const ptrdiff_t length)
Lisp_Object *seq, const ptrdiff_t length, bool reverse)
{
/* FIXME: optimise for the predicate being value<; at the very
least we'd go without the Lisp funcall overhead. */
@ -1091,9 +1091,8 @@ tim_sort (Lisp_Object predicate, Lisp_Object keyfunc,
if (EQ (keyfunc, Qidentity))
keyfunc = Qnil;
/* FIXME: consider a built-in reverse sorting flag: we would reverse
the input in-place here and reverse it back just before
returning. */
if (reverse)
reverse_slice (seq, seq + length); /* preserve stability */
if (NILP (keyfunc))
{
@ -1159,6 +1158,9 @@ tim_sort (Lisp_Object predicate, Lisp_Object keyfunc,
eassume (ms.pending[0].len == length);
lo = ms.pending[0].base;
if (reverse)
reverse_slice (seq, seq + length);
if (ms.a.keys != ms.temparray || allocated_keys != NULL)
unbind_to (ms.count, Qnil);
}
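
The two reverse_slice calls above bracket the sort itself.  As a minimal
sketch of why reversing the input, sorting stably in ascending order, and
reversing the result gives a descending order in which equal elements
still keep their input order, consider the following standalone C code.
It is an illustration only: 'struct item', 'reverse_items',
'stable_sort_ascending' and 'sort_descending_stable' are hypothetical
names, and a simple insertion sort stands in for the real merge-based
algorithm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element: KEY is compared, TAG records input order.  */
struct item { int key; char tag; };

/* Reverse the N elements starting at A, in place.  */
void
reverse_items (struct item *a, size_t n)
{
  if (n < 2)
    return;
  for (size_t lo = 0, hi = n - 1; lo < hi; lo++, hi--)
    {
      struct item tmp = a[lo];
      a[lo] = a[hi];
      a[hi] = tmp;
    }
}

/* Stable ascending insertion sort: the strict comparison means that
   elements with equal keys are never moved past each other.  */
void
stable_sort_ascending (struct item *a, size_t n)
{
  for (size_t i = 1; i < n; i++)
    {
      struct item x = a[i];
      size_t j = i;
      while (j > 0 && x.key < a[j - 1].key)
        {
          a[j] = a[j - 1];
          j--;
        }
      a[j] = x;
    }
}

/* Descending sort that keeps equal-key elements in input order.
   The first reversal puts equal elements in the opposite order, the
   stable sort keeps them that way, and the final reversal restores
   their input order while turning ascending into descending.  */
void
sort_descending_stable (struct item *a, size_t n)
{
  assert (a != NULL || n == 0);
  reverse_items (a, n);
  stable_sort_ascending (a, n);
  reverse_items (a, n);
}
```

Reversing only the output of an ascending stable sort would also give a
descending order, but equal elements would then come out in the opposite
of their input order; the reversal before sorting is what prevents that.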

@ -375,6 +375,49 @@
(should (equal (should-error (sort "cba" #'<) :type 'wrong-type-argument)
'(wrong-type-argument list-or-vector-p "cba"))))
(defun fns-tests--shuffle-vector (vect)
"Shuffle VECT in place."
(let ((n (length vect)))
(dotimes (i (1- n))
(let* ((j (+ i (random (- n i))))
(vi (aref vect i)))
(aset vect i (aref vect j))
(aset vect j vi)))))
(ert-deftest fns-tests-sort-kw ()
;; Test the `sort' keyword calling convention by comparing with
;; the results from using the old (positional) style tested above.
(random "my seed")
(dolist (size '(0 1 2 3 10 100 1000))
;; Use a vector with both positive and negative numbers (asymmetric).
(let ((numbers (vconcat
(number-sequence (- (/ size 3)) (- size 1 (/ size 3))))))
(fns-tests--shuffle-vector numbers)
;; Test both list and vector input.
(dolist (input (list (append numbers nil) numbers))
(dolist (in-place '(nil t))
(dolist (reverse '(nil t))
(dolist (key '(nil abs))
(dolist (lessp '(nil >))
(let* ((seq (copy-sequence input))
(res (sort seq :key key :lessp lessp
:in-place in-place :reverse reverse))
(pred (or lessp #'value<))
(exp-in (copy-sequence input))
(exp-out
(sort (if reverse (reverse exp-in) exp-in)
(if key
(lambda (a b)
(funcall pred
(funcall key a) (funcall key b)))
pred)))
(expected (if reverse (reverse exp-out) exp-out)))
(should (equal res expected))
(if in-place
(should (eq res seq))
(should-not (and (> size 0) (eq res seq)))
(should (equal seq input))))))))))))
(defvar w32-collate-ignore-punctuation)
(ert-deftest fns-tests-collate-sort ()