Improve documentation of treesit "thing"

* src/treesit.c (syms_of_treesit):
* lisp/treesit.el (treesit-cycle-sexp-type):
(treesit-thing-at, treesit-thing-at-point): Doc fixes.

* doc/lispref/parsing.texi (User-defined Things): Improve
documentation of treesit "thing" and related functions; add
cross-references and indexing.
Eli Zaretskii 2025-06-05 10:30:44 +03:00
parent 1903b0062b
commit bcf005fa77
3 changed files with 80 additions and 52 deletions

doc/lispref/parsing.texi

@ -1619,14 +1619,16 @@ documentation about pattern-matching. The documentation can be found at
It's often useful to be able to identify and find certain @dfn{things} in
a buffer, like function and class definitions, statements, code blocks,
strings, comments, etc. Emacs allows users to define what kind of
tree-sitter node corresponds to a ``thing''. This enables handy
features like jumping to the next function, marking the code block at
point, or transposing two function arguments.
strings, comments, etc., in terms of node types defined by the
tree-sitter grammar used in the buffer. Emacs allows Lisp programs to
define what kinds of tree-sitter nodes correspond to each ``thing''.
This enables handy features like jumping to the next function, marking
the code block at point, transposing two function arguments, etc.
The ``things'' feature in Emacs is independent of the pattern matching
feature of tree-sitter, and comparatively less powerful, but more
suitable for navigation and traversing the parse tree.
feature of tree-sitter (@pxref{Pattern Matching}), and comparatively
less powerful, but more suitable for navigation and traversing the
buffer text in terms of the tree-sitter parse tree.
@findex treesit-thing-definition
@findex treesit-thing-defined-p
@ -1635,12 +1637,22 @@ predicate of a defined thing with @code{treesit-thing-definition}, and
test if a thing is defined with @code{treesit-thing-defined-p}.
@defvar treesit-thing-settings
This is an alist of thing definitions for each language. The key of
each entry is a language symbol, and the value is a list of thing
definitions of the form @w{@code{(@var{thing} @var{pred})}}, where
@var{thing} is a symbol representing the thing, like @code{defun},
@code{sexp}, or @code{sentence}; and @var{pred} specifies what kind of
tree-sitter node is this @var{thing}.
This is an alist of thing definitions for each language supported by the
grammar used in a buffer; it should be defined by the buffer's major
mode (the default value is @code{nil}). The key of each entry is a
language symbol (e.g., @code{c} for C, @code{cpp} for C@t{++}, etc.),
and the value is a list of thing definitions of the form
@w{@code{(@var{thing} @var{pred})}}, where @var{thing} is a symbol
representing the thing, and @var{pred} specifies what kinds of
tree-sitter nodes are considered as this @var{thing}.
@cindex @code{sexp}, treesit-defined thing
@cindex @code{list}, treesit-defined thing
The symbol used to define the @var{thing} can be anything meaningful for
the major mode: @code{defun}, @code{defclass}, @code{sentence},
@code{comment}, @code{string}, etc. To support tree-sitter based
navigation commands (@pxref{List Motion}), the mode should define two
things: @code{list} and @code{sexp}.
@var{pred} can be a regexp string that matches the type of the node; it
can be a function that takes a node as the argument and returns a
@ -1660,13 +1672,16 @@ meaning that not satisfying @var{pred} qualifies the node.
Finally, @var{pred} can refer to other @var{thing}s defined in this
list. For example, @w{@code{(or sexp sentence)}} defines something
that's either a @code{sexp} thing or a @code{sentence} thing, as defined
by some other rule in the alist.
by some other rules in the alist.
@cindex @code{named}, treesit-defined thing
@cindex @code{anonymous}, treesit-defined thing
There are two pre-defined predicates: @code{named} and @code{anonymous},
which qualify, respectively, named and anonymous nodes. They can be
combined with @code{and} to narrow down the match.
which qualify, respectively, named and anonymous nodes of the
tree-sitter grammar. They can be combined with @code{and} to narrow
down the match.
Here's an example @code{treesit-thing-settings} for C and C++:
Here's an example @code{treesit-thing-settings} for C and C@t{++}:
@example
@group
@ -1676,6 +1691,8 @@ Here's an example @code{treesit-thing-settings} for C and C++:
(comment "comment")
(string "raw_string_literal")
(text (or comment string)))
@end group
@group
(cpp
(defun ("function_definition" . cpp-ts-mode-defun-valid-p))
(defclass "class_specifier")
@ -1685,12 +1702,12 @@ Here's an example @code{treesit-thing-settings} for C and C++:
@noindent
Note that this example is modified for didactic purposes, and isn't
exactly how C and C@t{++} modes define things.
exactly how tree-sitter based C and C@t{++} modes define things.
@end defvar
Emacs builtin functions already make use some thing definitions.
Emacs builtin functions already make use of some thing definitions.
Command @code{treesit-forward-sexp} uses the @code{sexp} definition if
major mode defines it; @code{treesit-forward-list},
major mode defines it (@pxref{List Motion}); @code{treesit-forward-list},
@code{treesit-down-list}, @code{treesit-up-list},
@code{treesit-show-paren-data} use the @code{list} definition (its
symbol @code{list} has the symbol property @code{treesit-thing-symbol}
@ -1699,8 +1716,8 @@ to avoid ambiguity with the function that has the same name);
Defun movement functions like @code{treesit-end-of-defun} use the
@code{defun} definition (@code{defun} definition is overridden by
@var{treesit-defun-type-regexp} for backward compatibility). Major
modes can also define @code{comment}, @code{string}, @code{text}
(generally comments and strings).
modes can also define @code{comment}, @code{string}, and @code{text}
things (to match comments and strings).
The rest of this section lists a few functions that take advantage of
the thing definitions. Besides the functions below, some other
@ -1709,10 +1726,10 @@ tree-traversing functions like @code{treesit-search-forward},
@code{treesit-induce-sparse-tree}, etc. @xref{Retrieving Nodes}.
@defun treesit-node-match-p node thing &optional ignore-missing
This function checks whether @var{node} is a @var{thing}.
This function checks whether @var{node} represents a @var{thing}.
If @var{node} is a @var{thing}, return non-@code{nil}, otherwise return
@code{nil}. For convenience, if @code{node} is @code{nil}, this
If @var{node} represents @var{thing}, return non-@code{nil}, otherwise
return @code{nil}. For convenience, if @code{node} is @code{nil}, this
function just returns @code{nil}.
The @var{thing} can be either a thing symbol like @code{defun}, or
@ -1727,8 +1744,9 @@ undefined and just returns @code{nil}; but it still signals the error if
@end defun
@defun treesit-thing-prev position thing
This function returns the first node before @var{position} that is the
specified @var{thing}. If no such node exists, it returns @code{nil}.
This function returns the first node before @var{position} in the
current buffer that is the specified @var{thing}. If no such node
exists, it returns @code{nil}.
It's guaranteed that, if a node is returned, the node's end position is
less or equal to @var{position}. In other words, this function never
returns a node that encloses @var{position}.
@ -1753,8 +1771,9 @@ function doesn't move point.
A positive @var{arg} means moving forward that many instances of
@var{thing}; negative @var{arg} means moving backward. If @var{side} is
@code{beg}, this function stops at the beginning of @var{thing}; if
@code{end}, stop at the end of @var{thing}.
@code{beg}, this function returns the position of the beginning of
@var{thing}; if it's @code{end}, it returns the position at the end of
@var{thing}.
Like in @code{treesit-thing-prev}, @var{thing} can be a thing symbol
defined in @code{treesit-thing-settings}, or a predicate.
@ -1780,8 +1799,8 @@ less or equal to @var{position}, and it's end position is greater or equal to
@var{position}.
If @var{strict} is non-@code{nil}, this function uses strict comparison,
i.e., start position must be strictly greater than @var{position}, and end
position must be strictly less than @var{position}.
i.e., start position must be strictly smaller than @var{position}, and end
position must be strictly greater than @var{position}.
@var{thing} can be either a thing symbol defined in
@code{treesit-thing-settings}, or a predicate.
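
As a hedged illustration of the thing definitions documented above, here is a minimal sketch of how a major mode might install them. The names my-c-thing-settings and my-c-setup-things are hypothetical, and the node type names are assumed to match the tree-sitter C grammar; this is not how the tree-sitter based C mode itself defines things.

```elisp
;; Sketch only: `my-c-thing-settings' and `my-c-setup-things' are
;; hypothetical names, not part of Emacs.  The node type strings are
;; assumed to exist in the tree-sitter C grammar.
(defvar my-c-thing-settings
  '((c
     (defun "function_definition")   ; regexp matched against node type
     (comment "comment")
     (string "string_literal")
     (text (or comment string))))    ; refers to the things defined above
  "Example thing definitions for the `c' language.")

(defun my-c-setup-things ()
  "Install `my-c-thing-settings' in the current buffer."
  (setq-local treesit-thing-settings my-c-thing-settings))

;; In a buffer that already has a C parser, after calling
;; `my-c-setup-things', the functions documented above can be used:
;;   (treesit-thing-defined-p 'defun 'c)
;;   (treesit-thing-at (point) 'defun)
;;   (treesit-thing-prev (point) 'comment)
```

The commented calls at the end are only meaningful once a parser for the `c' language exists in the buffer; without one they are not expected to return useful nodes.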

lisp/treesit.el

@ -3237,11 +3237,14 @@ The type can be `list' (the default) or `sexp'.
The `list' type uses the `list' thing defined in `treesit-thing-settings'.
See `treesit-thing-at-point'. With this type commands use syntax tables to
navigate symbols and treesit definition to navigate lists.
navigate symbols and treesit definitions to navigate lists.
The `sexp' type uses the `sexp' thing defined in `treesit-thing-settings'.
With this type commands use only the treesit definition of parser nodes,
without distinction between symbols and lists."
With this type commands use only the treesit definitions of parser nodes,
without distinction between symbols and lists. Since tree-sitter grammars
could group node types in arbitrary ways, navigation by `sexp' might not
match your expectations, and might produce different results in different
treesit-based modes."
(interactive "p")
(if (not (treesit-thing-defined-p 'list (treesit-language-at (point))))
(user-error "No `list' thing is defined in `treesit-thing-settings'")
@ -3630,14 +3633,15 @@ predicate as described in `treesit-thing-settings'."
(treesit--thing-sibling pos thing nil))
(defun treesit-thing-at (pos thing &optional strict)
"Return the smallest THING enclosing POS.
"Return the smallest node enclosing POS for THING.
The returned node, if non-nil, must enclose POS, i.e., its start
<= POS, its end > POS. If STRICT is non-nil, the returned node's
start must < POS rather than <= POS.
The returned node, if non-nil, must enclose POS, i.e., its
start <= POS, its end > POS. If STRICT is non-nil, the returned
node's start must be < POS rather than <= POS.
THING should be a thing defined in `treesit-thing-settings', or
it can be a predicate described in `treesit-thing-settings'."
THING should be a thing defined in `treesit-thing-settings' for
the current buffer's major mode, or it can be a predicate
described in `treesit-thing-settings'."
(let* ((cursor (treesit-node-at pos))
(iter-pred (lambda (node)
(and (treesit-node-match-p node thing t)
@ -3789,13 +3793,14 @@ function is called recursively."
(if (eq counter 0) pos nil)))
(defun treesit-thing-at-point (thing tactic)
"Return the THING at point, or nil if none is found.
"Return the node for THING at point, or nil if no THING is found at point.
THING can be a symbol, a regexp, a predicate function, and more;
see `treesit-thing-settings' for details.
for details, see `treesit-thing-settings' as defined by the
current buffer's major mode.
Return the top-level THING if TACTIC is `top-level'; return the
smallest enclosing THING as POS if TACTIC is `nested'."
Return the top-level node for THING if TACTIC is `top-level'; return
the smallest node enclosing THING at point if TACTIC is `nested'."
(let ((node (treesit-thing-at (point) thing)))
(if (eq tactic 'top-level)
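
To show how the docstrings above translate into a call, here is a small sketch of a command built on treesit-thing-at-point. The command name my-describe-defun-at-point is hypothetical, and the sketch assumes the current buffer uses a tree-sitter based major mode that defines a `defun' thing.

```elisp
(require 'treesit)

;; Hypothetical command, not part of Emacs.  Assumes the buffer has a
;; tree-sitter parser and that its major mode defines a `defun' thing.
(defun my-describe-defun-at-point ()
  "Report the extent of the top-level `defun' thing at point."
  (interactive)
  (let ((node (treesit-thing-at-point 'defun 'top-level)))
    (if node
        (message "defun spans %d-%d"
                 (treesit-node-start node)
                 (treesit-node-end node))
      (message "No defun at point"))))
```

Passing `nested' instead of `top-level' would ask for the smallest enclosing node for the thing rather than the outermost one.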

src/treesit.c

@ -5193,13 +5193,16 @@ then in the system default locations for dynamic libraries, in that order. */);
doc:
/* A list defining things.
The value should be an alist of (LANGUAGE . DEFINITIONS), where
LANGUAGE is a language symbol, and DEFINITIONS is a list of
The value should be defined by the major mode, and should be an alist
of the form (LANGUAGE . DEFINITIONS), where LANGUAGE is a language
symbol and DEFINITIONS is a list whose elements are of the form
(THING PRED)
THING is a symbol representing the thing, like `defun', `sexp', or
`sentence'; PRED defines what kind of node can be qualified as THING.
THING is a symbol representing the thing, like `defun', `defclass',
`sexp', `sentence', `comment', or any other symbol that is meaningful
for the major mode; PRED defines what kind of node can be qualified
as THING.
PRED can be a regexp string that matches the type of the node; it can
be a predicate function that takes the node as the sole argument and
@ -5207,12 +5210,13 @@ returns t if the node is the thing, and nil otherwise; it can be a
cons (REGEXP . FN), which is a combination of a regexp and a predicate
function, and the node has to match both to qualify as the thing.
PRED can also be recursively defined. It can be (or PRED...), meaning
satisfying anyone of the inner PREDs qualifies the node; or (and
PRED...) meaning satisfying all of the inner PREDs qualifies the node;
or (not PRED), meaning not satisfying the inner PRED qualifies the node.
PRED can also be recursively defined. It can be:
There are two pre-defined predicates, `named' and `anonymous`. They
(or PRED...), meaning satisfying any of the inner PREDs qualifies the node;
(and PRED...) meaning satisfying all of the inner PREDs qualifies the node;
(not PRED), meaning not satisfying the inner PRED qualifies the node.
There are two pre-defined predicates, `named' and `anonymous'. They
match named nodes and anonymous nodes, respectively.
Finally, PRED can refer to other THINGs defined in this list by using