From bcf005fa774194d434c68cc191566b58c297ca86 Mon Sep 17 00:00:00 2001
From: Eli Zaretskii
Date: Thu, 5 Jun 2025 10:30:44 +0300
Subject: [PATCH] Improve documentation of treesit "thing"

* src/treesit.c (syms_of_treesit):
* lisp/treesit.el (treesit-cycle-sexp-type):
(treesit-thing-at, treesit-thing-at-point): Doc fixes.
* doc/lispref/parsing.texi (User-defined Things): Improve
documentation of treesit "thing" and related functions; add
cross-references and indexing.
---
 doc/lispref/parsing.texi | 79 +++++++++++++++++++++++++---------------
 lisp/treesit.el          | 31 +++++++++-------
 src/treesit.c            | 22 ++++++-----
 3 files changed, 80 insertions(+), 52 deletions(-)

diff --git a/doc/lispref/parsing.texi b/doc/lispref/parsing.texi
index aa321785460..374eeb28b7a 100644
--- a/doc/lispref/parsing.texi
+++ b/doc/lispref/parsing.texi
@@ -1619,14 +1619,16 @@ documentation about pattern-matching. The documentation can be found at

 It's often useful to be able to identify and find certain @dfn{things} in
 a buffer, like function and class definitions, statements, code blocks,
-strings, comments, etc. Emacs allows users to define what kind of
-tree-sitter node corresponds to a ``thing''. This enables handy
-features like jumping to the next function, marking the code block at
-point, or transposing two function arguments.
+strings, comments, etc., in terms of node types defined by the
+tree-sitter grammar used in the buffer. Emacs allows Lisp programs to
+define what kinds of tree-sitter nodes correspond to each ``thing''.
+This enables handy features like jumping to the next function, marking
+the code block at point, transposing two function arguments, etc.

 The ``things'' feature in Emacs is independent of the pattern matching
-feature of tree-sitter, and comparatively less powerful, but more
-suitable for navigation and traversing the parse tree.
+feature of tree-sitter (@pxref{Pattern Matching}), and comparatively
+less powerful, but more suitable for navigation and traversing the
+buffer text in terms of the tree-sitter parse tree.

 @findex treesit-thing-definition
 @findex treesit-thing-defined-p
@@ -1635,12 +1637,22 @@ predicate of a defined thing with @code{treesit-thing-definition}, and
 test if a thing is defined with @code{treesit-thing-defined-p}.

 @defvar treesit-thing-settings
-This is an alist of thing definitions for each language. The key of
-each entry is a language symbol, and the value is a list of thing
-definitions of the form @w{@code{(@var{thing} @var{pred})}}, where
-@var{thing} is a symbol representing the thing, like @code{defun},
-@code{sexp}, or @code{sentence}; and @var{pred} specifies what kind of
-tree-sitter node is this @var{thing}.
+This is an alist of thing definitions for each language supported by the
+grammar used in a buffer; it should be defined by the buffer's major
+mode (the default value is @code{nil}). The key of each entry is a
+language symbol (e.g., @code{c} for C, @code{cpp} for C@t{++}, etc.),
+and the value is a list of thing definitions of the form
+@w{@code{(@var{thing} @var{pred})}}, where @var{thing} is a symbol
+representing the thing, and @var{pred} specifies what kinds of
+tree-sitter nodes are considered as this @var{thing}.
+
+@cindex @code{sexp}, treesit-defined thing
+@cindex @code{list}, treesit-defined thing
+The symbol used to define the @var{thing} can be anything meaningful for
+the major mode: @code{defun}, @code{defclass}, @code{sentence},
+@code{comment}, @code{string}, etc. To support tree-sitter based
+navigation commands (@pxref{List Motion}), the mode should define two
+things: @code{list} and @code{sexp}.

 @var{pred} can be a regexp string that matches the type of the node; it
 can be a function that takes a node as the argument and returns a
@@ -1660,13 +1672,16 @@ meaning that not satisfying @var{pred} qualifies the node.
 Finally, @var{pred} can refer to other @var{thing}s defined in this
 list. For example, @w{@code{(or sexp sentence)}} defines something
 that's either a @code{sexp} thing or a @code{sentence} thing, as defined
-by some other rule in the alist.
+by some other rules in the alist.

+@cindex @code{named}, treesit-defined thing
+@cindex @code{anonymous}, treesit-defined thing
 There are two pre-defined predicates: @code{named} and @code{anonymous},
-which qualify, respectively, named and anonymous nodes. They can be
-combined with @code{and} to narrow down the match.
+which qualify, respectively, named and anonymous nodes of the
+tree-sitter grammar. They can be combined with @code{and} to narrow
+down the match.

-Here's an example @code{treesit-thing-settings} for C and C++:
+Here's an example @code{treesit-thing-settings} for C and C@t{++}:

 @example
 @group
@@ -1676,6 +1691,8 @@ Here's an example @code{treesit-thing-settings} for C and C++:
      (comment "comment")
      (string "raw_string_literal")
      (text (or comment string)))
+@end group
+@group
     (cpp
      (defun ("function_definition" . cpp-ts-mode-defun-valid-p))
      (defclass "class_specifier")
@@ -1685,12 +1702,12 @@ Here's an example @code{treesit-thing-settings} for C and C++:

 @noindent
 Note that this example is modified for didactic purposes, and isn't
-exactly how C and C@t{++} modes define things.
+exactly how tree-sitter based C and C@t{++} modes define things.
 @end defvar

-Emacs builtin functions already make use some thing definitions.
+Emacs builtin functions already make use of some thing definitions.
 Command @code{treesit-forward-sexp} uses the @code{sexp} definition if
-major mode defines it; @code{treesit-forward-list},
+major mode defines it (@pxref{List Motion}); @code{treesit-forward-list},
 @code{treesit-down-list}, @code{treesit-up-list},
 @code{treesit-show-paren-data} use the @code{list} definition (its
 symbol @code{list} has the symbol property @code{treesit-thing-symbol}
@@ -1699,8 +1716,8 @@ to avoid ambiguity with the function that has the same name);
 Defun movement functions like @code{treesit-end-of-defun} uses the
 @code{defun} definition (@code{defun} definition is overridden by
 @var{treesit-defun-type-regexp} for backward compatibility). Major
-modes can also define @code{comment}, @code{string}, @code{text}
-(generally comments and strings).
+modes can also define @code{comment}, @code{string}, and @code{text}
+things (to match comments and strings).

 The rest of this section lists a few functions that take advantage of
 the thing definitions. Besides the functions below, some other
@@ -1709,10 +1726,10 @@ tree-traversing functions like @code{treesit-search-forward},
 @code{treesit-induce-sparse-tree}, etc. @xref{Retrieving Nodes}.

 @defun treesit-node-match-p node thing &optional ignore-missing
-This function checks whether @var{node} is a @var{thing}.
+This function checks whether @var{node} represents a @var{thing}.

-If @var{node} is a @var{thing}, return non-@code{nil}, otherwise return
-@code{nil}. For convenience, if @code{node} is @code{nil}, this
+If @var{node} represents @var{thing}, return non-@code{nil}, otherwise
+return @code{nil}. For convenience, if @code{node} is @code{nil}, this
 function just returns @code{nil}.

 The @var{thing} can be either a thing symbol like @code{defun}, or
@@ -1727,8 +1744,9 @@ undefined and just returns @code{nil}; but it still signals the error if
 @end defun

 @defun treesit-thing-prev position thing
-This function returns the first node before @var{position} that is the
-specified @var{thing}. If no such node exists, it returns @code{nil}.
+This function returns the first node before @var{position} in the
+current buffer that is the specified @var{thing}. If no such node
+exists, it returns @code{nil}.
 It's guaranteed that, if a node is returned, the node's end position is
 less or equal to @var{position}. In other words, this function never
 returns a node that encloses @var{position}.
@@ -1753,8 +1771,9 @@ function doesn't move point.

 A positive @var{arg} means moving forward that many instances of
 @var{thing}; negative @var{arg} means moving backward. If @var{side} is
-@code{beg}, this function stops at the beginning of @var{thing}; if
-@code{end}, stop at the end of @var{thing}.
+@code{beg}, this function returns the position of the beginning of
+@var{thing}; if it's @code{end}, it returns the position at the end of
+@var{thing}.

 Like in @code{treesit-thing-prev}, @var{thing} can be a thing symbol
 defined in @code{treesit-thing-settings}, or a predicate.
@@ -1780,8 +1799,8 @@ less or equal to @var{position}, and it's end position is greater or equal
 to @var{position}.

 If @var{strict} is non-@code{nil}, this function uses strict comparison,
-i.e., start position must be strictly greater than @var{position}, and end
-position must be strictly less than @var{position}.
+i.e., start position must be strictly smaller than @var{position}, and end
+position must be strictly greater than @var{position}.

 @var{thing} can be either a thing symbol defined in
 @code{treesit-thing-settings}, or a predicate.
diff --git a/lisp/treesit.el b/lisp/treesit.el
index 5df8eb70cbf..45626e77b99 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -3237,11 +3237,14 @@ The type can be `list' (the default) or `sexp'.

 The `list' type uses the `list' thing defined in `treesit-thing-settings'.
 See `treesit-thing-at-point'. With this type commands use syntax tables to
-navigate symbols and treesit definition to navigate lists.
+navigate symbols and treesit definitions to navigate lists.

 The `sexp' type uses the `sexp' thing defined in `treesit-thing-settings'.
-With this type commands use only the treesit definition of parser nodes,
-without distinction between symbols and lists."
+With this type commands use only the treesit definitions of parser nodes,
+without distinction between symbols and lists. Since tree-sitter grammars
+could group node types in arbitrary ways, navigation by `sexp' might not
+match your expectations, and might produce different results in different
+treesit-based modes."
   (interactive "p")
   (if (not (treesit-thing-defined-p 'list (treesit-language-at (point))))
       (user-error "No `list' thing is defined in `treesit-thing-settings'")
@@ -3630,14 +3633,15 @@ predicate as described in `treesit-thing-settings'."
   (treesit--thing-sibling pos thing nil))

 (defun treesit-thing-at (pos thing &optional strict)
-  "Return the smallest THING enclosing POS.
+  "Return the smallest node enclosing POS for THING.

-The returned node, if non-nil, must enclose POS, i.e., its start
-<= POS, its end > POS. If STRICT is non-nil, the returned node's
-start must < POS rather than <= POS.
+The returned node, if non-nil, must enclose POS, i.e., its
+start <= POS, its end > POS. If STRICT is non-nil, the returned
+node's start must be < POS rather than <= POS.

-THING should be a thing defined in `treesit-thing-settings', or
-it can be a predicate described in `treesit-thing-settings'."
+THING should be a thing defined in `treesit-thing-settings' for
+the current buffer's major mode, or it can be a predicate
+described in `treesit-thing-settings'."
   (let* ((cursor (treesit-node-at pos))
          (iter-pred (lambda (node)
                       (and (treesit-node-match-p node thing t)
@@ -3789,13 +3793,14 @@ function is called recursively."
     (if (eq counter 0) pos nil)))

 (defun treesit-thing-at-point (thing tactic)
-  "Return the THING at point, or nil if none is found.
+  "Return the node for THING at point, or nil if no THING is found at point.

 THING can be a symbol, a regexp, a predicate function, and more;
-see `treesit-thing-settings' for details.
+for details, see `treesit-thing-settings' as defined by the
+current buffer's major mode.

-Return the top-level THING if TACTIC is `top-level'; return the
-smallest enclosing THING as POS if TACTIC is `nested'."
+Return the top-level node for THING if TACTIC is `top-level'; return
+the smallest node enclosing THING at point if TACTIC is `nested'."

   (let ((node (treesit-thing-at (point) thing)))
     (if (eq tactic 'top-level)
diff --git a/src/treesit.c b/src/treesit.c
index de74e41c89a..67dd2ee3a7a 100644
--- a/src/treesit.c
+++ b/src/treesit.c
@@ -5193,13 +5193,16 @@ then in the system default locations for dynamic libraries, in that order. */);
                doc:
                /* A list defining things.

-The value should be an alist of (LANGUAGE . DEFINITIONS), where
-LANGUAGE is a language symbol, and DEFINITIONS is a list of
+The value should be defined by the major mode, and should be an alist
+of the form (LANGUAGE . DEFINITIONS), where LANGUAGE is a language
+symbol and DEFINITIONS is a list whose elements are of the form

     (THING PRED)

-THING is a symbol representing the thing, like `defun', `sexp', or
-`sentence'; PRED defines what kind of node can be qualified as THING.
+THING is a symbol representing the thing, like `defun', `defclass',
+`sexp', `sentence', `comment', or any other symbol that is meaningful
+for the major mode; PRED defines what kind of node can be qualified
+as THING.

 PRED can be a regexp string that matches the type of the node; it can be
 a predicate function that takes the node as the sole argument and
@@ -5207,12 +5210,13 @@ returns t if the node is the thing, and nil otherwise; it can be a cons
 (REGEXP . FN), which is a combination of a regexp and a predicate
 function, and the node has to match both to qualify as the thing.

-PRED can also be recursively defined. It can be (or PRED...), meaning
-satisfying anyone of the inner PREDs qualifies the node; or (and
-PRED...) meaning satisfying all of the inner PREDs qualifies the node;
-or (not PRED), meaning not satisfying the inner PRED qualifies the node.
+PRED can also be recursively defined. It can be:

-There are two pre-defined predicates, `named' and `anonymous`. They
+  (or PRED...), meaning satisfying any of the inner PREDs qualifies the node;
+  (and PRED...) meaning satisfying all of the inner PREDs qualifies the node;
+  (not PRED), meaning not satisfying the inner PRED qualifies the node.
+
+There are two pre-defined predicates, `named' and `anonymous'. They
 match named nodes and anonymous nodes, respectively.

 Finally, PRED can refer to other THINGs defined in this list by using