Document tree-sitter things feature (bug#70016) (bug#68824)

* doc/lispref/parsing.texi (Retrieving Nodes): Mention new kinds of
predicate argument that the tree-traversing functions accept (which are
thing symbols and thing definitions).
(User-defined Things): New node dedicated to thing definition and
navigation functions.
Yuan Fu 2024-04-07 15:59:48 -07:00
parent 64854869ae
commit 4efe3b99a5
2 changed files with 195 additions and 12 deletions

@ -743,12 +743,17 @@ is non-@code{nil}, it looks for the smallest named child.
@heading Searching for node
@defun treesit-search-subtree node predicate &optional backward all depth
This function traverses the subtree of @var{node} (including
@var{node} itself), looking for a node for which @var{predicate}
returns non-@code{nil}. @var{predicate} is a regexp that is matched
against each node's type, or a predicate function that takes a node
and returns non-@code{nil} if the node matches. The function returns
the first node that matches, or @code{nil} if none does.
This function traverses the subtree of @var{node} (including @var{node}
itself), looking for a node for which @var{predicate} returns
non-@code{nil}. @var{predicate} is a regexp that is matched against
each node's type, or a predicate function that takes a node and returns
non-@code{nil} if the node matches. @var{predicate} can also be a thing
symbol or thing definition (@pxref{User-defined Things}). Using an
undefined thing doesn't raise an error; the function simply returns
@code{nil}.
This function returns the first node that matches, or @code{nil} if no
node matches @var{predicate}.
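As a minimal sketch, the call below searches the whole buffer for the
first node that qualifies as a @code{defun}.  It assumes that the
current buffer has a tree-sitter parser and that @code{defun} is defined
for its language in @code{treesit-thing-settings}; if it isn't defined,
the call just returns @code{nil}.
@example
@group
;; Pass a thing symbol where a regexp or function would normally go.
(treesit-search-subtree (treesit-buffer-root-node) 'defun)
@end group
@end example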
By default, this function only traverses named nodes, but if @var{all}
is non-@code{nil}, it traverses all the nodes. If @var{backward} is
@ -762,9 +767,13 @@ defaults to 1000.
@defun treesit-search-forward start predicate &optional backward all
Like @code{treesit-search-subtree}, this function also traverses the
parse tree and matches each node with @var{predicate} (except for
@var{start}), where @var{predicate} can be a regexp or a function.
For a tree like the one below where @var{start} is marked @samp{S},
this function traverses as numbered from 1 to 12:
@var{start}), where @var{predicate} can be a regexp or a predicate
function. @var{predicate} can also be a thing symbol or thing
definition (@pxref{User-defined Things}). Using an undefined thing
doesn't raise an error; the function simply returns @code{nil}.
For a tree like the one below where @var{start} is marked @samp{S}, this
function traverses as numbered from 1 to 12:
@example
@group
@ -818,9 +827,11 @@ This function creates a sparse tree from @var{root}'s subtree.
It takes the subtree under @var{root}, and combs it so only the nodes
that match @var{predicate} are left. Like previous functions, the
@var{predicate} can be a regexp string that matches against each
node's type, or a function that takes a node and returns
non-@code{nil} if it matches.
@var{predicate} can be a regexp string that matches against each node's
type, or a function that takes a node and returns non-@code{nil} if it
matches. @var{predicate} can also be a thing symbol or thing definition
(@pxref{User-defined Things}). Using an undefined thing doesn't raise
an error; the function simply returns @code{nil}.
For example, given the subtree on the left that consists of both
numbers and letters, if @var{predicate} is ``letter only'', the
@ -1508,6 +1519,149 @@ For more details, read the tree-sitter project's documentation about
pattern-matching, which can be found at
@uref{https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries}.
@node User-defined Things
@section User-defined ``Things'' and Navigation
It's often useful to be able to identify and find certain ``things'' in
a buffer, like function and class definitions, statements, code blocks,
strings, comments, etc.  Emacs allows users to define which kinds of
tree-sitter nodes correspond to which ``thing''.  This enables handy
features like jumping to the next function, marking the code block at
point, or transposing two function arguments.
The ``things'' feature in Emacs is independent of the pattern-matching
feature of tree-sitter.  It is comparatively less powerful, but better
suited to navigating and traversing the parse tree.
Users can define things with @code{treesit-thing-settings}.
@defvar treesit-thing-settings
This is an alist of thing definitions for each language. The key of
each entry is a language symbol, and the value is a list of thing
definitions of the form @w{@code{(@var{thing} @var{pred})}}.
@var{thing} is a symbol representing the thing, like @code{defun},
@code{sexp}, or @code{sentence}; @var{pred} specifies what kind of
tree-sitter node is the @var{thing}.
@var{pred} can be a regexp string that matches the type of the node; it
can be a function that takes a node as the argument and returns a
boolean that indicates whether the node qualifies as the thing; it can
be a cons @w{@code{(@var{regexp} . @var{fn})}}, which is a combination
of a regexp and a function---the node has to match both to qualify as the
thing.
@var{pred} can also be recursively defined. It can be @w{@code{(or
@var{pred}...)}}, meaning satisfying any one of the @var{pred}s
qualifies the node as the thing. It can be @w{@code{(not @var{pred})}},
meaning not satisfying @var{pred} qualifies the node.
Finally, @var{pred} can refer to other @var{thing}s defined in this
list. For example, @w{@code{(or sexp sentence)}} defines something
that's either a @code{sexp} or a @code{sentence}.
Here's an example @code{treesit-thing-settings} for C and C++:
@example
@group
((c
(defun "function_definition")
(sexp (not "[](),[@{@}]"))
(comment "comment")
(string "raw_string_literal")
(text (or comment string)))
(cpp
(defun ("function_definition" . cpp-ts-mode-defun-valid-p))
(defclass "class_specifier")
(comment "comment")))
@end group
@end example
Note that this example is modified for demonstration and isn't exactly
how the C and C++ modes define things.
@end defvar
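As a minimal sketch of how a major mode might install such settings, the
toy mode below creates a C parser and sets
@code{treesit-thing-settings} buffer-locally.  The mode name
@code{my-toy-c-ts-mode} is hypothetical and not part of Emacs, and the
sketch assumes that the C grammar is installed and that
@code{"string_literal"} is a node type in that grammar.
@example
@group
;; Hypothetical mode, for illustration only.
(define-derived-mode my-toy-c-ts-mode prog-mode "ToyC"
  "Toy mode that defines a few things for C."
  (when (treesit-ready-p 'c)
    (treesit-parser-create 'c)
    (setq-local treesit-thing-settings
                '((c (defun "function_definition")
                     (comment "comment")
                     (string "string_literal")
                     ;; A thing defined in terms of other things.
                     (text (or comment string)))))))
@end group
@end example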
The rest of this section lists a few functions that take advantage of
the thing definitions.  Besides these functions, some other functions
listed elsewhere also utilize the thing feature, e.g., tree-traversing
functions like @code{treesit-search-forward},
@code{treesit-induce-sparse-tree}, etc.
@defun treesit-thing-prev pos thing
This function returns the first node before @var{pos} that's a
@var{thing}.  If no such node exists, it returns @code{nil}.  It's
guaranteed that, if a node is returned, the node's end position is less
than or equal to @var{pos}.  In other words, this function never returns
a node that encloses @var{pos}.
@var{thing} can be either a thing symbol like @code{defun}, or simply a
thing definition like @code{"function_definition"}.
@end defun
@defun treesit-thing-next pos thing
This function is similar to @code{treesit-thing-prev}, except that it
returns the first node @emph{after} @var{pos} that's a @var{thing}.  It
guarantees that, if a node is returned, the node's start position is
greater than or equal to @var{pos}.
@end defun
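For illustration, the following sketch reports where the defuns on
either side of point end and begin.  It assumes that the current buffer
has a tree-sitter parser and that @code{defun} is defined for its
language in @code{treesit-thing-settings}; the function name
@code{my-neighbor-defuns} is hypothetical.
@example
@group
;; Hypothetical helper, for illustration only.
(defun my-neighbor-defuns ()
  "Return (PREV-END . NEXT-START) for the defuns around point.
Either element is nil if there is no such defun."
  (let ((prev (treesit-thing-prev (point) 'defun))
        (next (treesit-thing-next (point) 'defun)))
    (cons (and prev (treesit-node-end prev))
          (and next (treesit-node-start next)))))
@end group
@end example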
@defun treesit-navigate-thing pos arg side thing &optional tactic
This function builds upon @code{treesit-thing-prev} and
@code{treesit-thing-next} and provides functionality that a navigation
command would find useful.
It returns the position after navigating @var{arg} steps from @var{pos},
without actually moving point.  If there aren't enough things to
navigate across, it returns @code{nil}.
A positive @var{arg} means moving forward that many steps; negative
means moving backward.  If @var{side} is @code{beg}, this function stops
at the beginning of the thing; if @code{end}, it stops at the end.
Like in @code{treesit-thing-prev}, @var{thing} can be a thing symbol
defined in @code{treesit-thing-settings}, or a thing definition.
@var{tactic} determines how this function moves between things.
@var{tactic} can be @code{nested}, @code{top-level}, @code{restricted},
or @code{nil}.  @code{nested} or @code{nil} means normal nested
navigation: first try to move across siblings; if there aren't any
siblings left in the current level, move to the parent, then its
siblings, and so on.  @code{top-level} means only navigate across
top-level things and ignore nested things.  @code{restricted} means
movement is restricted to within the thing that encloses @var{pos}, if
there is such a thing.  This tactic is useful for commands that want to
stop at the current nesting level and not move up.
@end defun
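Because @code{treesit-navigate-thing} only computes a position, a
command has to move point itself.  Here is a minimal sketch of such a
command, assuming that @code{defun} is defined for the buffer's language
in @code{treesit-thing-settings}; the command name
@code{my-forward-defun-end} is hypothetical.
@example
@group
;; Hypothetical command, for illustration only.
(defun my-forward-defun-end ()
  "Move point to the end of the next defun, if there is one."
  (interactive)
  (let ((dest (treesit-navigate-thing (point) 1 'end 'defun 'nested)))
    ;; DEST is nil when there is no defun to move across.
    (when dest
      (goto-char dest))))
@end group
@end example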
@defun treesit-thing-at pos thing &optional strict
This function returns the smallest node that's a @var{thing} and
encloses @var{pos}; if there's no such node, it returns @code{nil}.
The returned node must enclose @var{pos}, i.e., its start position is
less than or equal to @var{pos}, and its end position is greater than or
equal to @var{pos}.
If @var{strict} is non-@code{nil}, this function uses strict comparison,
i.e., start position must be strictly less than @var{pos}, and end
position must be strictly greater than @var{pos}.
@var{thing} can be either a thing symbol defined in
@code{treesit-thing-settings}, or a thing definition.
@end defun
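As a small sketch, the helper below returns the text of the defun that
encloses point.  It assumes that the current buffer has a tree-sitter
parser and that @code{defun} is defined for its language in
@code{treesit-thing-settings}; the name @code{my-defun-text-at-point} is
hypothetical.
@example
@group
;; Hypothetical helper, for illustration only.
(defun my-defun-text-at-point ()
  "Return the text of the defun enclosing point, or nil if none."
  (let ((node (treesit-thing-at (point) 'defun)))
    (when node
      ;; The second argument strips text properties.
      (treesit-node-text node t))))
@end group
@end example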
@findex treesit-beginning-of-thing
@findex treesit-end-of-thing
@findex treesit-thing-at-point
There are also some convenient wrapper functions.
@code{treesit-beginning-of-thing} moves point to the beginning of a
thing, @code{treesit-end-of-thing} to the end of a thing.
@code{treesit-thing-at-point} returns the thing at point.
There are defun commands that specifically use the @code{defun}
definition, like @code{treesit-beginning-of-defun},
@code{treesit-end-of-defun}, and @code{treesit-defun-at-point}.  In
addition, these functions use @code{treesit-defun-tactic} as the
navigation tactic.  They are described in more detail in other sections.
@node Multiple Languages
@section Parsing Text in Multiple Languages
@cindex multiple languages, parsing with tree-sitter

@ -2380,6 +2380,35 @@ objects is still necessary.
** The JSON encoder and decoder now accept arbitrarily large integers.
Previously, they were limited to the range of signed 64-bit integers.
** New tree-sitter functions and variables for defining and using "things"
+++
*** New variable 'treesit-thing-settings'.
New variable that allows users to define "things" like 'defun', 'text',
'sexp', for navigation commands and tree-traversal functions.
+++
*** New navigation functions 'treesit-thing-prev', 'treesit-thing-next', 'treesit-navigate-thing', 'treesit-beginning-of-thing', 'treesit-end-of-thing'.
+++
*** New functions 'treesit-thing-at', 'treesit-thing-at-point'.
+++
*** Tree-traversing functions 'treesit-search-subtree', 'treesit-search-forward', 'treesit-search-forward-goto', 'treesit-induce-sparse-tree' now accept more kinds of predicates.
Now users can use thing symbols (defined in 'treesit-thing-settings'),
and any thing definitions for the predicate argument.
** Other tree-sitter function and variable changes
+++
*** 'treesit-parser-list' now takes additional optional arguments, LANGUAGE and TAG.
If LANGUAGE is given, only return parsers for that language. If TAG is
given, only return parsers with that tag. Note that passing nil as tag
doesn't mean return all parsers, but rather "all parsers with no tags".
* Changes in Emacs 30.1 on Non-Free Operating Systems