Describe primarily the Emacs s-exp dialect for treesit queries

* doc/lispref/parsing.texi (Pattern Matching, Multiple Languages):
Writing tree-sitter queries as Emacs s-expressions is much more
convenient than using the native query notation inside a string,
so it makes sense to base the documentation on the former dialect
(bug#64017).
Mattias Engdegård 2023-06-18 10:37:53 +02:00
parent eacd75df4e
commit 8f62e7b85f

doc/lispref/parsing.texi

@@ -1084,9 +1084,9 @@ Now we can introduce the @dfn{query functions}.
 @defun treesit-query-capture node query &optional beg end node-only
 This function matches patterns in @var{query} within @var{node}. The
-argument @var{query} can be either a string, an s-expression, or a
-compiled query object. For now, we focus on the string syntax;
-s-expression syntax and compiled queries are described at the end of
+argument @var{query} can be either an s-expression, a string, or a
+compiled query object. For now, we focus on the s-expression syntax;
+string syntax and compiled queries are described at the end of
 the section.
 The argument @var{node} can also be a parser or a language symbol. A
@@ -1118,8 +1118,8 @@ For example, suppose @var{node}'s text is @code{1 + 2}, and
 @example
 @group
 (setq query
-"(binary_expression
-(number_literal) @@number-in-exp) @@biexp")
+'((binary_expression
+(number_literal) @@number-in-exp) @@biexp)
 @end group
 @end example
@@ -1140,8 +1140,8 @@ For example, it could have two top-level patterns:
 @example
 @group
 (setq query
-"(binary_expression) @@biexp
-(number_literal) @@number @@biexp")
+'((binary_expression) @@biexp
+(number_literal) @@number @@biexp)
 @end group
 @end example
@@ -1199,23 +1199,23 @@ field, say, a @code{function_definition} without a @code{body} field:
 @subheading Quantify node
 @cindex quantify node, tree-sitter
-Tree-sitter recognizes quantification operators @samp{*}, @samp{+},
-and @samp{?}. Their meanings are the same as in regular expressions:
-@samp{*} matches the preceding pattern zero or more times, @samp{+}
-matches one or more times, and @samp{?} matches zero or one times.
+Tree-sitter recognizes quantification operators @samp{:*}, @samp{:+},
+and @samp{:?}. Their meanings are the same as in regular expressions:
+@samp{:*} matches the preceding pattern zero or more times, @samp{:+}
+matches one or more times, and @samp{:?} matches zero or one times.
 For example, the following pattern matches @code{type_declaration}
 nodes that have @emph{zero or more} @code{long} keywords.
 @example
-(type_declaration "long"*) @@long-type
+(type_declaration "long" :*) @@long-type
 @end example
 The following pattern matches a type declaration that may or may not
 have a @code{long} keyword:
 @example
-(type_declaration "long"?) @@long-type
+(type_declaration "long" :?) @@long-type
 @end example
 @subheading Grouping
@@ -1225,15 +1225,14 @@ groups and apply quantification operators to them. For example, to
 express a comma-separated list of identifiers, one could write
 @example
-(identifier) ("," (identifier))*
+(identifier) ("," (identifier)) :*
 @end example
 @subheading Alternation
 Again, similar to regular expressions, we can express ``match any one
-of these patterns'' in a pattern. The syntax is a list of patterns
-enclosed in square brackets. For example, to capture some keywords in
-C, the pattern would be
+of these patterns'' in a pattern. The syntax is a vector of patterns.
+For example, to capture some keywords in C, the pattern would be
 @example
 @group
@@ -1248,7 +1247,7 @@ C, the pattern would be
 @subheading Anchor
-The anchor operator @samp{.} can be used to enforce juxtaposition,
+The anchor operator @code{:anchor} can be used to enforce juxtaposition,
 i.e., to enforce two things to be directly next to each other. The
 two ``things'' can be two nodes, or a child and the end of its parent.
 For example, to capture the first child, the last child, or two
@@ -1257,19 +1256,19 @@ adjacent children:
 @example
 @group
 ;; Anchor the child with the end of its parent.
-(compound_expression (_) @@last-child .)
+(compound_expression (_) @@last-child :anchor)
 @end group
 @group
 ;; Anchor the child with the beginning of its parent.
-(compound_expression . (_) @@first-child)
+(compound_expression :anchor (_) @@first-child)
 @end group
 @group
 ;; Anchor two adjacent children.
 (compound_expression
 (_) @@prev-child
-.
+:anchor
 (_) @@next-child)
 @end group
 @end example
@@ -1285,8 +1284,8 @@ example, with the following pattern:
 @example
 @group
 (
-(array . (_) @@first (_) @@last .)
-(#equal @@first @@last)
+(array :anchor (_) @@first (_) @@last :anchor)
+(:equal @@first @@last)
 )
 @end group
 @end example
@@ -1294,22 +1293,22 @@ example, with the following pattern:
 @noindent
 tree-sitter only matches arrays where the first element is equal to
 the last element. To attach a predicate to a pattern, we need to
-group them together. A predicate always starts with a @samp{#}.
-Currently there are three predicates: @code{#equal}, @code{#match},
-and @code{#pred}.
+group them together. Currently there are three predicates:
+@code{:equal}, @code{:match}, and @code{:pred}.
-@deffn Predicate equal arg1 arg2
+@deffn Predicate :equal arg1 arg2
 Matches if @var{arg1} is equal to @var{arg2}. Arguments can be either
 strings or capture names. Capture names represent the text that the
 captured node spans in the buffer.
 @end deffn
-@deffn Predicate match regexp capture-name
+@deffn Predicate :match regexp capture-name
 Matches if the text that @var{capture-name}'s node spans in the buffer
-matches regular expression @var{regexp}. Matching is case-sensitive.
+matches regular expression @var{regexp}, given as a string literal.
+Matching is case-sensitive.
 @end deffn
-@deffn Predicate pred fn &rest nodes
+@deffn Predicate :pred fn &rest nodes
 Matches if function @var{fn} returns non-@code{nil} when passed each
 node in @var{nodes} as arguments.
 @end deffn
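The three predicates documented in this hunk have simple semantics, which the following hypothetical Python sketch models for a single match. It is not Emacs's implementation: the `Node` and `Capture` classes, the `check_predicate` function, and the use of Python's `re.search` in place of Emacs regular expressions are all assumptions made for illustration. Captures are modelled as a dict from capture name (without the `@`; the diff's `@@` is Texinfo's escape for a single `@`) to a node whose `text` is the buffer text it spans.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Union


@dataclass
class Node:
    """Hypothetical stand-in for a captured tree-sitter node."""
    text: str  # the buffer text the node spans


@dataclass(frozen=True)
class Capture:
    """Reference to a capture name, e.g. @first in a query."""
    name: str


Arg = Union[str, Capture]


def _text(arg: Arg, captures: Dict[str, Node]) -> str:
    # A capture name stands for the text its node spans; a string is literal.
    if isinstance(arg, Capture):
        return captures[arg.name].text
    return arg


def check_predicate(pred: str, args: List, captures: Dict[str, Node],
                    functions: Optional[Dict[str, Callable[..., object]]] = None) -> bool:
    """Evaluate one :equal, :match or :pred predicate against a match."""
    if pred == ":equal":
        a, b = args
        return _text(a, captures) == _text(b, captures)
    if pred == ":match":
        regexp, cap = args
        # Assumed: an unanchored, case-sensitive search in the node's text.
        return re.search(regexp, captures[cap.name].text) is not None
    if pred == ":pred":
        fn_name, *caps = args
        fn = (functions or {})[fn_name]
        # :pred passes the nodes themselves, not their text.
        result = fn(*(captures[c.name] for c in caps))
        # Lisp treats only nil as false; here nil is modelled as None or False.
        return result is not None and result is not False
    raise ValueError(f"unknown predicate {pred!r}")


if __name__ == "__main__":
    caps = {"first": Node("love"), "last": Node("love")}
    print(check_predicate(":equal", [Capture("first"), Capture("last")], caps))  # True
```

Representing nil as `None`/`False` rather than using Python truthiness keeps values such as `0` or `""` counting as a match, as they would in Lisp.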
@@ -1318,28 +1317,13 @@ Note that a predicate can only refer to capture names that appear in
 the same pattern. Indeed, it makes little sense to refer to capture
 names in other patterns.
-@heading S-expression patterns
-@cindex tree-sitter patterns as sexps
-@cindex patterns, tree-sitter, in sexp form
-Besides strings, Emacs provides an s-expression based syntax for
-tree-sitter patterns. It largely resembles the string-based syntax.
-For example, the following query
-@example
-@group
-(treesit-query-capture
- node "(addition_expression
-left: (_) @@left
-\"+\" @@plus-sign
-right: (_) @@right) @@addition
-[\"return\" \"break\"] @@keyword")
-@end group
-@end example
-@noindent
-is equivalent to
+@heading String patterns
+@cindex tree-sitter patterns as strings
+@cindex patterns, tree-sitter, in string form
+Besides s-expressions, Emacs allows the tree-sitter's native query
+syntax to be used by writing them as strings. It largely resembles
+the s-expression syntax. For example, the following query
 @example
 @group
@@ -1353,36 +1337,40 @@ is equivalent to
 @end group
 @end example
-Most patterns can be written directly as strange but nevertheless
-valid s-expressions. Only a few of them need modification:
-@itemize
-@item
-Anchor @samp{.} is written as @code{:anchor}.
-@item
-@samp{?} is written as @samp{:?}.
-@item
-@samp{*} is written as @samp{:*}.
-@item
-@samp{+} is written as @samp{:+}.
-@item
-@code{#equal} is written as @code{:equal}. In general, predicates
-change their @samp{#} to @samp{:}.
-@end itemize
-For example,
+@noindent
+is equivalent to
 @example
 @group
-"(
-(compound_expression . (_) @@first (_)* @@rest)
-(#match \"love\" @@first)
-)"
+(treesit-query-capture
+ node "(addition_expression
+left: (_) @@left
+\"+\" @@plus-sign
+right: (_) @@right) @@addition
+[\"return\" \"break\"] @@keyword")
 @end group
 @end example
-@noindent
-is written in s-expression syntax as
+Most patterns can be written directly as s-expressions inside a string.
+Only a few of them need modification:
+@itemize
+@item
+Anchor @code{:anchor} is written as @samp{.}.
+@item
+@samp{:?} is written as @samp{?}.
+@item
+@samp{:*} is written as @samp{*}.
+@item
+@samp{:+} is written as @samp{+}.
+@item
+@code{:equal}, @code{:match} and @code{:pred} are written as
+@code{#equal}, @code{#match} and @code{#pred}, respectively.
+In general, predicates change their @samp{:} to @samp{#}.
+@end itemize
+For example,
 @example
 @group
@@ -1393,6 +1381,18 @@ is written in s-expression syntax as
 @end group
 @end example
+@noindent
+is written in string form as
+@example
+@group
+"(
+(compound_expression . (_) @@first (_)* @@rest)
+(#match \"love\" @@first)
+)"
+@end group
+@end example
 @heading Compiling queries
 @cindex compiling tree-sitter queries
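The modification rules listed in this diff are mechanical enough to express as a small translator from the s-expression dialect to the native string syntax. Emacs already provides this through `treesit-query-expand`; what follows is only a hypothetical Python sketch of the rules as the diff states them, not a port of Emacs's code. The names `Sym`, `Vec`, `expand` and `expand_query` are invented here; Lisp lists are modelled as Python lists, Lisp vectors as `Vec`, and Lisp strings as `str`. Capture names are written with a single `@`, since `@@` in the diff is Texinfo's escape for a literal `@`.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Sym:
    """A Lisp symbol: compound_expression, _, @first, left:, :anchor, ..."""
    name: str


@dataclass(frozen=True)
class Vec:
    """A Lisp vector, which the s-expression dialect uses for alternation."""
    items: tuple


# Keywords with a fixed spelling in the native string syntax.
KEYWORDS = {":anchor": ".", ":?": "?", ":*": "*", ":+": "+"}
QUANTIFIERS = (":?", ":*", ":+")
PREDICATES = (":equal", ":match", ":pred")


def _quote(s: str) -> str:
    # String literals become double-quoted, with " and \ escaped.
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def _join(items) -> str:
    out: List[str] = []
    for item in items:
        if isinstance(item, Sym) and item.name in QUANTIFIERS and out:
            # Attach the quantifier to the preceding item, e.g. (_)*.
            out[-1] += KEYWORDS[item.name]
        else:
            out.append(expand(item))
    return " ".join(out)


def expand(pattern) -> str:
    """Expand one s-expression pattern to native query syntax."""
    if isinstance(pattern, str):
        return _quote(pattern)
    if isinstance(pattern, Sym):
        if pattern.name in KEYWORDS:
            return KEYWORDS[pattern.name]
        if pattern.name in PREDICATES:
            return "#" + pattern.name[1:]  # :match -> #match
        return pattern.name                # node types, fields, captures
    if isinstance(pattern, Vec):
        return "[" + _join(pattern.items) + "]"
    if isinstance(pattern, list):
        return "(" + _join(pattern) + ")"
    raise TypeError(f"cannot expand {pattern!r}")


def expand_query(patterns: list) -> str:
    """Expand a list of top-level patterns into one query string."""
    return _join(patterns)


if __name__ == "__main__":
    S = Sym
    query = [[[S("compound_expression"), S(":anchor"), [S("_")], S("@first"),
               [S("_")], S(":*"), S("@rest")],
              [S(":match"), "love", S("@first")]]]
    print(expand_query(query))
    # ((compound_expression . (_) @first (_)* @rest) (#match "love" @first))
```

Gluing `:*`, `:+` and `:?` onto the preceding item is a formatting choice made so the output matches the string form shown in the diff; the exact whitespace Emacs emits may differ.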
@@ -1413,7 +1413,7 @@ validate and debug the query.
 @end defun
 @defun treesit-query-language query
-This function return the language of @var{query}.
+This function returns the language of @var{query}.
 @end defun
 @defun treesit-query-expand query
@@ -1605,7 +1605,7 @@ ranges for @acronym{CSS} and JavaScript parsers:
 (setq css-range
 (treesit-query-range
 'html
-"(style_element (raw_text) @@capture)"))
+'((style_element (raw_text) @@capture))))
 (treesit-parser-set-included-ranges css css-range)
 @end group
@@ -1614,7 +1614,7 @@ ranges for @acronym{CSS} and JavaScript parsers:
 (setq js-range
 (treesit-query-range
 'html
-"(script_element (raw_text) @@capture)"))
+'((script_element (raw_text) @@capture))))
 (treesit-parser-set-included-ranges js js-range)
 @end group
 @end example