
* doc/lispref/modes.texi: Update manual to reflect previous change: remove :toggle and :level, add :feature. Document new variables and functions. * doc/lispref/parsing.texi: Add the missing closing bracket in @code{(t . nil)}.
@c -*- mode: texinfo; coding: utf-8 -*-
|
||
@c This is part of the GNU Emacs Lisp Reference Manual.
|
||
@c Copyright (C) 2021 Free Software Foundation, Inc.
|
||
@c See the file elisp.texi for copying conditions.
|
||
@node Parsing Program Source
|
||
@chapter Parsing Program Source
|
||
|
||
Emacs provides various ways to parse program source text and produce a
@dfn{syntax tree}. In a syntax tree, text is no longer a
one-dimensional stream but a structured tree of nodes, where each node
represents a piece of text. Thus a syntax tree can enable
interesting features like precise fontification, indentation,
navigation, structured editing, etc.
|
||
|
||
Emacs has a simple facility for parsing balanced expressions
(@pxref{Parsing Expressions}). There is also the SMIE library for generic
navigation and indentation (@pxref{SMIE}).

Emacs also provides integration with the tree-sitter library
(@uref{https://tree-sitter.github.io/tree-sitter}) if compiled with
it. The tree-sitter library implements an incremental parser and has
support for a wide range of programming languages.
|
||
|
||
@defun treesit-available-p
|
||
This function returns non-nil if tree-sitter features are available
|
||
for this Emacs instance.
|
||
@end defun
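
As a minimal sketch, code that depends on tree-sitter can guard itself
with this function. The helper name @code{my-treesit-ready-p} is
hypothetical, and the @code{fboundp} check is an assumption that covers
Emacs versions that do not define the function at all.

@example
@group
;; Hypothetical helper, not part of Emacs.
(defun my-treesit-ready-p ()
  "Return non-nil if tree-sitter features can be used in this Emacs."
  (and (fboundp 'treesit-available-p)
       (treesit-available-p)))
@end group
@end example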
|
||
|
||
For tree-sitter integration with existing Emacs features,
see @ref{Parser-based Font Lock}, @ref{Parser-based Indentation}, and
@ref{List Motion}.
|
||
|
||
About naming convention: use ``tree-sitter'' when referring to it as a
|
||
noun, like @code{python-use-tree-sitter}, but use ``treesit'' for
|
||
prefixes, like @code{python-treesit-indent-function}.
|
||
|
||
To access the syntax tree of the text in a buffer, we need to first
load a language definition and create a parser with it. Next, we can
query the parser for specific nodes in the syntax tree. Then, we can
access various information about the node, and we can pattern-match a
node with a powerful syntax. Finally, we explain how to work with
source files that mix multiple languages. The following sections
explain how to do each of the tasks in detail.
|
||
|
||
@menu
|
||
* Language Definitions:: Loading tree-sitter language definitions.
|
||
* Using Parser:: Introduction to parsers.
|
||
* Retrieving Node:: Retrieving node from syntax tree.
|
||
* Accessing Node:: Accessing node information.
|
||
* Pattern Matching:: Pattern matching with query patterns.
|
||
* Multiple Languages:: Parse text written in multiple languages.
|
||
* Tree-sitter C API:: Compare the C API and the ELisp API.
|
||
@end menu
|
||
|
||
@node Language Definitions
|
||
@section Tree-sitter Language Definitions
|
||
|
||
@heading Loading a language definition
|
||
|
||
Tree-sitter relies on language definitions to parse text in that
language. In Emacs, a language definition is represented by a symbol.
For example, the C language definition is represented as @code{c}, and
@code{c} can be passed to tree-sitter functions as the @var{language}
argument.
|
||
|
||
@vindex treesit-extra-load-path
|
||
@vindex treesit-load-language-error
|
||
@vindex treesit-load-suffixes
|
||
Tree-sitter language definitions are distributed as dynamic libraries.
In order to use a language definition in Emacs, you need to make sure
that the dynamic library is installed on the system. Emacs looks for
language definitions under load paths in
@code{treesit-extra-load-path},
@code{user-emacs-directory}/tree-sitter, and system default locations
for dynamic libraries, in that order. Emacs tries each extension in
@code{treesit-load-suffixes}. If Emacs cannot find the library or has
a problem loading it, Emacs signals @code{treesit-load-language-error}.

The signal data could be @code{(not-found @var{error-msg} ...)} if
Emacs cannot find the language definition, or @code{(symbol-error
@var{error-msg})} if Emacs cannot find the correct symbol in the
language definition, or @code{(version_mismatch @var{error-msg})} if
the language definition's version does not match that of the tree-sitter
library.
|
||
|
||
@defun treesit-language-available-p language &optional detail
This function returns non-nil if @var{language} exists and is
loadable.

If @var{detail} is non-nil, return @code{(t . nil)} when
@var{language} is available, and @code{(nil . @var{data})} when it is
unavailable. @var{data} is the signal data of
@code{treesit-load-language-error}.
|
||
@end defun
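
The following is a sketch of how the @var{detail} form might be used to
report why a language definition cannot be loaded. It assumes the
return value has exactly the shape described above; the language
@code{c} is only an example.

@example
@group
;; Ask for the detailed result: (t . nil) or (nil . DATA).
(let ((result (treesit-language-available-p 'c t)))
  (if (car result)
      (message "C language definition is available")
    ;; The cdr holds the signal data describing the failure.
    (message "C language definition is unavailable: %S"
             (cdr result))))
@end group
@end example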
|
||
|
||
@vindex treesit-load-name-override-list
|
||
By convention, the dynamic library for @var{language} is
|
||
@code{libtree-sitter-@var{language}.@var{ext}}, where @var{ext} is the
|
||
system-specific extension for dynamic libraries. Also by convention,
|
||
the function provided by that library is named
|
||
@code{tree_sitter_@var{language}}. If a language definition doesn't
|
||
follow this convention, you should add an entry
|
||
|
||
@example
|
||
(@var{language} @var{library-base-name} @var{function-name})
|
||
@end example
|
||
|
||
to @code{treesit-load-name-override-list}, where
|
||
@var{library-base-name} is the base filename for the dynamic library
|
||
(conventionally @code{libtree-sitter-@var{language}}), and
|
||
@var{function-name} is the function provided by the library
|
||
(conventionally @code{tree_sitter_@var{language}}). For example,
|
||
|
||
@example
|
||
(cool-lang "libtree-sitter-coool" "tree_sitter_cooool")
|
||
@end example
|
||
|
||
for a language too cool to abide by conventions.
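
As a sketch, such an entry could be registered from an init file as
shown below. This assumes that both
@code{treesit-load-name-override-list} and
@code{treesit-extra-load-path} are ordinary lists; the directory name
is hypothetical.

@example
@group
;; Hypothetical directory that holds extra language definitions.
(add-to-list 'treesit-extra-load-path "~/.local/lib/tree-sitter")

;; Register the unconventional library and function names.
(add-to-list 'treesit-load-name-override-list
             '(cool-lang "libtree-sitter-coool" "tree_sitter_cooool"))
@end group
@end example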
|
||
|
||
@defun treesit-language-version &optional min-compatible
The tree-sitter library has a @dfn{language version}; a language
definition's version needs to match this version to be compatible.

This function returns the tree-sitter library's language version. If
@var{min-compatible} is non-nil, it returns the minimal compatible
version.
|
||
@end defun
|
||
|
||
@heading Concrete syntax tree
|
||
|
||
A syntax tree is what a parser generates. In a syntax tree, each node
represents a piece of text, and nodes are connected to each other by
parent-child relationships. For example, if the source text is
|
||
|
||
@example
|
||
1 + 2
|
||
@end example
|
||
|
||
@noindent
|
||
its syntax tree could be
|
||
|
||
@example
|
||
@group
|
||
                 +--------------+
                 | root "1 + 2" |
                 +--------------+
                        |
        +--------------------------------+
        |       expression "1 + 2"       |
        +--------------------------------+
           |             |            |
+------------+   +--------------+   +------------+
| number "1" |   | operator "+" |   | number "2" |
+------------+   +--------------+   +------------+
|
||
@end group
|
||
@end example
|
||
|
||
We can also represent it as an s-expression:
|
||
|
||
@example
|
||
(root (expression (number) (operator) (number)))
|
||
@end example
|
||
|
||
@subheading Node types
|
||
|
||
@cindex tree-sitter node type
|
||
@anchor{tree-sitter node type}
|
||
@cindex tree-sitter named node
|
||
@anchor{tree-sitter named node}
|
||
@cindex tree-sitter anonymous node
|
||
Names like @code{root}, @code{expression}, @code{number}, and
@code{operator} are the nodes' @dfn{types}. However, not all nodes in a
syntax tree have a type. Nodes that don't are @dfn{anonymous nodes},
and nodes with a type are @dfn{named nodes}. Anonymous nodes are
tokens with fixed spellings, including punctuation characters like
the bracket @samp{]}, and keywords like @code{return}.
|
||
|
||
@subheading Field names
|
||
|
||
@cindex tree-sitter node field name
|
||
@anchor{tree-sitter node field name} To make the syntax tree easier to
|
||
analyze, many language definitions assign @dfn{field names} to child
|
||
nodes. For example, a @code{function_definition} node could have a
|
||
@code{declarator} and a @code{body}:
|
||
|
||
@example
|
||
@group
|
||
(function_definition
|
||
declarator: (declaration)
|
||
body: (compound_statement))
|
||
@end group
|
||
@end example
|
||
|
||
@deffn Command treesit-inspect-mode
|
||
This minor mode displays the node that @emph{starts} at point in
|
||
mode-line. The mode-line will display
|
||
|
||
@example
|
||
@var{parent} @var{field-name}: (@var{child} (@var{grand-child} (...)))
|
||
@end example
|
||
|
||
@var{child}, @var{grand-child}, and @var{grand-grand-child}, etc, are
|
||
nodes that have their beginning at point. And @var{parent} is the
|
||
parent of @var{child}.
|
||
|
||
If there is no node that starts at point, i.e., point is in the middle
|
||
of a node, then the mode-line only displays the smallest node that
|
||
spans point, and its immediate parent.
|
||
|
||
This minor mode doesn't create parsers on its own. It simply uses the
|
||
first parser in @code{(treesit-parser-list)} (@pxref{Using Parser}).
|
||
@end deffn
|
||
|
||
@heading Reading the grammar definition
|
||
|
||
Authors of language definitions define the @dfn{grammar} of a
language, and this grammar determines how a parser constructs a
concrete syntax tree out of the text. In order to use the syntax
tree effectively, we need to read the @dfn{grammar file}.

The grammar file is usually @code{grammar.js} in a language
definition's project repository. The link to a language definition's
home page can be found on tree-sitter's homepage
(@uref{https://tree-sitter.github.io/tree-sitter}).
|
||
|
||
The grammar is written in JavaScript syntax. For example, the rule
|
||
matching a @code{function_definition} node looks like
|
||
|
||
@example
|
||
@group
|
||
function_definition: $ => seq(
|
||
$.declaration_specifiers,
|
||
field('declarator', $.declaration),
|
||
field('body', $.compound_statement)
|
||
)
|
||
@end group
|
||
@end example
|
||
|
||
The rule is represented by a function that takes a single argument
|
||
@var{$}, representing the whole grammar. The function itself is
|
||
constructed by other functions: the @code{seq} function puts together a
|
||
sequence of children; the @code{field} function annotates a child with
|
||
a field name. If we write the above definition in BNF syntax, it
|
||
would look like
|
||
|
||
@example
|
||
@group
|
||
function_definition :=
|
||
<declaration_specifiers> <declaration> <compound_statement>
|
||
@end group
|
||
@end example
|
||
|
||
@noindent
|
||
and the node returned by the parser would look like
|
||
|
||
@example
|
||
@group
|
||
(function_definition
|
||
(declaration_specifier)
|
||
declarator: (declaration)
|
||
body: (compound_statement))
|
||
@end group
|
||
@end example
|
||
|
||
Below is a list of functions that one will see in a grammar
|
||
definition. Each function takes other rules as arguments and returns
|
||
a new rule.
|
||
|
||
@itemize @bullet
|
||
@item
|
||
@code{seq(rule1, rule2, ...)} matches each rule one after another.
|
||
|
||
@item
|
||
@code{choice(rule1, rule2, ...)} matches one of the rules in its
|
||
arguments.
|
||
|
||
@item
|
||
@code{repeat(rule)} matches @var{rule} for @emph{zero or more} times.
|
||
This is like the @samp{*} operator in regular expressions.
|
||
|
||
@item
|
||
@code{repeat1(rule)} matches @var{rule} for @emph{one or more} times.
|
||
This is like the @samp{+} operator in regular expressions.
|
||
|
||
@item
|
||
@code{optional(rule)} matches @var{rule} for @emph{zero or one} time.
|
||
This is like the @samp{?} operator in regular expressions.
|
||
|
||
@item
|
||
@code{field(name, rule)} assigns field name @var{name} to the child
|
||
node matched by @var{rule}.
|
||
|
||
@item
|
||
@code{alias(rule, alias)} makes nodes matched by @var{rule} appear as
|
||
@var{alias} in the syntax tree generated by the parser. For example,
|
||
|
||
@example
|
||
alias(preprocessor_call_exp, call_expression)
|
||
@end example
|
||
|
||
makes any node matched by @code{preprocessor_call_exp} appear as
@code{call_expression}.
|
||
@end itemize
|
||
|
||
Below are grammar functions less interesting for a reader of a
|
||
language definition.
|
||
|
||
@itemize
|
||
@item
|
||
@code{token(rule)} marks @var{rule} to produce a single leaf node.
|
||
That is, instead of generating a parent node with individual child
|
||
nodes under it, everything is combined into a single leaf node.
|
||
|
||
@item
|
||
@code{token.immediate(rule)} changes @var{rule} to match only when
there is no preceding whitespace. Normally, grammar rules ignore
preceding whitespace.
|
||
|
||
@item
|
||
@code{prec(n, rule)} gives @var{rule} a level @var{n} precedence.
|
||
|
||
@item
|
||
@code{prec.left([n,] rule)} marks @var{rule} as left-associative,
|
||
optionally with level @var{n}.
|
||
|
||
@item
|
||
@code{prec.right([n,] rule)} marks @var{rule} as right-associative,
|
||
optionally with level @var{n}.
|
||
|
||
@item
|
||
@code{prec.dynamic(n, rule)} is like @code{prec}, but the precedence
|
||
is applied at runtime instead.
|
||
@end itemize
|
||
|
||
The tree-sitter project talks about writing a grammar in more detail:
|
||
@uref{https://tree-sitter.github.io/tree-sitter/creating-parsers}.
|
||
Read especially ``The Grammar DSL'' section.
|
||
|
||
@node Using Parser
|
||
@section Using Tree-sitter Parser
|
||
@cindex Tree-sitter parser
|
||
|
||
This section describes how to create and configure a tree-sitter
parser. In Emacs, each tree-sitter parser is associated with a
buffer. As we edit the buffer, the associated parser and its syntax
tree are automatically kept up-to-date.
|
||
|
||
@defvar treesit-max-buffer-size
|
||
This variable contains the maximum size of buffers in which
|
||
tree-sitter can be activated. Major modes should check this value
|
||
when deciding whether to enable tree-sitter features.
|
||
@end defvar
|
||
|
||
@defun treesit-can-enable-p
|
||
This function checks whether the current buffer is suitable for
|
||
activating tree-sitter features. It basically checks
|
||
@code{treesit-available-p} and @code{treesit-max-buffer-size}.
|
||
@end defun
|
||
|
||
@cindex Creating tree-sitter parsers
|
||
@defun treesit-parser-create language &optional buffer no-reuse
|
||
To create a parser, we provide a @var{buffer} and the @var{language}
|
||
to use (@pxref{Language Definitions}). If @var{buffer} is nil, the
|
||
current buffer is used.
|
||
|
||
By default, this function reuses a parser if one already exists for
@var{language} in @var{buffer}. If @var{no-reuse} is non-nil, this
function always creates a new parser.
|
||
@end defun
|
||
|
||
Given a parser, we can query information about it:
|
||
|
||
@defun treesit-parser-buffer parser
|
||
Returns the buffer associated with @var{parser}.
|
||
@end defun
|
||
|
||
@defun treesit-parser-language parser
|
||
Returns the language that @var{parser} uses.
|
||
@end defun
|
||
|
||
@defun treesit-parser-p object
|
||
Checks if @var{object} is a tree-sitter parser. Return non-nil if it
|
||
is, return nil otherwise.
|
||
@end defun
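
The functions above can be combined as in the following sketch. It
assumes that a language definition for C is installed; nothing here
forces a parse, because parsing happens lazily as described below.

@example
@group
;; Only create a parser when the buffer is suitable for tree-sitter.
(when (treesit-can-enable-p)
  (let ((parser (treesit-parser-create 'c)))
    ;; Collect some facts about the new (or reused) parser.
    (list (treesit-parser-p parser)
          (treesit-parser-language parser)
          (eq (treesit-parser-buffer parser) (current-buffer)))))
@end group
@end example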
|
||
|
||
There is no need to explicitly parse a buffer, because parsing is done
|
||
automatically and lazily. A parser only parses when we query for a
|
||
node in its syntax tree. Therefore, when a parser is first created,
|
||
it doesn't parse the buffer; it waits until we query for a node for
|
||
the first time. Similarly, when some change is made in the buffer, a
|
||
parser doesn't re-parse immediately.
|
||
|
||
@vindex treesit-buffer-too-large
|
||
When a parser does parse, it checks the size of the buffer.
Tree-sitter can only handle buffers no larger than about 4GB. If the
size exceeds that, Emacs signals @code{treesit-buffer-too-large}
with the buffer size as the signal data.
|
||
|
||
Once a parser is created, Emacs automatically adds it to the
|
||
internal parser list. Every time a change is made to the buffer,
|
||
Emacs updates parsers in this list so they can update their syntax
|
||
tree incrementally.
|
||
|
||
@defun treesit-parser-list &optional buffer
|
||
This function returns the parser list of @var{buffer}.
@var{buffer} defaults to the current buffer.
|
||
@end defun
|
||
|
||
@defun treesit-parser-delete parser
|
||
This function deletes @var{parser}.
|
||
@end defun
|
||
|
||
@cindex tree-sitter narrowing
|
||
@anchor{tree-sitter narrowing} Normally, a parser ``sees'' the whole
buffer, but when the buffer is narrowed (@pxref{Narrowing}), the
parser will only see the visible region. As far as the parser can
tell, the hidden region is deleted. And when the buffer is later
widened, the parser thinks text is inserted at the beginning and at
the end. Although parsers respect narrowing, narrowing shouldn't be
the means of handling a multi-language buffer; instead, set the ranges
in which a parser should operate. @xref{Multiple Languages}.
|
||
|
||
Because a parser parses lazily, when we narrow the buffer, the parser
|
||
is not affected immediately; as long as we don't query for a node
|
||
while the buffer is narrowed, the parser is oblivious of the
|
||
narrowing.
|
||
|
||
@cindex tree-sitter parse string
|
||
@defun treesit-parse-string string language
|
||
Besides creating a parser for a buffer, we can also just parse a
|
||
string. Unlike a buffer, parsing a string is a one-time deal, and
|
||
there is no way to update the result.
|
||
|
||
This function parses @var{string} with @var{language}, and returns the
|
||
root node of the generated syntax tree.
|
||
@end defun
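
A minimal sketch of parsing a string, assuming a language definition
for C is installed. It uses @code{treesit-node-type}, which is
described later in this chapter, to inspect the returned root node.

@example
@group
;; Parse a one-off string; the result cannot be updated later.
(let ((root (treesit-parse-string "int a = 1;" 'c)))
  ;; Return the type of the root node as a string.
  (treesit-node-type root))
@end group
@end example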
|
||
|
||
@node Retrieving Node
|
||
@section Retrieving Node
|
||
|
||
@cindex tree-sitter find node
|
||
@cindex tree-sitter get node
|
||
Before we continue, let's go over some conventions of tree-sitter
functions.
|
||
|
||
We talk about a node being ``smaller'' or ``larger'', and ``lower'' or
|
||
``higher''. A smaller and lower node is lower in the syntax tree and
|
||
therefore spans a smaller piece of text; a larger and higher node is
|
||
higher up in the syntax tree, containing many smaller nodes as its
|
||
children, and therefore spans a larger piece of text.
|
||
|
||
When a function cannot find a node, it returns nil. For convenience
in function chaining, all the functions that take a node as an
argument and return a node also accept nil in place of the node; in
that case, the function just returns nil.
|
||
|
||
@vindex treesit-node-outdated
|
||
Nodes are not automatically updated when the associated buffer is
modified, and there is no way to update a node once it is retrieved.
Using an outdated node signals the @code{treesit-node-outdated} error.
|
||
|
||
@heading Retrieving node from syntax tree
|
||
|
||
@defun treesit-node-at point &optional parser-or-lang named
This function returns the @emph{smallest} node that starts at or after
@var{point}. In other words, the start of the node is greater than or
equal to @var{point}.
|
||
|
||
When @var{parser-or-lang} is nil, this function uses the first parser
in @code{(treesit-parser-list)} in the current buffer. If
@var{parser-or-lang} is a parser object, it uses that parser; if
@var{parser-or-lang} is a language, it finds the first parser using
that language in @code{(treesit-parser-list)} and uses that.
|
||
|
||
If @var{named} is non-nil, this function looks for a named node
|
||
only (@pxref{tree-sitter named node, named node}).
|
||
|
||
Example:
|
||
@example
|
||
@group
|
||
;; Find the node at point in a C parser's syntax tree.
|
||
(treesit-node-at (point) 'c)
|
||
@c @result{} #<treesit-node from 1 to 4 in *scratch*>
|
||
@end group
|
||
@end example
|
||
@end defun
|
||
|
||
@defun treesit-node-on beg end &optional parser-or-lang named
|
||
This function returns the @emph{smallest} node that covers the span
from @var{beg} to @var{end}. In other words, the start of the node is
less than or equal to @var{beg}, and the end of the node is greater
than or equal to @var{end}.
|
||
|
||
@emph{Beware} that calling this function on an empty line that is not
|
||
inside any top-level construct (function definition, etc) most
|
||
probably will give you the root node, because the root node is the
|
||
smallest node that covers that empty line. Most of the time, you want
|
||
to use @code{treesit-node-at}.
|
||
|
||
When @var{parser-or-lang} is nil, this function uses the first parser
in @code{(treesit-parser-list)} in the current buffer. If
@var{parser-or-lang} is a parser object, it uses that parser; if
@var{parser-or-lang} is a language, it finds the first parser using
that language in @code{(treesit-parser-list)} and uses that.
|
||
|
||
If @var{named} is non-nil, this function looks for a named node only
|
||
(@pxref{tree-sitter named node, named node}).
|
||
@end defun
|
||
|
||
@defun treesit-parser-root-node parser
|
||
This function returns the root node of the syntax tree generated by
|
||
@var{parser}.
|
||
@end defun
|
||
|
||
@defun treesit-buffer-root-node &optional language
|
||
This function finds the first parser that uses @var{language} in
@code{(treesit-parser-list)} in the current buffer, and returns the
root node of that parser's syntax tree. If it cannot find an
appropriate parser, nil is returned.
|
||
@end defun
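
The retrieval functions above can be compared side by side, as in this
sketch. It assumes the current buffer already has a parser for C; the
call forms follow the descriptions given in this section.

@example
@group
(let ((at-point (treesit-node-at (point) 'c))
      ;; Smallest node that covers the whole current line.
      (on-line (treesit-node-on (line-beginning-position)
                                (line-end-position)
                                'c))
      ;; Root of the tree produced by the first C parser.
      (root (treesit-buffer-root-node 'c)))
  (list at-point on-line root))
@end group
@end example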
|
||
|
||
Once we have a node, we can retrieve other nodes from it, or query for
|
||
information about this node.
|
||
|
||
@heading Retrieving node from other nodes
|
||
|
||
@subheading By kinship
|
||
|
||
@defun treesit-node-parent node
|
||
This function returns the immediate parent of @var{node}.
|
||
@end defun
|
||
|
||
@defun treesit-node-child node n &optional named
|
||
This function returns the @var{n}'th child of @var{node}. If
|
||
@var{named} is non-nil, then it only counts named nodes
|
||
(@pxref{tree-sitter named node, named node}). For example, in a node
|
||
that represents a string: @code{"text"}, there are three children
|
||
nodes: the opening quote @code{"}, the string content @code{text}, and
|
||
the closing quote @code{"}. Among these nodes, the first child is
|
||
the opening quote @code{"}, the first named child is the string
|
||
content @code{text}.
|
||
@end defun
|
||
|
||
@defun treesit-node-children node &optional named
|
||
This function returns all of @var{node}'s children in a list. If
|
||
@var{named} is non-nil, then it only retrieves named nodes.
|
||
@end defun
|
||
|
||
@defun treesit-next-sibling node &optional named
|
||
This function finds the next sibling of @var{node}. If @var{named} is
|
||
non-nil, it finds the next named sibling.
|
||
@end defun
|
||
|
||
@defun treesit-prev-sibling node &optional named
|
||
This function finds the previous sibling of @var{node}. If
|
||
@var{named} is non-nil, it finds the previous named sibling.
|
||
@end defun
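
As a sketch of walking the tree by kinship, the hypothetical helper
below collects the types of the named siblings that follow a node. It
uses the function names as documented in this section, and relies on
@code{treesit-next-sibling} returning nil when there is no further
sibling.

@example
@group
;; Hypothetical helper, not part of Emacs.
(defun my-treesit-following-sibling-types (node)
  "Return the types of the named siblings that follow NODE."
  (let ((sibling (treesit-next-sibling node t))
        (types nil))
    (while sibling
      (push (treesit-node-type sibling) types)
      (setq sibling (treesit-next-sibling sibling t)))
    (nreverse types)))
@end group
@end example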
|
||
|
||
@subheading By field name
|
||
|
||
To make the syntax tree easier to analyze, many language definitions
|
||
assign @dfn{field names} to child nodes (@pxref{tree-sitter node field
|
||
name, field name}). For example, a @code{function_definition} node
|
||
could have a @code{declarator} and a @code{body}.
|
||
|
||
@defun treesit-child-by-field-name node field-name
|
||
This function finds the child of @var{node} that has @var{field-name}
|
||
as its field name.
|
||
|
||
@example
|
||
@group
|
||
;; Get the child that has "body" as its field name.
|
||
(treesit-child-by-field-name node "body")
|
||
@c @result{} #<treesit-node from 3 to 11 in *scratch*>
|
||
@end group
|
||
@end example
|
||
@end defun
|
||
|
||
@subheading By position
|
||
|
||
@defun treesit-first-child-for-pos node pos &optional named
|
||
This function finds the first child of @var{node} that extends beyond
|
||
@var{pos}. ``Extend beyond'' means the end of the child node >=
|
||
@var{pos}. This function only looks for immediate children of
|
||
@var{node}, and doesn't look in its grand children. If @var{named} is
|
||
non-nil, it only looks for named child (@pxref{tree-sitter named node,
|
||
named node}).
|
||
@end defun
|
||
|
||
@defun treesit-node-descendant-for-range node beg end &optional named
|
||
This function finds the @emph{smallest} child/grandchild... of
|
||
@var{node} that spans the range from @var{beg} to @var{end}. It is
|
||
similar to @code{treesit-node-at}. If @var{named} is non-nil, it only
|
||
looks for named child.
|
||
@end defun
|
||
|
||
@heading Searching for node
|
||
|
||
@defun treesit-search-subtree node predicate &optional all backward limit
|
||
This function traverses the subtree of @var{node} (including
@var{node}), and matches @var{predicate} against each node along the
way. @var{predicate} is a regexp that matches (case-insensitively)
against each node's type, or a function that takes a node and returns
nil/non-nil. If a node matches, that node is returned; if no node
ever matches, nil is returned.

By default, this function only traverses named nodes; if @var{all} is
non-nil, it traverses all nodes. If @var{backward} is non-nil, it
traverses backwards. If @var{limit} is non-nil, it only traverses
that number of levels down in the tree.
|
||
@end defun
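
The two kinds of @var{predicate} can be illustrated with the following
sketch. It assumes the current buffer has a parser for C and that the
grammar names function definitions @code{function_definition}, as in
the examples earlier in this chapter.

@example
@group
;; Search with a regexp that is matched against each node's type.
(treesit-search-subtree (treesit-buffer-root-node 'c)
                        "function_definition")

;; The same search with a predicate function instead of a regexp.
(treesit-search-subtree
 (treesit-buffer-root-node 'c)
 (lambda (node)
   (equal (treesit-node-type node) "function_definition")))
@end group
@end example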
|
||
|
||
@defun treesit-search-forward start predicate &optional all backward up
|
||
This function is somewhat similar to @code{treesit-search-subtree}.
It also traverses the parse tree and matches each node against
@var{predicate} (except for @var{start}), where @var{predicate} can be
a (case-insensitive) regexp or a function. For a tree like the one
below, where @var{start} is marked 1, this function traverses as
numbered:
|
||
|
||
@example
|
||
@group
|
||
                 o
                 |
        3--------4-----------8
        |        |           |
o--o-+--1  5--+--6    9---+-----12
|  |    |        |    |         |
o  o    2        7  +-+-+    +--+--+
                    |   |    |  |  |
                    10  11   13 14 15
|
||
@end group
|
||
@end example
|
||
|
||
Same as in @code{treesit-search-subtree}, this function only searches
|
||
for named nodes by default. But if @var{all} is non-nil, it searches
|
||
for all nodes. If @var{backward} is non-nil, it searches backwards.
|
||
|
||
If @var{up} is non-nil, this function will only traverse to siblings
|
||
and parents. In that case, only 1 3 4 8 would be traversed.
|
||
@end defun
|
||
|
||
@defun treesit-search-forward-goto predicate side &optional all backward up
|
||
This function jumps to the start or end of the next node in the buffer
that matches @var{predicate}. Parameters @var{predicate}, @var{all},
@var{backward}, and @var{up} are the same as in
@code{treesit-search-forward}. @var{side} controls which side of
the matched node we stop at; it can be @code{start} or @code{end}.
|
||
@end defun
|
||
|
||
@defun treesit-induce-sparse-tree root predicate &optional process-fn limit
|
||
This function creates a sparse tree from @var{root}'s subtree.
|
||
|
||
Basically, it takes the subtree under @var{root}, and combs it so only
the nodes that match @var{predicate} are left, like picking out grapes
on the vine. Like previous functions, @var{predicate} can be a regexp
string that matches against each node's type case-insensitively, or a
function that takes a node and returns nil/non-nil.

For example, for the subtree on the left that consists of both numbers
and letters, if @var{predicate} is ``letter only'', the returned tree
is the one on the right.
|
||
|
||
@example
|
||
@group
|
||
    a                 a                    a
    |                 |                    |
+---+---+         +---+---+            +---+---+
|   |   |         |   |   |            |   |   |
b   1   2         b   |   |            b   c   d
    |   |     =>      |   |     =>         |
    c   +--+          c   +                e
    |   |  |          |   |
 +--+   d  4       +--+   d
 |  |              |
 e  5              e
|
||
@end group
|
||
@end example
|
||
|
||
If @var{process-fn} is non-nil, instead of returning the matched
|
||
nodes, this function passes each node to @var{process-fn} and uses the
|
||
returned value instead. If non-nil, @var{limit} is the number of
|
||
levels to go down from @var{root}.
|
||
|
||
Each node in the returned tree looks like @code{(@var{tree-sitter
node} . (@var{child} ...))}. The @var{tree-sitter node} of the root
of this tree will be nil if @var{root} doesn't match @var{predicate}.
If no node matches @var{predicate}, return nil.
|
||
@end defun
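
As a sketch, a sparse tree of function definitions could be built as
follows, with @var{process-fn} replacing each matched node by its
buffer positions. It assumes a parser for C in the current buffer and
the @code{function_definition} node type used in earlier examples.

@example
@group
(treesit-induce-sparse-tree
 (treesit-buffer-root-node 'c)
 "function_definition"
 ;; Keep only the start and end position of each matched node.
 (lambda (node)
   (cons (treesit-node-start node) (treesit-node-end node))))
@end group
@end example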
|
||
|
||
@heading More convenient functions
|
||
|
||
@defun treesit-filter-child node pred &optional named
|
||
This function finds immediate children of @var{node} that satisfy
@var{pred}.

Function @var{pred} takes the child node as the argument and should
return non-nil to indicate keeping the child. If @var{named} is
non-nil, this function only searches for named nodes.
|
||
@end defun
|
||
|
||
@defun treesit-parent-until node pred
|
||
This function repeatedly finds the parent of @var{node}, and returns
|
||
the parent if it satisfies @var{pred} (which takes the parent as the
|
||
argument). If no parent satisfies @var{pred}, this function returns
|
||
nil.
|
||
@end defun
|
||
|
||
@defun treesit-parent-while node pred
|
||
This function repeatedly finds the parent of @var{node}, and keeps
|
||
doing so as long as the parent satisfies @var{pred} (which takes the
|
||
parent as the single argument). I.e., this function returns the
|
||
farthest parent that still satisfies @var{pred}.
|
||
@end defun
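
For instance, a sketch of finding the function definition that encloses
point with @code{treesit-parent-until} might look like this. The helper
name is hypothetical, and the node type @code{function_definition} is
assumed from the earlier C examples.

@example
@group
;; Hypothetical helper, not part of Emacs.
(defun my-c-enclosing-function ()
  "Return the function definition node around point, or nil."
  (treesit-parent-until
   (treesit-node-at (point) 'c)
   (lambda (parent)
     (equal (treesit-node-type parent) "function_definition"))))
@end group
@end example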
|
||
|
||
@node Accessing Node
|
||
@section Accessing Node Information
|
||
|
||
Before going further, make sure you have read the basic conventions
|
||
about tree-sitter nodes in the previous node.
|
||
|
||
@heading Basic information
|
||
|
||
Every node is associated with a parser, and that parser is associated
|
||
with a buffer. The following functions let you retrieve them.
|
||
|
||
@defun treesit-node-parser node
|
||
This function returns @var{node}'s associated parser.
|
||
@end defun
|
||
|
||
@defun treesit-node-buffer node
|
||
This function returns @var{node}'s parser's associated buffer.
|
||
@end defun
|
||
|
||
@defun treesit-node-language node
|
||
This function returns @var{node}'s parser's associated language.
|
||
@end defun
|
||
|
||
Each node represents a piece of text in the buffer. The functions
below find relevant information about that text.
|
||
|
||
@defun treesit-node-start node
|
||
Return the start position of @var{node}.
|
||
@end defun
|
||
|
||
@defun treesit-node-end node
|
||
Return the end position of @var{node}.
|
||
@end defun
|
||
|
||
@defun treesit-node-text node &optional object
|
||
Returns the buffer text that @var{node} represents. (If @var{node} is
|
||
retrieved from parsing a string, it will be text from that string.)
|
||
@end defun
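
A small sketch that gathers this information for the node at point. It
assumes the current buffer has at least one parser, and it guards
against @code{treesit-node-at} returning nil.

@example
@group
(let ((node (treesit-node-at (point))))
  (when node
    ;; Start position, end position, and the text in between.
    (list (treesit-node-start node)
          (treesit-node-end node)
          (treesit-node-text node))))
@end group
@end example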
|
||
|
||
Here are some basic checks on tree-sitter nodes.
|
||
|
||
@defun treesit-node-p object
|
||
Checks if @var{object} is a tree-sitter syntax node.
|
||
@end defun
|
||
|
||
@defun treesit-node-eq node1 node2
|
||
Checks if @var{node1} and @var{node2} are the same node in a syntax
|
||
tree.
|
||
@end defun
|
||
|
||
@heading Property information
|
||
|
||
In general, nodes in a concrete syntax tree fall into two categories:
|
||
@dfn{named nodes} and @dfn{anonymous nodes}. Whether a node is named
|
||
or anonymous is determined by the language definition
|
||
(@pxref{tree-sitter named node, named node}).
|
||
|
||
@cindex tree-sitter missing node
|
||
Apart from being named/anonymous, a node can have other properties. A
node can be ``missing'': missing nodes are inserted by the parser in
order to recover from certain kinds of syntax errors, i.e., something
should probably be there according to the grammar, but is not there.
|
||
|
||
@cindex tree-sitter extra node
|
||
A node can be ``extra'': extra nodes represent things like comments,
|
||
which can appear anywhere in the text.
|
||
|
||
@cindex tree-sitter node that has changes
|
||
A node ``has changes'' if the buffer has changed since the node was
retrieved, i.e., the node is outdated.
|
||
|
||
@cindex tree-sitter node that has error
|
||
A node ``has error'' if the text it spans contains a syntax error.
Either the node itself has an error, or one of its descendants has an
error.
|
||
|
||
@defun treesit-node-check node property
|
||
This function checks if @var{node} has @var{property}. @var{property}
|
||
can be @code{'named}, @code{'missing}, @code{'extra},
|
||
@code{'has-changes}, or @code{'has-error}.
|
||
@end defun
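
The properties above can be collected in one pass. The following is
only a sketch: @code{my-treesit-node-properties} is a hypothetical
name, and @var{node} is assumed to be a node retrieved earlier.

@example
@group
;; Hypothetical helper, not part of Emacs.
(defun my-treesit-node-properties (node)
  "Return the list of properties that NODE has."
  (let (result)
    (dolist (property '(named missing extra has-changes has-error))
      (when (treesit-node-check node property)
        (push property result)))
    (nreverse result)))
@end group
@end example

For an ordinary named node in an unmodified buffer without syntax
errors, one would expect the result to be @code{(named)}.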
|
||
|
||
|
||
@defun treesit-node-type node
|
||
Named nodes have ``types'' (@pxref{tree-sitter node type, node type}).
|
||
For example, a named node can be a @code{string_literal} node, where
|
||
@code{string_literal} is its type.
|
||
|
||
This function returns @var{node}'s type as a string.
|
||
@end defun
|
||
|
||
@heading Information as a child or parent
|
||
|
||
@defun treesit-node-index node &optional named
|
||
This function returns the index of @var{node} as a child node of its
|
||
parent. If @var{named} is non-nil, it only counts named nodes
|
||
(@pxref{tree-sitter named node, named node}).
|
||
@end defun
|
||
|
||
@defun treesit-node-field-name node
|
||
A child of a parent node could have a field name (@pxref{tree-sitter
|
||
node field name, field name}). This function returns the field name
|
||
of @var{node} as a child of its parent.
|
||
@end defun
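
To see where a node sits within its parent, the two functions above
can be combined. This is a sketch under the assumption that
@var{node} has a parent; the helper name is hypothetical.

@example
@group
;; Hypothetical helper, not part of Emacs.
(defun my-treesit-node-position (node)
  "Return a plist describing NODE as a child of its parent."
  (list :index (treesit-node-index node)
        ;; Passing t counts only named siblings.
        :named-index (treesit-node-index node t)
        :field (treesit-node-field-name node)))
@end group
@end example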
|
||
|
||
@defun treesit-node-field-name-for-child node n
|
||
This function returns the field name of the @var{n}'th child of
|
||
@var{node}.
|
||
@end defun
|
||
|
||
@defun treesit-node-child-count node &optional named
|
||
This function finds the number of children of @var{node}. If
|
||
@var{named} is non-nil, it only counts named children (@pxref{tree-sitter
|
||
named node, named node}).
|
||
@end defun
|
||
|
||
@node Pattern Matching
|
||
@section Pattern Matching Tree-sitter Nodes
|
||
|
||
Tree-sitter lets us pattern match with a small declarative language.
|
||
Pattern matching consists of two steps: first tree-sitter matches a
|
||
@dfn{pattern} against nodes in the syntax tree, then it @dfn{captures}
|
||
specific nodes in that pattern and returns the captured nodes.
|
||
|
||
We describe first how to write the most basic query pattern and how to
|
||
capture nodes in a pattern, then the pattern-matching functions, and
finally more advanced pattern syntax.
|
||
|
||
@heading Basic query syntax
|
||
|
||
@cindex Tree-sitter query syntax
|
||
@cindex Tree-sitter query pattern
|
||
A @dfn{query} consists of multiple @dfn{patterns}. Each pattern is an
|
||
s-expression that matches a certain node in the syntax tree. A
|
||
pattern has the following shape:
|
||
|
||
@example
|
||
(@var{type} @var{child}...)
|
||
@end example
|
||
|
||
@noindent
|
||
For example, a pattern that matches a @code{binary_expression} node that
|
||
contains @code{number_literal} child nodes would look like
|
||
|
||
@example
|
||
(binary_expression (number_literal))
|
||
@end example
|
||
|
||
To @dfn{capture} a node in the query pattern above, append
|
||
@code{@@capture-name} after the node pattern you want to capture. For
|
||
example,
|
||
|
||
@example
|
||
(binary_expression (number_literal) @@number-in-exp)
|
||
@end example
|
||
|
||
@noindent
|
||
captures @code{number_literal} nodes that are inside a
|
||
@code{binary_expression} node with capture name @code{number-in-exp}.
|
||
|
||
We can capture the @code{binary_expression} node too, with capture
|
||
name @code{biexp}:
|
||
|
||
@example
|
||
(binary_expression
 (number_literal) @@number-in-exp) @@biexp
|
||
@end example
|
||
|
||
@heading Query function
|
||
|
||
Now we can introduce the query functions.
|
||
|
||
@defun treesit-query-capture node query &optional beg end node-only
|
||
This function matches patterns in @var{query} in @var{node}.
|
||
Parameter @var{query} can be either a string, an s-expression, or a
|
||
compiled query object. For now, we focus on the string syntax;
|
||
s-expression syntax and compiled query are described at the end of the
|
||
section.
|
||
|
||
Parameter @var{node} can also be a parser or a language symbol. If it
is a parser, this function uses the parser's root node; if it is a
language symbol, this function finds or creates a parser for that
language in the current buffer and uses its root node.
|
||
|
||
The function returns all captured nodes in a list of
|
||
@code{(@var{capture_name} . @var{node})}. If @var{node-only} is
|
||
non-nil, a list of nodes is returned instead. If @var{beg} and
|
||
@var{end} are both non-nil, this function only pattern matches nodes
|
||
in that range.
|
||
|
||
@vindex treesit-query-error
|
||
This function signals a @code{treesit-query-error} error if @var{query} is
|
||
malformed. The signal data contains a description of the specific
|
||
error. You can use @code{treesit-query-validate} to debug the query.
|
||
@end defun
|
||
|
||
For example, suppose @var{node}'s content is @code{1 + 2}, and
|
||
@var{query} is
|
||
|
||
@example
|
||
@group
|
||
(setq query
      "(binary_expression
        (number_literal) @@number-in-exp) @@biexp")
|
||
@end group
|
||
@end example
|
||
|
||
Running that query would return
|
||
|
||
@example
|
||
@group
|
||
(treesit-query-capture node query)
     @result{} ((biexp . @var{<node for "1 + 2">})
         (number-in-exp . @var{<node for "1">})
         (number-in-exp . @var{<node for "2">}))
|
||
@end group
|
||
@end example
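
The optional arguments can narrow the result. The sketch below
returns only the nodes (without capture names) for number literals
between two buffer positions. The helper name is hypothetical, and
the @code{number_literal} node type is assumed to exist in the
grammar in use.

@example
@group
;; Hypothetical helper, not part of Emacs.
(defun my-treesit-numbers-in-region (node beg end)
  "Return number_literal nodes under NODE between BEG and END."
  (treesit-query-capture
   node "(number_literal) @@number"
   ;; Restrict matching to BEG..END and return bare nodes.
   beg end t))
@end group
@end example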
|
||
|
||
As we mentioned earlier, a @var{query} could contain multiple
|
||
patterns. For example, it could have two top-level patterns:
|
||
|
||
@example
|
||
@group
|
||
(setq query
      "(binary_expression) @@biexp
       (number_literal) @@number @@biexp")
|
||
@end group
|
||
@end example
|
||
|
||
@defun treesit-query-string string query language
|
||
This function parses @var{string} with @var{language}, pattern matches
|
||
its root node with @var{query}, and returns the result.
|
||
@end defun
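
As a sketch, @code{treesit-query-string} makes it possible to query a
piece of text without a buffer. The helper name below is
hypothetical, and the @code{number_literal} node type is an
assumption about the grammar of @var{language}.

@example
@group
;; Hypothetical helper, not part of Emacs.
(defun my-treesit-count-numbers (string language)
  "Return how many number_literal captures STRING yields.
STRING is parsed as LANGUAGE, a language symbol."
  (length (treesit-query-string
           string "(number_literal) @@number" language)))
@end group
@end example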
|
||
|
||
@heading More query syntax
|
||
|
||
Besides node type and capture, tree-sitter's query syntax can express
|
||
anonymous node, field name, wildcard, quantification, grouping,
|
||
alternation, anchor, and predicate.
|
||
|
||
@subheading Anonymous node
|
||
|
||
An anonymous node is written verbatim, surrounded by quotes. A
|
||
pattern that matches (and captures) the keyword @code{return} would be
|
||
|
||
@example
|
||
"return" @@keyword
|
||
@end example
|
||
|
||
@subheading Wild card
|
||
|
||
In a query pattern, @samp{(_)} matches any named node, and @samp{_}
|
||
matches any named or anonymous node. For example, to capture any
|
||
named child of a @code{binary_expression} node, the pattern would be
|
||
|
||
@example
|
||
(binary_expression (_) @@in_biexp)
|
||
@end example
|
||
|
||
@subheading Field name
|
||
|
||
We can capture child nodes that have specific field names:
|
||
|
||
@example
|
||
@group
|
||
(function_definition
  declarator: (_) @@func-declarator
  body: (_) @@func-body)
|
||
@end group
|
||
@end example
|
||
|
||
We can also capture a node that doesn't have a certain field, say, a
@code{function_definition} without a @code{body} field:
|
||
|
||
@example
|
||
(function_definition !body) @@func-no-body
|
||
@end example
|
||
|
||
@subheading Quantify node
|
||
|
||
Tree-sitter recognizes quantification operators @samp{*}, @samp{+} and
|
||
@samp{?}. Their meanings are the same as in regular expressions:
|
||
@samp{*} matches the preceding pattern zero or more times, @samp{+}
|
||
matches one or more times, and @samp{?} matches zero or one time.
|
||
|
||
For example, this pattern matches @code{type_declaration} nodes
that have @emph{zero or more} @code{long} keywords:
|
||
|
||
@example
|
||
(type_declaration "long"*) @@long-type
|
||
@end example
|
||
|
||
And this pattern matches a type declaration that has zero or one
|
||
@code{long} keyword:
|
||
|
||
@example
|
||
(type_declaration "long"?) @@long-type
|
||
@end example
|
||
|
||
@subheading Grouping
|
||
|
||
Similar to groups in regular expressions, we can bundle patterns into a
group and apply quantification operators to it. For example, to
express a comma-separated list of identifiers, one could write
|
||
|
||
@example
|
||
(identifier) ("," (identifier))*
|
||
@end example
|
||
|
||
@subheading Alternation
|
||
|
||
Again, similar to regular expressions, we can express ``match any one
of this group of patterns'' in the query pattern. The syntax is a
|
||
list of patterns enclosed in square brackets. For example, to capture
|
||
some keywords in C, the query pattern would be
|
||
|
||
@example
|
||
@group
|
||
[
  "return"
  "break"
  "if"
  "else"
] @@keyword
|
||
@end group
|
||
@end example
|
||
|
||
@subheading Anchor
|
||
|
||
The anchor operator @samp{.} can be used to enforce juxtaposition,
|
||
i.e., to enforce two things to be directly next to each other. The
two ``things'' can be two nodes, or a child and the beginning or end
of its parent. For example, to capture the first child, the last
child, or two adjacent children:
|
||
|
||
@example
|
||
@group
|
||
;; Anchor the child with the end of its parent.
(compound_expression (_) @@last-child .)

;; Anchor the child with the beginning of its parent.
(compound_expression . (_) @@first-child)

;; Anchor two adjacent children.
(compound_expression
 (_) @@prev-child
 .
 (_) @@next-child)
|
||
@end group
|
||
@end example
|
||
|
||
Note that the enforcement of juxtaposition ignores any anonymous
|
||
nodes.
|
||
|
||
@subheading Predicate
|
||
|
||
We can add predicate constraints to a pattern. For example, if we use
|
||
the following query pattern
|
||
|
||
@example
|
||
@group
|
||
(
 (array . (_) @@first (_) @@last .)
 (#equal @@first @@last)
)
|
||
@end group
|
||
@end example
|
||
|
||
@noindent
then tree-sitter only matches arrays where the first element equals
the last element. To attach a predicate to a pattern, we need to
group them together. A predicate always starts with a @samp{#}.
Currently there are two predicates, @code{#equal} and @code{#match}.
|
||
|
||
@deffn Predicate equal arg1 arg2
|
||
Matches if @var{arg1} equals @var{arg2}. Arguments can be either a
|
||
string or a capture name. Capture names represent the text that the
|
||
captured node spans in the buffer.
|
||
@end deffn
|
||
|
||
@deffn Predicate match regexp capture-name
|
||
Matches if the text that @var{capture-name}'s node spans in the buffer
|
||
matches regular expression @var{regexp}. Matching is case-sensitive.
|
||
@end deffn
|
||
|
||
Note that a predicate can only refer to capture names that appear in the
|
||
same pattern. Indeed, it makes little sense to refer to capture names
|
||
in other patterns anyway.
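
A predicate is often used to filter captures by their text. The
sketch below collects identifier nodes whose text starts with an
upper-case letter. The @code{identifier} node type and the helper
name are assumptions; substitute names from the grammar in use.

@example
@group
;; Hypothetical helper, not part of Emacs.
(defun my-treesit-capitalized-identifiers (node)
  "Return identifier nodes under NODE that start with a capital."
  (treesit-query-capture
   node
   ;; The pattern and its predicate are grouped together, and
   ;; the predicate refers to a capture from the same pattern.
   "((identifier) @@id (#match \"^[A-Z]\" @@id))"
   nil nil t))
@end group
@end example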
|
||
|
||
@heading S-expression patterns
|
||
|
||
Besides strings, Emacs provides an s-expression-based syntax for query
|
||
patterns. It largely resembles the string-based syntax. For example,
|
||
the following pattern
|
||
|
||
@example
|
||
@group
|
||
(treesit-query-capture
 node "(addition_expression
        left: (_) @@left
        \"+\" @@plus-sign
        right: (_) @@right) @@addition

       [\"return\" \"break\"] @@keyword")
|
||
@end group
|
||
@end example
|
||
|
||
@noindent
|
||
is equivalent to
|
||
|
||
@example
|
||
@group
|
||
(treesit-query-capture
 node '((addition_expression
         left: (_) @@left
         "+" @@plus-sign
         right: (_) @@right) @@addition

        ["return" "break"] @@keyword))
|
||
@end group
|
||
@end example
|
||
|
||
Most pattern syntax can be written directly as strange but
nevertheless valid s-expressions. Only a few constructs need
modification:
|
||
|
||
@itemize
|
||
@item
|
||
Anchor @samp{.} is written as @code{:anchor}.
|
||
@item
|
||
@samp{?} is written as @samp{:?}.
|
||
@item
|
||
@samp{*} is written as @samp{:*}.
|
||
@item
|
||
@samp{+} is written as @samp{:+}.
|
||
@item
|
||
@code{#equal} is written as @code{:equal}. In general, predicates
|
||
change their @samp{#} to @samp{:}.
|
||
@end itemize
|
||
|
||
For example,
|
||
|
||
@example
|
||
@group
"(
  (compound_expression . (_) @@first (_)* @@rest)
  (#match \"love\" @@first)
  )"
|
||
@end group
|
||
@end example
|
||
|
||
@noindent
is written in s-expression syntax as
|
||
|
||
@example
|
||
@group
|
||
'((
   (compound_expression :anchor (_) @@first (_) :* @@rest)
   (:match "love" @@first)
   ))
|
||
@end group
|
||
@end example
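
An s-expression query can be passed wherever a string query is
accepted. As a sketch, the hypothetical helper below uses the
@code{:anchor} form to return the first named child of each
@code{compound_expression} node; the node type is taken from the
examples above and may not exist in a given grammar.

@example
@group
;; Hypothetical helper, not part of Emacs.
(defun my-treesit-first-children (node)
  "Return the first named child of each compound_expression."
  (treesit-query-capture
   node
   '((compound_expression :anchor (_) @@first))
   nil nil t))
@end group
@end example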
|
||
|
||
@heading Compiling queries
|
||
|
||
If a query will be used repeatedly, especially in tight loops, it is
|
||
important to compile that query, because a compiled query is much
|
||
faster than an uncompiled one. A compiled query can be used anywhere
|
||
a query is accepted.
|
||
|
||
@defun treesit-query-compile language query
|
||
This function compiles @var{query} for @var{language} into a compiled
|
||
query object and returns it.
|
||
|
||
This function signals a @code{treesit-query-error} error if @var{query} is
|
||
malformed. The signal data contains a description of the specific
|
||
error. You can use @code{treesit-query-validate} to debug the query.
|
||
@end defun
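
One way to benefit from compilation is to compile a query once and
reuse the result. The following is a sketch only: the variable and
function names are hypothetical, the @code{number_literal} node type
is an assumption about the grammar, and the cache is deliberately
naive in that it remembers a query for a single language.

@example
@group
;; Hypothetical names, not part of Emacs.
(defvar my-treesit-number-query nil
  "Compiled query for number literals, or nil if not yet compiled.")

(defun my-treesit-numbers (node)
  "Return the number_literal nodes under NODE.
The query is compiled on first use, for the language of NODE."
  (unless my-treesit-number-query
    (setq my-treesit-number-query
          (treesit-query-compile
           (treesit-node-language node)
           '((number_literal) @@number))))
  (treesit-query-capture node my-treesit-number-query nil nil t))
@end group
@end example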
|
||
|
||
@defun treesit-query-language query
|
||
This function returns the language of @var{query}.
|
||
@end defun
|
||
|
||
@defun treesit-query-expand query
|
||
This function expands the s-expression @var{query} into a string
|
||
query.
|
||
@end defun
|
||
|
||
@defun treesit-pattern-expand pattern
|
||
This function expands the s-expression @var{pattern} into a string
|
||
pattern.
|
||
@end defun
|
||
|
||
Finally, the tree-sitter project's documentation about
|
||
pattern-matching can be found at
|
||
@uref{https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries}.
|
||
|
||
@node Multiple Languages
|
||
@section Parsing Text in Multiple Languages
|
||
|
||
Sometimes, the source of a programming language could contain sources
of other languages; HTML + CSS + JavaScript is one example. In that
|
||
case, we need to assign individual parsers to text segments written in
|
||
different languages. Traditionally this is achieved by using
|
||
narrowing. While tree-sitter works with narrowing (@pxref{tree-sitter
|
||
narrowing, narrowing}), the recommended way is to set ranges in which
|
||
a parser will operate.
|
||
|
||
@defun treesit-parser-set-included-ranges parser ranges
|
||
This function sets the ranges of @var{parser} to @var{ranges}. Then
@var{parser} will only read the text covered by those ranges.
@var{ranges} is a list of cons cells of the form
@code{(@var{beg} . @var{end})}.
|
||
|
||
The ranges in @var{ranges} must come in order and must not overlap.
That is, the following condition must hold:
|
||
|
||
@example
|
||
@group
|
||
(cl-loop for idx from 1 to (1- (length ranges))
         for prev = (nth (1- idx) ranges)
         for next = (nth idx ranges)
         always (<= (car prev) (cdr prev)
                    (car next) (cdr next)))
|
||
@end group
|
||
@end example
|
||
|
||
@vindex treesit-range-invalid
|
||
If @var{ranges} violates this constraint, or something else goes
wrong, this function signals a @code{treesit-range-invalid} error. The
signal data contains a specific error message and the ranges we are
trying to set.
|
||
|
||
This function can also be used for disabling ranges. If @var{ranges}
|
||
is nil, the parser is set to parse the whole buffer.
|
||
|
||
Example:
|
||
|
||
@example
|
||
@group
|
||
(treesit-parser-set-included-ranges
 parser '((1 . 9) (16 . 24) (24 . 25)))
|
||
@end group
|
||
@end example
|
||
@end defun
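
The ordering constraint on @var{ranges} can also be checked before
calling @code{treesit-parser-set-included-ranges}. The predicate
below is a sketch in plain Emacs Lisp; its name is hypothetical, and
it is slightly stricter than the condition stated above because it
also checks a single range for @var{beg} not exceeding @var{end}.

@example
@group
;; Hypothetical helper, not part of Emacs.
(defun my-treesit-ranges-valid-p (ranges)
  "Return non-nil if RANGES are ordered and do not overlap.
RANGES is a list of cons cells (BEG . END)."
  (let ((last-end nil)
        (valid t))
    (dolist (range ranges valid)
      ;; Each range must be well formed and must start at or
      ;; after the end of the previous range.
      (unless (and (<= (car range) (cdr range))
                   (or (null last-end)
                       (<= last-end (car range))))
        (setq valid nil))
      (setq last-end (cdr range)))))
@end group
@end example

@example
@group
(my-treesit-ranges-valid-p '((1 . 9) (16 . 24) (24 . 25)))
     @result{} t
(my-treesit-ranges-valid-p '((1 . 9) (5 . 12)))
     @result{} nil
@end group
@end example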
|
||
|
||
@defun treesit-parser-included-ranges parser
|
||
This function returns the ranges set for @var{parser}. The return
|
||
value is the same as the @var{ranges} argument of
@code{treesit-parser-set-included-ranges}: a list of cons cells
@code{(@var{beg} . @var{end})}. If @var{parser} doesn't have any
ranges, the return value is nil.
|
||
|
||
@example
|
||
@group
|
||
(treesit-parser-included-ranges parser)
     @result{} ((1 . 9) (16 . 24) (24 . 25))
|
||
@end group
|
||
@end example
|
||
@end defun
|
||
|
||
@defun treesit-set-ranges parser-or-lang ranges
|
||
Like @code{treesit-parser-set-included-ranges}, this function sets
|
||
the ranges of @var{parser-or-lang} to @var{ranges}. Conveniently,
|
||
@var{parser-or-lang} could be either a parser or a language. If it is
|
||
a language, this function looks for the first parser in
|
||
@code{(treesit-parser-list)} for that language in the current buffer,
|
||
and sets the ranges for it.
|
||
@end defun
|
||
|
||
@defun treesit-get-ranges parser-or-lang
|
||
This function returns the ranges of @var{parser-or-lang}, like
|
||
@code{treesit-parser-included-ranges}. And like
|
||
@code{treesit-set-ranges}, @var{parser-or-lang} can be a parser or
|
||
a language symbol.
|
||
@end defun
|
||
|
||
@defun treesit-query-range source query &optional beg end
|
||
This function matches @var{source} with @var{query} and returns the
|
||
ranges of captured nodes. The return value has the same shape as in
other functions: a list of @code{(@var{beg} . @var{end})}.
|
||
|
||
For convenience, @var{source} can be a language symbol, a parser, or a
|
||
node. If a language symbol, this function matches in the root node of
|
||
the first parser using that language; if a parser, this function
|
||
matches in the root node of that parser; if a node, this function
|
||
matches in that node.
|
||
|
||
Parameter @var{query} is the query used to capture nodes
|
||
(@pxref{Pattern Matching}). The capture names don't matter. Parameters
@var{beg} and @var{end}, if both non-nil, limit the range in which
this function queries.
|
||
|
||
Like other query functions, this function signals a
@code{treesit-query-error} error if @var{query} is malformed.
|
||
@end defun
|
||
|
||
@defun treesit-language-at point
|
||
This function tries to figure out which language is responsible for
|
||
the text at @var{point}. It goes over each parser in
|
||
@code{(treesit-parser-list)} and sees if that parser's range covers
|
||
@var{point}.
|
||
@end defun
|
||
|
||
@defvar treesit-range-functions
|
||
A list of range functions. Font-locking and indenting code uses
functions in this list to set correct ranges for a language parser
before using it.
|
||
|
||
The signature of each function should be
|
||
|
||
@example
|
||
(@var{start} @var{end} &rest @var{_})
|
||
@end example
|
||
|
||
where @var{start} and @var{end} mark the region that is about to be
used. A range function only needs to update ranges in that region,
but it may update more.
|
||
|
||
Each function in the list is called in order.
|
||
@end defvar
|
||
|
||
@defun treesit-update-ranges &optional start end
|
||
This function is used by font-lock and indent to update ranges before
|
||
using any parser. Each range function in
@code{treesit-range-functions} is called in order. Arguments
|
||
@var{start} and @var{end} are passed to each range function.
|
||
@end defun
|
||
|
||
@heading An example
|
||
|
||
Normally, in a set of languages that can be mixed together, there is a
|
||
major language and several embedded languages. We first parse the
|
||
whole document with the major language's parser, set ranges for the
|
||
embedded languages, then parse the embedded languages.
|
||
|
||
Suppose we want to parse a very simple document that mixes HTML, CSS
|
||
and JavaScript:
|
||
|
||
@example
|
||
@group
<html>
  <script>1 + 2</script>
  <style>body @{ color: "blue"; @}</style>
</html>
|
||
@end group
|
||
@end example
|
||
|
||
We first parse with HTML, then set ranges for CSS and JavaScript:
|
||
|
||
@example
|
||
@group
|
||
;; Create parsers.
(setq html (treesit-get-parser-create 'html))
(setq css (treesit-get-parser-create 'css))
(setq js (treesit-get-parser-create 'javascript))

;; Set CSS ranges.
(setq css-range
      (treesit-query-range
       'html
       "(style_element (raw_text) @@capture)"))
(treesit-parser-set-included-ranges css css-range)

;; Set JavaScript ranges.
(setq js-range
      (treesit-query-range
       'html
       "(script_element (raw_text) @@capture)"))
(treesit-parser-set-included-ranges js js-range)
|
||
@end group
|
||
@end example
|
||
|
||
We use a query pattern @code{(style_element (raw_text) @@capture)} to
|
||
find CSS nodes in the HTML parse tree. For how to write query
|
||
patterns, @pxref{Pattern Matching}.
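
The manual steps above could be packaged as a range function, so that
font-lock and indentation code refreshes the CSS ranges through
@code{treesit-update-ranges}. The following is a sketch under several
assumptions: both function names are hypothetical, a CSS parser and an
HTML parser are assumed to exist in the buffer already, and the node
types are those used in the example above.

@example
@group
;; Hypothetical range function, not part of Emacs.
(defun my-html-css-range-function (_start _end &rest _)
  "Point the CSS parser at the contents of style elements.
The arguments are ignored: the ranges are recomputed for the whole
buffer, which is more than required but keeps the sketch short."
  (treesit-set-ranges
   'css
   (treesit-query-range
    'html "(style_element (raw_text) @@capture)")))

;; Hypothetical setup function, e.g., for a major mode.
(defun my-html-setup-ranges ()
  "Install the range function in the current buffer."
  (setq-local treesit-range-functions
              (list #'my-html-css-range-function)))
@end group
@end example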
|
||
|
||
@node Tree-sitter C API
|
||
@section Tree-sitter C API Correspondence
|
||
|
||
Emacs' tree-sitter integration doesn't expose every feature
|
||
tree-sitter's C API provides. Missing features include:
|
||
|
||
@itemize
|
||
@item
|
||
Creating a tree cursor and navigating the syntax tree with it.
|
||
@item
|
||
Setting timeout and cancellation flag for a parser.
|
||
@item
|
||
Setting the logger for a parser.
|
||
@item
|
||
Printing a DOT graph of the syntax tree to a file.
|
||
@item
|
||
Copying and modifying a syntax tree. (Emacs doesn't expose a tree
|
||
object.)
|
||
@item
|
||
Using (row, column) coordinates as position.
|
||
@item
|
||
Updating a node with changes. (In Emacs, retrieve a new node instead
|
||
of updating the existing one.)
|
||
@item
|
||
Querying statistics of a language definition.
|
||
@end itemize
|
||
|
||
In addition, Emacs makes some changes to the C API to make the API more
|
||
convenient and idiomatic:
|
||
|
||
@itemize
|
||
@item
|
||
Instead of using byte positions, the ELisp API uses character
|
||
positions.
|
||
@item
|
||
Null nodes are converted to nil.
|
||
@end itemize
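
The difference between byte and character positions only shows up
with non-@acronym{ASCII} text. The sketch below illustrates it with
the standard function @code{position-bytes}; the helper name is
hypothetical, and the code says nothing about how Emacs converts
positions internally for tree-sitter.

@example
@group
;; Hypothetical helper, not part of Emacs.
(defun my-char-and-byte-end (string)
  "Return (CHAR-POS . BYTE-POS) of the end of a buffer holding STRING."
  (with-temp-buffer
    (insert string)
    (cons (point-max) (position-bytes (point-max)))))
@end group
@end example

@example
@group
(my-char-and-byte-end "abc")
     @result{} (4 . 4)
(my-char-and-byte-end "\u00e9+1")
     @result{} (4 . 5)
@end group
@end example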
|
||
|
||
Below is the correspondence between all C API functions and their
|
||
ELisp counterparts. Sometimes one ELisp function corresponds to
|
||
multiple C functions, and many C functions don't have an ELisp
|
||
counterpart.
|
||
|
||
@example
|
||
ts_parser_new                              treesit-parser-create
ts_parser_delete
ts_parser_set_language
ts_parser_language                         treesit-parser-language
ts_parser_set_included_ranges              treesit-parser-set-included-ranges
ts_parser_included_ranges                  treesit-parser-included-ranges
ts_parser_parse
ts_parser_parse_string                     treesit-parse-string
ts_parser_parse_string_encoding
ts_parser_reset
ts_parser_set_timeout_micros
ts_parser_timeout_micros
ts_parser_set_cancellation_flag
ts_parser_cancellation_flag
ts_parser_set_logger
ts_parser_logger
ts_parser_print_dot_graphs
ts_tree_copy
ts_tree_delete
ts_tree_root_node
ts_tree_language
ts_tree_edit
ts_tree_get_changed_ranges
ts_tree_print_dot_graph
ts_node_type                               treesit-node-type
ts_node_symbol
ts_node_start_byte                         treesit-node-start
ts_node_start_point
ts_node_end_byte                           treesit-node-end
ts_node_end_point
ts_node_string                             treesit-node-string
ts_node_is_null
ts_node_is_named                           treesit-node-check
ts_node_is_missing                         treesit-node-check
ts_node_is_extra                           treesit-node-check
ts_node_has_changes                        treesit-node-check
ts_node_has_error                          treesit-node-check
ts_node_parent                             treesit-node-parent
ts_node_child                              treesit-node-child
ts_node_field_name_for_child               treesit-node-field-name-for-child
ts_node_child_count                        treesit-node-child-count
ts_node_named_child                        treesit-node-child
ts_node_named_child_count                  treesit-node-child-count
ts_node_child_by_field_name                treesit-node-by-field-name
ts_node_child_by_field_id
ts_node_next_sibling                       treesit-next-sibling
ts_node_prev_sibling                       treesit-prev-sibling
ts_node_next_named_sibling                 treesit-next-sibling
ts_node_prev_named_sibling                 treesit-prev-sibling
ts_node_first_child_for_byte               treesit-first-child-for-pos
ts_node_first_named_child_for_byte         treesit-first-child-for-pos
ts_node_descendant_for_byte_range          treesit-descendant-for-range
ts_node_descendant_for_point_range
ts_node_named_descendant_for_byte_range    treesit-descendant-for-range
ts_node_named_descendant_for_point_range
ts_node_edit
ts_node_eq                                 treesit-node-eq
ts_tree_cursor_new
ts_tree_cursor_delete
ts_tree_cursor_reset
ts_tree_cursor_current_node
ts_tree_cursor_current_field_name
ts_tree_cursor_current_field_id
ts_tree_cursor_goto_parent
ts_tree_cursor_goto_next_sibling
ts_tree_cursor_goto_first_child
ts_tree_cursor_goto_first_child_for_byte
ts_tree_cursor_goto_first_child_for_point
ts_tree_cursor_copy
ts_query_new
ts_query_delete
ts_query_pattern_count
ts_query_capture_count
ts_query_string_count
ts_query_start_byte_for_pattern
ts_query_predicates_for_pattern
ts_query_step_is_definite
ts_query_capture_name_for_id
ts_query_string_value_for_id
ts_query_disable_capture
ts_query_disable_pattern
ts_query_cursor_new
ts_query_cursor_delete
ts_query_cursor_exec                       treesit-query-capture
ts_query_cursor_did_exceed_match_limit
ts_query_cursor_match_limit
ts_query_cursor_set_match_limit
ts_query_cursor_set_byte_range
ts_query_cursor_set_point_range
ts_query_cursor_next_match
ts_query_cursor_remove_match
ts_query_cursor_next_capture
ts_language_symbol_count
ts_language_symbol_name
ts_language_symbol_for_name
ts_language_field_count
ts_language_field_name_for_id
ts_language_field_id_for_name
ts_language_symbol_type
ts_language_version
|
||
@end example
|