; Update guides in /admin/notes/tree-sitter

* admin/notes/tree-sitter/html-manual/Language-Definitions.html
* admin/notes/tree-sitter/html-manual/Multiple-Languages.html
* admin/notes/tree-sitter/html-manual/Parser_002dbased-Font-Lock.html
* admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html
* admin/notes/tree-sitter/html-manual/Parsing-Program-Source.html
* admin/notes/tree-sitter/html-manual/Pattern-Matching.html
* admin/notes/tree-sitter/html-manual/Retrieving-Node.html
* admin/notes/tree-sitter/html-manual/Tree_002dsitter-C-API.html
* admin/notes/tree-sitter/html-manual/Using-Parser.html
* admin/notes/tree-sitter/starter-guide: Update to reflect changes
made recently.
This commit is contained in:
Yuan Fu 2022-11-03 11:41:42 -07:00
parent 9909652849
commit 5416ae5990
No known key found for this signature in database
GPG key ID: 56E19BC57664A442
10 changed files with 988 additions and 752 deletions

View file

@ -66,14 +66,17 @@
</div>
<hr>
<span id="Tree_002dsitter-Language-Definitions"></span><h3 class="section">37.1 Tree-sitter Language Definitions</h3>
<span id="index-language-definitions_002c-for-tree_002dsitter"></span>
<span id="Loading-a-language-definition"></span><h3 class="heading">Loading a language definition</h3>
<span id="index-loading-language-definition-for-tree_002dsitter"></span>
<span id="index-language-argument_002c-for-tree_002dsitter"></span>
<p>Tree-sitter relies on language definitions to parse text in that
language. In Emacs, A language definition is represented by a symbol.
For example, C language definition is represented as <code>c</code>, and
<code>c</code> can be passed to tree-sitter functions as the <var>language</var>
argument.
language. In Emacs, a language definition is represented by a symbol.
For example, the C language definition is represented as the symbol
<code>c</code>, and <code>c</code> can be passed to tree-sitter functions as the
<var>language</var> argument.
</p>
<span id="index-treesit_002dextra_002dload_002dpath"></span>
<span id="index-treesit_002dload_002dlanguage_002derror"></span>
@ -81,55 +84,92 @@
<p>Tree-sitter language definitions are distributed as dynamic libraries.
In order to use a language definition in Emacs, you need to make sure
that the dynamic library is installed on the system. Emacs looks for
language definitions under load paths in
<code>treesit-extra-load-path</code>, <code>user-emacs-directory</code>/tree-sitter,
and system default locations for dynamic libraries, in that order.
Emacs tries each extensions in <code>treesit-load-suffixes</code>. If Emacs
cannot find the library or has problem loading it, Emacs signals
<code>treesit-load-language-error</code>. The signal data is a list of
specific error messages.
language definitions in several places, in the following order:
</p>
<ul>
<li> first, in the list of directories specified by the variable
<code>treesit-extra-load-path</code>;
</li><li> then, in the <samp>tree-sitter</samp> subdirectory of the directory
specified by <code>user-emacs-directory</code> (see <a href="Init-File.html">The Init File</a>);
</li><li> and finally, in the system&rsquo;s default locations for dynamic libraries.
</li></ul>
<p>In each of these directories, Emacs looks for a file with file-name
extensions specified by the variable <code>treesit-load-suffixes</code>.
</p>
<p>If Emacs cannot find the library or has problems loading it, Emacs
signals the <code>treesit-load-language-error</code> error. The data of
that signal could be one of the following:
</p>
<dl compact="compact">
<dt><span><code>(not-found <var>error-msg</var> &hellip;)</code></span></dt>
<dd><p>This means that Emacs could not find the language definition library.
</p></dd>
<dt><span><code>(symbol-error <var>error-msg</var>)</code></span></dt>
<dd><p>This means that Emacs could not find in the library the expected function
that every language definition library should export.
</p></dd>
<dt><span><code>(version-mismatch <var>error-msg</var>)</code></span></dt>
<dd><p>This means that the version of language definition library is incompatible
with that of the tree-sitter library.
</p></dd>
</dl>
<p>In all of these cases, <var>error-msg</var> might provide additional
details about the failure.
</p>
<dl class="def">
<dt id="index-treesit_002dlanguage_002davailable_002dp"><span class="category">Function: </span><span><strong>treesit-language-available-p</strong> <em>language</em><a href='#index-treesit_002dlanguage_002davailable_002dp' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function checks whether the dynamic library for <var>language</var> is
present on the system, and return non-nil if it is.
<dt id="index-treesit_002dlanguage_002davailable_002dp"><span class="category">Function: </span><span><strong>treesit-language-available-p</strong> <em>language &amp;optional detail</em><a href='#index-treesit_002dlanguage_002davailable_002dp' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function returns non-<code>nil</code> if the language definitions for
<var>language</var> exist and can be loaded.
</p>
<p>If <var>detail</var> is non-<code>nil</code>, return <code>(t . nil)</code> when
<var>language</var> is available, and <code>(nil . <var>data</var>)</code> when it&rsquo;s
unavailable. <var>data</var> is the signal data of
<code>treesit-load-language-error</code>.
</p></dd></dl>
<span id="index-treesit_002dload_002dname_002doverride_002dlist"></span>
<p>By convention, the dynamic library for <var>language</var> is
<code>libtree-sitter-<var>language</var>.<var>ext</var></code>, where <var>ext</var> is the
system-specific extension for dynamic libraries. Also by convention,
<p>By convention, the file name of the dynamic library for <var>language</var> is
<samp>libtree-sitter-<var>language</var>.<var>ext</var></samp>, where <var>ext</var> is the
system-specific extension for dynamic libraries. Also by convention,
the function provided by that library is named
<code>tree_sitter_<var>language</var></code>. If a language definition doesn&rsquo;t
follow this convention, you should add an entry
<code>tree_sitter_<var>language</var></code>. If a language definition library
doesn&rsquo;t follow this convention, you should add an entry
</p>
<div class="example">
<pre class="example">(<var>language</var> <var>library-base-name</var> <var>function-name</var>)
</pre></div>
<p>to <code>treesit-load-name-override-list</code>, where
<var>library-base-name</var> is the base filename for the dynamic library
(conventionally <code>libtree-sitter-<var>language</var></code>), and
<p>to the list in the variable <code>treesit-load-name-override-list</code>, where
<var>library-base-name</var> is the basename of the dynamic library&rsquo;s file name,
(usually, <samp>libtree-sitter-<var>language</var></samp>), and
<var>function-name</var> is the function provided by the library
(conventionally <code>tree_sitter_<var>language</var></code>). For example,
(usually, <code>tree_sitter_<var>language</var></code>). For example,
</p>
<div class="example">
<pre class="example">(cool-lang &quot;libtree-sitter-coool&quot; &quot;tree_sitter_cooool&quot;)
</pre></div>
<p>for a language too cool to abide by conventions.
<p>for a language that considers itself too &ldquo;cool&rdquo; to abide by
conventions.
</p>
<span id="index-language_002ddefinition-version_002c-compatibility"></span>
<dl class="def">
<dt id="index-treesit_002dlanguage_002dversion"><span class="category">Function: </span><span><strong>treesit-language-version</strong> <em>&amp;optional min-compatible</em><a href='#index-treesit_002dlanguage_002dversion' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>Tree-sitter library has a <em>language version</em>, a language
definition&rsquo;s version needs to match this version to be compatible.
</p>
<p>This function returns tree-sitter librarys language version. If
<var>min-compatible</var> is non-nil, it returns the minimal compatible
version.
<dd><p>This function returns the version of the language-definition
Application Binary Interface (<acronym>ABI</acronym>) supported by the
tree-sitter library. By default, it returns the latest ABI version
supported by the library, but if <var>min-compatible</var> is
non-<code>nil</code>, it returns the oldest ABI version which the library
still can support. Language definition libraries must be built for
ABI versions between the oldest and the latest versions supported by
the tree-sitter library, otherwise the library will be unable to load
them.
</p></dd></dl>
<span id="Concrete-syntax-tree"></span><h3 class="heading">Concrete syntax tree</h3>
<span id="index-syntax-tree_002c-concrete"></span>
<p>A syntax tree is what a parser generates. In a syntax tree, each node
represents a piece of text, and is connected to each other by a
@ -155,31 +195,34 @@
+------------+ +--------------+ +------------+
</pre></div>
<p>We can also represent it in s-expression:
<p>We can also represent it as an s-expression:
</p>
<div class="example">
<pre class="example">(root (expression (number) (operator) (number)))
</pre></div>
<span id="Node-types"></span><h4 class="subheading">Node types</h4>
<span id="index-node-types_002c-in-a-syntax-tree"></span>
<span id="index-tree_002dsitter-node-type"></span>
<span id="tree_002dsitter-node-type"></span><span id="index-tree_002dsitter-named-node"></span>
<span id="tree_002dsitter-named-node"></span><span id="index-tree_002dsitter-anonymous-node"></span>
<p>Names like <code>root</code>, <code>expression</code>, <code>number</code>,
<code>operator</code> are nodes&rsquo; <em>type</em>. However, not all nodes in a
syntax tree have a type. Nodes that don&rsquo;t are <em>anonymous nodes</em>,
and nodes with a type are <em>named nodes</em>. Anonymous nodes are
tokens with fixed spellings, including punctuation characters like
bracket &lsquo;<samp>]</samp>&rsquo;, and keywords like <code>return</code>.
<span id="index-type-of-node_002c-tree_002dsitter"></span>
<span id="tree_002dsitter-node-type"></span><span id="index-named-node_002c-tree_002dsitter"></span>
<span id="tree_002dsitter-named-node"></span><span id="index-anonymous-node_002c-tree_002dsitter"></span>
<p>Names like <code>root</code>, <code>expression</code>, <code>number</code>, and
<code>operator</code> specify the <em>type</em> of the nodes. However, not all
nodes in a syntax tree have a type. Nodes that don&rsquo;t have a type are
known as <em>anonymous nodes</em>, and nodes with a type are <em>named
nodes</em>. Anonymous nodes are tokens with fixed spellings, including
punctuation characters like bracket &lsquo;<samp>]</samp>&rsquo;, and keywords like
<code>return</code>.
</p>
<span id="Field-names"></span><h4 class="subheading">Field names</h4>
<span id="index-field-name_002c-tree_002dsitter"></span>
<span id="index-tree_002dsitter-node-field-name"></span>
<span id="tree_002dsitter-node-field-name"></span><p>To make the syntax tree easier to
analyze, many language definitions assign <em>field names</em> to child
nodes. For example, a <code>function_definition</code> node could have a
<code>declarator</code> and a <code>body</code>:
<span id="tree_002dsitter-node-field-name"></span><p>To make the syntax tree easier to analyze, many language definitions
assign <em>field names</em> to child nodes. For example, a
<code>function_definition</code> node could have a <code>declarator</code> and a
<code>body</code>:
</p>
<div class="example">
<pre class="example">(function_definition
@ -189,39 +232,40 @@
<dl class="def">
<dt id="index-treesit_002dinspect_002dmode"><span class="category">Command: </span><span><strong>treesit-inspect-mode</strong><a href='#index-treesit_002dinspect_002dmode' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This minor mode displays the node that <em>starts</em> at point in
mode-line. The mode-line will display
<dd><p>This minor mode displays on the mode-line the node that <em>starts</em>
at point. The mode-line will display
</p>
<div class="example">
<pre class="example"><var>parent</var> <var>field-name</var>: (<var>child</var> (<var>grand-child</var> (...)))
<pre class="example"><var>parent</var> <var>field</var>: (<var>child</var> (<var>grandchild</var> (&hellip;)))
</pre></div>
<p><var>child</var>, <var>grand-child</var>, and <var>grand-grand-child</var>, etc, are
nodes that have their beginning at point. And <var>parent</var> is the
parent of <var>child</var>.
<p><var>child</var>, <var>grand</var>, <var>grand-grandchild</var>, etc., are nodes that
begin at point. <var>parent</var> is the parent node of <var>child</var>.
</p>
<p>If there is no node that starts at point, i.e., point is in the middle
of a node, then the mode-line only displays the smallest node that
spans point, and its immediate parent.
spans the position of point, and its immediate parent.
</p>
<p>This minor mode doesn&rsquo;t create parsers on its own. It simply uses the
first parser in <code>(treesit-parser-list)</code> (see <a href="Using-Parser.html">Using Tree-sitter Parser</a>).
</p></dd></dl>
<span id="Reading-the-grammar-definition"></span><h3 class="heading">Reading the grammar definition</h3>
<span id="index-reading-grammar-definition_002c-tree_002dsitter"></span>
<p>Authors of language definitions define the <em>grammar</em> of a
language, and this grammar determines how does a parser construct a
concrete syntax tree out of the text. In order to use the syntax
tree effectively, we need to read the <em>grammar file</em>.
programming language, which determines how a parser constructs a
concrete syntax tree out of the program text. In order to use the
syntax tree effectively, you need to consult the <em>grammar file</em>.
</p>
<p>The grammar file is usually <code>grammar.js</code> in a language
definitions project repository. The link to a language definitions
home page can be found in tree-sitters homepage
(<a href="https://tree-sitter.github.io/tree-sitter">https://tree-sitter.github.io/tree-sitter</a>).
<p>The grammar file is usually <samp>grammar.js</samp> in a language
definition&rsquo;s project repository. The link to a language definition&rsquo;s
home page can be found on
<a href="https://tree-sitter.github.io/tree-sitter">tree-sitter&rsquo;s
homepage</a>.
</p>
<p>The grammar is written in JavaScript syntax. For example, the rule
matching a <code>function_definition</code> node looks like
<p>The grammar definition is written in JavaScript. For example, the
rule matching a <code>function_definition</code> node looks like
</p>
<div class="example">
<pre class="example">function_definition: $ =&gt; seq(
@ -231,12 +275,12 @@
)
</pre></div>
<p>The rule is represented by a function that takes a single argument
<p>The rules are represented by functions that take a single argument
<var>$</var>, representing the whole grammar. The function itself is
constructed by other functions: the <code>seq</code> function puts together a
sequence of children; the <code>field</code> function annotates a child with
a field name. If we write the above definition in BNF syntax, it
would look like
constructed by other functions: the <code>seq</code> function puts together
a sequence of children; the <code>field</code> function annotates a child
with a field name. If we write the above definition in the so-called
<em>Backus-Naur Form</em> (<acronym>BNF</acronym>) syntax, it would look like
</p>
<div class="example">
<pre class="example">function_definition :=
@ -252,66 +296,77 @@
body: (compound_statement))
</pre></div>
<p>Below is a list of functions that one will see in a grammar
definition. Each function takes other rules as arguments and returns
a new rule.
<p>Below is a list of functions that one can see in a grammar definition.
Each function takes other rules as arguments and returns a new rule.
</p>
<ul>
<li> <code>seq(rule1, rule2, ...)</code> matches each rule one after another.
</li><li> <code>choice(rule1, rule2, ...)</code> matches one of the rules in its
arguments.
</li><li> <code>repeat(rule)</code> matches <var>rule</var> for <em>zero or more</em> times.
<dl compact="compact">
<dt><span><code>seq(<var>rule1</var>, <var>rule2</var>, &hellip;)</code></span></dt>
<dd><p>matches each rule one after another.
</p></dd>
<dt><span><code>choice(<var>rule1</var>, <var>rule2</var>, &hellip;)</code></span></dt>
<dd><p>matches one of the rules in its arguments.
</p></dd>
<dt><span><code>repeat(<var>rule</var>)</code></span></dt>
<dd><p>matches <var>rule</var> for <em>zero or more</em> times.
This is like the &lsquo;<samp>*</samp>&rsquo; operator in regular expressions.
</li><li> <code>repeat1(rule)</code> matches <var>rule</var> for <em>one or more</em> times.
</p></dd>
<dt><span><code>repeat1(<var>rule</var>)</code></span></dt>
<dd><p>matches <var>rule</var> for <em>one or more</em> times.
This is like the &lsquo;<samp>+</samp>&rsquo; operator in regular expressions.
</li><li> <code>optional(rule)</code> matches <var>rule</var> for <em>zero or one</em> time.
</p></dd>
<dt><span><code>optional(<var>rule</var>)</code></span></dt>
<dd><p>matches <var>rule</var> for <em>zero or one</em> time.
This is like the &lsquo;<samp>?</samp>&rsquo; operator in regular expressions.
</li><li> <code>field(name, rule)</code> assigns field name <var>name</var> to the child
node matched by <var>rule</var>.
</li><li> <code>alias(rule, alias)</code> makes nodes matched by <var>rule</var> appear as
<var>alias</var> in the syntax tree generated by the parser. For example,
</p></dd>
<dt><span><code>field(<var>name</var>, <var>rule</var>)</code></span></dt>
<dd><p>assigns field name <var>name</var> to the child node matched by <var>rule</var>.
</p></dd>
<dt><span><code>alias(<var>rule</var>, <var>alias</var>)</code></span></dt>
<dd><p>makes nodes matched by <var>rule</var> appear as <var>alias</var> in the syntax
tree generated by the parser. For example,
</p>
<div class="example">
<pre class="example">alias(preprocessor_call_exp, call_expression)
</pre></div>
<p>makes any node matched by <code>preprocessor_call_exp</code> to appear as
<p>makes any node matched by <code>preprocessor_call_exp</code> appear as
<code>call_expression</code>.
</p></li></ul>
</p></dd>
</dl>
<p>Below are grammar functions less interesting for a reader of a
<p>Below are grammar functions of lesser importance for reading a
language definition.
</p>
<ul>
<li> <code>token(rule)</code> marks <var>rule</var> to produce a single leaf node.
That is, instead of generating a parent node with individual child
nodes under it, everything is combined into a single leaf node.
<dl compact="compact">
<dt><span><code>token(<var>rule</var>)</code></span></dt>
<dd><p>marks <var>rule</var> to produce a single leaf node. That is, instead of
generating a parent node with individual child nodes under it,
everything is combined into a single leaf node.
</p></dd>
<dt><span><code>token.immediate(<var>rule</var>)</code></span></dt>
<dd><p>Normally, grammar rules ignore preceding whitespace; this
changes <var>rule</var> to match only when there is no preceding
whitespaces.
</p></dd>
<dt><span><code>prec(<var>n</var>, <var>rule</var>)</code></span></dt>
<dd><p>gives <var>rule</var> the level-<var>n</var> precedence.
</p></dd>
<dt><span><code>prec.left([<var>n</var>,] <var>rule</var>)</code></span></dt>
<dd><p>marks <var>rule</var> as left-associative, optionally with level <var>n</var>.
</p></dd>
<dt><span><code>prec.right([<var>n</var>,] <var>rule</var>)</code></span></dt>
<dd><p>marks <var>rule</var> as right-associative, optionally with level <var>n</var>.
</p></dd>
<dt><span><code>prec.dynamic(<var>n</var>, <var>rule</var>)</code></span></dt>
<dd><p>this is like <code>prec</code>, but the precedence is applied at runtime
instead.
</p></dd>
</dl>
</li><li> Normally, grammar rules ignore preceding whitespaces,
<code>token.immediate(rule)</code> changes <var>rule</var> to match only when
there is no preceding whitespaces.
</li><li> <code>prec(n, rule)</code> gives <var>rule</var> a level <var>n</var> precedence.
</li><li> <code>prec.left([n,] rule)</code> marks <var>rule</var> as left-associative,
optionally with level <var>n</var>.
</li><li> <code>prec.right([n,] rule)</code> marks <var>rule</var> as right-associative,
optionally with level <var>n</var>.
</li><li> <code>prec.dynamic(n, rule)</code> is like <code>prec</code>, but the precedence
is applied at runtime instead.
</li></ul>
<p>The tree-sitter project talks about writing a grammar in more detail:
<a href="https://tree-sitter.github.io/tree-sitter/creating-parsers">https://tree-sitter.github.io/tree-sitter/creating-parsers</a>.
Read especially &ldquo;The Grammar DSL&rdquo; section.
<p>The documentation of the tree-sitter project has
<a href="https://tree-sitter.github.io/tree-sitter/creating-parsers">more
about writing a grammar</a>. Read especially &ldquo;The Grammar DSL&rdquo;
section.
</p>
</div>
<hr>

View file

@ -33,7 +33,7 @@
<link href="Index.html" rel="index" title="Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Parsing-Program-Source.html" rel="up" title="Parsing Program Source">
<link href="Tree_002dsitter-C-API.html" rel="next" title="Tree-sitter C API">
<link href="Tree_002dsitter-major-modes.html" rel="next" title="Tree-sitter major modes">
<link href="Pattern-Matching.html" rel="prev" title="Pattern Matching">
<style type="text/css">
<!--
@ -63,27 +63,29 @@
<div class="section" id="Multiple-Languages">
<div class="header">
<p>
Next: <a href="Tree_002dsitter-C-API.html" accesskey="n" rel="next">Tree-sitter C API Correspondence</a>, Previous: <a href="Pattern-Matching.html" accesskey="p" rel="prev">Pattern Matching Tree-sitter Nodes</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
Next: <a href="Tree_002dsitter-major-modes.html" accesskey="n" rel="next">Developing major modes with tree-sitter</a>, Previous: <a href="Pattern-Matching.html" accesskey="p" rel="prev">Pattern Matching Tree-sitter Nodes</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<span id="Parsing-Text-in-Multiple-Languages"></span><h3 class="section">37.6 Parsing Text in Multiple Languages</h3>
<p>Sometimes, the source of a programming language could contain sources
of other languages, HTML + CSS + JavaScript is one example. In that
case, we need to assign individual parsers to text segments written in
different languages. Traditionally this is achieved by using
narrowing. While tree-sitter works with narrowing (see <a href="Using-Parser.html#tree_002dsitter-narrowing">narrowing</a>), the recommended way is to set ranges in which
a parser will operate.
<span id="index-multiple-languages_002c-parsing-with-tree_002dsitter"></span>
<span id="index-parsing-multiple-languages-with-tree_002dsitter"></span>
<p>Sometimes, the source of a programming language could contain snippets
of other languages; <acronym>HTML</acronym> + <acronym>CSS</acronym> + JavaScript is one
example. In that case, text segments written in different languages
need to be assigned different parsers. Traditionally, this is
achieved by using narrowing. While tree-sitter works with narrowing
(see <a href="Using-Parser.html#tree_002dsitter-narrowing">narrowing</a>), the recommended way is
instead to set regions of buffer text in which a parser will operate.
</p>
<dl class="def">
<dt id="index-treesit_002dparser_002dset_002dincluded_002dranges"><span class="category">Function: </span><span><strong>treesit-parser-set-included-ranges</strong> <em>parser ranges</em><a href='#index-treesit_002dparser_002dset_002dincluded_002dranges' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function sets the range of <var>parser</var> to <var>ranges</var>. Then
<var>parser</var> will only read the text covered in each range. Each
range in <var>ranges</var> is a list of cons <code>(<var>beg</var>
. <var>end</var>)</code>.
<dd><p>This function sets up <var>parser</var> to operate on <var>ranges</var>. The
<var>parser</var> will only read the text of the specified ranges. Each
range in <var>ranges</var> is a list of the form <code>(<var>beg</var>&nbsp;.&nbsp;<var>end</var>)</code><!-- /@w -->.
</p>
<p>Each range in <var>ranges</var> must come in order and not overlap. That
is, in pseudo code:
<p>The ranges in <var>ranges</var> must come in order and must not overlap.
That is, in pseudo code:
</p>
<div class="example">
<pre class="example">(cl-loop for idx from 1 to (1- (length ranges))
@ -95,12 +97,12 @@
<span id="index-treesit_002drange_002dinvalid"></span>
<p>If <var>ranges</var> violates this constraint, or something else went
wrong, this function signals a <code>treesit-range-invalid</code>. The
signal data contains a specific error message and the ranges we are
trying to set.
wrong, this function signals the <code>treesit-range-invalid</code> error.
The signal data contains a specific error message and the ranges we
are trying to set.
</p>
<p>This function can also be used for disabling ranges. If <var>ranges</var>
is nil, the parser is set to parse the whole buffer.
is <code>nil</code>, the parser is set to parse the whole buffer.
</p>
<p>Example:
</p>
@ -114,9 +116,9 @@
<dt id="index-treesit_002dparser_002dincluded_002dranges"><span class="category">Function: </span><span><strong>treesit-parser-included-ranges</strong> <em>parser</em><a href='#index-treesit_002dparser_002dincluded_002dranges' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function returns the ranges set for <var>parser</var>. The return
value is the same as the <var>ranges</var> argument of
<code>treesit-parser-included-ranges</code>: a list of cons
<code>(<var>beg</var> . <var>end</var>)</code>. And if <var>parser</var> doesn&rsquo;t have any
ranges, the return value is nil.
<code>treesit-parser-included-ranges</code>: a list of cons cells of the form
<code>(<var>beg</var>&nbsp;.&nbsp;<var>end</var>)</code><!-- /@w -->. If <var>parser</var> doesn&rsquo;t have any
ranges, the return value is <code>nil</code>.
</p>
<div class="example">
<pre class="example">(treesit-parser-included-ranges parser)
@ -131,7 +133,7 @@
<var>parser-or-lang</var> could be either a parser or a language. If it is
a language, this function looks for the first parser in
<code>(treesit-parser-list)</code> for that language in the current buffer,
and set range for it.
and sets the ranges for it.
</p></dd></dl>
<dl class="def">
@ -145,68 +147,76 @@
<dl class="def">
<dt id="index-treesit_002dquery_002drange"><span class="category">Function: </span><span><strong>treesit-query-range</strong> <em>source query &amp;optional beg end</em><a href='#index-treesit_002dquery_002drange' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function matches <var>source</var> with <var>query</var> and returns the
ranges of captured nodes. The return value has the same shape of
other functions: a list of <code>(<var>beg</var> . <var>end</var>)</code>.
ranges of captured nodes. The return value is a list of cons cells of
the form <code>(<var>beg</var>&nbsp;.&nbsp;<var>end</var>)</code><!-- /@w -->, where <var>beg</var> and
<var>end</var> specify the beginning and the end of a region of text.
</p>
<p>For convenience, <var>source</var> can be a language symbol, a parser, or a
node. If a language symbol, this function matches in the root node of
the first parser using that language; if a parser, this function
matches in the root node of that parser; if a node, this function
matches in that node.
node. If it&rsquo;s a language symbol, this function matches in the root
node of the first parser using that language; if a parser, this
function matches in the root node of that parser; if a node, this
function matches in that node.
</p>
<p>Parameter <var>query</var> is the query used to capture nodes
(see <a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>). The capture names don&rsquo;t matter. Parameter
<var>beg</var> and <var>end</var>, if both non-nil, limits the range in which
this function queries.
<p>The argument <var>query</var> is the query used to capture nodes
(see <a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>). The capture names don&rsquo;t matter. The
arguments <var>beg</var> and <var>end</var>, if both non-<code>nil</code>, limit the
range in which this function queries.
</p>
<p>Like other query functions, this function raises an
<var>treesit-query-error</var> if <var>query</var> is malformed.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dlanguage_002dat"><span class="category">Function: </span><span><strong>treesit-language-at</strong> <em>point</em><a href='#index-treesit_002dlanguage_002dat' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function tries to figure out which language is responsible for
the text at <var>point</var>. It goes over each parser in
<code>(treesit-parser-list)</code> and see if that parser&rsquo;s range covers
<var>point</var>.
<p>Like other query functions, this function raises the
<code>treesit-query-error</code> error if <var>query</var> is malformed.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002drange_002dfunctions"><span class="category">Variable: </span><span><strong>treesit-range-functions</strong><a href='#index-treesit_002drange_002dfunctions' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>A list of range functions. Font-locking and indenting code uses
functions in this alist to set correct ranges for a language parser
before using it.
<dd><p>This variable holds the list of range functions. Font-locking and
indenting code use functions in this list to set correct ranges for
a language parser before using it.
</p>
<p>The signature of each function should be
<p>The signature of each function in the list should be:
</p>
<div class="example">
<pre class="example">(<var>start</var> <var>end</var> &amp;rest <var>_</var>)
</pre></div>
<p>where <var>start</var> and <var>end</var> marks the region that is about to be
used. A range function only need to (but not limited to) update
<p>where <var>start</var> and <var>end</var> specify the region that is about to be
used. A range function only needs to (but is not limited to) update
ranges in that region.
</p>
<p>Each function in the list is called in-order.
<p>The functions in the list are called in order.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dupdate_002dranges"><span class="category">Function: </span><span><strong>treesit-update-ranges</strong> <em>&amp;optional start end</em><a href='#index-treesit_002dupdate_002dranges' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function is used by font-lock and indent to update ranges before
using any parser. Each range function in
<dd><p>This function is used by font-lock and indentation to update ranges
before using any parser. Each range function in
<var>treesit-range-functions</var> is called in-order. Arguments
<var>start</var> and <var>end</var> are passed to each range function.
</p></dd></dl>
<span id="index-treesit_002dlanguage_002dat_002dpoint_002dfunction"></span>
<dl class="def">
<dt id="index-treesit_002dlanguage_002dat"><span class="category">Function: </span><span><strong>treesit-language-at</strong> <em>pos</em><a href='#index-treesit_002dlanguage_002dat' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function tries to figure out which language is responsible for
the text at buffer position <var>pos</var>. Under the hood it just calls
<code>treesit-language-at-point-function</code>.
</p>
<p>Various Lisp programs use this function. For example, the indentation
program uses this function to determine which language&rsquo;s rule to use
in a multi-language buffer. So it is important to provide
<code>treesit-language-at-point-function</code> for a multi-language major
mode.
</p></dd></dl>
<span id="An-example"></span><h3 class="heading">An example</h3>
<p>Normally, in a set of languages that can be mixed together, there is a
major language and several embedded languages. We first parse the
whole document with the major languages parser, set ranges for the
embedded languages, then parse the embedded languages.
major language and several embedded languages. A Lisp program usually
first parses the whole document with the major language&rsquo;s parser, sets
ranges for the embedded languages, and then parses the embedded
languages.
</p>
<p>Suppose we want to parse a very simple document that mixes HTML, CSS
and JavaScript:
<p>Suppose we need to parse a very simple document that mixes
<acronym>HTML</acronym>, <acronym>CSS</acronym> and JavaScript:
</p>
<div class="example">
<pre class="example">&lt;html&gt;
@ -215,22 +225,25 @@
&lt;/html&gt;
</pre></div>
<p>We first parse with HTML, then set ranges for CSS and JavaScript:
<p>We first parse with <acronym>HTML</acronym>, then set ranges for <acronym>CSS</acronym>
and JavaScript:
</p>
<div class="example">
<pre class="example">;; Create parsers.
(setq html (treesit-get-parser-create 'html))
(setq css (treesit-get-parser-create 'css))
(setq js (treesit-get-parser-create 'javascript))
</pre><pre class="example">
;; Set CSS ranges.
</pre><pre class="example">;; Set CSS ranges.
(setq css-range
(treesit-query-range
'html
&quot;(style_element (raw_text) @capture)&quot;))
(treesit-parser-set-included-ranges css css-range)
</pre><pre class="example">
;; Set JavaScript ranges.
</pre><pre class="example">;; Set JavaScript ranges.
(setq js-range
(treesit-query-range
'html
@ -238,15 +251,15 @@
(treesit-parser-set-included-ranges js js-range)
</pre></div>
<p>We use a query pattern <code>(style_element (raw_text) @capture)</code> to
find CSS nodes in the HTML parse tree. For how to write query
patterns, see <a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>.
<p>We use a query pattern <code><span class="nolinebreak">(style_element</span>&nbsp;<span class="nolinebreak">(raw_text)</span>&nbsp;@capture)</code><!-- /@w -->
to find <acronym>CSS</acronym> nodes in the <acronym>HTML</acronym> parse tree. For how
to write query patterns, see <a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>.
</p>
</div>
<hr>
<div class="header">
<p>
Next: <a href="Tree_002dsitter-C-API.html">Tree-sitter C API Correspondence</a>, Previous: <a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
Next: <a href="Tree_002dsitter-major-modes.html">Developing major modes with tree-sitter</a>, Previous: <a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>

View file

@ -66,47 +66,78 @@
</div>
<hr>
<span id="Parser_002dbased-Font-Lock-1"></span><h4 class="subsection">24.6.10 Parser-based Font Lock</h4>
<span id="index-parser_002dbased-font_002dlock"></span>
<p>Besides simple syntactic font lock and regexp-based font lock, Emacs
also provides complete syntactic font lock with the help of a parser,
currently provided by the tree-sitter library (see <a href="Parsing-Program-Source.html">Parsing Program Source</a>).
also provides complete syntactic font lock with the help of a parser.
Currently, Emacs uses the tree-sitter library (see <a href="Parsing-Program-Source.html">Parsing Program Source</a>) for this purpose.
</p>
<dl class="def">
<dt id="index-treesit_002dfont_002dlock_002denable"><span class="category">Function: </span><span><strong>treesit-font-lock-enable</strong><a href='#index-treesit_002dfont_002dlock_002denable' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function enables parser-based font lock in the current buffer.
</p></dd></dl>
<p>Parser-based font lock and other font lock mechanism are not mutually
<p>Parser-based font lock and other font lock mechanisms are not mutually
exclusive. By default, if enabled, parser-based font lock runs first,
then the simple syntactic font lock (if enabled), then regexp-based
font lock.
replacing syntactic font lock, then the regexp-based font lock.
</p>
<p>Although parser-based font lock doesn&rsquo;t share the same customization
variables with regexp-based font lock, parser-based font lock uses
similar customization schemes. The tree-sitter counterpart of
<var>font-lock-keywords</var> is <var>treesit-font-lock-settings</var>.
variables with regexp-based font lock, it uses similar customization
schemes. The tree-sitter counterpart of <var>font-lock-keywords</var> is
<var>treesit-font-lock-settings</var>.
</p>
<span id="index-tree_002dsitter-fontifications_002c-overview"></span>
<span id="index-fontifications-with-tree_002dsitter_002c-overview"></span>
<p>In general, tree-sitter fontification works as follows:
</p>
<ul>
<li> A Lisp program (usually, part of a major mode) provides a <em>query</em>
consisting of <em>patterns</em>, each pattern associated with a
<em>capture name</em>.
</li><li> The tree-sitter library finds the nodes in the parse tree
that match these patterns, tags the nodes with the corresponding
capture names, and returns them to the Lisp program.
</li><li> The Lisp program uses the returned nodes to highlight the portions of
buffer text corresponding to each node as appropriate, using the
tagged capture names of the nodes to determine the correct
fontification. For example, a node tagged <code>font-lock-keyword</code>
would be highlighted in <code>font-lock-keyword</code> face.
</li></ul>
<p>For more information about queries, patterns, and capture names, see
<a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>.
</p>
<p>To setup tree-sitter fontification, a major mode should first set
<code>treesit-font-lock-settings</code> with the output of
<code>treesit-font-lock-rules</code>, then call
<code>treesit-major-mode-setup</code>.
</p>
<dl class="def">
<dt id="index-treesit_002dfont_002dlock_002drules"><span class="category">Function: </span><span><strong>treesit-font-lock-rules</strong> <em>:keyword value query...</em><a href='#index-treesit_002dfont_002dlock_002drules' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function is used to set <var>treesit-font-lock-settings</var>. It
takes care of compiling queries and other post-processing and outputs
a value that <var>treesit-font-lock-settings</var> accepts. An example:
takes care of compiling queries and other post-processing, and outputs
a value that <var>treesit-font-lock-settings</var> accepts. Here&rsquo;s an
example:
</p>
<div class="example">
<pre class="example">(treesit-font-lock-rules
:language 'javascript
:feature 'constant
:override t
'((true) @font-lock-constant-face
(false) @font-lock-constant-face)
:language 'html
:feature 'script
&quot;(script_element) @font-lock-builtin-face&quot;)
</pre></div>
<p>This function takes a list of text or s-exp queries. Before each
query, there are <var>:keyword</var> and <var>value</var> pairs that configure
that query. The <code>:lang</code> keyword sets the querys language and
every query must specify the language. Other keywords are optional:
query, there are <var>:keyword</var>-<var>value</var> pairs that configure
that query. The <code>:lang</code> keyword sets the query&rsquo;s language and
every query must specify the language. The <code>:feature</code> keyword
sets the feature name of the query. Users can control which features
are enabled with <code>font-lock-maximum-decoration</code> and
<code>treesit-font-lock-feature-list</code> (see below).
</p>
<p>Other keywords are optional:
</p>
<table>
<thead><tr><th width="15%">Keyword</th><th width="15%">Value</th><th width="60%">Description</th></tr></thead>
@ -115,43 +146,92 @@
<tr><td width="15%"></td><td width="15%"><code>append</code></td><td width="60%">Append the new face to existing ones</td></tr>
<tr><td width="15%"></td><td width="15%"><code>prepend</code></td><td width="60%">Prepend the new face to existing ones</td></tr>
<tr><td width="15%"></td><td width="15%"><code>keep</code></td><td width="60%">Fill-in regions without an existing face</td></tr>
<tr><td width="15%"><code>:toggle</code></td><td width="15%"><var>symbol</var></td><td width="60%">If non-nil, its value should be a variable name. The variable&rsquo;s value
(nil/non-nil) disables/enables the query during fontification.</td></tr>
<tr><td width="15%"></td><td width="15%">nil</td><td width="60%">Always enable this query.</td></tr>
<tr><td width="15%"><code>:level</code></td><td width="15%"><var>integer</var></td><td width="60%">If non-nil, its value should be the decoration level for this query.
Decoration level is controlled by <code>font-lock-maximum-decoration</code>.</td></tr>
<tr><td width="15%"></td><td width="15%">nil</td><td width="60%">Always enable this query.</td></tr>
</table>
<p>Note that a query is applied only when both <code>:toggle</code> and
<code>:level</code> permit it. <code>:level</code> is used for global,
coarse-grained control, whereas <code>:toggle</code> is for local,
fine-grained control.
</p>
<p>Capture names in <var>query</var> should be face names like
<p>Lisp programs mark patterns in the query with capture names (names
that starts with <code>@</code>), and tree-sitter will return matched nodes
tagged with those same capture names. For the purpose of
fontification, capture names in <var>query</var> should be face names like
<code>font-lock-keyword-face</code>. The captured node will be fontified
with that face. Capture names can also be function names, in which
case the function is called with (<var>start</var> <var>end</var> <var>node</var>),
where <var>start</var> and <var>end</var> are the start and end position of the
node in buffer, and <var>node</var> is the node itself. If a capture name
is both a face and a function, the face takes priority. If a capture
name is not a face name nor a function name, it is ignored.
with that face.
</p>
<span id="index-treesit_002dfontify_002dwith_002doverride"></span>
<p>Capture names can also be function names, in which case the function
is called with 4 arguments: <var>node</var> and <var>override</var>, <var>start</var>
and <var>end</var>, where <var>node</var> is the node itself, <var>override</var> is
the override property of the rule which captured this node, and
<var>start</var> and <var>end</var> limits the region in which this function
should fontify. (If this function wants to respect the <var>override</var>
argument, it can use <code>treesit-fontify-with-override</code>.)
</p>
<p>Beyond the 4 arguments presented, this function should accept more
arguments as optional arguments for future extensibility.
</p>
<p>If a capture name is both a face and a function, the face takes
priority. If a capture name is neither a face nor a function, it is
ignored.
</p></dd></dl>
<p>Contextual entities, like multi-line strings, or <code>/* */</code> style
comments, need special care, because change in these entities might
cause change in a large portion of the buffer. For example, inserting
the closing comment delimiter <code>*/</code> will change all the text
between it and the opening delimiter to comment face. Such entities
should be captured in a special name <code>contextual</code>, so Emacs can
correctly update their fontification. Here is an example for
comments:
</p>
<div class="example">
<pre class="example">(treesit-font-lock-rules
:language 'javascript
:feature 'comment
:override t
'((comment) @font-lock-comment-face)
(comment) @contextual))
</pre></div>
<dl class="def">
<dt id="index-treesit_002dfont_002dlock_002dfeature_002dlist"><span class="category">Variable: </span><span><strong>treesit-font-lock-feature-list</strong><a href='#index-treesit_002dfont_002dlock_002dfeature_002dlist' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This is a list of lists of feature symbols. Each element of the list
is a list that represents a decoration level.
<code>font-lock-maximum-decoration</code> controls which levels are
activated.
</p>
<p>Each element of the list is a list of the form <code>(<var>feature</var>&nbsp;&hellip;)</code><!-- /@w -->, where each <var>feature</var> corresponds to the
<code>:feature</code> value of a query defined in
<code>treesit-font-lock-rules</code>. Removing a feature symbol from this
list disables the corresponding query during font-lock.
</p>
<p>Common feature names, for many programming languages, include
function-name, type, variable-name (left-hand-side or <acronym>LHS</acronym> of
assignments), builtin, constant, keyword, string-interpolation,
comment, doc, string, operator, preprocessor, escape-sequence, and key
(in key-value pairs). Major modes are free to subdivide or extend
these common features.
</p>
<p>For example, the value of this variable could be:
</p><div class="example">
<pre class="example">((comment string doc) ; level 1
(function-name keyword type builtin constant) ; level 2
(variable-name string-interpolation key)) ; level 3
</pre></div>
<p>Major modes should set this variable before calling
<code>treesit-major-mode-setup</code>.
</p>
<span id="index-treesit_002dfont_002dlock_002drecompute_002dfeatures"></span>
<p>For this variable to take effect, a Lisp program should call
<code>treesit-font-lock-recompute-features</code> (which resets
<code>treesit-font-lock-settings</code> accordingly), or
<code>treesit-major-mode-setup</code> (which calls
<code>treesit-font-lock-recompute-features</code>).
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dfont_002dlock_002dsettings"><span class="category">Variable: </span><span><strong>treesit-font-lock-settings</strong><a href='#index-treesit_002dfont_002dlock_002dsettings' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>A list of <var>setting</var>s for tree-sitter font lock. The exact format
<dd><p>A list of settings for tree-sitter based font lock. The exact format
of this variable is considered internal. One should always use
<code>treesit-font-lock-rules</code> to set this variable.
</p>
<p>Each <var>setting</var> is of form
</p>
<div class="example">
<pre class="example">(<var>language</var> <var>query</var>)
</pre></div>
<p>Each <var>setting</var> controls one parser (often of different language).
And <var>language</var> is the language symbol (see <a href="Language-Definitions.html">Tree-sitter Language Definitions</a>); <var>query</var> is the query (see <a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>).
</p></dd></dl>
<p>Multi-language major modes should provide range functions in

View file

@ -66,170 +66,176 @@
</div>
<hr>
<span id="Parser_002dbased-Indentation-1"></span><h4 class="subsection">24.7.2 Parser-based Indentation</h4>
<span id="index-parser_002dbased-indentation"></span>
<p>When built with the tree-sitter library (see <a href="Parsing-Program-Source.html">Parsing Program Source</a>), Emacs could parse program source and produce a syntax tree.
And this syntax tree can be used for indentation. For maximum
flexibility, we could write a custom indent function that queries the
syntax tree and indents accordingly for each language, but that would
be a lot of work. It is more convenient to use the simple indentation
engine described below: we only need to write some indentation rules
<p>When built with the tree-sitter library (see <a href="Parsing-Program-Source.html">Parsing Program Source</a>), Emacs is capable of parsing the program source and producing
a syntax tree. This syntax tree can be used for guiding the program
source indentation commands. For maximum flexibility, it is possible
to write a custom indentation function that queries the syntax tree
and indents accordingly for each language, but that is a lot of work.
It is more convenient to use the simple indentation engine described
below: then the major mode needs only to write some indentation rules
and the engine takes care of the rest.
</p>
<p>To enable the indentation engine, set the value of
<p>To enable the parser-based indentation engine, either set
<var>treesit-simple-indent-rules</var> and call
<code>treesit-major-mode-setup</code>, or equivalently, set the value of
<code>indent-line-function</code> to <code>treesit-indent</code>.
</p>
<dl class="def">
<dt id="index-treesit_002dindent_002dfunction"><span class="category">Variable: </span><span><strong>treesit-indent-function</strong><a href='#index-treesit_002dindent_002dfunction' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This variable stores the actual function called by
<code>treesit-indent</code>. By default, its value is
<code>treesit-simple-indent</code>. In the future we might add other
<code>treesit-simple-indent</code>. In the future we might add other,
more complex indentation engines.
</p></dd></dl>
<span id="Writing-indentation-rules"></span><h3 class="heading">Writing indentation rules</h3>
<span id="index-indentation-rules_002c-for-parser_002dbased-indentation"></span>
<dl class="def">
<dt id="index-treesit_002dsimple_002dindent_002drules"><span class="category">Variable: </span><span><strong>treesit-simple-indent-rules</strong><a href='#index-treesit_002dsimple_002dindent_002drules' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This local variable stores indentation rules for every language. It is
a list of
<dd><p>This local variable stores indentation rules for every language. It is
a list of the form: <code>(<var>language</var>&nbsp;.&nbsp;<var>rules</var>)</code><!-- /@w -->, where
<var>language</var> is a language symbol, and <var>rules</var> is a list of the
form <code>(<var>matcher</var>&nbsp;<var>anchor</var>&nbsp;<var>offset</var>)</code><!-- /@w -->.
</p>
<div class="example">
<pre class="example">(<var>language</var> . <var>rules</var>)
</pre></div>
<p>where <var>language</var> is a language symbol, and <var>rules</var> is a list
of
</p>
<div class="example">
<pre class="example">(<var>matcher</var> <var>anchor</var> <var>offset</var>)
</pre></div>
<p>First Emacs passes the node at point to <var>matcher</var>, if it return
non-nil, this rule applies. Then Emacs passes the node to
<var>anchor</var>, it returns a point. Emacs takes the column number of
that point, add <var>offset</var> to it, and the result is the indent for
the current line.
<p>First, Emacs passes the smallest tree-sitter node at the beginning of
the current line to <var>matcher</var>; if it returns non-<code>nil</code>, this
rule is applicable. Then Emacs passes the node to <var>anchor</var>, which
returns a buffer position. Emacs takes the column number of that
position, adds <var>offset</var> to it, and the result is the indentation
column for the current line.
</p>
<p>The <var>matcher</var> and <var>anchor</var> are functions, and Emacs provides
convenient presets for them. You can skip over to
<code>treesit-simple-indent-presets</code> below, those presets should be
more than enough.
convenient defaults for them.
</p>
<p>A <var>matcher</var> or an <var>anchor</var> is a function that takes three
arguments (<var>node</var> <var>parent</var> <var>bol</var>). Argument <var>bol</var> is
the point at where we are indenting: the position of the first
non-whitespace character from the beginning of line; <var>node</var> is the
largest (highest-in-tree) node that starts at that point; <var>parent</var>
is the parent of <var>node</var>. A <var>matcher</var> returns nil/non-nil, and
<var>anchor</var> returns a point.
<p>Each <var>matcher</var> or <var>anchor</var> is a function that takes three
arguments: <var>node</var>, <var>parent</var>, and <var>bol</var>. The argument
<var>bol</var> is the buffer position whose indentation is required: the
position of the first non-whitespace character after the beginning of
the line. The argument <var>node</var> is the largest (highest-in-tree)
node that starts at that position; and <var>parent</var> is the parent of
<var>node</var>. However, when that position is on a whitespace or inside
a multi-line string, no node that starts at that position, so
<var>node</var> is <code>nil</code>. In that case, <var>parent</var> would be the
smallest node that spans that position.
</p>
<p>Emacs finds <var>bol</var>, <var>node</var> and <var>parent</var> and
passes them to each <var>matcher</var> and <var>anchor</var>. <var>matcher</var>
should return non-<code>nil</code> if the rule is applicable, and
<var>anchor</var> should return a buffer position.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dsimple_002dindent_002dpresets"><span class="category">Variable: </span><span><strong>treesit-simple-indent-presets</strong><a href='#index-treesit_002dsimple_002dindent_002dpresets' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This is a list of presets for <var>matcher</var>s and <var>anchor</var>s in
<code>treesit-simple-indent-rules</code>. Each of them represent a function
that takes <var>node</var>, <var>parent</var> and <var>bol</var> as arguments.
<dd><p>This is a list of defaults for <var>matcher</var>s and <var>anchor</var>s in
<code>treesit-simple-indent-rules</code>. Each of them represents a function
that takes 3 arguments: <var>node</var>, <var>parent</var> and <var>bol</var>. The
available default functions are:
</p>
<div class="example">
<pre class="example">no-node
</pre></div>
<p>This matcher matches the case where <var>node</var> is nil, i.e., there is
no node that starts at <var>bol</var>. This is the case when <var>bol</var> is
at an empty line or inside a multi-line string, etc.
<dl compact="compact">
<dt id='index-no_002dnode'><span><code>no-node</code><a href='#index-no_002dnode' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This matcher is a function that is called with 3 arguments:
<var>node</var>, <var>parent</var>, and <var>bol</var>, and returns non-<code>nil</code>,
indicating a match, if <var>node</var> is <code>nil</code>, i.e., there is no
node that starts at <var>bol</var>. This is the case when <var>bol</var> is on
an empty line or inside a multi-line string, etc.
</p>
<div class="example">
<pre class="example">(parent-is <var>type</var>)
</pre></div>
<p>This matcher matches if <var>parent</var>&rsquo;s type is <var>type</var>.
</dd>
<dt id='index-parent_002dis'><span><code>parent-is</code><a href='#index-parent_002dis' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This matcher is a function of one argument, <var>type</var>; it returns a
function that is called with 3 arguments: <var>node</var>, <var>parent</var>,
and <var>bol</var>, and returns non-<code>nil</code> (i.e., a match) if
<var>parent</var>&rsquo;s type matches regexp <var>type</var>.
</p>
<div class="example">
<pre class="example">(node-is <var>type</var>)
</pre></div>
<p>This matcher matches if <var>node</var>&rsquo;s type is <var>type</var>.
</dd>
<dt id='index-node_002dis'><span><code>node-is</code><a href='#index-node_002dis' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This matcher is a function of one argument, <var>type</var>; it returns a
function that is called with 3 arguments: <var>node</var>, <var>parent</var>,
and <var>bol</var>, and returns non-<code>nil</code> if <var>node</var>&rsquo;s type matches
regexp <var>type</var>.
</p>
<div class="example">
<pre class="example">(query <var>query</var>)
</pre></div>
<p>This matcher matches if querying <var>parent</var> with <var>query</var>
captures <var>node</var>. The capture name does not matter.
</dd>
<dt id='index-query'><span><code>query</code><a href='#index-query' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This matcher is a function of one argument, <var>query</var>; it returns a
function that is called with 3 arguments: <var>node</var>, <var>parent</var>,
and <var>bol</var>, and returns non-<code>nil</code> if querying <var>parent</var>
with <var>query</var> captures <var>node</var> (see <a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>).
</p>
<div class="example">
<pre class="example">(match <var>node-type</var> <var>parent-type</var>
<var>node-field</var> <var>node-index-min</var> <var>node-index-max</var>)
</pre></div>
<p>This matcher checks if <var>node</var>&rsquo;s type is <var>node-type</var>,
<var>parent</var>&rsquo;s type is <var>parent-type</var>, <var>node</var>&rsquo;s field name in
<var>parent</var> is <var>node-field</var>, and <var>node</var>&rsquo;s index among its
siblings is between <var>node-index-min</var> and <var>node-index-max</var>. If
the value of a constraint is nil, this matcher doesn&rsquo;t check for that
constraint. For example, to match the first child where parent is
<code>argument_list</code>, use
</dd>
<dt id='index-match'><span><code>match</code><a href='#index-match' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This matcher is a function of 5 arguments: <var>node-type</var>,
<var>parent-type</var>, <var>node-field</var>, <var>node-index-min</var>, and
<var>node-index-max</var>). It returns a function that is called with 3
arguments: <var>node</var>, <var>parent</var>, and <var>bol</var>, and returns
non-<code>nil</code> if <var>node</var>&rsquo;s type matches regexp <var>node-type</var>,
<var>parent</var>&rsquo;s type matches regexp <var>parent-type</var>, <var>node</var>&rsquo;s
field name in <var>parent</var> matches regexp <var>node-field</var>, and
<var>node</var>&rsquo;s index among its siblings is between <var>node-index-min</var>
and <var>node-index-max</var>. If the value of an argument is <code>nil</code>,
this matcher doesn&rsquo;t check that argument. For example, to match the
first child where parent is <code>argument_list</code>, use
</p>
<div class="example">
<pre class="example">(match nil &quot;argument_list&quot; nil nil 0 0)
</pre></div>
<div class="example">
<pre class="example">first-sibling
</pre></div>
<p>This anchor returns the start of the first child of <var>parent</var>.
</dd>
<dt id='index-first_002dsibling'><span><code>first-sibling</code><a href='#index-first_002dsibling' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This anchor is a function that is called with 3 arguments: <var>node</var>,
<var>parent</var>, and <var>bol</var>, and returns the start of the first child
of <var>parent</var>.
</p>
<div class="example">
<pre class="example">parent
</pre></div>
<p>This anchor returns the start of <var>parent</var>.
</dd>
<dt id='index-parent'><span><code>parent</code><a href='#index-parent' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This anchor is a function that is called with 3 arguments: <var>node</var>,
<var>parent</var>, and <var>bol</var>, and returns the start of <var>parent</var>.
</p>
<div class="example">
<pre class="example">parent-bol
</pre></div>
<p>This anchor returns the beginning of non-space characters on the line
where <var>parent</var> is on.
</dd>
<dt id='index-parent_002dbol'><span><code>parent-bol</code><a href='#index-parent_002dbol' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This anchor is a function that is called with 3 arguments: <var>node</var>,
<var>parent</var>, and <var>bol</var>, and returns the first non-space character
on the line of <var>parent</var>.
</p>
<div class="example">
<pre class="example">prev-sibling
</pre></div>
<p>This anchor returns the start of the previous sibling of <var>node</var>.
</dd>
<dt id='index-prev_002dsibling'><span><code>prev-sibling</code><a href='#index-prev_002dsibling' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This anchor is a function that is called with 3 arguments: <var>node</var>,
<var>parent</var>, and <var>bol</var>, and returns the start of the previous
sibling of <var>node</var>.
</p>
<div class="example">
<pre class="example">no-indent
</pre></div>
<p>This anchor returns the start of <var>node</var>, i.e., no indent.
</dd>
<dt id='index-no_002dindent'><span><code>no-indent</code><a href='#index-no_002dindent' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This anchor is a function that is called with 3 arguments: <var>node</var>,
<var>parent</var>, and <var>bol</var>, and returns the start of <var>node</var>.
</p>
<div class="example">
<pre class="example">prev-line
</pre></div>
</dd>
<dt id='index-prev_002dline'><span><code>prev-line</code><a href='#index-prev_002dline' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This anchor is a function that is called with 3 arguments: <var>node</var>,
<var>parent</var>, and <var>bol</var>, and returns the first non-whitespace
charater on the previous line.
</p></dd>
</dl>
<p>This anchor returns the first non-whitespace charater on the previous
line.
</p></dd></dl>
</dd></dl>
<span id="Indentation-utilities"></span><h3 class="heading">Indentation utilities</h3>
<span id="index-utility-functions-for-parser_002dbased-indentation"></span>
<p>Here are some utility functions that can help writing indentation
rules.
<p>Here are some utility functions that can help writing parser-based
indentation rules.
</p>
<dl class="def">
<dt id="index-treesit_002dcheck_002dindent"><span class="category">Function: </span><span><strong>treesit-check-indent</strong> <em>mode</em><a href='#index-treesit_002dcheck_002dindent' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function checks current buffer&rsquo;s indentation against major mode
<var>mode</var>. It indents the current buffer in <var>mode</var> and compares
the indentation with the current indentation. Then it pops up a diff
buffer showing the difference. Correct indentation (target) is in
green, current indentation is in red.
</p></dd></dl>
<dd><p>This function checks the current buffer&rsquo;s indentation against major
mode <var>mode</var>. It indents the current buffer according to
<var>mode</var> and compares the results with the current indentation.
Then it pops up a buffer showing the differences. Correct
indentation (target) is shown in green color, current indentation is
shown in red color. </p></dd></dl>
<p>It is also helpful to use <code>treesit-inspect-mode</code> when writing
indentation rules.
<p>It is also helpful to use <code>treesit-inspect-mode</code> (see <a href="Language-Definitions.html">Tree-sitter Language Definitions</a>) when writing indentation rules.
</p>
</div>
<hr>

View file

@ -68,53 +68,50 @@
<hr>
<span id="Parsing-Program-Source-1"></span><h2 class="chapter">37 Parsing Program Source</h2>
<span id="index-syntax-tree_002c-from-parsing-program-source"></span>
<p>Emacs provides various ways to parse program source text and produce a
<em>syntax tree</em>. In a syntax tree, text is no longer a
one-dimensional stream but a structured tree of nodes, where each node
representing a piece of text. Thus a syntax tree can enable
interesting features like precise fontification, indentation,
<em>syntax tree</em>. In a syntax tree, text is no longer considered a
one-dimensional stream of characters, but a structured tree of nodes,
where each node representing a piece of text. Thus, a syntax tree can
enable interesting features like precise fontification, indentation,
navigation, structured editing, etc.
</p>
<p>Emacs has a simple facility for parsing balanced expressions
(see <a href="Parsing-Expressions.html">Parsing Expressions</a>). There is also SMIE library for generic
navigation and indentation (see <a href="SMIE.html">Simple Minded Indentation Engine</a>).
(see <a href="Parsing-Expressions.html">Parsing Expressions</a>). There is also the SMIE library for
generic navigation and indentation (see <a href="SMIE.html">Simple Minded Indentation Engine</a>).
</p>
<p>Emacs also provides integration with tree-sitter library
(<a href="https://tree-sitter.github.io/tree-sitter">https://tree-sitter.github.io/tree-sitter</a>) if compiled with
it. The tree-sitter library implements an incremental parser and has
support from a wide range of programming languages.
<p>In addition to those, Emacs also provides integration with
<a href="https://tree-sitter.github.io/tree-sitter">the tree-sitter
library</a>) if support for it was compiled in. The tree-sitter library
implements an incremental parser and has support from a wide range of
programming languages.
</p>
<dl class="def">
<dt id="index-treesit_002davailable_002dp"><span class="category">Function: </span><span><strong>treesit-available-p</strong><a href='#index-treesit_002davailable_002dp' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function returns non-nil if tree-sitter features are available
for this Emacs instance.
<dd><p>This function returns non-<code>nil</code> if tree-sitter features are
available for the current Emacs session.
</p></dd></dl>
<p>For tree-sitter integration with existing Emacs features,
see <a href="Parser_002dbased-Font-Lock.html">Parser-based Font Lock</a>, <a href="Parser_002dbased-Indentation.html">Parser-based Indentation</a>, and
<a href="List-Motion.html">Moving over Balanced Expressions</a>.
</p>
<p>About naming convention: use &ldquo;tree-sitter&rdquo; when referring to it as a
noun, like <code>python-use-tree-sitter</code>, but use &ldquo;treesit&rdquo; for
prefixes, like <code>python-treesit-indent-function</code>.
</p>
<p>To access the syntax tree of the text in a buffer, we need to first
load a language definition and create a parser with it. Next, we can
query the parser for specific nodes in the syntax tree. Then, we can
access various information about the node, and we can pattern-match a
node with a powerful syntax. Finally, we explain how to work with
source files that mixes multiple languages. The following sections
explain how to do each of the tasks in detail.
<p>To be able to parse the program source using the tree-sitter library
and access the syntax tree of the program, a Lisp program needs to
load a language definition library, and create a parser for that
language and the current buffer. After that, the Lisp program can
query the parser about specific nodes of the syntax tree. Then, it
can access various kinds of information about each node, and search
for nodes using a powerful pattern-matching syntax. This chapter
explains how to do all this, and also how a Lisp program can work with
source files that mix multiple programming languages.
</p>
<ul class="section-toc">
<li><a href="Language-Definitions.html" accesskey="1">Tree-sitter Language Definitions</a></li>
<li><a href="Using-Parser.html" accesskey="2">Using Tree-sitter Parser</a></li>
<li><a href="Retrieving-Node.html" accesskey="3">Retrieving Node</a></li>
<li><a href="Accessing-Node.html" accesskey="4">Accessing Node Information</a></li>
<li><a href="Accessing-Node-Information.html" accesskey="4">Accessing Node Information</a></li>
<li><a href="Pattern-Matching.html" accesskey="5">Pattern Matching Tree-sitter Nodes</a></li>
<li><a href="Multiple-Languages.html" accesskey="6">Parsing Text in Multiple Languages</a></li>
<li><a href="Tree_002dsitter-C-API.html" accesskey="7">Tree-sitter C API Correspondence</a></li>
<li><a href="Tree_002dsitter-major-modes.html" accesskey="7">Developing major modes with tree-sitter</a></li>
<li><a href="Tree_002dsitter-C-API.html" accesskey="8">Tree-sitter C API Correspondence</a></li>
</ul>
</div>
<hr>

View file

@ -34,7 +34,7 @@
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Parsing-Program-Source.html" rel="up" title="Parsing Program Source">
<link href="Multiple-Languages.html" rel="next" title="Multiple Languages">
<link href="Accessing-Node.html" rel="prev" title="Accessing Node">
<link href="Accessing-Node-Information.html" rel="prev" title="Accessing Node Information">
<style type="text/css">
<!--
a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
@ -63,32 +63,32 @@
<div class="section" id="Pattern-Matching">
<div class="header">
<p>
Next: <a href="Multiple-Languages.html" accesskey="n" rel="next">Parsing Text in Multiple Languages</a>, Previous: <a href="Accessing-Node.html" accesskey="p" rel="prev">Accessing Node Information</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
Next: <a href="Multiple-Languages.html" accesskey="n" rel="next">Parsing Text in Multiple Languages</a>, Previous: <a href="Accessing-Node-Information.html" accesskey="p" rel="prev">Accessing Node Information</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<span id="Pattern-Matching-Tree_002dsitter-Nodes"></span><h3 class="section">37.5 Pattern Matching Tree-sitter Nodes</h3>
<span id="index-pattern-matching-with-tree_002dsitter-nodes"></span>
<p>Tree-sitter let us pattern match with a small declarative language.
Pattern matching consists of two steps: first tree-sitter matches a
<em>pattern</em> against nodes in the syntax tree, then it <em>captures</em>
specific nodes in that pattern and returns the captured nodes.
<span id="index-capturing_002c-tree_002dsitter-node"></span>
<p>Tree-sitter lets Lisp programs match patterns using a small
declarative language. This pattern matching consists of two steps:
first tree-sitter matches a <em>pattern</em> against nodes in the syntax
tree, then it <em>captures</em> specific nodes that matched the pattern
and returns the captured nodes.
</p>
<p>We describe first how to write the most basic query pattern and how to
capture nodes in a pattern, then the pattern-match function, finally
more advanced pattern syntax.
capture nodes in a pattern, then the pattern-matching function, and
finally the more advanced pattern syntax.
</p>
<span id="Basic-query-syntax"></span><h3 class="heading">Basic query syntax</h3>
<span id="index-Tree_002dsitter-query-syntax"></span>
<span id="index-Tree_002dsitter-query-pattern"></span>
<span id="index-tree_002dsitter-query-pattern-syntax"></span>
<span id="index-pattern-syntax_002c-tree_002dsitter-query"></span>
<span id="index-query_002c-tree_002dsitter"></span>
<p>A <em>query</em> consists of multiple <em>patterns</em>. Each pattern is an
s-expression that matches a certain node in the syntax node. A
pattern has the following shape:
pattern has the form <code>(<var>type</var>&nbsp;(<var>child</var>&hellip;))</code><!-- /@w -->
</p>
<div class="example">
<pre class="example">(<var>type</var> <var>child</var>...)
</pre></div>
<p>For example, a pattern that matches a <code>binary_expression</code> node that
contains <code>number_literal</code> child nodes would look like
</p>
@ -96,19 +96,20 @@
<pre class="example">(binary_expression (number_literal))
</pre></div>
<p>To <em>capture</em> a node in the query pattern above, append
<code>@capture-name</code> after the node pattern you want to capture. For
example,
<p>To <em>capture</em> a node using the query pattern above, append
<code>@<var>capture-name</var></code> after the node pattern you want to
capture. For example,
</p>
<div class="example">
<pre class="example">(binary_expression (number_literal) @number-in-exp)
</pre></div>
<p>captures <code>number_literal</code> nodes that are inside a
<code>binary_expression</code> node with capture name <code>number-in-exp</code>.
<code>binary_expression</code> node with the capture name
<code>number-in-exp</code>.
</p>
<p>We can capture the <code>binary_expression</code> node too, with capture
name <code>biexp</code>:
<p>We can capture the <code>binary_expression</code> node as well, with, for
example, the capture name <code>biexp</code>:
</p>
<div class="example">
<pre class="example">(binary_expression
@ -117,34 +118,40 @@
<span id="Query-function"></span><h3 class="heading">Query function</h3>
<p>Now we can introduce the query functions.
<span id="index-query-functions_002c-tree_002dsitter"></span>
<p>Now we can introduce the <em>query functions</em>.
</p>
<dl class="def">
<dt id="index-treesit_002dquery_002dcapture"><span class="category">Function: </span><span><strong>treesit-query-capture</strong> <em>node query &amp;optional beg end node-only</em><a href='#index-treesit_002dquery_002dcapture' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function matches patterns in <var>query</var> in <var>node</var>.
Parameter <var>query</var> can be either a string, a s-expression, or a
<dd><p>This function matches patterns in <var>query</var> within <var>node</var>.
The argument <var>query</var> can be either a string, a s-expression, or a
compiled query object. For now, we focus on the string syntax;
s-expression syntax and compiled query are described at the end of the
section.
</p>
<p>Parameter <var>node</var> can also be a parser or a language symbol. A
<p>The argument <var>node</var> can also be a parser or a language symbol. A
parser means using its root node, a language symbol means find or
create a parser for that language in the current buffer, and use the
root node.
</p>
<p>The function returns all captured nodes in a list of
<code>(<var>capture_name</var> . <var>node</var>)</code>. If <var>node-only</var> is
non-nil, a list of node is returned instead. If <var>beg</var> and
<var>end</var> are both non-nil, this function only pattern matches nodes
in that range.
<p>The function returns all the captured nodes in a list of the form
<code>(<var><span class="nolinebreak">capture_name</span></var>&nbsp;.&nbsp;<var>node</var>)</code><!-- /@w -->. If <var>node-only</var> is
non-<code>nil</code>, it returns the list of nodes instead. By default the
entire text of <var>node</var> is searched, but if <var>beg</var> and <var>end</var>
are both non-<code>nil</code>, they specify the region of buffer text where
this function should match nodes. Any matching node whose span
overlaps with the region between <var>beg</var> and <var>end</var> are captured,
it doesn&rsquo;t have to be completely in the region.
</p>
<span id="index-treesit_002dquery_002derror"></span>
<p>This function raise a <var>treesit-query-error</var> if <var>query</var> is
malformed. The signal data contains a description of the specific
error. You can use <code>treesit-query-validate</code> to debug the query.
<span id="index-treesit_002dquery_002dvalidate"></span>
<p>This function raises the <code>treesit-query-error</code> error if
<var>query</var> is malformed. The signal data contains a description of
the specific error. You can use <code>treesit-query-validate</code> to
validate and debug the query.
</p></dd></dl>
<p>For example, suppose <var>node</var>&rsquo;s content is <code>1 + 2</code>, and
<p>For example, suppose <var>node</var>&rsquo;s text is <code>1 + 2</code>, and
<var>query</var> is
</p>
<div class="example">
@ -153,7 +160,7 @@
(number_literal) @number-in-exp) @biexp&quot;)
</pre></div>
<p>Querying that query would return
<p>Matching that query would return
</p>
<div class="example">
<pre class="example">(treesit-query-capture node query)
@ -162,8 +169,8 @@
(number-in-exp . <var>&lt;node for &quot;2&quot;&gt;</var>))
</pre></div>
<p>As we mentioned earlier, a <var>query</var> could contain multiple
patterns. For example, it could have two top-level patterns:
<p>As mentioned earlier, <var>query</var> could contain multiple patterns.
For example, it could have two top-level patterns:
</p>
<div class="example">
<pre class="example">(setq query
@ -173,15 +180,15 @@
<dl class="def">
<dt id="index-treesit_002dquery_002dstring"><span class="category">Function: </span><span><strong>treesit-query-string</strong> <em>string query language</em><a href='#index-treesit_002dquery_002dstring' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function parses <var>string</var> with <var>language</var>, pattern matches
its root node with <var>query</var>, and returns the result.
<dd><p>This function parses <var>string</var> with <var>language</var>, matches its
root node with <var>query</var>, and returns the result.
</p></dd></dl>
<span id="More-query-syntax"></span><h3 class="heading">More query syntax</h3>
<p>Besides node type and capture, tree-sitter&rsquo;s query syntax can express
anonymous node, field name, wildcard, quantification, grouping,
alternation, anchor, and predicate.
<p>Besides node type and capture, tree-sitter&rsquo;s pattern syntax can
express anonymous node, field name, wildcard, quantification,
grouping, alternation, anchor, and predicate.
</p>
<span id="Anonymous-node"></span><h4 class="subheading">Anonymous node</h4>
@ -194,9 +201,9 @@
<span id="Wild-card"></span><h4 class="subheading">Wild card</h4>
<p>In a query pattern, &lsquo;<samp>(_)</samp>&rsquo; matches any named node, and &lsquo;<samp>_</samp>&rsquo;
matches any named and anonymous node. For example, to capture any
named child of a <code>binary_expression</code> node, the pattern would be
<p>In a pattern, &lsquo;<samp>(_)</samp>&rsquo; matches any named node, and &lsquo;<samp>_</samp>&rsquo; matches
any named and anonymous node. For example, to capture any named child
of a <code>binary_expression</code> node, the pattern would be
</p>
<div class="example">
<pre class="example">(binary_expression (_) @in_biexp)
@ -204,7 +211,9 @@
<span id="Field-name"></span><h4 class="subheading">Field name</h4>
<p>We can capture child nodes that has specific field names:
<p>It is possible to capture child nodes that have specific field names.
In the pattern below, <code>declarator</code> and <code>body</code> are field
names, indicated by the colon following them.
</p>
<div class="example">
<pre class="example">(function_definition
@ -212,8 +221,8 @@
body: (_) @func-body)
</pre></div>
<p>We can also capture a node that doesn&rsquo;t have certain field, say, a
<code>function_definition</code> without a <code>body</code> field.
<p>It is also possible to capture a node that doesn&rsquo;t have a certain
field, say, a <code>function_definition</code> without a <code>body</code> field.
</p>
<div class="example">
<pre class="example">(function_definition !body) @func-no-body
@ -221,19 +230,20 @@
<span id="Quantify-node"></span><h4 class="subheading">Quantify node</h4>
<span id="index-quantify-node_002c-tree_002dsitter"></span>
<p>Tree-sitter recognizes quantification operators &lsquo;<samp>*</samp>&rsquo;, &lsquo;<samp>+</samp>&rsquo; and
&lsquo;<samp>?</samp>&rsquo;. Their meanings are the same as in regular expressions:
&lsquo;<samp>*</samp>&rsquo; matches the preceding pattern zero or more times, &lsquo;<samp>+</samp>&rsquo;
matches one or more times, and &lsquo;<samp>?</samp>&rsquo; matches zero or one time.
</p>
<p>For example, this pattern matches <code>type_declaration</code> nodes
that has <em>zero or more</em> <code>long</code> keyword.
<p>For example, the following pattern matches <code>type_declaration</code>
nodes that has <em>zero or more</em> <code>long</code> keyword.
</p>
<div class="example">
<pre class="example">(type_declaration &quot;long&quot;*) @long-type
</pre></div>
<p>And this pattern matches a type declaration that has zero or one
<p>The following pattern matches a type declaration that has zero or one
<code>long</code> keyword:
</p>
<div class="example">
@ -242,8 +252,8 @@
<span id="Grouping"></span><h4 class="subheading">Grouping</h4>
<p>Similar to groups in regular expression, we can bundle patterns into a
group and apply quantification operators to it. For example, to
<p>Similar to groups in regular expression, we can bundle patterns into
groups and apply quantification operators to them. For example, to
express a comma separated list of identifiers, one could write
</p>
<div class="example">
@ -253,9 +263,9 @@
<span id="Alternation"></span><h4 class="subheading">Alternation</h4>
<p>Again, similar to regular expressions, we can express &ldquo;match anyone
from this group of patterns&rdquo; in the query pattern. The syntax is a
list of patterns enclosed in square brackets. For example, to capture
some keywords in C, the query pattern would be
from this group of patterns&rdquo; in a pattern. The syntax is a list of
patterns enclosed in square brackets. For example, to capture some
keywords in C, the pattern would be
</p>
<div class="example">
<pre class="example">[
@ -277,11 +287,13 @@
<div class="example">
<pre class="example">;; Anchor the child with the end of its parent.
(compound_expression (_) @last-child .)
</pre><pre class="example">
;; Anchor the child with the beginning of its parent.
</pre><pre class="example">;; Anchor the child with the beginning of its parent.
(compound_expression . (_) @first-child)
</pre><pre class="example">
;; Anchor two adjacent children.
</pre><pre class="example">;; Anchor two adjacent children.
(compound_expression
(_) @prev-child
.
@ -293,8 +305,8 @@
</p>
<span id="Predicate"></span><h4 class="subheading">Predicate</h4>
<p>We can add predicate constraints to a pattern. For example, if we use
the following query pattern
<p>It is possible to add predicate constraints to a pattern. For
example, with the following pattern:
</p>
<div class="example">
<pre class="example">(
@ -303,33 +315,35 @@
)
</pre></div>
<p>Then tree-sitter only matches arrays where the first element equals to
<p>tree-sitter only matches arrays where the first element equals to
the last element. To attach a predicate to a pattern, we need to
group then together. A predicate always starts with a &lsquo;<samp>#</samp>&rsquo;.
group them together. A predicate always starts with a &lsquo;<samp>#</samp>&rsquo;.
Currently there are two predicates, <code>#equal</code> and <code>#match</code>.
</p>
<dl class="def">
<dt id="index-equal-1"><span class="category">Predicate: </span><span><strong>equal</strong> <em>arg1 arg2</em><a href='#index-equal-1' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>Matches if <var>arg1</var> equals to <var>arg2</var>. Arguments can be either a
string or a capture name. Capture names represent the text that the
<dd><p>Matches if <var>arg1</var> equals to <var>arg2</var>. Arguments can be either
strings or capture names. Capture names represent the text that the
captured node spans in the buffer.
</p></dd></dl>
<dl class="def">
<dt id="index-match"><span class="category">Predicate: </span><span><strong>match</strong> <em>regexp capture-name</em><a href='#index-match' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>Matches if the text that <var>capture-name</var>s node spans in the buffer
<dt id="index-match-1"><span class="category">Predicate: </span><span><strong>match</strong> <em>regexp capture-name</em><a href='#index-match-1' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>Matches if the text that <var>capture-name</var>&rsquo;s node spans in the buffer
matches regular expression <var>regexp</var>. Matching is case-sensitive.
</p></dd></dl>
<p>Note that a predicate can only refer to capture names appeared in the
same pattern. Indeed, it makes little sense to refer to capture names
in other patterns anyway.
<p>Note that a predicate can only refer to capture names that appear in
the same pattern. Indeed, it makes little sense to refer to capture
names in other patterns.
</p>
<span id="S_002dexpression-patterns"></span><h3 class="heading">S-expression patterns</h3>
<p>Besides strings, Emacs provides a s-expression based syntax for query
patterns. It largely resembles the string-based syntax. For example,
the following pattern
<span id="index-tree_002dsitter-patterns-as-sexps"></span>
<span id="index-patterns_002c-tree_002dsitter_002c-in-sexp-form"></span>
<p>Besides strings, Emacs provides a s-expression based syntax for
tree-sitter patterns. It largely resembles the string-based syntax.
For example, the following query
</p>
<div class="example">
<pre class="example">(treesit-query-capture
@ -353,9 +367,8 @@
[&quot;return&quot; &quot;break&quot;] @keyword))
</pre></div>
<p>Most pattern syntax can be written directly as strange but
never-the-less valid s-expressions. Only a few of them needs
modification:
<p>Most patterns can be written directly as strange but nevertheless
valid s-expressions. Only a few of them needs modification:
</p>
<ul>
<li> Anchor &lsquo;<samp>.</samp>&rsquo; is written as <code>:anchor</code>.
@ -386,42 +399,50 @@
<span id="Compiling-queries"></span><h3 class="heading">Compiling queries</h3>
<p>If a query will be used repeatedly, especially in tight loops, it is
important to compile that query, because a compiled query is much
faster than an uncompiled one. A compiled query can be used anywhere
a query is accepted.
<span id="index-compiling-tree_002dsitter-queries"></span>
<span id="index-queries_002c-compiling"></span>
<p>If a query is intended to be used repeatedly, especially in tight
loops, it is important to compile that query, because a compiled query
is much faster than an uncompiled one. A compiled query can be used
anywhere a query is accepted.
</p>
<dl class="def">
<dt id="index-treesit_002dquery_002dcompile"><span class="category">Function: </span><span><strong>treesit-query-compile</strong> <em>language query</em><a href='#index-treesit_002dquery_002dcompile' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function compiles <var>query</var> for <var>language</var> into a compiled
query object and returns it.
</p>
<p>This function raise a <var>treesit-query-error</var> if <var>query</var> is
malformed. The signal data contains a description of the specific
error. You can use <code>treesit-query-validate</code> to debug the query.
<p>This function raises the <code>treesit-query-error</code> error if
<var>query</var> is malformed. The signal data contains a description of
the specific error. You can use <code>treesit-query-validate</code> to
validate and debug the query.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dquery_002dlanguage"><span class="category">Function: </span><span><strong>treesit-query-language</strong> <em>query</em><a href='#index-treesit_002dquery_002dlanguage' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function return the language of <var>query</var>.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dquery_002dexpand"><span class="category">Function: </span><span><strong>treesit-query-expand</strong> <em>query</em><a href='#index-treesit_002dquery_002dexpand' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function expands the s-expression <var>query</var> into a string
query.
<dd><p>This function converts the s-expression <var>query</var> into the string
format.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dpattern_002dexpand"><span class="category">Function: </span><span><strong>treesit-pattern-expand</strong> <em>pattern</em><a href='#index-treesit_002dpattern_002dexpand' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function expands the s-expression <var>pattern</var> into a string
pattern.
<dd><p>This function converts the s-expression <var>pattern</var> into the string
format.
</p></dd></dl>
<p>Finally, tree-sitter project&rsquo;s documentation about
pattern-matching can be found at
<p>For more details, read the tree-sitter project&rsquo;s documentation about
pattern-matching, which can be found at
<a href="https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries">https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries</a>.
</p>
</div>
<hr>
<div class="header">
<p>
Next: <a href="Multiple-Languages.html">Parsing Text in Multiple Languages</a>, Previous: <a href="Accessing-Node.html">Accessing Node Information</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
Next: <a href="Multiple-Languages.html">Parsing Text in Multiple Languages</a>, Previous: <a href="Accessing-Node-Information.html">Accessing Node Information</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>

View file

@ -33,7 +33,7 @@
<link href="Index.html" rel="index" title="Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Parsing-Program-Source.html" rel="up" title="Parsing Program Source">
<link href="Accessing-Node.html" rel="next" title="Accessing Node">
<link href="Accessing-Node-Information.html" rel="next" title="Accessing Node Information">
<link href="Using-Parser.html" rel="prev" title="Using Parser">
<style type="text/css">
<!--
@ -63,77 +63,90 @@
<div class="section" id="Retrieving-Node">
<div class="header">
<p>
Next: <a href="Accessing-Node.html" accesskey="n" rel="next">Accessing Node Information</a>, Previous: <a href="Using-Parser.html" accesskey="p" rel="prev">Using Tree-sitter Parser</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
Next: <a href="Accessing-Node-Information.html" accesskey="n" rel="next">Accessing Node Information</a>, Previous: <a href="Using-Parser.html" accesskey="p" rel="prev">Using Tree-sitter Parser</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<span id="Retrieving-Node-1"></span><h3 class="section">37.3 Retrieving Node</h3>
<span id="index-retrieve-node_002c-tree_002dsitter"></span>
<span id="index-tree_002dsitter_002c-find-node"></span>
<span id="index-get-node_002c-tree_002dsitter"></span>
<span id="index-tree_002dsitter-find-node"></span>
<span id="index-tree_002dsitter-get-node"></span>
<p>Before we continue, lets go over some conventions of tree-sitter
functions.
<span id="index-terminology_002c-for-tree_002dsitter-functions"></span>
<p>Here&rsquo;s some terminology and conventions we use when documenting
tree-sitter functions.
</p>
<p>We talk about a node being &ldquo;smaller&rdquo; or &ldquo;larger&rdquo;, and &ldquo;lower&rdquo; or
&ldquo;higher&rdquo;. A smaller and lower node is lower in the syntax tree and
therefore spans a smaller piece of text; a larger and higher node is
higher up in the syntax tree, containing many smaller nodes as its
children, and therefore spans a larger piece of text.
therefore spans a smaller portion of buffer text; a larger and higher
node is higher up in the syntax tree, it contains many smaller nodes
as its children, and therefore spans a larger portion of text.
</p>
<p>When a function cannot find a node, it returns nil. And for the
convenience for function chaining, all the functions that take a node
as argument and returns a node accept the node to be nil; in that
case, the function just returns nil.
<p>When a function cannot find a node, it returns <code>nil</code>. For
convenience, all functions that take a node as argument and return
a node, also accept the node argument of <code>nil</code> and in that case
just return <code>nil</code>.
</p>
<span id="index-treesit_002dnode_002doutdated"></span>
<p>Nodes are not automatically updated when the associated buffer is
modified. And there is no way to update a node once it is retrieved.
Using an outdated node throws <code>treesit-node-outdated</code> error.
modified, and there is no way to update a node once it is retrieved.
Using an outdated node signals the <code>treesit-node-outdated</code> error.
</p>
<span id="Retrieving-node-from-syntax-tree"></span><h3 class="heading">Retrieving node from syntax tree</h3>
<span id="index-retrieving-tree_002dsitter-nodes"></span>
<span id="index-syntax-tree_002c-retrieving-nodes"></span>
<dl class="def">
<dt id="index-treesit_002dnode_002dat"><span class="category">Function: </span><span><strong>treesit-node-at</strong> <em>beg end &amp;optional parser-or-lang named</em><a href='#index-treesit_002dnode_002dat' class='copiable-anchor'> &para;</a></span></dt>
<dt id="index-treesit_002dnode_002dat"><span class="category">Function: </span><span><strong>treesit-node-at</strong> <em>pos &amp;optional parser-or-lang named</em><a href='#index-treesit_002dnode_002dat' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function returns the <em>smallest</em> node that starts at or after
the <var>point</var>. In other words, the start of the node is equal or
greater than <var>point</var>.
the buffer position <var>pos</var>. In other words, the start of the node
is greater or equal to <var>pos</var>.
</p>
<p>When <var>parser-or-lang</var> is nil, this function uses the first parser
in <code>(treesit-parser-list)</code> in the current buffer. If
<var>parser-or-lang</var> is a parser object, it use that parser; if
<var>parser-or-lang</var> is a language, it finds the first parser using
that language in <code>(treesit-parser-list)</code> and use that.
<p>When <var>parser-or-lang</var> is <code>nil</code> or omitted, this function uses
the first parser in <code>(treesit-parser-list)</code> of the current
buffer. If <var>parser-or-lang</var> is a parser object, it uses that
parser; if <var>parser-or-lang</var> is a language, it finds the first
parser using that language in <code>(treesit-parser-list)</code>, and uses
that.
</p>
<p>If <var>named</var> is non-nil, this function looks for a named node
<p>If <var>named</var> is non-<code>nil</code>, this function looks for a named node
only (see <a href="Language-Definitions.html#tree_002dsitter-named-node">named node</a>).
</p>
<p>When <var>pos</var> is after all the text in the buffer, technically there
is no node after <var>pos</var>. But for convenience, this function will
return the last leaf node in the parse tree. If <var>strict</var> is
non-<code>nil</code>, this function will strictly comply to the semantics and
return <var>nil</var>.
</p>
<p>Example:
</p><div class="example">
</p>
<div class="example">
<pre class="example">;; Find the node at point in a C parser's syntax tree.
(treesit-node-at (point) 'c)
</pre></div>
&rArr; #&lt;treesit-node (primitive_type) in 23-27&gt;
</pre></div>
</dd></dl>
<dl class="def">
<dt id="index-treesit_002dnode_002don"><span class="category">Function: </span><span><strong>treesit-node-on</strong> <em>beg end &amp;optional parser-or-lang named</em><a href='#index-treesit_002dnode_002don' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function returns the <em>smallest</em> node that covers the span
from <var>beg</var> to <var>end</var>. In other words, the start of the node is
less or equal to <var>beg</var>, and the end of the node is greater or
equal to <var>end</var>.
<dd><p>This function returns the <em>smallest</em> node that covers the region
of buffer text between <var>beg</var> and <var>end</var>. In other words, the
start of the node is before or at <var>beg</var>, and the end of the node
is at or after <var>end</var>.
</p>
<p><em>Beware</em> that calling this function on an empty line that is not
inside any top-level construct (function definition, etc) most
<p><em>Beware:</em> calling this function on an empty line that is not
inside any top-level construct (function definition, etc.) most
probably will give you the root node, because the root node is the
smallest node that covers that empty line. Most of the time, you want
to use <code>treesit-node-at</code>.
to use <code>treesit-node-at</code>, described above, instead.
</p>
<p>When <var>parser-or-lang</var> is nil, this function uses the first parser
in <code>(treesit-parser-list)</code> in the current buffer. If
<var>parser-or-lang</var> is a parser object, it use that parser; if
<p>When <var>parser-or-lang</var> is <code>nil</code>, this function uses the first
parser in <code>(treesit-parser-list)</code> of the current buffer. If
<var>parser-or-lang</var> is a parser object, it uses that parser; if
<var>parser-or-lang</var> is a language, it finds the first parser using
that language in <code>(treesit-parser-list)</code> and use that.
that language in <code>(treesit-parser-list)</code>, and uses that.
</p>
<p>If <var>named</var> is non-nil, this function looks for a named node only
(see <a href="Language-Definitions.html#tree_002dsitter-named-node">named node</a>).
<p>If <var>named</var> is non-<code>nil</code>, this function looks for a named node
only (see <a href="Language-Definitions.html#tree_002dsitter-named-node">named node</a>).
</p></dd></dl>
<dl class="def">
@ -145,17 +158,21 @@
<dl class="def">
<dt id="index-treesit_002dbuffer_002droot_002dnode"><span class="category">Function: </span><span><strong>treesit-buffer-root-node</strong> <em>&amp;optional language</em><a href='#index-treesit_002dbuffer_002droot_002dnode' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function finds the first parser that uses <var>language</var> in
<code>(treesit-parser-list)</code> in the current buffer, and returns the
root node of that buffer. If it cannot find an appropriate parser,
nil is returned.
<code>(treesit-parser-list)</code> of the current buffer, and returns the
root node generated by that parser. If it cannot find an appropriate
parser, it returns <code>nil</code>.
</p></dd></dl>
<p>Once we have a node, we can retrieve other nodes from it, or query for
information about this node.
<p>Given a node, a Lisp program can retrieve other nodes starting from
it, or query for information about this node.
</p>
<span id="Retrieving-node-from-other-nodes"></span><h3 class="heading">Retrieving node from other nodes</h3>
<span id="index-syntax-tree-nodes_002c-retrieving-from-other-nodes"></span>
<span id="By-kinship"></span><h4 class="subheading">By kinship</h4>
<span id="index-kinship_002c-syntax-tree-nodes"></span>
<span id="index-nodes_002c-by-kinship"></span>
<span id="index-syntax-tree-nodes_002c-by-kinship"></span>
<dl class="def">
<dt id="index-treesit_002dnode_002dparent"><span class="category">Function: </span><span><strong>treesit-node-parent</strong> <em>node</em><a href='#index-treesit_002dnode_002dparent' class='copiable-anchor'> &para;</a></span></dt>
@ -165,132 +182,162 @@
<dl class="def">
<dt id="index-treesit_002dnode_002dchild"><span class="category">Function: </span><span><strong>treesit-node-child</strong> <em>node n &amp;optional named</em><a href='#index-treesit_002dnode_002dchild' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function returns the <var>n</var>&rsquo;th child of <var>node</var>. If
<var>named</var> is non-nil, then it only counts named nodes
(see <a href="Language-Definitions.html#tree_002dsitter-named-node">named node</a>). For example, in a node
that represents a string: <code>&quot;text&quot;</code>, there are three children
nodes: the opening quote <code>&quot;</code>, the string content <code>text</code>, and
the enclosing quote <code>&quot;</code>. Among these nodes, the first child is
the opening quote <code>&quot;</code>, the first named child is the string
content <code>text</code>.
<var>named</var> is non-<code>nil</code>, it counts only named nodes
(see <a href="Language-Definitions.html#tree_002dsitter-named-node">named node</a>).
</p>
<p>For example, in a node that represents a string <code>&quot;text&quot;</code>, there
are three children nodes: the opening quote <code>&quot;</code>, the string text
<code>text</code>, and the closing quote <code>&quot;</code>. Among these nodes, the
first child is the opening quote <code>&quot;</code>, and the first named child
is the string text.
</p>
<p>This function returns <code>nil</code> if there is no <var>n</var>&rsquo;th child.
<var>n</var> could be negative, e.g., <code>-1</code> represents the last child.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dnode_002dchildren"><span class="category">Function: </span><span><strong>treesit-node-children</strong> <em>node &amp;optional named</em><a href='#index-treesit_002dnode_002dchildren' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function returns all of <var>node</var>&rsquo;s children in a list. If
<var>named</var> is non-nil, then it only retrieves named nodes.
<dd><p>This function returns all of <var>node</var>&rsquo;s children as a list. If
<var>named</var> is non-<code>nil</code>, it retrieves only named nodes.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dnext_002dsibling"><span class="category">Function: </span><span><strong>treesit-next-sibling</strong> <em>node &amp;optional named</em><a href='#index-treesit_002dnext_002dsibling' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function finds the next sibling of <var>node</var>. If <var>named</var> is
non-nil, it finds the next named sibling.
non-<code>nil</code>, it finds the next named sibling.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dprev_002dsibling"><span class="category">Function: </span><span><strong>treesit-prev-sibling</strong> <em>node &amp;optional named</em><a href='#index-treesit_002dprev_002dsibling' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function finds the previous sibling of <var>node</var>. If
<var>named</var> is non-nil, it finds the previous named sibling.
<var>named</var> is non-<code>nil</code>, it finds the previous named sibling.
</p></dd></dl>
<span id="By-field-name"></span><h4 class="subheading">By field name</h4>
<span id="index-nodes_002c-by-field-name"></span>
<span id="index-syntax-tree-nodes_002c-by-field-name"></span>
<p>To make the syntax tree easier to analyze, many language definitions
assign <em>field names</em> to child nodes (see <a href="Language-Definitions.html#tree_002dsitter-node-field-name">field name</a>). For example, a <code>function_definition</code> node
could have a <code>declarator</code> and a <code>body</code>.
could have a <code>declarator</code> node and a <code>body</code> node.
</p>
<dl class="def">
<dt id="index-treesit_002dchild_002dby_002dfield_002dname"><span class="category">Function: </span><span><strong>treesit-child-by-field-name</strong> <em>node field-name</em><a href='#index-treesit_002dchild_002dby_002dfield_002dname' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function finds the child of <var>node</var> that has <var>field-name</var>
as its field name.
<dd><p>This function finds the child of <var>node</var> whose field name is
<var>field-name</var>, a string.
</p>
<div class="example">
<pre class="example">;; Get the child that has &quot;body&quot; as its field name.
(treesit-child-by-field-name node &quot;body&quot;)
</pre></div>
&rArr; #&lt;treesit-node (compound_statement) in 45-89&gt;
</pre></div>
</dd></dl>
<span id="By-position"></span><h4 class="subheading">By position</h4>
<span id="index-nodes_002c-by-position"></span>
<span id="index-syntax-tree-nodes_002c-by-position"></span>
<dl class="def">
<dt id="index-treesit_002dfirst_002dchild_002dfor_002dpos"><span class="category">Function: </span><span><strong>treesit-first-child-for-pos</strong> <em>node pos &amp;optional named</em><a href='#index-treesit_002dfirst_002dchild_002dfor_002dpos' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function finds the first child of <var>node</var> that extends beyond
<var>pos</var>. &ldquo;Extend beyond&rdquo; means the end of the child node &gt;=
<var>pos</var>. This function only looks for immediate children of
<var>node</var>, and doesn&rsquo;t look in its grand children. If <var>named</var> is
non-nil, it only looks for named child (see <a href="Language-Definitions.html#tree_002dsitter-named-node">named node</a>).
buffer position <var>pos</var>. &ldquo;Extends beyond&rdquo; means the end of the
child node is greater or equal to <var>pos</var>. This function only looks
for immediate children of <var>node</var>, and doesn&rsquo;t look in its
grandchildren. If <var>named</var> is non-<code>nil</code>, it looks for the
first named child (see <a href="Language-Definitions.html#tree_002dsitter-named-node">named node</a>).
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dnode_002ddescendant_002dfor_002drange"><span class="category">Function: </span><span><strong>treesit-node-descendant-for-range</strong> <em>node beg end &amp;optional named</em><a href='#index-treesit_002dnode_002ddescendant_002dfor_002drange' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function finds the <em>smallest</em> child/grandchild... of
<var>node</var> that spans the range from <var>beg</var> to <var>end</var>. It is
similar to <code>treesit-node-at</code>. If <var>named</var> is non-nil, it only
looks for named child.
<dd><p>This function finds the <em>smallest</em> descendant node of <var>node</var>
that spans the region of text between positions <var>beg</var> and
<var>end</var>. It is similar to <code>treesit-node-at</code>. If <var>named</var>
is non-<code>nil</code>, it looks for smallest named child.
</p></dd></dl>
<span id="Searching-for-node"></span><h3 class="heading">Searching for node</h3>
<dl class="def">
<dt id="index-treesit_002dsearch_002dsubtree"><span class="category">Function: </span><span><strong>treesit-search-subtree</strong> <em>node predicate &amp;optional all backward limit</em><a href='#index-treesit_002dsearch_002dsubtree' class='copiable-anchor'> &para;</a></span></dt>
<dt id="index-treesit_002dsearch_002dsubtree"><span class="category">Function: </span><span><strong>treesit-search-subtree</strong> <em>node predicate &amp;optional backward all limit</em><a href='#index-treesit_002dsearch_002dsubtree' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function traverses the subtree of <var>node</var> (including
<var>node</var>), and match <var>predicate</var> with each node along the way.
And <var>predicate</var> is a regexp that matches (case-insensitively)
against each node&rsquo;s type, or a function that takes a node and returns
nil/non-nil. If a node matches, that node is returned, if no node
ever matches, nil is returned.
<var>node</var> itself), looking for a node for which <var>predicate</var>
returns non-<code>nil</code>. <var>predicate</var> is a regexp that is matched
(case-insensitively) against each node&rsquo;s type, or a predicate function
that takes a node and returns non-<code>nil</code> if the node matches. The
function returns the first node that matches, or <code>nil</code> if none
does.
</p>
<p>By default, this function only traverses named nodes, if <var>all</var> is
non-nil, it traverses all nodes. If <var>backward</var> is non-nil, it
traverses backwards. If <var>limit</var> is non-nil, it only traverses
that number of levels down in the tree.
<p>By default, this function only traverses named nodes, but if <var>all</var>
is non-<code>nil</code>, it traverses all the nodes. If <var>backward</var> is
non-<code>nil</code>, it traverses backwards (i.e., it visits the last child first
when traversing down the tree). If <var>limit</var> is non-<code>nil</code>, it
must be a number that limits the tree traversal to that many levels
down the tree.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dsearch_002dforward"><span class="category">Function: </span><span><strong>treesit-search-forward</strong> <em>start predicate &amp;optional all backward up</em><a href='#index-treesit_002dsearch_002dforward' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function is somewhat similar to <code>treesit-search-subtree</code>.
It also traverse the parse tree and match each node with
<var>predicate</var> (except for <var>start</var>), where <var>predicate</var> can be
a (case-insensitive) regexp or a function. For a tree like the below
where <var>start</var> is marked 1, this function traverses as numbered:
<dt id="index-treesit_002dsearch_002dforward"><span class="category">Function: </span><span><strong>treesit-search-forward</strong> <em>start predicate &amp;optional backward all</em><a href='#index-treesit_002dsearch_002dforward' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>Like <code>treesit-search-subtree</code>, this function also traverses the
parse tree and matches each node with <var>predicate</var> (except for
<var>start</var>), where <var>predicate</var> can be a (case-insensitive) regexp
or a function. For a tree like the below where <var>start</var> is marked
S, this function traverses as numbered from 1 to 12:
</p>
<div class="example">
<pre class="example"> o
<pre class="example"> 12
|
3--------4-----------8
| | |
o--o-+--1 5--+--6 9---+-----12
| | | | | |
o o 2 7 +-+-+ +--+--+
| | | | |
10 11 13 14 15
S--------3----------11
| | |
o--o-+--o 1--+--2 6--+-----10
| | | |
o o +-+-+ +--+--+
| | | | |
4 5 7 8 9
</pre></div>
<p>Same as in <code>treesit-search-subtree</code>, this function only searches
for named nodes by default. But if <var>all</var> is non-nil, it searches
for all nodes. If <var>backward</var> is non-nil, it searches backwards.
<p>Note that this function doesn&rsquo;t traverse the subtree of <var>start</var>,
and it always traverse leaf nodes first, then upwards.
</p>
<p>If <var>up</var> is non-nil, this function will only traverse to siblings
and parents. In that case, only 1 3 4 8 would be traversed.
<p>Like <code>treesit-search-subtree</code>, this function only searches for
named nodes by default, but if <var>all</var> is non-<code>nil</code>, it
searches for all nodes. If <var>backward</var> is non-<code>nil</code>, it
searches backwards.
</p>
<p>While <code>treesit-search-subtree</code> traverses the subtree of a node,
this function starts with node <var>start</var> and traverses every node
that comes after it in the buffer position order, i.e., nodes with
start positions greater than the end position of <var>start</var>.
</p>
<p>In the tree shown above, <code>treesit-search-subtree</code> traverses node
S (<var>start</var>) and nodes marked with <code>o</code>, where this function
traverses the nodes marked with numbers. This function is useful for
answering questions like &ldquo;what is the first node after <var>start</var> in
the buffer that satisfies some condition?&rdquo;
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dsearch_002dforward_002dgoto"><span class="category">Function: </span><span><strong>treesit-search-forward-goto</strong> <em>predicate side &amp;optional all backward up</em><a href='#index-treesit_002dsearch_002dforward_002dgoto' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function jumps to the start or end of the next node in buffer
that matches <var>predicate</var>. Parameters <var>predicate</var>, <var>all</var>,
<var>backward</var>, and <var>up</var> are the same as in
<code>treesit-search-forward</code>. And <var>side</var> controls which side of
the matched no do we stop at, it can be <code>start</code> or <code>end</code>.
<dt id="index-treesit_002dsearch_002dforward_002dgoto"><span class="category">Function: </span><span><strong>treesit-search-forward-goto</strong> <em>node predicate &amp;optional start backward all</em><a href='#index-treesit_002dsearch_002dforward_002dgoto' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function moves point to the start or end of the next node after
<var>node</var> in the buffer that matches <var>predicate</var>. If <var>start</var>
is non-<code>nil</code>, stop at the beginning rather than the end of a node.
</p>
<p>This function guarantees that the matched node it returns makes
progress in terms of buffer position: the start/end position of the
returned node is always greater than that of <var>node</var>.
</p>
<p>Arguments <var>predicate</var>, <var>backward</var> and <var>all</var> are the same
as in <code>treesit-search-forward</code>.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dinduce_002dsparse_002dtree"><span class="category">Function: </span><span><strong>treesit-induce-sparse-tree</strong> <em>root predicate &amp;optional process-fn limit</em><a href='#index-treesit_002dinduce_002dsparse_002dtree' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function creates a sparse tree from <var>root</var>&rsquo;s subtree.
</p>
<p>Basically, it takes the subtree under <var>root</var>, and combs it so only
the nodes that match <var>predicate</var> are left, like picking out grapes
on the vine. Like previous functions, <var>predicate</var> can be a regexp
string that matches against each node&rsquo;s type case-insensitively, or a
function that takes a node and return nil/non-nil.
<p>It takes the subtree under <var>root</var>, and combs it so only the nodes
that match <var>predicate</var> are left. Like previous functions, the
<var>predicate</var> can be a regexp string that matches against each
node&rsquo;s type case-insensitively, or a function that takes a node and
return non-<code>nil</code> if it matches.
</p>
<p>For example, for a subtree on the left that consist of both numbers
and letters, if <var>predicate</var> is &ldquo;letter only&rdquo;, the returned tree
@ -310,50 +357,63 @@
e 5 e
</pre></div>
<p>If <var>process-fn</var> is non-nil, instead of returning the matched
<p>If <var>process-fn</var> is non-<code>nil</code>, instead of returning the matched
nodes, this function passes each node to <var>process-fn</var> and uses the
returned value instead. If non-nil, <var>limit</var> is the number of
returned value instead. If non-<code>nil</code>, <var>limit</var> is the number of
levels to go down from <var>root</var>.
</p>
<p>Each node in the returned tree looks like <code>(<var>tree-sitter
node</var> . (<var>child</var> ...))</code>. The <var>tree-sitter node</var> of the root
of this tree will be nil if <var>ROOT</var> doesn&rsquo;t match <var>pred</var>. If
no node matches <var>predicate</var>, return nil.
<p>Each node in the returned tree looks like
<code>(<var><span class="nolinebreak">tree-sitter-node</span></var>&nbsp;.&nbsp;(<var>child</var>&nbsp;&hellip;))</code><!-- /@w -->. The
<var>tree-sitter-node</var> of the root of this tree will be nil if
<var>root</var> doesn&rsquo;t match <var>predicate</var>. If no node matches
<var>predicate</var>, the function returns <code>nil</code>.
</p></dd></dl>
<span id="More-convenient-functions"></span><h3 class="heading">More convenient functions</h3>
<span id="More-convenience-functions"></span><h3 class="heading">More convenience functions</h3>
<dl class="def">
<dt id="index-treesit_002dfilter_002dchild"><span class="category">Function: </span><span><strong>treesit-filter-child</strong> <em>node pred &amp;optional named</em><a href='#index-treesit_002dfilter_002dchild' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function finds immediate children of <var>node</var> that satisfies
<var>pred</var>.
<dt id="index-treesit_002dfilter_002dchild"><span class="category">Function: </span><span><strong>treesit-filter-child</strong> <em>node predicate &amp;optional named</em><a href='#index-treesit_002dfilter_002dchild' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function finds immediate children of <var>node</var> that satisfy
<var>predicate</var>.
</p>
<p>Function <var>pred</var> takes the child node as the argument and should
return non-nil to indicated keeping the child. If <var>named</var>
non-nil, this function only searches for named nodes.
<p>The <var>predicate</var> function takes a node as the argument and should
return non-<code>nil</code> to indicate that the node should be kept. If
<var>named</var> is non-<code>nil</code>, this function only examines the named
nodes.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dparent_002duntil"><span class="category">Function: </span><span><strong>treesit-parent-until</strong> <em>node pred</em><a href='#index-treesit_002dparent_002duntil' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function repeatedly finds the parent of <var>node</var>, and returns
the parent if it satisfies <var>pred</var> (which takes the parent as the
argument). If no parent satisfies <var>pred</var>, this function returns
nil.
<dt id="index-treesit_002dparent_002duntil"><span class="category">Function: </span><span><strong>treesit-parent-until</strong> <em>node predicate</em><a href='#index-treesit_002dparent_002duntil' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function repeatedly finds the parents of <var>node</var>, and returns
the parent that satisfies <var>predicate</var>, a function that takes a
node as the argument. If no parent satisfies <var>predicate</var>, this
function returns <code>nil</code>.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dparent_002dwhile"><span class="category">Function: </span><span><strong>treesit-parent-while</strong><a href='#index-treesit_002dparent_002dwhile' class='copiable-anchor'> &para;</a></span></dt>
<dt id="index-treesit_002dparent_002dwhile"><span class="category">Function: </span><span><strong>treesit-parent-while</strong> <em>node predicate</em><a href='#index-treesit_002dparent_002dwhile' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function repeatedly finds the parent of <var>node</var>, and keeps
doing so as long as the parent satisfies <var>pred</var> (which takes the
parent as the single argument). I.e., this function returns the
farthest parent that still satisfies <var>pred</var>.
doing so as long as the nodes satisfy <var>predicate</var>, a function that
takes a node as the argument. That is, this function returns the
farthest parent that still satisfies <var>predicate</var>.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dnode_002dtop_002dlevel"><span class="category">Function: </span><span><strong>treesit-node-top-level</strong> <em>node &amp;optional type</em><a href='#index-treesit_002dnode_002dtop_002dlevel' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function returns the highest parent of <var>node</var> that has the
same type as <var>node</var>. If no such parent exists, it returns
<code>nil</code>. Therefore this function is also useful for testing
whether <var>node</var> is top-level.
</p>
<p>If <var>type</var> is non-<code>nil</code>, this function matches each parent&rsquo;s
type with <var>type</var> as a regexp, rather than using <var>node</var>&rsquo;s type.
</p></dd></dl>
</div>
<hr>
<div class="header">
<p>
Next: <a href="Accessing-Node.html">Accessing Node Information</a>, Previous: <a href="Using-Parser.html">Using Tree-sitter Parser</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
Next: <a href="Accessing-Node-Information.html">Accessing Node Information</a>, Previous: <a href="Using-Parser.html">Using Tree-sitter Parser</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>

View file

@ -33,7 +33,7 @@
<link href="Index.html" rel="index" title="Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Parsing-Program-Source.html" rel="up" title="Parsing Program Source">
<link href="Multiple-Languages.html" rel="prev" title="Multiple Languages">
<link href="Tree_002dsitter-major-modes.html" rel="prev" title="Tree-sitter major modes">
<style type="text/css">
<!--
a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
@ -62,23 +62,23 @@
<div class="section" id="Tree_002dsitter-C-API">
<div class="header">
<p>
Previous: <a href="Multiple-Languages.html" accesskey="p" rel="prev">Parsing Text in Multiple Languages</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
Previous: <a href="Tree_002dsitter-major-modes.html" accesskey="p" rel="prev">Developing major modes with tree-sitter</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<span id="Tree_002dsitter-C-API-Correspondence"></span><h3 class="section">37.7 Tree-sitter C API Correspondence</h3>
<span id="Tree_002dsitter-C-API-Correspondence"></span><h3 class="section">37.8 Tree-sitter C API Correspondence</h3>
<p>Emacs&rsquo; tree-sitter integration doesn&rsquo;t expose every feature
tree-sitter&rsquo;s C API provides. Missing features include:
provided by tree-sitter&rsquo;s C API. Missing features include:
</p>
<ul>
<li> Creating a tree cursor and navigating the syntax tree with it.
</li><li> Setting timeout and cancellation flag for a parser.
</li><li> Setting the logger for a parser.
</li><li> Printing a DOT graph of the syntax tree to a file.
</li><li> Coping and modifying a syntax tree. (Emacs doesn&rsquo;t expose a tree
</li><li> Printing a <acronym>DOT</acronym> graph of the syntax tree to a file.
</li><li> Copying and modifying a syntax tree. (Emacs doesn&rsquo;t expose a tree
object.)
</li><li> Using (row, column) coordinates as position.
</li><li> Updating a node with changes. (In Emacs, retrieve a new node instead
</li><li> Updating a node with changes. (In Emacs, retrieve a new node instead
of updating the existing one.)
</li><li> Querying statics of a language definition.
</li></ul>
@ -87,7 +87,7 @@
convenient and idiomatic:
</p>
<ul>
<li> Instead of using byte positions, the ELisp API uses character
<li> Instead of using byte positions, the Emacs Lisp API uses character
positions.
</li><li> Null nodes are converted to nil.
</li></ul>
@ -203,7 +203,7 @@
<hr>
<div class="header">
<p>
Previous: <a href="Multiple-Languages.html">Parsing Text in Multiple Languages</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
Previous: <a href="Tree_002dsitter-major-modes.html">Developing major modes with tree-sitter</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>

View file

@ -67,12 +67,12 @@
</div>
<hr>
<span id="Using-Tree_002dsitter-Parser"></span><h3 class="section">37.2 Using Tree-sitter Parser</h3>
<span id="index-Tree_002dsitter-parser"></span>
<span id="index-tree_002dsitter-parser_002c-using"></span>
<p>This section described how to create and configure a tree-sitter
<p>This section describes how to create and configure a tree-sitter
parser. In Emacs, each tree-sitter parser is associated with a
buffer. As we edit the buffer, the associated parser and the syntax
tree is automatically kept up-to-date.
buffer. As the user edits the buffer, the associated parser and
syntax tree are automatically kept up-to-date.
</p>
<dl class="def">
<dt id="index-treesit_002dmax_002dbuffer_002dsize"><span class="category">Variable: </span><span><strong>treesit-max-buffer-size</strong><a href='#index-treesit_002dmax_002dbuffer_002dsize' class='copiable-anchor'> &para;</a></span></dt>
@ -88,48 +88,49 @@
<code>treesit-available-p</code> and <code>treesit-max-buffer-size</code>.
</p></dd></dl>
<span id="index-Creating-tree_002dsitter-parsers"></span>
<span id="index-creating-tree_002dsitter-parsers"></span>
<span id="index-tree_002dsitter-parser_002c-creating"></span>
<dl class="def">
<dt id="index-treesit_002dparser_002dcreate"><span class="category">Function: </span><span><strong>treesit-parser-create</strong> <em>language &amp;optional buffer no-reuse</em><a href='#index-treesit_002dparser_002dcreate' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>To create a parser, we provide a <var>buffer</var> and the <var>language</var>
to use (see <a href="Language-Definitions.html">Tree-sitter Language Definitions</a>). If <var>buffer</var> is nil, the
current buffer is used.
<dd><p>Create a parser for the specified <var>buffer</var> and <var>language</var>
(see <a href="Language-Definitions.html">Tree-sitter Language Definitions</a>). If <var>buffer</var> is omitted or
<code>nil</code>, it stands for the current buffer.
</p>
<p>By default, this function reuses a parser if one already exists for
<var>language</var> in <var>buffer</var>, if <var>no-reuse</var> is non-nil, this
function always creates a new parser.
<var>language</var> in <var>buffer</var>, but if <var>no-reuse</var> is
non-<code>nil</code>, this function always creates a new parser.
</p></dd></dl>
<p>Given a parser, we can query information about it:
<p>Given a parser, we can query information about it.
</p>
<dl class="def">
<dt id="index-treesit_002dparser_002dbuffer"><span class="category">Function: </span><span><strong>treesit-parser-buffer</strong> <em>parser</em><a href='#index-treesit_002dparser_002dbuffer' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>Returns the buffer associated with <var>parser</var>.
<dd><p>This function returns the buffer associated with <var>parser</var>.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dparser_002dlanguage"><span class="category">Function: </span><span><strong>treesit-parser-language</strong> <em>parser</em><a href='#index-treesit_002dparser_002dlanguage' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>Returns the language that <var>parser</var> uses.
<dd><p>This function returns the language used by <var>parser</var>.
</p></dd></dl>
<dl class="def">
<dt id="index-treesit_002dparser_002dp"><span class="category">Function: </span><span><strong>treesit-parser-p</strong> <em>object</em><a href='#index-treesit_002dparser_002dp' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>Checks if <var>object</var> is a tree-sitter parser. Return non-nil if it
is, return nil otherwise.
<dd><p>This function checks if <var>object</var> is a tree-sitter parser, and
returns non-<code>nil</code> if it is, and <code>nil</code> otherwise.
</p></dd></dl>
<p>There is no need to explicitly parse a buffer, because parsing is done
automatically and lazily. A parser only parses when we query for a
node in its syntax tree. Therefore, when a parser is first created,
it doesn&rsquo;t parse the buffer; it waits until we query for a node for
the first time. Similarly, when some change is made in the buffer, a
parser doesn&rsquo;t re-parse immediately.
automatically and lazily. A parser only parses when a Lisp program
queries for a node in its syntax tree. Therefore, when a parser is
first created, it doesn&rsquo;t parse the buffer; it waits until the Lisp
program queries for a node for the first time. Similarly, when some
change is made in the buffer, a parser doesn&rsquo;t re-parse immediately.
</p>
<span id="index-treesit_002dbuffer_002dtoo_002dlarge"></span>
<p>When a parser do parse, it checks for the size of the buffer.
<p>When a parser does parse, it checks for the size of the buffer.
Tree-sitter can only handle buffer no larger than about 4GB. If the
size exceeds that, Emacs signals <code>treesit-buffer-too-large</code>
with signal data being the buffer size.
size exceeds that, Emacs signals the <code>treesit-buffer-too-large</code>
error with signal data being the buffer size.
</p>
<p>Once a parser is created, Emacs automatically adds it to the
internal parser list. Every time a change is made to the buffer,
@ -138,8 +139,9 @@
</p>
<dl class="def">
<dt id="index-treesit_002dparser_002dlist"><span class="category">Function: </span><span><strong>treesit-parser-list</strong> <em>&amp;optional buffer</em><a href='#index-treesit_002dparser_002dlist' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function returns the parser list of <var>buffer</var>. And
<var>buffer</var> defaults to the current buffer.
<dd><p>This function returns the parser list of <var>buffer</var>. If
<var>buffer</var> is <code>nil</code> or omitted, it defaults to the current
buffer.
</p></dd></dl>
<dl class="def">
@ -148,29 +150,30 @@
</p></dd></dl>
<span id="index-tree_002dsitter-narrowing"></span>
<span id="tree_002dsitter-narrowing"></span><p>Normally, a parser &ldquo;sees&rdquo; the whole
buffer, but when the buffer is narrowed (see <a href="Narrowing.html">Narrowing</a>), the
parser will only see the visible region. As far as the parser can
tell, the hidden region is deleted. And when the buffer is later
widened, the parser thinks text is inserted in the beginning and in
the end. Although parsers respect narrowing, narrowing shouldn&rsquo;t be
the mean to handle a multi-language buffer; instead, set the ranges in
which a parser should operate in. See <a href="Multiple-Languages.html">Parsing Text in Multiple Languages</a>.
<span id="tree_002dsitter-narrowing"></span><p>Normally, a parser &ldquo;sees&rdquo; the whole buffer, but when the buffer is
narrowed (see <a href="Narrowing.html">Narrowing</a>), the parser will only see the accessible
portion of the buffer. As far as the parser can tell, the hidden
region was deleted. When the buffer is later widened, the parser
thinks text is inserted at the beginning and at the end. Although
parsers respect narrowing, modes should not use narrowing as a means
to handle a multi-language buffer; instead, set the ranges in which the
parser should operate. See <a href="Multiple-Languages.html">Parsing Text in Multiple Languages</a>.
</p>
<p>Because a parser parses lazily, when we narrow the buffer, the parser
is not affected immediately; as long as we don&rsquo;t query for a node
while the buffer is narrowed, the parser is oblivious of the
narrowing.
<p>Because a parser parses lazily, when the user or a Lisp program
narrows the buffer, the parser is not affected immediately; as long as
the mode doesn&rsquo;t query for a node while the buffer is narrowed, the
parser is oblivious of the narrowing.
</p>
<span id="index-tree_002dsitter-parse-string"></span>
<dl class="def">
<dt id="index-treesit_002dparse_002dstring"><span class="category">Function: </span><span><strong>treesit-parse-string</strong> <em>string language</em><a href='#index-treesit_002dparse_002dstring' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>Besides creating a parser for a buffer, we can also just parse a
string. Unlike a buffer, parsing a string is a one-time deal, and
<span id="index-parse-string_002c-tree_002dsitter"></span>
<p>Besides creating a parser for a buffer, a Lisp program can also parse a
string. Unlike a buffer, parsing a string is a one-off operation, and
there is no way to update the result.
</p>
<p>This function parses <var>string</var> with <var>language</var>, and returns the
root node of the generated syntax tree.
<dl class="def">
<dt id="index-treesit_002dparse_002dstring"><span class="category">Function: </span><span><strong>treesit-parse-string</strong> <em>string language</em><a href='#index-treesit_002dparse_002dstring' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function parses <var>string</var> using <var>language</var>, and returns
the root node of the generated syntax tree.
</p></dd></dl>
</div>

View file

@ -78,28 +78,25 @@ Now check if Emacs is built with tree-sitter library
(treesit-available-p)
For your major mode, first create a tree-sitter switch:
Users toggle tree-sitter for each major mode with a central variable,
treesit-settings. You can check whether to enable tree-sitter with
treesit-ready-p, which takes a major-mode symbol and one or more
language symbol. The major mode body should use a branch like this:
#+begin_src elisp
(defcustom python-use-tree-sitter nil
"If non-nil, `python-mode' tries to use tree-sitter.
Currently `python-mode' can utilize tree-sitter for font-locking,
imenu, and movement functions."
:type 'boolean)
#+end_src
Then in other places, we decide on whether to enable tree-sitter by
#+begin_src elisp
(and python-use-tree-sitter
(treesit-can-enable-p))
#+begin_src emacs-lisp
(cond
;; Tree-sitter setup.
((treesit-ready-p 'python-mode 'python)
...)
(t
;; Non-tree-sitter setup.
...))
#+end_src
* Naming convention
When referring to tree-sitter as a noun, use “tree-sitter”, like
python-use-tree-sitter. For prefix use “treesit”, like
python-treesit-indent.
Use tree-sitter for text (documentation, comment), use treesit for
symbol (variable, function).
* Font-lock
@ -108,10 +105,23 @@ capture names, tree-sitter finds the nodes that match these patterns,
tag the corresponding capture names onto the nodes and return them to
you. The query function returns a list of (capture-name . node). For
font-lock, we use face names as capture names. And the captured node
will be fontified in their capture name. The capture name could also
be a function, in which case (START END NODE) is passed to the
function for font-lock. START and END is the start and end the
captured NODE.
will be fontified in their capture name.
The capture name could also be a function, in which case (NODE
OVERRIDE START END) is passed to the function for fontification. START
and END is the start and end of the region to be fontified. The
function should only fontify within that region. The function should
also allow more optional arguments with (&rest _), for future
extensibility. For OVERRIDE check out the docstring of
treesit-font-lock-rules.
Contextual syntax like multi-line comments and multi-line strings,
needs special care. Because change in this type of things can affect
a large portion of the buffer. Think of inserting a closing comment
delimeter, it causes all the text before it (to the opening comment
delimeter) to change to comment face. These things needs to be
captured in a special name “contextual”, so that Emacs can give them
special treatment. Se the example below for how it looks like.
** Query syntax
@ -171,52 +181,64 @@ The manual explains how to read grammar files in the bottom of section
** Debugging queires
If your query has problems, it usually cannot compile. In that case
use treesit-query-validate to debug the query. It will pop a buffer
containing the query (in text format) and mark the offending part in
red.
If your query has problems, use treesit-query-validate to debug the
query. It will pop a buffer containing the query (in text format) and
mark the offending part in red.
** Code
To enable tree-sitter font-lock, set treesit-font-lock-settings
buffer-locally and call treesit-font-lock-enable. For example, see
To enable tree-sitter font-lock, set treesit-font-lock-settings and
treesit-font-lock-feature-list buffer-locally and call
treesit-major-mode-setup. For example, see
python--treesit-settings in python.el. Below I paste a snippet of
it.
Note that like the current font-lock, if the to-be-fontified region
already has a face (ie, an earlier match fontified part/all of the
region), the new face is discarded rather than applied. If you want
region), the new face is discarded rather than applied. If you want
later matches always override earlier matches, use the :override
keyword.
Each rule should have a :feature, like function-name,
string-interpolation, builtin, etc. Users can then enable/disable each
feature individually.
#+begin_src elisp
(defvar python--treesit-settings
(treesit-font-lock-rules
:feature 'comment
:language 'python
:override t
`(;; Queries for def and class.
(function_definition
name: (identifier) @font-lock-function-name-face)
'((comment) @font-lock-comment-face)
(class_definition
name: (identifier) @font-lock-type-face)
:feature 'string
:language 'python
'((string) @font-lock-string-face
(string) @contextual) ; Contextual special treatment.
;; Comment and string.
(comment) @font-lock-comment-face
:feature 'function-name
:language 'python
'((function_definition
name: (identifier) @font-lock-function-name-face))
...)))
:feature 'class-name
:language 'python
'((class_definition
name: (identifier) @font-lock-type-face))
...))
#+end_src
Then in python-mode, enable tree-sitter font-lock:
#+begin_src elisp
(treesit-parser-create 'python)
;; This turns off the syntax-based font-lock for comments and
;; strings. So it doesnt override tree-sitters fontification.
(setq-local font-lock-keywords-only t)
(setq-local treesit-font-lock-settings
python--treesit-settings)
(treesit-font-lock-enable)
(setq-local treesit-font-lock-settings python--treesit-settings)
(setq-local treesit-font-lock-feature-list
'((comment string function-name)
(class-name keyword builtin)
(string-interpolation decorator)))
...
(treesit-major-mode-setup)
#+end_src
Concretely, something like this:
@ -224,29 +246,22 @@ Concretely, something like this:
#+begin_src elisp
(define-derived-mode python-mode prog-mode "Python"
...
(treesit-parser-create 'python)
(if (and python-use-tree-sitter
(treesit-can-enable-p))
;; Tree-sitter.
(progn
(setq-local font-lock-keywords-only t)
(setq-local treesit-font-lock-settings
python--treesit-settings)
(treesit-font-lock-enable))
(cond
;; Tree-sitter.
((treesit-ready-p 'python-mode 'python)
(treesit-parser-create 'python)
(setq-local treesit-font-lock-settings python--treesit-settings)
(setq-local treesit-font-lock-feature-list
'((comment string function-name)
(class-name keyword builtin)
(string-interpolation decorator)))
(treesit-major-mode-setup))
(t
;; No tree-sitter
(setq-local font-lock-defaults ...))
...)
(setq-local font-lock-defaults ...)
...)))
#+end_src
Youll notice that tree-sitters font-lock doesnt respect
font-lock-maximum-decoration, major modes are free to set
treesit-font-lock-settings based on the value of
font-lock-maximum-decoration, or provide more fine-grained control
through other mode-specific means. (Towards that end, the :toggle option in treesit-font-lock-rules is very useful.)
* Indent
Indent works like this: We have a bunch of rules that look like
@ -262,10 +277,14 @@ previous line. We find the column number of that point (eg, 4), add
OFFSET to it (eg, 0), and that is the column we want to indent the
current line to (4 + 0 = 4).
Matchers and anchors are functions that takes (NODE PARENT BOL &rest
_). Matches return nil/non-nil for no match/match, and anchors return
the anchor point. Below are some convenient builtin matchers and anchors.
For MATHCER we have
(parent-is TYPE)
(node-is TYPE)
(parent-is TYPE) => matches if PARENTs type matches TYPE as regexp
(node-is TYPE) => mathces NODEs type
(query QUERY) => matches if querying PARENT with QUERY
captures NODE.
@ -280,9 +299,9 @@ For ANCHOR we have
first-sibling => start of the first sibling
parent => start of parent
parent-bol => BOL of the line parent is on.
prev-sibling
no-indent => dont indent
prev-line => same indent as previous line
prev-sibling => start of previous sibling
no-indent => current position (dont indent)
prev-line => start of previous line
There is also a manual section for indent: "Parser-based Indentation".
@ -301,7 +320,7 @@ tells you which rule is applied in the echo area.
((node-is ")") parent-bol 0)
((node-is "]") parent-bol 0)
((node-is ">") parent-bol 0)
((node-is ".") parent-bol ,offset)
((node-is "\\.") parent-bol ,offset)
((parent-is "ternary_expression") parent-bol ,offset)
((parent-is "named_imports") parent-bol ,offset)
((parent-is "statement_block") parent-bol ,offset)
@ -320,21 +339,21 @@ tells you which rule is applied in the echo area.
...))))
#+end_src
Then you set treesit-simple-indent-rules to your rules, and set
indent-line-function:
Then you set treesit-simple-indent-rules to your rules, and call
treesit-major-mode-setup:
#+begin_src elisp
(setq-local treesit-simple-indent-rules typescript-mode-indent-rules)
(setq-local indent-line-function #'treesit-indent)
(treesit-major-mode-setup)
#+end_src
* Imenu
Not much to say except for utilizing treesit-induce-sparse-tree.
See python--imenu-treesit-create-index-1 in python.el for an
example.
See js--treesit-imenu-1 in js.el for an example.
Once you have the index builder, set imenu-create-index-function.
Once you have the index builder, set imenu-create-index-function to
it.
* Navigation
@ -344,51 +363,33 @@ You can find the end of a defun with something like
(treesit-search-forward-goto "function_definition" 'end)
where "function_definition" matches the node type of a function
definition node, and end means we want to go to the end of that
node.
definition node, and end means we want to go to the end of that node.
Something like this should suffice:
Tree-sitter has default implementations for
beginning-of-defun-function and end-of-defun-function. So for
ordinary languages, it is suffice to set treesit-defun-type-regexp
to something that matches all the defun struct types in the language,
and call treesit-major-mode-setup. For example,
#+begin_src elisp
(defun js--treesit-beginning-of-defun (&optional arg)
(let ((arg (or arg 1)))
(if (> arg 0)
;; Go backward.
(while (and (> arg 0)
(treesit-search-forward-goto
"function_definition" 'start nil t))
(setq arg (1- arg)))
;; Go forward.
(while (and (< arg 0)
(treesit-search-forward-goto
"function_definition" 'start))
(setq arg (1+ arg))))))
(defun xxx-end-of-defun (&optional arg)
(let ((arg (or arg 1)))
(if (< arg 0)
;; Go backward.
(while (and (< arg 0)
(treesit-search-forward-goto
"function_definition" 'end nil t))
(setq arg (1+ arg)))
;; Go forward.
(while (and (> arg 0)
(treesit-search-forward-goto
"function_definition" 'end))
(setq arg (1- arg))))))
(setq-local beginning-of-defun-function #'xxx-beginning-of-defun)
(setq-local end-of-defun-function #'xxx-end-of-defun)
#+end_src
#+begin_src emacs-lisp
(setq-local treesit-defun-type-regexp (rx bol
(or "function" "class")
"_definition"
eol))
(treesit-major-mode-setup)
#+end_src>
* Which-func
You can find the current function by going up the tree and looking for
the function_definition node. See python-info-treesit-current-defun
in python.el for an example. Since Python allows nested function
definitions, that function keeps going until it reaches the root node,
and records all the function names along the way.
If you have an imenu implementation, set which-func-functions to
nil, and which-func will automatically use imenus data.
If you want independent implementation for which-func, you can find
the current function by going up the tree and looking for the
function_definition node. See the function below for an example.
Since Python allows nested function definitions, that function keeps
going until it reaches the root node, and records all the function
names along the way.
#+begin_src elisp
(defun python-info-treesit-current-defun (&optional include-type)