; Update guides in /admin/notes/tree-sitter
* admin/notes/tree-sitter/html-manual/Language-Definitions.html
* admin/notes/tree-sitter/html-manual/Multiple-Languages.html
* admin/notes/tree-sitter/html-manual/Parser_002dbased-Font-Lock.html
* admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html
* admin/notes/tree-sitter/html-manual/Parsing-Program-Source.html
* admin/notes/tree-sitter/html-manual/Pattern-Matching.html
* admin/notes/tree-sitter/html-manual/Retrieving-Node.html
* admin/notes/tree-sitter/html-manual/Tree_002dsitter-C-API.html
* admin/notes/tree-sitter/html-manual/Using-Parser.html
* admin/notes/tree-sitter/starter-guide: Update to reflect changes made recently.
This commit is contained in:
parent
9909652849
commit
5416ae5990
10 changed files with 988 additions and 752 deletions
|
@ -66,14 +66,17 @@
|
|||
</div>
|
||||
<hr>
|
||||
<span id="Tree_002dsitter-Language-Definitions"></span><h3 class="section">37.1 Tree-sitter Language Definitions</h3>
|
||||
<span id="index-language-definitions_002c-for-tree_002dsitter"></span>
|
||||
|
||||
<span id="Loading-a-language-definition"></span><h3 class="heading">Loading a language definition</h3>
|
||||
<span id="index-loading-language-definition-for-tree_002dsitter"></span>
|
||||
|
||||
<span id="index-language-argument_002c-for-tree_002dsitter"></span>
|
||||
<p>Tree-sitter relies on language definitions to parse text in that
|
||||
language. In Emacs, A language definition is represented by a symbol.
|
||||
For example, C language definition is represented as <code>c</code>, and
|
||||
<code>c</code> can be passed to tree-sitter functions as the <var>language</var>
|
||||
argument.
|
||||
language. In Emacs, a language definition is represented by a symbol.
|
||||
For example, the C language definition is represented as the symbol
|
||||
<code>c</code>, and <code>c</code> can be passed to tree-sitter functions as the
|
||||
<var>language</var> argument.
|
||||
</p>
|
||||
<span id="index-treesit_002dextra_002dload_002dpath"></span>
|
||||
<span id="index-treesit_002dload_002dlanguage_002derror"></span>
|
||||
|
@ -81,55 +84,92 @@
|
|||
<p>Tree-sitter language definitions are distributed as dynamic libraries.
|
||||
In order to use a language definition in Emacs, you need to make sure
|
||||
that the dynamic library is installed on the system. Emacs looks for
|
||||
language definitions under load paths in
|
||||
<code>treesit-extra-load-path</code>, <code>user-emacs-directory</code>/tree-sitter,
|
||||
and system default locations for dynamic libraries, in that order.
|
||||
Emacs tries each extensions in <code>treesit-load-suffixes</code>. If Emacs
|
||||
cannot find the library or has problem loading it, Emacs signals
|
||||
<code>treesit-load-language-error</code>. The signal data is a list of
|
||||
specific error messages.
|
||||
language definitions in several places, in the following order:
|
||||
</p>
|
||||
<ul>
|
||||
<li> first, in the list of directories specified by the variable
|
||||
<code>treesit-extra-load-path</code>;
|
||||
</li><li> then, in the <samp>tree-sitter</samp> subdirectory of the directory
|
||||
specified by <code>user-emacs-directory</code> (see <a href="Init-File.html">The Init File</a>);
|
||||
</li><li> and finally, in the system’s default locations for dynamic libraries.
|
||||
</li></ul>
|
||||
|
||||
<p>In each of these directories, Emacs looks for a file with file-name
|
||||
extensions specified by the variable <code>treesit-load-suffixes</code>.
|
||||
</p>
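<p>The search order described above can be modelled with a short sketch. It is written in JavaScript, the language of the grammar files discussed later in this section, and it is only an illustration: Emacs performs the real lookup internally. The function name <code>candidateLibraries</code>, its option names, and the sample directories are all hypothetical, and the sketch assumes the conventional file name <samp>libtree-sitter-<var>language</var>.<var>ext</var></samp>.
</p>

```javascript
// Illustrative model only; not how Emacs implements the search.
// candidateLibraries and its option names are hypothetical.
const path = require("node:path").posix;

function candidateLibraries(language, { extraLoadPath, userEmacsDirectory, suffixes }) {
  const base = `libtree-sitter-${language}`; // conventional base name
  const dirs = [
    ...extraLoadPath,                             // 1. treesit-extra-load-path
    path.join(userEmacsDirectory, "tree-sitter"), // 2. user-emacs-directory/tree-sitter
    "",                                           // 3. system default locations (bare file name)
  ];
  const candidates = [];
  for (const dir of dirs) {
    // In each directory, try every suffix from treesit-load-suffixes.
    for (const suffix of suffixes) {
      candidates.push(dir ? path.join(dir, base + suffix) : base + suffix);
    }
  }
  return candidates;
}

const sampleCandidates = candidateLibraries("c", {
  extraLoadPath: ["/opt/grammars"],        // made-up directory
  userEmacsDirectory: "/home/me/.emacs.d/", // made-up directory
  suffixes: [".so"],
});
console.log(sampleCandidates);
```

<p>The empty directory entry stands for the system's default library locations, where a bare file name is left to the dynamic loader to resolve.
</p>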
|
||||
<p>If Emacs cannot find the library or has problems loading it, Emacs
|
||||
signals the <code>treesit-load-language-error</code> error. The data of
|
||||
that signal could be one of the following:
|
||||
</p>
|
||||
<dl compact="compact">
|
||||
<dt><span><code>(not-found <var>error-msg</var> …)</code></span></dt>
|
||||
<dd><p>This means that Emacs could not find the language definition library.
|
||||
</p></dd>
|
||||
<dt><span><code>(symbol-error <var>error-msg</var>)</code></span></dt>
|
||||
<dd><p>This means that Emacs could not find in the library the expected function
|
||||
that every language definition library should export.
|
||||
</p></dd>
|
||||
<dt><span><code>(version-mismatch <var>error-msg</var>)</code></span></dt>
|
||||
<dd><p>This means that the version of the language definition library is incompatible
|
||||
with that of the tree-sitter library.
|
||||
</p></dd>
|
||||
</dl>
|
||||
|
||||
<p>In all of these cases, <var>error-msg</var> might provide additional
|
||||
details about the failure.
|
||||
</p>
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dlanguage_002davailable_002dp"><span class="category">Function: </span><span><strong>treesit-language-available-p</strong> <em>language</em><a href='#index-treesit_002dlanguage_002davailable_002dp' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function checks whether the dynamic library for <var>language</var> is
|
||||
present on the system, and return non-nil if it is.
|
||||
<dt id="index-treesit_002dlanguage_002davailable_002dp"><span class="category">Function: </span><span><strong>treesit-language-available-p</strong> <em>language &optional detail</em><a href='#index-treesit_002dlanguage_002davailable_002dp' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function returns non-<code>nil</code> if the language definition for
|
||||
<var>language</var> exists and can be loaded.
|
||||
</p>
|
||||
<p>If <var>detail</var> is non-<code>nil</code>, return <code>(t . nil)</code> when
|
||||
<var>language</var> is available, and <code>(nil . <var>data</var>)</code> when it’s
|
||||
unavailable. <var>data</var> is the signal data of
|
||||
<code>treesit-load-language-error</code>.
|
||||
</p></dd></dl>
|
||||
|
||||
<span id="index-treesit_002dload_002dname_002doverride_002dlist"></span>
|
||||
<p>By convention, the dynamic library for <var>language</var> is
|
||||
<code>libtree-sitter-<var>language</var>.<var>ext</var></code>, where <var>ext</var> is the
|
||||
system-specific extension for dynamic libraries. Also by convention,
|
||||
<p>By convention, the file name of the dynamic library for <var>language</var> is
|
||||
<samp>libtree-sitter-<var>language</var>.<var>ext</var></samp>, where <var>ext</var> is the
|
||||
system-specific extension for dynamic libraries. Also by convention,
|
||||
the function provided by that library is named
|
||||
<code>tree_sitter_<var>language</var></code>. If a language definition doesn’t
|
||||
follow this convention, you should add an entry
|
||||
<code>tree_sitter_<var>language</var></code>. If a language definition library
|
||||
doesn’t follow this convention, you should add an entry
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(<var>language</var> <var>library-base-name</var> <var>function-name</var>)
|
||||
</pre></div>
|
||||
|
||||
<p>to <code>treesit-load-name-override-list</code>, where
|
||||
<var>library-base-name</var> is the base filename for the dynamic library
|
||||
(conventionally <code>libtree-sitter-<var>language</var></code>), and
|
||||
<p>to the list in the variable <code>treesit-load-name-override-list</code>, where
|
||||
<var>library-base-name</var> is the basename of the dynamic library’s file name,
|
||||
(usually, <samp>libtree-sitter-<var>language</var></samp>), and
|
||||
<var>function-name</var> is the function provided by the library
|
||||
(conventionally <code>tree_sitter_<var>language</var></code>). For example,
|
||||
(usually, <code>tree_sitter_<var>language</var></code>). For example,
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(cool-lang "libtree-sitter-coool" "tree_sitter_cooool")
|
||||
</pre></div>
|
||||
|
||||
<p>for a language too cool to abide by conventions.
|
||||
<p>for a language that considers itself too “cool” to abide by
|
||||
conventions.
|
||||
</p>
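<p>The naming convention and the override entries can be summarized in a small sketch. This is an illustrative model in JavaScript, not Emacs code: the helper <code>resolveNames</code> and the array representation of an override entry are hypothetical, and only the <code>cool-lang</code> entry is taken from the example above.
</p>

```javascript
// Hypothetical helper; Emacs resolves these names internally.
// Each override entry mirrors (language library-base-name function-name).
function resolveNames(language, overrideList = []) {
  const entry = overrideList.find(([lang]) => lang === language);
  if (entry) {
    const [, libraryBaseName, functionName] = entry;
    return { libraryBaseName, functionName };
  }
  // No override: fall back to the conventional names.
  return {
    libraryBaseName: `libtree-sitter-${language}`,
    functionName: `tree_sitter_${language}`,
  };
}

const overrideList = [["cool-lang", "libtree-sitter-coool", "tree_sitter_cooool"]];
console.log(resolveNames("c", overrideList));
console.log(resolveNames("cool-lang", overrideList));
```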
|
||||
<span id="index-language_002ddefinition-version_002c-compatibility"></span>
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dlanguage_002dversion"><span class="category">Function: </span><span><strong>treesit-language-version</strong> <em>&optional min-compatible</em><a href='#index-treesit_002dlanguage_002dversion' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>Tree-sitter library has a <em>language version</em>, a language
|
||||
definition’s version needs to match this version to be compatible.
|
||||
</p>
|
||||
<p>This function returns tree-sitter library’s language version. If
|
||||
<var>min-compatible</var> is non-nil, it returns the minimal compatible
|
||||
version.
|
||||
<dd><p>This function returns the version of the language-definition
|
||||
Application Binary Interface (<acronym>ABI</acronym>) supported by the
|
||||
tree-sitter library. By default, it returns the latest ABI version
|
||||
supported by the library, but if <var>min-compatible</var> is
|
||||
non-<code>nil</code>, it returns the oldest ABI version which the library
|
||||
can still support. Language definition libraries must be built for
|
||||
ABI versions between the oldest and the latest versions supported by
|
||||
the tree-sitter library; otherwise the library will be unable to load
|
||||
them.
|
||||
</p></dd></dl>
|
||||
|
||||
<span id="Concrete-syntax-tree"></span><h3 class="heading">Concrete syntax tree</h3>
|
||||
<span id="index-syntax-tree_002c-concrete"></span>
|
||||
|
||||
<p>A syntax tree is what a parser generates. In a syntax tree, each node
|
||||
represents a piece of text, and is connected to each other by a
|
||||
|
@ -155,31 +195,34 @@
|
|||
+------------+ +--------------+ +------------+
|
||||
</pre></div>
|
||||
|
||||
<p>We can also represent it in s-expression:
|
||||
<p>We can also represent it as an s-expression:
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(root (expression (number) (operator) (number)))
|
||||
</pre></div>
|
||||
|
||||
<span id="Node-types"></span><h4 class="subheading">Node types</h4>
|
||||
<span id="index-node-types_002c-in-a-syntax-tree"></span>
|
||||
|
||||
<span id="index-tree_002dsitter-node-type"></span>
|
||||
<span id="tree_002dsitter-node-type"></span><span id="index-tree_002dsitter-named-node"></span>
|
||||
<span id="tree_002dsitter-named-node"></span><span id="index-tree_002dsitter-anonymous-node"></span>
|
||||
<p>Names like <code>root</code>, <code>expression</code>, <code>number</code>,
|
||||
<code>operator</code> are nodes’ <em>type</em>. However, not all nodes in a
|
||||
syntax tree have a type. Nodes that don’t are <em>anonymous nodes</em>,
|
||||
and nodes with a type are <em>named nodes</em>. Anonymous nodes are
|
||||
tokens with fixed spellings, including punctuation characters like
|
||||
bracket ‘<samp>]</samp>’, and keywords like <code>return</code>.
|
||||
<span id="index-type-of-node_002c-tree_002dsitter"></span>
|
||||
<span id="tree_002dsitter-node-type"></span><span id="index-named-node_002c-tree_002dsitter"></span>
|
||||
<span id="tree_002dsitter-named-node"></span><span id="index-anonymous-node_002c-tree_002dsitter"></span>
|
||||
<p>Names like <code>root</code>, <code>expression</code>, <code>number</code>, and
|
||||
<code>operator</code> specify the <em>type</em> of the nodes. However, not all
|
||||
nodes in a syntax tree have a type. Nodes that don’t have a type are
|
||||
known as <em>anonymous nodes</em>, and nodes with a type are <em>named
|
||||
nodes</em>. Anonymous nodes are tokens with fixed spellings, including
|
||||
punctuation characters like bracket ‘<samp>]</samp>’, and keywords like
|
||||
<code>return</code>.
|
||||
</p>
|
||||
<span id="Field-names"></span><h4 class="subheading">Field names</h4>
|
||||
|
||||
<span id="index-field-name_002c-tree_002dsitter"></span>
|
||||
<span id="index-tree_002dsitter-node-field-name"></span>
|
||||
<span id="tree_002dsitter-node-field-name"></span><p>To make the syntax tree easier to
|
||||
analyze, many language definitions assign <em>field names</em> to child
|
||||
nodes. For example, a <code>function_definition</code> node could have a
|
||||
<code>declarator</code> and a <code>body</code>:
|
||||
<span id="tree_002dsitter-node-field-name"></span><p>To make the syntax tree easier to analyze, many language definitions
|
||||
assign <em>field names</em> to child nodes. For example, a
|
||||
<code>function_definition</code> node could have a <code>declarator</code> and a
|
||||
<code>body</code>:
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(function_definition
|
||||
|
@ -189,39 +232,40 @@
|
|||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dinspect_002dmode"><span class="category">Command: </span><span><strong>treesit-inspect-mode</strong><a href='#index-treesit_002dinspect_002dmode' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This minor mode displays the node that <em>starts</em> at point in
|
||||
mode-line. The mode-line will display
|
||||
<dd><p>This minor mode displays on the mode-line the node that <em>starts</em>
|
||||
at point. The mode-line will display
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example"><var>parent</var> <var>field-name</var>: (<var>child</var> (<var>grand-child</var> (...)))
|
||||
<pre class="example"><var>parent</var> <var>field</var>: (<var>child</var> (<var>grandchild</var> (…)))
|
||||
</pre></div>
|
||||
|
||||
<p><var>child</var>, <var>grand-child</var>, and <var>grand-grand-child</var>, etc, are
|
||||
nodes that have their beginning at point. And <var>parent</var> is the
|
||||
parent of <var>child</var>.
|
||||
<p><var>child</var>, <var>grandchild</var>, <var>grand-grandchild</var>, etc., are nodes that
|
||||
begin at point. <var>parent</var> is the parent node of <var>child</var>.
|
||||
</p>
|
||||
<p>If there is no node that starts at point, i.e., point is in the middle
|
||||
of a node, then the mode-line only displays the smallest node that
|
||||
spans point, and its immediate parent.
|
||||
spans the position of point, and its immediate parent.
|
||||
</p>
|
||||
<p>This minor mode doesn’t create parsers on its own. It simply uses the
|
||||
first parser in <code>(treesit-parser-list)</code> (see <a href="Using-Parser.html">Using Tree-sitter Parser</a>).
|
||||
</p></dd></dl>
|
||||
|
||||
<span id="Reading-the-grammar-definition"></span><h3 class="heading">Reading the grammar definition</h3>
|
||||
<span id="index-reading-grammar-definition_002c-tree_002dsitter"></span>
|
||||
|
||||
<p>Authors of language definitions define the <em>grammar</em> of a
|
||||
language, and this grammar determines how does a parser construct a
|
||||
concrete syntax tree out of the text. In order to use the syntax
|
||||
tree effectively, we need to read the <em>grammar file</em>.
|
||||
programming language, which determines how a parser constructs a
|
||||
concrete syntax tree out of the program text. In order to use the
|
||||
syntax tree effectively, you need to consult the <em>grammar file</em>.
|
||||
</p>
|
||||
<p>The grammar file is usually <code>grammar.js</code> in a language
|
||||
definition’s project repository. The link to a language definition’s
|
||||
home page can be found in tree-sitter’s homepage
|
||||
(<a href="https://tree-sitter.github.io/tree-sitter">https://tree-sitter.github.io/tree-sitter</a>).
|
||||
<p>The grammar file is usually <samp>grammar.js</samp> in a language
|
||||
definition’s project repository. The link to a language definition’s
|
||||
home page can be found on
|
||||
<a href="https://tree-sitter.github.io/tree-sitter">tree-sitter’s
|
||||
homepage</a>.
|
||||
</p>
|
||||
<p>The grammar is written in JavaScript syntax. For example, the rule
|
||||
matching a <code>function_definition</code> node looks like
|
||||
<p>The grammar definition is written in JavaScript. For example, the
|
||||
rule matching a <code>function_definition</code> node looks like
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">function_definition: $ => seq(
|
||||
|
@ -231,12 +275,12 @@
|
|||
)
|
||||
</pre></div>
|
||||
|
||||
<p>The rule is represented by a function that takes a single argument
|
||||
<p>The rules are represented by functions that take a single argument
|
||||
<var>$</var>, representing the whole grammar. The function itself is
|
||||
constructed by other functions: the <code>seq</code> function puts together a
|
||||
sequence of children; the <code>field</code> function annotates a child with
|
||||
a field name. If we write the above definition in BNF syntax, it
|
||||
would look like
|
||||
constructed by other functions: the <code>seq</code> function puts together
|
||||
a sequence of children; the <code>field</code> function annotates a child
|
||||
with a field name. If we write the above definition in the so-called
|
||||
<em>Backus-Naur Form</em> (<acronym>BNF</acronym>) syntax, it would look like
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">function_definition :=
|
||||
|
@ -252,66 +296,77 @@
|
|||
body: (compound_statement))
|
||||
</pre></div>
|
||||
|
||||
<p>Below is a list of functions that one will see in a grammar
|
||||
definition. Each function takes other rules as arguments and returns
|
||||
a new rule.
|
||||
<p>Below is a list of functions that one can see in a grammar definition.
|
||||
Each function takes other rules as arguments and returns a new rule.
|
||||
</p>
|
||||
<ul>
|
||||
<li> <code>seq(rule1, rule2, ...)</code> matches each rule one after another.
|
||||
|
||||
</li><li> <code>choice(rule1, rule2, ...)</code> matches one of the rules in its
|
||||
arguments.
|
||||
|
||||
</li><li> <code>repeat(rule)</code> matches <var>rule</var> for <em>zero or more</em> times.
|
||||
<dl compact="compact">
|
||||
<dt><span><code>seq(<var>rule1</var>, <var>rule2</var>, …)</code></span></dt>
|
||||
<dd><p>matches each rule one after another.
|
||||
</p></dd>
|
||||
<dt><span><code>choice(<var>rule1</var>, <var>rule2</var>, …)</code></span></dt>
|
||||
<dd><p>matches one of the rules in its arguments.
|
||||
</p></dd>
|
||||
<dt><span><code>repeat(<var>rule</var>)</code></span></dt>
|
||||
<dd><p>matches <var>rule</var> <em>zero or more</em> times.
|
||||
This is like the ‘<samp>*</samp>’ operator in regular expressions.
|
||||
|
||||
</li><li> <code>repeat1(rule)</code> matches <var>rule</var> for <em>one or more</em> times.
|
||||
</p></dd>
|
||||
<dt><span><code>repeat1(<var>rule</var>)</code></span></dt>
|
||||
<dd><p>matches <var>rule</var> <em>one or more</em> times.
|
||||
This is like the ‘<samp>+</samp>’ operator in regular expressions.
|
||||
|
||||
</li><li> <code>optional(rule)</code> matches <var>rule</var> for <em>zero or one</em> time.
|
||||
</p></dd>
|
||||
<dt><span><code>optional(<var>rule</var>)</code></span></dt>
|
||||
<dd><p>matches <var>rule</var> <em>zero or one</em> time.
|
||||
This is like the ‘<samp>?</samp>’ operator in regular expressions.
|
||||
|
||||
</li><li> <code>field(name, rule)</code> assigns field name <var>name</var> to the child
|
||||
node matched by <var>rule</var>.
|
||||
|
||||
</li><li> <code>alias(rule, alias)</code> makes nodes matched by <var>rule</var> appear as
|
||||
<var>alias</var> in the syntax tree generated by the parser. For example,
|
||||
|
||||
</p></dd>
|
||||
<dt><span><code>field(<var>name</var>, <var>rule</var>)</code></span></dt>
|
||||
<dd><p>assigns field name <var>name</var> to the child node matched by <var>rule</var>.
|
||||
</p></dd>
|
||||
<dt><span><code>alias(<var>rule</var>, <var>alias</var>)</code></span></dt>
|
||||
<dd><p>makes nodes matched by <var>rule</var> appear as <var>alias</var> in the syntax
|
||||
tree generated by the parser. For example,
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">alias(preprocessor_call_exp, call_expression)
|
||||
</pre></div>
|
||||
|
||||
<p>makes any node matched by <code>preprocessor_call_exp</code> to appear as
|
||||
<p>makes any node matched by <code>preprocessor_call_exp</code> appear as
|
||||
<code>call_expression</code>.
|
||||
</p></li></ul>
|
||||
</p></dd>
|
||||
</dl>
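<p>To see how these functions compose, here is a toy model in plain JavaScript. The definitions of <code>seq</code>, <code>choice</code>, <code>repeat</code>, <code>repeat1</code>, <code>optional</code>, <code>field</code>, and <code>$</code> below are hypothetical stand-ins written for this sketch; in a real <samp>grammar.js</samp> they are supplied by the tree-sitter tooling. The <code>argumentList</code> rule and the <code>show</code> helper are likewise made up for illustration.
</p>

```javascript
// Hypothetical stand-ins for the grammar functions: each one takes
// rules as arguments and returns a new rule (here, a plain object).
const seq = (...rules) => ({ kind: "seq", rules });
const choice = (...rules) => ({ kind: "choice", rules });
const repeat = (rule) => ({ kind: "repeat", rule });
const repeat1 = (rule) => ({ kind: "repeat1", rule });
const optional = (rule) => ({ kind: "optional", rule });
const field = (name, rule) => ({ kind: "field", name, rule });

// Stand-in for the `$` argument: $.foo refers to the rule named foo.
const $ = new Proxy({}, { get: (_, name) => ({ kind: "symbol", name }) });

// Render a rule in a regexp-like notation, using *, + and ? as in the text.
function show(rule) {
  if (typeof rule === "string") return JSON.stringify(rule); // literal token
  switch (rule.kind) {
    case "symbol": return rule.name;
    case "seq": return rule.rules.map(show).join(" ");
    case "choice": return "(" + rule.rules.map(show).join(" | ") + ")";
    case "repeat": return "(" + show(rule.rule) + ")*";
    case "repeat1": return "(" + show(rule.rule) + ")+";
    case "optional": return "(" + show(rule.rule) + ")?";
    case "field": return rule.name + ": " + show(rule.rule);
    default: throw new Error("unknown rule kind: " + rule.kind);
  }
}

// A made-up rule: a parenthesized, comma-separated list of expressions.
const argumentList = seq(
  "(",
  optional(seq($.expression, repeat(seq(",", $.expression)))),
  ")"
);
console.log(show(argumentList));
console.log(show(field("body", choice($.block, $.statement))));
```

<p>Because every function returns a rule, calls nest freely, which is why grammar files read as deeply nested expressions.
</p>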
|
||||
|
||||
<p>Below are grammar functions less interesting for a reader of a
|
||||
<p>Below are grammar functions of lesser importance for reading a
|
||||
language definition.
|
||||
</p>
|
||||
<ul>
|
||||
<li> <code>token(rule)</code> marks <var>rule</var> to produce a single leaf node.
|
||||
That is, instead of generating a parent node with individual child
|
||||
nodes under it, everything is combined into a single leaf node.
|
||||
<dl compact="compact">
|
||||
<dt><span><code>token(<var>rule</var>)</code></span></dt>
|
||||
<dd><p>marks <var>rule</var> to produce a single leaf node. That is, instead of
|
||||
generating a parent node with individual child nodes under it,
|
||||
everything is combined into a single leaf node.
|
||||
</p></dd>
|
||||
<dt><span><code>token.immediate(<var>rule</var>)</code></span></dt>
|
||||
<dd><p>Normally, grammar rules ignore preceding whitespace; this
|
||||
changes <var>rule</var> to match only when there is no preceding
|
||||
whitespace.
|
||||
</p></dd>
|
||||
<dt><span><code>prec(<var>n</var>, <var>rule</var>)</code></span></dt>
|
||||
<dd><p>gives <var>rule</var> the level-<var>n</var> precedence.
|
||||
</p></dd>
|
||||
<dt><span><code>prec.left([<var>n</var>,] <var>rule</var>)</code></span></dt>
|
||||
<dd><p>marks <var>rule</var> as left-associative, optionally with level <var>n</var>.
|
||||
</p></dd>
|
||||
<dt><span><code>prec.right([<var>n</var>,] <var>rule</var>)</code></span></dt>
|
||||
<dd><p>marks <var>rule</var> as right-associative, optionally with level <var>n</var>.
|
||||
</p></dd>
|
||||
<dt><span><code>prec.dynamic(<var>n</var>, <var>rule</var>)</code></span></dt>
|
||||
<dd><p>this is like <code>prec</code>, but the precedence is applied at runtime
|
||||
instead.
|
||||
</p></dd>
|
||||
</dl>
|
||||
|
||||
</li><li> Normally, grammar rules ignore preceding whitespaces,
|
||||
<code>token.immediate(rule)</code> changes <var>rule</var> to match only when
|
||||
there is no preceding whitespaces.
|
||||
|
||||
</li><li> <code>prec(n, rule)</code> gives <var>rule</var> a level <var>n</var> precedence.
|
||||
|
||||
</li><li> <code>prec.left([n,] rule)</code> marks <var>rule</var> as left-associative,
|
||||
optionally with level <var>n</var>.
|
||||
|
||||
</li><li> <code>prec.right([n,] rule)</code> marks <var>rule</var> as right-associative,
|
||||
optionally with level <var>n</var>.
|
||||
|
||||
</li><li> <code>prec.dynamic(n, rule)</code> is like <code>prec</code>, but the precedence
|
||||
is applied at runtime instead.
|
||||
</li></ul>
|
||||
|
||||
<p>The tree-sitter project talks about writing a grammar in more detail:
|
||||
<a href="https://tree-sitter.github.io/tree-sitter/creating-parsers">https://tree-sitter.github.io/tree-sitter/creating-parsers</a>.
|
||||
Read especially “The Grammar DSL” section.
|
||||
<p>The documentation of the tree-sitter project has
|
||||
<a href="https://tree-sitter.github.io/tree-sitter/creating-parsers">more
|
||||
about writing a grammar</a>. Read especially “The Grammar DSL”
|
||||
section.
|
||||
</p>
|
||||
</div>
|
||||
<hr>
|
||||
|
|
|
@ -33,7 +33,7 @@
|
|||
<link href="Index.html" rel="index" title="Index">
|
||||
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
|
||||
<link href="Parsing-Program-Source.html" rel="up" title="Parsing Program Source">
|
||||
<link href="Tree_002dsitter-C-API.html" rel="next" title="Tree-sitter C API">
|
||||
<link href="Tree_002dsitter-major-modes.html" rel="next" title="Tree-sitter major modes">
|
||||
<link href="Pattern-Matching.html" rel="prev" title="Pattern Matching">
|
||||
<style type="text/css">
|
||||
<!--
|
||||
|
@ -63,27 +63,29 @@
|
|||
<div class="section" id="Multiple-Languages">
|
||||
<div class="header">
|
||||
<p>
|
||||
Next: <a href="Tree_002dsitter-C-API.html" accesskey="n" rel="next">Tree-sitter C API Correspondence</a>, Previous: <a href="Pattern-Matching.html" accesskey="p" rel="prev">Pattern Matching Tree-sitter Nodes</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
|
||||
Next: <a href="Tree_002dsitter-major-modes.html" accesskey="n" rel="next">Developing major modes with tree-sitter</a>, Previous: <a href="Pattern-Matching.html" accesskey="p" rel="prev">Pattern Matching Tree-sitter Nodes</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
|
||||
</div>
|
||||
<hr>
|
||||
<span id="Parsing-Text-in-Multiple-Languages"></span><h3 class="section">37.6 Parsing Text in Multiple Languages</h3>
|
||||
|
||||
<p>Sometimes, the source of a programming language could contain sources
|
||||
of other languages, HTML + CSS + JavaScript is one example. In that
|
||||
case, we need to assign individual parsers to text segments written in
|
||||
different languages. Traditionally this is achieved by using
|
||||
narrowing. While tree-sitter works with narrowing (see <a href="Using-Parser.html#tree_002dsitter-narrowing">narrowing</a>), the recommended way is to set ranges in which
|
||||
a parser will operate.
|
||||
<span id="index-multiple-languages_002c-parsing-with-tree_002dsitter"></span>
|
||||
<span id="index-parsing-multiple-languages-with-tree_002dsitter"></span>
|
||||
<p>Sometimes, the source of a programming language could contain snippets
|
||||
of other languages; <acronym>HTML</acronym> + <acronym>CSS</acronym> + JavaScript is one
|
||||
example. In that case, text segments written in different languages
|
||||
need to be assigned different parsers. Traditionally, this is
|
||||
achieved by using narrowing. While tree-sitter works with narrowing
|
||||
(see <a href="Using-Parser.html#tree_002dsitter-narrowing">narrowing</a>), the recommended way is
|
||||
instead to set regions of buffer text in which a parser will operate.
|
||||
</p>
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dparser_002dset_002dincluded_002dranges"><span class="category">Function: </span><span><strong>treesit-parser-set-included-ranges</strong> <em>parser ranges</em><a href='#index-treesit_002dparser_002dset_002dincluded_002dranges' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function sets the range of <var>parser</var> to <var>ranges</var>. Then
|
||||
<var>parser</var> will only read the text covered in each range. Each
|
||||
range in <var>ranges</var> is a list of cons <code>(<var>beg</var>
|
||||
. <var>end</var>)</code>.
|
||||
<dd><p>This function sets up <var>parser</var> to operate on <var>ranges</var>. The
|
||||
<var>parser</var> will only read the text of the specified ranges. Each
|
||||
range in <var>ranges</var> is a cons cell of the form <code>(<var>beg</var> . <var>end</var>)</code><!-- /@w -->.
|
||||
</p>
|
||||
<p>Each range in <var>ranges</var> must come in order and not overlap. That
|
||||
is, in pseudo code:
|
||||
<p>The ranges in <var>ranges</var> must come in order and must not overlap.
|
||||
That is, in pseudo code:
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(cl-loop for idx from 1 to (1- (length ranges))
|
||||
|
@ -95,12 +97,12 @@
|
|||
|
||||
<span id="index-treesit_002drange_002dinvalid"></span>
|
||||
<p>If <var>ranges</var> violates this constraint, or something else went
|
||||
wrong, this function signals a <code>treesit-range-invalid</code>. The
|
||||
signal data contains a specific error message and the ranges we are
|
||||
trying to set.
|
||||
wrong, this function signals the <code>treesit-range-invalid</code> error.
|
||||
The signal data contains a specific error message and the ranges we
|
||||
are trying to set.
|
||||
</p>
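<p>The ordering constraint can also be written as a small standalone check. The sketch below is in JavaScript rather than Emacs Lisp and is not how Emacs performs the validation; the helper name <code>rangesAreValid</code> is hypothetical, and each range is modelled as a two-element array <code>[beg, end]</code>. It assumes that a range may begin exactly where the previous one ends.
</p>

```javascript
// Hypothetical helper sketching the "in order and not overlapping" rule.
// Each range is [beg, end], standing in for the cons cell (beg . end).
function rangesAreValid(ranges) {
  for (let i = 0; i < ranges.length; i++) {
    const [beg, end] = ranges[i];
    // A range must not end before it begins.
    if (beg > end) return false;
    // Assumption: touching ranges are fine, overlapping ones are not.
    if (i > 0 && ranges[i - 1][1] > beg) return false;
  }
  return true;
}

console.log(rangesAreValid([[1, 4], [5, 8]])); // in order, disjoint
console.log(rangesAreValid([[5, 8], [1, 4]])); // out of order
console.log(rangesAreValid([[1, 6], [5, 8]])); // overlapping
```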
|
||||
<p>This function can also be used for disabling ranges. If <var>ranges</var>
|
||||
is nil, the parser is set to parse the whole buffer.
|
||||
is <code>nil</code>, the parser is set to parse the whole buffer.
|
||||
</p>
|
||||
<p>Example:
|
||||
</p>
|
||||
|
@ -114,9 +116,9 @@
|
|||
<dt id="index-treesit_002dparser_002dincluded_002dranges"><span class="category">Function: </span><span><strong>treesit-parser-included-ranges</strong> <em>parser</em><a href='#index-treesit_002dparser_002dincluded_002dranges' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function returns the ranges set for <var>parser</var>. The return
|
||||
value is the same as the <var>ranges</var> argument of
|
||||
<code>treesit-parser-included-ranges</code>: a list of cons
|
||||
<code>(<var>beg</var> . <var>end</var>)</code>. And if <var>parser</var> doesn’t have any
|
||||
ranges, the return value is nil.
|
||||
<code>treesit-parser-set-included-ranges</code>: a list of cons cells of the form
|
||||
<code>(<var>beg</var> . <var>end</var>)</code><!-- /@w -->. If <var>parser</var> doesn’t have any
|
||||
ranges, the return value is <code>nil</code>.
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(treesit-parser-included-ranges parser)
|
||||
|
@ -131,7 +133,7 @@
|
|||
<var>parser-or-lang</var> could be either a parser or a language. If it is
|
||||
a language, this function looks for the first parser in
|
||||
<code>(treesit-parser-list)</code> for that language in the current buffer,
|
||||
and set range for it.
|
||||
and sets the ranges for it.
|
||||
</p></dd></dl>
|
||||
|
||||
<dl class="def">
|
||||
|
@ -145,68 +147,76 @@
|
|||
<dl class="def">
|
||||
<dt id="index-treesit_002dquery_002drange"><span class="category">Function: </span><span><strong>treesit-query-range</strong> <em>source query &optional beg end</em><a href='#index-treesit_002dquery_002drange' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function matches <var>source</var> with <var>query</var> and returns the
|
||||
ranges of captured nodes. The return value has the same shape of
|
||||
other functions: a list of <code>(<var>beg</var> . <var>end</var>)</code>.
|
||||
ranges of captured nodes. The return value is a list of cons cells of
|
||||
the form <code>(<var>beg</var> . <var>end</var>)</code><!-- /@w -->, where <var>beg</var> and
|
||||
<var>end</var> specify the beginning and the end of a region of text.
|
||||
</p>
|
||||
<p>For convenience, <var>source</var> can be a language symbol, a parser, or a
|
||||
node. If a language symbol, this function matches in the root node of
|
||||
the first parser using that language; if a parser, this function
|
||||
matches in the root node of that parser; if a node, this function
|
||||
matches in that node.
|
||||
node. If it’s a language symbol, this function matches in the root
|
||||
node of the first parser using that language; if a parser, this
|
||||
function matches in the root node of that parser; if a node, this
|
||||
function matches in that node.
|
||||
</p>
|
||||
<p>Parameter <var>query</var> is the query used to capture nodes
|
||||
(see <a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>). The capture names don’t matter. Parameter
|
||||
<var>beg</var> and <var>end</var>, if both non-nil, limits the range in which
|
||||
this function queries.
|
||||
<p>The argument <var>query</var> is the query used to capture nodes
|
||||
(see <a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>). The capture names don’t matter. The
|
||||
arguments <var>beg</var> and <var>end</var>, if both non-<code>nil</code>, limit the
|
||||
range in which this function queries.
|
||||
</p>
|
||||
<p>Like other query functions, this function raises an
|
||||
<var>treesit-query-error</var> if <var>query</var> is malformed.
|
||||
</p></dd></dl>
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dlanguage_002dat"><span class="category">Function: </span><span><strong>treesit-language-at</strong> <em>point</em><a href='#index-treesit_002dlanguage_002dat' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function tries to figure out which language is responsible for
|
||||
the text at <var>point</var>. It goes over each parser in
|
||||
<code>(treesit-parser-list)</code> and see if that parser’s range covers
|
||||
<var>point</var>.
|
||||
<p>Like other query functions, this function raises the
|
||||
<code>treesit-query-error</code> error if <var>query</var> is malformed.
|
||||
</p></dd></dl>
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002drange_002dfunctions"><span class="category">Variable: </span><span><strong>treesit-range-functions</strong><a href='#index-treesit_002drange_002dfunctions' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>A list of range functions. Font-locking and indenting code uses
|
||||
functions in this alist to set correct ranges for a language parser
|
||||
before using it.
|
||||
<dd><p>This variable holds the list of range functions. Font-locking and
|
||||
indenting code use functions in this list to set correct ranges for
|
||||
a language parser before using it.
|
||||
</p>
|
||||
<p>The signature of each function should be
|
||||
<p>The signature of each function in the list should be:
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(<var>start</var> <var>end</var> &rest <var>_</var>)
|
||||
</pre></div>
|
||||
|
||||
<p>where <var>start</var> and <var>end</var> marks the region that is about to be
|
||||
used. A range function only need to (but not limited to) update
|
||||
<p>where <var>start</var> and <var>end</var> specify the region that is about to be
|
||||
used. A range function only needs to (but is not limited to) update
|
||||
ranges in that region.
|
||||
</p>
|
||||
<p>Each function in the list is called in-order.
|
||||
<p>The functions in the list are called in order.
|
||||
</p></dd></dl>
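<p>As an illustration only, a range function for a buffer that embeds
<acronym>CSS</acronym> in <acronym>HTML</acronym> might look like the sketch below.
The name <code>my-html-css-range-function</code> is hypothetical, and the
sketch assumes the parser-creation function is called
<code>treesit-get-parser-create</code>, as in the example later in this
section.  It recomputes the ranges for the whole buffer instead of only
the region between <var>start</var> and <var>end</var>, which the description
above permits.
</p>
<div class="example">
<pre class="example">(defun my-html-css-range-function (_start _end &rest _)
  "Point the CSS parser at the text of every style element."
  (let ((ranges (treesit-query-range
                 'html
                 "(style_element (raw_text) @capture)")))
    ;; RANGES is a list of (BEG . END) cons cells.  Leave the
    ;; parser alone when the query captured nothing.
    (when ranges
      (treesit-parser-set-included-ranges
       (treesit-get-parser-create 'css) ranges))))

;; Install it buffer-locally, typically from a major mode.
(setq-local treesit-range-functions
            (list #'my-html-css-range-function))
</pre></div>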
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dupdate_002dranges"><span class="category">Function: </span><span><strong>treesit-update-ranges</strong> <em>&optional start end</em><a href='#index-treesit_002dupdate_002dranges' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function is used by font-lock and indent to update ranges before
|
||||
using any parser. Each range function in
|
||||
<dd><p>This function is used by font-lock and indentation to update ranges
|
||||
before using any parser. Each range function in
|
||||
<var>treesit-range-functions</var> is called in order. Arguments
|
||||
<var>start</var> and <var>end</var> are passed to each range function.
|
||||
</p></dd></dl>
|
||||
|
||||
<span id="index-treesit_002dlanguage_002dat_002dpoint_002dfunction"></span>
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dlanguage_002dat"><span class="category">Function: </span><span><strong>treesit-language-at</strong> <em>pos</em><a href='#index-treesit_002dlanguage_002dat' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function tries to figure out which language is responsible for
|
||||
the text at buffer position <var>pos</var>. Under the hood it just calls
|
||||
<code>treesit-language-at-point-function</code>.
|
||||
</p>
|
||||
<p>Various Lisp programs use this function. For example, the indentation
|
||||
program uses this function to determine which language’s rules to use
|
||||
in a multi-language buffer. So it is important to provide
|
||||
<code>treesit-language-at-point-function</code> for a multi-language major
|
||||
mode.
|
||||
</p></dd></dl>
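<p>A minimal sketch of such a function for an <acronym>HTML</acronym> buffer
with embedded <acronym>CSS</acronym> and JavaScript could look as follows.
The function name is hypothetical.  The sketch assumes that
<code>treesit-node-at</code> accepts a language symbol as its second
argument, and that the <acronym>HTML</acronym> grammar names the relevant
nodes <code>style_element</code> and <code>script_element</code>.
</p>
<div class="example">
<pre class="example">(defun my-html-language-at-point (pos)
  "Return the language symbol responsible for the text at POS."
  (let* ((node (treesit-node-at pos 'html))
         (parent (and node (treesit-node-parent node))))
    ;; The raw text of a style or script element belongs to the
    ;; embedded language; everything else is plain HTML.
    (pcase (and parent (treesit-node-type parent))
      ("style_element" 'css)
      ("script_element" 'javascript)
      (_ 'html))))

(setq-local treesit-language-at-point-function
            #'my-html-language-at-point)
</pre></div>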
|
||||
|
||||
<span id="An-example"></span><h3 class="heading">An example</h3>
|
||||
|
||||
<p>Normally, in a set of languages that can be mixed together, there is a
|
||||
major language and several embedded languages. We first parse the
|
||||
whole document with the major language’s parser, set ranges for the
|
||||
embedded languages, then parse the embedded languages.
|
||||
major language and several embedded languages. A Lisp program usually
|
||||
first parses the whole document with the major language’s parser, sets
|
||||
ranges for the embedded languages, and then parses the embedded
|
||||
languages.
|
||||
</p>
|
||||
<p>Suppose we want to parse a very simple document that mixes HTML, CSS
|
||||
and JavaScript:
|
||||
<p>Suppose we need to parse a very simple document that mixes
|
||||
<acronym>HTML</acronym>, <acronym>CSS</acronym> and JavaScript:
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example"><html>
|
||||
|
@ -215,22 +225,25 @@
|
|||
</html>
|
||||
</pre></div>
|
||||
|
||||
<p>We first parse with HTML, then set ranges for CSS and JavaScript:
|
||||
<p>We first parse with <acronym>HTML</acronym>, then set ranges for <acronym>CSS</acronym>
|
||||
and JavaScript:
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">;; Create parsers.
|
||||
(setq html (treesit-get-parser-create 'html))
|
||||
(setq css (treesit-get-parser-create 'css))
|
||||
(setq js (treesit-get-parser-create 'javascript))
|
||||
</pre><pre class="example">
|
||||
|
||||
;; Set CSS ranges.
|
||||
</pre><pre class="example">;; Set CSS ranges.
|
||||
(setq css-range
|
||||
(treesit-query-range
|
||||
'html
|
||||
"(style_element (raw_text) @capture)"))
|
||||
(treesit-parser-set-included-ranges css css-range)
|
||||
</pre><pre class="example">
|
||||
|
||||
;; Set JavaScript ranges.
|
||||
</pre><pre class="example">;; Set JavaScript ranges.
|
||||
(setq js-range
|
||||
(treesit-query-range
|
||||
'html
|
||||
|
@ -238,15 +251,15 @@
|
|||
(treesit-parser-set-included-ranges js js-range)
|
||||
</pre></div>
|
||||
|
||||
<p>We use a query pattern <code>(style_element (raw_text) @capture)</code> to
|
||||
find CSS nodes in the HTML parse tree. For how to write query
|
||||
patterns, see <a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>.
|
||||
<p>We use a query pattern <code><span class="nolinebreak">(style_element</span> <span class="nolinebreak">(raw_text)</span> @capture)</code><!-- /@w -->
|
||||
to find <acronym>CSS</acronym> nodes in the <acronym>HTML</acronym> parse tree. For how
|
||||
to write query patterns, see <a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>.
|
||||
</p>
|
||||
</div>
|
||||
<hr>
|
||||
<div class="header">
|
||||
<p>
|
||||
Next: <a href="Tree_002dsitter-C-API.html">Tree-sitter C API Correspondence</a>, Previous: <a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
|
||||
Next: <a href="Tree_002dsitter-major-modes.html">Developing major modes with tree-sitter</a>, Previous: <a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
|
||||
</div>
|
||||
|
||||
|
||||
|
|
|
@ -66,47 +66,78 @@
|
|||
</div>
|
||||
<hr>
|
||||
<span id="Parser_002dbased-Font-Lock-1"></span><h4 class="subsection">24.6.10 Parser-based Font Lock</h4>
|
||||
<span id="index-parser_002dbased-font_002dlock"></span>
|
||||
|
||||
|
||||
<p>Besides simple syntactic font lock and regexp-based font lock, Emacs
|
||||
also provides complete syntactic font lock with the help of a parser,
|
||||
currently provided by the tree-sitter library (see <a href="Parsing-Program-Source.html">Parsing Program Source</a>).
|
||||
also provides complete syntactic font lock with the help of a parser.
|
||||
Currently, Emacs uses the tree-sitter library (see <a href="Parsing-Program-Source.html">Parsing Program Source</a>) for this purpose.
|
||||
</p>
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dfont_002dlock_002denable"><span class="category">Function: </span><span><strong>treesit-font-lock-enable</strong><a href='#index-treesit_002dfont_002dlock_002denable' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function enables parser-based font lock in the current buffer.
|
||||
</p></dd></dl>
|
||||
|
||||
<p>Parser-based font lock and other font lock mechanism are not mutually
|
||||
<p>Parser-based font lock and other font lock mechanisms are not mutually
|
||||
exclusive. By default, if enabled, parser-based font lock runs first,
|
||||
then the simple syntactic font lock (if enabled), then regexp-based
|
||||
font lock.
|
||||
replacing syntactic font lock, then the regexp-based font lock.
|
||||
</p>
|
||||
<p>Although parser-based font lock doesn’t share the same customization
|
||||
variables with regexp-based font lock, parser-based font lock uses
|
||||
similar customization schemes. The tree-sitter counterpart of
|
||||
<var>font-lock-keywords</var> is <var>treesit-font-lock-settings</var>.
|
||||
variables with regexp-based font lock, it uses similar customization
|
||||
schemes. The tree-sitter counterpart of <var>font-lock-keywords</var> is
|
||||
<var>treesit-font-lock-settings</var>.
|
||||
</p>
|
||||
<span id="index-tree_002dsitter-fontifications_002c-overview"></span>
|
||||
<span id="index-fontifications-with-tree_002dsitter_002c-overview"></span>
|
||||
<p>In general, tree-sitter fontification works as follows:
|
||||
</p>
|
||||
<ul>
|
||||
<li> A Lisp program (usually, part of a major mode) provides a <em>query</em>
|
||||
consisting of <em>patterns</em>, each pattern associated with a
|
||||
<em>capture name</em>.
|
||||
|
||||
</li><li> The tree-sitter library finds the nodes in the parse tree
|
||||
that match these patterns, tags the nodes with the corresponding
|
||||
capture names, and returns them to the Lisp program.
|
||||
|
||||
</li><li> The Lisp program uses the returned nodes to highlight the portions of
|
||||
buffer text corresponding to each node as appropriate, using the
|
||||
tagged capture names of the nodes to determine the correct
|
||||
fontification. For example, a node tagged <code>font-lock-keyword-face</code>
|
||||
would be highlighted with the <code>font-lock-keyword-face</code> face.
|
||||
</li></ul>
|
||||
|
||||
<p>For more information about queries, patterns, and capture names, see
|
||||
<a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>.
|
||||
</p>
|
||||
<p>To set up tree-sitter fontification, a major mode should first set
|
||||
<code>treesit-font-lock-settings</code> with the output of
|
||||
<code>treesit-font-lock-rules</code>, then call
|
||||
<code>treesit-major-mode-setup</code>.
|
||||
</p>
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dfont_002dlock_002drules"><span class="category">Function: </span><span><strong>treesit-font-lock-rules</strong> <em>:keyword value query...</em><a href='#index-treesit_002dfont_002dlock_002drules' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function is used to set <var>treesit-font-lock-settings</var>. It
|
||||
takes care of compiling queries and other post-processing and outputs
|
||||
a value that <var>treesit-font-lock-settings</var> accepts. An example:
|
||||
takes care of compiling queries and other post-processing, and outputs
|
||||
a value that <var>treesit-font-lock-settings</var> accepts. Here’s an
|
||||
example:
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(treesit-font-lock-rules
|
||||
:language 'javascript
|
||||
:feature 'constant
|
||||
:override t
|
||||
'((true) @font-lock-constant-face
|
||||
(false) @font-lock-constant-face)
|
||||
:language 'html
|
||||
:feature 'script
|
||||
"(script_element) @font-lock-builtin-face")
|
||||
</pre></div>
|
||||
|
||||
<p>This function takes a list of text or s-exp queries. Before each
|
||||
query, there are <var>:keyword</var> and <var>value</var> pairs that configure
|
||||
that query. The <code>:language</code> keyword sets the query’s language and
|
||||
every query must specify the language. Other keywords are optional:
|
||||
query, there are <var>:keyword</var>-<var>value</var> pairs that configure
|
||||
that query. The <code>:language</code> keyword sets the query’s language and
|
||||
every query must specify the language. The <code>:feature</code> keyword
|
||||
sets the feature name of the query. Users can control which features
|
||||
are enabled with <code>font-lock-maximum-decoration</code> and
|
||||
<code>treesit-font-lock-feature-list</code> (see below).
|
||||
</p>
|
||||
<p>Other keywords are optional:
|
||||
</p>
|
||||
<table>
|
||||
<thead><tr><th width="15%">Keyword</th><th width="15%">Value</th><th width="60%">Description</th></tr></thead>
|
||||
|
@ -115,43 +146,92 @@
|
|||
<tr><td width="15%"></td><td width="15%"><code>append</code></td><td width="60%">Append the new face to existing ones</td></tr>
|
||||
<tr><td width="15%"></td><td width="15%"><code>prepend</code></td><td width="60%">Prepend the new face to existing ones</td></tr>
|
||||
<tr><td width="15%"></td><td width="15%"><code>keep</code></td><td width="60%">Fill-in regions without an existing face</td></tr>
|
||||
<tr><td width="15%"><code>:toggle</code></td><td width="15%"><var>symbol</var></td><td width="60%">If non-nil, its value should be a variable name. The variable’s value
|
||||
(nil/non-nil) disables/enables the query during fontification.</td></tr>
|
||||
<tr><td width="15%"></td><td width="15%">nil</td><td width="60%">Always enable this query.</td></tr>
|
||||
<tr><td width="15%"><code>:level</code></td><td width="15%"><var>integer</var></td><td width="60%">If non-nil, its value should be the decoration level for this query.
|
||||
Decoration level is controlled by <code>font-lock-maximum-decoration</code>.</td></tr>
|
||||
<tr><td width="15%"></td><td width="15%">nil</td><td width="60%">Always enable this query.</td></tr>
|
||||
</table>
|
||||
|
||||
<p>Note that a query is applied only when both <code>:toggle</code> and
|
||||
<code>:level</code> permit it. <code>:level</code> is used for global,
|
||||
coarse-grained control, whereas <code>:toggle</code> is for local,
|
||||
fine-grained control.
|
||||
</p>
|
||||
<p>Capture names in <var>query</var> should be face names like
|
||||
<p>Lisp programs mark patterns in the query with capture names (names
|
||||
that start with <code>@</code>), and tree-sitter will return matched nodes
|
||||
tagged with those same capture names. For the purpose of
|
||||
fontification, capture names in <var>query</var> should be face names like
|
||||
<code>font-lock-keyword-face</code>. The captured node will be fontified
|
||||
with that face. Capture names can also be function names, in which
|
||||
case the function is called with (<var>start</var> <var>end</var> <var>node</var>),
|
||||
where <var>start</var> and <var>end</var> are the start and end position of the
|
||||
node in buffer, and <var>node</var> is the node itself. If a capture name
|
||||
is both a face and a function, the face takes priority. If a capture
|
||||
name is not a face name nor a function name, it is ignored.
|
||||
with that face.
|
||||
</p>
|
||||
<span id="index-treesit_002dfontify_002dwith_002doverride"></span>
|
||||
<p>Capture names can also be function names, in which case the function
|
||||
is called with 4 arguments: <var>node</var>, <var>override</var>, <var>start</var>,
|
||||
and <var>end</var>, where <var>node</var> is the node itself, <var>override</var> is
|
||||
the override property of the rule which captured this node, and
|
||||
<var>start</var> and <var>end</var> limit the region in which this function
|
||||
should fontify. (If this function wants to respect the <var>override</var>
|
||||
argument, it can use <code>treesit-fontify-with-override</code>.)
|
||||
</p>
|
||||
<p>Beyond the 4 arguments presented, this function should accept more
|
||||
arguments as optional arguments for future extensibility.
|
||||
</p>
|
||||
<p>If a capture name is both a face and a function, the face takes
|
||||
priority. If a capture name is neither a face nor a function, it is
|
||||
ignored.
|
||||
</p></dd></dl>
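<p>The following sketch shows what a function used as a capture name
might look like.  The name <code>my-fontify-comment</code> is hypothetical,
and the sketch assumes that <code>treesit-fontify-with-override</code>
takes its arguments in the order <var>start</var>, <var>end</var>,
<var>face</var>, <var>override</var>.  The trailing <code>&rest</code>
parameter absorbs any arguments added in the future.
</p>
<div class="example">
<pre class="example">(defun my-fontify-comment (node override start end &rest _)
  "Fontify NODE as a comment, but only between START and END."
  (treesit-fontify-with-override
   ;; Clip the node's extent to the region being fontified.
   (max start (treesit-node-start node))
   (min end (treesit-node-end node))
   'font-lock-comment-face
   override))

(treesit-font-lock-rules
 :language 'javascript
 :feature 'comment
 '((comment) @my-fontify-comment))
</pre></div>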
|
||||
|
||||
<p>Contextual entities, like multi-line strings, or <code>/* */</code> style
|
||||
comments, need special care, because change in these entities might
|
||||
cause change in a large portion of the buffer. For example, inserting
|
||||
the closing comment delimiter <code>*/</code> will change all the text
|
||||
between it and the opening delimiter to comment face. Such entities
|
||||
should be captured under the special name <code>contextual</code>, so Emacs can
|
||||
correctly update their fontification. Here is an example for
|
||||
comments:
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(treesit-font-lock-rules
|
||||
:language 'javascript
|
||||
:feature 'comment
|
||||
:override t
|
||||
'((comment) @font-lock-comment-face
|
||||
(comment) @contextual))
|
||||
</pre></div>
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dfont_002dlock_002dfeature_002dlist"><span class="category">Variable: </span><span><strong>treesit-font-lock-feature-list</strong><a href='#index-treesit_002dfont_002dlock_002dfeature_002dlist' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This is a list of lists of feature symbols. Each element of the list
|
||||
is a list that represents a decoration level.
|
||||
<code>font-lock-maximum-decoration</code> controls which levels are
|
||||
activated.
|
||||
</p>
|
||||
<p>Each element of the list is a list of the form <code>(<var>feature</var> …)</code><!-- /@w -->, where each <var>feature</var> corresponds to the
|
||||
<code>:feature</code> value of a query defined in
|
||||
<code>treesit-font-lock-rules</code>. Removing a feature symbol from this
|
||||
list disables the corresponding query during font-lock.
|
||||
</p>
|
||||
<p>Common feature names, for many programming languages, include
|
||||
function-name, type, variable-name (left-hand-side or <acronym>LHS</acronym> of
|
||||
assignments), builtin, constant, keyword, string-interpolation,
|
||||
comment, doc, string, operator, preprocessor, escape-sequence, and key
|
||||
(in key-value pairs). Major modes are free to subdivide or extend
|
||||
these common features.
|
||||
</p>
|
||||
<p>For example, the value of this variable could be:
|
||||
</p><div class="example">
|
||||
<pre class="example">((comment string doc) ; level 1
|
||||
(function-name keyword type builtin constant) ; level 2
|
||||
(variable-name string-interpolation key)) ; level 3
|
||||
</pre></div>
|
||||
|
||||
<p>Major modes should set this variable before calling
|
||||
<code>treesit-major-mode-setup</code>.
|
||||
</p>
|
||||
<span id="index-treesit_002dfont_002dlock_002drecompute_002dfeatures"></span>
|
||||
<p>For this variable to take effect, a Lisp program should call
|
||||
<code>treesit-font-lock-recompute-features</code> (which resets
|
||||
<code>treesit-font-lock-settings</code> accordingly), or
|
||||
<code>treesit-major-mode-setup</code> (which calls
|
||||
<code>treesit-font-lock-recompute-features</code>).
|
||||
</p></dd></dl>
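<p>Putting these pieces together, a major mode’s font-lock setup might
look roughly like the sketch below.  The mode name <code>my-js-mode</code>
and the choice of features are hypothetical; only the order of the
steps is taken from the description above.
</p>
<div class="example">
<pre class="example">(define-derived-mode my-js-mode prog-mode "MyJS"
  "Hypothetical major mode, for illustration only."
  ;; 1. Compile the queries, one :feature per group of patterns.
  (setq-local treesit-font-lock-settings
              (treesit-font-lock-rules
               :language 'javascript
               :feature 'comment
               '((comment) @font-lock-comment-face)
               :language 'javascript
               :feature 'constant
               '((true) @font-lock-constant-face
                 (false) @font-lock-constant-face)))
  ;; 2. Assign each feature to a decoration level.
  (setq-local treesit-font-lock-feature-list
              '((comment)     ; level 1
                (constant)))  ; level 2
  ;; 3. Let tree-sitter set up font lock for this buffer.
  (treesit-major-mode-setup))
</pre></div>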
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dfont_002dlock_002dsettings"><span class="category">Variable: </span><span><strong>treesit-font-lock-settings</strong><a href='#index-treesit_002dfont_002dlock_002dsettings' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>A list of <var>setting</var>s for tree-sitter font lock. The exact format
|
||||
<dd><p>A list of settings for tree-sitter based font lock. The exact format
|
||||
of this variable is considered internal. One should always use
|
||||
<code>treesit-font-lock-rules</code> to set this variable.
|
||||
</p>
|
||||
<p>Each <var>setting</var> is of form
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(<var>language</var> <var>query</var>)
|
||||
</pre></div>
|
||||
|
||||
<p>Each <var>setting</var> controls one parser (often of different language).
|
||||
And <var>language</var> is the language symbol (see <a href="Language-Definitions.html">Tree-sitter Language Definitions</a>); <var>query</var> is the query (see <a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>).
|
||||
</p></dd></dl>
|
||||
|
||||
<p>Multi-language major modes should provide range functions in
|
||||
|
|
|
@ -66,170 +66,176 @@
|
|||
</div>
|
||||
<hr>
|
||||
<span id="Parser_002dbased-Indentation-1"></span><h4 class="subsection">24.7.2 Parser-based Indentation</h4>
|
||||
<span id="index-parser_002dbased-indentation"></span>
|
||||
|
||||
|
||||
<p>When built with the tree-sitter library (see <a href="Parsing-Program-Source.html">Parsing Program Source</a>), Emacs could parse program source and produce a syntax tree.
|
||||
And this syntax tree can be used for indentation. For maximum
|
||||
flexibility, we could write a custom indent function that queries the
|
||||
syntax tree and indents accordingly for each language, but that would
|
||||
be a lot of work. It is more convenient to use the simple indentation
|
||||
engine described below: we only need to write some indentation rules
|
||||
<p>When built with the tree-sitter library (see <a href="Parsing-Program-Source.html">Parsing Program Source</a>), Emacs is capable of parsing the program source and producing
|
||||
a syntax tree. This syntax tree can be used for guiding the program
|
||||
source indentation commands. For maximum flexibility, it is possible
|
||||
to write a custom indentation function that queries the syntax tree
|
||||
and indents accordingly for each language, but that is a lot of work.
|
||||
It is more convenient to use the simple indentation engine described
|
||||
below: then the major mode needs only to write some indentation rules
|
||||
and the engine takes care of the rest.
|
||||
</p>
|
||||
<p>To enable the indentation engine, set the value of
|
||||
<p>To enable the parser-based indentation engine, either set
|
||||
<var>treesit-simple-indent-rules</var> and call
|
||||
<code>treesit-major-mode-setup</code>, or equivalently, set the value of
|
||||
<code>indent-line-function</code> to <code>treesit-indent</code>.
|
||||
</p>
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dindent_002dfunction"><span class="category">Variable: </span><span><strong>treesit-indent-function</strong><a href='#index-treesit_002dindent_002dfunction' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This variable stores the actual function called by
|
||||
<code>treesit-indent</code>. By default, its value is
|
||||
<code>treesit-simple-indent</code>. In the future we might add other
|
||||
<code>treesit-simple-indent</code>. In the future we might add other,
|
||||
more complex indentation engines.
|
||||
</p></dd></dl>
|
||||
|
||||
<span id="Writing-indentation-rules"></span><h3 class="heading">Writing indentation rules</h3>
|
||||
<span id="index-indentation-rules_002c-for-parser_002dbased-indentation"></span>
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dsimple_002dindent_002drules"><span class="category">Variable: </span><span><strong>treesit-simple-indent-rules</strong><a href='#index-treesit_002dsimple_002dindent_002drules' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This local variable stores indentation rules for every language. It is
|
||||
a list of
|
||||
<dd><p>This local variable stores indentation rules for every language. It is
|
||||
a list of elements of the form <code>(<var>language</var> . <var>rules</var>)</code><!-- /@w -->, where
|
||||
<var>language</var> is a language symbol, and <var>rules</var> is a list of rules of the
|
||||
form <code>(<var>matcher</var> <var>anchor</var> <var>offset</var>)</code><!-- /@w -->.
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(<var>language</var> . <var>rules</var>)
|
||||
</pre></div>
|
||||
|
||||
<p>where <var>language</var> is a language symbol, and <var>rules</var> is a list
|
||||
of
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(<var>matcher</var> <var>anchor</var> <var>offset</var>)
|
||||
</pre></div>
|
||||
|
||||
<p>First Emacs passes the node at point to <var>matcher</var>, if it return
|
||||
non-nil, this rule applies. Then Emacs passes the node to
|
||||
<var>anchor</var>, it returns a point. Emacs takes the column number of
|
||||
that point, add <var>offset</var> to it, and the result is the indent for
|
||||
the current line.
|
||||
<p>First, Emacs passes the smallest tree-sitter node at the beginning of
|
||||
the current line to <var>matcher</var>; if it returns non-<code>nil</code>, this
|
||||
rule is applicable. Then Emacs passes the node to <var>anchor</var>, which
|
||||
returns a buffer position. Emacs takes the column number of that
|
||||
position, adds <var>offset</var> to it, and the result is the indentation
|
||||
column for the current line.
|
||||
</p>
|
||||
<p>The <var>matcher</var> and <var>anchor</var> are functions, and Emacs provides
|
||||
convenient presets for them. You can skip over to
|
||||
<code>treesit-simple-indent-presets</code> below, those presets should be
|
||||
more than enough.
|
||||
convenient defaults for them.
|
||||
</p>
|
||||
<p>A <var>matcher</var> or an <var>anchor</var> is a function that takes three
|
||||
arguments (<var>node</var> <var>parent</var> <var>bol</var>). Argument <var>bol</var> is
|
||||
the point at where we are indenting: the position of the first
|
||||
non-whitespace character from the beginning of line; <var>node</var> is the
|
||||
largest (highest-in-tree) node that starts at that point; <var>parent</var>
|
||||
is the parent of <var>node</var>. A <var>matcher</var> returns nil/non-nil, and
|
||||
<var>anchor</var> returns a point.
|
||||
<p>Each <var>matcher</var> or <var>anchor</var> is a function that takes three
|
||||
arguments: <var>node</var>, <var>parent</var>, and <var>bol</var>. The argument
|
||||
<var>bol</var> is the buffer position whose indentation is required: the
|
||||
position of the first non-whitespace character after the beginning of
|
||||
the line. The argument <var>node</var> is the largest (highest-in-tree)
|
||||
node that starts at that position; and <var>parent</var> is the parent of
|
||||
<var>node</var>. However, when that position is on whitespace or inside
|
||||
a multi-line string, no node starts at that position, so
|
||||
<var>node</var> is <code>nil</code>. In that case, <var>parent</var> would be the
|
||||
smallest node that spans that position.
|
||||
</p>
|
||||
<p>Emacs finds <var>bol</var>, <var>node</var> and <var>parent</var> and
|
||||
passes them to each <var>matcher</var> and <var>anchor</var>. <var>matcher</var>
|
||||
should return non-<code>nil</code> if the rule is applicable, and
|
||||
<var>anchor</var> should return a buffer position.
|
||||
</p></dd></dl>
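<p>As a rough sketch, a set of rules for JavaScript built only from
the presets described under <code>treesit-simple-indent-presets</code>
might look like this.  The node types <code>statement_block</code> and
<code>program</code> and the offset of 2 are assumptions made for
illustration, and the sketch assumes that the rules are tried in order
and the first matching rule is used.
</p>
<div class="example">
<pre class="example">(setq-local treesit-simple-indent-rules
            '((javascript
               ;; A closing brace lines up with the line that
               ;; opened the block.
               ((node-is "}") parent-bol 0)
               ;; Anything else inside a block is indented by 2.
               ((parent-is "statement_block") parent-bol 2)
               ;; Top-level statements stay at the column of the
               ;; first line of the program.
               ((parent-is "program") parent-bol 0))))

;; Enable the engine, as described above.
(treesit-major-mode-setup)
</pre></div>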
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dsimple_002dindent_002dpresets"><span class="category">Variable: </span><span><strong>treesit-simple-indent-presets</strong><a href='#index-treesit_002dsimple_002dindent_002dpresets' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This is a list of presets for <var>matcher</var>s and <var>anchor</var>s in
|
||||
<code>treesit-simple-indent-rules</code>. Each of them represent a function
|
||||
that takes <var>node</var>, <var>parent</var> and <var>bol</var> as arguments.
|
||||
<dd><p>This is a list of defaults for <var>matcher</var>s and <var>anchor</var>s in
|
||||
<code>treesit-simple-indent-rules</code>. Each of them represents a function
|
||||
that takes 3 arguments: <var>node</var>, <var>parent</var> and <var>bol</var>. The
|
||||
available default functions are:
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">no-node
|
||||
</pre></div>
|
||||
|
||||
<p>This matcher matches the case where <var>node</var> is nil, i.e., there is
|
||||
no node that starts at <var>bol</var>. This is the case when <var>bol</var> is
|
||||
at an empty line or inside a multi-line string, etc.
|
||||
<dl compact="compact">
|
||||
<dt id='index-no_002dnode'><span><code>no-node</code><a href='#index-no_002dnode' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This matcher is a function that is called with 3 arguments:
|
||||
<var>node</var>, <var>parent</var>, and <var>bol</var>, and returns non-<code>nil</code>,
|
||||
indicating a match, if <var>node</var> is <code>nil</code>, i.e., there is no
|
||||
node that starts at <var>bol</var>. This is the case when <var>bol</var> is on
|
||||
an empty line or inside a multi-line string, etc.
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(parent-is <var>type</var>)
|
||||
</pre></div>
|
||||
|
||||
<p>This matcher matches if <var>parent</var>’s type is <var>type</var>.
|
||||
</dd>
|
||||
<dt id='index-parent_002dis'><span><code>parent-is</code><a href='#index-parent_002dis' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This matcher is a function of one argument, <var>type</var>; it returns a
|
||||
function that is called with 3 arguments: <var>node</var>, <var>parent</var>,
|
||||
and <var>bol</var>, and returns non-<code>nil</code> (i.e., a match) if
|
||||
<var>parent</var>’s type matches regexp <var>type</var>.
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(node-is <var>type</var>)
|
||||
</pre></div>
|
||||
|
||||
<p>This matcher matches if <var>node</var>’s type is <var>type</var>.
|
||||
</dd>
|
||||
<dt id='index-node_002dis'><span><code>node-is</code><a href='#index-node_002dis' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This matcher is a function of one argument, <var>type</var>; it returns a
|
||||
function that is called with 3 arguments: <var>node</var>, <var>parent</var>,
|
||||
and <var>bol</var>, and returns non-<code>nil</code> if <var>node</var>’s type matches
|
||||
regexp <var>type</var>.
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(query <var>query</var>)
|
||||
</pre></div>
|
||||
|
||||
<p>This matcher matches if querying <var>parent</var> with <var>query</var>
|
||||
captures <var>node</var>. The capture name does not matter.
|
||||
</dd>
|
||||
<dt id='index-query'><span><code>query</code><a href='#index-query' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This matcher is a function of one argument, <var>query</var>; it returns a
|
||||
function that is called with 3 arguments: <var>node</var>, <var>parent</var>,
|
||||
and <var>bol</var>, and returns non-<code>nil</code> if querying <var>parent</var>
|
||||
with <var>query</var> captures <var>node</var> (see <a href="Pattern-Matching.html">Pattern Matching Tree-sitter Nodes</a>).
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(match <var>node-type</var> <var>parent-type</var>
|
||||
<var>node-field</var> <var>node-index-min</var> <var>node-index-max</var>)
|
||||
</pre></div>
|
||||
|
||||
<p>This matcher checks if <var>node</var>’s type is <var>node-type</var>,
|
||||
<var>parent</var>’s type is <var>parent-type</var>, <var>node</var>’s field name in
|
||||
<var>parent</var> is <var>node-field</var>, and <var>node</var>’s index among its
|
||||
siblings is between <var>node-index-min</var> and <var>node-index-max</var>. If
|
||||
the value of a constraint is nil, this matcher doesn’t check for that
|
||||
constraint. For example, to match the first child where parent is
|
||||
<code>argument_list</code>, use
|
||||
</dd>
|
||||
<dt id='index-match'><span><code>match</code><a href='#index-match' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This matcher is a function of 5 arguments: <var>node-type</var>,
|
||||
<var>parent-type</var>, <var>node-field</var>, <var>node-index-min</var>, and
|
||||
<var>node-index-max</var>. It returns a function that is called with 3
|
||||
arguments: <var>node</var>, <var>parent</var>, and <var>bol</var>, and returns
|
||||
non-<code>nil</code> if <var>node</var>’s type matches regexp <var>node-type</var>,
|
||||
<var>parent</var>’s type matches regexp <var>parent-type</var>, <var>node</var>’s
|
||||
field name in <var>parent</var> matches regexp <var>node-field</var>, and
|
||||
<var>node</var>’s index among its siblings is between <var>node-index-min</var>
|
||||
and <var>node-index-max</var>. If the value of an argument is <code>nil</code>,
|
||||
this matcher doesn’t check that argument. For example, to match the
|
||||
first child where parent is <code>argument_list</code>, use
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(match nil "argument_list" nil 0 0)
|
||||
</pre></div>
|
||||
|
||||
<div class="example">
|
||||
<pre class="example">first-sibling
|
||||
</pre></div>
|
||||
|
||||
<p>This anchor returns the start of the first child of <var>parent</var>.
|
||||
</dd>
|
||||
<dt id='index-first_002dsibling'><span><code>first-sibling</code><a href='#index-first_002dsibling' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This anchor is a function that is called with 3 arguments: <var>node</var>,
|
||||
<var>parent</var>, and <var>bol</var>, and returns the start of the first child
|
||||
of <var>parent</var>.
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">parent
|
||||
</pre></div>
|
||||
|
||||
<p>This anchor returns the start of <var>parent</var>.
|
||||
</dd>
|
||||
<dt id='index-parent'><span><code>parent</code><a href='#index-parent' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This anchor is a function that is called with 3 arguments: <var>node</var>,
|
||||
<var>parent</var>, and <var>bol</var>, and returns the start of <var>parent</var>.
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">parent-bol
|
||||
</pre></div>
|
||||
|
||||
<p>This anchor returns the beginning of non-space characters on the line
|
||||
where <var>parent</var> is on.
|
||||
</dd>
|
||||
<dt id='index-parent_002dbol'><span><code>parent-bol</code><a href='#index-parent_002dbol' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This anchor is a function that is called with 3 arguments: <var>node</var>,
|
||||
<var>parent</var>, and <var>bol</var>, and returns the first non-space character
|
||||
on the line of <var>parent</var>.
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">prev-sibling
|
||||
</pre></div>
|
||||
|
||||
<p>This anchor returns the start of the previous sibling of <var>node</var>.
|
||||
</dd>
|
||||
<dt id='index-prev_002dsibling'><span><code>prev-sibling</code><a href='#index-prev_002dsibling' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This anchor is a function that is called with 3 arguments: <var>node</var>,
|
||||
<var>parent</var>, and <var>bol</var>, and returns the start of the previous
|
||||
sibling of <var>node</var>.
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">no-indent
|
||||
</pre></div>
|
||||
|
||||
<p>This anchor returns the start of <var>node</var>, i.e., no indent.
|
||||
</dd>
|
||||
<dt id='index-no_002dindent'><span><code>no-indent</code><a href='#index-no_002dindent' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This anchor is a function that is called with 3 arguments: <var>node</var>,
|
||||
<var>parent</var>, and <var>bol</var>, and returns the start of <var>node</var>.
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">prev-line
|
||||
</pre></div>
|
||||
</dd>
|
||||
<dt id='index-prev_002dline'><span><code>prev-line</code><a href='#index-prev_002dline' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This anchor is a function that is called with 3 arguments: <var>node</var>,
|
||||
<var>parent</var>, and <var>bol</var>, and returns the first non-whitespace
|
||||
character on the previous line.
|
||||
</p></dd>
|
||||
</dl>
|
||||
|
||||
<p>This anchor returns the first non-whitespace character on the previous
|
||||
line.
|
||||
</p></dd></dl>
|
||||
</dd></dl>
|
||||
|
||||
<span id="Indentation-utilities"></span><h3 class="heading">Indentation utilities</h3>
|
||||
<span id="index-utility-functions-for-parser_002dbased-indentation"></span>
|
||||
|
||||
<p>Here are some utility functions that can help writing indentation
|
||||
rules.
|
||||
<p>Here are some utility functions that can help writing parser-based
|
||||
indentation rules.
|
||||
</p>
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dcheck_002dindent"><span class="category">Function: </span><span><strong>treesit-check-indent</strong> <em>mode</em><a href='#index-treesit_002dcheck_002dindent' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function checks current buffer’s indentation against major mode
|
||||
<var>mode</var>. It indents the current buffer in <var>mode</var> and compares
|
||||
the indentation with the current indentation. Then it pops up a diff
|
||||
buffer showing the difference. Correct indentation (target) is in
|
||||
green, current indentation is in red.
|
||||
</p></dd></dl>
|
||||
<dd><p>This function checks the current buffer’s indentation against major
|
||||
mode <var>mode</var>. It indents the current buffer according to
|
||||
<var>mode</var> and compares the results with the current indentation.
|
||||
Then it pops up a buffer showing the differences. Correct
|
||||
indentation (target) is shown in green color, current indentation is
|
||||
shown in red color. </p></dd></dl>
|
||||
|
||||
<p>It is also helpful to use <code>treesit-inspect-mode</code> when writing
|
||||
indentation rules.
|
||||
<p>It is also helpful to use <code>treesit-inspect-mode</code> (see <a href="Language-Definitions.html">Tree-sitter Language Definitions</a>) when writing indentation rules.
|
||||
</p>
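<p>As a rough sketch of how the <code>match</code> matcher and the anchors
described above might be combined, the rules below indent children of
a C <code>argument_list</code>.  The variable
<code>treesit-simple-indent-rules</code>, the language symbol <code>c</code>,
and the offsets are assumptions chosen for illustration; they are not
taken from any existing major mode.
</p>
<div class="example">
<pre class="example">;; Illustrative only: each rule has the form (MATCHER ANCHOR OFFSET).
(setq-local treesit-simple-indent-rules
            '((c
               ;; First child of an argument_list: 4 columns past the
               ;; first non-space character on the parent's line.
               ((match nil "argument_list" nil 0 0) parent-bol 4)
               ;; Any other child of an argument_list: align with
               ;; the first sibling.
               ((match nil "argument_list" nil nil nil)
                first-sibling 0))))
</pre></div>

<p>Rules are tried in order, so the more specific rule is listed first.
</p>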
|
||||
</div>
|
||||
<hr>
|
||||
|
|
|
@ -68,53 +68,50 @@
|
|||
<hr>
|
||||
<span id="Parsing-Program-Source-1"></span><h2 class="chapter">37 Parsing Program Source</h2>
|
||||
|
||||
<span id="index-syntax-tree_002c-from-parsing-program-source"></span>
|
||||
<p>Emacs provides various ways to parse program source text and produce a
|
||||
<em>syntax tree</em>. In a syntax tree, text is no longer a
|
||||
one-dimensional stream but a structured tree of nodes, where each node
|
||||
represents a piece of text.  Thus a syntax tree can enable
|
||||
interesting features like precise fontification, indentation,
|
||||
<em>syntax tree</em>. In a syntax tree, text is no longer considered a
|
||||
one-dimensional stream of characters, but a structured tree of nodes,
|
||||
where each node represents a piece of text.  Thus, a syntax tree can
|
||||
enable interesting features like precise fontification, indentation,
|
||||
navigation, structured editing, etc.
|
||||
</p>
|
||||
<p>Emacs has a simple facility for parsing balanced expressions
|
||||
(see <a href="Parsing-Expressions.html">Parsing Expressions</a>). There is also SMIE library for generic
|
||||
navigation and indentation (see <a href="SMIE.html">Simple Minded Indentation Engine</a>).
|
||||
(see <a href="Parsing-Expressions.html">Parsing Expressions</a>). There is also the SMIE library for
|
||||
generic navigation and indentation (see <a href="SMIE.html">Simple Minded Indentation Engine</a>).
|
||||
</p>
|
||||
<p>Emacs also provides integration with tree-sitter library
|
||||
(<a href="https://tree-sitter.github.io/tree-sitter">https://tree-sitter.github.io/tree-sitter</a>) if compiled with
|
||||
it. The tree-sitter library implements an incremental parser and has
|
||||
support from a wide range of programming languages.
|
||||
<p>In addition to those, Emacs also provides integration with
|
||||
<a href="https://tree-sitter.github.io/tree-sitter">the tree-sitter
|
||||
library</a> if support for it was compiled in.  The tree-sitter library
|
||||
implements an incremental parser and has support for a wide range of
|
||||
programming languages.
|
||||
</p>
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002davailable_002dp"><span class="category">Function: </span><span><strong>treesit-available-p</strong><a href='#index-treesit_002davailable_002dp' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function returns non-nil if tree-sitter features are available
|
||||
for this Emacs instance.
|
||||
<dd><p>This function returns non-<code>nil</code> if tree-sitter features are
|
||||
available for the current Emacs session.
|
||||
</p></dd></dl>
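<p>As a minimal sketch of how a Lisp program might guard its tree-sitter
setup, the check below tests that the function exists before calling
it, so that it also works in an Emacs that predates tree-sitter
support.  The messages are only illustrative.
</p>
<div class="example">
<pre class="example">;; Hypothetical guard: only proceed when tree-sitter is usable.
(if (and (fboundp 'treesit-available-p)
         (treesit-available-p))
    (message "tree-sitter features are available")
  (message "tree-sitter features are not available"))
</pre></div>
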
|
||||
|
||||
<p>For tree-sitter integration with existing Emacs features,
|
||||
see <a href="Parser_002dbased-Font-Lock.html">Parser-based Font Lock</a>, <a href="Parser_002dbased-Indentation.html">Parser-based Indentation</a>, and
|
||||
<a href="List-Motion.html">Moving over Balanced Expressions</a>.
|
||||
</p>
|
||||
<p>About naming convention: use “tree-sitter” when referring to it as a
|
||||
noun, like <code>python-use-tree-sitter</code>, but use “treesit” for
|
||||
prefixes, like <code>python-treesit-indent-function</code>.
|
||||
</p>
|
||||
<p>To access the syntax tree of the text in a buffer, we need to first
|
||||
load a language definition and create a parser with it. Next, we can
|
||||
query the parser for specific nodes in the syntax tree. Then, we can
|
||||
access various information about the node, and we can pattern-match a
|
||||
node with a powerful syntax. Finally, we explain how to work with
|
||||
source files that mix multiple languages.  The following sections
|
||||
explain how to do each of the tasks in detail.
|
||||
<p>To be able to parse the program source using the tree-sitter library
|
||||
and access the syntax tree of the program, a Lisp program needs to
|
||||
load a language definition library, and create a parser for that
|
||||
language and the current buffer. After that, the Lisp program can
|
||||
query the parser about specific nodes of the syntax tree. Then, it
|
||||
can access various kinds of information about each node, and search
|
||||
for nodes using a powerful pattern-matching syntax. This chapter
|
||||
explains how to do all this, and also how a Lisp program can work with
|
||||
source files that mix multiple programming languages.
|
||||
</p>
|
||||
|
||||
<ul class="section-toc">
|
||||
<li><a href="Language-Definitions.html" accesskey="1">Tree-sitter Language Definitions</a></li>
|
||||
<li><a href="Using-Parser.html" accesskey="2">Using Tree-sitter Parser</a></li>
|
||||
<li><a href="Retrieving-Node.html" accesskey="3">Retrieving Node</a></li>
|
||||
<li><a href="Accessing-Node.html" accesskey="4">Accessing Node Information</a></li>
|
||||
<li><a href="Accessing-Node-Information.html" accesskey="4">Accessing Node Information</a></li>
|
||||
<li><a href="Pattern-Matching.html" accesskey="5">Pattern Matching Tree-sitter Nodes</a></li>
|
||||
<li><a href="Multiple-Languages.html" accesskey="6">Parsing Text in Multiple Languages</a></li>
|
||||
<li><a href="Tree_002dsitter-C-API.html" accesskey="7">Tree-sitter C API Correspondence</a></li>
|
||||
<li><a href="Tree_002dsitter-major-modes.html" accesskey="7">Developing major modes with tree-sitter</a></li>
|
||||
<li><a href="Tree_002dsitter-C-API.html" accesskey="8">Tree-sitter C API Correspondence</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
<hr>
|
||||
|
|
|
@ -34,7 +34,7 @@
|
|||
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
|
||||
<link href="Parsing-Program-Source.html" rel="up" title="Parsing Program Source">
|
||||
<link href="Multiple-Languages.html" rel="next" title="Multiple Languages">
|
||||
<link href="Accessing-Node.html" rel="prev" title="Accessing Node">
|
||||
<link href="Accessing-Node-Information.html" rel="prev" title="Accessing Node Information">
|
||||
<style type="text/css">
|
||||
<!--
|
||||
a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
|
||||
|
@ -63,32 +63,32 @@
|
|||
<div class="section" id="Pattern-Matching">
|
||||
<div class="header">
|
||||
<p>
|
||||
Next: <a href="Multiple-Languages.html" accesskey="n" rel="next">Parsing Text in Multiple Languages</a>, Previous: <a href="Accessing-Node.html" accesskey="p" rel="prev">Accessing Node Information</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
|
||||
Next: <a href="Multiple-Languages.html" accesskey="n" rel="next">Parsing Text in Multiple Languages</a>, Previous: <a href="Accessing-Node-Information.html" accesskey="p" rel="prev">Accessing Node Information</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
|
||||
</div>
|
||||
<hr>
|
||||
<span id="Pattern-Matching-Tree_002dsitter-Nodes"></span><h3 class="section">37.5 Pattern Matching Tree-sitter Nodes</h3>
|
||||
<span id="index-pattern-matching-with-tree_002dsitter-nodes"></span>
|
||||
|
||||
<p>Tree-sitter let us pattern match with a small declarative language.
|
||||
Pattern matching consists of two steps: first tree-sitter matches a
|
||||
<em>pattern</em> against nodes in the syntax tree, then it <em>captures</em>
|
||||
specific nodes in that pattern and returns the captured nodes.
|
||||
<span id="index-capturing_002c-tree_002dsitter-node"></span>
|
||||
<p>Tree-sitter lets Lisp programs match patterns using a small
|
||||
declarative language. This pattern matching consists of two steps:
|
||||
first tree-sitter matches a <em>pattern</em> against nodes in the syntax
|
||||
tree, then it <em>captures</em> specific nodes that matched the pattern
|
||||
and returns the captured nodes.
|
||||
</p>
|
||||
<p>We describe first how to write the most basic query pattern and how to
|
||||
capture nodes in a pattern, then the pattern-match function, finally
|
||||
more advanced pattern syntax.
|
||||
capture nodes in a pattern, then the pattern-matching function, and
|
||||
finally the more advanced pattern syntax.
|
||||
</p>
|
||||
<span id="Basic-query-syntax"></span><h3 class="heading">Basic query syntax</h3>
|
||||
|
||||
<span id="index-Tree_002dsitter-query-syntax"></span>
|
||||
<span id="index-Tree_002dsitter-query-pattern"></span>
|
||||
<span id="index-tree_002dsitter-query-pattern-syntax"></span>
|
||||
<span id="index-pattern-syntax_002c-tree_002dsitter-query"></span>
|
||||
<span id="index-query_002c-tree_002dsitter"></span>
|
||||
<p>A <em>query</em> consists of multiple <em>patterns</em>. Each pattern is an
|
||||
s-expression that matches a certain node in the syntax tree.  A
|
||||
pattern has the following shape:
|
||||
pattern has the form <code>(<var>type</var> (<var>child</var>…))</code><!-- /@w -->
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(<var>type</var> <var>child</var>...)
|
||||
</pre></div>
|
||||
|
||||
<p>For example, a pattern that matches a <code>binary_expression</code> node that
|
||||
contains <code>number_literal</code> child nodes would look like
|
||||
</p>
|
||||
|
@ -96,19 +96,20 @@
|
|||
<pre class="example">(binary_expression (number_literal))
|
||||
</pre></div>
|
||||
|
||||
<p>To <em>capture</em> a node in the query pattern above, append
|
||||
<code>@capture-name</code> after the node pattern you want to capture. For
|
||||
example,
|
||||
<p>To <em>capture</em> a node using the query pattern above, append
|
||||
<code>@<var>capture-name</var></code> after the node pattern you want to
|
||||
capture. For example,
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(binary_expression (number_literal) @number-in-exp)
|
||||
</pre></div>
|
||||
|
||||
<p>captures <code>number_literal</code> nodes that are inside a
|
||||
<code>binary_expression</code> node with capture name <code>number-in-exp</code>.
|
||||
<code>binary_expression</code> node with the capture name
|
||||
<code>number-in-exp</code>.
|
||||
</p>
|
||||
<p>We can capture the <code>binary_expression</code> node too, with capture
|
||||
name <code>biexp</code>:
|
||||
<p>We can capture the <code>binary_expression</code> node as well, with, for
|
||||
example, the capture name <code>biexp</code>:
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(binary_expression
|
||||
|
@ -117,34 +118,40 @@
|
|||
|
||||
<span id="Query-function"></span><h3 class="heading">Query function</h3>
|
||||
|
||||
<p>Now we can introduce the query functions.
|
||||
<span id="index-query-functions_002c-tree_002dsitter"></span>
|
||||
<p>Now we can introduce the <em>query functions</em>.
|
||||
</p>
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dquery_002dcapture"><span class="category">Function: </span><span><strong>treesit-query-capture</strong> <em>node query &optional beg end node-only</em><a href='#index-treesit_002dquery_002dcapture' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function matches patterns in <var>query</var> in <var>node</var>.
|
||||
Parameter <var>query</var> can be either a string, a s-expression, or a
|
||||
<dd><p>This function matches patterns in <var>query</var> within <var>node</var>.
|
||||
The argument <var>query</var> can be either a string, an s-expression, or a
|
||||
compiled query object. For now, we focus on the string syntax;
|
||||
s-expression syntax and compiled query are described at the end of the
|
||||
section.
|
||||
</p>
|
||||
<p>Parameter <var>node</var> can also be a parser or a language symbol. A
|
||||
<p>The argument <var>node</var> can also be a parser or a language symbol. A
|
||||
parser means using its root node; a language symbol means finding or
|
||||
creating a parser for that language in the current buffer and using its
|
||||
root node.
|
||||
</p>
|
||||
<p>The function returns all captured nodes in a list of
|
||||
<code>(<var>capture_name</var> . <var>node</var>)</code>. If <var>node-only</var> is
|
||||
non-nil, a list of node is returned instead. If <var>beg</var> and
|
||||
<var>end</var> are both non-nil, this function only pattern matches nodes
|
||||
in that range.
|
||||
<p>The function returns all the captured nodes in a list of the form
|
||||
<code>(<var><span class="nolinebreak">capture_name</span></var> . <var>node</var>)</code><!-- /@w -->. If <var>node-only</var> is
|
||||
non-<code>nil</code>, it returns the list of nodes instead. By default the
|
||||
entire text of <var>node</var> is searched, but if <var>beg</var> and <var>end</var>
|
||||
are both non-<code>nil</code>, they specify the region of buffer text where
|
||||
this function should match nodes. Any matching node whose span
|
||||
overlaps with the region between <var>beg</var> and <var>end</var> is captured;
|
||||
it doesn’t have to be completely in the region.
|
||||
</p>
|
||||
<span id="index-treesit_002dquery_002derror"></span>
|
||||
<p>This function raise a <var>treesit-query-error</var> if <var>query</var> is
|
||||
malformed. The signal data contains a description of the specific
|
||||
error. You can use <code>treesit-query-validate</code> to debug the query.
|
||||
<span id="index-treesit_002dquery_002dvalidate"></span>
|
||||
<p>This function raises the <code>treesit-query-error</code> error if
|
||||
<var>query</var> is malformed. The signal data contains a description of
|
||||
the specific error. You can use <code>treesit-query-validate</code> to
|
||||
validate and debug the query.
|
||||
</p></dd></dl>
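<p>The error signaling described above could be handled roughly as
follows.  This is only a sketch: it assumes that a C language
definition is installed, and the query string is deliberately left
unbalanced so that it is malformed.
</p>
<div class="example">
<pre class="example">;; Assumes a C grammar is available; the query lacks closing parens.
(condition-case err
    (treesit-query-capture 'c "(binary_expression (number_literal")
  (treesit-query-error
   ;; The signal data describes what is wrong with the query.
   (message "Malformed query: %S" (cdr err))))
</pre></div>
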
|
||||
|
||||
<p>For example, suppose <var>node</var>’s content is <code>1 + 2</code>, and
|
||||
<p>For example, suppose <var>node</var>’s text is <code>1 + 2</code>, and
|
||||
<var>query</var> is
|
||||
</p>
|
||||
<div class="example">
|
||||
|
@ -153,7 +160,7 @@
|
|||
(number_literal) @number-in-exp) @biexp")
|
||||
</pre></div>
|
||||
|
||||
<p>Querying that query would return
|
||||
<p>Matching that query would return
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(treesit-query-capture node query)
|
||||
|
@ -162,8 +169,8 @@
|
|||
(number-in-exp . <var><node for "2"></var>))
|
||||
</pre></div>
|
||||
|
||||
<p>As we mentioned earlier, a <var>query</var> could contain multiple
|
||||
patterns. For example, it could have two top-level patterns:
|
||||
<p>As mentioned earlier, <var>query</var> could contain multiple patterns.
|
||||
For example, it could have two top-level patterns:
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(setq query
|
||||
|
@ -173,15 +180,15 @@
|
|||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dquery_002dstring"><span class="category">Function: </span><span><strong>treesit-query-string</strong> <em>string query language</em><a href='#index-treesit_002dquery_002dstring' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function parses <var>string</var> with <var>language</var>, pattern matches
|
||||
its root node with <var>query</var>, and returns the result.
|
||||
<dd><p>This function parses <var>string</var> with <var>language</var>, matches its
|
||||
root node with <var>query</var>, and returns the result.
|
||||
</p></dd></dl>
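<p>For a quick experiment without a buffer, a call along the following
lines could be used.  It is a sketch that assumes a C language
definition is installed; the source string is made up for
illustration.
</p>
<div class="example">
<pre class="example">;; Parse a short C fragment and capture the numbers in the
;; binary expression, as well as the expression itself.
(treesit-query-string
 "int x = 1 + 2;"
 "(binary_expression (number_literal) @number-in-exp) @biexp"
 'c)
</pre></div>

<p>The result has the same form as that of <code>treesit-query-capture</code>.
</p>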
|
||||
|
||||
<span id="More-query-syntax"></span><h3 class="heading">More query syntax</h3>
|
||||
|
||||
<p>Besides node type and capture, tree-sitter’s query syntax can express
|
||||
anonymous node, field name, wildcard, quantification, grouping,
|
||||
alternation, anchor, and predicate.
|
||||
<p>Besides node type and capture, tree-sitter’s pattern syntax can
|
||||
express anonymous node, field name, wildcard, quantification,
|
||||
grouping, alternation, anchor, and predicate.
|
||||
</p>
|
||||
<span id="Anonymous-node"></span><h4 class="subheading">Anonymous node</h4>
|
||||
|
||||
|
@ -194,9 +201,9 @@
|
|||
|
||||
<span id="Wild-card"></span><h4 class="subheading">Wild card</h4>
|
||||
|
||||
<p>In a query pattern, ‘<samp>(_)</samp>’ matches any named node, and ‘<samp>_</samp>’
|
||||
matches any named and anonymous node. For example, to capture any
|
||||
named child of a <code>binary_expression</code> node, the pattern would be
|
||||
<p>In a pattern, ‘<samp>(_)</samp>’ matches any named node, and ‘<samp>_</samp>’ matches
|
||||
any named or anonymous node.  For example, to capture any named child
|
||||
of a <code>binary_expression</code> node, the pattern would be
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(binary_expression (_) @in_biexp)
|
||||
|
@ -204,7 +211,9 @@
|
|||
|
||||
<span id="Field-name"></span><h4 class="subheading">Field name</h4>
|
||||
|
||||
<p>We can capture child nodes that has specific field names:
|
||||
<p>It is possible to capture child nodes that have specific field names.
|
||||
In the pattern below, <code>declarator</code> and <code>body</code> are field
|
||||
names, indicated by the colon following them.
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(function_definition
|
||||
|
@ -212,8 +221,8 @@
|
|||
body: (_) @func-body)
|
||||
</pre></div>
|
||||
|
||||
<p>We can also capture a node that doesn’t have certain field, say, a
|
||||
<code>function_definition</code> without a <code>body</code> field.
|
||||
<p>It is also possible to capture a node that doesn’t have a certain
|
||||
field, say, a <code>function_definition</code> without a <code>body</code> field.
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(function_definition !body) @func-no-body
|
||||
|
@ -221,19 +230,20 @@
|
|||
|
||||
<span id="Quantify-node"></span><h4 class="subheading">Quantify node</h4>
|
||||
|
||||
<span id="index-quantify-node_002c-tree_002dsitter"></span>
|
||||
<p>Tree-sitter recognizes quantification operators ‘<samp>*</samp>’, ‘<samp>+</samp>’ and
|
||||
‘<samp>?</samp>’. Their meanings are the same as in regular expressions:
|
||||
‘<samp>*</samp>’ matches the preceding pattern zero or more times, ‘<samp>+</samp>’
|
||||
matches one or more times, and ‘<samp>?</samp>’ matches zero or one time.
|
||||
</p>
|
||||
<p>For example, this pattern matches <code>type_declaration</code> nodes
|
||||
that has <em>zero or more</em> <code>long</code> keyword.
|
||||
<p>For example, the following pattern matches <code>type_declaration</code>
|
||||
nodes that have <em>zero or more</em> <code>long</code> keywords.
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(type_declaration "long"*) @long-type
|
||||
</pre></div>
|
||||
|
||||
<p>And this pattern matches a type declaration that has zero or one
|
||||
<p>The following pattern matches a type declaration that has zero or one
|
||||
<code>long</code> keyword:
|
||||
</p>
|
||||
<div class="example">
|
||||
|
@ -242,8 +252,8 @@
|
|||
|
||||
<span id="Grouping"></span><h4 class="subheading">Grouping</h4>
|
||||
|
||||
<p>Similar to groups in regular expression, we can bundle patterns into a
|
||||
group and apply quantification operators to it. For example, to
|
||||
<p>Similar to groups in regular expressions, we can bundle patterns into
|
||||
groups and apply quantification operators to them. For example, to
|
||||
express a comma separated list of identifiers, one could write
|
||||
</p>
|
||||
<div class="example">
|
||||
|
@ -253,9 +263,9 @@
|
|||
<span id="Alternation"></span><h4 class="subheading">Alternation</h4>
|
||||
|
||||
<p>Again, similar to regular expressions, we can express “match any one
|
||||
from this group of patterns” in the query pattern. The syntax is a
|
||||
list of patterns enclosed in square brackets. For example, to capture
|
||||
some keywords in C, the query pattern would be
|
||||
from this group of patterns” in a pattern. The syntax is a list of
|
||||
patterns enclosed in square brackets. For example, to capture some
|
||||
keywords in C, the pattern would be
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">[
|
||||
|
@ -277,11 +287,13 @@
|
|||
<div class="example">
|
||||
<pre class="example">;; Anchor the child with the end of its parent.
|
||||
(compound_expression (_) @last-child .)
|
||||
</pre><pre class="example">
|
||||
|
||||
;; Anchor the child with the beginning of its parent.
|
||||
</pre><pre class="example">;; Anchor the child with the beginning of its parent.
|
||||
(compound_expression . (_) @first-child)
|
||||
</pre><pre class="example">
|
||||
|
||||
;; Anchor two adjacent children.
|
||||
</pre><pre class="example">;; Anchor two adjacent children.
|
||||
(compound_expression
|
||||
(_) @prev-child
|
||||
.
|
||||
|
@ -293,8 +305,8 @@
|
|||
</p>
|
||||
<span id="Predicate"></span><h4 class="subheading">Predicate</h4>
|
||||
|
||||
<p>We can add predicate constraints to a pattern. For example, if we use
|
||||
the following query pattern
|
||||
<p>It is possible to add predicate constraints to a pattern. For
|
||||
example, with the following pattern:
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(
|
||||
|
@ -303,33 +315,35 @@
|
|||
)
|
||||
</pre></div>
|
||||
|
||||
<p>Then tree-sitter only matches arrays where the first element equals to
|
||||
<p>tree-sitter only matches arrays where the first element equals
|
||||
the last element. To attach a predicate to a pattern, we need to
|
||||
group then together. A predicate always starts with a ‘<samp>#</samp>’.
|
||||
group them together. A predicate always starts with a ‘<samp>#</samp>’.
|
||||
Currently there are two predicates, <code>#equal</code> and <code>#match</code>.
|
||||
</p>
|
||||
<dl class="def">
|
||||
<dt id="index-equal-1"><span class="category">Predicate: </span><span><strong>equal</strong> <em>arg1 arg2</em><a href='#index-equal-1' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>Matches if <var>arg1</var> equals to <var>arg2</var>. Arguments can be either a
|
||||
string or a capture name. Capture names represent the text that the
|
||||
<dd><p>Matches if <var>arg1</var> equals <var>arg2</var>.  Arguments can be either
|
||||
strings or capture names. Capture names represent the text that the
|
||||
captured node spans in the buffer.
|
||||
</p></dd></dl>
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-match"><span class="category">Predicate: </span><span><strong>match</strong> <em>regexp capture-name</em><a href='#index-match' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>Matches if the text that <var>capture-name</var>’s node spans in the buffer
|
||||
<dt id="index-match-1"><span class="category">Predicate: </span><span><strong>match</strong> <em>regexp capture-name</em><a href='#index-match-1' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>Matches if the text that <var>capture-name</var>’s node spans in the buffer
|
||||
matches regular expression <var>regexp</var>. Matching is case-sensitive.
|
||||
</p></dd></dl>
|
||||
|
||||
<p>Note that a predicate can only refer to capture names appeared in the
|
||||
same pattern. Indeed, it makes little sense to refer to capture names
|
||||
in other patterns anyway.
|
||||
<p>Note that a predicate can only refer to capture names that appear in
|
||||
the same pattern. Indeed, it makes little sense to refer to capture
|
||||
names in other patterns.
|
||||
</p>
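<p>As an illustration of the <code>#match</code> predicate in the string
syntax, the sketch below captures only identifiers that start with an
upper-case letter.  It assumes a C language definition is installed;
the source string and the capture name <code>constant</code> are made up
for this example.
</p>
<div class="example">
<pre class="example">;; The predicate is grouped with the pattern it constrains and
;; refers to a capture name from that same pattern.
(treesit-query-string
 "int Foo = 1; int bar = 2;"
 "((identifier) @constant (#match \"^[A-Z]\" @constant))"
 'c)
</pre></div>
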
|
||||
<span id="S_002dexpression-patterns"></span><h3 class="heading">S-expression patterns</h3>
|
||||
|
||||
<p>Besides strings, Emacs provides a s-expression based syntax for query
|
||||
patterns. It largely resembles the string-based syntax. For example,
|
||||
the following pattern
|
||||
<span id="index-tree_002dsitter-patterns-as-sexps"></span>
|
||||
<span id="index-patterns_002c-tree_002dsitter_002c-in-sexp-form"></span>
|
||||
<p>Besides strings, Emacs provides an s-expression-based syntax for
|
||||
tree-sitter patterns. It largely resembles the string-based syntax.
|
||||
For example, the following query
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">(treesit-query-capture
|
||||
|
@ -353,9 +367,8 @@
|
|||
["return" "break"] @keyword))
|
||||
</pre></div>
|
||||
|
||||
<p>Most pattern syntax can be written directly as strange but
|
||||
never-the-less valid s-expressions. Only a few of them needs
|
||||
modification:
|
||||
<p>Most patterns can be written directly as strange but nevertheless
|
||||
valid s-expressions.  Only a few of them need modification:
|
||||
</p>
|
||||
<ul>
|
||||
<li> Anchor ‘<samp>.</samp>’ is written as <code>:anchor</code>.
|
||||
|
@ -386,42 +399,50 @@
|
|||
|
||||
<span id="Compiling-queries"></span><h3 class="heading">Compiling queries</h3>
|
||||
|
||||
<p>If a query will be used repeatedly, especially in tight loops, it is
|
||||
important to compile that query, because a compiled query is much
|
||||
faster than an uncompiled one. A compiled query can be used anywhere
|
||||
a query is accepted.
|
||||
<span id="index-compiling-tree_002dsitter-queries"></span>
|
||||
<span id="index-queries_002c-compiling"></span>
|
||||
<p>If a query is intended to be used repeatedly, especially in tight
|
||||
loops, it is important to compile that query, because a compiled query
|
||||
is much faster than an uncompiled one. A compiled query can be used
|
||||
anywhere a query is accepted.
|
||||
</p>
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dquery_002dcompile"><span class="category">Function: </span><span><strong>treesit-query-compile</strong> <em>language query</em><a href='#index-treesit_002dquery_002dcompile' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function compiles <var>query</var> for <var>language</var> into a compiled
|
||||
query object and returns it.
|
||||
</p>
|
||||
<p>This function raise a <var>treesit-query-error</var> if <var>query</var> is
|
||||
malformed. The signal data contains a description of the specific
|
||||
error. You can use <code>treesit-query-validate</code> to debug the query.
|
||||
<p>This function raises the <code>treesit-query-error</code> error if
|
||||
<var>query</var> is malformed. The signal data contains a description of
|
||||
the specific error. You can use <code>treesit-query-validate</code> to
|
||||
validate and debug the query.
|
||||
</p></dd></dl>
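<p>As a minimal sketch of this pattern, a query can be compiled once and
the compiled object reused for every later match.  The sketch assumes
that the C grammar is installed and that the current buffer has a C
parser; the names <code>my-c-func-query</code> and
<code>my-c-function-starts</code> are hypothetical.
</p>
<div class="example">
<pre class="example">;; Illustrative names; not part of Emacs.
(defvar my-c-func-query nil
  "Compiled query capturing C function definitions, or nil.")

(defun my-c-function-starts ()
  "Return the start positions of function definitions in this buffer."
  ;; Compile on first use only, then keep the compiled object.
  (unless my-c-func-query
    (setq my-c-func-query
          (treesit-query-compile 'c '((function_definition) @func))))
  ;; Each capture is a cons cell (CAPTURE-NAME . NODE).
  (mapcar (lambda (capture) (treesit-node-start (cdr capture)))
          (treesit-query-capture (treesit-buffer-root-node 'c)
                                 my-c-func-query)))
</pre></div>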
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dquery_002dlanguage"><span class="category">Function: </span><span><strong>treesit-query-language</strong> <em>query</em><a href='#index-treesit_002dquery_002dlanguage' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function returns the language of <var>query</var>.
|
||||
</p></dd></dl>
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dquery_002dexpand"><span class="category">Function: </span><span><strong>treesit-query-expand</strong> <em>query</em><a href='#index-treesit_002dquery_002dexpand' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function expands the s-expression <var>query</var> into a string
|
||||
query.
|
||||
<dd><p>This function converts the s-expression <var>query</var> into the string
|
||||
format.
|
||||
</p></dd></dl>
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dpattern_002dexpand"><span class="category">Function: </span><span><strong>treesit-pattern-expand</strong> <em>pattern</em><a href='#index-treesit_002dpattern_002dexpand' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function expands the s-expression <var>pattern</var> into a string
|
||||
pattern.
|
||||
<dd><p>This function converts the s-expression <var>pattern</var> into the string
|
||||
format.
|
||||
</p></dd></dl>
|
||||
|
||||
<p>Finally, tree-sitter project’s documentation about
|
||||
pattern-matching can be found at
|
||||
<p>For more details, read the tree-sitter project’s documentation about
|
||||
pattern-matching, which can be found at
|
||||
<a href="https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries">https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries</a>.
|
||||
</p>
|
||||
</div>
|
||||
<hr>
|
||||
<div class="header">
|
||||
<p>
|
||||
Next: <a href="Multiple-Languages.html">Parsing Text in Multiple Languages</a>, Previous: <a href="Accessing-Node.html">Accessing Node Information</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
|
||||
Next: <a href="Multiple-Languages.html">Parsing Text in Multiple Languages</a>, Previous: <a href="Accessing-Node-Information.html">Accessing Node Information</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
|
||||
</div>
|
||||
|
||||
|
||||
|
|
|
@ -33,7 +33,7 @@
|
|||
<link href="Index.html" rel="index" title="Index">
|
||||
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
|
||||
<link href="Parsing-Program-Source.html" rel="up" title="Parsing Program Source">
|
||||
<link href="Accessing-Node.html" rel="next" title="Accessing Node">
|
||||
<link href="Accessing-Node-Information.html" rel="next" title="Accessing Node Information">
|
||||
<link href="Using-Parser.html" rel="prev" title="Using Parser">
|
||||
<style type="text/css">
|
||||
<!--
|
||||
|
@ -63,77 +63,90 @@
|
|||
<div class="section" id="Retrieving-Node">
|
||||
<div class="header">
|
||||
<p>
|
||||
Next: <a href="Accessing-Node.html" accesskey="n" rel="next">Accessing Node Information</a>, Previous: <a href="Using-Parser.html" accesskey="p" rel="prev">Using Tree-sitter Parser</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
|
||||
Next: <a href="Accessing-Node-Information.html" accesskey="n" rel="next">Accessing Node Information</a>, Previous: <a href="Using-Parser.html" accesskey="p" rel="prev">Using Tree-sitter Parser</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
|
||||
</div>
|
||||
<hr>
|
||||
<span id="Retrieving-Node-1"></span><h3 class="section">37.3 Retrieving Node</h3>
|
||||
<span id="index-retrieve-node_002c-tree_002dsitter"></span>
|
||||
<span id="index-tree_002dsitter_002c-find-node"></span>
|
||||
<span id="index-get-node_002c-tree_002dsitter"></span>
|
||||
|
||||
<span id="index-tree_002dsitter-find-node"></span>
|
||||
<span id="index-tree_002dsitter-get-node"></span>
|
||||
<p>Before we continue, lets go over some conventions of tree-sitter
|
||||
functions.
|
||||
<span id="index-terminology_002c-for-tree_002dsitter-functions"></span>
|
||||
<p>Here’s some terminology and conventions we use when documenting
|
||||
tree-sitter functions.
|
||||
</p>
|
||||
<p>We talk about a node being “smaller” or “larger”, and “lower” or
|
||||
“higher”. A smaller and lower node is lower in the syntax tree and
|
||||
therefore spans a smaller piece of text; a larger and higher node is
|
||||
higher up in the syntax tree, containing many smaller nodes as its
|
||||
children, and therefore spans a larger piece of text.
|
||||
therefore spans a smaller portion of buffer text; a larger and higher
|
||||
node is higher up in the syntax tree, contains many smaller nodes
|
||||
as its children, and therefore spans a larger portion of text.
|
||||
</p>
|
||||
<p>When a function cannot find a node, it returns nil. And for the
|
||||
convenience for function chaining, all the functions that take a node
|
||||
as argument and returns a node accept the node to be nil; in that
|
||||
case, the function just returns nil.
|
||||
<p>When a function cannot find a node, it returns <code>nil</code>. For
|
||||
convenience, all functions that take a node as argument and return
|
||||
a node, also accept the node argument of <code>nil</code> and in that case
|
||||
just return <code>nil</code>.
|
||||
</p>
|
||||
<span id="index-treesit_002dnode_002doutdated"></span>
|
||||
<p>Nodes are not automatically updated when the associated buffer is
|
||||
modified. And there is no way to update a node once it is retrieved.
|
||||
Using an outdated node throws <code>treesit-node-outdated</code> error.
|
||||
modified, and there is no way to update a node once it is retrieved.
|
||||
Using an outdated node signals the <code>treesit-node-outdated</code> error.
|
||||
</p>
|
||||
<span id="Retrieving-node-from-syntax-tree"></span><h3 class="heading">Retrieving node from syntax tree</h3>
|
||||
<span id="index-retrieving-tree_002dsitter-nodes"></span>
|
||||
<span id="index-syntax-tree_002c-retrieving-nodes"></span>
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dnode_002dat"><span class="category">Function: </span><span><strong>treesit-node-at</strong> <em>beg end &optional parser-or-lang named</em><a href='#index-treesit_002dnode_002dat' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dt id="index-treesit_002dnode_002dat"><span class="category">Function: </span><span><strong>treesit-node-at</strong> <em>pos &optional parser-or-lang named</em><a href='#index-treesit_002dnode_002dat' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function returns the <em>smallest</em> node that starts at or after
|
||||
the <var>point</var>. In other words, the start of the node is equal or
|
||||
greater than <var>point</var>.
|
||||
the buffer position <var>pos</var>. In other words, the start of the node
|
||||
is greater than or equal to <var>pos</var>.
|
||||
</p>
|
||||
<p>When <var>parser-or-lang</var> is nil, this function uses the first parser
|
||||
in <code>(treesit-parser-list)</code> in the current buffer. If
|
||||
<var>parser-or-lang</var> is a parser object, it use that parser; if
|
||||
<var>parser-or-lang</var> is a language, it finds the first parser using
|
||||
that language in <code>(treesit-parser-list)</code> and use that.
|
||||
<p>When <var>parser-or-lang</var> is <code>nil</code> or omitted, this function uses
|
||||
the first parser in <code>(treesit-parser-list)</code> of the current
|
||||
buffer. If <var>parser-or-lang</var> is a parser object, it uses that
|
||||
parser; if <var>parser-or-lang</var> is a language, it finds the first
|
||||
parser using that language in <code>(treesit-parser-list)</code>, and uses
|
||||
that.
|
||||
</p>
|
||||
<p>If <var>named</var> is non-nil, this function looks for a named node
|
||||
<p>If <var>named</var> is non-<code>nil</code>, this function looks for a named node
|
||||
only (see <a href="Language-Definitions.html#tree_002dsitter-named-node">named node</a>).
|
||||
</p>
|
||||
<p>When <var>pos</var> is after all the text in the buffer, technically there
|
||||
is no node after <var>pos</var>. But for convenience, this function will
|
||||
return the last leaf node in the parse tree. If <var>strict</var> is
|
||||
non-<code>nil</code>, this function will strictly comply with the semantics and
|
||||
return <code>nil</code>.
|
||||
</p>
|
||||
<p>Example:
|
||||
</p><div class="example">
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">;; Find the node at point in a C parser's syntax tree.
|
||||
(treesit-node-at (point) 'c)
|
||||
</pre></div>
|
||||
⇒ #<treesit-node (primitive_type) in 23-27>
|
||||
</pre></div>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dnode_002don"><span class="category">Function: </span><span><strong>treesit-node-on</strong> <em>beg end &optional parser-or-lang named</em><a href='#index-treesit_002dnode_002don' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function returns the <em>smallest</em> node that covers the span
|
||||
from <var>beg</var> to <var>end</var>. In other words, the start of the node is
|
||||
less or equal to <var>beg</var>, and the end of the node is greater or
|
||||
equal to <var>end</var>.
|
||||
<dd><p>This function returns the <em>smallest</em> node that covers the region
|
||||
of buffer text between <var>beg</var> and <var>end</var>. In other words, the
|
||||
start of the node is before or at <var>beg</var>, and the end of the node
|
||||
is at or after <var>end</var>.
|
||||
</p>
|
||||
<p><em>Beware</em> that calling this function on an empty line that is not
|
||||
inside any top-level construct (function definition, etc) most
|
||||
<p><em>Beware:</em> calling this function on an empty line that is not
|
||||
inside any top-level construct (function definition, etc.) most
|
||||
probably will give you the root node, because the root node is the
|
||||
smallest node that covers that empty line. Most of the time, you want
|
||||
to use <code>treesit-node-at</code>.
|
||||
to use <code>treesit-node-at</code>, described above, instead.
|
||||
</p>
|
||||
<p>When <var>parser-or-lang</var> is nil, this function uses the first parser
|
||||
in <code>(treesit-parser-list)</code> in the current buffer. If
|
||||
<var>parser-or-lang</var> is a parser object, it use that parser; if
|
||||
<p>When <var>parser-or-lang</var> is <code>nil</code>, this function uses the first
|
||||
parser in <code>(treesit-parser-list)</code> of the current buffer. If
|
||||
<var>parser-or-lang</var> is a parser object, it uses that parser; if
|
||||
<var>parser-or-lang</var> is a language, it finds the first parser using
|
||||
that language in <code>(treesit-parser-list)</code> and use that.
|
||||
that language in <code>(treesit-parser-list)</code>, and uses that.
|
||||
</p>
|
||||
<p>If <var>named</var> is non-nil, this function looks for a named node only
|
||||
(see <a href="Language-Definitions.html#tree_002dsitter-named-node">named node</a>).
|
||||
<p>If <var>named</var> is non-<code>nil</code>, this function looks for a named node
|
||||
only (see <a href="Language-Definitions.html#tree_002dsitter-named-node">named node</a>).
|
||||
</p></dd></dl>
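<p>The difference from <code>treesit-node-at</code> can be sketched as
follows, assuming the current buffer has a C parser.  The first form
asks for the smallest node covering a whole range, the second for the
smallest node starting at or after a single position:
</p>
<div class="example">
<pre class="example">;; Smallest node that covers the whole current line.
(treesit-node-on (line-beginning-position) (line-end-position) 'c)

;; Smallest node that starts at or after point.
(treesit-node-at (point) 'c)
</pre></div>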
|
||||
|
||||
<dl class="def">
|
||||
|
@ -145,17 +158,21 @@
|
|||
<dl class="def">
|
||||
<dt id="index-treesit_002dbuffer_002droot_002dnode"><span class="category">Function: </span><span><strong>treesit-buffer-root-node</strong> <em>&optional language</em><a href='#index-treesit_002dbuffer_002droot_002dnode' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function finds the first parser that uses <var>language</var> in
|
||||
<code>(treesit-parser-list)</code> in the current buffer, and returns the
|
||||
root node of that buffer. If it cannot find an appropriate parser,
|
||||
nil is returned.
|
||||
<code>(treesit-parser-list)</code> of the current buffer, and returns the
|
||||
root node generated by that parser. If it cannot find an appropriate
|
||||
parser, it returns <code>nil</code>.
|
||||
</p></dd></dl>
|
||||
|
||||
<p>Once we have a node, we can retrieve other nodes from it, or query for
|
||||
information about this node.
|
||||
<p>Given a node, a Lisp program can retrieve other nodes starting from
|
||||
it, or query for information about this node.
|
||||
</p>
|
||||
<span id="Retrieving-node-from-other-nodes"></span><h3 class="heading">Retrieving node from other nodes</h3>
|
||||
<span id="index-syntax-tree-nodes_002c-retrieving-from-other-nodes"></span>
|
||||
|
||||
<span id="By-kinship"></span><h4 class="subheading">By kinship</h4>
|
||||
<span id="index-kinship_002c-syntax-tree-nodes"></span>
|
||||
<span id="index-nodes_002c-by-kinship"></span>
|
||||
<span id="index-syntax-tree-nodes_002c-by-kinship"></span>
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dnode_002dparent"><span class="category">Function: </span><span><strong>treesit-node-parent</strong> <em>node</em><a href='#index-treesit_002dnode_002dparent' class='copiable-anchor'> ¶</a></span></dt>
|
||||
|
@ -165,132 +182,162 @@
|
|||
<dl class="def">
|
||||
<dt id="index-treesit_002dnode_002dchild"><span class="category">Function: </span><span><strong>treesit-node-child</strong> <em>node n &optional named</em><a href='#index-treesit_002dnode_002dchild' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function returns the <var>n</var>’th child of <var>node</var>. If
|
||||
<var>named</var> is non-nil, then it only counts named nodes
|
||||
(see <a href="Language-Definitions.html#tree_002dsitter-named-node">named node</a>). For example, in a node
|
||||
that represents a string: <code>"text"</code>, there are three children
|
||||
nodes: the opening quote <code>"</code>, the string content <code>text</code>, and
|
||||
the enclosing quote <code>"</code>. Among these nodes, the first child is
|
||||
the opening quote <code>"</code>, the first named child is the string
|
||||
content <code>text</code>.
|
||||
<var>named</var> is non-<code>nil</code>, it counts only named nodes
|
||||
(see <a href="Language-Definitions.html#tree_002dsitter-named-node">named node</a>).
|
||||
</p>
|
||||
<p>For example, in a node that represents a string <code>"text"</code>, there
|
||||
are three children nodes: the opening quote <code>"</code>, the string text
|
||||
<code>text</code>, and the closing quote <code>"</code>. Among these nodes, the
|
||||
first child is the opening quote <code>"</code>, and the first named child
|
||||
is the string text.
|
||||
</p>
|
||||
<p>This function returns <code>nil</code> if there is no <var>n</var>’th child.
|
||||
<var>n</var> could be negative, e.g., <code>-1</code> represents the last child.
|
||||
</p></dd></dl>
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dnode_002dchildren"><span class="category">Function: </span><span><strong>treesit-node-children</strong> <em>node &optional named</em><a href='#index-treesit_002dnode_002dchildren' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function returns all of <var>node</var>’s children in a list. If
|
||||
<var>named</var> is non-nil, then it only retrieves named nodes.
|
||||
<dd><p>This function returns all of <var>node</var>’s children as a list. If
|
||||
<var>named</var> is non-<code>nil</code>, it retrieves only named nodes.
|
||||
</p></dd></dl>
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dnext_002dsibling"><span class="category">Function: </span><span><strong>treesit-next-sibling</strong> <em>node &optional named</em><a href='#index-treesit_002dnext_002dsibling' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function finds the next sibling of <var>node</var>. If <var>named</var> is
|
||||
non-nil, it finds the next named sibling.
|
||||
non-<code>nil</code>, it finds the next named sibling.
|
||||
</p></dd></dl>
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dprev_002dsibling"><span class="category">Function: </span><span><strong>treesit-prev-sibling</strong> <em>node &optional named</em><a href='#index-treesit_002dprev_002dsibling' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function finds the previous sibling of <var>node</var>. If
|
||||
<var>named</var> is non-nil, it finds the previous named sibling.
|
||||
<var>named</var> is non-<code>nil</code>, it finds the previous named sibling.
|
||||
</p></dd></dl>
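<p>The kinship functions can be combined as in the following sketch,
which assumes a C parser in the current buffer and uses
<code>treesit-node-type</code> to make the result readable:
</p>
<div class="example">
<pre class="example">(let* ((node (treesit-node-at (point) 'c))
       (parent (treesit-node-parent node)))
  (when parent
    (list
     ;; Types of the named children only.
     (mapcar #'treesit-node-type (treesit-node-children parent t))
     ;; A negative index counts from the end: -1 is the last child.
     (treesit-node-child parent -1))))
</pre></div>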
|
||||
|
||||
<span id="By-field-name"></span><h4 class="subheading">By field name</h4>
|
||||
<span id="index-nodes_002c-by-field-name"></span>
|
||||
<span id="index-syntax-tree-nodes_002c-by-field-name"></span>
|
||||
|
||||
<p>To make the syntax tree easier to analyze, many language definitions
|
||||
assign <em>field names</em> to child nodes (see <a href="Language-Definitions.html#tree_002dsitter-node-field-name">field name</a>). For example, a <code>function_definition</code> node
|
||||
could have a <code>declarator</code> and a <code>body</code>.
|
||||
could have a <code>declarator</code> node and a <code>body</code> node.
|
||||
</p>
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dchild_002dby_002dfield_002dname"><span class="category">Function: </span><span><strong>treesit-child-by-field-name</strong> <em>node field-name</em><a href='#index-treesit_002dchild_002dby_002dfield_002dname' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function finds the child of <var>node</var> that has <var>field-name</var>
|
||||
as its field name.
|
||||
<dd><p>This function finds the child of <var>node</var> whose field name is
|
||||
<var>field-name</var>, a string.
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example">;; Get the child that has "body" as its field name.
|
||||
(treesit-child-by-field-name node "body")
|
||||
</pre></div>
|
||||
⇒ #<treesit-node (compound_statement) in 45-89>
|
||||
</pre></div>
|
||||
</dd></dl>
|
||||
|
||||
<span id="By-position"></span><h4 class="subheading">By position</h4>
|
||||
<span id="index-nodes_002c-by-position"></span>
|
||||
<span id="index-syntax-tree-nodes_002c-by-position"></span>
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dfirst_002dchild_002dfor_002dpos"><span class="category">Function: </span><span><strong>treesit-first-child-for-pos</strong> <em>node pos &optional named</em><a href='#index-treesit_002dfirst_002dchild_002dfor_002dpos' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function finds the first child of <var>node</var> that extends beyond
|
||||
<var>pos</var>. “Extend beyond” means the end of the child node >=
|
||||
<var>pos</var>. This function only looks for immediate children of
|
||||
<var>node</var>, and doesn’t look in its grand children. If <var>named</var> is
|
||||
non-nil, it only looks for named child (see <a href="Language-Definitions.html#tree_002dsitter-named-node">named node</a>).
|
||||
buffer position <var>pos</var>. “Extends beyond” means the end of the
|
||||
child node is greater than or equal to <var>pos</var>.  This function only looks
|
||||
for immediate children of <var>node</var>, and doesn’t look in its
|
||||
grandchildren. If <var>named</var> is non-<code>nil</code>, it looks for the
|
||||
first named child (see <a href="Language-Definitions.html#tree_002dsitter-named-node">named node</a>).
|
||||
</p></dd></dl>
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dnode_002ddescendant_002dfor_002drange"><span class="category">Function: </span><span><strong>treesit-node-descendant-for-range</strong> <em>node beg end &optional named</em><a href='#index-treesit_002dnode_002ddescendant_002dfor_002drange' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function finds the <em>smallest</em> child/grandchild... of
|
||||
<var>node</var> that spans the range from <var>beg</var> to <var>end</var>. It is
|
||||
similar to <code>treesit-node-at</code>. If <var>named</var> is non-nil, it only
|
||||
looks for named child.
|
||||
<dd><p>This function finds the <em>smallest</em> descendant node of <var>node</var>
|
||||
that spans the region of text between positions <var>beg</var> and
|
||||
<var>end</var>. It is similar to <code>treesit-node-at</code>. If <var>named</var>
|
||||
is non-<code>nil</code>, it looks for the smallest named descendant.
|
||||
</p></dd></dl>
|
||||
|
||||
<span id="Searching-for-node"></span><h3 class="heading">Searching for node</h3>
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dsearch_002dsubtree"><span class="category">Function: </span><span><strong>treesit-search-subtree</strong> <em>node predicate &optional all backward limit</em><a href='#index-treesit_002dsearch_002dsubtree' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dt id="index-treesit_002dsearch_002dsubtree"><span class="category">Function: </span><span><strong>treesit-search-subtree</strong> <em>node predicate &optional backward all limit</em><a href='#index-treesit_002dsearch_002dsubtree' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function traverses the subtree of <var>node</var> (including
|
||||
<var>node</var>), and match <var>predicate</var> with each node along the way.
|
||||
And <var>predicate</var> is a regexp that matches (case-insensitively)
|
||||
against each node’s type, or a function that takes a node and returns
|
||||
nil/non-nil. If a node matches, that node is returned, if no node
|
||||
ever matches, nil is returned.
|
||||
<var>node</var> itself), looking for a node for which <var>predicate</var>
|
||||
returns non-<code>nil</code>. <var>predicate</var> is a regexp that is matched
|
||||
(case-insensitively) against each node’s type, or a predicate function
|
||||
that takes a node and returns non-<code>nil</code> if the node matches. The
|
||||
function returns the first node that matches, or <code>nil</code> if none
|
||||
does.
|
||||
</p>
|
||||
<p>By default, this function only traverses named nodes, if <var>all</var> is
|
||||
non-nil, it traverses all nodes. If <var>backward</var> is non-nil, it
|
||||
traverses backwards. If <var>limit</var> is non-nil, it only traverses
|
||||
that number of levels down in the tree.
|
||||
<p>By default, this function only traverses named nodes, but if <var>all</var>
|
||||
is non-<code>nil</code>, it traverses all the nodes. If <var>backward</var> is
|
||||
non-<code>nil</code>, it traverses backwards (i.e., it visits the last child first
|
||||
when traversing down the tree). If <var>limit</var> is non-<code>nil</code>, it
|
||||
must be a number that limits the tree traversal to that many levels
|
||||
down the tree.
|
||||
</p></dd></dl>
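<p>A short sketch of both kinds of <var>predicate</var>, assuming a C
parser in the current buffer and that the grammar uses the node types
<code>function_definition</code> and <code>comment</code>:
</p>
<div class="example">
<pre class="example">(let ((root (treesit-buffer-root-node 'c)))
  (list
   ;; Regexp predicate: first node whose type matches.
   (treesit-search-subtree root "function_definition")
   ;; Function predicate, with BACKWARD non-nil so that the
   ;; traversal visits the last child first.
   (treesit-search-subtree
    root
    (lambda (node) (equal (treesit-node-type node) "comment"))
    t)))
</pre></div>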
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dsearch_002dforward"><span class="category">Function: </span><span><strong>treesit-search-forward</strong> <em>start predicate &optional all backward up</em><a href='#index-treesit_002dsearch_002dforward' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function is somewhat similar to <code>treesit-search-subtree</code>.
|
||||
It also traverse the parse tree and match each node with
|
||||
<var>predicate</var> (except for <var>start</var>), where <var>predicate</var> can be
|
||||
a (case-insensitive) regexp or a function. For a tree like the below
|
||||
where <var>start</var> is marked 1, this function traverses as numbered:
|
||||
<dt id="index-treesit_002dsearch_002dforward"><span class="category">Function: </span><span><strong>treesit-search-forward</strong> <em>start predicate &optional backward all</em><a href='#index-treesit_002dsearch_002dforward' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>Like <code>treesit-search-subtree</code>, this function also traverses the
|
||||
parse tree and matches each node with <var>predicate</var> (except for
|
||||
<var>start</var>), where <var>predicate</var> can be a (case-insensitive) regexp
|
||||
or a function.  For a tree like the one below where <var>start</var> is marked
|
||||
S, this function traverses as numbered from 1 to 12:
|
||||
</p>
|
||||
<div class="example">
|
||||
<pre class="example"> o
|
||||
<pre class="example"> 12
|
||||
|
|
||||
3--------4-----------8
|
||||
| | |
|
||||
o--o-+--1 5--+--6 9---+-----12
|
||||
| | | | | |
|
||||
o o 2 7 +-+-+ +--+--+
|
||||
| | | | |
|
||||
10 11 13 14 15
|
||||
S--------3----------11
|
||||
| | |
|
||||
o--o-+--o 1--+--2 6--+-----10
|
||||
| | | |
|
||||
o o +-+-+ +--+--+
|
||||
| | | | |
|
||||
4 5 7 8 9
|
||||
</pre></div>
|
||||
|
||||
<p>Same as in <code>treesit-search-subtree</code>, this function only searches
|
||||
for named nodes by default. But if <var>all</var> is non-nil, it searches
|
||||
for all nodes. If <var>backward</var> is non-nil, it searches backwards.
|
||||
<p>Note that this function doesn’t traverse the subtree of <var>start</var>,
|
||||
and it always traverses leaf nodes first, then upwards.
|
||||
</p>
|
||||
<p>If <var>up</var> is non-nil, this function will only traverse to siblings
|
||||
and parents. In that case, only 1 3 4 8 would be traversed.
|
||||
<p>Like <code>treesit-search-subtree</code>, this function only searches for
|
||||
named nodes by default, but if <var>all</var> is non-<code>nil</code>, it
|
||||
searches for all nodes. If <var>backward</var> is non-<code>nil</code>, it
|
||||
searches backwards.
|
||||
</p>
|
||||
<p>While <code>treesit-search-subtree</code> traverses the subtree of a node,
|
||||
this function starts with node <var>start</var> and traverses every node
|
||||
that comes after it in the buffer position order, i.e., nodes with
|
||||
start positions greater than the end position of <var>start</var>.
|
||||
</p>
|
||||
<p>In the tree shown above, <code>treesit-search-subtree</code> traverses node
|
||||
S (<var>start</var>) and nodes marked with <code>o</code>, where this function
|
||||
traverses the nodes marked with numbers. This function is useful for
|
||||
answering questions like “what is the first node after <var>start</var> in
|
||||
the buffer that satisfies some condition?”
|
||||
</p></dd></dl>
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dsearch_002dforward_002dgoto"><span class="category">Function: </span><span><strong>treesit-search-forward-goto</strong> <em>predicate side &optional all backward up</em><a href='#index-treesit_002dsearch_002dforward_002dgoto' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function jumps to the start or end of the next node in buffer
|
||||
that matches <var>predicate</var>. Parameters <var>predicate</var>, <var>all</var>,
|
||||
<var>backward</var>, and <var>up</var> are the same as in
|
||||
<code>treesit-search-forward</code>. And <var>side</var> controls which side of
|
||||
the matched no do we stop at, it can be <code>start</code> or <code>end</code>.
|
||||
<dt id="index-treesit_002dsearch_002dforward_002dgoto"><span class="category">Function: </span><span><strong>treesit-search-forward-goto</strong> <em>node predicate &optional start backward all</em><a href='#index-treesit_002dsearch_002dforward_002dgoto' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function moves point to the start or end of the next node after
|
||||
<var>node</var> in the buffer that matches <var>predicate</var>. If <var>start</var>
|
||||
is non-<code>nil</code>, stop at the beginning rather than the end of a node.
|
||||
</p>
|
||||
<p>This function guarantees that the matched node it returns makes
|
||||
progress in terms of buffer position: the start/end position of the
|
||||
returned node is always greater than that of <var>node</var>.
|
||||
</p>
|
||||
<p>Arguments <var>predicate</var>, <var>backward</var> and <var>all</var> are the same
|
||||
as in <code>treesit-search-forward</code>.
|
||||
</p></dd></dl>
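<p>For instance, under the argument order documented above and assuming
a C parser in the current buffer, the following sketch moves point to
the beginning of the next function definition after the node at point:
</p>
<div class="example">
<pre class="example">(let ((node (treesit-node-at (point) 'c)))
  (when node
    ;; START is non-nil, so point stops at the beginning of the match.
    (treesit-search-forward-goto node "function_definition" t)))
</pre></div>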
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dinduce_002dsparse_002dtree"><span class="category">Function: </span><span><strong>treesit-induce-sparse-tree</strong> <em>root predicate &optional process-fn limit</em><a href='#index-treesit_002dinduce_002dsparse_002dtree' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function creates a sparse tree from <var>root</var>’s subtree.
|
||||
</p>
|
||||
<p>Basically, it takes the subtree under <var>root</var>, and combs it so only
|
||||
the nodes that match <var>predicate</var> are left, like picking out grapes
|
||||
on the vine. Like previous functions, <var>predicate</var> can be a regexp
|
||||
string that matches against each node’s type case-insensitively, or a
|
||||
function that takes a node and return nil/non-nil.
|
||||
<p>It takes the subtree under <var>root</var>, and combs it so only the nodes
|
||||
that match <var>predicate</var> are left. Like previous functions, the
|
||||
<var>predicate</var> can be a regexp string that matches against each
|
||||
node’s type case-insensitively, or a function that takes a node and
|
||||
returns non-<code>nil</code> if it matches.
|
||||
</p>
|
||||
<p>For example, for a subtree on the left that consists of both numbers
|
||||
and letters, if <var>predicate</var> is “letter only”, the returned tree
|
||||
|
@ -310,50 +357,63 @@
|
|||
e 5 e
|
||||
</pre></div>
|
||||
|
||||
<p>If <var>process-fn</var> is non-nil, instead of returning the matched
|
||||
<p>If <var>process-fn</var> is non-<code>nil</code>, instead of returning the matched
|
||||
nodes, this function passes each node to <var>process-fn</var> and uses the
|
||||
returned value instead. If non-nil, <var>limit</var> is the number of
|
||||
returned value instead. If non-<code>nil</code>, <var>limit</var> is the number of
|
||||
levels to go down from <var>root</var>.
|
||||
</p>
|
||||
<p>Each node in the returned tree looks like <code>(<var>tree-sitter
|
||||
node</var> . (<var>child</var> ...))</code>. The <var>tree-sitter node</var> of the root
|
||||
of this tree will be nil if <var>ROOT</var> doesn’t match <var>pred</var>. If
|
||||
no node matches <var>predicate</var>, return nil.
|
||||
<p>Each node in the returned tree looks like
|
||||
<code>(<var><span class="nolinebreak">tree-sitter-node</span></var> . (<var>child</var> …))</code><!-- /@w -->. The
|
||||
<var>tree-sitter-node</var> of the root of this tree will be <code>nil</code> if
|
||||
<var>root</var> doesn’t match <var>predicate</var>. If no node matches
|
||||
<var>predicate</var>, the function returns <code>nil</code>.
|
||||
</p></dd></dl>
|
||||
|
||||
<span id="More-convenient-functions"></span><h3 class="heading">More convenient functions</h3>
|
||||
<span id="More-convenience-functions"></span><h3 class="heading">More convenience functions</h3>
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dfilter_002dchild"><span class="category">Function: </span><span><strong>treesit-filter-child</strong> <em>node pred &optional named</em><a href='#index-treesit_002dfilter_002dchild' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function finds immediate children of <var>node</var> that satisfies
|
||||
<var>pred</var>.
|
||||
<dt id="index-treesit_002dfilter_002dchild"><span class="category">Function: </span><span><strong>treesit-filter-child</strong> <em>node predicate &optional named</em><a href='#index-treesit_002dfilter_002dchild' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function finds immediate children of <var>node</var> that satisfy
|
||||
<var>predicate</var>.
|
||||
</p>
|
||||
<p>Function <var>pred</var> takes the child node as the argument and should
|
||||
return non-nil to indicated keeping the child. If <var>named</var>
|
||||
non-nil, this function only searches for named nodes.
|
||||
<p>The <var>predicate</var> function takes a node as the argument and should
|
||||
return non-<code>nil</code> to indicate that the node should be kept. If
|
||||
<var>named</var> is non-<code>nil</code>, this function only examines the named
|
||||
nodes.
|
||||
</p></dd></dl>
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dparent_002duntil"><span class="category">Function: </span><span><strong>treesit-parent-until</strong> <em>node pred</em><a href='#index-treesit_002dparent_002duntil' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function repeatedly finds the parent of <var>node</var>, and returns
|
||||
the parent if it satisfies <var>pred</var> (which takes the parent as the
|
||||
argument). If no parent satisfies <var>pred</var>, this function returns
|
||||
nil.
|
||||
<dt id="index-treesit_002dparent_002duntil"><span class="category">Function: </span><span><strong>treesit-parent-until</strong> <em>node predicate</em><a href='#index-treesit_002dparent_002duntil' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function repeatedly finds the parents of <var>node</var>, and returns
|
||||
the parent that satisfies <var>predicate</var>, a function that takes a
|
||||
node as the argument. If no parent satisfies <var>predicate</var>, this
|
||||
function returns <code>nil</code>.
|
||||
</p></dd></dl>
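<p>As a rough sketch of how this might be used (the node type
<code>"function_definition"</code> is an assumption that holds for
grammars such as Python’s, and a parser is assumed to exist already
in the current buffer), the following form looks for the function
definition that encloses point:
</p>
<div class="example">
<pre class="example">;; Walk up from the node at point until a parent has the assumed type.
(treesit-parent-until
 (treesit-node-at (point))
 (lambda (parent)
   (equal (treesit-node-type parent) "function_definition")))
</pre></div>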
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dparent_002dwhile"><span class="category">Function: </span><span><strong>treesit-parent-while</strong><a href='#index-treesit_002dparent_002dwhile' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dt id="index-treesit_002dparent_002dwhile"><span class="category">Function: </span><span><strong>treesit-parent-while</strong> <em>node predicate</em><a href='#index-treesit_002dparent_002dwhile' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function repeatedly finds the parent of <var>node</var>, and keeps
|
||||
doing so as long as the parent satisfies <var>pred</var> (which takes the
|
||||
parent as the single argument). I.e., this function returns the
|
||||
farthest parent that still satisfies <var>pred</var>.
|
||||
doing so as long as the nodes satisfy <var>predicate</var>, a function that
|
||||
takes a node as the argument. That is, this function returns the
|
||||
farthest parent that still satisfies <var>predicate</var>.
|
||||
</p></dd></dl>
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dnode_002dtop_002dlevel"><span class="category">Function: </span><span><strong>treesit-node-top-level</strong> <em>node &optional type</em><a href='#index-treesit_002dnode_002dtop_002dlevel' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function returns the highest parent of <var>node</var> that has the
|
||||
same type as <var>node</var>. If no such parent exists, it returns
|
||||
<code>nil</code>. Therefore this function is also useful for testing
|
||||
whether <var>node</var> is top-level.
|
||||
</p>
|
||||
<p>If <var>type</var> is non-<code>nil</code>, this function matches each parent’s
|
||||
type with <var>type</var> as a regexp, rather than using <var>node</var>’s type.
|
||||
</p></dd></dl>
|
||||
|
||||
</div>
|
||||
<hr>
|
||||
<div class="header">
|
||||
<p>
|
||||
Next: <a href="Accessing-Node.html">Accessing Node Information</a>, Previous: <a href="Using-Parser.html">Using Tree-sitter Parser</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
|
||||
Next: <a href="Accessing-Node-Information.html">Accessing Node Information</a>, Previous: <a href="Using-Parser.html">Using Tree-sitter Parser</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
|
||||
</div>
|
||||
|
||||
|
||||
|
|
|
@ -33,7 +33,7 @@
|
|||
<link href="Index.html" rel="index" title="Index">
|
||||
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
|
||||
<link href="Parsing-Program-Source.html" rel="up" title="Parsing Program Source">
|
||||
<link href="Multiple-Languages.html" rel="prev" title="Multiple Languages">
|
||||
<link href="Tree_002dsitter-major-modes.html" rel="prev" title="Tree-sitter major modes">
|
||||
<style type="text/css">
|
||||
<!--
|
||||
a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
|
||||
|
@ -62,23 +62,23 @@
|
|||
<div class="section" id="Tree_002dsitter-C-API">
|
||||
<div class="header">
|
||||
<p>
|
||||
Previous: <a href="Multiple-Languages.html" accesskey="p" rel="prev">Parsing Text in Multiple Languages</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
|
||||
Previous: <a href="Tree_002dsitter-major-modes.html" accesskey="p" rel="prev">Developing major modes with tree-sitter</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
|
||||
</div>
|
||||
<hr>
|
||||
<span id="Tree_002dsitter-C-API-Correspondence"></span><h3 class="section">37.7 Tree-sitter C API Correspondence</h3>
|
||||
<span id="Tree_002dsitter-C-API-Correspondence"></span><h3 class="section">37.8 Tree-sitter C API Correspondence</h3>
|
||||
|
||||
<p>Emacs’ tree-sitter integration doesn’t expose every feature
|
||||
tree-sitter’s C API provides. Missing features include:
|
||||
provided by tree-sitter’s C API. Missing features include:
|
||||
</p>
|
||||
<ul>
|
||||
<li> Creating a tree cursor and navigating the syntax tree with it.
|
||||
</li><li> Setting timeout and cancellation flag for a parser.
|
||||
</li><li> Setting the logger for a parser.
|
||||
</li><li> Printing a DOT graph of the syntax tree to a file.
|
||||
</li><li> Coping and modifying a syntax tree. (Emacs doesn’t expose a tree
|
||||
</li><li> Printing a <acronym>DOT</acronym> graph of the syntax tree to a file.
|
||||
</li><li> Copying and modifying a syntax tree. (Emacs doesn’t expose a tree
|
||||
object.)
|
||||
</li><li> Using (row, column) coordinates as position.
|
||||
</li><li> Updating a node with changes. (In Emacs, retrieve a new node instead
|
||||
</li><li> Updating a node with changes. (In Emacs, retrieve a new node instead
|
||||
of updating the existing one.)
|
||||
</li><li> Querying statistics of a language definition.
|
||||
</li></ul>
|
||||
|
@ -87,7 +87,7 @@
|
|||
convenient and idiomatic:
|
||||
</p>
|
||||
<ul>
|
||||
<li> Instead of using byte positions, the ELisp API uses character
|
||||
<li> Instead of using byte positions, the Emacs Lisp API uses character
|
||||
positions.
|
||||
</li><li> Null nodes are converted to nil.
|
||||
</li></ul>
|
||||
|
@ -203,7 +203,7 @@
|
|||
<hr>
|
||||
<div class="header">
|
||||
<p>
|
||||
Previous: <a href="Multiple-Languages.html">Parsing Text in Multiple Languages</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
|
||||
Previous: <a href="Tree_002dsitter-major-modes.html">Developing major modes with tree-sitter</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
|
||||
</div>
|
||||
|
||||
|
||||
|
|
|
@ -67,12 +67,12 @@
|
|||
</div>
|
||||
<hr>
|
||||
<span id="Using-Tree_002dsitter-Parser"></span><h3 class="section">37.2 Using Tree-sitter Parser</h3>
|
||||
<span id="index-Tree_002dsitter-parser"></span>
|
||||
<span id="index-tree_002dsitter-parser_002c-using"></span>
|
||||
|
||||
<p>This section described how to create and configure a tree-sitter
|
||||
<p>This section describes how to create and configure a tree-sitter
|
||||
parser. In Emacs, each tree-sitter parser is associated with a
|
||||
buffer. As we edit the buffer, the associated parser and the syntax
|
||||
tree is automatically kept up-to-date.
|
||||
buffer. As the user edits the buffer, the associated parser and
|
||||
syntax tree are automatically kept up-to-date.
|
||||
</p>
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dmax_002dbuffer_002dsize"><span class="category">Variable: </span><span><strong>treesit-max-buffer-size</strong><a href='#index-treesit_002dmax_002dbuffer_002dsize' class='copiable-anchor'> ¶</a></span></dt>
|
||||
|
@ -88,48 +88,49 @@
|
|||
<code>treesit-available-p</code> and <code>treesit-max-buffer-size</code>.
|
||||
</p></dd></dl>
|
||||
|
||||
<span id="index-Creating-tree_002dsitter-parsers"></span>
|
||||
<span id="index-creating-tree_002dsitter-parsers"></span>
|
||||
<span id="index-tree_002dsitter-parser_002c-creating"></span>
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dparser_002dcreate"><span class="category">Function: </span><span><strong>treesit-parser-create</strong> <em>language &optional buffer no-reuse</em><a href='#index-treesit_002dparser_002dcreate' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>To create a parser, we provide a <var>buffer</var> and the <var>language</var>
|
||||
to use (see <a href="Language-Definitions.html">Tree-sitter Language Definitions</a>). If <var>buffer</var> is nil, the
|
||||
current buffer is used.
|
||||
<dd><p>Create a parser for the specified <var>buffer</var> and <var>language</var>
|
||||
(see <a href="Language-Definitions.html">Tree-sitter Language Definitions</a>). If <var>buffer</var> is omitted or
|
||||
<code>nil</code>, it stands for the current buffer.
|
||||
</p>
|
||||
<p>By default, this function reuses a parser if one already exists for
|
||||
<var>language</var> in <var>buffer</var>, if <var>no-reuse</var> is non-nil, this
|
||||
function always creates a new parser.
|
||||
<var>language</var> in <var>buffer</var>, but if <var>no-reuse</var> is
|
||||
non-<code>nil</code>, this function always creates a new parser.
|
||||
</p></dd></dl>
|
||||
|
||||
<p>Given a parser, we can query information about it:
|
||||
<p>Given a parser, we can query information about it.
|
||||
</p>
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dparser_002dbuffer"><span class="category">Function: </span><span><strong>treesit-parser-buffer</strong> <em>parser</em><a href='#index-treesit_002dparser_002dbuffer' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>Returns the buffer associated with <var>parser</var>.
|
||||
<dd><p>This function returns the buffer associated with <var>parser</var>.
|
||||
</p></dd></dl>
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dparser_002dlanguage"><span class="category">Function: </span><span><strong>treesit-parser-language</strong> <em>parser</em><a href='#index-treesit_002dparser_002dlanguage' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>Returns the language that <var>parser</var> uses.
|
||||
<dd><p>This function returns the language used by <var>parser</var>.
|
||||
</p></dd></dl>
|
||||
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dparser_002dp"><span class="category">Function: </span><span><strong>treesit-parser-p</strong> <em>object</em><a href='#index-treesit_002dparser_002dp' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>Checks if <var>object</var> is a tree-sitter parser. Return non-nil if it
|
||||
is, return nil otherwise.
|
||||
<dd><p>This function checks if <var>object</var> is a tree-sitter parser, and
|
||||
returns non-<code>nil</code> if it is, and <code>nil</code> otherwise.
|
||||
</p></dd></dl>
|
||||
|
||||
<p>There is no need to explicitly parse a buffer, because parsing is done
|
||||
automatically and lazily. A parser only parses when we query for a
|
||||
node in its syntax tree. Therefore, when a parser is first created,
|
||||
it doesn’t parse the buffer; it waits until we query for a node for
|
||||
the first time. Similarly, when some change is made in the buffer, a
|
||||
parser doesn’t re-parse immediately.
|
||||
automatically and lazily. A parser only parses when a Lisp program
|
||||
queries for a node in its syntax tree. Therefore, when a parser is
|
||||
first created, it doesn’t parse the buffer; it waits until the Lisp
|
||||
program queries for a node for the first time. Similarly, when some
|
||||
change is made in the buffer, a parser doesn’t re-parse immediately.
|
||||
</p>
|
||||
<span id="index-treesit_002dbuffer_002dtoo_002dlarge"></span>
|
||||
<p>When a parser do parse, it checks for the size of the buffer.
|
||||
<p>When a parser does parse, it checks for the size of the buffer.
|
||||
Tree-sitter can only handle buffers no larger than about 4GB. If the
|
||||
size exceeds that, Emacs signals <code>treesit-buffer-too-large</code>
|
||||
with signal data being the buffer size.
|
||||
size exceeds that, Emacs signals the <code>treesit-buffer-too-large</code>
|
||||
error with signal data being the buffer size.
|
||||
</p>
|
||||
<p>Once a parser is created, Emacs automatically adds it to the
|
||||
internal parser list. Every time a change is made to the buffer,
|
||||
|
@ -138,8 +139,9 @@
|
|||
</p>
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dparser_002dlist"><span class="category">Function: </span><span><strong>treesit-parser-list</strong> <em>&optional buffer</em><a href='#index-treesit_002dparser_002dlist' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function returns the parser list of <var>buffer</var>. And
|
||||
<var>buffer</var> defaults to the current buffer.
|
||||
<dd><p>This function returns the parser list of <var>buffer</var>. If
|
||||
<var>buffer</var> is <code>nil</code> or omitted, it defaults to the current
|
||||
buffer.
|
||||
</p></dd></dl>
|
||||
|
||||
<dl class="def">
|
||||
|
@ -148,29 +150,30 @@
|
|||
</p></dd></dl>
|
||||
|
||||
<span id="index-tree_002dsitter-narrowing"></span>
|
||||
<span id="tree_002dsitter-narrowing"></span><p>Normally, a parser “sees” the whole
|
||||
buffer, but when the buffer is narrowed (see <a href="Narrowing.html">Narrowing</a>), the
|
||||
parser will only see the visible region. As far as the parser can
|
||||
tell, the hidden region is deleted. And when the buffer is later
|
||||
widened, the parser thinks text is inserted in the beginning and in
|
||||
the end. Although parsers respect narrowing, narrowing shouldn’t be
|
||||
the mean to handle a multi-language buffer; instead, set the ranges in
|
||||
which a parser should operate in. See <a href="Multiple-Languages.html">Parsing Text in Multiple Languages</a>.
|
||||
<span id="tree_002dsitter-narrowing"></span><p>Normally, a parser “sees” the whole buffer, but when the buffer is
|
||||
narrowed (see <a href="Narrowing.html">Narrowing</a>), the parser will only see the accessible
|
||||
portion of the buffer. As far as the parser can tell, the hidden
|
||||
region was deleted. When the buffer is later widened, the parser
|
||||
thinks text is inserted at the beginning and at the end. Although
|
||||
parsers respect narrowing, modes should not use narrowing as a means
|
||||
to handle a multi-language buffer; instead, set the ranges in which the
|
||||
parser should operate. See <a href="Multiple-Languages.html">Parsing Text in Multiple Languages</a>.
|
||||
</p>
|
||||
<p>Because a parser parses lazily, when we narrow the buffer, the parser
|
||||
is not affected immediately; as long as we don’t query for a node
|
||||
while the buffer is narrowed, the parser is oblivious of the
|
||||
narrowing.
|
||||
<p>Because a parser parses lazily, when the user or a Lisp program
|
||||
narrows the buffer, the parser is not affected immediately; as long as
|
||||
the mode doesn’t query for a node while the buffer is narrowed, the
|
||||
parser is oblivious of the narrowing.
|
||||
</p>
|
||||
<span id="index-tree_002dsitter-parse-string"></span>
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dparse_002dstring"><span class="category">Function: </span><span><strong>treesit-parse-string</strong> <em>string language</em><a href='#index-treesit_002dparse_002dstring' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>Besides creating a parser for a buffer, we can also just parse a
|
||||
string. Unlike a buffer, parsing a string is a one-time deal, and
|
||||
<span id="index-parse-string_002c-tree_002dsitter"></span>
|
||||
<p>Besides creating a parser for a buffer, a Lisp program can also parse a
|
||||
string. Unlike a buffer, parsing a string is a one-off operation, and
|
||||
there is no way to update the result.
|
||||
</p>
|
||||
<p>This function parses <var>string</var> with <var>language</var>, and returns the
|
||||
root node of the generated syntax tree.
|
||||
<dl class="def">
|
||||
<dt id="index-treesit_002dparse_002dstring"><span class="category">Function: </span><span><strong>treesit-parse-string</strong> <em>string language</em><a href='#index-treesit_002dparse_002dstring' class='copiable-anchor'> ¶</a></span></dt>
|
||||
<dd><p>This function parses <var>string</var> using <var>language</var>, and returns
|
||||
the root node of the generated syntax tree.
|
||||
</p></dd></dl>
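<p>For instance, assuming the C language definition is installed, a
one-off parse might look like the following sketch; the input string
is arbitrary:
</p>
<div class="example">
<pre class="example">;; Parse a string and inspect the type of the root node of the result.
(let ((root (treesit-parse-string "int x = 1;" 'c)))
  (treesit-node-type root))
</pre></div>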
|
||||
|
||||
</div>
|
||||
|
|
|
@ -78,28 +78,25 @@ Now check if Emacs is built with tree-sitter library
|
|||
|
||||
(treesit-available-p)
|
||||
|
||||
For your major mode, first create a tree-sitter switch:
|
||||
Users toggle tree-sitter for each major mode with a central variable,
|
||||
‘treesit-settings’. You can check whether to enable tree-sitter with
|
||||
‘treesit-ready-p’, which takes a major-mode symbol and one or more
|
||||
language symbols. The major mode body should use a branch like this:
|
||||
|
||||
#+begin_src elisp
|
||||
(defcustom python-use-tree-sitter nil
|
||||
"If non-nil, `python-mode' tries to use tree-sitter.
|
||||
Currently `python-mode' can utilize tree-sitter for font-locking,
|
||||
imenu, and movement functions."
|
||||
:type 'boolean)
|
||||
#+end_src
|
||||
|
||||
Then in other places, we decide on whether to enable tree-sitter by
|
||||
|
||||
#+begin_src elisp
|
||||
(and python-use-tree-sitter
|
||||
(treesit-can-enable-p))
|
||||
#+begin_src emacs-lisp
|
||||
(cond
|
||||
;; Tree-sitter setup.
|
||||
((treesit-ready-p 'python-mode 'python)
|
||||
...)
|
||||
(t
|
||||
;; Non-tree-sitter setup.
|
||||
...))
|
||||
#+end_src
|
||||
|
||||
* Naming convention
|
||||
|
||||
When referring to tree-sitter as a noun, use “tree-sitter”, like
|
||||
python-use-tree-sitter. For prefix use “treesit”, like
|
||||
python-treesit-indent.
|
||||
Use tree-sitter for text (documentation, comment), use treesit for
|
||||
symbol (variable, function).
|
||||
|
||||
* Font-lock
|
||||
|
||||
|
@ -108,10 +105,23 @@ capture names, tree-sitter finds the nodes that match these patterns,
|
|||
tag the corresponding capture names onto the nodes and return them to
|
||||
you. The query function returns a list of (capture-name . node). For
|
||||
font-lock, we use face names as capture names. And the captured node
|
||||
will be fontified in their capture name. The capture name could also
|
||||
be a function, in which case (START END NODE) is passed to the
|
||||
function for font-lock. START and END is the start and end the
|
||||
captured NODE.
|
||||
will be fontified in their capture name.
|
||||
|
||||
The capture name could also be a function, in which case (NODE
|
||||
OVERRIDE START END) is passed to the function for fontification. START
|
||||
and END are the start and end of the region to be fontified. The
|
||||
function should only fontify within that region. The function should
|
||||
also allow more optional arguments with (&rest _), for future
|
||||
extensibility. For OVERRIDE check out the docstring of
|
||||
treesit-font-lock-rules.
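
As a rough sketch of such a function (the names ‘my-mode--clamp’ and
‘my-mode--fontify-node’ and the choice of face are made up for
illustration, and OVERRIDE is simply ignored here), something like
this clamps the node to the requested region before applying a face:

#+begin_src emacs-lisp
(defun my-mode--clamp (node-beg node-end start end)
  "Return the part of NODE-BEG..NODE-END inside START..END, or nil."
  (let ((beg (max node-beg start))
        (fin (min node-end end)))
    (when (< beg fin)
      (cons beg fin))))

(defun my-mode--fontify-node (node _override start end &rest _)
  "Fontify the part of NODE that lies between START and END."
  (let ((region (my-mode--clamp (treesit-node-start node)
                                (treesit-node-end node)
                                start end)))
    (when region
      (put-text-property (car region) (cdr region)
                         'face 'font-lock-variable-name-face))))
#+end_src

In a query, the function name would then take the place of a face in
the capture name, for example (identifier) @my-mode--fontify-node.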
|
||||
|
||||
Contextual syntax, like multi-line comments and multi-line strings,
|
||||
needs special care, because a change in this kind of construct can affect
|
||||
a large portion of the buffer. Think of inserting a closing comment
|
||||
delimiter: it causes all the text before it (up to the opening comment
|
||||
delimiter) to change to comment face. These constructs need to be
|
||||
captured with the special name “contextual”, so that Emacs can give them
|
||||
special treatment. See the example below for how it looks.
|
||||
|
||||
** Query syntax
|
||||
|
||||
|
@ -171,52 +181,64 @@ The manual explains how to read grammar files in the bottom of section
|
|||
|
||||
** Debugging queries
|
||||
|
||||
If your query has problems, it usually cannot compile. In that case
|
||||
use ‘treesit-query-validate’ to debug the query. It will pop a buffer
|
||||
containing the query (in text format) and mark the offending part in
|
||||
red.
|
||||
If your query has problems, use ‘treesit-query-validate’ to debug the
|
||||
query. It will pop a buffer containing the query (in text format) and
|
||||
mark the offending part in red.
|
||||
|
||||
** Code
|
||||
|
||||
To enable tree-sitter font-lock, set ‘treesit-font-lock-settings’
|
||||
buffer-locally and call ‘treesit-font-lock-enable’. For example, see
|
||||
To enable tree-sitter font-lock, set ‘treesit-font-lock-settings’ and
|
||||
‘treesit-font-lock-feature-list’ buffer-locally and call
|
||||
‘treesit-major-mode-setup’. For example, see
|
||||
‘python--treesit-settings’ in python.el. Below I paste a snippet of
|
||||
it.
|
||||
|
||||
Note that like the current font-lock, if the to-be-fontified region
|
||||
already has a face (i.e., an earlier match fontified part/all of the
|
||||
region), the new face is discarded rather than applied. If you want
|
||||
region), the new face is discarded rather than applied. If you want
|
||||
later matches always override earlier matches, use the :override
|
||||
keyword.
|
||||
|
||||
Each rule should have a :feature, like function-name,
|
||||
string-interpolation, builtin, etc. Users can then enable/disable each
|
||||
feature individually.
|
||||
|
||||
#+begin_src elisp
|
||||
(defvar python--treesit-settings
|
||||
(treesit-font-lock-rules
|
||||
:feature 'comment
|
||||
:language 'python
|
||||
:override t
|
||||
`(;; Queries for def and class.
|
||||
(function_definition
|
||||
name: (identifier) @font-lock-function-name-face)
|
||||
'((comment) @font-lock-comment-face)
|
||||
|
||||
(class_definition
|
||||
name: (identifier) @font-lock-type-face)
|
||||
:feature 'string
|
||||
:language 'python
|
||||
'((string) @font-lock-string-face
|
||||
(string) @contextual) ; Contextual special treatment.
|
||||
|
||||
;; Comment and string.
|
||||
(comment) @font-lock-comment-face
|
||||
:feature 'function-name
|
||||
:language 'python
|
||||
'((function_definition
|
||||
name: (identifier) @font-lock-function-name-face))
|
||||
|
||||
...)))
|
||||
:feature 'class-name
|
||||
:language 'python
|
||||
'((class_definition
|
||||
name: (identifier) @font-lock-type-face))
|
||||
|
||||
...))
|
||||
#+end_src
|
||||
|
||||
Then in ‘python-mode’, enable tree-sitter font-lock:
|
||||
|
||||
#+begin_src elisp
|
||||
(treesit-parser-create 'python)
|
||||
;; This turns off the syntax-based font-lock for comments and
|
||||
;; strings. So it doesn’t override tree-sitter’s fontification.
|
||||
(setq-local font-lock-keywords-only t)
|
||||
(setq-local treesit-font-lock-settings
|
||||
python--treesit-settings)
|
||||
(treesit-font-lock-enable)
|
||||
(setq-local treesit-font-lock-settings python--treesit-settings)
|
||||
(setq-local treesit-font-lock-feature-list
|
||||
'((comment string function-name)
|
||||
(class-name keyword builtin)
|
||||
(string-interpolation decorator)))
|
||||
...
|
||||
(treesit-major-mode-setup)
|
||||
#+end_src
|
||||
|
||||
Concretely, something like this:
|
||||
|
@ -224,29 +246,22 @@ Concretely, something like this:
|
|||
#+begin_src elisp
|
||||
(define-derived-mode python-mode prog-mode "Python"
|
||||
...
|
||||
|
||||
(treesit-parser-create 'python)
|
||||
|
||||
(if (and python-use-tree-sitter
|
||||
(treesit-can-enable-p))
|
||||
;; Tree-sitter.
|
||||
(progn
|
||||
(setq-local font-lock-keywords-only t)
|
||||
(setq-local treesit-font-lock-settings
|
||||
python--treesit-settings)
|
||||
(treesit-font-lock-enable))
|
||||
(cond
|
||||
;; Tree-sitter.
|
||||
((treesit-ready-p 'python-mode 'python)
|
||||
(treesit-parser-create 'python)
|
||||
(setq-local treesit-font-lock-settings python--treesit-settings)
|
||||
(setq-local treesit-font-lock-feature-list
|
||||
'((comment string function-name)
|
||||
(class-name keyword builtin)
|
||||
(string-interpolation decorator)))
|
||||
(treesit-major-mode-setup))
|
||||
(t
|
||||
;; No tree-sitter
|
||||
(setq-local font-lock-defaults ...))
|
||||
|
||||
...)
|
||||
(setq-local font-lock-defaults ...)
|
||||
...)))
|
||||
#+end_src
|
||||
|
||||
You’ll notice that tree-sitter’s font-lock doesn’t respect
|
||||
‘font-lock-maximum-decoration’, major modes are free to set
|
||||
‘treesit-font-lock-settings’ based on the value of
|
||||
‘font-lock-maximum-decoration’, or provide more fine-grained control
|
||||
through other mode-specific means. (Towards that end, the :toggle option in treesit-font-lock-rules is very useful.)
|
||||
|
||||
* Indent
|
||||
|
||||
Indent works like this: We have a bunch of rules that look like
|
||||
|
@ -262,10 +277,14 @@ previous line. We find the column number of that point (eg, 4), add
|
|||
OFFSET to it (eg, 0), and that is the column we want to indent the
|
||||
current line to (4 + 0 = 4).
|
||||
|
||||
Matchers and anchors are functions that take (NODE PARENT BOL &rest
|
||||
_). Matchers return nil/non-nil for no match/match, and anchors return
|
||||
the anchor point. Below are some convenient builtin matchers and anchors.
|
||||
|
||||
For MATCHER we have
|
||||
|
||||
(parent-is TYPE)
|
||||
(node-is TYPE)
|
||||
(parent-is TYPE) => matches if PARENT’s type matches TYPE as regexp
|
||||
(node-is TYPE) => matches if NODE’s type matches TYPE as regexp
|
||||
(query QUERY) => matches if querying PARENT with QUERY
|
||||
captures NODE.
|
||||
|
||||
|
@ -280,9 +299,9 @@ For ANCHOR we have
|
|||
first-sibling => start of the first sibling
|
||||
parent => start of parent
|
||||
parent-bol => BOL of the line parent is on.
|
||||
prev-sibling
|
||||
no-indent => don’t indent
|
||||
prev-line => same indent as previous line
|
||||
prev-sibling => start of previous sibling
|
||||
no-indent => current position (don’t indent)
|
||||
prev-line => start of previous line
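
If none of the built-in ones fit, a matcher or an anchor can be
written by hand. Here is a minimal sketch; all the names are made up,
and the node type "block" is only an assumption about the grammar:

#+begin_src emacs-lisp
(defun my-mode--parent-is-block-p (_node parent _bol &rest _)
  "Matcher: return non-nil if PARENT's type contains \"block\"."
  (and parent
       (string-match-p "block" (treesit-node-type parent))))

(defun my-mode--parent-start (_node parent _bol &rest _)
  "Anchor: return the start position of PARENT, or nil."
  (and parent (treesit-node-start parent)))

;; A rule of the form (MATCHER ANCHOR OFFSET) using the two functions.
(defvar my-mode--indent-rule
  '(my-mode--parent-is-block-p my-mode--parent-start 2))
#+end_src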
|
||||
|
||||
There is also a manual section for indent: "Parser-based Indentation".
|
||||
|
||||
|
@ -301,7 +320,7 @@ tells you which rule is applied in the echo area.
|
|||
((node-is ")") parent-bol 0)
|
||||
((node-is "]") parent-bol 0)
|
||||
((node-is ">") parent-bol 0)
|
||||
((node-is ".") parent-bol ,offset)
|
||||
((node-is "\\.") parent-bol ,offset)
|
||||
((parent-is "ternary_expression") parent-bol ,offset)
|
||||
((parent-is "named_imports") parent-bol ,offset)
|
||||
((parent-is "statement_block") parent-bol ,offset)
|
||||
|
@ -320,21 +339,21 @@ tells you which rule is applied in the echo area.
|
|||
...))))
|
||||
#+end_src
|
||||
|
||||
Then you set ‘treesit-simple-indent-rules’ to your rules, and set
|
||||
‘indent-line-function’:
|
||||
Then you set ‘treesit-simple-indent-rules’ to your rules, and call
|
||||
‘treesit-major-mode-setup’:
|
||||
|
||||
#+begin_src elisp
|
||||
(setq-local treesit-simple-indent-rules typescript-mode-indent-rules)
|
||||
(setq-local indent-line-function #'treesit-indent)
|
||||
(treesit-major-mode-setup)
|
||||
#+end_src
|
||||
|
||||
* Imenu
|
||||
|
||||
Not much to say except for utilizing ‘treesit-induce-sparse-tree’.
|
||||
See ‘python--imenu-treesit-create-index-1’ in python.el for an
|
||||
example.
|
||||
See ‘js--treesit-imenu-1’ in js.el for an example.
|
||||
|
||||
Once you have the index builder, set ‘imenu-create-index-function’.
|
||||
Once you have the index builder, set ‘imenu-create-index-function’ to
|
||||
it.
|
||||
|
||||
* Navigation
|
||||
|
||||
|
@ -344,51 +363,33 @@ You can find the end of a defun with something like
|
|||
(treesit-search-forward-goto "function_definition" 'end)
|
||||
|
||||
where "function_definition" matches the node type of a function
|
||||
definition node, and ’end means we want to go to the end of that
|
||||
node.
|
||||
definition node, and ’end means we want to go to the end of that node.
|
||||
|
||||
Something like this should suffice:
|
||||
Tree-sitter has default implementations for
|
||||
‘beginning-of-defun-function’ and ‘end-of-defun-function’. So for
|
||||
ordinary languages, it suffices to set ‘treesit-defun-type-regexp’
|
||||
to something that matches all the defun struct types in the language,
|
||||
and call ‘treesit-major-mode-setup’. For example,
|
||||
|
||||
#+begin_src elisp
|
||||
(defun js--treesit-beginning-of-defun (&optional arg)
|
||||
(let ((arg (or arg 1)))
|
||||
(if (> arg 0)
|
||||
;; Go backward.
|
||||
(while (and (> arg 0)
|
||||
(treesit-search-forward-goto
|
||||
"function_definition" 'start nil t))
|
||||
(setq arg (1- arg)))
|
||||
;; Go forward.
|
||||
(while (and (< arg 0)
|
||||
(treesit-search-forward-goto
|
||||
"function_definition" 'start))
|
||||
(setq arg (1+ arg))))))
|
||||
|
||||
(defun xxx-end-of-defun (&optional arg)
|
||||
(let ((arg (or arg 1)))
|
||||
(if (< arg 0)
|
||||
;; Go backward.
|
||||
(while (and (< arg 0)
|
||||
(treesit-search-forward-goto
|
||||
"function_definition" 'end nil t))
|
||||
(setq arg (1+ arg)))
|
||||
;; Go forward.
|
||||
(while (and (> arg 0)
|
||||
(treesit-search-forward-goto
|
||||
"function_definition" 'end))
|
||||
(setq arg (1- arg))))))
|
||||
|
||||
(setq-local beginning-of-defun-function #'xxx-beginning-of-defun)
|
||||
(setq-local end-of-defun-function #'xxx-end-of-defun)
|
||||
#+end_src
|
||||
#+begin_src emacs-lisp
|
||||
(setq-local treesit-defun-type-regexp (rx bol
|
||||
(or "function" "class")
|
||||
"_definition"
|
||||
eol))
|
||||
(treesit-major-mode-setup)
|
||||
#+end_src
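
Since the regexp is meant to match node type names, it can be checked
against plain strings before wiring it into the mode. A small sketch
(the third type name is just an example of something that should not
match):

#+begin_src emacs-lisp
(let ((re (rx bol (or "function" "class") "_definition" eol)))
  (list (string-match-p re "function_definition")    ; 0, a match
        (string-match-p re "class_definition")       ; 0, a match
        (string-match-p re "decorated_definition"))) ; nil, no match
#+end_src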
|
||||
|
||||
* Which-func
|
||||
|
||||
You can find the current function by going up the tree and looking for
|
||||
the function_definition node. See ‘python-info-treesit-current-defun’
|
||||
in python.el for an example. Since Python allows nested function
|
||||
definitions, that function keeps going until it reaches the root node,
|
||||
and records all the function names along the way.
|
||||
If you have an imenu implementation, set ‘which-func-functions’ to
|
||||
nil, and which-func will automatically use imenu’s data.
|
||||
|
||||
If you want an independent implementation for which-func, you can find
|
||||
the current function by going up the tree and looking for the
|
||||
function_definition node. See the function below for an example.
|
||||
Since Python allows nested function definitions, that function keeps
|
||||
going until it reaches the root node, and records all the function
|
||||
names along the way.
|
||||
|
||||
#+begin_src elisp
|
||||
(defun python-info-treesit-current-defun (&optional include-type)
|
||||
|
|