diff --git a/admin/notes/tree-sitter/html-manual/Language-Definitions.html b/admin/notes/tree-sitter/html-manual/Language-Definitions.html
index ba3eeb9eeb9..6df676b1680 100644
--- a/admin/notes/tree-sitter/html-manual/Language-Definitions.html
+++ b/admin/notes/tree-sitter/html-manual/Language-Definitions.html
@@ -66,14 +66,17 @@
Tree-sitter relies on language definitions to parse text in that
-language. In Emacs, A language definition is represented by a symbol.
-For example, C language definition is represented as c
, and
-c
can be passed to tree-sitter functions as the language
-argument.
+language. In Emacs, a language definition is represented by a symbol.
+For example, the C language definition is represented as the symbol
+c
, and c
can be passed to tree-sitter functions as the
+language argument.
Tree-sitter language definitions are distributed as dynamic libraries.
In order to use a language definition in Emacs, you need to make sure
that the dynamic library is installed on the system. Emacs looks for
-language definitions under load paths in
-treesit-extra-load-path
, user-emacs-directory
/tree-sitter,
-and system default locations for dynamic libraries, in that order.
-Emacs tries each extensions in treesit-load-suffixes
. If Emacs
-cannot find the library or has problem loading it, Emacs signals
-treesit-load-language-error
. The signal data is a list of
-specific error messages.
+language definitions in several places, in the following order:
+
treesit-extra-load-path
;
+user-emacs-directory
(see The Init File);
+In each of these directories, Emacs looks for a file whose file-name
+extension is one of those specified by the variable treesit-load-suffixes
.
+
If Emacs cannot find the library or has problems loading it, Emacs
+signals the treesit-load-language-error
error. The data of
+that signal could be one of the following:
+
(not-found error-msg …)
This means that Emacs could not find the language definition library. +
(symbol-error error-msg)
This means that Emacs could not find in the library the expected function +that every language definition library should export. +
(version-mismatch error-msg)
This means that the version of the language definition library is incompatible +with that of the tree-sitter library. +
In all of these cases, error-msg might provide additional +details about the failure.
This function checks whether the dynamic library for language is -present on the system, and return non-nil if it is. +
This function returns non-nil
if the language definitions for
+language exist and can be loaded.
+
If detail is non-nil
, return (t . nil)
when
+language is available, and (nil . data)
when it’s
+unavailable. data is the signal data of
+treesit-load-language-error
.
By convention, the dynamic library for language is
-libtree-sitter-language.ext
, where ext is the
-system-specific extension for dynamic libraries. Also by convention,
+
By convention, the file name of the dynamic library for language is
+libtree-sitter-language.ext, where ext is the
+system-specific extension for dynamic libraries. Also by convention,
the function provided by that library is named
-tree_sitter_language
. If a language definition doesn’t
-follow this convention, you should add an entry
+tree_sitter_language
. If a language definition library
+doesn’t follow this convention, you should add an entry
(language library-base-name function-name)
to treesit-load-name-override-list
, where
-library-base-name is the base filename for the dynamic library
-(conventionally libtree-sitter-language
), and
+
to the list in the variable treesit-load-name-override-list
, where
+library-base-name is the base name of the dynamic library’s file name
+(usually, libtree-sitter-language), and
function-name is the function provided by the library
-(conventionally tree_sitter_language
). For example,
+(usually, tree_sitter_language
). For example,
(cool-lang "libtree-sitter-coool" "tree_sitter_cooool")
for a language too cool to abide by conventions. +
for a language that considers itself too “cool” to abide by +conventions.
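The naming convention and the override entry described above can be sketched as follows. This is only an illustration in JavaScript, not Emacs code: resolveLibrary and the array form of the override list are hypothetical, and the sketch ignores any name normalization Emacs itself may apply.

```javascript
// Hypothetical helper illustrating the naming convention; not an Emacs API.
// `overrides` mimics treesit-load-name-override-list as an array of
// [language, libraryBaseName, functionName] triples.
function resolveLibrary(language, overrides = []) {
  const entry = overrides.find(([lang]) => lang === language);
  if (entry) {
    // An override entry wins over the convention.
    return { libraryBaseName: entry[1], functionName: entry[2] };
  }
  // Conventional names: libtree-sitter-LANGUAGE and tree_sitter_LANGUAGE.
  return {
    libraryBaseName: `libtree-sitter-${language}`,
    functionName: `tree_sitter_${language}`,
  };
}

const overrides = [["cool-lang", "libtree-sitter-coool", "tree_sitter_cooool"]];
console.log(resolveLibrary("c"));
console.log(resolveLibrary("cool-lang", overrides));
```

The system-specific extension (ext) is left out on purpose, since it depends on the platform.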
+Tree-sitter library has a language version, a language -definition’s version needs to match this version to be compatible. -
-This function returns tree-sitter library’s language version. If -min-compatible is non-nil, it returns the minimal compatible -version. +
This function returns the version of the language-definition
+Application Binary Interface (ABI) supported by the
+tree-sitter library. By default, it returns the latest ABI version
+supported by the library, but if min-compatible is
+non-nil
, it returns the oldest ABI version which the library
+can still support.  Language definition libraries must be built for
+ABI versions between the oldest and the latest versions supported by
+the tree-sitter library, otherwise the library will be unable to load
+them.
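The compatibility rule stated above amounts to a range check. A minimal sketch in JavaScript, assuming a hypothetical helper isAbiCompatible; the version numbers in the usage lines are made up for illustration and are not taken from any particular tree-sitter release.

```javascript
// Hypothetical helper; not part of Emacs or tree-sitter.
// A language definition library loads only if its ABI version lies between
// the oldest and the latest versions supported by the tree-sitter library.
function isAbiCompatible(grammarAbi, oldestSupported, latestSupported) {
  return grammarAbi >= oldestSupported && grammarAbi <= latestSupported;
}

// Illustrative numbers only.
console.log(isAbiCompatible(14, 13, 14)); // inside the range
console.log(isAbiCompatible(12, 13, 14)); // too old
```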
A syntax tree is what a parser generates. In a syntax tree, each node represents a piece of text, and is connected to each other by a @@ -155,31 +195,34 @@ +------------+ +--------------+ +------------+ -
We can also represent it in s-expression: +
We can also represent it as an s-expression:
(root (expression (number) (operator) (number)))
Names like root
, expression
, number
,
-operator
are nodes’ type. However, not all nodes in a
-syntax tree have a type. Nodes that don’t are anonymous nodes,
-and nodes with a type are named nodes. Anonymous nodes are
-tokens with fixed spellings, including punctuation characters like
-bracket ‘]’, and keywords like return
.
+
+
+
+
Names like root
, expression
, number
, and
+operator
specify the type of the nodes. However, not all
+nodes in a syntax tree have a type. Nodes that don’t have a type are
+known as anonymous nodes, and nodes with a type are named
+nodes. Anonymous nodes are tokens with fixed spellings, including
+punctuation characters like bracket ‘]’, and keywords like
+return
.
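To make the s-expression notation and the named/anonymous distinction concrete, here is a small sketch in JavaScript. The node shape ({ type, children }, with type set to null for anonymous nodes) and the helpers node and toSexp are assumptions made for this illustration; they are not tree-sitter or Emacs APIs.

```javascript
// Hypothetical node representation: a null type stands for an anonymous node.
function node(type, children = []) {
  return { type, children };
}

// Render a tree as an s-expression, listing only named nodes.
function toSexp(n) {
  if (n.type === null) return null; // anonymous nodes are left out
  const kids = n.children.map(toSexp).filter((s) => s !== null);
  return kids.length === 0 ? `(${n.type})` : `(${n.type} ${kids.join(" ")})`;
}

const tree = node("root", [
  node("expression", [node("number"), node("operator"), node("number")]),
]);
console.log(toSexp(tree)); // prints "(root (expression (number) (operator) (number)))"
```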
To make the syntax tree easier to
-analyze, many language definitions assign field names to child
-nodes. For example, a function_definition
node could have a
-declarator
and a body
:
+
To make the syntax tree easier to analyze, many language definitions
+assign field names to child nodes. For example, a
+function_definition
node could have a declarator
and a
+body
:
(function_definition @@ -189,39 +232,40 @@
This minor mode displays the node that starts at point in -mode-line. The mode-line will display +
This minor mode displays on the mode-line the node that starts +at point. The mode-line will display
parent field-name: (child (grand-child (...))) +parent field: (child (grandchild (…)))
child, grand-child, and grand-grand-child, etc, are -nodes that have their beginning at point. And parent is the -parent of child. +
child, grandchild, grand-grandchild, etc., are nodes that +begin at point. parent is the parent node of child.
If there is no node that starts at point, i.e., point is in the middle of a node, then the mode-line only displays the smallest node that -spans point, and its immediate parent. +spans the position of point, and its immediate parent.
This minor mode doesn’t create parsers on its own. It simply uses the
first parser in (treesit-parser-list)
(see Using Tree-sitter Parser).
Authors of language definitions define the grammar of a -language, and this grammar determines how does a parser construct a -concrete syntax tree out of the text. In order to use the syntax -tree effectively, we need to read the grammar file. +programming language, which determines how a parser constructs a +concrete syntax tree out of the program text. In order to use the +syntax tree effectively, you need to consult the grammar file.
-The grammar file is usually grammar.js
in a language
-definition’s project repository. The link to a language definition’s
-home page can be found in tree-sitter’s homepage
-(https://tree-sitter.github.io/tree-sitter).
+
The grammar file is usually grammar.js in a language +definition’s project repository. The link to a language definition’s +home page can be found on +tree-sitter’s +homepage.
-The grammar is written in JavaScript syntax. For example, the rule
-matching a function_definition
node looks like
+
The grammar definition is written in JavaScript. For example, the
+rule matching a function_definition
node looks like
function_definition: $ => seq( @@ -231,12 +275,12 @@ )
The rule is represented by a function that takes a single argument +
The rules are represented by functions that take a single argument
$, representing the whole grammar. The function itself is
-constructed by other functions: the seq
function puts together a
-sequence of children; the field
function annotates a child with
-a field name. If we write the above definition in BNF syntax, it
-would look like
+constructed by other functions: the seq
function puts together
+a sequence of children; the field
function annotates a child
+with a field name. If we write the above definition in the so-called
+Backus-Naur Form (BNF) syntax, it would look like
function_definition := @@ -252,66 +296,77 @@ body: (compound_statement))
Below is a list of functions that one will see in a grammar -definition. Each function takes other rules as arguments and returns -a new rule. +
Below is a list of functions that one can see in a grammar definition. +Each function takes other rules as arguments and returns a new rule.
-seq(rule1, rule2, ...)
matches each rule one after another.
-
-choice(rule1, rule2, ...)
matches one of the rules in its
-arguments.
-
-repeat(rule)
matches rule for zero or more times.
+seq(rule1, rule2, …)
matches each rule one after another. +
choice(rule1, rule2, …)
matches one of the rules in its arguments. +
repeat(rule)
matches rule for zero or more times. This is like the ‘*’ operator in regular expressions. - -
repeat1(rule)
matches rule for one or more times.
+
+repeat1(rule)
matches rule for one or more times. This is like the ‘+’ operator in regular expressions. - -
optional(rule)
matches rule for zero or one time.
+
+optional(rule)
matches rule for zero or one time. This is like the ‘?’ operator in regular expressions. - -
field(name, rule)
assigns field name name to the child
-node matched by rule.
-
-alias(rule, alias)
makes nodes matched by rule appear as
-alias in the syntax tree generated by the parser. For example,
-
+
+field(name, rule)
assigns field name name to the child node matched by rule. +
alias(rule, alias)
makes nodes matched by rule appear as alias in the syntax +tree generated by the parser. For example, +
alias(preprocessor_call_exp, call_expression)
makes any node matched by preprocessor_call_exp
to appear as
+
makes any node matched by preprocessor_call_exp
appear as
call_expression
.
-
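The regular-expression analogy for seq, choice, repeat, repeat1, and optional can be tried out with a toy model. The sketch below is an assumption-laden illustration in JavaScript: it reimplements the combinators as greedy matchers over an array of token strings, with no backtracking, and the helpers tok and matches are hypothetical. The real grammar functions build a grammar description for the tree-sitter parser generator rather than matching anything directly.

```javascript
// Toy model: a rule is a function (tokens, pos) => new position, or -1 on failure.
const tok = (t) => (tokens, pos) => (tokens[pos] === t ? pos + 1 : -1);

// seq: each rule must match, one after another.
const seq = (...rules) => (tokens, pos) => {
  for (const r of rules) {
    pos = r(tokens, pos);
    if (pos < 0) return -1;
  }
  return pos;
};

// choice: the first rule that matches wins.
const choice = (...rules) => (tokens, pos) => {
  for (const r of rules) {
    const next = r(tokens, pos);
    if (next >= 0) return next;
  }
  return -1;
};

// repeat: zero or more times, like '*' (greedy, stops when no progress is made).
const repeat = (rule) => (tokens, pos) => {
  for (;;) {
    const next = rule(tokens, pos);
    if (next <= pos) return pos;
    pos = next;
  }
};

// repeat1: one or more times, like '+'.
const repeat1 = (rule) => seq(rule, repeat(rule));

// optional: zero or one time, like '?'.
const optional = (rule) => (tokens, pos) => {
  const next = rule(tokens, pos);
  return next >= 0 ? next : pos;
};

// A rule matches a token list if it consumes all of it.
const matches = (rule, tokens) => rule(tokens, 0) === tokens.length;

const digits = repeat1(tok("d"));
const sum = seq(digits, optional(seq(choice(tok("+"), tok("-")), digits)));
console.log(matches(sum, ["d", "d", "+", "d"]));
console.log(matches(sum, ["d", "+"]));
```

field and alias are omitted because they only label the resulting nodes and do not change what a rule matches.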
Below are grammar functions less interesting for a reader of a +
Below are grammar functions of lesser importance for reading a language definition.
-token(rule)
marks rule to produce a single leaf node.
-That is, instead of generating a parent node with individual child
-nodes under it, everything is combined into a single leaf node.
+token(rule)
marks rule to produce a single leaf node. That is, instead of +generating a parent node with individual child nodes under it, +everything is combined into a single leaf node. +
token.immediate(rule)
Normally, grammar rules ignore preceding whitespace; this +changes rule to match only when there is no preceding +whitespace. +
prec(n, rule)
gives rule the level-n precedence. +
prec.left([n,] rule)
marks rule as left-associative, optionally with level n. +
prec.right([n,] rule)
marks rule as right-associative, optionally with level n. +
prec.dynamic(n, rule)
this is like prec
, but the precedence is applied at runtime
+instead.
+
token.immediate(rule)
changes rule to match only when
-there is no preceding whitespaces.
-
-prec(n, rule)
gives rule a level n precedence.
-
-prec.left([n,] rule)
marks rule as left-associative,
-optionally with level n.
-
-prec.right([n,] rule)
marks rule as right-associative,
-optionally with level n.
-
-prec.dynamic(n, rule)
is like prec
, but the precedence
-is applied at runtime instead.
-The tree-sitter project talks about writing a grammar in more detail: -https://tree-sitter.github.io/tree-sitter/creating-parsers. -Read especially “The Grammar DSL” section. +
The documentation of the tree-sitter project has +more +about writing a grammar. Read especially “The Grammar DSL” +section.