diff --git a/admin/notes/tree-sitter/html-manual/Language-Definitions.html b/admin/notes/tree-sitter/html-manual/Language-Definitions.html index ba3eeb9eeb9..6df676b1680 100644 --- a/admin/notes/tree-sitter/html-manual/Language-Definitions.html +++ b/admin/notes/tree-sitter/html-manual/Language-Definitions.html @@ -66,14 +66,17 @@

37.1 Tree-sitter Language Definitions

+

Loading a language definition

+ +

Tree-sitter relies on language definitions to parse text in that -language. In Emacs, A language definition is represented by a symbol. -For example, C language definition is represented as c, and -c can be passed to tree-sitter functions as the language -argument. +language. In Emacs, a language definition is represented by a symbol. +For example, the C language definition is represented as the symbol +c, and c can be passed to tree-sitter functions as the +language argument.

@@ -81,55 +84,92 @@

Tree-sitter language definitions are distributed as dynamic libraries. In order to use a language definition in Emacs, you need to make sure that the dynamic library is installed on the system. Emacs looks for -language definitions under load paths in -treesit-extra-load-path, user-emacs-directory/tree-sitter, -and system default locations for dynamic libraries, in that order. -Emacs tries each extensions in treesit-load-suffixes. If Emacs -cannot find the library or has problem loading it, Emacs signals -treesit-load-language-error. The signal data is a list of -specific error messages. +language definitions in several places, in the following order: +

+ + +

In each of these directories, Emacs looks for a file with file-name +extensions specified by the variable treesit-load-suffixes. +

+

If Emacs cannot find the library or has problems loading it, Emacs +signals the treesit-load-language-error error. The data of +that signal could be one of the following: +

+
+
(not-found error-msg …)
+

This means that Emacs could not find the language definition library. +

+
(symbol-error error-msg)
+

This means that Emacs could not find in the library the expected function +that every language definition library should export. +

+
(version-mismatch error-msg)
+

This means that the version of the language definition library is incompatible +with that of the tree-sitter library. +

+
+ +

In all of these cases, error-msg might provide additional +details about the failure.

-
Function: treesit-language-available-p language
-

This function checks whether the dynamic library for language is -present on the system, and return non-nil if it is. +

Function: treesit-language-available-p language &optional detail
+

This function returns non-nil if the language definitions for +language exist and can be loaded. +

+

If detail is non-nil, return (t . nil) when +language is available, and (nil . data) when it’s +unavailable. data is the signal data of +treesit-load-language-error.
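
The detail argument described above could be used to report why a language definition fails to load. The following is only a sketch: it assumes an Emacs built with tree-sitter support, and my/treesit-report is a hypothetical helper name, not something Emacs provides.

```elisp
;; Sketch: describe whether LANGUAGE can be loaded, and if not, why.
;; `my/treesit-report' is a hypothetical helper, not part of Emacs.
(defun my/treesit-report (language)
  "Return a string describing whether LANGUAGE can be loaded."
  (if (not (fboundp 'treesit-language-available-p))
      "tree-sitter support is not available in this Emacs"
    ;; With a non-nil DETAIL argument the result is a cons cell:
    ;; (t . nil) when available, (nil . DATA) otherwise.
    (let ((result (treesit-language-available-p language t)))
      (if (car result)
          (format "%s: available" language)
        ;; DATA is the signal data of `treesit-load-language-error',
        ;; for example (not-found ERROR-MSG ...).
        (format "%s: unavailable, reason: %S" language (cdr result))))))

(message "%s" (my/treesit-report 'c))
```

The first element of the reported data (not-found, symbol-error, or version-mismatch) indicates which of the failure cases listed earlier applies.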

-

By convention, the dynamic library for language is -libtree-sitter-language.ext, where ext is the -system-specific extension for dynamic libraries. Also by convention, +

By convention, the file name of the dynamic library for language is +libtree-sitter-language.ext, where ext is the +system-specific extension for dynamic libraries. Also by convention, the function provided by that library is named -tree_sitter_language. If a language definition doesn’t -follow this convention, you should add an entry +tree_sitter_language. If a language definition library +doesn’t follow this convention, you should add an entry

(language library-base-name function-name)
 
-

to treesit-load-name-override-list, where -library-base-name is the base filename for the dynamic library -(conventionally libtree-sitter-language), and +

to the list in the variable treesit-load-name-override-list, where +library-base-name is the basename of the dynamic library’s file name +(usually libtree-sitter-language), and function-name is the function provided by the library +(usually tree_sitter_language). For example,

(cool-lang "libtree-sitter-coool" "tree_sitter_cooool")
 
-

for a language too cool to abide by conventions. +

for a language that considers itself too “cool” to abide by +conventions.
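
As a rough illustration, an entry like the one above might be registered from an init file as follows. This is a sketch that reuses the made-up cool-lang names from the text; the boundp guard is an assumption that keeps the snippet harmless on an Emacs built without tree-sitter support.

```elisp
;; Sketch: register the illustrative `cool-lang' entry from the text.
;; The library and function names are the hypothetical ones used above.
(when (boundp 'treesit-load-name-override-list)
  (add-to-list 'treesit-load-name-override-list
               '(cool-lang "libtree-sitter-coool" "tree_sitter_cooool")))
```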

+
Function: treesit-language-version &optional min-compatible
-

Tree-sitter library has a language version, a language -definition’s version needs to match this version to be compatible. -

-

This function returns tree-sitter library’s language version. If -min-compatible is non-nil, it returns the minimal compatible -version. +

This function returns the version of the language-definition +Application Binary Interface (ABI) supported by the +tree-sitter library. By default, it returns the latest ABI version +supported by the library, but if min-compatible is +non-nil, it returns the oldest ABI version that the library +can still support. Language definition libraries must be built for +an ABI version between the oldest and the latest versions supported by +the tree-sitter library; otherwise the library will be unable to load +them.
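
The supported ABI range could be inspected with something like the sketch below. It assumes the function is available under the name used in this section; the fboundp guard is there because the name may differ in other Emacs versions.

```elisp
;; Sketch: show the range of ABI versions the tree-sitter library accepts.
;; Assumes the function name documented in this section.
(when (fboundp 'treesit-language-version)
  (let ((latest (treesit-language-version))    ; latest supported ABI
        (oldest (treesit-language-version t))) ; oldest still supported
    (message "Loadable language definition ABI versions: %s to %s"
             oldest latest)))
```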

Concrete syntax tree

+

A syntax tree is what a parser generates. In a syntax tree, each node represents a piece of text, and is connected to each other by a @@ -155,31 +195,34 @@ +------------+ +--------------+ +------------+ -

We can also represent it in s-expression: +

We can also represent it as an s-expression:

(root (expression (number) (operator) (number)))
 

Node types

+ - - - -

Names like root, expression, number, -operator are nodes’ type. However, not all nodes in a -syntax tree have a type. Nodes that don’t are anonymous nodes, -and nodes with a type are named nodes. Anonymous nodes are -tokens with fixed spellings, including punctuation characters like -bracket ‘]’, and keywords like return. + + + +

Names like root, expression, number, and +operator specify the type of the nodes. However, not all +nodes in a syntax tree have a type. Nodes that don’t have a type are +known as anonymous nodes, and nodes with a type are named +nodes. Anonymous nodes are tokens with fixed spellings, including +punctuation characters like bracket ‘]’, and keywords like +return.

Field names

+ -

To make the syntax tree easier to -analyze, many language definitions assign field names to child -nodes. For example, a function_definition node could have a -declarator and a body: +

To make the syntax tree easier to analyze, many language definitions +assign field names to child nodes. For example, a +function_definition node could have a declarator and a +body:

(function_definition
@@ -189,39 +232,40 @@
 
 
Command: treesit-inspect-mode
-

This minor mode displays the node that starts at point in -mode-line. The mode-line will display +

This minor mode displays on the mode-line the node that starts +at point. The mode-line will display

-
parent field-name: (child (grand-child (...)))
+
parent field: (child (grandchild (…)))
 
-

child, grand-child, and grand-grand-child, etc, are -nodes that have their beginning at point. And parent is the -parent of child. +

child, grandchild, grand-grandchild, etc., are nodes that +begin at point. parent is the parent node of child.

If there is no node that starts at point, i.e., point is in the middle of a node, then the mode-line only displays the smallest node that -spans point, and its immediate parent. +spans the position of point, and its immediate parent.

This minor mode doesn’t create parsers on its own. It simply uses the first parser in (treesit-parser-list) (see Using Tree-sitter Parser).

Reading the grammar definition

+

Authors of language definitions define the grammar of a -language, and this grammar determines how does a parser construct a -concrete syntax tree out of the text. In order to use the syntax -tree effectively, we need to read the grammar file. +programming language, which determines how a parser constructs a +concrete syntax tree out of the program text. In order to use the +syntax tree effectively, you need to consult the grammar file.

-

The grammar file is usually grammar.js in a language -definition’s project repository. The link to a language definition’s -home page can be found in tree-sitter’s homepage -(https://tree-sitter.github.io/tree-sitter). +

The grammar file is usually grammar.js in a language +definition’s project repository. The link to a language definition’s +home page can be found on +tree-sitter’s +homepage.

-

The grammar is written in JavaScript syntax. For example, the rule -matching a function_definition node looks like +

The grammar definition is written in JavaScript. For example, the +rule matching a function_definition node looks like

function_definition: $ => seq(
@@ -231,12 +275,12 @@
 )
 
-

The rule is represented by a function that takes a single argument +

The rules are represented by functions that take a single argument $, representing the whole grammar. The function itself is -constructed by other functions: the seq function puts together a -sequence of children; the field function annotates a child with -a field name. If we write the above definition in BNF syntax, it -would look like +constructed by other functions: the seq function puts together +a sequence of children; the field function annotates a child +with a field name. If we write the above definition in the so-called +Backus-Naur Form (BNF) syntax, it would look like

function_definition :=
@@ -252,66 +296,77 @@
   body: (compound_statement))
 
-

Below is a list of functions that one will see in a grammar -definition. Each function takes other rules as arguments and returns -a new rule. +

Below is a list of functions that one can see in a grammar definition. +Each function takes other rules as arguments and returns a new rule.

- +

+ -

Below are grammar functions less interesting for a reader of a +

Below are grammar functions of lesser importance for reading a language definition.

- - -

The tree-sitter project talks about writing a grammar in more detail: -https://tree-sitter.github.io/tree-sitter/creating-parsers. -Read especially “The Grammar DSL” section. +

The documentation of the tree-sitter project has +more +about writing a grammar. Read especially “The Grammar DSL” +section.


diff --git a/admin/notes/tree-sitter/html-manual/Multiple-Languages.html b/admin/notes/tree-sitter/html-manual/Multiple-Languages.html index 1ee2df7f442..eac142921f1 100644 --- a/admin/notes/tree-sitter/html-manual/Multiple-Languages.html +++ b/admin/notes/tree-sitter/html-manual/Multiple-Languages.html @@ -33,7 +33,7 @@ - +