Document tree-sitter features in the user manual

* lisp/progmodes/c-ts-mode.el (c-ts-mode-map): Bind "C-c .", for
consistency with CC mode.
* lisp/treesit.el (treesit-font-lock-level): Doc fix.

* doc/emacs/programs.texi (C Indent, Custom C Indent): Document
the indentation features of 'c-ts-mode'.
(Moving by Defuns): Document 'treesit-defun-tactic'.
* doc/emacs/files.texi (Visiting): Document
'treesit-max-buffer-size'.
* doc/emacs/display.texi (Traditional Font Lock)
(Parser-based Font Lock): New subsections.
* doc/emacs/emacs.texi (Top): Update top-level menu.
Eli Zaretskii 2023-01-29 15:22:20 +02:00
parent b73539832d
commit 197f994384
6 changed files with 170 additions and 36 deletions

@ -1024,17 +1024,65 @@ customize-group @key{RET} font-lock-faces @key{RET}}. You can then
use that customization buffer to customize the appearance of these
faces. @xref{Face Customization}.
@cindex just-in-time (JIT) font-lock
@cindex background syntax highlighting
Fontifying very large buffers can take a long time. To avoid large
delays when a file is visited, Emacs initially fontifies only the
visible portion of a buffer. As you scroll through the buffer, each
portion that becomes visible is fontified as soon as it is displayed;
this type of Font Lock is called @dfn{Just-In-Time} (or @dfn{JIT})
Lock. You can control how JIT Lock behaves, including telling it to
perform fontification while idle, by customizing variables in the
customization group @samp{jit-lock}. @xref{Specific Customization}.
The information that major modes use for determining which parts of
buffer text to fontify and what faces to use can be based on several
different ways of analyzing the text:
@itemize @bullet
@item
Search for keywords and other textual patterns based on regular
expressions (@pxref{Regexp Search,, Regular Expression Search}).
@item
Find syntactically distinct parts of text based on built-in syntax
tables (@pxref{Syntax Tables,,, elisp, The Emacs Lisp Reference
Manual}).
@item
Use a syntax tree produced by a full-blown parser, via a special-purpose
library, such as the tree-sitter library (@pxref{Parsing Program
Source,,, elisp, The Emacs Lisp Reference Manual}), or an external
program.
@end itemize
@menu
* Traditional Font Lock:: Font Lock based on regexps and syntax tables.
* Parser-based Font Lock:: Font Lock based on external parser.
@end menu
@node Traditional Font Lock
@subsection Traditional Font Lock
@cindex traditional font-lock
``Traditional'' methods of providing font-lock information are based
on regular-expression search and on syntactic analysis using syntax
tables built into Emacs. This subsection describes the use and
customization of font-lock for major modes which use these traditional
methods.
@vindex font-lock-maximum-decoration
You can customize the variable @code{font-lock-maximum-decoration}
to alter the amount of fontification applied by Font Lock mode, for
major modes that support this feature. The value should be a number
(with 1 representing a minimal amount of fontification; some modes
support levels as high as 3); or @code{t}, meaning ``as high as
possible'' (the default). To be effective for a given file buffer,
the customization of @code{font-lock-maximum-decoration} should be
done @emph{before} the file is visited; if you already have the file
visited in a buffer when you customize this variable, kill the buffer
and visit the file again after the customization.
You can control the amount of fontification applied by Font Lock
mode by customizing the variable @code{font-lock-maximum-decoration},
for major modes that support this feature. The value of this variable
should be a number (with 1 representing a minimal amount of
fontification; some modes support levels as high as 3); or @code{t},
meaning ``as high as possible'' (the default). To be effective for a
given file buffer, the customization of
@code{font-lock-maximum-decoration} should be done @emph{before} the
file is visited; if you already have the file visited in a buffer when
you customize this variable, kill the buffer and visit the file again
after the customization.
You can also specify different numbers for particular major modes; for
example, to use level 1 for C/C++ modes, and the default level
@ -1082,16 +1130,59 @@ keywords by customizing the @code{font-lock-ignore} option,
@pxref{Customizing Keywords,,, elisp, The Emacs Lisp Reference
Manual}.
@cindex just-in-time (JIT) font-lock
@cindex background syntax highlighting
Fontifying large buffers can take a long time. To avoid large
delays when a file is visited, Emacs initially fontifies only the
visible portion of a buffer. As you scroll through the buffer, each
portion that becomes visible is fontified as soon as it is displayed;
this type of Font Lock is called @dfn{Just-In-Time} (or @dfn{JIT})
Lock. You can control how JIT Lock behaves, including telling it to
perform fontification while idle, by customizing variables in the
customization group @samp{jit-lock}. @xref{Specific Customization}.
@node Parser-based Font Lock
@subsection Parser-based Font Lock
@cindex font-lock via tree-sitter
@cindex parser-based font-lock
If your Emacs was built with the tree-sitter library, it can use the
results of parsing the buffer text by that library for the purposes of
fontification. This is usually faster and more accurate than the
``traditional'' methods described in the previous subsection, since
the tree-sitter library provides full-blown parsers for programming
languages and other kinds of formatted text which it supports. Major
modes which utilize the tree-sitter library are named
@code{@var{foo}-ts-mode}, with the @samp{-ts-} part indicating the use
of the library. This subsection documents the Font Lock support based
on the tree-sitter library.
@vindex treesit-font-lock-level
You can control the amount of fontification applied by Font Lock
mode in major modes based on tree-sitter by customizing the variable
@code{treesit-font-lock-level}. Its value is a number between 1 and
4:
@table @asis
@item Level 1
This level usually fontifies only comments and function names in
function definitions.
@item Level 2
This level adds fontification of keywords, strings, and data types.
@item Level 3
This is the default level; it adds fontification of assignments,
numbers, properties, etc.
@item Level 4
This level adds everything else that can be fontified: operators,
delimiters, brackets, other punctuation, function names in function
calls, variables, etc.
@end table
@vindex treesit-font-lock-feature-list
@noindent
What exactly constitutes each of the syntactical categories mentioned
above depends on the major mode and the parser grammar used by
tree-sitter for the major mode's language. However, in general the
categories follow the conventions of the programming language or the
file format supported by the major mode. The buffer-local value of
the variable @code{treesit-font-lock-feature-list} holds the
fontification features supported by a tree-sitter based major mode,
where each sub-list shows the features provided by the corresponding
fontification level.
Once you change the value of @code{treesit-font-lock-level} via
@w{@kbd{M-x customize-variable}} (@pxref{Specific Customization}), it
takes effect immediately in all the existing buffers and for files you
visit in the future in the same session.
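For instance, the same setting can be made from an init file. The
following is only a sketch: it relies on
@code{customize-set-variable} running the variable's setter, as the
Customize interface does, and the level 4 is an arbitrary choice for
illustration.

@example
;; Request the highest fontification level for tree-sitter modes.
(customize-set-variable 'treesit-font-lock-level 4)
@end example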
@node Highlight Interactively
@section Interactive Highlighting

@ -383,6 +383,10 @@ Controlling the Display
* Visual Line Mode:: Word wrap and screen line-based editing.
* Display Custom:: Information on variables for customizing display.
Font Lock
* Traditional Font Lock:: Font Lock based on regexps and syntax tables.
* Parser-based Font Lock:: Font Lock based on external parser.
Searching and Replacement
* Incremental Search:: Search happens as you type the string.

@ -215,6 +215,17 @@ by the integers that Emacs can represent (@pxref{Buffers}). If you
try, Emacs displays an error message saying that the maximum buffer
size has been exceeded.
@vindex treesit-max-buffer-size
If you try to visit a file whose major mode (@pxref{Major Modes})
uses the tree-sitter parsing library, Emacs will display a warning if
the file's size in bytes is larger than the value of the variable
@code{treesit-max-buffer-size}. The default value is 40 megabytes for
64-bit Emacs and 15 megabytes for 32-bit Emacs. This avoids the
danger of having Emacs run out of memory by preventing the activation
of major modes based on tree-sitter in such large buffers, because a
typical tree-sitter parser needs about 10 times as much memory as the
text it parses.
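As a sketch of how this limit might be raised, a line like the
following could go into an init file. The figure of 100 megabytes is
only an illustrative assumption; by the estimate above, parsing a
buffer of that size could need on the order of 1 gigabyte of memory.

@example
;; Allow tree-sitter modes in buffers of up to 100 MiB.
(setq treesit-max-buffer-size (* 100 1024 1024))
@end example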
@cindex wildcard characters in file names
@vindex find-file-wildcards
If the file name you specify contains shell-style wildcard

@ -254,6 +254,17 @@ they do their standard jobs in a way better fitting a particular
language. Other major modes may replace any or all of these key
bindings for that purpose.
@cindex nested defuns
@vindex treesit-defun-tactic
Some programming languages support @dfn{nested defuns}, whereby a
defun (such as a function or a method or a class) can be defined
inside (i.e., as part of the body of) another defun. The commands
described above by default find the beginning and the end of the
@emph{innermost} defun around point. Major modes based on the
tree-sitter library provide control of this behavior: if the variable
@code{treesit-defun-tactic} is set to the value @code{top-level}, the
defun commands will find the @emph{outermost} defuns instead.
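For example, assuming you want this behavior only in
@code{c-ts-mode} buffers, a mode hook along the following lines might
be used (a sketch, not the only way to set the variable):

@example
;; Make defun commands operate on outermost defuns in c-ts-mode.
(add-hook 'c-ts-mode-hook
          (lambda ()
            (setq-local treesit-defun-tactic 'top-level)))
@end example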
@node Imenu
@subsection Imenu
@cindex index of buffer definitions
@ -520,15 +531,19 @@ then indent it like this:
@item C-c C-q
@kindex C-c C-q @r{(C mode)}
@findex c-indent-defun
@findex c-ts-mode-indent-defun
Reindent the current top-level function definition or aggregate type
declaration (@code{c-indent-defun}).
declaration (@code{c-indent-defun} in CC mode,
@code{c-ts-mode-indent-defun} in @code{c-ts-mode} based on tree-sitter).
@item C-M-q
@kindex C-M-q @r{(C mode)}
@findex c-indent-exp
Reindent each line in the balanced expression that follows point
(@code{c-indent-exp}). A prefix argument inhibits warning messages
about invalid syntax.
@findex prog-indent-sexp
Reindent each line in the balanced expression that follows point. In
CC mode, this invokes @code{c-indent-exp}; in tree-sitter based
@code{c-ts-mode} this invokes a more general @code{prog-indent-sexp}.
A prefix argument inhibits warning messages about invalid syntax.
@item @key{TAB}
@findex c-indent-line-or-region
@ -568,7 +583,8 @@ onto the indentation of the @dfn{anchor statement}.
@table @kbd
@item C-c . @var{style} @key{RET}
Select a predefined style @var{style} (@code{c-set-style}).
Select a predefined style @var{style} (@code{c-set-style} in CC mode,
@code{c-ts-mode-set-style} in @code{c-ts-mode} based on tree-sitter).
@end table
A @dfn{style} is a named collection of customizations that can be
@ -584,6 +600,7 @@ typing @kbd{C-M-q} at the start of a function definition.
@kindex C-c . @r{(C mode)}
@findex c-set-style
@findex c-ts-mode-set-style
To choose a style for the current buffer, use the command @w{@kbd{C-c
.}}. Specify a style name as an argument (case is not significant).
This command affects the current buffer only, and it affects only
@ -592,11 +609,11 @@ the code already in the buffer. To reindent the whole buffer in the
new style, you can type @kbd{C-x h C-M-\}.
@vindex c-default-style
You can also set the variable @code{c-default-style} to specify the
default style for various major modes. Its value should be either the
style's name (a string) or an alist, in which each element specifies
one major mode and which indentation style to use for it. For
example,
When using CC mode, you can also set the variable
@code{c-default-style} to specify the default style for various major
modes. Its value should be either the style's name (a string) or an
alist, in which each element specifies one major mode and which
indentation style to use for it. For example,
@example
(setq c-default-style
@ -613,6 +630,11 @@ one of the C-like major modes; thus, if you specify a new default
style for Java mode, you can make it take effect in an existing Java
mode buffer by typing @kbd{M-x java-mode} there.
@vindex c-ts-mode-indent-style
When using the tree-sitter based @code{c-ts-mode}, you can set the
default indentation style by customizing the variable
@code{c-ts-mode-indent-style}.
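A minimal sketch of such a customization in an init file follows. It
assumes that @code{linux} is among the style names accepted by the
variable, which you should verify against the variable's documentation
(@kbd{C-h v c-ts-mode-indent-style @key{RET}}).

@example
;; Use a Linux-kernel-like indentation style in c-ts-mode.
(customize-set-variable 'c-ts-mode-indent-style 'linux)
@end example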
The @code{gnu} style specifies the formatting recommended by the GNU
Project for C; it is the default, so as to encourage use of our
recommended style.

@ -700,7 +700,8 @@ the semicolon. This function skips the semicolon."
(defvar-keymap c-ts-mode-map
:doc "Keymap for the C language with tree-sitter"
:parent prog-mode-map
"C-c C-q" #'c-ts-mode-indent-defun)
"C-c C-q" #'c-ts-mode-indent-defun
"C-c ." #'c-ts-mode-set-style)
;;;###autoload
(define-derived-mode c-ts-base-mode prog-mode "C"

@ -580,16 +580,21 @@ from 1 which is the absolute minimum, to 4 that yields the maximum
fontifications.
Level 1 usually contains only comments and definitions.
Level 2 usually adds keywords, strings, constants, types, etc.
Level 3 usually represents a full-blown fontification, including
assignment, constants, numbers, properties, etc.
Level 2 usually adds keywords, strings, data types, etc.
Level 3 usually represents full-blown fontifications, including
assignments, constants, numbers and literals, properties, etc.
Level 4 adds everything else that can be fontified: delimiters,
operators, brackets, all functions and variables, etc.
operators, brackets, punctuation, all functions and variables, etc.
In addition to the decoration level, individual features can be
turned on/off by calling `treesit-font-lock-recompute-features'.
Changing the decoration level requires calling
`treesit-font-lock-recompute-features' to have an effect."
`treesit-font-lock-recompute-features' to have an effect, unless
done via `customize-variable'.
To see which syntactical categories are fontified by each level
in a particular major mode, examine the buffer-local value of the
variable `treesit-font-lock-feature-list'."
:type 'integer
:set #'treesit--font-lock-level-setter
:version "29.1")