Initial revision

Richard M. Stallman 1998-02-28 01:49:58 +00:00
parent 19061fd414
commit cc6d0d2c94
2 changed files with 1456 additions and 0 deletions

lispref/customize.texi Normal file

@@ -0,0 +1,765 @@
@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
@c Copyright (C) 1997, 1998 Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@setfilename ../info/customize
@node Customization, Loading, Macros, Top
@chapter Writing Customization Definitions
This chapter describes how to declare customization groups, variables,
and faces. We use the term @dfn{customization item} to include all
three of those. This chapter has few examples, but please look at the file
@file{cus-edit.el}, which contains many declarations you can learn from.
@menu
* Common Keywords::
* Group Definitions::
* Variable Definitions::
* Face Definitions::
* Customization Types::
@end menu
@node Common Keywords
@section Common Keywords for All Kinds of Items
All three kinds of customization declarations (for groups, variables,
and faces) accept keyword arguments for specifying various information.
This section describes some keywords that apply to all three.
All of these keywords, except @code{:tag}, can be used more than once in
a given item. Each use of the keyword has an independent effect. The
keyword @code{:tag} is an exception because any given item can only
display one name.
@table @code
@item :group @var{group}
Put this customization item in group @var{group}. When you use
@code{:group} in a @code{defgroup}, it makes the new group a subgroup of
@var{group}.
If you use this keyword more than once, you can put a single item into
more than one group. Displaying any of those groups will show this
item. Be careful not to overdo this!
@item :link @var{link-data}
Include an external link after the documentation string for this item.
This is a sentence containing an active field which references some
other documentation.
There are three alternatives you can use for @var{link-data}:
@table @code
@item (custom-manual @var{info-node})
Link to an Info node; @var{info-node} is a string which specifies the
node name, as in @code{"(emacs)Top"}. The link appears as
@samp{[manual]} in the customization buffer.
@item (info-link @var{info-node})
Like @code{custom-manual} except that the link appears
in the customization buffer with the Info node name.
@item (url-link @var{url})
Link to a web page; @var{url} is a string which specifies the URL. The
link appears in the customization buffer as @var{url}.
@end table
You can specify the text to use in the customization buffer by adding
@code{:tag @var{name}} after the first element of the @var{link-data};
for example, @code{(info-link :tag "foo" "(emacs)Top")} makes a link to
the Emacs manual which appears in the buffer as @samp{foo}.
An item can have more than one external link; however, most items have
none at all.
@item :load @var{file}
Load file @var{file} (a string) before displaying this customization
item. Loading is done with @code{load-library}, and only if the file is
not already loaded.
@item :require @var{feature}
Require feature @var{feature} (a symbol) when installing a value for
this item (an option or a face) that was saved using the customization
feature. This is done by calling @code{require}.
The most common reason to use @code{:require} is when a variable enables
a feature such as a minor mode, and just setting the variable won't have
any effect unless the code which implements the mode is loaded.
@item :tag @var{name}
Use @var{name}, a string, instead of the item's name, to label the item
in customization menus and buffers.
@end table
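As a rough illustration of how these keywords combine, here is a sketch
of a single option declaration. The names @code{my-package} and
@code{my-package-verbose}, and the tag strings, are hypothetical and
chosen only for this example; @code{defcustom} and @code{:type} are
described later in this chapter.
@example
;; Hypothetical option; only the keywords matter here.
(defcustom my-package-verbose nil
  "*Non-nil means My Package reports what it is doing."
  :type 'boolean
  :tag "Verbose Messages"     ; label shown instead of the name
  :group 'my-package          ; the package's own (hypothetical) group
  :group 'help                ; a second use of :group
  :link '(custom-manual "(emacs)Top")
  :link '(url-link :tag "Home page" "http://www.gnu.org/"))
@end example
@noindent
Each @code{:group} and @code{:link} takes effect independently, while
@code{:tag} appears only once. The keyword values are evaluated, which
is why the link data and group names are quoted.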
@node Group Definitions
@section Defining Custom Groups
Each Emacs Lisp package should have one main customization group which
contains all the options, faces and other groups in the package. If the
package has a small number of options and faces, use just one group and
put everything in it. When there are more than twelve or so options and
faces, then you should structure them into subgroups, and put the
subgroups under the package's main customization group. It is ok to
have some of the options and faces in the package's main group alongside
the subgroups.
The package's main or only group should be a member of one or more of
the standard customization groups. Type @kbd{C-h p} to display a
list of finder keywords; then choose some of them and add your group to
each of them, using the @code{:group} keyword.
The way to declare new customization groups is with @code{defgroup}.
@tindex defgroup
@defmac defgroup group members doc [keyword value]...
Declare @var{group} as a customization group containing @var{members}.
Do not quote the symbol @var{group}. The argument @var{doc} specifies
the documentation string for the group.
The arguments @var{members} can be an alist whose elements specify
members of the group; however, normally @var{members} is @code{nil}, and
you specify the group's members by using the @code{:group} keyword when
defining those members.
@ignore
@code{(@var{name} @var{widget})}. Here @var{name} is a symbol, and
@var{widget} is a widget for editing that symbol. Useful widgets are
@code{custom-variable} for editing variables, @code{custom-face} for
editing faces, and @code{custom-group} for editing groups.
@end ignore
In addition to the common keywords (@pxref{Common Keywords}), you can
use this keyword in @code{defgroup}:
@table @code
@item :prefix @var{prefix}
If the name of an item in the group starts with @var{prefix}, then the
tag for that item is constructed (by default) by omitting @var{prefix}.
One group can have any number of prefixes.
@end table
@end defmac
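For instance, a package might declare a main group and one subgroup
along the following lines. This is only a sketch: the group names
@code{my-package} and @code{my-package-display} are hypothetical, and
@code{tools} is assumed to be a suitable standard group for the package.
@example
;; Main group of the hypothetical package, placed in a standard group.
(defgroup my-package nil
  "Hypothetical package used as an example."
  :group 'tools
  :prefix "my-package-")

;; A subgroup; it joins the main group through :group, so the
;; MEMBERS argument of both declarations can stay nil.
(defgroup my-package-display nil
  "How My Package displays its results."
  :group 'my-package)
@end example
@noindent
Options and faces then name either group with their own @code{:group}
keyword.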
The @code{:prefix} feature is currently turned off, which means that
@code{:prefix} currently has no effect. We did this because we found
that discarding the specified prefixes often led to confusing names for
options. This happened because the people who wrote the @code{defgroup}
definitions for various groups added @code{:prefix} keywords whenever
they made logical sense---that is, whenever there was a common prefix
for the option names in a library.
In order to obtain good results with @code{:prefix}, it is necessary to
check the specific effects of discarding a particular prefix, given the
specific items in a group and their names and documentation. If the
resulting text is not clear, then @code{:prefix} should not be used in
that case.
It should be possible to recheck all the customization groups, delete
the @code{:prefix} specifications which give unclear results, and then
turn this feature back on, if someone would like to do the work.
@node Variable Definitions
@section Defining Customization Variables
Use @code{defcustom} to declare user editable variables.
@tindex defcustom
@defmac defcustom option value doc [keyword value]...
Declare @var{option} as a customizable user option variable that
defaults to @var{value}. Do not quote @var{option}. @var{value} should
be an expression to compute the value; it will be evaluated on more
than one occasion.
If @var{option} is void, @code{defcustom} initializes it to @var{value}.
The argument @var{doc} specifies the documentation string for the variable.
The following additional keywords are defined:
@table @code
@item :type @var{type}
Use @var{type} as the data type for this option. It specifies which
values are legitimate, and how to display the value.
@xref{Customization Types}, for more information.
@item :options @var{list}
Specify @var{list} as the list of reasonable values for use in this
option.
Currently this is meaningful only when type is @code{hook}. The
elements of @var{list} are functions that you are likely to want to use
as elements of the hook value. The user is not actually restricted to
using only these functions, but they are offered as convenient
alternatives.
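
A hook option using @code{:options} might be declared as in the
following sketch. The hook name and group are hypothetical;
@code{turn-on-auto-fill} is an existing function, used here merely as a
plausible suggestion.
@example
(defcustom my-package-mode-hook nil
  "*Hook run after My Package mode is turned on."
  :type 'hook
  ;; Offered as a convenient choice; other functions are still allowed.
  :options '(turn-on-auto-fill)
  :group 'my-package)
@end example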
@item :version @var{version}
This option specifies that the variable's default value was changed in
Emacs version @var{version}. For example,
@example
(defcustom foo-max 34
  "*Maximum number of foo's allowed."
  :type 'integer
  :group 'foo
  :version "20.3")
@end example
@item :set @var{setfunction}
Specify @var{setfunction} as the way to change the value of this option.
The function @var{setfunction} should take two arguments, a symbol and
the new value, and should do whatever is necessary to update the value
properly for this option (which may not mean simply setting the option
as a Lisp variable). The default for @var{setfunction} is
@code{set-default}.
@item :get @var{getfunction}
Specify @var{getfunction} as the way to extract the value of this
option. The function @var{getfunction} should take one argument, a
symbol, and should return the ``current value'' for that symbol (which
need not be the symbol's Lisp value). The default is
@code{default-value}.
@item :initialize @var{function}
@var{function} should be a function used to initialize the variable when
the @code{defcustom} is evaluated. It should take two arguments, the
symbol and value. Here are some predefined functions meant for use in
this way:
@table @code
@item custom-initialize-set
Use the variable's @code{:set} function to initialize the variable. Do
not reinitialize it if it is already non-void. This is the default
@code{:initialize} function.
@item custom-initialize-default
Always use @code{set-default} to initialize the variable, even if some
other @code{:set} function has been specified.
@item custom-initialize-reset
Even if the variable is already non-void, reset it by calling the
@code{:set} function using the current value (returned by the
@code{:get} method).
@item custom-initialize-changed
Like @code{custom-initialize-reset}, except use @code{set-default}
(rather than the @code{:set} function) to initialize the variable if it
is not bound and has not been set already.
@end table
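Here is a sketch of how @code{:set} and @code{:initialize} can work
together. Every name beginning with @code{my-package-} is hypothetical.
The setter does extra bookkeeping each time the value changes, and
@code{custom-initialize-default} keeps that bookkeeping from running
when the @code{defcustom} itself is evaluated.
@example
(defvar my-package-refresh-count 0
  "Number of times the interval has been changed (hypothetical).")

(defun my-package-set-interval (symbol value)
  "Set SYMBOL to VALUE and record that a refresh is needed."
  (set-default symbol value)
  (setq my-package-refresh-count (1+ my-package-refresh-count)))

(defcustom my-package-interval 60
  "*Number of seconds between refreshes."
  :type 'integer
  :set 'my-package-set-interval
  ;; Install the default with set-default, bypassing the setter.
  :initialize 'custom-initialize-default
  :group 'my-package)
@end example
@noindent
With a different @code{:initialize} function, such as
@code{custom-initialize-reset}, the setter would also run at definition
time.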
@item :require @var{feature}
If the user saves a customized value for this item, then Emacs should do
@code{(require @var{feature})} after installing the saved value.
The place to use this feature is for an option that turns on the
operation of a certain feature. Assuming that the package is coded to
check the value of the option, you still need to arrange for the package
to be loaded. That is what @code{:require} is for.
@end table
@end defmac
@ignore
Use @code{custom-add-option} to specify that a specific function is
useful as a member of a hook.
@defun custom-add-option symbol option
To the variable @var{symbol} add @var{option}.
If @var{symbol} is a hook variable, @var{option} should be a hook
member. For other types of variables, the effect is undefined.
@end defun
@end ignore
Internally, @code{defcustom} uses the symbol property
@code{standard-value} to record the expression for the default value,
and @code{saved-value} to record the value saved by the user with the
customization buffer. The @code{saved-value} property is actually a
list whose car is an expression which evaluates to the value.
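
The properties can be inspected with @code{get}, as in the sketch
below. The option @code{my-package-limit} is hypothetical. The sketch
assumes that @code{standard-value}, like @code{saved-value}, is stored
as a list whose car is the expression; that detail is an assumption
about the implementation and is not stated above.
@example
(defcustom my-package-limit (* 2 17)
  "*Hypothetical limit used by My Package."
  :type 'integer
  :group 'my-package)

;; Evaluating the recorded expression gives the default value.
(eval (car (get 'my-package-limit 'standard-value)))
     @result{} 34

;; Nothing is recorded here until the user saves a value.
(get 'my-package-limit 'saved-value)
     @result{} nil
@end example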
@node Face Definitions
@section Defining Faces
Faces are declared with @code{defface}.
@tindex defface
@defmac defface face spec doc [keyword value]...
Declare @var{face} as a customizable face that defaults according to
@var{spec}. Do not quote the symbol @var{face}.
@var{doc} is the face documentation.
@var{spec} should be an alist whose elements have the form
@code{(@var{display} @var{atts})} (see below). When @code{defface}
executes, it defines the face according to @var{spec}, then uses any
customizations saved in the @file{.emacs} file to override that
specification.
In each element of @var{spec}, @var{atts} is a list of face attributes
and their values. The possible attributes are defined in the variable
@code{custom-face-attributes}.
The @var{display} part of an element of @var{spec} determines which
frames the element applies to. If more than one element of @var{spec}
matches a given frame, the first matching element is the only one used
for that frame.
If @var{display} is @code{t} in a @var{spec} element, that element
matches all frames. (This means that any subsequent elements of
@var{spec} are never used.)
Alternatively, @var{display} can be an alist whose elements have the
form @code{(@var{characteristic} @var{value}@dots{})}. Here
@var{characteristic} specifies a way of classifying frames, and the
@var{value}s are possible classifications which @var{display} should
apply to. Here are the possible values of @var{characteristic}:
@table @code
@item type
The kind of window system the frame uses---either @code{x}, @code{pc}
(for the MS-DOS console), @code{w32} (for MS Windows 9X/NT), or
@code{tty}.
@item class
What kinds of colors the frame supports---either @code{color},
@code{grayscale}, or @code{mono}.
@item background
The kind of background---either @code{light} or @code{dark}.
@end table
If an element of @var{display} specifies more than one
@var{value} for a given @var{characteristic}, any of those values
is acceptable. If an element of @var{display} has elements for
more than one @var{characteristic}, then @emph{each} characteristic
of the frame must match one of the values specified for it.
@end defmac
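A face declaration following these rules might look like the sketch
below. The face name and the particular colors are hypothetical; the
attribute keywords are assumed to be among those listed in
@code{custom-face-attributes}.
@example
(defface my-package-highlight
  ;; The first matching element wins for each frame.
  '((((class color) (background light))
     (:foreground "blue" :bold t))
    (((class color) (background dark))
     (:foreground "cyan" :bold t))
    ;; Fallback for every other frame; it must come last.
    (t (:underline t)))
  "Face used by My Package to highlight matches."
  :group 'my-package)
@end example
@noindent
The first two elements each require two characteristics to match, so a
monochrome frame falls through to the final @code{t} element.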
Internally, @code{defface} uses the symbol property
@code{face-defface-spec} to record the face attributes specified in
@code{defface}, @code{saved-face} for the attributes saved by the user
with the customization buffer, and @code{face-documentation} for the
documentation string.
@node Customization Types
@section Customization Types
When you define a user option with @code{defcustom}, you must specify
its @dfn{customization type}. That is a Lisp object which indicates (1)
which values are legitimate and (2) how to display the value in the
customization buffer for editing.
You specify the customization type in @code{defcustom} with the
@code{:type} keyword. The argument of @code{:type} is evaluated; since
types that vary at run time are rarely useful, normally it is a quoted
constant. For example:
@example
(defcustom diff-command "diff"
  "*The command to use to run diff."
  :type 'string
  :group 'diff)
@end example
In general, a customization type is a list whose first element
is a symbol, one of the customization type names defined in the
following sections. After this symbol come a number of arguments,
depending on the symbol. Some of the type symbols do not use any
arguments; those are called @dfn{simple types}.
In between the type symbol and its arguments, you can optionally
write keyword-value pairs. @xref{Type Keywords}.
For a simple type, if you do not use any keyword-value pairs, you can
omit the parentheses around the type symbol. The above example does
this, using just @code{string} as the customization type.
But @code{(string)} would mean the same thing.
@menu
* Simple Types::
* Composite Types::
* Splicing into Lists::
* Type Keywords::
@end menu
@node Simple Types
@subsection Simple Types
This section describes all the simple customization types.
@table @code
@item sexp
The value may be any Lisp object that can be printed and read back. You
can use @code{sexp} as a fall-back for any option, if you don't want to
take the time to work out a more specific type to use.
@item integer
The value must be an integer, and is represented textually
in the customization buffer.
@item number
The value must be a number, and is represented textually in the
customization buffer.
@item string
The value must be a string, and the customization buffer shows just the
contents, with no @samp{"} characters or quoting with @samp{\}.
@item regexp
The value must be a string which is a valid regular expression.
@item character
The value must be a character code. A character code is actually an
integer, but this type shows the value by inserting the character in the
buffer, rather than by showing the number.
@item file
The value must be a file name, and you can do completion with
@kbd{M-@key{TAB}}.
@item (file :must-match t)
The value must be a file name for an existing file, and you can do
completion with @kbd{M-@key{TAB}}.
@item directory
The value must be a directory name, and you can do completion with
@kbd{M-@key{TAB}}.
@item symbol
The value must be a symbol. It appears in the customization buffer as
the name of the symbol.
@item function
The value must be either a lambda expression or a function name. When
it is a function name, you can do completion with @kbd{M-@key{TAB}}.
@item variable
The value must be a variable name, and you can do completion with
@kbd{M-@key{TAB}}.
@item boolean
The value is boolean---either @code{nil} or @code{t}.
@end table
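The sketch below uses a few of these types in complete declarations.
All of the option names and default values are hypothetical.
@example
(defcustom my-package-confirm t
  "*Non-nil means ask before overwriting a file."
  :type 'boolean
  :group 'my-package)

(defcustom my-package-skip-regexp "^#"
  "*Regexp matching lines that My Package should skip."
  :type 'regexp
  :group 'my-package)

;; A simple type written with parentheses so that it can take a keyword.
(defcustom my-package-template-file "~/.emacs"
  "*Existing file that My Package reads as a template."
  :type '(file :must-match t)
  :group 'my-package)
@end example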
@node Composite Types
@subsection Composite Types
When none of the simple types is appropriate, you can use composite
types, which build from simple types. Here are several ways of doing
that:
@table @code
@item (restricted-sexp :match-alternatives @var{criteria})
The value may be any Lisp object that satisfies one of @var{criteria}.
@var{criteria} should be a list, and each element should be
one of these possibilities:
@itemize @bullet
@item
A predicate---that is, a function of one argument that returns non-@code{nil}
if the argument fits a certain type. This means that objects of that type
are acceptable.
@item
A quoted constant---that is, @code{'@var{object}}. This means that
@var{object} is an acceptable value.
@end itemize
For example,
@example
(restricted-sexp :match-alternatives (integerp 't 'nil))
@end example
@noindent
allows integers, @code{t} and @code{nil} as legitimate values.
The customization buffer shows all legitimate values using their read
syntax, and the user edits them textually.
@item (cons @var{car-type} @var{cdr-type})
The value must be a cons cell, its @sc{car} must fit @var{car-type}, and
its @sc{cdr} must fit @var{cdr-type}. For example, @code{(cons string
symbol)} is a customization type which matches values such as
@code{("foo" . foo)}.
In the customization buffer, the @sc{car} and the @sc{cdr} are
displayed and edited separately, each according to the type
that you specify for it.
@item (list @var{element-types}@dots{})
The value must be a list with exactly as many elements as the
@var{element-types} you have specified; and each element must fit the
corresponding @var{element-type}.
For example, @code{(list integer string function)} describes a list of
three elements; the first element must be an integer, the second a
string, and the third a function.
In the customization buffer, each element is displayed and edited
separately, according to the type specified for it.
@item (vector @var{element-types}@dots{})
Like @code{list} except that the value must be a vector instead of a
list. The elements work the same as in @code{list}.
@item (choice @var{alternative-types}@dots{})
The value must fit at least one of @var{alternative-types}.
For example, @code{(choice integer string)} allows either an
integer or a string.
In the customization buffer, the user selects one of the alternatives
using a menu, and can then edit the value in the usual way for that
alternative.
Normally the strings in this menu are determined automatically from the
choices; however, you can specify different strings for the menu by
including the @code{:tag} keyword in the alternatives. For example, if
an integer stands for a number of spaces, while a string is text to use
verbatim, you might write the customization type this way,
@smallexample
(choice (integer :tag "Number of spaces")
        (string :tag "Literal text"))
@end smallexample
@noindent
so that the menu offers @samp{Number of spaces} and @samp{Literal text}.
@item (const @var{value})
The value must be @var{value}---nothing else is allowed.
The main use of @code{const} is inside of @code{choice}. For example,
@code{(choice integer (const nil))} allows either an integer or
@code{nil}. @code{:tag} is often used with @code{const}.
@item (function-item @var{function})
Like @code{const}, but used for values which are functions. This
displays the documentation string of the function @var{function}
as well as its name.
@item (variable-item @var{variable})
Like @code{const}, but used for values which are variable names. This
displays the documentation string of the variable @var{variable} as well
as its name.
@item (set @var{elements}@dots{})
The value must be a list and each element of the list must be one of the
@var{elements} specified. This appears in the customization buffer as a
checklist.
@item (repeat @var{element-type})
The value must be a list and each element of the list must fit the type
@var{element-type}. This appears in the customization buffer as a
list of elements, with @samp{[INS]} and @samp{[DEL]} buttons for adding
more elements or removing elements.
@end table
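Composite types can be nested. As a sketch, an association list that
maps major modes to indentation widths could be declared as follows; the
option name and its default value are hypothetical.
@example
(defcustom my-package-indent-alist
  '((c-mode . 4) (text-mode . nil))
  "*Alist mapping major modes to indentation widths.
Each element looks like (MODE . WIDTH); a WIDTH of nil means
use the mode's own default."
  ;; repeat: any number of elements; cons: one pair per element;
  ;; choice: the cdr is either nil or an integer.
  :type '(repeat (cons (symbol :tag "Major mode")
                       (choice (const :tag "Mode default" nil)
                               (integer :tag "Columns"))))
  :group 'my-package)
@end example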
@node Splicing into Lists
@subsection Splicing into Lists
The @code{:inline} feature lets you splice a variable number of
elements into the middle of a list or vector. You use it in a
@code{set}, @code{choice} or @code{repeat} type which appears among the
element-types of a @code{list} or @code{vector}.
Normally, each of the element-types in a @code{list} or @code{vector}
describes one and only one element of the list or vector. Thus, if an
element-type is a @code{repeat}, that specifies a list of unspecified
length which appears as one element.
But when the element-type uses @code{:inline}, the value it matches is
merged directly into the containing sequence. For example, if it
matches a list with three elements, those become three elements of the
overall sequence. This is analogous to using @samp{,@@} in the backquote
construct.
For example, to specify a list whose first element must be @code{t}
and whose remaining elements should be zero or more of @code{foo} and
@code{bar}, use this customization type:
@example
(list (const t) (set :inline t foo bar))
@end example
@noindent
This matches values such as @code{(t)}, @code{(t foo)}, @code{(t bar)}
and @code{(t foo bar)}.
When the element-type is a @code{choice}, you use @code{:inline} not
in the @code{choice} itself, but in (some of) the alternatives of the
@code{choice}. For example, to match a list which must start with a
file name, followed either by the symbol @code{t} or two strings, use
this customization type:
@example
(list file
      (choice (const t)
              (list :inline t string string)))
@end example
@noindent
If the user chooses the first alternative in the choice, then the
overall list has two elements and the second element is @code{t}. If
the user chooses the second alternative, then the overall list has three
elements and the second and third must be strings.
@node Type Keywords
@subsection Type Keywords
You can specify keyword-argument pairs in a customization type after the
type name symbol. Here are the keywords you can use, and their
meanings:
@table @code
@item :value @var{default}
This is used for a type that appears as an alternative inside of
@code{choice}; it specifies the default value to use, at first, if and
when the user selects this alternative with the menu in the
customization buffer.
Of course, if the actual value of the option fits this alternative, it
will appear showing the actual value, not @var{default}.
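
For example, in the following sketch the option starts out as
@code{nil}; if the user switches to the integer alternative in the menu,
the field is meant to start at 100. The option name and the numbers are
hypothetical.
@example
(defcustom my-package-max-items nil
  "*Maximum number of items to show, or nil for no limit."
  :type '(choice (const :tag "No limit" nil)
                 ;; 100 is only the initial suggestion for this
                 ;; alternative; the user can edit it.
                 (integer :tag "Limit" :value 100))
  :group 'my-package)
@end example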
@item :format @var{format-string}
This string will be inserted in the buffer to represent the value
corresponding to the type. The following @samp{%} escapes are available
for use in @var{format-string}:
@table @samp
@ignore
@item %[@var{button}%]
Display the text @var{button} marked as a button. The @code{:action}
attribute specifies what the button will do if the user invokes it;
its value is a function which takes two arguments---the widget which
the button appears in, and the event.
There is no way to specify two different buttons with different
actions; but perhaps there is no need for one.
@end ignore
@item %@{@var{sample}%@}
Show @var{sample} in a special face specified by @code{:sample-face}.
@item %v
Substitute the item's value. How the value is represented depends on
the kind of item, and (for variables) on the customization type.
@item %d
Substitute the item's documentation string.
@item %h
Like @samp{%d}, but if the documentation string is more than one line,
add an active field to control whether to show all of it or just the
first line.
@item %t
Substitute the tag here. You specify the tag with the @code{:tag}
keyword.
@item %%
Display a literal @samp{%}.
@end table
@item :button-face @var{face}
Use face @var{face} for text displayed with @samp{%[@dots{}%]}.
@item :button-prefix
@itemx :button-suffix
These specify the text to display before and after a button.
Each can be:
@table @asis
@item @code{nil}
No text is inserted.
@item a string
The string is inserted literally.
@item a symbol
The symbol's value is used.
@end table
@item :doc @var{doc}
Use @var{doc} as the documentation string for this item.
@item :tag @var{tag}
Use @var{tag} (a string) as the tag for this item.
@item :help-echo @var{motion-doc}
When you move to this item with @code{widget-forward} or
@code{widget-backward}, it will display the string @var{motion-doc}
in the echo area.
@item :match @var{function}
Specify how to decide whether a value matches the type. @var{function}
should be a function that accepts two arguments, a widget and a value;
it should return non-@code{nil} if the value is acceptable.
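
A minimal sketch of a @code{:match} function follows. The function and
option names are hypothetical. The sketch assumes that a @code{:match}
given in the type replaces the type's usual test, so the function checks
for an integer itself.
@example
(defun my-package-even-integer-p (widget value)
  "Return non-nil if VALUE is an even integer.
WIDGET is the widget being matched; this sketch ignores it."
  (and (integerp value) (= (% value 2) 0)))

(defcustom my-package-columns 8
  "*Number of columns to use; this should be an even number."
  :type '(integer :match my-package-even-integer-p)
  :group 'my-package)
@end example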
@ignore
@item :indent @var{columns}
Indent this item by @var{columns} columns. The indentation is used for
@samp{%n}, and automatically for group names, for checklists and radio
buttons, and for editable lists. It affects the whole of the
item except for the first line.
@item :offset @var{columns}
An integer indicating how many extra spaces to indent the subitems of
this item. By default, subitems are indented the same as their parent.
@item :extra-offset
An integer indicating how many extra spaces to add to this item's
indentation, compared to its parent.
@item :notify
A function called each time the item or a subitem is changed. The
function is called with two or three arguments. The first argument is
the item itself, the second argument is the item that was changed, and
the third argument is the event leading to the change, if any.
@item :menu-tag
Tag used in the menu when the widget is used as an option in a
@code{menu-choice} widget.
@item :menu-tag-get
Function used for finding the tag when the widget is used as an option
in a @code{menu-choice} widget. By default, the tag used will be either the
@code{:menu-tag} or @code{:tag} property if present, or the @code{princ}
representation of the @code{:value} property if not.
@item :validate
A function which takes a widget as an argument, and returns @code{nil}
if the widget's current value is valid for the widget. Otherwise, it
should return the widget containing the invalid data, and set that
widget's @code{:error} property to a string explaining the error.
You can use the function @code{widget-children-validate} for this job;
it tests that all children of @var{widget} are valid.
@item :tab-order
Specify the order in which widgets are traversed with
@code{widget-forward} or @code{widget-backward}. This is only partially
implemented.
@enumerate a
@item
Widgets with tabbing order @code{-1} are ignored.
@item
(Unimplemented) When on a widget with tabbing order @var{n}, go to the
next widget in the buffer with tabbing order @var{n+1} or @code{nil},
whichever comes first.
@item
When on a widget with no tabbing order specified, go to the next widget
in the buffer with a positive tabbing order, or @code{nil}.
@end enumerate
@item :parent
The parent of a nested widget (e.g. a @code{menu-choice} item or an
element of an @code{editable-list} widget).
@item :sibling-args
This keyword is only used for members of a @code{radio-button-choice} or
@code{checklist}. The value should be a list of extra keyword
arguments, which will be used when creating the @code{radio-button} or
@code{checkbox} associated with this item.
@end ignore
@end table

lispref/nonascii.texi Normal file

@@ -0,0 +1,691 @@
@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
@c Copyright (C) 1998 Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@setfilename ../info/characters
@node Non-ASCII Characters, Searching and Matching, Text, Top
@chapter Non-ASCII Characters
@cindex multibyte characters
@cindex non-ASCII characters
This chapter covers the special issues relating to non-@sc{ASCII}
characters and how they are stored in strings and buffers.
@menu
* Text Representations::
* Converting Representations::
* Selecting a Representation::
* Character Codes::
* Character Sets::
* Scanning Charsets::
* Chars and Bytes::
* Coding Systems::
* Default Coding Systems::
* Specifying Coding Systems::
* Explicit Encoding::
@end menu
@node Text Representations
@section Text Representations
@cindex text representations
Emacs has two @dfn{text representations}---two ways to represent text
in a string or buffer. These are called @dfn{unibyte} and
@dfn{multibyte}. Each string, and each buffer, uses one of these two
representations. For most purposes, you can ignore the issue of
representations, because Emacs converts text between them as
appropriate. Occasionally in Lisp programming you will need to pay
attention to the difference.
@cindex unibyte text
In unibyte representation, each character occupies one byte and
therefore the possible character codes range from 0 to 255. Codes 0
through 127 are @sc{ASCII} characters; the codes from 128 through 255
are used for one non-@sc{ASCII} character set (you can choose which one
by setting the variable @code{nonascii-insert-offset}).
@cindex leading code
@cindex multibyte text
In multibyte representation, a character may occupy more than one
byte, and as a result, the full range of Emacs character codes can be
stored. The first byte of a multibyte character is always in the range
128 through 159 (octal 0200 through 0237). These values are called
@dfn{leading codes}. The first byte determines which character set the
character belongs to (@pxref{Character Sets}); in particular, it
determines how many bytes long the sequence is. The second and
subsequent bytes of a multibyte character are always in the range 160
through 255 (octal 0240 through 0377).
In a buffer, the buffer-local value of the variable
@code{enable-multibyte-characters} specifies the representation used.
The representation for a string is determined based on the string
contents when the string is constructed.
@tindex enable-multibyte-characters
@defvar enable-multibyte-characters
This variable specifies the current buffer's text representation.
If it is non-@code{nil}, the buffer contains multibyte text; otherwise,
it contains unibyte text.
@strong{Warning:} do not set this variable directly; instead, use the
function @code{set-buffer-multibyte} to change a buffer's
representation.
@end defvar
@tindex default-enable-multibyte-characters
@defvar default-enable-multibyte-characters
This variable's value is entirely equivalent to @code{(default-value
'enable-multibyte-characters)}, and setting this variable changes that
default value. Although setting the local binding of
@code{enable-multibyte-characters} in a specific buffer is dangerous,
changing the default value is safe, and it is a reasonable thing to do.
The @samp{--unibyte} command line option does its job by setting the
default value to @code{nil} early in startup.
@end defvar
@tindex multibyte-string-p
@defun multibyte-string-p string
Return @code{t} if @var{string} contains multibyte characters.
@end defun
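The following sketch shows these facilities in use. It relies on
@code{with-temp-buffer}, which is not described in this chapter, to
obtain a scratch buffer.
@example
;; A string literal containing only ASCII characters is unibyte.
(multibyte-string-p "abc")
     @result{} nil

;; Change a scratch buffer to unibyte with the recommended function,
;; then read back the buffer-local flag.
(with-temp-buffer
  (set-buffer-multibyte nil)
  enable-multibyte-characters)
     @result{} nil
@end example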
@node Converting Representations
@section Converting Text Representations
Emacs can convert unibyte text to multibyte; it can also convert
multibyte text to unibyte, though this conversion loses information. In
general these conversions happen when inserting text into a buffer, or
when putting text from several strings together in one string. You can
also explicitly convert a string's contents to either representation.
Emacs chooses the representation for a string based on the text that
it is constructed from. The general rule is to convert unibyte text to
multibyte text when combining it with other multibyte text, because the
multibyte representation is more general and can hold whatever
characters the unibyte text has.
When inserting text into a buffer, Emacs converts the text to the
buffer's representation, as specified by
@code{enable-multibyte-characters} in that buffer. In particular, when
you insert multibyte text into a unibyte buffer, Emacs converts the text
to unibyte, even though this conversion cannot in general preserve all
the characters that might be in the multibyte text. The other natural
alternative, to convert the buffer contents to multibyte, is not
acceptable because the buffer's representation is a choice made by the
user that cannot simply be overridden.
Converting unibyte text to multibyte text leaves @sc{ASCII} characters
unchanged. It converts the non-@sc{ASCII} codes 128 through 255 by
adding the value @code{nonascii-insert-offset} to each character code.
By setting this variable, you specify which character set the unibyte
characters correspond to. For example, if @code{nonascii-insert-offset}
is 2048, which is @code{(- (make-char 'latin-iso8859-1 0) 128)}, then
the unibyte non-@sc{ASCII} characters correspond to Latin 1. If it is
2688, which is @code{(- (make-char 'greek-iso8859-7 0) 128)}, then they
correspond to Greek letters.
Converting multibyte text to unibyte is simpler: it performs
logical-and of each character code with 255. If
@code{nonascii-insert-offset} has a reasonable value, corresponding to
the beginning of some character set, this conversion is the inverse of
the other: converting unibyte text to multibyte and back to unibyte
reproduces the original unibyte text.
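
As a worked illustration of the arithmetic described above, assume that
@code{nonascii-insert-offset} is 2048 (the Latin 1 value) and take the
unibyte code 200 as an arbitrary sample. The sketch below only performs
the two computations by hand; it does not call the conversion functions
themselves.

@example
;; @r{Unibyte to multibyte: add the offset.}
(+ 200 2048)
     @result{} 2248
;; @r{Multibyte to unibyte: logical-and with 255.}
(logand 2248 255)
     @result{} 200
@end example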
@tindex nonascii-insert-offset
@defvar nonascii-insert-offset
This variable specifies the amount to add to a non-@sc{ASCII} character
when converting unibyte text to multibyte. It also applies when
@code{insert-char} or @code{self-insert-command} inserts a character in
the unibyte non-@sc{ASCII} range, 128 through 255.
The right value to use to select character set @var{cs} is @code{(-
(make-char @var{cs} 0) 128)}. If the value of
@code{nonascii-insert-offset} is zero, then conversion actually uses the
value for the Latin 1 character set, rather than zero.
@end defvar
@tindex nonascii-translate-table
@defvar nonascii-translate-table
This variable provides a more general alternative to
@code{nonascii-insert-offset}. You can use it to specify independently
how to translate each code in the range of 128 through 255 into a
multibyte character. The value should be a vector, or @code{nil}.
@end defvar
@tindex string-make-unibyte
@defun string-make-unibyte string
This function converts the text of @var{string} to unibyte
representation, if it isn't already, and returns the result. If
conversion does not change the contents, the value may be @var{string}
itself.
@end defun
@tindex string-make-multibyte
@defun string-make-multibyte string
This function converts the text of @var{string} to multibyte
representation, if it isn't already, and returns the result. If
conversion does not change the contents, the value may be @var{string}
itself.
@end defun
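
The two functions above can be combined to check the round-trip
property described earlier in this section. This is only a sketch: the
name @code{my-unibyte-round-trip-p} is hypothetical, and the result
depends on the conversion variables having reasonable values.

@example
(defun my-unibyte-round-trip-p (string)
  "Return non-nil if unibyte STRING survives conversion both ways."
  (string= string
           (string-make-unibyte
            (string-make-multibyte string))))
@end example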
@node Selecting a Representation
@section Selecting a Representation
Sometimes it is useful to examine an existing buffer or string as
multibyte when it was unibyte, or vice versa.
@tindex set-buffer-multibyte
@defun set-buffer-multibyte multibyte
Set the representation type of the current buffer. If @var{multibyte}
is non-@code{nil}, the buffer becomes multibyte. If @var{multibyte}
is @code{nil}, the buffer becomes unibyte.
This function leaves the buffer contents unchanged when viewed as a
sequence of bytes. As a consequence, it can change the contents viewed
as characters; a sequence of two bytes which is treated as one character
in multibyte representation will count as two characters in unibyte
representation.
This function sets @code{enable-multibyte-characters} to record which
representation is in use. It also adjusts various data in the buffer
(including its overlays, text properties and markers) so that they
cover or fall between the same text as they did before.
@end defun
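
One possible use of @code{set-buffer-multibyte} is to count the bytes
that some text occupies, by viewing the same bytes as unibyte. The
following is a minimal sketch; the name @code{my-text-byte-count} is
hypothetical, and it assumes that @var{string} is multibyte and that
the temporary buffer starts out using multibyte representation.

@example
(defun my-text-byte-count (string)
  "Return the number of bytes STRING occupies in a buffer."
  (with-temp-buffer
    (insert string)
    ;; @r{Same bytes, now one character per byte.}
    (set-buffer-multibyte nil)
    (- (point-max) (point-min))))
@end example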
@tindex string-as-unibyte
@defun string-as-unibyte string
This function returns a string with the same bytes as @var{string} but
treating each byte as a character. This means that the value may have
more characters than @var{string} has.
If @var{string} is unibyte already, then the value may be @var{string}
itself.
@end defun
@tindex string-as-multibyte
@defun string-as-multibyte string
This function returns a string with the same bytes as @var{string} but
treating each multibyte sequence as one character. This means that the
value may have fewer characters than @var{string} has.
If @var{string} is multibyte already, then the value may be @var{string}
itself.
@end defun
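
A brief sketch of how @code{string-as-unibyte} relates to
@code{length}; the function name @code{my-char-and-byte-counts} is
hypothetical. For a multibyte string, the second number is never
smaller than the first.

@example
(defun my-char-and-byte-counts (string)
  "Return a list (CHARACTERS BYTES) for STRING."
  (list (length string)
        (length (string-as-unibyte string))))
@end example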
@node Character Codes
@section Character Codes
@cindex character codes
The unibyte and multibyte text representations use different character
codes. The valid character codes for unibyte representation range from
0 to 255---the values that can fit in one byte. The valid character
codes for multibyte representation range from 0 to 524287, but not all
values in that range are valid. In particular, the values 128 through
255 are not valid in multibyte text. Only the @sc{ASCII} codes 0
through 127 are used in both representations.
@defun char-valid-p charcode
This returns @code{t} if @var{charcode} is valid for either one of the two
text representations.
@example
(char-valid-p 65)
@result{} t
(char-valid-p 256)
@result{} nil
(char-valid-p 2248)
@result{} t
@end example
@end defun
@node Character Sets
@section Character Sets
@cindex character sets
Emacs classifies characters into various @dfn{character sets}, each of
which has a name which is a symbol. Each character belongs to one and
only one character set.
In general, there is one character set for each distinct script. For
example, @code{latin-iso8859-1} is one character set,
@code{greek-iso8859-7} is another, and @code{ascii} is another. An
Emacs character set can hold at most 9025 characters; therefore, in some
cases, a set of characters that would logically be grouped together is
split into several character sets. For example, one set of Chinese
characters is divided into seven Emacs character sets,
@code{chinese-cns11643-1} through @code{chinese-cns11643-7}.
@tindex charsetp
@defun charsetp object
Return @code{t} if @var{object} is a character set name symbol,
@code{nil} otherwise.
@end defun
@tindex charset-list
@defun charset-list
This function returns a list of all defined character set names.
@end defun
@tindex char-charset
@defun char-charset character
This function returns the name of the character
set that @var{character} belongs to.
@end defun
@node Scanning Charsets
@section Scanning for Character Sets
Sometimes it is useful to find out which character sets appear in a
part of a buffer or a string. One use for this is in determining which
coding systems (@pxref{Coding Systems}) are capable of representing all
of the text in question.
@tindex find-charset-region
@defun find-charset-region beg end &optional unification
This function returns a list of the character sets
that appear in the current buffer between positions @var{beg}
and @var{end}.
@end defun
@tindex find-charset-string
@defun find-charset-string string &optional unification
This function returns a list of the character sets
that appear in the string @var{string}.
@end defun
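
As a small sketch of how the scanning functions might be used, the
following tests whether a string involves any character set other than
@code{ascii}. The name @code{my-ascii-charset-only-p} is hypothetical.
Note that unibyte non-@sc{ASCII} codes may also be classified under
@code{ascii}, so this is not a strict seven-bit test.

@example
(defun my-ascii-charset-only-p (string)
  "Return non-nil if STRING uses no charset other than `ascii'."
  (null (delq 'ascii (find-charset-string string))))
@end example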
@node Chars and Bytes
@section Characters and Bytes
@cindex bytes and characters
In multibyte representation, each character occupies one or more
bytes. The functions in this section convert between characters and the
byte values used to represent them.
@tindex char-bytes
@defun char-bytes character
This function returns the number of bytes used to represent the
character @var{character}. In most cases, this is the same as
@code{(length (split-char @var{character}))}; the only exception is for
@sc{ASCII} characters, which use just one byte.
@example
(char-bytes 2248)
@result{} 2
(char-bytes 65)
@result{} 1
@end example
This function's values are correct for both multibyte and unibyte
representations, because the non-@sc{ASCII} character codes used in
those two representations do not overlap.
@example
(char-bytes 192)
@result{} 1
@end example
@end defun
@tindex split-char
@defun split-char character
Return a list containing the name of the character set of
@var{character}, followed by one or two byte-values which identify
@var{character} within that character set.
@example
(split-char 2248)
@result{} (latin-iso8859-1 72)
(split-char 65)
@result{} (ascii 65)
@end example
Unibyte non-@sc{ASCII} characters are considered as part of
the @code{ascii} character set:
@example
(split-char 192)
@result{} (ascii 192)
@end example
@end defun
@tindex make-char
@defun make-char charset &rest byte-values
This function returns the character in character set @var{charset}
identified by @var{byte-values}. This is roughly the opposite of
@code{split-char}.
@example
(make-char 'latin-iso8859-1 72)
@result{} 2248
@end example
@end defun
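
Since the list returned by @code{split-char} has the same shape as the
arguments of @code{make-char}, the two can be composed with
@code{apply}. The sketch below reuses the sample values from the
examples above:

@example
(split-char 2248)
     @result{} (latin-iso8859-1 72)
(apply 'make-char (split-char 2248))
     @result{} 2248
@end example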
@node Coding Systems
@section Coding Systems
@cindex coding system
When Emacs reads or writes a file, and when Emacs sends text to a
subprocess or receives text from a subprocess, it normally performs
character code conversion and end-of-line conversion as specified
by a particular @dfn{coding system}.
@cindex character code conversion
@dfn{Character code conversion} involves conversion between the encoding
used inside Emacs and some other encoding. Emacs supports many
different encodings, in that it can convert to and from them. For
example, it can convert text to or from encodings such as Latin 1, Latin
2, Latin 3, Latin 4, Latin 5, and several variants of ISO 2022. In some
cases, Emacs supports several alternative encodings for the same
characters; for example, there are three coding systems for the Cyrillic
(Russian) alphabet: ISO, Alternativnyj, and KOI8.
@cindex end of line conversion
@dfn{End of line conversion} handles three different conventions used
on various systems for end of line. The Unix convention is to use the
linefeed character (also called newline). The DOS convention is to use
the two-character sequence, carriage-return linefeed, at the end of a
line. The Mac convention is to use just carriage-return.
Most coding systems specify a particular character code for
conversion, but some of them leave this unspecified---to be chosen
heuristically based on the data.
@cindex base coding system
@cindex variant coding system
@dfn{Base coding systems} such as @code{latin-1} leave the end-of-line
conversion unspecified, to be chosen based on the data. @dfn{Variant
coding systems} such as @code{latin-1-unix}, @code{latin-1-dos} and
@code{latin-1-mac} specify the end-of-line conversion explicitly as
well. Each base coding system has three corresponding variants whose
names are formed by adding @samp{-unix}, @samp{-dos} and @samp{-mac}.
Here are Lisp facilities for working with coding systems:
@tindex coding-system-list
@defun coding-system-list &optional base-only
This function returns a list of all coding system names (symbols). If
@var{base-only} is non-@code{nil}, the value includes only the
base coding systems. Otherwise, it includes variant coding systems as well.
@end defun
@tindex coding-system-p
@defun coding-system-p object
This function returns @code{t} if @var{object} is a coding system
name.
@end defun
@tindex check-coding-system
@defun check-coding-system coding-system
This function checks the validity of @var{coding-system}.
If that is valid, it returns @var{coding-system}.
Otherwise it signals an error with condition @code{coding-system-error}.
@end defun
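
Because @code{check-coding-system} signals an error for an invalid
argument, a caller that prefers a @code{nil} result can trap the
condition. This is a minimal sketch; the name
@code{my-valid-coding-system} is hypothetical.

@example
(defun my-valid-coding-system (name)
  "Return NAME if it is a valid coding system, otherwise nil."
  (condition-case nil
      (check-coding-system name)
    (coding-system-error nil)))
@end example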
@tindex detect-coding-region
@defun detect-coding-region start end highest
This function chooses a plausible coding system for decoding the text
from @var{start} to @var{end}. This text should be ``raw bytes''
(@pxref{Specifying Coding Systems}).
Normally this function returns a list of coding systems that could
handle decoding the text that was scanned. They are listed in order of
decreasing priority, based on the priority specified by the user with
@code{prefer-coding-system}. But if @var{highest} is non-@code{nil},
then the return value is just one coding system, the one that is highest
in priority.
@end defun
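
A short sketch of asking for just the highest-priority guess for the
whole accessible portion of the current buffer. The name
@code{my-guess-coding-system} is hypothetical, and the buffer is
assumed to hold raw bytes.

@example
(defun my-guess-coding-system ()
  "Return the most plausible coding system for the buffer text."
  (detect-coding-region (point-min) (point-max) t))
@end example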
@tindex detect-coding-string
@defun detect-coding-string string highest
This function is like @code{detect-coding-region} except that it
operates on the contents of @var{string} instead of bytes in the buffer.
@end defun
@defun find-operation-coding-system operation &rest arguments
This function returns the coding system to use (by default) for
performing @var{operation} with @var{arguments}. The value has this
form:
@example
(@var{decoding-system} @var{encoding-system})
@end example
The first element, @var{decoding-system}, is the coding system to use
for decoding (in case @var{operation} does decoding), and
@var{encoding-system} is the coding system for encoding (in case
@var{operation} does encoding).
The argument @var{operation} should be an Emacs I/O primitive:
@code{insert-file-contents}, @code{write-region}, @code{call-process},
@code{call-process-region}, @code{start-process}, or
@code{open-network-stream}.
The remaining arguments should be the same arguments that might be given
to that I/O primitive. Depending on which primitive, one of those
arguments is selected as the @dfn{target}. For example, if
@var{operation} does file I/O, whichever argument specifies the file
name is the target. For subprocess primitives, the process name is the
target. For @code{open-network-stream}, the target is the service name
or port number.
This function looks up the target in @code{file-coding-system-alist},
@code{process-coding-system-alist}, or
@code{network-coding-system-alist}, depending on @var{operation}.
@xref{Default Coding Systems}.
@end defun
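
For illustration, this sketch asks which coding system would be used by
default to decode a file when it is read. The file name is a
hypothetical placeholder, and the name
@code{my-default-decoding-for-file} is likewise an assumption.

@example
(defun my-default-decoding-for-file (filename)
  "Return the default coding system for decoding FILENAME."
  ;; @r{The first element of the value is the decoding system.}
  (car (find-operation-coding-system
        'insert-file-contents filename)))

(my-default-decoding-for-file "/tmp/example.txt")
@end example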
@node Default Coding Systems
@section Default Coding Systems
These variables specify which coding system to use by default for
certain files or when running certain subprograms. The idea of these
variables is that you set them once and for all to the defaults you
want, and then do not change them again. To specify a particular coding
system for a particular operation, don't change these variables;
instead, override them using @code{coding-system-for-read} and
@code{coding-system-for-write} (@pxref{Specifying Coding Systems}).
@tindex file-coding-system-alist
@defvar file-coding-system-alist
This variable is an alist that specifies the coding systems to use for
reading and writing particular files. Each element has the form
@code{(@var{pattern} . @var{coding})}, where @var{pattern} is a regular
expression that matches certain file names. The element applies to file
names that match @var{pattern}.
The @sc{cdr} of the element, @var{coding}, should be either a coding
system, a cons cell containing two coding systems, or a function symbol.
If @var{coding} is a coding system, that coding system is used for both
reading the file and writing it. If @var{coding} is a cons cell containing
two coding systems, its @sc{car} specifies the coding system for
decoding, and its @sc{cdr} specifies the coding system for encoding.
If @var{coding} is a function symbol, the function must return a coding
system or a cons cell containing two coding systems. This value is used
as described above.
@end defvar
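
Here is a sketch of what adding elements to this alist might look like.
The file name patterns are hypothetical, chosen only for illustration;
the coding system names are the @code{latin-1} examples used elsewhere
in this chapter.

@example
;; @r{One coding system for both reading and writing.}
(setq file-coding-system-alist
      (cons '("\\.l1\\'" . latin-1)
            file-coding-system-alist))
;; @r{Separate coding systems for decoding and encoding.}
(setq file-coding-system-alist
      (cons '("\\.l1dos\\'" . (latin-1 . latin-1-dos))
            file-coding-system-alist))
@end example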
@tindex process-coding-system-alist
@defvar process-coding-system-alist
This variable is an alist specifying which coding systems to use for a
subprocess, depending on which program is running in the subprocess. It
works like @code{file-coding-system-alist}, except that @var{pattern} is
matched against the program name used to start the subprocess. The coding
system or systems specified in this alist are used to initialize the
coding systems used for I/O to the subprocess, but you can specify
other coding systems later using @code{set-process-coding-system}.
@end defvar
@tindex network-coding-system-alist
@defvar network-coding-system-alist
This variable is an alist that specifies the coding system to use for
network streams. It works much like @code{file-coding-system-alist},
with the difference that the @var{pattern} in an element may be either a
port number or a regular expression. If it is a regular expression, it
is matched against the network service name used to open the network
stream.
@end defvar
@tindex default-process-coding-system
@defvar default-process-coding-system
This variable specifies the coding systems to use for subprocess (and
network stream) input and output, when nothing else specifies what to
do.
The value should be a cons cell of the form @code{(@var{input-coding}
. @var{output-coding})}. Here @var{input-coding} applies to input from
the subprocess, and @var{output-coding} applies to output to it.
@end defvar
@node Specifying Coding Systems
@section Specifying a Coding System for One Operation
You can specify the coding system for a specific operation by binding
the variables @code{coding-system-for-read} and/or
@code{coding-system-for-write}.
@tindex coding-system-for-read
@defvar coding-system-for-read
If this variable is non-@code{nil}, it specifies the coding system to
use for reading a file, or for input from a synchronous subprocess.
It also applies to any asynchronous subprocess or network stream, but in
a different way: the value of @code{coding-system-for-read} when you
start the subprocess or open the network stream specifies the input
decoding method for that subprocess or network stream. It remains in
use for that subprocess or network stream unless and until overridden.
The right way to use this variable is to bind it with @code{let} for a
specific I/O operation. Its global value is normally @code{nil}, and
you should not globally set it to any other value. Here is an example
of the right way to use the variable:
@example
;; @r{Read the file with no character code conversion.}
;; @r{Assume CRLF represents end-of-line.}
(let ((coding-system-for-read 'emacs-mule-dos))
  (insert-file-contents filename))
@end example
When its value is non-@code{nil}, @code{coding-system-for-read} takes
precedence over all other methods of specifying a coding system to use for
input, including @code{file-coding-system-alist},
@code{process-coding-system-alist} and
@code{network-coding-system-alist}.
@end defvar
@tindex coding-system-for-write
@defvar coding-system-for-write
This works much like @code{coding-system-for-read}, except that it
applies to output rather than input. It affects writing to files,
subprocesses, and net connections.
When a single operation does both input and output, as do
@code{call-process-region} and @code{start-process}, both
@code{coding-system-for-read} and @code{coding-system-for-write}
affect it.
@end defvar
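
By analogy with the example given for @code{coding-system-for-read},
the following sketch binds @code{coding-system-for-write} around a
single output operation. The name @code{my-write-buffer-as-latin-1} is
hypothetical.

@example
(defun my-write-buffer-as-latin-1 (filename)
  "Write the accessible portion of the buffer to FILENAME."
  ;; @r{Encode as Latin 1, with linefeed as end-of-line.}
  (let ((coding-system-for-write 'latin-1-unix))
    (write-region (point-min) (point-max) filename)))
@end example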
@tindex last-coding-system-used
@defvar last-coding-system-used
All operations that use a coding system set this variable
to the coding system name that was used.
@end defvar
@tindex inhibit-eol-conversion
@defvar inhibit-eol-conversion
When this variable is non-@code{nil}, no end-of-line conversion is done,
no matter which coding system is specified. This applies to all the
Emacs I/O and subprocess primitives, and to the explicit encoding and
decoding functions (@pxref{Explicit Encoding}).
@end defvar
@tindex keyboard-coding-system
@defun keyboard-coding-system
This function returns the coding system that is in use for decoding
keyboard input---or @code{nil} if no coding system is to be used.
@end defun
@tindex set-keyboard-coding-system
@defun set-keyboard-coding-system coding-system
This function specifies @var{coding-system} as the coding system to
use for decoding keyboard input. If @var{coding-system} is @code{nil},
that means do not decode keyboard input.
@end defun
@tindex terminal-coding-system
@defun terminal-coding-system
This function returns the coding system that is in use for encoding
terminal output---or @code{nil} for no encoding.
@end defun
@tindex set-terminal-coding-system
@defun set-terminal-coding-system coding-system
This function specifies @var{coding-system} as the coding system to use
for encoding terminal output. If @var{coding-system} is @code{nil},
that means do not encode terminal output.
@end defun
See also the functions @code{process-coding-system} and
@code{set-process-coding-system}. @xref{Process Information}.
See also @code{read-coding-system} in @ref{High-Level Completion}.
@node Explicit Encoding
@section Explicit Encoding and Decoding
@cindex encoding text
@cindex decoding text
All the operations that transfer text in and out of Emacs have the
ability to use a coding system to encode or decode the text.
You can also explicitly encode and decode text using the functions
in this section.
@cindex raw bytes
The result of encoding, and the input to decoding, are not ordinary
text. They are ``raw bytes''---bytes that represent text in the same
way that an external file would. When a buffer contains raw bytes, it
is most natural to mark that buffer as using unibyte representation,
using @code{set-buffer-multibyte} (@pxref{Selecting a Representation}),
but this is not required.
The usual way to get raw bytes in a buffer, for explicit decoding, is
to read them from a file with @code{insert-file-contents-literally}
(@pxref{Reading from Files}) or to specify a non-@code{nil} @var{rawfile}
argument when visiting a file with @code{find-file-noselect}.
The usual way to use the raw bytes that result from explicitly
encoding text is to copy them to a file or process---for example, to
write them with @code{write-region} (@pxref{Writing to Files}), and
suppress encoding for that @code{write-region} call by binding
@code{coding-system-for-write} to @code{no-conversion}.
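
The procedure just described might be sketched as follows, using
@code{encode-coding-region}, which is documented later in this section.
The name @code{my-write-encoded} is hypothetical.

@example
(defun my-write-encoded (text filename coding-system)
  "Encode TEXT with CODING-SYSTEM and write the bytes to FILENAME."
  (with-temp-buffer
    (insert text)
    ;; @r{Replace the text with its encoded raw bytes.}
    (encode-coding-region (point-min) (point-max) coding-system)
    ;; @r{Write those bytes out without encoding them again.}
    (let ((coding-system-for-write 'no-conversion))
      (write-region (point-min) (point-max) filename))))
@end example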
@tindex encode-coding-region
@defun encode-coding-region start end coding-system
This function encodes the text from @var{start} to @var{end} according
to coding system @var{coding-system}. The encoded text replaces
the original text in the buffer. The result of encoding is
``raw bytes.''
@end defun
@tindex encode-coding-string
@defun encode-coding-string string coding-system
This function encodes the text in @var{string} according to coding
system @var{coding-system}. It returns a new string containing the
encoded text. The result of encoding is ``raw bytes.''
@end defun
@tindex decode-coding-region
@defun decode-coding-region start end coding-system
This function decodes the text from @var{start} to @var{end} according
to coding system @var{coding-system}. The decoded text replaces the
original text in the buffer. To make explicit decoding useful, the text
before decoding ought to be ``raw bytes.''
@end defun
@tindex decode-coding-string
@defun decode-coding-string string coding-system
This function decodes the text in @var{string} according to coding
system @var{coding-system}. It returns a new string containing the
decoded text. To make explicit decoding useful, the contents of
@var{string} ought to be ``raw bytes.''
@end defun
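
As a closing sketch, the string functions above can be composed to test
whether a coding system preserves a given string. The name
@code{my-coding-round-trip-p} is hypothetical. The result can be
expected to be non-@code{nil} only when @var{coding-system} is able to
represent every character in the string.

@example
(defun my-coding-round-trip-p (string coding-system)
  "Return non-nil if STRING survives encoding and decoding."
  (string= string
           (decode-coding-string
            (encode-coding-string string coding-system)
            coding-system)))
@end example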