Improve documentation of 'decode-coding-region'

* src/coding.c (Fdecode_coding_region): Doc fix.

* doc/lispref/nonascii.texi (Coding System Basics)
(Explicit Encoding): Explain the significance of using 'undecided'
in 'decode-coding-*' functions.
Eli Zaretskii 2021-11-12 10:53:52 +02:00
parent a6905e90cc
commit 0d0125daae
2 changed files with 27 additions and 8 deletions

doc/lispref/nonascii.texi

@@ -1048,9 +1048,9 @@ Alternativnyj, and KOI8.
 Every coding system specifies a particular set of character code
 conversions, but the coding system @code{undecided} is special: it
 leaves the choice unspecified, to be chosen heuristically for each
-file, based on the file's data. The coding system @code{prefer-utf-8}
-is like @code{undecided}, but it prefers to choose @code{utf-8} when
-possible.
+file or string, based on the file's or string's data, when they are
+decoded or encoded. The coding system @code{prefer-utf-8} is like
+@code{undecided}, but it prefers to choose @code{utf-8} when possible.
 
 In general, a coding system doesn't guarantee roundtrip identity:
 decoding a byte sequence using a coding system, then encoding the
@@ -1921,9 +1921,24 @@ length of the decoded text. If that buffer is a unibyte buffer
 the decoded text (@pxref{Text Representations}) is inserted into the
 buffer as individual bytes.
 
+@cindex @code{charset}, text property on buffer text
 This command puts a @code{charset} text property on the decoded text.
 The value of the property states the character set used to decode the
 original text.
+
+@cindex undecided coding-system, when decoding
+This command detects the encoding of the text if necessary. If
+@var{coding-system} is @code{undecided}, the command detects the
+encoding of the text based on the byte sequences it finds in the text,
+and also detects the type of end-of-line convention used by the text
+(@pxref{Lisp and Coding Systems, eol type}). If @var{coding-system}
+is @code{undecided-@var{eol-type}}, where @var{eol-type} is
+@code{unix}, @code{dos}, or @code{mac}, then the command detects only
+the encoding of the text. Any @var{coding-system} that doesn't
+specify @var{eol-type}, as in @code{utf-8}, causes the command to
+detect the end-of-line convention; specify the encoding completely, as
+in @code{utf-8-unix}, if the EOL convention used by the text is known
+in advance, to prevent any automatic detection.
 @end deffn
 
 @defun decode-coding-string string coding-system &optional nocopy buffer
@@ -1936,13 +1951,16 @@ trivial. To make explicit decoding useful, the contents of
 values, but a multibyte string is also acceptable (assuming it
 contains 8-bit bytes in their multibyte form).
 
+This function detects the encoding of the string if needed, like
+@code{decode-coding-region} does.
+
 If optional argument @var{buffer} specifies a buffer, the decoded text
 is inserted in that buffer after point (point does not move). In this
 case, the return value is the length of the decoded text. If that
 buffer is a unibyte buffer, the internal representation of the decoded
 text is inserted into it as individual bytes.
 
-@cindex @code{charset}, text property
+@cindex @code{charset}, text property on strings
 This function puts a @code{charset} text property on the decoded text.
 The value of the property states the character set used to decode the
 original text:

src/coding.c

@@ -9455,11 +9455,12 @@ code_convert_region (Lisp_Object start, Lisp_Object end,
 DEFUN ("decode-coding-region", Fdecode_coding_region, Sdecode_coding_region,
        3, 4, "r\nzCoding system: ",
        doc: /* Decode the current region from the specified coding system.
+Interactively, prompt for the coding system to decode the region.
 
-What's meant by \"decoding\" is transforming bytes into text
-(characters). If, for instance, you have a region that contains data
-that represents the two bytes #xc2 #xa9, after calling this function
-with the utf-8 coding system, the region will contain the single
+\"Decoding\" means transforming bytes into readable text (characters).
+If, for instance, you have a region that contains data that represents
+the two bytes #xc2 #xa9, after calling this function with the utf-8
+coding system, the region will contain the single
 character ?\\N{COPYRIGHT SIGN}.
 
 When called from a program, takes four arguments: