re PR libstdc++/59698 (The type of NULL is described elsewhere)
PR libstdc++/59698
* doc/xml/manual/status_cxx1998.xml (iso.1998.specific): Markup and stylistic improvements.
* doc/xml/manual/codecvt.xml (std.localization.facet.codecvt): Likewise and update for C++11.
* doc/xml/manual/ctype.xml (std.localization.facet.ctype): Likewise.
From-SVN: r206524
parent 5e6667b25b
commit 92bf138207
4 changed files with 102 additions and 77 deletions
@@ -1,3 +1,12 @@
+2014-01-10 Jonathan Wakely <jwakely@redhat.com>
+
+PR libstdc++/59698
+* doc/xml/manual/status_cxx1998.xml (iso.1998.specific): Markup
+and stylistic improvements.
+* doc/xml/manual/codecvt.xml (std.localization.facet.codecvt): Likewise
+and update for C++11.
+* doc/xml/manual/ctype.xml (std.localization.facet.ctype): Likewise.
+
 2014-01-09 Jonathan Wakely <jwakely@redhat.com>
 
 PR libstdc++/59738

@@ -15,11 +15,11 @@
 The standard class codecvt attempts to address conversions between
 different character encoding schemes. In particular, the standard
 attempts to detail conversions between the implementation-defined wide
-characters (hereafter referred to as wchar_t) and the standard type
-char that is so beloved in classic <quote>C</quote> (which can now be
-referred to as narrow characters.) This document attempts to describe
-how the GNU libstdc++ implementation deals with the conversion between
-wide and narrow characters, and also presents a framework for dealing
+characters (hereafter referred to as <type>wchar_t</type>) and the standard
+type <type>char</type> that is so beloved in classic <quote>C</quote>
+(which can now be referred to as narrow characters.) This document attempts
+to describe how the GNU libstdc++ implementation deals with the conversion
+between wide and narrow characters, and also presents a framework for dealing
 with the huge number of other encodings that iconv can convert,
 including Unicode and UTF8. Design issues and requirements are
 addressed, and examples of correct usage for both the required
@@ -47,8 +47,8 @@ The text around the codecvt definition gives some clues:
 <blockquote>
 <para>
 <emphasis>
--1- The class codecvt<internT,externT,stateT> is for use when
-converting from one codeset to another, such as from wide characters
+-1- The class <code>codecvt<internT,externT,stateT></code> is for use
+when converting from one codeset to another, such as from wide characters
 to multibyte characters, between wide character encodings such as
 Unicode and EUC.
 </emphasis>
@@ -64,7 +64,7 @@ class.
 <blockquote>
 <para>
 <emphasis>
--2- The stateT argument selects the pair of codesets being mapped between.
+-2- The <type>stateT</type> argument selects the pair of codesets being mapped between.
 </emphasis>
 </para>
 </blockquote>
@@ -76,17 +76,19 @@ Ah ha! Another clue...
 <blockquote>
 <para>
 <emphasis>
--3- The instantiations required in the Table ??
-(lib.locale.category), namely codecvt<wchar_t,char,mbstate_t> and
-codecvt<char,char,mbstate_t>, convert the implementation-defined
-native character set. codecvt<char,char,mbstate_t> implements a
-degenerate conversion; it does not convert at
-all. codecvt<wchar_t,char,mbstate_t> converts between the native
-character sets for tiny and wide characters. Instantiations on
-mbstate_t perform conversion between encodings known to the library
+-3- The instantiations required in the Table 51 (lib.locale.category), namely
+<classname>codecvt<wchar_t,char,mbstate_t></classname> and
+<classname>codecvt<char,char,mbstate_t></classname>, convert the
+implementation-defined native character set.
+<classname>codecvt<char,char,mbstate_t></classname> implements a
+degenerate conversion; it does not convert at all.
+<classname>codecvt<wchar_t,char,mbstate_t></classname> converts between
+the native character sets for tiny and wide characters. Instantiations on
+<type>mbstate_t</type> perform conversion between encodings known to the library
 implementor. Other encodings can be converted by specializing on a
-user-defined stateT type. The stateT object can contain any state that
-is useful to communicate to or from the specialized do_convert member.
+user-defined <type>stateT</type> type. The <type>stateT</type> object can
+contain any state that is useful to communicate to or from the specialized
+<function>do_convert</function> member.
 </emphasis>
 </para>
 </blockquote>
@@ -98,13 +100,14 @@ At this point, a couple points become clear:
 <para>
 One: The standard clearly implies that attempts to add non-required
 (yet useful and widely used) conversions need to do so through the
-third template parameter, stateT.</para>
+third template parameter, <type>stateT</type>.</para>
 
 <para>
-Two: The required conversions, by specifying mbstate_t as the third
-template parameter, imply an implementation strategy that is mostly
+Two: The required conversions, by specifying <type>mbstate_t</type> as the
+third template parameter, imply an implementation strategy that is mostly
 (or wholly) based on the underlying C library, and the functions
-mcsrtombs and wcsrtombs in particular.</para>
+<function>mcsrtombs</function> and <function>wcsrtombs</function> in
+particular.</para>
 </section>
 
 <section xml:id="facet.codecvt.design"><info><title>Design</title></info>
@@ -114,7 +117,7 @@ mcsrtombs and wcsrtombs in particular.</para>
 
 
 <para>
-The simple implementation detail of wchar_t's size seems to
+The simple implementation detail of <type>wchar_t</type>'s size seems to
 repeatedly confound people. Many systems use a two byte,
 unsigned integral type to represent wide characters, and use an
 internal encoding of Unicode or UCS2. (See AIX, Microsoft NT,
@@ -122,7 +125,7 @@ mcsrtombs and wcsrtombs in particular.</para>
 type to represent wide characters, and use an internal encoding
 of UCS4. (GNU/Linux systems using glibc, in particular.) The C
 programming language (and thus C++) does not specify a specific
-size for the type wchar_t.
+size for the type <type>wchar_t</type>.
 </para>
 
 <para>
@@ -136,9 +139,12 @@ mcsrtombs and wcsrtombs in particular.</para>
 Probably the most frequently asked question about code conversion
 is: "So dudes, what's the deal with Unicode strings?"
 The dude part is optional, but apparently the usefulness of
-Unicode strings is pretty widely appreciated. Sadly, this specific
-encoding (And other useful encodings like UTF8, UCS4, ISO 8859-10,
-etc etc etc) are not mentioned in the C++ standard.
+Unicode strings is pretty widely appreciated. The Unicode character
+set (and useful encodings like UTF-8, UCS-4, ISO 8859-10,
+etc etc etc) were not mentioned in the first C++ standard. (The 2011
+standard added support for string literals with different encodings
+and some library facilities for converting between encodings, but the
+notes below have not been updated to reflect that.)
 </para>
 
 <para>
@@ -149,8 +155,8 @@ mcsrtombs and wcsrtombs in particular.</para>
 The thought that all one needs to convert between two arbitrary
 codesets is two types and some kind of state argument is
 unfortunate. In particular, encodings may be stateless. The naming
-of the third parameter as stateT is unfortunate, as what is really
-needed is some kind of generalized type that accounts for the
+of the third parameter as <type>stateT</type> is unfortunate, as what is
+really needed is some kind of generalized type that accounts for the
 issues that abstract encodings will need. The minimum information
 that is required includes:
 </para>
@@ -240,7 +246,8 @@ mechanism may be required.
 <para>
 In addition, multi-threaded and multi-locale environments also impact
 the design and requirements for code conversions. In particular, they
-affect the required specialization codecvt<wchar_t, char, mbstate_t>
+affect the required specialization
+<classname>codecvt<wchar_t, char, mbstate_t></classname>
 when implemented using standard "C" functions.
 </para>
 
@@ -249,7 +256,8 @@ Three problems arise, one big, one of medium importance, and one small.
 </para>
 
 <para>
-First, the small: mcsrtombs and wcsrtombs may not be multithread-safe
+First, the small: <function>mcsrtombs</function> and
+<function>wcsrtombs</function> may not be multithread-safe
 on all systems required by the GNU tools. For GNU/Linux and glibc,
 this is not an issue.
 </para>
@@ -275,7 +283,8 @@ option, a high-quality implementation, damn the additional complexity!
 </para>
 
 <para>
-For the required specialization codecvt<wchar_t, char, mbstate_t> ,
+For the required specialization
+<classname>codecvt<wchar_t, char, mbstate_t></classname>,
 conversions are made between the internal character set (always UCS4
 on GNU/Linux) and whatever the currently selected locale for the
 LC_CTYPE category implements.
@@ -311,37 +320,39 @@ codecvt<char, wchar_t, mbstate_t>
 <para>
 This specialization, by specifying all the template parameters, pretty
 much ties the hands of implementors. As such, the implementation is
-straightforward, involving mcsrtombs for the conversions between char
-to wchar_t and wcsrtombs for conversions between wchar_t and char.
+straightforward, involving <function>mcsrtombs</function> for the conversions
+between <type>char</type> to <type>wchar_t</type> and
+<function>wcsrtombs</function> for conversions between <type>wchar_t</type>
+and <type>char</type>.
 </para>
 
 <para>
 Neither of these two required specializations deals with Unicode
 characters. As such, libstdc++ implements a partial specialization
-of the codecvt class with and iconv wrapper class, encoding_state as the
-third template parameter.
+of the <type>codecvt</type> class with an iconv wrapper class,
+<classname>encoding_state</classname> as the third template parameter.
 </para>
 
 <para>
 This implementation should be standards conformant. First of all, the
 standard explicitly points out that instantiations on the third
-template parameter, stateT, are the proper way to implement
+template parameter, <type>stateT</type>, are the proper way to implement
 non-required conversions. Second of all, the standard says (in Chapter
-17) that partial specializations of required classes are a-ok. Third
-of all, the requirements for the stateT type elsewhere in the standard
-(see 21.1.2 traits typedefs) only indicate that this type be copy
+17) that partial specializations of required classes are A-OK. Third
+of all, the requirements for the <type>stateT</type> type elsewhere in the
+standard (see 21.1.2 traits typedefs) only indicate that this type be copy
 constructible.
 </para>
 
 <para>
-As such, the type encoding_state is defined as a non-templatized, POD
-type to be used as the third type of a codecvt instantiation. This
-type is just a wrapper class for iconv, and provides an easy interface
+As such, the type <type>encoding_state</type> is defined as a non-templatized,
+POD type to be used as the third type of a <type>codecvt</type> instantiation.
+This type is just a wrapper class for iconv, and provides an easy interface
 to iconv functionality.
 </para>
 
 <para>
-There are two constructors for encoding_state:
+There are two constructors for <type>encoding_state</type>:
 </para>
 
 <para>
@@ -352,7 +363,7 @@ encoding_state() : __in_desc(0), __out_desc(0)
 <para>
 This default constructor sets the internal encoding to some default
 (currently UCS4) and the external encoding to whatever is returned by
-nl_langinfo(CODESET).
+<code>nl_langinfo(CODESET)</code>.
 </para>
 
 <para>
@@ -370,7 +381,7 @@ either argument.
 <para>
 One of the issues with iconv is that the string literals identifying
 conversions are not standardized. Because of this, the thought of
-mandating and or enforcing some set of pre-determined valid
+mandating and/or enforcing some set of pre-determined valid
 identifiers seems iffy: thus, a more practical (and non-migraine
 inducing) strategy was implemented: end-users can specify any string
 (subject to a pre-determined length qualifier, currently 32 bytes) for
@@ -400,12 +411,12 @@ _M_good()
 </para>
 
 <para>
-Provides a way to see if the given encoding_state object has been
+Provides a way to see if the given <type>encoding_state</type> object has been
 properly initialized. If the string literals describing the desired
 internal and external encoding are not valid, initialization will
 fail, and this will return false. If the internal and external
-encodings are valid, but iconv_open could not allocate conversion
-descriptors, this will also return false. Otherwise, the object is
+encodings are valid, but <function>iconv_open</function> could not allocate
+conversion descriptors, this will also return false. Otherwise, the object is
 ready to convert and will return true.
 </para>
 
@@ -424,8 +435,8 @@ themselves.
 
 <para>
 Definitions for all the required codecvt member functions are provided
-for this specialization, and usage of codecvt<internal character type,
-external character type, encoding_state> is consistent with other
+for this specialization, and usage of <code>codecvt<<replaceable>internal
+character type</replaceable>, <replaceable>external character type</replaceable>, <replaceable>encoding_state</replaceable>></code> is consistent with other
 codecvt usage.
 </para>
 
@@ -433,7 +444,7 @@ codecvt usage.
 
 <section xml:id="facet.codecvt.use"><info><title>Use</title></info>
 
-<para>A conversions involving string literal.</para>
+<para>A conversion involving a string literal.</para>
 
 <programlisting>
 typedef codecvt_base::result result;
@@ -490,7 +501,7 @@ codecvt usage.
 
 <listitem>
 <para>
-b. conversions involving std::string
+b. conversions involving <type>std::string</type>
 </para>
 <itemizedlist>
 <listitem><para>

@@ -18,10 +18,10 @@
 
 
 <para>
-For the required specialization codecvt<wchar_t, char, mbstate_t> ,
+For the required specialization <classname>codecvt<wchar_t, char, mbstate_t></classname>,
 conversions are made between the internal character set (always UCS4
 on GNU/Linux) and whatever the currently selected locale for the
-LC_CTYPE category implements.
+<code>LC_CTYPE</code> category implements.
 </para>
 
 <para>
@@ -45,8 +45,10 @@ ctype<wchar_t>
 <para>
 This specialization, by specifying all the template parameters, pretty
 much ties the hands of implementors. As such, the implementation is
-straightforward, involving mcsrtombs for the conversions between char
-to wchar_t and wcsrtombs for conversions between wchar_t and char.
+straightforward, involving <function>mcsrtombs</function> for the
+conversions between <type>char</type> to <type>wchar_t</type> and
+<function>wcsrtombs</function> for conversions between <type>wchar_t</type>
+and <type>char</type>.
 </para>
 
 <para>
@@ -69,7 +71,8 @@ characters.
 
 <listitem>
 <para>
-How to deal with different types than char, wchar_t? </para></listitem>
+How to deal with types other than <type>char</type>, <type>wchar_t</type>?
+</para></listitem>
 
 <listitem><para>
 Overlap between codecvt/ctype: narrow/widen
@@ -77,8 +80,8 @@ characters.
 
 <listitem>
 <para>
-Mask typedef in codecvt_base, argument types in codecvt. what
-is know about this type?
+<type>mask</type> typedef in <classname>codecvt_base</classname>,
+argument types in <type>codecvt</type>. what is know about this type?
 </para></listitem>
 
 <listitem>
@@ -95,10 +98,11 @@ characters.
 
 <listitem>
 <para>
-Get the ctype<wchar_t>::mask stuff under control. Need to
-make some kind of static table, and not do lookup every time
-somebody hits the do_is... functions. Too bad we can't just
-redefine mask for ctype<wchar_t>
+Get the <type>ctype<wchar_t>::mask</type> stuff under control.
+Need to make some kind of static table, and not do lookup every time
+somebody hits the <code>do_is...</code> functions. Too bad we can't
+just redefine <type>mask</type> for
+<classname>ctype<wchar_t></classname>
 </para></listitem>
 
 <listitem>

@@ -1075,7 +1075,7 @@ particular release.
 in the sections where the function itself occurs.
 -->
 <para><emphasis>[18.1]/4</emphasis> The type of <code>NULL</code> is described
-<link linkend="std.support.types.null">here</link>.
+under <link linkend="std.support.types.null">Support</link>.
 </para>
 <para><emphasis>[18.3]/8</emphasis> Even though it's listed in the library
 sections, libstdc++ has zero control over what the cleanup code hands
@@ -1107,9 +1107,10 @@ particular release.
 implementations, any requirements imposed on allocators by containers
 beyond those requirements that appear in Table 32, and the semantics
 of containers and algorithms when allocator instances compare
-non-equal, are implementation-defined."</emphasis> As yet we don't
-have any allocators which compare non-equal, so we can't describe how
-they behave.
+non-equal, are implementation-defined."</emphasis> There is experimental
+support for non-equal allocators in the standard containers in C++98
+mode. There are no additional requirements on allocators. It is undefined
+behaviour to swap two containers if their allocators are not equal.
 </para>
 <para><emphasis>[21.1.3.1]/3,4</emphasis>,
 <emphasis>[21.1.3.2]/2</emphasis>,
@@ -1121,16 +1122,16 @@ particular release.
 here would defeat the purpose. :-)
 </para>
 <para><emphasis>[21.1.3.1]/5</emphasis> I don't really know about
-the mbstate_t stuff... see
-the <link linkend="std.localization.facet.codecvt">chapter 22
+the <type>mbstate_t</type> stuff... see
+the <link linkend="std.localization.facet.codecvt"><code>codecvt</code>
 notes</link> for what does exist.
 </para>
 <para><emphasis>[22.*]</emphasis> Anything and everything we have on locale
-implementation will be described
-<link linkend="std.localization.locales.locale">over here</link>.
+implementation will be described under
+<link linkend="std.localization.locales.locale">Localization</link>.
 </para>
 <para><emphasis>[26.2.8]/9</emphasis> I have no idea what
-<code>complex<T></code>'s pow(0,0) returns.
+<code>complex<T></code>'s <code>pow(0,0)</code> returns.
 </para>
 <para><emphasis>[27.4.2.4]/2</emphasis> Calling
 <code>std::ios_base::sync_with_stdio</code> after I/O has already been
@@ -1138,8 +1139,8 @@ particular release.
 flush the buffers, and <!-- this line might go away -->
 destroy and recreate the underlying buffer instances. Whether or not
 the previously-written I/O is destroyed in this process depends mostly
-on the --enable-libio choice: for stdio, if the written data is
-already in the stdio buffer, the data may be completely safe!
+on the <code>--enable-libio</code> choice: for stdio, if the written
+data is already in the stdio buffer, the data may be completely safe!
 </para>
 <para><emphasis>[27.6.1.1.2]</emphasis>,
 <emphasis>[27.6.2.3]</emphasis> The I/O sentry ctor and dtor can perform
@@ -1148,8 +1149,8 @@ particular release.
 </para>
 <para><emphasis>[27.7.1.3]/16</emphasis>,
 <emphasis>[27.8.1.4]/10</emphasis>
-The effects of <code>pubsetbuf/setbuf</code> are described
-<link linkend="std.io">in this chapter</link>.
+The effects of <code>pubsetbuf/setbuf</code> are described in the
+<link linkend="std.io">Input and Output</link> chapter.
 </para>
 <para><emphasis>[27.8.1.4]/16</emphasis> Calling <code>fstream::sync</code> when
 a get area exists will... whatever <code>fflush()</code> does, I think.