diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index 19269f588d1..847d7b9f9e1 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,3 +1,12 @@
+2014-01-10  Jonathan Wakely
+
+	PR libstdc++/59698
+	* doc/xml/manual/status_cxx1998.xml (iso.1998.specific): Markup
+	and stylistic improvements.
+	* doc/xml/manual/codecvt.xml (std.localization.facet.codecvt): Likewise
+	and update for C++11.
+	* doc/xml/manual/ctype.xml (std.localization.facet.ctype): Likewise.
+
 2014-01-09  Jonathan Wakely
 
 	PR libstdc++/59738
diff --git a/libstdc++-v3/doc/xml/manual/codecvt.xml b/libstdc++-v3/doc/xml/manual/codecvt.xml
index 9961515d491..76120060e30 100644
--- a/libstdc++-v3/doc/xml/manual/codecvt.xml
+++ b/libstdc++-v3/doc/xml/manual/codecvt.xml
@@ -15,11 +15,11 @@ The standard class codecvt attempts to address conversions between different character encoding schemes. In particular, the standard attempts to detail conversions between the implementation-defined wide -characters (hereafter referred to as wchar_t) and the standard type -char that is so beloved in classic C (which can now be -referred to as narrow characters.) This document attempts to describe -how the GNU libstdc++ implementation deals with the conversion between -wide and narrow characters, and also presents a framework for dealing +characters (hereafter referred to as wchar_t) and the standard +type char that is so beloved in classic C +(which can now be referred to as narrow characters.) This document attempts +to describe how the GNU libstdc++ implementation deals with the conversion +between wide and narrow characters, and also presents a framework for dealing with the huge number of other encodings that iconv can convert, including Unicode and UTF8. Design issues and requirements are addressed, and examples of correct usage for both the required @@ -47,8 +47,8 @@ The text around the codecvt definition gives some clues:
--1- The class codecvt<internT,externT,stateT> is for use when -converting from one codeset to another, such as from wide characters +-1- The class codecvt<internT,externT,stateT> is for use +when converting from one codeset to another, such as from wide characters to multibyte characters, between wide character encodings such as Unicode and EUC. @@ -64,7 +64,7 @@ class.
--2- The stateT argument selects the pair of codesets being mapped between. +-2- The stateT argument selects the pair of codesets being mapped between.
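The quoted -1- clause describes converting "from wide characters to multibyte characters". The sketch below shows that conversion by calling out() on the codecvt<wchar_t, char, mbstate_t> facet of a locale. It assumes the classic "C" locale and plain ASCII input so that the result is predictable. The helper name narrow_with_codecvt is made up for this example.

```cpp
#include <cwchar>   // std::mbstate_t
#include <locale>   // std::locale, std::use_facet, std::codecvt
#include <string>

// Hypothetical helper: narrow a wide string with the
// codecvt<wchar_t, char, mbstate_t> facet of `loc`.
// Returns an empty string if the facet does not report success.
std::string narrow_with_codecvt(const std::wstring& in, const std::locale& loc)
{
    using Cvt = std::codecvt<wchar_t, char, std::mbstate_t>;
    const Cvt& cvt = std::use_facet<Cvt>(loc);

    std::mbstate_t state{};  // initial conversion state
    // max_length() bounds the external chars needed per internal char.
    std::string out(in.size() * cvt.max_length(), '\0');
    const wchar_t* from_next = nullptr;
    char* to_next = nullptr;

    Cvt::result r = cvt.out(state,
                            in.data(), in.data() + in.size(), from_next,
                            &out[0], &out[0] + out.size(), to_next);
    if (r != Cvt::ok)
        return std::string();
    out.resize(to_next - &out[0]);  // keep only what was written
    return out;
}
```

With a locale built from the environment, the same call converts according to that locale's LC_CTYPE codeset. In that case a caller should be prepared for partial or error results. For stateful external encodings, a complete conversion would also call unshift() at the end.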
@@ -76,17 +76,19 @@ Ah ha! Another clue...
--3- The instantiations required in the Table ?? -(lib.locale.category), namely codecvt<wchar_t,char,mbstate_t> and -codecvt<char,char,mbstate_t>, convert the implementation-defined -native character set. codecvt<char,char,mbstate_t> implements a -degenerate conversion; it does not convert at -all. codecvt<wchar_t,char,mbstate_t> converts between the native -character sets for tiny and wide characters. Instantiations on -mbstate_t perform conversion between encodings known to the library +-3- The instantiations required in the Table 51 (lib.locale.category), namely +codecvt<wchar_t,char,mbstate_t> and +codecvt<char,char,mbstate_t>, convert the +implementation-defined native character set. +codecvt<char,char,mbstate_t> implements a +degenerate conversion; it does not convert at all. +codecvt<wchar_t,char,mbstate_t> converts between +the native character sets for tiny and wide characters. Instantiations on +mbstate_t perform conversion between encodings known to the library implementor. Other encodings can be converted by specializing on a -user-defined stateT type. The stateT object can contain any state that -is useful to communicate to or from the specialized do_convert member. +user-defined stateT type. The stateT object can +contain any state that is useful to communicate to or from the specialized +do_convert member.
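The "degenerate conversion" of codecvt<char,char,mbstate_t> can be observed from user code. always_noconv() reports it, and out() returns noconv instead of copying anything. This small sketch assumes only the standard <locale> interface. char_codecvt_is_degenerate is a hypothetical helper name.

```cpp
#include <cwchar>   // std::mbstate_t
#include <locale>   // std::locale, std::use_facet, std::codecvt

// Hypothetical helper: checks the "degenerate conversion" property of
// codecvt<char, char, mbstate_t> for the given locale.
bool char_codecvt_is_degenerate(const std::locale& loc)
{
    using Cvt = std::codecvt<char, char, std::mbstate_t>;
    const Cvt& cvt = std::use_facet<Cvt>(loc);

    // The facet's own claim that it never converts.
    if (!cvt.always_noconv())
        return false;

    // out() is expected to report noconv rather than copy the input.
    const char src[] = "abc";
    char dst[sizeof src];
    std::mbstate_t state{};
    const char* from_next = nullptr;
    char* to_next = nullptr;
    Cvt::result r = cvt.out(state, src, src + 3, from_next,
                            dst, dst + 3, to_next);
    return r == Cvt::noconv;
}
```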
@@ -98,13 +100,14 @@ At this point, a couple points become clear: One: The standard clearly implies that attempts to add non-required (yet useful and widely used) conversions need to do so through the -third template parameter, stateT. +third template parameter, stateT.
-Two: The required conversions, by specifying mbstate_t as the third -template parameter, imply an implementation strategy that is mostly +Two: The required conversions, by specifying mbstate_t as the +third template parameter, imply an implementation strategy that is mostly (or wholly) based on the underlying C library, and the functions -mcsrtombs and wcsrtombs in particular. +mcsrtombs and wcsrtombs in +particular.
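The two C functions the text relies on can be exercised directly. The name mcsrtombs used in the text is not a standard C function. The multibyte-to-wide counterpart of wcsrtombs is mbsrtowcs. This minimal round-trip sketch assumes the default "C" locale, so only ASCII input is portable. to_wide and from_wide are hypothetical helper names.

```cpp
#include <cstddef>  // std::size_t
#include <cwchar>   // std::mbsrtowcs, std::wcsrtombs, std::mbstate_t
#include <string>
#include <vector>

// Hypothetical helper: multibyte -> wide using the current C locale's
// LC_CTYPE. Returns false if `src` holds an invalid multibyte sequence.
bool to_wide(const char* src, std::wstring& out)
{
    std::mbstate_t state{};
    const char* p = src;
    // With a null destination, mbsrtowcs only counts the wide characters.
    std::size_t n = std::mbsrtowcs(nullptr, &p, 0, &state);
    if (n == static_cast<std::size_t>(-1))
        return false;
    std::vector<wchar_t> buf(n + 1);  // room for the terminating L'\0'
    state = std::mbstate_t{};
    p = src;
    std::mbsrtowcs(buf.data(), &p, buf.size(), &state);
    out.assign(buf.data(), n);
    return true;
}

// Hypothetical helper: wide -> multibyte, the reverse direction.
bool from_wide(const wchar_t* src, std::string& out)
{
    std::mbstate_t state{};
    const wchar_t* p = src;
    std::size_t n = std::wcsrtombs(nullptr, &p, 0, &state);
    if (n == static_cast<std::size_t>(-1))
        return false;
    std::vector<char> buf(n + 1);
    state = std::mbstate_t{};
    p = src;
    std::wcsrtombs(buf.data(), &p, buf.size(), &state);
    out.assign(buf.data(), n);
    return true;
}
```

Both functions take an explicit mbstate_t, but they still consult the process-wide C locale. That dependency is the reason multi-locale and multi-threaded programs complicate an implementation built on them.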
Design @@ -114,7 +117,7 @@ mcsrtombs and wcsrtombs in particular. - The simple implementation detail of wchar_t's size seems to + The simple implementation detail of wchar_t's size seems to repeatedly confound people. Many systems use a two byte, unsigned integral type to represent wide characters, and use an internal encoding of Unicode or UCS2. (See AIX, Microsoft NT, @@ -122,7 +125,7 @@ mcsrtombs and wcsrtombs in particular. type to represent wide characters, and use an internal encoding of UCS4. (GNU/Linux systems using glibc, in particular.) The C programming language (and thus C++) does not specify a specific - size for the type wchar_t. + size for the type wchar_t. @@ -136,9 +139,12 @@ mcsrtombs and wcsrtombs in particular. Probably the most frequently asked question about code conversion is: "So dudes, what's the deal with Unicode strings?" The dude part is optional, but apparently the usefulness of - Unicode strings is pretty widely appreciated. Sadly, this specific - encoding (And other useful encodings like UTF8, UCS4, ISO 8859-10, - etc etc etc) are not mentioned in the C++ standard. + Unicode strings is pretty widely appreciated. The Unicode character + set (and useful encodings like UTF-8, UCS-4, ISO 8859-10, + etc etc etc) were not mentioned in the first C++ standard. (The 2011 + standard added support for string literals with different encodings + and some library facilities for converting between encodings, but the + notes below have not been updated to reflect that.) @@ -149,8 +155,8 @@ mcsrtombs and wcsrtombs in particular. The thought that all one needs to convert between two arbitrary codesets is two types and some kind of state argument is unfortunate. In particular, encodings may be stateless. 
The naming - of the third parameter as stateT is unfortunate, as what is really - needed is some kind of generalized type that accounts for the + of the third parameter as stateT is unfortunate, as what is + really needed is some kind of generalized type that accounts for the issues that abstract encodings will need. The minimum information that is required includes: @@ -240,7 +246,8 @@ mechanism may be required. In addition, multi-threaded and multi-locale environments also impact the design and requirements for code conversions. In particular, they -affect the required specialization codecvt<wchar_t, char, mbstate_t> +affect the required specialization +codecvt<wchar_t, char, mbstate_t> when implemented using standard "C" functions. @@ -249,7 +256,8 @@ Three problems arise, one big, one of medium importance, and one small. -First, the small: mcsrtombs and wcsrtombs may not be multithread-safe +First, the small: mcsrtombs and +wcsrtombs may not be multithread-safe on all systems required by the GNU tools. For GNU/Linux and glibc, this is not an issue. @@ -275,7 +283,8 @@ option, a high-quality implementation, damn the additional complexity! -For the required specialization codecvt<wchar_t, char, mbstate_t> , +For the required specialization +codecvt<wchar_t, char, mbstate_t>, conversions are made between the internal character set (always UCS4 on GNU/Linux) and whatever the currently selected locale for the LC_CTYPE category implements. @@ -311,37 +320,39 @@ codecvt<char, wchar_t, mbstate_t> This specialization, by specifying all the template parameters, pretty much ties the hands of implementors. As such, the implementation is -straightforward, involving mcsrtombs for the conversions between char -to wchar_t and wcsrtombs for conversions between wchar_t and char. +straightforward, involving mcsrtombs for the conversions +between char to wchar_t and +wcsrtombs for conversions between wchar_t +and char. 
Neither of these two required specializations deals with Unicode characters. As such, libstdc++ implements a partial specialization -of the codecvt class with and iconv wrapper class, encoding_state as the -third template parameter. +of the codecvt class with an iconv wrapper class, +encoding_state as the third template parameter. This implementation should be standards conformant. First of all, the standard explicitly points out that instantiations on the third -template parameter, stateT, are the proper way to implement +template parameter, stateT, are the proper way to implement non-required conversions. Second of all, the standard says (in Chapter -17) that partial specializations of required classes are a-ok. Third -of all, the requirements for the stateT type elsewhere in the standard -(see 21.1.2 traits typedefs) only indicate that this type be copy +17) that partial specializations of required classes are A-OK. Third +of all, the requirements for the stateT type elsewhere in the +standard (see 21.1.2 traits typedefs) only indicate that this type be copy constructible. -As such, the type encoding_state is defined as a non-templatized, POD -type to be used as the third type of a codecvt instantiation. This -type is just a wrapper class for iconv, and provides an easy interface +As such, the type encoding_state is defined as a non-templatized, +POD type to be used as the third type of a codecvt instantiation. +This type is just a wrapper class for iconv, and provides an easy interface to iconv functionality. -There are two constructors for encoding_state: +There are two constructors for encoding_state: @@ -352,7 +363,7 @@ encoding_state() : __in_desc(0), __out_desc(0) This default constructor sets the internal encoding to some default (currently UCS4) and the external encoding to whatever is returned by -nl_langinfo(CODESET). +nl_langinfo(CODESET). @@ -370,7 +381,7 @@ either argument. 
One of the issues with iconv is that the string literals identifying conversions are not standardized. Because of this, the thought of -mandating and or enforcing some set of pre-determined valid +mandating and/or enforcing some set of pre-determined valid identifiers seems iffy: thus, a more practical (and non-migraine inducing) strategy was implemented: end-users can specify any string (subject to a pre-determined length qualifier, currently 32 bytes) for @@ -400,12 +411,12 @@ _M_good() -Provides a way to see if the given encoding_state object has been +Provides a way to see if the given encoding_state object has been properly initialized. If the string literals describing the desired internal and external encoding are not valid, initialization will fail, and this will return false. If the internal and external -encodings are valid, but iconv_open could not allocate conversion -descriptors, this will also return false. Otherwise, the object is +encodings are valid, but iconv_open could not allocate +conversion descriptors, this will also return false. Otherwise, the object is ready to convert and will return true. @@ -424,8 +435,8 @@ themselves. Definitions for all the required codecvt member functions are provided -for this specialization, and usage of codecvt<internal character type, -external character type, encoding_state> is consistent with other +for this specialization, and usage of codecvt<internal +character type, external character type, encoding_state> is consistent with other codecvt usage. @@ -433,7 +444,7 @@ codecvt usage.
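The encoding_state description above has three parts: a default external encoding taken from nl_langinfo(CODESET), a descriptor from iconv_open, and a validity check like _M_good(). The sketch below assembles those pieces with plain POSIX calls. IconvConverter and current_codeset are hypothetical names. This is not how encoding_state is implemented.

Encoding names such as "UTF-8" and "UTF-32LE" are assumptions, since, as the text notes, iconv identifiers are not standardized. Some platforms also need -liconv at link time.

```cpp
#include <cerrno>      // errno, E2BIG
#include <cstddef>     // std::size_t
#include <iconv.h>     // iconv_open, iconv, iconv_close (POSIX)
#include <langinfo.h>  // nl_langinfo, CODESET (POSIX)
#include <string>

// Codeset of the current LC_CTYPE locale (the "C" locale unless
// setlocale(LC_CTYPE, "") has been called).
std::string current_codeset()
{
    return nl_langinfo(CODESET);
}

// Hypothetical iconv wrapper: one descriptor, a good() check,
// and whole-buffer conversion.
class IconvConverter {
public:
    IconvConverter(const char* to_code, const char* from_code)
        : cd_(iconv_open(to_code, from_code)) {}
    ~IconvConverter() { if (good()) iconv_close(cd_); }
    IconvConverter(const IconvConverter&) = delete;
    IconvConverter& operator=(const IconvConverter&) = delete;

    // Like _M_good(): false when iconv_open rejected the names
    // or could not allocate a conversion descriptor.
    bool good() const { return cd_ != (iconv_t)-1; }

    // Converts all of `in`; returns false on invalid or truncated input.
    bool convert(const std::string& in, std::string& out)
    {
        if (!good())
            return false;
        out.clear();
        iconv(cd_, nullptr, nullptr, nullptr, nullptr);  // reset shift state
        char* inbuf = const_cast<char*>(in.data());  // iconv does not write input
        std::size_t inleft = in.size();
        char chunk[64];
        while (inleft > 0) {
            char* outbuf = chunk;
            std::size_t outleft = sizeof chunk;
            std::size_t r = iconv(cd_, &inbuf, &inleft, &outbuf, &outleft);
            out.append(chunk, sizeof chunk - outleft);
            if (r == static_cast<std::size_t>(-1) && errno != E2BIG)
                return false;  // EILSEQ or EINVAL
        }
        // Emit any closing shift sequence the target encoding needs.
        char* outbuf = chunk;
        std::size_t outleft = sizeof chunk;
        iconv(cd_, nullptr, nullptr, &outbuf, &outleft);
        out.append(chunk, sizeof chunk - outleft);
        return true;
    }

private:
    iconv_t cd_;
};
```

A default-constructed state in this spirit would be IconvConverter(current_codeset().c_str(), "UCS-4"). Whether "UCS-4" is accepted is, again, up to the local iconv.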
Use -A conversions involving string literal. +A conversion involving a string literal. typedef codecvt_base::result result; @@ -490,7 +501,7 @@ codecvt usage. - b. conversions involving std::string + b. conversions involving std::string diff --git a/libstdc++-v3/doc/xml/manual/ctype.xml b/libstdc++-v3/doc/xml/manual/ctype.xml index 21b70724fd7..4e9777d7846 100644 --- a/libstdc++-v3/doc/xml/manual/ctype.xml +++ b/libstdc++-v3/doc/xml/manual/ctype.xml @@ -18,10 +18,10 @@ -For the required specialization codecvt<wchar_t, char, mbstate_t> , +For the required specialization codecvt<wchar_t, char, mbstate_t>, conversions are made between the internal character set (always UCS4 on GNU/Linux) and whatever the currently selected locale for the -LC_CTYPE category implements. +LC_CTYPE category implements. @@ -45,8 +45,10 @@ ctype<wchar_t> This specialization, by specifying all the template parameters, pretty much ties the hands of implementors. As such, the implementation is -straightforward, involving mcsrtombs for the conversions between char -to wchar_t and wcsrtombs for conversions between wchar_t and char. +straightforward, involving mcsrtombs for the +conversions between char to wchar_t and +wcsrtombs for conversions between wchar_t +and char. @@ -69,7 +71,8 @@ characters. - How to deal with different types than char, wchar_t? + How to deal with types other than char, wchar_t? + Overlap between codecvt/ctype: narrow/widen @@ -77,8 +80,8 @@ characters. - Mask typedef in codecvt_base, argument types in codecvt. what - is know about this type? + mask typedef in codecvt_base, + argument types in codecvt. what is know about this type? @@ -95,10 +98,11 @@ characters. - Get the ctype<wchar_t>::mask stuff under control. Need to - make some kind of static table, and not do lookup every time - somebody hits the do_is... functions. Too bad we can't just - redefine mask for ctype<wchar_t> + Get the ctype<wchar_t>::mask stuff under control. 
+ Need to make some kind of static table, and not do lookup every time + somebody hits the do_is... functions. Too bad we can't + just redefine mask for + ctype<wchar_t> diff --git a/libstdc++-v3/doc/xml/manual/status_cxx1998.xml b/libstdc++-v3/doc/xml/manual/status_cxx1998.xml index 1541343e0cb..3bc6a16722b 100644 --- a/libstdc++-v3/doc/xml/manual/status_cxx1998.xml +++ b/libstdc++-v3/doc/xml/manual/status_cxx1998.xml @@ -1075,7 +1075,7 @@ particular release. in the sections where the function itself occurs. --> [18.1]/4 The type of NULL is described - here. + under Support. [18.3]/8 Even though it's listed in the library sections, libstdc++ has zero control over what the cleanup code hands @@ -1107,9 +1107,10 @@ particular release. implementations, any requirements imposed on allocators by containers beyond those requirements that appear in Table 32, and the semantics of containers and algorithms when allocator instances compare - non-equal, are implementation-defined." As yet we don't - have any allocators which compare non-equal, so we can't describe how - they behave. + non-equal, are implementation-defined." There is experimental + support for non-equal allocators in the standard containers in C++98 + mode. There are no additional requirements on allocators. It is undefined + behaviour to swap two containers if their allocators are not equal. [21.1.3.1]/3,4, [21.1.3.2]/2, @@ -1121,16 +1122,16 @@ particular release. here would defeat the purpose. :-) [21.1.3.1]/5 I don't really know about - the mbstate_t stuff... see - the chapter 22 + the mbstate_t stuff... see + the codecvt notes for what does exist. [22.*] Anything and everything we have on locale - implementation will be described - over here. + implementation will be described under + Localization. [26.2.8]/9 I have no idea what - complex<T>'s pow(0,0) returns. + complex<T>'s pow(0,0) returns. 
[27.4.2.4]/2 Calling std::ios_base::sync_with_stdio after I/O has already been @@ -1138,8 +1139,8 @@ particular release. flush the buffers, and destroy and recreate the underlying buffer instances. Whether or not the previously-written I/O is destroyed in this process depends mostly - on the --enable-libio choice: for stdio, if the written data is - already in the stdio buffer, the data may be completely safe! + on the --enable-libio choice: for stdio, if the written + data is already in the stdio buffer, the data may be completely safe! [27.6.1.1.2], [27.6.2.3] The I/O sentry ctor and dtor can perform @@ -1148,8 +1149,8 @@ particular release. [27.7.1.3]/16, [27.8.1.4]/10 - The effects of pubsetbuf/setbuf are described - in this chapter. + The effects of pubsetbuf/setbuf are described in the + Input and Output chapter. [27.8.1.4]/16 Calling fstream::sync when a get area exists will... whatever fflush() does, I think.
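The simplest way to avoid the [27.4.2.4]/2 question is to make the call before any I/O has happened. This minimal sketch wraps the standard std::ios_base::sync_with_stdio. disable_stdio_sync is a hypothetical helper name.

```cpp
#include <iostream>  // std::ios_base

// Hypothetical helper, meant to be called once at startup, before anything
// is read from or written to the standard streams. Returns the previous
// synchronisation setting (true unless it was changed earlier).
bool disable_stdio_sync()
{
    return std::ios_base::sync_with_stdio(false);
}
```

Calling this first thing in main means there is no previously written data whose fate depends on the --enable-libio choice.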