![]() This patch fixes the handling of surrogate code points in all standard facets for transcoding Unicode that are based on std::codecvt. Surrogate code points should always be treated as error. On the other hand surrogate code units can only appear in UTF-16 and only when they come in a proper pair. Additionally, it fixes a bug in std::codecvt_utf16::in() when odd number of bytes were given in the range [from, from_end), error was returned always. The last byte in such range does not form a full UTF-16 code unit and we can not make any decisions for error, instead partial should be returned. The testsuite for testing these facets was updated in the following order: 1. All functions that test codecvts that work with UTF-8 were refactored and made more generic so they accept codecvt that works with the char type char8_t. 2. The same functions were updated with new test cases for transcoding errors and now additionally test for surrogates, overlong UTF-8 sequences, code points out of the Unicode range, and more tests for missing leading and trailing code units. 3. New tests were added to test codecvt_utf16 in both of its variants, UTF-16 <-> UTF-32/UCS-4 and UTF-16 <-> UCS-2. libstdc++-v3/ChangeLog: PR libstdc++/108976 * src/c++11/codecvt.cc (read_utf8_code_point): Fix handing of surrogates in UTF-8. (ucs4_out): Fix handling of surrogates in UCS-4 -> UTF-8. (ucs4_in): Fix handling of range with odd number of bytes. (ucs4_out): Fix handling of surrogates in UCS-4 -> UTF-16. (ucs2_out): Fix handling of surrogates in UCS-2 -> UTF-16. (ucs2_in): Fix handling of range with odd number of bytes. (__codecvt_utf16_base<char16_t>::do_in): Likewise. (__codecvt_utf16_base<char32_t>::do_in): Likewise. (__codecvt_utf16_base<wchar_t>::do_in): Likewise. * testsuite/22_locale/codecvt/codecvt_unicode.cc: Renames, add tests for codecvt_utf16<char16_t> and codecvt_utf16<char32_t>. * testsuite/22_locale/codecvt/codecvt_unicode.h: Refactor UTF-8 testing functions for char8_t, add more test cases for errors, add testing functions for codecvt_utf16. * testsuite/22_locale/codecvt/codecvt_unicode_wchar_t.cc: Renames, add tests for codecvt_utf16<whchar_t>. * testsuite/22_locale/codecvt/codecvt_utf16/79980.cc (test06): Fix test. * testsuite/22_locale/codecvt/codecvt_unicode_char8_t.cc: New test. |
||
---|---|---|
c++tools | ||
config | ||
contrib | ||
fixincludes | ||
gcc | ||
gnattools | ||
gotools | ||
include | ||
INSTALL | ||
intl | ||
libada | ||
libatomic | ||
libbacktrace | ||
libcc1 | ||
libcody | ||
libcpp | ||
libdecnumber | ||
libffi | ||
libgcc | ||
libgfortran | ||
libgm2 | ||
libgo | ||
libgomp | ||
libiberty | ||
libitm | ||
libobjc | ||
libphobos | ||
libquadmath | ||
libsanitizer | ||
libssp | ||
libstdc++-v3 | ||
libvtv | ||
lto-plugin | ||
maintainer-scripts | ||
zlib | ||
.dir-locals.el | ||
.gitattributes | ||
.gitignore | ||
ABOUT-NLS | ||
ar-lib | ||
ChangeLog | ||
ChangeLog.jit | ||
ChangeLog.tree-ssa | ||
compile | ||
config-ml.in | ||
config.guess | ||
config.rpath | ||
config.sub | ||
configure | ||
configure.ac | ||
COPYING | ||
COPYING.LIB | ||
COPYING.RUNTIME | ||
COPYING3 | ||
COPYING3.LIB | ||
depcomp | ||
install-sh | ||
libtool-ldflags | ||
libtool.m4 | ||
ltgcc.m4 | ||
ltmain.sh | ||
ltoptions.m4 | ||
ltsugar.m4 | ||
ltversion.m4 | ||
lt~obsolete.m4 | ||
MAINTAINERS | ||
Makefile.def | ||
Makefile.in | ||
Makefile.tpl | ||
missing | ||
mkdep | ||
mkinstalldirs | ||
move-if-change | ||
multilib.am | ||
README | ||
symlink-tree | ||
test-driver | ||
ylwrap |
This directory contains the GNU Compiler Collection (GCC). The GNU Compiler Collection is free software. See the files whose names start with COPYING for copying permission. The manuals, and some of the runtime libraries, are under different terms; see the individual source files for details. The directory INSTALL contains copies of the installation information as HTML and plain text. The source of this information is gcc/doc/install.texi. The installation information includes details of what is included in the GCC sources and what files GCC installs. See the file gcc/doc/gcc.texi (together with other files that it includes) for usage and porting information. An online readable version of the manual is in the files gcc/doc/gcc.info*. See http://gcc.gnu.org/bugs/ for how to report bugs usefully. Copyright years on GCC source files may be listed using range notation, e.g., 1987-2012, indicating that every year in the range, inclusive, is a copyrightable year that could otherwise be listed individually.