Dimitrij Mijoski a8b9c32da7 libstdc++: Fix handling of surrogate CP in codecvt [PR108976]
This patch fixes the handling of surrogate code points in all standard
facets for transcoding Unicode that are based on std::codecvt. Surrogate
code points should always be treated as an error. Surrogate code units,
on the other hand, can appear only in UTF-16, and only as part of a
proper pair.
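
The distinction between surrogate code points and surrogate code units can be sketched with a few standalone helpers. This is a minimal illustration only: the function names are hypothetical and are not taken from src/c++11/codecvt.cc.

```cpp
#include <cassert>

// Hypothetical helpers for illustration; not the names used in libstdc++.

// True for any code point in the surrogate range U+D800..U+DFFF.
constexpr bool is_surrogate(char32_t c) { return c >= 0xD800 && c <= 0xDFFF; }

// UTF-16 code units: the first (high) and second (low) half of a pair.
constexpr bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
constexpr bool is_low_surrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// A UCS-4 value is acceptable only if it is in the Unicode range and is
// not a surrogate code point.
constexpr bool is_scalar_value(char32_t c)
{ return c <= 0x10FFFF && !is_surrogate(c); }

// Combine a proper UTF-16 surrogate pair into one code point.
inline char32_t combine_surrogates(char16_t hi, char16_t lo)
{
  assert(is_high_surrogate(hi) && is_low_surrogate(lo));
  return 0x10000 + ((char32_t(hi) - 0xD800) << 10) + (char32_t(lo) - 0xDC00);
}
```

Under these assumptions, a decoded value such as U+D800 is rejected wherever it appears as a code point, while the code units 0xD83D 0xDE00 are accepted in UTF-16 because together they form a proper pair.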

Additionally, it fixes a bug in std::codecvt_utf16::in(): when an odd
number of bytes was given in the range [from, from_end), error was
always returned. The last byte in such a range does not form a full
UTF-16 code unit, so no error decision can be made about it; partial
should be returned instead.
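
The intended outcome for a byte range of odd length can be sketched with a small standalone decoder. This is an assumption-laden illustration, not the libstdc++ implementation: the `result` enum and `utf16be_to_ucs4` are hypothetical names, and only big-endian input without a BOM is handled.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for std::codecvt_base::result.
enum class result { ok, partial, error };

// Hypothetical decoder: UTF-16BE bytes in [from, from_end) to UCS-4.
inline result utf16be_to_ucs4(const unsigned char* from,
                              const unsigned char* from_end,
                              std::u32string& out)
{
  assert(from <= from_end);
  while (from_end - from >= 2)
    {
      char32_t u = (char32_t(from[0]) << 8) | from[1];
      if (u >= 0xDC00 && u <= 0xDFFF)
        return result::error;            // low surrogate without a high one
      if (u >= 0xD800 && u <= 0xDBFF)
        {
          if (from_end - from < 4)
            return result::partial;      // second half of the pair not here yet
          char32_t lo = (char32_t(from[2]) << 8) | from[3];
          if (lo < 0xDC00 || lo > 0xDFFF)
            return result::error;        // high surrogate not followed by low
          u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
          from += 2;                     // consume the extra code unit
        }
      out.push_back(u);
      from += 2;
    }
  // A single leftover byte is an incomplete code unit, not an error.
  return from == from_end ? result::ok : result::partial;
}
```

With this sketch, the three bytes 00 41 00 decode to "A" and yield partial, because the trailing byte may still become a valid code unit once more input arrives.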

The testsuite for testing these facets was updated in the following
order:

1. All functions that test codecvts that work with UTF-8 were refactored
   and made more generic so that they also accept a codecvt that works
   with the character type char8_t.
2. The same functions were updated with new test cases for transcoding
   errors and now additionally test for surrogates, overlong UTF-8
   sequences, code points out of the Unicode range, and more tests for
   missing leading and trailing code units.
3. New tests were added to test codecvt_utf16 in both of its variants,
   UTF-16 <-> UTF-32/UCS-4 and UTF-16 <-> UCS-2.
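
As a rough illustration of the error classes named in item 2, the following sketch checks for overlong UTF-8 sequences, surrogates, and out-of-range code points. The helper names are hypothetical and do not come from the testsuite.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: length in bytes of the shortest UTF-8 encoding of c,
// or 0 if c is a surrogate or lies beyond U+10FFFF.
constexpr std::size_t utf8_length(char32_t c)
{
  return c < 0x80 ? 1
       : c < 0x800 ? 2
       : (c >= 0xD800 && c <= 0xDFFF) ? 0
       : c < 0x10000 ? 3
       : c <= 0x10FFFF ? 4
       : 0;
}

// A sequence is overlong if it used more bytes than the shortest encoding
// of the value it decodes to.
inline bool is_overlong(char32_t decoded, std::size_t bytes_used)
{
  assert(bytes_used >= 1 && bytes_used <= 4);
  return utf8_length(decoded) != 0 && bytes_used > utf8_length(decoded);
}
```

For example, the two bytes C0 AF decode arithmetically to U+002F, which needs only one byte, so the sequence counts as overlong and a conforming decoder should reject it.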

libstdc++-v3/ChangeLog:

	PR libstdc++/108976
	* src/c++11/codecvt.cc (read_utf8_code_point): Fix handling of
	surrogates in UTF-8.
	(ucs4_out): Fix handling of surrogates in UCS-4 -> UTF-8.
	(ucs4_in): Fix handling of range with odd number of bytes.
	(ucs4_out): Fix handling of surrogates in UCS-4 -> UTF-16.
	(ucs2_out): Fix handling of surrogates in UCS-2 -> UTF-16.
	(ucs2_in): Fix handling of range with odd number of bytes.
	(__codecvt_utf16_base<char16_t>::do_in): Likewise.
	(__codecvt_utf16_base<char32_t>::do_in): Likewise.
	(__codecvt_utf16_base<wchar_t>::do_in): Likewise.
	* testsuite/22_locale/codecvt/codecvt_unicode.cc: Renames, add
	tests for codecvt_utf16<char16_t> and codecvt_utf16<char32_t>.
	* testsuite/22_locale/codecvt/codecvt_unicode.h: Refactor UTF-8
	testing functions for char8_t, add more test cases for errors,
	add testing functions for codecvt_utf16.
	* testsuite/22_locale/codecvt/codecvt_unicode_wchar_t.cc:
	Renames, add tests for codecvt_utf16<wchar_t>.
	* testsuite/22_locale/codecvt/codecvt_utf16/79980.cc (test06):
	Fix test.
	* testsuite/22_locale/codecvt/codecvt_unicode_char8_t.cc: New
	test.
2023-09-29 16:01:21 +01:00
c++tools Daily bump. 2023-06-23 00:16:38 +00:00
config Daily bump. 2023-09-16 00:17:55 +00:00
contrib Harmonize headers between both dg-extract-results scripts 2023-09-29 08:50:28 -06:00
fixincludes Daily bump. 2023-08-18 00:16:52 +00:00
gcc vec.h: Guard most of static assertions for GCC >= 5 2023-09-29 15:14:52 +02:00
gnattools Daily bump. 2023-04-26 00:17:46 +00:00
gotools Daily bump. 2022-08-31 00:16:45 +00:00
include Daily bump. 2023-08-23 00:17:59 +00:00
INSTALL
intl Daily bump. 2023-08-08 00:17:37 +00:00
libada Daily bump. 2023-08-08 00:17:37 +00:00
libatomic Daily bump. 2023-08-08 00:17:37 +00:00
libbacktrace Daily bump. 2023-08-08 00:17:37 +00:00
libcc1 Daily bump. 2023-08-12 00:17:36 +00:00
libcody Daily bump. 2023-06-16 00:17:18 +00:00
libcpp Daily bump. 2023-09-21 00:17:49 +00:00
libdecnumber Daily bump. 2023-06-16 00:17:18 +00:00
libffi Daily bump. 2023-08-24 00:18:18 +00:00
libgcc Daily bump. 2023-09-28 09:50:12 +00:00
libgfortran Daily bump. 2023-09-29 00:17:28 +00:00
libgm2 Daily bump. 2023-09-20 00:17:55 +00:00
libgo libgo: fix DejaGNU testsuite compiler when using build sysroot 2023-09-12 13:19:42 -07:00
libgomp Daily bump. 2023-09-21 00:17:49 +00:00
libiberty Daily bump. 2023-08-23 00:17:59 +00:00
libitm Daily bump. 2023-08-08 00:17:37 +00:00
libobjc Daily bump. 2023-08-08 00:17:37 +00:00
libphobos Daily bump. 2023-09-24 00:16:58 +00:00
libquadmath Daily bump. 2023-08-08 00:17:37 +00:00
libsanitizer Daily bump. 2023-08-08 00:17:37 +00:00
libssp Daily bump. 2023-08-08 00:17:37 +00:00
libstdc++-v3 libstdc++: Fix handling of surrogate CP in codecvt [PR108976] 2023-09-29 16:01:21 +01:00
libvtv Daily bump. 2023-08-08 00:17:37 +00:00
lto-plugin Daily bump. 2023-08-08 00:17:37 +00:00
maintainer-scripts Daily bump. 2023-07-08 00:16:53 +00:00
zlib Daily bump. 2023-08-08 00:17:37 +00:00
.dir-locals.el
.gitattributes
.gitignore .gitignore: do not ignore config.h 2022-07-19 17:07:04 +03:00
ABOUT-NLS
ar-lib
ChangeLog Daily bump. 2023-09-19 00:17:49 +00:00
ChangeLog.jit
ChangeLog.tree-ssa
compile
config-ml.in LoongArch: Reimplement multilib build option handling. 2023-09-15 10:42:12 +08:00
config.guess
config.rpath
config.sub
configure LoongArch: Reimplement multilib build option handling. 2023-09-15 10:42:12 +08:00
configure.ac LoongArch: Reimplement multilib build option handling. 2023-09-15 10:42:12 +08:00
COPYING
COPYING.LIB
COPYING.RUNTIME
COPYING3
COPYING3.LIB
depcomp
install-sh
libtool-ldflags
libtool.m4 libtool.m4: augment symcode for Solaris 11 2023-08-07 22:59:41 +02:00
ltgcc.m4
ltmain.sh
ltoptions.m4
ltsugar.m4
ltversion.m4
lt~obsolete.m4
MAINTAINERS MAINTAINERS: Add myself to write after approval 2023-09-18 09:43:54 +00:00
Makefile.def toplevel: Makefile.def: add install-strip dependency on libsframe 2023-08-07 22:59:42 +02:00
Makefile.in Pass 'SYSROOT_CFLAGS_FOR_TARGET' down to target libraries [PR109951] 2023-09-12 11:30:37 +02:00
Makefile.tpl Pass 'SYSROOT_CFLAGS_FOR_TARGET' down to target libraries [PR109951] 2023-09-12 11:30:37 +02:00
missing
mkdep
mkinstalldirs
move-if-change
multilib.am
README
symlink-tree
test-driver
ylwrap

This directory contains the GNU Compiler Collection (GCC).

The GNU Compiler Collection is free software.  See the files whose
names start with COPYING for copying permission.  The manuals, and
some of the runtime libraries, are under different terms; see the
individual source files for details.

The directory INSTALL contains copies of the installation information
as HTML and plain text.  The source of this information is
gcc/doc/install.texi.  The installation information includes details
of what is included in the GCC sources and what files GCC installs.

See the file gcc/doc/gcc.texi (together with other files that it
includes) for usage and porting information.  An online readable
version of the manual is in the files gcc/doc/gcc.info*.

See http://gcc.gnu.org/bugs/ for how to report bugs usefully.

Copyright years on GCC source files may be listed using range
notation, e.g., 1987-2012, indicating that every year in the range,
inclusive, is a copyrightable year that could otherwise be listed
individually.