libstdc++: Implement C++26 std::text_encoding (P1885R12) [PR113318]
This is another C++26 change, approved in Varna 2023. We require a new
static array of data that is extracted from the IANA Character Sets
database. A new Python script to generate a header from the IANA CSV
file is added.
The text_encoding class is basically just a pointer to an {ID,name} pair
in the static array. The aliases view is also just the same pointer (or
empty), and the view's iterator moves forwards and backwards in the
array while the array elements have the same ID (or to one element
further, for a past-the-end iterator).
Because those iterators refer to a global array that never goes out of
scope, there's no reason they should every produce undefined behaviour
or indeterminate values. They should either have well-defined
behaviour, or abort. The overhead of ensuring those properties is pretty
low, so seems worth it.
This means that an aliases_view iterator should never be able to access
out-of-bounds. A non-value-initialized iterator always points to an
element of the static array even when not dereferenceable (the array has
unreachable entries at the start and end, which means that even a
past-the-end iterator for the last encoding in the array still points to
valid memory). Dereferencing an iterator can always return a valid
array element, or "" for a non-dereferenceable iterator (but doing so
will abort when assertions are enabled). In the language being proposed
for C++26, dereferencing an invalid iterator erroneously returns "".
Attempting to increment/decrement past the last/first element in the
view is erroneously a no-op, so aborts when assertions are enabled, and
doesn't change value otherwise.
Similarly, constructing a std::text_encoding with an invalid id (one
that doesn't have the value of an enumerator) erroneously behaves the
same as constructing with id::unknown, or aborts with assertions
enabled.
libstdc++-v3/ChangeLog:
PR libstdc++/113318
* acinclude.m4 (GLIBCXX_CONFIGURE): Add c++26 directory.
(GLIBCXX_CHECK_TEXT_ENCODING): Define.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Use GLIBCXX_CHECK_TEXT_ENCODING.
* include/Makefile.am: Add new headers.
* include/Makefile.in: Regenerate.
* include/bits/locale_classes.h (locale::encoding): Declare new
member function.
* include/bits/unicode.h (__charset_alias_match): New function.
* include/bits/text_encoding-data.h: New file.
* include/bits/version.def (text_encoding): Define.
* include/bits/version.h: Regenerate.
* include/std/text_encoding: New file.
* src/Makefile.am: Add new subdirectory.
* src/Makefile.in: Regenerate.
* src/c++26/Makefile.am: New file.
* src/c++26/Makefile.in: New file.
* src/c++26/text_encoding.cc: New file.
* src/experimental/Makefile.am: Include c++26 convenience
library.
* src/experimental/Makefile.in: Regenerate.
* python/libstdcxx/v6/printers.py (StdTextEncodingPrinter): New
printer.
* scripts/gen_text_encoding_data.py: New file.
* testsuite/22_locale/locale/encoding.cc: New test.
* testsuite/ext/unicode/charset_alias_match.cc: New test.
* testsuite/std/text_encoding/cons.cc: New test.
* testsuite/std/text_encoding/members.cc: New test.
* testsuite/std/text_encoding/requirements.cc: New test.
Reviewed-by: Ulrich Drepper <drepper.fsp@gmail.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
2024-01-15 15:42:50 +00:00
|
|
|
#!/usr/bin/env python3
|
|
|
|
#
|
|
|
|
# Script to generate tables for libstdc++ std::text_encoding.
|
|
|
|
#
|
|
|
|
# This file is part of GCC.
|
|
|
|
#
|
|
|
|
# GCC is free software; you can redistribute it and/or modify it under
|
|
|
|
# the terms of the GNU General Public License as published by the Free
|
|
|
|
# Software Foundation; either version 3, or (at your option) any later
|
|
|
|
# version.
|
|
|
|
#
|
|
|
|
# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
|
|
|
|
# WARRANTY; without even the implied warranty of MERCHANTABILITY or
|
|
|
|
# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
|
|
|
|
# for more details.
|
|
|
|
#
|
|
|
|
# You should have received a copy of the GNU General Public License
|
|
|
|
# along with GCC; see the file COPYING3. If not see
|
|
|
|
# <http://www.gnu.org/licenses/>.
|
|
|
|
|
|
|
|
# To update the Libstdc++ static data in <bits/text_encoding-data.h> download
|
|
|
|
# the latest:
|
|
|
|
# https://www.iana.org/assignments/character-sets/character-sets-1.csv
|
|
|
|
# Then run this script and save the output to
|
|
|
|
# include/bits/text_encoding-data.h
|
|
|
|
|
|
|
|
import sys
|
|
|
|
import csv
|
2024-03-19 12:43:29 +00:00
|
|
|
import os
|
libstdc++: Implement C++26 std::text_encoding (P1885R12) [PR113318]
This is another C++26 change, approved in Varna 2023. We require a new
static array of data that is extracted from the IANA Character Sets
database. A new Python script to generate a header from the IANA CSV
file is added.
The text_encoding class is basically just a pointer to an {ID,name} pair
in the static array. The aliases view is also just the same pointer (or
empty), and the view's iterator moves forwards and backwards in the
array while the array elements have the same ID (or to one element
further, for a past-the-end iterator).
Because those iterators refer to a global array that never goes out of
scope, there's no reason they should every produce undefined behaviour
or indeterminate values. They should either have well-defined
behaviour, or abort. The overhead of ensuring those properties is pretty
low, so seems worth it.
This means that an aliases_view iterator should never be able to access
out-of-bounds. A non-value-initialized iterator always points to an
element of the static array even when not dereferenceable (the array has
unreachable entries at the start and end, which means that even a
past-the-end iterator for the last encoding in the array still points to
valid memory). Dereferencing an iterator can always return a valid
array element, or "" for a non-dereferenceable iterator (but doing so
will abort when assertions are enabled). In the language being proposed
for C++26, dereferencing an invalid iterator erroneously returns "".
Attempting to increment/decrement past the last/first element in the
view is erroneously a no-op, so aborts when assertions are enabled, and
doesn't change value otherwise.
Similarly, constructing a std::text_encoding with an invalid id (one
that doesn't have the value of an enumerator) erroneously behaves the
same as constructing with id::unknown, or aborts with assertions
enabled.
libstdc++-v3/ChangeLog:
PR libstdc++/113318
* acinclude.m4 (GLIBCXX_CONFIGURE): Add c++26 directory.
(GLIBCXX_CHECK_TEXT_ENCODING): Define.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Use GLIBCXX_CHECK_TEXT_ENCODING.
* include/Makefile.am: Add new headers.
* include/Makefile.in: Regenerate.
* include/bits/locale_classes.h (locale::encoding): Declare new
member function.
* include/bits/unicode.h (__charset_alias_match): New function.
* include/bits/text_encoding-data.h: New file.
* include/bits/version.def (text_encoding): Define.
* include/bits/version.h: Regenerate.
* include/std/text_encoding: New file.
* src/Makefile.am: Add new subdirectory.
* src/Makefile.in: Regenerate.
* src/c++26/Makefile.am: New file.
* src/c++26/Makefile.in: New file.
* src/c++26/text_encoding.cc: New file.
* src/experimental/Makefile.am: Include c++26 convenience
library.
* src/experimental/Makefile.in: Regenerate.
* python/libstdcxx/v6/printers.py (StdTextEncodingPrinter): New
printer.
* scripts/gen_text_encoding_data.py: New file.
* testsuite/22_locale/locale/encoding.cc: New test.
* testsuite/ext/unicode/charset_alias_match.cc: New test.
* testsuite/std/text_encoding/cons.cc: New test.
* testsuite/std/text_encoding/members.cc: New test.
* testsuite/std/text_encoding/requirements.cc: New test.
Reviewed-by: Ulrich Drepper <drepper.fsp@gmail.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
2024-01-15 15:42:50 +00:00
|
|
|
|
|
|
|
if len(sys.argv) != 2:
|
|
|
|
print("Usage: %s <character sets csv>" % sys.argv[0], file=sys.stderr)
|
|
|
|
sys.exit(1)
|
|
|
|
|
2024-03-19 12:43:29 +00:00
|
|
|
self = os.path.basename(__file__)
|
|
|
|
print("// Generated by scripts/{}, do not edit.".format(self))
|
|
|
|
print("""
|
2024-02-02 13:20:11 +00:00
|
|
|
|
|
|
|
// Copyright The GNU Toolchain Authors.
|
|
|
|
//
|
|
|
|
// This file is part of the GNU ISO C++ Library. This library is free
|
|
|
|
// software; you can redistribute it and/or modify it under the
|
|
|
|
// terms of the GNU General Public License as published by the
|
|
|
|
// Free Software Foundation; either version 3, or (at your option)
|
|
|
|
// any later version.
|
|
|
|
|
|
|
|
// This library is distributed in the hope that it will be useful,
|
|
|
|
// but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
|
|
// GNU General Public License for more details.
|
|
|
|
|
|
|
|
// Under Section 7 of GPL version 3, you are granted additional
|
|
|
|
// permissions described in the GCC Runtime Library Exception, version
|
|
|
|
// 3.1, as published by the Free Software Foundation.
|
|
|
|
|
|
|
|
// You should have received a copy of the GNU General Public License and
|
|
|
|
// a copy of the GCC Runtime Library Exception along with this program;
|
|
|
|
// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
|
|
|
|
// <http://www.gnu.org/licenses/>.
|
|
|
|
|
|
|
|
/** @file bits/text_encoding-data.h
|
|
|
|
* This is an internal header file, included by other library headers.
|
|
|
|
* Do not attempt to use it directly. @headername{text_encoding}
|
|
|
|
*/
|
|
|
|
""")
|
libstdc++: Implement C++26 std::text_encoding (P1885R12) [PR113318]
This is another C++26 change, approved in Varna 2023. We require a new
static array of data that is extracted from the IANA Character Sets
database. A new Python script to generate a header from the IANA CSV
file is added.
The text_encoding class is basically just a pointer to an {ID,name} pair
in the static array. The aliases view is also just the same pointer (or
empty), and the view's iterator moves forwards and backwards in the
array while the array elements have the same ID (or to one element
further, for a past-the-end iterator).
Because those iterators refer to a global array that never goes out of
scope, there's no reason they should every produce undefined behaviour
or indeterminate values. They should either have well-defined
behaviour, or abort. The overhead of ensuring those properties is pretty
low, so seems worth it.
This means that an aliases_view iterator should never be able to access
out-of-bounds. A non-value-initialized iterator always points to an
element of the static array even when not dereferenceable (the array has
unreachable entries at the start and end, which means that even a
past-the-end iterator for the last encoding in the array still points to
valid memory). Dereferencing an iterator can always return a valid
array element, or "" for a non-dereferenceable iterator (but doing so
will abort when assertions are enabled). In the language being proposed
for C++26, dereferencing an invalid iterator erroneously returns "".
Attempting to increment/decrement past the last/first element in the
view is erroneously a no-op, so aborts when assertions are enabled, and
doesn't change value otherwise.
Similarly, constructing a std::text_encoding with an invalid id (one
that doesn't have the value of an enumerator) erroneously behaves the
same as constructing with id::unknown, or aborts with assertions
enabled.
libstdc++-v3/ChangeLog:
PR libstdc++/113318
* acinclude.m4 (GLIBCXX_CONFIGURE): Add c++26 directory.
(GLIBCXX_CHECK_TEXT_ENCODING): Define.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Use GLIBCXX_CHECK_TEXT_ENCODING.
* include/Makefile.am: Add new headers.
* include/Makefile.in: Regenerate.
* include/bits/locale_classes.h (locale::encoding): Declare new
member function.
* include/bits/unicode.h (__charset_alias_match): New function.
* include/bits/text_encoding-data.h: New file.
* include/bits/version.def (text_encoding): Define.
* include/bits/version.h: Regenerate.
* include/std/text_encoding: New file.
* src/Makefile.am: Add new subdirectory.
* src/Makefile.in: Regenerate.
* src/c++26/Makefile.am: New file.
* src/c++26/Makefile.in: New file.
* src/c++26/text_encoding.cc: New file.
* src/experimental/Makefile.am: Include c++26 convenience
library.
* src/experimental/Makefile.in: Regenerate.
* python/libstdcxx/v6/printers.py (StdTextEncodingPrinter): New
printer.
* scripts/gen_text_encoding_data.py: New file.
* testsuite/22_locale/locale/encoding.cc: New test.
* testsuite/ext/unicode/charset_alias_match.cc: New test.
* testsuite/std/text_encoding/cons.cc: New test.
* testsuite/std/text_encoding/members.cc: New test.
* testsuite/std/text_encoding/requirements.cc: New test.
Reviewed-by: Ulrich Drepper <drepper.fsp@gmail.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
2024-01-15 15:42:50 +00:00
|
|
|
print("#ifndef _GLIBCXX_GET_ENCODING_DATA")
|
|
|
|
print('# error "This is not a public header, do not include it directly"')
|
|
|
|
print("#endif\n")
|
|
|
|
|
2024-01-23 14:57:15 +00:00
|
|
|
# We need to generate a list of initializers of the form { mib, alias }, e.g.,
|
|
|
|
# { 3, "US-ASCII" },
|
|
|
|
# { 3, "ISO646-US" },
|
|
|
|
# { 3, "csASCII" },
|
|
|
|
# { 4, "ISO_8859-1:1987" },
|
|
|
|
# { 4, "latin1" },
|
|
|
|
# The initializers must be sorted by the mib value. The first entry for
|
|
|
|
# a given mib must be the primary name for the encoding. Any aliases for
|
|
|
|
# the encoding come after the primary name.
|
|
|
|
# We also define a macro _GLIBCXX_TEXT_ENCODING_UTF8_OFFSET which is the
|
|
|
|
# offset into the list of the mib=106, alias="UTF-8" entry. This is used
|
|
|
|
# to optimize the common case, so we don't need to search for "UTF-8".
|
libstdc++: Implement C++26 std::text_encoding (P1885R12) [PR113318]
This is another C++26 change, approved in Varna 2023. We require a new
static array of data that is extracted from the IANA Character Sets
database. A new Python script to generate a header from the IANA CSV
file is added.
The text_encoding class is basically just a pointer to an {ID,name} pair
in the static array. The aliases view is also just the same pointer (or
empty), and the view's iterator moves forwards and backwards in the
array while the array elements have the same ID (or to one element
further, for a past-the-end iterator).
Because those iterators refer to a global array that never goes out of
scope, there's no reason they should every produce undefined behaviour
or indeterminate values. They should either have well-defined
behaviour, or abort. The overhead of ensuring those properties is pretty
low, so seems worth it.
This means that an aliases_view iterator should never be able to access
out-of-bounds. A non-value-initialized iterator always points to an
element of the static array even when not dereferenceable (the array has
unreachable entries at the start and end, which means that even a
past-the-end iterator for the last encoding in the array still points to
valid memory). Dereferencing an iterator can always return a valid
array element, or "" for a non-dereferenceable iterator (but doing so
will abort when assertions are enabled). In the language being proposed
for C++26, dereferencing an invalid iterator erroneously returns "".
Attempting to increment/decrement past the last/first element in the
view is erroneously a no-op, so aborts when assertions are enabled, and
doesn't change value otherwise.
Similarly, constructing a std::text_encoding with an invalid id (one
that doesn't have the value of an enumerator) erroneously behaves the
same as constructing with id::unknown, or aborts with assertions
enabled.
libstdc++-v3/ChangeLog:
PR libstdc++/113318
* acinclude.m4 (GLIBCXX_CONFIGURE): Add c++26 directory.
(GLIBCXX_CHECK_TEXT_ENCODING): Define.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Use GLIBCXX_CHECK_TEXT_ENCODING.
* include/Makefile.am: Add new headers.
* include/Makefile.in: Regenerate.
* include/bits/locale_classes.h (locale::encoding): Declare new
member function.
* include/bits/unicode.h (__charset_alias_match): New function.
* include/bits/text_encoding-data.h: New file.
* include/bits/version.def (text_encoding): Define.
* include/bits/version.h: Regenerate.
* include/std/text_encoding: New file.
* src/Makefile.am: Add new subdirectory.
* src/Makefile.in: Regenerate.
* src/c++26/Makefile.am: New file.
* src/c++26/Makefile.in: New file.
* src/c++26/text_encoding.cc: New file.
* src/experimental/Makefile.am: Include c++26 convenience
library.
* src/experimental/Makefile.in: Regenerate.
* python/libstdcxx/v6/printers.py (StdTextEncodingPrinter): New
printer.
* scripts/gen_text_encoding_data.py: New file.
* testsuite/22_locale/locale/encoding.cc: New test.
* testsuite/ext/unicode/charset_alias_match.cc: New test.
* testsuite/std/text_encoding/cons.cc: New test.
* testsuite/std/text_encoding/members.cc: New test.
* testsuite/std/text_encoding/requirements.cc: New test.
Reviewed-by: Ulrich Drepper <drepper.fsp@gmail.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
2024-01-15 15:42:50 +00:00
|
|
|
|
|
|
|
charsets = {}
|
|
|
|
with open(sys.argv[1], newline='') as f:
|
|
|
|
reader = csv.reader(f)
|
|
|
|
next(reader) # skip header row
|
|
|
|
for row in reader:
|
|
|
|
mib = int(row[2])
|
|
|
|
if mib in charsets:
|
|
|
|
raise ValueError("Multiple rows for mibEnum={}".format(mib))
|
|
|
|
name = row[1]
|
|
|
|
aliases = row[5].split()
|
|
|
|
# Ensure primary name comes first
|
|
|
|
if name in aliases:
|
|
|
|
aliases.remove(name)
|
|
|
|
charsets[mib] = [name] + aliases
|
|
|
|
|
2024-01-23 14:57:15 +00:00
|
|
|
# Remove "NATS-DANO" and "NATS-DANO-ADD" as specified by the C++ standard.
|
libstdc++: Implement C++26 std::text_encoding (P1885R12) [PR113318]
This is another C++26 change, approved in Varna 2023. We require a new
static array of data that is extracted from the IANA Character Sets
database. A new Python script to generate a header from the IANA CSV
file is added.
The text_encoding class is basically just a pointer to an {ID,name} pair
in the static array. The aliases view is also just the same pointer (or
empty), and the view's iterator moves forwards and backwards in the
array while the array elements have the same ID (or to one element
further, for a past-the-end iterator).
Because those iterators refer to a global array that never goes out of
scope, there's no reason they should every produce undefined behaviour
or indeterminate values. They should either have well-defined
behaviour, or abort. The overhead of ensuring those properties is pretty
low, so seems worth it.
This means that an aliases_view iterator should never be able to access
out-of-bounds. A non-value-initialized iterator always points to an
element of the static array even when not dereferenceable (the array has
unreachable entries at the start and end, which means that even a
past-the-end iterator for the last encoding in the array still points to
valid memory). Dereferencing an iterator can always return a valid
array element, or "" for a non-dereferenceable iterator (but doing so
will abort when assertions are enabled). In the language being proposed
for C++26, dereferencing an invalid iterator erroneously returns "".
Attempting to increment/decrement past the last/first element in the
view is erroneously a no-op, so aborts when assertions are enabled, and
doesn't change value otherwise.
Similarly, constructing a std::text_encoding with an invalid id (one
that doesn't have the value of an enumerator) erroneously behaves the
same as constructing with id::unknown, or aborts with assertions
enabled.
libstdc++-v3/ChangeLog:
PR libstdc++/113318
* acinclude.m4 (GLIBCXX_CONFIGURE): Add c++26 directory.
(GLIBCXX_CHECK_TEXT_ENCODING): Define.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Use GLIBCXX_CHECK_TEXT_ENCODING.
* include/Makefile.am: Add new headers.
* include/Makefile.in: Regenerate.
* include/bits/locale_classes.h (locale::encoding): Declare new
member function.
* include/bits/unicode.h (__charset_alias_match): New function.
* include/bits/text_encoding-data.h: New file.
* include/bits/version.def (text_encoding): Define.
* include/bits/version.h: Regenerate.
* include/std/text_encoding: New file.
* src/Makefile.am: Add new subdirectory.
* src/Makefile.in: Regenerate.
* src/c++26/Makefile.am: New file.
* src/c++26/Makefile.in: New file.
* src/c++26/text_encoding.cc: New file.
* src/experimental/Makefile.am: Include c++26 convenience
library.
* src/experimental/Makefile.in: Regenerate.
* python/libstdcxx/v6/printers.py (StdTextEncodingPrinter): New
printer.
* scripts/gen_text_encoding_data.py: New file.
* testsuite/22_locale/locale/encoding.cc: New test.
* testsuite/ext/unicode/charset_alias_match.cc: New test.
* testsuite/std/text_encoding/cons.cc: New test.
* testsuite/std/text_encoding/members.cc: New test.
* testsuite/std/text_encoding/requirements.cc: New test.
Reviewed-by: Ulrich Drepper <drepper.fsp@gmail.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
2024-01-15 15:42:50 +00:00
|
|
|
charsets.pop(33, None)
|
|
|
|
charsets.pop(34, None)
|
|
|
|
|
2024-01-23 14:57:15 +00:00
|
|
|
# This is not an official IANA alias, but we include it in the
|
|
|
|
# implementation-defined superset of aliases for US-ASCII.
|
|
|
|
# See also LWG 4043.
|
|
|
|
extra_aliases = {3: ["ASCII"]}
|
|
|
|
|
libstdc++: Implement C++26 std::text_encoding (P1885R12) [PR113318]
This is another C++26 change, approved in Varna 2023. We require a new
static array of data that is extracted from the IANA Character Sets
database. A new Python script to generate a header from the IANA CSV
file is added.
The text_encoding class is basically just a pointer to an {ID,name} pair
in the static array. The aliases view is also just the same pointer (or
empty), and the view's iterator moves forwards and backwards in the
array while the array elements have the same ID (or to one element
further, for a past-the-end iterator).
Because those iterators refer to a global array that never goes out of
scope, there's no reason they should every produce undefined behaviour
or indeterminate values. They should either have well-defined
behaviour, or abort. The overhead of ensuring those properties is pretty
low, so seems worth it.
This means that an aliases_view iterator should never be able to access
out-of-bounds. A non-value-initialized iterator always points to an
element of the static array even when not dereferenceable (the array has
unreachable entries at the start and end, which means that even a
past-the-end iterator for the last encoding in the array still points to
valid memory). Dereferencing an iterator can always return a valid
array element, or "" for a non-dereferenceable iterator (but doing so
will abort when assertions are enabled). In the language being proposed
for C++26, dereferencing an invalid iterator erroneously returns "".
Attempting to increment/decrement past the last/first element in the
view is erroneously a no-op, so aborts when assertions are enabled, and
doesn't change value otherwise.
Similarly, constructing a std::text_encoding with an invalid id (one
that doesn't have the value of an enumerator) erroneously behaves the
same as constructing with id::unknown, or aborts with assertions
enabled.
libstdc++-v3/ChangeLog:
PR libstdc++/113318
* acinclude.m4 (GLIBCXX_CONFIGURE): Add c++26 directory.
(GLIBCXX_CHECK_TEXT_ENCODING): Define.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Use GLIBCXX_CHECK_TEXT_ENCODING.
* include/Makefile.am: Add new headers.
* include/Makefile.in: Regenerate.
* include/bits/locale_classes.h (locale::encoding): Declare new
member function.
* include/bits/unicode.h (__charset_alias_match): New function.
* include/bits/text_encoding-data.h: New file.
* include/bits/version.def (text_encoding): Define.
* include/bits/version.h: Regenerate.
* include/std/text_encoding: New file.
* src/Makefile.am: Add new subdirectory.
* src/Makefile.in: Regenerate.
* src/c++26/Makefile.am: New file.
* src/c++26/Makefile.in: New file.
* src/c++26/text_encoding.cc: New file.
* src/experimental/Makefile.am: Include c++26 convenience
library.
* src/experimental/Makefile.in: Regenerate.
* python/libstdcxx/v6/printers.py (StdTextEncodingPrinter): New
printer.
* scripts/gen_text_encoding_data.py: New file.
* testsuite/22_locale/locale/encoding.cc: New test.
* testsuite/ext/unicode/charset_alias_match.cc: New test.
* testsuite/std/text_encoding/cons.cc: New test.
* testsuite/std/text_encoding/members.cc: New test.
* testsuite/std/text_encoding/requirements.cc: New test.
Reviewed-by: Ulrich Drepper <drepper.fsp@gmail.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
2024-01-15 15:42:50 +00:00
|
|
|
count = 0
|
|
|
|
for mib in sorted(charsets.keys()):
|
|
|
|
names = charsets[mib]
|
|
|
|
if names[0] == "UTF-8":
|
|
|
|
print("#define _GLIBCXX_TEXT_ENCODING_UTF8_OFFSET {}".format(count))
|
|
|
|
for name in names:
|
|
|
|
print(' {{ {:4}, "{}" }},'.format(mib, name))
|
|
|
|
count += len(names)
|
2024-01-23 14:57:15 +00:00
|
|
|
if mib in extra_aliases:
|
|
|
|
names = extra_aliases[mib]
|
|
|
|
for name in names:
|
|
|
|
print(' {{ {:4}, "{}" }}, // libstdc++ extension'.format(mib, name))
|
|
|
|
count += len(names)
|
libstdc++: Implement C++26 std::text_encoding (P1885R12) [PR113318]
This is another C++26 change, approved in Varna 2023. We require a new
static array of data that is extracted from the IANA Character Sets
database. A new Python script to generate a header from the IANA CSV
file is added.
The text_encoding class is basically just a pointer to an {ID,name} pair
in the static array. The aliases view is also just the same pointer (or
empty), and the view's iterator moves forwards and backwards in the
array while the array elements have the same ID (or to one element
further, for a past-the-end iterator).
Because those iterators refer to a global array that never goes out of
scope, there's no reason they should every produce undefined behaviour
or indeterminate values. They should either have well-defined
behaviour, or abort. The overhead of ensuring those properties is pretty
low, so seems worth it.
This means that an aliases_view iterator should never be able to access
out-of-bounds. A non-value-initialized iterator always points to an
element of the static array even when not dereferenceable (the array has
unreachable entries at the start and end, which means that even a
past-the-end iterator for the last encoding in the array still points to
valid memory). Dereferencing an iterator can always return a valid
array element, or "" for a non-dereferenceable iterator (but doing so
will abort when assertions are enabled). In the language being proposed
for C++26, dereferencing an invalid iterator erroneously returns "".
Attempting to increment/decrement past the last/first element in the
view is erroneously a no-op, so aborts when assertions are enabled, and
doesn't change value otherwise.
Similarly, constructing a std::text_encoding with an invalid id (one
that doesn't have the value of an enumerator) erroneously behaves the
same as constructing with id::unknown, or aborts with assertions
enabled.
libstdc++-v3/ChangeLog:
PR libstdc++/113318
* acinclude.m4 (GLIBCXX_CONFIGURE): Add c++26 directory.
(GLIBCXX_CHECK_TEXT_ENCODING): Define.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Use GLIBCXX_CHECK_TEXT_ENCODING.
* include/Makefile.am: Add new headers.
* include/Makefile.in: Regenerate.
* include/bits/locale_classes.h (locale::encoding): Declare new
member function.
* include/bits/unicode.h (__charset_alias_match): New function.
* include/bits/text_encoding-data.h: New file.
* include/bits/version.def (text_encoding): Define.
* include/bits/version.h: Regenerate.
* include/std/text_encoding: New file.
* src/Makefile.am: Add new subdirectory.
* src/Makefile.in: Regenerate.
* src/c++26/Makefile.am: New file.
* src/c++26/Makefile.in: New file.
* src/c++26/text_encoding.cc: New file.
* src/experimental/Makefile.am: Include c++26 convenience
library.
* src/experimental/Makefile.in: Regenerate.
* python/libstdcxx/v6/printers.py (StdTextEncodingPrinter): New
printer.
* scripts/gen_text_encoding_data.py: New file.
* testsuite/22_locale/locale/encoding.cc: New test.
* testsuite/ext/unicode/charset_alias_match.cc: New test.
* testsuite/std/text_encoding/cons.cc: New test.
* testsuite/std/text_encoding/members.cc: New test.
* testsuite/std/text_encoding/requirements.cc: New test.
Reviewed-by: Ulrich Drepper <drepper.fsp@gmail.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
2024-01-15 15:42:50 +00:00
|
|
|
|
|
|
|
# <text_encoding> gives an error if this macro is left defined.
|
|
|
|
# Do this last, so that the generated output is not usable unless we reach here.
|
|
|
|
print("\n#undef _GLIBCXX_GET_ENCODING_DATA")
|