The following testcase ICEs after emitting one pedwarn (about using
__VA_ARGS__ in a place where it shouldn't be used) and one error.
The error is emitted by _cpp_save_parameter when it sees that the node
has already been used earlier. But unlike the other _cpp_save_parameter
caller, which does goto out; if it returns false, this call with explicit
__VA_ARGS__ doesn't, and because it increments the number of parameters etc.
after the error, we then try to unsave it twice.
The following patch fixes it by doing the goto out in that case too,
the macro will then not be considered as variable arguments macro,
but for error recovery I think that is fine.
The other option would be to check, before the other _cpp_save_parameter
caller, whether the node is pfile->spec_nodes.n__VA_ARGS__ and in that case
also error and goto out, but that seems more expensive than this for
the common case that the macro definition is correct.
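The error-recovery idea above can be illustrated with a toy model. This is only a sketch, not the libcpp code: ParseResult, parse_params_sketch and the use of a std::set are hypothetical stand-ins, and the only property shown is that bailing out on the first failed save keeps the parameter count equal to the number of parameters actually saved, so cleanup unsaves each one exactly once.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical result type for the sketch; not a libcpp structure.
struct ParseResult {
  bool ok;                  // false if a save failed and we bailed out
  unsigned nparams;         // parameters counted before bailing out
  std::size_t saved_nodes;  // nodes that cleanup would have to unsave
};

// Toy version of a parameter list parser. A save fails when the name was
// already saved, loosely mirroring the failure described for
// _cpp_save_parameter in the message above.
ParseResult parse_params_sketch(const std::vector<std::string> &names) {
  std::set<std::string> saved;
  unsigned nparams = 0;
  for (const std::string &name : names) {
    if (!saved.insert(name).second) {
      // Equivalent of "goto out": do not bump the count after an error.
      return {false, nparams, saved.size()};
    }
    ++nparams;
  }
  return {true, nparams, saved.size()};
}
```

With a duplicated name the sketch reports failure while nparams and saved_nodes stay equal, which is the invariant the fix restores.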
2025-04-09 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/118674
* macro.cc (parse_params) <case CPP_ELLIPSIS>: If _cpp_save_parameter
failed for __VA_ARGS__, goto out.
* gcc.dg/cpp/pr118674.c: New test.
The libcpp left shift handling implements (partially) the C99-C23
wording where shifts are UB if shift count is negative, or too large,
or shifting left a negative value or shifting left non-negative value
results in something not representable in the result type (in the
preprocessor case that is intmax_t).
libcpp actually implements left shift by negative count as right shifts
by negation of the count and similarly right shifts by negative count
as left shifts by negation (not ok), sets overflow for too large shift
count (ok), doesn't check for negative values on left shift (not ok)
and checks correctly for the non-representable ones otherwise (ok).
Now, C++11 to C++17 has different behavior: whereas in C99-C23 1 << 63
in the preprocessor is invalid, in C++11-17 it is valid, but 3 << 63 is
not. The wording is that left shift of a negative value is UB (like in C)
and signed non-negative left shift is UB if the result isn't representable
in corresponding unsigned type (so uintmax_t for libcpp).
And then C++20 and newer says all left shifts are well defined with the
exception of bad shift counts.
In -fsanitize=undefined we handle these by
/* For signed x << y, in C99 and later, the following:
(unsigned) x >> (uprecm1 - y)
if non-zero, is undefined. */
and
/* For signed x << y, in C++11 to C++17, the following:
x < 0 || ((unsigned) x >> (uprecm1 - y))
if > 1, is undefined. */
Now, we are late in GCC 15 development, so I think making the preprocessor
more strict than it is now is undesirable, so I will defer setting the overflow
flag for shifts by a negative count or left shifts of a negative value.
The following patch just makes some previously incorrectly rejected or
warned cases valid for C++11-17 and even more for C++20 and later.
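The three rule sets described above can be written down as a small predicate. This is a sketch of the language rules as quoted in the -fsanitize=undefined comments, not of what libcpp does (as noted, libcpp does not currently flag negative left operands or negative counts). The names Dialect and signed_lshift_is_undefined are hypothetical, and std::int64_t is assumed to stand in for intmax_t.

```cpp
#include <cstdint>

// Hypothetical names for this sketch only.
enum class Dialect { C99ToC23, Cxx11ToCxx17, Cxx20OrLater };

constexpr unsigned kPrecision = 64;  // assumed width of intmax_t

// Returns true if signed x << y is undefined under the given dialect.
bool signed_lshift_is_undefined(std::int64_t x, unsigned y, Dialect d) {
  if (y >= kPrecision)
    return true;  // bad shift count in every dialect
  const unsigned uprecm1 = kPrecision - 1;
  // Logical shift of the unsigned representation: the bits that would be
  // shifted into or past the sign bit.
  const std::uint64_t top = static_cast<std::uint64_t>(x) >> (uprecm1 - y);
  switch (d) {
  case Dialect::C99ToC23:
    return top != 0;           // result must fit in the signed type
  case Dialect::Cxx11ToCxx17:
    return x < 0 || top > 1;   // result must fit in the unsigned type
  case Dialect::Cxx20OrLater:
    return false;              // only bad shift counts are undefined
  }
  return false;
}
```

For example, 1 << 63 is undefined under the C99-C23 rule but fine under the C++11-17 rule, while 3 << 63 is undefined under both and well defined from C++20 on.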
2025-04-04 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/119391
* expr.cc (num_lshift): Add pfile argument. Don't set num.overflow
for !num.unsignedp in C++20 or later unless n >= precision. For
C++11 to C++17 set it if orig >> (precision - 1 - n) as logical
shift results in value > 1.
(num_binary_op): Pass pfile to num_lshift.
(num_div_op): Likewise.
* g++.dg/cpp/pr119391.C: New test.
This patch addresses an issue in the C preprocessor where incorrect
line number information is generated when processing files with a
large number of lines. The problem arises from improper handling
of location intervals in the line map, particularly when locations
exceed LINE_MAP_MAX_LOCATION_WITH_PACKED_RANGES.
By ensuring that the highest location is not decremented if it
would move to a different ordinary map, this fix resolves
the line number discrepancies observed in certain test cases.
This change improves the accuracy of line number reporting, benefiting
users relying on precise code coverage and debugging information.
libcpp/ChangeLog:
PR preprocessor/108900
* files.cc (_cpp_stack_file): Do not decrement highest_location
across distinct maps.
Signed-off-by: Jeremy Bettis <jbettis@google.com>
Signed-off-by: Yash Shinde <Yash.Shinde@windriver.com>
In r15-4286 I've introduced a typo, part of the change was
- cpp_error (pfile, CPP_DL_ERROR, "'\\o' not followed by '{'");
+ cpp_error (pfile, CPP_DL_ERROR, "%<\\o%> not followed by %<}%>");
which turned { into }. This patch fixes it back.
2025-03-12 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/119202
* charset.cc (convert_oct): Fix up typo in diagnostics about \o
not followed by {.
Now that the #embed paper has been voted in, the following patch
removes the pedwarn for C++26 on it (and adjusts pedwarn warning for
older C++ versions) and predefines __cpp_pp_embed FTM.
Also, the patch changes cpp_error to cpp_pedwarning, guarded for C++ by
-Wc++26-extensions, and for C adds a -Wc11-c23-compat warning
about #embed.
I believe we otherwise implement everything in the paper already,
except I'm really confused by the
[Example:
#embed <data.dat> limit(__has_include("a.h"))
#if __has_embed(<data.dat> limit(__has_include("a.h")))
// ill-formed: __has_include [cpp.cond] cannot appear here
#endif
— end example]
part. My reading of both C23 and C++ with the P1967R14 paper in
is that the first case (#embed with __has_include or __has_embed in its
clauses) is what is clearly invalid and so the ill-formed note should be
for #embed. And the __has_include/__has_embed in __has_embed is actually
questionable.
Both C and C++ have something like
"The identifiers __has_include, __has_embed, and __has_c_attribute
shall not appear in any context not mentioned in this subclause."
or
"The identifiers __has_include and __has_cpp_attribute shall not appear
in any context not mentioned in this subclause."
(into which P1967R14 adds __has_embed) in the conditional inclusion
subclause. #embed is defined in a different one, so using those in there
is invalid (unless "using the rules specified for conditional inclusion"
wording e.g. in limit clause overrides that).
The reason why I think it is fuzzy for __has_embed is that __has_embed
is actually defined in the Conditional inclusion subclause (so that
would mean one can use __has_include, __has_embed and __has_*attribute
in there) but its clauses are described in a different one.
GCC currently accepts
#embed __FILE__ limit (__has_include (<stdarg.h>))
#if __has_embed (__FILE__ limit (__has_include (<stdarg.h>)))
#endif
#embed __FILE__ limit (__has_embed (__FILE__))
#if __has_embed (__FILE__ limit (__has_embed (__FILE__)))
#endif
Note, it isn't just about limit clause, but also about
prefix/suffix/if_empty, except that in those cases the "using the rules
specified for conditional inclusion" doesn't apply.
In any case, I'd hope that can be dealt with incrementally (and should
be handled the same for both C and C++).
2025-02-28 Jakub Jelinek <jakub@redhat.com>
libcpp/
* include/cpplib.h (enum cpp_warning_reason): Add
CPP_W_CXX26_EXTENSIONS enumerator.
* init.cc (lang_defaults): Set embed for GNUCXX26 and CXX26.
* directives.cc (do_embed): Adjust pedwarn wording for embed in C++,
use cpp_pedwarning instead of cpp_error and add CPP_W_C11_C23_COMPAT
warning if cpp_pedwarning hasn't diagnosed anything.
gcc/c-family/
* c.opt (Wc++26-extensions): Add CppReason(CPP_W_CXX26_EXTENSIONS).
* c-cppbuiltin.cc (c_cpp_builtins): Predefine __cpp_pp_embed=202502
for C++26.
gcc/testsuite/
* g++.dg/cpp/embed-1.C: Adjust for pedwarn wording change and don't
expect any error for C++26.
* g++.dg/cpp/embed-2.C: Adjust for pedwarn wording change and don't
expect any warning for C++26.
* g++.dg/cpp26/feat-cxx26.C: Test __cpp_pp_embed value.
* gcc.dg/cpp/embed-17.c: New test.
It seems that tokens_buff_new() has always been allocating the virtual
location buffer 4 times larger than intended, and now that location_t is
64-bit, it is 8 times larger. Fixed.
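A factor of exactly sizeof (location_t) is what one would expect if a byte count were passed where an element count is wanted. The sketch below shows that arithmetic under that assumption; the message does not show the old call, and vec_bytes and overallocation_factor are hypothetical helpers that only mimic the sizeof (T) * N computation an XNEWVEC-style allocator performs.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes requested by an XNEWVEC-style helper for N elements of type T.
template <typename T>
constexpr std::size_t vec_bytes(std::size_t n) {
  return sizeof(T) * n;
}

// How much larger the allocation becomes if the caller passes
// len * sizeof(T) as the element count instead of len (assumed bug shape).
template <typename T>
constexpr std::size_t overallocation_factor(std::size_t len) {
  return vec_bytes<T>(len * sizeof(T)) / vec_bytes<T>(len);
}
```

With a 32-bit element type the factor is 4 and with a 64-bit one it is 8, matching the numbers quoted above.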
libcpp/ChangeLog:
* macro.cc (tokens_buff_new): Fix length argument to XNEWVEC.
Change location_t to be a 64-bit integer instead of a 32-bit integer in
libcpp.
Also included in this change are the two other patches in the original
series which depended on this one; I am committing them all at once in case
it needs to be reverted later:
-Support for 64-bit location_t: gimple parts
The size of struct gimple increased by 8 bytes with the change in size of
location_t from 32- to 64-bit; adjust the WORD markings in the comments
accordingly. It seems that most of the WORD markings were off by one already,
probably not having been updated after a previous reduction in the size of a
gimple, so they have become retroactively correct again, and only a couple
needed adjustment actually.
Also add a comment that there is now 32 bits of unused padding available in
struct gimple for 64-bit hosts.
-Support for 64-bit location_t: Remove -flarge-source-files
The option -flarge-source-files became unnecessary with 64-bit location_t
and harms performance compared to the new default setting, so silently
ignore it.
libcpp/ChangeLog:
* include/cpplib.h (struct cpp_token): Adjust comment about the
struct size.
* include/line-map.h (location_t): Change typedef from 32-bit to 64-bit
integer.
(LINE_MAP_MAX_COLUMN_NUMBER): Increase size to be appropriate for
64-bit location_t.
(LINE_MAP_MAX_LOCATION_WITH_PACKED_RANGES): Likewise.
(LINE_MAP_MAX_LOCATION_WITH_COLS): Likewise.
(LINE_MAP_MAX_LOCATION): Likewise.
(MAX_LOCATION_T): Likewise.
(line_map_suggested_range_bits): Likewise.
(struct line_map): Adjust comment about the struct size.
(struct line_map_macro): Likewise.
(struct line_map_ordinary): Likewise. Rearrange fields to optimize
padding.
gcc/testsuite/ChangeLog:
* g++.dg/diagnostic/pr77949.C: Adapt the test for 64-bit location_t,
when the previously expected failure doesn't actually happen.
* g++.dg/modules/loc-prune-4.C: Adjust the expected output for the
64-bit location_t case.
* gcc.dg/plugin/expensive_selftests_plugin.cc: Don't try to test
the maximum supported column number in 64-bit location_t mode.
* gcc.dg/plugin/location_overflow_plugin.cc: Adjust the base_location
so it can effectively test 64-bit location_t.
gcc/ChangeLog:
* gimple.h (struct gphi): Update word marking comments to reflect
the new size of location_t.
(struct gimple): Likewise. Add a comment about padding.
* common.opt: Mark -flarge-source-files as Ignored.
* common.opt.urls: Regenerate.
* doc/invoke.texi: Remove -flarge-source-files.
* toplev.cc (process_options): Remove support for
-flarge-source-files.
This patch adds similar optimizations to the C++ FE as have been
implemented earlier in the C FE.
The libcpp hunk enables use of CPP_EMBED token even for C++, not just
C; the preprocessor guarantees there is always a CPP_NUMBER CPP_COMMA
before CPP_EMBED and CPP_COMMA CPP_NUMBER after it which simplifies
parsing (unless #embed is more than 2GB, in that case it could be
CPP_NUMBER CPP_COMMA CPP_EMBED CPP_COMMA CPP_EMBED CPP_COMMA CPP_EMBED
CPP_COMMA CPP_NUMBER etc. with each CPP_EMBED covering at most INT_MAX
bytes).
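The token shape described above can be sketched as follows. This is an illustration only: embed_token_shape is a hypothetical helper, token kinds are represented as strings, and the handling of inputs shorter than three bytes is an assumption, since the message only describes the case where a CPP_EMBED token is emitted.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns the sequence of token kinds for an embedded resource of nbytes,
// where each CPP_EMBED token may cover at most max_chunk bytes.
std::vector<std::string> embed_token_shape(std::size_t nbytes,
                                           std::size_t max_chunk) {
  std::vector<std::string> tokens;
  if (nbytes < 3 || max_chunk == 0) {
    // Assumption: too short for CPP_EMBED, emit plain numbers.
    for (std::size_t i = 0; i < nbytes; ++i) {
      if (i != 0)
        tokens.push_back("CPP_COMMA");
      tokens.push_back("CPP_NUMBER");
    }
    return tokens;
  }
  tokens.push_back("CPP_NUMBER");      // first byte
  std::size_t remaining = nbytes - 2;  // bytes between first and last
  while (remaining > 0) {
    const std::size_t chunk = std::min(remaining, max_chunk);
    tokens.push_back("CPP_COMMA");
    tokens.push_back("CPP_EMBED");     // covers `chunk` bytes
    remaining -= chunk;
  }
  tokens.push_back("CPP_COMMA");
  tokens.push_back("CPP_NUMBER");      // last byte
  return tokens;
}
```

A parser consuming this stream can rely on every CPP_EMBED being surrounded by CPP_COMMA tokens, with a CPP_NUMBER at both ends of the whole sequence.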
Similarly to the C patch, this patch parses it into RAW_DATA_CST tree
in the braced initializers (and from there peels into INTEGER_CSTs unless
it is an initializer of an std::byte array or integral array with CHAR_BIT
element precision), parses CPP_EMBED in cp_parser_expression into just
the last INTEGER_CST in it because I think users don't need millions of
-Wunused-value warnings because they did useless
int a = (
#embed "megabyte.dat"
);
and so most of the inner INTEGER_CSTs would be there just for the warning,
and in the rest of contexts (like template argument list, function argument
list, attribute argument list, ...) parses it into a sequence of INTEGER_CSTs
(I wrote range/iterator classes to simplify that).
My dumb
cat embed-11.c
constexpr unsigned char a[] = {
#embed "cc1plus"
};
const unsigned char *b = a;
testcase where cc1plus is 492329008 bytes long when configured
--enable-checking=yes,rtl,extra against recent binutils with .base64 gas
support results in:
time ./xg++ -B ./ -S -O2 embed-11.c
real 0m4.350s
user 0m2.427s
sys 0m0.830s
time ./xg++ -B ./ -c -O2 embed-11.c
real 0m6.932s
user 0m6.034s
sys 0m0.888s
(compared to running out of memory or very long compilation).
On a shorter inclusion,
cat embed-12.c
constexpr unsigned char a[] = {
#embed "xg++"
};
const unsigned char *b = a;
where xg++ is 15225904 bytes long, this takes the following using GCC with
the #embed patchset except for this patch:
time ~/src/gcc/obj36/gcc/xg++ -B ~/src/gcc/obj36/gcc/ -S -O2 embed-12.c
real 0m33.190s
user 0m32.327s
sys 0m0.790s
and with this patch:
time ./xg++ -B ./ -S -O2 embed-12.c
real 0m0.118s
user 0m0.090s
sys 0m0.028s
The patch doesn't change anything on what the first patch in the series
introduces even for C++, namely that #embed is expanded (actually or as if)
into a sequence of literals like
127,69,76,70,2,1,1,3,0,0,0,0,0,0,0,0,2,0,62,0,1,0,0,0,80,211,64,0,0,0,0,0,64,0,0,0,0,0,0,0,8,253
and so each element has int type.
That is how I believe it is in C23, and the different versions of the
C++ P1967 paper specified there some casts, P1967R12 in particular
"Otherwise, the integral constant expression is the value of std::fgetc’s return is cast
to unsigned char."
but please see
https://github.com/llvm/llvm-project/pull/97274#issuecomment-2230929277
comment and whether we really want the preprocessor to preprocess it for
C++ as (or as-if)
static_cast<unsigned char>(127),static_cast<unsigned char>(69),static_cast<unsigned char>(76),static_cast<unsigned char>(70),static_cast<unsigned char>(2),...
i.e. 9 tokens per byte rather than 2, or
(unsigned char)127,(unsigned char)69,...
or
((unsigned char)127),((unsigned char)69),...
etc.
Without a literal suffix for unsigned char constant literals it is horrible,
plus the incompatibility between C and C++. Sure, we could use the magic
form more often for C++ to save the size and do the 9 or how many tokens
form only for the boundary constants and use #embed "." __gnu__::__base64__("...")
for what is in between if there are at least 2 tokens inside of it.
E.g. (unsigned char)127 vs. static_cast<unsigned char>(127) behaves
differently if there is constexpr long long p[] = { ... };
...
#embed __FILE__
[p]
2024-12-06 Jakub Jelinek <jakub@redhat.com>
libcpp/
* files.cc (finish_embed): Use CPP_EMBED even for C++.
gcc/
* tree.h (RAW_DATA_UCHAR_ELT, RAW_DATA_SCHAR_ELT): Define.
gcc/cp/ChangeLog:
* cp-tree.h (class raw_data_iterator): New type.
(class raw_data_range): New type.
* parser.cc (cp_parser_postfix_open_square_expression): Handle
parsing of CPP_EMBED.
(cp_parser_parenthesized_expression_list): Likewise. Use
cp_lexer_next_token_is.
(cp_parser_expression): Handle parsing of CPP_EMBED.
(cp_parser_template_argument_list): Likewise.
(cp_parser_initializer_list): Likewise.
(cp_parser_oacc_clause_tile): Likewise.
(cp_parser_omp_tile_sizes): Likewise.
* pt.cc (tsubst_expr): Handle RAW_DATA_CST.
* constexpr.cc (reduced_constant_expression_p): Likewise.
(raw_data_cst_elt): New function.
(find_array_ctor_elt): Handle RAW_DATA_CST.
(cxx_eval_array_reference): Likewise.
* typeck2.cc (digest_init_r): Emit -Wnarrowing and/or -Wconversion
diagnostics.
(process_init_constructor_array): Handle RAW_DATA_CST.
* decl.cc (maybe_deduce_size_from_array_init): Likewise.
(is_direct_enum_init): Fail for RAW_DATA_CST.
(cp_maybe_split_raw_data): New function.
(consume_init): New function.
(reshape_init_array_1): Add VECTOR_P argument. Handle RAW_DATA_CST.
(reshape_init_array): Adjust reshape_init_array_1 caller.
(reshape_init_vector): Likewise.
(reshape_init_class): Handle RAW_DATA_CST.
(reshape_init_r): Likewise.
gcc/testsuite/
* c-c++-common/cpp/embed-22.c: New test.
* c-c++-common/cpp/embed-23.c: New test.
* g++.dg/cpp/embed-4.C: New test.
* g++.dg/cpp/embed-5.C: New test.
* g++.dg/cpp/embed-6.C: New test.
* g++.dg/cpp/embed-7.C: New test.
* g++.dg/cpp/embed-8.C: New test.
* g++.dg/cpp/embed-9.C: New test.
* g++.dg/cpp/embed-10.C: New test.
* g++.dg/cpp/embed-11.C: New test.
* g++.dg/cpp/embed-12.C: New test.
* g++.dg/cpp/embed-13.C: New test.
* g++.dg/cpp/embed-14.C: New test.
As noted in bug 117162, C23 changed some rules on UCNs to match C++
(this was a late change agreed in the resolution to CD2 comment
US-032, implementing changes from N3124), which we need to implement.
Allow UCNs below 0xa0 outside identifiers for C, with a
pedwarn-if-pedantic before C23 (and a warning with -Wc11-c23-compat)
except for the always-allowed cases of UCNs for $ @ `. Also as part
of that change, do not allow \u0024 in identifiers as equivalent to $
for C23.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
PR c/117162
libcpp/
* include/cpplib.h (struct cpp_options): Add low_ucns.
* init.cc (struct lang_flags, lang_defaults): Add low_ucns.
(cpp_set_lang): Set low_ucns.
* charset.cc (_cpp_valid_ucn): For C, allow UCNs below 0xa0
outside identifiers, with a pedwarn if pedantic before C23 or a
warning with -Wc11-c23-compat. Do not allow \u0024 in identifiers
for C23.
gcc/testsuite/
* gcc.dg/cpp/c17-ucn-1.c, gcc.dg/cpp/c17-ucn-2.c,
gcc.dg/cpp/c17-ucn-3.c, gcc.dg/cpp/c17-ucn-4.c,
gcc.dg/cpp/c23-ucn-2.c, gcc.dg/cpp/c23-ucnid-2.c: New tests.
* c-c++-common/cpp/delimited-escape-seq-3.c,
c-c++-common/cpp/named-universal-char-escape-3.c,
gcc.dg/cpp/c23-ucn-1.c, gcc.dg/cpp/c2y-delimited-escape-seq-3.c:
Update expected messages.
* gcc.dg/cpp/ucs.c: Use -pedantic-errors. Update expected
messages.
I enabled include translation to header units in r15-1104-ga29f481bbcaf2b,
but it seems that patch wasn't sufficient, as any diagnostics in the main
source file would show up as coming from the header instead.
Fixed by setting buffer->file so that we leave the file transition that my
previous patch made us enter. Also, don't push a buffer of newlines; in this
case that messes up line numbers instead of aligning them.
libcpp/ChangeLog:
* files.cc (_cpp_stack_file): Handle -include of header unit more
specially.
gcc/testsuite/ChangeLog:
* g++.dg/modules/dashinclude-1_b.C: Add an #error.
* g++.dg/modules/dashinclude-1_a.H: Remove dg-module-do run.
The PR shows that we ICE after lexing an invalid unterminated raw string,
because lex_raw_string() pops the main buffer unexpectedly. Resolve by
handling this case the same way as for other directives.
libcpp/ChangeLog:
PR preprocessor/117118
* lex.cc (lex_raw_string): Treat an unterminated raw string the same
way for a deferred pragma as is done for other directives.
gcc/testsuite/ChangeLog:
PR preprocessor/117118
* c-c++-common/raw-string-directive-3.c: New test.
* c-c++-common/raw-string-directive-4.c: New test.
libcpp makes use of the cpp_buffer pfile->a_buff to store things while it is
handling macros. It uses it to store pointers (cpp_hashnode*, for macro
arguments) and cpp_macro objects. This works fine because a cpp_hashnode*
and a cpp_macro have the same alignment requirement on either 32-bit or
64-bit systems (namely, the same alignment as a pointer.)
When 64-bit location_t is enabled on a 32-bit system, the alignment
requirement may cease to be the same, because the alignment requirement of a
cpp_macro object changes to that of a uint64_t, which may be larger than that of
a pointer. It's not the case for x86 32-bit, but for example, on sparc, a
pointer has 4-byte alignment while a uint64_t has 8. In that case,
intermixing the two within the same cpp_buffer leads to a misaligned
access. The code path that triggers this is the one in _cpp_commit_buff in
which a hash table with its own allocator (i.e. ggc) is not being used, so
it doesn't happen within the compiler itself, but it happens in the other
libcpp clients, such as genmatch.
Fix that up by ensuring _cpp_commit_buff commits a fully aligned chunk of the
buffer, so it's ready for anything it may be used for next.
Also modify CPP_ALIGN so that it guarantees to return an alignment at least
the size of location_t. Currently it returns the max of a pointer and a
double. I am not aware of any platform where a double may have smaller
alignment than a uint64_t, but it does not hurt to add location_t here to be
sure.
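The alignment computation discussed above can be sketched like this. It is a minimal illustration and not the contents of internal.h: alignment_probe, default_alignment and align_up are hypothetical names, and std::uint64_t is assumed to stand in for a 64-bit location_t.

```cpp
#include <cstddef>
#include <cstdint>

// The offset of `u` is the strictest alignment among the union members,
// so adding a 64-bit member guarantees room for a 64-bit location_t.
struct alignment_probe {
  char c;
  union {
    double d;
    int *p;
    std::uint64_t loc;  // assumed stand-in for location_t
  } u;
};

constexpr std::size_t default_alignment = offsetof(alignment_probe, u);

// Round size up to the next multiple of align (align must be a power of 2).
constexpr std::size_t align_up(std::size_t size,
                               std::size_t align = default_alignment) {
  return (size + (align - 1)) & ~(align - 1);
}
```

Committing a chunk whose length has been rounded with align_up leaves the next free byte suitably aligned for either a pointer or a 64-bit value, whichever is allocated next.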
libcpp/ChangeLog:
* lex.cc (_cpp_commit_buff): Make sure that the buffer is properly
aligned for the next allocation.
* internal.h (struct dummy): Make sure alignment is large enough for
a location_t, just in case.
Prepare libcpp to support 64-bit location_t, without yet making
any functional changes, by adding new typedefs that enable code to be
written such that it works with any size location_t. Update the usage of
line maps within libcpp accordingly.
Subsequent patches will prepare the rest of the codebase similarly, and then
afterwards, location_t will be changed to uint64_t.
libcpp/ChangeLog:
* include/line-map.h (line_map_uint_t): New typedef, the same type
as location_t.
(location_diff_t): New typedef.
(line_map_suggested_range_bits): New constant.
(struct maps_info_ordinary): Change member types from "unsigned int"
to "line_map_uint_t".
(struct maps_info_macro): Likewise.
(struct location_adhoc_data_map): Likewise.
(LINEMAPS_ALLOCATED): Change return type from "unsigned int" to
"line_map_uint_t".
(LINEMAPS_ORDINARY_ALLOCATED): Likewise.
(LINEMAPS_MACRO_ALLOCATED): Likewise.
(LINEMAPS_USED): Likewise.
(LINEMAPS_ORDINARY_USED): Likewise.
(LINEMAPS_MACRO_USED): Likewise.
(linemap_lookup_macro_index): Likewise.
(LINEMAPS_MAP_AT): Change argument type from "unsigned int" to
"line_map_uint_t".
(LINEMAPS_ORDINARY_MAP_AT): Likewise.
(LINEMAPS_MACRO_MAP_AT): Likewise.
(line_map_new_raw): Likewise.
(linemap_module_restore): Likewise.
(linemap_dump): Likewise.
(line_table_dump): Likewise.
(LINEMAPS_LAST_MAP): Add a linemap_assert() for safety.
(SOURCE_COLUMN): Use a cast to ensure correctness if location_t
becomes a 64-bit type.
* line-map.cc (location_adhoc_data_hash): Don't truncate to 32-bit
prematurely when hashing.
(line_maps::get_or_create_combined_loc): Adapt types to support
potentially 64-bit location_t. Use MAX_LOCATION_T rather than a
hard-coded constant.
(line_maps::get_range_from_loc): Adapt types and constants to
support potentially 64-bit location_t.
(line_maps::pure_location_p): Likewise.
(line_maps::get_pure_location): Likewise.
(line_map_new_raw): Likewise.
(LAST_SOURCE_LINE_LOCATION): Likewise.
(linemap_add): Likewise.
(linemap_module_restore): Likewise.
(linemap_line_start): Likewise.
(linemap_position_for_column): Likewise.
(linemap_position_for_line_and_column): Likewise.
(linemap_position_for_loc_and_offset): Likewise.
(linemap_ordinary_map_lookup): Likewise.
(linemap_lookup_macro_index): Likewise.
(linemap_dump): Likewise.
(linemap_dump_location): Likewise.
(linemap_get_file_highest_location): Likewise.
(line_table_dump): Likewise.
(linemap_compare_locations): Avoid signed int overflow in the result.
* macro.cc (num_expanded_macros_counter): Change type of global
variable from "unsigned int" to "line_map_uint_t".
(num_macro_tokens_counter): Likewise.
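The linemap_compare_locations entry above concerns returning an int from a comparison of values that may be wider than int. A minimal sketch of an overflow-free comparison, assuming a 64-bit unsigned location type, could look like the following. sketch_location_t and compare_locations_sketch are hypothetical names, the sign convention is chosen only for illustration, and this is not the libcpp implementation.

```cpp
#include <cstdint>

// Assumed stand-in for a 64-bit location_t.
using sketch_location_t = std::uint64_t;

// Returns a positive value if pre comes before post, zero if they are
// equal, and a negative value otherwise. Comparing instead of subtracting
// avoids truncating or overflowing a wide difference into an int.
int compare_locations_sketch(sketch_location_t pre, sketch_location_t post) {
  if (post > pre)
    return 1;
  if (post < pre)
    return -1;
  return 0;
}
```

A plain subtraction narrowed to int would, for instance, lose a difference of exactly 2^32 between two 64-bit locations, which is the kind of problem a width-independent comparison sidesteps.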
The dependency output for header unit modules is based on the absolute
pathname of the header file, but that's not something that a makefile can
portably refer to. This patch adds a .c++-header-unit target based on the
header name relative to an element of the include path.
libcpp/ChangeLog:
* internal.h (_cpp_get_file_dir): Declare.
* files.cc (_cpp_get_file_dir): New fn.
* mkdeps.cc (make_write): Use it.
gcc/testsuite/ChangeLog:
* g++.dg/modules/dep-4.H: New test.