219634 commits

Author SHA1 Message Date
Jakub Jelinek
5a48e7732d bitintlower: Fix interaction of gimple_assign_copy_p stmts vs. has_single_use [PR119808]
The following testcase is miscompiled, because we emit a CLOBBER in a place
where it shouldn't be emitted.
Before lowering we have:
  b_5 = 0;
  b.0_6 = b_5;
  b.1_1 = (unsigned _BitInt(129)) b.0_6;
...
  <retval> = b_5;
The bitint coalescing assigns the same partition/underlying variable
for both b_5 and b.0_6 (possible because there is a copy assignment)
and of course a different one for b.1_1 (and other SSA_NAMEs in between).
This is -O0 so stmts aren't DCEd and aren't propagated that much etc.
It is -O0 so we also don't try to optimize and omit some names from m_names
and handle multiple stmts at once, so the expansion emits essentially
  bitint.4 = {};
  bitint.4 = bitint.4;
  bitint.2 = cast of bitint.4;
  bitint.4 = CLOBBER;
...
  <retval> = bitint.4;
and the CLOBBER is the problem because bitint.4 is still live afterwards.
We emit the clobbers to improve code generation, but do it only for
(initially) has_single_use SSA_NAMEs (remembered in m_single_use_names)
being used, if they don't have the same partition on the lhs and a few
other conditions.
The problem above is that b.0_6 which is used in the cast has_single_use
and so was in m_single_use_names bitmask and the lhs in that case is
bitint.2, so a different partition.  But there is gimple_assign_copy_p
with SSA_NAME rhs1 and the partitioning special cases those and while
b.0_6 is single use, b_5 has multiple uses.  I believe this ought to be
a problem solely in the case of such copy stmts and their special casing
by the partitioning.  If instead of b.0_6 = b_5; there were
b.0_6 = b_5 + 1; or any other stmt that changes or may change the value,
partitioning couldn't assign the same partition to b.0_6 and b_5 if b_5
is used later, because it couldn't keep two different (or potentially
different) values in the same bitint.N var.  With a copy that is possible
though.

So the following patch fixes it by being more careful when we set
m_single_use_names, don't set it if it is a has_single_use SSA_NAME
but SSA_NAME_DEF_STMT of it is a copy stmt with SSA_NAME rhs1 and that
rhs1 doesn't have single use, or has_single_use but SSA_NAME_DEF_STMT of it
is a copy stmt etc.
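
The rule described above can be modelled in a few lines.  This is only a toy
sketch, not the code from gimple-lower-bitint.cc: SsaName, copy_of and
may_mark_single_use are hypothetical names, and the real pass works on
GIMPLE statements and a bitmap rather than on a linked chain.

```cpp
#include <cassert>

// Toy stand-in for an SSA_NAME: its number of uses and, when it is defined
// by a plain copy "lhs = rhs1;", a pointer to that rhs1 name.
struct SsaName {
  int num_uses;
  const SsaName* copy_of;  // nullptr when the definition is not a copy
};

// Returns true when the name may be recorded as single use.  The name and
// every name it is (transitively) a copy of must have exactly one use,
// otherwise a clobber after the use could kill a value that is still live
// in the shared partition.
bool may_mark_single_use(const SsaName& name) {
  for (const SsaName* n = &name; n != nullptr; n = n->copy_of)
    if (n->num_uses != 1)
      return false;
  return true;
}
```

In the testcase above, b.0_6 has one use but is a copy of b_5, which has
two, so in this model b.0_6 would not be marked.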

Just to make sure it doesn't change code generation too much, I've gathered
statistics on how many times
      if (m_first
          && m_single_use_names
          && m_vars[p] != m_lhs
          && m_after_stmt
          && bitmap_bit_p (m_single_use_names, SSA_NAME_VERSION (op)))
        {
          tree clobber = build_clobber (TREE_TYPE (m_vars[p]),
                                        CLOBBER_STORAGE_END);
          g = gimple_build_assign (m_vars[p], clobber);
          gimple_stmt_iterator gsi = gsi_for_stmt (m_after_stmt);
          gsi_insert_after (&gsi, g, GSI_SAME_STMT);
        }
emits a clobber on
make check-gcc GCC_TEST_RUN_EXPENSIVE=1 RUNTESTFLAGS="--target_board=unix\{-m64,-m32\} GCC_TEST_RUN_EXPENSIVE=1 dg.exp='*bitint* pr112673.c builtin-stdc-bit-*.c pr112566-2.c pr112511.c pr116588.c pr116003.c pr113693.c pr113602.c flex-array-counted-by-7.c' dg-torture.exp='*bitint* pr116480-2.c pr114312.c pr114121.c' dfp.exp=*bitint* i386.exp='pr118017.c pr117946.c apx-ndd-x32-2a.c' vect.exp='vect-early-break_99-pr113287.c' tree-ssa.exp=pr113735.c"
and before this patch it was 41010 clobbers and after it is 40968,
so difference is 42 clobbers, 0.1% fewer.

2025-04-16  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/119808
	* gimple-lower-bitint.cc (gimple_lower_bitint): Don't set
	m_single_use_names bits for SSA_NAMEs which have single use but
	their SSA_NAME_DEF_STMT is a copy from another SSA_NAME which doesn't
	have a single use, or single use which is such a copy etc.

	* gcc.dg/bitint-121.c: New test.
2025-04-16 09:11:06 +02:00
Jesse Huang
fc4099a484 riscv: Fix incorrect gnu property alignment on rv32
Codegen is incorrectly emitting a ".p2align 3" that coerces the
alignment of the .note.gnu.property section from 4 to 8 on rv32.
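
For reference, ".p2align n" requests an alignment of 2^n bytes, so
".p2align 3" means 8 bytes.  The sketch below only illustrates that
arithmetic.  gnu_property_p2align is a hypothetical helper, not the code in
riscv_file_end, and it assumes 8-byte alignment is what rv64 wants.

```cpp
#include <cassert>

// Byte alignment requested by the assembler directive ".p2align n".
constexpr unsigned p2align_bytes(unsigned n) { return 1u << n; }

// Hypothetical helper: the .p2align operand for .note.gnu.property,
// assuming 4-byte alignment on rv32 and 8-byte alignment on rv64.
constexpr unsigned gnu_property_p2align(bool is_rv64) {
  return is_rv64 ? 3u : 2u;
}

static_assert(p2align_bytes(3) == 8, ".p2align 3 is 8-byte alignment");
```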

2025-04-11  Jesse Huang  <jesse.huang@sifive.com>

gcc/ChangeLog

	* config/riscv/riscv.cc (riscv_file_end): Fix .p2align value.

gcc/testsuite/ChangeLog

	* gcc.target/riscv/gnu-property-align-rv32.c: New file.
	* gcc.target/riscv/gnu-property-align-rv64.c: New file.
2025-04-16 14:55:27 +08:00
Kito Cheng
1d9e02bb7e RISC-V: Put jump table in text for large code model
The large code model assumes that data or rodata may be placed far away
from the text section, so we need to put the jump table in the text
section for the large code model.

gcc/ChangeLog:

	* config/riscv/riscv.h (JUMP_TABLES_IN_TEXT_SECTION): Check if
	large code model.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/jump-table-large-code-model.c: New test.
2025-04-16 14:55:02 +08:00
Jakub Jelinek
45a708d7bf testsuite: Add testcase for already fixed PR [PR116093]
This testcase got fixed with r15-9397 PR119722 fix.

2025-04-16  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/116093
	* gcc.dg/bitint-122.c: New test.
2025-04-16 08:44:37 +02:00
Tejas Belagod
31e16c8b75 AArch64: Fix operands order in vec_extract expander
The operand order in the gen_vcond_mask call in the vec_extract pattern is wrong.
Fix the order so that the predicate is operand 3.

Tested and bootstrapped on aarch64-linux-gnu. OK for trunk?

gcc/ChangeLog

	* config/aarch64/aarch64-sve.md (vec_extract<vpred><Vel>): Fix operand
	order to gen_vcond_mask_*.
2025-04-16 11:16:43 +05:30
Alice Carlotti
43cbf049f5 aarch64: Disable sysreg feature gating
This applies to the sysreg read/write intrinsics __arm_[wr]sr*.  It does
not depend on changes to Binutils, because GCC converts recognised
sysreg names to an encoding based form, which is already ungated in Binutils.

We have, however, agreed to make an equivalent change in Binutils (which
would then disable feature gating for sysreg accesses in inline
assembly), but this has not yet been posted upstream.

In the future we may introduce a new flag to re-enable some checking,
but these checks could not be comprehensive because many system
registers depend on architecture features that don't have corresponding
GCC/GAS --march options.  This would also depend on addressing numerous
inconsistencies in the existing list of sysreg feature dependencies.

gcc/ChangeLog:

	* config/aarch64/aarch64.cc
	(aarch64_valid_sysreg_name_p): Remove feature check.
	(aarch64_retrieve_sysreg): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/acle/rwsr-ungated.c: New test.
2025-04-16 02:07:36 +01:00
GCC Administrator
60130b2d33 Daily bump. 2025-04-16 00:18:18 +00:00
Iain Buclaw
c5ffab99a5 d: Fix ICE: type variant differs by TYPE_MAX_VALUE with -g [PR119826]
Forward referenced enum types were never fixed up after the main
ENUMERAL_TYPE was finished.  All flags set are now propagated to all
variants after its mode, size, and alignment have been calculated.

	PR d/119826

gcc/d/ChangeLog:

	* types.cc (TypeVisitor::visit (TypeEnum *)): Propagate flags of main
	enum types to all forward-referenced variants.

gcc/testsuite/ChangeLog:

	* gdc.dg/debug/imports/pr119826b.d: New test.
	* gdc.dg/debug/pr119826.d: New test.
2025-04-16 01:44:27 +02:00
Nathaniel Shead
a6f4178d0d c++: Prune lambda captures from more places [PR119755]
Currently, pruned lambda captures are still left over in the function's
BLOCK and topmost BIND_EXPR; this doesn't cause any issues for normal
compilation, but does break modules streaming as we try to reconstruct a
FIELD_DECL that no longer exists on the type itself.

	PR c++/119755

gcc/cp/ChangeLog:

	* lambda.cc (prune_lambda_captures): Remove pruned capture from
	function's BLOCK_VARS and BIND_EXPR_VARS.

gcc/testsuite/ChangeLog:

	* g++.dg/modules/lambda-10_a.H: New test.
	* g++.dg/modules/lambda-10_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
2025-04-16 09:24:00 +10:00
Jakub Jelinek
674b0875a9 testsuite: Fix up completion-2.c test
The r15-9487 change has added -flto-partition=default, which broke
the completion-2.c testcase because that case is now also printed
during completion.

2025-04-16  Jakub Jelinek  <jakub@redhat.com>

	* gcc.dg/completion-2.c: Expect also -flto-partition=default line.
2025-04-16 00:30:09 +02:00
Tobias Burnus
1ff4a22103 libgomp.texi (gcn, nvptx): Mention self_maps alongside USM
libgomp/ChangeLog:

	* libgomp.texi (gcn, nvptx): Mention self_maps clause
	besides unified_shared_memory in the requirements item.
2025-04-15 23:19:50 +02:00
Qing Zhao
727f330f9a c: Fully fold each parameter for call to .ACCESS_WITH_SIZE [PR119717]
C_MAYBE_CONST_EXPR is a C FE operator that will be removed by c_fully_fold.
In c_fully_fold, it assumes that operands of function calls have already
been folded. However, when we build a call to .ACCESS_WITH_SIZE, all its
operands are not fully folded.  Therefore the C FE specific operator is
passed to the middle-end.

In order to fix this issue, fully fold the parameters before building the
call to .ACCESS_WITH_SIZE.

	PR c/119717

gcc/c/ChangeLog:

	* c-typeck.cc (build_access_with_size_for_counted_by): Fully fold the
	parameters for call to .ACCESS_WITH_SIZE.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr119717.c: New test.
2025-04-15 20:38:17 +00:00
waffl3x
99835bd68e OpenMP: omp.h omp::allocator C++ Allocator interface
The implementation of each allocator is simplified by inheriting from
__detail::__allocator_templ.  At the moment, none of the implementations
diverge in any way, simply passing in the allocator handle to be used when
an allocation is made.  In the future, const_mem will need special handling
added to it to support constant memory space.
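
The inheritance scheme described above might look roughly like the
following.  This is a sketch under stated assumptions and not the omp.h
code: mem_handle stands in for omp_allocator_handle_t, std::malloc stands in
for the OpenMP allocation call, and allocator_templ, alloc_count and the
two derived allocators are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical stand-in for omp_allocator_handle_t.
enum class mem_handle { default_mem, low_lat_mem };

// Per-handle allocation counter, only here to make the example observable.
inline std::size_t& alloc_count(mem_handle h) {
  static std::size_t counts[2] = {0, 0};
  return counts[static_cast<int>(h)];
}

// Shared base: all behaviour lives here, parameterized on the handle.
template <typename T, mem_handle Handle>
struct allocator_templ {
  using value_type = T;

  T* allocate(std::size_t n) {
    ++alloc_count(Handle);
    // A real implementation would pass Handle to the OpenMP runtime here.
    void* p = std::malloc(n * sizeof(T));
    if (!p) throw std::bad_alloc();
    return static_cast<T*>(p);
  }
  void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// Allocators of the same handle are interchangeable.
template <typename T, typename U, mem_handle H>
bool operator==(const allocator_templ<T, H>&, const allocator_templ<U, H>&) {
  return true;
}
template <typename T, typename U, mem_handle H>
bool operator!=(const allocator_templ<T, H>&, const allocator_templ<U, H>&) {
  return false;
}

// Each concrete allocator only selects a handle.
template <typename T>
struct default_mem : allocator_templ<T, mem_handle::default_mem> {
  default_mem() = default;
  template <typename U> default_mem(const default_mem<U>&) noexcept {}
};

template <typename T>
struct low_lat_mem : allocator_templ<T, mem_handle::low_lat_mem> {
  low_lat_mem() = default;
  template <typename U> low_lat_mem(const low_lat_mem<U>&) noexcept {}
};
```

With these definitions a container such as
std::vector<int, low_lat_mem<int>> routes its allocations through the
low_lat_mem handle without any per-allocator logic.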

libgomp/ChangeLog:

	* omp.h.in: Add omp::allocator::* and ompx::allocator::* allocators.
	(__detail::__allocator_templ<T, omp_allocator_handle_t>):
	New struct template.
	(null_allocator<T>): New struct template.
	(default_mem<T>): Likewise.
	(large_cap_mem<T>): Likewise.
	(const_mem<T>): Likewise.
	(high_bw_mem<T>): Likewise.
	(low_lat_mem<T>): Likewise.
	(cgroup_mem<T>): Likewise.
	(pteam_mem<T>): Likewise.
	(thread_mem<T>): Likewise.
	(ompx::allocator::gnu_pinned_mem<T>): Likewise.
	* testsuite/libgomp.c++/allocator-1.C: New test.
	* testsuite/libgomp.c++/allocator-2.C: New test.

Signed-off-by: waffl3x <waffl3x@baylibre.com>
2025-04-15 14:34:38 -06:00
H.J. Lu
5ed2fa4768 x86: Update gcc.target/i386/apx-interrupt-1.c
ix86_add_cfa_restore_note omits the REG_CFA_RESTORE REG note for registers
pushed in red-zone.  Since

commit 0a074b8c7e
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Apr 13 12:20:42 2025 -0700

    APX: Don't use red-zone with 32 GPRs and no caller-saved registers

disabled red-zone, update gcc.target/i386/apx-interrupt-1.c to expect
31 .cfi_restore directives.

	PR target/119784
	* gcc.target/i386/apx-interrupt-1.c: Expect 31 .cfi_restore
	directives.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-04-15 12:31:05 -07:00
Sandra Loosemore
d91aab4dd6 Docs: Address -fivopts, -O0, and -Q confusion [PR71094]
There's a blurb at the top of the "Optimize Options" node telling
people that most optimization options are completely disabled at -O0
and a similar blurb in the entry for -Og, but nothing at the entry for
-O0.  Since this is a continuing point of confusion it seems wise to
duplicate the information in all the places users are likely to look
for it.

gcc/ChangeLog
	PR tree-optimization/71094
	* doc/invoke.texi (Optimize Options): Document that -fivopts is
	enabled at -O1 and higher.  Add blurb about -O0 causing GCC to
	completely ignore most optimization options.
2025-04-15 19:18:27 +00:00
Jason Merrill
628aecb050 c++: constexpr, trivial, and non-alias target [PR111075]
On Darwin and other targets with !can_alias_cdtor, we instead go to
maybe_thunk_ctor, which builds a thunk function that calls the general
constructor.  And then cp_fold tries to constant-evaluate that call, and we
ICE because we don't expect to ever be asked to constant-evaluate a call to
a trivial function.

No new test because this fixes g++.dg/torture/tail-padding1.C on affected
targets.

	PR c++/111075

gcc/cp/ChangeLog:

	* constexpr.cc (cxx_eval_call_expression): Allow trivial
	call from a thunk.
2025-04-15 15:14:57 -04:00
Iain Sandoe
7f56a8e8ad configure, Darwin: Recognise new naming for Xcode ld.
The latest editions of Xcode have altered the identity reported by 'ld -v'
(again).  This means that GCC configure no longer detects the version.

Fixed by adding the new name to the set checked.

gcc/ChangeLog:

	* configure: Regenerate.
	* configure.ac: Recognise PROJECT:ld-mmmm.nn.aa as an identifier
	for Darwin's static linker.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2025-04-15 19:26:02 +01:00
Iain Sandoe
9cf6b52d04 includes, Darwin: Handle modular use for macOS SDKs [PR116827].
Recent changes to the OS SDKs have altered the way in which include guards
are used for a number of headers when C++ modules are enabled.  Instead of
placing the guards in the included header, they are being placed in the
including header.  This breaks the assumptions in the current GCC stddef.h
specifically, that the presence of __PTRDIFF_T and __SIZE_T means that the
relevant defs are already made.  However in the case of the module-enabled
C++ with these SDKs, that is no longer true.

stddef.h has a large body of special-cases already, but it seems that the
only viable solution here is to add a new one specifically for __APPLE__
and modular code.

This fixes around 280 new fails in the modules test-suite; it is needed on
all open branches that support modules.

	PR target/116827

gcc/ChangeLog:

	* ginclude/stddef.h: Undefine __PTRDIFF_T and __SIZE_T for module-
	enabled c++ on Darwin/macOS platforms.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2025-04-15 19:25:07 +01:00
Kyrylo Tkachov
5621b3b5c9 Regenerate common.opt.urls
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>

	* common.opt.urls: Regenerate.
2025-04-15 20:06:49 +02:00
Richard Biener
248e228fec cobol/119302 - transform gcobol.3 name during install, install as gcobol-io.3
The following installs gcobol.3 as gcobol-io.3 and applies
program-transform-name to the gcobol-io part.  This follows
naming of the pdf and the html variants.
It also uses $(man1ext) and $(man3ext) consistently.

	PR cobol/119302
gcc/cobol/
	* Make-lang.in (GCOBOLIO_INSTALL_NAME): Define.
	Use $(GCOBOLIO_INSTALL_NAME) for gcobol.3 manpage source
	upon install.
2025-04-15 19:31:42 +02:00
Jan Hubicka
4a01869b96 Set znver5 issue rate to 4.
This patch sets the issue rate of znver5 to 4.  With the current model, unless a reservation is
missing, we will never issue more than 4 instructions per cycle since that is the limit
of the decoders and the model does not take into account the fact that code is typically run
from the op cache.

gcc/ChangeLog:

	* config/i386/x86-tune-sched.cc (ix86_issue_rate): Set
	to 4 for znver5.
2025-04-15 19:30:02 +02:00
Jan Hubicka
e2011ab13d Set ADDSS cost to 3 for znver5
Znver5 has an addss latency of 2 in the typical case, while all earlier versions have a latency of 3.
Unfortunately the addss cost is used to cost many other SSE instructions than just addss, and
setting the cost to 2 makes us vectorize 4 64-bit stores into one 256-bit store, which
in turn regresses imagemagick.

This patch sets the cost back to 3.  Next stage1 we can untie addss from the other operations
and set it correctly.

bootstrapped/regtested x86_64-linux and also benchmarked on SPEC2k17

gcc/ChangeLog:

	PR target/119298
	* config/i386/x86-tune-costs.h (znver5_cost): Set ADDSS cost to 3.
2025-04-15 19:04:15 +02:00
Jonathan Wakely
25775e73ea libstdc++: Do not define __cpp_lib_ranges_iota in <ranges>
In r14-7153-gadbc46942aee75 we removed a duplicate definition of
__glibcxx_want_ranges_iota from <ranges>, but __cpp_lib_ranges_iota
should not be defined in <ranges> at all.

libstdc++-v3/ChangeLog:

	* include/std/ranges (__glibcxx_want_ranges_iota): Do not
	define.
2025-04-15 17:34:34 +01:00
Jonathan Wakely
df59bf20d8 libstdc++: Do not declare namespace ranges in <numeric> unconditionally
Move namespace ranges inside the feature test macro guard, because
'ranges' is not a reserved name before C++20.

libstdc++-v3/ChangeLog:

	* include/std/numeric (ranges): Only declare namespace for C++23
	and later.
	(ranges::iota_result): Fix indentation.
	* testsuite/17_intro/names.cc: Check ranges is not used as an
	identifier before C++20.
2025-04-15 17:34:34 +01:00
Vineet Gupta
edb4867412 RISC-V: vsetvl: elide abnormal edges from LCM computations [PR119533]
vsetvl phase4 uses LCM guided info to insert VSETVL insns, including a
straggler loop for "missing vsetvls" on certain edges. Currently it
asserts on encountering EDGE_ABNORMAL.

When enabling go frontend with V enabled, libgo build hits the assert.

The solution is to prevent abnormal edges from getting into LCM at all
(my prior attempt at this just ignored them after LCM which is not
right). Existing invalid_opt_bb_p () currently does this for BB predecessors
but not for successors which is what the patch adds.
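
The shape of the check described above can be sketched as follows.  This is
a toy model, not the code in riscv-vsetvl.cc: Edge, Block, has_abnormal_edge
and invalid_for_lcm are hypothetical names, and the real predicate may test
further conditions.

```cpp
#include <cassert>
#include <vector>

// Toy CFG pieces: an edge only records whether it is abnormal.
struct Edge { bool abnormal; };
struct Block {
  std::vector<Edge> preds;
  std::vector<Edge> succs;
};

bool has_abnormal_edge(const std::vector<Edge>& edges) {
  for (const Edge& e : edges)
    if (e.abnormal)
      return true;
  return false;
}

// A block is kept out of the LCM problem when any incoming edge is abnormal
// (the existing check) or any outgoing edge is abnormal (the added check).
bool invalid_for_lcm(const Block& bb) {
  return has_abnormal_edge(bb.preds) || has_abnormal_edge(bb.succs);
}
```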

Crucially, the ICE/fix also depends on avoiding vsetvl hoisting past
non-transparent blocks: That is taken care of by Robin's patch
"RISC-V: Do not lift up vsetvl into non-transparent blocks [PR119547]"
for a different yet related issue.

Reported-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

	PR target/119533

gcc/ChangeLog:

	* config/riscv/riscv-vsetvl.cc (invalid_opt_bb_p): Check for
	EDGE_ABNORMAL.
	(pre_vsetvl::compute_lcm_local_properties): Initialize kill
	bitmap.
	Debug dump skipped edge.

gcc/testsuite/ChangeLog:

	* go.dg/pr119533-riscv.go: New test.
	* go.dg/pr119533-riscv-2.go: New test.
2025-04-15 09:29:08 -07:00
Robin Dapp
517f7e3f02 RISC-V: Do not lift up vsetvl into non-transparent blocks [PR119547].
When lifting up a vsetvl into a block we currently don't consider the
block's transparency with respect to the vsetvl as in other parts of the
pass.  This patch does not perform the lift when transparency is not
guaranteed.

This condition is more restrictive than necessary as we can still
perform a vsetvl lift if the conflicting register is only ever used
in vsetvls and no regular insns but given how late we are in the GCC 15
cycle it seems better to defer this.  Therefore
gcc.target/riscv/rvv/vsetvl/avl_single-68.c is XFAILed for now.

This issue was found in OpenCV where it manifests as a runtime error.
Zhijin Zeng debugged PR119547 and provided an initial patch.

Reported-By: 曾治金 <zhijin.zeng@spacemit.com>

	PR target/119547

gcc/ChangeLog:

	* config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info):
	Do not perform lift if block is not transparent.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/vsetvl/avl_single-68.c: xfail.
	* g++.target/riscv/rvv/autovec/pr119547.C: New test.
	* g++.target/riscv/rvv/autovec/pr119547-2.C: New test.
	* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-10.c: Adjust.
2025-04-15 17:20:59 +02:00
Tomasz Kamiński
f62e5d720d libstdc++: Implement formatter for ranges and range_formatter [PR109162]
This patch implements formatter specialization for input_ranges and
range_formatter class from P2286R8, as adjusted by P2585R1. The formatter
for pair/tuple is not yet provided, making maps not formattable.

This introduces a new _M_format_range member to the internal __formatter_str
that formats a range of _CharT as a string, according to the format spec.
This function transforms any contiguous range into a basic_string_view directly,
computing the size if necessary. Otherwise, for ranges whose size can be
computed (forward_range or sized_range) we use a stack buffer, if they are
sufficiently small. Finally, we create a basic_string<_CharT> from the range
and format its content.

When padding is specified, it is handled by first formatting
the content of the range to a temporary string object. However, this can
only be implemented if the iterator of the basic_format_context is the internal
type-erased iterator used by the implementation. Otherwise a new basic_format_context
would need to be created, which would require rebinding of handles stored in
the arguments: note that the format spec for the element type could retrieve any format
argument from the format context, visit it, and use a handle to format it.
As basic_format_context provides no user-facing constructor, users are not able
to construct an object of that type with arbitrary iterators.

The signatures of the user-facing parse and format methods of the provided
formatters deviate from the standard by constraining the types of parameters:
* _CharT is constrained to __formatter::__char
* basic_format_parse_context<_CharT> for the parse argument
* basic_format_context<_Out, _CharT> for the second format argument
The standard specifies the three above as unconstrained types. These types
are later passed to possibly user-provided formatter specializations, which are
required via the formattable concept to only accept the above types.

Finally, the formatter<input_range, _CharT> specialization is implemented
without using specialization of range-default-formatter exposition only
template as base class, while providing same functionality.

	PR libstdc++/109162

libstdc++-v3/ChangeLog:

	* include/std/format (__format::__has_debug_format, _Pres_type::_Pres_seq)
	(_Pres_type::_Pres_str, __format::__Stackbuf_size): Define.
	(_Separators::_S_squares, _Separators::_S_parens, _Separators::_S_comma)
	(_Separators::_S_colon): Define additional constants.
	(_Spec::_M_parse_fill_and_align): Define overload accepting
	list of excluded characters for fill, and forward existing overload.
	(__formatter_str::_M_format_range): Define.
	(__format::_Buf_sink): Use __Stackbuf_size for size of array.
	(__format::__is_map_formattable, std::range_formatter)
	(std::formatter<_Rg, _CharT>): Define.
	* src/c++23/std.cc.in (std::format_kind, std::range_format)
	(std::range_formatter): Export.
	* testsuite/std/format/formatter/lwg3944.cc: Guarded tests with
	__glibcxx_format_ranges.
	* testsuite/std/format/formatter/requirements.cc: Adjusted for standard
	behavior.
	* testsuite/23_containers/vector/bool/format.cc: Test vector<bool> formatting.
	* testsuite/std/format/ranges/format_kind.cc: New test.
	* testsuite/std/format/ranges/formatter.cc: New test.
	* testsuite/std/format/ranges/sequence.cc: New test.
	* testsuite/std/format/ranges/string.cc: New test.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
2025-04-15 17:09:59 +02:00
Andreas Schwab
a039bab957 libgcobol: mark riscv64-*-linux* as supported target
	* configure.tgt: Set LIBGCOBOL_SUPPORTED for riscv64-*-linux* with
	64-bit multilib.
2025-04-15 16:59:03 +02:00
Tobias Burnus
99cd28c473 Fortran/OpenMP: Support automatic mapping allocatable components (deep mapping)
When mapping an allocatable variable (or derived-type component), explicitly
or implicitly, all its allocated allocatable components will automatically be
mapped. The patch implements the target hooks, added for this feature to
omp-low.cc with commit r15-3895-ge4a58b6f28383c.

Namely, there is a check whether there are allocatable components at all:
gfc_omp_deep_mapping_p. Then gfc_omp_deep_mapping_cnt, counting the number
of required mappings; this is a dynamic value as it depends on array
bounds and whether an allocatable is allocated or not.
And, finally, the actual mapping: gfc_omp_deep_mapping.
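
The three steps above (check, count, map) can be illustrated with a toy
model.  This is only a sketch and has nothing to do with the tree-building
code in trans-openmp.cc: Derived, Mapping and the three functions are
hypothetical, std::optional stands in for an allocatable component, and
only the allocation status, not array bounds, influences the count here.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy derived type with two allocatable components; an empty optional
// models an unallocated component.
struct Derived {
  std::optional<std::vector<double>> a;
  std::optional<std::vector<int>> b;
};

// Step 1: does the type have allocatable components at all?  (Static.)
constexpr bool deep_mapping_p() { return true; }

// Step 2: how many extra mappings are needed?  (Dynamic: depends on which
// components are currently allocated.)
std::size_t deep_mapping_cnt(const Derived& d) {
  return (d.a ? 1u : 0u) + (d.b ? 1u : 0u);
}

struct Mapping {
  const void* addr;
  std::size_t bytes;
};

// Step 3: produce one mapping per allocated component.
std::vector<Mapping> deep_mapping(const Derived& d) {
  std::vector<Mapping> maps;
  maps.reserve(deep_mapping_cnt(d));
  if (d.a) maps.push_back({d.a->data(), d.a->size() * sizeof(double)});
  if (d.b) maps.push_back({d.b->data(), d.b->size() * sizeof(int)});
  return maps;
}
```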

Polymorphic variables are partially supported: the mapping of the _data
component is fully supported, but only components of the declared type
are processed for additional allocatables. Additionally, _vptr is not
touched. This means that everything needing _vtab information requires
unified shared memory; in particular, _size data is required when
accessing elements of polymorphic arrays.
However, for scalar arrays, accessing components of the declared type
should work just fine.

As polymorphic variables are not (really) supported and OpenMP 6
explicitly disallows them, there is now a warning (-Wopenmp) when
they are encountered. Unlimited polymorphics are rejected (error).

Additionally, PRIVATE and FIRSTPRIVATE are not quite supported for
allocatable components, polymorphic components, and polymorphic
variables. Thus, those are now rejected as well.

gcc/fortran/ChangeLog:

	* f95-lang.cc (LANG_HOOKS_OMP_DEEP_MAPPING,
	LANG_HOOKS_OMP_DEEP_MAPPING_P, LANG_HOOKS_OMP_DEEP_MAPPING_CNT):
	Define.
	* openmp.cc (gfc_match_omp_clause_reduction): Fix location setting.
	(resolve_omp_clauses): Permit allocatable components, reject
	them and polymorphic variables in PRIVATE/FIRSTPRIVATE.
	* trans-decl.cc (add_clause): Set clause location.
	* trans-openmp.cc (gfc_has_alloc_comps): Add ptr_ok and
	shallow_alloc_only Boolean arguments.
	(gfc_omp_replace_alloc_by_to_mapping): New.
	(gfc_omp_private_outer_ref, gfc_walk_alloc_comps,
	gfc_omp_clause_default_ctor, gfc_omp_clause_copy_ctor,
	gfc_omp_clause_assign_op, gfc_omp_clause_dtor): Update call to it.
	(gfc_omp_finish_clause): Minor cleanups, improve location data,
	handle allocatable components.
	(gfc_omp_deep_mapping_map, gfc_omp_deep_mapping_item,
	gfc_omp_deep_mapping_comps, gfc_omp_gen_simple_loop,
	gfc_omp_get_array_size, gfc_omp_elmental_loop,
	gfc_omp_deep_map_kind_p, gfc_omp_deep_mapping_int_p,
	gfc_omp_deep_mapping_p, gfc_omp_deep_mapping_do,
	gfc_omp_deep_mapping_cnt, gfc_omp_deep_mapping): New.
	(gfc_trans_omp_array_section): Save array descriptor in case
	deep-mapping lang hook will need it.
	(gfc_trans_omp_clauses): Likewise; use better clause location data.
	* trans.h (gfc_omp_deep_mapping_p, gfc_omp_deep_mapping_cnt,
	gfc_omp_deep_mapping): Add function prototypes.

libgomp/ChangeLog:

	* libgomp.texi (5.0 Impl. Status): Mark mapping alloc comps as 'Y'.
	* testsuite/libgomp.fortran/allocatable-comp.f90: New test.
	* testsuite/libgomp.fortran/map-alloc-comp-3.f90: New test.
	* testsuite/libgomp.fortran/map-alloc-comp-4.f90: New test.
	* testsuite/libgomp.fortran/map-alloc-comp-5.f90: New test.
	* testsuite/libgomp.fortran/map-alloc-comp-6.f90: New test.
	* testsuite/libgomp.fortran/map-alloc-comp-7.f90: New test.
	* testsuite/libgomp.fortran/map-alloc-comp-8.f90: New test.
	* testsuite/libgomp.fortran/map-alloc-comp-9.f90: New test.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/map-alloc-comp-1.f90: Remove dg-error.
	* gfortran.dg/gomp/polymorphic-mapping-2.f90: Update warn wording.
	* gfortran.dg/gomp/polymorphic-mapping.f90: Change expected
	diagnostic; some tests moved to ...
	* gfortran.dg/gomp/polymorphic-mapping-1.f90: ... here as new test.
	* gfortran.dg/gomp/polymorphic-mapping-3.f90: New test.
	* gfortran.dg/gomp/polymorphic-mapping-4.f90: New test.
	* gfortran.dg/gomp/polymorphic-mapping-5.f90: New test.
2025-04-15 16:42:42 +02:00
Kyrylo Tkachov
6d9fdf4bf5 Locality cloning pass: -fipa-reorder-for-locality
Implement partitioning and cloning in the callgraph to help locality.
A new -fipa-reorder-for-locality flag is used to enable this.
The optimization has two components:
* Partitioning the callgraph so as to group callers and callees that frequently
call each other in the same partition
* Cloning functions that straddle multiple callchains and allowing each clone
to be local to the partition of its callchain.

The majority of the logic is in the new IPA pass in ipa-locality-cloning.cc.
It creates a partitioning plan and does the prerequisite cloning.
The partitioning is then implemented during the existing LTO partitioning pass.

To guide these locality heuristics we use PGO data.
In the absence of PGO data we use a static heuristic that uses the accumulated
estimated edge frequencies of the callees for each function to guide the
reordering.
We are investigating some more elaborate static heuristics, in particular using
the demangled C++ names to group template instantiations together.
This is promising, but we are currently working out some kinks in the implementation
and want to send that out as a follow-up once we're more confident
in it.

A new bootstrap-lto-locality bootstrap config is added that allows us to test
this on GCC itself with either static or PGO heuristics.
GCC bootstraps with both (normal LTO bootstrap and profiledbootstrap).

As this new pass enables a new partitioning scheme it is incompatible with
explicit -flto-partition= options so an error is introduced when the user
uses both flags explicitly.

With this optimization we are seeing good performance gains on some large
internal workloads that stress the parts of the processor that are sensitive
to code locality, but we'd appreciate wider performance evaluation.

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for mainline?
Thanks,
Kyrill

Signed-off-by: Prachi Godbole <pgodbole@nvidia.com>
Co-authored-by: Kyrylo Tkachov <ktkachov@nvidia.com>

config/ChangeLog:

	* bootstrap-lto-locality.mk: New file.

gcc/ChangeLog:

	* Makefile.in (OBJS): Add ipa-locality-cloning.o.
	* cgraph.h (set_new_clone_decl_and_node_flags): Declare prototype.
	* cgraphclones.cc (set_new_clone_decl_and_node_flags): Remove static
	qualifier.
	* common.opt (fipa-reorder-for-locality): New flag.
	(LTO_PARTITION_DEFAULT): Declare.
	(flto-partition): Change default to LTO_PARTITION_DEFAULT.
	* doc/invoke.texi: Document -fipa-reorder-for-locality.
	* flag-types.h (enum lto_locality_cloning_model): Declare.
	(lto_partitioning_model): Add LTO_PARTITION_DEFAULT.
	* lto-cgraph.cc (lto_set_symtab_encoder_in_partition): Add dumping of
	node and index.
	* opts.cc (validate_ipa_reorder_locality_lto_partition): Define.
	(finish_options): Handle LTO_PARTITION_DEFAULT.
	* params.opt (lto_locality_cloning_model): New enum.
	(lto-partition-locality-cloning): New param.
	(lto-partition-locality-frequency-cutoff): Likewise.
	(lto-partition-locality-size-cutoff): Likewise.
	(lto-max-locality-partition): Likewise.
	* passes.def: Register pass_ipa_locality_cloning.
	* timevar.def (TV_IPA_LC): New timevar.
	* tree-pass.h (make_pass_ipa_locality_cloning): Declare.
	* ipa-locality-cloning.cc: New file.
	* ipa-locality-cloning.h: New file.

gcc/lto/ChangeLog:

	* lto-partition.cc (add_node_references_to_partition): Define.
	(create_partition): Likewise.
	(lto_locality_map): Likewise.
	(lto_promote_cross_file_statics): Add extra dumping.
	* lto-partition.h (lto_locality_map): Declare prototype.
	* lto.cc (do_whole_program_analysis): Handle
	flag_ipa_reorder_for_locality.
2025-04-15 16:35:44 +02:00
Martin Jambor
b4cf69503b ipa-bit-cp: Fix adjusting value according to mask (PR119803)
In my fix for PR 119318 I put mask calculation in
ipcp_bits_lattice::meet_with_1 above a final fix to value so that all
the bits in the value which are meaningless according to mask have
value zero, which has tripped a validator in PR 119803.  This patch
fixes that by moving the adjustment down.

Even though the fix for PR 119318 did a similar thing in
ipcp_bits_lattice::meet_with, the same is not necessary because that
code path then feeds the new value and mask to
ipcp_bits_lattice::set_to_constant which does the final adjustment
correctly.

In both places, however, Jakub proposed a better way of calculating
cap_mask and so I have changed it accordingly.
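
The ordering issue can be illustrated with a small model.  This is a sketch
using 64-bit integers rather than widest_int and is not the ipa-cp.cc code.
BitsLattice, cap_mask_for and meet are hypothetical names, and the model
assumes that a set mask bit means "unknown" and that bits at or above the
precision are treated as unknown.

```cpp
#include <cassert>
#include <cstdint>

// Toy lattice: a set bit in mask means the bit is unknown; value holds the
// known bits and must be zero wherever mask is set.
struct BitsLattice {
  std::uint64_t value;
  std::uint64_t mask;
};

// All bits at or above `precision` set, all bits below it clear.
std::uint64_t cap_mask_for(unsigned precision) {
  return precision >= 64 ? 0 : ~std::uint64_t{0} << precision;
}

BitsLattice meet(BitsLattice cur, std::uint64_t value, std::uint64_t mask,
                 unsigned precision) {
  // A bit is unknown if it is unknown on either side or the values differ.
  cur.mask = cur.mask | mask | (cur.value ^ value);
  // Widen the mask by cap_mask first ...
  cur.mask |= cap_mask_for(precision);
  // ... and only then clear the value bits the final mask makes
  // meaningless.  Doing this before the cap_mask step would leave stale
  // value bits above the precision.
  cur.value &= ~cur.mask;
  return cur;
}
```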

gcc/ChangeLog:

2025-04-15  Martin Jambor  <mjambor@suse.cz>

	PR ipa/119803
	* ipa-cp.cc (ipcp_bits_lattice::meet_with_1): Move m_value adjustment
	according to m_mask below the adjustment of the latter according to
	cap_mask.  Optimize the calculation of cap_mask a bit.
	(ipcp_bits_lattice::meet_with): Optimize the calculation of cap_mask a
	bit.

gcc/testsuite/ChangeLog:

2025-04-15  Martin Jambor  <mjambor@suse.cz>

	PR ipa/119803
	* gcc.dg/ipa/pr119803.c: New test.

Co-authored-by: Jakub Jelinek <jakub@redhat.com>
2025-04-15 15:56:17 +02:00
Iain Buclaw
074b2b0f91 d: Fix internal compiler error: in visit, at d/decl.cc:838 [PR119799]
This was caused by a check in the D front-end disallowing static
VAR_DECLs with a size `0'.

While empty structs in D are given the size `1', the same symbols coming
from ImportC modules do in fact have no size, so allow C variables to
pass the check as well as array objects.

	PR d/119799

gcc/d/ChangeLog:

	* decl.cc (DeclVisitor::visit (VarDeclaration *)): Check front-end
	type size before building the VAR_DECL.  Allow C symbols to have a
	size of `0'.

gcc/testsuite/ChangeLog:

	* gdc.dg/import-c/pr119799.d: New test.
	* gdc.dg/import-c/pr119799c.c: New test.
2025-04-15 15:26:58 +02:00
Patrick Palka
369461d074 c++: prev declared hidden tmpl friend inst, cont [PR119807]
When remapping existing specializations of a hidden template friend from
a previous declaration to the new definition, we must remap only those
specializations that match this new definition, but currently we
remap all specializations (since they all appear in the same
DECL_TEMPLATE_INSTANTIATIONS list of the most general template).

Concretely, in the first testcase below, we form two specializations of
the friend A::f, one with arguments {{0},{bool}} and another with
arguments {{1},{bool}}.  Later when instantiating B, we need to remap
these specializations.  During the B<0> instantiation we only want to
remap the first specialization, and during the B<1> instantiation only
the second specialization, but currently we remap both specializations
twice.

tsubst_friend_function needs to determine if an existing specialization
matches the shape of the new definition, which is tricky in general,
e.g. if the outer template parameters may not match up.  Fortunately we
don't have to reinvent the wheel here since is_specialization_of_friend
seems to do exactly what we need.  We can check this unconditionally,
but I think it's only necessary when dealing with specializations formed
from a class template scope previous declaration, hence the
TMPL_ARGS_HAVE_MULTIPLE_LEVELS check.

	PR c++/119807
	PR c++/112288

gcc/cp/ChangeLog:

	* pt.cc (tsubst_friend_function): Skip remapping an
	existing specialization if it doesn't match the shape of
	the new friend definition.

gcc/testsuite/ChangeLog:

	* g++.dg/template/friend86.C: New test.
	* g++.dg/template/friend87.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
2025-04-15 09:06:40 -04:00
Iain Buclaw
f5ed7d19c9 d: Fix ICE in dwarf2out_imported_module_or_decl, at dwarf2out.cc:27676 [PR119817]
The ImportVisitor method for handling the importing of overload sets was
pushing NULL_TREE to the array of import decls, which in turn got passed
to `debug_hooks->imported_module_or_decl', triggering the observed
internal compiler error.

NULL_TREE is returned from `build_import_decl' when the symbol was
ignored for being non-trivial to represent in debug, for example,
template or tuple declarations.  So similarly "skip" adding the symbol
when this is the case for overload sets too.

	PR d/119817

gcc/d/ChangeLog:

	* imports.cc (ImportVisitor::visit (OverloadSet *)): Don't push
	NULL_TREE to vector of import symbols.

gcc/testsuite/ChangeLog:

	* gdc.dg/debug/imports/m119817/a.d: New test.
	* gdc.dg/debug/imports/m119817/b.d: New test.
	* gdc.dg/debug/imports/m119817/package.d: New test.
	* gdc.dg/debug/pr119817.d: New test.
2025-04-15 15:04:13 +02:00
Jakub Jelinek
bf115fd457 ipa-cp: Fix up ipcp_print_widest_int
On Mon, Mar 31, 2025 at 03:34:07PM +0200, Martin Jambor wrote:
> This patch just introduces a form of dumping of widest ints that only
> have zeros in the lowest 128 bits so that instead of printing
> thousands of f's the output looks like:
>
>        Bits: value = 0xffff, mask = all ones folled by 0xffffffffffffffffffffffffffff0000
>
> and then makes sure we use the function not only to print bits but
> also to print masks where values like these can also occur.

Shouldn't that be followed by instead?
And the widest_int checks seem to be quite expensive (especially for
large widest_ints), I think for the first one we can just == -1
and for the second one wi::arshift (value, 128) == -1 and the zero extension
by using wi::zext.

Anyway, I wonder if it wouldn't be better to use something shorter,
the variant patch uses 0xf..f prefix before the 128-bit hexadecimal
number (maybe we could also special case the even more common bits 64+
are all ones case).  Or it could be 0xf*f prefix.  Or printing such
numbers as -0x prefixed negative, though that is not a good idea for masks.

This version doesn't print e.g.
0xf..fffffffffffffffffffffffffffff0000
but just
0xf..f0000
(of course, for say mask of
0xf..f0000000000000000000000000000ffff
it prints it like that, doesn't try to shorten the 0 digits).
But if the most significant bits aren't set, it will be just
0xffff.
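
The output format described above can be sketched like this.  It is only an
illustration under stated assumptions: the Wide type (a 256-bit value stored
as four 64-bit words, least significant first) and format_wide are
hypothetical names and not the ipcp_print_widest_int implementation, and
printing a value of all ones as a bare "0xf..f" is a choice of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical 256-bit value, least significant 64-bit word first.
using Wide = std::array<std::uint64_t, 4>;

std::string format_wide(const Wide &w) {
  const std::uint64_t ones = ~std::uint64_t{0};
  char buf[40];
  if (w[2] == ones && w[3] == ones) {
    // All bits from 128 up are ones: print the low 128 bits after a
    // "0xf..f" prefix and drop their leading f digits, which the prefix
    // already implies.  Leading 0 digits are kept.
    std::snprintf(buf, sizeof buf, "%016llx%016llx",
                  static_cast<unsigned long long>(w[1]),
                  static_cast<unsigned long long>(w[0]));
    std::string digits(buf);
    std::size_t first = digits.find_first_not_of('f');
    if (first == std::string::npos)
      return "0xf..f";
    return "0xf..f" + digits.substr(first);
  }
  // Otherwise print plain hexadecimal without leading zeros.
  int top = 3;
  while (top > 0 && w[top] == 0)
    --top;
  std::snprintf(buf, sizeof buf, "%llx",
                static_cast<unsigned long long>(w[top]));
  std::string out = std::string("0x") + buf;
  for (int i = top - 1; i >= 0; --i) {
    std::snprintf(buf, sizeof buf, "%016llx",
                  static_cast<unsigned long long>(w[i]));
    out += buf;
  }
  return out;
}
```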

2025-04-15  Jakub Jelinek  <jakub@redhat.com>

	* ipa-cp.cc (ipcp_print_widest_int): Print values with all ones in
	bits 128+ with "0xf..f" prefix instead of "all ones folled by ".
	Simplify wide_int check for -1 or all ones above least significant
	128 bits.
2025-04-15 14:56:30 +02:00
Jakub Jelinek
0756511537 tailc: Fix up musttail calls vs. -fsanitize=thread [PR119801]
Calls with musttail attribute don't really work with -fsanitize=thread in
GCC.  The problem is that TSan instrumentation adds
  __tsan_func_entry (__builtin_return_address (0));
calls at the start of each instrumented function and
  __tsan_func_exit ();
call at the end of those and the latter stands in the way of normal tail calls
as well as musttail tail calls.

Looking at what LLVM does, for normal calls -fsanitize=thread also prevents
tail calls like in GCC (well, the __tsan_func_exit () call itself can be
tail called in GCC (and from what I see not in clang)).
But for [[clang::musttail]] calls it arranges to move the
__tsan_func_exit () before the musttail call instead of after it.

The following patch handles it similarly.  If we detect a
__builtin_tsan_func_exit () call in a function instrumented for
-fsanitize=thread, we process
it normally (so that the call can be tail called in function returning void)
but set a flag that the builtin has been seen (only for cfun->has_musttail
in the diag_musttail phase).  And then let tree_optimize_tail_calls_1
call find_tail_calls again in a new mode where the __tsan_func_exit ()
call is ignored and so we are able to find calls before it, but only
accept that if the call before it is actually a musttail.  For C++ it needs
to verify that EH cleanup if any also has the __tsan_func_exit () call
and if all goes well, the musttail call is registered for tailcalling with
a flag that it has __tsan_func_exit () after it and when optimizing that
we emit __tsan_func_exit (); call before the musttail tail call (or musttail
tail recursion).

2025-04-15  Jakub Jelinek  <jakub@redhat.com>

	PR sanitizer/119801
	* sanitizer.def (BUILT_IN_TSAN_FUNC_EXIT): Use BT_FN_VOID rather
	than BT_FN_VOID_PTR.
	* tree-tailcall.cc: Include attribs.h and asan.h.
	(struct tailcall): Add has_tsan_func_exit member.
	(empty_eh_cleanup): Add eh_has_tsan_func_exit argument, set what
	it points to to 1 if there is exactly one __tsan_func_exit call
	and ignore that call otherwise.  Adjust recursive call.
	(find_tail_calls): Add RETRY_TSAN_FUNC_EXIT argument, pass it
	to recursive calls.  When seeing __tsan_func_exit call with
	RETRY_TSAN_FUNC_EXIT 0, set it to -1.  If RETRY_TSAN_FUNC_EXIT
	is 1, initially ignore __tsan_func_exit calls.  Adjust
	empty_eh_cleanup caller.  When looking through stmts after the call,
	ignore exactly one __tsan_func_exit call but remember it in
	t->has_tsan_func_exit.  Diagnose if EH cleanups didn't have
	__tsan_func_exit and normal path did or vice versa.
	(optimize_tail_call): Emit __tsan_func_exit before the tail call
	or tail recursion.
	(tree_optimize_tail_calls_1): Adjust find_tail_calls callers.  If
	find_tail_calls changes retry_tsan_func_exit to -1, set it to 1
	and call it again with otherwise the same arguments.

	* c-c++-common/tsan/pr119801.c: New test.
2025-04-15 14:09:55 +02:00
Jonathan Yong
039b566f2f Wbuiltin-declaration-mismatch-4.c: accept long long in warning for llp64
llp64 targets like mingw-w64 will print:
gcc/testsuite/gcc.dg/Wbuiltin-declaration-mismatch-4.c:80:17: warning: ‘memset’ argument 3 promotes to ‘ptrdiff_t’ {aka ‘long long int’} where ‘long long unsigned int’ is expected in a call to built-in function declared without prototype [-Wbuiltin-declaration-mismatch]
Change the regex pattern to accept it.

Signed-off-by: Jonathan Yong <10walls@gmail.com>

gcc/testsuite/ChangeLog:

	* gcc.dg/Wbuiltin-declaration-mismatch-4.c: Make diagnostic
	accept long long.
2025-04-15 11:38:40 +00:00
Jakub Jelinek
a591629420 testsuite: Fix up ipa/pr119318.c test [PR119318]
dg-additional-options followed by dg-options is ignored.  I've added the
-w from there to dg-options and removed dg-additional-options.

2025-04-15  Jakub Jelinek  <jakub@redhat.com>

	PR ipa/119318
	* gcc.dg/ipa/pr119318.c: Remove dg-additional-options, add -w to
	dg-options.
2025-04-15 12:26:11 +02:00
Jonathan Wakely
05d3aebe24
libstdc++: Fix std::string construction from volatile char* [PR119748]
My recent r15-9381-g648d5c26e25497 change assumes that a contiguous
iterator with the correct value_type can be converted to a const charT*
but that's not true for volatile charT*. The optimization should only be
done if it can be converted to the right pointer type.

Additionally, some generic loops for non-contiguous iterators need an
explicit cast to deal with iterator reference types that do not bind to
the const charT& parameter of traits_type::assign.
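
The two points above can be illustrated with a small sketch.  It is not the
libstdc++ code: copy_chars is a hypothetical helper that only mirrors the
idea of taking the pointer fast path when the iterator really converts to
const char*, and of converting each element explicitly otherwise.  It assumes
C++17 for if constexpr.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// A pointer to volatile char does not convert to const char*, because the
// conversion would silently drop the volatile qualifier.
static_assert(std::is_convertible_v<char *, const char *>);
static_assert(!std::is_convertible_v<volatile char *, const char *>);

// Hypothetical helper: copy the range [first, last) into a std::string.
template <typename It>
std::string copy_chars(It first, It last) {
  if constexpr (std::is_convertible_v<It, const char *>) {
    // Fast path: the iterators can be used as plain character pointers.
    const char *b = first;
    const char *e = last;
    return std::string(b, e);
  } else {
    // Generic path: read each element and convert it to char explicitly,
    // since e.g. a volatile char lvalue does not bind to const char&.
    std::string out;
    for (; first != last; ++first)
      out.push_back(static_cast<char>(*first));
    return out;
  }
}
```

With this shape, a volatile char buffer goes through the element-wise loop,
while a const char* range still takes the pointer path.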

libstdc++-v3/ChangeLog:

	PR libstdc++/119748
	* include/bits/basic_string.h (_S_copy_chars): Only optimize for
	contiguous iterators that are convertible to const charT*. Use
	explicit conversion to charT after dereferencing iterator.
	(_S_copy_range): Likewise for contiguous ranges.
	* include/bits/basic_string.tcc (_M_construct): Use explicit
	conversion to charT after dereferencing iterator.
	* include/bits/cow_string.h (_S_copy_chars): Likewise.
	(basic_string(from_range_t, R&&, const Allocator&)): Likewise.
	Only optimize for contiguous iterators that are convertible to
	const charT*.
	* testsuite/21_strings/basic_string/cons/char/119748.cc: New
	test.
	* testsuite/21_strings/basic_string/cons/wchar_t/119748.cc:
	New test.

Reviewed-by: Tomasz Kaminski <tkaminsk@redhat.com>
2025-04-15 09:24:58 +01:00
Jonathan Wakely
8a208899e9
libstdc++: Enable __gnu_test::test_container constructor for C++98
The only reason this constructor wasn't defined for C++98 is that it
uses constructor delegation, but that isn't necessary.

libstdc++-v3/ChangeLog:

	* testsuite/util/testsuite_iterators.h (test_container): Define
	array constructor for C++98 as well.
2025-04-15 09:24:44 +01:00
Jakub Jelinek
69ffddd8bd libgcobol: Handle long double as an alternate IEEE754 quad [PR119244]
I think there should be consistency in what we use, so like
libgcobol-fp.h specifies, IEEE quad long double should have highest
priority, then _Float128 with *f128 APIs, then libquadmath.
And when we decide to use say long double, we shouldn't mix that with
strfromf128/strtof128.

Additionally, given that the *l vs. *f128 vs. *q API decision is done
solely in libgcobol and not in the compiler (which is different from
the Fortran case where compiled code emits say sinq or sinf128 calls),
I think libgcobol.spec should have -lquadmath in any form only in
the case when using libquadmath for everything.  In the Fortran case
it is for backwards compatibility purposes, if something has been
compiled with older gfortran which used say sinq and link is done by
gfortran which has been configured against new glibc with *f128, linking
would fail otherwise.

2025-04-15  Jakub Jelinek  <jakub@redhat.com>
	    Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

	PR cobol/119244
	* acinclude.m4 (LIBGCOBOL_CHECK_FLOAT128): Ensure
	libgcob_cv_have_float128 is not yes on targets with IEEE quad
	long double.  Don't check for --as-needed nor set LIBQUADSPEC
	on targets which USE_IEC_60559.
	* libgcobol-fp.h (FP128_FMT, strtofp128, strfromfp128): Define.
	* intrinsic.cc (strtof128): Don't redefine.
	(WEIRD_TRANSCENDENT_RETURN_VALUE): Use GCOB_FP128_LITERAL macro.
	(__gg__numval_f): Use strtofp128 instead of strtof128.
	* libgcobol.cc (strtof128): Don't redefine.
	(format_for_display_internal): Use strfromfp128 instead of
	strfromf128 or quadmath_snprintf and use FP128_FMT in the format
	string.
	(get_float128, __gg__compare_2, __gg__move, __gg__move_literala):
	Use strtofp128 instead of strtof128.
	* configure: Regenerate.
2025-04-15 07:55:55 +02:00
Sandra Loosemore
fc89b1face Doc: always_inline attribute vs multiple TUs and LTO [PR113203]
gcc/ChangeLog
	PR ipa/113203
	* doc/extend.texi (Common Function Attributes): Explain how to
	use always_inline in programs that have multiple translation
	units, and that LTO inlining additionally needs optimization
	enabled.
2025-04-15 03:58:31 +00:00
Jason Merrill
764f02327f c++: shortcut constexpr vector ctor [PR113835]
Since std::vector became usable in constant evaluation in C++20, a vector
variable with static storage duration might be manifestly
constant-evaluated, so we properly try to constant-evaluate its initializer.
But it can never succeed since the result will always refer to the result of
operator new, so trying is a waste of time.  Potentially a large waste of
time for a large vector, as in the testcase in the PR.

So, let's recognize this case and skip trying constant-evaluation.  I do
this only for the case of an integer argument, as that's the case that's
easy to write but slow to (fail to) evaluate.
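
For illustration, the kind of declaration this targets looks roughly like the
following.  This is a sketch and not the testcase from the PR; the names N
and table and the element count are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element count; the PR concerns much larger vectors.
constexpr std::size_t N = std::size_t{1} << 16;

// Static storage duration with an integer size argument.  In C++20 the
// compiler may first try to constant-evaluate this initializer, but the
// result would refer to memory from operator new, so the attempt cannot
// succeed and the vector ends up dynamically initialized anyway.
std::vector<int> table(N);
```

The program behaves the same either way: table holds N value-initialized
ints at run time.  Only the compile time spent on the failed evaluation
differs.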

In the test, I use dg-timeout-factor to lower the default timeout from 300
seconds to 15; on my laptop, compilation without the patch takes about 20
seconds versus about 2 with the patch.

	PR c++/113835

gcc/cp/ChangeLog:

	* constexpr.cc (cxx_eval_outermost_constant_expr): Bail out early
	for std::vector(N).

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/constexpr-vector1.C: New test.
2025-04-14 23:23:05 -04:00
liuhongt
fa58ff249a Revert documents from r11-344-g0fec3f62b9bfc0
gcc/ChangeLog:

	PR target/108134
	* doc/extend.texi: Remove documents from r11-344-g0fec3f62b9bfc0.
2025-04-14 18:29:57 -07:00
Sandra Loosemore
f7a2f0aa6b Doc: clarify -march=pentiumpro has no MMX support [PR42683]
gcc/ChangeLog
	PR target/42683
	* doc/invoke.texi (x86 Options): Clarify that -march=pentiumpro
	doesn't include MMX.
2025-04-15 01:09:36 +00:00
GCC Administrator
9f3d2506e4 Daily bump. 2025-04-15 00:19:09 +00:00
Thomas Schwinge
fe283dba77 GCN, nvptx: Support '-mfake-exceptions', and use it for offloading compilation [PR118794]
With '-mfake-exceptions' enabled, the user-visible behavior in presence of
exception handling constructs changes such that the compile-time
'sorry, unimplemented: exception handling not supported' is skipped, code
generation proceeds, and instead, exception handling constructs 'abort' at
run time.  (..., or don't, if they're in dead code.)

	PR target/118794
	gcc/
	* config/gcn/gcn.opt (-mfake-exceptions): Support.
	* config/nvptx/nvptx.opt (-mfake-exceptions): Likewise.
	* config/gcn/gcn.md (define_expand "exception_receiver"): Use it.
	* config/nvptx/nvptx.md (define_expand "exception_receiver"):
	Likewise.
	* config/gcn/mkoffload.cc (main): Set it.
	* config/nvptx/mkoffload.cc (main): Likewise.
	* config/nvptx/nvptx.cc (nvptx_assemble_integer)
	<in_section == exception_section>: Special handling for
	'SYMBOL_REF's.
	* except.cc (expand_dw2_landing_pad_for_region): Don't generate
	bogus code for (default)
	'#define EH_RETURN_DATA_REGNO(N) INVALID_REGNUM'.
	libgcc/
	* config/gcn/unwind-gcn.c (_Unwind_Resume): New.
	* config/nvptx/unwind-nvptx.c (_Unwind_Resume): Likewise.
	gcc/testsuite/
	* g++.target/gcn/exceptions-bad_cast-2.C: Set
	'-mno-fake-exceptions'.
	* g++.target/gcn/exceptions-pr118794-1.C: Likewise.
	* g++.target/gcn/exceptions-throw-2.C: Likewise.
	* g++.target/nvptx/exceptions-bad_cast-2.C: Likewise.
	* g++.target/nvptx/exceptions-pr118794-1.C: Likewise.
	* g++.target/nvptx/exceptions-throw-2.C: Likewise.
	* g++.target/gcn/exceptions-bad_cast-2_-mfake-exceptions.C: New.
	* g++.target/gcn/exceptions-pr118794-1_-mfake-exceptions.C:
	Likewise.
	* g++.target/gcn/exceptions-throw-2_-mfake-exceptions.C: Likewise.
	* g++.target/nvptx/exceptions-bad_cast-2_-mfake-exceptions.C:
	Likewise.
	* g++.target/nvptx/exceptions-pr118794-1_-mfake-exceptions.C:
	Likewise.
	* g++.target/nvptx/exceptions-throw-2_-mfake-exceptions.C:
	Likewise.
	libgomp/
	* testsuite/libgomp.c++/target-exceptions-bad_cast-2-offload-sorry-GCN.C:
	Set '-foffload-options=-mno-fake-exceptions'.
	* testsuite/libgomp.c++/target-exceptions-bad_cast-2-offload-sorry-nvptx.C:
	Likewise.
	* testsuite/libgomp.c++/target-exceptions-pr118794-1-offload-sorry-GCN.C:
	Likewise.
	* testsuite/libgomp.c++/target-exceptions-pr118794-1-offload-sorry-nvptx.C:
	Likewise.
	* testsuite/libgomp.c++/target-exceptions-throw-2-offload-sorry-GCN.C:
	Likewise.
	* testsuite/libgomp.c++/target-exceptions-throw-2-offload-sorry-nvptx.C:
	Likewise.
	* testsuite/libgomp.oacc-c++/exceptions-bad_cast-2-offload-sorry-GCN.C:
	Likewise.
	* testsuite/libgomp.oacc-c++/exceptions-bad_cast-2-offload-sorry-nvptx.C:
	Likewise.
	* testsuite/libgomp.oacc-c++/exceptions-throw-2-offload-sorry-GCN.C:
	Likewise.
	* testsuite/libgomp.oacc-c++/exceptions-throw-2-offload-sorry-nvptx.C:
	Likewise.
	* testsuite/libgomp.c++/target-exceptions-bad_cast-2.C: Adjust.
	* testsuite/libgomp.c++/target-exceptions-pr118794-1.C: Likewise.
	* testsuite/libgomp.c++/target-exceptions-throw-2.C: Likewise.
	* testsuite/libgomp.oacc-c++/exceptions-bad_cast-2.C: Likewise.
	* testsuite/libgomp.oacc-c++/exceptions-throw-2.C: Likewise.
	* testsuite/libgomp.c++/target-exceptions-throw-2-O0.C: New.
2025-04-14 23:56:05 +02:00
Thomas Schwinge
6c0ea84026 Add 'throw', dead code test cases for GCN, nvptx target and OpenACC, OpenMP 'target' offloading
gcc/testsuite/
	* g++.target/gcn/exceptions-throw-3.C: New.
	* g++.target/nvptx/exceptions-throw-3.C: Likewise.
	libgomp/
	* testsuite/libgomp.c++/target-exceptions-throw-3.C: New.
	* testsuite/libgomp.oacc-c++/exceptions-throw-3.C: Likewise.
2025-04-14 23:54:54 +02:00
Thomas Schwinge
1daf570498 Add 'throw', caught test cases for GCN, nvptx target and OpenACC, OpenMP 'target' offloading
gcc/testsuite/
	* g++.target/gcn/exceptions-throw-2.C: New.
	* g++.target/nvptx/exceptions-throw-2.C: Likewise.
	libgomp/
	* testsuite/libgomp.c++/target-exceptions-throw-2.C: New.
	* testsuite/libgomp.c++/target-exceptions-throw-2-offload-sorry-GCN.C: Likewise.
	* testsuite/libgomp.c++/target-exceptions-throw-2-offload-sorry-nvptx.C: Likewise.
	* testsuite/libgomp.oacc-c++/exceptions-throw-2.C: Likewise.
	* testsuite/libgomp.oacc-c++/exceptions-throw-2-offload-sorry-GCN.C: Likewise.
	* testsuite/libgomp.oacc-c++/exceptions-throw-2-offload-sorry-nvptx.C: Likewise.
2025-04-14 23:54:54 +02:00
Thomas Schwinge
1362d9d494 Add 'throw' test cases for GCN, nvptx target and OpenACC, OpenMP 'target' offloading
gcc/testsuite/
	* g++.target/gcn/exceptions-throw-1.C: New.
	* g++.target/nvptx/exceptions-throw-1.C: Likewise.
	libgomp/
	* testsuite/libgomp.c++/target-exceptions-throw-1.C: New.
	* testsuite/libgomp.c++/target-exceptions-throw-1-O0.C: Likewise.
	* testsuite/libgomp.oacc-c++/exceptions-throw-1.C: Likewise.
2025-04-14 23:54:53 +02:00