Commit graph

212312 commits

Author SHA1 Message Date
Matthew Malcomson
93ced50d1c [MAINTAINERS] Update email and move to DCO
* MAINTAINERS: Update my email address.

Signed-off-by: Matthew Malcomson <mmalcomson@nvidia.com>
2024-07-24 10:30:48 +01:00
Christoph Müllner
9817d29cd6 RISC-V: Disable Zba optimization pattern if XTheadMemIdx is enabled
It is possible that the Zba optimization pattern zero_extendsidi2_bitmanip
matches for an XTheadMemIdx INSN, with the effect of emitting an invalid
instruction, as reported in PR116035.

The pattern above is used to emit a zext.w instruction to zero-extend
SI mode registers to DI mode.  A similar functionality can be achieved
by XTheadBb's th.extu instruction.  And indeed, we have the equivalent
pattern in thead.md (zero_extendsidi2_th_extu).  However, that pattern
depends on !TARGET_XTHEADMEMIDX.  To compensate for that, there are
specific patterns that ensure that zero-extension instruction can still
be emitted (th_memidx_bb_zero_extendsidi2 and friends).

While we could implement something similar (th_memidx_zba_zero_extendsidi2),
it would only make sense if real HW existed that implements Zba and
XTheadMemIdx, but not XTheadBb.  Unless such a machine exists, let's
simply disable zero_extendsidi2_bitmanip if XTheadMemIdx is available.

	PR target/116035

gcc/ChangeLog:

	* config/riscv/bitmanip.md: Disable zero_extendsidi2_bitmanip
	for XTheadMemIdx.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/pr116035-1.c: New test.
	* gcc.target/riscv/pr116035-2.c: New test.

Reported-by: Patrick O'Neill <patrick@rivosinc.com>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
2024-07-24 09:26:27 +02:00
Lingling Kong
9d312ba544 x86: Don't enable APX_F in 32-bit mode
gcc/ChangeLog:

	PR target/115978
	* config/i386/driver-i386.cc (host_detect_local_cpu):  Enable
	APX_F only for 64-bit codegen.
	* config/i386/i386-options.cc (DEF_PTA):  Skip PTA_APX_F if
	not in 64-bit mode.

gcc/testsuite/ChangeLog:

	PR target/115978
	* gcc.target/i386/pr115978-1.c: New test.
	* gcc.target/i386/pr115978-2.c: Ditto.
2024-07-24 14:53:03 +08:00
Pan Li
9059734109 Internal-fn: Only allow types described by modes for internal fn [PR115961]
The direct_internal_fn_supported_p has no restrictions on the type
modes.  For example, the bitfield below will be recognized as .SAT_TRUNC.

struct e
{
  unsigned pre : 12;
  unsigned a : 4;
};

__attribute__((noipa))
void bug (e * v, unsigned def, unsigned use) {
  e & defE = *v;
  defE.a = min_u (use + 1, 0xf);
}

This patch adds checks to direct_internal_fn_supported_p so that it
only allows tree types described by modes.
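
As an illustration only (not code from the patch), the idea behind the check can be sketched in C.  The struct and function names below are hypothetical, and the real type_strictly_matches_mode_p works on GCC trees and may check more than precision.  The sketch assumes the 4-bit bitfield `a` above is carried in an 8-bit mode:

```c
#include <stdbool.h>

/* Hypothetical model of an integer type: its precision in bits and the
   bit size of the machine mode that carries it.  */
struct int_type_model {
    unsigned precision;
    unsigned mode_bits;
};

/* A type "strictly matches" its mode when the mode has no extra bits.
   A 4-bit bitfield carried in an 8-bit mode does not qualify.  */
static bool type_matches_mode_model(struct int_type_model t)
{
    return t.precision == t.mode_bits;
}
```

Under this model a plain 32-bit unsigned matches its mode, while the bitfield does not and so would no longer be treated as a .SAT_TRUNC candidate.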

The following test suites pass with this patch:
1. The rv64gcv full regression tests.
2. The x86 bootstrap tests.
3. The x86 full regression tests.

	PR target/115961

gcc/ChangeLog:

	* internal-fn.cc (type_strictly_matches_mode_p): Add new func
	impl to check type strictly matches mode or not.
	(type_pair_strictly_matches_mode_p): Ditto but for tree type
	pair.
	(direct_internal_fn_supported_p): Add above check for the tree
	type pair.

gcc/testsuite/ChangeLog:

	* g++.dg/torture/pr115961-run-1.C: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-07-24 12:51:58 +08:00
Jeff Law
f9a60d575f [PR rtl-optimization/115877][6/n] Add testcase from pr115877
This just adds the testcase from pr115877.  It's working now on the trunk.  I'm
not done with cleanups/bugfixing, but there's no reason to not have the
testcase installed at this point.

	PR rtl-optimization/115877
gcc/testsuite
	* gcc.dg/torture/pr115877.c: New test.
2024-07-23 19:11:04 -06:00
GCC Administrator
daedc197bb Daily bump. 2024-07-24 00:18:20 +00:00
Mark Harmstone
1ca7a12807 Output CodeView type information for rvalue references
Translates DW_TAG_rvalue_reference_type DIEs into LF_POINTER types.

gcc/
	* dwarf2codeview.cc (get_type_num_reference_type): Handle rvalue refs.
	(get_type_num_array_type): Add DW_TAG_rvalue_reference_type to switch.
	(get_type_num): Handle DW_TAG_rvalue_reference_type DIEs.
	* dwarf2codeview.h (CV_PTR_MODE_RVREF): Define.
2024-07-24 00:52:58 +01:00
Mark Harmstone
7341607544 Output CodeView type information for references
Translates DW_TAG_reference_type DIEs into LF_POINTER types.

gcc/
	* dwarf2codeview.cc (get_type_num_reference_type): New function.
	(get_type_num_array_type): Add DW_TAG_reference_type to switch.
	(get_type_num): Handle DW_TAG_reference_type DIEs.
	* dwarf2codeview.h (CV_PTR_MODE_LVREF): Define.
2024-07-24 00:51:13 +01:00
Vineet Gupta
806927111c RISC-V: Fix snafu in SI mode splitters patch
SPEC2017 perlbench for RISC-V was broken, failing with a runtime output
mismatch.

> 3830:  mbox2: dWshe3Aa1EULre4CT5O/ErYFrk+o/EOoebA1kTVjQVQQH2EjT5fHcYnwjj2MdBmZu5y3Ce4Ei4QQZo/SNrry9g
>        mbox2: uuWPimQiU0D4UrwFP+LS0lFNph4qL43WV1A6T3tHleatIOUaHixhrJU9NoA2lc9KjwYpdEL0lNTXkvo8ymNHzA
>               ^
> 3832:  mbox3: 8f4jdv6GIf0lX3DcdwRdEm6/aZwnmGX6n86GzCvmkwTKFXQjwlwVHc8jy8XlcyiIPr3yXTkgVOiP3cRYvyYQPg
>        mbox3: 9xQySgP6qbhfxl8Usu1WfGA5UhStB5AN31wueGM6OF4Jp59DkqJPu6ksGblOU5u0nQapQC1e9oYIs16a2mq2NA
>               ^
> specdiff run completed

Edwin bisected this to 273f16a125 ("[v3][RISC-V] Handle bit
manipulation of SImode values") which had the operands swapped in one
of the new splitters introduced.

No new test, as the reducer narrows it down to the exact test introduced
by the original commit.

gcc/ChangeLog:
	* config/riscv/bitmanip.md: Fix splitter.

Reported-by: Edwin Lu <ewlu@rivosinc.com>
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
2024-07-23 15:42:19 -07:00
Marek Polacek
e8c40aed0f doc: add missing @option for musttail
gcc/ChangeLog:

	* doc/extend.texi: Add missing @option.
2024-07-23 16:33:34 -04:00
Andi Kleen
8daae81113 Add documentation for musttail attribute
gcc/ChangeLog:

	PR c/83324
	* doc/extend.texi: Document [[musttail]]
2024-07-23 13:27:12 -07:00
Andi Kleen
8d1af8f904 Add tests for C/C++ musttail attributes
Some are adapted from the existing C musttail plugin tests.
This also extends the ability to query the sibcall capabilities of the
target.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp:
	(check_effective_target_struct_tail_call): New function.
	* c-c++-common/musttail1.c: New test.
	* c-c++-common/musttail12.c: New test.
	* c-c++-common/musttail13.c: New test.
	* c-c++-common/musttail2.c: New test.
	* c-c++-common/musttail3.c: New test.
	* c-c++-common/musttail4.c: New test.
	* c-c++-common/musttail5.c: New test.
	* c-c++-common/musttail7.c: New test.
	* c-c++-common/musttail8.c: New test.
	* g++.dg/musttail10.C: New test.
	* g++.dg/musttail11.C: New test.
	* g++.dg/musttail6.C: New test.
	* g++.dg/musttail9.C: New test.
2024-07-23 13:27:12 -07:00
Andi Kleen
78bbdbd535 C: Implement musttail attribute for returns
Implement a C23 clang compatible musttail attribute similar to the earlier
C++ implementation in the C parser.

gcc/c/ChangeLog:

	PR c/83324
	* c-parser.cc (struct attr_state): Define with musttail_p.
	(c_parser_statement_after_labels): Handle [[musttail]].
	(c_parser_std_attribute): Ditto.
	(c_parser_handle_musttail): Ditto.
	(c_parser_compound_statement_nostart): Ditto.
	(c_parser_all_labels): Ditto.
	(c_parser_statement): Ditto.
	* c-tree.h (c_finish_return): Add musttail_p flag.
	* c-typeck.cc (c_finish_return): Handle musttail_p flag.
2024-07-23 13:27:12 -07:00
Andi Kleen
2bd8177256 C++: Support clang compatible [[musttail]] (PR83324)
This patch implements a clang compatible [[musttail]] attribute for
returns.

musttail is useful as an alternative to computed goto for interpreters.
With computed goto the interpreter function usually ends up very big
which causes problems with register allocation and other per function
optimizations not scaling. With musttail the interpreter can be instead
written as a sequence of smaller functions that call each other. To
avoid unbounded stack growth this requires forcing a sibling call, which
this attribute does. It guarantees an error if the call cannot be tail
called which allows the programmer to fix it instead of risking a stack
overflow. Unlike computed goto it is also type-safe.

It turns out that David Malcolm had already implemented middle/backend
support for a musttail attribute back in 2016, but it wasn't exposed
to any frontend other than a special plugin.

This patch adds a [[gnu::musttail]] attribute for C++ that can be added
to return statements. The return statement must be a direct call
(it does not follow dependencies), which is similar to what clang
implements. It then uses the existing must tail infrastructure.
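
As an illustration only (not taken from the patch or its tests), a musttail return can be sketched in C as below.  The MUSTTAIL macro and the sum_to function are hypothetical names; the sketch assumes the GNU `__attribute__((musttail))` spelling and falls back to an ordinary return on compilers that do not report the attribute via `__has_attribute`:

```c
/* Use the statement attribute only where the compiler reports it.  */
#if defined(__has_attribute)
# if __has_attribute(musttail)
#  define MUSTTAIL __attribute__((musttail))
# endif
#endif
#ifndef MUSTTAIL
# define MUSTTAIL  /* no guarantee: plain return */
#endif

/* Accumulate n + (n-1) + ... + 1.  The recursive call is a direct call
   in the return statement, which is the form the attribute requires.  */
static unsigned long sum_to(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;
    MUSTTAIL return sum_to(n - 1, acc + n);
}
```

Where the attribute is honoured, the call is either emitted as a sibling call or rejected with an error, so stack depth does not grow with n.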

For compatibility it also detects clang::musttail

Passes bootstrap and full test

gcc/c-family/ChangeLog:

	* c-attribs.cc (set_musttail_on_return): New function.
	* c-common.h (set_musttail_on_return): Declare new function.

gcc/cp/ChangeLog:

	PR c/83324
	* cp-tree.h (AGGR_INIT_EXPR_MUST_TAIL): Add.
	* parser.cc (cp_parser_statement): Handle musttail.
	(cp_parser_jump_statement): Ditto.
	* pt.cc (tsubst_expr): Copy CALL_EXPR_MUST_TAIL_CALL.
	* semantics.cc (simplify_aggr_init_expr): Handle musttail.
2024-07-23 13:27:12 -07:00
Patrick Palka
2861eb34e3 c++: normalizing ttp constraints [PR115656]
Here we normalize the constraint same_as<T, bool> for the first
time during ttp coercion of B / UU, specifically constraint subsumption
checking.  During this normalization the set of in-scope template
parameters i.e. current_template_parms is empty, which we rely on
during normalization of the ttp constraints since we pass in_decl=NULL_TREE
to norm_info.  And this tricks the satisfaction cache into thinking that
the satisfaction value of same_as<T, bool> is independent of its template
parameters, and we incorrectly conflate the satisfaction value with
T = bool vs T = long and accept the specialization A<long, B>.

Since is_compatible_template_arg rewrites the ttp's constraints to
be in terms of the argument template's parameters, and since it's
the only caller of weakly_subsumes, the latter function can instead
pass in_decl=tmpl to avoid relying on current_template_parms.  This
patch implements this, and in turn renames weakly_subsumes to
ttp_subsumes to reflect that this predicate is now hardcoded for this
one caller.

	PR c++/115656

gcc/cp/ChangeLog:

	* constraint.cc (weakly_subsumes): Pass in_decl=tmpl to
	get_normalized_constraints_from_info.  Rename to ...
	(ttp_subsumes): ... this.
	* cp-tree.h (weakly_subsumes): Rename to ...
	(ttp_subsumes): ... this.
	* pt.cc (is_compatible_template_arg): Adjust after renaming.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/concepts-ttp7.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
2024-07-23 13:16:14 -04:00
Patrick Palka
f70281222d c++: missing SFINAE during alias CTAD [PR115296]
During the alias CTAD transformation, if substitution failed for some
guide we should just silently discard the guide.  We currently do
discard the guide, but not silently, as in the below testcase which
we diagnose forming a too-large array type when transforming the
user-defined deduction guides.

This patch fixes this by using complain=tf_none instead of
tf_warning_or_error throughout alias_ctad_tweaks.

	PR c++/115296

gcc/cp/ChangeLog:

	* pt.cc (alias_ctad_tweaks): Use complain=tf_none instead of
	tf_warning_or_error.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/class-deduction-alias23.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
2024-07-23 11:37:31 -04:00
Gaius Mulley
7f8064ff0e PR modula2/116048 ICE when encountering wrong kind of qualident
Following on from PR-115957 further ICEs can be generated by using the
wrong kind of qualident symbol.  For example using a variable instead of
a type or using a type instead of a const.  This fix tracks the expected
qualident kind state when parsing const, type and variable declarations.
If the error is unrecoverable then a detailed message explaining the
context of the qualident (and why the seen qualident is wrong) is
generated.

gcc/m2/ChangeLog:

	PR modula2/116048
	* Make-lang.in (GM2-COMP-BOOT-DEFS): Add M2StateCheck.def.
	(GM2-COMP-BOOT-MODS): Add M2StateCheck.mod.
	(GM2-COMP-DEFS): Add M2StateCheck.def.
	(GM2-COMP-MODS): Add M2StateCheck.mod.
	* gm2-compiler/M2Quads.mod (StartBuildWith): Generate
	unrecoverable error if the qualident type is NulSym.
	Replace MetaError1 with MetaErrorT1 and position the error
	to the qualident.
	* gm2-compiler/P3Build.bnf (M2StateCheck): Import procedures.
	(seenError): New variable.
	(WasNoError): Remove variable.
	(BlockState): New variable.
	(ErrorString): Rewrite using seenError.
	(CompilationUnit): Ditto.
	(QualidentCheck): New rule.
	(ConstantDeclaration): Bookend with InclConst and ExclConst.
	(Constructor): Add InclConstructor, ExclConstructor and call
	CheckQualident.
	(ConstActualParameters): Call PushState, PopState, InclConstFunc
	and CheckQualident.
	(TypeDeclaration): Bookend with InclType and ExclType.
	(SimpleType): Call QualidentCheck.
	(CaseTag): Ditto.
	(OptReturnType): Ditto.
	(VariableDeclaration): Bookend with InclVar and ExclVar.
	(Designator): Call QualidentCheck.
	(FormalType): Ditto.
	* gm2-compiler/PCBuild.bnf (M2StateCheck): Import procedures.
	(ConstantDeclaration): Rewrite using InclConst and ExclConst.
	(Constructor): Bookend with InclConstructor and ExclConstructor.
	Call CheckQualident.
	(ConstructorOrConstActualParameters): Rewrite and call
	CheckQualident.
	(ConstActualParameters): Bookend with PushState PopState.
	Call InclConstFunc and CheckQualident.
	* gm2-gcc/init.cc (_M2_M2StateCheck_init): New declaration.
	(_M2_P3Build_init): New declaration.
	(init_PerCompilationInit): Call _M2_M2StateCheck_init and
	_M2_P3Build_init.
	* gm2-compiler/M2StateCheck.def: New file.
	* gm2-compiler/M2StateCheck.mod: New file.

gcc/testsuite/ChangeLog:

	PR modula2/116048
	* gm2/errors/fail/errors-fail.exp: Remove -Wstudents
	and add -Wuninit-variable-checking=all.
	Replace gm2_init_pim with gm2_init_iso.
	* gm2/errors/fail/testfio.mod: Modify test code to
	provoke an error in the first basic block.
	* gm2/errors/fail/testparam.mod: Ditto.
	* gm2/errors/fail/array1.mod: Ditto.
	* gm2/errors/fail/badtype.mod: New test.
	* gm2/errors/fail/badvar.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2024-07-23 15:54:16 +01:00
Arsen Arsenović
826134760c cp/coroutines: add a test for PR c++/103953
This PR seems to have been fixed by a fix for a seemingly unrelated PR.
Let's add a regression test to make sure it stays fixed.

PR c++/103953 - Leak of coroutine return object

	PR c++/103953

gcc/testsuite/ChangeLog:

	* g++.dg/coroutines/torture/pr103953.C: New test.

Reviewed-by: Iain Sandoe <iain@sandoe.co.uk>
2024-07-23 16:23:20 +02:00
Tobias Burnus
b95c82d60c install.texi (gcn): Suggest newer commit for Newlib
Newlib 4.4.0 lacks two commits: 7dd4eb1db (2024-03-25) to fix device console
output for GFX10/GFX11 and ed50a50b9 (2024-04-04) to make the added lock.h
compilable with C++.  This commit now also mentions the second commit.

gcc/ChangeLog:

	* doc/install.texi (amdgcn-x-amdhsa): Suggest newer git version
	for newlib.
2024-07-23 12:41:40 +02:00
Jiufu Guo
472eab9ab1 report message for operator %a on unaddressable operand
Hi,

For PR96866, when printing asm code for modifier "%a", an addressable
operand is required.  However, the constraint "X" allows any kind of
operand, even one whose address is hard to obtain directly, e.g. an
extern symbol whose address is in the TOC.
An error message is now reported to indicate the invalid asm operand.

Compared with the previous version, the test case is updated with -mno-pcrel.

Bootstrap&regtest pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff(Jiufu Guo)

	PR target/96866

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (print_operand_address): Emit message for
	unsupported operand.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr96866-1.c: New test.
	* gcc.target/powerpc/pr96866-2.c: New test.
2024-07-23 18:21:25 +08:00
Torbjörn SVENSSON
7793f5b419 testsuite: Disable finite math only for test [PR115826]
As the test case requires +-Inf and NaN to work and -ffast-math is added
by default for arm-none-eabi, re-enable non-finite math.

gcc/testsuite/ChangeLog:

	PR testsuite/115826
	* gcc.dg/vect/tsvc/vect-tsvc-s1281.c: Use -fno-finite-math-only.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
2024-07-23 12:04:09 +02:00
Richard Biener
15d3b2dab9 tree-optimization/116002 - PTA solving slow with degenerate graph
When the constraint graph consists of N nodes with only complex
constraints and no copy edges we have to be lucky to arrive at
a constraint solving order that requires the optimal number of
iterations.  What happens in the testcase is that we bottle-neck
on computing the visitation order but propagate changes only
very slowly.  Luckily the testcase complex constraints are
all copy-with-offset and those do provide a way to order
visitation.  The following adds this which reduces the iteration
count to one.

	PR tree-optimization/116002
	* tree-ssa-structalias.cc (topo_visit): Also consider
	SCALAR = SCALAR complex constraints as edges.
2024-07-23 11:51:29 +02:00
Jonathan Wakely
b40156d691 libstdc++: Use [[maybe_unused]] attribute in src/c++23/print.cc
This avoids some warnings when the preprocessor conditions are not met.

libstdc++-v3/ChangeLog:

	* src/c++23/print.cc (__open_terminal): Use [[maybe_unused]] on
	parameter.
2024-07-23 10:25:36 +01:00
Detlef Vollmann
8439405e38 libstdc++: Do not use isatty on avr [PR115482]
avrlibc has an incomplete unistd.h that doesn't have isatty.
So building libstdc++ fails when compiling c++23/print.cc.
As a workaround I added a check for AVR.

libstdc++-v3/ChangeLog:

	PR libstdc++/115482
	* src/c++23/print.cc (__open_terminal) [__AVR__]: Do not use
	isatty.
2024-07-23 10:25:36 +01:00
Jakub Jelinek
b9cefd67a2 ssa: Fix up maybe_rewrite_mem_ref_base complex type handling [PR116034]
The folding into REALPART_EXPR is correct, used only when the mem_offset
is zero, but for IMAGPART_EXPR it didn't check the exact offset value (just
that it is not 0).
The following patch fixes that by using IMAGPART_EXPR only if the offset
is right and using BITFIELD_REF or whatever else otherwise.
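
As an illustration only (not the code in tree-ssa.cc), the offset decision described above can be sketched in C.  The enum and function names are hypothetical, and the sketch models only the offset test, not the other conditions the real function checks:

```c
#include <stddef.h>

enum complex_part { PART_REAL, PART_IMAG, PART_OTHER };

/* Classify a byte offset into a complex object whose element type is
   elt_size bytes wide.  Offset 0 is the real part; the imaginary part
   starts exactly one element later.  Any other offset is neither, which
   is the case the pre-fix logic mistook for the imaginary part.  */
static enum complex_part classify_complex_offset(size_t offset, size_t elt_size)
{
    if (offset == 0)
        return PART_REAL;
    if (offset == elt_size)
        return PART_IMAG;
    return PART_OTHER;
}
```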

2024-07-23  Jakub Jelinek  <jakub@redhat.com>
	    Andrew Pinski  <quic_apinski@quicinc.com>

	PR tree-optimization/116034
	* tree-ssa.cc (maybe_rewrite_mem_ref_base): Only use IMAGPART_EXPR
	if MEM_REF offset is equal to element type size.

	* gcc.dg/pr116034.c: New test.
2024-07-23 10:51:32 +02:00
Jakub Jelinek
58756c9f55 c++: Remove CHECK_CONSTR
On Mon, Jul 22, 2024 at 11:48:51AM -0400, Patrick Palka wrote:
> FWIW this tree code seems to be a vestige of the initial Concepts TS
> implementation and is effectively unused, we can remove it outright.

Here is a patch which removes that.

2024-07-23  Jakub Jelinek  <jakub@redhat.com>

	* cp-tree.def (CHECK_CONSTR): Remove.
	* cp-tree.h (CHECK_CONSTR_CONCEPT, CHECK_CONSTR_ARGS): Remove.
	* cp-objcp-common.cc (cp_common_init_ts): Don't handle CHECK_CONSTR.
	* tree.cc (cp_tree_equal): Likewise.
	* error.cc (dump_expr): Likewise.
	* cxx-pretty-print.cc (cxx_pretty_printer::expression): Likewise.
	(pp_cxx_check_constraint): Remove.
	(pp_cxx_constraint): Don't handle CHECK_CONSTR.
2024-07-23 10:39:08 +02:00
Richard Biener
44e065a52f [v2] rtl-optimization/116002 - cselib hash is bad
The following addresses the bad hash function of cselib which uses
integer plus for merging.  This causes a huge number of collisions
for the testcase in the PR and thus very large compile-time.

The following rewrites it to use inchash, eliding duplicate mixing
of RTX code and mode in some cases and more consistently avoiding
a return value of zero as well as treating zero as fatal.  An
important part is to preserve mixing of hashes of commutative
operators as commutative.
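
As an illustration only, the difference between additive merging and mixing can be sketched in C.  The mix function below is a hypothetical stand-in, not GCC's inchash, and the function names are assumptions made for this sketch:

```c
#include <stdint.h>

/* Hypothetical mixer: xor in the value, multiply by an odd constant,
   then fold high bits down.  Not GCC's inchash.  */
static uint32_t mix(uint32_t h, uint32_t v)
{
    h ^= v;
    h *= 0x9E3779B1u;
    h ^= h >> 15;
    return h;
}

/* Additive merging: every operand pair with the same sum collides.  */
static uint32_t hash_add(uint32_t code, uint32_t a, uint32_t b)
{
    return code + a + b;
}

/* Mixed merging for a non-commutative operator: order matters.  */
static uint32_t hash_ordered(uint32_t code, uint32_t a, uint32_t b)
{
    return mix(mix(mix(0, code), a), b);
}

/* Commutative operator: mix the operand hashes in sorted order so that
   swapping the operands yields the same hash.  */
static uint32_t hash_commutative(uint32_t code, uint32_t a, uint32_t b)
{
    uint32_t lo = a < b ? a : b;
    uint32_t hi = a < b ? b : a;
    return mix(mix(mix(0, code), lo), hi);
}
```

With hash_add, operand hashes (2, 5) and (3, 4) always collide because only their sum matters; mixing removes that systematic collision, while the sorted merge keeps `a op b` and `b op a` equal for commutative operators.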

For cselib_hash_plus_const_int this removes the apparent attempt
of making sure to hash the same as a PLUS as cselib_hash_rtx makes
sure to dispatch to cselib_hash_plus_const_int consistently.

This reduces compile-time for the testcase in the PR from unknown
to 22s and for a reduced testcase from 73s to 9s.  There's another
pending patchset to improve the speed of inchash mixing, but it's
not in the profile for this testcase (PTA pops up now).

The generated code is equal.  I've also compared cc1 builds
with and without the patch and they are now comparing equal
after retaining commutative hashing for commutative operators.

	PR rtl-optimization/116002
	* cselib.cc (cselib_hash_rtx): Use inchash to get proper mixing.
	Consistently avoid a zero return value when hashing successfully.
	Consistently treat a zero hash value from recursing as fatal.
	Use hashval_t where appropriate.
	(cselib_hash_plus_const_int): Likewise.
	(new_cselib_val): Use hashval_t.
	(cselib_lookup_1): Likewise.
2024-07-23 09:34:27 +02:00
liuhongt
a3f0389106 Relax ix86_hardreg_mov_ok after split1.
ix86_hardreg_mov_ok is added by r11-5066-gbe39636d9f68c4

>    The solution proposed here is to have the x86 backend/recog prevent
>    early RTL passes composing instructions (that set likely_spilled hard
>    registers) that they (combine) can't simplify, until after reload.
>    We allow sets from pseudo registers, immediate constants and memory
>    accesses, but anything more complicated is performed via a temporary
>    pseudo.  Not only does this simplify things for the register allocator,
>    but any remaining register-to-register moves are easily cleaned up
>    by the late optimization passes after reload, such as peephole2 and
>    cprop_hardreg.

The restriction is mainly for rtl optimization passes before pass_combine.

But split1 splits

```
(insn 17 13 18 2 (set (reg/i:V4SI 20 xmm0)
        (vec_merge:V4SI (const_vector:V4SI [
                    (const_int -1 [0xffffffffffffffff]) repeated x4
                ])
            (const_vector:V4SI [
                    (const_int 0 [0]) repeated x4
                ])
            (unspec:QI [
                    (reg:V4SF 106)
                    (reg:V4SF 102)
                    (const_int 0 [0])
                ] UNSPEC_PCMP))) "/app/example.cpp":20:1 2929 {*avx_cmpv4sf3_1}
     (expr_list:REG_DEAD (reg:V4SF 102)
        (expr_list:REG_DEAD (reg:V4SF 106)
            (nil))))
```

into:
```
(insn 23 13 24 2 (set (reg:V4SF 107)
        (unspec:V4SF [
                (reg:V4SF 106)
                (reg:V4SF 102)
                (const_int 0 [0])
            ] UNSPEC_PCMP)) "/app/example.cpp":20:1 -1
     (nil))
(insn 24 23 18 2 (set (reg/i:V4SI 20 xmm0)
        (subreg:V4SI (reg:V4SF 107) 0)) "/app/example.cpp":20:1 -1
     (nil))
```

There are many splitters that generate MOV insns with SUBREG and would
have the same problem.
Instead of changing those splitters one by one, the patch relaxes
ix86_hardreg_mov_ok to allow mov subreg to hard register after
split1. ix86_pre_reload_split () is used to replace
!reload_completed && ira_in_progress.

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_hardreg_mov_ok): Relax mov subreg
	to hard register after split1.

gcc/testsuite/ChangeLog:

	* g++.target/i386/pr115982.C: New test.
2024-07-23 13:53:05 +08:00
Kewen Lin
f4062e361c rs6000: Update option set in rs6000_inner_target_options [PR115713]
When the function rs6000_inner_target_options parses target
options, it updates the explicit option set information for
rs6000_opt_masks via rs6000_isa_flags_explicit, but it fails
to update that information for rs6000_opt_vars, which can
result in unexpected consequences, as the associated test
case shows.  This patch fixes rs6000_inner_target_options
to update the option set for rs6000_opt_vars as well.

	PR target/115713

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (rs6000_inner_target_options): Update option
	set information for rs6000_opt_vars.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr115713-2.c: New test.
2024-07-23 00:50:59 -05:00
Kewen Lin
e6db8848d9 rs6000: Consider explicitly set options in target option parsing [PR115713]
In rs6000_inner_target_options, when enabling VSX we enable
altivec and disable -mavoid-indexed-addresses implicitly,
but this doesn't consider the case where the options altivec
and avoid-indexed-addresses have been explicitly disabled.  As
the test case in PR115713#c1 shows, with target attribute
"no-altivec,vsx", VSX unexpectedly sets the altivec flag and
the expected error is not emitted.

This patch is to avoid the automatic enablement when they
are explicitly specified.  With this change, an existing
test case ppc-target-4.c also requires an adjustment by
specifying explicit altivec in target attribute (since it
requires altivec feature and command line is specifying
no-altivec).
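
As an illustration only (not the rs6000 code), the rule "imply an option only when the user did not set it explicitly" can be sketched in C.  All names below are hypothetical:

```c
#include <stdbool.h>

#define OPT_ALTIVEC (1u << 0)
#define OPT_VSX     (1u << 1)

/* Hypothetical option state: current flags plus a mask of the options
   the user set explicitly (either on or off).  */
struct target_opts {
    unsigned flags;
    unsigned explicit_mask;
};

/* Enable VSX.  AltiVec is implied only if the user left it unspecified.
   Returns false when VSX ends up enabled without AltiVec, i.e. the
   combination the caller should diagnose.  */
static bool enable_vsx(struct target_opts *o)
{
    o->flags |= OPT_VSX;
    if (!(o->explicit_mask & OPT_ALTIVEC))
        o->flags |= OPT_ALTIVEC;
    return (o->flags & OPT_ALTIVEC) != 0;
}
```

With "no-altivec,vsx" the explicit mask contains OPT_ALTIVEC with the flag cleared, so the function leaves AltiVec off and reports the conflict instead of silently enabling it.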

	PR target/115713

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (rs6000_inner_target_options): Avoid
	enabling altivec or disabling avoid-indexed-addresses automatically
	when they are specified explicitly.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr115713-1.c: New test.
	* gcc.target/powerpc/ppc-target-4.c: Adjust by specifying altivec
	in target attribute.
2024-07-23 00:50:59 -05:00
Kewen Lin
04da747a06 rs6000: Escalate warning to error for VSX with explicit no-altivec etc.
As discussed in PR115688, when users currently specify
-mvsx and -mno-altivec explicitly, the compiler emits a warning
rather than an error, but since both options are given
explicitly, emitting a hard error is better.

So this patch escalates the related warnings to errors
when the explicitly given options are incompatible.

	PR target/115713

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (rs6000_option_override_internal): Emit error
	messages when explicit VSX encounters explicit soft-float, no-altivec
	or avoid-indexed-addresses.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/warn-1.c: Move to ...
	* gcc.target/powerpc/error-1.c: ... here.  Adjust dg-warning with
	dg-error and remove ineffective scan.
2024-07-23 00:50:59 -05:00
Haochen Jiang
062e46a813 i386: Change prefetchi output template
For prefetchi instructions, a RIP-relative address is explicitly required
for the operand, and the assembler obeys that rule strictly.  This makes
an instruction like:

	prefetchit0	bar

illegal for the assembler, even though this should be a common usage of
prefetchi.

Change to %a to explicitly add (%rip) after the function label, making it
legal for the assembler so that it can be passed to the linker to get the
real address.

gcc/ChangeLog:

	* config/i386/i386.md (prefetchi): Change to %a.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/prefetchi-1.c: Check (%rip).
2024-07-23 13:50:31 +08:00
Jeff Law
ad642d2c95 [5/n][PR rtl-optimization/115877] Fix handling of input/output operands
So in this patch we're correcting a failure to mark objects live in scenarios
like

(set (dest) (plus (dest) (src)))

When handling set pseudos, we transfer the liveness information from LIVENOW
into LIVE_TMP.  LIVE_TMP is subsequently used to narrow what bit groups are
live for the inputs.

The first time we process the block we may not have DEST in the LIVENOW set (it
may be live across the loop, but not live after the loop).  Thus we can totally
miss making certain objects live, resulting in incorrect code.

The fix is pretty simple.  If LIVE_TMP is empty, then we should go ahead and
mark all the bit groups for the set object in LIVE_TMP.  This also removes an
invalid gcc_assert on the state of the liveness bitmaps.

This showed up on pru, rl78 and/or msp430 in the testsuite.  So no new test.

Bootstrapped and regression tested on x86_64 and also run through my tester on
all the cross platforms.

Pushing to the trunk.

	PR rtl-optimization/115877
gcc/
	* ext-dce.cc (ext_dce_process_sets): Reasonably handle input/output
	operands.
	(ext_dce_rd_transfer_n): Drop bogus assertion.
2024-07-22 21:48:28 -06:00
Alexandre Oliva
ad65caa332 [powerpc] [testsuite] reorder dg directives [PR106069]
The dg-do directive appears after dg-require-effective-target in
g++.target/powerpc/pr106069.C.  That doesn't work the way that was
presumably intended.  Both of these directives set dg-do-what, but
dg-do does so fully and unconditionally, overriding any decisions
recorded there by earlier directives.  Reorder the directives more
canonically, so that both take effect.


for  gcc/testsuite/ChangeLog

	PR target/106069
	* g++.target/powerpc/pr106069.C: Reorder dg directives.
2024-07-22 23:09:24 -03:00
Patrick Palka
7c5a9bf1d2 c++/coroutines: correct passing *this to promise type [PR104981]
When passing *this to the promise type ctor (or to its operator new)
(as per [dcl.fct.def.coroutine]/4), we add an explicit cast to lvalue
reference.  But this is unnecessary since *this is already always an
lvalue.  And doing so means we need to call convert_from_reference
afterward to lower the reference expression to an implicit dereference,
which we're currently neglecting to do and which causes overload
resolution to get confused when computing argument conversions.

So this patch removes this unneeded reference cast when passing *this
to the promise ctor, and removes both the cast and implicit deref when
passing *this to operator new, for consistency.  While we're here, use
cp_build_fold_indirect_ref instead of directly building INDIRECT_REF.

	PR c++/104981
	PR c++/115550

gcc/cp/ChangeLog:

	* coroutines.cc (morph_fn_to_coro): Remove unneeded calls
	to convert_to_reference and convert_from_reference when
	passing *this.  Use cp_build_fold_indirect_ref instead
	of directly building INDIRECT_REF.

gcc/testsuite/ChangeLog:

	* g++.dg/coroutines/pr104981-preview-this.C: New test.
	* g++.dg/coroutines/pr115550-preview-this.C: New test.

Reviewed-by: Iain Sandoe <iain@sandoe.co.uk>
Reviewed-by: Jason Merrill <jason@redhat.com>
2024-07-22 21:30:49 -04:00
Pan Li
5d2115b850 RISC-V: Implement the .SAT_TRUNC for scalar
This patch implements the simple .SAT_TRUNC pattern in the riscv
backend, i.e.:

Form 1:
  #define DEF_SAT_U_TRUC_FMT_1(NT, WT)     \
  NT __attribute__((noinline))             \
  sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x) \
  {                                        \
    bool overflow = x > (WT)(NT)(-1);      \
    return ((NT)x) | (NT)-overflow;        \
  }

DEF_SAT_U_TRUC_FMT_1(uint32_t, uint64_t)

Before this patch:
__attribute__((noinline))
uint8_t sat_u_truc_uint16_t_to_uint8_t_fmt_1 (uint16_t x)
{
  _Bool overflow;
  unsigned char _1;
  unsigned char _2;
  unsigned char _3;
  uint8_t _6;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  overflow_5 = x_4(D) > 255;
  _1 = (unsigned char) x_4(D);
  _2 = (unsigned char) overflow_5;
  _3 = -_2;
  _6 = _1 | _3;
  return _6;
;;    succ:       EXIT

}

After this patch:
__attribute__((noinline))
uint8_t sat_u_truc_uint16_t_to_uint8_t_fmt_1 (uint16_t x)
{
  uint8_t _6;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _6 = .SAT_TRUNC (x_4(D)); [tail call]
  return _6;
;;    succ:       EXIT

}
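
As an illustration only, Form 1 can be expanded by hand for the uint16_t to uint8_t case shown in the dumps and compared with a plain clamp.  The function names below are hypothetical and this is not one of the patch's test files:

```c
#include <stdbool.h>
#include <stdint.h>

/* Form 1 from above, written out for WT = uint16_t, NT = uint8_t.
   If x does not fit, -overflow is all ones after narrowing, so the OR
   saturates the result to 255.  */
static uint8_t sat_u_trunc_u16_to_u8(uint16_t x)
{
    bool overflow = x > (uint16_t)(uint8_t)(-1);
    return (uint8_t)((uint8_t)x | (uint8_t)-overflow);
}

/* Reference behaviour: clamp to the narrow type's maximum.  */
static uint8_t sat_u_trunc_ref(uint16_t x)
{
    return x > UINT8_MAX ? UINT8_MAX : (uint8_t)x;
}
```

Both functions return the input unchanged up to 255 and return 255 for anything larger, which is the behaviour .SAT_TRUNC names.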

The following test suites pass with this patch:
1. The rv64gcv full regression test.
2. The rv64gcv build with glibc.

gcc/ChangeLog:

	* config/riscv/iterators.md (ANYI_DOUBLE_TRUNC): Add new iterator
	for int double truncation.
	(ANYI_DOUBLE_TRUNCATED): Add new attr for int double truncation.
	(anyi_double_truncated): Ditto but for lowercase.
	* config/riscv/riscv-protos.h (riscv_expand_ustrunc): Add new
	func decl for expanding ustrunc
	* config/riscv/riscv.cc (riscv_expand_ustrunc): Add new func
	impl to expand ustrunc.
	* config/riscv/riscv.md (ustrunc<mode><anyi_double_truncated>2): Impl
	the new pattern ustrunc<m><n>2 for int.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/sat_arith.h: Add test helper macro.
	* gcc.target/riscv/sat_arith_data.h: New test.
	* gcc.target/riscv/sat_u_trunc-1.c: New test.
	* gcc.target/riscv/sat_u_trunc-2.c: New test.
	* gcc.target/riscv/sat_u_trunc-3.c: New test.
	* gcc.target/riscv/sat_u_trunc-run-1.c: New test.
	* gcc.target/riscv/sat_u_trunc-run-2.c: New test.
	* gcc.target/riscv/sat_u_trunc-run-3.c: New test.
	* gcc.target/riscv/scalar_sat_unary.h: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-07-23 09:24:02 +08:00
GCC Administrator
d1b25543f9 Daily bump. 2024-07-23 00:19:00 +00:00
Jan Hubicka
efcbe7b985 Fix handling of ECF_NOVOPS in ipa-modref
As shown in a somewhat convoluted testcase, ipa-modref is mistreating
ECF_NOVOPS as "having no side effects".  This comes from the time when
modref cared only about memory accesses and thus it was possible to
shortcut on it.

This patch removes (hopefully) all those bad shortcuts.
Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

	PR ipa/109985

	* ipa-modref.cc (modref_summary::useful_p): Fix handling of ECF_NOVOPS.
	(modref_access_analysis::process_fnspec): Likewise.
	(modref_access_analysis::analyze_call): Likewise.
	(propagate_unknown_call): Likewise.
	(modref_propagate_in_scc): Likewise.
	(modref_propagate_flags_in_scc): Likewise.
	(ipa_merge_modref_summary_after_inlining): Likewise.
2024-07-22 23:01:50 +02:00
Jakub Jelinek
6f81b7fa79 c++: Some cp-tree.def comment fixes
While reading the fold expression and concept tree comments, I found
various spots referring to non-existent macros etc.

The following patch attempts to sync that with what is actually implemented.

2024-07-22  Jakub Jelinek  <jakub@redhat.com>

	* cp-tree.def (UNARY_LEFT_FOLD_EXPR): Use FOLD_EXPR_MODIFY_P instead
	of FOLD_EXPR_MOD_P or FOLDEXPR_MOD_P in the comment.  Comment
	formatting fixes.
	(ATOMIC_CONSTEXPR): Use CONSTR_INFO instead of ATOMIC_CONSTR_INFO
	and ATOMIC_CONSTR_MAP instead of ATOMIC_CONSTR_PARMS in the comment.
	Comment formatting fixes.
	(CONJ_CONSTR): Remove comment about third operand.  Use CONSTR_INFO
	instead of CONJ_CONSTR_INFO and DISJ_CONSTR_INFO.
	(CHECK_CONSTR): Use CHECK_CONSTR_ARGS instead of
	CHECK_CONSTR_ARGUMENTS.
2024-07-22 19:47:17 +02:00
Jan Hubicka
1407477335 Fix modref's interaction with store merging
Hi,
this patch fixes wrong code in case store-merging introduces a load of a
function parameter that was previously write-only (which happens for bitfields).
Without this, the whole store-merged area is considered to be killed.
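
The kind of code involved can be sketched as follows.  This is an illustrative example only, not the testcase added by the commit; the struct layout and function names are assumptions.  At the source level the callee only writes through its parameter, but if the two bitfield stores are merged into one wider store that does not cover the whole word, the merged form may need to load the word first to preserve the untouched bits.

```c
/* Hypothetical layout: four bitfields sharing one 32-bit word.  */
struct flags
{
  unsigned a : 4;
  unsigned b : 4;
  unsigned c : 8;
  unsigned d : 16;
};

/* Only writes through P in the source.  A merged store of A and B
   may have to read the containing word to keep C and D intact.  */
__attribute__ ((noinline)) void
set_low_fields (struct flags *p)
{
  p->a = 1;
  p->b = 2;
}

/* Returns 1 when the caller's earlier stores to C and D survive the
   call, which is what correct code must guarantee.  */
int
caller_keeps_other_fields (void)
{
  struct flags f;
  f.a = 0;
  f.b = 0;
  f.c = 0x5a;
  f.d = 0x1234;
  set_low_fields (&f);
  return f.a == 1 && f.b == 2 && f.c == 0x5a && f.d == 0x1234;
}
```

If the callee were summarized as killing the whole merged area, the caller's stores to c and d could wrongly be treated as dead.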

	PR ipa/111613

gcc/ChangeLog:

	* ipa-modref.cc (analyze_parms): Do not preserve EAF_NO_DIRECT_READ and
	EAF_NO_INDIRECT_READ from past flags.

gcc/testsuite/ChangeLog:

	* gcc.c-torture/pr111613.c: New test.
2024-07-22 19:01:05 +02:00
Michael Meissner
05f0e9eec9 Add -mcpu=power11 support.
This patch adds the power11 option to the -mcpu= and -mtune= switches.

This patch treats the power11 like a power10 in terms of costs and reassociation
width.

This patch issues a ".machine power11" to the assembly file if you use
-mcpu=power11.

This patch defines _ARCH_PWR11 if the user uses -mcpu=power11.

This patch allows GCC to be configured with the --with-cpu=power11 and
--with-tune=power11 options.

This patch passes -mpwr11 to the assembler if the user uses -mcpu=power11.

This patch adds support for using "power11" in the __builtin_cpu_is built-in
function.
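
As a small usage sketch, code can test the _ARCH_PWR11 macro described above at compile time.  The function name below is made up for illustration; only the macro comes from this patch.

```c
/* Returns 1 when this translation unit was compiled with
   -mcpu=power11 (so _ARCH_PWR11 is defined), and 0 otherwise.
   The function name is hypothetical.  */
int
built_for_power11 (void)
{
#ifdef _ARCH_PWR11
  return 1;
#else
  return 0;
#endif
}
```

On other targets, or with an older -mcpu= setting, the macro is not defined and the function returns 0.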

2024-07-22  Michael Meissner  <meissner@linux.ibm.com>

gcc/

	* config.gcc (powerpc*-*-*): Add support for power11.
	* config/rs6000/aix71.h (ASM_CPU_SPEC): Add support for -mcpu=power11.
	* config/rs6000/aix72.h (ASM_CPU_SPEC): Likewise.
	* config/rs6000/aix73.h (ASM_CPU_SPEC): Likewise.
	* config/rs6000/driver-rs6000.cc (asm_names): Likewise.
	* config/rs6000/ppc-auxv.h (PPC_PLATFORM_POWER11): New define.
	* config/rs6000/rs6000-builtin.cc (cpu_is_info): Add power11.
	* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
	_ARCH_PWR11 if -mcpu=power11.
	* config/rs6000/rs6000-cpus.def (POWER11_MASKS_SERVER): New define.
	(POWERPC_MASKS): Add power11.
	(power11 cpu): Add power11 definition.
	* config/rs6000/rs6000-opts.h (PROCESSOR_POWER11): Add power11 processor.
	* config/rs6000/rs6000-string.cc (expand_compare_loop): Likewise.
	* config/rs6000/rs6000-tables.opt: Regenerate.
	* config/rs6000/rs6000.cc (rs6000_option_override_internal): Add power11
	support.
	(rs6000_machine_from_flags): Likewise.
	(rs6000_reassociation_width): Likewise.
	(rs6000_adjust_cost): Likewise.
	(rs6000_issue_rate): Likewise.
	(rs6000_sched_reorder): Likewise.
	(rs6000_sched_reorder2): Likewise.
	(rs6000_register_move_cost): Likewise.
	(rs6000_opt_masks): Likewise.
	* config/rs6000/rs6000.h (ASM_CPU_SPEC): Likewise.
	* config/rs6000/rs6000.md (cpu attribute): Add power11.
	* config/rs6000/rs6000.opt (-mpower11): Add internal power11 flag.
	* doc/invoke.texi (RS/6000 and PowerPC Options): Document -mcpu=power11.
	* config/rs6000/power10.md (all reservations): Add power11 support.

gcc/testsuite/

	* gcc.target/powerpc/power11-1.c: New test.
	* gcc.target/powerpc/power11-2.c: Likewise.
	* gcc.target/powerpc/power11-3.c: Likewise.
2024-07-22 12:20:43 -04:00
Jeff Law
ab7c0aed52 [4/n][PR rtl-optimization/115877] Correct SUBREG handling in a destination
If we encounter something during SET handling that we can not handle, the safe
thing to do is to ignore the destination and continue the loop.

We've actually been trying to do slightly better with SUBREG destinations by
iterating into SUBREG_REG.  It turns out that wasn't working as expected.

The problem is once we "continue" we lose the state that we were inside the SET
and thus we ended up ignoring the destination completely rather than tracking
the SUBREG_REG object.  This could be fixed by restarting SET processing, but I
just don't see this as all that important to handle.  So rather than leave the
code as-is, not working per design, I'm twiddling it to use the common 'skip
subrtxs and continue' idiom used elsewhere.

This is a prerequisite for another patch in this series.  Specifically I have a
patch that explicitly tracks if we skipped a destination rather than trying to
imply it from the state of LIVE_TMP.  So this is probably NFC right now, but
that's a short-lived NFC.

Bootstrapped and regression tested on x86 and also run as part of a larger kit
on the crosses in my tester.

	PR rtl-optimization/115877
gcc/
	* ext-dce.cc (ext_dce_process_sets): More correctly handle SUBREG
	destinations.
2024-07-22 10:13:34 -06:00
Jan Hubicka
cf8ffc58aa Fix modref_eaf_analysis::analyze_ssa_name handling of values dereferenced to function call parameters
modref_eaf_analysis::analyze_ssa_name misinterprets EAF flags.  If a
dereferenced parameter is passed (to map_iterator in the testcase), it can be
returned indirectly, which in turn makes it escape into the next function call.

	PR ipa/115033

gcc/ChangeLog:

	* ipa-modref.cc (modref_eaf_analysis::analyze_ssa_name): Fix checking of
	EAF flags when analysing values dereferenced as function parameters.

gcc/testsuite/ChangeLog:

	* gcc.c-torture/execute/pr115033.c: New test.
2024-07-22 18:08:08 +02:00
Jan Hubicka
391f46f10b Fix accounting of offsets in unadjusted_ptr_and_unit_offset
unadjusted_ptr_and_unit_offset accidentally throws away the offset computed by
get_addr_base_and_unit_offset. Instead of passing extra_offset it passes offset.

	PR ipa/114207

gcc/ChangeLog:

	* ipa-prop.cc (unadjusted_ptr_and_unit_offset): Fix accounting of offsets in ADDR_EXPR.

gcc/testsuite/ChangeLog:

	* gcc.c-torture/execute/pr114207.c: New test.
2024-07-22 18:05:26 +02:00
Jan Hubicka
0d19fbc7b0 Compare loop bounds in ipa-icf
Hi,
this testcase shows another problem with missing comparators for metadata
in ICF. With value ranges available to loop optimizations during early
opts we can estimate the number of iterations based on a guarding condition
that can be split away by the fnsplit pass. This patch disables ICF when
the number of iterations does not match.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

gcc/ChangeLog:

	PR ipa/115277
	* ipa-icf-gimple.cc (func_checker::compare_loops): Compare loop
	bounds.

gcc/testsuite/ChangeLog:

	* gcc.c-torture/compile/pr115277.c: New test.
2024-07-22 18:01:57 +02:00
Richard Sandiford
34f33ea801 rtl-ssa: Avoid using a stale splay tree root [PR116009]
In the fix for PR115928, I'd failed to notice that "root" was used
later in the function, so needed to be updated.

gcc/
	PR rtl-optimization/116009
	* rtl-ssa/accesses.cc (function_info::add_def): Set the root
	local variable after removing the old clobber group.

gcc/testsuite/
	PR rtl-optimization/116009
	* gcc.c-torture/compile/pr116009.c: New test.
2024-07-22 16:42:16 +01:00
Richard Sandiford
e62988b777 rtl-ssa: Add debug routines for def_splay_tree
This patch adds debug routines for def_splay_tree, which I found
useful while debugging PR116009.

gcc/
	* rtl-ssa/accesses.h (rtl_ssa::pp_def_splay_tree): Declare.
	(dump, debug): Add overloads for def_splay_tree.
	* rtl-ssa/accesses.cc (rtl_ssa::pp_def_splay_tree): New function.
	(dump, debug): Add overloads for def_splay_tree.
2024-07-22 16:42:16 +01:00
Richard Sandiford
ebde0cc101 aarch64: Tighten aarch64_simd_mem_operand_p [PR115969]
aarch64_simd_mem_operand_p checked for a memory with a POST_INC
or REG address, but it didn't check what kind of register was
being used.  This meant that it allowed DImode FPRs as well as GPRs.

I wondered about rewriting it to use aarch64_classify_address,
but this one-line fix seemed simpler.  The structure then mirrors
the existing early exit in aarch64_classify_address itself:

  /* On LE, for AdvSIMD, don't support anything other than POST_INC or
     REG addressing.  */
  if (advsimd_struct_p
      && TARGET_SIMD
      && !BYTES_BIG_ENDIAN
      && (code != POST_INC && code != REG))
    return false;

gcc/
	PR target/115969
	* config/aarch64/aarch64.cc (aarch64_simd_mem_operand_p): Require
	the operand to be a legitimate memory_operand.

gcc/testsuite/
	PR target/115969
	* gcc.target/aarch64/pr115969.c: New test.
2024-07-22 16:42:15 +01:00
Jeff Law
88d16194d0 [NFC][PR rtl-optimization/115877] Avoid setting irrelevant bit groups as live in ext-dce
Another patch to refine liveness computations.  This should be NFC and is
designed to help debugging.

In simplest terms the patch avoids setting bit groups outside the size of a
pseudo as live.  Consider a HImode pseudo: bits 16..63 for such a pseudo don't
really have meaning, yet we often set bit groups related to bits 16..63 on in
the liveness bitmaps.

This makes debugging harder than it needs to be by simply having larger bitmaps
to verify when walking through the code in a debugger.
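
The idea can be sketched in standalone C.  This is not the code from ext-dce.cc: the function names, the plain-integer bitmap, and the assumed layout of four bit groups per pseudo (bits 0..7, 8..15, 16..31 and 32..63) are all illustrative assumptions.

```c
/* Hypothetical helper: how many bit groups can be meaningful for a
   pseudo WIDTH_BITS wide, assuming four groups covering bits 0..7,
   8..15, 16..31 and 32..63.  */
int
group_limit_for_width (unsigned width_bits)
{
  if (width_bits > 32)
    return 4;
  if (width_bits > 16)
    return 3;
  if (width_bits > 8)
    return 2;
  return 1;
}

/* Mark only the meaningful groups of one pseudo as live.  LIVE is a
   plain 64-bit mask standing in for the liveness bitmap, with group G
   of register REGNO at bit 4 * REGNO + G (so REGNO must be below 16).  */
unsigned long long
mark_reg_live_sketch (unsigned long long live, unsigned regno,
		      unsigned width_bits)
{
  int limit = group_limit_for_width (width_bits);
  for (int g = 0; g < limit; g++)
    live |= 1ULL << (4 * regno + g);
  return live;
}
```

For a 16-bit pseudo only the two low groups get set, so the groups for bits 16..63 stay clear in the bitmap.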

This has been bootstrapped and regression tested on x86_64.  It's also been
tested on the crosses in my tester without regressions.

Pushing to the trunk.

	PR rtl-optimization/115877
gcc/
	* ext-dce.cc (group_limit): New function.
	(mark_reg_live): Likewise.
	(ext_dce_process_sets): Use new functions.
	(ext_dce_process_uses): Likewise.
	(ext_dce_init): Likewise.
2024-07-22 08:45:10 -06:00
Richard Biener
a8e61cd71f Fix hash of WIDEN_*_EXPR
We're hashing operand 2 to the temporary hash.

	* fold-const.cc (operand_compare::hash_operand): Fix hash
	of WIDEN_*_EXPR.
2024-07-22 13:53:26 +02:00