Commit graph

202266 commits

Author SHA1 Message Date
liuhongt
37a231cc75 Disparage slightly for the alternative which moves DFmode between SSE_REGS and GENERAL_REGS.
For testcase

void __cond_swap(double* __x, double* __y) {
  bool __r = (*__x < *__y);
  auto __tmp = __r ? *__x : *__y;
  *__y = __r ? *__y : *__x;
  *__x = __tmp;
}

GCC-14 with -O2 and -march=x86-64 options generates the following code:

__cond_swap(double*, double*):
        movsd   xmm1, QWORD PTR [rdi]
        movsd   xmm0, QWORD PTR [rsi]
        comisd  xmm0, xmm1
        jbe     .L2
        movq    rax, xmm1
        movapd  xmm1, xmm0
        movq    xmm0, rax
.L2:
        movsd   QWORD PTR [rsi], xmm1
        movsd   QWORD PTR [rdi], xmm0
        ret

rax is used to save and restore the DFmode value. In RA, both GENERAL_REGS
and SSE_REGS cost zero since we didn't disparage the
alternative in the movdf_internal pattern; according to the register
allocation order, GENERAL_REGS is allocated. The patch adds ? for
alternatives (r,v) and (v,r), just as we did for the movsf/hf/bf_internal
patterns; after that we get optimal RA.

__cond_swap:
.LFB0:
	.cfi_startproc
	movsd	(%rdi), %xmm1
	movsd	(%rsi), %xmm0
	comisd	%xmm1, %xmm0
	jbe	.L2
	movapd	%xmm1, %xmm2
	movapd	%xmm0, %xmm1
	movapd	%xmm2, %xmm0
.L2:
	movsd	%xmm1, (%rsi)
	movsd	%xmm0, (%rdi)
	ret

gcc/ChangeLog:

	PR target/110170
	* config/i386/i386.md (movdf_internal): Disparage slightly for
	2 alternatives (r,v) and (v,r) by adding constraint modifier
	'?'.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr110170-3.c: New test.
2023-07-06 13:54:25 +08:00
Jeevitha Palanisamy
1669fad496 rs6000: Remove redundant initialization [PR106907]
PR106907 has a few warnings spotted by cppcheck. This patch addresses the
redundant initialization issue among them. The initialized value of 'new_addr'
was overwritten before it was read, so the source is updated by removing the
unnecessary initialization of 'new_addr'.

2023-07-06  Jeevitha Palanisamy  <jeevitha@linux.ibm.com>

gcc/
	PR target/106907
	* config/rs6000/rs6000.cc (rs6000_expand_vector_extract): Remove redundant
	initialization of new_addr.
2023-07-05 23:46:15 -05:00
Hao Liu
7339e725b9 tree-optimization/110474 - Vect: select small VF for epilog of unrolled loop
If a loop is unrolled during vectorization (i.e. suggested_unroll_factor > 1),
the VFs of both main and epilog loop are enlarged.  The epilog vect loop is
specific for a loop with small iteration counts, so a large VF may hurt
performance.

This patch unscales the main loop VF by suggested_unroll_factor while selecting
the epilog loop VF, so that it is the same as for a loop vectorized without
unrolling (i.e. suggested_unroll_factor = 1).
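
The unscaling described above can be sketched as follows. This is only an illustration, not the code in tree-vect-loop.cc: the function name is hypothetical, VFs are modeled as plain integers rather than GCC's poly-int types, and the main loop VF is assumed to be an exact multiple of the unroll factor.

```cpp
#include <cstdint>

// Hypothetical helper (not a GCC function): recover the VF the main loop
// would have had without unrolling, so that the epilog loop VF can be
// selected against that smaller value.
// Assumption: main_vf is an exact multiple of suggested_unroll_factor.
std::uint64_t unscaled_main_vf(std::uint64_t main_vf,
                               std::uint64_t suggested_unroll_factor)
{
  if (suggested_unroll_factor <= 1)
    return main_vf;  // no unrolling, nothing to unscale
  return main_vf / suggested_unroll_factor;
}
```

For example, a main loop VF of 16 obtained with an unroll factor of 4 would be treated as a VF of 4 when the epilog VF is chosen.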

gcc/ChangeLog:

	PR tree-optimization/110474
	* tree-vect-loop.cc (vect_analyze_loop_2): unscale the VF by suggested
	unroll factor while selecting the epilog vect loop VF.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/pr110474.c: New testcase.
2023-07-06 10:06:01 +08:00
GCC Administrator
5158918aa2 Daily bump. 2023-07-06 00:17:51 +00:00
Andrew MacLeod
778099c426 Make compute_operand_range a tail call.
Tweak the routine so it is making a tail call.

	* gimple-range-gori.cc (compute_operand_range): Convert to a tail
	call.
2023-07-05 19:06:31 -04:00
Andrew MacLeod
988b07a66a Make compute_operand2_range a leaf call.
Rather than creating long call chains, put the onus for finishing
the evaluation on the caller.

	* gimple-range-gori.cc (compute_operand_range): After calling
	compute_operand2_range, recursively call self if needed.
	(compute_operand2_range): Turn into a leaf function.
	(gori_compute::compute_operand1_and_operand2_range): Finish
	operand2 calculation.
	* gimple-range-gori.h (compute_operand2_range): Remove name param.
2023-07-05 19:06:30 -04:00
Andrew MacLeod
018e7f1640 Make compute_operand1_range a leaf call.
Rather than creating long call chains, put the onus for finishing
the evaluation on the caller.

	* gimple-range-gori.cc (compute_operand_range): After calling
	compute_operand1_range, recursively call self if needed.
	(compute_operand1_range): Turn into a leaf function.
	(gori_compute::compute_operand1_and_operand2_range): Finish
	operand1 calculation.
	* gimple-range-gori.h (compute_operand1_range): Remove name param.
2023-07-05 19:06:30 -04:00
Andrew MacLeod
f037570561 Simplify compute_operand_range for op1 and op2 case.
Move the check for co-dependency between 2 operands into
compute_operand_range, resulting in a much cleaner
compute_operand1_and_operand2_range routine.

	* gimple-range-gori.cc (compute_operand_range): Check for
	operand interdependence when both op1 and op2 are computed.
	(compute_operand1_and_operand2_range): No checks required now.
2023-07-05 19:06:30 -04:00
Andrew MacLeod
70d1e3f40f Move relation discovery into compute_operand_range
compute_operand1_range and compute_operand2_range were both doing
relation discovery between the 2 operands... move it into a common area.

	* gimple-range-gori.cc (compute_operand_range): Check for
	a relation between op1 and op2 and use that instead.
	(compute_operand1_range): Don't look for a relation override.
	(compute_operand2_range): Ditto.
2023-07-05 19:06:30 -04:00
Thomas Rodgers
acfe8fa8dc libstdc++: Split up pstl/set.cc testcase
This testcase is causing some timeout issues. This patch splits the
testcase up by individual set algorithm.

libstdc++-v3:/ChangeLog:
	* testsuite/25_algorithms/pstl/alg_sorting/set.cc: Delete
	file.
	* testsuite/25_algorithms/pstl/alg_sorting/set_difference.cc:
	New file.
	* testsuite/25_algorithms/pstl/alg_sorting/set_intersection.cc:
	Likewise.
	* testsuite/25_algorithms/pstl/alg_sorting/set_symmetric_difference.cc:
	Likewise.
	* testsuite/25_algorithms/pstl/alg_sorting/set_union.cc:
	Likewise.
	* testsuite/25_algorithms/pstl/alg_sorting/set_util.h:
	Likewise.
2023-07-05 14:13:02 -07:00
Jonathan Wakely
be240fc6ac doc: Update my Contributors entry
gcc/ChangeLog:

	* doc/contrib.texi (Contributors): Update my entry.
2023-07-05 17:10:24 +01:00
Filip Kastl
1ee710027d value-prof.cc: Correct edge prob calculation.
The mod-subtract optimization with ncounts==1 produced incorrect edge
probabilities due to incorrect conditional probability calculation. This
patch fixes the calculation.

Signed-off-by: Filip Kastl <filip.kastl@gmail.com>

gcc/ChangeLog:

	* value-prof.cc (gimple_mod_subtract_transform): Correct edge
	prob calculation.
2023-07-05 17:36:02 +02:00
Uros Bizjak
a4778dbd93 sched: Change return type of predicate functions from int to bool
Also change some internal variables to bool.

gcc/ChangeLog:

	* sched-int.h (struct haifa_sched_info): Change can_schedule_ready_p,
	schedule_more_p and contributes_to_priority indirect function
	type from int to bool.
	(no_real_insns_p): Change return type from int to bool.
	(contributes_to_priority): Ditto.
	* haifa-sched.cc (no_real_insns_p): Change return type from
	int to bool and adjust function body accordingly.
	* modulo-sched.cc (try_scheduling_node_in_cycle): Change "success"
	variable type from int to bool.
	(ps_insn_advance_column): Change return type from int to bool.
	(ps_has_conflicts): Ditto. Change "has_conflicts"
	variable type from int to bool.
	* sched-deps.cc (deps_may_trap_p): Change return type from int to bool.
	(conditions_mutex_p): Ditto.
	* sched-ebb.cc (schedule_more_p): Ditto.
	(ebb_contributes_to_priority): Change return type from
	int to bool and adjust function body accordingly.
	* sched-rgn.cc (is_cfg_nonregular): Ditto.
	(check_live_1): Ditto.
	(is_pfree): Ditto.
	(find_conditional_protection): Ditto.
	(is_conditionally_protected): Ditto.
	(is_prisky): Ditto.
	(is_exception_free): Ditto.
	(haifa_find_rgns): Change "unreachable" and "too_large_failure"
	variables from int to bool.
	(extend_rgns): Change "rescan" variable from int to bool.
	(check_live): Change return type from
	int to bool and adjust function body accordingly.
	(can_schedule_ready_p): Ditto.
	(schedule_more_p): Ditto.
	(contributes_to_priority): Ditto.
2023-07-05 16:58:17 +02:00
Robin Dapp
c30efd8cd6 gimple-isel: Recognize vec_extract pattern.
In gimple-isel we already deduce a vec_set pattern from an
ARRAY_REF(VIEW_CONVERT_EXPR).  This patch does the same for a
vec_extract.

The code is largely similar to the vec_set one
including the addition of a can_vec_extract_var_idx_p function
in optabs.cc to check if the backend can handle a register
operand as index.  We already have can_vec_extract in
optabs-query but that one checks whether we can extract
specific modes.

With the introduction of an internal function for vec_extract
the expander must not FAIL.  For vec_set this has already been
the case so adjust the documentation accordingly.

Additionally, clarify the wording of the vector-vector case for
vec_extract.

gcc/ChangeLog:

	* doc/md.texi: Document that vec_set and vec_extract must not
	fail.
	* gimple-isel.cc (gimple_expand_vec_set_expr): Rename this...
	(gimple_expand_vec_set_extract_expr): ...to this.
	(gimple_expand_vec_exprs): Call renamed function.
	* internal-fn.cc (vec_extract_direct): Add.
	(expand_vec_extract_optab_fn): New function to expand
	vec_extract optab.
	(direct_vec_extract_optab_supported_p): Add.
	* internal-fn.def (VEC_EXTRACT): Add.
	* optabs.cc (can_vec_extract_var_idx_p): New function.
	* optabs.h (can_vec_extract_var_idx_p): Declare.
2023-07-05 16:57:05 +02:00
Robin Dapp
573bb719bb RISC-V: Support variable index in vec_extract.
This patch adds a gen_lowpart in the vec_extract expander so it properly
works with a variable index and adds tests.

gcc/ChangeLog:

	* config/riscv/autovec.md: Add gen_lowpart.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c: Add
	tests for variable index.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c:
	Ditto.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c:
	Ditto.
2023-07-05 16:57:05 +02:00
Robin Dapp
df9a6cbb08 RISC-V: Allow variable index for vec_set.
This patch enables a variable index for vec_set and adjusts the tests.

gcc/ChangeLog:

	* config/riscv/autovec.md: Allow register index operand.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c: Adjust
	test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c:
	Ditto.
2023-07-05 16:56:46 +02:00
Pan Li
70b041684a RISC-V: Use FRM_DYN when add the rounding mode operand
This patch uses the FRM_DYN const rtx as the rounding mode
operand, in line with the RVV spec, which takes dyn as the only
rounding mode for floating-point.

Signed-off-by: Pan Li <pan2.li@intel.com>

gcc/ChangeLog:

	* config/riscv/riscv-vector-builtins.cc
	(function_expander::use_exact_insn): Use FRM_DYN instead of const0.
2023-07-05 22:26:37 +08:00
Robin Dapp
429905d809 RISC-V: Change truncate to float_truncate in narrowing patterns.
This fixes a bug in the autovect FP narrowing patterns which resulted in
a combine ICE.  It would try to e.g. simplify a unary operation by
simplify_const_unary_operation which obviously expects a float_truncate
and not a truncate for a floating-point mode.

gcc/ChangeLog:

	* config/riscv/autovec.md: Use float_truncate.
2023-07-05 15:54:51 +02:00
Ju-Zhe Zhong
34c614b7e9 VECT: Apply LEN_MASK_GATHER_LOAD/SCATTER_STORE into vectorizer
Hi, Richard and Richi.

Address comments from Richi.

Make gs_info.ifn = LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE.

I have fully tested these 4 formats:

length = vf is a dummy length,
mask = {-1,-1, ... } is a dummy mask.

1. no length, no mask
   LEN_MASK_GATHER_LOAD (..., length = vf, mask = {-1,-1,...})
2. exist length, no mask
   LEN_MASK_GATHER_LOAD (..., len, mask = {-1,-1,...})
3. exist mask, no length
   LEN_MASK_GATHER_LOAD (..., length = vf, mask)
4. both mask and length exist
   LEN_MASK_GATHER_LOAD (..., length, mask)

All of these work fine in this patch.
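
The four formats above can be modeled with a scalar loop. The sketch below is an assumption-laden illustration, not the internal function itself: the function name is hypothetical, offsets are counted in elements rather than scaled bytes, and the bias operand is left out.

```cpp
#include <cstddef>
#include <vector>

// Scalar model of a length- and mask-controlled scatter store.
// Lane i is stored only if i < len and mask[i] is true.
// Assumptions: offsets are element indices into dest (no scale operand),
// and the bias operand of the real internal function is ignored.
void len_mask_scatter_store(std::vector<int>& dest,
                            const std::vector<std::size_t>& offsets,
                            const std::vector<int>& values,
                            std::size_t len,
                            const std::vector<bool>& mask)
{
  for (std::size_t i = 0; i < values.size(); ++i)
    if (i < len && mask[i])
      dest[offsets[i]] = values[i];
}
```

Passing len equal to the number of lanes plays the role of the dummy length, and an all-true mask plays the role of the dummy mask, so one routine covers all four cases.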

Here is the example:

void
f (int *restrict a,
   int *restrict b, int n,
   int base, int step,
   int *restrict cond)
{
  for (int i = 0; i < n; ++i)
    {
      if (cond[i])
        a[i * 4] = b[i];
    }
}

Gimple IR:

  <bb 3> [local count: 105119324]:
  _58 = (unsigned long) n_13(D);

  <bb 4> [local count: 630715945]:
  # vectp_cond.7_45 = PHI <vectp_cond.7_46(4), cond_14(D)(3)>
  # vectp_b.11_51 = PHI <vectp_b.11_52(4), b_15(D)(3)>
  # vectp_a.14_55 = PHI <vectp_a.14_56(4), a_16(D)(3)>
  # ivtmp_59 = PHI <ivtmp_60(4), _58(3)>
  _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [2, 2]);
  ivtmp_44 = _61 * 4;
  vect__4.9_47 = .LEN_MASK_LOAD (vectp_cond.7_45, 32B, _61, 0, { -1, ... });
  mask__24.10_49 = vect__4.9_47 != { 0, ... };
  vect__8.13_53 = .LEN_MASK_LOAD (vectp_b.11_51, 32B, _61, 0, mask__24.10_49);
  ivtmp_54 = _61 * 16;
  .LEN_MASK_SCATTER_STORE (vectp_a.14_55, { 0, 16, 32, ... }, 1, vect__8.13_53, _61, 0, mask__24.10_49);
  vectp_cond.7_46 = vectp_cond.7_45 + ivtmp_44;
  vectp_b.11_52 = vectp_b.11_51 + ivtmp_44;
  vectp_a.14_56 = vectp_a.14_55 + ivtmp_54;
  ivtmp_60 = ivtmp_59 - _61;
  if (ivtmp_60 != 0)
    goto <bb 4>; [83.33%]
  else
    goto <bb 5>; [16.67%]

Ok for trunk ?

gcc/ChangeLog:

	* internal-fn.cc (internal_fn_len_index): Apply
	LEN_MASK_GATHER_LOAD/SCATTER_STORE into vectorizer.
	(internal_fn_mask_index): Ditto.
	* optabs-query.cc (supports_vec_gather_load_p): Ditto.
	(supports_vec_scatter_store_p): Ditto.
	* tree-vect-data-refs.cc (vect_gather_scatter_fn_p): Ditto.
	* tree-vect-patterns.cc (vect_recog_gather_scatter_pattern): Ditto.
	* tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
	(vect_get_strided_load_store_ops): Ditto.
	(vectorizable_store): Ditto.
	(vectorizable_load): Ditto.
2023-07-05 21:27:48 +08:00
Robin Dapp
f4a2ae2338 Change MODE_BITSIZE to MODE_PRECISION for MODE_VECTOR_BOOL.
RISC-V lowers the TYPE_PRECISION for MODE_VECTOR_BOOL vectors in order
to distinguish between VNx1BI, VNx2BI, VNx4BI and VNx8BI.

This patch adjusts uses of MODE_VECTOR_BOOL to use GET_MODE_PRECISION
instead of GET_MODE_BITSIZE.

The RISC-V tests are provided by Juzhe.

Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>

gcc/c-family/ChangeLog:

	* c-common.cc (c_common_type_for_mode): Use GET_MODE_PRECISION.

gcc/ChangeLog:

	* simplify-rtx.cc (native_encode_rtx): Ditto.
	(native_decode_vector_rtx): Ditto.
	(simplify_const_vector_byte_offset): Ditto.
	(simplify_const_vector_subreg): Ditto.
	* tree.cc (build_truth_vector_type_for_mode): Ditto.
	* varasm.cc (output_constant_pool_2): Ditto.

gcc/fortran/ChangeLog:

	* trans-types.cc (gfc_type_for_mode): Ditto.

gcc/go/ChangeLog:

	* go-lang.cc (go_langhook_type_for_mode): Ditto.

gcc/lto/ChangeLog:

	* lto-lang.cc (lto_type_for_mode): Ditto.

gcc/rust/ChangeLog:

	* backend/rust-tree.cc (c_common_type_for_mode): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-1.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-10.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-11.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-12.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-13.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-14.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-2.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-3.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-4.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-5.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-6.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-7.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-8.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-9.c: New test.
2023-07-05 11:36:39 +02:00
YunQiang Su
5f5e37dcbc MIPS: Use unaligned access to expand block_move on r6
MIPSr6 supports unaligned memory access with the normal lh/sh/lw/sw/ld/sd
instructions, and thus lwl/lwr/ldl/ldr and swl/swr/sdl/sdr are removed.

At the microarchitecture level, these memory access instructions issue 2
operations if the address is not aligned, which is like what the lwl family
does.

In some situations (such as accessing page boundaries) on some
microarchitectures, the unaligned access may not be good enough,
so the kernel should trap and emulate it; the kernel may then need the
-mno-unaligned-access option.

gcc/
	* config/mips/mips.cc (mips_expand_block_move): Don't expand for
	r6 with -mno-unaligned-access option if one or both of src and
	dest are unaligned.  Restructure: return directly if length is not const.
	(mips_block_move_straight): emit_move if ISA_HAS_UNALIGNED_ACCESS.

gcc/testsuite/
	* gcc.target/mips/expand-block-move-r6-no-unaligned.c: new test.
	* gcc.target/mips/expand-block-move-r6.c: new test.
2023-07-05 17:26:02 +08:00
Richard Biener
a9c6db31cb adjust testcase for now happening epilogue vectorization
gcc.dg/vect/slp-perm-9.c is reported to FAIL with -march=cascadelake
now which is because we now vectorize the epilogue with V2HImode
vectors after the recent change to not scrap too large vector
epilogues during transform but during analysis time.

The following adjusts the testcase to always use the existing alternate
N which avoids epilogue vectorization.

	* gcc.dg/vect/slp-perm-9.c: Always use alternate N.
2023-07-05 10:05:05 +02:00
Jan Beulich
b647f75a6a x86: suppress avx512f-copysign.c testcase for 32-bit
The test installed by "x86: make VPTERNLOG* usable on less than 512-bit
operands with just AVX512F" won't succeed on 32-bit, for floating point
operations being done there (by default) without using SIMD insns.

gcc/testsuite/

	* gcc.target/i386/avx512f-copysign.c: Suppress for 32-bit.
2023-07-05 09:52:41 +02:00
Jan Beulich
e007369c8b x86: yet more PR target/100711-like splitting
Following two-operand bitwise operations, add another splitter to also
deal with not followed by broadcast all on its own, which can be
expressed as simple embedded broadcast instead once a broadcast operand
is actually permitted in the respective insn. While there also permit
a broadcast operand in the corresponding expander.

gcc/

	PR target/100711
	* config/i386/sse.md: New splitters to simplify
	not;vec_duplicate as a singular vpternlog.
	(one_cmpl<mode>2): Allow broadcast for operand 1.
	(<mask_codefor>one_cmpl<mode>2<mask_name>): Likewise.

gcc/testsuite/

	PR target/100711
	* gcc.target/i386/pr100711-6.c: New test.
2023-07-05 09:49:16 +02:00
Jan Beulich
fa58c2871a x86: further PR target/100711-like splitting
With respective two-operand bitwise operations now expressable by a
single VPTERNLOG, add splitters to also deal with ior and xor
counterparts of the original and-only case. Note that the splitters need
to be separate, as the placement of "not" differs in the final insns
(*iornot<mode>3, *xnor<mode>3) which are intended to pick up one half of
the result.

gcc/

	PR target/100711
	* config/i386/sse.md: New splitters to simplify
	not;vec_duplicate;{ior,xor} as vec_duplicate;{iornot,xnor}.

gcc/testsuite/

	PR target/100711
	* gcc.target/i386/pr100711-4.c: New test.
	* gcc.target/i386/pr100711-5.c: New test.
2023-07-05 09:48:47 +02:00
Jan Beulich
3186ef0cb9 x86: allow memory operand for AVX2 splitter for PR target/100711
The intended broadcast (with AVX512) can very well be done right from
memory.

gcc/

	PR target/100711
	* config/i386/sse.md: Permit non-immediate operand 1 in AVX2
	form of splitter for PR target/100711.
2023-07-05 09:48:19 +02:00
Richard Biener
9fed1ec67f middle-end/110541 - VEC_PERM_EXPR documentation is off
The following adjusts the tree.def documentation about VEC_PERM_EXPR
which wasn't adjusted when the restrictions of permutes with constant
mask were relaxed.

	PR middle-end/110541
	* tree.def (VEC_PERM_EXPR): Adjust documentation to reflect
	reality.
2023-07-05 09:44:24 +02:00
Jan Beulich
2d11c99dfc x86: use VPTERNLOG also for certain andnot forms
When it's the memory operand which is to be inverted, using VPANDN*
requires a further load instruction. The same can be achieved by a
single VPTERNLOG*. Add two new alternatives (for plain memory and
embedded broadcast), adjusting the predicate for the first operand
accordingly.

Two pre-existing testcases actually end up being affected (improved) by
the change, which is reflected in updated expectations there.

gcc/

	PR target/93768
	* config/i386/sse.md (*andnot<mode>3): Add new alternatives
	for memory form operand 1.

gcc/testsuite/

	PR target/93768
	* gcc.target/i386/avx512f-andn-di-zmm-2.c: New test.
	* gcc.target/i386/avx512f-andn-si-zmm-2.c: Adjust expectations
	towards generated code.
	* gcc.target/i386/pr100711-3.c: Adjust expectations for 32-bit
	code.
2023-07-05 09:41:09 +02:00
Jan Beulich
607613e516 x86: use VPTERNLOG for further bitwise two-vector operations
All combinations of and, ior, xor, and not involving two operands can be
expressed that way in a single insn.
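
As an illustration of why a single VPTERNLOG suffices, the immediate is an 8-entry truth table indexed by the three input bits, so any two-operand function can be encoded by ignoring the third input. The sketch below uses the common 0xF0/0xCC/0xAA convention for the three inputs; the helper names are hypothetical, and the immediates GCC actually emits depend on how it orders the operands.

```cpp
#include <cstdint>

// Truth-table patterns of the three VPTERNLOG inputs: bit i of the
// immediate is the result for input bits (a, b, c) with i = a*4 + b*2 + c.
constexpr std::uint8_t TL_A = 0xF0;
constexpr std::uint8_t TL_B = 0xCC;
constexpr std::uint8_t TL_C = 0xAA;  // unused by two-operand functions

// Immediates for some two-operand functions (hypothetical helper names).
constexpr std::uint8_t imm_and()    { return TL_A & TL_B; }
constexpr std::uint8_t imm_andnot() { return static_cast<std::uint8_t>(~TL_A & TL_B); }
constexpr std::uint8_t imm_iornot() { return static_cast<std::uint8_t>(TL_A | ~TL_B); }
constexpr std::uint8_t imm_xnor()   { return static_cast<std::uint8_t>(~(TL_A ^ TL_B)); }

// Evaluate one bit position of a ternary-logic operation by table lookup.
constexpr bool ternlog_bit(std::uint8_t imm, bool a, bool b, bool c)
{
  return (imm >> ((a << 2) | (b << 1) | (c ? 1 : 0))) & 1;
}
```

Because the table has an entry for every input combination, placing "not" on either operand or on the result only changes the immediate, not the number of instructions.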

gcc/

	PR target/93768
	* config/i386/i386.cc (ix86_rtx_costs): Further special-case
	bitwise vector operations.
	* config/i386/sse.md (*iornot<mode>3): New insn.
	(*xnor<mode>3): Likewise.
	(*<nlogic><mode>3): Likewise.
	(andor): New code iterator.
	(nlogic): New code attribute.
	(ternlog_nlogic): Likewise.

gcc/testsuite/

	PR target/93768
	* gcc.target/i386/avx512-binop-not-1.h: New.
	* gcc.target/i386/avx512-binop-not-2.h: New.
	* gcc.target/i386/avx512f-orn-si-zmm-1.c: New test.
	* gcc.target/i386/avx512f-orn-si-zmm-2.c: New test.
2023-07-05 09:40:40 +02:00
Richard Biener
450b9566d5 Fix typo in vectorizer debug message
* tree-vect-stmts.cc (vect_mark_relevant): Fix typo.
2023-07-05 09:33:10 +02:00
Jonathan Wakely
cd9964b7e2 libstdc++: Disable std::forward_list tests for C++98 mode
These tests fail with -std=gnu++98/-D_GLIBCXX_DEBUG in the runtest
flags. They should require the c++11 effective target.

libstdc++-v3/ChangeLog:

	* testsuite/23_containers/forward_list/debug/iterator1_neg.cc:
	Skip as UNSUPPORTED for C++98 mode.
	* testsuite/23_containers/forward_list/debug/iterator3_neg.cc:
	Likewise.
2023-07-05 07:39:04 +01:00
Jonathan Wakely
83cae6c4b7 libstdc++: Fix std::__uninitialized_default_n for constant evaluation [PR110542]
libstdc++-v3/ChangeLog:

	PR libstdc++/110542
	* include/bits/stl_uninitialized.h (__uninitialized_default_n):
	Do not use std::fill_n during constant evaluation.
2023-07-05 07:39:04 +01:00
Jonathan Wakely
4870a18ac2 libstdc++: Use RAII in std::vector::_M_default_append
Similar to r14-2052-gdd2eb972a5b063, replace the try-block with RAII
types for deallocating storage and destroying elements.
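
The general shape of such a change can be sketched as below. This is not the libstdc++ code: the guard type and function are hypothetical, only the storage-deallocating guard is shown, and std::allocator<int> stands in for the container's allocator.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>

// Hypothetical guard: owns freshly allocated storage and returns it to the
// allocator on scope exit unless release() has been called.
struct StorageGuard {
  std::allocator<int>& alloc;
  int* ptr;
  std::size_t len;

  StorageGuard(std::allocator<int>& a, std::size_t n)
    : alloc(a), ptr(a.allocate(n)), len(n) {}
  ~StorageGuard() { if (ptr) alloc.deallocate(ptr, len); }
  StorageGuard(const StorageGuard&) = delete;
  StorageGuard& operator=(const StorageGuard&) = delete;

  // Hand ownership to the caller once nothing can throw any more.
  int* release() { int* p = ptr; ptr = nullptr; return p; }
};

// No try-block needed: if anything throws after the allocation, the
// guard's destructor frees the storage during stack unwinding.
int* allocate_filled(std::allocator<int>& alloc, std::size_t n, int value,
                     bool simulate_failure)
{
  StorageGuard guard(alloc, n);
  std::uninitialized_fill_n(guard.ptr, n, value);
  if (simulate_failure)
    throw std::runtime_error("element construction failed");
  return guard.release();
}
```

On success the caller owns the returned storage and must deallocate it with the same allocator and length.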

libstdc++-v3/ChangeLog:

	* include/bits/vector.tcc (_M_default_append): Replace try-block
	with RAII types.
2023-07-05 07:39:04 +01:00
Jonathan Wakely
49f2b325ec libstdc++: Add redundant 'typename' to std::projected
This is needed by Clang 15.

libstdc++-v3/ChangeLog:

	* include/bits/iterator_concepts.h (projected): Add typename.
2023-07-05 07:39:03 +01:00
yulong
8377cf1bf4 RISC-V:Add float16 tuple type abi
gcc/ChangeLog:

	* config/riscv/vector.md: Add float16 attr at sew, vlmul and ratio.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/abi-10.c: Add float16 tuple type case.
	* gcc.target/riscv/rvv/base/abi-11.c: Ditto.
	* gcc.target/riscv/rvv/base/abi-12.c: Ditto.
	* gcc.target/riscv/rvv/base/abi-15.c: Ditto.
	* gcc.target/riscv/rvv/base/abi-8.c: Ditto.
	* gcc.target/riscv/rvv/base/abi-9.c: Ditto.
	* gcc.target/riscv/rvv/base/abi-17.c: New test.
	* gcc.target/riscv/rvv/base/abi-18.c: New test.
2023-07-05 09:46:23 +08:00
yulong
0af87afb3f RISC-V:Add float16 tuple type support
This patch adds support for the float16 tuple type.

gcc/ChangeLog:

	* config/riscv/genrvv-type-indexer.cc (valid_type): Enable FP16 tuple.
	* config/riscv/riscv-modes.def (RVV_TUPLE_MODES): New macro.
	(ADJUST_ALIGNMENT): Ditto.
	(RVV_TUPLE_PARTIAL_MODES): Ditto.
	(ADJUST_NUNITS): Ditto.
	* config/riscv/riscv-vector-builtins-types.def (vfloat16mf4x2_t):
	New types.
	(vfloat16mf4x3_t): Ditto.
	(vfloat16mf4x4_t): Ditto.
	(vfloat16mf4x5_t): Ditto.
	(vfloat16mf4x6_t): Ditto.
	(vfloat16mf4x7_t): Ditto.
	(vfloat16mf4x8_t): Ditto.
	(vfloat16mf2x2_t): Ditto.
	(vfloat16mf2x3_t): Ditto.
	(vfloat16mf2x4_t): Ditto.
	(vfloat16mf2x5_t): Ditto.
	(vfloat16mf2x6_t): Ditto.
	(vfloat16mf2x7_t): Ditto.
	(vfloat16mf2x8_t): Ditto.
	(vfloat16m1x2_t): Ditto.
	(vfloat16m1x3_t): Ditto.
	(vfloat16m1x4_t): Ditto.
	(vfloat16m1x5_t): Ditto.
	(vfloat16m1x6_t): Ditto.
	(vfloat16m1x7_t): Ditto.
	(vfloat16m1x8_t): Ditto.
	(vfloat16m2x2_t): Ditto.
	(vfloat16m2x3_t): Ditto.
	(vfloat16m2x4_t): Ditto.
	(vfloat16m4x2_t): Ditto.
	* config/riscv/riscv-vector-builtins.def (vfloat16mf4x2_t): New macro.
	(vfloat16mf4x3_t): Ditto.
	(vfloat16mf4x4_t): Ditto.
	(vfloat16mf4x5_t): Ditto.
	(vfloat16mf4x6_t): Ditto.
	(vfloat16mf4x7_t): Ditto.
	(vfloat16mf4x8_t): Ditto.
	(vfloat16mf2x2_t): Ditto.
	(vfloat16mf2x3_t): Ditto.
	(vfloat16mf2x4_t): Ditto.
	(vfloat16mf2x5_t): Ditto.
	(vfloat16mf2x6_t): Ditto.
	(vfloat16mf2x7_t): Ditto.
	(vfloat16mf2x8_t): Ditto.
	(vfloat16m1x2_t): Ditto.
	(vfloat16m1x3_t): Ditto.
	(vfloat16m1x4_t): Ditto.
	(vfloat16m1x5_t): Ditto.
	(vfloat16m1x6_t): Ditto.
	(vfloat16m1x7_t): Ditto.
	(vfloat16m1x8_t): Ditto.
	(vfloat16m2x2_t): Ditto.
	(vfloat16m2x3_t): Ditto.
	(vfloat16m2x4_t): Ditto.
	(vfloat16m4x2_t): Ditto.
	* config/riscv/riscv-vector-switch.def (TUPLE_ENTRY): New.
	* config/riscv/riscv.md: New.
	* config/riscv/vector-iterators.md: New.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/tuple-28.c: New test.
	* gcc.target/riscv/rvv/base/tuple-29.c: New test.
	* gcc.target/riscv/rvv/base/tuple-30.c: New test.
	* gcc.target/riscv/rvv/base/tuple-31.c: New test.
	* gcc.target/riscv/rvv/base/tuple-32.c: New test.
2023-07-05 09:46:02 +08:00
Jie Mei
9d5dbf706a MIPS: Adjust mips16e2 related tests for ifcvt costing changes
A mips16e2 related test fails after the ifcvt change. The mips16e2
addition also causes a test for unrelated module to fail.

This patch adjusts branch costs when running the two affected tests.

These tests should not require the -mbranch-cost option, and
this issue needs to be addressed.

gcc/testsuite/ChangeLog:

	* gcc.target/mips/mips16e2-cmov.c: Adjust branch cost to
	encourage if-conversion.
	* gcc.target/mips/movcc-3.c: Same as above.
2023-07-05 09:35:10 +08:00
GCC Administrator
6d966f9f17 Daily bump. 2023-07-05 00:17:06 +00:00
Andrew Pinski
71b68cc559 PR 110487: (a !=/== CST1 ? CST2 : CST3) pattern for type safety
The problem here is we might produce some values out of the type's
min/max (and/or valid values, e.g. signed booleans). The fix is to
use an integer type which has the same precision and signedness
as the original type.
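
To make the range problem concrete, the following sketch checks whether a constant fits a signed integer type of a given precision. The helper is hypothetical and unrelated to the match.pd code; it only shows that a 1-bit signed type (a signed boolean) can hold -1 and 0 but not 1.

```cpp
// Hypothetical helper: does VALUE fit in a signed two's-complement type
// with PRECISION bits?  Assumes 1 <= precision <= 62.
constexpr bool fits_signed(long long value, unsigned precision)
{
  const long long min = -(1LL << (precision - 1));
  const long long max = (1LL << (precision - 1)) - 1;
  return value >= min && value <= max;
}
```

With precision 1 the valid range is [-1, 0], which is why an intermediate value of 1 is invalid for a signed boolean.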

Note two_value_replacement in phiopt had the same issue in previous
versions; though I don't know if a problem will show up there.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

	PR tree-optimization/110487
	* match.pd (a !=/== CST1 ? CST2 : CST3): Always
	build a nonstandard integer and use that.
2023-07-04 10:19:50 -07:00
Andrew Pinski
2e5c1b123d Fix PR 110487: invalid signed boolean value
This fixes the first part of this bug, where `a ? -1 : 0`
would put a value of 1 into the signed boolean.
It fixes the problem by casting to an integer type of
the same size/signedness before doing the negation and
then casting to the type of the expression.

OK? Bootstrapped and tested on x86_64.

gcc/ChangeLog:

	* match.pd (a?-1:0): Cast to an integer type
	rather than the type before the negation.
	(a?0:-1): Likewise.
2023-07-04 10:19:50 -07:00
Takayuki 'January June' Suwa
cd22b97726 xtensa: Use HARD_REG_SET instead of bare integer
gcc/ChangeLog:

	* config/xtensa/xtensa.cc (machine_function, xtensa_expand_prologue):
	Change to use HARD_REG_BIT and its macros.
	* config/xtensa/xtensa.md
	(peephole2: regmove elimination during DFmode input reload):
	Likewise.
2023-07-04 08:38:17 -07:00
Richard Biener
819285ef10 tree-optimization/110491 - PHI-OPT and undefs
The following makes sure to not make conditional undefs in PHI arguments
unconditional by folding cond ? arg1 : arg2.

	PR tree-optimization/110491
	* tree-ssa-phiopt.cc (match_simplify_replacement): Check
	whether the PHI args are possibly undefined before folding
	the COND_EXPR.

	* gcc.dg/torture/pr110491.c: New testcase.
2023-07-04 14:13:33 +02:00
Pan Li
86ff0533fc Streamer: Fix out of range memory access of machine mode
We have already extended the machine mode from 8 to 16 bits, but one
place in the streamer was missed: it has a hard-coded array for the
machine modes with size 256.

In the lto pass, we memset the array by MAX_MACHINE_MODE count, but the
value of MAX_MACHINE_MODE grows as more and more modes are added, while
the machine mode array in tree-streamer is still left at 256.

Then, when MAX_MACHINE_MODE is greater than 256, the memset in
lto_output_init_mode_table touches memory out of range.

This patch takes MAX_MACHINE_MODE as the size of the array in the
streamer, to make sure there is no unexpected memory access in the
future. Meanwhile, this patch also adjusts some places which assume
MAX_MACHINE_MODE <= 256.

Care is taken that for offload compilation, we interpret the stream-in
data in terms of the host 'MAX_MACHINE_MODE' ('file_data->mode_bits'),
which very likely is different from the offload device
'MAX_MACHINE_MODE'.
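
The sizing idea can be sketched as follows: the table is sized by the number of modes instead of a literal 256, and the number of bits needed to stream a mode is derived from that count. The names and the example count below are hypothetical and only stand in for MAX_MACHINE_MODE and GCC's own ceil_log2.

```cpp
#include <cstddef>

// Smallest k such that (1 << k) >= n, for n >= 1.
constexpr unsigned ceil_log2_u(std::size_t n)
{
  unsigned k = 0;
  while ((std::size_t{1} << k) < n)
    ++k;
  return k;
}

// Hypothetical mode count standing in for MAX_MACHINE_MODE.
constexpr std::size_t kMaxMachineMode = 300;

// Table sized by the mode count, so clearing kMaxMachineMode entries can
// never run past the end of the array.
unsigned char mode_table[kMaxMachineMode];

// Bits needed to pack any mode index, and the matching packing limit.
constexpr unsigned kModeBits = ceil_log2_u(kMaxMachineMode);
constexpr std::size_t kPackingLimit = std::size_t{1} << kModeBits;
```

With 300 modes this yields 9 bits and a packing limit of 512, whereas a fixed 8-bit field would silently truncate any mode index of 256 or above.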

	gcc/
	* lto-streamer-in.cc (lto_input_mode_table): Stream in the mode
	bits for machine mode table.
	* lto-streamer-out.cc (lto_write_mode_table): Stream out the
	HOST machine mode bits.
	* lto-streamer.h (struct lto_file_decl_data): New field mode_bits.
	* tree-streamer.cc (streamer_mode_table): Take MAX_MACHINE_MODE
	as the table size.
	* tree-streamer.h (streamer_mode_table): Ditto.
	(bp_pack_machine_mode): Take 1 << ceil_log2 (MAX_MACHINE_MODE)
	as the packing limit.
	(bp_unpack_machine_mode): Ditto with 'file_data->mode_bits'.
	gcc/lto/
	* lto-common.cc (lto_file_finalize) [!ACCEL_COMPILER]: Initialize
	'file_data->mode_bits'.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
2023-07-04 14:10:54 +02:00
Thomas Schwinge
d7faf7a54e LTO: Capture 'lto_file_decl_data *file_data' in 'class lto_input_block'
... instead of just 'unsigned char *mode_table'.  Preparation for a forthcoming
change, where we need to capture an additional 'file_data' item, so it seems
easier to just capture that one proper.

	gcc/
	* lto-streamer.h (class lto_input_block): Capture
	'lto_file_decl_data *file_data' instead of just
	'unsigned char *mode_table'.
	* ipa-devirt.cc (ipa_odr_read_section): Adjust.
	* ipa-fnsummary.cc (inline_read_section): Likewise.
	* ipa-icf.cc (sem_item_optimizer::read_section): Likewise.
	* ipa-modref.cc (read_section): Likewise.
	* ipa-prop.cc (ipa_prop_read_section, read_replacements_section):
	Likewise.
	* ipa-sra.cc (isra_read_summary_section): Likewise.
	* lto-cgraph.cc (input_cgraph_opt_section): Likewise.
	* lto-section-in.cc (lto_create_simple_input_block): Likewise.
	* lto-streamer-in.cc (lto_read_body_or_constructor)
	(lto_input_toplevel_asms): Likewise.
	* tree-streamer.h (bp_unpack_machine_mode): Likewise.
	gcc/lto/
	* lto-common.cc (lto_read_decls): Adjust.
2023-07-04 14:10:54 +02:00
Richard Biener
1135073424 Use mark_ssa_maybe_undefs in PHI-OPT
The following removes gimple_uses_undefined_value_p and instead
uses the conservative mark_ssa_maybe_undefs in PHI-OPT, the last
user of the other API.

	* tree-ssa-phiopt.cc (pass_phiopt::execute): Mark SSA undefs.
	(empty_bb_or_one_feeding_into_p): Check for them.
	* tree-ssa.h (gimple_uses_undefined_value_p): Remove.
	* tree-ssa.cc (gimple_uses_undefined_value_p): Likewise.
2023-07-04 12:32:56 +02:00
Richard Biener
6eea7eaf11 Remove unnecessary check on scalar_niter == 0
The following removes an unnecessary check.

	* tree-vect-loop.cc (vect_analyze_loop_costing): Remove
	check guarding scalar_niter underflow.
2023-07-04 12:32:25 +02:00
Richard Biener
d4800a23d8 tree-optimization/110376 - testcase for fixed bug
This is a new testcase for the fixed bug.

	PR tree-optimization/110376
	* gcc.dg/torture/pr110376.c: New testcase.
2023-07-04 12:29:08 +02:00
Hao Liu
2c12ccf800 PR tree-optimization/110531 - Vect: avoid using uninitialized variable
slp_done_for_suggested_uf is used directly in vect_analyze_loop_2
without initialization, which is undefined behavior.  Initialize it to false
according to the discussion.

gcc/ChangeLog:
	PR tree-optimization/110531
	* tree-vect-loop.cc (vect_analyze_loop_1): initialize
	slp_done_for_suggested_uf to false.
2023-07-04 17:19:23 +08:00
Richard Biener
b083203f05 tree-optimization/110228 - avoid undefs in ifcombine more thoroughly
The following replaces the simplistic gimple_uses_undefined_value_p
with the conservative mark_ssa_maybe_undefs approach as already
used by LIM and IVOPTs.  This is to avoid exposing an unconditional
uninitialized read on a path from entry by if-combine.

	PR tree-optimization/110228
	* tree-ssa-ifcombine.cc (pass_tree_ifcombine::execute):
	Mark SSA may-undefs.
	(bb_no_side_effects_p): Check stmt uses for undefs.

	* gcc.dg/torture/pr110228.c: New testcase.
	* gcc.dg/uninit-pr101912.c: Un-XFAIL.
2023-07-04 11:11:45 +02:00
Richard Biener
729aa4fa48 tree-optimization/110436 - bogus live/relevant for unused pattern
When we compute liveness and relevance we have to make sure to
handle live but not relevant stmts in a way we can later vectorize
them.  When the stmt uses only operands that do not need vectorization
we can just leave such stmts in place - but not in the case they
are recognized as patterns.  Since we don't have a way to cancel
pattern recognition we have to force mark such stmts as relevant.

	PR tree-optimization/110436
	* tree-vect-stmts.cc (vect_mark_relevant): Expand dumping,
	force live but not relevant pattern stmts relevant.

	* gcc.dg/pr110436.c: New testcase.
2023-07-04 10:37:03 +02:00