For the testcase in the PR, we have
br64 = br;
br64 = ((br64 << 16) & 0x000000ff00000000ull) | (br64 & 0x0000ff00ull);
n->n: 0x300000200.
n->range: 32.
n->type: uint64.
The original code assumes n->range is the same as TYPE_PRECISION (n->type),
and tries to rotate the mask from 0x300000200 -> 0x20300, which is
incorrect. This patch fixes the bug by not trying bswap + rotate when
TYPE_PRECISION (n->type) is not equal to n->range.
gcc/ChangeLog:
PR tree-optimization/110067
* gimple-ssa-store-merging.cc (find_bswap_or_nop): Don't try
bswap + rotate when TYPE_PRECISION(n->type) > n->range.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr110067.c: New test.
The problem here is that DSE was not taking the address space into account,
which meant that if you had two addresses, say `fs:0` and `gs:0` (on x86_64),
DSE would think they were the same and remove the first store.
This fixes that issue by also checking the address space.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR rtl-optimization/102733
gcc/ChangeLog:
* dse.cc (store_info): Add addrspace field.
(record_store): Record the address space
and check to make sure they are the same.
gcc/testsuite/ChangeLog:
* gcc.target/i386/addr-space-6.c: New test.
After r14-1014-gc5df248509b489364c573e8, GCC started to emit
a zero_extract directly for `(t1&0x8)!=0`. This introduced
a small regression where ifcvt would not do the if-conversion,
because there is now a paradoxical subreg in the destination, which
was being rejected. Since a paradoxical subreg sets the whole
register, we can treat it the same as a reg in the two places.
OK? Bootstrapped and tested on x86_64-linux-gnu and aarch64-linux-gnu.
gcc/ChangeLog:
PR rtl-optimization/110042
* ifcvt.cc (bbs_ok_for_cmove_arith): Allow paradoxical subregs.
(bb_valid_for_noce_process_p): Strip the subreg for the SET_DEST.
gcc/testsuite/ChangeLog:
PR rtl-optimization/110042
* gcc.target/aarch64/csel_bfx_2.c: New test.
This bug was essentially that darwin_rs6000_special_round_type_align()
was ignoring externally-imposed capping of field alignment.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR target/110044
gcc/ChangeLog:
* config/rs6000/rs6000.cc (darwin_rs6000_special_round_type_align):
Make sure that we do not have a cap on field alignment before altering
the struct layout based on the type alignment of the first entry.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/darwin-abi-13-0.c: New test.
* gcc.target/powerpc/darwin-abi-13-1.c: New test.
* gcc.target/powerpc/darwin-abi-13-2.c: New test.
* gcc.target/powerpc/darwin-structs-0.h: New test.
Commit 7aae58b04b "btf: improve -dA comments for testsuite" broke
bootstrap on a number of architectures because it introduced some
new -Wformat errors.
Fix those errors by properly using PRIu64 and by making a small refactor
to the offending code.
Based on a patch suggested by Rainer Orth.
PR debug/110073
gcc/ChangeLog:
* btfout.cc (btf_absolute_func_id): New function.
(btf_asm_func_type): Call it here. Change index parameter from
size_t to ctf_id_t. Use PRIu64 formatter.
g:7aae58b04b92303ccda3ead600be98f0d4b7f462 introduced -Wformat errors
breaking bootstrap on some targets. This patch fixes that.
Committed as obvious.
gcc/ChangeLog:
* btfout.cc (btf_asm_type): Use PRIu64 instead of %lu for uint64_t.
(btf_asm_datasec_type): Likewise.
In the testcase, the user wants the assignment to use the operator= declared
in the class, but because [over.match.list] says that explicit constructors
are also considered for list-initialization, as affirmed in CWG1228, we end
up choosing the implicitly-declared copy assignment operator, using the
explicit constructor template for the argument, which is ill-formed. Other
implementations haven't implemented CWG1228, so we keep getting bug reports.
Discussion in CWG led to the idea for this targeted relaxation: if we use an
explicit constructor for the conversion to the argument of a copy or move
special member function, that makes the candidate worse than another.
DR 2735
PR c++/109247
gcc/cp/ChangeLog:
* call.cc (sfk_copy_or_move): New.
(joust): Add tiebreaker for explicit conv and copy ctor.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/initlist-explicit3.C: New test.
The third argument for __builtin_altivec_tr_stxvrhx should be short *
not int *. Similarly, the third argument for __builtin_altivec_tr_stxvrwx
should be int * not short *. This patch fixes the arguments in the two
builtins.
A runnable test case is added to test the __builtin_altivec_tr_stxvrbx,
__builtin_altivec_tr_stxvrhx, __builtin_altivec_tr_stxvrwx and
__builtin_altivec_tr_stxvrdx builtins.
gcc/
* config/rs6000/rs6000-builtins.def (__builtin_altivec_tr_stxvrhx,
__builtin_altivec_tr_stxvrwx): Fix type of third argument.
gcc/testsuite/
* gcc.target/powerpc/builtin_altivec_tr_stxvr_runnable.c: New test
for __builtin_altivec_tr_stxvrbx, __builtin_altivec_tr_stxvrhx,
__builtin_altivec_tr_stxvrwx, __builtin_altivec_tr_stxvrdx.
After the maybe_init_list_as_* patches, I noticed that we were putting the
array of strings into .rodata, but then memcpying it into an automatic
array, which is pointless; we should be able to use it directly.
This doesn't happen automatically because TREE_ADDRESSABLE is set (since
r12-657 for PR100464), and so gimplify_init_constructor won't promote the
variable to static. Theoretically we could do escape analysis to recognize
that the address, though taken, never leaves the function; that would allow
promotion when we're only using the address for indexing within the
function, as in initlist-opt2.C. But this would be a new pass.
And in initlist-opt1.C, we're passing the array address to another function,
so it definitely escapes; it's only safe in this case because it's calling a
standard library function that we know only uses it for indexing. So, a
flag seems needed. I first thought to put the flag on the TARGET_EXPR, but
the VAR_DECL seems more appropriate.
In a previous revision of the patch I called this flag DECL_NOT_OBSERVABLE,
but I think DECL_MERGEABLE is a better name, especially if we're going to
apply it to the backing array of initializer_list, which is observable. I
then also check it in places that check for -fmerge-all-constants, so that
multiple equivalent initializer-lists can also be combined. And then it
seemed to make sense for [[no_unique_address]] to have this meaning for
user-written variables.
I think the note in [dcl.init.list]/6 intended to allow this kind of merging
for initializer_lists, but it didn't actually work; for an explicit array
with the same initializer, if the address escapes, the program could tell
whether the same variable in two frames has the same address. P2752 is
trying to correct this defect, so I'm going to assume that this is the
intent.
PR c++/110070
PR c++/105838
gcc/ChangeLog:
* tree.h (DECL_MERGEABLE): New.
* tree-core.h (struct tree_decl_common): Mention it.
* gimplify.cc (gimplify_init_constructor): Check it.
* cgraph.cc (symtab_node::address_can_be_compared_p): Likewise.
* varasm.cc (categorize_decl_for_section): Likewise.
gcc/cp/ChangeLog:
* call.cc (maybe_init_list_as_array): Set DECL_MERGEABLE.
(convert_like_internal) [ck_list]: Set it.
(set_up_extended_ref_temp): Copy it.
* tree.cc (handle_no_unique_addr_attribute): Set it.
gcc/testsuite/ChangeLog:
* g++.dg/tree-ssa/initlist-opt1.C: Check for static array.
* g++.dg/tree-ssa/initlist-opt2.C: Likewise.
* g++.dg/tree-ssa/initlist-opt4.C: New test.
* g++.dg/opt/icf1.C: New test.
* g++.dg/opt/icf2.C: New test.
* g++.dg/opt/icf3.C: New test.
* g++.dg/tree-ssa/array-temp1.C: Revert r12-657 change.
Also change some internal variables to bool and recode the handling of
boolean variables so as not to use bitwise or.
gcc/ChangeLog:
* rtl.h (stack_regs_mentioned): Change return type from int to bool.
* reg-stack.cc (struct_block_info_def): Change "done" to bool.
(stack_regs_mentioned_p): Change return type from int to bool
and adjust function body accordingly.
(stack_regs_mentioned): Ditto.
(check_asm_stack_operands): Ditto. Change "malformed_asm"
variable to bool.
(move_for_stack_reg): Recode handling of control_flow_insn_deleted.
(swap_rtx_condition_1): Change return type from int to bool
and adjust function body accordingly. Change "r" variable to bool.
(swap_rtx_condition): Change return type from int to bool
and adjust function body accordingly.
(subst_stack_regs_pat): Recode handling of control_flow_insn_deleted.
(subst_stack_regs): Ditto.
(convert_regs_entry): Change return type from int to bool and adjust
function body accordingly. Change "inserted" variable to bool.
(convert_regs_1): Recode handling of control_flow_insn_deleted.
(convert_regs_2): Recode handling of cfg_altered.
(convert_regs): Ditto. Change "inserted" variable to bool.
In PR95226, the testcase was failing because we tried to output_constant a
NOP_EXPR to float from a double REAL_CST, and so we output a double where
the caller wanted a float. That doesn't happen anymore, but with the
output_constant hunk we will ICE in that situation rather than emit the
wrong number of bytes.
Part of the problem was that initializer_constant_valid_p_1 returned true
for that NOP_EXPR, because it compared the sizes of integer types but not
floating-point types. So the C++ front end assumed it didn't need to fold
the initializer.
PR c++/95226
gcc/ChangeLog:
* varasm.cc (output_constant) [REAL_TYPE]: Check that sizes match.
(initializer_constant_valid_p_1): Compare float precision.
gcc/analyzer/ChangeLog:
* store.cc (store::eval_alias_1): Regions in different memory
spaces can't alias.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
pr107557-[12].c use the -flto option but do not check that the target
supports LTO. This patch adds dg-require-effective-target lto to the testcases.
* gcc.dg/pr107557-1.c: Require LTO support.
* gcc.dg/pr107557-2.c: Require LTO support.
Signed-off-by: David Edelsohn <dje.gcc@gmail.com>
Explicitly say that attempting to shift past the element bit width is UB
for vector types. Mention that integer promotions do not happen.
gcc/ChangeLog:
* doc/extend.texi (Vector Extensions): Clarify bitwise shift
semantics.
Following Richi's suggestion, I changed the current decrement IV flow from:
do {
remain -= MIN (vf, remain);
} while (remain != 0);
into:
do {
old_remain = remain;
len = MIN (vf, remain);
remain -= vf;
} while (old_remain >= vf);
to enhance SCEV.
Includes fixes from Kewen.
This patch will need to wait for Kewen's test feedback.
Testing on x86 is ongoing.
Co-Authored-By: Kewen Lin <linkw@linux.ibm.com>
PR tree-optimization/109971
gcc/ChangeLog:
* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Change decrement IV flow.
(vect_set_loop_condition_partial_vectors): Ditto.
After reload, there may be sequences like
lreg = dreg
lreg = lreg <op> const
with an LD_REGS dreg, non-LD_REGS lreg, and <op> in PLUS, IOR, AND.
If dreg dies after the first insn, it is possible to use
dreg = dreg <op> const
lreg = dreg
instead, which is more efficient.
gcc/
PR target/110088
* config/avr/avr.md: Add an RTL peephole to optimize operations on
non-LD_REGS after a move from LD_REGS.
(piaop): New code iterator.
With Subversion r265695 (Git commit 22e0527251)
"Update GCC to autoconf 2.69, automake 1.15.1 (PR bootstrap/82856)" we're back
to normal; per Automake 1.15.1 'configure.ac' still "[...] perl 5.6 or better
is required [...]".
PR bootstrap/82856
gcc/
* doc/install.texi (Perl): Back to requiring "Perl version 5.6.1 (or
later)".
2023-06-02 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/87477
* parse.cc (parse_associate): Replace the existing evaluation
of the target rank with calls to gfc_resolve_ref and
gfc_expression_rank. Identify untyped target function results
with structure constructors by finding the appropriate derived
type.
* resolve.cc (resolve_symbol): Allow associate variables to be
assumed shape.
gcc/testsuite/
PR fortran/87477
* gfortran.dg/associate_54.f90 : Cope with extra error.
PR fortran/102109
* gfortran.dg/pr102109.f90 : New test.
PR fortran/102112
* gfortran.dg/pr102112.f90 : New test.
PR fortran/102190
* gfortran.dg/pr102190.f90 : New test.
PR fortran/102532
* gfortran.dg/pr102532.f90 : New test.
PR fortran/109948
* gfortran.dg/pr109948.f90 : New test.
PR fortran/99326
* gfortran.dg/pr99326.f90 : New test.
This patch optimizes the following series vector:
[nunits - 1, nunits - 2, ...., 0]
Before this patch:
vid
vmul
vadd
After this patch:
vid
vrsub
This patch is an obvious and simple optimization. OK for trunk?
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_vec_series): Optimize reverse series index vector.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Add assembly check.
There is a warning in predicates.md:
../../../riscv-gcc/gcc/config/riscv/predicates.md: In function ‘bool arith_operand_or_mode_mask(rtx, machine_mode)’:
../../../riscv-gcc/gcc/config/riscv/predicates.md:33:14: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
(match_test "INTVAL (op) == GET_MODE_MASK (HImode)
../../../riscv-gcc/gcc/config/riscv/predicates.md:34:20: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
|| INTVAL (op) == GET_MODE_MASK (SImode)"))))
gcc/ChangeLog:
* config/riscv/predicates.md: Change INTVAL into UINTVAL.
This patch adds some test cases for the vfloat16*_t types (non-tuple);
without 'zvfh' or 'zvfhmin', these types are reported as unknown.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/abi-16.c: Add test cases.
* gcc.target/riscv/rvv/base/user-7.c: Likewise.
1. This patch optimizes the codegen for the following auto-vectorized code:
void foo (int32_t * __restrict a, int64_t * __restrict b, int64_t * __restrict c, int n)
{
for (int i = 0; i < n; i++)
c[i] = (int64_t)a[i] + b[i];
}
It combines the instructions from:
...
vsext.vf2
vadd.vv
...
into:
...
vwadd.wv
...
Since, for the PLUS operation, GCC prefers the following RTL operand order when combining:
(plus: (sign_extend:..)
(reg:)
instead of
(plus: (reg:..)
(sign_extend:)
which is different from the MINUS pattern.
So I split the vwadd/vwsub patterns and added dedicated patterns for each.
2. Besides optimizing the case mentioned in (1), this patch also improves vwadd.vv/vwsub.vv
optimization for more complicated PLUS/MINUS code. Consider the following code:
__attribute__ ((noipa)) void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
int16_t *__restrict dst3, int8_t *__restrict a,
int8_t *__restrict b, int8_t *__restrict a2,
int8_t *__restrict b2, int n)
{
for (int i = 0; i < n; i++)
{
dst[i] = (int16_t) a[i] + (int16_t) b[i];
dst2[i] = (int16_t) a2[i] + (int16_t) b[i];
dst3[i] = (int16_t) a2[i] + (int16_t) a[i];
}
}
Before this patch:
...
vsetvli zero,a6,e8,mf2,ta,ma
vle8.v v2,0(a3)
vle8.v v1,0(a4)
vsetvli t1,zero,e16,m1,ta,ma
vsext.vf2 v3,v2
vsext.vf2 v2,v1
vadd.vv v1,v2,v3
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v1,0(a0)
vle8.v v4,0(a5)
vsetvli t1,zero,e16,m1,ta,ma
vsext.vf2 v1,v4
vadd.vv v2,v1,v2
...
After this patch:
...
vsetvli zero,a6,e8,mf2,ta,ma
vle8.v v3,0(a4)
vle8.v v1,0(a3)
vsetvli t4,zero,e8,mf2,ta,ma
vwadd.vv v2,v1,v3
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v2,0(a0)
vle8.v v2,0(a5)
vsetvli t4,zero,e8,mf2,ta,ma
vwadd.vv v4,v3,v2
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v4,0(a1)
vsetvli t4,zero,e8,mf2,ta,ma
sub a7,a7,a6
vwadd.vv v3,v2,v1
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v3,0(a2)
...
Current upstream GCC cannot fully optimize code using vwadd because the combine pass needs an
intermediate RTL form in which only one operand is extended (the vwadd.wv pattern); based on that
intermediate form, it then extends the other operand to generate vwadd.vv.
So the vwadd.wv/vwsub.wv patterns directly help vwadd.vv/vwsub.vv code optimization.
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc: Change vwadd.wv/vwsub.wv
intrinsic API expander
* config/riscv/vector.md
(@pred_single_widen_<plus_minus:optab><any_extend:su><mode>): Remove it.
(@pred_single_widen_sub<any_extend:su><mode>): New pattern.
(@pred_single_widen_add<any_extend:su><mode>): New pattern.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/widen/widen-5.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-6.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-6.c: New test.
This patch supports vector permutation for VLS modes only, via the vec_perm pattern.
We will implement TARGET_VECTORIZE_VEC_PERM_CONST to support VLA permutation
in the future.
Addressed the comments from Robin.
gcc/ChangeLog:
* config/riscv/autovec.md (vec_perm<mode>): New pattern.
* config/riscv/predicates.md (vector_perm_operand): New predicate.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_vec_perm): New function.
* config/riscv/riscv-v.cc (const_vec_all_in_range_p): Ditto.
(gen_const_vector_dup): Ditto.
(emit_vlmax_gather_insn): Ditto.
(emit_vlmax_masked_gather_mu_insn): Ditto.
(expand_vec_perm): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm.h: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-7.c: New test.
gcc/fortran/ChangeLog:
PR fortran/88552
* decl.cc (gfc_match_kind_spec): Use error path on missing right
parenthesis.
(gfc_match_decl_type_spec): Use error return when an error occurred
during matching a KIND specifier.
gcc/testsuite/ChangeLog:
PR fortran/88552
* gfortran.dg/pr88552.f90: New test.
This was helpful when debugging the recent multilib testsuite failure.
gcc/testsuite:
* lib/torture-options.exp: print the value of non-empty options:
torture_without_loops, torture_with_loops, LTO_TORTURE_OPTIONS.
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
Recent discussion of -Wimplicit led me to want to clarify this section of
the documentation, and mark which diagnostics other than -Wpedantic are
affected by -pedantic-errors.
gcc/ChangeLog:
* doc/invoke.texi (-Wpedantic): Improve clarity.
AIX does not support -mstrict-align.
pr109566.c had its skip directive in the wrong order for DejaGNU.
* gcc.target/powerpc/pr100106-sa.c: Skip on AIX.
* gcc.target/powerpc/pr109566.c: Skip on AIX.
Signed-off-by: David Edelsohn <dje.gcc@gmail.com>
This test fails in C++20 and later due to a warning:
warning: C++20 says that these are ambiguous, even though the second is reversed:
note: candidate 1: 'bool MyClass::operator==(const MyClass&)'
note: candidate 2: 'bool MyClass::operator==(const MyClass&)' (reversed)
note: try making the operator a 'const' member function
FAIL: 26_numerics/pstl/numeric_ops/transform_reduce.cc (test for excess errors)
libstdc++-v3/ChangeLog:
* testsuite/26_numerics/pstl/numeric_ops/transform_reduce.cc:
Add const to equality operator.
The monadic operations in std::expected always check has_value() so we
can avoid the exceptional path in value() and the assertions in error()
by accessing _M_val and _M_unex directly. This means that the monadic
operations no longer require _M_unex to be copyable so that it can be
thrown from value(), as modified by LWG 3938.
This also fixes two incorrect uses of std::move in transform(F&&)& and
transform(F&&) const& which I found while making these changes.
Now that move-only error types are supported, it's possible to properly
test the constraints that LWG 3877 added to and_then and transform. The
lwg3877.cc test now does that.
libstdc++-v3/ChangeLog:
* include/std/expected (expected::and_then, expected::or_else)
(expected::transform_error): Use _M_val and _M_unex instead of
calling value() and error(), as per LWG 3938.
(expected::transform): Likewise. Remove incorrect std::move
calls from lvalue overloads.
(expected<void, E>::and_then, expected<void, E>::or_else)
(expected<void, E>::transform): Use _M_unex instead of calling
error().
* testsuite/20_util/expected/lwg3877.cc: Add checks for and_then
and transform, and for std::expected<void, E>.
* testsuite/20_util/expected/lwg3938.cc: New test.
My r14-1452-gfb409a15d9babc change to add optimization hints to
std::vector causes regressions because it makes std::vector::size() and
std::vector::capacity() too big to inline. That's the opposite of what
I wanted, so revert the changes to those functions.
To achieve the original aim of optimizing vec.assign(vec.size(), x) we
can add a local optimization hint to _M_fill_assign, so that it doesn't
affect all other uses of size() and capacity().
Additionally, add the same hint to the _M_assign_aux overload for
forward iterators and add that to the testcase.
It would be nice to similarly optimize:
if (vec1.size() == vec2.size()) vec1 = vec2;
but adding hints to operator=(const vector&) doesn't help. Presumably
the relationships between the two sizes and two capacities are too
complex to track effectively.
libstdc++-v3/ChangeLog:
PR libstdc++/110060
* include/bits/stl_vector.h (_Vector_base::_M_invariant):
Remove.
(vector::size, vector::capacity): Remove calls to _M_invariant.
* include/bits/vector.tcc (vector::_M_fill_assign): Add
optimization hint to reallocating path.
(vector::_M_assign_aux(FwdIter, FwdIter, forward_iterator_tag)):
Likewise.
* testsuite/23_containers/vector/capacity/invariant.cc: Moved
to...
* testsuite/23_containers/vector/modifiers/assign/no_realloc.cc:
...here. Check assign(FwdIter, FwdIter) too.
* testsuite/23_containers/vector/types/1.cc: Revert addition
of -Wno-stringop-overread option.
Traditionally libstdc++ allowed containers and strings to be
instantiated with allocators that have the wrong value type, implicitly
rebinding the allocator to the container's value type. Since C++20 that
has been explicitly ill-formed, so the extension is no longer supported
in strict modes (e.g. -std=c++17) and in C++20 and later.
libstdc++-v3/ChangeLog:
* doc/xml/manual/evolution.xml: Document removal of implicit
allocator rebinding extensions in strict mode and for C++20.
* doc/html/*: Regenerate.
Also change some function arguments to bool and remove one instance
of an always-zero function argument.
gcc/ChangeLog:
* rtl.h (exp_equiv_p): Change return type from int to bool.
* cse.cc (mention_regs): Change return type from int to bool
and adjust function body accordingly.
(exp_equiv_p): Ditto.
(insert_regs): Ditto. Change "modified" function argument to bool
and update usage accordingly.
(record_jump_cond): Remove always zero "reversed_nonequality"
function argument and update usage accordingly.
(fold_rtx): Change "changed" variable to bool.
(record_jump_equiv): Remove unneeded "reversed_nonequality" variable.
(is_dead_reg): Change return type from int to bool.
More optimized than the default RTL generation.
gcc/ChangeLog:
* config/xtensa/xtensa.md (adddi3, subdi3):
New RTL generation patterns implemented according to the instruction
idioms described in the Xtensa ISA reference manual (p. 600).
This is my proposed minimal fix for PR target/109973 (hopefully suitable
for backporting) that follows Jakub Jelinek's suggestion that we introduce
CCZmode and CCCmode variants of ptest and vptest, so that the i386
backend treats [v]ptest instructions similarly to testl instructions;
using different CCmodes to indicate which condition flags are desired,
and then relying on the RTL cmpelim pass to eliminate redundant tests.
This conveniently matches Intel's intrinsics, which provide different
functions for retrieving different flags: _mm_testz_si128 tests the
Z flag and _mm_testc_si128 tests the carry flag. Currently we use the
same instruction (pattern) for both, and unfortunately the *ptest<mode>_and
optimization is only valid when the ptest/vptest instruction is used to
set/test the Z flag.
The downside, as predicted by Jakub, is that GCC's cmpelim pass is
currently COMPARE-centric and not able to merge the ptests from expressions
such as _mm256_testc_si256 (a, b) + _mm256_testz_si256 (a, b), which is a
known issue, PR target/80040.
2023-06-01 Roger Sayle <roger@nextmovesoftware.com>
Uros Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
PR target/109973
* config/i386/i386-builtin.def (__builtin_ia32_ptestz128): Use new
CODE_for_sse4_1_ptestzv2di.
(__builtin_ia32_ptestc128): Use new CODE_for_sse4_1_ptestcv2di.
(__builtin_ia32_ptestz256): Use new CODE_for_avx_ptestzv4di.
(__builtin_ia32_ptestc256): Use new CODE_for_avx_ptestcv4di.
* config/i386/i386-expand.cc (ix86_expand_branch): Use CCZmode
when expanding UNSPEC_PTEST to compare against zero.
* config/i386/i386-features.cc (scalar_chain::convert_compare):
Likewise generate CCZmode UNSPEC_PTESTs when converting comparisons.
(general_scalar_chain::convert_insn): Use CCZmode for COMPARE result.
(timode_scalar_chain::convert_insn): Use CCZmode for COMPARE result.
* config/i386/i386-protos.h (ix86_match_ptest_ccmode): Prototype.
* config/i386/i386.cc (ix86_match_ptest_ccmode): New predicate to
check for suitable matching modes for the UNSPEC_PTEST pattern.
* config/i386/sse.md (define_split): When splitting UNSPEC_MOVMSK
to UNSPEC_PTEST, preserve the FLAG_REG mode as CCZ.
(*<sse4_1>_ptest<mode>): Add asterisk to hide define_insn. Remove
":CC" mode of FLAGS_REG, instead use ix86_match_ptest_ccmode.
(<sse4_1>_ptestz<mode>): New define_expand to specify CCZ.
(<sse4_1>_ptestc<mode>): New define_expand to specify CCC.
(<sse4_1>_ptest<mode>): A define_expand using CC to preserve the
current behavior.
(*ptest<mode>_and): Specify CCZ to only perform this optimization
when only the Z flag is required.
gcc/testsuite/ChangeLog
PR target/109973
* gcc.target/i386/pr109973-1.c: New test case.
* gcc.target/i386/pr109973-2.c: Likewise.
In the ABI's two-phase EH model, first we walk the stack looking for a
handler, then we walk the stack running cleanups until we reach that
handler. In the cleanup phase, we shouldn't redundantly check the handlers
along the way, e.g. when walking through g():
void f() { throw 42; }
void g() { try { f(); } catch (void *) { } }
int main() { try { g(); } catch (int) { } }
libstdc++-v3/ChangeLog:
* libsupc++/eh_personality.cc (PERSONALITY_FUNCTION): Don't check
handlers in the cleanup phase.
This option does not imply -march=i386 so it's incorrect to say it
generates code that will run on "any i386 system".
gcc/ChangeLog:
PR target/109954
* doc/invoke.texi (x86 Options): Fix description of -m32 option.
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
PR libstdc++/110050
* include/experimental/bits/simd.h (__vectorized_sizeof): With
__have_neon_a32 only single-precision float works (in addition
to integers).
We can use the X registers to load and store 64-bit vector modes; we just need to add the alternatives
to the mov patterns. This straightforward patch does that, for the pair variants too.
For the testcase in the code we now generate the optimal assembly without any superfluous
GP<->SIMD moves.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<VDMOV:mode>):
Add =r,m and =m,r alternatives.
(load_pair<DREG:mode><DREG2:mode>): Likewise.
(vec_store_pair<DREG:mode><DREG2:mode>): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/xreg-vec-modes_1.c: New test.