i386: Tweak ix86_expand_int_compare to use PTEST for vector equality.
I've come up with an alternate/complementary/supplementary fix to the patch https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622706.html for generating the PTEST during RTL expansion, rather than rely on this being caught/optimized later during STV.

You'll notice in this patch, the tests for TARGET_SSE4_1 and TImode appear last. When I was writing this, I initially also added support for AVX VPTEST and OImode, before realizing that x86 doesn't (yet) support 256-bit OImode (which also explains why we don't have an OImode to V1OImode scalar-to-vector pass). Retaining this clause ordering should minimize the lines changed if things change in future.

2023-07-12  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/i386/i386-expand.cc (ix86_expand_int_compare): If testing
	a TImode SUBREG of a 128-bit vector register against zero, use a
	PTEST instruction instead of first moving it to a pair of scalar
	registers.
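For illustration, here is a minimal sketch of the kind of source this change is aimed at, assuming GNU C vector extensions and `__int128` are available. The names `v2di` and `vec_is_zero` are hypothetical and are not taken from the patch or its testsuite. A 128-bit vector is reinterpreted as a 128-bit integer and compared against zero; on x86-64 with SSE4.1 enabled, a comparison like this may reach ix86_expand_int_compare as a TImode SUBREG of a vector register, although whether PTEST is actually emitted depends on the compiler version and options.

```c
#include <string.h>

/* Hypothetical names for illustration; not from the patch.  */
typedef long long v2di __attribute__ ((vector_size (16)));

/* Return nonzero if all 128 bits of X are zero.  The vector is
   reinterpreted as a single 128-bit integer before the test.  */
int
vec_is_zero (v2di x)
{
  unsigned __int128 bits;
  memcpy (&bits, &x, sizeof bits);
  return bits == 0;
}
```

Without the PTEST path, the description above says the value is first moved to a pair of scalar registers before being tested.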
parent a454325bea
commit 46ade8c9cc
1 changed file with 18 additions and 1 deletion
@@ -2987,9 +2987,26 @@ ix86_expand_int_compare (enum rtx_code code, rtx op0, rtx op1)
   cmpmode = SELECT_CC_MODE (code, op0, op1);
   flags = gen_rtx_REG (cmpmode, FLAGS_REG);
 
+  /* Attempt to use PTEST, if available, when testing vector modes for
+     equality/inequality against zero. */
+  if (op1 == const0_rtx
+      && SUBREG_P (op0)
+      && cmpmode == CCZmode
+      && SUBREG_BYTE (op0) == 0
+      && REG_P (SUBREG_REG (op0))
+      && VECTOR_MODE_P (GET_MODE (SUBREG_REG (op0)))
+      && TARGET_SSE4_1
+      && GET_MODE (op0) == TImode
+      && GET_MODE_SIZE (GET_MODE (SUBREG_REG (op0))) == 16)
+    {
+      tmp = SUBREG_REG (op0);
+      tmp = gen_rtx_UNSPEC (CCZmode, gen_rtvec (2, tmp, tmp), UNSPEC_PTEST);
+    }
+  else
+    tmp = gen_rtx_COMPARE (cmpmode, op0, op1);
+
   /* This is very simple, but making the interface the same as in the
      FP case makes the rest of the code easier. */
-  tmp = gen_rtx_COMPARE (cmpmode, op0, op1);
   emit_insn (gen_rtx_SET (flags, tmp));
 
   /* Return the test that should be put into the flags user, i.e.
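
The UNSPEC_PTEST above is built with the same register as both operands. As a rough model of why that tests for zero, the sketch below assumes only the documented behaviour of the PTEST zero flag, ZF = ((a AND b) == 0). The type `u128` and the functions `ptest_zf` and `is_all_zero` are hypothetical helpers for illustration, not GCC or intrinsic APIs.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit value as two 64-bit halves.  */
struct u128 { uint64_t lo, hi; };

/* Model of the zero flag set by PTEST: ZF = ((a AND b) == 0).  */
int
ptest_zf (struct u128 a, struct u128 b)
{
  return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

/* With both operands equal, a AND a is just a, so ZF reduces to
   "a is all zeros", which is the equality-with-zero test.  */
int
is_all_zero (struct u128 a)
{
  return ptest_zf (a, a);
}
```

This is consistent with the cmpmode == CCZmode condition in the patch: only the zero flag is meaningful for the result, so only equality/inequality comparisons against zero qualify.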