aarch64: Extend VECT_COMPARE_COSTS to !SVE [PR113104]

When SVE is enabled, we try vectorising with multiple different SVE and
Advanced SIMD approaches and use the cost model to pick the best one.
Until now, we've not done that for Advanced SIMD, since "the first mode
that works should always be the best".

The testcase is a counterexample.  Each iteration of the scalar loop
vectorises naturally with 64-bit input vectors and 128-bit output
vectors.  We do try that for SVE, and choose it as the best approach.
But the first approach we try is instead to use:

- a vectorisation factor of 2
- 1 128-bit vector for the inputs
- 2 128-bit vectors for the outputs

But since the stride is variable, the cost of marshalling the input
vector from two iterations outweighs the benefit of doing two iterations
at once.

This patch therefore generalises aarch64-sve-compare-costs to
aarch64-vect-compare-costs and applies it to non-SVE compilations.
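
As with the old parameter, passing --param aarch64-vect-compare-costs=0
opts out of the comparison and restores the pick-the-first-mode
behaviour; the SVE tests below do this so that their expected code
generation does not depend on VECT_COMPARE_COSTS.  A representative
dg-options line (the surrounding tests are otherwise unchanged):

  /* { dg-options "-O2 -ftree-vectorize --param aarch64-vect-compare-costs=0" } */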

gcc/
	PR target/113104
	* doc/invoke.texi (aarch64-sve-compare-costs): Replace with...
	(aarch64-vect-compare-costs): ...this.
	* config/aarch64/aarch64.opt (-param=aarch64-sve-compare-costs=):
	Replace with...
	(-param=aarch64-vect-compare-costs=): ...this new param.
	* config/aarch64/aarch64.cc (aarch64_override_options_internal):
	Don't disable it when vectorizing for Advanced SIMD only.
	(aarch64_autovectorize_vector_modes): Apply VECT_COMPARE_COSTS
	whenever aarch64_vect_compare_costs is true.

gcc/testsuite/
	PR target/113104
	* gcc.target/aarch64/pr113104.c: New test.
	* gcc.target/aarch64/sve/cond_arith_1.c: Update for new parameter
	names.
	* gcc.target/aarch64/sve/cond_arith_1_run.c: Likewise.
	* gcc.target/aarch64/sve/cond_arith_3.c: Likewise.
	* gcc.target/aarch64/sve/cond_arith_3_run.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_6.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_7.c: Likewise.
	* gcc.target/aarch64/sve/load_const_offset_2.c: Likewise.
	* gcc.target/aarch64/sve/load_const_offset_3.c: Likewise.
	* gcc.target/aarch64/sve/mask_gather_load_6.c: Likewise.
	* gcc.target/aarch64/sve/mask_gather_load_7.c: Likewise.
	* gcc.target/aarch64/sve/mask_load_slp_1.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_load_1.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_load_2.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_load_3.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_load_4.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_store_1.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_store_1_run.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_store_2.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_store_2_run.c: Likewise.
	* gcc.target/aarch64/sve/pack_1.c: Likewise.
	* gcc.target/aarch64/sve/reduc_4.c: Likewise.
	* gcc.target/aarch64/sve/scatter_store_6.c: Likewise.
	* gcc.target/aarch64/sve/scatter_store_7.c: Likewise.
	* gcc.target/aarch64/sve/strided_load_3.c: Likewise.
	* gcc.target/aarch64/sve/strided_store_3.c: Likewise.
	* gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Likewise.
	* gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise.
	* gcc.target/aarch64/sve/unpack_signed_1.c: Likewise.
	* gcc.target/aarch64/sve/unpack_unsigned_1.c: Likewise.
	* gcc.target/aarch64/sve/unpack_unsigned_1_run.c: Likewise.
	* gcc.target/aarch64/sve/vcond_11.c: Likewise.
	* gcc.target/aarch64/sve/vcond_11_run.c: Likewise.
commit 7328faf89e
parent d4cd871d15
Author: Richard Sandiford
Date:   2024-01-05 16:25:16 +00:00
36 changed files with 78 additions and 54 deletions

--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18223,12 +18223,6 @@ aarch64_override_options_internal (struct gcc_options *opts)
   SET_OPTION_IF_UNSET (opts, &global_options_set,
		       param_sched_autopref_queue_depth, queue_depth);
 
-  /* If using Advanced SIMD only for autovectorization disable SVE vector costs
-     comparison.  */
-  if (aarch64_autovec_preference == 1)
-    SET_OPTION_IF_UNSET (opts, &global_options_set,
-			 aarch64_sve_compare_costs, 0);
-
   /* Set up parameters to be used in prefetching algorithm.  Do not
      override the defaults unless we are tuning for a core we have
      researched values for.  */
@@ -22138,12 +22132,7 @@ aarch64_autovectorize_vector_modes (vector_modes *modes, bool)
     modes->safe_push (sve_modes[sve_i++]);
 
   unsigned int flags = 0;
-  /* Consider enabling VECT_COMPARE_COSTS for SVE, both so that we
-     can compare SVE against Advanced SIMD and so that we can compare
-     multiple SVE vectorization approaches against each other.  There's
-     not really any point doing this for Advanced SIMD only, since the
-     first mode that works should always be the best.  */
-  if (TARGET_SVE && aarch64_sve_compare_costs)
+  if (aarch64_vect_compare_costs)
     flags |= VECT_COMPARE_COSTS;
   return flags;
 }

--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -332,9 +332,10 @@ moutline-atomics
 Target Var(aarch64_flag_outline_atomics) Init(2) Save
 Generate local calls to out-of-line atomic operations.
 
--param=aarch64-sve-compare-costs=
-Target Joined UInteger Var(aarch64_sve_compare_costs) Init(1) IntegerRange(0, 1) Param
-When vectorizing for SVE, consider using unpacked vectors for smaller elements and use the cost model to pick the cheapest approach.  Also use the cost model to choose between SVE and Advanced SIMD vectorization.
+-param=aarch64-vect-compare-costs=
+Target Joined UInteger Var(aarch64_vect_compare_costs) Init(1) IntegerRange(0, 1) Param
+When vectorizing, consider using multiple different approaches and use
+the cost model to choose the cheapest one.
 
 -param=aarch64-float-recp-precision=
 Target Joined UInteger Var(aarch64_float_recp_precision) Init(1) IntegerRange(1, 5) Param

--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16778,14 +16778,23 @@ With @option{--param=openacc-privatization=noisy}, do diagnose.
 The following choices of @var{name} are available on AArch64 targets:
 
 @table @gcctabopt
-@item aarch64-sve-compare-costs
-When vectorizing for SVE, consider using ``unpacked'' vectors for
-smaller elements and use the cost model to pick the cheapest approach.
-Also use the cost model to choose between SVE and Advanced SIMD vectorization.
+@item aarch64-vect-compare-costs
+When vectorizing, consider using multiple different approaches and use
+the cost model to choose the cheapest one.  This includes:
 
-Using unpacked vectors includes storing smaller elements in larger
-containers and accessing elements with extending loads and truncating
-stores.
+@itemize
+@item
+Trying both SVE and Advanced SIMD, when SVE is available.
+
+@item
+Trying to use 64-bit Advanced SIMD vectors for the smallest data elements,
+rather than using 128-bit vectors for everything.
+
+@item
+Trying to use ``unpacked'' SVE vectors for smaller elements.  This includes
+storing smaller elements in larger containers and accessing elements with
+extending loads and truncating stores.
+@end itemize
 
 @item aarch64-float-recp-precision
 The number of Newton iterations for calculating the reciprocal for float type.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr113104.c
@@ -0,0 +1,25 @@
+/* { dg-options "-O3" } */
+
+#pragma GCC target "+nosve"
+
+int test(unsigned array[4][4]);
+
+int foo(unsigned short *a, unsigned long n)
+{
+  unsigned array[4][4];
+  for (unsigned i = 0; i < 4; i++, a += n)
+    {
+      array[i][0] = a[0] << 6;
+      array[i][1] = a[1] << 6;
+      array[i][2] = a[2] << 6;
+      array[i][3] = a[3] << 6;
+    }
+  return test(array);
+}
+
+/* { dg-final { scan-assembler-times {\tushll\t} 4 } } */
+/* { dg-final { scan-assembler-not {\tzip.\t} } } */
+/* { dg-final { scan-assembler-not {\tins\t} } } */
+/* { dg-final { scan-assembler-not {\tshl\t} } } */

--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize --param aarch64-vect-compare-costs=0" } */
 
 #include <stdint.h>

--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_1_run.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_1_run.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target aarch64_sve_hw } } */
-/* { dg-options "-O2 -ftree-vectorize --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize --param aarch64-vect-compare-costs=0" } */
 
 #include "cond_arith_1.c"

--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize --param aarch64-vect-compare-costs=0" } */
 
 #include <stdint.h>

--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_3_run.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_3_run.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target aarch64_sve_hw } } */
-/* { dg-options "-O2 -ftree-vectorize --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize --param aarch64-vect-compare-costs=0" } */
 
 #include "cond_arith_3.c"

--- a/gcc/testsuite/gcc.target/aarch64/sve/gather_load_6.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/gather_load_6.c
@@ -1,5 +1,5 @@
 /* { dg-do assemble { target aarch64_asm_sve_ok } } */
-/* { dg-options "-O2 -ftree-vectorize -fwrapv --save-temps --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize -fwrapv --save-temps --param aarch64-vect-compare-costs=0" } */
 
 #include <stdint.h>

--- a/gcc/testsuite/gcc.target/aarch64/sve/gather_load_7.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/gather_load_7.c
@@ -1,5 +1,5 @@
 /* { dg-do assemble { target aarch64_asm_sve_ok } } */
-/* { dg-options "-O2 -ftree-vectorize --save-temps --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize --save-temps --param aarch64-vect-compare-costs=0" } */
 
 #define INDEX16 uint16_t
 #define INDEX32 uint32_t

--- a/gcc/testsuite/gcc.target/aarch64/sve/load_const_offset_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/load_const_offset_2.c
@@ -1,5 +1,5 @@
 /* { dg-do assemble { target aarch64_asm_sve_ok } } */
-/* { dg-options "-O2 -ftree-vectorize -save-temps --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize -save-temps --param aarch64-vect-compare-costs=0" } */
 
 #include <stdint.h>

--- a/gcc/testsuite/gcc.target/aarch64/sve/load_const_offset_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/load_const_offset_3.c
@@ -1,5 +1,5 @@
 /* { dg-do assemble { target aarch64_asm_sve_ok } } */
-/* { dg-options "-O2 -ftree-vectorize -save-temps -msve-vector-bits=256 --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize -save-temps -msve-vector-bits=256 --param aarch64-vect-compare-costs=0" } */
 
 #include "load_const_offset_2.c"

--- a/gcc/testsuite/gcc.target/aarch64/sve/mask_gather_load_6.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/mask_gather_load_6.c
@@ -1,5 +1,5 @@
 /* { dg-do assemble { target aarch64_asm_sve_ok } } */
-/* { dg-options "-O2 -ftree-vectorize -ffast-math --save-temps --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math --save-temps --param aarch64-vect-compare-costs=0" } */
 
 #include <stdint.h>

--- a/gcc/testsuite/gcc.target/aarch64/sve/mask_gather_load_7.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/mask_gather_load_7.c
@@ -1,5 +1,5 @@
 /* { dg-do assemble { target aarch64_asm_sve_ok } } */
-/* { dg-options "-O2 -ftree-vectorize -ffast-math --save-temps --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math --save-temps --param aarch64-vect-compare-costs=0" } */
 
 #include <stdint.h>

--- a/gcc/testsuite/gcc.target/aarch64/sve/mask_load_slp_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/mask_load_slp_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize --param aarch64-vect-compare-costs=0" } */
 
 #include <stdint.h>

--- a/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_load_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_load_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -ffast-math --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math --param aarch64-vect-compare-costs=0" } */
 
 #include <stdint.h>

--- a/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_load_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_load_2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -ffast-math --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math --param aarch64-vect-compare-costs=0" } */
 
 #include <stdint.h>

--- a/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_load_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_load_3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -ffast-math --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math --param aarch64-vect-compare-costs=0" } */
 
 #include <stdint.h>

--- a/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_load_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_load_4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -ffast-math --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math --param aarch64-vect-compare-costs=0" } */
 
 #include <stdint.h>

--- a/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -ffast-math --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math --param aarch64-vect-compare-costs=0" } */
 
 #include <stdint.h>

--- a/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_1_run.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_1_run.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target aarch64_sve_hw } } */
-/* { dg-options "-O2 -ftree-vectorize -ffast-math --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math --param aarch64-vect-compare-costs=0" } */
 
 #include "mask_struct_store_1.c"

--- a/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -ffast-math --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math --param aarch64-vect-compare-costs=0" } */
 
 #include <stdint.h>

--- a/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_2_run.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_2_run.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target aarch64_sve_hw } } */
-/* { dg-options "-O2 -ftree-vectorize -ffast-math --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math --param aarch64-vect-compare-costs=0" } */
 
 #include "mask_struct_store_2.c"

--- a/gcc/testsuite/gcc.target/aarch64/sve/pack_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pack_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize --param aarch64-vect-compare-costs=0" } */
 
 #include <stdint.h>

--- a/gcc/testsuite/gcc.target/aarch64/sve/reduc_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/reduc_4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -ffast-math --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math --param aarch64-vect-compare-costs=0" } */
 
 double
 f (double *restrict a, double *restrict b, int *lookup)

--- a/gcc/testsuite/gcc.target/aarch64/sve/scatter_store_6.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/scatter_store_6.c
@@ -1,5 +1,5 @@
 /* { dg-do assemble { target aarch64_asm_sve_ok } } */
-/* { dg-options "-O2 -ftree-vectorize -fwrapv --save-temps --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize -fwrapv --save-temps --param aarch64-vect-compare-costs=0" } */
 
 #include <stdint.h>

--- a/gcc/testsuite/gcc.target/aarch64/sve/scatter_store_7.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/scatter_store_7.c
@@ -1,5 +1,5 @@
 /* { dg-do assemble { target aarch64_asm_sve_ok } } */
-/* { dg-options "-O2 -ftree-vectorize --save-temps --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize --save-temps --param aarch64-vect-compare-costs=0" } */
 
 #define INDEX16 uint16_t
 #define INDEX32 uint32_t

--- a/gcc/testsuite/gcc.target/aarch64/sve/strided_load_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/strided_load_3.c
@@ -1,5 +1,5 @@
 /* { dg-do assemble { target aarch64_asm_sve_ok } } */
-/* { dg-options "-O2 -ftree-vectorize --save-temps --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize --save-temps --param aarch64-vect-compare-costs=0" } */
 
 #include <stdint.h>

--- a/gcc/testsuite/gcc.target/aarch64/sve/strided_store_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/strided_store_3.c
@@ -1,5 +1,5 @@
 /* { dg-do assemble { target aarch64_asm_sve_ok } } */
-/* { dg-options "-O2 -ftree-vectorize --save-temps --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize --save-temps --param aarch64-vect-compare-costs=0" } */
 
 #include <stdint.h>

--- a/gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_signed_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_signed_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -fno-inline --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize -fno-inline --param aarch64-vect-compare-costs=0" } */
 
 #include <stdint.h>

--- a/gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize --param aarch64-vect-compare-costs=0" } */
 
 #include <stdint.h>

--- a/gcc/testsuite/gcc.target/aarch64/sve/unpack_signed_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/unpack_signed_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -fno-inline --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize -fno-inline --param aarch64-vect-compare-costs=0" } */
 
 #include <stdint.h>

--- a/gcc/testsuite/gcc.target/aarch64/sve/unpack_unsigned_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/unpack_unsigned_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -fno-inline --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize -fno-inline --param aarch64-vect-compare-costs=0" } */
 
 #include <stdint.h>

--- a/gcc/testsuite/gcc.target/aarch64/sve/unpack_unsigned_1_run.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/unpack_unsigned_1_run.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target aarch64_sve_hw } } */
-/* { dg-options "-O2 -ftree-vectorize -fno-inline --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize -fno-inline --param aarch64-vect-compare-costs=0" } */
 
 #include "unpack_unsigned_1.c"

--- a/gcc/testsuite/gcc.target/aarch64/sve/vcond_11.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/vcond_11.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --param aarch64-vect-compare-costs=0" } */
 
 #include <stdint.h>

--- a/gcc/testsuite/gcc.target/aarch64/sve/vcond_11_run.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/vcond_11_run.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target aarch64_sve_hw } } */
-/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --param aarch64-sve-compare-costs=0" } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --param aarch64-vect-compare-costs=0" } */
 
 #include "vcond_11.c"