tree-optimization/110474 - Vect: select small VF for epilog of unrolled loop
If a loop is unrolled during vectorization (i.e. suggested_unroll_factor > 1),
the VFs of both the main and epilog loops are enlarged. The epilog vect loop
is meant for loops with small iteration counts, so a large VF may hurt
performance.

This patch unscales the main loop VF by suggested_unroll_factor while
selecting the epilog loop VF, so that the epilog VF is the same as for a loop
vectorized without unrolling (i.e. suggested_unroll_factor = 1).

gcc/ChangeLog:

        PR tree-optimization/110474
        * tree-vect-loop.cc (vect_analyze_loop_2): Unscale the VF by the
        suggested unroll factor while selecting the epilog vect loop VF.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/pr110474.c: New testcase.
parent 5158918aa2
commit 7339e725b9
2 changed files with 47 additions and 6 deletions
37
gcc/testsuite/gcc.target/aarch64/pr110474.c
Normal file
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mtune=neoverse-n2 -mcpu=neoverse-n1 -fdump-tree-vect-details --param aarch64-vect-unroll-limit=2" } */
+/* { dg-final { scan-tree-dump "Choosing vector mode V8HI" "vect" } } */
+/* { dg-final { scan-tree-dump "Choosing epilogue vector mode V8QI" "vect" } } */
+
+/* Do not increase the vectorization factor of the epilog vectorized loop
+   for a loop with suggested_unroll_factor > 1.
+
+   before (suggested_unroll_factor=1):
+     if N >= 16:
+         main vect loop
+     if N >= 8:
+         epilog vect loop
+     scalar code
+
+   before (suggested_unroll_factor=2):
+     if N >= 32:
+         main vect loop
+     if N >= 16: // May fail to execute vectorized code (e.g. N is 8)
+         epilog vect loop
+     scalar code
+
+   after (suggested_unroll_factor=2):
+     if N >= 32:
+         main vect loop
+     if N >= 8: // The same VF as suggested_unroll_factor=1
+         epilog vect loop
+     scalar code */
+
+int
+foo (short *A, char *B, int N)
+{
+  int sum = 0;
+  for (int i = 0; i < N; ++i)
+    sum += A[i] * B[i];
+  return sum;
+}
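The before/after tables in the testcase comment can be made concrete with a small model. The following is a minimal sketch, not GCC code: `struct split` and `split_iterations` are hypothetical names, and the model assumes that the main vector loop and then the epilogue vector loop each run as many full vector iterations as fit, with whatever remains going to scalar code.

```c
/* Hypothetical model, not GCC code: how many elements each stage
   of a vectorized loop handles for a given trip count.  */
struct split
{
  int main_elems;    /* handled by the main vector loop */
  int epilog_elems;  /* handled by the epilogue vector loop */
  int scalar_elems;  /* left for scalar code */
};

/* Assumption: stages run in the order main loop, epilogue loop,
   scalar code, and each vector loop only executes full iterations.
   main_vf and epilog_vf must be positive.  */
struct split
split_iterations (int n, int main_vf, int epilog_vf)
{
  struct split s;
  s.main_elems = n / main_vf * main_vf;
  n -= s.main_elems;
  s.epilog_elems = n / epilog_vf * epilog_vf;
  s.scalar_elems = n - s.epilog_elems;
  return s;
}
```

Under these assumptions, with suggested_unroll_factor=2 and N = 8, `split_iterations (8, 32, 16)` (before the patch) leaves all 8 elements to scalar code, while `split_iterations (8, 32, 8)` (after the patch) lets the epilogue vector loop handle all 8.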
gcc/tree-vect-loop.cc
@@ -3021,12 +3021,16 @@ start_over:
      to be able to handle fewer than VF scalars, or needs to have a lower VF
      than the main loop. */
   if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
-      && !LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
-      && maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
-                   LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)))
-    return opt_result::failure_at (vect_location,
-                                   "Vectorization factor too high for"
-                                   " epilogue loop.\n");
+      && !LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+    {
+      poly_uint64 unscaled_vf
+        = exact_div (LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo),
+                     orig_loop_vinfo->suggested_unroll_factor);
+      if (maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo), unscaled_vf))
+        return opt_result::failure_at (vect_location,
+                                       "Vectorization factor too high for"
+                                       " epilogue loop.\n");
+    }

   /* Decide whether this loop_vinfo should use partial vectors or peeling,
      assuming that the loop will be used as a main loop. We will redo
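The effect of the changed condition can be sketched with plain integers. This is a simplified model under stated assumptions, not the GCC implementation: `epilog_vf_ok_before` and `epilog_vf_ok_after` are hypothetical names, VFs are ordinary unsigned values rather than poly_uint64, and the check is assumed to apply only when the epilogue loop cannot use partial vectors.

```c
#include <stdbool.h>

/* Hypothetical model, not GCC code.  GCC compares poly_uint64 values
   with maybe_ge; plain unsigned integers stand in for them here.  */

/* Old check: accept an epilogue VF only if it is below the main loop
   VF, which has already been scaled by the unroll factor.  */
bool
epilog_vf_ok_before (unsigned epilog_vf, unsigned main_vf)
{
  return epilog_vf < main_vf;
}

/* New check: divide the main loop VF by the unroll factor first.
   Assumes unroll_factor is nonzero and divides main_vf exactly,
   mirroring the exact_div call in the patch.  */
bool
epilog_vf_ok_after (unsigned epilog_vf, unsigned main_vf,
                    unsigned unroll_factor)
{
  unsigned unscaled_vf = main_vf / unroll_factor;
  return epilog_vf < unscaled_vf;
}
```

With the numbers from the testcase comment (main loop VF 32, unroll factor 2), a candidate epilogue VF of 16 passes the old check but fails the new one, while a candidate of 8 passes both. With an unroll factor of 1 the two checks agree.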