tree-optimization/110474 - Vect: select small VF for epilog of unrolled loop

If a loop is unrolled during vectorization (i.e. suggested_unroll_factor > 1),
the VFs of both the main and the epilog loop are enlarged.  The epilog vect
loop only handles the few iterations left over by the main loop, so a large
VF may hurt performance.

This patch unscales the main loop VF by suggested_unroll_factor while selecting
the epilog loop VF, so that the epilog VF is the same as for a loop vectorized
without unrolling (i.e. suggested_unroll_factor = 1).
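
As a rough illustration of the effect, the sketch below models how N scalar
iterations are split between the main vect loop, the epilog vect loop and
scalar code.  It is a simplified model, not GCC code: the names
split_iterations and epilog_vf_limit are hypothetical, VFs are plain ints
rather than poly_uint64, and cost thresholds and partial vectors are ignored.

```c
/* Simplified, hypothetical model of the iteration split; not GCC code.  */
struct split
{
  int main_iters;    /* iterations done by the main vect loop */
  int epilog_iters;  /* iterations done by the epilog vect loop */
  int scalar_iters;  /* iterations left to scalar code */
};

/* Exclusive upper bound on the epilog VF, mirroring the patched check:
   the main loop VF divided by the unroll factor.  Assumes the division
   is exact, as exact_div requires.  */
int
epilog_vf_limit (int main_vf, int unroll_factor)
{
  return main_vf / unroll_factor;
}

/* Split N iterations, assuming each vect loop runs as many full
   VF-sized steps as fit in what is left.  */
struct split
split_iterations (int n, int main_vf, int epilog_vf)
{
  struct split s;
  s.main_iters = n / main_vf * main_vf;
  n -= s.main_iters;
  s.epilog_iters = n / epilog_vf * epilog_vf;
  s.scalar_iters = n - s.epilog_iters;
  return s;
}
```

Under this model, with a main loop VF of 32 (16 unrolled by 2) and N = 8,
an epilog VF of 16 leaves all 8 iterations to scalar code, while an epilog
VF of 8 (below the limit 32 / 2 = 16) runs all of them in the epilog vect
loop.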

gcc/ChangeLog:

	PR tree-optimization/110474
	* tree-vect-loop.cc (vect_analyze_loop_2): Unscale the VF by the
	suggested unroll factor while selecting the epilog vect loop VF.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/pr110474.c: New testcase.
Hao Liu 2023-07-06 10:03:47 +08:00
parent 5158918aa2
commit 7339e725b9
2 changed files with 47 additions and 6 deletions

@@ -0,0 +1,37 @@
/* { dg-do compile } */
/* { dg-options "-O3 -mtune=neoverse-n2 -mcpu=neoverse-n1 -fdump-tree-vect-details --param aarch64-vect-unroll-limit=2" } */
/* { dg-final { scan-tree-dump "Choosing vector mode V8HI" "vect" } } */
/* { dg-final { scan-tree-dump "Choosing epilogue vector mode V8QI" "vect" } } */
/* Do not increase the vectorization factor of the epilog vectorized loop
   for a loop with suggested_unroll_factor > 1.

   before (suggested_unroll_factor=1):
     if N >= 16:
       main vect loop
     if N >= 8:
       epilog vect loop
     scalar code

   before (suggested_unroll_factor=2):
     if N >= 32:
       main vect loop
     if N >= 16:  // May fail to execute vectorized code (e.g. N is 8)
       epilog vect loop
     scalar code

   after (suggested_unroll_factor=2):
     if N >= 32:
       main vect loop
     if N >= 8:  // The same VF as suggested_unroll_factor=1
       epilog vect loop
     scalar code  */

int
foo (short *A, char *B, int N)
{
  int sum = 0;
  for (int i = 0; i < N; ++i)
    sum += A[i] * B[i];
  return sum;
}

@@ -3021,12 +3021,16 @@ start_over:
      to be able to handle fewer than VF scalars, or needs to have a lower VF
      than the main loop.  */
   if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
-      && !LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
-      && maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
-                   LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)))
-    return opt_result::failure_at (vect_location,
-                                   "Vectorization factor too high for"
-                                   " epilogue loop.\n");
+      && !LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+    {
+      poly_uint64 unscaled_vf
+        = exact_div (LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo),
+                     orig_loop_vinfo->suggested_unroll_factor);
+      if (maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo), unscaled_vf))
+        return opt_result::failure_at (vect_location,
+                                       "Vectorization factor too high for"
+                                       " epilogue loop.\n");
+    }

   /* Decide whether this loop_vinfo should use partial vectors or peeling,
      assuming that the loop will be used as a main loop.  We will redo