[rs6000] Adjust vectorization cost for scalar COND_EXPR

The vectorization cost modeling of scalar COND_EXPR is a bit off on rs6000.
One typical case is 548.exchange2_r, where -Ofast -mcpu=power9 -mrecip
-fvect-cost-model=unlimited beats -Ofast -mcpu=power9 -mrecip (which uses the
default -fvect-cost-model=dynamic) by 1.94%.  Scalar COND_EXPR is normally
expanded into compare + branch or compare + isel, and either sequence should
be priced higher than a simple FXU operation.  This patch adds an extra
vectorization cost to scalar COND_EXPR on top of builtin_vectorization_cost.

The extra cost is 2 rather than some other value because:
  1) Of the candidate values 1 to 5 that were tried, 2 measured best on
     Power9.
  2) From a latency point of view, compare takes 3 cycles and isel takes 2
     on Power9, which is 2.5 times a simple FXU instruction; that
     instruction has cost 1 in the current modeling, so the value is close.
  3) It gives a fine SPEC2017 ratio on Power8 as well.
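
The resulting cost arithmetic can be sketched in isolation as follows.  This
is a minimal standalone model, not the GCC code: the enums, struct
stmt_model, base_cost and total_cost are hypothetical stand-ins for GCC's
internal types and for rs6000_builtin_vectorization_cost, and the flat base
cost of 1 is an assumption taken from the "simple FXU instruction" cost
mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC internals, for illustration only.  */
enum stmt_kind { SCALAR_STMT, VECTOR_STMT };
enum rhs_code { PLUS_CODE, COND_CODE };

struct stmt_model
{
  enum rhs_code code;   /* Operation code of the assignment's RHS.  */
};

/* Assumed base cost: every statement is priced like a simple FXU op.  */
static unsigned
base_cost (enum stmt_kind kind)
{
  (void) kind;
  return 1;
}

/* Extra cost on top of the base cost: 2 for a scalar conditional
   expression, 0 for everything else (including a missing statement).  */
static unsigned
adjust_cost (enum stmt_kind kind, const struct stmt_model *stmt)
{
  if (kind == SCALAR_STMT && stmt != NULL && stmt->code == COND_CODE)
    return 2;
  return 0;
}

/* Base cost plus adjustment, mirroring the "stmt_cost += ..." update.  */
static unsigned
total_cost (enum stmt_kind kind, const struct stmt_model *stmt)
{
  return base_cost (kind) + adjust_cost (kind, stmt);
}

/* Small self-check of the model.  */
static void
check_model (void)
{
  struct stmt_model cond = { COND_CODE };
  struct stmt_model add = { PLUS_CODE };
  assert (total_cost (SCALAR_STMT, &cond) == 3);
  assert (total_cost (SCALAR_STMT, &add) == 1);
}
```

In this model a scalar conditional ends up with cost 3 against 1 for a plain
add, which is the nearest integer pricing to the 2.5x latency ratio cited in
point 2.  Only the scalar side is adjusted, so the scalar loop looks more
expensive and vectorization becomes more attractive under the dynamic cost
model.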

gcc/ChangeLog

    * config/rs6000/rs6000.c (adjust_vectorization_cost): New function.
    (rs6000_add_stmt_cost): Call adjust_vectorization_cost and update
    stmt_cost.

From-SVN: r279336
Kewen Lin 2019-12-13 06:00:53 +00:00
parent a1af2dd9c3
commit 396c2a9842
2 changed files with 30 additions and 0 deletions


@@ -1,3 +1,9 @@
2019-12-13  Kewen Lin  <linkw@gcc.gnu.org>

    * config/rs6000/rs6000.c (adjust_vectorization_cost): New function.
    (rs6000_add_stmt_cost): Call adjust_vectorization_cost and update
    stmt_cost.

2019-12-12  Jakub Jelinek  <jakub@redhat.com>

    PR target/92904


@@ -4997,6 +4997,29 @@ rs6000_init_cost (struct loop *loop_info)
  return data;
}

/* Adjust vectorization cost after calling rs6000_builtin_vectorization_cost.
   For some statement, we would like to further fine-grain tweak the cost on
   top of rs6000_builtin_vectorization_cost handling which doesn't have any
   information on statement operation codes etc.  One typical case here is
   COND_EXPR, it takes the same cost to simple FXU instruction when evaluating
   for scalar cost, but it should be priced more whatever transformed to either
   compare + branch or compare + isel instructions.  */

static unsigned
adjust_vectorization_cost (enum vect_cost_for_stmt kind,
                           struct _stmt_vec_info *stmt_info)
{
  if (kind == scalar_stmt && stmt_info && stmt_info->stmt
      && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN)
    {
      tree_code subcode = gimple_assign_rhs_code (stmt_info->stmt);
      if (subcode == COND_EXPR)
        return 2;
    }

  return 0;
}

/* Implement targetm.vectorize.add_stmt_cost.  */

static unsigned
@@ -5012,6 +5035,7 @@ rs6000_add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,
      tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
      int stmt_cost = rs6000_builtin_vectorization_cost (kind, vectype,
                                                         misalign);
      stmt_cost += adjust_vectorization_cost (kind, stmt_info);
      /* Statements in an inner loop relative to the loop being
         vectorized are weighted more heavily.  The value here is
         arbitrary and could potentially be improved with analysis.  */