tree-data-refs.c (split_constant_offset): Expose.

* tree-data-refs.c (split_constant_offset): Expose.
        * tree-data-refs.h (split_constant_offset): Add declaration.

        * tree-vectorizer.h (dr_alignment_support): Renamed
        dr_unaligned_software_pipeline to dr_explicit_realign_optimized.
        Added a new value dr_explicit_realign.
        (_stmt_vec_info): Added new fields: dr_base_address, dr_init,
        dr_offset, dr_step, and dr_aligned_to, along with new access
        functions for these fields: STMT_VINFO_DR_BASE_ADDRESS,
        STMT_VINFO_DR_INIT, STMT_VINFO_DR_OFFSET, STMT_VINFO_DR_STEP, and
        STMT_VINFO_DR_ALIGNED_TO.

        * tree-vectorizer.c (vect_supportable_dr_alignment): Add
        documentation.
        In case of outer-loop vectorization with non-fixed misalignment - use
        the dr_explicit_realign scheme instead of the optimized realignment
        scheme.
        (new_stmt_vec_info): Initialize new fields.

        * tree-vect-analyze.c (vect_compute_data_ref_alignment): Handle the
        'nested_in_vect_loop' case. Change verbosity level.
        (vect_analyze_data_ref_access): Handle the 'nested_in_vect_loop' case.
        Don't fail on zero step in the outer-loop for loads.
        (vect_analyze_data_refs): Call split_constant_offset to calculate base,
        offset and init relative to the outer-loop.

        * tree-vect-transform.c (vect_create_data_ref_ptr): Replace the unused
        BSI function argument with a new function argument - at_loop.
        Simplify the condition that determines STEP. Takes additional argument
        INV_P. Support outer-loop vectorization (handle the nested_in_vect_loop
        case), including zero step in the outer-loop. Call
        vect_create_addr_base_for_vector_ref with additional argument.
        (vect_create_addr_base_for_vector_ref): Takes additional argument LOOP.
        Updated function documentation. Handle the 'nested_in_vect_loop' case.
        Fixed and simplified calculation of step.
        (vectorizable_store): Call vect_create_data_ref_ptr with loop instead
        of bsi, and with additional argument. Call bump_vector_ptr with
        additional argument. Fix typos. Handle the 'nested_in_vect_loop' case.
        (vect_setup_realignment): Takes additional arguments INIT_ADDR and
        DR_ALIGNMENT_SUPPORT. Returns another value AT_LOOP. Handle the case
        when the realignment setup needs to take place inside the loop.  Support
        the dr_explicit_realign scheme. Allow generating the optimized
        realignment scheme for outer-loop vectorization. Added documentation.
        (vectorizable_load): Support the dr_explicit_realign scheme. Handle the
        'nested_in_vect_loop' case, including loads that are invariant in the
        outer-loop and the realignment schemes. Handle the case when the
        realignment setup needs to take place inside the loop. Call
        vect_setup_realignment with additional arguments.  Call
        vect_create_data_ref_ptr with additional argument and with loop instead
        of bsi. Fix 80-column overflow. Fix typos. Rename PHI_STMT to PHI.
        (vect_gen_niters_for_prolog_loop): Call
        vect_create_addr_base_for_vector_ref with additional arguments.
        (vect_create_cond_for_align_checks): Likewise.
        (bump_vector_ptr): Updated to support the new dr_explicit_realign
        scheme: takes additional argument bump; argument ptr_incr is now
        optional; updated documentation.
        (vect_init_vector): Takes additional argument (bsi). Use it, if
        available, to insert the vector initialization.
        (get_initial_def_for_induction): Pass additional argument in call to
        vect_init_vector.
        (vect_get_vec_def_for_operand): Likewise.
        (vect_setup_realignment): Likewise.
        (vectorizable_load): Likewise.

From-SVN: r127624
Dorit Nuzman 2007-08-19 12:02:48 +00:00 committed by Dorit Nuzman
parent d29de1bf28
commit 468c2ac0cc
40 changed files with 2498 additions and 211 deletions
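
The dr_explicit_realign scheme named in the log above can be pictured with a small scalar model. This is only a sketch under stated assumptions, not the code the vectorizer emits: VF, aligned_load, realign and explicit_realign_load are hypothetical names introduced for illustration, and the array is assumed to be padded to a multiple of VF elements. The idea it models is that an unaligned vector is assembled from the two aligned vectors that surround it, with both aligned loads issued at the point of use rather than carried across loop iterations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VF 4 /* hypothetical vector length, in elements */

/* Model of an aligned vector load: reads the VF elements of the
   aligned chunk that contains index IDX.  */
static void aligned_load (const float *base, size_t idx, float v[VF])
{
  size_t start = idx - idx % VF;
  memcpy (v, base + start, VF * sizeof (float));
}

/* Model of a realign operation: selects VF consecutive elements from
   the concatenation LO:HI, starting SHIFT elements into LO.  */
static void realign (const float lo[VF], const float hi[VF],
                     size_t shift, float out[VF])
{
  assert (shift < VF);
  for (size_t i = 0; i < VF; i++)
    out[i] = (i + shift < VF) ? lo[i + shift] : hi[i + shift - VF];
}

/* Explicit realignment: both aligned loads happen here, at the point
   of use.  The second load covers the last element of the unaligned
   access, so an aligned IDX never reads past its own chunk.  */
static void explicit_realign_load (const float *base, size_t idx,
                                   float out[VF])
{
  float lo[VF], hi[VF];
  aligned_load (base, idx, lo);
  aligned_load (base, idx + VF - 1, hi);
  realign (lo, hi, idx % VF, out);
}
```

In this model, the optimized variant would differ only in reusing the previous iteration's second aligned load as the next iteration's first one, which requires the misalignment to stay fixed from one iteration to the next.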

View file

@ -1,3 +1,69 @@
2007-08-19 Dorit Nuzman <dorit@il.ibm.com>
* tree-data-refs.c (split_constant_offset): Expose.
* tree-data-refs.h (split_constant_offset): Add declaration.
* tree-vectorizer.h (dr_alignment_support): Renamed
dr_unaligned_software_pipeline to dr_explicit_realign_optimized.
Added a new value dr_explicit_realign.
(_stmt_vec_info): Added new fields: dr_base_address, dr_init,
dr_offset, dr_step, and dr_aligned_to, along with new access
functions for these fields: STMT_VINFO_DR_BASE_ADDRESS,
STMT_VINFO_DR_INIT, STMT_VINFO_DR_OFFSET, STMT_VINFO_DR_STEP, and
STMT_VINFO_DR_ALIGNED_TO.
* tree-vectorizer.c (vect_supportable_dr_alignment): Add
documentation.
In case of outer-loop vectorization with non-fixed misalignment - use
the dr_explicit_realign scheme instead of the optimized realignment
scheme.
(new_stmt_vec_info): Initialize new fields.
* tree-vect-analyze.c (vect_compute_data_ref_alignment): Handle the
'nested_in_vect_loop' case. Change verbosity level.
(vect_analyze_data_ref_access): Handle the 'nested_in_vect_loop' case.
Don't fail on zero step in the outer-loop for loads.
(vect_analyze_data_refs): Call split_constant_offset to calculate base,
offset and init relative to the outer-loop.
* tree-vect-transform.c (vect_create_data_ref_ptr): Replace the unused
BSI function argument with a new function argument - at_loop.
Simplify the condition that determines STEP. Takes additional argument
INV_P. Support outer-loop vectorization (handle the nested_in_vect_loop
case), including zero step in the outer-loop. Call
vect_create_addr_base_for_vector_ref with additional argument.
(vect_create_addr_base_for_vector_ref): Takes additional argument LOOP.
Updated function documentation. Handle the 'nested_in_vect_loop' case.
Fixed and simplified calculation of step.
(vectorizable_store): Call vect_create_data_ref_ptr with loop instead
of bsi, and with additional argument. Call bump_vector_ptr with
additional argument. Fix typos. Handle the 'nested_in_vect_loop' case.
(vect_setup_realignment): Takes additional arguments INIT_ADDR and
DR_ALIGNMENT_SUPPORT. Returns another value AT_LOOP. Handle the case
when the realignment setup needs to take place inside the loop. Support
the dr_explicit_realign scheme. Allow generating the optimized
realignment scheme for outer-loop vectorization. Added documentation.
(vectorizable_load): Support the dr_explicit_realign scheme. Handle the
'nested_in_vect_loop' case, including loads that are invariant in the
outer-loop and the realignment schemes. Handle the case when the
realignment setup needs to take place inside the loop. Call
vect_setup_realignment with additional arguments. Call
vect_create_data_ref_ptr with additional argument and with loop instead
of bsi. Fix 80-column overflow. Fix typos. Rename PHI_STMT to PHI.
(vect_gen_niters_for_prolog_loop): Call
vect_create_addr_base_for_vector_ref with additional arguments.
(vect_create_cond_for_align_checks): Likewise.
(bump_vector_ptr): Updated to support the new dr_explicit_realign
scheme: takes additional argument bump; argument ptr_incr is now
optional; updated documentation.
(vect_init_vector): Takes additional argument (bsi). Use it, if
available, to insert the vector initialization.
(get_initial_def_for_induction): Pass additional argument in call to
vect_init_vector.
(vect_get_vec_def_for_operand): Likewise.
(vect_setup_realignment): Likewise.
(vectorizable_load): Likewise.
2007-08-19 Dorit Nuzman <dorit@il.ibm.com>
* tree-vectorizer.h (vect_is_simple_reduction): Takes a loop_vec_info

View file

@ -1,3 +1,38 @@
2007-08-19 Dorit Nuzman <dorit@il.ibm.com>
* gcc.dg/vect/vect-117.c: Change inner-loop bound to
unknown (so that outer-loop won't get analyzed).
* gcc.dg/vect/vect-outer-1a.c: New test.
* gcc.dg/vect/vect-outer-1b.c: New test.
* gcc.dg/vect/vect-outer-1.c: New test.
* gcc.dg/vect/vect-outer-2a.c: New test.
* gcc.dg/vect/vect-outer-2b.c: New test.
* gcc.dg/vect/vect-outer-2c.c: New test.
* gcc.dg/vect/vect-outer-2.c: New test.
* gcc.dg/vect/vect-outer-3a.c: New test.
* gcc.dg/vect/vect-outer-3b.c: New test.
* gcc.dg/vect/vect-outer-3c.c: New test.
* gcc.dg/vect/vect-outer-3.c: New test.
* gcc.dg/vect/vect-outer-4a.c: New test.
* gcc.dg/vect/vect-outer-4b.c: New test.
* gcc.dg/vect/vect-outer-4c.c: New test.
* gcc.dg/vect/vect-outer-4d.c: New test.
* gcc.dg/vect/vect-outer-4e.c: New test.
* gcc.dg/vect/vect-outer-4f.c: New test.
* gcc.dg/vect/vect-outer-4g.c: New test.
* gcc.dg/vect/no-section-anchors-vect-outer-4h.c: New test.
* gcc.dg/vect/vect-outer-4i.c: New test.
* gcc.dg/vect/vect-outer-4j.c: New test.
* gcc.dg/vect/vect-outer-4k.c: New test.
* gcc.dg/vect/vect-outer-4l.c: New test.
* gcc.dg/vect/vect-outer-4m.c: New test.
* gcc.dg/vect/vect-outer-4.c: New test.
* gcc.dg/vect/vect-outer-5.c: New test.
* gcc.dg/vect/vect-outer-6.c: New test.
* gcc.dg/vect/vect-outer-fir.c: New test.
* gcc.dg/vect/vect-outer-fir-lb.c: New test.
* gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c: New test.
2007-08-19 Dorit Nuzman <dorit@il.ibm.com>
* gcc.dg/vect/vect.exp: Compile tests with -fno-tree-scev-cprop
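
For orientation, the FIR-style nests used by the vect-outer tests can be vectorized by hand along the outer loop as sketched here. This is an illustrative model only, assuming a hypothetical vector length VF and an outer trip count that is a multiple of VF; fir_scalar and fir_outer_vec are names made up for this sketch and do not appear in the testsuite. Each lane handles one outer iteration, the inner loop stays scalar in structure, and coeff[j] has zero step with respect to the outer loop, so the same value feeds every lane.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4 /* hypothetical vector length, in elements */

/* Scalar reference nest.  */
static void fir_scalar (const float *in, const float *coeff, float *out,
                        size_t n, size_t m)
{
  for (size_t i = 0; i < n; i++)
    {
      float diff = 0;
      for (size_t j = 0; j < m; j++)
        diff += in[j + i] * coeff[j];
      out[i] = diff;
    }
}

/* Hand-written model of outer-loop vectorization: VF consecutive
   outer iterations are processed together, one per lane.  */
static void fir_outer_vec (const float *in, const float *coeff, float *out,
                           size_t n, size_t m)
{
  assert (n % VF == 0);
  for (size_t i = 0; i < n; i += VF)
    {
      float vdiff[VF] = { 0 };
      for (size_t j = 0; j < m; j++)
        {
          /* Invariant in the outer loop: one value shared by all lanes.  */
          float c = coeff[j];
          /* Consecutive in the outer loop: in[j+i], ..., in[j+i+VF-1].
             The start address moves by one element per inner iteration,
             so its misalignment changes inside the inner loop.  */
          for (size_t l = 0; l < VF; l++)
            vdiff[l] += in[j + i + l] * c;
        }
      for (size_t l = 0; l < VF; l++)
        out[i + l] = vdiff[l];
    }
}
```

Each lane accumulates in the same order as the scalar nest, so on small integer-valued inputs both functions produce identical results.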

View file

@ -0,0 +1,75 @@
/* { dg-require-effective-target vect_float } */
#include <stdarg.h>
#include "../../tree-vect.h"
#define N 40
#define M 128
float in[N+M];
float coeff[M];
float out[N];
float fir_out[N];
/* Should be vectorized. Fixed misalignment in the inner-loop. */
/* Currently not vectorized because we get too many BBs in the inner-loop,
because the compiler doesn't realize that the inner-loop executes at
least once (cause k<4), and so there's no need to create a guard code
to skip the inner-loop in case it doesn't execute. */
void foo (){
int i,j,k;
float diff;
for (i = 0; i < N; i++) {
out[i] = 0;
}
for (k = 0; k < 4; k++) {
for (i = 0; i < N; i++) {
diff = 0;
for (j = k; j < M; j+=4) {
diff += in[j+i]*coeff[j];
}
out[i] += diff;
}
}
}
/* Vectorized. Changing misalignment in the inner-loop. */
void fir (){
int i,j,k;
float diff;
for (i = 0; i < N; i++) {
diff = 0;
for (j = 0; j < M; j++) {
diff += in[j+i]*coeff[j];
}
fir_out[i] = diff;
}
}
int main (void)
{
check_vect ();
int i, j;
float diff;
for (i = 0; i < M; i++)
coeff[i] = i;
for (i = 0; i < N+M; i++)
in[i] = i;
foo ();
fir ();
for (i = 0; i < N; i++) {
if (out[i] != fir_out[i])
abort ();
}
return 0;
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 2 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail vect_no_align } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,47 @@
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 40
#define M 128
unsigned short a[M][N];
unsigned int out[N];
/* Outer-loop vectorization. */
void
foo (){
int i,j;
unsigned int diff;
for (i = 0; i < N; i++) {
for (j = 0; j < M; j++) {
a[j][i] = 4;
}
out[i]=5;
}
}
int main (void)
{
int i, j;
check_vect ();
foo ();
for (i = 0; i < N; i++) {
for (j = 0; j < M; j++) {
if (a[j][i] != 4)
abort ();
}
if (out[i] != 5)
abort ();
}
return 0;
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -20,7 +20,7 @@ static int c[N][N] = {{ 1, 2, 3, 4, 5},
volatile int foo;
int main1 (int A[N][N])
int main1 (int A[N][N], int n)
{
int i,j;
@ -28,7 +28,7 @@ int main1 (int A[N][N])
/* vectorizable */
for (i = 1; i < N; i++)
{
for (j = 0; j < N; j++)
for (j = 0; j < n; j++)
{
A[i][j] = A[i-1][j] + A[i][j];
}
@ -42,7 +42,7 @@ int main (void)
int i,j;
foo = 0;
main1 (a);
main1 (a, N);
/* check results: */

View file

@ -0,0 +1,26 @@
/* { dg-do compile } */
#define N 40
signed short image[N][N] __attribute__ ((__aligned__(16)));
signed short block[N][N] __attribute__ ((__aligned__(16)));
signed short out[N] __attribute__ ((__aligned__(16)));
/* Can't do outer-loop vectorization because of non-consecutive access. */
void
foo (){
int i,j;
int diff;
for (i = 0; i < N; i++) {
diff = 0;
for (j = 0; j < N; j+=8) {
diff += (image[i][j] - block[i][j]);
}
out[i]=diff;
}
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "strided access in outer loop" 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,28 @@
/* { dg-do compile } */
#define N 40
signed short image[N][N] __attribute__ ((__aligned__(16)));
signed short block[N][N] __attribute__ ((__aligned__(16)));
/* Can't do outer-loop vectorization because of non-consecutive access.
Currently fails to vectorize because the reduction pattern is not
recognized. */
int
foo (){
int i,j;
int diff = 0;
for (i = 0; i < N; i++) {
for (j = 0; j < N; j+=8) {
diff += (image[i][j] - block[i][j]);
}
}
return diff;
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
/* FORNOW */
/* { dg-final { scan-tree-dump-times "strided access in outer loop" 1 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "unexpected pattern" 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,26 @@
/* { dg-do compile } */
#define N 40
signed short image[N][N];
signed short block[N][N];
signed short out[N];
/* Outer-loop cannot get vectorized because of non-consecutive access. */
void
foo (){
int i,j;
int diff;
for (i = 0; i < N; i++) {
diff = 0;
for (j = 0; j < N; j+=4) {
diff += (image[i][j] - block[i][j]);
}
out[i]=diff;
}
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "strided access in outer loop" 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,40 @@
/* { dg-require-effective-target vect_float } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 40
float image[N][N] __attribute__ ((__aligned__(16)));
float out[N];
/* Outer-loop vectorization. */
void
foo (){
int i,j;
for (i = 0; i < N; i++) {
for (j = 0; j < N; j++) {
image[j][i] = j+i;
}
}
}
int main (void)
{
check_vect ();
int i, j;
foo ();
for (i = 0; i < N; i++) {
for (j = 0; j < N; j++) {
if (image[j][i] != j+i)
abort ();
}
}
return 0;
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,41 @@
/* { dg-require-effective-target vect_float } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 40
float image[N][N][N] __attribute__ ((__aligned__(16)));
void
foo (){
int i,j,k;
for (k=0; k<N; k++) {
for (i = 0; i < N; i++) {
for (j = 0; j < N; j++) {
image[k][j][i] = j+i+k;
}
}
}
}
int main (void)
{
check_vect ();
int i, j, k;
foo ();
for (k=0; k<N; k++) {
for (i = 0; i < N; i++) {
for (j = 0; j < N; j++) {
if (image[k][j][i] != j+i+k)
abort ();
}
}
}
return 0;
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,41 @@
/* { dg-require-effective-target vect_float } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 40
float image[2*N][N][N] __attribute__ ((__aligned__(16)));
void
foo (){
int i,j,k;
for (k=0; k<N; k++) {
for (i = 0; i < N; i++) {
for (j = 0; j < N; j++) {
image[k+i][j][i] = j+i+k;
}
}
}
}
int main (void)
{
check_vect ();
int i, j, k;
foo ();
for (k=0; k<N; k++) {
for (i = 0; i < N; i++) {
for (j = 0; j < N; j++) {
if (image[k+i][j][i] != j+i+k)
abort ();
}
}
}
return 0;
}
/* { dg-final { scan-tree-dump-times "strided access in outer loop." 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,41 @@
/* { dg-require-effective-target vect_float } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 40
float image[2*N][2*N][N] __attribute__ ((__aligned__(16)));
void
foo (){
int i,j,k;
for (k=0; k<N; k++) {
for (i = 0; i < N; i++) {
for (j = 0; j < N; j+=2) {
image[k][j][i] = j+i+k;
}
}
}
}
int main (void)
{
check_vect ();
int i, j, k;
foo ();
for (k=0; k<N; k++) {
for (i = 0; i < N; i++) {
for (j = 0; j < N; j+=2) {
if (image[k][j][i] != j+i+k)
abort ();
}
}
}
return 0;
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,41 @@
/* { dg-require-effective-target vect_float } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 40
float image[N][N][N+1] __attribute__ ((__aligned__(16)));
void
foo (){
int i,j,k;
for (k=0; k<N; k++) {
for (i = 0; i < N; i++) {
for (j = 0; j < i+1; j++) {
image[k][j][i] = j+i+k;
}
}
}
}
int main (void)
{
check_vect ();
int i, j, k;
foo ();
for (k=0; k<N; k++) {
for (i = 0; i < N; i++) {
for (j = 0; j < i+1; j++) {
if (image[k][j][i] != j+i+k)
abort ();
}
}
}
return 0;
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 0 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,52 @@
/* { dg-require-effective-target vect_float } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 40
float image[N][N] __attribute__ ((__aligned__(16)));
float out[N];
/* Outer-loop vectorization. */
void
foo (){
int i,j;
float diff;
for (i = 0; i < N; i++) {
diff = 0;
for (j = 0; j < N; j++) {
diff += image[j][i];
}
out[i]=diff;
}
}
int main (void)
{
check_vect ();
int i, j;
float diff;
for (i = 0; i < N; i++) {
for (j = 0; j < N; j++) {
image[i][j]=i+j;
}
}
foo ();
for (i = 0; i < N; i++) {
diff = 0;
for (j = 0; j < N; j++) {
diff += image[j][i];
}
if (out[i] != diff)
abort ();
}
return 0;
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,53 @@
/* { dg-require-effective-target vect_float } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 40
float image[N][N+1] __attribute__ ((__aligned__(16)));
float out[N];
/* Outer-loop vectorization with misaligned accesses in the inner-loop. */
void
foo (){
int i,j;
float diff;
for (i = 0; i < N; i++) {
diff = 0;
for (j = 0; j < N; j++) {
diff += image[j][i];
}
out[i]=diff;
}
}
int main (void)
{
check_vect ();
int i, j;
float diff;
for (i = 0; i < N; i++) {
for (j = 0; j < N; j++) {
image[i][j]=i+j;
}
}
foo ();
for (i = 0; i < N; i++) {
diff = 0;
for (j = 0; j < N; j++) {
diff += image[j][i];
}
if (out[i] != diff)
abort ();
}
return 0;
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail vect_no_align } } } */
/* { dg-final { scan-tree-dump-times "step doesn't divide the vector-size" 2 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,53 @@
/* { dg-require-effective-target vect_float } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 40
float image[N][N] __attribute__ ((__aligned__(16)));
float out[N];
/* Outer-loop vectorization with non-consecutive access. Not vectorized yet. */
void
foo (){
int i,j;
float diff;
for (i = 0; i < N/2; i++) {
diff = 0;
for (j = 0; j < N; j++) {
diff += image[j][2*i];
}
out[i]=diff;
}
}
int main (void)
{
check_vect ();
int i, j;
float diff;
for (i = 0; i < N; i++) {
for (j = 0; j < N; j++) {
image[i][j]=i+j;
}
}
foo ();
for (i = 0; i < N/2; i++) {
diff = 0;
for (j = 0; j < N; j++) {
diff += image[j][2*i];
}
if (out[i] != diff)
abort ();
}
return 0;
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "strided access in outer loop" 2 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,52 @@
/* { dg-require-effective-target vect_float } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 40
float image[N][N+1] __attribute__ ((__aligned__(16)));
float out[N];
/* Outer-loop vectorization. */
void
foo (){
int i,j;
float diff;
for (i = 0; i < N; i++) {
diff = 0;
for (j = 0; j < N; j+=4) {
diff += image[j][i];
}
out[i]=diff;
}
}
int main (void)
{
check_vect ();
int i, j;
float diff;
for (i = 0; i < N; i++) {
for (j = 0; j < N; j++) {
image[i][j]=i+j;
}
}
foo ();
for (i = 0; i < N; i++) {
diff = 0;
for (j = 0; j < N; j+=4) {
diff += image[j][i];
}
if (out[i] != diff)
abort ();
}
return 0;
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,55 @@
/* { dg-require-effective-target vect_float } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 40
#define M 128
float in[N+M];
float coeff[M];
float out[N];
/* Outer-loop vectorization. */
void
foo (){
int i,j;
float diff;
for (i = 0; i < N; i++) {
diff = 0;
for (j = 0; j < M; j+=4) {
diff += in[j+i]*coeff[j];
}
out[i]=diff;
}
}
int main (void)
{
check_vect ();
int i, j;
float diff;
for (i = 0; i < M; i++)
coeff[i] = i;
for (i = 0; i < N+M; i++)
in[i] = i;
foo ();
for (i = 0; i < N; i++) {
diff = 0;
for (j = 0; j < M; j+=4) {
diff += in[j+i]*coeff[j];
}
if (out[i] != diff)
abort ();
}
return 0;
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "zero step in outer loop." 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,31 @@
/* { dg-do compile } */
#define N 40
#define M 128
signed short in[N+M];
signed short coeff[M];
signed short out[N];
/* Outer-loop vectorization.
Currently not vectorized because of multiple-data-types in the inner-loop. */
void
foo (){
int i,j;
int diff;
for (i = 0; i < N; i++) {
diff = 0;
for (j = 0; j < M; j+=8) {
diff += in[j+i]*coeff[j];
}
out[i]=diff;
}
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
/* FORNOW. not vectorized until we support 0-stride accesses like coeff[j]. should be:
{ scan-tree-dump-not "multiple types in nested loop." "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "zero step in outer loop." 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,31 @@
/* { dg-do compile } */
#define N 40
#define M 128
signed short in[N+M];
signed short coeff[M];
int out[N];
/* Outer-loop vectorization.
Currently not vectorized because of multiple-data-types in the inner-loop. */
void
foo (){
int i,j;
int diff;
for (i = 0; i < N; i++) {
diff = 0;
for (j = 0; j < M; j+=8) {
diff += in[j+i]*coeff[j];
}
out[i]=diff;
}
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
/* FORNOW. not vectorized until we support 0-stride accesses like coeff[j]. should be:
{ scan-tree-dump-not "multiple types in nested loop." "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "zero step in outer loop." 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,27 @@
/* { dg-do compile } */
#define N 40
#define M 128
unsigned short in[N+M];
unsigned short coeff[M];
unsigned int out[N];
/* Outer-loop vectorization. */
void
foo (){
int i,j;
unsigned short diff;
for (i = 0; i < N; i++) {
diff = 0;
for (j = 0; j < M; j+=8) {
diff += in[j+i]*coeff[j];
}
out[i]=diff;
}
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { target vect_short_mult } } } */
/* { dg-final { scan-tree-dump-times "zero step in outer loop." 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,51 @@
/* { dg-require-effective-target vect_float } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 40
#define M 128
float in[N+M];
float out[N];
/* Outer-loop vectorization. */
void
foo (){
int i,j;
float diff;
for (i = 0; i < N; i++) {
diff = 0;
for (j = 0; j < M; j+=4) {
diff += in[j+i];
}
out[i]=diff;
}
}
int main (void)
{
check_vect ();
int i, j;
float diff;
for (i = 0; i < N; i++)
in[i] = i;
foo ();
for (i = 0; i < N; i++) {
diff = 0;
for (j = 0; j < M; j+=4) {
diff += in[j+i];
}
if (out[i] != diff)
abort ();
}
return 0;
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,27 @@
/* { dg-do compile } */
#define N 40
#define M 128
unsigned int in[N+M];
unsigned short out[N];
/* Outer-loop vectorization. */
void
foo (){
int i,j;
unsigned int diff;
for (i = 0; i < N; i++) {
diff = 0;
for (j = 0; j < M; j+=8) {
diff += in[j+i];
}
out[i]=(unsigned short)diff;
}
return;
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,70 @@
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 40
#define M 128
unsigned short in[N+M];
unsigned int out[N];
unsigned char arr[N];
/* Outer-loop vectorization. */
/* Not vectorized due to multiple-types in the inner-loop. */
unsigned int
foo (){
int i,j;
unsigned int diff;
unsigned int s=0;
for (i = 0; i < N; i++) {
arr[i] = 3;
diff = 0;
for (j = 0; j < M; j+=8) {
diff += in[j+i];
}
s+=diff;
}
return s;
}
unsigned int
bar (int i, unsigned int diff, unsigned short *in)
{
int j;
for (j = 0; j < M; j+=8) {
diff += in[j+i];
}
return diff;
}
int main (void)
{
int i, j;
unsigned int diff;
unsigned int s=0,sum=0;
check_vect ();
for (i = 0; i < N+M; i++) {
in[i] = i;
}
sum=foo ();
for (i = 0; i < N; i++) {
arr[i] = 3;
diff = 0;
diff = bar (i, diff, in);
s += diff;
}
if (s != sum)
abort ();
return 0;
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_widen_sum_pattern: not allowed" 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,70 @@
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 40
#define M 128
unsigned short in[N+M];
unsigned int out[N];
unsigned char arr[N];
/* Outer-loop vectorization. */
/* Not vectorized due to multiple-types in the inner-loop. */
unsigned int
foo (){
int i,j;
unsigned int diff;
unsigned int s=0;
for (i = 0; i < N; i++) {
arr[i] = 3;
diff = 0;
for (j = 0; j < M; j+=8) {
diff += in[j+i];
}
s+=diff;
}
return s;
}
unsigned int
bar (int i, unsigned int diff, unsigned short *in)
{
int j;
for (j = 0; j < M; j+=8) {
diff += in[j+i];
}
return diff;
}
int main (void)
{
int i, j;
unsigned int diff;
unsigned int s=0,sum=0;
check_vect ();
for (i = 0; i < N+M; i++) {
in[i] = i;
}
sum=foo ();
for (i = 0; i < N; i++) {
arr[i] = 3;
diff = 0;
diff = bar (i, diff, in);
s += diff;
}
if (s != sum)
abort ();
return 0;
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_widen_sum_pattern: not allowed" 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,28 @@
/* { dg-do compile } */
#define N 40
#define M 128
unsigned char in[N+M];
unsigned short out[N];
/* Outer-loop vectorization. */
/* Not vectorized due to multiple-types in the inner-loop. */
unsigned short
foo (){
int i,j;
unsigned short diff;
unsigned short s=0;
for (i = 0; i < N; i++) {
diff = 0;
for (j = 0; j < M; j+=8) {
diff += in[j+i];
}
s+=diff;
}
return s;
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,26 @@
/* { dg-do compile } */
#define N 40
#define M 128
unsigned char in[N+M];
unsigned short out[N];
/* Outer-loop vectorization. */
/* Not vectorized due to multiple-types in the inner-loop. */
void
foo (){
int i,j;
unsigned short diff;
for (i = 0; i < N; i++) {
diff = 0;
for (j = 0; j < M; j+=8) {
diff += in[j+i];
}
out[i]=diff;
}
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,70 @@
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 40
#define M 128
unsigned short in[N+M];
unsigned int out[N];
unsigned char arr[N];
/* Outer-loop vectorization. */
/* Not vectorized due to multiple-types in the inner-loop. */
unsigned int
foo (){
int i,j;
unsigned int diff;
unsigned int s=0;
for (i = 0; i < N; i++) {
arr[i] = 3;
diff = 0;
for (j = 0; j < M; j+=8) {
diff += in[j+i];
}
s+=diff;
}
return s;
}
unsigned int
bar (int i, unsigned int diff, unsigned short *in)
{
int j;
for (j = 0; j < M; j+=8) {
diff += in[j+i];
}
return diff;
}
int main (void)
{
int i, j;
unsigned int diff;
unsigned int s=0,sum=0;
check_vect ();
for (i = 0; i < N+M; i++) {
in[i] = i;
}
sum=foo ();
for (i = 0; i < N; i++) {
arr[i] = 3;
diff = 0;
diff = bar (i, diff, in);
s += diff;
}
if (s != sum)
abort ();
return 0;
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_widen_sum_pattern: not allowed" 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,70 @@
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 40
#define M 128
unsigned short in[N+M];
unsigned int out[N];
unsigned char arr[N];
/* Outer-loop vectorization. */
/* Not vectorized due to multiple-types in the inner-loop. */
unsigned int
foo (){
int i,j;
unsigned int diff;
unsigned int s=0;
for (i = 0; i < N; i++) {
arr[i] = 3;
diff = 0;
for (j = 0; j < M; j+=8) {
diff += in[j+i];
}
s+=diff;
}
return s;
}
unsigned int
bar (int i, unsigned int diff, unsigned short *in)
{
int j;
for (j = 0; j < M; j+=8) {
diff += in[j+i];
}
return diff;
}
int main (void)
{
int i, j;
unsigned int diff;
unsigned int s=0,sum=0;
check_vect ();
for (i = 0; i < N+M; i++) {
in[i] = i;
}
sum=foo ();
for (i = 0; i < N; i++) {
arr[i] = 3;
diff = 0;
diff = bar (i, diff, in);
s += diff;
}
if (s != sum)
abort ();
return 0;
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_widen_sum_pattern: not allowed" 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,58 @@
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 40
#define M 128
unsigned short in[N+M];
unsigned int out[N];
/* Outer-loop vectorization. */
/* Not vectorized due to multiple-types in the inner-loop. */
unsigned int
foo (){
int i,j;
unsigned int diff;
unsigned int s=0;
for (i = 0; i < N; i++) {
diff = 0;
for (j = 0; j < M; j+=8) {
diff += in[j+i];
}
s+=((unsigned short)diff>>3);
}
return s;
}
int main (void)
{
int i, j;
unsigned int diff;
unsigned int s=0,sum=0;
check_vect ();
for (i = 0; i < N+M; i++) {
in[i] = i;
}
sum=foo ();
for (i = 0; i < N; i++) {
diff = 0;
for (j = 0; j < M; j+=8) {
diff += in[j+i];
}
s += ((unsigned short)diff>>3);
}
if (s != sum)
abort ();
return 0;
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

View file

@ -0,0 +1,83 @@
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <signal.h>
#include "tree-vect.h"
#define N 64
#define MAX 42
extern void abort(void);
int main1 ()
{
float A[N] __attribute__ ((__aligned__(16)));
float B[N] __attribute__ ((__aligned__(16)));
float C[N] __attribute__ ((__aligned__(16)));
float D[N] __attribute__ ((__aligned__(16)));
float s;
int i, j;
for (i = 0; i < N; i++)
{
A[i] = i;
B[i] = i;
C[i] = i;
D[i] = i;
}
/* Outer-loop 1: Vectorizable with respect to dependence distance. */
for (i = 0; i < N-20; i++)
{
s = 0;
for (j=0; j<N; j+=4)
s += C[j];
A[i] = A[i+20] + s;
}
/* check results: */
for (i = 0; i < N-20; i++)
{
s = 0;
for (j=0; j<N; j+=4)
s += C[j];
if (A[i] != D[i+20] + s)
abort ();
}
/* Outer-loop 2: Not vectorizable because of dependence distance. */
for (i = 0; i < 4; i++)
{
s = 0;
for (j=0; j<N; j+=4)
s += C[j];
B[i] = B[i+3] + s;
}
/* check results: */
for (i = 0; i < 4; i++)
{
s = 0;
for (j=0; j<N; j+=4)
s += C[j];
if (B[i] != D[i+3] + s)
abort ();
}
return 0;
}
int main ()
{
check_vect ();
return main1();
}
/* NOTE: We temporarily xfail the following check until versioning for
aliasing is fixed to avoid versioning when the dependence distance
is known. */
/* { dg-final { scan-tree-dump-times "not vectorized: possible dependence between data-refs" 1 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "zero step in outer loop." 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
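
The two outer loops in the test above differ only in their dependence distance: `A[i] = A[i+20] + s` has distance 20, while `B[i] = B[i+3] + s` has distance 3. As a rough sketch of why only the first is expected to vectorize, assume a vectorization factor of 4 (four floats per 16-byte vector) and a conservative rule that accepts a dependence only when its distance is zero or at least the vectorization factor. The helper name `toy_distance_allows_vectorization` is hypothetical and not part of GCC; the real dependence analysis is more involved.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not a GCC function): returns 1 if a dependence at
   distance DIST, measured in loop iterations, still lets VF consecutive
   iterations run as one vector iteration under the conservative rule
   assumed in the text above.  */
static int
toy_distance_allows_vectorization (int dist, int vf)
{
  assert (vf > 0);
  /* Distance 0 means both accesses touch the same element in the same
     iteration; a distance of at least VF means the accesses grouped into
     one vector iteration never overlap.  */
  return dist == 0 || abs (dist) >= vf;
}
```

Under these assumptions, `toy_distance_allows_vectorization (20, 4)` accepts the first loop and `toy_distance_allows_vectorization (3, 4)` rejects the second, matching the intent stated in the test comments.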

@ -0,0 +1,65 @@
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <signal.h>
#include "tree-vect.h"
#define N 64
#define MAX 42
float A[N] __attribute__ ((__aligned__(16)));
float B[N] __attribute__ ((__aligned__(16)));
float C[N] __attribute__ ((__aligned__(16)));
float D[N] __attribute__ ((__aligned__(16)));
extern void abort(void);
int main1 ()
{
float s;
int i, j;
for (i = 0; i < 8; i++)
{
s = 0;
for (j=0; j<8; j+=4)
s += C[j];
A[i] = s;
}
return 0;
}
int main ()
{
int i,j;
float s;
check_vect ();
for (i = 0; i < N; i++)
{
A[i] = i;
B[i] = i;
C[i] = i;
D[i] = i;
}
main1();
/* check results: */
for (i = 0; i < 8; i++)
{
s = 0;
for (j=0; j<8; j+=4)
s += C[j];
if (A[i] != s)
abort ();
}
return 0;
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "zero step in outer loop." 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

@ -0,0 +1,80 @@
/* { dg-require-effective-target vect_float } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 40
#define M 64
float in[N+M];
float coeff[M];
float out[N];
float fir_out[N];
/* Should be vectorized. Fixed misalignment in the inner-loop. */
/* Currently not vectorized because the loop-count for the inner-loop
has a maybe_zero component. Will be fixed when we incorporate the
"cond_expr in rhs" patch. */
void foo (){
int i,j,k;
float diff;
for (i = 0; i < N; i++) {
out[i] = 0;
}
for (k = 0; k < 4; k++) {
for (i = 0; i < N; i++) {
diff = 0;
j = k;
do {
diff += in[j+i]*coeff[j];
j+=4;
} while (j < M);
out[i] += diff;
}
}
}
/* Vectorized. Changing misalignment in the inner-loop. */
void fir (){
int i,j,k;
float diff;
for (i = 0; i < N; i++) {
diff = 0;
for (j = 0; j < M; j++) {
diff += in[j+i]*coeff[j];
}
fir_out[i] = diff;
}
}
int main (void)
{
check_vect ();
int i, j;
float diff;
for (i = 0; i < M; i++)
coeff[i] = i;
for (i = 0; i < N+M; i++)
in[i] = i;
foo ();
fir ();
for (i = 0; i < N; i++) {
if (out[i] != fir_out[i])
abort ();
}
return 0;
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 2 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail vect_no_align } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

@ -0,0 +1,77 @@
/* { dg-require-effective-target vect_float } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 40
#define M 128
float in[N+M];
float coeff[M];
float out[N];
float fir_out[N];
/* Should be vectorized. Fixed misalignment in the inner-loop. */
/* Currently not vectorized because we get too many BBs in the inner-loop,
because the compiler doesn't realize that the inner-loop executes at
least once (because k<4), and so there's no need to create guard code
to skip the inner-loop in case it doesn't execute. */
void foo (){
int i,j,k;
float diff;
for (i = 0; i < N; i++) {
out[i] = 0;
}
for (k = 0; k < 4; k++) {
for (i = 0; i < N; i++) {
diff = 0;
for (j = k; j < M; j+=4) {
diff += in[j+i]*coeff[j];
}
out[i] += diff;
}
}
}
/* Vectorized. Changing misalignment in the inner-loop. */
void fir (){
int i,j,k;
float diff;
for (i = 0; i < N; i++) {
diff = 0;
for (j = 0; j < M; j++) {
diff += in[j+i]*coeff[j];
}
fir_out[i] = diff;
}
}
int main (void)
{
check_vect ();
int i, j;
float diff;
for (i = 0; i < M; i++)
coeff[i] = i;
for (i = 0; i < N+M; i++)
in[i] = i;
foo ();
fir ();
for (i = 0; i < N; i++) {
if (out[i] != fir_out[i])
abort ();
}
return 0;
}
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 2 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail vect_no_align } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

@ -489,7 +489,7 @@ dump_ddrs (FILE *file, VEC (ddr_p, heap) *ddrs)
/* Expresses EXP as VAR + OFF, where off is a constant. The type of OFF
will be ssizetype. */
static void
void
split_constant_offset (tree exp, tree *var, tree *off)
{
tree type = TREE_TYPE (exp), otype;
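
To illustrate what "expresses EXP as VAR + OFF" means, here is a toy model of the idea. It is only a sketch: it does not use GCC's `tree` representation, it handles nothing but constants, variables, and addition, and every name in it (`toy_expr`, `toy_split_constant_offset`, and so on) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum toy_kind { TOY_CONST, TOY_VAR, TOY_PLUS };

/* A toy expression node; this is NOT GCC's tree type.  */
struct toy_expr
{
  enum toy_kind kind;
  long value;                  /* TOY_CONST: the constant value.  */
  const char *name;            /* TOY_VAR: the variable name.  */
  const struct toy_expr *lhs;  /* TOY_PLUS: left operand.  */
  const struct toy_expr *rhs;  /* TOY_PLUS: right operand.  */
};

/* Walk EXP, returning the sum of its constant leaves (the OFF part) and
   appending the names of its variable leaves (the VAR part) to VARS.
   *N_VARS must be initialized by the caller.  */
static long
toy_split_constant_offset (const struct toy_expr *exp, const char **vars,
                           size_t max_vars, size_t *n_vars)
{
  long left, right;

  assert (exp != NULL && vars != NULL && n_vars != NULL);
  switch (exp->kind)
    {
    case TOY_CONST:
      return exp->value;
    case TOY_VAR:
      assert (*n_vars < max_vars);
      vars[(*n_vars)++] = exp->name;
      return 0;
    case TOY_PLUS:
      /* Evaluate the operands in a fixed order so that VARS is filled
         left to right.  */
      left = toy_split_constant_offset (exp->lhs, vars, max_vars, n_vars);
      right = toy_split_constant_offset (exp->rhs, vars, max_vars, n_vars);
      return left + right;
    }
  return 0;
}
```

For an expression shaped like `(p + 8) + 4`, this sketch collects `p` as the variable part and returns 12 as the constant offset.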

@ -388,4 +388,7 @@ index_in_loop_nest (int var, VEC (loop_p, heap) *loop_nest)
/* In lambda-code.c */
bool lambda_transform_legal_p (lambda_trans_matrix, int, VEC (ddr_p, heap) *);
/* In tree-data-refs.c */
void split_constant_offset (tree , tree *, tree *);
#endif /* GCC_TREE_DATA_REF_H */

@ -1279,6 +1279,8 @@ vect_compute_data_ref_alignment (struct data_reference *dr)
{
tree stmt = DR_STMT (dr);
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
tree ref = DR_REF (dr);
tree vectype;
tree base, base_addr;
@ -1295,13 +1297,42 @@ vect_compute_data_ref_alignment (struct data_reference *dr)
misalign = DR_INIT (dr);
aligned_to = DR_ALIGNED_TO (dr);
base_addr = DR_BASE_ADDRESS (dr);
/* In case the dataref is in an inner-loop of the loop that is being
vectorized (LOOP), we use the base and misalignment information
relative to the outer-loop (LOOP). This is ok only if the misalignment
stays the same throughout the execution of the inner-loop, which is why
we have to check that the stride of the dataref in the inner-loop is evenly
divisible by the vector size. */
if (nested_in_vect_loop_p (loop, stmt))
{
tree step = DR_STEP (dr);
HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
if (dr_step % UNITS_PER_SIMD_WORD == 0)
{
if (vect_print_dump_info (REPORT_ALIGNMENT))
fprintf (vect_dump, "inner step divides the vector-size.");
misalign = STMT_VINFO_DR_INIT (stmt_info);
aligned_to = STMT_VINFO_DR_ALIGNED_TO (stmt_info);
base_addr = STMT_VINFO_DR_BASE_ADDRESS (stmt_info);
}
else
{
if (vect_print_dump_info (REPORT_ALIGNMENT))
fprintf (vect_dump, "inner step doesn't divide the vector-size.");
misalign = NULL_TREE;
}
}
base = build_fold_indirect_ref (base_addr);
vectype = STMT_VINFO_VECTYPE (stmt_info);
alignment = ssize_int (TYPE_ALIGN (vectype)/BITS_PER_UNIT);
if (tree_int_cst_compare (aligned_to, alignment) < 0)
if ((aligned_to && tree_int_cst_compare (aligned_to, alignment) < 0)
|| !misalign)
{
if (vect_print_dump_info (REPORT_DETAILS))
if (vect_print_dump_info (REPORT_ALIGNMENT))
{
fprintf (vect_dump, "Unknown alignment for access: ");
print_generic_expr (vect_dump, base, TDF_SLIM);
@ -1980,20 +2011,39 @@ static bool
vect_analyze_data_ref_access (struct data_reference *dr)
{
tree step = DR_STEP (dr);
HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
tree scalar_type = TREE_TYPE (DR_REF (dr));
HOST_WIDE_INT type_size = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (scalar_type));
tree stmt = DR_STMT (dr);
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
HOST_WIDE_INT stride;
/* Don't allow invariant accesses. */
if (dr_step == 0)
return false;
if (nested_in_vect_loop_p (loop, stmt))
{
/* For the rest of the analysis we use the outer-loop step. */
step = STMT_VINFO_DR_STEP (stmt_info);
dr_step = TREE_INT_CST_LOW (step);
if (dr_step == 0)
{
if (vect_print_dump_info (REPORT_ALIGNMENT))
fprintf (vect_dump, "zero step in outer loop.");
if (DR_IS_READ (dr))
return true;
else
return false;
}
}
/* For interleaving, STRIDE is STEP counted in elements, i.e., the size of the
interleaving group (including gaps). */
HOST_WIDE_INT stride = dr_step / type_size;
if (!step)
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data-ref access");
return false;
}
stride = dr_step / type_size;
/* Consecutive? */
if (!tree_int_cst_compare (step, TYPE_SIZE_UNIT (scalar_type)))
@ -2003,6 +2053,13 @@ vect_analyze_data_ref_access (struct data_reference *dr)
return true;
}
if (nested_in_vect_loop_p (loop, stmt))
{
if (vect_print_dump_info (REPORT_ALIGNMENT))
fprintf (vect_dump, "strided access in outer loop.");
return false;
}
/* Not consecutive access is possible only if it is a part of interleaving. */
if (!DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt)))
{
@ -2231,6 +2288,7 @@ vect_analyze_data_refs (loop_vec_info loop_vinfo)
tree stmt;
stmt_vec_info stmt_info;
basic_block bb;
tree base, offset, init;
if (!dr || !DR_REF (dr))
{
@ -2238,36 +2296,13 @@ vect_analyze_data_refs (loop_vec_info loop_vinfo)
fprintf (vect_dump, "not vectorized: unhandled data-ref ");
return false;
}
/* Update DR field in stmt_vec_info struct. */
stmt = DR_STMT (dr);
stmt_info = vinfo_for_stmt (stmt);
/* If outer-loop vectorization: we don't yet support datarefs
in the innermost loop. */
bb = bb_for_stmt (stmt);
if (bb->loop_father != LOOP_VINFO_LOOP (loop_vinfo))
{
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
fprintf (vect_dump, "not vectorized: data-ref in nested loop");
return false;
}
if (STMT_VINFO_DATA_REF (stmt_info))
{
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
{
fprintf (vect_dump,
"not vectorized: more than one data ref in stmt: ");
print_generic_expr (vect_dump, stmt, TDF_SLIM);
}
return false;
}
STMT_VINFO_DATA_REF (stmt_info) = dr;
/* Check that analysis of the data-ref succeeded. */
if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr) || !DR_INIT (dr)
|| !DR_STEP (dr))
|| !DR_STEP (dr))
{
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
{
@ -2294,7 +2329,127 @@ vect_analyze_data_refs (loop_vec_info loop_vinfo)
}
return false;
}
base = unshare_expr (DR_BASE_ADDRESS (dr));
offset = unshare_expr (DR_OFFSET (dr));
init = unshare_expr (DR_INIT (dr));
/* Update DR field in stmt_vec_info struct. */
bb = bb_for_stmt (stmt);
/* If the dataref is in an inner-loop of the loop that is considered
for vectorization, we also want to analyze the access relative to
the outer-loop (DR contains information only relative to the
inner-most enclosing loop). We do that by building a reference to the
first location accessed by the inner-loop, and analyze it relative to
the outer-loop. */
if (nested_in_vect_loop_p (loop, stmt))
{
tree outer_step, outer_base, outer_init;
HOST_WIDE_INT pbitsize, pbitpos;
tree poffset;
enum machine_mode pmode;
int punsignedp, pvolatilep;
affine_iv base_iv, offset_iv;
tree dinit;
/* Build a reference to the first location accessed by the
inner-loop: *(BASE+INIT). (The first location is actually
BASE+INIT+OFFSET, but we add OFFSET separately later.) */
tree inner_base = build_fold_indirect_ref
(fold_build2 (PLUS_EXPR, TREE_TYPE (base), base, init));
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (dump_file, "analyze in outer-loop: ");
print_generic_expr (dump_file, inner_base, TDF_SLIM);
}
outer_base = get_inner_reference (inner_base, &pbitsize, &pbitpos,
&poffset, &pmode, &punsignedp, &pvolatilep, false);
gcc_assert (outer_base != NULL_TREE);
if (pbitpos % BITS_PER_UNIT != 0)
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (dump_file, "failed: bit offset alignment.\n");
return false;
}
outer_base = build_fold_addr_expr (outer_base);
if (!simple_iv (loop, stmt, outer_base, &base_iv, false))
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (dump_file, "failed: evolution of base is not affine.\n");
return false;
}
if (offset)
{
if (poffset)
poffset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), offset, poffset);
else
poffset = offset;
}
if (!poffset)
{
offset_iv.base = ssize_int (0);
offset_iv.step = ssize_int (0);
}
else if (!simple_iv (loop, stmt, poffset, &offset_iv, false))
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (dump_file, "evolution of offset is not affine.\n");
return false;
}
outer_init = ssize_int (pbitpos / BITS_PER_UNIT);
split_constant_offset (base_iv.base, &base_iv.base, &dinit);
outer_init = size_binop (PLUS_EXPR, outer_init, dinit);
split_constant_offset (offset_iv.base, &offset_iv.base, &dinit);
outer_init = size_binop (PLUS_EXPR, outer_init, dinit);
outer_step = size_binop (PLUS_EXPR,
fold_convert (ssizetype, base_iv.step),
fold_convert (ssizetype, offset_iv.step));
STMT_VINFO_DR_STEP (stmt_info) = outer_step;
/* FIXME: Use canonicalize_base_object_address (base_iv.base); */
STMT_VINFO_DR_BASE_ADDRESS (stmt_info) = base_iv.base;
STMT_VINFO_DR_INIT (stmt_info) = outer_init;
STMT_VINFO_DR_OFFSET (stmt_info) =
fold_convert (ssizetype, offset_iv.base);
STMT_VINFO_DR_ALIGNED_TO (stmt_info) =
size_int (highest_pow2_factor (offset_iv.base));
if (dump_file && (dump_flags & TDF_DETAILS))
{
fprintf (dump_file, "\touter base_address: ");
print_generic_expr (dump_file, STMT_VINFO_DR_BASE_ADDRESS (stmt_info), TDF_SLIM);
fprintf (dump_file, "\n\touter offset from base address: ");
print_generic_expr (dump_file, STMT_VINFO_DR_OFFSET (stmt_info), TDF_SLIM);
fprintf (dump_file, "\n\touter constant offset from base address: ");
print_generic_expr (dump_file, STMT_VINFO_DR_INIT (stmt_info), TDF_SLIM);
fprintf (dump_file, "\n\touter step: ");
print_generic_expr (dump_file, STMT_VINFO_DR_STEP (stmt_info), TDF_SLIM);
fprintf (dump_file, "\n\touter aligned to: ");
print_generic_expr (dump_file, STMT_VINFO_DR_ALIGNED_TO (stmt_info), TDF_SLIM);
}
}
if (STMT_VINFO_DATA_REF (stmt_info))
{
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
{
fprintf (vect_dump,
"not vectorized: more than one data ref in stmt: ");
print_generic_expr (vect_dump, stmt, TDF_SLIM);
}
return false;
}
STMT_VINFO_DATA_REF (stmt_info) = dr;
/* Set vectype for STMT. */
scalar_type = TREE_TYPE (DR_REF (dr));
STMT_VINFO_VECTYPE (stmt_info) =
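
The outer-loop handling added to vect_analyze_data_ref_access in the hunks above can be summarized as a three-way decision on the step relative to the outer loop. The following is a simplified sketch of that decision only; the enum and function names are hypothetical, and the interleaving logic that applies to innermost-loop accesses is deliberately left out.

```c
#include <assert.h>

/* Hypothetical result codes for this sketch; not GCC identifiers.  */
enum toy_access
{
  TOY_ACCESS_INVARIANT_LOAD,  /* Zero outer step on a read: accepted.  */
  TOY_ACCESS_CONSECUTIVE,     /* Outer step equals element size: accepted.  */
  TOY_ACCESS_REJECTED         /* Invariant store, or strided access.  */
};

/* Classify a data reference nested in the loop being vectorized, given
   its step relative to the outer loop (OUTER_STEP, in bytes), the size
   of the accessed element (TYPE_SIZE, in bytes), and whether it is a
   load (IS_READ).  */
static enum toy_access
toy_classify_outer_access (long outer_step, long type_size, int is_read)
{
  assert (type_size > 0);
  if (outer_step == 0)
    /* "zero step in outer loop": fine for loads, not for stores.  */
    return is_read ? TOY_ACCESS_INVARIANT_LOAD : TOY_ACCESS_REJECTED;
  if (outer_step == type_size)
    return TOY_ACCESS_CONSECUTIVE;
  /* "strided access in outer loop".  */
  return TOY_ACCESS_REJECTED;
}
```

For example, a load of `C[j]` inside an outer `i` loop has outer step 0 and is accepted as an invariant load, while a `float` load of `in[j+i]` advances by 4 bytes per outer iteration and is treated as consecutive.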

File diff suppressed because it is too large.

@ -1345,6 +1345,13 @@ new_stmt_vec_info (tree stmt, loop_vec_info loop_vinfo)
STMT_VINFO_IN_PATTERN_P (res) = false;
STMT_VINFO_RELATED_STMT (res) = NULL;
STMT_VINFO_DATA_REF (res) = NULL;
STMT_VINFO_DR_BASE_ADDRESS (res) = NULL;
STMT_VINFO_DR_OFFSET (res) = NULL;
STMT_VINFO_DR_INIT (res) = NULL;
STMT_VINFO_DR_STEP (res) = NULL;
STMT_VINFO_DR_ALIGNED_TO (res) = NULL;
if (TREE_CODE (stmt) == PHI_NODE && is_loop_header_bb_p (bb_for_stmt (stmt)))
STMT_VINFO_DEF_TYPE (res) = vect_unknown_def_type;
else
@ -1655,21 +1662,103 @@ get_vectype_for_scalar_type (tree scalar_type)
enum dr_alignment_support
vect_supportable_dr_alignment (struct data_reference *dr)
{
tree vectype = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr)));
tree stmt = DR_STMT (dr);
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
enum machine_mode mode = (int) TYPE_MODE (vectype);
struct loop *vect_loop = LOOP_VINFO_LOOP (STMT_VINFO_LOOP_VINFO (stmt_info));
bool nested_in_vect_loop = nested_in_vect_loop_p (vect_loop, stmt);
bool invariant_in_outerloop = false;
if (aligned_access_p (dr))
return dr_aligned;
if (nested_in_vect_loop)
{
tree outerloop_step = STMT_VINFO_DR_STEP (stmt_info);
invariant_in_outerloop =
(tree_int_cst_compare (outerloop_step, size_zero_node) == 0);
}
/* Possibly unaligned access. */
/* We can choose between using the implicit realignment scheme (generating
a misaligned_move stmt) and the explicit realignment scheme (generating
aligned loads with a REALIGN_LOAD). There are two variants to the explicit
realignment scheme: optimized, and unoptimized.
We can optimize the realignment only if the step between consecutive
vector loads is equal to the vector size. Since the vector memory
accesses advance in steps of VS (Vector Size) in the vectorized loop, it
is guaranteed that the misalignment amount remains the same throughout the
execution of the vectorized loop. Therefore, we can create the
"realignment token" (the permutation mask that is passed to REALIGN_LOAD)
at the loop preheader.
However, in the case of outer-loop vectorization, when vectorizing a
memory access in the inner-loop nested within the LOOP that is now being
vectorized, while it is guaranteed that the misalignment of the
vectorized memory access will remain the same in different outer-loop
iterations, it is *not* guaranteed that it will remain the same throughout
the execution of the inner-loop. This is because the inner-loop advances
with the original scalar step (and not in steps of VS). If the inner-loop
step happens to be a multiple of VS, then the misalignment remains fixed
and we can use the optimized realignment scheme. For example:
for (i=0; i<N; i++)
for (j=0; j<M; j++)
s += a[i+j];
When vectorizing the i-loop in the above example, the step between
consecutive vector loads is 1, and so the misalignment does not remain
fixed across the execution of the inner-loop, and the realignment cannot
be optimized (as illustrated in the following pseudo vectorized loop):
for (i=0; i<N; i+=4)
for (j=0; j<M; j++){
vs += vp[i+j]; // misalignment of &vp[i+j] is {0,1,2,3,0,1,2,3,...}
// when j is {0,1,2,3,4,5,6,7,...} respectively.
// (assuming that we start from an aligned address).
}
We therefore have to use the unoptimized realignment scheme:
for (i=0; i<N; i+=4)
for (j=k; j<M; j+=4)
vs += vp[i+j]; // misalignment of &vp[i+j] is always k (assuming
// that the misalignment of the initial address is
// 0).
The loop can then be vectorized as follows:
for (k=0; k<4; k++){
rt = get_realignment_token (&vp[k]);
for (i=0; i<N; i+=4){
v1 = vp[i+k];
for (j=k; j<M; j+=4){
v2 = vp[i+j+VS-1];
va = REALIGN_LOAD <v1,v2,rt>;
vs += va;
v1 = v2;
}
}
} */
if (DR_IS_READ (dr))
{
if (optab_handler (vec_realign_load_optab, mode)->insn_code != CODE_FOR_nothing
if (optab_handler (vec_realign_load_optab, mode)->insn_code !=
CODE_FOR_nothing
&& (!targetm.vectorize.builtin_mask_for_load
|| targetm.vectorize.builtin_mask_for_load ()))
return dr_unaligned_software_pipeline;
{
if (nested_in_vect_loop
&& TREE_INT_CST_LOW (DR_STEP (dr)) != UNITS_PER_SIMD_WORD)
return dr_explicit_realign;
else
return dr_explicit_realign_optimized;
}
if (optab_handler (movmisalign_optab, mode)->insn_code != CODE_FOR_nothing)
if (optab_handler (movmisalign_optab, mode)->insn_code !=
CODE_FOR_nothing)
/* Can't software pipeline the loads, but can at least do them. */
return dr_unaligned_supported;
}
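
The long comment in vect_supportable_dr_alignment above argues that the misalignment of an inner-loop access stays fixed only when the inner-loop step is a multiple of the vector size. A small sketch of that arithmetic follows, counting in elements rather than bytes and assuming an aligned base address. Both helper names are hypothetical.

```c
#include <assert.h>

/* Misalignment, in elements, of element index START + J*STEP, for
   vectors of VS elements and an aligned base address (an assumption of
   this sketch).  */
static int
toy_misalignment (int start, int step, int j, int vs)
{
  assert (vs > 0 && start >= 0 && step >= 0 && j >= 0);
  return (start + j * step) % vs;
}

/* Return 1 if the misalignment is the same in each of the first N_ITERS
   inner-loop iterations, 0 otherwise.  */
static int
toy_misalignment_is_fixed (int start, int step, int n_iters, int vs)
{
  int j;
  int first = toy_misalignment (start, step, 0, vs);

  for (j = 1; j < n_iters; j++)
    if (toy_misalignment (start, step, j, vs) != first)
      return 0;
  return 1;
}
```

With `vs` equal to 4, a step of 1 yields the cycling pattern 0, 1, 2, 3, 0, ... described in the comment, so the realignment token cannot be hoisted out of the inner loop. A step of 4 starting at `k` always yields `k`, which is the case where the token can be computed once per value of `k`.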

@ -53,7 +53,8 @@ enum operation_type {
enum dr_alignment_support {
dr_unaligned_unsupported,
dr_unaligned_supported,
dr_unaligned_software_pipeline,
dr_explicit_realign,
dr_explicit_realign_optimized,
dr_aligned
};
@ -249,9 +250,18 @@ typedef struct _stmt_vec_info {
data-ref (array/pointer/struct access). A GIMPLE stmt is expected to have
at most one such data-ref. **/
/* Information about the data-ref (access function, etc). */
/* Information about the data-ref (access function, etc),
relative to the inner-most containing loop. */
struct data_reference *data_ref_info;
/* Information about the data-ref relative to this loop
nest (the loop that is being considered for vectorization). */
tree dr_base_address;
tree dr_init;
tree dr_offset;
tree dr_step;
tree dr_aligned_to;
/* Stmt is part of some pattern (computation idiom) */
bool in_pattern_p;
@ -310,6 +320,13 @@ typedef struct _stmt_vec_info {
#define STMT_VINFO_VECTYPE(S) (S)->vectype
#define STMT_VINFO_VEC_STMT(S) (S)->vectorized_stmt
#define STMT_VINFO_DATA_REF(S) (S)->data_ref_info
#define STMT_VINFO_DR_BASE_ADDRESS(S) (S)->dr_base_address
#define STMT_VINFO_DR_INIT(S) (S)->dr_init
#define STMT_VINFO_DR_OFFSET(S) (S)->dr_offset
#define STMT_VINFO_DR_STEP(S) (S)->dr_step
#define STMT_VINFO_DR_ALIGNED_TO(S) (S)->dr_aligned_to
#define STMT_VINFO_IN_PATTERN_P(S) (S)->in_pattern_p
#define STMT_VINFO_RELATED_STMT(S) (S)->related_stmt
#define STMT_VINFO_SAME_ALIGN_REFS(S) (S)->same_align_refs