tree-data-refs.c (split_constant_offset): Expose.
* tree-data-refs.c (split_constant_offset): Expose.
* tree-data-refs.h (split_constant_offset): Add declaration.

* tree-vectorizer.h (dr_alignment_support): Renamed
dr_unaligned_software_pipeline to dr_explicit_realign_optimized.
Added a new value dr_explicit_realign.
(_stmt_vec_info): Added new fields: dr_base_address, dr_init,
dr_offset, dr_step, and dr_aligned_to, along with new access
functions for these fields: STMT_VINFO_DR_BASE_ADDRESS,
STMT_VINFO_DR_INIT, STMT_VINFO_DR_OFFSET, STMT_VINFO_DR_STEP, and
STMT_VINFO_DR_ALIGNED_TO.

* tree-vectorizer.c (vect_supportable_dr_alignment): Add
documentation.
In case of outer-loop vectorization with non-fixed misalignment - use
the dr_explicit_realign scheme instead of the optimized realignment
scheme.
(new_stmt_vec_info): Initialize new fields.

* tree-vect-analyze.c (vect_compute_data_ref_alignment): Handle the
'nested_in_vect_loop' case. Change verbosity level.
(vect_analyze_data_ref_access): Handle the 'nested_in_vect_loop' case.
Don't fail on zero step in the outer-loop for loads.
(vect_analyze_data_refs): Call split_constant_offset to calculate base,
offset and init relative to the outer-loop.

* tree-vect-transform.c (vect_create_data_ref_ptr): Replace the unused
BSI function argument with a new function argument - at_loop.
Simplify the condition that determines STEP. Takes additional argument
INV_P. Support outer-loop vectorization (handle the nested_in_vect_loop
case), including zero step in the outer-loop. Call
vect_create_addr_base_for_vector_ref with additional argument.
(vect_create_addr_base_for_vector_ref): Takes additional argument LOOP.
Updated function documentation. Handle the 'nested_in_vect_loop' case.
Fixed and simplified calculation of step.
(vectorizable_store): Call vect_create_data_ref_ptr with loop instead
of bsi, and with additional argument. Call bump_vector_ptr with
additional argument. Fix typos. Handle the 'nested_in_vect_loop' case.
(vect_setup_realignment): Takes additional arguments INIT_ADDR and
DR_ALIGNMENT_SUPPORT. Returns another value AT_LOOP. Handle the case
when the realignment setup needs to take place inside the loop. Support
the dr_explicit_realign scheme. Allow generating the optimized
realignment scheme for outer-loop vectorization. Added documentation.
(vectorizable_load): Support the dr_explicit_realign scheme. Handle the
'nested_in_vect_loop' case, including loads that are invariant in the
outer-loop and the realignment schemes. Handle the case when the
realignment setup needs to take place inside the loop. Call
vect_setup_realignment with additional arguments. Call
vect_create_data_ref_ptr with additional argument and with loop instead
of bsi. Fix 80-column overflow. Fix typos. Rename PHI_STMT to PHI.
(vect_gen_niters_for_prolog_loop): Call
vect_create_addr_base_for_vector_ref with additional arguments.
(vect_create_cond_for_align_checks): Likewise.
(bump_vector_ptr): Updated to support the new dr_explicit_realign
scheme: takes additional argument bump; argument ptr_incr is now
optional; updated documentation.
(vect_init_vector): Takes additional argument (bsi). Use it, if
available, to insert the vector initialization.
(get_initial_def_for_induction): Pass additional argument in call to
vect_init_vector.
(vect_get_vec_def_for_operand): Likewise.
(vect_setup_realignment): Likewise.
(vectorizable_load): Likewise.

From-SVN: r127624
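Note on the two explicit realignment schemes named in the message: the difference between dr_explicit_realign and dr_explicit_realign_optimized can be pictured with a scalar emulation. The sketch below is illustrative only. The helpers `load_aligned_floor`, `realign`, `load_explicit_realign` and `sum_realign_optimized` are hypothetical names, not functions from the vectorizer, and the code assumes four-element vectors and the common description of explicit realignment (two aligned loads combined by a select step).

```c
#include <assert.h>

#define VF 4 /* assumed vector length in elements */

/* Emulates an aligned vector load: copies the VF-element block that
   contains index idx (the address is rounded down to a block boundary). */
static void load_aligned_floor(const int *base, int idx, int v[VF])
{
  int start = idx - idx % VF;
  for (int k = 0; k < VF; k++)
    v[k] = base[start + k];
}

/* Emulates the realign step: picks VF consecutive elements, starting at
   offset off, from the concatenation of msq and lsq. */
static void realign(const int msq[VF], const int lsq[VF], int off, int out[VF])
{
  for (int k = 0; k < VF; k++)
    out[k] = (off + k < VF) ? msq[off + k] : lsq[off + k - VF];
}

/* Unoptimized explicit scheme: both aligned loads and the offset are
   recomputed at every access, so the misalignment may differ each time. */
void load_explicit_realign(const int *base, int idx, int out[VF])
{
  int msq[VF], lsq[VF];
  assert(idx >= 0);
  load_aligned_floor(base, idx, msq);
  load_aligned_floor(base, idx + VF - 1, lsq);
  realign(msq, lsq, idx % VF, out);
}

/* Optimized scheme: the offset is computed once and each lsq is reused as
   the next msq.  This is only valid when consecutive vector loads are
   exactly VF elements apart; the sketch also assumes start is misaligned. */
int sum_realign_optimized(const int *base, int start, int nvec)
{
  int off = start % VF, sum = 0;
  int msq[VF], lsq[VF], v[VF];
  assert(start >= 0 && off != 0);
  load_aligned_floor(base, start, msq); /* done once, before the loop */
  for (int n = 0; n < nvec; n++) {
    load_aligned_floor(base, start + n * VF + VF - 1, lsq);
    realign(msq, lsq, off, v);
    for (int k = 0; k < VF; k++) {
      sum += v[k];
      msq[k] = lsq[k]; /* carry lsq over to the next iteration */
    }
  }
  return sum;
}
```

The carried-over `msq` is why the optimized form needs a fixed misalignment: in an inner loop that advances by the scalar step while the outer loop is vectorized, the offset can change on every iteration, which is the case the message routes to dr_explicit_realign.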
parent d29de1bf28
commit 468c2ac0cc
40 changed files with 2498 additions and 211 deletions
|
@ -1,3 +1,69 @@
|
|||
2007-08-19 Dorit Nuzman <dorit@il.ibm.com>
|
||||
|
||||
* tree-data-refs.c (split_constant_offset): Expose.
|
||||
* tree-data-refs.h (split_constant_offset): Add declaration.
|
||||
|
||||
* tree-vectorizer.h (dr_alignment_support): Renamed
|
||||
dr_unaligned_software_pipeline to dr_explicit_realign_optimized.
|
||||
Added a new value dr_explicit_realign.
|
||||
(_stmt_vec_info): Added new fields: dr_base_address, dr_init,
|
||||
dr_offset, dr_step, and dr_aligned_to, along with new access
|
||||
functions for these fields: STMT_VINFO_DR_BASE_ADDRESS,
|
||||
STMT_VINFO_DR_INIT, STMT_VINFO_DR_OFFSET, STMT_VINFO_DR_STEP, and
|
||||
STMT_VINFO_DR_ALIGNED_TO.
|
||||
|
||||
* tree-vectorizer.c (vect_supportable_dr_alignment): Add
|
||||
documentation.
|
||||
In case of outer-loop vectorization with non-fixed misalignment - use
|
||||
the dr_explicit_realign scheme instead of the optimized realignment
|
||||
scheme.
|
||||
(new_stmt_vec_info): Initialize new fields.
|
||||
|
||||
* tree-vect-analyze.c (vect_compute_data_ref_alignment): Handle the
|
||||
'nested_in_vect_loop' case. Change verbosity level.
|
||||
(vect_analyze_data_ref_access): Handle the 'nested_in_vect_loop' case.
|
||||
Don't fail on zero step in the outer-loop for loads.
|
||||
(vect_analyze_data_refs): Call split_constant_offset to calculate base,
|
||||
offset and init relative to the outer-loop.
|
||||
|
||||
* tree-vect-transform.c (vect_create_data_ref_ptr): Replace the unused
|
||||
BSI function argument with a new function argument - at_loop.
|
||||
Simplify the condition that determines STEP. Takes additional argument
|
||||
INV_P. Support outer-loop vectorization (handle the nested_in_vect_loop
|
||||
case), including zero step in the outer-loop. Call
|
||||
vect_create_addr_base_for_vector_ref with additional argument.
|
||||
(vect_create_addr_base_for_vector_ref): Takes additional argument LOOP.
|
||||
Updated function documentation. Handle the 'nested_in_vect_loop' case.
|
||||
Fixed and simplified calculation of step.
|
||||
(vectorizable_store): Call vect_create_data_ref_ptr with loop instead
|
||||
of bsi, and with additional argument. Call bump_vector_ptr with
|
||||
additional argument. Fix typos. Handle the 'nested_in_vect_loop' case.
|
||||
(vect_setup_realignment): Takes additional arguments INIT_ADDR and
|
||||
DR_ALIGNMENT_SUPPORT. Returns another value AT_LOOP. Handle the case
|
||||
when the realignment setup needs to take place inside the loop. Support
|
||||
the dr_explicit_realign scheme. Allow generating the optimized
|
||||
realignment scheme for outer-loop vectorization. Added documentation.
|
||||
(vectorizable_load): Support the dr_explicit_realign scheme. Handle the
|
||||
'nested_in_vect_loop' case, including loads that are invariant in the
|
||||
outer-loop and the realignment schemes. Handle the case when the
|
||||
realignment setup needs to take place inside the loop. Call
|
||||
vect_setup_realignment with additional arguments. Call
|
||||
vect_create_data_ref_ptr with additional argument and with loop instead
|
||||
of bsi. Fix 80-column overflow. Fix typos. Rename PHI_STMT to PHI.
|
||||
(vect_gen_niters_for_prolog_loop): Call
|
||||
vect_create_addr_base_for_vector_ref with additional arguments.
|
||||
(vect_create_cond_for_align_checks): Likewise.
|
||||
(bump_vector_ptr): Updated to support the new dr_explicit_realign
|
||||
scheme: takes additional argument bump; argument ptr_incr is now
|
||||
optional; updated documentation.
|
||||
(vect_init_vector): Takes additional argument (bsi). Use it, if
|
||||
available, to insert the vector initialization.
|
||||
(get_initial_def_for_induction): Pass additional argument in call to
|
||||
vect_init_vector.
|
||||
(vect_get_vec_def_for_operand): Likewise.
|
||||
(vect_setup_realignment): Likewise.
|
||||
(vectorizable_load): Likewise.
|
||||
|
||||
2007-08-19 Dorit Nuzman <dorit@il.ibm.com>
|
||||
|
||||
* tree-vectorizer.h (vect_is_simple_reduction): Takes a loop_vec_info
|
||||
|
|
|
@ -1,3 +1,38 @@
|
|||
2007-08-19 Dorit Nuzman <dorit@il.ibm.com>
|
||||
|
||||
* gcc.dg/vect/vect-117.c: Change inner-loop bound to
|
||||
unknown (so that outer-loop won't get analyzed).
|
||||
* gcc.dg/vect/vect-outer-1a.c: New test.
|
||||
* gcc.dg/vect/vect-outer-1b.c: New test.
|
||||
* gcc.dg/vect/vect-outer-1.c: New test.
|
||||
* gcc.dg/vect/vect-outer-2a.c: New test.
|
||||
* gcc.dg/vect/vect-outer-2b.c: New test.
|
||||
* gcc.dg/vect/vect-outer-2c.c: New test.
|
||||
* gcc.dg/vect/vect-outer-2.c: New test.
|
||||
* gcc.dg/vect/vect-outer-3a.c: New test.
|
||||
* gcc.dg/vect/vect-outer-3b.c: New test.
|
||||
* gcc.dg/vect/vect-outer-3c.c: New test.
|
||||
* gcc.dg/vect/vect-outer-3.c: New test.
|
||||
* gcc.dg/vect/vect-outer-4a.c: New test.
|
||||
* gcc.dg/vect/vect-outer-4b.c: New test.
|
||||
* gcc.dg/vect/vect-outer-4c.c: New test.
|
||||
* gcc.dg/vect/vect-outer-4d.c: New test.
|
||||
* gcc.dg/vect/vect-outer-4e.c: New test.
|
||||
* gcc.dg/vect/vect-outer-4f.c: New test.
|
||||
* gcc.dg/vect/vect-outer-4g.c: New test.
|
||||
* gcc.dg/vect/no-section-anchors-vect-outer-4h.c: New test.
|
||||
* gcc.dg/vect/vect-outer-4i.c: New test.
|
||||
* gcc.dg/vect/vect-outer-4j.c: New test.
|
||||
* gcc.dg/vect/vect-outer-4k.c: New test.
|
||||
* gcc.dg/vect/vect-outer-4l.c: New test.
|
||||
* gcc.dg/vect/vect-outer-4m.c: New test.
|
||||
* gcc.dg/vect/vect-outer-4.c: New test.
|
||||
* gcc.dg/vect/vect-outer-5.c: New test.
|
||||
* gcc.dg/vect/vect-outer-6.c: New test.
|
||||
* gcc.dg/vect/vect-outer-fir.c: New test.
|
||||
* gcc.dg/vect/vect-outer-fir-lb.c: New test.
|
||||
* gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c: New test.
|
||||
|
||||
2007-08-19 Dorit Nuzman <dorit@il.ibm.com>
|
||||
|
||||
* gcc.dg/vect/vect.exp: Compile tests with -fno-tree-scev-cprop
|
||||
|
|
|
@ -0,0 +1,75 @@
|
|||
/* { dg-require-effective-target vect_float } */
|
||||
|
||||
#include <stdarg.h>
|
||||
#include "../../tree-vect.h"
|
||||
|
||||
#define N 40
|
||||
#define M 128
|
||||
float in[N+M];
|
||||
float coeff[M];
|
||||
float out[N];
|
||||
float fir_out[N];
|
||||
|
||||
/* Should be vectorized. Fixed misalignment in the inner-loop. */
|
||||
/* Currently not vectorized because we get too many BBs in the inner-loop,
|
||||
because the compiler doesn't realize that the inner-loop executes at
|
||||
least once (because k<4), and so there's no need to create guard code
|
||||
to skip the inner-loop in case it doesn't execute. */
|
||||
void foo (){
|
||||
int i,j,k;
|
||||
float diff;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
out[i] = 0;
|
||||
}
|
||||
|
||||
for (k = 0; k < 4; k++) {
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
for (j = k; j < M; j+=4) {
|
||||
diff += in[j+i]*coeff[j];
|
||||
}
|
||||
out[i] += diff;
|
||||
}
|
||||
}
}
|
||||
|
||||
/* Vectorized. Changing misalignment in the inner-loop. */
|
||||
void fir (){
|
||||
int i,j,k;
|
||||
float diff;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
for (j = 0; j < M; j++) {
|
||||
diff += in[j+i]*coeff[j];
|
||||
}
|
||||
fir_out[i] = diff;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
int main (void)
|
||||
{
|
||||
check_vect ();
|
||||
int i, j;
|
||||
float diff;
|
||||
|
||||
for (i = 0; i < M; i++)
|
||||
coeff[i] = i;
|
||||
for (i = 0; i < N+M; i++)
|
||||
in[i] = i;
|
||||
|
||||
foo ();
|
||||
fir ();
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
if (out[i] != fir_out[i])
|
||||
abort ();
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 2 "vect" { xfail *-*-* } } } */
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail vect_no_align } } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
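Note on the test above: what "outer-loop vectorization" means for the `fir` loop nest can be emulated in plain C. This is a hand-written sketch under stated assumptions, not the code the vectorizer generates: `fir_scalar` and `fir_outer_vectorized` are hypothetical names, the vectorization factor is assumed to be 4, and each "vector" is modelled as a small array of lanes.

```c
#include <assert.h>

#define VF 4 /* assumed vectorization factor: four floats per vector */

/* Scalar reference: the same loop nest as the fir test function. */
void fir_scalar(const float *in, const float *coeff, float *out, int n, int m)
{
  for (int i = 0; i < n; i++) {
    float diff = 0;
    for (int j = 0; j < m; j++)
      diff += in[j + i] * coeff[j];
    out[i] = diff;
  }
}

/* Emulation of vectorizing the OUTER (i) loop: each lane holds one of VF
   consecutive values of i, and the inner loop still advances j by 1. */
void fir_outer_vectorized(const float *in, const float *coeff, float *out,
                          int n, int m)
{
  assert(n % VF == 0); /* the sketch has no epilogue loop */
  for (int i = 0; i < n; i += VF) {
    float vdiff[VF] = {0};
    for (int j = 0; j < m; j++) {
      /* coeff[j] does not depend on i: one scalar shared by all lanes. */
      float c = coeff[j];
      /* in[j+i .. j+i+VF-1] acts as one vector load whose start address
         moves by a single element per inner iteration. */
      for (int lane = 0; lane < VF; lane++)
        vdiff[lane] += in[j + i + lane] * c;
    }
    for (int lane = 0; lane < VF; lane++)
      out[i + lane] = vdiff[lane];
  }
}
```

Because the vector load of `in` starts one element later on each inner iteration, its misalignment changes inside the inner loop, which matches the "Changing misalignment in the inner-loop" comment on `fir`. With `j+=4`, as in `foo`, the start address moves by a whole assumed vector each time and the misalignment stays fixed.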
47
gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-outer-4h.c
Normal file
|
@ -0,0 +1,47 @@
|
|||
/* { dg-require-effective-target vect_int } */
|
||||
#include <stdarg.h>
|
||||
#include "tree-vect.h"
|
||||
|
||||
|
||||
#define N 40
|
||||
#define M 128
|
||||
unsigned short a[M][N];
|
||||
unsigned int out[N];
|
||||
|
||||
/* Outer-loop vectorization. */
|
||||
|
||||
void
|
||||
foo (){
|
||||
int i,j;
|
||||
unsigned int diff;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
for (j = 0; j < M; j++) {
|
||||
a[j][i] = 4;
|
||||
}
|
||||
out[i]=5;
|
||||
}
|
||||
}
|
||||
|
||||
int main (void)
|
||||
{
|
||||
int i, j;
|
||||
check_vect ();
|
||||
|
||||
foo ();
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
for (j = 0; j < M; j++) {
|
||||
if (a[j][i] != 4)
|
||||
abort ();
|
||||
}
|
||||
if (out[i] != 5)
|
||||
abort ();
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
|
@ -20,7 +20,7 @@ static int c[N][N] = {{ 1, 2, 3, 4, 5},
|
|||
|
||||
volatile int foo;
|
||||
|
||||
int main1 (int A[N][N])
|
||||
int main1 (int A[N][N], int n)
|
||||
{
|
||||
|
||||
int i,j;
|
||||
|
@ -28,7 +28,7 @@ int main1 (int A[N][N])
|
|||
/* vectorizable */
|
||||
for (i = 1; i < N; i++)
|
||||
{
|
||||
for (j = 0; j < N; j++)
|
||||
for (j = 0; j < n; j++)
|
||||
{
|
||||
A[i][j] = A[i-1][j] + A[i][j];
|
||||
}
|
||||
|
@ -42,7 +42,7 @@ int main (void)
|
|||
int i,j;
|
||||
|
||||
foo = 0;
|
||||
main1 (a);
|
||||
main1 (a, N);
|
||||
|
||||
/* check results: */
|
||||
|
||||
|
|
26
gcc/testsuite/gcc.dg/vect/vect-outer-1.c
Normal file
|
@ -0,0 +1,26 @@
|
|||
/* { dg-do compile } */
|
||||
|
||||
#define N 40
|
||||
signed short image[N][N] __attribute__ ((__aligned__(16)));
|
||||
signed short block[N][N] __attribute__ ((__aligned__(16)));
|
||||
signed short out[N] __attribute__ ((__aligned__(16)));
|
||||
|
||||
/* Can't do outer-loop vectorization because of non-consecutive access. */
|
||||
|
||||
void
|
||||
foo (){
|
||||
int i,j;
|
||||
int diff;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
for (j = 0; j < N; j+=8) {
|
||||
diff += (image[i][j] - block[i][j]);
|
||||
}
|
||||
out[i]=diff;
|
||||
}
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
|
||||
/* { dg-final { scan-tree-dump-times "strided access in outer loop" 1 "vect" } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
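Note on the dump messages these tests scan for: "strided access in outer loop" and "zero step in outer loop." can be read in terms of how far an access moves when the outer counter increases by one. The sketch below is a simplified model assumed for illustration, not the vectorizer's actual data-reference analysis. `struct subscript`, `outer_step` and `classify_outer_access` are hypothetical names, and arrays are assumed to be row-major.

```c
#include <assert.h>

enum outer_access { OUTER_INVARIANT, OUTER_CONSECUTIVE, OUTER_STRIDED };

/* Subscript of one array dimension as an affine function of the loop
   counters: ci * i + cj * j (i = outer loop, j = inner loop). */
struct subscript { long ci, cj; };

/* Outer-loop step, in elements, of a[row][col] for a row-major array
   with ncols columns: how far the address moves when i increases by 1. */
long outer_step(long ncols, struct subscript row, struct subscript col)
{
  assert(ncols > 0);
  return row.ci * ncols + col.ci;
}

/* Simplified classification: step 0 is invariant in the outer loop,
   step 1 is consecutive, anything else is treated as strided. */
enum outer_access classify_outer_access(long step)
{
  if (step == 0)
    return OUTER_INVARIANT;
  if (step == 1)
    return OUTER_CONSECUTIVE;
  return OUTER_STRIDED;
}
```

Under this model `image[i][j]` with N = 40 has an outer step of 40 elements (strided), `image[j][i]` has a step of 1 (consecutive), and a one-dimensional `coeff[j]` has a step of 0 (invariant in the outer loop).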
28
gcc/testsuite/gcc.dg/vect/vect-outer-1a.c
Normal file
|
@ -0,0 +1,28 @@
|
|||
/* { dg-do compile } */
|
||||
|
||||
#define N 40
|
||||
signed short image[N][N] __attribute__ ((__aligned__(16)));
|
||||
signed short block[N][N] __attribute__ ((__aligned__(16)));
|
||||
|
||||
/* Can't do outer-loop vectorization because of non-consecutive access.
|
||||
Currently fails to vectorize because the reduction pattern is not
|
||||
recognized. */
|
||||
|
||||
int
|
||||
foo (){
|
||||
int i,j;
|
||||
int diff = 0;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
for (j = 0; j < N; j+=8) {
|
||||
diff += (image[i][j] - block[i][j]);
|
||||
}
|
||||
}
|
||||
return diff;
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
|
||||
/* FORNOW */
|
||||
/* { dg-final { scan-tree-dump-times "strided access in outer loop" 1 "vect" { xfail *-*-* } } } */
|
||||
/* { dg-final { scan-tree-dump-times "unexpected pattern" 1 "vect" } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
26
gcc/testsuite/gcc.dg/vect/vect-outer-1b.c
Normal file
|
@ -0,0 +1,26 @@
|
|||
/* { dg-do compile } */
|
||||
|
||||
#define N 40
|
||||
signed short image[N][N];
|
||||
signed short block[N][N];
|
||||
signed short out[N];
|
||||
|
||||
/* Outer-loop cannot get vectorized because of non-consecutive access. */
|
||||
|
||||
void
|
||||
foo (){
|
||||
int i,j;
|
||||
int diff;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
for (j = 0; j < N; j+=4) {
|
||||
diff += (image[i][j] - block[i][j]);
|
||||
}
|
||||
out[i]=diff;
|
||||
}
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
|
||||
/* { dg-final { scan-tree-dump-times "strided access in outer loop" 1 "vect" } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
40
gcc/testsuite/gcc.dg/vect/vect-outer-2.c
Normal file
|
@ -0,0 +1,40 @@
|
|||
/* { dg-require-effective-target vect_float } */
|
||||
#include <stdarg.h>
|
||||
#include "tree-vect.h"
|
||||
|
||||
#define N 40
|
||||
float image[N][N] __attribute__ ((__aligned__(16)));
|
||||
float out[N];
|
||||
|
||||
/* Outer-loop vectorization. */
|
||||
|
||||
void
|
||||
foo (){
|
||||
int i,j;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
for (j = 0; j < N; j++) {
|
||||
image[j][i] = j+i;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
int main (void)
|
||||
{
|
||||
check_vect ();
|
||||
int i, j;
|
||||
|
||||
foo ();
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
for (j = 0; j < N; j++) {
|
||||
if (image[j][i] != j+i)
|
||||
abort ();
|
||||
}
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
41
gcc/testsuite/gcc.dg/vect/vect-outer-2a.c
Normal file
|
@ -0,0 +1,41 @@
|
|||
/* { dg-require-effective-target vect_float } */
|
||||
#include <stdarg.h>
|
||||
#include "tree-vect.h"
|
||||
|
||||
#define N 40
|
||||
float image[N][N][N] __attribute__ ((__aligned__(16)));
|
||||
|
||||
void
|
||||
foo (){
|
||||
int i,j,k;
|
||||
|
||||
for (k=0; k<N; k++) {
|
||||
for (i = 0; i < N; i++) {
|
||||
for (j = 0; j < N; j++) {
|
||||
image[k][j][i] = j+i+k;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
int main (void)
|
||||
{
|
||||
check_vect ();
|
||||
int i, j, k;
|
||||
|
||||
foo ();
|
||||
|
||||
for (k=0; k<N; k++) {
|
||||
for (i = 0; i < N; i++) {
|
||||
for (j = 0; j < N; j++) {
|
||||
if (image[k][j][i] != j+i+k)
|
||||
abort ();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
41
gcc/testsuite/gcc.dg/vect/vect-outer-2b.c
Normal file
|
@ -0,0 +1,41 @@
|
|||
/* { dg-require-effective-target vect_float } */
|
||||
#include <stdarg.h>
|
||||
#include "tree-vect.h"
|
||||
|
||||
#define N 40
|
||||
float image[2*N][N][N] __attribute__ ((__aligned__(16)));
|
||||
|
||||
void
|
||||
foo (){
|
||||
int i,j,k;
|
||||
|
||||
for (k=0; k<N; k++) {
|
||||
for (i = 0; i < N; i++) {
|
||||
for (j = 0; j < N; j++) {
|
||||
image[k+i][j][i] = j+i+k;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
int main (void)
|
||||
{
|
||||
check_vect ();
|
||||
int i, j, k;
|
||||
|
||||
foo ();
|
||||
|
||||
for (k=0; k<N; k++) {
|
||||
for (i = 0; i < N; i++) {
|
||||
for (j = 0; j < N; j++) {
|
||||
if (image[k+i][j][i] != j+i+k)
|
||||
abort ();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "strided access in outer loop." 1 "vect" } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
41
gcc/testsuite/gcc.dg/vect/vect-outer-2c.c
Normal file
|
@ -0,0 +1,41 @@
|
|||
/* { dg-require-effective-target vect_float } */
|
||||
#include <stdarg.h>
|
||||
#include "tree-vect.h"
|
||||
|
||||
#define N 40
|
||||
float image[2*N][2*N][N] __attribute__ ((__aligned__(16)));
|
||||
|
||||
void
|
||||
foo (){
|
||||
int i,j,k;
|
||||
|
||||
for (k=0; k<N; k++) {
|
||||
for (i = 0; i < N; i++) {
|
||||
for (j = 0; j < N; j+=2) {
|
||||
image[k][j][i] = j+i+k;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
int main (void)
|
||||
{
|
||||
check_vect ();
|
||||
int i, j, k;
|
||||
|
||||
foo ();
|
||||
|
||||
for (k=0; k<N; k++) {
|
||||
for (i = 0; i < N; i++) {
|
||||
for (j = 0; j < N; j+=2) {
|
||||
if (image[k][j][i] != j+i+k)
|
||||
abort ();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
41
gcc/testsuite/gcc.dg/vect/vect-outer-2d.c
Normal file
|
@ -0,0 +1,41 @@
|
|||
/* { dg-require-effective-target vect_float } */
|
||||
#include <stdarg.h>
|
||||
#include "tree-vect.h"
|
||||
|
||||
#define N 40
|
||||
float image[N][N][N+1] __attribute__ ((__aligned__(16)));
|
||||
|
||||
void
|
||||
foo (){
|
||||
int i,j,k;
|
||||
|
||||
for (k=0; k<N; k++) {
|
||||
for (i = 0; i < N; i++) {
|
||||
for (j = 0; j < i+1; j++) {
|
||||
image[k][j][i] = j+i+k;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
int main (void)
|
||||
{
|
||||
check_vect ();
|
||||
int i, j, k;
|
||||
|
||||
foo ();
|
||||
|
||||
for (k=0; k<N; k++) {
|
||||
for (i = 0; i < N; i++) {
|
||||
for (j = 0; j < i+1; j++) {
|
||||
if (image[k][j][i] != j+i+k)
|
||||
abort ();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 0 "vect" } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
52
gcc/testsuite/gcc.dg/vect/vect-outer-3.c
Normal file
|
@ -0,0 +1,52 @@
|
|||
/* { dg-require-effective-target vect_float } */
|
||||
#include <stdarg.h>
|
||||
#include "tree-vect.h"
|
||||
|
||||
#define N 40
|
||||
float image[N][N] __attribute__ ((__aligned__(16)));
|
||||
float out[N];
|
||||
|
||||
/* Outer-loop vectorization. */
|
||||
|
||||
void
|
||||
foo (){
|
||||
int i,j;
|
||||
float diff;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
for (j = 0; j < N; j++) {
|
||||
diff += image[j][i];
|
||||
}
|
||||
out[i]=diff;
|
||||
}
|
||||
}
|
||||
|
||||
int main (void)
|
||||
{
|
||||
check_vect ();
|
||||
int i, j;
|
||||
float diff;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
for (j = 0; j < N; j++) {
|
||||
image[i][j]=i+j;
|
||||
}
|
||||
}
|
||||
|
||||
foo ();
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
for (j = 0; j < N; j++) {
|
||||
diff += image[j][i];
|
||||
}
|
||||
if (out[i] != diff)
|
||||
abort ();
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
53
gcc/testsuite/gcc.dg/vect/vect-outer-3a.c
Normal file
|
@ -0,0 +1,53 @@
|
|||
/* { dg-require-effective-target vect_float } */
|
||||
#include <stdarg.h>
|
||||
#include "tree-vect.h"
|
||||
|
||||
#define N 40
|
||||
float image[N][N+1] __attribute__ ((__aligned__(16)));
|
||||
float out[N];
|
||||
|
||||
/* Outer-loop vectorization with misaligned accesses in the inner-loop. */
|
||||
|
||||
void
|
||||
foo (){
|
||||
int i,j;
|
||||
float diff;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
for (j = 0; j < N; j++) {
|
||||
diff += image[j][i];
|
||||
}
|
||||
out[i]=diff;
|
||||
}
|
||||
}
|
||||
|
||||
int main (void)
|
||||
{
|
||||
check_vect ();
|
||||
int i, j;
|
||||
float diff;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
for (j = 0; j < N; j++) {
|
||||
image[i][j]=i+j;
|
||||
}
|
||||
}
|
||||
|
||||
foo ();
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
for (j = 0; j < N; j++) {
|
||||
diff += image[j][i];
|
||||
}
|
||||
if (out[i] != diff)
|
||||
abort ();
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail vect_no_align } } } */
|
||||
/* { dg-final { scan-tree-dump-times "step doesn't divide the vector-size" 2 "vect" } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
53
gcc/testsuite/gcc.dg/vect/vect-outer-3b.c
Normal file
|
@ -0,0 +1,53 @@
|
|||
/* { dg-require-effective-target vect_float } */
|
||||
#include <stdarg.h>
|
||||
#include "tree-vect.h"
|
||||
|
||||
#define N 40
|
||||
float image[N][N] __attribute__ ((__aligned__(16)));
|
||||
float out[N];
|
||||
|
||||
/* Outer-loop vectorization with non-consecutive access. Not vectorized yet. */
|
||||
|
||||
void
|
||||
foo (){
|
||||
int i,j;
|
||||
float diff;
|
||||
|
||||
for (i = 0; i < N/2; i++) {
|
||||
diff = 0;
|
||||
for (j = 0; j < N; j++) {
|
||||
diff += image[j][2*i];
|
||||
}
|
||||
out[i]=diff;
|
||||
}
|
||||
}
|
||||
|
||||
int main (void)
|
||||
{
|
||||
check_vect ();
|
||||
int i, j;
|
||||
float diff;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
for (j = 0; j < N; j++) {
|
||||
image[i][j]=i+j;
|
||||
}
|
||||
}
|
||||
|
||||
foo ();
|
||||
|
||||
for (i = 0; i < N/2; i++) {
|
||||
diff = 0;
|
||||
for (j = 0; j < N; j++) {
|
||||
diff += image[j][2*i];
|
||||
}
|
||||
if (out[i] != diff)
|
||||
abort ();
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
|
||||
/* { dg-final { scan-tree-dump-times "strided access in outer loop" 2 "vect" } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
52
gcc/testsuite/gcc.dg/vect/vect-outer-3c.c
Normal file
|
@ -0,0 +1,52 @@
|
|||
/* { dg-require-effective-target vect_float } */
|
||||
#include <stdarg.h>
|
||||
#include "tree-vect.h"
|
||||
|
||||
#define N 40
|
||||
float image[N][N+1] __attribute__ ((__aligned__(16)));
|
||||
float out[N];
|
||||
|
||||
/* Outer-loop vectorization. */
|
||||
|
||||
void
|
||||
foo (){
|
||||
int i,j;
|
||||
float diff;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
for (j = 0; j < N; j+=4) {
|
||||
diff += image[j][i];
|
||||
}
|
||||
out[i]=diff;
|
||||
}
|
||||
}
|
||||
|
||||
int main (void)
|
||||
{
|
||||
check_vect ();
|
||||
int i, j;
|
||||
float diff;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
for (j = 0; j < N; j++) {
|
||||
image[i][j]=i+j;
|
||||
}
|
||||
}
|
||||
|
||||
foo ();
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
for (j = 0; j < N; j+=4) {
|
||||
diff += image[j][i];
|
||||
}
|
||||
if (out[i] != diff)
|
||||
abort ();
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
55
gcc/testsuite/gcc.dg/vect/vect-outer-4.c
Normal file
|
@ -0,0 +1,55 @@
|
|||
/* { dg-require-effective-target vect_float } */
|
||||
|
||||
#include <stdarg.h>
|
||||
#include "tree-vect.h"
|
||||
|
||||
#define N 40
|
||||
#define M 128
|
||||
float in[N+M];
|
||||
float coeff[M];
|
||||
float out[N];
|
||||
|
||||
/* Outer-loop vectorization. */
|
||||
|
||||
void
|
||||
foo (){
|
||||
int i,j;
|
||||
float diff;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
for (j = 0; j < M; j+=4) {
|
||||
diff += in[j+i]*coeff[j];
|
||||
}
|
||||
out[i]=diff;
|
||||
}
|
||||
}
|
||||
|
||||
int main (void)
|
||||
{
|
||||
check_vect ();
|
||||
int i, j;
|
||||
float diff;
|
||||
|
||||
for (i = 0; i < M; i++)
|
||||
coeff[i] = i;
|
||||
for (i = 0; i < N+M; i++)
|
||||
in[i] = i;
|
||||
|
||||
foo ();
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
for (j = 0; j < M; j+=4) {
|
||||
diff += in[j+i]*coeff[j];
|
||||
}
|
||||
if (out[i] != diff)
|
||||
abort ();
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
|
||||
/* { dg-final { scan-tree-dump-times "zero step in outer loop." 1 "vect" } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
31
gcc/testsuite/gcc.dg/vect/vect-outer-4a.c
Normal file
|
@ -0,0 +1,31 @@
|
|||
/* { dg-do compile } */
|
||||
|
||||
#define N 40
|
||||
#define M 128
|
||||
signed short in[N+M];
|
||||
signed short coeff[M];
|
||||
signed short out[N];
|
||||
|
||||
/* Outer-loop vectorization.
|
||||
Currently not vectorized because of multiple-data-types in the inner-loop. */
|
||||
|
||||
void
|
||||
foo (){
|
||||
int i,j;
|
||||
int diff;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
for (j = 0; j < M; j+=8) {
|
||||
diff += in[j+i]*coeff[j];
|
||||
}
|
||||
out[i]=diff;
|
||||
}
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
|
||||
/* FORNOW. not vectorized until we support 0-stride accesses like coeff[j]. should be:
|
||||
{ scan-tree-dump-not "multiple types in nested loop." "vect" { xfail *-*-* } } } */
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "zero step in outer loop." 1 "vect" } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
31
gcc/testsuite/gcc.dg/vect/vect-outer-4b.c
Normal file
|
@ -0,0 +1,31 @@
|
|||
/* { dg-do compile } */
|
||||
|
||||
#define N 40
|
||||
#define M 128
|
||||
signed short in[N+M];
|
||||
signed short coeff[M];
|
||||
int out[N];
|
||||
|
||||
/* Outer-loop vectorization.
|
||||
Currently not vectorized because of multiple-data-types in the inner-loop. */
|
||||
|
||||
void
|
||||
foo (){
|
||||
int i,j;
|
||||
int diff;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
for (j = 0; j < M; j+=8) {
|
||||
diff += in[j+i]*coeff[j];
|
||||
}
|
||||
out[i]=diff;
|
||||
}
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
|
||||
/* FORNOW. not vectorized until we support 0-stride accesses like coeff[j]. should be:
|
||||
{ scan-tree-dump-not "multiple types in nested loop." "vect" { xfail *-*-* } } } */
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "zero step in outer loop." 1 "vect" } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
27
gcc/testsuite/gcc.dg/vect/vect-outer-4c.c
Normal file
|
@ -0,0 +1,27 @@
|
|||
/* { dg-do compile } */
|
||||
|
||||
#define N 40
|
||||
#define M 128
|
||||
unsigned short in[N+M];
|
||||
unsigned short coeff[M];
|
||||
unsigned int out[N];
|
||||
|
||||
/* Outer-loop vectorization. */
|
||||
|
||||
void
|
||||
foo (){
|
||||
int i,j;
|
||||
unsigned short diff;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
for (j = 0; j < M; j+=8) {
|
||||
diff += in[j+i]*coeff[j];
|
||||
}
|
||||
out[i]=diff;
|
||||
}
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { target vect_short_mult } } } */
|
||||
/* { dg-final { scan-tree-dump-times "zero step in outer loop." 1 "vect" } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
51
gcc/testsuite/gcc.dg/vect/vect-outer-4d.c
Normal file
|
@ -0,0 +1,51 @@
|
|||
/* { dg-require-effective-target vect_float } */
|
||||
|
||||
#include <stdarg.h>
|
||||
#include "tree-vect.h"
|
||||
|
||||
#define N 40
|
||||
#define M 128
|
||||
float in[N+M];
|
||||
float out[N];
|
||||
|
||||
/* Outer-loop vectorization. */
|
||||
|
||||
void
|
||||
foo (){
|
||||
int i,j;
|
||||
float diff;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
for (j = 0; j < M; j+=4) {
|
||||
diff += in[j+i];
|
||||
}
|
||||
out[i]=diff;
|
||||
}
|
||||
}
|
||||
|
||||
int main (void)
|
||||
{
|
||||
check_vect ();
|
||||
int i, j;
|
||||
float diff;
|
||||
|
||||
for (i = 0; i < N; i++)
|
||||
in[i] = i;
|
||||
|
||||
foo ();
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
for (j = 0; j < M; j+=4) {
|
||||
diff += in[j+i];
|
||||
}
|
||||
if (out[i] != diff)
|
||||
abort ();
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
27
gcc/testsuite/gcc.dg/vect/vect-outer-4e.c
Normal file
|
@ -0,0 +1,27 @@
|
|||
/* { dg-do compile } */
|
||||
|
||||
#define N 40
|
||||
#define M 128
|
||||
unsigned int in[N+M];
|
||||
unsigned short out[N];
|
||||
|
||||
/* Outer-loop vectorization. */
|
||||
|
||||
void
|
||||
foo (){
|
||||
int i,j;
|
||||
unsigned int diff;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
for (j = 0; j < M; j+=8) {
|
||||
diff += in[j+i];
|
||||
}
|
||||
out[i]=(unsigned short)diff;
|
||||
}
|
||||
|
||||
return;
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
70
gcc/testsuite/gcc.dg/vect/vect-outer-4f.c
Normal file
|
@ -0,0 +1,70 @@
|
|||
/* { dg-require-effective-target vect_int } */
|
||||
#include <stdarg.h>
|
||||
#include "tree-vect.h"
|
||||
|
||||
#define N 40
|
||||
#define M 128
|
||||
unsigned short in[N+M];
|
||||
unsigned int out[N];
|
||||
unsigned char arr[N];
|
||||
|
||||
/* Outer-loop vectorization. */
|
||||
/* Not vectorized due to multiple-types in the inner-loop. */
|
||||
|
||||
unsigned int
|
||||
foo (){
|
||||
int i,j;
|
||||
unsigned int diff;
|
||||
unsigned int s=0;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
arr[i] = 3;
|
||||
diff = 0;
|
||||
for (j = 0; j < M; j+=8) {
|
||||
diff += in[j+i];
|
||||
}
|
||||
s+=diff;
|
||||
}
|
||||
return s;
|
||||
}
|
||||
|
||||
unsigned int
|
||||
bar (int i, unsigned int diff, unsigned short *in)
|
||||
{
|
||||
int j;
|
||||
for (j = 0; j < M; j+=8) {
|
||||
diff += in[j+i];
|
||||
}
|
||||
return diff;
|
||||
}
|
||||
|
||||
int main (void)
|
||||
{
|
||||
int i, j;
|
||||
unsigned int diff;
|
||||
unsigned int s=0,sum=0;
|
||||
|
||||
check_vect ();
|
||||
|
||||
for (i = 0; i < N+M; i++) {
|
||||
in[i] = i;
|
||||
}
|
||||
|
||||
sum=foo ();
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
arr[i] = 3;
|
||||
diff = 0;
|
||||
diff = bar (i, diff, in);
|
||||
s += diff;
|
||||
}
|
||||
|
||||
if (s != sum)
|
||||
abort ();
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
|
||||
/* { dg-final { scan-tree-dump-times "vect_recog_widen_sum_pattern: not allowed" 1 "vect" } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
70
gcc/testsuite/gcc.dg/vect/vect-outer-4g.c
Normal file
|
@ -0,0 +1,70 @@
|
|||
/* { dg-require-effective-target vect_int } */
|
||||
#include <stdarg.h>
|
||||
#include "tree-vect.h"
|
||||
|
||||
#define N 40
|
||||
#define M 128
|
||||
unsigned short in[N+M];
|
||||
unsigned int out[N];
|
||||
unsigned char arr[N];
|
||||
|
||||
/* Outer-loop vectorization. */
|
||||
/* Not vectorized due to multiple-types in the inner-loop. */
|
||||
|
||||
unsigned int
|
||||
foo (){
|
||||
int i,j;
|
||||
unsigned int diff;
|
||||
unsigned int s=0;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
arr[i] = 3;
|
||||
diff = 0;
|
||||
for (j = 0; j < M; j+=8) {
|
||||
diff += in[j+i];
|
||||
}
|
||||
s+=diff;
|
||||
}
|
||||
return s;
|
||||
}
|
||||
|
||||
unsigned int
|
||||
bar (int i, unsigned int diff, unsigned short *in)
|
||||
{
|
||||
int j;
|
||||
for (j = 0; j < M; j+=8) {
|
||||
diff += in[j+i];
|
||||
}
|
||||
return diff;
|
||||
}
|
||||
|
||||
int main (void)
|
||||
{
|
||||
int i, j;
|
||||
unsigned int diff;
|
||||
unsigned int s=0,sum=0;
|
||||
|
||||
check_vect ();
|
||||
|
||||
for (i = 0; i < N+M; i++) {
|
||||
in[i] = i;
|
||||
}
|
||||
|
||||
sum=foo ();
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
arr[i] = 3;
|
||||
diff = 0;
|
||||
diff = bar (i, diff, in);
|
||||
s += diff;
|
||||
}
|
||||
|
||||
if (s != sum)
|
||||
abort ();
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
|
||||
/* { dg-final { scan-tree-dump-times "vect_recog_widen_sum_pattern: not allowed" 1 "vect" } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
28
gcc/testsuite/gcc.dg/vect/vect-outer-4i.c
Normal file
|
@ -0,0 +1,28 @@
|
|||
/* { dg-do compile } */
|
||||
|
||||
#define N 40
|
||||
#define M 128
|
||||
unsigned char in[N+M];
|
||||
unsigned short out[N];
|
||||
|
||||
/* Outer-loop vectorization. */
|
||||
/* Not vectorized due to multiple-types in the inner-loop. */
|
||||
|
||||
unsigned short
|
||||
foo (){
|
||||
int i,j;
|
||||
unsigned short diff;
|
||||
unsigned short s=0;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
for (j = 0; j < M; j+=8) {
|
||||
diff += in[j+i];
|
||||
}
|
||||
s+=diff;
|
||||
}
|
||||
return s;
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
26
gcc/testsuite/gcc.dg/vect/vect-outer-4j.c
Normal file
|
@ -0,0 +1,26 @@
|
|||
/* { dg-do compile } */
|
||||
|
||||
#define N 40
|
||||
#define M 128
|
||||
unsigned char in[N+M];
|
||||
unsigned short out[N];
|
||||
|
||||
/* Outer-loop vectorization. */
|
||||
/* Not vectorized due to multiple-types in the inner-loop. */
|
||||
|
||||
void
|
||||
foo (){
|
||||
int i,j;
|
||||
unsigned short diff;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
for (j = 0; j < M; j+=8) {
|
||||
diff += in[j+i];
|
||||
}
|
||||
out[i]=diff;
|
||||
}
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
70
gcc/testsuite/gcc.dg/vect/vect-outer-4k.c
Normal file
|
@ -0,0 +1,70 @@
|
|||
/* { dg-require-effective-target vect_int } */
|
||||
#include <stdarg.h>
|
||||
#include "tree-vect.h"
|
||||
|
||||
#define N 40
|
||||
#define M 128
|
||||
unsigned short in[N+M];
|
||||
unsigned int out[N];
|
||||
unsigned char arr[N];
|
||||
|
||||
/* Outer-loop vectorization. */
|
||||
/* Not vectorized due to multiple-types in the inner-loop. */
|
||||
|
||||
unsigned int
|
||||
foo (){
|
||||
int i,j;
|
||||
unsigned int diff;
|
||||
unsigned int s=0;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
arr[i] = 3;
|
||||
diff = 0;
|
||||
for (j = 0; j < M; j+=8) {
|
||||
diff += in[j+i];
|
||||
}
|
||||
s+=diff;
|
||||
}
|
||||
return s;
|
||||
}
|
||||
|
||||
unsigned int
|
||||
bar (int i, unsigned int diff, unsigned short *in)
|
||||
{
|
||||
int j;
|
||||
for (j = 0; j < M; j+=8) {
|
||||
diff += in[j+i];
|
||||
}
|
||||
return diff;
|
||||
}
|
||||
|
||||
int main (void)
|
||||
{
|
||||
int i, j;
|
||||
unsigned int diff;
|
||||
unsigned int s=0,sum=0;
|
||||
|
||||
check_vect ();
|
||||
|
||||
for (i = 0; i < N+M; i++) {
|
||||
in[i] = i;
|
||||
}
|
||||
|
||||
sum=foo ();
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
arr[i] = 3;
|
||||
diff = 0;
|
||||
diff = bar (i, diff, in);
|
||||
s += diff;
|
||||
}
|
||||
|
||||
if (s != sum)
|
||||
abort ();
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
|
||||
/* { dg-final { scan-tree-dump-times "vect_recog_widen_sum_pattern: not allowed" 1 "vect" } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
70
gcc/testsuite/gcc.dg/vect/vect-outer-4l.c
Normal file
|
@ -0,0 +1,70 @@
|
|||
/* { dg-require-effective-target vect_int } */
|
||||
#include <stdarg.h>
|
||||
#include "tree-vect.h"
|
||||
|
||||
#define N 40
|
||||
#define M 128
|
||||
unsigned short in[N+M];
|
||||
unsigned int out[N];
|
||||
unsigned char arr[N];
|
||||
|
||||
/* Outer-loop vectorization. */
|
||||
/* Not vectorized due to multiple-types in the inner-loop. */
|
||||
|
||||
unsigned int
|
||||
foo (){
|
||||
int i,j;
|
||||
unsigned int diff;
|
||||
unsigned int s=0;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
arr[i] = 3;
|
||||
diff = 0;
|
||||
for (j = 0; j < M; j+=8) {
|
||||
diff += in[j+i];
|
||||
}
|
||||
s+=diff;
|
||||
}
|
||||
return s;
|
||||
}
|
||||
|
||||
unsigned int
|
||||
bar (int i, unsigned int diff, unsigned short *in)
|
||||
{
|
||||
int j;
|
||||
for (j = 0; j < M; j+=8) {
|
||||
diff += in[j+i];
|
||||
}
|
||||
return diff;
|
||||
}
|
||||
|
||||
int main (void)
|
||||
{
|
||||
int i, j;
|
||||
unsigned int diff;
|
||||
unsigned int s=0,sum=0;
|
||||
|
||||
check_vect ();
|
||||
|
||||
for (i = 0; i < N+M; i++) {
|
||||
in[i] = i;
|
||||
}
|
||||
|
||||
sum=foo ();
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
arr[i] = 3;
|
||||
diff = 0;
|
||||
diff = bar (i, diff, in);
|
||||
s += diff;
|
||||
}
|
||||
|
||||
if (s != sum)
|
||||
abort ();
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
|
||||
/* { dg-final { scan-tree-dump-times "vect_recog_widen_sum_pattern: not allowed" 1 "vect" } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
58
gcc/testsuite/gcc.dg/vect/vect-outer-4m.c
Normal file
|
@ -0,0 +1,58 @@
|
|||
/* { dg-require-effective-target vect_int } */
|
||||
#include <stdarg.h>
|
||||
#include "tree-vect.h"
|
||||
|
||||
#define N 40
|
||||
#define M 128
|
||||
unsigned short in[N+M];
|
||||
unsigned int out[N];
|
||||
|
||||
/* Outer-loop vectorization. */
|
||||
/* Not vectorized due to multiple-types in the inner-loop. */
|
||||
|
||||
unsigned int
|
||||
foo (){
|
||||
int i,j;
|
||||
unsigned int diff;
|
||||
unsigned int s=0;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
for (j = 0; j < M; j+=8) {
|
||||
diff += in[j+i];
|
||||
}
|
||||
s+=((unsigned short)diff>>3);
|
||||
}
|
||||
return s;
|
||||
}
|
||||
|
||||
int main (void)
|
||||
{
|
||||
int i, j;
|
||||
unsigned int diff;
|
||||
unsigned int s=0,sum=0;
|
||||
|
||||
check_vect ();
|
||||
|
||||
for (i = 0; i < N+M; i++) {
|
||||
in[i] = i;
|
||||
}
|
||||
|
||||
sum=foo ();
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
for (j = 0; j < M; j+=8) {
|
||||
diff += in[j+i];
|
||||
}
|
||||
s += ((unsigned short)diff>>3);
|
||||
}
|
||||
|
||||
if (s != sum)
|
||||
abort ();
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
83
gcc/testsuite/gcc.dg/vect/vect-outer-5.c
Normal file
|
@ -0,0 +1,83 @@
|
|||
/* { dg-require-effective-target vect_int } */
|
||||
|
||||
#include <stdarg.h>
|
||||
#include <signal.h>
|
||||
#include "tree-vect.h"
|
||||
|
||||
#define N 64
|
||||
#define MAX 42
|
||||
|
||||
extern void abort(void);
|
||||
|
||||
int main1 ()
|
||||
{
|
||||
float A[N] __attribute__ ((__aligned__(16)));
|
||||
float B[N] __attribute__ ((__aligned__(16)));
|
||||
float C[N] __attribute__ ((__aligned__(16)));
|
||||
float D[N] __attribute__ ((__aligned__(16)));
|
||||
float s;
|
||||
|
||||
int i, j;
|
||||
|
||||
for (i = 0; i < N; i++)
|
||||
{
|
||||
A[i] = i;
|
||||
B[i] = i;
|
||||
C[i] = i;
|
||||
D[i] = i;
|
||||
}
|
||||
|
||||
/* Outer-loop 1: Vectorizable with respect to dependence distance. */
|
||||
for (i = 0; i < N-20; i++)
|
||||
{
|
||||
s = 0;
|
||||
for (j=0; j<N; j+=4)
|
||||
s += C[j];
|
||||
A[i] = A[i+20] + s;
|
||||
}
|
||||
|
||||
/* check results: */
|
||||
for (i = 0; i < N-20; i++)
|
||||
{
|
||||
s = 0;
|
||||
for (j=0; j<N; j+=4)
|
||||
s += C[j];
|
||||
if (A[i] != D[i+20] + s)
|
||||
abort ();
|
||||
}
|
||||
|
||||
/* Outer-loop 2: Not vectorizable because of dependence distance. */
|
||||
for (i = 0; i < 4; i++)
|
||||
{
|
||||
s = 0;
|
||||
for (j=0; j<N; j+=4)
|
||||
s += C[j];
|
||||
B[i] = B[i+3] + s;
|
||||
}
|
||||
|
||||
/* check results: */
|
||||
for (i = 0; i < 4; i++)
|
||||
{
|
||||
s = 0;
|
||||
for (j=0; j<N; j+=4)
|
||||
s += C[j];
|
||||
if (B[i] != D[i+3] + s)
|
||||
abort ();
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
int main ()
|
||||
{
|
||||
check_vect ();
|
||||
return main1();
|
||||
}
|
||||
|
||||
/* NOTE: We temporarily xfail the following check until versioning for
|
||||
aliasing is fixed to avoid versioning when the dependence distance
|
||||
is known. */
|
||||
/* { dg-final { scan-tree-dump-times "not vectorized: possible dependence between data-refs" 1 "vect" { xfail *-*-* } } } */
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
|
||||
/* { dg-final { scan-tree-dump-times "zero step in outer loop." 1 "vect" } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
65
gcc/testsuite/gcc.dg/vect/vect-outer-6.c
Normal file
|
@ -0,0 +1,65 @@
|
|||
/* { dg-require-effective-target vect_int } */
|
||||
|
||||
#include <stdarg.h>
|
||||
#include <signal.h>
|
||||
#include "tree-vect.h"
|
||||
|
||||
#define N 64
|
||||
#define MAX 42
|
||||
|
||||
float A[N] __attribute__ ((__aligned__(16)));
|
||||
float B[N] __attribute__ ((__aligned__(16)));
|
||||
float C[N] __attribute__ ((__aligned__(16)));
|
||||
float D[N] __attribute__ ((__aligned__(16)));
|
||||
extern void abort(void);
|
||||
|
||||
int main1 ()
|
||||
{
|
||||
float s;
|
||||
|
||||
int i, j;
|
||||
|
||||
for (i = 0; i < 8; i++)
|
||||
{
|
||||
s = 0;
|
||||
for (j=0; j<8; j+=4)
|
||||
s += C[j];
|
||||
A[i] = s;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
int main ()
|
||||
{
|
||||
int i,j;
|
||||
float s;
|
||||
|
||||
check_vect ();
|
||||
|
||||
for (i = 0; i < N; i++)
|
||||
{
|
||||
A[i] = i;
|
||||
B[i] = i;
|
||||
C[i] = i;
|
||||
D[i] = i;
|
||||
}
|
||||
|
||||
main1();
|
||||
|
||||
/* check results: */
|
||||
for (i = 0; i < 8; i++)
|
||||
{
|
||||
s = 0;
|
||||
for (j=0; j<8; j+=4)
|
||||
s += C[j];
|
||||
if (A[i] != s)
|
||||
abort ();
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
|
||||
/* { dg-final { scan-tree-dump-times "zero step in outer loop." 1 "vect" } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
80
gcc/testsuite/gcc.dg/vect/vect-outer-fir-lb.c
Normal file
|
@ -0,0 +1,80 @@
|
|||
/* { dg-require-effective-target vect_float } */
|
||||
|
||||
#include <stdarg.h>
|
||||
#include "tree-vect.h"
|
||||
|
||||
#define N 40
|
||||
#define M 64
|
||||
float in[N+M];
|
||||
float coeff[M];
|
||||
float out[N];
|
||||
float fir_out[N];
|
||||
|
||||
/* Should be vectorized. Fixed misalignment in the inner-loop. */
|
||||
/* Currently not vectorized because the loop-count for the inner-loop
|
||||
has a maybe_zero component. Will be fixed when we incorporate the
|
||||
"cond_expr in rhs" patch. */
|
||||
void foo (){
|
||||
int i,j,k;
|
||||
float diff;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
out[i] = 0;
|
||||
}
|
||||
|
||||
for (k = 0; k < 4; k++) {
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
j = k;
|
||||
|
||||
do {
|
||||
diff += in[j+i]*coeff[j];
|
||||
j+=4;
|
||||
} while (j < M);
|
||||
|
||||
out[i] += diff;
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
/* Vectorized. Changing misalignment in the inner-loop. */
|
||||
void fir (){
|
||||
int i,j,k;
|
||||
float diff;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
for (j = 0; j < M; j++) {
|
||||
diff += in[j+i]*coeff[j];
|
||||
}
|
||||
fir_out[i] = diff;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
int main (void)
|
||||
{
|
||||
check_vect ();
|
||||
int i, j;
|
||||
float diff;
|
||||
|
||||
for (i = 0; i < M; i++)
|
||||
coeff[i] = i;
|
||||
for (i = 0; i < N+M; i++)
|
||||
in[i] = i;
|
||||
|
||||
foo ();
|
||||
fir ();
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
if (out[i] != fir_out[i])
|
||||
abort ();
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 2 "vect" { xfail *-*-* } } } */
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail vect_no_align } } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
77
gcc/testsuite/gcc.dg/vect/vect-outer-fir.c
Normal file
|
@ -0,0 +1,77 @@
|
|||
/* { dg-require-effective-target vect_float } */
|
||||
|
||||
#include <stdarg.h>
|
||||
#include "tree-vect.h"
|
||||
|
||||
#define N 40
|
||||
#define M 128
|
||||
float in[N+M];
|
||||
float coeff[M];
|
||||
float out[N];
|
||||
float fir_out[N];
|
||||
|
||||
/* Should be vectorized. Fixed misalignment in the inner-loop. */
|
||||
/* Currently not vectorized because we get too many BBs in the inner-loop,
|
||||
because the compiler doesn't realize that the inner-loop executes at
|
||||
least once (because k<4), and so there's no need to create guard code
|
||||
to skip the inner-loop in case it doesn't execute. */
|
||||
void foo (){
|
||||
int i,j,k;
|
||||
float diff;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
out[i] = 0;
|
||||
}
|
||||
|
||||
for (k = 0; k < 4; k++) {
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
for (j = k; j < M; j+=4) {
|
||||
diff += in[j+i]*coeff[j];
|
||||
}
|
||||
out[i] += diff;
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
/* Vectorized. Changing misalignment in the inner-loop. */
|
||||
void fir (){
|
||||
int i,j,k;
|
||||
float diff;
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
diff = 0;
|
||||
for (j = 0; j < M; j++) {
|
||||
diff += in[j+i]*coeff[j];
|
||||
}
|
||||
fir_out[i] = diff;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
int main (void)
|
||||
{
|
||||
check_vect ();
|
||||
int i, j;
|
||||
float diff;
|
||||
|
||||
for (i = 0; i < M; i++)
|
||||
coeff[i] = i;
|
||||
for (i = 0; i < N+M; i++)
|
||||
in[i] = i;
|
||||
|
||||
foo ();
|
||||
fir ();
|
||||
|
||||
for (i = 0; i < N; i++) {
|
||||
if (out[i] != fir_out[i])
|
||||
abort ();
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 2 "vect" { xfail *-*-* } } } */
|
||||
/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail vect_no_align } } } */
|
||||
/* { dg-final { cleanup-tree-dump "vect" } } */
|
|
@ -489,7 +489,7 @@ dump_ddrs (FILE *file, VEC (ddr_p, heap) *ddrs)
|
|||
/* Expresses EXP as VAR + OFF, where off is a constant. The type of OFF
|
||||
will be ssizetype. */
|
||||
|
||||
static void
|
||||
void
|
||||
split_constant_offset (tree exp, tree *var, tree *off)
|
||||
{
|
||||
tree type = TREE_TYPE (exp), otype;
|
||||
|
|
|
@ -388,4 +388,7 @@ index_in_loop_nest (int var, VEC (loop_p, heap) *loop_nest)
|
|||
/* In lambda-code.c */
|
||||
bool lambda_transform_legal_p (lambda_trans_matrix, int, VEC (ddr_p, heap) *);
|
||||
|
||||
/* In tree-data-refs.c */
|
||||
void split_constant_offset (tree , tree *, tree *);
|
||||
|
||||
#endif /* GCC_TREE_DATA_REF_H */
|
||||
|
|
|
@ -1279,6 +1279,8 @@ vect_compute_data_ref_alignment (struct data_reference *dr)
|
|||
{
|
||||
tree stmt = DR_STMT (dr);
|
||||
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
|
||||
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
|
||||
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
|
||||
tree ref = DR_REF (dr);
|
||||
tree vectype;
|
||||
tree base, base_addr;
|
||||
|
@ -1295,13 +1297,42 @@ vect_compute_data_ref_alignment (struct data_reference *dr)
|
|||
misalign = DR_INIT (dr);
|
||||
aligned_to = DR_ALIGNED_TO (dr);
|
||||
base_addr = DR_BASE_ADDRESS (dr);
|
||||
|
||||
/* In case the dataref is in an inner-loop of the loop that is being
|
||||
vectorized (LOOP), we use the base and misalignment information
|
||||
relative to the outer-loop (LOOP). This is ok only if the misalignment
|
||||
stays the same throughout the execution of the inner-loop, which is why
|
||||
we have to check that the stride of the dataref in the inner-loop evenly
|
||||
divides by the vector size. */
|
||||
if (nested_in_vect_loop_p (loop, stmt))
|
||||
{
|
||||
tree step = DR_STEP (dr);
|
||||
HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
|
||||
|
||||
if (dr_step % UNITS_PER_SIMD_WORD == 0)
|
||||
{
|
||||
if (vect_print_dump_info (REPORT_ALIGNMENT))
|
||||
fprintf (vect_dump, "inner step divides the vector-size.");
|
||||
misalign = STMT_VINFO_DR_INIT (stmt_info);
|
||||
aligned_to = STMT_VINFO_DR_ALIGNED_TO (stmt_info);
|
||||
base_addr = STMT_VINFO_DR_BASE_ADDRESS (stmt_info);
|
||||
}
|
||||
else
|
||||
{
|
||||
if (vect_print_dump_info (REPORT_ALIGNMENT))
|
||||
fprintf (vect_dump, "inner step doesn't divide the vector-size.");
|
||||
misalign = NULL_TREE;
|
||||
}
|
||||
}
|
||||
|
||||
base = build_fold_indirect_ref (base_addr);
|
||||
vectype = STMT_VINFO_VECTYPE (stmt_info);
|
||||
alignment = ssize_int (TYPE_ALIGN (vectype)/BITS_PER_UNIT);
|
||||
|
||||
if (tree_int_cst_compare (aligned_to, alignment) < 0)
|
||||
if ((aligned_to && tree_int_cst_compare (aligned_to, alignment) < 0)
|
||||
|| !misalign)
|
||||
{
|
||||
if (vect_print_dump_info (REPORT_DETAILS))
|
||||
if (vect_print_dump_info (REPORT_ALIGNMENT))
|
||||
{
|
||||
fprintf (vect_dump, "Unknown alignment for access: ");
|
||||
print_generic_expr (vect_dump, base, TDF_SLIM);
|
||||
|
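The nested-loop branch in the hunk above accepts the outer-loop alignment data only when the inner-loop step is a multiple of the vector size. The following minimal sketch illustrates why that condition keeps the misalignment fixed across the inner loop. It is not GCC code: `misalignment` and `misalignment_is_fixed` are hypothetical helpers, addresses are modelled as plain integers, and the vector size is passed in as a byte count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of GCC): byte misalignment of ADDR
   relative to a vector size of VS bytes.  */
size_t
misalignment (size_t addr, size_t vs)
{
  assert (vs > 0);
  return addr % vs;
}

/* Hypothetical helper: returns 1 if ADDR + k*STEP has the same
   misalignment for every k in [0, NITERS), and 0 otherwise.  This
   models an inner loop that advances by STEP bytes per iteration.  */
int
misalignment_is_fixed (size_t addr, size_t step, size_t niters, size_t vs)
{
  size_t first = misalignment (addr, vs);
  size_t k;

  for (k = 1; k < niters; k++)
    if (misalignment (addr + k * step, vs) != first)
      return 0;
  return 1;
}
```

Assuming 4-byte floats and 16-byte vectors, an inner loop written as `j+=4` advances 16 bytes per iteration, so `misalignment_is_fixed (addr, 16, n, 16)` holds for any start address. An inner loop written as `j++` advances 4 bytes, and the check fails as soon as there are two or more iterations.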
@ -1980,20 +2011,39 @@ static bool
|
|||
vect_analyze_data_ref_access (struct data_reference *dr)
|
||||
{
|
||||
tree step = DR_STEP (dr);
|
||||
HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
|
||||
tree scalar_type = TREE_TYPE (DR_REF (dr));
|
||||
HOST_WIDE_INT type_size = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (scalar_type));
|
||||
tree stmt = DR_STMT (dr);
|
||||
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
|
||||
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
|
||||
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
|
||||
HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
|
||||
HOST_WIDE_INT stride;
|
||||
|
||||
/* Don't allow invariant accesses. */
|
||||
if (dr_step == 0)
|
||||
return false;
|
||||
|
||||
if (nested_in_vect_loop_p (loop, stmt))
|
||||
{
|
||||
/* For the rest of the analysis we use the outer-loop step. */
|
||||
step = STMT_VINFO_DR_STEP (stmt_info);
|
||||
dr_step = TREE_INT_CST_LOW (step);
|
||||
|
||||
if (dr_step == 0)
|
||||
{
|
||||
if (vect_print_dump_info (REPORT_ALIGNMENT))
|
||||
fprintf (vect_dump, "zero step in outer loop.");
|
||||
if (DR_IS_READ (dr))
|
||||
return true;
|
||||
else
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
/* For interleaving, STRIDE is STEP counted in elements, i.e., the size of the
|
||||
interleaving group (including gaps). */
|
||||
HOST_WIDE_INT stride = dr_step / type_size;
|
||||
|
||||
if (!step)
|
||||
{
|
||||
if (vect_print_dump_info (REPORT_DETAILS))
|
||||
fprintf (vect_dump, "bad data-ref access");
|
||||
return false;
|
||||
}
|
||||
stride = dr_step / type_size;
|
||||
|
||||
/* Consecutive? */
|
||||
if (!tree_int_cst_compare (step, TYPE_SIZE_UNIT (scalar_type)))
|
||||
|
@ -2003,6 +2053,13 @@ vect_analyze_data_ref_access (struct data_reference *dr)
|
|||
return true;
|
||||
}
|
||||
|
||||
if (nested_in_vect_loop_p (loop, stmt))
|
||||
{
|
||||
if (vect_print_dump_info (REPORT_ALIGNMENT))
|
||||
fprintf (vect_dump, "strided access in outer loop.");
|
||||
return false;
|
||||
}
|
||||
|
||||
/* Not consecutive access is possible only if it is a part of interleaving. */
|
||||
if (!DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt)))
|
||||
{
|
||||
|
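Read as a decision procedure, the nested-loop handling added to vect_analyze_data_ref_access can be summarized by the sketch below. It is a simplified model under stated assumptions, not the GCC implementation: `nested_access_ok` is a hypothetical name, steps are plain byte counts instead of trees, the initial zero-step rejection is taken as the hunk reads, and the interleaving analysis that the real function performs for non-nested accesses is left out.

```c
#include <assert.h>

/* Hypothetical model of the checks applied to a data-ref that sits in
   the inner loop of the loop being vectorized.  INNER_STEP and
   OUTER_STEP are byte steps relative to the inner and outer loop,
   TYPE_SIZE is the scalar element size in bytes, and IS_READ is
   nonzero for loads.  Returns 1 if the access is accepted.  */
int
nested_access_ok (long inner_step, long outer_step, long type_size,
                  int is_read)
{
  assert (type_size > 0);

  /* Invariant accesses in the enclosing loop are not allowed.  */
  if (inner_step == 0)
    return 0;

  /* Zero step in the outer loop: accepted for loads, rejected for
     stores.  */
  if (outer_step == 0)
    return is_read ? 1 : 0;

  /* Consecutive access relative to the outer loop.  */
  if (outer_step == type_size)
    return 1;

  /* Anything else is a strided access in the outer loop.  */
  return 0;
}
```

For example, the load `C[j]` in vect-outer-6.c has a 16-byte inner step (assuming 4-byte floats) and a zero outer step, so it corresponds to `nested_access_ok (16, 0, 4, 1)`, which is accepted; this is the case the "zero step in outer loop." dump message reports.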
@ -2231,6 +2288,7 @@ vect_analyze_data_refs (loop_vec_info loop_vinfo)
|
|||
tree stmt;
|
||||
stmt_vec_info stmt_info;
|
||||
basic_block bb;
|
||||
tree base, offset, init;
|
||||
|
||||
if (!dr || !DR_REF (dr))
|
||||
{
|
||||
|
@ -2238,36 +2296,13 @@ vect_analyze_data_refs (loop_vec_info loop_vinfo)
|
|||
fprintf (vect_dump, "not vectorized: unhandled data-ref ");
|
||||
return false;
|
||||
}
|
||||
|
||||
/* Update DR field in stmt_vec_info struct. */
|
||||
|
||||
stmt = DR_STMT (dr);
|
||||
stmt_info = vinfo_for_stmt (stmt);
|
||||
|
||||
/* If outer-loop vectorization: we don't yet support datarefs
|
||||
in the innermost loop. */
|
||||
bb = bb_for_stmt (stmt);
|
||||
if (bb->loop_father != LOOP_VINFO_LOOP (loop_vinfo))
|
||||
{
|
||||
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
|
||||
fprintf (vect_dump, "not vectorized: data-ref in nested loop");
|
||||
return false;
|
||||
}
|
||||
|
||||
if (STMT_VINFO_DATA_REF (stmt_info))
|
||||
{
|
||||
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
|
||||
{
|
||||
fprintf (vect_dump,
|
||||
"not vectorized: more than one data ref in stmt: ");
|
||||
print_generic_expr (vect_dump, stmt, TDF_SLIM);
|
||||
}
|
||||
return false;
|
||||
}
|
||||
STMT_VINFO_DATA_REF (stmt_info) = dr;
|
||||
|
||||
/* Check that analysis of the data-ref succeeded. */
|
||||
if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr) || !DR_INIT (dr)
|
||||
|| !DR_STEP (dr))
|
||||
|| !DR_STEP (dr))
|
||||
{
|
||||
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
|
||||
{
|
||||
|
@ -2294,7 +2329,127 @@ vect_analyze_data_refs (loop_vec_info loop_vinfo)
|
|||
}
|
||||
return false;
|
||||
}
|
||||
|
||||
|
||||
base = unshare_expr (DR_BASE_ADDRESS (dr));
|
||||
offset = unshare_expr (DR_OFFSET (dr));
|
||||
init = unshare_expr (DR_INIT (dr));
|
||||
|
||||
/* Update DR field in stmt_vec_info struct. */
|
||||
bb = bb_for_stmt (stmt);
|
||||
|
||||
/* If the dataref is in an inner-loop of the loop that is considered for
|
||||
vectorization, we also want to analyze the access relative to
|
||||
the outer-loop (DR contains information only relative to the
|
||||
inner-most enclosing loop). We do that by building a reference to the
|
||||
first location accessed by the inner-loop, and analyze it relative to
|
||||
the outer-loop. */
|
||||
if (nested_in_vect_loop_p (loop, stmt))
|
||||
{
|
||||
tree outer_step, outer_base, outer_init;
|
||||
HOST_WIDE_INT pbitsize, pbitpos;
|
||||
tree poffset;
|
||||
enum machine_mode pmode;
|
||||
int punsignedp, pvolatilep;
|
||||
affine_iv base_iv, offset_iv;
|
||||
tree dinit;
|
||||
|
||||
/* Build a reference to the first location accessed by the
|
||||
inner-loop: *(BASE+INIT). (The first location is actually
|
||||
BASE+INIT+OFFSET, but we add OFFSET separately later.) */
|
||||
tree inner_base = build_fold_indirect_ref
|
||||
(fold_build2 (PLUS_EXPR, TREE_TYPE (base), base, init));
|
||||
|
||||
if (vect_print_dump_info (REPORT_DETAILS))
|
||||
{
|
||||
fprintf (dump_file, "analyze in outer-loop: ");
|
||||
print_generic_expr (dump_file, inner_base, TDF_SLIM);
|
||||
}
|
||||
|
||||
outer_base = get_inner_reference (inner_base, &pbitsize, &pbitpos,
|
||||
&poffset, &pmode, &punsignedp, &pvolatilep, false);
|
||||
gcc_assert (outer_base != NULL_TREE);
|
||||
|
||||
if (pbitpos % BITS_PER_UNIT != 0)
|
||||
{
|
||||
if (vect_print_dump_info (REPORT_DETAILS))
|
||||
fprintf (dump_file, "failed: bit offset alignment.\n");
|
||||
return false;
|
||||
}
|
||||
|
||||
outer_base = build_fold_addr_expr (outer_base);
|
||||
if (!simple_iv (loop, stmt, outer_base, &base_iv, false))
|
||||
{
|
||||
if (vect_print_dump_info (REPORT_DETAILS))
|
||||
fprintf (dump_file, "failed: evolution of base is not affine.\n");
|
||||
return false;
|
||||
}
|
||||
|
||||
if (offset)
|
||||
{
|
||||
if (poffset)
|
||||
poffset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), offset, poffset);
|
||||
else
|
||||
poffset = offset;
|
||||
}
|
||||
|
||||
if (!poffset)
|
||||
{
|
||||
offset_iv.base = ssize_int (0);
|
||||
offset_iv.step = ssize_int (0);
|
||||
}
|
||||
else if (!simple_iv (loop, stmt, poffset, &offset_iv, false))
|
||||
{
|
||||
if (vect_print_dump_info (REPORT_DETAILS))
|
||||
fprintf (dump_file, "evolution of offset is not affine.\n");
|
||||
return false;
|
||||
}
|
||||
|
||||
outer_init = ssize_int (pbitpos / BITS_PER_UNIT);
|
||||
split_constant_offset (base_iv.base, &base_iv.base, &dinit);
|
||||
outer_init = size_binop (PLUS_EXPR, outer_init, dinit);
|
||||
split_constant_offset (offset_iv.base, &offset_iv.base, &dinit);
|
||||
outer_init = size_binop (PLUS_EXPR, outer_init, dinit);
|
||||
|
||||
outer_step = size_binop (PLUS_EXPR,
|
||||
fold_convert (ssizetype, base_iv.step),
|
||||
fold_convert (ssizetype, offset_iv.step));
|
||||
|
||||
STMT_VINFO_DR_STEP (stmt_info) = outer_step;
|
||||
/* FIXME: Use canonicalize_base_object_address (base_iv.base); */
|
||||
STMT_VINFO_DR_BASE_ADDRESS (stmt_info) = base_iv.base;
|
||||
STMT_VINFO_DR_INIT (stmt_info) = outer_init;
|
||||
STMT_VINFO_DR_OFFSET (stmt_info) =
|
||||
fold_convert (ssizetype, offset_iv.base);
|
||||
STMT_VINFO_DR_ALIGNED_TO (stmt_info) =
|
||||
size_int (highest_pow2_factor (offset_iv.base));
|
||||
|
||||
if (dump_file && (dump_flags & TDF_DETAILS))
|
||||
{
|
||||
fprintf (dump_file, "\touter base_address: ");
|
||||
print_generic_expr (dump_file, STMT_VINFO_DR_BASE_ADDRESS (stmt_info), TDF_SLIM);
|
||||
fprintf (dump_file, "\n\touter offset from base address: ");
|
||||
print_generic_expr (dump_file, STMT_VINFO_DR_OFFSET (stmt_info), TDF_SLIM);
|
||||
fprintf (dump_file, "\n\touter constant offset from base address: ");
|
||||
print_generic_expr (dump_file, STMT_VINFO_DR_INIT (stmt_info), TDF_SLIM);
|
||||
fprintf (dump_file, "\n\touter step: ");
|
||||
print_generic_expr (dump_file, STMT_VINFO_DR_STEP (stmt_info), TDF_SLIM);
|
||||
fprintf (dump_file, "\n\touter aligned to: ");
|
||||
print_generic_expr (dump_file, STMT_VINFO_DR_ALIGNED_TO (stmt_info), TDF_SLIM);
|
||||
}
|
||||
}
|
||||
|
||||
if (STMT_VINFO_DATA_REF (stmt_info))
|
||||
{
|
||||
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
|
||||
{
|
||||
fprintf (vect_dump,
|
||||
"not vectorized: more than one data ref in stmt: ");
|
||||
print_generic_expr (vect_dump, stmt, TDF_SLIM);
|
||||
}
|
||||
return false;
|
||||
}
|
||||
STMT_VINFO_DATA_REF (stmt_info) = dr;
|
||||
|
||||
/* Set vectype for STMT. */
|
||||
scalar_type = TREE_TYPE (DR_REF (dr));
|
||||
STMT_VINFO_VECTYPE (stmt_info) =
|
||||
|
|
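The arithmetic that the hunk above uses to derive the outer-loop init and step can be illustrated with plain integers. This is a toy sketch, not GCC code: `toy_iv`, `toy_outer_dr`, and `toy_outer_dr_from_ivs` are hypothetical names, each induction variable is reduced to the constant part of its base plus its step, and 8 bits per addressable unit is assumed.

```c
#include <assert.h>

#define TOY_BITS_PER_UNIT 8   /* assumption: 8-bit bytes */

/* Hypothetical stand-in for an affine induction variable: the constant
   part split off its base (what split_constant_offset returns as OFF)
   and its step, both in bytes.  */
struct toy_iv
{
  long const_part;
  long step;
};

/* Hypothetical result: constant offset and step of the access relative
   to the outer loop, in bytes.  */
struct toy_outer_dr
{
  long init;
  long step;
};

/* Mirrors the sums in the hunk: the outer init is the byte position of
   the reference plus the constant parts of the base and offset IVs;
   the outer step is the sum of the two IV steps.  */
struct toy_outer_dr
toy_outer_dr_from_ivs (long bit_pos, struct toy_iv base_iv,
                       struct toy_iv offset_iv)
{
  struct toy_outer_dr r;

  /* The real code gives up when the bit position is not a whole
     number of bytes; the sketch simply requires it.  */
  assert (bit_pos % TOY_BITS_PER_UNIT == 0);

  r.init = bit_pos / TOY_BITS_PER_UNIT
           + base_iv.const_part + offset_iv.const_part;
  r.step = base_iv.step + offset_iv.step;
  return r;
}
```

As one plausible reading for a float access such as `in[j+i]`, the base IV would have zero step in the outer loop while the offset IV advances by 4 bytes per outer iteration, giving an outer step of 4.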
File diff suppressed because it is too large
|
@ -1345,6 +1345,13 @@ new_stmt_vec_info (tree stmt, loop_vec_info loop_vinfo)
|
|||
STMT_VINFO_IN_PATTERN_P (res) = false;
|
||||
STMT_VINFO_RELATED_STMT (res) = NULL;
|
||||
STMT_VINFO_DATA_REF (res) = NULL;
|
||||
|
||||
STMT_VINFO_DR_BASE_ADDRESS (res) = NULL;
|
||||
STMT_VINFO_DR_OFFSET (res) = NULL;
|
||||
STMT_VINFO_DR_INIT (res) = NULL;
|
||||
STMT_VINFO_DR_STEP (res) = NULL;
|
||||
STMT_VINFO_DR_ALIGNED_TO (res) = NULL;
|
||||
|
||||
if (TREE_CODE (stmt) == PHI_NODE && is_loop_header_bb_p (bb_for_stmt (stmt)))
|
||||
STMT_VINFO_DEF_TYPE (res) = vect_unknown_def_type;
|
||||
else
|
||||
|
@@ -1655,21 +1662,103 @@ get_vectype_for_scalar_type (tree scalar_type)
 enum dr_alignment_support
 vect_supportable_dr_alignment (struct data_reference *dr)
 {
-  tree vectype = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr)));
+  tree stmt = DR_STMT (dr);
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   enum machine_mode mode = (int) TYPE_MODE (vectype);
+  struct loop *vect_loop = LOOP_VINFO_LOOP (STMT_VINFO_LOOP_VINFO (stmt_info));
+  bool nested_in_vect_loop = nested_in_vect_loop_p (vect_loop, stmt);
+  bool invariant_in_outerloop = false;
 
   if (aligned_access_p (dr))
     return dr_aligned;
 
+  if (nested_in_vect_loop)
+    {
+      tree outerloop_step = STMT_VINFO_DR_STEP (stmt_info);
+      invariant_in_outerloop =
+        (tree_int_cst_compare (outerloop_step, size_zero_node) == 0);
+    }
+
   /* Possibly unaligned access. */
+
+  /* We can choose between using the implicit realignment scheme (generating
+     a misaligned_move stmt) and the explicit realignment scheme (generating
+     aligned loads with a REALIGN_LOAD). There are two variants to the explicit
+     realignment scheme: optimized, and unoptimized.
+     We can optimize the realignment only if the step between consecutive
+     vector loads is equal to the vector size. Since the vector memory
+     accesses advance in steps of VS (Vector Size) in the vectorized loop, it
+     is guaranteed that the misalignment amount remains the same throughout the
+     execution of the vectorized loop. Therefore, we can create the
+     "realignment token" (the permutation mask that is passed to REALIGN_LOAD)
+     at the loop preheader.
+
+     However, in the case of outer-loop vectorization, when vectorizing a
+     memory access in the inner-loop nested within the LOOP that is now being
+     vectorized, while it is guaranteed that the misalignment of the
+     vectorized memory access will remain the same in different outer-loop
+     iterations, it is *not* guaranteed that it will remain the same throughout
+     the execution of the inner-loop. This is because the inner-loop advances
+     with the original scalar step (and not in steps of VS). If the inner-loop
+     step happens to be a multiple of VS, then the misalignment remains fixed
+     and we can use the optimized realignment scheme. For example:
+
+      for (i=0; i<N; i++)
+        for (j=0; j<M; j++)
+          s += a[i+j];
+
+     When vectorizing the i-loop in the above example, the step between
+     consecutive vector loads is 1, and so the misalignment does not remain
+     fixed across the execution of the inner-loop, and the realignment cannot
+     be optimized (as illustrated in the following pseudo vectorized loop):
+
+      for (i=0; i<N; i+=4)
+        for (j=0; j<M; j++){
+          vs += vp[i+j]; // misalignment of &vp[i+j] is {0,1,2,3,0,1,2,3,...}
+                         // when j is {0,1,2,3,4,5,6,7,...} respectively.
+                         // (assuming that we start from an aligned address).
+          }
+
+     We therefore have to use the unoptimized realignment scheme:
+
+      for (i=0; i<N; i+=4)
+          for (j=k; j<M; j+=4)
+          vs += vp[i+j]; // misalignment of &vp[i+j] is always k (assuming
+                         // that the misalignment of the initial address is
+                         // 0).
+
+     The loop can then be vectorized as follows:
+
+      for (k=0; k<4; k++){
+        rt = get_realignment_token (&vp[k]);
+        for (i=0; i<N; i+=4){
+          v1 = vp[i+k];
+          for (j=k; j<M; j+=4){
+            v2 = vp[i+j+VS-1];
+            va = REALIGN_LOAD <v1,v2,rt>;
+            vs += va;
+            v1 = v2;
+          }
+        }
+    } */
+
   if (DR_IS_READ (dr))
     {
-      if (optab_handler (vec_realign_load_optab, mode)->insn_code != CODE_FOR_nothing
+      if (optab_handler (vec_realign_load_optab, mode)->insn_code !=
+                                                             CODE_FOR_nothing
           && (!targetm.vectorize.builtin_mask_for_load
               || targetm.vectorize.builtin_mask_for_load ()))
-        return dr_unaligned_software_pipeline;
+        {
+          if (nested_in_vect_loop
+              && TREE_INT_CST_LOW (DR_STEP (dr)) != UNITS_PER_SIMD_WORD)
+            return dr_explicit_realign;
+          else
+            return dr_explicit_realign_optimized;
+        }
 
-      if (optab_handler (movmisalign_optab, mode)->insn_code != CODE_FOR_nothing)
+      if (optab_handler (movmisalign_optab, mode)->insn_code !=
+                                                             CODE_FOR_nothing)
         /* Can't software pipeline the loads, but can at least do them. */
         return dr_unaligned_supported;
     }

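The comment in vect_supportable_dr_alignment argues that the realignment token can be hoisted only when the misalignment stays the same across inner-loop iterations. A minimal sketch of that arithmetic follows, assuming an array whose base is aligned to a vector boundary and counting misalignment in elements. The names misalignment_of and misalignment_is_fixed are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: misalignment, in elements, of element INDEX of an
   array whose base address is aligned to a VS-element vector boundary.  */
size_t
misalignment_of (size_t index, size_t vs)
{
  assert (vs != 0);
  return index % vs;
}

/* Return true if the access a[start + j*step] has the same misalignment
   for every j in [0, niters).  This is the property the optimized
   realignment scheme relies on.  */
bool
misalignment_is_fixed (size_t start, size_t step, size_t vs, size_t niters)
{
  size_t first = misalignment_of (start, vs);

  for (size_t j = 1; j < niters; j++)
    if (misalignment_of (start + j * step, vs) != first)
      return false;

  return true;
}
```

With VS = 4, a step of 1 cycles through misalignments 0,1,2,3,0,... so misalignment_is_fixed (0, 1, 4, 8) is false, while any step that is a multiple of 4 keeps the starting misalignment, so misalignment_is_fixed (3, 4, 4, 8) is true.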
@@ -53,7 +53,8 @@ enum operation_type {
 enum dr_alignment_support {
   dr_unaligned_unsupported,
   dr_unaligned_supported,
-  dr_unaligned_software_pipeline,
+  dr_explicit_realign,
+  dr_explicit_realign_optimized,
   dr_aligned
 };
 

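How the new enum values are selected can be sketched as a standalone function. This is only an illustration under stated assumptions: the sketch_ names and the flattened struct sketch_access are hypothetical stand-ins for the data-ref, the statement info and the target hooks that the real function queries, and the final fall-through return is an assumption, since the hunk does not show that path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for enum dr_alignment_support; the sketch_ prefix marks these
   names as illustrative only.  */
enum sketch_alignment_support {
  sketch_unaligned_unsupported,
  sketch_unaligned_supported,
  sketch_explicit_realign,
  sketch_explicit_realign_optimized,
  sketch_aligned
};

/* Hypothetical flattened view of the inputs.  */
struct sketch_access {
  bool aligned;                 /* aligned_access_p (dr) */
  bool is_read;                 /* DR_IS_READ (dr) */
  bool target_has_realign_load; /* realign-load optab and mask hook usable */
  bool target_has_movmisalign;  /* movmisalign optab usable */
  bool nested_in_vect_loop;     /* access sits in the inner loop */
  long step_bytes;              /* inner-loop step of the access */
  long vector_bytes;            /* plays the role of UNITS_PER_SIMD_WORD */
};

enum sketch_alignment_support
sketch_supportable_alignment (const struct sketch_access *a)
{
  assert (a != NULL && a->vector_bytes > 0);

  if (a->aligned)
    return sketch_aligned;

  if (a->is_read)
    {
      if (a->target_has_realign_load)
        {
          /* In outer-loop vectorization the token can be hoisted only
             when the inner-loop step equals the vector size.  */
          if (a->nested_in_vect_loop && a->step_bytes != a->vector_bytes)
            return sketch_explicit_realign;
          return sketch_explicit_realign_optimized;
        }

      if (a->target_has_movmisalign)
        return sketch_unaligned_supported;
    }

  /* Assumption: the hunk does not show what happens on this path.  */
  return sketch_unaligned_unsupported;
}
```

A nested read with a 4-byte step and 16-byte vectors yields sketch_explicit_realign. The same read with a 16-byte step, or any non-nested read, yields sketch_explicit_realign_optimized.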
@@ -249,9 +250,18 @@ typedef struct _stmt_vec_info {
      data-ref (array/pointer/struct access). A GIMPLE stmt is expected to have
      at most one such data-ref. **/
 
-  /* Information about the data-ref (access function, etc). */
+  /* Information about the data-ref (access function, etc),
+     relative to the inner-most containing loop. */
   struct data_reference *data_ref_info;
 
+  /* Information about the data-ref relative to this loop
+     nest (the loop that is being considered for vectorization). */
+  tree dr_base_address;
+  tree dr_init;
+  tree dr_offset;
+  tree dr_step;
+  tree dr_aligned_to;
+
   /* Stmt is part of some pattern (computation idiom) */
   bool in_pattern_p;
 

@@ -310,6 +320,13 @@ typedef struct _stmt_vec_info {
 #define STMT_VINFO_VECTYPE(S) (S)->vectype
 #define STMT_VINFO_VEC_STMT(S) (S)->vectorized_stmt
 #define STMT_VINFO_DATA_REF(S) (S)->data_ref_info
+
+#define STMT_VINFO_DR_BASE_ADDRESS(S) (S)->dr_base_address
+#define STMT_VINFO_DR_INIT(S) (S)->dr_init
+#define STMT_VINFO_DR_OFFSET(S) (S)->dr_offset
+#define STMT_VINFO_DR_STEP(S) (S)->dr_step
+#define STMT_VINFO_DR_ALIGNED_TO(S) (S)->dr_aligned_to
+
 #define STMT_VINFO_IN_PATTERN_P(S) (S)->in_pattern_p
 #define STMT_VINFO_RELATED_STMT(S) (S)->related_stmt
 #define STMT_VINFO_SAME_ALIGN_REFS(S) (S)->same_align_refs

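The new accessor macros expand to plain member accesses, so they can be used both to read and to assign the per-statement outer-loop fields. The sketch below is an illustration only: it uses a hypothetical struct with long fields in place of GCC's tree values, zero in place of NULL, and SKETCH_ macro names that do not exist in the patch. It mirrors the initialization in new_stmt_vec_info and the zero-step test that vect_supportable_dr_alignment uses to detect an access that is invariant in the outer loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the new _stmt_vec_info fields; the real
   fields are GCC `tree' values, plain longs are used here.  */
struct sketch_stmt_info {
  long dr_init;
  long dr_offset;
  long dr_step;
};

/* Accessor macros in the same style as the STMT_VINFO_DR_* macros.  */
#define SKETCH_DR_INIT(S) (S)->dr_init
#define SKETCH_DR_OFFSET(S) (S)->dr_offset
#define SKETCH_DR_STEP(S) (S)->dr_step

/* Reset the outer-loop-relative fields, as new_stmt_vec_info does for
   the real fields (which it sets to NULL).  */
void
sketch_init (struct sketch_stmt_info *res)
{
  assert (res != NULL);
  SKETCH_DR_INIT (res) = 0;
  SKETCH_DR_OFFSET (res) = 0;
  SKETCH_DR_STEP (res) = 0;
}

/* A zero step relative to the outer loop means the access does not move
   between outer-loop iterations.  */
bool
sketch_invariant_in_outer_loop (const struct sketch_stmt_info *info)
{
  assert (info != NULL);
  return SKETCH_DR_STEP (info) == 0;
}
```

Because each macro is an lvalue expression, `SKETCH_DR_STEP (&info) = 4;` assigns the field directly, which is how the patch initializes the real fields.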