Patchwork Fix PR tree-optimization/53636 (SLP generates invalid misaligned access)

login
register
mail settings
Submitter Ulrich Weigand
Date June 15, 2012, 1:13 p.m.
Message ID <201206151313.q5FDDK22032184@d06av02.portsmouth.uk.ibm.com>
Download mbox | patch
Permalink /patch/165102/
State New
Headers show

Comments

Ulrich Weigand - June 15, 2012, 1:13 p.m.
Hello,

PR tree-optimization/53636 is a crash due to an invalid unaligned access
generated by the vectorizer.

The problem is that vect_compute_data_ref_alignment uses DR_ALIGNED_TO
as computed by the default data-ref analysis to decide whether an access
is sufficiently aligned for the vectorizer.

However, this analysis computes the scalar evolution relative to the
innermost loop in which the access takes place; DR_ALIGNED_TO only
reflects the known alignmnent of the *base* address according to
that evolution.

Now, if we're actually about to vectorize this particular loop, then
just checking the alignment of the base is exactly what we need to do
(subsequent accesses will usually be misaligned, but that's OK since
we're transforming those into a vector access).

However, if we're actually currently vectorizing something else, this
test is not sufficient.  The code currently already checks for the
case where we're performing outer-loop vectorization.  In this case,
we need to check alignment of the access on *every* pass through the
inner loop, as the comment states:

  /* In case the dataref is in an inner-loop of the loop that is being
     vectorized (LOOP), we use the base and misalignment information
     relative to the outer-loop (LOOP).  This is ok only if the misalignment
     stays the same throughout the execution of the inner-loop, which is why
     we have to check that the stride of the dataref in the inner-loop evenly
     divides by the vector size.  */

However, there is a second case where we need to check every pass: if
we're not actually vectorizing any loop, but are performing basic-block
SLP.  In this case, it would appear that we need the same check as
described in the comment above, i.e. to verify that the stride is a
multiple of the vector size.

The patch below adds this check, and this indeed fixes the invalid access
I was seeing in the test case (in the final assembler, we now get a vld1.16
instead of vldr).

Tested on arm-linux-gnueabi with no regressions.

OK for mainline?

Bye,
Ulrich


ChangeLog:

	gcc/
	PR tree-optimization/53636
	* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Verify
	stride when doing basic-block vectorization.

	gcc/testsuite/
	PR tree-optimization/53636
	* gcc.target/arm/pr53636.c: New test.
Richard Guenther - June 15, 2012, 1:22 p.m.
On Fri, Jun 15, 2012 at 3:13 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> Hello,
>
> PR tree-optimization/53636 is a crash due to an invalid unaligned access
> generated by the vectorizer.
>
> The problem is that vect_compute_data_ref_alignment uses DR_ALIGNED_TO
> as computed by the default data-ref analysis to decide whether an access
> is sufficiently aligned for the vectorizer.
>
> However, this analysis computes the scalar evolution relative to the
> innermost loop in which the access takes place; DR_ALIGNED_TO only
> reflects the known alignmnent of the *base* address according to
> that evolution.
>
> Now, if we're actually about to vectorize this particular loop, then
> just checking the alignment of the base is exactly what we need to do
> (subsequent accesses will usually be misaligned, but that's OK since
> we're transforming those into a vector access).
>
> However, if we're actually currently vectorizing something else, this
> test is not sufficient.  The code currently already checks for the
> case where we're performing outer-loop vectorization.  In this case,
> we need to check alignment of the access on *every* pass through the
> inner loop, as the comment states:
>
>  /* In case the dataref is in an inner-loop of the loop that is being
>     vectorized (LOOP), we use the base and misalignment information
>     relative to the outer-loop (LOOP).  This is ok only if the misalignment
>     stays the same throughout the execution of the inner-loop, which is why
>     we have to check that the stride of the dataref in the inner-loop evenly
>     divides by the vector size.  */
>
> However, there is a second case where we need to check every pass: if
> we're not actually vectorizing any loop, but are performing basic-block
> SLP.  In this case, it would appear that we need the same check as
> described in the comment above, i.e. to verify that the stride is a
> multiple of the vector size.
>
> The patch below adds this check, and this indeed fixes the invalid access
> I was seeing in the test case (in the final assembler, we now get a vld1.16
> instead of vldr).
>
> Tested on arm-linux-gnueabi with no regressions.
>
> OK for mainline?

Ok.

Thanks,
Richard.

> Bye,
> Ulrich
>
>
> ChangeLog:
>
>        gcc/
>        PR tree-optimization/53636
>        * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Verify
>        stride when doing basic-block vectorization.
>
>        gcc/testsuite/
>        PR tree-optimization/53636
>        * gcc.target/arm/pr53636.c: New test.
>
>
> === added file 'gcc/testsuite/gcc.target/arm/pr53636.c'
> --- gcc/testsuite/gcc.target/arm/pr53636.c      1970-01-01 00:00:00 +0000
> +++ gcc/testsuite/gcc.target/arm/pr53636.c      2012-06-11 17:31:41 +0000
> @@ -0,0 +1,48 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target arm_neon_hw } */
> +/* { dg-options "-O -ftree-vectorize" } */
> +/* { dg-add-options arm_neon } */
> +
> +void fill (short *buf) __attribute__ ((noinline));
> +void fill (short *buf)
> +{
> +  int i;
> +
> +  for (i = 0; i < 11 * 8; i++)
> +    buf[i] = i;
> +}
> +
> +void test (unsigned char *dst) __attribute__ ((noinline));
> +void test (unsigned char *dst)
> +{
> +  short tmp[11 * 8], *tptr;
> +  int i;
> +
> +  fill (tmp);
> +
> +  tptr = tmp;
> +  for (i = 0; i < 8; i++)
> +    {
> +      dst[0] = (-tptr[0] + 9 * tptr[0 + 1] + 9 * tptr[0 + 2] - tptr[0 + 3]) >> 7;
> +      dst[1] = (-tptr[1] + 9 * tptr[1 + 1] + 9 * tptr[1 + 2] - tptr[1 + 3]) >> 7;
> +      dst[2] = (-tptr[2] + 9 * tptr[2 + 1] + 9 * tptr[2 + 2] - tptr[2 + 3]) >> 7;
> +      dst[3] = (-tptr[3] + 9 * tptr[3 + 1] + 9 * tptr[3 + 2] - tptr[3 + 3]) >> 7;
> +      dst[4] = (-tptr[4] + 9 * tptr[4 + 1] + 9 * tptr[4 + 2] - tptr[4 + 3]) >> 7;
> +      dst[5] = (-tptr[5] + 9 * tptr[5 + 1] + 9 * tptr[5 + 2] - tptr[5 + 3]) >> 7;
> +      dst[6] = (-tptr[6] + 9 * tptr[6 + 1] + 9 * tptr[6 + 2] - tptr[6 + 3]) >> 7;
> +      dst[7] = (-tptr[7] + 9 * tptr[7 + 1] + 9 * tptr[7 + 2] - tptr[7 + 3]) >> 7;
> +
> +      dst += 8;
> +      tptr += 11;
> +    }
> +}
> +
> +int main (void)
> +{
> +  char buf [8 * 8];
> +
> +  test (buf);
> +
> +  return 0;
> +}
> +
>
> === modified file 'gcc/tree-vect-data-refs.c'
> --- gcc/tree-vect-data-refs.c   2012-05-31 08:46:10 +0000
> +++ gcc/tree-vect-data-refs.c   2012-06-11 17:31:41 +0000
> @@ -845,6 +845,24 @@
>        }
>     }
>
> +  /* Similarly, if we're doing basic-block vectorization, we can only use
> +     base and misalignment information relative to an innermost loop if the
> +     misalignment stays the same throughout the execution of the loop.
> +     As above, this is the case if the stride of the dataref evenly divides
> +     by the vector size.  */
> +  if (!loop)
> +    {
> +      tree step = DR_STEP (dr);
> +      HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
> +
> +      if (dr_step % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0)
> +       {
> +         if (vect_print_dump_info (REPORT_ALIGNMENT))
> +           fprintf (vect_dump, "SLP: step doesn't divide the vector-size.");
> +         misalign = NULL_TREE;
> +       }
> +    }
> +
>   base = build_fold_indirect_ref (base_addr);
>   alignment = ssize_int (TYPE_ALIGN (vectype)/BITS_PER_UNIT);
>
>
> --
>  Dr. Ulrich Weigand
>  GNU Toolchain for Linux on System z and Cell BE
>  Ulrich.Weigand@de.ibm.com
>
Richard Guenther - June 17, 2012, 11:39 a.m.
On Fri, Jun 15, 2012 at 5:00 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> Richard Guenther wrote:
>> On Fri, Jun 15, 2012 at 3:13 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>> > However, there is a second case where we need to check every pass: if
>> > we're not actually vectorizing any loop, but are performing basic-block
>> > SLP.  In this case, it would appear that we need the same check as
>> > described in the comment above, i.e. to verify that the stride is a
>> > multiple of the vector size.
>> >
>> > The patch below adds this check, and this indeed fixes the invalid access
>> > I was seeing in the test case (in the final assembler, we now get a
>> > vld1.16 instead of vldr).
>> >
>> > Tested on arm-linux-gnueabi with no regressions.
>> >
>> > OK for mainline?
>>
>> Ok.
>
> Thanks for the quick review; I've checked this in to mainline now.
>
> I just noticed that the test case also crashes on 4.7, but not on 4.6.
>
> Would a backport to 4.7 also be OK, once testing passes?

Yes.  Please leave it on mainline a few days to catch fallout from
autotesters.

Thanks,
Richard.

> Thanks,
> Ulrich
>
> --
>  Dr. Ulrich Weigand
>  GNU Toolchain for Linux on System z and Cell BE
>  Ulrich.Weigand@de.ibm.com
>
Mikael Pettersson - June 19, 2012, 9:36 p.m.
Richard Guenther writes:
 > On Fri, Jun 15, 2012 at 5:00 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
 > > Richard Guenther wrote:
 > >> On Fri, Jun 15, 2012 at 3:13 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
 > >> > However, there is a second case where we need to check every pass: if
 > >> > we're not actually vectorizing any loop, but are performing basic-block
 > >> > SLP.  In this case, it would appear that we need the same check as
 > >> > described in the comment above, i.e. to verify that the stride is a
 > >> > multiple of the vector size.
 > >> >
 > >> > The patch below adds this check, and this indeed fixes the invalid access
 > >> > I was seeing in the test case (in the final assembler, we now get a
 > >> > vld1.16 instead of vldr).
 > >> >
 > >> > Tested on arm-linux-gnueabi with no regressions.
 > >> >
 > >> > OK for mainline?
 > >>
 > >> Ok.
 > >
 > > Thanks for the quick review; I've checked this in to mainline now.
 > >
 > > I just noticed that the test case also crashes on 4.7, but not on 4.6.
 > >
 > > Would a backport to 4.7 also be OK, once testing passes?
 > 
 > Yes.  Please leave it on mainline a few days to catch fallout from
 > autotesters.

This patch caused

FAIL: gcc.dg/vect/bb-slp-16.c scan-tree-dump-times slp "basic block vectorized using SLP" 1

on sparc64-linux.  Comparing the pre and post patch dumps for that file shows

 22: vect_compute_data_ref_alignment:
 22: misalign = 4 bytes of ref MEM[(unsigned int *)pout_90 + 28B]
 22: vect_compute_data_ref_alignment:
-22: force alignment of arr[i_87]
-22: misalign = 0 bytes of ref arr[i_87]
+22: SLP: step doesn't divide the vector-size.
+22: Unknown alignment for access: arr

(lots of stuff that's simply gone)

-22: BASIC BLOCK VECTORIZED
-
-22: basic block vectorized using SLP
+22: not vectorized: unsupported unaligned store.arr[i_87]
+22: not vectorized: unsupported alignment in basic block.

/Mikael
Richard Guenther - June 20, 2012, 8:40 a.m.
On Tue, Jun 19, 2012 at 11:36 PM, Mikael Pettersson <mikpe@it.uu.se> wrote:
> Richard Guenther writes:
>  > On Fri, Jun 15, 2012 at 5:00 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>  > > Richard Guenther wrote:
>  > >> On Fri, Jun 15, 2012 at 3:13 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>  > >> > However, there is a second case where we need to check every pass: if
>  > >> > we're not actually vectorizing any loop, but are performing basic-block
>  > >> > SLP.  In this case, it would appear that we need the same check as
>  > >> > described in the comment above, i.e. to verify that the stride is a
>  > >> > multiple of the vector size.
>  > >> >
>  > >> > The patch below adds this check, and this indeed fixes the invalid access
>  > >> > I was seeing in the test case (in the final assembler, we now get a
>  > >> > vld1.16 instead of vldr).
>  > >> >
>  > >> > Tested on arm-linux-gnueabi with no regressions.
>  > >> >
>  > >> > OK for mainline?
>  > >>
>  > >> Ok.
>  > >
>  > > Thanks for the quick review; I've checked this in to mainline now.
>  > >
>  > > I just noticed that the test case also crashes on 4.7, but not on 4.6.
>  > >
>  > > Would a backport to 4.7 also be OK, once testing passes?
>  >
>  > Yes.  Please leave it on mainline a few days to catch fallout from
>  > autotesters.
>
> This patch caused
>
> FAIL: gcc.dg/vect/bb-slp-16.c scan-tree-dump-times slp "basic block vectorized using SLP" 1
>
> on sparc64-linux.  Comparing the pre and post patch dumps for that file shows
>
>  22: vect_compute_data_ref_alignment:
>  22: misalign = 4 bytes of ref MEM[(unsigned int *)pout_90 + 28B]
>  22: vect_compute_data_ref_alignment:
> -22: force alignment of arr[i_87]
> -22: misalign = 0 bytes of ref arr[i_87]
> +22: SLP: step doesn't divide the vector-size.
> +22: Unknown alignment for access: arr
>
> (lots of stuff that's simply gone)
>
> -22: BASIC BLOCK VECTORIZED
> -
> -22: basic block vectorized using SLP
> +22: not vectorized: unsupported unaligned store.arr[i_87]
> +22: not vectorized: unsupported alignment in basic block.

In this testcase the alignment of arr[i] should be irrelevant - it is
not part of
the stmts that are going to be vectorized.  But of course this may be
simply an odering issue in how we analyze data-references / statements
in basic-block vectorization (thus we possibly did not yet declare the
arr[i] = i statement as not taking part in the vectorization).

The line

> -22: force alignment of arr[i_87]

is odd, too - as said we do not need to touch arr when vectorizing the
basic-block.

Ulrich, can you look into this or do you want me to take a look here?

Mikael - please open a bugreport for this.

Thanks,
Richard.

> /Mikael
Mikael Pettersson - June 20, 2012, 11:30 a.m.
Richard Guenther writes:
 > On Tue, Jun 19, 2012 at 11:36 PM, Mikael Pettersson <mikpe@it.uu.se> wrote:
 > > Richard Guenther writes:
 > >  > On Fri, Jun 15, 2012 at 5:00 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
 > >  > > Richard Guenther wrote:
 > >  > >> On Fri, Jun 15, 2012 at 3:13 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
 > >  > >> > However, there is a second case where we need to check every pass: if
 > >  > >> > we're not actually vectorizing any loop, but are performing basic-block
 > >  > >> > SLP.  In this case, it would appear that we need the same check as
 > >  > >> > described in the comment above, i.e. to verify that the stride is a
 > >  > >> > multiple of the vector size.
 > >  > >> >
 > >  > >> > The patch below adds this check, and this indeed fixes the invalid access
 > >  > >> > I was seeing in the test case (in the final assembler, we now get a
 > >  > >> > vld1.16 instead of vldr).
 > >  > >> >
 > >  > >> > Tested on arm-linux-gnueabi with no regressions.
 > >  > >> >
 > >  > >> > OK for mainline?
 > >  > >>
 > >  > >> Ok.
 > >  > >
 > >  > > Thanks for the quick review; I've checked this in to mainline now.
 > >  > >
 > >  > > I just noticed that the test case also crashes on 4.7, but not on 4.6.
 > >  > >
 > >  > > Would a backport to 4.7 also be OK, once testing passes?
 > >  >
 > >  > Yes.  Please leave it on mainline a few days to catch fallout from
 > >  > autotesters.
 > >
 > > This patch caused
 > >
 > > FAIL: gcc.dg/vect/bb-slp-16.c scan-tree-dump-times slp "basic block vectorized using SLP" 1
 > >
 > > on sparc64-linux.  Comparing the pre and post patch dumps for that file shows
 > >
 > >  22: vect_compute_data_ref_alignment:
 > >  22: misalign = 4 bytes of ref MEM[(unsigned int *)pout_90 + 28B]
 > >  22: vect_compute_data_ref_alignment:
 > > -22: force alignment of arr[i_87]
 > > -22: misalign = 0 bytes of ref arr[i_87]
 > > +22: SLP: step doesn't divide the vector-size.
 > > +22: Unknown alignment for access: arr
 > >
 > > (lots of stuff that's simply gone)
 > >
 > > -22: BASIC BLOCK VECTORIZED
 > > -
 > > -22: basic block vectorized using SLP
 > > +22: not vectorized: unsupported unaligned store.arr[i_87]
 > > +22: not vectorized: unsupported alignment in basic block.
 > 
 > In this testcase the alignment of arr[i] should be irrelevant - it is
 > not part of
 > the stmts that are going to be vectorized.  But of course this may be
 > simply an odering issue in how we analyze data-references / statements
 > in basic-block vectorization (thus we possibly did not yet declare the
 > arr[i] = i statement as not taking part in the vectorization).
 > 
 > The line
 > 
 > > -22: force alignment of arr[i_87]
 > 
 > is odd, too - as said we do not need to touch arr when vectorizing the
 > basic-block.
 > 
 > Ulrich, can you look into this or do you want me to take a look here?
 > 
 > Mikael - please open a bugreport for this.

I opened PR53729 for this, with an update saying that powerpc64-linux
also has this regression.

/Mikael
Mikael Pettersson - June 26, 2012, 12:02 a.m.
Ulrich Weigand writes:
 > Richard Guenther wrote:
 > 
 > > In this testcase the alignment of arr[i] should be irrelevant - it is
 > > not part of
 > > the stmts that are going to be vectorized.  But of course this may be
 > > simply an odering issue in how we analyze data-references / statements
 > > in basic-block vectorization (thus we possibly did not yet declare the
 > > arr[i] = i statement as not taking part in the vectorization).
 > 
 > The following patch changes tree-vect-data-refs.c:vect_verify_datarefs_alignment
 > to only take into account statements marked as "relevant".
 > 
 > This should have no impact for loop-based vectorization, since the only caller
 > (vect_enhance_data_refs_alignment) already skips irrelevant statements.
 > [ As an aside, that routine should probably use STMT_VINFO_RELEVANT_P instead
 > of just checking STMT_VINFO_RELEVANT as a boolean. ]
 > 
 > However, for SLP this change in itself doesn't work, since vect_slp_analyze_bb_1
 > calls vect_verify_datarefs_alignment *before* statements have actually been
 > marked as relevant or irrelevant.  Therefore, the patch also rearranges the
 > sequence in vect_slp_analyze_bb_1.
 > 
 > This in turn caused ICEs in vect_get_store_cost/vect_get_load_cost, since those
 > now can get called with statements with unsupported alignment.  There is already
 > one caller (vect_get_data_access_cost) that checks for this case and simply
 > returns "infinite" cost instead of aborting.  The patch moves this behaviour
 > into vect_get_store_cost/vect_get_load_cost themselves.  (The particular cost
 > shouldn't matter since vectorization will still be rejected in the end, it's
 > just that the test now runs a bit later.)
 > 
 > Overall, I've tested the patch with no regresions on powerpc64-linux and
 > arm-linux-gnueabi.   On PowerPC, it fixed the gcc.dg/vect/bb-slp-16.c test.
 > 
 > Mikael, would you mind verifying that it fixes the problem on sparc64?

On sparc64-linux it fixed the bb-slp-16.c regression and didn't cause
any new ones.

/Mikael


 > 
 > OK for mainline?
 > 
 > Bye,
 > Ulrich
 > 
 > 
 > ChangeLog:
 > 
 > 	PR tree-optimization/53729
 > 	PR tree-optimization/53636
 > 	* tree-vect-slp.c (vect_slp_analyze_bb_1): Delay call to
 > 	vect_verify_datarefs_alignment until after statements have
 > 	been marked as relevant/irrelevant.
 > 	* tree-vect-data-refs.c (vect_verify_datarefs_alignment):
 > 	Skip irrelevant statements.
 > 	(vect_enhance_data_refs_alignment): Use STMT_VINFO_RELEVANT_P
 > 	instead of STMT_VINFO_RELEVANT.
 > 	(vect_get_data_access_cost): Do not check for supportable
 > 	alignment before calling vect_get_load_cost/vect_get_store_cost.
 > 	* tree-vect-stmts.c (vect_get_store_cost): Do not abort when
 > 	handling unsupported alignment.
 > 	(vect_get_load_cost): Likewise.
 > 
 > 
 > Index: gcc/tree-vect-data-refs.c
 > ===================================================================
 > *** gcc/tree-vect-data-refs.c	(revision 188850)
 > --- gcc/tree-vect-data-refs.c	(working copy)
 > *************** vect_verify_datarefs_alignment (loop_vec
 > *** 1094,1099 ****
 > --- 1094,1102 ----
 >         gimple stmt = DR_STMT (dr);
 >         stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
 >   
 > +       if (!STMT_VINFO_RELEVANT_P (stmt_info))
 > + 	continue;
 > + 
 >         /* For interleaving, only the alignment of the first access matters. 
 >            Skip statements marked as not vectorizable.  */
 >         if ((STMT_VINFO_GROUPED_ACCESS (stmt_info)
 > *************** vect_get_data_access_cost (struct data_r
 > *** 1213,1229 ****
 >     loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
 >     int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
 >     int ncopies = vf / nunits;
 > -   bool supportable_dr_alignment = vect_supportable_dr_alignment (dr, true);
 >   
 > !   if (!supportable_dr_alignment)
 > !     *inside_cost = VECT_MAX_COST;
 >     else
 > !     {
 > !       if (DR_IS_READ (dr))
 > !         vect_get_load_cost (dr, ncopies, true, inside_cost, outside_cost);
 > !       else
 > !         vect_get_store_cost (dr, ncopies, inside_cost);
 > !     }
 >   
 >     if (vect_print_dump_info (REPORT_COST))
 >       fprintf (vect_dump, "vect_get_data_access_cost: inside_cost = %d, "
 > --- 1216,1226 ----
 >     loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
 >     int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
 >     int ncopies = vf / nunits;
 >   
 > !   if (DR_IS_READ (dr))
 > !     vect_get_load_cost (dr, ncopies, true, inside_cost, outside_cost);
 >     else
 > !     vect_get_store_cost (dr, ncopies, inside_cost);
 >   
 >     if (vect_print_dump_info (REPORT_COST))
 >       fprintf (vect_dump, "vect_get_data_access_cost: inside_cost = %d, "
 > *************** vect_enhance_data_refs_alignment (loop_v
 > *** 1537,1543 ****
 >         stmt = DR_STMT (dr);
 >         stmt_info = vinfo_for_stmt (stmt);
 >   
 > !       if (!STMT_VINFO_RELEVANT (stmt_info))
 >   	continue;
 >   
 >         /* For interleaving, only the alignment of the first access
 > --- 1534,1540 ----
 >         stmt = DR_STMT (dr);
 >         stmt_info = vinfo_for_stmt (stmt);
 >   
 > !       if (!STMT_VINFO_RELEVANT_P (stmt_info))
 >   	continue;
 >   
 >         /* For interleaving, only the alignment of the first access
 > Index: gcc/tree-vect-stmts.c
 > ===================================================================
 > *** gcc/tree-vect-stmts.c	(revision 188850)
 > --- gcc/tree-vect-stmts.c	(working copy)
 > *************** vect_get_store_cost (struct data_referen
 > *** 931,936 ****
 > --- 931,946 ----
 >           break;
 >         }
 >   
 > +     case dr_unaligned_unsupported:
 > +       {
 > +         *inside_cost = VECT_MAX_COST;
 > + 
 > +         if (vect_print_dump_info (REPORT_COST))
 > +           fprintf (vect_dump, "vect_model_store_cost: unsupported access.");
 > + 
 > +         break;
 > +       }
 > + 
 >       default:
 >         gcc_unreachable ();
 >       }
 > *************** vect_get_load_cost (struct data_referenc
 > *** 1094,1099 ****
 > --- 1104,1119 ----
 >           break;
 >         }
 >   
 > +     case dr_unaligned_unsupported:
 > +       {
 > +         *inside_cost = VECT_MAX_COST;
 > + 
 > +         if (vect_print_dump_info (REPORT_COST))
 > +           fprintf (vect_dump, "vect_model_load_cost: unsupported access.");
 > + 
 > +         break;
 > +       }
 > + 
 >       default:
 >         gcc_unreachable ();
 >       }
 > Index: gcc/tree-vect-slp.c
 > ===================================================================
 > *** gcc/tree-vect-slp.c	(revision 188850)
 > --- gcc/tree-vect-slp.c	(working copy)
 > *************** vect_slp_analyze_bb_1 (basic_block bb)
 > *** 2050,2065 ****
 >         return NULL;
 >       }
 >   
 > -    if (!vect_verify_datarefs_alignment (NULL, bb_vinfo))
 > -     {
 > -       if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
 > -         fprintf (vect_dump, "not vectorized: unsupported alignment in basic "
 > -                             "block.\n");
 > - 
 > -       destroy_bb_vec_info (bb_vinfo);
 > -       return NULL;
 > -     }
 > - 
 >     /* Check the SLP opportunities in the basic block, analyze and build SLP
 >        trees.  */
 >     if (!vect_analyze_slp (NULL, bb_vinfo))
 > --- 2050,2055 ----
 > *************** vect_slp_analyze_bb_1 (basic_block bb)
 > *** 2082,2087 ****
 > --- 2072,2087 ----
 >         vect_mark_slp_stmts_relevant (SLP_INSTANCE_TREE (instance));
 >       }
 >   
 > +    if (!vect_verify_datarefs_alignment (NULL, bb_vinfo))
 > +     {
 > +       if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
 > +         fprintf (vect_dump, "not vectorized: unsupported alignment in basic "
 > +                             "block.\n");
 > + 
 > +       destroy_bb_vec_info (bb_vinfo);
 > +       return NULL;
 > +     }
 > + 
 >     if (!vect_slp_analyze_operations (bb_vinfo))
 >       {
 >         if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
 > 
 > -- 
 >   Dr. Ulrich Weigand
 >   GNU Toolchain for Linux on System z and Cell BE
 >   Ulrich.Weigand@de.ibm.com
 >
Richard Guenther - June 26, 2012, 8:39 a.m.
On Mon, Jun 25, 2012 at 5:44 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> Richard Guenther wrote:
>
>> In this testcase the alignment of arr[i] should be irrelevant - it is
>> not part of
>> the stmts that are going to be vectorized.  But of course this may be
>> simply an odering issue in how we analyze data-references / statements
>> in basic-block vectorization (thus we possibly did not yet declare the
>> arr[i] = i statement as not taking part in the vectorization).
>
> The following patch changes tree-vect-data-refs.c:vect_verify_datarefs_alignment
> to only take into account statements marked as "relevant".
>
> This should have no impact for loop-based vectorization, since the only caller
> (vect_enhance_data_refs_alignment) already skips irrelevant statements.
> [ As an aside, that routine should probably use STMT_VINFO_RELEVANT_P instead
> of just checking STMT_VINFO_RELEVANT as a boolean. ]
>
> However, for SLP this change in itself doesn't work, since vect_slp_analyze_bb_1
> calls vect_verify_datarefs_alignment *before* statements have actually been
> marked as relevant or irrelevant.  Therefore, the patch also rearranges the
> sequence in vect_slp_analyze_bb_1.
>
> This in turn caused ICEs in vect_get_store_cost/vect_get_load_cost, since those
> now can get called with statements with unsupported alignment.  There is already
> one caller (vect_get_data_access_cost) that checks for this case and simply
> returns "infinite" cost instead of aborting.  The patch moves this behaviour
> into vect_get_store_cost/vect_get_load_cost themselves.  (The particular cost
> shouldn't matter since vectorization will still be rejected in the end, it's
> just that the test now runs a bit later.)
>
> Overall, I've tested the patch with no regresions on powerpc64-linux and
> arm-linux-gnueabi.   On PowerPC, it fixed the gcc.dg/vect/bb-slp-16.c test.
>
> Mikael, would you mind verifying that it fixes the problem on sparc64?
>
> OK for mainline?

Ok.

Thanks,
Richard.

> Bye,
> Ulrich
>
>
> ChangeLog:
>
>        PR tree-optimization/53729
>        PR tree-optimization/53636
>        * tree-vect-slp.c (vect_slp_analyze_bb_1): Delay call to
>        vect_verify_datarefs_alignment until after statements have
>        been marked as relevant/irrelevant.
>        * tree-vect-data-refs.c (vect_verify_datarefs_alignment):
>        Skip irrelevant statements.
>        (vect_enhance_data_refs_alignment): Use STMT_VINFO_RELEVANT_P
>        instead of STMT_VINFO_RELEVANT.
>        (vect_get_data_access_cost): Do not check for supportable
>        alignment before calling vect_get_load_cost/vect_get_store_cost.
>        * tree-vect-stmts.c (vect_get_store_cost): Do not abort when
>        handling unsupported alignment.
>        (vect_get_load_cost): Likewise.
>
>
> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> *** gcc/tree-vect-data-refs.c   (revision 188850)
> --- gcc/tree-vect-data-refs.c   (working copy)
> *************** vect_verify_datarefs_alignment (loop_vec
> *** 1094,1099 ****
> --- 1094,1102 ----
>        gimple stmt = DR_STMT (dr);
>        stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>
> +       if (!STMT_VINFO_RELEVANT_P (stmt_info))
> +       continue;
> +
>        /* For interleaving, only the alignment of the first access matters.
>           Skip statements marked as not vectorizable.  */
>        if ((STMT_VINFO_GROUPED_ACCESS (stmt_info)
> *************** vect_get_data_access_cost (struct data_r
> *** 1213,1229 ****
>    loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>    int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>    int ncopies = vf / nunits;
> -   bool supportable_dr_alignment = vect_supportable_dr_alignment (dr, true);
>
> !   if (!supportable_dr_alignment)
> !     *inside_cost = VECT_MAX_COST;
>    else
> !     {
> !       if (DR_IS_READ (dr))
> !         vect_get_load_cost (dr, ncopies, true, inside_cost, outside_cost);
> !       else
> !         vect_get_store_cost (dr, ncopies, inside_cost);
> !     }
>
>    if (vect_print_dump_info (REPORT_COST))
>      fprintf (vect_dump, "vect_get_data_access_cost: inside_cost = %d, "
> --- 1216,1226 ----
>    loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>    int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>    int ncopies = vf / nunits;
>
> !   if (DR_IS_READ (dr))
> !     vect_get_load_cost (dr, ncopies, true, inside_cost, outside_cost);
>    else
> !     vect_get_store_cost (dr, ncopies, inside_cost);
>
>    if (vect_print_dump_info (REPORT_COST))
>      fprintf (vect_dump, "vect_get_data_access_cost: inside_cost = %d, "
> *************** vect_enhance_data_refs_alignment (loop_v
> *** 1537,1543 ****
>        stmt = DR_STMT (dr);
>        stmt_info = vinfo_for_stmt (stmt);
>
> !       if (!STMT_VINFO_RELEVANT (stmt_info))
>        continue;
>
>        /* For interleaving, only the alignment of the first access
> --- 1534,1540 ----
>        stmt = DR_STMT (dr);
>        stmt_info = vinfo_for_stmt (stmt);
>
> !       if (!STMT_VINFO_RELEVANT_P (stmt_info))
>        continue;
>
>        /* For interleaving, only the alignment of the first access
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> *** gcc/tree-vect-stmts.c       (revision 188850)
> --- gcc/tree-vect-stmts.c       (working copy)
> *************** vect_get_store_cost (struct data_referen
> *** 931,936 ****
> --- 931,946 ----
>          break;
>        }
>
> +     case dr_unaligned_unsupported:
> +       {
> +         *inside_cost = VECT_MAX_COST;
> +
> +         if (vect_print_dump_info (REPORT_COST))
> +           fprintf (vect_dump, "vect_model_store_cost: unsupported access.");
> +
> +         break;
> +       }
> +
>      default:
>        gcc_unreachable ();
>      }
> *************** vect_get_load_cost (struct data_referenc
> *** 1094,1099 ****
> --- 1104,1119 ----
>          break;
>        }
>
> +     case dr_unaligned_unsupported:
> +       {
> +         *inside_cost = VECT_MAX_COST;
> +
> +         if (vect_print_dump_info (REPORT_COST))
> +           fprintf (vect_dump, "vect_model_load_cost: unsupported access.");
> +
> +         break;
> +       }
> +
>      default:
>        gcc_unreachable ();
>      }
> Index: gcc/tree-vect-slp.c
> ===================================================================
> *** gcc/tree-vect-slp.c (revision 188850)
> --- gcc/tree-vect-slp.c (working copy)
> *************** vect_slp_analyze_bb_1 (basic_block bb)
> *** 2050,2065 ****
>        return NULL;
>      }
>
> -    if (!vect_verify_datarefs_alignment (NULL, bb_vinfo))
> -     {
> -       if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
> -         fprintf (vect_dump, "not vectorized: unsupported alignment in basic "
> -                             "block.\n");
> -
> -       destroy_bb_vec_info (bb_vinfo);
> -       return NULL;
> -     }
> -
>    /* Check the SLP opportunities in the basic block, analyze and build SLP
>       trees.  */
>    if (!vect_analyze_slp (NULL, bb_vinfo))
> --- 2050,2055 ----
> *************** vect_slp_analyze_bb_1 (basic_block bb)
> *** 2082,2087 ****
> --- 2072,2087 ----
>        vect_mark_slp_stmts_relevant (SLP_INSTANCE_TREE (instance));
>      }
>
> +    if (!vect_verify_datarefs_alignment (NULL, bb_vinfo))
> +     {
> +       if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
> +         fprintf (vect_dump, "not vectorized: unsupported alignment in basic "
> +                             "block.\n");
> +
> +       destroy_bb_vec_info (bb_vinfo);
> +       return NULL;
> +     }
> +
>    if (!vect_slp_analyze_operations (bb_vinfo))
>      {
>        if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
>
> --
>  Dr. Ulrich Weigand
>  GNU Toolchain for Linux on System z and Cell BE
>  Ulrich.Weigand@de.ibm.com
>

Patch

=== added file 'gcc/testsuite/gcc.target/arm/pr53636.c'
--- gcc/testsuite/gcc.target/arm/pr53636.c	1970-01-01 00:00:00 +0000
+++ gcc/testsuite/gcc.target/arm/pr53636.c	2012-06-11 17:31:41 +0000
@@ -0,0 +1,48 @@ 
+/* { dg-do run } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-options "-O -ftree-vectorize" } */
+/* { dg-add-options arm_neon } */
+
+void fill (short *buf) __attribute__ ((noinline));
+void fill (short *buf)
+{
+  int i;
+
+  for (i = 0; i < 11 * 8; i++)
+    buf[i] = i;
+}
+
+void test (unsigned char *dst) __attribute__ ((noinline));
+void test (unsigned char *dst)
+{
+  short tmp[11 * 8], *tptr;
+  int i;
+
+  fill (tmp);
+
+  tptr = tmp;
+  for (i = 0; i < 8; i++)
+    {
+      dst[0] = (-tptr[0] + 9 * tptr[0 + 1] + 9 * tptr[0 + 2] - tptr[0 + 3]) >> 7;
+      dst[1] = (-tptr[1] + 9 * tptr[1 + 1] + 9 * tptr[1 + 2] - tptr[1 + 3]) >> 7;
+      dst[2] = (-tptr[2] + 9 * tptr[2 + 1] + 9 * tptr[2 + 2] - tptr[2 + 3]) >> 7;
+      dst[3] = (-tptr[3] + 9 * tptr[3 + 1] + 9 * tptr[3 + 2] - tptr[3 + 3]) >> 7;
+      dst[4] = (-tptr[4] + 9 * tptr[4 + 1] + 9 * tptr[4 + 2] - tptr[4 + 3]) >> 7;
+      dst[5] = (-tptr[5] + 9 * tptr[5 + 1] + 9 * tptr[5 + 2] - tptr[5 + 3]) >> 7;
+      dst[6] = (-tptr[6] + 9 * tptr[6 + 1] + 9 * tptr[6 + 2] - tptr[6 + 3]) >> 7;
+      dst[7] = (-tptr[7] + 9 * tptr[7 + 1] + 9 * tptr[7 + 2] - tptr[7 + 3]) >> 7;
+
+      dst += 8;
+      tptr += 11;
+    }
+}
+
+int main (void)
+{
+  char buf [8 * 8];
+
+  test (buf);
+
+  return 0;
+}
+

=== modified file 'gcc/tree-vect-data-refs.c'
--- gcc/tree-vect-data-refs.c	2012-05-31 08:46:10 +0000
+++ gcc/tree-vect-data-refs.c	2012-06-11 17:31:41 +0000
@@ -845,6 +845,24 @@ 
 	}
     }
 
+  /* Similarly, if we're doing basic-block vectorization, we can only use
+     base and misalignment information relative to an innermost loop if the
+     misalignment stays the same throughout the execution of the loop.
+     As above, this is the case if the stride of the dataref evenly divides
+     by the vector size.  */
+  if (!loop)
+    {
+      tree step = DR_STEP (dr);
+      HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
+
+      if (dr_step % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0)
+	{
+	  if (vect_print_dump_info (REPORT_ALIGNMENT))
+	    fprintf (vect_dump, "SLP: step doesn't divide the vector-size.");
+	  misalign = NULL_TREE;
+	}
+    }
+
   base = build_fold_indirect_ref (base_addr);
   alignment = ssize_int (TYPE_ALIGN (vectype)/BITS_PER_UNIT);