Patchwork: Fix PR tree-optimization/49771

Submitter Ira Rosen
Date July 19, 2011, 6:25 a.m.
Message ID <CAKSNEw4t=p7L1EBFyHAc0rx_CvgXotusEBmbp6LZvFL29DQFJQ@mail.gmail.com>
Permalink /patch/105396/
State New

Comments

Ira Rosen - July 19, 2011, 6:25 a.m.
Hi,

The vectorizer performs the following alias checks for data-refs with
unknown dependence:

((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
  || ((load_ptr_0 + load_segment_length_0) <= store_ptr_0)

where segment_length is the data-ref's step in the loop multiplied by the
loop's number of iterations (in the general case). For invariant
data-refs segment_length is 0, since the step is 0. This creates an
incorrect check for:

  for (i = 0; i < 1000; i++)
    for (j = 0; j < 1000; j++)
       a[j] = a[i] + 1;

We check:

&a + 4000 <= &a + i*4
|| &a + i*4 <= &a

and the second check is wrong for i=0.

This patch makes segment_length sizeof (data-ref type) in the case of
a zero step, changing the checks into

&a + 4000 <= &a + i*4
|| &a + i*4 + 4 <= &a

Bootstrapped and tested on powerpc64-suse-linux.
Committed revision 176434.

Ira

ChangeLog:

   PR tree-optimization/49771
   * tree-vect-loop-manip.c (vect_vfa_segment_size): In case of
   zero step, set segment length to the size of the data-ref's type.

testsuite/ChangeLog:

   PR tree-optimization/49771
   * gcc.dg/vect/pr49771.c: New test.
Richard Guenther - July 19, 2011, 8:50 a.m.
On Tue, Jul 19, 2011 at 8:25 AM, Ira Rosen <ira.rosen@linaro.org> wrote:
> Hi,
>
> The vectorizer performs the following alias checks for data-refs with
> unknown dependence:
>
> ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
>  || (load_ptr_0 + load_segment_length_0) <= store_ptr_0))
>
> where segment_length is data-ref's step in the loop multiplied by the
> loop's number of iterations (in the general case). For invariant
> data-refs segment_length is 0, since the step is 0. This creates
> incorrect check for:
>
>  for (i = 0; i < 1000; i++)
>    for (j = 0; j < 1000; j++)
>       a[j] = a[i] + 1;
>
> We check:
>
> &a + 4000 <= &a + i*4
> || &a + i*4 <= &a
>
> and the second check is wrong for i=0.
>
> This patch makes segment_length to be sizeof (data-ref type) in case
> of zero step, changing the checks into
>
> &a + 4000 <= &a + i*4
> || &a + i*4 + 4 <= &a
>
> Bootstrapped and tested on powerpc64-suse-linux.
> Committed revision 176434.
>
> Ira
>
> ChangeLog:
>
>   PR tree-optimization/49771
>   * tree-vect-loop-manip.c (vect_vfa_segment_size): In case of
>   zero step, set segment length to the size of the data-ref's type.
>
> testsuite/ChangeLog:
>
>   PR tree-optimization/49771
>   * gcc.dg/vect/pr49771.c: New test.
>
> Index: tree-vect-loop-manip.c
> ===================================================================
> --- tree-vect-loop-manip.c      (revision 176433)
> +++ tree-vect-loop-manip.c      (working copy)
> @@ -2356,9 +2356,14 @@ static tree
>  vect_vfa_segment_size (struct data_reference *dr, tree length_factor)
>  {
>   tree segment_length;
> -  segment_length = size_binop (MULT_EXPR,
> -                              fold_convert (sizetype, DR_STEP (dr)),
> -                              fold_convert (sizetype, length_factor));
> +
> +  if (!compare_tree_int (DR_STEP (dr), 0))

integer_zerop (DR_STEP (dr))

> +    segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr)));
> +  else
> +    segment_length = size_binop (MULT_EXPR,
> +                                 fold_convert (sizetype, DR_STEP (dr)),
> +                                 fold_convert (sizetype, length_factor));
> +
>   if (vect_supportable_dr_alignment (dr, false)
>         == dr_explicit_realign_optimized)
>     {
> Index: testsuite/gcc.dg/vect/pr49771.c
> ===================================================================
> --- testsuite/gcc.dg/vect/pr49771.c     (revision 0)
> +++ testsuite/gcc.dg/vect/pr49771.c     (revision 0)
> @@ -0,0 +1,26 @@
> +#include <stdlib.h>
> +#include <stdarg.h>
> +
> +static int a[1000];
> +
> +int
> +foo (void)
> +{
> +  int j;
> +  int i;
> +  for (i = 0; i < 1000; i++)
> +    for (j = 0; j < 1000; j++)
> +      a[j] = a[i] + 1;
> +  return a[0];
> +}
> +
> +int
> +main (void)
> +{
> +  int res = foo ();
> +  if (res != 1999)
> +    abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { cleanup-tree-dump "vect" } } */
>
Ira Rosen - July 19, 2011, 1:27 p.m.
On 19 July 2011 11:50, Richard Guenther <richard.guenther@gmail.com> wrote:
> On Tue, Jul 19, 2011 at 8:25 AM, Ira Rosen <ira.rosen@linaro.org> wrote:
;
>> +
>> +  if (!compare_tree_int (DR_STEP (dr), 0))
>
> integer_zerop (DR_STEP (dr))
>

Right. I'll change this with some other opportunity, if that's ok.

Thanks,
Ira
Richard Guenther - July 19, 2011, 1:30 p.m.
On Tue, Jul 19, 2011 at 3:27 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
> On 19 July 2011 11:50, Richard Guenther <richard.guenther@gmail.com> wrote:
>> On Tue, Jul 19, 2011 at 8:25 AM, Ira Rosen <ira.rosen@linaro.org> wrote:
> ;
>>> +
>>> +  if (!compare_tree_int (DR_STEP (dr), 0))
>>
>> integer_zerop (DR_STEP (dr))
>>
>
> Right. I'll change this with some other opportunity, if that's ok.

Sure.

Thanks,
Richard.

> Thanks,
> Ira
>
Ira Rosen - July 21, 2011, 12:19 p.m.
On 20 July 2011 21:35, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> Ira Rosen wrote:
>
>>    PR tree-optimization/49771
>>    * gcc.dg/vect/pr49771.c: New test.
>
> This test fails (with wrong code) on spu-elf ...
>
>> +int
>> +foo (void)
>> +{
>> +  int j;
>> +  int i;
>> +  for (i = 0; i < 1000; i++)
>> +    for (j = 0; j < 1000; j++)
>> +      a[j] = a[i] + 1;
>> +  return a[0];
>> +}
>> +
>> +int
>> +main (void)
>> +{
>> +  int res = foo ();
>> +  if (res != 1999)
>> +    abort ();
>> +  return 0;
>> +}
>
> The return value of foo with vectorization is 1249 instead
> of 1999 for some reason.

I reproduced the failure. It occurs without Richard's patch
(http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
patch too. Obviously the vectorized loop is executed, but at the
moment I don't understand why. I'll have a better look on Sunday.

Ira

>
> Bye,
> Ulrich
>
> --
>  Dr. Ulrich Weigand
>  GNU Toolchain for Linux on System z and Cell BE
>  Ulrich.Weigand@de.ibm.com
>
Ira Rosen - July 24, 2011, 12:02 p.m.
On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote:
> On 20 July 2011 21:35, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>>
>> The return value of foo with vectorization is 1249 instead
>> of 1999 for some reason.
>
> I reproduced the failure. It occurs without Richard's
> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
> patches too. Obviously the vectorized loop is executed, but at the
> moment I don't understand why. I'll have a better look on Sunday.

Actually it doesn't choose the vectorized code. But the scalar version
gets optimized in a harmful way for SPU, AFAIU.
Here is the scalar loop after vrp2

<bb 8>:
  # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
  D.4593_42 = (void *) ivtmp.53_32;
  D.4520_33 = MEM[base: D.4593_42, offset: 0B];
  D.4521_34 = D.4520_33 + 1;
  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
  ivtmp.42_45 = ivtmp.42_50 + 4;
  if (ivtmp.42_45 != 16)
    goto <bb 10>;
  else
    goto <bb 5>;

and the load is changed by dom2 to:

<bb 4>:
  ...
  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
   ...

where vector(4) int * vect_pa.9;

And the scalar loop has no rotate for that load:

.L3:
        lqd     $13,0($2)
        lqx     $11,$5,$3
        cwx     $7,$sp,$3
        ai      $12,$13,1
        shufb   $6,$12,$11,$7
        stqx    $6,$5,$3
        ai      $3,$3,4
        ceqi    $4,$3,16


I manually added a rotqby for $13 and the result was correct (I changed
the test to iterate only 4 times to make things easier).

Ira

>
> Ira
>
>>
>> Bye,
>> Ulrich
>>
>> --
>>  Dr. Ulrich Weigand
>>  GNU Toolchain for Linux on System z and Cell BE
>>  Ulrich.Weigand@de.ibm.com
>>
>
Richard Guenther - July 24, 2011, 2:31 p.m.
On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
> On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote:
>> On 20 July 2011 21:35, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>>>
>>> The return value of foo with vectorization is 1249 instead
>>> of 1999 for some reason.
>>
>> I reproduced the failure. It occurs without Richard's
>> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
>> patches too. Obviously the vectorized loop is executed, but at the
>> moment I don't understand why. I'll have a better look on Sunday.
>
> Actually it doesn't choose the vectorized code. But the scalar version
> gets optimized in a harmful way for SPU, AFAIU.
> Here is the scalar loop after vrp2
>
> <bb 8>:
>  # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
>  D.4593_42 = (void *) ivtmp.53_32;
>  D.4520_33 = MEM[base: D.4593_42, offset: 0B];
>  D.4521_34 = D.4520_33 + 1;
>  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
>  ivtmp.42_45 = ivtmp.42_50 + 4;
>  if (ivtmp.42_45 != 16)
>    goto <bb 10>;
>  else
>    goto <bb 5>;
>
> and the load is changed by dom2 to:
>
> <bb 4>:
>  ...
>  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
>   ...
>
> where vector(4) int * vect_pa.9;
>
> And the scalar loop has no rotate for that load:

Hum.  This smells like we are hiding sth from the tree optimizers?

> .L3:
>        lqd     $13,0($2)
>        lqx     $11,$5,$3
>        cwx     $7,$sp,$3
>        ai      $12,$13,1
>        shufb   $6,$12,$11,$7
>        stqx    $6,$5,$3
>        ai      $3,$3,4
>        ceqi    $4,$3,16
>
>
> I manually added rotqby for $13 and the result was correct (I changed
> the test to iterate only 4 times to make the things easier).
>
> Ira
>
>>
>> Ira
>>
>>>
>>> Bye,
>>> Ulrich
>>>
>>> --
>>>  Dr. Ulrich Weigand
>>>  GNU Toolchain for Linux on System z and Cell BE
>>>  Ulrich.Weigand@de.ibm.com
>>>
>>
>
Richard Guenther - July 25, 2011, 9:39 a.m.
On Mon, Jul 25, 2011 at 11:10 AM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> Richard Guenther wrote:
>> On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
>> > On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote:
>> >> I reproduced the failure. It occurs without Richard's
>> >> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
>> >> patches too. Obviously the vectorized loop is executed, but at the
>> >> moment I don't understand why. I'll have a better look on Sunday.
>> >
>> > Actually it doesn't choose the vectorized code. But the scalar version
>> > gets optimized in a harmful way for SPU, AFAIU.
>> > Here is the scalar loop after vrp2
>> >
>> > <bb 8>:
>> >  # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
>> >  D.4593_42 = (void *) ivtmp.53_32;
>> >  D.4520_33 = MEM[base: D.4593_42, offset: 0B];
>> >  D.4521_34 = D.4520_33 + 1;
>> >  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
>> >  ivtmp.42_45 = ivtmp.42_50 + 4;
>> >  if (ivtmp.42_45 != 16)
>> >    goto <bb 10>;
>> >  else
>> >    goto <bb 5>;
>> >
>> > and the load is changed by dom2 to:
>> >
>> > <bb 4>:
>> >  ...
>> >  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
>> >   ...
>> >
>> > where vector(4) int * vect_pa.9;
>> >
>> > And the scalar loop has no rotate for that load:
>>
>> Hum.  This smells like we are hiding sth from the tree optimizers?
>
> Well, the back-end assumes a pointer to vector type is always
> naturally aligned, and therefore the data it points to can be
> accessed via a simple load, with no extra rotate needed.

I can't see any use of VECTOR_TYPE in config/spu/, and assuming
anything about alignment just because of the kind of the pointer
is bogus - the scalar code does a scalar read using that pointer.
So the back-end should look at the memory operation, not at the
pointer type.  That said, I can't find any code that looks
suspicious in the spu backend.

> It seems what happened here is that somehow, a pointer to int
> gets replaced by a pointer to vector, even though their alignment
> properties are different.

No, they are not.  They get replaced if they are value-equivalent
in which case they are also alignment-equivalent.  But well, the
dump snippet wasn't complete and I don't feel like building a
SPU cross to verify myself.

> This vector pointer must originate somehow in the vectorizer,
> however, since the original C source does not contain any
> vector types at all ...

That's for sure true, it must be the initial pointer we then increment
in the vectorized loop.

Richard.

> Bye,
> Ulrich
>
> --
>  Dr. Ulrich Weigand
>  GNU Toolchain for Linux on System z and Cell BE
>  Ulrich.Weigand@de.ibm.com
>
Ira Rosen - July 25, 2011, 10:52 a.m.
On 25 July 2011 12:39, Richard Guenther <richard.guenther@gmail.com> wrote:
> On Mon, Jul 25, 2011 at 11:10 AM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>> Richard Guenther wrote:
>>> On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
>>> > On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote:
>>> >> I reproduced the failure. It occurs without Richard's
>>> >> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
>>> >> patches too. Obviously the vectorized loop is executed, but at the
>>> >> moment I don't understand why. I'll have a better look on Sunday.
>>> >
>>> > Actually it doesn't choose the vectorized code. But the scalar version
>>> > gets optimized in a harmful way for SPU, AFAIU.
>>> > Here is the scalar loop after vrp2
>>> >
>>> > <bb 8>:
>>> >  # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
>>> >  D.4593_42 = (void *) ivtmp.53_32;
>>> >  D.4520_33 = MEM[base: D.4593_42, offset: 0B];
>>> >  D.4521_34 = D.4520_33 + 1;
>>> >  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
>>> >  ivtmp.42_45 = ivtmp.42_50 + 4;
>>> >  if (ivtmp.42_45 != 16)
>>> >    goto <bb 10>;
>>> >  else
>>> >    goto <bb 5>;
>>> >
>>> > and the load is changed by dom2 to:
>>> >
>>> > <bb 4>:
>>> >  ...
>>> >  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
>>> >   ...
>>> >
>>> > where vector(4) int * vect_pa.9;
>>> >
>>> > And the scalar loop has no rotate for that load:
>>>
>>> Hum.  This smells like we are hiding sth from the tree optimizers?
>>
>> Well, the back-end assumes a pointer to vector type is always
>> naturally aligned, and therefore the data it points to can be
>> accessed via a simple load, with no extra rotate needed.
>
> I can't see any use of VECTOR_TYPE in config/spu/, and assuming
> anything about alignment just because of the kind of the pointer
> is bogus - the scalar code does a scalar read using that pointer.
> So the backend better should look at the memory operation, not
> at the pointer type.  That said, I can't find any code that looks
> suspicious in the spu backend.
>
>> It seems what happened here is that somehow, a pointer to int
>> gets replaced by a pointer to vector, even though their alignment
>> properties are different.
>
> No, they are not.  They get replaced if they are value-equivalent
> in which case they are also alignment-equivalent.  But well, the
> dump snippet wasn't complete and I don't feel like building a
> SPU cross to verify myself.

I am attaching the complete file.

Thanks,
Ira



>
>> This vector pointer must originate somehow in the vectorizer,
>> however, since the original C source does not contain any
>> vector types at all ...
>
> That's for sure true, it must be the initial pointer we then increment
> in the vectorized loop.
>
> Richard.
>
>> Bye,
>> Ulrich
>>
>> --
>>  Dr. Ulrich Weigand
>>  GNU Toolchain for Linux on System z and Cell BE
>>  Ulrich.Weigand@de.ibm.com
>>
>
Richard Guenther - July 25, 2011, 10:57 a.m.
On Mon, Jul 25, 2011 at 12:52 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
> On 25 July 2011 12:39, Richard Guenther <richard.guenther@gmail.com> wrote:
>> On Mon, Jul 25, 2011 at 11:10 AM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>>> Richard Guenther wrote:
>>>> On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
>>>> > On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote:
>>>> >> I reproduced the failure. It occurs without Richard's
>>>> >> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
>>>> >> patches too. Obviously the vectorized loop is executed, but at the
>>>> >> moment I don't understand why. I'll have a better look on Sunday.
>>>> >
>>>> > Actually it doesn't choose the vectorized code. But the scalar version
>>>> > gets optimized in a harmful way for SPU, AFAIU.
>>>> > Here is the scalar loop after vrp2
>>>> >
>>>> > <bb 8>:
>>>> >  # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
>>>> >  D.4593_42 = (void *) ivtmp.53_32;
>>>> >  D.4520_33 = MEM[base: D.4593_42, offset: 0B];
>>>> >  D.4521_34 = D.4520_33 + 1;
>>>> >  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
>>>> >  ivtmp.42_45 = ivtmp.42_50 + 4;
>>>> >  if (ivtmp.42_45 != 16)
>>>> >    goto <bb 10>;
>>>> >  else
>>>> >    goto <bb 5>;
>>>> >
>>>> > and the load is changed by dom2 to:
>>>> >
>>>> > <bb 4>:
>>>> >  ...
>>>> >  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
>>>> >   ...
>>>> >
>>>> > where vector(4) int * vect_pa.9;
>>>> >
>>>> > And the scalar loop has no rotate for that load:
>>>>
>>>> Hum.  This smells like we are hiding sth from the tree optimizers?
>>>
>>> Well, the back-end assumes a pointer to vector type is always
>>> naturally aligned, and therefore the data it points to can be
>>> accessed via a simple load, with no extra rotate needed.
>>
>> I can't see any use of VECTOR_TYPE in config/spu/, and assuming
>> anything about alignment just because of the kind of the pointer
>> is bogus - the scalar code does a scalar read using that pointer.
>> So the backend better should look at the memory operation, not
>> at the pointer type.  That said, I can't find any code that looks
>> suspicious in the spu backend.
>>
>>> It seems what happened here is that somehow, a pointer to int
>>> gets replaced by a pointer to vector, even though their alignment
>>> properties are different.
>>
>> No, they are not.  They get replaced if they are value-equivalent
>> in which case they are also alignment-equivalent.  But well, the
>> dump snippet wasn't complete and I don't feel like building a
>> SPU cross to verify myself.
>
> I am attaching the complete file.

The issue seems to be that the IV in question, vect_pa.9_19, is
defined as

  vect_pa.9_19 = (vector(4) int *) ivtmp.53_32;

but ivtmp.53_32 does not have a definition at all.

Richard.

>
> Thanks,
> Ira
>
>
>
>>
>>> This vector pointer must originate somehow in the vectorizer,
>>> however, since the original C source does not contain any
>>> vector types at all ...
>>
>> That's for sure true, it must be the initial pointer we then increment
>> in the vectorized loop.
>>
>> Richard.
>>
>>> Bye,
>>> Ulrich
>>>
>>> --
>>>  Dr. Ulrich Weigand
>>>  GNU Toolchain for Linux on System z and Cell BE
>>>  Ulrich.Weigand@de.ibm.com
>>>
>>
>
Ira Rosen - July 25, 2011, 11:09 a.m.
On 25 July 2011 13:57, Richard Guenther <richard.guenther@gmail.com> wrote:
> On Mon, Jul 25, 2011 at 12:52 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
>> On 25 July 2011 12:39, Richard Guenther <richard.guenther@gmail.com> wrote:
>>> On Mon, Jul 25, 2011 at 11:10 AM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>>>> Richard Guenther wrote:
>>>>> On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
>>>>> > On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote:
>>>>> >> I reproduced the failure. It occurs without Richard's
>>>>> >> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
>>>>> >> patches too. Obviously the vectorized loop is executed, but at the
>>>>> >> moment I don't understand why. I'll have a better look on Sunday.
>>>>> >
>>>>> > Actually it doesn't choose the vectorized code. But the scalar version
>>>>> > gets optimized in a harmful way for SPU, AFAIU.
>>>>> > Here is the scalar loop after vrp2
>>>>> >
>>>>> > <bb 8>:
>>>>> >  # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
>>>>> >  D.4593_42 = (void *) ivtmp.53_32;
>>>>> >  D.4520_33 = MEM[base: D.4593_42, offset: 0B];
>>>>> >  D.4521_34 = D.4520_33 + 1;
>>>>> >  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
>>>>> >  ivtmp.42_45 = ivtmp.42_50 + 4;
>>>>> >  if (ivtmp.42_45 != 16)
>>>>> >    goto <bb 10>;
>>>>> >  else
>>>>> >    goto <bb 5>;
>>>>> >
>>>>> > and the load is changed by dom2 to:
>>>>> >
>>>>> > <bb 4>:
>>>>> >  ...
>>>>> >  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
>>>>> >   ...
>>>>> >
>>>>> > where vector(4) int * vect_pa.9;
>>>>> >
>>>>> > And the scalar loop has no rotate for that load:
>>>>>
>>>>> Hum.  This smells like we are hiding sth from the tree optimizers?
>>>>
>>>> Well, the back-end assumes a pointer to vector type is always
>>>> naturally aligned, and therefore the data it points to can be
>>>> accessed via a simple load, with no extra rotate needed.
>>>
>>> I can't see any use of VECTOR_TYPE in config/spu/, and assuming
>>> anything about alignment just because of the kind of the pointer
>>> is bogus - the scalar code does a scalar read using that pointer.
>>> So the backend better should look at the memory operation, not
>>> at the pointer type.  That said, I can't find any code that looks
>>> suspicious in the spu backend.
>>>
>>>> It seems what happened here is that somehow, a pointer to int
>>>> gets replaced by a pointer to vector, even though their alignment
>>>> properties are different.
>>>
>>> No, they are not.  They get replaced if they are value-equivalent
>>> in which case they are also alignment-equivalent.  But well, the
>>> dump snippet wasn't complete and I don't feel like building a
>>> SPU cross to verify myself.
>>
>> I am attaching the complete file.
>
> The issue seems to be that the IV in question, vect_pa.9_19, is
> defined as
>
>  vect_pa.9_19 = (vector(4) int *) ivtmp.53_32;
>
> but ivtmp.53_32 does not have a definition at all.
>

I am sorry, it's my fault, resending the file.

Sorry,
Ira

> Richard.
>
>>
>> Thanks,
>> Ira
>>
>>
>>
>>>
>>>> This vector pointer must originate somehow in the vectorizer,
>>>> however, since the original C source does not contain any
>>>> vector types at all ...
>>>
>>> That's for sure true, it must be the initial pointer we then increment
>>> in the vectorized loop.
>>>
>>> Richard.
>>>
>>>> Bye,
>>>> Ulrich
>>>>
>>>> --
>>>>  Dr. Ulrich Weigand
>>>>  GNU Toolchain for Linux on System z and Cell BE
>>>>  Ulrich.Weigand@de.ibm.com
>>>>
>>>
>>
>
Richard Guenther - July 25, 2011, 11:15 a.m.
On Mon, Jul 25, 2011 at 1:09 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
> On 25 July 2011 13:57, Richard Guenther <richard.guenther@gmail.com> wrote:
>> On Mon, Jul 25, 2011 at 12:52 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
>>> On 25 July 2011 12:39, Richard Guenther <richard.guenther@gmail.com> wrote:
>>>> On Mon, Jul 25, 2011 at 11:10 AM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>>>>> Richard Guenther wrote:
>>>>>> On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
>>>>>> > On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote:
>>>>>> >> I reproduced the failure. It occurs without Richard's
>>>>>> >> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
>>>>>> >> patches too. Obviously the vectorized loop is executed, but at the
>>>>>> >> moment I don't understand why. I'll have a better look on Sunday.
>>>>>> >
>>>>>> > Actually it doesn't choose the vectorized code. But the scalar version
>>>>>> > gets optimized in a harmful way for SPU, AFAIU.
>>>>>> > Here is the scalar loop after vrp2
>>>>>> >
>>>>>> > <bb 8>:
>>>>>> >  # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
>>>>>> >  D.4593_42 = (void *) ivtmp.53_32;
>>>>>> >  D.4520_33 = MEM[base: D.4593_42, offset: 0B];
>>>>>> >  D.4521_34 = D.4520_33 + 1;
>>>>>> >  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
>>>>>> >  ivtmp.42_45 = ivtmp.42_50 + 4;
>>>>>> >  if (ivtmp.42_45 != 16)
>>>>>> >    goto <bb 10>;
>>>>>> >  else
>>>>>> >    goto <bb 5>;
>>>>>> >
>>>>>> > and the load is changed by dom2 to:
>>>>>> >
>>>>>> > <bb 4>:
>>>>>> >  ...
>>>>>> >  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
>>>>>> >   ...
>>>>>> >
>>>>>> > where vector(4) int * vect_pa.9;
>>>>>> >
>>>>>> > And the scalar loop has no rotate for that load:
>>>>>>
>>>>>> Hum.  This smells like we are hiding sth from the tree optimizers?
>>>>>
>>>>> Well, the back-end assumes a pointer to vector type is always
>>>>> naturally aligned, and therefore the data it points to can be
>>>>> accessed via a simple load, with no extra rotate needed.
>>>>
>>>> I can't see any use of VECTOR_TYPE in config/spu/, and assuming
>>>> anything about alignment just because of the kind of the pointer
>>>> is bogus - the scalar code does a scalar read using that pointer.
>>>> So the backend better should look at the memory operation, not
>>>> at the pointer type.  That said, I can't find any code that looks
>>>> suspicious in the spu backend.
>>>>
>>>>> It seems what happened here is that somehow, a pointer to int
>>>>> gets replaced by a pointer to vector, even though their alignment
>>>>> properties are different.
>>>>
>>>> No, they are not.  They get replaced if they are value-equivalent
>>>> in which case they are also alignment-equivalent.  But well, the
>>>> dump snippet wasn't complete and I don't feel like building a
>>>> SPU cross to verify myself.
>>>
>>> I am attaching the complete file.
>>
>> The issue seems to be that the IV in question, vect_pa.9_19, is
>> defined as
>>
>>  vect_pa.9_19 = (vector(4) int *) ivtmp.53_32;
>>
>> but ivtmp.53_32 does not have a definition at all.
>>
>
> I am sorry, it's my fault, resending the file.

Seems perfectly valid to me.  Or well - I suppose we might run into
the issue that the vectorizer sets alignment data at the wrong spot?
You can check alignment info when dumping with the -alias flag.
Building a spu cross now.

Richard.
Richard Guenther - July 25, 2011, 11:26 a.m.
On Mon, Jul 25, 2011 at 1:15 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Mon, Jul 25, 2011 at 1:09 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
>> On 25 July 2011 13:57, Richard Guenther <richard.guenther@gmail.com> wrote:
>>> On Mon, Jul 25, 2011 at 12:52 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
>>>> On 25 July 2011 12:39, Richard Guenther <richard.guenther@gmail.com> wrote:
>>>>> On Mon, Jul 25, 2011 at 11:10 AM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>>>>>> Richard Guenther wrote:
>>>>>>> On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
>>>>>>> > On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote:
>>>>>>> >> I reproduced the failure. It occurs without Richard's
>>>>>>> >> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
>>>>>>> >> patches too. Obviously the vectorized loop is executed, but at the
>>>>>>> >> moment I don't understand why. I'll have a better look on Sunday.
>>>>>>> >
>>>>>>> > Actually it doesn't choose the vectorized code. But the scalar version
>>>>>>> > gets optimized in a harmful way for SPU, AFAIU.
>>>>>>> > Here is the scalar loop after vrp2
>>>>>>> >
>>>>>>> > <bb 8>:
>>>>>>> >  # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
>>>>>>> >  D.4593_42 = (void *) ivtmp.53_32;
>>>>>>> >  D.4520_33 = MEM[base: D.4593_42, offset: 0B];
>>>>>>> >  D.4521_34 = D.4520_33 + 1;
>>>>>>> >  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
>>>>>>> >  ivtmp.42_45 = ivtmp.42_50 + 4;
>>>>>>> >  if (ivtmp.42_45 != 16)
>>>>>>> >    goto <bb 10>;
>>>>>>> >  else
>>>>>>> >    goto <bb 5>;
>>>>>>> >
>>>>>>> > and the load is changed by dom2 to:
>>>>>>> >
>>>>>>> > <bb 4>:
>>>>>>> >  ...
>>>>>>> >  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
>>>>>>> >   ...
>>>>>>> >
>>>>>>> > where vector(4) int * vect_pa.9;
>>>>>>> >
>>>>>>> > And the scalar loop has no rotate for that load:
>>>>>>>
>>>>>>> Hum.  This smells like we are hiding sth from the tree optimizers?
>>>>>>
>>>>>> Well, the back-end assumes a pointer to vector type is always
>>>>>> naturally aligned, and therefore the data it points to can be
>>>>>> accessed via a simple load, with no extra rotate needed.
>>>>>
>>>>> I can't see any use of VECTOR_TYPE in config/spu/, and assuming
>>>>> anything about alignment just because of the kind of the pointer
>>>>> is bogus - the scalar code does a scalar read using that pointer.
>>>>> So the backend better should look at the memory operation, not
>>>>> at the pointer type.  That said, I can't find any code that looks
>>>>> suspicious in the spu backend.
>>>>>
>>>>>> It seems what happened here is that somehow, a pointer to int
>>>>>> gets replaced by a pointer to vector, even though their alignment
>>>>>> properties are different.
>>>>>
>>>>> No, they are not.  They get replaced if they are value-equivalent
>>>>> in which case they are also alignment-equivalent.  But well, the
>>>>> dump snippet wasn't complete and I don't feel like building a
>>>>> SPU cross to verify myself.
>>>>
>>>> I am attaching the complete file.
>>>
>>> The issue seems to be that the IV in question, vect_pa.9_19, is
>>> defined as
>>>
>>>  vect_pa.9_19 = (vector(4) int *) ivtmp.53_32;
>>>
>>> but ivtmp.53_32 does not have a definition at all.
>>>
>>
>> I am sorry, it's my fault, resending the file.
>
> Seems perfectly valid to me.  Or well - I suppose we might run into
> the issue that the vectorizer sets alignment data at the wrong spot?
> You can check alignment info when dumping with the -alias flag.
> Building a spu cross now.

Nope, all perfectly valid.

> Richard.
>
Richard Guenther - July 25, 2011, 1:24 p.m.
On Mon, Jul 25, 2011 at 3:22 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> Richard Guenther wrote:
>
>> >>>>>> Well, the back-end assumes a pointer to vector type is always
>> >>>>>> naturally aligned, and therefore the data it points to can be
>> >>>>>> accessed via a simple load, with no extra rotate needed.
>> >>>>>
>> >>>>> I can't see any use of VECTOR_TYPE in config/spu/, and assuming
>> >>>>> anything about alignment just because of the kind of the pointer
>> >>>>> is bogus - the scalar code does a scalar read using that pointer.
>> >>>>> So the backend better should look at the memory operation, not
>> >>>>> at the pointer type.  That said, I can't find any code that looks
>> >>>>> suspicious in the spu backend.
>> >>>>>
>> >>>>>> It seems what happened here is that somehow, a pointer to int
>> >>>>>> gets replaced by a pointer to vector, even though their alignment
>> >>>>>> properties are different.
>> >>>>>
>> >>>>> No, they are not.  They get replaced if they are value-equivalent
>> >>>>> in which case they are also alignment-equivalent.  But well, the
>> >>>>> dump snippet wasn't complete and I don't feel like building a
>> >>>>> SPU cross to verify myself.
>
>> > Seems perfectly valid to me.  Or well - I suppose we might run into
>> > the issue that the vectorizer sets alignment data at the wrong spot?
>> > You can check alignment info when dumping with the -alias flag.
>> > Building a spu cross now.
>>
>> Nope, all perfectly valid.
>
> Ah, I guess I see what's happening here.  When the SPU back-end is called
> to expand the load, the source operand is passed as:
>
> (mem:SI (reg/f:SI 226 [ vect_pa.9 ])
>        [2 MEM[base: vect_pa.9_44, offset: 0B]+0 S4 A32])
>
> Now this does say the MEM is only guaranteed to be aligned to 32 bits.
>
> However, spu_expand_load then goes and looks at the components of the
> address in detail, in order to figure out how to best perform the access.
> In doing so, it looks at the REGNO_POINTER_ALIGN values of the base
> registers involved in the address.
>
> In this case, REGNO_POINTER_ALIGN (226) is set to 128, and therefore
> the back-end thinks it can use an aligned access after all.
>
> Now, the reason why REGNO_POINTER_ALIGN (226) is 128 is that the register
> is the DECL_RTL for the variable vect_pa.9, and that variable has a
> pointer-to-vector type (with target alignment 128).
>
> When expanding that variable, expand_one_register_var does:
>
>  if (POINTER_TYPE_P (type))
>    mark_reg_pointer (x, TYPE_ALIGN (TREE_TYPE (type)));
>
> All this is normally completely correct -- a variable of type pointer
> to vector type *must* hold only properly aligned values.

No, this is indeed completely bogus code ;)  it should instead
use get_pointer_alignment.

Richard.
Richard Guenther - July 25, 2011, 2:03 p.m.
On Mon, Jul 25, 2011 at 3:24 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Mon, Jul 25, 2011 at 3:22 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>> Richard Guenther wrote:
>>
>>> >>>>>> Well, the back-end assumes a pointer to vector type is always
>>> >>>>>> naturally aligned, and therefore the data it points to can be
>>> >>>>>> accessed via a simple load, with no extra rotate needed.
>>> >>>>>
>>> >>>>> I can't see any use of VECTOR_TYPE in config/spu/, and assuming
>>> >>>>> anything about alignment just because of the kind of the pointer
>>> >>>>> is bogus - the scalar code does a scalar read using that pointer.
>>> >>>>> So the backend should look at the memory operation, not
>>> >>>>> at the pointer type.  That said, I can't find any code that looks
>>> >>>>> suspicious in the spu backend.
>>> >>>>>
>>> >>>>>> It seems what happened here is that somehow, a pointer to int
>>> >>>>>> gets replaced by a pointer to vector, even though their alignment
>>> >>>>>> properties are different.
>>> >>>>>
>>> >>>>> No, they are not.  They get replaced if they are value-equivalent
>>> >>>>> in which case they are also alignment-equivalent.  But well, the
>>> >>>>> dump snippet wasn't complete and I don't feel like building a
>>> >>>>> SPU cross to verify myself.
>>
>>> > Seems perfectly valid to me.  Or well - I suppose we might run into
>>> > the issue that the vectorizer sets alignment data at the wrong spot?
>>> > You can check alignment info when dumping with the -alias flag.
>>> > Building a spu cross now.
>>>
>>> Nope, all perfectly valid.
>>
>> Ah, I guess I see what's happening here.  When the SPU back-end is called
>> to expand the load, the source operand is passed as:
>>
>> (mem:SI (reg/f:SI 226 [ vect_pa.9 ])
>>        [2 MEM[base: vect_pa.9_44, offset: 0B]+0 S4 A32])
>>
>> Now this does say the MEM is only guaranteed to be aligned to 32 bits.
>>
>> However, spu_expand_load then goes and looks at the components of the
>> address in detail, in order to figure out how to best perform the access.
>> In doing so, it looks at the REGNO_POINTER_ALIGN values of the base
>> registers involved in the address.
>>
>> In this case, REGNO_POINTER_ALIGN (226) is set to 128, and therefore
>> the back-end thinks it can use an aligned access after all.
>>
>> Now, the reason why REGNO_POINTER_ALIGN (226) is 128 is that the register
>> is the DECL_RTL for the variable vect_pa.9, and that variable has a
>> pointer-to-vector type (with target alignment 128).
>>
>> When expanding that variable, expand_one_register_var does:
>>
>>  if (POINTER_TYPE_P (type))
>>    mark_reg_pointer (x, TYPE_ALIGN (TREE_TYPE (type)));
>>
>> All this is normally completely correct -- a variable of type pointer
>> to vector type *must* hold only properly aligned values.
>
> No, this is indeed completely bogus code ;)  it should instead
> use get_pointer_alignment.

Btw, as pseudos do not have a single def site, how can the above
ever be correct in the face of coalescing?  For example on trees we
can have

 p_1 = &a; // align 256
 p_2 = p_1 + 4; // align 32

but we'll coalesce the thing and thus would have to use the weaker
alignment of both SSA names.  expand_one_register_var expands
the decl, not the SSA name, so using get_pointer_alignment on
the decl would probably be fine, though also pointless as it always
will return 8.

At least I don't see any code that would prevent a temporary variable
of type int * of being coalesced with a temporary variable of type vector int *.

Why should REGNO_POINTER_ALIGN be interesting to anyone?
Proper alignment information is (should be) attached to every
MEM already.

Richard.
Richard Guenther - July 25, 2011, 2:07 p.m.
On Mon, Jul 25, 2011 at 4:03 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Mon, Jul 25, 2011 at 3:24 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Mon, Jul 25, 2011 at 3:22 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>>> Richard Guenther wrote:
>>>
>>>> >>>>>> Well, the back-end assumes a pointer to vector type is always
>>>> >>>>>> naturally aligned, and therefore the data it points to can be
>>>> >>>>>> accessed via a simple load, with no extra rotate needed.
>>>> >>>>>
>>>> >>>>> I can't see any use of VECTOR_TYPE in config/spu/, and assuming
>>>> >>>>> anything about alignment just because of the kind of the pointer
>>>> >>>>> is bogus - the scalar code does a scalar read using that pointer.
>>>> >>>>> So the backend should look at the memory operation, not
>>>> >>>>> at the pointer type.  That said, I can't find any code that looks
>>>> >>>>> suspicious in the spu backend.
>>>> >>>>>
>>>> >>>>>> It seems what happened here is that somehow, a pointer to int
>>>> >>>>>> gets replaced by a pointer to vector, even though their alignment
>>>> >>>>>> properties are different.
>>>> >>>>>
>>>> >>>>> No, they are not.  They get replaced if they are value-equivalent
>>>> >>>>> in which case they are also alignment-equivalent.  But well, the
>>>> >>>>> dump snippet wasn't complete and I don't feel like building a
>>>> >>>>> SPU cross to verify myself.
>>>
>>>> > Seems perfectly valid to me.  Or well - I suppose we might run into
>>>> > the issue that the vectorizer sets alignment data at the wrong spot?
>>>> > You can check alignment info when dumping with the -alias flag.
>>>> > Building a spu cross now.
>>>>
>>>> Nope, all perfectly valid.
>>>
>>> Ah, I guess I see what's happening here.  When the SPU back-end is called
>>> to expand the load, the source operand is passed as:
>>>
>>> (mem:SI (reg/f:SI 226 [ vect_pa.9 ])
>>>        [2 MEM[base: vect_pa.9_44, offset: 0B]+0 S4 A32])
>>>
>>> Now this does say the MEM is only guaranteed to be aligned to 32 bits.
>>>
>>> However, spu_expand_load then goes and looks at the components of the
>>> address in detail, in order to figure out how to best perform the access.
>>> In doing so, it looks at the REGNO_POINTER_ALIGN values of the base
>>> registers involved in the address.
>>>
>>> In this case, REGNO_POINTER_ALIGN (226) is set to 128, and therefore
>>> the back-end thinks it can use an aligned access after all.
>>>
>>> Now, the reason why REGNO_POINTER_ALIGN (226) is 128 is that the register
>>> is the DECL_RTL for the variable vect_pa.9, and that variable has a
>>> pointer-to-vector type (with target alignment 128).
>>>
>>> When expanding that variable, expand_one_register_var does:
>>>
>>>  if (POINTER_TYPE_P (type))
>>>    mark_reg_pointer (x, TYPE_ALIGN (TREE_TYPE (type)));
>>>
>>> All this is normally completely correct -- a variable of type pointer
>>> to vector type *must* hold only properly aligned values.
>>
>> No, this is indeed completely bogus code ;)  it should instead
>> use get_pointer_alignment.
>
> Btw, as pseudos do not have a single def site, how can the above
> ever be correct in the face of coalescing?  For example on trees we
> can have
>
>  p_1 = &a; // align 256
>  p_2 = p_1 + 4; // align 32
>
> but we'll coalesce the thing and thus would have to use the weaker
> alignment of both SSA names.  expand_one_register_var expands
> the decl, not the SSA name, so using get_pointer_alignment on
> the decl would probably be fine, though also pointless as it always
> will return 8.
>
> At least I don't see any code that would prevent a temporary variable
> of type int * of being coalesced with a temporary variable of type vector int *.
>
> Why should REGNO_POINTER_ALIGN be interesting to anyone?
> Proper alignment information is (should be) attached to every
> MEM already.

nonzero_bits1 seems to be the only consumer of REGNO_POINTER_ALIGN
apart from maybe alpha.c and spu.c.

We should simply kill REGNO_POINTER_ALIGN IMHO.

Richard.

> Richard.
>
Richard Guenther - July 25, 2011, 2:43 p.m.
On Mon, Jul 25, 2011 at 4:23 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> Richard Guenther wrote:
>
>> > Btw, as pseudos do not have a single def site how can the above
>> > ever be correct in the face of coalescing?
>
> I had always understood this to reflect the simple fact that a
> pointer to some type must never hold a value that is not properly
> aligned for that type.  (Maybe this is only true on STRICT_ALIGNMENT
> targets?)   This has always been an important property to generate
> good code on SPU ...

We do not preserve pointer type casts in the middle-end (anymore).

>> > For example on trees we can have
>> >
>> >  p_1 = &a; // align 256
>> >  p_2 = p_1 + 4; // align 32
>> >
>> > but we'll coalesce the thing and thus would have to use the weaker
>> > alignment of both SSA names.  expand_one_register_var expands
>> > the decl, not the SSA name, so using get_pointer_alignment on
>> > the decl would probably be fine, though also pointless as it always
>> > will return 8.
>> >
>> > At least I don't see any code that would prevent a temporary variable
>> > of type int * of being coalesced with a temporary variable of type vector
>> > int *.
>
> I don't really understand the coalesce code, but in the above sample,
> the two variables must have the same type, otherwise there'd have to
> be a cast somewhere.  Does coalesce eliminate casts?

No, there is no cast between different pointer types.  Information is
not attached to types but to real entities.

>> > Why should REGNO_POINTER_ALIGN be interesting to anyone?
>> > Proper alignment information is (should be) attached to every
>> > MEM already.
>>
>> nonzero_bits1 seems to be the only consumer of REGNO_POINTER_ALIGN
>> apart from maybe alpha.c and spu.c.
>>
>> We should simply kill REGNO_POINTER_ALIGN IMHO.
>
> On the SPU at least, REGNO_POINTER_ALIGN carries additional information
> over just the MEM alignment.  Say, I'm getting a MEM the form
> (mem (plus (reg X) (reg Y))), and the MEM is aligned to 32 bits.
>
> This means I need to generate a rotate to fix up the value that was
> loaded by the (forced aligned) load instruction.  However, the form
> of this rotate can be simpler if I know that e.g. reg X is always
> guaranteed to be 128-bit aligned and only reg Y introduces the
> potential misalignment.  If on the other hand neither of the base
> registers is guaranteed to be 128-bit aligned, I need to generate
> more complex rotate code ...

Because then you need the value of X + Y instead of just picking either?

Why not expand this explicitly when you still have the per-SSA name
alignment information around?

> I understand this may also be important on other platforms, e.g.
> to choose which register to use as base and which as index in a
> memory operation ...

Well, we still have REG_POINTER.

Richard.

> Bye,
> Ulrich
>
> --
>  Dr. Ulrich Weigand
>  GNU Toolchain for Linux on System z and Cell BE
>  Ulrich.Weigand@de.ibm.com
>
Richard Guenther - July 26, 2011, 7:23 a.m.
On Mon, Jul 25, 2011 at 5:25 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> Richard Guenther wrote:
>> On Mon, Jul 25, 2011 at 4:23 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>> > I had always understood this to reflect the simple fact that a
>> > pointer to some type must never hold a value that is not properly
>> > aligned for that type.  (Maybe this is only true on STRICT_ALIGNMENT
>> > targets?)   This has always been an important property to generate
>> > good code on SPU ...
>>
>> We do not preserve pointer type casts in the middle-end (anymore).
>
> Huh, OK.  I was not aware of that ...
>
>> >> nonzero_bits1 seems to be the only consumer of REGNO_POINTER_ALIGN
>> >> apart from maybe alpha.c and spu.c.
>
> There's also a use in find_reloads_subreg_address, as well as in the
> i386/predicates.md and arm/arm.md files.
>
>> > This means I need to generate a rotate to fix up the value that was
>> > loaded by the (forced aligned) load instruction.  However, the form
>> > of this rotate can be simpler if I know that e.g. reg X is always
>> > guaranteed to be 128-bit aligned and only reg Y introduces the
>> > potential misalignment.  If on the other hand neither of the base
>> > registers is guaranteed to be 128-bit aligned, I need to generate
>> > more complex rotate code ...
>>
>> Because then you need the value of X + Y instead of just picking either?
>
> Yes, exactly.
>
>> Why not expand this explicitly when you still have the per-SSA name
>> alignment information around?
>
> When would that be?  The expansion does happen in the initial expand
> stage, but I'm getting called from the middle-end via emit_move_insn etc.
> which already provides me with a MEM ...

Hmm.  I suppose we'd need to see at the initial expand stage that the
move is going to be handled specially.  For other strict-align targets
we end up with store/load-bit-field for unaligned accesses, so I suppose
SPU doesn't want to go down that path (via insv/extv)?

> Can I use REG_ATTRS->decl to get at the register's DECL and use
> get_pointer_alignment on that?  [ On the other hand, don't we have
> the same problems with reliability of REG_ATTRS that we have with
> REGNO_POINTER_ALIGN, given e.g. the coalescing you mentioned? ]

Well, REG_ATTRS->decl is again a decl, not an SSA name.  I suppose
we'd need to pick a conservative REGNO_POINTER_ALIGN during
expansion of the SSA name partition - iterate over all of them in the
partition and pick the lowest alignment.  Or even adjust the partitioning
to avoid losing alignment information that way.

I suppose the RTL code transforms are careful to update REGNO_POINTER_ALIGN
conservatively.

Richard.

> Bye,
> Ulrich
>
> --
>  Dr. Ulrich Weigand
>  GNU Toolchain for Linux on System z and Cell BE
>  Ulrich.Weigand@de.ibm.com
>
Andrew Pinski - July 26, 2011, 7:41 a.m.
On Tue, Jul 26, 2011 at 12:23 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> Hmm.  I suppose we'd need to see at the initial expand stage that the
> move is going to be handled specially.  For other strict-align targets
> we end up with store/load-bit-field for unaligned accesses, so I suppose
> SPU doesn't want to go down that path (via insv/extv)?

The problem is that almost all loads/stores on SPU are unaligned.  So
there will be less optimization done on them if we go down that path (it
was tried at least twice and produced worse code).

Thanks,
Andrew Pinski

Patch

Index: tree-vect-loop-manip.c
===================================================================
--- tree-vect-loop-manip.c      (revision 176433)
+++ tree-vect-loop-manip.c      (working copy)
@@ -2356,9 +2356,14 @@  static tree
 vect_vfa_segment_size (struct data_reference *dr, tree length_factor)
 {
   tree segment_length;
-  segment_length = size_binop (MULT_EXPR,
-                              fold_convert (sizetype, DR_STEP (dr)),
-                              fold_convert (sizetype, length_factor));
+
+  if (!compare_tree_int (DR_STEP (dr), 0))
+    segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr)));
+  else
+    segment_length = size_binop (MULT_EXPR,
+                                 fold_convert (sizetype, DR_STEP (dr)),
+                                 fold_convert (sizetype, length_factor));
+
   if (vect_supportable_dr_alignment (dr, false)
         == dr_explicit_realign_optimized)
     {
Index: testsuite/gcc.dg/vect/pr49771.c
===================================================================
--- testsuite/gcc.dg/vect/pr49771.c     (revision 0)
+++ testsuite/gcc.dg/vect/pr49771.c     (revision 0)
@@ -0,0 +1,26 @@ 
+#include <stdlib.h>
+#include <stdarg.h>
+
+static int a[1000];
+
+int
+foo (void)
+{
+  int j;
+  int i;
+  for (i = 0; i < 1000; i++)
+    for (j = 0; j < 1000; j++)
+      a[j] = a[i] + 1;
+  return a[0];
+}
+
+int
+main (void)
+{
+  int res = foo ();
+  if (res != 1999)
+    abort ();
+  return 0;
+}
+
+/* { dg-final { cleanup-tree-dump "vect" } } */