Alias analysis of zero sized arrays
diff mbox

Message ID CAFiYyc0BhpaAXp6onDhsgSyr-5OAAaObWzZ7XMCsDmPrcE4Yug@mail.gmail.com
State New
Headers show

Commit Message

Richard Biener April 26, 2017, 9:27 a.m. UTC
On Wed, Apr 26, 2017 at 12:27 AM, Steve Ellcey <sellcey@cavium.com> wrote:
> This patch changes get_ref_base_and_extent to treat zero sized arrays
> like C99 flexible arrays and assume that references to zero sized
> arrays can also be made beyond the 'end' of the array.  C99 flexible
> arrays are recognized by not having an INTEGER_CST limit/size and the
> routine then sets maxsize to -1 for special handling.  This patch
> checks for a value of 0 when we do have an INTEGER_CST limit/size and
> handles it like a flexible array.
>
> Tested with a bootstrap on aarch64 and a GCC test run with no
> regressions.
>
> I did not create a testcase because the test I have (attached) is not
> runnable.  If you compile this test case for aarch64 with
> '-UFLEX -O2 -fno-strict-aliasing' you will get two loads, then two
> stores in the main loop.  In the original large program that this came
> from, that generated bad code.  If compiled with -DFLEX instead of
> -UFLEX, you get load, store, load, store and that was working.  With
> this patch, both versions (-UFLEX and -DFLEX) generate the load, store,
> load, store sequence.
>
> OK for checkin?

Huh.  This doesn't look like the correct fix for anything.

Ah, so the issue is that we prune MEM_EXPR of MEMs off ARRAY_REFs
and end up with a MEM_EXPR of a.0_1->o which has zero size.  Then
we run into

  /* If MEM_OFFSET or MEM_SIZE are unknown what we got from MEM_EXPR
     is conservative, so trust it.  */
  if (!MEM_OFFSET_KNOWN_P (mem)
      || !MEM_SIZE_KNOWN_P (mem))
    return true;

but MEM_SIZE_KNOWN_P is true so we miss to do

  /* Refine size and offset we got from analyzing MEM_EXPR by using
     MEM_SIZE and MEM_OFFSET.  */

  ref->offset += MEM_OFFSET (mem) * BITS_PER_UNIT;
  ref->size = MEM_SIZE (mem) * BITS_PER_UNIT;

thus a nearly minimal fix would be (looks like some other sanity checks
might better be moved before the early "trust it" out):


is probably better.

Richard.

> Steve Ellcey
> sellcey@cavium.com
>
>
> 2017-04-25  Steve Ellcey  <sellcey@cavium.com>
>
>         * tree-dfa.c (get_ref_base_and_extent): Treat zero size array like
>         a C99 flexible array.
>

Comments

Richard Biener April 26, 2017, 9:33 a.m. UTC | #1
On Wed, Apr 26, 2017 at 11:27 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Wed, Apr 26, 2017 at 12:27 AM, Steve Ellcey <sellcey@cavium.com> wrote:
>> This patch changes get_ref_base_and_extent to treat zero sized arrays
>> like C99 flexible arrays and assume that references to zero sized
>> arrays can also be made beyond the 'end' of the array.  C99 flexible
>> arrays are recognized by not having an INTEGER_CST limit/size and the
>> routine then sets maxsize to -1 for special handling.  This patch
>> checks for a value of 0 when we do have an INTEGER_CST limit/size and
>> handles it like a flexible array.
>>
>> Tested with a bootstrap on aarch64 and a GCC test run with no
>> regressions.
>>
>> I did not create a testcase because the test I have (attached) is not
>> runnable.  If you compile this test case for aarch64 with
>> '-UFLEX -O2 -fno-strict-aliasing' you will get two loads, then two
>> stores in the main loop.  In the original large program that this came
>> from, that generated bad code.  If compiled with -DFLEX instead of
>> -UFLEX, you get load, store, load, store and that was working.  With
>> this patch, both versions (-UFLEX and -DFLEX) generate the load, store,
>> load, store sequence.
>>
>> OK for checkin?
>
> Huh.  This doesn't look like the correct fix for anything.
>
> Ah, so the issue is that we prune MEM_EXPR of MEMs off ARRAY_REFs
> and end up with a MEM_EXPR of a.0_1->o which has zero size.  Then
> we run into
>
>   /* If MEM_OFFSET or MEM_SIZE are unknown what we got from MEM_EXPR
>      is conservative, so trust it.  */
>   if (!MEM_OFFSET_KNOWN_P (mem)
>       || !MEM_SIZE_KNOWN_P (mem))
>     return true;
>
> but MEM_SIZE_KNOWN_P is true so we miss to do
>
>   /* Refine size and offset we got from analyzing MEM_EXPR by using
>      MEM_SIZE and MEM_OFFSET.  */
>
>   ref->offset += MEM_OFFSET (mem) * BITS_PER_UNIT;
>   ref->size = MEM_SIZE (mem) * BITS_PER_UNIT;
>
> thus a nearly minimal fix would be (looks like some other sanity checks
> might better be moved before the early "trust it" out):
>
> Index: gcc/alias.c
> ===================================================================
> --- gcc/alias.c (revision 247273)
> +++ gcc/alias.c (working copy)
> @@ -322,6 +322,13 @@ ao_ref_from_mem (ao_ref *ref, const_rtx
>
>    ref->ref_alias_set = MEM_ALIAS_SET (mem);
>
> +  /* Refine size and offset we got from analyzing MEM_EXPR by using
> +     MEM_SIZE and MEM_OFFSET.  */
> +  if (MEM_OFFSET_KNOWN_P (mem))
> +    ref->offset += MEM_OFFSET (mem) * BITS_PER_UNIT;
> +  if (MEM_SIZE_KNOWN_P (mem))
> +    ref->size = MEM_SIZE (mem) * BITS_PER_UNIT;
> +
>    /* If MEM_OFFSET or MEM_SIZE are unknown what we got from MEM_EXPR
>       is conservative, so trust it.  */
>    if (!MEM_OFFSET_KNOWN_P (mem)
> @@ -336,12 +343,6 @@ ao_ref_from_mem (ao_ref *ref, const_rtx
>               > ref->max_size)))
>      ref->ref = NULL_TREE;
>
> -  /* Refine size and offset we got from analyzing MEM_EXPR by using
> -     MEM_SIZE and MEM_OFFSET.  */
> -
> -  ref->offset += MEM_OFFSET (mem) * BITS_PER_UNIT;
> -  ref->size = MEM_SIZE (mem) * BITS_PER_UNIT;
> -
>    /* The MEM may extend into adjacent fields, so adjust max_size if
>       necessary.  */
>    if (ref->max_size != -1

Actually this change is wrong!  Leaves the one below as the fix.

> note that in the end we might want to stop pruning MEM_EXPR so much...
> (set_mem_attributes_minus_bitpos).  At least dropping component/array
> refs in this case isn't conservative as the size zero access doesn't cover
> the array ref stripped.  So arguably the bug is in
> set_mem_attributes_minus_bitpos
> itself.  Thus
>
> Index: gcc/emit-rtl.c
> ===================================================================
> --- gcc/emit-rtl.c      (revision 247273)
> +++ gcc/emit-rtl.c      (working copy)
> @@ -1954,7 +1954,9 @@ set_mem_attributes_minus_bitpos (rtx ref
>           while (TREE_CODE (t2) == ARRAY_REF);
>
>           if (DECL_P (t2)
> -             || TREE_CODE (t2) == COMPONENT_REF)
> +             || (TREE_CODE (t2) == COMPONENT_REF
> +                 && (! DECL_SIZE (TREE_OPERAND (t2, 1))
> +                     || ! integer_zerop (DECL_SIZE (TREE_OPERAND (t2, 1))))))
>             {
>               attrs.expr = t2;
>               attrs.offset_known_p = false;
>
> is an alternative / additional fix (OTOH for all trailing arrays this
> isn't really a
> conservative MEM_EXPR, not just for size zero ones).  Thus
>
> Index: gcc/emit-rtl.c
> ===================================================================
> --- gcc/emit-rtl.c      (revision 247273)
> +++ gcc/emit-rtl.c      (working copy)
> @@ -1954,7 +1954,10 @@ set_mem_attributes_minus_bitpos (rtx ref
>           while (TREE_CODE (t2) == ARRAY_REF);
>
>           if (DECL_P (t2)
> -             || TREE_CODE (t2) == COMPONENT_REF)
> +             || (TREE_CODE (t2) == COMPONENT_REF
> +                 /* For trailing arrays t2 doesn't have a size that
> +                    covers all valid accesses.  */
> +                 && ! array_at_struct_end_p (t, false)))
>             {
>               attrs.expr = t2;
>               attrs.offset_known_p = false;
>
> is probably better.
>
> Richard.
>
>> Steve Ellcey
>> sellcey@cavium.com
>>
>>
>> 2017-04-25  Steve Ellcey  <sellcey@cavium.com>
>>
>>         * tree-dfa.c (get_ref_base_and_extent): Treat zero size array like
>>         a C99 flexible array.
>>
Steve Ellcey April 26, 2017, 6:03 p.m. UTC | #2
On Wed, 2017-04-26 at 11:33 +0200, Richard Biener wrote:
> > 
> > Index: gcc/emit-rtl.c
> > ===================================================================
> > --- gcc/emit-rtl.c      (revision 247273)
> > +++ gcc/emit-rtl.c      (working copy)
> > @@ -1954,7 +1954,10 @@ set_mem_attributes_minus_bitpos (rtx ref
> >           while (TREE_CODE (t2) == ARRAY_REF);
> > 
> >           if (DECL_P (t2)
> > -             || TREE_CODE (t2) == COMPONENT_REF)
> > +             || (TREE_CODE (t2) == COMPONENT_REF
> > +                 /* For trailing arrays t2 doesn't have a size that
> > +                    covers all valid accesses.  */
> > +                 && ! array_at_struct_end_p (t, false)))
> >             {
> >               attrs.expr = t2;
> >               attrs.offset_known_p = false;
> > 
> > is probably better.
> > 
> > Richard.

I tested this patch and it fixed my problem.  It does seem better than
my patch because now the [0] and [] versions of my test generate
exactly the same code.  With my patch they still generated different
code though both versions worked OK.

I did a full bootstrap and gcc testsuite run on aarch64 with no
regressions.  Will you check this in or do you want me to do a
submission and checkin?

Steve Ellcey
sellcey@cavium.com
Richard Biener April 26, 2017, 6:52 p.m. UTC | #3
On April 26, 2017 8:03:44 PM GMT+02:00, Steve Ellcey <sellcey@cavium.com> wrote:
>On Wed, 2017-04-26 at 11:33 +0200, Richard Biener wrote:
>> > 
>> > Index: gcc/emit-rtl.c
>> > ===================================================================
>> > --- gcc/emit-rtl.c      (revision 247273)
>> > +++ gcc/emit-rtl.c      (working copy)
>> > @@ -1954,7 +1954,10 @@ set_mem_attributes_minus_bitpos (rtx ref
>> >           while (TREE_CODE (t2) == ARRAY_REF);
>> > 
>> >           if (DECL_P (t2)
>> > -             || TREE_CODE (t2) == COMPONENT_REF)
>> > +             || (TREE_CODE (t2) == COMPONENT_REF
>> > +                 /* For trailing arrays t2 doesn't have a size
>that
>> > +                    covers all valid accesses.  */
>> > +                 && ! array_at_struct_end_p (t, false)))
>> >             {
>> >               attrs.expr = t2;
>> >               attrs.offset_known_p = false;
>> > 
>> > is probably better.
>> > 
>> > Richard.
>
>I tested this patch and it fixed my problem.  It does seem better than
>my patch because now the [0] and [] versions of my test generate
>exactly the same code.  With my patch they still generated different
>code though both versions worked OK.
>
>I did a full bootstrap and gcc testsuite run on aarch64 with no
>regressions.  Will you check this in or do you want me to do a
>submission and checkin?

I'll throw it on x86 testing and will commit it.  What branches are affected?  Can you open a bugreport?

I'll try to add a runtime testcase as well if you don't beat me to that.  On x86 it likely requires -fschedule-insns to trigger.

Richard.

>Steve Ellcey
>sellcey@cavium.com
Steve Ellcey April 26, 2017, 7:09 p.m. UTC | #4
On Wed, 2017-04-26 at 20:52 +0200, Richard Biener wrote:

> I'll throw it on x86 testing and will commit it.  What branches are
> affected?  Can you open a bugreport?
> 
> I'll try to add a runtime testcase as well if you don't beat me to
> that.  On x86 it likely requires -fschedule-insns to trigger.
> 
> Richard.

I created bug 80533.  The problem was found on the gcc-5 branch and is
also on the gcc-6 and gcc-7 branches.

Steve Ellcey

Patch
diff mbox

Index: gcc/alias.c
===================================================================
--- gcc/alias.c (revision 247273)
+++ gcc/alias.c (working copy)
@@ -322,6 +322,13 @@  ao_ref_from_mem (ao_ref *ref, const_rtx

   ref->ref_alias_set = MEM_ALIAS_SET (mem);

+  /* Refine size and offset we got from analyzing MEM_EXPR by using
+     MEM_SIZE and MEM_OFFSET.  */
+  if (MEM_OFFSET_KNOWN_P (mem))
+    ref->offset += MEM_OFFSET (mem) * BITS_PER_UNIT;
+  if (MEM_SIZE_KNOWN_P (mem))
+    ref->size = MEM_SIZE (mem) * BITS_PER_UNIT;
+
   /* If MEM_OFFSET or MEM_SIZE are unknown what we got from MEM_EXPR
      is conservative, so trust it.  */
   if (!MEM_OFFSET_KNOWN_P (mem)
@@ -336,12 +343,6 @@  ao_ref_from_mem (ao_ref *ref, const_rtx
              > ref->max_size)))
     ref->ref = NULL_TREE;

-  /* Refine size and offset we got from analyzing MEM_EXPR by using
-     MEM_SIZE and MEM_OFFSET.  */
-
-  ref->offset += MEM_OFFSET (mem) * BITS_PER_UNIT;
-  ref->size = MEM_SIZE (mem) * BITS_PER_UNIT;
-
   /* The MEM may extend into adjacent fields, so adjust max_size if
      necessary.  */
   if (ref->max_size != -1

note that in the end we might want to stop pruning MEM_EXPR so much...
(set_mem_attributes_minus_bitpos).  At least dropping component/array
refs in this case isn't conservative as the size zero access doesn't cover
the array ref stripped.  So arguably the bug is in
set_mem_attributes_minus_bitpos
itself.  Thus

Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c      (revision 247273)
+++ gcc/emit-rtl.c      (working copy)
@@ -1954,7 +1954,9 @@  set_mem_attributes_minus_bitpos (rtx ref
          while (TREE_CODE (t2) == ARRAY_REF);

          if (DECL_P (t2)
-             || TREE_CODE (t2) == COMPONENT_REF)
+             || (TREE_CODE (t2) == COMPONENT_REF
+                 && (! DECL_SIZE (TREE_OPERAND (t2, 1))
+                     || ! integer_zerop (DECL_SIZE (TREE_OPERAND (t2, 1))))))
            {
              attrs.expr = t2;
              attrs.offset_known_p = false;

is an alternative / additional fix (OTOH for all trailing arrays this
isn't really a
conservative MEM_EXPR, not just for size zero ones).  Thus

Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c      (revision 247273)
+++ gcc/emit-rtl.c      (working copy)
@@ -1954,7 +1954,10 @@  set_mem_attributes_minus_bitpos (rtx ref
          while (TREE_CODE (t2) == ARRAY_REF);

          if (DECL_P (t2)
-             || TREE_CODE (t2) == COMPONENT_REF)
+             || (TREE_CODE (t2) == COMPONENT_REF
+                 /* For trailing arrays t2 doesn't have a size that
+                    covers all valid accesses.  */
+                 && ! array_at_struct_end_p (t, false)))
            {
              attrs.expr = t2;
              attrs.offset_known_p = false;