Message ID | CAFiYyc0BhpaAXp6onDhsgSyr-5OAAaObWzZ7XMCsDmPrcE4Yug@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Wed, Apr 26, 2017 at 11:27 AM, Richard Biener <richard.guenther@gmail.com> wrote: > On Wed, Apr 26, 2017 at 12:27 AM, Steve Ellcey <sellcey@cavium.com> wrote: >> This patch changes get_ref_base_and_extent to treat zero sized arrays >> like C99 flexible arrays and assume that references to zero sized >> arrays can also be made beyond the 'end' of the array. C99 flexible >> arrays are recognized by not having an INTEGER_CST limit/size and the >> routine then sets maxsize to -1 for special handling. This patch >> checks for a value of 0 when we do have an INTEGER_CST limit/size and >> handles it like a flexible array. >> >> Tested with a bootstrap on aarch64 and a GCC test run with no >> regressions. >> >> I did not create a testcase because the test I have (attached) is not >> runnable. If you compile this test case for aarch64 with >> '-UFLEX -O2 -fno-strict-aliasing' you will get two loads, then two >> stores in the main loop. In the original large program that this came >> from, that generated bad code. If compiled with -DFLEX instead of >> -UFLEX, you get load, store, load, store and that was working. With >> this patch, both versions (-UFLEX and -DFLEX) generate the load, store, >> load, store sequence. >> >> OK for checkin? > > Huh. This doesn't look like the correct fix for anything. > > Ah, so the issue is that we prune MEM_EXPR of MEMs off ARRAY_REFs > and end up with a MEM_EXPR of a.0_1->o which has zero size. Then > we run into > > /* If MEM_OFFSET or MEM_SIZE are unknown what we got from MEM_EXPR > is conservative, so trust it. */ > if (!MEM_OFFSET_KNOWN_P (mem) > || !MEM_SIZE_KNOWN_P (mem)) > return true; > > but MEM_SIZE_KNOWN_P is true so we miss to do > > /* Refine size and offset we got from analyzing MEM_EXPR by using > MEM_SIZE and MEM_OFFSET. */ > > ref->offset += MEM_OFFSET (mem) * BITS_PER_UNIT; > ref->size = MEM_SIZE (mem) * BITS_PER_UNIT; > > thus a nearly minimal fix would be (looks like some other sanity checks > might better be moved before the early "trust it" out): > > Index: gcc/alias.c > =================================================================== > --- gcc/alias.c (revision 247273) > +++ gcc/alias.c (working copy) > @@ -322,6 +322,13 @@ ao_ref_from_mem (ao_ref *ref, const_rtx > > ref->ref_alias_set = MEM_ALIAS_SET (mem); > > + /* Refine size and offset we got from analyzing MEM_EXPR by using > + MEM_SIZE and MEM_OFFSET. */ > + if (MEM_OFFSET_KNOWN_P (mem)) > + ref->offset += MEM_OFFSET (mem) * BITS_PER_UNIT; > + if (MEM_SIZE_KNOWN_P (mem)) > + ref->size = MEM_SIZE (mem) * BITS_PER_UNIT; > + > /* If MEM_OFFSET or MEM_SIZE are unknown what we got from MEM_EXPR > is conservative, so trust it. */ > if (!MEM_OFFSET_KNOWN_P (mem) > @@ -336,12 +343,6 @@ ao_ref_from_mem (ao_ref *ref, const_rtx > > ref->max_size))) > ref->ref = NULL_TREE; > > - /* Refine size and offset we got from analyzing MEM_EXPR by using > - MEM_SIZE and MEM_OFFSET. */ > - > - ref->offset += MEM_OFFSET (mem) * BITS_PER_UNIT; > - ref->size = MEM_SIZE (mem) * BITS_PER_UNIT; > - > /* The MEM may extend into adjacent fields, so adjust max_size if > necessary. */ > if (ref->max_size != -1 Actually this change is wrong! Leaves the one below as the fix. > note that in the end we might want to stop pruning MEM_EXPR so much... > (set_mem_attributes_minus_bitpos). At least dropping component/array > refs in this case isn't conservative as the size zero access doesn't cover > the array ref stripped. So arguably the bug is in > set_mem_attributes_minus_bitpos > itself. Thus > > Index: gcc/emit-rtl.c > =================================================================== > --- gcc/emit-rtl.c (revision 247273) > +++ gcc/emit-rtl.c (working copy) > @@ -1954,7 +1954,9 @@ set_mem_attributes_minus_bitpos (rtx ref > while (TREE_CODE (t2) == ARRAY_REF); > > if (DECL_P (t2) > - || TREE_CODE (t2) == COMPONENT_REF) > + || (TREE_CODE (t2) == COMPONENT_REF > + && (! DECL_SIZE (TREE_OPERAND (t2, 1)) > + || ! integer_zerop (DECL_SIZE (TREE_OPERAND (t2, 1)))))) > { > attrs.expr = t2; > attrs.offset_known_p = false; > > is an alternative / additional fix (OTOH for all trailing arrays this > isn't really a > conservative MEM_EXPR, not just for size zero ones). Thus > > Index: gcc/emit-rtl.c > =================================================================== > --- gcc/emit-rtl.c (revision 247273) > +++ gcc/emit-rtl.c (working copy) > @@ -1954,7 +1954,10 @@ set_mem_attributes_minus_bitpos (rtx ref > while (TREE_CODE (t2) == ARRAY_REF); > > if (DECL_P (t2) > - || TREE_CODE (t2) == COMPONENT_REF) > + || (TREE_CODE (t2) == COMPONENT_REF > + /* For trailing arrays t2 doesn't have a size that > + covers all valid accesses. */ > + && ! array_at_struct_end_p (t, false))) > { > attrs.expr = t2; > attrs.offset_known_p = false; > > is probably better. > > Richard. > >> Steve Ellcey >> sellcey@cavium.com >> >> >> 2017-04-25 Steve Ellcey <sellcey@cavium.com> >> >> * tree-dfa.c (get_ref_base_and_extent): Treat zero size array like >> a C99 flexible array. >>
On Wed, 2017-04-26 at 11:33 +0200, Richard Biener wrote: > > > > Index: gcc/emit-rtl.c > > =================================================================== > > --- gcc/emit-rtl.c (revision 247273) > > +++ gcc/emit-rtl.c (working copy) > > @@ -1954,7 +1954,10 @@ set_mem_attributes_minus_bitpos (rtx ref > > while (TREE_CODE (t2) == ARRAY_REF); > > > > if (DECL_P (t2) > > - || TREE_CODE (t2) == COMPONENT_REF) > > + || (TREE_CODE (t2) == COMPONENT_REF > > + /* For trailing arrays t2 doesn't have a size that > > + covers all valid accesses. */ > > + && ! array_at_struct_end_p (t, false))) > > { > > attrs.expr = t2; > > attrs.offset_known_p = false; > > > > is probably better. > > > > Richard. I tested this patch and it fixed my problem. It does seem better than my patch because now the [0] and [] versions of my test generate exactly the same code. With my patch they still generated different code though both versions worked OK. I did a full bootstrap and gcc testsuite run on aarch64 with no regressions. Will you check this in or do you want me to do a submission and checkin? Steve Ellcey sellcey@cavium.com
On April 26, 2017 8:03:44 PM GMT+02:00, Steve Ellcey <sellcey@cavium.com> wrote: >On Wed, 2017-04-26 at 11:33 +0200, Richard Biener wrote: >> > >> > Index: gcc/emit-rtl.c >> > =================================================================== >> > --- gcc/emit-rtl.c (revision 247273) >> > +++ gcc/emit-rtl.c (working copy) >> > @@ -1954,7 +1954,10 @@ set_mem_attributes_minus_bitpos (rtx ref >> > while (TREE_CODE (t2) == ARRAY_REF); >> > >> > if (DECL_P (t2) >> > - || TREE_CODE (t2) == COMPONENT_REF) >> > + || (TREE_CODE (t2) == COMPONENT_REF >> > + /* For trailing arrays t2 doesn't have a size >that >> > + covers all valid accesses. */ >> > + && ! array_at_struct_end_p (t, false))) >> > { >> > attrs.expr = t2; >> > attrs.offset_known_p = false; >> > >> > is probably better. >> > >> > Richard. > >I tested this patch and it fixed my problem. It does seem better than >my patch because now the [0] and [] versions of my test generate >exactly the same code. With my patch they still generated different >code though both versions worked OK. > >I did a full bootstrap and gcc testsuite run on aarch64 with no >regressions. Will you check this in or do you want me to do a >submission and checkin? I'll throw it on x86 testing and will commit it. What branches are affected? Can you open a bugreport? I'll try to add a runtime testcase as well if you don't beat me to that. On x86 it likely requires -fschedule-insns to trigger. Richard. >Steve Ellcey >sellcey@cavium.com
On Wed, 2017-04-26 at 20:52 +0200, Richard Biener wrote: > > I'll throw it on x86 testing and will commit it. What branches are > affected? Can you open a bugreport? > > I'll try to add a runtime testcase as well if you don't beat me to > that. On x86 it likely requires -fschedule-insns to trigger. > > Richard. I created bug 80533. The problem was found on the gcc-5 branch and is also on the gcc-6 and gcc-7 branches. Steve Ellcey
Index: gcc/alias.c =================================================================== --- gcc/alias.c (revision 247273) +++ gcc/alias.c (working copy) @@ -322,6 +322,13 @@ ao_ref_from_mem (ao_ref *ref, const_rtx ref->ref_alias_set = MEM_ALIAS_SET (mem); + /* Refine size and offset we got from analyzing MEM_EXPR by using + MEM_SIZE and MEM_OFFSET. */ + if (MEM_OFFSET_KNOWN_P (mem)) + ref->offset += MEM_OFFSET (mem) * BITS_PER_UNIT; + if (MEM_SIZE_KNOWN_P (mem)) + ref->size = MEM_SIZE (mem) * BITS_PER_UNIT; + /* If MEM_OFFSET or MEM_SIZE are unknown what we got from MEM_EXPR is conservative, so trust it. */ if (!MEM_OFFSET_KNOWN_P (mem) @@ -336,12 +343,6 @@ ao_ref_from_mem (ao_ref *ref, const_rtx > ref->max_size))) ref->ref = NULL_TREE; - /* Refine size and offset we got from analyzing MEM_EXPR by using - MEM_SIZE and MEM_OFFSET. */ - - ref->offset += MEM_OFFSET (mem) * BITS_PER_UNIT; - ref->size = MEM_SIZE (mem) * BITS_PER_UNIT; - /* The MEM may extend into adjacent fields, so adjust max_size if necessary. */ if (ref->max_size != -1 note that in the end we might want to stop pruning MEM_EXPR so much... (set_mem_attributes_minus_bitpos). At least dropping component/array refs in this case isn't conservative as the size zero access doesn't cover the array ref stripped. So arguably the bug is in set_mem_attributes_minus_bitpos itself. Thus Index: gcc/emit-rtl.c =================================================================== --- gcc/emit-rtl.c (revision 247273) +++ gcc/emit-rtl.c (working copy) @@ -1954,7 +1954,9 @@ set_mem_attributes_minus_bitpos (rtx ref while (TREE_CODE (t2) == ARRAY_REF); if (DECL_P (t2) - || TREE_CODE (t2) == COMPONENT_REF) + || (TREE_CODE (t2) == COMPONENT_REF + && (! DECL_SIZE (TREE_OPERAND (t2, 1)) + || ! integer_zerop (DECL_SIZE (TREE_OPERAND (t2, 1)))))) { attrs.expr = t2; attrs.offset_known_p = false; is an alternative / additional fix (OTOH for all trailing arrays this isn't really a conservative MEM_EXPR, not just for size zero ones). Thus Index: gcc/emit-rtl.c =================================================================== --- gcc/emit-rtl.c (revision 247273) +++ gcc/emit-rtl.c (working copy) @@ -1954,7 +1954,10 @@ set_mem_attributes_minus_bitpos (rtx ref while (TREE_CODE (t2) == ARRAY_REF); if (DECL_P (t2) - || TREE_CODE (t2) == COMPONENT_REF) + || (TREE_CODE (t2) == COMPONENT_REF + /* For trailing arrays t2 doesn't have a size that + covers all valid accesses. */ + && ! array_at_struct_end_p (t, false))) { attrs.expr = t2; attrs.offset_known_p = false;