diff mbox

[PR62151] Fix uninitialized register issue caused by distribute_notes in combine pass

Message ID CAHFci28BS75pySKVtLTeY8BQsYupJHvXs73ZdYq1RkfHLbXocQ@mail.gmail.com
State New
Headers show

Commit Message

Bin.Cheng Aug. 28, 2014, 5:04 a.m. UTC
On Wed, Aug 27, 2014 at 6:34 PM, Richard Earnshaw <rearnsha@arm.com> wrote:
> On 27/08/14 11:08, Bin Cheng wrote:
>> Hi
>> As reported in bug pr62151, combine pass may wrongly delete necessary
>> instruction in function distribute_notes thus leaving register
>> uninitialized.  This patch is to fix the issue by checking if i2 immediately
>> modifies the register in REG_DEAD note.  If yes, set tem_insn accordingly to
>> start finding right place for note distribution from i2.
>>
>> I once sent a RFC patch at
>> https://gcc.gnu.org/ml/gcc-patches/2014-08/msg01718.html, but got no
>> comments,  here I added some comments in this patch to make it a formal one.
>>
>>
>> I tested the original patch, and will re-test it against the latest code
>> later.  So is it OK?  Any comments will be appreciated.
>>
>
> Isn't this the sort of sequence that combinable_i3pat is supposed to reject?
Hi Richard,
I think it's not.  combinable_i3pat checks cases in which i1 feeds to
i3 directly, while i2 kills reg_use in i1.  For this case the feeding
chain is "i0->i1->i2->i3", the combination is valid and beneficial.
What needs to be handled is if i2dest is anticipated after i3.  If
not, then i2 could be deleted; if yes, i2 should be preserved.
Moreover, following constant propagation could delete i2 after
propagating the new i2src into i4.  Note combine pass already handles
this kind of case using variable added_sets_2 in function try_combine.
The problem is in distribute_notes which mis-deleted i2.

I added one test case in the updated patch.

Thanks,
bin
>

Comments

Jeff Law Aug. 30, 2014, 5:58 a.m. UTC | #1
On 08/27/14 23:04, Bin.Cheng wrote:
> On Wed, Aug 27, 2014 at 6:34 PM, Richard Earnshaw <rearnsha@arm.com> wrote:
>> On 27/08/14 11:08, Bin Cheng wrote:
>>> Hi
>>> As reported in bug pr62151, combine pass may wrongly delete necessary
>>> instruction in function distribute_notes thus leaving register
>>> uninitialized.  This patch is to fix the issue by checking if i2 immediately
>>> modifies the register in REG_DEAD note.  If yes, set tem_insn accordingly to
>>> start finding right place for note distribution from i2.
>>>
>>> I once sent a RFC patch at
>>> https://gcc.gnu.org/ml/gcc-patches/2014-08/msg01718.html, but got no
>>> comments,  here I added some comments in this patch to make it a formal one.
>>>
>>>
>>> I tested the original patch, and will re-test it against the latest code
>>> later.  So is it OK?  Any comments will be appreciated.
>>>
>>
>> Isn't this the sort of sequence that combinable_i3pat is supposed to reject?
> Hi Richard,
> I think it's not.  combinable_i3pat checks cases in which i1 feeds to
> i3 directly, while i2 kills reg_use in i1.  For this case the feeding
> chain is "i0->i1->i2->i3", the combination is valid and beneficial.
> What needs to be handled is if i2dest is anticipated after i3.  If
> not, then i2 could be deleted; if yes, i2 should be preserved.
> Moreover, following constant propagation could delete i2 after
> propagating the new i2src into i4.  Note combine pass already handles
> this kind of case using variable added_sets_2 in function try_combine.
> The problem is in distribute_notes which mis-deleted i2.
>
> I added one test case in the updated patch.
One could argue that this mess is a result of trying to optimize a reg 
that is set more than once.    Though I guess that might be a bit of a 
big hammer.

I haven't thought real hard, but isn't it the case that for a pseudo 
with multiple sets that we never want to move a REG_DEAD note across a 
set of that pseudo?  It would seem that in these cases we want to drop 
the REG_DEAD note completely.

Note that i0..i4 need not be consecutive insns, so you'd have to walk 
the chain from the location with the death note to the proposed death 
note site.  If between those locations there's another set of the same 
pseudo, then drop the note.  Since this may be an expensive check it 
should probably be conditionalized on REG_N_SETS (pseudo) > 1

Jeff
Segher Boessenkool Aug. 31, 2014, 12:18 p.m. UTC | #2
On Fri, Aug 29, 2014 at 11:58:37PM -0600, Jeff Law wrote:
> One could argue that this mess is a result of trying to optimize a reg 
> that is set more than once.    Though I guess that might be a bit of a 
> big hammer.

It works fine in other cases, and is quite beneficial for e.g. optimising
instruction sequences that set a fixed carry register twice.

In the testcase (and comment in the proposed patch), why is combine
combining four insns at all?  That means it rejected combining just the
first three.  Why did it do that?


Segher
Bin.Cheng Sept. 1, 2014, 3:36 a.m. UTC | #3
On Sun, Aug 31, 2014 at 8:18 PM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> On Fri, Aug 29, 2014 at 11:58:37PM -0600, Jeff Law wrote:
>> One could argue that this mess is a result of trying to optimize a reg
>> that is set more than once.    Though I guess that might be a bit of a
>> big hammer.
>
> It works fine in other cases, and is quite beneficial for e.g. optimising
> instruction sequences that set a fixed carry register twice.
>
> In the testcase (and comment in the proposed patch), why is combine
> combining four insns at all?  That means it rejected combining just the
> first three.  Why did it do that?
It is explicitly reject by below code in can_combine_p.

  if (GET_CODE (PATTERN (i3)) == PARALLEL)
    for (i = XVECLEN (PATTERN (i3), 0) - 1; i >= 0; i--)
      if (GET_CODE (XVECEXP (PATTERN (i3), 0, i)) == CLOBBER)
    {
      /* Don't substitute for a register intended as a clobberable
         operand.  */
      rtx reg = XEXP (XVECEXP (PATTERN (i3), 0, i), 0);
      if (rtx_equal_p (reg, dest))
        return 0;

Since insn i2 in the list of i0/i1/i2 as below contains parallel
clobber of dest_of_insn76/use_of_insn77.
   32: r84:SI=0
   76: flags:CC=cmp(r84:SI,0x1)
      REG_DEAD r84:SI
   77: {r84:SI=-ltu(flags:CC,0);clobber flags:CC;}
      REG_DEAD flags:CC
      REG_UNUSED flags:CC

Thanks,
bin
Bin.Cheng Sept. 1, 2014, 4:18 a.m. UTC | #4
On Sat, Aug 30, 2014 at 1:58 PM, Jeff Law <law@redhat.com> wrote:
> On 08/27/14 23:04, Bin.Cheng wrote:
>>
>> On Wed, Aug 27, 2014 at 6:34 PM, Richard Earnshaw <rearnsha@arm.com>
>> wrote:
>>>
>>> On 27/08/14 11:08, Bin Cheng wrote:
>>>>
>>>> Hi
>>>> As reported in bug pr62151, combine pass may wrongly delete necessary
>>>> instruction in function distribute_notes thus leaving register
>>>> uninitialized.  This patch is to fix the issue by checking if i2
>>>> immediately
>>>> modifies the register in REG_DEAD note.  If yes, set tem_insn
>>>> accordingly to
>>>> start finding right place for note distribution from i2.
>>>>
>>>> I once sent a RFC patch at
>>>> https://gcc.gnu.org/ml/gcc-patches/2014-08/msg01718.html, but got no
>>>> comments,  here I added some comments in this patch to make it a formal
>>>> one.
>>>>
>>>>
>>>> I tested the original patch, and will re-test it against the latest code
>>>> later.  So is it OK?  Any comments will be appreciated.
>>>>
>>>
>>> Isn't this the sort of sequence that combinable_i3pat is supposed to
>>> reject?
>>
>> Hi Richard,
>> I think it's not.  combinable_i3pat checks cases in which i1 feeds to
>> i3 directly, while i2 kills reg_use in i1.  For this case the feeding
>> chain is "i0->i1->i2->i3", the combination is valid and beneficial.
>> What needs to be handled is if i2dest is anticipated after i3.  If
>> not, then i2 could be deleted; if yes, i2 should be preserved.
>> Moreover, following constant propagation could delete i2 after
>> propagating the new i2src into i4.  Note combine pass already handles
>> this kind of case using variable added_sets_2 in function try_combine.
>> The problem is in distribute_notes which mis-deleted i2.
>>
>> I added one test case in the updated patch.

Hi Jeff,
Thanks very much for the review.

>
> One could argue that this mess is a result of trying to optimize a reg that
> is set more than once.    Though I guess that might be a bit of a big
> hammer.

As noted by Segher, it's a quite useful optimization which we don't
want to disable.

>
> I haven't thought real hard, but isn't it the case that for a pseudo with
> multiple sets that we never want to move a REG_DEAD note across a set of
> that pseudo?  It would seem that in these cases we want to drop the REG_DEAD
> note completely.
>
> Note that i0..i4 need not be consecutive insns, so you'd have to walk the
> chain from the location with the death note to the proposed death note site.
> If between those locations there's another set of the same pseudo, then drop
> the note.  Since this may be an expensive check it should probably be
> conditionalized on REG_N_SETS (pseudo) > 1

Here is the complicated part.  The from_insn is i1, i2/i3 are the
following instructions.  The original logic seems to me is scanning
from i3 for one insn for distribution of REG_DEAD note in from_insn.
Since the last use is in from_insn, it makes no sense to scan from i3
(after from_insn).  What we need to do is scanning from from_insn in
backward trying to find a place for note distribution.  If we run into
a useless set of the note reg, we can just delete that insn or add
REG_UNUSED to it.  It just seems not right to do this on instructions
after from_insn, which causes the wrong code in this specific case.

Any comments?

Thanks,
bin
Segher Boessenkool Sept. 1, 2014, 11:38 a.m. UTC | #5
On Mon, Sep 01, 2014 at 11:36:07AM +0800, Bin.Cheng wrote:
> > In the testcase (and comment in the proposed patch), why is combine
> > combining four insns at all?  That means it rejected combining just the
> > first three.  Why did it do that?
> It is explicitly reject by below code in can_combine_p.
> 
>   if (GET_CODE (PATTERN (i3)) == PARALLEL)
>     for (i = XVECLEN (PATTERN (i3), 0) - 1; i >= 0; i--)
>       if (GET_CODE (XVECEXP (PATTERN (i3), 0, i)) == CLOBBER)
>     {
>       /* Don't substitute for a register intended as a clobberable
>          operand.  */
>       rtx reg = XEXP (XVECEXP (PATTERN (i3), 0, i), 0);
>       if (rtx_equal_p (reg, dest))
>         return 0;
> 
> Since insn i2 in the list of i0/i1/i2 as below contains parallel
> clobber of dest_of_insn76/use_of_insn77.
>    32: r84:SI=0
>    76: flags:CC=cmp(r84:SI,0x1)
>       REG_DEAD r84:SI
>    77: {r84:SI=-ltu(flags:CC,0);clobber flags:CC;}
>       REG_DEAD flags:CC
>       REG_UNUSED flags:CC

Archaeology suggests this check is because the clobber might be an
earlyclobber.  Which seems silly: how can it be a valid insn at all
in that case?  It seems to me the check can just be removed.  That
will hide your issue, maybe even solve it (but I doubt it).

If not, then combining the four insns (your case that explodes)
should not be allowed either (it's just the same, with a register
copy tucked on the end).  I haven't looked, but a missing can_combine_p
call perhaps?

Another question is why is r84 set twice in the first place?


Segher
Jeff Law Sept. 1, 2014, 4:39 p.m. UTC | #6
On 09/01/14 05:38, Segher Boessenkool wrote:
> On Mon, Sep 01, 2014 at 11:36:07AM +0800, Bin.Cheng wrote:
>>> In the testcase (and comment in the proposed patch), why is combine
>>> combining four insns at all?  That means it rejected combining just the
>>> first three.  Why did it do that?
>> It is explicitly reject by below code in can_combine_p.
>>
>>    if (GET_CODE (PATTERN (i3)) == PARALLEL)
>>      for (i = XVECLEN (PATTERN (i3), 0) - 1; i >= 0; i--)
>>        if (GET_CODE (XVECEXP (PATTERN (i3), 0, i)) == CLOBBER)
>>      {
>>        /* Don't substitute for a register intended as a clobberable
>>           operand.  */
>>        rtx reg = XEXP (XVECEXP (PATTERN (i3), 0, i), 0);
>>        if (rtx_equal_p (reg, dest))
>>          return 0;
>>
>> Since insn i2 in the list of i0/i1/i2 as below contains parallel
>> clobber of dest_of_insn76/use_of_insn77.
>>     32: r84:SI=0
>>     76: flags:CC=cmp(r84:SI,0x1)
>>        REG_DEAD r84:SI
>>     77: {r84:SI=-ltu(flags:CC,0);clobber flags:CC;}
>>        REG_DEAD flags:CC
>>        REG_UNUSED flags:CC
>
> Archaeology suggests this check is because the clobber might be an
> earlyclobber.  Which seems silly: how can it be a valid insn at all
> in that case?  It seems to me the check can just be removed.  That
> will hide your issue, maybe even solve it (but I doubt it).
Silly for other reasons, namely that earlyclobber doesn't come into play 
until after combine (register allocation and later).


>
> Another question is why is r84 set twice in the first place?
Various transformations can set that kind of situation up.  One could 
argue that a local pass ssa-like-ize things like this would be good 
independent of this BZ.  The web code would probably help here, but 
seems awful heavyweight for an opportunity like this.

jeff
Jeff Law Sept. 1, 2014, 4:41 p.m. UTC | #7
On 08/31/14 06:18, Segher Boessenkool wrote:
> On Fri, Aug 29, 2014 at 11:58:37PM -0600, Jeff Law wrote:
>> One could argue that this mess is a result of trying to optimize a reg
>> that is set more than once.    Though I guess that might be a bit of a
>> big hammer.
>
> It works fine in other cases, and is quite beneficial for e.g. optimising
> instruction sequences that set a fixed carry register twice.
How common is that?

While we don't have any formal SSA-like properties in RTL, we're 
certainly better off if we can avoid unnecessary cases where a single 
pseudo is set more than once and these days I wouldn't expect too many 
cases where have multiple sets appearing in a dep chain that can be 
processed by combine (and if we do one could easily argue those dep 
chains should be simplified).

Jeff
Segher Boessenkool Sept. 1, 2014, 5:50 p.m. UTC | #8
On Mon, Sep 01, 2014 at 10:39:10AM -0600, Jeff Law wrote:
> On 09/01/14 05:38, Segher Boessenkool wrote:
> >On Mon, Sep 01, 2014 at 11:36:07AM +0800, Bin.Cheng wrote:
> >>>In the testcase (and comment in the proposed patch), why is combine
> >>>combining four insns at all?  That means it rejected combining just the
> >>>first three.  Why did it do that?
> >>It is explicitly reject by below code in can_combine_p.
> >>
> >>   if (GET_CODE (PATTERN (i3)) == PARALLEL)
> >>     for (i = XVECLEN (PATTERN (i3), 0) - 1; i >= 0; i--)
> >>       if (GET_CODE (XVECEXP (PATTERN (i3), 0, i)) == CLOBBER)
> >>     {
> >>       /* Don't substitute for a register intended as a clobberable
> >>          operand.  */
> >>       rtx reg = XEXP (XVECEXP (PATTERN (i3), 0, i), 0);
> >>       if (rtx_equal_p (reg, dest))
> >>         return 0;
> >>
> >>Since insn i2 in the list of i0/i1/i2 as below contains parallel
> >>clobber of dest_of_insn76/use_of_insn77.
> >>    32: r84:SI=0
> >>    76: flags:CC=cmp(r84:SI,0x1)
> >>       REG_DEAD r84:SI
> >>    77: {r84:SI=-ltu(flags:CC,0);clobber flags:CC;}
> >>       REG_DEAD flags:CC
> >>       REG_UNUSED flags:CC
> >
> >Archaeology suggests this check is because the clobber might be an
> >earlyclobber.  Which seems silly: how can it be a valid insn at all
> >in that case?  It seems to me the check can just be removed.  That
> >will hide your issue, maybe even solve it (but I doubt it).
> Silly for other reasons, namely that earlyclobber doesn't come into play 
> until after combine (register allocation and later).

The last change to this code was by Ulrich (cc:ed); in that thread (June
2004, mostly not threaded in the mail archive, broken MUAs :-( ) it was
said that any clobber should be considered an earlyclobber (an RTL insn
can expand to multiple machine instructions, for example).  But I don't
see how that can matter for "dest" here (the dest of "insn", that's 76
in the example), only for "src".

The version of "flags" set in 76 obviously dies in 77 (it clobbers the
reg after all), but there is no way it could clobber it before it uses
it, that just makes no sense.  And in the combined insn that version of
flags does not exist at all.

> >Another question is why is r84 set twice in the first place?
> Various transformations can set that kind of situation up.

Sure -- but also lazy expanders can reuse a register instead of doing
gen_reg_rtx.  Which is why I asked :-)

> One could 
> argue that a local pass ssa-like-ize things like this would be good 
> independent of this BZ.  The web code would probably help here, but 
> seems awful heavyweight for an opportunity like this.

Not worth the effort I'd say.


Segher
Segher Boessenkool Sept. 1, 2014, 7:15 p.m. UTC | #9
On Mon, Sep 01, 2014 at 10:41:43AM -0600, Jeff Law wrote:
> On 08/31/14 06:18, Segher Boessenkool wrote:
> >On Fri, Aug 29, 2014 at 11:58:37PM -0600, Jeff Law wrote:
> >>One could argue that this mess is a result of trying to optimize a reg
> >>that is set more than once.    Though I guess that might be a bit of a
> >>big hammer.
> >
> >It works fine in other cases, and is quite beneficial for e.g. optimising
> >instruction sequences that set a fixed carry register twice.
> How common is that?

I meant once setting the reg, and then using and clobbering it in another
insn.  This is quite common -- on some targets the add-with-carry insns
are used for scc things too.  You could say all cases where combine can
do something with this should have been optimised earlier, but that is
not the case today.

> While we don't have any formal SSA-like properties in RTL, we're 
> certainly better off if we can avoid unnecessary cases where a single 
> pseudo is set more than once

Note that in this case we're talking about a hard register, not a pseudo.

> and these days I wouldn't expect too many 
> cases where have multiple sets appearing in a dep chain that can be 
> processed by combine (and if we do one could easily argue those dep 
> chains should be simplified).

For pseudos I of course agree with that :-)


Segher
Bin.Cheng Sept. 2, 2014, 2:02 a.m. UTC | #10
On Tue, Sep 2, 2014 at 1:50 AM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> On Mon, Sep 01, 2014 at 10:39:10AM -0600, Jeff Law wrote:
>> On 09/01/14 05:38, Segher Boessenkool wrote:
>> >On Mon, Sep 01, 2014 at 11:36:07AM +0800, Bin.Cheng wrote:
>> >>>In the testcase (and comment in the proposed patch), why is combine
>> >>>combining four insns at all?  That means it rejected combining just the
>> >>>first three.  Why did it do that?
>> >>It is explicitly reject by below code in can_combine_p.
>> >>
>> >>   if (GET_CODE (PATTERN (i3)) == PARALLEL)
>> >>     for (i = XVECLEN (PATTERN (i3), 0) - 1; i >= 0; i--)
>> >>       if (GET_CODE (XVECEXP (PATTERN (i3), 0, i)) == CLOBBER)
>> >>     {
>> >>       /* Don't substitute for a register intended as a clobberable
>> >>          operand.  */
>> >>       rtx reg = XEXP (XVECEXP (PATTERN (i3), 0, i), 0);
>> >>       if (rtx_equal_p (reg, dest))
>> >>         return 0;
>> >>
>> >>Since insn i2 in the list of i0/i1/i2 as below contains parallel
>> >>clobber of dest_of_insn76/use_of_insn77.
>> >>    32: r84:SI=0
>> >>    76: flags:CC=cmp(r84:SI,0x1)
>> >>       REG_DEAD r84:SI
>> >>    77: {r84:SI=-ltu(flags:CC,0);clobber flags:CC;}
>> >>       REG_DEAD flags:CC
>> >>       REG_UNUSED flags:CC
>> >
>> >Archaeology suggests this check is because the clobber might be an
>> >earlyclobber.  Which seems silly: how can it be a valid insn at all
>> >in that case?  It seems to me the check can just be removed.  That
>> >will hide your issue, maybe even solve it (but I doubt it).
>> Silly for other reasons, namely that earlyclobber doesn't come into play
>> until after combine (register allocation and later).
>
> The last change to this code was by Ulrich (cc:ed); in that thread (June
> 2004, mostly not threaded in the mail archive, broken MUAs :-( ) it was
> said that any clobber should be considered an earlyclobber (an RTL insn
> can expand to multiple machine instructions, for example).  But I don't
> see how that can matter for "dest" here (the dest of "insn", that's 76
> in the example), only for "src".
>
> The version of "flags" set in 76 obviously dies in 77 (it clobbers the
> reg after all), but there is no way it could clobber it before it uses
> it, that just makes no sense.  And in the combined insn that version of
> flags does not exist at all.
Agreed, otherwise it would be another uninitialized use problem.
Maybe the check is too strict here?  Do you have some archived page
address for that, just saving us some time for digging.

My only concern is, logic in dictribute_notes should also be revisited
under this BZ.  I think the issue will be hidden by changes we are
talking about in can_combine_p.

Thanks,
bin
>
>> >Another question is why is r84 set twice in the first place?
>> Various transformations can set that kind of situation up.
>
> Sure -- but also lazy expanders can reuse a register instead of doing
> gen_reg_rtx.  Which is why I asked :-)
>
>> One could
>> argue that a local pass ssa-like-ize things like this would be good
>> independent of this BZ.  The web code would probably help here, but
>> seems awful heavyweight for an opportunity like this.
>
> Not worth the effort I'd say.
>
>
> Segher
Jeff Law Sept. 2, 2014, 3:28 a.m. UTC | #11
On 09/01/14 13:15, Segher Boessenkool wrote:
> On Mon, Sep 01, 2014 at 10:41:43AM -0600, Jeff Law wrote:
>> On 08/31/14 06:18, Segher Boessenkool wrote:
>>> On Fri, Aug 29, 2014 at 11:58:37PM -0600, Jeff Law wrote:
>>>> One could argue that this mess is a result of trying to optimize a reg
>>>> that is set more than once.    Though I guess that might be a bit of a
>>>> big hammer.
>>>
>>> It works fine in other cases, and is quite beneficial for e.g. optimising
>>> instruction sequences that set a fixed carry register twice.
>> How common is that?
>
> I meant once setting the reg, and then using and clobbering it in another
> insn.  This is quite common -- on some targets the add-with-carry insns
> are used for scc things too.  You could say all cases where combine can
> do something with this should have been optimised earlier, but that is
> not the case today.
>
>> While we don't have any formal SSA-like properties in RTL, we're
>> certainly better off if we can avoid unnecessary cases where a single
>> pseudo is set more than once
>
> Note that in this case we're talking about a hard register, not a pseudo.
I was referring to r84 in Bin's message, not the condition code 
register.  Unless I missed something it's set at the start of the 
sequence to the value 0, then later to -ltu(flags,cc,0).

There's no good reason I can see why we're reusing a pseudo like that. 
I suspect that if we go back, fix whatever's creating that lame sequence 
and simply reject combinations involving a pseudo set more than once it 
won't affect code in any real way.  If we wanted to be anal about it, 
we'd put in some kind of debugging note and someone could do some wider 
scale testing.

jeff
Jeff Law Sept. 2, 2014, 3:35 a.m. UTC | #12
On 09/01/14 11:50, Segher Boessenkool wrote:
>
>>> Another question is why is r84 set twice in the first place?
>> Various transformations can set that kind of situation up.
>
> Sure -- but also lazy expanders can reuse a register instead of doing
> gen_reg_rtx.  Which is why I asked :-)
Which comes back to my original train of thought.  Fix the handling of 
r84 and disallow the combination when we've got multiple sets of a pseudo.

jeff
Jeff Law Sept. 2, 2014, 3:40 a.m. UTC | #13
On 08/31/14 22:18, Bin.Cheng wrote:
>>
>> Note that i0..i4 need not be consecutive insns, so you'd have to walk the
>> chain from the location with the death note to the proposed death note site.
>> If between those locations there's another set of the same pseudo, then drop
>> the note.  Since this may be an expensive check it should probably be
>> conditionalized on REG_N_SETS (pseudo) > 1
>
> Here is the complicated part.  The from_insn is i1, i2/i3 are the
> following instructions.  The original logic seems to me is scanning
> from i3 for one insn for distribution of REG_DEAD note in from_insn.
> Since the last use is in from_insn, it makes no sense to scan from i3
> (after from_insn).  What we need to do is scanning from from_insn in
> backward trying to find a place for note distribution.  If we run into
> a useless set of the note reg, we can just delete that insn or add
> REG_UNUSED to it.  It just seems not right to do this on instructions
> after from_insn, which causes the wrong code in this specific case.
I wasn't suggesting we add a REG_UNUSED or delete anything.  Merely look 
to see if between the original note's location and new proposed location 
for the note.  If there's another assignment to the same pseudo in that 
range of insns, then simply remove the note.  What happens if you do that?

It seems to me that adding a REG_UNUSED or trying to delete any insns at 
this stage is just a complication we don't really need.

jeff
Bin.Cheng Sept. 2, 2014, 5:14 a.m. UTC | #14
On Tue, Sep 2, 2014 at 11:40 AM, Jeff Law <law@redhat.com> wrote:
> On 08/31/14 22:18, Bin.Cheng wrote:
>>>
>>>
>>> Note that i0..i4 need not be consecutive insns, so you'd have to walk the
>>> chain from the location with the death note to the proposed death note
>>> site.
>>> If between those locations there's another set of the same pseudo, then
>>> drop
>>> the note.  Since this may be an expensive check it should probably be
>>> conditionalized on REG_N_SETS (pseudo) > 1
>>
>>
>> Here is the complicated part.  The from_insn is i1, i2/i3 are the
>> following instructions.  The original logic seems to me is scanning
>> from i3 for one insn for distribution of REG_DEAD note in from_insn.
>> Since the last use is in from_insn, it makes no sense to scan from i3
>> (after from_insn).  What we need to do is scanning from from_insn in
>> backward trying to find a place for note distribution.  If we run into
>> a useless set of the note reg, we can just delete that insn or add
>> REG_UNUSED to it.  It just seems not right to do this on instructions
>> after from_insn, which causes the wrong code in this specific case.
>
> I wasn't suggesting we add a REG_UNUSED or delete anything.  Merely look to
> see if between the original note's location and new proposed location for
> the note.  If there's another assignment to the same pseudo in that range of
> insns, then simply remove the note.  What happens if you do that?
I will do some experiments on this.  If there is no optimizations
depending on the REG_DEAD note following combine pass, I suppose there
is no read effect.

Thanks,
bin
>
> It seems to me that adding a REG_UNUSED or trying to delete any insns at
> this stage is just a complication we don't really need.
>
> jeff
Bin.Cheng Sept. 2, 2014, 5:17 a.m. UTC | #15
On Tue, Sep 2, 2014 at 11:28 AM, Jeff Law <law@redhat.com> wrote:
> On 09/01/14 13:15, Segher Boessenkool wrote:
>>
>> On Mon, Sep 01, 2014 at 10:41:43AM -0600, Jeff Law wrote:
>>>
>>> On 08/31/14 06:18, Segher Boessenkool wrote:
>>>>
>>>> On Fri, Aug 29, 2014 at 11:58:37PM -0600, Jeff Law wrote:
>>>>>
>>>>> One could argue that this mess is a result of trying to optimize a reg
>>>>> that is set more than once.    Though I guess that might be a bit of a
>>>>> big hammer.
>>>>
>>>>
>>>> It works fine in other cases, and is quite beneficial for e.g.
>>>> optimising
>>>> instruction sequences that set a fixed carry register twice.
>>>
>>> How common is that?
>>
>>
>> I meant once setting the reg, and then using and clobbering it in another
>> insn.  This is quite common -- on some targets the add-with-carry insns
>> are used for scc things too.  You could say all cases where combine can
>> do something with this should have been optimised earlier, but that is
>> not the case today.
>>
>>> While we don't have any formal SSA-like properties in RTL, we're
>>> certainly better off if we can avoid unnecessary cases where a single
>>> pseudo is set more than once
>>
>>
>> Note that in this case we're talking about a hard register, not a pseudo.
>
> I was referring to r84 in Bin's message, not the condition code register.
> Unless I missed something it's set at the start of the sequence to the value
> 0, then later to -ltu(flags,cc,0).
>
> There's no good reason I can see why we're reusing a pseudo like that. I
> suspect that if we go back, fix whatever's creating that lame sequence and
> simply reject combinations involving a pseudo set more than once it won't
> affect code in any real way.  If we wanted to be anal about it, we'd put in
> some kind of debugging note and someone could do some wider scale testing.

For this specific case, I think the reuse of r84 comes from coalescing
during expanding, and this is necessary to remove redundant reg-moves.
Then we need to fix this in coming passes?

Thanks,
bin
>
> jeff
>
Segher Boessenkool Sept. 2, 2014, 11:39 a.m. UTC | #16
On Tue, Sep 02, 2014 at 10:02:38AM +0800, Bin.Cheng wrote:
> >> >Archaeology suggests this check is because the clobber might be an
> >> >earlyclobber.  Which seems silly: how can it be a valid insn at all
> >> >in that case?  It seems to me the check can just be removed.  That
> >> >will hide your issue, maybe even solve it (but I doubt it).
> >> Silly for other reasons, namely that earlyclobber doesn't come into play
> >> until after combine (register allocation and later).
> >
> > The last change to this code was by Ulrich (cc:ed); in that thread (June
> > 2004, mostly not threaded in the mail archive, broken MUAs :-( ) it was
> > said that any clobber should be considered an earlyclobber (an RTL insn
> > can expand to multiple machine instructions, for example).  But I don't
> > see how that can matter for "dest" here (the dest of "insn", that's 76
> > in the example), only for "src".
> >
> > The version of "flags" set in 76 obviously dies in 77 (it clobbers the
> > reg after all), but there is no way it could clobber it before it uses
> > it, that just makes no sense.  And in the combined insn that version of
> > flags does not exist at all.
> Agreed, otherwise it would be another uninitialized use problem.
> Maybe the check is too strict here?  Do you have some archived page
> address for that, just saving us some time for digging.

http://gcc.gnu.org/ml/gcc-patches/2004-06/msg00994.html
(and look in that month's archives for the rest of the messages).

> My only concern is, logic in dictribute_notes should also be revisited
> under this BZ.  I think the issue will be hidden by changes we are
> talking about in can_combine_p.

Yes.  Unless we disallow all combinations that *would* cause problems :-)


Segher
Segher Boessenkool Sept. 2, 2014, 11:50 a.m. UTC | #17
On Mon, Sep 01, 2014 at 09:28:09PM -0600, Jeff Law wrote:
> >Note that in this case we're talking about a hard register, not a pseudo.
> I was referring to r84 in Bin's message, not the condition code 
> register.  Unless I missed something it's set at the start of the 
> sequence to the value 0, then later to -ltu(flags,cc,0).

Bin said that the three-insn combination is refused because of the flags
register, not r84.  So either the four-insn combination should do those
same checks, or we should allow it, or both.

> There's no good reason I can see why we're reusing a pseudo like that. 
> I suspect that if we go back, fix whatever's creating that lame sequence 
> and simply reject combinations involving a pseudo set more than once it 
> won't affect code in any real way.  If we wanted to be anal about it, 
> we'd put in some kind of debugging note and someone could do some wider 
> scale testing.

All that, too :-)  Although it all seems to work fine for two-insn and
three-insn combinations.


Segher
Segher Boessenkool Sept. 2, 2014, 1:40 p.m. UTC | #18
On Tue, Sep 02, 2014 at 02:10:32PM +0200, Ulrich Weigand wrote:
> In any case, this test in can_combine_p rejects a combination for *two*
> different issues.  One is the earlyclobber problem, which is what that
> 2004 thread was about, and which my patch back then relaxed for fixed
> hard register.
> 
> However, this doesn't seem to apply to the example above; that is really
> about the second problem: don't substitute into a clobber.

Right.

> I understand the reason why this particular substitution is rejected is
> simply that if it weren't, we'd be substituting flags:CC=cmp(r84:SI,0x1)
> into clobber flags:CC, resulting in clobber cmp(r84:SI,0x1), which is
> invalid RTL.

I checked, and that is indeed what combine does.  How silly.

> Now I guess this check could be relaxed if somewhere else in combine we'd
> recognize the substitution into a clobber and simply omit it in that case.

Yeah.

In the testcase, combine tries combining 76,77 (77 is that clobbering
insn) and refuses it; then it tries 32,76,77 and refuses it; and then
it tries 32,76,77,43 and allows it (it doesn't do this check at all,
77 is not i3, combine omits the clobber completely).  Which is inconsistent.

What a mess.  Thanks for looking!


Segher
Bin.Cheng Sept. 3, 2014, 2:34 a.m. UTC | #19
On Tue, Sep 2, 2014 at 9:40 PM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> On Tue, Sep 02, 2014 at 02:10:32PM +0200, Ulrich Weigand wrote:
>> In any case, this test in can_combine_p rejects a combination for *two*
>> different issues.  One is the earlyclobber problem, which is what that
>> 2004 thread was about, and which my patch back then relaxed for fixed
>> hard register.
>>
>> However, this doesn't seem to apply to the example above; that is really
>> about the second problem: don't substitute into a clobber.
>
> Right.
>
>> I understand the reason why this particular substitution is rejected is
>> simply that if it weren't, we'd be substituting flags:CC=cmp(r84:SI,0x1)
>> into clobber flags:CC, resulting in clobber cmp(r84:SI,0x1), which is
>> invalid RTL.
>
> I checked, and that is indeed what combine does.  How silly.
>
>> Now I guess this check could be relaxed if somewhere else in combine we'd
>> recognize the substitution into a clobber and simply omit it in that case.
>
> Yeah.
>
> In the testcase, combine tries combining 76,77 (77 is that clobbering
> insn) and refuses it; then it tries 32,76,77 and refuses it; and then
> it tries 32,76,77,43 and allows it (it doesn't do this check at all,
> 77 is not i3, combine omits the clobber completely).  Which is inconsistent.

I guess it makes sense because this way it doesn't introduce any
invalid instructions.  But yes, how combine handles the clobber in
this way may help combine the three instructions?

Thanks,
bin
>
> What a mess.  Thanks for looking!
>
>
> Segher
Segher Boessenkool Sept. 3, 2014, 1:20 p.m. UTC | #20
On Wed, Sep 03, 2014 at 10:34:39AM +0800, Bin.Cheng wrote:
> >> Now I guess this check could be relaxed if somewhere else in combine we'd
> >> recognize the substitution into a clobber and simply omit it in that case.
> >
> > Yeah.
> >
> > In the testcase, combine tries combining 76,77 (77 is that clobbering
> > insn) and refuses it; then it tries 32,76,77 and refuses it; and then
> > it tries 32,76,77,43 and allows it (it doesn't do this check at all,
> > 77 is not i3, combine omits the clobber completely).  Which is inconsistent.
> 
> I guess it makes sense because this way it doesn't introduce any
> invalid instructions.  But yes, how combine handles the clobber in
> this way may help combine the three instructions?

Combine could just throw away all clobbers on i3 and add them back if
wanted by the combined insn.  I think it does things the way it does so
that it creates slightly less garbage RTL and/or it is a little less work.
But it does disallow this case; easily fixed, have patch, will post in a
minute.

That will hide your problem, but not really fix it I think.  Maybe we
need to disallow all combos that set the same register twice in four-insn
combinations.  Or at least when that register does not die.  Why is this
combination tried anyway?  r84 (as set in 77, used in 43) does not die,
so this is just doing a very expensive very limited form of un-cse?


Segher
Jeff Law Sept. 5, 2014, 4:13 a.m. UTC | #21
On 09/01/14 23:14, Bin.Cheng wrote:
> On Tue, Sep 2, 2014 at 11:40 AM, Jeff Law <law@redhat.com> wrote:
>> On 08/31/14 22:18, Bin.Cheng wrote:
>>>>
>>>>
>>>> Note that i0..i4 need not be consecutive insns, so you'd have to walk the
>>>> chain from the location with the death note to the proposed death note
>>>> site.
>>>> If between those locations there's another set of the same pseudo, then
>>>> drop
>>>> the note.  Since this may be an expensive check it should probably be
>>>> conditionalized on REG_N_SETS (pseudo) > 1
>>>
>>>
>>> Here is the complicated part.  The from_insn is i1, i2/i3 are the
>>> following instructions.  The original logic seems to me is scanning
>>> from i3 for one insn for distribution of REG_DEAD note in from_insn.
>>> Since the last use is in from_insn, it makes no sense to scan from i3
>>> (after from_insn).  What we need to do is scanning from from_insn in
>>> backward trying to find a place for note distribution.  If we run into
>>> a useless set of the note reg, we can just delete that insn or add
>>> REG_UNUSED to it.  It just seems not right to do this on instructions
>>> after from_insn, which causes the wrong code in this specific case.
>>
>> I wasn't suggesting we add a REG_UNUSED or delete anything.  Merely look to
>> see if between the original note's location and new proposed location for
>> the note.  If there's another assignment to the same pseudo in that range of
>> insns, then simply remove the note.  What happens if you do that?
> I will do some experiments on this.  If there is no optimizations
> depending on the REG_DEAD note following combine pass, I suppose there
> is no read effect.
In the case we're talking about, as I understand it, the note no longer 
has any useful meaning.

In fact, I can't convince myself there is any valid place to put the 
note in the general case and trying to do so is really just a waste of 
time.  Or to look at it another way, if we were to conceptually throw 
away all the REG_DEAD notes, then recreate them from scratch after this 
insn combination that there'd be one less REG_DEAD note when they were 
rebuilt.

Jeff
Jeff Law Sept. 5, 2014, 4:17 a.m. UTC | #22
On 09/01/14 23:17, Bin.Cheng wrote:
> For this specific case, I think the reuse of r84 comes from coalescing
> during expanding, and this is necessary to remove redundant reg-moves.
> Then we need to fix this in coming passes?
Hmmm.  Yea, I can see how that might be happening.  There's a certain 
inherent tension between coalescing aggressively to remove reg moves and 
leaving the reg moves to avoid false dependencies like this.

I wonder if there's a less aggressive approach where we somehow say "I 
don't care about coalescing when the result is set, used & dies all 
within a basic block.  But that's probably out of scope for this problem.

jeff
diff mbox

Patch

Index: gcc/testsuite/gcc.c-torture/execute/pr62151.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/pr62151.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/pr62151.c	(revision 0)
@@ -0,0 +1,41 @@ 
+/* PR rtl-optimization/62151 */
+
+int a, c, d, e, f, g, h, i;
+short b;
+
+int
+fn1 ()
+{
+  b = 0;
+  for (;;)
+    {
+      int j[2];
+      j[f] = 0;
+      if (h)
+	d = 0;
+      else
+	{
+	  for (; f; f++)
+	    ;
+	  for (a = 0; a < 1; a++)
+	    for (;;)
+	      {
+		i = b & ((b ^ 1) & 83647) ? b : b - 1;
+		g = 1 ? i : 0;
+		e = j[0];
+		if (c)
+		  break;
+		return 0;
+	      }
+	}
+    }
+}
+
+int
+main ()
+{
+  fn1 ();
+  if (g != -1) 
+    __builtin_abort (); 
+  return 0;
+}
Index: gcc/combine.c
===================================================================
--- gcc/combine.c	(revision 214413)
+++ gcc/combine.c	(working copy)
@@ -13499,7 +13499,38 @@  distribute_notes (rtx notes, rtx_insn *from_insn,
 		       || rtx_equal_p (XEXP (note, 0), elim_i1)
 		       || rtx_equal_p (XEXP (note, 0), elim_i0))
 		break;
-	      tem_insn = i3;
+
+	      /* See PR62151.
+		 It's possible to have below situation:
+		   i0: r1 <- const_0
+		   i1: r2 <- r1 op_1 const_1
+		       REG_DEAD r1
+		   i2: r1 <- r2 op_2 const_2
+		       REG_DEAD r2
+		   i3: r3 <- r1
+		   i4: r4 <- r1
+
+		 It is transformed into below code before distributing
+		 the REG_DEAD note in i1:
+		   i0: NOTE_INSN_DELETED
+		   i1: NOTE_INSN_DELETED
+		   i2: r1 <- const_combined
+		   i3: r3 <- const_combined
+		   i4: r4 <- r1
+
+		 We need to check if i2 immediately modifies r1 otherwise
+		 i2 would be deleted by below code when distributing
+		 REG_DEAD note, leaving r1 in i4 uninitialied.
+
+		 We set TEM_INSN to i2 for this case indicating that we
+		 need to find right place for distribution from i2.
+		 */
+	      if (from_insn && i2
+		  && from_insn != i2 && from_insn != i3
+		  && reg_set_p (XEXP (note, 0), PATTERN (i2)))
+		tem_insn = i2;
+	      else
+		tem_insn = i3;
 	    }
 
 	  if (place == 0)