[PR,tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)
diff mbox

Message ID 39c1ceff-5584-210c-6ebc-12e8d234cffb@redhat.com
State New
Headers show

Commit Message

Aldy Hernandez Jan. 23, 2017, 5:26 p.m. UTC
On 01/18/2017 10:10 AM, Richard Biener wrote:
> On Fri, Jan 13, 2017 at 7:48 PM, Aldy Hernandez <aldyh@redhat.com> wrote:

Hi Richard.

I'd be happy to provide a patch, but could you please elaborate, as I'm 
afraid I'm not following.

>> We could go back to my original, no caching version (with the other
>> suggestions).  That solves the problem quite simply, with no regressions.
>
> So let's go with a unswitching-local solution then.  Based on
> tree_may_unswitch_on:

What do you mean by unswitching-local?

>
>   /* Condition must be invariant.  */
>   FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
>     {
>       /* Unswitching on undefined values would introduce undefined
>          behavior that the original program might never exercise.  */
>       if (ssa_undefined_value_p (use, true))
>         return NULL_TREE;
>       def = SSA_NAME_DEF_STMT (use);
>       def_bb = gimple_bb (def);
>       if (def_bb
>           && flow_bb_inside_loop_p (loop, def_bb))
>         return NULL_TREE;
>
> we only have to look for uses in blocks dominating the loop header block
> (or in blocks post-dominating that loop header, but we can probably implement
> that by simply including the loop header itself with a FIXME comment).

Look for *uses*??  Do you mean definitions?  I mean, we're trying to 
figure out whether we are going to unswitch on a use that is inside a 
the loop, not before or after. So perhaps we only care about 
*definitions* (SSA_NAME_DEF_STMT) dominating the loop header.

>
> Note that we only need to know whether a BB will be always executed when
> the loop is entered -- there's "infrastructure" elsewhere that computes this
> w/o the need of post-dominance.  For example see fill_always_executed_in_1
> tree-ssa-loop-im.c (we can't use that 1:1 I think because we already use ->aux
> via the original copy tables, but we could simplify it as we're only
> interested in
> the loop which preheader we put the unswitch condition on so we can use
> a bitmap to record whether a block of the loop is always executed or not).

I don't see any use of ->aux within loop unswitching, so perhaps no 
adjustment is needed?  I verified this by successfully bootstrapping with:

dominance info (or post-dominance info) originally was because we 
couldn't depend on dominance info being kept up to date between 
executions of tree_unswitch_single_loop(), which BTW, runs recursively.

> Can you send a patch that does this maybe?

Sure, when I understand what you suggest ;-).

Again, thanks for your input here.
Aldy

Comments

Richard Biener Jan. 24, 2017, 12:23 p.m. UTC | #1
On Mon, Jan 23, 2017 at 6:26 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> On 01/18/2017 10:10 AM, Richard Biener wrote:
>>
>> On Fri, Jan 13, 2017 at 7:48 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>
>
> Hi Richard.
>
> I'd be happy to provide a patch, but could you please elaborate, as I'm
> afraid I'm not following.
>
>>> We could go back to my original, no caching version (with the other
>>> suggestions).  That solves the problem quite simply, with no regressions.
>>
>>
>> So let's go with a unswitching-local solution then.  Based on
>> tree_may_unswitch_on:
>
>
> What do you mean by unswitching-local?

Like your original patch, not adding new generic infrastructure
outside of tree-ssa-unswitch.c.

>>
>>   /* Condition must be invariant.  */
>>   FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
>>     {
>>       /* Unswitching on undefined values would introduce undefined
>>          behavior that the original program might never exercise.  */
>>       if (ssa_undefined_value_p (use, true))
>>         return NULL_TREE;
>>       def = SSA_NAME_DEF_STMT (use);
>>       def_bb = gimple_bb (def);
>>       if (def_bb
>>           && flow_bb_inside_loop_p (loop, def_bb))
>>         return NULL_TREE;
>>
>> we only have to look for uses in blocks dominating the loop header block
>> (or in blocks post-dominating that loop header, but we can probably
>> implement
>> that by simply including the loop header itself with a FIXME comment).
>
>
> Look for *uses*??  Do you mean definitions?  I mean, we're trying to figure
> out whether we are going to unswitch on a use that is inside a the loop, not
> before or after. So perhaps we only care about *definitions*
> (SSA_NAME_DEF_STMT) dominating the loop header.

We're looking for stmts using the 'use's in the above loop on a path that is
always executed when the loop is entered.  So we're not introducing a
use of a possibly undefined value.

Of course we can also prove that 'use' is in fact not undefined looking at
its defs which are obviously always dominating the loop header if the
condition satisfies tree_may_unswitch_on (non-dominating defs will
have uses in PHIs dominating  the header which we have to treat
conservatively of course).

>
>>
>> Note that we only need to know whether a BB will be always executed when
>> the loop is entered -- there's "infrastructure" elsewhere that computes
>> this
>> w/o the need of post-dominance.  For example see fill_always_executed_in_1
>> tree-ssa-loop-im.c (we can't use that 1:1 I think because we already use
>> ->aux
>> via the original copy tables, but we could simplify it as we're only
>> interested in
>> the loop which preheader we put the unswitch condition on so we can use
>> a bitmap to record whether a block of the loop is always executed or not).
>
>
> I don't see any use of ->aux within loop unswitching, so perhaps no
> adjustment is needed?  I verified this by successfully bootstrapping with:
>
> diff --git a/gcc/tree-ssa-loop-unswitch.c b/gcc/tree-ssa-loop-unswitch.c
> index 92599fb..774d6bf 100644
> --- a/gcc/tree-ssa-loop-unswitch.c
> +++ b/gcc/tree-ssa-loop-unswitch.c
> @@ -94,6 +94,14 @@ tree_ssa_unswitch_loops (void)
>    struct loop *loop;
>    bool changed = false;
>
> +  basic_block bb;
> +  FOR_ALL_BB_FN (bb, cfun)
> +    if (bb->aux)
> +      {
> +       gcc_unreachable ();
> +      }
>
> Furthermore, you say that we have this "infrastructure" without the need for
> post-dominance.  But we still need dominance info.  The function
> fill_always_execute_in_1 uses CDI_DOMINATORS both inside said function, and
> throughout its dependencies.  I thought the point of pre-calculating
> dominance info (or post-dominance info) originally was because we couldn't
> depend on dominance info being kept up to date between executions of
> tree_unswitch_single_loop(), which BTW, runs recursively.

dominators are kept up-to-date within cfg manipulation routines and unswitching
already uses them.

So if you just try to prove 'use' is defined you don't even need
dominators.  But
that misses cases like

int x;
foo ()
{
  for (;;)
    {
       if (x == 5)
         ...;
    }
}

where unswitching is valid because x is always used when the loop is entered.
Similar

int x, a[10];
foo (int c)
{
  foo (x);
  for (i=0;i<c;++i)
    {
       if (a[c])
         break;
       if (x == 5)
         ...;
    }
}

even though the condition is not always executed if the loop is entered.

>> Can you send a patch that does this maybe?
>
>
> Sure, when I understand what you suggest ;-).
>
> Again, thanks for your input here.
> Aldy
>
Aldy Hernandez Jan. 26, 2017, 12:04 p.m. UTC | #2
On 01/24/2017 07:23 AM, Richard Biener wrote:
> On Mon, Jan 23, 2017 at 6:26 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>> On 01/18/2017 10:10 AM, Richard Biener wrote:
>>>
>>> On Fri, Jan 13, 2017 at 7:48 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>>
>>
>> Hi Richard.
>>
>> I'd be happy to provide a patch, but could you please elaborate, as I'm
>> afraid I'm not following.
>>
>>>> We could go back to my original, no caching version (with the other
>>>> suggestions).  That solves the problem quite simply, with no regressions.
>>>
>>>
>>> So let's go with a unswitching-local solution then.  Based on
>>> tree_may_unswitch_on:
>>
>>
>> What do you mean by unswitching-local?
>
> Like your original patch, not adding new generic infrastructure
> outside of tree-ssa-unswitch.c.
>
>>>
>>>   /* Condition must be invariant.  */
>>>   FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
>>>     {
>>>       /* Unswitching on undefined values would introduce undefined
>>>          behavior that the original program might never exercise.  */
>>>       if (ssa_undefined_value_p (use, true))
>>>         return NULL_TREE;
>>>       def = SSA_NAME_DEF_STMT (use);
>>>       def_bb = gimple_bb (def);
>>>       if (def_bb
>>>           && flow_bb_inside_loop_p (loop, def_bb))
>>>         return NULL_TREE;
>>>
>>> we only have to look for uses in blocks dominating the loop header block
>>> (or in blocks post-dominating that loop header, but we can probably
>>> implement
>>> that by simply including the loop header itself with a FIXME comment).
>>
>>
>> Look for *uses*??  Do you mean definitions?  I mean, we're trying to figure
>> out whether we are going to unswitch on a use that is inside a the loop, not
>> before or after. So perhaps we only care about *definitions*
>> (SSA_NAME_DEF_STMT) dominating the loop header.
>
> We're looking for stmts using the 'use's in the above loop on a path that is
> always executed when the loop is entered.  So we're not introducing a
> use of a possibly undefined value.
>
> Of course we can also prove that 'use' is in fact not undefined looking at
> its defs which are obviously always dominating the loop header if the
> condition satisfies tree_may_unswitch_on (non-dominating defs will
> have uses in PHIs dominating  the header which we have to treat
> conservatively of course).
>
>>
>>>
>>> Note that we only need to know whether a BB will be always executed when
>>> the loop is entered -- there's "infrastructure" elsewhere that computes
>>> this
>>> w/o the need of post-dominance.  For example see fill_always_executed_in_1
>>> tree-ssa-loop-im.c (we can't use that 1:1 I think because we already use
>>> ->aux
>>> via the original copy tables, but we could simplify it as we're only
>>> interested in
>>> the loop which preheader we put the unswitch condition on so we can use
>>> a bitmap to record whether a block of the loop is always executed or not).
>>
>>
>> I don't see any use of ->aux within loop unswitching, so perhaps no
>> adjustment is needed?  I verified this by successfully bootstrapping with:
>>
>> diff --git a/gcc/tree-ssa-loop-unswitch.c b/gcc/tree-ssa-loop-unswitch.c
>> index 92599fb..774d6bf 100644
>> --- a/gcc/tree-ssa-loop-unswitch.c
>> +++ b/gcc/tree-ssa-loop-unswitch.c
>> @@ -94,6 +94,14 @@ tree_ssa_unswitch_loops (void)
>>    struct loop *loop;
>>    bool changed = false;
>>
>> +  basic_block bb;
>> +  FOR_ALL_BB_FN (bb, cfun)
>> +    if (bb->aux)
>> +      {
>> +       gcc_unreachable ();
>> +      }
>>
>> Furthermore, you say that we have this "infrastructure" without the need for
>> post-dominance.  But we still need dominance info.  The function
>> fill_always_execute_in_1 uses CDI_DOMINATORS both inside said function, and
>> throughout its dependencies.  I thought the point of pre-calculating
>> dominance info (or post-dominance info) originally was because we couldn't
>> depend on dominance info being kept up to date between executions of
>> tree_unswitch_single_loop(), which BTW, runs recursively.
>
> dominators are kept up-to-date within cfg manipulation routines and unswitching
> already uses them.
>
> So if you just try to prove 'use' is defined you don't even need
> dominators.  But
> that misses cases like
>
> int x;
> foo ()
> {
>   for (;;)
>     {
>        if (x == 5)
>          ...;
>     }
> }
>
> where unswitching is valid because x is always used when the loop is entered.
> Similar
>
> int x, a[10];
> foo (int c)
> {
>   foo (x);
>   for (i=0;i<c;++i)
>     {
>        if (a[c])
>          break;
>        if (x == 5)
>          ...;
>     }
> }
>
> even though the condition is not always executed if the loop is entered.

I don't think fill_always_execute_in_1 gives us what (I think) you want. 
   For the attached graph, fill_always_execute_in_1 only marks the 
following basic blocks:

bb 4 in loop 2
bb 6 in loop 3
bb 9 in loop 1

Which are basically the loop headers.  Unless I'm misunderstanding 
something, this is useless for the problem at hand.

For example, for loop 3, I would've expected bb5 to be marked as always 
executed if we enter loop 3 since it is its immediate predecessor. 
Similarly for loop 2.  I would expect bb3 to be marked as always being 
executed if we enter loop 2.  And bb10 always enters loop 1.

Also, in other testcases, for loops where the flow loops back to the 
loop-header, we don't even get the loop header marked as always executed 
in the loop.

I still think my initial on-demand approach is straightforward, 
conservative, and solves the problem at hand.  Since I think I'm either 
misunderstanding here, or spending way too much time on this (I think 
this is the 3rd or 4th approach if we count Jeff's original one), I may 
have to respectfully put this back on the queue for someone else to tackle.

Aldy
Richard Biener Jan. 26, 2017, 12:29 p.m. UTC | #3
On Thu, Jan 26, 2017 at 1:04 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> On 01/24/2017 07:23 AM, Richard Biener wrote:
>>
>> On Mon, Jan 23, 2017 at 6:26 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>>>
>>> On 01/18/2017 10:10 AM, Richard Biener wrote:
>>>>
>>>>
>>>> On Fri, Jan 13, 2017 at 7:48 PM, Aldy Hernandez <aldyh@redhat.com>
>>>> wrote:
>>>
>>>
>>>
>>> Hi Richard.
>>>
>>> I'd be happy to provide a patch, but could you please elaborate, as I'm
>>> afraid I'm not following.
>>>
>>>>> We could go back to my original, no caching version (with the other
>>>>> suggestions).  That solves the problem quite simply, with no
>>>>> regressions.
>>>>
>>>>
>>>>
>>>> So let's go with a unswitching-local solution then.  Based on
>>>> tree_may_unswitch_on:
>>>
>>>
>>>
>>> What do you mean by unswitching-local?
>>
>>
>> Like your original patch, not adding new generic infrastructure
>> outside of tree-ssa-unswitch.c.
>>
>>>>
>>>>   /* Condition must be invariant.  */
>>>>   FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
>>>>     {
>>>>       /* Unswitching on undefined values would introduce undefined
>>>>          behavior that the original program might never exercise.  */
>>>>       if (ssa_undefined_value_p (use, true))
>>>>         return NULL_TREE;
>>>>       def = SSA_NAME_DEF_STMT (use);
>>>>       def_bb = gimple_bb (def);
>>>>       if (def_bb
>>>>           && flow_bb_inside_loop_p (loop, def_bb))
>>>>         return NULL_TREE;
>>>>
>>>> we only have to look for uses in blocks dominating the loop header block
>>>> (or in blocks post-dominating that loop header, but we can probably
>>>> implement
>>>> that by simply including the loop header itself with a FIXME comment).
>>>
>>>
>>>
>>> Look for *uses*??  Do you mean definitions?  I mean, we're trying to
>>> figure
>>> out whether we are going to unswitch on a use that is inside a the loop,
>>> not
>>> before or after. So perhaps we only care about *definitions*
>>> (SSA_NAME_DEF_STMT) dominating the loop header.
>>
>>
>> We're looking for stmts using the 'use's in the above loop on a path that
>> is
>> always executed when the loop is entered.  So we're not introducing a
>> use of a possibly undefined value.
>>
>> Of course we can also prove that 'use' is in fact not undefined looking at
>> its defs which are obviously always dominating the loop header if the
>> condition satisfies tree_may_unswitch_on (non-dominating defs will
>> have uses in PHIs dominating  the header which we have to treat
>> conservatively of course).
>>
>>>
>>>>
>>>> Note that we only need to know whether a BB will be always executed when
>>>> the loop is entered -- there's "infrastructure" elsewhere that computes
>>>> this
>>>> w/o the need of post-dominance.  For example see
>>>> fill_always_executed_in_1
>>>> tree-ssa-loop-im.c (we can't use that 1:1 I think because we already use
>>>> ->aux
>>>> via the original copy tables, but we could simplify it as we're only
>>>> interested in
>>>> the loop which preheader we put the unswitch condition on so we can use
>>>> a bitmap to record whether a block of the loop is always executed or
>>>> not).
>>>
>>>
>>>
>>> I don't see any use of ->aux within loop unswitching, so perhaps no
>>> adjustment is needed?  I verified this by successfully bootstrapping
>>> with:
>>>
>>> diff --git a/gcc/tree-ssa-loop-unswitch.c b/gcc/tree-ssa-loop-unswitch.c
>>> index 92599fb..774d6bf 100644
>>> --- a/gcc/tree-ssa-loop-unswitch.c
>>> +++ b/gcc/tree-ssa-loop-unswitch.c
>>> @@ -94,6 +94,14 @@ tree_ssa_unswitch_loops (void)
>>>    struct loop *loop;
>>>    bool changed = false;
>>>
>>> +  basic_block bb;
>>> +  FOR_ALL_BB_FN (bb, cfun)
>>> +    if (bb->aux)
>>> +      {
>>> +       gcc_unreachable ();
>>> +      }
>>>
>>> Furthermore, you say that we have this "infrastructure" without the need
>>> for
>>> post-dominance.  But we still need dominance info.  The function
>>> fill_always_execute_in_1 uses CDI_DOMINATORS both inside said function,
>>> and
>>> throughout its dependencies.  I thought the point of pre-calculating
>>> dominance info (or post-dominance info) originally was because we
>>> couldn't
>>> depend on dominance info being kept up to date between executions of
>>> tree_unswitch_single_loop(), which BTW, runs recursively.
>>
>>
>> dominators are kept up-to-date within cfg manipulation routines and
>> unswitching
>> already uses them.
>>
>> So if you just try to prove 'use' is defined you don't even need
>> dominators.  But
>> that misses cases like
>>
>> int x;
>> foo ()
>> {
>>   for (;;)
>>     {
>>        if (x == 5)
>>          ...;
>>     }
>> }
>>
>> where unswitching is valid because x is always used when the loop is
>> entered.
>> Similar
>>
>> int x, a[10];
>> foo (int c)
>> {
>>   foo (x);
>>   for (i=0;i<c;++i)
>>     {
>>        if (a[c])
>>          break;
>>        if (x == 5)
>>          ...;
>>     }
>> }
>>
>> even though the condition is not always executed if the loop is entered.
>
>
> I don't think fill_always_execute_in_1 gives us what (I think) you want.
> For the attached graph, fill_always_execute_in_1 only marks the following
> basic blocks:
>
> bb 4 in loop 2
> bb 6 in loop 3
> bb 9 in loop 1
>
> Which are basically the loop headers.  Unless I'm misunderstanding
> something, this is useless for the problem at hand.
>
> For example, for loop 3, I would've expected bb5 to be marked as always
> executed if we enter loop 3 since it is its immediate predecessor. Similarly
> for loop 2.  I would expect bb3 to be marked as always being executed if we
> enter loop 2.  And bb10 always enters loop 1.

It only marks blocks inside loops, for out-of-loop blocks you can use dominance
info.  With fill_always_executed_in we want to handle

  header:
    if (x)
      ...not exit
  merge:
    if (y)

where we can unswitch on if (y) because if we enter the loop the condition is
always executed.  if the if (x) were guarding loop exit we couldn't do this.
This is basically computing dominated_by_p (CDI_POST_DOMINATORS,
loop-header, block-with-unswitch-condition).

> Also, in other testcases, for loops where the flow loops back to the
> loop-header, we don't even get the loop header marked as always executed in
> the loop.
>
> I still think my initial on-demand approach is straightforward,
> conservative, and solves the problem at hand.  Since I think I'm either
> misunderstanding here, or spending way too much time on this (I think this
> is the 3rd or 4th approach if we count Jeff's original one), I may have to
> respectfully put this back on the queue for someone else to tackle.

Your initial on-demand approach is fine to catch some of the cases but it
will not handle those for which we'd need post-dominance.

I guess we can incrementally add that.

Richard.

> Aldy

Patch
diff mbox

diff --git a/gcc/tree-ssa-loop-unswitch.c b/gcc/tree-ssa-loop-unswitch.c
index 92599fb..774d6bf 100644
--- a/gcc/tree-ssa-loop-unswitch.c
+++ b/gcc/tree-ssa-loop-unswitch.c
@@ -94,6 +94,14 @@  tree_ssa_unswitch_loops (void)
    struct loop *loop;
    bool changed = false;

+  basic_block bb;
+  FOR_ALL_BB_FN (bb, cfun)
+    if (bb->aux)
+      {
+       gcc_unreachable ();
+      }

Furthermore, you say that we have this "infrastructure" without the need 
for post-dominance.  But we still need dominance info.  The function 
fill_always_execute_in_1 uses CDI_DOMINATORS both inside said function, 
and throughout its dependencies.  I thought the point of pre-calculating