diff mbox

Enhance ifcombine to recover non short circuit branches

Message ID CA+=Sn1k14xfavt02=1G4RWjdHC6BCvMpG0TrO5FKUEbxs6gkkA@mail.gmail.com
State New
Headers show

Commit Message

Andrew Pinski Oct. 27, 2013, 6:55 p.m. UTC
On Sat, Oct 26, 2013 at 4:49 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Sat, Oct 26, 2013 at 2:30 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>> On Fri, Oct 18, 2013 at 2:21 AM, Zhenqiang Chen
>> <zhenqiang.chen@linaro.org> wrote:
>>> On 18 October 2013 00:58, Jeff Law <law@redhat.com> wrote:
>>>> On 10/17/13 05:03, Richard Biener wrote:
>>>>>>>
>>>>>>> Is it OK for trunk?
>>>>>>
>>>>>>
>>>>>> I had a much simpler change which did basically the same from 4.7 (I
>>>>>> can update it if people think this is a better approach).
>>>>>
>>>>>
>>>>> I like that more (note you can now use is_gimple_condexpr as predicate
>>>>> for force_gimple_operand).
>>>>
>>>> The obvious question is whether or not Andrew's simpler change picks up as
>>>> many transformations as Zhenqiang's change.  If not are the things missed
>>>> important.
>>>>
>>>> Zhenqiang, can you do some testing of your change vs Andrew P.'s change?
>>>
>>> Here is a rough compare:
>>>
>>> 1) Andrew P.'s change can not handle ssa-ifcombine-ccmp-3.c (included
>>> in my patch). Root cause is that it does not skip "LABEL". The guard
>>> to do this opt should be the same the bb_has_overhead_p in my patch.
>>
>> This should be an easy change, I am working on this right now.
>>
>>>
>>> 2) Andrew P.'s change always generate TRUTH_AND_EXPR, which is not
>>> efficient for "||". e.g. For ssa-ifcombine-ccmp-6.c, it will generate
>>>
>>>   _3 = a_2(D) > 0;
>>>   _5 = b_4(D) > 0;
>>>   _6 = _3 | _5;
>>>   _9 = c_7(D) <= 0;
>>>   _10 = ~_6;
>>>   _11 = _9 & _10;
>>>   if (_11 == 0)
>>>
>>> With my patch, it will generate
>>>
>>>   _3 = a_2(D) > 0;
>>>   _5 = b_4(D) > 0;
>>>   _6 = _3 | _5;
>>>   _9 = c_7(D) > 0;
>>>   _10 = _6 | _9;
>>>   if (_10 != 0)
>>
>> As mentioned otherwise, this seems like a missed optimization inside
>> forwprop.  When I originally wrote this code there used to be two
>> cases one for & and one for |, but this was removed sometime and I
>> just made the code evolve with that.
>
> Actually I can make a small patch (3 lines) to my current patch which
> causes tree-ssa-ifcombine.c to produce the behavior of your patch.
>
>  if (result_inv)
>    {
>      t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t);
>      result_inv = false;
>    }
>
> I will submit a new patch after some testing of my current patch which
> fixes items 1 and 2.


Here is my latest patch which adds the testcases from Zhenqiang's
patch and fixes item 1 and 2.

OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

ChangeLog:
* tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
(ifcombine_ifandif): Handle cases where
maybe_fold_and_comparisons fails, combining the branches
anyways.
(tree_ssa_ifcombine): Inverse the order of
the basic block walk, increases the number of combinings.
* gimple.h (gsi_start_nondebug_after_labels_bb): New function.


testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case.

* gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent
conditional move to be used.
* gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for "one or more
intermediate".




>
> Thanks,
> Andrew Pinski
>
>
>>
>>>
>>> 3) The good thing of Andrew P.'s change is that "Inverse the order of
>>> the basic block walk" so it can do combine recursively.
>>>
>>> But I think we need some heuristic to control the number of ifs. Move
>>> too much compares from
>>> the inner_bb to outer_bb is not good.
>>
>>
>> I think this depends on the target.  For MIPS we don't want an upper
>> bound as integer comparisons go directly to GPRs.  I wrote this patch
>> with MIPS in mind as that was the target I was working on.
>>
>>>
>>> 4) Another good thing of Andrew P.'s change is that it reuses some
>>> existing functions. So it looks much simple.
>>
>>
>> Thanks,
>> Andrew Pinski
>>
>>>
>>>>>
>>>>> With that we should be able to kill the fold-const.c transform?
>>>>
>>>> That would certainly be nice and an excellent follow-up for Zhenqiang.
>>>
>>> That's my final goal to "kill the fold-const.c transform". I think we
>>> may combine the two changes to make a "simple" and "good" patch.
>>>
>>> Thanks!
>>> -Zhenqiang

Comments

Zhenqiang Chen Oct. 29, 2013, 1:06 a.m. UTC | #1
On 28 October 2013 02:55, Andrew Pinski <pinskia@gmail.com> wrote:
> On Sat, Oct 26, 2013 at 4:49 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>> On Sat, Oct 26, 2013 at 2:30 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>>> On Fri, Oct 18, 2013 at 2:21 AM, Zhenqiang Chen
>>> <zhenqiang.chen@linaro.org> wrote:
>>>> On 18 October 2013 00:58, Jeff Law <law@redhat.com> wrote:
>>>>> On 10/17/13 05:03, Richard Biener wrote:
>>>>>>>>
>>>>>>>> Is it OK for trunk?
>>>>>>>
>>>>>>>
>>>>>>> I had a much simpler change which did basically the same from 4.7 (I
>>>>>>> can update it if people think this is a better approach).
>>>>>>
>>>>>>
>>>>>> I like that more (note you can now use is_gimple_condexpr as predicate
>>>>>> for force_gimple_operand).
>>>>>
>>>>> The obvious question is whether or not Andrew's simpler change picks up as
>>>>> many transformations as Zhenqiang's change.  If not are the things missed
>>>>> important.
>>>>>
>>>>> Zhenqiang, can you do some testing of your change vs Andrew P.'s change?
>>>>
>>>> Here is a rough compare:
>>>>
>>>> 1) Andrew P.'s change can not handle ssa-ifcombine-ccmp-3.c (included
>>>> in my patch). Root cause is that it does not skip "LABEL". The guard
>>>> to do this opt should be the same the bb_has_overhead_p in my patch.
>>>
>>> This should be an easy change, I am working on this right now.
>>>
>>>>
>>>> 2) Andrew P.'s change always generate TRUTH_AND_EXPR, which is not
>>>> efficient for "||". e.g. For ssa-ifcombine-ccmp-6.c, it will generate
>>>>
>>>>   _3 = a_2(D) > 0;
>>>>   _5 = b_4(D) > 0;
>>>>   _6 = _3 | _5;
>>>>   _9 = c_7(D) <= 0;
>>>>   _10 = ~_6;
>>>>   _11 = _9 & _10;
>>>>   if (_11 == 0)
>>>>
>>>> With my patch, it will generate
>>>>
>>>>   _3 = a_2(D) > 0;
>>>>   _5 = b_4(D) > 0;
>>>>   _6 = _3 | _5;
>>>>   _9 = c_7(D) > 0;
>>>>   _10 = _6 | _9;
>>>>   if (_10 != 0)
>>>
>>> As mentioned otherwise, this seems like a missed optimization inside
>>> forwprop.  When I originally wrote this code there used to be two
>>> cases one for & and one for |, but this was removed sometime and I
>>> just made the code evolve with that.
>>
>> Actually I can make a small patch (3 lines) to my current patch which
>> causes tree-ssa-ifcombine.c to produce the behavior of your patch.
>>
>>  if (result_inv)
>>    {
>>      t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t);
>>      result_inv = false;
>>    }
>>
>> I will submit a new patch after some testing of my current patch which
>> fixes items 1 and 2.
>
>
> Here is my latest patch which adds the testcases from Zhenqiang's
> patch and fixes item 1 and 2.

The patch works fine for me. Bootstrap and no make check regression on
ARM Chromebook.

Thanks!
-Zhenqiang

> OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> Thanks,
> Andrew Pinski
>
> ChangeLog:
> * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
> (ifcombine_ifandif): Handle cases where
> maybe_fold_and_comparisons fails, combining the branches
> anyways.
> (tree_ssa_ifcombine): Inverse the order of
> the basic block walk, increases the number of combinings.
> * gimple.h (gsi_start_nondebug_after_labels_bb): New function.
>
>
> testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case.
>
> * gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent
> conditional move to be used.
> * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for "one or more
> intermediate".
>
>
>
>
>>
>> Thanks,
>> Andrew Pinski
>>
>>
>>>
>>>>
>>>> 3) The good thing of Andrew P.'s change is that "Inverse the order of
>>>> the basic block walk" so it can do combine recursively.
>>>>
>>>> But I think we need some heuristic to control the number of ifs. Move
>>>> too much compares from
>>>> the inner_bb to outer_bb is not good.
>>>
>>>
>>> I think this depends on the target.  For MIPS we don't want an upper
>>> bound as integer comparisons go directly to GPRs.  I wrote this patch
>>> with MIPS in mind as that was the target I was working on.
>>>
>>>>
>>>> 4) Another good thing of Andrew P.'s change is that it reuses some
>>>> existing functions. So it looks much simple.
>>>
>>>
>>> Thanks,
>>> Andrew Pinski
>>>
>>>>
>>>>>>
>>>>>> With that we should be able to kill the fold-const.c transform?
>>>>>
>>>>> That would certainly be nice and an excellent follow-up for Zhenqiang.
>>>>
>>>> That's my final goal to "kill the fold-const.c transform". I think we
>>>> may combine the two changes to make a "simple" and "good" patch.
>>>>
>>>> Thanks!
>>>> -Zhenqiang
Jeff Law Oct. 29, 2013, 5:51 a.m. UTC | #2
On 10/27/13 12:55, Andrew Pinski wrote:
> Here is my latest patch which adds the testcases from Zhenqiang's
> patch and fixes item 1 and 2.
>
> OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> Thanks,
> Andrew Pinski
>
> ChangeLog:
> * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
> (ifcombine_ifandif): Handle cases where
> maybe_fold_and_comparisons fails, combining the branches
> anyways.
> (tree_ssa_ifcombine): Inverse the order of
> the basic block walk, increases the number of combinings.
> * gimple.h (gsi_start_nondebug_after_labels_bb): New function.
>
>
> testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case.
>
> * gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent
> conditional move to be used.
> * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for "one or more
> intermediate".
I don't be able to look at this in any depth tonight.  However, I would 
ask that you pass along the dump file for ssa-dom-thread-3.c so that I 
can evaluate the correctness of your change better.

Thanks,
jeff
Richard Biener Oct. 29, 2013, 10:28 a.m. UTC | #3
On Sun, Oct 27, 2013 at 7:55 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Sat, Oct 26, 2013 at 4:49 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>> On Sat, Oct 26, 2013 at 2:30 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>>> On Fri, Oct 18, 2013 at 2:21 AM, Zhenqiang Chen
>>> <zhenqiang.chen@linaro.org> wrote:
>>>> On 18 October 2013 00:58, Jeff Law <law@redhat.com> wrote:
>>>>> On 10/17/13 05:03, Richard Biener wrote:
>>>>>>>>
>>>>>>>> Is it OK for trunk?
>>>>>>>
>>>>>>>
>>>>>>> I had a much simpler change which did basically the same from 4.7 (I
>>>>>>> can update it if people think this is a better approach).
>>>>>>
>>>>>>
>>>>>> I like that more (note you can now use is_gimple_condexpr as predicate
>>>>>> for force_gimple_operand).
>>>>>
>>>>> The obvious question is whether or not Andrew's simpler change picks up as
>>>>> many transformations as Zhenqiang's change.  If not are the things missed
>>>>> important.
>>>>>
>>>>> Zhenqiang, can you do some testing of your change vs Andrew P.'s change?
>>>>
>>>> Here is a rough compare:
>>>>
>>>> 1) Andrew P.'s change can not handle ssa-ifcombine-ccmp-3.c (included
>>>> in my patch). Root cause is that it does not skip "LABEL". The guard
>>>> to do this opt should be the same the bb_has_overhead_p in my patch.
>>>
>>> This should be an easy change, I am working on this right now.
>>>
>>>>
>>>> 2) Andrew P.'s change always generate TRUTH_AND_EXPR, which is not
>>>> efficient for "||". e.g. For ssa-ifcombine-ccmp-6.c, it will generate
>>>>
>>>>   _3 = a_2(D) > 0;
>>>>   _5 = b_4(D) > 0;
>>>>   _6 = _3 | _5;
>>>>   _9 = c_7(D) <= 0;
>>>>   _10 = ~_6;
>>>>   _11 = _9 & _10;
>>>>   if (_11 == 0)
>>>>
>>>> With my patch, it will generate
>>>>
>>>>   _3 = a_2(D) > 0;
>>>>   _5 = b_4(D) > 0;
>>>>   _6 = _3 | _5;
>>>>   _9 = c_7(D) > 0;
>>>>   _10 = _6 | _9;
>>>>   if (_10 != 0)
>>>
>>> As mentioned otherwise, this seems like a missed optimization inside
>>> forwprop.  When I originally wrote this code there used to be two
>>> cases one for & and one for |, but this was removed sometime and I
>>> just made the code evolve with that.
>>
>> Actually I can make a small patch (3 lines) to my current patch which
>> causes tree-ssa-ifcombine.c to produce the behavior of your patch.
>>
>>  if (result_inv)
>>    {
>>      t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t);
>>      result_inv = false;
>>    }
>>
>> I will submit a new patch after some testing of my current patch which
>> fixes items 1 and 2.
>
>
> Here is my latest patch which adds the testcases from Zhenqiang's
> patch and fixes item 1 and 2.
>
> OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.

   return i;
 }
+/* Return a new iterator pointing to the first non-debug non-label statement in
+   basic block BB.  */
+
+static inline gimple_stmt_iterator
+gsi_start_nondebug_after_labels_bb (basic_block bb)


vertical space before the comment missing.

@@ -631,7 +669,7 @@ tree_ssa_ifcombine (void)
   bbs = single_pred_before_succ_order ();
   calculate_dominance_info (CDI_DOMINATORS);

-  for (i = 0; i < n_basic_blocks - NUM_FIXED_BLOCKS; ++i)
+  for (i = n_basic_blocks - NUM_FIXED_BLOCKS -1; i >= 0; i--)
     {
       basic_block bb = bbs[i];
       gimple stmt = last_stmt (bb);

The original code matches what phiopt does which even has a comment:

  /* Search every basic block for COND_EXPR we may be able to optimize.

     We walk the blocks in order that guarantees that a block with
     a single predecessor is processed before the predecessor.
     This ensures that we collapse inner ifs before visiting the
     outer ones, and also that we do not try to visit a removed
     block.  */
  bb_order = single_pred_before_succ_order ();
  n = n_basic_blocks - NUM_FIXED_BLOCKS;

  for (i = 0; i < n; i++)

so if the reverse order is better here (and in phiopt?) then it at least
needs a comment explaining why.

Otherwise the patch looks good, but please iterate with Jeff over
the dom testcase issue.

Thanks,
Richard.

> Thanks,
> Andrew Pinski
>
> ChangeLog:
> * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
> (ifcombine_ifandif): Handle cases where
> maybe_fold_and_comparisons fails, combining the branches
> anyways.
> (tree_ssa_ifcombine): Inverse the order of
> the basic block walk, increases the number of combinings.
> * gimple.h (gsi_start_nondebug_after_labels_bb): New function.
>
>
> testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case.
>
> * gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent
> conditional move to be used.
> * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for "one or more
> intermediate".
>
>
>
>
>>
>> Thanks,
>> Andrew Pinski
>>
>>
>>>
>>>>
>>>> 3) The good thing of Andrew P.'s change is that "Inverse the order of
>>>> the basic block walk" so it can do combine recursively.
>>>>
>>>> But I think we need some heuristic to control the number of ifs. Move
>>>> too much compares from
>>>> the inner_bb to outer_bb is not good.
>>>
>>>
>>> I think this depends on the target.  For MIPS we don't want an upper
>>> bound as integer comparisons go directly to GPRs.  I wrote this patch
>>> with MIPS in mind as that was the target I was working on.
>>>
>>>>
>>>> 4) Another good thing of Andrew P.'s change is that it reuses some
>>>> existing functions. So it looks much simple.
>>>
>>>
>>> Thanks,
>>> Andrew Pinski
>>>
>>>>
>>>>>>
>>>>>> With that we should be able to kill the fold-const.c transform?
>>>>>
>>>>> That would certainly be nice and an excellent follow-up for Zhenqiang.
>>>>
>>>> That's my final goal to "kill the fold-const.c transform". I think we
>>>> may combine the two changes to make a "simple" and "good" patch.
>>>>
>>>> Thanks!
>>>> -Zhenqiang
Andrew Pinski Oct. 29, 2013, 3:14 p.m. UTC | #4
On Mon, Oct 28, 2013 at 10:51 PM, Jeff Law <law@redhat.com> wrote:
> On 10/27/13 12:55, Andrew Pinski wrote:
>>
>> Here is my latest patch which adds the testcases from Zhenqiang's
>> patch and fixes item 1 and 2.
>>
>> OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>>
>> Thanks,
>> Andrew Pinski
>>
>> ChangeLog:
>> * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
>> (ifcombine_ifandif): Handle cases where
>> maybe_fold_and_comparisons fails, combining the branches
>> anyways.
>> (tree_ssa_ifcombine): Inverse the order of
>> the basic block walk, increases the number of combinings.
>> * gimple.h (gsi_start_nondebug_after_labels_bb): New function.
>>
>>
>> testsuite/ChangeLog:
>>
>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case.
>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case.
>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case.
>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case.
>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case.
>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case.
>>
>> * gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent
>> conditional move to be used.
>> * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for "one or more
>> intermediate".
>
> I don't be able to look at this in any depth tonight.  However, I would ask
> that you pass along the dump file for ssa-dom-thread-3.c so that I can
> evaluate the correctness of your change better.

We no longer jump thread one case as we have:

  <bb 2>:
  var_3 = var_1(D)->ssa_name.var;
  _4 = var_1(D)->base.code;
  if (_4 == 1)
    goto <bb 3>;
  else
    goto <bb 5>;

...

 <bb 5>:
  _6 = var_3->base.code;
  _13 = _4 != 1;
  _14 = _6 != 0;
  _12 = _13 & _14;
  if (_12 != 0)
    goto <bb 8>;
  else
    goto <bb 6>;

Attached is the full dump too.

Thanks,
Andrew Pinski



>
> Thanks,
> jeff
>
Jeff Law Oct. 29, 2013, 6 p.m. UTC | #5
On 10/29/13 09:14, Andrew Pinski wrote:
> On Mon, Oct 28, 2013 at 10:51 PM, Jeff Law <law@redhat.com> wrote:
>> On 10/27/13 12:55, Andrew Pinski wrote:
>>>
>>> Here is my latest patch which adds the testcases from Zhenqiang's
>>> patch and fixes item 1 and 2.
>>>
>>> OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>>>
>>> Thanks,
>>> Andrew Pinski
>>>
>>> ChangeLog:
>>> * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
>>> (ifcombine_ifandif): Handle cases where
>>> maybe_fold_and_comparisons fails, combining the branches
>>> anyways.
>>> (tree_ssa_ifcombine): Inverse the order of
>>> the basic block walk, increases the number of combinings.
>>> * gimple.h (gsi_start_nondebug_after_labels_bb): New function.
>>>
>>>
>>> testsuite/ChangeLog:
>>>
>>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case.
>>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case.
>>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case.
>>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case.
>>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case.
>>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case.
>>>
>>> * gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent
>>> conditional move to be used.
>>> * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for "one or more
>>> intermediate".
>>
>> I don't be able to look at this in any depth tonight.  However, I would ask
>> that you pass along the dump file for ssa-dom-thread-3.c so that I can
>> evaluate the correctness of your change better.
>
> We no longer jump thread one case as we have:
>
>    <bb 2>:
>    var_3 = var_1(D)->ssa_name.var;
>    _4 = var_1(D)->base.code;
>    if (_4 == 1)
>      goto <bb 3>;
>    else
>      goto <bb 5>;
>
> ...
>
>   <bb 5>:
>    _6 = var_3->base.code;
>    _13 = _4 != 1;
>    _14 = _6 != 0;
>    _12 = _13 & _14;
>    if (_12 != 0)
>      goto <bb 8>;
>    else
>      goto <bb 6>;
>
> Attached is the full dump too.
I mostly wanted to verify whether or not the test was worth running. 
It isn't after this change.  It no longer tests for the threader's 
ability to clone an intermediate block to isolate a path to a successor 
of the intermediate that is threadable.


Please just remove ssa-dom-thread-3.c as part of this checkin.

Thanks,

Jeff
diff mbox

Patch

Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c	(revision 0)
@@ -0,0 +1,14 @@ 
+/* { dg-do compile { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-*"} } } */
+
+/* { dg-options "-O2 -g -fdump-tree-optimized" } */
+/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
+
+int t (int a, int b)
+{
+  if (a > 0)
+    if (b > 0)
+      return 0;
+  return 1;
+}
+/* { dg-final { scan-tree-dump "\&" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/gcc.dg/tree-ssa/ssa-dom-thread-3.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ssa-dom-thread-3.c	(revision 204095)
+++ testsuite/gcc.dg/tree-ssa/ssa-dom-thread-3.c	(working copy)
@@ -42,7 +42,7 @@  expand_one_var (tree var, unsigned char
     abort ();
 }
 /* We should thread the jump, through an intermediate block.  */
-/* { dg-final { scan-tree-dump-times "Threaded" 2 "dom1"} } */
-/* { dg-final { scan-tree-dump-times "Registering jump thread: \\(.*\\) incoming edge;  \\(.*\\) joiner;  \\(.*\\) nocopy;" 1 "dom1"} } */
+/* { dg-final { scan-tree-dump "Threaded" "dom1"} } */
+/* { dg-final { scan-tree-dump-times "Registering jump thread: \\(.*\\) incoming edge;" 1 "dom1"} } */
 /* { dg-final { cleanup-tree-dump "dom1" } } */
 
Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c	(revision 0)
@@ -0,0 +1,17 @@ 
+/* { dg-do compile { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-*"} } } */
+
+/* { dg-options "-O2 -g -fdump-tree-optimized" } */
+/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
+
+int t (int a, int b)
+{
+  if (a > 0)
+    goto L1;
+  if (b > 0)
+    goto L1;
+  return 0;
+L1:
+  return 1;
+}
+/* { dg-final { scan-tree-dump "\|" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c	(revision 0)
@@ -0,0 +1,20 @@ 
+/* { dg-do compile { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-*"} } } */
+
+/* { dg-options "-O2 -g -fdump-tree-optimized" } */
+/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
+
+int t (int a, int b)
+{
+  if (a > 0)
+    goto L1;
+  else
+    goto L2;
+L1:
+  if (b > 0)
+    goto L2;
+  return 5;
+L2:
+  return 6;
+}
+/* { dg-final { scan-tree-dump "\|" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c	(revision 0)
@@ -0,0 +1,18 @@ 
+/* { dg-do compile { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-*"} } } */
+
+/* { dg-options "-O2 -g -fdump-tree-optimized" } */
+/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
+
+int t (int a, int b)
+{
+  if (a > 0)
+    goto L1;
+  if (b > 0)
+    goto L2;
+L1:
+  return 0;
+L2:
+  return 1;
+}
+/* { dg-final { scan-tree-dump "\&" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/gcc.dg/tree-ssa/phi-opt-9.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/phi-opt-9.c	(revision 204095)
+++ testsuite/gcc.dg/tree-ssa/phi-opt-9.c	(working copy)
@@ -2,13 +2,14 @@ 
 /* { dg-options "-O -fdump-tree-optimized" } */
 
 int g(int,int);
+int h(int);
 int f(int t, int c)
 {
   int d = 0;
   int e = 0;
   if (t)
     {
-      d = c+1;
+      d = h(c);
       e = t;
     }
   else d = 0, e = 0;
Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c	(revision 0)
@@ -0,0 +1,13 @@ 
+/* { dg-do compile { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-*"} } } */
+
+/* { dg-options "-O2 -g -fdump-tree-optimized" } */
+/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
+
+int t (int a, int b, int c)
+{
+  if (a > 0 && b > 0 && c > 0)
+      return 0;
+  return 1;
+}
+/* { dg-final { scan-tree-dump-times "\&" 2 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c	(revision 0)
@@ -0,0 +1,13 @@ 
+/* { dg-do compile { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-*"} } } */
+
+/* { dg-options "-O2 -g -fdump-tree-optimized" } */
+/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
+
+int t (int a, int b, int c)
+{
+  if (a > 0 || b > 0 || c > 0)
+      return 0;
+  return 1;
+}
+/* { dg-final { scan-tree-dump-times "\\|" 2 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: tree-ssa-ifcombine.c
===================================================================
--- tree-ssa-ifcombine.c	(revision 204095)
+++ tree-ssa-ifcombine.c	(working copy)
@@ -22,6 +22,10 @@  along with GCC; see the file COPYING3.
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
+/* rtl is needed only because arm back-end requires it for
+   BRANCH_COST.  */
+#include "rtl.h"
+#include "tm_p.h"
 #include "tree.h"
 #include "basic-block.h"
 #include "tree-pretty-print.h"
@@ -32,6 +36,12 @@  along with GCC; see the file COPYING3.
 #include "ssa-iterators.h"
 #include "tree-pass.h"
 
+#ifndef LOGICAL_OP_NON_SHORT_CIRCUIT
+#define LOGICAL_OP_NON_SHORT_CIRCUIT \
+  (BRANCH_COST (optimize_function_for_speed_p (cfun), \
+                false) >= 2)
+#endif
+
 /* This pass combines COND_EXPRs to simplify control flow.  It
    currently recognizes bit tests and comparisons in chains that
    represent logical and or logical or of two COND_EXPRs.
@@ -488,7 +498,35 @@  ifcombine_ifandif (basic_block inner_con
 					    outer_cond_code,
 					    gimple_cond_lhs (outer_cond),
 					    gimple_cond_rhs (outer_cond))))
-	return false;
+	{
+	  tree t1, t2;
+	  gimple_stmt_iterator gsi;
+	  if (!LOGICAL_OP_NON_SHORT_CIRCUIT)
+	    return false;
+	  /* Only do this optimization if the inner bb contains only the conditional. */
+	  if (!gsi_one_before_end_p (gsi_start_nondebug_after_labels_bb (inner_cond_bb)))
+	    return false;
+	  t1 = fold_build2_loc (gimple_location (inner_cond),
+				inner_cond_code,
+				boolean_type_node,
+				gimple_cond_lhs (inner_cond),
+				gimple_cond_rhs (inner_cond));
+	  t2 = fold_build2_loc (gimple_location (outer_cond),
+				outer_cond_code,
+				boolean_type_node,
+				gimple_cond_lhs (outer_cond),
+				gimple_cond_rhs (outer_cond));
+	  t = fold_build2_loc (gimple_location (inner_cond), 
+			       TRUTH_AND_EXPR, boolean_type_node, t1, t2);
+	  if (result_inv)
+	    {
+	      t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t);
+	      result_inv = false;
+	    }
+	  gsi = gsi_for_stmt (inner_cond);
+	  t = force_gimple_operand_gsi_1 (&gsi, t, is_gimple_condexpr, NULL, true,
+					  GSI_SAME_STMT);
+        }
       if (result_inv)
 	t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t);
       t = canonicalize_cond_expr_cond (t);
@@ -631,7 +669,7 @@  tree_ssa_ifcombine (void)
   bbs = single_pred_before_succ_order ();
   calculate_dominance_info (CDI_DOMINATORS);
 
-  for (i = 0; i < n_basic_blocks - NUM_FIXED_BLOCKS; ++i)
+  for (i = n_basic_blocks - NUM_FIXED_BLOCKS -1; i >= 0; i--)
     {
       basic_block bb = bbs[i];
       gimple stmt = last_stmt (bb);
Index: gimple.h
===================================================================
--- gimple.h	(revision 204095)
+++ gimple.h	(working copy)
@@ -5533,6 +5533,19 @@  gsi_start_nondebug_bb (basic_block bb)
 
   return i;
 }
+/* Return a new iterator pointing to the first non-debug non-label statement in
+   basic block BB.  */
+
+static inline gimple_stmt_iterator
+gsi_start_nondebug_after_labels_bb (basic_block bb)
+{
+  gimple_stmt_iterator i = gsi_after_labels (bb);
+
+  if (!gsi_end_p (i) && is_gimple_debug (gsi_stmt (i)))
+    gsi_next_nondebug (&i);
+
+  return i;
+}
 
 /* Return a new iterator pointing to the last non-debug statement in
    basic block BB.  */