diff mbox

Enhance ifcombine to recover non short circuit branches

Message ID CA+=Sn1ktsHUWVzQ192Z4iLs-ZsNtJpw9QCzcH6ASrhVnBSw9mw@mail.gmail.com
State New
Headers show

Commit Message

Andrew Pinski Oct. 30, 2013, 4:03 a.m. UTC
On Tue, Oct 29, 2013 at 3:28 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Sun, Oct 27, 2013 at 7:55 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>> On Sat, Oct 26, 2013 at 4:49 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>>> On Sat, Oct 26, 2013 at 2:30 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>>>> On Fri, Oct 18, 2013 at 2:21 AM, Zhenqiang Chen
>>>> <zhenqiang.chen@linaro.org> wrote:
>>>>> On 18 October 2013 00:58, Jeff Law <law@redhat.com> wrote:
>>>>>> On 10/17/13 05:03, Richard Biener wrote:
>>>>>>>>>
>>>>>>>>> Is it OK for trunk?
>>>>>>>>
>>>>>>>>
>>>>>>>> I had a much simpler change which did basically the same from 4.7 (I
>>>>>>>> can update it if people think this is a better approach).
>>>>>>>
>>>>>>>
>>>>>>> I like that more (note you can now use is_gimple_condexpr as predicate
>>>>>>> for force_gimple_operand).
>>>>>>
>>>>>> The obvious question is whether or not Andrew's simpler change picks up as
>>>>>> many transformations as Zhenqiang's change.  If not are the things missed
>>>>>> important.
>>>>>>
>>>>>> Zhenqiang, can you do some testing of your change vs Andrew P.'s change?
>>>>>
>>>>> Here is a rough compare:
>>>>>
>>>>> 1) Andrew P.'s change can not handle ssa-ifcombine-ccmp-3.c (included
>>>>> in my patch). Root cause is that it does not skip "LABEL". The guard
>>>>> to do this opt should be the same the bb_has_overhead_p in my patch.
>>>>
>>>> This should be an easy change, I am working on this right now.
>>>>
>>>>>
>>>>> 2) Andrew P.'s change always generate TRUTH_AND_EXPR, which is not
>>>>> efficient for "||". e.g. For ssa-ifcombine-ccmp-6.c, it will generate
>>>>>
>>>>>   _3 = a_2(D) > 0;
>>>>>   _5 = b_4(D) > 0;
>>>>>   _6 = _3 | _5;
>>>>>   _9 = c_7(D) <= 0;
>>>>>   _10 = ~_6;
>>>>>   _11 = _9 & _10;
>>>>>   if (_11 == 0)
>>>>>
>>>>> With my patch, it will generate
>>>>>
>>>>>   _3 = a_2(D) > 0;
>>>>>   _5 = b_4(D) > 0;
>>>>>   _6 = _3 | _5;
>>>>>   _9 = c_7(D) > 0;
>>>>>   _10 = _6 | _9;
>>>>>   if (_10 != 0)
>>>>
>>>> As mentioned otherwise, this seems like a missed optimization inside
>>>> forwprop.  When I originally wrote this code there used to be two
>>>> cases one for & and one for |, but this was removed sometime and I
>>>> just made the code evolve with that.
>>>
>>> Actually I can make a small patch (3 lines) to my current patch which
>>> causes tree-ssa-ifcombine.c to produce the behavior of your patch.
>>>
>>>  if (result_inv)
>>>    {
>>>      t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t);
>>>      result_inv = false;
>>>    }
>>>
>>> I will submit a new patch after some testing of my current patch which
>>> fixes items 1 and 2.
>>
>>
>> Here is my latest patch which adds the testcases from Zhenqiang's
>> patch and fixes item 1 and 2.
>>
>> OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
>    return i;
>  }
> +/* Return a new iterator pointing to the first non-debug non-label statement in
> +   basic block BB.  */
> +
> +static inline gimple_stmt_iterator
> +gsi_start_nondebug_after_labels_bb (basic_block bb)
>
>
> vertical space before the comment missing.
>
> @@ -631,7 +669,7 @@ tree_ssa_ifcombine (void)
>    bbs = single_pred_before_succ_order ();
>    calculate_dominance_info (CDI_DOMINATORS);
>
> -  for (i = 0; i < n_basic_blocks - NUM_FIXED_BLOCKS; ++i)
> +  for (i = n_basic_blocks - NUM_FIXED_BLOCKS -1; i >= 0; i--)
>      {
>        basic_block bb = bbs[i];
>        gimple stmt = last_stmt (bb);
>
> The original code matches what phiopt does which even has a comment:
>
>   /* Search every basic block for COND_EXPR we may be able to optimize.
>
>      We walk the blocks in order that guarantees that a block with
>      a single predecessor is processed before the predecessor.
>      This ensures that we collapse inner ifs before visiting the
>      outer ones, and also that we do not try to visit a removed
>      block.  */
>   bb_order = single_pred_before_succ_order ();
>   n = n_basic_blocks - NUM_FIXED_BLOCKS;
>
>   for (i = 0; i < n; i++)
>
> so if the reverse order is better here (and in phiopt?) then it at least
> needs a comment explaining why.
>
> Otherwise the patch looks good, but please iterate with Jeff over
> the dom testcase issue.


Here is what I applied in the end; Jeff told me just to remove the
testcase.  I added the comment trying to explain why it was the
opposite order of PHI-opt.

Thanks,
Andrew Pinski

ChangeLog:
* tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
(ifcombine_ifandif): Handle cases where
maybe_fold_and_comparisons fails, combining the branches
anyways.
(tree_ssa_ifcombine): Inverse the order of
the basic block walk, increases the number of combinings.
* gimple.h (gsi_start_nondebug_after_labels_bb): New function.

testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case.

* gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent
conditional move to be used.
* gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove.


>
> Thanks,
> Richard.
>
>> Thanks,
>> Andrew Pinski
>>
>> ChangeLog:
>> * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
>> (ifcombine_ifandif): Handle cases where
>> maybe_fold_and_comparisons fails, combining the branches
>> anyways.
>> (tree_ssa_ifcombine): Inverse the order of
>> the basic block walk, increases the number of combinings.
>> * gimple.h (gsi_start_nondebug_after_labels_bb): New function.
>>
>>
>> testsuite/ChangeLog:
>>
>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case.
>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case.
>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case.
>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case.
>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case.
>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case.
>>
>> * gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent
>> conditional move to be used.
>> * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for "one or more
>> intermediate".
>>
>>
>>
>>
>>>
>>> Thanks,
>>> Andrew Pinski
>>>
>>>
>>>>
>>>>>
>>>>> 3) The good thing of Andrew P.'s change is that "Inverse the order of
>>>>> the basic block walk" so it can do combine recursively.
>>>>>
>>>>> But I think we need some heuristic to control the number of ifs. Move
>>>>> too much compares from
>>>>> the inner_bb to outer_bb is not good.
>>>>
>>>>
>>>> I think this depends on the target.  For MIPS we don't want an upper
>>>> bound as integer comparisons go directly to GPRs.  I wrote this patch
>>>> with MIPS in mind as that was the target I was working on.
>>>>
>>>>>
>>>>> 4) Another good thing of Andrew P.'s change is that it reuses some
>>>>> existing functions. So it looks much simple.
>>>>
>>>>
>>>> Thanks,
>>>> Andrew Pinski
>>>>
>>>>>
>>>>>>>
>>>>>>> With that we should be able to kill the fold-const.c transform?
>>>>>>
>>>>>> That would certainly be nice and an excellent follow-up for Zhenqiang.
>>>>>
>>>>> That's my final goal to "kill the fold-const.c transform". I think we
>>>>> may combine the two changes to make a "simple" and "good" patch.
>>>>>
>>>>> Thanks!
>>>>> -Zhenqiang

Comments

Steven Bosscher Nov. 8, 2013, 5:20 p.m. UTC | #1
On Wed, Oct 30, 2013 at 5:03 AM, Andrew Pinski wrote:
> Here is what I applied in the end; Jeff told me just to remove the
> testcase.  I added the comment trying to explain why it was the
> opposite order of PHI-opt.
>
> Thanks,
> Andrew Pinski
>
> ChangeLog:
> * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.

Eh, why???

The file has this comment:

25    /* rtl is needed only because arm back-end requires it for
26       BRANCH_COST.  */
27    #include "rtl.h"
28    #include "tm_p.h"

Can you please clarify why this is not something to be fixed in the
ARM back end?

You're taking the easy way out here, but it's a step in the wrong
direction from modularity point of view.

Ciao!
Steven
Andrew Pinski Nov. 8, 2013, 5:45 p.m. UTC | #2
> On Nov 8, 2013, at 9:20 AM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> 
>> On Wed, Oct 30, 2013 at 5:03 AM, Andrew Pinski wrote:
>> Here is what I applied in the end; Jeff told me just to remove the
>> testcase.  I added the comment trying to explain why it was the
>> opposite order of PHI-opt.
>> 
>> Thanks,
>> Andrew Pinski
>> 
>> ChangeLog:
>> * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
> 
> Eh, why???
> 
> The file has this comment:
> 
> 25    /* rtl is needed only because arm back-end requires it for
> 26       BRANCH_COST.  */
> 27    #include "rtl.h"
> 28    #include "tm_p.h"
> 
> Can you please clarify why this is not something to be fixed in the
> ARM back end?

  Really BRANCH_COST should be a target hook rather than a macro which should fix this issue.  fold-const.c has the same include for the same reason.  I thought I had saw a patch which changes it into a hook. Next week if I get time, I will do that.  I knew it was the wrong direction which is why I added a comment.

Thanks,
Andrew

> 
> You're taking the easy way out here, but it's a step in the wrong
> direction from modularity point of view.
> 
> Ciao!
> Steven
Jeff Law Nov. 8, 2013, 9:46 p.m. UTC | #3
On 11/08/13 10:45, pinskia@gmail.com wrote:
>
>
>> On Nov 8, 2013, at 9:20 AM, Steven Bosscher <stevenb.gcc@gmail.com>
>> wrote:
>>
>>> On Wed, Oct 30, 2013 at 5:03 AM, Andrew Pinski wrote: Here is
>>> what I applied in the end; Jeff told me just to remove the
>>> testcase.  I added the comment trying to explain why it was the
>>> opposite order of PHI-opt.
>>>
>>> Thanks, Andrew Pinski
>>>
>>> ChangeLog: * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
>>
>> Eh, why???
>>
>> The file has this comment:
>>
>> 25    /* rtl is needed only because arm back-end requires it for 26
>> BRANCH_COST.  */ 27    #include "rtl.h" 28    #include "tm_p.h"
>>
>> Can you please clarify why this is not something to be fixed in
>> the ARM back end?
>
> Really BRANCH_COST should be a target hook rather than a macro which
> should fix this issue.  fold-const.c has the same include for the
> same reason.  I thought I had saw a patch which changes it into a
> hook. Next week if I get time, I will do that.  I knew it was the
> wrong direction which is why I added a comment.
Also note that other patches have gone in recently which include those 
files for exactly the same reason.  I don't think Andrew did anything 
wrong here.

Clearly those can/should/will go away when BRANCH_COST turns into a 
target hook.  I'd like to do it myself, but I'm buried right now.

jeff
Steven Bosscher Nov. 8, 2013, 9:56 p.m. UTC | #4
On Fri, Nov 8, 2013 at 10:46 PM, Jeff Law wrote:
> Also note that other patches have gone in recently which include those files
> for exactly the same reason.

I assume you're referring to tree-ssa-reassoc.c? It's the only new one
I could find since I cleaned rtl.h out from almost all tree-* files
last year.


>  I don't think Andrew did anything wrong here.

Yup. The problem came in with an ARM patch: http://gcc.gnu.org/r174578


> Clearly those can/should/will go away when BRANCH_COST turns into a target
> hook.  I'd like to do it myself, but I'm buried right now.

Hook would be better but even as a macro this can be solved, see the
patch I attached up thread.

There should be a ban, enforced somehow, on new #includes of rtl.h and
expr.h in GIMPLE passes...

Ciao!
Steven
diff mbox

Patch

Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c	(revision 0)
@@ -0,0 +1,14 @@ 
+/* { dg-do compile { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-*"} } } */
+
+/* { dg-options "-O2 -g -fdump-tree-optimized" } */
+/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
+
+int t (int a, int b)
+{
+  if (a > 0)
+    if (b > 0)
+      return 0;
+  return 1;
+}
+/* { dg-final { scan-tree-dump "\&" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/gcc.dg/tree-ssa/ssa-dom-thread-3.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ssa-dom-thread-3.c	(revision 204133)
+++ testsuite/gcc.dg/tree-ssa/ssa-dom-thread-3.c	(working copy)
@@ -1,48 +0,0 @@ 
-/* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-dom1-details -fno-short-enums" } */
-
-extern void abort (void) __attribute__ ((__noreturn__));
-union tree_node;
-typedef union tree_node *tree;
-enum tree_code
-{
-  VAR_DECL,
-  SSA_NAME,
-  MAX_TREE_CODES
-};
-extern unsigned char tree_contains_struct[MAX_TREE_CODES][64];
-struct tree_base
-{
-  enum tree_code code:16;
-};
-enum tree_node_structure_enum
-{
-  TS_DECL_COMMON
-};
-struct tree_ssa_name
-{
-  tree var;
-};
-union tree_node
-{
-  struct tree_base base;
-  struct tree_ssa_name ssa_name;
-};
-long
-expand_one_var (tree var, unsigned char toplevel, unsigned char really_expand)
-{
-  tree origvar = var;
-  var = var->ssa_name.var;
-  if (((enum tree_code) (origvar)->base.code) == SSA_NAME
-      && !((var->base.code != VAR_DECL)))
-    abort ();
-  if ((var->base.code) != VAR_DECL && ((origvar)->base.code) != SSA_NAME)
-    ;
-  else if (tree_contains_struct[(var->base.code)][(TS_DECL_COMMON)] != 1)
-    abort ();
-}
-/* We should thread the jump, through an intermediate block.  */
-/* { dg-final { scan-tree-dump-times "Threaded" 2 "dom1"} } */
-/* { dg-final { scan-tree-dump-times "Registering jump thread: \\(.*\\) incoming edge;  \\(.*\\) joiner;  \\(.*\\) nocopy;" 1 "dom1"} } */
-/* { dg-final { cleanup-tree-dump "dom1" } } */
-
Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c	(revision 0)
@@ -0,0 +1,17 @@ 
+/* { dg-do compile { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-*"} } } */
+
+/* { dg-options "-O2 -g -fdump-tree-optimized" } */
+/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
+
+int t (int a, int b)
+{
+  if (a > 0)
+    goto L1;
+  if (b > 0)
+    goto L1;
+  return 0;
+L1:
+  return 1;
+}
+/* { dg-final { scan-tree-dump "\|" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c	(revision 0)
@@ -0,0 +1,20 @@ 
+/* { dg-do compile { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-*"} } } */
+
+/* { dg-options "-O2 -g -fdump-tree-optimized" } */
+/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
+
+int t (int a, int b)
+{
+  if (a > 0)
+    goto L1;
+  else
+    goto L2;
+L1:
+  if (b > 0)
+    goto L2;
+  return 5;
+L2:
+  return 6;
+}
+/* { dg-final { scan-tree-dump "\|" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c	(revision 0)
@@ -0,0 +1,18 @@ 
+/* { dg-do compile { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-*"} } } */
+
+/* { dg-options "-O2 -g -fdump-tree-optimized" } */
+/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
+
+int t (int a, int b)
+{
+  if (a > 0)
+    goto L1;
+  if (b > 0)
+    goto L2;
+L1:
+  return 0;
+L2:
+  return 1;
+}
+/* { dg-final { scan-tree-dump "\&" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/gcc.dg/tree-ssa/phi-opt-9.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/phi-opt-9.c	(revision 204133)
+++ testsuite/gcc.dg/tree-ssa/phi-opt-9.c	(working copy)
@@ -2,13 +2,14 @@ 
 /* { dg-options "-O -fdump-tree-optimized" } */
 
 int g(int,int);
+int h(int);
 int f(int t, int c)
 {
   int d = 0;
   int e = 0;
   if (t)
     {
-      d = c+1;
+      d = h(c);
       e = t;
     }
   else d = 0, e = 0;
Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c	(revision 0)
@@ -0,0 +1,13 @@ 
+/* { dg-do compile { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-*"} } } */
+
+/* { dg-options "-O2 -g -fdump-tree-optimized" } */
+/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
+
+int t (int a, int b, int c)
+{
+  if (a > 0 && b > 0 && c > 0)
+      return 0;
+  return 1;
+}
+/* { dg-final { scan-tree-dump-times "\&" 2 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c	(revision 0)
@@ -0,0 +1,13 @@ 
+/* { dg-do compile { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-*"} } } */
+
+/* { dg-options "-O2 -g -fdump-tree-optimized" } */
+/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
+
+int t (int a, int b, int c)
+{
+  if (a > 0 || b > 0 || c > 0)
+      return 0;
+  return 1;
+}
+/* { dg-final { scan-tree-dump-times "\\|" 2 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: tree-ssa-ifcombine.c
===================================================================
--- tree-ssa-ifcombine.c	(revision 204133)
+++ tree-ssa-ifcombine.c	(working copy)
@@ -22,6 +22,10 @@  along with GCC; see the file COPYING3.
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
+/* rtl is needed only because arm back-end requires it for
+   BRANCH_COST.  */
+#include "rtl.h"
+#include "tm_p.h"
 #include "tree.h"
 #include "basic-block.h"
 #include "tree-pretty-print.h"
@@ -32,6 +36,12 @@  along with GCC; see the file COPYING3.
 #include "ssa-iterators.h"
 #include "tree-pass.h"
 
+#ifndef LOGICAL_OP_NON_SHORT_CIRCUIT
+#define LOGICAL_OP_NON_SHORT_CIRCUIT \
+  (BRANCH_COST (optimize_function_for_speed_p (cfun), \
+                false) >= 2)
+#endif
+
 /* This pass combines COND_EXPRs to simplify control flow.  It
    currently recognizes bit tests and comparisons in chains that
    represent logical and or logical or of two COND_EXPRs.
@@ -488,7 +498,35 @@  ifcombine_ifandif (basic_block inner_con
 					    outer_cond_code,
 					    gimple_cond_lhs (outer_cond),
 					    gimple_cond_rhs (outer_cond))))
-	return false;
+	{
+	  tree t1, t2;
+	  gimple_stmt_iterator gsi;
+	  if (!LOGICAL_OP_NON_SHORT_CIRCUIT)
+	    return false;
+	  /* Only do this optimization if the inner bb contains only the conditional. */
+	  if (!gsi_one_before_end_p (gsi_start_nondebug_after_labels_bb (inner_cond_bb)))
+	    return false;
+	  t1 = fold_build2_loc (gimple_location (inner_cond),
+				inner_cond_code,
+				boolean_type_node,
+				gimple_cond_lhs (inner_cond),
+				gimple_cond_rhs (inner_cond));
+	  t2 = fold_build2_loc (gimple_location (outer_cond),
+				outer_cond_code,
+				boolean_type_node,
+				gimple_cond_lhs (outer_cond),
+				gimple_cond_rhs (outer_cond));
+	  t = fold_build2_loc (gimple_location (inner_cond), 
+			       TRUTH_AND_EXPR, boolean_type_node, t1, t2);
+	  if (result_inv)
+	    {
+	      t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t);
+	      result_inv = false;
+	    }
+	  gsi = gsi_for_stmt (inner_cond);
+	  t = force_gimple_operand_gsi_1 (&gsi, t, is_gimple_condexpr, NULL, true,
+					  GSI_SAME_STMT);
+        }
       if (result_inv)
 	t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t);
       t = canonicalize_cond_expr_cond (t);
@@ -631,7 +669,15 @@  tree_ssa_ifcombine (void)
   bbs = single_pred_before_succ_order ();
   calculate_dominance_info (CDI_DOMINATORS);
 
-  for (i = 0; i < n_basic_blocks - NUM_FIXED_BLOCKS; ++i)
+  /* Search every basic block for COND_EXPR we may be able to optimize.
+
+     We walk the blocks in order that guarantees that a block with
+     a single predecessor is processed after the predecessor.
+     This ensures that we collapse outter ifs before visiting the
+     inner ones, and also that we do not try to visit a removed
+     block.  This is opposite of PHI-OPT, because we cascade the
+     combining rather than cascading PHIs. */
+  for (i = n_basic_blocks - NUM_FIXED_BLOCKS - 1; i >= 0; i--)
     {
       basic_block bb = bbs[i];
       gimple stmt = last_stmt (bb);
Index: gimple.h
===================================================================
--- gimple.h	(revision 204133)
+++ gimple.h	(working copy)
@@ -5534,6 +5534,20 @@  gsi_start_nondebug_bb (basic_block bb)
   return i;
 }
 
+/* Return a new iterator pointing to the first non-debug non-label statement in
+   basic block BB.  */
+
+static inline gimple_stmt_iterator
+gsi_start_nondebug_after_labels_bb (basic_block bb)
+{
+  gimple_stmt_iterator i = gsi_after_labels (bb);
+
+  if (!gsi_end_p (i) && is_gimple_debug (gsi_stmt (i)))
+    gsi_next_nondebug (&i);
+
+  return i;
+}
+
 /* Return a new iterator pointing to the last non-debug statement in
    basic block BB.  */