diff mbox

Optimize n?rotate(x,n):x

Message ID alpine.DEB.2.10.1404131432090.3577@laptop-mg.saclay.inria.fr
State New
Headers show

Commit Message

Marc Glisse April 13, 2014, 8:58 p.m. UTC
On Mon, 3 Mar 2014, Richard Biener wrote:

> On Sat, Mar 1, 2014 at 3:33 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
>> Hello,
>>
>> again, a stage 1 patch that I will ping then, but early comments are
>> welcome.
>>
>> PR 59100 was asking to transform n?rotate(x,n):x to rotate(x,n) (because it
>> can be hard to write a strictly valid rotate in plain C). The operation is
>> really:
>> (x != neutral) ? x op y : y
>> where neutral is such that (neutral op y) is always y, so that's what I
>> implemented (and absorbing elements while I was at it).
>>
>> For some operations on some platforms, the transformation may not be such a
>> good idea, in particular if division is very slow and b is 1 most of the
>> time, then computing a/b may be slower than (b!=1)?a/b:a. The easiest might
>> be to comment out those operations in the switch for now. I think divisions
>> are the only ones slow enough to deserve this, though there certainly are
>> CPUs where multiplication is not so fast, and even for rotate it may not
>> always be a win if the processor doesn't have a rotate instruction and the
>> shift amount is almost always 0.
>
> You only handle integer operations, so checking for INTEGER_TYPE_P
> early on would make sense.

Ah, I wasn't planning on restricting this to integers but yes, indeed, 
since it ended up that way...

> Note that some archs may dispatch to libgcc for integer operations
> so it may make sense to check whether that is the case (you can
> query optabs to check that) - if the comparison also dispatches to
> libgcc then of course the transform would be a win again.

Ok. I am not sure about using cbranch_optab, other ideas for the 
comparison?

> Even on x86 TImode division uses libgcc.

This reminds me of PR 58897 (unrelated).

> Note also that value-profiling may have created this special case in the 
> first place! (gimple_divmod_fixed_value_transform)

Test coverage must be limited if I didn't break the testsuite :-(

value-profiling seems to set edge probabilities when it does the 
transformation, so I am checking only that.

> Otherwise I think this is a good transform.  Did you check if it triggers
> during GCC bootstrap?

It does, a few times. Java has the obvious:

     int padLength = blockSize;
     if (length % blockSize != 0)
       padLength = blockSize - length % blockSize;

sched-deps.c has (I didn't check if it was actually this piece of code 
that triggered or another one in the same file):

       if ((ds1 & t) && !(ds2 & t))
         ds |= ds1 & t;
       else if (!(ds1 & t) && (ds2 & t))
         ds |= ds2 & t;

and we end up seeing (phiopt3) if(a!=0)ds|=a.

And mostly bitmap.h where we see quite a bit of:

   if (_867 != 0)
     goto <bb 55>;
   else
     goto <bb 56>;

   <bb 55>:
   start_bit_869 = _867 * 128;

   <bb 56>:
   # start_bit_870 = PHI <0(54), start_bit_869(55)>


Bootstrap+testsuite on x86_64-linux-gnu (modulo a minor reformatting, I'm 
retesting anyway).

2014-04-14  Marc Glisse  <marc.glisse@inria.fr>

 	PR tree-optimization/59100
gcc/
 	* tree-ssa-phiopt.c (neutral_element_p, absorbing_element_p,
 	is_cheap_stmt): New functions.
 	(value_replacement): Handle conditional binary operations with a
 	neutral or absorbing element.
gcc/testsuite/
 	* gcc.dg/tree-ssa/phi-opt-12.c: New file.
 	* gcc.dg/tree-ssa/phi-opt-13.c: Likewise.

Comments

Richard Biener April 14, 2014, 1:48 p.m. UTC | #1
On Sun, Apr 13, 2014 at 10:58 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Mon, 3 Mar 2014, Richard Biener wrote:
>
>> On Sat, Mar 1, 2014 at 3:33 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
>>>
>>> Hello,
>>>
>>> again, a stage 1 patch that I will ping then, but early comments are
>>> welcome.
>>>
>>> PR 59100 was asking to transform n?rotate(x,n):x to rotate(x,n) (because
>>> it
>>> can be hard to write a strictly valid rotate in plain C). The operation
>>> is
>>> really:
>>> (x != neutral) ? x op y : y
>>> where neutral is such that (neutral op y) is always y, so that's what I
>>> implemented (and absorbing elements while I was at it).
>>>
>>> For some operations on some platforms, the transformation may not be such
>>> a
>>> good idea, in particular if division is very slow and b is 1 most of the
>>> time, then computing a/b may be slower than (b!=1)?a/b:a. The easiest
>>> might
>>> be to comment out those operations in the switch for now. I think
>>> divisions
>>> are the only ones slow enough to deserve this, though there certainly are
>>> CPUs where multiplication is not so fast, and even for rotate it may not
>>> always be a win if the processor doesn't have a rotate instruction and
>>> the
>>> shift amount is almost always 0.
>>
>>
>> You only handle integer operations, so checking for INTEGER_TYPE_P
>> early on would make sense.
>
>
> Ah, I wasn't planning on restricting this to integers but yes, indeed, since
> it ended up that way...
>
>
>> Note that some archs may dispatch to libgcc for integer operations
>> so it may make sense to check whether that is the case (you can
>> query optabs to check that) - if the comparison also dispatches to
>> libgcc then of course the transform would be a win again.
>
>
> Ok. I am not sure about using cbranch_optab, other ideas for the comparison?
>
>
>> Even on x86 TImode division uses libgcc.
>
>
> This reminds me of PR 58897 (unrelated).
>
>
>> Note also that value-profiling may have created this special case in the
>> first place! (gimple_divmod_fixed_value_transform)
>
>
> Test coverage must be limited if I didn't break the testsuite :-(
>
> value-profiling seems to set edge probabilities when it does the
> transformation, so I am checking only that.
>
>
>> Otherwise I think this is a good transform.  Did you check if it triggers
>> during GCC bootstrap?
>
>
> It does, a few times. Java has the obvious:
>
>     int padLength = blockSize;
>     if (length % blockSize != 0)
>       padLength = blockSize - length % blockSize;
>
> sched-deps.c has (I didn't check if it was actually this piece of code that
> triggered or another one in the same file):
>
>       if ((ds1 & t) && !(ds2 & t))
>         ds |= ds1 & t;
>       else if (!(ds1 & t) && (ds2 & t))
>         ds |= ds2 & t;
>
> and we end up seeing (phiopt3) if(a!=0)ds|=a.
>
> And mostly bitmap.h where we see quite a bit of:
>
>   if (_867 != 0)
>     goto <bb 55>;
>   else
>     goto <bb 56>;
>
>   <bb 55>:
>   start_bit_869 = _867 * 128;
>
>   <bb 56>:
>   # start_bit_870 = PHI <0(54), start_bit_869(55)>
>
>
> Bootstrap+testsuite on x86_64-linux-gnu (modulo a minor reformatting, I'm
> retesting anyway).

Few comments below

> 2014-04-14  Marc Glisse  <marc.glisse@inria.fr>
>
>         PR tree-optimization/59100
> gcc/
>         * tree-ssa-phiopt.c (neutral_element_p, absorbing_element_p,
>         is_cheap_stmt): New functions.
>
>         (value_replacement): Handle conditional binary operations with a
>         neutral or absorbing element.
> gcc/testsuite/
>         * gcc.dg/tree-ssa/phi-opt-12.c: New file.
>         * gcc.dg/tree-ssa/phi-opt-13.c: Likewise.
>
> --
> Marc Glisse
> Index: gcc/testsuite/gcc.dg/tree-ssa/phi-opt-12.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/phi-opt-12.c  (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/phi-opt-12.c  (working copy)
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-phiopt1" } */
> +
> +int f(int a, int b, int c) {
> +  if (c > 5) return c;
> +  if (a == 0) return b;
> +  return a + b;
> +}
> +
> +unsigned rot(unsigned x, int n) {
> +  const int bits = __CHAR_BIT__ * __SIZEOF_INT__;
> +  return (n == 0) ? x : ((x << n) | (x >> (bits - n)));
> +}
> +
> +unsigned m(unsigned a, unsigned b) {
> +  if (a == 0)
> +    return 0;
> +  else
> +    return a & b;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "goto" 2 "phiopt1" } } */
> +/* { dg-final { cleanup-tree-dump "phiopt1" } } */
>
> Property changes on: gcc/testsuite/gcc.dg/tree-ssa/phi-opt-12.c
> ___________________________________________________________________
> Added: svn:keywords
> ## -0,0 +1 ##
> +Author Date Id Revision URL
> \ No newline at end of property
> Added: svn:eol-style
> ## -0,0 +1 ##
> +native
> \ No newline at end of property

Try to avoid this

> Index: gcc/testsuite/gcc.dg/tree-ssa/phi-opt-13.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/phi-opt-13.c  (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/phi-opt-13.c  (working copy)
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target x86_64-*-* } } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +int f(int a, int b) {
> +  if (__builtin_expect(a == 0, 1)) return b;
> +  return a + b;
> +}
> +
> +// optab_handler can handle if(b==1) but not a/b
> +// so we consider a/b too expensive.
> +unsigned __int128 g(unsigned __int128 a, unsigned __int128 b) {
> +  if (b == 1)
> +    return a;
> +  else
> +    return a / b;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "goto " 4 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
>
> Property changes on: gcc/testsuite/gcc.dg/tree-ssa/phi-opt-13.c
> ___________________________________________________________________
> Added: svn:keywords
> ## -0,0 +1 ##
> +Author Date Id Revision URL
> \ No newline at end of property
> Added: svn:eol-style
> ## -0,0 +1 ##
> +native
> \ No newline at end of property
> Index: gcc/tree-ssa-phiopt.c
> ===================================================================
> --- gcc/tree-ssa-phiopt.c       (revision 209347)
> +++ gcc/tree-ssa-phiopt.c       (working copy)
> @@ -140,20 +140,37 @@ static bool gate_hoist_loads (void);
>         x = PHI (CONST, a)
>
>     Gets replaced with:
>       bb0:
>       bb2:
>         t1 = a == CONST;
>         t2 = b > c;
>         t3 = t1 & t2;
>         x = a;
>
> +
> +   It also replaces
> +
> +     bb0:
> +       if (a != 0) goto bb1; else goto bb2;
> +     bb1:
> +       c = a + b;
> +     bb2:
> +       x = PHI <c (bb1), b (bb0), ...>;
> +
> +   with
> +
> +     bb0:
> +       c = a + b;
> +     bb2:
> +       x = PHI <c (bb0), ...>;
> +
>     ABS Replacement
>     ---------------
>
>     This transformation, implemented in abs_replacement, replaces
>
>       bb0:
>         if (a >= 0) goto bb2; else goto bb1;
>       bb1:
>         x = -a;
>       bb2:
> @@ -809,20 +826,104 @@ operand_equal_for_value_replacement (con
>    if (rhs_is_fed_for_value_replacement (arg0, arg1, code, tmp))
>      return true;
>
>    tmp = gimple_assign_rhs2 (def);
>    if (rhs_is_fed_for_value_replacement (arg0, arg1, code, tmp))
>      return true;
>
>    return false;
>  }
>
> +/* Returns true if ARG is a neutral element for operation CODE
> +   on the RIGHT side.  */
> +
> +static bool
> +neutral_element_p (tree_code code, tree arg, bool right)
> +{
> +  switch (code)
> +    {
> +    case PLUS_EXPR:
> +    case BIT_IOR_EXPR:
> +    case BIT_XOR_EXPR:
> +      return integer_zerop (arg);
> +
> +    case LROTATE_EXPR:
> +    case RROTATE_EXPR:
> +    case LSHIFT_EXPR:
> +    case RSHIFT_EXPR:
> +    case MINUS_EXPR:
> +    case POINTER_PLUS_EXPR:
> +      return right && integer_zerop (arg);
> +
> +    case MULT_EXPR:
> +      return integer_onep (arg);
> +
> +    // Are those too expensive? Should we check edge probabilities?

Comment obsolete now

> +    case TRUNC_DIV_EXPR:
> +    case CEIL_DIV_EXPR:
> +    case FLOOR_DIV_EXPR:
> +    case ROUND_DIV_EXPR:
> +    case EXACT_DIV_EXPR:
> +      return right && integer_onep (arg);
> +
> +    case BIT_AND_EXPR:
> +      return integer_all_onesp (arg);
> +
> +    default:
> +      return false;
> +    }
> +}
> +
> +/* Returns true if ARG is an absorbing element for operation CODE.  */
> +
> +static bool
> +absorbing_element_p (tree_code code, tree arg)
> +{
> +  switch (code)
> +    {
> +    case BIT_IOR_EXPR:
> +      return integer_all_onesp (arg);
> +
> +    case MULT_EXPR:
> +    case BIT_AND_EXPR:
> +      return integer_zerop (arg);
> +
> +    default:
> +      return false;
> +    }
> +}
> +
> +/* Returns true if the statement is cheap. The simple heuristic used here
> +   is that anything the optab knows is cheap.  */
> +
> +static bool
> +is_cheap_stmt (gimple stmt)
> +{
> +  tree type;
> +  optab op;
> +  if (is_gimple_assign (stmt))
> +    {
> +      type = TREE_TYPE (gimple_assign_rhs1 (stmt));
> +      enum tree_code code = gimple_assign_rhs_code (stmt);
> +      op = optab_for_tree_code (code, type, optab_scalar);
> +    }
> +  else if (gimple_code (stmt) == GIMPLE_COND)
> +    {
> +      type = TREE_TYPE (gimple_cond_lhs (stmt));
> +      op = cbranch_optab;

Hum.  cmp_optab/ucmp_optab?  But it seems those are only used
to query libfunc names ...  Maybe try using can_compare_p instead
with ccp_jump, that looks like a suitable abstraction for is_cheap_stmt.

> +    }
> +  else
> +    gcc_unreachable ();
> +  return (op != unknown_optab
> +         && optab_handler (op, TYPE_MODE (type)) != CODE_FOR_nothing);
> +}
> +
>  /*  The function value_replacement does the main work of doing the value
>      replacement.  Return non-zero if the replacement is done.  Otherwise
> return
>      0.  If we remove the middle basic block, return 2.
>      BB is the basic block where the replacement is going to be done on.
> ARG0
>      is argument 0 from the PHI.  Likewise for ARG1.  */
>
>  static int
>  value_replacement (basic_block cond_bb, basic_block middle_bb,
>                    edge e0, edge e1, gimple phi,
>                    tree arg0, tree arg1)
> @@ -833,23 +934,21 @@ value_replacement (basic_block cond_bb,
>    enum tree_code code;
>    bool emtpy_or_with_defined_p = true;
>
>    /* If the type says honor signed zeros we cannot do this
>       optimization.  */
>    if (HONOR_SIGNED_ZEROS (TYPE_MODE (TREE_TYPE (arg1))))
>      return 0;
>
>    /* If there is a statement in MIDDLE_BB that defines one of the PHI
>       arguments, then adjust arg0 or arg1.  */
> -  gsi = gsi_after_labels (middle_bb);
> -  if (!gsi_end_p (gsi) && is_gimple_debug (gsi_stmt (gsi)))
> -    gsi_next_nondebug (&gsi);
> +  gsi = gsi_start_nondebug_after_labels_bb (middle_bb);
>    while (!gsi_end_p (gsi))
>      {
>        gimple stmt = gsi_stmt (gsi);
>        tree lhs;
>        gsi_next_nondebug (&gsi);
>        if (!is_gimple_assign (stmt))
>         {
>           emtpy_or_with_defined_p = false;
>           continue;
>         }
> @@ -931,20 +1030,66 @@ value_replacement (basic_block cond_bb,
>               print_generic_expr (dump_file, gimple_phi_result (phi), 0);
>               fprintf (dump_file, " reduced for COND_EXPR in block %d to ",
>                        cond_bb->index);
>               print_generic_expr (dump_file, arg, 0);
>               fprintf (dump_file, ".\n");
>              }
>            return 1;
>         }
>
>      }
> +
> +  /* Now optimize (x != 0) ? x + y : y to just y.
> +     The following condition is too restrictive, there can easily be
> another
> +     stmt in middle_bb, for instance a CONVERT_EXPR for the second
> argument.  */
> +  gimple assign = last_and_only_stmt (middle_bb);
> +  if (!assign || gimple_code (assign) != GIMPLE_ASSIGN
> +      || gimple_assign_rhs_class (assign) != GIMPLE_BINARY_RHS
> +      || !INTEGRAL_TYPE_P (TREE_TYPE (arg0)))

That will make POINTER_PLUS_EXPR unhandled (pointer arg0).
BIT_AND_EXPR also is valid for pointers (both arguments are pointer type).

> +    return 0;
> +
> +  /* assign may call a libgcc routine, which is slow.  */
> +  if (!is_cheap_stmt (assign) && is_cheap_stmt (cond))
> +    return 0;
> +
> +  /* If the special case has a high probability, keep it.  */
> +  if (EDGE_PRED (middle_bb, 0)->probability < PROB_EVEN)

I suppose Honza has a comment on how to test this properly
(not sure if ->probability or ->frequency is always initialized properly).
for example single_likely_edge tests profile_status_for_fn !=
PROFILE_ABSENT (and uses a fixed probability value ...).
Anyway, the comparison looks backwards to me, but maybe I'm
missing sth - I'd use >= PROB_LIKELY ;)

Otherwise the patch looks ok to me.

Thanks,
Richard.

> +    return 0;
> +
> +  tree lhs = gimple_assign_lhs (assign);
> +  tree rhs1 = gimple_assign_rhs1 (assign);
> +  tree rhs2 = gimple_assign_rhs2 (assign);
> +  enum tree_code code_def = gimple_assign_rhs_code (assign);
> +  tree cond_lhs = gimple_cond_lhs (cond);
> +  tree cond_rhs = gimple_cond_rhs (cond);
> +
> +  if (((code == NE_EXPR && e1 == false_edge)
> +       || (code == EQ_EXPR && e1 == true_edge))
> +      && arg0 == lhs
> +      && ((arg1 == rhs1
> +          && operand_equal_for_phi_arg_p (rhs2, cond_lhs)
> +          && neutral_element_p (code_def, cond_rhs, true))
> +         || (arg1 == rhs2
> +             && operand_equal_for_phi_arg_p (rhs1, cond_lhs)
> +             && neutral_element_p (code_def, cond_rhs, false))
> +         || (operand_equal_for_phi_arg_p (arg1, cond_rhs)
> +             && (operand_equal_for_phi_arg_p (rhs2, cond_lhs)
> +                 || operand_equal_for_phi_arg_p (rhs1, cond_lhs))
> +             && absorbing_element_p (code_def, cond_rhs))))
> +    {
> +      gsi = gsi_for_stmt (cond);
> +      gimple_stmt_iterator gsi_from = gsi_for_stmt (assign);
> +      gsi_move_before (&gsi_from, &gsi);
> +      replace_phi_edge_with_variable (cond_bb, e1, phi, lhs);
> +      return 2;
> +    }
> +
>    return 0;
>  }
>
>  /*  The function minmax_replacement does the main work of doing the minmax
>      replacement.  Return true if the replacement is done.  Otherwise return
>      false.
>      BB is the basic block where the replacement is going to be done on.
> ARG0
>      is argument 0 from the PHI.  Likewise for ARG1.  */
>
>  static bool
>
diff mbox

Patch

Index: gcc/testsuite/gcc.dg/tree-ssa/phi-opt-12.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/phi-opt-12.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/phi-opt-12.c	(working copy)
@@ -0,0 +1,23 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-phiopt1" } */
+
+int f(int a, int b, int c) {
+  if (c > 5) return c;
+  if (a == 0) return b;
+  return a + b;
+}
+
+unsigned rot(unsigned x, int n) {
+  const int bits = __CHAR_BIT__ * __SIZEOF_INT__;
+  return (n == 0) ? x : ((x << n) | (x >> (bits - n)));
+}
+
+unsigned m(unsigned a, unsigned b) {
+  if (a == 0)
+    return 0;
+  else
+    return a & b;
+}
+
+/* { dg-final { scan-tree-dump-times "goto" 2 "phiopt1" } } */
+/* { dg-final { cleanup-tree-dump "phiopt1" } } */

Property changes on: gcc/testsuite/gcc.dg/tree-ssa/phi-opt-12.c
___________________________________________________________________
Added: svn:keywords
## -0,0 +1 ##
+Author Date Id Revision URL
\ No newline at end of property
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Index: gcc/testsuite/gcc.dg/tree-ssa/phi-opt-13.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/phi-opt-13.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/phi-opt-13.c	(working copy)
@@ -0,0 +1,19 @@ 
+/* { dg-do compile { target x86_64-*-* } } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int f(int a, int b) {
+  if (__builtin_expect(a == 0, 1)) return b;
+  return a + b;
+}
+
+// optab_handler can handle if(b==1) but not a/b
+// so we consider a/b too expensive.
+unsigned __int128 g(unsigned __int128 a, unsigned __int128 b) {
+  if (b == 1)
+    return a;
+  else
+    return a / b;
+}
+
+/* { dg-final { scan-tree-dump-times "goto " 4 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */

Property changes on: gcc/testsuite/gcc.dg/tree-ssa/phi-opt-13.c
___________________________________________________________________
Added: svn:keywords
## -0,0 +1 ##
+Author Date Id Revision URL
\ No newline at end of property
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Index: gcc/tree-ssa-phiopt.c
===================================================================
--- gcc/tree-ssa-phiopt.c	(revision 209347)
+++ gcc/tree-ssa-phiopt.c	(working copy)
@@ -140,20 +140,37 @@  static bool gate_hoist_loads (void);
        x = PHI (CONST, a)
 
    Gets replaced with:
      bb0:
      bb2:
        t1 = a == CONST;
        t2 = b > c;
        t3 = t1 & t2;
        x = a;
 
+
+   It also replaces
+
+     bb0:
+       if (a != 0) goto bb1; else goto bb2;
+     bb1:
+       c = a + b;
+     bb2:
+       x = PHI <c (bb1), b (bb0), ...>;
+
+   with
+
+     bb0:
+       c = a + b;
+     bb2:
+       x = PHI <c (bb0), ...>;
+
    ABS Replacement
    ---------------
 
    This transformation, implemented in abs_replacement, replaces
 
      bb0:
        if (a >= 0) goto bb2; else goto bb1;
      bb1:
        x = -a;
      bb2:
@@ -809,20 +826,104 @@  operand_equal_for_value_replacement (con
   if (rhs_is_fed_for_value_replacement (arg0, arg1, code, tmp))
     return true;
 
   tmp = gimple_assign_rhs2 (def);
   if (rhs_is_fed_for_value_replacement (arg0, arg1, code, tmp))
     return true;
 
   return false;
 }
 
+/* Returns true if ARG is a neutral element for operation CODE
+   on the RIGHT side.  */
+
+static bool
+neutral_element_p (tree_code code, tree arg, bool right)
+{
+  switch (code)
+    {
+    case PLUS_EXPR:
+    case BIT_IOR_EXPR:
+    case BIT_XOR_EXPR:
+      return integer_zerop (arg);
+
+    case LROTATE_EXPR:
+    case RROTATE_EXPR:
+    case LSHIFT_EXPR:
+    case RSHIFT_EXPR:
+    case MINUS_EXPR:
+    case POINTER_PLUS_EXPR:
+      return right && integer_zerop (arg);
+
+    case MULT_EXPR:
+      return integer_onep (arg);
+
+    // Are those too expensive? Should we check edge probabilities?
+    case TRUNC_DIV_EXPR:
+    case CEIL_DIV_EXPR:
+    case FLOOR_DIV_EXPR:
+    case ROUND_DIV_EXPR:
+    case EXACT_DIV_EXPR:
+      return right && integer_onep (arg);
+
+    case BIT_AND_EXPR:
+      return integer_all_onesp (arg);
+
+    default:
+      return false;
+    }
+}
+
+/* Returns true if ARG is an absorbing element for operation CODE.  */
+
+static bool
+absorbing_element_p (tree_code code, tree arg)
+{
+  switch (code)
+    {
+    case BIT_IOR_EXPR:
+      return integer_all_onesp (arg);
+
+    case MULT_EXPR:
+    case BIT_AND_EXPR:
+      return integer_zerop (arg);
+
+    default:
+      return false;
+    }
+}
+
+/* Returns true if the statement is cheap. The simple heuristic used here
+   is that anything the optab knows is cheap.  */
+
+static bool
+is_cheap_stmt (gimple stmt)
+{
+  tree type;
+  optab op;
+  if (is_gimple_assign (stmt))
+    {
+      type = TREE_TYPE (gimple_assign_rhs1 (stmt));
+      enum tree_code code = gimple_assign_rhs_code (stmt);
+      op = optab_for_tree_code (code, type, optab_scalar);
+    }
+  else if (gimple_code (stmt) == GIMPLE_COND)
+    {
+      type = TREE_TYPE (gimple_cond_lhs (stmt));
+      op = cbranch_optab;
+    }
+  else
+    gcc_unreachable ();
+  return (op != unknown_optab
+	  && optab_handler (op, TYPE_MODE (type)) != CODE_FOR_nothing);
+}
+
 /*  The function value_replacement does the main work of doing the value
     replacement.  Return non-zero if the replacement is done.  Otherwise return
     0.  If we remove the middle basic block, return 2.
     BB is the basic block where the replacement is going to be done on.  ARG0
     is argument 0 from the PHI.  Likewise for ARG1.  */
 
 static int
 value_replacement (basic_block cond_bb, basic_block middle_bb,
 		   edge e0, edge e1, gimple phi,
 		   tree arg0, tree arg1)
@@ -833,23 +934,21 @@  value_replacement (basic_block cond_bb,
   enum tree_code code;
   bool emtpy_or_with_defined_p = true;
 
   /* If the type says honor signed zeros we cannot do this
      optimization.  */
   if (HONOR_SIGNED_ZEROS (TYPE_MODE (TREE_TYPE (arg1))))
     return 0;
 
   /* If there is a statement in MIDDLE_BB that defines one of the PHI
      arguments, then adjust arg0 or arg1.  */
-  gsi = gsi_after_labels (middle_bb);
-  if (!gsi_end_p (gsi) && is_gimple_debug (gsi_stmt (gsi)))
-    gsi_next_nondebug (&gsi);
+  gsi = gsi_start_nondebug_after_labels_bb (middle_bb);
   while (!gsi_end_p (gsi))
     {
       gimple stmt = gsi_stmt (gsi);
       tree lhs;
       gsi_next_nondebug (&gsi);
       if (!is_gimple_assign (stmt))
 	{
 	  emtpy_or_with_defined_p = false;
 	  continue;
 	}
@@ -931,20 +1030,66 @@  value_replacement (basic_block cond_bb,
 	      print_generic_expr (dump_file, gimple_phi_result (phi), 0);
 	      fprintf (dump_file, " reduced for COND_EXPR in block %d to ",
 		       cond_bb->index);
 	      print_generic_expr (dump_file, arg, 0);
 	      fprintf (dump_file, ".\n");
             }
           return 1;
 	}
 
     }
+
+  /* Now optimize (x != 0) ? x + y : y to just y.
+     The following condition is too restrictive, there can easily be another
+     stmt in middle_bb, for instance a CONVERT_EXPR for the second argument.  */
+  gimple assign = last_and_only_stmt (middle_bb);
+  if (!assign || gimple_code (assign) != GIMPLE_ASSIGN
+      || gimple_assign_rhs_class (assign) != GIMPLE_BINARY_RHS
+      || !INTEGRAL_TYPE_P (TREE_TYPE (arg0)))
+    return 0;
+
+  /* assign may call a libgcc routine, which is slow.  */
+  if (!is_cheap_stmt (assign) && is_cheap_stmt (cond))
+    return 0;
+
+  /* If the special case has a high probability, keep it.  */
+  if (EDGE_PRED (middle_bb, 0)->probability < PROB_EVEN)
+    return 0;
+
+  tree lhs = gimple_assign_lhs (assign);
+  tree rhs1 = gimple_assign_rhs1 (assign);
+  tree rhs2 = gimple_assign_rhs2 (assign);
+  enum tree_code code_def = gimple_assign_rhs_code (assign);
+  tree cond_lhs = gimple_cond_lhs (cond);
+  tree cond_rhs = gimple_cond_rhs (cond);
+
+  if (((code == NE_EXPR && e1 == false_edge)
+	|| (code == EQ_EXPR && e1 == true_edge))
+      && arg0 == lhs
+      && ((arg1 == rhs1
+	   && operand_equal_for_phi_arg_p (rhs2, cond_lhs)
+	   && neutral_element_p (code_def, cond_rhs, true))
+	  || (arg1 == rhs2
+	      && operand_equal_for_phi_arg_p (rhs1, cond_lhs)
+	      && neutral_element_p (code_def, cond_rhs, false))
+	  || (operand_equal_for_phi_arg_p (arg1, cond_rhs)
+	      && (operand_equal_for_phi_arg_p (rhs2, cond_lhs)
+		  || operand_equal_for_phi_arg_p (rhs1, cond_lhs))
+	      && absorbing_element_p (code_def, cond_rhs))))
+    {
+      gsi = gsi_for_stmt (cond);
+      gimple_stmt_iterator gsi_from = gsi_for_stmt (assign);
+      gsi_move_before (&gsi_from, &gsi);
+      replace_phi_edge_with_variable (cond_bb, e1, phi, lhs);
+      return 2;
+    }
+
   return 0;
 }
 
 /*  The function minmax_replacement does the main work of doing the minmax
     replacement.  Return true if the replacement is done.  Otherwise return
     false.
     BB is the basic block where the replacement is going to be done on.  ARG0
     is argument 0 from the PHI.  Likewise for ARG1.  */
 
 static bool