Patchwork [tree-optimization]: Improve reassociation pass for bitwise operations

Submitter Kai Tietz
Date Aug. 2, 2011, 8:06 p.m.
Message ID <CAEwic4a-PGR=+h--OT4Saz+XuW5drEePqQq8Y-4WN=F1L4A1rg@mail.gmail.com>
Permalink /patch/107993/
State New

Comments

Kai Tietz - Aug. 2, 2011, 8:06 p.m.
Hello,

this patch improves the ability of the reassociation pass to simplify
more complex bitwise-binary operations with comparisons.  For this patch
we break up statements like (X | Y) != 0 into X != 0 | Y != 0, and
(X | Y) == 0 into the expanded X == 0 & Y == 0.
Additionally we expand logical-not expressions like ~(A & B) -> ~A | ~B,
~(A | B) -> ~A & ~B, and ~(A ^ B) -> A ^ ~B.  These expansions are just
temporary for this pass and later get reversed by fold back to their
original form.

ChangeLog

2011-08-02  Kai Tietz  <ktietz@redhat.com>

	* tree-ssa-reassoc.c (build_and_add_sum): Add forward
	declaration and add support for unary expressions.
	(remove_visited_stmt_chain): Add forward declaration.
	(make_new_tmp_statement): New helper function.
	(expand_not_bitwise_binary): Likewise.
	(break_up_bitwise_combined_stmt): Likewise.
	(break_up_subtract_bb): Add call to break_up_bitwise_combined_stmt.
	(break_up_subtract_bb): Add call to break_up_bitwise_combined_stmt.


gcc/testsuite/ChangeLog

2011-08-02  Kai Tietz  <ktietz@redhat.com>

	* gcc.dg/tree-ssa/reassoc-24.c: New test.
	* gcc.dg/tree-ssa/reassoc-25.c: New test.
	* gcc.dg/tree-ssa/reassoc-26.c: New test.

Bootstrapped and regression-tested for all languages (including Ada
and Obj-C++) on host x86_64-pc-linux-gnu.
OK to apply?

Regards,
Kai
Michael Matz - Aug. 3, 2011, 1:07 p.m.
Hi,

On Tue, 2 Aug 2011, Kai Tietz wrote:

> this patch improves the ability of reassociation pass to simplifiy
> more complex bitwise-binary
> operations with comparisons.  We break-up for this patch statements
> like (X | Y) != 0 to X != 0 | Y != 0,
> and (X | Y) == 0 to expanded X == 0 & Y == 0.
> Additionally we expand logical-not expressions like ~(A & B) -> ~A |
> ~B, ~(A & B) -> ~A | ~B, and
> ~(A ^ B) -> A ^ ~B.  These expansion are just temporary for this pass
> and getting later by fold
> reversed again back to their original form.

Implement all of this in the normal reassoc machinery that already exists.  
Don't implement your own walker (which btw is superlinear because you 
recurse into both operands).  If no simplifications were possible you have 
to fold back the NOTs into the shorter sequences again, which conveniently 
reassoc already can do for negates and PLUS/MINUS chains.

Hence extend the existing support for arithmetic operations to logical 
operations.


Ciao,
Michael.
Kai Tietz - Aug. 3, 2011, 1:32 p.m.
2011/8/3 Michael Matz <matz@suse.de>:
> Hi,
>
> On Tue, 2 Aug 2011, Kai Tietz wrote:
>
>> this patch improves the ability of reassociation pass to simplifiy
>> more complex bitwise-binary
>> operations with comparisons.  We break-up for this patch statements
>> like (X | Y) != 0 to X != 0 | Y != 0,
>> and (X | Y) == 0 to expanded X == 0 & Y == 0.
>> Additionally we expand logical-not expressions like ~(A & B) -> ~A |
>> ~B, ~(A & B) -> ~A | ~B, and
>> ~(A ^ B) -> A ^ ~B.  These expansion are just temporary for this pass
>> and getting later by fold
>> reversed again back to their original form.
>
> Implement all of this in the normal reassoc machinery that already exists.
> Don't implement your own walker (which btw is superlinear because you
> recurse into both operands).  If no simplifications were possible you have
> to fold back the NOTs into the shorter sequences again, which conveniently
> reassoc already can do for negates and PLUS/MINUS chains.
>
> Hence extend the existing support for arithmetic operations to logical
> operations.
>
>
> Ciao,
> Michael.

What do you mean by the existing machinery for negate expressions here?
This machinery doesn't work in this case, and additionally it doesn't
provide the opportunity to do a complete reassociation rewrite of
bitwise-expression chains.

E.g. the case (~(a | c) & (b & ~d)) would be expanded (by the code in
the patch) to ~a & ~c & b & ~d.
This intermediate result is good for detecting duplicated or inverted
operands.  On rebuilding the tree, the result gets (or should get)
transformed to ~(a | c | d) & b.
This opportunity is present only because we unwound the initial tree.
The classical folding pass isn't able to actually detect or do that on
gimplified trees.

For comparisons it is much the same, except that the bitwise-binary
chain is then certainly boolean-typed.  So, as an example:
(a == 1 & b != 0) & ~(a <= 1 | b == 0) would be transformed by the
code in the patch to a == 1 & b != 0 & a > 1 & b != 0, so the final
tree is able to transform this into a >= 1 & b != 0.  Again, without
expansion and flattening of the bitwise tree, we aren't able to detect
and combine that.

I thought about using the same mechanism as for negates, but it doesn't
work, and it has major weaknesses for bitwise-binary operations.  The
negate logic here is a predicate, but the unwinding and flattening of
bitwise trees isn't a predicate.

Regards,
Kai
Richard Guenther - Aug. 3, 2011, 2:12 p.m.
On Wed, Aug 3, 2011 at 3:32 PM, Kai Tietz <ktietz70@googlemail.com> wrote:
> 2011/8/3 Michael Matz <matz@suse.de>:
>> Hi,
>>
>> On Tue, 2 Aug 2011, Kai Tietz wrote:
>>
>>> this patch improves the ability of reassociation pass to simplifiy
>>> more complex bitwise-binary
>>> operations with comparisons.  We break-up for this patch statements
>>> like (X | Y) != 0 to X != 0 | Y != 0,
>>> and (X | Y) == 0 to expanded X == 0 & Y == 0.
>>> Additionally we expand logical-not expressions like ~(A & B) -> ~A |
>>> ~B, ~(A & B) -> ~A | ~B, and
>>> ~(A ^ B) -> A ^ ~B.  These expansion are just temporary for this pass
>>> and getting later by fold
>>> reversed again back to their original form.
>>
>> Implement all of this in the normal reassoc machinery that already exists.
>> Don't implement your own walker (which btw is superlinear because you
>> recurse into both operands).  If no simplifications were possible you have
>> to fold back the NOTs into the shorter sequences again, which conveniently
>> reassoc already can do for negates and PLUS/MINUS chains.
>>
>> Hence extend the existing support for arithmetic operations to logical
>> operations.
>>
>>
>> Ciao,
>> Michael.
>
> What you mean by existing machinery for negate expression here?  This
> machinery doen't work in this case and additionally doesn't provide
> the opportunity to do a complete reassociation rewrite of
> bitwise-expression-chains.
>
> Eg: the case (~(a | c) & (b & ~d))  would be expanded (by code in
> patch) to ~a & ~c & b & ~d.
> This intermediate result is good to inspect doubles, or inverted optimizations.
> On rebuilding of tree the result gets transformed (or should) to ~(a |
> c | d) & b.

It depends on the context whether a conjunctive or a disjunctive normal
form is what you want.  As you are mixing two operation kinds, reassoc
isn't the perfect place to deal with this (yet).

You don't seem to stop when single-use chains end (which is where
reassoc will give up), and you even visit stmts multiple times that way.
You need to at least do this "unfolding" in a much more controlled manner.

Richard.

> This opportunity is just present because we unwinded the intial tree.
> Classical folding pass isn't able to actual detect or do that on
> gimpled-trees.
>
> For comparisons it is somewhat the same, just that
> bitwise-binary-chain is then for sure boolean-typed.  So as example:
> (a == 1 & b != 0) & ~(a <= 1 | b == 0) would be transformed by the
> code in patch to a == 1 & b != 0 & a > 1 & b != 0. So the final tree
> is able to transform this into a >=1 & b != 0.  Again without
> expansion and flattening of bitwise tree, we aren't able to detect and
> combine that.
>
> I thought about using the same mechanism as for negate, but it doesn't
> work and has major weaknesses for bitwise-binary operations. The
> negate logic is here a predicate, but the unwinding and flattening of
> bitwise trees isn't a predicate.
>
> Regards,
> Kai
>
Michael Matz - Aug. 3, 2011, 2:21 p.m.
Hi,

On Wed, 3 Aug 2011, Kai Tietz wrote:

> > Implement all of this in the normal reassoc machinery that already 
> > exists. Don't implement your own walker (which btw is superlinear 
> > because you recurse into both operands).  If no simplifications were 
> > possible you have to fold back the NOTs into the shorter sequences 
> > again, which conveniently reassoc already can do for negates and 
> > PLUS/MINUS chains.
> >
> > Hence extend the existing support for arithmetic operations to logical 
> > operations.
> 
> What you mean by existing machinery for negate expression here?

tree-reassoc can handle chains of NEGATE/PLUS/MINUS/MULT/DIV expressions, 
including undistribution (A*X + B*X -> (A+B)*X).  It does have some 
deficiencies with mixed chains (e.g. mixture of PLUS and MULT 
expressions).

Now, in the usual mathematical parlance we have these correspondences:
  NOT == NEGATE
  AND == MULT
  IOR == PLUS
hence, if you extend the existing reassoc machinery to first deal nicely 
with mixed chains and then handle NOT/AND/IOR like their arithmetic 
counterparts, you will have improved reassoc for arithmetic operations 
_and_ extended it to also handle logical operation chains, without 
hacking up a completely different way of walking and collecting operands 
for logicals in parallel to the one for arithmetics.

> This machinery doen't work in this case

That's why you have to extend it.

> Eg: the case (~(a | c) & (b & ~d))  would be expanded (by code in
> patch) to ~a & ~c & b & ~d.

That's what I mean by better handling of mixed chains.  -(a+b) is 
already (sometimes) rewritten into -a + -b (see negate_value).  That's 
just slightly different from rewriting ~(a|b) into ~a & ~b.

> This intermediate result is good to inspect doubles, or inverted 
> optimizations. On rebuilding of tree the result gets transformed (or 
> should) to ~(a | c | d) & b.

And that would be done by undistribute_ops_list, if it were properly 
extended.

> This opportunity is just present because we unwinded the intial tree.
> Classical folding pass isn't able to actual detect or do that on
> gimpled-trees.

That's why you do this whole exercise in tree-reassoc, where it indeed 
belongs.  My point is that you should extend the existing means to support 
your operations, instead of implementing a similar-but-different approach 
in parallel.

> I thought about using the same mechanism as for negate, but it doesn't 
> work and has major weaknesses for bitwise-binary operations. The negate 
> logic is here a predicate, but the unwinding and flattening of bitwise 
> trees isn't a predicate.

What you describe as "flattening" is actually building a normal form.  
reassoc uses a vector of operand entries to hold such a normal form (as 
said, not yet dealing with mixed chains).  linearize_expr_tree does the 
flattening, and you would have to extend it (and its associated helper 
routines) to handle mixed chains and then your logical operations.

Once a linearized array of operands (possibly of level two, to handle mixed 
chains) exists, undistribute_ops_list and optimize_ops_list will optimize 
those operands again, and rewrite_expr_tree will produce statements 
implementing the optimized operand list.  You will have to extend that 
too for your logical operations (and possibly compares).


Ciao,
Michael.
Kai Tietz - Aug. 3, 2011, 2:55 p.m.
2011/8/3 Michael Matz <matz@suse.de>:
> Hi,
>
> On Wed, 3 Aug 2011, Kai Tietz wrote:
>
>> > Implement all of this in the normal reassoc machinery that already
>> > exists. Don't implement your own walker (which btw is superlinear
>> > because you recurse into both operands).  If no simplifications were
>> > possible you have to fold back the NOTs into the shorter sequences
>> > again, which conveniently reassoc already can do for negates and
>> > PLUS/MINUS chains.
>> >
>> > Hence extend the existing support for arithmetic operations to logical
>> > operations.
>>
>> What you mean by existing machinery for negate expression here?
>
> tree-reassoc can handle chains of NEGATE/PLUS/MINUS/MULT/DIV expressions,
> including undistribution (A*X + B*X -> (A+B)*X).  It does have some
> deficiencies with mixed chains (e.g. mixture of PLUS and MULT
> expressions).
>
> Now, in the usual mathematical parlance we have these correspondences:
>  NOT == NEGATE
>  AND == MULT
>  IOR == PLUS
> hence, if you extend the existing reassoc machinery to first deal nicely
> with mixed chains and then handle NOT/AND/IOR like their arithmetic
> counterparts you will have improved reassoc for arithmetic operations
> _and_ extended it to also handle logical operation chains.  Without
> hacking a completely different way of walking and collection operands for
> logicals in parallel to the one for arithmetics.
>
>> This machinery doen't work in this case
>
> That's why you have to extend it.

The issue with this machinery is that it assumes that the statement
itself gets transformed, but for the normalized form of an inverted
bitwise operation it is essential to perform the invert operation on
the operands, too.

>> Eg: the case (~(a | c) & (b & ~d))  would be expanded (by code in
>> patch) to ~a & ~c & b & ~d.
>
> That's what I mean with better handling of mixed chains.  -(a+b) is
> already (sometimes) rewritten into -a + -b (see negate_value).  That's
> just slightly different from rewriting ~(a|b) into ~a & ~b.

Yes, it affects just one operand.  And its weakness is that a (which
might be a more complex statement) doesn't get folded for the negate
operation here.  E.g. a = 1 - d; c = -(a + b); should become d - b - 1,
the optimized form of (d - (b + 1)).

But AFAIR the code, what break_up_subtract does is different.  It
modifies (a - b) -> a + (-b), which is certainly worthwhile to simplify
and to ease optimization of arithmetic PLUS chains.  But it doesn't
match the point you are talking about.

>> This intermediate result is good to inspect doubles, or inverted
>> optimizations. On rebuilding of tree the result gets transformed (or
>> should) to ~(a | c | d) & b.
>
> And that would be done by undistribute_ops_list, if it were properly
> extended.
>
>> This opportunity is just present because we unwinded the intial tree.
>> Classical folding pass isn't able to actual detect or do that on
>> gimpled-trees.
>
> That's why you do this whole excercise in tree-reassoc, where it indeed
> belongs.  My point is that you should extend the existing means to support
> your operations, instead of implementing a similar-but-different approach
> in parallel.

As I tried to point out before, the approach in reassoc only works well
for single statements (not to talk here about non-single-use
statements, which seem to be something my patch needs to handle
better; as Richard said, it might walk statements multiple times).
But for an invert the operands and the expression codes change, too.
As the later reassoc pass just walks the tree and combines similar
operation codes in its vector by rank, and later optimization happens
only within this range, it is essential to feed this vector with
proper (normalized) expression codes.

>> I thought about using the same mechanism as for negate, but it doesn't
>> work and has major weaknesses for bitwise-binary operations. The negate
>> logic is here a predicate, but the unwinding and flattening of bitwise
>> trees isn't a predicate.
>
> What you descrive as "flattening" is actually building a normal form.
> reassoc uses a vector of operand extries to hold such normal form (as
> said, not yet dealing with mixed chains).  linearize_expr_tree does the
> flattening, and you would have to extend it (and its associated helper
> routines) to handle mixed chains and then you logical operations.

Well, it does this only in a very limited way, and it is strictly bound
to an underlying statement.

> Once a linearized array of operands (possibly of level two to handle mixed
> chains) exists undistribute_ops_list and optimize_ops_list will optimize
> those operands again, and rewrite_expr_tree will produce statements
> implementing the so optimized operand list.  You will have to extend that
> too for your logical operations (and possibly compares).
>
>
> Ciao,
> Michael.

Actually it might also be a way forward to rewrite the reassociation
pass so that it just operates on normalized predicated trees made out
of the original tree, operates on them, and then writes the result out
in a denormalized form.  Such a tree would need information flags for
inverted, negated, code, and operands (maybe also altered), which means
that such a tree easily gets more complex.
Nevertheless, creating the final result out of a normalized predicated
tree would mean a rewrite of the tree anyway, even if in some rare
cases there might be the opportunity to reuse present statements and
avoid rewriting them.

Regards,
Kai
Michael Matz - Aug. 4, 2011, 1:14 p.m.
Hi,

On Wed, 3 Aug 2011, Kai Tietz wrote:

> >> This machinery doen't work in this case
> >
> > That's why you have to extend it.
> 
> The issue about this machinery is that it assumes that the statement 
> itself gets transformed, but for normalized form of invert of bitwise 
> operations it is essential to perform invert operation on the operands, 
> too.

Yes, and that is what break_up_subtract_bb and friends do for negates.  In 
particular if the defining statement is of the right type itself (right 
now simply a PLUS) it propagates the negate into the operands of that 
operation.  Extend it to handle AND/IOR/NOT and you're a step further.

> >> Eg: the case (~(a | c) & (b & ~d))  would be expanded (by code in
> >> patch) to ~a & ~c & b & ~d.
> >
> > That's what I mean with better handling of mixed chains.  -(a+b) is
> > already (sometimes) rewritten into -a + -b (see negate_value).  That's
> > just slightly different from rewriting ~(a|b) into ~a & ~b.
> 
> Yes, it affects just one operand.

No, it doesn't.

> And its weakness is that a (which might be a more complex statement) 
> doesn't get folded for the negate operation here.  Eg.  a = 1 - d; c = - 
> (a + b); should become d - b - 1 and optimized form of (d - (b + 1)).

The folding you're talking about is done by undistribute_ops_list, 
optimize_ops_list and repropagate_negates.

I'm asking you to follow the same general scheme of tree-ssa-reassoc which 
is:
1) enabling transformations to make collecting operands easier
   (this possibly changes and adds statements, that's the breakup_subtract
   things)
2) collect operands of operation chains
   (this would sometimes require remembering some context like top
   level and second level operation in your case)
3) optimize that list of operands
4) transform this list into statements again
5) undo transformations done in 1) if they turned out to be
   unprofitable

I'm fairly certain that the transformations you want can be done with this 
scheme.

In particular (as said multiple times) you are missing step (5) even 
though you unconditionally do step (1).  And you don't do steps 2-4 at 
all.

> But AFAIR the code the thing is different what break_up_substract does.  
> It modifies (a - b) -> a + (-b), which is for sure worth to simplify and 
> ease arithmetic plus optimization.  But doesn't match the point you are 
> talking about.

I'm not sure what you think my point is.  It is: reuse existing schemes to 
do things, extending them as necessary to support your usecases.


> As I tried to pointed out before is the approach in reassoc only well
> working for single statements

That's completely wrong.  reassoc was precisely implemented to handle a 
chain of arithmetic operations, i.e. multiple statements.

> But for an invert the operands and the expression-codes are changing,
> too.

Right, that's why you (possibly) might want to handle chains of mixed 
AND/IOR operations, though I don't see those happening in your testcases.  
After the negates are folded into the atomics (i.e. negates of AND and 
IOR are folded into their operands) you always have a chain of either 
AND or IOR (with the atomics negated or not).  That can trivially be 
handled with the currently implemented scheme.

> As later reassoc pass just walk the tree and combines simliar
> operation-codes in its vector by rank, and later optimization just
> happens within this range,  it is essential to feed this vector with
> proper (normalized) expression-codes.

That's step (1) above, and for arithmetics done by break_up_subtract.

> Actually it might be also a way to rewrite reassociation pass in a way
> that it just operates on normalized predicated trees made out of the
> orignal tree.  Operates on it, and then write it out in a denormalized
> form.  Such a tree would need information flags for inverted, negated,
> code, operands (maybe also altered), Which means that such a tree gets
> easily more complex.

Right, but ultimately it's the way forward.  Though, as I said, right now 
your testcases only have one common reducing operation.

> Nevertheless the creation of final result out of a normalized
> predicated tree would mean a re-write of tree anyway - even if there
> might be in some rare cases the opportunity to reuse present
> statements and avoid their rewriting.

The reusing of statements in tree-reassoc is merely an artifact, it's not 
an inherent design decision.


Ciao,
Michael.

Patch

Index: gcc/gcc/tree-ssa-reassoc.c
===================================================================
--- gcc.orig/gcc/tree-ssa-reassoc.c
+++ gcc/gcc/tree-ssa-reassoc.c
@@ -41,6 +41,10 @@  along with GCC; see the file COPYING3.
 #include "cfgloop.h"
 #include "flags.h"

+/* Forwarders.  */
+static gimple build_and_add_sum (tree, tree, tree, enum tree_code);
+static void remove_visited_stmt_chain (tree);
+
 /*  This is a simple global reassociation pass.  It is, in part, based
     on the LLVM pass of the same name (They do some things more/less
     than we do, in different orders, etc).
@@ -48,7 +52,9 @@  along with GCC; see the file COPYING3.
     It consists of five steps:

     1. Breaking up subtract operations into addition + negate, where
-    it would promote the reassociation of adds.
+    it would promote the reassociation of adds.  Additionally, breaking
+    up combined expressions made out of boolean-typed bitwise expressions
+    to improve simplification.

     2. Left linearization of the expression trees, so that (A+B)+(C+D)
     becomes (((A+B)+C)+D), which is easier for us to rewrite later.
@@ -554,6 +560,233 @@  get_unary_op (tree name, enum tree_code
   return NULL_TREE;
 }

+/* Create a temporary register expression with type TYPE, tree code CODE, and
+   operands OP1 and OP2.  If REF_DEF is a valid gimple statement, we use its
+   location for the newly generated temporary.
+   Returns the left-hand side of the new temporary register.  */
+static tree
+make_new_tmp_statement (tree type, enum tree_code code, tree op1, tree op2,
+			gimple ref_def)
+{
+  gimple sum;
+  tree tmpvar = create_tmp_reg (type, NULL);
+  add_referenced_var (tmpvar);
+  sum = build_and_add_sum (tmpvar, op1, op2, code);
+  if (ref_def)
+    gimple_set_location (sum, gimple_location (ref_def));
+  return gimple_get_lhs (sum);
+}
+
+/* Perform a logical-not operation on tree LHS, which has the optional
+   defining statement EXPR.  TYPE is of kind boolean with a 1-bit
+   precision.  */
+static tree
+expand_not_bitwise_binary (tree type, tree lhs, gimple expr)
+{
+  enum tree_code code = ERROR_MARK;
+  tree op1, op2;
+  gimple s1 = NULL, s2 = NULL;
+
+  if (expr && is_gimple_assign (expr))
+    code = gimple_assign_rhs_code (expr);
+  else if (TREE_CODE (lhs) == INTEGER_CST)
+    return fold_build1 (BIT_NOT_EXPR, type, lhs);
+
+  /* ~(~X) -> X.  */
+  if (code == BIT_NOT_EXPR)
+    return gimple_assign_rhs1 (expr);
+  /* Invert comparison if possible, otherwise fall through to
+     default case.  */
+  else if (TREE_CODE_CLASS (code) == tcc_comparison)
+    {
+      enum tree_code ncode;
+      ncode = invert_tree_comparison (code,
+				      HONOR_NANS (TYPE_MODE (type)));
+      if (ncode != ERROR_MARK)
+	return make_new_tmp_statement (type, ncode,
+				       gimple_assign_rhs1 (expr),
+				       gimple_assign_rhs2 (expr),
+				       expr);
+    }
+  /* ~(A & B) -> ~A | ~B.  */
+  else if (code == BIT_AND_EXPR)
+    {
+      op1 = gimple_assign_rhs1 (expr);
+      if (TREE_CODE (op1) == SSA_NAME)
+	s1 = SSA_NAME_DEF_STMT (op1);
+      op2 = gimple_assign_rhs2 (expr);
+      if (TREE_CODE (op2) == SSA_NAME)
+	s2 = SSA_NAME_DEF_STMT (op2);
+      op1 = expand_not_bitwise_binary (type, op1, s1);
+      op2 = expand_not_bitwise_binary (type, op2, s2);
+      if (TREE_CODE (op2) == INTEGER_CST && integer_zerop (op2))
+	return op1;
+      else if (TREE_CODE (op2) == INTEGER_CST && integer_all_onesp (op2))
+	return op2;
+      return make_new_tmp_statement (type, BIT_IOR_EXPR, op1, op2, expr);
+    }
+  /* ~(A | B) -> ~A & ~B.  */
+  else if (code == BIT_IOR_EXPR)
+    {
+      op1 = gimple_assign_rhs1 (expr);
+      if (TREE_CODE (op1) == SSA_NAME)
+	s1 = SSA_NAME_DEF_STMT (op1);
+      op2 = gimple_assign_rhs2 (expr);
+      if (TREE_CODE (op2) == SSA_NAME)
+	s2 = SSA_NAME_DEF_STMT (op2);
+      op1 = expand_not_bitwise_binary (type, op1, s1);
+      op2 = expand_not_bitwise_binary (type, op2, s2);
+      if (TREE_CODE (op2) == INTEGER_CST && integer_zerop (op2))
+	return op2;
+      else if (TREE_CODE (op2) == INTEGER_CST && integer_all_onesp (op2))
+	return op1;
+      return make_new_tmp_statement (type, BIT_AND_EXPR, op1, op2, expr);
+    }
+  /* ~(A ^ B) -> ~A ^ B.  Handle here special cases for B being
+     an integer constant, or being a logical-not.  */
+  else if (code == BIT_XOR_EXPR)
+    {
+      op1 = gimple_assign_rhs1 (expr);
+      if (TREE_CODE (op1) == SSA_NAME)
+	s1 = SSA_NAME_DEF_STMT (op1);
+      op2 = gimple_assign_rhs2 (expr);
+      if (TREE_CODE (op2) == SSA_NAME)
+	s2 = SSA_NAME_DEF_STMT (op2);
+      if (TREE_CODE (op2) != INTEGER_CST
+	  && (!s2 || !is_gimple_assign (s2)
+	      || gimple_assign_rhs_code (s2) != BIT_NOT_EXPR))
+        op1 = expand_not_bitwise_binary (type, op1, s1);
+      else
+        op2 = expand_not_bitwise_binary (type, op2, s2);
+
+      if (TREE_CODE (op2) == INTEGER_CST && integer_zerop (op2))
+	return op1;
+      else if (TREE_CODE (op2) == INTEGER_CST && integer_all_onesp (op2))
+	return make_new_tmp_statement (type, BIT_NOT_EXPR, op1, NULL_TREE,
+				       expr);
+      return make_new_tmp_statement (type, BIT_XOR_EXPR, op1, op2, expr);
+    }
+
+  /* Default case lhs -> ~lhs  */
+  return make_new_tmp_statement (type, BIT_NOT_EXPR, lhs, NULL_TREE, expr);
+}
+
+/* Break up statement STMT if it is a combined expression made out of
+   bitwise operations.  Handle expansion of (A | B) !=/== 0 and ~(A op B).  */
+static bool
+break_up_bitwise_combined_stmt (gimple stmt)
+{
+  tree op1, op2;
+  gimple op1_def, op2_def;
+  enum tree_code code = gimple_assign_rhs_code (stmt);
+  tree type = TREE_TYPE (gimple_assign_lhs (stmt));
+  gimple_stmt_iterator gsi;
+  bool ret;
+
+  /* Don't do anything for non-integral types.  */
+  if (!INTEGRAL_TYPE_P (type))
+    return false;
+
+  op1 = gimple_assign_rhs1 (stmt);
+
+  op1_def = NULL;
+  if (TREE_CODE (op1) != SSA_NAME
+      || !(op1_def = SSA_NAME_DEF_STMT (op1))
+      || !is_gimple_assign (op1_def))
+    op1_def = NULL;
+
+  if (op1_def && (code == NE_EXPR || code == EQ_EXPR))
+    {
+      tree zero = gimple_assign_rhs2 (stmt);
+      tree old_op = op1;
+
+      /* Check that right-hand operand has integral type and
+         not a boolean.  And see if it is constant zero valued.  */
+      if (!INTEGRAL_TYPE_P (TREE_TYPE (zero))
+	  || TREE_CODE (TREE_TYPE (zero)) == BOOLEAN_TYPE
+	  || !integer_zerop (zero))
+	return false;
+      /* Is left-hand operand bitwise-or expression?  */
+      if (gimple_assign_rhs_code (op1_def) != BIT_IOR_EXPR)
+	return false;
+      op1 = make_new_tmp_statement (type, code,
+				   gimple_assign_rhs1 (op1_def), zero,
+				   stmt);
+      op2 = make_new_tmp_statement (type, code,
+				   gimple_assign_rhs2 (op1_def), zero,
+				   stmt);
+
+      gsi = gsi_for_stmt (stmt);
+      gimple_assign_set_rhs_with_ops (&gsi, (code == NE_EXPR ? BIT_IOR_EXPR
+							     : BIT_AND_EXPR),
+				      op1, op2);
+      update_stmt (gsi_stmt (gsi));
+      remove_visited_stmt_chain (old_op);
+      return true;
+    }
+  /* Handle expansion for logical-not ~X.  */
+  else if (op1_def && code == BIT_NOT_EXPR
+           && TREE_CODE (type) == BOOLEAN_TYPE
+           && TYPE_PRECISION (type) == 1)
+    {
+      tree old_op;
+      enum tree_code inner_code = gimple_assign_rhs_code (op1_def);
+      if (inner_code != BIT_AND_EXPR && inner_code != BIT_IOR_EXPR
+          && inner_code != BIT_XOR_EXPR)
+	return false;
+      old_op = op1;
+      op1 = gimple_assign_rhs1 (op1_def);
+      op2 = gimple_assign_rhs2 (op1_def);
+      op1_def = op2_def = NULL;
+      if (TREE_CODE (op1) != SSA_NAME
+	|| (op1_def = SSA_NAME_DEF_STMT (op1)) == NULL
+	|| !is_gimple_assign (op1_def))
+	op1_def = NULL;
+      if (TREE_CODE (op2) != SSA_NAME
+	|| (op2_def = SSA_NAME_DEF_STMT (op2)) == NULL
+	|| !is_gimple_assign (op2_def))
+	op2_def = NULL;
+      if (inner_code == BIT_XOR_EXPR)
+	{
+          if (TREE_CODE (op2) != INTEGER_CST
+	      && (!op2_def || !is_gimple_assign (op2_def)
+	          || gimple_assign_rhs_code (op2_def) != BIT_NOT_EXPR))
+            op1 = expand_not_bitwise_binary (type, op1, op1_def);
+          else
+            op2 = expand_not_bitwise_binary (type, op2, op2_def);
+	}
+      else
+	{
+          op1 = expand_not_bitwise_binary (type, op1, op1_def);
+          op2 = expand_not_bitwise_binary (type, op2, op2_def);
+	  inner_code = (inner_code == BIT_AND_EXPR ? BIT_IOR_EXPR
+						   : BIT_AND_EXPR);
+	}
+      gsi = gsi_for_stmt (stmt);
+      gimple_assign_set_rhs_with_ops (&gsi, inner_code, op1, op2);
+      update_stmt (gsi_stmt (gsi));
+      remove_visited_stmt_chain (old_op);
+      return true;
+    }
+  /* Is CODE a bitwise-binary operation X op Y?  */
+  else if (code != BIT_AND_EXPR && code != BIT_IOR_EXPR
+	   && code != BIT_XOR_EXPR)
+   return false;
+
+  /* See if X needs to be expanded.  */
+  ret = false;
+  if (op1_def)
+    ret = break_up_bitwise_combined_stmt (op1_def);
+
+  /* See if Y needs to be expanded.  */
+  op2 = gimple_assign_rhs2 (stmt);
+  if (TREE_CODE (op2) != SSA_NAME
+      || !(op2_def = SSA_NAME_DEF_STMT (op2))
+      || !is_gimple_assign (op2_def))
+    return ret;
+  return break_up_bitwise_combined_stmt (op2_def) | ret;
+}
+
 /* If CURR and LAST are a pair of ops that OPCODE allows us to
    eliminate through equivalences, do so, remove them from OPS, and
    return true.  Otherwise, return false.  */
@@ -1015,8 +1248,8 @@  zero_one_operation (tree *def, enum tree
   while (1);
 }

-/* Builds one statement performing OP1 OPCODE OP2 using TMPVAR for
-   the result.  Places the statement after the definition of either
+/* Builds one statement performing OP1 OPCODE OP2 (or OPCODE OP1 if OP2
+   is NULL) using TMPVAR for the result.  Places the statement after the definition of either
    OP1 or OP2.  Returns the new statement.  */

 static gimple
@@ -1035,7 +1268,7 @@  build_and_add_sum (tree tmpvar, tree op1
   /* Find an insertion place and insert.  */
   if (TREE_CODE (op1) == SSA_NAME)
     op1def = SSA_NAME_DEF_STMT (op1);
-  if (TREE_CODE (op2) == SSA_NAME)
+  if (op2 && TREE_CODE (op2) == SSA_NAME)
     op2def = SSA_NAME_DEF_STMT (op2);
   if ((!op1def || gimple_nop_p (op1def))
       && (!op2def || gimple_nop_p (op2def)))
@@ -2133,6 +2366,17 @@  can_reassociate_p (tree op)
    we want to break up k = t - q, but we won't until we've transformed q
    = b - r, which won't be broken up until we transform b = c - d.

+   Break up !=/== 0 comparisons of bitwise-or operations to be able to
+   optimize within combined conditions.
+   (A | B) != 0 -> (A != 0) || (B != 0)
+   (A | B) == 0 -> (A == 0) && (B == 0)
+
+   Break up logical-not expressions of boolean-typed bitwise and/or/xor
+   operations to be able to optimize within combined conditions.
+   ~(A & B) -> ~A | ~B
+   ~(A | B) -> ~A & ~B
+   ~(A ^ B) -> A ^ ~B (special case if B is a constant)
+
    En passant, clear the GIMPLE visited flag on every statement.  */

 static void
@@ -2141,21 +2385,32 @@  break_up_subtract_bb (basic_block bb)
   gimple_stmt_iterator gsi;
   basic_block son;

-  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); )
     {
       gimple stmt = gsi_stmt (gsi);
       gimple_set_visited (stmt, false);
-
-      if (!is_gimple_assign (stmt)
-	  || !can_reassociate_p (gimple_assign_lhs (stmt)))
+      if (!is_gimple_assign (stmt))
+        {
+	  gsi_next (&gsi);
+	  continue;
+        }
+      if (break_up_bitwise_combined_stmt (stmt))
 	continue;
+      if (!can_reassociate_p (gimple_assign_lhs (stmt)))
+	{
+	  gsi_next (&gsi);
+	  continue;
+	}

       /* Look for simple gimple subtract operations.  */
       if (gimple_assign_rhs_code (stmt) == MINUS_EXPR)
 	{
 	  if (!can_reassociate_p (gimple_assign_rhs1 (stmt))
 	      || !can_reassociate_p (gimple_assign_rhs2 (stmt)))
-	    continue;
+	    {
+	      gsi_next (&gsi);
+	      continue;
+	    }

 	  /* Check for a subtract used only in an addition.  If this
 	     is the case, transform it into add of a negate for better
@@ -2167,6 +2422,7 @@  break_up_subtract_bb (basic_block bb)
       else if (gimple_assign_rhs_code (stmt) == NEGATE_EXPR
 	       && can_reassociate_p (gimple_assign_rhs1 (stmt)))
 	VEC_safe_push (tree, heap, plus_negates, gimple_assign_lhs (stmt));
+      gsi_next (&gsi);
     }
   for (son = first_dom_son (CDI_DOMINATORS, bb);
        son;
Index: gcc/gcc/testsuite/gcc.dg/tree-ssa/reassoc-24.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.dg/tree-ssa/reassoc-24.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized -ffast-math" } */
+
+int foo (int a, int b, int c, int d)
+{
+  int r1 = a != 0 & c != 0 & b != 0;
+  int r2 = a == 0 | b != 0 | d == 0;
+  return (r1 != 0 & r2 == 0);
+}
+
+/* { dg-final { scan-tree-dump-times "return 0" 1 "optimized"} } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/gcc/testsuite/gcc.dg/tree-ssa/reassoc-25.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.dg/tree-ssa/reassoc-25.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized -ffast-math" } */
+
+int foo (int a, int b, int c, int d)
+{
+  int r1 = a != 0 & c != 0 & b != 0;
+  int r2 = a == 0 | b != 0 | d == 0;
+  return (r1 == 0 | r2 != 0);
+}
+
+/* { dg-final { scan-tree-dump-times "return 1" 1 "optimized"} } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/gcc/testsuite/gcc.dg/tree-ssa/reassoc-26.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.dg/tree-ssa/reassoc-26.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized -ffast-math" } */
+
+int foo (int a, int b, int c, int d)
+{
+  int r1 = (a | b | c) == 0;
+  int r2 = (a | d | c) != 0 | b == 0;
+  return (r1 == 0 | r2 != 0);
+}
+
+/* { dg-final { scan-tree-dump-times "return 1" 1 "optimized"} } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */