Patchwork [tree-optimization, 1/2]: Branch-cost optimizations

Submitter Kai Tietz
Date Nov. 6, 2011, 10:12 p.m.
Message ID <b4b65f93-39e7-4429-a1eb-b24708e1f400@zmail14.collab.prod.int.phx2.redhat.com>
Permalink /patch/123976/
State New

Comments

Kai Tietz - Nov. 6, 2011, 10:12 p.m.
Hello,

This patch moves branch-cost optimization from the tree AST to cfgexpand, i.e. from gimple to RTL expansion.  This allows us to optimize conditionals similarly for all targets and to perform the final branch-cost transformation late enough that it shows the best effect.

This patch is split into two pieces.  The first adds the branch-cost (BC) optimization feature to cfgexpand and removes the BC-optimization code from fold-const.

The second patch adds to the tree-ssa-ifcombine pass the ability to merge simple if-and/or-if patterns into associative form.

Two tests in the C testsuite fail due to this patch: the uninit-pred-6_b and uninit-pred-6_c testcases.  Those failures are caused by jump-threading optimization in the VRP pass.  They could be fixed by the second patch if we moved the ifcombine pass before the first VRP pass.

ChangeLog

2011-11-06  Kai Tietz  <ktietz@redhat.com>

        * cfgexpand.c (is_bool_op_p): New helper.
        (normalize_truth_condition): Likewise.
        (cond_assoc_t): New structure type.
        (collect_cond_chain): New helper.
        (build_cond_expr): Likewise.
        (is_bitwise_binary_simple_combine): Likewise.
        (preeval_cond_integral): Likewise.
        (combine_conds): Likewise.
        (branchcost_optimization_on_conditions): Likewise.
        (expand_gimple_cond): Use branchcost_optimization_on_conditions
        function.
        * dojump.c (do_jump): Prevent converting bitwise-and/or
        to real ifs for branch-cost bigger than zero.
        * fold-const.c (simple_operand_p_2): Improve evaluation
        of side-effects and trapping for associative truth-bitwise
        binary operations.
        (fold_range_test): Remove branch-cost specific code.
        (fold_truth_andor_1): Likewise.
        (fold_truth_andor): Likewise.

ChangeLog testsuite

2011-11-06  Kai Tietz  <ktietz@redhat.com>

        * gcc.dg/pr46909.c: Adjust test.
        * gcc.dg/tree-ssa/vrp33.c: Likewise.
        * gcc.target/i386/branch-cost1.c: Likewise.
        * gcc.target/i386/branch-cost2.c: Likewise.
        * gcc.target/i386/branch-cost3.c: Likewise.
        * gcc.target/i386/branch-cost4.c: Likewise.

Patch was bootstrapped and regression-tested on x86_64-unknown-linux-gnu.  OK to apply?
Richard Guenther - Nov. 7, 2011, 2:38 p.m.
On Sun, Nov 6, 2011 at 11:12 PM, Kai Tietz <ktietz@redhat.com> wrote:
> Hello,
>
> This patch moves branch-cost optimization from the tree AST to cfgexpand, i.e. from gimple to RTL expansion.  This allows us to optimize conditionals similarly for all targets and to perform the final branch-cost transformation late enough that it shows the best effect.
>
> This patch is split into two pieces.  The first adds the branch-cost (BC) optimization feature to cfgexpand and removes the BC-optimization code from fold-const.
>
> The second patch adds to the tree-ssa-ifcombine pass the ability to merge simple if-and/or-if patterns into associative form.
>
> Two tests in the C testsuite fail due to this patch: the uninit-pred-6_b and uninit-pred-6_c testcases.  Those failures are caused by jump-threading optimization in the VRP pass.  They could be fixed by the second patch if we moved the ifcombine pass before the first VRP pass.

The idea of doing target cost specific conversion of straight-line conditional
code to branchy code and vice versa is a good one.  Though technically
the spot you chose isn't very suitable - we already performed SSA name
coalescing so you do not have the freedom that you seem to exercise
with using SSA name definition statements.

Rather than hooking the logic into the place where we expand conditions
this should be a "lowering" pass right before expansion (that would,
hopefully at some point also do switch statement expansion/lowering
according to target costs).  Initially such patch should try to exactly
copy what we do during expansion right now.

The cond_chain stuff should be as we have discussed quite some time
ago on IRC - modeled after tree-affine.c - a "vector" of predicates
of the form [~] A op B, combined using logical AND or OR
(thus, able to represent a disjunctive or conjunctive normal form).
All suitably abstracted so if-conversion, the loop optimizer and
loop number of iteration analysis can use this code (they all collect
a controlling predicate (chain) and try to simplify against it).

A patch adding a pre-expand pass should not at the same time
change what we feed it - what we have before such pass in the IL
should be driven by choice what canonical form is more optimal
for optimization passes (and unfortunately we have conflicting
requirements here - think of profile-feedback and the fact that
we do not have SSA names for each predicate we compute).

I note that there are still more patches from you pending related
to predicate optimizations - but I lack an idea where you are
heading to, from a global point of view.  Take the time that
stage3 and pre-stage1 offers to lay ground for a more structured
approach to this area.  You may, for example, want to pick up
my (suspended) work on separating predicate computation from
predicate use (or at least think about whether that's a good idea
for the purpose of predicate optimizations).

Thanks,
Richard.

> [ChangeLog snipped]
Kai Tietz - Nov. 7, 2011, 3:53 p.m.
2011/11/7 Richard Guenther <richard.guenther@gmail.com>:
> On Sun, Nov 6, 2011 at 11:12 PM, Kai Tietz <ktietz@redhat.com> wrote:
>> [patch description snipped]
>
> The idea of doing target cost specific conversion of straight-line conditional
> code to branchy code and vice versa is a good one.  Though technically
> the spot you chose isn't very suitable - we already performed SSA name
> coalescing so you do not have the freedom that you seem to exercise
> with using SSA name definition statements.

Well, I haven't noticed that I was missing any freedom here.  In what
cases do you mean I would need the freedom to create new SSA
statements?  I admit that moving this branching code into a late pass
before cfgexpand would be preferable, but given the way cfgexpand
already interacts with dojump, it seems to me the better place for now.
So for the upcoming 4.8 I will happily work on an improved
version which can handle switch lowering, too.
The current approach I've posted here (first patch only) already shows
an overall reduction of instructions executed for the first condition,
and a pretty good reduction of executed instructions for additional
dynamic conditions.  As the chosen algorithm isn't very aggressive in
finding matches, there is certainly much room to improve it further.

The real pain here is already in the AST: for associative logical
bitwise combining we need to know reliably that condition elements
don't trap and have no side effects.  This is right now only possible
by walking the full tree, which is (as my tests have shown) even
faster than I expected.

> Rather than hooking the logic into the place where we expand conditions
> this should be a "lowering" pass right before expansion (that would,
> hopefully at some point also do switch statement expansion/lowering
> according to target costs).  Initially such patch should try to exactly
> copy what we do during expansion right now.

I agree, this is where it should go in the upcoming version.

> The cond_chain stuff should be as we have discussed quite some time
> ago on IRC - modeled after tree-affine.c - a "vector" of predicates
> of the form [~] A op B, combined using logical AND or OR
> (thus, able to represent a disjunctive or conjunctive normal form).
> All suitably abstracted so if-conversion, the loop optimizer and
> loop number of iteration analysis can use this code (they all collect
> a controlling predicate (chain) and try to simplify against it).

Yes, I remember this discussion.  And I agree that an affine tree
could help here to share the same facility in some places.  The issue,
as we also discussed, is that an affine tree of nots and ops for
bitwise binaries is not enough for truth-op folding.  We need to
handle comparisons here and have to detect trapping and side effects.
As I said above, I will work on that for the upcoming 4.8.

> A patch adding a pre-expand pass should not at the same time
> change what we feed it - what we have before such pass in the IL
> should be driven by choice what canonical form is more optimal
> for optimization passes (and unfortunately we have conflicting
> requirements here - think of profile-feedback and the fact that
> we do not have SSA names for each predicate we compute).

Well, this logic is done in the new code for cfgexpand.  It assumes
that each instruction has a cost of 1.  So it counts the number of
instructions required for the conditional check and sees whether it
might be profitable to join it with the second operand (if there is one).
This logic is much more sensible in terms of executed instructions
than what we have now in fold-const, where we blindly join
associative bitwise operations, even if that is more costly than the
separate conditions.
So with this patch we already see a code reduction of about 0.3%,
and for each extra dynamic branch a much higher reduction of executed
instructions than the current approach shows.

> I note that there are still more patches from you pending related
> to predicate optimizations - but I lack an idea where you are
> heading to, from a global point of view.  Take the time that
> stage3 and pre-stage1 offers to lay ground for a more structured
> approach to this area.  You may, for example, want to pick up
> my (suspended) work on separating predicate computation from
> predicate use (or at least think about whether that's a good idea
> for the purpose of predicate optimizations).

Well, the idea behind this is that we use one pass for general
bitwise-binary optimizations (logical and non-logical) which already
handles them.
We probably should have a reassoc pass much earlier than now.

> Thanks,
> Richard.
>
>> [ChangeLog snipped]
>
Richard Guenther - Nov. 7, 2011, 4:14 p.m.
On Mon, Nov 7, 2011 at 4:53 PM, Kai Tietz <ktietz70@googlemail.com> wrote:
> 2011/11/7 Richard Guenther <richard.guenther@gmail.com>:
>> On Sun, Nov 6, 2011 at 11:12 PM, Kai Tietz <ktietz@redhat.com> wrote:
>>> [patch description snipped]
>>
>> The idea of doing target cost specific conversion of straight-line conditional
>> code to branchy code and vice versa is a good one.  Though technically
>> the spot you chose isn't very suitable - we already performed SSA name
>> coalescing so you do not have the freedom that you seem to exercise
>> with using SSA name definition statements.
>
> Well, I haven't noticed that I was missing any freedom here.  In what
> cases do you mean I would need the freedom to create new SSA statements?

You look up SSA_NAME_DEF_STMTs of SSA names - you cannot do
that for SSA names that have been coalesced with others.  Instead
there is get_gimple_for_ssa_name which may return NULL.

>  I admit that moving this branching code into a late pass before
> cfgexpand would be preferable, but given the way cfgexpand already
> interacts with dojump, it seems to me the better place for now.

How is it for now the better place when it's the wrong place?

> So for the upcoming 4.8 I will happily work on an improved
> version which can handle switch lowering, too.
> The current approach I've posted here (first patch only) already shows
> an overall reduction of instructions executed for the first condition,
> and a pretty good reduction of executed instructions for additional
> dynamic conditions.  As the chosen algorithm isn't very aggressive in
> finding matches, there is certainly much room to improve it further.
>
> The real pain here is already in the AST: for associative logical
> bitwise combining we need to know reliably that condition elements
> don't trap and have no side effects.  This is right now only possible
> by walking the full tree, which is (as my tests have shown) even
> faster than I expected.

Well, trapping statements are properly marked and operands never
have side-effects in gimple.  So the situation is very easy - never
collect these kinds of operations in such "affine combination" - you
couldn't do anything with them anyway.

>
>> Rather than hooking the logic into the place where we expand conditions
>> this should be a "lowering" pass right before expansion (that would,
>> hopefully at some point also do switch statement expansion/lowering
>> according to target costs).  Initially such patch should try to exactly
>> copy what we do during expansion right now.
>
> I agree, this is where it should go in the upcoming version.
>
>> The cond_chain stuff should be as we have discussed quite some time
>> ago on IRC - modeled after tree-affine.c - a "vector" of predicates
>> of the form [~] A op B, combined using logical AND or OR
>> (thus, able to represent a disjunctive or conjunctive normal form).
>> All suitably abstracted so if-conversion, the loop optimizer and
>> loop number of iteration analysis can use this code (they all collect
>> a controlling predicate (chain) and try to simplify against it).
>
> Yes, I remember this discussion.  And I agree that an affine tree
> could help here to share the same facility in some places.  The issue,
> as we also discussed, is that an affine tree of nots and ops for
> bitwise binaries is not enough for truth-op folding.  We need to
> handle comparisons here and have to detect trapping and side effects.
> As I said above, I will work on that for the upcoming 4.8.

See above.  You'd have a vector { [~] A op B, [~] C op D, ... } with
a common combining operation, AND or OR.  A op B would be,
for example, a comparison.  For a full DNF/CNF you'd have a vector
of those vectors, as you need both AND and OR - but I believe that
in most cases a flat combination is what we want to work with
(there is no reason A or B could not be arbitrarily complex trees,
other than "trees are ugly").

>> A patch adding a pre-expand pass should not at the same time
>> change what we feed it - what we have before such pass in the IL
>> should be driven by choice what canonical form is more optimal
>> for optimization passes (and unfortunately we have conflicting
>> requirements here - think of profile-feedback and the fact that
>> we do not have SSA names for each predicate we compute).
>
> Well, this logic is done in the new code for cfgexpand.  It assumes
> that each instruction has a cost of 1.  So it counts the number of
> instructions required for the conditional check and sees whether it
> might be profitable to join it with the second operand (if there is one).

Huh, well, if you are in expansion you had better deal with real
target costs, not an arbitrary 1.

> This logic is much more sensible in terms of executed instructions
> than what we have now in fold-const, where we blindly join
> associative bitwise operations, even if that is more costly than the
> separate conditions.

We surely should stop doing this kind of complex transforms in
fold-const.c, especially as that is applied to the frontend's AST.

> So with this patch we already see a code reduction of about 0.3%,
> and for each extra dynamic branch a much higher reduction of executed
> instructions than the current approach shows.

On which targets and for which benchmark?

>> I note that there are still more patches from you pending related
>> to predicate optimizations - but I lack an idea where you are
>> heading to, from a global point of view.  Take the time that
>> stage3 and pre-stage1 offers to lay ground for a more structured
>> approach to this area.  You may, for example, want to pick up
>> my (suspended) work on separating predicate computation from
>> predicate use (or at least think about whether that's a good idea
>> for the purpose of predicate optimizations).
>
> Well, the idea behind this is that we use one pass for general
> bitwise-binary optimizations (logical and non-logical) which already
> handles them.
> We probably should have a reassoc pass much earlier than now.

So, what do you want to canonicalize to?  Your patch seems to
move towards straight-line code which pessimizes passes that
perform jump-threading and conditional value-numbering.  So, why
is that a good idea?

Richard.
Jeff Law - Nov. 7, 2011, 7:36 p.m.

On 11/07/11 09:14, Richard Guenther wrote:

>> 
>> Well, I haven't noticed that I was missing any freedom here.  In
>> what cases do you mean I would need the freedom to create new
>> SSA statements?
> 
> You lookup SSA_NAME_DEF_STMTs of SSA names - you cannot do that for
> SSA names that have been coalesced with others.  Instead there is
> get_gimple_for_ssa_name which may return NULL.
Just so I understand what's going on here.

Kai wants to move this code into the expanders which obviously run
after TER.  Once TER has done its job we can't depend on
SSA_NAME_DEF_STMT in any meaningful way.

I haven't looked closely at Kai's implementation (I've just helped
with some benchmarking stuff), but would it make sense to leave this
stuff in gimple, but make the transformations we want just prior to TER?

Alternately, the code could use get_gimple_for_ssa_name and bail out
if it returned something non-useful, such as NULL?  Obviously this
would be the only choice if the code were intimately tied to the
cfgexpand bits.




> Kai writes:
>> So for the upcoming 4.8 I will happily work on an improved
>> version, which can handle switch lowering, too.
What I think Richard was suggesting is a general approach where we can
tie in switch lowering and other lowering or cost dependent
transformations.  I don't think he's asking you to add switching
lowering, just build what you're doing in such a way that someone
could add switch lowering later.

My concern is that the more things you try to fix and handle, the more
complex the patch is going to become and the more difficult to review,
benchmark & analyze.  So my preference would be to try and get the
current improvements into a suitable form & approved before tackling
something like switch lowering.

> 
> Well, trapping statements are properly marked and operands never 
> have side-effects in gimple.  So the situation is very easy -
> never collect these kinds of operations in such "affine
> combination" - you couldn't do anything with them anyway.
Right.  Kai, I think it's this property of gimple that Richi wants you
to take advantage of.

I think we've all agreed that doing this stuff on the pre-gimplified
trees is difficult and against the general direction we want to go.
But that does _not_ mean that doing this stuff on a gimplified tree is
bad.  In fact, gimple was designed to make this kind of stuff easy to do.

> 
>> So with this patch we already see a code reduction of about 0.3%
>> and for each extra dynamic branch a much higher reduction of
>> executed instructions than the current approach shows.
> 
> On which targets and for which benchmark?
Kai may be referring to the benchmarks I've been running for him using
the framework I use to evaluate jump threading and related changes.

If so the process looks something like this:

Build the trunk to get a baseline compiler.
Use the trunk compiler to build a gcc-4.6.0 compiler (checking enabled)
Then use the gcc-4.6.0 cc1 we just built to compile a bunch of .i
files under the watchful eye of valgrind/cachegrind

Apply patch
Rebuild trunk
Use the patched trunk compiler to build a new gcc-4.6.0 compiler
Use the newer built gcc-4.6.0 to build the bunch of .i files under the
watchful eye of valgrind/cachegrind

Compare results.  From valgrind we get # insns executed, # branches
and a variety of other data.  However, insns executed and branches are
the most useful for the changes I'm tracking.  There are certain
limitations to this kind of testing that I have discussed with Kai; in
particular, valgrind doesn't tell us the cost of insns or break them
down further by type.  So if we speculate an insn out of a conditional
and linearize the code, eliminating some branching, the net result may
be more total insns executed, but fewer poorly predicted branches executed.

Jeff
Jeff Law - Nov. 7, 2011, 7:47 p.m.

On 11/07/11 07:38, Richard Guenther wrote:
> 
> 
> The cond_chain stuff should be as we have discussed quite some
> time ago on IRC - modeled after tree-affine.c - a "vector" of
> predicates of the form [~] A op B, combined using logical AND or
> OR (thus, able to represent a disjunctive or conjunctive normal
> form). All suitably abstracted so if-conversion, the loop optimizer
> and loop number of iteration analysis can use this code (they all
> collect a controlling predicate (chain) and try to simplify against
> it).
Hmm, I've never looked at this code, does it do something similar to
simplify_plus_minus?

simplify_plus_minus takes a hunk of nested additions/subtractions,
breaks them into components (where subtractions are modeled using
addition+negation).  The components are held in an array which is
repeatedly simplified (combining constants, eliminating unnecessary
negations, etc etc).  After simplification we re-emit the new,
simplified arithmetic?

If so, then definitely, we want that kind of structure.  Interestingly
enough Kenner and I discussed this back in the early 90s before I was
capable of doing anything with trees.  There was some codegen problem
on the PA that would have been significantly helped by regrouping of
this nature at the tree level.



> You may, for example, want to pick up my (suspended) work on
> separating predicate computation from predicate use (or at least
> think about whether that's a good idea for the purpose of predicate
> optimizations).
I certainly see value in going down this path.

jeff
Kai Tietz - Nov. 7, 2011, 8:28 p.m.
2011/11/7 Jeff Law <law@redhat.com>:
>
> On 11/07/11 09:14, Richard Guenther wrote:
>
>>>
>>> Well, I haven't noticed that I was missing any freedom here.  In
>>> what cases do you mean I would need the freedom to create new
>>> SSA statements?
>>
>> You lookup SSA_NAME_DEF_STMTs of SSA names - you cannot do that for
>> SSA names that have been coalesced with others.  Instead there is
>> get_gimple_for_ssa_name which may return NULL.
> Just so I understand what's going on here.
>
> Kai wants to move this code into the expanders which obviously run
> after TER.  Once TER has done it's job we can't depend on
> SSA_NAME_DEF_STMT in any meaningful way.
>
> I haven't looked closely at Kai's implementation (I've just helped
> with some benchmarking stuff), but would it make sense to leave this
> stuff in gimple, but make the transformations we want just prior to TER?

Well, modeling this before TER makes sense.  Nevertheless we have
some issues with the transformation from gimple to RTL, in terms of
adjusting code so that the cfgexpander does the simplest form of
transformation, as the code transformation in dojump is pretty much
tied to the C tree.  Some bits in dojump we need to touch anyway, as
otherwise the cfg-expander will destroy the BC optimization already done.

> Alternately, the code could use get_gimple_for_ssa_name and bail out
> if it returned something non-useful, such as NULL?  Obviously this
> would be the only choice if the code were intimately tied to the
> cfgexpand bits.

I would suggest that I use get_gimple_for_ssa_name here for 4.7.  In
fact I have modified the patch to test SA.values and tested it
with success.

>> Kai writes:
>>> So for the upcoming 4.8 I will happily work on an improved
>>> version, which can handle switch lowering, too.
> What I think Richard was suggesting is a general approach where we can
> tie in switch lowering and other lowering or cost dependent
> transformations.  I don't think he's asking you to add switching
> lowering, just build what you're doing in such a way that someone
> could add switch lowering later.
>
> My concern is that the more things you try to fix and handle, the more
> complex the patch is going to become and the more difficult to review,
> benchmark & analyze.  So my preference would be to try and get the
> current improvements into a suitable form & approved before tackling
> something like switch lowering.

Agreed.

>>
>> Well, trapping statements are properly marked and operands never
>> have side-effects in gimple.  So the situation is very easy -
>> never collect these kinds of operations in such "affine
>> combination" - you couldn't do anything with them anyway.
> Right.  Kai, I think it's this property of gimple that Richi wants you
> to take advantage of.
>
> I think we've all agreed that doing this stuff on the pre-gimplified
> trees is difficult and against the general direction we want to go.
> But that does _not_ mean that doing this stuff on a gimplified tree is
> bad.  In fact, gimple was designed to make this kind of stuff easy to do.

Sure.  A more general question, which Richi raised here: for
BC optimization it is of course interesting to know real
instruction costs and not just guesses.  The current code in
fold-const guesses a bit too optimistically, and the code in this
patch is in some cases too pessimistic.  So is there a facility for
predicting instruction costs on the gimple tree?  I would say this
isn't possible, as we would need the final RTL form to tell, wouldn't we?

Kai
Richard Guenther - Nov. 7, 2011, 10:36 p.m.
On Mon, Nov 7, 2011 at 8:47 PM, Jeff Law <law@redhat.com> wrote:
>
> On 11/07/11 07:38, Richard Guenther wrote:
>>
>>
>> The cond_chain stuff should be as we have discussed quite some
>> time ago on IRC - modeled after tree-affine.c - a "vector" of
>> predicates of the form [~] A op B, combined using logical AND or
>> OR (thus, able to represent a disjunctive or conjunctive normal
>> form). All suitably abstracted so if-conversion, the loop optimizer
>> and loop number of iteration analysis can use this code (they all
>> collect a controlling predicate (chain) and try to simplify against
>> it).
> Hmm, I've never looked at this code, does it do something similar to
> simplify_plus_minus?
>
> simplify_plus_minus takes a hunk of nested additions/subtractions,
> breaks them into components (where subtractions are modeled using
> addition+negation).  The components are held in an array which is
> repeatedly simplified (combining constants, eliminating unnecessary
> negations, etc etc).  After simplification we re-emit the new,
> simplified arithmetic?

Yes.  tree-affine does this for a sum of expressions of the form a + b * c.
It collects such a sum, optimizes it (and you can add/subtract or scale
these things) and re-emits the new simplified form.
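
The collect/optimize/re-emit idea can be sketched in a few lines (a toy
Python model for illustration only; tree-affine itself is C code operating
on GCC trees, and none of the names below are GCC's):

```python
# Toy on-the-side representation of an affine sum: a constant plus a
# {variable: coefficient} map.  All names here are made up for the sketch.

def affine_add(a, b):
    """Combine two affine forms (const, {var: coeff}) term by term."""
    const = a[0] + b[0]
    terms = dict(a[1])
    for var, coeff in b[1].items():
        terms[var] = terms.get(var, 0) + coeff
    # "Optimize" step: drop terms whose coefficients cancelled out.
    terms = {v: c for v, c in terms.items() if c != 0}
    return (const, terms)

# (3 + 2*x + y) + (1 - 2*x)  re-emits as  (4 + y)
result = affine_add((3, {"x": 2, "y": 1}), (1, {"x": -2}))
```

Re-emitting back into the IL would then walk the remaining terms and
build the corresponding statements.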

> If so, then definitely, we want that kind of structure.  Interestingly
> enough Kenner and I discussed this back in the early 90s before I was
> capable of doing anything with trees.  There was some codegen problem
> on the PA that would have been significantly helped by regrouping of
> this nature at the tree level.

Yes.  Basically it allows operating on an on-the-side conjunctive/disjunctive
normal form of predicate compositions (where the predicate atoms
would be, for example, comparisons or simple SSA names).

Sebastian started to implement a very rough form of that in
tree-if-conv.c (not exactly how I would do it though, and not
general enough).
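
A rough picture of such an on-the-side predicate chain (a hypothetical
Python sketch of the data structure only, not Sebastian's tree-if-conv.c
code): a single AND/OR joiner over atoms of the form [~] A op B:

```python
# Illustrative model of a predicate chain: a list of atoms
# (negated, lhs_name, op, rhs) joined by one AND or OR.
from operator import eq, lt

def eval_chain(joiner, atoms, env):
    """Evaluate a conjunctive/disjunctive chain of [~] A op B atoms."""
    vals = []
    for negated, lhs, op, rhs in atoms:
        rhs_val = env[rhs] if isinstance(rhs, str) else rhs
        v = op(env[lhs], rhs_val)
        vals.append(not v if negated else v)
    return all(vals) if joiner == "AND" else any(vals)

# (a == 0) && ~(b < 3)  with a = 0, b = 5
ok = eval_chain("AND", [(False, "a", eq, 0), (True, "b", lt, 3)],
                {"a": 0, "b": 5})
```

A real implementation would of course simplify against a controlling
predicate rather than evaluate, but the shape of the structure is the same.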

>> You may, for example, want to pick up my (suspended) work on
>> separating predicate computation from predicate use (or at least
>> think about whether that's a good idea for the purpose of predicate
>> optimizations).
I certainly see value in going down this path.

Richard.
Richard Guenther - Nov. 7, 2011, 10:42 p.m.
On Mon, Nov 7, 2011 at 8:36 PM, Jeff Law <law@redhat.com> wrote:
>
> On 11/07/11 09:14, Richard Guenther wrote:
>
>>>
>>> Well, not that I noticed that I missed here any freedom.  In
>>> what cases you mean I would need here freedom to create new
>>> ssa-statements?
>>
>> You lookup SSA_NAME_DEF_STMTs of SSA names - you cannot do that for
>> SSA names that have been coalesced with others.  Instead there is
>> get_gimple_for_ssa_name which may return NULL.
> Just so I understand what's going on here.
>
> Kai wants to move this code into the expanders which obviously run
> after TER.  Once TER has done its job we can't depend on
> SSA_NAME_DEF_STMT in any meaningful way.
>
> I haven't looked closely at Kai's implementation (I've just helped
> with some benchmarking stuff), but would it make sense to leave this
> stuff in gimple, but make the transformations we want just prior to TER?
>
> Alternately, the code could use get_gimple_for_ssa_name and bail out
> if it returned something non-useful, such as NULL?  Obviously this
> would be the only choice if the code were intimately tied to the
> cfgexpand bits.
>
>> Kai writes:
>>> So for the upcoming 4.8 I will happily work on a more improved
>>> version, which can handle the switch-lowering, too.
> What I think Richard was suggesting is a general approach where we can
> tie in switch lowering and other lowerering or cost dependent
> transformations.  I don't think he's asking you to add switching
> lowering, just build what you're doing in such a way that someone
> could add switch lowering later.

Yes, and because TER limits our freedom (we've basically partially
gone out-of-SSA for coalesced names) we want to do this before
RTL expansion.

> My concern is that the more things you try to fix and handle, the more
> complex the patch is going to become and the more difficult to review,
> benchmark & analyze.  So my preference would be to try and get the
> current improvements into a suitable form & approved before tackling
> something like switch lowering.
>
>>
>> Well, trapping statements are properly marked and operands never
>> have side-effects in gimple.  So the situation is very easy -
>> never collect these kinds of operations in such "affine
>> combination" - you couldn't do anything with them anyway.
> Right.  Kai, I think it's this property of gimple that Richi wants you
> to take advantage of.
>
> I think we've all agreed that doing this stuff on the pre-gimplified
> trees is difficult and against the general direction we want to go.
> But that does _not_ mean that doing this stuff on a gimplified tree is
> bad.  In fact, gimple was designed to make this kind of stuff easy to do.
>
>>
>>> So by this patch we already have a code-reduction of about 0.3%
>>> and for each extra-dynamical branch a much higher reduction of
>>> executed instruction as current approach shows.
>>
>> On which targets and for which benchmark?
> Kai may be referring the benchmarks I've been running for him using
> the framework I use to evaluate for jump threading and related changes.
>
> If so the process looks something like this:
>
> Build the trunk to get a baseline compiler.
> Use the trunk compiler to build a gcc-4.6.0 compiler (checking enabled)
> Then use the gcc-4.6.0 cc1 we just built to compile a bunch of .i
> files under the watchful eye of valgrind/cachegrind
>
> Apply patch
> Rebuild trunk
> Use the patched trunk compiler to build a new gcc-4.6.0 compiler
> Use the newer built gcc-4.6.0 to build the bunch of .i files under the
> watchful eye of valgrind/cachegrind
>
> Compare results.  From valgrind we get # insns executed, # branches
> and a variety of other data.  However, insns executed and branches are
> the most useful for the changes I'm tracking.  There's certain
> limitations to this kind of testing that I have discussed with Kai, in
> particular valgrind doesn't tell us costs of insns or break them down
> further based on type.  So we speculate an insn out of a conditional,
> linearize the code, eliminating some branching.  The net result may be
> more total insns executed, but fewer poorly predicted branches executed.

IIRC valgrind even offers simple branch predictor simulation - well predicted
branches tend to be cheap.

The question on which side to lean for canonicalization before such lowering
remains though (a few years back I thought of an enabling pass that would
aggressively transform the IL either way, but that's of course not going
to be cheap).
If we had SSA names for all predicates then the if statements plus edges
would trivially map to &&s and ||s, so maybe canonicalization in either way
wouldn't be too important in that case.  Still, there's quite some fallout
with that change.

Richard.
Jeff Law - Nov. 8, 2011, 7:17 p.m.

On 11/07/11 15:42, Richard Guenther wrote:
> 
> IIRC valgrind even offers simple branch predictor simulation - well
> predicted branches tend to be cheap.
It does.  The predictor is based on processors that are 5-7 years old.
I don't generally look closely at the prediction stuff.

Removing even well predicted branches is profitable as they consume
resources within the hardware predictor mechanisms.  Obviously we get
more benefit by removing difficult to predict branches.

As I've mentioned to Kai, raw insn & branch counts from valgrind are
useful, but are just a piece of the puzzle.  For example, when
speculating, we'll obviously increase the insn count, but it may be
the case that there's sufficient processor resources to hide the cost
of the speculated code.  The same can often happen with if-conversion.
We may execute more insns, but eliminate costly branches.

Even with its faults, the valgrind results are helpful; just like any
other tool we have to understand its limitations.


> 
> The question on which side to lean for canonicalization before such
> lowering remains though (a few years back I thought of an enabling
> pass that would agressively transform IL in either way, but that's
> of course not going to be cheap). If we'd have SSA names for all
> predicates then the if statments plus edges would trivially map to
> &&s and ||s, so maybe canonicalization in either way wouldn't be
> too important in that case.  Now there's quite some fallout with
> that change though.
As you say, it's something where we've never really decided which is best.
Exposing a name for every predicate has value.  But exposing it in the
IL has, at least in the past, generally resulted in worse code.  So
the question becomes: can we fix things so that exposing the name
doesn't regress the code, or can we expose a name even if it's not
reflected in the IL?

jeff


Jeff Law - Nov. 9, 2011, 6:34 p.m.

On 11/07/11 15:36, Richard Guenther wrote:

> 
> Yes.  tree-affine does this for a sum of expressions of the form a
> + b * c. It collects such sum, optimizes it (and you can
> add/subtract or scale these things) and re-emit the new simplified
> form.
Kai, what were the concerns with this kind of approach?
Richard's suggestion seems sound to me.  From a maintenance standpoint
reusing/extending the tree-affine code seems like a better route.
jeff
Kai Tietz - Nov. 9, 2011, 9:09 p.m.
2011/11/9 Jeff Law <law@redhat.com>:
>
> On 11/07/11 15:36, Richard Guenther wrote:
>
>>
>> Yes.  tree-affine does this for a sum of expressions of the form a
>> + b * c. It collects such sum, optimizes it (and you can
>> add/subtract or scale these things) and re-emit the new simplified
>> form.
> Kai, what what were the concerns with this kind of approach?
> Richard's suggestion seems sound to me.  From a maintenance standpoint
> reusing/extending the tree-affine code seems like a better route.
> jeff

Well, such a comparison-logic folder helper - like tree-affine for
add/subtract/scale - is for sure something good for inner gimple
passes building up new truth-logic expressions, but such a pass
doesn't meet the requirements of folding for BC optimization AFAICS.

The difference is that for BC we don't want to fold at all.  Also it
isn't necessarily the "simplified" statement we want.  For example
'if ((a | b) == 0 && ...) ...'.  If the costs of the pattern
'(a | b) == 0' are too high, we want to represent it instead as
'if (a == 0) if (b == 0) ...'.

So for BC optimization we need a fully compare-expanded sequence of
bitwise operations, including sub-sequences.  For normal folding we
don't need sub-sequence expansion in most cases.

Why we need the fully compare-expanded sequence for BC I will try to
demonstrate with the following example.

We have a condition 'if ((A | B) == 0 && C == 0) ...' where joining
A == 0 and C == 0 would be profitable under BC optimization, but
joining A != 0 and B != 0 isn't.
So we first do - as my patch does - an expansion to the element
comparison-sequence view.

So we get the transformed form 'if (A == 0 && B == 0 && C == 0)'.
Now we can begin to search the condition for valid patterns to join,
scanning from left to right for a profitable pattern.  So we end up
with the final statement 'if ((A | C) == 0 && B == 0)'.
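
The expand-then-rejoin step can be sketched roughly like this
(illustrative Python only; the profitability test is made up, and the
sketch is simplified to joining adjacent atoms rather than arbitrary
pairs as in the example above):

```python
# Expand '(v1 | v2 | ...) == 0' groups into per-variable 'v == 0' atoms,
# then greedily re-join adjacent atoms where a cost callback says the
# joined '(x | y) == 0' form is profitable.  All names are illustrative.

def expand(cond):
    """cond: list of ('or-zero', [vars]) meaning (v1 | v2 | ...) == 0."""
    atoms = []
    for _, vars_ in cond:
        atoms.extend(vars_)          # each var contributes 'v == 0'
    return atoms

def rejoin(atoms, profitable):
    """Join adjacent atom pairs (left to right) when profitable(x, y)."""
    out, i = [], 0
    while i < len(atoms):
        if i + 1 < len(atoms) and profitable(atoms[i], atoms[i + 1]):
            out.append(("or-zero", [atoms[i], atoms[i + 1]]))
            i += 2
        else:
            out.append(("eq-zero", atoms[i]))
            i += 1
    return out

# '(A | B) == 0 && C == 0' where joining A,B is unprofitable
atoms = expand([("or-zero", ["A", "B"]), ("or-zero", ["C"])])
cond = rejoin(atoms, lambda x, y: (x, y) == ("B", "C"))
```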

So, as a conclusion to answer your question about the tree-affine
implementation: sure, we can move the BC optimization into a separate
pass.  As you and Richard have already explained, this would indeed
be better in some cases, as there is more freedom in operating on
gimple statements.  That makes sense for sure.

But the logic we need in normal sequence representation for folding
doesn't seem to be what we need for BC.  For general gimple passes
we want a compact form as a linear sequence without sub-sequences,
and we want a compact final representation in the output.  For BC we
have slightly different requirements: we need a comparison-expanded
form, and of course with sub-sequences, to do the split-up right
depending on the actual branch costs.

Regards,
Kai


PS: There are several patches pending here for tree-reassoc (which
does folding on truth-bitwise operations pretty well if it gets an
expanded view) about exactly the same subject, but there optimized
for the simplest form for folding, with compacted output.
Richard Guenther - Nov. 10, 2011, 10 a.m.
On Wed, Nov 9, 2011 at 7:34 PM, Jeff Law <law@redhat.com> wrote:
>
> On 11/07/11 15:36, Richard Guenther wrote:
>
>>
>> Yes.  tree-affine does this for a sum of expressions of the form a
>> + b * c. It collects such sum, optimizes it (and you can
>> add/subtract or scale these things) and re-emit the new simplified
>> form.
> Kai, what what were the concerns with this kind of approach?
> Richard's suggestion seems sound to me.  From a maintenance standpoint
> reusing/extending the tree-affine code seems like a better route.
> jeff

I'd rather write a new (but similar) infrastructure for predicates.  They
are substantially different enough that putting them together doesn't
make much sense.

Richard.

Richard Guenther - Nov. 10, 2011, 10:09 a.m.
On Wed, Nov 9, 2011 at 10:09 PM, Kai Tietz <ktietz70@googlemail.com> wrote:
> 2011/11/9 Jeff Law <law@redhat.com>:
>>
>> On 11/07/11 15:36, Richard Guenther wrote:
>>
>>>
>>> Yes.  tree-affine does this for a sum of expressions of the form a
>>> + b * c. It collects such sum, optimizes it (and you can
>>> add/subtract or scale these things) and re-emit the new simplified
>>> form.
>> Kai, what what were the concerns with this kind of approach?
>> Richard's suggestion seems sound to me.  From a maintenance standpoint
>> reusing/extending the tree-affine code seems like a better route.
>> jeff
>
> Well, such a comparison-logic-folder helper - like affine-tree for
> add/subtract/scale) - is for sure something good for inner gimple
> passes building up new logic-truth expressions, but such a pass
> doesn't meet requirements need to fold for BC optimization AFAICS.

tree-affine is not an "affine folder" either.  It is an on-the-side
representation of a sum of affine components that you can operate on
(for example, you can simplify it).

> The difference is that for BC we don't want to fold at all. Also it
> isn't necessarily "simplified" statement we want. For example 'if ((a
> | b) == 0 & ...) ...'.  If the costs of such pattern '(a | b) == 0'
> are too high, we want to representation it instead as 'if (a == 0) if
> (b == 0) ...'.

The "affine tree" of (a | b) == 0 is

AND
[0] ~a
[1] ~b

> So for BC optimization we need to have a fully compare-expanded
> sequence of bitwise-operations including for sub-sequences. For normal
> folding we don't need sub-sequence expansion in most cases.

the predicate infrastructure isn't meant to be for folding - it is meant
to be a data structure that is well suited for operating on predicate
expressions in all ways (well, including folding).

> The cause why we need for BC fully compare-expanded sequence I try to
> demonstrate on the following example.
>
> We have an condition 'if ((A | B) == 0 && C == 0) ...' where the
> joining of A == 0 and C == 0 would be profitable by BC-optimization,
> but the join of A != 0 and B != 0 isn't.
> So we do - as my patch does - first an expand to element
> comparison-sequence view.

AND
[0] ~A
[1] ~B
[2] ~C

> So we get for it the transformed form 'if (A == 0 && B == 0 && C == 0)'.
> Now we can begin to search for valid patterns in the condition for
> joining by searching from left-to-right for a profitable pattern.  So
> we end-up with final statement 'if ((A | C) == 0 && C)'

> So as conclusion to answer your question about tree-affine
> implementation.

It's exactly what you implemented (well, sort-of).  You just did not
properly abstract it away.

>  Sure we can move the BC-optimization into a separate
> pass.  As you and Richard already have explained, this would be indeed
> in some cases better, as there is more freedom in operating on
> gimple-statements.  This makes for sure sense.
>
> But the logic itself we need in normal sequence-representation for
> folding seems not to be that what we need for BC.

Sure, it's exactly the same data structure we can use.

>  For general gimple
> passes we want to have compact form on linear-sequence without
> sub-sequences and we want compact final-representation in output.  In
> BC we have slightly different requirements.  We need a
> comparison-expanded form and of course with sub-sequences to do
> split-up right dependent on the actual branch-costs.

You can trivially split up a predicate combination into multiple ones.
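
For illustration, splitting a conjunctive chain by a branch-cost limit
is indeed trivial (hypothetical Python sketch; the limit and the
representation are made up):

```python
# An AND chain of atoms maps directly onto nested ifs, so a cost limit
# can decide how many atoms stay joined per emitted branch.

def split_and_chain(atoms, max_per_branch):
    """Partition an AND chain into groups, one group per emitted branch."""
    return [atoms[i:i + max_per_branch]
            for i in range(0, len(atoms), max_per_branch)]

# 'A && B && C' with at most two atoms per branch becomes
# 'if (A && B) if (C) ...'
groups = split_and_chain(["A", "B", "C"], 2)
```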

Richard.
Kai Tietz - Nov. 10, 2011, 10:49 a.m.
2011/11/10 Richard Guenther <richard.guenther@gmail.com>:
> On Wed, Nov 9, 2011 at 10:09 PM, Kai Tietz <ktietz70@googlemail.com> wrote:
>> 2011/11/9 Jeff Law <law@redhat.com>:
>>>
>>> On 11/07/11 15:36, Richard Guenther wrote:
>>>
>>>>
>>>> Yes.  tree-affine does this for a sum of expressions of the form a
>>>> + b * c. It collects such sum, optimizes it (and you can
>>>> add/subtract or scale these things) and re-emit the new simplified
>>>> form.
>>> Kai, what what were the concerns with this kind of approach?
>>> Richard's suggestion seems sound to me.  From a maintenance standpoint
>>> reusing/extending the tree-affine code seems like a better route.
>>> jeff
>>
>> Well, such a comparison-logic-folder helper - like affine-tree for
>> add/subtract/scale) - is for sure something good for inner gimple
>> passes building up new logic-truth expressions, but such a pass
>> doesn't meet requirements need to fold for BC optimization AFAICS.
>
> tree-affine is not a "affine folder" either.  It is an on-the side
> representation
> of a sum of affine components that you can operate on (for example,
> you can simplify it).
>
>> The difference is that for BC we don't want to fold at all. Also it
>> isn't necessarily "simplified" statement we want. For example 'if ((a
>> | b) == 0 & ...) ...'.  If the costs of such pattern '(a | b) == 0'
>> are too high, we want to representation it instead as 'if (a == 0) if
>> (b == 0) ...'.
>
> The "affine tree" of (a | b) == 0 is
>
> AND
> [0] ~a
> [1] ~b

Well, this is only true if a and b are boolean-typed.  But we also
need to handle elements of different types, which are always
comparisons.  I have chosen exactly this sample because it works on
any integral type.

So using the predicate "not" is not what we need here in general.  We
indeed have to build up element chains via comparisons to operate on.

>> So for BC optimization we need to have a fully compare-expanded
>> sequence of bitwise-operations including for sub-sequences. For normal
>> folding we don't need sub-sequence expansion in most cases.
>
> the predicate infrastructure isn't meant to be for folding - it is mean
> to be a data structure that is well suited for operating on predicate
> expressions in all ways (well, including folding).
>
>> The cause why we need for BC fully compare-expanded sequence I try to
>> demonstrate on the following example.
>>
>> We have an condition 'if ((A | B) == 0 && C == 0) ...' where the
>> joining of A == 0 and C == 0 would be profitable by BC-optimization,
>> but the join of A != 0 and B != 0 isn't.
>> So we do - as my patch does - first an expand to element
>> comparison-sequence view.
>
> AND
> [0] ~A
> [1] ~B
> [2] ~C

Again, not true, as ~ works only on boolean-typed invariant elements
A, B, and C.  The logical-not attribute is only of interest within a
final operand of an invariant argument of an element.  It is not a
predicate for the bitwise-binary chain itself.

If we have a bitwise chain of a type other than boolean, an
"inverse" might be a candidate for predicates, but here we are trying
to operate on conditions.

>> So we get for it the transformed form 'if (A == 0 && B == 0 && C == 0)'.
>> Now we can begin to search for valid patterns in the condition for
>> joining by searching from left-to-right for a profitable pattern.  So
>> we end-up with final statement 'if ((A | C) == 0 && C)'
>
>> So as conclusion to answer your question about tree-affine
>> implementation.
>
> It's exactly what you implemented (well, sort-of).  You just did not
> properly abstract it away.

I know, but this can be done.

Kai
Richard Guenther - Nov. 10, 2011, 11:25 a.m.
On Thu, Nov 10, 2011 at 11:49 AM, Kai Tietz <ktietz70@googlemail.com> wrote:
> 2011/11/10 Richard Guenther <richard.guenther@gmail.com>:
>> On Wed, Nov 9, 2011 at 10:09 PM, Kai Tietz <ktietz70@googlemail.com> wrote:
>>> 2011/11/9 Jeff Law <law@redhat.com>:
>>>>
>>>> On 11/07/11 15:36, Richard Guenther wrote:
>>>>
>>>>>
>>>>> Yes.  tree-affine does this for a sum of expressions of the form a
>>>>> + b * c. It collects such sum, optimizes it (and you can
>>>>> add/subtract or scale these things) and re-emit the new simplified
>>>>> form.
>>>> Kai, what what were the concerns with this kind of approach?
>>>> Richard's suggestion seems sound to me.  From a maintenance standpoint
>>>> reusing/extending the tree-affine code seems like a better route.
>>>> jeff
>>>
>>> Well, such a comparison-logic-folder helper - like affine-tree for
>>> add/subtract/scale) - is for sure something good for inner gimple
>>> passes building up new logic-truth expressions, but such a pass
>>> doesn't meet requirements need to fold for BC optimization AFAICS.
>>
>> tree-affine is not a "affine folder" either.  It is an on-the side
>> representation
>> of a sum of affine components that you can operate on (for example,
>> you can simplify it).
>>
>>> The difference is that for BC we don't want to fold at all. Also it
>>> isn't necessarily "simplified" statement we want. For example 'if ((a
>>> | b) == 0 & ...) ...'.  If the costs of such pattern '(a | b) == 0'
>>> are too high, we want to representation it instead as 'if (a == 0) if
>>> (b == 0) ...'.
>>
>> The "affine tree" of (a | b) == 0 is
>>
>> AND
>> [0] ~a
>> [1] ~b
>
> Well, this is just true, if a and b are boolean-typed.  But we need to
> handle also elements with different types, which are always
> comparisons.   I have choosen exactly this sample, as it works on any
> integral-type.
>
> So using predicate "not" is not that what we need here in general.  We
> have indeed to make up element-chains via comparison to operate on.

AND
[0] ~ (a != 0)
[1] ~ (b != 0)

Just because I chose to draw simple pictures does not mean a more
complex one would not be supported (we'd want general comparisons
anyway).  Nowhere did I specify that it should only work on boolean
variables.

>>> So for BC optimization we need to have a fully compare-expanded
>>> sequence of bitwise-operations including for sub-sequences. For normal
>>> folding we don't need sub-sequence expansion in most cases.
>>
>> the predicate infrastructure isn't meant to be for folding - it is mean
>> to be a data structure that is well suited for operating on predicate
>> expressions in all ways (well, including folding).
>>
>>> The cause why we need for BC fully compare-expanded sequence I try to
>>> demonstrate on the following example.
>>>
>>> We have an condition 'if ((A | B) == 0 && C == 0) ...' where the
>>> joining of A == 0 and C == 0 would be profitable by BC-optimization,
>>> but the join of A != 0 and B != 0 isn't.
>>> So we do - as my patch does - first an expand to element
>>> comparison-sequence view.
>>
>> AND
>> [0] ~A
>> [1] ~B
>> [2] ~C
>
> Again, not true, as ~ works only on boolean-typed invariant elements
> A, B, and C. The attribute logical not is only of interest within a
> fina operand of an invariant argument of an element.  It is no
> predicate for the bitwise-binary-chain itself.
>
> If we have a bitwise-chain with different type as boolean, an
> "inverse" might be a candidate for predicates, but we try to operate
> here on conditions.

See above.

>>> So we get for it the transformed form 'if (A == 0 && B == 0 && C == 0)'.
>>> Now we can begin to search for valid patterns in the condition for
>>> joining by searching from left-to-right for a profitable pattern.  So
>>> we end-up with final statement 'if ((A | C) == 0 && C)'
>>
>>> So as conclusion to answer your question about tree-affine
>>> implementation.
>>
>> It's exactly what you implemented (well, sort-of).  You just did not
>> properly abstract it away.
>
> I know, but this can be done.
>
> Kai
>
Jeff Law - Nov. 10, 2011, 5:58 p.m.

On 11/09/11 14:09, Kai Tietz wrote:
> 
> Well, such a comparison-logic-folder helper - like affine-tree for 
> add/subtract/scale) - is for sure something good for inner gimple 
> passes building up new logic-truth expressions, but such a pass 
> doesn't meet requirements need to fold for BC optimization AFAICS.
Perhaps.  I think the thing to do would be to see what additional
needs we have and evaluate whether they make sense in that kind of framework.

> 
> The difference is that for BC we don't want to fold at all. Also
> it isn't necessarily "simplified" statement we want. For example
> 'if ((a | b) == 0 & ...) ...'.  If the costs of such pattern '(a |
> b) == 0' are too high, we want to representation it instead as 'if
> (a == 0) if (b == 0) ...'.
We don't have to fold.  Think of this as an easy to use on-the-side
representation of some operation(s).

What I would roughly expect to see is the set of operations feeding
the comparison shoved into this structure.  With the full set of ops
exposed into this structure we could look at the cost,
canonicalize/simplify if appropriate, then select the best codegen
strategy, modifying the on-the-side structure as needed.  We then
reflect the final result back into the IL.

> We have an condition 'if ((A | B) == 0 && C == 0) ...' where the 
> joining of A == 0 and C == 0 would be profitable by
> BC-optimization, but the join of A != 0 and B != 0 isn't. So we do
> - as my patch does - first an expand to element comparison-sequence
> view.
> 
> So we get for it the transformed form 'if (A == 0 && B == 0 && C ==
> 0)'.
Right, so one of the first steps would be to canonicalize the (A|B) ==
0 && C == 0 form into something like you've shown above within this
on-the-side structure.

> Now we can begin to search for valid patterns in the condition for 
> joining by searching from left-to-right for a profitable pattern.
> So we end-up with final statement 'if ((A | C) == 0 && C)'
Which would be fairly straightforward using the on-the-side structure.

I'm not sure what I'm missing since the description you've given fits
very nicely with the overall approach Richi is suggesting.

jeff
Jeff Law - Nov. 11, 2011, 6:38 p.m.

On 11/07/11 13:28, Kai Tietz wrote:

> Sure.  A more general question, which was raised by Richi here.
> For BC optimization it is of course interesting to know real 
> instruction-costs and not just guessings.   The current code in 
> fold-const guess a bit too optimistic,  the code in that patch is
> in some cases too pesimistic.  So is there a facility to get
> prediction of instruction-costs on gimple-tree?  I would say this
> isn't possible, as here we would need final RTL form to tell this,
> isn't it?
We don't have much costing information in the gimple IL; that is
largely by design.  We generally wanted to avoid introducing cost
information into the gimple/ssa optimizers so that their behavior
would be more predictable.

The fact that BRANCH_COST is utilized in fold-const.c is a historical
blunder that we haven't fixed yet.

jeff
Kai Tietz - Nov. 12, 2011, 11:07 a.m.
2011/11/10 Jeff Law <law@redhat.com>:
>
> On 11/09/11 14:09, Kai Tietz wrote:
>>
>> Well, such a comparison-logic-folder helper - like affine-tree for
>> add/subtract/scale) - is for sure something good for inner gimple
>> passes building up new logic-truth expressions, but such a pass
>> doesn't meet requirements need to fold for BC optimization AFAICS.
> Perhaps.  I think the thing to do would be to see what additional
> needs we have and evaluate if they make sense in that kind of framework.
>
>>
>> The difference is that for BC we don't want to fold at all.  Also
>> it isn't necessarily a "simplified" statement we want.  For example
>> 'if ((a | b) == 0 && ...) ...'.  If the costs of a pattern like
>> '(a | b) == 0' are too high, we want to represent it instead as 'if
>> (a == 0) if (b == 0) ...'.
> We don't have to fold.  Think of this as an easy to use on-the-side
> representation of some operation(s).
Well, this is clear, but I don't see much reuse here between
BC optimization and general truth-logic folding.

The requirement that we handle comparisons for conditional folding is
mainly caused by the cond_expr itself, as there we have no SSA
comparison statement; we only have the left and right operands plus a
comparison code.  So this kind of statement requires that we handle it
as a tree.  Additionally, it might be a floating-point comparison,
where a logical invert can't be done on the comparison code itself, so
we need to introduce a dummy expression for such cases (e.g. making an
AST tree out of it).
One of the major differences I see between expanding for general
optimization vs. BC optimization is the cost analysis itself and the
decision whether we want to explode a statement into its sub-parts or
not.  For non-conditions, we don't want to look into multiply-used
statements at all, but for BC we might want that, as long as the
statement wasn't evaluated before (in the statement and its
dominators).  Also, the cost of a statement's operand depends on its
use: an operand with multiple uses which was evaluated prior to this
statement can be considered a simple statement with a cost of 1, but a
statement with multiple uses which wasn't already evaluated has the
cost of its operations.

> What I would roughly expect to see is the set of operations feeding
> the comparison shoved into this structure.  With the full set of ops
> exposed into this structure we could look at the cost,
> canonicalize/simplify if appropriate, then select the best codegen
> strategy, modifying the on-the-side structure as needed.  We then
> reflect the final result back into the IL.

Well, the set of operations would be pretty limited IMHO.  It would be
logical and/or/not.  Maybe logical-xor could be interesting in some
cases, but I'm not really sure it is worth handling at all, as a
logical-xor is normally a simple equality compare in a condition.
Additionally we need a function to read in a conditional expression
from SSA gimple form, build this representation from it, and do the
comparison expansion - meaning eliminating logical-not expressions and
exploding some folded patterns to their purely logical expanded form,
like (A | B) ==/!= 0 and (A & B) ==/!= ~0 for integral-typed A and B.
And for boolean-typed A and B we might have things like A < B -> !A &&
B, etc.

>> We have a condition 'if ((A | B) == 0 && C == 0) ...' where the
>> joining of A == 0 and C == 0 would be profitable by
>> BC optimization, but the join of A != 0 and B != 0 isn't.  So we
>> first do - as my patch does - an expansion to the element
>> comparison-sequence view.
>>
>> So we get for it the transformed form 'if (A == 0 && B == 0 && C ==
>> 0)'.
> Right, so one of the first steps would be to canonicalize the (A|B) ==
> 0 && C == 0 form into something like you've shown above within this
> on-the-side structure.

Sure, with consideration of the "simple" characteristic of a
gimple statement and evaluation of the elements' costs.

>> Now we can begin to search for valid patterns in the condition for
>> joining by searching from left to right for a profitable pattern.
>> So we end up with the final statement 'if ((A | C) == 0 && B == 0)'.
> Which would be fairly straightforward using the on-the-side structure.
>
> I'm not sure what I'm missing since the description you've given fits
> very nicely with the overall approach Richi is suggesting.

The point here is more that I don't see shareability of this
abstraction between BC and general condition build-up, as the
attributes for them are quite different.  For a simple
comparison-chain collector we indeed just need an inverse, and a
helper mode to represent the initial condition-expression's compare.
For BC we would need to turn such a general tree into a specialized
one with the additional attributes and expansion logic before we can
work on it at all.  As for BC we always have to operate on already
created SSA gimple form, we don't need any operations on such a tree;
we just read it into our representation, calculating attributes, and
then generate the new representation for BC out of it.  Therefore I am
questioning the approach of abstracting this for general use together
with BC - not the approach itself.

Kai

Patch

Index: gcc-trunk/gcc/cfgexpand.c
===================================================================
--- gcc-trunk.orig/gcc/cfgexpand.c
+++ gcc-trunk/gcc/cfgexpand.c
@@ -1650,6 +1650,651 @@  maybe_cleanup_end_of_block (edge e, rtx 
     }
 }
 
+/* Check if a statement can be considered a "simple" one.  Simple are:
+   - a minimal invariant
+   - any non-SSA_NAME variant
+   - any SSA_NAME variant without a definition statement
+   - any SSA_NAME with a default definition
+   - an assignment of kind ~X, if X is minimal invariant or has no
+     definition statement.  We exclude floating-point types here,
+     as ~ (X cmp Y) can have special meaning on floats.
+   - an assignment of kind X ^ ~0, if X is minimal invariant or has no
+     definition statement.  */
+
+static bool
+is_bool_op_p (tree op, bool *is_not)
+{
+  gimple s;
+  enum tree_code code;
+
+  *is_not = false;
+
+  /* Reject result types not of boolean kind.  */
+  if (TREE_CODE (TREE_TYPE (op)) != BOOLEAN_TYPE)
+    return false;
+
+  if (is_gimple_min_invariant (op)
+      || TREE_CODE (op) != SSA_NAME
+      || SSA_NAME_IS_DEFAULT_DEF (op)
+      || (s = SSA_NAME_DEF_STMT (op)) == NULL)
+    return true;
+
+  /* Reject statements which aren't assignments.  */
+  if (!is_gimple_assign (s))
+    return false;
+
+  code = gimple_assign_rhs_code (s);
+
+  /* See if we have a "simple" logical not.  */
+  if (code == BIT_NOT_EXPR)
+    *is_not = true;
+  else if (code == BIT_XOR_EXPR
+           && integer_all_onesp (gimple_assign_rhs2 (s)))
+    *is_not = true;
+  else
+    return false;
+
+  op = gimple_assign_rhs1 (s);
+
+  if (TREE_CODE (op) != SSA_NAME)
+    return false;
+
+  /* Can the inner statement be considered
+     simple?  */
+  if (is_gimple_min_invariant (op)
+      || SSA_NAME_IS_DEFAULT_DEF (op)
+      || SSA_NAME_DEF_STMT (op) == NULL)
+    return true;
+
+  *is_not = false;
+  return false;
+}
+
+/* This helper routine normalizes comparisons on boolean-typed operands
+   for OP1 ==/!= CST.
+   Folded patterns are:
+    X ==/!= 1 -> X !=/== 0
+    ~(X cmp Y) !=/== 0 -> (X cmp Y) ==/!= 0, if X and Y aren't floats.
+    ~(X & Y) !=/== 0 -> (X & Y) ==/!= 0
+    ~(X | Y) !=/== 0 -> (X | Y) ==/!= 0
+    (X cmp Y) != 0 -> (X cmp Y)
+    (X cmp Y) == 0 -> (X cmp-invert Y), if X and Y aren't floats.
+   Note: We reject operations on floats here, as inverting a
+   float comparison can change behavior for NaN operands.  */
+
+static void
+normalize_truth_condition (enum tree_code *cmp, tree *op1, tree *op2)
+{
+  gimple stmt, s;
+  tree nop0, nop1, op;
+  tree type = TREE_TYPE (*op1);
+  enum tree_code code2;
+
+  if (TREE_CODE (type) != BOOLEAN_TYPE
+      || (*cmp != EQ_EXPR && *cmp != NE_EXPR)
+      || TREE_CODE (*op2) != INTEGER_CST)
+    return;
+
+  if (!SA.values
+      || TREE_CODE (*op1) != SSA_NAME
+      || !bitmap_bit_p (SA.values, SSA_NAME_VERSION (*op1)))
+   return;
+
+  if (integer_onep (*op2))
+    {
+      *op2 = build_int_cst (type, 0);
+      *cmp = (*cmp == EQ_EXPR ? NE_EXPR : EQ_EXPR);
+    }
+
+  for (;;)
+    {
+      if (!SA.values
+	  || TREE_CODE (*op1) != SSA_NAME
+	  || !bitmap_bit_p (SA.values, SSA_NAME_VERSION (*op1)))
+       return;
+
+      stmt = SSA_NAME_DEF_STMT (*op1);
+      if (gimple_code (stmt) != GIMPLE_ASSIGN)
+	return;
+
+      code2 = gimple_assign_rhs_code (stmt);
+      if (code2 != BIT_NOT_EXPR)
+        break;
+      op = gimple_assign_rhs1 (stmt);
+
+      if (!SA.values
+	  || TREE_CODE (op) != SSA_NAME
+	  || !bitmap_bit_p (SA.values, SSA_NAME_VERSION (op)))
+       return;
+
+      s = SSA_NAME_DEF_STMT (op);
+      if (!s || gimple_code (s) != GIMPLE_ASSIGN)
+	return;
+
+      code2 = gimple_assign_rhs_code (s);
+
+      if (TREE_CODE_CLASS (code2) == tcc_comparison)
+        {
+	  tree t = TREE_TYPE (gimple_assign_rhs1 (s));
+
+	  /* Don't fold ~ (X cmp Y) ==/!= 0 to (X cmp Y) ==/!= 0,
+	     if operands of comparison are floating-points.  */
+	  if (FLOAT_TYPE_P (t))
+	    return;
+	}
+      else if (code2 != BIT_AND_EXPR && code2 != BIT_IOR_EXPR)
+        return;
+      *op1 = op;
+      *cmp = (*cmp == EQ_EXPR ? NE_EXPR : EQ_EXPR);
+    }
+
+  if (TREE_CODE_CLASS (code2) != tcc_comparison)
+    return;
+
+  nop0 = gimple_assign_rhs1 (stmt);
+  nop1 = gimple_assign_rhs2 (stmt);
+
+  /* Case (X cmp Y) != 0 -> X cmp Y.  */
+  if (*cmp == NE_EXPR)
+    {
+      *cmp = code2;
+      *op1 = nop0;
+      *op2 = nop1;
+      return;
+    }
+
+  /* Case (X cmp Y) == 0 */
+
+  type = TREE_TYPE (nop0);
+  if (FLOAT_TYPE_P (type))
+    return;
+  code2 = invert_tree_comparison (*cmp,
+				  HONOR_NANS (TYPE_MODE (type)));
+  if (code2 == ERROR_MARK)
+    return;
+  *cmp = code2;
+  *op1 = nop0;
+  *op2 = nop1;
+}
+
+/* Structure to keep some attributes of operands
+   for bitwise-and/or chain within a condition.  */
+
+typedef struct cond_assoc_t {
+  tree leaf;
+  enum tree_code code;
+  tree op2;
+  tree type;
+  bool is_bool;
+  bool is_trap;
+  bool has_sideeffect;
+  bool is_bool_not;
+  bool is_or;
+  bool is_and;
+  bool joined;
+} cond_assoc_t;
+
+/* Build up operand list in OP for bitwise operand chain of kind CODE and
+   store each element in CONDS, if non-NULL.
+   CMP can be either EQ_EXPR or NE_EXPR, and op2 is integral zero.
+   Function returns the count of found elements in OP.  */
+static int
+collect_cond_chain (tree op, enum tree_code code, cond_assoc_t *conds,
+		    enum tree_code cmp, tree op2)
+{
+  gimple s = NULL;
+  int ret = 0;
+
+  if (TREE_CODE (op) != SSA_NAME
+      || (s = SSA_NAME_DEF_STMT (op)) == NULL
+      || !is_gimple_assign (s)
+      || gimple_assign_rhs_code (s) != code)
+    {
+      if (conds)
+	{
+	  enum tree_code ncode = cmp;
+	  tree nop2 = op2;
+
+	  memset (&conds[ret], 0, sizeof (cond_assoc_t));
+	  normalize_truth_condition (&ncode, &op, &nop2);
+	  conds[ret].leaf = op;
+	  conds[ret].code = ncode;
+	  conds[ret].op2 = nop2;
+	  conds[ret].type = boolean_type_node;
+
+	  s = NULL;
+	  if (TREE_CODE (op) != SSA_NAME
+	      || (s = SSA_NAME_DEF_STMT (op)) == NULL
+	      || !is_gimple_assign (s))
+	    s = NULL;
+
+	  if (s)
+	    conds[ret].is_trap = gimple_could_trap_p (s);
+	  if (s)
+	    conds[ret].has_sideeffect = gimple_has_side_effects (s);
+
+	  if ((ncode == EQ_EXPR || ncode == NE_EXPR)
+	      && TREE_CODE (TREE_TYPE (op)) == BOOLEAN_TYPE
+	      && integer_zerop (nop2))
+	    {
+	      conds[ret].is_bool_not = false;
+	      conds[ret].is_bool = is_bool_op_p (op, &conds[ret].is_bool_not);
+	      conds[ret].is_bool_not &= conds[ret].is_bool;
+
+	      if (conds[ret].is_trap || conds[ret].has_sideeffect)
+	        {
+		  conds[ret].is_bool = false;
+		  conds[ret].is_bool_not = false;
+		}
+	      code = ERROR_MARK;
+	      if (s && is_gimple_assign (s)
+	          && TREE_CODE (TREE_TYPE (op)) == BOOLEAN_TYPE)
+		code = gimple_assign_rhs_code (s);
+	      conds[ret].is_or = (code == BIT_IOR_EXPR);
+	      conds[ret].is_and = (code == BIT_AND_EXPR);
+	    }
+	}
+      return 1;
+    }
+  ret = collect_cond_chain (gimple_assign_rhs1 (s), code, (conds ? &conds[ret] : NULL), cmp, op2);
+  ret += collect_cond_chain (gimple_assign_rhs2 (s), code, (conds ? &conds[ret] : NULL), cmp, op2);
+
+  return ret;
+}
+
+/* This routine collects the bitwise operand chain of CODE in OP.
+   The found number of elements is stored in *PCNT.
+   The function returns a pointer to the allocated buffer containing
+   *PCNT elements.  */
+
+static cond_assoc_t *
+get_condition_list_for (tree op, enum tree_code code, int *pcnt,
+			enum tree_code cmp, tree cst)
+{
+  int cnt;
+  cond_assoc_t *conds;
+  cnt = collect_cond_chain (op, code, NULL, cmp, cst);
+
+  conds = (cond_assoc_t *) XNEWVEC (cond_assoc_t, cnt);
+  collect_cond_chain (op, code, conds, cmp, cst);
+  *pcnt = cnt;
+
+  return conds;
+}
+
+/* Helper function to build a binary tree.
+   If OP0 is boolean-typed, CODE is equal to NE_EXPR,
+   and OP2 is zero constant, then return OP0.  Otherwise
+   the result of build2 is returned.  */
+
+static tree
+build_cond_expr (enum tree_code code, tree type, tree op0, tree op1)
+{
+  if (code == NE_EXPR && integer_zerop (op1)
+      && TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE)
+    return op0;
+  return build2 (code, type, op0, op1);
+}
+
+/* Decide whether we can save branch costs by combining the
+   L and R operands of a bitwise-and/or, depending on BRANCH_COST.
+   The function returns TRUE if combining L and R might be profitable,
+   otherwise FALSE is returned.  */
+
+static bool
+is_bitwise_binary_simple_combine (cond_assoc_t *l, cond_assoc_t *r)
+{
+  bool op0_has_not = false, op1_has_not = false;
+  int bcosts = BRANCH_COST (optimize_insn_for_speed_p (), false);
+  int costs = 1;
+
+  /* If jumps aren't cheap, avoid turning more code into
+     jumpy sequences.  */
+  if (bcosts > 4)
+    return true;
+
+  /* We would save one branch, but increase the count of
+     executed instructions.  */
+  if (bcosts <= 0)
+    return false;
+
+  if (l->is_bool == false || r->is_bool == false)
+    return false;
+
+  op0_has_not = l->is_bool_not;
+  op1_has_not = r->is_bool_not;
+
+  /* If L and R have inner bit-not, then there
+     are no additional costs for them.  */
+  if (op0_has_not && op1_has_not)
+    op0_has_not = op1_has_not = false;
+
+  /* If the comparison codes of L and R aren't equal, it might
+     be more costly to combine them, unless one arm has an
+     inner logical-not.  */
+  if (l->code != r->code)
+    {
+      if (op0_has_not)
+        op0_has_not = false;
+      else if (op1_has_not)
+        op1_has_not = false;
+      else
+        op0_has_not = true;
+    }
+  /* If leaf of L or R isn't boolean-typed, then we have at least
+     two operations to obtain result.
+     Each logical-not costs one more operation.  */
+
+  if (TREE_CODE (TREE_TYPE (l->leaf)) != BOOLEAN_TYPE)
+    costs += 2;
+  else if (op0_has_not)
+    costs++;
+
+  if (TREE_CODE (TREE_TYPE (r->leaf)) != BOOLEAN_TYPE)
+    costs += 2;
+  else if (op1_has_not)
+    costs++;
+
+  if (bcosts < costs)
+    return false;
+
+  return true;
+}
+
+/* Obtain some characteristics of a comparison of integral-typed
+   arguments and determine whether we have a simply combinable
+   operand here.
+   In PREFIX_NOT the value TRUE is stored, if operand CA contains
+   a value-invert operation.
+   In CMP_ZERO the value TRUE is stored, if operand CA is a comparison
+   to integral zero.
+   This function returns TRUE, if a possible candidate was matched,
+   otherwise it returns FALSE.  */
+
+static bool
+preeval_cond_integral (const cond_assoc_t *ca, bool *prefix_not, bool *cmp_zero)
+{
+  gimple s = NULL;
+  tree o;
+
+  if (ca->joined == true
+      || ca->is_bool
+      || ca->is_trap || ca->has_sideeffect
+      || ca->is_and || ca->is_or)
+    return false;
+
+  /* Reject anything different from X !=/== 0 and X !=/== ~0.  */
+  if ((ca->code != EQ_EXPR && ca->code != NE_EXPR)
+      || !INTEGRAL_TYPE_P (TREE_TYPE (ca->leaf))
+      || TREE_CODE (TREE_TYPE (ca->leaf)) == BOOLEAN_TYPE
+      || (!integer_zerop (ca->op2) && !integer_all_onesp (ca->op2)))
+    return false;
+
+  if (TREE_CODE (ca->leaf) != SSA_NAME)
+    return false;
+  *prefix_not = false;
+  *cmp_zero = integer_zerop (ca->op2);
+
+  s = SSA_NAME_DEF_STMT (ca->leaf);
+  if (!s || SSA_NAME_IS_DEFAULT_DEF (ca->leaf))
+    return true;
+
+  /* See if we have a valid ~X pattern in left-hand comparison
+     operand.  */
+  if (!is_gimple_assign (s)
+      || gimple_assign_rhs_code (s) != BIT_NOT_EXPR)
+    return false;
+
+  o = gimple_assign_rhs1 (s);
+  if (TREE_CODE (o) != SSA_NAME)
+    return false;
+
+  *prefix_not = true;
+  s = SSA_NAME_DEF_STMT (o);
+  if (!s || SSA_NAME_IS_DEFAULT_DEF (o))
+    return true;
+  *prefix_not = false;
+
+  return false;
+}
+
+/* Helper routine to do branch-cost optimization for operand-list
+   in CA with count CA_COUNT.  The bitwise-binary operation BITCODE needs
+   to be toggled between and/or, if INVERTED is TRUE.
+   In PCODE the final binary result code is stored; in POP0 and
+   POP1 its final operands.  */
+
+static void
+combine_conds (cond_assoc_t *ca, int ca_count, bool inverted,
+	       enum tree_code bitcode, enum tree_code *pcode, tree *pop0, tree *pop1)
+{
+  cond_assoc_t *sub_ca;
+  int sub_ca_count, i, l;
+  tree collect = NULL_TREE, tem;
+  enum tree_code chain_bitwise = bitcode;
+  enum tree_code chain_logical;
+
+  if (inverted)
+    chain_bitwise = (chain_bitwise == BIT_AND_EXPR ? BIT_IOR_EXPR : BIT_AND_EXPR);
+  chain_logical = (chain_bitwise == BIT_AND_EXPR ? TRUTH_ANDIF_EXPR : TRUTH_ORIF_EXPR);
+
+  /* First try to combine the comparisons on integral-typed arguments, which
+     aren't boolean-typed.  */
+  for (i = 0; i < ca_count; i++)
+    {
+      tree o1, o2;
+      bool op1_not = false, op2_not = false;
+      bool op1_cmp_zero = false, op2_cmp_zero = false;
+      int bcosts = BRANCH_COST (optimize_insn_for_speed_p (), false);
+
+      if (bcosts < 2)
+        break;
+      if (!preeval_cond_integral (&ca[i], &op1_not, &op1_cmp_zero))
+        continue;
+
+      if ((chain_bitwise == BIT_IOR_EXPR && ca[i].code != NE_EXPR)
+          || (chain_bitwise == BIT_AND_EXPR && ca[i].code != EQ_EXPR))
+        continue;
+
+      for (l = i + 1; l < ca_count; l++)
+        {
+	  tree p1;
+
+	  if (!preeval_cond_integral (&ca[l], &op2_not, &op2_cmp_zero))
+	    continue;
+	  if (ca[i].code != ca[l].code)
+	    continue;
+
+	  if (TREE_TYPE (ca[i].leaf) != TREE_TYPE (ca[l].leaf))
+	    continue;
+
+	  o1 = ca[i].leaf;
+	  o2 = ca[l].leaf;
+	  p1 = ca[i].op2;
+
+	  if (op1_cmp_zero == op2_cmp_zero)
+	    {
+	      if (op2_not && op1_not)
+	        {
+		  o1 = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (o1));
+		  p1 = fold_build1 (BIT_NOT_EXPR, TREE_TYPE (p1), p1);
+		  o2 = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (o2));
+		  op1_cmp_zero = !op1_cmp_zero;
+		}
+	      else if ((op1_not || op2_not) && bcosts <= 2)
+	        continue;
+	    }
+	  else if (!op1_not && !op2_not)
+	    continue;
+	  else if (op1_not)
+	    {
+	      if (op2_not && bcosts <= 2)
+	        continue;
+	      o1 = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (o1));
+	      p1 = fold_build1 (BIT_NOT_EXPR, TREE_TYPE (p1), p1);
+	      op1_cmp_zero = !op1_cmp_zero;
+	    }
+	  else
+	    o2 = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (o2));
+
+	  if (op1_cmp_zero)
+	    ca[i].leaf = build2 (BIT_IOR_EXPR, TREE_TYPE (o1), o1, o2);
+	  else
+	    ca[i].leaf = build2 (BIT_AND_EXPR, TREE_TYPE (o1), o1, o2);
+	  ca[i].op2 = p1;
+	  ca[l].joined = true;
+	  break;
+	}
+    }
+
+  /* Now try to combine comparisons on boolean-typed arguments and do
+     operations for inner bitwise and/or chains.  */
+  for (i = 0; i < ca_count; i++)
+    {
+      if (ca[i].joined == true)
+        continue;
+
+      if (ca[i].is_and || ca[i].is_or)
+        {
+	  enum tree_code sub_code;
+	  tree op0;
+	  tree cst = ca[i].op2;
+	  enum tree_code ao_code = ca[i].code;
+
+	  sub_ca = get_condition_list_for (ca[i].leaf,
+	  				   (ca[i].is_and ? BIT_AND_EXPR : BIT_IOR_EXPR),
+	  				   &sub_ca_count,
+	  				   ao_code, cst);
+	  combine_conds (sub_ca, sub_ca_count, (ao_code == EQ_EXPR),
+	  		 (ca[i].is_and ? BIT_AND_EXPR : BIT_IOR_EXPR),
+	  		 &sub_code, &op0, &cst);
+	  free (sub_ca);
+	  tem = build_cond_expr (sub_code, ca[i].type, op0, cst);
+	  ca[i].joined = true;
+	}
+      else
+        {
+	  ca[i].joined = true;
+	  tem = NULL_TREE;
+	  if (ca[i].is_bool)
+	    {
+	      for (l = i + 1; l < ca_count; l++)
+		{
+		  if (ca[l].joined == true || ca[l].is_bool == false)
+		    continue;
+		  if (is_bitwise_binary_simple_combine (&ca[i], &ca[l]))
+		    break;
+		}
+	      if (l < ca_count)
+		{
+		  tree tem2;
+		  enum tree_code bitw = chain_bitwise;
+
+		  /* ~X == 0 -> X != 0, ~X != 0 -> X == 0 */
+		  if (ca[i].is_bool_not && ca[l].is_bool_not)
+		    {
+		      gimple s = SSA_NAME_DEF_STMT (ca[i].leaf);
+		      ca[i].leaf = gimple_assign_rhs1 (s);
+		      ca[i].code = (ca[i].code == EQ_EXPR ? NE_EXPR : EQ_EXPR);
+		      s = SSA_NAME_DEF_STMT (ca[l].leaf);
+		      ca[l].leaf = gimple_assign_rhs1 (s);
+		      ca[l].code = (ca[l].code == EQ_EXPR ? NE_EXPR : EQ_EXPR);
+		    }
+		  if (ca[i].code != ca[l].code)
+		    {
+		      gimple s;
+
+		      if (ca[i].is_bool_not)
+		        {
+			  s = SSA_NAME_DEF_STMT (ca[i].leaf);
+			  ca[i].leaf = gimple_assign_rhs1 (s);
+			  ca[i].code = (ca[i].code == EQ_EXPR ? NE_EXPR : EQ_EXPR);
+			}
+		      else
+		        {
+			  s = SSA_NAME_DEF_STMT (ca[l].leaf);
+			  ca[l].leaf = gimple_assign_rhs1 (s);
+			  ca[l].code = (ca[l].code == EQ_EXPR ? NE_EXPR : EQ_EXPR);
+			}
+		    }
+		  /* (X == 0 op Y == 0) != 0 -> (X op-inv Y) == 0.  */
+		  if (ca[i].code == EQ_EXPR && ca[l].code == EQ_EXPR)
+		    {
+		      bitw = (bitw == BIT_AND_EXPR ? BIT_IOR_EXPR : BIT_AND_EXPR);
+		      tem = build_cond_expr (NE_EXPR, ca[i].type, ca[i].leaf, ca[i].op2);
+		      tem2 = build_cond_expr (NE_EXPR, ca[l].type, ca[l].leaf, ca[l].op2);
+		      tem = build_cond_expr (bitw,
+					     TREE_TYPE (tem), tem, tem2);
+		      tem = build_cond_expr (EQ_EXPR, ca[i].type, tem, ca[i].op2);
+		    }
+		  else
+		    {
+		      tem = build_cond_expr (ca[i].code, ca[i].type, ca[i].leaf, ca[i].op2);
+		      tem2 = build_cond_expr (ca[l].code, ca[l].type, ca[l].leaf, ca[l].op2);
+		      tem = build_cond_expr (bitw,
+					     TREE_TYPE (tem), tem, tem2);
+		    }
+		  ca[l].joined = true;
+		}
+	    }
+
+	  if (!tem)
+	    tem = build_cond_expr (ca[i].code, ca[i].type, ca[i].leaf, ca[i].op2);
+	}
+
+      if (!collect)
+	collect = tem;
+      else
+	collect = build2 (chain_logical, TREE_TYPE (collect), collect, tem);
+    }
+
+  *pcode = TREE_CODE (collect);
+  *pop0 = TREE_OPERAND (collect, 0);
+  *pop1 = TREE_OPERAND (collect, 1);
+}
+
+/* Helper routine to do branch-cost optimization on binary expression
+   *POP0 *PCODE *POP1.  */
+
+static void
+branchcost_optimization_on_conditions (enum tree_code *pcode, tree *pop0, tree *pop1)
+{
+  tree op0 = *pop0;
+  tree op1 = *pop1;
+  tree type = TREE_TYPE (op0);
+  enum tree_code code = *pcode;
+  gimple stmt;
+  bool invert = false;
+  cond_assoc_t *ca;
+  int ca_count;
+
+  if (TREE_CODE (type) != BOOLEAN_TYPE
+      || !integer_zerop (op1)
+      || (code != EQ_EXPR && code != NE_EXPR))
+    return;
+  if (!SA.values
+      || TREE_CODE (op0) != SSA_NAME
+      || !bitmap_bit_p (SA.values, SSA_NAME_VERSION (op0)))
+   return;
+
+  invert = (code == EQ_EXPR);
+  stmt = SSA_NAME_DEF_STMT (op0);
+  if (gimple_code (stmt) != GIMPLE_ASSIGN)
+    return;
+  switch (gimple_assign_rhs_code (stmt))
+    {
+    case BIT_AND_EXPR:
+    case BIT_IOR_EXPR:
+      break;
+    default:
+      return;
+    }
+
+  ca = get_condition_list_for (op0, gimple_assign_rhs_code (stmt), &ca_count, code, op1);
+  combine_conds (ca, ca_count, invert, gimple_assign_rhs_code (stmt), pcode, pop0, pop1);
+  free (ca);
+}
+
 /* A subroutine of expand_gimple_basic_block.  Expand one GIMPLE_COND.
    Returns a new basic block if we've terminated the current basic
    block and created a new one.  */
@@ -1668,6 +2313,8 @@  expand_gimple_cond (basic_block bb, gimp
   code = gimple_cond_code (stmt);
   op0 = gimple_cond_lhs (stmt);
   op1 = gimple_cond_rhs (stmt);
+
+  normalize_truth_condition (&code, &op0, &op1);
   /* We're sometimes presented with such code:
        D.123_1 = x < y;
        if (D.123_1 != 0)
@@ -1676,44 +2323,16 @@  expand_gimple_cond (basic_block bb, gimp
      be cleaned up by combine.  But some pattern matchers like if-conversion
      work better when there's only one compare, so make up for this
      here as special exception if TER would have made the same change.  */
-  if (gimple_cond_single_var_p (stmt)
-      && SA.values
-      && TREE_CODE (op0) == SSA_NAME
-      && bitmap_bit_p (SA.values, SSA_NAME_VERSION (op0)))
-    {
-      gimple second = SSA_NAME_DEF_STMT (op0);
-      if (gimple_code (second) == GIMPLE_ASSIGN)
-	{
-	  enum tree_code code2 = gimple_assign_rhs_code (second);
-	  if (TREE_CODE_CLASS (code2) == tcc_comparison)
-	    {
-	      code = code2;
-	      op0 = gimple_assign_rhs1 (second);
-	      op1 = gimple_assign_rhs2 (second);
-	    }
-	  /* If jumps are cheap turn some more codes into
-	     jumpy sequences.  */
-	  else if (BRANCH_COST (optimize_insn_for_speed_p (), false) < 4)
-	    {
-	      if ((code2 == BIT_AND_EXPR
-		   && TYPE_PRECISION (TREE_TYPE (op0)) == 1
-		   && TREE_CODE (gimple_assign_rhs2 (second)) != INTEGER_CST)
-		  || code2 == TRUTH_AND_EXPR)
-		{
-		  code = TRUTH_ANDIF_EXPR;
-		  op0 = gimple_assign_rhs1 (second);
-		  op1 = gimple_assign_rhs2 (second);
-		}
-	      else if (code2 == BIT_IOR_EXPR || code2 == TRUTH_OR_EXPR)
-		{
-		  code = TRUTH_ORIF_EXPR;
-		  op0 = gimple_assign_rhs1 (second);
-		  op1 = gimple_assign_rhs2 (second);
-		}
-	    }
-	}
-    }
+  branchcost_optimization_on_conditions (&code, &op0, &op1);
 
+  /* Take care that the final statement is a comparison, if we end
+     up with an AND or OR associative statement.  */
+  if (code == BIT_AND_EXPR || code == BIT_IOR_EXPR)
+    {
+      op0 = build2 (code, TREE_TYPE (op0), op0, op1);
+      code = NE_EXPR;
+      op1 = build_int_cst (TREE_TYPE (op0), 0);
+    }
   last2 = last = get_last_insn ();
 
   extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
Index: gcc-trunk/gcc/dojump.c
===================================================================
--- gcc-trunk.orig/gcc/dojump.c
+++ gcc-trunk/gcc/dojump.c
@@ -579,13 +579,12 @@  do_jump (tree exp, rtx if_false_label, r
 	goto normal;
 
       /* Boolean comparisons can be compiled as TRUTH_AND_EXPR.  */
-
     case TRUTH_AND_EXPR:
       /* High branch cost, expand as the bitwise AND of the conditions.
 	 Do the same if the RHS has side effects, because we're effectively
 	 turning a TRUTH_AND_EXPR into a TRUTH_ANDIF_EXPR.  */
       if (BRANCH_COST (optimize_insn_for_speed_p (),
-		       false) >= 4
+		       false) >= 1
 	  || TREE_SIDE_EFFECTS (TREE_OPERAND (exp, 1)))
 	goto normal;
       code = TRUTH_ANDIF_EXPR;
@@ -596,7 +595,7 @@  do_jump (tree exp, rtx if_false_label, r
       /* High branch cost, expand as the bitwise OR of the conditions.
 	 Do the same if the RHS has side effects, because we're effectively
 	 turning a TRUTH_OR_EXPR into a TRUTH_ORIF_EXPR.  */
-      if (BRANCH_COST (optimize_insn_for_speed_p (), false) >= 4
+      if (BRANCH_COST (optimize_insn_for_speed_p (), false) >= 1
 	  || TREE_SIDE_EFFECTS (TREE_OPERAND (exp, 1)))
 	goto normal;
       code = TRUTH_ORIF_EXPR;
Index: gcc-trunk/gcc/fold-const.c
===================================================================
--- gcc-trunk.orig/gcc/fold-const.c
+++ gcc-trunk/gcc/fold-const.c
@@ -3711,13 +3711,26 @@  simple_operand_p_2 (tree exp)
 
   code = TREE_CODE (exp);
 
-  if (TREE_CODE_CLASS (code) == tcc_comparison)
-    return (simple_operand_p (TREE_OPERAND (exp, 0))
-	    && simple_operand_p (TREE_OPERAND (exp, 1)));
+  if (code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR)
+    return false;
+
+  /* We need to recurse here for two reasons.  First,
+     simple_operand_p would reject too many binary/unary/expression-kind
+     expressions.  Second, tree_could_trap_p doesn't inspect all kinds
+     of binary/unary expressions and so misses some side-effects.  */
+
+  if (TREE_CODE_CLASS (code) == tcc_binary
+      || TREE_CODE_CLASS (code) == tcc_comparison
+      || code == TRUTH_AND_EXPR || code == TRUTH_OR_EXPR
+      || code == TRUTH_XOR_EXPR)
+    return (simple_operand_p_2 (TREE_OPERAND (exp, 0))
+	    && simple_operand_p_2 (TREE_OPERAND (exp, 1)));
 
-  if (code == TRUTH_NOT_EXPR)
+  if (TREE_CODE_CLASS (code) == tcc_unary
+      || code == TRUTH_NOT_EXPR)
       return simple_operand_p_2 (TREE_OPERAND (exp, 0));
 
+  /* simple_operand_p is in some cases too strict.  */
   return simple_operand_p (exp);
 }
 
@@ -4875,45 +4888,6 @@  fold_range_test (location_t loc, enum tr
       return or_op ? invert_truthvalue_loc (loc, tem) : tem;
     }
 
-  /* On machines where the branch cost is expensive, if this is a
-     short-circuited branch and the underlying object on both sides
-     is the same, make a non-short-circuit operation.  */
-  else if (LOGICAL_OP_NON_SHORT_CIRCUIT
-	   && lhs != 0 && rhs != 0
-	   && (code == TRUTH_ANDIF_EXPR
-	       || code == TRUTH_ORIF_EXPR)
-	   && operand_equal_p (lhs, rhs, 0))
-    {
-      /* If simple enough, just rewrite.  Otherwise, make a SAVE_EXPR
-	 unless we are at top level or LHS contains a PLACEHOLDER_EXPR, in
-	 which cases we can't do this.  */
-      if (simple_operand_p (lhs))
-	return build2_loc (loc, code == TRUTH_ANDIF_EXPR
-			   ? TRUTH_AND_EXPR : TRUTH_OR_EXPR,
-			   type, op0, op1);
-
-      else if (!lang_hooks.decls.global_bindings_p ()
-	       && !CONTAINS_PLACEHOLDER_P (lhs))
-	{
-	  tree common = save_expr (lhs);
-
-	  if (0 != (lhs = build_range_check (loc, type, common,
-					     or_op ? ! in0_p : in0_p,
-					     low0, high0))
-	      && (0 != (rhs = build_range_check (loc, type, common,
-						 or_op ? ! in1_p : in1_p,
-						 low1, high1))))
-	    {
-	      if (strict_overflow_p)
-		fold_overflow_warning (warnmsg,
-				       WARN_STRICT_OVERFLOW_COMPARISON);
-	      return build2_loc (loc, code == TRUTH_ANDIF_EXPR
-				 ? TRUTH_AND_EXPR : TRUTH_OR_EXPR,
-				 type, lhs, rhs);
-	    }
-	}
-    }
-
   return 0;
 }
 
@@ -5143,40 +5117,6 @@  fold_truth_andor_1 (location_t loc, enum
   code = ((code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR)
 	  ? TRUTH_AND_EXPR : TRUTH_OR_EXPR);
 
-  /* If the RHS can be evaluated unconditionally and its operands are
-     simple, it wins to evaluate the RHS unconditionally on machines
-     with expensive branches.  In this case, this isn't a comparison
-     that can be merged.  */
-
-  if (BRANCH_COST (optimize_function_for_speed_p (cfun),
-		   false) >= 2
-      && ! FLOAT_TYPE_P (TREE_TYPE (rl_arg))
-      && simple_operand_p (rl_arg)
-      && simple_operand_p (rr_arg))
-    {
-      /* Convert (a != 0) || (b != 0) into (a | b) != 0.  */
-      if (code == TRUTH_OR_EXPR
-	  && lcode == NE_EXPR && integer_zerop (lr_arg)
-	  && rcode == NE_EXPR && integer_zerop (rr_arg)
-	  && TREE_TYPE (ll_arg) == TREE_TYPE (rl_arg)
-	  && INTEGRAL_TYPE_P (TREE_TYPE (ll_arg)))
-	return build2_loc (loc, NE_EXPR, truth_type,
-			   build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg),
-				   ll_arg, rl_arg),
-			   build_int_cst (TREE_TYPE (ll_arg), 0));
-
-      /* Convert (a == 0) && (b == 0) into (a | b) == 0.  */
-      if (code == TRUTH_AND_EXPR
-	  && lcode == EQ_EXPR && integer_zerop (lr_arg)
-	  && rcode == EQ_EXPR && integer_zerop (rr_arg)
-	  && TREE_TYPE (ll_arg) == TREE_TYPE (rl_arg)
-	  && INTEGRAL_TYPE_P (TREE_TYPE (ll_arg)))
-	return build2_loc (loc, EQ_EXPR, truth_type,
-			   build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg),
-				   ll_arg, rl_arg),
-			   build_int_cst (TREE_TYPE (ll_arg), 0));
-    }
-
   /* See if the comparisons can be merged.  Then get all the parameters for
      each side.  */
 
@@ -8406,13 +8346,12 @@  fold_truth_andor (location_t loc, enum t
   if ((tem = fold_truth_andor_1 (loc, code, type, arg0, arg1)) != 0)
     return tem;
 
-  if ((BRANCH_COST (optimize_function_for_speed_p (cfun),
-		    false) >= 2)
-      && LOGICAL_OP_NON_SHORT_CIRCUIT
-      && (code == TRUTH_AND_EXPR
-          || code == TRUTH_ANDIF_EXPR
-          || code == TRUTH_OR_EXPR
-          || code == TRUTH_ORIF_EXPR))
+  /* Try to optimize TRUTH_AND/OR[IF]_EXPRs to associative TRUTH_AND/OR_EXPRs
+     chains.  */
+  if (code == TRUTH_AND_EXPR
+      || code == TRUTH_ANDIF_EXPR
+      || code == TRUTH_OR_EXPR
+      || code == TRUTH_ORIF_EXPR)
     {
       enum tree_code ncode, icode;
 
@@ -8440,7 +8379,7 @@  fold_truth_andor (location_t loc, enum t
 	  return fold_build2_loc (loc, icode, type, TREE_OPERAND (arg0, 0),
 				  tem);
 	}
-	/* Same as abouve but for (A AND[-IF] (B AND-IF C)) -> ((A AND B) AND-IF C),
+	/* Same as above but for (A AND[-IF] (B AND-IF C)) -> ((A AND B) AND-IF C),
 	   or (A OR[-IF] (B OR-IF C) -> ((A OR B) OR-IF C).  */
       else if (TREE_CODE (arg1) == icode
 	  && simple_operand_p_2 (arg0)
Index: gcc-trunk/gcc/testsuite/gcc.dg/pr46909.c
===================================================================
--- gcc-trunk.orig/gcc/testsuite/gcc.dg/pr46909.c
+++ gcc-trunk/gcc/testsuite/gcc.dg/pr46909.c
@@ -13,7 +13,7 @@  foo (unsigned int x)
   return -1;
 }
 
-/* { dg-final { scan-tree-dump-times "x_\[0-9\]+\\(D\\) != 4" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "x_\[0-9\]+\\(D\\) != 4" 0 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "x_\[0-9\]+\\(D\\) != 6" 0 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "x_\[0-9\]+\\(D\\) == 2" 0 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "x_\[0-9\]+\\(D\\) == 6" 0 "optimized" } } */
Index: gcc-trunk/gcc/testsuite/gcc.dg/tree-ssa/vrp33.c
===================================================================
--- gcc-trunk.orig/gcc/testsuite/gcc.dg/tree-ssa/vrp33.c
+++ gcc-trunk/gcc/testsuite/gcc.dg/tree-ssa/vrp33.c
@@ -3,7 +3,14 @@ 
 
 /* This is from PR14052.  */
 
-int f2(int x) { return x == 1 || x == 3 || x == 1; }
+int f2(int x)
+{
+  if (x == 1 || x == 3)
+    return 1;
+  if (x == 1)
+    return 1;
+  return 0;
+}
 
 /* { dg-final { scan-tree-dump "Folding predicate.*== 1 to 0" "vrp1" } } */
 /* { dg-final { cleanup-tree-dump "vrp1" } } */
Index: gcc-trunk/gcc/testsuite/gcc.target/i386/branch-cost1.c
===================================================================
--- gcc-trunk.orig/gcc/testsuite/gcc.target/i386/branch-cost1.c
+++ gcc-trunk/gcc/testsuite/gcc.target/i386/branch-cost1.c
@@ -11,6 +11,7 @@  foo (int a, int b)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "if " 2 "gimple" } } */
-/* { dg-final { scan-tree-dump-not " & " "gimple" } } */
+/* { dg-final { scan-tree-dump-times "if " 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times " & " 1 "gimple" } } */
+/* { dg-final { scan-assembler-times "jne|je" 2 } } */
 /* { dg-final { cleanup-tree-dump "gimple" } } */
Index: gcc-trunk/gcc/testsuite/gcc.target/i386/branch-cost2.c
===================================================================
--- gcc-trunk.orig/gcc/testsuite/gcc.target/i386/branch-cost2.c
+++ gcc-trunk/gcc/testsuite/gcc.target/i386/branch-cost2.c
@@ -13,4 +13,5 @@  foo (int a, int b)
 
 /* { dg-final { scan-tree-dump-times "if " 1 "gimple" } } */
 /* { dg-final { scan-tree-dump-times " & " 1 "gimple" } } */
+/* { dg-final { scan-assembler-times "jne|je" 1 } } */
 /* { dg-final { cleanup-tree-dump "gimple" } } */
Index: gcc-trunk/gcc/testsuite/gcc.target/i386/branch-cost3.c
===================================================================
--- gcc-trunk.orig/gcc/testsuite/gcc.target/i386/branch-cost3.c
+++ gcc-trunk/gcc/testsuite/gcc.target/i386/branch-cost3.c
@@ -13,4 +13,5 @@  foo (_Bool a, _Bool b)
 
 /* { dg-final { scan-tree-dump-times "if " 1 "gimple" } } */
 /* { dg-final { scan-tree-dump-times " & " 1 "gimple" } } */
+/* { dg-final { scan-assembler-times "jne|je" 1 } } */
 /* { dg-final { cleanup-tree-dump "gimple" } } */
Index: gcc-trunk/gcc/testsuite/gcc.target/i386/branch-cost4.c
===================================================================
--- gcc-trunk.orig/gcc/testsuite/gcc.target/i386/branch-cost4.c
+++ gcc-trunk/gcc/testsuite/gcc.target/i386/branch-cost4.c
@@ -11,6 +11,7 @@  foo (_Bool a, _Bool b)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "if " 2 "gimple" } } */
-/* { dg-final { scan-tree-dump-not " & " "gimple" } } */
+/* { dg-final { scan-tree-dump-times "if " 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times " & " 1 "gimple" } } */
+/* { dg-final { scan-assembler-times "jne|je" 2 } } */
 /* { dg-final { cleanup-tree-dump "gimple" } } */