Patchwork RFC: Add type-demotion pass

Submitter Kai Tietz
Date July 8, 2013, 5:45 p.m.
Message ID <155227895.847667.1373305519786.JavaMail.root@redhat.com>
Permalink /patch/257580/
State New

Comments

Kai Tietz - July 8, 2013, 5:45 p.m.
Hello,

These passes implement type demotion (sinking types into statements) for some gimple statements.  I limited the initial implementation to the set of multiply, addition, subtraction, and binary and/or/xor.  Additionally, this pass adds some rules to sink type-cast sequences - e.g. (int) (short) x; with char as the type of x.  This special handling of cast-sequence simplification is necessary to avoid overly complex cast sequences arising from the unsigned-type conversions this pass uses to avoid undefined overflow behaviour.

I will send a separate patch with some test cases to demonstrate and verify the operation of this new optimization.  Here is just one sample to demonstrate what the type-demotion pass does.

------start-----
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-optimized" } */

signed char a[1024], b[1024];

void
baz (void)
{
  int i, s, t;
  for (i = 0; i < 1024; i++)
    { s = a[i]; t = b[i]; s += t + 0x12345600; a[i] = s; }
}

/* { dg-final { scan-tree-dump-times "305419776" 0 "optimized" } } */
/* { dg-final { cleanup-tree-dump "optimized" } } */
------end------

You will notice that the typedemote2 pass simplifies the 's += t + 0x12345600;' expression to 's += t + 0;'.

I have added an additional early pass, "typedemote1", for simple cases where types can easily be sunk into a statement without the special unsigned cast for the overflow case.  Jakub asked for it.  Tests have shown that this pass finds optimizations in pretty few cases; see for example the pr46867.c testcase in the testsuite.
The second pass (I put it behind the first vrp pass to avoid testcase conflicts) uses 'unsigned'-type casts to avoid undefined overflow behavior.  This pass has many more hits in standard code, but seems to trigger some regressions in the vect pass:

List of regressions:

FAIL: gcc.dg/vect/slp-cond-3.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/vect-reduc-dot-s8b.c -flto  scan-tree-dump-times vect "vect_recog_widen_mult_pattern: detected" 1
FAIL: gcc.dg/vect/slp-cond-3.c -flto  scan-tree-dump-times vect "vectorizing stmts using SLP" 1
FAIL: gcc.target/i386/rotate-1.c scan-assembler ro[lr]b


2013-07-08  Kai Tietz  <ktietz@redhat.com>

	* Makefile.in (OBJS): Add tree-ssa-te.o.
	* passes.c (init_optimization_passes): Add
	pass_type_demote1 and pass_type_demote2.
	* tree-pass.h (pass_type_demote1): Add declaration.
	(pass_type_demote2): Likewise.
	* tree-ssa-te.c: New implementation file.

The patch is tested for x86_64-unknown-linux-gnu and x86_64-w64-mingw32.  OK to apply?

Regards,
Kai
Marc Glisse - July 8, 2013, 8:52 p.m.
On Mon, 8 Jul 2013, Kai Tietz wrote:

> These passes are implementing type-demotion (type sinking into 
> statements) for some gimple statements.  I limitted inital 
> implementation to the set of multiply, addition, substraction, and 
> binary-and/or/xor.  Additional this pass adds some rules to sink 
> type-cast sequences - eg. (int) (short) x; with char as type of x. 
> This special handing in this pass of such type-sequence simplification 
> is necessary to avoid too complex cast-sequences by type-unsigned 
> conversion used by this pass to avoid undefined overflow behaviour.

Hello,

thanks for working on this. This seems like a nice chance for me to learn, 
so I am trying to understand your patch, and I would be glad if you could 
give me a couple hints.

I wonder why you implemented this as a separate pass instead of adding it 
to tree-ssa-forwprop. demote_cast is (or should be) a special case of 
combine_conversions, so it would be nice to avoid the code duplication 
(there is also a version in fold-const.c). demote_into could be called 
from roughly the same place as simplify_conversion_from_bitmask. And you 
could reuse get_prop_source_stmt, can_propagate_from, 
remove_prop_source_from_use, etc.

If I understand, the main reason is because you want to go through the 
statements in reverse order, since this is the way the casts are being 
propagated (would forwprop also work, just more slowly, or would it miss 
opportunities across basic blocks?).

I have some trouble understanding why something as complicated as 
build_and_add_sum (which isn't restricted to additions by the way) is 
needed. Could you add a comment to the code explaining why you need to 
insert the statements precisely there and not for instance just before 
their use? Is that to try and help CSE?

> I have added an additional early pass "typedemote1" to this patch for simple cases types can be easily sunken into statement without special unsigned-cast for overflow-case.   Jakub asked for it. Tests have shown that this pass does optimizations in pretty few cases.  As example in testsuite see for example pr46867.c testcase.
> The second pass (I put it behind first vrp pass to avoid testcase-conflicts) uses 'unsigned'-type casts to avoid undefined overflow behavior. This pass has much more hits in standard code,

I assume that, when the pass is done, we can consider removing the 
corresponding code from the front-ends? That would increase the hits ;-)
Kai Tietz - July 9, 2013, 9:03 a.m.
----- Original Message -----
> On Mon, 8 Jul 2013, Kai Tietz wrote:
> 
> > These passes are implementing type-demotion (type sinking into
> > statements) for some gimple statements.  I limitted inital
> > implementation to the set of multiply, addition, substraction, and
> > binary-and/or/xor.  Additional this pass adds some rules to sink
> > type-cast sequences - eg. (int) (short) x; with char as type of x.
> > This special handing in this pass of such type-sequence simplification
> > is necessary to avoid too complex cast-sequences by type-unsigned
> > conversion used by this pass to avoid undefined overflow behaviour.
> 
> Hello,
> 
> thanks for working on this. This seems like a nice chance for me to learn,
> so I am trying to understand your patch, and I would be glad if you could
> give me a couple hints.

I try.
 
> I wonder why you implemented this as a separate pass instead of adding it
> to tree-ssa-forwprop. demote_cast is (or should be) a special case of
> combine_conversions, so it would be nice to avoid the code duplication
> (there is also a version in fold-const.c). demote_into could be called
> from roughly the same place as simplify_conversion_from_bitmask. And you
> could reuse get_prop_source_stmt, can_propagate_from,
> remove_prop_source_from_use, etc.

The initial patch (from last year) actually implemented this in forwprop.  I was then kindly asked to put it into a separate pass.  There are some good reasons why forward propagation isn't the right place for it.  E.g., forwprop does type raising by default, so implementing type demotion there too would lead to a race between the two transformations.  Forwprop would additionally need to do a straight-line, depth-first conversion over statement lists.  All this doesn't fit well into the current concept of forward propagation ...
Cast demotion is of course also of interest for folding, and might fit into the forward-propagation pass too.  The main reason it is implemented within the demotion pass is that this pass introduces such cast-demotion folding opportunities through its "unsigned"-type expansion.  So we want to fold them within the pass instead of waiting for a later pass to optimize such redundant sequences away.

Btw, the cast demotion can still be improved for some cases - see the gcc.target/i386/rotate-1.c testcase for such an improvement - so that some expressions of the form (type1) (((type2) x) op ((type2) y)) can be transformed into x op y.

> If I understand, the main reason is because you want to go through the
> statements in reverse order, since this is the way the casts are being
> propagated (would forwprop also work, just more slowly, or would it miss
> opportunities across basic blocks?).
It would miss some opportunities, and it causes a speed penalty due to the design of forwprop.
 
> I have some trouble understanding why something as complicated as
> build_and_add_sum (which isn't restricted to additions by the way) is
> needed. Could you add a comment to the code explaining why you need to
> insert the statements precisely there and not for instance just before
> their use? Is that to try and help CSE?
Yes, this function is a bit more complex than actually required right now.  Nevertheless it is required, and I expect the demotion pass to improve, so even the currently "unlikely" cases may be hit more frequently.  I first had a lighter version of statement insertion, but it failed in some cases: in some (rare) cases we would otherwise choose the wrong basic block to add the new "cast" statements to.  Also, there is another place (reassoc) where you can find nearly the same function, and its needs are exactly the same as in the demotion pass.

> > I have added an additional early pass "typedemote1" to this patch for
> > simple cases types can be easily sunken into statement without special
> > unsigned-cast for overflow-case.   Jakub asked for it. Tests have shown
> > that this pass does optimizations in pretty few cases.  As example in
> > testsuite see for example pr46867.c testcase.
> > The second pass (I put it behind first vrp pass to avoid
> > testcase-conflicts) uses 'unsigned'-type casts to avoid undefined overflow
> > behavior. This pass has much more hits in standard code,
> 
> I assume that, when the pass is done, we can consider removing the
> corresponding code from the front-ends? That would increase the hits ;-)

Hmm, well, possible, but unlikely.  FE and ME are two pretty different things.  Sadly the code isn't as cleanly separated as it should be.  But I know there is ongoing work in this area to achieve that.
 
> --
> Marc Glisse
> 

Kai
Jeff Law - July 9, 2013, 10:41 p.m.
On 07/08/2013 02:52 PM, Marc Glisse wrote:
>
> I wonder why you implemented this as a separate pass instead of adding
> it to tree-ssa-forwprop. demote_cast is (or should be) a special case of
> combine_conversions, so it would be nice to avoid the code duplication
> (there is also a version in fold-const.c). demote_into could be called
> from roughly the same place as simplify_conversion_from_bitmask. And you
> could reuse get_prop_source_stmt, can_propagate_from,
> remove_prop_source_from_use, etc.
That's a real good question; I find myself looking a lot at the bits in 
forwprop and I'm getting worried it's on its way to being an 
unmaintainable mess.  Sadly, I'm making things worse rather than better 
with my recent changes.  I'm still hoping more structure will become 
evident as I continue to work through various improvements.

I find myself also pondering these bits in a code motion model; what 
hasn't become clear yet is the driving motivation to show why thinking 
about this as a code motion problem is interesting.

Conceptually we can hoist casts to their earliest possible point and 
sink them to their latest possible point.  What are the benefits of 
those transformations, and is there anything inherently good about 
actually moving the typecasts, as opposed to just realizing the casts are 
in the IL and optimizing appropriately?

ie, often I see the hoisting/sinking code bring a series of casts 
together into straightline code which then gets optimized.  *BUT* is 
there anything inherently better/easier about having them in straightline 
code?  We can walk the use->def chains and recover the same information. 
  If that's not happening, then that points to a failing in our optimizers.

Floating out there is the hope that there's a set of canonicalization 
rules to guide us where to place the typecasts.  ie, is it generally 
better to have

(T) (a) OP (T) b

Or is it better to have
(T) (a OP b)

[ Assuming they're semantically the same. ]

Is it dependent on T and how T relates to the underlying target?  Are 
the guidelines likely the same for logicals vs arithmetic, etc?




>
> If I understand, the main reason is because you want to go through the
> statements in reverse order, since this is the way the casts are being
> propagated (would forwprop also work, just more slowly, or would it miss
> opportunities across basic blocks?).
SSA_NAME_DEF_STMT can cross block boundaries.


>
> I have some trouble understanding why something as complicated as
> build_and_add_sum (which isn't restricted to additions by the way) is
> needed. Could you add a comment to the code explaining why you need to
> insert the statements precisely there and not for instance just before
> their use? Is that to try and help CSE?
Hmm, I thought I had suggested that routine get renamed during an 
internal, informal review of Kai's work.

>
>> I have added an additional early pass "typedemote1" to this patch for
>> simple cases types can be easily sunken into statement without special
>> unsigned-cast for overflow-case.   Jakub asked for it. Tests have
>> shown that this pass does optimizations in pretty few cases.  As
>> example in testsuite see for example pr46867.c testcase.
>> The second pass (I put it behind first vrp pass to avoid
>> testcase-conflicts) uses 'unsigned'-type casts to avoid undefined
>> overflow behavior. This pass has much more hits in standard code,
>
> I assume that, when the pass is done, we can consider removing the
> corresponding code from the front-ends? That would increase the hits ;-)
Kai and I have briefly touched on this, mostly in the context of 
removing bits from fold-const.c rather than the front-ends proper.


jeff
>
Marc Glisse - July 12, 2013, 7:13 p.m.
On Tue, 9 Jul 2013, Kai Tietz wrote:

>> I would be glad if you could give me a couple hints.
> I try.

Thanks :-)

>> I wonder why you implemented this as a separate pass instead of adding it
>> to tree-ssa-forwprop. demote_cast is (or should be) a special case of
>> combine_conversions, so it would be nice to avoid the code duplication
>> (there is also a version in fold-const.c). demote_into could be called
>> from roughly the same place as simplify_conversion_from_bitmask. And you
>> could reuse get_prop_source_stmt, can_propagate_from,
>> remove_prop_source_from_use, etc.
>
> Initial patch (from last year) actual implemented that in forwprop.

Ah, reading the conversation from last year helped me understand a bit 
better.

> I was then kindly asked to put this into a separate pass.  There are 
> some good reason why forward-propagation isn't the right place for it. 
> Eg, forwprop does type-raising by default.

Even in cases where that increases the size of the operation? I see a 
hoist_conversion_for_bitop_p that seems to try and prevent it.

> So by implementing type-demotion 
> there too, would lead to raise-condition.  So there would be required 
> additionally that within forwprop a straight line-depth conversion is 
> done for statement-lists.  All this doesn't fit pretty well into current 
> concept of forward-propagation ... The cast demotion is of course 
> something of interest for folding and might be fitting into 
> forward-propagation-pass too.  The main cause why it is implemented 
> within demotion pass is, that this pass introduces such 
> cast-demotion-folding opportunities due its "unsigned"-type expansion. 
> So we want to fold that within pass and not waiting until a later pass 
> optimizes such redundant sequences away.

I hope we can at least find a way to share code between the passes.

>> If I understand, the main reason is because you want to go through the
>> statements in reverse order, since this is the way the casts are being
>> propagated (would forwprop also work, just more slowly, or would it miss
>> opportunities across basic blocks?).
> It would miss some opportunities,

Could you explain in what case? I guess my trouble understanding this is 
the same as in the next question, and I am missing a fundamental point...

> and causes penalty on speed due concept of forwprop.
>
>> I have some trouble understanding why something as complicated as
>> build_and_add_sum (which isn't restricted to additions by the way) is
>> needed. Could you add a comment to the code explaining why you need to
>> insert the statements precisely there and not for instance just before
>> their use? Is that to try and help CSE?
> Yes, this function is a bit more complex as actual required for now in 
> general.  Nevertheless required.  And I expect that demotion-pass will 
> improve and so even the right now "unlikely" cases might be required 
> more frequent.  I had in front lighter version of statement addition, 
> but it failed in some cases.In some (rare) cases we would otherwise 
> choose wrong basic-block to add the new "cast"-statements.  Well, there 
> is another place (reassoc) where you can find nearly the same function, 
> and its needs are exactly the same as within demote-pass.

I had a look at the reassoc pass and failed to understand the logic there 
as well :-( I don't really understand how the basic block can be wrong. If 
we only handle the single use case with no side effects and no weird jump 
in between, nothing but the user should notice if you move the definition 
next to the use. So is this a way to handle weirder jumps than you could 
otherwise?

>>> I have added an additional early pass "typedemote1" to this patch for
>>> simple cases types can be easily sunken into statement without special
>>> unsigned-cast for overflow-case.   Jakub asked for it. Tests have shown
>>> that this pass does optimizations in pretty few cases.  As example in
>>> testsuite see for example pr46867.c testcase.
>>> The second pass (I put it behind first vrp pass to avoid
>>> testcase-conflicts) uses 'unsigned'-type casts to avoid undefined overflow
>>> behavior. This pass has much more hits in standard code,

My little understanding from the old conversation and Jeff's post in this 
thread is that there are 2 ways to think about this optimization:

1) the canonical representation is with the smallest type possible. Using 
forwprop could make sense there (even if it may need help from another 
pass for some complicated cases, and it isn't the most efficient way). 
There is possibly a late pass to change all the types too small for the 
hardware into int.

2) we compute for each operation what minimal size it needs. Then, we may 
do some optimizations (truncate constants), possibly convert the 
operations to some smaller type (many at once), but not necessarily. This 
requires a separate pass but doesn't look like what you are doing.



On Tue, 9 Jul 2013, Jeff Law wrote:

> On 07/08/2013 02:52 PM, Marc Glisse wrote:
>> 
>> I wonder why you implemented this as a separate pass instead of adding
>> it to tree-ssa-forwprop. demote_cast is (or should be) a special case of
>> combine_conversions, so it would be nice to avoid the code duplication
>> (there is also a version in fold-const.c). demote_into could be called
>> from roughly the same place as simplify_conversion_from_bitmask. And you
>> could reuse get_prop_source_stmt, can_propagate_from,
>> remove_prop_source_from_use, etc.
> That's a real good question; I find myself looking a lot at the bits in 
> forwprop and I'm getting worried it's on its way to being an unmaintainable 
> mess.  Sadly, I'm making things worse rather than better with my recent 
> changes.  I'm still hoping more structure will become evident as I continue 
> to work through various improvements.

It looks to me like a gimple version of fold, except that since it is 
gimple, basic operations are an order of magnitude more complicated. But I 
don't really see why that would make it an unmaintainable mess, giant 
switches are not that bad.

>> If I understand, the main reason is because you want to go through the
>> statements in reverse order, since this is the way the casts are being
>> propagated (would forwprop also work, just more slowly, or would it miss
>> opportunities across basic blocks?).
> SSA_NAME_DEF_STMT can cross block boundaries.

I know, which is why it isn't obvious to me what opportunities are missed.

>> I assume that, when the pass is done, we can consider removing the
>> corresponding code from the front-ends? That would increase the hits ;-)
> Kai and I have briefly touched on this, mostly in the context of removing 
> bits from fold-const.c rather than the front-ends proper.

The code in the front-ends is a pain. Lately, it made implementing some 
parts of the undefined behavior sanitizer much more complicated than it 
should have been. I think fold-const hurts less than optimization code 
directly in the front-ends.

Thanks Kai and Jeff, your posts helped a lot,
Jeff Law - July 19, 2013, 1:06 p.m.
On 07/12/2013 01:13 PM, Marc Glisse wrote:
>> Initial patch (from last year) actual implemented that in forwprop.
>
> Ah, reading the conversation from last year helped me understand a bit
> better.
It's also worth noting that fold-const.c also does some type 
hoisting/sinking.   Ideally that code should just be going away.


>> So by implementing type-demotion there too, would lead to
>> raise-condition.  So there would be required additionally that within
>> forwprop a straight line-depth conversion is done for
>> statement-lists.  All this doesn't fit pretty well into current
>> concept of forward-propagation ... The cast demotion is of course
>> something of interest for folding and might be fitting into
>> forward-propagation-pass too.  The main cause why it is implemented
>> within demotion pass is, that this pass introduces such
>> cast-demotion-folding opportunities due its "unsigned"-type expansion.
>> So we want to fold that within pass and not waiting until a later pass
>> optimizes such redundant sequences away.
>
> I hope we can at least find a way to share code between the passes.
Well, I'd like to pull all of the type hoisting/sinking code out to its 
own pass.  That may be a bit idealistic, but that's my hope/goal.


>
>>> If I understand, the main reason is because you want to go through the
>>> statements in reverse order, since this is the way the casts are being
>>> propagated (would forwprop also work, just more slowly, or would it miss
>>> opportunities across basic blocks?).
>> It would miss some opportunities,
>
> Could you explain in what case? I guess my trouble understanding this is
> the same as in the next question, and I am missing a fundamental point...
Anytime you hoist/sink a cast, you're moving it across an operation -- 
which can expose it as a redundant cast.

Let's say you start with

A = (T) x1;
B = (T) x2;
R = A & B;


And sink the cast after the operation like this:

C = x1 & x2;
R = (T) C;

We may find that that cast of C to type T is redundant.  Similar cases 
can be found when we hoist the cast across the operation.  I saw this 
kind of situation occur regularly when I was looking at Kai's 
hoisting/sinking patches.

Now I believe one of Kai's goals is to allow our various pattern based 
folders to work better by not having to account for casting operations 
as often in sequences of statements we want to fold.  I suspect that to 
see benefit from that we'd have to hoist, fold, sink, fold.  That argues 
that hoisting/sinking should be independent of the folding step (which 
shouldn't be a surprise to any of us).


>> That's a real good question; I find myself looking a lot at the bits
>> in forwprop and I'm getting worried it's on its way to being an
>> unmaintainable mess.  Sadly, I'm making things worse rather than
>> better with my recent changes.  I'm still hoping more structure will
>> become evident as I continue to work through various improvements.
>
> It looks to me like a gimple version of fold, except that since it is
> gimple, basic operations are an order of magnitude more complicated. But
> I don't really see why that would make it an unmaintainable mess, giant
> switches are not that bad.
It's certainly moving towards a gimple version of fold, and special 
casing everything as we convert from fold to gimple_fold (or 
whatever we call it) is going to result in a horrid mess.

I find myself thinking that at a high level we need to split out the 
forward and backward propagation bits into distinct passes.  The 
backward propagation bits are just a tree combiner and the idioms used 
to follow the backward chains to create more complex trees and fold them 
need to be reusable components.  It's totally silly (and ultimately 
unmaintainable) that each transformation is open-coding the walking of 
the use-def chain and simplification step.

Jeff
Richard Guenther - July 19, 2013, 3:41 p.m.
Jeff Law <law@redhat.com> wrote:

>On 07/12/2013 01:13 PM, Marc Glisse wrote:
>>> Initial patch (from last year) actual implemented that in forwprop.
>>
>> Ah, reading the conversation from last year helped me understand a
>bit
>> better.
>It's also worth noting that fold-const.c also does some type 
>hoisting/sinking.   Ideally that code should just be going away.
>
>
>>> So by implementing type-demotion there too, would lead to
>>> raise-condition.  So there would be required additionally that
>within
>>> forwprop a straight line-depth conversion is done for
>>> statement-lists.  All this doesn't fit pretty well into current
>>> concept of forward-propagation ... The cast demotion is of course
>>> something of interest for folding and might be fitting into
>>> forward-propagation-pass too.  The main cause why it is implemented
>>> within demotion pass is, that this pass introduces such
>>> cast-demotion-folding opportunities due its "unsigned"-type
>expansion.
>>> So we want to fold that within pass and not waiting until a later
>pass
>>> optimizes such redundant sequences away.
>>
>> I hope we can at least find a way to share code between the passes.
>Well, I'd like to pull all of the type hoisting/sinking code out to its
>
>own pass.  That may be a bit idealistic, but that's my hope/goal.
>
>
>>
>>>> If I understand, the main reason is because you want to go through
>the
>>>> statements in reverse order, since this is the way the casts are
>being
>>>> propagated (would forwprop also work, just more slowly, or would it
>miss
>>>> opportunities across basic blocks?).
>>> It would miss some opportunities,
>>
>> Could you explain in what case? I guess my trouble understanding this
>is
>> the same as in the next question, and I am missing a fundamental
>point...
>Anytime you hoist/sink a cast, you're moving it across an operation -- 
>which can expose it as a redundant cast.
>
>Let's say you start with
>
>A = (T) x1;
>B = (T) x2;
>R = A & B;
>
>
>And sink the cast after the operation like this:
>
>C = x1 & x2;
>R = (T) C;
>
>We may find that that cast of C to type T is redundant.  Similar cases 
>can be found when we hoist the cast across the operation.  I saw this 
>kind of situation occur regularly when I was looking at Kai's 
>hoisting/sinking patches.
>
>Now I believe one of Kai's goals is to allow our various pattern based 
>folders to work better by not having to account for casting operations 
>as often in sequences of statements we want to fold.  I suspect that to
>
>see benefit from that we'd have to hoist, fold, sink, fold.  That
>argues 
>that hoisting/sinking should be independent of the folding step (which 
>shouldn't be a surprise to any of us).
>
>
>>> That's a real good question; I find myself looking a lot at the bits
>>> in forwprop and I'm getting worried it's on its way to being an
>>> unmaintainable mess.  Sadly, I'm making things worse rather than
>>> better with my recent changes.  I'm still hoping more structure will
>>> become evident as I continue to work through various improvements.
>>
>> It looks to me like a gimple version of fold, except that since it is
>> gimple, basic operations are an order of magnitude more complicated.
>But
>> I don't really see why that would make it an unmaintainable mess,
>giant
>> switches are not that bad.
>It's certainly moving towards a gimple version of fold and special 
>casing everything as we convert from fold and to gimple_fold (or 
>whatever we call it) is going to result in a horrid mess.
>
>I find myself thinking that at a high level we need to split out the 
>forward and backward propagation bits into distinct passes.  The 
>backward propagation bits are just a tree combiner and the idioms used 
>to follow the backward chains to create more complex trees and fold
>them 
>need to be reusable components.  It's totally silly (and ultimately 
>unmaintainable) that each transformation is open-coding the walking of 
>the use-def chain and simplification step.

Splitting forward and backward propagation into separate passes creates a pass ordering issue. Rather than that all forward propagations should be formulated as backward or the other way around. A bit awkward in some cases maybe, but you can at least drive forward propagations from their sinks.

As for a type demotion pass - I'd rather have this kind of thing as a lowering step exposing mode promotion sometime late in the gimple optimizing stage.

Btw, other demotions still happen from convert.c ...

Richard.

>Jeff
Marc Glisse - July 19, 2013, 3:55 p.m.
On Fri, 19 Jul 2013, Jeff Law wrote:

> It's also worth noting that fold-const.c also does some type 
> hoisting/sinking.

fold-const.c does everything ;-)

> Ideally that code should just be going away.

Like most of the code from fold-const.c that isn't about folding 
constants, I guess, once it has a replacement at the gimple level.

>>>> If I understand, the main reason is because you want to go through the
>>>> statements in reverse order, since this is the way the casts are being
>>>> propagated (would forwprop also work, just more slowly, or would it miss
>>>> opportunities across basic blocks?).
>>> It would miss some opportunities,
>> 
>> Could you explain in what case? I guess my trouble understanding this is
>> the same as in the next question, and I am missing a fundamental point...
> Anytime you hoist/sink a cast, you're moving it across an operation -- which 
> can expose it as a redundant cast.
>
> Let's say you start with
>
> A = (T) x1;
> B = (T) x2;
> R = A & B;
>
>
> And sink the cast after the operation like this:
>
> C = x1 & x2;
> R = (T) C;
>
> We may find that that cast of C to type T is redundant.  Similar cases can be 
> found when we hoist the cast across the operation.  I saw this kind of 
> situation occur regularly when I was looking at Kai's hoisting/sinking 
> patches.

IIRC the question here was how forwprop could miss opportunities that a 
backward (reverse dominance) walk finds, and I don't see that in this 
example. If the cast is redundant, forwprop is just as likely to detect 
it.

> Now I believe one of Kai's goals is to allow our various pattern based 
> folders to work better by not having to account for casting operations as 
> often in sequences of statements we want to fold.

Ah. I thought it was mostly so operations would be performed in a smaller, 
hopefully cheaper mode. Simplifying combiner patterns would be nice if 
that worked.

>>> That's a real good question; I find myself looking a lot at the bits
>>> in forwprop and I'm getting worried it's on its way to being an
>>> unmaintainable mess.  Sadly, I'm making things worse rather than
>>> better with my recent changes.  I'm still hoping more structure will
>>> become evident as I continue to work through various improvements.
>> 
>> It looks to me like a gimple version of fold, except that since it is
>> gimple, basic operations are an order of magnitude more complicated. But
>> I don't really see why that would make it an unmaintainable mess, giant
>> switches are not that bad.
> It's certainly moving towards a gimple version of fold and special casing 
> everything as we convert from fold and to gimple_fold (or whatever we call 
> it) is going to result in a horrid mess.
>
> I find myself thinking that at a high level we need to split out the forward 
> and backward propagation bits into distinct passes.

They are already mostly split (2 separate loops in 
ssa_forward_propagate_and_combine), so that wouldn't be hard.

> The backward propagation 
> bits are just a tree combiner and the idioms used to follow the backward 
> chains to create more complex trees and fold them need to be reusable 
> components.  It's totally silly (and ultimately unmaintainable) that each 
> transformation is open-coding the walking of the use-def chain and 
> simplification step.

It isn't completely open-coded. get_prop_source_stmt, can_propagate_from, 
defcodefor_name, etc. make walking the use-def chain not so bad.

Usually at this point Richard refers to Andrew's work on a gimple 
combiner...
Richard Guenther - July 21, 2013, 3:27 p.m.
Marc Glisse <marc.glisse@inria.fr> wrote:

>On Fri, 19 Jul 2013, Jeff Law wrote:
>
>> It's also worth noting that fold-const.c also does some type 
>> hoisting/sinking.
>
>fold-const.c does everything ;-)
>
>> Ideally that code should just be going away.
>
>Like most of the code from fold-const.c that isn't about folding 
>constants, I guess, once it has a replacement at the gimple level.
>
>>>>> If I understand, the main reason is because you want to go through the
>>>>> statements in reverse order, since this is the way the casts are being
>>>>> propagated (would forwprop also work, just more slowly, or would it miss
>>>>> opportunities across basic blocks?).
>>>> It would miss some opportunities,
>>> 
>>> Could you explain in what case? I guess my trouble understanding this is
>>> the same as in the next question, and I am missing a fundamental point...
>> Anytime you hoist/sink a cast, you're moving it across an operation -- which 
>> can expose it as a redundant cast.
>>
>> Let's say you start with
>>
>> A = (T) x1;
>> B = (T) x2;
>> R = A & B;
>>
>> And sink the cast after the operation like this:
>>
>> C = x1 & x2;
>> R = (T) C;
>>
>> We may find that that cast of C to type T is redundant.  Similar cases can be 
>> found when we hoist the cast across the operation.  I saw this kind of 
>> situation occur regularly when I was looking at Kai's hoisting/sinking 
>> patches.
>
>IIRC the question here was how forwprop could miss opportunities that a 
>backward (reverse dominance) walk finds, and I don't see that in this 
>example. If the cast is redundant, forwprop is just as likely to detect 
>it.
>
>> Now I believe one of Kai's goals is to allow our various pattern based 
>> folders to work better by not having to account for casting operations as 
>> often in sequences of statements we want to fold.
>
>Ah. I thought it was mostly so operations would be performed in a smaller, 
>hopefully cheaper mode. Simplifying combiner patterns would be nice if 
>that worked.
>
>>>> That's a real good question; I find myself looking a lot at the bits
>>>> in forwprop and I'm getting worried it's on its way to being an
>>>> unmaintainable mess.  Sadly, I'm making things worse rather than
>>>> better with my recent changes.  I'm still hoping more structure will
>>>> become evident as I continue to work through various improvements.
>>> 
>>> It looks to me like a gimple version of fold, except that since it is
>>> gimple, basic operations are an order of magnitude more complicated. But
>>> I don't really see why that would make it an unmaintainable mess, giant
>>> switches are not that bad.
>> It's certainly moving towards a gimple version of fold and special casing 
>> everything as we convert from fold and to gimple_fold (or whatever we call 
>> it) is going to result in a horrid mess.
>>
>> I find myself thinking that at a high level we need to split out the forward 
>> and backward propagation bits into distinct passes.
>
>They are already mostly split (2 separate loops in 
>ssa_forward_propagate_and_combine), so that wouldn't be hard.
>
>> The backward propagation 
>> bits are just a tree combiner and the idioms used to follow the backward 
>> chains to create more complex trees and fold them need to be reusable 
>> components.  It's totally silly (and ultimately unmaintainable) that each 
>> transformation is open-coding the walking of the use-def chain and 
>> simplification step.

Yeah - ideally the pattern matching would be abstract enough that it works on trees, gimple and rtl ... Using a domain-specific language like Haskell would get you pattern combining for free. But I suppose with C++ you can come close enough to at least support tree and gimple ssa.

>It isn't completely open-coded. get_prop_source_stmt,
>can_propagate_from, 
>defcodefor_name, etc make walking the use-def chain not so bad.
>
>Usually at this point Richard refers to Andrew's work on a gimple 
>combiner...

That or the gimple folding interface. Note the idea behind both should be to make the combiner accessible from random passes just like we use fold_build now.

Richard.

Patch

Index: gcc-trunk/gcc/Makefile.in
===================================================================
--- gcc-trunk.orig/gcc/Makefile.in
+++ gcc-trunk/gcc/Makefile.in
@@ -1442,6 +1442,7 @@  OBJS = \
 	tree-ssa-strlen.o \
 	tree-ssa-structalias.o \
 	tree-ssa-tail-merge.o \
+	tree-ssa-te.o \
 	tree-ssa-ter.o \
 	tree-ssa-threadedge.o \
 	tree-ssa-threadupdate.o \
Index: gcc-trunk/gcc/passes.c
===================================================================
--- gcc-trunk.orig/gcc/passes.c
+++ gcc-trunk/gcc/passes.c
@@ -1412,6 +1412,7 @@  init_optimization_passes (void)
       NEXT_PASS (pass_copy_prop);
       NEXT_PASS (pass_merge_phi);
       NEXT_PASS (pass_vrp);
+      NEXT_PASS (pass_type_demote1);
       NEXT_PASS (pass_dce);
       NEXT_PASS (pass_call_cdce);
       NEXT_PASS (pass_cselim);
@@ -1437,6 +1438,7 @@  init_optimization_passes (void)
       NEXT_PASS (pass_phi_only_cprop);
       NEXT_PASS (pass_dse);
       NEXT_PASS (pass_reassoc);
+      NEXT_PASS (pass_type_demote2);
       NEXT_PASS (pass_dce);
       NEXT_PASS (pass_forwprop);
       NEXT_PASS (pass_phiopt);
Index: gcc-trunk/gcc/tree-pass.h
===================================================================
--- gcc-trunk.orig/gcc/tree-pass.h
+++ gcc-trunk/gcc/tree-pass.h
@@ -365,6 +365,9 @@  extern struct gimple_opt_pass pass_tm_ed
 extern struct gimple_opt_pass pass_split_functions;
 extern struct gimple_opt_pass pass_feedback_split_functions;
 extern struct gimple_opt_pass pass_strength_reduction;
+extern struct gimple_opt_pass pass_type_promote;
+extern struct gimple_opt_pass pass_type_demote1;
+extern struct gimple_opt_pass pass_type_demote2;
 
 /* IPA Passes */
 extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
Index: gcc-trunk/gcc/tree-ssa-te.c
===================================================================
--- /dev/null
+++ gcc-trunk/gcc/tree-ssa-te.c
@@ -0,0 +1,566 @@ 
+/* Type-elevation for trees
+   Copyright (C) 2013
+   Free Software Foundation, Inc.
+   Contributed by Kai Tietz <ktietz@redhat.com>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "flags.h"
+#include "tm_p.h"
+#include "basic-block.h"
+#include "function.h"
+#include "gimple-pretty-print.h"
+#include "tree-flow.h"
+#include "tree-pass.h"
+#include "tree-ssa-propagate.h"
+#include "value-prof.h"
+#include "alloc-pool.h"
+#include "vec.h"
+#include "langhooks.h"
+#include "target.h"
+#include "diagnostic-core.h"
+#include "dbgcnt.h"
+#include "gimple-fold.h"
+#include "params.h"
+#include "hash-table.h"
+#include "cfgloop.h"
+
+/* Operation kinds.  */
+#define TDK_NO_EXPAND_VIA_UNSIGNED  1
+#define TDK_NO_EXPAND_WIDENING	    2
+
+/* Build one statement performing OP1 OPCODE OP2, using a new SSA name
+   of type TYPE for the result.  Place the statement after the
+   definition of either OP1 or OP2.  Return the lhs of the new statement.  */
+
+static tree
+build_and_add_sum (tree type, tree op1, tree op2, enum tree_code opcode)
+{
+  gimple op1def = NULL, op2def = NULL;
+  gimple_stmt_iterator gsi;
+  tree op;
+  gimple sum;
+
+  /* Create the operation statement.  */
+  op = make_ssa_name (type, NULL);
+  sum = gimple_build_assign_with_ops (opcode, op, op1, op2);
+
+  /* Find an insertion place and insert.  */
+  if (TREE_CODE (op1) == SSA_NAME)
+    op1def = SSA_NAME_DEF_STMT (op1);
+  if (op2 && TREE_CODE (op2) == SSA_NAME)
+    op2def = SSA_NAME_DEF_STMT (op2);
+
+  if ((!op1def || gimple_nop_p (op1def))
+      && (!op2def || gimple_nop_p (op2def)))
+    {
+      gsi = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR));
+      gsi_insert_before (&gsi, sum, GSI_NEW_STMT);
+    }
+  else if (op2
+           && ((!op1def || gimple_nop_p (op1def))
+	       || (op2def && !gimple_nop_p (op2def)
+	           && stmt_dominates_stmt_p (op1def, op2def))))
+    {
+      if (gimple_code (op2def) == GIMPLE_PHI)
+	{
+	  gsi = gsi_after_labels (gimple_bb (op2def));
+	  gsi_insert_before (&gsi, sum, GSI_NEW_STMT);
+	}
+      else
+	{
+	  if (!stmt_ends_bb_p (op2def))
+	    {
+	      gsi = gsi_for_stmt (op2def);
+	      gsi_insert_after (&gsi, sum, GSI_NEW_STMT);
+	    }
+	  else
+	    {
+	      edge e;
+	      edge_iterator ei;
+
+	      FOR_EACH_EDGE (e, ei, gimple_bb (op2def)->succs)
+		if (e->flags & EDGE_FALLTHRU)
+		  gsi_insert_on_edge_immediate (e, sum);
+	    }
+	}
+    }
+  else
+    {
+      if (gimple_code (op1def) == GIMPLE_PHI)
+	{
+	  gsi = gsi_after_labels (gimple_bb (op1def));
+	  gsi_insert_before (&gsi, sum, GSI_NEW_STMT);
+	}
+      else
+	{
+	  if (!stmt_ends_bb_p (op1def))
+	    {
+	      gsi = gsi_for_stmt (op1def);
+	      gsi_insert_after (&gsi, sum, GSI_NEW_STMT);
+	    }
+	  else
+	    {
+	      edge e;
+	      edge_iterator ei;
+
+	      FOR_EACH_EDGE (e, ei, gimple_bb (op1def)->succs)
+		if (e->flags & EDGE_FALLTHRU)
+		  gsi_insert_on_edge_immediate (e, sum);
+	    }
+	}
+    }
+
+  update_stmt (sum);
+
+  return gimple_assign_lhs (sum);
+}
+
+/* Remove the chain of now-dead assign statements defining VAR.  */
+
+static void
+remove_stmt_chain (tree var)
+{
+  tree var2;
+  gimple stmt;
+  gimple_stmt_iterator gsi;
+
+  while (1)
+    {
+      if (TREE_CODE (var) != SSA_NAME || !has_zero_uses (var))
+        return;
+      stmt = SSA_NAME_DEF_STMT (var);
+      if (!is_gimple_assign (stmt))
+        return;
+      var = gimple_assign_rhs1 (stmt);
+      if (gimple_assign_rhs_class (stmt) == GIMPLE_BINARY_RHS)
+	var2 = gimple_assign_rhs2 (stmt);
+      else
+	var2 = NULL_TREE;
+      gsi = gsi_for_stmt (stmt);
+      gsi_remove (&gsi, true);
+      release_defs (stmt);
+      if (var2)
+	remove_stmt_chain (var2);
+    }
+}
+
+/* Generate a gimple cast of NAME to type NEWTYPE and return the
+   resulting tree (NAME itself if no conversion is needed).  */
+static tree
+gen_cast (tree newtype, tree name)
+{
+  tree ret = NULL_TREE;
+
+  if (INTEGRAL_TYPE_P (newtype) && TREE_CODE (name) == INTEGER_CST)
+    return fold_convert (newtype, name);
+
+  /* We can't propagate through our defining statement, so emit
+     a conversion explicitly.  */
+  if (!useless_type_conversion_p (newtype, TREE_TYPE (name)))
+    ret = build_and_add_sum (newtype, name, NULL_TREE, NOP_EXPR);
+  else
+    ret = name;
+
+  if (dump_file && ret != name)
+    {
+      fprintf (dump_file, "Generated cast ");
+      print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (ret), 0, 0);
+    }
+  return ret;
+}
+
+/* If we have pattern (TYPE1) (TYPE2) X, then we might be able to
+   simplify to (TYPE1) X.
+   The following conditions need to be met:
+   - typeof(X), TYPE1, TYPE2 are of integral kind.
+   - The (TYPE2) X statement has a single use.
+   - TYPE1 and TYPE2 aren't pointer types.  And at least one of:
+   a) bitsize(TYPE1) <= bitsize(X) && bitsize(TYPE2) >= bitsize(TYPE1)
+   b) bitsize(TYPE1) <= bitsize(TYPE2) && bitsize(TYPE1) <= bitsize(X)
+   c) bitsize(TYPE1) > bitsize(X) && bitsize(TYPE2) > bitsize(X) && (SIGN(TYPE1) == SIGN(TYPE2) || X unsigned)
+   d) bitsize(TYPE1) > bitsize(X) && bitsize(TYPE2) == bitsize(X) && SIGN(TYPE2) == SIGN(X)
+ */
+
+static bool
+demote_cast_p (tree t1, tree t2, tree tx)
+{
+  if (!INTEGRAL_TYPE_P (t1)
+      || !INTEGRAL_TYPE_P (t2))
+    return false;
+
+  if (useless_type_conversion_p (t1, t2)
+      || (TREE_CODE (t2) == BOOLEAN_TYPE && TREE_CODE (t1) == BOOLEAN_TYPE))
+    return true;
+
+  if (TREE_CODE (t2) == BOOLEAN_TYPE && TREE_CODE (t1) != BOOLEAN_TYPE)
+    return false;
+  if (!INTEGRAL_TYPE_P (tx))
+    return false;
+
+  /* For (T1) (T2) X, if T1 <= X && T2 >= T1 we can transform to
+     (T1) X.  */
+  if (TYPE_PRECISION (t1) <= TYPE_PRECISION (tx)
+      && TYPE_PRECISION (t2) >= TYPE_PRECISION (t1))
+    return true;
+  /* For (T1) (T2) X, if T1 <= T2 && T1 <= X we can transform to
+     (T1) X.  */
+  else if (TYPE_PRECISION (t1) <= TYPE_PRECISION (t2)
+      && TYPE_PRECISION (t1) <= TYPE_PRECISION (tx))
+    return true;
+  /* For (T1) (T2) X, if T1 > X && T1 <= T2 && (SIGN(T1) == SIGN(T2)
+     || !SIGN(TX)),  we can transform to (T1) X.  */
+  else if (TYPE_PRECISION (t1) > TYPE_PRECISION (tx)
+      && TYPE_PRECISION (t2) > TYPE_PRECISION (tx)
+      && (TYPE_UNSIGNED (t1) == TYPE_UNSIGNED (t2) || TYPE_UNSIGNED (tx)))
+    return true;
+  /* For (T1) (T2) X, if T1 > X && T2 == X && SIGN(X) == SIGN(T2),
+     we can transform to (T1) X.  */
+  else if (TYPE_PRECISION (t1) > TYPE_PRECISION (tx)
+      && TYPE_PRECISION (t2) == TYPE_PRECISION (tx)
+      && TYPE_UNSIGNED (tx) == TYPE_UNSIGNED (t2))
+    return true;
+  else if (TYPE_PRECISION (t1) > TYPE_PRECISION (tx)
+	   && TYPE_PRECISION (t2) > TYPE_PRECISION (tx)
+	   && TYPE_PRECISION (t1) <= TYPE_PRECISION (t2))
+    /* Eg.  (short) (unsigned int) x, for char x;  */
+    return true;
+
+  return false;
+}
+
+
+/* Try to sink the cast sequence (TYPE1) (TYPE2) X to (TYPE1) X, if
+   TYPE1, TYPE2, and the type of X are of integral kind and meet at
+   least one condition in demote_cast_p.
+   Return true if the cast was sunk, otherwise false.  */
+
+static bool
+demote_cast (gimple def, tree name, tree c, gimple def2)
+{
+  tree x;
+
+  if (!gimple_assign_cast_p (def2))
+    return false;
+
+  x = gimple_assign_rhs1 (def2);
+
+  if (TREE_CODE (x) != SSA_NAME
+      || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (x)
+      || !demote_cast_p (TREE_TYPE (name), TREE_TYPE (c), TREE_TYPE (x)))
+    return false;
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "Remove cast ");
+      print_gimple_stmt (dump_file, def, 0, 0);
+      fprintf (dump_file, "  ");
+      print_gimple_stmt (dump_file, def2, 0, 0);
+    }
+
+  gimple_assign_set_rhs1 (def, x);
+  if (useless_type_conversion_p (TREE_TYPE (name), TREE_TYPE (x)))
+    gimple_assign_set_rhs_code (def, SSA_NAME);
+  update_stmt (def);
+  remove_stmt_chain (c);
+  if (dump_file)
+    {
+      fprintf (dump_file, " by cast ");
+      print_gimple_stmt (dump_file, def, 0, 0);
+    }
+
+  return true;
+}
+
+/* Try to demote assignment for the following tree-codes:
+    (typ) (X * Y) -> ((typ) X) * ((typ) Y)
+    (typ) (X + Y) -> ((typ) X) + ((typ) Y)
+    (typ) (X - Y) -> ((typ) X) - ((typ) Y)
+    (typ) (X << Y) -> ((typ) X) << Y
+    (typ) (X >> Y) -> ((typ) X) >> Y
+    (typ) (X ror Y) -> ((typ) X) ror Y
+    (typ) (X rol Y) -> ((typ) X) rol Y
+    (typ) (X & Y) -> ((typ) X) & ((typ) Y)
+    (typ) (X | Y) -> ((typ) X) | ((typ) Y)
+    (typ) (X ^ Y) -> ((typ) X) ^ ((typ) Y).  */
+
+static bool
+demote_into (int flags, gimple stmt, tree lhs, tree r, gimple r_stmt, gimple *pstmt)
+{
+  gimple_stmt_iterator gsi;
+  enum tree_code code;
+  tree l_type = TREE_TYPE (lhs);
+  tree r_type, a1, a2;
+
+  r_type = TREE_TYPE (r);
+
+  /* We can't do type-sinking if only one side is boolean typed.  */
+  if ((TREE_CODE (r_type) == BOOLEAN_TYPE
+       || TREE_CODE (l_type) == BOOLEAN_TYPE)
+      && TREE_CODE (r_type) != TREE_CODE (l_type))
+    return false;
+
+  if ((flags & TDK_NO_EXPAND_WIDENING) != 0
+      && TYPE_PRECISION (l_type) > TYPE_PRECISION (r_type))
+    return false;
+  code = gimple_assign_rhs_code (r_stmt);
+
+  switch (code)
+    {
+    case MULT_EXPR:
+    case PLUS_EXPR: case MINUS_EXPR:
+      a1 = gimple_assign_rhs1 (r_stmt);
+      a2 = gimple_assign_rhs2 (r_stmt);
+
+      if (TYPE_PRECISION (l_type) < TYPE_PRECISION (r_type)
+	  && !TYPE_UNSIGNED (l_type))
+	{
+	  tree uns;
+
+	  uns = unsigned_type_for (l_type);
+
+	  if ((flags & TDK_NO_EXPAND_VIA_UNSIGNED) != 0)
+	    return false;
+
+	  a1 = gen_cast (uns, a1);
+	  a2 = gen_cast (uns, a2);
+	  a1 = build_and_add_sum (uns, a1, a2, code);
+	  code = NOP_EXPR;
+	  a2 = NULL_TREE;
+	  break;
+	}
+      else if (TYPE_PRECISION (l_type) < TYPE_PRECISION (r_type)
+	  && TYPE_UNSIGNED (l_type))
+	{
+	  a1 = gen_cast (l_type, a1);
+	  a2 = gen_cast (l_type, a2);
+	  break;
+	}
+      else if (TYPE_PRECISION (l_type) == TYPE_PRECISION (r_type))
+	{
+	  if (TYPE_UNSIGNED (r_type) == TYPE_UNSIGNED (l_type)
+	      || !TYPE_UNSIGNED (r_type))
+	    {
+	      a1 = gen_cast (l_type, a1);
+	      a2 = gen_cast (l_type, a2);
+	      break;
+	    }
+	}
+      return false;
+
+    case RSHIFT_EXPR:
+      /* (NEWTYPE) (X >> Y) can be transformed to
+	  ((NEWTYPE) X) >> Y, if NEWTYPE == X and
+	  SIGN(NEWTYPE) == SIGN(X).  */
+      if (TYPE_UNSIGNED (l_type) != TYPE_UNSIGNED (r_type))
+	return false;
+    case LROTATE_EXPR:
+    case RROTATE_EXPR:
+    case LSHIFT_EXPR:
+      /* We can't allow <=, as the right-hand operand might be too
+	  large for the smaller type.  */
+      if (TYPE_PRECISION (l_type) != TYPE_PRECISION (r_type))
+	return false;
+      a1 = gimple_assign_rhs1 (r_stmt);
+      if (!useless_type_conversion_p (l_type, TREE_TYPE (a1)))
+	a1 = gen_cast (l_type, a1);
+      a2 = gimple_assign_rhs2 (r_stmt);
+
+      break;
+
+    case MIN_EXPR: case MAX_EXPR:
+      a1 = gimple_assign_rhs1 (r_stmt);
+      a2 = gimple_assign_rhs2 (r_stmt);
+
+      if (TYPE_PRECISION (l_type) < TYPE_PRECISION (TREE_TYPE (a1)))
+	return false;
+      if (TYPE_UNSIGNED (l_type) != TYPE_UNSIGNED (TREE_TYPE (a1)))
+	{
+	  if (!TYPE_UNSIGNED (TREE_TYPE (a1))
+	      || TYPE_PRECISION (l_type) <= TYPE_PRECISION (TREE_TYPE (a1)))
+	    return false;
+	}
+
+      if (!useless_type_conversion_p (l_type, TREE_TYPE (a1)))
+	a1 = gen_cast (l_type, a1);
+      if (!useless_type_conversion_p (l_type, TREE_TYPE (a2)))
+	a2 = gen_cast (l_type, a2);
+
+      break;
+
+    case BIT_AND_EXPR: case BIT_IOR_EXPR: case BIT_XOR_EXPR:
+      a1 = gimple_assign_rhs1 (r_stmt);
+      a2 = gimple_assign_rhs2 (r_stmt);
+
+      if (!useless_type_conversion_p (l_type, TREE_TYPE (a1)))
+	a1 = gen_cast (l_type, a1);
+      if (!useless_type_conversion_p (l_type, TREE_TYPE (a2)))
+	a2 = gen_cast (l_type, a2);
+      break;
+    default:
+      return false;
+    }
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "Move cast ");
+      print_gimple_stmt (dump_file, stmt, 0, 0);
+    }
+
+  if (gimple_location (r_stmt) != UNKNOWN_LOCATION
+      && gimple_location (stmt) == UNKNOWN_LOCATION)
+    gimple_set_location (stmt, gimple_location (r_stmt));
+  gsi = gsi_for_stmt (stmt);
+  gimple_assign_set_rhs_with_ops (&gsi, code, a1, a2);
+  stmt = gsi_stmt (gsi);
+  update_stmt (stmt);
+  remove_stmt_chain (r);
+  *pstmt = stmt;
+
+  if (dump_file)
+    {
+      fprintf (dump_file, " into ");
+      print_gimple_stmt (dump_file, stmt, 0, 0);
+    }
+
+  return true;
+}
+
+/* Loop over each statement for each basic-block.  */
+
+static void
+execute_type_demote_int (int flags, basic_block bb)
+{
+  gimple_stmt_iterator gsi;
+  basic_block son;
+
+  for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi);)
+    {
+      gimple stmt, def2;
+      tree lhs, inner;
+
+      stmt = gsi_stmt (gsi);
+      gsi_prev (&gsi);
+
+      if (!is_gimple_assign (stmt)
+	  || (lhs = gimple_assign_lhs (stmt)) == NULL_TREE
+	  || TREE_CODE (lhs) != SSA_NAME
+	  || !INTEGRAL_TYPE_P (TREE_TYPE (lhs))
+	  || !gimple_assign_cast_p (stmt))
+	continue;
+      inner = gimple_assign_rhs1 (stmt);
+      if (TREE_CODE (inner) != SSA_NAME
+	  || !INTEGRAL_TYPE_P (TREE_TYPE (inner))
+	  || !(def2 = SSA_NAME_DEF_STMT (inner))
+          || !is_gimple_assign (def2)
+	  || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (inner)
+	  || !has_single_use (inner))
+	continue;
+      if (demote_cast (stmt, lhs, inner, def2))
+	gsi = gsi_for_stmt (stmt);
+      else if (demote_into (flags, stmt, lhs, inner, def2, &stmt))
+        {
+	  gsi = gsi_for_stmt (stmt);
+	  gsi_prev (&gsi);
+        }
+    }
+  /* OCCURS_IN_ABNORMAL_PHI */
+  for (son = first_dom_son (CDI_POST_DOMINATORS, bb);
+       son;
+       son = next_dom_son (CDI_POST_DOMINATORS, son))
+    execute_type_demote_int (flags, son);
+
+}
+
+/* Entry without unsigned-type demotion.   */
+static unsigned int
+execute_type_demote1 (void)
+{
+  /* We calculate dominance info as CDI_DOMINATORS to make
+     sure we can insert newly generated statements in the
+     proper basic block.  */
+  calculate_dominance_info (CDI_DOMINATORS);
+  calculate_dominance_info (CDI_POST_DOMINATORS);
+
+  execute_type_demote_int (TDK_NO_EXPAND_VIA_UNSIGNED, EXIT_BLOCK_PTR);
+
+  free_dominance_info (CDI_POST_DOMINATORS);
+  return 0;
+}
+
+/* Entry with unsigned-type demotion.   */
+static unsigned int
+execute_type_demote2 (void)
+{
+  /* We calculate dominance info as CDI_DOMINATORS to make
+     sure we can insert newly generated statements in the
+     proper basic block.  */
+  calculate_dominance_info (CDI_DOMINATORS);
+  calculate_dominance_info (CDI_POST_DOMINATORS);
+
+  execute_type_demote_int (0, EXIT_BLOCK_PTR);
+
+  free_dominance_info (CDI_POST_DOMINATORS);
+  return 0;
+}
+
+struct gimple_opt_pass pass_type_demote1 =
+{
+  {
+    GIMPLE_PASS,
+    "typedemote1",                      /* name */
+    OPTGROUP_NONE,
+    NULL,                              /* gate */
+    execute_type_demote1,               /* execute */
+    NULL,                              /* sub */
+    NULL,                              /* next */
+    0,                                 /* static_pass_number */
+    TV_NONE,                           /* tv_id */
+    PROP_cfg | PROP_ssa,               /* properties_required */
+    0,                                 /* properties_provided */
+    0,                                 /* properties_destroyed */
+    0,                                 /* todo_flags_start */
+    TODO_verify_ssa
+    | TODO_update_ssa                  /* todo_flags_finish */
+  }
+};
+
+struct gimple_opt_pass pass_type_demote2 =
+{
+  {
+    GIMPLE_PASS,
+    "typedemote2",                      /* name */
+    OPTGROUP_NONE,
+    NULL,                              /* gate */
+    execute_type_demote2,               /* execute */
+    NULL,                              /* sub */
+    NULL,                              /* next */
+    0,                                 /* static_pass_number */
+    TV_NONE,                           /* tv_id */
+    PROP_cfg | PROP_ssa,               /* properties_required */
+    0,                                 /* properties_provided */
+    0,                                 /* properties_destroyed */
+    0,                                 /* todo_flags_start */
+    TODO_verify_ssa
+    | TODO_update_ssa                  /* todo_flags_finish */
+  }
+};