Message ID: 155227895.847667.1373305519786.JavaMail.root@redhat.com
State: New
On Mon, 8 Jul 2013, Kai Tietz wrote:

> These passes implement type demotion (type sinking into statements)
> for some gimple statements. I limited the initial implementation to
> the set of multiply, addition, subtraction, and binary and/or/xor.
> Additionally, this pass adds some rules to sink type-cast sequences,
> e.g. (int) (short) x with char as the type of x. This special
> handling of such type-sequence simplification is necessary in this
> pass to avoid overly complex cast sequences resulting from the
> unsigned-type conversion the pass uses to avoid undefined overflow
> behaviour.

Hello,

thanks for working on this. This seems like a nice chance for me to
learn, so I am trying to understand your patch, and I would be glad if
you could give me a couple of hints.

I wonder why you implemented this as a separate pass instead of adding
it to tree-ssa-forwprop. demote_cast is (or should be) a special case
of combine_conversions, so it would be nice to avoid the code
duplication (there is also a version in fold-const.c). demote_into
could be called from roughly the same place as
simplify_conversion_from_bitmask. And you could reuse
get_prop_source_stmt, can_propagate_from, remove_prop_source_from_use,
etc.

If I understand, the main reason is that you want to go through the
statements in reverse order, since this is the way the casts are being
propagated (would forwprop also work, just more slowly, or would it
miss opportunities across basic blocks?).

I have some trouble understanding why something as complicated as
build_and_add_sum (which isn't restricted to additions, by the way) is
needed. Could you add a comment to the code explaining why you need to
insert the statements precisely there and not, for instance, just
before their use? Is that to try and help CSE?

> I have added an additional early pass "typedemote1" to this patch for
> simple cases where types can easily be sunk into a statement without
> a special unsigned cast for the overflow case. Jakub asked for it.
> Tests have shown that this pass does optimizations in pretty few
> cases. As an example, see the pr46867.c testcase in the testsuite.
> The second pass (I put it behind the first vrp pass to avoid
> testcase conflicts) uses 'unsigned'-type casts to avoid undefined
> overflow behavior. This pass has many more hits in standard code.

I assume that, when the pass is done, we can consider removing the
corresponding code from the front-ends? That would increase the hits
;-)
----- Original Message -----
> On Mon, 8 Jul 2013, Kai Tietz wrote:
>
> > These passes implement type demotion (type sinking into statements)
> > for some gimple statements. [...]
>
> Hello,
>
> thanks for working on this. This seems like a nice chance for me to
> learn, so I am trying to understand your patch, and I would be glad
> if you could give me a couple of hints.

I try.

> I wonder why you implemented this as a separate pass instead of
> adding it to tree-ssa-forwprop. demote_cast is (or should be) a
> special case of combine_conversions, so it would be nice to avoid
> the code duplication (there is also a version in fold-const.c).
> demote_into could be called from roughly the same place as
> simplify_conversion_from_bitmask. And you could reuse
> get_prop_source_stmt, can_propagate_from,
> remove_prop_source_from_use, etc.

The initial patch (from last year) actually implemented that in
forwprop. I was then kindly asked to put it into a separate pass.
There are some good reasons why forward propagation isn't the right
place for it. E.g., forwprop does type raising by default, so
implementing type demotion there too would make the two
transformations fight each other. It would additionally require
forwprop to do a straight-line, depth-first conversion over statement
lists. All this doesn't fit well into the current concept of forward
propagation. The cast demotion is of course something of interest for
folding and might fit into the forward-propagation pass too.

The main reason it is implemented within the demotion pass is that
this pass introduces such cast-demotion folding opportunities through
its "unsigned"-type expansion. So we want to fold them within the
pass rather than wait until a later pass optimizes such redundant
sequences away. Btw, the cast demotion can still be improved for some
cases - see the gcc.target/i386/rotate-1.c testcase for such an
improvement - so that some expressions of the form
(type1) (((type2) x) op ((type2) y)) can be transformed into x op y.

> If I understand, the main reason is that you want to go through the
> statements in reverse order, since this is the way the casts are
> being propagated (would forwprop also work, just more slowly, or
> would it miss opportunities across basic blocks?).

It would miss some opportunities, and it would cause a speed penalty
due to the concept of forwprop.

> I have some trouble understanding why something as complicated as
> build_and_add_sum (which isn't restricted to additions, by the way)
> is needed. Could you add a comment to the code explaining why you
> need to insert the statements precisely there and not, for instance,
> just before their use? Is that to try and help CSE?

Yes, this function is a bit more complex than is actually required for
now. Nevertheless it is required, and I expect the demotion pass will
improve, so even the currently "unlikely" cases might be needed more
frequently. I had a lighter version of statement insertion at first,
but it failed in some cases: in some (rare) cases we would otherwise
choose the wrong basic block to add the new "cast" statements to.
There is another place (reassoc) where you can find nearly the same
function, and its needs are exactly the same as in the demote pass.

> > I have added an additional early pass "typedemote1" to this patch
> > for simple cases where types can easily be sunk into a statement
> > without a special unsigned cast for the overflow case. Jakub asked
> > for it. [...]
>
> I assume that, when the pass is done, we can consider removing the
> corresponding code from the front-ends? That would increase the hits
> ;-)

Hmm, well possible, but unlikely. The FE and ME are two pretty
different things. Sadly the stuff isn't as clearly separated as it
should be. But I know there is work going on to achieve that in this
area.

> --
> Marc Glisse

Kai
On 07/08/2013 02:52 PM, Marc Glisse wrote:
> I wonder why you implemented this as a separate pass instead of
> adding it to tree-ssa-forwprop. demote_cast is (or should be) a
> special case of combine_conversions, so it would be nice to avoid
> the code duplication (there is also a version in fold-const.c).
> demote_into could be called from roughly the same place as
> simplify_conversion_from_bitmask. And you could reuse
> get_prop_source_stmt, can_propagate_from,
> remove_prop_source_from_use, etc.

That's a real good question; I find myself looking a lot at the bits
in forwprop and I'm getting worried it's on its way to being an
unmaintainable mess. Sadly, I'm making things worse rather than better
with my recent changes. I'm still hoping more structure will become
evident as I continue to work through various improvements.

I also find myself pondering these bits in a code-motion model; what
hasn't become clear yet is the driving motivation to show why thinking
about this as a code-motion problem is interesting. Conceptually we
can hoist casts to their earliest possible point and sink them to
their latest possible point. What are the benefits of those
transformations, and is there anything inherently good about actually
moving the typecasts, as opposed to just recognizing the casts in the
IL and optimizing appropriately? I.e., often I see the
hoisting/sinking code bring a series of casts together into
straight-line code which then gets optimized. *BUT* is there anything
inherently better/easier about having them in straight-line code? We
can walk the use->def chains and recover the same information. If
that's not happening, then that points to a failing in our optimizers.

Floating out there is the hope that there's a set of canonicalization
rules to guide us where to place the typecasts. I.e., is it generally
better to have

  (T) a OP (T) b

or is it better to have

  (T) (a OP b)

[assuming they're semantically the same]? Is it dependent on T and how
T relates to the underlying target? Are the guidelines likely the same
for logicals vs. arithmetic, etc.?

> If I understand, the main reason is that you want to go through the
> statements in reverse order, since this is the way the casts are
> being propagated (would forwprop also work, just more slowly, or
> would it miss opportunities across basic blocks?).

SSA_NAME_DEF_STMT can cross block boundaries.

> I have some trouble understanding why something as complicated as
> build_and_add_sum (which isn't restricted to additions, by the way)
> is needed. Could you add a comment to the code explaining why you
> need to insert the statements precisely there and not, for instance,
> just before their use? Is that to try and help CSE?

Hmm, I thought I had suggested that routine get renamed during an
internal, informal review of Kai's work.

>> I have added an additional early pass "typedemote1" to this patch
>> for simple cases where types can easily be sunk into a statement
>> without a special unsigned cast for the overflow case. Jakub asked
>> for it. [...]
>
> I assume that, when the pass is done, we can consider removing the
> corresponding code from the front-ends? That would increase the hits
> ;-)

Kai and I have briefly touched on this, mostly in the context of
removing bits from fold-const.c rather than the front-ends proper.

jeff
On Tue, 9 Jul 2013, Kai Tietz wrote:

>> I would be glad if you could give me a couple of hints.
> I try.

Thanks :-)

>> I wonder why you implemented this as a separate pass instead of
>> adding it to tree-ssa-forwprop. demote_cast is (or should be) a
>> special case of combine_conversions, so it would be nice to avoid
>> the code duplication (there is also a version in fold-const.c).
>> demote_into could be called from roughly the same place as
>> simplify_conversion_from_bitmask. And you could reuse
>> get_prop_source_stmt, can_propagate_from,
>> remove_prop_source_from_use, etc.
>
> The initial patch (from last year) actually implemented that in
> forwprop.

Ah, reading the conversation from last year helped me understand a bit
better.

> I was then kindly asked to put this into a separate pass. There are
> some good reasons why forward propagation isn't the right place for
> it. E.g., forwprop does type raising by default.

Even in cases where that increases the size of the operation? I see a
hoist_conversion_for_bitop_p that seems to try and prevent it.

> [...] The main reason it is implemented within the demotion pass is
> that this pass introduces such cast-demotion folding opportunities
> through its "unsigned"-type expansion. So we want to fold them
> within the pass rather than wait until a later pass optimizes such
> redundant sequences away.

I hope we can at least find a way to share code between the passes.

>> If I understand, the main reason is that you want to go through the
>> statements in reverse order, since this is the way the casts are
>> being propagated (would forwprop also work, just more slowly, or
>> would it miss opportunities across basic blocks?).
> It would miss some opportunities,

Could you explain in what case? I guess my trouble understanding this
is the same as in the next question, and I am missing a fundamental
point...

> and it would cause a speed penalty due to the concept of forwprop.
>
>> I have some trouble understanding why something as complicated as
>> build_and_add_sum (which isn't restricted to additions, by the way)
>> is needed. Could you add a comment to the code explaining why you
>> need to insert the statements precisely there and not, for
>> instance, just before their use? Is that to try and help CSE?
> Yes, this function is a bit more complex than is actually required
> for now. [...] In some (rare) cases we would otherwise choose the
> wrong basic block to add the new "cast" statements to. There is
> another place (reassoc) where you can find nearly the same function,
> and its needs are exactly the same as in the demote pass.

I had a look at the reassoc pass and failed to understand the logic
there as well :-( I don't really understand how the basic block can be
wrong. If we only handle the single-use case with no side effects and
no weird jump in between, nothing but the user should notice if you
move the definition next to the use. So is this a way to handle
weirder jumps than you could otherwise?

>>> I have added an additional early pass "typedemote1" to this patch
>>> for simple cases where types can easily be sunk into a statement
>>> without a special unsigned cast for the overflow case. Jakub asked
>>> for it. Tests have shown that this pass does optimizations in
>>> pretty few cases. As an example, see the pr46867.c testcase in the
>>> testsuite.
>>> The second pass (I put it behind the first vrp pass to avoid
>>> testcase conflicts) uses 'unsigned'-type casts to avoid undefined
>>> overflow behavior. This pass has many more hits in standard code.

My limited understanding from the old conversation and Jeff's post in
this thread is that there are two ways to think about this
optimization:

1) The canonical representation is with the smallest type possible.
   Using forwprop could make sense there (even if it may need help
   from another pass for some complicated cases, and it isn't the most
   efficient way). There is possibly a late pass to change all the
   types too small for the hardware into int.

2) We compute for each operation what minimal size it needs. Then we
   may do some optimizations (truncate constants), possibly convert
   the operations to some smaller type (many at once), but not
   necessarily. This requires a separate pass, but doesn't look like
   what you are doing.

On Tue, 9 Jul 2013, Jeff Law wrote:

> That's a real good question; I find myself looking a lot at the bits
> in forwprop and I'm getting worried it's on its way to being an
> unmaintainable mess. Sadly, I'm making things worse rather than
> better with my recent changes. I'm still hoping more structure will
> become evident as I continue to work through various improvements.

It looks to me like a gimple version of fold, except that since it is
gimple, basic operations are an order of magnitude more complicated.
But I don't really see why that would make it an unmaintainable mess;
giant switches are not that bad.

>> If I understand, the main reason is that you want to go through the
>> statements in reverse order [...]
> SSA_NAME_DEF_STMT can cross block boundaries.

I know, which is why it isn't obvious to me what opportunities are
missed.

>> I assume that, when the pass is done, we can consider removing the
>> corresponding code from the front-ends? That would increase the
>> hits ;-)
> Kai and I have briefly touched on this, mostly in the context of
> removing bits from fold-const.c rather than the front-ends proper.

The code in the front-ends is a pain. Lately, it made implementing
some parts of the undefined behavior sanitizer much more complicated
than it should have been. I think fold-const hurts less than
optimization code directly in the front-ends.

Thanks Kai and Jeff, your posts helped a lot,
On 07/12/2013 01:13 PM, Marc Glisse wrote:
>> The initial patch (from last year) actually implemented that in
>> forwprop.
>
> Ah, reading the conversation from last year helped me understand a
> bit better.

It's also worth noting that fold-const.c also does some type
hoisting/sinking. Ideally that code should just be going away.

>> [...] The main reason it is implemented within the demotion pass is
>> that this pass introduces such cast-demotion folding opportunities
>> through its "unsigned"-type expansion. So we want to fold them
>> within the pass rather than wait until a later pass optimizes such
>> redundant sequences away.
>
> I hope we can at least find a way to share code between the passes.

Well, I'd like to pull all of the type hoisting/sinking code out into
its own pass. That may be a bit idealistic, but that's my hope/goal.

>>> If I understand, the main reason is that you want to go through
>>> the statements in reverse order, since this is the way the casts
>>> are being propagated (would forwprop also work, just more slowly,
>>> or would it miss opportunities across basic blocks?).
>> It would miss some opportunities,
>
> Could you explain in what case? I guess my trouble understanding
> this is the same as in the next question, and I am missing a
> fundamental point...

Anytime you hoist/sink a cast, you're moving it across an operation --
which can expose it as a redundant cast.

Let's say you start with

  A = (T) x1;
  B = (T) x2;
  R = A & B;

And sink the cast after the operation like this:

  C = x1 & x2;
  R = (T) C;

We may find that the cast of C to type T is redundant. Similar cases
can be found when we hoist the cast across the operation. I saw this
kind of situation occur regularly when I was looking at Kai's
hoisting/sinking patches.

Now I believe one of Kai's goals is to allow our various pattern-based
folders to work better by not having to account for casting operations
as often in the sequences of statements we want to fold. I suspect
that to see benefit from that we'd have to hoist, fold, sink, fold.
That argues that hoisting/sinking should be independent of the folding
step (which shouldn't be a surprise to any of us).

>> That's a real good question; I find myself looking a lot at the
>> bits in forwprop and I'm getting worried it's on its way to being
>> an unmaintainable mess. [...]
>
> It looks to me like a gimple version of fold, except that since it
> is gimple, basic operations are an order of magnitude more
> complicated. But I don't really see why that would make it an
> unmaintainable mess; giant switches are not that bad.

It's certainly moving towards a gimple version of fold, and
special-casing everything as we convert from fold to gimple_fold (or
whatever we call it) is going to result in a horrid mess.

I find myself thinking that at a high level we need to split out the
forward and backward propagation bits into distinct passes. The
backward propagation bits are just a tree combiner, and the idioms
used to follow the backward chains to create more complex trees and
fold them need to be reusable components. It's totally silly (and
ultimately unmaintainable) that each transformation is open-coding the
walking of the use-def chain and the simplification step.

Jeff
Jeff Law <law@redhat.com> wrote:
> [...]
> I find myself thinking that at a high level we need to split out the
> forward and backward propagation bits into distinct passes. The
> backward propagation bits are just a tree combiner, and the idioms
> used to follow the backward chains to create more complex trees and
> fold them need to be reusable components. It's totally silly (and
> ultimately unmaintainable) that each transformation is open-coding
> the walking of the use-def chain and the simplification step.

Splitting forward and backward propagation into separate passes
creates a pass-ordering issue. Rather than that, all forward
propagations should be formulated as backward, or the other way
around. A bit awkward in some cases maybe, but you can at least drive
forward propagations from their sinks.

As for a type-demotion pass - I'd rather have this kind of thing as a
lowering step exposing mode promotion somewhere late in the gimple
optimizing stage. Btw, other demotions still happen from convert.c ...

Richard.

> Jeff
On Fri, 19 Jul 2013, Jeff Law wrote:

> It's also worth noting that fold-const.c also does some type
> hoisting/sinking.

fold-const.c does everything ;-)

> Ideally that code should just be going away.

Like most of the code from fold-const.c that isn't about folding
constants, I guess, once it has a replacement at the gimple level.

>>>> If I understand, the main reason is that you want to go through
>>>> the statements in reverse order, since this is the way the casts
>>>> are being propagated (would forwprop also work, just more slowly,
>>>> or would it miss opportunities across basic blocks?).
>>> It would miss some opportunities,
>>
>> Could you explain in what case? I guess my trouble understanding
>> this is the same as in the next question, and I am missing a
>> fundamental point...
> Anytime you hoist/sink a cast, you're moving it across an operation
> -- which can expose it as a redundant cast.
>
> Let's say you start with
>
>   A = (T) x1;
>   B = (T) x2;
>   R = A & B;
>
> And sink the cast after the operation like this:
>
>   C = x1 & x2;
>   R = (T) C;
>
> We may find that the cast of C to type T is redundant. Similar cases
> can be found when we hoist the cast across the operation. I saw this
> kind of situation occur regularly when I was looking at Kai's
> hoisting/sinking patches.

IIRC the question here was how forwprop could miss opportunities that
a backward (reverse-dominance) walk finds, and I don't see that in
this example. If the cast is redundant, forwprop is just as likely to
detect it.

> Now I believe one of Kai's goals is to allow our various
> pattern-based folders to work better by not having to account for
> casting operations as often in the sequences of statements we want
> to fold.

Ah. I thought it was mostly so operations would be performed in a
smaller, hopefully cheaper mode. Simplifying combiner patterns would
be nice if that worked.

>> That's a real good question; I find myself looking a lot at the
>> bits in forwprop and I'm getting worried it's on its way to being
>> an unmaintainable mess. [...]
>
>> It looks to me like a gimple version of fold, except that since it
>> is gimple, basic operations are an order of magnitude more
>> complicated. But I don't really see why that would make it an
>> unmaintainable mess; giant switches are not that bad.
> It's certainly moving towards a gimple version of fold, and
> special-casing everything as we convert from fold to gimple_fold (or
> whatever we call it) is going to result in a horrid mess.
>
> I find myself thinking that at a high level we need to split out the
> forward and backward propagation bits into distinct passes.

They are already mostly split (two separate loops in
ssa_forward_propagate_and_combine), so that wouldn't be hard.

> The backward propagation bits are just a tree combiner, and the
> idioms used to follow the backward chains to create more complex
> trees and fold them need to be reusable components. It's totally
> silly (and ultimately unmaintainable) that each transformation is
> open-coding the walking of the use-def chain and the simplification
> step.

It isn't completely open-coded. get_prop_source_stmt,
can_propagate_from, defcodefor_name, etc. make walking the use-def
chain not so bad.

Usually at this point Richard refers to Andrew's work on a gimple
combiner...
Marc Glisse <marc.glisse@inria.fr> wrote: >On Fri, 19 Jul 2013, Jeff Law wrote: > >> It's also worth noting that fold-const.c also does some type >> hoisting/sinking. > >fold-const.c does everything ;-) > >> Ideally that code should just be going away. > >Like most of the code from fold-const.c that isn't about folding >constants, I guess, once it has a replacement at the gimple level. > >>>>> If I understand, the main reason is because you want to go through >the >>>>> statements in reverse order, since this is the way the casts are >being >>>>> propagated (would forwprop also work, just more slowly, or would >it miss >>>>> opportunities across basic blocks?). >>>> It would miss some opportunities, >>> >>> Could you explain in what case? I guess my trouble understanding >this is >>> the same as in the next question, and I am missing a fundamental >point... >> Anytime you hoist/sink a cast, you're moving it across an operation >-- which >> can expose it as a redundant cast. >> >> Let's say you start with >> >> A = (T) x1; >> B = (T) x2; >> R = A & B; >> >> >> And sink the cast after the operation like this: >> >> C = x1 & x2; >> R = (T) C; >> >> We may find that that cast of C to type T is redundant. Similar >cases can be >> found when we hoist the cast across the operation. I saw this kind >of >> situation occur regularly when I was looking at Kai's >hoisting/sinking >> patches. > >IIRC the question here was how forwprop could miss opportunities that a > >backward (reverse dominance) walk finds, and I don't see that in this >example. If the cast is redundant, forwprop is just as likely to detect > >it. > >> Now I believe one of Kai's goals is to allow our various pattern >based >> folders to work better by not having to account for casting >operations as >> often in sequences of statements we want to fold. > >Ah. I thought it was mostly so operations would be performed in a >smaller, >hopefully cheaper mode. 
Simplifying combiner patterns would be nice if >that worked. > >>>> That's a real good question; I find myself looking a lot at the >bits >>>> in forwprop and I'm getting worried it's on its way to being an >>>> unmaintainable mess. Sadly, I'm making things worse rather than >>>> better with my recent changes. I'm still hoping more structure >will >>>> become evident as I continue to work through various improvements. >>> >>> It looks to me like a gimple version of fold, except that since it >is >>> gimple, basic operations are an order of magnitude more complicated. >But >>> I don't really see why that would make it an unmaintainable mess, >giant >>> switches are not that bad. >> It's certainly moving towards a gimple version of fold and special >casing >> everything as we convert from fold and to gimple_fold (or whatever we >call >> it) is going to result in a horrid mess. >> >> I find myself thinking that at a high level we need to split out the >forward >> and backward propagation bits into distinct passes. > >They are already mostly split (2 separate loops in >ssa_forward_propagate_and_combine), so that wouldn't be hard. > >> The backward propagation >> bits are just a tree combiner and the idioms used to follow the >backward >> chains to create more complex trees and fold them need to be reusable > >> components. It's totally silly (and ultimately unmaintainable) that >each >> transformation is open-coding the walking of the use-def chain and >> simplification step. Yeah - ideally the pattern matching would be abstract enough that it works on trees, gimple and rtl ... Using a domain specific language like Haskell would get you pattern combining for free. But I suppose with c++ you can come close enough to at least support tree and gimple ssa. >It isn't completely open-coded. get_prop_source_stmt, >can_propagate_from, >defcodefor_name, etc make walking the use-def chain not so bad. 
>
> Usually at this point Richard refers to Andrew's work on a gimple
> combiner...

That or the gimple folding interface.  Note the idea behind both should
be to make the combiner accessible from random passes just like we use
fold_build now.

Richard.
Index: gcc-trunk/gcc/Makefile.in
===================================================================
--- gcc-trunk.orig/gcc/Makefile.in
+++ gcc-trunk/gcc/Makefile.in
@@ -1442,6 +1442,7 @@ OBJS = \
 	tree-ssa-strlen.o \
 	tree-ssa-structalias.o \
 	tree-ssa-tail-merge.o \
+	tree-ssa-te.o \
 	tree-ssa-ter.o \
 	tree-ssa-threadedge.o \
 	tree-ssa-threadupdate.o \
Index: gcc-trunk/gcc/passes.c
===================================================================
--- gcc-trunk.orig/gcc/passes.c
+++ gcc-trunk/gcc/passes.c
@@ -1412,6 +1412,7 @@ init_optimization_passes (void)
       NEXT_PASS (pass_copy_prop);
       NEXT_PASS (pass_merge_phi);
       NEXT_PASS (pass_vrp);
+      NEXT_PASS (pass_type_demote1);
       NEXT_PASS (pass_dce);
       NEXT_PASS (pass_call_cdce);
       NEXT_PASS (pass_cselim);
@@ -1437,6 +1438,7 @@ init_optimization_passes (void)
       NEXT_PASS (pass_phi_only_cprop);
       NEXT_PASS (pass_dse);
       NEXT_PASS (pass_reassoc);
+      NEXT_PASS (pass_type_demote2);
       NEXT_PASS (pass_dce);
       NEXT_PASS (pass_forwprop);
       NEXT_PASS (pass_phiopt);
Index: gcc-trunk/gcc/tree-pass.h
===================================================================
--- gcc-trunk.orig/gcc/tree-pass.h
+++ gcc-trunk/gcc/tree-pass.h
@@ -365,6 +365,9 @@ extern struct gimple_opt_pass pass_tm_ed
 extern struct gimple_opt_pass pass_split_functions;
 extern struct gimple_opt_pass pass_feedback_split_functions;
 extern struct gimple_opt_pass pass_strength_reduction;
+extern struct gimple_opt_pass pass_type_promote;
+extern struct gimple_opt_pass pass_type_demote1;
+extern struct gimple_opt_pass pass_type_demote2;
 
 /* IPA Passes */
 extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
Index: gcc-trunk/gcc/tree-ssa-te.c
===================================================================
--- /dev/null
+++ gcc-trunk/gcc/tree-ssa-te.c
@@ -0,0 +1,566 @@
+/* Type-elevation for trees
+   Copyright (C) 2013
+   Free Software Foundation, Inc.
+   Contributed by Kai Tietz <ktietz@redhat.com>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "flags.h"
+#include "tm_p.h"
+#include "basic-block.h"
+#include "function.h"
+#include "gimple-pretty-print.h"
+#include "tree-flow.h"
+#include "tree-pass.h"
+#include "tree-ssa-propagate.h"
+#include "value-prof.h"
+#include "alloc-pool.h"
+#include "vec.h"
+#include "langhooks.h"
+#include "target.h"
+#include "diagnostic-core.h"
+#include "dbgcnt.h"
+#include "gimple-fold.h"
+#include "params.h"
+#include "hash-table.h"
+#include "cfgloop.h"
+
+/* Operation kinds.  */
+#define TDK_NO_EXPAND_VIA_UNSIGNED 1
+#define TDK_NO_EXPAND_WIDENING 2
+
+/* Builds one statement performing OP1 OPCODE OP2 using a new SSA name
+   for the result.  Places the statement after the definition of either
+   OP1 or OP2.  Returns the LHS of the new statement.  */
+
+static tree
+build_and_add_sum (tree type, tree op1, tree op2, enum tree_code opcode)
+{
+  gimple op1def = NULL, op2def = NULL;
+  gimple_stmt_iterator gsi;
+  tree op;
+  gimple sum;
+
+  /* Create the addition statement.  */
+  op = make_ssa_name (type, NULL);
+  sum = gimple_build_assign_with_ops (opcode, op, op1, op2);
+
+  /* Find an insertion place and insert.
+     */
+  if (TREE_CODE (op1) == SSA_NAME)
+    op1def = SSA_NAME_DEF_STMT (op1);
+  if (op2 && TREE_CODE (op2) == SSA_NAME)
+    op2def = SSA_NAME_DEF_STMT (op2);
+
+  if ((!op1def || gimple_nop_p (op1def))
+      && (!op2def || gimple_nop_p (op2def)))
+    {
+      gsi = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR));
+      gsi_insert_before (&gsi, sum, GSI_NEW_STMT);
+    }
+  else if (op2
+           && ((!op1def || gimple_nop_p (op1def))
+               || (op2def && !gimple_nop_p (op2def)
+                   && stmt_dominates_stmt_p (op1def, op2def))))
+    {
+      if (gimple_code (op2def) == GIMPLE_PHI)
+        {
+          gsi = gsi_after_labels (gimple_bb (op2def));
+          gsi_insert_before (&gsi, sum, GSI_NEW_STMT);
+        }
+      else
+        {
+          if (!stmt_ends_bb_p (op2def))
+            {
+              gsi = gsi_for_stmt (op2def);
+              gsi_insert_after (&gsi, sum, GSI_NEW_STMT);
+            }
+          else
+            {
+              edge e;
+              edge_iterator ei;
+
+              FOR_EACH_EDGE (e, ei, gimple_bb (op2def)->succs)
+                if (e->flags & EDGE_FALLTHRU)
+                  gsi_insert_on_edge_immediate (e, sum);
+            }
+        }
+    }
+  else
+    {
+      if (gimple_code (op1def) == GIMPLE_PHI)
+        {
+          gsi = gsi_after_labels (gimple_bb (op1def));
+          gsi_insert_before (&gsi, sum, GSI_NEW_STMT);
+        }
+      else
+        {
+          if (!stmt_ends_bb_p (op1def))
+            {
+              gsi = gsi_for_stmt (op1def);
+              gsi_insert_after (&gsi, sum, GSI_NEW_STMT);
+            }
+          else
+            {
+              edge e;
+              edge_iterator ei;
+
+              FOR_EACH_EDGE (e, ei, gimple_bb (op1def)->succs)
+                if (e->flags & EDGE_FALLTHRU)
+                  gsi_insert_on_edge_immediate (e, sum);
+            }
+        }
+    }
+
+  update_stmt (sum);
+
+  return gimple_assign_lhs (sum);
+}
+
+/* Release assign-statement chain.
+   */
+
+static void
+remove_stmt_chain (tree var)
+{
+  tree var2;
+  gimple stmt;
+  gimple_stmt_iterator gsi;
+
+  while (1)
+    {
+      if (TREE_CODE (var) != SSA_NAME || !has_zero_uses (var))
+        return;
+      stmt = SSA_NAME_DEF_STMT (var);
+      if (!is_gimple_assign (stmt))
+        return;
+      var = gimple_assign_rhs1 (stmt);
+      if (gimple_assign_rhs_class (stmt) == GIMPLE_BINARY_RHS)
+        var2 = gimple_assign_rhs2 (stmt);
+      else
+        var2 = NULL_TREE;
+      gsi = gsi_for_stmt (stmt);
+      gsi_remove (&gsi, true);
+      release_defs (stmt);
+      if (var2)
+        remove_stmt_chain (var2);
+    }
+}
+
+/* Generate a gimple cast expression (NEWTYPE) NAME, and
+   return the newly created SSA name.  */
+
+static tree
+gen_cast (tree newtype, tree name)
+{
+  tree ret = NULL_TREE;
+
+  if (INTEGRAL_TYPE_P (newtype) && TREE_CODE (name) == INTEGER_CST)
+    return fold_convert (newtype, name);
+
+  /* We can't propagate through our defining statement, so emit
+     a conversion explicitly.  */
+  if (!useless_type_conversion_p (newtype, TREE_TYPE (name)))
+    ret = build_and_add_sum (newtype, name, NULL_TREE, NOP_EXPR);
+  else
+    ret = name;
+
+  if (dump_file && ret != name)
+    {
+      fprintf (dump_file, "Generated cast ");
+      print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (ret), 0, 0);
+    }
+  return ret;
+}
+
+/* If we have the pattern (TYPE1) (TYPE2) X, then we might be able to
+   simplify it to (TYPE1) X.
+   The following conditions need to be met:
+   - typeof(X), TYPE1, TYPE2 are of integral kind.
+   - The (TYPE2) X statement is a single-use.
+   - TYPE1 and TYPE2 aren't pointer-types
+   a) bitsize(TYPE2) >= bitsize(X) && bitsize(TYPE1) <= bitsize(X);
+   b) bitsize(TYPE1) <= bitsize(TYPE2) && bitsize(TYPE1) <= bitsize(X)
+   c) bitsize(TYPE1) > bitsize(X) && bitsize(TYPE2) > bitsize(X)
+      && SIGN(T1) == SIGN(T2)
+   d) T1 > X && T2 == X && SIGN(T1)
+      == SIGN(T2) == SIGN(X)
+   */
+
+static bool
+demote_cast_p (tree t1, tree t2, tree tx)
+{
+  if (!INTEGRAL_TYPE_P (t1)
+      || !INTEGRAL_TYPE_P (t2))
+    return false;
+
+  if (useless_type_conversion_p (t1, t2)
+      || (TREE_CODE (t2) == BOOLEAN_TYPE && TREE_CODE (t1) == BOOLEAN_TYPE))
+    return true;
+
+  if (TREE_CODE (t2) == BOOLEAN_TYPE && TREE_CODE (t1) != BOOLEAN_TYPE)
+    return false;
+  if (!INTEGRAL_TYPE_P (tx))
+    return false;
+
+  /* For (T1) (T2) X, if T1 <= X && T2 >= X we can transform to
+     (T1) X.  */
+  if (TYPE_PRECISION (t1) <= TYPE_PRECISION (tx)
+      && TYPE_PRECISION (t2) >= TYPE_PRECISION (t1))
+    return true;
+  /* For (T1) (T2) X, if T1 <= T2 && T1 <= X we can transform to
+     (T1) X.  */
+  else if (TYPE_PRECISION (t1) <= TYPE_PRECISION (t2)
+           && TYPE_PRECISION (t1) <= TYPE_PRECISION (tx))
+    return true;
+  /* For (T1) (T2) X, if T1 > X && T1 <= T2 && (SIGN(T1) == SIGN(T2)
+     || !SIGN(TX)), we can transform to (T1) X.  */
+  else if (TYPE_PRECISION (t1) > TYPE_PRECISION (tx)
+           && TYPE_PRECISION (t2) > TYPE_PRECISION (tx)
+           && (TYPE_UNSIGNED (t1) == TYPE_UNSIGNED (t2)
+               || TYPE_UNSIGNED (tx)))
+    return true;
+  /* For (T1) (T2) X, if T1 > X && T2 == X && SIGN(X) == SIGN(T2),
+     we can transform to (T1) X.  */
+  else if (TYPE_PRECISION (t1) > TYPE_PRECISION (tx)
+           && TYPE_PRECISION (t2) == TYPE_PRECISION (tx)
+           && TYPE_UNSIGNED (tx) == TYPE_UNSIGNED (t2))
+    return true;
+  else if (TYPE_PRECISION (t1) > TYPE_PRECISION (tx)
+           && TYPE_PRECISION (t2) > TYPE_PRECISION (tx)
+           && TYPE_PRECISION (t1) <= TYPE_PRECISION (t2))
+    /* E.g. (short) (unsigned int) x, for char x.  */
+    return true;
+
+  return false;
+}
+
+
+/* Try to sink the cast expression (TYPE1) (TYPE2) X to (TYPE1) X, if
+   TYPE1, TYPE2, and the type of X are of integral kind and meet at
+   least one condition in demote_cast_p.
+   If the cast was sunk, true is returned, otherwise false.
+   */
+
+static bool
+demote_cast (gimple def, tree name, tree c, gimple def2)
+{
+  tree x;
+
+  if (!gimple_assign_cast_p (def2))
+    return false;
+
+  x = gimple_assign_rhs1 (def2);
+
+  if (TREE_CODE (x) != SSA_NAME
+      || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (x)
+      || !demote_cast_p (TREE_TYPE (name), TREE_TYPE (c), TREE_TYPE (x)))
+    return false;
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "Remove cast ");
+      print_gimple_stmt (dump_file, def, 0, 0);
+      fprintf (dump_file, " ");
+      print_gimple_stmt (dump_file, def2, 0, 0);
+    }
+
+  gimple_assign_set_rhs1 (def, x);
+  if (useless_type_conversion_p (TREE_TYPE (name), TREE_TYPE (x)))
+    gimple_assign_set_rhs_code (def, SSA_NAME);
+  update_stmt (def);
+  remove_stmt_chain (c);
+  if (dump_file)
+    {
+      fprintf (dump_file, " by cast ");
+      print_gimple_stmt (dump_file, def, 0, 0);
+    }
+
+  return true;
+}
+
+/* Try to demote assignment for the following tree-codes:
+     (typ) (X * Y) -> ((typ) X) * ((typ) Y)
+     (typ) (X + Y) -> ((typ) X) + ((typ) Y)
+     (typ) (X - Y) -> ((typ) X) - ((typ) Y)
+     (typ) (X << Y) -> ((typ) X) << Y
+     (typ) (X >> Y) -> ((typ) X) >> Y
+     (typ) (X ror Y) -> ((typ) X) ror Y
+     (typ) (X rol Y) -> ((typ) X) rol Y
+     (typ) (X & Y) -> ((typ) X) & ((typ) Y)
+     (typ) (X | Y) -> ((typ) X) | ((typ) Y)
+     (typ) (X ^ Y) -> ((typ) X) ^ ((typ) Y).  */
+
+static bool
+demote_into (int flags, gimple stmt, tree lhs, tree r, gimple r_stmt,
+             gimple *pstmt)
+{
+  gimple_stmt_iterator gsi;
+  enum tree_code code;
+  tree l_type = TREE_TYPE (lhs);
+  tree r_type, a1, a2;
+
+  r_type = TREE_TYPE (r);
+
+  /* We can't do type-sinking if only one side is boolean typed.
+     */
+  if ((TREE_CODE (r_type) == BOOLEAN_TYPE
+       || TREE_CODE (l_type) == BOOLEAN_TYPE)
+      && TREE_CODE (r_type) != TREE_CODE (l_type))
+    return false;
+
+  if ((flags & TDK_NO_EXPAND_WIDENING) != 0
+      && TYPE_PRECISION (l_type) > TYPE_PRECISION (r_type))
+    return false;
+  code = gimple_assign_rhs_code (r_stmt);
+
+  switch (code)
+    {
+    case MULT_EXPR:
+    case PLUS_EXPR: case MINUS_EXPR:
+      a1 = gimple_assign_rhs1 (r_stmt);
+      a2 = gimple_assign_rhs2 (r_stmt);
+
+      if (TYPE_PRECISION (l_type) < TYPE_PRECISION (r_type)
+          && !TYPE_UNSIGNED (l_type))
+        {
+          tree uns;
+
+          uns = unsigned_type_for (l_type);
+
+          if ((flags & TDK_NO_EXPAND_VIA_UNSIGNED) != 0)
+            return false;
+
+          a1 = gen_cast (uns, a1);
+          a2 = gen_cast (uns, a2);
+          a1 = build_and_add_sum (uns, a1, a2, code);
+          code = NOP_EXPR;
+          a2 = NULL_TREE;
+          break;
+        }
+      else if (TYPE_PRECISION (l_type) < TYPE_PRECISION (r_type)
+               && TYPE_UNSIGNED (l_type))
+        {
+          a1 = gen_cast (l_type, a1);
+          a2 = gen_cast (l_type, a2);
+          break;
+        }
+      else if (TYPE_PRECISION (l_type) == TYPE_PRECISION (r_type))
+        {
+          if (TYPE_UNSIGNED (r_type) == TYPE_UNSIGNED (l_type)
+              || !TYPE_UNSIGNED (r_type))
+            {
+              a1 = gen_cast (l_type, a1);
+              a2 = gen_cast (l_type, a2);
+              break;
+            }
+        }
+      return false;
+
+    case RSHIFT_EXPR:
+      /* (NEWTYPE) (X >> Y) can be transformed to
+         ((NEWTYPE) X) >> Y, if NEWTYPE == X and
+         SIGN(NEWTYPE) == SIGN(X).  */
+      if (TYPE_UNSIGNED (l_type) != TYPE_UNSIGNED (r_type))
+        return false;
+    case LROTATE_EXPR:
+    case RROTATE_EXPR:
+    case LSHIFT_EXPR:
+      /* We can't allow <= as the right-hand operand might be too large
+         for the smaller type.
+         */
+      if (TYPE_PRECISION (l_type) != TYPE_PRECISION (r_type))
+        return false;
+      a1 = gimple_assign_rhs1 (r_stmt);
+      if (!useless_type_conversion_p (l_type, TREE_TYPE (a1)))
+        a1 = gen_cast (l_type, a1);
+      a2 = gimple_assign_rhs2 (r_stmt);
+
+      break;
+
+    case MIN_EXPR: case MAX_EXPR:
+      a1 = gimple_assign_rhs1 (r_stmt);
+      a2 = gimple_assign_rhs2 (r_stmt);
+
+      if (TYPE_PRECISION (l_type) < TYPE_PRECISION (TREE_TYPE (a1)))
+        return false;
+      if (TYPE_UNSIGNED (l_type) != TYPE_UNSIGNED (TREE_TYPE (a1)))
+        {
+          if (!TYPE_UNSIGNED (TREE_TYPE (a1))
+              || TYPE_PRECISION (l_type) <= TYPE_PRECISION (TREE_TYPE (a1)))
+            return false;
+        }
+
+      if (!useless_type_conversion_p (l_type, TREE_TYPE (a1)))
+        a1 = gen_cast (l_type, a1);
+      if (!useless_type_conversion_p (l_type, TREE_TYPE (a2)))
+        a2 = gen_cast (l_type, a2);
+
+      break;
+
+    case BIT_AND_EXPR: case BIT_IOR_EXPR: case BIT_XOR_EXPR:
+      a1 = gimple_assign_rhs1 (r_stmt);
+      a2 = gimple_assign_rhs2 (r_stmt);
+
+      if (!useless_type_conversion_p (l_type, TREE_TYPE (a1)))
+        a1 = gen_cast (l_type, a1);
+      if (!useless_type_conversion_p (l_type, TREE_TYPE (a2)))
+        a2 = gen_cast (l_type, a2);
+      break;
+    default:
+      return false;
+    }
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "Move cast ");
+      print_gimple_stmt (dump_file, stmt, 0, 0);
+    }
+
+  if (gimple_location (r_stmt) != UNKNOWN_LOCATION
+      && gimple_location (stmt) == UNKNOWN_LOCATION)
+    gimple_set_location (stmt, gimple_location (r_stmt));
+  gsi = gsi_for_stmt (stmt);
+  gimple_assign_set_rhs_with_ops (&gsi, code, a1, a2);
+  stmt = gsi_stmt (gsi);
+  update_stmt (stmt);
+  remove_stmt_chain (r);
+  *pstmt = stmt;
+
+  if (dump_file)
+    {
+      fprintf (dump_file, " into ");
+      print_gimple_stmt (dump_file, stmt, 0, 0);
+    }
+
+  return true;
+}
+
+/* Loop over each statement for each basic-block.
+   */
+
+static void
+execute_type_demote_int (int flags, basic_block bb)
+{
+  gimple_stmt_iterator gsi;
+  basic_block son;
+
+  for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi);)
+    {
+      gimple stmt, def2;
+      tree lhs, inner;
+
+      stmt = gsi_stmt (gsi);
+      gsi_prev (&gsi);
+
+      if (!is_gimple_assign (stmt)
+          || (lhs = gimple_assign_lhs (stmt)) == NULL_TREE
+          || TREE_CODE (lhs) != SSA_NAME
+          || !INTEGRAL_TYPE_P (TREE_TYPE (lhs))
+          || !gimple_assign_cast_p (stmt))
+        continue;
+      inner = gimple_assign_rhs1 (stmt);
+      if (TREE_CODE (inner) != SSA_NAME
+          || !INTEGRAL_TYPE_P (TREE_TYPE (inner))
+          || !(def2 = SSA_NAME_DEF_STMT (inner))
+          || !is_gimple_assign (def2)
+          || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (inner)
+          || !has_single_use (inner))
+        continue;
+      if (demote_cast (stmt, lhs, inner, def2))
+        gsi = gsi_for_stmt (stmt);
+      else if (demote_into (flags, stmt, lhs, inner, def2, &stmt))
+        {
+          gsi = gsi_for_stmt (stmt);
+          gsi_prev (&gsi);
+        }
+    }
+  /* OCCURS_IN_ABNORMAL_PHI */
+  for (son = first_dom_son (CDI_POST_DOMINATORS, bb);
+       son;
+       son = next_dom_son (CDI_POST_DOMINATORS, son))
+    execute_type_demote_int (flags, son);
+}
+
+/* Entry without unsigned-type demotion.  */
+
+static unsigned int
+execute_type_demote1 (void)
+{
+  /* We calculate dominance info as CDI_DOMINATORS to make
+     sure we can insert newly generated statements at the proper
+     basic-block.  */
+  calculate_dominance_info (CDI_DOMINATORS);
+  calculate_dominance_info (CDI_POST_DOMINATORS);
+
+  execute_type_demote_int (TDK_NO_EXPAND_VIA_UNSIGNED, EXIT_BLOCK_PTR);
+
+  free_dominance_info (CDI_POST_DOMINATORS);
+  return 0;
+}
+
+/* Entry with unsigned-type demotion.  */
+
+static unsigned int
+execute_type_demote2 (void)
+{
+  /* We calculate dominance info as CDI_DOMINATORS to make
+     sure we can insert newly generated statements at the proper
+     basic-block.
+     */
+  calculate_dominance_info (CDI_DOMINATORS);
+  calculate_dominance_info (CDI_POST_DOMINATORS);
+
+  execute_type_demote_int (0, EXIT_BLOCK_PTR);
+
+  free_dominance_info (CDI_POST_DOMINATORS);
+  return 0;
+}
+
+struct gimple_opt_pass pass_type_demote1 =
+{
+ {
+  GIMPLE_PASS,
+  "typedemote1",                        /* name */
+  OPTGROUP_NONE,
+  NULL,                                 /* gate */
+  execute_type_demote1,                 /* execute */
+  NULL,                                 /* sub */
+  NULL,                                 /* next */
+  0,                                    /* static_pass_number */
+  TV_NONE,                              /* tv_id */
+  PROP_cfg | PROP_ssa,                  /* properties_required */
+  0,                                    /* properties_provided */
+  0,                                    /* properties_destroyed */
+  0,                                    /* todo_flags_start */
+  TODO_verify_ssa
+  | TODO_update_ssa                     /* todo_flags_finish */
+ }
+};
+
+struct gimple_opt_pass pass_type_demote2 =
+{
+ {
+  GIMPLE_PASS,
+  "typedemote2",                        /* name */
+  OPTGROUP_NONE,
+  NULL,                                 /* gate */
+  execute_type_demote2,                 /* execute */
+  NULL,                                 /* sub */
+  NULL,                                 /* next */
+  0,                                    /* static_pass_number */
+  TV_NONE,                              /* tv_id */
+  PROP_cfg | PROP_ssa,                  /* properties_required */
+  0,                                    /* properties_provided */
+  0,                                    /* properties_destroyed */
+  0,                                    /* todo_flags_start */
+  TODO_verify_ssa
+  | TODO_update_ssa                     /* todo_flags_finish */
+ }
+};