Message ID | 55B5A884.4060105@linaro.org |
---|---|
State | New |
Headers | show |
On Mon, Jul 27, 2015 at 11:41 AM, Michael Collison <michael.collison@linaro.org> wrote: > This patch is designed to optimize end of loop conditions involving of the > form > i < x && i < y into i < min (x, y). Loop condition involving '>' are > handled similarly using max(x,y). > As an example: > > #define N 1024 > > int a[N], b[N], c[N]; > > void add (unsignedint m, unsignedint n) > { > unsignedint i, bound = (m < n) ? m : n; > for (i = 0; i < m && i < n; ++i) > a[i] = b[i] + c[i]; > } > > > Performed bootstrap and make check on: x86_64_unknown-linux-gnu, > arm-linux-gnueabihf, and aarch64-linux-gnu. > Okay for trunk? > > 2015-07-24 Michael Collison <michael.collison@linaro.org> > Andrew Pinski <andrew.pinski@caviumnetworks.com> > > * match.pd ((x < y) && (x < z) -> x < min (y,z), > (x > y) and (x > z) -> x > max (y,z)) > > diff --git a/gcc/match.pd b/gcc/match.pd > index 5e8fd32..8691710 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -1793,3 +1793,17 @@ along with GCC; see the file COPYING3. If not see > (convert (bit_and (op (convert:utype @0) (convert:utype @1)) > (convert:utype @4))))))) > > + > +/* Transform (@0 < @1 and @0 < @2) to use min */ > +(for op (lt le) > +(simplify > +(bit_and:c (op @0 @1) (op @0 @2)) > +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0))) > +(op @0 (min @1 @2))))) > + > +/* Transform (@0 > @1 and @0 > @2) to use max */ > +(for op (gt ge) > +(simplify > +(bit_and:c (op @0 @1) (op @0 @2)) > +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0))) > +(op @0 (max @1 @2))))) Could you please give a test case for it? Also IIUC, this is not only simplification, but also loop invariant hoist, so how does it check invariantness? Thanks, bin > -- > > -- > Michael Collison > Linaro Toolchain Working Group > michael.collison@linaro.org >
On Mon, Jul 27, 2015 at 12:23 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: > On Mon, Jul 27, 2015 at 11:41 AM, Michael Collison > <michael.collison@linaro.org> wrote: >> This patch is designed to optimize end of loop conditions involving of the >> form >> i < x && i < y into i < min (x, y). Loop condition involving '>' are >> handled similarly using max(x,y). >> As an example: >> >> #define N 1024 >> >> int a[N], b[N], c[N]; >> >> void add (unsignedint m, unsignedint n) >> { >> unsignedint i, bound = (m < n) ? m : n; >> for (i = 0; i < m && i < n; ++i) >> a[i] = b[i] + c[i]; >> } >> >> >> Performed bootstrap and make check on: x86_64_unknown-linux-gnu, >> arm-linux-gnueabihf, and aarch64-linux-gnu. >> Okay for trunk? >> >> 2015-07-24 Michael Collison <michael.collison@linaro.org> >> Andrew Pinski <andrew.pinski@caviumnetworks.com> >> >> * match.pd ((x < y) && (x < z) -> x < min (y,z), >> (x > y) and (x > z) -> x > max (y,z)) >> >> diff --git a/gcc/match.pd b/gcc/match.pd >> index 5e8fd32..8691710 100644 >> --- a/gcc/match.pd >> +++ b/gcc/match.pd >> @@ -1793,3 +1793,17 @@ along with GCC; see the file COPYING3. If not see >> (convert (bit_and (op (convert:utype @0) (convert:utype @1)) >> (convert:utype @4))))))) >> >> + >> +/* Transform (@0 < @1 and @0 < @2) to use min */ >> +(for op (lt le) >> +(simplify >> +(bit_and:c (op @0 @1) (op @0 @2)) >> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0))) >> +(op @0 (min @1 @2))))) >> + >> +/* Transform (@0 > @1 and @0 > @2) to use max */ >> +(for op (gt ge) >> +(simplify >> +(bit_and:c (op @0 @1) (op @0 @2)) >> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0))) >> +(op @0 (max @1 @2))))) > > Could you please give a test case for it? Also IIUC, this is not only > simplification, but also loop invariant hoist, so how does it check > invariantness? Sorry I realized this patch only does simplification and then let lim pass decide if it can be moved? In this way, there is no invariant problem, please ignore previous message. Thanks, bin
On Mon, Jul 27, 2015 at 5:41 AM, Michael Collison <michael.collison@linaro.org> wrote: > This patch is designed to optimize end of loop conditions involving of the > form > i < x && i < y into i < min (x, y). Loop condition involving '>' are > handled similarly using max(x,y). > As an example: > > #define N 1024 > > int a[N], b[N], c[N]; > > void add (unsignedint m, unsignedint n) > { > unsignedint i, bound = (m < n) ? m : n; > for (i = 0; i < m && i < n; ++i) > a[i] = b[i] + c[i]; > } > > > Performed bootstrap and make check on: x86_64_unknown-linux-gnu, > arm-linux-gnueabihf, and aarch64-linux-gnu. > Okay for trunk? So this works only for && that has been lowered to non-CFG form (I suppose phiopt would catch that? If not, ifcombine would be the place to implement it I guess). Furthermore it doesn't work for three such ops which would require an additional pattern like (simplfiy (bit_and:c (op @0 (min @1 @2)) (op @0 @3)) (op @0 (min (min @1 @2) @3)))) if that's profitable? Richard. > 2015-07-24 Michael Collison <michael.collison@linaro.org> > Andrew Pinski <andrew.pinski@caviumnetworks.com> > > * match.pd ((x < y) && (x < z) -> x < min (y,z), > (x > y) and (x > z) -> x > max (y,z)) > > diff --git a/gcc/match.pd b/gcc/match.pd > index 5e8fd32..8691710 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -1793,3 +1793,17 @@ along with GCC; see the file COPYING3. If not see > (convert (bit_and (op (convert:utype @0) (convert:utype @1)) > (convert:utype @4))))))) > > + > +/* Transform (@0 < @1 and @0 < @2) to use min */ > +(for op (lt le) > +(simplify > +(bit_and:c (op @0 @1) (op @0 @2)) > +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0))) > +(op @0 (min @1 @2))))) > + > +/* Transform (@0 > @1 and @0 > @2) to use max */ > +(for op (gt ge) > +(simplify > +(bit_and:c (op @0 @1) (op @0 @2)) > +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0))) > +(op @0 (max @1 @2))))) > -- > > -- > Michael Collison > Linaro Toolchain Working Group > michael.collison@linaro.org >
On 07/27/2015 03:25 AM, Richard Biener wrote: > On Mon, Jul 27, 2015 at 5:41 AM, Michael Collison > <michael.collison@linaro.org> wrote: >> This patch is designed to optimize end of loop conditions involving of the >> form >> i < x && i < y into i < min (x, y). Loop condition involving '>' are >> handled similarly using max(x,y). >> As an example: >> >> #define N 1024 >> >> int a[N], b[N], c[N]; >> >> void add (unsignedint m, unsignedint n) >> { >> unsignedint i, bound = (m < n) ? m : n; >> for (i = 0; i < m && i < n; ++i) >> a[i] = b[i] + c[i]; >> } >> >> >> Performed bootstrap and make check on: x86_64_unknown-linux-gnu, >> arm-linux-gnueabihf, and aarch64-linux-gnu. >> Okay for trunk? > > So this works only for && that has been lowered to non-CFG form > (I suppose phiopt would catch that? If not, ifcombine would be the > place to implement it I guess). phiopt is supposed to be generating MIN/MAX expressions for us. If it isn't it'd be good to see any testcases where it isn't. I think that raises a general question though. Does it make more sense to capture MIN/MAX (and others) in phiopt or in the match.pd framework? Jeff
On Mon, Jul 27, 2015 at 6:20 PM, Jeff Law <law@redhat.com> wrote: > On 07/27/2015 03:25 AM, Richard Biener wrote: >> >> On Mon, Jul 27, 2015 at 5:41 AM, Michael Collison >> <michael.collison@linaro.org> wrote: >>> >>> This patch is designed to optimize end of loop conditions involving of >>> the >>> form >>> i < x && i < y into i < min (x, y). Loop condition involving '>' are >>> handled similarly using max(x,y). >>> As an example: >>> >>> #define N 1024 >>> >>> int a[N], b[N], c[N]; >>> >>> void add (unsignedint m, unsignedint n) >>> { >>> unsignedint i, bound = (m < n) ? m : n; >>> for (i = 0; i < m && i < n; ++i) >>> a[i] = b[i] + c[i]; >>> } >>> >>> >>> Performed bootstrap and make check on: x86_64_unknown-linux-gnu, >>> arm-linux-gnueabihf, and aarch64-linux-gnu. >>> Okay for trunk? >> >> >> So this works only for && that has been lowered to non-CFG form >> (I suppose phiopt would catch that? If not, ifcombine would be the >> place to implement it I guess). > > phiopt is supposed to be generating MIN/MAX expressions for us. If it isn't > it'd be good to see any testcases where it isn't. > > I think that raises a general question though. Does it make more sense to > capture MIN/MAX (and others) in phiopt or in the match.pd framework? match.pd is good for pattern recognition - patterns of fixed size. There are cases that are done in fold-const.c for example that doesn't fit very well and should be done as separate pass, like for example figuring out whether an expression can be easily negated or whether there are sign-changes that can be stripped. Basically all cases where fold currently recurses (unbound). The above case is a corner case I think - the number of && you can change into (multiple) MIN/MAX is unbound but we might only care about the case where there will be one MIN/MAX operation. Generally phiopt and other patterns that match the CFG are not yet well supported by match.pd (though I outlined how matching PHI nodes when facing (simplify (cond ...) ...) would be possible). So while putting something into match.pd is easy I'd like people to think if doing the same thing elsewhere is better - that is, if this is really a pattern transform operation or if you are just implementing a special-case of a general transform as a pattern. Richard. > Jeff >
Richard and Jeff, Any conclusion to this discussion? Is this okay in match.pd or would you like to see it implemented elsewhere? On 7/28/2015 12:41 AM, Richard Biener wrote: > On Mon, Jul 27, 2015 at 6:20 PM, Jeff Law <law@redhat.com> wrote: >> On 07/27/2015 03:25 AM, Richard Biener wrote: >>> On Mon, Jul 27, 2015 at 5:41 AM, Michael Collison >>> <michael.collison@linaro.org> wrote: >>>> This patch is designed to optimize end of loop conditions involving of >>>> the >>>> form >>>> i < x && i < y into i < min (x, y). Loop condition involving '>' are >>>> handled similarly using max(x,y). >>>> As an example: >>>> >>>> #define N 1024 >>>> >>>> int a[N], b[N], c[N]; >>>> >>>> void add (unsignedint m, unsignedint n) >>>> { >>>> unsignedint i, bound = (m < n) ? m : n; >>>> for (i = 0; i < m && i < n; ++i) >>>> a[i] = b[i] + c[i]; >>>> } >>>> >>>> >>>> Performed bootstrap and make check on: x86_64_unknown-linux-gnu, >>>> arm-linux-gnueabihf, and aarch64-linux-gnu. >>>> Okay for trunk? >>> >>> So this works only for && that has been lowered to non-CFG form >>> (I suppose phiopt would catch that? If not, ifcombine would be the >>> place to implement it I guess). >> phiopt is supposed to be generating MIN/MAX expressions for us. If it isn't >> it'd be good to see any testcases where it isn't. >> >> I think that raises a general question though. Does it make more sense to >> capture MIN/MAX (and others) in phiopt or in the match.pd framework? > match.pd is good for pattern recognition - patterns of fixed size. There are > cases that are done in fold-const.c for example that doesn't fit very well > and should be done as separate pass, like for example figuring out whether > an expression can be easily negated or whether there are sign-changes that > can be stripped. Basically all cases where fold currently recurses (unbound). > > The above case is a corner case I think - the number of && you can change > into (multiple) MIN/MAX is unbound but we might only care about the case > where there will be one MIN/MAX operation. > > Generally phiopt and other patterns that match the CFG are not yet well > supported by match.pd (though I outlined how matching PHI nodes when > facing (simplify (cond ...) ...) would be possible). > > So while putting something into match.pd is easy I'd like people to > think if doing the same thing elsewhere is better - that is, if this is really > a pattern transform operation or if you are just implementing a special-case > of a general transform as a pattern. > > Richard. > >> Jeff >>
On 07/28/2015 01:41 AM, Richard Biener wrote: > > The above case is a corner case I think - the number of && you can change > into (multiple) MIN/MAX is unbound but we might only care about the case > where there will be one MIN/MAX operation. I suspect that's going to be the most important/common case. > > Generally phiopt and other patterns that match the CFG are not yet well > supported by match.pd (though I outlined how matching PHI nodes when > facing (simplify (cond ...) ...) would be possible). Right. Though I thought the conclusion after outlining we determined it wasn't really feasible, yet. > > So while putting something into match.pd is easy I'd like people to > think if doing the same thing elsewhere is better - that is, if this is really > a pattern transform operation or if you are just implementing a special-case > of a general transform as a pattern. So in this case we're taking something like: _6 = i_1 < m_4(D); _7 = i_1 < n_3(D); _8 = _6 & _7; if (_8 != 0) And turning it into _X = MIN (m_4, n_3) if (i_1 < _X) That seems to me like a good match for match.pd given its generality and the need to walk up the use-def chain. It's certainly not a good fit for phi-opt since we're not looking at PHIs :-) Michael -- can you take your sample code and turn it into a test for the testsuite. I'd hazard a guess it'll need to be target specific because of its interactions with branch-costing. Might as well make 4 variants (lt -> MIN, le -> MIN, ge->MAX, gt->MAX). We're going to want that regardless of whether tackling this issue in match.pd (my current preference) or elsewhere. jeff
Hi Jeff, Yes I will create a test case. I'm not quite sure what to check for even in the machine dependent test case. It's quite possible for the instructions that are generated to change over time. On 7/31/2015 9:20 AM, Jeff Law wrote: > On 07/28/2015 01:41 AM, Richard Biener wrote: >> >> The above case is a corner case I think - the number of && you can >> change >> into (multiple) MIN/MAX is unbound but we might only care about the case >> where there will be one MIN/MAX operation. > I suspect that's going to be the most important/common case. > >> >> Generally phiopt and other patterns that match the CFG are not yet well >> supported by match.pd (though I outlined how matching PHI nodes when >> facing (simplify (cond ...) ...) would be possible). > Right. Though I thought the conclusion after outlining we determined > it wasn't really feasible, yet. > > >> >> So while putting something into match.pd is easy I'd like people to >> think if doing the same thing elsewhere is better - that is, if this >> is really >> a pattern transform operation or if you are just implementing a >> special-case >> of a general transform as a pattern. > So in this case we're taking something like: > > _6 = i_1 < m_4(D); > _7 = i_1 < n_3(D); > _8 = _6 & _7; > if (_8 != 0) > > > And turning it into > > _X = MIN (m_4, n_3) > if (i_1 < _X) > > That seems to me like a good match for match.pd given its generality > and the need to walk up the use-def chain. It's certainly not a good > fit for phi-opt since we're not looking at PHIs :-) > > > > Michael -- can you take your sample code and turn it into a test for > the testsuite. I'd hazard a guess it'll need to be target specific > because of its interactions with branch-costing. Might as well make 4 > variants (lt -> MIN, le -> MIN, ge->MAX, gt->MAX). > > We're going to want that regardless of whether tackling this issue in > match.pd (my current preference) or elsewhere. > > jeff
On 07/31/2015 12:18 PM, Michael Collison wrote: > Hi Jeff, > > Yes I will create a test case. I'm not quite sure what to check for even > in the machine dependent test case. It's quite possible for the > instructions that are generated to change over time. I think we're going to want to look at the gimple IR and search for the MIN/MAX expressions rather than the instructions. Given we don't know where the transformation is going to land (yet), you can probably start with -fdump-tree-optimized and scanning the .optimized dump. We can still do that and have the test be target specific. jeff
On Fri, Jul 31, 2015 at 8:27 PM, Jeff Law <law@redhat.com> wrote: > On 07/31/2015 12:18 PM, Michael Collison wrote: >> >> Hi Jeff, >> >> Yes I will create a test case. I'm not quite sure what to check for even >> in the machine dependent test case. It's quite possible for the >> instructions that are generated to change over time. > > I think we're going to want to look at the gimple IR and search for the > MIN/MAX expressions rather than the instructions. Given we don't know where > the transformation is going to land (yet), you can probably start with > -fdump-tree-optimized and scanning the .optimized dump. Yeah. FWIW I'm ok with the specialized pattern in match.pd. Indeed phiopt isn't a good fit here (though on some targets you may end up seeing a PHI node, dependent on LOGICAL_OP_NON_SHORT_CIRCUIT). For longer chains I'd say reassoc is the appropriate place to handle this as it has already code to combine &&s and ||s of conditions. Maybe you can give it a short try to only handle it there. I'm also somewhat worried about code-generation on random targets. So can you have a quick look (just for your testcase) what the code-gen difference is on primary platforms (just build some stage1 cc1s)? Thanks, Richard. > We can still do that and have the test be target specific. > > jeff >
Richard Biener wrote: > Furthermore it doesn't work for three such ops which would require > an additional pattern like > > (simplfiy > (bit_and:c (op @0 (min @1 @2)) (op @0 @3)) > (op @0 (min (min @1 @2) @3)))) > > if that's profitable? Shouldn't that be just a case of binding @1 in the original pattern: >> +/* Transform (@0 < @1 and @0 < @2) to use min */ >> +(for op (lt le) >> +(simplify >> +(bit_and:c (op @0 @1) (op @0 @2)) >> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0))) >> +(op @0 (min @1 @2))))) to (min @1 @2) in your three-way pattern, @2 in the original to @3 in yours, and @0 to @0 ? Or is @1 not allowed to bind to a 'min' ? Secondly: should/can Michael's fix reference PR57600 ? Cheers, Alan
On 08/05/2015 10:07 AM, Alan Lawrence wrote: > Richard Biener wrote: > >> Furthermore it doesn't work for three such ops which would require >> an additional pattern like >> >> (simplfiy >> (bit_and:c (op @0 (min @1 @2)) (op @0 @3)) >> (op @0 (min (min @1 @2) @3)))) >> >> if that's profitable? > > Shouldn't that be just a case of binding @1 in the original pattern: > >>> +/* Transform (@0 < @1 and @0 < @2) to use min */ >>> +(for op (lt le) >>> +(simplify >>> +(bit_and:c (op @0 @1) (op @0 @2)) > >> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0))) > >> +(op @0 (min @1 @2))))) > > to (min @1 @2) in your three-way pattern, @2 in the original to @3 in > yours, and @0 to @0 ? > > Or is @1 not allowed to bind to a 'min' ? > > > Secondly: should/can Michael's fix reference PR57600 ? It probably should. Reading that BZ ISTM that we probably are missing some expression equivalences in DOM. Given x = min (a, b) We should be recording x <= a -> true x <= b -> true in our expression equivalency tables (plus all the related equivalences). I'm not sure if that's the cause of the missed optimizations or not though. Just something I noticed while reading the BZ. jeff
diff --git a/gcc/match.pd b/gcc/match.pd index 5e8fd32..8691710 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -1793,3 +1793,17 @@ along with GCC; see the file COPYING3. If not see (convert (bit_and (op (convert:utype @0) (convert:utype @1)) (convert:utype @4))))))) + +/* Transform (@0 < @1 and @0 < @2) to use min */ +(for op (lt le) +(simplify +(bit_and:c (op @0 @1) (op @0 @2)) +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0))) +(op @0 (min @1 @2))))) + +/* Transform (@0 > @1 and @0 > @2) to use max */ +(for op (gt ge) +(simplify +(bit_and:c (op @0 @1) (op @0 @2)) +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0))) +(op @0 (max @1 @2)))))