diff mbox

Optimize certain end of loop conditions into min/max operation

Message ID 55B5A884.4060105@linaro.org
State New
Headers show

Commit Message

Michael Collison July 27, 2015, 3:41 a.m. UTC
This patch is designed to optimize end of loop conditions involving of 
the form
  i < x && i < y into i < min (x, y). Loop condition involving '>' are 
handled similarly using max(x,y).
As an example:

#define N 1024

int  a[N], b[N], c[N];

void add (unsignedint  m, unsignedint  n)
{
   unsignedint  i, bound = (m < n) ? m : n;
   for  (i = 0; i < m && i < n; ++i)
     a[i] = b[i] + c[i];
}


Performed bootstrap and make check on: x86_64_unknown-linux-gnu, 
arm-linux-gnueabihf, and aarch64-linux-gnu.
Okay for trunk?

2015-07-24  Michael Collison  <michael.collison@linaro.org>
                     Andrew Pinski <andrew.pinski@caviumnetworks.com>

                 * match.pd ((x < y) && (x < z) -> x < min (y,z),
                 (x > y) and (x > z) -> x > max (y,z))

Comments

Bin.Cheng July 27, 2015, 4:23 a.m. UTC | #1
On Mon, Jul 27, 2015 at 11:41 AM, Michael Collison
<michael.collison@linaro.org> wrote:
> This patch is designed to optimize end of loop conditions involving of the
> form
>  i < x && i < y into i < min (x, y). Loop condition involving '>' are
> handled similarly using max(x,y).
> As an example:
>
> #define N 1024
>
> int  a[N], b[N], c[N];
>
> void add (unsignedint  m, unsignedint  n)
> {
>   unsignedint  i, bound = (m < n) ? m : n;
>   for  (i = 0; i < m && i < n; ++i)
>     a[i] = b[i] + c[i];
> }
>
>
> Performed bootstrap and make check on: x86_64_unknown-linux-gnu,
> arm-linux-gnueabihf, and aarch64-linux-gnu.
> Okay for trunk?
>
> 2015-07-24  Michael Collison  <michael.collison@linaro.org>
>                     Andrew Pinski <andrew.pinski@caviumnetworks.com>
>
>                 * match.pd ((x < y) && (x < z) -> x < min (y,z),
>                 (x > y) and (x > z) -> x > max (y,z))
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5e8fd32..8691710 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1793,3 +1793,17 @@ along with GCC; see the file COPYING3.  If not see
>      (convert (bit_and (op (convert:utype @0) (convert:utype @1))
>                (convert:utype @4)))))))
>
> +
> +/* Transform (@0 < @1 and @0 < @2) to use min */
> +(for op (lt le)
> +(simplify
> +(bit_and:c (op @0 @1) (op @0 @2))
> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
> +(op @0 (min @1 @2)))))
> +
> +/* Transform (@0 > @1 and @0 > @2) to use max */
> +(for op (gt ge)
> +(simplify
> +(bit_and:c (op @0 @1) (op @0 @2))
> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
> +(op @0 (max @1 @2)))))

Could you please give a test case for it?  Also IIUC, this is not only
simplification, but also loop invariant hoist, so how does it check
invariantness?

Thanks,
bin
> --
>
> --
> Michael Collison
> Linaro Toolchain Working Group
> michael.collison@linaro.org
>
Bin.Cheng July 27, 2015, 5:41 a.m. UTC | #2
On Mon, Jul 27, 2015 at 12:23 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Mon, Jul 27, 2015 at 11:41 AM, Michael Collison
> <michael.collison@linaro.org> wrote:
>> This patch is designed to optimize end of loop conditions involving of the
>> form
>>  i < x && i < y into i < min (x, y). Loop condition involving '>' are
>> handled similarly using max(x,y).
>> As an example:
>>
>> #define N 1024
>>
>> int  a[N], b[N], c[N];
>>
>> void add (unsignedint  m, unsignedint  n)
>> {
>>   unsignedint  i, bound = (m < n) ? m : n;
>>   for  (i = 0; i < m && i < n; ++i)
>>     a[i] = b[i] + c[i];
>> }
>>
>>
>> Performed bootstrap and make check on: x86_64_unknown-linux-gnu,
>> arm-linux-gnueabihf, and aarch64-linux-gnu.
>> Okay for trunk?
>>
>> 2015-07-24  Michael Collison  <michael.collison@linaro.org>
>>                     Andrew Pinski <andrew.pinski@caviumnetworks.com>
>>
>>                 * match.pd ((x < y) && (x < z) -> x < min (y,z),
>>                 (x > y) and (x > z) -> x > max (y,z))
>>
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index 5e8fd32..8691710 100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -1793,3 +1793,17 @@ along with GCC; see the file COPYING3.  If not see
>>      (convert (bit_and (op (convert:utype @0) (convert:utype @1))
>>                (convert:utype @4)))))))
>>
>> +
>> +/* Transform (@0 < @1 and @0 < @2) to use min */
>> +(for op (lt le)
>> +(simplify
>> +(bit_and:c (op @0 @1) (op @0 @2))
>> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
>> +(op @0 (min @1 @2)))))
>> +
>> +/* Transform (@0 > @1 and @0 > @2) to use max */
>> +(for op (gt ge)
>> +(simplify
>> +(bit_and:c (op @0 @1) (op @0 @2))
>> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
>> +(op @0 (max @1 @2)))))
>
> Could you please give a test case for it?  Also IIUC, this is not only
> simplification, but also loop invariant hoist, so how does it check
> invariantness?

Sorry I realized this patch only does simplification and then let lim
pass decide if it can be moved?  In this way, there is no invariant
problem, please ignore previous message.

Thanks,
bin
Richard Biener July 27, 2015, 9:25 a.m. UTC | #3
On Mon, Jul 27, 2015 at 5:41 AM, Michael Collison
<michael.collison@linaro.org> wrote:
> This patch is designed to optimize end of loop conditions involving of the
> form
>  i < x && i < y into i < min (x, y). Loop condition involving '>' are
> handled similarly using max(x,y).
> As an example:
>
> #define N 1024
>
> int  a[N], b[N], c[N];
>
> void add (unsignedint  m, unsignedint  n)
> {
>   unsignedint  i, bound = (m < n) ? m : n;
>   for  (i = 0; i < m && i < n; ++i)
>     a[i] = b[i] + c[i];
> }
>
>
> Performed bootstrap and make check on: x86_64_unknown-linux-gnu,
> arm-linux-gnueabihf, and aarch64-linux-gnu.
> Okay for trunk?

So this works only for && that has been lowered to non-CFG form
(I suppose phiopt would catch that?  If not, ifcombine would be the
place to implement it I guess).

Furthermore it doesn't work for three such ops which would require
an additional pattern like

 (simplfiy
  (bit_and:c (op @0 (min @1 @2)) (op @0 @3))
  (op @0 (min (min @1 @2) @3))))

if that's profitable?

Richard.

> 2015-07-24  Michael Collison  <michael.collison@linaro.org>
>                     Andrew Pinski <andrew.pinski@caviumnetworks.com>
>
>                 * match.pd ((x < y) && (x < z) -> x < min (y,z),
>                 (x > y) and (x > z) -> x > max (y,z))
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5e8fd32..8691710 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1793,3 +1793,17 @@ along with GCC; see the file COPYING3.  If not see
>      (convert (bit_and (op (convert:utype @0) (convert:utype @1))
>                (convert:utype @4)))))))
>
> +
> +/* Transform (@0 < @1 and @0 < @2) to use min */
> +(for op (lt le)
> +(simplify
> +(bit_and:c (op @0 @1) (op @0 @2))
> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
> +(op @0 (min @1 @2)))))
> +
> +/* Transform (@0 > @1 and @0 > @2) to use max */
> +(for op (gt ge)
> +(simplify
> +(bit_and:c (op @0 @1) (op @0 @2))
> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
> +(op @0 (max @1 @2)))))
> --
>
> --
> Michael Collison
> Linaro Toolchain Working Group
> michael.collison@linaro.org
>
Jeff Law July 27, 2015, 4:20 p.m. UTC | #4
On 07/27/2015 03:25 AM, Richard Biener wrote:
> On Mon, Jul 27, 2015 at 5:41 AM, Michael Collison
> <michael.collison@linaro.org> wrote:
>> This patch is designed to optimize end of loop conditions involving of the
>> form
>>   i < x && i < y into i < min (x, y). Loop condition involving '>' are
>> handled similarly using max(x,y).
>> As an example:
>>
>> #define N 1024
>>
>> int  a[N], b[N], c[N];
>>
>> void add (unsignedint  m, unsignedint  n)
>> {
>>    unsignedint  i, bound = (m < n) ? m : n;
>>    for  (i = 0; i < m && i < n; ++i)
>>      a[i] = b[i] + c[i];
>> }
>>
>>
>> Performed bootstrap and make check on: x86_64_unknown-linux-gnu,
>> arm-linux-gnueabihf, and aarch64-linux-gnu.
>> Okay for trunk?
>
> So this works only for && that has been lowered to non-CFG form
> (I suppose phiopt would catch that?  If not, ifcombine would be the
> place to implement it I guess).
phiopt is supposed to be generating MIN/MAX expressions for us.  If it 
isn't it'd be good to see any testcases where it isn't.

I think that raises a general question though.  Does it make more sense 
to capture MIN/MAX (and others) in phiopt or in the match.pd framework?

Jeff
Richard Biener July 28, 2015, 7:41 a.m. UTC | #5
On Mon, Jul 27, 2015 at 6:20 PM, Jeff Law <law@redhat.com> wrote:
> On 07/27/2015 03:25 AM, Richard Biener wrote:
>>
>> On Mon, Jul 27, 2015 at 5:41 AM, Michael Collison
>> <michael.collison@linaro.org> wrote:
>>>
>>> This patch is designed to optimize end of loop conditions involving of
>>> the
>>> form
>>>   i < x && i < y into i < min (x, y). Loop condition involving '>' are
>>> handled similarly using max(x,y).
>>> As an example:
>>>
>>> #define N 1024
>>>
>>> int  a[N], b[N], c[N];
>>>
>>> void add (unsignedint  m, unsignedint  n)
>>> {
>>>    unsignedint  i, bound = (m < n) ? m : n;
>>>    for  (i = 0; i < m && i < n; ++i)
>>>      a[i] = b[i] + c[i];
>>> }
>>>
>>>
>>> Performed bootstrap and make check on: x86_64_unknown-linux-gnu,
>>> arm-linux-gnueabihf, and aarch64-linux-gnu.
>>> Okay for trunk?
>>
>>
>> So this works only for && that has been lowered to non-CFG form
>> (I suppose phiopt would catch that?  If not, ifcombine would be the
>> place to implement it I guess).
>
> phiopt is supposed to be generating MIN/MAX expressions for us.  If it isn't
> it'd be good to see any testcases where it isn't.
>
> I think that raises a general question though.  Does it make more sense to
> capture MIN/MAX (and others) in phiopt or in the match.pd framework?

match.pd is good for pattern recognition - patterns of fixed size.  There are
cases that are done in fold-const.c for example that doesn't fit very well
and should be done as separate pass, like for example figuring out whether
an expression can be easily negated or whether there are sign-changes that
can be stripped.  Basically all cases where fold currently recurses (unbound).

The above case is a corner case I think - the number of && you can change
into (multiple) MIN/MAX is unbound but we might only care about the case
where there will be one MIN/MAX operation.

Generally phiopt and other patterns that match the CFG are not yet well
supported by match.pd (though I outlined how matching PHI nodes when
facing (simplify (cond ...) ...) would be possible).

So while putting something into match.pd is easy I'd like people to
think if doing the same thing elsewhere is better - that is, if this is really
a pattern transform operation or if you are just implementing a special-case
of a general transform as a pattern.

Richard.

> Jeff
>
Michael Collison July 29, 2015, 10:10 p.m. UTC | #6
Richard and Jeff,

Any conclusion to this discussion? Is this okay in match.pd or would you 
like to see it implemented elsewhere?

On 7/28/2015 12:41 AM, Richard Biener wrote:
> On Mon, Jul 27, 2015 at 6:20 PM, Jeff Law <law@redhat.com> wrote:
>> On 07/27/2015 03:25 AM, Richard Biener wrote:
>>> On Mon, Jul 27, 2015 at 5:41 AM, Michael Collison
>>> <michael.collison@linaro.org> wrote:
>>>> This patch is designed to optimize end of loop conditions involving of
>>>> the
>>>> form
>>>>    i < x && i < y into i < min (x, y). Loop condition involving '>' are
>>>> handled similarly using max(x,y).
>>>> As an example:
>>>>
>>>> #define N 1024
>>>>
>>>> int  a[N], b[N], c[N];
>>>>
>>>> void add (unsignedint  m, unsignedint  n)
>>>> {
>>>>     unsignedint  i, bound = (m < n) ? m : n;
>>>>     for  (i = 0; i < m && i < n; ++i)
>>>>       a[i] = b[i] + c[i];
>>>> }
>>>>
>>>>
>>>> Performed bootstrap and make check on: x86_64_unknown-linux-gnu,
>>>> arm-linux-gnueabihf, and aarch64-linux-gnu.
>>>> Okay for trunk?
>>>
>>> So this works only for && that has been lowered to non-CFG form
>>> (I suppose phiopt would catch that?  If not, ifcombine would be the
>>> place to implement it I guess).
>> phiopt is supposed to be generating MIN/MAX expressions for us.  If it isn't
>> it'd be good to see any testcases where it isn't.
>>
>> I think that raises a general question though.  Does it make more sense to
>> capture MIN/MAX (and others) in phiopt or in the match.pd framework?
> match.pd is good for pattern recognition - patterns of fixed size.  There are
> cases that are done in fold-const.c for example that doesn't fit very well
> and should be done as separate pass, like for example figuring out whether
> an expression can be easily negated or whether there are sign-changes that
> can be stripped.  Basically all cases where fold currently recurses (unbound).
>
> The above case is a corner case I think - the number of && you can change
> into (multiple) MIN/MAX is unbound but we might only care about the case
> where there will be one MIN/MAX operation.
>
> Generally phiopt and other patterns that match the CFG are not yet well
> supported by match.pd (though I outlined how matching PHI nodes when
> facing (simplify (cond ...) ...) would be possible).
>
> So while putting something into match.pd is easy I'd like people to
> think if doing the same thing elsewhere is better - that is, if this is really
> a pattern transform operation or if you are just implementing a special-case
> of a general transform as a pattern.
>
> Richard.
>
>> Jeff
>>
Jeff Law July 31, 2015, 4:20 p.m. UTC | #7
On 07/28/2015 01:41 AM, Richard Biener wrote:
>
> The above case is a corner case I think - the number of && you can change
> into (multiple) MIN/MAX is unbound but we might only care about the case
> where there will be one MIN/MAX operation.
I suspect that's going to be the most important/common case.

>
> Generally phiopt and other patterns that match the CFG are not yet well
> supported by match.pd (though I outlined how matching PHI nodes when
> facing (simplify (cond ...) ...) would be possible).
Right.  Though I thought the conclusion after outlining we determined it 
wasn't really feasible, yet.


>
> So while putting something into match.pd is easy I'd like people to
> think if doing the same thing elsewhere is better - that is, if this is really
> a pattern transform operation or if you are just implementing a special-case
> of a general transform as a pattern.
So in this case we're taking something like:

  _6 = i_1 < m_4(D);
   _7 = i_1 < n_3(D);
   _8 = _6 & _7;
   if (_8 != 0)


And turning it into

_X = MIN (m_4, n_3)
if (i_1 < _X)

That seems to me like a good match for match.pd given its generality and 
the need to walk up the use-def chain.  It's certainly not a good fit 
for phi-opt since we're not looking at PHIs :-)



Michael -- can you take your sample code and turn it into a test for the 
testsuite.  I'd hazard a guess it'll need to be target specific because 
of its interactions with branch-costing.  Might as well make 4 variants 
(lt -> MIN, le -> MIN, ge->MAX, gt->MAX).

We're going to want that regardless of whether tackling this issue in 
match.pd (my current preference) or elsewhere.

jeff
Michael Collison July 31, 2015, 6:18 p.m. UTC | #8
Hi Jeff,

Yes I will create a test case. I'm not quite sure what to check for even 
in the machine dependent test case. It's quite possible for the 
instructions that are generated to change over time.

On 7/31/2015 9:20 AM, Jeff Law wrote:
> On 07/28/2015 01:41 AM, Richard Biener wrote:
>>
>> The above case is a corner case I think - the number of && you can 
>> change
>> into (multiple) MIN/MAX is unbound but we might only care about the case
>> where there will be one MIN/MAX operation.
> I suspect that's going to be the most important/common case.
>
>>
>> Generally phiopt and other patterns that match the CFG are not yet well
>> supported by match.pd (though I outlined how matching PHI nodes when
>> facing (simplify (cond ...) ...) would be possible).
> Right.  Though I thought the conclusion after outlining we determined 
> it wasn't really feasible, yet.
>
>
>>
>> So while putting something into match.pd is easy I'd like people to
>> think if doing the same thing elsewhere is better - that is, if this 
>> is really
>> a pattern transform operation or if you are just implementing a 
>> special-case
>> of a general transform as a pattern.
> So in this case we're taking something like:
>
>  _6 = i_1 < m_4(D);
>   _7 = i_1 < n_3(D);
>   _8 = _6 & _7;
>   if (_8 != 0)
>
>
> And turning it into
>
> _X = MIN (m_4, n_3)
> if (i_1 < _X)
>
> That seems to me like a good match for match.pd given its generality 
> and the need to walk up the use-def chain.  It's certainly not a good 
> fit for phi-opt since we're not looking at PHIs :-)
>
>
>
> Michael -- can you take your sample code and turn it into a test for 
> the testsuite.  I'd hazard a guess it'll need to be target specific 
> because of its interactions with branch-costing.  Might as well make 4 
> variants (lt -> MIN, le -> MIN, ge->MAX, gt->MAX).
>
> We're going to want that regardless of whether tackling this issue in 
> match.pd (my current preference) or elsewhere.
>
> jeff
Jeff Law July 31, 2015, 6:27 p.m. UTC | #9
On 07/31/2015 12:18 PM, Michael Collison wrote:
> Hi Jeff,
>
> Yes I will create a test case. I'm not quite sure what to check for even
> in the machine dependent test case. It's quite possible for the
> instructions that are generated to change over time.
I think we're going to want to look at the gimple IR and search for the 
MIN/MAX expressions rather than the instructions.  Given we don't know 
where the transformation is going to land (yet), you can probably start 
with -fdump-tree-optimized and scanning the .optimized dump.

We can still do that and have the test be target specific.

jeff
Richard Biener Aug. 3, 2015, 7:34 a.m. UTC | #10
On Fri, Jul 31, 2015 at 8:27 PM, Jeff Law <law@redhat.com> wrote:
> On 07/31/2015 12:18 PM, Michael Collison wrote:
>>
>> Hi Jeff,
>>
>> Yes I will create a test case. I'm not quite sure what to check for even
>> in the machine dependent test case. It's quite possible for the
>> instructions that are generated to change over time.
>
> I think we're going to want to look at the gimple IR and search for the
> MIN/MAX expressions rather than the instructions.  Given we don't know where
> the transformation is going to land (yet), you can probably start with
> -fdump-tree-optimized and scanning the .optimized dump.

Yeah.  FWIW I'm ok with the specialized pattern in match.pd.  Indeed phiopt
isn't a good fit here (though on some targets you may end up seeing a PHI node,
dependent on LOGICAL_OP_NON_SHORT_CIRCUIT).  For longer chains
I'd say reassoc is the appropriate place to handle this as it has
already code to
combine &&s and ||s of conditions.  Maybe you can give it a short try to only
handle it there.

I'm also somewhat worried about code-generation on random targets.  So can
you have a quick look (just for your testcase) what the code-gen difference is
on primary platforms (just build some stage1 cc1s)?

Thanks,
Richard.

> We can still do that and have the test be target specific.
>
> jeff
>
Alan Lawrence Aug. 5, 2015, 4:07 p.m. UTC | #11
Richard Biener wrote:

> Furthermore it doesn't work for three such ops which would require
> an additional pattern like
> 
>  (simplfiy
>   (bit_and:c (op @0 (min @1 @2)) (op @0 @3))
>   (op @0 (min (min @1 @2) @3))))
> 
> if that's profitable?

Shouldn't that be just a case of binding @1 in the original pattern:

>> +/* Transform (@0 < @1 and @0 < @2) to use min */
>> +(for op (lt le)
>> +(simplify
>> +(bit_and:c (op @0 @1) (op @0 @2))
 >> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
 >> +(op @0 (min @1 @2)))))

to (min @1 @2) in your three-way pattern, @2 in the original to @3 in yours, and 
@0 to @0 ?

Or is @1 not allowed to bind to a 'min' ?


Secondly: should/can Michael's fix reference PR57600 ?

Cheers,
Alan
Jeff Law Aug. 5, 2015, 4:15 p.m. UTC | #12
On 08/05/2015 10:07 AM, Alan Lawrence wrote:
> Richard Biener wrote:
>
>> Furthermore it doesn't work for three such ops which would require
>> an additional pattern like
>>
>>  (simplfiy
>>   (bit_and:c (op @0 (min @1 @2)) (op @0 @3))
>>   (op @0 (min (min @1 @2) @3))))
>>
>> if that's profitable?
>
> Shouldn't that be just a case of binding @1 in the original pattern:
>
>>> +/* Transform (@0 < @1 and @0 < @2) to use min */
>>> +(for op (lt le)
>>> +(simplify
>>> +(bit_and:c (op @0 @1) (op @0 @2))
>  >> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
>  >> +(op @0 (min @1 @2)))))
>
> to (min @1 @2) in your three-way pattern, @2 in the original to @3 in
> yours, and @0 to @0 ?
>
> Or is @1 not allowed to bind to a 'min' ?
>
>
> Secondly: should/can Michael's fix reference PR57600 ?
It probably should.

Reading that BZ ISTM that we probably are missing some expression 
equivalences in DOM.  Given

x = min (a, b)

We should be recording
x <= a -> true
x <= b -> true

in our expression equivalency tables (plus all the related 
equivalences).  I'm not sure if that's the cause of the missed 
optimizations or not though.  Just something I noticed while reading the BZ.

jeff
diff mbox

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index 5e8fd32..8691710 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1793,3 +1793,17 @@  along with GCC; see the file COPYING3.  If not see
      (convert (bit_and (op (convert:utype @0) (convert:utype @1))
                (convert:utype @4)))))))

+
+/* Transform (@0 < @1 and @0 < @2) to use min */
+(for op (lt le)
+(simplify
+(bit_and:c (op @0 @1) (op @0 @2))
+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+(op @0 (min @1 @2)))))
+
+/* Transform (@0 > @1 and @0 > @2) to use max */
+(for op (gt ge)
+(simplify
+(bit_and:c (op @0 @1) (op @0 @2))
+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+(op @0 (max @1 @2)))))