Fold some equal to and not equal to patterns in match.pd

Message ID	DM2PR0701MB1018928F9138DEBCB2A0E9108E840@DM2PR0701MB1018.namprd07.prod.outlook.com
State	New
Headers	show Return-Path: <gcc-patches-return-403451-incoming=patchwork.ozlabs.org@gcc.gnu.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:content-type:mime-version; q=dns; s=default; b=MGsGQMlSeGAegBm25kxfIDes/fkVjaeXnjicLwFb4P+J5TQzP3 yArM13RLGZLnAOz678FAteRCjl1KoZXB9cdcgh2yoOIYj54X21TvdLo4fW5eORrm DxbfvE9G9YUipXctGco6426P52Sfg0ShDAwa8SXvAanwcd8n/Q1UlChD0= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org From: "Hurugalawadi, Naveen" <Naveen.Hurugalawadi@caviumnetworks.com> To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org> CC: "Pinski, Andrew" <Andrew.Pinski@caviumnetworks.com> Subject: Fold some equal to and not equal to patterns in match.pd Date: Tue, 21 Jul 2015 09:15:31 +0000 Message-ID: <DM2PR0701MB1018928F9138DEBCB2A0E9108E840@DM2PR0701MB1018.namprd07.prod.outlook.com> by1pr0701mb1237: X-MS-Exchange-Organization-RulesExecuted Content-Type: multipart/mixed; boundary="_002_DM2PR0701MB1018928F9138DEBCB2A0E9108E840DM2PR0701MB1018_" MIME-Version: 1.0

Hurugalawadi, Naveen July 21, 2015, 9:15 a.m. UTC

Hi,

Please find attached the patch which performs following patterns folding
in match.pd:-

a ==/!= a p+ b to b ==/!= 0.
a << N ==/!= 0 to a&(-1>>N) ==/!= 0.
a * N ==/!= 0 where N is a power of 2 to a & (-1<<N2) ==/!= 0 where N2 is log2 of N.

Please review the same and let us know if its okay?

Regression Tested on X86_64.

On Behalf of Andrew Pinski.

Thanks,

gcc/testsuite/ChangeLog:

2015-01-21  Andrew Pinski  <apinski@cavium.com>

	* testsuite/gcc.dg/tree-ssa/compare-shiftmult-1.c: New testcase.
	* testsuite/gcc.dg/tree-ssa/compare_pointers-1.c: New testcase.

gcc/ChangeLog:

2015-01-21  Andrew Pinski  <apinski@cavium.com>

	* match.pd (define_predicates): Add integer_pow2p.
	Add pattern for folding of a ==/!= a p+ b to b ==/!= 0.
	(unsigned_integral_valued_p): New match.
	Add pattern for folding of a<<N ==/!= 0 to a&(-1>>N) ==/!= 0.
	Add pattern for folding of a*N ==/!= 0 where N is a power of 2
	to a&(-1<<N2) ==/!= 0 where N2 is log2 of N.

Andrew Pinski July 21, 2015, 9:26 a.m. UTC | #1

> On Jul 21, 2015, at 2:15 AM, Hurugalawadi, Naveen <Naveen.Hurugalawadi@caviumnetworks.com> wrote:
> 
> Hi,
> 
> Please find attached the patch which performs following patterns folding
> in match.pd:-
> 
> a ==/!= a p+ b to b ==/!= 0.
> a << N ==/!= 0 to a&(-1>>N) ==/!= 0.
> a * N ==/!= 0 where N is a power of 2 to a & (-1<<N2) ==/!= 0 where N2 is log2 of N.
> 
> Please review the same and let us know if its okay?

I should note this shows up in perlbmk in spec 2006. 

Thanks,
Andrew

> 
> Regression Tested on X86_64.
> 
> On Behalf of Andrew Pinski.
> 
> Thanks,
> 
> gcc/testsuite/ChangeLog:
> 
> 2015-01-21  Andrew Pinski  <apinski@cavium.com>
> 
>    * testsuite/gcc.dg/tree-ssa/compare-shiftmult-1.c: New testcase.
>    * testsuite/gcc.dg/tree-ssa/compare_pointers-1.c: New testcase.
> 
> gcc/ChangeLog:
> 
> 2015-01-21  Andrew Pinski  <apinski@cavium.com>
> 
>    * match.pd (define_predicates): Add integer_pow2p.
>    Add pattern for folding of a ==/!= a p+ b to b ==/!= 0.
>    (unsigned_integral_valued_p): New match.
>    Add pattern for folding of a<<N ==/!= 0 to a&(-1>>N) ==/!= 0.
>    Add pattern for folding of a*N ==/!= 0 where N is a power of 2
>    to a&(-1<<N2) ==/!= 0 where N2 is log2 of N.
> <gcc_match.patch>

Jakub Jelinek July 21, 2015, 9:38 a.m. UTC | #2

On Tue, Jul 21, 2015 at 09:15:31AM +0000, Hurugalawadi, Naveen wrote:
> Please find attached the patch which performs following patterns folding
> in match.pd:-
> 
> a ==/!= a p+ b to b ==/!= 0.
> a << N ==/!= 0 to a&(-1>>N) ==/!= 0.

Not sure about this second one.  Why do you think it is generally
beneficial?  On many targets, shifts are as fast as bitwise and, and
-1>>N could be e.g. significantly more expensive constant (say require
3 instructions to construct).

	Jakub

Kyrylo Tkachov July 21, 2015, 10:31 a.m. UTC | #3

On 21/07/15 10:26, pinskia@gmail.com wrote:
>
>
>
>> On Jul 21, 2015, at 2:15 AM, Hurugalawadi, Naveen <Naveen.Hurugalawadi@caviumnetworks.com> wrote:
>>
>> Hi,
>>
>> Please find attached the patch which performs following patterns folding
>> in match.pd:-
>>
>> a ==/!= a p+ b to b ==/!= 0.
>> a << N ==/!= 0 to a&(-1>>N) ==/!= 0.
>> a * N ==/!= 0 where N is a power of 2 to a & (-1<<N2) ==/!= 0 where N2 is log2 of N.
>>
>> Please review the same and let us know if its okay?
> I should note this shows up in perlbmk in spec 2006.

Yes, I see it triggering there for aarch64, but I also see some undesired effects,
for example in gcc:
     lsl    x24, x24, 3
     cbz    x24, .L1194

now becomes:
     and    x0, x24, 2305843009213693951
     lsl    x24, x24, 3
     cbz    x0, .L1194

> Thanks,
> Andrew
>
>> Regression Tested on X86_64.
>>
>> On Behalf of Andrew Pinski.
>>
>> Thanks,
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2015-01-21  Andrew Pinski  <apinski@cavium.com>
>>
>>     * testsuite/gcc.dg/tree-ssa/compare-shiftmult-1.c: New testcase.
>>     * testsuite/gcc.dg/tree-ssa/compare_pointers-1.c: New testcase.
>>
>> gcc/ChangeLog:
>>
>> 2015-01-21  Andrew Pinski  <apinski@cavium.com>
>>
>>     * match.pd (define_predicates): Add integer_pow2p.
>>     Add pattern for folding of a ==/!= a p+ b to b ==/!= 0.
>>     (unsigned_integral_valued_p): New match.
>>     Add pattern for folding of a<<N ==/!= 0 to a&(-1>>N) ==/!= 0.
>>     Add pattern for folding of a*N ==/!= 0 where N is a power of 2
>>     to a&(-1<<N2) ==/!= 0 where N2 is log2 of N.
>> <gcc_match.patch>

Andrew Pinski July 21, 2015, 10:46 a.m. UTC | #4

> On Jul 21, 2015, at 3:31 AM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
> 
> 
>> On 21/07/15 10:26, pinskia@gmail.com wrote:
>> 
>> 
>> 
>>> On Jul 21, 2015, at 2:15 AM, Hurugalawadi, Naveen <Naveen.Hurugalawadi@caviumnetworks.com> wrote:
>>> 
>>> Hi,
>>> 
>>> Please find attached the patch which performs following patterns folding
>>> in match.pd:-
>>> 
>>> a ==/!= a p+ b to b ==/!= 0.
>>> a << N ==/!= 0 to a&(-1>>N) ==/!= 0.
>>> a * N ==/!= 0 where N is a power of 2 to a & (-1<<N2) ==/!= 0 where N2 is log2 of N.
>>> 
>>> Please review the same and let us know if its okay?
>> I should note this shows up in perlbmk in spec 2006.
> 
> Yes, I see it triggering there for aarch64, but I also see some undesired effects,
> for example in gcc:
>    lsl    x24, x24, 3
>    cbz    x24, .L1194
> 
> now becomes:
>    and    x0, x24, 2305843009213693951
>    lsl    x24, x24, 3
>    cbz    x0, .L1194

Shouldn't the and become a tst instead and the cbz be a b.eq?  Oh I have another patch which does that and the reason the performance for me does not regress on thunderx (tst and branches can merge before issue). 

Thanks,
Andrew

> 
>> Thanks,
>> Andrew
>> 
>>> Regression Tested on X86_64.
>>> 
>>> On Behalf of Andrew Pinski.
>>> 
>>> Thanks,
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>> 2015-01-21  Andrew Pinski  <apinski@cavium.com>
>>> 
>>>    * testsuite/gcc.dg/tree-ssa/compare-shiftmult-1.c: New testcase.
>>>    * testsuite/gcc.dg/tree-ssa/compare_pointers-1.c: New testcase.
>>> 
>>> gcc/ChangeLog:
>>> 
>>> 2015-01-21  Andrew Pinski  <apinski@cavium.com>
>>> 
>>>    * match.pd (define_predicates): Add integer_pow2p.
>>>    Add pattern for folding of a ==/!= a p+ b to b ==/!= 0.
>>>    (unsigned_integral_valued_p): New match.
>>>    Add pattern for folding of a<<N ==/!= 0 to a&(-1>>N) ==/!= 0.
>>>    Add pattern for folding of a*N ==/!= 0 where N is a power of 2
>>>    to a&(-1<<N2) ==/!= 0 where N2 is log2 of N.
>>> <gcc_match.patch>
>

Kyrylo Tkachov July 21, 2015, 10:52 a.m. UTC | #5

On 21/07/15 11:46, pinskia@gmail.com wrote:
>
>
>
>> On Jul 21, 2015, at 3:31 AM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>>
>>
>>> On 21/07/15 10:26, pinskia@gmail.com wrote:
>>>
>>>
>>>
>>>> On Jul 21, 2015, at 2:15 AM, Hurugalawadi, Naveen <Naveen.Hurugalawadi@caviumnetworks.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Please find attached the patch which performs following patterns folding
>>>> in match.pd:-
>>>>
>>>> a ==/!= a p+ b to b ==/!= 0.
>>>> a << N ==/!= 0 to a&(-1>>N) ==/!= 0.
>>>> a * N ==/!= 0 where N is a power of 2 to a & (-1<<N2) ==/!= 0 where N2 is log2 of N.
>>>>
>>>> Please review the same and let us know if its okay?
>>> I should note this shows up in perlbmk in spec 2006.
>> Yes, I see it triggering there for aarch64, but I also see some undesired effects,
>> for example in gcc:
>>     lsl    x24, x24, 3
>>     cbz    x24, .L1194
>>
>> now becomes:
>>     and    x0, x24, 2305843009213693951
>>     lsl    x24, x24, 3
>>     cbz    x0, .L1194
> Shouldn't the and become a tst instead and the cbz be a b.eq?  Oh I have another patch which does that and the reason the performance for me does not regress on thunderx (tst and branches can merge before issue).

well, the overall effect in perlbench and gcc is beneficial (at least code size-wise).
It's just a few warts like this here and there.

Kyrill

>
> Thanks,
> Andrew
>
>>> Thanks,
>>> Andrew
>>>
>>>> Regression Tested on X86_64.
>>>>
>>>> On Behalf of Andrew Pinski.
>>>>
>>>> Thanks,
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 2015-01-21  Andrew Pinski  <apinski@cavium.com>
>>>>
>>>>     * testsuite/gcc.dg/tree-ssa/compare-shiftmult-1.c: New testcase.
>>>>     * testsuite/gcc.dg/tree-ssa/compare_pointers-1.c: New testcase.
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 2015-01-21  Andrew Pinski  <apinski@cavium.com>
>>>>
>>>>     * match.pd (define_predicates): Add integer_pow2p.
>>>>     Add pattern for folding of a ==/!= a p+ b to b ==/!= 0.
>>>>     (unsigned_integral_valued_p): New match.
>>>>     Add pattern for folding of a<<N ==/!= 0 to a&(-1>>N) ==/!= 0.
>>>>     Add pattern for folding of a*N ==/!= 0 where N is a power of 2
>>>>     to a&(-1<<N2) ==/!= 0 where N2 is log2 of N.
>>>> <gcc_match.patch>

Kyrylo Tkachov July 21, 2015, 2:57 p.m. UTC | #6

On 21/07/15 11:46, pinskia@gmail.com wrote:
>
>
>
>> On Jul 21, 2015, at 3:31 AM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>>
>>
>>> On 21/07/15 10:26, pinskia@gmail.com wrote:
>>>
>>>
>>>
>>>> On Jul 21, 2015, at 2:15 AM, Hurugalawadi, Naveen <Naveen.Hurugalawadi@caviumnetworks.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Please find attached the patch which performs following patterns folding
>>>> in match.pd:-
>>>>
>>>> a ==/!= a p+ b to b ==/!= 0.
>>>> a << N ==/!= 0 to a&(-1>>N) ==/!= 0.
>>>> a * N ==/!= 0 where N is a power of 2 to a & (-1<<N2) ==/!= 0 where N2 is log2 of N.
>>>>
>>>> Please review the same and let us know if its okay?
>>> I should note this shows up in perlbmk in spec 2006.
>> Yes, I see it triggering there for aarch64, but I also see some undesired effects,
>> for example in gcc:
>>     lsl    x24, x24, 3
>>     cbz    x24, .L1194
>>
>> now becomes:
>>     and    x0, x24, 2305843009213693951
>>     lsl    x24, x24, 3
>>     cbz    x0, .L1194
> Shouldn't the and become a tst instead and the cbz be a b.eq?  Oh I have another patch which does that and the reason the performance for me does not regress on thunderx (tst and branches can merge before issue).

I also see a code size regression on xalancbmk.
If you have an aarch64 patch that mitigates this there, could you post that as well?
Would be nice to get the benefits of this folding without regressing existing cases on aarch64.

Thanks,
Kyrill

>
> Thanks,
> Andrew
>
>>> Thanks,
>>> Andrew
>>>
>>>> Regression Tested on X86_64.
>>>>
>>>> On Behalf of Andrew Pinski.
>>>>
>>>> Thanks,
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 2015-01-21  Andrew Pinski  <apinski@cavium.com>
>>>>
>>>>     * testsuite/gcc.dg/tree-ssa/compare-shiftmult-1.c: New testcase.
>>>>     * testsuite/gcc.dg/tree-ssa/compare_pointers-1.c: New testcase.
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 2015-01-21  Andrew Pinski  <apinski@cavium.com>
>>>>
>>>>     * match.pd (define_predicates): Add integer_pow2p.
>>>>     Add pattern for folding of a ==/!= a p+ b to b ==/!= 0.
>>>>     (unsigned_integral_valued_p): New match.
>>>>     Add pattern for folding of a<<N ==/!= 0 to a&(-1>>N) ==/!= 0.
>>>>     Add pattern for folding of a*N ==/!= 0 where N is a power of 2
>>>>     to a&(-1<<N2) ==/!= 0 where N2 is log2 of N.
>>>> <gcc_match.patch>

Richard Biener July 21, 2015, 7:16 p.m. UTC | #7

On July 21, 2015 11:38:31 AM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote:
>On Tue, Jul 21, 2015 at 09:15:31AM +0000, Hurugalawadi, Naveen wrote:
>> Please find attached the patch which performs following patterns
>folding
>> in match.pd:-
>> 
>> a ==/!= a p+ b to b ==/!= 0.
>> a << N ==/!= 0 to a&(-1>>N) ==/!= 0.
>
>Not sure about this second one.  Why do you think it is generally
>beneficial?  On many targets, shifts are as fast as bitwise and, and
>-1>>N could be e.g. significantly more expensive constant (say require
>3 instructions to construct).

And may set flags while shift not? Of course we do a very poor job of representing this kind of stuff on gimple.

Richard.

>	Jakub

Andrew Pinski July 22, 2015, 12:40 a.m. UTC | #8

On Tue, Jul 21, 2015 at 12:16 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On July 21, 2015 11:38:31 AM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote:
>>On Tue, Jul 21, 2015 at 09:15:31AM +0000, Hurugalawadi, Naveen wrote:
>>> Please find attached the patch which performs following patterns
>>folding
>>> in match.pd:-
>>>
>>> a ==/!= a p+ b to b ==/!= 0.
>>> a << N ==/!= 0 to a&(-1>>N) ==/!= 0.
>>
>>Not sure about this second one.  Why do you think it is generally
>>beneficial?  On many targets, shifts are as fast as bitwise and, and
>>-1>>N could be e.g. significantly more expensive constant (say require
>>3 instructions to construct).
>
> And may set flags while shift not? Of course we do a very poor job of representing this kind of stuff on gimple.

The biggest question now becomes which way is the canonical form for
gimple and we can decide to optimize it on the RTL level (combine)
instead if it produces better code in those cases.
Note on AARCH64, producing x&(-1>>N) has no cost difference from a<<N
so we would like to do it there.  Also in this case producing flags is
useful.

Thanks,
Andrew

>
> Richard.
>
>>       Jakub
>
>

Richard Biener July 22, 2015, 11:04 a.m. UTC | #9

On Wed, Jul 22, 2015 at 2:40 AM, Andrew Pinski
<andrew.pinski@caviumnetworks.com> wrote:
> On Tue, Jul 21, 2015 at 12:16 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On July 21, 2015 11:38:31 AM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote:
>>>On Tue, Jul 21, 2015 at 09:15:31AM +0000, Hurugalawadi, Naveen wrote:
>>>> Please find attached the patch which performs following patterns
>>>folding
>>>> in match.pd:-
>>>>
>>>> a ==/!= a p+ b to b ==/!= 0.
>>>> a << N ==/!= 0 to a&(-1>>N) ==/!= 0.
>>>
>>>Not sure about this second one.  Why do you think it is generally
>>>beneficial?  On many targets, shifts are as fast as bitwise and, and
>>>-1>>N could be e.g. significantly more expensive constant (say require
>>>3 instructions to construct).
>>
>> And may set flags while shift not? Of course we do a very poor job of representing this kind of stuff on gimple.
>
> The biggest question now becomes which way is the canonical form for
> gimple and we can decide to optimize it on the RTL level (combine)
> instead if it produces better code in those cases.
> Note on AARCH64, producing x&(-1>>N) has no cost difference from a<<N
> so we would like to do it there.  Also in this case producing flags is
> useful.

Canonical GIMPLE is what has less GIMPLE operations.  In this case it's
a << N ==/!= 0 (two ops), not a & (-1 >> N) ==/!= 0 (three ops).

Thus on GIMPLE the transform is not wanted.

Richard.

> Thanks,
> Andrew
>
>>
>> Richard.
>>
>>>       Jakub
>>
>>

Richard Biener July 22, 2015, 11:18 a.m. UTC | #10

On Tue, Jul 21, 2015 at 11:15 AM, Hurugalawadi, Naveen
<Naveen.Hurugalawadi@caviumnetworks.com> wrote:
> Hi,
>
> Please find attached the patch which performs following patterns folding
> in match.pd:-
>
> a ==/!= a p+ b to b ==/!= 0.
> a << N ==/!= 0 to a&(-1>>N) ==/!= 0.
> a * N ==/!= 0 where N is a power of 2 to a & (-1<<N2) ==/!= 0 where N2 is log2 of N.
>
> Please review the same and let us know if its okay?

+(match unsigned_integral_valued_p
+ @0
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type))))

please avoid adding matches for simple things like this, instead inline the if.

> a * N ==/!= 0 where N is a power of 2 to a & (-1<<N2) ==/!= 0 where N2 is log2 of N.

We have the similar (for the signed case)

/* Transform comparisons of the form X * C1 CMP 0 to X CMP 0 in the
   signed arithmetic case.  That form is created by the compiler
   often enough for folding it to be of value.  One example is in
   computing loop trip counts after Operator Strength Reduction.  */

which shows the transform may be valid for other comparison codes as well.
Note as Jakub says the (-1 << N2) constant may be expensive to generate.
Did you check if we expand to a multiply again if that is the case?

Also you have the same idea in two patterns so if a & (-1 << N2) should
be canonical on GIMPLE (I'd say gimple should prefer "smaller" constants
when the number of operations is the same) then why not simplify
all multiplies that way instead of just those feeding comparisons?  That is,
a * N -> a << N?

So we have the remaining

+/* Fold a ==/!= a p + b to b ==/!= 0.  */
+(for op (ne eq)
+ (simplify
+  (op:c @0 (pointer_plus @0 @1))
+  (op @1 { build_zero_cst (TREE_TYPE (@1)); })))

fold_comparison has the more "complete"

  /* For comparisons of pointers we can decompose it to a compile time
     comparison of the base objects and the offsets into the object.
     This requires at least one operand being an ADDR_EXPR or a
     POINTER_PLUS_EXPR to do more than the operand_equal_p test below.  */

but it looks like you are implementing fold_binarys

      /* Transform comparisons of the form X +- Y CMP X to Y CMP 0.  */
      if ((TREE_CODE (arg0) == PLUS_EXPR
           || TREE_CODE (arg0) == POINTER_PLUS_EXPR
           || TREE_CODE (arg0) == MINUS_EXPR)
          && operand_equal_p (tree_strip_nop_conversions (TREE_OPERAND (arg0,
                                                                        0)),
                              arg1, 0)
          && (INTEGRAL_TYPE_P (TREE_TYPE (arg0))
              || POINTER_TYPE_P (TREE_TYPE (arg0))))
        {
          tree val = TREE_OPERAND (arg0, 1);
          return omit_two_operands_loc (loc, type,
                                    fold_build2_loc (loc, code, type,
                                                 val,
                                                 build_int_cst (TREE_TYPE (val),
                                                                0)),
                                    TREE_OPERAND (arg0, 0), arg1);
        }

but leave out non-pointer support.  So please start over with just
moving the above
code from fold-const.c to match.pd as a pattern.

Thanks,
Richard.


> Regression Tested on X86_64.
>
> On Behalf of Andrew Pinski.
>
> Thanks,
>
> gcc/testsuite/ChangeLog:
>
> 2015-01-21  Andrew Pinski  <apinski@cavium.com>
>
>         * testsuite/gcc.dg/tree-ssa/compare-shiftmult-1.c: New testcase.
>         * testsuite/gcc.dg/tree-ssa/compare_pointers-1.c: New testcase.
>
> gcc/ChangeLog:
>
> 2015-01-21  Andrew Pinski  <apinski@cavium.com>
>
>         * match.pd (define_predicates): Add integer_pow2p.
>         Add pattern for folding of a ==/!= a p+ b to b ==/!= 0.
>         (unsigned_integral_valued_p): New match.
>         Add pattern for folding of a<<N ==/!= 0 to a&(-1>>N) ==/!= 0.
>         Add pattern for folding of a*N ==/!= 0 where N is a power of 2
>         to a&(-1<<N2) ==/!= 0 where N2 is log2 of N.

Segher Boessenkool July 22, 2015, 12:56 p.m. UTC | #11

On Tue, Jul 21, 2015 at 05:40:07PM -0700, Andrew Pinski wrote:
> The biggest question now becomes which way is the canonical form for
> gimple and we can decide to optimize it on the RTL level (combine)
> instead if it produces better code in those cases.

combine does not do instruction selection in general; it only does
instruction combination.  It already handles most cases where shifts
and masks are combined; if you find one where it doesn't, please
report.

Segher

Jeff Law July 23, 2015, 4:09 p.m. UTC | #12

On 07/21/2015 06:40 PM, Andrew Pinski wrote:
> On Tue, Jul 21, 2015 at 12:16 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On July 21, 2015 11:38:31 AM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote:
>>> On Tue, Jul 21, 2015 at 09:15:31AM +0000, Hurugalawadi, Naveen wrote:
>>>> Please find attached the patch which performs following patterns
>>> folding
>>>> in match.pd:-
>>>>
>>>> a ==/!= a p+ b to b ==/!= 0.
>>>> a << N ==/!= 0 to a&(-1>>N) ==/!= 0.
>>>
>>> Not sure about this second one.  Why do you think it is generally
>>> beneficial?  On many targets, shifts are as fast as bitwise and, and
>>> -1>>N could be e.g. significantly more expensive constant (say require
>>> 3 instructions to construct).
>>
>> And may set flags while shift not? Of course we do a very poor job of representing this kind of stuff on gimple.
>
> The biggest question now becomes which way is the canonical form for
> gimple and we can decide to optimize it on the RTL level (combine)
> instead if it produces better code in those cases.
> Note on AARCH64, producing x&(-1>>N) has no cost difference from a<<N
> so we would like to do it there.  Also in this case producing flags is
> useful.
Funny I was having a discussion with Kai about this class of problems 
yesterday.

It seems to me in these kind of cases that selection of the canonical 
form should be driven by factors outside of which is better for a 
particular target.  ie, which is simpler

Instead we should be looking at the gimple->rtl expansion phase to 
expand using a form that is better for a particluar target.

[And yes, I think I've violated this new guiding principle in the past]

Given the (a << n) EQ/NE form, can we reasonably detect this in the 
gimple->expansion phase and emit code as if we had the alternate form 
for targets such as aarch64?

jeff

Segher Boessenkool July 23, 2015, 4:33 p.m. UTC | #13

On Thu, Jul 23, 2015 at 10:09:49AM -0600, Jeff Law wrote:
> It seems to me in these kind of cases that selection of the canonical 
> form should be driven by factors outside of which is better for a 
> particular target.  ie, which is simpler

I agree.  But neither form is simpler here, and we need to have both
forms in other contexts, so there is no real benefit to canonicalising.

> Instead we should be looking at the gimple->rtl expansion phase to 
> expand using a form that is better for a particluar target.
> 
> [And yes, I think I've violated this new guiding principle in the past]
> 
> Given the (a << n) EQ/NE form, can we reasonably detect this in the 
> gimple->expansion phase and emit code as if we had the alternate form 
> for targets such as aarch64?

In gimple, both forms have the same complexity as far as I see.

In RTL both forms have the same complexity as well.  RTX cost can be
higher for the AND form (or, in theory at least, can be lower for some
targets).

In my opinion expand shouldn't do anything special here, just emit RTL
like the gimple it is fed, and let the target deal with it, or the
generic RTL optimisers.  Or do we have an example where that does not
work or is inconvenient?

Segher

Jeff Law July 24, 2015, 5:54 a.m. UTC | #14

On 07/23/2015 10:33 AM, Segher Boessenkool wrote:
> On Thu, Jul 23, 2015 at 10:09:49AM -0600, Jeff Law wrote:
>> It seems to me in these kind of cases that selection of the canonical
>> form should be driven by factors outside of which is better for a
>> particular target.  ie, which is simpler
>
> I agree.  But neither form is simpler here, and we need to have both
> forms in other contexts, so there is no real benefit to canonicalising.

a << N ==/!= 0

Looks like two operations.  A shift and a comparison against zero 
regardless of whether or not N is constant.

a&(-1>>N) ==/!= 0

For a varying N, this has a shift, logical and and comparison against zero.

For a constant N obviously the shift collapses to a constant and we're 
left with two operations again.

So for gimple, I'd prefer to see us using the a << N form.

If we need both forms in other contexts, we ought to be looking to 
eliminate that need :-)

If we go to the RTL level, then it's more complex -- it might depend on 
the constant produced by the -1 >> N operation, whether or not the 
target can shift by more than one bit at a time (H8/300 series is 
limited here for example), whether or not one operation sets condition 
codes in a useful way, potentially allowing the comparison to be 
removed, etc etc.  rtx_costs, even with its limitations is probably the 
way to drive selection of form for the RTL optimizers.

Jeff

Kai Tietz July 24, 2015, 7:48 a.m. UTC | #15

2015-07-24 7:54 GMT+02:00 Jeff Law <law@redhat.com>:
> On 07/23/2015 10:33 AM, Segher Boessenkool wrote:
>>
>> On Thu, Jul 23, 2015 at 10:09:49AM -0600, Jeff Law wrote:
>>>
>>> It seems to me in these kind of cases that selection of the canonical
>>> form should be driven by factors outside of which is better for a
>>> particular target.  ie, which is simpler
>>
>>
>> I agree.  But neither form is simpler here, and we need to have both
>> forms in other contexts, so there is no real benefit to canonicalising.
>
>
>
> a << N ==/!= 0
>
> Looks like two operations.  A shift and a comparison against zero regardless
> of whether or not N is constant.
>
>
> a&(-1>>N) ==/!= 0
>
> For a varying N, this has a shift, logical and and comparison against zero.
>
> For a constant N obviously the shift collapses to a constant and we're left
> with two operations again.
>
> So for gimple, I'd prefer to see us using the a << N form.
>
> If we need both forms in other contexts, we ought to be looking to eliminate
> that need :-)
>
> If we go to the RTL level, then it's more complex -- it might depend on the
> constant produced by the -1 >> N operation, whether or not the target can
> shift by more than one bit at a time (H8/300 series is limited here for
> example), whether or not one operation sets condition codes in a useful way,
> potentially allowing the comparison to be removed, etc etc.  rtx_costs, even
> with its limitations is probably the way to drive selection of form for the
> RTL optimizers.
>
>
> Jeff

I fully agree.  But there are case there is not necessarily a 'better'
representation.  For example for sinking converts into shift-operation
can be tricky.

for 'typea a;'

(typeb) (a << N) -> ((typeb) a) << N

'bitsizeof (typeb) <= N' result is always zero.
'bitsizeof (typeb) > N' result needs to be calculated.

But it isn't necessarily easy to say if form '(typeb) (a << N)', or
'((typeb) a) << N' form is to be prefered.  It strongly depends on
expression this pattern is used in.

For RTL this pattern has another issue, as we need to take natural
mode into account here too,

We should define for such forms our 'normal' representation.

Kai

Richard Biener July 24, 2015, 9:26 a.m. UTC | #16

On Fri, Jul 24, 2015 at 9:48 AM, Kai Tietz <ktietz70@googlemail.com> wrote:
> 2015-07-24 7:54 GMT+02:00 Jeff Law <law@redhat.com>:
>> On 07/23/2015 10:33 AM, Segher Boessenkool wrote:
>>>
>>> On Thu, Jul 23, 2015 at 10:09:49AM -0600, Jeff Law wrote:
>>>>
>>>> It seems to me in these kind of cases that selection of the canonical
>>>> form should be driven by factors outside of which is better for a
>>>> particular target.  ie, which is simpler
>>>
>>>
>>> I agree.  But neither form is simpler here, and we need to have both
>>> forms in other contexts, so there is no real benefit to canonicalising.
>>
>>
>>
>> a << N ==/!= 0
>>
>> Looks like two operations.  A shift and a comparison against zero regardless
>> of whether or not N is constant.
>>
>>
>> a&(-1>>N) ==/!= 0
>>
>> For a varying N, this has a shift, logical and and comparison against zero.
>>
>> For a constant N obviously the shift collapses to a constant and we're left
>> with two operations again.
>>
>> So for gimple, I'd prefer to see us using the a << N form.
>>
>> If we need both forms in other contexts, we ought to be looking to eliminate
>> that need :-)
>>
>> If we go to the RTL level, then it's more complex -- it might depend on the
>> constant produced by the -1 >> N operation, whether or not the target can
>> shift by more than one bit at a time (H8/300 series is limited here for
>> example), whether or not one operation sets condition codes in a useful way,
>> potentially allowing the comparison to be removed, etc etc.  rtx_costs, even
>> with its limitations is probably the way to drive selection of form for the
>> RTL optimizers.
>>
>>
>> Jeff
>
> I fully agree.  But there are case there is not necessarily a 'better'
> representation.  For example for sinking converts into shift-operation
> can be tricky.
>
> for 'typea a;'
>
> (typeb) (a << N) -> ((typeb) a) << N
>
> 'bitsizeof (typeb) <= N' result is always zero.
> 'bitsizeof (typeb) > N' result needs to be calculated.
>
> But it isn't necessarily easy to say if form '(typeb) (a << N)', or
> '((typeb) a) << N' form is to be prefered.  It strongly depends on
> expression this pattern is used in.
>
> For RTL this pattern has another issue, as we need to take natural
> mode into account here too,
>
> We should define for such forms our 'normal' representation.

The usual reason for canonicalization is CSE.  In this case you'd
need to have

 X = a & (-1 >> N) == 0;
 Y = a << N == 0;

and want to CSE Y.  Might be not a very common pattern though.

Richard.

> Kai

Fold some equal to and not equal to patterns in match.pd

Commit Message

Comments

Patch