RFC: Two minor optimization patterns

Message ID	877fwquwug.fsf@rasmusvillemoes.dk
State	New
Headers	show Return-Path: <gcc-patches-return-389131-incoming=patchwork.ozlabs.org@gcc.gnu.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:mime-version:content-type; q=dns; s= default; b=OimT6FAiJd9lbxS+n5DbrKv/H8GPH0NvhNJIKc7D8XUvNio43vPS1 BClcIlQjzRb1pd5yVSwtFX68PsU6UQ9G6lg+S7ZkMGouJEYN5okEv/zsumfqajl2 Jyvhp0XCPjOD+G02+0SEKzEQOYIQK307LZJRH+46aGbxt7axOrxl8I= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org From: Rasmus Villemoes <rv@rasmusvillemoes.dk> To: gcc-patches@gcc.gnu.org Subject: RFC: Two minor optimization patterns Date: Tue, 13 Jan 2015 23:41:59 +0100 Message-ID: <877fwquwug.fsf@rasmusvillemoes.dk> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain

Rasmus Villemoes Jan. 13, 2015, 10:41 p.m. UTC

[My first attempt at submitting a patch for gcc, so please forgive me
if I'm not following the right protocol.]

Sometimes rounding a variable to the next even integer is written x += x
& 1. This usually means using an extra register (and hence at least an
extra mov instruction) compared to the equivalent x = (x + 1) & ~1. The
first pattern below tries to do this transformation.

While playing with various ways of rounding down, I noticed that gcc
already optimizes all of x-(x&3), x^(x&3) and x&~(x&3) to simply
x&~3. In fact, x&~(x&y) is rewritten as x&~y. However, the dual of this
is not handled, so I included the second pattern below.

I've tested the below in the sense that gcc compiles and that trivial
test cases get compiled as expected.

Rasmus

Andrew Pinski Jan. 13, 2015, 10:47 p.m. UTC | #1

On Tue, Jan 13, 2015 at 2:41 PM, Rasmus Villemoes <rv@rasmusvillemoes.dk> wrote:
> [My first attempt at submitting a patch for gcc, so please forgive me
> if I'm not following the right protocol.]

There are a few things missing.  For one, a testcase or two for the
added optimizations.

>
> Sometimes rounding a variable to the next even integer is written x += x
> & 1. This usually means using an extra register (and hence at least an
> extra mov instruction) compared to the equivalent x = (x + 1) & ~1. The
> first pattern below tries to do this transformation.
>
> While playing with various ways of rounding down, I noticed that gcc
> already optimizes all of x-(x&3), x^(x&3) and x&~(x&3) to simply
> x&~3. In fact, x&~(x&y) is rewritten as x&~y. However, the dual of this
> is not handled, so I included the second pattern below.
>
> I've tested the below in the sense that gcc compiles and that trivial
> test cases get compiled as expected.

The other thing you missed is a changelog entry for the change you did.
Also you mentioned you tested the patch below but did not mention
which target you tested it on and you should run the full GCC
testsuite.
https://gcc.gnu.org/contribute.html is a good page to start with how
to handle most of the items above.
https://gcc.gnu.org/wiki/HowToPrepareATestcase is a good page on how
to write the testcase for testing the added optimization.

Thanks,
Andrew Pinski

>
> Rasmus
>
>
>
> diff --git gcc/match.pd gcc/match.pd
> index 81c4ee6..04a0bc4 100644
> --- gcc/match.pd
> +++ gcc/match.pd
> @@ -262,6 +262,16 @@ along with GCC; see the file COPYING3.  If not see
>   (abs tree_expr_nonnegative_p@0)
>   @0)
>
> +/* x + (x & 1) -> (x + 1) & ~1 */
> +(simplify
> + (plus @0 (bit_and @0 integer_onep@1))
> + (bit_and (plus @0 @1) (bit_not @1)))
> +
> +/* x | ~(x | y) -> x | ~y */
> +(simplify
> + (bit_ior @0 (bit_not (bit_ior @0 @1)))
> + (bit_ior @0 (bit_not @1)))
> +
>
>  /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
>     when profitable.

Richard Biener Jan. 14, 2015, 9:24 a.m. UTC | #2

On Tue, Jan 13, 2015 at 11:47 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Tue, Jan 13, 2015 at 2:41 PM, Rasmus Villemoes <rv@rasmusvillemoes.dk> wrote:
>> [My first attempt at submitting a patch for gcc, so please forgive me
>> if I'm not following the right protocol.]
>
> There are a few things missing.  For one, a testcase or two for the
> added optimizations.
>
>>
>> Sometimes rounding a variable to the next even integer is written x += x
>> & 1. This usually means using an extra register (and hence at least an
>> extra mov instruction) compared to the equivalent x = (x + 1) & ~1. The
>> first pattern below tries to do this transformation.
>>
>> While playing with various ways of rounding down, I noticed that gcc
>> already optimizes all of x-(x&3), x^(x&3) and x&~(x&3) to simply
>> x&~3.

Does it also handle x+(x&3)?  Where does it handle x - (x&3)?

That is, doesn't the pattern also work for constants other than 1?

Please put it before the abs simplifications after the last one handing
bit_and/bit_ior.

Thanks,
Richard.

> In fact, x&~(x&y) is rewritten as x&~y. However, the dual of this
>> is not handled, so I included the second pattern below.
>>
>> I've tested the below in the sense that gcc compiles and that trivial
>> test cases get compiled as expected.
>
> The other thing you missed is a changelog entry for the change you did.
> Also you mentioned you tested the patch below but did not mention
> which target you tested it on and you should run the full GCC
> testsuite.
> https://gcc.gnu.org/contribute.html is a good page to start with how
> to handle most of the items above.
> https://gcc.gnu.org/wiki/HowToPrepareATestcase is a good page on how
> to write the testcase for testing the added optimization.
>
> Thanks,
> Andrew Pinski
>
>>
>> Rasmus
>>
>>
>>
>> diff --git gcc/match.pd gcc/match.pd
>> index 81c4ee6..04a0bc4 100644
>> --- gcc/match.pd
>> +++ gcc/match.pd
>> @@ -262,6 +262,16 @@ along with GCC; see the file COPYING3.  If not see
>>   (abs tree_expr_nonnegative_p@0)
>>   @0)
>>
>> +/* x + (x & 1) -> (x + 1) & ~1 */
>> +(simplify
>> + (plus @0 (bit_and @0 integer_onep@1))
>> + (bit_and (plus @0 @1) (bit_not @1)))
>> +
>> +/* x | ~(x | y) -> x | ~y */
>> +(simplify
>> + (bit_ior @0 (bit_not (bit_ior @0 @1)))
>> + (bit_ior @0 (bit_not @1)))
>> +
>>
>>  /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
>>     when profitable.

Rasmus Villemoes Jan. 14, 2015, 12:23 p.m. UTC | #3

On Wed, Jan 14 2015, Richard Biener <richard.guenther@gmail.com> wrote:

> On Tue, Jan 13, 2015 at 11:47 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>> On Tue, Jan 13, 2015 at 2:41 PM, Rasmus Villemoes <rv@rasmusvillemoes.dk> wrote:
>>> [My first attempt at submitting a patch for gcc, so please forgive me
>>> if I'm not following the right protocol.]
>>
>> There are a few things missing.  For one, a testcase or two for the
>> added optimizations.

I'll see what I can come up with. Thanks for the pointers.

>>> Sometimes rounding a variable to the next even integer is written x += x
>>> & 1. This usually means using an extra register (and hence at least an
>>> extra mov instruction) compared to the equivalent x = (x + 1) & ~1. The
>>> first pattern below tries to do this transformation.
>>>
>>> While playing with various ways of rounding down, I noticed that gcc
>>> already optimizes all of x-(x&3), x^(x&3) and x&~(x&3) to simply
>>> x&~3.
>
> Does it also handle x+(x&3)?

I'm not sure what 'it' refers to, and I'm also not sure how you think
x+(x&3) could be rewritten.

> Where does it handle x - (x&3)?

If by 'it' you mean gcc, I tried looking for a pattern matching this,
but couldn't find it, so I don't know where it is handled. I can just
see by running gcc-5.0 -fdump-tree-original -O2 -c opt.c that "x - (x &
3)" is rewritten as x & -4 (which is of course the same as x & ~3). Btw,
I now see that neither x&~(x&3) or x&~(x&y) are rewritten that early,
but objdump -d shows that the end result is the same.

> That is, doesn't the pattern also work for constants other than 1?

Here I assume that 'the pattern' refers to the first pattern, and the
answer is 'not immediately'. To round up a number to the next multiple
of 2^k we need to add the negative of that number modulo 2^k. It just so
happens that for k=1 we have x==-x for both possible values of x. So
with a little tweak, this does in fact lead to an optimization
opportunity, namely x + ((-x) & m) -> (x + m) & ~m whenever m is one
less than a power of 2. I don't know how to check for m satisfying this
in the match.pd language.

> Please put it before the abs simplifications after the last one handing
> bit_and/bit_ior.

OK, will do.

Rasmus

Marc Glisse Jan. 14, 2015, 12:45 p.m. UTC | #4

On Tue, 13 Jan 2015, Rasmus Villemoes wrote:

> diff --git gcc/match.pd gcc/match.pd
> index 81c4ee6..04a0bc4 100644
> --- gcc/match.pd
> +++ gcc/match.pd
> @@ -262,6 +262,16 @@ along with GCC; see the file COPYING3.  If not see
>  (abs tree_expr_nonnegative_p@0)
>  @0)
>
> +/* x + (x & 1) -> (x + 1) & ~1 */
> +(simplify
> + (plus @0 (bit_and @0 integer_onep@1))
> + (bit_and (plus @0 @1) (bit_not @1)))
> +
> +/* x | ~(x | y) -> x | ~y */
> +(simplify
> + (bit_ior @0 (bit_not (bit_ior @0 @1)))
> + (bit_ior @0 (bit_not @1)))

You may want to consider using has_single_use (see other patterns).

Richard Biener Jan. 14, 2015, 1:51 p.m. UTC | #5

On Wed, Jan 14, 2015 at 1:45 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Tue, 13 Jan 2015, Rasmus Villemoes wrote:
>
>> diff --git gcc/match.pd gcc/match.pd
>> index 81c4ee6..04a0bc4 100644
>> --- gcc/match.pd
>> +++ gcc/match.pd
>> @@ -262,6 +262,16 @@ along with GCC; see the file COPYING3.  If not see
>>  (abs tree_expr_nonnegative_p@0)
>>  @0)
>>
>> +/* x + (x & 1) -> (x + 1) & ~1 */
>> +(simplify
>> + (plus @0 (bit_and @0 integer_onep@1))
>> + (bit_and (plus @0 @1) (bit_not @1)))
>> +
>> +/* x | ~(x | y) -> x | ~y */
>> +(simplify
>> + (bit_ior @0 (bit_not (bit_ior @0 @1)))
>> + (bit_ior @0 (bit_not @1)))
>
>
> You may want to consider using has_single_use (see other patterns).

Just to clarify, you mean on the (x | y) sub-expression?

Richard.

> --
> Marc Glisse

Richard Biener Jan. 14, 2015, 1:54 p.m. UTC | #6

On Wed, Jan 14, 2015 at 1:23 PM, Rasmus Villemoes <rv@rasmusvillemoes.dk> wrote:
> On Wed, Jan 14 2015, Richard Biener <richard.guenther@gmail.com> wrote:
>
>> On Tue, Jan 13, 2015 at 11:47 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>>> On Tue, Jan 13, 2015 at 2:41 PM, Rasmus Villemoes <rv@rasmusvillemoes.dk> wrote:
>>>> [My first attempt at submitting a patch for gcc, so please forgive me
>>>> if I'm not following the right protocol.]
>>>
>>> There are a few things missing.  For one, a testcase or two for the
>>> added optimizations.
>
> I'll see what I can come up with. Thanks for the pointers.
>
>>>> Sometimes rounding a variable to the next even integer is written x += x
>>>> & 1. This usually means using an extra register (and hence at least an
>>>> extra mov instruction) compared to the equivalent x = (x + 1) & ~1. The
>>>> first pattern below tries to do this transformation.
>>>>
>>>> While playing with various ways of rounding down, I noticed that gcc
>>>> already optimizes all of x-(x&3), x^(x&3) and x&~(x&3) to simply
>>>> x&~3.
>>
>> Does it also handle x+(x&3)?
>
> I'm not sure what 'it' refers to, and I'm also not sure how you think
> x+(x&3) could be rewritten.

I was just guessing.

>> Where does it handle x - (x&3)?
>
> If by 'it' you mean gcc, I tried looking for a pattern matching this,
> but couldn't find it, so I don't know where it is handled. I can just
> see by running gcc-5.0 -fdump-tree-original -O2 -c opt.c that "x - (x &
> 3)" is rewritten as x & -4 (which is of course the same as x & ~3).

That's done in fold-const.c:fold_binary_loc here:

          /* Fold A - (A & B) into ~B & A.  */
          if (!TREE_SIDE_EFFECTS (arg0)
              && TREE_CODE (arg1) == BIT_AND_EXPR)
            {
...

(note that patterns are not fully moved to match.pd yet)

> Btw,
> I now see that neither x&~(x&3) or x&~(x&y) are rewritten that early,
> but objdump -d shows that the end result is the same.
>
>> That is, doesn't the pattern also work for constants other than 1?
>
> Here I assume that 'the pattern' refers to the first pattern, and the
> answer is 'not immediately'. To round up a number to the next multiple
> of 2^k we need to add the negative of that number modulo 2^k. It just so
> happens that for k=1 we have x==-x for both possible values of x. So
> with a little tweak, this does in fact lead to an optimization
> opportunity, namely x + ((-x) & m) -> (x + m) & ~m whenever m is one
> less than a power of 2. I don't know how to check for m satisfying this
> in the match.pd language.

you'd need to write some C code involving trees in a if/with.  We do
have a integer_pow2p predicate but not a integer_one_less_than_pow2p
one.

>
>> Please put it before the abs simplifications after the last one handing
>> bit_and/bit_ior.
>
> OK, will do.

Thanks,
Richard.

> Rasmus

Marc Glisse Jan. 14, 2015, 2:26 p.m. UTC | #7

On Wed, 14 Jan 2015, Richard Biener wrote:

> On Wed, Jan 14, 2015 at 1:45 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
>> On Tue, 13 Jan 2015, Rasmus Villemoes wrote:
>>
>>> diff --git gcc/match.pd gcc/match.pd
>>> index 81c4ee6..04a0bc4 100644
>>> --- gcc/match.pd
>>> +++ gcc/match.pd
>>> @@ -262,6 +262,16 @@ along with GCC; see the file COPYING3.  If not see
>>>  (abs tree_expr_nonnegative_p@0)
>>>  @0)
>>>
>>> +/* x + (x & 1) -> (x + 1) & ~1 */
>>> +(simplify
>>> + (plus @0 (bit_and @0 integer_onep@1))
>>> + (bit_and (plus @0 @1) (bit_not @1)))
>>> +
>>> +/* x | ~(x | y) -> x | ~y */
>>> +(simplify
>>> + (bit_ior @0 (bit_not (bit_ior @0 @1)))
>>> + (bit_ior @0 (bit_not @1)))
>>
>>
>> You may want to consider using has_single_use (see other patterns).
>
> Just to clarify, you mean on the (x | y) sub-expression?

I meant on (x & 1) for the first pattern and on ~(x | y) for the second 
one. That is, cases where the transformation could actually increase the 
number of statements (we don't have a convenient interface to check if 
(x+1) or ~y are already available).

(x | y) could sometimes be cheaper than y if it is already computed and it 
shortens the lifetime of y, but we are way too early in the pipeline to 
make an informed decision about that, so it seems better to canonicalize.

Now that I think about it, some platforms probably have an instruction 
ornot, so my reasons above could be wrong...

Richard Biener Jan. 14, 2015, 2:31 p.m. UTC | #8

On Wed, Jan 14, 2015 at 3:26 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Wed, 14 Jan 2015, Richard Biener wrote:
>
>> On Wed, Jan 14, 2015 at 1:45 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
>>>
>>> On Tue, 13 Jan 2015, Rasmus Villemoes wrote:
>>>
>>>> diff --git gcc/match.pd gcc/match.pd
>>>> index 81c4ee6..04a0bc4 100644
>>>> --- gcc/match.pd
>>>> +++ gcc/match.pd
>>>> @@ -262,6 +262,16 @@ along with GCC; see the file COPYING3.  If not see
>>>>  (abs tree_expr_nonnegative_p@0)
>>>>  @0)
>>>>
>>>> +/* x + (x & 1) -> (x + 1) & ~1 */
>>>> +(simplify
>>>> + (plus @0 (bit_and @0 integer_onep@1))
>>>> + (bit_and (plus @0 @1) (bit_not @1)))
>>>> +
>>>> +/* x | ~(x | y) -> x | ~y */
>>>> +(simplify
>>>> + (bit_ior @0 (bit_not (bit_ior @0 @1)))
>>>> + (bit_ior @0 (bit_not @1)))
>>>
>>>
>>>
>>> You may want to consider using has_single_use (see other patterns).
>>
>>
>> Just to clarify, you mean on the (x | y) sub-expression?
>
>
> I meant on (x & 1) for the first pattern and on ~(x | y) for the second one.
> That is, cases where the transformation could actually increase the number
> of statements (we don't have a convenient interface to check if (x+1) or ~y
> are already available).
>
> (x | y) could sometimes be cheaper than y if it is already computed and it
> shortens the lifetime of y, but we are way too early in the pipeline to make
> an informed decision about that, so it seems better to canonicalize.

Yeah.  Note that I've not yet settled myself on how to attack the
"single-use issue" generally.  For example I'd like to do code
generation for a specific simplification variant for value-numbering which
can check the availability of expressions - here explicit has_single_use
calls wil be prohibitive.  It should be the responsibility of the caller
to apply restrictions like this (but for the tree-ssa-forwprop.c case I've
not yet come up with a reasonable way to tame things down...)

Richard.

> Now that I think about it, some platforms probably have an instruction
> ornot, so my reasons above could be wrong...



> --
> Marc Glisse

Rasmus Villemoes Jan. 21, 2015, 10:49 a.m. UTC | #9

This adds four optimization patterns to match.pd, along with trivial
test cases. Compiles and works as expected, and 'make check' on x86_64
gives the same number of "unexpected failures" before and after (8
from gfortran.dg/guality/pr41558.f90, 1 from failing to compile
gcc.dg/plugin/ggcplug.c).

I know almost nothing about the internals of gcc, so 4/4 may very well
be considered ugly - it was what I could stitch together from pieces I
picked up here and there.

Rasmus Villemoes (4):
  match.pd: Add x + (x & 1) -> (x + 1) & ~1 pattern
  match.pd: Add x & ~(x & y) -> x & ~y pattern
  match.pd: Add x | ~(x | y) -> x | ~y pattern
  match.pd: Add x + ((-x) & m) -> (x + m) & ~m pattern

 gcc/match.pd                      | 27 ++++++++++++++++++
 gcc/testsuite/gcc.dg/20150120-1.c | 51 +++++++++++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/20150120-2.c | 32 +++++++++++++++++++++
 gcc/testsuite/gcc.dg/20150120-3.c | 32 +++++++++++++++++++++
 gcc/testsuite/gcc.dg/20150120-4.c | 59 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 201 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/20150120-1.c
 create mode 100644 gcc/testsuite/gcc.dg/20150120-2.c
 create mode 100644 gcc/testsuite/gcc.dg/20150120-3.c
 create mode 100644 gcc/testsuite/gcc.dg/20150120-4.c

RFC: Two minor optimization patterns

Commit Message

Comments

Patch