[4/4] match.pd: Add x + ((-x) & m) -> (x + m) & ~m pattern

Message ID 1421837394-7619-5-git-send-email-rv@rasmusvillemoes.dk
State New
Commit Message

Rasmus Villemoes Jan. 21, 2015, 10:49 a.m. UTC
Generalizing the x+(x&1) pattern, one can round up x to a multiple of
a 2^k by adding the negative of x modulo 2^k. But it is fewer
instructions, and presumably requires fewer registers, to do the more
common (x+m)&~m where m=2^k-1.

Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
---
 gcc/match.pd                      |  9 ++++++
 gcc/testsuite/gcc.dg/20150120-4.c | 59 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/20150120-4.c
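
The identity described in the commit message can be checked outside the compiler. The following is only a minimal sketch in plain C, assuming 32-bit unsigned arithmetic (where wraparound is well defined); the helper names round_up_neg, round_up_mask, forms_agree and self_check_round_up are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Form matched by the pattern: add the negative of x modulo 2^k.
   m is expected to be 2^k - 1.  */
static uint32_t round_up_neg(uint32_t x, uint32_t m)
{
    return x + ((0u - x) & m);
}

/* Form produced by the pattern: the more common round-up idiom.  */
static uint32_t round_up_mask(uint32_t x, uint32_t m)
{
    return (x + m) & ~m;
}

/* Return 1 if both forms agree for every x in [0, limit), else 0.  */
static int forms_agree(uint32_t m, uint32_t limit)
{
    for (uint32_t x = 0; x < limit; x++)
        if (round_up_neg(x, m) != round_up_mask(x, m))
            return 0;
    return 1;
}

/* Small self-check: m = 7 is a valid mask, m = 9 is not.  */
void self_check_round_up(void)
{
    assert(round_up_neg(13u, 7u) == 16u);
    assert(round_up_mask(13u, 7u) == 16u);
    assert(forms_agree(7u, 1u << 16));
    assert(!forms_agree(9u, 64u));
}
```

With m = 9 the two forms already differ at x = 1 (10 versus 2), which is why the pattern has to be restricted to m == 2^k-1.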

Comments

Richard Biener April 30, 2015, 9:34 a.m. UTC | #1
On Wed, Jan 21, 2015 at 11:49 AM, Rasmus Villemoes
<rv@rasmusvillemoes.dk> wrote:
> Generalizing the x+(x&1) pattern, one can round up x to a multiple of
> a 2^k by adding the negative of x modulo 2^k. But it is fewer
> instructions, and presumably requires fewer registers, to do the more
> common (x+m)&~m where m=2^k-1.
>
> Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
> ---
>  gcc/match.pd                      |  9 ++++++
>  gcc/testsuite/gcc.dg/20150120-4.c | 59 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 68 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/20150120-4.c
>
> diff --git gcc/match.pd gcc/match.pd
> index 47865f1..93c2298 100644
> --- gcc/match.pd
> +++ gcc/match.pd
> @@ -273,6 +273,15 @@ along with GCC; see the file COPYING3.  If not see
>   (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
>    (bit_ior @0 (bit_not @1))))
>
> +/* x + ((-x) & m) -> (x + m) & ~m when m == 2^k-1.  */
> +(simplify
> + (plus:c @0 (bit_and@2 (negate @0) CONSTANT_CLASS_P@1))

I think you want to restrict this to INTEGER_CST@1

> + (with { tree cst = fold_binary (PLUS_EXPR, TREE_TYPE (@1),
> +                                @1, build_one_cst (TREE_TYPE (@1))); }

We shouldn't dispatch to fold_binary in patterns.  int_const_binop would
be the appropriate function to use - but what happens for @1 == INT_MAX
where @1 + 1 overflows?  Similarly, is this also valid for negative @1
and thus signed mask types?  IMHO we should check whether @1
is equal to wi::mask (TYPE_PRECISION (TREE_TYPE (@1)) - wi::clz (@1),
false, TYPE_PRECISION (TREE_TYPE (@1))).
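
The idea behind that check, comparing the constant against a mask of its own bit width instead of computing @1 + 1, might be sketched in plain C as follows. This is only an analogue for a fixed 32-bit unsigned value, not the GCC wide-int API; clz32, low_mask32, is_low_bit_mask and self_check_mask are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zero bits of a 32-bit value; returns 32 for 0.  */
static unsigned clz32(uint32_t v)
{
    unsigned n = 0;
    if (v == 0)
        return 32;
    while ((v & 0x80000000u) == 0) {
        v <<= 1;
        n++;
    }
    return n;
}

/* Build a value with the low `width` bits set (width in 0..32).
   The width == 32 case is handled separately to avoid an
   out-of-range shift.  */
static uint32_t low_mask32(unsigned width)
{
    return width >= 32 ? 0xFFFFFFFFu : ((uint32_t)1 << width) - 1u;
}

/* Nonzero if m == 2^k - 1 for some k in 0..32.  No m + 1 is
   computed, so there is no overflow case to worry about.  */
static int is_low_bit_mask(uint32_t m)
{
    return m == low_mask32(32u - clz32(m));
}

/* Small self-check covering the boundary values.  */
void self_check_mask(void)
{
    assert(is_low_bit_mask(7u));
    assert(!is_low_bit_mask(9u));
    assert(!is_low_bit_mask(8u));
    assert(is_low_bit_mask(0x7FFFFFFFu));  /* INT_MAX as a bit pattern */
    assert(is_low_bit_mask(0xFFFFFFFFu));  /* all ones */
    assert(is_low_bit_mask(0u));           /* k == 0 */
}
```

Note that this sketch accepts both 0 and all-ones; whether the pattern should fire for those values is a separate decision.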

As with the other patch a ChangeLog entry is missing as well as stating
how you tested the patch.

Thanks,
Richard.

> +  (if ((TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
> +       && cst && integer_pow2p (cst))
> +   (bit_and (plus @0 @1) (bit_not @1)))))
> +
>  (simplify
>   (abs (negate @0))
>   (abs @0))
> diff --git gcc/testsuite/gcc.dg/20150120-4.c gcc/testsuite/gcc.dg/20150120-4.c
> new file mode 100644
> index 0000000..c3552bf
> --- /dev/null
> +++ gcc/testsuite/gcc.dg/20150120-4.c
> @@ -0,0 +1,59 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-original" } */
> +
> +/* x + ((-x) & m) -> (x + m) & ~m for m one less than a pow2.  */
> +int
> +fn1 (int x)
> +{
> +       return x + ((-x) & 7);
> +}
> +int
> +fn2 (int x)
> +{
> +       return ((-x) & 7) + x;
> +}
> +unsigned int
> +fn3 (unsigned int x)
> +{
> +       return x + ((-x) & 7);
> +}
> +unsigned int
> +fn4 (unsigned int x)
> +{
> +       return ((-x) & 7) + x;
> +}
> +unsigned int
> +fn5 (unsigned int x)
> +{
> +       return x + ((-x) % 8);
> +}
> +unsigned int
> +fn6 (unsigned int x)
> +{
> +       return ((-x) % 8) + x;
> +}
> +int
> +fn7 (int x)
> +{
> +       return x + ((-x) & 9);
> +}
> +int
> +fn8 (int x)
> +{
> +       return ((-x) & 9) + x;
> +}
> +unsigned int
> +fn9 (unsigned int x)
> +{
> +       return x + ((-x) & ~0U);
> +}
> +unsigned int
> +fn10 (unsigned int x)
> +{
> +       return ((-x) & ~0U) + x;
> +}
> +
> +
> +/* { dg-final { scan-tree-dump-times "x \\+ 7" 6 "original" } } */
> +/* { dg-final { scan-tree-dump-times "-x & 9" 2 "original" } } */
> +/* { dg-final { scan-tree-dump-times "return 0" 2 "original" } } */
> --
> 2.1.3
>
Marc Glisse April 30, 2015, 11:44 a.m. UTC | #2
On Thu, 30 Apr 2015, Richard Biener wrote:

> On Wed, Jan 21, 2015 at 11:49 AM, Rasmus Villemoes
> <rv@rasmusvillemoes.dk> wrote:
>> Generalizing the x+(x&1) pattern, one can round up x to a multiple of
>> a 2^k by adding the negative of x modulo 2^k. But it is fewer
>> instructions, and presumably requires fewer registers, to do the more
>> common (x+m)&~m where m=2^k-1.
>>
>> Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
>> ---
>>  gcc/match.pd                      |  9 ++++++
>>  gcc/testsuite/gcc.dg/20150120-4.c | 59 +++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 68 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.dg/20150120-4.c
>>
>> diff --git gcc/match.pd gcc/match.pd
>> index 47865f1..93c2298 100644
>> --- gcc/match.pd
>> +++ gcc/match.pd
>> @@ -273,6 +273,15 @@ along with GCC; see the file COPYING3.  If not see
>>   (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
>>    (bit_ior @0 (bit_not @1))))
>>
>> +/* x + ((-x) & m) -> (x + m) & ~m when m == 2^k-1.  */
>> +(simplify
>> + (plus:c @0 (bit_and@2 (negate @0) CONSTANT_CLASS_P@1))
>
> I think you want to restrict this to INTEGER_CST@1

Is this only to make the following test easier (a good enough reason for 
me) or is there some fundamental reason why this transformation would be 
wrong for vectors?

>> + (with { tree cst = fold_binary (PLUS_EXPR, TREE_TYPE (@1),
>> +                                @1, build_one_cst (TREE_TYPE (@1))); }
>
> We shouldn't dispatch to fold_binary in patterns.  int_const_binop would
> be the appropriate function to use - but what happens for @1 == INT_MAX
> where @1 + 1 overflows?  Similarly, is this also valid for negative @1
> and thus signed mask types?  IMHO we should check whether @1
> is equal to wi::mask (TYPE_PRECISION (TREE_TYPE (@1)) - wi::clz (@1),
> false, TYPE_PRECISION (TREE_TYPE (@1))).
>
> As with the other patch a ChangeLog entry is missing as well as stating
> how you tested the patch.
>
> Thanks,
> Richard.
>
>> +  (if ((TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
>> +       && cst && integer_pow2p (cst))
>> +   (bit_and (plus @0 @1) (bit_not @1)))))
Richard Biener April 30, 2015, 12:18 p.m. UTC | #3
On Thu, Apr 30, 2015 at 1:44 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Thu, 30 Apr 2015, Richard Biener wrote:
>
>> On Wed, Jan 21, 2015 at 11:49 AM, Rasmus Villemoes
>> <rv@rasmusvillemoes.dk> wrote:
>>>
>>> Generalizing the x+(x&1) pattern, one can round up x to a multiple of
>>> a 2^k by adding the negative of x modulo 2^k. But it is fewer
>>> instructions, and presumably requires fewer registers, to do the more
>>> common (x+m)&~m where m=2^k-1.
>>>
>>> Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
>>> ---
>>>  gcc/match.pd                      |  9 ++++++
>>>  gcc/testsuite/gcc.dg/20150120-4.c | 59 +++++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 68 insertions(+)
>>>  create mode 100644 gcc/testsuite/gcc.dg/20150120-4.c
>>>
>>> diff --git gcc/match.pd gcc/match.pd
>>> index 47865f1..93c2298 100644
>>> --- gcc/match.pd
>>> +++ gcc/match.pd
>>> @@ -273,6 +273,15 @@ along with GCC; see the file COPYING3.  If not see
>>>   (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
>>>    (bit_ior @0 (bit_not @1))))
>>>
>>> +/* x + ((-x) & m) -> (x + m) & ~m when m == 2^k-1.  */
>>> +(simplify
>>> + (plus:c @0 (bit_and@2 (negate @0) CONSTANT_CLASS_P@1))
>>
>>
>> I think you want to restrict this to INTEGER_CST@1
>
>
> Is this only to make the following test easier (a good enough reason for me)
> or is there some fundamental reason why this transformation would be wrong
> for vectors?

Good question - I suppose it also works for vectors (well, the predicates
don't).  For non-integers or complex ints we shouldn't arrive here as
we can't have bit_and for them.  For pointers we can't have plus on them.

So yes, it makes the following tests easier.  A TODO comment for vectors
might be appropriate (we'd simply need a predicate that can test for
all elements being 2^k-1).
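
What such an element-wise predicate would have to test can be sketched in plain C. This is only an illustration over an array of 32-bit unsigned lanes, not a GCC predicate; lane_is_low_mask, all_lanes_low_mask and self_check_lanes are hypothetical names, and the sketch assumes each lane may use a different k.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if m == 2^k - 1 for some k.  Unsigned wraparound makes
   m + 1 well defined even when m is all ones.  */
static int lane_is_low_mask(uint32_t m)
{
    return (m & (m + 1u)) == 0;
}

/* Nonzero if every one of the n lanes is of the form 2^k - 1.  */
static int all_lanes_low_mask(const uint32_t *lanes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!lane_is_low_mask(lanes[i]))
            return 0;
    return 1;
}

/* Small self-check: one bad lane is enough to reject the vector.  */
void self_check_lanes(void)
{
    const uint32_t good[4] = { 7u, 3u, 15u, 0xFFFFFFFFu };
    const uint32_t bad[4]  = { 7u, 9u, 15u, 1u };
    assert(all_lanes_low_mask(good, 4));
    assert(!all_lanes_low_mask(bad, 4));
}
```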

Richard.

>
>>> + (with { tree cst = fold_binary (PLUS_EXPR, TREE_TYPE (@1),
>>> +                                @1, build_one_cst (TREE_TYPE (@1))); }
>>
>>
>> We shouldn't dispatch to fold_binary in patterns.  int_const_binop would
>> be the appropriate function to use - but what happens for @1 == INT_MAX
>> where @1 + 1 overflows?  Similarly, is this also valid for negative @1
>> and thus signed mask types?  IMHO we should check whether @1
>> is equal to wi::mask (TYPE_PRECISION (TREE_TYPE (@1)) - wi::clz (@1),
>> false, TYPE_PRECISION (TREE_TYPE (@1))).
>>
>> As with the other patch a ChangeLog entry is missing as well as stating
>> how you tested the patch.
>>
>> Thanks,
>> Richard.
>>
>>> +  (if ((TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
>>> +       && cst && integer_pow2p (cst))
>>> +   (bit_and (plus @0 @1) (bit_not @1)))))
>
>
> --
> Marc Glisse
Patch

diff --git gcc/match.pd gcc/match.pd
index 47865f1..93c2298 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -273,6 +273,15 @@  along with GCC; see the file COPYING3.  If not see
  (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
   (bit_ior @0 (bit_not @1))))
 
+/* x + ((-x) & m) -> (x + m) & ~m when m == 2^k-1.  */
+(simplify
+ (plus:c @0 (bit_and@2 (negate @0) CONSTANT_CLASS_P@1))
+ (with { tree cst = fold_binary (PLUS_EXPR, TREE_TYPE (@1),
+				 @1, build_one_cst (TREE_TYPE (@1))); }
+  (if ((TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
+	&& cst && integer_pow2p (cst))
+   (bit_and (plus @0 @1) (bit_not @1)))))
+
 (simplify
  (abs (negate @0))
  (abs @0))
diff --git gcc/testsuite/gcc.dg/20150120-4.c gcc/testsuite/gcc.dg/20150120-4.c
new file mode 100644
index 0000000..c3552bf
--- /dev/null
+++ gcc/testsuite/gcc.dg/20150120-4.c
@@ -0,0 +1,59 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-original" } */
+
+/* x + ((-x) & m) -> (x + m) & ~m for m one less than a pow2.  */
+int
+fn1 (int x)
+{
+	return x + ((-x) & 7);
+}
+int
+fn2 (int x)
+{
+	return ((-x) & 7) + x;
+}
+unsigned int
+fn3 (unsigned int x)
+{
+	return x + ((-x) & 7);
+}
+unsigned int
+fn4 (unsigned int x)
+{
+	return ((-x) & 7) + x;
+}
+unsigned int
+fn5 (unsigned int x)
+{
+	return x + ((-x) % 8);
+}
+unsigned int
+fn6 (unsigned int x)
+{
+	return ((-x) % 8) + x;
+}
+int
+fn7 (int x)
+{
+	return x + ((-x) & 9);
+}
+int
+fn8 (int x)
+{
+	return ((-x) & 9) + x;
+}
+unsigned int
+fn9 (unsigned int x)
+{
+	return x + ((-x) & ~0U);
+}
+unsigned int
+fn10 (unsigned int x)
+{
+	return ((-x) & ~0U) + x;
+}
+
+
+/* { dg-final { scan-tree-dump-times "x \\+ 7" 6 "original" } } */
+/* { dg-final { scan-tree-dump-times "-x & 9" 2 "original" } } */
+/* { dg-final { scan-tree-dump-times "return 0" 2 "original" } } */