diff mbox

Add (x & ~m) | (y & m) folding (PR middle-end/63568)

Message ID 20141217112944.GK16332@redhat.com
State New
Headers show

Commit Message

Marek Polacek Dec. 17, 2014, 11:29 a.m. UTC
On Wed, Dec 17, 2014 at 10:12:06AM +0100, Richard Biener wrote:
> On Wed, 17 Dec 2014, Jakub Jelinek wrote:
> 
> > On Wed, Dec 17, 2014 at 09:46:44AM +0100, Marek Polacek wrote:
> > > This adds a transformation of (x & ~m) | (y & m), which (on GIMPLE)
> > > has 4 ops to x ^ ((x ^ y) & m) that has 3 ops on GIMPLE.  In fact,
> > > the latter is then transformed to (a ^ b) & m ^ a, which also has 3
> > > ops.
> > 
> > So why don't you transform it to ((x ^ y) & m) ^ x directly (i.e. swap
> > @0 with the (bit_and ...) ?
> 
> Yeah, I'd prefer that as well.  Also you probably should make sure
> the (x & ~m) and (y & m) exprs are single-use (see other examples
> in match.pd using has_single_use).
 
Ok.  We have only one use of has_single_use so far in match.pd, so I
hope what I have is correct...  I suppose the point is not to hinder
CSEing those two subexpressions if they have multiple uses.

> > BTW, the advantage of (x & ~m) | (y & m) form is that there are fewer
> > dependencies, at least if the target has andn instruction (e.g. SPARC,
> > Alpha, PA, MMIX, IA64 have them), just the final or depends on the result of
> > both and and andnot instructions.  While in the 2 xor forms, and
> > depends on the first xor result and the second xor depends on the and
> > result, so if there are multiple ALU units available, the and | andn
> > form can use both the units, while the second one is unnecessarily
> > serialized.
> 
> Right - I've pointed this out as well.  Of course this simply
> asks for more clever expansion of ((x ^ y) & m) ^ x rather
> than disabling this transform.  Should be doable with TER
> in the BIT_XOR_EXPR expansion code.  Or figure out if teaching
> combine/simplify RTX is better.  Of course this form requires
> one more register ... which means with high register pressure
> or on register starved machines the GIMPLE canonical form might
> be better (in some cases).  Which means that LRA should know
> how to transform it back? (interesting kind of "reload" ;))

I'd rather leave the expansion/combine stuff to someone else ;).

Bootstrapped/regtested on x86_64-linux and ppc64-linux, ok for trunk?

2014-12-17  Marek Polacek  <polacek@redhat.com>

	PR middle-end/63568
	* match.pd: Add (x & ~m) | (y & m) -> ((x ^ y) & m) ^ x pattern.

	* gcc.dg/pr63568.c: New test.


	Marek

Comments

Richard Biener Dec. 17, 2014, 11:39 a.m. UTC | #1
On December 17, 2014 12:29:44 PM CET, Marek Polacek <polacek@redhat.com> wrote:
>On Wed, Dec 17, 2014 at 10:12:06AM +0100, Richard Biener wrote:
>> On Wed, 17 Dec 2014, Jakub Jelinek wrote:
>> 
>> > On Wed, Dec 17, 2014 at 09:46:44AM +0100, Marek Polacek wrote:
>> > > This adds a transformation of (x & ~m) | (y & m), which (on
>GIMPLE)
>> > > has 4 ops to x ^ ((x ^ y) & m) that has 3 ops on GIMPLE.  In
>fact,
>> > > the latter is then transformed to (a ^ b) & m ^ a, which also has
>3
>> > > ops.
>> > 
>> > So why don't you transform it to ((x ^ y) & m) ^ x directly (i.e.
>swap
>> > @0 with the (bit_and ...) ?
>> 
>> Yeah, I'd prefer that as well.  Also you probably should make sure
>> the (x & ~m) and (y & m) exprs are single-use (see other examples
>> in match.pd using has_single_use).
> 
>Ok.  We have only one use of has_single_use so far in match.pd, so I
>hope what I have is correct...  I suppose the point is not to hinder
>CSEing those two subexpressions if they have multiple uses.
>
>> > BTW, the advantage of (x & ~m) | (y & m) form is that there are
>fewer
>> > dependencies, at least if the target has andn instruction (e.g.
>SPARC,
>> > Alpha, PA, MMIX, IA64 have them), just the final or depends on the
>result of
>> > both and and andnot instructions.  While in the 2 xor forms, and
>> > depends on the first xor result and the second xor depends on the
>and
>> > result, so if there are multiple ALU units available, the and |
>andn
>> > form can use both the units, while the second one is unnecessarily
>> > serialized.
>> 
>> Right - I've pointed this out as well.  Of course this simply
>> asks for more clever expansion of ((x ^ y) & m) ^ x rather
>> than disabling this transform.  Should be doable with TER
>> in the BIT_XOR_EXPR expansion code.  Or figure out if teaching
>> combine/simplify RTX is better.  Of course this form requires
>> one more register ... which means with high register pressure
>> or on register starved machines the GIMPLE canonical form might
>> be better (in some cases).  Which means that LRA should know
>> how to transform it back? (interesting kind of "reload" ;))
>
>I'd rather leave the expansion/combine stuff to someone else ;).
>
>Bootstrapped/regtested on x86_64-linux and ppc64-linux, ok for trunk?

OK.

Thanks,
Richard.

>2014-12-17  Marek Polacek  <polacek@redhat.com>
>
>	PR middle-end/63568
>	* match.pd: Add (x & ~m) | (y & m) -> ((x ^ y) & m) ^ x pattern.
>
>	* gcc.dg/pr63568.c: New test.
>
>diff --git gcc/match.pd gcc/match.pd
>index dbca99e..4d4bc9f 100644
>--- gcc/match.pd
>+++ gcc/match.pd
>@@ -382,6 +382,13 @@ along with GCC; see the file COPYING3.  If not see
>   (bit_not (bit_not @0))
>   @0)
> 
>+/* (x & ~m) | (y & m) -> ((x ^ y) & m) ^ x */
>+(simplify
>+  (bit_ior:c (bit_and:c@3 @0 (bit_not @2)) (bit_and:c@4 @1 @2))
>+  (if ((TREE_CODE (@3) != SSA_NAME || has_single_use (@3))
>+	&& (TREE_CODE (@4) != SSA_NAME || has_single_use (@4)))
>+   (bit_xor (bit_and (bit_xor @0 @1) @2) @0)))
>+
> 
> /* Associate (p +p off1) +p off2 as (p +p (off1 + off2)).  */
> (simplify
>diff --git gcc/testsuite/gcc.dg/pr63568.c
>gcc/testsuite/gcc.dg/pr63568.c
>index e69de29..fb42bea 100644
>--- gcc/testsuite/gcc.dg/pr63568.c
>+++ gcc/testsuite/gcc.dg/pr63568.c
>@@ -0,0 +1,54 @@
>+/* PR middle-end/63568 */
>+/* { dg-do compile } */
>+/* { dg-options "-fdump-tree-original" } */
>+
>+int
>+fn1 (int a, int b, int m)
>+{
>+  return (a & ~m) | (b & m);
>+}
>+
>+int
>+fn2 (int a, int b, int m)
>+{
>+  return (a & ~m) | (m & b);
>+}
>+
>+int
>+fn3 (int a, int b, int m)
>+{
>+  return (~m & a) | (m & b);
>+}
>+
>+int
>+fn4 (int a, int b, int m)
>+{
>+  return (~m & a) | (b & m);
>+}
>+
>+int
>+fn5 (int a, int b, int m)
>+{
>+  return (b & m) | (a & ~m);
>+}
>+
>+int
>+fn6 (int a, int b, int m)
>+{
>+  return (m & b) | (a & ~m);
>+}
>+
>+int
>+fn7 (int a, int b, int m)
>+{
>+  return (m & b) | (~m & a);
>+}
>+
>+int
>+fn8 (int a, int b, int m)
>+{
>+  return (b & m) | (~m & a);
>+}
>+
>+/* { dg-final { scan-tree-dump-not " \\| " "original" } } */
>+/* { dg-final { cleanup-tree-dump "original" } } */
>
>	Marek
diff mbox

Patch

diff --git gcc/match.pd gcc/match.pd
index dbca99e..4d4bc9f 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -382,6 +382,13 @@  along with GCC; see the file COPYING3.  If not see
   (bit_not (bit_not @0))
   @0)
 
+/* (x & ~m) | (y & m) -> ((x ^ y) & m) ^ x */
+(simplify
+  (bit_ior:c (bit_and:c@3 @0 (bit_not @2)) (bit_and:c@4 @1 @2))
+  (if ((TREE_CODE (@3) != SSA_NAME || has_single_use (@3))
+	&& (TREE_CODE (@4) != SSA_NAME || has_single_use (@4)))
+   (bit_xor (bit_and (bit_xor @0 @1) @2) @0)))
+
 
 /* Associate (p +p off1) +p off2 as (p +p (off1 + off2)).  */
 (simplify
diff --git gcc/testsuite/gcc.dg/pr63568.c gcc/testsuite/gcc.dg/pr63568.c
index e69de29..fb42bea 100644
--- gcc/testsuite/gcc.dg/pr63568.c
+++ gcc/testsuite/gcc.dg/pr63568.c
@@ -0,0 +1,54 @@ 
+/* PR middle-end/63568 */
+/* { dg-do compile } */
+/* { dg-options "-fdump-tree-original" } */
+
+int
+fn1 (int a, int b, int m)
+{
+  return (a & ~m) | (b & m);
+}
+
+int
+fn2 (int a, int b, int m)
+{
+  return (a & ~m) | (m & b);
+}
+
+int
+fn3 (int a, int b, int m)
+{
+  return (~m & a) | (m & b);
+}
+
+int
+fn4 (int a, int b, int m)
+{
+  return (~m & a) | (b & m);
+}
+
+int
+fn5 (int a, int b, int m)
+{
+  return (b & m) | (a & ~m);
+}
+
+int
+fn6 (int a, int b, int m)
+{
+  return (m & b) | (a & ~m);
+}
+
+int
+fn7 (int a, int b, int m)
+{
+  return (m & b) | (~m & a);
+}
+
+int
+fn8 (int a, int b, int m)
+{
+  return (b & m) | (~m & a);
+}
+
+/* { dg-final { scan-tree-dump-not " \\| " "original" } } */
+/* { dg-final { cleanup-tree-dump "original" } } */