diff mbox

Add (x & ~m) | (y & m) folding (PR middle-end/63568)

Message ID 20141217084644.GJ16332@redhat.com
State New
Headers show

Commit Message

Marek Polacek Dec. 17, 2014, 8:46 a.m. UTC
This adds a transformation of (x & ~m) | (y & m), which (on GIMPLE)
has 4 ops to x ^ ((x ^ y) & m) that has 3 ops on GIMPLE.  In fact,
the latter is then transformed to (a ^ b) & m ^ a, which also has 3
ops.
As discussed in the PR, the original form might also be 3-op on targets
that support something like andn instruction (e.g. BMI1 on Intel), but
that is something for expand/combine.

Not sure if some single-use restriction applies here or not.

Bootstrapped/regtested on x86_64-linux and ppc64-linux, ok for trunk?

2014-12-17  Marek Polacek  <polacek@redhat.com>

	PR middle-end/63568
	* match.pd: Add (x & ~m) | (y & m) -> x ^ ((x ^ y) & m).

	* gcc.dg/pr63568.c: New test.


	Marek

Comments

Jakub Jelinek Dec. 17, 2014, 9:04 a.m. UTC | #1
On Wed, Dec 17, 2014 at 09:46:44AM +0100, Marek Polacek wrote:
> This adds a transformation of (x & ~m) | (y & m), which (on GIMPLE)
> has 4 ops to x ^ ((x ^ y) & m) that has 3 ops on GIMPLE.  In fact,
> the latter is then transformed to (a ^ b) & m ^ a, which also has 3
> ops.

So why don't you transform it to ((x ^ y) & m) ^ x directly (i.e. swap
@0 with the (bit_and ...) ?

BTW, the advantage of (x & ~m) | (y & m) form is that there are fewer
dependencies, at least if the target has andn instruction (e.g. SPARC,
Alpha, PA, MMIX, IA64 have them), just the final or depends on the result of
both and and andnot instructions.  While in the 2 xor forms, and
depends on the first xor result and the second xor depends on the and
result, so if there are multiple ALU units available, the and | andn
form can use both the units, while the second one is unnecessarily
serialized.

	Jakub
Richard Biener Dec. 17, 2014, 9:12 a.m. UTC | #2
On Wed, 17 Dec 2014, Jakub Jelinek wrote:

> On Wed, Dec 17, 2014 at 09:46:44AM +0100, Marek Polacek wrote:
> > This adds a transformation of (x & ~m) | (y & m), which (on GIMPLE)
> > has 4 ops to x ^ ((x ^ y) & m) that has 3 ops on GIMPLE.  In fact,
> > the latter is then transformed to (a ^ b) & m ^ a, which also has 3
> > ops.
> 
> So why don't you transform it to ((x ^ y) & m) ^ x directly (i.e. swap
> @0 with the (bit_and ...) ?

Yeah, I'd prefer that as well.  Also you probably should make sure
the (x & ~m) and (y & m) exprs are single-use (see other examples
in match.pd using has_single_use).

> BTW, the advantage of (x & ~m) | (y & m) form is that there are fewer
> dependencies, at least if the target has andn instruction (e.g. SPARC,
> Alpha, PA, MMIX, IA64 have them), just the final or depends on the result of
> both and and andnot instructions.  While in the 2 xor forms, and
> depends on the first xor result and the second xor depends on the and
> result, so if there are multiple ALU units available, the and | andn
> form can use both the units, while the second one is unnecessarily
> serialized.

Right - I've pointed this out as well.  Of course this simply
asks for more clever expansion of ((x ^ y) & m) ^ x rather
than disabling this transform.  Should be doable with TER
in the BIT_XOR_EXPR expansion code.  Or figure out if teaching
combine/simplify RTX is better.  Of course this form requires
one more register ... which means with high register pressure
or on register starved machines the GIMPLE canonical form might
be better (in some cases).  Which means that LRA should know
how to transform it back? (interesting kind of "reload" ;))

Richard.
diff mbox

Patch

diff --git gcc/match.pd gcc/match.pd
index dbca99e..161b639 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -382,6 +382,11 @@  along with GCC; see the file COPYING3.  If not see
   (bit_not (bit_not @0))
   @0)
 
+/* (x & ~m) | (y & m) -> x ^ ((x ^ y) & m) */
+(simplify
+  (bit_ior:c (bit_and:c @0 (bit_not @2)) (bit_and:c @1 @2))
+  (bit_xor @0 (bit_and (bit_xor @0 @1) @2)))
+
 
 /* Associate (p +p off1) +p off2 as (p +p (off1 + off2)).  */
 (simplify
diff --git gcc/testsuite/gcc.dg/pr63568.c gcc/testsuite/gcc.dg/pr63568.c
index e69de29..fb42bea 100644
--- gcc/testsuite/gcc.dg/pr63568.c
+++ gcc/testsuite/gcc.dg/pr63568.c
@@ -0,0 +1,54 @@ 
+/* PR middle-end/63568 */
+/* { dg-do compile } */
+/* { dg-options "-fdump-tree-original" } */
+
+int
+fn1 (int a, int b, int m)
+{
+  return (a & ~m) | (b & m);
+}
+
+int
+fn2 (int a, int b, int m)
+{
+  return (a & ~m) | (m & b);
+}
+
+int
+fn3 (int a, int b, int m)
+{
+  return (~m & a) | (m & b);
+}
+
+int
+fn4 (int a, int b, int m)
+{
+  return (~m & a) | (b & m);
+}
+
+int
+fn5 (int a, int b, int m)
+{
+  return (b & m) | (a & ~m);
+}
+
+int
+fn6 (int a, int b, int m)
+{
+  return (m & b) | (a & ~m);
+}
+
+int
+fn7 (int a, int b, int m)
+{
+  return (m & b) | (~m & a);
+}
+
+int
+fn8 (int a, int b, int m)
+{
+  return (b & m) | (~m & a);
+}
+
+/* { dg-final { scan-tree-dump-not " \\| " "original" } } */
+/* { dg-final { cleanup-tree-dump "original" } } */