Message ID | 20141217084644.GJ16332@redhat.com |
---|---|
State | New |
Headers | show |
On Wed, Dec 17, 2014 at 09:46:44AM +0100, Marek Polacek wrote: > This adds a transformation of (x & ~m) | (y & m), which (on GIMPLE) > has 4 ops to x ^ ((x ^ y) & m) that has 3 ops on GIMPLE. In fact, > the latter is then transformed to (a ^ b) & m ^ a, which also has 3 > ops. So why don't you transform it to ((x ^ y) & m) ^ x directly (i.e. swap @0 with the (bit_and ...) ? BTW, the advantage of (x & ~m) | (y & m) form is that there are fewer dependencies, at least if the target has andn instruction (e.g. SPARC, Alpha, PA, MMIX, IA64 have them), just the final or depends on the result of both and and andnot instructions. While in the 2 xor forms, and depends on the first xor result and the second xor depends on the and result, so if there are multiple ALU units available, the and | andn form can use both the units, while the second one is unnecessarily serialized. Jakub
On Wed, 17 Dec 2014, Jakub Jelinek wrote: > On Wed, Dec 17, 2014 at 09:46:44AM +0100, Marek Polacek wrote: > > This adds a transformation of (x & ~m) | (y & m), which (on GIMPLE) > > has 4 ops to x ^ ((x ^ y) & m) that has 3 ops on GIMPLE. In fact, > > the latter is then transformed to (a ^ b) & m ^ a, which also has 3 > > ops. > > So why don't you transform it to ((x ^ y) & m) ^ x directly (i.e. swap > @0 with the (bit_and ...) ? Yeah, I'd prefer that as well. Also you probably should make sure the (x & ~m) and (y & m) exprs are single-use (see other examples in match.pd using has_single_use). > BTW, the advantage of (x & ~m) | (y & m) form is that there are fewer > dependencies, at least if the target has andn instruction (e.g. SPARC, > Alpha, PA, MMIX, IA64 have them), just the final or depends on the result of > both and and andnot instructions. While in the 2 xor forms, and > depends on the first xor result and the second xor depends on the and > result, so if there are multiple ALU units available, the and | andn > form can use both the units, while the second one is unnecessarily > serialized. Right - I've pointed this out as well. Of course this simply asks for more clever expansion of ((x ^ y) & m) ^ x rather than disabling this transform. Should be doable with TER in the BIT_XOR_EXPR expansion code. Or figure out if teaching combine/simplify RTX is better. Of course this form requires one more register ... which means with high register pressure or on register starved machines the GIMPLE canonical form might be better (in some cases). Which means that LRA should know how to transform it back? (interesting kind of "reload" ;)) Richard.
diff --git gcc/match.pd gcc/match.pd index dbca99e..161b639 100644 --- gcc/match.pd +++ gcc/match.pd @@ -382,6 +382,11 @@ along with GCC; see the file COPYING3. If not see (bit_not (bit_not @0)) @0) +/* (x & ~m) | (y & m) -> x ^ ((x ^ y) & m) */ +(simplify + (bit_ior:c (bit_and:c @0 (bit_not @2)) (bit_and:c @1 @2)) + (bit_xor @0 (bit_and (bit_xor @0 @1) @2))) + /* Associate (p +p off1) +p off2 as (p +p (off1 + off2)). */ (simplify diff --git gcc/testsuite/gcc.dg/pr63568.c gcc/testsuite/gcc.dg/pr63568.c index e69de29..fb42bea 100644 --- gcc/testsuite/gcc.dg/pr63568.c +++ gcc/testsuite/gcc.dg/pr63568.c @@ -0,0 +1,54 @@ +/* PR middle-end/63568 */ +/* { dg-do compile } */ +/* { dg-options "-fdump-tree-original" } */ + +int +fn1 (int a, int b, int m) +{ + return (a & ~m) | (b & m); +} + +int +fn2 (int a, int b, int m) +{ + return (a & ~m) | (m & b); +} + +int +fn3 (int a, int b, int m) +{ + return (~m & a) | (m & b); +} + +int +fn4 (int a, int b, int m) +{ + return (~m & a) | (b & m); +} + +int +fn5 (int a, int b, int m) +{ + return (b & m) | (a & ~m); +} + +int +fn6 (int a, int b, int m) +{ + return (m & b) | (a & ~m); +} + +int +fn7 (int a, int b, int m) +{ + return (m & b) | (~m & a); +} + +int +fn8 (int a, int b, int m) +{ + return (b & m) | (~m & a); +} + +/* { dg-final { scan-tree-dump-not " \\| " "original" } } */ +/* { dg-final { cleanup-tree-dump "original" } } */