Message ID | 20211229012706.3244946-1-luoxhu@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | convert (xor (and (xor A B) C) A) to (ior (and A ~C) (and B C)) [PR90323] | expand |
On 12/28/2021 6:27 PM, Xionghu Luo via Gcc-patches wrote: > Bootstrapped and regtested on powerpc64le-linux-gnu {P10,P9} > powerpc64-linux-gnu {P8, P7} and X86. OK for master? > > gcc/ChangeLog: > > PR 90323 > * simplify-rtx.c (simplify_context::simplify_binary_operation_1): Relax > C from constant to constant or reg. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/pr90323.c: New test. If C is not a constant and the target does not have and-complement instructions, then this is likely worse than the original. If you want to do this transformation, then you probably need to be checking target costs. Jeff
On Thu, Dec 30, 2021 at 09:22:51AM -0700, Jeff Law via Gcc-patches wrote: > On 12/28/2021 6:27 PM, Xionghu Luo via Gcc-patches wrote: > > Bootstrapped and regtested on powerpc64le-linux-gnu {P10,P9} > > powerpc64-linux-gnu {P8, P7} and X86. OK for master? > > > > gcc/ChangeLog: > > > > PR 90323 > > * simplify-rtx.c (simplify_context::simplify_binary_operation_1): Relax > > C from constant to constant or reg. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/powerpc/pr90323.c: New test. > If C is not a constant and the target does not have and-complement > instructions, then this is likely worse than the original. If you want to > do this transformation, then you probably need to be checking target costs. I'm not sure we can check costs because what simplify_* is fed especially during combine might be far from what is a valid instruction and checking costs on something that isn't valid could lead to bogus results. Perhaps check if the andnot optab exist for the mode, except we don't have such an optab... Jakub
On 12/30/2021 9:27 AM, Jakub Jelinek wrote: > On Thu, Dec 30, 2021 at 09:22:51AM -0700, Jeff Law via Gcc-patches wrote: >> On 12/28/2021 6:27 PM, Xionghu Luo via Gcc-patches wrote: >>> Bootstrapped and regtested on powerpc64le-linux-gnu {P10,P9} >>> powerpc64-linux-gnu {P8, P7} and X86. OK for master? >>> >>> gcc/ChangeLog: >>> >>> PR 90323 >>> * simplify-rtx.c (simplify_context::simplify_binary_operation_1): Relax >>> C from constant to constant or reg. >>> >>> gcc/testsuite/ChangeLog: >>> >>> * gcc.target/powerpc/pr90323.c: New test. >> If C is not a constant and the target does not have and-complement >> instructions, then this is likely worse than the original. If you want to >> do this transformation, then you probably need to be checking target costs. > I'm not sure we can check costs because what simplify_* is fed especially > during combine might be far from what is a valid instruction and checking > costs on something that isn't valid could lead to bogus results. > Perhaps check if the andnot optab exist for the mode, except we don't have > such an optab... Should we just defer this to the next stage1? That would give plenty of time to wire up an optab and test it on the appropriate targets. jeff
On Thu, Dec 30, 2021 at 09:22:51AM -0700, Jeff Law wrote: > On 12/28/2021 6:27 PM, Xionghu Luo via Gcc-patches wrote: > > PR 90323 > > * simplify-rtx.c (simplify_context::simplify_binary_operation_1): > > Relax > > C from constant to constant or reg. > > > >gcc/testsuite/ChangeLog: > > > > * gcc.target/powerpc/pr90323.c: New test. > If C is not a constant and the target does not have and-complement > instructions, then this is likely worse than the original. If you want > to do this transformation, then you probably need to be checking target > costs. It even then is still not worse on any modern superscalar machine! Segher
Hi! On Tue, Dec 28, 2021 at 07:27:06PM -0600, Xionghu Luo wrote: > Bootstrapped and regtested on powerpc64le-linux-gnu {P10,P9} > powerpc64-linux-gnu {P8, P7} and X86. OK for master? > > gcc/ChangeLog: > > PR 90323 > * simplify-rtx.c (simplify_context::simplify_binary_operation_1): Relax > C from constant to constant or reg. So, this doesn't even do the right thing for PowerPC, for scalar operations that is: we only have rl[wd]imi for this, and nothing that uses a register for the mask instead. Not that it will hurt much at all (if anything) in practice, but :-) OTOH for vectors we have vsel/xxsel, which always uses a register for the mask. On yet another hand, the formulation with two XORs is just obfuscation, for RTL. RTL very much does not have the rule that fewer ops is better. So, ideally we will never get this nonsense in RTL at all. Probably the simplest / best way to get there is to not have it in Gimple either, and instead just use some "select" operation there, which can be optimised much better anyway? And yeah that will be GCC 13 stuff :-( Segher
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c index a060f1bbce0..be240b2979e 100644 --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -3581,12 +3581,13 @@ simplify_context::simplify_binary_operation_1 (rtx_code code, } } - /* If we have (xor (and (xor A B) C) A) with C a constant we can instead - do (ior (and A ~C) (and B C)) which is a machine instruction on some - machines, and also has shorter instruction path length. */ + /* If we have (xor (and (xor A B) C) A) with C a constant or register + we can instead do (ior (and A ~C) (and B C)) which is a machine + instruction on some machines, and also has shorter instruction path + length. */ if (GET_CODE (op0) == AND && GET_CODE (XEXP (op0, 0)) == XOR - && CONST_INT_P (XEXP (op0, 1)) + && (CONST_INT_P (XEXP (op0, 1)) || REG_P (XEXP (op0, 1))) && rtx_equal_p (XEXP (XEXP (op0, 0), 0), trueop1)) { rtx a = trueop1; @@ -3600,7 +3601,7 @@ simplify_context::simplify_binary_operation_1 (rtx_code code, /* Similarly, (xor (and (xor A B) C) B) as (ior (and A C) (and B ~C)) */ else if (GET_CODE (op0) == AND && GET_CODE (XEXP (op0, 0)) == XOR - && CONST_INT_P (XEXP (op0, 1)) + && (CONST_INT_P (XEXP (op0, 1)) || REG_P (XEXP (op0, 1))) && rtx_equal_p (XEXP (XEXP (op0, 0), 1), trueop1)) { rtx a = XEXP (XEXP (op0, 0), 0); diff --git a/gcc/testsuite/gcc.target/powerpc/pr90323.c b/gcc/testsuite/gcc.target/powerpc/pr90323.c new file mode 100644 index 00000000000..e80609c477f --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr90323.c @@ -0,0 +1,33 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <altivec.h> +#include <stdio.h> +volatile vector unsigned orig = {0xebebebeb, 0x34343434, 0x76767676, 0x12121212}; +volatile vector unsigned mask = {0xffffffff, 0, 0xffffffff, 0}; +volatile vector unsigned fill = {0xfefefefe, 0xaaaaaaaa, 0xbbbbbbbb, 0xcccccccc}; +volatile vector unsigned expected = {0xfefefefe, 0x34343434, 0xbbbbbbbb, 0x12121212}; +__attribute__((noinline)) vector unsigned +without_sel(vector unsigned l, vector unsigned r, vector unsigned mask) { + l = l & ~mask; + l |= mask & r; + return l; +} + +__attribute__((noinline)) vector unsigned +with_sel(vector unsigned l, vector unsigned r, vector unsigned mask) { + return vec_sel(l, r, mask); +} + +int main() { + vector unsigned res1 = without_sel(orig, fill, mask); + vector unsigned res2 = with_sel(orig, fill, mask); + if (!vec_all_eq(res1, expected)) + printf("error1\n"); + if (!vec_all_eq(res2, expected)) + printf("error2\n"); + return 0; +} + + +/* { dg-final { scan-assembler-times {\mxxsel\M} 2 } } */