diff mbox

simplify-rtx: Transform (xor (and (xor A B) C) B) with C const

Message ID 698378c52a694b83b88313d591be654c1880b194.1478725732.git.segher@kernel.crashing.org
State New
Headers show

Commit Message

Segher Boessenkool Nov. 9, 2016, 9:13 p.m. UTC
match.pd transforms (A&C)|(B&~C) to ((A^B)&C)^B, which is fewer
operations if C is not const (and it is not on simple tests at least,
this transform is done very early already).

Various processors have "insert" instructions that can do this, but
combine cannot build those from the xor-and-xor, especially it has no
chance at all to do that if A or B or multiple instructions as well
(on PowerPC, the rl[ws]imi instructions that can do this with a rotate,
or a simple shift with appropriate C; other ISAs have similar insns).

This patch makes RTL simplify transform (xor (and (xor A B) C) B) back
to (ior (and A C) (and B ~C)) for constant C (and similar with A instead
of B for that last term).

Tested on powerpc64-linux {-m32,-m64}; no regressions.  Is this okay
for trunk?


Segher


2016-11-09  Segher Boessenkool  <segher@kernel.crashing.org>

	* simplify-rtx.c (simplify_binary_operation_1): Simplify
	(xor (and (xor A B) C) B) to (ior (and A C) (and B ~C)) and
	(xor (and (xor A B) C) B) to (ior (and A ~C) (and B C)) if C
	is a const_int.

---
 gcc/simplify-rtx.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

Comments

Bernd Schmidt Nov. 9, 2016, 9:27 p.m. UTC | #1
On 11/09/2016 10:13 PM, Segher Boessenkool wrote:
> 	* simplify-rtx.c (simplify_binary_operation_1): Simplify
> 	(xor (and (xor A B) C) B) to (ior (and A C) (and B ~C)) and
> 	(xor (and (xor A B) C) B) to (ior (and A ~C) (and B C)) if C
> 	is a const_int.

I think one of the xors should have A as the second operand.

>
> +    /* If we have (xor (and (xor A B) C) A) with C a constant we can instead
> +       do (ior (and A ~C) (and B C)) which is a machine instruction on some
> +       machines, and also has shorter instruction path length.  */
> +      if (GET_CODE (op0) == AND

Comments doesn't line up with the if/else on my monitor; could be email 
damage but please check.

Other than that, I think it does qualify as a simplification (or at 
least an improvement), so OK. Would be nice to check for it with a testcase.


Bernd
Marc Glisse Nov. 9, 2016, 9:54 p.m. UTC | #2
On Wed, 9 Nov 2016, Segher Boessenkool wrote:

> match.pd transforms (A&C)|(B&~C) to ((A^B)&C)^B, which is fewer
> operations if C is not const (and it is not on simple tests at least,
> this transform is done very early already).
>
> Various processors have "insert" instructions that can do this, but
> combine cannot build those from the xor-and-xor, especially it has no
> chance at all to do that if A or B or multiple instructions as well
> (on PowerPC, the rl[ws]imi instructions that can do this with a rotate,
> or a simple shift with appropriate C; other ISAs have similar insns).
>
> This patch makes RTL simplify transform (xor (and (xor A B) C) B) back
> to (ior (and A C) (and B ~C)) for constant C (and similar with A instead
> of B for that last term).

Would it make sense to implement this transformation in match.pd, next to 
the "opposite" one, or do you need it at the RTL level because C only 
becomes a constant at that stage?
Segher Boessenkool Nov. 9, 2016, 9:58 p.m. UTC | #3
On Wed, Nov 09, 2016 at 10:27:35PM +0100, Bernd Schmidt wrote:
> On 11/09/2016 10:13 PM, Segher Boessenkool wrote:
> >	* simplify-rtx.c (simplify_binary_operation_1): Simplify
> >	(xor (and (xor A B) C) B) to (ior (and A C) (and B ~C)) and
> >	(xor (and (xor A B) C) B) to (ior (and A ~C) (and B C)) if C
> >	is a const_int.
> 
> I think one of the xors should have A as the second operand.

The second, thanks for spotting it.

> >+    /* If we have (xor (and (xor A B) C) A) with C a constant we can 
> >instead
> >+       do (ior (and A ~C) (and B C)) which is a machine instruction on 
> >some
> >+       machines, and also has shorter instruction path length.  */
> >+      if (GET_CODE (op0) == AND
> 
> Comments doesn't line up with the if/else on my monitor; could be email 
> damage but please check.

I messed it up (had // comments first, whoops).

> Other than that, I think it does qualify as a simplification (or at 
> least an improvement), so OK. Would be nice to check for it with a testcase.

I'll do a PowerPC-specific testcase for all rl[wd]* next week.  rl[wd]imi
will show this xor-xor thing (half of all possible insns were not optimised
before this patch).  Is that enough?


Segher
Segher Boessenkool Nov. 9, 2016, 10:05 p.m. UTC | #4
On Wed, Nov 09, 2016 at 10:54:53PM +0100, Marc Glisse wrote:
> >match.pd transforms (A&C)|(B&~C) to ((A^B)&C)^B, which is fewer
> >operations if C is not const (and it is not on simple tests at least,
> >this transform is done very early already).
> >
> >Various processors have "insert" instructions that can do this, but
> >combine cannot build those from the xor-and-xor, especially it has no
> >chance at all to do that if A or B or multiple instructions as well
> >(on PowerPC, the rl[ws]imi instructions that can do this with a rotate,
> >or a simple shift with appropriate C; other ISAs have similar insns).
> >
> >This patch makes RTL simplify transform (xor (and (xor A B) C) B) back
> >to (ior (and A C) (and B ~C)) for constant C (and similar with A instead
> >of B for that last term).
> 
> Would it make sense to implement this transformation in match.pd, next to 
> the "opposite" one, or do you need it at the RTL level because C only 
> becomes a constant at that stage?

It becomes a constant in the later gimple passes, but we need it in the RTL
simplifiers as well, even if you also do it in match.pd?


Segher
Marc Glisse Nov. 9, 2016, 10:29 p.m. UTC | #5
On Wed, 9 Nov 2016, Segher Boessenkool wrote:

> On Wed, Nov 09, 2016 at 10:54:53PM +0100, Marc Glisse wrote:
>>> match.pd transforms (A&C)|(B&~C) to ((A^B)&C)^B, which is fewer
>>> operations if C is not const (and it is not on simple tests at least,
>>> this transform is done very early already).
>>>
>>> Various processors have "insert" instructions that can do this, but
>>> combine cannot build those from the xor-and-xor, especially it has no
>>> chance at all to do that if A or B or multiple instructions as well
>>> (on PowerPC, the rl[ws]imi instructions that can do this with a rotate,
>>> or a simple shift with appropriate C; other ISAs have similar insns).
>>>
>>> This patch makes RTL simplify transform (xor (and (xor A B) C) B) back
>>> to (ior (and A C) (and B ~C)) for constant C (and similar with A instead
>>> of B for that last term).
>>
>> Would it make sense to implement this transformation in match.pd, next to
>> the "opposite" one, or do you need it at the RTL level because C only
>> becomes a constant at that stage?
>
> It becomes a constant in the later gimple passes, but we need it in the RTL
> simplifiers as well, even if you also do it in match.pd?

(assuming it is always an improvement, even though it may use the same 
number of operations and one more constant)

Sure, it doesn't hurt to have it in both places. It just seems that since 
the problem was caused by match.pd in your original testcase, fixing it at 
that level (undoing the harm as soon as possible) would make the RTL 
version less useful (though not useless). Anyway, I don't feel competent 
to decide when which form is preferable, I was just curious.

(simplify
  (bit_xor:cs (bit_and:s (bit_xor:cs @0 @1) CONSTANT_CLASS_P@2) @0)
  (bit_ior (bit_and @0 (bit_not @2)) (bit_and @1 @2)))

(this handles vectors as well, I don't know if that is desired)
Marc Glisse Nov. 9, 2016, 10:51 p.m. UTC | #6
On Wed, 9 Nov 2016, Marc Glisse wrote:

> On Wed, 9 Nov 2016, Segher Boessenkool wrote:
>
>> On Wed, Nov 09, 2016 at 10:54:53PM +0100, Marc Glisse wrote:
>>>> match.pd transforms (A&C)|(B&~C) to ((A^B)&C)^B, which is fewer
>>>> operations if C is not const (and it is not on simple tests at least,
>>>> this transform is done very early already).
>>>> 
>>>> Various processors have "insert" instructions that can do this, but
>>>> combine cannot build those from the xor-and-xor, especially it has no
>>>> chance at all to do that if A or B or multiple instructions as well
>>>> (on PowerPC, the rl[ws]imi instructions that can do this with a rotate,
>>>> or a simple shift with appropriate C; other ISAs have similar insns).
>>>> 
>>>> This patch makes RTL simplify transform (xor (and (xor A B) C) B) back
>>>> to (ior (and A C) (and B ~C)) for constant C (and similar with A instead
>>>> of B for that last term).
>>> 
>>> Would it make sense to implement this transformation in match.pd, next to
>>> the "opposite" one, or do you need it at the RTL level because C only
>>> becomes a constant at that stage?
>> 
>> It becomes a constant in the later gimple passes, but we need it in the RTL
>> simplifiers as well, even if you also do it in match.pd?
>
> (assuming it is always an improvement, even though it may use the same number 
> of operations and one more constant)
>
> Sure, it doesn't hurt to have it in both places. It just seems that since the 
> problem was caused by match.pd in your original testcase, fixing it at that 
> level (undoing the harm as soon as possible) would make the RTL version less 
> useful (though not useless). Anyway, I don't feel competent to decide when 
> which form is preferable, I was just curious.
>
> (simplify
> (bit_xor:cs (bit_and:s (bit_xor:cs @0 @1) CONSTANT_CLASS_P@2) @0)

  (bit_xor:c (bit_and:s (bit_xor:cs @0 @1) CONSTANT_CLASS_P@2) @0)

without the initial "s" of course... (nice, genmatch does notice something 
is wrong if I leave it in)

> (bit_ior (bit_and @0 (bit_not @2)) (bit_and @1 @2)))
>
> (this handles vectors as well, I don't know if that is desired)
Segher Boessenkool Nov. 9, 2016, 11:05 p.m. UTC | #7
On Wed, Nov 09, 2016 at 11:29:45PM +0100, Marc Glisse wrote:
> >>>This patch makes RTL simplify transform (xor (and (xor A B) C) B) back
> >>>to (ior (and A C) (and B ~C)) for constant C (and similar with A instead
> >>>of B for that last term).
> >>
> >>Would it make sense to implement this transformation in match.pd, next to
> >>the "opposite" one, or do you need it at the RTL level because C only
> >>becomes a constant at that stage?
> >
> >It becomes a constant in the later gimple passes, but we need it in the RTL
> >simplifiers as well, even if you also do it in match.pd?
> 
> (assuming it is always an improvement, even though it may use the same 
> number of operations and one more constant)

And a shallower evaluation tree.

Does match.pd (or the gimple optimisers in general) care about the number
of constants at all?

> Sure, it doesn't hurt to have it in both places. It just seems that since 
> the problem was caused by match.pd in your original testcase, fixing it at 
> that level (undoing the harm as soon as possible) would make the RTL 
> version less useful (though not useless). Anyway, I don't feel competent 
> to decide when which form is preferable, I was just curious.

I don't know either.  See PR63568.

> (simplify
>  (bit_xor:cs (bit_and:s (bit_xor:cs @0 @1) CONSTANT_CLASS_P@2) @0)
>  (bit_ior (bit_and @0 (bit_not @2)) (bit_and @1 @2)))
> 
> (this handles vectors as well, I don't know if that is desired)

No clue.  There is a theme here ;-)


Segher
Bernd Schmidt Nov. 10, 2016, midnight UTC | #8
On 11/09/2016 10:58 PM, Segher Boessenkool wrote:
> I'll do a PowerPC-specific testcase for all rl[wd]* next week.  rl[wd]imi
> will show this xor-xor thing (half of all possible insns were not optimised
> before this patch).  Is that enough?

Sure. Once David's rtl testing work is ready we'll want to start doing 
selftests for these kinds of things.


Bernd
Richard Biener Nov. 10, 2016, 10:43 a.m. UTC | #9
On Thu, Nov 10, 2016 at 12:05 AM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> On Wed, Nov 09, 2016 at 11:29:45PM +0100, Marc Glisse wrote:
>> >>>This patch makes RTL simplify transform (xor (and (xor A B) C) B) back
>> >>>to (ior (and A C) (and B ~C)) for constant C (and similar with A instead
>> >>>of B for that last term).
>> >>
>> >>Would it make sense to implement this transformation in match.pd, next to
>> >>the "opposite" one, or do you need it at the RTL level because C only
>> >>becomes a constant at that stage?
>> >
>> >It becomes a constant in the later gimple passes, but we need it in the RTL
>> >simplifiers as well, even if you also do it in match.pd?
>>
>> (assuming it is always an improvement, even though it may use the same
>> number of operations and one more constant)
>
> And a shallower evaluation tree.
>
> Does match.pd (or the gimple optimisers in general) care about the number
> of constants at all?

It's a matter of definition what is simpler.  I think for two
different constants vs
one constant I'd prefer the single constant (the match.pd pattern would apply
to int128 as well).  Unless you have a andnot instruction and combine/CSE
are clever enough to see the opportunity to re-use a previous constant if
it changes bit-and to bit-and-not ...

I'd at _most_ consider them equal cost which would mean we'd need to
decide for a canonical form -- having the same one for constant and not constant
seems like a good idea here as well.

bernd notices better resource utilization due to the shallower tree for the
ior variant but that also eventually increases register pressure.

Thus, I'd leave this all to RTL land ;)  (esp. the constant with
and-not instruction
trick is likely not done on RTL)

Richard.

>> Sure, it doesn't hurt to have it in both places. It just seems that since
>> the problem was caused by match.pd in your original testcase, fixing it at
>> that level (undoing the harm as soon as possible) would make the RTL
>> version less useful (though not useless). Anyway, I don't feel competent
>> to decide when which form is preferable, I was just curious.
>
> I don't know either.  See PR63568.
>
>> (simplify
>>  (bit_xor:cs (bit_and:s (bit_xor:cs @0 @1) CONSTANT_CLASS_P@2) @0)
>>  (bit_ior (bit_and @0 (bit_not @2)) (bit_and @1 @2)))
>>
>> (this handles vectors as well, I don't know if that is desired)
>
> No clue.  There is a theme here ;-)
>
>
> Segher
Jeff Law Nov. 28, 2016, 11:28 p.m. UTC | #10
On 11/10/2016 03:43 AM, Richard Biener wrote:
> On Thu, Nov 10, 2016 at 12:05 AM, Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
>> On Wed, Nov 09, 2016 at 11:29:45PM +0100, Marc Glisse wrote:
>>>>>> This patch makes RTL simplify transform (xor (and (xor A B) C) B) back
>>>>>> to (ior (and A C) (and B ~C)) for constant C (and similar with A instead
>>>>>> of B for that last term).
>>>>>
>>>>> Would it make sense to implement this transformation in match.pd, next to
>>>>> the "opposite" one, or do you need it at the RTL level because C only
>>>>> becomes a constant at that stage?
>>>>
>>>> It becomes a constant in the later gimple passes, but we need it in the RTL
>>>> simplifiers as well, even if you also do it in match.pd?
>>>
>>> (assuming it is always an improvement, even though it may use the same
>>> number of operations and one more constant)
>>
>> And a shallower evaluation tree.
>>
>> Does match.pd (or the gimple optimisers in general) care about the number
>> of constants at all?
>
> It's a matter of definition what is simpler.  I think for two
> different constants vs
> one constant I'd prefer the single constant (the match.pd pattern would apply
> to int128 as well).  Unless you have a andnot instruction and combine/CSE
> are clever enough to see the opportunity to re-use a previous constant if
> it changes bit-and to bit-and-not ...
>
> I'd at _most_ consider them equal cost which would mean we'd need to
> decide for a canonical form -- having the same one for constant and not constant
> seems like a good idea here as well.
> bernd notices better resource utilization due to the shallower tree for the
> ior variant but that also eventually increases register pressure.
>
> Thus, I'd leave this all to RTL land ;)  (esp. the constant with
> and-not instruction
> trick is likely not done on RTL)
Definitely seems like we want to keep the match.pd transformation, then 
potentially un-do it in the RTL world where we can query target costs, 
supported insns, etc.

jeff
diff mbox

Patch

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 5c3dea1..62d60f3 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -2886,6 +2886,37 @@  simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
 	    }
 	}
 
+    /* If we have (xor (and (xor A B) C) A) with C a constant we can instead
+       do (ior (and A ~C) (and B C)) which is a machine instruction on some
+       machines, and also has shorter instruction path length.  */
+      if (GET_CODE (op0) == AND
+	  && GET_CODE (XEXP (op0, 0)) == XOR
+	  && CONST_INT_P (XEXP (op0, 1))
+	  && rtx_equal_p (XEXP (XEXP (op0, 0), 0), trueop1))
+	{
+	  rtx a = trueop1;
+	  rtx b = XEXP (XEXP (op0, 0), 1);
+	  rtx c = XEXP (op0, 1);
+	  rtx nc = simplify_gen_unary (NOT, mode, c, mode);
+	  rtx a_nc = simplify_gen_binary (AND, mode, a, nc);
+	  rtx bc = simplify_gen_binary (AND, mode, b, c);
+	  return simplify_gen_binary (IOR, mode, a_nc, bc);
+	}
+    /* Similarly, (xor (and (xor A B) C) B) as (ior (and A C) (and B ~C))  */
+      else if (GET_CODE (op0) == AND
+	  && GET_CODE (XEXP (op0, 0)) == XOR
+	  && CONST_INT_P (XEXP (op0, 1))
+	  && rtx_equal_p (XEXP (XEXP (op0, 0), 1), trueop1))
+	{
+	  rtx a = XEXP (XEXP (op0, 0), 0);
+	  rtx b = trueop1;
+	  rtx c = XEXP (op0, 1);
+	  rtx nc = simplify_gen_unary (NOT, mode, c, mode);
+	  rtx b_nc = simplify_gen_binary (AND, mode, b, nc);
+	  rtx ac = simplify_gen_binary (AND, mode, a, c);
+	  return simplify_gen_binary (IOR, mode, ac, b_nc);
+	}
+
       /* (xor (comparison foo bar) (const_int 1)) can become the reversed
 	 comparison if STORE_FLAG_VALUE is 1.  */
       if (STORE_FLAG_VALUE == 1