
combine, v3: Fix AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]

Message ID ZDaBpPyA/XiPOvjw@tucnak
State New
Series combine, v3: Fix AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]

Commit Message

Jakub Jelinek April 12, 2023, 10:02 a.m. UTC
Hi!

On Wed, Apr 12, 2023 at 08:21:26AM +0200, Jakub Jelinek via Gcc-patches wrote:
> I would have expected something like
> WORD_REGISTER_OPERATIONS && known_le (GET_MODE_PRECISION (mode), BITS_PER_WORD)
> as the condition to use word_mode, rather than just
> WORD_REGISTER_OPERATIONS.  In both spots.  Because larger modes should be
> used as is, not a narrower word_mode instead of them.

In patch form that would be the following (given that the combine.cc change
had scalar_int_mode mode, we may as well just use a normal comparison there, and
simplify-rtx.cc has it guarded on HWI_COMPUTABLE_MODE_P, which is also only
true for scalar int modes).

I've tried the pr108947.c testcase, but I see no differences in the assembly
before/after the patch (but I don't know whether I'm using the right options).
With the pr109040.c testcase from the patch, I don't see the expected zero
extension without the patch, but I do see it with the patch.

As before, I can only test this easily on non-WORD_REGISTER_OPERATIONS
targets.

2023-04-12  Jeff Law  <jlaw@ventanamicro.com>
	    Jakub Jelinek  <jakub@redhat.com>

	PR target/109040
	* combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
	word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
	smaller than word_mode.
	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
	<case AND>: Likewise.

	* gcc.c-torture/execute/pr109040.c: New test.


	Jakub

Comments

Jeff Law April 12, 2023, 2:17 p.m. UTC | #1
On 4/12/23 04:02, Jakub Jelinek wrote:
Looks pretty sensible.  It'll take most of the day, but I'll do a 
bootstrap and regression test with this variant.

jeff
Jakub Jelinek April 12, 2023, 2:30 p.m. UTC | #2
On Wed, Apr 12, 2023 at 08:17:46AM -0600, Jeff Law wrote:
> Looks pretty sensible.  It'll take most of the day, but I'll do a bootstrap
> and regression test with this variant.

Thanks.  Note, it bootstraps/regtests fine on x86_64-linux and i686-linux,
though those are not WORD_REGISTER_OPERATIONS targets.  It also builds cc1 on
aarch64-linux (I just wanted to make sure I didn't break anything on arches
with 2-coefficient poly-ints).

	Jakub
Segher Boessenkool April 12, 2023, 3:24 p.m. UTC | #3
On Wed, Apr 12, 2023 at 12:02:12PM +0200, Jakub Jelinek wrote:
> On Wed, Apr 12, 2023 at 08:21:26AM +0200, Jakub Jelinek via Gcc-patches wrote:
> > I would have expected something like
> > WORD_REGISTER_OPERATIONS && known_le (GET_MODE_PRECISION (mode), BITS_PER_WORD)
> > as the condition to use word_mode, rather than just
> > WORD_REGISTER_OPERATIONS.  In both spots.  Because larger modes should be
> > used as is, not a narrower word_mode instead of them.
> 
> In patch form that would be the following (given that the combine.cc change
> had scalar_int_mode mode, we may as well just use a normal comparison there, and
> simplify-rtx.cc has it guarded on HWI_COMPUTABLE_MODE_P, which is also only
> true for scalar int modes).
> 
> I've tried the pr108947.c testcase, but I see no differences in the assembly
> before/after the patch (but I don't know whether I'm using the right options).
> With the pr109040.c testcase from the patch, I don't see the expected zero
> extension without the patch, but I do see it with the patch.
> 
> As before, I can only test this easily on non-WORD_REGISTER_OPERATIONS
> targets.

There are no doubt tens more similar WORD_REGISTER_OPERATIONS problems
lurking.  We would be much better off if this wart were removed and we
handled such things properly.

That said:

> 	PR target/109040
> 	* combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
> 	word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
> 	smaller than word_mode.
> 	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
> 	<case AND>: Likewise.
> 
> 	* gcc.c-torture/execute/pr109040.c: New test.

Okay for trunk.  Thanks!


Segher

Patch

--- gcc/combine.cc.jj	2023-04-07 16:02:06.668051629 +0200
+++ gcc/combine.cc	2023-04-12 11:24:18.458240028 +0200
@@ -10055,9 +10055,12 @@  simplify_and_const_int_1 (scalar_int_mod
 
   /* See what bits may be nonzero in VAROP.  Unlike the general case of
      a call to nonzero_bits, here we don't care about bits outside
-     MODE.  */
+     MODE unless WORD_REGISTER_OPERATIONS is true.  */
 
-  nonzero = nonzero_bits (varop, mode) & GET_MODE_MASK (mode);
+  scalar_int_mode tmode = mode;
+  if (WORD_REGISTER_OPERATIONS && GET_MODE_BITSIZE (mode) < BITS_PER_WORD)
+    tmode = word_mode;
+  nonzero = nonzero_bits (varop, tmode) & GET_MODE_MASK (tmode);
 
   /* Turn off all bits in the constant that are known to already be zero.
      Thus, if the AND isn't needed at all, we will have CONSTOP == NONZERO_BITS
@@ -10071,7 +10074,7 @@  simplify_and_const_int_1 (scalar_int_mod
 
   /* If VAROP is a NEG of something known to be zero or 1 and CONSTOP is
      a power of two, we can replace this with an ASHIFT.  */
-  if (GET_CODE (varop) == NEG && nonzero_bits (XEXP (varop, 0), mode) == 1
+  if (GET_CODE (varop) == NEG && nonzero_bits (XEXP (varop, 0), tmode) == 1
       && (i = exact_log2 (constop)) >= 0)
     return simplify_shift_const (NULL_RTX, ASHIFT, mode, XEXP (varop, 0), i);
 
--- gcc/simplify-rtx.cc.jj	2023-03-02 19:09:45.459594212 +0100
+++ gcc/simplify-rtx.cc	2023-04-12 11:26:26.027400305 +0200
@@ -3752,7 +3752,13 @@  simplify_context::simplify_binary_operat
 	return op0;
       if (HWI_COMPUTABLE_MODE_P (mode))
 	{
-	  HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, mode);
+	  /* When WORD_REGISTER_OPERATIONS is true, we need to know the
+	     nonzero bits in WORD_MODE rather than MODE.  */
+	  scalar_int_mode tmode = as_a <scalar_int_mode> (mode);
+	  if (WORD_REGISTER_OPERATIONS
+	      && GET_MODE_BITSIZE (tmode) < BITS_PER_WORD)
+	    tmode = word_mode;
+	  HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, tmode);
 	  HOST_WIDE_INT nzop1;
 	  if (CONST_INT_P (trueop1))
 	    {
--- gcc/testsuite/gcc.c-torture/execute/pr109040.c.jj	2023-04-12 11:11:56.728938344 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr109040.c	2023-04-12 11:11:56.728938344 +0200
@@ -0,0 +1,23 @@ 
+/* PR target/109040 */
+
+typedef unsigned short __attribute__((__vector_size__ (32))) V;
+
+unsigned short a, b, c, d;
+
+void
+foo (V m, unsigned short *ret)
+{
+  V v = 6 > ((V) { 2124, 8 } & m);
+  unsigned short uc = v[0] + a + b + c + d;
+  *ret = uc;
+}
+
+int
+main ()
+{
+  unsigned short x;
+  foo ((V) { 0, 15 }, &x);
+  if (x != (unsigned short) ~0)
+    __builtin_abort ();
+  return 0;
+}