diff mbox series

combine: Don't optimize paradoxical SUBREG AND CONST_INT on WORD_REGISTER_OPERATIONS targets [PR112758]

Message ID ZYVKa0ZMbjsP+n1h@tucnak
State New
Headers show
Series combine: Don't optimize paradoxical SUBREG AND CONST_INT on WORD_REGISTER_OPERATIONS targets [PR112758] | expand

Commit Message

Jakub Jelinek Dec. 22, 2023, 8:35 a.m. UTC
Hi!

As discussed in the PR, the following testcase is miscompiled on RISC-V
64-bit, because num_sign_bit_copies in one spot pretends the bits in
a paradoxical SUBREG beyond SUBREG_REG SImode are all sign bit copies:
5444		  /* For paradoxical SUBREGs on machines where all register operations
5445		     affect the entire register, just look inside.  Note that we are
5446		     passing MODE to the recursive call, so the number of sign bit
5447		     copies will remain relative to that mode, not the inner mode.
5448	
5449		     This works only if loads sign extend.  Otherwise, if we get a
5450		     reload for the inner part, it may be loaded from the stack, and
5451		     then we lose all sign bit copies that existed before the store
5452		     to the stack.  */
5453		  if (WORD_REGISTER_OPERATIONS
5454		      && load_extend_op (inner_mode) == SIGN_EXTEND
5455		      && paradoxical_subreg_p (x)
5456		      && MEM_P (SUBREG_REG (x)))
and then optimizes based on that in one place, but then the
r7-1077 optimization triggers in and treats all the upper bits in
paradoxical SUBREG as undefined and performs based on that another
optimization.  The r7-1077 optimization is done only if SUBREG_REG
is either a REG or MEM, from the discussions in the PR seems that if
it is a REG, the upper bits in paradoxical SUBREG on
WORD_REGISTER_OPERATIONS targets aren't really undefined, but we can't
tell what values they have because we don't see the operation which
computed that REG, and for MEM it depends on load_extend_op - if
it is SIGN_EXTEND, the upper bits are sign bit copies and so something
not really usable for the optimization, if ZERO_EXTEND, they are zeros
and it is usable for the optimization, for UNKNOWN I think it is better
to punt as well.

So, the following patch basically disables the r7-1077 optimization
on WORD_REGISTER_OPERATIONS unless we know it is still ok for sure,
which is either if sub_width is >= BITS_PER_WORD because then the
WORD_REGISTER_OPERATIONS rules don't apply, or load_extend_op on a MEM
is ZERO_EXTEND.

Bootstrapped/regtested on x86_64-linux and i686-linux (neither of which
is WORD_REGISTER_OPERATIONS target), tested on the testcase using
cross to riscv64-linux but don't have an easy access to a
WORD_REGISTER_OPERATIONS target to bootstrap/regtest it there.

Ok for trunk?

2023-12-22  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/112758
	* combine.cc (make_compopund_operation_int): Optimize AND of a SUBREG
	based on nonzero_bits of SUBREG_REG and constant mask on
	WORD_REGISTER_OPERATIONS targets only if it is a zero extending
	MEM load.

	* gcc.c-torture/execute/pr112758.c: New test.


	Jakub

Comments

Eric Botcazou Dec. 22, 2023, 11:06 a.m. UTC | #1
> Bootstrapped/regtested on x86_64-linux and i686-linux (neither of which
> is WORD_REGISTER_OPERATIONS target), tested on the testcase using
> cross to riscv64-linux but don't have an easy access to a
> WORD_REGISTER_OPERATIONS target to bootstrap/regtest it there.
> 
> Ok for trunk?

Yes, thanks for fixing this.
diff mbox series

Patch

--- gcc/combine.cc.jj	2023-12-11 23:52:03.528513943 +0100
+++ gcc/combine.cc	2023-12-21 20:25:45.461737423 +0100
@@ -8227,12 +8227,20 @@  make_compound_operation_int (scalar_int_
 	  int sub_width;
 	  if ((REG_P (sub) || MEM_P (sub))
 	      && GET_MODE_PRECISION (sub_mode).is_constant (&sub_width)
-	      && sub_width < mode_width)
+	      && sub_width < mode_width
+	      && (!WORD_REGISTER_OPERATIONS
+		  || sub_width >= BITS_PER_WORD
+		  /* On WORD_REGISTER_OPERATIONS targets the bits
+		     beyond sub_mode aren't considered undefined,
+		     so optimize only if it is a MEM load when MEM loads
+		     zero extend, because then the upper bits are all zero.  */
+		  || (MEM_P (sub)
+		      && load_extend_op (sub_mode) == ZERO_EXTEND)))
 	    {
 	      unsigned HOST_WIDE_INT mode_mask = GET_MODE_MASK (sub_mode);
 	      unsigned HOST_WIDE_INT mask;
 
-	      /* original AND constant with all the known zero bits set */
+	      /* Original AND constant with all the known zero bits set.  */
 	      mask = UINTVAL (XEXP (x, 1)) | (~nonzero_bits (sub, sub_mode));
 	      if ((mask & mode_mask) == mode_mask)
 		{
--- gcc/testsuite/gcc.c-torture/execute/pr112758.c.jj	2023-12-21 21:01:43.780755959 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr112758.c	2023-12-21 21:01:30.521940358 +0100
@@ -0,0 +1,15 @@ 
+/* PR rtl-optimization/112758 */
+
+int a = -__INT_MAX__ - 1;
+
+int
+main ()
+{
+  if (-__INT_MAX__ - 1U == 0x80000000ULL)
+    {
+      unsigned long long b = 0xffff00ffffffffffULL;
+      if ((b & a) != 0xffff00ff80000000ULL)
+	__builtin_abort ();
+    }
+  return 0;
+}