Simplify (truncate:QI (subreg:SI (reg:QI x))) to (reg:QI x)

Message ID 000f01d79550$77015fa0$65041ee0$@nextmovesoftware.com
State New

Commit Message

Roger Sayle Aug. 19, 2021, 11:18 p.m. UTC
Whilst working on a backend patch, I noticed that the middle-end's
RTL optimizers weren't simplifying a truncation of a paradoxical
subreg extension, though it does transform closely related (more
complex) expressions.  The main (first) part of this patch
implements this simplification, reusing much of the logic already
in place.
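
As a rough illustration only (this is not code from the patch), the
identity behind the main simplification can be modelled on plain C
integers.  The function names and the "junk" parameter below are
hypothetical; the junk stands in for the unspecified upper bits of a
paradoxical subreg.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of (subreg:SI (reg:QI x) 0): the low 8 bits hold x and the
   upper 24 bits are unspecified, modelled here by caller-chosen junk.  */
uint32_t
toy_paradoxical_subreg_si (uint8_t x, uint32_t junk)
{
  return (junk & 0xffffff00u) | x;
}

/* Toy model of (truncate:QI y) for an SImode y: keep the low 8 bits.  */
uint8_t
toy_truncate_qi (uint32_t y)
{
  return (uint8_t) (y & 0xffu);
}

/* Return 1 if truncating the widened value gives back x for this junk.  */
int
toy_roundtrip_ok (uint8_t x, uint32_t junk)
{
  uint32_t wide = toy_paradoxical_subreg_si (x, junk);
  /* The model must not disturb the low byte.  */
  assert ((wide & 0xffu) == x);
  return toy_truncate_qi (wide) == x;
}
```

Whatever the upper bits contain, the truncation discards them, which is
why (truncate:QI (subreg:SI (reg:QI x))) can be replaced by (reg:QI x).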

I briefly considered suggesting that it's difficult to provide a new
testcase for this change, but then realized the reviewer's response
would be that this type of transformation should be self-tested
in simplify-rtx, so this patch adds a bunch of tests that integer
extensions and truncations are simplified as expected.  No good
deed goes unpunished and I was equally surprised to see that we
don't currently simplify/check/defend (zero_extend:SI (reg:SI)),
i.e. useless no-op extensions to the same mode.  So I've added
some logic to simplify (or more accurately prevent us generating
dubious RTL for) those.
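
A minimal sketch of the same-mode check, assuming a made-up expression
type (toy_rtx, toy_code and toy_simplify_extend are hypothetical and
are not GCC's rtx API), might look like this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for RTL codes and expressions.  */
enum toy_code { TOY_REG, TOY_ZERO_EXTEND, TOY_SIGN_EXTEND };

struct toy_rtx
{
  enum toy_code code;
  unsigned bits;                /* Precision of the expression's mode.  */
  const struct toy_rtx *op;     /* Operand, or NULL for a register.  */
};

/* Try to simplify (CODE:<bits> OP).  Return OP itself when the operand
   already has the requested precision, since the extension is then a
   no-op; return NULL when this sketch knows no simplification.  */
const struct toy_rtx *
toy_simplify_extend (enum toy_code code, unsigned bits,
                     const struct toy_rtx *op)
{
  assert (code == TOY_ZERO_EXTEND || code == TOY_SIGN_EXTEND);
  if (op->bits == bits)
    return op;
  return NULL;
}
```

Returning the operand unchanged mirrors the idea of never building an
extension whose source and destination modes are identical.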

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and "make -k check" with no new failures.

Ok for mainline?


2021-08-20  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* simplify-rtx.c (simplify_truncation): Generalize simplification
	of (truncate:A (subreg:B X)).
	(simplify_unary_operation_1) [FLOAT_TRUNCATE, FLOAT_EXTEND,
	SIGN_EXTEND, ZERO_EXTEND]: Handle cases where the operand
	already has the desired machine mode.
	(test_scalar_int_ops): Add tests that useless extensions and
	truncations are optimized away.
	(test_scalar_int_ext_ops): New self-test function to confirm
	that truncations of extensions are correctly simplified.
	(test_scalar_int_ext_ops2): New self-test function to check
	truncations of truncations, extensions of extensions, and
	truncations of extensions.
	(test_scalar_ops): Call the above two functions with a
	representative sampling of integer machine modes.

Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

Comments

Andrew Pinski Aug. 19, 2021, 11:37 p.m. UTC | #1
On Thu, Aug 19, 2021 at 4:18 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Whilst working on a backend patch, I noticed that the middle-end's
> RTL optimizers weren't simplifying a truncation of a paradoxical
> subreg extension, though it does transform closely related (more
> complex) expressions.  The main (first) part of this patch
> implements this simplification, reusing much of the logic already
> in place.
>
> I briefly considered suggesting that it's difficult to provide a new
> testcase for this change, but then realized the reviewer's response
> would be that this type of transformation should be self-tested
> in simplify-rtx, so this patch adds a bunch of tests that integer
> extensions and truncations are simplified as expected.  No good
> deed goes unpunished and I was equally surprised to see that we
> don't currently simplify/check/defend (zero_extend:SI (reg:SI)),
> i.e. useless no-op extensions to the same mode.  So I've added
> some logic to simplify (or more accurately prevent us generating
> dubious RTL for) those.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and "make -k check" with no new failures.
>
> Ok for mainline?

The main target I know of that uses truncate a lot is MIPS64 which has
TARGET_TRULY_NOOP_TRUNCATION defined to be:
static bool
mips_truly_noop_truncation (poly_uint64 outprec, poly_uint64 inprec)
{
  return !TARGET_64BIT || inprec <= 32 || outprec > 32;
}

So you might want to make sure this is still correct for this case.
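
To make the concern concrete, here is a stand-alone restatement of that
predicate.  It is a sketch under stated assumptions: plain unsigned
values replace poly_uint64, and TARGET_64BIT becomes an explicit
parameter; the function name is hypothetical.

```c
#include <assert.h>

/* Stand-alone restatement of the MIPS hook quoted above.  Returns
   nonzero when truncating from INPREC bits to OUTPREC bits needs no
   instruction.  TARGET_64BIT is passed in as a flag (an assumption of
   this sketch, not how the real hook is written).  */
int
toy_mips_truly_noop_truncation (int target_64bit, unsigned outprec,
                                unsigned inprec)
{
  assert (outprec <= inprec);
  return !target_64bit || inprec <= 32 || outprec > 32;
}
```

Under this model a DImode to SImode truncation on a 64-bit target
(outprec 32, inprec 64) is not a no-op, whereas the SImode to QImode
case from the subject line (outprec 8, inprec 32) is.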

Thanks,
Andrew Pinski


>
>
> 2021-08-20  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * simplify-rtx.c (simplify_truncation): Generalize simplification
>         of (truncate:A (subreg:B X)).
>         (simplify_unary_operation_1) [FLOAT_TRUNCATE, FLOAT_EXTEND,
>         SIGN_EXTEND, ZERO_EXTEND]: Handle cases where the operand
>         already has the desired machine mode.
>         (test_scalar_int_ops): Add tests that useless extensions and
>         truncations are optimized away.
>         (test_scalar_int_ext_ops): New self-test function to confirm
>         that truncations of extensions are correctly simplified.
>         (test_scalar_int_ext_ops2): New self-test function to check
>         truncations of truncations, extensions of extensions, and
>         truncations of extensions.
>         (test_scalar_ops): Call the above two functions with a
>         representative sampling of integer machine modes.
>
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>
Jeff Law Aug. 23, 2021, 1:19 a.m. UTC | #2
On 8/19/2021 5:18 PM, Roger Sayle wrote:
> Whilst working on a backend patch, I noticed that the middle-end's
> RTL optimizers weren't simplifying a truncation of a paradoxical
> subreg extension, though it does transform closely related (more
> complex) expressions.  The main (first) part of this patch
> implements this simplification, reusing much of the logic already
> in place.
>
> I briefly considered suggesting that it's difficult to provide a new
> testcase for this change, but then realized the reviewer's response
> would be that this type of transformation should be self-tested
> in simplify-rtx, so this patch adds a bunch of tests that integer
> extensions and truncations are simplified as expected.  No good
> deed goes unpunished and I was equally surprised to see that we
> don't currently simplify/check/defend (zero_extend:SI (reg:SI)),
> i.e. useless no-op extensions to the same mode.  So I've added
> some logic to simplify (or more accurately prevent us generating
> dubious RTL for) those.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and "make -k check" with no new failures.
Indeed.  I'd bet there are other weaknesses in here.  I've got some
patches here which add overflow handling on the H8 port (attempting to
cut runtime of the builtin-arith-overflow-* tests).  Those end up using
subregs and extensions fairly heavily.  While looking at how the code
moves through the RTL pipeline it became pretty clear that we're
generally not doing a good job of optimizing those cases.

Thankfully I've found some sequences that allow the port to do limited 
store-flag instructions and that eliminated the need to chase this stuff 
down, at least for now.

>
> Ok for mainline?
>
>
> 2021-08-20  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
> 	* simplify-rtx.c (simplify_truncation): Generalize simplification
> 	of (truncate:A (subreg:B X)).
> 	(simplify_unary_operation_1) [FLOAT_TRUNCATE, FLOAT_EXTEND,
> 	SIGN_EXTEND, ZERO_EXTEND]: Handle cases where the operand
> 	already has the desired machine mode.
> 	(test_scalar_int_ops): Add tests that useless extensions and
> 	truncations are optimized away.
> 	(test_scalar_int_ext_ops): New self-test function to confirm
> 	that truncations of extensions are correctly simplified.
> 	(test_scalar_int_ext_ops2): New self-test function to check
> 	truncations of truncations, extensions of extensions, and
> 	truncations of extensions.
> 	(test_scalar_ops): Call the above two functions with a
> 	representative sampling of integer machine modes.
I briefly thought you were missing a subreg_lowpart check, but that's
checked in the outermost IF.  The comments are somewhat misleading as
the subreg offset in a lowpart will vary based on endianness, but that's
not a big deal IMHO.
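
For readers unfamiliar with the endianness point, a simplified model of
the lowpart byte offset is sketched below.  The function name is
hypothetical, and the sketch assumes byte and word endianness agree,
which is not true of every target.

```c
#include <assert.h>

/* Simplified model of the byte offset of the low part of an
   INNER_BYTES-wide value when viewed in an OUTER_BYTES-wide mode.
   Assumes byte order and word order agree (an assumption of this
   sketch).  Paradoxical and same-size subregs use offset 0.  */
unsigned
toy_lowpart_offset (unsigned outer_bytes, unsigned inner_bytes,
                    int big_endian)
{
  assert (outer_bytes > 0 && inner_bytes > 0);
  if (outer_bytes >= inner_bytes)
    return 0;
  return big_endian ? inner_bytes - outer_bytes : 0;
}
```

So the low QImode part of an SImode register sits at offset 0 on a
little-endian target but at offset 3 on a big-endian one, which is why
a literal "0" in the comments only describes the little-endian and
paradoxical cases.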

OK
jeff

Patch

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index a719f57..f3df614 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -813,23 +813,49 @@  simplify_context::simplify_truncation (machine_mode mode, rtx op,
     return simplify_gen_unary (GET_CODE (op), mode,
 			       XEXP (XEXP (op, 0), 0), mode);
 
-  /* (truncate:A (subreg:B (truncate:C X) 0)) is
-     (truncate:A X).  */
+  /* Simplifications of (truncate:A (subreg:B X 0)).  */
   if (GET_CODE (op) == SUBREG
       && is_a <scalar_int_mode> (mode, &int_mode)
       && SCALAR_INT_MODE_P (op_mode)
       && is_a <scalar_int_mode> (GET_MODE (SUBREG_REG (op)), &subreg_mode)
-      && GET_CODE (SUBREG_REG (op)) == TRUNCATE
       && subreg_lowpart_p (op))
     {
-      rtx inner = XEXP (SUBREG_REG (op), 0);
-      if (GET_MODE_PRECISION (int_mode) <= GET_MODE_PRECISION (subreg_mode))
-	return simplify_gen_unary (TRUNCATE, int_mode, inner,
-				   GET_MODE (inner));
-      else
-	/* If subreg above is paradoxical and C is narrower
-	   than A, return (subreg:A (truncate:C X) 0).  */
-	return simplify_gen_subreg (int_mode, SUBREG_REG (op), subreg_mode, 0);
+      /* (truncate:A (subreg:B (truncate:C X) 0)) is (truncate:A X).  */
+      if (GET_CODE (SUBREG_REG (op)) == TRUNCATE)
+	{
+	  rtx inner = XEXP (SUBREG_REG (op), 0);
+	  if (GET_MODE_PRECISION (int_mode)
+	      <= GET_MODE_PRECISION (subreg_mode))
+	    return simplify_gen_unary (TRUNCATE, int_mode, inner,
+				       GET_MODE (inner));
+	  else
+	    /* If subreg above is paradoxical and C is narrower
+	       than A, return (subreg:A (truncate:C X) 0).  */
+	    return simplify_gen_subreg (int_mode, SUBREG_REG (op),
+					subreg_mode, 0);
+	}
+
+      /* Simplifications of (truncate:A (subreg:B X:C 0)) with
+	 paradoxical subregs (B is wider than C).  */
+      if (is_a <scalar_int_mode> (op_mode, &int_op_mode))
+	{
+	  unsigned int int_op_prec = GET_MODE_PRECISION (int_op_mode);
+	  unsigned int subreg_prec = GET_MODE_PRECISION (subreg_mode);
+	  if (int_op_prec > subreg_prec)
+	    {
+	      if (int_mode == subreg_mode)
+		return SUBREG_REG (op);
+	      if (GET_MODE_PRECISION (int_mode) < subreg_prec)
+		return simplify_gen_unary (TRUNCATE, int_mode,
+					   SUBREG_REG (op), subreg_mode);
+	    }
+	  /* Simplification of (truncate:A (subreg:B X:C 0)) where
+	     A is narrower than B and B is narrower than C.  */
+	  else if (int_op_prec < subreg_prec
+		   && GET_MODE_PRECISION (int_mode) < int_op_prec)
+	    return simplify_gen_unary (TRUNCATE, int_mode,
+				       SUBREG_REG (op), subreg_mode);
+	}
     }
 
   /* (truncate:A (truncate:B X)) is (truncate:A X).  */
@@ -1245,6 +1271,10 @@  simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
       break;
 
     case FLOAT_TRUNCATE:
+      /* Check for useless truncation.  */
+      if (GET_MODE (op) == mode)
+	return op;
+
       if (DECIMAL_FLOAT_MODE_P (mode))
 	break;
 
@@ -1297,6 +1327,10 @@  simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
       break;
 
     case FLOAT_EXTEND:
+      /* Check for useless extension.  */
+      if (GET_MODE (op) == mode)
+	return op;
+
       if (DECIMAL_FLOAT_MODE_P (mode))
 	break;
 
@@ -1410,6 +1444,10 @@  simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
       break;
 
     case SIGN_EXTEND:
+      /* Check for useless extension.  */
+      if (GET_MODE (op) == mode)
+	return op;
+
       /* (sign_extend (truncate (minus (label_ref L1) (label_ref L2))))
 	 becomes just the MINUS if its mode is MODE.  This allows
 	 folding switch statements on machines using casesi (such as
@@ -1580,6 +1618,10 @@  simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
       break;
 
     case ZERO_EXTEND:
+      /* Check for useless extension.  */
+      if (GET_MODE (op) == mode)
+	return op;
+
       /* Check for a zero extension of a subreg of a promoted
 	 variable, where the promotion is zero-extended, and the
 	 target mode is the same as the variable's promotion.  */
@@ -7563,8 +7605,92 @@  test_scalar_int_ops (machine_mode mode)
 		 simplify_gen_binary (IOR, mode, and_op0_6, and_op1_6));
   ASSERT_RTX_EQ (simplify_gen_binary (AND, mode, and_op0_op1, six),
 		 simplify_gen_binary (AND, mode, and_op0_6, and_op1_6));
+
+  /* Test useless extensions are eliminated.  */
+  ASSERT_RTX_EQ (op0, simplify_gen_unary (TRUNCATE, mode, op0, mode));
+  ASSERT_RTX_EQ (op0, simplify_gen_unary (ZERO_EXTEND, mode, op0, mode));
+  ASSERT_RTX_EQ (op0, simplify_gen_unary (SIGN_EXTEND, mode, op0, mode));
+  ASSERT_RTX_EQ (op0, lowpart_subreg (mode, op0, mode));
+}
+
+/* Verify some simplifications of integer extension/truncation.
+   Machine mode BMODE is guaranteed to be wider than SMODE.  */
+
+static void
+test_scalar_int_ext_ops (machine_mode bmode, machine_mode smode)
+{
+  rtx sreg = make_test_reg (smode);
+
+  /* Check truncation of extension.  */
+  ASSERT_RTX_EQ (simplify_gen_unary (TRUNCATE, smode,
+				     simplify_gen_unary (ZERO_EXTEND, bmode,
+							 sreg, smode),
+				     bmode),
+		 sreg);
+  ASSERT_RTX_EQ (simplify_gen_unary (TRUNCATE, smode,
+				     simplify_gen_unary (SIGN_EXTEND, bmode,
+							 sreg, smode),
+				     bmode),
+		 sreg);
+  ASSERT_RTX_EQ (simplify_gen_unary (TRUNCATE, smode,
+				     lowpart_subreg (bmode, sreg, smode),
+				     bmode),
+		 sreg);
 }
 
+/* Verify more simplifications of integer extension/truncation.
+   BMODE is wider than MMODE which is wider than SMODE.  */
+
+static void
+test_scalar_int_ext_ops2 (machine_mode bmode, machine_mode mmode,
+			  machine_mode smode)
+{
+  rtx breg = make_test_reg (bmode);
+  rtx mreg = make_test_reg (mmode);
+  rtx sreg = make_test_reg (smode);
+
+  /* Check truncate of truncate.  */
+  ASSERT_RTX_EQ (simplify_gen_unary (TRUNCATE, smode,
+				     simplify_gen_unary (TRUNCATE, mmode,
+							 breg, bmode),
+				     mmode),
+		 simplify_gen_unary (TRUNCATE, smode, breg, bmode));
+
+  /* Check extension of extension.  */
+  ASSERT_RTX_EQ (simplify_gen_unary (ZERO_EXTEND, bmode,
+				     simplify_gen_unary (ZERO_EXTEND, mmode,
+							 sreg, smode),
+				     mmode),
+		 simplify_gen_unary (ZERO_EXTEND, bmode, sreg, smode));
+  ASSERT_RTX_EQ (simplify_gen_unary (SIGN_EXTEND, bmode,
+				     simplify_gen_unary (SIGN_EXTEND, mmode,
+							 sreg, smode),
+				     mmode),
+		 simplify_gen_unary (SIGN_EXTEND, bmode, sreg, smode));
+  ASSERT_RTX_EQ (simplify_gen_unary (SIGN_EXTEND, bmode,
+				     simplify_gen_unary (ZERO_EXTEND, mmode,
+							 sreg, smode),
+				     mmode),
+		 simplify_gen_unary (ZERO_EXTEND, bmode, sreg, smode));
+
+  /* Check truncation of extension.  */
+  ASSERT_RTX_EQ (simplify_gen_unary (TRUNCATE, smode,
+				     simplify_gen_unary (ZERO_EXTEND, bmode,
+							 mreg, mmode),
+				     bmode),
+		 simplify_gen_unary (TRUNCATE, smode, mreg, mmode));
+  ASSERT_RTX_EQ (simplify_gen_unary (TRUNCATE, smode,
+				     simplify_gen_unary (SIGN_EXTEND, bmode,
+							 mreg, mmode),
+				     bmode),
+		 simplify_gen_unary (TRUNCATE, smode, mreg, mmode));
+  ASSERT_RTX_EQ (simplify_gen_unary (TRUNCATE, smode,
+				     lowpart_subreg (bmode, mreg, mmode),
+				     bmode),
+		 simplify_gen_unary (TRUNCATE, smode, mreg, mmode));
+}
+
+
 /* Verify some simplifications involving scalar expressions.  */
 
 static void
@@ -7576,6 +7702,18 @@  test_scalar_ops ()
       if (SCALAR_INT_MODE_P (mode) && mode != BImode)
 	test_scalar_int_ops (mode);
     }
+
+  test_scalar_int_ext_ops (HImode, QImode);
+  test_scalar_int_ext_ops (SImode, QImode);
+  test_scalar_int_ext_ops (SImode, HImode);
+  test_scalar_int_ext_ops (DImode, QImode);
+  test_scalar_int_ext_ops (DImode, HImode);
+  test_scalar_int_ext_ops (DImode, SImode);
+
+  test_scalar_int_ext_ops2 (SImode, HImode, QImode);
+  test_scalar_int_ext_ops2 (DImode, HImode, QImode);
+  test_scalar_int_ext_ops2 (DImode, SImode, QImode);
+  test_scalar_int_ext_ops2 (DImode, SImode, HImode);
 }
 
 /* Test vector simplifications involving VEC_DUPLICATE in which the