[take,2] middle-end: Optimize (A&C)^(B&C) to (A^B)&C in simplify_rtx.

Message ID 000c01d643b2$845dfc80$8d19f580$@nextmovesoftware.com
State New
Series [take,2] middle-end: Optimize (A&C)^(B&C) to (A^B)&C in simplify_rtx.

Commit Message

Roger Sayle June 16, 2020, 7:48 a.m. UTC
As suggested by Richard Sandiford, this patch splits out the
previous RTL simplification of (X&C)^(Y&C) to (X^Y)&C into its
own function, simplify_distributive_operation, and calls it
where appropriate for IOR, XOR and AND.

Instrumenting a bootstrap reveals this optimization triggers
393358 times during stage2, stage3 and building the libraries,
and a further 263659 times during make check.  In decreasing order
of occurrence, the RTL transformation opportunities are:

 284447 01 and ior
 156798 00 xor and
 131110 11 and ior
  47253 00 and ior
  28035 00 ior and
   2804 01 ior and
   2698 11 ior and
   2262 01 xor and
    602 11 xor and
    312 00 xor xor
    298 00 xor rotate
    120 00 and ashift
    108 00 ior lshiftrt
     60 00 and ashiftrt
     54 00 ior ashift
     18 00 and lshiftrt
     12 10 xor and
     12 00 xor rotatert
      8 00 xor lshiftrt
      4 10 ior and
      2 00 ior ashiftrt

where the two binary digits denote the positions of the surviving
unique inner operands, so "00 xor and" corresponds to the original
(xor (and X Z) (and Y Z)), and "01 and ior" corresponds to
(and (ior X Y) (ior Y Z)).

Many thanks also to Richard for pointing out simplify_rtx_c_tests, the
self-testing framework in simplify-rtx.c, which is new since my day.
This patch supplements the existing vector mode testing with a suite
of scalar integer mode tests confirming that many of the expected
integer simplifications in simplify-rtx are applied.
This includes three tests of the new simplify_distributive_operation.

Before:
xgcc -xc -fself-test: 59693 pass(es) in 0.820956 seconds
xgcc -xc++ -fself-test: 59723 pass(es) in 0.786662 seconds
After:
xgcc -xc -fself-test: 60003 pass(es) in 0.860637 seconds
xgcc -xc++ -fself-test: 60033 pass(es) in 0.794624 seconds


I do have one thought/suggestion around test_scalar_ops for future
generations.  These tests are extremely strict: instead of an
unexpected failure in the testsuite, breaking a self-test stops
the build.  Should anything go wrong in future (on a misbehaving
platform, say), rather than reverting this patch, might I propose
simply commenting out the call to test_scalar_ops in
simplify_rtx_c_tests as a mitigation whilst the build is restored?
In fact, removing the "static" from test_scalar_ops would avoid
the "defined but not used" complication in this disaster recovery
plan.

This patch has been tested with "make bootstrap" and "make -k check"
on x86_64-pc-linux-gnu with no regressions.


2020-06-16  Roger Sayle  <roger@nextmovesoftware.com>
            Richard Sandiford  <richard.sandiford@arm.com>

        * simplify-rtx.c (simplify_distributive_operation): New function
        to un-distribute a binary operation of two binary operations, e.g.
        (X & C) ^ (Y & C) to (X ^ Y) & C, when C is simple (i.e. a constant).
        (simplify_binary_operation_1) <IOR, XOR, AND>: Call it from here
        when appropriate.
        (test_scalar_int_ops): New function for unit self-testing
        scalar integer transformations in simplify-rtx.c.
        (test_scalar_ops): Call test_scalar_int_ops for each integer mode.
        (simplify_rtx_c_tests): Call test_scalar_ops.


Many thanks in advance,
Roger
--
Roger Sayle,
NextMove Software
Cambridge, UK

Comments

Richard Sandiford June 22, 2020, 7:41 p.m. UTC | #1
Hi Roger,

Thanks for the update and sorry for the slow reply.

"Roger Sayle" <roger@nextmovesoftware.com> writes:
> As suggested by Richard Sandiford, this patch splits out the
> previous RTL simplification of (X&C)^(Y&C) to (X^Y)&C into its
> own function, simplify_distributive_operation, and calls it
> where appropriate for IOR, XOR and AND.
>
> Instrumenting a bootstrap reveals this optimization triggers
> 393358 times during stage2, stage3 and building the libraries,
> and a further 263659 times during make check.  In decreasing order
> of occurrence, the RTL transformation opportunities are:
>
>  284447 01 and ior
>  156798 00 xor and
>  131110 11 and ior
>   47253 00 and ior
>   28035 00 ior and
>    2804 01 ior and
>    2698 11 ior and
>    2262 01 xor and
>     602 11 xor and
>     312 00 xor xor
>     298 00 xor rotate
>     120 00 and ashift
>     108 00 ior lshiftrt
>      60 00 and ashiftrt
>      54 00 ior ashift
>      18 00 and lshiftrt
>      12 10 xor and
>      12 00 xor rotatert
>       8 00 xor lshiftrt
>       4 10 ior and
>       2 00 ior ashiftrt

That's an impressive number of hits :-)

> where the two binary digits denote the positions of the surviving
> unique inner operands, so "00 xor and" corresponds to the original
> (xor (and X Z) (and Y Z)), and "01 and ior" corresponds to
> (and (ior X Y) (ior Y Z)).
>
> Many thanks also to Richard for pointing out simplify_rtx_c_tests, the
> self-testing framework in simplify-rtx.c, which is new since my day.
> This patch supplements the existing vector mode testing with a suite
> of scalar integer mode tests confirming that many of the expected
> integer simplifications in simplify-rtx are applied.
> This includes three tests of the new simplify_distributive_operation.
>
> Before:
> xgcc -xc -fself-test: 59693 pass(es) in 0.820956 seconds
> xgcc -xc++ -fself-test: 59723 pass(es) in 0.786662 seconds
> After:
> xgcc -xc -fself-test: 60003 pass(es) in 0.860637 seconds
> xgcc -xc++ -fself-test: 60033 pass(es) in 0.794624 seconds
>
>
> I do have one thought/suggestion around test_scalar_ops for future
> generations.  These tests are extremely strict: instead of an
> unexpected failure in the testsuite, breaking a self-test stops
> the build.  Should anything go wrong in future (on a misbehaving
> platform, say), rather than reverting this patch, might I propose
> simply commenting out the call to test_scalar_ops in
> simplify_rtx_c_tests as a mitigation whilst the build is restored?
> In fact, removing the "static" from test_scalar_ops would avoid
> the "defined but not used" complication in this disaster recovery
> plan.

Yeah, we can work around it rather than revert the patch.

> This patch has been tested with "make bootstrap" and "make -k check"
> on x86_64-pc-linux-gnu with no regressions.
>
>
> 2020-06-16  Roger Sayle  <roger@nextmovesoftware.com>
>             Richard Sandiford  <richard.sandiford@arm.com>

Thanks for the gesture, but I don't think I should be co-author here.
I didn't write anything :-)

> […]
> @@ -3064,6 +3112,21 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
>  	    }
>  	}
>  
> +      /* Convert (ior (and A C) (and B C)) into (and (ior A B) C).  */
> +      if (GET_CODE (op0) == GET_CODE (op1)
> +	  && (GET_CODE (op0) == AND
> +	      || GET_CODE (op0) == IOR
> +	      || GET_CODE (op0) == LSHIFTRT
> +	      || GET_CODE (op0) == ASHIFTRT
> +	      || GET_CODE (op0) == ASHIFT
> +	      || GET_CODE (op0) == ROTATE
> +	      || GET_CODE (op0) == ROTATERT))
> +	{
> +	  tem = simplify_distributive_operation (code, mode, op0, op1);
> +	  if (tem)
> +	    return tem;
> +	}
> +
>        tem = simplify_byte_swapping_operation (code, mode, op0, op1);
>        if (tem)
>  	return tem;
> @@ -3302,6 +3365,20 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
>  	  && (reversed = reversed_comparison (op0, int_mode)))
>  	return reversed;
>  
> +      /* Convert (xor (and A C) (and B C)) into (and (xor A B) C).  */
> +      if (GET_CODE (op0) == GET_CODE (op1)
> +	  && (GET_CODE (op0) == AND
> +	      || GET_CODE (op0) == XOR
> +	      || GET_CODE (op0) == LSHIFTRT
> +	      || GET_CODE (op0) == ASHIFT
> +	      || GET_CODE (op0) == ROTATE
> +	      || GET_CODE (op0) == ROTATERT))
> +	{
> +	  tem = simplify_distributive_operation (code, mode, op0, op1);
> +	  if (tem)
> +	    return tem;
> +	}
> +
>        tem = simplify_byte_swapping_operation (code, mode, op0, op1);
>        if (tem)
>  	return tem;
> @@ -3500,6 +3577,21 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
>  	  && rtx_equal_p (op1, XEXP (XEXP (op0, 1), 0)))
>  	return simplify_gen_binary (AND, mode, op1, XEXP (op0, 0));
>  
> +      /* Convert (and (ior A C) (ior B C)) into (ior (and A B) C).  */
> +      if (GET_CODE (op0) == GET_CODE (op1)
> +	  && (GET_CODE (op0) == AND
> +	      || GET_CODE (op0) == IOR
> +	      || GET_CODE (op0) == LSHIFTRT
> +	      || GET_CODE (op0) == ASHIFTRT
> +	      || GET_CODE (op0) == ASHIFT
> +	      || GET_CODE (op0) == ROTATE
> +	      || GET_CODE (op0) == ROTATERT))
> +	{
> +	  tem = simplify_distributive_operation (code, mode, op0, op1);
> +	  if (tem)
> +	    return tem;
> +	}
> +
>        tem = simplify_byte_swapping_operation (code, mode, op0, op1);
>        if (tem)
>  	return tem;

Any reason to exclude ASHIFTRT from the XOR case?  AFAICT it distributes
in the same way: for equal shifts, the ASHIFTRT reproduces the same
number of sign bits, which are handled by all logical ops in the same
way as the original sign bit.  Thus doing the logical OP first on the
original sign bit and then shifting will have the same effect.
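
For what it's worth, that argument is easy to confirm exhaustively on
narrow values with a throwaway check; not part of the patch, and it
assumes >> on a negative int is an arithmetic shift, which is
implementation-defined in C but true for GCC:

#include <assert.h>

int
main (void)
{
  /* Verify (x >> s) ^ (y >> s) == (x ^ y) >> s for 8-bit signed
     values, where >> on a negative int is an arithmetic shift.  */
  for (int x = -128; x < 128; x++)
    for (int y = -128; y < 128; y++)
      for (int s = 0; s < 8; s++)
	assert (((x >> s) ^ (y >> s)) == ((x ^ y) >> s));
  return 0;
}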

LGTM otherwise, thanks.  If you can update the patch then I'll push it.

Richard

Patch

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 28c2dc6..0f93e17 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -2392,6 +2392,54 @@  simplify_binary_operation_series (rtx_code code, machine_mode mode,
   return gen_vec_series (mode, new_base, new_step);
 }
 
+/* Subroutine of simplify_binary_operation_1.  Un-distribute a binary
+   operation CODE with result mode MODE, operating on OP0 and OP1.
+   e.g. simplify (xor (and A C) (and B C)) to (and (xor A B) C).
+   Returns NULL_RTX if no simplification is possible.  */
+
+static rtx
+simplify_distributive_operation (enum rtx_code code, machine_mode mode,
+				 rtx op0, rtx op1)
+{
+  enum rtx_code op = GET_CODE (op0);
+  gcc_assert (GET_CODE (op1) == op);
+
+  if (rtx_equal_p (XEXP (op0, 1), XEXP (op1, 1))
+      && ! side_effects_p (XEXP (op0, 1)))
+    return simplify_gen_binary (op, mode,
+				simplify_gen_binary (code, mode,
+						     XEXP (op0, 0),
+						     XEXP (op1, 0)),
+				XEXP (op0, 1));
+
+  if (GET_RTX_CLASS (op) == RTX_COMM_ARITH)
+    {
+      if (rtx_equal_p (XEXP (op0, 0), XEXP (op1, 0))
+	  && ! side_effects_p (XEXP (op0, 0)))
+	return simplify_gen_binary (op, mode,
+				    simplify_gen_binary (code, mode,
+							 XEXP (op0, 1),
+							 XEXP (op1, 1)),
+				    XEXP (op0, 0));
+      if (rtx_equal_p (XEXP (op0, 0), XEXP (op1, 1))
+	  && ! side_effects_p (XEXP (op0, 0)))
+	return simplify_gen_binary (op, mode,
+				    simplify_gen_binary (code, mode,
+							 XEXP (op0, 1),
+							 XEXP (op1, 0)),
+				    XEXP (op0, 0));
+      if (rtx_equal_p (XEXP (op0, 1), XEXP (op1, 0))
+	  && ! side_effects_p (XEXP (op0, 1)))
+	return simplify_gen_binary (op, mode,
+				    simplify_gen_binary (code, mode,
+							 XEXP (op0, 0),
+							 XEXP (op1, 1)),
+				    XEXP (op0, 1));
+    }
+
+  return NULL_RTX;
+}
+
 /* Subroutine of simplify_binary_operation.  Simplify a binary operation
    CODE with result mode MODE, operating on OP0 and OP1.  If OP0 and/or
    OP1 are constant pool references, TRUEOP0 and TRUEOP1 represent the
@@ -3064,6 +3112,21 @@  simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
 	    }
 	}
 
+      /* Convert (ior (and A C) (and B C)) into (and (ior A B) C).  */
+      if (GET_CODE (op0) == GET_CODE (op1)
+	  && (GET_CODE (op0) == AND
+	      || GET_CODE (op0) == IOR
+	      || GET_CODE (op0) == LSHIFTRT
+	      || GET_CODE (op0) == ASHIFTRT
+	      || GET_CODE (op0) == ASHIFT
+	      || GET_CODE (op0) == ROTATE
+	      || GET_CODE (op0) == ROTATERT))
+	{
+	  tem = simplify_distributive_operation (code, mode, op0, op1);
+	  if (tem)
+	    return tem;
+	}
+
       tem = simplify_byte_swapping_operation (code, mode, op0, op1);
       if (tem)
 	return tem;
@@ -3302,6 +3365,20 @@  simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
 	  && (reversed = reversed_comparison (op0, int_mode)))
 	return reversed;
 
+      /* Convert (xor (and A C) (and B C)) into (and (xor A B) C).  */
+      if (GET_CODE (op0) == GET_CODE (op1)
+	  && (GET_CODE (op0) == AND
+	      || GET_CODE (op0) == XOR
+	      || GET_CODE (op0) == LSHIFTRT
+	      || GET_CODE (op0) == ASHIFT
+	      || GET_CODE (op0) == ROTATE
+	      || GET_CODE (op0) == ROTATERT))
+	{
+	  tem = simplify_distributive_operation (code, mode, op0, op1);
+	  if (tem)
+	    return tem;
+	}
+
       tem = simplify_byte_swapping_operation (code, mode, op0, op1);
       if (tem)
 	return tem;
@@ -3500,6 +3577,21 @@  simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
 	  && rtx_equal_p (op1, XEXP (XEXP (op0, 1), 0)))
 	return simplify_gen_binary (AND, mode, op1, XEXP (op0, 0));
 
+      /* Convert (and (ior A C) (ior B C)) into (ior (and A B) C).  */
+      if (GET_CODE (op0) == GET_CODE (op1)
+	  && (GET_CODE (op0) == AND
+	      || GET_CODE (op0) == IOR
+	      || GET_CODE (op0) == LSHIFTRT
+	      || GET_CODE (op0) == ASHIFTRT
+	      || GET_CODE (op0) == ASHIFT
+	      || GET_CODE (op0) == ROTATE
+	      || GET_CODE (op0) == ROTATERT))
+	{
+	  tem = simplify_distributive_operation (code, mode, op0, op1);
+	  if (tem)
+	    return tem;
+	}
+
       tem = simplify_byte_swapping_operation (code, mode, op0, op1);
       if (tem)
 	return tem;
@@ -7218,6 +7310,81 @@  make_test_reg (machine_mode mode)
   return gen_rtx_REG (mode, test_reg_num++);
 }
 
+static void
+test_scalar_int_ops (machine_mode mode)
+{
+  rtx op0 = make_test_reg (mode);
+  rtx op1 = make_test_reg (mode);
+  rtx six = GEN_INT (6);
+
+  rtx neg_op0 = simplify_gen_unary (NEG, mode, op0, mode);
+  rtx not_op0 = simplify_gen_unary (NOT, mode, op0, mode);
+  rtx bswap_op0 = simplify_gen_unary (BSWAP, mode, op0, mode);
+
+  rtx and_op0_op1 = simplify_gen_binary (AND, mode, op0, op1);
+  rtx ior_op0_op1 = simplify_gen_binary (IOR, mode, op0, op1);
+  rtx xor_op0_op1 = simplify_gen_binary (XOR, mode, op0, op1);
+
+  rtx and_op0_6 = simplify_gen_binary (AND, mode, op0, six);
+  rtx and_op1_6 = simplify_gen_binary (AND, mode, op1, six);
+
+  /* Test some binary identities.  */
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (PLUS, mode, op0, const0_rtx));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (PLUS, mode, const0_rtx, op0));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (MINUS, mode, op0, const0_rtx));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (MULT, mode, op0, const1_rtx));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (MULT, mode, const1_rtx, op0));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (DIV, mode, op0, const1_rtx));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (AND, mode, op0, constm1_rtx));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (AND, mode, constm1_rtx, op0));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (IOR, mode, op0, const0_rtx));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (IOR, mode, const0_rtx, op0));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (XOR, mode, op0, const0_rtx));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (XOR, mode, const0_rtx, op0));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (ASHIFT, mode, op0, const0_rtx));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (ROTATE, mode, op0, const0_rtx));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (ASHIFTRT, mode, op0, const0_rtx));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (LSHIFTRT, mode, op0, const0_rtx));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (ROTATERT, mode, op0, const0_rtx));
+
+  /* Test some self-inverse (involutory) operations.  */
+  ASSERT_RTX_EQ (op0, simplify_gen_unary (NEG, mode, neg_op0, mode));
+  ASSERT_RTX_EQ (op0, simplify_gen_unary (NOT, mode, not_op0, mode));
+  ASSERT_RTX_EQ (op0, simplify_gen_unary (BSWAP, mode, bswap_op0, mode));
+
+  /* Test some idempotent operations.  */
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (AND, mode, op0, op0));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (IOR, mode, op0, op0));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (SMIN, mode, op0, op0));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (SMAX, mode, op0, op0));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (UMIN, mode, op0, op0));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (UMAX, mode, op0, op0));
+
+  ASSERT_RTX_EQ (const0_rtx, simplify_gen_binary (MINUS, mode, op0, op0));
+  ASSERT_RTX_EQ (const0_rtx, simplify_gen_binary (XOR, mode, op0, op0));
+
+  /* Test simplify_distributive_operation.  */
+  ASSERT_RTX_EQ (simplify_gen_binary (AND, mode, xor_op0_op1, six),
+		 simplify_gen_binary (XOR, mode, and_op0_6, and_op1_6));
+  ASSERT_RTX_EQ (simplify_gen_binary (AND, mode, ior_op0_op1, six),
+		 simplify_gen_binary (IOR, mode, and_op0_6, and_op1_6));
+  ASSERT_RTX_EQ (simplify_gen_binary (AND, mode, and_op0_op1, six),
+		 simplify_gen_binary (AND, mode, and_op0_6, and_op1_6));
+}
+
+/* Verify some simplifications involving scalar expressions.  */
+
+static void
+test_scalar_ops ()
+{
+  for (unsigned int i = 0; i < NUM_MACHINE_MODES; ++i)
+    {
+      machine_mode mode = (machine_mode) i;
+      if (SCALAR_INT_MODE_P (mode) && mode != BImode)
+	test_scalar_int_ops (mode);
+    }
+}
+
 /* Test vector simplifications involving VEC_DUPLICATE in which the
    operands and result have vector mode MODE.  SCALAR_REG is a pseudo
    register that holds one element of MODE.  */
@@ -7735,6 +7902,7 @@  simplify_const_poly_int_tests<N>::run ()
 void
 simplify_rtx_c_tests ()
 {
+  test_scalar_ops ();
   test_vector_ops ();
   simplify_const_poly_int_tests<NUM_POLY_INT_COEFFS>::run ();
 }