diff mbox series

i386: Improve ix86_expand_int_movcc [PR105338]

Message ID YmOigFj35/+044Dx@tucnak
State New
Headers show
Series i386: Improve ix86_expand_int_movcc [PR105338] | expand

Commit Message

Jakub Jelinek April 23, 2022, 6:53 a.m. UTC
Hi!

The following testcase regressed on x86_64 on the trunk, due to some GIMPLE
pass changes (r12-7687) we end up an *.optimized dump difference of:
@@ -8,14 +8,14 @@ int foo (int i)
 
   <bb 2> [local count: 1073741824]:
   if (i_2(D) != 0)
-    goto <bb 4>; [35.00%]
+    goto <bb 3>; [35.00%]
   else
-    goto <bb 3>; [65.00%]
+    goto <bb 4>; [65.00%]
 
-  <bb 3> [local count: 697932184]:
+  <bb 3> [local count: 375809640]:
 
   <bb 4> [local count: 1073741824]:
-  # iftmp.0_1 = PHI <5(2), i_2(D)(3)>
+  # iftmp.0_1 = PHI <5(3), i_2(D)(2)>
   return iftmp.0_1;
 
 }
and similarly for the other functions.  That is functionally equivalent and
there is no canonical form for those.  The reason for i_2(D) in the PHI
argument as opposed to 0 is the uncprop pass, that is in many cases
beneficial for expansion as we don't need to load the value into some pseudo
in one of the if blocks.
Now, for the 11.x ordering we have the pseudo = i insn in the extended basic
block (it comes first) and so forwprop1 undoes what uncprop does by
propagating constant 0 there.  But for the 12.x ordering, the extended basic
block contains pseudo = 5 and pseudo = i is in the other bb and so fwprop1
doesn't change it.
During the ce1 pass, we attempt to emit a conditional move and we have very
nice code for the cases where both last operands of ?: are constant, and yet
another for !TARGET_CMOVE if at least one of them is.

The following patch will undo the uncprop behavior during
ix86_expand_int_movcc, but just for those spots that can benefit from both
or at least one operands being constant, leaving the pure cmov case as is
(because then it is useful not to have to load a constant into a pseudo
as it already is in one).  We can do that in the
op0 == op1 ? op0 : op3
or
op0 != op1 ? op2 : op0
cases if op1 is a CONST_INT by pretending it is
op0 == op1 ? op1 : op3
or
op0 != op1 ? op2 : op1

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-04-23  Jakub Jelinek  <jakub@redhat.com>

	PR target/105338
	* config/i386/i386-expand.cc (ix86_expand_int_movcc): Handle
	op0 == cst1 ? op0 : op3 like op0 == cst1 ? cst1 : op3 for the non-cmov
	cases.

	* gcc.target/i386/pr105338.c: New test.


	Jakub

Comments

Uros Bizjak April 23, 2022, 8:05 a.m. UTC | #1
On Sat, Apr 23, 2022 at 8:53 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> Hi!
>
> The following testcase regressed on x86_64 on the trunk, due to some GIMPLE
> pass changes (r12-7687) we end up an *.optimized dump difference of:
> @@ -8,14 +8,14 @@ int foo (int i)
>
>    <bb 2> [local count: 1073741824]:
>    if (i_2(D) != 0)
> -    goto <bb 4>; [35.00%]
> +    goto <bb 3>; [35.00%]
>    else
> -    goto <bb 3>; [65.00%]
> +    goto <bb 4>; [65.00%]
>
> -  <bb 3> [local count: 697932184]:
> +  <bb 3> [local count: 375809640]:
>
>    <bb 4> [local count: 1073741824]:
> -  # iftmp.0_1 = PHI <5(2), i_2(D)(3)>
> +  # iftmp.0_1 = PHI <5(3), i_2(D)(2)>
>    return iftmp.0_1;
>
>  }
> and similarly for the other functions.  That is functionally equivalent and
> there is no canonical form for those.  The reason for i_2(D) in the PHI
> argument as opposed to 0 is the uncprop pass, that is in many cases
> beneficial for expansion as we don't need to load the value into some pseudo
> in one of the if blocks.
> Now, for the 11.x ordering we have the pseudo = i insn in the extended basic
> block (it comes first) and so forwprop1 undoes what uncprop does by
> propagating constant 0 there.  But for the 12.x ordering, the extended basic
> block contains pseudo = 5 and pseudo = i is in the other bb and so fwprop1
> doesn't change it.
> During the ce1 pass, we attempt to emit a conditional move and we have very
> nice code for the cases where both last operands of ?: are constant, and yet
> another for !TARGET_CMOVE if at least one of them is.
>
> The following patch will undo the uncprop behavior during
> ix86_expand_int_movcc, but just for those spots that can benefit from both
> or at least one operands being constant, leaving the pure cmov case as is
> (because then it is useful not to have to load a constant into a pseudo
> as it already is in one).  We can do that in the
> op0 == op1 ? op0 : op3
> or
> op0 != op1 ? op2 : op0
> cases if op1 is a CONST_INT by pretending it is
> op0 == op1 ? op1 : op3
> or
> op0 != op1 ? op2 : op1
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2022-04-23  Jakub Jelinek  <jakub@redhat.com>
>
>         PR target/105338
>         * config/i386/i386-expand.cc (ix86_expand_int_movcc): Handle
>         op0 == cst1 ? op0 : op3 like op0 == cst1 ? cst1 : op3 for the non-cmov
>         cases.
>
>         * gcc.target/i386/pr105338.c: New test.

OK.

Thanks,
Uros.

>
> --- gcc/config/i386/i386-expand.cc.jj   2022-04-13 15:42:39.000000000 +0200
> +++ gcc/config/i386/i386-expand.cc      2022-04-22 14:18:27.347135185 +0200
> @@ -3136,6 +3136,8 @@ ix86_expand_int_movcc (rtx operands[])
>    bool sign_bit_compare_p = false;
>    rtx op0 = XEXP (operands[1], 0);
>    rtx op1 = XEXP (operands[1], 1);
> +  rtx op2 = operands[2];
> +  rtx op3 = operands[3];
>
>    if (GET_MODE (op0) == TImode
>        || (GET_MODE (op0) == DImode
> @@ -3153,17 +3155,29 @@ ix86_expand_int_movcc (rtx operands[])
>        || (op1 == constm1_rtx && (code == GT || code == LE)))
>      sign_bit_compare_p = true;
>
> +  /* op0 == op1 ? op0 : op3 is equivalent to op0 == op1 ? op1 : op3,
> +     but if op1 is a constant, the latter form allows more optimizations,
> +     either through the last 2 ops being constant handling, or the one
> +     constant and one variable cases.  On the other side, for cmov the
> +     former might be better as we don't need to load the constant into
> +     another register.  */
> +  if (code == EQ && CONST_INT_P (op1) && rtx_equal_p (op0, op2))
> +    op2 = op1;
> +  /* Similarly for op0 != op1 ? op2 : op0 and op0 != op1 ? op2 : op1.  */
> +  else if (code == NE && CONST_INT_P (op1) && rtx_equal_p (op0, op3))
> +    op3 = op1;
> +
>    /* Don't attempt mode expansion here -- if we had to expand 5 or 6
>       HImode insns, we'd be swallowed in word prefix ops.  */
>
>    if ((mode != HImode || TARGET_FAST_PREFIX)
>        && (mode != (TARGET_64BIT ? TImode : DImode))
> -      && CONST_INT_P (operands[2])
> -      && CONST_INT_P (operands[3]))
> +      && CONST_INT_P (op2)
> +      && CONST_INT_P (op3))
>      {
>        rtx out = operands[0];
> -      HOST_WIDE_INT ct = INTVAL (operands[2]);
> -      HOST_WIDE_INT cf = INTVAL (operands[3]);
> +      HOST_WIDE_INT ct = INTVAL (op2);
> +      HOST_WIDE_INT cf = INTVAL (op3);
>        HOST_WIDE_INT diff;
>
>        diff = ct - cf;
> @@ -3559,6 +3573,9 @@ ix86_expand_int_movcc (rtx operands[])
>        if (BRANCH_COST (optimize_insn_for_speed_p (), false) <= 2)
>         return false;
>
> +      operands[2] = op2;
> +      operands[3] = op3;
> +
>        /* If one of the two operands is an interesting constant, load a
>          constant with the above and mask it in with a logical operation.  */
>
> --- gcc/testsuite/gcc.target/i386/pr105338.c.jj 2022-04-22 16:14:35.827045371 +0200
> +++ gcc/testsuite/gcc.target/i386/pr105338.c    2022-04-22 16:20:43.579913630 +0200
> @@ -0,0 +1,26 @@
> +/* PR target/105338 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-ipa-icf -masm=att" } */
> +/* { dg-final { scan-assembler-times "\tnegl\t" 3 } } */
> +/* { dg-final { scan-assembler-times "\tsbbl\t" 3 } } */
> +/* { dg-final { scan-assembler-times "\tandl\t" 3 } } */
> +
> +int
> +foo (int i)
> +{
> +  return i ? 5 : 0;
> +}
> +
> +int
> +bar (int b)
> +{
> +  return !!b * 5;
> +}
> +
> +int
> +baz (int b)
> +{
> +  if (!b)
> +    return 0;
> +  return 5;
> +}
>
>         Jakub
>
diff mbox series

Patch

--- gcc/config/i386/i386-expand.cc.jj	2022-04-13 15:42:39.000000000 +0200
+++ gcc/config/i386/i386-expand.cc	2022-04-22 14:18:27.347135185 +0200
@@ -3136,6 +3136,8 @@  ix86_expand_int_movcc (rtx operands[])
   bool sign_bit_compare_p = false;
   rtx op0 = XEXP (operands[1], 0);
   rtx op1 = XEXP (operands[1], 1);
+  rtx op2 = operands[2];
+  rtx op3 = operands[3];
 
   if (GET_MODE (op0) == TImode
       || (GET_MODE (op0) == DImode
@@ -3153,17 +3155,29 @@  ix86_expand_int_movcc (rtx operands[])
       || (op1 == constm1_rtx && (code == GT || code == LE)))
     sign_bit_compare_p = true;
 
+  /* op0 == op1 ? op0 : op3 is equivalent to op0 == op1 ? op1 : op3,
+     but if op1 is a constant, the latter form allows more optimizations,
+     either through the last 2 ops being constant handling, or the one
+     constant and one variable cases.  On the other side, for cmov the
+     former might be better as we don't need to load the constant into
+     another register.  */
+  if (code == EQ && CONST_INT_P (op1) && rtx_equal_p (op0, op2))
+    op2 = op1;
+  /* Similarly for op0 != op1 ? op2 : op0 and op0 != op1 ? op2 : op1.  */
+  else if (code == NE && CONST_INT_P (op1) && rtx_equal_p (op0, op3))
+    op3 = op1;
+
   /* Don't attempt mode expansion here -- if we had to expand 5 or 6
      HImode insns, we'd be swallowed in word prefix ops.  */
 
   if ((mode != HImode || TARGET_FAST_PREFIX)
       && (mode != (TARGET_64BIT ? TImode : DImode))
-      && CONST_INT_P (operands[2])
-      && CONST_INT_P (operands[3]))
+      && CONST_INT_P (op2)
+      && CONST_INT_P (op3))
     {
       rtx out = operands[0];
-      HOST_WIDE_INT ct = INTVAL (operands[2]);
-      HOST_WIDE_INT cf = INTVAL (operands[3]);
+      HOST_WIDE_INT ct = INTVAL (op2);
+      HOST_WIDE_INT cf = INTVAL (op3);
       HOST_WIDE_INT diff;
 
       diff = ct - cf;
@@ -3559,6 +3573,9 @@  ix86_expand_int_movcc (rtx operands[])
       if (BRANCH_COST (optimize_insn_for_speed_p (), false) <= 2)
 	return false;
 
+      operands[2] = op2;
+      operands[3] = op3;
+
       /* If one of the two operands is an interesting constant, load a
 	 constant with the above and mask it in with a logical operation.  */
 
--- gcc/testsuite/gcc.target/i386/pr105338.c.jj	2022-04-22 16:14:35.827045371 +0200
+++ gcc/testsuite/gcc.target/i386/pr105338.c	2022-04-22 16:20:43.579913630 +0200
@@ -0,0 +1,26 @@ 
+/* PR target/105338 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-ipa-icf -masm=att" } */
+/* { dg-final { scan-assembler-times "\tnegl\t" 3 } } */
+/* { dg-final { scan-assembler-times "\tsbbl\t" 3 } } */
+/* { dg-final { scan-assembler-times "\tandl\t" 3 } } */
+
+int
+foo (int i)
+{
+  return i ? 5 : 0;
+}
+
+int
+bar (int b)
+{
+  return !!b * 5;
+}
+
+int
+baz (int b)
+{
+  if (!b)
+    return 0;
+  return 5;
+}