diff mbox

[RTL-ifcvt] Allow PLUS+immediate expression in noce_try_store_flag_constants

Message ID 55CB71E4.6040507@arm.com
State New
Headers show

Commit Message

Kyrylo Tkachov Aug. 12, 2015, 4:18 p.m. UTC
Hi all,

This patch is a sequel to:
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02567.html

It allows if-conversion of expressions of the form:
if (test) x := y + c1; else x := y + c2
where c1 and c2 are a pair constants that can be optimised using the
existing rules in noce_try_store_flag_constants.

The resulting sequence looks something like:
x := (test [!=,==] 0) + y;
x := x + [c1,c2];

This patch reuses the logic for the combinations of diff and STORE_FLAG signs that
is hammered out in https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02567.html.

That means we just extract the constants c1 and c2 from the PLUS rtx
and use their diff as the driver for the logic in that function.

Currently, this patch allows only the case where
diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE
as it is the most beneficial case (from looking at SPEC2006 on aarch64)
and also the least tricky to get right.
In the future I suppose that restriction can be relaxed if needed, but in any case
this should be an improvement over the existing situation.

On aarch64 this allows us for code:
int
barsi (int x)
{
   return x > 100 ? x + 4 : x + 3;
}

to generate:
barsi:
         cmp     w0, 101
         cinc    w0, w0, ge
         add     w0, w0, 3
         ret

instead of the current:
barsi:
         add     w1, w0, 4
         add     w2, w0, 3
         cmp     w0, 101
         csel    w0, w2, w1, lt
         ret

thus saving one instruction and using two fewer registers.

Bootstrapped and tested on arm, aarch64, x86_64.

Ok for trunk?

Thanks,
Kyrill

2015-08-12  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * ifcvt.c (noce_try_store_flag_constants): Handle PLUS-immediate
     expressions in A and B.

2015-08-12  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/cinc_common_1.c: New test.

Comments

Jeff Law Aug. 12, 2015, 5:32 p.m. UTC | #1
On 08/12/2015 10:18 AM, Kyrill Tkachov wrote:
> Hi all,
>
> This patch is a sequel to:
> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02567.html
>
> It allows if-conversion of expressions of the form:
> if (test) x := y + c1; else x := y + c2
> where c1 and c2 are a pair constants that can be optimised using the
> existing rules in noce_try_store_flag_constants.
>
> The resulting sequence looks something like:
> x := (test [!=,==] 0) + y;
> x := x + [c1,c2];
>
> This patch reuses the logic for the combinations of diff and STORE_FLAG
> signs that
> is hammered out in
> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02567.html.
>
> That means we just extract the constants c1 and c2 from the PLUS rtx
> and use their diff as the driver for the logic in that function.
>
> Currently, this patch allows only the case where
> diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE
> as it is the most beneficial case (from looking at SPEC2006 on aarch64)
> and also the least tricky to get right.
> In the future I suppose that restriction can be relaxed if needed, but
> in any case
> this should be an improvement over the existing situation.
>
> On aarch64 this allows us for code:
> int
> barsi (int x)
> {
>    return x > 100 ? x + 4 : x + 3;
> }
>
> to generate:
> barsi:
>          cmp     w0, 101
>          cinc    w0, w0, ge
>          add     w0, w0, 3
>          ret
>
> instead of the current:
> barsi:
>          add     w1, w0, 4
>          add     w2, w0, 3
>          cmp     w0, 101
>          csel    w0, w2, w1, lt
>          ret
>
> thus saving one instruction and using two fewer registers.
>
> Bootstrapped and tested on arm, aarch64, x86_64.
>
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2015-08-12  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>      * ifcvt.c (noce_try_store_flag_constants): Handle PLUS-immediate
>      expressions in A and B.
>
> 2015-08-12  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>      * gcc.target/aarch64/cinc_common_1.c: New test.
The function comment for noce_try_store_flag_constants needs to be updated.

OK with that fix.

Presumably there's some obvious extensions here for operators other than 
PLUS, the question in my mind is whether or not they matter in practice. 
  Curious to hear your thoughts about handling other operators and 
whether or not you think it's worth it as a follow-up.

Jeff
Kyrylo Tkachov Aug. 12, 2015, 5:42 p.m. UTC | #2
On 12/08/15 18:32, Jeff Law wrote:
> On 08/12/2015 10:18 AM, Kyrill Tkachov wrote:
>> Hi all,
>>
>> This patch is a sequel to:
>> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02567.html
>>
>> It allows if-conversion of expressions of the form:
>> if (test) x := y + c1; else x := y + c2
>> where c1 and c2 are a pair constants that can be optimised using the
>> existing rules in noce_try_store_flag_constants.
>>
>> The resulting sequence looks something like:
>> x := (test [!=,==] 0) + y;
>> x := x + [c1,c2];
>>
>> This patch reuses the logic for the combinations of diff and STORE_FLAG
>> signs that
>> is hammered out in
>> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02567.html.
>>
>> That means we just extract the constants c1 and c2 from the PLUS rtx
>> and use their diff as the driver for the logic in that function.
>>
>> Currently, this patch allows only the case where
>> diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE
>> as it is the most beneficial case (from looking at SPEC2006 on aarch64)
>> and also the least tricky to get right.
>> In the future I suppose that restriction can be relaxed if needed, but
>> in any case
>> this should be an improvement over the existing situation.
>>
>> On aarch64 this allows us for code:
>> int
>> barsi (int x)
>> {
>>     return x > 100 ? x + 4 : x + 3;
>> }
>>
>> to generate:
>> barsi:
>>           cmp     w0, 101
>>           cinc    w0, w0, ge
>>           add     w0, w0, 3
>>           ret
>>
>> instead of the current:
>> barsi:
>>           add     w1, w0, 4
>>           add     w2, w0, 3
>>           cmp     w0, 101
>>           csel    w0, w2, w1, lt
>>           ret
>>
>> thus saving one instruction and using two fewer registers.
>>
>> Bootstrapped and tested on arm, aarch64, x86_64.
>>
>> Ok for trunk?
>>
>> Thanks,
>> Kyrill
>>
>> 2015-08-12  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>       * ifcvt.c (noce_try_store_flag_constants): Handle PLUS-immediate
>>       expressions in A and B.
>>
>> 2015-08-12  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>       * gcc.target/aarch64/cinc_common_1.c: New test.
> The function comment for noce_try_store_flag_constants needs to be updated.
>
> OK with that fix.

Thanks, I'll update the comment and commit.

>
> Presumably there's some obvious extensions here for operators other than
> PLUS, the question in my mind is whether or not they matter in practice.
>    Curious to hear your thoughts about handling other operators and
> whether or not you think it's worth it as a follow-up.

I did consider it. I am most interested in the case that I enabled
in this patch because this can enable targets that have conditional
increment instructions, like aarch64. In that case, if-conversion
opens the way for combine to utilise that naturally.

The other two transformations in that function are:
       /* if (test) x = 8; else x = 0;
      =>   x = (test != 0) << 3;  */

and
       /* if (test) x = -1; else x = b;
      =>   x = -(test != 0) | b;  */

So they either shift the stored flag, or do other bitwise ops on it.
By adding an extra addition (for the common part) I don't think there's
much to gain against the alternative, which will be to do two parallel
additions and a conditional select.

Thanks,
Kyrill


>
> Jeff
>
diff mbox

Patch

commit ce0de8d30962bf00339dc3a030401a4be2a7a248
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Thu Jul 30 15:14:27 2015 +0100

    [ifcvt][unfinished] Allow PLUS+immediate expression in noce_try_store_flag_constants

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index f815166..7a44e29 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1198,17 +1198,35 @@  noce_try_store_flag_constants (struct noce_if_info *if_info)
   HOST_WIDE_INT itrue, ifalse, diff, tmp;
   int normalize;
   bool can_reverse;
-  machine_mode mode;
+  machine_mode mode = GET_MODE (if_info->x);;
+  rtx common = NULL_RTX;
 
   if (!noce_simple_bbs (if_info))
     return FALSE;
 
-  if (CONST_INT_P (if_info->a)
-      && CONST_INT_P (if_info->b))
+  rtx a = if_info->a;
+  rtx b = if_info->b;
+
+  /* Handle cases like x := test ? y + 3 : y + 4.  */
+  if (GET_CODE (a) == PLUS
+      && GET_CODE (b) == PLUS
+      && CONST_INT_P (XEXP (a, 1))
+      && CONST_INT_P (XEXP (b, 1))
+      && rtx_equal_p (XEXP (a, 0), XEXP (b, 0))
+      && noce_operand_ok (XEXP (a, 0))
+      && if_info->branch_cost >= 2)
     {
-      mode = GET_MODE (if_info->x);
-      ifalse = INTVAL (if_info->a);
-      itrue = INTVAL (if_info->b);
+      common = XEXP (a, 0);
+
+      a = XEXP (a, 1);
+      b = XEXP (b, 1);
+    }
+
+  if (CONST_INT_P (a)
+      && CONST_INT_P (b))
+    {
+      ifalse = INTVAL (a);
+      itrue = INTVAL (b);
       bool subtract_flag_p = false;
 
       diff = (unsigned HOST_WIDE_INT) itrue - ifalse;
@@ -1241,6 +1259,11 @@  noce_try_store_flag_constants (struct noce_if_info *if_info)
 	    {
 	      reversep = can_reverse;
 	      subtract_flag_p = !can_reverse;
+	      /* If we need to subtract the flag and we have PLUS-immediate
+		 A and B then it is unlikely to be beneficial to play tricks
+		 here.  */
+	      if (subtract_flag_p && common)
+		return FALSE;
 	    }
 	  /* test ? 3 : 4
 	     => can_reverse  | 3 + (test == 0)
@@ -1249,6 +1272,11 @@  noce_try_store_flag_constants (struct noce_if_info *if_info)
 	    {
 	      reversep = can_reverse;
 	      subtract_flag_p = !can_reverse;
+	      /* If we need to subtract the flag and we have PLUS-immediate
+		 A and B then it is unlikely to be beneficial to play tricks
+		 here.  */
+	      if (subtract_flag_p && common)
+		return FALSE;
 	    }
 	  /* test ? 4 : 3
 	     => 4 + (test != 0).  */
@@ -1290,6 +1318,15 @@  noce_try_store_flag_constants (struct noce_if_info *if_info)
 	}
 
       start_sequence ();
+
+      /* If we have x := test ? x + 3 : x + 4 then move the original
+	 x out of the way while we store flags.  */
+      if (common && rtx_equal_p (common, if_info->x))
+	{
+	  common = gen_reg_rtx (mode);
+	  noce_emit_move_insn (common, if_info->x);
+	}
+
       target = noce_emit_store_flag (if_info, if_info->x, reversep, normalize);
       if (! target)
 	{
@@ -1301,13 +1338,27 @@  noce_try_store_flag_constants (struct noce_if_info *if_info)
 	 =>   x = 3 + (test == 0);  */
       if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
 	{
+	  /* Add the common part now.  This may allow combine to merge this
+	     with the store flag operation earlier into some sort of conditional
+	     increment/decrement if the target allows it.  */
+	  if (common)
+	    target = expand_simple_binop (mode, PLUS,
+					   target, common,
+					   target, 0, OPTAB_WIDEN);
+
 	  /* Always use ifalse here.  It should have been swapped with itrue
 	     when appropriate when reversep is true.  */
 	  target = expand_simple_binop (mode, subtract_flag_p ? MINUS : PLUS,
 					gen_int_mode (ifalse, mode), target,
 					if_info->x, 0, OPTAB_WIDEN);
 	}
-
+      /* Other cases are not beneficial when the original A and B are PLUS
+	 expressions.  */
+      else if (common)
+	{
+	  end_sequence ();
+	  return FALSE;
+	}
       /* if (test) x = 8; else x = 0;
 	 =>   x = (test != 0) << 3;  */
       else if (ifalse == 0 && (tmp = exact_log2 (itrue)) >= 0)
diff --git a/gcc/testsuite/gcc.target/aarch64/cinc_common_1.c b/gcc/testsuite/gcc.target/aarch64/cinc_common_1.c
new file mode 100644
index 0000000..d041263
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cinc_common_1.c
@@ -0,0 +1,64 @@ 
+/* { dg-do run } */
+/* { dg-options "-save-temps -O2 -fno-inline" } */
+
+extern void abort (void);
+
+int
+foosi (int x)
+{
+  return x > 100 ? x - 2 : x - 1;
+}
+
+int
+barsi (int x)
+{
+  return x > 100 ? x + 4 : x + 3;
+}
+
+long
+foodi (long x)
+{
+  return x > 100 ? x - 2 : x - 1;
+}
+
+long
+bardi (long x)
+{
+  return x > 100 ? x + 4 : x + 3;
+}
+
+/* { dg-final { scan-assembler-times "cs?inc\tw\[0-9\]*" 2 } } */
+/* { dg-final { scan-assembler-times "cs?inc\tx\[0-9\]*" 2 } } */
+
+int
+main (void)
+{
+  if (foosi (105) != 103)
+    abort ();
+
+  if (foosi (95) != 94)
+    abort ();
+
+  if (barsi (105) != 109)
+    abort ();
+
+  if (barsi (95) != 98)
+    abort ();
+
+  if (foodi (105) != 103)
+    abort ();
+
+  if (foodi (95) != 94)
+    abort ();
+
+  if (bardi (105) != 109)
+    abort ();
+
+  if (bardi (95) != 98)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-not "csel\tx\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler-not "csel\tw\[0-9\]*.*" } } */