diff mbox

[RTL-ifcvt] Improve conditional select ops on immediates

Message ID 55B8D9EB.3060806@arm.com
State New
Headers show

Commit Message

Kyrylo Tkachov July 29, 2015, 1:49 p.m. UTC
Hi all,

This patch improves RTL if-conversion on sequences that perform a conditional select on integer constants.
Most of the smart logic to do that already exists in the noce_try_store_flag_constants function.
However, currently that function is tried after noce_try_cmove.
noce_try_cmove is a simple catch-all function that just loads the two immediates and performs a conditional
select between them. It returns true and then the caller noce_process_if_block doesn't try any other transformations,
completely skipping the more aggressive transformations that noce_try_store_flag_constants allows!

Calling noce_try_store_flag_constants before noce_try_cmove allows for the smarter if-conversion transformations
to be used. An example that occurs a lot in the gcc code itself is for the C code:
int
foo (int a, int b)
{
   return ((a & (1 << 25)) ? 5 : 4);
}

i.e. test a bit in a and return 5 or 4. Currently on aarch64 this generates the naive:
         and     w2, w0, 33554432  // mask away all bits except bit 25
         mov     w1, 4
         cmp     w2, wzr
         mov     w0, 5
         csel    w0, w0, w1, ne


whereas with this patch this can be transformed into the much better:
         ubfx    x0, x0, 25, 1  // extract bit 25
         add     w0, w0, 4

Another issue I encountered is that the code that claims to perform the transformation:
       /* if (test) x = 3; else x = 4;
      =>   x = 3 + (test == 0);  */

doesn't seem to do exactly that in all cases. In fact for that case it will try something like:
x = 4 - (test == 0)
which is suboptimal for targets like aarch64 which have a conditional increment operation.
This patch tweaks that code to always try to generate an addition of the condition rather than
a subtraction.

Anyway, for code:
int
fooinc (int x)
{
   return x ? 1025 : 1026;
}

we currently generate:
         mov     w2, 1025
         mov     w1, 1026
         cmp     w0, wzr
         csel    w0, w2, w1, ne

whereas with this patch we will generate:
         cmp     w0, wzr
         cset    w0, eq
         add     w0, w0, 1025

Bootstrapped and tested on arm, aarch64, x86_64.
Ok for trunk?

Thanks,
Kyrill

P.S. noce_try_store_flag_constants is currently gated on !targetm.have_conditional_execution () but I don't see
any reason to restrict it on targets with conditional execution. For example, I think the first example above
would show a benefit on arm if it was enabled there. But that can be a separate investigation.

2015-07-29  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * ifcvt.c (noce_try_store_flag_constants): Reverse when diff is
     -STORE_FLAG and condition is reversable.  Prefer to add to the
     flag value.
     (noce_process_if_block): Try noce_try_store_flag_constants before
     noce_try_cmove.

2015-07-29  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/csel_bfx_1.c: New test.
     * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.

Comments

Jeff Law July 29, 2015, 10:38 p.m. UTC | #1
On 07/29/2015 07:49 AM, Kyrill Tkachov wrote:
> Hi all,
>
> This patch improves RTL if-conversion on sequences that perform a
> conditional select on integer constants.
> Most of the smart logic to do that already exists in the
> noce_try_store_flag_constants function.
> However, currently that function is tried after noce_try_cmove.
> noce_try_cmove is a simple catch-all function that just loads the two
> immediates and performs a conditional
> select between them. It returns true and then the caller
> noce_process_if_block doesn't try any other transformations,
> completely skipping the more aggressive transformations that
> noce_try_store_flag_constants allows!
>
> Calling noce_try_store_flag_constants before noce_try_cmove allows for
> the smarter if-conversion transformations
> to be used. An example that occurs a lot in the gcc code itself is for
> the C code:
> int
> foo (int a, int b)
> {
>    return ((a & (1 << 25)) ? 5 : 4);
> }
>
> i.e. test a bit in a and return 5 or 4. Currently on aarch64 this
> generates the naive:
>          and     w2, w0, 33554432  // mask away all bits except bit 25
>          mov     w1, 4
>          cmp     w2, wzr
>          mov     w0, 5
>          csel    w0, w0, w1, ne
>
>
> whereas with this patch this can be transformed into the much better:
>          ubfx    x0, x0, 25, 1  // extract bit 25
>          add     w0, w0, 4
I suspect the PA would benefit from this as well, probably several 
other architectures with reasonable bitfield extraction capabilities.

>
> Another issue I encountered is that the code that claims to perform the
> transformation:
>        /* if (test) x = 3; else x = 4;
>       =>   x = 3 + (test == 0);  */
>
> doesn't seem to do exactly that in all cases. In fact for that case it
> will try something like:
> x = 4 - (test == 0)
> which is suboptimal for targets like aarch64 which have a conditional
> increment operation.
I vaguely recall targets that don't have const - X insns, but do have X 
+ const style insns.  And more generally I think we're better off 
generating the PLUS version.


> This patch tweaks that code to always try to generate an addition of the
> condition rather than
> a subtraction.
>
> Anyway, for code:
> int
> fooinc (int x)
> {
>    return x ? 1025 : 1026;
> }
>
> we currently generate:
>          mov     w2, 1025
>          mov     w1, 1026
>          cmp     w0, wzr
>          csel    w0, w2, w1, ne
>
> whereas with this patch we will generate:
>          cmp     w0, wzr
>          cset    w0, eq
>          add     w0, w0, 1025
>
> Bootstrapped and tested on arm, aarch64, x86_64.
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> P.S. noce_try_store_flag_constants is currently gated on
> !targetm.have_conditional_execution () but I don't see
> any reason to restrict it on targets with conditional execution. For
> example, I think the first example above
> would show a benefit on arm if it was enabled there. But that can be a
> separate investigation.
>
> 2015-07-29  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
>      * ifcvt.c (noce_try_store_flag_constants): Reverse when diff is
>      -STORE_FLAG and condition is reversable.  Prefer to add to the
>      flag value.
>      (noce_process_if_block): Try noce_try_store_flag_constants before
>      noce_try_cmove.
>
> 2015-07-29  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
>      * gcc.target/aarch64/csel_bfx_1.c: New test.
>      * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.
>
> ifcvt-csel-imms.patch
>
>
> commit 0164ef164483bdf0b2f73e267e2ff1df7800dd6d
> Author: Kyrylo Tkachov<kyrylo.tkachov@arm.com>
> Date:   Tue Jul 28 14:59:46 2015 +0100
>
>      [RTL-ifcvt] Improve conditional increment ops on immediates
>
> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
> index a57d78c..80d0285 100644
> --- a/gcc/ifcvt.c
> +++ b/gcc/ifcvt.c
> @@ -1222,7 +1222,7 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
>
>         reversep = 0;
>         if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
> -	normalize = 0;
> +	normalize = 0, reversep = (diff == -STORE_FLAG_VALUE) && can_reverse;
We generally avoid using a ',' operator like this.  However, I can see 
that you're just following existing convention in that code.  So I won't 
object.



>         else if (ifalse == 0 && exact_log2 (itrue) >= 0
>   	       && (STORE_FLAG_VALUE == 1
>   		   || if_info->branch_cost >= 2))
> @@ -1261,10 +1261,13 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
>   	 =>   x = 3 + (test == 0);  */
>         if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
>   	{
> -	  target = expand_simple_binop (mode,
> -					(diff == STORE_FLAG_VALUE
> -					 ? PLUS : MINUS),
> -					gen_int_mode (ifalse, mode), target,
> +	  rtx_code code = reversep ? PLUS :
> +				    (diff == STORE_FLAG_VALUE ? PLUS
> +							       : MINUS);
> +	  HOST_WIDE_INT to_add = reversep ? MIN (ifalse, itrue) : ifalse;
> +
> +	  target = expand_simple_binop (mode, code,
> +					gen_int_mode (to_add, mode), target,
>   					if_info->x, 0, OPTAB_WIDEN);
>   	}
Note that STORE_FLAG_VALUE can potentially be any value.  Most targets 
use "1", but others use -1 (m68k for example, there are others).  I'm 
concerned that the reversep ? MIN (ifalse, itrue) won't do what we want 
on targets such as the m68k.

The logic here is also somewhat convoluted.  When reversep is true, we 
always use PLUS, but is that actually correct?  Normally we'd compute 
the code, then invert the code.  ie, if we normally want PLUS, then 
we've flip to MINUS and vice-versa.


>
> @@ -3120,13 +3123,14 @@ noce_process_if_block (struct noce_if_info *if_info)
>       goto success;
>     if (noce_try_abs (if_info))
>       goto success;
> +  if (!targetm.have_conditional_execution ()
> +      && noce_try_store_flag_constants (if_info))
> +    goto success;
>     if (HAVE_conditional_move
>         && noce_try_cmove (if_info))
>       goto success;
>     if (! targetm.have_conditional_execution ())
>       {
> -      if (noce_try_store_flag_constants (if_info))
> -	goto success;
>         if (noce_try_addcc (if_info))
>   	goto success;
>         if (noce_try_store_flag_mask (if_info))
This part seems fine and ought to be able to go forward independently. 
Consider this hunk and its associated testcase approved if you want to 
test it independently and get it installed while we iterate on the other 
hunks,

jeff
diff mbox

Patch

commit 0164ef164483bdf0b2f73e267e2ff1df7800dd6d
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Tue Jul 28 14:59:46 2015 +0100

    [RTL-ifcvt] Improve conditional increment ops on immediates

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index a57d78c..80d0285 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1222,7 +1222,7 @@  noce_try_store_flag_constants (struct noce_if_info *if_info)
 
       reversep = 0;
       if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
-	normalize = 0;
+	normalize = 0, reversep = (diff == -STORE_FLAG_VALUE) && can_reverse;
       else if (ifalse == 0 && exact_log2 (itrue) >= 0
 	       && (STORE_FLAG_VALUE == 1
 		   || if_info->branch_cost >= 2))
@@ -1261,10 +1261,13 @@  noce_try_store_flag_constants (struct noce_if_info *if_info)
 	 =>   x = 3 + (test == 0);  */
       if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
 	{
-	  target = expand_simple_binop (mode,
-					(diff == STORE_FLAG_VALUE
-					 ? PLUS : MINUS),
-					gen_int_mode (ifalse, mode), target,
+	  rtx_code code = reversep ? PLUS :
+				    (diff == STORE_FLAG_VALUE ? PLUS
+							       : MINUS);
+	  HOST_WIDE_INT to_add = reversep ? MIN (ifalse, itrue) : ifalse;
+
+	  target = expand_simple_binop (mode, code,
+					gen_int_mode (to_add, mode), target,
 					if_info->x, 0, OPTAB_WIDEN);
 	}
 
@@ -3120,13 +3123,14 @@  noce_process_if_block (struct noce_if_info *if_info)
     goto success;
   if (noce_try_abs (if_info))
     goto success;
+  if (!targetm.have_conditional_execution ()
+      && noce_try_store_flag_constants (if_info))
+    goto success;
   if (HAVE_conditional_move
       && noce_try_cmove (if_info))
     goto success;
   if (! targetm.have_conditional_execution ())
     {
-      if (noce_try_store_flag_constants (if_info))
-	goto success;
       if (noce_try_addcc (if_info))
 	goto success;
       if (noce_try_store_flag_mask (if_info))
diff --git a/gcc/testsuite/gcc.target/aarch64/csel_bfx_1.c b/gcc/testsuite/gcc.target/aarch64/csel_bfx_1.c
new file mode 100644
index 0000000..c20597f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/csel_bfx_1.c
@@ -0,0 +1,11 @@ 
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O2" } */
+
+int
+foo (int a, int b)
+{
+  return ((a & (1 << 25)) ? 5 : 4);
+}
+
+/* { dg-final { scan-assembler "ubfx\t\[xw\]\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler-not "csel\tw\[0-9\]*.*" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/csel_imms_inc_1.c b/gcc/testsuite/gcc.target/aarch64/csel_imms_inc_1.c
new file mode 100644
index 0000000..2ae434d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/csel_imms_inc_1.c
@@ -0,0 +1,42 @@ 
+/* { dg-do run } */
+/* { dg-options "-save-temps -O2 -fno-inline" } */
+
+extern void abort (void);
+
+int
+fooinc (int x)
+{
+  if (x)
+    return 1025;
+  else
+    return 1026;
+}
+
+int
+fooinc2 (int x)
+{
+  if (x)
+    return 1026;
+  else
+    return 1025;
+}
+
+int
+main (void)
+{
+  if (fooinc (0) != 1026)
+    abort ();
+
+  if (fooinc (1) != 1025)
+    abort ();
+
+  if (fooinc2 (0) != 1025)
+    abort ();
+
+  if (fooinc2 (1) != 1026)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-not "csel\tw\[0-9\]*.*" } } */