[RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
diff mbox

Message ID CAC1BbcSjSYHd2j==dSwrjuMTrDvgnwrJJ2941k89aLEqnt49xg@mail.gmail.com
State New
Headers show

Commit Message

Bernhard Reutner-Fischer July 10, 2015, 11 p.m. UTC
On 10 July 2015 at 14:31, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
> Hi all,
>
> This patch makes if-conversion more aggressive when handling code of the
> form:
> if (test)
>   x := a  //THEN
> else
>   x := b  //ELSE

> The current code adds the costs of both the THEN and ELSE blocks and proceeds if they don't
> exceed the branch cost. I don't think that's quite a right calculation.
> We're going to be executing at least one of the basic blocks anyway.
> This patch we instead check the *maximum* of the two blocks against the branch cost.
> This should still catch cases where a high latency instruction appears in one or both of
> the paths.

Shouldn't this maximum also take probability into account? Or maybe
not, would have to think about it tomorrow.

$ contrib/check_GNU_style.sh rtl-ifcvt.00.patch

Blocks of 8 spaces should be replaced with tabs.
783:+        return FALSE;


Generally ifcvt.c (resp. the whole tree) could use a
sed -i -e "s/\([[:space:]]\)FALSE/\1false/g" gcc/ifcvt.c
Maybe some of the int predicates could then become bools.


+/* Return iff the registers that the insns in BB_A set do not
+   get used in BB_B.  */

Return true iff


Did you include go in your testing?
I see:
Unexpected results in this build (new failures)
FAIL: encoding/json
FAIL: go/printer
FAIL: go/scanner
FAIL: html/template
FAIL: log
FAIL: net/http
FAIL: net/http/cgi
FAIL: net/http/cookiejar
FAIL: os
FAIL: text/template


bbs_ok_for_cmove_arith() looks costly but i guess you looked if
there's some pre-existing cleverness you could have used instead?

noce_emit_bb() could use a better comment. Likewise insn_valid_noce_process_p().

insn_rtx_cost() should return an unsigned int, then_cost, else_cost
should thus be unsigned int too.

copy_of_a versus copy_of_insn_b; I'd shorten the latter.

bb_valid_for_noce_process_p() suggests that there is a JOIN_BB param
but there is none?
Also should document the return value (and should not clobber the OUT
params upon failure, no?).

As for the testcases, it would be nice to have at least a tiny bit for
x86_64, too.

PS: no -mbranch-cost and, a tad more seriously, no --param branch-cost either ;)
PPS: attached meant to illustrate comments above. Untested.

cheers,

Comments

Bernhard Reutner-Fischer July 10, 2015, 11:08 p.m. UTC | #1
On 11 July 2015 at 01:00, Bernhard Reutner-Fischer
<rep.dot.nop@gmail.com> wrote:
> On 10 July 2015 at 14:31, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:

> PS: no -mbranch-cost and, a tad more seriously, no --param branch-cost either ;)

err, arm and aarch64 have no -mbranch-cost, a couple of prominent
other arches do..
Kyrylo Tkachov July 13, 2015, 9:45 a.m. UTC | #2
Hi Bernhard,

On 11/07/15 00:00, Bernhard Reutner-Fischer wrote:
> On 10 July 2015 at 14:31, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>> Hi all,
>>
>> This patch makes if-conversion more aggressive when handling code of the
>> form:
>> if (test)
>>    x := a  //THEN
>> else
>>    x := b  //ELSE
>> The current code adds the costs of both the THEN and ELSE blocks and proceeds if they don't
>> exceed the branch cost. I don't think that's quite a right calculation.
>> We're going to be executing at least one of the basic blocks anyway.
>> This patch we instead check the *maximum* of the two blocks against the branch cost.
>> This should still catch cases where a high latency instruction appears in one or both of
>> the paths.
> Shouldn't this maximum also take probability into account? Or maybe
> not, would have to think about it tomorrow.

The branch cost that we test against (recorded in if_info earlier in the ifcvt.c
call chain) is either the predictable branch cost or the unpredictable branch
cost, depending on what the predictable_edge_p machinery returned.
I think checking against that should be enough, but it's an easy thing to experiment
with, so I'm open to arguments in any direction.

>
> $ contrib/check_GNU_style.sh rtl-ifcvt.00.patch
>
> Blocks of 8 spaces should be replaced with tabs.
> 783:+        return FALSE;
>
>
> Generally ifcvt.c (resp. the whole tree) could use a
> sed -i -e "s/\([[:space:]]\)FALSE/\1false/g" gcc/ifcvt.c
> Maybe some of the int predicates could then become bools.

Ok, will go over the style in the patch.

>
>
> +/* Return iff the registers that the insns in BB_A set do not
> +   get used in BB_B.  */
>
> Return true iff

I tried to be too formal here ;) https://en.wikipedia.org/wiki/If_and_only_if
I'll use a normal if here.

>
>
> Did you include go in your testing?
> I see:
> Unexpected results in this build (new failures)
> FAIL: encoding/json
> FAIL: go/printer
> FAIL: go/scanner
> FAIL: html/template
> FAIL: log
> FAIL: net/http
> FAIL: net/http/cgi
> FAIL: net/http/cookiejar
> FAIL: os
> FAIL: text/template

Hmmm. I don't see these failures. I double checked right now and they
appear as PASS in my configuration.

I tested make check-go on x86_64-unknown-linux-gnu configured with
--without-isl --disable-multilib --enable-languages=c,c++,fortran,go.

Are you sure this is not some other issue in your tree?

>
> bbs_ok_for_cmove_arith() looks costly but i guess you looked if
> there's some pre-existing cleverness you could have used instead?

I did have a look, but couldn't find any.
The bbs_ok_for_cmove_arith is done after the costs check
so I'd hope that the costs check would already discard
really long basic-blocks.

>
> noce_emit_bb() could use a better comment. Likewise insn_valid_noce_process_p().
>
> insn_rtx_cost() should return an unsigned int, then_cost, else_cost
> should thus be unsigned int too.
>
> copy_of_a versus copy_of_insn_b; I'd shorten the latter.

Thanks, good suggestions.

>
> bb_valid_for_noce_process_p() suggests that there is a JOIN_BB param
> but there is none?
> Also should document the return value (and should not clobber the OUT
> params upon failure, no?).

bah, I forgot to update the comment once I modified the function
during development of the patch. I'll fix those.

>
> As for the testcases, it would be nice to have at least a tiny bit for
> x86_64, too.

I can put the testcases in gcc.dg and enable them for x86 as well,
but I think a couple of the already pass as is because x86 doesn't
need to do an extra zero_extend inside the basic-block.

>
> PS: no -mbranch-cost and, a tad more seriously, no --param branch-cost either ;)
> PPS: attached meant to illustrate comments above. Untested.

Thanks a lot! This is all very helpful.
I'll respin the patch.

Thanks,
Kyrill


>
> cheers,
Kyrylo Tkachov July 13, 2015, 10 a.m. UTC | #3
On 13/07/15 10:45, Kyrill Tkachov wrote:
>
>>
>> +/* Return iff the registers that the insns in BB_A set do not
>> +   get used in BB_B.  */
>>
>> Return true iff
> I tried to be too formal here ;) https://en.wikipedia.org/wiki/If_and_only_if
> I'll use a normal if here.

Err, of course you were talking about the missing 'true'.
Sorry, early morning.

Kyrill
Bernhard Reutner-Fischer July 13, 2015, 10:48 a.m. UTC | #4
On July 13, 2015 11:45:55 AM GMT+02:00, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>Hi Bernhard,
>

>> Did you include go in your testing?
>> I see:
>> Unexpected results in this build (new failures)
>> FAIL: encoding/json
>> FAIL: go/printer
>> FAIL: go/scanner
>> FAIL: html/template
>> FAIL: log
>> FAIL: net/http
>> FAIL: net/http/cgi
>> FAIL: net/http/cookiejar
>> FAIL: os
>> FAIL: text/template
>
>Hmmm. I don't see these failures. I double checked right now and they
>appear as PASS in my configuration.
>
>I tested make check-go on x86_64-unknown-linux-gnu configured with
>--without-isl --disable-multilib --enable-languages=c,c++,fortran,go.
>
>Are you sure this is not some other issue in your tree?

I have ISL enabled. I do have a couple of local stuff but that tested fine before your patch and should not really have impact on the parts your patch touches. So maybe it's ISL.

Thanks,
Kyrylo Tkachov July 13, 2015, 1:12 p.m. UTC | #5
On 13/07/15 11:48, Bernhard Reutner-Fischer wrote:
> On July 13, 2015 11:45:55 AM GMT+02:00, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>> Hi Bernhard,
>>
>>> Did you include go in your testing?
>>> I see:
>>> Unexpected results in this build (new failures)
>>> FAIL: encoding/json
>>> FAIL: go/printer
>>> FAIL: go/scanner
>>> FAIL: html/template
>>> FAIL: log
>>> FAIL: net/http
>>> FAIL: net/http/cgi
>>> FAIL: net/http/cookiejar
>>> FAIL: os
>>> FAIL: text/template
>> Hmmm. I don't see these failures. I double checked right now and they
>> appear as PASS in my configuration.
>>
>> I tested make check-go on x86_64-unknown-linux-gnu configured with
>> --without-isl --disable-multilib --enable-languages=c,c++,fortran,go.
>>
>> Are you sure this is not some other issue in your tree?
> I have ISL enabled. I do have a couple of local stuff but that tested fine before your patch and should not really have impact on the parts your patch touches. So maybe it's ISL.

I've rebuilt with ISL from scratch and the tests still pass for me.
I'm testing Ubuntu on a Haswell machine, don't know if that's relevant.

Kyrill

>
> Thanks,
>

Patch
diff mbox

From 1b7d8f9b61eb538cc4338e2073d04a66518f13c2 Mon Sep 17 00:00:00 2001
From: Bernhard Reutner-Fischer <rep.dot.nop@gmail.com>
Date: Fri, 10 Jul 2015 21:25:30 +0200
Subject: [PATCH] rtl-ifcvt typos

fix some typos on top of Kyrill's patch.
should mv the testcases to common ground.
---
 gcc/ifcvt.c                                     |   49 ++++++++++-------------
 gcc/testsuite/gcc.target/aarch64/ifcvt_csel_1.c |    2 +-
 gcc/testsuite/gcc.target/aarch64/ifcvt_csel_2.c |    2 +-
 gcc/testsuite/gcc.target/aarch64/ifcvt_csel_3.c |    7 ++--
 4 files changed, 27 insertions(+), 33 deletions(-)

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 3d324257..0bf6645 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1784,8 +1784,7 @@  noce_try_cmove_arith (struct noce_if_info *if_info)
 
   /* We're going to execute one of the basic blocks anyway, so
      bail out if the most expensive of the two blocks is unacceptable.  */
-  if (MAX (then_cost, else_cost)
-      > COSTS_N_INSNS (if_info->branch_cost))
+  if (MAX (then_cost, else_cost) > COSTS_N_INSNS (if_info->branch_cost))
     return FALSE;
 
   /* Possibly rearrange operands to make things come out more natural.  */
@@ -2730,28 +2729,26 @@  bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
   rtx cc = cc_in_cond (cond);
 
   if (!insn_valid_noce_process_p (last_insn, cc))
-    return FALSE;
+    return false;
   last_set = single_set (last_insn);
 
   rtx x = SET_DEST (last_set);
-
   rtx_insn *first_insn = first_active_insn (test_bb);
   rtx first_set = single_set (first_insn);
-  bool speed_p = optimize_bb_for_speed_p (test_bb);
 
-  *cost = insn_rtx_cost (last_set, speed_p);
   if (!first_set)
     return false;
+
   /* We have a single simple set, that's okay.  */
-  else if (first_insn == last_insn)
+  bool speed_p = optimize_bb_for_speed_p (test_bb);
+
+  if (first_insn == last_insn)
     {
       *simple_p = noce_operand_ok (SET_DEST (first_set));
       *cost = insn_rtx_cost (first_set, speed_p);
       return *simple_p;
     }
 
-  *simple_p = false;
-
   rtx_insn *prev_last_insn = PREV_INSN (last_insn);
   gcc_assert (prev_last_insn);
 
@@ -2764,6 +2761,7 @@  bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
   /* The regs that are live out of test_bb.  */
   bitmap test_bb_live_out = df_get_live_out (test_bb);
 
+  int potential_cost = insn_rtx_cost (last_set, speed_p);
   rtx_insn *insn;
   FOR_BB_INSNS (test_bb, insn)
     {
@@ -2781,7 +2779,7 @@  bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
 	  if (MEM_P (SET_SRC (sset)) || MEM_P (SET_DEST (sset)))
 	    goto free_bitmap_and_fail;
 
-	  *cost += insn_rtx_cost (sset, speed_p);
+	  potential_cost += insn_rtx_cost (sset, speed_p);
 	  bitmap_set_bit (test_bb_temps, REGNO (SET_DEST (sset)));
 	}
     }
@@ -2792,11 +2790,13 @@  bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
     goto free_bitmap_and_fail;
 
   BITMAP_FREE (test_bb_temps);
+  *cost = potential_cost;
+  *simple_p = false;
   return true;
 
-  free_bitmap_and_fail:
-    BITMAP_FREE (test_bb_temps);
-    return false;
+ free_bitmap_and_fail:
+  BITMAP_FREE (test_bb_temps);
+  return false;
 }
 
 /* Given a simple IF-THEN-JOIN or IF-THEN-ELSE-JOIN block, attempt to convert
@@ -2828,21 +2828,14 @@  noce_process_if_block (struct noce_if_info *if_info)
      to calculate a value for x.
      ??? For future expansion, look for multiple X in such patterns.  */
 
-  bool then_bb_valid
-    = bb_valid_for_noce_process_p (then_bb, cond, &if_info->then_cost,
-				    &if_info->then_simple);
-
-  bool else_bb_valid = false;
-  if (else_bb)
-    else_bb_valid
-      = bb_valid_for_noce_process_p (else_bb, cond, &if_info->else_cost,
-				      &if_info->else_simple);
-
-  if (!then_bb_valid)
-    return FALSE;
+  if (! bb_valid_for_noce_process_p (then_bb, cond, &if_info->then_cost,
+				    &if_info->then_simple))
+    return false;
 
-  if (else_bb && !else_bb_valid)
-    return FALSE;
+  if (else_bb
+      && ! bb_valid_for_noce_process_p (else_bb, cond, &if_info->else_cost,
+				      &if_info->else_simple))
+    return false;
 
   insn_a = last_active_insn (then_bb, FALSE);
   set_a = single_set (insn_a);
@@ -2866,7 +2859,7 @@  noce_process_if_block (struct noce_if_info *if_info)
       gcc_assert (set_b);
 
       if (!rtx_interchangeable_p (x, SET_DEST (set_b)))
-        return FALSE;
+	return FALSE;
     }
   else
     {
diff --git a/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_1.c b/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_1.c
index 1836f57..92bc17a 100644
--- a/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_1.c
@@ -1,4 +1,4 @@ 
-/* { dg-do compile } */
+/* { dg-do compile { target aarch64*-*-* x86_64-*-* } } */
 /* { dg-options "-fdump-rtl-ce1 -O2" } */
 
 int
diff --git a/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_2.c b/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_2.c
index 8c48270..e0e1728 100644
--- a/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_2.c
@@ -1,4 +1,4 @@ 
-/* { dg-do compile } */
+/* { dg-do compile { target aarch64*-*-* x86_64-*-* } } */
 /* { dg-options "-fdump-rtl-ce1 -O2" } */
 
 
diff --git a/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_3.c b/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_3.c
index 1aecbc9..0441357 100644
--- a/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_3.c
@@ -1,5 +1,5 @@ 
-/* { dg-do compile } */
-/* { dg-options "-save-temps -O2" } */
+/* { dg-do assemble { target aarch64*-*-* x86_64-*-* } } */
+/* { dg-options "-fdump-rtl-ce1 -O2" } */
 
 
 typedef long long s64;
@@ -16,4 +16,5 @@  foo (s64 a, s64 b, s64 c)
 }
 
 /* This test can be reduced to just return a + c;  */
-/* { dg-final { scan-assembler-not "sub\.*\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+\.*" } } */
+/* { dg-final { scan-rtl-dump "3 true changes made" "ce1" } } */
+/* { dg-final { scan-assembler-not "sub\.*\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+\.*" { target { aarch64*-*-* } } } } */
-- 
1.7.10.4