diff mbox

[ifcvt] Add a new parameter to limit if-conversion

Message ID 1450446761-5209-1-git-send-email-james.greenhalgh@arm.com
State New
Headers show

Commit Message

James Greenhalgh Dec. 18, 2015, 1:52 p.m. UTC
Hi,

PR68920 talks about undesirable if-conversion in the x86 back-end.
The if-conversion cost model simply uses BRANCH_COST (I want to revisit
this for GCC 7), but BRANCH_COST is overloaded in the compiler and reducing
it causes other optimisations to be disabled.

Consequently, to fix the PR we want something new that the target can set
to override BRANCH_COST and reduce the number of instructions that get
if-converted. This patch adds this mechanism through a param.

Bootstrapped on x86_64-none-linux-gnu and aarch64-none-linux-gnu.

OK for trunk?

Thanks,
James

---
gcc/

2015-12-17  James Greenhalgh  <james.greenhalgh@arm.com>

	PR rtl-optimization/68920
	* params.def (PARAM_MAX_RTL_IF_CONVERSION_INSNS): New.
	* ifcvt.c (noce_find_if_block): Limit by new param.
	* doc/invoke.texi (max-rtl-if-conversion-insns): Document it.

gcc/testsuite/

2015-12-17  James Greenhalgh  <james.greenhalgh@arm.com>

	PR rtl-optimization/68920
	* gcc.deg/ifcvt-5.c: New.

Comments

Yuri Rumyantsev Dec. 18, 2015, 2:09 p.m. UTC | #1
James,

We implemented slightly different patch - we restrict number of SET
instructions for if-conversion through new parameter and add check in
bb_ok_for_noce_convert_multiple_sets:

+  unsigned limit = MIN (ii->branch_cost,
+ (unsigned) PARAM_VALUE (PARAM_MAX_IF_CONV_SET_INSNS));
..
-  if (count > ii->branch_cost)
-    return FALSE;
+  if (count > limit)
+    return false;

Now we are testing it for different suites/chips but I saw that real
benchmark for which we saw huge performance degradation gets speed-ip
on 65% for -march=slm -m32.

2015-12-18 16:52 GMT+03:00 James Greenhalgh <james.greenhalgh@arm.com>:
>
> Hi,
>
> PR68920 talks about undesirable if-conversion in the x86 back-end.
> The if-conversion cost model simply uses BRANCH_COST (I want to revisit
> this for GCC 7), but BRANCH_COST is overloaded in the compiler and reducing
> it causes other optimisations to be disabled.
>
> Consequently, to fix the PR we want something new that the target can set
> to override BRANCH_COST and reduce the number of instructions that get
> if-converted. This patch adds this mechanism through a param.
>
> Bootstrapped on x86_64-none-linux-gnu and aarch64-none-linux-gnu.
>
> OK for trunk?
>
> Thanks,
> James
>
> ---
> gcc/
>
> 2015-12-17  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         PR rtl-optimization/68920
>         * params.def (PARAM_MAX_RTL_IF_CONVERSION_INSNS): New.
>         * ifcvt.c (noce_find_if_block): Limit by new param.
>         * doc/invoke.texi (max-rtl-if-conversion-insns): Document it.
>
> gcc/testsuite/
>
> 2015-12-17  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         PR rtl-optimization/68920
>         * gcc.deg/ifcvt-5.c: New.
>
James Greenhalgh Dec. 18, 2015, 2:19 p.m. UTC | #2
On Fri, Dec 18, 2015 at 05:09:14PM +0300, Yuri Rumyantsev wrote:
> James,
> 
> We implemented slightly different patch - we restrict number of SET
> instructions for if-conversion through new parameter and add check in
> bb_ok_for_noce_convert_multiple_sets:
> 
> +  unsigned limit = MIN (ii->branch_cost,
> + (unsigned) PARAM_VALUE (PARAM_MAX_IF_CONV_SET_INSNS));
> ..
> -  if (count > ii->branch_cost)
> -    return FALSE;
> +  if (count > limit)
> +    return false;
> 
> Now we are testing it for different suites/chips but I saw that real
> benchmark for which we saw huge performance degradation gets speed-ip
> on 65% for -march=slm -m32.

Yes, that will work too. I put it where I did to give back-ends the choice
to turn off all if-conversion by setting the param to zero. Your fix is more
targeted to fixing just the one regression. I don't have a preference as
to which direction we take this. I saw a similar performance boost for your
testcase on my development box with my patch - so at least we now have two
candidate patches to fix the performance regression.

Thanks,
James

> 
> 2015-12-18 16:52 GMT+03:00 James Greenhalgh <james.greenhalgh@arm.com>:
> >
> > Hi,
> >
> > PR68920 talks about undesirable if-conversion in the x86 back-end.
> > The if-conversion cost model simply uses BRANCH_COST (I want to revisit
> > this for GCC 7), but BRANCH_COST is overloaded in the compiler and reducing
> > it causes other optimisations to be disabled.
> >
> > Consequently, to fix the PR we want something new that the target can set
> > to override BRANCH_COST and reduce the number of instructions that get
> > if-converted. This patch adds this mechanism through a param.
> >
> > Bootstrapped on x86_64-none-linux-gnu and aarch64-none-linux-gnu.
> >
> > OK for trunk?
> >
> > Thanks,
> > James
> >
> > ---
> > gcc/
> >
> > 2015-12-17  James Greenhalgh  <james.greenhalgh@arm.com>
> >
> >         PR rtl-optimization/68920
> >         * params.def (PARAM_MAX_RTL_IF_CONVERSION_INSNS): New.
> >         * ifcvt.c (noce_find_if_block): Limit by new param.
> >         * doc/invoke.texi (max-rtl-if-conversion-insns): Document it.
> >
> > gcc/testsuite/
> >
> > 2015-12-17  James Greenhalgh  <james.greenhalgh@arm.com>
> >
> >         PR rtl-optimization/68920
> >         * gcc.deg/ifcvt-5.c: New.
> >
>
Yuri Rumyantsev Dec. 31, 2015, 9:21 a.m. UTC | #3
Hi All,

Here is slightly modified patch which limits a number of conditional
moves instead of changing conditional branch cost. This is in fact a
work-around for very poor cost model which needs to be enhanced to
evaluate cost of conditional move that could be greater then cost of
ordinary move (for some targets). This fix did not show any
performance regressions on different x86 platforms in comparison with
James patch.

Bootstrap and regression testing did not show any new failures.
Is it OK for trunk?

ChangeLog:
2015-12-31  Yuri Rumyantsev  <ysrumyan@gmail.com>

PR rtl-optimization/68920
* config/i386/i386.c (ix86_option_override_internal): Restrict number
of conditional moves for  RTL if-conversion to 1 for
TARGET_ONE_IF_CONV_INSN.
* config/i386/i386.h (TARGET_ONE_IF_CONV_INSN): New macros.
* config/i386/x86-tune.def (X86_TUNE_ONE_IF_CONV_INSN): New macros.
* params.def (PARAM_MAX_RTL_IF_CONVERSION_INSNS) : Introduce new
parameter to restirct number of conditional moves for
RTL if-conversion.
* doc/invoke.texi (max-rtl-if-conversion-insns): Document it.
* ifcvt.c (bb_ok_for_noce_convert_multiple_sets): Limit number of
conditionl moves.

gcc/testsuite/ChangeLog
* gcc.dg/ifcvt-4.c: Add "--param max-rtl-if-conversion-insns=3" option
for ix86 targets.
* gcc.dg/ifcvt-5.c: New test.



2015-12-18 17:19 GMT+03:00 James Greenhalgh <james.greenhalgh@arm.com>:
> On Fri, Dec 18, 2015 at 05:09:14PM +0300, Yuri Rumyantsev wrote:
>> James,
>>
>> We implemented slightly different patch - we restrict number of SET
>> instructions for if-conversion through new parameter and add check in
>> bb_ok_for_noce_convert_multiple_sets:
>>
>> +  unsigned limit = MIN (ii->branch_cost,
>> + (unsigned) PARAM_VALUE (PARAM_MAX_IF_CONV_SET_INSNS));
>> ..
>> -  if (count > ii->branch_cost)
>> -    return FALSE;
>> +  if (count > limit)
>> +    return false;
>>
>> Now we are testing it for different suites/chips but I saw that real
>> benchmark for which we saw huge performance degradation gets speed-ip
>> on 65% for -march=slm -m32.
>
> Yes, that will work too. I put it where I did to give back-ends the choice
> to turn off all if-conversion by setting the param to zero. Your fix is more
> targeted to fixing just the one regression. I don't have a preference as
> to which direction we take this. I saw a similar performance boost for your
> testcase on my development box with my patch - so at least we now have two
> candidate patches to fix the performance regression.
>
> Thanks,
> James
>
>>
>> 2015-12-18 16:52 GMT+03:00 James Greenhalgh <james.greenhalgh@arm.com>:
>> >
>> > Hi,
>> >
>> > PR68920 talks about undesirable if-conversion in the x86 back-end.
>> > The if-conversion cost model simply uses BRANCH_COST (I want to revisit
>> > this for GCC 7), but BRANCH_COST is overloaded in the compiler and reducing
>> > it causes other optimisations to be disabled.
>> >
>> > Consequently, to fix the PR we want something new that the target can set
>> > to override BRANCH_COST and reduce the number of instructions that get
>> > if-converted. This patch adds this mechanism through a param.
>> >
>> > Bootstrapped on x86_64-none-linux-gnu and aarch64-none-linux-gnu.
>> >
>> > OK for trunk?
>> >
>> > Thanks,
>> > James
>> >
>> > ---
>> > gcc/
>> >
>> > 2015-12-17  James Greenhalgh  <james.greenhalgh@arm.com>
>> >
>> >         PR rtl-optimization/68920
>> >         * params.def (PARAM_MAX_RTL_IF_CONVERSION_INSNS): New.
>> >         * ifcvt.c (noce_find_if_block): Limit by new param.
>> >         * doc/invoke.texi (max-rtl-if-conversion-insns): Document it.
>> >
>> > gcc/testsuite/
>> >
>> > 2015-12-17  James Greenhalgh  <james.greenhalgh@arm.com>
>> >
>> >         PR rtl-optimization/68920
>> >         * gcc.deg/ifcvt-5.c: New.
>> >
>>
Bernd Schmidt Jan. 4, 2016, 3:39 p.m. UTC | #4
On 12/31/2015 10:21 AM, Yuri Rumyantsev wrote:
> Here is slightly modified patch which limits a number of conditional
> moves instead of changing conditional branch cost. This is in fact a
> work-around for very poor cost model which needs to be enhanced to
> evaluate cost of conditional move that could be greater then cost of
> ordinary move (for some targets). This fix did not show any
> performance regressions on different x86 platforms in comparison with
> James patch.

I think this is OK. In the future, when attaching patches, please make 
sure they are text/plain so they are displayed by mail readers and can 
be quoted.


Bernd
Andreas Schwab Jan. 12, 2016, 10:07 a.m. UTC | #5
gcc.dg/ifcvt-5.c fails on ia64:

From ifcvt-5.c.223r.ce1:

========== Pass 2 ==========


========== no more changes

1 possible IF blocks searched.
1 IF blocks converted.
2 true changes made.

Andreas.
Yuri Rumyantsev Jan. 12, 2016, 1:13 p.m. UTC | #6
Andreas,

Is it OK for you if we exclude dg/ifcvt-5.c from ia64 testing since
predication must be used instead of conditional move's.

2016-01-12 13:07 GMT+03:00 Andreas Schwab <schwab@suse.de>:
> gcc.dg/ifcvt-5.c fails on ia64:
>
> From ifcvt-5.c.223r.ce1:
>
> ========== Pass 2 ==========
>
>
> ========== no more changes
>
> 1 possible IF blocks searched.
> 1 IF blocks converted.
> 2 true changes made.
>
> Andreas.
>
> --
> Andreas Schwab, SUSE Labs, schwab@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."
Andreas Schwab Jan. 12, 2016, 2:01 p.m. UTC | #7
Yuri Rumyantsev <ysrumyan@gmail.com> writes:

> Is it OK for you if we exclude dg/ifcvt-5.c from ia64 testing

Sure, go ahead.

Andreas.
Yuri Rumyantsev Jan. 12, 2016, 3:41 p.m. UTC | #8
Hi All,

Here is a simple fix to exclude dg/ifcvt-5.c test from ia64 testing.

Is it OK for trunk?
testsuite/ChangeLog:
2016-01-12  Yuri Rumyantsev  <ysrumyan@gmail.com>

PR rtl-optimization/68920
gcc/testsuite/ChangeLog
* gcc.dg/ifcvt-5.c: Exclude it from ia64 testing.

2016-01-12 17:01 GMT+03:00 Andreas Schwab <schwab@suse.de>:
> Yuri Rumyantsev <ysrumyan@gmail.com> writes:
>
>> Is it OK for you if we exclude dg/ifcvt-5.c from ia64 testing
>
> Sure, go ahead.
>
> Andreas.
>
> --
> Andreas Schwab, SUSE Labs, schwab@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."
Bernd Schmidt Jan. 13, 2016, 1:52 a.m. UTC | #9
On 01/12/2016 04:41 PM, Yuri Rumyantsev wrote:
> Here is a simple fix to exclude dg/ifcvt-5.c test from ia64 testing.
>
> Is it OK for trunk?

Please ensure patches are attached as plain text so that reviewers don't 
have to save them to be able to read them.

It looks like ia64 is making chanes in cond_move_process_if_block. Maybe 
that function needs to take the param into account so we don't have to 
keep adding excluded targets to the testcase?


Bernd
diff mbox

Patch

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9b3e2fe..1f9b0fe 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10341,6 +10341,14 @@  In each case, the @var{value} is an integer.  The allowable choices for
 When branch is predicted to be taken with probability lower than this threshold
 (in percent), then it is considered well predictable. The default is 10.
 
+@item max-rtl-if-conversion-insns
+RTL if-conversion tries to remove conditional branches around a block and
+replace them with conditionally executed instructions.  This parameter
+gives the maximum number of instructions in a block which should be
+considered for if-conversion.  The default is 10, though the compiler will
+also use other heuristics to decide whether if-conversion is likely to be
+profitable.
+
 @item max-crossjump-edges
 The maximum number of incoming edges to consider for cross-jumping.
 The algorithm used by @option{-fcrossjumping} is @math{O(N^2)} in
diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 6164232..2674baa 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -32,6 +32,7 @@ 
 #include "regs.h"
 #include "emit-rtl.h"
 #include "recog.h"
+#include "params.h"
 
 #include "cfgrtl.h"
 #include "cfganal.h"
@@ -3944,6 +3945,9 @@  noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge,
   if_info.then_else_reversed = then_else_reversed;
   if_info.branch_cost = BRANCH_COST (optimize_bb_for_speed_p (test_bb),
 				     predictable_edge_p (then_edge));
+  if_info.branch_cost
+    = MIN (if_info.branch_cost,
+	   (unsigned int) PARAM_VALUE (PARAM_MAX_RTL_IF_CONVERSION_INSNS));
 
   /* Do the real work.  */
 
diff --git a/gcc/params.def b/gcc/params.def
index 41fd8a8..a0d9787 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -49,6 +49,15 @@  DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCOME,
 	  "Maximal estimated outcome of branch considered predictable.",
 	  2, 0, 50)
 
+/* The maximum number of insns in a basic block to consider for
+   RTL if-conversion.  A target might use BRANCH_COST to further limit the
+   number of insns considered.  */
+DEFPARAM (PARAM_MAX_RTL_IF_CONVERSION_INSNS,
+	  "max-rtl-if-conversion-insns",
+	  "Maximum number of insns in a basic block to consider for RTL "
+	  "if-conversion.",
+	  10, 0, 10)
+
 DEFPARAM (PARAM_INLINE_MIN_SPEEDUP,
 	  "inline-min-speedup",
 	  "The minimal estimated speedup allowing inliner to ignore inline-insns-single and inline-isnsns-auto.",
diff --git a/gcc/testsuite/gcc.dg/ifcvt-5.c b/gcc/testsuite/gcc.dg/ifcvt-5.c
new file mode 100644
index 0000000..fc6a2ca
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ifcvt-5.c
@@ -0,0 +1,19 @@ 
+/* Check that multi-insn if-conversion is not done if the override
+   parameter would not allow it.  */
+
+/* { dg-options "-fdump-rtl-ce1 -O2 --param max-rtl-if-conversion-insns=1" } */
+int
+foo (int x, int y, int a)
+{
+  int i = x;
+  int j = y;
+  /* Try to make taking the branch likely.  */
+  __builtin_expect (x > y, 1);
+  if (x > y)
+    {
+      i = a;
+      j = i;
+    }
+  return i * j;
+}
+/* { dg-final { scan-rtl-dump "0 true changes made" "ce1" } } */