
[x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly

Message ID CAOvf_xxUaOkzzyVZp1nzq8Uw63tfuTyneSzhkoLwanzVw32_4g@mail.gmail.com
State New

Commit Message

Evgeny Stupachenko Nov. 21, 2014, 10:46 a.m. UTC
PING.
"200" currently looks optimal for x86.
Let's commit the following:

2014-11-21  Evgeny Stupachenko  <evstupac@gmail.com>
        * config/i386/i386.c (ix86_option_override_internal): Increase
        PARAM_MAX_COMPLETELY_PEELED_INSNS.


On Wed, Nov 12, 2014 at 5:02 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> Code size for spec2000 is almost unchanged (many benchmarks have the
> same binaries).
> For those that changed, we have the following code size increases (200 vs. 100,
> both dynamic builds with -Ofast -funroll-loops -flto):
> 183.equake +10%
> 164.gzip, 173.applu +3.5%
> 187.facerec, 191.fma3d +2.5%
> 200.sixtrack +2%
> 177.mesa, 178.galgel +1%
>
>
> On Wed, Nov 12, 2014 at 2:51 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> > 150 and 200 make Silvermont performance better on 173.applu (+8%) and
>>> > 183.equake (+3%); Haswell spec2006 performance stays almost unchanged.
>>> > A higher value of 300 leaves the performance of the mentioned tests
>>> > unchanged, but adds some regressions on other benchmarks.
>>> >
>>> > So I like 200 as well as 120 and 150, but can confirm performance
>>> > gains only for x86.
>>>
>>> IMO it's either 150 or 200.  We chose 200 for our 4.9-based compiler because
>>> this gave the performance boost without affecting the code size (on x86-64)
>>> and because this was previously 400, but it's your call.
>>
>> Both 150 and 200 globally work for me if there is not too much code size
>> bloat (I did not see code size mentioned here).
>>
>> What I did before decreasing the bounds was strengthening the loop iteration
>> count bounds and adding logic that predicts constant propagation enabled by
>> unrolling. For this reason 400 became too large, as we did a lot more complete
>> unrolling than before. Also, 400 in older compilers is not really 400 in newer
>> ones.
>>
>> Because I saw performance drop only with values below 50, I went for 100.
>> It would be very interesting to actually analyze what happens for those two
>> benchmarks (that should not be too hard with perf).
>>
>> Honza
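
To illustrate the "constant propagation enabled by unrolling" mentioned above, here is a minimal sketch. It is not taken from the benchmarks discussed; the function names and the weights are hypothetical. A loop with a known trip count of 4 can be completely peeled, after which the constant weights fold into the arithmetic and the zero term disappears. PARAM_MAX_COMPLETELY_PEELED_INSNS bounds how large such a peeled loop may become.

```c
#include <assert.h>

/* Hypothetical loop with a compile-time trip count of 4. */
int weighted_sum_loop(const int *x)
{
    static const int w[4] = {1, 2, 0, 4};
    int sum = 0;
    for (int i = 0; i < 4; i++)
        sum += w[i] * x[i];
    return sum;
}

/* Hand-written equivalent of what complete peeling followed by
   constant propagation could reduce the loop to: the weights are
   folded in and the term with weight 0 is dropped.  */
int weighted_sum_peeled(const int *x)
{
    return x[0] + 2 * x[1] + 4 * x[3];
}

/* Self-check: both forms compute the same value for a sample input. */
void check_forms_agree(void)
{
    const int x[4] = {3, 5, 7, 11};
    assert(weighted_sum_loop(x) == weighted_sum_peeled(x));
}
```

Whether GCC actually performs this transformation for a given loop depends on the optimization level and on the parameter value; the limit can also be tried from the command line with --param max-completely-peeled-insns=200.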

Comments

Uros Bizjak Nov. 21, 2014, 10:57 a.m. UTC | #1
On Fri, Nov 21, 2014 at 11:46 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> PING.
> "200" currently looks optimal for x86.
> Let's commit the following:
>
> 2014-11-21  Evgeny Stupachenko  <evstupac@gmail.com>
>         * config/i386/i386.c (ix86_option_override_internal): Increase
>         PARAM_MAX_COMPLETELY_PEELED_INSNS.

OK. Looks like a good performance vs. codesize tradeoff.

Uros.
Eric Botcazou Nov. 22, 2014, 9:49 a.m. UTC | #2
> OK. Looks like a good performance vs. codesize tradeoff.

Yes, but IMO this should be done in the generic code; unrolling small loops is
profitable on most architectures.
Uros Bizjak Nov. 22, 2014, 10:09 a.m. UTC | #3
On Sat, Nov 22, 2014 at 10:49 AM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> OK. Looks like a good performance vs. codesize tradeoff.
>
> Yes, but IMO this should be done in the generic code; unrolling small loops is
> profitable on most architectures.

Yeah, but after a couple of pings for a generic change, we went the target way.

Uros.
Eric Botcazou Nov. 22, 2014, 11:24 a.m. UTC | #4
> Yeah, but after a couple of pings for a generic change, we went the target
> way.

That's a bit of a shame: the 400 -> 100 change was very likely tested only on
x86-64 and nevertheless applied to the generic code, so the fix repairing the
damage should also be applied to the generic code.
Richard Biener Nov. 22, 2014, 6:38 p.m. UTC | #5
On November 22, 2014 12:24:22 PM CET, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> Yeah, but after a couple of pings for a generic change, we went the target
>> way.
>
> That's a bit of a shame: the 400 -> 100 change was very likely tested only on
> x86-64 and nevertheless applied to the generic code, so the fix repairing the
> damage should also be applied to the generic code.

A patch to bump the generic limit is OK.

Targets that don't want it can reduce it in target-specific code.

Richard.

Patch

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6337aa5..5ac10eb 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4081,6 +4081,12 @@  ix86_option_override_internal (bool main_args_p,
                         opts->x_param_values,
                         opts_set->x_param_values);

+  /* Extend full peel max insns parameter for x86.  */
+  maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
+                        200,
+                        opts->x_param_values,
+                        opts_set->x_param_values);
+
   /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
   if (opts->x_flag_prefetch_loop_arrays < 0
       && HAVE_prefetch
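
The patch relies on maybe_set_param_value applying a target default only when the user has not already set the parameter, which is also what lets targets lower a generic limit in target-specific code. The following is a toy model of that precedence, assuming this behavior; the toy_* names, the enum, and the struct are hypothetical stand-ins and not GCC's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's parameter tables. */
enum toy_param { TOY_MAX_COMPLETELY_PEELED_INSNS, TOY_PARAM_COUNT };

struct toy_params {
    int  value[TOY_PARAM_COUNT];        /* current parameter values */
    bool set_by_user[TOY_PARAM_COUNT];  /* true if given explicitly */
};

/* Model of a target default: applied only if the user did not set
   the parameter explicitly.  */
void toy_maybe_set_param(struct toy_params *p, enum toy_param which, int value)
{
    if (!p->set_by_user[which])
        p->value[which] = value;
}

/* Model of an explicit --param on the command line. */
void toy_user_set_param(struct toy_params *p, enum toy_param which, int value)
{
    p->value[which] = value;
    p->set_by_user[which] = true;
}

/* Self-check of the precedence: the target default replaces the generic
   default, but never an explicit user setting.  */
void toy_check_precedence(void)
{
    struct toy_params p = { {100}, {false} };
    toy_maybe_set_param(&p, TOY_MAX_COMPLETELY_PEELED_INSNS, 200);
    assert(p.value[TOY_MAX_COMPLETELY_PEELED_INSNS] == 200);

    struct toy_params q = { {100}, {false} };
    toy_user_set_param(&q, TOY_MAX_COMPLETELY_PEELED_INSNS, 400);
    toy_maybe_set_param(&q, TOY_MAX_COMPLETELY_PEELED_INSNS, 200);
    assert(q.value[TOY_MAX_COMPLETELY_PEELED_INSNS] == 400);
}
```

Under this model, a user who passes --param max-completely-peeled-insns=400 keeps 400 on x86, while everyone else gets the new default of 200.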