diff mbox

[rs6000] Update default loop peel limits

Message ID 50C0FCDE.5010508@linux.vnet.ibm.com
State New
Headers show

Commit Message

Pat Haugen Dec. 6, 2012, 8:15 p.m. UTC
The following patch restores the old default limits for loop peeling 
that were recently changed to 100 and caused a 20% degradation in 
454.calculix.

Bootstrap/regtest on powerpc64-linux with no new regressions. Ok for trunk?

-Pat


2012-12-06  Pat Haugen <pthaugen@us.ibm.com>
         * config/rs6000/rs6000.c (rs6000_option_override_internal): Set
         default loop peeling limits.

Comments

David Edelsohn Dec. 6, 2012, 8:17 p.m. UTC | #1
On Thu, Dec 6, 2012 at 3:15 PM, Pat Haugen <pthaugen@linux.vnet.ibm.com> wrote:
> The following patch restores the old default limits for loop peeling that
> were recently changed to 100 and caused a 20% degradation in 454.calculix.
>
> Bootstrap/regtest on powerpc64-linux with no new regressions. Ok for trunk?
>
> -Pat
>
>
> 2012-12-06  Pat Haugen <pthaugen@us.ibm.com>
>         * config/rs6000/rs6000.c (rs6000_option_override_internal): Set
>         default loop peeling limits.

Okay.

Thanks, David
Jan Hubicka Dec. 6, 2012, 9:43 p.m. UTC | #2
> The following patch restores the old default limits for loop peeling
> that were recently changed to 100 and caused a 20% degradation in
> 454.calculix.
> 
> Bootstrap/regtest on powerpc64-linux with no new regressions. Ok for trunk?
> 
> -Pat
> 
> 
> 2012-12-06  Pat Haugen <pthaugen@us.ibm.com>
>         * config/rs6000/rs6000.c (rs6000_option_override_internal): Set
>         default loop peeling limits.

Actually the calculix regression is also seen on core.
Igor was looking into what loops got slower and why. Either we can fix that partiuclar
loop or revert to the old default (that sadly causes quite a lot of code bloat)

Honza
Richard Biener Dec. 7, 2012, 9:05 a.m. UTC | #3
On Thu, Dec 6, 2012 at 10:43 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> The following patch restores the old default limits for loop peeling
>> that were recently changed to 100 and caused a 20% degradation in
>> 454.calculix.
>>
>> Bootstrap/regtest on powerpc64-linux with no new regressions. Ok for trunk?
>>
>> -Pat
>>
>>
>> 2012-12-06  Pat Haugen <pthaugen@us.ibm.com>
>>         * config/rs6000/rs6000.c (rs6000_option_override_internal): Set
>>         default loop peeling limits.
>
> Actually the calculix regression is also seen on core.
> Igor was looking into what loops got slower and why. Either we can fix that partiuclar
> loop or revert to the old default (that sadly causes quite a lot of code bloat)

Well, as the patch regressed the testcase in put in the testsuite for calculix
that is no surprise... gfortran.dg/reassoc_4.f.  The fix for the testcase was to
increase the limit with a --param :/

Note that it is even beneficial to unroll two more levels of the nest
completely.
It's just an insane testcase (and I spent quite some time on it trying to
somehow autodetect the simplification opportunities - see the unrolling
heuristic rewrite patch I dumped on you a few weeks ago).

No advice from me on how to "fix" this ... but eventually the rewrite restores
the old behavior with the new limits (I did the rewrite to try to somehow make
it do two extra levels of nest unrolling ...).

Richard.

> Honza
Jan Hubicka Dec. 7, 2012, 4:43 p.m. UTC | #4
> On Thu, Dec 6, 2012 at 10:43 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> The following patch restores the old default limits for loop peeling
> >> that were recently changed to 100 and caused a 20% degradation in
> >> 454.calculix.
> >>
> >> Bootstrap/regtest on powerpc64-linux with no new regressions. Ok for trunk?
> >>
> >> -Pat
> >>
> >>
> >> 2012-12-06  Pat Haugen <pthaugen@us.ibm.com>
> >>         * config/rs6000/rs6000.c (rs6000_option_override_internal): Set
> >>         default loop peeling limits.
> >
> > Actually the calculix regression is also seen on core.
> > Igor was looking into what loops got slower and why. Either we can fix that partiuclar
> > loop or revert to the old default (that sadly causes quite a lot of code bloat)
> 
> Well, as the patch regressed the testcase in put in the testsuite for calculix
> that is no surprise... gfortran.dg/reassoc_4.f.  The fix for the testcase was to
> increase the limit with a --param :/

I would not care too much about limits for gfortran.dg/reassoc_4.f, it is an artificail
testcase.
> 
> Note that it is even beneficial to unroll two more levels of the nest
> completely.
> It's just an insane testcase (and I spent quite some time on it trying to
> somehow autodetect the simplification opportunities - see the unrolling
> heuristic rewrite patch I dumped on you a few weeks ago).
> 
> No advice from me on how to "fix" this ... but eventually the rewrite restores
> the old behavior with the new limits (I did the rewrite to try to somehow make
> it do two extra levels of nest unrolling ...).

Yep, calculix is kind of special case.  I also tested increasing the limits to more
than 10 fold improves ammp and applu (I filled in enhancement PR to track that).

I am not sure how to deal with calculix - I am OK with reverting to the old limit even
if it doesn't fare best with firefox (whose code size growth was main motivation for
tamning the heuristics down). In meantime inlining improved so except for FDO the firefox
is still smaller with 4.8 compres to 4.7 in my setup.

I am also OK with declaring calculix a special case and living with the regression,
same way as we do not try to handle ammp/applu.

Honza
> 
> Richard.
> 
> > Honza
Michael Matz Dec. 7, 2012, 4:51 p.m. UTC | #5
Hi,

On Fri, 7 Dec 2012, Jan Hubicka wrote:

> > > Actually the calculix regression is also seen on core.
> > > Igor was looking into what loops got slower and why. Either we can fix that partiuclar
> > > loop or revert to the old default (that sadly causes quite a lot of code bloat)
> > 
> > Well, as the patch regressed the testcase in put in the testsuite for calculix
> > that is no surprise... gfortran.dg/reassoc_4.f.  The fix for the testcase was to
> > increase the limit with a --param :/
> 
> I would not care too much about limits for gfortran.dg/reassoc_4.f, it 
> is an artificail testcase.

Ehm, no it's not.  It _specifically_ is the important inner loop nest of 
calculix and it should be compiled to exactly the expected form (i.e. 22 
multiplications and unrolled) without any changes to params.  At least 
when you expect that -Ofast is giving good results on 454.calculix.

I.e. this testcase breaking automatically implies slower calculix results.


Ciao,
Michael.
diff mbox

Patch

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 194260)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -3120,6 +3120,14 @@  rs6000_option_override_internal (bool gl
 			     global_options.x_param_values,
 			     global_options_set.x_param_values);
 
+      /* Increase loop peeling limits based on performance analysis. */
+      maybe_set_param_value (PARAM_MAX_PEELED_INSNS, 400,
+			     global_options.x_param_values,
+			     global_options_set.x_param_values);
+      maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS, 400,
+			     global_options.x_param_values,
+			     global_options_set.x_param_values);
+
       /* If using typedef char *va_list, signal that
 	 __builtin_va_start (&ap, 0) can be optimized to
 	 ap = __builtin_next_arg (0).  */