Message ID | 50C0FCDE.5010508@linux.vnet.ibm.com |
---|---|
State | New |
On Thu, Dec 6, 2012 at 3:15 PM, Pat Haugen <pthaugen@linux.vnet.ibm.com> wrote:
> The following patch restores the old default limits for loop peeling that
> were recently changed to 100 and caused a 20% degradation in 454.calculix.
>
> Bootstrap/regtest on powerpc64-linux with no new regressions. Ok for trunk?
>
> -Pat
>
>
> 2012-12-06 Pat Haugen <pthaugen@us.ibm.com>
> * config/rs6000/rs6000.c (rs6000_option_override_internal): Set
> default loop peeling limits.

Okay.

Thanks, David
> The following patch restores the old default limits for loop peeling
> that were recently changed to 100 and caused a 20% degradation in
> 454.calculix.
>
> Bootstrap/regtest on powerpc64-linux with no new regressions. Ok for trunk?
>
> -Pat
>
>
> 2012-12-06 Pat Haugen <pthaugen@us.ibm.com>
> * config/rs6000/rs6000.c (rs6000_option_override_internal): Set
> default loop peeling limits.

Actually the calculix regression is also seen on core.
Igor was looking into which loops got slower and why. Either we can fix that particular
loop or revert to the old default (which sadly causes quite a lot of code bloat).

Honza
On Thu, Dec 6, 2012 at 10:43 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> The following patch restores the old default limits for loop peeling
>> that were recently changed to 100 and caused a 20% degradation in
>> 454.calculix.
>>
>> Bootstrap/regtest on powerpc64-linux with no new regressions. Ok for trunk?
>>
>> -Pat
>>
>>
>> 2012-12-06 Pat Haugen <pthaugen@us.ibm.com>
>> * config/rs6000/rs6000.c (rs6000_option_override_internal): Set
>> default loop peeling limits.
>
> Actually the calculix regression is also seen on core.
> Igor was looking into which loops got slower and why. Either we can fix that particular
> loop or revert to the old default (which sadly causes quite a lot of code bloat).

Well, as the patch regressed the testcase I put in the testsuite for calculix,
that is no surprise... gfortran.dg/reassoc_4.f. The fix for the testcase was to
increase the limit with a --param :/

Note that it is even beneficial to unroll two more levels of the nest
completely.
It's just an insane testcase (and I spent quite some time on it trying to
somehow autodetect the simplification opportunities - see the unrolling
heuristic rewrite patch I dumped on you a few weeks ago).

No advice from me on how to "fix" this ... but eventually the rewrite restores
the old behavior with the new limits (I did the rewrite to try to somehow make
it do two extra levels of nest unrolling ...).

Richard.

> Honza
> On Thu, Dec 6, 2012 at 10:43 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> The following patch restores the old default limits for loop peeling
> >> that were recently changed to 100 and caused a 20% degradation in
> >> 454.calculix.
> >>
> >> Bootstrap/regtest on powerpc64-linux with no new regressions. Ok for trunk?
> >>
> >> -Pat
> >>
> >>
> >> 2012-12-06 Pat Haugen <pthaugen@us.ibm.com>
> >> * config/rs6000/rs6000.c (rs6000_option_override_internal): Set
> >> default loop peeling limits.
> >
> > Actually the calculix regression is also seen on core.
> > Igor was looking into which loops got slower and why. Either we can fix that particular
> > loop or revert to the old default (which sadly causes quite a lot of code bloat).
>
> Well, as the patch regressed the testcase I put in the testsuite for calculix,
> that is no surprise... gfortran.dg/reassoc_4.f. The fix for the testcase was to
> increase the limit with a --param :/

I would not care too much about limits for gfortran.dg/reassoc_4.f, it
is an artificial testcase.

>
> Note that it is even beneficial to unroll two more levels of the nest
> completely.
> It's just an insane testcase (and I spent quite some time on it trying to
> somehow autodetect the simplification opportunities - see the unrolling
> heuristic rewrite patch I dumped on you a few weeks ago).
>
> No advice from me on how to "fix" this ... but eventually the rewrite restores
> the old behavior with the new limits (I did the rewrite to try to somehow make
> it do two extra levels of nest unrolling ...).

Yep, calculix is kind of a special case. I also tested that increasing the limits
more than 10-fold improves ammp and applu (I filed an enhancement PR to track
that).

I am not sure how to deal with calculix - I am OK with reverting to the old
limit even if it doesn't fare best with Firefox (whose code size growth was the main
motivation for taming the heuristics down).
In the meantime inlining improved, so except for FDO, Firefox is still smaller
with 4.8 compared to 4.7 in my setup.

I am also OK with declaring calculix a special case and living with the
regression, the same way as we do not try to handle ammp/applu.

Honza

>
> Richard.
>
> > Honza
Hi,

On Fri, 7 Dec 2012, Jan Hubicka wrote:

> > > Actually the calculix regression is also seen on core.
> > > Igor was looking into which loops got slower and why. Either we can fix that particular
> > > loop or revert to the old default (which sadly causes quite a lot of code bloat).
> >
> > Well, as the patch regressed the testcase I put in the testsuite for calculix,
> > that is no surprise... gfortran.dg/reassoc_4.f. The fix for the testcase was to
> > increase the limit with a --param :/
>
> I would not care too much about limits for gfortran.dg/reassoc_4.f, it
> is an artificial testcase.

Ehm, no it's not. It _specifically_ is the important inner loop nest of
calculix and it should be compiled to exactly the expected form (i.e. 22
multiplications and unrolled) without any changes to params. At least
when you expect that -Ofast is giving good results on 454.calculix. I.e.
this testcase breaking automatically implies slower calculix results.

Ciao,
Michael.
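
For readers unfamiliar with the transformation under discussion, the following is a minimal hand-written sketch of what complete peeling does to a loop with a small constant trip count. It is not the calculix loop nest and not compiler output; the function names and the trip count `N` are assumptions chosen for illustration. The peeled form removes the loop control but repeats the body once per iteration, which is why the limits in question are expressed as an instruction count and why raising them increases code size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical trip count, known at compile time. */
enum { N = 4 };

/* Rolled form: counter, compare and branch on every iteration. */
double dot_rolled(const double *a, const double *b)
{
    double sum = 0.0;
    for (size_t i = 0; i < N; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hand-written equivalent of complete peeling: the body is copied N
   times with the index folded to a constant.  No loop control remains,
   but the body now occupies N times the space.  */
double dot_peeled(const double *a, const double *b)
{
    double sum = 0.0;
    sum += a[0] * b[0];
    sum += a[1] * b[1];
    sum += a[2] * b[2];
    sum += a[3] * b[3];
    return sum;
}

/* Both forms perform the same operations in the same order, so they
   are expected to agree exactly on the same input.  */
void check_peeling_equivalence(void)
{
    const double a[N] = { 1.0, 2.0, 3.0, 4.0 };
    const double b[N] = { 5.0, 6.0, 7.0, 8.0 };
    assert(dot_rolled(a, b) == dot_peeled(a, b));
}
```

In a loop nest, peeling an inner level completely can expose constant indices to later passes such as reassociation, which is presumably the kind of simplification opportunity mentioned above for the testcase.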
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c (revision 194260)
+++ gcc/config/rs6000/rs6000.c (working copy)
@@ -3120,6 +3120,14 @@ rs6000_option_override_internal (bool gl
                            global_options.x_param_values,
                            global_options_set.x_param_values);
 
+      /* Increase loop peeling limits based on performance analysis. */
+      maybe_set_param_value (PARAM_MAX_PEELED_INSNS, 400,
+                             global_options.x_param_values,
+                             global_options_set.x_param_values);
+      maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS, 400,
+                             global_options.x_param_values,
+                             global_options_set.x_param_values);
+
       /* If using typedef char *va_list, signal that
          __builtin_va_start (&ap, 0) can be optimized to
          ap = __builtin_next_arg (0). */
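
The patch relies on `maybe_set_param_value`, which, as far as I understand it, installs a target default only when the user has not already given that parameter explicitly with `--param`. The standalone model below sketches that behavior under this assumption. Every name in it (`demo_params`, `demo_maybe_set`, and so on) is hypothetical and is not GCC's actual data structure or API; the values 100 and 400 are the generic and target defaults mentioned in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two parameters touched by the patch. */
enum demo_param {
    DEMO_MAX_PEELED_INSNS,
    DEMO_MAX_COMPLETELY_PEELED_INSNS,
    DEMO_PARAM_COUNT
};

/* Hypothetical model of the value table and the "explicitly set" table. */
struct demo_params {
    int  value[DEMO_PARAM_COUNT];
    bool set_by_user[DEMO_PARAM_COUNT];
};

/* Start from the generic default of 100 discussed in the thread. */
void demo_init(struct demo_params *p)
{
    for (int i = 0; i < DEMO_PARAM_COUNT; i++) {
        p->value[i] = 100;
        p->set_by_user[i] = false;
    }
}

/* Models an explicit --param on the command line. */
void demo_set_from_command_line(struct demo_params *p,
                                enum demo_param which, int v)
{
    p->value[which] = v;
    p->set_by_user[which] = true;
}

/* Models the assumed semantics of maybe_set_param_value: apply the
   target default only if the user did not set the parameter.  */
void demo_maybe_set(struct demo_params *p, enum demo_param which, int v)
{
    if (!p->set_by_user[which])
        p->value[which] = v;
}

/* Models the two calls added by the patch. */
void demo_target_override(struct demo_params *p)
{
    demo_maybe_set(p, DEMO_MAX_PEELED_INSNS, 400);
    demo_maybe_set(p, DEMO_MAX_COMPLETELY_PEELED_INSNS, 400);
}

/* Small self-check: with no user setting, both limits become 400. */
void demo_check_defaults(void)
{
    struct demo_params p;
    demo_init(&p);
    demo_target_override(&p);
    assert(p.value[DEMO_MAX_PEELED_INSNS] == 400);
    assert(p.value[DEMO_MAX_COMPLETELY_PEELED_INSNS] == 400);
}
```

Under this model, a user who passes an explicit value for one of the limits keeps that value, while the other limit is raised to the target default of 400.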