Message ID | 1572849103-60161-1-git-send-email-guojiufu@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | [V3] rs6000: Refine small loop unroll in loop_unroll_adjust hook | expand |
Hi Jeff, Thanks for the patch, I learned a lot from it. Some nits embedded. on 2019/11/4 下午2:31, Jiufu Guo wrote: > Hi, > > In this patch, loop unroll adjust hook is introduced for powerpc. We can do > target related hueristic adjustment in this hook. In this patch, small loops > is unrolled 2 times for O2 and O3 by default. With this patch, we can see > some improvement for spec2017. This patch enhanced a little for [Patch V2] to > enable small loops unroll for O3 by default like O2. > > Bootstrapped and regtested on powerpc64le. Is this ok for trunk? > > Jiufu > BR. > > gcc/ > 2019-11-04 Jiufu Guo <guojiufu@linux.ibm.com> > > PR tree-optimization/88760 > * config/rs6000/rs6000.c (rs6000_option_override_internal): Remove > code which changes PARAM_MAX_UNROLL_TIMES and PARAM_MAX_UNROLLED_INSNS. > (TARGET_LOOP_UNROLL_ADJUST): Add loop unroll adjust hook. > (rs6000_loop_unroll_adjust): New hook for loop unroll adjust. > Unrolling small loop 2 times for -O2 and -O3. > (rs6000_function_specific_save): Save unroll_small_loops flag. > (rs6000_function_specific_restore): Restore unroll_small_loops flag. > * gcc/config/rs6000/rs6000.opt (unroll_small_loops): New internal flag. > > > gcc.testsuite/ > 2019-11-04 Jiufu Guo <guojiufu@linux.ibm.com> > > PR tree-optimization/88760 > * gcc.dg/pr59643.c: Update back to r277550. > > --- > gcc/config/rs6000/rs6000.c | 38 ++++++++++++++++++++++++++++---------- > gcc/config/rs6000/rs6000.opt | 7 +++++++ > gcc/testsuite/gcc.dg/pr59643.c | 3 --- > 3 files changed, 35 insertions(+), 13 deletions(-) > > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index 9ed5151..5e1a75d 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -1428,6 +1428,9 @@ static const struct attribute_spec rs6000_attribute_table[] = > #undef TARGET_VECTORIZE_DESTROY_COST_DATA > #define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data > > +#undef TARGET_LOOP_UNROLL_ADJUST > +#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust > + > #undef TARGET_INIT_BUILTINS > #define TARGET_INIT_BUILTINS rs6000_init_builtins > #undef TARGET_BUILTIN_DECL > @@ -4540,25 +4543,20 @@ rs6000_option_override_internal (bool global_init_p) > global_options.x_param_values, > global_options_set.x_param_values); > > - /* unroll very small loops 2 time if no -funroll-loops. */ > + /* If funroll-loops is not enabled explicitly, then enable small loops > + unrolling for -O2, and do not turn fweb or frename-registers on. */ "for -O2" -> "for -O2 and up"? since I noticed it checks "optimize >=2" later. > if (!global_options_set.x_flag_unroll_loops > && !global_options_set.x_flag_unroll_all_loops) > { > - maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2, > - global_options.x_param_values, > - global_options_set.x_param_values); > - > - maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20, > - global_options.x_param_values, > - global_options_set.x_param_values); > + unroll_small_loops = optimize >= 2 ? 1 : 0; Maybe simpler with "unroll_small_loops = flag_unroll_loops"? > > - /* If fweb or frename-registers are not specificed in command-line, > - do not turn them on implicitly. */ > if (!global_options_set.x_flag_web) > global_options.x_flag_web = 0; > if (!global_options_set.x_flag_rename_registers) > global_options.x_flag_rename_registers = 0; > } > + else > + unroll_small_loops = 0; Could we initialize this in rs6000.opt as zero? BR, Kewen
"Kewen.Lin" <linkw@linux.ibm.com> writes: > Hi Jeff, > > Thanks for the patch, I learned a lot from it. Some nits embedded. Thanks for your comments! > > on 2019/11/4 下午2:31, Jiufu Guo wrote: >> Hi, >> >> In this patch, loop unroll adjust hook is introduced for powerpc. We can do >> target related hueristic adjustment in this hook. In this patch, small loops >> is unrolled 2 times for O2 and O3 by default. With this patch, we can see >> some improvement for spec2017. This patch enhanced a little for [Patch V2] to >> enable small loops unroll for O3 by default like O2. >> >> Bootstrapped and regtested on powerpc64le. Is this ok for trunk? >> >> Jiufu >> BR. >> >> gcc/ >> 2019-11-04 Jiufu Guo <guojiufu@linux.ibm.com> >> >> PR tree-optimization/88760 >> * config/rs6000/rs6000.c (rs6000_option_override_internal): Remove >> code which changes PARAM_MAX_UNROLL_TIMES and PARAM_MAX_UNROLLED_INSNS. >> (TARGET_LOOP_UNROLL_ADJUST): Add loop unroll adjust hook. >> (rs6000_loop_unroll_adjust): New hook for loop unroll adjust. >> Unrolling small loop 2 times for -O2 and -O3. >> (rs6000_function_specific_save): Save unroll_small_loops flag. >> (rs6000_function_specific_restore): Restore unroll_small_loops flag. >> * gcc/config/rs6000/rs6000.opt (unroll_small_loops): New internal flag. >> >> >> gcc.testsuite/ >> 2019-11-04 Jiufu Guo <guojiufu@linux.ibm.com> >> >> PR tree-optimization/88760 >> * gcc.dg/pr59643.c: Update back to r277550. >> >> --- >> gcc/config/rs6000/rs6000.c | 38 ++++++++++++++++++++++++++++---------- >> gcc/config/rs6000/rs6000.opt | 7 +++++++ >> gcc/testsuite/gcc.dg/pr59643.c | 3 --- >> 3 files changed, 35 insertions(+), 13 deletions(-) >> >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c >> index 9ed5151..5e1a75d 100644 >> --- a/gcc/config/rs6000/rs6000.c >> +++ b/gcc/config/rs6000/rs6000.c >> @@ -1428,6 +1428,9 @@ static const struct attribute_spec rs6000_attribute_table[] = >> #undef TARGET_VECTORIZE_DESTROY_COST_DATA >> #define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data >> >> +#undef TARGET_LOOP_UNROLL_ADJUST >> +#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust >> + >> #undef TARGET_INIT_BUILTINS >> #define TARGET_INIT_BUILTINS rs6000_init_builtins >> #undef TARGET_BUILTIN_DECL >> @@ -4540,25 +4543,20 @@ rs6000_option_override_internal (bool global_init_p) >> global_options.x_param_values, >> global_options_set.x_param_values); >> >> - /* unroll very small loops 2 time if no -funroll-loops. */ >> + /* If funroll-loops is not enabled explicitly, then enable small loops >> + unrolling for -O2, and do not turn fweb or frename-registers on. */ > > "for -O2" -> "for -O2 and up"? since I noticed it checks "optimize > >=2" later. Thanks, yes, this would be more clear. > >> if (!global_options_set.x_flag_unroll_loops >> && !global_options_set.x_flag_unroll_all_loops) >> { >> - maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2, >> - global_options.x_param_values, >> - global_options_set.x_param_values); >> - >> - maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20, >> - global_options.x_param_values, >> - global_options_set.x_param_values); >> + unroll_small_loops = optimize >= 2 ? 1 : 0; > > Maybe simpler with "unroll_small_loops = flag_unroll_loops"? Both would be ok, I think. ;) > >> >> - /* If fweb or frename-registers are not specificed in command-line, >> - do not turn them on implicitly. */ >> if (!global_options_set.x_flag_web) >> global_options.x_flag_web = 0; >> if (!global_options_set.x_flag_rename_registers) >> global_options.x_flag_rename_registers = 0; >> } >> + else >> + unroll_small_loops = 0; > > Could we initialize this in rs6000.opt as zero? We could also set initial value as 0 in rs6000.opt. BR Jiufu. > > > BR, > Kewen
Hi! On Mon, Nov 04, 2019 at 02:31:43PM +0800, Jiufu Guo wrote: > In this patch, loop unroll adjust hook is introduced for powerpc. We can do > target related hueristic adjustment in this hook. In this patch, small loops > is unrolled 2 times for O2 and O3 by default. With this patch, we can see > some improvement for spec2017. This patch enhanced a little for [Patch V2] to > enable small loops unroll for O3 by default like O2. > * gcc/config/rs6000/rs6000.opt (unroll_small_loops): New internal flag. That's the declaration of a variable. A command line flag is something like -munroll-small-loops. Do we want a command line option like that? It makes testing simpler. > - /* unroll very small loops 2 time if no -funroll-loops. */ > + /* If funroll-loops is not enabled explicitly, then enable small loops > + unrolling for -O2, and do not turn fweb or frename-registers on. */ > if (!global_options_set.x_flag_unroll_loops > && !global_options_set.x_flag_unroll_all_loops) > { > - maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2, > - global_options.x_param_values, > - global_options_set.x_param_values); > - > - maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20, > - global_options.x_param_values, > - global_options_set.x_param_values); > + unroll_small_loops = optimize >= 2 ? 1 : 0; That includes -Os? I think you shouldn't always set it to some value, only enable it where you want to enable it. If you make a command line option for it this is especially simple (the table in common/config/rs6000/rs6000-common.c). > +static unsigned > +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop) > +{ > + if (unroll_small_loops) > + { > + /* TODO: This is hardcoded to 10 right now. It can be refined, for > + example we may want to unroll very small loops more times (4 perhaps). > + We also should use a PARAM for this. */ > + if (loop->ninsns <= 10) > + return MIN (2, nunroll); > + else > + return 0; > + } (Add an empty line here?) > + return nunroll; > +} Great :-) > @@ -23472,6 +23488,7 @@ rs6000_function_specific_save (struct cl_target_option *ptr, > { > ptr->x_rs6000_isa_flags = opts->x_rs6000_isa_flags; > ptr->x_rs6000_isa_flags_explicit = opts->x_rs6000_isa_flags_explicit; > + ptr->x_unroll_small_loops = opts->x_unroll_small_loops; > } Yeah we shouldn't need to add that, this should all be automatic. Segher
On Mon, 4 Nov 2019, Segher Boessenkool wrote: > Hi! > > On Mon, Nov 04, 2019 at 02:31:43PM +0800, Jiufu Guo wrote: > > In this patch, loop unroll adjust hook is introduced for powerpc. We can do > > target related hueristic adjustment in this hook. In this patch, small loops > > is unrolled 2 times for O2 and O3 by default. With this patch, we can see > > some improvement for spec2017. This patch enhanced a little for [Patch V2] to > > enable small loops unroll for O3 by default like O2. > > > * gcc/config/rs6000/rs6000.opt (unroll_small_loops): New internal flag. > > That's the declaration of a variable. A command line flag is something > like -munroll-small-loops. Do we want a command line option like that? > It makes testing simpler. > > > - /* unroll very small loops 2 time if no -funroll-loops. */ > > + /* If funroll-loops is not enabled explicitly, then enable small loops > > + unrolling for -O2, and do not turn fweb or frename-registers on. */ > > if (!global_options_set.x_flag_unroll_loops > > && !global_options_set.x_flag_unroll_all_loops) > > { > > - maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2, > > - global_options.x_param_values, > > - global_options_set.x_param_values); > > - > > - maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20, > > - global_options.x_param_values, > > - global_options_set.x_param_values); > > + unroll_small_loops = optimize >= 2 ? 1 : 0; > > That includes -Os? It also re-introduces the exactly same issue as the --param with LTO. > I think you shouldn't always set it to some value, only enable it where > you want to enable it. If you make a command line option for it this is > especially simple (the table in common/config/rs6000/rs6000-common.c). > > > +static unsigned > > +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop) > > +{ > > + if (unroll_small_loops) > > + { > > + /* TODO: This is hardcoded to 10 right now. It can be refined, for > > + example we may want to unroll very small loops more times (4 perhaps). > > + We also should use a PARAM for this. */ > > + if (loop->ninsns <= 10) > > + return MIN (2, nunroll); > > + else > > + return 0; > > + } > > (Add an empty line here?) > > > + return nunroll; > > +} > > Great :-) > > > @@ -23472,6 +23488,7 @@ rs6000_function_specific_save (struct cl_target_option *ptr, > > { > > ptr->x_rs6000_isa_flags = opts->x_rs6000_isa_flags; > > ptr->x_rs6000_isa_flags_explicit = opts->x_rs6000_isa_flags_explicit; > > + ptr->x_unroll_small_loops = opts->x_unroll_small_loops; > > } > > Yeah we shouldn't need to add that, this should all be automatic. > > > Segher >
Segher Boessenkool <segher@kernel.crashing.org> writes: > Hi! > > On Mon, Nov 04, 2019 at 02:31:43PM +0800, Jiufu Guo wrote: >> In this patch, loop unroll adjust hook is introduced for powerpc. We can do >> target related hueristic adjustment in this hook. In this patch, small loops >> is unrolled 2 times for O2 and O3 by default. With this patch, we can see >> some improvement for spec2017. This patch enhanced a little for [Patch V2] to >> enable small loops unroll for O3 by default like O2. > >> * gcc/config/rs6000/rs6000.opt (unroll_small_loops): New internal flag. > > That's the declaration of a variable. A command line flag is something > like -munroll-small-loops. Do we want a command line option like that? > It makes testing simpler. Thanks for great sugguestion, will update patch to add a command line option. > >> - /* unroll very small loops 2 time if no -funroll-loops. */ >> + /* If funroll-loops is not enabled explicitly, then enable small loops >> + unrolling for -O2, and do not turn fweb or frename-registers on. */ >> if (!global_options_set.x_flag_unroll_loops >> && !global_options_set.x_flag_unroll_all_loops) >> { >> - maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2, >> - global_options.x_param_values, >> - global_options_set.x_param_values); >> - >> - maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20, >> - global_options.x_param_values, >> - global_options_set.x_param_values); >> + unroll_small_loops = optimize >= 2 ? 1 : 0; > > That includes -Os? > > I think you shouldn't always set it to some value, only enable it where > you want to enable it. If you make a command line option for it this is > especially simple (the table in common/config/rs6000/rs6000-common.c). Thanks again, update rs6000_option_optimization_table as : { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 }, { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fweb, NULL, 0 }, { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_frename_registers, NULL, 0 }, { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 }, While, I still keep the code to disable unroll_small_loops for explicit -funroll-loops via checking global_options_set.x_flag_unroll_loops. > >> +static unsigned >> +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop) >> +{ >> + if (unroll_small_loops) >> + { >> + /* TODO: This is hardcoded to 10 right now. It can be refined, for >> + example we may want to unroll very small loops more times (4 perhaps). >> + We also should use a PARAM for this. */ >> + if (loop->ninsns <= 10) >> + return MIN (2, nunroll); >> + else >> + return 0; >> + } > > (Add an empty line here?) Thanks again, updated accordingly. > >> + return nunroll; >> +} > > Great :-) > >> @@ -23472,6 +23488,7 @@ rs6000_function_specific_save (struct cl_target_option *ptr, >> { >> ptr->x_rs6000_isa_flags = opts->x_rs6000_isa_flags; >> ptr->x_rs6000_isa_flags_explicit = opts->x_rs6000_isa_flags_explicit; >> + ptr->x_unroll_small_loops = opts->x_unroll_small_loops; >> } > > Yeah we shouldn't need to add that, this should all be automatic. Yes, through adding new option in .opt, this is handled automaticly. Updated patch is at the end of this mail. Thanks for review. Jiufu > > > Segher Updated patch: Index: gcc/common/config/rs6000/rs6000-common.c =================================================================== --- gcc/common/config/rs6000/rs6000-common.c (revision 277765) +++ gcc/common/config/rs6000/rs6000-common.c (working copy) @@ -35,7 +35,9 @@ static const struct default_options rs6000_option_ { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 }, /* Enable -fsched-pressure for first pass instruction scheduling. */ { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 }, - { OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 }, + /* Enable -funroll-loops with -munroll-small-loops. */ + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 }, + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 }, { OPT_LEVELS_NONE, 0, NULL, 0 } }; Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (revision 277765) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -1428,6 +1428,9 @@ static const struct attribute_spec rs6000_attribut #undef TARGET_VECTORIZE_DESTROY_COST_DATA #define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data +#undef TARGET_LOOP_UNROLL_ADJUST +#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust + #undef TARGET_INIT_BUILTINS #define TARGET_INIT_BUILTINS rs6000_init_builtins #undef TARGET_BUILTIN_DECL @@ -4540,24 +4543,21 @@ rs6000_option_override_internal (bool global_init_ global_options.x_param_values, global_options_set.x_param_values); - /* unroll very small loops 2 time if no -funroll-loops. */ - if (!global_options_set.x_flag_unroll_loops - && !global_options_set.x_flag_unroll_all_loops) + /* Explicit funroll-loops turns -munroll-small-loops off. + Implicit funroll-loops does not turn fweb or frename-registers on. */ + if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops) + || (global_options_set.x_flag_unroll_all_loops + && flag_unroll_all_loops)) { - maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2, - global_options.x_param_values, - global_options_set.x_param_values); - - maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20, - global_options.x_param_values, - global_options_set.x_param_values); - - /* If fweb or frename-registers are not specificed in command-line, - do not turn them on implicitly. */ + if (!global_options_set.x_unroll_small_loops) + unroll_small_loops = 0; + } + else + { if (!global_options_set.x_flag_web) - global_options.x_flag_web = 0; + flag_web = 0; if (!global_options_set.x_flag_rename_registers) - global_options.x_flag_rename_registers = 0; + flag_rename_registers = 0; } /* If using typedef char *va_list, signal that @@ -5101,6 +5101,25 @@ rs6000_destroy_cost_data (void *data) free (data); } +/* Implement targetm.loop_unroll_adjust. */ + +static unsigned +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop) +{ + if (unroll_small_loops) + { + /* TODO: This is hardcoded to 10 right now. It can be refined, for + example we may want to unroll very small loops more times (4 perhaps). + We also should use a PARAM for this. */ + if (loop->ninsns <= 10) + return MIN (2, nunroll); + else + return 0; + } + + return nunroll; +} + /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a library with vectorized intrinsics. */ Index: gcc/config/rs6000/rs6000.opt =================================================================== --- gcc/config/rs6000/rs6000.opt (revision 277765) +++ gcc/config/rs6000/rs6000.opt (working copy) @@ -501,6 +501,10 @@ moptimize-swaps Target Undocumented Var(rs6000_optimize_swaps) Init(1) Save Analyze and remove doubleword swaps from VSX computations. +munroll-small-loops +Target Undocumented Var(unroll_small_loops) Init(0) Save +Use conservative small loop unrolling. + mpower9-misc Target Undocumented Report Mask(P9_MISC) Var(rs6000_isa_flags) Use certain scalar instructions added in ISA 3.0. Index: gcc/testsuite/gcc.dg/pr59643.c =================================================================== --- gcc/testsuite/gcc.dg/pr59643.c (revision 277765) +++ gcc/testsuite/gcc.dg/pr59643.c (working copy) @@ -1,9 +1,6 @@ /* PR tree-optimization/59643 */ /* { dg-do compile } */ /* { dg-options "-O3 -fdump-tree-pcom-details" } */ -/* { dg-additional-options "--param max-unrolled-insns=400" { target { powerpc*-*-* } } } */ -/* Implicit threashold of max-unrolled-insn on ppc at O3 is too small for the - loop of this case. */ void foo (double *a, double *b, double *c, double d, double e, int n)
Richard Biener <rguenther@suse.de> writes: > On Mon, 4 Nov 2019, Segher Boessenkool wrote: > >> Hi! >> >> On Mon, Nov 04, 2019 at 02:31:43PM +0800, Jiufu Guo wrote: >> > In this patch, loop unroll adjust hook is introduced for powerpc. We can do >> > target related hueristic adjustment in this hook. In this patch, small loops >> > is unrolled 2 times for O2 and O3 by default. With this patch, we can see >> > some improvement for spec2017. This patch enhanced a little for [Patch V2] to >> > enable small loops unroll for O3 by default like O2. >> >> > * gcc/config/rs6000/rs6000.opt (unroll_small_loops): New internal flag. >> >> That's the declaration of a variable. A command line flag is something >> like -munroll-small-loops. Do we want a command line option like that? >> It makes testing simpler. >> >> > - /* unroll very small loops 2 time if no -funroll-loops. */ >> > + /* If funroll-loops is not enabled explicitly, then enable small loops >> > + unrolling for -O2, and do not turn fweb or frename-registers on. */ >> > if (!global_options_set.x_flag_unroll_loops >> > && !global_options_set.x_flag_unroll_all_loops) >> > { >> > - maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2, >> > - global_options.x_param_values, >> > - global_options_set.x_param_values); >> > - >> > - maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20, >> > - global_options.x_param_values, >> > - global_options_set.x_param_values); >> > + unroll_small_loops = optimize >= 2 ? 1 : 0; >> >> That includes -Os? > > It also re-introduces the exactly same issue as the --param with LTO. Thanks Richard, This flag (unroll_small_loops) is saved/restored cl_target_option it can distigush different CU. I had a test, it works when -flto for multi-sources. Jiufu BR. > >> I think you shouldn't always set it to some value, only enable it where >> you want to enable it. If you make a command line option for it this is >> especially simple (the table in common/config/rs6000/rs6000-common.c). >> >> > +static unsigned >> > +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop) >> > +{ >> > + if (unroll_small_loops) >> > + { >> > + /* TODO: This is hardcoded to 10 right now. It can be refined, for >> > + example we may want to unroll very small loops more times (4 perhaps). >> > + We also should use a PARAM for this. */ >> > + if (loop->ninsns <= 10) >> > + return MIN (2, nunroll); >> > + else >> > + return 0; >> > + } >> >> (Add an empty line here?) >> >> > + return nunroll; >> > +} >> >> Great :-) >> >> > @@ -23472,6 +23488,7 @@ rs6000_function_specific_save (struct cl_target_option *ptr, >> > { >> > ptr->x_rs6000_isa_flags = opts->x_rs6000_isa_flags; >> > ptr->x_rs6000_isa_flags_explicit = opts->x_rs6000_isa_flags_explicit; >> > + ptr->x_unroll_small_loops = opts->x_unroll_small_loops; >> > } >> >> Yeah we shouldn't need to add that, this should all be automatic. >> >> >> Segher >>
Jiufu Guo <guojiufu@linux.ibm.com> writes: > Segher Boessenkool <segher@kernel.crashing.org> writes: > >> Hi! >> >> On Mon, Nov 04, 2019 at 02:31:43PM +0800, Jiufu Guo wrote: >>> In this patch, loop unroll adjust hook is introduced for powerpc. We can do >>> target related hueristic adjustment in this hook. In this patch, small loops >>> is unrolled 2 times for O2 and O3 by default. With this patch, we can see >>> some improvement for spec2017. This patch enhanced a little for [Patch V2] to >>> enable small loops unroll for O3 by default like O2. >> >>> * gcc/config/rs6000/rs6000.opt (unroll_small_loops): New internal flag. >> >> That's the declaration of a variable. A command line flag is something >> like -munroll-small-loops. Do we want a command line option like that? >> It makes testing simpler. > Thanks for great sugguestion, will update patch to add a command line > option. > >> >>> - /* unroll very small loops 2 time if no -funroll-loops. */ >>> + /* If funroll-loops is not enabled explicitly, then enable small loops >>> + unrolling for -O2, and do not turn fweb or frename-registers on. */ >>> if (!global_options_set.x_flag_unroll_loops >>> && !global_options_set.x_flag_unroll_all_loops) >>> { >>> - maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2, >>> - global_options.x_param_values, >>> - global_options_set.x_param_values); >>> - >>> - maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20, >>> - global_options.x_param_values, >>> - global_options_set.x_param_values); >>> + unroll_small_loops = optimize >= 2 ? 1 : 0; >> >> That includes -Os? >> >> I think you shouldn't always set it to some value, only enable it where >> you want to enable it. If you make a command line option for it this is >> especially simple (the table in common/config/rs6000/rs6000-common.c). > Thanks again, update rs6000_option_optimization_table as : > { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 }, > { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fweb, NULL, 0 }, > { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_frename_registers, NULL, 0 }, Sorry, typo, in patch, above 2 lines are not there. Because they do not turn off flag_web and flag_ename_registers. > { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 }, > > While, I still keep the code to disable unroll_small_loops for explicit > -funroll-loops via checking global_options_set.x_flag_unroll_loops. >> >>> +static unsigned >>> +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop) >>> +{ >>> + if (unroll_small_loops) >>> + { >>> + /* TODO: This is hardcoded to 10 right now. It can be refined, for >>> + example we may want to unroll very small loops more times (4 perhaps). >>> + We also should use a PARAM for this. */ >>> + if (loop->ninsns <= 10) >>> + return MIN (2, nunroll); >>> + else >>> + return 0; >>> + } >> >> (Add an empty line here?) > Thanks again, updated accordingly. >> >>> + return nunroll; >>> +} >> >> Great :-) >> >>> @@ -23472,6 +23488,7 @@ rs6000_function_specific_save (struct cl_target_option *ptr, >>> { >>> ptr->x_rs6000_isa_flags = opts->x_rs6000_isa_flags; >>> ptr->x_rs6000_isa_flags_explicit = opts->x_rs6000_isa_flags_explicit; >>> + ptr->x_unroll_small_loops = opts->x_unroll_small_loops; >>> } >> >> Yeah we shouldn't need to add that, this should all be automatic. > Yes, through adding new option in .opt, this is handled automaticly. > > Updated patch is at the end of this mail. Thanks for review. > > Jiufu >> >> >> Segher > > Updated patch: > > Index: gcc/common/config/rs6000/rs6000-common.c > =================================================================== > --- gcc/common/config/rs6000/rs6000-common.c (revision 277765) > +++ gcc/common/config/rs6000/rs6000-common.c (working copy) > @@ -35,7 +35,9 @@ static const struct default_options rs6000_option_ > { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 }, > /* Enable -fsched-pressure for first pass instruction scheduling. */ > { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 }, > - { OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 }, > + /* Enable -funroll-loops with -munroll-small-loops. */ > + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 }, > + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 }, > { OPT_LEVELS_NONE, 0, NULL, 0 } > }; > > Index: gcc/config/rs6000/rs6000.c > =================================================================== > --- gcc/config/rs6000/rs6000.c (revision 277765) > +++ gcc/config/rs6000/rs6000.c (working copy) > @@ -1428,6 +1428,9 @@ static const struct attribute_spec rs6000_attribut > #undef TARGET_VECTORIZE_DESTROY_COST_DATA > #define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data > > +#undef TARGET_LOOP_UNROLL_ADJUST > +#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust > + > #undef TARGET_INIT_BUILTINS > #define TARGET_INIT_BUILTINS rs6000_init_builtins > #undef TARGET_BUILTIN_DECL > @@ -4540,24 +4543,21 @@ rs6000_option_override_internal (bool global_init_ > global_options.x_param_values, > global_options_set.x_param_values); > > - /* unroll very small loops 2 time if no -funroll-loops. */ > - if (!global_options_set.x_flag_unroll_loops > - && !global_options_set.x_flag_unroll_all_loops) > + /* Explicit funroll-loops turns -munroll-small-loops off. > + Implicit funroll-loops does not turn fweb or frename-registers on. */ > + if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops) > + || (global_options_set.x_flag_unroll_all_loops > + && flag_unroll_all_loops)) > { > - maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2, > - global_options.x_param_values, > - global_options_set.x_param_values); > - > - maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20, > - global_options.x_param_values, > - global_options_set.x_param_values); > - > - /* If fweb or frename-registers are not specificed in command-line, > - do not turn them on implicitly. */ > + if (!global_options_set.x_unroll_small_loops) > + unroll_small_loops = 0; > + } > + else > + { > if (!global_options_set.x_flag_web) > - global_options.x_flag_web = 0; > + flag_web = 0; > if (!global_options_set.x_flag_rename_registers) > - global_options.x_flag_rename_registers = 0; > + flag_rename_registers = 0; > } > > /* If using typedef char *va_list, signal that > @@ -5101,6 +5101,25 @@ rs6000_destroy_cost_data (void *data) > free (data); > } > > +/* Implement targetm.loop_unroll_adjust. */ > + > +static unsigned > +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop) > +{ > + if (unroll_small_loops) > + { > + /* TODO: This is hardcoded to 10 right now. It can be refined, for > + example we may want to unroll very small loops more times (4 perhaps). > + We also should use a PARAM for this. */ > + if (loop->ninsns <= 10) > + return MIN (2, nunroll); > + else > + return 0; > + } > + > + return nunroll; > +} > + > /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a > library with vectorized intrinsics. */ > > Index: gcc/config/rs6000/rs6000.opt > =================================================================== > --- gcc/config/rs6000/rs6000.opt (revision 277765) > +++ gcc/config/rs6000/rs6000.opt (working copy) > @@ -501,6 +501,10 @@ moptimize-swaps > Target Undocumented Var(rs6000_optimize_swaps) Init(1) Save > Analyze and remove doubleword swaps from VSX computations. > > +munroll-small-loops > +Target Undocumented Var(unroll_small_loops) Init(0) Save > +Use conservative small loop unrolling. > + > mpower9-misc > Target Undocumented Report Mask(P9_MISC) Var(rs6000_isa_flags) > Use certain scalar instructions added in ISA 3.0. > Index: gcc/testsuite/gcc.dg/pr59643.c > =================================================================== > --- gcc/testsuite/gcc.dg/pr59643.c (revision 277765) > +++ gcc/testsuite/gcc.dg/pr59643.c (working copy) > @@ -1,9 +1,6 @@ > /* PR tree-optimization/59643 */ > /* { dg-do compile } */ > /* { dg-options "-O3 -fdump-tree-pcom-details" } */ > -/* { dg-additional-options "--param max-unrolled-insns=400" { target { powerpc*-*-* } } } */ > -/* Implicit threashold of max-unrolled-insn on ppc at O3 is too small for the > - loop of this case. */ > > void > foo (double *a, double *b, double *c, double d, double e, int n)
Jiufu Guo <guojiufu@linux.ibm.com> writes: Got message about fail to send to gcc-patches. Send again. > Segher Boessenkool <segher@kernel.crashing.org> writes: > >> Hi! >> >> On Mon, Nov 04, 2019 at 02:31:43PM +0800, Jiufu Guo wrote: >>> In this patch, loop unroll adjust hook is introduced for powerpc. We can do >>> target related hueristic adjustment in this hook. In this patch, small loops >>> is unrolled 2 times for O2 and O3 by default. With this patch, we can see >>> some improvement for spec2017. This patch enhanced a little for [Patch V2] to >>> enable small loops unroll for O3 by default like O2. >> [...] > > Updated patch is at the end of this mail. Thanks for review. > > Jiufu >> >> >> Segher Thanks again for review. Add ChangeLog: gcc/ 2019-11-04 Jiufu Guo <guojiufu@linux.ibm.com> PR tree-optimization/88760 * gcc/common/config/rs6000/rs6000-common.c (rs6000_option_optimization_table): Enable OPT_funroll_loops and OPT_munroll_small_loops for OPT_LEVELS_2_PLUS_SPEED_ONLY. * config/rs6000/rs6000.c (rs6000_option_override_internal): Remove code which changes PARAM_MAX_UNROLL_TIMES and PARAM_MAX_UNROLLED_INSNS. Handle option overrides for explicit and implicit funroll-loops. (TARGET_LOOP_UNROLL_ADJUST): Add loop unroll adjust hook. (rs6000_loop_unroll_adjust): New hook for loop unroll adjust. Unrolling 2 times for small loops. * gcc/config/rs6000/rs6000.opt (munroll-small-loops): Add new flag. gcc.testsuite/ 2019-11-04 Jiufu Guo <guojiufu@linux.ibm.com> PR tree-optimization/88760 * gcc.dg/pr59643.c: Update back to r277550. > > Updated patch: > > Index: gcc/common/config/rs6000/rs6000-common.c > =================================================================== > --- gcc/common/config/rs6000/rs6000-common.c (revision 277765) > +++ gcc/common/config/rs6000/rs6000-common.c (working copy) > @@ -35,7 +35,9 @@ static const struct default_options rs6000_option_ > { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 }, > /* Enable -fsched-pressure for first pass instruction scheduling. */ > { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 }, > - { OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 }, > + /* Enable -funroll-loops with -munroll-small-loops. */ > + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 }, > + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 }, > { OPT_LEVELS_NONE, 0, NULL, 0 } > }; > > Index: gcc/config/rs6000/rs6000.c > =================================================================== > --- gcc/config/rs6000/rs6000.c (revision 277765) > +++ gcc/config/rs6000/rs6000.c (working copy) > @@ -1428,6 +1428,9 @@ static const struct attribute_spec rs6000_attribut > #undef TARGET_VECTORIZE_DESTROY_COST_DATA > #define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data > > +#undef TARGET_LOOP_UNROLL_ADJUST > +#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust > + > #undef TARGET_INIT_BUILTINS > #define TARGET_INIT_BUILTINS rs6000_init_builtins > #undef TARGET_BUILTIN_DECL > @@ -4540,24 +4543,21 @@ rs6000_option_override_internal (bool global_init_ > global_options.x_param_values, > global_options_set.x_param_values); > > - /* unroll very small loops 2 time if no -funroll-loops. */ > - if (!global_options_set.x_flag_unroll_loops > - && !global_options_set.x_flag_unroll_all_loops) > + /* Explicit funroll-loops turns -munroll-small-loops off. > + Implicit funroll-loops does not turn fweb or frename-registers on. */ > + if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops) > + || (global_options_set.x_flag_unroll_all_loops > + && flag_unroll_all_loops)) > { > - maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2, > - global_options.x_param_values, > - global_options_set.x_param_values); > - > - maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20, > - global_options.x_param_values, > - global_options_set.x_param_values); > - > - /* If fweb or frename-registers are not specificed in command-line, > - do not turn them on implicitly. */ > + if (!global_options_set.x_unroll_small_loops) > + unroll_small_loops = 0; > + } > + else > + { > if (!global_options_set.x_flag_web) > - global_options.x_flag_web = 0; > + flag_web = 0; > if (!global_options_set.x_flag_rename_registers) > - global_options.x_flag_rename_registers = 0; > + flag_rename_registers = 0; > } > > /* If using typedef char *va_list, signal that > @@ -5101,6 +5101,25 @@ rs6000_destroy_cost_data (void *data) > free (data); > } > > +/* Implement targetm.loop_unroll_adjust. */ > + > +static unsigned > +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop) > +{ > + if (unroll_small_loops) > + { > + /* TODO: This is hardcoded to 10 right now. It can be refined, for > + example we may want to unroll very small loops more times (4 perhaps). > + We also should use a PARAM for this. */ > + if (loop->ninsns <= 10) > + return MIN (2, nunroll); > + else > + return 0; > + } > + > + return nunroll; > +} > + > /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a > library with vectorized intrinsics. */ > > Index: gcc/config/rs6000/rs6000.opt > =================================================================== > --- gcc/config/rs6000/rs6000.opt (revision 277765) > +++ gcc/config/rs6000/rs6000.opt (working copy) > @@ -501,6 +501,10 @@ moptimize-swaps > Target Undocumented Var(rs6000_optimize_swaps) Init(1) Save > Analyze and remove doubleword swaps from VSX computations. > > +munroll-small-loops > +Target Undocumented Var(unroll_small_loops) Init(0) Save > +Use conservative small loop unrolling. > + > mpower9-misc > Target Undocumented Report Mask(P9_MISC) Var(rs6000_isa_flags) > Use certain scalar instructions added in ISA 3.0. > Index: gcc/testsuite/gcc.dg/pr59643.c > =================================================================== > --- gcc/testsuite/gcc.dg/pr59643.c (revision 277765) > +++ gcc/testsuite/gcc.dg/pr59643.c (working copy) > @@ -1,9 +1,6 @@ > /* PR tree-optimization/59643 */ > /* { dg-do compile } */ > /* { dg-options "-O3 -fdump-tree-pcom-details" } */ > -/* { dg-additional-options "--param max-unrolled-insns=400" { target { powerpc*-*-* } } } */ > -/* Implicit threashold of max-unrolled-insn on ppc at O3 is too small for the > - loop of this case. */ > > void > foo (double *a, double *b, double *c, double d, double e, int n)
Hi! On Tue, Nov 05, 2019 at 04:33:23PM +0800, Jiufu Guo wrote: > --- gcc/common/config/rs6000/rs6000-common.c (revision 277765) > +++ gcc/common/config/rs6000/rs6000-common.c (working copy) > @@ -35,7 +35,9 @@ static const struct default_options rs6000_option_ > { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 }, > /* Enable -fsched-pressure for first pass instruction scheduling. */ > { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 }, > - { OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 }, > + /* Enable -funroll-loops with -munroll-small-loops. */ > + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 }, > + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 }, I guess the comment should say what we enable here more than the generic code does. Something like /* Enable -funroll-loops at -O2 already. Also enable -munroll-small-loops. */ > + /* Explicit funroll-loops turns -munroll-small-loops off. > + Implicit funroll-loops does not turn fweb or frename-registers on. */ > + if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops) > + || (global_options_set.x_flag_unroll_all_loops > + && flag_unroll_all_loops)) > { > + if (!global_options_set.x_unroll_small_loops) > + unroll_small_loops = 0; > + } > + else > + { > if (!global_options_set.x_flag_web) > + flag_web = 0; > if (!global_options_set.x_flag_rename_registers) > + flag_rename_registers = 0; > } So unroll-small-loops should better be called unroll-only-small-loops? Why does explicit unroll-loops turns on web and rnreg? Why only explicit? Isn't it good and/or bad in all the same cases, implicit and explicit? > +munroll-small-loops > +Target Undocumented Var(unroll_small_loops) Init(0) Save > +Use conservative small loop unrolling. Undocumented means undocumented, so you don't have a comment string in here. But you can comment it: ; Use conservative small loop unrolling. Segher
Segher Boessenkool <segher@kernel.crashing.org> writes: > Hi! > > On Tue, Nov 05, 2019 at 04:33:23PM +0800, Jiufu Guo wrote: >> --- gcc/common/config/rs6000/rs6000-common.c (revision 277765) >> +++ gcc/common/config/rs6000/rs6000-common.c (working copy) >> @@ -35,7 +35,9 @@ static const struct default_options rs6000_option_ >> { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 }, >> /* Enable -fsched-pressure for first pass instruction scheduling. */ >> { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 }, >> - { OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 }, >> + /* Enable -funroll-loops with -munroll-small-loops. */ >> + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 }, >> + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 }, > > I guess the comment should say what we enable here more than the generic > code does. Something like > > /* Enable -funroll-loops at -O2 already. Also enable > -munroll-small-loops. */ updated to: /* Enable -munroll-only-small-loops with -funroll-loops to unroll small loops at -O2 and above by default. */ { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 }, { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 }, /* Disable -fweb and -frename-registers to avoid bad impacts. */ { OPT_LEVELS_ALL, OPT_fweb, NULL, 0 }, { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 }, Thanks for more comments to make it better! > >> + /* Explicit funroll-loops turns -munroll-small-loops off. >> + Implicit funroll-loops does not turn fweb or frename-registers on. */ >> + if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops) >> + || (global_options_set.x_flag_unroll_all_loops >> + && flag_unroll_all_loops)) >> { >> + if (!global_options_set.x_unroll_small_loops) >> + unroll_small_loops = 0; >> + } >> + else >> + { >> if (!global_options_set.x_flag_web) >> + flag_web = 0; >> if (!global_options_set.x_flag_rename_registers) >> + flag_rename_registers = 0; >> } > > So unroll-small-loops should better be called unroll-only-small-loops? Thanks again. Right, unroll-only-small-loops is better. > > Why does explicit unroll-loops turns on web and rnreg? Why only explicit? > Isn't it good and/or bad in all the same cases, implicit and explicit? Good question! Turn off them by default, because they do not help too much for generic cases, and did not see performance gain on SPEC2017. And turning them off will help to consistent with previous -O2/-O3 which does not turn them on. This could avoid regressions in test cases. If do not turn them on with -funroll-loops, user may see performance difference on some cases. For example, in SPEC peak which option contains -funroll-loops, it may need to add -frename-registers manually for some benchmarks. Any sugguestions? Do you think it is a good idea to disable them by default, and let user to add them when they are helpful? e.g. add them for some benchmarks at `peak`. > >> +munroll-small-loops >> +Target Undocumented Var(unroll_small_loops) Init(0) Save >> +Use conservative small loop unrolling. > > Undocumented means undocumented, so you don't have a comment string in > here. But you can comment it: > > ; Use conservative small loop unrolling. Thanks again for you kindly review! Jiufu, BR. > > > Segher
Jiufu Guo <guojiufu@linux.ibm.com> writes: > Segher Boessenkool <segher@kernel.crashing.org> writes: > >> Hi! >> >> On Tue, Nov 05, 2019 at 04:33:23PM +0800, Jiufu Guo wrote: >>> --- gcc/common/config/rs6000/rs6000-common.c (revision 277765) >>> +++ gcc/common/config/rs6000/rs6000-common.c (working copy) >>> @@ -35,7 +35,9 @@ static const struct default_options rs6000_option_ >>> { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 }, >>> /* Enable -fsched-pressure for first pass instruction scheduling. */ >>> { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 }, >>> - { OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 }, >>> + /* Enable -funroll-loops with -munroll-small-loops. */ >>> + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 }, >>> + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 }, >> >> I guess the comment should say what we enable here more than the generic >> code does. Something like >> >> /* Enable -funroll-loops at -O2 already. Also enable >> -munroll-small-loops. */ > > updated to: > /* Enable -munroll-only-small-loops with -funroll-loops to unroll small > loops at -O2 and above by default. */ > { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 }, > { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 }, > /* Disable -fweb and -frename-registers to avoid bad impacts. */ > { OPT_LEVELS_ALL, OPT_fweb, NULL, 0 }, > { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 }, /* Enable -munroll-only-small-loops with -funroll-loops to unroll small loops at -O2 and above by default. */ { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 }, { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 }, /* -fweb and -frename-registers are useless in general, turn them off. */ { OPT_LEVELS_ALL, OPT_fweb, NULL, 0 }, { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 }, A little better? Updated patch is attached at the end of this mail, maybe it is easy for review. :) Jiufu, BR. > > Thanks for more comments to make it better! > >> >>> + /* Explicit funroll-loops turns -munroll-small-loops off. >>> + Implicit funroll-loops does not turn fweb or frename-registers on. */ >>> + if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops) >>> + || (global_options_set.x_flag_unroll_all_loops >>> + && flag_unroll_all_loops)) >>> { >>> + if (!global_options_set.x_unroll_small_loops) >>> + unroll_small_loops = 0; >>> + } >>> + else >>> + { >>> if (!global_options_set.x_flag_web) >>> + flag_web = 0; >>> if (!global_options_set.x_flag_rename_registers) >>> + flag_rename_registers = 0; >>> } >> >> So unroll-small-loops should better be called unroll-only-small-loops? > Thanks again. Right, unroll-only-small-loops is better. >> >> Why does explicit unroll-loops turns on web and rnreg? Why only explicit? >> Isn't it good and/or bad in all the same cases, implicit and explicit? > Good question! > > Turn off them by default, because they do not help too much for generic > cases, and did not see performance gain on SPEC2017. And turning them > off will help to consistent with previous -O2/-O3 which does not turn > them on. This could avoid regressions in test cases. > If do not turn them on with -funroll-loops, user may see performance > difference on some cases. For example, in SPEC peak which option > contains -funroll-loops, it may need to add -frename-registers manually > for some benchmarks. > > Any sugguestions? Do you think it is a good idea to disable them by > default, and let user to add them when they are helpful? e.g. add them > for some benchmarks at `peak`. > >> >>> +munroll-small-loops >>> +Target Undocumented Var(unroll_small_loops) Init(0) Save >>> +Use conservative small loop unrolling. >> >> Undocumented means undocumented, so you don't have a comment string in >> here. But you can comment it: >> >> ; Use conservative small loop unrolling. > Thanks again for you kindly review! > > Jiufu, > > BR. >> >> >> Segher gcc/ 2019-11-06 Jiufu Guo <guojiufu@linux.ibm.com> PR tree-optimization/88760 * gcc/config/rs6000/rs6000.opt (-munroll-small-loops): New option. * gcc/common/config/rs6000/rs6000-common.c (rs6000_option_optimization_table) [OPT_LEVELS_2_PLUS_SPEED_ONLY]: Turn on -funroll-loops and -munroll-small-loops. [OPT_LEVELS_ALL]: Turn off -fweb and -frename-registers. * config/rs6000/rs6000.c (rs6000_option_override_internal): Remove set of PARAM_MAX_UNROLL_TIMES and PARAM_MAX_UNROLLED_INSNS. Turn off -munroll-small-loops, turn on -fweb and -frename-registers for explicit funroll-loops. (TARGET_LOOP_UNROLL_ADJUST): Add loop unroll adjust hook. (rs6000_loop_unroll_adjust): Define it. Use -munroll-small-loops. gcc.testsuite/ 2019-11-06 Jiufu Guo <guojiufu@linux.ibm.com> PR tree-optimization/88760 * gcc.dg/pr59643.c: Update back to r277550. Index: gcc/common/config/rs6000/rs6000-common.c =================================================================== --- gcc/common/config/rs6000/rs6000-common.c (revision 277871) +++ gcc/common/config/rs6000/rs6000-common.c (working copy) @@ -35,7 +35,13 @@ static const struct default_options rs6000_option_ { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 }, /* Enable -fsched-pressure for first pass instruction scheduling. */ { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 }, - { OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 }, + /* Enable -munroll-only-small-loops with -funroll-loops to unroll small + loops at -O2 and above by default. */ + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 }, + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 }, + /* -fweb and -frename-registers are useless in general, turn them off. */ + { OPT_LEVELS_ALL, OPT_fweb, NULL, 0 }, + { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 }, { OPT_LEVELS_NONE, 0, NULL, 0 } }; Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (revision 277871) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -1428,6 +1428,9 @@ static const struct attribute_spec rs6000_attribut #undef TARGET_VECTORIZE_DESTROY_COST_DATA #define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data +#undef TARGET_LOOP_UNROLL_ADJUST +#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust + #undef TARGET_INIT_BUILTINS #define TARGET_INIT_BUILTINS rs6000_init_builtins #undef TARGET_BUILTIN_DECL @@ -4540,24 +4543,18 @@ rs6000_option_override_internal (bool global_init_ global_options.x_param_values, global_options_set.x_param_values); - /* unroll very small loops 2 time if no -funroll-loops. */ - if (!global_options_set.x_flag_unroll_loops - && !global_options_set.x_flag_unroll_all_loops) + /* Explicit -funroll-loops turns -munroll-small-loops off, and turns + fweb or frename-registers on. */ + if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops) + || (global_options_set.x_flag_unroll_all_loops + && flag_unroll_all_loops)) { - maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2, - global_options.x_param_values, - global_options_set.x_param_values); - - maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20, - global_options.x_param_values, - global_options_set.x_param_values); - - /* If fweb or frename-registers are not specificed in command-line, - do not turn them on implicitly. */ + if (!global_options_set.x_unroll_only_small_loops) + unroll_only_small_loops = 0; if (!global_options_set.x_flag_web) - global_options.x_flag_web = 0; + flag_web = 1; if (!global_options_set.x_flag_rename_registers) - global_options.x_flag_rename_registers = 0; + flag_rename_registers = 1; } /* If using typedef char *va_list, signal that @@ -5101,6 +5098,25 @@ rs6000_destroy_cost_data (void *data) free (data); } +/* Implement targetm.loop_unroll_adjust. */ + +static unsigned +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop) +{ + if (unroll_only_small_loops) + { + /* TODO: This is hardcoded to 10 right now. It can be refined, for + example we may want to unroll very small loops more times (4 perhaps). + We also should use a PARAM for this. */ + if (loop->ninsns <= 10) + return MIN (2, nunroll); + else + return 0; + } + + return nunroll; +} + /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a library with vectorized intrinsics. */ Index: gcc/config/rs6000/rs6000.opt =================================================================== --- gcc/config/rs6000/rs6000.opt (revision 277871) +++ gcc/config/rs6000/rs6000.opt (working copy) @@ -501,6 +501,10 @@ moptimize-swaps Target Undocumented Var(rs6000_optimize_swaps) Init(1) Save Analyze and remove doubleword swaps from VSX computations. +munroll-only-small-loops +Target Undocumented Var(unroll_only_small_loops) Init(0) Save +; Use conservative small loop unrolling. + mpower9-misc Target Undocumented Report Mask(P9_MISC) Var(rs6000_isa_flags) Use certain scalar instructions added in ISA 3.0. Index: gcc/testsuite/gcc.dg/pr59643.c =================================================================== --- gcc/testsuite/gcc.dg/pr59643.c (revision 277871) +++ gcc/testsuite/gcc.dg/pr59643.c (working copy) @@ -1,9 +1,6 @@ /* PR tree-optimization/59643 */ /* { dg-do compile } */ /* { dg-options "-O3 -fdump-tree-pcom-details" } */ -/* { dg-additional-options "--param max-unrolled-insns=400" { target { powerpc*-*-* } } } */ -/* Implicit threashold of max-unrolled-insn on ppc at O3 is too small for the - loop of this case. */ void foo (double *a, double *b, double *c, double d, double e, int n)
Jiufu Guo <guojiufu@linux.ibm.com> writes: Hi Segher, I updated the patch for option name at the end of this mail. Thanks for review in advance. > Jiufu Guo <guojiufu@linux.ibm.com> writes: > >> Segher Boessenkool <segher@kernel.crashing.org> writes: >> >>> Hi! >>> >>> On Tue, Nov 05, 2019 at 04:33:23PM +0800, Jiufu Guo wrote: >>>> --- gcc/common/config/rs6000/rs6000-common.c (revision 277765) >>>> +++ gcc/common/config/rs6000/rs6000-common.c (working copy) >>>> @@ -35,7 +35,9 @@ static const struct default_options rs6000_option_ >>>> { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 }, >>>> /* Enable -fsched-pressure for first pass instruction scheduling. */ >>>> { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 }, >>>> - { OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 }, >>>> + /* Enable -funroll-loops with -munroll-small-loops. */ >>>> + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 }, >>>> + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 }, >>> >>> I guess the comment should say what we enable here more than the generic >>> code does. Something like >>> >>> /* Enable -funroll-loops at -O2 already. Also enable >>> -munroll-small-loops. */ >> >> updated to: >> /* Enable -munroll-only-small-loops with -funroll-loops to unroll small >> loops at -O2 and above by default. */ >> { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 }, >> { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 }, >> /* Disable -fweb and -frename-registers to avoid bad impacts. */ >> { OPT_LEVELS_ALL, OPT_fweb, NULL, 0 }, >> { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 }, > /* Enable -munroll-only-small-loops with -funroll-loops to unroll small > loops at -O2 and above by default. */ > { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 }, > { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 }, > /* -fweb and -frename-registers are useless in general, turn them off. */ > { OPT_LEVELS_ALL, OPT_fweb, NULL, 0 }, > { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 }, > > A little better? > Updated patch is attached at the end of this mail, maybe it is easy for > review. :) > > Jiufu, > BR. > >> >> Thanks for more comments to make it better! >> >>> >>>> + /* Explicit funroll-loops turns -munroll-small-loops off. >>>> + Implicit funroll-loops does not turn fweb or frename-registers on. */ >>>> + if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops) >>>> + || (global_options_set.x_flag_unroll_all_loops >>>> + && flag_unroll_all_loops)) >>>> { >>>> + if (!global_options_set.x_unroll_small_loops) >>>> + unroll_small_loops = 0; >>>> + } >>>> + else >>>> + { >>>> if (!global_options_set.x_flag_web) >>>> + flag_web = 0; >>>> if (!global_options_set.x_flag_rename_registers) >>>> + flag_rename_registers = 0; >>>> } >>> >>> So unroll-small-loops should better be called unroll-only-small-loops? >> Thanks again. Right, unroll-only-small-loops is better. >>> >>> Why does explicit unroll-loops turns on web and rnreg? Why only explicit? >>> Isn't it good and/or bad in all the same cases, implicit and explicit? >> Good question! >> >> Turn off them by default, because they do not help too much for generic >> cases, and did not see performance gain on SPEC2017. And turning them >> off will help to consistent with previous -O2/-O3 which does not turn >> them on. This could avoid regressions in test cases. >> If do not turn them on with -funroll-loops, user may see performance >> difference on some cases. For example, in SPEC peak which option >> contains -funroll-loops, it may need to add -frename-registers manually >> for some benchmarks. >> >> Any sugguestions? Do you think it is a good idea to disable them by >> default, and let user to add them when they are helpful? e.g. add them >> for some benchmarks at `peak`. >> >>> >>>> +munroll-small-loops >>>> +Target Undocumented Var(unroll_small_loops) Init(0) Save >>>> +Use conservative small loop unrolling. >>> >>> Undocumented means undocumented, so you don't have a comment string in >>> here. But you can comment it: >>> >>> ; Use conservative small loop unrolling. >> Thanks again for you kindly review! >> >> Jiufu, >> >> BR. >>> >>> >>> Segher gcc/ 2019-11-07 Jiufu Guo <guojiufu@linux.ibm.com> PR tree-optimization/88760 * gcc/config/rs6000/rs6000.opt (-munroll-only-small-loops): New option. * gcc/common/config/rs6000/rs6000-common.c (rs6000_option_optimization_table) [OPT_LEVELS_2_PLUS_SPEED_ONLY]: Turn on -funroll-loops and -munroll-only-small-loops. [OPT_LEVELS_ALL]: Turn off -fweb and -frename-registers. * config/rs6000/rs6000.c (rs6000_option_override_internal): Remove set of PARAM_MAX_UNROLL_TIMES and PARAM_MAX_UNROLLED_INSNS. Turn off -munroll-only-small-loops, turn on -fweb and -frename-registers for explicit -funroll-loops. (TARGET_LOOP_UNROLL_ADJUST): Add loop unroll adjust hook. (rs6000_loop_unroll_adjust): Define it. Use -munroll-only-small-loops. gcc.testsuite/ 2019-11-07 Jiufu Guo <guojiufu@linux.ibm.com> PR tree-optimization/88760 * gcc.dg/pr59643.c: Update back to r277550. Index: gcc/common/config/rs6000/rs6000-common.c =================================================================== --- gcc/common/config/rs6000/rs6000-common.c (revision 277871) +++ gcc/common/config/rs6000/rs6000-common.c (working copy) @@ -35,7 +35,13 @@ static const struct default_options rs6000_option_ { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 }, /* Enable -fsched-pressure for first pass instruction scheduling. */ { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 }, - { OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 }, + /* Enable -munroll-only-small-loops with -funroll-loops to unroll small + loops at -O2 and above by default. */ + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 }, + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 }, + /* -fweb and -frename-registers are useless in general, turn them off. */ + { OPT_LEVELS_ALL, OPT_fweb, NULL, 0 }, + { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 }, { OPT_LEVELS_NONE, 0, NULL, 0 } }; Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (revision 277871) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -1428,6 +1428,9 @@ static const struct attribute_spec rs6000_attribut #undef TARGET_VECTORIZE_DESTROY_COST_DATA #define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data +#undef TARGET_LOOP_UNROLL_ADJUST +#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust + #undef TARGET_INIT_BUILTINS #define TARGET_INIT_BUILTINS rs6000_init_builtins #undef TARGET_BUILTIN_DECL @@ -4540,24 +4543,18 @@ rs6000_option_override_internal (bool global_init_ global_options.x_param_values, global_options_set.x_param_values); - /* unroll very small loops 2 time if no -funroll-loops. */ - if (!global_options_set.x_flag_unroll_loops - && !global_options_set.x_flag_unroll_all_loops) + /* Explicit -funroll-loops turns -munroll-only-small-loops off, and + turns fweb or frename-registers on. */ + if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops) + || (global_options_set.x_flag_unroll_all_loops + && flag_unroll_all_loops)) { - maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2, - global_options.x_param_values, - global_options_set.x_param_values); - - maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20, - global_options.x_param_values, - global_options_set.x_param_values); - - /* If fweb or frename-registers are not specificed in command-line, - do not turn them on implicitly. */ + if (!global_options_set.x_unroll_only_small_loops) + unroll_only_small_loops = 0; if (!global_options_set.x_flag_web) - global_options.x_flag_web = 0; + flag_web = 1; if (!global_options_set.x_flag_rename_registers) - global_options.x_flag_rename_registers = 0; + flag_rename_registers = 1; } /* If using typedef char *va_list, signal that @@ -5101,6 +5098,25 @@ rs6000_destroy_cost_data (void *data) free (data); } +/* Implement targetm.loop_unroll_adjust. */ + +static unsigned +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop) +{ + if (unroll_only_small_loops) + { + /* TODO: This is hardcoded to 10 right now. It can be refined, for + example we may want to unroll very small loops more times (4 perhaps). + We also should use a PARAM for this. */ + if (loop->ninsns <= 10) + return MIN (2, nunroll); + else + return 0; + } + + return nunroll; +} + /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a library with vectorized intrinsics. */ Index: gcc/config/rs6000/rs6000.opt =================================================================== --- gcc/config/rs6000/rs6000.opt (revision 277871) +++ gcc/config/rs6000/rs6000.opt (working copy) @@ -501,6 +501,10 @@ moptimize-swaps Target Undocumented Var(rs6000_optimize_swaps) Init(1) Save Analyze and remove doubleword swaps from VSX computations. +munroll-only-small-loops +Target Undocumented Var(unroll_only_small_loops) Init(0) Save +; Use conservative small loop unrolling. + mpower9-misc Target Undocumented Report Mask(P9_MISC) Var(rs6000_isa_flags) Use certain scalar instructions added in ISA 3.0. Index: gcc/testsuite/gcc.dg/pr59643.c =================================================================== --- gcc/testsuite/gcc.dg/pr59643.c (revision 277871) +++ gcc/testsuite/gcc.dg/pr59643.c (working copy) @@ -1,9 +1,6 @@ /* PR tree-optimization/59643 */ /* { dg-do compile } */ /* { dg-options "-O3 -fdump-tree-pcom-details" } */ -/* { dg-additional-options "--param max-unrolled-insns=400" { target { powerpc*-*-* } } } */ -/* Implicit threashold of max-unrolled-insn on ppc at O3 is too small for the - loop of this case. */ void foo (double *a, double *b, double *c, double d, double e, int n)
Jiufu Guo <guojiufu@linux.ibm.com> writes: Hi Sehger, I updated the patch as we discussed. This patch does not turn on -fweb or -frename-registers with -funroll-loops for powerpc. Thanks for review in advance. gcc/ 2019-11-07 Jiufu Guo <guojiufu@linux.ibm.com> PR tree-optimization/88760 * gcc/config/rs6000/rs6000.opt (-munroll-only-small-loops): New option. * gcc/common/config/rs6000/rs6000-common.c (rs6000_option_optimization_table) [OPT_LEVELS_2_PLUS_SPEED_ONLY]: Turn on -funroll-loops and -munroll-only-small-loops. [OPT_LEVELS_ALL]: Turn off -fweb and -frename-registers. * config/rs6000/rs6000.c (rs6000_option_override_internal): Remove set of PARAM_MAX_UNROLL_TIMES and PARAM_MAX_UNROLLED_INSNS. Turn off -munroll-only-small-loops for explicit -funroll-loops. (TARGET_LOOP_UNROLL_ADJUST): Add loop unroll adjust hook. (rs6000_loop_unroll_adjust): Define it. Use -munroll-only-small-loops. gcc.testsuite/ 2019-11-07 Jiufu Guo <guojiufu@linux.ibm.com> PR tree-optimization/88760 * gcc.dg/pr59643.c: Update back to r277550. Index: gcc/common/config/rs6000/rs6000-common.c =================================================================== --- gcc/common/config/rs6000/rs6000-common.c (revision 277871) +++ gcc/common/config/rs6000/rs6000-common.c (working copy) @@ -35,7 +35,14 @@ static const struct default_options rs6000_option_ { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 }, /* Enable -fsched-pressure for first pass instruction scheduling. */ { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 }, - { OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 }, + /* Enable -munroll-only-small-loops with -funroll-loops to unroll small + loops at -O2 and above by default. */ + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 }, + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 }, + /* -fweb and -frename-registers are useless in general for rs6000, + turn them off. */ + { OPT_LEVELS_ALL, OPT_fweb, NULL, 0 }, + { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 }, { OPT_LEVELS_NONE, 0, NULL, 0 } }; Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (revision 277871) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -1428,6 +1428,9 @@ static const struct attribute_spec rs6000_attribut #undef TARGET_VECTORIZE_DESTROY_COST_DATA #define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data +#undef TARGET_LOOP_UNROLL_ADJUST +#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust + #undef TARGET_INIT_BUILTINS #define TARGET_INIT_BUILTINS rs6000_init_builtins #undef TARGET_BUILTIN_DECL @@ -4540,26 +4543,13 @@ rs6000_option_override_internal (bool global_init_ global_options.x_param_values, global_options_set.x_param_values); - /* unroll very small loops 2 time if no -funroll-loops. */ - if (!global_options_set.x_flag_unroll_loops - && !global_options_set.x_flag_unroll_all_loops) - { - maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2, - global_options.x_param_values, - global_options_set.x_param_values); + /* Explicit -funroll-loops turns -munroll-only-small-loops off. */ + if (((global_options_set.x_flag_unroll_loops && flag_unroll_loops) + || (global_options_set.x_flag_unroll_all_loops + && flag_unroll_all_loops)) + && !global_options_set.x_unroll_only_small_loops) + unroll_only_small_loops = 0; - maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20, - global_options.x_param_values, - global_options_set.x_param_values); - - /* If fweb or frename-registers are not specificed in command-line, - do not turn them on implicitly. */ - if (!global_options_set.x_flag_web) - global_options.x_flag_web = 0; - if (!global_options_set.x_flag_rename_registers) - global_options.x_flag_rename_registers = 0; - } - /* If using typedef char *va_list, signal that __builtin_va_start (&ap, 0) can be optimized to ap = __builtin_next_arg (0). */ @@ -5101,6 +5091,25 @@ rs6000_destroy_cost_data (void *data) free (data); } +/* Implement targetm.loop_unroll_adjust. */ + +static unsigned +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop) +{ + if (unroll_only_small_loops) + { + /* TODO: This is hardcoded to 10 right now. It can be refined, for + example we may want to unroll very small loops more times (4 perhaps). + We also should use a PARAM for this. */ + if (loop->ninsns <= 10) + return MIN (2, nunroll); + else + return 0; + } + + return nunroll; +} + /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a library with vectorized intrinsics. */ Index: gcc/config/rs6000/rs6000.opt =================================================================== --- gcc/config/rs6000/rs6000.opt (revision 277871) +++ gcc/config/rs6000/rs6000.opt (working copy) @@ -501,6 +501,10 @@ moptimize-swaps Target Undocumented Var(rs6000_optimize_swaps) Init(1) Save Analyze and remove doubleword swaps from VSX computations. +munroll-only-small-loops +Target Undocumented Var(unroll_only_small_loops) Init(0) Save +; Use conservative small loop unrolling. + mpower9-misc Target Undocumented Report Mask(P9_MISC) Var(rs6000_isa_flags) Use certain scalar instructions added in ISA 3.0. Index: gcc/testsuite/gcc.dg/pr59643.c =================================================================== --- gcc/testsuite/gcc.dg/pr59643.c (revision 277871) +++ gcc/testsuite/gcc.dg/pr59643.c (working copy) @@ -1,9 +1,6 @@ /* PR tree-optimization/59643 */ /* { dg-do compile } */ /* { dg-options "-O3 -fdump-tree-pcom-details" } */ -/* { dg-additional-options "--param max-unrolled-insns=400" { target { powerpc*-*-* } } } */ -/* Implicit threashold of max-unrolled-insn on ppc at O3 is too small for the - loop of this case. */ void foo (double *a, double *b, double *c, double d, double e, int n)
Hi Jiu Fu, On Thu, Nov 07, 2019 at 10:40:41PM +0800, Jiufu Guo wrote: > gcc/ > 2019-11-07 Jiufu Guo <guojiufu@linux.ibm.com> > > PR tree-optimization/88760 > * gcc/config/rs6000/rs6000.opt (-munroll-only-small-loops): New option. > * gcc/common/config/rs6000/rs6000-common.c > (rs6000_option_optimization_table) [OPT_LEVELS_2_PLUS_SPEED_ONLY]: > Turn on -funroll-loops and -munroll-only-small-loops. > [OPT_LEVELS_ALL]: Turn off -fweb and -frename-registers. > * config/rs6000/rs6000.c (rs6000_option_override_internal): Remove > set of PARAM_MAX_UNROLL_TIMES and PARAM_MAX_UNROLLED_INSNS. > Turn off -munroll-only-small-loops for explicit -funroll-loops. > (TARGET_LOOP_UNROLL_ADJUST): Add loop unroll adjust hook. > (rs6000_loop_unroll_adjust): Define it. Use -munroll-only-small-loops. > > gcc.testsuite/ > 2019-11-07 Jiufu Guo <guojiufu@linux.ibm.com> > > PR tree-optimization/88760 > * gcc.dg/pr59643.c: Update back to r277550. Okay for trunk. Thanks! Just some formatting stuff: > + /* Enable -munroll-only-small-loops with -funroll-loops to unroll small > + loops at -O2 and above by default. */ The "l" of "loops" should align with the "E" of "Enable", and only two spaces after a dot: /* Enable -munroll-only-small-loops with -funroll-loops to unroll small loops at -O2 and above by default. */ > +/* Implement targetm.loop_unroll_adjust. */ Only one space at the start of the comment. > +static unsigned > +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop) struct loop *loop > + /* TODO: This is hardcoded to 10 right now. It can be refined, for > + example we may want to unroll very small loops more times (4 perhaps). > + We also should use a PARAM for this. */ There will be target-specific params soon, if I understood correctly :-) Cheers, Segher
Segher Boessenkool <segher@kernel.crashing.org> writes: > Hi Jiu Fu, > > On Thu, Nov 07, 2019 at 10:40:41PM +0800, Jiufu Guo wrote: >> gcc/ >> 2019-11-07 Jiufu Guo <guojiufu@linux.ibm.com> >> >> PR tree-optimization/88760 >> * gcc/config/rs6000/rs6000.opt (-munroll-only-small-loops): New option. >> * gcc/common/config/rs6000/rs6000-common.c >> (rs6000_option_optimization_table) [OPT_LEVELS_2_PLUS_SPEED_ONLY]: >> Turn on -funroll-loops and -munroll-only-small-loops. >> [OPT_LEVELS_ALL]: Turn off -fweb and -frename-registers. >> * config/rs6000/rs6000.c (rs6000_option_override_internal): Remove >> set of PARAM_MAX_UNROLL_TIMES and PARAM_MAX_UNROLLED_INSNS. >> Turn off -munroll-only-small-loops for explicit -funroll-loops. >> (TARGET_LOOP_UNROLL_ADJUST): Add loop unroll adjust hook. >> (rs6000_loop_unroll_adjust): Define it. Use -munroll-only-small-loops. >> >> gcc.testsuite/ >> 2019-11-07 Jiufu Guo <guojiufu@linux.ibm.com> >> >> PR tree-optimization/88760 >> * gcc.dg/pr59643.c: Update back to r277550. > > Okay for trunk. Thanks! Just some formatting stuff: Thanks Segher! I will update according for patch. > >> + /* Enable -munroll-only-small-loops with -funroll-loops to unroll small >> + loops at -O2 and above by default. */ > > The "l" of "loops" should align with the "E" of "Enable", and only two > spaces after a dot: > /* Enable -munroll-only-small-loops with -funroll-loops to unroll small > loops at -O2 and above by default. */ > >> +/* Implement targetm.loop_unroll_adjust. */ > > Only one space at the start of the comment. > >> +static unsigned >> +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop) > > struct loop *loop > >> + /* TODO: This is hardcoded to 10 right now. It can be refined, for >> + example we may want to unroll very small loops more times (4 perhaps). >> + We also should use a PARAM for this. */ > > There will be target-specific params soon, if I understood correctly > :-) Yes, It is what I want to do. Thanks again! Jiufu BR. > > Cheers, > > > Segher
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 9ed5151..5e1a75d 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -1428,6 +1428,9 @@ static const struct attribute_spec rs6000_attribute_table[] = #undef TARGET_VECTORIZE_DESTROY_COST_DATA #define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data +#undef TARGET_LOOP_UNROLL_ADJUST +#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust + #undef TARGET_INIT_BUILTINS #define TARGET_INIT_BUILTINS rs6000_init_builtins #undef TARGET_BUILTIN_DECL @@ -4540,25 +4543,20 @@ rs6000_option_override_internal (bool global_init_p) global_options.x_param_values, global_options_set.x_param_values); - /* unroll very small loops 2 time if no -funroll-loops. */ + /* If funroll-loops is not enabled explicitly, then enable small loops + unrolling for -O2, and do not turn fweb or frename-registers on. */ if (!global_options_set.x_flag_unroll_loops && !global_options_set.x_flag_unroll_all_loops) { - maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2, - global_options.x_param_values, - global_options_set.x_param_values); - - maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20, - global_options.x_param_values, - global_options_set.x_param_values); + unroll_small_loops = optimize >= 2 ? 1 : 0; - /* If fweb or frename-registers are not specificed in command-line, - do not turn them on implicitly. */ if (!global_options_set.x_flag_web) global_options.x_flag_web = 0; if (!global_options_set.x_flag_rename_registers) global_options.x_flag_rename_registers = 0; } + else + unroll_small_loops = 0; /* If using typedef char *va_list, signal that __builtin_va_start (&ap, 0) can be optimized to @@ -5101,6 +5099,24 @@ rs6000_destroy_cost_data (void *data) free (data); } +/* Implement targetm.loop_unroll_adjust. */ + +static unsigned +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop) +{ + if (unroll_small_loops) + { + /* TODO: This is hardcoded to 10 right now. It can be refined, for + example we may want to unroll very small loops more times (4 perhaps). + We also should use a PARAM for this. */ + if (loop->ninsns <= 10) + return MIN (2, nunroll); + else + return 0; + } + return nunroll; +} + /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a library with vectorized intrinsics. */ @@ -23472,6 +23488,7 @@ rs6000_function_specific_save (struct cl_target_option *ptr, { ptr->x_rs6000_isa_flags = opts->x_rs6000_isa_flags; ptr->x_rs6000_isa_flags_explicit = opts->x_rs6000_isa_flags_explicit; + ptr->x_unroll_small_loops = opts->x_unroll_small_loops; } /* Restore the current options */ @@ -23483,6 +23500,7 @@ rs6000_function_specific_restore (struct gcc_options *opts, { opts->x_rs6000_isa_flags = ptr->x_rs6000_isa_flags; opts->x_rs6000_isa_flags_explicit = ptr->x_rs6000_isa_flags_explicit; + opts->x_unroll_small_loops = ptr->x_unroll_small_loops; (void) rs6000_option_override_internal (false); } diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt index 1f37a92..9cd5b4e 100644 --- a/gcc/config/rs6000/rs6000.opt +++ b/gcc/config/rs6000/rs6000.opt @@ -96,6 +96,13 @@ enum rs6000_cmodel rs6000_current_cmodel = CMODEL_SMALL TargetVariable unsigned int rs6000_recip_control +;; Whether to unroll small loops only +Variable +unsigned char unroll_small_loops + +TargetSave +unsigned char x_unroll_small_loops + ;; Mask of what builtin functions are allowed TargetVariable HOST_WIDE_INT rs6000_builtin_mask diff --git a/gcc/testsuite/gcc.dg/pr59643.c b/gcc/testsuite/gcc.dg/pr59643.c index 4446f6e..de78d60 100644 --- a/gcc/testsuite/gcc.dg/pr59643.c +++ b/gcc/testsuite/gcc.dg/pr59643.c @@ -1,9 +1,6 @@ /* PR tree-optimization/59643 */ /* { dg-do compile } */ /* { dg-options "-O3 -fdump-tree-pcom-details" } */ -/* { dg-additional-options "--param max-unrolled-insns=400" { target { powerpc*-*-* } } } */ -/* Implicit threashold of max-unrolled-insn on ppc at O3 is too small for the - loop of this case. */ void foo (double *a, double *b, double *c, double d, double e, int n)