diff mbox series

[V3] rs6000: Refine small loop unroll in loop_unroll_adjust hook

Message ID 1572849103-60161-1-git-send-email-guojiufu@linux.ibm.com
State New
Headers show
Series [V3] rs6000: Refine small loop unroll in loop_unroll_adjust hook | expand

Commit Message

Jiufu Guo Nov. 4, 2019, 6:31 a.m. UTC
Hi,

In this patch, loop unroll adjust hook is introduced for powerpc.  We can do
target related hueristic adjustment in this hook. In this patch, small loops
is unrolled 2 times for O2 and O3 by default.  With this patch, we can see
some improvement for spec2017.  This patch enhanced a little for [Patch V2] to
enable small loops unroll for O3 by default like O2.

Bootstrapped and regtested on powerpc64le.  Is this ok for trunk?

Jiufu
BR.

gcc/
2019-11-04  Jiufu Guo  <guojiufu@linux.ibm.com>	    

	PR tree-optimization/88760
	* config/rs6000/rs6000.c (rs6000_option_override_internal): Remove
	code which changes PARAM_MAX_UNROLL_TIMES and PARAM_MAX_UNROLLED_INSNS.
	(TARGET_LOOP_UNROLL_ADJUST): Add loop unroll adjust hook.
	(rs6000_loop_unroll_adjust): New hook for loop unroll adjust.
	Unrolling small loop 2 times for -O2 and -O3.
	(rs6000_function_specific_save): Save unroll_small_loops flag.
	(rs6000_function_specific_restore): Restore unroll_small_loops flag.
	* gcc/config/rs6000/rs6000.opt (unroll_small_loops): New internal flag.

	
gcc.testsuite/
2019-11-04  Jiufu Guo  <guojiufu@linux.ibm.com>

	PR tree-optimization/88760
	* gcc.dg/pr59643.c: Update back to r277550.

---
 gcc/config/rs6000/rs6000.c     | 38 ++++++++++++++++++++++++++++----------
 gcc/config/rs6000/rs6000.opt   |  7 +++++++
 gcc/testsuite/gcc.dg/pr59643.c |  3 ---
 3 files changed, 35 insertions(+), 13 deletions(-)

Comments

Kewen.Lin Nov. 4, 2019, 9:07 a.m. UTC | #1
Hi Jeff,

Thanks for the patch, I learned a lot from it.  Some nits embedded.

on 2019/11/4 下午2:31, Jiufu Guo wrote:
> Hi,
> 
> In this patch, loop unroll adjust hook is introduced for powerpc.  We can do
> target related hueristic adjustment in this hook. In this patch, small loops
> is unrolled 2 times for O2 and O3 by default.  With this patch, we can see
> some improvement for spec2017.  This patch enhanced a little for [Patch V2] to
> enable small loops unroll for O3 by default like O2.
> 
> Bootstrapped and regtested on powerpc64le.  Is this ok for trunk?
> 
> Jiufu
> BR.
> 
> gcc/
> 2019-11-04  Jiufu Guo  <guojiufu@linux.ibm.com>	    
> 
> 	PR tree-optimization/88760
> 	* config/rs6000/rs6000.c (rs6000_option_override_internal): Remove
> 	code which changes PARAM_MAX_UNROLL_TIMES and PARAM_MAX_UNROLLED_INSNS.
> 	(TARGET_LOOP_UNROLL_ADJUST): Add loop unroll adjust hook.
> 	(rs6000_loop_unroll_adjust): New hook for loop unroll adjust.
> 	Unrolling small loop 2 times for -O2 and -O3.
> 	(rs6000_function_specific_save): Save unroll_small_loops flag.
> 	(rs6000_function_specific_restore): Restore unroll_small_loops flag.
> 	* gcc/config/rs6000/rs6000.opt (unroll_small_loops): New internal flag.
> 
> 	
> gcc.testsuite/
> 2019-11-04  Jiufu Guo  <guojiufu@linux.ibm.com>
> 
> 	PR tree-optimization/88760
> 	* gcc.dg/pr59643.c: Update back to r277550.
> 
> ---
>  gcc/config/rs6000/rs6000.c     | 38 ++++++++++++++++++++++++++++----------
>  gcc/config/rs6000/rs6000.opt   |  7 +++++++
>  gcc/testsuite/gcc.dg/pr59643.c |  3 ---
>  3 files changed, 35 insertions(+), 13 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 9ed5151..5e1a75d 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -1428,6 +1428,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
>  #undef TARGET_VECTORIZE_DESTROY_COST_DATA
>  #define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data
>  
> +#undef TARGET_LOOP_UNROLL_ADJUST
> +#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
> +
>  #undef TARGET_INIT_BUILTINS
>  #define TARGET_INIT_BUILTINS rs6000_init_builtins
>  #undef TARGET_BUILTIN_DECL
> @@ -4540,25 +4543,20 @@ rs6000_option_override_internal (bool global_init_p)
>  			     global_options.x_param_values,
>  			     global_options_set.x_param_values);
>  
> -      /* unroll very small loops 2 time if no -funroll-loops.  */
> +      /* If funroll-loops is not enabled explicitly, then enable small loops
> +	 unrolling for -O2, and do not turn fweb or frename-registers on.  */

"for -O2" -> "for -O2 and up"? since I noticed it checks "optimize >=2" later.

>        if (!global_options_set.x_flag_unroll_loops
>  	  && !global_options_set.x_flag_unroll_all_loops)
>  	{
> -	  maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2,
> -				 global_options.x_param_values,
> -				 global_options_set.x_param_values);
> -
> -	  maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20,
> -				 global_options.x_param_values,
> -				 global_options_set.x_param_values);
> +	  unroll_small_loops = optimize >= 2 ? 1 : 0;

Maybe simpler with "unroll_small_loops = flag_unroll_loops"? 

>  
> -	  /* If fweb or frename-registers are not specificed in command-line,
> -	     do not turn them on implicitly.  */
>  	  if (!global_options_set.x_flag_web)
>  	    global_options.x_flag_web = 0;
>  	  if (!global_options_set.x_flag_rename_registers)
>  	    global_options.x_flag_rename_registers = 0;
>  	}
> +      else
> +	unroll_small_loops = 0;

Could we initialize this in rs6000.opt as zero?


BR,
Kewen
Jiufu Guo Nov. 4, 2019, 9:53 a.m. UTC | #2
"Kewen.Lin" <linkw@linux.ibm.com> writes:

> Hi Jeff,
>
> Thanks for the patch, I learned a lot from it.  Some nits embedded.
Thanks for your comments!
>
> on 2019/11/4 下午2:31, Jiufu Guo wrote:
>> Hi,
>> 
>> In this patch, loop unroll adjust hook is introduced for powerpc.  We can do
>> target related hueristic adjustment in this hook. In this patch, small loops
>> is unrolled 2 times for O2 and O3 by default.  With this patch, we can see
>> some improvement for spec2017.  This patch enhanced a little for [Patch V2] to
>> enable small loops unroll for O3 by default like O2.
>> 
>> Bootstrapped and regtested on powerpc64le.  Is this ok for trunk?
>> 
>> Jiufu
>> BR.
>> 
>> gcc/
>> 2019-11-04  Jiufu Guo  <guojiufu@linux.ibm.com>	    
>> 
>> 	PR tree-optimization/88760
>> 	* config/rs6000/rs6000.c (rs6000_option_override_internal): Remove
>> 	code which changes PARAM_MAX_UNROLL_TIMES and PARAM_MAX_UNROLLED_INSNS.
>> 	(TARGET_LOOP_UNROLL_ADJUST): Add loop unroll adjust hook.
>> 	(rs6000_loop_unroll_adjust): New hook for loop unroll adjust.
>> 	Unrolling small loop 2 times for -O2 and -O3.
>> 	(rs6000_function_specific_save): Save unroll_small_loops flag.
>> 	(rs6000_function_specific_restore): Restore unroll_small_loops flag.
>> 	* gcc/config/rs6000/rs6000.opt (unroll_small_loops): New internal flag.
>> 
>> 	
>> gcc.testsuite/
>> 2019-11-04  Jiufu Guo  <guojiufu@linux.ibm.com>
>> 
>> 	PR tree-optimization/88760
>> 	* gcc.dg/pr59643.c: Update back to r277550.
>> 
>> ---
>>  gcc/config/rs6000/rs6000.c     | 38 ++++++++++++++++++++++++++++----------
>>  gcc/config/rs6000/rs6000.opt   |  7 +++++++
>>  gcc/testsuite/gcc.dg/pr59643.c |  3 ---
>>  3 files changed, 35 insertions(+), 13 deletions(-)
>> 
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index 9ed5151..5e1a75d 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -1428,6 +1428,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
>>  #undef TARGET_VECTORIZE_DESTROY_COST_DATA
>>  #define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data
>>  
>> +#undef TARGET_LOOP_UNROLL_ADJUST
>> +#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
>> +
>>  #undef TARGET_INIT_BUILTINS
>>  #define TARGET_INIT_BUILTINS rs6000_init_builtins
>>  #undef TARGET_BUILTIN_DECL
>> @@ -4540,25 +4543,20 @@ rs6000_option_override_internal (bool global_init_p)
>>  			     global_options.x_param_values,
>>  			     global_options_set.x_param_values);
>>  
>> -      /* unroll very small loops 2 time if no -funroll-loops.  */
>> +      /* If funroll-loops is not enabled explicitly, then enable small loops
>> +	 unrolling for -O2, and do not turn fweb or frename-registers on.  */
>
> "for -O2" -> "for -O2 and up"? since I noticed it checks "optimize
> >=2" later.
Thanks, yes, this would be more clear.
>
>>        if (!global_options_set.x_flag_unroll_loops
>>  	  && !global_options_set.x_flag_unroll_all_loops)
>>  	{
>> -	  maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2,
>> -				 global_options.x_param_values,
>> -				 global_options_set.x_param_values);
>> -
>> -	  maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20,
>> -				 global_options.x_param_values,
>> -				 global_options_set.x_param_values);
>> +	  unroll_small_loops = optimize >= 2 ? 1 : 0;
>
> Maybe simpler with "unroll_small_loops = flag_unroll_loops"?
Both would be ok, I think. ;)
>
>>  
>> -	  /* If fweb or frename-registers are not specificed in command-line,
>> -	     do not turn them on implicitly.  */
>>  	  if (!global_options_set.x_flag_web)
>>  	    global_options.x_flag_web = 0;
>>  	  if (!global_options_set.x_flag_rename_registers)
>>  	    global_options.x_flag_rename_registers = 0;
>>  	}
>> +      else
>> +	unroll_small_loops = 0;
>
> Could we initialize this in rs6000.opt as zero?
We could also set initial value as 0 in rs6000.opt.

BR
Jiufu.

>
>
> BR,
> Kewen
Segher Boessenkool Nov. 4, 2019, 8:55 p.m. UTC | #3
Hi!

On Mon, Nov 04, 2019 at 02:31:43PM +0800, Jiufu Guo wrote:
> In this patch, loop unroll adjust hook is introduced for powerpc.  We can do
> target related hueristic adjustment in this hook. In this patch, small loops
> is unrolled 2 times for O2 and O3 by default.  With this patch, we can see
> some improvement for spec2017.  This patch enhanced a little for [Patch V2] to
> enable small loops unroll for O3 by default like O2.

> 	* gcc/config/rs6000/rs6000.opt (unroll_small_loops): New internal flag.

That's the declaration of a variable.  A command line flag is something
like -munroll-small-loops.  Do we want a command line option like that?
It makes testing simpler.

> -      /* unroll very small loops 2 time if no -funroll-loops.  */
> +      /* If funroll-loops is not enabled explicitly, then enable small loops
> +	 unrolling for -O2, and do not turn fweb or frename-registers on.  */
>        if (!global_options_set.x_flag_unroll_loops
>  	  && !global_options_set.x_flag_unroll_all_loops)
>  	{
> -	  maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2,
> -				 global_options.x_param_values,
> -				 global_options_set.x_param_values);
> -
> -	  maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20,
> -				 global_options.x_param_values,
> -				 global_options_set.x_param_values);
> +	  unroll_small_loops = optimize >= 2 ? 1 : 0;

That includes -Os?

I think you shouldn't always set it to some value, only enable it where
you want to enable it.  If you make a command line option for it this is
especially simple (the table in common/config/rs6000/rs6000-common.c).

> +static unsigned
> +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop)
> +{
> +  if (unroll_small_loops)
> +    {
> +      /* TODO: This is hardcoded to 10 right now.  It can be refined, for
> +	 example we may want to unroll very small loops more times (4 perhaps).
> +	 We also should use a PARAM for this.  */
> +      if (loop->ninsns <= 10)
> +	return MIN (2, nunroll);
> +      else
> +	return 0;
> +    }

(Add an empty line here?)

> +  return nunroll;
> +}

Great :-)

> @@ -23472,6 +23488,7 @@ rs6000_function_specific_save (struct cl_target_option *ptr,
>  {
>    ptr->x_rs6000_isa_flags = opts->x_rs6000_isa_flags;
>    ptr->x_rs6000_isa_flags_explicit = opts->x_rs6000_isa_flags_explicit;
> +  ptr->x_unroll_small_loops = opts->x_unroll_small_loops;
>  }

Yeah we shouldn't need to add that, this should all be automatic.


Segher
Richard Biener Nov. 5, 2019, 7:09 a.m. UTC | #4
On Mon, 4 Nov 2019, Segher Boessenkool wrote:

> Hi!
> 
> On Mon, Nov 04, 2019 at 02:31:43PM +0800, Jiufu Guo wrote:
> > In this patch, loop unroll adjust hook is introduced for powerpc.  We can do
> > target related hueristic adjustment in this hook. In this patch, small loops
> > is unrolled 2 times for O2 and O3 by default.  With this patch, we can see
> > some improvement for spec2017.  This patch enhanced a little for [Patch V2] to
> > enable small loops unroll for O3 by default like O2.
> 
> > 	* gcc/config/rs6000/rs6000.opt (unroll_small_loops): New internal flag.
> 
> That's the declaration of a variable.  A command line flag is something
> like -munroll-small-loops.  Do we want a command line option like that?
> It makes testing simpler.
> 
> > -      /* unroll very small loops 2 time if no -funroll-loops.  */
> > +      /* If funroll-loops is not enabled explicitly, then enable small loops
> > +	 unrolling for -O2, and do not turn fweb or frename-registers on.  */
> >        if (!global_options_set.x_flag_unroll_loops
> >  	  && !global_options_set.x_flag_unroll_all_loops)
> >  	{
> > -	  maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2,
> > -				 global_options.x_param_values,
> > -				 global_options_set.x_param_values);
> > -
> > -	  maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20,
> > -				 global_options.x_param_values,
> > -				 global_options_set.x_param_values);
> > +	  unroll_small_loops = optimize >= 2 ? 1 : 0;
> 
> That includes -Os?

It also re-introduces the exactly same issue as the --param with LTO.

> I think you shouldn't always set it to some value, only enable it where
> you want to enable it.  If you make a command line option for it this is
> especially simple (the table in common/config/rs6000/rs6000-common.c).
> 
> > +static unsigned
> > +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop)
> > +{
> > +  if (unroll_small_loops)
> > +    {
> > +      /* TODO: This is hardcoded to 10 right now.  It can be refined, for
> > +	 example we may want to unroll very small loops more times (4 perhaps).
> > +	 We also should use a PARAM for this.  */
> > +      if (loop->ninsns <= 10)
> > +	return MIN (2, nunroll);
> > +      else
> > +	return 0;
> > +    }
> 
> (Add an empty line here?)
> 
> > +  return nunroll;
> > +}
> 
> Great :-)
> 
> > @@ -23472,6 +23488,7 @@ rs6000_function_specific_save (struct cl_target_option *ptr,
> >  {
> >    ptr->x_rs6000_isa_flags = opts->x_rs6000_isa_flags;
> >    ptr->x_rs6000_isa_flags_explicit = opts->x_rs6000_isa_flags_explicit;
> > +  ptr->x_unroll_small_loops = opts->x_unroll_small_loops;
> >  }
> 
> Yeah we shouldn't need to add that, this should all be automatic.
> 
> 
> Segher
>
Jiufu Guo Nov. 5, 2019, 8:33 a.m. UTC | #5
Segher Boessenkool <segher@kernel.crashing.org> writes:

> Hi!
>
> On Mon, Nov 04, 2019 at 02:31:43PM +0800, Jiufu Guo wrote:
>> In this patch, loop unroll adjust hook is introduced for powerpc.  We can do
>> target related hueristic adjustment in this hook. In this patch, small loops
>> is unrolled 2 times for O2 and O3 by default.  With this patch, we can see
>> some improvement for spec2017.  This patch enhanced a little for [Patch V2] to
>> enable small loops unroll for O3 by default like O2.
>
>> 	* gcc/config/rs6000/rs6000.opt (unroll_small_loops): New internal flag.
>
> That's the declaration of a variable.  A command line flag is something
> like -munroll-small-loops.  Do we want a command line option like that?
> It makes testing simpler.
Thanks for great sugguestion, will update patch to add a command line
option.

>
>> -      /* unroll very small loops 2 time if no -funroll-loops.  */
>> +      /* If funroll-loops is not enabled explicitly, then enable small loops
>> +	 unrolling for -O2, and do not turn fweb or frename-registers on.  */
>>        if (!global_options_set.x_flag_unroll_loops
>>  	  && !global_options_set.x_flag_unroll_all_loops)
>>  	{
>> -	  maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2,
>> -				 global_options.x_param_values,
>> -				 global_options_set.x_param_values);
>> -
>> -	  maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20,
>> -				 global_options.x_param_values,
>> -				 global_options_set.x_param_values);
>> +	  unroll_small_loops = optimize >= 2 ? 1 : 0;
>
> That includes -Os?
>
> I think you shouldn't always set it to some value, only enable it where
> you want to enable it.  If you make a command line option for it this is
> especially simple (the table in common/config/rs6000/rs6000-common.c).
Thanks again, update rs6000_option_optimization_table as :
    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fweb, NULL, 0 },
    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_frename_registers, NULL, 0 },
    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 },

While, I still keep the code to disable unroll_small_loops for explicit
-funroll-loops via checking global_options_set.x_flag_unroll_loops. 
>
>> +static unsigned
>> +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop)
>> +{
>> +  if (unroll_small_loops)
>> +    {
>> +      /* TODO: This is hardcoded to 10 right now.  It can be refined, for
>> +	 example we may want to unroll very small loops more times (4 perhaps).
>> +	 We also should use a PARAM for this.  */
>> +      if (loop->ninsns <= 10)
>> +	return MIN (2, nunroll);
>> +      else
>> +	return 0;
>> +    }
>
> (Add an empty line here?)
Thanks again, updated accordingly.
>
>> +  return nunroll;
>> +}
>
> Great :-)
>
>> @@ -23472,6 +23488,7 @@ rs6000_function_specific_save (struct cl_target_option *ptr,
>>  {
>>    ptr->x_rs6000_isa_flags = opts->x_rs6000_isa_flags;
>>    ptr->x_rs6000_isa_flags_explicit = opts->x_rs6000_isa_flags_explicit;
>> +  ptr->x_unroll_small_loops = opts->x_unroll_small_loops;
>>  }
>
> Yeah we shouldn't need to add that, this should all be automatic.
Yes, through adding new option in .opt, this is handled automaticly.

Updated patch is at the end of this mail. Thanks for review.

Jiufu
>
>
> Segher

Updated patch:

Index: gcc/common/config/rs6000/rs6000-common.c
===================================================================
--- gcc/common/config/rs6000/rs6000-common.c	(revision 277765)
+++ gcc/common/config/rs6000/rs6000-common.c	(working copy)
@@ -35,7 +35,9 @@ static const struct default_options rs6000_option_
     { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
     /* Enable -fsched-pressure for first pass instruction scheduling.  */
     { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
-    { OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 },
+    /* Enable  -funroll-loops with -munroll-small-loops.  */
+    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
+    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 },
     { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 277765)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -1428,6 +1428,9 @@ static const struct attribute_spec rs6000_attribut
 #undef TARGET_VECTORIZE_DESTROY_COST_DATA
 #define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data
 
+#undef TARGET_LOOP_UNROLL_ADJUST
+#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
+
 #undef TARGET_INIT_BUILTINS
 #define TARGET_INIT_BUILTINS rs6000_init_builtins
 #undef TARGET_BUILTIN_DECL
@@ -4540,24 +4543,21 @@ rs6000_option_override_internal (bool global_init_
 			     global_options.x_param_values,
 			     global_options_set.x_param_values);
 
-      /* unroll very small loops 2 time if no -funroll-loops.  */
-      if (!global_options_set.x_flag_unroll_loops
-	  && !global_options_set.x_flag_unroll_all_loops)
+      /* Explicit funroll-loops turns -munroll-small-loops off.
+	 Implicit funroll-loops does not turn fweb or frename-registers on.  */
+      if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops)
+	  || (global_options_set.x_flag_unroll_all_loops
+	      && flag_unroll_all_loops))
 	{
-	  maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2,
-				 global_options.x_param_values,
-				 global_options_set.x_param_values);
-
-	  maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20,
-				 global_options.x_param_values,
-				 global_options_set.x_param_values);
-
-	  /* If fweb or frename-registers are not specificed in command-line,
-	     do not turn them on implicitly.  */
+	  if (!global_options_set.x_unroll_small_loops)
+	    unroll_small_loops = 0;
+	}
+      else
+	{
 	  if (!global_options_set.x_flag_web)
-	    global_options.x_flag_web = 0;
+	    flag_web = 0;
 	  if (!global_options_set.x_flag_rename_registers)
-	    global_options.x_flag_rename_registers = 0;
+	    flag_rename_registers = 0;
 	}
 
       /* If using typedef char *va_list, signal that
@@ -5101,6 +5101,25 @@ rs6000_destroy_cost_data (void *data)
   free (data);
 }
 
+/*  Implement targetm.loop_unroll_adjust.  */
+
+static unsigned
+rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop)
+{
+   if (unroll_small_loops)
+    {
+      /* TODO: This is hardcoded to 10 right now.  It can be refined, for
+	 example we may want to unroll very small loops more times (4 perhaps).
+	 We also should use a PARAM for this.  */
+      if (loop->ninsns <= 10)
+	return MIN (2, nunroll);
+      else
+	return 0;
+    }
+  
+  return nunroll;
+}
+
 /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
    library with vectorized intrinsics.  */
 
Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(revision 277765)
+++ gcc/config/rs6000/rs6000.opt	(working copy)
@@ -501,6 +501,10 @@ moptimize-swaps
 Target Undocumented Var(rs6000_optimize_swaps) Init(1) Save
 Analyze and remove doubleword swaps from VSX computations.
 
+munroll-small-loops
+Target Undocumented Var(unroll_small_loops) Init(0) Save
+Use conservative small loop unrolling.
+
 mpower9-misc
 Target Undocumented Report Mask(P9_MISC) Var(rs6000_isa_flags)
 Use certain scalar instructions added in ISA 3.0.
Index: gcc/testsuite/gcc.dg/pr59643.c
===================================================================
--- gcc/testsuite/gcc.dg/pr59643.c	(revision 277765)
+++ gcc/testsuite/gcc.dg/pr59643.c	(working copy)
@@ -1,9 +1,6 @@
 /* PR tree-optimization/59643 */
 /* { dg-do compile } */
 /* { dg-options "-O3 -fdump-tree-pcom-details" } */
-/* { dg-additional-options "--param max-unrolled-insns=400" { target { powerpc*-*-* } } } */
-/* Implicit threashold of max-unrolled-insn on ppc at O3 is too small for the
-   loop of this case.  */
 
 void
 foo (double *a, double *b, double *c, double d, double e, int n)
Jiufu Guo Nov. 5, 2019, 8:51 a.m. UTC | #6
Richard Biener <rguenther@suse.de> writes:

> On Mon, 4 Nov 2019, Segher Boessenkool wrote:
>
>> Hi!
>> 
>> On Mon, Nov 04, 2019 at 02:31:43PM +0800, Jiufu Guo wrote:
>> > In this patch, loop unroll adjust hook is introduced for powerpc.  We can do
>> > target related hueristic adjustment in this hook. In this patch, small loops
>> > is unrolled 2 times for O2 and O3 by default.  With this patch, we can see
>> > some improvement for spec2017.  This patch enhanced a little for [Patch V2] to
>> > enable small loops unroll for O3 by default like O2.
>> 
>> > 	* gcc/config/rs6000/rs6000.opt (unroll_small_loops): New internal flag.
>> 
>> That's the declaration of a variable.  A command line flag is something
>> like -munroll-small-loops.  Do we want a command line option like that?
>> It makes testing simpler.
>> 
>> > -      /* unroll very small loops 2 time if no -funroll-loops.  */
>> > +      /* If funroll-loops is not enabled explicitly, then enable small loops
>> > +	 unrolling for -O2, and do not turn fweb or frename-registers on.  */
>> >        if (!global_options_set.x_flag_unroll_loops
>> >  	  && !global_options_set.x_flag_unroll_all_loops)
>> >  	{
>> > -	  maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2,
>> > -				 global_options.x_param_values,
>> > -				 global_options_set.x_param_values);
>> > -
>> > -	  maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20,
>> > -				 global_options.x_param_values,
>> > -				 global_options_set.x_param_values);
>> > +	  unroll_small_loops = optimize >= 2 ? 1 : 0;
>> 
>> That includes -Os?
>
> It also re-introduces the exactly same issue as the --param with LTO.
Thanks Richard,
This flag (unroll_small_loops) is saved/restored cl_target_option it can
distigush different CU. I had a test, it works when -flto for
multi-sources. 

Jiufu
BR.

>
>> I think you shouldn't always set it to some value, only enable it where
>> you want to enable it.  If you make a command line option for it this is
>> especially simple (the table in common/config/rs6000/rs6000-common.c).
>> 
>> > +static unsigned
>> > +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop)
>> > +{
>> > +  if (unroll_small_loops)
>> > +    {
>> > +      /* TODO: This is hardcoded to 10 right now.  It can be refined, for
>> > +	 example we may want to unroll very small loops more times (4 perhaps).
>> > +	 We also should use a PARAM for this.  */
>> > +      if (loop->ninsns <= 10)
>> > +	return MIN (2, nunroll);
>> > +      else
>> > +	return 0;
>> > +    }
>> 
>> (Add an empty line here?)
>> 
>> > +  return nunroll;
>> > +}
>> 
>> Great :-)
>> 
>> > @@ -23472,6 +23488,7 @@ rs6000_function_specific_save (struct cl_target_option *ptr,
>> >  {
>> >    ptr->x_rs6000_isa_flags = opts->x_rs6000_isa_flags;
>> >    ptr->x_rs6000_isa_flags_explicit = opts->x_rs6000_isa_flags_explicit;
>> > +  ptr->x_unroll_small_loops = opts->x_unroll_small_loops;
>> >  }
>> 
>> Yeah we shouldn't need to add that, this should all be automatic.
>> 
>> 
>> Segher
>>
Jiufu Guo Nov. 5, 2019, 8:55 a.m. UTC | #7
Jiufu Guo <guojiufu@linux.ibm.com> writes:

> Segher Boessenkool <segher@kernel.crashing.org> writes:
>
>> Hi!
>>
>> On Mon, Nov 04, 2019 at 02:31:43PM +0800, Jiufu Guo wrote:
>>> In this patch, loop unroll adjust hook is introduced for powerpc.  We can do
>>> target related hueristic adjustment in this hook. In this patch, small loops
>>> is unrolled 2 times for O2 and O3 by default.  With this patch, we can see
>>> some improvement for spec2017.  This patch enhanced a little for [Patch V2] to
>>> enable small loops unroll for O3 by default like O2.
>>
>>> 	* gcc/config/rs6000/rs6000.opt (unroll_small_loops): New internal flag.
>>
>> That's the declaration of a variable.  A command line flag is something
>> like -munroll-small-loops.  Do we want a command line option like that?
>> It makes testing simpler.
> Thanks for great sugguestion, will update patch to add a command line
> option.
>
>>
>>> -      /* unroll very small loops 2 time if no -funroll-loops.  */
>>> +      /* If funroll-loops is not enabled explicitly, then enable small loops
>>> +	 unrolling for -O2, and do not turn fweb or frename-registers on.  */
>>>        if (!global_options_set.x_flag_unroll_loops
>>>  	  && !global_options_set.x_flag_unroll_all_loops)
>>>  	{
>>> -	  maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2,
>>> -				 global_options.x_param_values,
>>> -				 global_options_set.x_param_values);
>>> -
>>> -	  maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20,
>>> -				 global_options.x_param_values,
>>> -				 global_options_set.x_param_values);
>>> +	  unroll_small_loops = optimize >= 2 ? 1 : 0;
>>
>> That includes -Os?
>>
>> I think you shouldn't always set it to some value, only enable it where
>> you want to enable it.  If you make a command line option for it this is
>> especially simple (the table in common/config/rs6000/rs6000-common.c).
> Thanks again, update rs6000_option_optimization_table as :
>     { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
>     { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fweb, NULL, 0 },
>     { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_frename_registers, NULL, 0 },
Sorry, typo, in patch, above 2 lines are not there. Because they do not
turn off flag_web and flag_ename_registers.
>     { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 },
>
> While, I still keep the code to disable unroll_small_loops for explicit
> -funroll-loops via checking global_options_set.x_flag_unroll_loops. 
>>
>>> +static unsigned
>>> +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop)
>>> +{
>>> +  if (unroll_small_loops)
>>> +    {
>>> +      /* TODO: This is hardcoded to 10 right now.  It can be refined, for
>>> +	 example we may want to unroll very small loops more times (4 perhaps).
>>> +	 We also should use a PARAM for this.  */
>>> +      if (loop->ninsns <= 10)
>>> +	return MIN (2, nunroll);
>>> +      else
>>> +	return 0;
>>> +    }
>>
>> (Add an empty line here?)
> Thanks again, updated accordingly.
>>
>>> +  return nunroll;
>>> +}
>>
>> Great :-)
>>
>>> @@ -23472,6 +23488,7 @@ rs6000_function_specific_save (struct cl_target_option *ptr,
>>>  {
>>>    ptr->x_rs6000_isa_flags = opts->x_rs6000_isa_flags;
>>>    ptr->x_rs6000_isa_flags_explicit = opts->x_rs6000_isa_flags_explicit;
>>> +  ptr->x_unroll_small_loops = opts->x_unroll_small_loops;
>>>  }
>>
>> Yeah we shouldn't need to add that, this should all be automatic.
> Yes, through adding new option in .opt, this is handled automaticly.
>
> Updated patch is at the end of this mail. Thanks for review.
>
> Jiufu
>>
>>
>> Segher
>
> Updated patch:
>
> Index: gcc/common/config/rs6000/rs6000-common.c
> ===================================================================
> --- gcc/common/config/rs6000/rs6000-common.c	(revision 277765)
> +++ gcc/common/config/rs6000/rs6000-common.c	(working copy)
> @@ -35,7 +35,9 @@ static const struct default_options rs6000_option_
>      { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
>      /* Enable -fsched-pressure for first pass instruction scheduling.  */
>      { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
> -    { OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 },
> +    /* Enable  -funroll-loops with -munroll-small-loops.  */
> +    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
> +    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 },
>      { OPT_LEVELS_NONE, 0, NULL, 0 }
>    };
>  
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c	(revision 277765)
> +++ gcc/config/rs6000/rs6000.c	(working copy)
> @@ -1428,6 +1428,9 @@ static const struct attribute_spec rs6000_attribut
>  #undef TARGET_VECTORIZE_DESTROY_COST_DATA
>  #define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data
>  
> +#undef TARGET_LOOP_UNROLL_ADJUST
> +#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
> +
>  #undef TARGET_INIT_BUILTINS
>  #define TARGET_INIT_BUILTINS rs6000_init_builtins
>  #undef TARGET_BUILTIN_DECL
> @@ -4540,24 +4543,21 @@ rs6000_option_override_internal (bool global_init_
>  			     global_options.x_param_values,
>  			     global_options_set.x_param_values);
>  
> -      /* unroll very small loops 2 time if no -funroll-loops.  */
> -      if (!global_options_set.x_flag_unroll_loops
> -	  && !global_options_set.x_flag_unroll_all_loops)
> +      /* Explicit funroll-loops turns -munroll-small-loops off.
> +	 Implicit funroll-loops does not turn fweb or frename-registers on.  */
> +      if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops)
> +	  || (global_options_set.x_flag_unroll_all_loops
> +	      && flag_unroll_all_loops))
>  	{
> -	  maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2,
> -				 global_options.x_param_values,
> -				 global_options_set.x_param_values);
> -
> -	  maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20,
> -				 global_options.x_param_values,
> -				 global_options_set.x_param_values);
> -
> -	  /* If fweb or frename-registers are not specificed in command-line,
> -	     do not turn them on implicitly.  */
> +	  if (!global_options_set.x_unroll_small_loops)
> +	    unroll_small_loops = 0;
> +	}
> +      else
> +	{
>  	  if (!global_options_set.x_flag_web)
> -	    global_options.x_flag_web = 0;
> +	    flag_web = 0;
>  	  if (!global_options_set.x_flag_rename_registers)
> -	    global_options.x_flag_rename_registers = 0;
> +	    flag_rename_registers = 0;
>  	}
>  
>        /* If using typedef char *va_list, signal that
> @@ -5101,6 +5101,25 @@ rs6000_destroy_cost_data (void *data)
>    free (data);
>  }
>  
> +/*  Implement targetm.loop_unroll_adjust.  */
> +
> +static unsigned
> +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop)
> +{
> +   if (unroll_small_loops)
> +    {
> +      /* TODO: This is hardcoded to 10 right now.  It can be refined, for
> +	 example we may want to unroll very small loops more times (4 perhaps).
> +	 We also should use a PARAM for this.  */
> +      if (loop->ninsns <= 10)
> +	return MIN (2, nunroll);
> +      else
> +	return 0;
> +    }
> +  
> +  return nunroll;
> +}
> +
>  /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
>     library with vectorized intrinsics.  */
>  
> Index: gcc/config/rs6000/rs6000.opt
> ===================================================================
> --- gcc/config/rs6000/rs6000.opt	(revision 277765)
> +++ gcc/config/rs6000/rs6000.opt	(working copy)
> @@ -501,6 +501,10 @@ moptimize-swaps
>  Target Undocumented Var(rs6000_optimize_swaps) Init(1) Save
>  Analyze and remove doubleword swaps from VSX computations.
>  
> +munroll-small-loops
> +Target Undocumented Var(unroll_small_loops) Init(0) Save
> +Use conservative small loop unrolling.
> +
>  mpower9-misc
>  Target Undocumented Report Mask(P9_MISC) Var(rs6000_isa_flags)
>  Use certain scalar instructions added in ISA 3.0.
> Index: gcc/testsuite/gcc.dg/pr59643.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/pr59643.c	(revision 277765)
> +++ gcc/testsuite/gcc.dg/pr59643.c	(working copy)
> @@ -1,9 +1,6 @@
>  /* PR tree-optimization/59643 */
>  /* { dg-do compile } */
>  /* { dg-options "-O3 -fdump-tree-pcom-details" } */
> -/* { dg-additional-options "--param max-unrolled-insns=400" { target { powerpc*-*-* } } } */
> -/* Implicit threashold of max-unrolled-insn on ppc at O3 is too small for the
> -   loop of this case.  */
>  
>  void
>  foo (double *a, double *b, double *c, double d, double e, int n)
Jiufu Guo Nov. 5, 2019, 2:52 p.m. UTC | #8
Jiufu Guo <guojiufu@linux.ibm.com> writes:

Got message about fail to send to gcc-patches. Send again.

> Segher Boessenkool <segher@kernel.crashing.org> writes:
>
>> Hi!
>>
>> On Mon, Nov 04, 2019 at 02:31:43PM +0800, Jiufu Guo wrote:
>>> In this patch, loop unroll adjust hook is introduced for powerpc.  We can do
>>> target related hueristic adjustment in this hook. In this patch, small loops
>>> is unrolled 2 times for O2 and O3 by default.  With this patch, we can see
>>> some improvement for spec2017.  This patch enhanced a little for [Patch V2] to
>>> enable small loops unroll for O3 by default like O2.
>>
[...]
>
> Updated patch is at the end of this mail. Thanks for review.
>
> Jiufu
>>
>>
>> Segher

Thanks again for review.
Add ChangeLog:

gcc/
2019-11-04  Jiufu Guo  <guojiufu@linux.ibm.com>

	PR tree-optimization/88760
	* gcc/common/config/rs6000/rs6000-common.c
	(rs6000_option_optimization_table): Enable OPT_funroll_loops and
	OPT_munroll_small_loops for OPT_LEVELS_2_PLUS_SPEED_ONLY.
	* config/rs6000/rs6000.c (rs6000_option_override_internal): Remove
	code which changes PARAM_MAX_UNROLL_TIMES and PARAM_MAX_UNROLLED_INSNS.
	Handle option overrides for explicit and implicit funroll-loops.
	(TARGET_LOOP_UNROLL_ADJUST): Add loop unroll adjust hook.
	(rs6000_loop_unroll_adjust): New hook for loop unroll adjust.
	Unrolling 2 times for small loops.
	* gcc/config/rs6000/rs6000.opt (munroll-small-loops): Add new flag.

gcc.testsuite/
2019-11-04  Jiufu Guo  <guojiufu@linux.ibm.com>

	PR tree-optimization/88760
	* gcc.dg/pr59643.c: Update back to r277550.

>
> Updated patch:
>
> Index: gcc/common/config/rs6000/rs6000-common.c
> ===================================================================
> --- gcc/common/config/rs6000/rs6000-common.c	(revision 277765)
> +++ gcc/common/config/rs6000/rs6000-common.c	(working copy)
> @@ -35,7 +35,9 @@ static const struct default_options rs6000_option_
>      { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
>      /* Enable -fsched-pressure for first pass instruction scheduling.  */
>      { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
> -    { OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 },
> +    /* Enable  -funroll-loops with -munroll-small-loops.  */
> +    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
> +    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 },
>      { OPT_LEVELS_NONE, 0, NULL, 0 }
>    };
>  
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c	(revision 277765)
> +++ gcc/config/rs6000/rs6000.c	(working copy)
> @@ -1428,6 +1428,9 @@ static const struct attribute_spec rs6000_attribut
>  #undef TARGET_VECTORIZE_DESTROY_COST_DATA
>  #define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data
>  
> +#undef TARGET_LOOP_UNROLL_ADJUST
> +#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
> +
>  #undef TARGET_INIT_BUILTINS
>  #define TARGET_INIT_BUILTINS rs6000_init_builtins
>  #undef TARGET_BUILTIN_DECL
> @@ -4540,24 +4543,21 @@ rs6000_option_override_internal (bool global_init_
>  			     global_options.x_param_values,
>  			     global_options_set.x_param_values);
>  
> -      /* unroll very small loops 2 time if no -funroll-loops.  */
> -      if (!global_options_set.x_flag_unroll_loops
> -	  && !global_options_set.x_flag_unroll_all_loops)
> +      /* Explicit funroll-loops turns -munroll-small-loops off.
> +	 Implicit funroll-loops does not turn fweb or frename-registers on.  */
> +      if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops)
> +	  || (global_options_set.x_flag_unroll_all_loops
> +	      && flag_unroll_all_loops))
>  	{
> -	  maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2,
> -				 global_options.x_param_values,
> -				 global_options_set.x_param_values);
> -
> -	  maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20,
> -				 global_options.x_param_values,
> -				 global_options_set.x_param_values);
> -
> -	  /* If fweb or frename-registers are not specificed in command-line,
> -	     do not turn them on implicitly.  */
> +	  if (!global_options_set.x_unroll_small_loops)
> +	    unroll_small_loops = 0;
> +	}
> +      else
> +	{
>  	  if (!global_options_set.x_flag_web)
> -	    global_options.x_flag_web = 0;
> +	    flag_web = 0;
>  	  if (!global_options_set.x_flag_rename_registers)
> -	    global_options.x_flag_rename_registers = 0;
> +	    flag_rename_registers = 0;
>  	}
>  
>        /* If using typedef char *va_list, signal that
> @@ -5101,6 +5101,25 @@ rs6000_destroy_cost_data (void *data)
>    free (data);
>  }
>  
> +/*  Implement targetm.loop_unroll_adjust.  */
> +
> +static unsigned
> +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop)
> +{
> +   if (unroll_small_loops)
> +    {
> +      /* TODO: This is hardcoded to 10 right now.  It can be refined, for
> +	 example we may want to unroll very small loops more times (4 perhaps).
> +	 We also should use a PARAM for this.  */
> +      if (loop->ninsns <= 10)
> +	return MIN (2, nunroll);
> +      else
> +	return 0;
> +    }
> +  
> +  return nunroll;
> +}
> +
>  /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
>     library with vectorized intrinsics.  */
>  
> Index: gcc/config/rs6000/rs6000.opt
> ===================================================================
> --- gcc/config/rs6000/rs6000.opt	(revision 277765)
> +++ gcc/config/rs6000/rs6000.opt	(working copy)
> @@ -501,6 +501,10 @@ moptimize-swaps
>  Target Undocumented Var(rs6000_optimize_swaps) Init(1) Save
>  Analyze and remove doubleword swaps from VSX computations.
>  
> +munroll-small-loops
> +Target Undocumented Var(unroll_small_loops) Init(0) Save
> +Use conservative small loop unrolling.
> +
>  mpower9-misc
>  Target Undocumented Report Mask(P9_MISC) Var(rs6000_isa_flags)
>  Use certain scalar instructions added in ISA 3.0.
> Index: gcc/testsuite/gcc.dg/pr59643.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/pr59643.c	(revision 277765)
> +++ gcc/testsuite/gcc.dg/pr59643.c	(working copy)
> @@ -1,9 +1,6 @@
>  /* PR tree-optimization/59643 */
>  /* { dg-do compile } */
>  /* { dg-options "-O3 -fdump-tree-pcom-details" } */
> -/* { dg-additional-options "--param max-unrolled-insns=400" { target { powerpc*-*-* } } } */
> -/* Implicit threashold of max-unrolled-insn on ppc at O3 is too small for the
> -   loop of this case.  */
>  
>  void
>  foo (double *a, double *b, double *c, double d, double e, int n)
Segher Boessenkool Nov. 6, 2019, 6:06 p.m. UTC | #9
Hi!

On Tue, Nov 05, 2019 at 04:33:23PM +0800, Jiufu Guo wrote:
> --- gcc/common/config/rs6000/rs6000-common.c	(revision 277765)
> +++ gcc/common/config/rs6000/rs6000-common.c	(working copy)
> @@ -35,7 +35,9 @@ static const struct default_options rs6000_option_
>      { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
>      /* Enable -fsched-pressure for first pass instruction scheduling.  */
>      { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
> -    { OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 },
> +    /* Enable  -funroll-loops with -munroll-small-loops.  */
> +    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
> +    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 },

I guess the comment should say what we enable here more than the generic
code does.  Something like

    /* Enable -funroll-loops at -O2 already.  Also enable
       -munroll-small-loops.  */

> +      /* Explicit funroll-loops turns -munroll-small-loops off.
> +	 Implicit funroll-loops does not turn fweb or frename-registers on.  */
> +      if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops)
> +	  || (global_options_set.x_flag_unroll_all_loops
> +	      && flag_unroll_all_loops))
>  	{
> +	  if (!global_options_set.x_unroll_small_loops)
> +	    unroll_small_loops = 0;
> +	}
> +      else
> +	{
>  	  if (!global_options_set.x_flag_web)
> +	    flag_web = 0;
>  	  if (!global_options_set.x_flag_rename_registers)
> +	    flag_rename_registers = 0;
>  	}

So unroll-small-loops should better be called unroll-only-small-loops?

Why does explicit unroll-loops turns on web and rnreg?  Why only explicit?
Isn't it good and/or bad in all the same cases, implicit and explicit?

> +munroll-small-loops
> +Target Undocumented Var(unroll_small_loops) Init(0) Save
> +Use conservative small loop unrolling.

Undocumented means undocumented, so you don't have a comment string in
here.  But you can comment it:

; Use conservative small loop unrolling.


Segher
Jiufu Guo Nov. 6, 2019, 9:36 p.m. UTC | #10
Segher Boessenkool <segher@kernel.crashing.org> writes:

> Hi!
>
> On Tue, Nov 05, 2019 at 04:33:23PM +0800, Jiufu Guo wrote:
>> --- gcc/common/config/rs6000/rs6000-common.c	(revision 277765)
>> +++ gcc/common/config/rs6000/rs6000-common.c	(working copy)
>> @@ -35,7 +35,9 @@ static const struct default_options rs6000_option_
>>      { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
>>      /* Enable -fsched-pressure for first pass instruction scheduling.  */
>>      { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
>> -    { OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 },
>> +    /* Enable  -funroll-loops with -munroll-small-loops.  */
>> +    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
>> +    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 },
>
> I guess the comment should say what we enable here more than the generic
> code does.  Something like
>
>     /* Enable -funroll-loops at -O2 already.  Also enable
>        -munroll-small-loops.  */

updated to:
    /* Enable -munroll-only-small-loops with -funroll-loops to unroll small
    loops at -O2 and above by default.   */
    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 },
    /* Disable -fweb and -frename-registers to avoid bad impacts.  */
    { OPT_LEVELS_ALL, OPT_fweb, NULL, 0 },
    { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },

Thanks for more comments to make it better!

>
>> +      /* Explicit funroll-loops turns -munroll-small-loops off.
>> +	 Implicit funroll-loops does not turn fweb or frename-registers on.  */
>> +      if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops)
>> +	  || (global_options_set.x_flag_unroll_all_loops
>> +	      && flag_unroll_all_loops))
>>  	{
>> +	  if (!global_options_set.x_unroll_small_loops)
>> +	    unroll_small_loops = 0;
>> +	}
>> +      else
>> +	{
>>  	  if (!global_options_set.x_flag_web)
>> +	    flag_web = 0;
>>  	  if (!global_options_set.x_flag_rename_registers)
>> +	    flag_rename_registers = 0;
>>  	}
>
> So unroll-small-loops should better be called unroll-only-small-loops?
Thanks again.  Right, unroll-only-small-loops is better.
>
> Why does explicit unroll-loops turns on web and rnreg?  Why only explicit?
> Isn't it good and/or bad in all the same cases, implicit and explicit?
Good question!

Turn off them by default, because they do not help too much for generic
cases, and did not see performance gain on SPEC2017. And turning them
off will help to consistent with previous -O2/-O3 which does not turn
them on. This could avoid regressions in test cases.
If do not turn them on with -funroll-loops, user may see performance
difference on some cases.  For example, in SPEC peak which option
contains -funroll-loops, it may need to add -frename-registers manually
for some benchmarks.

Any sugguestions? Do you think it is a good idea to disable them by
default, and let user to add them when they are helpful? e.g. add them
for some benchmarks at `peak`.

>
>> +munroll-small-loops
>> +Target Undocumented Var(unroll_small_loops) Init(0) Save
>> +Use conservative small loop unrolling.
>
> Undocumented means undocumented, so you don't have a comment string in
> here.  But you can comment it:
>
> ; Use conservative small loop unrolling.
Thanks again for you kindly review!

Jiufu,

BR.
>
>
> Segher
Jiufu Guo Nov. 6, 2019, 10 p.m. UTC | #11
Jiufu Guo <guojiufu@linux.ibm.com> writes:

> Segher Boessenkool <segher@kernel.crashing.org> writes:
>
>> Hi!
>>
>> On Tue, Nov 05, 2019 at 04:33:23PM +0800, Jiufu Guo wrote:
>>> --- gcc/common/config/rs6000/rs6000-common.c	(revision 277765)
>>> +++ gcc/common/config/rs6000/rs6000-common.c	(working copy)
>>> @@ -35,7 +35,9 @@ static const struct default_options rs6000_option_
>>>      { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
>>>      /* Enable -fsched-pressure for first pass instruction scheduling.  */
>>>      { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
>>> -    { OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 },
>>> +    /* Enable  -funroll-loops with -munroll-small-loops.  */
>>> +    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
>>> +    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 },
>>
>> I guess the comment should say what we enable here more than the generic
>> code does.  Something like
>>
>>     /* Enable -funroll-loops at -O2 already.  Also enable
>>        -munroll-small-loops.  */
>
> updated to:
>     /* Enable -munroll-only-small-loops with -funroll-loops to unroll small
>     loops at -O2 and above by default.   */
>     { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
>     { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 },
>     /* Disable -fweb and -frename-registers to avoid bad impacts.  */
>     { OPT_LEVELS_ALL, OPT_fweb, NULL, 0 },
>     { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },
    /* Enable -munroll-only-small-loops with -funroll-loops to unroll small
    loops at -O2 and above by default.   */
    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 },
    /* -fweb and -frename-registers are useless in general, turn them off.  */
    { OPT_LEVELS_ALL, OPT_fweb, NULL, 0 },
    { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },

A little better?
Updated patch is attached at the end of this mail, maybe it is easy for
review.  :)

Jiufu,
BR.

>
> Thanks for more comments to make it better!
>
>>
>>> +      /* Explicit funroll-loops turns -munroll-small-loops off.
>>> +	 Implicit funroll-loops does not turn fweb or frename-registers on.  */
>>> +      if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops)
>>> +	  || (global_options_set.x_flag_unroll_all_loops
>>> +	      && flag_unroll_all_loops))
>>>  	{
>>> +	  if (!global_options_set.x_unroll_small_loops)
>>> +	    unroll_small_loops = 0;
>>> +	}
>>> +      else
>>> +	{
>>>  	  if (!global_options_set.x_flag_web)
>>> +	    flag_web = 0;
>>>  	  if (!global_options_set.x_flag_rename_registers)
>>> +	    flag_rename_registers = 0;
>>>  	}
>>
>> So unroll-small-loops should better be called unroll-only-small-loops?
> Thanks again.  Right, unroll-only-small-loops is better.
>>
>> Why does explicit unroll-loops turns on web and rnreg?  Why only explicit?
>> Isn't it good and/or bad in all the same cases, implicit and explicit?
> Good question!
>
> Turn off them by default, because they do not help too much for generic
> cases, and did not see performance gain on SPEC2017. And turning them
> off will help to consistent with previous -O2/-O3 which does not turn
> them on. This could avoid regressions in test cases.
> If do not turn them on with -funroll-loops, user may see performance
> difference on some cases.  For example, in SPEC peak which option
> contains -funroll-loops, it may need to add -frename-registers manually
> for some benchmarks.
>
> Any sugguestions? Do you think it is a good idea to disable them by
> default, and let user to add them when they are helpful? e.g. add them
> for some benchmarks at `peak`.
>
>>
>>> +munroll-small-loops
>>> +Target Undocumented Var(unroll_small_loops) Init(0) Save
>>> +Use conservative small loop unrolling.
>>
>> Undocumented means undocumented, so you don't have a comment string in
>> here.  But you can comment it:
>>
>> ; Use conservative small loop unrolling.
> Thanks again for you kindly review!
>
> Jiufu,
>
> BR.
>>
>>
>> Segher
gcc/
2019-11-06  Jiufu Guo  <guojiufu@linux.ibm.com>

	PR tree-optimization/88760
	* gcc/config/rs6000/rs6000.opt (-munroll-small-loops): New option.
	* gcc/common/config/rs6000/rs6000-common.c
	(rs6000_option_optimization_table) [OPT_LEVELS_2_PLUS_SPEED_ONLY]:
	Turn on -funroll-loops and -munroll-small-loops.
	[OPT_LEVELS_ALL]: Turn off -fweb and -frename-registers.
	* config/rs6000/rs6000.c (rs6000_option_override_internal): Remove
	set of PARAM_MAX_UNROLL_TIMES and PARAM_MAX_UNROLLED_INSNS.
	Turn off -munroll-small-loops, turn on -fweb and -frename-registers
	for explicit funroll-loops.
	(TARGET_LOOP_UNROLL_ADJUST): Add loop unroll adjust hook.
	(rs6000_loop_unroll_adjust): Define it.  Use -munroll-small-loops.

gcc.testsuite/
2019-11-06  Jiufu Guo  <guojiufu@linux.ibm.com>

	PR tree-optimization/88760
	* gcc.dg/pr59643.c: Update back to r277550.

Index: gcc/common/config/rs6000/rs6000-common.c
===================================================================
--- gcc/common/config/rs6000/rs6000-common.c	(revision 277871)
+++ gcc/common/config/rs6000/rs6000-common.c	(working copy)
@@ -35,7 +35,13 @@ static const struct default_options rs6000_option_
     { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
     /* Enable -fsched-pressure for first pass instruction scheduling.  */
     { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
-    { OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 },
+    /* Enable -munroll-only-small-loops with -funroll-loops to unroll small
+    loops at -O2 and above by default.   */
+    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
+    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 },
+    /* -fweb and -frename-registers are useless in general, turn them off.  */
+    { OPT_LEVELS_ALL, OPT_fweb, NULL, 0 },
+    { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },
     { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 277871)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -1428,6 +1428,9 @@ static const struct attribute_spec rs6000_attribut
 #undef TARGET_VECTORIZE_DESTROY_COST_DATA
 #define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data
 
+#undef TARGET_LOOP_UNROLL_ADJUST
+#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
+
 #undef TARGET_INIT_BUILTINS
 #define TARGET_INIT_BUILTINS rs6000_init_builtins
 #undef TARGET_BUILTIN_DECL
@@ -4540,24 +4543,18 @@ rs6000_option_override_internal (bool global_init_
 			     global_options.x_param_values,
 			     global_options_set.x_param_values);
 
-      /* unroll very small loops 2 time if no -funroll-loops.  */
-      if (!global_options_set.x_flag_unroll_loops
-	  && !global_options_set.x_flag_unroll_all_loops)
+      /* Explicit -funroll-loops turns -munroll-small-loops off, and turns
+	 fweb or frename-registers on.  */
+      if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops)
+	  || (global_options_set.x_flag_unroll_all_loops
+	      && flag_unroll_all_loops))
 	{
-	  maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2,
-				 global_options.x_param_values,
-				 global_options_set.x_param_values);
-
-	  maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20,
-				 global_options.x_param_values,
-				 global_options_set.x_param_values);
-
-	  /* If fweb or frename-registers are not specificed in command-line,
-	     do not turn them on implicitly.  */
+	  if (!global_options_set.x_unroll_only_small_loops)
+	    unroll_only_small_loops = 0;
 	  if (!global_options_set.x_flag_web)
-	    global_options.x_flag_web = 0;
+	    flag_web = 1;
 	  if (!global_options_set.x_flag_rename_registers)
-	    global_options.x_flag_rename_registers = 0;
+	    flag_rename_registers = 1;
 	}
 
       /* If using typedef char *va_list, signal that
@@ -5101,6 +5098,25 @@ rs6000_destroy_cost_data (void *data)
   free (data);
 }
 
+/*  Implement targetm.loop_unroll_adjust.  */
+
+static unsigned
+rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop)
+{
+   if (unroll_only_small_loops)
+    {
+      /* TODO: This is hardcoded to 10 right now.  It can be refined, for
+	 example we may want to unroll very small loops more times (4 perhaps).
+	 We also should use a PARAM for this.  */
+      if (loop->ninsns <= 10)
+	return MIN (2, nunroll);
+      else
+	return 0;
+    }
+
+  return nunroll;
+}
+
 /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
    library with vectorized intrinsics.  */
 
Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(revision 277871)
+++ gcc/config/rs6000/rs6000.opt	(working copy)
@@ -501,6 +501,10 @@ moptimize-swaps
 Target Undocumented Var(rs6000_optimize_swaps) Init(1) Save
 Analyze and remove doubleword swaps from VSX computations.
 
+munroll-only-small-loops
+Target Undocumented Var(unroll_only_small_loops) Init(0) Save
+; Use conservative small loop unrolling.
+
 mpower9-misc
 Target Undocumented Report Mask(P9_MISC) Var(rs6000_isa_flags)
 Use certain scalar instructions added in ISA 3.0.
Index: gcc/testsuite/gcc.dg/pr59643.c
===================================================================
--- gcc/testsuite/gcc.dg/pr59643.c	(revision 277871)
+++ gcc/testsuite/gcc.dg/pr59643.c	(working copy)
@@ -1,9 +1,6 @@
 /* PR tree-optimization/59643 */
 /* { dg-do compile } */
 /* { dg-options "-O3 -fdump-tree-pcom-details" } */
-/* { dg-additional-options "--param max-unrolled-insns=400" { target { powerpc*-*-* } } } */
-/* Implicit threashold of max-unrolled-insn on ppc at O3 is too small for the
-   loop of this case.  */
 
 void
 foo (double *a, double *b, double *c, double d, double e, int n)
Jiufu Guo Nov. 7, 2019, 7:15 a.m. UTC | #12
Jiufu Guo <guojiufu@linux.ibm.com> writes:

Hi Segher,

I updated the patch for option name at the end of this mail. Thanks for
review in advance.

> Jiufu Guo <guojiufu@linux.ibm.com> writes:
>
>> Segher Boessenkool <segher@kernel.crashing.org> writes:
>>
>>> Hi!
>>>
>>> On Tue, Nov 05, 2019 at 04:33:23PM +0800, Jiufu Guo wrote:
>>>> --- gcc/common/config/rs6000/rs6000-common.c	(revision 277765)
>>>> +++ gcc/common/config/rs6000/rs6000-common.c	(working copy)
>>>> @@ -35,7 +35,9 @@ static const struct default_options rs6000_option_
>>>>      { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
>>>>      /* Enable -fsched-pressure for first pass instruction scheduling.  */
>>>>      { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
>>>> -    { OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 },
>>>> +    /* Enable  -funroll-loops with -munroll-small-loops.  */
>>>> +    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
>>>> +    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 },
>>>
>>> I guess the comment should say what we enable here more than the generic
>>> code does.  Something like
>>>
>>>     /* Enable -funroll-loops at -O2 already.  Also enable
>>>        -munroll-small-loops.  */
>>
>> updated to:
>>     /* Enable -munroll-only-small-loops with -funroll-loops to unroll small
>>     loops at -O2 and above by default.   */
>>     { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
>>     { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 },
>>     /* Disable -fweb and -frename-registers to avoid bad impacts.  */
>>     { OPT_LEVELS_ALL, OPT_fweb, NULL, 0 },
>>     { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },
>     /* Enable -munroll-only-small-loops with -funroll-loops to unroll small
>     loops at -O2 and above by default.   */
>     { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
>     { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 },
>     /* -fweb and -frename-registers are useless in general, turn them off.  */
>     { OPT_LEVELS_ALL, OPT_fweb, NULL, 0 },
>     { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },
>
> A little better?
> Updated patch is attached at the end of this mail, maybe it is easy for
> review.  :)
>
> Jiufu,
> BR.
>
>>
>> Thanks for more comments to make it better!
>>
>>>
>>>> +      /* Explicit funroll-loops turns -munroll-small-loops off.
>>>> +	 Implicit funroll-loops does not turn fweb or frename-registers on.  */
>>>> +      if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops)
>>>> +	  || (global_options_set.x_flag_unroll_all_loops
>>>> +	      && flag_unroll_all_loops))
>>>>  	{
>>>> +	  if (!global_options_set.x_unroll_small_loops)
>>>> +	    unroll_small_loops = 0;
>>>> +	}
>>>> +      else
>>>> +	{
>>>>  	  if (!global_options_set.x_flag_web)
>>>> +	    flag_web = 0;
>>>>  	  if (!global_options_set.x_flag_rename_registers)
>>>> +	    flag_rename_registers = 0;
>>>>  	}
>>>
>>> So unroll-small-loops should better be called unroll-only-small-loops?
>> Thanks again.  Right, unroll-only-small-loops is better.
>>>
>>> Why does explicit unroll-loops turns on web and rnreg?  Why only explicit?
>>> Isn't it good and/or bad in all the same cases, implicit and explicit?
>> Good question!
>>
>> Turn off them by default, because they do not help too much for generic
>> cases, and did not see performance gain on SPEC2017. And turning them
>> off will help to consistent with previous -O2/-O3 which does not turn
>> them on. This could avoid regressions in test cases.
>> If do not turn them on with -funroll-loops, user may see performance
>> difference on some cases.  For example, in SPEC peak which option
>> contains -funroll-loops, it may need to add -frename-registers manually
>> for some benchmarks.
>>
>> Any sugguestions? Do you think it is a good idea to disable them by
>> default, and let user to add them when they are helpful? e.g. add them
>> for some benchmarks at `peak`.
>>
>>>
>>>> +munroll-small-loops
>>>> +Target Undocumented Var(unroll_small_loops) Init(0) Save
>>>> +Use conservative small loop unrolling.
>>>
>>> Undocumented means undocumented, so you don't have a comment string in
>>> here.  But you can comment it:
>>>
>>> ; Use conservative small loop unrolling.
>> Thanks again for you kindly review!
>>
>> Jiufu,
>>
>> BR.
>>>
>>>
>>> Segher

gcc/
2019-11-07  Jiufu Guo  <guojiufu@linux.ibm.com>

	PR tree-optimization/88760
	* gcc/config/rs6000/rs6000.opt (-munroll-only-small-loops): New option.
	* gcc/common/config/rs6000/rs6000-common.c
	(rs6000_option_optimization_table) [OPT_LEVELS_2_PLUS_SPEED_ONLY]:
	Turn on -funroll-loops and -munroll-only-small-loops.
	[OPT_LEVELS_ALL]: Turn off -fweb and -frename-registers.
	* config/rs6000/rs6000.c (rs6000_option_override_internal): Remove
	set of PARAM_MAX_UNROLL_TIMES and PARAM_MAX_UNROLLED_INSNS.
	Turn off -munroll-only-small-loops, turn on -fweb and -frename-registers
	for explicit -funroll-loops.
	(TARGET_LOOP_UNROLL_ADJUST): Add loop unroll adjust hook.
	(rs6000_loop_unroll_adjust): Define it.  Use -munroll-only-small-loops.

gcc.testsuite/
2019-11-07  Jiufu Guo  <guojiufu@linux.ibm.com>

	PR tree-optimization/88760
	* gcc.dg/pr59643.c: Update back to r277550.

Index: gcc/common/config/rs6000/rs6000-common.c
===================================================================
--- gcc/common/config/rs6000/rs6000-common.c	(revision 277871)
+++ gcc/common/config/rs6000/rs6000-common.c	(working copy)
@@ -35,7 +35,13 @@ static const struct default_options rs6000_option_
     { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
     /* Enable -fsched-pressure for first pass instruction scheduling.  */
     { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
-    { OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 },
+    /* Enable -munroll-only-small-loops with -funroll-loops to unroll small
+    loops at -O2 and above by default.   */
+    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
+    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 },
+    /* -fweb and -frename-registers are useless in general, turn them off.  */
+    { OPT_LEVELS_ALL, OPT_fweb, NULL, 0 },
+    { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },
     { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 277871)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -1428,6 +1428,9 @@ static const struct attribute_spec rs6000_attribut
 #undef TARGET_VECTORIZE_DESTROY_COST_DATA
 #define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data
 
+#undef TARGET_LOOP_UNROLL_ADJUST
+#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
+
 #undef TARGET_INIT_BUILTINS
 #define TARGET_INIT_BUILTINS rs6000_init_builtins
 #undef TARGET_BUILTIN_DECL
@@ -4540,24 +4543,18 @@ rs6000_option_override_internal (bool global_init_
 			     global_options.x_param_values,
 			     global_options_set.x_param_values);
 
-      /* unroll very small loops 2 time if no -funroll-loops.  */
-      if (!global_options_set.x_flag_unroll_loops
-	  && !global_options_set.x_flag_unroll_all_loops)
+      /* Explicit -funroll-loops turns -munroll-only-small-loops off, and
+	 turns fweb or frename-registers on.  */
+      if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops)
+	  || (global_options_set.x_flag_unroll_all_loops
+	      && flag_unroll_all_loops))
 	{
-	  maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2,
-				 global_options.x_param_values,
-				 global_options_set.x_param_values);
-
-	  maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20,
-				 global_options.x_param_values,
-				 global_options_set.x_param_values);
-
-	  /* If fweb or frename-registers are not specificed in command-line,
-	     do not turn them on implicitly.  */
+	  if (!global_options_set.x_unroll_only_small_loops)
+	    unroll_only_small_loops = 0;
 	  if (!global_options_set.x_flag_web)
-	    global_options.x_flag_web = 0;
+	    flag_web = 1;
 	  if (!global_options_set.x_flag_rename_registers)
-	    global_options.x_flag_rename_registers = 0;
+	    flag_rename_registers = 1;
 	}
 
       /* If using typedef char *va_list, signal that
@@ -5101,6 +5098,25 @@ rs6000_destroy_cost_data (void *data)
   free (data);
 }
 
+/*  Implement targetm.loop_unroll_adjust.  */
+
+static unsigned
+rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop)
+{
+   if (unroll_only_small_loops)
+    {
+      /* TODO: This is hardcoded to 10 right now.  It can be refined, for
+	 example we may want to unroll very small loops more times (4 perhaps).
+	 We also should use a PARAM for this.  */
+      if (loop->ninsns <= 10)
+	return MIN (2, nunroll);
+      else
+	return 0;
+    }
+
+  return nunroll;
+}
+
 /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
    library with vectorized intrinsics.  */
 
Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(revision 277871)
+++ gcc/config/rs6000/rs6000.opt	(working copy)
@@ -501,6 +501,10 @@ moptimize-swaps
 Target Undocumented Var(rs6000_optimize_swaps) Init(1) Save
 Analyze and remove doubleword swaps from VSX computations.
 
+munroll-only-small-loops
+Target Undocumented Var(unroll_only_small_loops) Init(0) Save
+; Use conservative small loop unrolling.
+
 mpower9-misc
 Target Undocumented Report Mask(P9_MISC) Var(rs6000_isa_flags)
 Use certain scalar instructions added in ISA 3.0.
Index: gcc/testsuite/gcc.dg/pr59643.c
===================================================================
--- gcc/testsuite/gcc.dg/pr59643.c	(revision 277871)
+++ gcc/testsuite/gcc.dg/pr59643.c	(working copy)
@@ -1,9 +1,6 @@
 /* PR tree-optimization/59643 */
 /* { dg-do compile } */
 /* { dg-options "-O3 -fdump-tree-pcom-details" } */
-/* { dg-additional-options "--param max-unrolled-insns=400" { target { powerpc*-*-* } } } */
-/* Implicit threashold of max-unrolled-insn on ppc at O3 is too small for the
-   loop of this case.  */
 
 void
 foo (double *a, double *b, double *c, double d, double e, int n)
Jiufu Guo Nov. 7, 2019, 2:40 p.m. UTC | #13
Jiufu Guo <guojiufu@linux.ibm.com> writes:

Hi Sehger,

I updated the patch as we discussed.  This patch does not turn on -fweb
or -frename-registers with -funroll-loops for powerpc.

Thanks for review in advance.

gcc/
2019-11-07  Jiufu Guo  <guojiufu@linux.ibm.com>

	PR tree-optimization/88760
	* gcc/config/rs6000/rs6000.opt (-munroll-only-small-loops): New option.
	* gcc/common/config/rs6000/rs6000-common.c
	(rs6000_option_optimization_table) [OPT_LEVELS_2_PLUS_SPEED_ONLY]:
	Turn on -funroll-loops and -munroll-only-small-loops.
	[OPT_LEVELS_ALL]: Turn off -fweb and -frename-registers.
	* config/rs6000/rs6000.c (rs6000_option_override_internal): Remove
	set of PARAM_MAX_UNROLL_TIMES and PARAM_MAX_UNROLLED_INSNS.
	Turn off -munroll-only-small-loops for explicit -funroll-loops.
	(TARGET_LOOP_UNROLL_ADJUST): Add loop unroll adjust hook.
	(rs6000_loop_unroll_adjust): Define it.  Use -munroll-only-small-loops.

gcc.testsuite/
2019-11-07  Jiufu Guo  <guojiufu@linux.ibm.com>

	PR tree-optimization/88760
	* gcc.dg/pr59643.c: Update back to r277550.

Index: gcc/common/config/rs6000/rs6000-common.c
===================================================================
--- gcc/common/config/rs6000/rs6000-common.c	(revision 277871)
+++ gcc/common/config/rs6000/rs6000-common.c	(working copy)
@@ -35,7 +35,14 @@ static const struct default_options rs6000_option_
     { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
     /* Enable -fsched-pressure for first pass instruction scheduling.  */
     { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
-    { OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 },
+    /* Enable -munroll-only-small-loops with -funroll-loops to unroll small
+    loops at -O2 and above by default.   */
+    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
+    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 },
+    /* -fweb and -frename-registers are useless in general for rs6000,
+       turn them off.  */
+    { OPT_LEVELS_ALL, OPT_fweb, NULL, 0 },
+    { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },
     { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 277871)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -1428,6 +1428,9 @@ static const struct attribute_spec rs6000_attribut
 #undef TARGET_VECTORIZE_DESTROY_COST_DATA
 #define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data
 
+#undef TARGET_LOOP_UNROLL_ADJUST
+#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
+
 #undef TARGET_INIT_BUILTINS
 #define TARGET_INIT_BUILTINS rs6000_init_builtins
 #undef TARGET_BUILTIN_DECL
@@ -4540,26 +4543,13 @@ rs6000_option_override_internal (bool global_init_
 			     global_options.x_param_values,
 			     global_options_set.x_param_values);
 
-      /* unroll very small loops 2 time if no -funroll-loops.  */
-      if (!global_options_set.x_flag_unroll_loops
-	  && !global_options_set.x_flag_unroll_all_loops)
-	{
-	  maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2,
-				 global_options.x_param_values,
-				 global_options_set.x_param_values);
+      /* Explicit -funroll-loops turns -munroll-only-small-loops off.  */
+      if (((global_options_set.x_flag_unroll_loops && flag_unroll_loops)
+	   || (global_options_set.x_flag_unroll_all_loops
+	       && flag_unroll_all_loops))
+	  && !global_options_set.x_unroll_only_small_loops)
+	unroll_only_small_loops = 0;
 
-	  maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20,
-				 global_options.x_param_values,
-				 global_options_set.x_param_values);
-
-	  /* If fweb or frename-registers are not specificed in command-line,
-	     do not turn them on implicitly.  */
-	  if (!global_options_set.x_flag_web)
-	    global_options.x_flag_web = 0;
-	  if (!global_options_set.x_flag_rename_registers)
-	    global_options.x_flag_rename_registers = 0;
-	}
-
       /* If using typedef char *va_list, signal that
 	 __builtin_va_start (&ap, 0) can be optimized to
 	 ap = __builtin_next_arg (0).  */
@@ -5101,6 +5091,25 @@ rs6000_destroy_cost_data (void *data)
   free (data);
 }
 
+/*  Implement targetm.loop_unroll_adjust.  */
+
+static unsigned
+rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop)
+{
+   if (unroll_only_small_loops)
+    {
+      /* TODO: This is hardcoded to 10 right now.  It can be refined, for
+	 example we may want to unroll very small loops more times (4 perhaps).
+	 We also should use a PARAM for this.  */
+      if (loop->ninsns <= 10)
+	return MIN (2, nunroll);
+      else
+	return 0;
+    }
+
+  return nunroll;
+}
+
 /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
    library with vectorized intrinsics.  */
 
Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(revision 277871)
+++ gcc/config/rs6000/rs6000.opt	(working copy)
@@ -501,6 +501,10 @@ moptimize-swaps
 Target Undocumented Var(rs6000_optimize_swaps) Init(1) Save
 Analyze and remove doubleword swaps from VSX computations.
 
+munroll-only-small-loops
+Target Undocumented Var(unroll_only_small_loops) Init(0) Save
+; Use conservative small loop unrolling.
+
 mpower9-misc
 Target Undocumented Report Mask(P9_MISC) Var(rs6000_isa_flags)
 Use certain scalar instructions added in ISA 3.0.
Index: gcc/testsuite/gcc.dg/pr59643.c
===================================================================
--- gcc/testsuite/gcc.dg/pr59643.c	(revision 277871)
+++ gcc/testsuite/gcc.dg/pr59643.c	(working copy)
@@ -1,9 +1,6 @@
 /* PR tree-optimization/59643 */
 /* { dg-do compile } */
 /* { dg-options "-O3 -fdump-tree-pcom-details" } */
-/* { dg-additional-options "--param max-unrolled-insns=400" { target { powerpc*-*-* } } } */
-/* Implicit threashold of max-unrolled-insn on ppc at O3 is too small for the
-   loop of this case.  */
 
 void
 foo (double *a, double *b, double *c, double d, double e, int n)
Segher Boessenkool Nov. 8, 2019, 3:49 p.m. UTC | #14
Hi Jiu Fu,

On Thu, Nov 07, 2019 at 10:40:41PM +0800, Jiufu Guo wrote:
> gcc/
> 2019-11-07  Jiufu Guo  <guojiufu@linux.ibm.com>
> 
> 	PR tree-optimization/88760
> 	* gcc/config/rs6000/rs6000.opt (-munroll-only-small-loops): New option.
> 	* gcc/common/config/rs6000/rs6000-common.c
> 	(rs6000_option_optimization_table) [OPT_LEVELS_2_PLUS_SPEED_ONLY]:
> 	Turn on -funroll-loops and -munroll-only-small-loops.
> 	[OPT_LEVELS_ALL]: Turn off -fweb and -frename-registers.
> 	* config/rs6000/rs6000.c (rs6000_option_override_internal): Remove
> 	set of PARAM_MAX_UNROLL_TIMES and PARAM_MAX_UNROLLED_INSNS.
> 	Turn off -munroll-only-small-loops for explicit -funroll-loops.
> 	(TARGET_LOOP_UNROLL_ADJUST): Add loop unroll adjust hook.
> 	(rs6000_loop_unroll_adjust): Define it.  Use -munroll-only-small-loops.
> 
> gcc.testsuite/
> 2019-11-07  Jiufu Guo  <guojiufu@linux.ibm.com>
> 
> 	PR tree-optimization/88760
> 	* gcc.dg/pr59643.c: Update back to r277550.

Okay for trunk.  Thanks!  Just some formatting stuff:

> +    /* Enable -munroll-only-small-loops with -funroll-loops to unroll small
> +    loops at -O2 and above by default.   */

The "l" of "loops" should align with the "E" of "Enable", and only two
spaces after a dot:
    /* Enable -munroll-only-small-loops with -funroll-loops to unroll small
       loops at -O2 and above by default.  */

> +/*  Implement targetm.loop_unroll_adjust.  */

Only one space at the start of the comment.

> +static unsigned
> +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop)

struct loop *loop

> +      /* TODO: This is hardcoded to 10 right now.  It can be refined, for
> +	 example we may want to unroll very small loops more times (4 perhaps).
> +	 We also should use a PARAM for this.  */

There will be target-specific params soon, if I understood correctly :-)

Cheers,


Segher
Jiufu Guo Nov. 11, 2019, 3:02 a.m. UTC | #15
Segher Boessenkool <segher@kernel.crashing.org> writes:

> Hi Jiu Fu,
>
> On Thu, Nov 07, 2019 at 10:40:41PM +0800, Jiufu Guo wrote:
>> gcc/
>> 2019-11-07  Jiufu Guo  <guojiufu@linux.ibm.com>
>> 
>> 	PR tree-optimization/88760
>> 	* gcc/config/rs6000/rs6000.opt (-munroll-only-small-loops): New option.
>> 	* gcc/common/config/rs6000/rs6000-common.c
>> 	(rs6000_option_optimization_table) [OPT_LEVELS_2_PLUS_SPEED_ONLY]:
>> 	Turn on -funroll-loops and -munroll-only-small-loops.
>> 	[OPT_LEVELS_ALL]: Turn off -fweb and -frename-registers.
>> 	* config/rs6000/rs6000.c (rs6000_option_override_internal): Remove
>> 	set of PARAM_MAX_UNROLL_TIMES and PARAM_MAX_UNROLLED_INSNS.
>> 	Turn off -munroll-only-small-loops for explicit -funroll-loops.
>> 	(TARGET_LOOP_UNROLL_ADJUST): Add loop unroll adjust hook.
>> 	(rs6000_loop_unroll_adjust): Define it.  Use -munroll-only-small-loops.
>> 
>> gcc.testsuite/
>> 2019-11-07  Jiufu Guo  <guojiufu@linux.ibm.com>
>> 
>> 	PR tree-optimization/88760
>> 	* gcc.dg/pr59643.c: Update back to r277550.
>
> Okay for trunk.  Thanks!  Just some formatting stuff:
Thanks Segher! I will update according for patch.
>
>> +    /* Enable -munroll-only-small-loops with -funroll-loops to unroll small
>> +    loops at -O2 and above by default.   */
>
> The "l" of "loops" should align with the "E" of "Enable", and only two
> spaces after a dot:
>     /* Enable -munroll-only-small-loops with -funroll-loops to unroll small
>        loops at -O2 and above by default.  */
>
>> +/*  Implement targetm.loop_unroll_adjust.  */
>
> Only one space at the start of the comment.
>
>> +static unsigned
>> +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop)
>
> struct loop *loop
>
>> +      /* TODO: This is hardcoded to 10 right now.  It can be refined, for
>> +	 example we may want to unroll very small loops more times (4 perhaps).
>> +	 We also should use a PARAM for this.  */
>
> There will be target-specific params soon, if I understood correctly
> :-)
Yes, It is what I want to do.

Thanks again!

Jiufu
BR.

>
> Cheers,
>
>
> Segher
diff mbox series

Patch

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 9ed5151..5e1a75d 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1428,6 +1428,9 @@  static const struct attribute_spec rs6000_attribute_table[] =
 #undef TARGET_VECTORIZE_DESTROY_COST_DATA
 #define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data
 
+#undef TARGET_LOOP_UNROLL_ADJUST
+#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
+
 #undef TARGET_INIT_BUILTINS
 #define TARGET_INIT_BUILTINS rs6000_init_builtins
 #undef TARGET_BUILTIN_DECL
@@ -4540,25 +4543,20 @@  rs6000_option_override_internal (bool global_init_p)
 			     global_options.x_param_values,
 			     global_options_set.x_param_values);
 
-      /* unroll very small loops 2 time if no -funroll-loops.  */
+      /* If funroll-loops is not enabled explicitly, then enable small loops
+	 unrolling for -O2, and do not turn fweb or frename-registers on.  */
       if (!global_options_set.x_flag_unroll_loops
 	  && !global_options_set.x_flag_unroll_all_loops)
 	{
-	  maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2,
-				 global_options.x_param_values,
-				 global_options_set.x_param_values);
-
-	  maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20,
-				 global_options.x_param_values,
-				 global_options_set.x_param_values);
+	  unroll_small_loops = optimize >= 2 ? 1 : 0;
 
-	  /* If fweb or frename-registers are not specificed in command-line,
-	     do not turn them on implicitly.  */
 	  if (!global_options_set.x_flag_web)
 	    global_options.x_flag_web = 0;
 	  if (!global_options_set.x_flag_rename_registers)
 	    global_options.x_flag_rename_registers = 0;
 	}
+      else
+	unroll_small_loops = 0;
 
       /* If using typedef char *va_list, signal that
 	 __builtin_va_start (&ap, 0) can be optimized to
@@ -5101,6 +5099,24 @@  rs6000_destroy_cost_data (void *data)
   free (data);
 }
 
+/*  Implement targetm.loop_unroll_adjust.  */
+
+static unsigned
+rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop)
+{
+  if (unroll_small_loops)
+    {
+      /* TODO: This is hardcoded to 10 right now.  It can be refined, for
+	 example we may want to unroll very small loops more times (4 perhaps).
+	 We also should use a PARAM for this.  */
+      if (loop->ninsns <= 10)
+	return MIN (2, nunroll);
+      else
+	return 0;
+    }
+  return nunroll;
+}
+
 /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
    library with vectorized intrinsics.  */
 
@@ -23472,6 +23488,7 @@  rs6000_function_specific_save (struct cl_target_option *ptr,
 {
   ptr->x_rs6000_isa_flags = opts->x_rs6000_isa_flags;
   ptr->x_rs6000_isa_flags_explicit = opts->x_rs6000_isa_flags_explicit;
+  ptr->x_unroll_small_loops = opts->x_unroll_small_loops;
 }
 
 /* Restore the current options */
@@ -23483,6 +23500,7 @@  rs6000_function_specific_restore (struct gcc_options *opts,
 {
   opts->x_rs6000_isa_flags = ptr->x_rs6000_isa_flags;
   opts->x_rs6000_isa_flags_explicit = ptr->x_rs6000_isa_flags_explicit;
+  opts->x_unroll_small_loops = ptr->x_unroll_small_loops;
   (void) rs6000_option_override_internal (false);
 }
 
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 1f37a92..9cd5b4e 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -96,6 +96,13 @@  enum rs6000_cmodel rs6000_current_cmodel = CMODEL_SMALL
 TargetVariable
 unsigned int rs6000_recip_control
 
+;; Whether to unroll small loops only
+Variable
+unsigned char unroll_small_loops
+
+TargetSave
+unsigned char x_unroll_small_loops
+
 ;; Mask of what builtin functions are allowed
 TargetVariable
 HOST_WIDE_INT rs6000_builtin_mask
diff --git a/gcc/testsuite/gcc.dg/pr59643.c b/gcc/testsuite/gcc.dg/pr59643.c
index 4446f6e..de78d60 100644
--- a/gcc/testsuite/gcc.dg/pr59643.c
+++ b/gcc/testsuite/gcc.dg/pr59643.c
@@ -1,9 +1,6 @@ 
 /* PR tree-optimization/59643 */
 /* { dg-do compile } */
 /* { dg-options "-O3 -fdump-tree-pcom-details" } */
-/* { dg-additional-options "--param max-unrolled-insns=400" { target { powerpc*-*-* } } } */
-/* Implicit threashold of max-unrolled-insn on ppc at O3 is too small for the
-   loop of this case.  */
 
 void
 foo (double *a, double *b, double *c, double d, double e, int n)