diff mbox series

[1/2] Introduce flag_cunroll_grow_size for cunroll

Message ID 20200528085207.4021-1-guojiufu@linux.ibm.com
State New
Headers show
Series [1/2] Introduce flag_cunroll_grow_size for cunroll | expand

Commit Message

Jiufu Guo May 28, 2020, 8:52 a.m. UTC
From: Jiufu Guo <guojiufu@linux.ibm.com>

Currently GIMPLE complete unroller(cunroll) is checking
flag_unroll_loops and flag_peel_loops to see if allow size growth.
Beside affects curnoll, flag_unroll_loops also controls RTL unroler.
To have more freedom to control cunroll and RTL unroller, this patch
introduces flag_cunroll_grow_size.  With this patch, we can control
cunroll and RTL unroller indepently.

Bootstrap/regtest pass on powerpc64le. OK for trunk? And backport to
GCC10 after week?

gcc/ChangeLog
2020-02-28  Jiufu Guo  <guojiufu@linux.ibm.com>

	* common.opt (flag_cunroll_grow_size): New flag.
	* toplev.c (process_options): Set flag_cunroll_grow_size.
	* tree-ssa-loop-ivcanon.c (pass_complete_unroll::execute):
	Use flag_cunroll_grow_size.
---
 gcc/common.opt              | 4 ++++
 gcc/toplev.c                | 4 ++++
 gcc/tree-ssa-loop-ivcanon.c | 3 +--
 3 files changed, 9 insertions(+), 2 deletions(-)

Comments

Richard Biener May 28, 2020, 10:01 a.m. UTC | #1
On Thu, May 28, 2020 at 10:52 AM guojiufu <guojiufu@linux.ibm.com> wrote:
>
> From: Jiufu Guo <guojiufu@linux.ibm.com>
>
> Currently GIMPLE complete unroller(cunroll) is checking
> flag_unroll_loops and flag_peel_loops to see if allow size growth.
> Beside affects curnoll, flag_unroll_loops also controls RTL unroler.
> To have more freedom to control cunroll and RTL unroller, this patch
> introduces flag_cunroll_grow_size.  With this patch, we can control
> cunroll and RTL unroller indepently.
>
> Bootstrap/regtest pass on powerpc64le. OK for trunk? And backport to
> GCC10 after week?
>
> gcc/ChangeLog
> 2020-02-28  Jiufu Guo  <guojiufu@linux.ibm.com>
>
>         * common.opt (flag_cunroll_grow_size): New flag.
>         * toplev.c (process_options): Set flag_cunroll_grow_size.
>         * tree-ssa-loop-ivcanon.c (pass_complete_unroll::execute):
>         Use flag_cunroll_grow_size.
> ---
>  gcc/common.opt              | 4 ++++
>  gcc/toplev.c                | 4 ++++
>  gcc/tree-ssa-loop-ivcanon.c | 3 +--
>  3 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 4464049fc1f..1d0fa7b1749 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2856,6 +2856,10 @@ funroll-all-loops
>  Common Report Var(flag_unroll_all_loops) Optimization
>  Perform loop unrolling for all loops.
>
> +funroll-completely-grow-size
> +Var(flag_cunroll_grow_size) Init(2)
> +; Control cunroll to allow size growth during complete unrolling
> +

So this really adds a new compiler option which would need documenting.

I fear we'll get into bikeshed territory here as well...  I originally thought
we can use

Variable
int flag_cunroll_grow_size;

but now realize that does not work well with LTO without adjusting
the awk scripts to generate option saving/restoring.  For your patch
you'd need to add 'Optimization' to get the flag streamed properly,
you should also verify the target adjustment done in the backend
is reflected in LTO mode.

>  ; Nonzero means that loop optimizer may assume that the induction variables
>  ; that control loops do not overflow and that the loops with nontrivial
>  ; exit condition are not infinite
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index 96316fbd23b..8d52358efdd 100644
> --- a/gcc/toplev.c
> +++ b/gcc/toplev.c
> @@ -1474,6 +1474,10 @@ process_options (void)
>    if (flag_unroll_all_loops)
>      flag_unroll_loops = 1;
>
> +  /* Allow cunroll to grow size accordingly.  */
> +  if (flag_cunroll_grow_size == AUTODETECT_VALUE)
> +    flag_cunroll_grow_size = flag_unroll_loops || flag_peel_loops;
> +

Any reason to not use EnabledBy(funroll-loops || fpeel-loops)?

>    /* web and rename-registers help when run after loop unrolling.  */
>    if (flag_web == AUTODETECT_VALUE)
>      flag_web = flag_unroll_loops;
> diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
> index 8ab6ab3330c..d6a4617a6a1 100644
> --- a/gcc/tree-ssa-loop-ivcanon.c
> +++ b/gcc/tree-ssa-loop-ivcanon.c
> @@ -1603,8 +1603,7 @@ pass_complete_unroll::execute (function *fun)
>       re-peeling the same loop multiple times.  */
>    if (flag_peel_loops)
>      peeled_loops = BITMAP_ALLOC (NULL);
> -  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
> -                                                  || flag_peel_loops
> +  unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size
>                                                    || optimize >= 3, true);

Given we check optimize >= 3 here please enable the flag by default
at O3+ via opts.c:default_options_table and also elide the optimize >= 3
check.  That way -fno-unroll-completely-grow-size would have the desired effect.

Now back to the option name ... if we expose the option we should apply
some forward looking.  Currently cunroll cannot be disabled or enabled
with a flag and the desired new flag simply tunes one knob on it.  How
about adding

-fcomplete-unroll-loops[=may-grow]

to be able to further extend this later (there's the knob to only unroll
non-outermost loops and the knob whether to unroll loops where
intermediate exits are not statically predicted - incompletely controlled
by -fpeel-loops).  There's unfortunately no existing examples that allows
multiple flags like -fcomlete-unroll-loops=may-grow,outer other than
the sanitizers which have manual option parsing.

So if there's no good suggestion from option folks maybe go with

-fcomplete-unroll-loops-may-grow

(ick).  And on a second thought -fcomplete-unroll-loops[=...] should
be -funroll-loops[={complete,may-grow,all}] to cover all unrolling
bases?

I really hate to explode the number of options users have to
consider optimizing their code ...

So if we can defer all this thinking and make a non-option flag
variable work that would be best IMHO.

Richard.

>    if (peeled_loops)
>      {
> --
> 2.17.1
>
Segher Boessenkool May 28, 2020, 1:40 p.m. UTC | #2
On Thu, May 28, 2020 at 12:01:26PM +0200, Richard Biener wrote:
> On Thu, May 28, 2020 at 10:52 AM guojiufu <guojiufu@linux.ibm.com> wrote:
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> > @@ -2856,6 +2856,10 @@ funroll-all-loops
> >  Common Report Var(flag_unroll_all_loops) Optimization
> >  Perform loop unrolling for all loops.
> >
> > +funroll-completely-grow-size
> > +Var(flag_cunroll_grow_size) Init(2)
> > +; Control cunroll to allow size growth during complete unrolling
> > +
> 
> So this really adds a new compiler option which would need documenting.

The new flag can be marked Undocumented, would that help?  It is only
for internal use (which a comment can say as well).

> I fear we'll get into bikeshed territory here as well...  I originally thought
> we can use
> 
> Variable
> int flag_cunroll_grow_size;
> 
> but now realize that does not work well with LTO without adjusting
> the awk scripts to generate option saving/restoring.  For your patch
> you'd need to add 'Optimization' to get the flag streamed properly,
> you should also verify the target adjustment done in the backend
> is reflected in LTO mode.

The option machinery together with LTO is too complicated for me :-/

> Now back to the option name ... if we expose the option we should apply
> some forward looking.  Currently cunroll cannot be disabled or enabled
> with a flag and the desired new flag simply tunes one knob on it.  How
> about adding
> 
> -fcomplete-unroll-loops[=may-grow]
> 
> to be able to further extend this later

Trying to anticipate how things will be extended later rarely works :-(

> (there's the knob to only unroll
> non-outermost loops and the knob whether to unroll loops where
> intermediate exits are not statically predicted - incompletely controlled
> by -fpeel-loops).  There's unfortunately no existing examples that allows
> multiple flags like -fcomlete-unroll-loops=may-grow,outer other than
> the sanitizers which have manual option parsing.
> 
> So if there's no good suggestion from option folks maybe go with
> 
> -fcomplete-unroll-loops-may-grow
> 
> (ick).  And on a second thought -fcomplete-unroll-loops[=...] should
> be -funroll-loops[={complete,may-grow,all}] to cover all unrolling
> bases?
> 
> I really hate to explode the number of options users have to
> consider optimizing their code ...

Well, the defaults should be good for almost everyone.  But after that,
sure, it should be possible to tune things in a reasonable way.

> So if we can defer all this thinking and make a non-option flag
> variable work that would be best IMHO.

:-)


Segher
Jiufu Guo May 28, 2020, 2:36 p.m. UTC | #3
Richard Biener <richard.guenther@gmail.com> writes:

> On Thu, May 28, 2020 at 10:52 AM guojiufu <guojiufu@linux.ibm.com> wrote:
>>
>> From: Jiufu Guo <guojiufu@linux.ibm.com>
>>
>> Currently GIMPLE complete unroller(cunroll) is checking
>> flag_unroll_loops and flag_peel_loops to see if allow size growth.
>> Beside affects curnoll, flag_unroll_loops also controls RTL unroler.
>> To have more freedom to control cunroll and RTL unroller, this patch
>> introduces flag_cunroll_grow_size.  With this patch, we can control
>> cunroll and RTL unroller indepently.
>>
>> Bootstrap/regtest pass on powerpc64le. OK for trunk? And backport to
>> GCC10 after week?
>>
>>
>> +funroll-completely-grow-size
>> +Var(flag_cunroll_grow_size) Init(2)
>> +; Control cunroll to allow size growth during complete unrolling
>> +
>
> So this really adds a new compiler option which would need
> documenting.
I once add 'Undocumented' (avoid shown in --help), and do not add
'Common' (avoid --help=common).  What I want to do is avoid expose this
to user. 
While this is still an option as you said.

>
> I fear we'll get into bikeshed territory here as well...  I originally thought
> we can use
>
> Variable
> int flag_cunroll_grow_size;

Thanks, this code is definetly a variable instead an option. I would try
this way. 
>
> but now realize that does not work well with LTO without adjusting
> the awk scripts to generate option saving/restoring.  For your patch
> you'd need to add 'Optimization' to get the flag streamed properly,
> you should also verify the target adjustment done in the backend
> is reflected in LTO mode.

At here, internal option is relative simple 'Optimization' could help.
When trying 'Variable', I will verify it in LTO mode. 

>
>>  ; Nonzero means that loop optimizer may assume that the induction variables
>>
>> +  /* Allow cunroll to grow size accordingly.  */
>> +  if (flag_cunroll_grow_size == AUTODETECT_VALUE)
>> +    flag_cunroll_grow_size = flag_unroll_loops || flag_peel_loops;
>> +
>
> Any reason to not use EnabledBy(funroll-loops || fpeel-loops)?

With tests and checking the generated code(e.g. options.c), I find that
this setting has some unexpected behavior:
For example, "-funroll-loops -fno-peel-loops" turns off the flag.
"||" would indicate the flag will be _on/off_ by f[no]-unroll-loop or
f[no]-peel-loops.

>
>>    /* web and rename-registers help when run after loop unrolling.  */
>>    if (flag_web == AUTODETECT_VALUE)
>>      flag_web = flag_unroll_loops;

>> -  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
>> -                                                  || flag_peel_loops
>> +  unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size
>>                                                    || optimize >= 3, true);
>
> Given we check optimize >= 3 here please enable the flag by default
> at O3+ via opts.c:default_options_table and also elide the optimize >= 3
> check.  That way -fno-unroll-completely-grow-size would have the desired effect.
>
Actually in code flag_peel_loops is enabled at O3+, so, "|| optimize >=
3" could be removed.  Like you said, this helps to set negative form
even at -O3.

> Now back to the option name ... if we expose the option we should apply
> some forward looking.  Currently cunroll cannot be disabled or enabled
> with a flag and the desired new flag simply tunes one knob on it.  How
> about adding
>
> -fcomplete-unroll-loops[=may-grow]
-fcomplete-unroll-loops[=may-grow|inner|outer]
>
> to be able to further extend this later (there's the knob to only unroll
> non-outermost loops and the knob whether to unroll loops where
> intermediate exits are not statically predicted - incompletely controlled
> by -fpeel-loops).  There's unfortunately no existing examples that allows
> multiple flags like -fcomlete-unroll-loops=may-grow,outer other than
> the sanitizers which have manual option parsing.
>
> So if there's no good suggestion from option folks maybe go with
>
> -fcomplete-unroll-loops-may-grow
>
> (ick).  And on a second thought -fcomplete-unroll-loops[=...] should
> be -funroll-loops[={complete,may-grow,all}] to cover all unrolling
> bases?
>
> I really hate to explode the number of options users have to
> consider optimizing their code ...
>
> So if we can defer all this thinking and make a non-option flag
> variable work that would be best IMHO.
Yes, a few things are need trade-off when designing a user options.


Jiufu
>
> Richard.
>
>>    if (peeled_loops)
>>      {
>> --
>> 2.17.1
>>
Richard Biener May 28, 2020, 2:47 p.m. UTC | #4
On Thu, May 28, 2020 at 4:37 PM Jiufu Guo <guojiufu@linux.ibm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
>
> > On Thu, May 28, 2020 at 10:52 AM guojiufu <guojiufu@linux.ibm.com> wrote:
> >>
> >> From: Jiufu Guo <guojiufu@linux.ibm.com>
> >>
> >> Currently GIMPLE complete unroller(cunroll) is checking
> >> flag_unroll_loops and flag_peel_loops to see if allow size growth.
> >> Beside affects curnoll, flag_unroll_loops also controls RTL unroler.
> >> To have more freedom to control cunroll and RTL unroller, this patch
> >> introduces flag_cunroll_grow_size.  With this patch, we can control
> >> cunroll and RTL unroller indepently.
> >>
> >> Bootstrap/regtest pass on powerpc64le. OK for trunk? And backport to
> >> GCC10 after week?
> >>
> >>
> >> +funroll-completely-grow-size
> >> +Var(flag_cunroll_grow_size) Init(2)
> >> +; Control cunroll to allow size growth during complete unrolling
> >> +
> >
> > So this really adds a new compiler option which would need
> > documenting.
> I once add 'Undocumented' (avoid shown in --help), and do not add
> 'Common' (avoid --help=common).  What I want to do is avoid expose this
> to user.
> While this is still an option as you said.
>
> >
> > I fear we'll get into bikeshed territory here as well...  I originally thought
> > we can use
> >
> > Variable
> > int flag_cunroll_grow_size;
>
> Thanks, this code is definetly a variable instead an option. I would try
> this way.
> >
> > but now realize that does not work well with LTO without adjusting
> > the awk scripts to generate option saving/restoring.  For your patch
> > you'd need to add 'Optimization' to get the flag streamed properly,
> > you should also verify the target adjustment done in the backend
> > is reflected in LTO mode.
>
> At here, internal option is relative simple 'Optimization' could help.
> When trying 'Variable', I will verify it in LTO mode.

It won't work without adjusting the awk scripts.  So go with

funroll-completely-grow-size
Undocumented Optimization Var(flag_cunroll_grow_size)
EnabledBy(funroll-loops || fpeel-loops)
; ...

and enable it at O3+.  AUTODETECT_VALUE doesn't make sense for
an option not supposed to be set by users?


> >
> >>  ; Nonzero means that loop optimizer may assume that the induction variables
> >>
> >> +  /* Allow cunroll to grow size accordingly.  */
> >> +  if (flag_cunroll_grow_size == AUTODETECT_VALUE)
> >> +    flag_cunroll_grow_size = flag_unroll_loops || flag_peel_loops;
> >> +
> >
> > Any reason to not use EnabledBy(funroll-loops || fpeel-loops)?
>
> With tests and checking the generated code(e.g. options.c), I find that
> this setting has some unexpected behavior:
> For example, "-funroll-loops -fno-peel-loops" turns off the flag.
> "||" would indicate the flag will be _on/off_ by f[no]-unroll-loop or
> f[no]-peel-loops.
>
> >
> >>    /* web and rename-registers help when run after loop unrolling.  */
> >>    if (flag_web == AUTODETECT_VALUE)
> >>      flag_web = flag_unroll_loops;
>
> >> -  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
> >> -                                                  || flag_peel_loops
> >> +  unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size
> >>                                                    || optimize >= 3, true);
> >
> > Given we check optimize >= 3 here please enable the flag by default
> > at O3+ via opts.c:default_options_table and also elide the optimize >= 3
> > check.  That way -fno-unroll-completely-grow-size would have the desired effect.
> >
> Actually in code flag_peel_loops is enabled at O3+, so, "|| optimize >=
> 3" could be removed.  Like you said, this helps to set negative form
> even at -O3.

You are right.

> > Now back to the option name ... if we expose the option we should apply
> > some forward looking.  Currently cunroll cannot be disabled or enabled
> > with a flag and the desired new flag simply tunes one knob on it.  How
> > about adding
> >
> > -fcomplete-unroll-loops[=may-grow]
> -fcomplete-unroll-loops[=may-grow|inner|outer]
> >
> > to be able to further extend this later (there's the knob to only unroll
> > non-outermost loops and the knob whether to unroll loops where
> > intermediate exits are not statically predicted - incompletely controlled
> > by -fpeel-loops).  There's unfortunately no existing examples that allows
> > multiple flags like -fcomlete-unroll-loops=may-grow,outer other than
> > the sanitizers which have manual option parsing.
> >
> > So if there's no good suggestion from option folks maybe go with
> >
> > -fcomplete-unroll-loops-may-grow
> >
> > (ick).  And on a second thought -fcomplete-unroll-loops[=...] should
> > be -funroll-loops[={complete,may-grow,all}] to cover all unrolling
> > bases?
> >
> > I really hate to explode the number of options users have to
> > consider optimizing their code ...
> >
> > So if we can defer all this thinking and make a non-option flag
> > variable work that would be best IMHO.
> Yes, a few things are need trade-off when designing a user options.
>
>
> Jiufu
> >
> > Richard.
> >
> >>    if (peeled_loops)
> >>      {
> >> --
> >> 2.17.1
> >>
Jiufu Guo May 29, 2020, 5:42 a.m. UTC | #5
Richard Biener <richard.guenther@gmail.com> writes:

> On Thu, May 28, 2020 at 4:37 PM Jiufu Guo <guojiufu@linux.ibm.com> wrote:
>>
>> Richard Biener <richard.guenther@gmail.com> writes:
>>
>> > On Thu, May 28, 2020 at 10:52 AM guojiufu <guojiufu@linux.ibm.com> wrote:
>> >>
>> >> From: Jiufu Guo <guojiufu@linux.ibm.com>
>> >>
>> >> Currently GIMPLE complete unroller(cunroll) is checking
>> >> flag_unroll_loops and flag_peel_loops to see if allow size growth.
>> >> Beside affects curnoll, flag_unroll_loops also controls RTL unroler.
>> >> To have more freedom to control cunroll and RTL unroller, this patch
>> >> introduces flag_cunroll_grow_size.  With this patch, we can control
>> >> cunroll and RTL unroller indepently.
>> >>
>> >> Bootstrap/regtest pass on powerpc64le. OK for trunk? And backport to
>> >> GCC10 after week?
>> >>
>> >>
>> >> +funroll-completely-grow-size
>> >> +Var(flag_cunroll_grow_size) Init(2)
>> >> +; Control cunroll to allow size growth during complete unrolling
>> >> +
>> >
>> > So this really adds a new compiler option which would need
>> > documenting.
>> I once add 'Undocumented' (avoid shown in --help), and do not add
>> 'Common' (avoid --help=common).  What I want to do is avoid expose this
>> to user.
>> While this is still an option as you said.
>>
>> >
>> > I fear we'll get into bikeshed territory here as well...  I originally thought
>> > we can use
>> >
>> > Variable
>> > int flag_cunroll_grow_size;
>>
>> Thanks, this code is definetly a variable instead an option. I would try
>> this way.
>> >
>> > but now realize that does not work well with LTO without adjusting
>> > the awk scripts to generate option saving/restoring.  For your patch
>> > you'd need to add 'Optimization' to get the flag streamed properly,
>> > you should also verify the target adjustment done in the backend
>> > is reflected in LTO mode.
>>
>> At here, internal option is relative simple 'Optimization' could help.
>> When trying 'Variable', I will verify it in LTO mode.
>
> It won't work without adjusting the awk scripts.  So go with
>
> funroll-completely-grow-size
> Undocumented Optimization Var(flag_cunroll_grow_size)
> EnabledBy(funroll-loops || fpeel-loops)
> ; ...
>
EnabledBy(funroll-loops || fpeel-loops) does not works as we expected:
"-funroll-loops -fno-peel-loops" turns off flag_cunroll_grow_size.

Through "EnabledBy", a flag can be turned, and also can be turned off by
the "EnabledBy option", only if the flag is not specifed through commond
line.  

> and enable it at O3+.  AUTODETECT_VALUE doesn't make sense for
> an option not supposed to be set by users?
>

global_options_set.x_flagxxx can be used to check if option is set by
user.  But it does not work well here neither, because we also care of
if the flag is override by OPTION_OPTIMIZATION_TABLE or
OPTION_OVERRIDE. 

AUTODETECT_VALUE(value is 2) is used for some flags like flag_web,
flag_rename_registers, flag_var_tracking, flag_tree_cselim...
And this way could be used to check if the flag is effective(on/off)
either explicit set by command line or implicit set through
OPTION_OVERRIDE or OPTION_OPTIMIZATION_TABLE.
So, I use it here.

Thanks again!
Jiufu

>
>> >
>> >>  ; Nonzero means that loop optimizer may assume that the induction variables
>> >>
>> >> +  /* Allow cunroll to grow size accordingly.  */
>> >> +  if (flag_cunroll_grow_size == AUTODETECT_VALUE)
>> >> +    flag_cunroll_grow_size = flag_unroll_loops || flag_peel_loops;
>> >> +
>> >
>> > Any reason to not use EnabledBy(funroll-loops || fpeel-loops)?
>>
>> With tests and checking the generated code(e.g. options.c), I find that
>> this setting has some unexpected behavior:
>> For example, "-funroll-loops -fno-peel-loops" turns off the flag.
>> "||" would indicate the flag will be _on/off_ by f[no]-unroll-loop or
>> f[no]-peel-loops.
>>
>> >
>> >>    /* web and rename-registers help when run after loop unrolling.  */
>> >>    if (flag_web == AUTODETECT_VALUE)
>> >>      flag_web = flag_unroll_loops;
>>
>> >> -  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
>> >> -                                                  || flag_peel_loops
>> >> +  unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size
>> >>                                                    || optimize >= 3, true);
>> >
>> > Given we check optimize >= 3 here please enable the flag by default
>> > at O3+ via opts.c:default_options_table and also elide the optimize >= 3
>> > check.  That way -fno-unroll-completely-grow-size would have the desired effect.
>> >
>> Actually in code flag_peel_loops is enabled at O3+, so, "|| optimize >=
>> 3" could be removed.  Like you said, this helps to set negative form
>> even at -O3.
>
> You are right.
>
>> > Now back to the option name ... if we expose the option we should apply
>> > some forward looking.  Currently cunroll cannot be disabled or enabled
>> > with a flag and the desired new flag simply tunes one knob on it.  How
>> > about adding
>> >
>> > -fcomplete-unroll-loops[=may-grow]
>> -fcomplete-unroll-loops[=may-grow|inner|outer]
>> >
>> > to be able to further extend this later (there's the knob to only unroll
>> > non-outermost loops and the knob whether to unroll loops where
>> > intermediate exits are not statically predicted - incompletely controlled
>> > by -fpeel-loops).  There's unfortunately no existing examples that allows
>> > multiple flags like -fcomlete-unroll-loops=may-grow,outer other than
>> > the sanitizers which have manual option parsing.
>> >
>> > So if there's no good suggestion from option folks maybe go with
>> >
>> > -fcomplete-unroll-loops-may-grow
>> >
>> > (ick).  And on a second thought -fcomplete-unroll-loops[=...] should
>> > be -funroll-loops[={complete,may-grow,all}] to cover all unrolling
>> > bases?
>> >
>> > I really hate to explode the number of options users have to
>> > consider optimizing their code ...
>> >
>> > So if we can defer all this thinking and make a non-option flag
>> > variable work that would be best IMHO.
>> Yes, a few things are need trade-off when designing a user options.
>>
>>
>> Jiufu
>> >
>> > Richard.
>> >
>> >>    if (peeled_loops)
>> >>      {
>> >> --
>> >> 2.17.1
>> >>
Jiufu Guo June 2, 2020, 2:10 a.m. UTC | #6
Jiufu Guo <guojiufu@linux.ibm.com> writes:

Hi,

I updated the patch just a little accordinlgy.  Thanks!

diff --git a/gcc/common.opt b/gcc/common.opt
index 4464049fc1f..570e2aa53c8 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2856,6 +2856,10 @@ funroll-all-loops
 Common Report Var(flag_unroll_all_loops) Optimization
 Perform loop unrolling for all loops.
 
+funroll-completely-grow-size
+Common Undocumented Var(flag_cunroll_grow_size) Init(2) Optimization
+; Internal undocumented flag, allow size growth during complete unrolling
+
 ; Nonzero means that loop optimizer may assume that the induction variables
 ; that control loops do not overflow and that the loops with nontrivial
 ; exit condition are not infinite
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 96316fbd23b..8d52358efdd 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1474,6 +1474,10 @@ process_options (void)
   if (flag_unroll_all_loops)
     flag_unroll_loops = 1;
 
+  /* Allow cunroll to grow size accordingly.  */
+  if (flag_cunroll_grow_size == AUTODETECT_VALUE)
+    flag_cunroll_grow_size = flag_unroll_loops || flag_peel_loops;
+
   /* web and rename-registers help when run after loop unrolling.  */
   if (flag_web == AUTODETECT_VALUE)
     flag_web = flag_unroll_loops;
diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
index 8ab6ab3330c..298ab215530 100644
--- a/gcc/tree-ssa-loop-ivcanon.c
+++ b/gcc/tree-ssa-loop-ivcanon.c
@@ -1603,9 +1603,8 @@ pass_complete_unroll::execute (function *fun)
      re-peeling the same loop multiple times.  */
   if (flag_peel_loops)
     peeled_loops = BITMAP_ALLOC (NULL);
-  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
-						   || flag_peel_loops
-						   || optimize >= 3, true);
+  unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size, 
+						   true);
   if (peeled_loops)
     {
       BITMAP_FREE (peeled_loops);

BR,
Jiufu

> Richard Biener <richard.guenther@gmail.com> writes:
>

>>> >> From: Jiufu Guo <guojiufu@linux.ibm.com>
>>> >>
>>> >> Currently GIMPLE complete unroller(cunroll) is checking
>>> >> flag_unroll_loops and flag_peel_loops to see if allow size growth.
>>> >> Beside affects curnoll, flag_unroll_loops also controls RTL unroler.
>>> >> To have more freedom to control cunroll and RTL unroller, this patch
>>> >> introduces flag_cunroll_grow_size.  With this patch, we can control
>>> >> cunroll and RTL unroller indepently.
>>> >>
>>> >> Bootstrap/regtest pass on powerpc64le. OK for trunk? And backport to
>>> >> GCC10 after week?
>>> >>
>>> >>
>>> >> +funroll-completely-grow-size
>>> >> +Var(flag_cunroll_grow_size) Init(2)
>>> >> +; Control cunroll to allow size growth during complete unrolling
>>> >> +
>>> >
>>
>> It won't work without adjusting the awk scripts.  So go with
>>
>> funroll-completely-grow-size
>> Undocumented Optimization Var(flag_cunroll_grow_size)
>> EnabledBy(funroll-loops || fpeel-loops)
>> ; ...
>>
> EnabledBy(funroll-loops || fpeel-loops) does not works as we expected:
> "-funroll-loops -fno-peel-loops" turns off flag_cunroll_grow_size.
>
> Through "EnabledBy", a flag can be turned, and also can be turned off by
> the "EnabledBy option", only if the flag is not specifed through commond
> line.  
>
>> and enable it at O3+.  AUTODETECT_VALUE doesn't make sense for
>> an option not supposed to be set by users?
>>
>
> global_options_set.x_flagxxx can be used to check if option is set by
> user.  But it does not work well here neither, because we also care of
> if the flag is override by OPTION_OPTIMIZATION_TABLE or
> OPTION_OVERRIDE. 
>
> AUTODETECT_VALUE(value is 2) is used for some flags like flag_web,
> flag_rename_registers, flag_var_tracking, flag_tree_cselim...
> And this way could be used to check if the flag is effective(on/off)
> either explicit set by command line or implicit set through
> OPTION_OVERRIDE or OPTION_OPTIMIZATION_TABLE.
> So, I use it here.
Richard Biener June 2, 2020, 11:15 a.m. UTC | #7
On Tue, Jun 2, 2020 at 4:10 AM Jiufu Guo <guojiufu@linux.ibm.com> wrote:
>
> Jiufu Guo <guojiufu@linux.ibm.com> writes:
>
> Hi,
>
> I updated the patch just a little accordinlgy.  Thanks!
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 4464049fc1f..570e2aa53c8 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2856,6 +2856,10 @@ funroll-all-loops
>  Common Report Var(flag_unroll_all_loops) Optimization
>  Perform loop unrolling for all loops.
>
> +funroll-completely-grow-size
> +Common Undocumented Var(flag_cunroll_grow_size) Init(2) Optimization
> +; Internal undocumented flag, allow size growth during complete unrolling
> +
>  ; Nonzero means that loop optimizer may assume that the induction variables
>  ; that control loops do not overflow and that the loops with nontrivial
>  ; exit condition are not infinite
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index 96316fbd23b..8d52358efdd 100644
> --- a/gcc/toplev.c
> +++ b/gcc/toplev.c
> @@ -1474,6 +1474,10 @@ process_options (void)
>    if (flag_unroll_all_loops)
>      flag_unroll_loops = 1;
>
> +  /* Allow cunroll to grow size accordingly.  */
> +  if (flag_cunroll_grow_size == AUTODETECT_VALUE)
> +    flag_cunroll_grow_size = flag_unroll_loops || flag_peel_loops;
> +

(*)

>    /* web and rename-registers help when run after loop unrolling.  */
>    if (flag_web == AUTODETECT_VALUE)
>      flag_web = flag_unroll_loops;
> diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
> index 8ab6ab3330c..298ab215530 100644
> --- a/gcc/tree-ssa-loop-ivcanon.c
> +++ b/gcc/tree-ssa-loop-ivcanon.c
> @@ -1603,9 +1603,8 @@ pass_complete_unroll::execute (function *fun)
>       re-peeling the same loop multiple times.  */
>    if (flag_peel_loops)
>      peeled_loops = BITMAP_ALLOC (NULL);
> -  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
> -                                                  || flag_peel_loops
> -                                                  || optimize >= 3, true);
> +  unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size,

With -O3 -fno-peel-loops this does not have the same effect anymore.
So above (*) you'd need to check || optimize >= 3 as well.

> +                                                  true);
>    if (peeled_loops)
>      {
>        BITMAP_FREE (peeled_loops);
>
> BR,
> Jiufu
>
> > Richard Biener <richard.guenther@gmail.com> writes:
> >
>
> >>> >> From: Jiufu Guo <guojiufu@linux.ibm.com>
> >>> >>
> >>> >> Currently GIMPLE complete unroller(cunroll) is checking
> >>> >> flag_unroll_loops and flag_peel_loops to see if allow size growth.
> >>> >> Beside affects curnoll, flag_unroll_loops also controls RTL unroler.
> >>> >> To have more freedom to control cunroll and RTL unroller, this patch
> >>> >> introduces flag_cunroll_grow_size.  With this patch, we can control
> >>> >> cunroll and RTL unroller indepently.
> >>> >>
> >>> >> Bootstrap/regtest pass on powerpc64le. OK for trunk? And backport to
> >>> >> GCC10 after week?
> >>> >>
> >>> >>
> >>> >> +funroll-completely-grow-size
> >>> >> +Var(flag_cunroll_grow_size) Init(2)
> >>> >> +; Control cunroll to allow size growth during complete unrolling
> >>> >> +
> >>> >
> >>
> >> It won't work without adjusting the awk scripts.  So go with
> >>
> >> funroll-completely-grow-size
> >> Undocumented Optimization Var(flag_cunroll_grow_size)
> >> EnabledBy(funroll-loops || fpeel-loops)
> >> ; ...
> >>
> > EnabledBy(funroll-loops || fpeel-loops) does not works as we expected:
> > "-funroll-loops -fno-peel-loops" turns off flag_cunroll_grow_size.
> >
> > Through "EnabledBy", a flag can be turned, and also can be turned off by
> > the "EnabledBy option", only if the flag is not specifed through commond
> > line.
> >
> >> and enable it at O3+.  AUTODETECT_VALUE doesn't make sense for
> >> an option not supposed to be set by users?
> >>
> >
> > global_options_set.x_flagxxx can be used to check if option is set by
> > user.  But it does not work well here neither, because we also care of
> > if the flag is override by OPTION_OPTIMIZATION_TABLE or
> > OPTION_OVERRIDE.
> >
> > AUTODETECT_VALUE(value is 2) is used for some flags like flag_web,
> > flag_rename_registers, flag_var_tracking, flag_tree_cselim...
> > And this way could be used to check if the flag is effective(on/off)
> > either explicit set by command line or implicit set through
> > OPTION_OVERRIDE or OPTION_OPTIMIZATION_TABLE.
> > So, I use it here.
>
>
Jiufu Guo June 3, 2020, 1:53 a.m. UTC | #8
Richard Biener <richard.guenther@gmail.com> writes:

> On Tue, Jun 2, 2020 at 4:10 AM Jiufu Guo <guojiufu@linux.ibm.com> wrote:
>>
>> Jiufu Guo <guojiufu@linux.ibm.com> writes:
>>
>> Hi,
>>
>> I updated the patch just a little accordinlgy.  Thanks!
>>
>> diff --git a/gcc/common.opt b/gcc/common.opt
>> index 4464049fc1f..570e2aa53c8 100644
>> --- a/gcc/common.opt
>> +++ b/gcc/common.opt
>> @@ -2856,6 +2856,10 @@ funroll-all-loops
>>  Common Report Var(flag_unroll_all_loops) Optimization
>>  Perform loop unrolling for all loops.
>>
>> +funroll-completely-grow-size
>> +Common Undocumented Var(flag_cunroll_grow_size) Init(2) Optimization
>> +; Internal undocumented flag, allow size growth during complete unrolling
>> +
>>  ; Nonzero means that loop optimizer may assume that the induction variables
>>  ; that control loops do not overflow and that the loops with nontrivial
>>  ; exit condition are not infinite
>> diff --git a/gcc/toplev.c b/gcc/toplev.c
>> index 96316fbd23b..8d52358efdd 100644
>> --- a/gcc/toplev.c
>> +++ b/gcc/toplev.c
>> @@ -1474,6 +1474,10 @@ process_options (void)
>>    if (flag_unroll_all_loops)
>>      flag_unroll_loops = 1;
>>
>> +  /* Allow cunroll to grow size accordingly.  */
>> +  if (flag_cunroll_grow_size == AUTODETECT_VALUE)
>> +    flag_cunroll_grow_size = flag_unroll_loops || flag_peel_loops;
>> +
>
> (*)
>
>>    /* web and rename-registers help when run after loop unrolling.  */
>>    if (flag_web == AUTODETECT_VALUE)
>>      flag_web = flag_unroll_loops;
>> diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
>> index 8ab6ab3330c..298ab215530 100644
>> --- a/gcc/tree-ssa-loop-ivcanon.c
>> +++ b/gcc/tree-ssa-loop-ivcanon.c
>> @@ -1603,9 +1603,8 @@ pass_complete_unroll::execute (function *fun)
>>       re-peeling the same loop multiple times.  */
>>    if (flag_peel_loops)
>>      peeled_loops = BITMAP_ALLOC (NULL);
>> -  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
>> -                                                  || flag_peel_loops
>> -                                                  || optimize >= 3, true);
>> +  unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size,
>
> With -O3 -fno-peel-loops this does not have the same effect anymore.
> So above (*) you'd need to check || optimize >= 3 as well.

Oh, yes.  Thanks for your kindly review!

BR,
Jiufu

>
>> +                                                  true);
>>    if (peeled_loops)
>>      {
>>        BITMAP_FREE (peeled_loops);
>>
>> BR,
>> Jiufu
>>
>> > Richard Biener <richard.guenther@gmail.com> writes:
>> >
>>
>> >>> >> From: Jiufu Guo <guojiufu@linux.ibm.com>
>> >>> >>
>> >>> >> Currently GIMPLE complete unroller(cunroll) is checking
>> >>> >> flag_unroll_loops and flag_peel_loops to see if allow size growth.
>> >>> >> Beside affects curnoll, flag_unroll_loops also controls RTL unroler.
>> >>> >> To have more freedom to control cunroll and RTL unroller, this patch
>> >>> >> introduces flag_cunroll_grow_size.  With this patch, we can control
>> >>> >> cunroll and RTL unroller indepently.
>> >>> >>
>> >>> >> Bootstrap/regtest pass on powerpc64le. OK for trunk? And backport to
>> >>> >> GCC10 after week?
>> >>> >>
>> >>> >>
>> >>> >> +funroll-completely-grow-size
>> >>> >> +Var(flag_cunroll_grow_size) Init(2)
>> >>> >> +; Control cunroll to allow size growth during complete unrolling
>> >>> >> +
>> >>> >
>> >>
>> >> It won't work without adjusting the awk scripts.  So go with
>> >>
>> >> funroll-completely-grow-size
>> >> Undocumented Optimization Var(flag_cunroll_grow_size)
>> >> EnabledBy(funroll-loops || fpeel-loops)
>> >> ; ...
>> >>
>> > EnabledBy(funroll-loops || fpeel-loops) does not works as we expected:
>> > "-funroll-loops -fno-peel-loops" turns off flag_cunroll_grow_size.
>> >
>> > Through "EnabledBy", a flag can be turned, and also can be turned off by
>> > the "EnabledBy option", only if the flag is not specifed through commond
>> > line.
>> >
>> >> and enable it at O3+.  AUTODETECT_VALUE doesn't make sense for
>> >> an option not supposed to be set by users?
>> >>
>> >
>> > global_options_set.x_flagxxx can be used to check if option is set by
>> > user.  But it does not work well here neither, because we also care of
>> > if the flag is override by OPTION_OPTIMIZATION_TABLE or
>> > OPTION_OVERRIDE.
>> >
>> > AUTODETECT_VALUE(value is 2) is used for some flags like flag_web,
>> > flag_rename_registers, flag_var_tracking, flag_tree_cselim...
>> > And this way could be used to check if the flag is effective(on/off)
>> > either explicit set by command line or implicit set through
>> > OPTION_OVERRIDE or OPTION_OPTIMIZATION_TABLE.
>> > So, I use it here.
>>
>>
Jiufu Guo June 4, 2020, 3:34 a.m. UTC | #9
Jiufu Guo <guojiufu@linux.ibm.com> writes:

Hi,

Patch is updated a little according to comments.
Please see if this is ok to commit.

diff --git a/gcc/common.opt b/gcc/common.opt
index 4464049fc1f..570e2aa53c8 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2856,6 +2856,10 @@ funroll-all-loops
 Common Report Var(flag_unroll_all_loops) Optimization
 Perform loop unrolling for all loops.
 
+funroll-completely-grow-size
+Undocumented Var(flag_cunroll_grow_size) Init(2) Optimization
+; Internal undocumented flag, allow size growth during complete unrolling
+
 ; Nonzero means that loop optimizer may assume that the induction variables
 ; that control loops do not overflow and that the loops with nontrivial
 ; exit condition are not infinite
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 96316fbd23b..e1010077ddb 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1474,6 +1474,11 @@ process_options (void)
   if (flag_unroll_all_loops)
     flag_unroll_loops = 1;
 
+  /* Allow cunroll to grow size accordingly.  */
+  if (flag_cunroll_grow_size == AUTODETECT_VALUE)
+    flag_cunroll_grow_size
+      = flag_unroll_loops || flag_peel_loops || optimize >= 3;
+
   /* web and rename-registers help when run after loop unrolling.  */
   if (flag_web == AUTODETECT_VALUE)
     flag_web = flag_unroll_loops;
diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
index 8ab6ab3330c..298ab215530 100644
--- a/gcc/tree-ssa-loop-ivcanon.c
+++ b/gcc/tree-ssa-loop-ivcanon.c
@@ -1603,9 +1603,8 @@ pass_complete_unroll::execute (function *fun)
      re-peeling the same loop multiple times.  */
   if (flag_peel_loops)
     peeled_loops = BITMAP_ALLOC (NULL);
-  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
-						   || flag_peel_loops
-						   || optimize >= 3, true);
+  unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size, 
+						   true);
   if (peeled_loops)
     {
       BITMAP_FREE (peeled_loops);

BR,
Jiufu


> Richard Biener <richard.guenther@gmail.com> writes:
>
>> On Tue, Jun 2, 2020 at 4:10 AM Jiufu Guo <guojiufu@linux.ibm.com> wrote:
>>>
>>> Jiufu Guo <guojiufu@linux.ibm.com> writes:
>>>
>>> Hi,
>>>
>>> I updated the patch just a little accordinlgy.  Thanks!
>>>
>>> diff --git a/gcc/common.opt b/gcc/common.opt
>>> index 4464049fc1f..570e2aa53c8 100644
>>> --- a/gcc/common.opt
>>> +++ b/gcc/common.opt
>>> @@ -2856,6 +2856,10 @@ funroll-all-loops
>>>  Common Report Var(flag_unroll_all_loops) Optimization
>>>  Perform loop unrolling for all loops.
>>>
>>> +funroll-completely-grow-size
>>> +Common Undocumented Var(flag_cunroll_grow_size) Init(2) Optimization
>>> +; Internal undocumented flag, allow size growth during complete unrolling
>>> +
>>>  ; Nonzero means that loop optimizer may assume that the induction variables
>>>  ; that control loops do not overflow and that the loops with nontrivial
>>>  ; exit condition are not infinite
>>> diff --git a/gcc/toplev.c b/gcc/toplev.c
>>> index 96316fbd23b..8d52358efdd 100644
>>> --- a/gcc/toplev.c
>>> +++ b/gcc/toplev.c
>>> @@ -1474,6 +1474,10 @@ process_options (void)
>>>    if (flag_unroll_all_loops)
>>>      flag_unroll_loops = 1;
>>>
>>> +  /* Allow cunroll to grow size accordingly.  */
>>> +  if (flag_cunroll_grow_size == AUTODETECT_VALUE)
>>> +    flag_cunroll_grow_size = flag_unroll_loops || flag_peel_loops;
>>> +
>>
>> (*)
>>
>>>    /* web and rename-registers help when run after loop unrolling.  */
>>>    if (flag_web == AUTODETECT_VALUE)
>>>      flag_web = flag_unroll_loops;
>>> diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
>>> index 8ab6ab3330c..298ab215530 100644
>>> --- a/gcc/tree-ssa-loop-ivcanon.c
>>> +++ b/gcc/tree-ssa-loop-ivcanon.c
>>> @@ -1603,9 +1603,8 @@ pass_complete_unroll::execute (function *fun)
>>>       re-peeling the same loop multiple times.  */
>>>    if (flag_peel_loops)
>>>      peeled_loops = BITMAP_ALLOC (NULL);
>>> -  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
>>> -                                                  || flag_peel_loops
>>> -                                                  || optimize >= 3, true);
>>> +  unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size,
>>
>> With -O3 -fno-peel-loops this does not have the same effect anymore.
>> So above (*) you'd need to check || optimize >= 3 as well.
>
> Oh, yes.  Thanks for your kindly review!
>
> BR,
> Jiufu
>
>>
>>> +                                                  true);
>>>    if (peeled_loops)
>>>      {
>>>        BITMAP_FREE (peeled_loops);
>>>
>>> BR,
>>> Jiufu
>>>
>>> > Richard Biener <richard.guenther@gmail.com> writes:
>>> >
>>>
>>> >>> >> From: Jiufu Guo <guojiufu@linux.ibm.com>
>>> >>> >>
>>> >>> >> Currently GIMPLE complete unroller(cunroll) is checking
>>> >>> >> flag_unroll_loops and flag_peel_loops to see if allow size growth.
>>> >>> >> Beside affects curnoll, flag_unroll_loops also controls RTL unroler.
>>> >>> >> To have more freedom to control cunroll and RTL unroller, this patch
>>> >>> >> introduces flag_cunroll_grow_size.  With this patch, we can control
>>> >>> >> cunroll and RTL unroller indepently.
>>> >>> >>
>>> >>> >> Bootstrap/regtest pass on powerpc64le. OK for trunk? And backport to
>>> >>> >> GCC10 after week?
>>> >>> >>
>>> >>> >>
>>> >>> >> +funroll-completely-grow-size
>>> >>> >> +Var(flag_cunroll_grow_size) Init(2)
>>> >>> >> +; Control cunroll to allow size growth during complete unrolling
>>> >>> >> +
>>> >>> >
>>> >>
>>> >> It won't work without adjusting the awk scripts.  So go with
>>> >>
>>> >> funroll-completely-grow-size
>>> >> Undocumented Optimization Var(flag_cunroll_grow_size)
>>> >> EnabledBy(funroll-loops || fpeel-loops)
>>> >> ; ...
>>> >>
>>> > EnabledBy(funroll-loops || fpeel-loops) does not works as we expected:
>>> > "-funroll-loops -fno-peel-loops" turns off flag_cunroll_grow_size.
>>> >
>>> > Through "EnabledBy", a flag can be turned, and also can be turned off by
>>> > the "EnabledBy option", only if the flag is not specifed through commond
>>> > line.
>>> >
>>> >> and enable it at O3+.  AUTODETECT_VALUE doesn't make sense for
>>> >> an option not supposed to be set by users?
>>> >>
>>> >
>>> > global_options_set.x_flagxxx can be used to check if option is set by
>>> > user.  But it does not work well here neither, because we also care of
>>> > if the flag is override by OPTION_OPTIMIZATION_TABLE or
>>> > OPTION_OVERRIDE.
>>> >
>>> > AUTODETECT_VALUE(value is 2) is used for some flags like flag_web,
>>> > flag_rename_registers, flag_var_tracking, flag_tree_cselim...
>>> > And this way could be used to check if the flag is effective(on/off)
>>> > either explicit set by command line or implicit set through
>>> > OPTION_OVERRIDE or OPTION_OPTIMIZATION_TABLE.
>>> > So, I use it here.
>>>
>>>
Richard Biener June 4, 2020, 6:46 a.m. UTC | #10
On Thu, Jun 4, 2020 at 5:34 AM Jiufu Guo <guojiufu@linux.ibm.com> wrote:
>
> Jiufu Guo <guojiufu@linux.ibm.com> writes:
>
> Hi,
>
> Patch is updated a little according to comments.
> Please see if this is ok to commit.

OK with a proper ChangeLog after bootstrap / testing.

It's also OK to backport this to the GCC 10 branch if powerpc
folks want to restore GIMPLE cunroll behavior for their target.

Thanks,
Richard.

>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 4464049fc1f..570e2aa53c8 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2856,6 +2856,10 @@ funroll-all-loops
>  Common Report Var(flag_unroll_all_loops) Optimization
>  Perform loop unrolling for all loops.
>
> +funroll-completely-grow-size
> +Undocumented Var(flag_cunroll_grow_size) Init(2) Optimization
> +; Internal undocumented flag, allow size growth during complete unrolling
> +
>  ; Nonzero means that loop optimizer may assume that the induction variables
>  ; that control loops do not overflow and that the loops with nontrivial
>  ; exit condition are not infinite
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index 96316fbd23b..e1010077ddb 100644
> --- a/gcc/toplev.c
> +++ b/gcc/toplev.c
> @@ -1474,6 +1474,11 @@ process_options (void)
>    if (flag_unroll_all_loops)
>      flag_unroll_loops = 1;
>
> +  /* Allow cunroll to grow size accordingly.  */
> +  if (flag_cunroll_grow_size == AUTODETECT_VALUE)
> +    flag_cunroll_grow_size
> +      = flag_unroll_loops || flag_peel_loops || optimize >= 3;
> +
>    /* web and rename-registers help when run after loop unrolling.  */
>    if (flag_web == AUTODETECT_VALUE)
>      flag_web = flag_unroll_loops;
> diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
> index 8ab6ab3330c..298ab215530 100644
> --- a/gcc/tree-ssa-loop-ivcanon.c
> +++ b/gcc/tree-ssa-loop-ivcanon.c
> @@ -1603,9 +1603,8 @@ pass_complete_unroll::execute (function *fun)
>       re-peeling the same loop multiple times.  */
>    if (flag_peel_loops)
>      peeled_loops = BITMAP_ALLOC (NULL);
> -  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
> -                                                  || flag_peel_loops
> -                                                  || optimize >= 3, true);
> +  unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size,
> +                                                  true);
>    if (peeled_loops)
>      {
>        BITMAP_FREE (peeled_loops);
>
> BR,
> Jiufu
>
>
> > Richard Biener <richard.guenther@gmail.com> writes:
> >
> >> On Tue, Jun 2, 2020 at 4:10 AM Jiufu Guo <guojiufu@linux.ibm.com> wrote:
> >>>
> >>> Jiufu Guo <guojiufu@linux.ibm.com> writes:
> >>>
> >>> Hi,
> >>>
> >>> I updated the patch just a little accordinlgy.  Thanks!
> >>>
> >>> diff --git a/gcc/common.opt b/gcc/common.opt
> >>> index 4464049fc1f..570e2aa53c8 100644
> >>> --- a/gcc/common.opt
> >>> +++ b/gcc/common.opt
> >>> @@ -2856,6 +2856,10 @@ funroll-all-loops
> >>>  Common Report Var(flag_unroll_all_loops) Optimization
> >>>  Perform loop unrolling for all loops.
> >>>
> >>> +funroll-completely-grow-size
> >>> +Common Undocumented Var(flag_cunroll_grow_size) Init(2) Optimization
> >>> +; Internal undocumented flag, allow size growth during complete unrolling
> >>> +
> >>>  ; Nonzero means that loop optimizer may assume that the induction variables
> >>>  ; that control loops do not overflow and that the loops with nontrivial
> >>>  ; exit condition are not infinite
> >>> diff --git a/gcc/toplev.c b/gcc/toplev.c
> >>> index 96316fbd23b..8d52358efdd 100644
> >>> --- a/gcc/toplev.c
> >>> +++ b/gcc/toplev.c
> >>> @@ -1474,6 +1474,10 @@ process_options (void)
> >>>    if (flag_unroll_all_loops)
> >>>      flag_unroll_loops = 1;
> >>>
> >>> +  /* Allow cunroll to grow size accordingly.  */
> >>> +  if (flag_cunroll_grow_size == AUTODETECT_VALUE)
> >>> +    flag_cunroll_grow_size = flag_unroll_loops || flag_peel_loops;
> >>> +
> >>
> >> (*)
> >>
> >>>    /* web and rename-registers help when run after loop unrolling.  */
> >>>    if (flag_web == AUTODETECT_VALUE)
> >>>      flag_web = flag_unroll_loops;
> >>> diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
> >>> index 8ab6ab3330c..298ab215530 100644
> >>> --- a/gcc/tree-ssa-loop-ivcanon.c
> >>> +++ b/gcc/tree-ssa-loop-ivcanon.c
> >>> @@ -1603,9 +1603,8 @@ pass_complete_unroll::execute (function *fun)
> >>>       re-peeling the same loop multiple times.  */
> >>>    if (flag_peel_loops)
> >>>      peeled_loops = BITMAP_ALLOC (NULL);
> >>> -  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
> >>> -                                                  || flag_peel_loops
> >>> -                                                  || optimize >= 3, true);
> >>> +  unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size,
> >>
> >> With -O3 -fno-peel-loops this does not have the same effect anymore.
> >> So above (*) you'd need to check || optimize >= 3 as well.
> >
> > Oh, yes.  Thanks for your kindly review!
> >
> > BR,
> > Jiufu
> >
> >>
> >>> +                                                  true);
> >>>    if (peeled_loops)
> >>>      {
> >>>        BITMAP_FREE (peeled_loops);
> >>>
> >>> BR,
> >>> Jiufu
> >>>
> >>> > Richard Biener <richard.guenther@gmail.com> writes:
> >>> >
> >>>
> >>> >>> >> From: Jiufu Guo <guojiufu@linux.ibm.com>
> >>> >>> >>
> >>> >>> >> Currently GIMPLE complete unroller(cunroll) is checking
> >>> >>> >> flag_unroll_loops and flag_peel_loops to see if allow size growth.
> >>> >>> >> Beside affects curnoll, flag_unroll_loops also controls RTL unroler.
> >>> >>> >> To have more freedom to control cunroll and RTL unroller, this patch
> >>> >>> >> introduces flag_cunroll_grow_size.  With this patch, we can control
> >>> >>> >> cunroll and RTL unroller indepently.
> >>> >>> >>
> >>> >>> >> Bootstrap/regtest pass on powerpc64le. OK for trunk? And backport to
> >>> >>> >> GCC10 after week?
> >>> >>> >>
> >>> >>> >>
> >>> >>> >> +funroll-completely-grow-size
> >>> >>> >> +Var(flag_cunroll_grow_size) Init(2)
> >>> >>> >> +; Control cunroll to allow size growth during complete unrolling
> >>> >>> >> +
> >>> >>> >
> >>> >>
> >>> >> It won't work without adjusting the awk scripts.  So go with
> >>> >>
> >>> >> funroll-completely-grow-size
> >>> >> Undocumented Optimization Var(flag_cunroll_grow_size)
> >>> >> EnabledBy(funroll-loops || fpeel-loops)
> >>> >> ; ...
> >>> >>
> >>> > EnabledBy(funroll-loops || fpeel-loops) does not works as we expected:
> >>> > "-funroll-loops -fno-peel-loops" turns off flag_cunroll_grow_size.
> >>> >
> >>> > Through "EnabledBy", a flag can be turned, and also can be turned off by
> >>> > the "EnabledBy option", only if the flag is not specifed through commond
> >>> > line.
> >>> >
> >>> >> and enable it at O3+.  AUTODETECT_VALUE doesn't make sense for
> >>> >> an option not supposed to be set by users?
> >>> >>
> >>> >
> >>> > global_options_set.x_flagxxx can be used to check if option is set by
> >>> > user.  But it does not work well here neither, because we also care of
> >>> > if the flag is override by OPTION_OPTIMIZATION_TABLE or
> >>> > OPTION_OVERRIDE.
> >>> >
> >>> > AUTODETECT_VALUE(value is 2) is used for some flags like flag_web,
> >>> > flag_rename_registers, flag_var_tracking, flag_tree_cselim...
> >>> > And this way could be used to check if the flag is effective(on/off)
> >>> > either explicit set by command line or implicit set through
> >>> > OPTION_OVERRIDE or OPTION_OPTIMIZATION_TABLE.
> >>> > So, I use it here.
> >>>
> >>>
Segher Boessenkool June 4, 2020, 5:53 p.m. UTC | #11
On Thu, Jun 04, 2020 at 08:46:23AM +0200, Richard Biener wrote:
> On Thu, Jun 4, 2020 at 5:34 AM Jiufu Guo <guojiufu@linux.ibm.com> wrote:
> > Patch is updated a little according to comments.
> > Please see if this is ok to commit.
> 
> OK with a proper ChangeLog after bootstrap / testing.
> 
> It's also OK to backport this to the GCC 10 branch if powerpc
> folks want to restore GIMPLE cunroll behavior for their target.

Yes please!  (But first make sure it works as expected, no fallout, etc.)

Thanks to both of you,


Segher
Jiufu Guo June 8, 2020, 3 a.m. UTC | #12
On 2020-06-05 01:53, Segher Boessenkool wrote:
> On Thu, Jun 04, 2020 at 08:46:23AM +0200, Richard Biener wrote:
>> On Thu, Jun 4, 2020 at 5:34 AM Jiufu Guo <guojiufu@linux.ibm.com> 
>> wrote:
>> > Patch is updated a little according to comments.
>> > Please see if this is ok to commit.
>> 
>> OK with a proper ChangeLog after bootstrap / testing.
>> 
>> It's also OK to backport this to the GCC 10 branch if powerpc
>> folks want to restore GIMPLE cunroll behavior for their target.
> 
> Yes please!  (But first make sure it works as expected, no fallout, 
> etc.)
> 
> Thanks to both of you,
> 
> 
> Segher

Thanks all for your comments and suggestions!

The patches were committed to trunk.
I thinking it would still need to tune cunroll to avoid peeling complex 
loop(e.g. early exits loop) or tune the params of cunroll.


BR,
Jiufu
diff mbox series

Patch

diff --git a/gcc/common.opt b/gcc/common.opt
index 4464049fc1f..1d0fa7b1749 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2856,6 +2856,10 @@  funroll-all-loops
 Common Report Var(flag_unroll_all_loops) Optimization
 Perform loop unrolling for all loops.
 
+funroll-completely-grow-size
+Var(flag_cunroll_grow_size) Init(2)
+; Control cunroll to allow size growth during complete unrolling
+
 ; Nonzero means that loop optimizer may assume that the induction variables
 ; that control loops do not overflow and that the loops with nontrivial
 ; exit condition are not infinite
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 96316fbd23b..8d52358efdd 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1474,6 +1474,10 @@  process_options (void)
   if (flag_unroll_all_loops)
     flag_unroll_loops = 1;
 
+  /* Allow cunroll to grow size accordingly.  */
+  if (flag_cunroll_grow_size == AUTODETECT_VALUE)
+    flag_cunroll_grow_size = flag_unroll_loops || flag_peel_loops;
+
   /* web and rename-registers help when run after loop unrolling.  */
   if (flag_web == AUTODETECT_VALUE)
     flag_web = flag_unroll_loops;
diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
index 8ab6ab3330c..d6a4617a6a1 100644
--- a/gcc/tree-ssa-loop-ivcanon.c
+++ b/gcc/tree-ssa-loop-ivcanon.c
@@ -1603,8 +1603,7 @@  pass_complete_unroll::execute (function *fun)
      re-peeling the same loop multiple times.  */
   if (flag_peel_loops)
     peeled_loops = BITMAP_ALLOC (NULL);
-  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
-						   || flag_peel_loops
+  unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size
 						   || optimize >= 3, true);
   if (peeled_loops)
     {