diff mbox

Do not imply -fweb with -fpeel-loops

Message ID 20160531104919.GA95630@kam.mff.cuni.cz
State New
Headers show

Commit Message

Jan Hubicka May 31, 2016, 10:49 a.m. UTC
Hi,
enabling -fpeel-loops with -O3 and -Ofast had unexpected effect of enabling
-fweb and -frename-registers.  This patch drop -fweb from list of passes that
are implied by -fpeel-loops because it is no longer needed since we do peeling
at tree instead of RTL and the out-of-ssa pass cares about this.

I will look into rename registers separately. My understanding is that rename
registers is not really that specific for -funroll-loops or -fpeel-loops.
Loop unrolling may make it bit more useful because it increases expected basic
block size and thus enables scheduler to do more.

Rather it depends on target whether register renaming is win. On out of order targets
where scheduling is not that imprtant it does not seem to be worth the compile time
and code size effects, while on targets that depends on scheduling it does.
Tonight tests on x86 shows improvmeents in botan which I think are related to renaming
(because they also improved when Bern temporarily enabled it few days back),
measurable code size regression (which is probably fixable if we make rename-registers
to not increase instruction encoding size) and noticeable compile time slowdown
(not sure if it is from regrename itself that is quite easy pass or from the fact
that scheduler has more freedom and thus work harder).

On the other hand both SPECint and SPECfp improved by about 2% on Itanium and
code size reduced.  I think similar effect should happen on superscalar inorder
tarets with constant size of instruction encoding.

I would say we should handle -frename-registers same way as we do -fschedule-insns.
I.e. enable/disable it on target basis rather than budnle it to loop unrolling.
But that is for future patch.  I will commit this and propose patch making -fpeel-loops
independent of rename-registers next

Honza

Bootstrapped/regtested x86_64-linux, will commit it shortly.

	* loop-init.c (gate): Do not enale RTL loop unroller with -fpeel-loops.
	It no longer does that.
	* toplev.c (process_options): Do not enable flag_web with -fpeel-loops.

Comments

Richard Biener May 31, 2016, 11:10 a.m. UTC | #1
On Tue, May 31, 2016 at 12:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> Hi,
> enabling -fpeel-loops with -O3 and -Ofast had unexpected effect of enabling
> -fweb and -frename-registers.  This patch drop -fweb from list of passes that
> are implied by -fpeel-loops because it is no longer needed since we do peeling
> at tree instead of RTL and the out-of-ssa pass cares about this.
>
> I will look into rename registers separately. My understanding is that rename
> registers is not really that specific for -funroll-loops or -fpeel-loops.
> Loop unrolling may make it bit more useful because it increases expected basic
> block size and thus enables scheduler to do more.
>
> Rather it depends on target whether register renaming is win. On out of order targets
> where scheduling is not that imprtant it does not seem to be worth the compile time
> and code size effects, while on targets that depends on scheduling it does.
> Tonight tests on x86 shows improvmeents in botan which I think are related to renaming
> (because they also improved when Bern temporarily enabled it few days back),
> measurable code size regression (which is probably fixable if we make rename-registers
> to not increase instruction encoding size) and noticeable compile time slowdown
> (not sure if it is from regrename itself that is quite easy pass or from the fact
> that scheduler has more freedom and thus work harder).
>
> On the other hand both SPECint and SPECfp improved by about 2% on Itanium and
> code size reduced.  I think similar effect should happen on superscalar inorder
> tarets with constant size of instruction encoding.
>
> I would say we should handle -frename-registers same way as we do -fschedule-insns.
> I.e. enable/disable it on target basis rather than budnle it to loop unrolling.
> But that is for future patch.  I will commit this and propose patch making -fpeel-loops
> independent of rename-registers next

Ok.  Does -fweb still do what its documentation says (enable register
allocation to
work on pseudos directly)?  Given its downside and strong GIMPLE
optimizations maybe
it is time to remove it?

Thanks,
Richard.

> Honza
>
> Bootstrapped/regtested x86_64-linux, will commit it shortly.
>
>         * loop-init.c (gate): Do not enale RTL loop unroller with -fpeel-loops.
>         It no longer does that.
>         * toplev.c (process_options): Do not enable flag_web with -fpeel-loops.
> Index: loop-init.c
> ===================================================================
> --- loop-init.c (revision 236894)
> +++ loop-init.c (working copy)
> @@ -560,7 +560,7 @@ public:
>    /* opt_pass methods: */
>    virtual bool gate (function *)
>      {
> -      return (flag_peel_loops || flag_unroll_loops || flag_unroll_all_loops);
> +      return (flag_unroll_loops || flag_unroll_all_loops);
>      }
>
>    virtual unsigned int execute (function *);
> Index: toplev.c
> ===================================================================
> --- toplev.c    (revision 236894)
> +++ toplev.c    (working copy)
> @@ -1296,7 +1296,7 @@ process_options (void)
>
>    /* web and rename-registers help when run after loop unrolling.  */
>    if (flag_web == AUTODETECT_VALUE)
> -    flag_web = flag_unroll_loops || flag_peel_loops;
> +    flag_web = flag_unroll_loops;
>
>    if (flag_rename_registers == AUTODETECT_VALUE)
>      flag_rename_registers = flag_unroll_loops || flag_peel_loops;
Jan Hubicka May 31, 2016, 11:42 a.m. UTC | #2
> > I would say we should handle -frename-registers same way as we do -fschedule-insns.
> > I.e. enable/disable it on target basis rather than budnle it to loop unrolling.
> > But that is for future patch.  I will commit this and propose patch making -fpeel-loops
> > independent of rename-registers next
> 
> Ok.  Does -fweb still do what its documentation says (enable register
> allocation to
> work on pseudos directly)?  Given its downside and strong GIMPLE
> optimizations maybe
> it is time to remove it?

I will wait with comitting the renaming patch and we will see the effect with
-fno-unroll-loops on ia64 tester (it probably pays back more on in-order
target) and we can test effect with -funroll-loops by patching it tomorrow.

I guess IRA does live range pslitting by itself.  Register allocator is not the
only pass which does not like reuse of pseudo.  Web originally also made
-fschedule-insns (not -fschedule-insns2) and CSE to work harder.

In tree-SSA world most of reuse is avoided by out-of-ssa pass. Random reuse
these days is limited to loop unrolling and RTL expansion I think. Without
unroling webizer renames 204 pseudos on tramp3d that seems close to 0. I can
try to dig into that. With -funroll-all-loops it renames 17309 registers.
Of course it would be possible to write unroller specific renamer, but it would
hardly be any wasier than webizer.

Honza
Richard Biener May 31, 2016, 1:50 p.m. UTC | #3
On Tue, May 31, 2016 at 1:42 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> > I would say we should handle -frename-registers same way as we do -fschedule-insns.
>> > I.e. enable/disable it on target basis rather than budnle it to loop unrolling.
>> > But that is for future patch.  I will commit this and propose patch making -fpeel-loops
>> > independent of rename-registers next
>>
>> Ok.  Does -fweb still do what its documentation says (enable register
>> allocation to
>> work on pseudos directly)?  Given its downside and strong GIMPLE
>> optimizations maybe
>> it is time to remove it?
>
> I will wait with comitting the renaming patch and we will see the effect with
> -fno-unroll-loops on ia64 tester (it probably pays back more on in-order
> target) and we can test effect with -funroll-loops by patching it tomorrow.
>
> I guess IRA does live range pslitting by itself.  Register allocator is not the
> only pass which does not like reuse of pseudo.  Web originally also made
> -fschedule-insns (not -fschedule-insns2) and CSE to work harder.
>
> In tree-SSA world most of reuse is avoided by out-of-ssa pass. Random reuse
> these days is limited to loop unrolling and RTL expansion I think. Without
> unroling webizer renames 204 pseudos on tramp3d that seems close to 0. I can
> try to dig into that. With -funroll-all-loops it renames 17309 registers.
> Of course it would be possible to write unroller specific renamer, but it would
> hardly be any wasier than webizer.

And we have things like -fsplit-ivs-in-unroller that does part of it...

Richard.

> Honza
Bill Schmidt May 31, 2016, 3:02 p.m. UTC | #4
If the decision is taken to remove -fweb, please give me a heads-up as I have a target-specific pass that shares infrastructure with it.  I can factor that out to facilitate the removal; just let me know.

Thanks!

-- Bill

Bill Schmidt, Ph.D.
GCC for LInux on Power
Linux on Power Toolchain
IBM Linux Technology Center
wschmidt@linux.vnet.ibm.com




> On May 31, 2016, at 8:50 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Tue, May 31, 2016 at 1:42 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>> I would say we should handle -frename-registers same way as we do -fschedule-insns.
>>>> I.e. enable/disable it on target basis rather than budnle it to loop unrolling.
>>>> But that is for future patch.  I will commit this and propose patch making -fpeel-loops
>>>> independent of rename-registers next
>>> 
>>> Ok.  Does -fweb still do what its documentation says (enable register
>>> allocation to
>>> work on pseudos directly)?  Given its downside and strong GIMPLE
>>> optimizations maybe
>>> it is time to remove it?
>> 
>> I will wait with comitting the renaming patch and we will see the effect with
>> -fno-unroll-loops on ia64 tester (it probably pays back more on in-order
>> target) and we can test effect with -funroll-loops by patching it tomorrow.
>> 
>> I guess IRA does live range pslitting by itself.  Register allocator is not the
>> only pass which does not like reuse of pseudo.  Web originally also made
>> -fschedule-insns (not -fschedule-insns2) and CSE to work harder.
>> 
>> In tree-SSA world most of reuse is avoided by out-of-ssa pass. Random reuse
>> these days is limited to loop unrolling and RTL expansion I think. Without
>> unroling webizer renames 204 pseudos on tramp3d that seems close to 0. I can
>> try to dig into that. With -funroll-all-loops it renames 17309 registers.
>> Of course it would be possible to write unroller specific renamer, but it would
>> hardly be any wasier than webizer.
> 
> And we have things like -fsplit-ivs-in-unroller that does part of it...
> 
> Richard.
> 
>> Honza
>
Jan Hubicka June 1, 2016, 2:31 p.m. UTC | #5
> >
> > In tree-SSA world most of reuse is avoided by out-of-ssa pass. Random reuse
> > these days is limited to loop unrolling and RTL expansion I think. Without
> > unroling webizer renames 204 pseudos on tramp3d that seems close to 0. I can
> > try to dig into that. With -funroll-all-loops it renames 17309 registers.
> > Of course it would be possible to write unroller specific renamer, but it would
> > hardly be any wasier than webizer.
> 
> And we have things like -fsplit-ivs-in-unroller that does part of it...

That is orthogonal to splitting done by web pass though. On Terbium I see two perofmrance change
today which is probably for turning off the webizer again. It is Eon 975->955 and Mesa 810->778.
This is w/o unrolling so not that many splits are triggered. I will collect the data with unrolling.

Honza

> 
> Richard.
> 
> > Honza
diff mbox

Patch

Index: loop-init.c
===================================================================
--- loop-init.c	(revision 236894)
+++ loop-init.c	(working copy)
@@ -560,7 +560,7 @@  public:
   /* opt_pass methods: */
   virtual bool gate (function *)
     {
-      return (flag_peel_loops || flag_unroll_loops || flag_unroll_all_loops);
+      return (flag_unroll_loops || flag_unroll_all_loops);
     }
 
   virtual unsigned int execute (function *);
Index: toplev.c
===================================================================
--- toplev.c	(revision 236894)
+++ toplev.c	(working copy)
@@ -1296,7 +1296,7 @@  process_options (void)
 
   /* web and rename-registers help when run after loop unrolling.  */
   if (flag_web == AUTODETECT_VALUE)
-    flag_web = flag_unroll_loops || flag_peel_loops;
+    flag_web = flag_unroll_loops;
 
   if (flag_rename_registers == AUTODETECT_VALUE)
     flag_rename_registers = flag_unroll_loops || flag_peel_loops;