Patchwork: Move switch-conversion after profiling

Submitter Steven Bosscher
Date April 19, 2012, 3:16 p.m.
Message ID <CABu31nP-Bvq7cydgz_5QL=Ya_j=3ObyxzEn+BVWMKm_tw2+4aQ@mail.gmail.com>
Permalink /patch/153813/
State New

Comments

Steven Bosscher - April 19, 2012, 3:16 p.m.
Hello,

If we want to use profiling to expand switches in GIMPLE, we'll have
to run switch-conversion after profiling.

Bootstrapped and tested on x86_64-unknown-linux-gnu. OK?

Ciao!
Steven

	* passes.c (pass_convert_switch): Move after profiling and pure_const.
Richard Guenther - April 20, 2012, 8:25 a.m.
On Thu, Apr 19, 2012 at 5:16 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> Hello,
>
> If we want to use profiling to expand switches in GIMPLE, we'll have
> to run switch-conversion after profiling.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu. OK?

That's too early still.  Profile data is not read until pass_ipa_tree_profile.
This means that moving it after all IPA transforms (and thus running it at the
LTRANS stage with LTO) would be better.  May I suggest moving it after the
DCE pass that runs after the first VRP pass?

Thanks,
Richard.

Jan Hubicka - April 20, 2012, 8:52 a.m.
> On Thu, Apr 19, 2012 at 5:16 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> > Hello,
> >
> > If we want to use profiling to expand switches in GIMPLE, we'll have
> > to run switch-conversion after profiling.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu. OK?
> 
> That's too early still.  Profile data is not read until pass_ipa_tree_profile.

Good point ;)

> Which means moving it to after all IPA transforms (and thus run at LTRANS
> stage with LTO) would be better.  May I suggest to move it after the
> DCE pass that runs after the first VRP pass?

The original motivation for doing switch conversion early was to make function
bodies smaller (when inlining, the static variable does not need to be duplicated,
while the switch code does).

This was motivated by real-world examples, e.g. Mesa, which inlines a function converting
error codes into strings. At one point we estimated its size to be 0 (because it
contains only a switch, which was ignored, and string constants, which are zero cost),
and code size ended up exploding because we inlined it everywhere.
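To make the size argument concrete, here is a minimal C sketch of the transformation switch conversion performs on a code-to-string function of this kind. The enum values, strings and function names are hypothetical illustrations, not taken from Mesa or GCC, and the real pass rewrites GIMPLE rather than C source.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical error codes; a stand-in for the kind of code-to-string
   helper described above, not Mesa's actual code.  */
enum { ERR_OK, ERR_NOMEM, ERR_INVAL, ERR_IO, ERR_COUNT };

/* Before switch conversion: one case label per code, each returning
   a string constant.  */
const char *err_string_switch (int code)
{
  switch (code)
    {
    case ERR_OK:    return "no error";
    case ERR_NOMEM: return "out of memory";
    case ERR_INVAL: return "invalid argument";
    case ERR_IO:    return "I/O error";
    default:        return "unknown error";
    }
}

/* Roughly the shape switch conversion produces: a range check plus a
   load from a static table.  Inlining copies only the check and the
   load; the table is emitted once and shared.  */
static const char *const err_table[ERR_COUNT] = {
  "no error", "out of memory", "invalid argument", "I/O error"
};

const char *err_string_table (int code)
{
  /* The unsigned cast folds "code < 0" into the same comparison.  */
  if ((unsigned) code < ERR_COUNT)
    return err_table[code];
  return "unknown error";
}

/* Self-check: both forms agree on every code and on out-of-range values.  */
void check_err_strings (void)
{
  for (int code = -1; code <= ERR_COUNT; code++)
    assert (strcmp (err_string_switch (code), err_string_table (code)) == 0);
}
```

Both functions behave identically, but only the second has a body that stays small when duplicated at every call site, which is the "always win" property Jan argues for.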

I think we should retain this, and perhaps have the always-win conversions done early
and the real conversion (of switches to decision trees) done much later.
Switch expansion hides what the code really does, and some of the late optimizers
will probably benefit from knowing that a switch is really a switch.

I would expect that somewhere before the second VRP makes most sense.

Honza
Richard Guenther - April 20, 2012, 8:57 a.m.
On Fri, Apr 20, 2012 at 10:52 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On Thu, Apr 19, 2012 at 5:16 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
>> > Hello,
>> >
>> > If we want to use profiling to expand switches in GIMPLE, we'll have
>> > to run switch-conversion after profiling.
>> >
>> > Bootstrapped and tested on x86_64-unknown-linux-gnu. OK?
>>
>> That's too early still.  Profile data is not read until pass_ipa_tree_profile.
>
> Good point ;)
>
>> Which means moving it to after all IPA transforms (and thus run at LTRANS
>> stage with LTO) would be better.  May I suggest to move it after the
>> DCE pass that runs after the first VRP pass?
>
> The original motivation to do switch conversion early was to get function
> bodies smaller (i.e. when inlining the static var don't need duplication, the
> switch code does)
>
> This was motivated by real world examples, i.e. mesa that inlines function converting
> error codes into strings. At one point we estimated it to be of 0 size (because it
> has only switch that was ignored and string constants that are zero cost) and we
> ended up completely exploding by inlining it everywhere.

Well, I never really believed this theory ;)

> I think we should retain this and perhaps have always win conversions done early
> and real conversion (of switches to decision trees) done much later.
> Switch expansion hides what the code really does and some of the late optimizers
> will probably benefit from knowing that switch is really a switch.
>
> I would expect that somewhere before second VRP makes most sense.

No, I think before loop opts makes most sense, as you can end up
creating vectorizable code.  And after the first VRP, because that VRP can
remove dead cases.
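A hedged C sketch of both points, under assumed inputs: once a range analysis such as VRP has shown a case to be unreachable, switch conversion can turn a switch inside a loop into a dense table lookup, leaving a branch-free loop body. The function names and constants are hypothetical, and whether GCC's vectorizer actually handles the resulting loop depends on the target and compiler version.

```c
#include <assert.h>
#include <stddef.h>

/* Before: a switch inside a loop.  The index is masked to [0, 3], so a
   range analysis such as VRP can prove "case 7" unreachable.  */
void scale_switch (const int *in, int *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      int k = in[i] & 3;            /* k is always in [0, 3] */
      switch (k)
        {
        case 0: out[i] = 1; break;
        case 1: out[i] = 10; break;
        case 2: out[i] = 100; break;
        case 3: out[i] = 1000; break;
        case 7: out[i] = -1; break; /* dead: k is never 7 */
        }
    }
}

/* After dead-case removal and switch conversion: a dense four-entry
   table and a loop body with no branches, which a vectorizer may be
   able to handle (for example with gather loads, where available).  */
void scale_table (const int *in, int *out, size_t n)
{
  static const int table[4] = { 1, 10, 100, 1000 };
  for (size_t i = 0; i < n; i++)
    out[i] = table[in[i] & 3];
}

/* Self-check: both loops produce the same output for the given input.  */
void check_scale (const int *in, size_t n)
{
  int a[64], b[64];
  assert (n <= 64);
  scale_switch (in, a, n);
  scale_table (in, b, n);
  for (size_t i = 0; i < n; i++)
    assert (a[i] == b[i]);
}
```

If the dead `case 7` were still present when the conversion ran, the case range would widen to [0, 7] with holes the table would have to cover, which is why running after the first VRP helps.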

Richard.

Jan Hubicka - April 20, 2012, 9:24 a.m.
> >
> > The original motivation to do switch conversion early was to get function
> > bodies smaller (i.e. when inlining the static var don't need duplication, the
> > switch code does)
> >
> > This was motivated by real world examples, i.e. mesa that inlines function converting
> > error codes into strings. At one point we estimated it to be of 0 size (because it
> > has only switch that was ignored and string constants that are zero cost) and we
> > ended up completely exploding by inlining it everywhere.
> 
> Well, I never really believed this theory ;)

:) It was a bug in the cost functions (they now contain switch statement estimation code
for this reason), but it still shows that converting these kinds of switch constructs
helps to get more inlining and code sharing.

Still, it seems to me that we are mixing two essentially independent things. One is a
set of recognizers for trivial switch constructs that can easily be converted into
smaller and faster switch-free code. This IMO makes sense to do early.

The second is expanding switch constructs into decision trees (i.e. lowering), which
makes sense to do later, after the profile is built and the code is reasonably optimized.
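To illustrate the second kind of transformation, here is a hedged C sketch of lowering a switch with sparse case values into a decision tree, plus a variant whose test order follows a hypothetical profile. It is hand-written C for illustration, not the output of GCC's switch lowering; the case values and the profile are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Sparse case values: too far apart for a compact jump table or
   lookup table, so the switch has to be lowered to comparisons.  */
int classify_switch (int x)
{
  switch (x)
    {
    case 3:     return 1;
    case 40:    return 2;
    case 500:   return 3;
    case 6000:  return 4;
    case 70000: return 5;
    default:    return 0;
    }
}

/* A hand-written decision tree splitting on the middle case value,
   roughly the shape a profile-agnostic lowering might produce.  */
int classify_tree (int x)
{
  if (x < 500)
    {
      if (x == 3)  return 1;
      if (x == 40) return 2;
      return 0;
    }
  if (x == 500)   return 3;
  if (x == 6000)  return 4;
  if (x == 70000) return 5;
  return 0;
}

/* If profile data (hypothetically) showed that x == 6000 dominates,
   a profile-aware lowering could test the hot case first.  */
int classify_tree_hot (int x)
{
  if (x == 6000)
    return 4;
  return classify_tree (x);
}

/* Self-check: all three forms agree on case values and on misses.  */
void check_classify (void)
{
  static const int probes[] = { 3, 40, 500, 6000, 70000,
                                -3, 0, 41, 499, 501, 100000 };
  for (size_t i = 0; i < sizeof probes / sizeof probes[0]; i++)
    {
      assert (classify_switch (probes[i]) == classify_tree (probes[i]));
      assert (classify_switch (probes[i]) == classify_tree_hot (probes[i]));
    }
}
```

All three forms compute the same result; only the order of comparisons differs, and that order is exactly what profile data can inform, which is why this lowering only pays off once the profile has been read.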
> 
> > I think we should retain this and perhaps have always win conversions done early
> > and real conversion (of switches to decision trees) done much later.
> > Switch expansion hides what the code really does and some of the late optimizers
> > will probably benefit from knowing that switch is really a switch.
> >
> > I would expect that somewhere before second VRP makes most sense.
> 
> No, I think before loop opts it makes most sense as you can end up
> creating vectorizable code.  After first VRP because such VRP can
> remove dead cases.

Hmm, before loop opts it makes sense indeed.
Richard Guenther - April 20, 2012, 9:30 a.m.
On Fri, Apr 20, 2012 at 11:24 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >
>> > The original motivation to do switch conversion early was to get function
>> > bodies smaller (i.e. when inlining the static var don't need duplication, the
>> > switch code does)
>> >
>> > This was motivated by real world examples, i.e. mesa that inlines function converting
>> > error codes into strings. At one point we estimated it to be of 0 size (because it
>> > has only switch that was ignored and string constants that are zero cost) and we
>> > ended up completely exploding by inlining it everywhere.
>>
>> Well, I never really believed this theory ;)
>
> :) It was a bug in the cost functions (it now has switch statement estimation code in it
> for this reason), but anyway it shows that converting those kind of switch constructs
> helps to get more inlining and code sharing.

Well, there are a lot of things that eventually lead to code size savings.  But
in early opts we only want to do things that _always_ lead to code size savings,
and things that do not affect debugging (I'm still thinking of doing a -Og, or
re-doing -O1 to only do early opts; still having switch conversion in that set
sounds odd).

Jan Hubicka - April 20, 2012, 9:38 a.m.
> On Fri, Apr 20, 2012 at 11:24 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> >
> >> > The original motivation to do switch conversion early was to get function
> >> > bodies smaller (i.e. when inlining the static var don't need duplication, the
> >> > switch code does)
> >> >
> >> > This was motivated by real world examples, i.e. mesa that inlines function converting
> >> > error codes into strings. At one point we estimated it to be of 0 size (because it
> >> > has only switch that was ignored and string constants that are zero cost) and we
> >> > ended up completely exploding by inlining it everywhere.
> >>
> >> Well, I never really believed this theory ;)
> >
> > :) It was a bug in the cost functions (it now has switch statement estimation code in it
> > for this reason), but anyway it shows that converting those kind of switch constructs
> > helps to get more inlining and code sharing.
> 
> Well, there are a lot of things that eventually lead to code size savings.  But
> in early opts we only want to do things that _always_ lead to code size savings.

Agreed, we should only do transforms that are a win both code-size-wise and performance-wise.
But the switch conversion code does that (at least in the cases we are talking about).

> And things that do not affect debugging (I'm still thinking of doing a -Og or
> re-doing -O1 to only do early opts - still having switch conversion in that set
> sounds odd).

Converting a switch to a table lookup kills the statements in the case labels and degrades
debug info. But this is not too different from what copy propagation, DCE and other things
we definitely want in early opts do. It does not seem that disturbing for the
debugger to me.

Honza

Patch

Index: passes.c
===================================================================
--- passes.c	(revision 186586)
+++ passes.c	(working copy)
@@ -1326,10 +1326,10 @@  init_optimization_passes (void)
 	  NEXT_PASS (pass_cd_dce);
 	  NEXT_PASS (pass_early_ipa_sra);
 	  NEXT_PASS (pass_tail_recursion);
-	  NEXT_PASS (pass_convert_switch);
           NEXT_PASS (pass_cleanup_eh);
           NEXT_PASS (pass_profile);
           NEXT_PASS (pass_local_pure_const);
+	  NEXT_PASS (pass_convert_switch);
 	  /* Split functions creates parts that are not run through
 	     early optimizations again.  It is thus good idea to do this
 	     late.  */