Message ID | 20150310075708.GA7213@maggie |
---|---|
State | New |
Headers | show |
On Tue, Mar 10, 2015 at 8:57 AM, Andreas Krebbel wrote: > > * gcc/ifcvt.c (if_convert): > ...yes...? > diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c > index a3e3e5c..d2040af 100644 > --- a/gcc/ifcvt.c > +++ b/gcc/ifcvt.c > @@ -4626,6 +4626,13 @@ if_convert (bool after_combine) > num_true_changes); > } > > + if (num_true_changes > 0) > + { > + df_set_flags (DF_LR_RUN_DCE); > + df_mark_solutions_dirty (); > + df_analyze (); > + } > + > if (optimize == 1) > df_remove_problem (df_live); Tiny nail, huge hammer. This triggers a full re-scan of all insns and a re-calculation of all dataflow problems. The transformations in ifcvt are all simple enough that it should be possible to just clean up that redundant insn at the site where the code transformation is performed. Ciao! Steven
On 03/10/2015 10:12 AM, Steven Bosscher wrote: > On Tue, Mar 10, 2015 at 8:57 AM, Andreas Krebbel wrote: >> >> * gcc/ifcvt.c (if_convert): >> > > ...yes...? Damn. mklog is still not able to do the complete job for me ;) > Tiny nail, huge hammer. This triggers a full re-scan of all insns and > a re-calculation of all dataflow problems. > > The transformations in ifcvt are all simple enough that it should be > possible to just clean up that redundant insn at the site where the > code transformation is performed. Agreed. I was hoping to get this low risk version into 5.0. Bye, -Andreas-
On Tue, Mar 10, 2015 at 10:19 AM, Andreas Krebbel <krebbel@linux.vnet.ibm.com> wrote: > On 03/10/2015 10:12 AM, Steven Bosscher wrote: >> On Tue, Mar 10, 2015 at 8:57 AM, Andreas Krebbel wrote: >>> >>> * gcc/ifcvt.c (if_convert): >>> >> >> ...yes...? > > Damn. mklog is still not able to do the complete job for me ;) > >> Tiny nail, huge hammer. This triggers a full re-scan of all insns and >> a re-calculation of all dataflow problems. >> >> The transformations in ifcvt are all simple enough that it should be >> possible to just clean up that redundant insn at the site where the >> code transformation is performed. > > Agreed. I was hoping to get this low risk version into 5.0. Low risk? High risk of heavy compile-time regression I'd say (make a testcase that triggers a lot of iteration). Is this fixing a regression in some way? Richard. > Bye, > > -Andreas- > >
On 03/10/2015 11:27 AM, Richard Biener wrote:
....
> Is this fixing a regression in some way?
Not really. The optimization supposed to fold the bswap in that case is not that old:
https://gcc.gnu.org/ml/gcc-patches/2013-05/msg01378.html
The underlying problem however is probably visible in one way or the other since we have "load on
condition" instructions introduced with z196 (GCC 4.6).
-Andreas-
On 03/10/2015 11:27 AM, Richard Biener wrote: > On Tue, Mar 10, 2015 at 10:19 AM, Andreas Krebbel > <krebbel@linux.vnet.ibm.com> wrote: >> On 03/10/2015 10:12 AM, Steven Bosscher wrote: >>> On Tue, Mar 10, 2015 at 8:57 AM, Andreas Krebbel wrote: >>>> >>>> * gcc/ifcvt.c (if_convert): >>>> >>> >>> ...yes...? >> >> Damn. mklog is still not able to do the complete job for me ;) >> >>> Tiny nail, huge hammer. This triggers a full re-scan of all insns and >>> a re-calculation of all dataflow problems. >>> >>> The transformations in ifcvt are all simple enough that it should be >>> possible to just clean up that redundant insn at the site where the >>> code transformation is performed. >> >> Agreed. I was hoping to get this low risk version into 5.0. > > Low risk? High risk of heavy compile-time regression I'd say > (make a testcase that triggers a lot of iteration). Just to have some numbers I did run a -j1 GCC bootstrap twice with and without the patch on x86_64. Best results for both are: clean: 21459s patched: 21314s There rather appears to be a trend towards reduced compile time perhaps due to the reduced number of INSNs to be processed in the RTL passes between the two ifcvt runs (loop optimization, combine, fwprop, dse,...)?! I also tried to measure the testsuite runs but the results show a big variance. So what I have right now does not qualify as a benchmark. Bye, -Andreas- > > Is this fixing a regression in some way? > > Richard. > >> Bye, >> >> -Andreas- >> >> >
On 03/17/2015 02:17 AM, Andreas Krebbel wrote: > > Just to have some numbers I did run a -j1 GCC bootstrap twice with and without the patch on x86_64. > Best results for both are: > > clean: 21459s > patched: 21314s > > There rather appears to be a trend towards reduced compile time perhaps due to the reduced number of > INSNs to be processed in the RTL passes between the two ifcvt runs (loop optimization, combine, > fwprop, dse,...)?! > > I also tried to measure the testsuite runs but the results show a big variance. So what I have right > now does not qualify as a benchmark. And reality is it's getting harder and harder to benchmark this kind of thing with turbo modes and such. A single run isn't sufficient unless you've locked the box into a particular cpu frequency. jeff >
On Tue, Mar 17, 2015 at 7:29 PM, Jeff Law <law@redhat.com> wrote: > On 03/17/2015 02:17 AM, Andreas Krebbel wrote: >> >> >> Just to have some numbers I did run a -j1 GCC bootstrap twice with and >> without the patch on x86_64. >> Best results for both are: >> >> clean: 21459s >> patched: 21314s >> >> There rather appears to be a trend towards reduced compile time perhaps >> due to the reduced number of >> INSNs to be processed in the RTL passes between the two ifcvt runs (loop >> optimization, combine, >> fwprop, dse,...)?! >> >> I also tried to measure the testsuite runs but the results show a big >> variance. So what I have right >> now does not qualify as a benchmark. > > And reality is it's getting harder and harder to benchmark this kind of > thing with turbo modes and such. A single run isn't sufficient unless > you've locked the box into a particular cpu frequency. For the particular patch I wonder if you really need to change all three if-conversion pass instances or if changing the one before combine (pass_rtl_ifcvt, thus rest_of_handle_if_conversion) is enough. That already runs an unconditonal (huh...) cleanup_cfg (0) at the end which could be changed so that DCE is performed (CLEANUP_EXPENSIVE, runs delete_trivially_dead_insns). At least that makes the patch smaller and its impact restricted to one of the three ifcvt passes. OTOH ifcvt performs a DCE at its start (to be not confused by dead instructions I guess), so why doesn't combine do that as well (oh, it does!?)? And maybe _that_ DCE can be removed as if_convert () already performs a DF_LR_RUN_DCE on the first pass. Richard. > jeff >> >> >
On 03/18/2015 12:04 PM, Richard Biener wrote: > On Tue, Mar 17, 2015 at 7:29 PM, Jeff Law <law@redhat.com> wrote: >> On 03/17/2015 02:17 AM, Andreas Krebbel wrote: >>> >>> >>> Just to have some numbers I did run a -j1 GCC bootstrap twice with and >>> without the patch on x86_64. >>> Best results for both are: >>> >>> clean: 21459s >>> patched: 21314s >>> >>> There rather appears to be a trend towards reduced compile time perhaps >>> due to the reduced number of >>> INSNs to be processed in the RTL passes between the two ifcvt runs (loop >>> optimization, combine, >>> fwprop, dse,...)?! >>> >>> I also tried to measure the testsuite runs but the results show a big >>> variance. So what I have right >>> now does not qualify as a benchmark. >> >> And reality is it's getting harder and harder to benchmark this kind of >> thing with turbo modes and such. A single run isn't sufficient unless >> you've locked the box into a particular cpu frequency. > > For the particular patch I wonder if you really need to change all > three if-conversion pass instances or if changing the one before > combine (pass_rtl_ifcvt, thus rest_of_handle_if_conversion) is enough. Right. For this particular case it would be good enough to do it only in the first ifcvt run. But perhaps there are cases where later passes get confused by the leftovers from ifcvt? > That already runs an unconditonal (huh...) cleanup_cfg (0) at the end > which could be changed so that DCE is performed (CLEANUP_EXPENSIVE, > runs delete_trivially_dead_insns). > > At least that makes the patch smaller and its impact restricted to > one of the three ifcvt passes. This does not seem to work. The DCE run by cleanup_cfg only deals with dead pseudos. In my case it is a set to the CC hard reg which becomes dead. > OTOH ifcvt performs a DCE at its start (to be not confused by dead > instructions I guess), so why doesn't combine do that as well > (oh, it does!?)? When reaching combine the LR solution is clean and therefore also DCE isn't performed. This is because ifcvt disables running DCE with LR. So LR is always clean after ifcvt although there are still dead insns left. > And maybe _that_ DCE can be removed as if_convert () > already performs a DF_LR_RUN_DCE on the first pass. You mean removing the DCE run in combine? That one probably can go away then given that the passes running between ifcvt and combine (loop and fwprop) get rid of dead insns properly. -Andreas- > > Richard. > >> jeff >>> >>> >> >
On Fri, Mar 20, 2015 at 9:50 AM, Andreas Krebbel <krebbel@linux.vnet.ibm.com> wrote: > On 03/18/2015 12:04 PM, Richard Biener wrote: >> On Tue, Mar 17, 2015 at 7:29 PM, Jeff Law <law@redhat.com> wrote: >>> On 03/17/2015 02:17 AM, Andreas Krebbel wrote: >>>> >>>> >>>> Just to have some numbers I did run a -j1 GCC bootstrap twice with and >>>> without the patch on x86_64. >>>> Best results for both are: >>>> >>>> clean: 21459s >>>> patched: 21314s >>>> >>>> There rather appears to be a trend towards reduced compile time perhaps >>>> due to the reduced number of >>>> INSNs to be processed in the RTL passes between the two ifcvt runs (loop >>>> optimization, combine, >>>> fwprop, dse,...)?! >>>> >>>> I also tried to measure the testsuite runs but the results show a big >>>> variance. So what I have right >>>> now does not qualify as a benchmark. >>> >>> And reality is it's getting harder and harder to benchmark this kind of >>> thing with turbo modes and such. A single run isn't sufficient unless >>> you've locked the box into a particular cpu frequency. >> >> For the particular patch I wonder if you really need to change all >> three if-conversion pass instances or if changing the one before >> combine (pass_rtl_ifcvt, thus rest_of_handle_if_conversion) is enough. > Right. For this particular case it would be good enough to do it only in the first ifcvt run. But > perhaps there are cases where later passes get confused by the leftovers from ifcvt? Well, "perhaps" ... >> That already runs an unconditonal (huh...) cleanup_cfg (0) at the end >> which could be changed so that DCE is performed (CLEANUP_EXPENSIVE, >> runs delete_trivially_dead_insns). >> >> At least that makes the patch smaller and its impact restricted to >> one of the three ifcvt passes. > This does not seem to work. The DCE run by cleanup_cfg only deals with dead pseudos. In my case it > is a set to the CC hard reg which becomes dead. I see. >> OTOH ifcvt performs a DCE at its start (to be not confused by dead >> instructions I guess), so why doesn't combine do that as well >> (oh, it does!?)? > When reaching combine the LR solution is clean and therefore also DCE isn't performed. This is > because ifcvt disables running DCE with LR. So LR is always clean after ifcvt although there are > still dead insns left. Ok... >> And maybe _that_ DCE can be removed as if_convert () >> already performs a DF_LR_RUN_DCE on the first pass. > You mean removing the DCE run in combine? That one probably can go away then given that the passes > running between ifcvt and combine (loop and fwprop) get rid of dead insns properly. So I think the best solution is still removing the dead code manually in ifcvt. Then there is also pass_ud_rtl_dce run before the combine pass - if that isn't enough because of clean solutions or whatever then that should be strenghthened - it was most definitely designed to fix the issue you see. Richard. > -Andreas- > >> >> Richard. >> >>> jeff >>>> >>>> >>> >> >
diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index a3e3e5c..d2040af 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -4626,6 +4626,13 @@ if_convert (bool after_combine) num_true_changes); } + if (num_true_changes > 0) + { + df_set_flags (DF_LR_RUN_DCE); + df_mark_solutions_dirty (); + df_analyze (); + } + if (optimize == 1) df_remove_problem (df_live);