Message ID | CAAe5K+Uw2a8vTXDhF=j6u0QM0W0ZUeYYESB05jGM-_WY9yKMLw@mail.gmail.com |
---|---|
State | New |
Headers | show |
> 2013-11-18 Teresa Johnson <tejohnson@google.com> > > * gcc/cfgrtl.c (cfg_layout_initialize): Assert if we > try to go into cfglayout after bb reordering. > * gcc/passes.def: Move compgotos before bb reordering > since it goes into cfglayout. This seems resonable to me, but I can't approve it. > > Index: gcc/cfgrtl.c > =================================================================== > --- gcc/cfgrtl.c (revision 204977) > +++ gcc/cfgrtl.c (working copy) > @@ -4204,6 +4204,8 @@ cfg_layout_initialize (unsigned int flags) > rtx x; > basic_block bb; > > + gcc_assert (!crtl->bb_reorder_complete); I would definitely add a comment explaining why we don't want to run after bb_reorder. Honza > + > initialize_original_copy_tables (); > > cfg_layout_rtl_register_cfg_hooks (); > Index: gcc/passes.def > =================================================================== > --- gcc/passes.def (revision 204977) > +++ gcc/passes.def (working copy) > @@ -387,6 +387,7 @@ along with GCC; see the file COPYING3. If not see > NEXT_PASS (pass_regrename); > NEXT_PASS (pass_cprop_hardreg); > NEXT_PASS (pass_fast_rtl_dce); > + NEXT_PASS (pass_duplicate_computed_gotos); > NEXT_PASS (pass_reorder_blocks); > NEXT_PASS (pass_branch_target_load_optimize2); > NEXT_PASS (pass_leaf_regs); > @@ -398,7 +399,6 @@ along with GCC; see the file COPYING3. If not see > NEXT_PASS (pass_stack_regs_run); > POP_INSERT_PASSES () > NEXT_PASS (pass_compute_alignments); > - NEXT_PASS (pass_duplicate_computed_gotos); > NEXT_PASS (pass_variable_tracking); > NEXT_PASS (pass_free_cfg); > NEXT_PASS (pass_machine_reorg); > > > > > > Ciao! > > Steven > > > > -- > Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
On 11/18/13 12:33, Teresa Johnson wrote: > On Mon, Nov 18, 2013 at 7:34 AM, Steven Bosscher <stevenb.gcc@gmail.com> wrote: >> On Mon, Nov 18, 2013 at 3:53 PM, Teresa Johnson wrote: >>> From a bb layout perspective it seems like it would be beneficial to >>> do compgotos before layout. Was the current position just to try to >>> reduce compile time by keeping the block unified as long as possible? >> >> It was more a hack that got out of hand. Apparently it hurt some >> interpreters (branch prediction!) when the unified computed goto is >> not "unfactored". There was a PR for this, and the unfactoring code I >> added only fixed part of the problem. >> >> >>> For your comment "I think those cases would benefit from having at >>> least scheduling/reordering and alignments done right." do you mean >>> that it would be better from a code quality perspective for those to >>> have compgotos done earlier before those passes? That seems to make >>> sense to me as well. >> >> Theoretically: Yes, perhaps. In practice there isn't much to gain. >> Unfactoring before bb-reorder is probably the most helpful thing, to >> get better results for basic block alignment and placement. But >> scheduling punts on computed gotos (or explodes in some non-linear >> bottleneck). >> >> What used to help is profile-based branch speculation, i.e. >> >> if (*target_addr == most_frequent_target_addr) >> goto most_frequent_target_add; >> else >> goto *target_addr; >> >> But I'm not sure if our value profiling based optimizations still have >> this case. >> >> >>> I'm doing an lto profiledbootstrap with the change right now. Is there >>> other testing that should be done for this change? Can you point me at >>> lucier's testcase that you refer to above? I found that PR15242 was >>> the bug that inspired adding compgotos, but it is a small test case so >>> I'm not sure what I will learn from trying that other than whether >>> compgotos still kicks in as expected. >> >> ISTR it's http://gcc.gnu.org/PR26854 > > Ok, I have confirmed that moving compgotos earlier, to just before bb > reordering, passes an LTO profiledbootstrap with > -freorder-blocks-and-partition forced on (with my prior patch to fixup > the layout removed). I also added an assert to ensure we aren't going > into cfglayout mode after bb reordering is complete. Currently I am > running a normal bootstrap and regression test without > -freorder-blocks-and-partition force enabled. > > I confirmed that we still apply compgotos and successfully unfactor > the computed goto in the testcase from PR15242. > > I also tried the two test cases in PR26854 (all.i and compiler.i) both > with -O1 -fschedule-insns2 (which shouldn't actually be affected since > compgotos/bbro not on) and with -O3 -fschedule-insns, as described in > the bug report. I compared the time and mem reports before and after > my change to the compgotos pass placement and there is no significant > difference in the time or memory used from bb reordering or other > downstream passes. > > Here is the patch I am testing with a normal bootstrap and regression > test on x86-64-unknown-linux-gnu. Is this change ok for trunk assuming > that passes? > > Thanks, > Teresa > > 2013-11-18 Teresa Johnson <tejohnson@google.com> > > * gcc/cfgrtl.c (cfg_layout_initialize): Assert if we > try to go into cfglayout after bb reordering. > * gcc/passes.def: Move compgotos before bb reordering > since it goes into cfglayout. This is fine once the bootstrap/regression testing passes. Thanks, jeff
On Mon, Nov 18, 2013 at 12:23 PM, Jeff Law <law@redhat.com> wrote: > On 11/18/13 12:33, Teresa Johnson wrote: >> >> On Mon, Nov 18, 2013 at 7:34 AM, Steven Bosscher <stevenb.gcc@gmail.com> >> wrote: >>> >>> On Mon, Nov 18, 2013 at 3:53 PM, Teresa Johnson wrote: >>>> >>>> From a bb layout perspective it seems like it would be beneficial to >>>> do compgotos before layout. Was the current position just to try to >>>> reduce compile time by keeping the block unified as long as possible? >>> >>> >>> It was more a hack that got out of hand. Apparently it hurt some >>> interpreters (branch prediction!) when the unified computed goto is >>> not "unfactored". There was a PR for this, and the unfactoring code I >>> added only fixed part of the problem. >>> >>> >>>> For your comment "I think those cases would benefit from having at >>>> least scheduling/reordering and alignments done right." do you mean >>>> that it would be better from a code quality perspective for those to >>>> have compgotos done earlier before those passes? That seems to make >>>> sense to me as well. >>> >>> >>> Theoretically: Yes, perhaps. In practice there isn't much to gain. >>> Unfactoring before bb-reorder is probably the most helpful thing, to >>> get better results for basic block alignment and placement. But >>> scheduling punts on computed gotos (or explodes in some non-linear >>> bottleneck). >>> >>> What used to help is profile-based branch speculation, i.e. >>> >>> if (*target_addr == most_frequent_target_addr) >>> goto most_frequent_target_add; >>> else >>> goto *target_addr; >>> >>> But I'm not sure if our value profiling based optimizations still have >>> this case. >>> >>> >>>> I'm doing an lto profiledbootstrap with the change right now. Is there >>>> other testing that should be done for this change? Can you point me at >>>> lucier's testcase that you refer to above? I found that PR15242 was >>>> the bug that inspired adding compgotos, but it is a small test case so >>>> I'm not sure what I will learn from trying that other than whether >>>> compgotos still kicks in as expected. >>> >>> >>> ISTR it's http://gcc.gnu.org/PR26854 >> >> >> Ok, I have confirmed that moving compgotos earlier, to just before bb >> reordering, passes an LTO profiledbootstrap with >> -freorder-blocks-and-partition forced on (with my prior patch to fixup >> the layout removed). I also added an assert to ensure we aren't going >> into cfglayout mode after bb reordering is complete. Currently I am >> running a normal bootstrap and regression test without >> -freorder-blocks-and-partition force enabled. >> >> I confirmed that we still apply compgotos and successfully unfactor >> the computed goto in the testcase from PR15242. >> >> I also tried the two test cases in PR26854 (all.i and compiler.i) both >> with -O1 -fschedule-insns2 (which shouldn't actually be affected since >> compgotos/bbro not on) and with -O3 -fschedule-insns, as described in >> the bug report. I compared the time and mem reports before and after >> my change to the compgotos pass placement and there is no significant >> difference in the time or memory used from bb reordering or other >> downstream passes. >> >> Here is the patch I am testing with a normal bootstrap and regression >> test on x86-64-unknown-linux-gnu. Is this change ok for trunk assuming >> that passes? >> >> Thanks, >> Teresa >> >> 2013-11-18 Teresa Johnson <tejohnson@google.com> >> >> * gcc/cfgrtl.c (cfg_layout_initialize): Assert if we >> try to go into cfglayout after bb reordering. >> * gcc/passes.def: Move compgotos before bb reordering >> since it goes into cfglayout. > > This is fine once the bootstrap/regression testing passes. The bootstrap/regression test passed. Added a comment as Honza suggested and committed as r204985. Thanks, Teresa > > Thanks, > jeff
Index: gcc/cfgrtl.c =================================================================== --- gcc/cfgrtl.c (revision 204977) +++ gcc/cfgrtl.c (working copy) @@ -4204,6 +4204,8 @@ cfg_layout_initialize (unsigned int flags) rtx x; basic_block bb; + gcc_assert (!crtl->bb_reorder_complete); + initialize_original_copy_tables (); cfg_layout_rtl_register_cfg_hooks (); Index: gcc/passes.def =================================================================== --- gcc/passes.def (revision 204977) +++ gcc/passes.def (working copy) @@ -387,6 +387,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_regrename); NEXT_PASS (pass_cprop_hardreg); NEXT_PASS (pass_fast_rtl_dce); + NEXT_PASS (pass_duplicate_computed_gotos); NEXT_PASS (pass_reorder_blocks); NEXT_PASS (pass_branch_target_load_optimize2); NEXT_PASS (pass_leaf_regs); @@ -398,7 +399,6 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_stack_regs_run); POP_INSERT_PASSES () NEXT_PASS (pass_compute_alignments); - NEXT_PASS (pass_duplicate_computed_gotos); NEXT_PASS (pass_variable_tracking); NEXT_PASS (pass_free_cfg); NEXT_PASS (pass_machine_reorg);