diff mbox

Fix lto bootstrap verification failure with -freorder-blocks-and-partition

Message ID CAAe5K+Uw2a8vTXDhF=j6u0QM0W0ZUeYYESB05jGM-_WY9yKMLw@mail.gmail.com
State New
Headers show

Commit Message

Teresa Johnson Nov. 18, 2013, 7:33 p.m. UTC
On Mon, Nov 18, 2013 at 7:34 AM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Mon, Nov 18, 2013 at 3:53 PM, Teresa Johnson wrote:
>> From a bb layout perspective it seems like it would be beneficial to
>> do compgotos before layout. Was the current position just to try to
>> reduce compile time by keeping the block unified as long as possible?
>
> It was more a hack that got out of hand. Apparently it hurt some
> interpreters (branch prediction!) when the unified computed goto is
> not "unfactored". There was a PR for this, and the unfactoring code I
> added only fixed part of the problem.
>
>
>> For your comment "I think those cases would benefit from having at
>> least scheduling/reordering and alignments done right." do you mean
>> that it would be better from a code quality perspective for those to
>> have compgotos done earlier before those passes? That seems to make
>> sense to me as well.
>
> Theoretically: Yes, perhaps. In practice there isn't much to gain.
> Unfactoring before bb-reorder is probably the most helpful thing, to
> get better results for basic block alignment and placement. But
> scheduling punts on computed gotos (or explodes in some non-linear
> bottleneck).
>
> What used to help is profile-based branch speculation, i.e.
>
> if (*target_addr == most_frequent_target_addr)
>   goto most_frequent_target_add;
> else
>   goto *target_addr;
>
> But I'm not sure if our value profiling based optimizations still have
> this case.
>
>
>> I'm doing an lto profiledbootstrap with the change right now. Is there
>> other testing that should be done for this change? Can you point me at
>> lucier's testcase that you refer to above? I found that PR15242 was
>> the bug that inspired adding compgotos, but it is a small test case so
>> I'm not sure what I will learn from trying that other than whether
>> compgotos still kicks in as expected.
>
> ISTR it's http://gcc.gnu.org/PR26854

Ok, I have confirmed that moving compgotos earlier, to just before bb
reordering, passes an LTO profiledbootstrap with
-freorder-blocks-and-partition forced on (with my prior patch to fixup
the layout removed). I also added an assert to ensure we aren't going
into cfglayout mode after bb reordering is complete. Currently I am
running a normal bootstrap and regression test without
-freorder-blocks-and-partition force enabled.

I confirmed that we still apply compgotos and successfully unfactor
the computed goto in the testcase from PR15242.

I also tried the two test cases in PR26854 (all.i and compiler.i) both
with -O1 -fschedule-insns2 (which shouldn't actually be affected since
compgotos/bbro not on) and with -O3 -fschedule-insns, as described in
the bug report. I compared the time and mem reports before and after
my change to the compgotos pass placement and there is no significant
difference in the time or memory used from bb reordering or other
downstream passes.

Here is the patch I am testing with a normal bootstrap and regression
test on x86-64-unknown-linux-gnu. Is this change ok for trunk assuming
that passes?

Thanks,
Teresa

2013-11-18  Teresa Johnson  <tejohnson@google.com>

        * gcc/cfgrtl.c (cfg_layout_initialize): Assert if we
        try to go into cfglayout after bb reordering.
        * gcc/passes.def: Move compgotos before bb reordering
        since it goes into cfglayout.



>
> Ciao!
> Steven

Comments

Jan Hubicka Nov. 18, 2013, 7:45 p.m. UTC | #1
> 2013-11-18  Teresa Johnson  <tejohnson@google.com>
> 
>         * gcc/cfgrtl.c (cfg_layout_initialize): Assert if we
>         try to go into cfglayout after bb reordering.
>         * gcc/passes.def: Move compgotos before bb reordering
>         since it goes into cfglayout.

This seems resonable to me, but I can't approve it.
> 
> Index: gcc/cfgrtl.c
> ===================================================================
> --- gcc/cfgrtl.c        (revision 204977)
> +++ gcc/cfgrtl.c        (working copy)
> @@ -4204,6 +4204,8 @@ cfg_layout_initialize (unsigned int flags)
>    rtx x;
>    basic_block bb;
> 
> +  gcc_assert (!crtl->bb_reorder_complete);

I would definitely add a comment explaining why we don't want to run after bb_reorder.

Honza
> +
>    initialize_original_copy_tables ();
> 
>    cfg_layout_rtl_register_cfg_hooks ();
> Index: gcc/passes.def
> ===================================================================
> --- gcc/passes.def      (revision 204977)
> +++ gcc/passes.def      (working copy)
> @@ -387,6 +387,7 @@ along with GCC; see the file COPYING3.  If not see
>           NEXT_PASS (pass_regrename);
>           NEXT_PASS (pass_cprop_hardreg);
>           NEXT_PASS (pass_fast_rtl_dce);
> +         NEXT_PASS (pass_duplicate_computed_gotos);
>           NEXT_PASS (pass_reorder_blocks);
>           NEXT_PASS (pass_branch_target_load_optimize2);
>           NEXT_PASS (pass_leaf_regs);
> @@ -398,7 +399,6 @@ along with GCC; see the file COPYING3.  If not see
>               NEXT_PASS (pass_stack_regs_run);
>           POP_INSERT_PASSES ()
>           NEXT_PASS (pass_compute_alignments);
> -         NEXT_PASS (pass_duplicate_computed_gotos);
>           NEXT_PASS (pass_variable_tracking);
>           NEXT_PASS (pass_free_cfg);
>           NEXT_PASS (pass_machine_reorg);
> 
> 
> >
> > Ciao!
> > Steven
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Jeff Law Nov. 18, 2013, 8:23 p.m. UTC | #2
On 11/18/13 12:33, Teresa Johnson wrote:
> On Mon, Nov 18, 2013 at 7:34 AM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
>> On Mon, Nov 18, 2013 at 3:53 PM, Teresa Johnson wrote:
>>>  From a bb layout perspective it seems like it would be beneficial to
>>> do compgotos before layout. Was the current position just to try to
>>> reduce compile time by keeping the block unified as long as possible?
>>
>> It was more a hack that got out of hand. Apparently it hurt some
>> interpreters (branch prediction!) when the unified computed goto is
>> not "unfactored". There was a PR for this, and the unfactoring code I
>> added only fixed part of the problem.
>>
>>
>>> For your comment "I think those cases would benefit from having at
>>> least scheduling/reordering and alignments done right." do you mean
>>> that it would be better from a code quality perspective for those to
>>> have compgotos done earlier before those passes? That seems to make
>>> sense to me as well.
>>
>> Theoretically: Yes, perhaps. In practice there isn't much to gain.
>> Unfactoring before bb-reorder is probably the most helpful thing, to
>> get better results for basic block alignment and placement. But
>> scheduling punts on computed gotos (or explodes in some non-linear
>> bottleneck).
>>
>> What used to help is profile-based branch speculation, i.e.
>>
>> if (*target_addr == most_frequent_target_addr)
>>    goto most_frequent_target_add;
>> else
>>    goto *target_addr;
>>
>> But I'm not sure if our value profiling based optimizations still have
>> this case.
>>
>>
>>> I'm doing an lto profiledbootstrap with the change right now. Is there
>>> other testing that should be done for this change? Can you point me at
>>> lucier's testcase that you refer to above? I found that PR15242 was
>>> the bug that inspired adding compgotos, but it is a small test case so
>>> I'm not sure what I will learn from trying that other than whether
>>> compgotos still kicks in as expected.
>>
>> ISTR it's http://gcc.gnu.org/PR26854
>
> Ok, I have confirmed that moving compgotos earlier, to just before bb
> reordering, passes an LTO profiledbootstrap with
> -freorder-blocks-and-partition forced on (with my prior patch to fixup
> the layout removed). I also added an assert to ensure we aren't going
> into cfglayout mode after bb reordering is complete. Currently I am
> running a normal bootstrap and regression test without
> -freorder-blocks-and-partition force enabled.
>
> I confirmed that we still apply compgotos and successfully unfactor
> the computed goto in the testcase from PR15242.
>
> I also tried the two test cases in PR26854 (all.i and compiler.i) both
> with -O1 -fschedule-insns2 (which shouldn't actually be affected since
> compgotos/bbro not on) and with -O3 -fschedule-insns, as described in
> the bug report. I compared the time and mem reports before and after
> my change to the compgotos pass placement and there is no significant
> difference in the time or memory used from bb reordering or other
> downstream passes.
>
> Here is the patch I am testing with a normal bootstrap and regression
> test on x86-64-unknown-linux-gnu. Is this change ok for trunk assuming
> that passes?
>
> Thanks,
> Teresa
>
> 2013-11-18  Teresa Johnson  <tejohnson@google.com>
>
>          * gcc/cfgrtl.c (cfg_layout_initialize): Assert if we
>          try to go into cfglayout after bb reordering.
>          * gcc/passes.def: Move compgotos before bb reordering
>          since it goes into cfglayout.
This is fine once the bootstrap/regression testing passes.

Thanks,
jeff
Teresa Johnson Nov. 18, 2013, 10:39 p.m. UTC | #3
On Mon, Nov 18, 2013 at 12:23 PM, Jeff Law <law@redhat.com> wrote:
> On 11/18/13 12:33, Teresa Johnson wrote:
>>
>> On Mon, Nov 18, 2013 at 7:34 AM, Steven Bosscher <stevenb.gcc@gmail.com>
>> wrote:
>>>
>>> On Mon, Nov 18, 2013 at 3:53 PM, Teresa Johnson wrote:
>>>>
>>>>  From a bb layout perspective it seems like it would be beneficial to
>>>> do compgotos before layout. Was the current position just to try to
>>>> reduce compile time by keeping the block unified as long as possible?
>>>
>>>
>>> It was more a hack that got out of hand. Apparently it hurt some
>>> interpreters (branch prediction!) when the unified computed goto is
>>> not "unfactored". There was a PR for this, and the unfactoring code I
>>> added only fixed part of the problem.
>>>
>>>
>>>> For your comment "I think those cases would benefit from having at
>>>> least scheduling/reordering and alignments done right." do you mean
>>>> that it would be better from a code quality perspective for those to
>>>> have compgotos done earlier before those passes? That seems to make
>>>> sense to me as well.
>>>
>>>
>>> Theoretically: Yes, perhaps. In practice there isn't much to gain.
>>> Unfactoring before bb-reorder is probably the most helpful thing, to
>>> get better results for basic block alignment and placement. But
>>> scheduling punts on computed gotos (or explodes in some non-linear
>>> bottleneck).
>>>
>>> What used to help is profile-based branch speculation, i.e.
>>>
>>> if (*target_addr == most_frequent_target_addr)
>>>    goto most_frequent_target_add;
>>> else
>>>    goto *target_addr;
>>>
>>> But I'm not sure if our value profiling based optimizations still have
>>> this case.
>>>
>>>
>>>> I'm doing an lto profiledbootstrap with the change right now. Is there
>>>> other testing that should be done for this change? Can you point me at
>>>> lucier's testcase that you refer to above? I found that PR15242 was
>>>> the bug that inspired adding compgotos, but it is a small test case so
>>>> I'm not sure what I will learn from trying that other than whether
>>>> compgotos still kicks in as expected.
>>>
>>>
>>> ISTR it's http://gcc.gnu.org/PR26854
>>
>>
>> Ok, I have confirmed that moving compgotos earlier, to just before bb
>> reordering, passes an LTO profiledbootstrap with
>> -freorder-blocks-and-partition forced on (with my prior patch to fixup
>> the layout removed). I also added an assert to ensure we aren't going
>> into cfglayout mode after bb reordering is complete. Currently I am
>> running a normal bootstrap and regression test without
>> -freorder-blocks-and-partition force enabled.
>>
>> I confirmed that we still apply compgotos and successfully unfactor
>> the computed goto in the testcase from PR15242.
>>
>> I also tried the two test cases in PR26854 (all.i and compiler.i) both
>> with -O1 -fschedule-insns2 (which shouldn't actually be affected since
>> compgotos/bbro not on) and with -O3 -fschedule-insns, as described in
>> the bug report. I compared the time and mem reports before and after
>> my change to the compgotos pass placement and there is no significant
>> difference in the time or memory used from bb reordering or other
>> downstream passes.
>>
>> Here is the patch I am testing with a normal bootstrap and regression
>> test on x86-64-unknown-linux-gnu. Is this change ok for trunk assuming
>> that passes?
>>
>> Thanks,
>> Teresa
>>
>> 2013-11-18  Teresa Johnson  <tejohnson@google.com>
>>
>>          * gcc/cfgrtl.c (cfg_layout_initialize): Assert if we
>>          try to go into cfglayout after bb reordering.
>>          * gcc/passes.def: Move compgotos before bb reordering
>>          since it goes into cfglayout.
>
> This is fine once the bootstrap/regression testing passes.

The bootstrap/regression test passed. Added a comment as Honza
suggested and committed as r204985.

Thanks,
Teresa

>
> Thanks,
> jeff
diff mbox

Patch

Index: gcc/cfgrtl.c
===================================================================
--- gcc/cfgrtl.c        (revision 204977)
+++ gcc/cfgrtl.c        (working copy)
@@ -4204,6 +4204,8 @@  cfg_layout_initialize (unsigned int flags)
   rtx x;
   basic_block bb;

+  gcc_assert (!crtl->bb_reorder_complete);
+
   initialize_original_copy_tables ();

   cfg_layout_rtl_register_cfg_hooks ();
Index: gcc/passes.def
===================================================================
--- gcc/passes.def      (revision 204977)
+++ gcc/passes.def      (working copy)
@@ -387,6 +387,7 @@  along with GCC; see the file COPYING3.  If not see
          NEXT_PASS (pass_regrename);
          NEXT_PASS (pass_cprop_hardreg);
          NEXT_PASS (pass_fast_rtl_dce);
+         NEXT_PASS (pass_duplicate_computed_gotos);
          NEXT_PASS (pass_reorder_blocks);
          NEXT_PASS (pass_branch_target_load_optimize2);
          NEXT_PASS (pass_leaf_regs);
@@ -398,7 +399,6 @@  along with GCC; see the file COPYING3.  If not see
              NEXT_PASS (pass_stack_regs_run);
          POP_INSERT_PASSES ()
          NEXT_PASS (pass_compute_alignments);
-         NEXT_PASS (pass_duplicate_computed_gotos);
          NEXT_PASS (pass_variable_tracking);
          NEXT_PASS (pass_free_cfg);
          NEXT_PASS (pass_machine_reorg);