diff mbox

Fix lto bootstrap verification failure with -freorder-blocks-and-partition

Message ID CAAe5K+VuZwb00Ni76j8ju=9_SnwA7PGsz=YZmnky8QWmM8ZCaw@mail.gmail.com
State New
Headers show

Commit Message

Teresa Johnson Nov. 15, 2013, 11:40 p.m. UTC
When testing with -freorder-blocks-and-partition enabled, I hit a
verification failure in an LTO profiledbootstrap. Edge forwarding
performed when we went into cfg layout mode after bb reordering
(during compgotos) created a situation where a hot block was then
dominated by a cold block and was therefore remarked as cold. Because
bb reorder was complete at that point, it was not moved in the
physical layout, and we incorrectly went in and out of the cold
section multiple times.

The following patch addresses that by fixing the layout when we move
blocks to the cold section after bb reordering is complete.

Tested with an LTO profiledbootstrap with
-freorder-blocks-and-partition enabled. Ok for trunk?

Thanks,
Teresa

2013-11-15  Teresa Johnson  <tejohnson@google.com>

        * cfgrtl.c (fixup_partitions): Reorder blocks if necessary.

Comments

Jan Hubicka Nov. 16, 2013, 8:33 a.m. UTC | #1
> When testing with -freorder-blocks-and-partition enabled, I hit a
> verification failure in an LTO profiledbootstrap. Edge forwarding
> performed when we went into cfg layout mode after bb reordering
> (during compgotos) created a situation where a hot block was then
> dominated by a cold block and was therefore remarked as cold. Because
> bb reorder was complete at that point, it was not moved in the
> physical layout, and we incorrectly went in and out of the cold
> section multiple times.
> 
> The following patch addresses that by fixing the layout when we move
> blocks to the cold section after bb reordering is complete.
> 
> Tested with an LTO profiledbootstrap with
> -freorder-blocks-and-partition enabled. Ok for trunk?
> 
> Thanks,
> Teresa
> 
> 2013-11-15  Teresa Johnson  <tejohnson@google.com>
> 
>         * cfgrtl.c (fixup_partitions): Reorder blocks if necessary.

computed_gotos just unfactors unified blocks that we use to avoid CFGs with
O(n^2) edges. This is mostly to avoid problems with nonlinearity of other passes
and to reduce the quadratic memory use case to one function at a time.

I wonder if it won't be cleaner to simply unfactor those just before pass_reorder_blocks.

Computed gotos are used e.g. in libjava interpreter to optimize the tight interpretting
loop.  I think those cases would benefit from having at least scheduling/reordering
and alignments done right.

Of course it depends on how bad the compile time implications are (I think in addition
to libjava, there was a lucier's testcase that made us to go for this trick) ,
but I would prefer it over ading yet another hack into cfgrtl...
We also may just avoid cfglayout cleanup_cfg while doing computed gotos...

Honza
Teresa Johnson Nov. 16, 2013, 3:56 p.m. UTC | #2
On Sat, Nov 16, 2013 at 12:33 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> When testing with -freorder-blocks-and-partition enabled, I hit a
>> verification failure in an LTO profiledbootstrap. Edge forwarding
>> performed when we went into cfg layout mode after bb reordering
>> (during compgotos) created a situation where a hot block was then
>> dominated by a cold block and was therefore remarked as cold. Because
>> bb reorder was complete at that point, it was not moved in the
>> physical layout, and we incorrectly went in and out of the cold
>> section multiple times.
>>
>> The following patch addresses that by fixing the layout when we move
>> blocks to the cold section after bb reordering is complete.
>>
>> Tested with an LTO profiledbootstrap with
>> -freorder-blocks-and-partition enabled. Ok for trunk?
>>
>> Thanks,
>> Teresa
>>
>> 2013-11-15  Teresa Johnson  <tejohnson@google.com>
>>
>>         * cfgrtl.c (fixup_partitions): Reorder blocks if necessary.
>
> computed_gotos just unfactors unified blocks that we use to avoid CFGs with
> O(n^2) edges. This is mostly to avoid problems with nonlinearity of other passes
> and to reduce the quadratic memory use case to one function at a time.
>
> I wonder if it won't be cleaner to simply unfactor those just before pass_reorder_blocks.
>
> Computed gotos are used e.g. in libjava interpreter to optimize the tight interpretting
> loop.  I think those cases would benefit from having at least scheduling/reordering
> and alignments done right.
>
> Of course it depends on how bad the compile time implications are (I think in addition
> to libjava, there was a lucier's testcase that made us to go for this trick) ,
> but I would prefer it over ading yet another hack into cfgrtl...
> We also may just avoid cfglayout cleanup_cfg while doing computed gotos...

Note I haven't done an extensive check to see if compgotos is the only
phase that goes back into cfglayout mode after bb reordering is done,
that's just the one that hit this. Eventually it might be good to
prevent going into cfglayout mode after bb reordering.

For now we could either fix up the layout as I am doing here. Or as
you suggest, prevent some cleanup/cfg optimization after bb reordering
is done. I thought about preventing the forwarding optimization after
bb reordering when splitting was on initially, but didn't want
enabling -freorder-blocks-and-partition to unnecessarily prevent
optimization. The reordering seemed reasonably straightforward so I
went with that solution in this patch.

Let me know if you'd rather have the solution of preventing the
forwarding (or maybe all all of try_optimize_cfg to be safe) under
-freorder-blocks-and-partition after bb reordering.

Thanks,
Teresa

>
> Honza
Jan Hubicka Nov. 16, 2013, 4:05 p.m. UTC | #3
> Note I haven't done an extensive check to see if compgotos is the only
> phase that goes back into cfglayout mode after bb reordering is done,
> that's just the one that hit this. Eventually it might be good to
> prevent going into cfglayout mode after bb reordering.

Can we just try to abort when into cfg layout is called after
bb reorder.  It seems to make sense to avoid that - in/out will
definitely result in misplaced gotos and toher stuff.
> 
> For now we could either fix up the layout as I am doing here. Or as
> you suggest, prevent some cleanup/cfg optimization after bb reordering
> is done. I thought about preventing the forwarding optimization after
> bb reordering when splitting was on initially, but didn't want
> enabling -freorder-blocks-and-partition to unnecessarily prevent
> optimization. The reordering seemed reasonably straightforward so I
> went with that solution in this patch.
> 
> Let me know if you'd rather have the solution of preventing the
> forwarding (or maybe all all of try_optimize_cfg to be safe) under
> -freorder-blocks-and-partition after bb reordering.

Generally I would like to be consistent about the stage of IL - i.e. go to
cfglayout after RTl expanstion and stay in it until after the bb reorder and
then consistently work with the actual insns layout we decided on.

Honza
> 
> Thanks,
> Teresa
> 
> >
> > Honza
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Teresa Johnson Nov. 18, 2013, 2:53 p.m. UTC | #4
On Sat, Nov 16, 2013 at 12:33 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> When testing with -freorder-blocks-and-partition enabled, I hit a
>> verification failure in an LTO profiledbootstrap. Edge forwarding
>> performed when we went into cfg layout mode after bb reordering
>> (during compgotos) created a situation where a hot block was then
>> dominated by a cold block and was therefore remarked as cold. Because
>> bb reorder was complete at that point, it was not moved in the
>> physical layout, and we incorrectly went in and out of the cold
>> section multiple times.
>>
>> The following patch addresses that by fixing the layout when we move
>> blocks to the cold section after bb reordering is complete.
>>
>> Tested with an LTO profiledbootstrap with
>> -freorder-blocks-and-partition enabled. Ok for trunk?
>>
>> Thanks,
>> Teresa
>>
>> 2013-11-15  Teresa Johnson  <tejohnson@google.com>
>>
>>         * cfgrtl.c (fixup_partitions): Reorder blocks if necessary.
>
> computed_gotos just unfactors unified blocks that we use to avoid CFGs with
> O(n^2) edges. This is mostly to avoid problems with nonlinearity of other passes
> and to reduce the quadratic memory use case to one function at a time.
>
> I wonder if it won't be cleaner to simply unfactor those just before pass_reorder_blocks.
>
> Computed gotos are used e.g. in libjava interpreter to optimize the tight interpretting
> loop.  I think those cases would benefit from having at least scheduling/reordering
> and alignments done right.
>
> Of course it depends on how bad the compile time implications are (I think in addition
> to libjava, there was a lucier's testcase that made us to go for this trick) ,
> but I would prefer it over ading yet another hack into cfgrtl...
> We also may just avoid cfglayout cleanup_cfg while doing computed gotos...

I am testing a new patch right now that simply moves compgotos to just
before bb reordering, and ads an assert to cfg_layout_initialize to
detect when we attempt that after bb reordering. I looked at the other
places that go into cfg layout and I think compgotos is currently the
only one after bb reordering.

From a bb layout perspective it seems like it would be beneficial to
do compgotos before layout. Was the current position just to try to
reduce compile time by keeping the block unified as long as possible?
For your comment "I think those cases would benefit from having at
least scheduling/reordering and alignments done right." do you mean
that it would be better from a code quality perspective for those to
have compgotos done earlier before those passes? That seems to make
sense to me as well.

I'm doing an lto profiledbootstrap with the change right now. Is there
other testing that should be done for this change? Can you point me at
lucier's testcase that you refer to above? I found that PR15242 was
the bug that inspired adding compgotos, but it is a small test case so
I'm not sure what I will learn from trying that other than whether
compgotos still kicks in as expected.

Thanks,
Teresa

>
> Honza
Steven Bosscher Nov. 18, 2013, 3:34 p.m. UTC | #5
On Mon, Nov 18, 2013 at 3:53 PM, Teresa Johnson wrote:
> From a bb layout perspective it seems like it would be beneficial to
> do compgotos before layout. Was the current position just to try to
> reduce compile time by keeping the block unified as long as possible?

It was more a hack that got out of hand. Apparently it hurt some
interpreters (branch prediction!) when the unified computed goto is
not "unfactored". There was a PR for this, and the unfactoring code I
added only fixed part of the problem.


> For your comment "I think those cases would benefit from having at
> least scheduling/reordering and alignments done right." do you mean
> that it would be better from a code quality perspective for those to
> have compgotos done earlier before those passes? That seems to make
> sense to me as well.

Theoretically: Yes, perhaps. In practice there isn't much to gain.
Unfactoring before bb-reorder is probably the most helpful thing, to
get better results for basic block alignment and placement. But
scheduling punts on computed gotos (or explodes in some non-linear
bottleneck).

What used to help is profile-based branch speculation, i.e.

if (*target_addr == most_frequent_target_addr)
  goto most_frequent_target_add;
else
  goto *target_addr;

But I'm not sure if our value profiling based optimizations still have
this case.


> I'm doing an lto profiledbootstrap with the change right now. Is there
> other testing that should be done for this change? Can you point me at
> lucier's testcase that you refer to above? I found that PR15242 was
> the bug that inspired adding compgotos, but it is a small test case so
> I'm not sure what I will learn from trying that other than whether
> compgotos still kicks in as expected.

ISTR it's http://gcc.gnu.org/PR26854

Ciao!
Steven
diff mbox

Patch

Index: cfgrtl.c
===================================================================
--- cfgrtl.c    (revision 204792)
+++ cfgrtl.c    (working copy)
@@ -61,6 +61,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "ggc.h"
 #include "tree-pass.h"
 #include "df.h"
+#include "pointer-set.h"

 /* Holds the interesting leading and trailing notes for the function.
    Only applicable if the CFG is in cfglayout mode.  */
@@ -2332,14 +2333,69 @@  fixup_partitions (void)
      forwarding and unreachable block deletion.  */
   vec<basic_block> bbs_to_fix = find_partition_fixes (false);

+  unsigned new_cold_count = bbs_to_fix.length();
+  if (! new_cold_count)
+    return;
+
   /* Do the partition fixup after all necessary blocks have been converted to
      cold, so that we only update the region crossings the minimum number of
      places, which can require forcing edges to be non fallthru.  */
+  struct pointer_set_t *bbs_moved_to_cold = pointer_set_create ();
   while (! bbs_to_fix.is_empty ())
     {
       bb = bbs_to_fix.pop ();
       fixup_new_cold_bb (bb);
+      pointer_set_insert (bbs_moved_to_cold, bb);
     }
+
+  /* If this happens after bb reordering is done we need to adjust the layout
+     via the next_bb pointers. Otherwise we will incorrectly have multiple
+     switches in and out of the cold section, which is illegal and will be
+     flagged by verify_flow_info. Currently, the compgotos pass goes back into
+     and out of cfglayout mode after bb reordering is complete. When
+     going into cfglayout mode, cfg optimizations may detect opportunities
+     for edge forwarding, which could leave a block in the hot section
+     dominated by a cold block, which will cause the above code to move
+     into to the cold section. By adjusting the next_bb/prev_bb pointers we
+     ensure that fixup_reorder_chain correctly relinks the blocks on exit
+     from cfglayout mode.  */
+  if (crtl->bb_reorder_complete)
+    {
+      /* We should only be doing optimizations that require moving
+         blocks between sections after reordering when we are in cfglayout
+         mode.  */
+      gcc_assert (current_ir_type () == IR_RTL_CFGLAYOUT);
+
+      /* Walk through in layout order, rip out each bb that was moved to
+         the cold section and physically move it to the end of the cold
+         section in layout order. Doing it in this order will guarantee
+         that any fallthrough blocks that were both reassigned to cold stay
+         in the right order.  */
+      unsigned moved_bbs = 0;
+      FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR->next_bb, EXIT_BLOCK_PTR, next_bb)
+        {
+          if (BB_PARTITION (bb) != BB_COLD_PARTITION)
+            continue;
+
+          /* If this was already a cold bb (it is not in the set
+             bbs_moved_to_cold), then we have hit the original cold
+             section blocks and we are done.  */
+          if (!pointer_set_contains (bbs_moved_to_cold, bb))
+            {
+              /* Make sure we processed all the newly cold bbs.  */
+              gcc_assert (moved_bbs == new_cold_count);
+              break;
+            }
+
+          /* Remove bb from current layout position.  */
+          unlink_block (bb);
+          /* Move bb to the end of the cold section.  */
+          link_block (bb, EXIT_BLOCK_PTR->prev_bb);
+
+          moved_bbs++;
+        }
+    }
+    pointer_set_destroy (bbs_moved_to_cold);
 }

 /* Verify, in the basic block chain, that there is at most one switch