diff mbox

Sanitize block partitioning under -freorder-blocks-and-partition

Message ID CAAe5K+UnqBfxYrZxSkjRudq8NYni_9ih+t=us+2hH+UPsrwDLA@mail.gmail.com
State New
Headers show

Commit Message

Teresa Johnson Aug. 19, 2013, 5:47 p.m. UTC
On Mon, Aug 19, 2013 at 8:09 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Remember it isn't using dominance anymore. The latest patch was
>> instead ensuring the most frequent path between hot blocks and the
>> entry/exit are marked hot. That should be better than the dominance
>> approach used in the earlier version.
>
> Indeed, that looks more resonable approach.
> Can you point me to the last version of patch? Last one I remember still
> walked dominators...

I've included the latest patch below. I still use dominators in the
post-cfg-optimization fixup (fixup_partitions), but not in the
partition sanitizing done during the partitioning itself
(sanitize_hot_paths). The former is looking for hot bbs newly
dominated by cold bbs after cfg transformations.

>>
>> > We can commit it and work on better solution incrementally but it will
>> > probably mean replacing it later.  If you think it makes things easier
>> > to work on it incrementally, I think the patch is OK.
>>
>> Yes, I think this is a big step forward from what is there now for
>> splitting, which does the splitting purely based on bb count in
>> isolation. I don't have a much better solution in mind yet.
>>
>> >>
>> >> > - I'll try building and profiling gimp myself to see if I can
>> >> > reproduce the issue with code executing out of the cold section.
>> >>
>> >> I have spent some time this week trying to get the latest gimp Martin
>> >> pointed me to configured and built, but it took awhile to track down
>> >> and configure/build all of the required versions of dependent
>> >> packages. I'm still hitting some issues trying to get it compiled, so
>> >> it may not yet be configured properly. I'll take a look again early
>> >> next week.
>> >
>> > I do not think there is anything special about gimp.  You can probably
>> > take any other bigger app, like GCC itself. With profiledbootstrap
>> > and linker script to lock unlikely section you should get ICEs where
>> > we jump into cold secton and should not.
>>
>> Ok, please point me to the linker script and I will try gcc
>> profiledbootstrap as well. I wanted to try gimp if possible as I
>> haven't seen this much jumping to the cold section in some of the
>> internal apps I tried.
>
> You can also discuss with Martin the systemtap script to plot disk accesses
> during the startup.  It is very handy for analyzing the code layout issues

Ok. I am using linux perf to collect this info (fed through some
scripts that munge and plot the data).

>
> It may be interesting to get similar script taking traces from valgrind
> and ploting the most frequent calls in the final layout ;)

I think linux perf -g to get a callgraph should give similar data.

Teresa

>
> Honza

2013-08-05  Teresa Johnson  <tejohnson@google.com>
            Steven Bosscher  <steven@gcc.gnu.org>

        * cfgrtl.c (fixup_new_cold_bb): New routine.
        (commit_edge_insertions): Invoke fixup_partitions.
        (find_partition_fixes): New routine.
        (fixup_partitions): Ditto.
        (verify_hot_cold_block_grouping): Update comments.
        (rtl_verify_edges): Invoke find_partition_fixes.
        (rtl_verify_bb_pointers): Update comments.
        (rtl_verify_bb_layout): Ditto.
        * basic-block.h (fixup_partitions): Declare.
        * cfgcleanup.c (try_optimize_cfg): Invoke fixup_partitions.
        * bb-reorder.c (sanitize_hot_paths): New function.
        (find_rarely_executed_basic_blocks_and_crossing_edges): Invoke
        sanitize_hot_paths.

      be in a different partition as the throw.  Fix this by either
      moving or duplicating the landing pads.  */

Comments

Martin Liška Aug. 19, 2013, 6:54 p.m. UTC | #1
Dear Teresa,

On 19 August 2013 19:47, Teresa Johnson <tejohnson@google.com> wrote:
> On Mon, Aug 19, 2013 at 8:09 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> Remember it isn't using dominance anymore. The latest patch was
>>> instead ensuring the most frequent path between hot blocks and the
>>> entry/exit are marked hot. That should be better than the dominance
>>> approach used in the earlier version.
>>
>> Indeed, that looks more resonable approach.
>> Can you point me to the last version of patch? Last one I remember still
>> walked dominators...
>
> I've included the latest patch below. I still use dominators in the
> post-cfg-optimization fixup (fixup_partitions), but not in the
> partition sanitizing done during the partitioning itself
> (sanitize_hot_paths). The former is looking for hot bbs newly
> dominated by cold bbs after cfg transformations.
>
>>>
>>> > We can commit it and work on better solution incrementally but it will
>>> > probably mean replacing it later.  If you think it makes things easier
>>> > to work on it incrementally, I think the patch is OK.
>>>
>>> Yes, I think this is a big step forward from what is there now for
>>> splitting, which does the splitting purely based on bb count in
>>> isolation. I don't have a much better solution in mind yet.
>>>
>>> >>
>>> >> > - I'll try building and profiling gimp myself to see if I can
>>> >> > reproduce the issue with code executing out of the cold section.
>>> >>
>>> >> I have spent some time this week trying to get the latest gimp Martin
>>> >> pointed me to configured and built, but it took awhile to track down
>>> >> and configure/build all of the required versions of dependent
>>> >> packages. I'm still hitting some issues trying to get it compiled, so
>>> >> it may not yet be configured properly. I'll take a look again early
>>> >> next week.
>>> >
>>> > I do not think there is anything special about gimp.  You can probably
>>> > take any other bigger app, like GCC itself. With profiledbootstrap
>>> > and linker script to lock unlikely section you should get ICEs where
>>> > we jump into cold secton and should not.
>>>
>>> Ok, please point me to the linker script and I will try gcc
>>> profiledbootstrap as well. I wanted to try gimp if possible as I
>>> haven't seen this much jumping to the cold section in some of the
>>> internal apps I tried.
>>
>> You can also discuss with Martin the systemtap script to plot disk accesses
>> during the startup.  It is very handy for analyzing the code layout issues

I send you as an attachment linker script that preserves
.text.unlikely, .text.exit, .text.startup and .text.hot sections. I
tries to modify memory access flags for .text.unlikely section to
read-only, but I am not so familiar with linker scripting. Maybe, you
can define with MEMORY command the read-only memory area for
.text.unlikely.

I am using for my graphing disabled read-ahead function in the Linux
kernel with systemtap script and the results are presented with
matplotlib library. If you are more interested in this tool-chain,
write me and I can send you these scripts. I was also using valgrind
to trace functions, so I will try to get locations of called functions
to identify which functions are called in corresponding sections.

Martin

> Ok. I am using linux perf to collect this info (fed through some
> scripts that munge and plot the data).
>
>>
>> It may be interesting to get similar script taking traces from valgrind
>> and ploting the most frequent calls in the final layout ;)
>
> I think linux perf -g to get a callgraph should give similar data.
>
> Teresa
>
>>
>> Honza
>
> 2013-08-05  Teresa Johnson  <tejohnson@google.com>
>             Steven Bosscher  <steven@gcc.gnu.org>
>
>         * cfgrtl.c (fixup_new_cold_bb): New routine.
>         (commit_edge_insertions): Invoke fixup_partitions.
>         (find_partition_fixes): New routine.
>         (fixup_partitions): Ditto.
>         (verify_hot_cold_block_grouping): Update comments.
>         (rtl_verify_edges): Invoke find_partition_fixes.
>         (rtl_verify_bb_pointers): Update comments.
>         (rtl_verify_bb_layout): Ditto.
>         * basic-block.h (fixup_partitions): Declare.
>         * cfgcleanup.c (try_optimize_cfg): Invoke fixup_partitions.
>         * bb-reorder.c (sanitize_hot_paths): New function.
>         (find_rarely_executed_basic_blocks_and_crossing_edges): Invoke
>         sanitize_hot_paths.
>
> Index: cfgrtl.c
> ===================================================================
> --- cfgrtl.c    (revision 201461)
> +++ cfgrtl.c    (working copy)
> @@ -1341,6 +1341,43 @@ fixup_partition_crossing (edge e)
>      }
>  }
>
> +/* Called when block BB has been reassigned to the cold partition,
> +   because it is now dominated by another cold block,
> +   to ensure that the region crossing attributes are updated.  */
> +
> +static void
> +fixup_new_cold_bb (basic_block bb)
> +{
> +  edge e;
> +  edge_iterator ei;
> +
> +  /* This is called when a hot bb is found to now be dominated
> +     by a cold bb and therefore needs to become cold. Therefore,
> +     its preds will no longer be region crossing. Any non-dominating
> +     preds that were previously hot would also have become cold
> +     in the caller for the same region. Any preds that were previously
> +     region-crossing will be adjusted in fixup_partition_crossing.  */
> +  FOR_EACH_EDGE (e, ei, bb->preds)
> +    {
> +      fixup_partition_crossing (e);
> +    }
> +
> +  /* Possibly need to make bb's successor edges region crossing,
> +     or remove stale region crossing.  */
> +  FOR_EACH_EDGE (e, ei, bb->succs)
> +    {
> +      /* We can't have fall-through edges across partition boundaries.
> +         Note that force_nonfallthru will do any necessary partition
> +         boundary fixup by calling fixup_partition_crossing itself.  */
> +      if ((e->flags & EDGE_FALLTHRU)
> +          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
> +          && e->dest != EXIT_BLOCK_PTR)
> +        force_nonfallthru (e);
> +      else
> +        fixup_partition_crossing (e);
> +    }
> +}
> +
>  /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
>     expense of adding new instructions or reordering basic blocks.
>
> @@ -1979,6 +2016,14 @@ commit_edge_insertions (void)
>  {
>    basic_block bb;
>
> +  /* Optimization passes that invoke this routine can cause hot blocks
> +     previously reached by both hot and cold blocks to become dominated only
> +     by cold blocks. This will cause the verification below to fail,
> +     and lead to now cold code in the hot section. In some cases this
> +     may only be visible after newly unreachable blocks are deleted,
> +     which will be done by fixup_partitions.  */
> +  fixup_partitions ();
> +
>  #ifdef ENABLE_CHECKING
>    verify_flow_info ();
>  #endif
> @@ -2173,6 +2218,101 @@ get_last_bb_insn (basic_block bb)
>    return end;
>  }
>
> +/* Sanity check partition hotness to ensure that basic blocks in
> +   the cold partition don't dominate basic blocks in the hot partition.
> +   If FLAG_ONLY is true, report violations as errors. Otherwise
> +   re-mark the dominated blocks as cold, since this is run after
> +   cfg optimizations that may make hot blocks previously reached
> +   by both hot and cold blocks now only reachable along cold paths.  */
> +
> +static vec<basic_block>
> +find_partition_fixes (bool flag_only)
> +{
> +  basic_block bb;
> +  vec<basic_block> bbs_in_cold_partition = vNULL;
> +  vec<basic_block> bbs_to_fix = vNULL;
> +
> +  /* Callers check this.  */
> +  gcc_checking_assert (crtl->has_bb_partition);
> +
> +  FOR_EACH_BB (bb)
> +    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
> +      bbs_in_cold_partition.safe_push (bb);
> +
> +  if (bbs_in_cold_partition.is_empty ())
> +    return vNULL;
> +
> +  bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
> +
> +  if (dom_calculated_here)
> +    calculate_dominance_info (CDI_DOMINATORS);
> +
> +  while (! bbs_in_cold_partition.is_empty  ())
> +    {
> +      bb = bbs_in_cold_partition.pop ();
> +      /* Any blocks dominated by a block in the cold section
> +         must also be cold.  */
> +      basic_block son;
> +      for (son = first_dom_son (CDI_DOMINATORS, bb);
> +           son;
> +           son = next_dom_son (CDI_DOMINATORS, son))
> +        {
> +          /* If son is not yet cold, then mark it cold here and
> +             enqueue it for further processing.  */
> +          if ((BB_PARTITION (son) != BB_COLD_PARTITION))
> +            {
> +              if (flag_only)
> +                error ("non-cold basic block %d dominated "
> +                       "by a block in the cold partition (%d)",
> son->index, bb->index);
> +              else
> +                BB_SET_PARTITION (son, BB_COLD_PARTITION);
> +              bbs_to_fix.safe_push (son);
> +              bbs_in_cold_partition.safe_push (son);
> +            }
> +        }
> +    }
> +
> +  if (dom_calculated_here)
> +    free_dominance_info (CDI_DOMINATORS);
> +
> +  return bbs_to_fix;
> +}
> +
> +/* Perform cleanup on the hot/cold bb partitioning after optimization
> +   passes that modify the cfg.  */
> +
> +void
> +fixup_partitions (void)
> +{
> +  basic_block bb;
> +
> +  if (!crtl->has_bb_partition)
> +    return;
> +
> +  /* Delete any blocks that became unreachable and weren't
> +     already cleaned up, for example during edge forwarding
> +     and convert_jumps_to_returns. This will expose more
> +     opportunities for fixing the partition boundaries here.
> +     Also, the calculation of the dominance graph during verification
> +     will assert if there are unreachable nodes.  */
> +  delete_unreachable_blocks ();
> +
> +  /* If there are partitions, do a sanity check on them: A basic block in
> +     a cold partition cannot dominate a basic block in a hot partition.
> +     Fixup any that now violate this requirement, as a result of edge
> +     forwarding and unreachable block deletion.  */
> +  vec<basic_block> bbs_to_fix = find_partition_fixes (false);
> +
> +  /* Do the partition fixup after all necessary blocks have been converted to
> +     cold, so that we only update the region crossings the minimum number of
> +     places, which can require forcing edges to be non fallthru.  */
> +  while (! bbs_to_fix.is_empty ())
> +    {
> +      bb = bbs_to_fix.pop ();
> +      fixup_new_cold_bb (bb);
> +    }
> +}
> +
>  /* Verify, in the basic block chain, that there is at most one switch
>     between hot/cold partitions. This condition will not be true until
>     after reorder_basic_blocks is called.  */
> @@ -2219,7 +2359,8 @@ verify_hot_cold_block_grouping (void)
>  /* Perform several checks on the edges out of each block, such as
>     the consistency of the branch probabilities, the correctness
>     of hot/cold partition crossing edges, and the number of expected
> -   successor edges.  */
> +   successor edges.  Also verify that the dominance relationship
> +   between hot/cold blocks is sane.  */
>
>  static int
>  rtl_verify_edges (void)
> @@ -2382,6 +2523,14 @@ rtl_verify_edges (void)
>         }
>      }
>
> +  /* If there are partitions, do a sanity check on them: A basic block in
> +     a cold partition cannot dominate a basic block in a hot partition.  */
> +  if (crtl->has_bb_partition && !err)
> +    {
> +      vec<basic_block> bbs_to_fix = find_partition_fixes (true);
> +      err = !bbs_to_fix.is_empty ();
> +    }
> +
>    /* Clean up.  */
>    return err;
>  }
> @@ -2515,7 +2664,7 @@ rtl_verify_bb_pointers (void)
>       and NOTE_INSN_BASIC_BLOCK
>     - verify that no fall_thru edge crosses hot/cold partition boundaries
>     - verify that there are no pending RTL branch predictions
> -   - verify that there is a single hot/cold partition boundary after bbro
> +   - verify that hot blocks are not dominated by cold blocks
>
>     In future it can be extended check a lot of other stuff as well
>     (reachability of basic blocks, life information, etc. etc.).  */
> @@ -2761,7 +2910,8 @@ rtl_verify_bb_layout (void)
>     - check that all insns are in the basic blocks
>       (except the switch handling code, barriers and notes)
>     - check that all returns are followed by barriers
> -   - check that all fallthru edge points to the adjacent blocks.  */
> +   - check that all fallthru edge points to the adjacent blocks
> +   - verify that there is a single hot/cold partition boundary after bbro  */
>
>  static int
>  rtl_verify_flow_info (void)
> Index: basic-block.h
> ===================================================================
> --- basic-block.h       (revision 201461)
> +++ basic-block.h       (working copy)
> @@ -797,6 +797,7 @@ extern bool contains_no_active_insn_p (const_basic
>  extern bool forwarder_block_p (const_basic_block);
>  extern bool can_fallthru (basic_block, basic_block);
>  extern void emit_barrier_after_bb (basic_block bb);
> +extern void fixup_partitions (void);
>
>  /* In cfgbuild.c.  */
>  extern void find_many_sub_basic_blocks (sbitmap);
> Index: cfgcleanup.c
> ===================================================================
> --- cfgcleanup.c        (revision 201461)
> +++ cfgcleanup.c        (working copy)
> @@ -2807,10 +2807,21 @@ try_optimize_cfg (int mode)
>               df_analyze ();
>             }
>
> +         if (changed)
> +            {
> +              /* Edge forwarding in particular can cause hot blocks previously
> +                 reached by both hot and cold blocks to become dominated only
> +                 by cold blocks. This will cause the verification
> below to fail,
> +                 and lead to now cold code in the hot section. This is not easy
> +                 to detect and fix during edge forwarding, and in some cases
> +                 is only visible after newly unreachable blocks are deleted,
> +                 which will be done in fixup_partitions.  */
> +              fixup_partitions ();
> +
>  #ifdef ENABLE_CHECKING
> -         if (changed)
> -           verify_flow_info ();
> +              verify_flow_info ();
>  #endif
> +            }
>
>           changed_overall |= changed;
>           first_pass = false;
> Index: bb-reorder.c
> ===================================================================
> --- bb-reorder.c        (revision 201461)
> +++ bb-reorder.c        (working copy)
> @@ -1444,27 +1444,134 @@ fix_up_crossing_landing_pad (eh_landing_pad old_lp
>        ei_next (&ei);
>  }
>
> +
> +/* Ensure that all hot bbs are included in a hot path through the
> +   procedure. This is done by calling this function twice, once
> +   with WALK_UP true (to look for paths from the entry to hot bbs) and
> +   once with WALK_UP false (to look for paths from hot bbs to the exit).
> +   Returns the updated value of COLD_BB_COUNT and adds newly-hot bbs
> +   to BBS_IN_HOT_PARTITION.  */
> +
> +static unsigned int
> +sanitize_hot_paths (bool walk_up, unsigned int cold_bb_count,
> +                    vec<basic_block> *bbs_in_hot_partition)
> +{
> +  /* Callers check this.  */
> +  gcc_checking_assert (cold_bb_count);
> +
> +  /* Keep examining hot bbs while we still have some left to check
> +     and there are remaining cold bbs.  */
> +  vec<basic_block> hot_bbs_to_check = bbs_in_hot_partition->copy ();
> +  while (! hot_bbs_to_check.is_empty ()
> +         && cold_bb_count)
> +    {
> +      basic_block bb = hot_bbs_to_check.pop ();
> +      vec<edge, va_gc> *edges = walk_up ? bb->preds : bb->succs;
> +      edge e;
> +      edge_iterator ei;
> +      int highest_probability = 0;
> +      bool found = false;
> +
> +      /* Walk the preds/succs and check if there is at least one already
> +         marked hot. Keep track of the most frequent pred/succ so that we
> +         can mark it hot if we don't find one.  */
> +      FOR_EACH_EDGE (e, ei, edges)
> +        {
> +          basic_block reach_bb = walk_up ? e->src : e->dest;
> +
> +          if (e->flags & EDGE_DFS_BACK)
> +            continue;
> +
> +          if (BB_PARTITION (reach_bb) != BB_COLD_PARTITION)
> +          {
> +            found = true;
> +            break;
> +          }
> +          if (e->probability > highest_probability)
> +            highest_probability = e->probability;
> +        }
> +
> +      /* If bb is reached by (or reaches, in the case of !WALK_UP) another hot
> +         block (or unpartitioned, e.g. the entry block) then it is ok. If not,
> +         then the most frequent pred (or succ) needs to be adjusted.  In the
> +         case where multiple preds/succs have the same probability (e.g. a
> +         50-50 branch), then both will be adjusted.  */
> +      if (found)
> +        continue;
> +
> +      FOR_EACH_EDGE (e, ei, edges)
> +        {
> +          if (e->flags & EDGE_DFS_BACK)
> +            continue;
> +          if (e->probability < highest_probability)
> +            continue;
> +
> +          basic_block reach_bb = walk_up ? e->src : e->dest;
> +
> +          /* We have a hot bb with an immediate dominator that is cold.
> +             The dominator needs to be re-marked hot.  */
> +          BB_SET_PARTITION (reach_bb, BB_HOT_PARTITION);
> +          cold_bb_count--;
> +
> +          /* Now we need to examine newly-hot reach_bb to see if it is also
> +             dominated by a cold bb.  */
> +          bbs_in_hot_partition->safe_push (reach_bb);
> +          hot_bbs_to_check.safe_push (reach_bb);
> +        }
> +    }
> +
> +  return cold_bb_count;
> +}
> +
> +
>  /* Find the basic blocks that are rarely executed and need to be moved to
>     a separate section of the .o file (to cut down on paging and improve
>     cache locality).  Return a vector of all edges that cross.  */
>
> -static vec<edge>
> +static vec<edge>
>  find_rarely_executed_basic_blocks_and_crossing_edges (void)
>  {
>    vec<edge> crossing_edges = vNULL;
>    basic_block bb;
>    edge e;
>    edge_iterator ei;
> +  unsigned int cold_bb_count = 0;
> +  vec<basic_block> bbs_in_hot_partition = vNULL;
>
>    /* Mark which partition (hot/cold) each basic block belongs in.  */
>    FOR_EACH_BB (bb)
>      {
>        if (probably_never_executed_bb_p (cfun, bb))
> -       BB_SET_PARTITION (bb, BB_COLD_PARTITION);
> +        {
> +          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
> +          cold_bb_count++;
> +        }
>        else
> -       BB_SET_PARTITION (bb, BB_HOT_PARTITION);
> +        {
> +          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
> +          bbs_in_hot_partition.safe_push (bb);
> +        }
>      }
>
> +  /* Ensure that hot bbs are included along a hot path from the entry to exit.
> +     Several different possibilities may include cold bbs along all paths
> +     to/from a hot bb. One is that there are edge weight insanities
> +     due to optimization phases that do not properly update basic block profile
> +     counts. The second is that the entry of the function may not be
> hot, because
> +     it is entered fewer times than the number of profile training
> runs, but there
> +     is a loop inside the function that causes blocks within the function to be
> +     above the threshold for hotness. This is fixed by walking up from hot bbs
> +     to the entry block, and then down from hot bbs to the exit, performing
> +     partitioning fixups as necessary.  */
> +  if (cold_bb_count)
> +    {
> +      mark_dfs_back_edges ();
> +      cold_bb_count = sanitize_hot_paths (true, cold_bb_count,
> +                                          &bbs_in_hot_partition);
> +      if (cold_bb_count)
> +        sanitize_hot_paths (false, cold_bb_count, &bbs_in_hot_partition);
> +    }
> +
>    /* The format of .gcc_except_table does not allow landing pads to
>       be in a different partition as the throw.  Fix this by either
>       moving or duplicating the landing pads.  */
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
diff mbox

Patch

Index: cfgrtl.c
===================================================================
--- cfgrtl.c    (revision 201461)
+++ cfgrtl.c    (working copy)
@@ -1341,6 +1341,43 @@  fixup_partition_crossing (edge e)
     }
 }

+/* Called when block BB has been reassigned to the cold partition,
+   because it is now dominated by another cold block,
+   to ensure that the region crossing attributes are updated.  */
+
+static void
+fixup_new_cold_bb (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  /* This is called when a hot bb is found to now be dominated
+     by a cold bb and therefore needs to become cold. Therefore,
+     its preds will no longer be region crossing. Any non-dominating
+     preds that were previously hot would also have become cold
+     in the caller for the same region. Any preds that were previously
+     region-crossing will be adjusted in fixup_partition_crossing.  */
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      fixup_partition_crossing (e);
+    }
+
+  /* Possibly need to make bb's successor edges region crossing,
+     or remove stale region crossing.  */
+  FOR_EACH_EDGE (e, ei, bb->succs)
+    {
+      /* We can't have fall-through edges across partition boundaries.
+         Note that force_nonfallthru will do any necessary partition
+         boundary fixup by calling fixup_partition_crossing itself.  */
+      if ((e->flags & EDGE_FALLTHRU)
+          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
+          && e->dest != EXIT_BLOCK_PTR)
+        force_nonfallthru (e);
+      else
+        fixup_partition_crossing (e);
+    }
+}
+
 /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
    expense of adding new instructions or reordering basic blocks.

@@ -1979,6 +2016,14 @@  commit_edge_insertions (void)
 {
   basic_block bb;

+  /* Optimization passes that invoke this routine can cause hot blocks
+     previously reached by both hot and cold blocks to become dominated only
+     by cold blocks. This will cause the verification below to fail,
+     and lead to now cold code in the hot section. In some cases this
+     may only be visible after newly unreachable blocks are deleted,
+     which will be done by fixup_partitions.  */
+  fixup_partitions ();
+
 #ifdef ENABLE_CHECKING
   verify_flow_info ();
 #endif
@@ -2173,6 +2218,101 @@  get_last_bb_insn (basic_block bb)
   return end;
 }

+/* Sanity check partition hotness to ensure that basic blocks in
+   the cold partition don't dominate basic blocks in the hot partition.
+   If FLAG_ONLY is true, report violations as errors. Otherwise
+   re-mark the dominated blocks as cold, since this is run after
+   cfg optimizations that may make hot blocks previously reached
+   by both hot and cold blocks now only reachable along cold paths.  */
+
+static vec<basic_block>
+find_partition_fixes (bool flag_only)
+{
+  basic_block bb;
+  vec<basic_block> bbs_in_cold_partition = vNULL;
+  vec<basic_block> bbs_to_fix = vNULL;
+
+  /* Callers check this.  */
+  gcc_checking_assert (crtl->has_bb_partition);
+
+  FOR_EACH_BB (bb)
+    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
+      bbs_in_cold_partition.safe_push (bb);
+
+  if (bbs_in_cold_partition.is_empty ())
+    return vNULL;
+
+  bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+
+  if (dom_calculated_here)
+    calculate_dominance_info (CDI_DOMINATORS);
+
+  while (! bbs_in_cold_partition.is_empty  ())
+    {
+      bb = bbs_in_cold_partition.pop ();
+      /* Any blocks dominated by a block in the cold section
+         must also be cold.  */
+      basic_block son;
+      for (son = first_dom_son (CDI_DOMINATORS, bb);
+           son;
+           son = next_dom_son (CDI_DOMINATORS, son))
+        {
+          /* If son is not yet cold, then mark it cold here and
+             enqueue it for further processing.  */
+          if ((BB_PARTITION (son) != BB_COLD_PARTITION))
+            {
+              if (flag_only)
+                error ("non-cold basic block %d dominated "
+                       "by a block in the cold partition (%d)",
son->index, bb->index);
+              else
+                BB_SET_PARTITION (son, BB_COLD_PARTITION);
+              bbs_to_fix.safe_push (son);
+              bbs_in_cold_partition.safe_push (son);
+            }
+        }
+    }
+
+  if (dom_calculated_here)
+    free_dominance_info (CDI_DOMINATORS);
+
+  return bbs_to_fix;
+}
+
+/* Perform cleanup on the hot/cold bb partitioning after optimization
+   passes that modify the cfg.  */
+
+void
+fixup_partitions (void)
+{
+  basic_block bb;
+
+  if (!crtl->has_bb_partition)
+    return;
+
+  /* Delete any blocks that became unreachable and weren't
+     already cleaned up, for example during edge forwarding
+     and convert_jumps_to_returns. This will expose more
+     opportunities for fixing the partition boundaries here.
+     Also, the calculation of the dominance graph during verification
+     will assert if there are unreachable nodes.  */
+  delete_unreachable_blocks ();
+
+  /* If there are partitions, do a sanity check on them: A basic block in
+     a cold partition cannot dominate a basic block in a hot partition.
+     Fixup any that now violate this requirement, as a result of edge
+     forwarding and unreachable block deletion.  */
+  vec<basic_block> bbs_to_fix = find_partition_fixes (false);
+
+  /* Do the partition fixup after all necessary blocks have been converted to
+     cold, so that we only update the region crossings the minimum number of
+     places, which can require forcing edges to be non fallthru.  */
+  while (! bbs_to_fix.is_empty ())
+    {
+      bb = bbs_to_fix.pop ();
+      fixup_new_cold_bb (bb);
+    }
+}
+
 /* Verify, in the basic block chain, that there is at most one switch
    between hot/cold partitions. This condition will not be true until
    after reorder_basic_blocks is called.  */
@@ -2219,7 +2359,8 @@  verify_hot_cold_block_grouping (void)
 /* Perform several checks on the edges out of each block, such as
    the consistency of the branch probabilities, the correctness
    of hot/cold partition crossing edges, and the number of expected
-   successor edges.  */
+   successor edges.  Also verify that the dominance relationship
+   between hot/cold blocks is sane.  */

 static int
 rtl_verify_edges (void)
@@ -2382,6 +2523,14 @@  rtl_verify_edges (void)
        }
     }

+  /* If there are partitions, do a sanity check on them: A basic block in
+     a cold partition cannot dominate a basic block in a hot partition.  */
+  if (crtl->has_bb_partition && !err)
+    {
+      vec<basic_block> bbs_to_fix = find_partition_fixes (true);
+      err = !bbs_to_fix.is_empty ();
+    }
+
   /* Clean up.  */
   return err;
 }
@@ -2515,7 +2664,7 @@  rtl_verify_bb_pointers (void)
      and NOTE_INSN_BASIC_BLOCK
    - verify that no fall_thru edge crosses hot/cold partition boundaries
    - verify that there are no pending RTL branch predictions
-   - verify that there is a single hot/cold partition boundary after bbro
+   - verify that hot blocks are not dominated by cold blocks

    In future it can be extended check a lot of other stuff as well
    (reachability of basic blocks, life information, etc. etc.).  */
@@ -2761,7 +2910,8 @@  rtl_verify_bb_layout (void)
    - check that all insns are in the basic blocks
      (except the switch handling code, barriers and notes)
    - check that all returns are followed by barriers
-   - check that all fallthru edge points to the adjacent blocks.  */
+   - check that all fallthru edge points to the adjacent blocks
+   - verify that there is a single hot/cold partition boundary after bbro  */

 static int
 rtl_verify_flow_info (void)
Index: basic-block.h
===================================================================
--- basic-block.h       (revision 201461)
+++ basic-block.h       (working copy)
@@ -797,6 +797,7 @@  extern bool contains_no_active_insn_p (const_basic
 extern bool forwarder_block_p (const_basic_block);
 extern bool can_fallthru (basic_block, basic_block);
 extern void emit_barrier_after_bb (basic_block bb);
+extern void fixup_partitions (void);

 /* In cfgbuild.c.  */
 extern void find_many_sub_basic_blocks (sbitmap);
Index: cfgcleanup.c
===================================================================
--- cfgcleanup.c        (revision 201461)
+++ cfgcleanup.c        (working copy)
@@ -2807,10 +2807,21 @@  try_optimize_cfg (int mode)
              df_analyze ();
            }

+         if (changed)
+            {
+              /* Edge forwarding in particular can cause hot blocks previously
+                 reached by both hot and cold blocks to become dominated only
+                 by cold blocks. This will cause the verification
below to fail,
+                 and lead to now cold code in the hot section. This is not easy
+                 to detect and fix during edge forwarding, and in some cases
+                 is only visible after newly unreachable blocks are deleted,
+                 which will be done in fixup_partitions.  */
+              fixup_partitions ();
+
 #ifdef ENABLE_CHECKING
-         if (changed)
-           verify_flow_info ();
+              verify_flow_info ();
 #endif
+            }

          changed_overall |= changed;
          first_pass = false;
Index: bb-reorder.c
===================================================================
--- bb-reorder.c        (revision 201461)
+++ bb-reorder.c        (working copy)
@@ -1444,27 +1444,134 @@  fix_up_crossing_landing_pad (eh_landing_pad old_lp
       ei_next (&ei);
 }

+
+/* Ensure that all hot bbs are included in a hot path through the
+   procedure. This is done by calling this function twice, once
+   with WALK_UP true (to look for paths from the entry to hot bbs) and
+   once with WALK_UP false (to look for paths from hot bbs to the exit).
+   Returns the updated value of COLD_BB_COUNT and adds newly-hot bbs
+   to BBS_IN_HOT_PARTITION.  */
+
+static unsigned int
+sanitize_hot_paths (bool walk_up, unsigned int cold_bb_count,
+                    vec<basic_block> *bbs_in_hot_partition)
+{
+  /* Callers check this.  */
+  gcc_checking_assert (cold_bb_count);
+
+  /* Keep examining hot bbs while we still have some left to check
+     and there are remaining cold bbs.  */
+  vec<basic_block> hot_bbs_to_check = bbs_in_hot_partition->copy ();
+  while (! hot_bbs_to_check.is_empty ()
+         && cold_bb_count)
+    {
+      basic_block bb = hot_bbs_to_check.pop ();
+      vec<edge, va_gc> *edges = walk_up ? bb->preds : bb->succs;
+      edge e;
+      edge_iterator ei;
+      int highest_probability = 0;
+      bool found = false;
+
+      /* Walk the preds/succs and check if there is at least one already
+         marked hot. Keep track of the most frequent pred/succ so that we
+         can mark it hot if we don't find one.  */
+      FOR_EACH_EDGE (e, ei, edges)
+        {
+          basic_block reach_bb = walk_up ? e->src : e->dest;
+
+          if (e->flags & EDGE_DFS_BACK)
+            continue;
+
+          if (BB_PARTITION (reach_bb) != BB_COLD_PARTITION)
+          {
+            found = true;
+            break;
+          }
+          if (e->probability > highest_probability)
+            highest_probability = e->probability;
+        }
+
+      /* If bb is reached by (or reaches, in the case of !WALK_UP) another hot
+         block (or unpartitioned, e.g. the entry block) then it is ok. If not,
+         then the most frequent pred (or succ) needs to be adjusted.  In the
+         case where multiple preds/succs have the same probability (e.g. a
+         50-50 branch), then both will be adjusted.  */
+      if (found)
+        continue;
+
+      FOR_EACH_EDGE (e, ei, edges)
+        {
+          if (e->flags & EDGE_DFS_BACK)
+            continue;
+          if (e->probability < highest_probability)
+            continue;
+
+          basic_block reach_bb = walk_up ? e->src : e->dest;
+
+          /* We have a hot bb with an immediate dominator that is cold.
+             The dominator needs to be re-marked hot.  */
+          BB_SET_PARTITION (reach_bb, BB_HOT_PARTITION);
+          cold_bb_count--;
+
+          /* Now we need to examine newly-hot reach_bb to see if it is also
+             dominated by a cold bb.  */
+          bbs_in_hot_partition->safe_push (reach_bb);
+          hot_bbs_to_check.safe_push (reach_bb);
+        }
+    }
+
+  return cold_bb_count;
+}
+
+
 /* Find the basic blocks that are rarely executed and need to be moved to
    a separate section of the .o file (to cut down on paging and improve
    cache locality).  Return a vector of all edges that cross.  */

-static vec<edge>
+static vec<edge>
 find_rarely_executed_basic_blocks_and_crossing_edges (void)
 {
   vec<edge> crossing_edges = vNULL;
   basic_block bb;
   edge e;
   edge_iterator ei;
+  unsigned int cold_bb_count = 0;
+  vec<basic_block> bbs_in_hot_partition = vNULL;

   /* Mark which partition (hot/cold) each basic block belongs in.  */
   FOR_EACH_BB (bb)
     {
       if (probably_never_executed_bb_p (cfun, bb))
-       BB_SET_PARTITION (bb, BB_COLD_PARTITION);
+        {
+          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
+          cold_bb_count++;
+        }
       else
-       BB_SET_PARTITION (bb, BB_HOT_PARTITION);
+        {
+          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
+          bbs_in_hot_partition.safe_push (bb);
+        }
     }

+  /* Ensure that hot bbs are included along a hot path from the entry to exit.
+     Several different possibilities may include cold bbs along all paths
+     to/from a hot bb. One is that there are edge weight insanities
+     due to optimization phases that do not properly update basic block profile
+     counts. The second is that the entry of the function may not be
hot, because
+     it is entered fewer times than the number of profile training
runs, but there
+     is a loop inside the function that causes blocks within the function to be
+     above the threshold for hotness. This is fixed by walking up from hot bbs
+     to the entry block, and then down from hot bbs to the exit, performing
+     partitioning fixups as necessary.  */
+  if (cold_bb_count)
+    {
+      mark_dfs_back_edges ();
+      cold_bb_count = sanitize_hot_paths (true, cold_bb_count,
+                                          &bbs_in_hot_partition);
+      if (cold_bb_count)
+        sanitize_hot_paths (false, cold_bb_count, &bbs_in_hot_partition);
+    }
+
   /* The format of .gcc_except_table does not allow landing pads to