diff mbox

Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

Message ID 20121115201013.D8E7E61423@tjsboxrox.mtv.corp.google.com
State New
Headers show

Commit Message

Teresa Johnson Nov. 15, 2012, 8:10 p.m. UTC
Revised patch that fixes failures encountered when enabling
-freorder-blocks-and-partition, including the failure reported in PR 53743.

This includes new verification code to ensure no cold blocks dominate hot
blocks contributed by Steven Bosscher.

I attempted to make the handling of partition updates through the optimization
passes much more consistent, removing a number of partial fixes in the code
stream in the process. The code to fixup partitions (including the BB_PARTITION
assignement, region crossing jump notes, and switch text section notes) is
now handled in a few centralized locations. For example, inside
rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
don't need to attempt the fixup themselves.

For optimization passes that make adjustments to the cfg while in cfg layout
mode that are not easy to fix up incrementally, the new routine
fixup_partitions handles the cleanup globally. This does require calculation
of the dominance relation, however, as far as I can tell the routines which
now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
are invoked typically once (or a small number of times in the case of
try_optimize_cfg) per optimization pass. Additionally, I compared the
-ftime-report output for some large fdo compilations and saw only minimal
increases in the dominance computation times, which were only a tiny percent
of the overall compile time.

Additionally, I added a flag to the rtl_data structure to indicate whether
any partitioning was actually performed, so that optimizations which were
conservatively disabled whenever the flag_reorder_blocks_and_partition
is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
conservative for functions where no partitions were formed (e.g. they are
completely hot).

Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
benchmarks and internal google benchmarks using profile feedback and
-freorder-blocks-and-partition to get more coverage. Ok for trunk?

Thanks,
Teresa

2012-11-14  Teresa Johnson  <tejohnson@google.com>
            Steven Bosscher  <steven@gcc.gnu.org>

	* cfghooks.h (cfg_layout_finalize): New parameter.
	* modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
        parameter.
	* ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
        as this is now done by redirect_edge_and_branch_force.
	* function.c (thread_prologue_and_epilogue_insns): Insert new bb after
        barriers, new cfg_layout_finalize parameter, and don't store exit
        predecessor BB until after it is potentially split.
	* function.h (struct rtl_data): New flag has_bb_partition.
	* hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
	* cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
        any blocks in function actually partitioned.
	(try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
        up partitioning.
	* bb-reorder.c (connect_traces): Only look for partitions and skip
        block copying if any blocks in function actually partitioned.
	(emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
        (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
        that no cold blocks dominate a hot block.
	(fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
        as this is now done by force_nonfallthru_and_redirect.
	(add_reg_crossing_jump_notes): Handle the fact that some jumps may
        already be marked with region crossing note.
	(reorder_basic_blocks): Only need to verify partitions if any
        blocks in function actually partitioned.
	(insert_section_boundary_note): Only need to insert note if any
        blocks in function actually partitioned.
	(rest_of_handle_reorder_blocks): New cfg_layout_finalize
        parameter, and remove call to insert_section_boundary_note as this
        is now called via cfg_layout_finalize/fixup_reorder_chain.
	(duplicate_computed_gotos): New cfg_layout_finalize
        parameter.
	(partition_hot_cold_basic_blocks): Set flag indicating function
        has bb partitions.
	* bb-reorder.h: Declare insert_section_boundary_note and
        emit_barrier_after_bb, which are no longer static.
	* basic-block.h: Declare new function fixup_partitions.
	* cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
        check for region crossing note.
	(fixup_partition_crossing): New function.
	(fixup_bb_partition): Ditto.
	(rtl_redirect_edge_and_branch): Fixup partition boundaries.
	(force_nonfallthru_and_redirect): Fixup partition boundaries,
        remove old code that tried to do this. Emit barrier correctly
        when we are in cfglayout mode.
	(rtl_split_edge): Correctly fixup partition boundaries.
	(commit_one_edge_insertion): Remove old code that tried to
        fixup region crossing edge since this is now handled in
        split_block, and set up insertion point correctly since
        block may now end in a jump.
	(commit_edge_insertions): Invoke fixup_partitions to sanitize partition
        boundaries after optimizations that modify cfg and before trying to
        verify the flow info.
	(fixup_partitions): New function.
	(rtl_verify_flow_info_1): Add verification that no cold bbs dominate
        hot bbs.
	(record_effective_endpoints): Remove region-crossing notes and set flag
        indicating that they need to be reinserted on exit from cfglayout mode.
	(outof_cfg_layout_mode): New cfg_layout_finalize parameter.
	(fixup_reorder_chain): Call insert_section_boundary_note if necessary.
        Remove old code that attempted to fixup region crossing note as
        this is now handled in force_nonfallthru_and_redirect.
	(duplicate_insn_chain): Don't duplicate switch section notes.
	(cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
	(rtl_can_remove_branch_p): Remove unnecessary check for region crossing
        note.


--
This patch is available for review at http://codereview.appspot.com/6823047

Comments

Teresa Johnson Nov. 26, 2012, 3:55 p.m. UTC | #1
Ping.
Teresa

On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote:
> Revised patch that fixes failures encountered when enabling
> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>
> This includes new verification code to ensure no cold blocks dominate hot
> blocks contributed by Steven Bosscher.
>
> I attempted to make the handling of partition updates through the optimization
> passes much more consistent, removing a number of partial fixes in the code
> stream in the process. The code to fixup partitions (including the BB_PARTITION
> assignement, region crossing jump notes, and switch text section notes) is
> now handled in a few centralized locations. For example, inside
> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
> don't need to attempt the fixup themselves.
>
> For optimization passes that make adjustments to the cfg while in cfg layout
> mode that are not easy to fix up incrementally, the new routine
> fixup_partitions handles the cleanup globally. This does require calculation
> of the dominance relation, however, as far as I can tell the routines which
> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
> are invoked typically once (or a small number of times in the case of
> try_optimize_cfg) per optimization pass. Additionally, I compared the
> -ftime-report output for some large fdo compilations and saw only minimal
> increases in the dominance computation times, which were only a tiny percent
> of the overall compile time.
>
> Additionally, I added a flag to the rtl_data structure to indicate whether
> any partitioning was actually performed, so that optimizations which were
> conservatively disabled whenever the flag_reorder_blocks_and_partition
> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
> conservative for functions where no partitions were formed (e.g. they are
> completely hot).
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
> benchmarks and internal google benchmarks using profile feedback and
> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
>
> Thanks,
> Teresa
>
> 2012-11-14  Teresa Johnson  <tejohnson@google.com>
>             Steven Bosscher  <steven@gcc.gnu.org>
>
>         * cfghooks.h (cfg_layout_finalize): New parameter.
>         * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
>         parameter.
>         * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
>         as this is now done by redirect_edge_and_branch_force.
>         * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
>         barriers, new cfg_layout_finalize parameter, and don't store exit
>         predecessor BB until after it is potentially split.
>         * function.h (struct rtl_data): New flag has_bb_partition.
>         * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
>         * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
>         any blocks in function actually partitioned.
>         (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
>         up partitioning.
>         * bb-reorder.c (connect_traces): Only look for partitions and skip
>         block copying if any blocks in function actually partitioned.
>         (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
>         (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
>         that no cold blocks dominate a hot block.
>         (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
>         as this is now done by force_nonfallthru_and_redirect.
>         (add_reg_crossing_jump_notes): Handle the fact that some jumps may
>         already be marked with region crossing note.
>         (reorder_basic_blocks): Only need to verify partitions if any
>         blocks in function actually partitioned.
>         (insert_section_boundary_note): Only need to insert note if any
>         blocks in function actually partitioned.
>         (rest_of_handle_reorder_blocks): New cfg_layout_finalize
>         parameter, and remove call to insert_section_boundary_note as this
>         is now called via cfg_layout_finalize/fixup_reorder_chain.
>         (duplicate_computed_gotos): New cfg_layout_finalize
>         parameter.
>         (partition_hot_cold_basic_blocks): Set flag indicating function
>         has bb partitions.
>         * bb-reorder.h: Declare insert_section_boundary_note and
>         emit_barrier_after_bb, which are no longer static.
>         * basic-block.h: Declare new function fixup_partitions.
>         * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
>         check for region crossing note.
>         (fixup_partition_crossing): New function.
>         (fixup_bb_partition): Ditto.
>         (rtl_redirect_edge_and_branch): Fixup partition boundaries.
>         (force_nonfallthru_and_redirect): Fixup partition boundaries,
>         remove old code that tried to do this. Emit barrier correctly
>         when we are in cfglayout mode.
>         (rtl_split_edge): Correctly fixup partition boundaries.
>         (commit_one_edge_insertion): Remove old code that tried to
>         fixup region crossing edge since this is now handled in
>         split_block, and set up insertion point correctly since
>         block may now end in a jump.
>         (commit_edge_insertions): Invoke fixup_partitions to sanitize partition
>         boundaries after optimizations that modify cfg and before trying to
>         verify the flow info.
>         (fixup_partitions): New function.
>         (rtl_verify_flow_info_1): Add verification that no cold bbs dominate
>         hot bbs.
>         (record_effective_endpoints): Remove region-crossing notes and set flag
>         indicating that they need to be reinserted on exit from cfglayout mode.
>         (outof_cfg_layout_mode): New cfg_layout_finalize parameter.
>         (fixup_reorder_chain): Call insert_section_boundary_note if necessary.
>         Remove old code that attempted to fixup region crossing note as
>         this is now handled in force_nonfallthru_and_redirect.
>         (duplicate_insn_chain): Don't duplicate switch section notes.
>         (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
>         (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
>         note.
>
> Index: cfghooks.h
> ===================================================================
> --- cfghooks.h  (revision 193376)
> +++ cfghooks.h  (working copy)
> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
>  void account_profile_record (struct profile_record *, int);
>
>  extern void cfg_layout_initialize (unsigned int);
> -extern void cfg_layout_finalize (void);
> +extern void cfg_layout_finalize (bool);
>
>  /* Hooks containers.  */
>  extern struct cfg_hooks gimple_cfg_hooks;
> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi
>  extern void gimple_register_cfg_hooks (void);
>  extern struct cfg_hooks get_cfg_hooks (void);
>  extern void set_cfg_hooks (struct cfg_hooks);
> -
> Index: modulo-sched.c
> ===================================================================
> --- modulo-sched.c      (revision 193376)
> +++ modulo-sched.c      (working copy)
> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void)
>      if (bb->next_bb != EXIT_BLOCK_PTR)
>        bb->aux = bb->next_bb;
>    free_dominance_info (CDI_DOMINATORS);
> -  cfg_layout_finalize ();
> +  cfg_layout_finalize (false);
>  #endif /* INSN_SCHEDULING */
>    return 0;
>  }
> Index: ifcvt.c
> ===================================================================
> --- ifcvt.c     (revision 193376)
> +++ ifcvt.c     (working copy)
> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
>    if (new_bb)
>      {
>        df_bb_replace (then_bb_index, new_bb);
> -      /* Since the fallthru edge was redirected from test_bb to new_bb,
> -         we need to ensure that new_bb is in the same partition as
> -         test bb (you can not fall through across section boundaries).  */
> -      BB_COPY_PARTITION (new_bb, test_bb);
> +      /* This should have been done above via force_nonfallthru_and_redirect
> +         (possibly called from redirect_edge_and_branch_force).  */
> +      gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
>      }
>
>    num_true_changes++;
> Index: function.c
> ===================================================================
> --- function.c  (revision 193376)
> +++ function.c  (working copy)
> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void)
>                     break;
>                 if (e)
>                   {
> -                   copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
> -                                                 NULL_RTX, e->src);
> +                    /* Make sure we insert after any barriers.  */
> +                    rtx end = get_last_bb_insn (e->src);
> +                    copy_bb = create_basic_block (NEXT_INSN (end),
> +                                                  NULL_RTX, e->src);
>                     BB_COPY_PARTITION (copy_bb, e->src);
>                   }
>                 else
> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void)
>         if (cur_bb->index >= NUM_FIXED_BLOCKS
>             && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS)
>           cur_bb->aux = cur_bb->next_bb;
> -      cfg_layout_finalize ();
> +      cfg_layout_finalize (false);
>      }
>
>  epilogue_done:
> @@ -6517,7 +6519,7 @@ epilogue_done:
>        basic_block simple_return_block_cold = NULL;
>        edge pending_edge_hot = NULL;
>        edge pending_edge_cold = NULL;
> -      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
> +      basic_block exit_pred;
>        int i;
>
>        gcc_assert (entry_edge != orig_entry_edge);
> @@ -6545,6 +6547,12 @@ epilogue_done:
>             else
>               pending_edge_cold = e;
>           }
> +
> +      /* Save a pointer to the exit's predecessor BB for use in
> +         inserting new BBs at the end of the function. Do this
> +         after the call to split_block above which may split
> +         the original exit pred.  */
> +      exit_pred = EXIT_BLOCK_PTR->prev_bb;
>
>        FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
>         {
> Index: function.h
> ===================================================================
> --- function.h  (revision 193376)
> +++ function.h  (working copy)
> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data {
>       sched2) and is useful only if the port defines LEAF_REGISTERS.  */
>    bool uses_only_leaf_regs;
>
> +  /* Nonzero if the function being compiled has undergone hot/cold partitioning
> +     (under flag_reorder_blocks_and_partition) and has at least one cold
> +     block.  */
> +  bool has_bb_partition;
> +
>    /* Like regs_ever_live, but 1 if a reg is set or clobbered from an
>       asm.  Unlike regs_ever_live, elements of this array corresponding
>       to eliminable regs (like the frame pointer) are set if an asm
> Index: hw-doloop.c
> ===================================================================
> --- hw-doloop.c (revision 193376)
> +++ hw-doloop.c (working copy)
> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops)
>        else
>         bb->aux = NULL;
>      }
> -  cfg_layout_finalize ();
> +  cfg_layout_finalize (false);
>    clear_aux_for_blocks ();
>    df_analyze ();
>  }
> Index: cfgcleanup.c
> ===================================================================
> --- cfgcleanup.c        (revision 193376)
> +++ cfgcleanup.c        (working copy)
> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
>       partition boundaries).  See the comments at the top of
>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>
> -  if (flag_reorder_blocks_and_partition && reload_completed)
> +  if (crtl->has_bb_partition && reload_completed)
>      return false;
>
>    /* Search backward through forwarder blocks.  We don't need to worry
> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode)
>               df_analyze ();
>             }
>
> +         if (changed)
> +            {
> +              /* Edge forwarding in particular can cause hot blocks previously
> +                 reached by both hot and cold blocks to become dominated only
> +                 by cold blocks. This will cause the verification below to fail,
> +                 and lead to now cold code in the hot section. This is not easy
> +                 to detect and fix during edge forwarding, and in some cases
> +                 is only visible after newly unreachable blocks are deleted,
> +                 which will be done in fixup_partitions.  */
> +              fixup_partitions ();
> +
>  #ifdef ENABLE_CHECKING
> -         if (changed)
> -           verify_flow_info ();
> +              verify_flow_info ();
>  #endif
> +            }
>
>           changed_overall |= changed;
>           first_pass = false;
> Index: bb-reorder.c
> ===================================================================
> --- bb-reorder.c        (revision 193376)
> +++ bb-reorder.c        (working copy)
> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces
>    current_partition = BB_PARTITION (traces[0].first);
>    two_passes = false;
>
> -  if (flag_reorder_blocks_and_partition)
> +  if (crtl->has_bb_partition)
>      for (i = 0; i < n_traces && !two_passes; i++)
>        if (BB_PARTITION (traces[0].first)
>           != BB_PARTITION (traces[i].first))
> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces
>                       }
>                   }
>
> -             if (flag_reorder_blocks_and_partition)
> +             if (crtl->has_bb_partition)
>                 try_copy = false;
>
>               /* Copy tiny blocks always; copy larger blocks only when the
> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void)
>    return length;
>  }
>
> -/* Emit a barrier into the footer of BB.  */
> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
>
> -static void
> +void
>  emit_barrier_after_bb (basic_block bb)
>  {
>    rtx barrier = emit_barrier_after (BB_END (bb));
> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>  }
>
>  /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg
>  {
>    VEC(edge, heap) *crossing_edges = NULL;
>    basic_block bb;
> -  edge e;
> -  edge_iterator ei;
> +  edge e, e2;
> +  edge_iterator ei, ei2;
> +  unsigned int cold_bb_count = 0;
> +  VEC (basic_block, heap) *bbs_in_hot_partition = NULL;
> +  VEC (basic_block, heap) *bbs_newly_hot = NULL;
>
>    /* Mark which partition (hot/cold) each basic block belongs in.  */
>    FOR_EACH_BB (bb)
>      {
>        if (probably_never_executed_bb_p (cfun, bb))
> -       BB_SET_PARTITION (bb, BB_COLD_PARTITION);
> +        {
> +          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
> +          cold_bb_count++;
> +        }
>        else
> -       BB_SET_PARTITION (bb, BB_HOT_PARTITION);
> +        {
> +          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb);
> +        }
>      }
>
> +  /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of
> +     several different possibilities. One is that there are edge weight insanities
> +     due to optimization phases that do not properly update basic block profile
> +     counts. The second is that the entry of the function may not be hot, because
> +     it is entered fewer times than the number of profile training runs, but there
> +     is a loop inside the function that causes blocks within the function to be
> +     above the threshold for hotness.  */
> +  if (cold_bb_count)
> +    {
> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
> +
> +      if (dom_calculated_here)
> +        calculate_dominance_info (CDI_DOMINATORS);
> +
> +      /* Keep examining hot bbs until we have either checked them all, or
> +         re-marked all cold bbs hot.  */
> +      while (! VEC_empty (basic_block, bbs_in_hot_partition)
> +             && cold_bb_count)
> +        {
> +          basic_block dom_bb;
> +
> +          bb = VEC_pop (basic_block, bbs_in_hot_partition);
> +          dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
> +
> +          /* If bb's immediate dominator is also hot then it is ok.  */
> +          if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
> +            continue;
> +
> +          /* We have a hot bb with an immediate dominator that is cold.
> +             The dominator needs to be re-marked to hot.  */
> +          BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION);
> +          cold_bb_count--;
> +
> +          /* Now we need to examine newly-hot dom_bb to see if it is also
> +             dominated by a cold bb.  */
> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb);
> +
> +          /* We should also adjust any cold blocks that the newly-hot bb
> +             feeds and see if it makes sense to re-mark those as hot as
> +             well.  */
> +          VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb);
> +          while (! VEC_empty (basic_block, bbs_newly_hot))
> +            {
> +              basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot);
> +              /* Examine all successors of this newly-hot bb to see if they
> +                 are cold and should be re-marked as hot.  */
> +              FOR_EACH_EDGE (e, ei, new_hot_bb->succs)
> +                {
> +                  bool any_cold_preds = false;
> +                  basic_block succ = e->dest;
> +                  if (BB_PARTITION (succ) != BB_COLD_PARTITION)
> +                    continue;
> +                  /* Does this block have any cold predecessors now?  */
> +                  FOR_EACH_EDGE (e2, ei2, succ->preds)
> +                  {
> +                    if (BB_PARTITION (e2->src) == BB_COLD_PARTITION)
> +                      {
> +                        any_cold_preds = true;
> +                        break;
> +                      }
> +                  }
> +                  if (any_cold_preds)
> +                    continue;
> +
> +                  /* Here we have a successor of newly-hot bb that is cold
> +                     but no longer has any cold precessessors. Since the original
> +                     assignment of our newly-hot bb was incorrect, this successor's
> +                     assignment as cold is also suspect. Go ahead and re-mark it
> +                     as hot now too. Better heuristics may be in order here.  */
> +                  BB_SET_PARTITION (succ, BB_HOT_PARTITION);
> +                  cold_bb_count--;
> +                  VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ);
> +                  /* Examine this successor as a newly-hot bb.  */
> +                  VEC_safe_push (basic_block, heap, bbs_newly_hot, succ);
> +                }
> +            }
> +        }
> +
> +      if (dom_calculated_here)
> +        free_dominance_info (CDI_DOMINATORS);
> +    }
> +
>    /* The format of .gcc_except_table does not allow landing pads to
>       be in a different partition as the throw.  Fix this by either
>       moving or duplicating the landing pads.  */
> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void)
>                       new_bb->aux = cur_bb->aux;
>                       cur_bb->aux = new_bb;
>
> -                     /* Make sure new fall-through bb is in same
> -                        partition as bb it's falling through from.  */
> +                      /* This is done by force_nonfallthru_and_redirect.  */
> +                     gcc_assert (BB_PARTITION (new_bb)
> +                                  == BB_PARTITION (cur_bb));
>
> -                     BB_COPY_PARTITION (new_bb, cur_bb);
>                       single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
>                     }
>                   else
> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void)
>    FOR_EACH_BB (bb)
>      FOR_EACH_EDGE (e, ei, bb->succs)
>        if ((e->flags & EDGE_CROSSING)
> -         && JUMP_P (BB_END (e->src)))
> +         && JUMP_P (BB_END (e->src))
> +          /* Some notes were added during fix_up_fall_thru_edges, via
> +             force_nonfallthru_and_redirect.  */
> +          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
>         add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>  }
>
> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void)
>        dump_flow_info (dump_file, dump_flags);
>      }
>
> -  if (flag_reorder_blocks_and_partition)
> +  if (crtl->has_bb_partition)
>      verify_hot_cold_block_grouping ();
>  }
>
> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void)
>     encountering this note will make the compiler switch between the
>     hot and cold text sections.  */
>
> -static void
> +void
>  insert_section_boundary_note (void)
>  {
>    basic_block bb;
>    rtx new_note;
>    int first_partition = 0;
>
> -  if (!flag_reorder_blocks_and_partition)
> +  if (!crtl->has_bb_partition)
>      return;
>
>    FOR_EACH_BB (bb)
> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void)
>    FOR_EACH_BB (bb)
>      if (bb->next_bb != EXIT_BLOCK_PTR)
>        bb->aux = bb->next_bb;
> -  cfg_layout_finalize ();
> +  cfg_layout_finalize (true);
>
> -  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
> -  insert_section_boundary_note ();
>    return 0;
>  }
>
> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void)
>      }
>
>  done:
> -  cfg_layout_finalize ();
> +  cfg_layout_finalize (false);
>
>    BITMAP_FREE (candidates);
>    return 0;
> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void)
>    if (crossing_edges == NULL)
>      return 0;
>
> +  crtl->has_bb_partition = true;
> +
>    /* Make sure the source of any crossing edge ends in a jump and the
>       destination of any crossing edge has a label.  */
>    add_labels_and_missing_jumps (crossing_edges);
> Index: bb-reorder.h
> ===================================================================
> --- bb-reorder.h        (revision 193376)
> +++ bb-reorder.h        (working copy)
> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re
>
>  extern int get_uncond_jump_length (void);
>
> +extern void insert_section_boundary_note (void);
> +
> +extern void emit_barrier_after_bb (basic_block bb);
> +
>  #endif
> Index: basic-block.h
> ===================================================================
> --- basic-block.h       (revision 193376)
> +++ basic-block.h       (working copy)
> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect
>  extern bool contains_no_active_insn_p (const_basic_block);
>  extern bool forwarder_block_p (const_basic_block);
>  extern bool can_fallthru (basic_block, basic_block);
> +extern void fixup_partitions (void);
>
>  /* In cfgbuild.c.  */
>  extern void find_many_sub_basic_blocks (sbitmap);
> Index: cfgrtl.c
> ===================================================================
> --- cfgrtl.c    (revision 193376)
> +++ cfgrtl.c    (working copy)
> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree.h"
>  #include "hard-reg-set.h"
>  #include "basic-block.h"
> +#include "bb-reorder.h"
>  #include "regs.h"
>  #include "flags.h"
>  #include "function.h"
> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3.  If not see
>     Only applicable if the CFG is in cfglayout mode.  */
>  static GTY(()) rtx cfg_layout_function_footer;
>  static GTY(()) rtx cfg_layout_function_header;
> +static bool had_sec_boundary_notes;
>
>  static rtx skip_insns_after_block (basic_block);
>  static void record_effective_endpoints (void);
>  static rtx label_for_bb (basic_block);
> -static void fixup_reorder_chain (void);
> +static void fixup_reorder_chain (bool finalize_reorder_blocks);
>
>  void verify_insn_chain (void);
>  static void fixup_fallthru_exit_predecessor (void);
> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
>       partition boundaries).  See  the comments at the top of
>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>
> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
> -      || BB_PARTITION (src) != BB_PARTITION (target))
> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>      return NULL;
>
>    /* We can replace or remove a complex jump only when we have exactly
> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target)
>    return e;
>  }
>
> +/* Called when edge E has been redirected to a new destination,
> +   in order to update the region crossing flag on the edge and
> +   jump.  */
> +
> +static void
> +fixup_partition_crossing (edge e, basic_block target)
> +{
> +  rtx note;
> +
> +  gcc_assert (e->dest == target);
> +
> +  if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR)
> +    return;
> +  /* If we redirected an existing edge, it may already be marked
> +     crossing, even though the new src is missing a reg crossing note.
> +     But make sure reg crossing note doesn't already exist before
> +     inserting.  */
> +  if (BB_PARTITION (e->src) != BB_PARTITION (target))
> +    {
> +      e->flags |= EDGE_CROSSING;
> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
> +      if (JUMP_P (BB_END (e->src))
> +          && !note)
> +        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
> +    }
> +  else if (BB_PARTITION (e->src) == BB_PARTITION (target))
> +    {
> +      e->flags &= ~EDGE_CROSSING;
> +      /* Remove the region crossing note from jump at end of
> +         e->src if it exists.  */
> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
> +      if (note)
> +        remove_note (BB_END (e->src), note);
> +    }
> +}
> +
> +/* Called when block BB has been reassigned to a different partition,
> +   to ensure that the region crossing attributes are updated.  */
> +
> +static void
> +fixup_bb_partition (basic_block bb)
> +{
> +  edge e;
> +  edge_iterator ei;
> +
> +  /* Now need to make bb's pred edges non-region crossing.  */
> +  FOR_EACH_EDGE (e, ei, bb->preds)
> +    {
> +      fixup_partition_crossing (e, e->dest);
> +    }
> +
> +  /* Possibly need to make bb's successor edges region crossing,
> +     or remove stale region crossing.  */
> +  FOR_EACH_EDGE (e, ei, bb->succs)
> +    {
> +      if ((e->flags & EDGE_FALLTHRU)
> +          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
> +          && e->dest != EXIT_BLOCK_PTR)
> +        /* force_nonfallthru_and_redirect calls fixup_partition_crossing.  */
> +        force_nonfallthru (e);
> +      else
> +        fixup_partition_crossing (e, e->dest);
> +    }
> +}
> +
>  /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
>     expense of adding new instructions or reordering basic blocks.
>
> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>  {
>    edge ret;
>    basic_block src = e->src;
> +  basic_block dest = e->dest;
>
>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>      return NULL;
>
> -  if (e->dest == target)
> +  if (dest == target)
>      return e;
>
>    if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
>      {
>        df_set_bb_dirty (src);
> +      fixup_partition_crossing (ret, target);
>        return ret;
>      }
>
> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>      return NULL;
>
>    df_set_bb_dirty (src);
> +  fixup_partition_crossing (ret, target);
>    return ret;
>  }
>
> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>        /* Make sure new block ends up in correct hot/cold section.  */
>
>        BB_COPY_PARTITION (jump_block, e->src);
> -      if (flag_reorder_blocks_and_partition
> -         && targetm_common.have_named_sections
> -         && JUMP_P (BB_END (jump_block))
> -         && !any_condjump_p (BB_END (jump_block))
> -         && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
> -       add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
>
>        /* Wire edge in.  */
>        new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
>        new_edge->probability = probability;
>        new_edge->count = count;
>
> +      /* If e->src was previously region crossing, it no longer is
> +         and the reg crossing note should be removed.  */
> +      fixup_partition_crossing (new_edge, jump_block);
> +
>        /* Redirect old edge.  */
>        redirect_edge_pred (e, jump_block);
>        e->probability = REG_BR_PROB_BASE;
> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>        LABEL_NUSES (label)++;
>      }
>
> -  emit_barrier_after (BB_END (jump_block));
> +  /* We might be in cfg layout mode, and if so, the following routine will
> +     insert the barrier correctly.  */
> +  emit_barrier_after_bb (jump_block);
>    redirect_edge_succ_nodup (e, target);
>
>    if (abnormal_edge_flags)
>      make_edge (src, target, abnormal_edge_flags);
>
>    df_mark_solutions_dirty ();
> +  fixup_partition_crossing (e, target);
>    return new_bb;
>  }
>
> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
>  static basic_block
>  rtl_split_edge (edge edge_in)
>  {
> -  basic_block bb;
> +  basic_block bb, new_bb;
>    rtx before;
>
>    /* Abnormal edges cannot be split.  */
> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in)
>    else
>      {
>        bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
> -      /* ??? Why not edge_in->dest->prev_bb here?  */
> -      BB_COPY_PARTITION (bb, edge_in->dest);
> +      if (edge_in->src == ENTRY_BLOCK_PTR)
> +        BB_COPY_PARTITION (bb, edge_in->dest);
> +      else
> +        /* Put the split bb into the src partition, to avoid creating
> +           a situation where a cold bb dominates a hot bb, in the case
> +           where src is cold and dest is hot. The src will dominate
> +           the new bb (whereas it might not have dominated dest).  */
> +        BB_COPY_PARTITION (bb, edge_in->src);
>      }
>
>    make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);
>
> +  /* Can't allow a region crossing edge to be fallthrough.  */
> +  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
> +      && edge_in->dest != EXIT_BLOCK_PTR)
> +    {
> +      new_bb = force_nonfallthru (single_succ_edge (bb));
> +      gcc_assert (!new_bb);
> +    }
> +
>    /* For non-fallthru edges, we must adjust the predecessor's
>       jump instruction to target our new block.  */
>    if ((edge_in->flags & EDGE_FALLTHRU) == 0)
> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e)
>    else
>      {
>        bb = split_edge (e);
> -      after = BB_END (bb);
>
> -      if (flag_reorder_blocks_and_partition
> -         && targetm_common.have_named_sections
> -         && e->src != ENTRY_BLOCK_PTR
> -         && BB_PARTITION (e->src) == BB_COLD_PARTITION
> -         && !(e->flags & EDGE_CROSSING)
> -         && JUMP_P (after)
> -         && !any_condjump_p (after)
> -         && (single_succ_edge (bb)->flags & EDGE_CROSSING))
> -       add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
> +      /* If e crossed a partition boundary, we needed to make bb end in
> +         a region-crossing jump, even though it was originally fallthru.  */
> +      if (JUMP_P (BB_END (bb)))
> +       before = BB_END (bb);
> +      else
> +        after = BB_END (bb);
>      }
>
>    /* Now that we've found the spot, do the insertion.  */
> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void)
>  {
>    basic_block bb;
>
> +  /* Optimization passes that invoke this routine can cause hot blocks
> +     previously reached by both hot and cold blocks to become dominated only
> +     by cold blocks. This will cause the verification below to fail,
> +     and lead to now cold code in the hot section. In some cases this
> +     may only be visible after newly unreachable blocks are deleted,
> +     which will be done by fixup_partitions.  */
> +  fixup_partitions ();
> +
>  #ifdef ENABLE_CHECKING
>    verify_flow_info ();
>  #endif
> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb)
>
>    return end;
>  }
> -
> +
> +/* Perform cleanup on the hot/cold bb partitioning after optimization
> +   passes that modify the cfg.  */
> +
> +void
> +fixup_partitions (void)
> +{
> +  basic_block bb;
> +
> +  if (!crtl->has_bb_partition)
> +    return;
> +
> +  /* Delete any blocks that became unreachable and weren't
> +     already cleaned up, for example during edge forwarding
> +     and convert_jumps_to_returns. This will expose more
> +     opportunities for fixing the partition boundaries here.
> +     Also, the calculation of the dominance graph during verification
> +     will assert if there are unreachable nodes.  */
> +  delete_unreachable_blocks ();
> +
> +  /* If there are partitions, do a sanity check on them: A basic block in
> +     a cold partition cannot dominate a basic block in a hot partition.
> +     Fixup any that now violate this requirement, as a result of edge
> +     forwarding and unreachable block deletion.  */
> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
> +  VEC (basic_block, heap) *bbs_to_fix = NULL;
> +  FOR_EACH_BB (bb)
> +    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
> +      VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
> +    {
> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
> +      basic_block son;
> +
> +      if (dom_calculated_here)
> +        calculate_dominance_info (CDI_DOMINATORS);
> +
> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
> +        {
> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
> +          /* If bb is not yet cold (because it was added below as
> +             a block dominated by a cold bb) then mark it cold here.  */
> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
> +            {
> +              BB_SET_PARTITION (bb, BB_COLD_PARTITION);
> +              VEC_safe_push (basic_block, heap, bbs_to_fix, bb);
> +            }
> +          /* Any blocks dominated by a block in the cold section
> +             must also be cold.  */
> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
> +               son;
> +               son = next_dom_son (CDI_DOMINATORS, son))
> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
> +        }
> +
> +      if (dom_calculated_here)
> +        free_dominance_info (CDI_DOMINATORS);
> +    }
> +
> +  /* Do the partition fixup after all necessary blocks have been converted to
> +     cold, so that we only update the region crossings the minimum number of
> +     places, which can require forcing edges to be non fallthru.  */
> +  while (! VEC_empty (basic_block, bbs_to_fix))
> +    {
> +      bb = VEC_pop (basic_block, bbs_to_fix);
> +      fixup_bb_partition (bb);
> +    }
> +}
> +
>  /* Verify the CFG and RTL consistency common for both underlying RTL and
>     cfglayout RTL.
>
> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void)
>    rtx x;
>    int err = 0;
>    basic_block bb;
> +  bool have_partitions = false;
>
>    /* Check the general integrity of the basic blocks.  */
>    FOR_EACH_BB_REVERSE (bb)
> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void)
>
>           if (e->flags & EDGE_ABNORMAL)
>             n_abnormal++;
> +
> +          have_partitions |= is_crossing;
>         }
>
>        if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void)
>           }
>      }
>
> +  /* If there are partitions, do a sanity check on them: A basic block in
> +     a cold partition cannot dominate a basic block in a hot partition.  */
> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
> +  if (have_partitions && !err)
> +    FOR_EACH_BB (bb)
> +      if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
> +        VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
> +    {
> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
> +      basic_block son;
> +
> +      if (dom_calculated_here)
> +        calculate_dominance_info (CDI_DOMINATORS);
> +
> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
> +        {
> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
> +            {
> +              error ("non-cold basic block %d dominated "
> +                     "by a block in the cold partition", bb->index);
> +              err = 1;
> +            }
> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
> +               son;
> +               son = next_dom_son (CDI_DOMINATORS, son))
> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
> +        }
> +
> +      if (dom_calculated_here)
> +        free_dominance_info (CDI_DOMINATORS);
> +    }
> +
>    /* Clean up.  */
>    return err;
>  }
> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void)
>    else
>      cfg_layout_function_header = NULL_RTX;
>
> +  had_sec_boundary_notes = false;
> +
>    next_insn = get_insns ();
>    FOR_EACH_BB (bb)
>      {
>        rtx end;
>
>        if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb))
> -       BB_HEADER (bb) = unlink_insn_chain (next_insn,
> -                                             PREV_INSN (BB_HEAD (bb)));
> +        {
> +          /* Rather than try to keep section boundary notes incrementally
> +             up-to-date through cfg layout optimizations, simply remove them
> +             and flag that they should be re-inserted when exiting
> +             cfg layout mode.  */
> +          rtx check_insn = next_insn;
> +          while (check_insn)
> +            {
> +              if (NOTE_P (check_insn)
> +                  && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
> +              {
> +                had_sec_boundary_notes |= true;
> +                /* Remove note from chain. Grab new next_insn first.  */
> +                if (next_insn == check_insn)
> +                  next_insn = NEXT_INSN (check_insn);
> +                /* Delete note.  */
> +                delete_insn (check_insn);
> +                /* There will only be one.  */
> +                break;
> +              }
> +              check_insn = NEXT_INSN (check_insn);
> +            }
> +          /* If we still have header instructions left after above loop.  */
> +          if (next_insn != BB_HEAD (bb))
> +            BB_HEADER (bb) = unlink_insn_chain (next_insn,
> +                                                PREV_INSN (BB_HEAD (bb)));
> +        }
>        end = skip_insns_after_block (bb);
>        if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end)
>         BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end);
> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void)
>      if (bb->next_bb != EXIT_BLOCK_PTR)
>        bb->aux = bb->next_bb;
>
> -  cfg_layout_finalize ();
> +  cfg_layout_finalize (false);
>
>    return 0;
>  }
> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode)
>  }
>
>
> -/* Given a reorder chain, rearrange the code to match.  */
> +/* Given a reorder chain, rearrange the code to match. If
> +   this is called when we will FINALIZE_REORDER_BLOCKS, or when
> +   section boundary notes were removed on entry to cfg layout
> +   mode, insert section boundary notes here.  */
>
>  static void
> -fixup_reorder_chain (void)
> +fixup_reorder_chain (bool finalize_reorder_blocks)
>  {
>    basic_block bb;
>    rtx insn = NULL;
> @@ -3150,7 +3373,7 @@ static void
>           PREV_INSN (BB_HEADER (bb)) = insn;
>           insn = BB_HEADER (bb);
>           while (NEXT_INSN (insn))
> -           insn = NEXT_INSN (insn);
> +            insn = NEXT_INSN (insn);
>         }
>        if (insn)
>         NEXT_INSN (insn) = BB_HEAD (bb);
> @@ -3175,6 +3398,11 @@ static void
>      insn = NEXT_INSN (insn);
>
>    set_last_insn (insn);
> +
> +  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
> +  if (had_sec_boundary_notes || finalize_reorder_blocks)
> +    insert_section_boundary_note ();
> +
>  #ifdef ENABLE_CHECKING
>    verify_insn_chain ();
>  #endif
> @@ -3187,7 +3415,7 @@ static void
>        edge e_fall, e_taken, e;
>        rtx bb_end_insn;
>        rtx ret_label = NULL_RTX;
> -      basic_block nb, src_bb;
> +      basic_block nb;
>        edge_iterator ei;
>
>        if (EDGE_COUNT (bb->succs) == 0)
> @@ -3322,7 +3550,6 @@ static void
>        /* We got here if we need to add a new jump insn.
>          Note force_nonfallthru can delete E_FALL and thus we have to
>          save E_FALL->src prior to the call to force_nonfallthru.  */
> -      src_bb = e_fall->src;
>        nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
>        if (nb)
>         {
> @@ -3330,17 +3557,6 @@ static void
>           bb->aux = nb;
>           /* Don't process this new block.  */
>           bb = nb;
> -
> -         /* Make sure new bb is tagged for correct section (same as
> -            fall-thru source, since you cannot fall-thru across
> -            section boundaries).  */
> -         BB_COPY_PARTITION (src_bb, single_pred (bb));
> -         if (flag_reorder_blocks_and_partition
> -             && targetm_common.have_named_sections
> -             && JUMP_P (BB_END (bb))
> -             && !any_condjump_p (BB_END (bb))
> -             && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
> -           add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
>         }
>      }
>
> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to)
>             case NOTE_INSN_FUNCTION_BEG:
>               /* There is always just single entry to function.  */
>             case NOTE_INSN_BASIC_BLOCK:
> +              /* We should only switch text sections once.  */
> +           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>               break;
>
>             case NOTE_INSN_EPILOGUE_BEG:
> -           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>               emit_note_copy (insn);
>               break;
>
> @@ -3759,10 +3976,13 @@ break_superblocks (void)
>  }
>
>  /* Finalize the changes: reorder insn list according to the sequence specified
> -   by aux pointers, enter compensation code, rebuild scope forest.  */
> +   by aux pointers, enter compensation code, rebuild scope forest. If
> +   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
> +   to fixup_reorder_chain so that it can insert the proper switch text
> +   section notes.  */
>
>  void
> -cfg_layout_finalize (void)
> +cfg_layout_finalize (bool finalize_reorder_blocks)
>  {
>  #ifdef ENABLE_CHECKING
>    verify_flow_info ();
> @@ -3775,7 +3995,7 @@ void
>  #endif
>        )
>      fixup_fallthru_exit_predecessor ();
> -  fixup_reorder_chain ();
> +  fixup_reorder_chain (finalize_reorder_blocks);
>
>    rebuild_jump_labels (get_insns ());
>    delete_dead_jumptables ();
> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e)
>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>      return false;
>
> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
> -      || BB_PARTITION (src) != BB_PARTITION (target))
> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>      return false;
>
>    if (!onlyjump_p (insn)
>
> --
> This patch is available for review at http://codereview.appspot.com/6823047
Christophe Lyon Nov. 26, 2012, 4:25 p.m. UTC | #2
Hi,

I have tested your patch on Spec2000 on ARM, and I can still see
several failures caused by:
"error: fallthru edge crosses section boundary", including the case
described in PR55121.

On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote:
> Ping.
> Teresa
>
> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote:
>> Revised patch that fixes failures encountered when enabling
>> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>>
>> This includes new verification code to ensure no cold blocks dominate hot
>> blocks contributed by Steven Bosscher.
>>
>> I attempted to make the handling of partition updates through the optimization
>> passes much more consistent, removing a number of partial fixes in the code
>> stream in the process. The code to fixup partitions (including the BB_PARTITION
>> assignement, region crossing jump notes, and switch text section notes) is
>> now handled in a few centralized locations. For example, inside
>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
>> don't need to attempt the fixup themselves.
>>
>> For optimization passes that make adjustments to the cfg while in cfg layout
>> mode that are not easy to fix up incrementally, the new routine
>> fixup_partitions handles the cleanup globally. This does require calculation
>> of the dominance relation, however, as far as I can tell the routines which
>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
>> are invoked typically once (or a small number of times in the case of
>> try_optimize_cfg) per optimization pass. Additionally, I compared the
>> -ftime-report output for some large fdo compilations and saw only minimal
>> increases in the dominance computation times, which were only a tiny percent
>> of the overall compile time.
>>
>> Additionally, I added a flag to the rtl_data structure to indicate whether
>> any partitioning was actually performed, so that optimizations which were
>> conservatively disabled whenever the flag_reorder_blocks_and_partition
>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
>> conservative for functions where no partitions were formed (e.g. they are
>> completely hot).
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
>> benchmarks and internal google benchmarks using profile feedback and
>> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
>>
>> Thanks,
>> Teresa
>>
>> 2012-11-14  Teresa Johnson  <tejohnson@google.com>
>>             Steven Bosscher  <steven@gcc.gnu.org>
>>
>>         * cfghooks.h (cfg_layout_finalize): New parameter.
>>         * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
>>         parameter.
>>         * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
>>         as this is now done by redirect_edge_and_branch_force.
>>         * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
>>         barriers, new cfg_layout_finalize parameter, and don't store exit
>>         predecessor BB until after it is potentially split.
>>         * function.h (struct rtl_data): New flag has_bb_partition.
>>         * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
>>         * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
>>         any blocks in function actually partitioned.
>>         (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
>>         up partitioning.
>>         * bb-reorder.c (connect_traces): Only look for partitions and skip
>>         block copying if any blocks in function actually partitioned.
>>         (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
>>         (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
>>         that no cold blocks dominate a hot block.
>>         (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
>>         as this is now done by force_nonfallthru_and_redirect.
>>         (add_reg_crossing_jump_notes): Handle the fact that some jumps may
>>         already be marked with region crossing note.
>>         (reorder_basic_blocks): Only need to verify partitions if any
>>         blocks in function actually partitioned.
>>         (insert_section_boundary_note): Only need to insert note if any
>>         blocks in function actually partitioned.
>>         (rest_of_handle_reorder_blocks): New cfg_layout_finalize
>>         parameter, and remove call to insert_section_boundary_note as this
>>         is now called via cfg_layout_finalize/fixup_reorder_chain.
>>         (duplicate_computed_gotos): New cfg_layout_finalize
>>         parameter.
>>         (partition_hot_cold_basic_blocks): Set flag indicating function
>>         has bb partitions.
>>         * bb-reorder.h: Declare insert_section_boundary_note and
>>         emit_barrier_after_bb, which are no longer static.
>>         * basic-block.h: Declare new function fixup_partitions.
>>         * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
>>         check for region crossing note.
>>         (fixup_partition_crossing): New function.
>>         (fixup_bb_partition): Ditto.
>>         (rtl_redirect_edge_and_branch): Fixup partition boundaries.
>>         (force_nonfallthru_and_redirect): Fixup partition boundaries,
>>         remove old code that tried to do this. Emit barrier correctly
>>         when we are in cfglayout mode.
>>         (rtl_split_edge): Correctly fixup partition boundaries.
>>         (commit_one_edge_insertion): Remove old code that tried to
>>         fixup region crossing edge since this is now handled in
>>         split_block, and set up insertion point correctly since
>>         block may now end in a jump.
>>         (commit_edge_insertions): Invoke fixup_partitions to sanitize partition
>>         boundaries after optimizations that modify cfg and before trying to
>>         verify the flow info.
>>         (fixup_partitions): New function.
>>         (rtl_verify_flow_info_1): Add verification that no cold bbs dominate
>>         hot bbs.
>>         (record_effective_endpoints): Remove region-crossing notes and set flag
>>         indicating that they need to be reinserted on exit from cfglayout mode.
>>         (outof_cfg_layout_mode): New cfg_layout_finalize parameter.
>>         (fixup_reorder_chain): Call insert_section_boundary_note if necessary.
>>         Remove old code that attempted to fixup region crossing note as
>>         this is now handled in force_nonfallthru_and_redirect.
>>         (duplicate_insn_chain): Don't duplicate switch section notes.
>>         (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
>>         (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
>>         note.
>>
>> Index: cfghooks.h
>> ===================================================================
>> --- cfghooks.h  (revision 193376)
>> +++ cfghooks.h  (working copy)
>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
>>  void account_profile_record (struct profile_record *, int);
>>
>>  extern void cfg_layout_initialize (unsigned int);
>> -extern void cfg_layout_finalize (void);
>> +extern void cfg_layout_finalize (bool);
>>
>>  /* Hooks containers.  */
>>  extern struct cfg_hooks gimple_cfg_hooks;
>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi
>>  extern void gimple_register_cfg_hooks (void);
>>  extern struct cfg_hooks get_cfg_hooks (void);
>>  extern void set_cfg_hooks (struct cfg_hooks);
>> -
>> Index: modulo-sched.c
>> ===================================================================
>> --- modulo-sched.c      (revision 193376)
>> +++ modulo-sched.c      (working copy)
>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void)
>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>        bb->aux = bb->next_bb;
>>    free_dominance_info (CDI_DOMINATORS);
>> -  cfg_layout_finalize ();
>> +  cfg_layout_finalize (false);
>>  #endif /* INSN_SCHEDULING */
>>    return 0;
>>  }
>> Index: ifcvt.c
>> ===================================================================
>> --- ifcvt.c     (revision 193376)
>> +++ ifcvt.c     (working copy)
>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
>>    if (new_bb)
>>      {
>>        df_bb_replace (then_bb_index, new_bb);
>> -      /* Since the fallthru edge was redirected from test_bb to new_bb,
>> -         we need to ensure that new_bb is in the same partition as
>> -         test bb (you can not fall through across section boundaries).  */
>> -      BB_COPY_PARTITION (new_bb, test_bb);
>> +      /* This should have been done above via force_nonfallthru_and_redirect
>> +         (possibly called from redirect_edge_and_branch_force).  */
>> +      gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
>>      }
>>
>>    num_true_changes++;
>> Index: function.c
>> ===================================================================
>> --- function.c  (revision 193376)
>> +++ function.c  (working copy)
>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void)
>>                     break;
>>                 if (e)
>>                   {
>> -                   copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
>> -                                                 NULL_RTX, e->src);
>> +                    /* Make sure we insert after any barriers.  */
>> +                    rtx end = get_last_bb_insn (e->src);
>> +                    copy_bb = create_basic_block (NEXT_INSN (end),
>> +                                                  NULL_RTX, e->src);
>>                     BB_COPY_PARTITION (copy_bb, e->src);
>>                   }
>>                 else
>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void)
>>         if (cur_bb->index >= NUM_FIXED_BLOCKS
>>             && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS)
>>           cur_bb->aux = cur_bb->next_bb;
>> -      cfg_layout_finalize ();
>> +      cfg_layout_finalize (false);
>>      }
>>
>>  epilogue_done:
>> @@ -6517,7 +6519,7 @@ epilogue_done:
>>        basic_block simple_return_block_cold = NULL;
>>        edge pending_edge_hot = NULL;
>>        edge pending_edge_cold = NULL;
>> -      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
>> +      basic_block exit_pred;
>>        int i;
>>
>>        gcc_assert (entry_edge != orig_entry_edge);
>> @@ -6545,6 +6547,12 @@ epilogue_done:
>>             else
>>               pending_edge_cold = e;
>>           }
>> +
>> +      /* Save a pointer to the exit's predecessor BB for use in
>> +         inserting new BBs at the end of the function. Do this
>> +         after the call to split_block above which may split
>> +         the original exit pred.  */
>> +      exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>
>>        FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
>>         {
>> Index: function.h
>> ===================================================================
>> --- function.h  (revision 193376)
>> +++ function.h  (working copy)
>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data {
>>       sched2) and is useful only if the port defines LEAF_REGISTERS.  */
>>    bool uses_only_leaf_regs;
>>
>> +  /* Nonzero if the function being compiled has undergone hot/cold partitioning
>> +     (under flag_reorder_blocks_and_partition) and has at least one cold
>> +     block.  */
>> +  bool has_bb_partition;
>> +
>>    /* Like regs_ever_live, but 1 if a reg is set or clobbered from an
>>       asm.  Unlike regs_ever_live, elements of this array corresponding
>>       to eliminable regs (like the frame pointer) are set if an asm
>> Index: hw-doloop.c
>> ===================================================================
>> --- hw-doloop.c (revision 193376)
>> +++ hw-doloop.c (working copy)
>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops)
>>        else
>>         bb->aux = NULL;
>>      }
>> -  cfg_layout_finalize ();
>> +  cfg_layout_finalize (false);
>>    clear_aux_for_blocks ();
>>    df_analyze ();
>>  }
>> Index: cfgcleanup.c
>> ===================================================================
>> --- cfgcleanup.c        (revision 193376)
>> +++ cfgcleanup.c        (working copy)
>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
>>       partition boundaries).  See the comments at the top of
>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>
>> -  if (flag_reorder_blocks_and_partition && reload_completed)
>> +  if (crtl->has_bb_partition && reload_completed)
>>      return false;
>>
>>    /* Search backward through forwarder blocks.  We don't need to worry
>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode)
>>               df_analyze ();
>>             }
>>
>> +         if (changed)
>> +            {
>> +              /* Edge forwarding in particular can cause hot blocks previously
>> +                 reached by both hot and cold blocks to become dominated only
>> +                 by cold blocks. This will cause the verification below to fail,
>> +                 and lead to now cold code in the hot section. This is not easy
>> +                 to detect and fix during edge forwarding, and in some cases
>> +                 is only visible after newly unreachable blocks are deleted,
>> +                 which will be done in fixup_partitions.  */
>> +              fixup_partitions ();
>> +
>>  #ifdef ENABLE_CHECKING
>> -         if (changed)
>> -           verify_flow_info ();
>> +              verify_flow_info ();
>>  #endif
>> +            }
>>
>>           changed_overall |= changed;
>>           first_pass = false;
>> Index: bb-reorder.c
>> ===================================================================
>> --- bb-reorder.c        (revision 193376)
>> +++ bb-reorder.c        (working copy)
>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces
>>    current_partition = BB_PARTITION (traces[0].first);
>>    two_passes = false;
>>
>> -  if (flag_reorder_blocks_and_partition)
>> +  if (crtl->has_bb_partition)
>>      for (i = 0; i < n_traces && !two_passes; i++)
>>        if (BB_PARTITION (traces[0].first)
>>           != BB_PARTITION (traces[i].first))
>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces
>>                       }
>>                   }
>>
>> -             if (flag_reorder_blocks_and_partition)
>> +             if (crtl->has_bb_partition)
>>                 try_copy = false;
>>
>>               /* Copy tiny blocks always; copy larger blocks only when the
>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void)
>>    return length;
>>  }
>>
>> -/* Emit a barrier into the footer of BB.  */
>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
>>
>> -static void
>> +void
>>  emit_barrier_after_bb (basic_block bb)
>>  {
>>    rtx barrier = emit_barrier_after (BB_END (bb));
>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>  }
>>
>>  /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg
>>  {
>>    VEC(edge, heap) *crossing_edges = NULL;
>>    basic_block bb;
>> -  edge e;
>> -  edge_iterator ei;
>> +  edge e, e2;
>> +  edge_iterator ei, ei2;
>> +  unsigned int cold_bb_count = 0;
>> +  VEC (basic_block, heap) *bbs_in_hot_partition = NULL;
>> +  VEC (basic_block, heap) *bbs_newly_hot = NULL;
>>
>>    /* Mark which partition (hot/cold) each basic block belongs in.  */
>>    FOR_EACH_BB (bb)
>>      {
>>        if (probably_never_executed_bb_p (cfun, bb))
>> -       BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>> +        {
>> +          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>> +          cold_bb_count++;
>> +        }
>>        else
>> -       BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>> +        {
>> +          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb);
>> +        }
>>      }
>>
>> +  /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of
>> +     several different possibilities. One is that there are edge weight insanities
>> +     due to optimization phases that do not properly update basic block profile
>> +     counts. The second is that the entry of the function may not be hot, because
>> +     it is entered fewer times than the number of profile training runs, but there
>> +     is a loop inside the function that causes blocks within the function to be
>> +     above the threshold for hotness.  */
>> +  if (cold_bb_count)
>> +    {
>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>> +
>> +      if (dom_calculated_here)
>> +        calculate_dominance_info (CDI_DOMINATORS);
>> +
>> +      /* Keep examining hot bbs until we have either checked them all, or
>> +         re-marked all cold bbs hot.  */
>> +      while (! VEC_empty (basic_block, bbs_in_hot_partition)
>> +             && cold_bb_count)
>> +        {
>> +          basic_block dom_bb;
>> +
>> +          bb = VEC_pop (basic_block, bbs_in_hot_partition);
>> +          dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
>> +
>> +          /* If bb's immediate dominator is also hot then it is ok.  */
>> +          if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
>> +            continue;
>> +
>> +          /* We have a hot bb with an immediate dominator that is cold.
>> +             The dominator needs to be re-marked to hot.  */
>> +          BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION);
>> +          cold_bb_count--;
>> +
>> +          /* Now we need to examine newly-hot dom_bb to see if it is also
>> +             dominated by a cold bb.  */
>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb);
>> +
>> +          /* We should also adjust any cold blocks that the newly-hot bb
>> +             feeds and see if it makes sense to re-mark those as hot as
>> +             well.  */
>> +          VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb);
>> +          while (! VEC_empty (basic_block, bbs_newly_hot))
>> +            {
>> +              basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot);
>> +              /* Examine all successors of this newly-hot bb to see if they
>> +                 are cold and should be re-marked as hot.  */
>> +              FOR_EACH_EDGE (e, ei, new_hot_bb->succs)
>> +                {
>> +                  bool any_cold_preds = false;
>> +                  basic_block succ = e->dest;
>> +                  if (BB_PARTITION (succ) != BB_COLD_PARTITION)
>> +                    continue;
>> +                  /* Does this block have any cold predecessors now?  */
>> +                  FOR_EACH_EDGE (e2, ei2, succ->preds)
>> +                  {
>> +                    if (BB_PARTITION (e2->src) == BB_COLD_PARTITION)
>> +                      {
>> +                        any_cold_preds = true;
>> +                        break;
>> +                      }
>> +                  }
>> +                  if (any_cold_preds)
>> +                    continue;
>> +
>> +                  /* Here we have a successor of newly-hot bb that is cold
>> +                     but no longer has any cold precessessors. Since the original
>> +                     assignment of our newly-hot bb was incorrect, this successor's
>> +                     assignment as cold is also suspect. Go ahead and re-mark it
>> +                     as hot now too. Better heuristics may be in order here.  */
>> +                  BB_SET_PARTITION (succ, BB_HOT_PARTITION);
>> +                  cold_bb_count--;
>> +                  VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ);
>> +                  /* Examine this successor as a newly-hot bb.  */
>> +                  VEC_safe_push (basic_block, heap, bbs_newly_hot, succ);
>> +                }
>> +            }
>> +        }
>> +
>> +      if (dom_calculated_here)
>> +        free_dominance_info (CDI_DOMINATORS);
>> +    }
>> +
>>    /* The format of .gcc_except_table does not allow landing pads to
>>       be in a different partition as the throw.  Fix this by either
>>       moving or duplicating the landing pads.  */
>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void)
>>                       new_bb->aux = cur_bb->aux;
>>                       cur_bb->aux = new_bb;
>>
>> -                     /* Make sure new fall-through bb is in same
>> -                        partition as bb it's falling through from.  */
>> +                      /* This is done by force_nonfallthru_and_redirect.  */
>> +                     gcc_assert (BB_PARTITION (new_bb)
>> +                                  == BB_PARTITION (cur_bb));
>>
>> -                     BB_COPY_PARTITION (new_bb, cur_bb);
>>                       single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
>>                     }
>>                   else
>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void)
>>    FOR_EACH_BB (bb)
>>      FOR_EACH_EDGE (e, ei, bb->succs)
>>        if ((e->flags & EDGE_CROSSING)
>> -         && JUMP_P (BB_END (e->src)))
>> +         && JUMP_P (BB_END (e->src))
>> +          /* Some notes were added during fix_up_fall_thru_edges, via
>> +             force_nonfallthru_and_redirect.  */
>> +          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
>>         add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>  }
>>
>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void)
>>        dump_flow_info (dump_file, dump_flags);
>>      }
>>
>> -  if (flag_reorder_blocks_and_partition)
>> +  if (crtl->has_bb_partition)
>>      verify_hot_cold_block_grouping ();
>>  }
>>
>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void)
>>     encountering this note will make the compiler switch between the
>>     hot and cold text sections.  */
>>
>> -static void
>> +void
>>  insert_section_boundary_note (void)
>>  {
>>    basic_block bb;
>>    rtx new_note;
>>    int first_partition = 0;
>>
>> -  if (!flag_reorder_blocks_and_partition)
>> +  if (!crtl->has_bb_partition)
>>      return;
>>
>>    FOR_EACH_BB (bb)
>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void)
>>    FOR_EACH_BB (bb)
>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>        bb->aux = bb->next_bb;
>> -  cfg_layout_finalize ();
>> +  cfg_layout_finalize (true);
>>
>> -  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>> -  insert_section_boundary_note ();
>>    return 0;
>>  }
>>
>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void)
>>      }
>>
>>  done:
>> -  cfg_layout_finalize ();
>> +  cfg_layout_finalize (false);
>>
>>    BITMAP_FREE (candidates);
>>    return 0;
>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void)
>>    if (crossing_edges == NULL)
>>      return 0;
>>
>> +  crtl->has_bb_partition = true;
>> +
>>    /* Make sure the source of any crossing edge ends in a jump and the
>>       destination of any crossing edge has a label.  */
>>    add_labels_and_missing_jumps (crossing_edges);
>> Index: bb-reorder.h
>> ===================================================================
>> --- bb-reorder.h        (revision 193376)
>> +++ bb-reorder.h        (working copy)
>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re
>>
>>  extern int get_uncond_jump_length (void);
>>
>> +extern void insert_section_boundary_note (void);
>> +
>> +extern void emit_barrier_after_bb (basic_block bb);
>> +
>>  #endif
>> Index: basic-block.h
>> ===================================================================
>> --- basic-block.h       (revision 193376)
>> +++ basic-block.h       (working copy)
>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect
>>  extern bool contains_no_active_insn_p (const_basic_block);
>>  extern bool forwarder_block_p (const_basic_block);
>>  extern bool can_fallthru (basic_block, basic_block);
>> +extern void fixup_partitions (void);
>>
>>  /* In cfgbuild.c.  */
>>  extern void find_many_sub_basic_blocks (sbitmap);
>> Index: cfgrtl.c
>> ===================================================================
>> --- cfgrtl.c    (revision 193376)
>> +++ cfgrtl.c    (working copy)
>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tree.h"
>>  #include "hard-reg-set.h"
>>  #include "basic-block.h"
>> +#include "bb-reorder.h"
>>  #include "regs.h"
>>  #include "flags.h"
>>  #include "function.h"
>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3.  If not see
>>     Only applicable if the CFG is in cfglayout mode.  */
>>  static GTY(()) rtx cfg_layout_function_footer;
>>  static GTY(()) rtx cfg_layout_function_header;
>> +static bool had_sec_boundary_notes;
>>
>>  static rtx skip_insns_after_block (basic_block);
>>  static void record_effective_endpoints (void);
>>  static rtx label_for_bb (basic_block);
>> -static void fixup_reorder_chain (void);
>> +static void fixup_reorder_chain (bool finalize_reorder_blocks);
>>
>>  void verify_insn_chain (void);
>>  static void fixup_fallthru_exit_predecessor (void);
>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
>>       partition boundaries).  See  the comments at the top of
>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>
>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>      return NULL;
>>
>>    /* We can replace or remove a complex jump only when we have exactly
>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target)
>>    return e;
>>  }
>>
>> +/* Called when edge E has been redirected to a new destination,
>> +   in order to update the region crossing flag on the edge and
>> +   jump.  */
>> +
>> +static void
>> +fixup_partition_crossing (edge e, basic_block target)
>> +{
>> +  rtx note;
>> +
>> +  gcc_assert (e->dest == target);
>> +
>> +  if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR)
>> +    return;
>> +  /* If we redirected an existing edge, it may already be marked
>> +     crossing, even though the new src is missing a reg crossing note.
>> +     But make sure reg crossing note doesn't already exist before
>> +     inserting.  */
>> +  if (BB_PARTITION (e->src) != BB_PARTITION (target))
>> +    {
>> +      e->flags |= EDGE_CROSSING;
>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>> +      if (JUMP_P (BB_END (e->src))
>> +          && !note)
>> +        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>> +    }
>> +  else if (BB_PARTITION (e->src) == BB_PARTITION (target))
>> +    {
>> +      e->flags &= ~EDGE_CROSSING;
>> +      /* Remove the region crossing note from jump at end of
>> +         e->src if it exists.  */
>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>> +      if (note)
>> +        remove_note (BB_END (e->src), note);
>> +    }
>> +}
>> +
>> +/* Called when block BB has been reassigned to a different partition,
>> +   to ensure that the region crossing attributes are updated.  */
>> +
>> +static void
>> +fixup_bb_partition (basic_block bb)
>> +{
>> +  edge e;
>> +  edge_iterator ei;
>> +
>> +  /* Now need to make bb's pred edges non-region crossing.  */
>> +  FOR_EACH_EDGE (e, ei, bb->preds)
>> +    {
>> +      fixup_partition_crossing (e, e->dest);
>> +    }
>> +
>> +  /* Possibly need to make bb's successor edges region crossing,
>> +     or remove stale region crossing.  */
>> +  FOR_EACH_EDGE (e, ei, bb->succs)
>> +    {
>> +      if ((e->flags & EDGE_FALLTHRU)
>> +          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
>> +          && e->dest != EXIT_BLOCK_PTR)
>> +        /* force_nonfallthru_and_redirect calls fixup_partition_crossing.  */
>> +        force_nonfallthru (e);
>> +      else
>> +        fixup_partition_crossing (e, e->dest);
>> +    }
>> +}
>> +
>>  /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
>>     expense of adding new instructions or reordering basic blocks.
>>
>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>  {
>>    edge ret;
>>    basic_block src = e->src;
>> +  basic_block dest = e->dest;
>>
>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>      return NULL;
>>
>> -  if (e->dest == target)
>> +  if (dest == target)
>>      return e;
>>
>>    if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
>>      {
>>        df_set_bb_dirty (src);
>> +      fixup_partition_crossing (ret, target);
>>        return ret;
>>      }
>>
>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>      return NULL;
>>
>>    df_set_bb_dirty (src);
>> +  fixup_partition_crossing (ret, target);
>>    return ret;
>>  }
>>
>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>        /* Make sure new block ends up in correct hot/cold section.  */
>>
>>        BB_COPY_PARTITION (jump_block, e->src);
>> -      if (flag_reorder_blocks_and_partition
>> -         && targetm_common.have_named_sections
>> -         && JUMP_P (BB_END (jump_block))
>> -         && !any_condjump_p (BB_END (jump_block))
>> -         && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
>> -       add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
>>
>>        /* Wire edge in.  */
>>        new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
>>        new_edge->probability = probability;
>>        new_edge->count = count;
>>
>> +      /* If e->src was previously region crossing, it no longer is
>> +         and the reg crossing note should be removed.  */
>> +      fixup_partition_crossing (new_edge, jump_block);
>> +
>>        /* Redirect old edge.  */
>>        redirect_edge_pred (e, jump_block);
>>        e->probability = REG_BR_PROB_BASE;
>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>        LABEL_NUSES (label)++;
>>      }
>>
>> -  emit_barrier_after (BB_END (jump_block));
>> +  /* We might be in cfg layout mode, and if so, the following routine will
>> +     insert the barrier correctly.  */
>> +  emit_barrier_after_bb (jump_block);
>>    redirect_edge_succ_nodup (e, target);
>>
>>    if (abnormal_edge_flags)
>>      make_edge (src, target, abnormal_edge_flags);
>>
>>    df_mark_solutions_dirty ();
>> +  fixup_partition_crossing (e, target);
>>    return new_bb;
>>  }
>>
>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
>>  static basic_block
>>  rtl_split_edge (edge edge_in)
>>  {
>> -  basic_block bb;
>> +  basic_block bb, new_bb;
>>    rtx before;
>>
>>    /* Abnormal edges cannot be split.  */
>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in)
>>    else
>>      {
>>        bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
>> -      /* ??? Why not edge_in->dest->prev_bb here?  */
>> -      BB_COPY_PARTITION (bb, edge_in->dest);
>> +      if (edge_in->src == ENTRY_BLOCK_PTR)
>> +        BB_COPY_PARTITION (bb, edge_in->dest);
>> +      else
>> +        /* Put the split bb into the src partition, to avoid creating
>> +           a situation where a cold bb dominates a hot bb, in the case
>> +           where src is cold and dest is hot. The src will dominate
>> +           the new bb (whereas it might not have dominated dest).  */
>> +        BB_COPY_PARTITION (bb, edge_in->src);
>>      }
>>
>>    make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);
>>
>> +  /* Can't allow a region crossing edge to be fallthrough.  */
>> +  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
>> +      && edge_in->dest != EXIT_BLOCK_PTR)
>> +    {
>> +      new_bb = force_nonfallthru (single_succ_edge (bb));
>> +      gcc_assert (!new_bb);
>> +    }
>> +
>>    /* For non-fallthru edges, we must adjust the predecessor's
>>       jump instruction to target our new block.  */
>>    if ((edge_in->flags & EDGE_FALLTHRU) == 0)
>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e)
>>    else
>>      {
>>        bb = split_edge (e);
>> -      after = BB_END (bb);
>>
>> -      if (flag_reorder_blocks_and_partition
>> -         && targetm_common.have_named_sections
>> -         && e->src != ENTRY_BLOCK_PTR
>> -         && BB_PARTITION (e->src) == BB_COLD_PARTITION
>> -         && !(e->flags & EDGE_CROSSING)
>> -         && JUMP_P (after)
>> -         && !any_condjump_p (after)
>> -         && (single_succ_edge (bb)->flags & EDGE_CROSSING))
>> -       add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
>> +      /* If e crossed a partition boundary, we needed to make bb end in
>> +         a region-crossing jump, even though it was originally fallthru.  */
>> +      if (JUMP_P (BB_END (bb)))
>> +       before = BB_END (bb);
>> +      else
>> +        after = BB_END (bb);
>>      }
>>
>>    /* Now that we've found the spot, do the insertion.  */
>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void)
>>  {
>>    basic_block bb;
>>
>> +  /* Optimization passes that invoke this routine can cause hot blocks
>> +     previously reached by both hot and cold blocks to become dominated only
>> +     by cold blocks. This will cause the verification below to fail,
>> +     and lead to now cold code in the hot section. In some cases this
>> +     may only be visible after newly unreachable blocks are deleted,
>> +     which will be done by fixup_partitions.  */
>> +  fixup_partitions ();
>> +
>>  #ifdef ENABLE_CHECKING
>>    verify_flow_info ();
>>  #endif
>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb)
>>
>>    return end;
>>  }
>> -
>> +
>> +/* Perform cleanup on the hot/cold bb partitioning after optimization
>> +   passes that modify the cfg.  */
>> +
>> +void
>> +fixup_partitions (void)
>> +{
>> +  basic_block bb;
>> +
>> +  if (!crtl->has_bb_partition)
>> +    return;
>> +
>> +  /* Delete any blocks that became unreachable and weren't
>> +     already cleaned up, for example during edge forwarding
>> +     and convert_jumps_to_returns. This will expose more
>> +     opportunities for fixing the partition boundaries here.
>> +     Also, the calculation of the dominance graph during verification
>> +     will assert if there are unreachable nodes.  */
>> +  delete_unreachable_blocks ();
>> +
>> +  /* If there are partitions, do a sanity check on them: A basic block in
>> +     a cold partition cannot dominate a basic block in a hot partition.
>> +     Fixup any that now violate this requirement, as a result of edge
>> +     forwarding and unreachable block deletion.  */
>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>> +  VEC (basic_block, heap) *bbs_to_fix = NULL;
>> +  FOR_EACH_BB (bb)
>> +    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>> +      VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>> +    {
>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>> +      basic_block son;
>> +
>> +      if (dom_calculated_here)
>> +        calculate_dominance_info (CDI_DOMINATORS);
>> +
>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>> +        {
>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>> +          /* If bb is not yet cold (because it was added below as
>> +             a block dominated by a cold bb) then mark it cold here.  */
>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>> +            {
>> +              BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>> +              VEC_safe_push (basic_block, heap, bbs_to_fix, bb);
>> +            }
>> +          /* Any blocks dominated by a block in the cold section
>> +             must also be cold.  */
>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>> +               son;
>> +               son = next_dom_son (CDI_DOMINATORS, son))
>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>> +        }
>> +
>> +      if (dom_calculated_here)
>> +        free_dominance_info (CDI_DOMINATORS);
>> +    }
>> +
>> +  /* Do the partition fixup after all necessary blocks have been converted to
>> +     cold, so that we only update the region crossings the minimum number of
>> +     places, which can require forcing edges to be non fallthru.  */
>> +  while (! VEC_empty (basic_block, bbs_to_fix))
>> +    {
>> +      bb = VEC_pop (basic_block, bbs_to_fix);
>> +      fixup_bb_partition (bb);
>> +    }
>> +}
>> +
>>  /* Verify the CFG and RTL consistency common for both underlying RTL and
>>     cfglayout RTL.
>>
>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void)
>>    rtx x;
>>    int err = 0;
>>    basic_block bb;
>> +  bool have_partitions = false;
>>
>>    /* Check the general integrity of the basic blocks.  */
>>    FOR_EACH_BB_REVERSE (bb)
>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void)
>>
>>           if (e->flags & EDGE_ABNORMAL)
>>             n_abnormal++;
>> +
>> +          have_partitions |= is_crossing;
>>         }
>>
>>        if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void)
>>           }
>>      }
>>
>> +  /* If there are partitions, do a sanity check on them: A basic block in
>> +     a cold partition cannot dominate a basic block in a hot partition.  */
>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>> +  if (have_partitions && !err)
>> +    FOR_EACH_BB (bb)
>> +      if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>> +        VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>> +    {
>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>> +      basic_block son;
>> +
>> +      if (dom_calculated_here)
>> +        calculate_dominance_info (CDI_DOMINATORS);
>> +
>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>> +        {
>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>> +            {
>> +              error ("non-cold basic block %d dominated "
>> +                     "by a block in the cold partition", bb->index);
>> +              err = 1;
>> +            }
>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>> +               son;
>> +               son = next_dom_son (CDI_DOMINATORS, son))
>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>> +        }
>> +
>> +      if (dom_calculated_here)
>> +        free_dominance_info (CDI_DOMINATORS);
>> +    }
>> +
>>    /* Clean up.  */
>>    return err;
>>  }
>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void)
>>    else
>>      cfg_layout_function_header = NULL_RTX;
>>
>> +  had_sec_boundary_notes = false;
>> +
>>    next_insn = get_insns ();
>>    FOR_EACH_BB (bb)
>>      {
>>        rtx end;
>>
>>        if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb))
>> -       BB_HEADER (bb) = unlink_insn_chain (next_insn,
>> -                                             PREV_INSN (BB_HEAD (bb)));
>> +        {
>> +          /* Rather than try to keep section boundary notes incrementally
>> +             up-to-date through cfg layout optimizations, simply remove them
>> +             and flag that they should be re-inserted when exiting
>> +             cfg layout mode.  */
>> +          rtx check_insn = next_insn;
>> +          while (check_insn)
>> +            {
>> +              if (NOTE_P (check_insn)
>> +                  && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
>> +              {
>> +                had_sec_boundary_notes |= true;
>> +                /* Remove note from chain. Grab new next_insn first.  */
>> +                if (next_insn == check_insn)
>> +                  next_insn = NEXT_INSN (check_insn);
>> +                /* Delete note.  */
>> +                delete_insn (check_insn);
>> +                /* There will only be one.  */
>> +                break;
>> +              }
>> +              check_insn = NEXT_INSN (check_insn);
>> +            }
>> +          /* If we still have header instructions left after above loop.  */
>> +          if (next_insn != BB_HEAD (bb))
>> +            BB_HEADER (bb) = unlink_insn_chain (next_insn,
>> +                                                PREV_INSN (BB_HEAD (bb)));
>> +        }
>>        end = skip_insns_after_block (bb);
>>        if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end)
>>         BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end);
>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void)
>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>        bb->aux = bb->next_bb;
>>
>> -  cfg_layout_finalize ();
>> +  cfg_layout_finalize (false);
>>
>>    return 0;
>>  }
>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode)
>>  }
>>
>>
>> -/* Given a reorder chain, rearrange the code to match.  */
>> +/* Given a reorder chain, rearrange the code to match. If
>> +   this is called when we will FINALIZE_REORDER_BLOCKS, or when
>> +   section boundary notes were removed on entry to cfg layout
>> +   mode, insert section boundary notes here.  */
>>
>>  static void
>> -fixup_reorder_chain (void)
>> +fixup_reorder_chain (bool finalize_reorder_blocks)
>>  {
>>    basic_block bb;
>>    rtx insn = NULL;
>> @@ -3150,7 +3373,7 @@ static void
>>           PREV_INSN (BB_HEADER (bb)) = insn;
>>           insn = BB_HEADER (bb);
>>           while (NEXT_INSN (insn))
>> -           insn = NEXT_INSN (insn);
>> +            insn = NEXT_INSN (insn);
>>         }
>>        if (insn)
>>         NEXT_INSN (insn) = BB_HEAD (bb);
>> @@ -3175,6 +3398,11 @@ static void
>>      insn = NEXT_INSN (insn);
>>
>>    set_last_insn (insn);
>> +
>> +  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>> +  if (had_sec_boundary_notes || finalize_reorder_blocks)
>> +    insert_section_boundary_note ();
>> +
>>  #ifdef ENABLE_CHECKING
>>    verify_insn_chain ();
>>  #endif
>> @@ -3187,7 +3415,7 @@ static void
>>        edge e_fall, e_taken, e;
>>        rtx bb_end_insn;
>>        rtx ret_label = NULL_RTX;
>> -      basic_block nb, src_bb;
>> +      basic_block nb;
>>        edge_iterator ei;
>>
>>        if (EDGE_COUNT (bb->succs) == 0)
>> @@ -3322,7 +3550,6 @@ static void
>>        /* We got here if we need to add a new jump insn.
>>          Note force_nonfallthru can delete E_FALL and thus we have to
>>          save E_FALL->src prior to the call to force_nonfallthru.  */
>> -      src_bb = e_fall->src;
>>        nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
>>        if (nb)
>>         {
>> @@ -3330,17 +3557,6 @@ static void
>>           bb->aux = nb;
>>           /* Don't process this new block.  */
>>           bb = nb;
>> -
>> -         /* Make sure new bb is tagged for correct section (same as
>> -            fall-thru source, since you cannot fall-thru across
>> -            section boundaries).  */
>> -         BB_COPY_PARTITION (src_bb, single_pred (bb));
>> -         if (flag_reorder_blocks_and_partition
>> -             && targetm_common.have_named_sections
>> -             && JUMP_P (BB_END (bb))
>> -             && !any_condjump_p (BB_END (bb))
>> -             && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
>> -           add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
>>         }
>>      }
>>
>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to)
>>             case NOTE_INSN_FUNCTION_BEG:
>>               /* There is always just single entry to function.  */
>>             case NOTE_INSN_BASIC_BLOCK:
>> +              /* We should only switch text sections once.  */
>> +           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>               break;
>>
>>             case NOTE_INSN_EPILOGUE_BEG:
>> -           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>               emit_note_copy (insn);
>>               break;
>>
>> @@ -3759,10 +3976,13 @@ break_superblocks (void)
>>  }
>>
>>  /* Finalize the changes: reorder insn list according to the sequence specified
>> -   by aux pointers, enter compensation code, rebuild scope forest.  */
>> +   by aux pointers, enter compensation code, rebuild scope forest. If
>> +   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
>> +   to fixup_reorder_chain so that it can insert the proper switch text
>> +   section notes.  */
>>
>>  void
>> -cfg_layout_finalize (void)
>> +cfg_layout_finalize (bool finalize_reorder_blocks)
>>  {
>>  #ifdef ENABLE_CHECKING
>>    verify_flow_info ();
>> @@ -3775,7 +3995,7 @@ void
>>  #endif
>>        )
>>      fixup_fallthru_exit_predecessor ();
>> -  fixup_reorder_chain ();
>> +  fixup_reorder_chain (finalize_reorder_blocks);
>>
>>    rebuild_jump_labels (get_insns ());
>>    delete_dead_jumptables ();
>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e)
>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>      return false;
>>
>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>      return false;
>>
>>    if (!onlyjump_p (insn)
>>
>> --
>> This patch is available for review at http://codereview.appspot.com/6823047
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Teresa Johnson Nov. 26, 2012, 8:19 p.m. UTC | #3
Are you sure you have all my changes applied? I applied the 4 patches
attached to PR55121 into my trunk checkout that has my fixes, and to a
pristine trunk checkout. I configured and built both for
--target=arm-none-linux-gnueabi, and built using your options, .i file
and gcda file. I can reproduce the failure using the pristine trunk
with your patches but not with my fixed trunk + your patches. (I just
updated to head to pickup recent changes and get the same result. The
vec changes required some manual changes to the patch, which I will
resend shortly.)

Without my fixes:

$ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreproce
ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
-mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
-mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
-fno-common -o eval.s -freorder-blocks-and-partition
GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
2.4.2-p1, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
2.4.2-p1, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a
eval.c: In function ‘Ge’:
eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560
 }
 ^
0x622f71 df_compact_blocks()
../../gcc_trunk_3/gcc/df-core.c:1560
0x5cfcb5 compact_blocks()
../../gcc_trunk_3/gcc/cfg.c:162
0xc9dce0 reorder_basic_blocks
../../gcc_trunk_3/gcc/bb-reorder.c:2154
0xc9dce0 rest_of_handle_reorder_blocks
../../gcc_trunk_3/gcc/bb-reorder.c:2219
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.


With my fixes:

$ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreproce
ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
-mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
-mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
-fno-common -o eval.s -freorder-blocks-and-partition
GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
2.4.2-p1, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
2.4.2-p1, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3


Thanks,
Teresa

On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon
<christophe.lyon@linaro.org> wrote:
> Hi,
>
> I have tested your patch on Spec2000 on ARM, and I can still see
> several failures caused by:
> "error: fallthru edge crosses section boundary", including the case
> described in PR55121.
>
> On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote:
>> Ping.
>> Teresa
>>
>> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>> Revised patch that fixes failures encountered when enabling
>>> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>>>
>>> This includes new verification code to ensure no cold blocks dominate hot
>>> blocks contributed by Steven Bosscher.
>>>
>>> I attempted to make the handling of partition updates through the optimization
>>> passes much more consistent, removing a number of partial fixes in the code
>>> stream in the process. The code to fixup partitions (including the BB_PARTITION
>>> assignement, region crossing jump notes, and switch text section notes) is
>>> now handled in a few centralized locations. For example, inside
>>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
>>> don't need to attempt the fixup themselves.
>>>
>>> For optimization passes that make adjustments to the cfg while in cfg layout
>>> mode that are not easy to fix up incrementally, the new routine
>>> fixup_partitions handles the cleanup globally. This does require calculation
>>> of the dominance relation, however, as far as I can tell the routines which
>>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
>>> are invoked typically once (or a small number of times in the case of
>>> try_optimize_cfg) per optimization pass. Additionally, I compared the
>>> -ftime-report output for some large fdo compilations and saw only minimal
>>> increases in the dominance computation times, which were only a tiny percent
>>> of the overall compile time.
>>>
>>> Additionally, I added a flag to the rtl_data structure to indicate whether
>>> any partitioning was actually performed, so that optimizations which were
>>> conservatively disabled whenever the flag_reorder_blocks_and_partition
>>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
>>> conservative for functions where no partitions were formed (e.g. they are
>>> completely hot).
>>>
>>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
>>> benchmarks and internal google benchmarks using profile feedback and
>>> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
>>>
>>> Thanks,
>>> Teresa
>>>
>>> 2012-11-14  Teresa Johnson  <tejohnson@google.com>
>>>             Steven Bosscher  <steven@gcc.gnu.org>
>>>
>>>         * cfghooks.h (cfg_layout_finalize): New parameter.
>>>         * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
>>>         parameter.
>>>         * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
>>>         as this is now done by redirect_edge_and_branch_force.
>>>         * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
>>>         barriers, new cfg_layout_finalize parameter, and don't store exit
>>>         predecessor BB until after it is potentially split.
>>>         * function.h (struct rtl_data): New flag has_bb_partition.
>>>         * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
>>>         * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
>>>         any blocks in function actually partitioned.
>>>         (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
>>>         up partitioning.
>>>         * bb-reorder.c (connect_traces): Only look for partitions and skip
>>>         block copying if any blocks in function actually partitioned.
>>>         (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
>>>         (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
>>>         that no cold blocks dominate a hot block.
>>>         (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
>>>         as this is now done by force_nonfallthru_and_redirect.
>>>         (add_reg_crossing_jump_notes): Handle the fact that some jumps may
>>>         already be marked with region crossing note.
>>>         (reorder_basic_blocks): Only need to verify partitions if any
>>>         blocks in function actually partitioned.
>>>         (insert_section_boundary_note): Only need to insert note if any
>>>         blocks in function actually partitioned.
>>>         (rest_of_handle_reorder_blocks): New cfg_layout_finalize
>>>         parameter, and remove call to insert_section_boundary_note as this
>>>         is now called via cfg_layout_finalize/fixup_reorder_chain.
>>>         (duplicate_computed_gotos): New cfg_layout_finalize
>>>         parameter.
>>>         (partition_hot_cold_basic_blocks): Set flag indicating function
>>>         has bb partitions.
>>>         * bb-reorder.h: Declare insert_section_boundary_note and
>>>         emit_barrier_after_bb, which are no longer static.
>>>         * basic-block.h: Declare new function fixup_partitions.
>>>         * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
>>>         check for region crossing note.
>>>         (fixup_partition_crossing): New function.
>>>         (fixup_bb_partition): Ditto.
>>>         (rtl_redirect_edge_and_branch): Fixup partition boundaries.
>>>         (force_nonfallthru_and_redirect): Fixup partition boundaries,
>>>         remove old code that tried to do this. Emit barrier correctly
>>>         when we are in cfglayout mode.
>>>         (rtl_split_edge): Correctly fixup partition boundaries.
>>>         (commit_one_edge_insertion): Remove old code that tried to
>>>         fixup region crossing edge since this is now handled in
>>>         split_block, and set up insertion point correctly since
>>>         block may now end in a jump.
>>>         (commit_edge_insertions): Invoke fixup_partitions to sanitize partition
>>>         boundaries after optimizations that modify cfg and before trying to
>>>         verify the flow info.
>>>         (fixup_partitions): New function.
>>>         (rtl_verify_flow_info_1): Add verification that no cold bbs dominate
>>>         hot bbs.
>>>         (record_effective_endpoints): Remove region-crossing notes and set flag
>>>         indicating that they need to be reinserted on exit from cfglayout mode.
>>>         (outof_cfg_layout_mode): New cfg_layout_finalize parameter.
>>>         (fixup_reorder_chain): Call insert_section_boundary_note if necessary.
>>>         Remove old code that attempted to fixup region crossing note as
>>>         this is now handled in force_nonfallthru_and_redirect.
>>>         (duplicate_insn_chain): Don't duplicate switch section notes.
>>>         (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
>>>         (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
>>>         note.
>>>
>>> Index: cfghooks.h
>>> ===================================================================
>>> --- cfghooks.h  (revision 193376)
>>> +++ cfghooks.h  (working copy)
>>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
>>>  void account_profile_record (struct profile_record *, int);
>>>
>>>  extern void cfg_layout_initialize (unsigned int);
>>> -extern void cfg_layout_finalize (void);
>>> +extern void cfg_layout_finalize (bool);
>>>
>>>  /* Hooks containers.  */
>>>  extern struct cfg_hooks gimple_cfg_hooks;
>>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi
>>>  extern void gimple_register_cfg_hooks (void);
>>>  extern struct cfg_hooks get_cfg_hooks (void);
>>>  extern void set_cfg_hooks (struct cfg_hooks);
>>> -
>>> Index: modulo-sched.c
>>> ===================================================================
>>> --- modulo-sched.c      (revision 193376)
>>> +++ modulo-sched.c      (working copy)
>>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void)
>>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>>        bb->aux = bb->next_bb;
>>>    free_dominance_info (CDI_DOMINATORS);
>>> -  cfg_layout_finalize ();
>>> +  cfg_layout_finalize (false);
>>>  #endif /* INSN_SCHEDULING */
>>>    return 0;
>>>  }
>>> Index: ifcvt.c
>>> ===================================================================
>>> --- ifcvt.c     (revision 193376)
>>> +++ ifcvt.c     (working copy)
>>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
>>>    if (new_bb)
>>>      {
>>>        df_bb_replace (then_bb_index, new_bb);
>>> -      /* Since the fallthru edge was redirected from test_bb to new_bb,
>>> -         we need to ensure that new_bb is in the same partition as
>>> -         test bb (you can not fall through across section boundaries).  */
>>> -      BB_COPY_PARTITION (new_bb, test_bb);
>>> +      /* This should have been done above via force_nonfallthru_and_redirect
>>> +         (possibly called from redirect_edge_and_branch_force).  */
>>> +      gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
>>>      }
>>>
>>>    num_true_changes++;
>>> Index: function.c
>>> ===================================================================
>>> --- function.c  (revision 193376)
>>> +++ function.c  (working copy)
>>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void)
>>>                     break;
>>>                 if (e)
>>>                   {
>>> -                   copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
>>> -                                                 NULL_RTX, e->src);
>>> +                    /* Make sure we insert after any barriers.  */
>>> +                    rtx end = get_last_bb_insn (e->src);
>>> +                    copy_bb = create_basic_block (NEXT_INSN (end),
>>> +                                                  NULL_RTX, e->src);
>>>                     BB_COPY_PARTITION (copy_bb, e->src);
>>>                   }
>>>                 else
>>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void)
>>>         if (cur_bb->index >= NUM_FIXED_BLOCKS
>>>             && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS)
>>>           cur_bb->aux = cur_bb->next_bb;
>>> -      cfg_layout_finalize ();
>>> +      cfg_layout_finalize (false);
>>>      }
>>>
>>>  epilogue_done:
>>> @@ -6517,7 +6519,7 @@ epilogue_done:
>>>        basic_block simple_return_block_cold = NULL;
>>>        edge pending_edge_hot = NULL;
>>>        edge pending_edge_cold = NULL;
>>> -      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>> +      basic_block exit_pred;
>>>        int i;
>>>
>>>        gcc_assert (entry_edge != orig_entry_edge);
>>> @@ -6545,6 +6547,12 @@ epilogue_done:
>>>             else
>>>               pending_edge_cold = e;
>>>           }
>>> +
>>> +      /* Save a pointer to the exit's predecessor BB for use in
>>> +         inserting new BBs at the end of the function. Do this
>>> +         after the call to split_block above which may split
>>> +         the original exit pred.  */
>>> +      exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>>
>>>        FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
>>>         {
>>> Index: function.h
>>> ===================================================================
>>> --- function.h  (revision 193376)
>>> +++ function.h  (working copy)
>>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data {
>>>       sched2) and is useful only if the port defines LEAF_REGISTERS.  */
>>>    bool uses_only_leaf_regs;
>>>
>>> +  /* Nonzero if the function being compiled has undergone hot/cold partitioning
>>> +     (under flag_reorder_blocks_and_partition) and has at least one cold
>>> +     block.  */
>>> +  bool has_bb_partition;
>>> +
>>>    /* Like regs_ever_live, but 1 if a reg is set or clobbered from an
>>>       asm.  Unlike regs_ever_live, elements of this array corresponding
>>>       to eliminable regs (like the frame pointer) are set if an asm
>>> Index: hw-doloop.c
>>> ===================================================================
>>> --- hw-doloop.c (revision 193376)
>>> +++ hw-doloop.c (working copy)
>>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops)
>>>        else
>>>         bb->aux = NULL;
>>>      }
>>> -  cfg_layout_finalize ();
>>> +  cfg_layout_finalize (false);
>>>    clear_aux_for_blocks ();
>>>    df_analyze ();
>>>  }
>>> Index: cfgcleanup.c
>>> ===================================================================
>>> --- cfgcleanup.c        (revision 193376)
>>> +++ cfgcleanup.c        (working copy)
>>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
>>>       partition boundaries).  See the comments at the top of
>>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>>
>>> -  if (flag_reorder_blocks_and_partition && reload_completed)
>>> +  if (crtl->has_bb_partition && reload_completed)
>>>      return false;
>>>
>>>    /* Search backward through forwarder blocks.  We don't need to worry
>>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode)
>>>               df_analyze ();
>>>             }
>>>
>>> +         if (changed)
>>> +            {
>>> +              /* Edge forwarding in particular can cause hot blocks previously
>>> +                 reached by both hot and cold blocks to become dominated only
>>> +                 by cold blocks. This will cause the verification below to fail,
>>> +                 and lead to now cold code in the hot section. This is not easy
>>> +                 to detect and fix during edge forwarding, and in some cases
>>> +                 is only visible after newly unreachable blocks are deleted,
>>> +                 which will be done in fixup_partitions.  */
>>> +              fixup_partitions ();
>>> +
>>>  #ifdef ENABLE_CHECKING
>>> -         if (changed)
>>> -           verify_flow_info ();
>>> +              verify_flow_info ();
>>>  #endif
>>> +            }
>>>
>>>           changed_overall |= changed;
>>>           first_pass = false;
>>> Index: bb-reorder.c
>>> ===================================================================
>>> --- bb-reorder.c        (revision 193376)
>>> +++ bb-reorder.c        (working copy)
>>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces
>>>    current_partition = BB_PARTITION (traces[0].first);
>>>    two_passes = false;
>>>
>>> -  if (flag_reorder_blocks_and_partition)
>>> +  if (crtl->has_bb_partition)
>>>      for (i = 0; i < n_traces && !two_passes; i++)
>>>        if (BB_PARTITION (traces[0].first)
>>>           != BB_PARTITION (traces[i].first))
>>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces
>>>                       }
>>>                   }
>>>
>>> -             if (flag_reorder_blocks_and_partition)
>>> +             if (crtl->has_bb_partition)
>>>                 try_copy = false;
>>>
>>>               /* Copy tiny blocks always; copy larger blocks only when the
>>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void)
>>>    return length;
>>>  }
>>>
>>> -/* Emit a barrier into the footer of BB.  */
>>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
>>>
>>> -static void
>>> +void
>>>  emit_barrier_after_bb (basic_block bb)
>>>  {
>>>    rtx barrier = emit_barrier_after (BB_END (bb));
>>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>>> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>>  }
>>>
>>>  /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
>>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg
>>>  {
>>>    VEC(edge, heap) *crossing_edges = NULL;
>>>    basic_block bb;
>>> -  edge e;
>>> -  edge_iterator ei;
>>> +  edge e, e2;
>>> +  edge_iterator ei, ei2;
>>> +  unsigned int cold_bb_count = 0;
>>> +  VEC (basic_block, heap) *bbs_in_hot_partition = NULL;
>>> +  VEC (basic_block, heap) *bbs_newly_hot = NULL;
>>>
>>>    /* Mark which partition (hot/cold) each basic block belongs in.  */
>>>    FOR_EACH_BB (bb)
>>>      {
>>>        if (probably_never_executed_bb_p (cfun, bb))
>>> -       BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>> +        {
>>> +          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>> +          cold_bb_count++;
>>> +        }
>>>        else
>>> -       BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>>> +        {
>>> +          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb);
>>> +        }
>>>      }
>>>
>>> +  /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of
>>> +     several different possibilities. One is that there are edge weight insanities
>>> +     due to optimization phases that do not properly update basic block profile
>>> +     counts. The second is that the entry of the function may not be hot, because
>>> +     it is entered fewer times than the number of profile training runs, but there
>>> +     is a loop inside the function that causes blocks within the function to be
>>> +     above the threshold for hotness.  */
>>> +  if (cold_bb_count)
>>> +    {
>>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>> +
>>> +      if (dom_calculated_here)
>>> +        calculate_dominance_info (CDI_DOMINATORS);
>>> +
>>> +      /* Keep examining hot bbs until we have either checked them all, or
>>> +         re-marked all cold bbs hot.  */
>>> +      while (! VEC_empty (basic_block, bbs_in_hot_partition)
>>> +             && cold_bb_count)
>>> +        {
>>> +          basic_block dom_bb;
>>> +
>>> +          bb = VEC_pop (basic_block, bbs_in_hot_partition);
>>> +          dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
>>> +
>>> +          /* If bb's immediate dominator is also hot then it is ok.  */
>>> +          if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
>>> +            continue;
>>> +
>>> +          /* We have a hot bb with an immediate dominator that is cold.
>>> +             The dominator needs to be re-marked to hot.  */
>>> +          BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION);
>>> +          cold_bb_count--;
>>> +
>>> +          /* Now we need to examine newly-hot dom_bb to see if it is also
>>> +             dominated by a cold bb.  */
>>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb);
>>> +
>>> +          /* We should also adjust any cold blocks that the newly-hot bb
>>> +             feeds and see if it makes sense to re-mark those as hot as
>>> +             well.  */
>>> +          VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb);
>>> +          while (! VEC_empty (basic_block, bbs_newly_hot))
>>> +            {
>>> +              basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot);
>>> +              /* Examine all successors of this newly-hot bb to see if they
>>> +                 are cold and should be re-marked as hot.  */
>>> +              FOR_EACH_EDGE (e, ei, new_hot_bb->succs)
>>> +                {
>>> +                  bool any_cold_preds = false;
>>> +                  basic_block succ = e->dest;
>>> +                  if (BB_PARTITION (succ) != BB_COLD_PARTITION)
>>> +                    continue;
>>> +                  /* Does this block have any cold predecessors now?  */
>>> +                  FOR_EACH_EDGE (e2, ei2, succ->preds)
>>> +                  {
>>> +                    if (BB_PARTITION (e2->src) == BB_COLD_PARTITION)
>>> +                      {
>>> +                        any_cold_preds = true;
>>> +                        break;
>>> +                      }
>>> +                  }
>>> +                  if (any_cold_preds)
>>> +                    continue;
>>> +
>>> +                  /* Here we have a successor of newly-hot bb that is cold
>>> +                     but no longer has any cold precessessors. Since the original
>>> +                     assignment of our newly-hot bb was incorrect, this successor's
>>> +                     assignment as cold is also suspect. Go ahead and re-mark it
>>> +                     as hot now too. Better heuristics may be in order here.  */
>>> +                  BB_SET_PARTITION (succ, BB_HOT_PARTITION);
>>> +                  cold_bb_count--;
>>> +                  VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ);
>>> +                  /* Examine this successor as a newly-hot bb.  */
>>> +                  VEC_safe_push (basic_block, heap, bbs_newly_hot, succ);
>>> +                }
>>> +            }
>>> +        }
>>> +
>>> +      if (dom_calculated_here)
>>> +        free_dominance_info (CDI_DOMINATORS);
>>> +    }
>>> +
>>>    /* The format of .gcc_except_table does not allow landing pads to
>>>       be in a different partition as the throw.  Fix this by either
>>>       moving or duplicating the landing pads.  */
>>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void)
>>>                       new_bb->aux = cur_bb->aux;
>>>                       cur_bb->aux = new_bb;
>>>
>>> -                     /* Make sure new fall-through bb is in same
>>> -                        partition as bb it's falling through from.  */
>>> +                      /* This is done by force_nonfallthru_and_redirect.  */
>>> +                     gcc_assert (BB_PARTITION (new_bb)
>>> +                                  == BB_PARTITION (cur_bb));
>>>
>>> -                     BB_COPY_PARTITION (new_bb, cur_bb);
>>>                       single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
>>>                     }
>>>                   else
>>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void)
>>>    FOR_EACH_BB (bb)
>>>      FOR_EACH_EDGE (e, ei, bb->succs)
>>>        if ((e->flags & EDGE_CROSSING)
>>> -         && JUMP_P (BB_END (e->src)))
>>> +         && JUMP_P (BB_END (e->src))
>>> +          /* Some notes were added during fix_up_fall_thru_edges, via
>>> +             force_nonfallthru_and_redirect.  */
>>> +          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
>>>         add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>>  }
>>>
>>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void)
>>>        dump_flow_info (dump_file, dump_flags);
>>>      }
>>>
>>> -  if (flag_reorder_blocks_and_partition)
>>> +  if (crtl->has_bb_partition)
>>>      verify_hot_cold_block_grouping ();
>>>  }
>>>
>>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void)
>>>     encountering this note will make the compiler switch between the
>>>     hot and cold text sections.  */
>>>
>>> -static void
>>> +void
>>>  insert_section_boundary_note (void)
>>>  {
>>>    basic_block bb;
>>>    rtx new_note;
>>>    int first_partition = 0;
>>>
>>> -  if (!flag_reorder_blocks_and_partition)
>>> +  if (!crtl->has_bb_partition)
>>>      return;
>>>
>>>    FOR_EACH_BB (bb)
>>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void)
>>>    FOR_EACH_BB (bb)
>>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>>        bb->aux = bb->next_bb;
>>> -  cfg_layout_finalize ();
>>> +  cfg_layout_finalize (true);
>>>
>>> -  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>>> -  insert_section_boundary_note ();
>>>    return 0;
>>>  }
>>>
>>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void)
>>>      }
>>>
>>>  done:
>>> -  cfg_layout_finalize ();
>>> +  cfg_layout_finalize (false);
>>>
>>>    BITMAP_FREE (candidates);
>>>    return 0;
>>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void)
>>>    if (crossing_edges == NULL)
>>>      return 0;
>>>
>>> +  crtl->has_bb_partition = true;
>>> +
>>>    /* Make sure the source of any crossing edge ends in a jump and the
>>>       destination of any crossing edge has a label.  */
>>>    add_labels_and_missing_jumps (crossing_edges);
>>> Index: bb-reorder.h
>>> ===================================================================
>>> --- bb-reorder.h        (revision 193376)
>>> +++ bb-reorder.h        (working copy)
>>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re
>>>
>>>  extern int get_uncond_jump_length (void);
>>>
>>> +extern void insert_section_boundary_note (void);
>>> +
>>> +extern void emit_barrier_after_bb (basic_block bb);
>>> +
>>>  #endif
>>> Index: basic-block.h
>>> ===================================================================
>>> --- basic-block.h       (revision 193376)
>>> +++ basic-block.h       (working copy)
>>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect
>>>  extern bool contains_no_active_insn_p (const_basic_block);
>>>  extern bool forwarder_block_p (const_basic_block);
>>>  extern bool can_fallthru (basic_block, basic_block);
>>> +extern void fixup_partitions (void);
>>>
>>>  /* In cfgbuild.c.  */
>>>  extern void find_many_sub_basic_blocks (sbitmap);
>>> Index: cfgrtl.c
>>> ===================================================================
>>> --- cfgrtl.c    (revision 193376)
>>> +++ cfgrtl.c    (working copy)
>>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "tree.h"
>>>  #include "hard-reg-set.h"
>>>  #include "basic-block.h"
>>> +#include "bb-reorder.h"
>>>  #include "regs.h"
>>>  #include "flags.h"
>>>  #include "function.h"
>>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3.  If not see
>>>     Only applicable if the CFG is in cfglayout mode.  */
>>>  static GTY(()) rtx cfg_layout_function_footer;
>>>  static GTY(()) rtx cfg_layout_function_header;
>>> +static bool had_sec_boundary_notes;
>>>
>>>  static rtx skip_insns_after_block (basic_block);
>>>  static void record_effective_endpoints (void);
>>>  static rtx label_for_bb (basic_block);
>>> -static void fixup_reorder_chain (void);
>>> +static void fixup_reorder_chain (bool finalize_reorder_blocks);
>>>
>>>  void verify_insn_chain (void);
>>>  static void fixup_fallthru_exit_predecessor (void);
>>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
>>>       partition boundaries).  See  the comments at the top of
>>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>>
>>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>>      return NULL;
>>>
>>>    /* We can replace or remove a complex jump only when we have exactly
>>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target)
>>>    return e;
>>>  }
>>>
>>> +/* Called when edge E has been redirected to a new destination,
>>> +   in order to update the region crossing flag on the edge and
>>> +   jump.  */
>>> +
>>> +static void
>>> +fixup_partition_crossing (edge e, basic_block target)
>>> +{
>>> +  rtx note;
>>> +
>>> +  gcc_assert (e->dest == target);
>>> +
>>> +  if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR)
>>> +    return;
>>> +  /* If we redirected an existing edge, it may already be marked
>>> +     crossing, even though the new src is missing a reg crossing note.
>>> +     But make sure reg crossing note doesn't already exist before
>>> +     inserting.  */
>>> +  if (BB_PARTITION (e->src) != BB_PARTITION (target))
>>> +    {
>>> +      e->flags |= EDGE_CROSSING;
>>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>> +      if (JUMP_P (BB_END (e->src))
>>> +          && !note)
>>> +        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>> +    }
>>> +  else if (BB_PARTITION (e->src) == BB_PARTITION (target))
>>> +    {
>>> +      e->flags &= ~EDGE_CROSSING;
>>> +      /* Remove the region crossing note from jump at end of
>>> +         e->src if it exists.  */
>>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>> +      if (note)
>>> +        remove_note (BB_END (e->src), note);
>>> +    }
>>> +}
>>> +
>>> +/* Called when block BB has been reassigned to a different partition,
>>> +   to ensure that the region crossing attributes are updated.  */
>>> +
>>> +static void
>>> +fixup_bb_partition (basic_block bb)
>>> +{
>>> +  edge e;
>>> +  edge_iterator ei;
>>> +
>>> +  /* Now need to make bb's pred edges non-region crossing.  */
>>> +  FOR_EACH_EDGE (e, ei, bb->preds)
>>> +    {
>>> +      fixup_partition_crossing (e, e->dest);
>>> +    }
>>> +
>>> +  /* Possibly need to make bb's successor edges region crossing,
>>> +     or remove stale region crossing.  */
>>> +  FOR_EACH_EDGE (e, ei, bb->succs)
>>> +    {
>>> +      if ((e->flags & EDGE_FALLTHRU)
>>> +          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
>>> +          && e->dest != EXIT_BLOCK_PTR)
>>> +        /* force_nonfallthru_and_redirect calls fixup_partition_crossing.  */
>>> +        force_nonfallthru (e);
>>> +      else
>>> +        fixup_partition_crossing (e, e->dest);
>>> +    }
>>> +}
>>> +
>>>  /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
>>>     expense of adding new instructions or reordering basic blocks.
>>>
>>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>>  {
>>>    edge ret;
>>>    basic_block src = e->src;
>>> +  basic_block dest = e->dest;
>>>
>>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>>      return NULL;
>>>
>>> -  if (e->dest == target)
>>> +  if (dest == target)
>>>      return e;
>>>
>>>    if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
>>>      {
>>>        df_set_bb_dirty (src);
>>> +      fixup_partition_crossing (ret, target);
>>>        return ret;
>>>      }
>>>
>>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>>      return NULL;
>>>
>>>    df_set_bb_dirty (src);
>>> +  fixup_partition_crossing (ret, target);
>>>    return ret;
>>>  }
>>>
>>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>>        /* Make sure new block ends up in correct hot/cold section.  */
>>>
>>>        BB_COPY_PARTITION (jump_block, e->src);
>>> -      if (flag_reorder_blocks_and_partition
>>> -         && targetm_common.have_named_sections
>>> -         && JUMP_P (BB_END (jump_block))
>>> -         && !any_condjump_p (BB_END (jump_block))
>>> -         && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
>>> -       add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
>>>
>>>        /* Wire edge in.  */
>>>        new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
>>>        new_edge->probability = probability;
>>>        new_edge->count = count;
>>>
>>> +      /* If e->src was previously region crossing, it no longer is
>>> +         and the reg crossing note should be removed.  */
>>> +      fixup_partition_crossing (new_edge, jump_block);
>>> +
>>>        /* Redirect old edge.  */
>>>        redirect_edge_pred (e, jump_block);
>>>        e->probability = REG_BR_PROB_BASE;
>>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>>        LABEL_NUSES (label)++;
>>>      }
>>>
>>> -  emit_barrier_after (BB_END (jump_block));
>>> +  /* We might be in cfg layout mode, and if so, the following routine will
>>> +     insert the barrier correctly.  */
>>> +  emit_barrier_after_bb (jump_block);
>>>    redirect_edge_succ_nodup (e, target);
>>>
>>>    if (abnormal_edge_flags)
>>>      make_edge (src, target, abnormal_edge_flags);
>>>
>>>    df_mark_solutions_dirty ();
>>> +  fixup_partition_crossing (e, target);
>>>    return new_bb;
>>>  }
>>>
>>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
>>>  static basic_block
>>>  rtl_split_edge (edge edge_in)
>>>  {
>>> -  basic_block bb;
>>> +  basic_block bb, new_bb;
>>>    rtx before;
>>>
>>>    /* Abnormal edges cannot be split.  */
>>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in)
>>>    else
>>>      {
>>>        bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
>>> -      /* ??? Why not edge_in->dest->prev_bb here?  */
>>> -      BB_COPY_PARTITION (bb, edge_in->dest);
>>> +      if (edge_in->src == ENTRY_BLOCK_PTR)
>>> +        BB_COPY_PARTITION (bb, edge_in->dest);
>>> +      else
>>> +        /* Put the split bb into the src partition, to avoid creating
>>> +           a situation where a cold bb dominates a hot bb, in the case
>>> +           where src is cold and dest is hot. The src will dominate
>>> +           the new bb (whereas it might not have dominated dest).  */
>>> +        BB_COPY_PARTITION (bb, edge_in->src);
>>>      }
>>>
>>>    make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);
>>>
>>> +  /* Can't allow a region crossing edge to be fallthrough.  */
>>> +  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
>>> +      && edge_in->dest != EXIT_BLOCK_PTR)
>>> +    {
>>> +      new_bb = force_nonfallthru (single_succ_edge (bb));
>>> +      gcc_assert (!new_bb);
>>> +    }
>>> +
>>>    /* For non-fallthru edges, we must adjust the predecessor's
>>>       jump instruction to target our new block.  */
>>>    if ((edge_in->flags & EDGE_FALLTHRU) == 0)
>>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e)
>>>    else
>>>      {
>>>        bb = split_edge (e);
>>> -      after = BB_END (bb);
>>>
>>> -      if (flag_reorder_blocks_and_partition
>>> -         && targetm_common.have_named_sections
>>> -         && e->src != ENTRY_BLOCK_PTR
>>> -         && BB_PARTITION (e->src) == BB_COLD_PARTITION
>>> -         && !(e->flags & EDGE_CROSSING)
>>> -         && JUMP_P (after)
>>> -         && !any_condjump_p (after)
>>> -         && (single_succ_edge (bb)->flags & EDGE_CROSSING))
>>> -       add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
>>> +      /* If e crossed a partition boundary, we needed to make bb end in
>>> +         a region-crossing jump, even though it was originally fallthru.  */
>>> +      if (JUMP_P (BB_END (bb)))
>>> +       before = BB_END (bb);
>>> +      else
>>> +        after = BB_END (bb);
>>>      }
>>>
>>>    /* Now that we've found the spot, do the insertion.  */
>>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void)
>>>  {
>>>    basic_block bb;
>>>
>>> +  /* Optimization passes that invoke this routine can cause hot blocks
>>> +     previously reached by both hot and cold blocks to become dominated only
>>> +     by cold blocks. This will cause the verification below to fail,
>>> +     and lead to now cold code in the hot section. In some cases this
>>> +     may only be visible after newly unreachable blocks are deleted,
>>> +     which will be done by fixup_partitions.  */
>>> +  fixup_partitions ();
>>> +
>>>  #ifdef ENABLE_CHECKING
>>>    verify_flow_info ();
>>>  #endif
>>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb)
>>>
>>>    return end;
>>>  }
>>> -
>>> +
>>> +/* Perform cleanup on the hot/cold bb partitioning after optimization
>>> +   passes that modify the cfg.  */
>>> +
>>> +void
>>> +fixup_partitions (void)
>>> +{
>>> +  basic_block bb;
>>> +
>>> +  if (!crtl->has_bb_partition)
>>> +    return;
>>> +
>>> +  /* Delete any blocks that became unreachable and weren't
>>> +     already cleaned up, for example during edge forwarding
>>> +     and convert_jumps_to_returns. This will expose more
>>> +     opportunities for fixing the partition boundaries here.
>>> +     Also, the calculation of the dominance graph during verification
>>> +     will assert if there are unreachable nodes.  */
>>> +  delete_unreachable_blocks ();
>>> +
>>> +  /* If there are partitions, do a sanity check on them: A basic block in
>>> +     a cold partition cannot dominate a basic block in a hot partition.
>>> +     Fixup any that now violate this requirement, as a result of edge
>>> +     forwarding and unreachable block deletion.  */
>>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>>> +  VEC (basic_block, heap) *bbs_to_fix = NULL;
>>> +  FOR_EACH_BB (bb)
>>> +    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>>> +      VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>>> +    {
>>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>> +      basic_block son;
>>> +
>>> +      if (dom_calculated_here)
>>> +        calculate_dominance_info (CDI_DOMINATORS);
>>> +
>>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>>> +        {
>>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>>> +          /* If bb is not yet cold (because it was added below as
>>> +             a block dominated by a cold bb) then mark it cold here.  */
>>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>>> +            {
>>> +              BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>> +              VEC_safe_push (basic_block, heap, bbs_to_fix, bb);
>>> +            }
>>> +          /* Any blocks dominated by a block in the cold section
>>> +             must also be cold.  */
>>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>>> +               son;
>>> +               son = next_dom_son (CDI_DOMINATORS, son))
>>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>>> +        }
>>> +
>>> +      if (dom_calculated_here)
>>> +        free_dominance_info (CDI_DOMINATORS);
>>> +    }
>>> +
>>> +  /* Do the partition fixup after all necessary blocks have been converted to
>>> +     cold, so that we only update the region crossings the minimum number of
>>> +     places, which can require forcing edges to be non fallthru.  */
>>> +  while (! VEC_empty (basic_block, bbs_to_fix))
>>> +    {
>>> +      bb = VEC_pop (basic_block, bbs_to_fix);
>>> +      fixup_bb_partition (bb);
>>> +    }
>>> +}
>>> +
>>>  /* Verify the CFG and RTL consistency common for both underlying RTL and
>>>     cfglayout RTL.
>>>
>>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void)
>>>    rtx x;
>>>    int err = 0;
>>>    basic_block bb;
>>> +  bool have_partitions = false;
>>>
>>>    /* Check the general integrity of the basic blocks.  */
>>>    FOR_EACH_BB_REVERSE (bb)
>>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void)
>>>
>>>           if (e->flags & EDGE_ABNORMAL)
>>>             n_abnormal++;
>>> +
>>> +          have_partitions |= is_crossing;
>>>         }
>>>
>>>        if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
>>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void)
>>>           }
>>>      }
>>>
>>> +  /* If there are partitions, do a sanity check on them: A basic block in
>>> +     a cold partition cannot dominate a basic block in a hot partition.  */
>>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>>> +  if (have_partitions && !err)
>>> +    FOR_EACH_BB (bb)
>>> +      if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>>> +        VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>>> +    {
>>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>> +      basic_block son;
>>> +
>>> +      if (dom_calculated_here)
>>> +        calculate_dominance_info (CDI_DOMINATORS);
>>> +
>>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>>> +        {
>>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>>> +            {
>>> +              error ("non-cold basic block %d dominated "
>>> +                     "by a block in the cold partition", bb->index);
>>> +              err = 1;
>>> +            }
>>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>>> +               son;
>>> +               son = next_dom_son (CDI_DOMINATORS, son))
>>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>>> +        }
>>> +
>>> +      if (dom_calculated_here)
>>> +        free_dominance_info (CDI_DOMINATORS);
>>> +    }
>>> +
>>>    /* Clean up.  */
>>>    return err;
>>>  }
>>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void)
>>>    else
>>>      cfg_layout_function_header = NULL_RTX;
>>>
>>> +  had_sec_boundary_notes = false;
>>> +
>>>    next_insn = get_insns ();
>>>    FOR_EACH_BB (bb)
>>>      {
>>>        rtx end;
>>>
>>>        if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb))
>>> -       BB_HEADER (bb) = unlink_insn_chain (next_insn,
>>> -                                             PREV_INSN (BB_HEAD (bb)));
>>> +        {
>>> +          /* Rather than try to keep section boundary notes incrementally
>>> +             up-to-date through cfg layout optimizations, simply remove them
>>> +             and flag that they should be re-inserted when exiting
>>> +             cfg layout mode.  */
>>> +          rtx check_insn = next_insn;
>>> +          while (check_insn)
>>> +            {
>>> +              if (NOTE_P (check_insn)
>>> +                  && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
>>> +              {
>>> +                had_sec_boundary_notes |= true;
>>> +                /* Remove note from chain. Grab new next_insn first.  */
>>> +                if (next_insn == check_insn)
>>> +                  next_insn = NEXT_INSN (check_insn);
>>> +                /* Delete note.  */
>>> +                delete_insn (check_insn);
>>> +                /* There will only be one.  */
>>> +                break;
>>> +              }
>>> +              check_insn = NEXT_INSN (check_insn);
>>> +            }
>>> +          /* If we still have header instructions left after above loop.  */
>>> +          if (next_insn != BB_HEAD (bb))
>>> +            BB_HEADER (bb) = unlink_insn_chain (next_insn,
>>> +                                                PREV_INSN (BB_HEAD (bb)));
>>> +        }
>>>        end = skip_insns_after_block (bb);
>>>        if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end)
>>>         BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end);
>>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void)
>>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>>        bb->aux = bb->next_bb;
>>>
>>> -  cfg_layout_finalize ();
>>> +  cfg_layout_finalize (false);
>>>
>>>    return 0;
>>>  }
>>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode)
>>>  }
>>>
>>>
>>> -/* Given a reorder chain, rearrange the code to match.  */
>>> +/* Given a reorder chain, rearrange the code to match. If
>>> +   this is called when we will FINALIZE_REORDER_BLOCKS, or when
>>> +   section boundary notes were removed on entry to cfg layout
>>> +   mode, insert section boundary notes here.  */
>>>
>>>  static void
>>> -fixup_reorder_chain (void)
>>> +fixup_reorder_chain (bool finalize_reorder_blocks)
>>>  {
>>>    basic_block bb;
>>>    rtx insn = NULL;
>>> @@ -3150,7 +3373,7 @@ static void
>>>           PREV_INSN (BB_HEADER (bb)) = insn;
>>>           insn = BB_HEADER (bb);
>>>           while (NEXT_INSN (insn))
>>> -           insn = NEXT_INSN (insn);
>>> +            insn = NEXT_INSN (insn);
>>>         }
>>>        if (insn)
>>>         NEXT_INSN (insn) = BB_HEAD (bb);
>>> @@ -3175,6 +3398,11 @@ static void
>>>      insn = NEXT_INSN (insn);
>>>
>>>    set_last_insn (insn);
>>> +
>>> +  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>>> +  if (had_sec_boundary_notes || finalize_reorder_blocks)
>>> +    insert_section_boundary_note ();
>>> +
>>>  #ifdef ENABLE_CHECKING
>>>    verify_insn_chain ();
>>>  #endif
>>> @@ -3187,7 +3415,7 @@ static void
>>>        edge e_fall, e_taken, e;
>>>        rtx bb_end_insn;
>>>        rtx ret_label = NULL_RTX;
>>> -      basic_block nb, src_bb;
>>> +      basic_block nb;
>>>        edge_iterator ei;
>>>
>>>        if (EDGE_COUNT (bb->succs) == 0)
>>> @@ -3322,7 +3550,6 @@ static void
>>>        /* We got here if we need to add a new jump insn.
>>>          Note force_nonfallthru can delete E_FALL and thus we have to
>>>          save E_FALL->src prior to the call to force_nonfallthru.  */
>>> -      src_bb = e_fall->src;
>>>        nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
>>>        if (nb)
>>>         {
>>> @@ -3330,17 +3557,6 @@ static void
>>>           bb->aux = nb;
>>>           /* Don't process this new block.  */
>>>           bb = nb;
>>> -
>>> -         /* Make sure new bb is tagged for correct section (same as
>>> -            fall-thru source, since you cannot fall-thru across
>>> -            section boundaries).  */
>>> -         BB_COPY_PARTITION (src_bb, single_pred (bb));
>>> -         if (flag_reorder_blocks_and_partition
>>> -             && targetm_common.have_named_sections
>>> -             && JUMP_P (BB_END (bb))
>>> -             && !any_condjump_p (BB_END (bb))
>>> -             && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
>>> -           add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
>>>         }
>>>      }
>>>
>>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to)
>>>             case NOTE_INSN_FUNCTION_BEG:
>>>               /* There is always just single entry to function.  */
>>>             case NOTE_INSN_BASIC_BLOCK:
>>> +              /* We should only switch text sections once.  */
>>> +           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>>               break;
>>>
>>>             case NOTE_INSN_EPILOGUE_BEG:
>>> -           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>>               emit_note_copy (insn);
>>>               break;
>>>
>>> @@ -3759,10 +3976,13 @@ break_superblocks (void)
>>>  }
>>>
>>>  /* Finalize the changes: reorder insn list according to the sequence specified
>>> -   by aux pointers, enter compensation code, rebuild scope forest.  */
>>> +   by aux pointers, enter compensation code, rebuild scope forest. If
>>> +   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
>>> +   to fixup_reorder_chain so that it can insert the proper switch text
>>> +   section notes.  */
>>>
>>>  void
>>> -cfg_layout_finalize (void)
>>> +cfg_layout_finalize (bool finalize_reorder_blocks)
>>>  {
>>>  #ifdef ENABLE_CHECKING
>>>    verify_flow_info ();
>>> @@ -3775,7 +3995,7 @@ void
>>>  #endif
>>>        )
>>>      fixup_fallthru_exit_predecessor ();
>>> -  fixup_reorder_chain ();
>>> +  fixup_reorder_chain (finalize_reorder_blocks);
>>>
>>>    rebuild_jump_labels (get_insns ());
>>>    delete_dead_jumptables ();
>>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e)
>>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>>      return false;
>>>
>>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>>      return false;
>>>
>>>    if (!onlyjump_p (insn)
>>>
>>> --
>>> This patch is available for review at http://codereview.appspot.com/6823047
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Jack Howarth Nov. 26, 2012, 8:42 p.m. UTC | #4
On Mon, Nov 26, 2012 at 12:19:55PM -0800, Teresa Johnson wrote:
> Are you sure you have all my changes applied? I applied the 4 patches
> attached to PR55121 into my trunk checkout that has my fixes, and to a
> pristine trunk checkout. I configured and built both for
> --target=arm-none-linux-gnueabi, and built using your options, .i file
> and gcda file. I can reproduce the failure using the pristine trunk
> with your patches but not with my fixed trunk + your patches. (I just
> updated to head to pickup recent changes and get the same result. The
> vec changes required some manual changes to the patch, which I will
> resend shortly.)

Teresa,
    Your mailer seems to have corrupted the posted patch with stray
=3D characters and line breaks. Can you repost a copy as an attachment
to the list?
             Jack

> 
> Without my fixes:
> 
> $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreproce
> ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
> -fno-common -o eval.s -freorder-blocks-and-partition
> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
> 2.4.2-p1, MPC version 0.8.1
> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
> 2.4.2-p1, MPC version 0.8.1
> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
> Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a
> eval.c: In function ‘Ge’:
> eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560
>  }
>  ^
> 0x622f71 df_compact_blocks()
> ../../gcc_trunk_3/gcc/df-core.c:1560
> 0x5cfcb5 compact_blocks()
> ../../gcc_trunk_3/gcc/cfg.c:162
> 0xc9dce0 reorder_basic_blocks
> ../../gcc_trunk_3/gcc/bb-reorder.c:2154
> 0xc9dce0 rest_of_handle_reorder_blocks
> ../../gcc_trunk_3/gcc/bb-reorder.c:2219
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See <http://gcc.gnu.org/bugs.html> for instructions.
> 
> 
> With my fixes:
> 
> $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreproce
> ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
> -fno-common -o eval.s -freorder-blocks-and-partition
> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
> 2.4.2-p1, MPC version 0.8.1
> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
> 2.4.2-p1, MPC version 0.8.1
> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
> Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3
> 
> 
> Thanks,
> Teresa
> 
> On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon
> <christophe.lyon@linaro.org> wrote:
> > Hi,
> >
> > I have tested your patch on Spec2000 on ARM, and I can still see
> > several failures caused by:
> > "error: fallthru edge crosses section boundary", including the case
> > described in PR55121.
> >
> > On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote:
> >> Ping.
> >> Teresa
> >>
> >> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote:
> >>> Revised patch that fixes failures encountered when enabling
> >>> -freorder-blocks-and-partition, including the failure reported in PR 53743.
> >>>
> >>> This includes new verification code to ensure no cold blocks dominate hot
> >>> blocks contributed by Steven Bosscher.
> >>>
> >>> I attempted to make the handling of partition updates through the optimization
> >>> passes much more consistent, removing a number of partial fixes in the code
> >>> stream in the process. The code to fixup partitions (including the BB_PARTITION
> >>> assignement, region crossing jump notes, and switch text section notes) is
> >>> now handled in a few centralized locations. For example, inside
> >>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
> >>> don't need to attempt the fixup themselves.
> >>>
> >>> For optimization passes that make adjustments to the cfg while in cfg layout
> >>> mode that are not easy to fix up incrementally, the new routine
> >>> fixup_partitions handles the cleanup globally. This does require calculation
> >>> of the dominance relation, however, as far as I can tell the routines which
> >>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
> >>> are invoked typically once (or a small number of times in the case of
> >>> try_optimize_cfg) per optimization pass. Additionally, I compared the
> >>> -ftime-report output for some large fdo compilations and saw only minimal
> >>> increases in the dominance computation times, which were only a tiny percent
> >>> of the overall compile time.
> >>>
> >>> Additionally, I added a flag to the rtl_data structure to indicate whether
> >>> any partitioning was actually performed, so that optimizations which were
> >>> conservatively disabled whenever the flag_reorder_blocks_and_partition
> >>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
> >>> conservative for functions where no partitions were formed (e.g. they are
> >>> completely hot).
> >>>
> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
> >>> benchmarks and internal google benchmarks using profile feedback and
> >>> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
> >>>
> >>> Thanks,
> >>> Teresa
> >>>
> >>> 2012-11-14  Teresa Johnson  <tejohnson@google.com>
> >>>             Steven Bosscher  <steven@gcc.gnu.org>
> >>>
> >>>         * cfghooks.h (cfg_layout_finalize): New parameter.
> >>>         * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
> >>>         parameter.
> >>>         * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
> >>>         as this is now done by redirect_edge_and_branch_force.
> >>>         * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
> >>>         barriers, new cfg_layout_finalize parameter, and don't store exit
> >>>         predecessor BB until after it is potentially split.
> >>>         * function.h (struct rtl_data): New flag has_bb_partition.
> >>>         * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
> >>>         * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
> >>>         any blocks in function actually partitioned.
> >>>         (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
> >>>         up partitioning.
> >>>         * bb-reorder.c (connect_traces): Only look for partitions and skip
> >>>         block copying if any blocks in function actually partitioned.
> >>>         (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
> >>>         (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
> >>>         that no cold blocks dominate a hot block.
> >>>         (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
> >>>         as this is now done by force_nonfallthru_and_redirect.
> >>>         (add_reg_crossing_jump_notes): Handle the fact that some jumps may
> >>>         already be marked with region crossing note.
> >>>         (reorder_basic_blocks): Only need to verify partitions if any
> >>>         blocks in function actually partitioned.
> >>>         (insert_section_boundary_note): Only need to insert note if any
> >>>         blocks in function actually partitioned.
> >>>         (rest_of_handle_reorder_blocks): New cfg_layout_finalize
> >>>         parameter, and remove call to insert_section_boundary_note as this
> >>>         is now called via cfg_layout_finalize/fixup_reorder_chain.
> >>>         (duplicate_computed_gotos): New cfg_layout_finalize
> >>>         parameter.
> >>>         (partition_hot_cold_basic_blocks): Set flag indicating function
> >>>         has bb partitions.
> >>>         * bb-reorder.h: Declare insert_section_boundary_note and
> >>>         emit_barrier_after_bb, which are no longer static.
> >>>         * basic-block.h: Declare new function fixup_partitions.
> >>>         * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
> >>>         check for region crossing note.
> >>>         (fixup_partition_crossing): New function.
> >>>         (fixup_bb_partition): Ditto.
> >>>         (rtl_redirect_edge_and_branch): Fixup partition boundaries.
> >>>         (force_nonfallthru_and_redirect): Fixup partition boundaries,
> >>>         remove old code that tried to do this. Emit barrier correctly
> >>>         when we are in cfglayout mode.
> >>>         (rtl_split_edge): Correctly fixup partition boundaries.
> >>>         (commit_one_edge_insertion): Remove old code that tried to
> >>>         fixup region crossing edge since this is now handled in
> >>>         split_block, and set up insertion point correctly since
> >>>         block may now end in a jump.
> >>>         (commit_edge_insertions): Invoke fixup_partitions to sanitize partition
> >>>         boundaries after optimizations that modify cfg and before trying to
> >>>         verify the flow info.
> >>>         (fixup_partitions): New function.
> >>>         (rtl_verify_flow_info_1): Add verification that no cold bbs dominate
> >>>         hot bbs.
> >>>         (record_effective_endpoints): Remove region-crossing notes and set flag
> >>>         indicating that they need to be reinserted on exit from cfglayout mode.
> >>>         (outof_cfg_layout_mode): New cfg_layout_finalize parameter.
> >>>         (fixup_reorder_chain): Call insert_section_boundary_note if necessary.
> >>>         Remove old code that attempted to fixup region crossing note as
> >>>         this is now handled in force_nonfallthru_and_redirect.
> >>>         (duplicate_insn_chain): Don't duplicate switch section notes.
> >>>         (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
> >>>         (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
> >>>         note.
> >>>
> >>> Index: cfghooks.h
> >>> ===================================================================
> >>> --- cfghooks.h  (revision 193376)
> >>> +++ cfghooks.h  (working copy)
> >>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
> >>>  void account_profile_record (struct profile_record *, int);
> >>>
> >>>  extern void cfg_layout_initialize (unsigned int);
> >>> -extern void cfg_layout_finalize (void);
> >>> +extern void cfg_layout_finalize (bool);
> >>>
> >>>  /* Hooks containers.  */
> >>>  extern struct cfg_hooks gimple_cfg_hooks;
> >>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi
> >>>  extern void gimple_register_cfg_hooks (void);
> >>>  extern struct cfg_hooks get_cfg_hooks (void);
> >>>  extern void set_cfg_hooks (struct cfg_hooks);
> >>> -
> >>> Index: modulo-sched.c
> >>> ===================================================================
> >>> --- modulo-sched.c      (revision 193376)
> >>> +++ modulo-sched.c      (working copy)
> >>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void)
> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
> >>>        bb->aux = bb->next_bb;
> >>>    free_dominance_info (CDI_DOMINATORS);
> >>> -  cfg_layout_finalize ();
> >>> +  cfg_layout_finalize (false);
> >>>  #endif /* INSN_SCHEDULING */
> >>>    return 0;
> >>>  }
> >>> Index: ifcvt.c
> >>> ===================================================================
> >>> --- ifcvt.c     (revision 193376)
> >>> +++ ifcvt.c     (working copy)
> >>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
> >>>    if (new_bb)
> >>>      {
> >>>        df_bb_replace (then_bb_index, new_bb);
> >>> -      /* Since the fallthru edge was redirected from test_bb to new_bb,
> >>> -         we need to ensure that new_bb is in the same partition as
> >>> -         test bb (you can not fall through across section boundaries).  */
> >>> -      BB_COPY_PARTITION (new_bb, test_bb);
> >>> +      /* This should have been done above via force_nonfallthru_and_redirect
> >>> +         (possibly called from redirect_edge_and_branch_force).  */
> >>> +      gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
> >>>      }
> >>>
> >>>    num_true_changes++;
> >>> Index: function.c
> >>> ===================================================================
> >>> --- function.c  (revision 193376)
> >>> +++ function.c  (working copy)
> >>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void)
> >>>                     break;
> >>>                 if (e)
> >>>                   {
> >>> -                   copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
> >>> -                                                 NULL_RTX, e->src);
> >>> +                    /* Make sure we insert after any barriers.  */
> >>> +                    rtx end = get_last_bb_insn (e->src);
> >>> +                    copy_bb = create_basic_block (NEXT_INSN (end),
> >>> +                                                  NULL_RTX, e->src);
> >>>                     BB_COPY_PARTITION (copy_bb, e->src);
> >>>                   }
> >>>                 else
> >>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void)
> >>>         if (cur_bb->index >= NUM_FIXED_BLOCKS
> >>>             && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS)
> >>>           cur_bb->aux = cur_bb->next_bb;
> >>> -      cfg_layout_finalize ();
> >>> +      cfg_layout_finalize (false);
> >>>      }
> >>>
> >>>  epilogue_done:
> >>> @@ -6517,7 +6519,7 @@ epilogue_done:
> >>>        basic_block simple_return_block_cold = NULL;
> >>>        edge pending_edge_hot = NULL;
> >>>        edge pending_edge_cold = NULL;
> >>> -      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
> >>> +      basic_block exit_pred;
> >>>        int i;
> >>>
> >>>        gcc_assert (entry_edge != orig_entry_edge);
> >>> @@ -6545,6 +6547,12 @@ epilogue_done:
> >>>             else
> >>>               pending_edge_cold = e;
> >>>           }
> >>> +
> >>> +      /* Save a pointer to the exit's predecessor BB for use in
> >>> +         inserting new BBs at the end of the function. Do this
> >>> +         after the call to split_block above which may split
> >>> +         the original exit pred.  */
> >>> +      exit_pred = EXIT_BLOCK_PTR->prev_bb;
> >>>
> >>>        FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
> >>>         {
> >>> Index: function.h
> >>> ===================================================================
> >>> --- function.h  (revision 193376)
> >>> +++ function.h  (working copy)
> >>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data {
> >>>       sched2) and is useful only if the port defines LEAF_REGISTERS.  */
> >>>    bool uses_only_leaf_regs;
> >>>
> >>> +  /* Nonzero if the function being compiled has undergone hot/cold partitioning
> >>> +     (under flag_reorder_blocks_and_partition) and has at least one cold
> >>> +     block.  */
> >>> +  bool has_bb_partition;
> >>> +
> >>>    /* Like regs_ever_live, but 1 if a reg is set or clobbered from an
> >>>       asm.  Unlike regs_ever_live, elements of this array corresponding
> >>>       to eliminable regs (like the frame pointer) are set if an asm
> >>> Index: hw-doloop.c
> >>> ===================================================================
> >>> --- hw-doloop.c (revision 193376)
> >>> +++ hw-doloop.c (working copy)
> >>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops)
> >>>        else
> >>>         bb->aux = NULL;
> >>>      }
> >>> -  cfg_layout_finalize ();
> >>> +  cfg_layout_finalize (false);
> >>>    clear_aux_for_blocks ();
> >>>    df_analyze ();
> >>>  }
> >>> Index: cfgcleanup.c
> >>> ===================================================================
> >>> --- cfgcleanup.c        (revision 193376)
> >>> +++ cfgcleanup.c        (working copy)
> >>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
> >>>       partition boundaries).  See the comments at the top of
> >>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
> >>>
> >>> -  if (flag_reorder_blocks_and_partition && reload_completed)
> >>> +  if (crtl->has_bb_partition && reload_completed)
> >>>      return false;
> >>>
> >>>    /* Search backward through forwarder blocks.  We don't need to worry
> >>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode)
> >>>               df_analyze ();
> >>>             }
> >>>
> >>> +         if (changed)
> >>> +            {
> >>> +              /* Edge forwarding in particular can cause hot blocks previously
> >>> +                 reached by both hot and cold blocks to become dominated only
> >>> +                 by cold blocks. This will cause the verification below to fail,
> >>> +                 and lead to now cold code in the hot section. This is not easy
> >>> +                 to detect and fix during edge forwarding, and in some cases
> >>> +                 is only visible after newly unreachable blocks are deleted,
> >>> +                 which will be done in fixup_partitions.  */
> >>> +              fixup_partitions ();
> >>> +
> >>>  #ifdef ENABLE_CHECKING
> >>> -         if (changed)
> >>> -           verify_flow_info ();
> >>> +              verify_flow_info ();
> >>>  #endif
> >>> +            }
> >>>
> >>>           changed_overall |= changed;
> >>>           first_pass = false;
> >>> Index: bb-reorder.c
> >>> ===================================================================
> >>> --- bb-reorder.c        (revision 193376)
> >>> +++ bb-reorder.c        (working copy)
> >>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces
> >>>    current_partition = BB_PARTITION (traces[0].first);
> >>>    two_passes = false;
> >>>
> >>> -  if (flag_reorder_blocks_and_partition)
> >>> +  if (crtl->has_bb_partition)
> >>>      for (i = 0; i < n_traces && !two_passes; i++)
> >>>        if (BB_PARTITION (traces[0].first)
> >>>           != BB_PARTITION (traces[i].first))
> >>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces
> >>>                       }
> >>>                   }
> >>>
> >>> -             if (flag_reorder_blocks_and_partition)
> >>> +             if (crtl->has_bb_partition)
> >>>                 try_copy = false;
> >>>
> >>>               /* Copy tiny blocks always; copy larger blocks only when the
> >>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void)
> >>>    return length;
> >>>  }
> >>>
> >>> -/* Emit a barrier into the footer of BB.  */
> >>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
> >>>
> >>> -static void
> >>> +void
> >>>  emit_barrier_after_bb (basic_block bb)
> >>>  {
> >>>    rtx barrier = emit_barrier_after (BB_END (bb));
> >>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
> >>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
> >>> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
> >>>  }
> >>>
> >>>  /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
> >>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg
> >>>  {
> >>>    VEC(edge, heap) *crossing_edges = NULL;
> >>>    basic_block bb;
> >>> -  edge e;
> >>> -  edge_iterator ei;
> >>> +  edge e, e2;
> >>> +  edge_iterator ei, ei2;
> >>> +  unsigned int cold_bb_count = 0;
> >>> +  VEC (basic_block, heap) *bbs_in_hot_partition = NULL;
> >>> +  VEC (basic_block, heap) *bbs_newly_hot = NULL;
> >>>
> >>>    /* Mark which partition (hot/cold) each basic block belongs in.  */
> >>>    FOR_EACH_BB (bb)
> >>>      {
> >>>        if (probably_never_executed_bb_p (cfun, bb))
> >>> -       BB_SET_PARTITION (bb, BB_COLD_PARTITION);
> >>> +        {
> >>> +          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
> >>> +          cold_bb_count++;
> >>> +        }
> >>>        else
> >>> -       BB_SET_PARTITION (bb, BB_HOT_PARTITION);
> >>> +        {
> >>> +          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
> >>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb);
> >>> +        }
> >>>      }
> >>>
> >>> +  /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of
> >>> +     several different possibilities. One is that there are edge weight insanities
> >>> +     due to optimization phases that do not properly update basic block profile
> >>> +     counts. The second is that the entry of the function may not be hot, because
> >>> +     it is entered fewer times than the number of profile training runs, but there
> >>> +     is a loop inside the function that causes blocks within the function to be
> >>> +     above the threshold for hotness.  */
> >>> +  if (cold_bb_count)
> >>> +    {
> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
> >>> +
> >>> +      if (dom_calculated_here)
> >>> +        calculate_dominance_info (CDI_DOMINATORS);
> >>> +
> >>> +      /* Keep examining hot bbs until we have either checked them all, or
> >>> +         re-marked all cold bbs hot.  */
> >>> +      while (! VEC_empty (basic_block, bbs_in_hot_partition)
> >>> +             && cold_bb_count)
> >>> +        {
> >>> +          basic_block dom_bb;
> >>> +
> >>> +          bb = VEC_pop (basic_block, bbs_in_hot_partition);
> >>> +          dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
> >>> +
> >>> +          /* If bb's immediate dominator is also hot then it is ok.  */
> >>> +          if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
> >>> +            continue;
> >>> +
> >>> +          /* We have a hot bb with an immediate dominator that is cold.
> >>> +             The dominator needs to be re-marked to hot.  */
> >>> +          BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION);
> >>> +          cold_bb_count--;
> >>> +
> >>> +          /* Now we need to examine newly-hot dom_bb to see if it is also
> >>> +             dominated by a cold bb.  */
> >>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb);
> >>> +
> >>> +          /* We should also adjust any cold blocks that the newly-hot bb
> >>> +             feeds and see if it makes sense to re-mark those as hot as
> >>> +             well.  */
> >>> +          VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb);
> >>> +          while (! VEC_empty (basic_block, bbs_newly_hot))
> >>> +            {
> >>> +              basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot);
> >>> +              /* Examine all successors of this newly-hot bb to see if they
> >>> +                 are cold and should be re-marked as hot.  */
> >>> +              FOR_EACH_EDGE (e, ei, new_hot_bb->succs)
> >>> +                {
> >>> +                  bool any_cold_preds = false;
> >>> +                  basic_block succ = e->dest;
> >>> +                  if (BB_PARTITION (succ) != BB_COLD_PARTITION)
> >>> +                    continue;
> >>> +                  /* Does this block have any cold predecessors now?  */
> >>> +                  FOR_EACH_EDGE (e2, ei2, succ->preds)
> >>> +                  {
> >>> +                    if (BB_PARTITION (e2->src) == BB_COLD_PARTITION)
> >>> +                      {
> >>> +                        any_cold_preds = true;
> >>> +                        break;
> >>> +                      }
> >>> +                  }
> >>> +                  if (any_cold_preds)
> >>> +                    continue;
> >>> +
> >>> +                  /* Here we have a successor of newly-hot bb that is cold
> >>> +                     but no longer has any cold precessessors. Since the original
> >>> +                     assignment of our newly-hot bb was incorrect, this successor's
> >>> +                     assignment as cold is also suspect. Go ahead and re-mark it
> >>> +                     as hot now too. Better heuristics may be in order here.  */
> >>> +                  BB_SET_PARTITION (succ, BB_HOT_PARTITION);
> >>> +                  cold_bb_count--;
> >>> +                  VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ);
> >>> +                  /* Examine this successor as a newly-hot bb.  */
> >>> +                  VEC_safe_push (basic_block, heap, bbs_newly_hot, succ);
> >>> +                }
> >>> +            }
> >>> +        }
> >>> +
> >>> +      if (dom_calculated_here)
> >>> +        free_dominance_info (CDI_DOMINATORS);
> >>> +    }
> >>> +
> >>>    /* The format of .gcc_except_table does not allow landing pads to
> >>>       be in a different partition as the throw.  Fix this by either
> >>>       moving or duplicating the landing pads.  */
> >>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void)
> >>>                       new_bb->aux = cur_bb->aux;
> >>>                       cur_bb->aux = new_bb;
> >>>
> >>> -                     /* Make sure new fall-through bb is in same
> >>> -                        partition as bb it's falling through from.  */
> >>> +                      /* This is done by force_nonfallthru_and_redirect.  */
> >>> +                     gcc_assert (BB_PARTITION (new_bb)
> >>> +                                  == BB_PARTITION (cur_bb));
> >>>
> >>> -                     BB_COPY_PARTITION (new_bb, cur_bb);
> >>>                       single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
> >>>                     }
> >>>                   else
> >>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void)
> >>>    FOR_EACH_BB (bb)
> >>>      FOR_EACH_EDGE (e, ei, bb->succs)
> >>>        if ((e->flags & EDGE_CROSSING)
> >>> -         && JUMP_P (BB_END (e->src)))
> >>> +         && JUMP_P (BB_END (e->src))
> >>> +          /* Some notes were added during fix_up_fall_thru_edges, via
> >>> +             force_nonfallthru_and_redirect.  */
> >>> +          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
> >>>         add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
> >>>  }
> >>>
> >>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void)
> >>>        dump_flow_info (dump_file, dump_flags);
> >>>      }
> >>>
> >>> -  if (flag_reorder_blocks_and_partition)
> >>> +  if (crtl->has_bb_partition)
> >>>      verify_hot_cold_block_grouping ();
> >>>  }
> >>>
> >>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void)
> >>>     encountering this note will make the compiler switch between the
> >>>     hot and cold text sections.  */
> >>>
> >>> -static void
> >>> +void
> >>>  insert_section_boundary_note (void)
> >>>  {
> >>>    basic_block bb;
> >>>    rtx new_note;
> >>>    int first_partition = 0;
> >>>
> >>> -  if (!flag_reorder_blocks_and_partition)
> >>> +  if (!crtl->has_bb_partition)
> >>>      return;
> >>>
> >>>    FOR_EACH_BB (bb)
> >>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void)
> >>>    FOR_EACH_BB (bb)
> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
> >>>        bb->aux = bb->next_bb;
> >>> -  cfg_layout_finalize ();
> >>> +  cfg_layout_finalize (true);
> >>>
> >>> -  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
> >>> -  insert_section_boundary_note ();
> >>>    return 0;
> >>>  }
> >>>
> >>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void)
> >>>      }
> >>>
> >>>  done:
> >>> -  cfg_layout_finalize ();
> >>> +  cfg_layout_finalize (false);
> >>>
> >>>    BITMAP_FREE (candidates);
> >>>    return 0;
> >>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void)
> >>>    if (crossing_edges == NULL)
> >>>      return 0;
> >>>
> >>> +  crtl->has_bb_partition = true;
> >>> +
> >>>    /* Make sure the source of any crossing edge ends in a jump and the
> >>>       destination of any crossing edge has a label.  */
> >>>    add_labels_and_missing_jumps (crossing_edges);
> >>> Index: bb-reorder.h
> >>> ===================================================================
> >>> --- bb-reorder.h        (revision 193376)
> >>> +++ bb-reorder.h        (working copy)
> >>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re
> >>>
> >>>  extern int get_uncond_jump_length (void);
> >>>
> >>> +extern void insert_section_boundary_note (void);
> >>> +
> >>> +extern void emit_barrier_after_bb (basic_block bb);
> >>> +
> >>>  #endif
> >>> Index: basic-block.h
> >>> ===================================================================
> >>> --- basic-block.h       (revision 193376)
> >>> +++ basic-block.h       (working copy)
> >>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect
> >>>  extern bool contains_no_active_insn_p (const_basic_block);
> >>>  extern bool forwarder_block_p (const_basic_block);
> >>>  extern bool can_fallthru (basic_block, basic_block);
> >>> +extern void fixup_partitions (void);
> >>>
> >>>  /* In cfgbuild.c.  */
> >>>  extern void find_many_sub_basic_blocks (sbitmap);
> >>> Index: cfgrtl.c
> >>> ===================================================================
> >>> --- cfgrtl.c    (revision 193376)
> >>> +++ cfgrtl.c    (working copy)
> >>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
> >>>  #include "tree.h"
> >>>  #include "hard-reg-set.h"
> >>>  #include "basic-block.h"
> >>> +#include "bb-reorder.h"
> >>>  #include "regs.h"
> >>>  #include "flags.h"
> >>>  #include "function.h"
> >>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3.  If not see
> >>>     Only applicable if the CFG is in cfglayout mode.  */
> >>>  static GTY(()) rtx cfg_layout_function_footer;
> >>>  static GTY(()) rtx cfg_layout_function_header;
> >>> +static bool had_sec_boundary_notes;
> >>>
> >>>  static rtx skip_insns_after_block (basic_block);
> >>>  static void record_effective_endpoints (void);
> >>>  static rtx label_for_bb (basic_block);
> >>> -static void fixup_reorder_chain (void);
> >>> +static void fixup_reorder_chain (bool finalize_reorder_blocks);
> >>>
> >>>  void verify_insn_chain (void);
> >>>  static void fixup_fallthru_exit_predecessor (void);
> >>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
> >>>       partition boundaries).  See  the comments at the top of
> >>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
> >>>
> >>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
> >>> -      || BB_PARTITION (src) != BB_PARTITION (target))
> >>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
> >>>      return NULL;
> >>>
> >>>    /* We can replace or remove a complex jump only when we have exactly
> >>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target)
> >>>    return e;
> >>>  }
> >>>
> >>> +/* Called when edge E has been redirected to a new destination,
> >>> +   in order to update the region crossing flag on the edge and
> >>> +   jump.  */
> >>> +
> >>> +static void
> >>> +fixup_partition_crossing (edge e, basic_block target)
> >>> +{
> >>> +  rtx note;
> >>> +
> >>> +  gcc_assert (e->dest == target);
> >>> +
> >>> +  if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR)
> >>> +    return;
> >>> +  /* If we redirected an existing edge, it may already be marked
> >>> +     crossing, even though the new src is missing a reg crossing note.
> >>> +     But make sure reg crossing note doesn't already exist before
> >>> +     inserting.  */
> >>> +  if (BB_PARTITION (e->src) != BB_PARTITION (target))
> >>> +    {
> >>> +      e->flags |= EDGE_CROSSING;
> >>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
> >>> +      if (JUMP_P (BB_END (e->src))
> >>> +          && !note)
> >>> +        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
> >>> +    }
> >>> +  else if (BB_PARTITION (e->src) == BB_PARTITION (target))
> >>> +    {
> >>> +      e->flags &= ~EDGE_CROSSING;
> >>> +      /* Remove the region crossing note from jump at end of
> >>> +         e->src if it exists.  */
> >>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
> >>> +      if (note)
> >>> +        remove_note (BB_END (e->src), note);
> >>> +    }
> >>> +}
> >>> +
> >>> +/* Called when block BB has been reassigned to a different partition,
> >>> +   to ensure that the region crossing attributes are updated.  */
> >>> +
> >>> +static void
> >>> +fixup_bb_partition (basic_block bb)
> >>> +{
> >>> +  edge e;
> >>> +  edge_iterator ei;
> >>> +
> >>> +  /* Now need to make bb's pred edges non-region crossing.  */
> >>> +  FOR_EACH_EDGE (e, ei, bb->preds)
> >>> +    {
> >>> +      fixup_partition_crossing (e, e->dest);
> >>> +    }
> >>> +
> >>> +  /* Possibly need to make bb's successor edges region crossing,
> >>> +     or remove stale region crossing.  */
> >>> +  FOR_EACH_EDGE (e, ei, bb->succs)
> >>> +    {
> >>> +      if ((e->flags & EDGE_FALLTHRU)
> >>> +          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
> >>> +          && e->dest != EXIT_BLOCK_PTR)
> >>> +        /* force_nonfallthru_and_redirect calls fixup_partition_crossing.  */
> >>> +        force_nonfallthru (e);
> >>> +      else
> >>> +        fixup_partition_crossing (e, e->dest);
> >>> +    }
> >>> +}
> >>> +
> >>>  /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
> >>>     expense of adding new instructions or reordering basic blocks.
> >>>
> >>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
> >>>  {
> >>>    edge ret;
> >>>    basic_block src = e->src;
> >>> +  basic_block dest = e->dest;
> >>>
> >>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
> >>>      return NULL;
> >>>
> >>> -  if (e->dest == target)
> >>> +  if (dest == target)
> >>>      return e;
> >>>
> >>>    if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
> >>>      {
> >>>        df_set_bb_dirty (src);
> >>> +      fixup_partition_crossing (ret, target);
> >>>        return ret;
> >>>      }
> >>>
> >>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block
> >>>      return NULL;
> >>>
> >>>    df_set_bb_dirty (src);
> >>> +  fixup_partition_crossing (ret, target);
> >>>    return ret;
> >>>  }
> >>>
> >>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
> >>>        /* Make sure new block ends up in correct hot/cold section.  */
> >>>
> >>>        BB_COPY_PARTITION (jump_block, e->src);
> >>> -      if (flag_reorder_blocks_and_partition
> >>> -         && targetm_common.have_named_sections
> >>> -         && JUMP_P (BB_END (jump_block))
> >>> -         && !any_condjump_p (BB_END (jump_block))
> >>> -         && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
> >>> -       add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
> >>>
> >>>        /* Wire edge in.  */
> >>>        new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
> >>>        new_edge->probability = probability;
> >>>        new_edge->count = count;
> >>>
> >>> +      /* If e->src was previously region crossing, it no longer is
> >>> +         and the reg crossing note should be removed.  */
> >>> +      fixup_partition_crossing (new_edge, jump_block);
> >>> +
> >>>        /* Redirect old edge.  */
> >>>        redirect_edge_pred (e, jump_block);
> >>>        e->probability = REG_BR_PROB_BASE;
> >>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
> >>>        LABEL_NUSES (label)++;
> >>>      }
> >>>
> >>> -  emit_barrier_after (BB_END (jump_block));
> >>> +  /* We might be in cfg layout mode, and if so, the following routine will
> >>> +     insert the barrier correctly.  */
> >>> +  emit_barrier_after_bb (jump_block);
> >>>    redirect_edge_succ_nodup (e, target);
> >>>
> >>>    if (abnormal_edge_flags)
> >>>      make_edge (src, target, abnormal_edge_flags);
> >>>
> >>>    df_mark_solutions_dirty ();
> >>> +  fixup_partition_crossing (e, target);
> >>>    return new_bb;
> >>>  }
> >>>
> >>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
> >>>  static basic_block
> >>>  rtl_split_edge (edge edge_in)
> >>>  {
> >>> -  basic_block bb;
> >>> +  basic_block bb, new_bb;
> >>>    rtx before;
> >>>
> >>>    /* Abnormal edges cannot be split.  */
> >>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in)
> >>>    else
> >>>      {
> >>>        bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
> >>> -      /* ??? Why not edge_in->dest->prev_bb here?  */
> >>> -      BB_COPY_PARTITION (bb, edge_in->dest);
> >>> +      if (edge_in->src == ENTRY_BLOCK_PTR)
> >>> +        BB_COPY_PARTITION (bb, edge_in->dest);
> >>> +      else
> >>> +        /* Put the split bb into the src partition, to avoid creating
> >>> +           a situation where a cold bb dominates a hot bb, in the case
> >>> +           where src is cold and dest is hot. The src will dominate
> >>> +           the new bb (whereas it might not have dominated dest).  */
> >>> +        BB_COPY_PARTITION (bb, edge_in->src);
> >>>      }
> >>>
> >>>    make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);
> >>>
> >>> +  /* Can't allow a region crossing edge to be fallthrough.  */
> >>> +  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
> >>> +      && edge_in->dest != EXIT_BLOCK_PTR)
> >>> +    {
> >>> +      new_bb = force_nonfallthru (single_succ_edge (bb));
> >>> +      gcc_assert (!new_bb);
> >>> +    }
> >>> +
> >>>    /* For non-fallthru edges, we must adjust the predecessor's
> >>>       jump instruction to target our new block.  */
> >>>    if ((edge_in->flags & EDGE_FALLTHRU) == 0)
> >>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e)
> >>>    else
> >>>      {
> >>>        bb = split_edge (e);
> >>> -      after = BB_END (bb);
> >>>
> >>> -      if (flag_reorder_blocks_and_partition
> >>> -         && targetm_common.have_named_sections
> >>> -         && e->src != ENTRY_BLOCK_PTR
> >>> -         && BB_PARTITION (e->src) == BB_COLD_PARTITION
> >>> -         && !(e->flags & EDGE_CROSSING)
> >>> -         && JUMP_P (after)
> >>> -         && !any_condjump_p (after)
> >>> -         && (single_succ_edge (bb)->flags & EDGE_CROSSING))
> >>> -       add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
> >>> +      /* If e crossed a partition boundary, we needed to make bb end in
> >>> +         a region-crossing jump, even though it was originally fallthru.  */
> >>> +      if (JUMP_P (BB_END (bb)))
> >>> +       before = BB_END (bb);
> >>> +      else
> >>> +        after = BB_END (bb);
> >>>      }
> >>>
> >>>    /* Now that we've found the spot, do the insertion.  */
> >>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void)
> >>>  {
> >>>    basic_block bb;
> >>>
> >>> +  /* Optimization passes that invoke this routine can cause hot blocks
> >>> +     previously reached by both hot and cold blocks to become dominated only
> >>> +     by cold blocks. This will cause the verification below to fail,
> >>> +     and lead to now cold code in the hot section. In some cases this
> >>> +     may only be visible after newly unreachable blocks are deleted,
> >>> +     which will be done by fixup_partitions.  */
> >>> +  fixup_partitions ();
> >>> +
> >>>  #ifdef ENABLE_CHECKING
> >>>    verify_flow_info ();
> >>>  #endif
> >>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb)
> >>>
> >>>    return end;
> >>>  }
> >>> -
> >>> +
> >>> +/* Perform cleanup on the hot/cold bb partitioning after optimization
> >>> +   passes that modify the cfg.  */
> >>> +
> >>> +void
> >>> +fixup_partitions (void)
> >>> +{
> >>> +  basic_block bb;
> >>> +
> >>> +  if (!crtl->has_bb_partition)
> >>> +    return;
> >>> +
> >>> +  /* Delete any blocks that became unreachable and weren't
> >>> +     already cleaned up, for example during edge forwarding
> >>> +     and convert_jumps_to_returns. This will expose more
> >>> +     opportunities for fixing the partition boundaries here.
> >>> +     Also, the calculation of the dominance graph during verification
> >>> +     will assert if there are unreachable nodes.  */
> >>> +  delete_unreachable_blocks ();
> >>> +
> >>> +  /* If there are partitions, do a sanity check on them: A basic block in
> >>> +     a cold partition cannot dominate a basic block in a hot partition.
> >>> +     Fixup any that now violate this requirement, as a result of edge
> >>> +     forwarding and unreachable block deletion.  */
> >>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
> >>> +  VEC (basic_block, heap) *bbs_to_fix = NULL;
> >>> +  FOR_EACH_BB (bb)
> >>> +    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
> >>> +      VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
> >>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
> >>> +    {
> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
> >>> +      basic_block son;
> >>> +
> >>> +      if (dom_calculated_here)
> >>> +        calculate_dominance_info (CDI_DOMINATORS);
> >>> +
> >>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
> >>> +        {
> >>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
> >>> +          /* If bb is not yet cold (because it was added below as
> >>> +             a block dominated by a cold bb) then mark it cold here.  */
> >>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
> >>> +            {
> >>> +              BB_SET_PARTITION (bb, BB_COLD_PARTITION);
> >>> +              VEC_safe_push (basic_block, heap, bbs_to_fix, bb);
> >>> +            }
> >>> +          /* Any blocks dominated by a block in the cold section
> >>> +             must also be cold.  */
> >>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
> >>> +               son;
> >>> +               son = next_dom_son (CDI_DOMINATORS, son))
> >>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
> >>> +        }
> >>> +
> >>> +      if (dom_calculated_here)
> >>> +        free_dominance_info (CDI_DOMINATORS);
> >>> +    }
> >>> +
> >>> +  /* Do the partition fixup after all necessary blocks have been converted to
> >>> +     cold, so that we only update the region crossings the minimum number of
> >>> +     places, which can require forcing edges to be non fallthru.  */
> >>> +  while (! VEC_empty (basic_block, bbs_to_fix))
> >>> +    {
> >>> +      bb = VEC_pop (basic_block, bbs_to_fix);
> >>> +      fixup_bb_partition (bb);
> >>> +    }
> >>> +}
> >>> +
> >>>  /* Verify the CFG and RTL consistency common for both underlying RTL and
> >>>     cfglayout RTL.
> >>>
> >>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void)
> >>>    rtx x;
> >>>    int err = 0;
> >>>    basic_block bb;
> >>> +  bool have_partitions = false;
> >>>
> >>>    /* Check the general integrity of the basic blocks.  */
> >>>    FOR_EACH_BB_REVERSE (bb)
> >>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void)
> >>>
> >>>           if (e->flags & EDGE_ABNORMAL)
> >>>             n_abnormal++;
> >>> +
> >>> +          have_partitions |= is_crossing;
> >>>         }
> >>>
> >>>        if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
> >>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void)
> >>>           }
> >>>      }
> >>>
> >>> +  /* If there are partitions, do a sanity check on them: A basic block in
> >>> +     a cold partition cannot dominate a basic block in a hot partition.  */
> >>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
> >>> +  if (have_partitions && !err)
> >>> +    FOR_EACH_BB (bb)
> >>> +      if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
> >>> +        VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
> >>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
> >>> +    {
> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
> >>> +      basic_block son;
> >>> +
> >>> +      if (dom_calculated_here)
> >>> +        calculate_dominance_info (CDI_DOMINATORS);
> >>> +
> >>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
> >>> +        {
> >>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
> >>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
> >>> +            {
> >>> +              error ("non-cold basic block %d dominated "
> >>> +                     "by a block in the cold partition", bb->index);
> >>> +              err = 1;
> >>> +            }
> >>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
> >>> +               son;
> >>> +               son = next_dom_son (CDI_DOMINATORS, son))
> >>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
> >>> +        }
> >>> +
> >>> +      if (dom_calculated_here)
> >>> +        free_dominance_info (CDI_DOMINATORS);
> >>> +    }
> >>> +
> >>>    /* Clean up.  */
> >>>    return err;
> >>>  }
> >>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void)
> >>>    else
> >>>      cfg_layout_function_header = NULL_RTX;
> >>>
> >>> +  had_sec_boundary_notes = false;
> >>> +
> >>>    next_insn = get_insns ();
> >>>    FOR_EACH_BB (bb)
> >>>      {
> >>>        rtx end;
> >>>
> >>>        if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb))
> >>> -       BB_HEADER (bb) = unlink_insn_chain (next_insn,
> >>> -                                             PREV_INSN (BB_HEAD (bb)));
> >>> +        {
> >>> +          /* Rather than try to keep section boundary notes incrementally
> >>> +             up-to-date through cfg layout optimizations, simply remove them
> >>> +             and flag that they should be re-inserted when exiting
> >>> +             cfg layout mode.  */
> >>> +          rtx check_insn = next_insn;
> >>> +          while (check_insn)
> >>> +            {
> >>> +              if (NOTE_P (check_insn)
> >>> +                  && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
> >>> +              {
> >>> +                had_sec_boundary_notes |= true;
> >>> +                /* Remove note from chain. Grab new next_insn first.  */
> >>> +                if (next_insn == check_insn)
> >>> +                  next_insn = NEXT_INSN (check_insn);
> >>> +                /* Delete note.  */
> >>> +                delete_insn (check_insn);
> >>> +                /* There will only be one.  */
> >>> +                break;
> >>> +              }
> >>> +              check_insn = NEXT_INSN (check_insn);
> >>> +            }
> >>> +          /* If we still have header instructions left after above loop.  */
> >>> +          if (next_insn != BB_HEAD (bb))
> >>> +            BB_HEADER (bb) = unlink_insn_chain (next_insn,
> >>> +                                                PREV_INSN (BB_HEAD (bb)));
> >>> +        }
> >>>        end = skip_insns_after_block (bb);
> >>>        if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end)
> >>>         BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end);
> >>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void)
> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
> >>>        bb->aux = bb->next_bb;
> >>>
> >>> -  cfg_layout_finalize ();
> >>> +  cfg_layout_finalize (false);
> >>>
> >>>    return 0;
> >>>  }
> >>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode)
> >>>  }
> >>>
> >>>
> >>> -/* Given a reorder chain, rearrange the code to match.  */
> >>> +/* Given a reorder chain, rearrange the code to match. If
> >>> +   this is called when we will FINALIZE_REORDER_BLOCKS, or when
> >>> +   section boundary notes were removed on entry to cfg layout
> >>> +   mode, insert section boundary notes here.  */
> >>>
> >>>  static void
> >>> -fixup_reorder_chain (void)
> >>> +fixup_reorder_chain (bool finalize_reorder_blocks)
> >>>  {
> >>>    basic_block bb;
> >>>    rtx insn = NULL;
> >>> @@ -3150,7 +3373,7 @@ static void
> >>>           PREV_INSN (BB_HEADER (bb)) = insn;
> >>>           insn = BB_HEADER (bb);
> >>>           while (NEXT_INSN (insn))
> >>> -           insn = NEXT_INSN (insn);
> >>> +            insn = NEXT_INSN (insn);
> >>>         }
> >>>        if (insn)
> >>>         NEXT_INSN (insn) = BB_HEAD (bb);
> >>> @@ -3175,6 +3398,11 @@ static void
> >>>      insn = NEXT_INSN (insn);
> >>>
> >>>    set_last_insn (insn);
> >>> +
> >>> +  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
> >>> +  if (had_sec_boundary_notes || finalize_reorder_blocks)
> >>> +    insert_section_boundary_note ();
> >>> +
> >>>  #ifdef ENABLE_CHECKING
> >>>    verify_insn_chain ();
> >>>  #endif
> >>> @@ -3187,7 +3415,7 @@ static void
> >>>        edge e_fall, e_taken, e;
> >>>        rtx bb_end_insn;
> >>>        rtx ret_label = NULL_RTX;
> >>> -      basic_block nb, src_bb;
> >>> +      basic_block nb;
> >>>        edge_iterator ei;
> >>>
> >>>        if (EDGE_COUNT (bb->succs) == 0)
> >>> @@ -3322,7 +3550,6 @@ static void
> >>>        /* We got here if we need to add a new jump insn.
> >>>          Note force_nonfallthru can delete E_FALL and thus we have to
> >>>          save E_FALL->src prior to the call to force_nonfallthru.  */
> >>> -      src_bb = e_fall->src;
> >>>        nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
> >>>        if (nb)
> >>>         {
> >>> @@ -3330,17 +3557,6 @@ static void
> >>>           bb->aux = nb;
> >>>           /* Don't process this new block.  */
> >>>           bb = nb;
> >>> -
> >>> -         /* Make sure new bb is tagged for correct section (same as
> >>> -            fall-thru source, since you cannot fall-thru across
> >>> -            section boundaries).  */
> >>> -         BB_COPY_PARTITION (src_bb, single_pred (bb));
> >>> -         if (flag_reorder_blocks_and_partition
> >>> -             && targetm_common.have_named_sections
> >>> -             && JUMP_P (BB_END (bb))
> >>> -             && !any_condjump_p (BB_END (bb))
> >>> -             && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
> >>> -           add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
> >>>         }
> >>>      }
> >>>
> >>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to)
> >>>             case NOTE_INSN_FUNCTION_BEG:
> >>>               /* There is always just single entry to function.  */
> >>>             case NOTE_INSN_BASIC_BLOCK:
> >>> +              /* We should only switch text sections once.  */
> >>> +           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
> >>>               break;
> >>>
> >>>             case NOTE_INSN_EPILOGUE_BEG:
> >>> -           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
> >>>               emit_note_copy (insn);
> >>>               break;
> >>>
> >>> @@ -3759,10 +3976,13 @@ break_superblocks (void)
> >>>  }
> >>>
> >>>  /* Finalize the changes: reorder insn list according to the sequence specified
> >>> -   by aux pointers, enter compensation code, rebuild scope forest.  */
> >>> +   by aux pointers, enter compensation code, rebuild scope forest. If
> >>> +   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
> >>> +   to fixup_reorder_chain so that it can insert the proper switch text
> >>> +   section notes.  */
> >>>
> >>>  void
> >>> -cfg_layout_finalize (void)
> >>> +cfg_layout_finalize (bool finalize_reorder_blocks)
> >>>  {
> >>>  #ifdef ENABLE_CHECKING
> >>>    verify_flow_info ();
> >>> @@ -3775,7 +3995,7 @@ void
> >>>  #endif
> >>>        )
> >>>      fixup_fallthru_exit_predecessor ();
> >>> -  fixup_reorder_chain ();
> >>> +  fixup_reorder_chain (finalize_reorder_blocks);
> >>>
> >>>    rebuild_jump_labels (get_insns ());
> >>>    delete_dead_jumptables ();
> >>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e)
> >>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
> >>>      return false;
> >>>
> >>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
> >>> -      || BB_PARTITION (src) != BB_PARTITION (target))
> >>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
> >>>      return false;
> >>>
> >>>    if (!onlyjump_p (insn)
> >>>
> >>> --
> >>> This patch is available for review at http://codereview.appspot.com/6823047
> >>
> >>
> >>
> >> --
> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Teresa Johnson Nov. 26, 2012, 8:52 p.m. UTC | #5
Sorry, I don't know what happened there. Patch is attached.
Thanks,
Teresa

On Mon, Nov 26, 2012 at 12:42 PM, Jack Howarth <howarth@bromo.med.uc.edu> wrote:
> On Mon, Nov 26, 2012 at 12:19:55PM -0800, Teresa Johnson wrote:
>> Are you sure you have all my changes applied? I applied the 4 patches
>> attached to PR55121 into my trunk checkout that has my fixes, and to a
>> pristine trunk checkout. I configured and built both for
>> --target=arm-none-linux-gnueabi, and built using your options, .i file
>> and gcda file. I can reproduce the failure using the pristine trunk
>> with your patches but not with my fixed trunk + your patches. (I just
>> updated to head to pickup recent changes and get the same result. The
>> vec changes required some manual changes to the patch, which I will
>> resend shortly.)
>
> Teresa,
>     Your mailer seems to have corrupted the posted patch with stray
> =3D characters and line breaks. Can you repost a copy as an attachment
> to the list?
>              Jack
>
>>
>> Without my fixes:
>>
>> $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreproce
>> ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
>> -fno-common -o eval.s -freorder-blocks-and-partition
>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>> 2.4.2-p1, MPC version 0.8.1
>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>> 2.4.2-p1, MPC version 0.8.1
>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>> Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a
>> eval.c: In function ‘Ge’:
>> eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560
>>  }
>>  ^
>> 0x622f71 df_compact_blocks()
>> ../../gcc_trunk_3/gcc/df-core.c:1560
>> 0x5cfcb5 compact_blocks()
>> ../../gcc_trunk_3/gcc/cfg.c:162
>> 0xc9dce0 reorder_basic_blocks
>> ../../gcc_trunk_3/gcc/bb-reorder.c:2154
>> 0xc9dce0 rest_of_handle_reorder_blocks
>> ../../gcc_trunk_3/gcc/bb-reorder.c:2219
>> Please submit a full bug report,
>> with preprocessed source if appropriate.
>> Please include the complete backtrace with any bug report.
>> See <http://gcc.gnu.org/bugs.html> for instructions.
>>
>>
>> With my fixes:
>>
>> $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreproce
>> ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
>> -fno-common -o eval.s -freorder-blocks-and-partition
>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>> 2.4.2-p1, MPC version 0.8.1
>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>> 2.4.2-p1, MPC version 0.8.1
>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>> Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3
>>
>>
>> Thanks,
>> Teresa
>>
>> On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon
>> <christophe.lyon@linaro.org> wrote:
>> > Hi,
>> >
>> > I have tested your patch on Spec2000 on ARM, and I can still see
>> > several failures caused by:
>> > "error: fallthru edge crosses section boundary", including the case
>> > described in PR55121.
>> >
>> > On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote:
>> >> Ping.
>> >> Teresa
>> >>
>> >> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote:
>> >>> Revised patch that fixes failures encountered when enabling
>> >>> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>> >>>
>> >>> This includes new verification code to ensure no cold blocks dominate hot
>> >>> blocks contributed by Steven Bosscher.
>> >>>
>> >>> I attempted to make the handling of partition updates through the optimization
>> >>> passes much more consistent, removing a number of partial fixes in the code
>> >>> stream in the process. The code to fixup partitions (including the BB_PARTITION
>> >>> assignement, region crossing jump notes, and switch text section notes) is
>> >>> now handled in a few centralized locations. For example, inside
>> >>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
>> >>> don't need to attempt the fixup themselves.
>> >>>
>> >>> For optimization passes that make adjustments to the cfg while in cfg layout
>> >>> mode that are not easy to fix up incrementally, the new routine
>> >>> fixup_partitions handles the cleanup globally. This does require calculation
>> >>> of the dominance relation, however, as far as I can tell the routines which
>> >>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
>> >>> are invoked typically once (or a small number of times in the case of
>> >>> try_optimize_cfg) per optimization pass. Additionally, I compared the
>> >>> -ftime-report output for some large fdo compilations and saw only minimal
>> >>> increases in the dominance computation times, which were only a tiny percent
>> >>> of the overall compile time.
>> >>>
>> >>> Additionally, I added a flag to the rtl_data structure to indicate whether
>> >>> any partitioning was actually performed, so that optimizations which were
>> >>> conservatively disabled whenever the flag_reorder_blocks_and_partition
>> >>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
>> >>> conservative for functions where no partitions were formed (e.g. they are
>> >>> completely hot).
>> >>>
>> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
>> >>> benchmarks and internal google benchmarks using profile feedback and
>> >>> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
>> >>>
>> >>> Thanks,
>> >>> Teresa
>> >>>
>> >>> 2012-11-14  Teresa Johnson  <tejohnson@google.com>
>> >>>             Steven Bosscher  <steven@gcc.gnu.org>
>> >>>
>> >>>         * cfghooks.h (cfg_layout_finalize): New parameter.
>> >>>         * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
>> >>>         parameter.
>> >>>         * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
>> >>>         as this is now done by redirect_edge_and_branch_force.
>> >>>         * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
>> >>>         barriers, new cfg_layout_finalize parameter, and don't store exit
>> >>>         predecessor BB until after it is potentially split.
>> >>>         * function.h (struct rtl_data): New flag has_bb_partition.
>> >>>         * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
>> >>>         * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
>> >>>         any blocks in function actually partitioned.
>> >>>         (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
>> >>>         up partitioning.
>> >>>         * bb-reorder.c (connect_traces): Only look for partitions and skip
>> >>>         block copying if any blocks in function actually partitioned.
>> >>>         (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
>> >>>         (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
>> >>>         that no cold blocks dominate a hot block.
>> >>>         (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
>> >>>         as this is now done by force_nonfallthru_and_redirect.
>> >>>         (add_reg_crossing_jump_notes): Handle the fact that some jumps may
>> >>>         already be marked with region crossing note.
>> >>>         (reorder_basic_blocks): Only need to verify partitions if any
>> >>>         blocks in function actually partitioned.
>> >>>         (insert_section_boundary_note): Only need to insert note if any
>> >>>         blocks in function actually partitioned.
>> >>>         (rest_of_handle_reorder_blocks): New cfg_layout_finalize
>> >>>         parameter, and remove call to insert_section_boundary_note as this
>> >>>         is now called via cfg_layout_finalize/fixup_reorder_chain.
>> >>>         (duplicate_computed_gotos): New cfg_layout_finalize
>> >>>         parameter.
>> >>>         (partition_hot_cold_basic_blocks): Set flag indicating function
>> >>>         has bb partitions.
>> >>>         * bb-reorder.h: Declare insert_section_boundary_note and
>> >>>         emit_barrier_after_bb, which are no longer static.
>> >>>         * basic-block.h: Declare new function fixup_partitions.
>> >>>         * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
>> >>>         check for region crossing note.
>> >>>         (fixup_partition_crossing): New function.
>> >>>         (fixup_bb_partition): Ditto.
>> >>>         (rtl_redirect_edge_and_branch): Fixup partition boundaries.
>> >>>         (force_nonfallthru_and_redirect): Fixup partition boundaries,
>> >>>         remove old code that tried to do this. Emit barrier correctly
>> >>>         when we are in cfglayout mode.
>> >>>         (rtl_split_edge): Correctly fixup partition boundaries.
>> >>>         (commit_one_edge_insertion): Remove old code that tried to
>> >>>         fixup region crossing edge since this is now handled in
>> >>>         split_block, and set up insertion point correctly since
>> >>>         block may now end in a jump.
>> >>>         (commit_edge_insertions): Invoke fixup_partitions to sanitize partition
>> >>>         boundaries after optimizations that modify cfg and before trying to
>> >>>         verify the flow info.
>> >>>         (fixup_partitions): New function.
>> >>>         (rtl_verify_flow_info_1): Add verification that no cold bbs dominate
>> >>>         hot bbs.
>> >>>         (record_effective_endpoints): Remove region-crossing notes and set flag
>> >>>         indicating that they need to be reinserted on exit from cfglayout mode.
>> >>>         (outof_cfg_layout_mode): New cfg_layout_finalize parameter.
>> >>>         (fixup_reorder_chain): Call insert_section_boundary_note if necessary.
>> >>>         Remove old code that attempted to fixup region crossing note as
>> >>>         this is now handled in force_nonfallthru_and_redirect.
>> >>>         (duplicate_insn_chain): Don't duplicate switch section notes.
>> >>>         (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
>> >>>         (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
>> >>>         note.
>> >>>
>> >>> Index: cfghooks.h
>> >>> ===================================================================
>> >>> --- cfghooks.h  (revision 193376)
>> >>> +++ cfghooks.h  (working copy)
>> >>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
>> >>>  void account_profile_record (struct profile_record *, int);
>> >>>
>> >>>  extern void cfg_layout_initialize (unsigned int);
>> >>> -extern void cfg_layout_finalize (void);
>> >>> +extern void cfg_layout_finalize (bool);
>> >>>
>> >>>  /* Hooks containers.  */
>> >>>  extern struct cfg_hooks gimple_cfg_hooks;
>> >>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi
>> >>>  extern void gimple_register_cfg_hooks (void);
>> >>>  extern struct cfg_hooks get_cfg_hooks (void);
>> >>>  extern void set_cfg_hooks (struct cfg_hooks);
>> >>> -
>> >>> Index: modulo-sched.c
>> >>> ===================================================================
>> >>> --- modulo-sched.c      (revision 193376)
>> >>> +++ modulo-sched.c      (working copy)
>> >>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void)
>> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>> >>>        bb->aux = bb->next_bb;
>> >>>    free_dominance_info (CDI_DOMINATORS);
>> >>> -  cfg_layout_finalize ();
>> >>> +  cfg_layout_finalize (false);
>> >>>  #endif /* INSN_SCHEDULING */
>> >>>    return 0;
>> >>>  }
>> >>> Index: ifcvt.c
>> >>> ===================================================================
>> >>> --- ifcvt.c     (revision 193376)
>> >>> +++ ifcvt.c     (working copy)
>> >>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
>> >>>    if (new_bb)
>> >>>      {
>> >>>        df_bb_replace (then_bb_index, new_bb);
>> >>> -      /* Since the fallthru edge was redirected from test_bb to new_bb,
>> >>> -         we need to ensure that new_bb is in the same partition as
>> >>> -         test bb (you can not fall through across section boundaries).  */
>> >>> -      BB_COPY_PARTITION (new_bb, test_bb);
>> >>> +      /* This should have been done above via force_nonfallthru_and_redirect
>> >>> +         (possibly called from redirect_edge_and_branch_force).  */
>> >>> +      gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
>> >>>      }
>> >>>
>> >>>    num_true_changes++;
>> >>> Index: function.c
>> >>> ===================================================================
>> >>> --- function.c  (revision 193376)
>> >>> +++ function.c  (working copy)
>> >>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void)
>> >>>                     break;
>> >>>                 if (e)
>> >>>                   {
>> >>> -                   copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
>> >>> -                                                 NULL_RTX, e->src);
>> >>> +                    /* Make sure we insert after any barriers.  */
>> >>> +                    rtx end = get_last_bb_insn (e->src);
>> >>> +                    copy_bb = create_basic_block (NEXT_INSN (end),
>> >>> +                                                  NULL_RTX, e->src);
>> >>>                     BB_COPY_PARTITION (copy_bb, e->src);
>> >>>                   }
>> >>>                 else
>> >>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void)
>> >>>         if (cur_bb->index >= NUM_FIXED_BLOCKS
>> >>>             && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS)
>> >>>           cur_bb->aux = cur_bb->next_bb;
>> >>> -      cfg_layout_finalize ();
>> >>> +      cfg_layout_finalize (false);
>> >>>      }
>> >>>
>> >>>  epilogue_done:
>> >>> @@ -6517,7 +6519,7 @@ epilogue_done:
>> >>>        basic_block simple_return_block_cold = NULL;
>> >>>        edge pending_edge_hot = NULL;
>> >>>        edge pending_edge_cold = NULL;
>> >>> -      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
>> >>> +      basic_block exit_pred;
>> >>>        int i;
>> >>>
>> >>>        gcc_assert (entry_edge != orig_entry_edge);
>> >>> @@ -6545,6 +6547,12 @@ epilogue_done:
>> >>>             else
>> >>>               pending_edge_cold = e;
>> >>>           }
>> >>> +
>> >>> +      /* Save a pointer to the exit's predecessor BB for use in
>> >>> +         inserting new BBs at the end of the function. Do this
>> >>> +         after the call to split_block above which may split
>> >>> +         the original exit pred.  */
>> >>> +      exit_pred = EXIT_BLOCK_PTR->prev_bb;
>> >>>
>> >>>        FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
>> >>>         {
>> >>> Index: function.h
>> >>> ===================================================================
>> >>> --- function.h  (revision 193376)
>> >>> +++ function.h  (working copy)
>> >>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data {
>> >>>       sched2) and is useful only if the port defines LEAF_REGISTERS.  */
>> >>>    bool uses_only_leaf_regs;
>> >>>
>> >>> +  /* Nonzero if the function being compiled has undergone hot/cold partitioning
>> >>> +     (under flag_reorder_blocks_and_partition) and has at least one cold
>> >>> +     block.  */
>> >>> +  bool has_bb_partition;
>> >>> +
>> >>>    /* Like regs_ever_live, but 1 if a reg is set or clobbered from an
>> >>>       asm.  Unlike regs_ever_live, elements of this array corresponding
>> >>>       to eliminable regs (like the frame pointer) are set if an asm
>> >>> Index: hw-doloop.c
>> >>> ===================================================================
>> >>> --- hw-doloop.c (revision 193376)
>> >>> +++ hw-doloop.c (working copy)
>> >>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops)
>> >>>        else
>> >>>         bb->aux = NULL;
>> >>>      }
>> >>> -  cfg_layout_finalize ();
>> >>> +  cfg_layout_finalize (false);
>> >>>    clear_aux_for_blocks ();
>> >>>    df_analyze ();
>> >>>  }
>> >>> Index: cfgcleanup.c
>> >>> ===================================================================
>> >>> --- cfgcleanup.c        (revision 193376)
>> >>> +++ cfgcleanup.c        (working copy)
>> >>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
>> >>>       partition boundaries).  See the comments at the top of
>> >>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>> >>>
>> >>> -  if (flag_reorder_blocks_and_partition && reload_completed)
>> >>> +  if (crtl->has_bb_partition && reload_completed)
>> >>>      return false;
>> >>>
>> >>>    /* Search backward through forwarder blocks.  We don't need to worry
>> >>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode)
>> >>>               df_analyze ();
>> >>>             }
>> >>>
>> >>> +         if (changed)
>> >>> +            {
>> >>> +              /* Edge forwarding in particular can cause hot blocks previously
>> >>> +                 reached by both hot and cold blocks to become dominated only
>> >>> +                 by cold blocks. This will cause the verification below to fail,
>> >>> +                 and lead to now cold code in the hot section. This is not easy
>> >>> +                 to detect and fix during edge forwarding, and in some cases
>> >>> +                 is only visible after newly unreachable blocks are deleted,
>> >>> +                 which will be done in fixup_partitions.  */
>> >>> +              fixup_partitions ();
>> >>> +
>> >>>  #ifdef ENABLE_CHECKING
>> >>> -         if (changed)
>> >>> -           verify_flow_info ();
>> >>> +              verify_flow_info ();
>> >>>  #endif
>> >>> +            }
>> >>>
>> >>>           changed_overall |= changed;
>> >>>           first_pass = false;
>> >>> Index: bb-reorder.c
>> >>> ===================================================================
>> >>> --- bb-reorder.c        (revision 193376)
>> >>> +++ bb-reorder.c        (working copy)
>> >>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces
>> >>>    current_partition = BB_PARTITION (traces[0].first);
>> >>>    two_passes = false;
>> >>>
>> >>> -  if (flag_reorder_blocks_and_partition)
>> >>> +  if (crtl->has_bb_partition)
>> >>>      for (i = 0; i < n_traces && !two_passes; i++)
>> >>>        if (BB_PARTITION (traces[0].first)
>> >>>           != BB_PARTITION (traces[i].first))
>> >>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces
>> >>>                       }
>> >>>                   }
>> >>>
>> >>> -             if (flag_reorder_blocks_and_partition)
>> >>> +             if (crtl->has_bb_partition)
>> >>>                 try_copy = false;
>> >>>
>> >>>               /* Copy tiny blocks always; copy larger blocks only when the
>> >>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void)
>> >>>    return length;
>> >>>  }
>> >>>
>> >>> -/* Emit a barrier into the footer of BB.  */
>> >>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
>> >>>
>> >>> -static void
>> >>> +void
>> >>>  emit_barrier_after_bb (basic_block bb)
>> >>>  {
>> >>>    rtx barrier = emit_barrier_after (BB_END (bb));
>> >>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>> >>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>> >>> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>> >>>  }
>> >>>
>> >>>  /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
>> >>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg
>> >>>  {
>> >>>    VEC(edge, heap) *crossing_edges = NULL;
>> >>>    basic_block bb;
>> >>> -  edge e;
>> >>> -  edge_iterator ei;
>> >>> +  edge e, e2;
>> >>> +  edge_iterator ei, ei2;
>> >>> +  unsigned int cold_bb_count = 0;
>> >>> +  VEC (basic_block, heap) *bbs_in_hot_partition = NULL;
>> >>> +  VEC (basic_block, heap) *bbs_newly_hot = NULL;
>> >>>
>> >>>    /* Mark which partition (hot/cold) each basic block belongs in.  */
>> >>>    FOR_EACH_BB (bb)
>> >>>      {
>> >>>        if (probably_never_executed_bb_p (cfun, bb))
>> >>> -       BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>> >>> +        {
>> >>> +          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>> >>> +          cold_bb_count++;
>> >>> +        }
>> >>>        else
>> >>> -       BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>> >>> +        {
>> >>> +          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>> >>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb);
>> >>> +        }
>> >>>      }
>> >>>
>> >>> +  /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of
>> >>> +     several different possibilities. One is that there are edge weight insanities
>> >>> +     due to optimization phases that do not properly update basic block profile
>> >>> +     counts. The second is that the entry of the function may not be hot, because
>> >>> +     it is entered fewer times than the number of profile training runs, but there
>> >>> +     is a loop inside the function that causes blocks within the function to be
>> >>> +     above the threshold for hotness.  */
>> >>> +  if (cold_bb_count)
>> >>> +    {
>> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>> >>> +
>> >>> +      if (dom_calculated_here)
>> >>> +        calculate_dominance_info (CDI_DOMINATORS);
>> >>> +
>> >>> +      /* Keep examining hot bbs until we have either checked them all, or
>> >>> +         re-marked all cold bbs hot.  */
>> >>> +      while (! VEC_empty (basic_block, bbs_in_hot_partition)
>> >>> +             && cold_bb_count)
>> >>> +        {
>> >>> +          basic_block dom_bb;
>> >>> +
>> >>> +          bb = VEC_pop (basic_block, bbs_in_hot_partition);
>> >>> +          dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
>> >>> +
>> >>> +          /* If bb's immediate dominator is also hot then it is ok.  */
>> >>> +          if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
>> >>> +            continue;
>> >>> +
>> >>> +          /* We have a hot bb with an immediate dominator that is cold.
>> >>> +             The dominator needs to be re-marked to hot.  */
>> >>> +          BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION);
>> >>> +          cold_bb_count--;
>> >>> +
>> >>> +          /* Now we need to examine newly-hot dom_bb to see if it is also
>> >>> +             dominated by a cold bb.  */
>> >>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb);
>> >>> +
>> >>> +          /* We should also adjust any cold blocks that the newly-hot bb
>> >>> +             feeds and see if it makes sense to re-mark those as hot as
>> >>> +             well.  */
>> >>> +          VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb);
>> >>> +          while (! VEC_empty (basic_block, bbs_newly_hot))
>> >>> +            {
>> >>> +              basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot);
>> >>> +              /* Examine all successors of this newly-hot bb to see if they
>> >>> +                 are cold and should be re-marked as hot.  */
>> >>> +              FOR_EACH_EDGE (e, ei, new_hot_bb->succs)
>> >>> +                {
>> >>> +                  bool any_cold_preds = false;
>> >>> +                  basic_block succ = e->dest;
>> >>> +                  if (BB_PARTITION (succ) != BB_COLD_PARTITION)
>> >>> +                    continue;
>> >>> +                  /* Does this block have any cold predecessors now?  */
>> >>> +                  FOR_EACH_EDGE (e2, ei2, succ->preds)
>> >>> +                  {
>> >>> +                    if (BB_PARTITION (e2->src) == BB_COLD_PARTITION)
>> >>> +                      {
>> >>> +                        any_cold_preds = true;
>> >>> +                        break;
>> >>> +                      }
>> >>> +                  }
>> >>> +                  if (any_cold_preds)
>> >>> +                    continue;
>> >>> +
>> >>> +                  /* Here we have a successor of newly-hot bb that is cold
>> >>> +                     but no longer has any cold precessessors. Since the original
>> >>> +                     assignment of our newly-hot bb was incorrect, this successor's
>> >>> +                     assignment as cold is also suspect. Go ahead and re-mark it
>> >>> +                     as hot now too. Better heuristics may be in order here.  */
>> >>> +                  BB_SET_PARTITION (succ, BB_HOT_PARTITION);
>> >>> +                  cold_bb_count--;
>> >>> +                  VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ);
>> >>> +                  /* Examine this successor as a newly-hot bb.  */
>> >>> +                  VEC_safe_push (basic_block, heap, bbs_newly_hot, succ);
>> >>> +                }
>> >>> +            }
>> >>> +        }
>> >>> +
>> >>> +      if (dom_calculated_here)
>> >>> +        free_dominance_info (CDI_DOMINATORS);
>> >>> +    }
>> >>> +
>> >>>    /* The format of .gcc_except_table does not allow landing pads to
>> >>>       be in a different partition as the throw.  Fix this by either
>> >>>       moving or duplicating the landing pads.  */
>> >>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void)
>> >>>                       new_bb->aux = cur_bb->aux;
>> >>>                       cur_bb->aux = new_bb;
>> >>>
>> >>> -                     /* Make sure new fall-through bb is in same
>> >>> -                        partition as bb it's falling through from.  */
>> >>> +                      /* This is done by force_nonfallthru_and_redirect.  */
>> >>> +                     gcc_assert (BB_PARTITION (new_bb)
>> >>> +                                  == BB_PARTITION (cur_bb));
>> >>>
>> >>> -                     BB_COPY_PARTITION (new_bb, cur_bb);
>> >>>                       single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
>> >>>                     }
>> >>>                   else
>> >>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void)
>> >>>    FOR_EACH_BB (bb)
>> >>>      FOR_EACH_EDGE (e, ei, bb->succs)
>> >>>        if ((e->flags & EDGE_CROSSING)
>> >>> -         && JUMP_P (BB_END (e->src)))
>> >>> +         && JUMP_P (BB_END (e->src))
>> >>> +          /* Some notes were added during fix_up_fall_thru_edges, via
>> >>> +             force_nonfallthru_and_redirect.  */
>> >>> +          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
>> >>>         add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>> >>>  }
>> >>>
>> >>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void)
>> >>>        dump_flow_info (dump_file, dump_flags);
>> >>>      }
>> >>>
>> >>> -  if (flag_reorder_blocks_and_partition)
>> >>> +  if (crtl->has_bb_partition)
>> >>>      verify_hot_cold_block_grouping ();
>> >>>  }
>> >>>
>> >>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void)
>> >>>     encountering this note will make the compiler switch between the
>> >>>     hot and cold text sections.  */
>> >>>
>> >>> -static void
>> >>> +void
>> >>>  insert_section_boundary_note (void)
>> >>>  {
>> >>>    basic_block bb;
>> >>>    rtx new_note;
>> >>>    int first_partition = 0;
>> >>>
>> >>> -  if (!flag_reorder_blocks_and_partition)
>> >>> +  if (!crtl->has_bb_partition)
>> >>>      return;
>> >>>
>> >>>    FOR_EACH_BB (bb)
>> >>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void)
>> >>>    FOR_EACH_BB (bb)
>> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>> >>>        bb->aux = bb->next_bb;
>> >>> -  cfg_layout_finalize ();
>> >>> +  cfg_layout_finalize (true);
>> >>>
>> >>> -  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>> >>> -  insert_section_boundary_note ();
>> >>>    return 0;
>> >>>  }
>> >>>
>> >>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void)
>> >>>      }
>> >>>
>> >>>  done:
>> >>> -  cfg_layout_finalize ();
>> >>> +  cfg_layout_finalize (false);
>> >>>
>> >>>    BITMAP_FREE (candidates);
>> >>>    return 0;
>> >>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void)
>> >>>    if (crossing_edges == NULL)
>> >>>      return 0;
>> >>>
>> >>> +  crtl->has_bb_partition = true;
>> >>> +
>> >>>    /* Make sure the source of any crossing edge ends in a jump and the
>> >>>       destination of any crossing edge has a label.  */
>> >>>    add_labels_and_missing_jumps (crossing_edges);
>> >>> Index: bb-reorder.h
>> >>> ===================================================================
>> >>> --- bb-reorder.h        (revision 193376)
>> >>> +++ bb-reorder.h        (working copy)
>> >>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re
>> >>>
>> >>>  extern int get_uncond_jump_length (void);
>> >>>
>> >>> +extern void insert_section_boundary_note (void);
>> >>> +
>> >>> +extern void emit_barrier_after_bb (basic_block bb);
>> >>> +
>> >>>  #endif
>> >>> Index: basic-block.h
>> >>> ===================================================================
>> >>> --- basic-block.h       (revision 193376)
>> >>> +++ basic-block.h       (working copy)
>> >>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect
>> >>>  extern bool contains_no_active_insn_p (const_basic_block);
>> >>>  extern bool forwarder_block_p (const_basic_block);
>> >>>  extern bool can_fallthru (basic_block, basic_block);
>> >>> +extern void fixup_partitions (void);
>> >>>
>> >>>  /* In cfgbuild.c.  */
>> >>>  extern void find_many_sub_basic_blocks (sbitmap);
>> >>> Index: cfgrtl.c
>> >>> ===================================================================
>> >>> --- cfgrtl.c    (revision 193376)
>> >>> +++ cfgrtl.c    (working copy)
>> >>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
>> >>>  #include "tree.h"
>> >>>  #include "hard-reg-set.h"
>> >>>  #include "basic-block.h"
>> >>> +#include "bb-reorder.h"
>> >>>  #include "regs.h"
>> >>>  #include "flags.h"
>> >>>  #include "function.h"
>> >>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3.  If not see
>> >>>     Only applicable if the CFG is in cfglayout mode.  */
>> >>>  static GTY(()) rtx cfg_layout_function_footer;
>> >>>  static GTY(()) rtx cfg_layout_function_header;
>> >>> +static bool had_sec_boundary_notes;
>> >>>
>> >>>  static rtx skip_insns_after_block (basic_block);
>> >>>  static void record_effective_endpoints (void);
>> >>>  static rtx label_for_bb (basic_block);
>> >>> -static void fixup_reorder_chain (void);
>> >>> +static void fixup_reorder_chain (bool finalize_reorder_blocks);
>> >>>
>> >>>  void verify_insn_chain (void);
>> >>>  static void fixup_fallthru_exit_predecessor (void);
>> >>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
>> >>>       partition boundaries).  See  the comments at the top of
>> >>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>> >>>
>> >>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>> >>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>> >>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>> >>>      return NULL;
>> >>>
>> >>>    /* We can replace or remove a complex jump only when we have exactly
>> >>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target)
>> >>>    return e;
>> >>>  }
>> >>>
>> >>> +/* Called when edge E has been redirected to a new destination,
>> >>> +   in order to update the region crossing flag on the edge and
>> >>> +   jump.  */
>> >>> +
>> >>> +static void
>> >>> +fixup_partition_crossing (edge e, basic_block target)
>> >>> +{
>> >>> +  rtx note;
>> >>> +
>> >>> +  gcc_assert (e->dest == target);
>> >>> +
>> >>> +  if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR)
>> >>> +    return;
>> >>> +  /* If we redirected an existing edge, it may already be marked
>> >>> +     crossing, even though the new src is missing a reg crossing note.
>> >>> +     But make sure reg crossing note doesn't already exist before
>> >>> +     inserting.  */
>> >>> +  if (BB_PARTITION (e->src) != BB_PARTITION (target))
>> >>> +    {
>> >>> +      e->flags |= EDGE_CROSSING;
>> >>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>> >>> +      if (JUMP_P (BB_END (e->src))
>> >>> +          && !note)
>> >>> +        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>> >>> +    }
>> >>> +  else if (BB_PARTITION (e->src) == BB_PARTITION (target))
>> >>> +    {
>> >>> +      e->flags &= ~EDGE_CROSSING;
>> >>> +      /* Remove the region crossing note from jump at end of
>> >>> +         e->src if it exists.  */
>> >>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>> >>> +      if (note)
>> >>> +        remove_note (BB_END (e->src), note);
>> >>> +    }
>> >>> +}
>> >>> +
>> >>> +/* Called when block BB has been reassigned to a different partition,
>> >>> +   to ensure that the region crossing attributes are updated.  */
>> >>> +
>> >>> +static void
>> >>> +fixup_bb_partition (basic_block bb)
>> >>> +{
>> >>> +  edge e;
>> >>> +  edge_iterator ei;
>> >>> +
>> >>> +  /* Now need to make bb's pred edges non-region crossing.  */
>> >>> +  FOR_EACH_EDGE (e, ei, bb->preds)
>> >>> +    {
>> >>> +      fixup_partition_crossing (e, e->dest);
>> >>> +    }
>> >>> +
>> >>> +  /* Possibly need to make bb's successor edges region crossing,
>> >>> +     or remove stale region crossing.  */
>> >>> +  FOR_EACH_EDGE (e, ei, bb->succs)
>> >>> +    {
>> >>> +      if ((e->flags & EDGE_FALLTHRU)
>> >>> +          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
>> >>> +          && e->dest != EXIT_BLOCK_PTR)
>> >>> +        /* force_nonfallthru_and_redirect calls fixup_partition_crossing.  */
>> >>> +        force_nonfallthru (e);
>> >>> +      else
>> >>> +        fixup_partition_crossing (e, e->dest);
>> >>> +    }
>> >>> +}
>> >>> +
>> >>>  /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
>> >>>     expense of adding new instructions or reordering basic blocks.
>> >>>
>> >>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>> >>>  {
>> >>>    edge ret;
>> >>>    basic_block src = e->src;
>> >>> +  basic_block dest = e->dest;
>> >>>
>> >>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>> >>>      return NULL;
>> >>>
>> >>> -  if (e->dest == target)
>> >>> +  if (dest == target)
>> >>>      return e;
>> >>>
>> >>>    if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
>> >>>      {
>> >>>        df_set_bb_dirty (src);
>> >>> +      fixup_partition_crossing (ret, target);
>> >>>        return ret;
>> >>>      }
>> >>>
>> >>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>> >>>      return NULL;
>> >>>
>> >>>    df_set_bb_dirty (src);
>> >>> +  fixup_partition_crossing (ret, target);
>> >>>    return ret;
>> >>>  }
>> >>>
>> >>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>> >>>        /* Make sure new block ends up in correct hot/cold section.  */
>> >>>
>> >>>        BB_COPY_PARTITION (jump_block, e->src);
>> >>> -      if (flag_reorder_blocks_and_partition
>> >>> -         && targetm_common.have_named_sections
>> >>> -         && JUMP_P (BB_END (jump_block))
>> >>> -         && !any_condjump_p (BB_END (jump_block))
>> >>> -         && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
>> >>> -       add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
>> >>>
>> >>>        /* Wire edge in.  */
>> >>>        new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
>> >>>        new_edge->probability = probability;
>> >>>        new_edge->count = count;
>> >>>
>> >>> +      /* If e->src was previously region crossing, it no longer is
>> >>> +         and the reg crossing note should be removed.  */
>> >>> +      fixup_partition_crossing (new_edge, jump_block);
>> >>> +
>> >>>        /* Redirect old edge.  */
>> >>>        redirect_edge_pred (e, jump_block);
>> >>>        e->probability = REG_BR_PROB_BASE;
>> >>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>> >>>        LABEL_NUSES (label)++;
>> >>>      }
>> >>>
>> >>> -  emit_barrier_after (BB_END (jump_block));
>> >>> +  /* We might be in cfg layout mode, and if so, the following routine will
>> >>> +     insert the barrier correctly.  */
>> >>> +  emit_barrier_after_bb (jump_block);
>> >>>    redirect_edge_succ_nodup (e, target);
>> >>>
>> >>>    if (abnormal_edge_flags)
>> >>>      make_edge (src, target, abnormal_edge_flags);
>> >>>
>> >>>    df_mark_solutions_dirty ();
>> >>> +  fixup_partition_crossing (e, target);
>> >>>    return new_bb;
>> >>>  }
>> >>>
>> >>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
>> >>>  static basic_block
>> >>>  rtl_split_edge (edge edge_in)
>> >>>  {
>> >>> -  basic_block bb;
>> >>> +  basic_block bb, new_bb;
>> >>>    rtx before;
>> >>>
>> >>>    /* Abnormal edges cannot be split.  */
>> >>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in)
>> >>>    else
>> >>>      {
>> >>>        bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
>> >>> -      /* ??? Why not edge_in->dest->prev_bb here?  */
>> >>> -      BB_COPY_PARTITION (bb, edge_in->dest);
>> >>> +      if (edge_in->src == ENTRY_BLOCK_PTR)
>> >>> +        BB_COPY_PARTITION (bb, edge_in->dest);
>> >>> +      else
>> >>> +        /* Put the split bb into the src partition, to avoid creating
>> >>> +           a situation where a cold bb dominates a hot bb, in the case
>> >>> +           where src is cold and dest is hot. The src will dominate
>> >>> +           the new bb (whereas it might not have dominated dest).  */
>> >>> +        BB_COPY_PARTITION (bb, edge_in->src);
>> >>>      }
>> >>>
>> >>>    make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);
>> >>>
>> >>> +  /* Can't allow a region crossing edge to be fallthrough.  */
>> >>> +  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
>> >>> +      && edge_in->dest != EXIT_BLOCK_PTR)
>> >>> +    {
>> >>> +      new_bb = force_nonfallthru (single_succ_edge (bb));
>> >>> +      gcc_assert (!new_bb);
>> >>> +    }
>> >>> +
>> >>>    /* For non-fallthru edges, we must adjust the predecessor's
>> >>>       jump instruction to target our new block.  */
>> >>>    if ((edge_in->flags & EDGE_FALLTHRU) == 0)
>> >>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e)
>> >>>    else
>> >>>      {
>> >>>        bb = split_edge (e);
>> >>> -      after = BB_END (bb);
>> >>>
>> >>> -      if (flag_reorder_blocks_and_partition
>> >>> -         && targetm_common.have_named_sections
>> >>> -         && e->src != ENTRY_BLOCK_PTR
>> >>> -         && BB_PARTITION (e->src) == BB_COLD_PARTITION
>> >>> -         && !(e->flags & EDGE_CROSSING)
>> >>> -         && JUMP_P (after)
>> >>> -         && !any_condjump_p (after)
>> >>> -         && (single_succ_edge (bb)->flags & EDGE_CROSSING))
>> >>> -       add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
>> >>> +      /* If e crossed a partition boundary, we needed to make bb end in
>> >>> +         a region-crossing jump, even though it was originally fallthru.  */
>> >>> +      if (JUMP_P (BB_END (bb)))
>> >>> +       before = BB_END (bb);
>> >>> +      else
>> >>> +        after = BB_END (bb);
>> >>>      }
>> >>>
>> >>>    /* Now that we've found the spot, do the insertion.  */
>> >>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void)
>> >>>  {
>> >>>    basic_block bb;
>> >>>
>> >>> +  /* Optimization passes that invoke this routine can cause hot blocks
>> >>> +     previously reached by both hot and cold blocks to become dominated only
>> >>> +     by cold blocks. This will cause the verification below to fail,
>> >>> +     and lead to now cold code in the hot section. In some cases this
>> >>> +     may only be visible after newly unreachable blocks are deleted,
>> >>> +     which will be done by fixup_partitions.  */
>> >>> +  fixup_partitions ();
>> >>> +
>> >>>  #ifdef ENABLE_CHECKING
>> >>>    verify_flow_info ();
>> >>>  #endif
>> >>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb)
>> >>>
>> >>>    return end;
>> >>>  }
>> >>> -
>> >>> +
>> >>> +/* Perform cleanup on the hot/cold bb partitioning after optimization
>> >>> +   passes that modify the cfg.  */
>> >>> +
>> >>> +void
>> >>> +fixup_partitions (void)
>> >>> +{
>> >>> +  basic_block bb;
>> >>> +
>> >>> +  if (!crtl->has_bb_partition)
>> >>> +    return;
>> >>> +
>> >>> +  /* Delete any blocks that became unreachable and weren't
>> >>> +     already cleaned up, for example during edge forwarding
>> >>> +     and convert_jumps_to_returns. This will expose more
>> >>> +     opportunities for fixing the partition boundaries here.
>> >>> +     Also, the calculation of the dominance graph during verification
>> >>> +     will assert if there are unreachable nodes.  */
>> >>> +  delete_unreachable_blocks ();
>> >>> +
>> >>> +  /* If there are partitions, do a sanity check on them: A basic block in
>> >>> +     a cold partition cannot dominate a basic block in a hot partition.
>> >>> +     Fixup any that now violate this requirement, as a result of edge
>> >>> +     forwarding and unreachable block deletion.  */
>> >>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>> >>> +  VEC (basic_block, heap) *bbs_to_fix = NULL;
>> >>> +  FOR_EACH_BB (bb)
>> >>> +    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>> >>> +      VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>> >>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>> >>> +    {
>> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>> >>> +      basic_block son;
>> >>> +
>> >>> +      if (dom_calculated_here)
>> >>> +        calculate_dominance_info (CDI_DOMINATORS);
>> >>> +
>> >>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>> >>> +        {
>> >>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>> >>> +          /* If bb is not yet cold (because it was added below as
>> >>> +             a block dominated by a cold bb) then mark it cold here.  */
>> >>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>> >>> +            {
>> >>> +              BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>> >>> +              VEC_safe_push (basic_block, heap, bbs_to_fix, bb);
>> >>> +            }
>> >>> +          /* Any blocks dominated by a block in the cold section
>> >>> +             must also be cold.  */
>> >>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>> >>> +               son;
>> >>> +               son = next_dom_son (CDI_DOMINATORS, son))
>> >>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>> >>> +        }
>> >>> +
>> >>> +      if (dom_calculated_here)
>> >>> +        free_dominance_info (CDI_DOMINATORS);
>> >>> +    }
>> >>> +
>> >>> +  /* Do the partition fixup after all necessary blocks have been converted to
>> >>> +     cold, so that we only update the region crossings the minimum number of
>> >>> +     places, which can require forcing edges to be non fallthru.  */
>> >>> +  while (! VEC_empty (basic_block, bbs_to_fix))
>> >>> +    {
>> >>> +      bb = VEC_pop (basic_block, bbs_to_fix);
>> >>> +      fixup_bb_partition (bb);
>> >>> +    }
>> >>> +}
>> >>> +
>> >>>  /* Verify the CFG and RTL consistency common for both underlying RTL and
>> >>>     cfglayout RTL.
>> >>>
>> >>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void)
>> >>>    rtx x;
>> >>>    int err = 0;
>> >>>    basic_block bb;
>> >>> +  bool have_partitions = false;
>> >>>
>> >>>    /* Check the general integrity of the basic blocks.  */
>> >>>    FOR_EACH_BB_REVERSE (bb)
>> >>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void)
>> >>>
>> >>>           if (e->flags & EDGE_ABNORMAL)
>> >>>             n_abnormal++;
>> >>> +
>> >>> +          have_partitions |= is_crossing;
>> >>>         }
>> >>>
>> >>>        if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
>> >>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void)
>> >>>           }
>> >>>      }
>> >>>
>> >>> +  /* If there are partitions, do a sanity check on them: A basic block in
>> >>> +     a cold partition cannot dominate a basic block in a hot partition.  */
>> >>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>> >>> +  if (have_partitions && !err)
>> >>> +    FOR_EACH_BB (bb)
>> >>> +      if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>> >>> +        VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>> >>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>> >>> +    {
>> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>> >>> +      basic_block son;
>> >>> +
>> >>> +      if (dom_calculated_here)
>> >>> +        calculate_dominance_info (CDI_DOMINATORS);
>> >>> +
>> >>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>> >>> +        {
>> >>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>> >>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>> >>> +            {
>> >>> +              error ("non-cold basic block %d dominated "
>> >>> +                     "by a block in the cold partition", bb->index);
>> >>> +              err = 1;
>> >>> +            }
>> >>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>> >>> +               son;
>> >>> +               son = next_dom_son (CDI_DOMINATORS, son))
>> >>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>> >>> +        }
>> >>> +
>> >>> +      if (dom_calculated_here)
>> >>> +        free_dominance_info (CDI_DOMINATORS);
>> >>> +    }
>> >>> +
>> >>>    /* Clean up.  */
>> >>>    return err;
>> >>>  }
>> >>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void)
>> >>>    else
>> >>>      cfg_layout_function_header = NULL_RTX;
>> >>>
>> >>> +  had_sec_boundary_notes = false;
>> >>> +
>> >>>    next_insn = get_insns ();
>> >>>    FOR_EACH_BB (bb)
>> >>>      {
>> >>>        rtx end;
>> >>>
>> >>>        if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb))
>> >>> -       BB_HEADER (bb) = unlink_insn_chain (next_insn,
>> >>> -                                             PREV_INSN (BB_HEAD (bb)));
>> >>> +        {
>> >>> +          /* Rather than try to keep section boundary notes incrementally
>> >>> +             up-to-date through cfg layout optimizations, simply remove them
>> >>> +             and flag that they should be re-inserted when exiting
>> >>> +             cfg layout mode.  */
>> >>> +          rtx check_insn = next_insn;
>> >>> +          while (check_insn)
>> >>> +            {
>> >>> +              if (NOTE_P (check_insn)
>> >>> +                  && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
>> >>> +              {
>> >>> +                had_sec_boundary_notes |= true;
>> >>> +                /* Remove note from chain. Grab new next_insn first.  */
>> >>> +                if (next_insn == check_insn)
>> >>> +                  next_insn = NEXT_INSN (check_insn);
>> >>> +                /* Delete note.  */
>> >>> +                delete_insn (check_insn);
>> >>> +                /* There will only be one.  */
>> >>> +                break;
>> >>> +              }
>> >>> +              check_insn = NEXT_INSN (check_insn);
>> >>> +            }
>> >>> +          /* If we still have header instructions left after above loop.  */
>> >>> +          if (next_insn != BB_HEAD (bb))
>> >>> +            BB_HEADER (bb) = unlink_insn_chain (next_insn,
>> >>> +                                                PREV_INSN (BB_HEAD (bb)));
>> >>> +        }
>> >>>        end = skip_insns_after_block (bb);
>> >>>        if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end)
>> >>>         BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end);
>> >>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void)
>> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>> >>>        bb->aux = bb->next_bb;
>> >>>
>> >>> -  cfg_layout_finalize ();
>> >>> +  cfg_layout_finalize (false);
>> >>>
>> >>>    return 0;
>> >>>  }
>> >>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode)
>> >>>  }
>> >>>
>> >>>
>> >>> -/* Given a reorder chain, rearrange the code to match.  */
>> >>> +/* Given a reorder chain, rearrange the code to match. If
>> >>> +   this is called when we will FINALIZE_REORDER_BLOCKS, or when
>> >>> +   section boundary notes were removed on entry to cfg layout
>> >>> +   mode, insert section boundary notes here.  */
>> >>>
>> >>>  static void
>> >>> -fixup_reorder_chain (void)
>> >>> +fixup_reorder_chain (bool finalize_reorder_blocks)
>> >>>  {
>> >>>    basic_block bb;
>> >>>    rtx insn = NULL;
>> >>> @@ -3150,7 +3373,7 @@ static void
>> >>>           PREV_INSN (BB_HEADER (bb)) = insn;
>> >>>           insn = BB_HEADER (bb);
>> >>>           while (NEXT_INSN (insn))
>> >>> -           insn = NEXT_INSN (insn);
>> >>> +            insn = NEXT_INSN (insn);
>> >>>         }
>> >>>        if (insn)
>> >>>         NEXT_INSN (insn) = BB_HEAD (bb);
>> >>> @@ -3175,6 +3398,11 @@ static void
>> >>>      insn = NEXT_INSN (insn);
>> >>>
>> >>>    set_last_insn (insn);
>> >>> +
>> >>> +  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>> >>> +  if (had_sec_boundary_notes || finalize_reorder_blocks)
>> >>> +    insert_section_boundary_note ();
>> >>> +
>> >>>  #ifdef ENABLE_CHECKING
>> >>>    verify_insn_chain ();
>> >>>  #endif
>> >>> @@ -3187,7 +3415,7 @@ static void
>> >>>        edge e_fall, e_taken, e;
>> >>>        rtx bb_end_insn;
>> >>>        rtx ret_label = NULL_RTX;
>> >>> -      basic_block nb, src_bb;
>> >>> +      basic_block nb;
>> >>>        edge_iterator ei;
>> >>>
>> >>>        if (EDGE_COUNT (bb->succs) == 0)
>> >>> @@ -3322,7 +3550,6 @@ static void
>> >>>        /* We got here if we need to add a new jump insn.
>> >>>          Note force_nonfallthru can delete E_FALL and thus we have to
>> >>>          save E_FALL->src prior to the call to force_nonfallthru.  */
>> >>> -      src_bb = e_fall->src;
>> >>>        nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
>> >>>        if (nb)
>> >>>         {
>> >>> @@ -3330,17 +3557,6 @@ static void
>> >>>           bb->aux = nb;
>> >>>           /* Don't process this new block.  */
>> >>>           bb = nb;
>> >>> -
>> >>> -         /* Make sure new bb is tagged for correct section (same as
>> >>> -            fall-thru source, since you cannot fall-thru across
>> >>> -            section boundaries).  */
>> >>> -         BB_COPY_PARTITION (src_bb, single_pred (bb));
>> >>> -         if (flag_reorder_blocks_and_partition
>> >>> -             && targetm_common.have_named_sections
>> >>> -             && JUMP_P (BB_END (bb))
>> >>> -             && !any_condjump_p (BB_END (bb))
>> >>> -             && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
>> >>> -           add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
>> >>>         }
>> >>>      }
>> >>>
>> >>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to)
>> >>>             case NOTE_INSN_FUNCTION_BEG:
>> >>>               /* There is always just single entry to function.  */
>> >>>             case NOTE_INSN_BASIC_BLOCK:
>> >>> +              /* We should only switch text sections once.  */
>> >>> +           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>> >>>               break;
>> >>>
>> >>>             case NOTE_INSN_EPILOGUE_BEG:
>> >>> -           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>> >>>               emit_note_copy (insn);
>> >>>               break;
>> >>>
>> >>> @@ -3759,10 +3976,13 @@ break_superblocks (void)
>> >>>  }
>> >>>
>> >>>  /* Finalize the changes: reorder insn list according to the sequence specified
>> >>> -   by aux pointers, enter compensation code, rebuild scope forest.  */
>> >>> +   by aux pointers, enter compensation code, rebuild scope forest. If
>> >>> +   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
>> >>> +   to fixup_reorder_chain so that it can insert the proper switch text
>> >>> +   section notes.  */
>> >>>
>> >>>  void
>> >>> -cfg_layout_finalize (void)
>> >>> +cfg_layout_finalize (bool finalize_reorder_blocks)
>> >>>  {
>> >>>  #ifdef ENABLE_CHECKING
>> >>>    verify_flow_info ();
>> >>> @@ -3775,7 +3995,7 @@ void
>> >>>  #endif
>> >>>        )
>> >>>      fixup_fallthru_exit_predecessor ();
>> >>> -  fixup_reorder_chain ();
>> >>> +  fixup_reorder_chain (finalize_reorder_blocks);
>> >>>
>> >>>    rebuild_jump_labels (get_insns ());
>> >>>    delete_dead_jumptables ();
>> >>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e)
>> >>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>> >>>      return false;
>> >>>
>> >>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>> >>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>> >>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>> >>>      return false;
>> >>>
>> >>>    if (!onlyjump_p (insn)
>> >>>
>> >>> --
>> >>> This patch is available for review at http://codereview.appspot.com/6823047
>> >>
>> >>
>> >>
>> >> --
>> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Christophe Lyon Nov. 28, 2012, 3:48 p.m. UTC | #6
I have updated my trunk checkout, and I can confirm that eval.c now
compiles with your patch (and the other 4 patches I added to PR55121).

Now, when looking at the whole Spec2k results:
- vpr passes now (used to fail)
- gcc, parser, perlbmk bzip2 and twolf no longer build: they all fail
with the same error from gas:
can't resolve `.text.unlikely' {.text.unlikely section} - `.LBB171'
{.text section}
- gap still does not build (same error as above)

I haven't looked in detail, so I may be missing an obvious patch here.

And I still observe runtime mis-behaviour on crafty, galgel, facerec and fma3d.

Thanks
Christophe.


On 26 November 2012 21:52, Teresa Johnson <tejohnson@google.com> wrote:
> Sorry, I don't know what happened there. Patch is attached.
> Thanks,
> Teresa
>
> On Mon, Nov 26, 2012 at 12:42 PM, Jack Howarth <howarth@bromo.med.uc.edu> wrote:
>> On Mon, Nov 26, 2012 at 12:19:55PM -0800, Teresa Johnson wrote:
>>> Are you sure you have all my changes applied? I applied the 4 patches
>>> attached to PR55121 into my trunk checkout that has my fixes, and to a
>>> pristine trunk checkout. I configured and built both for
>>> --target=arm-none-linux-gnueabi, and built using your options, .i file
>>> and gcda file. I can reproduce the failure using the pristine trunk
>>> with your patches but not with my fixed trunk + your patches. (I just
>>> updated to head to pickup recent changes and get the same result. The
>>> vec changes required some manual changes to the patch, which I will
>>> resend shortly.)
>>
>> Teresa,
>>     Your mailer seems to have corrupted the posted patch with stray
>> =3D characters and line breaks. Can you repost a copy as an attachment
>> to the list?
>>              Jack
>>
>>>
>>> Without my fixes:
>>>
>>> $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreproce
>>> ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
>>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
>>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
>>> -fno-common -o eval.s -freorder-blocks-and-partition
>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>> 2.4.2-p1, MPC version 0.8.1
>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>> 2.4.2-p1, MPC version 0.8.1
>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>> Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a
>>> eval.c: In function ‘Ge’:
>>> eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560
>>>  }
>>>  ^
>>> 0x622f71 df_compact_blocks()
>>> ../../gcc_trunk_3/gcc/df-core.c:1560
>>> 0x5cfcb5 compact_blocks()
>>> ../../gcc_trunk_3/gcc/cfg.c:162
>>> 0xc9dce0 reorder_basic_blocks
>>> ../../gcc_trunk_3/gcc/bb-reorder.c:2154
>>> 0xc9dce0 rest_of_handle_reorder_blocks
>>> ../../gcc_trunk_3/gcc/bb-reorder.c:2219
>>> Please submit a full bug report,
>>> with preprocessed source if appropriate.
>>> Please include the complete backtrace with any bug report.
>>> See <http://gcc.gnu.org/bugs.html> for instructions.
>>>
>>>
>>> With my fixes:
>>>
>>> $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreproce
>>> ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
>>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
>>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
>>> -fno-common -o eval.s -freorder-blocks-and-partition
>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>> 2.4.2-p1, MPC version 0.8.1
>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>> 2.4.2-p1, MPC version 0.8.1
>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>> Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3
>>>
>>>
>>> Thanks,
>>> Teresa
>>>
>>> On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon
>>> <christophe.lyon@linaro.org> wrote:
>>> > Hi,
>>> >
>>> > I have tested your patch on Spec2000 on ARM, and I can still see
>>> > several failures caused by:
>>> > "error: fallthru edge crosses section boundary", including the case
>>> > described in PR55121.
>>> >
>>> > On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote:
>>> >> Ping.
>>> >> Teresa
>>> >>
>>> >> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>> >>> Revised patch that fixes failures encountered when enabling
>>> >>> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>>> >>>
>>> >>> This includes new verification code to ensure no cold blocks dominate hot
>>> >>> blocks contributed by Steven Bosscher.
>>> >>>
>>> >>> I attempted to make the handling of partition updates through the optimization
>>> >>> passes much more consistent, removing a number of partial fixes in the code
>>> >>> stream in the process. The code to fixup partitions (including the BB_PARTITION
>>> >>> assignement, region crossing jump notes, and switch text section notes) is
>>> >>> now handled in a few centralized locations. For example, inside
>>> >>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
>>> >>> don't need to attempt the fixup themselves.
>>> >>>
>>> >>> For optimization passes that make adjustments to the cfg while in cfg layout
>>> >>> mode that are not easy to fix up incrementally, the new routine
>>> >>> fixup_partitions handles the cleanup globally. This does require calculation
>>> >>> of the dominance relation, however, as far as I can tell the routines which
>>> >>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
>>> >>> are invoked typically once (or a small number of times in the case of
>>> >>> try_optimize_cfg) per optimization pass. Additionally, I compared the
>>> >>> -ftime-report output for some large fdo compilations and saw only minimal
>>> >>> increases in the dominance computation times, which were only a tiny percent
>>> >>> of the overall compile time.
>>> >>>
>>> >>> Additionally, I added a flag to the rtl_data structure to indicate whether
>>> >>> any partitioning was actually performed, so that optimizations which were
>>> >>> conservatively disabled whenever the flag_reorder_blocks_and_partition
>>> >>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
>>> >>> conservative for functions where no partitions were formed (e.g. they are
>>> >>> completely hot).
>>> >>>
>>> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
>>> >>> benchmarks and internal google benchmarks using profile feedback and
>>> >>> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
>>> >>>
>>> >>> Thanks,
>>> >>> Teresa
>>> >>>
>>> >>> 2012-11-14  Teresa Johnson  <tejohnson@google.com>
>>> >>>             Steven Bosscher  <steven@gcc.gnu.org>
>>> >>>
>>> >>>         * cfghooks.h (cfg_layout_finalize): New parameter.
>>> >>>         * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
>>> >>>         parameter.
>>> >>>         * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
>>> >>>         as this is now done by redirect_edge_and_branch_force.
>>> >>>         * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
>>> >>>         barriers, new cfg_layout_finalize parameter, and don't store exit
>>> >>>         predecessor BB until after it is potentially split.
>>> >>>         * function.h (struct rtl_data): New flag has_bb_partition.
>>> >>>         * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
>>> >>>         * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
>>> >>>         any blocks in function actually partitioned.
>>> >>>         (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
>>> >>>         up partitioning.
>>> >>>         * bb-reorder.c (connect_traces): Only look for partitions and skip
>>> >>>         block copying if any blocks in function actually partitioned.
>>> >>>         (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
>>> >>>         (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
>>> >>>         that no cold blocks dominate a hot block.
>>> >>>         (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
>>> >>>         as this is now done by force_nonfallthru_and_redirect.
>>> >>>         (add_reg_crossing_jump_notes): Handle the fact that some jumps may
>>> >>>         already be marked with region crossing note.
>>> >>>         (reorder_basic_blocks): Only need to verify partitions if any
>>> >>>         blocks in function actually partitioned.
>>> >>>         (insert_section_boundary_note): Only need to insert note if any
>>> >>>         blocks in function actually partitioned.
>>> >>>         (rest_of_handle_reorder_blocks): New cfg_layout_finalize
>>> >>>         parameter, and remove call to insert_section_boundary_note as this
>>> >>>         is now called via cfg_layout_finalize/fixup_reorder_chain.
>>> >>>         (duplicate_computed_gotos): New cfg_layout_finalize
>>> >>>         parameter.
>>> >>>         (partition_hot_cold_basic_blocks): Set flag indicating function
>>> >>>         has bb partitions.
>>> >>>         * bb-reorder.h: Declare insert_section_boundary_note and
>>> >>>         emit_barrier_after_bb, which are no longer static.
>>> >>>         * basic-block.h: Declare new function fixup_partitions.
>>> >>>         * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
>>> >>>         check for region crossing note.
>>> >>>         (fixup_partition_crossing): New function.
>>> >>>         (fixup_bb_partition): Ditto.
>>> >>>         (rtl_redirect_edge_and_branch): Fixup partition boundaries.
>>> >>>         (force_nonfallthru_and_redirect): Fixup partition boundaries,
>>> >>>         remove old code that tried to do this. Emit barrier correctly
>>> >>>         when we are in cfglayout mode.
>>> >>>         (rtl_split_edge): Correctly fixup partition boundaries.
>>> >>>         (commit_one_edge_insertion): Remove old code that tried to
>>> >>>         fixup region crossing edge since this is now handled in
>>> >>>         split_block, and set up insertion point correctly since
>>> >>>         block may now end in a jump.
>>> >>>         (commit_edge_insertions): Invoke fixup_partitions to sanitize partition
>>> >>>         boundaries after optimizations that modify cfg and before trying to
>>> >>>         verify the flow info.
>>> >>>         (fixup_partitions): New function.
>>> >>>         (rtl_verify_flow_info_1): Add verification that no cold bbs dominate
>>> >>>         hot bbs.
>>> >>>         (record_effective_endpoints): Remove region-crossing notes and set flag
>>> >>>         indicating that they need to be reinserted on exit from cfglayout mode.
>>> >>>         (outof_cfg_layout_mode): New cfg_layout_finalize parameter.
>>> >>>         (fixup_reorder_chain): Call insert_section_boundary_note if necessary.
>>> >>>         Remove old code that attempted to fixup region crossing note as
>>> >>>         this is now handled in force_nonfallthru_and_redirect.
>>> >>>         (duplicate_insn_chain): Don't duplicate switch section notes.
>>> >>>         (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
>>> >>>         (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
>>> >>>         note.
>>> >>>
>>> >>> Index: cfghooks.h
>>> >>> ===================================================================
>>> >>> --- cfghooks.h  (revision 193376)
>>> >>> +++ cfghooks.h  (working copy)
>>> >>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
>>> >>>  void account_profile_record (struct profile_record *, int);
>>> >>>
>>> >>>  extern void cfg_layout_initialize (unsigned int);
>>> >>> -extern void cfg_layout_finalize (void);
>>> >>> +extern void cfg_layout_finalize (bool);
>>> >>>
>>> >>>  /* Hooks containers.  */
>>> >>>  extern struct cfg_hooks gimple_cfg_hooks;
>>> >>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi
>>> >>>  extern void gimple_register_cfg_hooks (void);
>>> >>>  extern struct cfg_hooks get_cfg_hooks (void);
>>> >>>  extern void set_cfg_hooks (struct cfg_hooks);
>>> >>> -
>>> >>> Index: modulo-sched.c
>>> >>> ===================================================================
>>> >>> --- modulo-sched.c      (revision 193376)
>>> >>> +++ modulo-sched.c      (working copy)
>>> >>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void)
>>> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>> >>>        bb->aux = bb->next_bb;
>>> >>>    free_dominance_info (CDI_DOMINATORS);
>>> >>> -  cfg_layout_finalize ();
>>> >>> +  cfg_layout_finalize (false);
>>> >>>  #endif /* INSN_SCHEDULING */
>>> >>>    return 0;
>>> >>>  }
>>> >>> Index: ifcvt.c
>>> >>> ===================================================================
>>> >>> --- ifcvt.c     (revision 193376)
>>> >>> +++ ifcvt.c     (working copy)
>>> >>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
>>> >>>    if (new_bb)
>>> >>>      {
>>> >>>        df_bb_replace (then_bb_index, new_bb);
>>> >>> -      /* Since the fallthru edge was redirected from test_bb to new_bb,
>>> >>> -         we need to ensure that new_bb is in the same partition as
>>> >>> -         test bb (you can not fall through across section boundaries).  */
>>> >>> -      BB_COPY_PARTITION (new_bb, test_bb);
>>> >>> +      /* This should have been done above via force_nonfallthru_and_redirect
>>> >>> +         (possibly called from redirect_edge_and_branch_force).  */
>>> >>> +      gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
>>> >>>      }
>>> >>>
>>> >>>    num_true_changes++;
>>> >>> Index: function.c
>>> >>> ===================================================================
>>> >>> --- function.c  (revision 193376)
>>> >>> +++ function.c  (working copy)
>>> >>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void)
>>> >>>                     break;
>>> >>>                 if (e)
>>> >>>                   {
>>> >>> -                   copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
>>> >>> -                                                 NULL_RTX, e->src);
>>> >>> +                    /* Make sure we insert after any barriers.  */
>>> >>> +                    rtx end = get_last_bb_insn (e->src);
>>> >>> +                    copy_bb = create_basic_block (NEXT_INSN (end),
>>> >>> +                                                  NULL_RTX, e->src);
>>> >>>                     BB_COPY_PARTITION (copy_bb, e->src);
>>> >>>                   }
>>> >>>                 else
>>> >>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void)
>>> >>>         if (cur_bb->index >= NUM_FIXED_BLOCKS
>>> >>>             && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS)
>>> >>>           cur_bb->aux = cur_bb->next_bb;
>>> >>> -      cfg_layout_finalize ();
>>> >>> +      cfg_layout_finalize (false);
>>> >>>      }
>>> >>>
>>> >>>  epilogue_done:
>>> >>> @@ -6517,7 +6519,7 @@ epilogue_done:
>>> >>>        basic_block simple_return_block_cold = NULL;
>>> >>>        edge pending_edge_hot = NULL;
>>> >>>        edge pending_edge_cold = NULL;
>>> >>> -      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>> >>> +      basic_block exit_pred;
>>> >>>        int i;
>>> >>>
>>> >>>        gcc_assert (entry_edge != orig_entry_edge);
>>> >>> @@ -6545,6 +6547,12 @@ epilogue_done:
>>> >>>             else
>>> >>>               pending_edge_cold = e;
>>> >>>           }
>>> >>> +
>>> >>> +      /* Save a pointer to the exit's predecessor BB for use in
>>> >>> +         inserting new BBs at the end of the function. Do this
>>> >>> +         after the call to split_block above which may split
>>> >>> +         the original exit pred.  */
>>> >>> +      exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>> >>>
>>> >>>        FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
>>> >>>         {
>>> >>> Index: function.h
>>> >>> ===================================================================
>>> >>> --- function.h  (revision 193376)
>>> >>> +++ function.h  (working copy)
>>> >>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data {
>>> >>>       sched2) and is useful only if the port defines LEAF_REGISTERS.  */
>>> >>>    bool uses_only_leaf_regs;
>>> >>>
>>> >>> +  /* Nonzero if the function being compiled has undergone hot/cold partitioning
>>> >>> +     (under flag_reorder_blocks_and_partition) and has at least one cold
>>> >>> +     block.  */
>>> >>> +  bool has_bb_partition;
>>> >>> +
>>> >>>    /* Like regs_ever_live, but 1 if a reg is set or clobbered from an
>>> >>>       asm.  Unlike regs_ever_live, elements of this array corresponding
>>> >>>       to eliminable regs (like the frame pointer) are set if an asm
>>> >>> Index: hw-doloop.c
>>> >>> ===================================================================
>>> >>> --- hw-doloop.c (revision 193376)
>>> >>> +++ hw-doloop.c (working copy)
>>> >>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops)
>>> >>>        else
>>> >>>         bb->aux = NULL;
>>> >>>      }
>>> >>> -  cfg_layout_finalize ();
>>> >>> +  cfg_layout_finalize (false);
>>> >>>    clear_aux_for_blocks ();
>>> >>>    df_analyze ();
>>> >>>  }
>>> >>> Index: cfgcleanup.c
>>> >>> ===================================================================
>>> >>> --- cfgcleanup.c        (revision 193376)
>>> >>> +++ cfgcleanup.c        (working copy)
>>> >>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
>>> >>>       partition boundaries).  See the comments at the top of
>>> >>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>> >>>
>>> >>> -  if (flag_reorder_blocks_and_partition && reload_completed)
>>> >>> +  if (crtl->has_bb_partition && reload_completed)
>>> >>>      return false;
>>> >>>
>>> >>>    /* Search backward through forwarder blocks.  We don't need to worry
>>> >>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode)
>>> >>>               df_analyze ();
>>> >>>             }
>>> >>>
>>> >>> +         if (changed)
>>> >>> +            {
>>> >>> +              /* Edge forwarding in particular can cause hot blocks previously
>>> >>> +                 reached by both hot and cold blocks to become dominated only
>>> >>> +                 by cold blocks. This will cause the verification below to fail,
>>> >>> +                 and lead to now cold code in the hot section. This is not easy
>>> >>> +                 to detect and fix during edge forwarding, and in some cases
>>> >>> +                 is only visible after newly unreachable blocks are deleted,
>>> >>> +                 which will be done in fixup_partitions.  */
>>> >>> +              fixup_partitions ();
>>> >>> +
>>> >>>  #ifdef ENABLE_CHECKING
>>> >>> -         if (changed)
>>> >>> -           verify_flow_info ();
>>> >>> +              verify_flow_info ();
>>> >>>  #endif
>>> >>> +            }
>>> >>>
>>> >>>           changed_overall |= changed;
>>> >>>           first_pass = false;
>>> >>> Index: bb-reorder.c
>>> >>> ===================================================================
>>> >>> --- bb-reorder.c        (revision 193376)
>>> >>> +++ bb-reorder.c        (working copy)
>>> >>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces
>>> >>>    current_partition = BB_PARTITION (traces[0].first);
>>> >>>    two_passes = false;
>>> >>>
>>> >>> -  if (flag_reorder_blocks_and_partition)
>>> >>> +  if (crtl->has_bb_partition)
>>> >>>      for (i = 0; i < n_traces && !two_passes; i++)
>>> >>>        if (BB_PARTITION (traces[0].first)
>>> >>>           != BB_PARTITION (traces[i].first))
>>> >>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces
>>> >>>                       }
>>> >>>                   }
>>> >>>
>>> >>> -             if (flag_reorder_blocks_and_partition)
>>> >>> +             if (crtl->has_bb_partition)
>>> >>>                 try_copy = false;
>>> >>>
>>> >>>               /* Copy tiny blocks always; copy larger blocks only when the
>>> >>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void)
>>> >>>    return length;
>>> >>>  }
>>> >>>
>>> >>> -/* Emit a barrier into the footer of BB.  */
>>> >>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
>>> >>>
>>> >>> -static void
>>> >>> +void
>>> >>>  emit_barrier_after_bb (basic_block bb)
>>> >>>  {
>>> >>>    rtx barrier = emit_barrier_after (BB_END (bb));
>>> >>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>> >>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>>> >>> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>> >>>  }
>>> >>>
>>> >>>  /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
>>> >>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg
>>> >>>  {
>>> >>>    VEC(edge, heap) *crossing_edges = NULL;
>>> >>>    basic_block bb;
>>> >>> -  edge e;
>>> >>> -  edge_iterator ei;
>>> >>> +  edge e, e2;
>>> >>> +  edge_iterator ei, ei2;
>>> >>> +  unsigned int cold_bb_count = 0;
>>> >>> +  VEC (basic_block, heap) *bbs_in_hot_partition = NULL;
>>> >>> +  VEC (basic_block, heap) *bbs_newly_hot = NULL;
>>> >>>
>>> >>>    /* Mark which partition (hot/cold) each basic block belongs in.  */
>>> >>>    FOR_EACH_BB (bb)
>>> >>>      {
>>> >>>        if (probably_never_executed_bb_p (cfun, bb))
>>> >>> -       BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>> >>> +        {
>>> >>> +          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>> >>> +          cold_bb_count++;
>>> >>> +        }
>>> >>>        else
>>> >>> -       BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>>> >>> +        {
>>> >>> +          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>>> >>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb);
>>> >>> +        }
>>> >>>      }
>>> >>>
>>> >>> +  /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of
>>> >>> +     several different possibilities. One is that there are edge weight insanities
>>> >>> +     due to optimization phases that do not properly update basic block profile
>>> >>> +     counts. The second is that the entry of the function may not be hot, because
>>> >>> +     it is entered fewer times than the number of profile training runs, but there
>>> >>> +     is a loop inside the function that causes blocks within the function to be
>>> >>> +     above the threshold for hotness.  */
>>> >>> +  if (cold_bb_count)
>>> >>> +    {
>>> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>> >>> +
>>> >>> +      if (dom_calculated_here)
>>> >>> +        calculate_dominance_info (CDI_DOMINATORS);
>>> >>> +
>>> >>> +      /* Keep examining hot bbs until we have either checked them all, or
>>> >>> +         re-marked all cold bbs hot.  */
>>> >>> +      while (! VEC_empty (basic_block, bbs_in_hot_partition)
>>> >>> +             && cold_bb_count)
>>> >>> +        {
>>> >>> +          basic_block dom_bb;
>>> >>> +
>>> >>> +          bb = VEC_pop (basic_block, bbs_in_hot_partition);
>>> >>> +          dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
>>> >>> +
>>> >>> +          /* If bb's immediate dominator is also hot then it is ok.  */
>>> >>> +          if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
>>> >>> +            continue;
>>> >>> +
>>> >>> +          /* We have a hot bb with an immediate dominator that is cold.
>>> >>> +             The dominator needs to be re-marked to hot.  */
>>> >>> +          BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION);
>>> >>> +          cold_bb_count--;
>>> >>> +
>>> >>> +          /* Now we need to examine newly-hot dom_bb to see if it is also
>>> >>> +             dominated by a cold bb.  */
>>> >>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb);
>>> >>> +
>>> >>> +          /* We should also adjust any cold blocks that the newly-hot bb
>>> >>> +             feeds and see if it makes sense to re-mark those as hot as
>>> >>> +             well.  */
>>> >>> +          VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb);
>>> >>> +          while (! VEC_empty (basic_block, bbs_newly_hot))
>>> >>> +            {
>>> >>> +              basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot);
>>> >>> +              /* Examine all successors of this newly-hot bb to see if they
>>> >>> +                 are cold and should be re-marked as hot.  */
>>> >>> +              FOR_EACH_EDGE (e, ei, new_hot_bb->succs)
>>> >>> +                {
>>> >>> +                  bool any_cold_preds = false;
>>> >>> +                  basic_block succ = e->dest;
>>> >>> +                  if (BB_PARTITION (succ) != BB_COLD_PARTITION)
>>> >>> +                    continue;
>>> >>> +                  /* Does this block have any cold predecessors now?  */
>>> >>> +                  FOR_EACH_EDGE (e2, ei2, succ->preds)
>>> >>> +                  {
>>> >>> +                    if (BB_PARTITION (e2->src) == BB_COLD_PARTITION)
>>> >>> +                      {
>>> >>> +                        any_cold_preds = true;
>>> >>> +                        break;
>>> >>> +                      }
>>> >>> +                  }
>>> >>> +                  if (any_cold_preds)
>>> >>> +                    continue;
>>> >>> +
>>> >>> +                  /* Here we have a successor of newly-hot bb that is cold
>>> >>> +                     but no longer has any cold precessessors. Since the original
>>> >>> +                     assignment of our newly-hot bb was incorrect, this successor's
>>> >>> +                     assignment as cold is also suspect. Go ahead and re-mark it
>>> >>> +                     as hot now too. Better heuristics may be in order here.  */
>>> >>> +                  BB_SET_PARTITION (succ, BB_HOT_PARTITION);
>>> >>> +                  cold_bb_count--;
>>> >>> +                  VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ);
>>> >>> +                  /* Examine this successor as a newly-hot bb.  */
>>> >>> +                  VEC_safe_push (basic_block, heap, bbs_newly_hot, succ);
>>> >>> +                }
>>> >>> +            }
>>> >>> +        }
>>> >>> +
>>> >>> +      if (dom_calculated_here)
>>> >>> +        free_dominance_info (CDI_DOMINATORS);
>>> >>> +    }
>>> >>> +
>>> >>>    /* The format of .gcc_except_table does not allow landing pads to
>>> >>>       be in a different partition as the throw.  Fix this by either
>>> >>>       moving or duplicating the landing pads.  */
>>> >>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void)
>>> >>>                       new_bb->aux = cur_bb->aux;
>>> >>>                       cur_bb->aux = new_bb;
>>> >>>
>>> >>> -                     /* Make sure new fall-through bb is in same
>>> >>> -                        partition as bb it's falling through from.  */
>>> >>> +                      /* This is done by force_nonfallthru_and_redirect.  */
>>> >>> +                     gcc_assert (BB_PARTITION (new_bb)
>>> >>> +                                  == BB_PARTITION (cur_bb));
>>> >>>
>>> >>> -                     BB_COPY_PARTITION (new_bb, cur_bb);
>>> >>>                       single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
>>> >>>                     }
>>> >>>                   else
>>> >>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void)
>>> >>>    FOR_EACH_BB (bb)
>>> >>>      FOR_EACH_EDGE (e, ei, bb->succs)
>>> >>>        if ((e->flags & EDGE_CROSSING)
>>> >>> -         && JUMP_P (BB_END (e->src)))
>>> >>> +         && JUMP_P (BB_END (e->src))
>>> >>> +          /* Some notes were added during fix_up_fall_thru_edges, via
>>> >>> +             force_nonfallthru_and_redirect.  */
>>> >>> +          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
>>> >>>         add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>> >>>  }
>>> >>>
>>> >>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void)
>>> >>>        dump_flow_info (dump_file, dump_flags);
>>> >>>      }
>>> >>>
>>> >>> -  if (flag_reorder_blocks_and_partition)
>>> >>> +  if (crtl->has_bb_partition)
>>> >>>      verify_hot_cold_block_grouping ();
>>> >>>  }
>>> >>>
>>> >>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void)
>>> >>>     encountering this note will make the compiler switch between the
>>> >>>     hot and cold text sections.  */
>>> >>>
>>> >>> -static void
>>> >>> +void
>>> >>>  insert_section_boundary_note (void)
>>> >>>  {
>>> >>>    basic_block bb;
>>> >>>    rtx new_note;
>>> >>>    int first_partition = 0;
>>> >>>
>>> >>> -  if (!flag_reorder_blocks_and_partition)
>>> >>> +  if (!crtl->has_bb_partition)
>>> >>>      return;
>>> >>>
>>> >>>    FOR_EACH_BB (bb)
>>> >>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void)
>>> >>>    FOR_EACH_BB (bb)
>>> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>> >>>        bb->aux = bb->next_bb;
>>> >>> -  cfg_layout_finalize ();
>>> >>> +  cfg_layout_finalize (true);
>>> >>>
>>> >>> -  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>>> >>> -  insert_section_boundary_note ();
>>> >>>    return 0;
>>> >>>  }
>>> >>>
>>> >>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void)
>>> >>>      }
>>> >>>
>>> >>>  done:
>>> >>> -  cfg_layout_finalize ();
>>> >>> +  cfg_layout_finalize (false);
>>> >>>
>>> >>>    BITMAP_FREE (candidates);
>>> >>>    return 0;
>>> >>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void)
>>> >>>    if (crossing_edges == NULL)
>>> >>>      return 0;
>>> >>>
>>> >>> +  crtl->has_bb_partition = true;
>>> >>> +
>>> >>>    /* Make sure the source of any crossing edge ends in a jump and the
>>> >>>       destination of any crossing edge has a label.  */
>>> >>>    add_labels_and_missing_jumps (crossing_edges);
>>> >>> Index: bb-reorder.h
>>> >>> ===================================================================
>>> >>> --- bb-reorder.h        (revision 193376)
>>> >>> +++ bb-reorder.h        (working copy)
>>> >>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re
>>> >>>
>>> >>>  extern int get_uncond_jump_length (void);
>>> >>>
>>> >>> +extern void insert_section_boundary_note (void);
>>> >>> +
>>> >>> +extern void emit_barrier_after_bb (basic_block bb);
>>> >>> +
>>> >>>  #endif
>>> >>> Index: basic-block.h
>>> >>> ===================================================================
>>> >>> --- basic-block.h       (revision 193376)
>>> >>> +++ basic-block.h       (working copy)
>>> >>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect
>>> >>>  extern bool contains_no_active_insn_p (const_basic_block);
>>> >>>  extern bool forwarder_block_p (const_basic_block);
>>> >>>  extern bool can_fallthru (basic_block, basic_block);
>>> >>> +extern void fixup_partitions (void);
>>> >>>
>>> >>>  /* In cfgbuild.c.  */
>>> >>>  extern void find_many_sub_basic_blocks (sbitmap);
>>> >>> Index: cfgrtl.c
>>> >>> ===================================================================
>>> >>> --- cfgrtl.c    (revision 193376)
>>> >>> +++ cfgrtl.c    (working copy)
>>> >>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
>>> >>>  #include "tree.h"
>>> >>>  #include "hard-reg-set.h"
>>> >>>  #include "basic-block.h"
>>> >>> +#include "bb-reorder.h"
>>> >>>  #include "regs.h"
>>> >>>  #include "flags.h"
>>> >>>  #include "function.h"
>>> >>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3.  If not see
>>> >>>     Only applicable if the CFG is in cfglayout mode.  */
>>> >>>  static GTY(()) rtx cfg_layout_function_footer;
>>> >>>  static GTY(()) rtx cfg_layout_function_header;
>>> >>> +static bool had_sec_boundary_notes;
>>> >>>
>>> >>>  static rtx skip_insns_after_block (basic_block);
>>> >>>  static void record_effective_endpoints (void);
>>> >>>  static rtx label_for_bb (basic_block);
>>> >>> -static void fixup_reorder_chain (void);
>>> >>> +static void fixup_reorder_chain (bool finalize_reorder_blocks);
>>> >>>
>>> >>>  void verify_insn_chain (void);
>>> >>>  static void fixup_fallthru_exit_predecessor (void);
>>> >>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
>>> >>>       partition boundaries).  See  the comments at the top of
>>> >>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>> >>>
>>> >>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>>> >>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>>> >>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>> >>>      return NULL;
>>> >>>
>>> >>>    /* We can replace or remove a complex jump only when we have exactly
>>> >>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target)
>>> >>>    return e;
>>> >>>  }
>>> >>>
>>> >>> +/* Called when edge E has been redirected to a new destination,
>>> >>> +   in order to update the region crossing flag on the edge and
>>> >>> +   jump.  */
>>> >>> +
>>> >>> +static void
>>> >>> +fixup_partition_crossing (edge e, basic_block target)
>>> >>> +{
>>> >>> +  rtx note;
>>> >>> +
>>> >>> +  gcc_assert (e->dest == target);
>>> >>> +
>>> >>> +  if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR)
>>> >>> +    return;
>>> >>> +  /* If we redirected an existing edge, it may already be marked
>>> >>> +     crossing, even though the new src is missing a reg crossing note.
>>> >>> +     But make sure reg crossing note doesn't already exist before
>>> >>> +     inserting.  */
>>> >>> +  if (BB_PARTITION (e->src) != BB_PARTITION (target))
>>> >>> +    {
>>> >>> +      e->flags |= EDGE_CROSSING;
>>> >>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>> >>> +      if (JUMP_P (BB_END (e->src))
>>> >>> +          && !note)
>>> >>> +        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>> >>> +    }
>>> >>> +  else if (BB_PARTITION (e->src) == BB_PARTITION (target))
>>> >>> +    {
>>> >>> +      e->flags &= ~EDGE_CROSSING;
>>> >>> +      /* Remove the region crossing note from jump at end of
>>> >>> +         e->src if it exists.  */
>>> >>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>> >>> +      if (note)
>>> >>> +        remove_note (BB_END (e->src), note);
>>> >>> +    }
>>> >>> +}
>>> >>> +
>>> >>> +/* Called when block BB has been reassigned to a different partition,
>>> >>> +   to ensure that the region crossing attributes are updated.  */
>>> >>> +
>>> >>> +static void
>>> >>> +fixup_bb_partition (basic_block bb)
>>> >>> +{
>>> >>> +  edge e;
>>> >>> +  edge_iterator ei;
>>> >>> +
>>> >>> +  /* Now need to make bb's pred edges non-region crossing.  */
>>> >>> +  FOR_EACH_EDGE (e, ei, bb->preds)
>>> >>> +    {
>>> >>> +      fixup_partition_crossing (e, e->dest);
>>> >>> +    }
>>> >>> +
>>> >>> +  /* Possibly need to make bb's successor edges region crossing,
>>> >>> +     or remove stale region crossing.  */
>>> >>> +  FOR_EACH_EDGE (e, ei, bb->succs)
>>> >>> +    {
>>> >>> +      if ((e->flags & EDGE_FALLTHRU)
>>> >>> +          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
>>> >>> +          && e->dest != EXIT_BLOCK_PTR)
>>> >>> +        /* force_nonfallthru_and_redirect calls fixup_partition_crossing.  */
>>> >>> +        force_nonfallthru (e);
>>> >>> +      else
>>> >>> +        fixup_partition_crossing (e, e->dest);
>>> >>> +    }
>>> >>> +}
>>> >>> +
>>> >>>  /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
>>> >>>     expense of adding new instructions or reordering basic blocks.
>>> >>>
>>> >>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>> >>>  {
>>> >>>    edge ret;
>>> >>>    basic_block src = e->src;
>>> >>> +  basic_block dest = e->dest;
>>> >>>
>>> >>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>> >>>      return NULL;
>>> >>>
>>> >>> -  if (e->dest == target)
>>> >>> +  if (dest == target)
>>> >>>      return e;
>>> >>>
>>> >>>    if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
>>> >>>      {
>>> >>>        df_set_bb_dirty (src);
>>> >>> +      fixup_partition_crossing (ret, target);
>>> >>>        return ret;
>>> >>>      }
>>> >>>
>>> >>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>> >>>      return NULL;
>>> >>>
>>> >>>    df_set_bb_dirty (src);
>>> >>> +  fixup_partition_crossing (ret, target);
>>> >>>    return ret;
>>> >>>  }
>>> >>>
>>> >>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>> >>>        /* Make sure new block ends up in correct hot/cold section.  */
>>> >>>
>>> >>>        BB_COPY_PARTITION (jump_block, e->src);
>>> >>> -      if (flag_reorder_blocks_and_partition
>>> >>> -         && targetm_common.have_named_sections
>>> >>> -         && JUMP_P (BB_END (jump_block))
>>> >>> -         && !any_condjump_p (BB_END (jump_block))
>>> >>> -         && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
>>> >>> -       add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
>>> >>>
>>> >>>        /* Wire edge in.  */
>>> >>>        new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
>>> >>>        new_edge->probability = probability;
>>> >>>        new_edge->count = count;
>>> >>>
>>> >>> +      /* If e->src was previously region crossing, it no longer is
>>> >>> +         and the reg crossing note should be removed.  */
>>> >>> +      fixup_partition_crossing (new_edge, jump_block);
>>> >>> +
>>> >>>        /* Redirect old edge.  */
>>> >>>        redirect_edge_pred (e, jump_block);
>>> >>>        e->probability = REG_BR_PROB_BASE;
>>> >>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>> >>>        LABEL_NUSES (label)++;
>>> >>>      }
>>> >>>
>>> >>> -  emit_barrier_after (BB_END (jump_block));
>>> >>> +  /* We might be in cfg layout mode, and if so, the following routine will
>>> >>> +     insert the barrier correctly.  */
>>> >>> +  emit_barrier_after_bb (jump_block);
>>> >>>    redirect_edge_succ_nodup (e, target);
>>> >>>
>>> >>>    if (abnormal_edge_flags)
>>> >>>      make_edge (src, target, abnormal_edge_flags);
>>> >>>
>>> >>>    df_mark_solutions_dirty ();
>>> >>> +  fixup_partition_crossing (e, target);
>>> >>>    return new_bb;
>>> >>>  }
>>> >>>
>>> >>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
>>> >>>  static basic_block
>>> >>>  rtl_split_edge (edge edge_in)
>>> >>>  {
>>> >>> -  basic_block bb;
>>> >>> +  basic_block bb, new_bb;
>>> >>>    rtx before;
>>> >>>
>>> >>>    /* Abnormal edges cannot be split.  */
>>> >>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in)
>>> >>>    else
>>> >>>      {
>>> >>>        bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
>>> >>> -      /* ??? Why not edge_in->dest->prev_bb here?  */
>>> >>> -      BB_COPY_PARTITION (bb, edge_in->dest);
>>> >>> +      if (edge_in->src == ENTRY_BLOCK_PTR)
>>> >>> +        BB_COPY_PARTITION (bb, edge_in->dest);
>>> >>> +      else
>>> >>> +        /* Put the split bb into the src partition, to avoid creating
>>> >>> +           a situation where a cold bb dominates a hot bb, in the case
>>> >>> +           where src is cold and dest is hot. The src will dominate
>>> >>> +           the new bb (whereas it might not have dominated dest).  */
>>> >>> +        BB_COPY_PARTITION (bb, edge_in->src);
>>> >>>      }
>>> >>>
>>> >>>    make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);
>>> >>>
>>> >>> +  /* Can't allow a region crossing edge to be fallthrough.  */
>>> >>> +  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
>>> >>> +      && edge_in->dest != EXIT_BLOCK_PTR)
>>> >>> +    {
>>> >>> +      new_bb = force_nonfallthru (single_succ_edge (bb));
>>> >>> +      gcc_assert (!new_bb);
>>> >>> +    }
>>> >>> +
>>> >>>    /* For non-fallthru edges, we must adjust the predecessor's
>>> >>>       jump instruction to target our new block.  */
>>> >>>    if ((edge_in->flags & EDGE_FALLTHRU) == 0)
>>> >>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e)
>>> >>>    else
>>> >>>      {
>>> >>>        bb = split_edge (e);
>>> >>> -      after = BB_END (bb);
>>> >>>
>>> >>> -      if (flag_reorder_blocks_and_partition
>>> >>> -         && targetm_common.have_named_sections
>>> >>> -         && e->src != ENTRY_BLOCK_PTR
>>> >>> -         && BB_PARTITION (e->src) == BB_COLD_PARTITION
>>> >>> -         && !(e->flags & EDGE_CROSSING)
>>> >>> -         && JUMP_P (after)
>>> >>> -         && !any_condjump_p (after)
>>> >>> -         && (single_succ_edge (bb)->flags & EDGE_CROSSING))
>>> >>> -       add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
>>> >>> +      /* If e crossed a partition boundary, we needed to make bb end in
>>> >>> +         a region-crossing jump, even though it was originally fallthru.  */
>>> >>> +      if (JUMP_P (BB_END (bb)))
>>> >>> +       before = BB_END (bb);
>>> >>> +      else
>>> >>> +        after = BB_END (bb);
>>> >>>      }
>>> >>>
>>> >>>    /* Now that we've found the spot, do the insertion.  */
>>> >>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void)
>>> >>>  {
>>> >>>    basic_block bb;
>>> >>>
>>> >>> +  /* Optimization passes that invoke this routine can cause hot blocks
>>> >>> +     previously reached by both hot and cold blocks to become dominated only
>>> >>> +     by cold blocks. This will cause the verification below to fail,
>>> >>> +     and lead to now cold code in the hot section. In some cases this
>>> >>> +     may only be visible after newly unreachable blocks are deleted,
>>> >>> +     which will be done by fixup_partitions.  */
>>> >>> +  fixup_partitions ();
>>> >>> +
>>> >>>  #ifdef ENABLE_CHECKING
>>> >>>    verify_flow_info ();
>>> >>>  #endif
>>> >>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb)
>>> >>>
>>> >>>    return end;
>>> >>>  }
>>> >>> -
>>> >>> +
>>> >>> +/* Perform cleanup on the hot/cold bb partitioning after optimization
>>> >>> +   passes that modify the cfg.  */
>>> >>> +
>>> >>> +void
>>> >>> +fixup_partitions (void)
>>> >>> +{
>>> >>> +  basic_block bb;
>>> >>> +
>>> >>> +  if (!crtl->has_bb_partition)
>>> >>> +    return;
>>> >>> +
>>> >>> +  /* Delete any blocks that became unreachable and weren't
>>> >>> +     already cleaned up, for example during edge forwarding
>>> >>> +     and convert_jumps_to_returns. This will expose more
>>> >>> +     opportunities for fixing the partition boundaries here.
>>> >>> +     Also, the calculation of the dominance graph during verification
>>> >>> +     will assert if there are unreachable nodes.  */
>>> >>> +  delete_unreachable_blocks ();
>>> >>> +
>>> >>> +  /* If there are partitions, do a sanity check on them: A basic block in
>>> >>> +     a cold partition cannot dominate a basic block in a hot partition.
>>> >>> +     Fixup any that now violate this requirement, as a result of edge
>>> >>> +     forwarding and unreachable block deletion.  */
>>> >>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>>> >>> +  VEC (basic_block, heap) *bbs_to_fix = NULL;
>>> >>> +  FOR_EACH_BB (bb)
>>> >>> +    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>>> >>> +      VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>>> >>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>>> >>> +    {
>>> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>> >>> +      basic_block son;
>>> >>> +
>>> >>> +      if (dom_calculated_here)
>>> >>> +        calculate_dominance_info (CDI_DOMINATORS);
>>> >>> +
>>> >>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>>> >>> +        {
>>> >>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>>> >>> +          /* If bb is not yet cold (because it was added below as
>>> >>> +             a block dominated by a cold bb) then mark it cold here.  */
>>> >>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>>> >>> +            {
>>> >>> +              BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>> >>> +              VEC_safe_push (basic_block, heap, bbs_to_fix, bb);
>>> >>> +            }
>>> >>> +          /* Any blocks dominated by a block in the cold section
>>> >>> +             must also be cold.  */
>>> >>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>>> >>> +               son;
>>> >>> +               son = next_dom_son (CDI_DOMINATORS, son))
>>> >>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>>> >>> +        }
>>> >>> +
>>> >>> +      if (dom_calculated_here)
>>> >>> +        free_dominance_info (CDI_DOMINATORS);
>>> >>> +    }
>>> >>> +
>>> >>> +  /* Do the partition fixup after all necessary blocks have been converted to
>>> >>> +     cold, so that we only update the region crossings the minimum number of
>>> >>> +     places, which can require forcing edges to be non fallthru.  */
>>> >>> +  while (! VEC_empty (basic_block, bbs_to_fix))
>>> >>> +    {
>>> >>> +      bb = VEC_pop (basic_block, bbs_to_fix);
>>> >>> +      fixup_bb_partition (bb);
>>> >>> +    }
>>> >>> +}
>>> >>> +
>>> >>>  /* Verify the CFG and RTL consistency common for both underlying RTL and
>>> >>>     cfglayout RTL.
>>> >>>
>>> >>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void)
>>> >>>    rtx x;
>>> >>>    int err = 0;
>>> >>>    basic_block bb;
>>> >>> +  bool have_partitions = false;
>>> >>>
>>> >>>    /* Check the general integrity of the basic blocks.  */
>>> >>>    FOR_EACH_BB_REVERSE (bb)
>>> >>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void)
>>> >>>
>>> >>>           if (e->flags & EDGE_ABNORMAL)
>>> >>>             n_abnormal++;
>>> >>> +
>>> >>> +          have_partitions |= is_crossing;
>>> >>>         }
>>> >>>
>>> >>>        if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
>>> >>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void)
>>> >>>           }
>>> >>>      }
>>> >>>
>>> >>> +  /* If there are partitions, do a sanity check on them: A basic block in
>>> >>> +     a cold partition cannot dominate a basic block in a hot partition.  */
>>> >>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>>> >>> +  if (have_partitions && !err)
>>> >>> +    FOR_EACH_BB (bb)
>>> >>> +      if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>>> >>> +        VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>>> >>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>>> >>> +    {
>>> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>> >>> +      basic_block son;
>>> >>> +
>>> >>> +      if (dom_calculated_here)
>>> >>> +        calculate_dominance_info (CDI_DOMINATORS);
>>> >>> +
>>> >>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>>> >>> +        {
>>> >>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>>> >>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>>> >>> +            {
>>> >>> +              error ("non-cold basic block %d dominated "
>>> >>> +                     "by a block in the cold partition", bb->index);
>>> >>> +              err = 1;
>>> >>> +            }
>>> >>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>>> >>> +               son;
>>> >>> +               son = next_dom_son (CDI_DOMINATORS, son))
>>> >>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>>> >>> +        }
>>> >>> +
>>> >>> +      if (dom_calculated_here)
>>> >>> +        free_dominance_info (CDI_DOMINATORS);
>>> >>> +    }
>>> >>> +
>>> >>>    /* Clean up.  */
>>> >>>    return err;
>>> >>>  }
>>> >>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void)
>>> >>>    else
>>> >>>      cfg_layout_function_header = NULL_RTX;
>>> >>>
>>> >>> +  had_sec_boundary_notes = false;
>>> >>> +
>>> >>>    next_insn = get_insns ();
>>> >>>    FOR_EACH_BB (bb)
>>> >>>      {
>>> >>>        rtx end;
>>> >>>
>>> >>>        if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb))
>>> >>> -       BB_HEADER (bb) = unlink_insn_chain (next_insn,
>>> >>> -                                             PREV_INSN (BB_HEAD (bb)));
>>> >>> +        {
>>> >>> +          /* Rather than try to keep section boundary notes incrementally
>>> >>> +             up-to-date through cfg layout optimizations, simply remove them
>>> >>> +             and flag that they should be re-inserted when exiting
>>> >>> +             cfg layout mode.  */
>>> >>> +          rtx check_insn = next_insn;
>>> >>> +          while (check_insn)
>>> >>> +            {
>>> >>> +              if (NOTE_P (check_insn)
>>> >>> +                  && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
>>> >>> +              {
>>> >>> +                had_sec_boundary_notes |= true;
>>> >>> +                /* Remove note from chain. Grab new next_insn first.  */
>>> >>> +                if (next_insn == check_insn)
>>> >>> +                  next_insn = NEXT_INSN (check_insn);
>>> >>> +                /* Delete note.  */
>>> >>> +                delete_insn (check_insn);
>>> >>> +                /* There will only be one.  */
>>> >>> +                break;
>>> >>> +              }
>>> >>> +              check_insn = NEXT_INSN (check_insn);
>>> >>> +            }
>>> >>> +          /* If we still have header instructions left after above loop.  */
>>> >>> +          if (next_insn != BB_HEAD (bb))
>>> >>> +            BB_HEADER (bb) = unlink_insn_chain (next_insn,
>>> >>> +                                                PREV_INSN (BB_HEAD (bb)));
>>> >>> +        }
>>> >>>        end = skip_insns_after_block (bb);
>>> >>>        if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end)
>>> >>>         BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end);
>>> >>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void)
>>> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>> >>>        bb->aux = bb->next_bb;
>>> >>>
>>> >>> -  cfg_layout_finalize ();
>>> >>> +  cfg_layout_finalize (false);
>>> >>>
>>> >>>    return 0;
>>> >>>  }
>>> >>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode)
>>> >>>  }
>>> >>>
>>> >>>
>>> >>> -/* Given a reorder chain, rearrange the code to match.  */
>>> >>> +/* Given a reorder chain, rearrange the code to match. If
>>> >>> +   this is called when we will FINALIZE_REORDER_BLOCKS, or when
>>> >>> +   section boundary notes were removed on entry to cfg layout
>>> >>> +   mode, insert section boundary notes here.  */
>>> >>>
>>> >>>  static void
>>> >>> -fixup_reorder_chain (void)
>>> >>> +fixup_reorder_chain (bool finalize_reorder_blocks)
>>> >>>  {
>>> >>>    basic_block bb;
>>> >>>    rtx insn = NULL;
>>> >>> @@ -3150,7 +3373,7 @@ static void
>>> >>>           PREV_INSN (BB_HEADER (bb)) = insn;
>>> >>>           insn = BB_HEADER (bb);
>>> >>>           while (NEXT_INSN (insn))
>>> >>> -           insn = NEXT_INSN (insn);
>>> >>> +            insn = NEXT_INSN (insn);
>>> >>>         }
>>> >>>        if (insn)
>>> >>>         NEXT_INSN (insn) = BB_HEAD (bb);
>>> >>> @@ -3175,6 +3398,11 @@ static void
>>> >>>      insn = NEXT_INSN (insn);
>>> >>>
>>> >>>    set_last_insn (insn);
>>> >>> +
>>> >>> +  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>>> >>> +  if (had_sec_boundary_notes || finalize_reorder_blocks)
>>> >>> +    insert_section_boundary_note ();
>>> >>> +
>>> >>>  #ifdef ENABLE_CHECKING
>>> >>>    verify_insn_chain ();
>>> >>>  #endif
>>> >>> @@ -3187,7 +3415,7 @@ static void
>>> >>>        edge e_fall, e_taken, e;
>>> >>>        rtx bb_end_insn;
>>> >>>        rtx ret_label = NULL_RTX;
>>> >>> -      basic_block nb, src_bb;
>>> >>> +      basic_block nb;
>>> >>>        edge_iterator ei;
>>> >>>
>>> >>>        if (EDGE_COUNT (bb->succs) == 0)
>>> >>> @@ -3322,7 +3550,6 @@ static void
>>> >>>        /* We got here if we need to add a new jump insn.
>>> >>>          Note force_nonfallthru can delete E_FALL and thus we have to
>>> >>>          save E_FALL->src prior to the call to force_nonfallthru.  */
>>> >>> -      src_bb = e_fall->src;
>>> >>>        nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
>>> >>>        if (nb)
>>> >>>         {
>>> >>> @@ -3330,17 +3557,6 @@ static void
>>> >>>           bb->aux = nb;
>>> >>>           /* Don't process this new block.  */
>>> >>>           bb = nb;
>>> >>> -
>>> >>> -         /* Make sure new bb is tagged for correct section (same as
>>> >>> -            fall-thru source, since you cannot fall-thru across
>>> >>> -            section boundaries).  */
>>> >>> -         BB_COPY_PARTITION (src_bb, single_pred (bb));
>>> >>> -         if (flag_reorder_blocks_and_partition
>>> >>> -             && targetm_common.have_named_sections
>>> >>> -             && JUMP_P (BB_END (bb))
>>> >>> -             && !any_condjump_p (BB_END (bb))
>>> >>> -             && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
>>> >>> -           add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
>>> >>>         }
>>> >>>      }
>>> >>>
>>> >>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to)
>>> >>>             case NOTE_INSN_FUNCTION_BEG:
>>> >>>               /* There is always just single entry to function.  */
>>> >>>             case NOTE_INSN_BASIC_BLOCK:
>>> >>> +              /* We should only switch text sections once.  */
>>> >>> +           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>> >>>               break;
>>> >>>
>>> >>>             case NOTE_INSN_EPILOGUE_BEG:
>>> >>> -           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>> >>>               emit_note_copy (insn);
>>> >>>               break;
>>> >>>
>>> >>> @@ -3759,10 +3976,13 @@ break_superblocks (void)
>>> >>>  }
>>> >>>
>>> >>>  /* Finalize the changes: reorder insn list according to the sequence specified
>>> >>> -   by aux pointers, enter compensation code, rebuild scope forest.  */
>>> >>> +   by aux pointers, enter compensation code, rebuild scope forest. If
>>> >>> +   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
>>> >>> +   to fixup_reorder_chain so that it can insert the proper switch text
>>> >>> +   section notes.  */
>>> >>>
>>> >>>  void
>>> >>> -cfg_layout_finalize (void)
>>> >>> +cfg_layout_finalize (bool finalize_reorder_blocks)
>>> >>>  {
>>> >>>  #ifdef ENABLE_CHECKING
>>> >>>    verify_flow_info ();
>>> >>> @@ -3775,7 +3995,7 @@ void
>>> >>>  #endif
>>> >>>        )
>>> >>>      fixup_fallthru_exit_predecessor ();
>>> >>> -  fixup_reorder_chain ();
>>> >>> +  fixup_reorder_chain (finalize_reorder_blocks);
>>> >>>
>>> >>>    rebuild_jump_labels (get_insns ());
>>> >>>    delete_dead_jumptables ();
>>> >>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e)
>>> >>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>> >>>      return false;
>>> >>>
>>> >>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>>> >>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>>> >>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>> >>>      return false;
>>> >>>
>>> >>>    if (!onlyjump_p (insn)
>>> >>>
>>> >>> --
>>> >>> This patch is available for review at http://codereview.appspot.com/6823047
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>>>
>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Teresa Johnson Nov. 28, 2012, 3:56 p.m. UTC | #7
Is this with the same target compiler and options used in PR55121? I
will try to reproduce the compile-time failures with arm and those
options if so. I haven't seen those with spec2006 linux x86_64. I'm
not sure how to test the runtime behavior though.

Thanks,
Teresa

On Wed, Nov 28, 2012 at 7:48 AM, Christophe Lyon
<christophe.lyon@linaro.org> wrote:
> I have updated my trunk checkout, and I can confirm that eval.c now
> compiles with your patch (and the other 4 patches I added to PR55121).
>
> Now, when looking at the whole Spec2k results:
> - vpr passes now (used to fail)
> - gcc, parser, perlbmk bzip2 and twolf no longer build: they all fail
> with the same error from gas:
> can't resolve `.text.unlikely' {.text.unlikely section} - `.LBB171'
> {.text section}
> - gap still does not build (same error as above)
>
> I haven't looked in detail, so I may be missing an obvious patch here.
>
> And I still observe runtime mis-behaviour on crafty, galgel, facerec and fma3d.
>
> Thanks
> Christophe.
>
>
> On 26 November 2012 21:52, Teresa Johnson <tejohnson@google.com> wrote:
>> Sorry, I don't know what happened there. Patch is attached.
>> Thanks,
>> Teresa
>>
>> On Mon, Nov 26, 2012 at 12:42 PM, Jack Howarth <howarth@bromo.med.uc.edu> wrote:
>>> On Mon, Nov 26, 2012 at 12:19:55PM -0800, Teresa Johnson wrote:
>>>> Are you sure you have all my changes applied? I applied the 4 patches
>>>> attached to PR55121 into my trunk checkout that has my fixes, and to a
>>>> pristine trunk checkout. I configured and built both for
>>>> --target=arm-none-linux-gnueabi, and built using your options, .i file
>>>> and gcda file. I can reproduce the failure using the pristine trunk
>>>> with your patches but not with my fixed trunk + your patches. (I just
>>>> updated to head to pickup recent changes and get the same result. The
>>>> vec changes required some manual changes to the patch, which I will
>>>> resend shortly.)
>>>
>>> Teresa,
>>>     Your mailer seems to have corrupted the posted patch with stray
>>> =3D characters and line breaks. Can you repost a copy as an attachment
>>> to the list?
>>>              Jack
>>>
>>>>
>>>> Without my fixes:
>>>>
>>>> $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreproce
>>>> ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
>>>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
>>>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
>>>> -fno-common -o eval.s -freorder-blocks-and-partition
>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>>> 2.4.2-p1, MPC version 0.8.1
>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>>> 2.4.2-p1, MPC version 0.8.1
>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>>> Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a
>>>> eval.c: In function ‘Ge’:
>>>> eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560
>>>>  }
>>>>  ^
>>>> 0x622f71 df_compact_blocks()
>>>> ../../gcc_trunk_3/gcc/df-core.c:1560
>>>> 0x5cfcb5 compact_blocks()
>>>> ../../gcc_trunk_3/gcc/cfg.c:162
>>>> 0xc9dce0 reorder_basic_blocks
>>>> ../../gcc_trunk_3/gcc/bb-reorder.c:2154
>>>> 0xc9dce0 rest_of_handle_reorder_blocks
>>>> ../../gcc_trunk_3/gcc/bb-reorder.c:2219
>>>> Please submit a full bug report,
>>>> with preprocessed source if appropriate.
>>>> Please include the complete backtrace with any bug report.
>>>> See <http://gcc.gnu.org/bugs.html> for instructions.
>>>>
>>>>
>>>> With my fixes:
>>>>
>>>> $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreproce
>>>> ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
>>>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
>>>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
>>>> -fno-common -o eval.s -freorder-blocks-and-partition
>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>>> 2.4.2-p1, MPC version 0.8.1
>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>>> 2.4.2-p1, MPC version 0.8.1
>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>>> Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3
>>>>
>>>>
>>>> Thanks,
>>>> Teresa
>>>>
>>>> On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon
>>>> <christophe.lyon@linaro.org> wrote:
>>>> > Hi,
>>>> >
>>>> > I have tested your patch on Spec2000 on ARM, and I can still see
>>>> > several failures caused by:
>>>> > "error: fallthru edge crosses section boundary", including the case
>>>> > described in PR55121.
>>>> >
>>>> > On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote:
>>>> >> Ping.
>>>> >> Teresa
>>>> >>
>>>> >> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>>> >>> Revised patch that fixes failures encountered when enabling
>>>> >>> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>>>> >>>
>>>> >>> This includes new verification code to ensure no cold blocks dominate hot
>>>> >>> blocks contributed by Steven Bosscher.
>>>> >>>
>>>> >>> I attempted to make the handling of partition updates through the optimization
>>>> >>> passes much more consistent, removing a number of partial fixes in the code
>>>> >>> stream in the process. The code to fixup partitions (including the BB_PARTITION
>>>> >>> assignement, region crossing jump notes, and switch text section notes) is
>>>> >>> now handled in a few centralized locations. For example, inside
>>>> >>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
>>>> >>> don't need to attempt the fixup themselves.
>>>> >>>
>>>> >>> For optimization passes that make adjustments to the cfg while in cfg layout
>>>> >>> mode that are not easy to fix up incrementally, the new routine
>>>> >>> fixup_partitions handles the cleanup globally. This does require calculation
>>>> >>> of the dominance relation, however, as far as I can tell the routines which
>>>> >>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
>>>> >>> are invoked typically once (or a small number of times in the case of
>>>> >>> try_optimize_cfg) per optimization pass. Additionally, I compared the
>>>> >>> -ftime-report output for some large fdo compilations and saw only minimal
>>>> >>> increases in the dominance computation times, which were only a tiny percent
>>>> >>> of the overall compile time.
>>>> >>>
>>>> >>> Additionally, I added a flag to the rtl_data structure to indicate whether
>>>> >>> any partitioning was actually performed, so that optimizations which were
>>>> >>> conservatively disabled whenever the flag_reorder_blocks_and_partition
>>>> >>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
>>>> >>> conservative for functions where no partitions were formed (e.g. they are
>>>> >>> completely hot).
>>>> >>>
>>>> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
>>>> >>> benchmarks and internal google benchmarks using profile feedback and
>>>> >>> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
>>>> >>>
>>>> >>> Thanks,
>>>> >>> Teresa
>>>> >>>
>>>> >>> 2012-11-14  Teresa Johnson  <tejohnson@google.com>
>>>> >>>             Steven Bosscher  <steven@gcc.gnu.org>
>>>> >>>
>>>> >>>         * cfghooks.h (cfg_layout_finalize): New parameter.
>>>> >>>         * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
>>>> >>>         parameter.
>>>> >>>         * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
>>>> >>>         as this is now done by redirect_edge_and_branch_force.
>>>> >>>         * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
>>>> >>>         barriers, new cfg_layout_finalize parameter, and don't store exit
>>>> >>>         predecessor BB until after it is potentially split.
>>>> >>>         * function.h (struct rtl_data): New flag has_bb_partition.
>>>> >>>         * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
>>>> >>>         * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
>>>> >>>         any blocks in function actually partitioned.
>>>> >>>         (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
>>>> >>>         up partitioning.
>>>> >>>         * bb-reorder.c (connect_traces): Only look for partitions and skip
>>>> >>>         block copying if any blocks in function actually partitioned.
>>>> >>>         (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
>>>> >>>         (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
>>>> >>>         that no cold blocks dominate a hot block.
>>>> >>>         (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
>>>> >>>         as this is now done by force_nonfallthru_and_redirect.
>>>> >>>         (add_reg_crossing_jump_notes): Handle the fact that some jumps may
>>>> >>>         already be marked with region crossing note.
>>>> >>>         (reorder_basic_blocks): Only need to verify partitions if any
>>>> >>>         blocks in function actually partitioned.
>>>> >>>         (insert_section_boundary_note): Only need to insert note if any
>>>> >>>         blocks in function actually partitioned.
>>>> >>>         (rest_of_handle_reorder_blocks): New cfg_layout_finalize
>>>> >>>         parameter, and remove call to insert_section_boundary_note as this
>>>> >>>         is now called via cfg_layout_finalize/fixup_reorder_chain.
>>>> >>>         (duplicate_computed_gotos): New cfg_layout_finalize
>>>> >>>         parameter.
>>>> >>>         (partition_hot_cold_basic_blocks): Set flag indicating function
>>>> >>>         has bb partitions.
>>>> >>>         * bb-reorder.h: Declare insert_section_boundary_note and
>>>> >>>         emit_barrier_after_bb, which are no longer static.
>>>> >>>         * basic-block.h: Declare new function fixup_partitions.
>>>> >>>         * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
>>>> >>>         check for region crossing note.
>>>> >>>         (fixup_partition_crossing): New function.
>>>> >>>         (fixup_bb_partition): Ditto.
>>>> >>>         (rtl_redirect_edge_and_branch): Fixup partition boundaries.
>>>> >>>         (force_nonfallthru_and_redirect): Fixup partition boundaries,
>>>> >>>         remove old code that tried to do this. Emit barrier correctly
>>>> >>>         when we are in cfglayout mode.
>>>> >>>         (rtl_split_edge): Correctly fixup partition boundaries.
>>>> >>>         (commit_one_edge_insertion): Remove old code that tried to
>>>> >>>         fixup region crossing edge since this is now handled in
>>>> >>>         split_block, and set up insertion point correctly since
>>>> >>>         block may now end in a jump.
>>>> >>>         (commit_edge_insertions): Invoke fixup_partitions to sanitize partition
>>>> >>>         boundaries after optimizations that modify cfg and before trying to
>>>> >>>         verify the flow info.
>>>> >>>         (fixup_partitions): New function.
>>>> >>>         (rtl_verify_flow_info_1): Add verification that no cold bbs dominate
>>>> >>>         hot bbs.
>>>> >>>         (record_effective_endpoints): Remove region-crossing notes and set flag
>>>> >>>         indicating that they need to be reinserted on exit from cfglayout mode.
>>>> >>>         (outof_cfg_layout_mode): New cfg_layout_finalize parameter.
>>>> >>>         (fixup_reorder_chain): Call insert_section_boundary_note if necessary.
>>>> >>>         Remove old code that attempted to fixup region crossing note as
>>>> >>>         this is now handled in force_nonfallthru_and_redirect.
>>>> >>>         (duplicate_insn_chain): Don't duplicate switch section notes.
>>>> >>>         (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
>>>> >>>         (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
>>>> >>>         note.
>>>> >>>
>>>> >>> Index: cfghooks.h
>>>> >>> ===================================================================
>>>> >>> --- cfghooks.h  (revision 193376)
>>>> >>> +++ cfghooks.h  (working copy)
>>>> >>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
>>>> >>>  void account_profile_record (struct profile_record *, int);
>>>> >>>
>>>> >>>  extern void cfg_layout_initialize (unsigned int);
>>>> >>> -extern void cfg_layout_finalize (void);
>>>> >>> +extern void cfg_layout_finalize (bool);
>>>> >>>
>>>> >>>  /* Hooks containers.  */
>>>> >>>  extern struct cfg_hooks gimple_cfg_hooks;
>>>> >>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi
>>>> >>>  extern void gimple_register_cfg_hooks (void);
>>>> >>>  extern struct cfg_hooks get_cfg_hooks (void);
>>>> >>>  extern void set_cfg_hooks (struct cfg_hooks);
>>>> >>> -
>>>> >>> Index: modulo-sched.c
>>>> >>> ===================================================================
>>>> >>> --- modulo-sched.c      (revision 193376)
>>>> >>> +++ modulo-sched.c      (working copy)
>>>> >>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void)
>>>> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>>> >>>        bb->aux = bb->next_bb;
>>>> >>>    free_dominance_info (CDI_DOMINATORS);
>>>> >>> -  cfg_layout_finalize ();
>>>> >>> +  cfg_layout_finalize (false);
>>>> >>>  #endif /* INSN_SCHEDULING */
>>>> >>>    return 0;
>>>> >>>  }
>>>> >>> Index: ifcvt.c
>>>> >>> ===================================================================
>>>> >>> --- ifcvt.c     (revision 193376)
>>>> >>> +++ ifcvt.c     (working copy)
>>>> >>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
>>>> >>>    if (new_bb)
>>>> >>>      {
>>>> >>>        df_bb_replace (then_bb_index, new_bb);
>>>> >>> -      /* Since the fallthru edge was redirected from test_bb to new_bb,
>>>> >>> -         we need to ensure that new_bb is in the same partition as
>>>> >>> -         test bb (you can not fall through across section boundaries).  */
>>>> >>> -      BB_COPY_PARTITION (new_bb, test_bb);
>>>> >>> +      /* This should have been done above via force_nonfallthru_and_redirect
>>>> >>> +         (possibly called from redirect_edge_and_branch_force).  */
>>>> >>> +      gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
>>>> >>>      }
>>>> >>>
>>>> >>>    num_true_changes++;
>>>> >>> Index: function.c
>>>> >>> ===================================================================
>>>> >>> --- function.c  (revision 193376)
>>>> >>> +++ function.c  (working copy)
>>>> >>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void)
>>>> >>>                     break;
>>>> >>>                 if (e)
>>>> >>>                   {
>>>> >>> -                   copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
>>>> >>> -                                                 NULL_RTX, e->src);
>>>> >>> +                    /* Make sure we insert after any barriers.  */
>>>> >>> +                    rtx end = get_last_bb_insn (e->src);
>>>> >>> +                    copy_bb = create_basic_block (NEXT_INSN (end),
>>>> >>> +                                                  NULL_RTX, e->src);
>>>> >>>                     BB_COPY_PARTITION (copy_bb, e->src);
>>>> >>>                   }
>>>> >>>                 else
>>>> >>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void)
>>>> >>>         if (cur_bb->index >= NUM_FIXED_BLOCKS
>>>> >>>             && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS)
>>>> >>>           cur_bb->aux = cur_bb->next_bb;
>>>> >>> -      cfg_layout_finalize ();
>>>> >>> +      cfg_layout_finalize (false);
>>>> >>>      }
>>>> >>>
>>>> >>>  epilogue_done:
>>>> >>> @@ -6517,7 +6519,7 @@ epilogue_done:
>>>> >>>        basic_block simple_return_block_cold = NULL;
>>>> >>>        edge pending_edge_hot = NULL;
>>>> >>>        edge pending_edge_cold = NULL;
>>>> >>> -      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>>> >>> +      basic_block exit_pred;
>>>> >>>        int i;
>>>> >>>
>>>> >>>        gcc_assert (entry_edge != orig_entry_edge);
>>>> >>> @@ -6545,6 +6547,12 @@ epilogue_done:
>>>> >>>             else
>>>> >>>               pending_edge_cold = e;
>>>> >>>           }
>>>> >>> +
>>>> >>> +      /* Save a pointer to the exit's predecessor BB for use in
>>>> >>> +         inserting new BBs at the end of the function. Do this
>>>> >>> +         after the call to split_block above which may split
>>>> >>> +         the original exit pred.  */
>>>> >>> +      exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>>> >>>
>>>> >>>        FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
>>>> >>>         {
>>>> >>> Index: function.h
>>>> >>> ===================================================================
>>>> >>> --- function.h  (revision 193376)
>>>> >>> +++ function.h  (working copy)
>>>> >>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data {
>>>> >>>       sched2) and is useful only if the port defines LEAF_REGISTERS.  */
>>>> >>>    bool uses_only_leaf_regs;
>>>> >>>
>>>> >>> +  /* Nonzero if the function being compiled has undergone hot/cold partitioning
>>>> >>> +     (under flag_reorder_blocks_and_partition) and has at least one cold
>>>> >>> +     block.  */
>>>> >>> +  bool has_bb_partition;
>>>> >>> +
>>>> >>>    /* Like regs_ever_live, but 1 if a reg is set or clobbered from an
>>>> >>>       asm.  Unlike regs_ever_live, elements of this array corresponding
>>>> >>>       to eliminable regs (like the frame pointer) are set if an asm
>>>> >>> Index: hw-doloop.c
>>>> >>> ===================================================================
>>>> >>> --- hw-doloop.c (revision 193376)
>>>> >>> +++ hw-doloop.c (working copy)
>>>> >>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops)
>>>> >>>        else
>>>> >>>         bb->aux = NULL;
>>>> >>>      }
>>>> >>> -  cfg_layout_finalize ();
>>>> >>> +  cfg_layout_finalize (false);
>>>> >>>    clear_aux_for_blocks ();
>>>> >>>    df_analyze ();
>>>> >>>  }
>>>> >>> Index: cfgcleanup.c
>>>> >>> ===================================================================
>>>> >>> --- cfgcleanup.c        (revision 193376)
>>>> >>> +++ cfgcleanup.c        (working copy)
>>>> >>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
>>>> >>>       partition boundaries).  See the comments at the top of
>>>> >>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>>> >>>
>>>> >>> -  if (flag_reorder_blocks_and_partition && reload_completed)
>>>> >>> +  if (crtl->has_bb_partition && reload_completed)
>>>> >>>      return false;
>>>> >>>
>>>> >>>    /* Search backward through forwarder blocks.  We don't need to worry
>>>> >>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode)
>>>> >>>               df_analyze ();
>>>> >>>             }
>>>> >>>
>>>> >>> +         if (changed)
>>>> >>> +            {
>>>> >>> +              /* Edge forwarding in particular can cause hot blocks previously
>>>> >>> +                 reached by both hot and cold blocks to become dominated only
>>>> >>> +                 by cold blocks. This will cause the verification below to fail,
>>>> >>> +                 and lead to now cold code in the hot section. This is not easy
>>>> >>> +                 to detect and fix during edge forwarding, and in some cases
>>>> >>> +                 is only visible after newly unreachable blocks are deleted,
>>>> >>> +                 which will be done in fixup_partitions.  */
>>>> >>> +              fixup_partitions ();
>>>> >>> +
>>>> >>>  #ifdef ENABLE_CHECKING
>>>> >>> -         if (changed)
>>>> >>> -           verify_flow_info ();
>>>> >>> +              verify_flow_info ();
>>>> >>>  #endif
>>>> >>> +            }
>>>> >>>
>>>> >>>           changed_overall |= changed;
>>>> >>>           first_pass = false;
>>>> >>> Index: bb-reorder.c
>>>> >>> ===================================================================
>>>> >>> --- bb-reorder.c        (revision 193376)
>>>> >>> +++ bb-reorder.c        (working copy)
>>>> >>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces
>>>> >>>    current_partition = BB_PARTITION (traces[0].first);
>>>> >>>    two_passes = false;
>>>> >>>
>>>> >>> -  if (flag_reorder_blocks_and_partition)
>>>> >>> +  if (crtl->has_bb_partition)
>>>> >>>      for (i = 0; i < n_traces && !two_passes; i++)
>>>> >>>        if (BB_PARTITION (traces[0].first)
>>>> >>>           != BB_PARTITION (traces[i].first))
>>>> >>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces
>>>> >>>                       }
>>>> >>>                   }
>>>> >>>
>>>> >>> -             if (flag_reorder_blocks_and_partition)
>>>> >>> +             if (crtl->has_bb_partition)
>>>> >>>                 try_copy = false;
>>>> >>>
>>>> >>>               /* Copy tiny blocks always; copy larger blocks only when the
>>>> >>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void)
>>>> >>>    return length;
>>>> >>>  }
>>>> >>>
>>>> >>> -/* Emit a barrier into the footer of BB.  */
>>>> >>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
>>>> >>>
>>>> >>> -static void
>>>> >>> +void
>>>> >>>  emit_barrier_after_bb (basic_block bb)
>>>> >>>  {
>>>> >>>    rtx barrier = emit_barrier_after (BB_END (bb));
>>>> >>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>>> >>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>>>> >>> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>>> >>>  }
>>>> >>>
>>>> >>>  /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
>>>> >>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg
>>>> >>>  {
>>>> >>>    VEC(edge, heap) *crossing_edges = NULL;
>>>> >>>    basic_block bb;
>>>> >>> -  edge e;
>>>> >>> -  edge_iterator ei;
>>>> >>> +  edge e, e2;
>>>> >>> +  edge_iterator ei, ei2;
>>>> >>> +  unsigned int cold_bb_count = 0;
>>>> >>> +  VEC (basic_block, heap) *bbs_in_hot_partition = NULL;
>>>> >>> +  VEC (basic_block, heap) *bbs_newly_hot = NULL;
>>>> >>>
>>>> >>>    /* Mark which partition (hot/cold) each basic block belongs in.  */
>>>> >>>    FOR_EACH_BB (bb)
>>>> >>>      {
>>>> >>>        if (probably_never_executed_bb_p (cfun, bb))
>>>> >>> -       BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>>> >>> +        {
>>>> >>> +          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>>> >>> +          cold_bb_count++;
>>>> >>> +        }
>>>> >>>        else
>>>> >>> -       BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>>>> >>> +        {
>>>> >>> +          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>>>> >>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb);
>>>> >>> +        }
>>>> >>>      }
>>>> >>>
>>>> >>> +  /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of
>>>> >>> +     several different possibilities. One is that there are edge weight insanities
>>>> >>> +     due to optimization phases that do not properly update basic block profile
>>>> >>> +     counts. The second is that the entry of the function may not be hot, because
>>>> >>> +     it is entered fewer times than the number of profile training runs, but there
>>>> >>> +     is a loop inside the function that causes blocks within the function to be
>>>> >>> +     above the threshold for hotness.  */
>>>> >>> +  if (cold_bb_count)
>>>> >>> +    {
>>>> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>>> >>> +
>>>> >>> +      if (dom_calculated_here)
>>>> >>> +        calculate_dominance_info (CDI_DOMINATORS);
>>>> >>> +
>>>> >>> +      /* Keep examining hot bbs until we have either checked them all, or
>>>> >>> +         re-marked all cold bbs hot.  */
>>>> >>> +      while (! VEC_empty (basic_block, bbs_in_hot_partition)
>>>> >>> +             && cold_bb_count)
>>>> >>> +        {
>>>> >>> +          basic_block dom_bb;
>>>> >>> +
>>>> >>> +          bb = VEC_pop (basic_block, bbs_in_hot_partition);
>>>> >>> +          dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
>>>> >>> +
>>>> >>> +          /* If bb's immediate dominator is also hot then it is ok.  */
>>>> >>> +          if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
>>>> >>> +            continue;
>>>> >>> +
>>>> >>> +          /* We have a hot bb with an immediate dominator that is cold.
>>>> >>> +             The dominator needs to be re-marked to hot.  */
>>>> >>> +          BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION);
>>>> >>> +          cold_bb_count--;
>>>> >>> +
>>>> >>> +          /* Now we need to examine newly-hot dom_bb to see if it is also
>>>> >>> +             dominated by a cold bb.  */
>>>> >>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb);
>>>> >>> +
>>>> >>> +          /* We should also adjust any cold blocks that the newly-hot bb
>>>> >>> +             feeds and see if it makes sense to re-mark those as hot as
>>>> >>> +             well.  */
>>>> >>> +          VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb);
>>>> >>> +          while (! VEC_empty (basic_block, bbs_newly_hot))
>>>> >>> +            {
>>>> >>> +              basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot);
>>>> >>> +              /* Examine all successors of this newly-hot bb to see if they
>>>> >>> +                 are cold and should be re-marked as hot.  */
>>>> >>> +              FOR_EACH_EDGE (e, ei, new_hot_bb->succs)
>>>> >>> +                {
>>>> >>> +                  bool any_cold_preds = false;
>>>> >>> +                  basic_block succ = e->dest;
>>>> >>> +                  if (BB_PARTITION (succ) != BB_COLD_PARTITION)
>>>> >>> +                    continue;
>>>> >>> +                  /* Does this block have any cold predecessors now?  */
>>>> >>> +                  FOR_EACH_EDGE (e2, ei2, succ->preds)
>>>> >>> +                  {
>>>> >>> +                    if (BB_PARTITION (e2->src) == BB_COLD_PARTITION)
>>>> >>> +                      {
>>>> >>> +                        any_cold_preds = true;
>>>> >>> +                        break;
>>>> >>> +                      }
>>>> >>> +                  }
>>>> >>> +                  if (any_cold_preds)
>>>> >>> +                    continue;
>>>> >>> +
>>>> >>> +                  /* Here we have a successor of newly-hot bb that is cold
>>>> >>> +                     but no longer has any cold precessessors. Since the original
>>>> >>> +                     assignment of our newly-hot bb was incorrect, this successor's
>>>> >>> +                     assignment as cold is also suspect. Go ahead and re-mark it
>>>> >>> +                     as hot now too. Better heuristics may be in order here.  */
>>>> >>> +                  BB_SET_PARTITION (succ, BB_HOT_PARTITION);
>>>> >>> +                  cold_bb_count--;
>>>> >>> +                  VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ);
>>>> >>> +                  /* Examine this successor as a newly-hot bb.  */
>>>> >>> +                  VEC_safe_push (basic_block, heap, bbs_newly_hot, succ);
>>>> >>> +                }
>>>> >>> +            }
>>>> >>> +        }
>>>> >>> +
>>>> >>> +      if (dom_calculated_here)
>>>> >>> +        free_dominance_info (CDI_DOMINATORS);
>>>> >>> +    }
>>>> >>> +
>>>> >>>    /* The format of .gcc_except_table does not allow landing pads to
>>>> >>>       be in a different partition as the throw.  Fix this by either
>>>> >>>       moving or duplicating the landing pads.  */
>>>> >>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void)
>>>> >>>                       new_bb->aux = cur_bb->aux;
>>>> >>>                       cur_bb->aux = new_bb;
>>>> >>>
>>>> >>> -                     /* Make sure new fall-through bb is in same
>>>> >>> -                        partition as bb it's falling through from.  */
>>>> >>> +                      /* This is done by force_nonfallthru_and_redirect.  */
>>>> >>> +                     gcc_assert (BB_PARTITION (new_bb)
>>>> >>> +                                  == BB_PARTITION (cur_bb));
>>>> >>>
>>>> >>> -                     BB_COPY_PARTITION (new_bb, cur_bb);
>>>> >>>                       single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
>>>> >>>                     }
>>>> >>>                   else
>>>> >>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void)
>>>> >>>    FOR_EACH_BB (bb)
>>>> >>>      FOR_EACH_EDGE (e, ei, bb->succs)
>>>> >>>        if ((e->flags & EDGE_CROSSING)
>>>> >>> -         && JUMP_P (BB_END (e->src)))
>>>> >>> +         && JUMP_P (BB_END (e->src))
>>>> >>> +          /* Some notes were added during fix_up_fall_thru_edges, via
>>>> >>> +             force_nonfallthru_and_redirect.  */
>>>> >>> +          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
>>>> >>>         add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>>> >>>  }
>>>> >>>
>>>> >>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void)
>>>> >>>        dump_flow_info (dump_file, dump_flags);
>>>> >>>      }
>>>> >>>
>>>> >>> -  if (flag_reorder_blocks_and_partition)
>>>> >>> +  if (crtl->has_bb_partition)
>>>> >>>      verify_hot_cold_block_grouping ();
>>>> >>>  }
>>>> >>>
>>>> >>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void)
>>>> >>>     encountering this note will make the compiler switch between the
>>>> >>>     hot and cold text sections.  */
>>>> >>>
>>>> >>> -static void
>>>> >>> +void
>>>> >>>  insert_section_boundary_note (void)
>>>> >>>  {
>>>> >>>    basic_block bb;
>>>> >>>    rtx new_note;
>>>> >>>    int first_partition = 0;
>>>> >>>
>>>> >>> -  if (!flag_reorder_blocks_and_partition)
>>>> >>> +  if (!crtl->has_bb_partition)
>>>> >>>      return;
>>>> >>>
>>>> >>>    FOR_EACH_BB (bb)
>>>> >>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void)
>>>> >>>    FOR_EACH_BB (bb)
>>>> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>>> >>>        bb->aux = bb->next_bb;
>>>> >>> -  cfg_layout_finalize ();
>>>> >>> +  cfg_layout_finalize (true);
>>>> >>>
>>>> >>> -  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>>>> >>> -  insert_section_boundary_note ();
>>>> >>>    return 0;
>>>> >>>  }
>>>> >>>
>>>> >>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void)
>>>> >>>      }
>>>> >>>
>>>> >>>  done:
>>>> >>> -  cfg_layout_finalize ();
>>>> >>> +  cfg_layout_finalize (false);
>>>> >>>
>>>> >>>    BITMAP_FREE (candidates);
>>>> >>>    return 0;
>>>> >>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void)
>>>> >>>    if (crossing_edges == NULL)
>>>> >>>      return 0;
>>>> >>>
>>>> >>> +  crtl->has_bb_partition = true;
>>>> >>> +
>>>> >>>    /* Make sure the source of any crossing edge ends in a jump and the
>>>> >>>       destination of any crossing edge has a label.  */
>>>> >>>    add_labels_and_missing_jumps (crossing_edges);
>>>> >>> Index: bb-reorder.h
>>>> >>> ===================================================================
>>>> >>> --- bb-reorder.h        (revision 193376)
>>>> >>> +++ bb-reorder.h        (working copy)
>>>> >>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re
>>>> >>>
>>>> >>>  extern int get_uncond_jump_length (void);
>>>> >>>
>>>> >>> +extern void insert_section_boundary_note (void);
>>>> >>> +
>>>> >>> +extern void emit_barrier_after_bb (basic_block bb);
>>>> >>> +
>>>> >>>  #endif
>>>> >>> Index: basic-block.h
>>>> >>> ===================================================================
>>>> >>> --- basic-block.h       (revision 193376)
>>>> >>> +++ basic-block.h       (working copy)
>>>> >>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect
>>>> >>>  extern bool contains_no_active_insn_p (const_basic_block);
>>>> >>>  extern bool forwarder_block_p (const_basic_block);
>>>> >>>  extern bool can_fallthru (basic_block, basic_block);
>>>> >>> +extern void fixup_partitions (void);
>>>> >>>
>>>> >>>  /* In cfgbuild.c.  */
>>>> >>>  extern void find_many_sub_basic_blocks (sbitmap);
>>>> >>> Index: cfgrtl.c
>>>> >>> ===================================================================
>>>> >>> --- cfgrtl.c    (revision 193376)
>>>> >>> +++ cfgrtl.c    (working copy)
>>>> >>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
>>>> >>>  #include "tree.h"
>>>> >>>  #include "hard-reg-set.h"
>>>> >>>  #include "basic-block.h"
>>>> >>> +#include "bb-reorder.h"
>>>> >>>  #include "regs.h"
>>>> >>>  #include "flags.h"
>>>> >>>  #include "function.h"
>>>> >>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3.  If not see
>>>> >>>     Only applicable if the CFG is in cfglayout mode.  */
>>>> >>>  static GTY(()) rtx cfg_layout_function_footer;
>>>> >>>  static GTY(()) rtx cfg_layout_function_header;
>>>> >>> +static bool had_sec_boundary_notes;
>>>> >>>
>>>> >>>  static rtx skip_insns_after_block (basic_block);
>>>> >>>  static void record_effective_endpoints (void);
>>>> >>>  static rtx label_for_bb (basic_block);
>>>> >>> -static void fixup_reorder_chain (void);
>>>> >>> +static void fixup_reorder_chain (bool finalize_reorder_blocks);
>>>> >>>
>>>> >>>  void verify_insn_chain (void);
>>>> >>>  static void fixup_fallthru_exit_predecessor (void);
>>>> >>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
>>>> >>>       partition boundaries).  See  the comments at the top of
>>>> >>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>>> >>>
>>>> >>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>>>> >>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>>>> >>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>>> >>>      return NULL;
>>>> >>>
>>>> >>>    /* We can replace or remove a complex jump only when we have exactly
>>>> >>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target)
>>>> >>>    return e;
>>>> >>>  }
>>>> >>>
>>>> >>> +/* Called when edge E has been redirected to a new destination,
>>>> >>> +   in order to update the region crossing flag on the edge and
>>>> >>> +   jump.  */
>>>> >>> +
>>>> >>> +static void
>>>> >>> +fixup_partition_crossing (edge e, basic_block target)
>>>> >>> +{
>>>> >>> +  rtx note;
>>>> >>> +
>>>> >>> +  gcc_assert (e->dest == target);
>>>> >>> +
>>>> >>> +  if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR)
>>>> >>> +    return;
>>>> >>> +  /* If we redirected an existing edge, it may already be marked
>>>> >>> +     crossing, even though the new src is missing a reg crossing note.
>>>> >>> +     But make sure reg crossing note doesn't already exist before
>>>> >>> +     inserting.  */
>>>> >>> +  if (BB_PARTITION (e->src) != BB_PARTITION (target))
>>>> >>> +    {
>>>> >>> +      e->flags |= EDGE_CROSSING;
>>>> >>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>>> >>> +      if (JUMP_P (BB_END (e->src))
>>>> >>> +          && !note)
>>>> >>> +        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>>> >>> +    }
>>>> >>> +  else if (BB_PARTITION (e->src) == BB_PARTITION (target))
>>>> >>> +    {
>>>> >>> +      e->flags &= ~EDGE_CROSSING;
>>>> >>> +      /* Remove the region crossing note from jump at end of
>>>> >>> +         e->src if it exists.  */
>>>> >>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>>> >>> +      if (note)
>>>> >>> +        remove_note (BB_END (e->src), note);
>>>> >>> +    }
>>>> >>> +}
>>>> >>> +
>>>> >>> +/* Called when block BB has been reassigned to a different partition,
>>>> >>> +   to ensure that the region crossing attributes are updated.  */
>>>> >>> +
>>>> >>> +static void
>>>> >>> +fixup_bb_partition (basic_block bb)
>>>> >>> +{
>>>> >>> +  edge e;
>>>> >>> +  edge_iterator ei;
>>>> >>> +
>>>> >>> +  /* Now need to make bb's pred edges non-region crossing.  */
>>>> >>> +  FOR_EACH_EDGE (e, ei, bb->preds)
>>>> >>> +    {
>>>> >>> +      fixup_partition_crossing (e, e->dest);
>>>> >>> +    }
>>>> >>> +
>>>> >>> +  /* Possibly need to make bb's successor edges region crossing,
>>>> >>> +     or remove stale region crossing.  */
>>>> >>> +  FOR_EACH_EDGE (e, ei, bb->succs)
>>>> >>> +    {
>>>> >>> +      if ((e->flags & EDGE_FALLTHRU)
>>>> >>> +          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
>>>> >>> +          && e->dest != EXIT_BLOCK_PTR)
>>>> >>> +        /* force_nonfallthru_and_redirect calls fixup_partition_crossing.  */
>>>> >>> +        force_nonfallthru (e);
>>>> >>> +      else
>>>> >>> +        fixup_partition_crossing (e, e->dest);
>>>> >>> +    }
>>>> >>> +}
>>>> >>> +
>>>> >>>  /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
>>>> >>>     expense of adding new instructions or reordering basic blocks.
>>>> >>>
>>>> >>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>>> >>>  {
>>>> >>>    edge ret;
>>>> >>>    basic_block src = e->src;
>>>> >>> +  basic_block dest = e->dest;
>>>> >>>
>>>> >>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>>> >>>      return NULL;
>>>> >>>
>>>> >>> -  if (e->dest == target)
>>>> >>> +  if (dest == target)
>>>> >>>      return e;
>>>> >>>
>>>> >>>    if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
>>>> >>>      {
>>>> >>>        df_set_bb_dirty (src);
>>>> >>> +      fixup_partition_crossing (ret, target);
>>>> >>>        return ret;
>>>> >>>      }
>>>> >>>
>>>> >>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>>> >>>      return NULL;
>>>> >>>
>>>> >>>    df_set_bb_dirty (src);
>>>> >>> +  fixup_partition_crossing (ret, target);
>>>> >>>    return ret;
>>>> >>>  }
>>>> >>>
>>>> >>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>>> >>>        /* Make sure new block ends up in correct hot/cold section.  */
>>>> >>>
>>>> >>>        BB_COPY_PARTITION (jump_block, e->src);
>>>> >>> -      if (flag_reorder_blocks_and_partition
>>>> >>> -         && targetm_common.have_named_sections
>>>> >>> -         && JUMP_P (BB_END (jump_block))
>>>> >>> -         && !any_condjump_p (BB_END (jump_block))
>>>> >>> -         && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
>>>> >>> -       add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
>>>> >>>
>>>> >>>        /* Wire edge in.  */
>>>> >>>        new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
>>>> >>>        new_edge->probability = probability;
>>>> >>>        new_edge->count = count;
>>>> >>>
>>>> >>> +      /* If e->src was previously region crossing, it no longer is
>>>> >>> +         and the reg crossing note should be removed.  */
>>>> >>> +      fixup_partition_crossing (new_edge, jump_block);
>>>> >>> +
>>>> >>>        /* Redirect old edge.  */
>>>> >>>        redirect_edge_pred (e, jump_block);
>>>> >>>        e->probability = REG_BR_PROB_BASE;
>>>> >>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>>> >>>        LABEL_NUSES (label)++;
>>>> >>>      }
>>>> >>>
>>>> >>> -  emit_barrier_after (BB_END (jump_block));
>>>> >>> +  /* We might be in cfg layout mode, and if so, the following routine will
>>>> >>> +     insert the barrier correctly.  */
>>>> >>> +  emit_barrier_after_bb (jump_block);
>>>> >>>    redirect_edge_succ_nodup (e, target);
>>>> >>>
>>>> >>>    if (abnormal_edge_flags)
>>>> >>>      make_edge (src, target, abnormal_edge_flags);
>>>> >>>
>>>> >>>    df_mark_solutions_dirty ();
>>>> >>> +  fixup_partition_crossing (e, target);
>>>> >>>    return new_bb;
>>>> >>>  }
>>>> >>>
>>>> >>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
>>>> >>>  static basic_block
>>>> >>>  rtl_split_edge (edge edge_in)
>>>> >>>  {
>>>> >>> -  basic_block bb;
>>>> >>> +  basic_block bb, new_bb;
>>>> >>>    rtx before;
>>>> >>>
>>>> >>>    /* Abnormal edges cannot be split.  */
>>>> >>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in)
>>>> >>>    else
>>>> >>>      {
>>>> >>>        bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
>>>> >>> -      /* ??? Why not edge_in->dest->prev_bb here?  */
>>>> >>> -      BB_COPY_PARTITION (bb, edge_in->dest);
>>>> >>> +      if (edge_in->src == ENTRY_BLOCK_PTR)
>>>> >>> +        BB_COPY_PARTITION (bb, edge_in->dest);
>>>> >>> +      else
>>>> >>> +        /* Put the split bb into the src partition, to avoid creating
>>>> >>> +           a situation where a cold bb dominates a hot bb, in the case
>>>> >>> +           where src is cold and dest is hot. The src will dominate
>>>> >>> +           the new bb (whereas it might not have dominated dest).  */
>>>> >>> +        BB_COPY_PARTITION (bb, edge_in->src);
>>>> >>>      }
>>>> >>>
>>>> >>>    make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);
>>>> >>>
>>>> >>> +  /* Can't allow a region crossing edge to be fallthrough.  */
>>>> >>> +  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
>>>> >>> +      && edge_in->dest != EXIT_BLOCK_PTR)
>>>> >>> +    {
>>>> >>> +      new_bb = force_nonfallthru (single_succ_edge (bb));
>>>> >>> +      gcc_assert (!new_bb);
>>>> >>> +    }
>>>> >>> +
>>>> >>>    /* For non-fallthru edges, we must adjust the predecessor's
>>>> >>>       jump instruction to target our new block.  */
>>>> >>>    if ((edge_in->flags & EDGE_FALLTHRU) == 0)
>>>> >>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e)
>>>> >>>    else
>>>> >>>      {
>>>> >>>        bb = split_edge (e);
>>>> >>> -      after = BB_END (bb);
>>>> >>>
>>>> >>> -      if (flag_reorder_blocks_and_partition
>>>> >>> -         && targetm_common.have_named_sections
>>>> >>> -         && e->src != ENTRY_BLOCK_PTR
>>>> >>> -         && BB_PARTITION (e->src) == BB_COLD_PARTITION
>>>> >>> -         && !(e->flags & EDGE_CROSSING)
>>>> >>> -         && JUMP_P (after)
>>>> >>> -         && !any_condjump_p (after)
>>>> >>> -         && (single_succ_edge (bb)->flags & EDGE_CROSSING))
>>>> >>> -       add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
>>>> >>> +      /* If e crossed a partition boundary, we needed to make bb end in
>>>> >>> +         a region-crossing jump, even though it was originally fallthru.  */
>>>> >>> +      if (JUMP_P (BB_END (bb)))
>>>> >>> +       before = BB_END (bb);
>>>> >>> +      else
>>>> >>> +        after = BB_END (bb);
>>>> >>>      }
>>>> >>>
>>>> >>>    /* Now that we've found the spot, do the insertion.  */
>>>> >>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void)
>>>> >>>  {
>>>> >>>    basic_block bb;
>>>> >>>
>>>> >>> +  /* Optimization passes that invoke this routine can cause hot blocks
>>>> >>> +     previously reached by both hot and cold blocks to become dominated only
>>>> >>> +     by cold blocks. This will cause the verification below to fail,
>>>> >>> +     and lead to now cold code in the hot section. In some cases this
>>>> >>> +     may only be visible after newly unreachable blocks are deleted,
>>>> >>> +     which will be done by fixup_partitions.  */
>>>> >>> +  fixup_partitions ();
>>>> >>> +
>>>> >>>  #ifdef ENABLE_CHECKING
>>>> >>>    verify_flow_info ();
>>>> >>>  #endif
>>>> >>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb)
>>>> >>>
>>>> >>>    return end;
>>>> >>>  }
>>>> >>> -
>>>> >>> +
>>>> >>> +/* Perform cleanup on the hot/cold bb partitioning after optimization
>>>> >>> +   passes that modify the cfg.  */
>>>> >>> +
>>>> >>> +void
>>>> >>> +fixup_partitions (void)
>>>> >>> +{
>>>> >>> +  basic_block bb;
>>>> >>> +
>>>> >>> +  if (!crtl->has_bb_partition)
>>>> >>> +    return;
>>>> >>> +
>>>> >>> +  /* Delete any blocks that became unreachable and weren't
>>>> >>> +     already cleaned up, for example during edge forwarding
>>>> >>> +     and convert_jumps_to_returns. This will expose more
>>>> >>> +     opportunities for fixing the partition boundaries here.
>>>> >>> +     Also, the calculation of the dominance graph during verification
>>>> >>> +     will assert if there are unreachable nodes.  */
>>>> >>> +  delete_unreachable_blocks ();
>>>> >>> +
>>>> >>> +  /* If there are partitions, do a sanity check on them: A basic block in
>>>> >>> +     a cold partition cannot dominate a basic block in a hot partition.
>>>> >>> +     Fixup any that now violate this requirement, as a result of edge
>>>> >>> +     forwarding and unreachable block deletion.  */
>>>> >>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>>>> >>> +  VEC (basic_block, heap) *bbs_to_fix = NULL;
>>>> >>> +  FOR_EACH_BB (bb)
>>>> >>> +    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>>>> >>> +      VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>>>> >>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>>>> >>> +    {
>>>> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>>> >>> +      basic_block son;
>>>> >>> +
>>>> >>> +      if (dom_calculated_here)
>>>> >>> +        calculate_dominance_info (CDI_DOMINATORS);
>>>> >>> +
>>>> >>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>>>> >>> +        {
>>>> >>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>>>> >>> +          /* If bb is not yet cold (because it was added below as
>>>> >>> +             a block dominated by a cold bb) then mark it cold here.  */
>>>> >>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>>>> >>> +            {
>>>> >>> +              BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>>> >>> +              VEC_safe_push (basic_block, heap, bbs_to_fix, bb);
>>>> >>> +            }
>>>> >>> +          /* Any blocks dominated by a block in the cold section
>>>> >>> +             must also be cold.  */
>>>> >>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>>>> >>> +               son;
>>>> >>> +               son = next_dom_son (CDI_DOMINATORS, son))
>>>> >>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>>>> >>> +        }
>>>> >>> +
>>>> >>> +      if (dom_calculated_here)
>>>> >>> +        free_dominance_info (CDI_DOMINATORS);
>>>> >>> +    }
>>>> >>> +
>>>> >>> +  /* Do the partition fixup after all necessary blocks have been converted to
>>>> >>> +     cold, so that we only update the region crossings the minimum number of
>>>> >>> +     places, which can require forcing edges to be non fallthru.  */
>>>> >>> +  while (! VEC_empty (basic_block, bbs_to_fix))
>>>> >>> +    {
>>>> >>> +      bb = VEC_pop (basic_block, bbs_to_fix);
>>>> >>> +      fixup_bb_partition (bb);
>>>> >>> +    }
>>>> >>> +}
>>>> >>> +
>>>> >>>  /* Verify the CFG and RTL consistency common for both underlying RTL and
>>>> >>>     cfglayout RTL.
>>>> >>>
>>>> >>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void)
>>>> >>>    rtx x;
>>>> >>>    int err = 0;
>>>> >>>    basic_block bb;
>>>> >>> +  bool have_partitions = false;
>>>> >>>
>>>> >>>    /* Check the general integrity of the basic blocks.  */
>>>> >>>    FOR_EACH_BB_REVERSE (bb)
>>>> >>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void)
>>>> >>>
>>>> >>>           if (e->flags & EDGE_ABNORMAL)
>>>> >>>             n_abnormal++;
>>>> >>> +
>>>> >>> +          have_partitions |= is_crossing;
>>>> >>>         }
>>>> >>>
>>>> >>>        if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
>>>> >>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void)
>>>> >>>           }
>>>> >>>      }
>>>> >>>
>>>> >>> +  /* If there are partitions, do a sanity check on them: A basic block in
>>>> >>> +     a cold partition cannot dominate a basic block in a hot partition.  */
>>>> >>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>>>> >>> +  if (have_partitions && !err)
>>>> >>> +    FOR_EACH_BB (bb)
>>>> >>> +      if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>>>> >>> +        VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>>>> >>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>>>> >>> +    {
>>>> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>>> >>> +      basic_block son;
>>>> >>> +
>>>> >>> +      if (dom_calculated_here)
>>>> >>> +        calculate_dominance_info (CDI_DOMINATORS);
>>>> >>> +
>>>> >>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>>>> >>> +        {
>>>> >>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>>>> >>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>>>> >>> +            {
>>>> >>> +              error ("non-cold basic block %d dominated "
>>>> >>> +                     "by a block in the cold partition", bb->index);
>>>> >>> +              err = 1;
>>>> >>> +            }
>>>> >>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>>>> >>> +               son;
>>>> >>> +               son = next_dom_son (CDI_DOMINATORS, son))
>>>> >>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>>>> >>> +        }
>>>> >>> +
>>>> >>> +      if (dom_calculated_here)
>>>> >>> +        free_dominance_info (CDI_DOMINATORS);
>>>> >>> +    }
>>>> >>> +
>>>> >>>    /* Clean up.  */
>>>> >>>    return err;
>>>> >>>  }
>>>> >>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void)
>>>> >>>    else
>>>> >>>      cfg_layout_function_header = NULL_RTX;
>>>> >>>
>>>> >>> +  had_sec_boundary_notes = false;
>>>> >>> +
>>>> >>>    next_insn = get_insns ();
>>>> >>>    FOR_EACH_BB (bb)
>>>> >>>      {
>>>> >>>        rtx end;
>>>> >>>
>>>> >>>        if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb))
>>>> >>> -       BB_HEADER (bb) = unlink_insn_chain (next_insn,
>>>> >>> -                                             PREV_INSN (BB_HEAD (bb)));
>>>> >>> +        {
>>>> >>> +          /* Rather than try to keep section boundary notes incrementally
>>>> >>> +             up-to-date through cfg layout optimizations, simply remove them
>>>> >>> +             and flag that they should be re-inserted when exiting
>>>> >>> +             cfg layout mode.  */
>>>> >>> +          rtx check_insn = next_insn;
>>>> >>> +          while (check_insn)
>>>> >>> +            {
>>>> >>> +              if (NOTE_P (check_insn)
>>>> >>> +                  && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
>>>> >>> +              {
>>>> >>> +                had_sec_boundary_notes |= true;
>>>> >>> +                /* Remove note from chain. Grab new next_insn first.  */
>>>> >>> +                if (next_insn == check_insn)
>>>> >>> +                  next_insn = NEXT_INSN (check_insn);
>>>> >>> +                /* Delete note.  */
>>>> >>> +                delete_insn (check_insn);
>>>> >>> +                /* There will only be one.  */
>>>> >>> +                break;
>>>> >>> +              }
>>>> >>> +              check_insn = NEXT_INSN (check_insn);
>>>> >>> +            }
>>>> >>> +          /* If we still have header instructions left after above loop.  */
>>>> >>> +          if (next_insn != BB_HEAD (bb))
>>>> >>> +            BB_HEADER (bb) = unlink_insn_chain (next_insn,
>>>> >>> +                                                PREV_INSN (BB_HEAD (bb)));
>>>> >>> +        }
>>>> >>>        end = skip_insns_after_block (bb);
>>>> >>>        if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end)
>>>> >>>         BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end);
>>>> >>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void)
>>>> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>>> >>>        bb->aux = bb->next_bb;
>>>> >>>
>>>> >>> -  cfg_layout_finalize ();
>>>> >>> +  cfg_layout_finalize (false);
>>>> >>>
>>>> >>>    return 0;
>>>> >>>  }
>>>> >>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode)
>>>> >>>  }
>>>> >>>
>>>> >>>
>>>> >>> -/* Given a reorder chain, rearrange the code to match.  */
>>>> >>> +/* Given a reorder chain, rearrange the code to match. If
>>>> >>> +   this is called when we will FINALIZE_REORDER_BLOCKS, or when
>>>> >>> +   section boundary notes were removed on entry to cfg layout
>>>> >>> +   mode, insert section boundary notes here.  */
>>>> >>>
>>>> >>>  static void
>>>> >>> -fixup_reorder_chain (void)
>>>> >>> +fixup_reorder_chain (bool finalize_reorder_blocks)
>>>> >>>  {
>>>> >>>    basic_block bb;
>>>> >>>    rtx insn = NULL;
>>>> >>> @@ -3150,7 +3373,7 @@ static void
>>>> >>>           PREV_INSN (BB_HEADER (bb)) = insn;
>>>> >>>           insn = BB_HEADER (bb);
>>>> >>>           while (NEXT_INSN (insn))
>>>> >>> -           insn = NEXT_INSN (insn);
>>>> >>> +            insn = NEXT_INSN (insn);
>>>> >>>         }
>>>> >>>        if (insn)
>>>> >>>         NEXT_INSN (insn) = BB_HEAD (bb);
>>>> >>> @@ -3175,6 +3398,11 @@ static void
>>>> >>>      insn = NEXT_INSN (insn);
>>>> >>>
>>>> >>>    set_last_insn (insn);
>>>> >>> +
>>>> >>> +  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>>>> >>> +  if (had_sec_boundary_notes || finalize_reorder_blocks)
>>>> >>> +    insert_section_boundary_note ();
>>>> >>> +
>>>> >>>  #ifdef ENABLE_CHECKING
>>>> >>>    verify_insn_chain ();
>>>> >>>  #endif
>>>> >>> @@ -3187,7 +3415,7 @@ static void
>>>> >>>        edge e_fall, e_taken, e;
>>>> >>>        rtx bb_end_insn;
>>>> >>>        rtx ret_label = NULL_RTX;
>>>> >>> -      basic_block nb, src_bb;
>>>> >>> +      basic_block nb;
>>>> >>>        edge_iterator ei;
>>>> >>>
>>>> >>>        if (EDGE_COUNT (bb->succs) == 0)
>>>> >>> @@ -3322,7 +3550,6 @@ static void
>>>> >>>        /* We got here if we need to add a new jump insn.
>>>> >>>          Note force_nonfallthru can delete E_FALL and thus we have to
>>>> >>>          save E_FALL->src prior to the call to force_nonfallthru.  */
>>>> >>> -      src_bb = e_fall->src;
>>>> >>>        nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
>>>> >>>        if (nb)
>>>> >>>         {
>>>> >>> @@ -3330,17 +3557,6 @@ static void
>>>> >>>           bb->aux = nb;
>>>> >>>           /* Don't process this new block.  */
>>>> >>>           bb = nb;
>>>> >>> -
>>>> >>> -         /* Make sure new bb is tagged for correct section (same as
>>>> >>> -            fall-thru source, since you cannot fall-thru across
>>>> >>> -            section boundaries).  */
>>>> >>> -         BB_COPY_PARTITION (src_bb, single_pred (bb));
>>>> >>> -         if (flag_reorder_blocks_and_partition
>>>> >>> -             && targetm_common.have_named_sections
>>>> >>> -             && JUMP_P (BB_END (bb))
>>>> >>> -             && !any_condjump_p (BB_END (bb))
>>>> >>> -             && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
>>>> >>> -           add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
>>>> >>>         }
>>>> >>>      }
>>>> >>>
>>>> >>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to)
>>>> >>>             case NOTE_INSN_FUNCTION_BEG:
>>>> >>>               /* There is always just single entry to function.  */
>>>> >>>             case NOTE_INSN_BASIC_BLOCK:
>>>> >>> +              /* We should only switch text sections once.  */
>>>> >>> +           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>>> >>>               break;
>>>> >>>
>>>> >>>             case NOTE_INSN_EPILOGUE_BEG:
>>>> >>> -           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>>> >>>               emit_note_copy (insn);
>>>> >>>               break;
>>>> >>>
>>>> >>> @@ -3759,10 +3976,13 @@ break_superblocks (void)
>>>> >>>  }
>>>> >>>
>>>> >>>  /* Finalize the changes: reorder insn list according to the sequence specified
>>>> >>> -   by aux pointers, enter compensation code, rebuild scope forest.  */
>>>> >>> +   by aux pointers, enter compensation code, rebuild scope forest. If
>>>> >>> +   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
>>>> >>> +   to fixup_reorder_chain so that it can insert the proper switch text
>>>> >>> +   section notes.  */
>>>> >>>
>>>> >>>  void
>>>> >>> -cfg_layout_finalize (void)
>>>> >>> +cfg_layout_finalize (bool finalize_reorder_blocks)
>>>> >>>  {
>>>> >>>  #ifdef ENABLE_CHECKING
>>>> >>>    verify_flow_info ();
>>>> >>> @@ -3775,7 +3995,7 @@ void
>>>> >>>  #endif
>>>> >>>        )
>>>> >>>      fixup_fallthru_exit_predecessor ();
>>>> >>> -  fixup_reorder_chain ();
>>>> >>> +  fixup_reorder_chain (finalize_reorder_blocks);
>>>> >>>
>>>> >>>    rebuild_jump_labels (get_insns ());
>>>> >>>    delete_dead_jumptables ();
>>>> >>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e)
>>>> >>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>>> >>>      return false;
>>>> >>>
>>>> >>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>>>> >>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>>>> >>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>>> >>>      return false;
>>>> >>>
>>>> >>>    if (!onlyjump_p (insn)
>>>> >>>
>>>> >>> --
>>>> >>> This patch is available for review at http://codereview.appspot.com/6823047
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>>>>
>>>>
>>>>
>>>> --
>>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Christophe Lyon Nov. 28, 2012, 5:02 p.m. UTC | #8
Yes, I have configured GCC with:
--target=arm-none-linux-gnueabi--with-cpu=cortex-a9 --with-fpu=neon
--with-float=softfp

Thanks,

Christophe.

On 28 November 2012 16:56, Teresa Johnson <tejohnson@google.com> wrote:
> Is this with the same target compiler and options used in PR55121? I
> will try to reproduce the compile-time failures with arm and those
> options if so. I haven't seen those with spec2006 linux x86_64. I'm
> not sure how to test the runtime behavior though.
>
> Thanks,
> Teresa
>
> On Wed, Nov 28, 2012 at 7:48 AM, Christophe Lyon
> <christophe.lyon@linaro.org> wrote:
>> I have updated my trunk checkout, and I can confirm that eval.c now
>> compiles with your patch (and the other 4 patches I added to PR55121).
>>
>> Now, when looking at the whole Spec2k results:
>> - vpr passes now (used to fail)
>> - gcc, parser, perlbmk bzip2 and twolf no longer build: they all fail
>> with the same error from gas:
>> can't resolve `.text.unlikely' {.text.unlikely section} - `.LBB171'
>> {.text section}
>> - gap still does not build (same error as above)
>>
>> I haven't looked in detail, so I may be missing an obvious patch here.
>>
>> And I still observe runtime mis-behaviour on crafty, galgel, facerec and fma3d.
>>
>> Thanks
>> Christophe.
>>
>>
>> On 26 November 2012 21:52, Teresa Johnson <tejohnson@google.com> wrote:
>>> Sorry, I don't know what happened there. Patch is attached.
>>> Thanks,
>>> Teresa
>>>
>>> On Mon, Nov 26, 2012 at 12:42 PM, Jack Howarth <howarth@bromo.med.uc.edu> wrote:
>>>> On Mon, Nov 26, 2012 at 12:19:55PM -0800, Teresa Johnson wrote:
>>>>> Are you sure you have all my changes applied? I applied the 4 patches
>>>>> attached to PR55121 into my trunk checkout that has my fixes, and to a
>>>>> pristine trunk checkout. I configured and built both for
>>>>> --target=arm-none-linux-gnueabi, and built using your options, .i file
>>>>> and gcda file. I can reproduce the failure using the pristine trunk
>>>>> with your patches but not with my fixed trunk + your patches. (I just
>>>>> updated to head to pickup recent changes and get the same result. The
>>>>> vec changes required some manual changes to the patch, which I will
>>>>> resend shortly.)
>>>>
>>>> Teresa,
>>>>     Your mailer seems to have corrupted the posted patch with stray
>>>> =3D characters and line breaks. Can you repost a copy as an attachment
>>>> to the list?
>>>>              Jack
>>>>
>>>>>
>>>>> Without my fixes:
>>>>>
>>>>> $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreproce
>>>>> ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
>>>>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
>>>>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
>>>>> -fno-common -o eval.s -freorder-blocks-and-partition
>>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>>>> 2.4.2-p1, MPC version 0.8.1
>>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>>>> 2.4.2-p1, MPC version 0.8.1
>>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>>>> Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a
>>>>> eval.c: In function ‘Ge’:
>>>>> eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560
>>>>>  }
>>>>>  ^
>>>>> 0x622f71 df_compact_blocks()
>>>>> ../../gcc_trunk_3/gcc/df-core.c:1560
>>>>> 0x5cfcb5 compact_blocks()
>>>>> ../../gcc_trunk_3/gcc/cfg.c:162
>>>>> 0xc9dce0 reorder_basic_blocks
>>>>> ../../gcc_trunk_3/gcc/bb-reorder.c:2154
>>>>> 0xc9dce0 rest_of_handle_reorder_blocks
>>>>> ../../gcc_trunk_3/gcc/bb-reorder.c:2219
>>>>> Please submit a full bug report,
>>>>> with preprocessed source if appropriate.
>>>>> Please include the complete backtrace with any bug report.
>>>>> See <http://gcc.gnu.org/bugs.html> for instructions.
>>>>>
>>>>>
>>>>> With my fixes:
>>>>>
>>>>> $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreproce
>>>>> ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
>>>>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
>>>>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
>>>>> -fno-common -o eval.s -freorder-blocks-and-partition
>>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>>>> 2.4.2-p1, MPC version 0.8.1
>>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>>>> 2.4.2-p1, MPC version 0.8.1
>>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>>>> Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Teresa
>>>>>
>>>>> On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon
>>>>> <christophe.lyon@linaro.org> wrote:
>>>>> > Hi,
>>>>> >
>>>>> > I have tested your patch on Spec2000 on ARM, and I can still see
>>>>> > several failures caused by:
>>>>> > "error: fallthru edge crosses section boundary", including the case
>>>>> > described in PR55121.
>>>>> >
>>>>> > On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote:
>>>>> >> Ping.
>>>>> >> Teresa
>>>>> >>
>>>>> >> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>>>> >>> Revised patch that fixes failures encountered when enabling
>>>>> >>> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>>>>> >>>
>>>>> >>> This includes new verification code to ensure no cold blocks dominate hot
>>>>> >>> blocks contributed by Steven Bosscher.
>>>>> >>>
>>>>> >>> I attempted to make the handling of partition updates through the optimization
>>>>> >>> passes much more consistent, removing a number of partial fixes in the code
>>>>> >>> stream in the process. The code to fixup partitions (including the BB_PARTITION
>>>>> >>> assignement, region crossing jump notes, and switch text section notes) is
>>>>> >>> now handled in a few centralized locations. For example, inside
>>>>> >>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
>>>>> >>> don't need to attempt the fixup themselves.
>>>>> >>>
>>>>> >>> For optimization passes that make adjustments to the cfg while in cfg layout
>>>>> >>> mode that are not easy to fix up incrementally, the new routine
>>>>> >>> fixup_partitions handles the cleanup globally. This does require calculation
>>>>> >>> of the dominance relation, however, as far as I can tell the routines which
>>>>> >>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
>>>>> >>> are invoked typically once (or a small number of times in the case of
>>>>> >>> try_optimize_cfg) per optimization pass. Additionally, I compared the
>>>>> >>> -ftime-report output for some large fdo compilations and saw only minimal
>>>>> >>> increases in the dominance computation times, which were only a tiny percent
>>>>> >>> of the overall compile time.
>>>>> >>>
>>>>> >>> Additionally, I added a flag to the rtl_data structure to indicate whether
>>>>> >>> any partitioning was actually performed, so that optimizations which were
>>>>> >>> conservatively disabled whenever the flag_reorder_blocks_and_partition
>>>>> >>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
>>>>> >>> conservative for functions where no partitions were formed (e.g. they are
>>>>> >>> completely hot).
>>>>> >>>
>>>>> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
>>>>> >>> benchmarks and internal google benchmarks using profile feedback and
>>>>> >>> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
>>>>> >>>
>>>>> >>> Thanks,
>>>>> >>> Teresa
>>>>> >>>
>>>>> >>> 2012-11-14  Teresa Johnson  <tejohnson@google.com>
>>>>> >>>             Steven Bosscher  <steven@gcc.gnu.org>
>>>>> >>>
>>>>> >>>         * cfghooks.h (cfg_layout_finalize): New parameter.
>>>>> >>>         * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
>>>>> >>>         parameter.
>>>>> >>>         * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
>>>>> >>>         as this is now done by redirect_edge_and_branch_force.
>>>>> >>>         * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
>>>>> >>>         barriers, new cfg_layout_finalize parameter, and don't store exit
>>>>> >>>         predecessor BB until after it is potentially split.
>>>>> >>>         * function.h (struct rtl_data): New flag has_bb_partition.
>>>>> >>>         * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
>>>>> >>>         * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
>>>>> >>>         any blocks in function actually partitioned.
>>>>> >>>         (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
>>>>> >>>         up partitioning.
>>>>> >>>         * bb-reorder.c (connect_traces): Only look for partitions and skip
>>>>> >>>         block copying if any blocks in function actually partitioned.
>>>>> >>>         (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
>>>>> >>>         (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
>>>>> >>>         that no cold blocks dominate a hot block.
>>>>> >>>         (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
>>>>> >>>         as this is now done by force_nonfallthru_and_redirect.
>>>>> >>>         (add_reg_crossing_jump_notes): Handle the fact that some jumps may
>>>>> >>>         already be marked with region crossing note.
>>>>> >>>         (reorder_basic_blocks): Only need to verify partitions if any
>>>>> >>>         blocks in function actually partitioned.
>>>>> >>>         (insert_section_boundary_note): Only need to insert note if any
>>>>> >>>         blocks in function actually partitioned.
>>>>> >>>         (rest_of_handle_reorder_blocks): New cfg_layout_finalize
>>>>> >>>         parameter, and remove call to insert_section_boundary_note as this
>>>>> >>>         is now called via cfg_layout_finalize/fixup_reorder_chain.
>>>>> >>>         (duplicate_computed_gotos): New cfg_layout_finalize
>>>>> >>>         parameter.
>>>>> >>>         (partition_hot_cold_basic_blocks): Set flag indicating function
>>>>> >>>         has bb partitions.
>>>>> >>>         * bb-reorder.h: Declare insert_section_boundary_note and
>>>>> >>>         emit_barrier_after_bb, which are no longer static.
>>>>> >>>         * basic-block.h: Declare new function fixup_partitions.
>>>>> >>>         * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
>>>>> >>>         check for region crossing note.
>>>>> >>>         (fixup_partition_crossing): New function.
>>>>> >>>         (fixup_bb_partition): Ditto.
>>>>> >>>         (rtl_redirect_edge_and_branch): Fixup partition boundaries.
>>>>> >>>         (force_nonfallthru_and_redirect): Fixup partition boundaries,
>>>>> >>>         remove old code that tried to do this. Emit barrier correctly
>>>>> >>>         when we are in cfglayout mode.
>>>>> >>>         (rtl_split_edge): Correctly fixup partition boundaries.
>>>>> >>>         (commit_one_edge_insertion): Remove old code that tried to
>>>>> >>>         fixup region crossing edge since this is now handled in
>>>>> >>>         split_block, and set up insertion point correctly since
>>>>> >>>         block may now end in a jump.
>>>>> >>>         (commit_edge_insertions): Invoke fixup_partitions to sanitize partition
>>>>> >>>         boundaries after optimizations that modify cfg and before trying to
>>>>> >>>         verify the flow info.
>>>>> >>>         (fixup_partitions): New function.
>>>>> >>>         (rtl_verify_flow_info_1): Add verification that no cold bbs dominate
>>>>> >>>         hot bbs.
>>>>> >>>         (record_effective_endpoints): Remove region-crossing notes and set flag
>>>>> >>>         indicating that they need to be reinserted on exit from cfglayout mode.
>>>>> >>>         (outof_cfg_layout_mode): New cfg_layout_finalize parameter.
>>>>> >>>         (fixup_reorder_chain): Call insert_section_boundary_note if necessary.
>>>>> >>>         Remove old code that attempted to fixup region crossing note as
>>>>> >>>         this is now handled in force_nonfallthru_and_redirect.
>>>>> >>>         (duplicate_insn_chain): Don't duplicate switch section notes.
>>>>> >>>         (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
>>>>> >>>         (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
>>>>> >>>         note.
>>>>> >>>
>>>>> >>> Index: cfghooks.h
>>>>> >>> ===================================================================
>>>>> >>> --- cfghooks.h  (revision 193376)
>>>>> >>> +++ cfghooks.h  (working copy)
>>>>> >>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
>>>>> >>>  void account_profile_record (struct profile_record *, int);
>>>>> >>>
>>>>> >>>  extern void cfg_layout_initialize (unsigned int);
>>>>> >>> -extern void cfg_layout_finalize (void);
>>>>> >>> +extern void cfg_layout_finalize (bool);
>>>>> >>>
>>>>> >>>  /* Hooks containers.  */
>>>>> >>>  extern struct cfg_hooks gimple_cfg_hooks;
>>>>> >>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi
>>>>> >>>  extern void gimple_register_cfg_hooks (void);
>>>>> >>>  extern struct cfg_hooks get_cfg_hooks (void);
>>>>> >>>  extern void set_cfg_hooks (struct cfg_hooks);
>>>>> >>> -
>>>>> >>> Index: modulo-sched.c
>>>>> >>> ===================================================================
>>>>> >>> --- modulo-sched.c      (revision 193376)
>>>>> >>> +++ modulo-sched.c      (working copy)
>>>>> >>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void)
>>>>> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>>>> >>>        bb->aux = bb->next_bb;
>>>>> >>>    free_dominance_info (CDI_DOMINATORS);
>>>>> >>> -  cfg_layout_finalize ();
>>>>> >>> +  cfg_layout_finalize (false);
>>>>> >>>  #endif /* INSN_SCHEDULING */
>>>>> >>>    return 0;
>>>>> >>>  }
>>>>> >>> Index: ifcvt.c
>>>>> >>> ===================================================================
>>>>> >>> --- ifcvt.c     (revision 193376)
>>>>> >>> +++ ifcvt.c     (working copy)
>>>>> >>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
>>>>> >>>    if (new_bb)
>>>>> >>>      {
>>>>> >>>        df_bb_replace (then_bb_index, new_bb);
>>>>> >>> -      /* Since the fallthru edge was redirected from test_bb to new_bb,
>>>>> >>> -         we need to ensure that new_bb is in the same partition as
>>>>> >>> -         test bb (you can not fall through across section boundaries).  */
>>>>> >>> -      BB_COPY_PARTITION (new_bb, test_bb);
>>>>> >>> +      /* This should have been done above via force_nonfallthru_and_redirect
>>>>> >>> +         (possibly called from redirect_edge_and_branch_force).  */
>>>>> >>> +      gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
>>>>> >>>      }
>>>>> >>>
>>>>> >>>    num_true_changes++;
>>>>> >>> Index: function.c
>>>>> >>> ===================================================================
>>>>> >>> --- function.c  (revision 193376)
>>>>> >>> +++ function.c  (working copy)
>>>>> >>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void)
>>>>> >>>                     break;
>>>>> >>>                 if (e)
>>>>> >>>                   {
>>>>> >>> -                   copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
>>>>> >>> -                                                 NULL_RTX, e->src);
>>>>> >>> +                    /* Make sure we insert after any barriers.  */
>>>>> >>> +                    rtx end = get_last_bb_insn (e->src);
>>>>> >>> +                    copy_bb = create_basic_block (NEXT_INSN (end),
>>>>> >>> +                                                  NULL_RTX, e->src);
>>>>> >>>                     BB_COPY_PARTITION (copy_bb, e->src);
>>>>> >>>                   }
>>>>> >>>                 else
>>>>> >>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void)
>>>>> >>>         if (cur_bb->index >= NUM_FIXED_BLOCKS
>>>>> >>>             && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS)
>>>>> >>>           cur_bb->aux = cur_bb->next_bb;
>>>>> >>> -      cfg_layout_finalize ();
>>>>> >>> +      cfg_layout_finalize (false);
>>>>> >>>      }
>>>>> >>>
>>>>> >>>  epilogue_done:
>>>>> >>> @@ -6517,7 +6519,7 @@ epilogue_done:
>>>>> >>>        basic_block simple_return_block_cold = NULL;
>>>>> >>>        edge pending_edge_hot = NULL;
>>>>> >>>        edge pending_edge_cold = NULL;
>>>>> >>> -      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>>>> >>> +      basic_block exit_pred;
>>>>> >>>        int i;
>>>>> >>>
>>>>> >>>        gcc_assert (entry_edge != orig_entry_edge);
>>>>> >>> @@ -6545,6 +6547,12 @@ epilogue_done:
>>>>> >>>             else
>>>>> >>>               pending_edge_cold = e;
>>>>> >>>           }
>>>>> >>> +
>>>>> >>> +      /* Save a pointer to the exit's predecessor BB for use in
>>>>> >>> +         inserting new BBs at the end of the function. Do this
>>>>> >>> +         after the call to split_block above which may split
>>>>> >>> +         the original exit pred.  */
>>>>> >>> +      exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>>>> >>>
>>>>> >>>        FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
>>>>> >>>         {
>>>>> >>> Index: function.h
>>>>> >>> ===================================================================
>>>>> >>> --- function.h  (revision 193376)
>>>>> >>> +++ function.h  (working copy)
>>>>> >>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data {
>>>>> >>>       sched2) and is useful only if the port defines LEAF_REGISTERS.  */
>>>>> >>>    bool uses_only_leaf_regs;
>>>>> >>>
>>>>> >>> +  /* Nonzero if the function being compiled has undergone hot/cold partitioning
>>>>> >>> +     (under flag_reorder_blocks_and_partition) and has at least one cold
>>>>> >>> +     block.  */
>>>>> >>> +  bool has_bb_partition;
>>>>> >>> +
>>>>> >>>    /* Like regs_ever_live, but 1 if a reg is set or clobbered from an
>>>>> >>>       asm.  Unlike regs_ever_live, elements of this array corresponding
>>>>> >>>       to eliminable regs (like the frame pointer) are set if an asm
>>>>> >>> Index: hw-doloop.c
>>>>> >>> ===================================================================
>>>>> >>> --- hw-doloop.c (revision 193376)
>>>>> >>> +++ hw-doloop.c (working copy)
>>>>> >>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops)
>>>>> >>>        else
>>>>> >>>         bb->aux = NULL;
>>>>> >>>      }
>>>>> >>> -  cfg_layout_finalize ();
>>>>> >>> +  cfg_layout_finalize (false);
>>>>> >>>    clear_aux_for_blocks ();
>>>>> >>>    df_analyze ();
>>>>> >>>  }
>>>>> >>> Index: cfgcleanup.c
>>>>> >>> ===================================================================
>>>>> >>> --- cfgcleanup.c        (revision 193376)
>>>>> >>> +++ cfgcleanup.c        (working copy)
>>>>> >>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
>>>>> >>>       partition boundaries).  See the comments at the top of
>>>>> >>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>>>> >>>
>>>>> >>> -  if (flag_reorder_blocks_and_partition && reload_completed)
>>>>> >>> +  if (crtl->has_bb_partition && reload_completed)
>>>>> >>>      return false;
>>>>> >>>
>>>>> >>>    /* Search backward through forwarder blocks.  We don't need to worry
>>>>> >>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode)
>>>>> >>>               df_analyze ();
>>>>> >>>             }
>>>>> >>>
>>>>> >>> +         if (changed)
>>>>> >>> +            {
>>>>> >>> +              /* Edge forwarding in particular can cause hot blocks previously
>>>>> >>> +                 reached by both hot and cold blocks to become dominated only
>>>>> >>> +                 by cold blocks. This will cause the verification below to fail,
>>>>> >>> +                 and lead to now cold code in the hot section. This is not easy
>>>>> >>> +                 to detect and fix during edge forwarding, and in some cases
>>>>> >>> +                 is only visible after newly unreachable blocks are deleted,
>>>>> >>> +                 which will be done in fixup_partitions.  */
>>>>> >>> +              fixup_partitions ();
>>>>> >>> +
>>>>> >>>  #ifdef ENABLE_CHECKING
>>>>> >>> -         if (changed)
>>>>> >>> -           verify_flow_info ();
>>>>> >>> +              verify_flow_info ();
>>>>> >>>  #endif
>>>>> >>> +            }
>>>>> >>>
>>>>> >>>           changed_overall |= changed;
>>>>> >>>           first_pass = false;
>>>>> >>> Index: bb-reorder.c
>>>>> >>> ===================================================================
>>>>> >>> --- bb-reorder.c        (revision 193376)
>>>>> >>> +++ bb-reorder.c        (working copy)
>>>>> >>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces
>>>>> >>>    current_partition = BB_PARTITION (traces[0].first);
>>>>> >>>    two_passes = false;
>>>>> >>>
>>>>> >>> -  if (flag_reorder_blocks_and_partition)
>>>>> >>> +  if (crtl->has_bb_partition)
>>>>> >>>      for (i = 0; i < n_traces && !two_passes; i++)
>>>>> >>>        if (BB_PARTITION (traces[0].first)
>>>>> >>>           != BB_PARTITION (traces[i].first))
>>>>> >>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces
>>>>> >>>                       }
>>>>> >>>                   }
>>>>> >>>
>>>>> >>> -             if (flag_reorder_blocks_and_partition)
>>>>> >>> +             if (crtl->has_bb_partition)
>>>>> >>>                 try_copy = false;
>>>>> >>>
>>>>> >>>               /* Copy tiny blocks always; copy larger blocks only when the
>>>>> >>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void)
>>>>> >>>    return length;
>>>>> >>>  }
>>>>> >>>
>>>>> >>> -/* Emit a barrier into the footer of BB.  */
>>>>> >>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
>>>>> >>>
>>>>> >>> -static void
>>>>> >>> +void
>>>>> >>>  emit_barrier_after_bb (basic_block bb)
>>>>> >>>  {
>>>>> >>>    rtx barrier = emit_barrier_after (BB_END (bb));
>>>>> >>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>>>> >>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>>>>> >>> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>>>> >>>  }
>>>>> >>>
>>>>> >>>  /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
>>>>> >>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg
>>>>> >>>  {
>>>>> >>>    VEC(edge, heap) *crossing_edges = NULL;
>>>>> >>>    basic_block bb;
>>>>> >>> -  edge e;
>>>>> >>> -  edge_iterator ei;
>>>>> >>> +  edge e, e2;
>>>>> >>> +  edge_iterator ei, ei2;
>>>>> >>> +  unsigned int cold_bb_count = 0;
>>>>> >>> +  VEC (basic_block, heap) *bbs_in_hot_partition = NULL;
>>>>> >>> +  VEC (basic_block, heap) *bbs_newly_hot = NULL;
>>>>> >>>
>>>>> >>>    /* Mark which partition (hot/cold) each basic block belongs in.  */
>>>>> >>>    FOR_EACH_BB (bb)
>>>>> >>>      {
>>>>> >>>        if (probably_never_executed_bb_p (cfun, bb))
>>>>> >>> -       BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>>>> >>> +        {
>>>>> >>> +          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>>>> >>> +          cold_bb_count++;
>>>>> >>> +        }
>>>>> >>>        else
>>>>> >>> -       BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>>>>> >>> +        {
>>>>> >>> +          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>>>>> >>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb);
>>>>> >>> +        }
>>>>> >>>      }
>>>>> >>>
>>>>> >>> +  /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of
>>>>> >>> +     several different possibilities. One is that there are edge weight insanities
>>>>> >>> +     due to optimization phases that do not properly update basic block profile
>>>>> >>> +     counts. The second is that the entry of the function may not be hot, because
>>>>> >>> +     it is entered fewer times than the number of profile training runs, but there
>>>>> >>> +     is a loop inside the function that causes blocks within the function to be
>>>>> >>> +     above the threshold for hotness.  */
>>>>> >>> +  if (cold_bb_count)
>>>>> >>> +    {
>>>>> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>>>> >>> +
>>>>> >>> +      if (dom_calculated_here)
>>>>> >>> +        calculate_dominance_info (CDI_DOMINATORS);
>>>>> >>> +
>>>>> >>> +      /* Keep examining hot bbs until we have either checked them all, or
>>>>> >>> +         re-marked all cold bbs hot.  */
>>>>> >>> +      while (! VEC_empty (basic_block, bbs_in_hot_partition)
>>>>> >>> +             && cold_bb_count)
>>>>> >>> +        {
>>>>> >>> +          basic_block dom_bb;
>>>>> >>> +
>>>>> >>> +          bb = VEC_pop (basic_block, bbs_in_hot_partition);
>>>>> >>> +          dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
>>>>> >>> +
>>>>> >>> +          /* If bb's immediate dominator is also hot then it is ok.  */
>>>>> >>> +          if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
>>>>> >>> +            continue;
>>>>> >>> +
>>>>> >>> +          /* We have a hot bb with an immediate dominator that is cold.
>>>>> >>> +             The dominator needs to be re-marked to hot.  */
>>>>> >>> +          BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION);
>>>>> >>> +          cold_bb_count--;
>>>>> >>> +
>>>>> >>> +          /* Now we need to examine newly-hot dom_bb to see if it is also
>>>>> >>> +             dominated by a cold bb.  */
>>>>> >>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb);
>>>>> >>> +
>>>>> >>> +          /* We should also adjust any cold blocks that the newly-hot bb
>>>>> >>> +             feeds and see if it makes sense to re-mark those as hot as
>>>>> >>> +             well.  */
>>>>> >>> +          VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb);
>>>>> >>> +          while (! VEC_empty (basic_block, bbs_newly_hot))
>>>>> >>> +            {
>>>>> >>> +              basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot);
>>>>> >>> +              /* Examine all successors of this newly-hot bb to see if they
>>>>> >>> +                 are cold and should be re-marked as hot.  */
>>>>> >>> +              FOR_EACH_EDGE (e, ei, new_hot_bb->succs)
>>>>> >>> +                {
>>>>> >>> +                  bool any_cold_preds = false;
>>>>> >>> +                  basic_block succ = e->dest;
>>>>> >>> +                  if (BB_PARTITION (succ) != BB_COLD_PARTITION)
>>>>> >>> +                    continue;
>>>>> >>> +                  /* Does this block have any cold predecessors now?  */
>>>>> >>> +                  FOR_EACH_EDGE (e2, ei2, succ->preds)
>>>>> >>> +                  {
>>>>> >>> +                    if (BB_PARTITION (e2->src) == BB_COLD_PARTITION)
>>>>> >>> +                      {
>>>>> >>> +                        any_cold_preds = true;
>>>>> >>> +                        break;
>>>>> >>> +                      }
>>>>> >>> +                  }
>>>>> >>> +                  if (any_cold_preds)
>>>>> >>> +                    continue;
>>>>> >>> +
>>>>> >>> +                  /* Here we have a successor of newly-hot bb that is cold
>>>>> >>> +                     but no longer has any cold precessessors. Since the original
>>>>> >>> +                     assignment of our newly-hot bb was incorrect, this successor's
>>>>> >>> +                     assignment as cold is also suspect. Go ahead and re-mark it
>>>>> >>> +                     as hot now too. Better heuristics may be in order here.  */
>>>>> >>> +                  BB_SET_PARTITION (succ, BB_HOT_PARTITION);
>>>>> >>> +                  cold_bb_count--;
>>>>> >>> +                  VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ);
>>>>> >>> +                  /* Examine this successor as a newly-hot bb.  */
>>>>> >>> +                  VEC_safe_push (basic_block, heap, bbs_newly_hot, succ);
>>>>> >>> +                }
>>>>> >>> +            }
>>>>> >>> +        }
>>>>> >>> +
>>>>> >>> +      if (dom_calculated_here)
>>>>> >>> +        free_dominance_info (CDI_DOMINATORS);
>>>>> >>> +    }
>>>>> >>> +
>>>>> >>>    /* The format of .gcc_except_table does not allow landing pads to
>>>>> >>>       be in a different partition as the throw.  Fix this by either
>>>>> >>>       moving or duplicating the landing pads.  */
>>>>> >>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void)
>>>>> >>>                       new_bb->aux = cur_bb->aux;
>>>>> >>>                       cur_bb->aux = new_bb;
>>>>> >>>
>>>>> >>> -                     /* Make sure new fall-through bb is in same
>>>>> >>> -                        partition as bb it's falling through from.  */
>>>>> >>> +                      /* This is done by force_nonfallthru_and_redirect.  */
>>>>> >>> +                     gcc_assert (BB_PARTITION (new_bb)
>>>>> >>> +                                  == BB_PARTITION (cur_bb));
>>>>> >>>
>>>>> >>> -                     BB_COPY_PARTITION (new_bb, cur_bb);
>>>>> >>>                       single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
>>>>> >>>                     }
>>>>> >>>                   else
>>>>> >>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void)
>>>>> >>>    FOR_EACH_BB (bb)
>>>>> >>>      FOR_EACH_EDGE (e, ei, bb->succs)
>>>>> >>>        if ((e->flags & EDGE_CROSSING)
>>>>> >>> -         && JUMP_P (BB_END (e->src)))
>>>>> >>> +         && JUMP_P (BB_END (e->src))
>>>>> >>> +          /* Some notes were added during fix_up_fall_thru_edges, via
>>>>> >>> +             force_nonfallthru_and_redirect.  */
>>>>> >>> +          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
>>>>> >>>         add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>>>> >>>  }
>>>>> >>>
>>>>> >>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void)
>>>>> >>>        dump_flow_info (dump_file, dump_flags);
>>>>> >>>      }
>>>>> >>>
>>>>> >>> -  if (flag_reorder_blocks_and_partition)
>>>>> >>> +  if (crtl->has_bb_partition)
>>>>> >>>      verify_hot_cold_block_grouping ();
>>>>> >>>  }
>>>>> >>>
>>>>> >>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void)
>>>>> >>>     encountering this note will make the compiler switch between the
>>>>> >>>     hot and cold text sections.  */
>>>>> >>>
>>>>> >>> -static void
>>>>> >>> +void
>>>>> >>>  insert_section_boundary_note (void)
>>>>> >>>  {
>>>>> >>>    basic_block bb;
>>>>> >>>    rtx new_note;
>>>>> >>>    int first_partition = 0;
>>>>> >>>
>>>>> >>> -  if (!flag_reorder_blocks_and_partition)
>>>>> >>> +  if (!crtl->has_bb_partition)
>>>>> >>>      return;
>>>>> >>>
>>>>> >>>    FOR_EACH_BB (bb)
>>>>> >>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void)
>>>>> >>>    FOR_EACH_BB (bb)
>>>>> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>>>> >>>        bb->aux = bb->next_bb;
>>>>> >>> -  cfg_layout_finalize ();
>>>>> >>> +  cfg_layout_finalize (true);
>>>>> >>>
>>>>> >>> -  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>>>>> >>> -  insert_section_boundary_note ();
>>>>> >>>    return 0;
>>>>> >>>  }
>>>>> >>>
>>>>> >>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void)
>>>>> >>>      }
>>>>> >>>
>>>>> >>>  done:
>>>>> >>> -  cfg_layout_finalize ();
>>>>> >>> +  cfg_layout_finalize (false);
>>>>> >>>
>>>>> >>>    BITMAP_FREE (candidates);
>>>>> >>>    return 0;
>>>>> >>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void)
>>>>> >>>    if (crossing_edges == NULL)
>>>>> >>>      return 0;
>>>>> >>>
>>>>> >>> +  crtl->has_bb_partition = true;
>>>>> >>> +
>>>>> >>>    /* Make sure the source of any crossing edge ends in a jump and the
>>>>> >>>       destination of any crossing edge has a label.  */
>>>>> >>>    add_labels_and_missing_jumps (crossing_edges);
>>>>> >>> Index: bb-reorder.h
>>>>> >>> ===================================================================
>>>>> >>> --- bb-reorder.h        (revision 193376)
>>>>> >>> +++ bb-reorder.h        (working copy)
>>>>> >>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re
>>>>> >>>
>>>>> >>>  extern int get_uncond_jump_length (void);
>>>>> >>>
>>>>> >>> +extern void insert_section_boundary_note (void);
>>>>> >>> +
>>>>> >>> +extern void emit_barrier_after_bb (basic_block bb);
>>>>> >>> +
>>>>> >>>  #endif
>>>>> >>> Index: basic-block.h
>>>>> >>> ===================================================================
>>>>> >>> --- basic-block.h       (revision 193376)
>>>>> >>> +++ basic-block.h       (working copy)
>>>>> >>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect
>>>>> >>>  extern bool contains_no_active_insn_p (const_basic_block);
>>>>> >>>  extern bool forwarder_block_p (const_basic_block);
>>>>> >>>  extern bool can_fallthru (basic_block, basic_block);
>>>>> >>> +extern void fixup_partitions (void);
>>>>> >>>
>>>>> >>>  /* In cfgbuild.c.  */
>>>>> >>>  extern void find_many_sub_basic_blocks (sbitmap);
>>>>> >>> Index: cfgrtl.c
>>>>> >>> ===================================================================
>>>>> >>> --- cfgrtl.c    (revision 193376)
>>>>> >>> +++ cfgrtl.c    (working copy)
>>>>> >>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
>>>>> >>>  #include "tree.h"
>>>>> >>>  #include "hard-reg-set.h"
>>>>> >>>  #include "basic-block.h"
>>>>> >>> +#include "bb-reorder.h"
>>>>> >>>  #include "regs.h"
>>>>> >>>  #include "flags.h"
>>>>> >>>  #include "function.h"
>>>>> >>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3.  If not see
>>>>> >>>     Only applicable if the CFG is in cfglayout mode.  */
>>>>> >>>  static GTY(()) rtx cfg_layout_function_footer;
>>>>> >>>  static GTY(()) rtx cfg_layout_function_header;
>>>>> >>> +static bool had_sec_boundary_notes;
>>>>> >>>
>>>>> >>>  static rtx skip_insns_after_block (basic_block);
>>>>> >>>  static void record_effective_endpoints (void);
>>>>> >>>  static rtx label_for_bb (basic_block);
>>>>> >>> -static void fixup_reorder_chain (void);
>>>>> >>> +static void fixup_reorder_chain (bool finalize_reorder_blocks);
>>>>> >>>
>>>>> >>>  void verify_insn_chain (void);
>>>>> >>>  static void fixup_fallthru_exit_predecessor (void);
>>>>> >>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
>>>>> >>>       partition boundaries).  See  the comments at the top of
>>>>> >>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>>>> >>>
>>>>> >>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>>>>> >>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>>>>> >>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>>>> >>>      return NULL;
>>>>> >>>
>>>>> >>>    /* We can replace or remove a complex jump only when we have exactly
>>>>> >>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target)
>>>>> >>>    return e;
>>>>> >>>  }
>>>>> >>>
>>>>> >>> +/* Called when edge E has been redirected to a new destination,
>>>>> >>> +   in order to update the region crossing flag on the edge and
>>>>> >>> +   jump.  */
>>>>> >>> +
>>>>> >>> +static void
>>>>> >>> +fixup_partition_crossing (edge e, basic_block target)
>>>>> >>> +{
>>>>> >>> +  rtx note;
>>>>> >>> +
>>>>> >>> +  gcc_assert (e->dest == target);
>>>>> >>> +
>>>>> >>> +  if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR)
>>>>> >>> +    return;
>>>>> >>> +  /* If we redirected an existing edge, it may already be marked
>>>>> >>> +     crossing, even though the new src is missing a reg crossing note.
>>>>> >>> +     But make sure reg crossing note doesn't already exist before
>>>>> >>> +     inserting.  */
>>>>> >>> +  if (BB_PARTITION (e->src) != BB_PARTITION (target))
>>>>> >>> +    {
>>>>> >>> +      e->flags |= EDGE_CROSSING;
>>>>> >>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>>>> >>> +      if (JUMP_P (BB_END (e->src))
>>>>> >>> +          && !note)
>>>>> >>> +        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>>>> >>> +    }
>>>>> >>> +  else if (BB_PARTITION (e->src) == BB_PARTITION (target))
>>>>> >>> +    {
>>>>> >>> +      e->flags &= ~EDGE_CROSSING;
>>>>> >>> +      /* Remove the region crossing note from jump at end of
>>>>> >>> +         e->src if it exists.  */
>>>>> >>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>>>> >>> +      if (note)
>>>>> >>> +        remove_note (BB_END (e->src), note);
>>>>> >>> +    }
>>>>> >>> +}
>>>>> >>> +
>>>>> >>> +/* Called when block BB has been reassigned to a different partition,
>>>>> >>> +   to ensure that the region crossing attributes are updated.  */
>>>>> >>> +
>>>>> >>> +static void
>>>>> >>> +fixup_bb_partition (basic_block bb)
>>>>> >>> +{
>>>>> >>> +  edge e;
>>>>> >>> +  edge_iterator ei;
>>>>> >>> +
>>>>> >>> +  /* Now need to make bb's pred edges non-region crossing.  */
>>>>> >>> +  FOR_EACH_EDGE (e, ei, bb->preds)
>>>>> >>> +    {
>>>>> >>> +      fixup_partition_crossing (e, e->dest);
>>>>> >>> +    }
>>>>> >>> +
>>>>> >>> +  /* Possibly need to make bb's successor edges region crossing,
>>>>> >>> +     or remove stale region crossing.  */
>>>>> >>> +  FOR_EACH_EDGE (e, ei, bb->succs)
>>>>> >>> +    {
>>>>> >>> +      if ((e->flags & EDGE_FALLTHRU)
>>>>> >>> +          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
>>>>> >>> +          && e->dest != EXIT_BLOCK_PTR)
>>>>> >>> +        /* force_nonfallthru_and_redirect calls fixup_partition_crossing.  */
>>>>> >>> +        force_nonfallthru (e);
>>>>> >>> +      else
>>>>> >>> +        fixup_partition_crossing (e, e->dest);
>>>>> >>> +    }
>>>>> >>> +}
>>>>> >>> +
>>>>> >>>  /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
>>>>> >>>     expense of adding new instructions or reordering basic blocks.
>>>>> >>>
>>>>> >>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>>>> >>>  {
>>>>> >>>    edge ret;
>>>>> >>>    basic_block src = e->src;
>>>>> >>> +  basic_block dest = e->dest;
>>>>> >>>
>>>>> >>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>>>> >>>      return NULL;
>>>>> >>>
>>>>> >>> -  if (e->dest == target)
>>>>> >>> +  if (dest == target)
>>>>> >>>      return e;
>>>>> >>>
>>>>> >>>    if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
>>>>> >>>      {
>>>>> >>>        df_set_bb_dirty (src);
>>>>> >>> +      fixup_partition_crossing (ret, target);
>>>>> >>>        return ret;
>>>>> >>>      }
>>>>> >>>
>>>>> >>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>>>> >>>      return NULL;
>>>>> >>>
>>>>> >>>    df_set_bb_dirty (src);
>>>>> >>> +  fixup_partition_crossing (ret, target);
>>>>> >>>    return ret;
>>>>> >>>  }
>>>>> >>>
>>>>> >>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>>>> >>>        /* Make sure new block ends up in correct hot/cold section.  */
>>>>> >>>
>>>>> >>>        BB_COPY_PARTITION (jump_block, e->src);
>>>>> >>> -      if (flag_reorder_blocks_and_partition
>>>>> >>> -         && targetm_common.have_named_sections
>>>>> >>> -         && JUMP_P (BB_END (jump_block))
>>>>> >>> -         && !any_condjump_p (BB_END (jump_block))
>>>>> >>> -         && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
>>>>> >>> -       add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
>>>>> >>>
>>>>> >>>        /* Wire edge in.  */
>>>>> >>>        new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
>>>>> >>>        new_edge->probability = probability;
>>>>> >>>        new_edge->count = count;
>>>>> >>>
>>>>> >>> +      /* If e->src was previously region crossing, it no longer is
>>>>> >>> +         and the reg crossing note should be removed.  */
>>>>> >>> +      fixup_partition_crossing (new_edge, jump_block);
>>>>> >>> +
>>>>> >>>        /* Redirect old edge.  */
>>>>> >>>        redirect_edge_pred (e, jump_block);
>>>>> >>>        e->probability = REG_BR_PROB_BASE;
>>>>> >>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>>>> >>>        LABEL_NUSES (label)++;
>>>>> >>>      }
>>>>> >>>
>>>>> >>> -  emit_barrier_after (BB_END (jump_block));
>>>>> >>> +  /* We might be in cfg layout mode, and if so, the following routine will
>>>>> >>> +     insert the barrier correctly.  */
>>>>> >>> +  emit_barrier_after_bb (jump_block);
>>>>> >>>    redirect_edge_succ_nodup (e, target);
>>>>> >>>
>>>>> >>>    if (abnormal_edge_flags)
>>>>> >>>      make_edge (src, target, abnormal_edge_flags);
>>>>> >>>
>>>>> >>>    df_mark_solutions_dirty ();
>>>>> >>> +  fixup_partition_crossing (e, target);
>>>>> >>>    return new_bb;
>>>>> >>>  }
>>>>> >>>
>>>>> >>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
>>>>> >>>  static basic_block
>>>>> >>>  rtl_split_edge (edge edge_in)
>>>>> >>>  {
>>>>> >>> -  basic_block bb;
>>>>> >>> +  basic_block bb, new_bb;
>>>>> >>>    rtx before;
>>>>> >>>
>>>>> >>>    /* Abnormal edges cannot be split.  */
>>>>> >>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in)
>>>>> >>>    else
>>>>> >>>      {
>>>>> >>>        bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
>>>>> >>> -      /* ??? Why not edge_in->dest->prev_bb here?  */
>>>>> >>> -      BB_COPY_PARTITION (bb, edge_in->dest);
>>>>> >>> +      if (edge_in->src == ENTRY_BLOCK_PTR)
>>>>> >>> +        BB_COPY_PARTITION (bb, edge_in->dest);
>>>>> >>> +      else
>>>>> >>> +        /* Put the split bb into the src partition, to avoid creating
>>>>> >>> +           a situation where a cold bb dominates a hot bb, in the case
>>>>> >>> +           where src is cold and dest is hot. The src will dominate
>>>>> >>> +           the new bb (whereas it might not have dominated dest).  */
>>>>> >>> +        BB_COPY_PARTITION (bb, edge_in->src);
>>>>> >>>      }
>>>>> >>>
>>>>> >>>    make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);
>>>>> >>>
>>>>> >>> +  /* Can't allow a region crossing edge to be fallthrough.  */
>>>>> >>> +  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
>>>>> >>> +      && edge_in->dest != EXIT_BLOCK_PTR)
>>>>> >>> +    {
>>>>> >>> +      new_bb = force_nonfallthru (single_succ_edge (bb));
>>>>> >>> +      gcc_assert (!new_bb);
>>>>> >>> +    }
>>>>> >>> +
>>>>> >>>    /* For non-fallthru edges, we must adjust the predecessor's
>>>>> >>>       jump instruction to target our new block.  */
>>>>> >>>    if ((edge_in->flags & EDGE_FALLTHRU) == 0)
>>>>> >>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e)
>>>>> >>>    else
>>>>> >>>      {
>>>>> >>>        bb = split_edge (e);
>>>>> >>> -      after = BB_END (bb);
>>>>> >>>
>>>>> >>> -      if (flag_reorder_blocks_and_partition
>>>>> >>> -         && targetm_common.have_named_sections
>>>>> >>> -         && e->src != ENTRY_BLOCK_PTR
>>>>> >>> -         && BB_PARTITION (e->src) == BB_COLD_PARTITION
>>>>> >>> -         && !(e->flags & EDGE_CROSSING)
>>>>> >>> -         && JUMP_P (after)
>>>>> >>> -         && !any_condjump_p (after)
>>>>> >>> -         && (single_succ_edge (bb)->flags & EDGE_CROSSING))
>>>>> >>> -       add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
>>>>> >>> +      /* If e crossed a partition boundary, we needed to make bb end in
>>>>> >>> +         a region-crossing jump, even though it was originally fallthru.  */
>>>>> >>> +      if (JUMP_P (BB_END (bb)))
>>>>> >>> +       before = BB_END (bb);
>>>>> >>> +      else
>>>>> >>> +        after = BB_END (bb);
>>>>> >>>      }
>>>>> >>>
>>>>> >>>    /* Now that we've found the spot, do the insertion.  */
>>>>> >>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void)
>>>>> >>>  {
>>>>> >>>    basic_block bb;
>>>>> >>>
>>>>> >>> +  /* Optimization passes that invoke this routine can cause hot blocks
>>>>> >>> +     previously reached by both hot and cold blocks to become dominated only
>>>>> >>> +     by cold blocks. This will cause the verification below to fail,
>>>>> >>> +     and lead to now cold code in the hot section. In some cases this
>>>>> >>> +     may only be visible after newly unreachable blocks are deleted,
>>>>> >>> +     which will be done by fixup_partitions.  */
>>>>> >>> +  fixup_partitions ();
>>>>> >>> +
>>>>> >>>  #ifdef ENABLE_CHECKING
>>>>> >>>    verify_flow_info ();
>>>>> >>>  #endif
>>>>> >>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb)
>>>>> >>>
>>>>> >>>    return end;
>>>>> >>>  }
>>>>> >>> -
>>>>> >>> +
>>>>> >>> +/* Perform cleanup on the hot/cold bb partitioning after optimization
>>>>> >>> +   passes that modify the cfg.  */
>>>>> >>> +
>>>>> >>> +void
>>>>> >>> +fixup_partitions (void)
>>>>> >>> +{
>>>>> >>> +  basic_block bb;
>>>>> >>> +
>>>>> >>> +  if (!crtl->has_bb_partition)
>>>>> >>> +    return;
>>>>> >>> +
>>>>> >>> +  /* Delete any blocks that became unreachable and weren't
>>>>> >>> +     already cleaned up, for example during edge forwarding
>>>>> >>> +     and convert_jumps_to_returns. This will expose more
>>>>> >>> +     opportunities for fixing the partition boundaries here.
>>>>> >>> +     Also, the calculation of the dominance graph during verification
>>>>> >>> +     will assert if there are unreachable nodes.  */
>>>>> >>> +  delete_unreachable_blocks ();
>>>>> >>> +
>>>>> >>> +  /* If there are partitions, do a sanity check on them: A basic block in
>>>>> >>> +     a cold partition cannot dominate a basic block in a hot partition.
>>>>> >>> +     Fixup any that now violate this requirement, as a result of edge
>>>>> >>> +     forwarding and unreachable block deletion.  */
>>>>> >>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>>>>> >>> +  VEC (basic_block, heap) *bbs_to_fix = NULL;
>>>>> >>> +  FOR_EACH_BB (bb)
>>>>> >>> +    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>>>>> >>> +      VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>>>>> >>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>>>>> >>> +    {
>>>>> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>>>> >>> +      basic_block son;
>>>>> >>> +
>>>>> >>> +      if (dom_calculated_here)
>>>>> >>> +        calculate_dominance_info (CDI_DOMINATORS);
>>>>> >>> +
>>>>> >>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>>>>> >>> +        {
>>>>> >>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>>>>> >>> +          /* If bb is not yet cold (because it was added below as
>>>>> >>> +             a block dominated by a cold bb) then mark it cold here.  */
>>>>> >>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>>>>> >>> +            {
>>>>> >>> +              BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>>>> >>> +              VEC_safe_push (basic_block, heap, bbs_to_fix, bb);
>>>>> >>> +            }
>>>>> >>> +          /* Any blocks dominated by a block in the cold section
>>>>> >>> +             must also be cold.  */
>>>>> >>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>>>>> >>> +               son;
>>>>> >>> +               son = next_dom_son (CDI_DOMINATORS, son))
>>>>> >>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>>>>> >>> +        }
>>>>> >>> +
>>>>> >>> +      if (dom_calculated_here)
>>>>> >>> +        free_dominance_info (CDI_DOMINATORS);
>>>>> >>> +    }
>>>>> >>> +
>>>>> >>> +  /* Do the partition fixup after all necessary blocks have been converted to
>>>>> >>> +     cold, so that we only update the region crossings the minimum number of
>>>>> >>> +     places, which can require forcing edges to be non fallthru.  */
>>>>> >>> +  while (! VEC_empty (basic_block, bbs_to_fix))
>>>>> >>> +    {
>>>>> >>> +      bb = VEC_pop (basic_block, bbs_to_fix);
>>>>> >>> +      fixup_bb_partition (bb);
>>>>> >>> +    }
>>>>> >>> +}
>>>>> >>> +
>>>>> >>>  /* Verify the CFG and RTL consistency common for both underlying RTL and
>>>>> >>>     cfglayout RTL.
>>>>> >>>
>>>>> >>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void)
>>>>> >>>    rtx x;
>>>>> >>>    int err = 0;
>>>>> >>>    basic_block bb;
>>>>> >>> +  bool have_partitions = false;
>>>>> >>>
>>>>> >>>    /* Check the general integrity of the basic blocks.  */
>>>>> >>>    FOR_EACH_BB_REVERSE (bb)
>>>>> >>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void)
>>>>> >>>
>>>>> >>>           if (e->flags & EDGE_ABNORMAL)
>>>>> >>>             n_abnormal++;
>>>>> >>> +
>>>>> >>> +          have_partitions |= is_crossing;
>>>>> >>>         }
>>>>> >>>
>>>>> >>>        if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
>>>>> >>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void)
>>>>> >>>           }
>>>>> >>>      }
>>>>> >>>
>>>>> >>> +  /* If there are partitions, do a sanity check on them: A basic block in
>>>>> >>> +     a cold partition cannot dominate a basic block in a hot partition.  */
>>>>> >>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>>>>> >>> +  if (have_partitions && !err)
>>>>> >>> +    FOR_EACH_BB (bb)
>>>>> >>> +      if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>>>>> >>> +        VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>>>>> >>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>>>>> >>> +    {
>>>>> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>>>> >>> +      basic_block son;
>>>>> >>> +
>>>>> >>> +      if (dom_calculated_here)
>>>>> >>> +        calculate_dominance_info (CDI_DOMINATORS);
>>>>> >>> +
>>>>> >>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>>>>> >>> +        {
>>>>> >>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>>>>> >>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>>>>> >>> +            {
>>>>> >>> +              error ("non-cold basic block %d dominated "
>>>>> >>> +                     "by a block in the cold partition", bb->index);
>>>>> >>> +              err = 1;
>>>>> >>> +            }
>>>>> >>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>>>>> >>> +               son;
>>>>> >>> +               son = next_dom_son (CDI_DOMINATORS, son))
>>>>> >>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>>>>> >>> +        }
>>>>> >>> +
>>>>> >>> +      if (dom_calculated_here)
>>>>> >>> +        free_dominance_info (CDI_DOMINATORS);
>>>>> >>> +    }
>>>>> >>> +
>>>>> >>>    /* Clean up.  */
>>>>> >>>    return err;
>>>>> >>>  }
>>>>> >>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void)
>>>>> >>>    else
>>>>> >>>      cfg_layout_function_header = NULL_RTX;
>>>>> >>>
>>>>> >>> +  had_sec_boundary_notes = false;
>>>>> >>> +
>>>>> >>>    next_insn = get_insns ();
>>>>> >>>    FOR_EACH_BB (bb)
>>>>> >>>      {
>>>>> >>>        rtx end;
>>>>> >>>
>>>>> >>>        if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb))
>>>>> >>> -       BB_HEADER (bb) = unlink_insn_chain (next_insn,
>>>>> >>> -                                             PREV_INSN (BB_HEAD (bb)));
>>>>> >>> +        {
>>>>> >>> +          /* Rather than try to keep section boundary notes incrementally
>>>>> >>> +             up-to-date through cfg layout optimizations, simply remove them
>>>>> >>> +             and flag that they should be re-inserted when exiting
>>>>> >>> +             cfg layout mode.  */
>>>>> >>> +          rtx check_insn = next_insn;
>>>>> >>> +          while (check_insn)
>>>>> >>> +            {
>>>>> >>> +              if (NOTE_P (check_insn)
>>>>> >>> +                  && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
>>>>> >>> +              {
>>>>> >>> +                had_sec_boundary_notes |= true;
>>>>> >>> +                /* Remove note from chain. Grab new next_insn first.  */
>>>>> >>> +                if (next_insn == check_insn)
>>>>> >>> +                  next_insn = NEXT_INSN (check_insn);
>>>>> >>> +                /* Delete note.  */
>>>>> >>> +                delete_insn (check_insn);
>>>>> >>> +                /* There will only be one.  */
>>>>> >>> +                break;
>>>>> >>> +              }
>>>>> >>> +              check_insn = NEXT_INSN (check_insn);
>>>>> >>> +            }
>>>>> >>> +          /* If we still have header instructions left after above loop.  */
>>>>> >>> +          if (next_insn != BB_HEAD (bb))
>>>>> >>> +            BB_HEADER (bb) = unlink_insn_chain (next_insn,
>>>>> >>> +                                                PREV_INSN (BB_HEAD (bb)));
>>>>> >>> +        }
>>>>> >>>        end = skip_insns_after_block (bb);
>>>>> >>>        if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end)
>>>>> >>>         BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end);
>>>>> >>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void)
>>>>> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>>>> >>>        bb->aux = bb->next_bb;
>>>>> >>>
>>>>> >>> -  cfg_layout_finalize ();
>>>>> >>> +  cfg_layout_finalize (false);
>>>>> >>>
>>>>> >>>    return 0;
>>>>> >>>  }
>>>>> >>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode)
>>>>> >>>  }
>>>>> >>>
>>>>> >>>
>>>>> >>> -/* Given a reorder chain, rearrange the code to match.  */
>>>>> >>> +/* Given a reorder chain, rearrange the code to match. If
>>>>> >>> +   this is called when we will FINALIZE_REORDER_BLOCKS, or when
>>>>> >>> +   section boundary notes were removed on entry to cfg layout
>>>>> >>> +   mode, insert section boundary notes here.  */
>>>>> >>>
>>>>> >>>  static void
>>>>> >>> -fixup_reorder_chain (void)
>>>>> >>> +fixup_reorder_chain (bool finalize_reorder_blocks)
>>>>> >>>  {
>>>>> >>>    basic_block bb;
>>>>> >>>    rtx insn = NULL;
>>>>> >>> @@ -3150,7 +3373,7 @@ static void
>>>>> >>>           PREV_INSN (BB_HEADER (bb)) = insn;
>>>>> >>>           insn = BB_HEADER (bb);
>>>>> >>>           while (NEXT_INSN (insn))
>>>>> >>> -           insn = NEXT_INSN (insn);
>>>>> >>> +            insn = NEXT_INSN (insn);
>>>>> >>>         }
>>>>> >>>        if (insn)
>>>>> >>>         NEXT_INSN (insn) = BB_HEAD (bb);
>>>>> >>> @@ -3175,6 +3398,11 @@ static void
>>>>> >>>      insn = NEXT_INSN (insn);
>>>>> >>>
>>>>> >>>    set_last_insn (insn);
>>>>> >>> +
>>>>> >>> +  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>>>>> >>> +  if (had_sec_boundary_notes || finalize_reorder_blocks)
>>>>> >>> +    insert_section_boundary_note ();
>>>>> >>> +
>>>>> >>>  #ifdef ENABLE_CHECKING
>>>>> >>>    verify_insn_chain ();
>>>>> >>>  #endif
>>>>> >>> @@ -3187,7 +3415,7 @@ static void
>>>>> >>>        edge e_fall, e_taken, e;
>>>>> >>>        rtx bb_end_insn;
>>>>> >>>        rtx ret_label = NULL_RTX;
>>>>> >>> -      basic_block nb, src_bb;
>>>>> >>> +      basic_block nb;
>>>>> >>>        edge_iterator ei;
>>>>> >>>
>>>>> >>>        if (EDGE_COUNT (bb->succs) == 0)
>>>>> >>> @@ -3322,7 +3550,6 @@ static void
>>>>> >>>        /* We got here if we need to add a new jump insn.
>>>>> >>>          Note force_nonfallthru can delete E_FALL and thus we have to
>>>>> >>>          save E_FALL->src prior to the call to force_nonfallthru.  */
>>>>> >>> -      src_bb = e_fall->src;
>>>>> >>>        nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
>>>>> >>>        if (nb)
>>>>> >>>         {
>>>>> >>> @@ -3330,17 +3557,6 @@ static void
>>>>> >>>           bb->aux = nb;
>>>>> >>>           /* Don't process this new block.  */
>>>>> >>>           bb = nb;
>>>>> >>> -
>>>>> >>> -         /* Make sure new bb is tagged for correct section (same as
>>>>> >>> -            fall-thru source, since you cannot fall-thru across
>>>>> >>> -            section boundaries).  */
>>>>> >>> -         BB_COPY_PARTITION (src_bb, single_pred (bb));
>>>>> >>> -         if (flag_reorder_blocks_and_partition
>>>>> >>> -             && targetm_common.have_named_sections
>>>>> >>> -             && JUMP_P (BB_END (bb))
>>>>> >>> -             && !any_condjump_p (BB_END (bb))
>>>>> >>> -             && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
>>>>> >>> -           add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
>>>>> >>>         }
>>>>> >>>      }
>>>>> >>>
>>>>> >>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to)
>>>>> >>>             case NOTE_INSN_FUNCTION_BEG:
>>>>> >>>               /* There is always just single entry to function.  */
>>>>> >>>             case NOTE_INSN_BASIC_BLOCK:
>>>>> >>> +              /* We should only switch text sections once.  */
>>>>> >>> +           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>>>> >>>               break;
>>>>> >>>
>>>>> >>>             case NOTE_INSN_EPILOGUE_BEG:
>>>>> >>> -           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>>>> >>>               emit_note_copy (insn);
>>>>> >>>               break;
>>>>> >>>
>>>>> >>> @@ -3759,10 +3976,13 @@ break_superblocks (void)
>>>>> >>>  }
>>>>> >>>
>>>>> >>>  /* Finalize the changes: reorder insn list according to the sequence specified
>>>>> >>> -   by aux pointers, enter compensation code, rebuild scope forest.  */
>>>>> >>> +   by aux pointers, enter compensation code, rebuild scope forest. If
>>>>> >>> +   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
>>>>> >>> +   to fixup_reorder_chain so that it can insert the proper switch text
>>>>> >>> +   section notes.  */
>>>>> >>>
>>>>> >>>  void
>>>>> >>> -cfg_layout_finalize (void)
>>>>> >>> +cfg_layout_finalize (bool finalize_reorder_blocks)
>>>>> >>>  {
>>>>> >>>  #ifdef ENABLE_CHECKING
>>>>> >>>    verify_flow_info ();
>>>>> >>> @@ -3775,7 +3995,7 @@ void
>>>>> >>>  #endif
>>>>> >>>        )
>>>>> >>>      fixup_fallthru_exit_predecessor ();
>>>>> >>> -  fixup_reorder_chain ();
>>>>> >>> +  fixup_reorder_chain (finalize_reorder_blocks);
>>>>> >>>
>>>>> >>>    rebuild_jump_labels (get_insns ());
>>>>> >>>    delete_dead_jumptables ();
>>>>> >>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e)
>>>>> >>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>>>> >>>      return false;
>>>>> >>>
>>>>> >>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>>>>> >>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>>>>> >>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>>>> >>>      return false;
>>>>> >>>
>>>>> >>>    if (!onlyjump_p (insn)
>>>>> >>>
>>>>> >>> --
>>>>> >>> This patch is available for review at http://codereview.appspot.com/6823047
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>>>
>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Christophe Lyon Jan. 31, 2013, 2:18 p.m. UTC | #9
Hello,

Sorry for the long delay (ref http://patchwork.ozlabs.org/patch/199397/)



On 6 December 2012 20:26, Teresa Johnson <tejohnson@google.com> wrote:
>
>
>
> On Wed, Nov 28, 2012 at 7:48 AM, Christophe Lyon
> <christophe.lyon@linaro.org> wrote:
>>
>> I have updated my trunk checkout, and I can confirm that eval.c now
>> compiles with your patch (and the other 4 patches I added to PR55121).
>
>
> good
>
>>
>>
>> Now, when looking at the whole Spec2k results:
>> - vpr passes now (used to fail)
>
>
> good
>
>>
>> - gcc, parser, perlbmk bzip2 and twolf no longer build: they all fail
>> with the same error from gas:
>> can't resolve `.text.unlikely' {.text.unlikely section} - `.LBB171'
>> {.text section}
>> - gap still does not build (same error as above)
>>
>> I haven't looked in detail, so I may be missing an obvious patch here.
>
>
> Finally had a chance to get back to this. I was able to reproduce the
> failure using x86_64 linux with "-freorder-blocks-and-partition -g".
> However, I am also getting the same failure with a pristine copy of trunk.
> Can you confirm whether you were seeing any of these failures without my
> patches, because I believe they are probably a limitation with function
> splitting and debug info that is orthogonal to my patch.
>
Yes I confirm that I see these failures without your patch too; and
both -freorder-blocks-and-partition and -g are present in my
command-line.
And now gap's integer.c fails to compile with a similar error message too.

>>
>> And I still observe runtime mis-behaviour on crafty, galgel, facerec and
>> fma3d.
>
>
> I'm not seeing this on x86_64, unfortunately, so it might require some
> follow-on work to triage and fix.
>
> I'll look into the gas failure, but if someone could review this patch in
> the meantime given that it does improve things considerably (at least
> without -g), that would be great.
>
Indeed.

> Thanks,
> Teresa
>

Thanks
Christophe
Teresa Johnson Feb. 5, 2013, 3:45 p.m. UTC | #10
Thanks for the confirmation that the -g issue is orthogonal. I did
start to try to address it but got pulled away by some other things
for awhile. I'll see if I can take another stab at it.

In the meantime, could one of the global maintainers take a look at
the patch? I don't want it to get too stale, and without these fixes I
am unable to get -freorder-blocks-and-partition to work at all.

Thanks!
Teresa

On Thu, Jan 31, 2013 at 6:18 AM, Christophe Lyon
<christophe.lyon@linaro.org> wrote:
> Hello,
>
> Sorry for the long delay (ref http://patchwork.ozlabs.org/patch/199397/)
>
>
>
> On 6 December 2012 20:26, Teresa Johnson <tejohnson@google.com> wrote:
>>
>>
>>
>> On Wed, Nov 28, 2012 at 7:48 AM, Christophe Lyon
>> <christophe.lyon@linaro.org> wrote:
>>>
>>> I have updated my trunk checkout, and I can confirm that eval.c now
>>> compiles with your patch (and the other 4 patches I added to PR55121).
>>
>>
>> good
>>
>>>
>>>
>>> Now, when looking at the whole Spec2k results:
>>> - vpr passes now (used to fail)
>>
>>
>> good
>>
>>>
>>> - gcc, parser, perlbmk bzip2 and twolf no longer build: they all fail
>>> with the same error from gas:
>>> can't resolve `.text.unlikely' {.text.unlikely section} - `.LBB171'
>>> {.text section}
>>> - gap still does not build (same error as above)
>>>
>>> I haven't looked in detail, so I may be missing an obvious patch here.
>>
>>
>> Finally had a chance to get back to this. I was able to reproduce the
>> failure using x86_64 linux with "-freorder-blocks-and-partition -g".
>> However, I am also getting the same failure with a pristine copy of trunk.
>> Can you confirm whether you were seeing any of these failures without my
>> patches, because I believe they are probably a limitation with function
>> splitting and debug info that is orthogonal to my patch.
>>
> Yes I confirm that I see these failures without your patch too; and
> both -freorder-blocks-and-partition and -g are present in my
> command-line.
> And now gap's integer.c fails to compile with a similar error message too.
>
>>>
>>> And I still observe runtime mis-behaviour on crafty, galgel, facerec and
>>> fma3d.
>>
>>
>> I'm not seeing this on x86_64, unfortunately, so it might require some
>> follow-on work to triage and fix.
>>
>> I'll look into the gas failure, but if someone could review this patch in
>> the meantime given that it does improve things considerably (at least
>> without -g), that would be great.
>>
> Indeed.
>
>> Thanks,
>> Teresa
>>
>
> Thanks
> Christophe
diff mbox

Patch

Index: cfghooks.h
===================================================================
--- cfghooks.h	(revision 193376)
+++ cfghooks.h	(working copy)
@@ -204,7 +204,7 @@  extern void copy_bbs (basic_block *, unsigned, bas
 void account_profile_record (struct profile_record *, int);
 
 extern void cfg_layout_initialize (unsigned int);
-extern void cfg_layout_finalize (void);
+extern void cfg_layout_finalize (bool);
 
 /* Hooks containers.  */
 extern struct cfg_hooks gimple_cfg_hooks;
@@ -218,4 +218,3 @@  extern void cfg_layout_rtl_register_cfg_hooks (voi
 extern void gimple_register_cfg_hooks (void);
 extern struct cfg_hooks get_cfg_hooks (void);
 extern void set_cfg_hooks (struct cfg_hooks);
-
Index: modulo-sched.c
===================================================================
--- modulo-sched.c	(revision 193376)
+++ modulo-sched.c	(working copy)
@@ -3354,7 +3354,7 @@  rest_of_handle_sms (void)
     if (bb->next_bb != EXIT_BLOCK_PTR)
       bb->aux = bb->next_bb;
   free_dominance_info (CDI_DOMINATORS);
-  cfg_layout_finalize ();
+  cfg_layout_finalize (false);
 #endif /* INSN_SCHEDULING */
   return 0;
 }
Index: ifcvt.c
===================================================================
--- ifcvt.c	(revision 193376)
+++ ifcvt.c	(working copy)
@@ -3900,10 +3900,9 @@  find_if_case_1 (basic_block test_bb, edge then_edg
   if (new_bb)
     {
       df_bb_replace (then_bb_index, new_bb);
-      /* Since the fallthru edge was redirected from test_bb to new_bb,
-         we need to ensure that new_bb is in the same partition as
-         test bb (you can not fall through across section boundaries).  */
-      BB_COPY_PARTITION (new_bb, test_bb);
+      /* This should have been done above via force_nonfallthru_and_redirect
+         (possibly called from redirect_edge_and_branch_force).  */
+      gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
     }
 
   num_true_changes++;
Index: function.c
===================================================================
--- function.c	(revision 193376)
+++ function.c	(working copy)
@@ -6249,8 +6249,10 @@  thread_prologue_and_epilogue_insns (void)
 		    break;
 		if (e)
 		  {
-		    copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
-						  NULL_RTX, e->src);
+                    /* Make sure we insert after any barriers.  */
+                    rtx end = get_last_bb_insn (e->src);
+                    copy_bb = create_basic_block (NEXT_INSN (end),
+                                                  NULL_RTX, e->src);
 		    BB_COPY_PARTITION (copy_bb, e->src);
 		  }
 		else
@@ -6475,7 +6477,7 @@  thread_prologue_and_epilogue_insns (void)
 	if (cur_bb->index >= NUM_FIXED_BLOCKS
 	    && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS)
 	  cur_bb->aux = cur_bb->next_bb;
-      cfg_layout_finalize ();
+      cfg_layout_finalize (false);
     }
 
 epilogue_done:
@@ -6517,7 +6519,7 @@  epilogue_done:
       basic_block simple_return_block_cold = NULL;
       edge pending_edge_hot = NULL;
       edge pending_edge_cold = NULL;
-      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
+      basic_block exit_pred;
       int i;
 
       gcc_assert (entry_edge != orig_entry_edge);
@@ -6545,6 +6547,12 @@  epilogue_done:
 	    else
 	      pending_edge_cold = e;
 	  }
+      
+      /* Save a pointer to the exit's predecessor BB for use in
+         inserting new BBs at the end of the function. Do this
+         after the call to split_block above which may split
+         the original exit pred.  */
+      exit_pred = EXIT_BLOCK_PTR->prev_bb;
 
       FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
 	{
Index: function.h
===================================================================
--- function.h	(revision 193376)
+++ function.h	(working copy)
@@ -459,6 +459,11 @@  struct GTY(()) rtl_data {
      sched2) and is useful only if the port defines LEAF_REGISTERS.  */
   bool uses_only_leaf_regs;
 
+  /* Nonzero if the function being compiled has undergone hot/cold partitioning
+     (under flag_reorder_blocks_and_partition) and has at least one cold
+     block.  */
+  bool has_bb_partition;
+
   /* Like regs_ever_live, but 1 if a reg is set or clobbered from an
      asm.  Unlike regs_ever_live, elements of this array corresponding
      to eliminable regs (like the frame pointer) are set if an asm
Index: hw-doloop.c
===================================================================
--- hw-doloop.c	(revision 193376)
+++ hw-doloop.c	(working copy)
@@ -547,7 +547,7 @@  reorder_loops (hwloop_info loops)
       else
 	bb->aux = NULL;
     }
-  cfg_layout_finalize ();
+  cfg_layout_finalize (false);
   clear_aux_for_blocks ();
   df_analyze ();
 }
Index: cfgcleanup.c
===================================================================
--- cfgcleanup.c	(revision 193376)
+++ cfgcleanup.c	(working copy)
@@ -1824,7 +1824,7 @@  try_crossjump_to_edge (int mode, edge e1, edge e2,
      partition boundaries).  See the comments at the top of
      bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
 
-  if (flag_reorder_blocks_and_partition && reload_completed)
+  if (crtl->has_bb_partition && reload_completed)
     return false;
 
   /* Search backward through forwarder blocks.  We don't need to worry
@@ -2767,10 +2767,21 @@  try_optimize_cfg (int mode)
 	      df_analyze ();
 	    }
 
+	  if (changed)
+            {
+              /* Edge forwarding in particular can cause hot blocks previously
+                 reached by both hot and cold blocks to become dominated only
+                 by cold blocks. This will cause the verification below to fail,
+                 and lead to now cold code in the hot section. This is not easy
+                 to detect and fix during edge forwarding, and in some cases
+                 is only visible after newly unreachable blocks are deleted,
+                 which will be done in fixup_partitions.  */
+              fixup_partitions ();
+
 #ifdef ENABLE_CHECKING
-	  if (changed)
-	    verify_flow_info ();
+              verify_flow_info ();
 #endif
+            }
 
 	  changed_overall |= changed;
 	  first_pass = false;
Index: bb-reorder.c
===================================================================
--- bb-reorder.c	(revision 193376)
+++ bb-reorder.c	(working copy)
@@ -1054,7 +1054,7 @@  connect_traces (int n_traces, struct trace *traces
   current_partition = BB_PARTITION (traces[0].first);
   two_passes = false;
 
-  if (flag_reorder_blocks_and_partition)
+  if (crtl->has_bb_partition)
     for (i = 0; i < n_traces && !two_passes; i++)
       if (BB_PARTITION (traces[0].first)
 	  != BB_PARTITION (traces[i].first))
@@ -1263,7 +1263,7 @@  connect_traces (int n_traces, struct trace *traces
 		      }
 		  }
 
-	      if (flag_reorder_blocks_and_partition)
+	      if (crtl->has_bb_partition)
 		try_copy = false;
 
 	      /* Copy tiny blocks always; copy larger blocks only when the
@@ -1381,13 +1381,14 @@  get_uncond_jump_length (void)
   return length;
 }
 
-/* Emit a barrier into the footer of BB.  */
+/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
 
-static void
+void
 emit_barrier_after_bb (basic_block bb)
 {
   rtx barrier = emit_barrier_after (BB_END (bb));
-  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
+  if (current_ir_type () == IR_RTL_CFGLAYOUT)
+    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
 }
 
 /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
@@ -1463,18 +1464,109 @@  find_rarely_executed_basic_blocks_and_crossing_edg
 {
   VEC(edge, heap) *crossing_edges = NULL;
   basic_block bb;
-  edge e;
-  edge_iterator ei;
+  edge e, e2;
+  edge_iterator ei, ei2;
+  unsigned int cold_bb_count = 0;
+  VEC (basic_block, heap) *bbs_in_hot_partition = NULL;
+  VEC (basic_block, heap) *bbs_newly_hot = NULL;
 
   /* Mark which partition (hot/cold) each basic block belongs in.  */
   FOR_EACH_BB (bb)
     {
       if (probably_never_executed_bb_p (cfun, bb))
-	BB_SET_PARTITION (bb, BB_COLD_PARTITION);
+        {
+          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
+          cold_bb_count++;
+        }
       else
-	BB_SET_PARTITION (bb, BB_HOT_PARTITION);
+        {
+          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
+          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb);
+        }
     }
 
+  /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of
+     several different possibilities. One is that there are edge weight insanities
+     due to optimization phases that do not properly update basic block profile
+     counts. The second is that the entry of the function may not be hot, because
+     it is entered fewer times than the number of profile training runs, but there
+     is a loop inside the function that causes blocks within the function to be
+     above the threshold for hotness.  */
+  if (cold_bb_count)
+    {
+      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+
+      if (dom_calculated_here)
+        calculate_dominance_info (CDI_DOMINATORS);
+
+      /* Keep examining hot bbs until we have either checked them all, or
+         re-marked all cold bbs hot.  */
+      while (! VEC_empty (basic_block, bbs_in_hot_partition)
+             && cold_bb_count)
+        {
+          basic_block dom_bb;
+
+          bb = VEC_pop (basic_block, bbs_in_hot_partition);
+          dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+
+          /* If bb's immediate dominator is also hot then it is ok.  */
+          if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
+            continue;
+
+          /* We have a hot bb with an immediate dominator that is cold.
+             The dominator needs to be re-marked to hot.  */
+          BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION);
+          cold_bb_count--;
+
+          /* Now we need to examine newly-hot dom_bb to see if it is also
+             dominated by a cold bb.  */
+          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb);
+
+          /* We should also adjust any cold blocks that the newly-hot bb
+             feeds and see if it makes sense to re-mark those as hot as
+             well.  */
+          VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb);
+          while (! VEC_empty (basic_block, bbs_newly_hot))
+            {
+              basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot);
+              /* Examine all successors of this newly-hot bb to see if they
+                 are cold and should be re-marked as hot.  */
+              FOR_EACH_EDGE (e, ei, new_hot_bb->succs)
+                {
+                  bool any_cold_preds = false;
+                  basic_block succ = e->dest;
+                  if (BB_PARTITION (succ) != BB_COLD_PARTITION)
+                    continue;
+                  /* Does this block have any cold predecessors now?  */
+                  FOR_EACH_EDGE (e2, ei2, succ->preds)
+                  {
+                    if (BB_PARTITION (e2->src) == BB_COLD_PARTITION)
+                      {
+                        any_cold_preds = true;
+                        break;
+                      }
+                  }
+                  if (any_cold_preds)
+                    continue;
+
+                  /* Here we have a successor of newly-hot bb that is cold
+                     but no longer has any cold precessessors. Since the original
+                     assignment of our newly-hot bb was incorrect, this successor's
+                     assignment as cold is also suspect. Go ahead and re-mark it
+                     as hot now too. Better heuristics may be in order here.  */
+                  BB_SET_PARTITION (succ, BB_HOT_PARTITION);
+                  cold_bb_count--;
+                  VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ);
+                  /* Examine this successor as a newly-hot bb.  */
+                  VEC_safe_push (basic_block, heap, bbs_newly_hot, succ);
+                }
+            }
+        }
+
+      if (dom_calculated_here)
+        free_dominance_info (CDI_DOMINATORS);
+    }
+
   /* The format of .gcc_except_table does not allow landing pads to
      be in a different partition as the throw.  Fix this by either
      moving or duplicating the landing pads.  */
@@ -1766,10 +1858,10 @@  fix_up_fall_thru_edges (void)
 		      new_bb->aux = cur_bb->aux;
 		      cur_bb->aux = new_bb;
 
-		      /* Make sure new fall-through bb is in same
-			 partition as bb it's falling through from.  */
+                      /* This is done by force_nonfallthru_and_redirect.  */
+		      gcc_assert (BB_PARTITION (new_bb)
+                                  == BB_PARTITION (cur_bb));
 
-		      BB_COPY_PARTITION (new_bb, cur_bb);
 		      single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
 		    }
 		  else
@@ -2067,7 +2159,10 @@  add_reg_crossing_jump_notes (void)
   FOR_EACH_BB (bb)
     FOR_EACH_EDGE (e, ei, bb->succs)
       if ((e->flags & EDGE_CROSSING)
-	  && JUMP_P (BB_END (e->src)))
+	  && JUMP_P (BB_END (e->src))
+          /* Some notes were added during fix_up_fall_thru_edges, via
+             force_nonfallthru_and_redirect.  */
+          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
 	add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
 }
 
@@ -2160,7 +2255,7 @@  reorder_basic_blocks (void)
       dump_flow_info (dump_file, dump_flags);
     }
 
-  if (flag_reorder_blocks_and_partition)
+  if (crtl->has_bb_partition)
     verify_hot_cold_block_grouping ();
 }
 
@@ -2172,14 +2267,14 @@  reorder_basic_blocks (void)
    encountering this note will make the compiler switch between the
    hot and cold text sections.  */
 
-static void
+void
 insert_section_boundary_note (void)
 {
   basic_block bb;
   rtx new_note;
   int first_partition = 0;
 
-  if (!flag_reorder_blocks_and_partition)
+  if (!crtl->has_bb_partition)
     return;
 
   FOR_EACH_BB (bb)
@@ -2222,10 +2317,8 @@  rest_of_handle_reorder_blocks (void)
   FOR_EACH_BB (bb)
     if (bb->next_bb != EXIT_BLOCK_PTR)
       bb->aux = bb->next_bb;
-  cfg_layout_finalize ();
+  cfg_layout_finalize (true);
 
-  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
-  insert_section_boundary_note ();
   return 0;
 }
 
@@ -2366,7 +2459,7 @@  duplicate_computed_gotos (void)
     }
 
 done:
-  cfg_layout_finalize ();
+  cfg_layout_finalize (false);
 
   BITMAP_FREE (candidates);
   return 0;
@@ -2511,6 +2604,8 @@  partition_hot_cold_basic_blocks (void)
   if (crossing_edges == NULL)
     return 0;
 
+  crtl->has_bb_partition = true;
+
   /* Make sure the source of any crossing edge ends in a jump and the
      destination of any crossing edge has a label.  */
   add_labels_and_missing_jumps (crossing_edges);
Index: bb-reorder.h
===================================================================
--- bb-reorder.h	(revision 193376)
+++ bb-reorder.h	(working copy)
@@ -36,4 +36,8 @@  extern struct target_bb_reorder *this_target_bb_re
 
 extern int get_uncond_jump_length (void);
 
+extern void insert_section_boundary_note (void);
+
+extern void emit_barrier_after_bb (basic_block bb);
+
 #endif
Index: basic-block.h
===================================================================
--- basic-block.h	(revision 193376)
+++ basic-block.h	(working copy)
@@ -806,6 +806,7 @@  extern basic_block force_nonfallthru_and_redirect
 extern bool contains_no_active_insn_p (const_basic_block);
 extern bool forwarder_block_p (const_basic_block);
 extern bool can_fallthru (basic_block, basic_block);
+extern void fixup_partitions (void);
 
 /* In cfgbuild.c.  */
 extern void find_many_sub_basic_blocks (sbitmap);
Index: cfgrtl.c
===================================================================
--- cfgrtl.c	(revision 193376)
+++ cfgrtl.c	(working copy)
@@ -46,6 +46,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree.h"
 #include "hard-reg-set.h"
 #include "basic-block.h"
+#include "bb-reorder.h"
 #include "regs.h"
 #include "flags.h"
 #include "function.h"
@@ -67,11 +68,12 @@  along with GCC; see the file COPYING3.  If not see
    Only applicable if the CFG is in cfglayout mode.  */
 static GTY(()) rtx cfg_layout_function_footer;
 static GTY(()) rtx cfg_layout_function_header;
+static bool had_sec_boundary_notes;
 
 static rtx skip_insns_after_block (basic_block);
 static void record_effective_endpoints (void);
 static rtx label_for_bb (basic_block);
-static void fixup_reorder_chain (void);
+static void fixup_reorder_chain (bool finalize_reorder_blocks);
 
 void verify_insn_chain (void);
 static void fixup_fallthru_exit_predecessor (void);
@@ -976,8 +978,7 @@  try_redirect_by_replacing_jump (edge e, basic_bloc
      partition boundaries).  See  the comments at the top of
      bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
 
-  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
-      || BB_PARTITION (src) != BB_PARTITION (target))
+  if (BB_PARTITION (src) != BB_PARTITION (target))
     return NULL;
 
   /* We can replace or remove a complex jump only when we have exactly
@@ -1286,6 +1287,71 @@  redirect_branch_edge (edge e, basic_block target)
   return e;
 }
 
+/* Called when edge E has been redirected to a new destination,
+   in order to update the region crossing flag on the edge and
+   jump.  */
+
+static void
+fixup_partition_crossing (edge e, basic_block target)
+{
+  rtx note;
+
+  gcc_assert (e->dest == target);
+
+  if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR)
+    return;
+  /* If we redirected an existing edge, it may already be marked
+     crossing, even though the new src is missing a reg crossing note.
+     But make sure reg crossing note doesn't already exist before
+     inserting.  */
+  if (BB_PARTITION (e->src) != BB_PARTITION (target))
+    {
+      e->flags |= EDGE_CROSSING;
+      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
+      if (JUMP_P (BB_END (e->src))
+          && !note)
+        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
+    }
+  else if (BB_PARTITION (e->src) == BB_PARTITION (target))
+    {
+      e->flags &= ~EDGE_CROSSING;
+      /* Remove the region crossing note from jump at end of
+         e->src if it exists.  */
+      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
+      if (note)
+        remove_note (BB_END (e->src), note);
+    }
+}
+
+/* Called when block BB has been reassigned to a different partition,
+   to ensure that the region crossing attributes are updated.  */
+
+static void
+fixup_bb_partition (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  /* Now need to make bb's pred edges non-region crossing.  */
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      fixup_partition_crossing (e, e->dest);
+    }
+
+  /* Possibly need to make bb's successor edges region crossing,
+     or remove stale region crossing.  */
+  FOR_EACH_EDGE (e, ei, bb->succs)
+    {
+      if ((e->flags & EDGE_FALLTHRU)
+          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
+          && e->dest != EXIT_BLOCK_PTR)
+        /* force_nonfallthru_and_redirect calls fixup_partition_crossing.  */
+        force_nonfallthru (e);
+      else
+        fixup_partition_crossing (e, e->dest);
+    }
+}
+
 /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
    expense of adding new instructions or reordering basic blocks.
 
@@ -1302,16 +1368,18 @@  rtl_redirect_edge_and_branch (edge e, basic_block
 {
   edge ret;
   basic_block src = e->src;
+  basic_block dest = e->dest;
 
   if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
     return NULL;
 
-  if (e->dest == target)
+  if (dest == target)
     return e;
 
   if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
     {
       df_set_bb_dirty (src);
+      fixup_partition_crossing (ret, target);
       return ret;
     }
 
@@ -1320,6 +1388,7 @@  rtl_redirect_edge_and_branch (edge e, basic_block
     return NULL;
 
   df_set_bb_dirty (src);
+  fixup_partition_crossing (ret, target);
   return ret;
 }
 
@@ -1454,18 +1523,16 @@  force_nonfallthru_and_redirect (edge e, basic_bloc
       /* Make sure new block ends up in correct hot/cold section.  */
 
       BB_COPY_PARTITION (jump_block, e->src);
-      if (flag_reorder_blocks_and_partition
-	  && targetm_common.have_named_sections
-	  && JUMP_P (BB_END (jump_block))
-	  && !any_condjump_p (BB_END (jump_block))
-	  && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
-	add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
 
       /* Wire edge in.  */
       new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
       new_edge->probability = probability;
       new_edge->count = count;
 
+      /* If e->src was previously region crossing, it no longer is
+         and the reg crossing note should be removed.  */
+      fixup_partition_crossing (new_edge, jump_block);
+
       /* Redirect old edge.  */
       redirect_edge_pred (e, jump_block);
       e->probability = REG_BR_PROB_BASE;
@@ -1521,13 +1588,16 @@  force_nonfallthru_and_redirect (edge e, basic_bloc
       LABEL_NUSES (label)++;
     }
 
-  emit_barrier_after (BB_END (jump_block));
+  /* We might be in cfg layout mode, and if so, the following routine will
+     insert the barrier correctly.  */
+  emit_barrier_after_bb (jump_block);
   redirect_edge_succ_nodup (e, target);
 
   if (abnormal_edge_flags)
     make_edge (src, target, abnormal_edge_flags);
 
   df_mark_solutions_dirty ();
+  fixup_partition_crossing (e, target);
   return new_bb;
 }
 
@@ -1626,7 +1696,7 @@  rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
 static basic_block
 rtl_split_edge (edge edge_in)
 {
-  basic_block bb;
+  basic_block bb, new_bb;
   rtx before;
 
   /* Abnormal edges cannot be split.  */
@@ -1659,12 +1729,26 @@  rtl_split_edge (edge edge_in)
   else
     {
       bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
-      /* ??? Why not edge_in->dest->prev_bb here?  */
-      BB_COPY_PARTITION (bb, edge_in->dest);
+      if (edge_in->src == ENTRY_BLOCK_PTR)
+        BB_COPY_PARTITION (bb, edge_in->dest);
+      else
+        /* Put the split bb into the src partition, to avoid creating
+           a situation where a cold bb dominates a hot bb, in the case
+           where src is cold and dest is hot. The src will dominate
+           the new bb (whereas it might not have dominated dest).  */
+        BB_COPY_PARTITION (bb, edge_in->src);
     }
 
   make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);
 
+  /* Can't allow a region crossing edge to be fallthrough.  */
+  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
+      && edge_in->dest != EXIT_BLOCK_PTR)
+    {
+      new_bb = force_nonfallthru (single_succ_edge (bb));
+      gcc_assert (!new_bb);
+    }
+
   /* For non-fallthru edges, we must adjust the predecessor's
      jump instruction to target our new block.  */
   if ((edge_in->flags & EDGE_FALLTHRU) == 0)
@@ -1777,17 +1861,13 @@  commit_one_edge_insertion (edge e)
   else
     {
       bb = split_edge (e);
-      after = BB_END (bb);
 
-      if (flag_reorder_blocks_and_partition
-	  && targetm_common.have_named_sections
-	  && e->src != ENTRY_BLOCK_PTR
-	  && BB_PARTITION (e->src) == BB_COLD_PARTITION
-	  && !(e->flags & EDGE_CROSSING)
-	  && JUMP_P (after)
-	  && !any_condjump_p (after)
-	  && (single_succ_edge (bb)->flags & EDGE_CROSSING))
-	add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
+      /* If e crossed a partition boundary, we needed to make bb end in
+         a region-crossing jump, even though it was originally fallthru.  */
+      if (JUMP_P (BB_END (bb)))
+	before = BB_END (bb);
+      else
+        after = BB_END (bb);
     }
 
   /* Now that we've found the spot, do the insertion.  */
@@ -1827,6 +1907,14 @@  commit_edge_insertions (void)
 {
   basic_block bb;
 
+  /* Optimization passes that invoke this routine can cause hot blocks
+     previously reached by both hot and cold blocks to become dominated only
+     by cold blocks. This will cause the verification below to fail,
+     and lead to now cold code in the hot section. In some cases this
+     may only be visible after newly unreachable blocks are deleted,
+     which will be done by fixup_partitions.  */
+  fixup_partitions ();
+
 #ifdef ENABLE_CHECKING
   verify_flow_info ();
 #endif
@@ -2028,7 +2116,75 @@  get_last_bb_insn (basic_block bb)
 
   return end;
 }
-
+
+/* Perform cleanup on the hot/cold bb partitioning after optimization
+   passes that modify the cfg.  */
+
+void
+fixup_partitions (void)
+{
+  basic_block bb;
+
+  if (!crtl->has_bb_partition)
+    return;
+
+  /* Delete any blocks that became unreachable and weren't
+     already cleaned up, for example during edge forwarding
+     and convert_jumps_to_returns. This will expose more
+     opportunities for fixing the partition boundaries here.
+     Also, the calculation of the dominance graph during verification
+     will assert if there are unreachable nodes.  */
+  delete_unreachable_blocks ();
+
+  /* If there are partitions, do a sanity check on them: A basic block in
+     a cold partition cannot dominate a basic block in a hot partition.
+     Fixup any that now violate this requirement, as a result of edge
+     forwarding and unreachable block deletion.  */
+  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
+  VEC (basic_block, heap) *bbs_to_fix = NULL;
+  FOR_EACH_BB (bb)
+    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
+      VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
+  if (! VEC_empty (basic_block, bbs_in_cold_partition))
+    {
+      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+      basic_block son;
+
+      if (dom_calculated_here)
+        calculate_dominance_info (CDI_DOMINATORS);
+
+      while (! VEC_empty (basic_block, bbs_in_cold_partition))
+        {
+          bb = VEC_pop (basic_block, bbs_in_cold_partition);
+          /* If bb is not yet cold (because it was added below as
+             a block dominated by a cold bb) then mark it cold here.  */
+          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
+            {
+              BB_SET_PARTITION (bb, BB_COLD_PARTITION);
+              VEC_safe_push (basic_block, heap, bbs_to_fix, bb);
+            }
+          /* Any blocks dominated by a block in the cold section
+             must also be cold.  */
+          for (son = first_dom_son (CDI_DOMINATORS, bb);
+               son;
+               son = next_dom_son (CDI_DOMINATORS, son))
+            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
+        }
+
+      if (dom_calculated_here)
+        free_dominance_info (CDI_DOMINATORS);
+    }
+
+  /* Do the partition fixup after all necessary blocks have been converted to
+     cold, so that we only update the region crossings the minimum number of
+     places, which can require forcing edges to be non fallthru.  */
+  while (! VEC_empty (basic_block, bbs_to_fix))
+    {
+      bb = VEC_pop (basic_block, bbs_to_fix);
+      fixup_bb_partition (bb);
+    }
+}
+
 /* Verify the CFG and RTL consistency common for both underlying RTL and
    cfglayout RTL.
 
@@ -2052,6 +2208,7 @@  rtl_verify_flow_info_1 (void)
   rtx x;
   int err = 0;
   basic_block bb;
+  bool have_partitions = false;
 
   /* Check the general integrity of the basic blocks.  */
   FOR_EACH_BB_REVERSE (bb)
@@ -2169,6 +2326,8 @@  rtl_verify_flow_info_1 (void)
 
 	  if (e->flags & EDGE_ABNORMAL)
 	    n_abnormal++;
+
+          have_partitions |= is_crossing;
 	}
 
       if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
@@ -2293,6 +2452,40 @@  rtl_verify_flow_info_1 (void)
 	  }
     }
 
+  /* If there are partitions, do a sanity check on them: A basic block in
+     a cold partition cannot dominate a basic block in a hot partition.  */
+  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
+  if (have_partitions && !err)
+    FOR_EACH_BB (bb)
+      if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
+        VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
+  if (! VEC_empty (basic_block, bbs_in_cold_partition))
+    {
+      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+      basic_block son;
+
+      if (dom_calculated_here)
+        calculate_dominance_info (CDI_DOMINATORS);
+
+      while (! VEC_empty (basic_block, bbs_in_cold_partition))
+        {
+          bb = VEC_pop (basic_block, bbs_in_cold_partition);
+          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
+            {
+              error ("non-cold basic block %d dominated "
+                     "by a block in the cold partition", bb->index);
+              err = 1;
+            }
+          for (son = first_dom_son (CDI_DOMINATORS, bb);
+               son;
+               son = next_dom_son (CDI_DOMINATORS, son))
+            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
+        }
+
+      if (dom_calculated_here)
+        free_dominance_info (CDI_DOMINATORS);
+    }
+
   /* Clean up.  */
   return err;
 }
@@ -2965,14 +3158,41 @@  record_effective_endpoints (void)
   else
     cfg_layout_function_header = NULL_RTX;
 
+  had_sec_boundary_notes = false;
+
   next_insn = get_insns ();
   FOR_EACH_BB (bb)
     {
       rtx end;
 
       if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb))
-	BB_HEADER (bb) = unlink_insn_chain (next_insn,
-					      PREV_INSN (BB_HEAD (bb)));
+        {
+          /* Rather than try to keep section boundary notes incrementally
+             up-to-date through cfg layout optimizations, simply remove them
+             and flag that they should be re-inserted when exiting
+             cfg layout mode.  */
+          rtx check_insn = next_insn;
+          while (check_insn)
+            {
+              if (NOTE_P (check_insn)
+                  && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
+              {
+                had_sec_boundary_notes |= true;
+                /* Remove note from chain. Grab new next_insn first.  */
+                if (next_insn == check_insn)
+                  next_insn = NEXT_INSN (check_insn);
+                /* Delete note.  */
+                delete_insn (check_insn);
+                /* There will only be one.  */
+                break;
+              }
+              check_insn = NEXT_INSN (check_insn);
+            }
+          /* If we still have header instructions left after above loop.  */
+          if (next_insn != BB_HEAD (bb))
+            BB_HEADER (bb) = unlink_insn_chain (next_insn,
+                                                PREV_INSN (BB_HEAD (bb)));
+        }
       end = skip_insns_after_block (bb);
       if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end)
 	BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end);
@@ -3000,7 +3220,7 @@  outof_cfg_layout_mode (void)
     if (bb->next_bb != EXIT_BLOCK_PTR)
       bb->aux = bb->next_bb;
 
-  cfg_layout_finalize ();
+  cfg_layout_finalize (false);
 
   return 0;
 }
@@ -3120,10 +3340,13 @@  relink_block_chain (bool stay_in_cfglayout_mode)
 }
 
 
-/* Given a reorder chain, rearrange the code to match.  */
+/* Given a reorder chain, rearrange the code to match. If
+   this is called when we will FINALIZE_REORDER_BLOCKS, or when
+   section boundary notes were removed on entry to cfg layout
+   mode, insert section boundary notes here.  */
 
 static void
-fixup_reorder_chain (void)
+fixup_reorder_chain (bool finalize_reorder_blocks)
 {
   basic_block bb;
   rtx insn = NULL;
@@ -3150,7 +3373,7 @@  static void
 	  PREV_INSN (BB_HEADER (bb)) = insn;
 	  insn = BB_HEADER (bb);
 	  while (NEXT_INSN (insn))
-	    insn = NEXT_INSN (insn);
+            insn = NEXT_INSN (insn);
 	}
       if (insn)
 	NEXT_INSN (insn) = BB_HEAD (bb);
@@ -3175,6 +3398,11 @@  static void
     insn = NEXT_INSN (insn);
 
   set_last_insn (insn);
+
+  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
+  if (had_sec_boundary_notes || finalize_reorder_blocks)
+    insert_section_boundary_note ();
+
 #ifdef ENABLE_CHECKING
   verify_insn_chain ();
 #endif
@@ -3187,7 +3415,7 @@  static void
       edge e_fall, e_taken, e;
       rtx bb_end_insn;
       rtx ret_label = NULL_RTX;
-      basic_block nb, src_bb;
+      basic_block nb;
       edge_iterator ei;
 
       if (EDGE_COUNT (bb->succs) == 0)
@@ -3322,7 +3550,6 @@  static void
       /* We got here if we need to add a new jump insn. 
 	 Note force_nonfallthru can delete E_FALL and thus we have to
 	 save E_FALL->src prior to the call to force_nonfallthru.  */
-      src_bb = e_fall->src;
       nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
       if (nb)
 	{
@@ -3330,17 +3557,6 @@  static void
 	  bb->aux = nb;
 	  /* Don't process this new block.  */
 	  bb = nb;
-
-	  /* Make sure new bb is tagged for correct section (same as
-	     fall-thru source, since you cannot fall-thru across
-	     section boundaries).  */
-	  BB_COPY_PARTITION (src_bb, single_pred (bb));
-	  if (flag_reorder_blocks_and_partition
-	      && targetm_common.have_named_sections
-	      && JUMP_P (BB_END (bb))
-	      && !any_condjump_p (BB_END (bb))
-	      && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
-	    add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
 	}
     }
 
@@ -3644,10 +3860,11 @@  duplicate_insn_chain (rtx from, rtx to)
 	    case NOTE_INSN_FUNCTION_BEG:
 	      /* There is always just single entry to function.  */
 	    case NOTE_INSN_BASIC_BLOCK:
+              /* We should only switch text sections once.  */
+	    case NOTE_INSN_SWITCH_TEXT_SECTIONS:
 	      break;
 
 	    case NOTE_INSN_EPILOGUE_BEG:
-	    case NOTE_INSN_SWITCH_TEXT_SECTIONS:
 	      emit_note_copy (insn);
 	      break;
 
@@ -3759,10 +3976,13 @@  break_superblocks (void)
 }
 
 /* Finalize the changes: reorder insn list according to the sequence specified
-   by aux pointers, enter compensation code, rebuild scope forest.  */
+   by aux pointers, enter compensation code, rebuild scope forest. If
+   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
+   to fixup_reorder_chain so that it can insert the proper switch text
+   section notes.  */
 
 void
-cfg_layout_finalize (void)
+cfg_layout_finalize (bool finalize_reorder_blocks)
 {
 #ifdef ENABLE_CHECKING
   verify_flow_info ();
@@ -3775,7 +3995,7 @@  void
 #endif
       )
     fixup_fallthru_exit_predecessor ();
-  fixup_reorder_chain ();
+  fixup_reorder_chain (finalize_reorder_blocks);
 
   rebuild_jump_labels (get_insns ());
   delete_dead_jumptables ();
@@ -4454,8 +4674,7 @@  rtl_can_remove_branch_p (const_edge e)
   if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
     return false;
 
-  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
-      || BB_PARTITION (src) != BB_PARTITION (target))
+  if (BB_PARTITION (src) != BB_PARTITION (target))
     return false;
 
   if (!onlyjump_p (insn)