Fix PR 53743 and other -freorder-blocks-and-partition failures

Submitter Teresa Johnson
Date May 23, 2013, 1:18 p.m.
Message ID <CAAe5K+WEPgQLzO_LiKkvsNuL-JUZzJS7G9PCqFT5imVGTa5Smw@mail.gmail.com>
Permalink /patch/245950/
State New

Comments

Teresa Johnson - May 23, 2013, 1:18 p.m.
On Wed, May 22, 2013 at 2:05 PM, Teresa Johnson <tejohnson@google.com> wrote:
> Revised patch included below. The spacing of my pasted-in patch text
> looks funky again; let me know if you want the patch as an attachment
> instead.
>
> I addressed all of Steven's comments except one: the suggestion to use
> gcc_assert instead of error() in verify_hot_cold_block_grouping(). I kept
> error() there to stay consistent with the rest of the verify_flow_info
> subroutines (let me know if this is ok).

I fixed this issue too, which was actually in
insert_section_boundary_note(), so that it gcc_asserts more
efficiently as suggested. Retested, latest patch below.
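
For readers following along, the single-switch check in insert_section_boundary_note() can be illustrated outside of GCC. The following is only a rough sketch under assumptions: the enum, the array-of-partitions layout, and the name toy_find_section_boundary are hypothetical stand-ins, not GCC code. It walks blocks in layout order and asserts that the partition changes at most once.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for partition tags; not GCC's real definitions.  */
enum toy_partition { TOY_UNPARTITIONED = 0, TOY_HOT = 1, TOY_COLD = 2 };

/* Walk blocks in layout order and return the index of the first block
   of the second partition, or -1 if all blocks share one partition.
   The assert mirrors the idea of gcc_assert (!switched_sections):
   the text section may be switched at most once.  */
static int
toy_find_section_boundary (const enum toy_partition *parts, size_t n)
{
  int boundary = -1;
  int switched_sections = 0;
  enum toy_partition current = TOY_UNPARTITIONED;

  for (size_t i = 0; i < n; i++)
    {
      if (current == TOY_UNPARTITIONED)
        current = parts[i];
      if (parts[i] != current)
        {
          assert (!switched_sections);
          switched_sections = 1;
          boundary = (int) i;
          current = parts[i];
        }
    }
  return boundary;
}
```

In this toy model, a layout such as hot, hot, cold, hot would trip the assert, which is the kind of illegal grouping the real check is meant to catch.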

Honza, would you be able to review the patch?

Thanks!
Teresa

>
> The other main changes:
> (1) Added several test cases, cloned from the torture subdirectories. I
> manually built and ran the torture tests with FDO and
> -freorder-blocks-and-partition, using both the current trunk and my fixed
> trunk compiler, which exposed some failures that I fixed.
> (2) Changed existing tree-prof tests that used
> -freorder-blocks-and-partition to be built with -O2 instead of -O, so that
> partitioning actually kicks in.
> (3) Fixed a couple of failures in the new verify_hot_cold_block_grouping()
> checks, exposed by the torture tests I ran manually with splitting (2 of
> those tests are cloned to tree-prof in this patch). One was in computed
> goto, where we were too aggressive about cloning crossing edges. The other
> was in rtl_split_edge, called from the "stack" pass, which was not
> inserting the new bb in the correct partition, since bb layout is complete
> at that point.
>
> Re-tested on x86_64-unknown-linux-gnu with bootstrap and profiledbootstrap
> builds and regression testing. Re-built/ran cpu2006int with profile
> feedback and -freorder-blocks-and-partition enabled.
>
> Ok for trunk?
>
> Thanks!
> Teresa
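
The rtl_split_edge fix mentioned in item (3) of the quoted message relies on finding the last block of the source's partition, so that a new block can be placed there without adding a second partition switch to the chain. Below is a minimal sketch of that search, assuming the layout is modelled as a plain array of partition ids; toy_last_in_partition and the array representation are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <stddef.h>

/* PARTS[i] is the partition id of the i-th block in layout order.
   Return the index of the last block at or after START that is still
   in START's partition.  Hypothetical helper, not GCC code.  */
static size_t
toy_last_in_partition (const int *parts, size_t n, size_t start)
{
  assert (start < n);

  size_t i = start;
  /* Advance while the next block stays in START's partition.  */
  while (i + 1 < n && parts[i + 1] == parts[start])
    i++;
  return i;
}
```

Inserting the split block after the returned index keeps all blocks of one partition contiguous in this model, whereas inserting it just before the edge's destination could strand a block of the source partition inside the other section.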

2013-05-23  Teresa Johnson  <tejohnson@google.com>

* ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
        as this is now done by redirect_edge_and_branch_force.
* function.c (thread_prologue_and_epilogue_insns): Insert new bb after
        barriers, and fix interaction with splitting.
* emit-rtl.c (try_split): Copy REG_CROSSING_JUMP notes.
* cfgcleanup.c (try_forward_edges): Fix early return value to properly
        reflect changes made in the routine.
* bb-reorder.c (emit_barrier_after_bb): Move to cfgrtl.c.
(fix_up_fall_thru_edges): Remove incorrect check for bb layout order
        since this is called in cfglayout mode, and replace partition fixup
        with assert as that is now done by force_nonfallthru_and_redirect.
(add_reg_crossing_jump_notes): Handle the fact that some jumps may
        already be marked with region crossing note.
(insert_section_boundary_note): Make non-static, gate on flag
        has_bb_partition, rewrite to also check for multiple partitions.
(rest_of_handle_reorder_blocks): Remove call to
        insert_section_boundary_note, now done later during free_cfg.
(duplicate_computed_gotos): Don't duplicate partition crossing edge.
* bb-reorder.h (insert_section_boundary_note): Declare.
* Makefile.in (cfgrtl.o): Depend on bb-reorder.h.
* cfgrtl.c (rest_of_pass_free_cfg): If partitions exist
        invoke insert_section_boundary_note.
(try_redirect_by_replacing_jump): Remove unnecessary
        check for region crossing note.
(fixup_partition_crossing): New function.
(rtl_redirect_edge_and_branch): Fixup partition boundaries.
(emit_barrier_after_bb): Move here from bb-reorder.c, handle insertion
        in non-cfglayout mode.
(force_nonfallthru_and_redirect): Fixup partition boundaries,
        remove old code that tried to do this. Emit barrier correctly
        when we are in cfglayout mode.
(last_bb_in_partition): New function.
(rtl_split_edge): Correctly fixup partition boundaries.
(commit_one_edge_insertion): Remove old code that tried to
        fixup region crossing edge since this is now handled in
        split_block, and set up insertion point correctly since
        block may now end in a jump.
(verify_hot_cold_block_grouping): Guard against checking when not in
        linearized RTL mode.
(rtl_verify_edges): Add checks for incorrect/missing REG_CROSSING_JUMP
        notes.
(rtl_verify_flow_info_1): Move verify_hot_cold_block_grouping to
        rtl_verify_flow_info, so not called in cfglayout mode.
(rtl_verify_flow_info): Move verify_hot_cold_block_grouping here.
(fixup_reorder_chain): Remove old code that attempted to fixup region
        crossing note as this is now handled in force_nonfallthru_and_redirect.
(duplicate_insn_chain): Don't duplicate switch section notes.
(rtl_can_remove_branch_p): Remove unnecessary check for region crossing
        note.
* basic-block.h (emit_barrier_after_bb): Declare.
* testsuite/gcc.dg/tree-prof/va-arg-pack-1.c: Cloned from c-torture, made
        into -freorder-blocks-and-partition test.
* testsuite/gcc.dg/tree-prof/comp-goto-1.c: Ditto.
* testsuite/gcc.dg/tree-prof/20041218-1.c: Ditto.
* testsuite/gcc.dg/tree-prof/pr52027.c: Use -O2.
* testsuite/gcc.dg/tree-prof/pr50907.c: Ditto.
* testsuite/gcc.dg/tree-prof/pr45354.c: Ditto.
* testsuite/g++.dg/tree-prof/partition2.C: Ditto.
* testsuite/g++.dg/tree-prof/partition3.C: Ditto.
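
Several of the cfgrtl.c entries above depend on the new fixup_partition_crossing helper. Its bookkeeping can be sketched on a toy model as follows; this is an illustration under assumptions, not the patch itself. The toy_bb and toy_edge structs and the boolean fields are hypothetical stand-ins for GCC's basic_block, edge, EDGE_CROSSING flag and REG_CROSSING_JUMP note.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; these are not GCC's basic_block or edge.  */
struct toy_bb
{
  int partition;       /* 1 = hot, 2 = cold (toy values) */
  bool ends_in_jump;   /* models JUMP_P (BB_END (bb)) */
  bool crossing_note;  /* models a REG_CROSSING_JUMP note on the jump */
};

struct toy_edge
{
  struct toy_bb *src;
  struct toy_bb *dest;
  bool crossing;       /* models the EDGE_CROSSING flag */
};

/* Re-derive the crossing state of E after it has been redirected.
   SUCCS holds all N_SUCCS outgoing edges of E->src, including E.  */
static void
toy_fixup_partition_crossing (struct toy_edge *e,
                              const struct toy_edge *succs, size_t n_succs)
{
  assert (e != NULL && e->src != NULL && e->dest != NULL);

  if (e->src->partition != e->dest->partition)
    {
      /* Edge now crosses: flag it and note the jump, if there is one.  */
      e->crossing = true;
      if (e->src->ends_in_jump)
        e->src->crossing_note = true;
      return;
    }

  /* Edge no longer crosses.  Drop the note only if no other successor
     of the source block still crosses.  */
  e->crossing = false;
  if (!e->src->crossing_note)
    return;
  for (size_t i = 0; i < n_succs; i++)
    if (succs[i].crossing)
      return;
  e->src->crossing_note = false;
}
```

The point of centralizing this logic is that callers which redirect edges no longer each need their own partial copy of it.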




--
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Teresa Johnson - May 29, 2013, 2:57 p.m.
On Thu, May 23, 2013 at 6:18 AM, Teresa Johnson <tejohnson@google.com> wrote:
> On Wed, May 22, 2013 at 2:05 PM, Teresa Johnson <tejohnson@google.com> wrote:
>> Revised patch included below. The spacing of my pasted-in patch text
>> looks funky again; let me know if you want the patch as an attachment
>> instead.
>>
>> I addressed all of Steven's comments except one: the suggestion to use
>> gcc_assert instead of error() in verify_hot_cold_block_grouping(). I kept
>> error() there to stay consistent with the rest of the verify_flow_info
>> subroutines (let me know if this is ok).
>
> I fixed this issue too, which was actually in
> insert_section_boundary_note(), so that it gcc_asserts more
> efficiently as suggested. Retested, latest patch below.
>
> Honza, would you be able to review the patch?

Ping. Still needs a global maintainer to review and approve.

Also, I submitted a PR for the debug range issue:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57451

Thanks!
Teresa

>
> Thanks!
> Teresa
>
>>
>> The other main changes:
>> (1) Added several test cases, cloned from the torture subdirectories. I
>> manually built and ran the torture tests with FDO and
>> -freorder-blocks-and-partition, using both the current trunk and my fixed
>> trunk compiler, which exposed some failures that I fixed.
>> (2) Changed existing tree-prof tests that used
>> -freorder-blocks-and-partition to be built with -O2 instead of -O, so that
>> partitioning actually kicks in.
>> (3) Fixed a couple of failures in the new verify_hot_cold_block_grouping()
>> checks, exposed by the torture tests I ran manually with splitting (2 of
>> those tests are cloned to tree-prof in this patch). One was in computed
>> goto, where we were too aggressive about cloning crossing edges. The other
>> was in rtl_split_edge, called from the "stack" pass, which was not
>> inserting the new bb in the correct partition, since bb layout is complete
>> at that point.
>>
>> Re-tested on x86_64-unknown-linux-gnu with bootstrap and profiledbootstrap
>> builds and regression testing. Re-built/ran cpu2006int with profile
>> feedback and -freorder-blocks-and-partition enabled.
>>
>> Ok for trunk?
>>
>> Thanks!
>> Teresa
>
> 2013-05-23  Teresa Johnson  <tejohnson@google.com>
>
> * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
>         as this is now done by redirect_edge_and_branch_force.
> * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
>         barriers, and fix interaction with splitting.
> * emit-rtl.c (try_split): Copy REG_CROSSING_JUMP notes.
> * cfgcleanup.c (try_forward_edges): Fix early return value to properly
>         reflect changes made in the routine.
> * bb-reorder.c (emit_barrier_after_bb): Move to cfgrtl.c.
> (fix_up_fall_thru_edges): Remove incorrect check for bb layout order
>         since this is called in cfglayout mode, and replace partition fixup
>         with assert as that is now done by force_nonfallthru_and_redirect.
> (add_reg_crossing_jump_notes): Handle the fact that some jumps may
>         already be marked with region crossing note.
> (insert_section_boundary_note): Make non-static, gate on flag
>         has_bb_partition, rewrite to also check for multiple partitions.
> (rest_of_handle_reorder_blocks): Remove call to
>         insert_section_boundary_note, now done later during free_cfg.
> (duplicate_computed_gotos): Don't duplicate partition crossing edge.
> * bb-reorder.h (insert_section_boundary_note): Declare.
> * Makefile.in (cfgrtl.o): Depend on bb-reorder.h.
> * cfgrtl.c (rest_of_pass_free_cfg): If partitions exist
>         invoke insert_section_boundary_note.
> (try_redirect_by_replacing_jump): Remove unnecessary
>         check for region crossing note.
> (fixup_partition_crossing): New function.
> (rtl_redirect_edge_and_branch): Fixup partition boundaries.
> (emit_barrier_after_bb): Move here from bb-reorder.c, handle insertion
>         in non-cfglayout mode.
> (force_nonfallthru_and_redirect): Fixup partition boundaries,
>         remove old code that tried to do this. Emit barrier correctly
>         when we are in cfglayout mode.
> (last_bb_in_partition): New function.
> (rtl_split_edge): Correctly fixup partition boundaries.
> (commit_one_edge_insertion): Remove old code that tried to
>         fixup region crossing edge since this is now handled in
>         split_block, and set up insertion point correctly since
>         block may now end in a jump.
> (verify_hot_cold_block_grouping): Guard against checking when not in
>         linearized RTL mode.
> (rtl_verify_edges): Add checks for incorrect/missing REG_CROSSING_JUMP
>         notes.
> (rtl_verify_flow_info_1): Move verify_hot_cold_block_grouping to
>         rtl_verify_flow_info, so not called in cfglayout mode.
> (rtl_verify_flow_info): Move verify_hot_cold_block_grouping here.
> (fixup_reorder_chain): Remove old code that attempted to fixup region
>         crossing note as this is now handled in force_nonfallthru_and_redirect.
> (duplicate_insn_chain): Don't duplicate switch section notes.
> (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
>         note.
> * basic-block.h (emit_barrier_after_bb): Declare.
> * testsuite/gcc.dg/tree-prof/va-arg-pack-1.c: Cloned from c-torture, made
>         into -freorder-blocks-and-partition test.
> * testsuite/gcc.dg/tree-prof/comp-goto-1.c: Ditto.
> * testsuite/gcc.dg/tree-prof/20041218-1.c: Ditto.
> * testsuite/gcc.dg/tree-prof/pr52027.c: Use -O2.
> * testsuite/gcc.dg/tree-prof/pr50907.c: Ditto.
> * testsuite/gcc.dg/tree-prof/pr45354.c: Ditto.
> * testsuite/g++.dg/tree-prof/partition2.C: Ditto.
> * testsuite/g++.dg/tree-prof/partition3.C: Ditto.
>
> Index: ifcvt.c
> ===================================================================
> --- ifcvt.c (revision 199014)
> +++ ifcvt.c (working copy)
> @@ -3905,10 +3905,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
>    if (new_bb)
>      {
>        df_bb_replace (then_bb_index, new_bb);
> -      /* Since the fallthru edge was redirected from test_bb to new_bb,
> -         we need to ensure that new_bb is in the same partition as
> -         test bb (you can not fall through across section boundaries).  */
> -      BB_COPY_PARTITION (new_bb, test_bb);
> +      /* This should have been done above via force_nonfallthru_and_redirect
> +         (possibly called from redirect_edge_and_branch_force).  */
> +      gcc_checking_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
>      }
>
>    num_true_changes++;
> Index: function.c
> ===================================================================
> --- function.c (revision 199014)
> +++ function.c (working copy)
> @@ -6270,8 +6270,10 @@ thread_prologue_and_epilogue_insns (void)
>      break;
>   if (e)
>    {
> -    copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
> -  NULL_RTX, e->src);
> +                    /* Make sure we insert after any barriers.  */
> +                    rtx end = get_last_bb_insn (e->src);
> +                    copy_bb = create_basic_block (NEXT_INSN (end),
> +                                                  NULL_RTX, e->src);
>      BB_COPY_PARTITION (copy_bb, e->src);
>    }
>   else
> @@ -6538,7 +6540,7 @@ epilogue_done:
>        basic_block simple_return_block_cold = NULL;
>        edge pending_edge_hot = NULL;
>        edge pending_edge_cold = NULL;
> -      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
> +      basic_block exit_pred;
>        int i;
>
>        gcc_assert (entry_edge != orig_entry_edge);
> @@ -6566,6 +6568,12 @@ epilogue_done:
>      else
>        pending_edge_cold = e;
>    }
> +
> +      /* Save a pointer to the exit's predecessor BB for use in
> +         inserting new BBs at the end of the function. Do this
> +         after the call to split_block above which may split
> +         the original exit pred.  */
> +      exit_pred = EXIT_BLOCK_PTR->prev_bb;
>
>        FOR_EACH_VEC_ELT (unconverted_simple_returns, i, e)
>   {
> Index: emit-rtl.c
> ===================================================================
> --- emit-rtl.c (revision 199014)
> +++ emit-rtl.c (working copy)
> @@ -3574,6 +3574,7 @@ try_split (rtx pat, rtx trial, int last)
>    break;
>
>   case REG_NON_LOCAL_GOTO:
> + case REG_CROSSING_JUMP:
>    for (insn = insn_last; insn != NULL_RTX; insn = PREV_INSN (insn))
>      {
>        if (JUMP_P (insn))
> Index: cfgcleanup.c
> ===================================================================
> --- cfgcleanup.c (revision 199014)
> +++ cfgcleanup.c (working copy)
> @@ -456,7 +456,7 @@ try_forward_edges (int mode, basic_block b)
>
>        if (first != EXIT_BLOCK_PTR
>    && find_reg_note (BB_END (first), REG_CROSSING_JUMP, NULL_RTX))
> - return false;
> + return changed;
>
>        while (counter < n_basic_blocks)
>   {
> Index: bb-reorder.c
> ===================================================================
> --- bb-reorder.c (revision 199014)
> +++ bb-reorder.c (working copy)
> @@ -1380,15 +1380,6 @@ get_uncond_jump_length (void)
>    return length;
>  }
>
> -/* Emit a barrier into the footer of BB.  */
> -
> -static void
> -emit_barrier_after_bb (basic_block bb)
> -{
> -  rtx barrier = emit_barrier_after (BB_END (bb));
> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
> -}
> -
>  /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
>     Duplicate the landing pad and split the edges so that no EH edge
>     crosses partitions.  */
> @@ -1720,8 +1711,7 @@ fix_up_fall_thru_edges (void)
>       (i.e. fix it so the fall through does not cross and
>       the cond jump does).  */
>
> -  if (!cond_jump_crosses
> -      && cur_bb->aux == cond_jump->dest)
> +  if (!cond_jump_crosses)
>      {
>        /* Find label in fall_thru block. We've already added
>   any missing labels, so there must be one.  */
> @@ -1765,10 +1755,10 @@ fix_up_fall_thru_edges (void)
>        new_bb->aux = cur_bb->aux;
>        cur_bb->aux = new_bb;
>
> -      /* Make sure new fall-through bb is in same
> - partition as bb it's falling through from.  */
> +                      /* This is done by force_nonfallthru_and_redirect.  */
> +      gcc_assert (BB_PARTITION (new_bb)
> +                                  == BB_PARTITION (cur_bb));
>
> -      BB_COPY_PARTITION (new_bb, cur_bb);
>        single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
>      }
>    else
> @@ -2064,7 +2054,10 @@ add_reg_crossing_jump_notes (void)
>    FOR_EACH_BB (bb)
>      FOR_EACH_EDGE (e, ei, bb->succs)
>        if ((e->flags & EDGE_CROSSING)
> -  && JUMP_P (BB_END (e->src)))
> +  && JUMP_P (BB_END (e->src))
> +          /* Some notes were added during fix_up_fall_thru_edges, via
> +             force_nonfallthru_and_redirect.  */
> +          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
>   add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>  }
>
> @@ -2133,23 +2126,26 @@ reorder_basic_blocks (void)
>     encountering this note will make the compiler switch between the
>     hot and cold text sections.  */
>
> -static void
> +void
>  insert_section_boundary_note (void)
>  {
>    basic_block bb;
> -  int first_partition = 0;
> +  bool switched_sections = false;
> +  int current_partition = 0;
>
> -  if (!flag_reorder_blocks_and_partition)
> +  if (!crtl->has_bb_partition)
>      return;
>
>    FOR_EACH_BB (bb)
>      {
> -      if (!first_partition)
> - first_partition = BB_PARTITION (bb);
> -      if (BB_PARTITION (bb) != first_partition)
> +      if (!current_partition)
> + current_partition = BB_PARTITION (bb);
> +      if (BB_PARTITION (bb) != current_partition)
>   {
> -  emit_note_before (NOTE_INSN_SWITCH_TEXT_SECTIONS, BB_HEAD (bb));
> -  break;
> +  gcc_assert (!switched_sections);
> +          switched_sections = true;
> +          emit_note_before (NOTE_INSN_SWITCH_TEXT_SECTIONS, BB_HEAD (bb));
> +          current_partition = BB_PARTITION (bb);
>   }
>      }
>  }
> @@ -2180,8 +2176,6 @@ rest_of_handle_reorder_blocks (void)
>        bb->aux = bb->next_bb;
>    cfg_layout_finalize ();
>
> -  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
> -  insert_section_boundary_note ();
>    return 0;
>  }
>
> @@ -2315,6 +2309,11 @@ duplicate_computed_gotos (void)
>        if (!bitmap_bit_p (candidates, single_succ (bb)->index))
>   continue;
>
> +      /* Don't duplicate a partition crossing edge, which requires difficult
> +         fixup.  */
> +      if (find_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX))
> + continue;
> +
>        new_bb = duplicate_block (single_succ (bb), single_succ_edge (bb), bb);
>        new_bb->aux = bb->aux;
>        bb->aux = new_bb;
> Index: bb-reorder.h
> ===================================================================
> --- bb-reorder.h (revision 199014)
> +++ bb-reorder.h (working copy)
> @@ -35,4 +35,6 @@ extern struct target_bb_reorder *this_target_bb_re
>
>  extern int get_uncond_jump_length (void);
>
> +extern void insert_section_boundary_note (void);
> +
>  #endif
> Index: Makefile.in
> ===================================================================
> --- Makefile.in (revision 199014)
> +++ Makefile.in (working copy)
> @@ -3151,7 +3151,7 @@ cfgrtl.o : cfgrtl.c $(CONFIG_H) $(SYSTEM_H) corety
>     $(FUNCTION_H) $(EXCEPT_H) $(TM_P_H) $(INSN_ATTR_H) \
>     insn-config.h $(EXPR_H) \
>     $(CFGLOOP_H) $(OBSTACK_H) $(TARGET_H) $(TREE_H) \
> -   $(TREE_PASS_H) $(DF_H) $(GGC_H) $(COMMON_TARGET_H) gt-cfgrtl.h
> +   $(TREE_PASS_H) $(DF_H) $(GGC_H) $(COMMON_TARGET_H) gt-cfgrtl.h bb-reorder.h
>  cfganal.o : cfganal.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(BASIC_BLOCK_H) \
>     $(TIMEVAR_H) sbitmap.h $(BITMAP_H)
>  cfgbuild.o : cfgbuild.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
> Index: cfgrtl.c
> ===================================================================
> --- cfgrtl.c (revision 199014)
> +++ cfgrtl.c (working copy)
> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree.h"
>  #include "hard-reg-set.h"
>  #include "basic-block.h"
> +#include "bb-reorder.h"
>  #include "regs.h"
>  #include "flags.h"
>  #include "function.h"
> @@ -451,6 +452,9 @@ rest_of_pass_free_cfg (void)
>      }
>  #endif
>
> +  if (crtl->has_bb_partition)
> +    insert_section_boundary_note ();
> +
>    free_bb_for_insn ();
>    return 0;
>  }
> @@ -981,8 +985,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
>       partition boundaries).  See  the comments at the top of
>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>
> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
> -      || BB_PARTITION (src) != BB_PARTITION (target))
> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>      return NULL;
>
>    /* We can replace or remove a complex jump only when we have exactly
> @@ -1291,6 +1294,53 @@ redirect_branch_edge (edge e, basic_block target)
>    return e;
>  }
>
> +/* Called when edge E has been redirected to a new destination,
> +   in order to update the region crossing flag on the edge and
> +   jump.  */
> +
> +static void
> +fixup_partition_crossing (edge e)
> +{
> +  rtx note;
> +
> +  if (e->src == ENTRY_BLOCK_PTR || e->dest == EXIT_BLOCK_PTR)
> +    return;
> +  /* If we redirected an existing edge, it may already be marked
> +     crossing, even though the new src is missing a reg crossing note.
> +     But make sure reg crossing note doesn't already exist before
> +     inserting.  */
> +  if (BB_PARTITION (e->src) != BB_PARTITION (e->dest))
> +    {
> +      e->flags |= EDGE_CROSSING;
> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
> +      if (JUMP_P (BB_END (e->src))
> +          && !note)
> +        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
> +    }
> +  else if (BB_PARTITION (e->src) == BB_PARTITION (e->dest))
> +    {
> +      e->flags &= ~EDGE_CROSSING;
> +      /* Remove the section crossing note from jump at end of
> +         src if it exists, and if no other successors are
> +         still crossing.  */
> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
> +      if (note)
> +        {
> +          bool has_crossing_succ = false;
> +          edge e2;
> +          edge_iterator ei;
> +          FOR_EACH_EDGE (e2, ei, e->src->succs)
> +            {
> +              has_crossing_succ |= (e2->flags & EDGE_CROSSING);
> +              if (has_crossing_succ)
> +                break;
> +            }
> +          if (!has_crossing_succ)
> +            remove_note (BB_END (e->src), note);
> +        }
> +    }
> +}
> +
>  /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
>     expense of adding new instructions or reordering basic blocks.
>
> @@ -1307,16 +1357,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>  {
>    edge ret;
>    basic_block src = e->src;
> +  basic_block dest = e->dest;
>
>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>      return NULL;
>
> -  if (e->dest == target)
> +  if (dest == target)
>      return e;
>
>    if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
>      {
>        df_set_bb_dirty (src);
> +      fixup_partition_crossing (ret);
>        return ret;
>      }
>
> @@ -1325,9 +1377,22 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>      return NULL;
>
>    df_set_bb_dirty (src);
> +  fixup_partition_crossing (ret);
>    return ret;
>  }
>
> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
> +
> +void
> +emit_barrier_after_bb (basic_block bb)
> +{
> +  rtx barrier = emit_barrier_after (BB_END (bb));
> +  gcc_assert (current_ir_type() == IR_RTL_CFGRTL
> +              || current_ir_type () == IR_RTL_CFGLAYOUT);
> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
> +}
> +
>  /* Like force_nonfallthru below, but additionally performs redirection
>     Used by redirect_edge_and_branch_force.  JUMP_LABEL is used only
>     when redirecting to the EXIT_BLOCK, it is either ret_rtx or
> @@ -1492,12 +1557,6 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>        /* Make sure new block ends up in correct hot/cold section.  */
>
>        BB_COPY_PARTITION (jump_block, e->src);
> -      if (flag_reorder_blocks_and_partition
> -  && targetm_common.have_named_sections
> -  && JUMP_P (BB_END (jump_block))
> -  && !any_condjump_p (BB_END (jump_block))
> -  && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
> - add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
>
>        /* Wire edge in.  */
>        new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
> @@ -1508,6 +1567,10 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>        redirect_edge_pred (e, jump_block);
>        e->probability = REG_BR_PROB_BASE;
>
> +      /* If e->src was previously region crossing, it no longer is
> +         and the reg crossing note should be removed.  */
> +      fixup_partition_crossing (new_edge);
> +
>        /* If asm goto has any label refs to target's label,
>   add also edge from asm goto bb to target.  */
>        if (asm_goto_edge)
> @@ -1559,13 +1622,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>        LABEL_NUSES (label)++;
>      }
>
> -  emit_barrier_after (BB_END (jump_block));
> +  /* We might be in cfg layout mode, and if so, the following routine will
> +     insert the barrier correctly.  */
> +  emit_barrier_after_bb (jump_block);
>    redirect_edge_succ_nodup (e, target);
>
>    if (abnormal_edge_flags)
>      make_edge (src, target, abnormal_edge_flags);
>
>    df_mark_solutions_dirty ();
> +  fixup_partition_crossing (e);
>    return new_bb;
>  }
>
> @@ -1654,6 +1720,21 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
>    return false;
>  }
>
> +/* Locate the last bb in the same partition as START_BB.  */
> +
> +static basic_block
> +last_bb_in_partition (basic_block start_bb)
> +{
> +  basic_block bb;
> +  FOR_BB_BETWEEN (bb, start_bb, EXIT_BLOCK_PTR, next_bb)
> +    {
> +      if (BB_PARTITION (start_bb) != BB_PARTITION (bb->next_bb))
> +        return bb;
> +    }
> +  /* Return bb before EXIT_BLOCK_PTR.  */
> +  return bb->prev_bb;
> +}
> +
>  /* Split a (typically critical) edge.  Return the new block.
>     The edge must not be abnormal.
>
> @@ -1664,7 +1745,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
>  static basic_block
>  rtl_split_edge (edge edge_in)
>  {
> -  basic_block bb;
> +  basic_block bb, new_bb;
>    rtx before;
>
>    /* Abnormal edges cannot be split.  */
> @@ -1696,13 +1777,50 @@ rtl_split_edge (edge edge_in)
>      }
>    else
>      {
> -      bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
> -      /* ??? Why not edge_in->dest->prev_bb here?  */
> -      BB_COPY_PARTITION (bb, edge_in->dest);
> +      if (edge_in->src == ENTRY_BLOCK_PTR)
> +        {
> +          bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
> +          BB_COPY_PARTITION (bb, edge_in->dest);
> +        }
> +      else
> +        {
> +          basic_block after = edge_in->dest->prev_bb;
> +          /* If this is post-bb reordering, and the edge crosses a partition
> +             boundary, the new block needs to be inserted in the bb chain
> +             at the end of the src partition (since we put the new bb into
> +             that partition, see below). Otherwise we may end up creating
> +             an extra partition crossing in the chain, which is illegal.
> +             It can't go after the src, because src may have a fall-through
> +             to a different block.  */
> +          if (crtl->bb_reorder_complete
> +              && (edge_in->flags & EDGE_CROSSING))
> +            {
> +              after = last_bb_in_partition (edge_in->src);
> +              before = NEXT_INSN (BB_END (after));
> +              /* The instruction following the last bb in partition should
> +                 be a barrier, since it cannot end in a fall-through.  */
> +              gcc_checking_assert (BARRIER_P (before));
> +              before = NEXT_INSN (before);
> +            }
> +          bb = create_basic_block (before, NULL, after);
> +          /* Put the split bb into the src partition, to avoid creating
> +             a situation where a cold bb dominates a hot bb, in the case
> +             where src is cold and dest is hot. The src will dominate
> +             the new bb (whereas it might not have dominated dest).  */
> +          BB_COPY_PARTITION (bb, edge_in->src);
> +        }
>      }
>
>    make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);
>
> +  /* Can't allow a region crossing edge to be fallthrough.  */
> +  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
> +      && edge_in->dest != EXIT_BLOCK_PTR)
> +    {
> +      new_bb = force_nonfallthru (single_succ_edge (bb));
> +      gcc_assert (!new_bb);
> +    }
> +
>    /* For non-fallthru edges, we must adjust the predecessor's
>       jump instruction to target our new block.  */
>    if ((edge_in->flags & EDGE_FALLTHRU) == 0)
> @@ -1815,17 +1933,13 @@ commit_one_edge_insertion (edge e)
>    else
>      {
>        bb = split_edge (e);
> -      after = BB_END (bb);
>
> -      if (flag_reorder_blocks_and_partition
> -  && targetm_common.have_named_sections
> -  && e->src != ENTRY_BLOCK_PTR
> -  && BB_PARTITION (e->src) == BB_COLD_PARTITION
> -  && !(e->flags & EDGE_CROSSING)
> -  && JUMP_P (after)
> -  && !any_condjump_p (after)
> -  && (single_succ_edge (bb)->flags & EDGE_CROSSING))
> - add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
> +      /* If E crossed a partition boundary, we needed to make bb end in
> +         a region-crossing jump, even though it was originally fallthru.  */
> +      if (JUMP_P (BB_END (bb)))
> + before = BB_END (bb);
> +      else
> +        after = BB_END (bb);
>      }
>
>    /* Now that we've found the spot, do the insertion.  */
> @@ -2071,7 +2185,11 @@ verify_hot_cold_block_grouping (void)
>    bool switched_sections = false;
>    int current_partition = BB_UNPARTITIONED;
>
> -  if (!crtl->bb_reorder_complete)
> +  /* Even after bb reordering is complete, we go into cfglayout mode
> +     again (in compgoto). Ensure we don't call this before going back
> +     into linearized RTL when any layout fixes would have been committed.  */
> +  if (!crtl->bb_reorder_complete
> +      || current_ir_type() != IR_RTL_CFGRTL)
>      return err;
>
>    FOR_EACH_BB (bb)
> @@ -2116,6 +2234,7 @@ rtl_verify_edges (void)
>        edge e, fallthru = NULL;
>        edge_iterator ei;
>        rtx note;
> +      bool has_crossing_edge = false;
>
>        if (JUMP_P (BB_END (bb))
>    && (note = find_reg_note (BB_END (bb), REG_BR_PROB, NULL_RTX))
> @@ -2141,6 +2260,7 @@ rtl_verify_edges (void)
>    is_crossing = (BB_PARTITION (e->src) != BB_PARTITION (e->dest)
>   && e->src != ENTRY_BLOCK_PTR
>   && e->dest != EXIT_BLOCK_PTR);
> +          has_crossing_edge |= is_crossing;
>    if (e->flags & EDGE_CROSSING)
>      {
>        if (!is_crossing)
> @@ -2160,6 +2280,13 @@ rtl_verify_edges (void)
>   e->src->index);
>    err = 1;
>   }
> +              if (JUMP_P (BB_END (bb))
> +                  && !find_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX))
> + {
> +  error ("No region crossing jump at section boundary in bb %i",
> + bb->index);
> +  err = 1;
> + }
>      }
>    else if (is_crossing)
>      {
> @@ -2188,6 +2315,15 @@ rtl_verify_edges (void)
>      n_abnormal++;
>   }
>
> +        if (!has_crossing_edge
> +            && find_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX))
> +          {
> +            print_rtl_with_bb (stderr, get_insns (), TDF_RTL | TDF_BLOCKS | TDF_DETAILS);
> +            error ("Region crossing jump across same section in bb %i",
> +                   bb->index);
> +            err = 1;
> +          }
> +
>        if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
>   {
>    error ("missing REG_EH_REGION note at the end of bb %i", bb->index);
> @@ -2395,8 +2531,6 @@ rtl_verify_flow_info_1 (void)
>
>    err |= rtl_verify_edges ();
>
> -  err |= verify_hot_cold_block_grouping();
> -
>    return err;
>  }
>
> @@ -2642,6 +2776,8 @@ rtl_verify_flow_info (void)
>
>    err |= rtl_verify_bb_layout ();
>
> +  err |= verify_hot_cold_block_grouping ();
> +
>    return err;
>  }
>
> @@ -3343,7 +3479,7 @@ fixup_reorder_chain (void)
>        edge e_fall, e_taken, e;
>        rtx bb_end_insn;
>        rtx ret_label = NULL_RTX;
> -      basic_block nb, src_bb;
> +      basic_block nb;
>        edge_iterator ei;
>
>        if (EDGE_COUNT (bb->succs) == 0)
> @@ -3478,7 +3614,6 @@ fixup_reorder_chain (void)
>        /* We got here if we need to add a new jump insn.
>   Note force_nonfallthru can delete E_FALL and thus we have to
>   save E_FALL->src prior to the call to force_nonfallthru.  */
> -      src_bb = e_fall->src;
>        nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
>        if (nb)
>   {
> @@ -3486,17 +3621,6 @@ fixup_reorder_chain (void)
>    bb->aux = nb;
>    /* Don't process this new block.  */
>    bb = nb;
> -
> -  /* Make sure new bb is tagged for correct section (same as
> -     fall-thru source, since you cannot fall-thru across
> -     section boundaries).  */
> -  BB_COPY_PARTITION (src_bb, single_pred (bb));
> -  if (flag_reorder_blocks_and_partition
> -      && targetm_common.have_named_sections
> -      && JUMP_P (BB_END (bb))
> -      && !any_condjump_p (BB_END (bb))
> -      && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
> -    add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
>   }
>      }
>
> @@ -3796,10 +3920,11 @@ duplicate_insn_chain (rtx from, rtx to)
>      case NOTE_INSN_FUNCTION_BEG:
>        /* There is always just single entry to function.  */
>      case NOTE_INSN_BASIC_BLOCK:
> +              /* We should only switch text sections once.  */
> +    case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>        break;
>
>      case NOTE_INSN_EPILOGUE_BEG:
> -    case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>        emit_note_copy (insn);
>        break;
>
> @@ -4611,8 +4736,7 @@ rtl_can_remove_branch_p (const_edge e)
>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>      return false;
>
> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
> -      || BB_PARTITION (src) != BB_PARTITION (target))
> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>      return false;
>
>    if (!onlyjump_p (insn)
> Index: basic-block.h
> ===================================================================
> --- basic-block.h (revision 199014)
> +++ basic-block.h (working copy)
> @@ -796,6 +796,7 @@ extern basic_block force_nonfallthru_and_redirect
>  extern bool contains_no_active_insn_p (const_basic_block);
>  extern bool forwarder_block_p (const_basic_block);
>  extern bool can_fallthru (basic_block, basic_block);
> +extern void emit_barrier_after_bb (basic_block bb);
>
>  /* In cfgbuild.c.  */
>  extern void find_many_sub_basic_blocks (sbitmap);
> Index: testsuite/gcc.dg/tree-prof/va-arg-pack-1.c
> ===================================================================
> --- testsuite/gcc.dg/tree-prof/va-arg-pack-1.c (revision 0)
> +++ testsuite/gcc.dg/tree-prof/va-arg-pack-1.c (revision 0)
> @@ -0,0 +1,145 @@
> +/* __builtin_va_arg_pack () builtin tests.  */
> +/* { dg-require-effective-target freorder } */
> +/* { dg-options "-O2 -freorder-blocks-and-partition" } */
> +
> +#include <stdarg.h>
> +
> +extern void abort (void);
> +
> +int v1 = 8;
> +long int v2 = 3;
> +void *v3 = (void *) &v2;
> +struct A { char c[16]; } v4 = { "foo" };
> +long double v5 = 40;
> +char seen[20];
> +int cnt;
> +
> +__attribute__ ((noinline)) int
> +foo1 (int x, int y, ...)
> +{
> +  int i;
> +  long int l;
> +  void *v;
> +  struct A a;
> +  long double ld;
> +  va_list ap;
> +
> +  va_start (ap, y);
> +  if (x < 0 || x >= 20 || seen[x])
> +    abort ();
> +  seen[x] = ++cnt;
> +  if (y != 6)
> +    abort ();
> +  i = va_arg (ap, int);
> +  if (i != 5)
> +    abort ();
> +  switch (x)
> +    {
> +    case 0:
> +      i = va_arg (ap, int);
> +      if (i != 9 || v1 != 9)
> + abort ();
> +      a = va_arg (ap, struct A);
> +      if (__builtin_memcmp (a.c, v4.c, sizeof (a.c)) != 0)
> + abort ();
> +      v = (void *) va_arg (ap, struct A *);
> +      if (v != (void *) &v4)
> + abort ();
> +      l = va_arg (ap, long int);
> +      if (l != 3 || v2 != 4)
> + abort ();
> +      break;
> +    case 1:
> +      ld = va_arg (ap, long double);
> +      if (ld != 41 || v5 != ld)
> + abort ();
> +      i = va_arg (ap, int);
> +      if (i != 8)
> + abort ();
> +      v = va_arg (ap, void *);
> +      if (v != &v2)
> + abort ();
> +      break;
> +    case 2:
> +      break;
> +    default:
> +      abort ();
> +    }
> +  va_end (ap);
> +  return x;
> +}
> +
> +__attribute__ ((noinline)) int
> +foo2 (int x, int y, ...)
> +{
> +  long long int ll;
> +  void *v;
> +  struct A a, b;
> +  long double ld;
> +  va_list ap;
> +
> +  va_start (ap, y);
> +  if (x < 0 || x >= 20 || seen[x])
> +    abort ();
> +  seen[x] = ++cnt | 64;
> +  if (y != 10)
> +    abort ();
> +  switch (x)
> +    {
> +    case 11:
> +      break;
> +    case 12:
> +      ld = va_arg (ap, long double);
> +      if (ld != 41 || v5 != 40)
> + abort ();
> +      a = va_arg (ap, struct A);
> +      if (__builtin_memcmp (a.c, v4.c, sizeof (a.c)) != 0)
> + abort ();
> +      b = va_arg (ap, struct A);
> +      if (__builtin_memcmp (b.c, v4.c, sizeof (b.c)) != 0)
> + abort ();
> +      v = va_arg (ap, void *);
> +      if (v != &v2)
> + abort ();
> +      ll = va_arg (ap, long long int);
> +      if (ll != 16LL)
> + abort ();
> +      break;
> +    case 2:
> +      break;
> +    default:
> +      abort ();
> +    }
> +  va_end (ap);
> +  return x + 8;
> +}
> +
> +__attribute__ ((noinline)) int
> +foo3 (void)
> +{
> +  return 6;
> +}
> +
> +extern inline __attribute__ ((always_inline, gnu_inline)) int
> +bar (int x, ...)
> +{
> +  if (x < 10)
> +    return foo1 (x, foo3 (), 5, __builtin_va_arg_pack ());
> +  return foo2 (x, foo3 () + 4, __builtin_va_arg_pack ());
> +}
> +
> +int
> +main (void)
> +{
> +  if (bar (0, ++v1, v4, &v4, v2++) != 0)
> +    abort ();
> +  if (bar (1, ++v5, 8, v3) != 1)
> +    abort ();
> +  if (bar (2) != 2)
> +    abort ();
> +  if (bar (v1 + 2) != 19)
> +    abort ();
> +  if (bar (v1 + 3, v5--, v4, v4, v3, 16LL) != 20)
> +    abort ();
> +  return 0;
> +}
> Index: testsuite/gcc.dg/tree-prof/comp-goto-1.c
> ===================================================================
> --- testsuite/gcc.dg/tree-prof/comp-goto-1.c (revision 0)
> +++ testsuite/gcc.dg/tree-prof/comp-goto-1.c (revision 0)
> @@ -0,0 +1,166 @@
> +/* { dg-require-effective-target freorder } */
> +/* { dg-options "-O2 -freorder-blocks-and-partition" } */
> +#include <stdlib.h>
> +
> +#if !defined(NO_LABEL_VALUES) && (!defined(STACK_SIZE) || STACK_SIZE >= 4000) && __INT_MAX__ >= 2147483647
> +typedef unsigned int uint32;
> +typedef signed int sint32;
> +
> +typedef uint32 reg_t;
> +
> +typedef unsigned long int host_addr_t;
> +typedef uint32 target_addr_t;
> +typedef sint32 target_saddr_t;
> +
> +typedef union
> +{
> +  struct
> +    {
> +      unsigned int offset:18;
> +      unsigned int ignore:4;
> +      unsigned int s1:8;
> +      int :2;
> +      signed int simm:14;
> +      unsigned int s3:8;
> +      unsigned int s2:8;
> +      int pad2:2;
> +    } f1;
> +  long long ll;
> +  double d;
> +} insn_t;
> +
> +typedef struct
> +{
> +  target_addr_t vaddr_tag;
> +  unsigned long int rigged_paddr;
> +} tlb_entry_t;
> +
> +typedef struct
> +{
> +  insn_t *pc;
> +  reg_t registers[256];
> +  insn_t *program;
> +  tlb_entry_t tlb_tab[0x100];
> +} environment_t;
> +
> +enum operations
> +{
> +  LOAD32_RR,
> +  METAOP_DONE
> +};
> +
> +host_addr_t
> +f ()
> +{
> +  abort ();
> +}
> +
> +reg_t
> +simulator_kernel (int what, environment_t *env)
> +{
> +  register insn_t *pc = env->pc;
> +  register reg_t *regs = env->registers;
> +  register insn_t insn;
> +  register int s1;
> +  register reg_t r2;
> +  register void *base_addr = &&sim_base_addr;
> +  register tlb_entry_t *tlb = env->tlb_tab;
> +
> +  if (what != 0)
> +    {
> +      int i;
> +      static void *op_map[] =
> + {
> +  &&L_LOAD32_RR,
> +  &&L_METAOP_DONE,
> + };
> +      insn_t *program = env->program;
> +      for (i = 0; i < what; i++)
> + program[i].f1.offset = op_map[program[i].f1.offset] - base_addr;
> +    }
> +
> + sim_base_addr:;
> +
> +  insn = *pc++;
> +  r2 = (*(reg_t *) (((char *) regs) + (insn.f1.s2 << 2)));
> +  s1 = (insn.f1.s1 << 2);
> +  goto *(base_addr + insn.f1.offset);
> +
> + L_LOAD32_RR:
> +  {
> +    target_addr_t vaddr_page = r2 / 4096;
> +    unsigned int x = vaddr_page % 0x100;
> +    insn = *pc++;
> +
> +    for (;;)
> +      {
> + target_addr_t tag = tlb[x].vaddr_tag;
> + host_addr_t rigged_paddr = tlb[x].rigged_paddr;
> +
> + if (tag == vaddr_page)
> +  {
> +    *(reg_t *) (((char *) regs) + s1) = *(uint32 *) (rigged_paddr + r2);
> +    r2 = *(reg_t *) (((char *) regs) + (insn.f1.s2 << 2));
> +    s1 = insn.f1.s1 << 2;
> +    goto *(base_addr + insn.f1.offset);
> +  }
> +
> + if (((target_saddr_t) tag < 0))
> +  {
> +    *(reg_t *) (((char *) regs) + s1) = *(uint32 *) f ();
> +    r2 = *(reg_t *) (((char *) regs) + (insn.f1.s2 << 2));
> +    s1 = insn.f1.s1 << 2;
> +    goto *(base_addr + insn.f1.offset);
> +  }
> +
> + x = (x - 1) % 0x100;
> +      }
> +
> +    L_METAOP_DONE:
> +      return (*(reg_t *) (((char *) regs) + s1));
> +  }
> +}
> +
> +insn_t program[2 + 1];
> +
> +void *malloc ();
> +
> +int
> +main ()
> +{
> +  environment_t env;
> +  insn_t insn;
> +  int i, res;
> +  host_addr_t a_page = (host_addr_t) malloc (2 * 4096);
> +  target_addr_t a_vaddr = 0x123450;
> +  target_addr_t vaddr_page = a_vaddr / 4096;
> +  a_page = (a_page + 4096 - 1) & -4096;
> +
> +  env.tlb_tab[((vaddr_page) % 0x100)].vaddr_tag = vaddr_page;
> +  env.tlb_tab[((vaddr_page) % 0x100)].rigged_paddr = a_page - vaddr_page * 4096;
> +  insn.f1.offset = LOAD32_RR;
> +  env.registers[0] = 0;
> +  env.registers[2] = a_vaddr;
> +  *(sint32 *) (a_page + a_vaddr % 4096) = 88;
> +  insn.f1.s1 = 0;
> +  insn.f1.s2 = 2;
> +
> +  for (i = 0; i < 2; i++)
> +    program[i] = insn;
> +
> +  insn.f1.offset = METAOP_DONE;
> +  insn.f1.s1 = 0;
> +  program[2] = insn;
> +
> +  env.pc = program;
> +  env.program = program;
> +
> +  res = simulator_kernel (2 + 1, &env);
> +
> +  if (res != 88)
> +    abort ();
> +  exit (0);
> +}
> +#else
> +main(){ exit (0); }
> +#endif
> Index: testsuite/gcc.dg/tree-prof/pr52027.c
> ===================================================================
> --- testsuite/gcc.dg/tree-prof/pr52027.c (revision 199014)
> +++ testsuite/gcc.dg/tree-prof/pr52027.c (working copy)
> @@ -1,6 +1,6 @@
>  /* PR debug/52027 */
>  /* { dg-require-effective-target freorder } */
> -/* { dg-options "-O -freorder-blocks-and-partition -fno-reorder-functions" } */
> +/* { dg-options "-O2 -freorder-blocks-and-partition -fno-reorder-functions" } */
>
>  void
>  foo (int len)
> Index: testsuite/gcc.dg/tree-prof/pr50907.c
> ===================================================================
> --- testsuite/gcc.dg/tree-prof/pr50907.c (revision 199014)
> +++ testsuite/gcc.dg/tree-prof/pr50907.c (working copy)
> @@ -1,5 +1,5 @@
>  /* PR middle-end/50907 */
>  /* { dg-require-effective-target freorder } */
> -/* { dg-options "-O -freorder-blocks-and-partition -fschedule-insns -fselective-scheduling -fpic" { target { { powerpc*-*-* ia64-*-* x86_64-*-* } && fpic } } } */
> +/* { dg-options "-O2 -freorder-blocks-and-partition -fschedule-insns -fselective-scheduling -fpic" { target { { powerpc*-*-* ia64-*-* x86_64-*-* } && fpic } } } */
>
>  #include "pr45354.c"
> Index: testsuite/gcc.dg/tree-prof/pr45354.c
> ===================================================================
> --- testsuite/gcc.dg/tree-prof/pr45354.c (revision 199014)
> +++ testsuite/gcc.dg/tree-prof/pr45354.c (working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-require-effective-target freorder } */
> -/* { dg-options "-O -freorder-blocks-and-partition -fschedule-insns -fselective-scheduling" { target powerpc*-*-* ia64-*-* x86_64-*-* } } */
> +/* { dg-options "-O2 -freorder-blocks-and-partition -fschedule-insns -fselective-scheduling" { target powerpc*-*-* ia64-*-* x86_64-*-* } } */
>
>  extern void abort (void);
>
> Index: testsuite/gcc.dg/tree-prof/20041218-1.c
> ===================================================================
> --- testsuite/gcc.dg/tree-prof/20041218-1.c (revision 0)
> +++ testsuite/gcc.dg/tree-prof/20041218-1.c (revision 0)
> @@ -0,0 +1,119 @@
> +/* PR rtl-optimization/16968 */
> +/* Testcase by Jakub Jelinek  <jakub@redhat.com> */
> +/* { dg-require-effective-target freorder } */
> +/* { dg-options "-O2 -freorder-blocks-and-partition" } */
> +
> +struct T
> +{
> +  unsigned int b, c, *d;
> +  unsigned char e;
> +};
> +struct S
> +{
> +  unsigned int a;
> +  struct T f;
> +};
> +struct U
> +{
> +  struct S g, h;
> +};
> +struct V
> +{
> +  unsigned int i;
> +  struct U j;
> +};
> +
> +extern void exit (int);
> +extern void abort (void);
> +
> +void *
> +dummy1 (void *x)
> +{
> +  return "";
> +}
> +
> +void *
> +dummy2 (void *x, void *y)
> +{
> +  exit (0);
> +}
> +
> +struct V *
> +baz (unsigned int x)
> +{
> +  static struct V v;
> +  __builtin_memset (&v, 0x55, sizeof (v));
> +  return &v;
> +}
> +
> +int
> +check (void *x, struct S *y)
> +{
> +  if (y->a || y->f.b || y->f.c || y->f.d || y->f.e)
> +    abort ();
> +  return 1;
> +}
> +
> +static struct V *
> +bar (unsigned int x, void *y)
> +{
> +  const struct T t = { 0, 0, (void *) 0, 0 };
> +  struct V *u;
> +  void *v;
> +  v = dummy1 (y);
> +  if (!v)
> +    return (void *) 0;
> +
> +  u = baz (sizeof (struct V));
> +  u->i = x;
> +  u->j.g.a = 0;
> +  u->j.g.f = t;
> +  u->j.h.a = 0;
> +  u->j.h.f = t;
> +
> +  if (!check (v, &u->j.g) || !check (v, &u->j.h))
> +    return (void *) 0;
> +  return u;
> +}
> +
> +int
> +foo (unsigned int *x, unsigned int y, void **z)
> +{
> +  void *v;
> +  unsigned int i, j;
> +
> +  *z = v = (void *) 0;
> +
> +  for (i = 0; i < y; i++)
> +    {
> +      struct V *c;
> +
> +      j = *x;
> +
> +      switch (j)
> + {
> + case 1:
> +  c = bar (j, x);
> +  break;
> + default:
> +  c = 0;
> +  break;
> + }
> +      if (c)
> + v = dummy2 (v, c);
> +      else
> +        return 1;
> +    }
> +
> +  *z = v;
> +  return 0;
> +}
> +
> +int
> +main (void)
> +{
> +  unsigned int one = 1;
> +  void *p;
> +  foo (&one, 1, &p);
> +  abort ();
> +}
> Index: testsuite/g++.dg/tree-prof/partition2.C
> ===================================================================
> --- testsuite/g++.dg/tree-prof/partition2.C (revision 199014)
> +++ testsuite/g++.dg/tree-prof/partition2.C (working copy)
> @@ -1,6 +1,6 @@
>  // PR middle-end/45458
>  // { dg-require-effective-target freorder }
> -// { dg-options "-fnon-call-exceptions -freorder-blocks-and-partition" }
> +// { dg-options "-O2 -fnon-call-exceptions -freorder-blocks-and-partition" }
>
>  int
>  main ()
> Index: testsuite/g++.dg/tree-prof/partition3.C
> ===================================================================
> --- testsuite/g++.dg/tree-prof/partition3.C (revision 199014)
> +++ testsuite/g++.dg/tree-prof/partition3.C (working copy)
> @@ -1,6 +1,6 @@
>  // PR middle-end/45566
>  // { dg-require-effective-target freorder }
> -// { dg-options "-O -fnon-call-exceptions -freorder-blocks-and-partition" }
> +// { dg-options "-O2 -fnon-call-exceptions -freorder-blocks-and-partition" }
>
>  int k;
>
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413



--
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
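
The rewritten insert_section_boundary_note() in the patch asserts that the partition changes at most once along the final block order. A minimal standalone sketch of that invariant follows. It is a toy model, not GCC code: `enum partition` and `find_section_boundary` are hypothetical names introduced only for illustration, and a plain array stands in for the basic-block chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's hot/cold partition values.  */
enum partition { PART_UNSET = 0, PART_HOT = 1, PART_COLD = 2 };

/* Return the index of the first block of the second section, or N when
   all blocks share one partition.  Like the patched loop, this keeps
   scanning after the first switch and asserts that no second switch
   occurs, i.e. that hot and cold blocks are not interleaved.  */
size_t
find_section_boundary (const enum partition *order, size_t n)
{
  size_t boundary = n;
  int switched = 0;
  enum partition current = PART_UNSET;

  for (size_t i = 0; i < n; i++)
    {
      if (current == PART_UNSET)
        current = order[i];
      if (order[i] != current)
        {
          /* Models gcc_assert (!switched_sections) in the patch.  */
          assert (!switched);
          switched = 1;
          boundary = i;
          current = order[i];
        }
    }
  return boundary;
}
```

In this model the returned index corresponds to the block before which the patch emits NOTE_INSN_SWITCH_TEXT_SECTIONS. An interleaved order such as hot, cold, hot trips the assertion, whereas the old loop stopped at the first switch and never examined the later blocks.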
Teresa Johnson - June 5, 2013, 2:06 p.m.
On Wed, May 29, 2013 at 7:57 AM, Teresa Johnson <tejohnson@google.com> wrote:
> On Thu, May 23, 2013 at 6:18 AM, Teresa Johnson <tejohnson@google.com> wrote:
>> On Wed, May 22, 2013 at 2:05 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>> Revised patch included below. The spacing of my pasted in patch text
>>> looks funky again, let me know if you want the patch as an attachment
>>> instead.
>>>
>>> I addressed all of Steven's comments, except for the suggestion to use
>>> gcc_assert instead of error() in verify_hot_cold_block_grouping(), to keep
>>> this consistent with the rest of the verify_flow_info subroutines (let me
>>> know if this is ok).
>>
>> I fixed this issue too, which was actually in
>> insert_section_boundary_note(), so that it gcc_asserts more
>> efficiently as suggested. Retested, latest patch below.
>>
>> Honza, would you be able to review the patch?
>
> Ping. Still needs a global maintainer to review and approve.

Ping.

Thanks!
Teresa
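
For readers skimming the quoted patch below, the bookkeeping done by the new fixup_partition_crossing() can be sketched in isolation. This is a simplified toy model under stated assumptions, not the GCC implementation: `toy_block`, `toy_edge` and `toy_fixup_crossing` are hypothetical names, a flat edge array replaces the successor list, and every block is assumed to end in a jump (the patch itself checks JUMP_P before adding a note).

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; these are not GCC's data structures.  */
struct toy_block
{
  int partition;        /* 1 = hot, 2 = cold.  */
  bool crossing_note;   /* Models REG_CROSSING_JUMP on the block's jump.  */
};

struct toy_edge
{
  struct toy_block *src;
  struct toy_block *dest;
  bool crossing;        /* Models the EDGE_CROSSING flag.  */
};

/* After EDGES[WHICH] has been redirected, recompute its crossing flag.
   The source block gains the note when the edge crosses, and keeps an
   existing note only while some successor edge still crosses.  */
void
toy_fixup_crossing (struct toy_edge *edges, size_t n_edges, size_t which)
{
  struct toy_edge *e = &edges[which];

  e->crossing = (e->src->partition != e->dest->partition);
  if (e->crossing)
    {
      e->src->crossing_note = true;
      return;
    }

  /* The edge no longer crosses: look for another crossing successor.  */
  bool has_crossing_succ = false;
  for (size_t i = 0; i < n_edges; i++)
    if (edges[i].src == e->src && edges[i].crossing)
      {
        has_crossing_succ = true;
        break;
      }
  if (!has_crossing_succ)
    e->src->crossing_note = false;
}
```

The point of the model is the asymmetry: adding the note needs only the redirected edge, while removing it needs a scan of all successors of the source block, since a conditional jump can have one crossing and one non-crossing successor.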

>
> Also, I submitted a PR for the debug range issue:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57451
>
> Thanks!
> Teresa
>
>>
>> Thanks!
>> Teresa
>>
>>>
>>> The other main changes:
>>> (1) Added several test cases, cloned from the torture subdirectories. I
>>> manually built/ran those with FDO and -freorder-blocks-and-partition using
>>> both the current trunk and my fixed trunk compiler, and was able to expose
>>> some failures that I fixed.
>>> (2) Changed existing tree-prof tests that used
>>> -freorder-blocks-and-partition to be built with -O2 instead of -O, so that
>>> partitioning actually kicks in.
>>> (3) Fixed a couple of failures in the new verify_hot_cold_block_grouping()
>>> checks exposed by the torture tests I ran manually with splitting (2 of the
>>> tests are cloned to tree-prof in this patch). One was in computed goto,
>>> where we were too aggressive about cloning crossing edges, and the other
>>> was in rtl_split_edge called from the "stack" pass, which was not inserting
>>> the new bb in the correct partition since bb layout is complete at that
>>> point.
>>>
>>> Re-tested on x86_64-unknown-linux-gnu with bootstrap and profiledbootstrap
>>> builds and regression testing. Re-built/ran cpu2006int with profile
>>> feedback and -freorder-blocks-and-partition enabled.
>>>
>>> Ok for trunk?
>>>
>>> Thanks!
>>> Teresa
>>
>> 2013-05-23  Teresa Johnson  <tejohnson@google.com>
>>
>> * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
>>         as this is now done by redirect_edge_and_branch_force.
>> * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
>>         barriers, and fix interaction with splitting.
>> * emit-rtl.c (try_split): Copy REG_CROSSING_JUMP notes.
>> * cfgcleanup.c (try_forward_edges): Fix early return value to properly
>>         reflect changes made in the routine.
>> * bb-reorder.c (emit_barrier_after_bb): Move to cfgrtl.c.
>> (fix_up_fall_thru_edges): Remove incorrect check for bb layout order
>>         since this is called in cfglayout mode, and replace partition fixup
>>         with assert as that is now done by force_nonfallthru_and_redirect.
>> (add_reg_crossing_jump_notes): Handle the fact that some jumps may
>>         already be marked with region crossing note.
>> (insert_section_boundary_note): Make non-static, gate on flag
>>         has_bb_partition, rewrite to also check for multiple partitions.
>> (rest_of_handle_reorder_blocks): Remove call to
>>         insert_section_boundary_note, now done later during free_cfg.
>> (duplicate_computed_gotos): Don't duplicate partition crossing edge.
>> * bb-reorder.h (insert_section_boundary_note): Declare.
>> * Makefile.in (cfgrtl.o): Depend on bb-reorder.h.
>> * cfgrtl.c (rest_of_pass_free_cfg): If partitions exist
>>         invoke insert_section_boundary_note.
>> (try_redirect_by_replacing_jump): Remove unnecessary
>>         check for region crossing note.
>> (fixup_partition_crossing): New function.
>> (rtl_redirect_edge_and_branch): Fixup partition boundaries.
>> (emit_barrier_after_bb): Move here from bb-reorder.c, handle insertion
>>         in non-cfglayout mode.
>> (force_nonfallthru_and_redirect): Fixup partition boundaries,
>>         remove old code that tried to do this. Emit barrier correctly
>>         when we are in cfglayout mode.
>> (last_bb_in_partition): New function.
>> (rtl_split_edge): Correctly fixup partition boundaries.
>> (commit_one_edge_insertion): Remove old code that tried to
>>         fixup region crossing edge since this is now handled in
>>         split_block, and set up insertion point correctly since
>>         block may now end in a jump.
>> (verify_hot_cold_block_grouping): Guard against checking when not in
>>         linearized RTL mode.
>> (rtl_verify_edges): Add checks for incorrect/missing REG_CROSSING_JUMP
>>         notes.
>> (rtl_verify_flow_info_1): Move verify_hot_cold_block_grouping to
>>         rtl_verify_flow_info, so not called in cfglayout mode.
>> (rtl_verify_flow_info): Move verify_hot_cold_block_grouping here.
>> (fixup_reorder_chain): Remove old code that attempted to fixup region
>>         crossing note as this is now handled in force_nonfallthru_and_redirect.
>> (duplicate_insn_chain): Don't duplicate switch section notes.
>> (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
>>         note.
>> * basic-block.h (emit_barrier_after_bb): Declare.
>> * testsuite/gcc.dg/tree-prof/va-arg-pack-1.c: Cloned from c-torture, made
>>         into -freorder-blocks-and-partition test.
>> * testsuite/gcc.dg/tree-prof/comp-goto-1.c: Ditto.
>> * testsuite/gcc.dg/tree-prof/20041218-1.c: Ditto.
>> * testsuite/gcc.dg/tree-prof/pr52027.c: Use -O2.
>> * testsuite/gcc.dg/tree-prof/pr50907.c: Ditto.
>> * testsuite/gcc.dg/tree-prof/pr45354.c: Ditto.
>> * testsuite/g++.dg/tree-prof/partition2.C: Ditto.
>> * testsuite/g++.dg/tree-prof/partition3.C: Ditto.
>>
>> Index: ifcvt.c
>> ===================================================================
>> --- ifcvt.c (revision 199014)
>> +++ ifcvt.c (working copy)
>> @@ -3905,10 +3905,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
>>    if (new_bb)
>>      {
>>        df_bb_replace (then_bb_index, new_bb);
>> -      /* Since the fallthru edge was redirected from test_bb to new_bb,
>> -         we need to ensure that new_bb is in the same partition as
>> -         test bb (you can not fall through across section boundaries).  */
>> -      BB_COPY_PARTITION (new_bb, test_bb);
>> +      /* This should have been done above via force_nonfallthru_and_redirect
>> +         (possibly called from redirect_edge_and_branch_force).  */
>> +      gcc_checking_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
>>      }
>>
>>    num_true_changes++;
>> Index: function.c
>> ===================================================================
>> --- function.c (revision 199014)
>> +++ function.c (working copy)
>> @@ -6270,8 +6270,10 @@ thread_prologue_and_epilogue_insns (void)
>>      break;
>>   if (e)
>>    {
>> -    copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
>> -  NULL_RTX, e->src);
>> +                    /* Make sure we insert after any barriers.  */
>> +                    rtx end = get_last_bb_insn (e->src);
>> +                    copy_bb = create_basic_block (NEXT_INSN (end),
>> +                                                  NULL_RTX, e->src);
>>      BB_COPY_PARTITION (copy_bb, e->src);
>>    }
>>   else
>> @@ -6538,7 +6540,7 @@ epilogue_done:
>>        basic_block simple_return_block_cold = NULL;
>>        edge pending_edge_hot = NULL;
>>        edge pending_edge_cold = NULL;
>> -      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
>> +      basic_block exit_pred;
>>        int i;
>>
>>        gcc_assert (entry_edge != orig_entry_edge);
>> @@ -6566,6 +6568,12 @@ epilogue_done:
>>      else
>>        pending_edge_cold = e;
>>    }
>> +
>> +      /* Save a pointer to the exit's predecessor BB for use in
>> +         inserting new BBs at the end of the function. Do this
>> +         after the call to split_block above which may split
>> +         the original exit pred.  */
>> +      exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>
>>        FOR_EACH_VEC_ELT (unconverted_simple_returns, i, e)
>>   {
>> Index: emit-rtl.c
>> ===================================================================
>> --- emit-rtl.c (revision 199014)
>> +++ emit-rtl.c (working copy)
>> @@ -3574,6 +3574,7 @@ try_split (rtx pat, rtx trial, int last)
>>    break;
>>
>>   case REG_NON_LOCAL_GOTO:
>> + case REG_CROSSING_JUMP:
>>    for (insn = insn_last; insn != NULL_RTX; insn = PREV_INSN (insn))
>>      {
>>        if (JUMP_P (insn))
>> Index: cfgcleanup.c
>> ===================================================================
>> --- cfgcleanup.c (revision 199014)
>> +++ cfgcleanup.c (working copy)
>> @@ -456,7 +456,7 @@ try_forward_edges (int mode, basic_block b)
>>
>>        if (first != EXIT_BLOCK_PTR
>>    && find_reg_note (BB_END (first), REG_CROSSING_JUMP, NULL_RTX))
>> - return false;
>> + return changed;
>>
>>        while (counter < n_basic_blocks)
>>   {
>> Index: bb-reorder.c
>> ===================================================================
>> --- bb-reorder.c (revision 199014)
>> +++ bb-reorder.c (working copy)
>> @@ -1380,15 +1380,6 @@ get_uncond_jump_length (void)
>>    return length;
>>  }
>>
>> -/* Emit a barrier into the footer of BB.  */
>> -
>> -static void
>> -emit_barrier_after_bb (basic_block bb)
>> -{
>> -  rtx barrier = emit_barrier_after (BB_END (bb));
>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>> -}
>> -
>>  /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
>>     Duplicate the landing pad and split the edges so that no EH edge
>>     crosses partitions.  */
>> @@ -1720,8 +1711,7 @@ fix_up_fall_thru_edges (void)
>>       (i.e. fix it so the fall through does not cross and
>>       the cond jump does).  */
>>
>> -  if (!cond_jump_crosses
>> -      && cur_bb->aux == cond_jump->dest)
>> +  if (!cond_jump_crosses)
>>      {
>>        /* Find label in fall_thru block. We've already added
>>   any missing labels, so there must be one.  */
>> @@ -1765,10 +1755,10 @@ fix_up_fall_thru_edges (void)
>>        new_bb->aux = cur_bb->aux;
>>        cur_bb->aux = new_bb;
>>
>> -      /* Make sure new fall-through bb is in same
>> - partition as bb it's falling through from.  */
>> +                      /* This is done by force_nonfallthru_and_redirect.  */
>> +      gcc_assert (BB_PARTITION (new_bb)
>> +                                  == BB_PARTITION (cur_bb));
>>
>> -      BB_COPY_PARTITION (new_bb, cur_bb);
>>        single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
>>      }
>>    else
>> @@ -2064,7 +2054,10 @@ add_reg_crossing_jump_notes (void)
>>    FOR_EACH_BB (bb)
>>      FOR_EACH_EDGE (e, ei, bb->succs)
>>        if ((e->flags & EDGE_CROSSING)
>> -  && JUMP_P (BB_END (e->src)))
>> +  && JUMP_P (BB_END (e->src))
>> +          /* Some notes were added during fix_up_fall_thru_edges, via
>> +             force_nonfallthru_and_redirect.  */
>> +          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
>>   add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>  }
>>
>> @@ -2133,23 +2126,26 @@ reorder_basic_blocks (void)
>>     encountering this note will make the compiler switch between the
>>     hot and cold text sections.  */
>>
>> -static void
>> +void
>>  insert_section_boundary_note (void)
>>  {
>>    basic_block bb;
>> -  int first_partition = 0;
>> +  bool switched_sections = false;
>> +  int current_partition = 0;
>>
>> -  if (!flag_reorder_blocks_and_partition)
>> +  if (!crtl->has_bb_partition)
>>      return;
>>
>>    FOR_EACH_BB (bb)
>>      {
>> -      if (!first_partition)
>> - first_partition = BB_PARTITION (bb);
>> -      if (BB_PARTITION (bb) != first_partition)
>> +      if (!current_partition)
>> + current_partition = BB_PARTITION (bb);
>> +      if (BB_PARTITION (bb) != current_partition)
>>   {
>> -  emit_note_before (NOTE_INSN_SWITCH_TEXT_SECTIONS, BB_HEAD (bb));
>> -  break;
>> +  gcc_assert (!switched_sections);
>> +          switched_sections = true;
>> +          emit_note_before (NOTE_INSN_SWITCH_TEXT_SECTIONS, BB_HEAD (bb));
>> +          current_partition = BB_PARTITION (bb);
>>   }
>>      }
>>  }
>> @@ -2180,8 +2176,6 @@ rest_of_handle_reorder_blocks (void)
>>        bb->aux = bb->next_bb;
>>    cfg_layout_finalize ();
>>
>> -  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>> -  insert_section_boundary_note ();
>>    return 0;
>>  }
>>
>> @@ -2315,6 +2309,11 @@ duplicate_computed_gotos (void)
>>        if (!bitmap_bit_p (candidates, single_succ (bb)->index))
>>   continue;
>>
>> +      /* Don't duplicate a partition crossing edge, which requires difficult
>> +         fixup.  */
>> +      if (find_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX))
>> + continue;
>> +
>>        new_bb = duplicate_block (single_succ (bb), single_succ_edge (bb), bb);
>>        new_bb->aux = bb->aux;
>>        bb->aux = new_bb;
>> Index: bb-reorder.h
>> ===================================================================
>> --- bb-reorder.h (revision 199014)
>> +++ bb-reorder.h (working copy)
>> @@ -35,4 +35,6 @@ extern struct target_bb_reorder *this_target_bb_re
>>
>>  extern int get_uncond_jump_length (void);
>>
>> +extern void insert_section_boundary_note (void);
>> +
>>  #endif
>> Index: Makefile.in
>> ===================================================================
>> --- Makefile.in (revision 199014)
>> +++ Makefile.in (working copy)
>> @@ -3151,7 +3151,7 @@ cfgrtl.o : cfgrtl.c $(CONFIG_H) $(SYSTEM_H) corety
>>     $(FUNCTION_H) $(EXCEPT_H) $(TM_P_H) $(INSN_ATTR_H) \
>>     insn-config.h $(EXPR_H) \
>>     $(CFGLOOP_H) $(OBSTACK_H) $(TARGET_H) $(TREE_H) \
>> -   $(TREE_PASS_H) $(DF_H) $(GGC_H) $(COMMON_TARGET_H) gt-cfgrtl.h
>> +   $(TREE_PASS_H) $(DF_H) $(GGC_H) $(COMMON_TARGET_H) gt-cfgrtl.h bb-reorder.h
>>  cfganal.o : cfganal.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(BASIC_BLOCK_H) \
>>     $(TIMEVAR_H) sbitmap.h $(BITMAP_H)
>>  cfgbuild.o : cfgbuild.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>> Index: cfgrtl.c
>> ===================================================================
>> --- cfgrtl.c (revision 199014)
>> +++ cfgrtl.c (working copy)
>> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tree.h"
>>  #include "hard-reg-set.h"
>>  #include "basic-block.h"
>> +#include "bb-reorder.h"
>>  #include "regs.h"
>>  #include "flags.h"
>>  #include "function.h"
>> @@ -451,6 +452,9 @@ rest_of_pass_free_cfg (void)
>>      }
>>  #endif
>>
>> +  if (crtl->has_bb_partition)
>> +    insert_section_boundary_note ();
>> +
>>    free_bb_for_insn ();
>>    return 0;
>>  }
>> @@ -981,8 +985,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
>>       partition boundaries).  See  the comments at the top of
>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>
>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>      return NULL;
>>
>>    /* We can replace or remove a complex jump only when we have exactly
>> @@ -1291,6 +1294,53 @@ redirect_branch_edge (edge e, basic_block target)
>>    return e;
>>  }
>>
>> +/* Called when edge E has been redirected to a new destination,
>> +   in order to update the region crossing flag on the edge and
>> +   jump.  */
>> +
>> +static void
>> +fixup_partition_crossing (edge e)
>> +{
>> +  rtx note;
>> +
>> +  if (e->src == ENTRY_BLOCK_PTR || e->dest == EXIT_BLOCK_PTR)
>> +    return;
>> +  /* If we redirected an existing edge, it may already be marked
>> +     crossing, even though the new src is missing a reg crossing note.
>> +     But make sure reg crossing note doesn't already exist before
>> +     inserting.  */
>> +  if (BB_PARTITION (e->src) != BB_PARTITION (e->dest))
>> +    {
>> +      e->flags |= EDGE_CROSSING;
>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>> +      if (JUMP_P (BB_END (e->src))
>> +          && !note)
>> +        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>> +    }
>> +  else if (BB_PARTITION (e->src) == BB_PARTITION (e->dest))
>> +    {
>> +      e->flags &= ~EDGE_CROSSING;
>> +      /* Remove the section crossing note from jump at end of
>> +         src if it exists, and if no other successors are
>> +         still crossing.  */
>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>> +      if (note)
>> +        {
>> +          bool has_crossing_succ = false;
>> +          edge e2;
>> +          edge_iterator ei;
>> +          FOR_EACH_EDGE (e2, ei, e->src->succs)
>> +            {
>> +              has_crossing_succ |= (e2->flags & EDGE_CROSSING);
>> +              if (has_crossing_succ)
>> +                break;
>> +            }
>> +          if (!has_crossing_succ)
>> +            remove_note (BB_END (e->src), note);
>> +        }
>> +    }
>> +}
>> +
>>  /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
>>     expense of adding new instructions or reordering basic blocks.
>>
>> @@ -1307,16 +1357,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>  {
>>    edge ret;
>>    basic_block src = e->src;
>> +  basic_block dest = e->dest;
>>
>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>      return NULL;
>>
>> -  if (e->dest == target)
>> +  if (dest == target)
>>      return e;
>>
>>    if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
>>      {
>>        df_set_bb_dirty (src);
>> +      fixup_partition_crossing (ret);
>>        return ret;
>>      }
>>
>> @@ -1325,9 +1377,22 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>      return NULL;
>>
>>    df_set_bb_dirty (src);
>> +  fixup_partition_crossing (ret);
>>    return ret;
>>  }
>>
>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
>> +
>> +void
>> +emit_barrier_after_bb (basic_block bb)
>> +{
>> +  rtx barrier = emit_barrier_after (BB_END (bb));
>> +  gcc_assert (current_ir_type () == IR_RTL_CFGRTL
>> +              || current_ir_type () == IR_RTL_CFGLAYOUT);
>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>> +}
>> +
>>  /* Like force_nonfallthru below, but additionally performs redirection
>>     Used by redirect_edge_and_branch_force.  JUMP_LABEL is used only
>>     when redirecting to the EXIT_BLOCK, it is either ret_rtx or
>> @@ -1492,12 +1557,6 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>        /* Make sure new block ends up in correct hot/cold section.  */
>>
>>        BB_COPY_PARTITION (jump_block, e->src);
>> -      if (flag_reorder_blocks_and_partition
>> -          && targetm_common.have_named_sections
>> -          && JUMP_P (BB_END (jump_block))
>> -          && !any_condjump_p (BB_END (jump_block))
>> -          && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
>> -        add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
>>
>>        /* Wire edge in.  */
>>        new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
>> @@ -1508,6 +1567,10 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>        redirect_edge_pred (e, jump_block);
>>        e->probability = REG_BR_PROB_BASE;
>>
>> +      /* If e->src was previously region crossing, it no longer is
>> +         and the reg crossing note should be removed.  */
>> +      fixup_partition_crossing (new_edge);
>> +
>>        /* If asm goto has any label refs to target's label,
>>   add also edge from asm goto bb to target.  */
>>        if (asm_goto_edge)
>> @@ -1559,13 +1622,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>        LABEL_NUSES (label)++;
>>      }
>>
>> -  emit_barrier_after (BB_END (jump_block));
>> +  /* We might be in cfg layout mode, and if so, the following routine will
>> +     insert the barrier correctly.  */
>> +  emit_barrier_after_bb (jump_block);
>>    redirect_edge_succ_nodup (e, target);
>>
>>    if (abnormal_edge_flags)
>>      make_edge (src, target, abnormal_edge_flags);
>>
>>    df_mark_solutions_dirty ();
>> +  fixup_partition_crossing (e);
>>    return new_bb;
>>  }
>>
>> @@ -1654,6 +1720,21 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
>>    return false;
>>  }
>>
>> +/* Locate the last bb in the same partition as START_BB.  */
>> +
>> +static basic_block
>> +last_bb_in_partition (basic_block start_bb)
>> +{
>> +  basic_block bb;
>> +  FOR_BB_BETWEEN (bb, start_bb, EXIT_BLOCK_PTR, next_bb)
>> +    {
>> +      if (BB_PARTITION (start_bb) != BB_PARTITION (bb->next_bb))
>> +        return bb;
>> +    }
>> +  /* Return bb before EXIT_BLOCK_PTR.  */
>> +  return bb->prev_bb;
>> +}
>> +
>>  /* Split a (typically critical) edge.  Return the new block.
>>     The edge must not be abnormal.
>>
>> @@ -1664,7 +1745,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
>>  static basic_block
>>  rtl_split_edge (edge edge_in)
>>  {
>> -  basic_block bb;
>> +  basic_block bb, new_bb;
>>    rtx before;
>>
>>    /* Abnormal edges cannot be split.  */
>> @@ -1696,13 +1777,50 @@ rtl_split_edge (edge edge_in)
>>      }
>>    else
>>      {
>> -      bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
>> -      /* ??? Why not edge_in->dest->prev_bb here?  */
>> -      BB_COPY_PARTITION (bb, edge_in->dest);
>> +      if (edge_in->src == ENTRY_BLOCK_PTR)
>> +        {
>> +          bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
>> +          BB_COPY_PARTITION (bb, edge_in->dest);
>> +        }
>> +      else
>> +        {
>> +          basic_block after = edge_in->dest->prev_bb;
>> +          /* If this is post-bb reordering, and the edge crosses a partition
>> +             boundary, the new block needs to be inserted in the bb chain
>> +             at the end of the src partition (since we put the new bb into
>> +             that partition, see below). Otherwise we may end up creating
>> +             an extra partition crossing in the chain, which is illegal.
>> +             It can't go after the src, because src may have a fall-through
>> +             to a different block.  */
>> +          if (crtl->bb_reorder_complete
>> +              && (edge_in->flags & EDGE_CROSSING))
>> +            {
>> +              after = last_bb_in_partition (edge_in->src);
>> +              before = NEXT_INSN (BB_END (after));
>> +              /* The instruction following the last bb in partition should
>> +                 be a barrier, since it cannot end in a fall-through.  */
>> +              gcc_checking_assert (BARRIER_P (before));
>> +              before = NEXT_INSN (before);
>> +            }
>> +          bb = create_basic_block (before, NULL, after);
>> +          /* Put the split bb into the src partition, to avoid creating
>> +             a situation where a cold bb dominates a hot bb, in the case
>> +             where src is cold and dest is hot. The src will dominate
>> +             the new bb (whereas it might not have dominated dest).  */
>> +          BB_COPY_PARTITION (bb, edge_in->src);
>> +        }
>>      }
>>
>>    make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);
>>
>> +  /* Can't allow a region crossing edge to be fallthrough.  */
>> +  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
>> +      && edge_in->dest != EXIT_BLOCK_PTR)
>> +    {
>> +      new_bb = force_nonfallthru (single_succ_edge (bb));
>> +      gcc_assert (!new_bb);
>> +    }
>> +
>>    /* For non-fallthru edges, we must adjust the predecessor's
>>       jump instruction to target our new block.  */
>>    if ((edge_in->flags & EDGE_FALLTHRU) == 0)
>> @@ -1815,17 +1933,13 @@ commit_one_edge_insertion (edge e)
>>    else
>>      {
>>        bb = split_edge (e);
>> -      after = BB_END (bb);
>>
>> -      if (flag_reorder_blocks_and_partition
>> -          && targetm_common.have_named_sections
>> -          && e->src != ENTRY_BLOCK_PTR
>> -          && BB_PARTITION (e->src) == BB_COLD_PARTITION
>> -          && !(e->flags & EDGE_CROSSING)
>> -          && JUMP_P (after)
>> -          && !any_condjump_p (after)
>> -          && (single_succ_edge (bb)->flags & EDGE_CROSSING))
>> -        add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
>> +      /* If E crossed a partition boundary, we needed to make bb end in
>> +         a region-crossing jump, even though it was originally fallthru.  */
>> +      if (JUMP_P (BB_END (bb)))
>> +        before = BB_END (bb);
>> +      else
>> +        after = BB_END (bb);
>>      }
>>
>>    /* Now that we've found the spot, do the insertion.  */
>> @@ -2071,7 +2185,11 @@ verify_hot_cold_block_grouping (void)
>>    bool switched_sections = false;
>>    int current_partition = BB_UNPARTITIONED;
>>
>> -  if (!crtl->bb_reorder_complete)
>> +  /* Even after bb reordering is complete, we go into cfglayout mode
>> +     again (in compgoto). Ensure we don't call this before going back
>> +     into linearized RTL when any layout fixes would have been committed.  */
>> +  if (!crtl->bb_reorder_complete
>> +      || current_ir_type () != IR_RTL_CFGRTL)
>>      return err;
>>
>>    FOR_EACH_BB (bb)
>> @@ -2116,6 +2234,7 @@ rtl_verify_edges (void)
>>        edge e, fallthru = NULL;
>>        edge_iterator ei;
>>        rtx note;
>> +      bool has_crossing_edge = false;
>>
>>        if (JUMP_P (BB_END (bb))
>>    && (note = find_reg_note (BB_END (bb), REG_BR_PROB, NULL_RTX))
>> @@ -2141,6 +2260,7 @@ rtl_verify_edges (void)
>>    is_crossing = (BB_PARTITION (e->src) != BB_PARTITION (e->dest)
>>   && e->src != ENTRY_BLOCK_PTR
>>   && e->dest != EXIT_BLOCK_PTR);
>> +          has_crossing_edge |= is_crossing;
>>    if (e->flags & EDGE_CROSSING)
>>      {
>>        if (!is_crossing)
>> @@ -2160,6 +2280,13 @@ rtl_verify_edges (void)
>>   e->src->index);
>>    err = 1;
>>   }
>> +              if (JUMP_P (BB_END (bb))
>> +                  && !find_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX))
>> +                {
>> +                  error ("No region crossing jump at section boundary in bb %i",
>> +                         bb->index);
>> +                  err = 1;
>> +                }
>>      }
>>    else if (is_crossing)
>>      {
>> @@ -2188,6 +2315,15 @@ rtl_verify_edges (void)
>>      n_abnormal++;
>>   }
>>
>> +        if (!has_crossing_edge
>> +            && find_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX))
>> +          {
>> +            print_rtl_with_bb (stderr, get_insns (), TDF_RTL | TDF_BLOCKS | TDF_DETAILS);
>> +            error ("Region crossing jump across same section in bb %i",
>> +                   bb->index);
>> +            err = 1;
>> +          }
>> +
>>        if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
>>   {
>>    error ("missing REG_EH_REGION note at the end of bb %i", bb->index);
>> @@ -2395,8 +2531,6 @@ rtl_verify_flow_info_1 (void)
>>
>>    err |= rtl_verify_edges ();
>>
>> -  err |= verify_hot_cold_block_grouping();
>> -
>>    return err;
>>  }
>>
>> @@ -2642,6 +2776,8 @@ rtl_verify_flow_info (void)
>>
>>    err |= rtl_verify_bb_layout ();
>>
>> +  err |= verify_hot_cold_block_grouping ();
>> +
>>    return err;
>>  }
>>
>> @@ -3343,7 +3479,7 @@ fixup_reorder_chain (void)
>>        edge e_fall, e_taken, e;
>>        rtx bb_end_insn;
>>        rtx ret_label = NULL_RTX;
>> -      basic_block nb, src_bb;
>> +      basic_block nb;
>>        edge_iterator ei;
>>
>>        if (EDGE_COUNT (bb->succs) == 0)
>> @@ -3478,7 +3614,6 @@ fixup_reorder_chain (void)
>>        /* We got here if we need to add a new jump insn.
>>   Note force_nonfallthru can delete E_FALL and thus we have to
>>   save E_FALL->src prior to the call to force_nonfallthru.  */
>> -      src_bb = e_fall->src;
>>        nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
>>        if (nb)
>>   {
>> @@ -3486,17 +3621,6 @@ fixup_reorder_chain (void)
>>    bb->aux = nb;
>>    /* Don't process this new block.  */
>>    bb = nb;
>> -
>> -  /* Make sure new bb is tagged for correct section (same as
>> -     fall-thru source, since you cannot fall-thru across
>> -     section boundaries).  */
>> -  BB_COPY_PARTITION (src_bb, single_pred (bb));
>> -  if (flag_reorder_blocks_and_partition
>> -      && targetm_common.have_named_sections
>> -      && JUMP_P (BB_END (bb))
>> -      && !any_condjump_p (BB_END (bb))
>> -      && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
>> -    add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
>>   }
>>      }
>>
>> @@ -3796,10 +3920,11 @@ duplicate_insn_chain (rtx from, rtx to)
>>      case NOTE_INSN_FUNCTION_BEG:
>>        /* There is always just single entry to function.  */
>>      case NOTE_INSN_BASIC_BLOCK:
>> +              /* We should only switch text sections once.  */
>> +    case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>        break;
>>
>>      case NOTE_INSN_EPILOGUE_BEG:
>> -    case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>        emit_note_copy (insn);
>>        break;
>>
>> @@ -4611,8 +4736,7 @@ rtl_can_remove_branch_p (const_edge e)
>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>      return false;
>>
>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>      return false;
>>
>>    if (!onlyjump_p (insn)
>> Index: basic-block.h
>> ===================================================================
>> --- basic-block.h (revision 199014)
>> +++ basic-block.h (working copy)
>> @@ -796,6 +796,7 @@ extern basic_block force_nonfallthru_and_redirect
>>  extern bool contains_no_active_insn_p (const_basic_block);
>>  extern bool forwarder_block_p (const_basic_block);
>>  extern bool can_fallthru (basic_block, basic_block);
>> +extern void emit_barrier_after_bb (basic_block bb);
>>
>>  /* In cfgbuild.c.  */
>>  extern void find_many_sub_basic_blocks (sbitmap);
>> Index: testsuite/gcc.dg/tree-prof/va-arg-pack-1.c
>> ===================================================================
>> --- testsuite/gcc.dg/tree-prof/va-arg-pack-1.c (revision 0)
>> +++ testsuite/gcc.dg/tree-prof/va-arg-pack-1.c (revision 0)
>> @@ -0,0 +1,145 @@
>> +/* __builtin_va_arg_pack () builtin tests.  */
>> +/* { dg-require-effective-target freorder } */
>> +/* { dg-options "-O2 -freorder-blocks-and-partition" } */
>> +
>> +#include <stdarg.h>
>> +
>> +extern void abort (void);
>> +
>> +int v1 = 8;
>> +long int v2 = 3;
>> +void *v3 = (void *) &v2;
>> +struct A { char c[16]; } v4 = { "foo" };
>> +long double v5 = 40;
>> +char seen[20];
>> +int cnt;
>> +
>> +__attribute__ ((noinline)) int
>> +foo1 (int x, int y, ...)
>> +{
>> +  int i;
>> +  long int l;
>> +  void *v;
>> +  struct A a;
>> +  long double ld;
>> +  va_list ap;
>> +
>> +  va_start (ap, y);
>> +  if (x < 0 || x >= 20 || seen[x])
>> +    abort ();
>> +  seen[x] = ++cnt;
>> +  if (y != 6)
>> +    abort ();
>> +  i = va_arg (ap, int);
>> +  if (i != 5)
>> +    abort ();
>> +  switch (x)
>> +    {
>> +    case 0:
>> +      i = va_arg (ap, int);
>> +      if (i != 9 || v1 != 9)
>> + abort ();
>> +      a = va_arg (ap, struct A);
>> +      if (__builtin_memcmp (a.c, v4.c, sizeof (a.c)) != 0)
>> + abort ();
>> +      v = (void *) va_arg (ap, struct A *);
>> +      if (v != (void *) &v4)
>> + abort ();
>> +      l = va_arg (ap, long int);
>> +      if (l != 3 || v2 != 4)
>> + abort ();
>> +      break;
>> +    case 1:
>> +      ld = va_arg (ap, long double);
>> +      if (ld != 41 || v5 != ld)
>> + abort ();
>> +      i = va_arg (ap, int);
>> +      if (i != 8)
>> + abort ();
>> +      v = va_arg (ap, void *);
>> +      if (v != &v2)
>> + abort ();
>> +      break;
>> +    case 2:
>> +      break;
>> +    default:
>> +      abort ();
>> +    }
>> +  va_end (ap);
>> +  return x;
>> +}
>> +
>> +__attribute__ ((noinline)) int
>> +foo2 (int x, int y, ...)
>> +{
>> +  long long int ll;
>> +  void *v;
>> +  struct A a, b;
>> +  long double ld;
>> +  va_list ap;
>> +
>> +  va_start (ap, y);
>> +  if (x < 0 || x >= 20 || seen[x])
>> +    abort ();
>> +  seen[x] = ++cnt | 64;
>> +  if (y != 10)
>> +    abort ();
>> +  switch (x)
>> +    {
>> +    case 11:
>> +      break;
>> +    case 12:
>> +      ld = va_arg (ap, long double);
>> +      if (ld != 41 || v5 != 40)
>> + abort ();
>> +      a = va_arg (ap, struct A);
>> +      if (__builtin_memcmp (a.c, v4.c, sizeof (a.c)) != 0)
>> + abort ();
>> +      b = va_arg (ap, struct A);
>> +      if (__builtin_memcmp (b.c, v4.c, sizeof (b.c)) != 0)
>> + abort ();
>> +      v = va_arg (ap, void *);
>> +      if (v != &v2)
>> + abort ();
>> +      ll = va_arg (ap, long long int);
>> +      if (ll != 16LL)
>> + abort ();
>> +      break;
>> +    case 2:
>> +      break;
>> +    default:
>> +      abort ();
>> +    }
>> +  va_end (ap);
>> +  return x + 8;
>> +}
>> +
>> +__attribute__ ((noinline)) int
>> +foo3 (void)
>> +{
>> +  return 6;
>> +}
>> +
>> +extern inline __attribute__ ((always_inline, gnu_inline)) int
>> +bar (int x, ...)
>> +{
>> +  if (x < 10)
>> +    return foo1 (x, foo3 (), 5, __builtin_va_arg_pack ());
>> +  return foo2 (x, foo3 () + 4, __builtin_va_arg_pack ());
>> +}
>> +
>> +int
>> +main (void)
>> +{
>> +  if (bar (0, ++v1, v4, &v4, v2++) != 0)
>> +    abort ();
>> +  if (bar (1, ++v5, 8, v3) != 1)
>> +    abort ();
>> +  if (bar (2) != 2)
>> +    abort ();
>> +  if (bar (v1 + 2) != 19)
>> +    abort ();
>> +  if (bar (v1 + 3, v5--, v4, v4, v3, 16LL) != 20)
>> +    abort ();
>> +  return 0;
>> +}
>> Index: testsuite/gcc.dg/tree-prof/comp-goto-1.c
>> ===================================================================
>> --- testsuite/gcc.dg/tree-prof/comp-goto-1.c (revision 0)
>> +++ testsuite/gcc.dg/tree-prof/comp-goto-1.c (revision 0)
>> @@ -0,0 +1,166 @@
>> +/* { dg-require-effective-target freorder } */
>> +/* { dg-options "-O2 -freorder-blocks-and-partition" } */
>> +#include <stdlib.h>
>> +
>> +#if !defined(NO_LABEL_VALUES) && (!defined(STACK_SIZE) || STACK_SIZE >= 4000) && __INT_MAX__ >= 2147483647
>> +typedef unsigned int uint32;
>> +typedef signed int sint32;
>> +
>> +typedef uint32 reg_t;
>> +
>> +typedef unsigned long int host_addr_t;
>> +typedef uint32 target_addr_t;
>> +typedef sint32 target_saddr_t;
>> +
>> +typedef union
>> +{
>> +  struct
>> +    {
>> +      unsigned int offset:18;
>> +      unsigned int ignore:4;
>> +      unsigned int s1:8;
>> +      int :2;
>> +      signed int simm:14;
>> +      unsigned int s3:8;
>> +      unsigned int s2:8;
>> +      int pad2:2;
>> +    } f1;
>> +  long long ll;
>> +  double d;
>> +} insn_t;
>> +
>> +typedef struct
>> +{
>> +  target_addr_t vaddr_tag;
>> +  unsigned long int rigged_paddr;
>> +} tlb_entry_t;
>> +
>> +typedef struct
>> +{
>> +  insn_t *pc;
>> +  reg_t registers[256];
>> +  insn_t *program;
>> +  tlb_entry_t tlb_tab[0x100];
>> +} environment_t;
>> +
>> +enum operations
>> +{
>> +  LOAD32_RR,
>> +  METAOP_DONE
>> +};
>> +
>> +host_addr_t
>> +f ()
>> +{
>> +  abort ();
>> +}
>> +
>> +reg_t
>> +simulator_kernel (int what, environment_t *env)
>> +{
>> +  register insn_t *pc = env->pc;
>> +  register reg_t *regs = env->registers;
>> +  register insn_t insn;
>> +  register int s1;
>> +  register reg_t r2;
>> +  register void *base_addr = &&sim_base_addr;
>> +  register tlb_entry_t *tlb = env->tlb_tab;
>> +
>> +  if (what != 0)
>> +    {
>> +      int i;
>> +      static void *op_map[] =
>> + {
>> +  &&L_LOAD32_RR,
>> +  &&L_METAOP_DONE,
>> + };
>> +      insn_t *program = env->program;
>> +      for (i = 0; i < what; i++)
>> + program[i].f1.offset = op_map[program[i].f1.offset] - base_addr;
>> +    }
>> +
>> + sim_base_addr:;
>> +
>> +  insn = *pc++;
>> +  r2 = (*(reg_t *) (((char *) regs) + (insn.f1.s2 << 2)));
>> +  s1 = (insn.f1.s1 << 2);
>> +  goto *(base_addr + insn.f1.offset);
>> +
>> + L_LOAD32_RR:
>> +  {
>> +    target_addr_t vaddr_page = r2 / 4096;
>> +    unsigned int x = vaddr_page % 0x100;
>> +    insn = *pc++;
>> +
>> +    for (;;)
>> +      {
>> + target_addr_t tag = tlb[x].vaddr_tag;
>> + host_addr_t rigged_paddr = tlb[x].rigged_paddr;
>> +
>> + if (tag == vaddr_page)
>> +  {
>> +    *(reg_t *) (((char *) regs) + s1) = *(uint32 *) (rigged_paddr + r2);
>> +    r2 = *(reg_t *) (((char *) regs) + (insn.f1.s2 << 2));
>> +    s1 = insn.f1.s1 << 2;
>> +    goto *(base_addr + insn.f1.offset);
>> +  }
>> +
>> + if (((target_saddr_t) tag < 0))
>> +  {
>> +    *(reg_t *) (((char *) regs) + s1) = *(uint32 *) f ();
>> +    r2 = *(reg_t *) (((char *) regs) + (insn.f1.s2 << 2));
>> +    s1 = insn.f1.s1 << 2;
>> +    goto *(base_addr + insn.f1.offset);
>> +  }
>> +
>> + x = (x - 1) % 0x100;
>> +      }
>> +
>> +    L_METAOP_DONE:
>> +      return (*(reg_t *) (((char *) regs) + s1));
>> +  }
>> +}
>> +
>> +insn_t program[2 + 1];
>> +
>> +void *malloc ();
>> +
>> +int
>> +main ()
>> +{
>> +  environment_t env;
>> +  insn_t insn;
>> +  int i, res;
>> +  host_addr_t a_page = (host_addr_t) malloc (2 * 4096);
>> +  target_addr_t a_vaddr = 0x123450;
>> +  target_addr_t vaddr_page = a_vaddr / 4096;
>> +  a_page = (a_page + 4096 - 1) & -4096;
>> +
>> +  env.tlb_tab[((vaddr_page) % 0x100)].vaddr_tag = vaddr_page;
>> +  env.tlb_tab[((vaddr_page) % 0x100)].rigged_paddr = a_page - vaddr_page * 4096;
>> +  insn.f1.offset = LOAD32_RR;
>> +  env.registers[0] = 0;
>> +  env.registers[2] = a_vaddr;
>> +  *(sint32 *) (a_page + a_vaddr % 4096) = 88;
>> +  insn.f1.s1 = 0;
>> +  insn.f1.s2 = 2;
>> +
>> +  for (i = 0; i < 2; i++)
>> +    program[i] = insn;
>> +
>> +  insn.f1.offset = METAOP_DONE;
>> +  insn.f1.s1 = 0;
>> +  program[2] = insn;
>> +
>> +  env.pc = program;
>> +  env.program = program;
>> +
>> +  res = simulator_kernel (2 + 1, &env);
>> +
>> +  if (res != 88)
>> +    abort ();
>> +  exit (0);
>> +}
>> +#else
>> +main(){ exit (0); }
>> +#endif
>> Index: testsuite/gcc.dg/tree-prof/pr52027.c
>> ===================================================================
>> --- testsuite/gcc.dg/tree-prof/pr52027.c (revision 199014)
>> +++ testsuite/gcc.dg/tree-prof/pr52027.c (working copy)
>> @@ -1,6 +1,6 @@
>>  /* PR debug/52027 */
>>  /* { dg-require-effective-target freorder } */
>> -/* { dg-options "-O -freorder-blocks-and-partition -fno-reorder-functions" } */
>> +/* { dg-options "-O2 -freorder-blocks-and-partition -fno-reorder-functions" } */
>>
>>  void
>>  foo (int len)
>> Index: testsuite/gcc.dg/tree-prof/pr50907.c
>> ===================================================================
>> --- testsuite/gcc.dg/tree-prof/pr50907.c (revision 199014)
>> +++ testsuite/gcc.dg/tree-prof/pr50907.c (working copy)
>> @@ -1,5 +1,5 @@
>>  /* PR middle-end/50907 */
>>  /* { dg-require-effective-target freorder } */
>> -/* { dg-options "-O -freorder-blocks-and-partition -fschedule-insns -fselective-scheduling -fpic" { target { { powerpc*-*-* ia64-*-* x86_64-*-* } && fpic } } } */
>> +/* { dg-options "-O2 -freorder-blocks-and-partition -fschedule-insns -fselective-scheduling -fpic" { target { { powerpc*-*-* ia64-*-* x86_64-*-* } && fpic } } } */
>>
>>  #include "pr45354.c"
>> Index: testsuite/gcc.dg/tree-prof/pr45354.c
>> ===================================================================
>> --- testsuite/gcc.dg/tree-prof/pr45354.c (revision 199014)
>> +++ testsuite/gcc.dg/tree-prof/pr45354.c (working copy)
>> @@ -1,5 +1,5 @@
>>  /* { dg-require-effective-target freorder } */
>> -/* { dg-options "-O -freorder-blocks-and-partition -fschedule-insns -fselective-scheduling" { target powerpc*-*-* ia64-*-* x86_64-*-* } } */
>> +/* { dg-options "-O2 -freorder-blocks-and-partition -fschedule-insns -fselective-scheduling" { target powerpc*-*-* ia64-*-* x86_64-*-* } } */
>>
>>  extern void abort (void);
>>
>> Index: testsuite/gcc.dg/tree-prof/20041218-1.c
>> ===================================================================
>> --- testsuite/gcc.dg/tree-prof/20041218-1.c (revision 0)
>> +++ testsuite/gcc.dg/tree-prof/20041218-1.c (revision 0)
>> @@ -0,0 +1,119 @@
>> +/* PR rtl-optimization/16968 */
>> +/* Testcase by Jakub Jelinek  <jakub@redhat.com> */
>> +/* { dg-require-effective-target freorder } */
>> +/* { dg-options "-O2 -freorder-blocks-and-partition" } */
>> +
>> +struct T
>> +{
>> +  unsigned int b, c, *d;
>> +  unsigned char e;
>> +};
>> +struct S
>> +{
>> +  unsigned int a;
>> +  struct T f;
>> +};
>> +struct U
>> +{
>> +  struct S g, h;
>> +};
>> +struct V
>> +{
>> +  unsigned int i;
>> +  struct U j;
>> +};
>> +
>> +extern void exit (int);
>> +extern void abort (void);
>> +
>> +void *
>> +dummy1 (void *x)
>> +{
>> +  return "";
>> +}
>> +
>> +void *
>> +dummy2 (void *x, void *y)
>> +{
>> +  exit (0);
>> +}
>> +
>> +struct V *
>> +baz (unsigned int x)
>> +{
>> +  static struct V v;
>> +  __builtin_memset (&v, 0x55, sizeof (v));
>> +  return &v;
>> +}
>> +
>> +int
>> +check (void *x, struct S *y)
>> +{
>> +  if (y->a || y->f.b || y->f.c || y->f.d || y->f.e)
>> +    abort ();
>> +  return 1;
>> +}
>> +
>> +static struct V *
>> +bar (unsigned int x, void *y)
>> +{
>> +  const struct T t = { 0, 0, (void *) 0, 0 };
>> +  struct V *u;
>> +  void *v;
>> +  v = dummy1 (y);
>> +  if (!v)
>> +    return (void *) 0;
>> +
>> +  u = baz (sizeof (struct V));
>> +  u->i = x;
>> +  u->j.g.a = 0;
>> +  u->j.g.f = t;
>> +  u->j.h.a = 0;
>> +  u->j.h.f = t;
>> +
>> +  if (!check (v, &u->j.g) || !check (v, &u->j.h))
>> +    return (void *) 0;
>> +  return u;
>> +}
>> +
>> +int
>> +foo (unsigned int *x, unsigned int y, void **z)
>> +{
>> +  void *v;
>> +  unsigned int i, j;
>> +
>> +  *z = v = (void *) 0;
>> +
>> +  for (i = 0; i < y; i++)
>> +    {
>> +      struct V *c;
>> +
>> +      j = *x;
>> +
>> +      switch (j)
>> + {
>> + case 1:
>> +  c = bar (j, x);
>> +  break;
>> + default:
>> +  c = 0;
>> +  break;
>> + }
>> +      if (c)
>> + v = dummy2 (v, c);
>> +      else
>> +        return 1;
>> +    }
>> +
>> +  *z = v;
>> +  return 0;
>> +}
>> +
>> +int
>> +main (void)
>> +{
>> +  unsigned int one = 1;
>> +  void *p;
>> +  foo (&one, 1, &p);
>> +  abort ();
>> +}
>> Index: testsuite/g++.dg/tree-prof/partition2.C
>> ===================================================================
>> --- testsuite/g++.dg/tree-prof/partition2.C (revision 199014)
>> +++ testsuite/g++.dg/tree-prof/partition2.C (working copy)
>> @@ -1,6 +1,6 @@
>>  // PR middle-end/45458
>>  // { dg-require-effective-target freorder }
>> -// { dg-options "-fnon-call-exceptions -freorder-blocks-and-partition" }
>> +// { dg-options "-O2 -fnon-call-exceptions -freorder-blocks-and-partition" }
>>
>>  int
>>  main ()
>> Index: testsuite/g++.dg/tree-prof/partition3.C
>> ===================================================================
>> --- testsuite/g++.dg/tree-prof/partition3.C (revision 199014)
>> +++ testsuite/g++.dg/tree-prof/partition3.C (working copy)
>> @@ -1,6 +1,6 @@
>>  // PR middle-end/45566
>>  // { dg-require-effective-target freorder }
>> -// { dg-options "-O -fnon-call-exceptions -freorder-blocks-and-partition" }
>> +// { dg-options "-O2 -fnon-call-exceptions -freorder-blocks-and-partition" }
>>
>>  int k;
>>
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
--
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
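
For readers following the patch, the core of fixup_partition_crossing can be sketched on a toy CFG. This is only an illustration under stated assumptions: the fx_* types and fields are hypothetical stand-ins for GCC's basic_block, edge, EDGE_CROSSING and REG_CROSSING_JUMP, and the ENTRY/EXIT block early return from the patch is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in GCC.  */
enum fx_partition { FX_HOT, FX_COLD };

struct fx_block;

struct fx_edge
{
  struct fx_block *src;
  struct fx_block *dest;
  bool crossing;                /* Models the EDGE_CROSSING flag.  */
};

#define FX_MAX_SUCCS 4

struct fx_block
{
  enum fx_partition partition;  /* Models BB_PARTITION (bb).  */
  bool ends_in_jump;            /* Models JUMP_P (BB_END (bb)).  */
  bool crossing_note;           /* Models a REG_CROSSING_JUMP note.  */
  struct fx_edge *succs[FX_MAX_SUCCS];
  size_t n_succs;
};

/* Re-derive the crossing flag on E, and the crossing note on E's
   source block, after E has been redirected to a new destination.  */
static void
fx_fixup_partition_crossing (struct fx_edge *e)
{
  struct fx_block *src = e->src;

  if (src->partition != e->dest->partition)
    {
      /* The edge now crosses sections: flag it, and make sure a jump
         at the end of the source carries the note.  */
      e->crossing = true;
      if (src->ends_in_jump)
        src->crossing_note = true;
      return;
    }

  /* The edge stays inside one section.  */
  e->crossing = false;
  if (!src->crossing_note)
    return;

  /* Drop the note only if no other successor still crosses.  */
  for (size_t i = 0; i < src->n_succs; i++)
    if (src->succs[i]->crossing)
      return;
  src->crossing_note = false;
}
```

The point of the design is that the flag and the note are recomputed from the partitions of the two endpoints after every redirection, rather than patched up separately at each call site as the removed code did.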
Richard Guenther - June 6, 2013, 8:54 a.m.
On Wed, Jun 5, 2013 at 4:06 PM, Teresa Johnson <tejohnson@google.com> wrote:
> On Wed, May 29, 2013 at 7:57 AM, Teresa Johnson <tejohnson@google.com> wrote:
>> On Thu, May 23, 2013 at 6:18 AM, Teresa Johnson <tejohnson@google.com> wrote:
>>> On Wed, May 22, 2013 at 2:05 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>>> Revised patch included below. The spacing of my pasted-in patch text
>>>> looks funky again; let me know if you want the patch as an attachment
>>>> instead.
>>>>
>>>> I addressed all of Steven's comments, except for the suggestion to use
>>>> gcc_assert instead of error() in verify_hot_cold_block_grouping(); I kept
>>>> error() to stay consistent with the rest of the verify_flow_info
>>>> subroutines (let me know if this is ok).
>>>
>>> I fixed this issue too, which was actually in
>>> insert_section_boundary_note(), so that it gcc_asserts more
>>> efficiently as suggested. Retested, latest patch below.
>>>
>>> Honza, would you be able to review the patch?
>>
>> Ping. Still needs a global maintainer to review and approve.
>
> Ping.

This is ok.  Please watch for fallout!

Thanks,
Richard.

> Thanks!
> Teresa
>
>>
>> Also, I submitted a PR for the debug range issue:
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57451
>>
>> Thanks!
>> Teresa
>>
>>>
>>> Thanks!
>>> Teresa
>>>
>>>>
>>>> The other main changes:
>>>> (1) Added several test cases, cloned from the torture subdirectories. I
>>>> manually built and ran these with FDO and -freorder-blocks-and-partition
>>>> using both the current trunk and my fixed trunk compiler, which exposed
>>>> some failures that I fixed.
>>>> (2) Changed the existing tree-prof tests that used
>>>> -freorder-blocks-and-partition to build with -O2 instead of -O, so that
>>>> partitioning actually kicks in.
>>>> (3) Fixed a couple of failures in the new verify_hot_cold_block_grouping()
>>>> checks, exposed by the torture tests I ran manually with splitting (2 of
>>>> the tests are cloned to tree-prof in this patch). One was in computed
>>>> goto, where we were too aggressive about cloning crossing edges. The
>>>> other was in rtl_split_edge, called from the "stack" pass, which was not
>>>> inserting the new bb in the correct partition, since bb layout is
>>>> complete at that point.
>>>>
>>>> Re-tested on x86_64-unknown-linux-gnu with bootstrap and profiledbootstrap
>>>> builds and regression testing. Re-built/ran cpu2006int with profile
>>>> feedback and -freorder-blocks-and-partition enabled.
>>>>
>>>> Ok for trunk?
>>>>
>>>> Thanks!
>>>> Teresa
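
The invariant behind verify_hot_cold_block_grouping() can be illustrated with a small standalone check. This is a sketch, assuming the rule is "the final block order switches sections at most once"; the grp_* names are hypothetical and the layout is modelled as a plain array rather than GCC's bb chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of per-block partition tags.  */
enum grp_partition { GRP_UNPARTITIONED, GRP_HOT, GRP_COLD };

/* Return true if LAYOUT, read in emission order, changes partition
   at most once, i.e. the hot and cold blocks are each contiguous.  */
static bool
grp_hot_cold_grouping_ok (const enum grp_partition *layout, size_t n)
{
  bool switched_sections = false;
  enum grp_partition current = GRP_UNPARTITIONED;

  for (size_t i = 0; i < n; i++)
    {
      if (current != GRP_UNPARTITIONED && layout[i] != current)
        {
          /* A second switch means the sections are interleaved.  */
          if (switched_sections)
            return false;
          switched_sections = true;
        }
      current = layout[i];
    }
  return true;
}
```

With this model, a layout of hot, hot, cold, cold passes, while hot, cold, hot fails; the second shape is the kind of chain that a block inserted into the wrong place after reordering would produce.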
>>>
>>> 2013-05-23  Teresa Johnson  <tejohnson@google.com>
>>>
>>> * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
>>>         as this is now done by redirect_edge_and_branch_force.
>>> * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
>>>         barriers, and fix interaction with splitting.
>>> * emit-rtl.c (try_split): Copy REG_CROSSING_JUMP notes.
>>> * cfgcleanup.c (try_forward_edges): Fix early return value to properly
>>>         reflect changes made in the routine.
>>> * bb-reorder.c (emit_barrier_after_bb): Move to cfgrtl.c.
>>> (fix_up_fall_thru_edges): Remove incorrect check for bb layout order
>>>         since this is called in cfglayout mode, and replace partition fixup
>>>         with assert as that is now done by force_nonfallthru_and_redirect.
>>> (add_reg_crossing_jump_notes): Handle the fact that some jumps may
>>>         already be marked with region crossing note.
>>> (insert_section_boundary_note): Make non-static, gate on flag
>>>         has_bb_partition, rewrite to also check for multiple partitions.
>>> (rest_of_handle_reorder_blocks): Remove call to
>>>         insert_section_boundary_note, now done later during free_cfg.
>>>         (duplicate_computed_gotos): Don't duplicate partition crossing edge.
>>> * bb-reorder.h (insert_section_boundary_note): Declare.
>>> * Makefile.in (cfgrtl.o): Depend on bb-reorder.h.
>>> * cfgrtl.c (rest_of_pass_free_cfg): If partitions exist
>>>         invoke insert_section_boundary_note.
>>> (try_redirect_by_replacing_jump): Remove unnecessary
>>>         check for region crossing note.
>>> (fixup_partition_crossing): New function.
>>> (rtl_redirect_edge_and_branch): Fixup partition boundaries.
>>> (emit_barrier_after_bb): Move here from bb-reorder.c, handle insertion
>>>         in non-cfglayout mode.
>>> (force_nonfallthru_and_redirect): Fixup partition boundaries,
>>>         remove old code that tried to do this. Emit barrier correctly
>>>         when we are in cfglayout mode.
>>>         (last_bb_in_partition): New function.
>>> (rtl_split_edge): Correctly fixup partition boundaries.
>>> (commit_one_edge_insertion): Remove old code that tried to
>>>         fixup region crossing edge since this is now handled in
>>>         split_edge, and set up insertion point correctly since
>>>         block may now end in a jump.
>>>         (verify_hot_cold_block_grouping): Guard against checking when not in
>>>         linearized RTL mode.
>>> (rtl_verify_edges): Add checks for incorrect/missing REG_CROSSING_JUMP
>>>         notes.
>>>         (rtl_verify_flow_info_1): Move verify_hot_cold_block_grouping to
>>>         rtl_verify_flow_info, so not called in cfglayout mode.
>>> (rtl_verify_flow_info): Move verify_hot_cold_block_grouping here.
>>> (fixup_reorder_chain): Remove old code that attempted to fixup region
>>>         crossing note as this is now handled in force_nonfallthru_and_redirect.
>>> (duplicate_insn_chain): Don't duplicate switch section notes.
>>> (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
>>>         note.
>>> * basic-block.h (emit_barrier_after_bb): Declare.
>>> * testsuite/gcc.dg/tree-prof/va-arg-pack-1.c: Cloned from c-torture, made
>>>         into -freorder-blocks-and-partition test.
>>> * testsuite/gcc.dg/tree-prof/comp-goto-1.c: Ditto.
>>> * testsuite/gcc.dg/tree-prof/20041218-1.c: Ditto.
>>> * testsuite/gcc.dg/tree-prof/pr52027.c: Use -O2.
>>> * testsuite/gcc.dg/tree-prof/pr50907.c: Ditto.
>>> * testsuite/gcc.dg/tree-prof/pr45354.c: Ditto.
>>> * testsuite/g++.dg/tree-prof/partition2.C: Ditto.
>>> * testsuite/g++.dg/tree-prof/partition3.C: Ditto.
>>>
>>> Index: ifcvt.c
>>> ===================================================================
>>> --- ifcvt.c (revision 199014)
>>> +++ ifcvt.c (working copy)
>>> @@ -3905,10 +3905,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
>>>    if (new_bb)
>>>      {
>>>        df_bb_replace (then_bb_index, new_bb);
>>> -      /* Since the fallthru edge was redirected from test_bb to new_bb,
>>> -         we need to ensure that new_bb is in the same partition as
>>> -         test bb (you can not fall through across section boundaries).  */
>>> -      BB_COPY_PARTITION (new_bb, test_bb);
>>> +      /* This should have been done above via force_nonfallthru_and_redirect
>>> +         (possibly called from redirect_edge_and_branch_force).  */
>>> +      gcc_checking_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
>>>      }
>>>
>>>    num_true_changes++;
>>> Index: function.c
>>> ===================================================================
>>> --- function.c (revision 199014)
>>> +++ function.c (working copy)
>>> @@ -6270,8 +6270,10 @@ thread_prologue_and_epilogue_insns (void)
>>>      break;
>>>   if (e)
>>>    {
>>> -    copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
>>> -  NULL_RTX, e->src);
>>> +                    /* Make sure we insert after any barriers.  */
>>> +                    rtx end = get_last_bb_insn (e->src);
>>> +                    copy_bb = create_basic_block (NEXT_INSN (end),
>>> +                                                  NULL_RTX, e->src);
>>>      BB_COPY_PARTITION (copy_bb, e->src);
>>>    }
>>>   else
>>> @@ -6538,7 +6540,7 @@ epilogue_done:
>>>        basic_block simple_return_block_cold = NULL;
>>>        edge pending_edge_hot = NULL;
>>>        edge pending_edge_cold = NULL;
>>> -      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>> +      basic_block exit_pred;
>>>        int i;
>>>
>>>        gcc_assert (entry_edge != orig_entry_edge);
>>> @@ -6566,6 +6568,12 @@ epilogue_done:
>>>      else
>>>        pending_edge_cold = e;
>>>    }
>>> +
>>> +      /* Save a pointer to the exit's predecessor BB for use in
>>> +         inserting new BBs at the end of the function. Do this
>>> +         after the call to split_block above which may split
>>> +         the original exit pred.  */
>>> +      exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>>
>>>        FOR_EACH_VEC_ELT (unconverted_simple_returns, i, e)
>>>   {
>>> Index: emit-rtl.c
>>> ===================================================================
>>> --- emit-rtl.c (revision 199014)
>>> +++ emit-rtl.c (working copy)
>>> @@ -3574,6 +3574,7 @@ try_split (rtx pat, rtx trial, int last)
>>>    break;
>>>
>>>   case REG_NON_LOCAL_GOTO:
>>> + case REG_CROSSING_JUMP:
>>>    for (insn = insn_last; insn != NULL_RTX; insn = PREV_INSN (insn))
>>>      {
>>>        if (JUMP_P (insn))
>>> Index: cfgcleanup.c
>>> ===================================================================
>>> --- cfgcleanup.c (revision 199014)
>>> +++ cfgcleanup.c (working copy)
>>> @@ -456,7 +456,7 @@ try_forward_edges (int mode, basic_block b)
>>>
>>>        if (first != EXIT_BLOCK_PTR
>>>    && find_reg_note (BB_END (first), REG_CROSSING_JUMP, NULL_RTX))
>>> - return false;
>>> + return changed;
>>>
>>>        while (counter < n_basic_blocks)
>>>   {
>>> Index: bb-reorder.c
>>> ===================================================================
>>> --- bb-reorder.c (revision 199014)
>>> +++ bb-reorder.c (working copy)
>>> @@ -1380,15 +1380,6 @@ get_uncond_jump_length (void)
>>>    return length;
>>>  }
>>>
>>> -/* Emit a barrier into the footer of BB.  */
>>> -
>>> -static void
>>> -emit_barrier_after_bb (basic_block bb)
>>> -{
>>> -  rtx barrier = emit_barrier_after (BB_END (bb));
>>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>> -}
>>> -
>>>  /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
>>>     Duplicate the landing pad and split the edges so that no EH edge
>>>     crosses partitions.  */
>>> @@ -1720,8 +1711,7 @@ fix_up_fall_thru_edges (void)
>>>       (i.e. fix it so the fall through does not cross and
>>>       the cond jump does).  */
>>>
>>> -  if (!cond_jump_crosses
>>> -      && cur_bb->aux == cond_jump->dest)
>>> +  if (!cond_jump_crosses)
>>>      {
>>>        /* Find label in fall_thru block. We've already added
>>>   any missing labels, so there must be one.  */
>>> @@ -1765,10 +1755,10 @@ fix_up_fall_thru_edges (void)
>>>        new_bb->aux = cur_bb->aux;
>>>        cur_bb->aux = new_bb;
>>>
>>> -      /* Make sure new fall-through bb is in same
>>> - partition as bb it's falling through from.  */
>>> +                      /* This is done by force_nonfallthru_and_redirect.  */
>>> +      gcc_assert (BB_PARTITION (new_bb)
>>> +                                  == BB_PARTITION (cur_bb));
>>>
>>> -      BB_COPY_PARTITION (new_bb, cur_bb);
>>>        single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
>>>      }
>>>    else
>>> @@ -2064,7 +2054,10 @@ add_reg_crossing_jump_notes (void)
>>>    FOR_EACH_BB (bb)
>>>      FOR_EACH_EDGE (e, ei, bb->succs)
>>>        if ((e->flags & EDGE_CROSSING)
>>> -  && JUMP_P (BB_END (e->src)))
>>> +  && JUMP_P (BB_END (e->src))
>>> +          /* Some notes were added during fix_up_fall_thru_edges, via
>>> +             force_nonfallthru_and_redirect.  */
>>> +          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
>>>   add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>>  }
>>>
>>> @@ -2133,23 +2126,26 @@ reorder_basic_blocks (void)
>>>     encountering this note will make the compiler switch between the
>>>     hot and cold text sections.  */
>>>
>>> -static void
>>> +void
>>>  insert_section_boundary_note (void)
>>>  {
>>>    basic_block bb;
>>> -  int first_partition = 0;
>>> +  bool switched_sections = false;
>>> +  int current_partition = 0;
>>>
>>> -  if (!flag_reorder_blocks_and_partition)
>>> +  if (!crtl->has_bb_partition)
>>>      return;
>>>
>>>    FOR_EACH_BB (bb)
>>>      {
>>> -      if (!first_partition)
>>> - first_partition = BB_PARTITION (bb);
>>> -      if (BB_PARTITION (bb) != first_partition)
>>> +      if (!current_partition)
>>> + current_partition = BB_PARTITION (bb);
>>> +      if (BB_PARTITION (bb) != current_partition)
>>>   {
>>> -  emit_note_before (NOTE_INSN_SWITCH_TEXT_SECTIONS, BB_HEAD (bb));
>>> -  break;
>>> +  gcc_assert (!switched_sections);
>>> +          switched_sections = true;
>>> +          emit_note_before (NOTE_INSN_SWITCH_TEXT_SECTIONS, BB_HEAD (bb));
>>> +          current_partition = BB_PARTITION (bb);
>>>   }
>>>      }
>>>  }
>>> @@ -2180,8 +2176,6 @@ rest_of_handle_reorder_blocks (void)
>>>        bb->aux = bb->next_bb;
>>>    cfg_layout_finalize ();
>>>
>>> -  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>>> -  insert_section_boundary_note ();
>>>    return 0;
>>>  }
>>>
>>> @@ -2315,6 +2309,11 @@ duplicate_computed_gotos (void)
>>>        if (!bitmap_bit_p (candidates, single_succ (bb)->index))
>>>   continue;
>>>
>>> +      /* Don't duplicate a partition crossing edge, which requires difficult
>>> +         fixup.  */
>>> +      if (find_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX))
>>> + continue;
>>> +
>>>        new_bb = duplicate_block (single_succ (bb), single_succ_edge (bb), bb);
>>>        new_bb->aux = bb->aux;
>>>        bb->aux = new_bb;
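
The rewritten insert_section_boundary_note above relies on the block chain
changing partition at most once. As a rough illustration only, the same check
can be sketched outside of GCC on a plain array of partition tags. The names
toy_partition and toy_find_section_boundary are made up for this sketch and
do not exist in GCC; the real code walks basic blocks with FOR_EACH_BB and
emits NOTE_INSN_SWITCH_TEXT_SECTIONS instead of returning an index.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the hot and cold partition tags.  These are
   assumptions for the sketch, not the values from basic-block.h.  */
enum toy_partition { TOY_HOT = 1, TOY_COLD = 2 };

/* Return the index of the first block that starts a new partition, or -1
   if the layout never switches.  Asserts that the layout switches at most
   once, mirroring the gcc_assert (!switched_sections) in the patch.  */
static int
toy_find_section_boundary (const enum toy_partition *layout, size_t n)
{
  int boundary = -1;
  size_t i;

  for (i = 1; i < n; i++)
    if (layout[i] != layout[i - 1])
      {
        /* A second switch would mean hot and cold blocks are interleaved.  */
        assert (boundary == -1);
        boundary = (int) i;
      }
  return boundary;
}
```

For a layout of hot, hot, cold the function returns 2, which is where the
section-switch note would go. A layout of hot, cold, hot trips the assert.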
>>> Index: bb-reorder.h
>>> ===================================================================
>>> --- bb-reorder.h (revision 199014)
>>> +++ bb-reorder.h (working copy)
>>> @@ -35,4 +35,6 @@ extern struct target_bb_reorder *this_target_bb_re
>>>
>>>  extern int get_uncond_jump_length (void);
>>>
>>> +extern void insert_section_boundary_note (void);
>>> +
>>>  #endif
>>> Index: Makefile.in
>>> ===================================================================
>>> --- Makefile.in (revision 199014)
>>> +++ Makefile.in (working copy)
>>> @@ -3151,7 +3151,7 @@ cfgrtl.o : cfgrtl.c $(CONFIG_H) $(SYSTEM_H) corety
>>>     $(FUNCTION_H) $(EXCEPT_H) $(TM_P_H) $(INSN_ATTR_H) \
>>>     insn-config.h $(EXPR_H) \
>>>     $(CFGLOOP_H) $(OBSTACK_H) $(TARGET_H) $(TREE_H) \
>>> -   $(TREE_PASS_H) $(DF_H) $(GGC_H) $(COMMON_TARGET_H) gt-cfgrtl.h
>>> +   $(TREE_PASS_H) $(DF_H) $(GGC_H) $(COMMON_TARGET_H) gt-cfgrtl.h bb-reorder.h
>>>  cfganal.o : cfganal.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(BASIC_BLOCK_H) \
>>>     $(TIMEVAR_H) sbitmap.h $(BITMAP_H)
>>>  cfgbuild.o : cfgbuild.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>> Index: cfgrtl.c
>>> ===================================================================
>>> --- cfgrtl.c (revision 199014)
>>> +++ cfgrtl.c (working copy)
>>> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "tree.h"
>>>  #include "hard-reg-set.h"
>>>  #include "basic-block.h"
>>> +#include "bb-reorder.h"
>>>  #include "regs.h"
>>>  #include "flags.h"
>>>  #include "function.h"
>>> @@ -451,6 +452,9 @@ rest_of_pass_free_cfg (void)
>>>      }
>>>  #endif
>>>
>>> +  if (crtl->has_bb_partition)
>>> +    insert_section_boundary_note ();
>>> +
>>>    free_bb_for_insn ();
>>>    return 0;
>>>  }
>>> @@ -981,8 +985,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
>>>       partition boundaries).  See  the comments at the top of
>>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>>
>>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>>      return NULL;
>>>
>>>    /* We can replace or remove a complex jump only when we have exactly
>>> @@ -1291,6 +1294,53 @@ redirect_branch_edge (edge e, basic_block target)
>>>    return e;
>>>  }
>>>
>>> +/* Called when edge E has been redirected to a new destination,
>>> +   in order to update the region crossing flag on the edge and
>>> +   jump.  */
>>> +
>>> +static void
>>> +fixup_partition_crossing (edge e)
>>> +{
>>> +  rtx note;
>>> +
>>> +  if (e->src == ENTRY_BLOCK_PTR || e->dest == EXIT_BLOCK_PTR)
>>> +    return;
>>> +  /* If we redirected an existing edge, it may already be marked
>>> +     crossing, even though the new src is missing a reg crossing note.
>>> +     But make sure reg crossing note doesn't already exist before
>>> +     inserting.  */
>>> +  if (BB_PARTITION (e->src) != BB_PARTITION (e->dest))
>>> +    {
>>> +      e->flags |= EDGE_CROSSING;
>>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>> +      if (JUMP_P (BB_END (e->src))
>>> +          && !note)
>>> +        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>> +    }
>>> +  else if (BB_PARTITION (e->src) == BB_PARTITION (e->dest))
>>> +    {
>>> +      e->flags &= ~EDGE_CROSSING;
>>> +      /* Remove the section crossing note from jump at end of
>>> +         src if it exists, and if no other successors are
>>> +         still crossing.  */
>>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>> +      if (note)
>>> +        {
>>> +          bool has_crossing_succ = false;
>>> +          edge e2;
>>> +          edge_iterator ei;
>>> +          FOR_EACH_EDGE (e2, ei, e->src->succs)
>>> +            {
>>> +              has_crossing_succ |= (e2->flags & EDGE_CROSSING);
>>> +              if (has_crossing_succ)
>>> +                break;
>>> +            }
>>> +          if (!has_crossing_succ)
>>> +            remove_note (BB_END (e->src), note);
>>> +        }
>>> +    }
>>> +}
>>> +
>>>  /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
>>>     expense of adding new instructions or reordering basic blocks.
>>>
>>> @@ -1307,16 +1357,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>>  {
>>>    edge ret;
>>>    basic_block src = e->src;
>>> +  basic_block dest = e->dest;
>>>
>>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>>      return NULL;
>>>
>>> -  if (e->dest == target)
>>> +  if (dest == target)
>>>      return e;
>>>
>>>    if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
>>>      {
>>>        df_set_bb_dirty (src);
>>> +      fixup_partition_crossing (ret);
>>>        return ret;
>>>      }
>>>
>>> @@ -1325,9 +1377,22 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>>      return NULL;
>>>
>>>    df_set_bb_dirty (src);
>>> +  fixup_partition_crossing (ret);
>>>    return ret;
>>>  }
>>>
>>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
>>> +
>>> +void
>>> +emit_barrier_after_bb (basic_block bb)
>>> +{
>>> +  rtx barrier = emit_barrier_after (BB_END (bb));
>>> +  gcc_assert (current_ir_type() == IR_RTL_CFGRTL
>>> +              || current_ir_type () == IR_RTL_CFGLAYOUT);
>>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>>> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>> +}
>>> +
>>>  /* Like force_nonfallthru below, but additionally performs redirection
>>>     Used by redirect_edge_and_branch_force.  JUMP_LABEL is used only
>>>     when redirecting to the EXIT_BLOCK, it is either ret_rtx or
>>> @@ -1492,12 +1557,6 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>>        /* Make sure new block ends up in correct hot/cold section.  */
>>>
>>>        BB_COPY_PARTITION (jump_block, e->src);
>>> -      if (flag_reorder_blocks_and_partition
>>> -  && targetm_common.have_named_sections
>>> -  && JUMP_P (BB_END (jump_block))
>>> -  && !any_condjump_p (BB_END (jump_block))
>>> -  && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
>>> - add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
>>>
>>>        /* Wire edge in.  */
>>>        new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
>>> @@ -1508,6 +1567,10 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>>        redirect_edge_pred (e, jump_block);
>>>        e->probability = REG_BR_PROB_BASE;
>>>
>>> +      /* If e->src was previously region crossing, it no longer is
>>> +         and the reg crossing note should be removed.  */
>>> +      fixup_partition_crossing (new_edge);
>>> +
>>>        /* If asm goto has any label refs to target's label,
>>>   add also edge from asm goto bb to target.  */
>>>        if (asm_goto_edge)
>>> @@ -1559,13 +1622,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>>        LABEL_NUSES (label)++;
>>>      }
>>>
>>> -  emit_barrier_after (BB_END (jump_block));
>>> +  /* We might be in cfg layout mode, and if so, the following routine will
>>> +     insert the barrier correctly.  */
>>> +  emit_barrier_after_bb (jump_block);
>>>    redirect_edge_succ_nodup (e, target);
>>>
>>>    if (abnormal_edge_flags)
>>>      make_edge (src, target, abnormal_edge_flags);
>>>
>>>    df_mark_solutions_dirty ();
>>> +  fixup_partition_crossing (e);
>>>    return new_bb;
>>>  }
>>>
>>> @@ -1654,6 +1720,21 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
>>>    return false;
>>>  }
>>>
>>> +/* Locate the last bb in the same partition as START_BB.  */
>>> +
>>> +static basic_block
>>> +last_bb_in_partition (basic_block start_bb)
>>> +{
>>> +  basic_block bb;
>>> +  FOR_BB_BETWEEN (bb, start_bb, EXIT_BLOCK_PTR, next_bb)
>>> +    {
>>> +      if (BB_PARTITION (start_bb) != BB_PARTITION (bb->next_bb))
>>> +        return bb;
>>> +    }
>>> +  /* Return bb before EXIT_BLOCK_PTR.  */
>>> +  return bb->prev_bb;
>>> +}
>>> +
>>>  /* Split a (typically critical) edge.  Return the new block.
>>>     The edge must not be abnormal.
>>>
>>> @@ -1664,7 +1745,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
>>>  static basic_block
>>>  rtl_split_edge (edge edge_in)
>>>  {
>>> -  basic_block bb;
>>> +  basic_block bb, new_bb;
>>>    rtx before;
>>>
>>>    /* Abnormal edges cannot be split.  */
>>> @@ -1696,13 +1777,50 @@ rtl_split_edge (edge edge_in)
>>>      }
>>>    else
>>>      {
>>> -      bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
>>> -      /* ??? Why not edge_in->dest->prev_bb here?  */
>>> -      BB_COPY_PARTITION (bb, edge_in->dest);
>>> +      if (edge_in->src == ENTRY_BLOCK_PTR)
>>> +        {
>>> +          bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
>>> +          BB_COPY_PARTITION (bb, edge_in->dest);
>>> +        }
>>> +      else
>>> +        {
>>> +          basic_block after = edge_in->dest->prev_bb;
>>> +          /* If this is post-bb reordering, and the edge crosses a partition
>>> +             boundary, the new block needs to be inserted in the bb chain
>>> +             at the end of the src partition (since we put the new bb into
>>> +             that partition, see below). Otherwise we may end up creating
>>> +             an extra partition crossing in the chain, which is illegal.
>>> +             It can't go after the src, because src may have a fall-through
>>> +             to a different block.  */
>>> +          if (crtl->bb_reorder_complete
>>> +              && (edge_in->flags & EDGE_CROSSING))
>>> +            {
>>> +              after = last_bb_in_partition (edge_in->src);
>>> +              before = NEXT_INSN (BB_END (after));
>>> +              /* The instruction following the last bb in partition should
>>> +                 be a barrier, since it cannot end in a fall-through.  */
>>> +              gcc_checking_assert (BARRIER_P (before));
>>> +              before = NEXT_INSN (before);
>>> +            }
>>> +          bb = create_basic_block (before, NULL, after);
>>> +          /* Put the split bb into the src partition, to avoid creating
>>> +             a situation where a cold bb dominates a hot bb, in the case
>>> +             where src is cold and dest is hot. The src will dominate
>>> +             the new bb (whereas it might not have dominated dest).  */
>>> +          BB_COPY_PARTITION (bb, edge_in->src);
>>> +        }
>>>      }
>>>
>>>    make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);
>>>
>>> +  /* Can't allow a region crossing edge to be fallthrough.  */
>>> +  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
>>> +      && edge_in->dest != EXIT_BLOCK_PTR)
>>> +    {
>>> +      new_bb = force_nonfallthru (single_succ_edge (bb));
>>> +      gcc_assert (!new_bb);
>>> +    }
>>> +
>>>    /* For non-fallthru edges, we must adjust the predecessor's
>>>       jump instruction to target our new block.  */
>>>    if ((edge_in->flags & EDGE_FALLTHRU) == 0)
>>> @@ -1815,17 +1933,13 @@ commit_one_edge_insertion (edge e)
>>>    else
>>>      {
>>>        bb = split_edge (e);
>>> -      after = BB_END (bb);
>>>
>>> -      if (flag_reorder_blocks_and_partition
>>> -  && targetm_common.have_named_sections
>>> -  && e->src != ENTRY_BLOCK_PTR
>>> -  && BB_PARTITION (e->src) == BB_COLD_PARTITION
>>> -  && !(e->flags & EDGE_CROSSING)
>>> -  && JUMP_P (after)
>>> -  && !any_condjump_p (after)
>>> -  && (single_succ_edge (bb)->flags & EDGE_CROSSING))
>>> - add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
>>> +      /* If E crossed a partition boundary, we needed to make bb end in
>>> +         a region-crossing jump, even though it was originally fallthru.  */
>>> +      if (JUMP_P (BB_END (bb)))
>>> + before = BB_END (bb);
>>> +      else
>>> +        after = BB_END (bb);
>>>      }
>>>
>>>    /* Now that we've found the spot, do the insertion.  */
>>> @@ -2071,7 +2185,11 @@ verify_hot_cold_block_grouping (void)
>>>    bool switched_sections = false;
>>>    int current_partition = BB_UNPARTITIONED;
>>>
>>> -  if (!crtl->bb_reorder_complete)
>>> +  /* Even after bb reordering is complete, we go into cfglayout mode
>>> +     again (in compgoto). Ensure we don't call this before going back
>>> +     into linearized RTL when any layout fixes would have been committed.  */
>>> +  if (!crtl->bb_reorder_complete
>>> +      || current_ir_type() != IR_RTL_CFGRTL)
>>>      return err;
>>>
>>>    FOR_EACH_BB (bb)
>>> @@ -2116,6 +2234,7 @@ rtl_verify_edges (void)
>>>        edge e, fallthru = NULL;
>>>        edge_iterator ei;
>>>        rtx note;
>>> +      bool has_crossing_edge = false;
>>>
>>>        if (JUMP_P (BB_END (bb))
>>>    && (note = find_reg_note (BB_END (bb), REG_BR_PROB, NULL_RTX))
>>> @@ -2141,6 +2260,7 @@ rtl_verify_edges (void)
>>>    is_crossing = (BB_PARTITION (e->src) != BB_PARTITION (e->dest)
>>>   && e->src != ENTRY_BLOCK_PTR
>>>   && e->dest != EXIT_BLOCK_PTR);
>>> +          has_crossing_edge |= is_crossing;
>>>    if (e->flags & EDGE_CROSSING)
>>>      {
>>>        if (!is_crossing)
>>> @@ -2160,6 +2280,13 @@ rtl_verify_edges (void)
>>>   e->src->index);
>>>    err = 1;
>>>   }
>>> +              if (JUMP_P (BB_END (bb))
>>> +                  && !find_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX))
>>> + {
>>> +  error ("No region crossing jump at section boundary in bb %i",
>>> + bb->index);
>>> +  err = 1;
>>> + }
>>>      }
>>>    else if (is_crossing)
>>>      {
>>> @@ -2188,6 +2315,15 @@ rtl_verify_edges (void)
>>>      n_abnormal++;
>>>   }
>>>
>>> +        if (!has_crossing_edge
>>> +            && find_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX))
>>> +          {
>>> +            print_rtl_with_bb (stderr, get_insns (), TDF_RTL | TDF_BLOCKS | TDF_DETAILS);
>>> +            error ("Region crossing jump across same section in bb %i",
>>> +                   bb->index);
>>> +            err = 1;
>>> +          }
>>> +
>>>        if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
>>>   {
>>>    error ("missing REG_EH_REGION note at the end of bb %i", bb->index);
>>> @@ -2395,8 +2531,6 @@ rtl_verify_flow_info_1 (void)
>>>
>>>    err |= rtl_verify_edges ();
>>>
>>> -  err |= verify_hot_cold_block_grouping();
>>> -
>>>    return err;
>>>  }
>>>
>>> @@ -2642,6 +2776,8 @@ rtl_verify_flow_info (void)
>>>
>>>    err |= rtl_verify_bb_layout ();
>>>
>>> +  err |= verify_hot_cold_block_grouping ();
>>> +
>>>    return err;
>>>  }
>>>
>>> @@ -3343,7 +3479,7 @@ fixup_reorder_chain (void)
>>>        edge e_fall, e_taken, e;
>>>        rtx bb_end_insn;
>>>        rtx ret_label = NULL_RTX;
>>> -      basic_block nb, src_bb;
>>> +      basic_block nb;
>>>        edge_iterator ei;
>>>
>>>        if (EDGE_COUNT (bb->succs) == 0)
>>> @@ -3478,7 +3614,6 @@ fixup_reorder_chain (void)
>>>        /* We got here if we need to add a new jump insn.
>>>   Note force_nonfallthru can delete E_FALL and thus we have to
>>>   save E_FALL->src prior to the call to force_nonfallthru.  */
>>> -      src_bb = e_fall->src;
>>>        nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
>>>        if (nb)
>>>   {
>>> @@ -3486,17 +3621,6 @@ fixup_reorder_chain (void)
>>>    bb->aux = nb;
>>>    /* Don't process this new block.  */
>>>    bb = nb;
>>> -
>>> -  /* Make sure new bb is tagged for correct section (same as
>>> -     fall-thru source, since you cannot fall-thru across
>>> -     section boundaries).  */
>>> -  BB_COPY_PARTITION (src_bb, single_pred (bb));
>>> -  if (flag_reorder_blocks_and_partition
>>> -      && targetm_common.have_named_sections
>>> -      && JUMP_P (BB_END (bb))
>>> -      && !any_condjump_p (BB_END (bb))
>>> -      && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
>>> -    add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
>>>   }
>>>      }
>>>
>>> @@ -3796,10 +3920,11 @@ duplicate_insn_chain (rtx from, rtx to)
>>>      case NOTE_INSN_FUNCTION_BEG:
>>>        /* There is always just single entry to function.  */
>>>      case NOTE_INSN_BASIC_BLOCK:
>>> +              /* We should only switch text sections once.  */
>>> +    case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>>        break;
>>>
>>>      case NOTE_INSN_EPILOGUE_BEG:
>>> -    case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>>        emit_note_copy (insn);
>>>        break;
>>>
>>> @@ -4611,8 +4736,7 @@ rtl_can_remove_branch_p (const_edge e)
>>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>>      return false;
>>>
>>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>>      return false;
>>>
>>>    if (!onlyjump_p (insn)
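
The bookkeeping in fixup_partition_crossing earlier in this file can be
modelled on a toy CFG, which may help when reviewing the two branches. This is
a simplified sketch under stated assumptions: toy_block, toy_edge and
toy_fixup_partition_crossing are hypothetical names, booleans stand in for
EDGE_CROSSING and the REG_CROSSING_JUMP note, and the ENTRY/EXIT block early
return from the patch is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_block;

struct toy_edge
{
  struct toy_block *src;
  struct toy_block *dest;
  bool crossing;              /* Models the EDGE_CROSSING flag.  */
};

struct toy_block
{
  int partition;              /* Models BB_PARTITION (bb).  */
  bool ends_in_jump;          /* Models JUMP_P (BB_END (bb)).  */
  bool crossing_note;         /* Models a REG_CROSSING_JUMP note.  */
  struct toy_edge *succs[4];  /* Successor edges (toy fixed capacity).  */
  size_t n_succs;
};

/* Update the crossing flag on E and the note on E's source block after
   E has been redirected.  */
static void
toy_fixup_partition_crossing (struct toy_edge *e)
{
  struct toy_block *src = e->src;
  size_t i;

  if (src->partition != e->dest->partition)
    {
      /* The edge now crosses: flag it and make sure the jump is marked.  */
      e->crossing = true;
      if (src->ends_in_jump)
        src->crossing_note = true;
      return;
    }

  /* The edge no longer crosses.  Drop the note only if no other
     successor of the source block still crosses.  */
  e->crossing = false;
  for (i = 0; i < src->n_succs; i++)
    if (src->succs[i]->crossing)
      return;
  src->crossing_note = false;
}
```

The point of the second branch is that a conditional jump with one crossing
and one non-crossing successor keeps its note until the last crossing edge is
redirected away.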
>>> Index: basic-block.h
>>> ===================================================================
>>> --- basic-block.h (revision 199014)
>>> +++ basic-block.h (working copy)
>>> @@ -796,6 +796,7 @@ extern basic_block force_nonfallthru_and_redirect
>>>  extern bool contains_no_active_insn_p (const_basic_block);
>>>  extern bool forwarder_block_p (const_basic_block);
>>>  extern bool can_fallthru (basic_block, basic_block);
>>> +extern void emit_barrier_after_bb (basic_block bb);
>>>
>>>  /* In cfgbuild.c.  */
>>>  extern void find_many_sub_basic_blocks (sbitmap);
>>> Index: testsuite/gcc.dg/tree-prof/va-arg-pack-1.c
>>> ===================================================================
>>> --- testsuite/gcc.dg/tree-prof/va-arg-pack-1.c (revision 0)
>>> +++ testsuite/gcc.dg/tree-prof/va-arg-pack-1.c (revision 0)
>>> @@ -0,0 +1,145 @@
>>> +/* __builtin_va_arg_pack () builtin tests.  */
>>> +/* { dg-require-effective-target freorder } */
>>> +/* { dg-options "-O2 -freorder-blocks-and-partition" } */
>>> +
>>> +#include <stdarg.h>
>>> +
>>> +extern void abort (void);
>>> +
>>> +int v1 = 8;
>>> +long int v2 = 3;
>>> +void *v3 = (void *) &v2;
>>> +struct A { char c[16]; } v4 = { "foo" };
>>> +long double v5 = 40;
>>> +char seen[20];
>>> +int cnt;
>>> +
>>> +__attribute__ ((noinline)) int
>>> +foo1 (int x, int y, ...)
>>> +{
>>> +  int i;
>>> +  long int l;
>>> +  void *v;
>>> +  struct A a;
>>> +  long double ld;
>>> +  va_list ap;
>>> +
>>> +  va_start (ap, y);
>>> +  if (x < 0 || x >= 20 || seen[x])
>>> +    abort ();
>>> +  seen[x] = ++cnt;
>>> +  if (y != 6)
>>> +    abort ();
>>> +  i = va_arg (ap, int);
>>> +  if (i != 5)
>>> +    abort ();
>>> +  switch (x)
>>> +    {
>>> +    case 0:
>>> +      i = va_arg (ap, int);
>>> +      if (i != 9 || v1 != 9)
>>> + abort ();
>>> +      a = va_arg (ap, struct A);
>>> +      if (__builtin_memcmp (a.c, v4.c, sizeof (a.c)) != 0)
>>> + abort ();
>>> +      v = (void *) va_arg (ap, struct A *);
>>> +      if (v != (void *) &v4)
>>> + abort ();
>>> +      l = va_arg (ap, long int);
>>> +      if (l != 3 || v2 != 4)
>>> + abort ();
>>> +      break;
>>> +    case 1:
>>> +      ld = va_arg (ap, long double);
>>> +      if (ld != 41 || v5 != ld)
>>> + abort ();
>>> +      i = va_arg (ap, int);
>>> +      if (i != 8)
>>> + abort ();
>>> +      v = va_arg (ap, void *);
>>> +      if (v != &v2)
>>> + abort ();
>>> +      break;
>>> +    case 2:
>>> +      break;
>>> +    default:
>>> +      abort ();
>>> +    }
>>> +  va_end (ap);
>>> +  return x;
>>> +}
>>> +
>>> +__attribute__ ((noinline)) int
>>> +foo2 (int x, int y, ...)
>>> +{
>>> +  long long int ll;
>>> +  void *v;
>>> +  struct A a, b;
>>> +  long double ld;
>>> +  va_list ap;
>>> +
>>> +  va_start (ap, y);
>>> +  if (x < 0 || x >= 20 || seen[x])
>>> +    abort ();
>>> +  seen[x] = ++cnt | 64;
>>> +  if (y != 10)
>>> +    abort ();
>>> +  switch (x)
>>> +    {
>>> +    case 11:
>>> +      break;
>>> +    case 12:
>>> +      ld = va_arg (ap, long double);
>>> +      if (ld != 41 || v5 != 40)
>>> + abort ();
>>> +      a = va_arg (ap, struct A);
>>> +      if (__builtin_memcmp (a.c, v4.c, sizeof (a.c)) != 0)
>>> + abort ();
>>> +      b = va_arg (ap, struct A);
>>> +      if (__builtin_memcmp (b.c, v4.c, sizeof (b.c)) != 0)
>>> + abort ();
>>> +      v = va_arg (ap, void *);
>>> +      if (v != &v2)
>>> + abort ();
>>> +      ll = va_arg (ap, long long int);
>>> +      if (ll != 16LL)
>>> + abort ();
>>> +      break;
>>> +    case 2:
>>> +      break;
>>> +    default:
>>> +      abort ();
>>> +    }
>>> +  va_end (ap);
>>> +  return x + 8;
>>> +}
>>> +
>>> +__attribute__ ((noinline)) int
>>> +foo3 (void)
>>> +{
>>> +  return 6;
>>> +}
>>> +
>>> +extern inline __attribute__ ((always_inline, gnu_inline)) int
>>> +bar (int x, ...)
>>> +{
>>> +  if (x < 10)
>>> +    return foo1 (x, foo3 (), 5, __builtin_va_arg_pack ());
>>> +  return foo2 (x, foo3 () + 4, __builtin_va_arg_pack ());
>>> +}
>>> +
>>> +int
>>> +main (void)
>>> +{
>>> +  if (bar (0, ++v1, v4, &v4, v2++) != 0)
>>> +    abort ();
>>> +  if (bar (1, ++v5, 8, v3) != 1)
>>> +    abort ();
>>> +  if (bar (2) != 2)
>>> +    abort ();
>>> +  if (bar (v1 + 2) != 19)
>>> +    abort ();
>>> +  if (bar (v1 + 3, v5--, v4, v4, v3, 16LL) != 20)
>>> +    abort ();
>>> +  return 0;
>>> +}
>>> Index: testsuite/gcc.dg/tree-prof/comp-goto-1.c
>>> ===================================================================
>>> --- testsuite/gcc.dg/tree-prof/comp-goto-1.c (revision 0)
>>> +++ testsuite/gcc.dg/tree-prof/comp-goto-1.c (revision 0)
>>> @@ -0,0 +1,166 @@
>>> +/* { dg-require-effective-target freorder } */
>>> +/* { dg-options "-O2 -freorder-blocks-and-partition" } */
>>> +#include <stdlib.h>
>>> +
>>> +#if !defined(NO_LABEL_VALUES) && (!defined(STACK_SIZE) || STACK_SIZE >= 4000) && __INT_MAX__ >= 2147483647
>>> +typedef unsigned int uint32;
>>> +typedef signed int sint32;
>>> +
>>> +typedef uint32 reg_t;
>>> +
>>> +typedef unsigned long int host_addr_t;
>>> +typedef uint32 target_addr_t;
>>> +typedef sint32 target_saddr_t;
>>> +
>>> +typedef union
>>> +{
>>> +  struct
>>> +    {
>>> +      unsigned int offset:18;
>>> +      unsigned int ignore:4;
>>> +      unsigned int s1:8;
>>> +      int :2;
>>> +      signed int simm:14;
>>> +      unsigned int s3:8;
>>> +      unsigned int s2:8;
>>> +      int pad2:2;
>>> +    } f1;
>>> +  long long ll;
>>> +  double d;
>>> +} insn_t;
>>> +
>>> +typedef struct
>>> +{
>>> +  target_addr_t vaddr_tag;
>>> +  unsigned long int rigged_paddr;
>>> +} tlb_entry_t;
>>> +
>>> +typedef struct
>>> +{
>>> +  insn_t *pc;
>>> +  reg_t registers[256];
>>> +  insn_t *program;
>>> +  tlb_entry_t tlb_tab[0x100];
>>> +} environment_t;
>>> +
>>> +enum operations
>>> +{
>>> +  LOAD32_RR,
>>> +  METAOP_DONE
>>> +};
>>> +
>>> +host_addr_t
>>> +f ()
>>> +{
>>> +  abort ();
>>> +}
>>> +
>>> +reg_t
>>> +simulator_kernel (int what, environment_t *env)
>>> +{
>>> +  register insn_t *pc = env->pc;
>>> +  register reg_t *regs = env->registers;
>>> +  register insn_t insn;
>>> +  register int s1;
>>> +  register reg_t r2;
>>> +  register void *base_addr = &&sim_base_addr;
>>> +  register tlb_entry_t *tlb = env->tlb_tab;
>>> +
>>> +  if (what != 0)
>>> +    {
>>> +      int i;
>>> +      static void *op_map[] =
>>> + {
>>> +  &&L_LOAD32_RR,
>>> +  &&L_METAOP_DONE,
>>> + };
>>> +      insn_t *program = env->program;
>>> +      for (i = 0; i < what; i++)
>>> + program[i].f1.offset = op_map[program[i].f1.offset] - base_addr;
>>> +    }
>>> +
>>> + sim_base_addr:;
>>> +
>>> +  insn = *pc++;
>>> +  r2 = (*(reg_t *) (((char *) regs) + (insn.f1.s2 << 2)));
>>> +  s1 = (insn.f1.s1 << 2);
>>> +  goto *(base_addr + insn.f1.offset);
>>> +
>>> + L_LOAD32_RR:
>>> +  {
>>> +    target_addr_t vaddr_page = r2 / 4096;
>>> +    unsigned int x = vaddr_page % 0x100;
>>> +    insn = *pc++;
>>> +
>>> +    for (;;)
>>> +      {
>>> + target_addr_t tag = tlb[x].vaddr_tag;
>>> + host_addr_t rigged_paddr = tlb[x].rigged_paddr;
>>> +
>>> + if (tag == vaddr_page)
>>> +  {
>>> +    *(reg_t *) (((char *) regs) + s1) = *(uint32 *) (rigged_paddr + r2);
>>> +    r2 = *(reg_t *) (((char *) regs) + (insn.f1.s2 << 2));
>>> +    s1 = insn.f1.s1 << 2;
>>> +    goto *(base_addr + insn.f1.offset);
>>> +  }
>>> +
>>> + if (((target_saddr_t) tag < 0))
>>> +  {
>>> +    *(reg_t *) (((char *) regs) + s1) = *(uint32 *) f ();
>>> +    r2 = *(reg_t *) (((char *) regs) + (insn.f1.s2 << 2));
>>> +    s1 = insn.f1.s1 << 2;
>>> +    goto *(base_addr + insn.f1.offset);
>>> +  }
>>> +
>>> + x = (x - 1) % 0x100;
>>> +      }
>>> +
>>> +    L_METAOP_DONE:
>>> +      return (*(reg_t *) (((char *) regs) + s1));
>>> +  }
>>> +}
>>> +
>>> +insn_t program[2 + 1];
>>> +
>>> +void *malloc ();
>>> +
>>> +int
>>> +main ()
>>> +{
>>> +  environment_t env;
>>> +  insn_t insn;
>>> +  int i, res;
>>> +  host_addr_t a_page = (host_addr_t) malloc (2 * 4096);
>>> +  target_addr_t a_vaddr = 0x123450;
>>> +  target_addr_t vaddr_page = a_vaddr / 4096;
>>> +  a_page = (a_page + 4096 - 1) & -4096;
>>> +
>>> +  env.tlb_tab[((vaddr_page) % 0x100)].vaddr_tag = vaddr_page;
>>> +  env.tlb_tab[((vaddr_page) % 0x100)].rigged_paddr = a_page - vaddr_page * 4096;
>>> +  insn.f1.offset = LOAD32_RR;
>>> +  env.registers[0] = 0;
>>> +  env.registers[2] = a_vaddr;
>>> +  *(sint32 *) (a_page + a_vaddr % 4096) = 88;
>>> +  insn.f1.s1 = 0;
>>> +  insn.f1.s2 = 2;
>>> +
>>> +  for (i = 0; i < 2; i++)
>>> +    program[i] = insn;
>>> +
>>> +  insn.f1.offset = METAOP_DONE;
>>> +  insn.f1.s1 = 0;
>>> +  program[2] = insn;
>>> +
>>> +  env.pc = program;
>>> +  env.program = program;
>>> +
>>> +  res = simulator_kernel (2 + 1, &env);
>>> +
>>> +  if (res != 88)
>>> +    abort ();
>>> +  exit (0);
>>> +}
>>> +#else
>>> +main(){ exit (0); }
>>> +#endif
>>> Index: testsuite/gcc.dg/tree-prof/pr52027.c
>>> ===================================================================
>>> --- testsuite/gcc.dg/tree-prof/pr52027.c (revision 199014)
>>> +++ testsuite/gcc.dg/tree-prof/pr52027.c (working copy)
>>> @@ -1,6 +1,6 @@
>>>  /* PR debug/52027 */
>>>  /* { dg-require-effective-target freorder } */
>>> -/* { dg-options "-O -freorder-blocks-and-partition -fno-reorder-functions" } */
>>> +/* { dg-options "-O2 -freorder-blocks-and-partition -fno-reorder-functions" } */
>>>
>>>  void
>>>  foo (int len)
>>> Index: testsuite/gcc.dg/tree-prof/pr50907.c
>>> ===================================================================
>>> --- testsuite/gcc.dg/tree-prof/pr50907.c (revision 199014)
>>> +++ testsuite/gcc.dg/tree-prof/pr50907.c (working copy)
>>> @@ -1,5 +1,5 @@
>>>  /* PR middle-end/50907 */
>>>  /* { dg-require-effective-target freorder } */
>>> -/* { dg-options "-O -freorder-blocks-and-partition -fschedule-insns -fselective-scheduling -fpic" { target { { powerpc*-*-* ia64-*-* x86_64-*-* } && fpic } } } */
>>> +/* { dg-options "-O2 -freorder-blocks-and-partition -fschedule-insns -fselective-scheduling -fpic" { target { { powerpc*-*-* ia64-*-* x86_64-*-* } && fpic } } } */
>>>
>>>  #include "pr45354.c"
>>> Index: testsuite/gcc.dg/tree-prof/pr45354.c
>>> ===================================================================
>>> --- testsuite/gcc.dg/tree-prof/pr45354.c (revision 199014)
>>> +++ testsuite/gcc.dg/tree-prof/pr45354.c (working copy)
>>> @@ -1,5 +1,5 @@
>>>  /* { dg-require-effective-target freorder } */
>>> -/* { dg-options "-O -freorder-blocks-and-partition -fschedule-insns -fselective-scheduling" { target powerpc*-*-* ia64-*-* x86_64-*-* } } */
>>> +/* { dg-options "-O2 -freorder-blocks-and-partition -fschedule-insns -fselective-scheduling" { target powerpc*-*-* ia64-*-* x86_64-*-* } } */
>>>
>>>  extern void abort (void);
>>>
>>> Index: testsuite/gcc.dg/tree-prof/20041218-1.c
>>> ===================================================================
>>> --- testsuite/gcc.dg/tree-prof/20041218-1.c (revision 0)
>>> +++ testsuite/gcc.dg/tree-prof/20041218-1.c (revision 0)
>>> @@ -0,0 +1,119 @@
>>> +/* PR rtl-optimization/16968 */
>>> +/* Testcase by Jakub Jelinek  <jakub@redhat.com> */
>>> +/* { dg-require-effective-target freorder } */
>>> +/* { dg-options "-O2 -freorder-blocks-and-partition" } */
>>> +
>>> +struct T
>>> +{
>>> +  unsigned int b, c, *d;
>>> +  unsigned char e;
>>> +};
>>> +struct S
>>> +{
>>> +  unsigned int a;
>>> +  struct T f;
>>> +};
>>> +struct U
>>> +{
>>> +  struct S g, h;
>>> +};
>>> +struct V
>>> +{
>>> +  unsigned int i;
>>> +  struct U j;
>>> +};
>>> +
>>> +extern void exit (int);
>>> +extern void abort (void);
>>> +
>>> +void *
>>> +dummy1 (void *x)
>>> +{
>>> +  return "";
>>> +}
>>> +
>>> +void *
>>> +dummy2 (void *x, void *y)
>>> +{
>>> +  exit (0);
>>> +}
>>> +
>>> +struct V *
>>> +baz (unsigned int x)
>>> +{
>>> +  static struct V v;
>>> +  __builtin_memset (&v, 0x55, sizeof (v));
>>> +  return &v;
>>> +}
>>> +
>>> +int
>>> +check (void *x, struct S *y)
>>> +{
>>> +  if (y->a || y->f.b || y->f.c || y->f.d || y->f.e)
>>> +    abort ();
>>> +  return 1;
>>> +}
>>> +
>>> +static struct V *
>>> +bar (unsigned int x, void *y)
>>> +{
>>> +  const struct T t = { 0, 0, (void *) 0, 0 };
>>> +  struct V *u;
>>> +  void *v;
>>> +  v = dummy1 (y);
>>> +  if (!v)
>>> +    return (void *) 0;
>>> +
>>> +  u = baz (sizeof (struct V));
>>> +  u->i = x;
>>> +  u->j.g.a = 0;
>>> +  u->j.g.f = t;
>>> +  u->j.h.a = 0;
>>> +  u->j.h.f = t;
>>> +
>>> +  if (!check (v, &u->j.g) || !check (v, &u->j.h))
>>> +    return (void *) 0;
>>> +  return u;
>>> +}
>>> +
>>> +int
>>> +foo (unsigned int *x, unsigned int y, void **z)
>>> +{
>>> +  void *v;
>>> +  unsigned int i, j;
>>> +
>>> +  *z = v = (void *) 0;
>>> +
>>> +  for (i = 0; i < y; i++)
>>> +    {
>>> +      struct V *c;
>>> +
>>> +      j = *x;
>>> +
>>> +      switch (j)
>>> + {
>>> + case 1:
>>> +  c = bar (j, x);
>>> +  break;
>>> + default:
>>> +  c = 0;
>>> +  break;
>>> + }
>>> +      if (c)
>>> + v = dummy2 (v, c);
>>> +      else
>>> +        return 1;
>>> +    }
>>> +
>>> +  *z = v;
>>> +  return 0;
>>> +}
>>> +
>>> +int
>>> +main (void)
>>> +{
>>> +  unsigned int one = 1;
>>> +  void *p;
>>> +  foo (&one, 1, &p);
>>> +  abort ();
>>> +}
>>> Index: testsuite/g++.dg/tree-prof/partition2.C
>>> ===================================================================
>>> --- testsuite/g++.dg/tree-prof/partition2.C (revision 199014)
>>> +++ testsuite/g++.dg/tree-prof/partition2.C (working copy)
>>> @@ -1,6 +1,6 @@
>>>  // PR middle-end/45458
>>>  // { dg-require-effective-target freorder }
>>> -// { dg-options "-fnon-call-exceptions -freorder-blocks-and-partition" }
>>> +// { dg-options "-O2 -fnon-call-exceptions -freorder-blocks-and-partition" }
>>>
>>>  int
>>>  main ()
>>> Index: testsuite/g++.dg/tree-prof/partition3.C
>>> ===================================================================
>>> --- testsuite/g++.dg/tree-prof/partition3.C (revision 199014)
>>> +++ testsuite/g++.dg/tree-prof/partition3.C (working copy)
>>> @@ -1,6 +1,6 @@
>>>  // PR middle-end/45566
>>>  // { dg-require-effective-target freorder }
>>> -// { dg-options "-O -fnon-call-exceptions -freorder-blocks-and-partition" }
>>> +// { dg-options "-O2 -fnon-call-exceptions -freorder-blocks-and-partition" }
>>>
>>>  int k;
>>>
>>>
>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
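
For readers who want the layout invariant in isolation: the patch relies on the basic-block chain switching between the hot and cold text sections at most once (insert_section_boundary_note() now asserts this, and verify_hot_cold_block_grouping() checks it). Below is a minimal standalone sketch of that check. It is not GCC code; the enum, the array representation of the block chain, and both function names are hypothetical stand-ins chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-ins for GCC's hot/cold partition tags.  */
enum partition { PART_HOT = 1, PART_COLD = 2 };

/* Count how many times consecutive blocks in CHAIN (N entries, in
   layout order) belong to different partitions.  A well-formed
   layout has a count of 0 (one section) or 1 (hot then cold, or
   cold then hot).  */
static int
count_section_switches (const enum partition *chain, size_t n)
{
  int switches = 0;
  size_t i;

  for (i = 1; i < n; i++)
    if (chain[i] != chain[i - 1])
      switches++;
  return switches;
}

/* Return the index of the first block of the second section, i.e.
   where a section-switch note would go, or -1 if the chain never
   switches.  Only the first switch is reported; callers that need
   the invariant enforced should also check
   count_section_switches () <= 1.  */
static int
section_switch_index (const enum partition *chain, size_t n)
{
  size_t i;

  for (i = 1; i < n; i++)
    if (chain[i] != chain[i - 1])
      return (int) i;
  return -1;
}
```

With a chain of hot, hot, cold, cold the count is 1 and the switch index is 2. A chain of hot, cold, hot gives a count of 2, which is the kind of layout the new assert is meant to catch.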

Patch

Index: ifcvt.c
===================================================================
--- ifcvt.c (revision 199014)
+++ ifcvt.c (working copy)
@@ -3905,10 +3905,9 @@  find_if_case_1 (basic_block test_bb, edge then_edg
   if (new_bb)
     {
       df_bb_replace (then_bb_index, new_bb);
-      /* Since the fallthru edge was redirected from test_bb to new_bb,
-         we need to ensure that new_bb is in the same partition as
-         test bb (you can not fall through across section boundaries).  */
-      BB_COPY_PARTITION (new_bb, test_bb);
+      /* This should have been done above via force_nonfallthru_and_redirect
+         (possibly called from redirect_edge_and_branch_force).  */
+      gcc_checking_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
     }

   num_true_changes++;
Index: function.c
===================================================================
--- function.c (revision 199014)
+++ function.c (working copy)
@@ -6270,8 +6270,10 @@  thread_prologue_and_epilogue_insns (void)
     break;
  if (e)
   {
-    copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
-  NULL_RTX, e->src);
+                    /* Make sure we insert after any barriers.  */
+                    rtx end = get_last_bb_insn (e->src);
+                    copy_bb = create_basic_block (NEXT_INSN (end),
+                                                  NULL_RTX, e->src);
     BB_COPY_PARTITION (copy_bb, e->src);
   }
  else
@@ -6538,7 +6540,7 @@  epilogue_done:
       basic_block simple_return_block_cold = NULL;
       edge pending_edge_hot = NULL;
       edge pending_edge_cold = NULL;
-      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
+      basic_block exit_pred;
       int i;

       gcc_assert (entry_edge != orig_entry_edge);
@@ -6566,6 +6568,12 @@  epilogue_done:
     else
       pending_edge_cold = e;
   }
+
+      /* Save a pointer to the exit's predecessor BB for use in
+         inserting new BBs at the end of the function. Do this
+         after the call to split_block above which may split
+         the original exit pred.  */
+      exit_pred = EXIT_BLOCK_PTR->prev_bb;

       FOR_EACH_VEC_ELT (unconverted_simple_returns, i, e)
  {
Index: emit-rtl.c
===================================================================
--- emit-rtl.c (revision 199014)
+++ emit-rtl.c (working copy)
@@ -3574,6 +3574,7 @@  try_split (rtx pat, rtx trial, int last)
   break;

  case REG_NON_LOCAL_GOTO:
+ case REG_CROSSING_JUMP:
   for (insn = insn_last; insn != NULL_RTX; insn = PREV_INSN (insn))
     {
       if (JUMP_P (insn))
Index: cfgcleanup.c
===================================================================
--- cfgcleanup.c (revision 199014)
+++ cfgcleanup.c (working copy)
@@ -456,7 +456,7 @@  try_forward_edges (int mode, basic_block b)

       if (first != EXIT_BLOCK_PTR
   && find_reg_note (BB_END (first), REG_CROSSING_JUMP, NULL_RTX))
- return false;
+ return changed;

       while (counter < n_basic_blocks)
  {
Index: bb-reorder.c
===================================================================
--- bb-reorder.c (revision 199014)
+++ bb-reorder.c (working copy)
@@ -1380,15 +1380,6 @@  get_uncond_jump_length (void)
   return length;
 }

-/* Emit a barrier into the footer of BB.  */
-
-static void
-emit_barrier_after_bb (basic_block bb)
-{
-  rtx barrier = emit_barrier_after (BB_END (bb));
-  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
-}
-
 /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
    Duplicate the landing pad and split the edges so that no EH edge
    crosses partitions.  */
@@ -1720,8 +1711,7 @@  fix_up_fall_thru_edges (void)
      (i.e. fix it so the fall through does not cross and
      the cond jump does).  */

-  if (!cond_jump_crosses
-      && cur_bb->aux == cond_jump->dest)
+  if (!cond_jump_crosses)
     {
       /* Find label in fall_thru block. We've already added
  any missing labels, so there must be one.  */
@@ -1765,10 +1755,10 @@  fix_up_fall_thru_edges (void)
       new_bb->aux = cur_bb->aux;
       cur_bb->aux = new_bb;

-      /* Make sure new fall-through bb is in same
- partition as bb it's falling through from.  */
+                      /* This is done by force_nonfallthru_and_redirect.  */
+      gcc_assert (BB_PARTITION (new_bb)
+                                  == BB_PARTITION (cur_bb));

-      BB_COPY_PARTITION (new_bb, cur_bb);
       single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
     }
   else
@@ -2064,7 +2054,10 @@  add_reg_crossing_jump_notes (void)
   FOR_EACH_BB (bb)
     FOR_EACH_EDGE (e, ei, bb->succs)
       if ((e->flags & EDGE_CROSSING)
-  && JUMP_P (BB_END (e->src)))
+  && JUMP_P (BB_END (e->src))
+          /* Some notes were added during fix_up_fall_thru_edges, via
+             force_nonfallthru_and_redirect.  */
+          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
  add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
 }

@@ -2133,23 +2126,26 @@  reorder_basic_blocks (void)
    encountering this note will make the compiler switch between the
    hot and cold text sections.  */

-static void
+void
 insert_section_boundary_note (void)
 {
   basic_block bb;
-  int first_partition = 0;
+  bool switched_sections = false;
+  int current_partition = 0;

-  if (!flag_reorder_blocks_and_partition)
+  if (!crtl->has_bb_partition)
     return;

   FOR_EACH_BB (bb)
     {
-      if (!first_partition)
- first_partition = BB_PARTITION (bb);
-      if (BB_PARTITION (bb) != first_partition)
+      if (!current_partition)
+ current_partition = BB_PARTITION (bb);
+      if (BB_PARTITION (bb) != current_partition)
  {
-  emit_note_before (NOTE_INSN_SWITCH_TEXT_SECTIONS, BB_HEAD (bb));
-  break;
+  gcc_assert (!switched_sections);
+          switched_sections = true;
+          emit_note_before (NOTE_INSN_SWITCH_TEXT_SECTIONS, BB_HEAD (bb));
+          current_partition = BB_PARTITION (bb);
  }
     }
 }
@@ -2180,8 +2176,6 @@  rest_of_handle_reorder_blocks (void)
       bb->aux = bb->next_bb;
   cfg_layout_finalize ();

-  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
-  insert_section_boundary_note ();
   return 0;
 }

@@ -2315,6 +2309,11 @@  duplicate_computed_gotos (void)
       if (!bitmap_bit_p (candidates, single_succ (bb)->index))
  continue;

+      /* Don't duplicate a partition crossing edge, which requires difficult
+         fixup.  */
+      if (find_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX))
+ continue;
+
       new_bb = duplicate_block (single_succ (bb), single_succ_edge (bb), bb);
       new_bb->aux = bb->aux;
       bb->aux = new_bb;
Index: bb-reorder.h
===================================================================
--- bb-reorder.h (revision 199014)
+++ bb-reorder.h (working copy)
@@ -35,4 +35,6 @@  extern struct target_bb_reorder *this_target_bb_re

 extern int get_uncond_jump_length (void);

+extern void insert_section_boundary_note (void);
+
 #endif
Index: Makefile.in
===================================================================
--- Makefile.in (revision 199014)
+++ Makefile.in (working copy)
@@ -3151,7 +3151,7 @@  cfgrtl.o : cfgrtl.c $(CONFIG_H) $(SYSTEM_H) corety
    $(FUNCTION_H) $(EXCEPT_H) $(TM_P_H) $(INSN_ATTR_H) \
    insn-config.h $(EXPR_H) \
    $(CFGLOOP_H) $(OBSTACK_H) $(TARGET_H) $(TREE_H) \
-   $(TREE_PASS_H) $(DF_H) $(GGC_H) $(COMMON_TARGET_H) gt-cfgrtl.h
+   $(TREE_PASS_H) $(DF_H) $(GGC_H) $(COMMON_TARGET_H) gt-cfgrtl.h bb-reorder.h
 cfganal.o : cfganal.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(BASIC_BLOCK_H) \
    $(TIMEVAR_H) sbitmap.h $(BITMAP_H)
 cfgbuild.o : cfgbuild.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
Index: cfgrtl.c
===================================================================
--- cfgrtl.c (revision 199014)
+++ cfgrtl.c (working copy)
@@ -44,6 +44,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree.h"
 #include "hard-reg-set.h"
 #include "basic-block.h"
+#include "bb-reorder.h"
 #include "regs.h"
 #include "flags.h"
 #include "function.h"
@@ -451,6 +452,9 @@  rest_of_pass_free_cfg (void)
     }
 #endif

+  if (crtl->has_bb_partition)
+    insert_section_boundary_note ();
+
   free_bb_for_insn ();
   return 0;
 }
@@ -981,8 +985,7 @@  try_redirect_by_replacing_jump (edge e, basic_bloc
      partition boundaries).  See  the comments at the top of
      bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */

-  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
-      || BB_PARTITION (src) != BB_PARTITION (target))
+  if (BB_PARTITION (src) != BB_PARTITION (target))
     return NULL;

   /* We can replace or remove a complex jump only when we have exactly
@@ -1291,6 +1294,53 @@  redirect_branch_edge (edge e, basic_block target)
   return e;
 }

+/* Called when edge E has been redirected to a new destination,
+   in order to update the region crossing flag on the edge and
+   jump.  */
+
+static void
+fixup_partition_crossing (edge e)
+{
+  rtx note;
+
+  if (e->src == ENTRY_BLOCK_PTR || e->dest == EXIT_BLOCK_PTR)
+    return;
+  /* If we redirected an existing edge, it may already be marked
+     crossing, even though the new src is missing a reg crossing note.
+     But make sure reg crossing note doesn't already exist before
+     inserting.  */
+  if (BB_PARTITION (e->src) != BB_PARTITION (e->dest))
+    {
+      e->flags |= EDGE_CROSSING;
+      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
+      if (JUMP_P (BB_END (e->src))
+          && !note)
+        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
+    }
+  else if (BB_PARTITION (e->src) == BB_PARTITION (e->dest))
+    {
+      e->flags &= ~EDGE_CROSSING;
+      /* Remove the section crossing note from jump at end of
+         src if it exists, and if no other successors are
+         still crossing.  */
+      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
+      if (note)
+        {
+          bool has_crossing_succ = false;
+          edge e2;
+          edge_iterator ei;
+          FOR_EACH_EDGE (e2, ei, e->src->succs)
+            {
+              has_crossing_succ |= (e2->flags & EDGE_CROSSING);
+              if (has_crossing_succ)
+                break;
+            }
+          if (!has_crossing_succ)
+            remove_note (BB_END (e->src), note);
+        }
+    }
+}
+
 /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
    expense of adding new instructions or reordering basic blocks.

@@ -1307,16 +1357,18 @@  rtl_redirect_edge_and_branch (edge e, basic_block
 {
   edge ret;
   basic_block src = e->src;
+  basic_block dest = e->dest;

   if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
     return NULL;

-  if (e->dest == target)
+  if (dest == target)
     return e;

   if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
     {
       df_set_bb_dirty (src);
+      fixup_partition_crossing (ret);
       return ret;
     }

@@ -1325,9 +1377,22 @@  rtl_redirect_edge_and_branch (edge e, basic_block
     return NULL;

   df_set_bb_dirty (src);
+  fixup_partition_crossing (ret);
   return ret;
 }

+/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
+
+void
+emit_barrier_after_bb (basic_block bb)
+{
+  rtx barrier = emit_barrier_after (BB_END (bb));
+  gcc_assert (current_ir_type() == IR_RTL_CFGRTL
+              || current_ir_type () == IR_RTL_CFGLAYOUT);
+  if (current_ir_type () == IR_RTL_CFGLAYOUT)
+    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
+}
+
 /* Like force_nonfallthru below, but additionally performs redirection
    Used by redirect_edge_and_branch_force.  JUMP_LABEL is used only
    when redirecting to the EXIT_BLOCK, it is either ret_rtx or
@@ -1492,12 +1557,6 @@  force_nonfallthru_and_redirect (edge e, basic_bloc
       /* Make sure new block ends up in correct hot/cold section.  */

       BB_COPY_PARTITION (jump_block, e->src);
-      if (flag_reorder_blocks_and_partition
-  && targetm_common.have_named_sections
-  && JUMP_P (BB_END (jump_block))
-  && !any_condjump_p (BB_END (jump_block))
-  && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
- add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);

       /* Wire edge in.  */
       new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
@@ -1508,6 +1567,10 @@  force_nonfallthru_and_redirect (edge e, basic_bloc
       redirect_edge_pred (e, jump_block);
       e->probability = REG_BR_PROB_BASE;

+      /* If e->src was previously region crossing, it no longer is
+         and the reg crossing note should be removed.  */
+      fixup_partition_crossing (new_edge);
+
       /* If asm goto has any label refs to target's label,
  add also edge from asm goto bb to target.  */
       if (asm_goto_edge)
@@ -1559,13 +1622,16 @@  force_nonfallthru_and_redirect (edge e, basic_bloc
       LABEL_NUSES (label)++;
     }

-  emit_barrier_after (BB_END (jump_block));
+  /* We might be in cfg layout mode, and if so, the following routine will
+     insert the barrier correctly.  */
+  emit_barrier_after_bb (jump_block);
   redirect_edge_succ_nodup (e, target);

   if (abnormal_edge_flags)
     make_edge (src, target, abnormal_edge_flags);

   df_mark_solutions_dirty ();
+  fixup_partition_crossing (e);
   return new_bb;
 }

@@ -1654,6 +1720,21 @@  rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
   return false;
 }

+/* Locate the last bb in the same partition as START_BB.  */
+
+static basic_block
+last_bb_in_partition (basic_block start_bb)
+{
+  basic_block bb;
+  FOR_BB_BETWEEN (bb, start_bb, EXIT_BLOCK_PTR, next_bb)
+    {
+      if (BB_PARTITION (start_bb) != BB_PARTITION (bb->next_bb))
+        return bb;
+    }
+  /* Return bb before EXIT_BLOCK_PTR.  */
+  return bb->prev_bb;
+}
+
 /* Split a (typically critical) edge.  Return the new block.
    The edge must not be abnormal.

@@ -1664,7 +1745,7 @@  rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
 static basic_block
 rtl_split_edge (edge edge_in)
 {
-  basic_block bb;
+  basic_block bb, new_bb;
   rtx before;

   /* Abnormal edges cannot be split.  */
@@ -1696,13 +1777,50 @@  rtl_split_edge (edge edge_in)
     }
   else
     {
-      bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
-      /* ??? Why not edge_in->dest->prev_bb here?  */
-      BB_COPY_PARTITION (bb, edge_in->dest);
+      if (edge_in->src == ENTRY_BLOCK_PTR)
+        {
+          bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
+          BB_COPY_PARTITION (bb, edge_in->dest);
+        }
+      else
+        {
+          basic_block after = edge_in->dest->prev_bb;
+          /* If this is post-bb reordering, and the edge crosses a partition
+             boundary, the new block needs to be inserted in the bb chain
+             at the end of the src partition (since we put the new bb into
+             that partition, see below). Otherwise we may end up creating
+             an extra partition crossing in the chain, which is illegal.
+             It can't go after the src, because src may have a fall-through
+             to a different block.  */
+          if (crtl->bb_reorder_complete
+              && (edge_in->flags & EDGE_CROSSING))
+            {
+              after = last_bb_in_partition (edge_in->src);
+              before = NEXT_INSN (BB_END (after));
+              /* The instruction following the last bb in partition should
+                 be a barrier, since it cannot end in a fall-through.  */
+              gcc_checking_assert (BARRIER_P (before));
+              before = NEXT_INSN (before);
+            }
+          bb = create_basic_block (before, NULL, after);
+          /* Put the split bb into the src partition, to avoid creating
+             a situation where a cold bb dominates a hot bb, in the case
+             where src is cold and dest is hot. The src will dominate
+             the new bb (whereas it might not have dominated dest).  */
+          BB_COPY_PARTITION (bb, edge_in->src);
+        }
     }

   make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);

+  /* Can't allow a region crossing edge to be fallthrough.  */
+  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
+      && edge_in->dest != EXIT_BLOCK_PTR)
+    {
+      new_bb = force_nonfallthru (single_succ_edge (bb));
+      gcc_assert (!new_bb);
+    }
+
   /* For non-fallthru edges, we must adjust the predecessor's
      jump instruction to target our new block.  */
   if ((edge_in->flags & EDGE_FALLTHRU) == 0)
@@ -1815,17 +1933,13 @@  commit_one_edge_insertion (edge e)
   else
     {
       bb = split_edge (e);
-      after = BB_END (bb);

-      if (flag_reorder_blocks_and_partition
-  && targetm_common.have_named_sections
-  && e->src != ENTRY_BLOCK_PTR
-  && BB_PARTITION (e->src) == BB_COLD_PARTITION
-  && !(e->flags & EDGE_CROSSING)
-  && JUMP_P (after)
-  && !any_condjump_p (after)
-  && (single_succ_edge (bb)->flags & EDGE_CROSSING))
- add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
+      /* If E crossed a partition boundary, we needed to make bb end in
+         a region-crossing jump, even though it was originally fallthru.  */
+      if (JUMP_P (BB_END (bb)))
+ before = BB_END (bb);
+      else
+        after = BB_END (bb);
     }

   /* Now that we've found the spot, do the insertion.  */
@@ -2071,7 +2185,11 @@  verify_hot_cold_block_grouping (void)
   bool switched_sections = false;
   int current_partition = BB_UNPARTITIONED;

-  if (!crtl->bb_reorder_complete)
+  /* Even after bb reordering is complete, we go into cfglayout mode
+     again (in compgoto). Ensure we don't call this before going back
+     into linearized RTL when any layout fixes would have been committed.  */
+  if (!crtl->bb_reorder_complete
+      || current_ir_type() != IR_RTL_CFGRTL)
     return err;

   FOR_EACH_BB (bb)
@@ -2116,6 +2234,7 @@  rtl_verify_edges (void)
       edge e, fallthru = NULL;
       edge_iterator ei;
       rtx note;
+      bool has_crossing_edge = false;

       if (JUMP_P (BB_END (bb))
   && (note = find_reg_note (BB_END (bb), REG_BR_PROB, NULL_RTX))
@@ -2141,6 +2260,7 @@  rtl_verify_edges (void)
   is_crossing = (BB_PARTITION (e->src) != BB_PARTITION (e->dest)
  && e->src != ENTRY_BLOCK_PTR
  && e->dest != EXIT_BLOCK_PTR);
+          has_crossing_edge |= is_crossing;
   if (e->flags & EDGE_CROSSING)
     {
       if (!is_crossing)
@@ -2160,6 +2280,13 @@  rtl_verify_edges (void)
  e->src->index);
   err = 1;
  }
+              if (JUMP_P (BB_END (bb))
+                  && !find_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX))
+ {
+  error ("No region crossing jump at section boundary in bb %i",
+ bb->index);
+  err = 1;
+ }
     }
   else if (is_crossing)
     {
@@ -2188,6 +2315,15 @@  rtl_verify_edges (void)
     n_abnormal++;
  }

+        if (!has_crossing_edge
+            && find_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX))
+          {
+            print_rtl_with_bb (stderr, get_insns (), TDF_RTL | TDF_BLOCKS | TDF_DETAILS);
+            error ("Region crossing jump across same section in bb %i",
+                   bb->index);
+            err = 1;
+          }
+
       if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
  {
   error ("missing REG_EH_REGION note at the end of bb %i", bb->index);
@@ -2395,8 +2531,6 @@  rtl_verify_flow_info_1 (void)

   err |= rtl_verify_edges ();

-  err |= verify_hot_cold_block_grouping();
-
   return err;
 }

@@ -2642,6 +2776,8 @@  rtl_verify_flow_info (void)

   err |= rtl_verify_bb_layout ();

+  err |= verify_hot_cold_block_grouping ();
+
   return err;
 }

@@ -3343,7 +3479,7 @@  fixup_reorder_chain (void)
       edge e_fall, e_taken, e;
       rtx bb_end_insn;
       rtx ret_label = NULL_RTX;
-      basic_block nb, src_bb;
+      basic_block nb;
       edge_iterator ei;

       if (EDGE_COUNT (bb->succs) == 0)
@@ -3478,7 +3614,6 @@  fixup_reorder_chain (void)
       /* We got here if we need to add a new jump insn.
  Note force_nonfallthru can delete E_FALL and thus we have to
  save E_FALL->src prior to the call to force_nonfallthru.  */
-      src_bb = e_fall->src;
       nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
       if (nb)
  {
@@ -3486,17 +3621,6 @@  fixup_reorder_chain (void)
   bb->aux = nb;
   /* Don't process this new block.  */
   bb = nb;
-
-  /* Make sure new bb is tagged for correct section (same as
-     fall-thru source, since you cannot fall-thru across
-     section boundaries).  */
-  BB_COPY_PARTITION (src_bb, single_pred (bb));
-  if (flag_reorder_blocks_and_partition
-      && targetm_common.have_named_sections
-      && JUMP_P (BB_END (bb))
-      && !any_condjump_p (BB_END (bb))
-      && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
-    add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
  }
     }

@@ -3796,10 +3920,11 @@  duplicate_insn_chain (rtx from, rtx to)
     case NOTE_INSN_FUNCTION_BEG:
       /* There is always just single entry to function.  */
     case NOTE_INSN_BASIC_BLOCK:
+              /* We should only switch text sections once.  */
+    case NOTE_INSN_SWITCH_TEXT_SECTIONS:
       break;

     case NOTE_INSN_EPILOGUE_BEG:
-    case NOTE_INSN_SWITCH_TEXT_SECTIONS:
       emit_note_copy (insn);
       break;

@@ -4611,8 +4736,7 @@  rtl_can_remove_branch_p (const_edge e)
   if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
     return false;

-  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
-      || BB_PARTITION (src) != BB_PARTITION (target))
+  if (BB_PARTITION (src) != BB_PARTITION (target))
     return false;

   if (!onlyjump_p (insn)
Index: basic-block.h
===================================================================
--- basic-block.h (revision 199014)
+++ basic-block.h (working copy)
@@ -796,6 +796,7 @@  extern basic_block force_nonfallthru_and_redirect
 extern bool contains_no_active_insn_p (const_basic_block);
 extern bool forwarder_block_p (const_basic_block);
 extern bool can_fallthru (basic_block, basic_block);
+extern void emit_barrier_after_bb (basic_block bb);

 /* In cfgbuild.c.  */
 extern void find_many_sub_basic_blocks (sbitmap);
Index: testsuite/gcc.dg/tree-prof/va-arg-pack-1.c
===================================================================
--- testsuite/gcc.dg/tree-prof/va-arg-pack-1.c (revision 0)
+++ testsuite/gcc.dg/tree-prof/va-arg-pack-1.c (revision 0)
@@ -0,0 +1,145 @@ 
+/* __builtin_va_arg_pack () builtin tests.  */
+/* { dg-require-effective-target freorder } */
+/* { dg-options "-O2 -freorder-blocks-and-partition" } */
+
+#include <stdarg.h>
+
+extern void abort (void);
+
+int v1 = 8;
+long int v2 = 3;
+void *v3 = (void *) &v2;
+struct A { char c[16]; } v4 = { "foo" };
+long double v5 = 40;
+char seen[20];
+int cnt;
+
+__attribute__ ((noinline)) int
+foo1 (int x, int y, ...)
+{
+  int i;
+  long int l;
+  void *v;
+  struct A a;
+  long double ld;
+  va_list ap;
+
+  va_start (ap, y);
+  if (x < 0 || x >= 20 || seen[x])
+    abort ();
+  seen[x] = ++cnt;
+  if (y != 6)
+    abort ();
+  i = va_arg (ap, int);
+  if (i != 5)
+    abort ();
+  switch (x)
+    {
+    case 0:
+      i = va_arg (ap, int);
+      if (i != 9 || v1 != 9)
+ abort ();
+      a = va_arg (ap, struct A);
+      if (__builtin_memcmp (a.c, v4.c, sizeof (a.c)) != 0)
+ abort ();
+      v = (void *) va_arg (ap, struct A *);
+      if (v != (void *) &v4)
+ abort ();
+      l = va_arg (ap, long int);
+      if (l != 3 || v2 != 4)
+ abort ();
+      break;
+    case 1:
+      ld = va_arg (ap, long double);
+      if (ld != 41 || v5 != ld)
+ abort ();
+      i = va_arg (ap, int);
+      if (i != 8)
+ abort ();
+      v = va_arg (ap, void *);
+      if (v != &v2)
+ abort ();
+      break;
+    case 2:
+      break;
+    default:
+      abort ();
+    }
+  va_end (ap);
+  return x;
+}
+
+__attribute__ ((noinline)) int
+foo2 (int x, int y, ...)
+{
+  long long int ll;
+  void *v;
+  struct A a, b;
+  long double ld;
+  va_list ap;
+
+  va_start (ap, y);
+  if (x < 0 || x >= 20 || seen[x])
+    abort ();
+  seen[x] = ++cnt | 64;
+  if (y != 10)
+    abort ();
+  switch (x)
+    {
+    case 11:
+      break;
+    case 12:
+      ld = va_arg (ap, long double);
+      if (ld != 41 || v5 != 40)
+        abort ();
+      a = va_arg (ap, struct A);
+      if (__builtin_memcmp (a.c, v4.c, sizeof (a.c)) != 0)
+        abort ();
+      b = va_arg (ap, struct A);
+      if (__builtin_memcmp (b.c, v4.c, sizeof (b.c)) != 0)
+        abort ();
+      v = va_arg (ap, void *);
+      if (v != &v2)
+        abort ();
+      ll = va_arg (ap, long long int);
+      if (ll != 16LL)
+        abort ();
+      break;
+    case 2:
+      break;
+    default:
+      abort ();
+    }
+  va_end (ap);
+  return x + 8;
+}
+
+__attribute__ ((noinline)) int
+foo3 (void)
+{
+  return 6;
+}
+
+extern inline __attribute__ ((always_inline, gnu_inline)) int
+bar (int x, ...)
+{
+  if (x < 10)
+    return foo1 (x, foo3 (), 5, __builtin_va_arg_pack ());
+  return foo2 (x, foo3 () + 4, __builtin_va_arg_pack ());
+}
+
+int
+main (void)
+{
+  if (bar (0, ++v1, v4, &v4, v2++) != 0)
+    abort ();
+  if (bar (1, ++v5, 8, v3) != 1)
+    abort ();
+  if (bar (2) != 2)
+    abort ();
+  if (bar (v1 + 2) != 19)
+    abort ();
+  if (bar (v1 + 3, v5--, v4, v4, v3, 16LL) != 20)
+    abort ();
+  return 0;
+}
Index: testsuite/gcc.dg/tree-prof/comp-goto-1.c
===================================================================
--- testsuite/gcc.dg/tree-prof/comp-goto-1.c (revision 0)
+++ testsuite/gcc.dg/tree-prof/comp-goto-1.c (revision 0)
@@ -0,0 +1,166 @@ 
+/* { dg-require-effective-target freorder } */
+/* { dg-options "-O2 -freorder-blocks-and-partition" } */
+#include <stdlib.h>
+
+#if !defined(NO_LABEL_VALUES) && (!defined(STACK_SIZE) || STACK_SIZE >= 4000) && __INT_MAX__ >= 2147483647
+typedef unsigned int uint32;
+typedef signed int sint32;
+
+typedef uint32 reg_t;
+
+typedef unsigned long int host_addr_t;
+typedef uint32 target_addr_t;
+typedef sint32 target_saddr_t;
+
+typedef union
+{
+  struct
+    {
+      unsigned int offset:18;
+      unsigned int ignore:4;
+      unsigned int s1:8;
+      int :2;
+      signed int simm:14;
+      unsigned int s3:8;
+      unsigned int s2:8;
+      int pad2:2;
+    } f1;
+  long long ll;
+  double d;
+} insn_t;
+
+typedef struct
+{
+  target_addr_t vaddr_tag;
+  unsigned long int rigged_paddr;
+} tlb_entry_t;
+
+typedef struct
+{
+  insn_t *pc;
+  reg_t registers[256];
+  insn_t *program;
+  tlb_entry_t tlb_tab[0x100];
+} environment_t;
+
+enum operations
+{
+  LOAD32_RR,
+  METAOP_DONE
+};
+
+host_addr_t
+f ()
+{
+  abort ();
+}
+
+reg_t
+simulator_kernel (int what, environment_t *env)
+{
+  register insn_t *pc = env->pc;
+  register reg_t *regs = env->registers;
+  register insn_t insn;
+  register int s1;
+  register reg_t r2;
+  register void *base_addr = &&sim_base_addr;
+  register tlb_entry_t *tlb = env->tlb_tab;
+
+  if (what != 0)
+    {
+      int i;
+      static void *op_map[] =
+        {
+          &&L_LOAD32_RR,
+          &&L_METAOP_DONE,
+        };
+      insn_t *program = env->program;
+      for (i = 0; i < what; i++)
+        program[i].f1.offset = op_map[program[i].f1.offset] - base_addr;
+    }
+
+ sim_base_addr:;
+
+  insn = *pc++;
+  r2 = (*(reg_t *) (((char *) regs) + (insn.f1.s2 << 2)));
+  s1 = (insn.f1.s1 << 2);
+  goto *(base_addr + insn.f1.offset);
+
+ L_LOAD32_RR:
+  {
+    target_addr_t vaddr_page = r2 / 4096;
+    unsigned int x = vaddr_page % 0x100;
+    insn = *pc++;
+
+    for (;;)
+      {
+        target_addr_t tag = tlb[x].vaddr_tag;
+        host_addr_t rigged_paddr = tlb[x].rigged_paddr;
+
+        if (tag == vaddr_page)
+          {
+            *(reg_t *) (((char *) regs) + s1) = *(uint32 *) (rigged_paddr + r2);
+            r2 = *(reg_t *) (((char *) regs) + (insn.f1.s2 << 2));
+            s1 = insn.f1.s1 << 2;
+            goto *(base_addr + insn.f1.offset);
+          }
+
+        if (((target_saddr_t) tag < 0))
+          {
+            *(reg_t *) (((char *) regs) + s1) = *(uint32 *) f ();
+            r2 = *(reg_t *) (((char *) regs) + (insn.f1.s2 << 2));
+            s1 = insn.f1.s1 << 2;
+            goto *(base_addr + insn.f1.offset);
+          }
+
+        x = (x - 1) % 0x100;
+      }
+
+    L_METAOP_DONE:
+      return (*(reg_t *) (((char *) regs) + s1));
+  }
+}
+
+insn_t program[2 + 1];
+
+void *malloc ();
+
+int
+main ()
+{
+  environment_t env;
+  insn_t insn;
+  int i, res;
+  host_addr_t a_page = (host_addr_t) malloc (2 * 4096);
+  target_addr_t a_vaddr = 0x123450;
+  target_addr_t vaddr_page = a_vaddr / 4096;
+  a_page = (a_page + 4096 - 1) & -4096;
+
+  env.tlb_tab[((vaddr_page) % 0x100)].vaddr_tag = vaddr_page;
+  env.tlb_tab[((vaddr_page) % 0x100)].rigged_paddr = a_page - vaddr_page * 4096;
+  insn.f1.offset = LOAD32_RR;
+  env.registers[0] = 0;
+  env.registers[2] = a_vaddr;
+  *(sint32 *) (a_page + a_vaddr % 4096) = 88;
+  insn.f1.s1 = 0;
+  insn.f1.s2 = 2;
+
+  for (i = 0; i < 2; i++)
+    program[i] = insn;
+
+  insn.f1.offset = METAOP_DONE;
+  insn.f1.s1 = 0;
+  program[2] = insn;
+
+  env.pc = program;
+  env.program = program;
+
+  res = simulator_kernel (2 + 1, &env);
+
+  if (res != 88)
+    abort ();
+  exit (0);
+}
+#else
+main(){ exit (0); }
+#endif
Index: testsuite/gcc.dg/tree-prof/pr52027.c
===================================================================
--- testsuite/gcc.dg/tree-prof/pr52027.c (revision 199014)
+++ testsuite/gcc.dg/tree-prof/pr52027.c (working copy)
@@ -1,6 +1,6 @@ 
 /* PR debug/52027 */
 /* { dg-require-effective-target freorder } */
-/* { dg-options "-O -freorder-blocks-and-partition -fno-reorder-functions" } */
+/* { dg-options "-O2 -freorder-blocks-and-partition -fno-reorder-functions" } */

 void
 foo (int len)
Index: testsuite/gcc.dg/tree-prof/pr50907.c
===================================================================
--- testsuite/gcc.dg/tree-prof/pr50907.c (revision 199014)
+++ testsuite/gcc.dg/tree-prof/pr50907.c (working copy)
@@ -1,5 +1,5 @@ 
 /* PR middle-end/50907 */
 /* { dg-require-effective-target freorder } */
-/* { dg-options "-O -freorder-blocks-and-partition -fschedule-insns -fselective-scheduling -fpic" { target { { powerpc*-*-* ia64-*-* x86_64-*-* } && fpic } } } */
+/* { dg-options "-O2 -freorder-blocks-and-partition -fschedule-insns -fselective-scheduling -fpic" { target { { powerpc*-*-* ia64-*-* x86_64-*-* } && fpic } } } */

 #include "pr45354.c"
Index: testsuite/gcc.dg/tree-prof/pr45354.c
===================================================================
--- testsuite/gcc.dg/tree-prof/pr45354.c (revision 199014)
+++ testsuite/gcc.dg/tree-prof/pr45354.c (working copy)
@@ -1,5 +1,5 @@ 
 /* { dg-require-effective-target freorder } */
-/* { dg-options "-O -freorder-blocks-and-partition -fschedule-insns -fselective-scheduling" { target powerpc*-*-* ia64-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -freorder-blocks-and-partition -fschedule-insns -fselective-scheduling" { target powerpc*-*-* ia64-*-* x86_64-*-* } } */

 extern void abort (void);

Index: testsuite/gcc.dg/tree-prof/20041218-1.c
===================================================================
--- testsuite/gcc.dg/tree-prof/20041218-1.c (revision 0)
+++ testsuite/gcc.dg/tree-prof/20041218-1.c (revision 0)
@@ -0,0 +1,119 @@ 
+/* PR rtl-optimization/16968 */
+/* Testcase by Jakub Jelinek  <jakub@redhat.com> */
+/* { dg-require-effective-target freorder } */
+/* { dg-options "-O2 -freorder-blocks-and-partition" } */
+
+struct T
+{
+  unsigned int b, c, *d;
+  unsigned char e;
+};
+struct S
+{
+  unsigned int a;
+  struct T f;
+};
+struct U
+{
+  struct S g, h;
+};
+struct V
+{
+  unsigned int i;
+  struct U j;
+};
+
+extern void exit (int);
+extern void abort (void);
+
+void *
+dummy1 (void *x)
+{
+  return "";
+}
+
+void *
+dummy2 (void *x, void *y)
+{
+  exit (0);
+}
+
+struct V *
+baz (unsigned int x)
+{
+  static struct V v;
+  __builtin_memset (&v, 0x55, sizeof (v));
+  return &v;
+}
+
+int
+check (void *x, struct S *y)
+{
+  if (y->a || y->f.b || y->f.c || y->f.d || y->f.e)
+    abort ();
+  return 1;
+}
+
+static struct V *
+bar (unsigned int x, void *y)
+{
+  const struct T t = { 0, 0, (void *) 0, 0 };
+  struct V *u;
+  void *v;
+  v = dummy1 (y);
+  if (!v)
+    return (void *) 0;
+
+  u = baz (sizeof (struct V));
+  u->i = x;
+  u->j.g.a = 0;
+  u->j.g.f = t;
+  u->j.h.a = 0;
+  u->j.h.f = t;
+
+  if (!check (v, &u->j.g) || !check (v, &u->j.h))
+    return (void *) 0;
+  return u;
+}
+
+int
+foo (unsigned int *x, unsigned int y, void **z)
+{
+  void *v;
+  unsigned int i, j;
+
+  *z = v = (void *) 0;
+
+  for (i = 0; i < y; i++)
+    {
+      struct V *c;
+
+      j = *x;
+
+      switch (j)
+        {
+        case 1:
+          c = bar (j, x);
+          break;
+        default:
+          c = 0;
+          break;
+        }
+      if (c)
+        v = dummy2 (v, c);
+      else
+        return 1;
+    }
+
+  *z = v;
+  return 0;
+}
+
+int
+main (void)
+{
+  unsigned int one = 1;
+  void *p;
+  foo (&one, 1, &p);
+  abort ();
+}
Index: testsuite/g++.dg/tree-prof/partition2.C
===================================================================
--- testsuite/g++.dg/tree-prof/partition2.C (revision 199014)
+++ testsuite/g++.dg/tree-prof/partition2.C (working copy)
@@ -1,6 +1,6 @@ 
 // PR middle-end/45458
 // { dg-require-effective-target freorder }
-// { dg-options "-fnon-call-exceptions -freorder-blocks-and-partition" }
+// { dg-options "-O2 -fnon-call-exceptions -freorder-blocks-and-partition" }

 int
 main ()
Index: testsuite/g++.dg/tree-prof/partition3.C
===================================================================
--- testsuite/g++.dg/tree-prof/partition3.C (revision 199014)
+++ testsuite/g++.dg/tree-prof/partition3.C (working copy)
@@ -1,6 +1,6 @@ 
 // PR middle-end/45566
 // { dg-require-effective-target freorder }
-// { dg-options "-O -fnon-call-exceptions -freorder-blocks-and-partition" }
+// { dg-options "-O2 -fnon-call-exceptions -freorder-blocks-and-partition" }

 int k;