Message ID | 20121115201013.D8E7E61423@tjsboxrox.mtv.corp.google.com |
---|---|
State | New |
Ping.

Teresa

On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote:
> Revised patch that fixes failures encountered when enabling
> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>
> This includes new verification code, contributed by Steven Bosscher, to
> ensure that no cold blocks dominate hot blocks.
>
> I attempted to make the handling of partition updates through the optimization
> passes much more consistent, removing a number of partial fixes in the code
> stream in the process. The code to fix up partitions (including the BB_PARTITION
> assignment, region crossing jump notes, and switch text section notes) now
> lives in a few centralized locations, for example inside
> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
> don't need to attempt the fixup themselves.
>
> For optimization passes that make adjustments to the cfg while in cfg layout
> mode that are not easy to fix up incrementally, the new routine
> fixup_partitions handles the cleanup globally. This does require calculation
> of the dominance relation. However, as far as I can tell, the routines which
> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
> are typically invoked once per optimization pass (or a small number of times
> in the case of try_optimize_cfg). Additionally, I compared the
> -ftime-report output for some large FDO compilations and saw only minimal
> increases in the dominance computation times, which were only a tiny percentage
> of the overall compile time.
>
> Additionally, I added a flag to the rtl_data structure to indicate whether
> any partitioning was actually performed, so that optimizations which were
> conservatively disabled whenever flag_reorder_blocks_and_partition
> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
> conservative for functions where no partitions were formed (e.g. they are
> completely hot).
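The invariant described above (no cold block may dominate a hot block) and the direction of the global cleanup (blocks dominated by a cold block become cold) can be sketched on a toy model. This is only an illustration under stated assumptions, not code from the patch: blocks are plain array indices, the dominator tree is assumed to be already available as an immediate-dominator array, and the names `dominated_by_cold`, `count_violations` and `sink_dominated_blocks` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not GCC's data structures: blocks are indices 0..n-1,
   block 0 is the entry, and idom[b] is the immediate dominator of b
   (with idom[0] == 0).  */
enum partition { HOT, COLD };

/* Return true if some block that strictly dominates B is cold.  */
static bool
dominated_by_cold (const size_t *idom, const enum partition *part,
                   size_t n, size_t b)
{
  size_t steps = 0;

  while (b != 0)
    {
      /* A well-formed dominator tree reaches the entry in < n steps.  */
      assert (steps++ < n);
      b = idom[b];
      if (part[b] == COLD)
        return true;
    }
  return false;
}

/* Verification: count hot blocks that are dominated by a cold block.  */
static size_t
count_violations (const size_t *idom, const enum partition *part, size_t n)
{
  size_t bad = 0;

  for (size_t b = 0; b < n; b++)
    if (part[b] == HOT && dominated_by_cold (idom, part, n, b))
      bad++;
  return bad;
}

/* Global cleanup: re-mark as cold every hot block dominated by a cold
   block.  Returns the number of blocks that changed partition.  */
static size_t
sink_dominated_blocks (const size_t *idom, enum partition *part, size_t n)
{
  size_t changed = 0;

  for (size_t b = 0; b < n; b++)
    if (part[b] == HOT && dominated_by_cold (idom, part, n, b))
      {
        part[b] = COLD;
        changed++;
      }
  return changed;
}
```

For example, with `idom = {0, 0, 1, 1, 0}` and only block 1 cold, blocks 2 and 3 violate the invariant; after `sink_dominated_blocks` they are cold and `count_violations` returns 0. The real routine additionally has to update crossing-edge flags and jump notes for each re-marked block, which this sketch leaves out.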
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
> benchmarks and internal google benchmarks using profile feedback and
> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
>
> Thanks,
> Teresa
>
> 2012-11-14  Teresa Johnson  <tejohnson@google.com>
>             Steven Bosscher  <steven@gcc.gnu.org>
>
>         * cfghooks.h (cfg_layout_finalize): New parameter.
>         * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
>         parameter.
>         * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
>         as this is now done by redirect_edge_and_branch_force.
>         * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
>         barriers, new cfg_layout_finalize parameter, and don't store exit
>         predecessor BB until after it is potentially split.
>         * function.h (struct rtl_data): New flag has_bb_partition.
>         * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
>         * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
>         any blocks in function actually partitioned.
>         (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
>         up partitioning.
>         * bb-reorder.c (connect_traces): Only look for partitions and skip
>         block copying if any blocks in function actually partitioned.
>         (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
>         (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
>         that no cold blocks dominate a hot block.
>         (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
>         as this is now done by force_nonfallthru_and_redirect.
>         (add_reg_crossing_jump_notes): Handle the fact that some jumps may
>         already be marked with region crossing note.
>         (reorder_basic_blocks): Only need to verify partitions if any
>         blocks in function actually partitioned.
>         (insert_section_boundary_note): Only need to insert note if any
>         blocks in function actually partitioned.
>         (rest_of_handle_reorder_blocks): New cfg_layout_finalize
>         parameter, and remove call to insert_section_boundary_note as this
>         is now called via cfg_layout_finalize/fixup_reorder_chain.
>         (duplicate_computed_gotos): New cfg_layout_finalize
>         parameter.
>         (partition_hot_cold_basic_blocks): Set flag indicating function
>         has bb partitions.
>         * bb-reorder.h: Declare insert_section_boundary_note and
>         emit_barrier_after_bb, which are no longer static.
>         * basic-block.h: Declare new function fixup_partitions.
>         * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
>         check for region crossing note.
>         (fixup_partition_crossing): New function.
>         (fixup_bb_partition): Ditto.
>         (rtl_redirect_edge_and_branch): Fixup partition boundaries.
>         (force_nonfallthru_and_redirect): Fixup partition boundaries,
>         remove old code that tried to do this. Emit barrier correctly
>         when we are in cfglayout mode.
>         (rtl_split_edge): Correctly fixup partition boundaries.
>         (commit_one_edge_insertion): Remove old code that tried to
>         fixup region crossing edge since this is now handled in
>         split_block, and set up insertion point correctly since
>         block may now end in a jump.
>         (commit_edge_insertions): Invoke fixup_partitions to sanitize partition
>         boundaries after optimizations that modify cfg and before trying to
>         verify the flow info.
>         (fixup_partitions): New function.
>         (rtl_verify_flow_info_1): Add verification that no cold bbs dominate
>         hot bbs.
>         (record_effective_endpoints): Remove region-crossing notes and set flag
>         indicating that they need to be reinserted on exit from cfglayout mode.
>         (outof_cfg_layout_mode): New cfg_layout_finalize parameter.
>         (fixup_reorder_chain): Call insert_section_boundary_note if necessary.
>         Remove old code that attempted to fixup region crossing note as
>         this is now handled in force_nonfallthru_and_redirect.
>         (duplicate_insn_chain): Don't duplicate switch section notes.
> (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain. > (rtl_can_remove_branch_p): Remove unnecessary check for region crossing > note. > > Index: cfghooks.h > =================================================================== > --- cfghooks.h (revision 193376) > +++ cfghooks.h (working copy) > @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas > void account_profile_record (struct profile_record *, int); > > extern void cfg_layout_initialize (unsigned int); > -extern void cfg_layout_finalize (void); > +extern void cfg_layout_finalize (bool); > > /* Hooks containers. */ > extern struct cfg_hooks gimple_cfg_hooks; > @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi > extern void gimple_register_cfg_hooks (void); > extern struct cfg_hooks get_cfg_hooks (void); > extern void set_cfg_hooks (struct cfg_hooks); > - > Index: modulo-sched.c > =================================================================== > --- modulo-sched.c (revision 193376) > +++ modulo-sched.c (working copy) > @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void) > if (bb->next_bb != EXIT_BLOCK_PTR) > bb->aux = bb->next_bb; > free_dominance_info (CDI_DOMINATORS); > - cfg_layout_finalize (); > + cfg_layout_finalize (false); > #endif /* INSN_SCHEDULING */ > return 0; > } > Index: ifcvt.c > =================================================================== > --- ifcvt.c (revision 193376) > +++ ifcvt.c (working copy) > @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg > if (new_bb) > { > df_bb_replace (then_bb_index, new_bb); > - /* Since the fallthru edge was redirected from test_bb to new_bb, > - we need to ensure that new_bb is in the same partition as > - test bb (you can not fall through across section boundaries). */ > - BB_COPY_PARTITION (new_bb, test_bb); > + /* This should have been done above via force_nonfallthru_and_redirect > + (possibly called from redirect_edge_and_branch_force). 
*/ > + gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb)); > } > > num_true_changes++; > Index: function.c > =================================================================== > --- function.c (revision 193376) > +++ function.c (working copy) > @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void) > break; > if (e) > { > - copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)), > - NULL_RTX, e->src); > + /* Make sure we insert after any barriers. */ > + rtx end = get_last_bb_insn (e->src); > + copy_bb = create_basic_block (NEXT_INSN (end), > + NULL_RTX, e->src); > BB_COPY_PARTITION (copy_bb, e->src); > } > else > @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void) > if (cur_bb->index >= NUM_FIXED_BLOCKS > && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS) > cur_bb->aux = cur_bb->next_bb; > - cfg_layout_finalize (); > + cfg_layout_finalize (false); > } > > epilogue_done: > @@ -6517,7 +6519,7 @@ epilogue_done: > basic_block simple_return_block_cold = NULL; > edge pending_edge_hot = NULL; > edge pending_edge_cold = NULL; > - basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb; > + basic_block exit_pred; > int i; > > gcc_assert (entry_edge != orig_entry_edge); > @@ -6545,6 +6547,12 @@ epilogue_done: > else > pending_edge_cold = e; > } > + > + /* Save a pointer to the exit's predecessor BB for use in > + inserting new BBs at the end of the function. Do this > + after the call to split_block above which may split > + the original exit pred. */ > + exit_pred = EXIT_BLOCK_PTR->prev_bb; > > FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e) > { > Index: function.h > =================================================================== > --- function.h (revision 193376) > +++ function.h (working copy) > @@ -459,6 +459,11 @@ struct GTY(()) rtl_data { > sched2) and is useful only if the port defines LEAF_REGISTERS. 
*/ > bool uses_only_leaf_regs; > > + /* Nonzero if the function being compiled has undergone hot/cold partitioning > + (under flag_reorder_blocks_and_partition) and has at least one cold > + block. */ > + bool has_bb_partition; > + > /* Like regs_ever_live, but 1 if a reg is set or clobbered from an > asm. Unlike regs_ever_live, elements of this array corresponding > to eliminable regs (like the frame pointer) are set if an asm > Index: hw-doloop.c > =================================================================== > --- hw-doloop.c (revision 193376) > +++ hw-doloop.c (working copy) > @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops) > else > bb->aux = NULL; > } > - cfg_layout_finalize (); > + cfg_layout_finalize (false); > clear_aux_for_blocks (); > df_analyze (); > } > Index: cfgcleanup.c > =================================================================== > --- cfgcleanup.c (revision 193376) > +++ cfgcleanup.c (working copy) > @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2, > partition boundaries). See the comments at the top of > bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ > > - if (flag_reorder_blocks_and_partition && reload_completed) > + if (crtl->has_bb_partition && reload_completed) > return false; > > /* Search backward through forwarder blocks. We don't need to worry > @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode) > df_analyze (); > } > > + if (changed) > + { > + /* Edge forwarding in particular can cause hot blocks previously > + reached by both hot and cold blocks to become dominated only > + by cold blocks. This will cause the verification below to fail, > + and lead to now cold code in the hot section. This is not easy > + to detect and fix during edge forwarding, and in some cases > + is only visible after newly unreachable blocks are deleted, > + which will be done in fixup_partitions. 
*/ > + fixup_partitions (); > + > #ifdef ENABLE_CHECKING > - if (changed) > - verify_flow_info (); > + verify_flow_info (); > #endif > + } > > changed_overall |= changed; > first_pass = false; > Index: bb-reorder.c > =================================================================== > --- bb-reorder.c (revision 193376) > +++ bb-reorder.c (working copy) > @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces > current_partition = BB_PARTITION (traces[0].first); > two_passes = false; > > - if (flag_reorder_blocks_and_partition) > + if (crtl->has_bb_partition) > for (i = 0; i < n_traces && !two_passes; i++) > if (BB_PARTITION (traces[0].first) > != BB_PARTITION (traces[i].first)) > @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces > } > } > > - if (flag_reorder_blocks_and_partition) > + if (crtl->has_bb_partition) > try_copy = false; > > /* Copy tiny blocks always; copy larger blocks only when the > @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void) > return length; > } > > -/* Emit a barrier into the footer of BB. */ > +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode. */ > > -static void > +void > emit_barrier_after_bb (basic_block bb) > { > rtx barrier = emit_barrier_after (BB_END (bb)); > - BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); > + if (current_ir_type () == IR_RTL_CFGLAYOUT) > + BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); > } > > /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions. > @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg > { > VEC(edge, heap) *crossing_edges = NULL; > basic_block bb; > - edge e; > - edge_iterator ei; > + edge e, e2; > + edge_iterator ei, ei2; > + unsigned int cold_bb_count = 0; > + VEC (basic_block, heap) *bbs_in_hot_partition = NULL; > + VEC (basic_block, heap) *bbs_newly_hot = NULL; > > /* Mark which partition (hot/cold) each basic block belongs in. 
*/ > FOR_EACH_BB (bb) > { > if (probably_never_executed_bb_p (cfun, bb)) > - BB_SET_PARTITION (bb, BB_COLD_PARTITION); > + { > + BB_SET_PARTITION (bb, BB_COLD_PARTITION); > + cold_bb_count++; > + } > else > - BB_SET_PARTITION (bb, BB_HOT_PARTITION); > + { > + BB_SET_PARTITION (bb, BB_HOT_PARTITION); > + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb); > + } > } > > + /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of > + several different possibilities. One is that there are edge weight insanities > + due to optimization phases that do not properly update basic block profile > + counts. The second is that the entry of the function may not be hot, because > + it is entered fewer times than the number of profile training runs, but there > + is a loop inside the function that causes blocks within the function to be > + above the threshold for hotness. */ > + if (cold_bb_count) > + { > + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); > + > + if (dom_calculated_here) > + calculate_dominance_info (CDI_DOMINATORS); > + > + /* Keep examining hot bbs until we have either checked them all, or > + re-marked all cold bbs hot. */ > + while (! VEC_empty (basic_block, bbs_in_hot_partition) > + && cold_bb_count) > + { > + basic_block dom_bb; > + > + bb = VEC_pop (basic_block, bbs_in_hot_partition); > + dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb); > + > + /* If bb's immediate dominator is also hot then it is ok. */ > + if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION) > + continue; > + > + /* We have a hot bb with an immediate dominator that is cold. > + The dominator needs to be re-marked to hot. */ > + BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION); > + cold_bb_count--; > + > + /* Now we need to examine newly-hot dom_bb to see if it is also > + dominated by a cold bb. 
*/ > + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb); > + > + /* We should also adjust any cold blocks that the newly-hot bb > + feeds and see if it makes sense to re-mark those as hot as > + well. */ > + VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb); > + while (! VEC_empty (basic_block, bbs_newly_hot)) > + { > + basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot); > + /* Examine all successors of this newly-hot bb to see if they > + are cold and should be re-marked as hot. */ > + FOR_EACH_EDGE (e, ei, new_hot_bb->succs) > + { > + bool any_cold_preds = false; > + basic_block succ = e->dest; > + if (BB_PARTITION (succ) != BB_COLD_PARTITION) > + continue; > + /* Does this block have any cold predecessors now? */ > + FOR_EACH_EDGE (e2, ei2, succ->preds) > + { > + if (BB_PARTITION (e2->src) == BB_COLD_PARTITION) > + { > + any_cold_preds = true; > + break; > + } > + } > + if (any_cold_preds) > + continue; > + > + /* Here we have a successor of newly-hot bb that is cold > + but no longer has any cold precessessors. Since the original > + assignment of our newly-hot bb was incorrect, this successor's > + assignment as cold is also suspect. Go ahead and re-mark it > + as hot now too. Better heuristics may be in order here. */ > + BB_SET_PARTITION (succ, BB_HOT_PARTITION); > + cold_bb_count--; > + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ); > + /* Examine this successor as a newly-hot bb. */ > + VEC_safe_push (basic_block, heap, bbs_newly_hot, succ); > + } > + } > + } > + > + if (dom_calculated_here) > + free_dominance_info (CDI_DOMINATORS); > + } > + > /* The format of .gcc_except_table does not allow landing pads to > be in a different partition as the throw. Fix this by either > moving or duplicating the landing pads. 
*/ > @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void) > new_bb->aux = cur_bb->aux; > cur_bb->aux = new_bb; > > - /* Make sure new fall-through bb is in same > - partition as bb it's falling through from. */ > + /* This is done by force_nonfallthru_and_redirect. */ > + gcc_assert (BB_PARTITION (new_bb) > + == BB_PARTITION (cur_bb)); > > - BB_COPY_PARTITION (new_bb, cur_bb); > single_succ_edge (new_bb)->flags |= EDGE_CROSSING; > } > else > @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void) > FOR_EACH_BB (bb) > FOR_EACH_EDGE (e, ei, bb->succs) > if ((e->flags & EDGE_CROSSING) > - && JUMP_P (BB_END (e->src))) > + && JUMP_P (BB_END (e->src)) > + /* Some notes were added during fix_up_fall_thru_edges, via > + force_nonfallthru_and_redirect. */ > + && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX)) > add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); > } > > @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void) > dump_flow_info (dump_file, dump_flags); > } > > - if (flag_reorder_blocks_and_partition) > + if (crtl->has_bb_partition) > verify_hot_cold_block_grouping (); > } > > @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void) > encountering this note will make the compiler switch between the > hot and cold text sections. */ > > -static void > +void > insert_section_boundary_note (void) > { > basic_block bb; > rtx new_note; > int first_partition = 0; > > - if (!flag_reorder_blocks_and_partition) > + if (!crtl->has_bb_partition) > return; > > FOR_EACH_BB (bb) > @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void) > FOR_EACH_BB (bb) > if (bb->next_bb != EXIT_BLOCK_PTR) > bb->aux = bb->next_bb; > - cfg_layout_finalize (); > + cfg_layout_finalize (true); > > - /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. 
*/ > - insert_section_boundary_note (); > return 0; > } > > @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void) > } > > done: > - cfg_layout_finalize (); > + cfg_layout_finalize (false); > > BITMAP_FREE (candidates); > return 0; > @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void) > if (crossing_edges == NULL) > return 0; > > + crtl->has_bb_partition = true; > + > /* Make sure the source of any crossing edge ends in a jump and the > destination of any crossing edge has a label. */ > add_labels_and_missing_jumps (crossing_edges); > Index: bb-reorder.h > =================================================================== > --- bb-reorder.h (revision 193376) > +++ bb-reorder.h (working copy) > @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re > > extern int get_uncond_jump_length (void); > > +extern void insert_section_boundary_note (void); > + > +extern void emit_barrier_after_bb (basic_block bb); > + > #endif > Index: basic-block.h > =================================================================== > --- basic-block.h (revision 193376) > +++ basic-block.h (working copy) > @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect > extern bool contains_no_active_insn_p (const_basic_block); > extern bool forwarder_block_p (const_basic_block); > extern bool can_fallthru (basic_block, basic_block); > +extern void fixup_partitions (void); > > /* In cfgbuild.c. */ > extern void find_many_sub_basic_blocks (sbitmap); > Index: cfgrtl.c > =================================================================== > --- cfgrtl.c (revision 193376) > +++ cfgrtl.c (working copy) > @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. If not see > #include "tree.h" > #include "hard-reg-set.h" > #include "basic-block.h" > +#include "bb-reorder.h" > #include "regs.h" > #include "flags.h" > #include "function.h" > @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3. If not see > Only applicable if the CFG is in cfglayout mode. 
*/ > static GTY(()) rtx cfg_layout_function_footer; > static GTY(()) rtx cfg_layout_function_header; > +static bool had_sec_boundary_notes; > > static rtx skip_insns_after_block (basic_block); > static void record_effective_endpoints (void); > static rtx label_for_bb (basic_block); > -static void fixup_reorder_chain (void); > +static void fixup_reorder_chain (bool finalize_reorder_blocks); > > void verify_insn_chain (void); > static void fixup_fallthru_exit_predecessor (void); > @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc > partition boundaries). See the comments at the top of > bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ > > - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) > - || BB_PARTITION (src) != BB_PARTITION (target)) > + if (BB_PARTITION (src) != BB_PARTITION (target)) > return NULL; > > /* We can replace or remove a complex jump only when we have exactly > @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target) > return e; > } > > +/* Called when edge E has been redirected to a new destination, > + in order to update the region crossing flag on the edge and > + jump. */ > + > +static void > +fixup_partition_crossing (edge e, basic_block target) > +{ > + rtx note; > + > + gcc_assert (e->dest == target); > + > + if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR) > + return; > + /* If we redirected an existing edge, it may already be marked > + crossing, even though the new src is missing a reg crossing note. > + But make sure reg crossing note doesn't already exist before > + inserting. 
*/ > + if (BB_PARTITION (e->src) != BB_PARTITION (target)) > + { > + e->flags |= EDGE_CROSSING; > + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); > + if (JUMP_P (BB_END (e->src)) > + && !note) > + add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); > + } > + else if (BB_PARTITION (e->src) == BB_PARTITION (target)) > + { > + e->flags &= ~EDGE_CROSSING; > + /* Remove the region crossing note from jump at end of > + e->src if it exists. */ > + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); > + if (note) > + remove_note (BB_END (e->src), note); > + } > +} > + > +/* Called when block BB has been reassigned to a different partition, > + to ensure that the region crossing attributes are updated. */ > + > +static void > +fixup_bb_partition (basic_block bb) > +{ > + edge e; > + edge_iterator ei; > + > + /* Now need to make bb's pred edges non-region crossing. */ > + FOR_EACH_EDGE (e, ei, bb->preds) > + { > + fixup_partition_crossing (e, e->dest); > + } > + > + /* Possibly need to make bb's successor edges region crossing, > + or remove stale region crossing. */ > + FOR_EACH_EDGE (e, ei, bb->succs) > + { > + if ((e->flags & EDGE_FALLTHRU) > + && BB_PARTITION (bb) != BB_PARTITION (e->dest) > + && e->dest != EXIT_BLOCK_PTR) > + /* force_nonfallthru_and_redirect calls fixup_partition_crossing. */ > + force_nonfallthru (e); > + else > + fixup_partition_crossing (e, e->dest); > + } > +} > + > /* Attempt to change code to redirect edge E to TARGET. Don't do that on > expense of adding new instructions or reordering basic blocks. 
> > @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block > { > edge ret; > basic_block src = e->src; > + basic_block dest = e->dest; > > if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) > return NULL; > > - if (e->dest == target) > + if (dest == target) > return e; > > if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL) > { > df_set_bb_dirty (src); > + fixup_partition_crossing (ret, target); > return ret; > } > > @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block > return NULL; > > df_set_bb_dirty (src); > + fixup_partition_crossing (ret, target); > return ret; > } > > @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc > /* Make sure new block ends up in correct hot/cold section. */ > > BB_COPY_PARTITION (jump_block, e->src); > - if (flag_reorder_blocks_and_partition > - && targetm_common.have_named_sections > - && JUMP_P (BB_END (jump_block)) > - && !any_condjump_p (BB_END (jump_block)) > - && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING)) > - add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX); > > /* Wire edge in. */ > new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU); > new_edge->probability = probability; > new_edge->count = count; > > + /* If e->src was previously region crossing, it no longer is > + and the reg crossing note should be removed. */ > + fixup_partition_crossing (new_edge, jump_block); > + > /* Redirect old edge. */ > redirect_edge_pred (e, jump_block); > e->probability = REG_BR_PROB_BASE; > @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc > LABEL_NUSES (label)++; > } > > - emit_barrier_after (BB_END (jump_block)); > + /* We might be in cfg layout mode, and if so, the following routine will > + insert the barrier correctly. 
*/ > + emit_barrier_after_bb (jump_block); > redirect_edge_succ_nodup (e, target); > > if (abnormal_edge_flags) > make_edge (src, target, abnormal_edge_flags); > > df_mark_solutions_dirty (); > + fixup_partition_crossing (e, target); > return new_bb; > } > > @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU > static basic_block > rtl_split_edge (edge edge_in) > { > - basic_block bb; > + basic_block bb, new_bb; > rtx before; > > /* Abnormal edges cannot be split. */ > @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in) > else > { > bb = create_basic_block (before, NULL, edge_in->dest->prev_bb); > - /* ??? Why not edge_in->dest->prev_bb here? */ > - BB_COPY_PARTITION (bb, edge_in->dest); > + if (edge_in->src == ENTRY_BLOCK_PTR) > + BB_COPY_PARTITION (bb, edge_in->dest); > + else > + /* Put the split bb into the src partition, to avoid creating > + a situation where a cold bb dominates a hot bb, in the case > + where src is cold and dest is hot. The src will dominate > + the new bb (whereas it might not have dominated dest). */ > + BB_COPY_PARTITION (bb, edge_in->src); > } > > make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU); > > + /* Can't allow a region crossing edge to be fallthrough. */ > + if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest) > + && edge_in->dest != EXIT_BLOCK_PTR) > + { > + new_bb = force_nonfallthru (single_succ_edge (bb)); > + gcc_assert (!new_bb); > + } > + > /* For non-fallthru edges, we must adjust the predecessor's > jump instruction to target our new block. 
*/ > if ((edge_in->flags & EDGE_FALLTHRU) == 0) > @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e) > else > { > bb = split_edge (e); > - after = BB_END (bb); > > - if (flag_reorder_blocks_and_partition > - && targetm_common.have_named_sections > - && e->src != ENTRY_BLOCK_PTR > - && BB_PARTITION (e->src) == BB_COLD_PARTITION > - && !(e->flags & EDGE_CROSSING) > - && JUMP_P (after) > - && !any_condjump_p (after) > - && (single_succ_edge (bb)->flags & EDGE_CROSSING)) > - add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX); > + /* If e crossed a partition boundary, we needed to make bb end in > + a region-crossing jump, even though it was originally fallthru. */ > + if (JUMP_P (BB_END (bb))) > + before = BB_END (bb); > + else > + after = BB_END (bb); > } > > /* Now that we've found the spot, do the insertion. */ > @@ -1827,6 +1907,14 @@ commit_edge_insertions (void) > { > basic_block bb; > > + /* Optimization passes that invoke this routine can cause hot blocks > + previously reached by both hot and cold blocks to become dominated only > + by cold blocks. This will cause the verification below to fail, > + and lead to now cold code in the hot section. In some cases this > + may only be visible after newly unreachable blocks are deleted, > + which will be done by fixup_partitions. */ > + fixup_partitions (); > + > #ifdef ENABLE_CHECKING > verify_flow_info (); > #endif > @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb) > > return end; > } > - > + > +/* Perform cleanup on the hot/cold bb partitioning after optimization > + passes that modify the cfg. */ > + > +void > +fixup_partitions (void) > +{ > + basic_block bb; > + > + if (!crtl->has_bb_partition) > + return; > + > + /* Delete any blocks that became unreachable and weren't > + already cleaned up, for example during edge forwarding > + and convert_jumps_to_returns. This will expose more > + opportunities for fixing the partition boundaries here. 
> + Also, the calculation of the dominance graph during verification > + will assert if there are unreachable nodes. */ > + delete_unreachable_blocks (); > + > + /* If there are partitions, do a sanity check on them: A basic block in > + a cold partition cannot dominate a basic block in a hot partition. > + Fixup any that now violate this requirement, as a result of edge > + forwarding and unreachable block deletion. */ > + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; > + VEC (basic_block, heap) *bbs_to_fix = NULL; > + FOR_EACH_BB (bb) > + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) > + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); > + if (! VEC_empty (basic_block, bbs_in_cold_partition)) > + { > + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); > + basic_block son; > + > + if (dom_calculated_here) > + calculate_dominance_info (CDI_DOMINATORS); > + > + while (! VEC_empty (basic_block, bbs_in_cold_partition)) > + { > + bb = VEC_pop (basic_block, bbs_in_cold_partition); > + /* If bb is not yet cold (because it was added below as > + a block dominated by a cold bb) then mark it cold here. */ > + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) > + { > + BB_SET_PARTITION (bb, BB_COLD_PARTITION); > + VEC_safe_push (basic_block, heap, bbs_to_fix, bb); > + } > + /* Any blocks dominated by a block in the cold section > + must also be cold. */ > + for (son = first_dom_son (CDI_DOMINATORS, bb); > + son; > + son = next_dom_son (CDI_DOMINATORS, son)) > + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); > + } > + > + if (dom_calculated_here) > + free_dominance_info (CDI_DOMINATORS); > + } > + > + /* Do the partition fixup after all necessary blocks have been converted to > + cold, so that we only update the region crossings the minimum number of > + places, which can require forcing edges to be non fallthru. */ > + while (! 
VEC_empty (basic_block, bbs_to_fix)) > + { > + bb = VEC_pop (basic_block, bbs_to_fix); > + fixup_bb_partition (bb); > + } > +} > + > /* Verify the CFG and RTL consistency common for both underlying RTL and > cfglayout RTL. > > @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void) > rtx x; > int err = 0; > basic_block bb; > + bool have_partitions = false; > > /* Check the general integrity of the basic blocks. */ > FOR_EACH_BB_REVERSE (bb) > @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void) > > if (e->flags & EDGE_ABNORMAL) > n_abnormal++; > + > + have_partitions |= is_crossing; > } > > if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX)) > @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void) > } > } > > + /* If there are partitions, do a sanity check on them: A basic block in > + a cold partition cannot dominate a basic block in a hot partition. */ > + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; > + if (have_partitions && !err) > + FOR_EACH_BB (bb) > + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) > + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); > + if (! VEC_empty (basic_block, bbs_in_cold_partition)) > + { > + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); > + basic_block son; > + > + if (dom_calculated_here) > + calculate_dominance_info (CDI_DOMINATORS); > + > + while (! VEC_empty (basic_block, bbs_in_cold_partition)) > + { > + bb = VEC_pop (basic_block, bbs_in_cold_partition); > + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) > + { > + error ("non-cold basic block %d dominated " > + "by a block in the cold partition", bb->index); > + err = 1; > + } > + for (son = first_dom_son (CDI_DOMINATORS, bb); > + son; > + son = next_dom_son (CDI_DOMINATORS, son)) > + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); > + } > + > + if (dom_calculated_here) > + free_dominance_info (CDI_DOMINATORS); > + } > + > /* Clean up. 
*/
> return err;
> }
> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void)
> else
> cfg_layout_function_header = NULL_RTX;
>
> + had_sec_boundary_notes = false;
> +
> next_insn = get_insns ();
> FOR_EACH_BB (bb)
> {
> rtx end;
>
> if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb))
> - BB_HEADER (bb) = unlink_insn_chain (next_insn,
> - PREV_INSN (BB_HEAD (bb)));
> + {
> + /* Rather than try to keep section boundary notes incrementally
> + up-to-date through cfg layout optimizations, simply remove them
> + and flag that they should be re-inserted when exiting
> + cfg layout mode. */
> + rtx check_insn = next_insn;
> + while (check_insn)
> + {
> + if (NOTE_P (check_insn)
> + && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
> + {
> + had_sec_boundary_notes |= true;
> + /* Remove note from chain. Grab new next_insn first. */
> + if (next_insn == check_insn)
> + next_insn = NEXT_INSN (check_insn);
> + /* Delete note. */
> + delete_insn (check_insn);
> + /* There will only be one. */
> + break;
> + }
> + check_insn = NEXT_INSN (check_insn);
> + }
> + /* If we still have header instructions left after above loop. */
> + if (next_insn != BB_HEAD (bb))
> + BB_HEADER (bb) = unlink_insn_chain (next_insn,
> + PREV_INSN (BB_HEAD (bb)));
> + }
> end = skip_insns_after_block (bb);
> if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end)
> BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end);
> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void)
> if (bb->next_bb != EXIT_BLOCK_PTR)
> bb->aux = bb->next_bb;
>
> - cfg_layout_finalize ();
> + cfg_layout_finalize (false);
>
> return 0;
> }
> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode)
> }
>
>
> -/* Given a reorder chain, rearrange the code to match. */
> +/* Given a reorder chain, rearrange the code to match.
If
> + this is called when we will FINALIZE_REORDER_BLOCKS, or when
> + section boundary notes were removed on entry to cfg layout
> + mode, insert section boundary notes here. */
>
> static void
> -fixup_reorder_chain (void)
> +fixup_reorder_chain (bool finalize_reorder_blocks)
> {
> basic_block bb;
> rtx insn = NULL;
> @@ -3150,7 +3373,7 @@ static void
> PREV_INSN (BB_HEADER (bb)) = insn;
> insn = BB_HEADER (bb);
> while (NEXT_INSN (insn))
> - insn = NEXT_INSN (insn);
> + insn = NEXT_INSN (insn);
> }
> if (insn)
> NEXT_INSN (insn) = BB_HEAD (bb);
> @@ -3175,6 +3398,11 @@ static void
> insn = NEXT_INSN (insn);
>
> set_last_insn (insn);
> +
> + /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. */
> + if (had_sec_boundary_notes || finalize_reorder_blocks)
> + insert_section_boundary_note ();
> +
> #ifdef ENABLE_CHECKING
> verify_insn_chain ();
> #endif
> @@ -3187,7 +3415,7 @@ static void
> edge e_fall, e_taken, e;
> rtx bb_end_insn;
> rtx ret_label = NULL_RTX;
> - basic_block nb, src_bb;
> + basic_block nb;
> edge_iterator ei;
>
> if (EDGE_COUNT (bb->succs) == 0)
> @@ -3322,7 +3550,6 @@ static void
> /* We got here if we need to add a new jump insn.
> Note force_nonfallthru can delete E_FALL and thus we have to
> save E_FALL->src prior to the call to force_nonfallthru. */
> - src_bb = e_fall->src;
> nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
> if (nb)
> {
> @@ -3330,17 +3557,6 @@ static void
> bb->aux = nb;
> /* Don't process this new block. */
> bb = nb;
> -
> - /* Make sure new bb is tagged for correct section (same as
> - fall-thru source, since you cannot fall-thru across
> - section boundaries).
*/
> - BB_COPY_PARTITION (src_bb, single_pred (bb));
> - if (flag_reorder_blocks_and_partition
> - && targetm_common.have_named_sections
> - && JUMP_P (BB_END (bb))
> - && !any_condjump_p (BB_END (bb))
> - && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
> - add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
> }
> }
>
> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to)
> case NOTE_INSN_FUNCTION_BEG:
> /* There is always just single entry to function. */
> case NOTE_INSN_BASIC_BLOCK:
> + /* We should only switch text sections once. */
> + case NOTE_INSN_SWITCH_TEXT_SECTIONS:
> break;
>
> case NOTE_INSN_EPILOGUE_BEG:
> - case NOTE_INSN_SWITCH_TEXT_SECTIONS:
> emit_note_copy (insn);
> break;
>
> @@ -3759,10 +3976,13 @@ break_superblocks (void)
> }
>
> /* Finalize the changes: reorder insn list according to the sequence specified
> - by aux pointers, enter compensation code, rebuild scope forest. */
> + by aux pointers, enter compensation code, rebuild scope forest. If
> + this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
> + to fixup_reorder_chain so that it can insert the proper switch text
> + section notes. */
>
> void
> -cfg_layout_finalize (void)
> +cfg_layout_finalize (bool finalize_reorder_blocks)
> {
> #ifdef ENABLE_CHECKING
> verify_flow_info ();
> @@ -3775,7 +3995,7 @@ void
> #endif
> )
> fixup_fallthru_exit_predecessor ();
> - fixup_reorder_chain ();
> + fixup_reorder_chain (finalize_reorder_blocks);
>
> rebuild_jump_labels (get_insns ());
> delete_dead_jumptables ();
> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e)
> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
> return false;
>
> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
> - || BB_PARTITION (src) != BB_PARTITION (target))
> + if (BB_PARTITION (src) != BB_PARTITION (target))
> return false;
>
> if (!onlyjump_p (insn)
>
> --
> This patch is available for review at http://codereview.appspot.com/6823047
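
For readers following the thread, the invariant the patch verifies (no cold block may dominate a hot block) and the fixup it applies (any block dominated by a cold block becomes cold) can be sketched on a toy model. This is only an illustration under stated assumptions: `toy_block`, `count_hot_dominated_by_cold` and `sink_dominated_blocks_to_cold` are hypothetical names, not GCC data structures or functions, and the sketch walks each block's immediate-dominator chain upward rather than using the patch's worklist over `first_dom_son`/`next_dom_son`. It also leaves out the region-crossing updates that `fixup_bb_partition` performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for BB_HOT_PARTITION / BB_COLD_PARTITION.  */
enum toy_partition { TOY_HOT, TOY_COLD };

/* Hypothetical stand-in for a basic block; not GCC's basic_block.  */
struct toy_block
{
  enum toy_partition part;
  int idom;  /* Index of the immediate dominator, -1 for the entry block.  */
};

/* Return true if some strict dominator of block I is cold.  Assumes the
   idom links form a valid dominator tree (every chain ends at -1).  */
static bool
has_cold_dominator (const struct toy_block *blocks, int i)
{
  for (int d = blocks[i].idom; d != -1; d = blocks[d].idom)
    if (blocks[d].part == TOY_COLD)
      return true;
  return false;
}

/* Verification: count hot blocks that are dominated by a cold block.  */
static int
count_hot_dominated_by_cold (const struct toy_block *blocks, int n)
{
  assert (blocks != NULL || n == 0);
  int violations = 0;
  for (int i = 0; i < n; i++)
    if (blocks[i].part == TOY_HOT && has_cold_dominator (blocks, i))
      violations++;
  return violations;
}

/* Fixup: move every hot block dominated by a cold block into the cold
   partition.  Returns the number of blocks that changed partition.  */
static int
sink_dominated_blocks_to_cold (struct toy_block *blocks, int n)
{
  assert (blocks != NULL || n == 0);
  int changed = 0;
  for (int i = 0; i < n; i++)
    if (blocks[i].part == TOY_HOT && has_cold_dominator (blocks, i))
      {
        blocks[i].part = TOY_COLD;
        changed++;
      }
  return changed;
}
```

In this model the result does not depend on visiting order, because a block that is newly marked cold always has an originally cold ancestor that its own descendants share.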
Hi,

I have tested your patch on SPEC2000 on ARM, and I can still see several failures caused by "error: fallthru edge crosses section boundary", including the case described in PR55121.

On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote:
> Ping.
> Teresa
>
> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote:
>> Revised patch that fixes failures encountered when enabling
>> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>>
>> This includes new verification code to ensure no cold blocks dominate hot
>> blocks contributed by Steven Bosscher.
>>
>> I attempted to make the handling of partition updates through the optimization
>> passes much more consistent, removing a number of partial fixes in the code
>> stream in the process. The code to fixup partitions (including the BB_PARTITION
>> assignment, region crossing jump notes, and switch text section notes) is
>> now handled in a few centralized locations. For example, inside
>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
>> don't need to attempt the fixup themselves.
>>
>> For optimization passes that make adjustments to the cfg while in cfg layout
>> mode that are not easy to fix up incrementally, the new routine
>> fixup_partitions handles the cleanup globally. This does require calculation
>> of the dominance relation, however, as far as I can tell the routines which
>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
>> are invoked typically once (or a small number of times in the case of
>> try_optimize_cfg) per optimization pass. Additionally, I compared the
>> -ftime-report output for some large fdo compilations and saw only minimal
>> increases in the dominance computation times, which were only a tiny percent
>> of the overall compile time.
>>
>> Additionally, I added a flag to the rtl_data structure to indicate whether
>> any partitioning was actually performed, so that optimizations which were
>> conservatively disabled whenever the flag_reorder_blocks_and_partition
>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
>> conservative for functions where no partitions were formed (e.g. they are
>> completely hot).
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
>> benchmarks and internal google benchmarks using profile feedback and
>> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
>>
>> Thanks,
>> Teresa
>>
>> 2012-11-14 Teresa Johnson <tejohnson@google.com>
>> Steven Bosscher <steven@gcc.gnu.org>
>>
>> * cfghooks.h (cfg_layout_finalize): New parameter.
>> * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
>> parameter.
>> * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
>> as this is now done by redirect_edge_and_branch_force.
>> * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
>> barriers, new cfg_layout_finalize parameter, and don't store exit
>> predecessor BB until after it is potentially split.
>> * function.h (struct rtl_data): New flag has_bb_partition.
>> * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
>> * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
>> any blocks in function actually partitioned.
>> (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
>> up partitioning.
>> * bb-reorder.c (connect_traces): Only look for partitions and skip
>> block copying if any blocks in function actually partitioned.
>> (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
>> (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
>> that no cold blocks dominate a hot block.
>> (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
>> as this is now done by force_nonfallthru_and_redirect.
>> (add_reg_crossing_jump_notes): Handle the fact that some jumps may
>> already be marked with region crossing note.
>> (reorder_basic_blocks): Only need to verify partitions if any
>> blocks in function actually partitioned.
>> (insert_section_boundary_note): Only need to insert note if any
>> blocks in function actually partitioned.
>> (rest_of_handle_reorder_blocks): New cfg_layout_finalize
>> parameter, and remove call to insert_section_boundary_note as this
>> is now called via cfg_layout_finalize/fixup_reorder_chain.
>> (duplicate_computed_gotos): New cfg_layout_finalize
>> parameter.
>> (partition_hot_cold_basic_blocks): Set flag indicating function
>> has bb partitions.
>> * bb-reorder.h: Declare insert_section_boundary_note and
>> emit_barrier_after_bb, which are no longer static.
>> * basic-block.h: Declare new function fixup_partitions.
>> * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
>> check for region crossing note.
>> (fixup_partition_crossing): New function.
>> (fixup_bb_partition): Ditto.
>> (rtl_redirect_edge_and_branch): Fixup partition boundaries.
>> (force_nonfallthru_and_redirect): Fixup partition boundaries,
>> remove old code that tried to do this. Emit barrier correctly
>> when we are in cfglayout mode.
>> (rtl_split_edge): Correctly fixup partition boundaries.
>> (commit_one_edge_insertion): Remove old code that tried to
>> fixup region crossing edge since this is now handled in
>> split_block, and set up insertion point correctly since
>> block may now end in a jump.
>> (commit_edge_insertions): Invoke fixup_partitions to sanitize partition
>> boundaries after optimizations that modify cfg and before trying to
>> verify the flow info.
>> (fixup_partitions): New function.
>> (rtl_verify_flow_info_1): Add verification that no cold bbs dominate
>> hot bbs.
>> (record_effective_endpoints): Remove region-crossing notes and set flag
>> indicating that they need to be reinserted on exit from cfglayout mode.
>> (outof_cfg_layout_mode): New cfg_layout_finalize parameter.
>> (fixup_reorder_chain): Call insert_section_boundary_note if necessary.
>> Remove old code that attempted to fixup region crossing note as
>> this is now handled in force_nonfallthru_and_redirect.
>> (duplicate_insn_chain): Don't duplicate switch section notes.
>> (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
>> (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
>> note.
>>
>> Index: cfghooks.h
>> ===================================================================
>> --- cfghooks.h (revision 193376)
>> +++ cfghooks.h (working copy)
>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
>> void account_profile_record (struct profile_record *, int);
>>
>> extern void cfg_layout_initialize (unsigned int);
>> -extern void cfg_layout_finalize (void);
>> +extern void cfg_layout_finalize (bool);
>>
>> /* Hooks containers. */
>> extern struct cfg_hooks gimple_cfg_hooks;
>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi
>> extern void gimple_register_cfg_hooks (void);
>> extern struct cfg_hooks get_cfg_hooks (void);
>> extern void set_cfg_hooks (struct cfg_hooks);
>> -
>> Index: modulo-sched.c
>> ===================================================================
>> --- modulo-sched.c (revision 193376)
>> +++ modulo-sched.c (working copy)
>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void)
>> if (bb->next_bb != EXIT_BLOCK_PTR)
>> bb->aux = bb->next_bb;
>> free_dominance_info (CDI_DOMINATORS);
>> - cfg_layout_finalize ();
>> + cfg_layout_finalize (false);
>> #endif /* INSN_SCHEDULING */
>> return 0;
>> }
>> Index: ifcvt.c
>> ===================================================================
>> --- ifcvt.c (revision 193376)
>> +++ ifcvt.c (working copy)
>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
>> if (new_bb)
>> {
>> df_bb_replace (then_bb_index, new_bb);
>> - /* Since the fallthru edge was
redirected from test_bb to new_bb,
>> - we need to ensure that new_bb is in the same partition as
>> - test bb (you can not fall through across section boundaries). */
>> - BB_COPY_PARTITION (new_bb, test_bb);
>> + /* This should have been done above via force_nonfallthru_and_redirect
>> + (possibly called from redirect_edge_and_branch_force). */
>> + gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
>> }
>>
>> num_true_changes++;
>> Index: function.c
>> ===================================================================
>> --- function.c (revision 193376)
>> +++ function.c (working copy)
>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void)
>> break;
>> if (e)
>> {
>> - copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
>> - NULL_RTX, e->src);
>> + /* Make sure we insert after any barriers. */
>> + rtx end = get_last_bb_insn (e->src);
>> + copy_bb = create_basic_block (NEXT_INSN (end),
>> + NULL_RTX, e->src);
>> BB_COPY_PARTITION (copy_bb, e->src);
>> }
>> else
>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void)
>> if (cur_bb->index >= NUM_FIXED_BLOCKS
>> && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS)
>> cur_bb->aux = cur_bb->next_bb;
>> - cfg_layout_finalize ();
>> + cfg_layout_finalize (false);
>> }
>>
>> epilogue_done:
>> @@ -6517,7 +6519,7 @@ epilogue_done:
>> basic_block simple_return_block_cold = NULL;
>> edge pending_edge_hot = NULL;
>> edge pending_edge_cold = NULL;
>> - basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
>> + basic_block exit_pred;
>> int i;
>>
>> gcc_assert (entry_edge != orig_entry_edge);
>> @@ -6545,6 +6547,12 @@ epilogue_done:
>> else
>> pending_edge_cold = e;
>> }
>> +
>> + /* Save a pointer to the exit's predecessor BB for use in
>> + inserting new BBs at the end of the function. Do this
>> + after the call to split_block above which may split
>> + the original exit pred.
*/
>> + exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>
>> FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
>> {
>> Index: function.h
>> ===================================================================
>> --- function.h (revision 193376)
>> +++ function.h (working copy)
>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data {
>> sched2) and is useful only if the port defines LEAF_REGISTERS. */
>> bool uses_only_leaf_regs;
>>
>> + /* Nonzero if the function being compiled has undergone hot/cold partitioning
>> + (under flag_reorder_blocks_and_partition) and has at least one cold
>> + block. */
>> + bool has_bb_partition;
>> +
>> /* Like regs_ever_live, but 1 if a reg is set or clobbered from an
>> asm. Unlike regs_ever_live, elements of this array corresponding
>> to eliminable regs (like the frame pointer) are set if an asm
>> Index: hw-doloop.c
>> ===================================================================
>> --- hw-doloop.c (revision 193376)
>> +++ hw-doloop.c (working copy)
>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops)
>> else
>> bb->aux = NULL;
>> }
>> - cfg_layout_finalize ();
>> + cfg_layout_finalize (false);
>> clear_aux_for_blocks ();
>> df_analyze ();
>> }
>> Index: cfgcleanup.c
>> ===================================================================
>> --- cfgcleanup.c (revision 193376)
>> +++ cfgcleanup.c (working copy)
>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
>> partition boundaries). See the comments at the top of
>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */
>>
>> - if (flag_reorder_blocks_and_partition && reload_completed)
>> + if (crtl->has_bb_partition && reload_completed)
>> return false;
>>
>> /* Search backward through forwarder blocks.
We don't need to worry
>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode)
>> df_analyze ();
>> }
>>
>> + if (changed)
>> + {
>> + /* Edge forwarding in particular can cause hot blocks previously
>> + reached by both hot and cold blocks to become dominated only
>> + by cold blocks. This will cause the verification below to fail,
>> + and lead to now cold code in the hot section. This is not easy
>> + to detect and fix during edge forwarding, and in some cases
>> + is only visible after newly unreachable blocks are deleted,
>> + which will be done in fixup_partitions. */
>> + fixup_partitions ();
>> +
>> #ifdef ENABLE_CHECKING
>> - if (changed)
>> - verify_flow_info ();
>> + verify_flow_info ();
>> #endif
>> + }
>>
>> changed_overall |= changed;
>> first_pass = false;
>> Index: bb-reorder.c
>> ===================================================================
>> --- bb-reorder.c (revision 193376)
>> +++ bb-reorder.c (working copy)
>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces
>> current_partition = BB_PARTITION (traces[0].first);
>> two_passes = false;
>>
>> - if (flag_reorder_blocks_and_partition)
>> + if (crtl->has_bb_partition)
>> for (i = 0; i < n_traces && !two_passes; i++)
>> if (BB_PARTITION (traces[0].first)
>> != BB_PARTITION (traces[i].first))
>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces
>> }
>> }
>>
>> - if (flag_reorder_blocks_and_partition)
>> + if (crtl->has_bb_partition)
>> try_copy = false;
>>
>> /* Copy tiny blocks always; copy larger blocks only when the
>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void)
>> return length;
>> }
>>
>> -/* Emit a barrier into the footer of BB. */
>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.
*/
>>
>> -static void
>> +void
>> emit_barrier_after_bb (basic_block bb)
>> {
>> rtx barrier = emit_barrier_after (BB_END (bb));
>> - BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>> + if (current_ir_type () == IR_RTL_CFGLAYOUT)
>> + BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>> }
>>
>> /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg
>> {
>> VEC(edge, heap) *crossing_edges = NULL;
>> basic_block bb;
>> - edge e;
>> - edge_iterator ei;
>> + edge e, e2;
>> + edge_iterator ei, ei2;
>> + unsigned int cold_bb_count = 0;
>> + VEC (basic_block, heap) *bbs_in_hot_partition = NULL;
>> + VEC (basic_block, heap) *bbs_newly_hot = NULL;
>>
>> /* Mark which partition (hot/cold) each basic block belongs in. */
>> FOR_EACH_BB (bb)
>> {
>> if (probably_never_executed_bb_p (cfun, bb))
>> - BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>> + {
>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>> + cold_bb_count++;
>> + }
>> else
>> - BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>> + {
>> + BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb);
>> + }
>> }
>>
>> + /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of
>> + several different possibilities. One is that there are edge weight insanities
>> + due to optimization phases that do not properly update basic block profile
>> + counts. The second is that the entry of the function may not be hot, because
>> + it is entered fewer times than the number of profile training runs, but there
>> + is a loop inside the function that causes blocks within the function to be
>> + above the threshold for hotness.
*/
>> + if (cold_bb_count)
>> + {
>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>> +
>> + if (dom_calculated_here)
>> + calculate_dominance_info (CDI_DOMINATORS);
>> +
>> + /* Keep examining hot bbs until we have either checked them all, or
>> + re-marked all cold bbs hot. */
>> + while (! VEC_empty (basic_block, bbs_in_hot_partition)
>> + && cold_bb_count)
>> + {
>> + basic_block dom_bb;
>> +
>> + bb = VEC_pop (basic_block, bbs_in_hot_partition);
>> + dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
>> +
>> + /* If bb's immediate dominator is also hot then it is ok. */
>> + if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
>> + continue;
>> +
>> + /* We have a hot bb with an immediate dominator that is cold.
>> + The dominator needs to be re-marked to hot. */
>> + BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION);
>> + cold_bb_count--;
>> +
>> + /* Now we need to examine newly-hot dom_bb to see if it is also
>> + dominated by a cold bb. */
>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb);
>> +
>> + /* We should also adjust any cold blocks that the newly-hot bb
>> + feeds and see if it makes sense to re-mark those as hot as
>> + well. */
>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb);
>> + while (! VEC_empty (basic_block, bbs_newly_hot))
>> + {
>> + basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot);
>> + /* Examine all successors of this newly-hot bb to see if they
>> + are cold and should be re-marked as hot. */
>> + FOR_EACH_EDGE (e, ei, new_hot_bb->succs)
>> + {
>> + bool any_cold_preds = false;
>> + basic_block succ = e->dest;
>> + if (BB_PARTITION (succ) != BB_COLD_PARTITION)
>> + continue;
>> + /* Does this block have any cold predecessors now?
*/
>> + FOR_EACH_EDGE (e2, ei2, succ->preds)
>> + {
>> + if (BB_PARTITION (e2->src) == BB_COLD_PARTITION)
>> + {
>> + any_cold_preds = true;
>> + break;
>> + }
>> + }
>> + if (any_cold_preds)
>> + continue;
>> +
>> + /* Here we have a successor of newly-hot bb that is cold
>> + but no longer has any cold predecessors. Since the original
>> + assignment of our newly-hot bb was incorrect, this successor's
>> + assignment as cold is also suspect. Go ahead and re-mark it
>> + as hot now too. Better heuristics may be in order here. */
>> + BB_SET_PARTITION (succ, BB_HOT_PARTITION);
>> + cold_bb_count--;
>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ);
>> + /* Examine this successor as a newly-hot bb. */
>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, succ);
>> + }
>> + }
>> + }
>> +
>> + if (dom_calculated_here)
>> + free_dominance_info (CDI_DOMINATORS);
>> + }
>> +
>> /* The format of .gcc_except_table does not allow landing pads to
>> be in a different partition as the throw. Fix this by either
>> moving or duplicating the landing pads. */
>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void)
>> new_bb->aux = cur_bb->aux;
>> cur_bb->aux = new_bb;
>>
>> - /* Make sure new fall-through bb is in same
>> - partition as bb it's falling through from. */
>> + /* This is done by force_nonfallthru_and_redirect. */
>> + gcc_assert (BB_PARTITION (new_bb)
>> + == BB_PARTITION (cur_bb));
>>
>> - BB_COPY_PARTITION (new_bb, cur_bb);
>> single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
>> }
>> else
>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void)
>> FOR_EACH_BB (bb)
>> FOR_EACH_EDGE (e, ei, bb->succs)
>> if ((e->flags & EDGE_CROSSING)
>> - && JUMP_P (BB_END (e->src)))
>> + && JUMP_P (BB_END (e->src))
>> + /* Some notes were added during fix_up_fall_thru_edges, via
>> + force_nonfallthru_and_redirect.
*/
>> + && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
>> add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>> }
>>
>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void)
>> dump_flow_info (dump_file, dump_flags);
>> }
>>
>> - if (flag_reorder_blocks_and_partition)
>> + if (crtl->has_bb_partition)
>> verify_hot_cold_block_grouping ();
>> }
>>
>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void)
>> encountering this note will make the compiler switch between the
>> hot and cold text sections. */
>>
>> -static void
>> +void
>> insert_section_boundary_note (void)
>> {
>> basic_block bb;
>> rtx new_note;
>> int first_partition = 0;
>>
>> - if (!flag_reorder_blocks_and_partition)
>> + if (!crtl->has_bb_partition)
>> return;
>>
>> FOR_EACH_BB (bb)
>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void)
>> FOR_EACH_BB (bb)
>> if (bb->next_bb != EXIT_BLOCK_PTR)
>> bb->aux = bb->next_bb;
>> - cfg_layout_finalize ();
>> + cfg_layout_finalize (true);
>>
>> - /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. */
>> - insert_section_boundary_note ();
>> return 0;
>> }
>>
>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void)
>> }
>>
>> done:
>> - cfg_layout_finalize ();
>> + cfg_layout_finalize (false);
>>
>> BITMAP_FREE (candidates);
>> return 0;
>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void)
>> if (crossing_edges == NULL)
>> return 0;
>>
>> + crtl->has_bb_partition = true;
>> +
>> /* Make sure the source of any crossing edge ends in a jump and the
>> destination of any crossing edge has a label.
*/
>> add_labels_and_missing_jumps (crossing_edges);
>> Index: bb-reorder.h
>> ===================================================================
>> --- bb-reorder.h (revision 193376)
>> +++ bb-reorder.h (working copy)
>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re
>>
>> extern int get_uncond_jump_length (void);
>>
>> +extern void insert_section_boundary_note (void);
>> +
>> +extern void emit_barrier_after_bb (basic_block bb);
>> +
>> #endif
>> Index: basic-block.h
>> ===================================================================
>> --- basic-block.h (revision 193376)
>> +++ basic-block.h (working copy)
>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect
>> extern bool contains_no_active_insn_p (const_basic_block);
>> extern bool forwarder_block_p (const_basic_block);
>> extern bool can_fallthru (basic_block, basic_block);
>> +extern void fixup_partitions (void);
>>
>> /* In cfgbuild.c. */
>> extern void find_many_sub_basic_blocks (sbitmap);
>> Index: cfgrtl.c
>> ===================================================================
>> --- cfgrtl.c (revision 193376)
>> +++ cfgrtl.c (working copy)
>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. If not see
>> #include "tree.h"
>> #include "hard-reg-set.h"
>> #include "basic-block.h"
>> +#include "bb-reorder.h"
>> #include "regs.h"
>> #include "flags.h"
>> #include "function.h"
>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3. If not see
>> Only applicable if the CFG is in cfglayout mode.
*/
>> static GTY(()) rtx cfg_layout_function_footer;
>> static GTY(()) rtx cfg_layout_function_header;
>> +static bool had_sec_boundary_notes;
>>
>> static rtx skip_insns_after_block (basic_block);
>> static void record_effective_endpoints (void);
>> static rtx label_for_bb (basic_block);
>> -static void fixup_reorder_chain (void);
>> +static void fixup_reorder_chain (bool finalize_reorder_blocks);
>>
>> void verify_insn_chain (void);
>> static void fixup_fallthru_exit_predecessor (void);
>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
>> partition boundaries). See the comments at the top of
>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */
>>
>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>> - || BB_PARTITION (src) != BB_PARTITION (target))
>> + if (BB_PARTITION (src) != BB_PARTITION (target))
>> return NULL;
>>
>> /* We can replace or remove a complex jump only when we have exactly
>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target)
>> return e;
>> }
>>
>> +/* Called when edge E has been redirected to a new destination,
>> + in order to update the region crossing flag on the edge and
>> + jump. */
>> +
>> +static void
>> +fixup_partition_crossing (edge e, basic_block target)
>> +{
>> + rtx note;
>> +
>> + gcc_assert (e->dest == target);
>> +
>> + if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR)
>> + return;
>> + /* If we redirected an existing edge, it may already be marked
>> + crossing, even though the new src is missing a reg crossing note.
>> + But make sure reg crossing note doesn't already exist before
>> + inserting.
*/
>> + if (BB_PARTITION (e->src) != BB_PARTITION (target))
>> + {
>> + e->flags |= EDGE_CROSSING;
>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>> + if (JUMP_P (BB_END (e->src))
>> + && !note)
>> + add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>> + }
>> + else if (BB_PARTITION (e->src) == BB_PARTITION (target))
>> + {
>> + e->flags &= ~EDGE_CROSSING;
>> + /* Remove the region crossing note from jump at end of
>> + e->src if it exists. */
>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>> + if (note)
>> + remove_note (BB_END (e->src), note);
>> + }
>> +}
>> +
>> +/* Called when block BB has been reassigned to a different partition,
>> + to ensure that the region crossing attributes are updated. */
>> +
>> +static void
>> +fixup_bb_partition (basic_block bb)
>> +{
>> + edge e;
>> + edge_iterator ei;
>> +
>> + /* Now need to make bb's pred edges non-region crossing. */
>> + FOR_EACH_EDGE (e, ei, bb->preds)
>> + {
>> + fixup_partition_crossing (e, e->dest);
>> + }
>> +
>> + /* Possibly need to make bb's successor edges region crossing,
>> + or remove stale region crossing. */
>> + FOR_EACH_EDGE (e, ei, bb->succs)
>> + {
>> + if ((e->flags & EDGE_FALLTHRU)
>> + && BB_PARTITION (bb) != BB_PARTITION (e->dest)
>> + && e->dest != EXIT_BLOCK_PTR)
>> + /* force_nonfallthru_and_redirect calls fixup_partition_crossing. */
>> + force_nonfallthru (e);
>> + else
>> + fixup_partition_crossing (e, e->dest);
>> + }
>> +}
>> +
>> /* Attempt to change code to redirect edge E to TARGET. Don't do that on
>> expense of adding new instructions or reordering basic blocks.
>>
>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>> {
>> edge ret;
>> basic_block src = e->src;
>> + basic_block dest = e->dest;
>>
>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>> return NULL;
>>
>> - if (e->dest == target)
>> + if (dest == target)
>> return e;
>>
>> if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
>> {
>> df_set_bb_dirty (src);
>> + fixup_partition_crossing (ret, target);
>> return ret;
>> }
>>
>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>> return NULL;
>>
>> df_set_bb_dirty (src);
>> + fixup_partition_crossing (ret, target);
>> return ret;
>> }
>>
>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>> /* Make sure new block ends up in correct hot/cold section. */
>>
>> BB_COPY_PARTITION (jump_block, e->src);
>> - if (flag_reorder_blocks_and_partition
>> - && targetm_common.have_named_sections
>> - && JUMP_P (BB_END (jump_block))
>> - && !any_condjump_p (BB_END (jump_block))
>> - && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
>> - add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
>>
>> /* Wire edge in. */
>> new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
>> new_edge->probability = probability;
>> new_edge->count = count;
>>
>> + /* If e->src was previously region crossing, it no longer is
>> + and the reg crossing note should be removed. */
>> + fixup_partition_crossing (new_edge, jump_block);
>> +
>> /* Redirect old edge. */
>> redirect_edge_pred (e, jump_block);
>> e->probability = REG_BR_PROB_BASE;
>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>> LABEL_NUSES (label)++;
>> }
>>
>> - emit_barrier_after (BB_END (jump_block));
>> + /* We might be in cfg layout mode, and if so, the following routine will
>> + insert the barrier correctly.
*/ >> + emit_barrier_after_bb (jump_block); >> redirect_edge_succ_nodup (e, target); >> >> if (abnormal_edge_flags) >> make_edge (src, target, abnormal_edge_flags); >> >> df_mark_solutions_dirty (); >> + fixup_partition_crossing (e, target); >> return new_bb; >> } >> >> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU >> static basic_block >> rtl_split_edge (edge edge_in) >> { >> - basic_block bb; >> + basic_block bb, new_bb; >> rtx before; >> >> /* Abnormal edges cannot be split. */ >> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in) >> else >> { >> bb = create_basic_block (before, NULL, edge_in->dest->prev_bb); >> - /* ??? Why not edge_in->dest->prev_bb here? */ >> - BB_COPY_PARTITION (bb, edge_in->dest); >> + if (edge_in->src == ENTRY_BLOCK_PTR) >> + BB_COPY_PARTITION (bb, edge_in->dest); >> + else >> + /* Put the split bb into the src partition, to avoid creating >> + a situation where a cold bb dominates a hot bb, in the case >> + where src is cold and dest is hot. The src will dominate >> + the new bb (whereas it might not have dominated dest). */ >> + BB_COPY_PARTITION (bb, edge_in->src); >> } >> >> make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU); >> >> + /* Can't allow a region crossing edge to be fallthrough. */ >> + if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest) >> + && edge_in->dest != EXIT_BLOCK_PTR) >> + { >> + new_bb = force_nonfallthru (single_succ_edge (bb)); >> + gcc_assert (!new_bb); >> + } >> + >> /* For non-fallthru edges, we must adjust the predecessor's >> jump instruction to target our new block. 
*/ >> if ((edge_in->flags & EDGE_FALLTHRU) == 0) >> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e) >> else >> { >> bb = split_edge (e); >> - after = BB_END (bb); >> >> - if (flag_reorder_blocks_and_partition >> - && targetm_common.have_named_sections >> - && e->src != ENTRY_BLOCK_PTR >> - && BB_PARTITION (e->src) == BB_COLD_PARTITION >> - && !(e->flags & EDGE_CROSSING) >> - && JUMP_P (after) >> - && !any_condjump_p (after) >> - && (single_succ_edge (bb)->flags & EDGE_CROSSING)) >> - add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX); >> + /* If e crossed a partition boundary, we needed to make bb end in >> + a region-crossing jump, even though it was originally fallthru. */ >> + if (JUMP_P (BB_END (bb))) >> + before = BB_END (bb); >> + else >> + after = BB_END (bb); >> } >> >> /* Now that we've found the spot, do the insertion. */ >> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void) >> { >> basic_block bb; >> >> + /* Optimization passes that invoke this routine can cause hot blocks >> + previously reached by both hot and cold blocks to become dominated only >> + by cold blocks. This will cause the verification below to fail, >> + and lead to now cold code in the hot section. In some cases this >> + may only be visible after newly unreachable blocks are deleted, >> + which will be done by fixup_partitions. */ >> + fixup_partitions (); >> + >> #ifdef ENABLE_CHECKING >> verify_flow_info (); >> #endif >> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb) >> >> return end; >> } >> - >> + >> +/* Perform cleanup on the hot/cold bb partitioning after optimization >> + passes that modify the cfg. */ >> + >> +void >> +fixup_partitions (void) >> +{ >> + basic_block bb; >> + >> + if (!crtl->has_bb_partition) >> + return; >> + >> + /* Delete any blocks that became unreachable and weren't >> + already cleaned up, for example during edge forwarding >> + and convert_jumps_to_returns. 
This will expose more >> + opportunities for fixing the partition boundaries here. >> + Also, the calculation of the dominance graph during verification >> + will assert if there are unreachable nodes. */ >> + delete_unreachable_blocks (); >> + >> + /* If there are partitions, do a sanity check on them: A basic block in >> + a cold partition cannot dominate a basic block in a hot partition. >> + Fixup any that now violate this requirement, as a result of edge >> + forwarding and unreachable block deletion. */ >> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >> + VEC (basic_block, heap) *bbs_to_fix = NULL; >> + FOR_EACH_BB (bb) >> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) >> + { >> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >> + basic_block son; >> + >> + if (dom_calculated_here) >> + calculate_dominance_info (CDI_DOMINATORS); >> + >> + while (! VEC_empty (basic_block, bbs_in_cold_partition)) >> + { >> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >> + /* If bb is not yet cold (because it was added below as >> + a block dominated by a cold bb) then mark it cold here. */ >> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >> + { >> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); >> + VEC_safe_push (basic_block, heap, bbs_to_fix, bb); >> + } >> + /* Any blocks dominated by a block in the cold section >> + must also be cold. 
*/ >> + for (son = first_dom_son (CDI_DOMINATORS, bb); >> + son; >> + son = next_dom_son (CDI_DOMINATORS, son)) >> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >> + } >> + >> + if (dom_calculated_here) >> + free_dominance_info (CDI_DOMINATORS); >> + } >> + >> + /* Do the partition fixup after all necessary blocks have been converted to >> + cold, so that we only update the region crossings the minimum number of >> + places, which can require forcing edges to be non fallthru. */ >> + while (! VEC_empty (basic_block, bbs_to_fix)) >> + { >> + bb = VEC_pop (basic_block, bbs_to_fix); >> + fixup_bb_partition (bb); >> + } >> +} >> + >> /* Verify the CFG and RTL consistency common for both underlying RTL and >> cfglayout RTL. >> >> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void) >> rtx x; >> int err = 0; >> basic_block bb; >> + bool have_partitions = false; >> >> /* Check the general integrity of the basic blocks. */ >> FOR_EACH_BB_REVERSE (bb) >> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void) >> >> if (e->flags & EDGE_ABNORMAL) >> n_abnormal++; >> + >> + have_partitions |= is_crossing; >> } >> >> if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX)) >> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void) >> } >> } >> >> + /* If there are partitions, do a sanity check on them: A basic block in >> + a cold partition cannot dominate a basic block in a hot partition. */ >> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >> + if (have_partitions && !err) >> + FOR_EACH_BB (bb) >> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) >> + { >> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >> + basic_block son; >> + >> + if (dom_calculated_here) >> + calculate_dominance_info (CDI_DOMINATORS); >> + >> + while (! 
VEC_empty (basic_block, bbs_in_cold_partition)) >> + { >> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >> + { >> + error ("non-cold basic block %d dominated " >> + "by a block in the cold partition", bb->index); >> + err = 1; >> + } >> + for (son = first_dom_son (CDI_DOMINATORS, bb); >> + son; >> + son = next_dom_son (CDI_DOMINATORS, son)) >> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >> + } >> + >> + if (dom_calculated_here) >> + free_dominance_info (CDI_DOMINATORS); >> + } >> + >> /* Clean up. */ >> return err; >> } >> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void) >> else >> cfg_layout_function_header = NULL_RTX; >> >> + had_sec_boundary_notes = false; >> + >> next_insn = get_insns (); >> FOR_EACH_BB (bb) >> { >> rtx end; >> >> if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb)) >> - BB_HEADER (bb) = unlink_insn_chain (next_insn, >> - PREV_INSN (BB_HEAD (bb))); >> + { >> + /* Rather than try to keep section boundary notes incrementally >> + up-to-date through cfg layout optimizations, simply remove them >> + and flag that they should be re-inserted when exiting >> + cfg layout mode. */ >> + rtx check_insn = next_insn; >> + while (check_insn) >> + { >> + if (NOTE_P (check_insn) >> + && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS) >> + { >> + had_sec_boundary_notes |= true; >> + /* Remove note from chain. Grab new next_insn first. */ >> + if (next_insn == check_insn) >> + next_insn = NEXT_INSN (check_insn); >> + /* Delete note. */ >> + delete_insn (check_insn); >> + /* There will only be one. */ >> + break; >> + } >> + check_insn = NEXT_INSN (check_insn); >> + } >> + /* If we still have header instructions left after above loop. 
*/ >> + if (next_insn != BB_HEAD (bb)) >> + BB_HEADER (bb) = unlink_insn_chain (next_insn, >> + PREV_INSN (BB_HEAD (bb))); >> + } >> end = skip_insns_after_block (bb); >> if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end) >> BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end); >> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void) >> if (bb->next_bb != EXIT_BLOCK_PTR) >> bb->aux = bb->next_bb; >> >> - cfg_layout_finalize (); >> + cfg_layout_finalize (false); >> >> return 0; >> } >> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode) >> } >> >> >> -/* Given a reorder chain, rearrange the code to match. */ >> +/* Given a reorder chain, rearrange the code to match. If >> + this is called when we will FINALIZE_REORDER_BLOCKS, or when >> + section boundary notes were removed on entry to cfg layout >> + mode, insert section boundary notes here. */ >> >> static void >> -fixup_reorder_chain (void) >> +fixup_reorder_chain (bool finalize_reorder_blocks) >> { >> basic_block bb; >> rtx insn = NULL; >> @@ -3150,7 +3373,7 @@ static void >> PREV_INSN (BB_HEADER (bb)) = insn; >> insn = BB_HEADER (bb); >> while (NEXT_INSN (insn)) >> - insn = NEXT_INSN (insn); >> + insn = NEXT_INSN (insn); >> } >> if (insn) >> NEXT_INSN (insn) = BB_HEAD (bb); >> @@ -3175,6 +3398,11 @@ static void >> insn = NEXT_INSN (insn); >> >> set_last_insn (insn); >> + >> + /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. */ >> + if (had_sec_boundary_notes || finalize_reorder_blocks) >> + insert_section_boundary_note (); >> + >> #ifdef ENABLE_CHECKING >> verify_insn_chain (); >> #endif >> @@ -3187,7 +3415,7 @@ static void >> edge e_fall, e_taken, e; >> rtx bb_end_insn; >> rtx ret_label = NULL_RTX; >> - basic_block nb, src_bb; >> + basic_block nb; >> edge_iterator ei; >> >> if (EDGE_COUNT (bb->succs) == 0) >> @@ -3322,7 +3550,6 @@ static void >> /* We got here if we need to add a new jump insn. 
>> Note force_nonfallthru can delete E_FALL and thus we have to >> save E_FALL->src prior to the call to force_nonfallthru. */ >> - src_bb = e_fall->src; >> nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label); >> if (nb) >> { >> @@ -3330,17 +3557,6 @@ static void >> bb->aux = nb; >> /* Don't process this new block. */ >> bb = nb; >> - >> - /* Make sure new bb is tagged for correct section (same as >> - fall-thru source, since you cannot fall-thru across >> - section boundaries). */ >> - BB_COPY_PARTITION (src_bb, single_pred (bb)); >> - if (flag_reorder_blocks_and_partition >> - && targetm_common.have_named_sections >> - && JUMP_P (BB_END (bb)) >> - && !any_condjump_p (BB_END (bb)) >> - && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING)) >> - add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX); >> } >> } >> >> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to) >> case NOTE_INSN_FUNCTION_BEG: >> /* There is always just single entry to function. */ >> case NOTE_INSN_BASIC_BLOCK: >> + /* We should only switch text sections once. */ >> + case NOTE_INSN_SWITCH_TEXT_SECTIONS: >> break; >> >> case NOTE_INSN_EPILOGUE_BEG: >> - case NOTE_INSN_SWITCH_TEXT_SECTIONS: >> emit_note_copy (insn); >> break; >> >> @@ -3759,10 +3976,13 @@ break_superblocks (void) >> } >> >> /* Finalize the changes: reorder insn list according to the sequence specified >> - by aux pointers, enter compensation code, rebuild scope forest. */ >> + by aux pointers, enter compensation code, rebuild scope forest. If >> + this is called when we will FINALIZE_REORDER_BLOCKS, indicate that >> + to fixup_reorder_chain so that it can insert the proper switch text >> + section notes. 
*/ >> >> void >> -cfg_layout_finalize (void) >> +cfg_layout_finalize (bool finalize_reorder_blocks) >> { >> #ifdef ENABLE_CHECKING >> verify_flow_info (); >> @@ -3775,7 +3995,7 @@ void >> #endif >> ) >> fixup_fallthru_exit_predecessor (); >> - fixup_reorder_chain (); >> + fixup_reorder_chain (finalize_reorder_blocks); >> >> rebuild_jump_labels (get_insns ()); >> delete_dead_jumptables (); >> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e) >> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) >> return false; >> >> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) >> - || BB_PARTITION (src) != BB_PARTITION (target)) >> + if (BB_PARTITION (src) != BB_PARTITION (target)) >> return false; >> >> if (!onlyjump_p (insn) >> >> -- >> This patch is available for review at http://codereview.appspot.com/6823047 > > > > -- > Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
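For readers following the thread, the invariant that fixup_partitions restores (no block in the cold partition may dominate a block in the hot partition) can be illustrated with a small standalone model. This is only a sketch under stated assumptions: the idom array, the partition enum, MAX_BLOCKS and propagate_cold are hypothetical stand-ins invented for illustration, not GCC's basic_block, VEC or dominance APIs. It also marks a block cold before pushing it on the worklist, whereas the patch pushes every dominator-tree son and checks the partition when the block is popped.

```c
#include <assert.h>

/* Hypothetical toy model, not GCC code.  Blocks are numbered 0..n-1 and
   idom[i] holds the immediate dominator of block i (-1 for the entry).  */
enum partition { HOT, COLD };

#define MAX_BLOCKS 64

/* Mark every block dominated by a cold block as cold, mirroring the
   worklist idea in fixup_partitions.  Returns the number of blocks that
   were switched from hot to cold.  */
static int
propagate_cold (const int *idom, enum partition *part, int n)
{
  int worklist[MAX_BLOCKS];
  int top = 0;
  int fixed = 0;

  assert (n >= 0 && n <= MAX_BLOCKS);

  /* Seed the worklist with the blocks that are already cold.  */
  for (int i = 0; i < n; i++)
    if (part[i] == COLD)
      worklist[top++] = i;

  while (top > 0)
    {
      int bb = worklist[--top];

      /* Visit the dominator-tree sons of bb.  A son that is still hot is
         marked cold before it is pushed, so each block is pushed at most
         once and the worklist cannot overflow.  */
      for (int son = 0; son < n; son++)
        if (idom[son] == bb && part[son] != COLD)
          {
            part[son] = COLD;
            fixed++;
            worklist[top++] = son;
          }
    }

  return fixed;
}
```

In this model, a six-block function whose block 1 is cold and immediately dominates block 2, which in turn dominates block 3, ends up with blocks 2 and 3 cold as well, while blocks outside that dominator subtree keep their original partition. The real routine additionally has to update the crossing-edge flags and jump notes for every block it moves, which is what fixup_bb_partition does in the patch.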
Are you sure you have all my changes applied? I applied the 4 patches
attached to PR55121 into my trunk checkout that has my fixes, and to a
pristine trunk checkout. I configured and built both for
--target=arm-none-linux-gnueabi, and built using your options, .i file and
gcda file. I can reproduce the failure using the pristine trunk with your
patches but not with my fixed trunk + your patches. (I just updated to head
to pick up recent changes and get the same result. The vec changes required
some manual changes to the patch, which I will resend shortly.)

Without my fixes:

$ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreprocessed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use -fno-common -o eval.s -freorder-blocks-and-partition
GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
        compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version 2.4.2-p1, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
        compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version 2.4.2-p1, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a
eval.c: In function ‘Ge’:
eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560
 }
 ^
0x622f71 df_compact_blocks()
        ../../gcc_trunk_3/gcc/df-core.c:1560
0x5cfcb5 compact_blocks()
        ../../gcc_trunk_3/gcc/cfg.c:162
0xc9dce0 reorder_basic_blocks
        ../../gcc_trunk_3/gcc/bb-reorder.c:2154
0xc9dce0 rest_of_handle_reorder_blocks
        ../../gcc_trunk_3/gcc/bb-reorder.c:2219
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
With my fixes:

$ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreprocessed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use -fno-common -o eval.s -freorder-blocks-and-partition
GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
        compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version 2.4.2-p1, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
        compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version 2.4.2-p1, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3

Thanks,
Teresa

On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> Hi,
>
> I have tested your patch on Spec2000 on ARM, and I can still see
> several failures caused by:
> "error: fallthru edge crosses section boundary", including the case
> described in PR55121.
>
> On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote:
>> Ping.
>> Teresa
>>
>> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>> Revised patch that fixes failures encountered when enabling
>>> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>>>
>>> This includes new verification code to ensure no cold blocks dominate hot
>>> blocks contributed by Steven Bosscher.
>>>
>>> I attempted to make the handling of partition updates through the optimization
>>> passes much more consistent, removing a number of partial fixes in the code
>>> stream in the process. The code to fixup partitions (including the BB_PARTITION
>>> assignment, region crossing jump notes, and switch text section notes) is
>>> now handled in a few centralized locations. For example, inside
>>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
>>> don't need to attempt the fixup themselves.
>>>
>>> For optimization passes that make adjustments to the cfg while in cfg layout
>>> mode that are not easy to fix up incrementally, the new routine
>>> fixup_partitions handles the cleanup globally. This does require calculation
>>> of the dominance relation, however, as far as I can tell the routines which
>>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
>>> are invoked typically once (or a small number of times in the case of
>>> try_optimize_cfg) per optimization pass. Additionally, I compared the
>>> -ftime-report output for some large fdo compilations and saw only minimal
>>> increases in the dominance computation times, which were only a tiny percent
>>> of the overall compile time.
>>>
>>> Additionally, I added a flag to the rtl_data structure to indicate whether
>>> any partitioning was actually performed, so that optimizations which were
>>> conservatively disabled whenever the flag_reorder_blocks_and_partition
>>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
>>> conservative for functions where no partitions were formed (e.g. they are
>>> completely hot).
>>>
>>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
>>> benchmarks and internal google benchmarks using profile feedback and
>>> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
>>>
>>> Thanks,
>>> Teresa
>>>
>>> 2012-11-14  Teresa Johnson  <tejohnson@google.com>
>>>             Steven Bosscher  <steven@gcc.gnu.org>
>>>
>>>     * cfghooks.h (cfg_layout_finalize): New parameter.
>>>     * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
>>>     parameter.
>>>     * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
>>>     as this is now done by redirect_edge_and_branch_force.
>>>     * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
>>>     barriers, new cfg_layout_finalize parameter, and don't store exit
>>>     predecessor BB until after it is potentially split.
>>>     * function.h (struct rtl_data): New flag has_bb_partition.
>>>     * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
>>>     * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
>>>     any blocks in function actually partitioned.
>>>     (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
>>>     up partitioning.
>>>     * bb-reorder.c (connect_traces): Only look for partitions and skip
>>>     block copying if any blocks in function actually partitioned.
>>>     (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
>>>     (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
>>>     that no cold blocks dominate a hot block.
>>>     (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
>>>     as this is now done by force_nonfallthru_and_redirect.
>>>     (add_reg_crossing_jump_notes): Handle the fact that some jumps may
>>>     already be marked with region crossing note.
>>>     (reorder_basic_blocks): Only need to verify partitions if any
>>>     blocks in function actually partitioned.
>>>     (insert_section_boundary_note): Only need to insert note if any
>>>     blocks in function actually partitioned.
>>>     (rest_of_handle_reorder_blocks): New cfg_layout_finalize
>>>     parameter, and remove call to insert_section_boundary_note as this
>>>     is now called via cfg_layout_finalize/fixup_reorder_chain.
>>>     (duplicate_computed_gotos): New cfg_layout_finalize
>>>     parameter.
>>>     (partition_hot_cold_basic_blocks): Set flag indicating function
>>>     has bb partitions.
>>>     * bb-reorder.h: Declare insert_section_boundary_note and
>>>     emit_barrier_after_bb, which are no longer static.
>>>     * basic-block.h: Declare new function fixup_partitions.
>>>     * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
>>>     check for region crossing note.
>>>     (fixup_partition_crossing): New function.
>>>     (fixup_bb_partition): Ditto.
>>>     (rtl_redirect_edge_and_branch): Fixup partition boundaries.
>>>     (force_nonfallthru_and_redirect): Fixup partition boundaries,
>>>     remove old code that tried to do this. Emit barrier correctly
>>>     when we are in cfglayout mode.
>>>     (rtl_split_edge): Correctly fixup partition boundaries.
>>>     (commit_one_edge_insertion): Remove old code that tried to
>>>     fixup region crossing edge since this is now handled in
>>>     split_block, and set up insertion point correctly since
>>>     block may now end in a jump.
>>>     (commit_edge_insertions): Invoke fixup_partitions to sanitize partition
>>>     boundaries after optimizations that modify cfg and before trying to
>>>     verify the flow info.
>>>     (fixup_partitions): New function.
>>>     (rtl_verify_flow_info_1): Add verification that no cold bbs dominate
>>>     hot bbs.
>>>     (record_effective_endpoints): Remove region-crossing notes and set flag
>>>     indicating that they need to be reinserted on exit from cfglayout mode.
>>>     (outof_cfg_layout_mode): New cfg_layout_finalize parameter.
>>>     (fixup_reorder_chain): Call insert_section_boundary_note if necessary.
>>>     Remove old code that attempted to fixup region crossing note as
>>>     this is now handled in force_nonfallthru_and_redirect.
>>>     (duplicate_insn_chain): Don't duplicate switch section notes.
>>>     (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
>>>     (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
>>>     note.
>>> >>> Index: cfghooks.h >>> =================================================================== >>> --- cfghooks.h (revision 193376) >>> +++ cfghooks.h (working copy) >>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas >>> void account_profile_record (struct profile_record *, int); >>> >>> extern void cfg_layout_initialize (unsigned int); >>> -extern void cfg_layout_finalize (void); >>> +extern void cfg_layout_finalize (bool); >>> >>> /* Hooks containers. */ >>> extern struct cfg_hooks gimple_cfg_hooks; >>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi >>> extern void gimple_register_cfg_hooks (void); >>> extern struct cfg_hooks get_cfg_hooks (void); >>> extern void set_cfg_hooks (struct cfg_hooks); >>> - >>> Index: modulo-sched.c >>> =================================================================== >>> --- modulo-sched.c (revision 193376) >>> +++ modulo-sched.c (working copy) >>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void) >>> if (bb->next_bb != EXIT_BLOCK_PTR) >>> bb->aux = bb->next_bb; >>> free_dominance_info (CDI_DOMINATORS); >>> - cfg_layout_finalize (); >>> + cfg_layout_finalize (false); >>> #endif /* INSN_SCHEDULING */ >>> return 0; >>> } >>> Index: ifcvt.c >>> =================================================================== >>> --- ifcvt.c (revision 193376) >>> +++ ifcvt.c (working copy) >>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg >>> if (new_bb) >>> { >>> df_bb_replace (then_bb_index, new_bb); >>> - /* Since the fallthru edge was redirected from test_bb to new_bb, >>> - we need to ensure that new_bb is in the same partition as >>> - test bb (you can not fall through across section boundaries). */ >>> - BB_COPY_PARTITION (new_bb, test_bb); >>> + /* This should have been done above via force_nonfallthru_and_redirect >>> + (possibly called from redirect_edge_and_branch_force). 
*/ >>> + gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb)); >>> } >>> >>> num_true_changes++; >>> Index: function.c >>> =================================================================== >>> --- function.c (revision 193376) >>> +++ function.c (working copy) >>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void) >>> break; >>> if (e) >>> { >>> - copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)), >>> - NULL_RTX, e->src); >>> + /* Make sure we insert after any barriers. */ >>> + rtx end = get_last_bb_insn (e->src); >>> + copy_bb = create_basic_block (NEXT_INSN (end), >>> + NULL_RTX, e->src); >>> BB_COPY_PARTITION (copy_bb, e->src); >>> } >>> else >>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void) >>> if (cur_bb->index >= NUM_FIXED_BLOCKS >>> && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS) >>> cur_bb->aux = cur_bb->next_bb; >>> - cfg_layout_finalize (); >>> + cfg_layout_finalize (false); >>> } >>> >>> epilogue_done: >>> @@ -6517,7 +6519,7 @@ epilogue_done: >>> basic_block simple_return_block_cold = NULL; >>> edge pending_edge_hot = NULL; >>> edge pending_edge_cold = NULL; >>> - basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb; >>> + basic_block exit_pred; >>> int i; >>> >>> gcc_assert (entry_edge != orig_entry_edge); >>> @@ -6545,6 +6547,12 @@ epilogue_done: >>> else >>> pending_edge_cold = e; >>> } >>> + >>> + /* Save a pointer to the exit's predecessor BB for use in >>> + inserting new BBs at the end of the function. Do this >>> + after the call to split_block above which may split >>> + the original exit pred. */ >>> + exit_pred = EXIT_BLOCK_PTR->prev_bb; >>> >>> FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e) >>> { >>> Index: function.h >>> =================================================================== >>> --- function.h (revision 193376) >>> +++ function.h (working copy) >>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data { >>> sched2) and is useful only if the port defines LEAF_REGISTERS. 
*/ >>> bool uses_only_leaf_regs; >>> >>> + /* Nonzero if the function being compiled has undergone hot/cold partitioning >>> + (under flag_reorder_blocks_and_partition) and has at least one cold >>> + block. */ >>> + bool has_bb_partition; >>> + >>> /* Like regs_ever_live, but 1 if a reg is set or clobbered from an >>> asm. Unlike regs_ever_live, elements of this array corresponding >>> to eliminable regs (like the frame pointer) are set if an asm >>> Index: hw-doloop.c >>> =================================================================== >>> --- hw-doloop.c (revision 193376) >>> +++ hw-doloop.c (working copy) >>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops) >>> else >>> bb->aux = NULL; >>> } >>> - cfg_layout_finalize (); >>> + cfg_layout_finalize (false); >>> clear_aux_for_blocks (); >>> df_analyze (); >>> } >>> Index: cfgcleanup.c >>> =================================================================== >>> --- cfgcleanup.c (revision 193376) >>> +++ cfgcleanup.c (working copy) >>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2, >>> partition boundaries). See the comments at the top of >>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ >>> >>> - if (flag_reorder_blocks_and_partition && reload_completed) >>> + if (crtl->has_bb_partition && reload_completed) >>> return false; >>> >>> /* Search backward through forwarder blocks. We don't need to worry >>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode) >>> df_analyze (); >>> } >>> >>> + if (changed) >>> + { >>> + /* Edge forwarding in particular can cause hot blocks previously >>> + reached by both hot and cold blocks to become dominated only >>> + by cold blocks. This will cause the verification below to fail, >>> + and lead to now cold code in the hot section. This is not easy >>> + to detect and fix during edge forwarding, and in some cases >>> + is only visible after newly unreachable blocks are deleted, >>> + which will be done in fixup_partitions. 
*/ >>> + fixup_partitions (); >>> + >>> #ifdef ENABLE_CHECKING >>> - if (changed) >>> - verify_flow_info (); >>> + verify_flow_info (); >>> #endif >>> + } >>> >>> changed_overall |= changed; >>> first_pass = false; >>> Index: bb-reorder.c >>> =================================================================== >>> --- bb-reorder.c (revision 193376) >>> +++ bb-reorder.c (working copy) >>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces >>> current_partition = BB_PARTITION (traces[0].first); >>> two_passes = false; >>> >>> - if (flag_reorder_blocks_and_partition) >>> + if (crtl->has_bb_partition) >>> for (i = 0; i < n_traces && !two_passes; i++) >>> if (BB_PARTITION (traces[0].first) >>> != BB_PARTITION (traces[i].first)) >>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces >>> } >>> } >>> >>> - if (flag_reorder_blocks_and_partition) >>> + if (crtl->has_bb_partition) >>> try_copy = false; >>> >>> /* Copy tiny blocks always; copy larger blocks only when the >>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void) >>> return length; >>> } >>> >>> -/* Emit a barrier into the footer of BB. */ >>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode. */ >>> >>> -static void >>> +void >>> emit_barrier_after_bb (basic_block bb) >>> { >>> rtx barrier = emit_barrier_after (BB_END (bb)); >>> - BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); >>> + if (current_ir_type () == IR_RTL_CFGLAYOUT) >>> + BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); >>> } >>> >>> /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions. 
>>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg >>> { >>> VEC(edge, heap) *crossing_edges = NULL; >>> basic_block bb; >>> - edge e; >>> - edge_iterator ei; >>> + edge e, e2; >>> + edge_iterator ei, ei2; >>> + unsigned int cold_bb_count = 0; >>> + VEC (basic_block, heap) *bbs_in_hot_partition = NULL; >>> + VEC (basic_block, heap) *bbs_newly_hot = NULL; >>> >>> /* Mark which partition (hot/cold) each basic block belongs in. */ >>> FOR_EACH_BB (bb) >>> { >>> if (probably_never_executed_bb_p (cfun, bb)) >>> - BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>> + { >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>> + cold_bb_count++; >>> + } >>> else >>> - BB_SET_PARTITION (bb, BB_HOT_PARTITION); >>> + { >>> + BB_SET_PARTITION (bb, BB_HOT_PARTITION); >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb); >>> + } >>> } >>> >>> + /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of >>> + several different possibilities. One is that there are edge weight insanities >>> + due to optimization phases that do not properly update basic block profile >>> + counts. The second is that the entry of the function may not be hot, because >>> + it is entered fewer times than the number of profile training runs, but there >>> + is a loop inside the function that causes blocks within the function to be >>> + above the threshold for hotness. */ >>> + if (cold_bb_count) >>> + { >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>> + >>> + if (dom_calculated_here) >>> + calculate_dominance_info (CDI_DOMINATORS); >>> + >>> + /* Keep examining hot bbs until we have either checked them all, or >>> + re-marked all cold bbs hot. */ >>> + while (! 
VEC_empty (basic_block, bbs_in_hot_partition) >>> + && cold_bb_count) >>> + { >>> + basic_block dom_bb; >>> + >>> + bb = VEC_pop (basic_block, bbs_in_hot_partition); >>> + dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb); >>> + >>> + /* If bb's immediate dominator is also hot then it is ok. */ >>> + if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION) >>> + continue; >>> + >>> + /* We have a hot bb with an immediate dominator that is cold. >>> + The dominator needs to be re-marked to hot. */ >>> + BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION); >>> + cold_bb_count--; >>> + >>> + /* Now we need to examine newly-hot dom_bb to see if it is also >>> + dominated by a cold bb. */ >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb); >>> + >>> + /* We should also adjust any cold blocks that the newly-hot bb >>> + feeds and see if it makes sense to re-mark those as hot as >>> + well. */ >>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb); >>> + while (! VEC_empty (basic_block, bbs_newly_hot)) >>> + { >>> + basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot); >>> + /* Examine all successors of this newly-hot bb to see if they >>> + are cold and should be re-marked as hot. */ >>> + FOR_EACH_EDGE (e, ei, new_hot_bb->succs) >>> + { >>> + bool any_cold_preds = false; >>> + basic_block succ = e->dest; >>> + if (BB_PARTITION (succ) != BB_COLD_PARTITION) >>> + continue; >>> + /* Does this block have any cold predecessors now? */ >>> + FOR_EACH_EDGE (e2, ei2, succ->preds) >>> + { >>> + if (BB_PARTITION (e2->src) == BB_COLD_PARTITION) >>> + { >>> + any_cold_preds = true; >>> + break; >>> + } >>> + } >>> + if (any_cold_preds) >>> + continue; >>> + >>> + /* Here we have a successor of newly-hot bb that is cold >>> + but no longer has any cold precessessors. Since the original >>> + assignment of our newly-hot bb was incorrect, this successor's >>> + assignment as cold is also suspect. Go ahead and re-mark it >>> + as hot now too. 
Better heuristics may be in order here. */ >>> + BB_SET_PARTITION (succ, BB_HOT_PARTITION); >>> + cold_bb_count--; >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ); >>> + /* Examine this successor as a newly-hot bb. */ >>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, succ); >>> + } >>> + } >>> + } >>> + >>> + if (dom_calculated_here) >>> + free_dominance_info (CDI_DOMINATORS); >>> + } >>> + >>> /* The format of .gcc_except_table does not allow landing pads to >>> be in a different partition as the throw. Fix this by either >>> moving or duplicating the landing pads. */ >>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void) >>> new_bb->aux = cur_bb->aux; >>> cur_bb->aux = new_bb; >>> >>> - /* Make sure new fall-through bb is in same >>> - partition as bb it's falling through from. */ >>> + /* This is done by force_nonfallthru_and_redirect. */ >>> + gcc_assert (BB_PARTITION (new_bb) >>> + == BB_PARTITION (cur_bb)); >>> >>> - BB_COPY_PARTITION (new_bb, cur_bb); >>> single_succ_edge (new_bb)->flags |= EDGE_CROSSING; >>> } >>> else >>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void) >>> FOR_EACH_BB (bb) >>> FOR_EACH_EDGE (e, ei, bb->succs) >>> if ((e->flags & EDGE_CROSSING) >>> - && JUMP_P (BB_END (e->src))) >>> + && JUMP_P (BB_END (e->src)) >>> + /* Some notes were added during fix_up_fall_thru_edges, via >>> + force_nonfallthru_and_redirect. */ >>> + && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX)) >>> add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>> } >>> >>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void) >>> dump_flow_info (dump_file, dump_flags); >>> } >>> >>> - if (flag_reorder_blocks_and_partition) >>> + if (crtl->has_bb_partition) >>> verify_hot_cold_block_grouping (); >>> } >>> >>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void) >>> encountering this note will make the compiler switch between the >>> hot and cold text sections. 
*/ >>> >>> -static void >>> +void >>> insert_section_boundary_note (void) >>> { >>> basic_block bb; >>> rtx new_note; >>> int first_partition = 0; >>> >>> - if (!flag_reorder_blocks_and_partition) >>> + if (!crtl->has_bb_partition) >>> return; >>> >>> FOR_EACH_BB (bb) >>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void) >>> FOR_EACH_BB (bb) >>> if (bb->next_bb != EXIT_BLOCK_PTR) >>> bb->aux = bb->next_bb; >>> - cfg_layout_finalize (); >>> + cfg_layout_finalize (true); >>> >>> - /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. */ >>> - insert_section_boundary_note (); >>> return 0; >>> } >>> >>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void) >>> } >>> >>> done: >>> - cfg_layout_finalize (); >>> + cfg_layout_finalize (false); >>> >>> BITMAP_FREE (candidates); >>> return 0; >>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void) >>> if (crossing_edges == NULL) >>> return 0; >>> >>> + crtl->has_bb_partition = true; >>> + >>> /* Make sure the source of any crossing edge ends in a jump and the >>> destination of any crossing edge has a label. 
*/ >>> add_labels_and_missing_jumps (crossing_edges); >>> Index: bb-reorder.h >>> =================================================================== >>> --- bb-reorder.h (revision 193376) >>> +++ bb-reorder.h (working copy) >>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re >>> >>> extern int get_uncond_jump_length (void); >>> >>> +extern void insert_section_boundary_note (void); >>> + >>> +extern void emit_barrier_after_bb (basic_block bb); >>> + >>> #endif >>> Index: basic-block.h >>> =================================================================== >>> --- basic-block.h (revision 193376) >>> +++ basic-block.h (working copy) >>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect >>> extern bool contains_no_active_insn_p (const_basic_block); >>> extern bool forwarder_block_p (const_basic_block); >>> extern bool can_fallthru (basic_block, basic_block); >>> +extern void fixup_partitions (void); >>> >>> /* In cfgbuild.c. */ >>> extern void find_many_sub_basic_blocks (sbitmap); >>> Index: cfgrtl.c >>> =================================================================== >>> --- cfgrtl.c (revision 193376) >>> +++ cfgrtl.c (working copy) >>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. If not see >>> #include "tree.h" >>> #include "hard-reg-set.h" >>> #include "basic-block.h" >>> +#include "bb-reorder.h" >>> #include "regs.h" >>> #include "flags.h" >>> #include "function.h" >>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3. If not see >>> Only applicable if the CFG is in cfglayout mode. 
*/ >>> static GTY(()) rtx cfg_layout_function_footer; >>> static GTY(()) rtx cfg_layout_function_header; >>> +static bool had_sec_boundary_notes; >>> >>> static rtx skip_insns_after_block (basic_block); >>> static void record_effective_endpoints (void); >>> static rtx label_for_bb (basic_block); >>> -static void fixup_reorder_chain (void); >>> +static void fixup_reorder_chain (bool finalize_reorder_blocks); >>> >>> void verify_insn_chain (void); >>> static void fixup_fallthru_exit_predecessor (void); >>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc >>> partition boundaries). See the comments at the top of >>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ >>> >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) >>> - || BB_PARTITION (src) != BB_PARTITION (target)) >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) >>> return NULL; >>> >>> /* We can replace or remove a complex jump only when we have exactly >>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target) >>> return e; >>> } >>> >>> +/* Called when edge E has been redirected to a new destination, >>> + in order to update the region crossing flag on the edge and >>> + jump. */ >>> + >>> +static void >>> +fixup_partition_crossing (edge e, basic_block target) >>> +{ >>> + rtx note; >>> + >>> + gcc_assert (e->dest == target); >>> + >>> + if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR) >>> + return; >>> + /* If we redirected an existing edge, it may already be marked >>> + crossing, even though the new src is missing a reg crossing note. >>> + But make sure reg crossing note doesn't already exist before >>> + inserting. 
*/ >>> + if (BB_PARTITION (e->src) != BB_PARTITION (target)) >>> + { >>> + e->flags |= EDGE_CROSSING; >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>> + if (JUMP_P (BB_END (e->src)) >>> + && !note) >>> + add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>> + } >>> + else if (BB_PARTITION (e->src) == BB_PARTITION (target)) >>> + { >>> + e->flags &= ~EDGE_CROSSING; >>> + /* Remove the region crossing note from jump at end of >>> + e->src if it exists. */ >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>> + if (note) >>> + remove_note (BB_END (e->src), note); >>> + } >>> +} >>> + >>> +/* Called when block BB has been reassigned to a different partition, >>> + to ensure that the region crossing attributes are updated. */ >>> + >>> +static void >>> +fixup_bb_partition (basic_block bb) >>> +{ >>> + edge e; >>> + edge_iterator ei; >>> + >>> + /* Now need to make bb's pred edges non-region crossing. */ >>> + FOR_EACH_EDGE (e, ei, bb->preds) >>> + { >>> + fixup_partition_crossing (e, e->dest); >>> + } >>> + >>> + /* Possibly need to make bb's successor edges region crossing, >>> + or remove stale region crossing. */ >>> + FOR_EACH_EDGE (e, ei, bb->succs) >>> + { >>> + if ((e->flags & EDGE_FALLTHRU) >>> + && BB_PARTITION (bb) != BB_PARTITION (e->dest) >>> + && e->dest != EXIT_BLOCK_PTR) >>> + /* force_nonfallthru_and_redirect calls fixup_partition_crossing. */ >>> + force_nonfallthru (e); >>> + else >>> + fixup_partition_crossing (e, e->dest); >>> + } >>> +} >>> + >>> /* Attempt to change code to redirect edge E to TARGET. Don't do that on >>> expense of adding new instructions or reordering basic blocks. 
>>> >>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block >>> { >>> edge ret; >>> basic_block src = e->src; >>> + basic_block dest = e->dest; >>> >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) >>> return NULL; >>> >>> - if (e->dest == target) >>> + if (dest == target) >>> return e; >>> >>> if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL) >>> { >>> df_set_bb_dirty (src); >>> + fixup_partition_crossing (ret, target); >>> return ret; >>> } >>> >>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block >>> return NULL; >>> >>> df_set_bb_dirty (src); >>> + fixup_partition_crossing (ret, target); >>> return ret; >>> } >>> >>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc >>> /* Make sure new block ends up in correct hot/cold section. */ >>> >>> BB_COPY_PARTITION (jump_block, e->src); >>> - if (flag_reorder_blocks_and_partition >>> - && targetm_common.have_named_sections >>> - && JUMP_P (BB_END (jump_block)) >>> - && !any_condjump_p (BB_END (jump_block)) >>> - && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING)) >>> - add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX); >>> >>> /* Wire edge in. */ >>> new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU); >>> new_edge->probability = probability; >>> new_edge->count = count; >>> >>> + /* If e->src was previously region crossing, it no longer is >>> + and the reg crossing note should be removed. */ >>> + fixup_partition_crossing (new_edge, jump_block); >>> + >>> /* Redirect old edge. */ >>> redirect_edge_pred (e, jump_block); >>> e->probability = REG_BR_PROB_BASE; >>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc >>> LABEL_NUSES (label)++; >>> } >>> >>> - emit_barrier_after (BB_END (jump_block)); >>> + /* We might be in cfg layout mode, and if so, the following routine will >>> + insert the barrier correctly. 
*/ >>> + emit_barrier_after_bb (jump_block); >>> redirect_edge_succ_nodup (e, target); >>> >>> if (abnormal_edge_flags) >>> make_edge (src, target, abnormal_edge_flags); >>> >>> df_mark_solutions_dirty (); >>> + fixup_partition_crossing (e, target); >>> return new_bb; >>> } >>> >>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU >>> static basic_block >>> rtl_split_edge (edge edge_in) >>> { >>> - basic_block bb; >>> + basic_block bb, new_bb; >>> rtx before; >>> >>> /* Abnormal edges cannot be split. */ >>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in) >>> else >>> { >>> bb = create_basic_block (before, NULL, edge_in->dest->prev_bb); >>> - /* ??? Why not edge_in->dest->prev_bb here? */ >>> - BB_COPY_PARTITION (bb, edge_in->dest); >>> + if (edge_in->src == ENTRY_BLOCK_PTR) >>> + BB_COPY_PARTITION (bb, edge_in->dest); >>> + else >>> + /* Put the split bb into the src partition, to avoid creating >>> + a situation where a cold bb dominates a hot bb, in the case >>> + where src is cold and dest is hot. The src will dominate >>> + the new bb (whereas it might not have dominated dest). */ >>> + BB_COPY_PARTITION (bb, edge_in->src); >>> } >>> >>> make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU); >>> >>> + /* Can't allow a region crossing edge to be fallthrough. */ >>> + if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest) >>> + && edge_in->dest != EXIT_BLOCK_PTR) >>> + { >>> + new_bb = force_nonfallthru (single_succ_edge (bb)); >>> + gcc_assert (!new_bb); >>> + } >>> + >>> /* For non-fallthru edges, we must adjust the predecessor's >>> jump instruction to target our new block. 
*/ >>> if ((edge_in->flags & EDGE_FALLTHRU) == 0) >>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e) >>> else >>> { >>> bb = split_edge (e); >>> - after = BB_END (bb); >>> >>> - if (flag_reorder_blocks_and_partition >>> - && targetm_common.have_named_sections >>> - && e->src != ENTRY_BLOCK_PTR >>> - && BB_PARTITION (e->src) == BB_COLD_PARTITION >>> - && !(e->flags & EDGE_CROSSING) >>> - && JUMP_P (after) >>> - && !any_condjump_p (after) >>> - && (single_succ_edge (bb)->flags & EDGE_CROSSING)) >>> - add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX); >>> + /* If e crossed a partition boundary, we needed to make bb end in >>> + a region-crossing jump, even though it was originally fallthru. */ >>> + if (JUMP_P (BB_END (bb))) >>> + before = BB_END (bb); >>> + else >>> + after = BB_END (bb); >>> } >>> >>> /* Now that we've found the spot, do the insertion. */ >>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void) >>> { >>> basic_block bb; >>> >>> + /* Optimization passes that invoke this routine can cause hot blocks >>> + previously reached by both hot and cold blocks to become dominated only >>> + by cold blocks. This will cause the verification below to fail, >>> + and lead to now cold code in the hot section. In some cases this >>> + may only be visible after newly unreachable blocks are deleted, >>> + which will be done by fixup_partitions. */ >>> + fixup_partitions (); >>> + >>> #ifdef ENABLE_CHECKING >>> verify_flow_info (); >>> #endif >>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb) >>> >>> return end; >>> } >>> - >>> + >>> +/* Perform cleanup on the hot/cold bb partitioning after optimization >>> + passes that modify the cfg. */ >>> + >>> +void >>> +fixup_partitions (void) >>> +{ >>> + basic_block bb; >>> + >>> + if (!crtl->has_bb_partition) >>> + return; >>> + >>> + /* Delete any blocks that became unreachable and weren't >>> + already cleaned up, for example during edge forwarding >>> + and convert_jumps_to_returns. 
This will expose more >>> + opportunities for fixing the partition boundaries here. >>> + Also, the calculation of the dominance graph during verification >>> + will assert if there are unreachable nodes. */ >>> + delete_unreachable_blocks (); >>> + >>> + /* If there are partitions, do a sanity check on them: A basic block in >>> + a cold partition cannot dominate a basic block in a hot partition. >>> + Fixup any that now violate this requirement, as a result of edge >>> + forwarding and unreachable block deletion. */ >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >>> + VEC (basic_block, heap) *bbs_to_fix = NULL; >>> + FOR_EACH_BB (bb) >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) >>> + { >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>> + basic_block son; >>> + >>> + if (dom_calculated_here) >>> + calculate_dominance_info (CDI_DOMINATORS); >>> + >>> + while (! VEC_empty (basic_block, bbs_in_cold_partition)) >>> + { >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >>> + /* If bb is not yet cold (because it was added below as >>> + a block dominated by a cold bb) then mark it cold here. */ >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >>> + { >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>> + VEC_safe_push (basic_block, heap, bbs_to_fix, bb); >>> + } >>> + /* Any blocks dominated by a block in the cold section >>> + must also be cold. 
*/ >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); >>> + son; >>> + son = next_dom_son (CDI_DOMINATORS, son)) >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >>> + } >>> + >>> + if (dom_calculated_here) >>> + free_dominance_info (CDI_DOMINATORS); >>> + } >>> + >>> + /* Do the partition fixup after all necessary blocks have been converted to >>> + cold, so that we only update the region crossings the minimum number of >>> + places, which can require forcing edges to be non fallthru. */ >>> + while (! VEC_empty (basic_block, bbs_to_fix)) >>> + { >>> + bb = VEC_pop (basic_block, bbs_to_fix); >>> + fixup_bb_partition (bb); >>> + } >>> +} >>> + >>> /* Verify the CFG and RTL consistency common for both underlying RTL and >>> cfglayout RTL. >>> >>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void) >>> rtx x; >>> int err = 0; >>> basic_block bb; >>> + bool have_partitions = false; >>> >>> /* Check the general integrity of the basic blocks. */ >>> FOR_EACH_BB_REVERSE (bb) >>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void) >>> >>> if (e->flags & EDGE_ABNORMAL) >>> n_abnormal++; >>> + >>> + have_partitions |= is_crossing; >>> } >>> >>> if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX)) >>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void) >>> } >>> } >>> >>> + /* If there are partitions, do a sanity check on them: A basic block in >>> + a cold partition cannot dominate a basic block in a hot partition. */ >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >>> + if (have_partitions && !err) >>> + FOR_EACH_BB (bb) >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) >>> + { >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>> + basic_block son; >>> + >>> + if (dom_calculated_here) >>> + calculate_dominance_info (CDI_DOMINATORS); >>> + >>> + while (! 
VEC_empty (basic_block, bbs_in_cold_partition)) >>> + { >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >>> + { >>> + error ("non-cold basic block %d dominated " >>> + "by a block in the cold partition", bb->index); >>> + err = 1; >>> + } >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); >>> + son; >>> + son = next_dom_son (CDI_DOMINATORS, son)) >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >>> + } >>> + >>> + if (dom_calculated_here) >>> + free_dominance_info (CDI_DOMINATORS); >>> + } >>> + >>> /* Clean up. */ >>> return err; >>> } >>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void) >>> else >>> cfg_layout_function_header = NULL_RTX; >>> >>> + had_sec_boundary_notes = false; >>> + >>> next_insn = get_insns (); >>> FOR_EACH_BB (bb) >>> { >>> rtx end; >>> >>> if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb)) >>> - BB_HEADER (bb) = unlink_insn_chain (next_insn, >>> - PREV_INSN (BB_HEAD (bb))); >>> + { >>> + /* Rather than try to keep section boundary notes incrementally >>> + up-to-date through cfg layout optimizations, simply remove them >>> + and flag that they should be re-inserted when exiting >>> + cfg layout mode. */ >>> + rtx check_insn = next_insn; >>> + while (check_insn) >>> + { >>> + if (NOTE_P (check_insn) >>> + && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS) >>> + { >>> + had_sec_boundary_notes |= true; >>> + /* Remove note from chain. Grab new next_insn first. */ >>> + if (next_insn == check_insn) >>> + next_insn = NEXT_INSN (check_insn); >>> + /* Delete note. */ >>> + delete_insn (check_insn); >>> + /* There will only be one. */ >>> + break; >>> + } >>> + check_insn = NEXT_INSN (check_insn); >>> + } >>> + /* If we still have header instructions left after above loop. 
*/ >>> + if (next_insn != BB_HEAD (bb)) >>> + BB_HEADER (bb) = unlink_insn_chain (next_insn, >>> + PREV_INSN (BB_HEAD (bb))); >>> + } >>> end = skip_insns_after_block (bb); >>> if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end) >>> BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end); >>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void) >>> if (bb->next_bb != EXIT_BLOCK_PTR) >>> bb->aux = bb->next_bb; >>> >>> - cfg_layout_finalize (); >>> + cfg_layout_finalize (false); >>> >>> return 0; >>> } >>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode) >>> } >>> >>> >>> -/* Given a reorder chain, rearrange the code to match. */ >>> +/* Given a reorder chain, rearrange the code to match. If >>> + this is called when we will FINALIZE_REORDER_BLOCKS, or when >>> + section boundary notes were removed on entry to cfg layout >>> + mode, insert section boundary notes here. */ >>> >>> static void >>> -fixup_reorder_chain (void) >>> +fixup_reorder_chain (bool finalize_reorder_blocks) >>> { >>> basic_block bb; >>> rtx insn = NULL; >>> @@ -3150,7 +3373,7 @@ static void >>> PREV_INSN (BB_HEADER (bb)) = insn; >>> insn = BB_HEADER (bb); >>> while (NEXT_INSN (insn)) >>> - insn = NEXT_INSN (insn); >>> + insn = NEXT_INSN (insn); >>> } >>> if (insn) >>> NEXT_INSN (insn) = BB_HEAD (bb); >>> @@ -3175,6 +3398,11 @@ static void >>> insn = NEXT_INSN (insn); >>> >>> set_last_insn (insn); >>> + >>> + /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. */ >>> + if (had_sec_boundary_notes || finalize_reorder_blocks) >>> + insert_section_boundary_note (); >>> + >>> #ifdef ENABLE_CHECKING >>> verify_insn_chain (); >>> #endif >>> @@ -3187,7 +3415,7 @@ static void >>> edge e_fall, e_taken, e; >>> rtx bb_end_insn; >>> rtx ret_label = NULL_RTX; >>> - basic_block nb, src_bb; >>> + basic_block nb; >>> edge_iterator ei; >>> >>> if (EDGE_COUNT (bb->succs) == 0) >>> @@ -3322,7 +3550,6 @@ static void >>> /* We got here if we need to add a new jump insn. 
>>> Note force_nonfallthru can delete E_FALL and thus we have to >>> save E_FALL->src prior to the call to force_nonfallthru. */ >>> - src_bb = e_fall->src; >>> nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label); >>> if (nb) >>> { >>> @@ -3330,17 +3557,6 @@ static void >>> bb->aux = nb; >>> /* Don't process this new block. */ >>> bb = nb; >>> - >>> - /* Make sure new bb is tagged for correct section (same as >>> - fall-thru source, since you cannot fall-thru across >>> - section boundaries). */ >>> - BB_COPY_PARTITION (src_bb, single_pred (bb)); >>> - if (flag_reorder_blocks_and_partition >>> - && targetm_common.have_named_sections >>> - && JUMP_P (BB_END (bb)) >>> - && !any_condjump_p (BB_END (bb)) >>> - && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING)) >>> - add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX); >>> } >>> } >>> >>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to) >>> case NOTE_INSN_FUNCTION_BEG: >>> /* There is always just single entry to function. */ >>> case NOTE_INSN_BASIC_BLOCK: >>> + /* We should only switch text sections once. */ >>> + case NOTE_INSN_SWITCH_TEXT_SECTIONS: >>> break; >>> >>> case NOTE_INSN_EPILOGUE_BEG: >>> - case NOTE_INSN_SWITCH_TEXT_SECTIONS: >>> emit_note_copy (insn); >>> break; >>> >>> @@ -3759,10 +3976,13 @@ break_superblocks (void) >>> } >>> >>> /* Finalize the changes: reorder insn list according to the sequence specified >>> - by aux pointers, enter compensation code, rebuild scope forest. */ >>> + by aux pointers, enter compensation code, rebuild scope forest. If >>> + this is called when we will FINALIZE_REORDER_BLOCKS, indicate that >>> + to fixup_reorder_chain so that it can insert the proper switch text >>> + section notes. 
*/ >>> >>> void >>> -cfg_layout_finalize (void) >>> +cfg_layout_finalize (bool finalize_reorder_blocks) >>> { >>> #ifdef ENABLE_CHECKING >>> verify_flow_info (); >>> @@ -3775,7 +3995,7 @@ void >>> #endif >>> ) >>> fixup_fallthru_exit_predecessor (); >>> - fixup_reorder_chain (); >>> + fixup_reorder_chain (finalize_reorder_blocks); >>> >>> rebuild_jump_labels (get_insns ()); >>> delete_dead_jumptables (); >>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e) >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) >>> return false; >>> >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) >>> - || BB_PARTITION (src) != BB_PARTITION (target)) >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) >>> return false; >>> >>> if (!onlyjump_p (insn) >>> >>> -- >>> This patch is available for review at http://codereview.appspot.com/6823047 >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
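For readers following the thread, the cold-propagation step that the patch's fixup_partitions performs (any block dominated by a cold block must itself be cold) can be illustrated outside of GCC. The block below is only a minimal stand-alone sketch under stated assumptions, not the code from the patch: the `idom` array, the `partition` enum, and `propagate_cold` are hypothetical names invented for illustration, and the dominator tree is written by hand instead of being computed by calculate_dominance_info.

```c
#include <assert.h>

/* Hypothetical stand-in for GCC's BB_HOT_PARTITION / BB_COLD_PARTITION.  */
enum partition { HOT, COLD };

#define N_BLOCKS 6

/* Hypothetical toy dominator tree: idom[i] is the immediate dominator
   of block i.  Block 0 is the entry block and has no dominator.  */
static const int idom[N_BLOCKS] = { -1, 0, 1, 1, 0, 4 };

/* Walk down the dominator tree from every cold block and re-mark each
   dominated block as cold.  Returns the number of blocks that changed
   partition.  This mirrors the worklist idea in the patch, where the
   sons of a cold block are pushed and examined in turn.  */
static int
propagate_cold (enum partition part[], int n)
{
  int worklist[N_BLOCKS];
  int top = 0;
  int changed = 0;

  assert (n <= N_BLOCKS);

  /* Seed the worklist with the blocks that are already cold.  */
  for (int i = 0; i < n; i++)
    if (part[i] == COLD)
      worklist[top++] = i;

  while (top > 0)
    {
      int bb = worklist[--top];

      /* Each block is pushed at most once: either as an initial cold
         block or at the moment it is re-marked, so the worklist never
         holds more than n entries.  */
      for (int son = 0; son < n; son++)
        if (idom[son] == bb && part[son] != COLD)
          {
            part[son] = COLD;
            changed++;
            worklist[top++] = son;
          }
    }

  return changed;
}
```

Starting from a partition in which only block 1 is cold, the sketch re-marks blocks 2 and 3 (the blocks block 1 dominates) and leaves blocks 0, 4 and 5 hot. In the actual patch the re-marked blocks are collected first and their region-crossing edges are updated afterwards, so that the crossing flags and notes are touched as few times as possible; that second phase is not modeled here.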
On Mon, Nov 26, 2012 at 12:19:55PM -0800, Teresa Johnson wrote: > Are you sure you have all my changes applied? I applied the 4 patches > attached to PR55121 into my trunk checkout that has my fixes, and to a > pristine trunk checkout. I configured and built both for > --target=arm-none-linux-gnueabi, and built using your options, .i file > and gcda file. I can reproduce the failure using the pristine trunk > with your patches but not with my fixed trunk + your patches. (I just > updated to head to pickup recent changes and get the same result. The > vec changes required some manual changes to the patch, which I will > resend shortly.) Teresa, Your mailer seems to have corrupted the posted patch with stray =3D characters and line breaks. Can you repost a copy as an attachment to the list? Jack > > Without my fixes: > > $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreproce > ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 > -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp > -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use > -fno-common -o eval.s -freorder-blocks-and-partition > GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) > compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version > 2.4.2-p1, MPC version 0.8.1 > GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 > GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) > compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version > 2.4.2-p1, MPC version 0.8.1 > GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 > Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a > eval.c: In function ‘Ge’: > eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560 > } > ^ > 0x622f71 df_compact_blocks() > ../../gcc_trunk_3/gcc/df-core.c:1560 > 0x5cfcb5 compact_blocks() > ../../gcc_trunk_3/gcc/cfg.c:162 > 0xc9dce0 reorder_basic_blocks > 
../../gcc_trunk_3/gcc/bb-reorder.c:2154 > 0xc9dce0 rest_of_handle_reorder_blocks > ../../gcc_trunk_3/gcc/bb-reorder.c:2219 > Please submit a full bug report, > with preprocessed source if appropriate. > Please include the complete backtrace with any bug report. > See <http://gcc.gnu.org/bugs.html> for instructions. > > > With my fixes: > > $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreproce > ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 > -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp > -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use > -fno-common -o eval.s -freorder-blocks-and-partition > GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) > compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version > 2.4.2-p1, MPC version 0.8.1 > GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 > GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) > compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version > 2.4.2-p1, MPC version 0.8.1 > GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 > Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3 > > > Thanks, > Teresa > > On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon > <christophe.lyon@linaro.org> wrote: > > Hi, > > > > I have tested your patch on Spec2000 on ARM, and I can still see > > several failures caused by: > > "error: fallthru edge crosses section boundary", including the case > > described in PR55121. > > > > On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote: > >> Ping. > >> Teresa > >> > >> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote: > >>> Revised patch that fixes failures encountered when enabling > >>> -freorder-blocks-and-partition, including the failure reported in PR 53743. 
> >>> > >>> This includes new verification code to ensure no cold blocks dominate hot > >>> blocks contributed by Steven Bosscher. > >>> > >>> I attempted to make the handling of partition updates through the optimization > >>> passes much more consistent, removing a number of partial fixes in the code > >>> stream in the process. The code to fixup partitions (including the BB_PARTITION > >>> assignment, region crossing jump notes, and switch text section notes) is > >>> now handled in a few centralized locations. For example, inside > >>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers > >>> don't need to attempt the fixup themselves. > >>> > >>> For optimization passes that make adjustments to the cfg while in cfg layout > >>> mode that are not easy to fix up incrementally, the new routine > >>> fixup_partitions handles the cleanup globally. This does require calculation > >>> of the dominance relation, however, as far as I can tell the routines which > >>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions) > >>> are invoked typically once (or a small number of times in the case of > >>> try_optimize_cfg) per optimization pass. Additionally, I compared the > >>> -ftime-report output for some large fdo compilations and saw only minimal > >>> increases in the dominance computation times, which were only a tiny percent > >>> of the overall compile time. > >>> > >>> Additionally, I added a flag to the rtl_data structure to indicate whether > >>> any partitioning was actually performed, so that optimizations which were > >>> conservatively disabled whenever the flag_reorder_blocks_and_partition > >>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less > >>> conservative for functions where no partitions were formed (e.g. they are > >>> completely hot). > >>> > >>> Bootstrapped and tested on x86_64-unknown-linux-gnu.
Also tested with SPEC2006 int > >>> benchmarks and internal google benchmarks using profile feedback and > >>> -freorder-blocks-and-partition to get more coverage. Ok for trunk? > >>> > >>> Thanks, > >>> Teresa > >>> > >>> 2012-11-14 Teresa Johnson <tejohnson@google.com> > >>> Steven Bosscher <steven@gcc.gnu.org> > >>> > >>> * cfghooks.h (cfg_layout_finalize): New parameter. > >>> * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize > >>> parameter. > >>> * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert > >>> as this is now done by redirect_edge_and_branch_force. > >>> * function.c (thread_prologue_and_epilogue_insns): Insert new bb after > >>> barriers, new cfg_layout_finalize parameter, and don't store exit > >>> predecessor BB until after it is potentially split. > >>> * function.h (struct rtl_data): New flag has_bb_partition. > >>> * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter. > >>> * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if > >>> any blocks in function actually partitioned. > >>> (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean > >>> up partitioning. > >>> * bb-reorder.c (connect_traces): Only look for partitions and skip > >>> block copying if any blocks in function actually partitioned. > >>> (emit_barrier_after_bb): Handle insertion in non-cfglayout mode. > >>> (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure > >>> that no cold blocks dominate a hot block. > >>> (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert > >>> as this is now done by force_nonfallthru_and_redirect. > >>> (add_reg_crossing_jump_notes): Handle the fact that some jumps may > >>> already be marked with region crossing note. > >>> (reorder_basic_blocks): Only need to verify partitions if any > >>> blocks in function actually partitioned. > >>> (insert_section_boundary_note): Only need to insert note if any > >>> blocks in function actually partitioned. 
> >>> (rest_of_handle_reorder_blocks): New cfg_layout_finalize > >>> parameter, and remove call to insert_section_boundary_note as this > >>> is now called via cfg_layout_finalize/fixup_reorder_chain. > >>> (duplicate_computed_gotos): New cfg_layout_finalize > >>> parameter. > >>> (partition_hot_cold_basic_blocks): Set flag indicating function > >>> has bb partitions. > >>> * bb-reorder.h: Declare insert_section_boundary_note and > >>> emit_barrier_after_bb, which are no longer static. > >>> * basic-block.h: Declare new function fixup_partitions. > >>> * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary > >>> check for region crossing note. > >>> (fixup_partition_crossing): New function. > >>> (fixup_bb_partition): Ditto. > >>> (rtl_redirect_edge_and_branch): Fixup partition boundaries. > >>> (force_nonfallthru_and_redirect): Fixup partition boundaries, > >>> remove old code that tried to do this. Emit barrier correctly > >>> when we are in cfglayout mode. > >>> (rtl_split_edge): Correctly fixup partition boundaries. > >>> (commit_one_edge_insertion): Remove old code that tried to > >>> fixup region crossing edge since this is now handled in > >>> split_block, and set up insertion point correctly since > >>> block may now end in a jump. > >>> (commit_edge_insertions): Invoke fixup_partitions to sanitize partition > >>> boundaries after optimizations that modify cfg and before trying to > >>> verify the flow info. > >>> (fixup_partitions): New function. > >>> (rtl_verify_flow_info_1): Add verification that no cold bbs dominate > >>> hot bbs. > >>> (record_effective_endpoints): Remove region-crossing notes and set flag > >>> indicating that they need to be reinserted on exit from cfglayout mode. > >>> (outof_cfg_layout_mode): New cfg_layout_finalize parameter. > >>> (fixup_reorder_chain): Call insert_section_boundary_note if necessary. 
> >>> Remove old code that attempted to fixup region crossing note as > >>> this is now handled in force_nonfallthru_and_redirect. > >>> (duplicate_insn_chain): Don't duplicate switch section notes. > >>> (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain. > >>> (rtl_can_remove_branch_p): Remove unnecessary check for region crossing > >>> note. > >>> > >>> Index: cfghooks.h > >>> =================================================================== > >>> --- cfghooks.h (revision 193376) > >>> +++ cfghooks.h (working copy) > >>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas > >>> void account_profile_record (struct profile_record *, int); > >>> > >>> extern void cfg_layout_initialize (unsigned int); > >>> -extern void cfg_layout_finalize (void); > >>> +extern void cfg_layout_finalize (bool); > >>> > >>> /* Hooks containers. */ > >>> extern struct cfg_hooks gimple_cfg_hooks; > >>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi > >>> extern void gimple_register_cfg_hooks (void); > >>> extern struct cfg_hooks get_cfg_hooks (void); > >>> extern void set_cfg_hooks (struct cfg_hooks); > >>> - > >>> Index: modulo-sched.c > >>> =================================================================== > >>> --- modulo-sched.c (revision 193376) > >>> +++ modulo-sched.c (working copy) > >>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void) > >>> if (bb->next_bb != EXIT_BLOCK_PTR) > >>> bb->aux = bb->next_bb; > >>> free_dominance_info (CDI_DOMINATORS); > >>> - cfg_layout_finalize (); > >>> + cfg_layout_finalize (false); > >>> #endif /* INSN_SCHEDULING */ > >>> return 0; > >>> } > >>> Index: ifcvt.c > >>> =================================================================== > >>> --- ifcvt.c (revision 193376) > >>> +++ ifcvt.c (working copy) > >>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg > >>> if (new_bb) > >>> { > >>> df_bb_replace (then_bb_index, new_bb); > >>> - /* Since the fallthru edge was 
redirected from test_bb to new_bb, > >>> - we need to ensure that new_bb is in the same partition as > >>> - test bb (you can not fall through across section boundaries). */ > >>> - BB_COPY_PARTITION (new_bb, test_bb); > >>> + /* This should have been done above via force_nonfallthru_and_redirect > >>> + (possibly called from redirect_edge_and_branch_force). */ > >>> + gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb)); > >>> } > >>> > >>> num_true_changes++; > >>> Index: function.c > >>> =================================================================== > >>> --- function.c (revision 193376) > >>> +++ function.c (working copy) > >>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void) > >>> break; > >>> if (e) > >>> { > >>> - copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)), > >>> - NULL_RTX, e->src); > >>> + /* Make sure we insert after any barriers. */ > >>> + rtx end = get_last_bb_insn (e->src); > >>> + copy_bb = create_basic_block (NEXT_INSN (end), > >>> + NULL_RTX, e->src); > >>> BB_COPY_PARTITION (copy_bb, e->src); > >>> } > >>> else > >>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void) > >>> if (cur_bb->index >= NUM_FIXED_BLOCKS > >>> && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS) > >>> cur_bb->aux = cur_bb->next_bb; > >>> - cfg_layout_finalize (); > >>> + cfg_layout_finalize (false); > >>> } > >>> > >>> epilogue_done: > >>> @@ -6517,7 +6519,7 @@ epilogue_done: > >>> basic_block simple_return_block_cold = NULL; > >>> edge pending_edge_hot = NULL; > >>> edge pending_edge_cold = NULL; > >>> - basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb; > >>> + basic_block exit_pred; > >>> int i; > >>> > >>> gcc_assert (entry_edge != orig_entry_edge); > >>> @@ -6545,6 +6547,12 @@ epilogue_done: > >>> else > >>> pending_edge_cold = e; > >>> } > >>> + > >>> + /* Save a pointer to the exit's predecessor BB for use in > >>> + inserting new BBs at the end of the function. 
Do this > >>> + after the call to split_block above which may split > >>> + the original exit pred. */ > >>> + exit_pred = EXIT_BLOCK_PTR->prev_bb; > >>> > >>> FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e) > >>> { > >>> Index: function.h > >>> =================================================================== > >>> --- function.h (revision 193376) > >>> +++ function.h (working copy) > >>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data { > >>> sched2) and is useful only if the port defines LEAF_REGISTERS. */ > >>> bool uses_only_leaf_regs; > >>> > >>> + /* Nonzero if the function being compiled has undergone hot/cold partitioning > >>> + (under flag_reorder_blocks_and_partition) and has at least one cold > >>> + block. */ > >>> + bool has_bb_partition; > >>> + > >>> /* Like regs_ever_live, but 1 if a reg is set or clobbered from an > >>> asm. Unlike regs_ever_live, elements of this array corresponding > >>> to eliminable regs (like the frame pointer) are set if an asm > >>> Index: hw-doloop.c > >>> =================================================================== > >>> --- hw-doloop.c (revision 193376) > >>> +++ hw-doloop.c (working copy) > >>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops) > >>> else > >>> bb->aux = NULL; > >>> } > >>> - cfg_layout_finalize (); > >>> + cfg_layout_finalize (false); > >>> clear_aux_for_blocks (); > >>> df_analyze (); > >>> } > >>> Index: cfgcleanup.c > >>> =================================================================== > >>> --- cfgcleanup.c (revision 193376) > >>> +++ cfgcleanup.c (working copy) > >>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2, > >>> partition boundaries). See the comments at the top of > >>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ > >>> > >>> - if (flag_reorder_blocks_and_partition && reload_completed) > >>> + if (crtl->has_bb_partition && reload_completed) > >>> return false; > >>> > >>> /* Search backward through forwarder blocks. 
We don't need to worry > >>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode) > >>> df_analyze (); > >>> } > >>> > >>> + if (changed) > >>> + { > >>> + /* Edge forwarding in particular can cause hot blocks previously > >>> + reached by both hot and cold blocks to become dominated only > >>> + by cold blocks. This will cause the verification below to fail, > >>> + and lead to now cold code in the hot section. This is not easy > >>> + to detect and fix during edge forwarding, and in some cases > >>> + is only visible after newly unreachable blocks are deleted, > >>> + which will be done in fixup_partitions. */ > >>> + fixup_partitions (); > >>> + > >>> #ifdef ENABLE_CHECKING > >>> - if (changed) > >>> - verify_flow_info (); > >>> + verify_flow_info (); > >>> #endif > >>> + } > >>> > >>> changed_overall |= changed; > >>> first_pass = false; > >>> Index: bb-reorder.c > >>> =================================================================== > >>> --- bb-reorder.c (revision 193376) > >>> +++ bb-reorder.c (working copy) > >>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces > >>> current_partition = BB_PARTITION (traces[0].first); > >>> two_passes = false; > >>> > >>> - if (flag_reorder_blocks_and_partition) > >>> + if (crtl->has_bb_partition) > >>> for (i = 0; i < n_traces && !two_passes; i++) > >>> if (BB_PARTITION (traces[0].first) > >>> != BB_PARTITION (traces[i].first)) > >>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces > >>> } > >>> } > >>> > >>> - if (flag_reorder_blocks_and_partition) > >>> + if (crtl->has_bb_partition) > >>> try_copy = false; > >>> > >>> /* Copy tiny blocks always; copy larger blocks only when the > >>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void) > >>> return length; > >>> } > >>> > >>> -/* Emit a barrier into the footer of BB. */ > >>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode. 
*/ > >>> > >>> -static void > >>> +void > >>> emit_barrier_after_bb (basic_block bb) > >>> { > >>> rtx barrier = emit_barrier_after (BB_END (bb)); > >>> - BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); > >>> + if (current_ir_type () == IR_RTL_CFGLAYOUT) > >>> + BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); > >>> } > >>> > >>> /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions. > >>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg > >>> { > >>> VEC(edge, heap) *crossing_edges = NULL; > >>> basic_block bb; > >>> - edge e; > >>> - edge_iterator ei; > >>> + edge e, e2; > >>> + edge_iterator ei, ei2; > >>> + unsigned int cold_bb_count = 0; > >>> + VEC (basic_block, heap) *bbs_in_hot_partition = NULL; > >>> + VEC (basic_block, heap) *bbs_newly_hot = NULL; > >>> > >>> /* Mark which partition (hot/cold) each basic block belongs in. */ > >>> FOR_EACH_BB (bb) > >>> { > >>> if (probably_never_executed_bb_p (cfun, bb)) > >>> - BB_SET_PARTITION (bb, BB_COLD_PARTITION); > >>> + { > >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); > >>> + cold_bb_count++; > >>> + } > >>> else > >>> - BB_SET_PARTITION (bb, BB_HOT_PARTITION); > >>> + { > >>> + BB_SET_PARTITION (bb, BB_HOT_PARTITION); > >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb); > >>> + } > >>> } > >>> > >>> + /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of > >>> + several different possibilities. One is that there are edge weight insanities > >>> + due to optimization phases that do not properly update basic block profile > >>> + counts. The second is that the entry of the function may not be hot, because > >>> + it is entered fewer times than the number of profile training runs, but there > >>> + is a loop inside the function that causes blocks within the function to be > >>> + above the threshold for hotness. 
*/ > >>> + if (cold_bb_count) > >>> + { > >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); > >>> + > >>> + if (dom_calculated_here) > >>> + calculate_dominance_info (CDI_DOMINATORS); > >>> + > >>> + /* Keep examining hot bbs until we have either checked them all, or > >>> + re-marked all cold bbs hot. */ > >>> + while (! VEC_empty (basic_block, bbs_in_hot_partition) > >>> + && cold_bb_count) > >>> + { > >>> + basic_block dom_bb; > >>> + > >>> + bb = VEC_pop (basic_block, bbs_in_hot_partition); > >>> + dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb); > >>> + > >>> + /* If bb's immediate dominator is also hot then it is ok. */ > >>> + if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION) > >>> + continue; > >>> + > >>> + /* We have a hot bb with an immediate dominator that is cold. > >>> + The dominator needs to be re-marked to hot. */ > >>> + BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION); > >>> + cold_bb_count--; > >>> + > >>> + /* Now we need to examine newly-hot dom_bb to see if it is also > >>> + dominated by a cold bb. */ > >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb); > >>> + > >>> + /* We should also adjust any cold blocks that the newly-hot bb > >>> + feeds and see if it makes sense to re-mark those as hot as > >>> + well. */ > >>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb); > >>> + while (! VEC_empty (basic_block, bbs_newly_hot)) > >>> + { > >>> + basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot); > >>> + /* Examine all successors of this newly-hot bb to see if they > >>> + are cold and should be re-marked as hot. */ > >>> + FOR_EACH_EDGE (e, ei, new_hot_bb->succs) > >>> + { > >>> + bool any_cold_preds = false; > >>> + basic_block succ = e->dest; > >>> + if (BB_PARTITION (succ) != BB_COLD_PARTITION) > >>> + continue; > >>> + /* Does this block have any cold predecessors now? 
*/ > >>> + FOR_EACH_EDGE (e2, ei2, succ->preds) > >>> + { > >>> + if (BB_PARTITION (e2->src) == BB_COLD_PARTITION) > >>> + { > >>> + any_cold_preds = true; > >>> + break; > >>> + } > >>> + } > >>> + if (any_cold_preds) > >>> + continue; > >>> + > >>> + /* Here we have a successor of newly-hot bb that is cold > >>> + but no longer has any cold predecessors. Since the original > >>> + assignment of our newly-hot bb was incorrect, this successor's > >>> + assignment as cold is also suspect. Go ahead and re-mark it > >>> + as hot now too. Better heuristics may be in order here. */ > >>> + BB_SET_PARTITION (succ, BB_HOT_PARTITION); > >>> + cold_bb_count--; > >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ); > >>> + /* Examine this successor as a newly-hot bb. */ > >>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, succ); > >>> + } > >>> + } > >>> + } > >>> + > >>> + if (dom_calculated_here) > >>> + free_dominance_info (CDI_DOMINATORS); > >>> + } > >>> + > >>> /* The format of .gcc_except_table does not allow landing pads to > >>> be in a different partition as the throw. Fix this by either > >>> moving or duplicating the landing pads. */ > >>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void) > >>> new_bb->aux = cur_bb->aux; > >>> cur_bb->aux = new_bb; > >>> > >>> - /* Make sure new fall-through bb is in same > >>> - partition as bb it's falling through from. */ > >>> + /* This is done by force_nonfallthru_and_redirect.
*/ > >>> + gcc_assert (BB_PARTITION (new_bb) > >>> + == BB_PARTITION (cur_bb)); > >>> > >>> - BB_COPY_PARTITION (new_bb, cur_bb); > >>> single_succ_edge (new_bb)->flags |= EDGE_CROSSING; > >>> } > >>> else > >>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void) > >>> FOR_EACH_BB (bb) > >>> FOR_EACH_EDGE (e, ei, bb->succs) > >>> if ((e->flags & EDGE_CROSSING) > >>> - && JUMP_P (BB_END (e->src))) > >>> + && JUMP_P (BB_END (e->src)) > >>> + /* Some notes were added during fix_up_fall_thru_edges, via > >>> + force_nonfallthru_and_redirect. */ > >>> + && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX)) > >>> add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); > >>> } > >>> > >>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void) > >>> dump_flow_info (dump_file, dump_flags); > >>> } > >>> > >>> - if (flag_reorder_blocks_and_partition) > >>> + if (crtl->has_bb_partition) > >>> verify_hot_cold_block_grouping (); > >>> } > >>> > >>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void) > >>> encountering this note will make the compiler switch between the > >>> hot and cold text sections. */ > >>> > >>> -static void > >>> +void > >>> insert_section_boundary_note (void) > >>> { > >>> basic_block bb; > >>> rtx new_note; > >>> int first_partition = 0; > >>> > >>> - if (!flag_reorder_blocks_and_partition) > >>> + if (!crtl->has_bb_partition) > >>> return; > >>> > >>> FOR_EACH_BB (bb) > >>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void) > >>> FOR_EACH_BB (bb) > >>> if (bb->next_bb != EXIT_BLOCK_PTR) > >>> bb->aux = bb->next_bb; > >>> - cfg_layout_finalize (); > >>> + cfg_layout_finalize (true); > >>> > >>> - /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. 
*/ > >>> - insert_section_boundary_note (); > >>> return 0; > >>> } > >>> > >>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void) > >>> } > >>> > >>> done: > >>> - cfg_layout_finalize (); > >>> + cfg_layout_finalize (false); > >>> > >>> BITMAP_FREE (candidates); > >>> return 0; > >>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void) > >>> if (crossing_edges == NULL) > >>> return 0; > >>> > >>> + crtl->has_bb_partition = true; > >>> + > >>> /* Make sure the source of any crossing edge ends in a jump and the > >>> destination of any crossing edge has a label. */ > >>> add_labels_and_missing_jumps (crossing_edges); > >>> Index: bb-reorder.h > >>> =================================================================== > >>> --- bb-reorder.h (revision 193376) > >>> +++ bb-reorder.h (working copy) > >>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re > >>> > >>> extern int get_uncond_jump_length (void); > >>> > >>> +extern void insert_section_boundary_note (void); > >>> + > >>> +extern void emit_barrier_after_bb (basic_block bb); > >>> + > >>> #endif > >>> Index: basic-block.h > >>> =================================================================== > >>> --- basic-block.h (revision 193376) > >>> +++ basic-block.h (working copy) > >>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect > >>> extern bool contains_no_active_insn_p (const_basic_block); > >>> extern bool forwarder_block_p (const_basic_block); > >>> extern bool can_fallthru (basic_block, basic_block); > >>> +extern void fixup_partitions (void); > >>> > >>> /* In cfgbuild.c. */ > >>> extern void find_many_sub_basic_blocks (sbitmap); > >>> Index: cfgrtl.c > >>> =================================================================== > >>> --- cfgrtl.c (revision 193376) > >>> +++ cfgrtl.c (working copy) > >>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. 
If not see > >>> #include "tree.h" > >>> #include "hard-reg-set.h" > >>> #include "basic-block.h" > >>> +#include "bb-reorder.h" > >>> #include "regs.h" > >>> #include "flags.h" > >>> #include "function.h" > >>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3. If not see > >>> Only applicable if the CFG is in cfglayout mode. */ > >>> static GTY(()) rtx cfg_layout_function_footer; > >>> static GTY(()) rtx cfg_layout_function_header; > >>> +static bool had_sec_boundary_notes; > >>> > >>> static rtx skip_insns_after_block (basic_block); > >>> static void record_effective_endpoints (void); > >>> static rtx label_for_bb (basic_block); > >>> -static void fixup_reorder_chain (void); > >>> +static void fixup_reorder_chain (bool finalize_reorder_blocks); > >>> > >>> void verify_insn_chain (void); > >>> static void fixup_fallthru_exit_predecessor (void); > >>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc > >>> partition boundaries). See the comments at the top of > >>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ > >>> > >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) > >>> - || BB_PARTITION (src) != BB_PARTITION (target)) > >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) > >>> return NULL; > >>> > >>> /* We can replace or remove a complex jump only when we have exactly > >>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target) > >>> return e; > >>> } > >>> > >>> +/* Called when edge E has been redirected to a new destination, > >>> + in order to update the region crossing flag on the edge and > >>> + jump. 
*/ > >>> + > >>> +static void > >>> +fixup_partition_crossing (edge e, basic_block target) > >>> +{ > >>> + rtx note; > >>> + > >>> + gcc_assert (e->dest == target); > >>> + > >>> + if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR) > >>> + return; > >>> + /* If we redirected an existing edge, it may already be marked > >>> + crossing, even though the new src is missing a reg crossing note. > >>> + But make sure reg crossing note doesn't already exist before > >>> + inserting. */ > >>> + if (BB_PARTITION (e->src) != BB_PARTITION (target)) > >>> + { > >>> + e->flags |= EDGE_CROSSING; > >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); > >>> + if (JUMP_P (BB_END (e->src)) > >>> + && !note) > >>> + add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); > >>> + } > >>> + else if (BB_PARTITION (e->src) == BB_PARTITION (target)) > >>> + { > >>> + e->flags &= ~EDGE_CROSSING; > >>> + /* Remove the region crossing note from jump at end of > >>> + e->src if it exists. */ > >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); > >>> + if (note) > >>> + remove_note (BB_END (e->src), note); > >>> + } > >>> +} > >>> + > >>> +/* Called when block BB has been reassigned to a different partition, > >>> + to ensure that the region crossing attributes are updated. */ > >>> + > >>> +static void > >>> +fixup_bb_partition (basic_block bb) > >>> +{ > >>> + edge e; > >>> + edge_iterator ei; > >>> + > >>> + /* Now need to make bb's pred edges non-region crossing. */ > >>> + FOR_EACH_EDGE (e, ei, bb->preds) > >>> + { > >>> + fixup_partition_crossing (e, e->dest); > >>> + } > >>> + > >>> + /* Possibly need to make bb's successor edges region crossing, > >>> + or remove stale region crossing. 
*/ > >>> + FOR_EACH_EDGE (e, ei, bb->succs) > >>> + { > >>> + if ((e->flags & EDGE_FALLTHRU) > >>> + && BB_PARTITION (bb) != BB_PARTITION (e->dest) > >>> + && e->dest != EXIT_BLOCK_PTR) > >>> + /* force_nonfallthru_and_redirect calls fixup_partition_crossing. */ > >>> + force_nonfallthru (e); > >>> + else > >>> + fixup_partition_crossing (e, e->dest); > >>> + } > >>> +} > >>> + > >>> /* Attempt to change code to redirect edge E to TARGET. Don't do that on > >>> expense of adding new instructions or reordering basic blocks. > >>> > >>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block > >>> { > >>> edge ret; > >>> basic_block src = e->src; > >>> + basic_block dest = e->dest; > >>> > >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) > >>> return NULL; > >>> > >>> - if (e->dest == target) > >>> + if (dest == target) > >>> return e; > >>> > >>> if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL) > >>> { > >>> df_set_bb_dirty (src); > >>> + fixup_partition_crossing (ret, target); > >>> return ret; > >>> } > >>> > >>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block > >>> return NULL; > >>> > >>> df_set_bb_dirty (src); > >>> + fixup_partition_crossing (ret, target); > >>> return ret; > >>> } > >>> > >>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc > >>> /* Make sure new block ends up in correct hot/cold section. */ > >>> > >>> BB_COPY_PARTITION (jump_block, e->src); > >>> - if (flag_reorder_blocks_and_partition > >>> - && targetm_common.have_named_sections > >>> - && JUMP_P (BB_END (jump_block)) > >>> - && !any_condjump_p (BB_END (jump_block)) > >>> - && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING)) > >>> - add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX); > >>> > >>> /* Wire edge in. 
*/ > >>> new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU); > >>> new_edge->probability = probability; > >>> new_edge->count = count; > >>> > >>> + /* If e->src was previously region crossing, it no longer is > >>> + and the reg crossing note should be removed. */ > >>> + fixup_partition_crossing (new_edge, jump_block); > >>> + > >>> /* Redirect old edge. */ > >>> redirect_edge_pred (e, jump_block); > >>> e->probability = REG_BR_PROB_BASE; > >>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc > >>> LABEL_NUSES (label)++; > >>> } > >>> > >>> - emit_barrier_after (BB_END (jump_block)); > >>> + /* We might be in cfg layout mode, and if so, the following routine will > >>> + insert the barrier correctly. */ > >>> + emit_barrier_after_bb (jump_block); > >>> redirect_edge_succ_nodup (e, target); > >>> > >>> if (abnormal_edge_flags) > >>> make_edge (src, target, abnormal_edge_flags); > >>> > >>> df_mark_solutions_dirty (); > >>> + fixup_partition_crossing (e, target); > >>> return new_bb; > >>> } > >>> > >>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU > >>> static basic_block > >>> rtl_split_edge (edge edge_in) > >>> { > >>> - basic_block bb; > >>> + basic_block bb, new_bb; > >>> rtx before; > >>> > >>> /* Abnormal edges cannot be split. */ > >>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in) > >>> else > >>> { > >>> bb = create_basic_block (before, NULL, edge_in->dest->prev_bb); > >>> - /* ??? Why not edge_in->dest->prev_bb here? */ > >>> - BB_COPY_PARTITION (bb, edge_in->dest); > >>> + if (edge_in->src == ENTRY_BLOCK_PTR) > >>> + BB_COPY_PARTITION (bb, edge_in->dest); > >>> + else > >>> + /* Put the split bb into the src partition, to avoid creating > >>> + a situation where a cold bb dominates a hot bb, in the case > >>> + where src is cold and dest is hot. The src will dominate > >>> + the new bb (whereas it might not have dominated dest). 
*/ > >>> + BB_COPY_PARTITION (bb, edge_in->src); > >>> } > >>> > >>> make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU); > >>> > >>> + /* Can't allow a region crossing edge to be fallthrough. */ > >>> + if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest) > >>> + && edge_in->dest != EXIT_BLOCK_PTR) > >>> + { > >>> + new_bb = force_nonfallthru (single_succ_edge (bb)); > >>> + gcc_assert (!new_bb); > >>> + } > >>> + > >>> /* For non-fallthru edges, we must adjust the predecessor's > >>> jump instruction to target our new block. */ > >>> if ((edge_in->flags & EDGE_FALLTHRU) == 0) > >>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e) > >>> else > >>> { > >>> bb = split_edge (e); > >>> - after = BB_END (bb); > >>> > >>> - if (flag_reorder_blocks_and_partition > >>> - && targetm_common.have_named_sections > >>> - && e->src != ENTRY_BLOCK_PTR > >>> - && BB_PARTITION (e->src) == BB_COLD_PARTITION > >>> - && !(e->flags & EDGE_CROSSING) > >>> - && JUMP_P (after) > >>> - && !any_condjump_p (after) > >>> - && (single_succ_edge (bb)->flags & EDGE_CROSSING)) > >>> - add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX); > >>> + /* If e crossed a partition boundary, we needed to make bb end in > >>> + a region-crossing jump, even though it was originally fallthru. */ > >>> + if (JUMP_P (BB_END (bb))) > >>> + before = BB_END (bb); > >>> + else > >>> + after = BB_END (bb); > >>> } > >>> > >>> /* Now that we've found the spot, do the insertion. */ > >>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void) > >>> { > >>> basic_block bb; > >>> > >>> + /* Optimization passes that invoke this routine can cause hot blocks > >>> + previously reached by both hot and cold blocks to become dominated only > >>> + by cold blocks. This will cause the verification below to fail, > >>> + and lead to now cold code in the hot section. In some cases this > >>> + may only be visible after newly unreachable blocks are deleted, > >>> + which will be done by fixup_partitions. 
*/ > >>> + fixup_partitions (); > >>> + > >>> #ifdef ENABLE_CHECKING > >>> verify_flow_info (); > >>> #endif > >>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb) > >>> > >>> return end; > >>> } > >>> - > >>> + > >>> +/* Perform cleanup on the hot/cold bb partitioning after optimization > >>> + passes that modify the cfg. */ > >>> + > >>> +void > >>> +fixup_partitions (void) > >>> +{ > >>> + basic_block bb; > >>> + > >>> + if (!crtl->has_bb_partition) > >>> + return; > >>> + > >>> + /* Delete any blocks that became unreachable and weren't > >>> + already cleaned up, for example during edge forwarding > >>> + and convert_jumps_to_returns. This will expose more > >>> + opportunities for fixing the partition boundaries here. > >>> + Also, the calculation of the dominance graph during verification > >>> + will assert if there are unreachable nodes. */ > >>> + delete_unreachable_blocks (); > >>> + > >>> + /* If there are partitions, do a sanity check on them: A basic block in > >>> + a cold partition cannot dominate a basic block in a hot partition. > >>> + Fixup any that now violate this requirement, as a result of edge > >>> + forwarding and unreachable block deletion. */ > >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; > >>> + VEC (basic_block, heap) *bbs_to_fix = NULL; > >>> + FOR_EACH_BB (bb) > >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) > >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); > >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) > >>> + { > >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); > >>> + basic_block son; > >>> + > >>> + if (dom_calculated_here) > >>> + calculate_dominance_info (CDI_DOMINATORS); > >>> + > >>> + while (! VEC_empty (basic_block, bbs_in_cold_partition)) > >>> + { > >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); > >>> + /* If bb is not yet cold (because it was added below as > >>> + a block dominated by a cold bb) then mark it cold here. 
*/ > >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) > >>> + { > >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); > >>> + VEC_safe_push (basic_block, heap, bbs_to_fix, bb); > >>> + } > >>> + /* Any blocks dominated by a block in the cold section > >>> + must also be cold. */ > >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); > >>> + son; > >>> + son = next_dom_son (CDI_DOMINATORS, son)) > >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); > >>> + } > >>> + > >>> + if (dom_calculated_here) > >>> + free_dominance_info (CDI_DOMINATORS); > >>> + } > >>> + > >>> + /* Do the partition fixup after all necessary blocks have been converted to > >>> + cold, so that we only update the region crossings the minimum number of > >>> + places, which can require forcing edges to be non fallthru. */ > >>> + while (! VEC_empty (basic_block, bbs_to_fix)) > >>> + { > >>> + bb = VEC_pop (basic_block, bbs_to_fix); > >>> + fixup_bb_partition (bb); > >>> + } > >>> +} > >>> + > >>> /* Verify the CFG and RTL consistency common for both underlying RTL and > >>> cfglayout RTL. > >>> > >>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void) > >>> rtx x; > >>> int err = 0; > >>> basic_block bb; > >>> + bool have_partitions = false; > >>> > >>> /* Check the general integrity of the basic blocks. */ > >>> FOR_EACH_BB_REVERSE (bb) > >>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void) > >>> > >>> if (e->flags & EDGE_ABNORMAL) > >>> n_abnormal++; > >>> + > >>> + have_partitions |= is_crossing; > >>> } > >>> > >>> if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX)) > >>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void) > >>> } > >>> } > >>> > >>> + /* If there are partitions, do a sanity check on them: A basic block in > >>> + a cold partition cannot dominate a basic block in a hot partition. 
*/ > >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; > >>> + if (have_partitions && !err) > >>> + FOR_EACH_BB (bb) > >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) > >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); > >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) > >>> + { > >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); > >>> + basic_block son; > >>> + > >>> + if (dom_calculated_here) > >>> + calculate_dominance_info (CDI_DOMINATORS); > >>> + > >>> + while (! VEC_empty (basic_block, bbs_in_cold_partition)) > >>> + { > >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); > >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) > >>> + { > >>> + error ("non-cold basic block %d dominated " > >>> + "by a block in the cold partition", bb->index); > >>> + err = 1; > >>> + } > >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); > >>> + son; > >>> + son = next_dom_son (CDI_DOMINATORS, son)) > >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); > >>> + } > >>> + > >>> + if (dom_calculated_here) > >>> + free_dominance_info (CDI_DOMINATORS); > >>> + } > >>> + > >>> /* Clean up. */ > >>> return err; > >>> } > >>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void) > >>> else > >>> cfg_layout_function_header = NULL_RTX; > >>> > >>> + had_sec_boundary_notes = false; > >>> + > >>> next_insn = get_insns (); > >>> FOR_EACH_BB (bb) > >>> { > >>> rtx end; > >>> > >>> if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb)) > >>> - BB_HEADER (bb) = unlink_insn_chain (next_insn, > >>> - PREV_INSN (BB_HEAD (bb))); > >>> + { > >>> + /* Rather than try to keep section boundary notes incrementally > >>> + up-to-date through cfg layout optimizations, simply remove them > >>> + and flag that they should be re-inserted when exiting > >>> + cfg layout mode. 
*/ > >>> + rtx check_insn = next_insn; > >>> + while (check_insn) > >>> + { > >>> + if (NOTE_P (check_insn) > >>> + && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS) > >>> + { > >>> + had_sec_boundary_notes |= true; > >>> + /* Remove note from chain. Grab new next_insn first. */ > >>> + if (next_insn == check_insn) > >>> + next_insn = NEXT_INSN (check_insn); > >>> + /* Delete note. */ > >>> + delete_insn (check_insn); > >>> + /* There will only be one. */ > >>> + break; > >>> + } > >>> + check_insn = NEXT_INSN (check_insn); > >>> + } > >>> + /* If we still have header instructions left after above loop. */ > >>> + if (next_insn != BB_HEAD (bb)) > >>> + BB_HEADER (bb) = unlink_insn_chain (next_insn, > >>> + PREV_INSN (BB_HEAD (bb))); > >>> + } > >>> end = skip_insns_after_block (bb); > >>> if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end) > >>> BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end); > >>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void) > >>> if (bb->next_bb != EXIT_BLOCK_PTR) > >>> bb->aux = bb->next_bb; > >>> > >>> - cfg_layout_finalize (); > >>> + cfg_layout_finalize (false); > >>> > >>> return 0; > >>> } > >>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode) > >>> } > >>> > >>> > >>> -/* Given a reorder chain, rearrange the code to match. */ > >>> +/* Given a reorder chain, rearrange the code to match. If > >>> + this is called when we will FINALIZE_REORDER_BLOCKS, or when > >>> + section boundary notes were removed on entry to cfg layout > >>> + mode, insert section boundary notes here. 
*/ > >>> > >>> static void > >>> -fixup_reorder_chain (void) > >>> +fixup_reorder_chain (bool finalize_reorder_blocks) > >>> { > >>> basic_block bb; > >>> rtx insn = NULL; > >>> @@ -3150,7 +3373,7 @@ static void > >>> PREV_INSN (BB_HEADER (bb)) = insn; > >>> insn = BB_HEADER (bb); > >>> while (NEXT_INSN (insn)) > >>> - insn = NEXT_INSN (insn); > >>> + insn = NEXT_INSN (insn); > >>> } > >>> if (insn) > >>> NEXT_INSN (insn) = BB_HEAD (bb); > >>> @@ -3175,6 +3398,11 @@ static void > >>> insn = NEXT_INSN (insn); > >>> > >>> set_last_insn (insn); > >>> + > >>> + /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. */ > >>> + if (had_sec_boundary_notes || finalize_reorder_blocks) > >>> + insert_section_boundary_note (); > >>> + > >>> #ifdef ENABLE_CHECKING > >>> verify_insn_chain (); > >>> #endif > >>> @@ -3187,7 +3415,7 @@ static void > >>> edge e_fall, e_taken, e; > >>> rtx bb_end_insn; > >>> rtx ret_label = NULL_RTX; > >>> - basic_block nb, src_bb; > >>> + basic_block nb; > >>> edge_iterator ei; > >>> > >>> if (EDGE_COUNT (bb->succs) == 0) > >>> @@ -3322,7 +3550,6 @@ static void > >>> /* We got here if we need to add a new jump insn. > >>> Note force_nonfallthru can delete E_FALL and thus we have to > >>> save E_FALL->src prior to the call to force_nonfallthru. */ > >>> - src_bb = e_fall->src; > >>> nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label); > >>> if (nb) > >>> { > >>> @@ -3330,17 +3557,6 @@ static void > >>> bb->aux = nb; > >>> /* Don't process this new block. */ > >>> bb = nb; > >>> - > >>> - /* Make sure new bb is tagged for correct section (same as > >>> - fall-thru source, since you cannot fall-thru across > >>> - section boundaries). 
*/ > >>> - BB_COPY_PARTITION (src_bb, single_pred (bb)); > >>> - if (flag_reorder_blocks_and_partition > >>> - && targetm_common.have_named_sections > >>> - && JUMP_P (BB_END (bb)) > >>> - && !any_condjump_p (BB_END (bb)) > >>> - && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING)) > >>> - add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX); > >>> } > >>> } > >>> > >>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to) > >>> case NOTE_INSN_FUNCTION_BEG: > >>> /* There is always just single entry to function. */ > >>> case NOTE_INSN_BASIC_BLOCK: > >>> + /* We should only switch text sections once. */ > >>> + case NOTE_INSN_SWITCH_TEXT_SECTIONS: > >>> break; > >>> > >>> case NOTE_INSN_EPILOGUE_BEG: > >>> - case NOTE_INSN_SWITCH_TEXT_SECTIONS: > >>> emit_note_copy (insn); > >>> break; > >>> > >>> @@ -3759,10 +3976,13 @@ break_superblocks (void) > >>> } > >>> > >>> /* Finalize the changes: reorder insn list according to the sequence specified > >>> - by aux pointers, enter compensation code, rebuild scope forest. */ > >>> + by aux pointers, enter compensation code, rebuild scope forest. If > >>> + this is called when we will FINALIZE_REORDER_BLOCKS, indicate that > >>> + to fixup_reorder_chain so that it can insert the proper switch text > >>> + section notes. 
*/ > >>> > >>> void > >>> -cfg_layout_finalize (void) > >>> +cfg_layout_finalize (bool finalize_reorder_blocks) > >>> { > >>> #ifdef ENABLE_CHECKING > >>> verify_flow_info (); > >>> @@ -3775,7 +3995,7 @@ void > >>> #endif > >>> ) > >>> fixup_fallthru_exit_predecessor (); > >>> - fixup_reorder_chain (); > >>> + fixup_reorder_chain (finalize_reorder_blocks); > >>> > >>> rebuild_jump_labels (get_insns ()); > >>> delete_dead_jumptables (); > >>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e) > >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) > >>> return false; > >>> > >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) > >>> - || BB_PARTITION (src) != BB_PARTITION (target)) > >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) > >>> return false; > >>> > >>> if (!onlyjump_p (insn) > >>> > >>> -- > >>> This patch is available for review at http://codereview.appspot.com/6823047 > >> > >> > >> > >> -- > >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 > > > > -- > Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
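Note on the invariant checked in rtl_verify_flow_info_1 above: the quoted patch walks the dominator tree downward from every cold block and reports any non-cold block it reaches. The sketch below shows the same invariant outside GCC, under stated assumptions: `struct block`, `count_hot_dominated_by_cold` and `promote_cold_dominators` are hypothetical names, not GCC APIs, and the dominator tree is given as a plain array of immediate-dominator indices. It walks upward from each hot block instead of downward from each cold one, and the repair step covers only the "re-mark a cold dominator as hot" part of find_rarely_executed_basic_blocks_and_crossing_edges, not the successor heuristic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum partition { HOT, COLD };

/* Hypothetical stand-in for a GCC basic block: only the fields the
   check needs.  idom is the array index of the immediate dominator,
   or -1 for the entry block.  The idom links are assumed to form a
   tree (no cycles), as they do for a real dominator tree.  */
struct block {
  enum partition part;
  int idom;
};

/* Return true if any strict dominator of block I is cold.  */
static bool dominated_by_cold(const struct block *blocks, int i)
{
  for (int d = blocks[i].idom; d >= 0; d = blocks[d].idom)
    if (blocks[d].part == COLD)
      return true;
  return false;
}

/* Count the hot blocks that violate the invariant "no cold block
   dominates a hot block".  A result of 0 means the partitioning is
   consistent.  */
int count_hot_dominated_by_cold(const struct block *blocks, int n)
{
  assert(blocks != NULL || n == 0);
  int bad = 0;
  for (int i = 0; i < n; i++)
    if (blocks[i].part == HOT && dominated_by_cold(blocks, i))
      bad++;
  return bad;
}

/* Repair step: re-mark every cold dominator of a hot block as hot.
   Returns the number of blocks promoted.  */
int promote_cold_dominators(struct block *blocks, int n)
{
  assert(blocks != NULL || n == 0);
  int promoted = 0;
  for (int i = 0; i < n; i++) {
    if (blocks[i].part != HOT)
      continue;
    /* Walk the whole dominator chain so that cold ancestors above an
       already-hot ancestor are promoted as well.  */
    for (int d = blocks[i].idom; d >= 0; d = blocks[d].idom)
      if (blocks[d].part == COLD) {
        blocks[d].part = HOT;
        promoted++;
      }
  }
  return promoted;
}
```

Walking upward costs more than the worklist in the patch when the dominator tree is deep, but it needs no child lists, which keeps the sketch short.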
Sorry, I don't know what happened there. Patch is attached. Thanks, Teresa On Mon, Nov 26, 2012 at 12:42 PM, Jack Howarth <howarth@bromo.med.uc.edu> wrote: > On Mon, Nov 26, 2012 at 12:19:55PM -0800, Teresa Johnson wrote: >> Are you sure you have all my changes applied? I applied the 4 patches >> attached to PR55121 into my trunk checkout that has my fixes, and to a >> pristine trunk checkout. I configured and built both for >> --target=arm-none-linux-gnueabi, and built using your options, .i file >> and gcda file. I can reproduce the failure using the pristine trunk >> with your patches but not with my fixed trunk + your patches. (I just >> updated to head to pick up recent changes and get the same result. The >> vec changes required some manual changes to the patch, which I will >> resend shortly.) > > Teresa, > Your mailer seems to have corrupted the posted patch with stray > =3D characters and line breaks. Can you repost a copy as an attachment > to the list? > Jack > >> >> Without my fixes: >> >> $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreprocessed >> eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 >> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp >> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use >> -fno-common -o eval.s -freorder-blocks-and-partition >> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) >> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version >> 2.4.2-p1, MPC version 0.8.1 >> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 >> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) >> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version >> 2.4.2-p1, MPC version 0.8.1 >> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 >> Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a >> eval.c: In function ‘Ge’: >> eval.c:792:1: internal compiler error: in df_compact_blocks, at
df-core.c:1560 >> } >> ^ >> 0x622f71 df_compact_blocks() >> ../../gcc_trunk_3/gcc/df-core.c:1560 >> 0x5cfcb5 compact_blocks() >> ../../gcc_trunk_3/gcc/cfg.c:162 >> 0xc9dce0 reorder_basic_blocks >> ../../gcc_trunk_3/gcc/bb-reorder.c:2154 >> 0xc9dce0 rest_of_handle_reorder_blocks >> ../../gcc_trunk_3/gcc/bb-reorder.c:2219 >> Please submit a full bug report, >> with preprocessed source if appropriate. >> Please include the complete backtrace with any bug report. >> See <http://gcc.gnu.org/bugs.html> for instructions. >> >> >> With my fixes: >> >> $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreprocessed >> eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 >> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp >> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use >> -fno-common -o eval.s -freorder-blocks-and-partition >> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) >> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version >> 2.4.2-p1, MPC version 0.8.1 >> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 >> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) >> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version >> 2.4.2-p1, MPC version 0.8.1 >> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 >> Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3 >> >> >> Thanks, >> Teresa >> >> On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon >> <christophe.lyon@linaro.org> wrote: >> > Hi, >> > >> > I have tested your patch on Spec2000 on ARM, and I can still see >> > several failures caused by: >> > "error: fallthru edge crosses section boundary", including the case >> > described in PR55121. >> > >> > On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote: >> >> Ping.
>> >> Teresa >> >> >> >> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote: >> >>> Revised patch that fixes failures encountered when enabling >> >>> -freorder-blocks-and-partition, including the failure reported in PR 53743. >> >>> >> >>> This includes new verification code to ensure no cold blocks dominate hot >> >>> blocks contributed by Steven Bosscher. >> >>> >> >>> I attempted to make the handling of partition updates through the optimization >> >>> passes much more consistent, removing a number of partial fixes in the code >> >>> stream in the process. The code to fixup partitions (including the BB_PARTITION >> >>> assignment, region crossing jump notes, and switch text section notes) is >> >>> now handled in a few centralized locations. For example, inside >> >>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers >> >>> don't need to attempt the fixup themselves. >> >>> >> >>> For optimization passes that make adjustments to the cfg while in cfg layout >> >>> mode that are not easy to fix up incrementally, the new routine >> >>> fixup_partitions handles the cleanup globally. This does require calculation >> >>> of the dominance relation; however, as far as I can tell the routines which >> >>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions) >> >>> are invoked typically once (or a small number of times in the case of >> >>> try_optimize_cfg) per optimization pass. Additionally, I compared the >> >>> -ftime-report output for some large fdo compilations and saw only minimal >> >>> increases in the dominance computation times, which were only a tiny percentage >> >>> of the overall compile time. >> >>> >> >>> Additionally, I added a flag to the rtl_data structure to indicate whether >> >>> any partitioning was actually performed, so that optimizations which were >> >>> conservatively disabled whenever the flag_reorder_blocks_and_partition >> >>> is enabled (e.g.
try_crossjump_to_edge, part of connect_traces) can be less >> >>> conservative for functions where no partitions were formed (e.g. they are >> >>> completely hot). >> >>> >> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int >> >>> benchmarks and internal google benchmarks using profile feedback and >> >>> -freorder-blocks-and-partition to get more coverage. Ok for trunk? >> >>> >> >>> Thanks, >> >>> Teresa >> >>> >> >>> 2012-11-14 Teresa Johnson <tejohnson@google.com> >> >>> Steven Bosscher <steven@gcc.gnu.org> >> >>> >> >>> * cfghooks.h (cfg_layout_finalize): New parameter. >> >>> * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize >> >>> parameter. >> >>> * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert >> >>> as this is now done by redirect_edge_and_branch_force. >> >>> * function.c (thread_prologue_and_epilogue_insns): Insert new bb after >> >>> barriers, new cfg_layout_finalize parameter, and don't store exit >> >>> predecessor BB until after it is potentially split. >> >>> * function.h (struct rtl_data): New flag has_bb_partition. >> >>> * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter. >> >>> * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if >> >>> any blocks in function actually partitioned. >> >>> (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean >> >>> up partitioning. >> >>> * bb-reorder.c (connect_traces): Only look for partitions and skip >> >>> block copying if any blocks in function actually partitioned. >> >>> (emit_barrier_after_bb): Handle insertion in non-cfglayout mode. >> >>> (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure >> >>> that no cold blocks dominate a hot block. >> >>> (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert >> >>> as this is now done by force_nonfallthru_and_redirect. 
>> >>> (add_reg_crossing_jump_notes): Handle the fact that some jumps may >> >>> already be marked with region crossing note. >> >>> (reorder_basic_blocks): Only need to verify partitions if any >> >>> blocks in function actually partitioned. >> >>> (insert_section_boundary_note): Only need to insert note if any >> >>> blocks in function actually partitioned. >> >>> (rest_of_handle_reorder_blocks): New cfg_layout_finalize >> >>> parameter, and remove call to insert_section_boundary_note as this >> >>> is now called via cfg_layout_finalize/fixup_reorder_chain. >> >>> (duplicate_computed_gotos): New cfg_layout_finalize >> >>> parameter. >> >>> (partition_hot_cold_basic_blocks): Set flag indicating function >> >>> has bb partitions. >> >>> * bb-reorder.h: Declare insert_section_boundary_note and >> >>> emit_barrier_after_bb, which are no longer static. >> >>> * basic-block.h: Declare new function fixup_partitions. >> >>> * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary >> >>> check for region crossing note. >> >>> (fixup_partition_crossing): New function. >> >>> (fixup_bb_partition): Ditto. >> >>> (rtl_redirect_edge_and_branch): Fixup partition boundaries. >> >>> (force_nonfallthru_and_redirect): Fixup partition boundaries, >> >>> remove old code that tried to do this. Emit barrier correctly >> >>> when we are in cfglayout mode. >> >>> (rtl_split_edge): Correctly fixup partition boundaries. >> >>> (commit_one_edge_insertion): Remove old code that tried to >> >>> fixup region crossing edge since this is now handled in >> >>> split_block, and set up insertion point correctly since >> >>> block may now end in a jump. >> >>> (commit_edge_insertions): Invoke fixup_partitions to sanitize partition >> >>> boundaries after optimizations that modify cfg and before trying to >> >>> verify the flow info. >> >>> (fixup_partitions): New function. >> >>> (rtl_verify_flow_info_1): Add verification that no cold bbs dominate >> >>> hot bbs. 
>> >>> (record_effective_endpoints): Remove region-crossing notes and set flag >> >>> indicating that they need to be reinserted on exit from cfglayout mode. >> >>> (outof_cfg_layout_mode): New cfg_layout_finalize parameter. >> >>> (fixup_reorder_chain): Call insert_section_boundary_note if necessary. >> >>> Remove old code that attempted to fixup region crossing note as >> >>> this is now handled in force_nonfallthru_and_redirect. >> >>> (duplicate_insn_chain): Don't duplicate switch section notes. >> >>> (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain. >> >>> (rtl_can_remove_branch_p): Remove unnecessary check for region crossing >> >>> note. >> >>> >> >>> Index: cfghooks.h >> >>> =================================================================== >> >>> --- cfghooks.h (revision 193376) >> >>> +++ cfghooks.h (working copy) >> >>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas >> >>> void account_profile_record (struct profile_record *, int); >> >>> >> >>> extern void cfg_layout_initialize (unsigned int); >> >>> -extern void cfg_layout_finalize (void); >> >>> +extern void cfg_layout_finalize (bool); >> >>> >> >>> /* Hooks containers. 
*/ >> >>> extern struct cfg_hooks gimple_cfg_hooks; >> >>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi >> >>> extern void gimple_register_cfg_hooks (void); >> >>> extern struct cfg_hooks get_cfg_hooks (void); >> >>> extern void set_cfg_hooks (struct cfg_hooks); >> >>> - >> >>> Index: modulo-sched.c >> >>> =================================================================== >> >>> --- modulo-sched.c (revision 193376) >> >>> +++ modulo-sched.c (working copy) >> >>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void) >> >>> if (bb->next_bb != EXIT_BLOCK_PTR) >> >>> bb->aux = bb->next_bb; >> >>> free_dominance_info (CDI_DOMINATORS); >> >>> - cfg_layout_finalize (); >> >>> + cfg_layout_finalize (false); >> >>> #endif /* INSN_SCHEDULING */ >> >>> return 0; >> >>> } >> >>> Index: ifcvt.c >> >>> =================================================================== >> >>> --- ifcvt.c (revision 193376) >> >>> +++ ifcvt.c (working copy) >> >>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg >> >>> if (new_bb) >> >>> { >> >>> df_bb_replace (then_bb_index, new_bb); >> >>> - /* Since the fallthru edge was redirected from test_bb to new_bb, >> >>> - we need to ensure that new_bb is in the same partition as >> >>> - test bb (you can not fall through across section boundaries). */ >> >>> - BB_COPY_PARTITION (new_bb, test_bb); >> >>> + /* This should have been done above via force_nonfallthru_and_redirect >> >>> + (possibly called from redirect_edge_and_branch_force). 
*/ >> >>> + gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb)); >> >>> } >> >>> >> >>> num_true_changes++; >> >>> Index: function.c >> >>> =================================================================== >> >>> --- function.c (revision 193376) >> >>> +++ function.c (working copy) >> >>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void) >> >>> break; >> >>> if (e) >> >>> { >> >>> - copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)), >> >>> - NULL_RTX, e->src); >> >>> + /* Make sure we insert after any barriers. */ >> >>> + rtx end = get_last_bb_insn (e->src); >> >>> + copy_bb = create_basic_block (NEXT_INSN (end), >> >>> + NULL_RTX, e->src); >> >>> BB_COPY_PARTITION (copy_bb, e->src); >> >>> } >> >>> else >> >>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void) >> >>> if (cur_bb->index >= NUM_FIXED_BLOCKS >> >>> && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS) >> >>> cur_bb->aux = cur_bb->next_bb; >> >>> - cfg_layout_finalize (); >> >>> + cfg_layout_finalize (false); >> >>> } >> >>> >> >>> epilogue_done: >> >>> @@ -6517,7 +6519,7 @@ epilogue_done: >> >>> basic_block simple_return_block_cold = NULL; >> >>> edge pending_edge_hot = NULL; >> >>> edge pending_edge_cold = NULL; >> >>> - basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb; >> >>> + basic_block exit_pred; >> >>> int i; >> >>> >> >>> gcc_assert (entry_edge != orig_entry_edge); >> >>> @@ -6545,6 +6547,12 @@ epilogue_done: >> >>> else >> >>> pending_edge_cold = e; >> >>> } >> >>> + >> >>> + /* Save a pointer to the exit's predecessor BB for use in >> >>> + inserting new BBs at the end of the function. Do this >> >>> + after the call to split_block above which may split >> >>> + the original exit pred. 
*/ >> >>> + exit_pred = EXIT_BLOCK_PTR->prev_bb; >> >>> >> >>> FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e) >> >>> { >> >>> Index: function.h >> >>> =================================================================== >> >>> --- function.h (revision 193376) >> >>> +++ function.h (working copy) >> >>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data { >> >>> sched2) and is useful only if the port defines LEAF_REGISTERS. */ >> >>> bool uses_only_leaf_regs; >> >>> >> >>> + /* Nonzero if the function being compiled has undergone hot/cold partitioning >> >>> + (under flag_reorder_blocks_and_partition) and has at least one cold >> >>> + block. */ >> >>> + bool has_bb_partition; >> >>> + >> >>> /* Like regs_ever_live, but 1 if a reg is set or clobbered from an >> >>> asm. Unlike regs_ever_live, elements of this array corresponding >> >>> to eliminable regs (like the frame pointer) are set if an asm >> >>> Index: hw-doloop.c >> >>> =================================================================== >> >>> --- hw-doloop.c (revision 193376) >> >>> +++ hw-doloop.c (working copy) >> >>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops) >> >>> else >> >>> bb->aux = NULL; >> >>> } >> >>> - cfg_layout_finalize (); >> >>> + cfg_layout_finalize (false); >> >>> clear_aux_for_blocks (); >> >>> df_analyze (); >> >>> } >> >>> Index: cfgcleanup.c >> >>> =================================================================== >> >>> --- cfgcleanup.c (revision 193376) >> >>> +++ cfgcleanup.c (working copy) >> >>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2, >> >>> partition boundaries). See the comments at the top of >> >>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ >> >>> >> >>> - if (flag_reorder_blocks_and_partition && reload_completed) >> >>> + if (crtl->has_bb_partition && reload_completed) >> >>> return false; >> >>> >> >>> /* Search backward through forwarder blocks. 
We don't need to worry >> >>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode) >> >>> df_analyze (); >> >>> } >> >>> >> >>> + if (changed) >> >>> + { >> >>> + /* Edge forwarding in particular can cause hot blocks previously >> >>> + reached by both hot and cold blocks to become dominated only >> >>> + by cold blocks. This will cause the verification below to fail, >> >>> + and lead to now cold code in the hot section. This is not easy >> >>> + to detect and fix during edge forwarding, and in some cases >> >>> + is only visible after newly unreachable blocks are deleted, >> >>> + which will be done in fixup_partitions. */ >> >>> + fixup_partitions (); >> >>> + >> >>> #ifdef ENABLE_CHECKING >> >>> - if (changed) >> >>> - verify_flow_info (); >> >>> + verify_flow_info (); >> >>> #endif >> >>> + } >> >>> >> >>> changed_overall |= changed; >> >>> first_pass = false; >> >>> Index: bb-reorder.c >> >>> =================================================================== >> >>> --- bb-reorder.c (revision 193376) >> >>> +++ bb-reorder.c (working copy) >> >>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces >> >>> current_partition = BB_PARTITION (traces[0].first); >> >>> two_passes = false; >> >>> >> >>> - if (flag_reorder_blocks_and_partition) >> >>> + if (crtl->has_bb_partition) >> >>> for (i = 0; i < n_traces && !two_passes; i++) >> >>> if (BB_PARTITION (traces[0].first) >> >>> != BB_PARTITION (traces[i].first)) >> >>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces >> >>> } >> >>> } >> >>> >> >>> - if (flag_reorder_blocks_and_partition) >> >>> + if (crtl->has_bb_partition) >> >>> try_copy = false; >> >>> >> >>> /* Copy tiny blocks always; copy larger blocks only when the >> >>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void) >> >>> return length; >> >>> } >> >>> >> >>> -/* Emit a barrier into the footer of BB. */ >> >>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode. 
*/ >> >>> >> >>> -static void >> >>> +void >> >>> emit_barrier_after_bb (basic_block bb) >> >>> { >> >>> rtx barrier = emit_barrier_after (BB_END (bb)); >> >>> - BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); >> >>> + if (current_ir_type () == IR_RTL_CFGLAYOUT) >> >>> + BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); >> >>> } >> >>> >> >>> /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions. >> >>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg >> >>> { >> >>> VEC(edge, heap) *crossing_edges = NULL; >> >>> basic_block bb; >> >>> - edge e; >> >>> - edge_iterator ei; >> >>> + edge e, e2; >> >>> + edge_iterator ei, ei2; >> >>> + unsigned int cold_bb_count = 0; >> >>> + VEC (basic_block, heap) *bbs_in_hot_partition = NULL; >> >>> + VEC (basic_block, heap) *bbs_newly_hot = NULL; >> >>> >> >>> /* Mark which partition (hot/cold) each basic block belongs in. */ >> >>> FOR_EACH_BB (bb) >> >>> { >> >>> if (probably_never_executed_bb_p (cfun, bb)) >> >>> - BB_SET_PARTITION (bb, BB_COLD_PARTITION); >> >>> + { >> >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); >> >>> + cold_bb_count++; >> >>> + } >> >>> else >> >>> - BB_SET_PARTITION (bb, BB_HOT_PARTITION); >> >>> + { >> >>> + BB_SET_PARTITION (bb, BB_HOT_PARTITION); >> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb); >> >>> + } >> >>> } >> >>> >> >>> + /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of >> >>> + several different possibilities. One is that there are edge weight insanities >> >>> + due to optimization phases that do not properly update basic block profile >> >>> + counts. The second is that the entry of the function may not be hot, because >> >>> + it is entered fewer times than the number of profile training runs, but there >> >>> + is a loop inside the function that causes blocks within the function to be >> >>> + above the threshold for hotness. 
*/ >> >>> + if (cold_bb_count) >> >>> + { >> >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >> >>> + >> >>> + if (dom_calculated_here) >> >>> + calculate_dominance_info (CDI_DOMINATORS); >> >>> + >> >>> + /* Keep examining hot bbs until we have either checked them all, or >> >>> + re-marked all cold bbs hot. */ >> >>> + while (! VEC_empty (basic_block, bbs_in_hot_partition) >> >>> + && cold_bb_count) >> >>> + { >> >>> + basic_block dom_bb; >> >>> + >> >>> + bb = VEC_pop (basic_block, bbs_in_hot_partition); >> >>> + dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb); >> >>> + >> >>> + /* If bb's immediate dominator is also hot then it is ok. */ >> >>> + if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION) >> >>> + continue; >> >>> + >> >>> + /* We have a hot bb with an immediate dominator that is cold. >> >>> + The dominator needs to be re-marked to hot. */ >> >>> + BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION); >> >>> + cold_bb_count--; >> >>> + >> >>> + /* Now we need to examine newly-hot dom_bb to see if it is also >> >>> + dominated by a cold bb. */ >> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb); >> >>> + >> >>> + /* We should also adjust any cold blocks that the newly-hot bb >> >>> + feeds and see if it makes sense to re-mark those as hot as >> >>> + well. */ >> >>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb); >> >>> + while (! VEC_empty (basic_block, bbs_newly_hot)) >> >>> + { >> >>> + basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot); >> >>> + /* Examine all successors of this newly-hot bb to see if they >> >>> + are cold and should be re-marked as hot. */ >> >>> + FOR_EACH_EDGE (e, ei, new_hot_bb->succs) >> >>> + { >> >>> + bool any_cold_preds = false; >> >>> + basic_block succ = e->dest; >> >>> + if (BB_PARTITION (succ) != BB_COLD_PARTITION) >> >>> + continue; >> >>> + /* Does this block have any cold predecessors now? 
*/ >> >>> + FOR_EACH_EDGE (e2, ei2, succ->preds) >> >>> + { >> >>> + if (BB_PARTITION (e2->src) == BB_COLD_PARTITION) >> >>> + { >> >>> + any_cold_preds = true; >> >>> + break; >> >>> + } >> >>> + } >> >>> + if (any_cold_preds) >> >>> + continue; >> >>> + >> >>> + /* Here we have a successor of newly-hot bb that is cold >> >>> + but no longer has any cold predecessors. Since the original >> >>> + assignment of our newly-hot bb was incorrect, this successor's >> >>> + assignment as cold is also suspect. Go ahead and re-mark it >> >>> + as hot now too. Better heuristics may be in order here. */ >> >>> + BB_SET_PARTITION (succ, BB_HOT_PARTITION); >> >>> + cold_bb_count--; >> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ); >> >>> + /* Examine this successor as a newly-hot bb. */ >> >>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, succ); >> >>> + } >> >>> + } >> >>> + } >> >>> + >> >>> + if (dom_calculated_here) >> >>> + free_dominance_info (CDI_DOMINATORS); >> >>> + } >> >>> + >> >>> /* The format of .gcc_except_table does not allow landing pads to >> >>> be in a different partition as the throw. Fix this by either >> >>> moving or duplicating the landing pads. */ >> >>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void) >> >>> new_bb->aux = cur_bb->aux; >> >>> cur_bb->aux = new_bb; >> >>> >> >>> - /* Make sure new fall-through bb is in same >> >>> - partition as bb it's falling through from. */ >> >>> + /* This is done by force_nonfallthru_and_redirect.
*/ >> >>> + gcc_assert (BB_PARTITION (new_bb) >> >>> + == BB_PARTITION (cur_bb)); >> >>> >> >>> - BB_COPY_PARTITION (new_bb, cur_bb); >> >>> single_succ_edge (new_bb)->flags |= EDGE_CROSSING; >> >>> } >> >>> else >> >>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void) >> >>> FOR_EACH_BB (bb) >> >>> FOR_EACH_EDGE (e, ei, bb->succs) >> >>> if ((e->flags & EDGE_CROSSING) >> >>> - && JUMP_P (BB_END (e->src))) >> >>> + && JUMP_P (BB_END (e->src)) >> >>> + /* Some notes were added during fix_up_fall_thru_edges, via >> >>> + force_nonfallthru_and_redirect. */ >> >>> + && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX)) >> >>> add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >> >>> } >> >>> >> >>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void) >> >>> dump_flow_info (dump_file, dump_flags); >> >>> } >> >>> >> >>> - if (flag_reorder_blocks_and_partition) >> >>> + if (crtl->has_bb_partition) >> >>> verify_hot_cold_block_grouping (); >> >>> } >> >>> >> >>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void) >> >>> encountering this note will make the compiler switch between the >> >>> hot and cold text sections. */ >> >>> >> >>> -static void >> >>> +void >> >>> insert_section_boundary_note (void) >> >>> { >> >>> basic_block bb; >> >>> rtx new_note; >> >>> int first_partition = 0; >> >>> >> >>> - if (!flag_reorder_blocks_and_partition) >> >>> + if (!crtl->has_bb_partition) >> >>> return; >> >>> >> >>> FOR_EACH_BB (bb) >> >>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void) >> >>> FOR_EACH_BB (bb) >> >>> if (bb->next_bb != EXIT_BLOCK_PTR) >> >>> bb->aux = bb->next_bb; >> >>> - cfg_layout_finalize (); >> >>> + cfg_layout_finalize (true); >> >>> >> >>> - /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. 
*/ >> >>> - insert_section_boundary_note (); >> >>> return 0; >> >>> } >> >>> >> >>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void) >> >>> } >> >>> >> >>> done: >> >>> - cfg_layout_finalize (); >> >>> + cfg_layout_finalize (false); >> >>> >> >>> BITMAP_FREE (candidates); >> >>> return 0; >> >>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void) >> >>> if (crossing_edges == NULL) >> >>> return 0; >> >>> >> >>> + crtl->has_bb_partition = true; >> >>> + >> >>> /* Make sure the source of any crossing edge ends in a jump and the >> >>> destination of any crossing edge has a label. */ >> >>> add_labels_and_missing_jumps (crossing_edges); >> >>> Index: bb-reorder.h >> >>> =================================================================== >> >>> --- bb-reorder.h (revision 193376) >> >>> +++ bb-reorder.h (working copy) >> >>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re >> >>> >> >>> extern int get_uncond_jump_length (void); >> >>> >> >>> +extern void insert_section_boundary_note (void); >> >>> + >> >>> +extern void emit_barrier_after_bb (basic_block bb); >> >>> + >> >>> #endif >> >>> Index: basic-block.h >> >>> =================================================================== >> >>> --- basic-block.h (revision 193376) >> >>> +++ basic-block.h (working copy) >> >>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect >> >>> extern bool contains_no_active_insn_p (const_basic_block); >> >>> extern bool forwarder_block_p (const_basic_block); >> >>> extern bool can_fallthru (basic_block, basic_block); >> >>> +extern void fixup_partitions (void); >> >>> >> >>> /* In cfgbuild.c. */ >> >>> extern void find_many_sub_basic_blocks (sbitmap); >> >>> Index: cfgrtl.c >> >>> =================================================================== >> >>> --- cfgrtl.c (revision 193376) >> >>> +++ cfgrtl.c (working copy) >> >>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. 
If not see >> >>> #include "tree.h" >> >>> #include "hard-reg-set.h" >> >>> #include "basic-block.h" >> >>> +#include "bb-reorder.h" >> >>> #include "regs.h" >> >>> #include "flags.h" >> >>> #include "function.h" >> >>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3. If not see >> >>> Only applicable if the CFG is in cfglayout mode. */ >> >>> static GTY(()) rtx cfg_layout_function_footer; >> >>> static GTY(()) rtx cfg_layout_function_header; >> >>> +static bool had_sec_boundary_notes; >> >>> >> >>> static rtx skip_insns_after_block (basic_block); >> >>> static void record_effective_endpoints (void); >> >>> static rtx label_for_bb (basic_block); >> >>> -static void fixup_reorder_chain (void); >> >>> +static void fixup_reorder_chain (bool finalize_reorder_blocks); >> >>> >> >>> void verify_insn_chain (void); >> >>> static void fixup_fallthru_exit_predecessor (void); >> >>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc >> >>> partition boundaries). See the comments at the top of >> >>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ >> >>> >> >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) >> >>> - || BB_PARTITION (src) != BB_PARTITION (target)) >> >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) >> >>> return NULL; >> >>> >> >>> /* We can replace or remove a complex jump only when we have exactly >> >>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target) >> >>> return e; >> >>> } >> >>> >> >>> +/* Called when edge E has been redirected to a new destination, >> >>> + in order to update the region crossing flag on the edge and >> >>> + jump. 
*/ >> >>> + >> >>> +static void >> >>> +fixup_partition_crossing (edge e, basic_block target) >> >>> +{ >> >>> + rtx note; >> >>> + >> >>> + gcc_assert (e->dest == target); >> >>> + >> >>> + if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR) >> >>> + return; >> >>> + /* If we redirected an existing edge, it may already be marked >> >>> + crossing, even though the new src is missing a reg crossing note. >> >>> + But make sure reg crossing note doesn't already exist before >> >>> + inserting. */ >> >>> + if (BB_PARTITION (e->src) != BB_PARTITION (target)) >> >>> + { >> >>> + e->flags |= EDGE_CROSSING; >> >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >> >>> + if (JUMP_P (BB_END (e->src)) >> >>> + && !note) >> >>> + add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >> >>> + } >> >>> + else if (BB_PARTITION (e->src) == BB_PARTITION (target)) >> >>> + { >> >>> + e->flags &= ~EDGE_CROSSING; >> >>> + /* Remove the region crossing note from jump at end of >> >>> + e->src if it exists. */ >> >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >> >>> + if (note) >> >>> + remove_note (BB_END (e->src), note); >> >>> + } >> >>> +} >> >>> + >> >>> +/* Called when block BB has been reassigned to a different partition, >> >>> + to ensure that the region crossing attributes are updated. */ >> >>> + >> >>> +static void >> >>> +fixup_bb_partition (basic_block bb) >> >>> +{ >> >>> + edge e; >> >>> + edge_iterator ei; >> >>> + >> >>> + /* Now need to make bb's pred edges non-region crossing. */ >> >>> + FOR_EACH_EDGE (e, ei, bb->preds) >> >>> + { >> >>> + fixup_partition_crossing (e, e->dest); >> >>> + } >> >>> + >> >>> + /* Possibly need to make bb's successor edges region crossing, >> >>> + or remove stale region crossing. 
*/ >> >>> + FOR_EACH_EDGE (e, ei, bb->succs) >> >>> + { >> >>> + if ((e->flags & EDGE_FALLTHRU) >> >>> + && BB_PARTITION (bb) != BB_PARTITION (e->dest) >> >>> + && e->dest != EXIT_BLOCK_PTR) >> >>> + /* force_nonfallthru_and_redirect calls fixup_partition_crossing. */ >> >>> + force_nonfallthru (e); >> >>> + else >> >>> + fixup_partition_crossing (e, e->dest); >> >>> + } >> >>> +} >> >>> + >> >>> /* Attempt to change code to redirect edge E to TARGET. Don't do that on >> >>> expense of adding new instructions or reordering basic blocks. >> >>> >> >>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block >> >>> { >> >>> edge ret; >> >>> basic_block src = e->src; >> >>> + basic_block dest = e->dest; >> >>> >> >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) >> >>> return NULL; >> >>> >> >>> - if (e->dest == target) >> >>> + if (dest == target) >> >>> return e; >> >>> >> >>> if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL) >> >>> { >> >>> df_set_bb_dirty (src); >> >>> + fixup_partition_crossing (ret, target); >> >>> return ret; >> >>> } >> >>> >> >>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block >> >>> return NULL; >> >>> >> >>> df_set_bb_dirty (src); >> >>> + fixup_partition_crossing (ret, target); >> >>> return ret; >> >>> } >> >>> >> >>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc >> >>> /* Make sure new block ends up in correct hot/cold section. */ >> >>> >> >>> BB_COPY_PARTITION (jump_block, e->src); >> >>> - if (flag_reorder_blocks_and_partition >> >>> - && targetm_common.have_named_sections >> >>> - && JUMP_P (BB_END (jump_block)) >> >>> - && !any_condjump_p (BB_END (jump_block)) >> >>> - && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING)) >> >>> - add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX); >> >>> >> >>> /* Wire edge in. 
*/ >> >>> new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU); >> >>> new_edge->probability = probability; >> >>> new_edge->count = count; >> >>> >> >>> + /* If e->src was previously region crossing, it no longer is >> >>> + and the reg crossing note should be removed. */ >> >>> + fixup_partition_crossing (new_edge, jump_block); >> >>> + >> >>> /* Redirect old edge. */ >> >>> redirect_edge_pred (e, jump_block); >> >>> e->probability = REG_BR_PROB_BASE; >> >>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc >> >>> LABEL_NUSES (label)++; >> >>> } >> >>> >> >>> - emit_barrier_after (BB_END (jump_block)); >> >>> + /* We might be in cfg layout mode, and if so, the following routine will >> >>> + insert the barrier correctly. */ >> >>> + emit_barrier_after_bb (jump_block); >> >>> redirect_edge_succ_nodup (e, target); >> >>> >> >>> if (abnormal_edge_flags) >> >>> make_edge (src, target, abnormal_edge_flags); >> >>> >> >>> df_mark_solutions_dirty (); >> >>> + fixup_partition_crossing (e, target); >> >>> return new_bb; >> >>> } >> >>> >> >>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU >> >>> static basic_block >> >>> rtl_split_edge (edge edge_in) >> >>> { >> >>> - basic_block bb; >> >>> + basic_block bb, new_bb; >> >>> rtx before; >> >>> >> >>> /* Abnormal edges cannot be split. */ >> >>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in) >> >>> else >> >>> { >> >>> bb = create_basic_block (before, NULL, edge_in->dest->prev_bb); >> >>> - /* ??? Why not edge_in->dest->prev_bb here? */ >> >>> - BB_COPY_PARTITION (bb, edge_in->dest); >> >>> + if (edge_in->src == ENTRY_BLOCK_PTR) >> >>> + BB_COPY_PARTITION (bb, edge_in->dest); >> >>> + else >> >>> + /* Put the split bb into the src partition, to avoid creating >> >>> + a situation where a cold bb dominates a hot bb, in the case >> >>> + where src is cold and dest is hot. The src will dominate >> >>> + the new bb (whereas it might not have dominated dest). 
*/ >> >>> + BB_COPY_PARTITION (bb, edge_in->src); >> >>> } >> >>> >> >>> make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU); >> >>> >> >>> + /* Can't allow a region crossing edge to be fallthrough. */ >> >>> + if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest) >> >>> + && edge_in->dest != EXIT_BLOCK_PTR) >> >>> + { >> >>> + new_bb = force_nonfallthru (single_succ_edge (bb)); >> >>> + gcc_assert (!new_bb); >> >>> + } >> >>> + >> >>> /* For non-fallthru edges, we must adjust the predecessor's >> >>> jump instruction to target our new block. */ >> >>> if ((edge_in->flags & EDGE_FALLTHRU) == 0) >> >>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e) >> >>> else >> >>> { >> >>> bb = split_edge (e); >> >>> - after = BB_END (bb); >> >>> >> >>> - if (flag_reorder_blocks_and_partition >> >>> - && targetm_common.have_named_sections >> >>> - && e->src != ENTRY_BLOCK_PTR >> >>> - && BB_PARTITION (e->src) == BB_COLD_PARTITION >> >>> - && !(e->flags & EDGE_CROSSING) >> >>> - && JUMP_P (after) >> >>> - && !any_condjump_p (after) >> >>> - && (single_succ_edge (bb)->flags & EDGE_CROSSING)) >> >>> - add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX); >> >>> + /* If e crossed a partition boundary, we needed to make bb end in >> >>> + a region-crossing jump, even though it was originally fallthru. */ >> >>> + if (JUMP_P (BB_END (bb))) >> >>> + before = BB_END (bb); >> >>> + else >> >>> + after = BB_END (bb); >> >>> } >> >>> >> >>> /* Now that we've found the spot, do the insertion. */ >> >>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void) >> >>> { >> >>> basic_block bb; >> >>> >> >>> + /* Optimization passes that invoke this routine can cause hot blocks >> >>> + previously reached by both hot and cold blocks to become dominated only >> >>> + by cold blocks. This will cause the verification below to fail, >> >>> + and lead to now cold code in the hot section. 
In some cases this >> >>> + may only be visible after newly unreachable blocks are deleted, >> >>> + which will be done by fixup_partitions. */ >> >>> + fixup_partitions (); >> >>> + >> >>> #ifdef ENABLE_CHECKING >> >>> verify_flow_info (); >> >>> #endif >> >>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb) >> >>> >> >>> return end; >> >>> } >> >>> - >> >>> + >> >>> +/* Perform cleanup on the hot/cold bb partitioning after optimization >> >>> + passes that modify the cfg. */ >> >>> + >> >>> +void >> >>> +fixup_partitions (void) >> >>> +{ >> >>> + basic_block bb; >> >>> + >> >>> + if (!crtl->has_bb_partition) >> >>> + return; >> >>> + >> >>> + /* Delete any blocks that became unreachable and weren't >> >>> + already cleaned up, for example during edge forwarding >> >>> + and convert_jumps_to_returns. This will expose more >> >>> + opportunities for fixing the partition boundaries here. >> >>> + Also, the calculation of the dominance graph during verification >> >>> + will assert if there are unreachable nodes. */ >> >>> + delete_unreachable_blocks (); >> >>> + >> >>> + /* If there are partitions, do a sanity check on them: A basic block in >> >>> + a cold partition cannot dominate a basic block in a hot partition. >> >>> + Fixup any that now violate this requirement, as a result of edge >> >>> + forwarding and unreachable block deletion. */ >> >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >> >>> + VEC (basic_block, heap) *bbs_to_fix = NULL; >> >>> + FOR_EACH_BB (bb) >> >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >> >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) >> >>> + { >> >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >> >>> + basic_block son; >> >>> + >> >>> + if (dom_calculated_here) >> >>> + calculate_dominance_info (CDI_DOMINATORS); >> >>> + >> >>> + while (! 
VEC_empty (basic_block, bbs_in_cold_partition)) >> >>> + { >> >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >> >>> + /* If bb is not yet cold (because it was added below as >> >>> + a block dominated by a cold bb) then mark it cold here. */ >> >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >> >>> + { >> >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); >> >>> + VEC_safe_push (basic_block, heap, bbs_to_fix, bb); >> >>> + } >> >>> + /* Any blocks dominated by a block in the cold section >> >>> + must also be cold. */ >> >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); >> >>> + son; >> >>> + son = next_dom_son (CDI_DOMINATORS, son)) >> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >> >>> + } >> >>> + >> >>> + if (dom_calculated_here) >> >>> + free_dominance_info (CDI_DOMINATORS); >> >>> + } >> >>> + >> >>> + /* Do the partition fixup after all necessary blocks have been converted to >> >>> + cold, so that we only update the region crossings the minimum number of >> >>> + places, which can require forcing edges to be non fallthru. */ >> >>> + while (! VEC_empty (basic_block, bbs_to_fix)) >> >>> + { >> >>> + bb = VEC_pop (basic_block, bbs_to_fix); >> >>> + fixup_bb_partition (bb); >> >>> + } >> >>> +} >> >>> + >> >>> /* Verify the CFG and RTL consistency common for both underlying RTL and >> >>> cfglayout RTL. >> >>> >> >>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void) >> >>> rtx x; >> >>> int err = 0; >> >>> basic_block bb; >> >>> + bool have_partitions = false; >> >>> >> >>> /* Check the general integrity of the basic blocks. 
*/ >> >>> FOR_EACH_BB_REVERSE (bb) >> >>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void) >> >>> >> >>> if (e->flags & EDGE_ABNORMAL) >> >>> n_abnormal++; >> >>> + >> >>> + have_partitions |= is_crossing; >> >>> } >> >>> >> >>> if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX)) >> >>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void) >> >>> } >> >>> } >> >>> >> >>> + /* If there are partitions, do a sanity check on them: A basic block in >> >>> + a cold partition cannot dominate a basic block in a hot partition. */ >> >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >> >>> + if (have_partitions && !err) >> >>> + FOR_EACH_BB (bb) >> >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >> >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) >> >>> + { >> >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >> >>> + basic_block son; >> >>> + >> >>> + if (dom_calculated_here) >> >>> + calculate_dominance_info (CDI_DOMINATORS); >> >>> + >> >>> + while (! VEC_empty (basic_block, bbs_in_cold_partition)) >> >>> + { >> >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >> >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >> >>> + { >> >>> + error ("non-cold basic block %d dominated " >> >>> + "by a block in the cold partition", bb->index); >> >>> + err = 1; >> >>> + } >> >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); >> >>> + son; >> >>> + son = next_dom_son (CDI_DOMINATORS, son)) >> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >> >>> + } >> >>> + >> >>> + if (dom_calculated_here) >> >>> + free_dominance_info (CDI_DOMINATORS); >> >>> + } >> >>> + >> >>> /* Clean up. 
*/ >> >>> return err; >> >>> } >> >>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void) >> >>> else >> >>> cfg_layout_function_header = NULL_RTX; >> >>> >> >>> + had_sec_boundary_notes = false; >> >>> + >> >>> next_insn = get_insns (); >> >>> FOR_EACH_BB (bb) >> >>> { >> >>> rtx end; >> >>> >> >>> if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb)) >> >>> - BB_HEADER (bb) = unlink_insn_chain (next_insn, >> >>> - PREV_INSN (BB_HEAD (bb))); >> >>> + { >> >>> + /* Rather than try to keep section boundary notes incrementally >> >>> + up-to-date through cfg layout optimizations, simply remove them >> >>> + and flag that they should be re-inserted when exiting >> >>> + cfg layout mode. */ >> >>> + rtx check_insn = next_insn; >> >>> + while (check_insn) >> >>> + { >> >>> + if (NOTE_P (check_insn) >> >>> + && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS) >> >>> + { >> >>> + had_sec_boundary_notes |= true; >> >>> + /* Remove note from chain. Grab new next_insn first. */ >> >>> + if (next_insn == check_insn) >> >>> + next_insn = NEXT_INSN (check_insn); >> >>> + /* Delete note. */ >> >>> + delete_insn (check_insn); >> >>> + /* There will only be one. */ >> >>> + break; >> >>> + } >> >>> + check_insn = NEXT_INSN (check_insn); >> >>> + } >> >>> + /* If we still have header instructions left after above loop. 
*/ >> >>> + if (next_insn != BB_HEAD (bb)) >> >>> + BB_HEADER (bb) = unlink_insn_chain (next_insn, >> >>> + PREV_INSN (BB_HEAD (bb))); >> >>> + } >> >>> end = skip_insns_after_block (bb); >> >>> if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end) >> >>> BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end); >> >>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void) >> >>> if (bb->next_bb != EXIT_BLOCK_PTR) >> >>> bb->aux = bb->next_bb; >> >>> >> >>> - cfg_layout_finalize (); >> >>> + cfg_layout_finalize (false); >> >>> >> >>> return 0; >> >>> } >> >>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode) >> >>> } >> >>> >> >>> >> >>> -/* Given a reorder chain, rearrange the code to match. */ >> >>> +/* Given a reorder chain, rearrange the code to match. If >> >>> + this is called when we will FINALIZE_REORDER_BLOCKS, or when >> >>> + section boundary notes were removed on entry to cfg layout >> >>> + mode, insert section boundary notes here. */ >> >>> >> >>> static void >> >>> -fixup_reorder_chain (void) >> >>> +fixup_reorder_chain (bool finalize_reorder_blocks) >> >>> { >> >>> basic_block bb; >> >>> rtx insn = NULL; >> >>> @@ -3150,7 +3373,7 @@ static void >> >>> PREV_INSN (BB_HEADER (bb)) = insn; >> >>> insn = BB_HEADER (bb); >> >>> while (NEXT_INSN (insn)) >> >>> - insn = NEXT_INSN (insn); >> >>> + insn = NEXT_INSN (insn); >> >>> } >> >>> if (insn) >> >>> NEXT_INSN (insn) = BB_HEAD (bb); >> >>> @@ -3175,6 +3398,11 @@ static void >> >>> insn = NEXT_INSN (insn); >> >>> >> >>> set_last_insn (insn); >> >>> + >> >>> + /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. 
*/ >> >>> + if (had_sec_boundary_notes || finalize_reorder_blocks) >> >>> + insert_section_boundary_note (); >> >>> + >> >>> #ifdef ENABLE_CHECKING >> >>> verify_insn_chain (); >> >>> #endif >> >>> @@ -3187,7 +3415,7 @@ static void >> >>> edge e_fall, e_taken, e; >> >>> rtx bb_end_insn; >> >>> rtx ret_label = NULL_RTX; >> >>> - basic_block nb, src_bb; >> >>> + basic_block nb; >> >>> edge_iterator ei; >> >>> >> >>> if (EDGE_COUNT (bb->succs) == 0) >> >>> @@ -3322,7 +3550,6 @@ static void >> >>> /* We got here if we need to add a new jump insn. >> >>> Note force_nonfallthru can delete E_FALL and thus we have to >> >>> save E_FALL->src prior to the call to force_nonfallthru. */ >> >>> - src_bb = e_fall->src; >> >>> nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label); >> >>> if (nb) >> >>> { >> >>> @@ -3330,17 +3557,6 @@ static void >> >>> bb->aux = nb; >> >>> /* Don't process this new block. */ >> >>> bb = nb; >> >>> - >> >>> - /* Make sure new bb is tagged for correct section (same as >> >>> - fall-thru source, since you cannot fall-thru across >> >>> - section boundaries). */ >> >>> - BB_COPY_PARTITION (src_bb, single_pred (bb)); >> >>> - if (flag_reorder_blocks_and_partition >> >>> - && targetm_common.have_named_sections >> >>> - && JUMP_P (BB_END (bb)) >> >>> - && !any_condjump_p (BB_END (bb)) >> >>> - && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING)) >> >>> - add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX); >> >>> } >> >>> } >> >>> >> >>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to) >> >>> case NOTE_INSN_FUNCTION_BEG: >> >>> /* There is always just single entry to function. */ >> >>> case NOTE_INSN_BASIC_BLOCK: >> >>> + /* We should only switch text sections once. 
*/ >> >>> + case NOTE_INSN_SWITCH_TEXT_SECTIONS: >> >>> break; >> >>> >> >>> case NOTE_INSN_EPILOGUE_BEG: >> >>> - case NOTE_INSN_SWITCH_TEXT_SECTIONS: >> >>> emit_note_copy (insn); >> >>> break; >> >>> >> >>> @@ -3759,10 +3976,13 @@ break_superblocks (void) >> >>> } >> >>> >> >>> /* Finalize the changes: reorder insn list according to the sequence specified >> >>> - by aux pointers, enter compensation code, rebuild scope forest. */ >> >>> + by aux pointers, enter compensation code, rebuild scope forest. If >> >>> + this is called when we will FINALIZE_REORDER_BLOCKS, indicate that >> >>> + to fixup_reorder_chain so that it can insert the proper switch text >> >>> + section notes. */ >> >>> >> >>> void >> >>> -cfg_layout_finalize (void) >> >>> +cfg_layout_finalize (bool finalize_reorder_blocks) >> >>> { >> >>> #ifdef ENABLE_CHECKING >> >>> verify_flow_info (); >> >>> @@ -3775,7 +3995,7 @@ void >> >>> #endif >> >>> ) >> >>> fixup_fallthru_exit_predecessor (); >> >>> - fixup_reorder_chain (); >> >>> + fixup_reorder_chain (finalize_reorder_blocks); >> >>> >> >>> rebuild_jump_labels (get_insns ()); >> >>> delete_dead_jumptables (); >> >>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e) >> >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) >> >>> return false; >> >>> >> >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) >> >>> - || BB_PARTITION (src) != BB_PARTITION (target)) >> >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) >> >>> return false; >> >>> >> >>> if (!onlyjump_p (insn) >> >>> >> >>> -- >> >>> This patch is available for review at http://codereview.appspot.com/6823047 >> >> >> >> >> >> >> >> -- >> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
I have updated my trunk checkout, and I can confirm that eval.c now
compiles with your patch (and the other 4 patches I added to PR55121).

Now, when looking at the whole Spec2k results:
- vpr passes now (used to fail)
- gcc, parser, perlbmk, bzip2 and twolf no longer build: they all fail
  with the same error from gas:
  can't resolve `.text.unlikely' {.text.unlikely section} - `.LBB171' {.text section}
- gap still does not build (same error as above)

I haven't looked in detail, so I may be missing an obvious patch here.

And I still observe runtime mis-behaviour on crafty, galgel, facerec and fma3d.

Thanks,
Christophe.

On 26 November 2012 21:52, Teresa Johnson <tejohnson@google.com> wrote:
> Sorry, I don't know what happened there. Patch is attached.
> Thanks,
> Teresa
>
> On Mon, Nov 26, 2012 at 12:42 PM, Jack Howarth <howarth@bromo.med.uc.edu> wrote:
>> On Mon, Nov 26, 2012 at 12:19:55PM -0800, Teresa Johnson wrote:
>>> Are you sure you have all my changes applied? I applied the 4 patches
>>> attached to PR55121 into my trunk checkout that has my fixes, and to a
>>> pristine trunk checkout. I configured and built both for
>>> --target=arm-none-linux-gnueabi, and built using your options, .i file
>>> and gcda file. I can reproduce the failure using the pristine trunk
>>> with your patches but not with my fixed trunk + your patches. (I just
>>> updated to head to pickup recent changes and get the same result. The
>>> vec changes required some manual changes to the patch, which I will
>>> resend shortly.)
>>
>> Teresa,
>> Your mailer seems to have corrupted the posted patch with stray
>> =3D characters and line breaks. Can you repost a copy as an attachment
>> to the list?
>> Jack
>>
>>>
>>> Without my fixes:
>>>
>>> $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreprocessed eval.i -quiet
>>> -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
>>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
>>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
>>> -fno-common -o eval.s -freorder-blocks-and-partition
>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>> 2.4.2-p1, MPC version 0.8.1
>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>> 2.4.2-p1, MPC version 0.8.1
>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>> Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a
>>> eval.c: In function ‘Ge’:
>>> eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560
>>> }
>>> ^
>>> 0x622f71 df_compact_blocks()
>>>     ../../gcc_trunk_3/gcc/df-core.c:1560
>>> 0x5cfcb5 compact_blocks()
>>>     ../../gcc_trunk_3/gcc/cfg.c:162
>>> 0xc9dce0 reorder_basic_blocks
>>>     ../../gcc_trunk_3/gcc/bb-reorder.c:2154
>>> 0xc9dce0 rest_of_handle_reorder_blocks
>>>     ../../gcc_trunk_3/gcc/bb-reorder.c:2219
>>> Please submit a full bug report,
>>> with preprocessed source if appropriate.
>>> Please include the complete backtrace with any bug report.
>>> See <http://gcc.gnu.org/bugs.html> for instructions.
>>>
>>>
>>> With my fixes:
>>>
>>> $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreprocessed eval.i -quiet
>>> -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
>>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
>>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
>>> -fno-common -o eval.s -freorder-blocks-and-partition
>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>> 2.4.2-p1, MPC version 0.8.1
>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>> 2.4.2-p1, MPC version 0.8.1
>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>> Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3
>>>
>>>
>>> Thanks,
>>> Teresa
>>>
>>> On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon
>>> <christophe.lyon@linaro.org> wrote:
>>> > Hi,
>>> >
>>> > I have tested your patch on Spec2000 on ARM, and I can still see
>>> > several failures caused by:
>>> > "error: fallthru edge crosses section boundary", including the case
>>> > described in PR55121.
>>> >
>>> > On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote:
>>> >> Ping.
>>> >> Teresa
>>> >>
>>> >> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>> >>> Revised patch that fixes failures encountered when enabling
>>> >>> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>>> >>>
>>> >>> This includes new verification code to ensure no cold blocks dominate hot
>>> >>> blocks contributed by Steven Bosscher.
>>> >>>
>>> >>> I attempted to make the handling of partition updates through the optimization
>>> >>> passes much more consistent, removing a number of partial fixes in the code
>>> >>> stream in the process.
>>> >>> The code to fixup partitions (including the BB_PARTITION
>>> >>> assignment, region crossing jump notes, and switch text section notes) is
>>> >>> now handled in a few centralized locations. For example, inside
>>> >>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
>>> >>> don't need to attempt the fixup themselves.
>>> >>>
>>> >>> For optimization passes that make adjustments to the cfg while in cfg layout
>>> >>> mode that are not easy to fix up incrementally, the new routine
>>> >>> fixup_partitions handles the cleanup globally. This does require calculation
>>> >>> of the dominance relation; however, as far as I can tell the routines which
>>> >>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
>>> >>> are invoked typically once (or a small number of times in the case of
>>> >>> try_optimize_cfg) per optimization pass. Additionally, I compared the
>>> >>> -ftime-report output for some large fdo compilations and saw only minimal
>>> >>> increases in the dominance computation times, which were only a tiny percent
>>> >>> of the overall compile time.
>>> >>>
>>> >>> Additionally, I added a flag to the rtl_data structure to indicate whether
>>> >>> any partitioning was actually performed, so that optimizations which were
>>> >>> conservatively disabled whenever the flag_reorder_blocks_and_partition
>>> >>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
>>> >>> conservative for functions where no partitions were formed (e.g. they are
>>> >>> completely hot).
>>> >>>
>>> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
>>> >>> benchmarks and internal google benchmarks using profile feedback and
>>> >>> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
>>> >>> >>> >>> Thanks, >>> >>> Teresa >>> >>> >>> >>> 2012-11-14 Teresa Johnson <tejohnson@google.com> >>> >>> Steven Bosscher <steven@gcc.gnu.org> >>> >>> >>> >>> * cfghooks.h (cfg_layout_finalize): New parameter. >>> >>> * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize >>> >>> parameter. >>> >>> * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert >>> >>> as this is now done by redirect_edge_and_branch_force. >>> >>> * function.c (thread_prologue_and_epilogue_insns): Insert new bb after >>> >>> barriers, new cfg_layout_finalize parameter, and don't store exit >>> >>> predecessor BB until after it is potentially split. >>> >>> * function.h (struct rtl_data): New flag has_bb_partition. >>> >>> * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter. >>> >>> * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if >>> >>> any blocks in function actually partitioned. >>> >>> (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean >>> >>> up partitioning. >>> >>> * bb-reorder.c (connect_traces): Only look for partitions and skip >>> >>> block copying if any blocks in function actually partitioned. >>> >>> (emit_barrier_after_bb): Handle insertion in non-cfglayout mode. >>> >>> (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure >>> >>> that no cold blocks dominate a hot block. >>> >>> (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert >>> >>> as this is now done by force_nonfallthru_and_redirect. >>> >>> (add_reg_crossing_jump_notes): Handle the fact that some jumps may >>> >>> already be marked with region crossing note. >>> >>> (reorder_basic_blocks): Only need to verify partitions if any >>> >>> blocks in function actually partitioned. >>> >>> (insert_section_boundary_note): Only need to insert note if any >>> >>> blocks in function actually partitioned. 
>>> >>> (rest_of_handle_reorder_blocks): New cfg_layout_finalize >>> >>> parameter, and remove call to insert_section_boundary_note as this >>> >>> is now called via cfg_layout_finalize/fixup_reorder_chain. >>> >>> (duplicate_computed_gotos): New cfg_layout_finalize >>> >>> parameter. >>> >>> (partition_hot_cold_basic_blocks): Set flag indicating function >>> >>> has bb partitions. >>> >>> * bb-reorder.h: Declare insert_section_boundary_note and >>> >>> emit_barrier_after_bb, which are no longer static. >>> >>> * basic-block.h: Declare new function fixup_partitions. >>> >>> * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary >>> >>> check for region crossing note. >>> >>> (fixup_partition_crossing): New function. >>> >>> (fixup_bb_partition): Ditto. >>> >>> (rtl_redirect_edge_and_branch): Fixup partition boundaries. >>> >>> (force_nonfallthru_and_redirect): Fixup partition boundaries, >>> >>> remove old code that tried to do this. Emit barrier correctly >>> >>> when we are in cfglayout mode. >>> >>> (rtl_split_edge): Correctly fixup partition boundaries. >>> >>> (commit_one_edge_insertion): Remove old code that tried to >>> >>> fixup region crossing edge since this is now handled in >>> >>> split_block, and set up insertion point correctly since >>> >>> block may now end in a jump. >>> >>> (commit_edge_insertions): Invoke fixup_partitions to sanitize partition >>> >>> boundaries after optimizations that modify cfg and before trying to >>> >>> verify the flow info. >>> >>> (fixup_partitions): New function. >>> >>> (rtl_verify_flow_info_1): Add verification that no cold bbs dominate >>> >>> hot bbs. >>> >>> (record_effective_endpoints): Remove region-crossing notes and set flag >>> >>> indicating that they need to be reinserted on exit from cfglayout mode. >>> >>> (outof_cfg_layout_mode): New cfg_layout_finalize parameter. >>> >>> (fixup_reorder_chain): Call insert_section_boundary_note if necessary. 
>>> >>> Remove old code that attempted to fixup region crossing note as >>> >>> this is now handled in force_nonfallthru_and_redirect. >>> >>> (duplicate_insn_chain): Don't duplicate switch section notes. >>> >>> (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain. >>> >>> (rtl_can_remove_branch_p): Remove unnecessary check for region crossing >>> >>> note. >>> >>> >>> >>> Index: cfghooks.h >>> >>> =================================================================== >>> >>> --- cfghooks.h (revision 193376) >>> >>> +++ cfghooks.h (working copy) >>> >>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas >>> >>> void account_profile_record (struct profile_record *, int); >>> >>> >>> >>> extern void cfg_layout_initialize (unsigned int); >>> >>> -extern void cfg_layout_finalize (void); >>> >>> +extern void cfg_layout_finalize (bool); >>> >>> >>> >>> /* Hooks containers. */ >>> >>> extern struct cfg_hooks gimple_cfg_hooks; >>> >>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi >>> >>> extern void gimple_register_cfg_hooks (void); >>> >>> extern struct cfg_hooks get_cfg_hooks (void); >>> >>> extern void set_cfg_hooks (struct cfg_hooks); >>> >>> - >>> >>> Index: modulo-sched.c >>> >>> =================================================================== >>> >>> --- modulo-sched.c (revision 193376) >>> >>> +++ modulo-sched.c (working copy) >>> >>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void) >>> >>> if (bb->next_bb != EXIT_BLOCK_PTR) >>> >>> bb->aux = bb->next_bb; >>> >>> free_dominance_info (CDI_DOMINATORS); >>> >>> - cfg_layout_finalize (); >>> >>> + cfg_layout_finalize (false); >>> >>> #endif /* INSN_SCHEDULING */ >>> >>> return 0; >>> >>> } >>> >>> Index: ifcvt.c >>> >>> =================================================================== >>> >>> --- ifcvt.c (revision 193376) >>> >>> +++ ifcvt.c (working copy) >>> >>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg >>> >>> if (new_bb) >>> 
>>> { >>> >>> df_bb_replace (then_bb_index, new_bb); >>> >>> - /* Since the fallthru edge was redirected from test_bb to new_bb, >>> >>> - we need to ensure that new_bb is in the same partition as >>> >>> - test bb (you can not fall through across section boundaries). */ >>> >>> - BB_COPY_PARTITION (new_bb, test_bb); >>> >>> + /* This should have been done above via force_nonfallthru_and_redirect >>> >>> + (possibly called from redirect_edge_and_branch_force). */ >>> >>> + gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb)); >>> >>> } >>> >>> >>> >>> num_true_changes++; >>> >>> Index: function.c >>> >>> =================================================================== >>> >>> --- function.c (revision 193376) >>> >>> +++ function.c (working copy) >>> >>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void) >>> >>> break; >>> >>> if (e) >>> >>> { >>> >>> - copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)), >>> >>> - NULL_RTX, e->src); >>> >>> + /* Make sure we insert after any barriers. 
*/ >>> >>> + rtx end = get_last_bb_insn (e->src); >>> >>> + copy_bb = create_basic_block (NEXT_INSN (end), >>> >>> + NULL_RTX, e->src); >>> >>> BB_COPY_PARTITION (copy_bb, e->src); >>> >>> } >>> >>> else >>> >>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void) >>> >>> if (cur_bb->index >= NUM_FIXED_BLOCKS >>> >>> && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS) >>> >>> cur_bb->aux = cur_bb->next_bb; >>> >>> - cfg_layout_finalize (); >>> >>> + cfg_layout_finalize (false); >>> >>> } >>> >>> >>> >>> epilogue_done: >>> >>> @@ -6517,7 +6519,7 @@ epilogue_done: >>> >>> basic_block simple_return_block_cold = NULL; >>> >>> edge pending_edge_hot = NULL; >>> >>> edge pending_edge_cold = NULL; >>> >>> - basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb; >>> >>> + basic_block exit_pred; >>> >>> int i; >>> >>> >>> >>> gcc_assert (entry_edge != orig_entry_edge); >>> >>> @@ -6545,6 +6547,12 @@ epilogue_done: >>> >>> else >>> >>> pending_edge_cold = e; >>> >>> } >>> >>> + >>> >>> + /* Save a pointer to the exit's predecessor BB for use in >>> >>> + inserting new BBs at the end of the function. Do this >>> >>> + after the call to split_block above which may split >>> >>> + the original exit pred. */ >>> >>> + exit_pred = EXIT_BLOCK_PTR->prev_bb; >>> >>> >>> >>> FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e) >>> >>> { >>> >>> Index: function.h >>> >>> =================================================================== >>> >>> --- function.h (revision 193376) >>> >>> +++ function.h (working copy) >>> >>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data { >>> >>> sched2) and is useful only if the port defines LEAF_REGISTERS. */ >>> >>> bool uses_only_leaf_regs; >>> >>> >>> >>> + /* Nonzero if the function being compiled has undergone hot/cold partitioning >>> >>> + (under flag_reorder_blocks_and_partition) and has at least one cold >>> >>> + block. 
*/ >>> >>> + bool has_bb_partition; >>> >>> + >>> >>> /* Like regs_ever_live, but 1 if a reg is set or clobbered from an >>> >>> asm. Unlike regs_ever_live, elements of this array corresponding >>> >>> to eliminable regs (like the frame pointer) are set if an asm >>> >>> Index: hw-doloop.c >>> >>> =================================================================== >>> >>> --- hw-doloop.c (revision 193376) >>> >>> +++ hw-doloop.c (working copy) >>> >>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops) >>> >>> else >>> >>> bb->aux = NULL; >>> >>> } >>> >>> - cfg_layout_finalize (); >>> >>> + cfg_layout_finalize (false); >>> >>> clear_aux_for_blocks (); >>> >>> df_analyze (); >>> >>> } >>> >>> Index: cfgcleanup.c >>> >>> =================================================================== >>> >>> --- cfgcleanup.c (revision 193376) >>> >>> +++ cfgcleanup.c (working copy) >>> >>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2, >>> >>> partition boundaries). See the comments at the top of >>> >>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ >>> >>> >>> >>> - if (flag_reorder_blocks_and_partition && reload_completed) >>> >>> + if (crtl->has_bb_partition && reload_completed) >>> >>> return false; >>> >>> >>> >>> /* Search backward through forwarder blocks. We don't need to worry >>> >>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode) >>> >>> df_analyze (); >>> >>> } >>> >>> >>> >>> + if (changed) >>> >>> + { >>> >>> + /* Edge forwarding in particular can cause hot blocks previously >>> >>> + reached by both hot and cold blocks to become dominated only >>> >>> + by cold blocks. This will cause the verification below to fail, >>> >>> + and lead to now cold code in the hot section. This is not easy >>> >>> + to detect and fix during edge forwarding, and in some cases >>> >>> + is only visible after newly unreachable blocks are deleted, >>> >>> + which will be done in fixup_partitions. 
*/ >>> >>> + fixup_partitions (); >>> >>> + >>> >>> #ifdef ENABLE_CHECKING >>> >>> - if (changed) >>> >>> - verify_flow_info (); >>> >>> + verify_flow_info (); >>> >>> #endif >>> >>> + } >>> >>> >>> >>> changed_overall |= changed; >>> >>> first_pass = false; >>> >>> Index: bb-reorder.c >>> >>> =================================================================== >>> >>> --- bb-reorder.c (revision 193376) >>> >>> +++ bb-reorder.c (working copy) >>> >>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces >>> >>> current_partition = BB_PARTITION (traces[0].first); >>> >>> two_passes = false; >>> >>> >>> >>> - if (flag_reorder_blocks_and_partition) >>> >>> + if (crtl->has_bb_partition) >>> >>> for (i = 0; i < n_traces && !two_passes; i++) >>> >>> if (BB_PARTITION (traces[0].first) >>> >>> != BB_PARTITION (traces[i].first)) >>> >>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces >>> >>> } >>> >>> } >>> >>> >>> >>> - if (flag_reorder_blocks_and_partition) >>> >>> + if (crtl->has_bb_partition) >>> >>> try_copy = false; >>> >>> >>> >>> /* Copy tiny blocks always; copy larger blocks only when the >>> >>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void) >>> >>> return length; >>> >>> } >>> >>> >>> >>> -/* Emit a barrier into the footer of BB. */ >>> >>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode. */ >>> >>> >>> >>> -static void >>> >>> +void >>> >>> emit_barrier_after_bb (basic_block bb) >>> >>> { >>> >>> rtx barrier = emit_barrier_after (BB_END (bb)); >>> >>> - BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); >>> >>> + if (current_ir_type () == IR_RTL_CFGLAYOUT) >>> >>> + BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); >>> >>> } >>> >>> >>> >>> /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions. 
>>> >>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg >>> >>> { >>> >>> VEC(edge, heap) *crossing_edges = NULL; >>> >>> basic_block bb; >>> >>> - edge e; >>> >>> - edge_iterator ei; >>> >>> + edge e, e2; >>> >>> + edge_iterator ei, ei2; >>> >>> + unsigned int cold_bb_count = 0; >>> >>> + VEC (basic_block, heap) *bbs_in_hot_partition = NULL; >>> >>> + VEC (basic_block, heap) *bbs_newly_hot = NULL; >>> >>> >>> >>> /* Mark which partition (hot/cold) each basic block belongs in. */ >>> >>> FOR_EACH_BB (bb) >>> >>> { >>> >>> if (probably_never_executed_bb_p (cfun, bb)) >>> >>> - BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>> >>> + { >>> >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>> >>> + cold_bb_count++; >>> >>> + } >>> >>> else >>> >>> - BB_SET_PARTITION (bb, BB_HOT_PARTITION); >>> >>> + { >>> >>> + BB_SET_PARTITION (bb, BB_HOT_PARTITION); >>> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb); >>> >>> + } >>> >>> } >>> >>> >>> >>> + /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of >>> >>> + several different possibilities. One is that there are edge weight insanities >>> >>> + due to optimization phases that do not properly update basic block profile >>> >>> + counts. The second is that the entry of the function may not be hot, because >>> >>> + it is entered fewer times than the number of profile training runs, but there >>> >>> + is a loop inside the function that causes blocks within the function to be >>> >>> + above the threshold for hotness. */ >>> >>> + if (cold_bb_count) >>> >>> + { >>> >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>> >>> + >>> >>> + if (dom_calculated_here) >>> >>> + calculate_dominance_info (CDI_DOMINATORS); >>> >>> + >>> >>> + /* Keep examining hot bbs until we have either checked them all, or >>> >>> + re-marked all cold bbs hot. */ >>> >>> + while (! 
VEC_empty (basic_block, bbs_in_hot_partition) >>> >>> + && cold_bb_count) >>> >>> + { >>> >>> + basic_block dom_bb; >>> >>> + >>> >>> + bb = VEC_pop (basic_block, bbs_in_hot_partition); >>> >>> + dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb); >>> >>> + >>> >>> + /* If bb's immediate dominator is also hot then it is ok. */ >>> >>> + if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION) >>> >>> + continue; >>> >>> + >>> >>> + /* We have a hot bb with an immediate dominator that is cold. >>> >>> + The dominator needs to be re-marked to hot. */ >>> >>> + BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION); >>> >>> + cold_bb_count--; >>> >>> + >>> >>> + /* Now we need to examine newly-hot dom_bb to see if it is also >>> >>> + dominated by a cold bb. */ >>> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb); >>> >>> + >>> >>> + /* We should also adjust any cold blocks that the newly-hot bb >>> >>> + feeds and see if it makes sense to re-mark those as hot as >>> >>> + well. */ >>> >>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb); >>> >>> + while (! VEC_empty (basic_block, bbs_newly_hot)) >>> >>> + { >>> >>> + basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot); >>> >>> + /* Examine all successors of this newly-hot bb to see if they >>> >>> + are cold and should be re-marked as hot. */ >>> >>> + FOR_EACH_EDGE (e, ei, new_hot_bb->succs) >>> >>> + { >>> >>> + bool any_cold_preds = false; >>> >>> + basic_block succ = e->dest; >>> >>> + if (BB_PARTITION (succ) != BB_COLD_PARTITION) >>> >>> + continue; >>> >>> + /* Does this block have any cold predecessors now? */ >>> >>> + FOR_EACH_EDGE (e2, ei2, succ->preds) >>> >>> + { >>> >>> + if (BB_PARTITION (e2->src) == BB_COLD_PARTITION) >>> >>> + { >>> >>> + any_cold_preds = true; >>> >>> + break; >>> >>> + } >>> >>> + } >>> >>> + if (any_cold_preds) >>> >>> + continue; >>> >>> + >>> >>> + /* Here we have a successor of newly-hot bb that is cold >>> >>> + but no longer has any cold precessessors. 
Since the original >>> >>> + assignment of our newly-hot bb was incorrect, this successor's >>> >>> + assignment as cold is also suspect. Go ahead and re-mark it >>> >>> + as hot now too. Better heuristics may be in order here. */ >>> >>> + BB_SET_PARTITION (succ, BB_HOT_PARTITION); >>> >>> + cold_bb_count--; >>> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ); >>> >>> + /* Examine this successor as a newly-hot bb. */ >>> >>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, succ); >>> >>> + } >>> >>> + } >>> >>> + } >>> >>> + >>> >>> + if (dom_calculated_here) >>> >>> + free_dominance_info (CDI_DOMINATORS); >>> >>> + } >>> >>> + >>> >>> /* The format of .gcc_except_table does not allow landing pads to >>> >>> be in a different partition as the throw. Fix this by either >>> >>> moving or duplicating the landing pads. */ >>> >>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void) >>> >>> new_bb->aux = cur_bb->aux; >>> >>> cur_bb->aux = new_bb; >>> >>> >>> >>> - /* Make sure new fall-through bb is in same >>> >>> - partition as bb it's falling through from. */ >>> >>> + /* This is done by force_nonfallthru_and_redirect. */ >>> >>> + gcc_assert (BB_PARTITION (new_bb) >>> >>> + == BB_PARTITION (cur_bb)); >>> >>> >>> >>> - BB_COPY_PARTITION (new_bb, cur_bb); >>> >>> single_succ_edge (new_bb)->flags |= EDGE_CROSSING; >>> >>> } >>> >>> else >>> >>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void) >>> >>> FOR_EACH_BB (bb) >>> >>> FOR_EACH_EDGE (e, ei, bb->succs) >>> >>> if ((e->flags & EDGE_CROSSING) >>> >>> - && JUMP_P (BB_END (e->src))) >>> >>> + && JUMP_P (BB_END (e->src)) >>> >>> + /* Some notes were added during fix_up_fall_thru_edges, via >>> >>> + force_nonfallthru_and_redirect. 
*/ >>> >>> + && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX)) >>> >>> add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>> >>> } >>> >>> >>> >>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void) >>> >>> dump_flow_info (dump_file, dump_flags); >>> >>> } >>> >>> >>> >>> - if (flag_reorder_blocks_and_partition) >>> >>> + if (crtl->has_bb_partition) >>> >>> verify_hot_cold_block_grouping (); >>> >>> } >>> >>> >>> >>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void) >>> >>> encountering this note will make the compiler switch between the >>> >>> hot and cold text sections. */ >>> >>> >>> >>> -static void >>> >>> +void >>> >>> insert_section_boundary_note (void) >>> >>> { >>> >>> basic_block bb; >>> >>> rtx new_note; >>> >>> int first_partition = 0; >>> >>> >>> >>> - if (!flag_reorder_blocks_and_partition) >>> >>> + if (!crtl->has_bb_partition) >>> >>> return; >>> >>> >>> >>> FOR_EACH_BB (bb) >>> >>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void) >>> >>> FOR_EACH_BB (bb) >>> >>> if (bb->next_bb != EXIT_BLOCK_PTR) >>> >>> bb->aux = bb->next_bb; >>> >>> - cfg_layout_finalize (); >>> >>> + cfg_layout_finalize (true); >>> >>> >>> >>> - /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. */ >>> >>> - insert_section_boundary_note (); >>> >>> return 0; >>> >>> } >>> >>> >>> >>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void) >>> >>> } >>> >>> >>> >>> done: >>> >>> - cfg_layout_finalize (); >>> >>> + cfg_layout_finalize (false); >>> >>> >>> >>> BITMAP_FREE (candidates); >>> >>> return 0; >>> >>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void) >>> >>> if (crossing_edges == NULL) >>> >>> return 0; >>> >>> >>> >>> + crtl->has_bb_partition = true; >>> >>> + >>> >>> /* Make sure the source of any crossing edge ends in a jump and the >>> >>> destination of any crossing edge has a label. 
*/ >>> >>> add_labels_and_missing_jumps (crossing_edges); >>> >>> Index: bb-reorder.h >>> >>> =================================================================== >>> >>> --- bb-reorder.h (revision 193376) >>> >>> +++ bb-reorder.h (working copy) >>> >>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re >>> >>> >>> >>> extern int get_uncond_jump_length (void); >>> >>> >>> >>> +extern void insert_section_boundary_note (void); >>> >>> + >>> >>> +extern void emit_barrier_after_bb (basic_block bb); >>> >>> + >>> >>> #endif >>> >>> Index: basic-block.h >>> >>> =================================================================== >>> >>> --- basic-block.h (revision 193376) >>> >>> +++ basic-block.h (working copy) >>> >>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect >>> >>> extern bool contains_no_active_insn_p (const_basic_block); >>> >>> extern bool forwarder_block_p (const_basic_block); >>> >>> extern bool can_fallthru (basic_block, basic_block); >>> >>> +extern void fixup_partitions (void); >>> >>> >>> >>> /* In cfgbuild.c. */ >>> >>> extern void find_many_sub_basic_blocks (sbitmap); >>> >>> Index: cfgrtl.c >>> >>> =================================================================== >>> >>> --- cfgrtl.c (revision 193376) >>> >>> +++ cfgrtl.c (working copy) >>> >>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. If not see >>> >>> #include "tree.h" >>> >>> #include "hard-reg-set.h" >>> >>> #include "basic-block.h" >>> >>> +#include "bb-reorder.h" >>> >>> #include "regs.h" >>> >>> #include "flags.h" >>> >>> #include "function.h" >>> >>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3. If not see >>> >>> Only applicable if the CFG is in cfglayout mode. 
*/ >>> >>> static GTY(()) rtx cfg_layout_function_footer; >>> >>> static GTY(()) rtx cfg_layout_function_header; >>> >>> +static bool had_sec_boundary_notes; >>> >>> >>> >>> static rtx skip_insns_after_block (basic_block); >>> >>> static void record_effective_endpoints (void); >>> >>> static rtx label_for_bb (basic_block); >>> >>> -static void fixup_reorder_chain (void); >>> >>> +static void fixup_reorder_chain (bool finalize_reorder_blocks); >>> >>> >>> >>> void verify_insn_chain (void); >>> >>> static void fixup_fallthru_exit_predecessor (void); >>> >>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc >>> >>> partition boundaries). See the comments at the top of >>> >>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ >>> >>> >>> >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) >>> >>> - || BB_PARTITION (src) != BB_PARTITION (target)) >>> >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) >>> >>> return NULL; >>> >>> >>> >>> /* We can replace or remove a complex jump only when we have exactly >>> >>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target) >>> >>> return e; >>> >>> } >>> >>> >>> >>> +/* Called when edge E has been redirected to a new destination, >>> >>> + in order to update the region crossing flag on the edge and >>> >>> + jump. */ >>> >>> + >>> >>> +static void >>> >>> +fixup_partition_crossing (edge e, basic_block target) >>> >>> +{ >>> >>> + rtx note; >>> >>> + >>> >>> + gcc_assert (e->dest == target); >>> >>> + >>> >>> + if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR) >>> >>> + return; >>> >>> + /* If we redirected an existing edge, it may already be marked >>> >>> + crossing, even though the new src is missing a reg crossing note. >>> >>> + But make sure reg crossing note doesn't already exist before >>> >>> + inserting. 
*/ >>> >>> + if (BB_PARTITION (e->src) != BB_PARTITION (target)) >>> >>> + { >>> >>> + e->flags |= EDGE_CROSSING; >>> >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>> >>> + if (JUMP_P (BB_END (e->src)) >>> >>> + && !note) >>> >>> + add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>> >>> + } >>> >>> + else if (BB_PARTITION (e->src) == BB_PARTITION (target)) >>> >>> + { >>> >>> + e->flags &= ~EDGE_CROSSING; >>> >>> + /* Remove the region crossing note from jump at end of >>> >>> + e->src if it exists. */ >>> >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>> >>> + if (note) >>> >>> + remove_note (BB_END (e->src), note); >>> >>> + } >>> >>> +} >>> >>> + >>> >>> +/* Called when block BB has been reassigned to a different partition, >>> >>> + to ensure that the region crossing attributes are updated. */ >>> >>> + >>> >>> +static void >>> >>> +fixup_bb_partition (basic_block bb) >>> >>> +{ >>> >>> + edge e; >>> >>> + edge_iterator ei; >>> >>> + >>> >>> + /* Now need to make bb's pred edges non-region crossing. */ >>> >>> + FOR_EACH_EDGE (e, ei, bb->preds) >>> >>> + { >>> >>> + fixup_partition_crossing (e, e->dest); >>> >>> + } >>> >>> + >>> >>> + /* Possibly need to make bb's successor edges region crossing, >>> >>> + or remove stale region crossing. */ >>> >>> + FOR_EACH_EDGE (e, ei, bb->succs) >>> >>> + { >>> >>> + if ((e->flags & EDGE_FALLTHRU) >>> >>> + && BB_PARTITION (bb) != BB_PARTITION (e->dest) >>> >>> + && e->dest != EXIT_BLOCK_PTR) >>> >>> + /* force_nonfallthru_and_redirect calls fixup_partition_crossing. */ >>> >>> + force_nonfallthru (e); >>> >>> + else >>> >>> + fixup_partition_crossing (e, e->dest); >>> >>> + } >>> >>> +} >>> >>> + >>> >>> /* Attempt to change code to redirect edge E to TARGET. Don't do that on >>> >>> expense of adding new instructions or reordering basic blocks. 
>>> >>> >>> >>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block >>> >>> { >>> >>> edge ret; >>> >>> basic_block src = e->src; >>> >>> + basic_block dest = e->dest; >>> >>> >>> >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) >>> >>> return NULL; >>> >>> >>> >>> - if (e->dest == target) >>> >>> + if (dest == target) >>> >>> return e; >>> >>> >>> >>> if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL) >>> >>> { >>> >>> df_set_bb_dirty (src); >>> >>> + fixup_partition_crossing (ret, target); >>> >>> return ret; >>> >>> } >>> >>> >>> >>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block >>> >>> return NULL; >>> >>> >>> >>> df_set_bb_dirty (src); >>> >>> + fixup_partition_crossing (ret, target); >>> >>> return ret; >>> >>> } >>> >>> >>> >>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc >>> >>> /* Make sure new block ends up in correct hot/cold section. */ >>> >>> >>> >>> BB_COPY_PARTITION (jump_block, e->src); >>> >>> - if (flag_reorder_blocks_and_partition >>> >>> - && targetm_common.have_named_sections >>> >>> - && JUMP_P (BB_END (jump_block)) >>> >>> - && !any_condjump_p (BB_END (jump_block)) >>> >>> - && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING)) >>> >>> - add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX); >>> >>> >>> >>> /* Wire edge in. */ >>> >>> new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU); >>> >>> new_edge->probability = probability; >>> >>> new_edge->count = count; >>> >>> >>> >>> + /* If e->src was previously region crossing, it no longer is >>> >>> + and the reg crossing note should be removed. */ >>> >>> + fixup_partition_crossing (new_edge, jump_block); >>> >>> + >>> >>> /* Redirect old edge. 
*/ >>> >>> redirect_edge_pred (e, jump_block); >>> >>> e->probability = REG_BR_PROB_BASE; >>> >>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc >>> >>> LABEL_NUSES (label)++; >>> >>> } >>> >>> >>> >>> - emit_barrier_after (BB_END (jump_block)); >>> >>> + /* We might be in cfg layout mode, and if so, the following routine will >>> >>> + insert the barrier correctly. */ >>> >>> + emit_barrier_after_bb (jump_block); >>> >>> redirect_edge_succ_nodup (e, target); >>> >>> >>> >>> if (abnormal_edge_flags) >>> >>> make_edge (src, target, abnormal_edge_flags); >>> >>> >>> >>> df_mark_solutions_dirty (); >>> >>> + fixup_partition_crossing (e, target); >>> >>> return new_bb; >>> >>> } >>> >>> >>> >>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU >>> >>> static basic_block >>> >>> rtl_split_edge (edge edge_in) >>> >>> { >>> >>> - basic_block bb; >>> >>> + basic_block bb, new_bb; >>> >>> rtx before; >>> >>> >>> >>> /* Abnormal edges cannot be split. */ >>> >>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in) >>> >>> else >>> >>> { >>> >>> bb = create_basic_block (before, NULL, edge_in->dest->prev_bb); >>> >>> - /* ??? Why not edge_in->dest->prev_bb here? */ >>> >>> - BB_COPY_PARTITION (bb, edge_in->dest); >>> >>> + if (edge_in->src == ENTRY_BLOCK_PTR) >>> >>> + BB_COPY_PARTITION (bb, edge_in->dest); >>> >>> + else >>> >>> + /* Put the split bb into the src partition, to avoid creating >>> >>> + a situation where a cold bb dominates a hot bb, in the case >>> >>> + where src is cold and dest is hot. The src will dominate >>> >>> + the new bb (whereas it might not have dominated dest). */ >>> >>> + BB_COPY_PARTITION (bb, edge_in->src); >>> >>> } >>> >>> >>> >>> make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU); >>> >>> >>> >>> + /* Can't allow a region crossing edge to be fallthrough. 
*/ >>> >>> + if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest) >>> >>> + && edge_in->dest != EXIT_BLOCK_PTR) >>> >>> + { >>> >>> + new_bb = force_nonfallthru (single_succ_edge (bb)); >>> >>> + gcc_assert (!new_bb); >>> >>> + } >>> >>> + >>> >>> /* For non-fallthru edges, we must adjust the predecessor's >>> >>> jump instruction to target our new block. */ >>> >>> if ((edge_in->flags & EDGE_FALLTHRU) == 0) >>> >>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e) >>> >>> else >>> >>> { >>> >>> bb = split_edge (e); >>> >>> - after = BB_END (bb); >>> >>> >>> >>> - if (flag_reorder_blocks_and_partition >>> >>> - && targetm_common.have_named_sections >>> >>> - && e->src != ENTRY_BLOCK_PTR >>> >>> - && BB_PARTITION (e->src) == BB_COLD_PARTITION >>> >>> - && !(e->flags & EDGE_CROSSING) >>> >>> - && JUMP_P (after) >>> >>> - && !any_condjump_p (after) >>> >>> - && (single_succ_edge (bb)->flags & EDGE_CROSSING)) >>> >>> - add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX); >>> >>> + /* If e crossed a partition boundary, we needed to make bb end in >>> >>> + a region-crossing jump, even though it was originally fallthru. */ >>> >>> + if (JUMP_P (BB_END (bb))) >>> >>> + before = BB_END (bb); >>> >>> + else >>> >>> + after = BB_END (bb); >>> >>> } >>> >>> >>> >>> /* Now that we've found the spot, do the insertion. */ >>> >>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void) >>> >>> { >>> >>> basic_block bb; >>> >>> >>> >>> + /* Optimization passes that invoke this routine can cause hot blocks >>> >>> + previously reached by both hot and cold blocks to become dominated only >>> >>> + by cold blocks. This will cause the verification below to fail, >>> >>> + and lead to now cold code in the hot section. In some cases this >>> >>> + may only be visible after newly unreachable blocks are deleted, >>> >>> + which will be done by fixup_partitions. 
*/ >>> >>> + fixup_partitions (); >>> >>> + >>> >>> #ifdef ENABLE_CHECKING >>> >>> verify_flow_info (); >>> >>> #endif >>> >>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb) >>> >>> >>> >>> return end; >>> >>> } >>> >>> - >>> >>> + >>> >>> +/* Perform cleanup on the hot/cold bb partitioning after optimization >>> >>> + passes that modify the cfg. */ >>> >>> + >>> >>> +void >>> >>> +fixup_partitions (void) >>> >>> +{ >>> >>> + basic_block bb; >>> >>> + >>> >>> + if (!crtl->has_bb_partition) >>> >>> + return; >>> >>> + >>> >>> + /* Delete any blocks that became unreachable and weren't >>> >>> + already cleaned up, for example during edge forwarding >>> >>> + and convert_jumps_to_returns. This will expose more >>> >>> + opportunities for fixing the partition boundaries here. >>> >>> + Also, the calculation of the dominance graph during verification >>> >>> + will assert if there are unreachable nodes. */ >>> >>> + delete_unreachable_blocks (); >>> >>> + >>> >>> + /* If there are partitions, do a sanity check on them: A basic block in >>> >>> + a cold partition cannot dominate a basic block in a hot partition. >>> >>> + Fixup any that now violate this requirement, as a result of edge >>> >>> + forwarding and unreachable block deletion. */ >>> >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >>> >>> + VEC (basic_block, heap) *bbs_to_fix = NULL; >>> >>> + FOR_EACH_BB (bb) >>> >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >>> >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) >>> >>> + { >>> >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>> >>> + basic_block son; >>> >>> + >>> >>> + if (dom_calculated_here) >>> >>> + calculate_dominance_info (CDI_DOMINATORS); >>> >>> + >>> >>> + while (! 
VEC_empty (basic_block, bbs_in_cold_partition)) >>> >>> + { >>> >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >>> >>> + /* If bb is not yet cold (because it was added below as >>> >>> + a block dominated by a cold bb) then mark it cold here. */ >>> >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >>> >>> + { >>> >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>> >>> + VEC_safe_push (basic_block, heap, bbs_to_fix, bb); >>> >>> + } >>> >>> + /* Any blocks dominated by a block in the cold section >>> >>> + must also be cold. */ >>> >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); >>> >>> + son; >>> >>> + son = next_dom_son (CDI_DOMINATORS, son)) >>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >>> >>> + } >>> >>> + >>> >>> + if (dom_calculated_here) >>> >>> + free_dominance_info (CDI_DOMINATORS); >>> >>> + } >>> >>> + >>> >>> + /* Do the partition fixup after all necessary blocks have been converted to >>> >>> + cold, so that we only update the region crossings the minimum number of >>> >>> + places, which can require forcing edges to be non fallthru. */ >>> >>> + while (! VEC_empty (basic_block, bbs_to_fix)) >>> >>> + { >>> >>> + bb = VEC_pop (basic_block, bbs_to_fix); >>> >>> + fixup_bb_partition (bb); >>> >>> + } >>> >>> +} >>> >>> + >>> >>> /* Verify the CFG and RTL consistency common for both underlying RTL and >>> >>> cfglayout RTL. >>> >>> >>> >>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void) >>> >>> rtx x; >>> >>> int err = 0; >>> >>> basic_block bb; >>> >>> + bool have_partitions = false; >>> >>> >>> >>> /* Check the general integrity of the basic blocks. 
*/ >>> >>> FOR_EACH_BB_REVERSE (bb) >>> >>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void) >>> >>> >>> >>> if (e->flags & EDGE_ABNORMAL) >>> >>> n_abnormal++; >>> >>> + >>> >>> + have_partitions |= is_crossing; >>> >>> } >>> >>> >>> >>> if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX)) >>> >>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void) >>> >>> } >>> >>> } >>> >>> >>> >>> + /* If there are partitions, do a sanity check on them: A basic block in >>> >>> + a cold partition cannot dominate a basic block in a hot partition. */ >>> >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >>> >>> + if (have_partitions && !err) >>> >>> + FOR_EACH_BB (bb) >>> >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >>> >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) >>> >>> + { >>> >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>> >>> + basic_block son; >>> >>> + >>> >>> + if (dom_calculated_here) >>> >>> + calculate_dominance_info (CDI_DOMINATORS); >>> >>> + >>> >>> + while (! VEC_empty (basic_block, bbs_in_cold_partition)) >>> >>> + { >>> >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >>> >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >>> >>> + { >>> >>> + error ("non-cold basic block %d dominated " >>> >>> + "by a block in the cold partition", bb->index); >>> >>> + err = 1; >>> >>> + } >>> >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); >>> >>> + son; >>> >>> + son = next_dom_son (CDI_DOMINATORS, son)) >>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >>> >>> + } >>> >>> + >>> >>> + if (dom_calculated_here) >>> >>> + free_dominance_info (CDI_DOMINATORS); >>> >>> + } >>> >>> + >>> >>> /* Clean up. 
*/ >>> >>> return err; >>> >>> } >>> >>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void) >>> >>> else >>> >>> cfg_layout_function_header = NULL_RTX; >>> >>> >>> >>> + had_sec_boundary_notes = false; >>> >>> + >>> >>> next_insn = get_insns (); >>> >>> FOR_EACH_BB (bb) >>> >>> { >>> >>> rtx end; >>> >>> >>> >>> if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb)) >>> >>> - BB_HEADER (bb) = unlink_insn_chain (next_insn, >>> >>> - PREV_INSN (BB_HEAD (bb))); >>> >>> + { >>> >>> + /* Rather than try to keep section boundary notes incrementally >>> >>> + up-to-date through cfg layout optimizations, simply remove them >>> >>> + and flag that they should be re-inserted when exiting >>> >>> + cfg layout mode. */ >>> >>> + rtx check_insn = next_insn; >>> >>> + while (check_insn) >>> >>> + { >>> >>> + if (NOTE_P (check_insn) >>> >>> + && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS) >>> >>> + { >>> >>> + had_sec_boundary_notes |= true; >>> >>> + /* Remove note from chain. Grab new next_insn first. */ >>> >>> + if (next_insn == check_insn) >>> >>> + next_insn = NEXT_INSN (check_insn); >>> >>> + /* Delete note. */ >>> >>> + delete_insn (check_insn); >>> >>> + /* There will only be one. */ >>> >>> + break; >>> >>> + } >>> >>> + check_insn = NEXT_INSN (check_insn); >>> >>> + } >>> >>> + /* If we still have header instructions left after above loop. 
*/ >>> >>> + if (next_insn != BB_HEAD (bb)) >>> >>> + BB_HEADER (bb) = unlink_insn_chain (next_insn, >>> >>> + PREV_INSN (BB_HEAD (bb))); >>> >>> + } >>> >>> end = skip_insns_after_block (bb); >>> >>> if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end) >>> >>> BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end); >>> >>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void) >>> >>> if (bb->next_bb != EXIT_BLOCK_PTR) >>> >>> bb->aux = bb->next_bb; >>> >>> >>> >>> - cfg_layout_finalize (); >>> >>> + cfg_layout_finalize (false); >>> >>> >>> >>> return 0; >>> >>> } >>> >>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode) >>> >>> } >>> >>> >>> >>> >>> >>> -/* Given a reorder chain, rearrange the code to match. */ >>> >>> +/* Given a reorder chain, rearrange the code to match. If >>> >>> + this is called when we will FINALIZE_REORDER_BLOCKS, or when >>> >>> + section boundary notes were removed on entry to cfg layout >>> >>> + mode, insert section boundary notes here. */ >>> >>> >>> >>> static void >>> >>> -fixup_reorder_chain (void) >>> >>> +fixup_reorder_chain (bool finalize_reorder_blocks) >>> >>> { >>> >>> basic_block bb; >>> >>> rtx insn = NULL; >>> >>> @@ -3150,7 +3373,7 @@ static void >>> >>> PREV_INSN (BB_HEADER (bb)) = insn; >>> >>> insn = BB_HEADER (bb); >>> >>> while (NEXT_INSN (insn)) >>> >>> - insn = NEXT_INSN (insn); >>> >>> + insn = NEXT_INSN (insn); >>> >>> } >>> >>> if (insn) >>> >>> NEXT_INSN (insn) = BB_HEAD (bb); >>> >>> @@ -3175,6 +3398,11 @@ static void >>> >>> insn = NEXT_INSN (insn); >>> >>> >>> >>> set_last_insn (insn); >>> >>> + >>> >>> + /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. 
*/ >>> >>> + if (had_sec_boundary_notes || finalize_reorder_blocks) >>> >>> + insert_section_boundary_note (); >>> >>> + >>> >>> #ifdef ENABLE_CHECKING >>> >>> verify_insn_chain (); >>> >>> #endif >>> >>> @@ -3187,7 +3415,7 @@ static void >>> >>> edge e_fall, e_taken, e; >>> >>> rtx bb_end_insn; >>> >>> rtx ret_label = NULL_RTX; >>> >>> - basic_block nb, src_bb; >>> >>> + basic_block nb; >>> >>> edge_iterator ei; >>> >>> >>> >>> if (EDGE_COUNT (bb->succs) == 0) >>> >>> @@ -3322,7 +3550,6 @@ static void >>> >>> /* We got here if we need to add a new jump insn. >>> >>> Note force_nonfallthru can delete E_FALL and thus we have to >>> >>> save E_FALL->src prior to the call to force_nonfallthru. */ >>> >>> - src_bb = e_fall->src; >>> >>> nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label); >>> >>> if (nb) >>> >>> { >>> >>> @@ -3330,17 +3557,6 @@ static void >>> >>> bb->aux = nb; >>> >>> /* Don't process this new block. */ >>> >>> bb = nb; >>> >>> - >>> >>> - /* Make sure new bb is tagged for correct section (same as >>> >>> - fall-thru source, since you cannot fall-thru across >>> >>> - section boundaries). */ >>> >>> - BB_COPY_PARTITION (src_bb, single_pred (bb)); >>> >>> - if (flag_reorder_blocks_and_partition >>> >>> - && targetm_common.have_named_sections >>> >>> - && JUMP_P (BB_END (bb)) >>> >>> - && !any_condjump_p (BB_END (bb)) >>> >>> - && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING)) >>> >>> - add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX); >>> >>> } >>> >>> } >>> >>> >>> >>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to) >>> >>> case NOTE_INSN_FUNCTION_BEG: >>> >>> /* There is always just single entry to function. */ >>> >>> case NOTE_INSN_BASIC_BLOCK: >>> >>> + /* We should only switch text sections once. 
*/ >>> >>> + case NOTE_INSN_SWITCH_TEXT_SECTIONS: >>> >>> break; >>> >>> >>> >>> case NOTE_INSN_EPILOGUE_BEG: >>> >>> - case NOTE_INSN_SWITCH_TEXT_SECTIONS: >>> >>> emit_note_copy (insn); >>> >>> break; >>> >>> >>> >>> @@ -3759,10 +3976,13 @@ break_superblocks (void) >>> >>> } >>> >>> >>> >>> /* Finalize the changes: reorder insn list according to the sequence specified >>> >>> - by aux pointers, enter compensation code, rebuild scope forest. */ >>> >>> + by aux pointers, enter compensation code, rebuild scope forest. If >>> >>> + this is called when we will FINALIZE_REORDER_BLOCKS, indicate that >>> >>> + to fixup_reorder_chain so that it can insert the proper switch text >>> >>> + section notes. */ >>> >>> >>> >>> void >>> >>> -cfg_layout_finalize (void) >>> >>> +cfg_layout_finalize (bool finalize_reorder_blocks) >>> >>> { >>> >>> #ifdef ENABLE_CHECKING >>> >>> verify_flow_info (); >>> >>> @@ -3775,7 +3995,7 @@ void >>> >>> #endif >>> >>> ) >>> >>> fixup_fallthru_exit_predecessor (); >>> >>> - fixup_reorder_chain (); >>> >>> + fixup_reorder_chain (finalize_reorder_blocks); >>> >>> >>> >>> rebuild_jump_labels (get_insns ()); >>> >>> delete_dead_jumptables (); >>> >>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e) >>> >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) >>> >>> return false; >>> >>> >>> >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) >>> >>> - || BB_PARTITION (src) != BB_PARTITION (target)) >>> >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) >>> >>> return false; >>> >>> >>> >>> if (!onlyjump_p (insn) >>> >>> >>> >>> -- >>> >>> This patch is available for review at http://codereview.appspot.com/6823047 >>> >> >>> >> >>> >> >>> >> -- >>> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 >>> >>> >>> >>> -- >>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 > > > > -- > Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
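For readers following the thread, the invariant the patch verifies (no cold block may dominate a hot block) can be pictured on a toy graph. The following is a minimal sketch and not GCC code: `toy_cfg`, `compute_dominators` and `sanitize_partitions` are hypothetical names, the graph is limited to 32 blocks that are all assumed reachable from block 0, and the extra heuristic in the patch that re-marks cold successors with no remaining cold predecessors is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BLOCKS 32 /* toy limit: one dominator set fits in a 32-bit word */

/* Hypothetical toy CFG, unrelated to GCC's basic_block structures.
   Block 0 is the entry; pred[b] is a bitmask of b's predecessors.  */
struct toy_cfg
{
  int n;
  uint32_t pred[MAX_BLOCKS];
  bool cold[MAX_BLOCKS];
};

/* Iterative dominator computation: dom(entry) = {entry}, and
   dom(b) = {b} united with the intersection of dom(p) over all
   predecessors p.  Assumes every block is reachable from block 0.  */
static void
compute_dominators (const struct toy_cfg *g, uint32_t dom[])
{
  assert (g->n > 0 && g->n <= MAX_BLOCKS);
  uint32_t all = g->n == 32 ? 0xffffffffu : ((1u << g->n) - 1);
  bool changed = true;

  dom[0] = 1u;
  for (int b = 1; b < g->n; b++)
    dom[b] = all;

  while (changed)
    {
      changed = false;
      for (int b = 1; b < g->n; b++)
        {
          uint32_t meet = all;
          for (int p = 0; p < g->n; p++)
            if (g->pred[b] & (1u << p))
              meet &= dom[p];
          uint32_t next = meet | (1u << b);
          if (next != dom[b])
            {
              dom[b] = next;
              changed = true;
            }
        }
    }
}

/* Re-mark as hot every cold block that dominates some hot block.
   One pass is enough because dominance is transitive: the dominators
   of a re-marked block are already in the hot block's own set.
   Returns the number of blocks re-marked.  */
static int
sanitize_partitions (struct toy_cfg *g)
{
  uint32_t dom[MAX_BLOCKS];
  int remarked = 0;

  compute_dominators (g, dom);
  for (int h = 0; h < g->n; h++)
    {
      if (g->cold[h])
        continue;
      for (int d = 0; d < g->n; d++)
        if ((dom[h] & (1u << d)) && g->cold[d])
          {
            g->cold[d] = false;
            remarked++;
          }
    }
  return remarked;
}
```

On a function whose entry path is cold but which contains a hot loop, a case the patch comments mention, the entry-side blocks that dominate the loop would be re-marked hot while cold blocks off that path stay cold.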
Is this with the same target compiler and options used in PR55121? I
will try to reproduce the compile-time failures with arm and those
options if so. I haven't seen those with spec2006 linux x86_64. I'm
not sure how to test the runtime behavior though.

Thanks,
Teresa

On Wed, Nov 28, 2012 at 7:48 AM, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> I have updated my trunk checkout, and I can confirm that eval.c now
> compiles with your patch (and the other 4 patches I added to PR55121).
>
> Now, when looking at the whole Spec2k results:
> - vpr passes now (used to fail)
> - gcc, parser, perlbmk bzip2 and twolf no longer build: they all fail
> with the same error from gas:
> can't resolve `.text.unlikely' {.text.unlikely section} - `.LBB171'
> {.text section}
> - gap still does not build (same error as above)
>
> I haven't looked in detail, so I may be missing an obvious patch here.
>
> And I still observe runtime mis-behaviour on crafty, galgel, facerec and fma3d.
>
> Thanks
> Christophe.
>
>
> On 26 November 2012 21:52, Teresa Johnson <tejohnson@google.com> wrote:
>> Sorry, I don't know what happened there. Patch is attached.
>> Thanks,
>> Teresa
>>
>> On Mon, Nov 26, 2012 at 12:42 PM, Jack Howarth <howarth@bromo.med.uc.edu> wrote:
>>> On Mon, Nov 26, 2012 at 12:19:55PM -0800, Teresa Johnson wrote:
>>>> Are you sure you have all my changes applied? I applied the 4 patches
>>>> attached to PR55121 into my trunk checkout that has my fixes, and to a
>>>> pristine trunk checkout. I configured and built both for
>>>> --target=arm-none-linux-gnueabi, and built using your options, .i file
>>>> and gcda file. I can reproduce the failure using the pristine trunk
>>>> with your patches but not with my fixed trunk + your patches. (I just
>>>> updated to head to pickup recent changes and get the same result. The
>>>> vec changes required some manual changes to the patch, which I will
>>>> resend shortly.)
>>>
>>> Teresa,
>>> Your mailer seems to have corrupted the posted patch with stray
>>> =3D characters and line breaks. Can you repost a copy as an attachment
>>> to the list?
>>> Jack
>>>
>>>>
>>>> Without my fixes:
>>>>
>>>> $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreprocessed
>>>> eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
>>>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
>>>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
>>>> -fno-common -o eval.s -freorder-blocks-and-partition
>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>>> 2.4.2-p1, MPC version 0.8.1
>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>>> 2.4.2-p1, MPC version 0.8.1
>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>>> Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a
>>>> eval.c: In function ‘Ge’:
>>>> eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560
>>>> }
>>>> ^
>>>> 0x622f71 df_compact_blocks()
>>>>         ../../gcc_trunk_3/gcc/df-core.c:1560
>>>> 0x5cfcb5 compact_blocks()
>>>>         ../../gcc_trunk_3/gcc/cfg.c:162
>>>> 0xc9dce0 reorder_basic_blocks
>>>>         ../../gcc_trunk_3/gcc/bb-reorder.c:2154
>>>> 0xc9dce0 rest_of_handle_reorder_blocks
>>>>         ../../gcc_trunk_3/gcc/bb-reorder.c:2219
>>>> Please submit a full bug report,
>>>> with preprocessed source if appropriate.
>>>> Please include the complete backtrace with any bug report.
>>>> See <http://gcc.gnu.org/bugs.html> for instructions.
>>>>
>>>>
>>>> With my fixes:
>>>>
>>>> $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreprocessed
>>>> eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
>>>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
>>>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
>>>> -fno-common -o eval.s -freorder-blocks-and-partition
>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>>> 2.4.2-p1, MPC version 0.8.1
>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>>> 2.4.2-p1, MPC version 0.8.1
>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>>> Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3
>>>>
>>>>
>>>> Thanks,
>>>> Teresa
>>>>
>>>> On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon
>>>> <christophe.lyon@linaro.org> wrote:
>>>> > Hi,
>>>> >
>>>> > I have tested your patch on Spec2000 on ARM, and I can still see
>>>> > several failures caused by:
>>>> > "error: fallthru edge crosses section boundary", including the case
>>>> > described in PR55121.
>>>> >
>>>> > On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote:
>>>> >> Ping.
>>>> >> Teresa
>>>> >>
>>>> >> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>>> >>> Revised patch that fixes failures encountered when enabling
>>>> >>> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>>>> >>>
>>>> >>> This includes new verification code to ensure no cold blocks dominate hot
>>>> >>> blocks contributed by Steven Bosscher.
>>>> >>>
>>>> >>> I attempted to make the handling of partition updates through the optimization
>>>> >>> passes much more consistent, removing a number of partial fixes in the code
>>>> >>> stream in the process. The code to fixup partitions (including the BB_PARTITION
>>>> >>> assignment, region crossing jump notes, and switch text section notes) is
>>>> >>> now handled in a few centralized locations. For example, inside
>>>> >>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
>>>> >>> don't need to attempt the fixup themselves.
>>>> >>>
>>>> >>> For optimization passes that make adjustments to the cfg while in cfg layout
>>>> >>> mode that are not easy to fix up incrementally, the new routine
>>>> >>> fixup_partitions handles the cleanup globally. This does require calculation
>>>> >>> of the dominance relation, however, as far as I can tell the routines which
>>>> >>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
>>>> >>> are invoked typically once (or a small number of times in the case of
>>>> >>> try_optimize_cfg) per optimization pass. Additionally, I compared the
>>>> >>> -ftime-report output for some large fdo compilations and saw only minimal
>>>> >>> increases in the dominance computation times, which were only a tiny percent
>>>> >>> of the overall compile time.
>>>> >>>
>>>> >>> Additionally, I added a flag to the rtl_data structure to indicate whether
>>>> >>> any partitioning was actually performed, so that optimizations which were
>>>> >>> conservatively disabled whenever the flag_reorder_blocks_and_partition
>>>> >>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
>>>> >>> conservative for functions where no partitions were formed (e.g. they are
>>>> >>> completely hot).
>>>> >>>
>>>> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu.
Also tested with SPEC2006 int
>>>> >>> benchmarks and internal google benchmarks using profile feedback and
>>>> >>> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
>>>> >>>
>>>> >>> Thanks,
>>>> >>> Teresa
>>>> >>>
>>>> >>> 2012-11-14 Teresa Johnson <tejohnson@google.com>
>>>> >>> Steven Bosscher <steven@gcc.gnu.org>
>>>> >>>
>>>> >>> * cfghooks.h (cfg_layout_finalize): New parameter.
>>>> >>> * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
>>>> >>> parameter.
>>>> >>> * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
>>>> >>> as this is now done by redirect_edge_and_branch_force.
>>>> >>> * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
>>>> >>> barriers, new cfg_layout_finalize parameter, and don't store exit
>>>> >>> predecessor BB until after it is potentially split.
>>>> >>> * function.h (struct rtl_data): New flag has_bb_partition.
>>>> >>> * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
>>>> >>> * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
>>>> >>> any blocks in function actually partitioned.
>>>> >>> (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
>>>> >>> up partitioning.
>>>> >>> * bb-reorder.c (connect_traces): Only look for partitions and skip
>>>> >>> block copying if any blocks in function actually partitioned.
>>>> >>> (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
>>>> >>> (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
>>>> >>> that no cold blocks dominate a hot block.
>>>> >>> (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
>>>> >>> as this is now done by force_nonfallthru_and_redirect.
>>>> >>> (add_reg_crossing_jump_notes): Handle the fact that some jumps may
>>>> >>> already be marked with region crossing note.
>>>> >>> (reorder_basic_blocks): Only need to verify partitions if any
>>>> >>> blocks in function actually partitioned.
>>>> >>> (insert_section_boundary_note): Only need to insert note if any
>>>> >>> blocks in function actually partitioned.
>>>> >>> (rest_of_handle_reorder_blocks): New cfg_layout_finalize
>>>> >>> parameter, and remove call to insert_section_boundary_note as this
>>>> >>> is now called via cfg_layout_finalize/fixup_reorder_chain.
>>>> >>> (duplicate_computed_gotos): New cfg_layout_finalize
>>>> >>> parameter.
>>>> >>> (partition_hot_cold_basic_blocks): Set flag indicating function
>>>> >>> has bb partitions.
>>>> >>> * bb-reorder.h: Declare insert_section_boundary_note and
>>>> >>> emit_barrier_after_bb, which are no longer static.
>>>> >>> * basic-block.h: Declare new function fixup_partitions.
>>>> >>> * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
>>>> >>> check for region crossing note.
>>>> >>> (fixup_partition_crossing): New function.
>>>> >>> (fixup_bb_partition): Ditto.
>>>> >>> (rtl_redirect_edge_and_branch): Fixup partition boundaries.
>>>> >>> (force_nonfallthru_and_redirect): Fixup partition boundaries,
>>>> >>> remove old code that tried to do this. Emit barrier correctly
>>>> >>> when we are in cfglayout mode.
>>>> >>> (rtl_split_edge): Correctly fixup partition boundaries.
>>>> >>> (commit_one_edge_insertion): Remove old code that tried to
>>>> >>> fixup region crossing edge since this is now handled in
>>>> >>> split_block, and set up insertion point correctly since
>>>> >>> block may now end in a jump.
>>>> >>> (commit_edge_insertions): Invoke fixup_partitions to sanitize partition
>>>> >>> boundaries after optimizations that modify cfg and before trying to
>>>> >>> verify the flow info.
>>>> >>> (fixup_partitions): New function.
>>>> >>> (rtl_verify_flow_info_1): Add verification that no cold bbs dominate
>>>> >>> hot bbs.
>>>> >>> (record_effective_endpoints): Remove region-crossing notes and set flag
>>>> >>> indicating that they need to be reinserted on exit from cfglayout mode.
>>>> >>> (outof_cfg_layout_mode): New cfg_layout_finalize parameter.
>>>> >>> (fixup_reorder_chain): Call insert_section_boundary_note if necessary.
>>>> >>> Remove old code that attempted to fixup region crossing note as
>>>> >>> this is now handled in force_nonfallthru_and_redirect.
>>>> >>> (duplicate_insn_chain): Don't duplicate switch section notes.
>>>> >>> (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
>>>> >>> (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
>>>> >>> note.
>>>> >>>
>>>> >>> Index: cfghooks.h
>>>> >>> ===================================================================
>>>> >>> --- cfghooks.h (revision 193376)
>>>> >>> +++ cfghooks.h (working copy)
>>>> >>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
>>>> >>>  void account_profile_record (struct profile_record *, int);
>>>> >>>
>>>> >>>  extern void cfg_layout_initialize (unsigned int);
>>>> >>> -extern void cfg_layout_finalize (void);
>>>> >>> +extern void cfg_layout_finalize (bool);
>>>> >>>
>>>> >>>  /* Hooks containers.
*/ >>>> >>> extern struct cfg_hooks gimple_cfg_hooks; >>>> >>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi >>>> >>> extern void gimple_register_cfg_hooks (void); >>>> >>> extern struct cfg_hooks get_cfg_hooks (void); >>>> >>> extern void set_cfg_hooks (struct cfg_hooks); >>>> >>> - >>>> >>> Index: modulo-sched.c >>>> >>> =================================================================== >>>> >>> --- modulo-sched.c (revision 193376) >>>> >>> +++ modulo-sched.c (working copy) >>>> >>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void) >>>> >>> if (bb->next_bb != EXIT_BLOCK_PTR) >>>> >>> bb->aux = bb->next_bb; >>>> >>> free_dominance_info (CDI_DOMINATORS); >>>> >>> - cfg_layout_finalize (); >>>> >>> + cfg_layout_finalize (false); >>>> >>> #endif /* INSN_SCHEDULING */ >>>> >>> return 0; >>>> >>> } >>>> >>> Index: ifcvt.c >>>> >>> =================================================================== >>>> >>> --- ifcvt.c (revision 193376) >>>> >>> +++ ifcvt.c (working copy) >>>> >>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg >>>> >>> if (new_bb) >>>> >>> { >>>> >>> df_bb_replace (then_bb_index, new_bb); >>>> >>> - /* Since the fallthru edge was redirected from test_bb to new_bb, >>>> >>> - we need to ensure that new_bb is in the same partition as >>>> >>> - test bb (you can not fall through across section boundaries). */ >>>> >>> - BB_COPY_PARTITION (new_bb, test_bb); >>>> >>> + /* This should have been done above via force_nonfallthru_and_redirect >>>> >>> + (possibly called from redirect_edge_and_branch_force). 
*/ >>>> >>> + gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb)); >>>> >>> } >>>> >>> >>>> >>> num_true_changes++; >>>> >>> Index: function.c >>>> >>> =================================================================== >>>> >>> --- function.c (revision 193376) >>>> >>> +++ function.c (working copy) >>>> >>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void) >>>> >>> break; >>>> >>> if (e) >>>> >>> { >>>> >>> - copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)), >>>> >>> - NULL_RTX, e->src); >>>> >>> + /* Make sure we insert after any barriers. */ >>>> >>> + rtx end = get_last_bb_insn (e->src); >>>> >>> + copy_bb = create_basic_block (NEXT_INSN (end), >>>> >>> + NULL_RTX, e->src); >>>> >>> BB_COPY_PARTITION (copy_bb, e->src); >>>> >>> } >>>> >>> else >>>> >>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void) >>>> >>> if (cur_bb->index >= NUM_FIXED_BLOCKS >>>> >>> && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS) >>>> >>> cur_bb->aux = cur_bb->next_bb; >>>> >>> - cfg_layout_finalize (); >>>> >>> + cfg_layout_finalize (false); >>>> >>> } >>>> >>> >>>> >>> epilogue_done: >>>> >>> @@ -6517,7 +6519,7 @@ epilogue_done: >>>> >>> basic_block simple_return_block_cold = NULL; >>>> >>> edge pending_edge_hot = NULL; >>>> >>> edge pending_edge_cold = NULL; >>>> >>> - basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb; >>>> >>> + basic_block exit_pred; >>>> >>> int i; >>>> >>> >>>> >>> gcc_assert (entry_edge != orig_entry_edge); >>>> >>> @@ -6545,6 +6547,12 @@ epilogue_done: >>>> >>> else >>>> >>> pending_edge_cold = e; >>>> >>> } >>>> >>> + >>>> >>> + /* Save a pointer to the exit's predecessor BB for use in >>>> >>> + inserting new BBs at the end of the function. Do this >>>> >>> + after the call to split_block above which may split >>>> >>> + the original exit pred. 
*/ >>>> >>> + exit_pred = EXIT_BLOCK_PTR->prev_bb; >>>> >>> >>>> >>> FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e) >>>> >>> { >>>> >>> Index: function.h >>>> >>> =================================================================== >>>> >>> --- function.h (revision 193376) >>>> >>> +++ function.h (working copy) >>>> >>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data { >>>> >>> sched2) and is useful only if the port defines LEAF_REGISTERS. */ >>>> >>> bool uses_only_leaf_regs; >>>> >>> >>>> >>> + /* Nonzero if the function being compiled has undergone hot/cold partitioning >>>> >>> + (under flag_reorder_blocks_and_partition) and has at least one cold >>>> >>> + block. */ >>>> >>> + bool has_bb_partition; >>>> >>> + >>>> >>> /* Like regs_ever_live, but 1 if a reg is set or clobbered from an >>>> >>> asm. Unlike regs_ever_live, elements of this array corresponding >>>> >>> to eliminable regs (like the frame pointer) are set if an asm >>>> >>> Index: hw-doloop.c >>>> >>> =================================================================== >>>> >>> --- hw-doloop.c (revision 193376) >>>> >>> +++ hw-doloop.c (working copy) >>>> >>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops) >>>> >>> else >>>> >>> bb->aux = NULL; >>>> >>> } >>>> >>> - cfg_layout_finalize (); >>>> >>> + cfg_layout_finalize (false); >>>> >>> clear_aux_for_blocks (); >>>> >>> df_analyze (); >>>> >>> } >>>> >>> Index: cfgcleanup.c >>>> >>> =================================================================== >>>> >>> --- cfgcleanup.c (revision 193376) >>>> >>> +++ cfgcleanup.c (working copy) >>>> >>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2, >>>> >>> partition boundaries). See the comments at the top of >>>> >>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. 
*/ >>>> >>> >>>> >>> - if (flag_reorder_blocks_and_partition && reload_completed) >>>> >>> + if (crtl->has_bb_partition && reload_completed) >>>> >>> return false; >>>> >>> >>>> >>> /* Search backward through forwarder blocks. We don't need to worry >>>> >>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode) >>>> >>> df_analyze (); >>>> >>> } >>>> >>> >>>> >>> + if (changed) >>>> >>> + { >>>> >>> + /* Edge forwarding in particular can cause hot blocks previously >>>> >>> + reached by both hot and cold blocks to become dominated only >>>> >>> + by cold blocks. This will cause the verification below to fail, >>>> >>> + and lead to now cold code in the hot section. This is not easy >>>> >>> + to detect and fix during edge forwarding, and in some cases >>>> >>> + is only visible after newly unreachable blocks are deleted, >>>> >>> + which will be done in fixup_partitions. */ >>>> >>> + fixup_partitions (); >>>> >>> + >>>> >>> #ifdef ENABLE_CHECKING >>>> >>> - if (changed) >>>> >>> - verify_flow_info (); >>>> >>> + verify_flow_info (); >>>> >>> #endif >>>> >>> + } >>>> >>> >>>> >>> changed_overall |= changed; >>>> >>> first_pass = false; >>>> >>> Index: bb-reorder.c >>>> >>> =================================================================== >>>> >>> --- bb-reorder.c (revision 193376) >>>> >>> +++ bb-reorder.c (working copy) >>>> >>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces >>>> >>> current_partition = BB_PARTITION (traces[0].first); >>>> >>> two_passes = false; >>>> >>> >>>> >>> - if (flag_reorder_blocks_and_partition) >>>> >>> + if (crtl->has_bb_partition) >>>> >>> for (i = 0; i < n_traces && !two_passes; i++) >>>> >>> if (BB_PARTITION (traces[0].first) >>>> >>> != BB_PARTITION (traces[i].first)) >>>> >>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces >>>> >>> } >>>> >>> } >>>> >>> >>>> >>> - if (flag_reorder_blocks_and_partition) >>>> >>> + if (crtl->has_bb_partition) >>>> >>> try_copy = false; >>>> >>> >>>> 
>>> /* Copy tiny blocks always; copy larger blocks only when the >>>> >>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void) >>>> >>> return length; >>>> >>> } >>>> >>> >>>> >>> -/* Emit a barrier into the footer of BB. */ >>>> >>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode. */ >>>> >>> >>>> >>> -static void >>>> >>> +void >>>> >>> emit_barrier_after_bb (basic_block bb) >>>> >>> { >>>> >>> rtx barrier = emit_barrier_after (BB_END (bb)); >>>> >>> - BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); >>>> >>> + if (current_ir_type () == IR_RTL_CFGLAYOUT) >>>> >>> + BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); >>>> >>> } >>>> >>> >>>> >>> /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions. >>>> >>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg >>>> >>> { >>>> >>> VEC(edge, heap) *crossing_edges = NULL; >>>> >>> basic_block bb; >>>> >>> - edge e; >>>> >>> - edge_iterator ei; >>>> >>> + edge e, e2; >>>> >>> + edge_iterator ei, ei2; >>>> >>> + unsigned int cold_bb_count = 0; >>>> >>> + VEC (basic_block, heap) *bbs_in_hot_partition = NULL; >>>> >>> + VEC (basic_block, heap) *bbs_newly_hot = NULL; >>>> >>> >>>> >>> /* Mark which partition (hot/cold) each basic block belongs in. */ >>>> >>> FOR_EACH_BB (bb) >>>> >>> { >>>> >>> if (probably_never_executed_bb_p (cfun, bb)) >>>> >>> - BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>>> >>> + { >>>> >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>>> >>> + cold_bb_count++; >>>> >>> + } >>>> >>> else >>>> >>> - BB_SET_PARTITION (bb, BB_HOT_PARTITION); >>>> >>> + { >>>> >>> + BB_SET_PARTITION (bb, BB_HOT_PARTITION); >>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb); >>>> >>> + } >>>> >>> } >>>> >>> >>>> >>> + /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of >>>> >>> + several different possibilities. 
One is that there are edge weight insanities >>>> >>> + due to optimization phases that do not properly update basic block profile >>>> >>> + counts. The second is that the entry of the function may not be hot, because >>>> >>> + it is entered fewer times than the number of profile training runs, but there >>>> >>> + is a loop inside the function that causes blocks within the function to be >>>> >>> + above the threshold for hotness. */ >>>> >>> + if (cold_bb_count) >>>> >>> + { >>>> >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>>> >>> + >>>> >>> + if (dom_calculated_here) >>>> >>> + calculate_dominance_info (CDI_DOMINATORS); >>>> >>> + >>>> >>> + /* Keep examining hot bbs until we have either checked them all, or >>>> >>> + re-marked all cold bbs hot. */ >>>> >>> + while (! VEC_empty (basic_block, bbs_in_hot_partition) >>>> >>> + && cold_bb_count) >>>> >>> + { >>>> >>> + basic_block dom_bb; >>>> >>> + >>>> >>> + bb = VEC_pop (basic_block, bbs_in_hot_partition); >>>> >>> + dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb); >>>> >>> + >>>> >>> + /* If bb's immediate dominator is also hot then it is ok. */ >>>> >>> + if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION) >>>> >>> + continue; >>>> >>> + >>>> >>> + /* We have a hot bb with an immediate dominator that is cold. >>>> >>> + The dominator needs to be re-marked to hot. */ >>>> >>> + BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION); >>>> >>> + cold_bb_count--; >>>> >>> + >>>> >>> + /* Now we need to examine newly-hot dom_bb to see if it is also >>>> >>> + dominated by a cold bb. */ >>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb); >>>> >>> + >>>> >>> + /* We should also adjust any cold blocks that the newly-hot bb >>>> >>> + feeds and see if it makes sense to re-mark those as hot as >>>> >>> + well. */ >>>> >>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb); >>>> >>> + while (! 
VEC_empty (basic_block, bbs_newly_hot)) >>>> >>> + { >>>> >>> + basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot); >>>> >>> + /* Examine all successors of this newly-hot bb to see if they >>>> >>> + are cold and should be re-marked as hot. */ >>>> >>> + FOR_EACH_EDGE (e, ei, new_hot_bb->succs) >>>> >>> + { >>>> >>> + bool any_cold_preds = false; >>>> >>> + basic_block succ = e->dest; >>>> >>> + if (BB_PARTITION (succ) != BB_COLD_PARTITION) >>>> >>> + continue; >>>> >>> + /* Does this block have any cold predecessors now? */ >>>> >>> + FOR_EACH_EDGE (e2, ei2, succ->preds) >>>> >>> + { >>>> >>> + if (BB_PARTITION (e2->src) == BB_COLD_PARTITION) >>>> >>> + { >>>> >>> + any_cold_preds = true; >>>> >>> + break; >>>> >>> + } >>>> >>> + } >>>> >>> + if (any_cold_preds) >>>> >>> + continue; >>>> >>> + >>>> >>> + /* Here we have a successor of newly-hot bb that is cold >>>> >>> + but no longer has any cold precessessors. Since the original >>>> >>> + assignment of our newly-hot bb was incorrect, this successor's >>>> >>> + assignment as cold is also suspect. Go ahead and re-mark it >>>> >>> + as hot now too. Better heuristics may be in order here. */ >>>> >>> + BB_SET_PARTITION (succ, BB_HOT_PARTITION); >>>> >>> + cold_bb_count--; >>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ); >>>> >>> + /* Examine this successor as a newly-hot bb. */ >>>> >>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, succ); >>>> >>> + } >>>> >>> + } >>>> >>> + } >>>> >>> + >>>> >>> + if (dom_calculated_here) >>>> >>> + free_dominance_info (CDI_DOMINATORS); >>>> >>> + } >>>> >>> + >>>> >>> /* The format of .gcc_except_table does not allow landing pads to >>>> >>> be in a different partition as the throw. Fix this by either >>>> >>> moving or duplicating the landing pads. 
*/ >>>> >>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void) >>>> >>> new_bb->aux = cur_bb->aux; >>>> >>> cur_bb->aux = new_bb; >>>> >>> >>>> >>> - /* Make sure new fall-through bb is in same >>>> >>> - partition as bb it's falling through from. */ >>>> >>> + /* This is done by force_nonfallthru_and_redirect. */ >>>> >>> + gcc_assert (BB_PARTITION (new_bb) >>>> >>> + == BB_PARTITION (cur_bb)); >>>> >>> >>>> >>> - BB_COPY_PARTITION (new_bb, cur_bb); >>>> >>> single_succ_edge (new_bb)->flags |= EDGE_CROSSING; >>>> >>> } >>>> >>> else >>>> >>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void) >>>> >>> FOR_EACH_BB (bb) >>>> >>> FOR_EACH_EDGE (e, ei, bb->succs) >>>> >>> if ((e->flags & EDGE_CROSSING) >>>> >>> - && JUMP_P (BB_END (e->src))) >>>> >>> + && JUMP_P (BB_END (e->src)) >>>> >>> + /* Some notes were added during fix_up_fall_thru_edges, via >>>> >>> + force_nonfallthru_and_redirect. */ >>>> >>> + && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX)) >>>> >>> add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>>> >>> } >>>> >>> >>>> >>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void) >>>> >>> dump_flow_info (dump_file, dump_flags); >>>> >>> } >>>> >>> >>>> >>> - if (flag_reorder_blocks_and_partition) >>>> >>> + if (crtl->has_bb_partition) >>>> >>> verify_hot_cold_block_grouping (); >>>> >>> } >>>> >>> >>>> >>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void) >>>> >>> encountering this note will make the compiler switch between the >>>> >>> hot and cold text sections. 
*/ >>>> >>> >>>> >>> -static void >>>> >>> +void >>>> >>> insert_section_boundary_note (void) >>>> >>> { >>>> >>> basic_block bb; >>>> >>> rtx new_note; >>>> >>> int first_partition = 0; >>>> >>> >>>> >>> - if (!flag_reorder_blocks_and_partition) >>>> >>> + if (!crtl->has_bb_partition) >>>> >>> return; >>>> >>> >>>> >>> FOR_EACH_BB (bb) >>>> >>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void) >>>> >>> FOR_EACH_BB (bb) >>>> >>> if (bb->next_bb != EXIT_BLOCK_PTR) >>>> >>> bb->aux = bb->next_bb; >>>> >>> - cfg_layout_finalize (); >>>> >>> + cfg_layout_finalize (true); >>>> >>> >>>> >>> - /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. */ >>>> >>> - insert_section_boundary_note (); >>>> >>> return 0; >>>> >>> } >>>> >>> >>>> >>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void) >>>> >>> } >>>> >>> >>>> >>> done: >>>> >>> - cfg_layout_finalize (); >>>> >>> + cfg_layout_finalize (false); >>>> >>> >>>> >>> BITMAP_FREE (candidates); >>>> >>> return 0; >>>> >>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void) >>>> >>> if (crossing_edges == NULL) >>>> >>> return 0; >>>> >>> >>>> >>> + crtl->has_bb_partition = true; >>>> >>> + >>>> >>> /* Make sure the source of any crossing edge ends in a jump and the >>>> >>> destination of any crossing edge has a label. 
*/ >>>> >>> add_labels_and_missing_jumps (crossing_edges); >>>> >>> Index: bb-reorder.h >>>> >>> =================================================================== >>>> >>> --- bb-reorder.h (revision 193376) >>>> >>> +++ bb-reorder.h (working copy) >>>> >>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re >>>> >>> >>>> >>> extern int get_uncond_jump_length (void); >>>> >>> >>>> >>> +extern void insert_section_boundary_note (void); >>>> >>> + >>>> >>> +extern void emit_barrier_after_bb (basic_block bb); >>>> >>> + >>>> >>> #endif >>>> >>> Index: basic-block.h >>>> >>> =================================================================== >>>> >>> --- basic-block.h (revision 193376) >>>> >>> +++ basic-block.h (working copy) >>>> >>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect >>>> >>> extern bool contains_no_active_insn_p (const_basic_block); >>>> >>> extern bool forwarder_block_p (const_basic_block); >>>> >>> extern bool can_fallthru (basic_block, basic_block); >>>> >>> +extern void fixup_partitions (void); >>>> >>> >>>> >>> /* In cfgbuild.c. */ >>>> >>> extern void find_many_sub_basic_blocks (sbitmap); >>>> >>> Index: cfgrtl.c >>>> >>> =================================================================== >>>> >>> --- cfgrtl.c (revision 193376) >>>> >>> +++ cfgrtl.c (working copy) >>>> >>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. If not see >>>> >>> #include "tree.h" >>>> >>> #include "hard-reg-set.h" >>>> >>> #include "basic-block.h" >>>> >>> +#include "bb-reorder.h" >>>> >>> #include "regs.h" >>>> >>> #include "flags.h" >>>> >>> #include "function.h" >>>> >>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3. If not see >>>> >>> Only applicable if the CFG is in cfglayout mode. 
*/ >>>> >>> static GTY(()) rtx cfg_layout_function_footer; >>>> >>> static GTY(()) rtx cfg_layout_function_header; >>>> >>> +static bool had_sec_boundary_notes; >>>> >>> >>>> >>> static rtx skip_insns_after_block (basic_block); >>>> >>> static void record_effective_endpoints (void); >>>> >>> static rtx label_for_bb (basic_block); >>>> >>> -static void fixup_reorder_chain (void); >>>> >>> +static void fixup_reorder_chain (bool finalize_reorder_blocks); >>>> >>> >>>> >>> void verify_insn_chain (void); >>>> >>> static void fixup_fallthru_exit_predecessor (void); >>>> >>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc >>>> >>> partition boundaries). See the comments at the top of >>>> >>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ >>>> >>> >>>> >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) >>>> >>> - || BB_PARTITION (src) != BB_PARTITION (target)) >>>> >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) >>>> >>> return NULL; >>>> >>> >>>> >>> /* We can replace or remove a complex jump only when we have exactly >>>> >>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target) >>>> >>> return e; >>>> >>> } >>>> >>> >>>> >>> +/* Called when edge E has been redirected to a new destination, >>>> >>> + in order to update the region crossing flag on the edge and >>>> >>> + jump. */ >>>> >>> + >>>> >>> +static void >>>> >>> +fixup_partition_crossing (edge e, basic_block target) >>>> >>> +{ >>>> >>> + rtx note; >>>> >>> + >>>> >>> + gcc_assert (e->dest == target); >>>> >>> + >>>> >>> + if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR) >>>> >>> + return; >>>> >>> + /* If we redirected an existing edge, it may already be marked >>>> >>> + crossing, even though the new src is missing a reg crossing note. >>>> >>> + But make sure reg crossing note doesn't already exist before >>>> >>> + inserting. 
*/ >>>> >>> + if (BB_PARTITION (e->src) != BB_PARTITION (target)) >>>> >>> + { >>>> >>> + e->flags |= EDGE_CROSSING; >>>> >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>>> >>> + if (JUMP_P (BB_END (e->src)) >>>> >>> + && !note) >>>> >>> + add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>>> >>> + } >>>> >>> + else if (BB_PARTITION (e->src) == BB_PARTITION (target)) >>>> >>> + { >>>> >>> + e->flags &= ~EDGE_CROSSING; >>>> >>> + /* Remove the region crossing note from jump at end of >>>> >>> + e->src if it exists. */ >>>> >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>>> >>> + if (note) >>>> >>> + remove_note (BB_END (e->src), note); >>>> >>> + } >>>> >>> +} >>>> >>> + >>>> >>> +/* Called when block BB has been reassigned to a different partition, >>>> >>> + to ensure that the region crossing attributes are updated. */ >>>> >>> + >>>> >>> +static void >>>> >>> +fixup_bb_partition (basic_block bb) >>>> >>> +{ >>>> >>> + edge e; >>>> >>> + edge_iterator ei; >>>> >>> + >>>> >>> + /* Now need to make bb's pred edges non-region crossing. */ >>>> >>> + FOR_EACH_EDGE (e, ei, bb->preds) >>>> >>> + { >>>> >>> + fixup_partition_crossing (e, e->dest); >>>> >>> + } >>>> >>> + >>>> >>> + /* Possibly need to make bb's successor edges region crossing, >>>> >>> + or remove stale region crossing. */ >>>> >>> + FOR_EACH_EDGE (e, ei, bb->succs) >>>> >>> + { >>>> >>> + if ((e->flags & EDGE_FALLTHRU) >>>> >>> + && BB_PARTITION (bb) != BB_PARTITION (e->dest) >>>> >>> + && e->dest != EXIT_BLOCK_PTR) >>>> >>> + /* force_nonfallthru_and_redirect calls fixup_partition_crossing. */ >>>> >>> + force_nonfallthru (e); >>>> >>> + else >>>> >>> + fixup_partition_crossing (e, e->dest); >>>> >>> + } >>>> >>> +} >>>> >>> + >>>> >>> /* Attempt to change code to redirect edge E to TARGET. Don't do that on >>>> >>> expense of adding new instructions or reordering basic blocks. 
>>>> >>> >>>> >>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block >>>> >>> { >>>> >>> edge ret; >>>> >>> basic_block src = e->src; >>>> >>> + basic_block dest = e->dest; >>>> >>> >>>> >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) >>>> >>> return NULL; >>>> >>> >>>> >>> - if (e->dest == target) >>>> >>> + if (dest == target) >>>> >>> return e; >>>> >>> >>>> >>> if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL) >>>> >>> { >>>> >>> df_set_bb_dirty (src); >>>> >>> + fixup_partition_crossing (ret, target); >>>> >>> return ret; >>>> >>> } >>>> >>> >>>> >>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block >>>> >>> return NULL; >>>> >>> >>>> >>> df_set_bb_dirty (src); >>>> >>> + fixup_partition_crossing (ret, target); >>>> >>> return ret; >>>> >>> } >>>> >>> >>>> >>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc >>>> >>> /* Make sure new block ends up in correct hot/cold section. */ >>>> >>> >>>> >>> BB_COPY_PARTITION (jump_block, e->src); >>>> >>> - if (flag_reorder_blocks_and_partition >>>> >>> - && targetm_common.have_named_sections >>>> >>> - && JUMP_P (BB_END (jump_block)) >>>> >>> - && !any_condjump_p (BB_END (jump_block)) >>>> >>> - && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING)) >>>> >>> - add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX); >>>> >>> >>>> >>> /* Wire edge in. */ >>>> >>> new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU); >>>> >>> new_edge->probability = probability; >>>> >>> new_edge->count = count; >>>> >>> >>>> >>> + /* If e->src was previously region crossing, it no longer is >>>> >>> + and the reg crossing note should be removed. */ >>>> >>> + fixup_partition_crossing (new_edge, jump_block); >>>> >>> + >>>> >>> /* Redirect old edge. 
*/ >>>> >>> redirect_edge_pred (e, jump_block); >>>> >>> e->probability = REG_BR_PROB_BASE; >>>> >>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc >>>> >>> LABEL_NUSES (label)++; >>>> >>> } >>>> >>> >>>> >>> - emit_barrier_after (BB_END (jump_block)); >>>> >>> + /* We might be in cfg layout mode, and if so, the following routine will >>>> >>> + insert the barrier correctly. */ >>>> >>> + emit_barrier_after_bb (jump_block); >>>> >>> redirect_edge_succ_nodup (e, target); >>>> >>> >>>> >>> if (abnormal_edge_flags) >>>> >>> make_edge (src, target, abnormal_edge_flags); >>>> >>> >>>> >>> df_mark_solutions_dirty (); >>>> >>> + fixup_partition_crossing (e, target); >>>> >>> return new_bb; >>>> >>> } >>>> >>> >>>> >>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU >>>> >>> static basic_block >>>> >>> rtl_split_edge (edge edge_in) >>>> >>> { >>>> >>> - basic_block bb; >>>> >>> + basic_block bb, new_bb; >>>> >>> rtx before; >>>> >>> >>>> >>> /* Abnormal edges cannot be split. */ >>>> >>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in) >>>> >>> else >>>> >>> { >>>> >>> bb = create_basic_block (before, NULL, edge_in->dest->prev_bb); >>>> >>> - /* ??? Why not edge_in->dest->prev_bb here? */ >>>> >>> - BB_COPY_PARTITION (bb, edge_in->dest); >>>> >>> + if (edge_in->src == ENTRY_BLOCK_PTR) >>>> >>> + BB_COPY_PARTITION (bb, edge_in->dest); >>>> >>> + else >>>> >>> + /* Put the split bb into the src partition, to avoid creating >>>> >>> + a situation where a cold bb dominates a hot bb, in the case >>>> >>> + where src is cold and dest is hot. The src will dominate >>>> >>> + the new bb (whereas it might not have dominated dest). */ >>>> >>> + BB_COPY_PARTITION (bb, edge_in->src); >>>> >>> } >>>> >>> >>>> >>> make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU); >>>> >>> >>>> >>> + /* Can't allow a region crossing edge to be fallthrough. 
*/ >>>> >>> + if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest) >>>> >>> + && edge_in->dest != EXIT_BLOCK_PTR) >>>> >>> + { >>>> >>> + new_bb = force_nonfallthru (single_succ_edge (bb)); >>>> >>> + gcc_assert (!new_bb); >>>> >>> + } >>>> >>> + >>>> >>> /* For non-fallthru edges, we must adjust the predecessor's >>>> >>> jump instruction to target our new block. */ >>>> >>> if ((edge_in->flags & EDGE_FALLTHRU) == 0) >>>> >>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e) >>>> >>> else >>>> >>> { >>>> >>> bb = split_edge (e); >>>> >>> - after = BB_END (bb); >>>> >>> >>>> >>> - if (flag_reorder_blocks_and_partition >>>> >>> - && targetm_common.have_named_sections >>>> >>> - && e->src != ENTRY_BLOCK_PTR >>>> >>> - && BB_PARTITION (e->src) == BB_COLD_PARTITION >>>> >>> - && !(e->flags & EDGE_CROSSING) >>>> >>> - && JUMP_P (after) >>>> >>> - && !any_condjump_p (after) >>>> >>> - && (single_succ_edge (bb)->flags & EDGE_CROSSING)) >>>> >>> - add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX); >>>> >>> + /* If e crossed a partition boundary, we needed to make bb end in >>>> >>> + a region-crossing jump, even though it was originally fallthru. */ >>>> >>> + if (JUMP_P (BB_END (bb))) >>>> >>> + before = BB_END (bb); >>>> >>> + else >>>> >>> + after = BB_END (bb); >>>> >>> } >>>> >>> >>>> >>> /* Now that we've found the spot, do the insertion. */ >>>> >>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void) >>>> >>> { >>>> >>> basic_block bb; >>>> >>> >>>> >>> + /* Optimization passes that invoke this routine can cause hot blocks >>>> >>> + previously reached by both hot and cold blocks to become dominated only >>>> >>> + by cold blocks. This will cause the verification below to fail, >>>> >>> + and lead to now cold code in the hot section. In some cases this >>>> >>> + may only be visible after newly unreachable blocks are deleted, >>>> >>> + which will be done by fixup_partitions. 
*/ >>>> >>> + fixup_partitions (); >>>> >>> + >>>> >>> #ifdef ENABLE_CHECKING >>>> >>> verify_flow_info (); >>>> >>> #endif >>>> >>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb) >>>> >>> >>>> >>> return end; >>>> >>> } >>>> >>> - >>>> >>> + >>>> >>> +/* Perform cleanup on the hot/cold bb partitioning after optimization >>>> >>> + passes that modify the cfg. */ >>>> >>> + >>>> >>> +void >>>> >>> +fixup_partitions (void) >>>> >>> +{ >>>> >>> + basic_block bb; >>>> >>> + >>>> >>> + if (!crtl->has_bb_partition) >>>> >>> + return; >>>> >>> + >>>> >>> + /* Delete any blocks that became unreachable and weren't >>>> >>> + already cleaned up, for example during edge forwarding >>>> >>> + and convert_jumps_to_returns. This will expose more >>>> >>> + opportunities for fixing the partition boundaries here. >>>> >>> + Also, the calculation of the dominance graph during verification >>>> >>> + will assert if there are unreachable nodes. */ >>>> >>> + delete_unreachable_blocks (); >>>> >>> + >>>> >>> + /* If there are partitions, do a sanity check on them: A basic block in >>>> >>> + a cold partition cannot dominate a basic block in a hot partition. >>>> >>> + Fixup any that now violate this requirement, as a result of edge >>>> >>> + forwarding and unreachable block deletion. */ >>>> >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >>>> >>> + VEC (basic_block, heap) *bbs_to_fix = NULL; >>>> >>> + FOR_EACH_BB (bb) >>>> >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >>>> >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) >>>> >>> + { >>>> >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>>> >>> + basic_block son; >>>> >>> + >>>> >>> + if (dom_calculated_here) >>>> >>> + calculate_dominance_info (CDI_DOMINATORS); >>>> >>> + >>>> >>> + while (! 
VEC_empty (basic_block, bbs_in_cold_partition)) >>>> >>> + { >>>> >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >>>> >>> + /* If bb is not yet cold (because it was added below as >>>> >>> + a block dominated by a cold bb) then mark it cold here. */ >>>> >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >>>> >>> + { >>>> >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>>> >>> + VEC_safe_push (basic_block, heap, bbs_to_fix, bb); >>>> >>> + } >>>> >>> + /* Any blocks dominated by a block in the cold section >>>> >>> + must also be cold. */ >>>> >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); >>>> >>> + son; >>>> >>> + son = next_dom_son (CDI_DOMINATORS, son)) >>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >>>> >>> + } >>>> >>> + >>>> >>> + if (dom_calculated_here) >>>> >>> + free_dominance_info (CDI_DOMINATORS); >>>> >>> + } >>>> >>> + >>>> >>> + /* Do the partition fixup after all necessary blocks have been converted to >>>> >>> + cold, so that we only update the region crossings the minimum number of >>>> >>> + places, which can require forcing edges to be non fallthru. */ >>>> >>> + while (! VEC_empty (basic_block, bbs_to_fix)) >>>> >>> + { >>>> >>> + bb = VEC_pop (basic_block, bbs_to_fix); >>>> >>> + fixup_bb_partition (bb); >>>> >>> + } >>>> >>> +} >>>> >>> + >>>> >>> /* Verify the CFG and RTL consistency common for both underlying RTL and >>>> >>> cfglayout RTL. >>>> >>> >>>> >>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void) >>>> >>> rtx x; >>>> >>> int err = 0; >>>> >>> basic_block bb; >>>> >>> + bool have_partitions = false; >>>> >>> >>>> >>> /* Check the general integrity of the basic blocks. 
*/ >>>> >>> FOR_EACH_BB_REVERSE (bb) >>>> >>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void) >>>> >>> >>>> >>> if (e->flags & EDGE_ABNORMAL) >>>> >>> n_abnormal++; >>>> >>> + >>>> >>> + have_partitions |= is_crossing; >>>> >>> } >>>> >>> >>>> >>> if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX)) >>>> >>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void) >>>> >>> } >>>> >>> } >>>> >>> >>>> >>> + /* If there are partitions, do a sanity check on them: A basic block in >>>> >>> + a cold partition cannot dominate a basic block in a hot partition. */ >>>> >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >>>> >>> + if (have_partitions && !err) >>>> >>> + FOR_EACH_BB (bb) >>>> >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >>>> >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) >>>> >>> + { >>>> >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>>> >>> + basic_block son; >>>> >>> + >>>> >>> + if (dom_calculated_here) >>>> >>> + calculate_dominance_info (CDI_DOMINATORS); >>>> >>> + >>>> >>> + while (! VEC_empty (basic_block, bbs_in_cold_partition)) >>>> >>> + { >>>> >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >>>> >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >>>> >>> + { >>>> >>> + error ("non-cold basic block %d dominated " >>>> >>> + "by a block in the cold partition", bb->index); >>>> >>> + err = 1; >>>> >>> + } >>>> >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); >>>> >>> + son; >>>> >>> + son = next_dom_son (CDI_DOMINATORS, son)) >>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >>>> >>> + } >>>> >>> + >>>> >>> + if (dom_calculated_here) >>>> >>> + free_dominance_info (CDI_DOMINATORS); >>>> >>> + } >>>> >>> + >>>> >>> /* Clean up. 
*/ >>>> >>> return err; >>>> >>> } >>>> >>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void) >>>> >>> else >>>> >>> cfg_layout_function_header = NULL_RTX; >>>> >>> >>>> >>> + had_sec_boundary_notes = false; >>>> >>> + >>>> >>> next_insn = get_insns (); >>>> >>> FOR_EACH_BB (bb) >>>> >>> { >>>> >>> rtx end; >>>> >>> >>>> >>> if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb)) >>>> >>> - BB_HEADER (bb) = unlink_insn_chain (next_insn, >>>> >>> - PREV_INSN (BB_HEAD (bb))); >>>> >>> + { >>>> >>> + /* Rather than try to keep section boundary notes incrementally >>>> >>> + up-to-date through cfg layout optimizations, simply remove them >>>> >>> + and flag that they should be re-inserted when exiting >>>> >>> + cfg layout mode. */ >>>> >>> + rtx check_insn = next_insn; >>>> >>> + while (check_insn) >>>> >>> + { >>>> >>> + if (NOTE_P (check_insn) >>>> >>> + && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS) >>>> >>> + { >>>> >>> + had_sec_boundary_notes |= true; >>>> >>> + /* Remove note from chain. Grab new next_insn first. */ >>>> >>> + if (next_insn == check_insn) >>>> >>> + next_insn = NEXT_INSN (check_insn); >>>> >>> + /* Delete note. */ >>>> >>> + delete_insn (check_insn); >>>> >>> + /* There will only be one. */ >>>> >>> + break; >>>> >>> + } >>>> >>> + check_insn = NEXT_INSN (check_insn); >>>> >>> + } >>>> >>> + /* If we still have header instructions left after above loop. 
*/ >>>> >>> + if (next_insn != BB_HEAD (bb)) >>>> >>> + BB_HEADER (bb) = unlink_insn_chain (next_insn, >>>> >>> + PREV_INSN (BB_HEAD (bb))); >>>> >>> + } >>>> >>> end = skip_insns_after_block (bb); >>>> >>> if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end) >>>> >>> BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end); >>>> >>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void) >>>> >>> if (bb->next_bb != EXIT_BLOCK_PTR) >>>> >>> bb->aux = bb->next_bb; >>>> >>> >>>> >>> - cfg_layout_finalize (); >>>> >>> + cfg_layout_finalize (false); >>>> >>> >>>> >>> return 0; >>>> >>> } >>>> >>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode) >>>> >>> } >>>> >>> >>>> >>> >>>> >>> -/* Given a reorder chain, rearrange the code to match. */ >>>> >>> +/* Given a reorder chain, rearrange the code to match. If >>>> >>> + this is called when we will FINALIZE_REORDER_BLOCKS, or when >>>> >>> + section boundary notes were removed on entry to cfg layout >>>> >>> + mode, insert section boundary notes here. */ >>>> >>> >>>> >>> static void >>>> >>> -fixup_reorder_chain (void) >>>> >>> +fixup_reorder_chain (bool finalize_reorder_blocks) >>>> >>> { >>>> >>> basic_block bb; >>>> >>> rtx insn = NULL; >>>> >>> @@ -3150,7 +3373,7 @@ static void >>>> >>> PREV_INSN (BB_HEADER (bb)) = insn; >>>> >>> insn = BB_HEADER (bb); >>>> >>> while (NEXT_INSN (insn)) >>>> >>> - insn = NEXT_INSN (insn); >>>> >>> + insn = NEXT_INSN (insn); >>>> >>> } >>>> >>> if (insn) >>>> >>> NEXT_INSN (insn) = BB_HEAD (bb); >>>> >>> @@ -3175,6 +3398,11 @@ static void >>>> >>> insn = NEXT_INSN (insn); >>>> >>> >>>> >>> set_last_insn (insn); >>>> >>> + >>>> >>> + /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. 
*/ >>>> >>> + if (had_sec_boundary_notes || finalize_reorder_blocks) >>>> >>> + insert_section_boundary_note (); >>>> >>> + >>>> >>> #ifdef ENABLE_CHECKING >>>> >>> verify_insn_chain (); >>>> >>> #endif >>>> >>> @@ -3187,7 +3415,7 @@ static void >>>> >>> edge e_fall, e_taken, e; >>>> >>> rtx bb_end_insn; >>>> >>> rtx ret_label = NULL_RTX; >>>> >>> - basic_block nb, src_bb; >>>> >>> + basic_block nb; >>>> >>> edge_iterator ei; >>>> >>> >>>> >>> if (EDGE_COUNT (bb->succs) == 0) >>>> >>> @@ -3322,7 +3550,6 @@ static void >>>> >>> /* We got here if we need to add a new jump insn. >>>> >>> Note force_nonfallthru can delete E_FALL and thus we have to >>>> >>> save E_FALL->src prior to the call to force_nonfallthru. */ >>>> >>> - src_bb = e_fall->src; >>>> >>> nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label); >>>> >>> if (nb) >>>> >>> { >>>> >>> @@ -3330,17 +3557,6 @@ static void >>>> >>> bb->aux = nb; >>>> >>> /* Don't process this new block. */ >>>> >>> bb = nb; >>>> >>> - >>>> >>> - /* Make sure new bb is tagged for correct section (same as >>>> >>> - fall-thru source, since you cannot fall-thru across >>>> >>> - section boundaries). */ >>>> >>> - BB_COPY_PARTITION (src_bb, single_pred (bb)); >>>> >>> - if (flag_reorder_blocks_and_partition >>>> >>> - && targetm_common.have_named_sections >>>> >>> - && JUMP_P (BB_END (bb)) >>>> >>> - && !any_condjump_p (BB_END (bb)) >>>> >>> - && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING)) >>>> >>> - add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX); >>>> >>> } >>>> >>> } >>>> >>> >>>> >>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to) >>>> >>> case NOTE_INSN_FUNCTION_BEG: >>>> >>> /* There is always just single entry to function. */ >>>> >>> case NOTE_INSN_BASIC_BLOCK: >>>> >>> + /* We should only switch text sections once. 
*/ >>>> >>> + case NOTE_INSN_SWITCH_TEXT_SECTIONS: >>>> >>> break; >>>> >>> >>>> >>> case NOTE_INSN_EPILOGUE_BEG: >>>> >>> - case NOTE_INSN_SWITCH_TEXT_SECTIONS: >>>> >>> emit_note_copy (insn); >>>> >>> break; >>>> >>> >>>> >>> @@ -3759,10 +3976,13 @@ break_superblocks (void) >>>> >>> } >>>> >>> >>>> >>> /* Finalize the changes: reorder insn list according to the sequence specified >>>> >>> - by aux pointers, enter compensation code, rebuild scope forest. */ >>>> >>> + by aux pointers, enter compensation code, rebuild scope forest. If >>>> >>> + this is called when we will FINALIZE_REORDER_BLOCKS, indicate that >>>> >>> + to fixup_reorder_chain so that it can insert the proper switch text >>>> >>> + section notes. */ >>>> >>> >>>> >>> void >>>> >>> -cfg_layout_finalize (void) >>>> >>> +cfg_layout_finalize (bool finalize_reorder_blocks) >>>> >>> { >>>> >>> #ifdef ENABLE_CHECKING >>>> >>> verify_flow_info (); >>>> >>> @@ -3775,7 +3995,7 @@ void >>>> >>> #endif >>>> >>> ) >>>> >>> fixup_fallthru_exit_predecessor (); >>>> >>> - fixup_reorder_chain (); >>>> >>> + fixup_reorder_chain (finalize_reorder_blocks); >>>> >>> >>>> >>> rebuild_jump_labels (get_insns ()); >>>> >>> delete_dead_jumptables (); >>>> >>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e) >>>> >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) >>>> >>> return false; >>>> >>> >>>> >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) >>>> >>> - || BB_PARTITION (src) != BB_PARTITION (target)) >>>> >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) >>>> >>> return false; >>>> >>> >>>> >>> if (!onlyjump_p (insn) >>>> >>> >>>> >>> -- >>>> >>> This patch is available for review at http://codereview.appspot.com/6823047 >>>> >> >>>> >> >>>> >> >>>> >> -- >>>> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 >>>> >>>> >>>> >>>> -- >>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 >> >> >> >> -- >> Teresa Johnson | Software Engineer | 
tejohnson@google.com | 408-460-2413
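For readers following the fixup_partitions hunk quoted above, the underlying rule (any block dominated by a cold block must itself be cold) can be sketched outside of GCC. This is only an illustration under stated assumptions: the `idom` array, `enum partition`, `MAX_BLOCKS` and `propagate_cold` are hypothetical names, not GCC APIs, and unlike the patch this version marks a block cold when it is pushed rather than when it is popped.

```c
#include <assert.h>

/* Hypothetical stand-ins for BB_HOT_PARTITION / BB_COLD_PARTITION.  */
enum partition { HOT, COLD };

/* Hypothetical upper bound on the toy CFG size.  */
#define MAX_BLOCKS 64

/* IDOM[i] is the immediate dominator of block i, or -1 for the entry
   block.  Mark every block dominated by a cold block as cold and
   return the number of blocks whose partition changed.  */
int
propagate_cold (const int *idom, enum partition *part, int n)
{
  int worklist[MAX_BLOCKS];
  int top = 0, changed = 0;

  assert (n <= MAX_BLOCKS);

  /* Seed the worklist with the blocks that are already cold.  */
  for (int i = 0; i < n; i++)
    if (part[i] == COLD)
      worklist[top++] = i;

  /* Walk down the dominator tree from each cold block.  A block is
     pushed at most once, so the worklist cannot overflow.  */
  while (top > 0)
    {
      int bb = worklist[--top];
      for (int son = 0; son < n; son++)
        if (idom[son] == bb && part[son] != COLD)
          {
            part[son] = COLD;
            changed++;
            worklist[top++] = son;
          }
    }

  return changed;
}
```

In the patch itself, the blocks collected this way are handed to fixup_bb_partition afterwards, so the region crossing flags and notes are updated only once all reassignments are known.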
Yes, I have configured GCC with: --target=arm-none-linux-gnueabi --with-cpu=cortex-a9 --with-fpu=neon --with-float=softfp Thanks, Christophe. On 28 November 2012 16:56, Teresa Johnson <tejohnson@google.com> wrote: > Is this with the same target compiler and options used in PR55121? I > will try to reproduce the compile-time failures with arm and those > options if so. I haven't seen those with spec2006 linux x86_64. I'm > not sure how to test the runtime behavior though. > > Thanks, > Teresa > > On Wed, Nov 28, 2012 at 7:48 AM, Christophe Lyon > <christophe.lyon@linaro.org> wrote: >> I have updated my trunk checkout, and I can confirm that eval.c now >> compiles with your patch (and the other 4 patches I added to PR55121). >> >> Now, when looking at the whole Spec2k results: >> - vpr passes now (used to fail) >> - gcc, parser, perlbmk, bzip2 and twolf no longer build: they all fail >> with the same error from gas: >> can't resolve `.text.unlikely' {.text.unlikely section} - `.LBB171' >> {.text section} >> - gap still does not build (same error as above) >> >> I haven't looked in detail, so I may be missing an obvious patch here. >> >> And I still observe runtime mis-behaviour on crafty, galgel, facerec and fma3d. >> >> Thanks >> Christophe. >> >> >> On 26 November 2012 21:52, Teresa Johnson <tejohnson@google.com> wrote: >>> Sorry, I don't know what happened there. Patch is attached. >>> Thanks, >>> Teresa >>> >>> On Mon, Nov 26, 2012 at 12:42 PM, Jack Howarth <howarth@bromo.med.uc.edu> wrote: >>>> On Mon, Nov 26, 2012 at 12:19:55PM -0800, Teresa Johnson wrote: >>>>> Are you sure you have all my changes applied? I applied the 4 patches >>>>> attached to PR55121 into my trunk checkout that has my fixes, and to a >>>>> pristine trunk checkout. I configured and built both for >>>>> --target=arm-none-linux-gnueabi, and built using your options, .i file >>>>> and gcda file. 
I can reproduce the failure using the pristine trunk >>>>> with your patches but not with my fixed trunk + your patches. (I just >>>>> updated to head to pick up recent changes and get the same result. The >>>>> vec changes required some manual changes to the patch, which I will >>>>> resend shortly.) >>>> >>>> Teresa, >>>> Your mailer seems to have corrupted the posted patch with stray >>>> =3D characters and line breaks. Can you repost a copy as an attachment >>>> to the list? >>>> Jack >>>> >>>>> >>>>> Without my fixes: >>>>> >>>>> $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreprocessed >>>>> eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 >>>>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp >>>>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use >>>>> -fno-common -o eval.s -freorder-blocks-and-partition >>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) >>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version >>>>> 2.4.2-p1, MPC version 0.8.1 >>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 >>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) >>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version >>>>> 2.4.2-p1, MPC version 0.8.1 >>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 >>>>> Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a >>>>> eval.c: In function ‘Ge’: >>>>> eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560 >>>>> } >>>>> ^ >>>>> 0x622f71 df_compact_blocks() >>>>> ../../gcc_trunk_3/gcc/df-core.c:1560 >>>>> 0x5cfcb5 compact_blocks() >>>>> ../../gcc_trunk_3/gcc/cfg.c:162 >>>>> 0xc9dce0 reorder_basic_blocks >>>>> ../../gcc_trunk_3/gcc/bb-reorder.c:2154 >>>>> 0xc9dce0 rest_of_handle_reorder_blocks >>>>> ../../gcc_trunk_3/gcc/bb-reorder.c:2219 >>>>> Please submit a full bug report, >>>>> with preprocessed source if 
appropriate. >>>>> Please include the complete backtrace with any bug report. >>>>> See <http://gcc.gnu.org/bugs.html> for instructions. >>>>> >>>>> >>>>> With my fixes: >>>>> >>>>> $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreprocessed >>>>> eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 >>>>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp >>>>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use >>>>> -fno-common -o eval.s -freorder-blocks-and-partition >>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) >>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version >>>>> 2.4.2-p1, MPC version 0.8.1 >>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 >>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) >>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version >>>>> 2.4.2-p1, MPC version 0.8.1 >>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 >>>>> Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3 >>>>> >>>>> >>>>> Thanks, >>>>> Teresa >>>>> >>>>> On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon >>>>> <christophe.lyon@linaro.org> wrote: >>>>> > Hi, >>>>> > >>>>> > I have tested your patch on Spec2000 on ARM, and I can still see >>>>> > several failures caused by: >>>>> > "error: fallthru edge crosses section boundary", including the case >>>>> > described in PR55121. >>>>> > >>>>> > On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote: >>>>> >> Ping. >>>>> >> Teresa >>>>> >> >>>>> >> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote: >>>>> >>> Revised patch that fixes failures encountered when enabling >>>>> >>> -freorder-blocks-and-partition, including the failure reported in PR 53743. 
>>>>> >>> >>>>> >>> This includes new verification code, contributed by Steven Bosscher, to ensure >>>>> >>> no cold blocks dominate hot blocks. >>>>> >>> >>>>> >>> I attempted to make the handling of partition updates through the optimization >>>>> >>> passes much more consistent, removing a number of partial fixes in the code >>>>> >>> stream in the process. The code to fix up partitions (including the BB_PARTITION >>>>> >>> assignment, region crossing jump notes, and switch text section notes) is >>>>> >>> now handled in a few centralized locations. For example, inside >>>>> >>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers >>>>> >>> don't need to attempt the fixup themselves. >>>>> >>> >>>>> >>> For optimization passes that make adjustments to the cfg while in cfg layout >>>>> >>> mode that are not easy to fix up incrementally, the new routine >>>>> >>> fixup_partitions handles the cleanup globally. This does require calculation >>>>> >>> of the dominance relation; however, as far as I can tell the routines which >>>>> >>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions) >>>>> >>> are invoked typically once (or a small number of times in the case of >>>>> >>> try_optimize_cfg) per optimization pass. Additionally, I compared the >>>>> >>> -ftime-report output for some large fdo compilations and saw only minimal >>>>> >>> increases in the dominance computation times, which were only a tiny percentage >>>>> >>> of the overall compile time. >>>>> >>> >>>>> >>> Additionally, I added a flag to the rtl_data structure to indicate whether >>>>> >>> any partitioning was actually performed, so that optimizations which were >>>>> >>> conservatively disabled whenever the flag_reorder_blocks_and_partition >>>>> >>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less >>>>> >>> conservative for functions where no partitions were formed (e.g. they are >>>>> >>> completely hot). 
>>>>> >>> >>>>> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int >>>>> >>> benchmarks and internal google benchmarks using profile feedback and >>>>> >>> -freorder-blocks-and-partition to get more coverage. Ok for trunk? >>>>> >>> >>>>> >>> Thanks, >>>>> >>> Teresa >>>>> >>> >>>>> >>> 2012-11-14 Teresa Johnson <tejohnson@google.com> >>>>> >>> Steven Bosscher <steven@gcc.gnu.org> >>>>> >>> >>>>> >>> * cfghooks.h (cfg_layout_finalize): New parameter. >>>>> >>> * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize >>>>> >>> parameter. >>>>> >>> * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert >>>>> >>> as this is now done by redirect_edge_and_branch_force. >>>>> >>> * function.c (thread_prologue_and_epilogue_insns): Insert new bb after >>>>> >>> barriers, new cfg_layout_finalize parameter, and don't store exit >>>>> >>> predecessor BB until after it is potentially split. >>>>> >>> * function.h (struct rtl_data): New flag has_bb_partition. >>>>> >>> * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter. >>>>> >>> * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if >>>>> >>> any blocks in function actually partitioned. >>>>> >>> (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean >>>>> >>> up partitioning. >>>>> >>> * bb-reorder.c (connect_traces): Only look for partitions and skip >>>>> >>> block copying if any blocks in function actually partitioned. >>>>> >>> (emit_barrier_after_bb): Handle insertion in non-cfglayout mode. >>>>> >>> (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure >>>>> >>> that no cold blocks dominate a hot block. >>>>> >>> (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert >>>>> >>> as this is now done by force_nonfallthru_and_redirect. >>>>> >>> (add_reg_crossing_jump_notes): Handle the fact that some jumps may >>>>> >>> already be marked with region crossing note. 
>>>>> >>> (reorder_basic_blocks): Only need to verify partitions if any >>>>> >>> blocks in function actually partitioned. >>>>> >>> (insert_section_boundary_note): Only need to insert note if any >>>>> >>> blocks in function actually partitioned. >>>>> >>> (rest_of_handle_reorder_blocks): New cfg_layout_finalize >>>>> >>> parameter, and remove call to insert_section_boundary_note as this >>>>> >>> is now called via cfg_layout_finalize/fixup_reorder_chain. >>>>> >>> (duplicate_computed_gotos): New cfg_layout_finalize >>>>> >>> parameter. >>>>> >>> (partition_hot_cold_basic_blocks): Set flag indicating function >>>>> >>> has bb partitions. >>>>> >>> * bb-reorder.h: Declare insert_section_boundary_note and >>>>> >>> emit_barrier_after_bb, which are no longer static. >>>>> >>> * basic-block.h: Declare new function fixup_partitions. >>>>> >>> * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary >>>>> >>> check for region crossing note. >>>>> >>> (fixup_partition_crossing): New function. >>>>> >>> (fixup_bb_partition): Ditto. >>>>> >>> (rtl_redirect_edge_and_branch): Fixup partition boundaries. >>>>> >>> (force_nonfallthru_and_redirect): Fixup partition boundaries, >>>>> >>> remove old code that tried to do this. Emit barrier correctly >>>>> >>> when we are in cfglayout mode. >>>>> >>> (rtl_split_edge): Correctly fixup partition boundaries. >>>>> >>> (commit_one_edge_insertion): Remove old code that tried to >>>>> >>> fixup region crossing edge since this is now handled in >>>>> >>> split_block, and set up insertion point correctly since >>>>> >>> block may now end in a jump. >>>>> >>> (commit_edge_insertions): Invoke fixup_partitions to sanitize partition >>>>> >>> boundaries after optimizations that modify cfg and before trying to >>>>> >>> verify the flow info. >>>>> >>> (fixup_partitions): New function. >>>>> >>> (rtl_verify_flow_info_1): Add verification that no cold bbs dominate >>>>> >>> hot bbs. 
>>>>> >>> (record_effective_endpoints): Remove region-crossing notes and set flag >>>>> >>> indicating that they need to be reinserted on exit from cfglayout mode. >>>>> >>> (outof_cfg_layout_mode): New cfg_layout_finalize parameter. >>>>> >>> (fixup_reorder_chain): Call insert_section_boundary_note if necessary. >>>>> >>> Remove old code that attempted to fixup region crossing note as >>>>> >>> this is now handled in force_nonfallthru_and_redirect. >>>>> >>> (duplicate_insn_chain): Don't duplicate switch section notes. >>>>> >>> (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain. >>>>> >>> (rtl_can_remove_branch_p): Remove unnecessary check for region crossing >>>>> >>> note. >>>>> >>> >>>>> >>> Index: cfghooks.h >>>>> >>> =================================================================== >>>>> >>> --- cfghooks.h (revision 193376) >>>>> >>> +++ cfghooks.h (working copy) >>>>> >>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas >>>>> >>> void account_profile_record (struct profile_record *, int); >>>>> >>> >>>>> >>> extern void cfg_layout_initialize (unsigned int); >>>>> >>> -extern void cfg_layout_finalize (void); >>>>> >>> +extern void cfg_layout_finalize (bool); >>>>> >>> >>>>> >>> /* Hooks containers. 
*/ >>>>> >>> extern struct cfg_hooks gimple_cfg_hooks; >>>>> >>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi >>>>> >>> extern void gimple_register_cfg_hooks (void); >>>>> >>> extern struct cfg_hooks get_cfg_hooks (void); >>>>> >>> extern void set_cfg_hooks (struct cfg_hooks); >>>>> >>> - >>>>> >>> Index: modulo-sched.c >>>>> >>> =================================================================== >>>>> >>> --- modulo-sched.c (revision 193376) >>>>> >>> +++ modulo-sched.c (working copy) >>>>> >>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void) >>>>> >>> if (bb->next_bb != EXIT_BLOCK_PTR) >>>>> >>> bb->aux = bb->next_bb; >>>>> >>> free_dominance_info (CDI_DOMINATORS); >>>>> >>> - cfg_layout_finalize (); >>>>> >>> + cfg_layout_finalize (false); >>>>> >>> #endif /* INSN_SCHEDULING */ >>>>> >>> return 0; >>>>> >>> } >>>>> >>> Index: ifcvt.c >>>>> >>> =================================================================== >>>>> >>> --- ifcvt.c (revision 193376) >>>>> >>> +++ ifcvt.c (working copy) >>>>> >>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg >>>>> >>> if (new_bb) >>>>> >>> { >>>>> >>> df_bb_replace (then_bb_index, new_bb); >>>>> >>> - /* Since the fallthru edge was redirected from test_bb to new_bb, >>>>> >>> - we need to ensure that new_bb is in the same partition as >>>>> >>> - test bb (you can not fall through across section boundaries). */ >>>>> >>> - BB_COPY_PARTITION (new_bb, test_bb); >>>>> >>> + /* This should have been done above via force_nonfallthru_and_redirect >>>>> >>> + (possibly called from redirect_edge_and_branch_force). 
*/ >>>>> >>> + gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb)); >>>>> >>> } >>>>> >>> >>>>> >>> num_true_changes++; >>>>> >>> Index: function.c >>>>> >>> =================================================================== >>>>> >>> --- function.c (revision 193376) >>>>> >>> +++ function.c (working copy) >>>>> >>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void) >>>>> >>> break; >>>>> >>> if (e) >>>>> >>> { >>>>> >>> - copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)), >>>>> >>> - NULL_RTX, e->src); >>>>> >>> + /* Make sure we insert after any barriers. */ >>>>> >>> + rtx end = get_last_bb_insn (e->src); >>>>> >>> + copy_bb = create_basic_block (NEXT_INSN (end), >>>>> >>> + NULL_RTX, e->src); >>>>> >>> BB_COPY_PARTITION (copy_bb, e->src); >>>>> >>> } >>>>> >>> else >>>>> >>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void) >>>>> >>> if (cur_bb->index >= NUM_FIXED_BLOCKS >>>>> >>> && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS) >>>>> >>> cur_bb->aux = cur_bb->next_bb; >>>>> >>> - cfg_layout_finalize (); >>>>> >>> + cfg_layout_finalize (false); >>>>> >>> } >>>>> >>> >>>>> >>> epilogue_done: >>>>> >>> @@ -6517,7 +6519,7 @@ epilogue_done: >>>>> >>> basic_block simple_return_block_cold = NULL; >>>>> >>> edge pending_edge_hot = NULL; >>>>> >>> edge pending_edge_cold = NULL; >>>>> >>> - basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb; >>>>> >>> + basic_block exit_pred; >>>>> >>> int i; >>>>> >>> >>>>> >>> gcc_assert (entry_edge != orig_entry_edge); >>>>> >>> @@ -6545,6 +6547,12 @@ epilogue_done: >>>>> >>> else >>>>> >>> pending_edge_cold = e; >>>>> >>> } >>>>> >>> + >>>>> >>> + /* Save a pointer to the exit's predecessor BB for use in >>>>> >>> + inserting new BBs at the end of the function. Do this >>>>> >>> + after the call to split_block above which may split >>>>> >>> + the original exit pred. 
*/ >>>>> >>> + exit_pred = EXIT_BLOCK_PTR->prev_bb; >>>>> >>> >>>>> >>> FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e) >>>>> >>> { >>>>> >>> Index: function.h >>>>> >>> =================================================================== >>>>> >>> --- function.h (revision 193376) >>>>> >>> +++ function.h (working copy) >>>>> >>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data { >>>>> >>> sched2) and is useful only if the port defines LEAF_REGISTERS. */ >>>>> >>> bool uses_only_leaf_regs; >>>>> >>> >>>>> >>> + /* Nonzero if the function being compiled has undergone hot/cold partitioning >>>>> >>> + (under flag_reorder_blocks_and_partition) and has at least one cold >>>>> >>> + block. */ >>>>> >>> + bool has_bb_partition; >>>>> >>> + >>>>> >>> /* Like regs_ever_live, but 1 if a reg is set or clobbered from an >>>>> >>> asm. Unlike regs_ever_live, elements of this array corresponding >>>>> >>> to eliminable regs (like the frame pointer) are set if an asm >>>>> >>> Index: hw-doloop.c >>>>> >>> =================================================================== >>>>> >>> --- hw-doloop.c (revision 193376) >>>>> >>> +++ hw-doloop.c (working copy) >>>>> >>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops) >>>>> >>> else >>>>> >>> bb->aux = NULL; >>>>> >>> } >>>>> >>> - cfg_layout_finalize (); >>>>> >>> + cfg_layout_finalize (false); >>>>> >>> clear_aux_for_blocks (); >>>>> >>> df_analyze (); >>>>> >>> } >>>>> >>> Index: cfgcleanup.c >>>>> >>> =================================================================== >>>>> >>> --- cfgcleanup.c (revision 193376) >>>>> >>> +++ cfgcleanup.c (working copy) >>>>> >>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2, >>>>> >>> partition boundaries). See the comments at the top of >>>>> >>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. 
*/ >>>>> >>> >>>>> >>> - if (flag_reorder_blocks_and_partition && reload_completed) >>>>> >>> + if (crtl->has_bb_partition && reload_completed) >>>>> >>> return false; >>>>> >>> >>>>> >>> /* Search backward through forwarder blocks. We don't need to worry >>>>> >>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode) >>>>> >>> df_analyze (); >>>>> >>> } >>>>> >>> >>>>> >>> + if (changed) >>>>> >>> + { >>>>> >>> + /* Edge forwarding in particular can cause hot blocks previously >>>>> >>> + reached by both hot and cold blocks to become dominated only >>>>> >>> + by cold blocks. This will cause the verification below to fail, >>>>> >>> + and lead to now cold code in the hot section. This is not easy >>>>> >>> + to detect and fix during edge forwarding, and in some cases >>>>> >>> + is only visible after newly unreachable blocks are deleted, >>>>> >>> + which will be done in fixup_partitions. */ >>>>> >>> + fixup_partitions (); >>>>> >>> + >>>>> >>> #ifdef ENABLE_CHECKING >>>>> >>> - if (changed) >>>>> >>> - verify_flow_info (); >>>>> >>> + verify_flow_info (); >>>>> >>> #endif >>>>> >>> + } >>>>> >>> >>>>> >>> changed_overall |= changed; >>>>> >>> first_pass = false; >>>>> >>> Index: bb-reorder.c >>>>> >>> =================================================================== >>>>> >>> --- bb-reorder.c (revision 193376) >>>>> >>> +++ bb-reorder.c (working copy) >>>>> >>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces >>>>> >>> current_partition = BB_PARTITION (traces[0].first); >>>>> >>> two_passes = false; >>>>> >>> >>>>> >>> - if (flag_reorder_blocks_and_partition) >>>>> >>> + if (crtl->has_bb_partition) >>>>> >>> for (i = 0; i < n_traces && !two_passes; i++) >>>>> >>> if (BB_PARTITION (traces[0].first) >>>>> >>> != BB_PARTITION (traces[i].first)) >>>>> >>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces >>>>> >>> } >>>>> >>> } >>>>> >>> >>>>> >>> - if (flag_reorder_blocks_and_partition) >>>>> >>> + if 
(crtl->has_bb_partition) >>>>> >>> try_copy = false; >>>>> >>> >>>>> >>> /* Copy tiny blocks always; copy larger blocks only when the >>>>> >>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void) >>>>> >>> return length; >>>>> >>> } >>>>> >>> >>>>> >>> -/* Emit a barrier into the footer of BB. */ >>>>> >>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode. */ >>>>> >>> >>>>> >>> -static void >>>>> >>> +void >>>>> >>> emit_barrier_after_bb (basic_block bb) >>>>> >>> { >>>>> >>> rtx barrier = emit_barrier_after (BB_END (bb)); >>>>> >>> - BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); >>>>> >>> + if (current_ir_type () == IR_RTL_CFGLAYOUT) >>>>> >>> + BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); >>>>> >>> } >>>>> >>> >>>>> >>> /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions. >>>>> >>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg >>>>> >>> { >>>>> >>> VEC(edge, heap) *crossing_edges = NULL; >>>>> >>> basic_block bb; >>>>> >>> - edge e; >>>>> >>> - edge_iterator ei; >>>>> >>> + edge e, e2; >>>>> >>> + edge_iterator ei, ei2; >>>>> >>> + unsigned int cold_bb_count = 0; >>>>> >>> + VEC (basic_block, heap) *bbs_in_hot_partition = NULL; >>>>> >>> + VEC (basic_block, heap) *bbs_newly_hot = NULL; >>>>> >>> >>>>> >>> /* Mark which partition (hot/cold) each basic block belongs in. */ >>>>> >>> FOR_EACH_BB (bb) >>>>> >>> { >>>>> >>> if (probably_never_executed_bb_p (cfun, bb)) >>>>> >>> - BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>>>> >>> + { >>>>> >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>>>> >>> + cold_bb_count++; >>>>> >>> + } >>>>> >>> else >>>>> >>> - BB_SET_PARTITION (bb, BB_HOT_PARTITION); >>>>> >>> + { >>>>> >>> + BB_SET_PARTITION (bb, BB_HOT_PARTITION); >>>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb); >>>>> >>> + } >>>>> >>> } >>>>> >>> >>>>> >>> + /* Ensure that no cold bbs dominate hot bbs. 
This could happen as a result of >>>>> >>> + several different possibilities. One is that there are edge weight insanities >>>>> >>> + due to optimization phases that do not properly update basic block profile >>>>> >>> + counts. The second is that the entry of the function may not be hot, because >>>>> >>> + it is entered fewer times than the number of profile training runs, but there >>>>> >>> + is a loop inside the function that causes blocks within the function to be >>>>> >>> + above the threshold for hotness. */ >>>>> >>> + if (cold_bb_count) >>>>> >>> + { >>>>> >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>>>> >>> + >>>>> >>> + if (dom_calculated_here) >>>>> >>> + calculate_dominance_info (CDI_DOMINATORS); >>>>> >>> + >>>>> >>> + /* Keep examining hot bbs until we have either checked them all, or >>>>> >>> + re-marked all cold bbs hot. */ >>>>> >>> + while (! VEC_empty (basic_block, bbs_in_hot_partition) >>>>> >>> + && cold_bb_count) >>>>> >>> + { >>>>> >>> + basic_block dom_bb; >>>>> >>> + >>>>> >>> + bb = VEC_pop (basic_block, bbs_in_hot_partition); >>>>> >>> + dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb); >>>>> >>> + >>>>> >>> + /* If bb's immediate dominator is also hot then it is ok. */ >>>>> >>> + if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION) >>>>> >>> + continue; >>>>> >>> + >>>>> >>> + /* We have a hot bb with an immediate dominator that is cold. >>>>> >>> + The dominator needs to be re-marked to hot. */ >>>>> >>> + BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION); >>>>> >>> + cold_bb_count--; >>>>> >>> + >>>>> >>> + /* Now we need to examine newly-hot dom_bb to see if it is also >>>>> >>> + dominated by a cold bb. */ >>>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb); >>>>> >>> + >>>>> >>> + /* We should also adjust any cold blocks that the newly-hot bb >>>>> >>> + feeds and see if it makes sense to re-mark those as hot as >>>>> >>> + well. 
*/ >>>>> >>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb); >>>>> >>> + while (! VEC_empty (basic_block, bbs_newly_hot)) >>>>> >>> + { >>>>> >>> + basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot); >>>>> >>> + /* Examine all successors of this newly-hot bb to see if they >>>>> >>> + are cold and should be re-marked as hot. */ >>>>> >>> + FOR_EACH_EDGE (e, ei, new_hot_bb->succs) >>>>> >>> + { >>>>> >>> + bool any_cold_preds = false; >>>>> >>> + basic_block succ = e->dest; >>>>> >>> + if (BB_PARTITION (succ) != BB_COLD_PARTITION) >>>>> >>> + continue; >>>>> >>> + /* Does this block have any cold predecessors now? */ >>>>> >>> + FOR_EACH_EDGE (e2, ei2, succ->preds) >>>>> >>> + { >>>>> >>> + if (BB_PARTITION (e2->src) == BB_COLD_PARTITION) >>>>> >>> + { >>>>> >>> + any_cold_preds = true; >>>>> >>> + break; >>>>> >>> + } >>>>> >>> + } >>>>> >>> + if (any_cold_preds) >>>>> >>> + continue; >>>>> >>> + >>>>> >>> + /* Here we have a successor of newly-hot bb that is cold >>>>> >>> + but no longer has any cold predecessors. Since the original >>>>> >>> + assignment of our newly-hot bb was incorrect, this successor's >>>>> >>> + assignment as cold is also suspect. Go ahead and re-mark it >>>>> >>> + as hot now too. Better heuristics may be in order here. */ >>>>> >>> + BB_SET_PARTITION (succ, BB_HOT_PARTITION); >>>>> >>> + cold_bb_count--; >>>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ); >>>>> >>> + /* Examine this successor as a newly-hot bb. */ >>>>> >>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, succ); >>>>> >>> + } >>>>> >>> + } >>>>> >>> + } >>>>> >>> + >>>>> >>> + if (dom_calculated_here) >>>>> >>> + free_dominance_info (CDI_DOMINATORS); >>>>> >>> + } >>>>> >>> + >>>>> >>> /* The format of .gcc_except_table does not allow landing pads to >>>>> >>> be in a different partition as the throw. Fix this by either >>>>> >>> moving or duplicating the landing pads. 
*/ >>>>> >>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void) >>>>> >>> new_bb->aux = cur_bb->aux; >>>>> >>> cur_bb->aux = new_bb; >>>>> >>> >>>>> >>> - /* Make sure new fall-through bb is in same >>>>> >>> - partition as bb it's falling through from. */ >>>>> >>> + /* This is done by force_nonfallthru_and_redirect. */ >>>>> >>> + gcc_assert (BB_PARTITION (new_bb) >>>>> >>> + == BB_PARTITION (cur_bb)); >>>>> >>> >>>>> >>> - BB_COPY_PARTITION (new_bb, cur_bb); >>>>> >>> single_succ_edge (new_bb)->flags |= EDGE_CROSSING; >>>>> >>> } >>>>> >>> else >>>>> >>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void) >>>>> >>> FOR_EACH_BB (bb) >>>>> >>> FOR_EACH_EDGE (e, ei, bb->succs) >>>>> >>> if ((e->flags & EDGE_CROSSING) >>>>> >>> - && JUMP_P (BB_END (e->src))) >>>>> >>> + && JUMP_P (BB_END (e->src)) >>>>> >>> + /* Some notes were added during fix_up_fall_thru_edges, via >>>>> >>> + force_nonfallthru_and_redirect. */ >>>>> >>> + && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX)) >>>>> >>> add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>>>> >>> } >>>>> >>> >>>>> >>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void) >>>>> >>> dump_flow_info (dump_file, dump_flags); >>>>> >>> } >>>>> >>> >>>>> >>> - if (flag_reorder_blocks_and_partition) >>>>> >>> + if (crtl->has_bb_partition) >>>>> >>> verify_hot_cold_block_grouping (); >>>>> >>> } >>>>> >>> >>>>> >>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void) >>>>> >>> encountering this note will make the compiler switch between the >>>>> >>> hot and cold text sections. 
*/ >>>>> >>> >>>>> >>> -static void >>>>> >>> +void >>>>> >>> insert_section_boundary_note (void) >>>>> >>> { >>>>> >>> basic_block bb; >>>>> >>> rtx new_note; >>>>> >>> int first_partition = 0; >>>>> >>> >>>>> >>> - if (!flag_reorder_blocks_and_partition) >>>>> >>> + if (!crtl->has_bb_partition) >>>>> >>> return; >>>>> >>> >>>>> >>> FOR_EACH_BB (bb) >>>>> >>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void) >>>>> >>> FOR_EACH_BB (bb) >>>>> >>> if (bb->next_bb != EXIT_BLOCK_PTR) >>>>> >>> bb->aux = bb->next_bb; >>>>> >>> - cfg_layout_finalize (); >>>>> >>> + cfg_layout_finalize (true); >>>>> >>> >>>>> >>> - /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. */ >>>>> >>> - insert_section_boundary_note (); >>>>> >>> return 0; >>>>> >>> } >>>>> >>> >>>>> >>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void) >>>>> >>> } >>>>> >>> >>>>> >>> done: >>>>> >>> - cfg_layout_finalize (); >>>>> >>> + cfg_layout_finalize (false); >>>>> >>> >>>>> >>> BITMAP_FREE (candidates); >>>>> >>> return 0; >>>>> >>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void) >>>>> >>> if (crossing_edges == NULL) >>>>> >>> return 0; >>>>> >>> >>>>> >>> + crtl->has_bb_partition = true; >>>>> >>> + >>>>> >>> /* Make sure the source of any crossing edge ends in a jump and the >>>>> >>> destination of any crossing edge has a label. 
*/ >>>>> >>> add_labels_and_missing_jumps (crossing_edges); >>>>> >>> Index: bb-reorder.h >>>>> >>> =================================================================== >>>>> >>> --- bb-reorder.h (revision 193376) >>>>> >>> +++ bb-reorder.h (working copy) >>>>> >>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re >>>>> >>> >>>>> >>> extern int get_uncond_jump_length (void); >>>>> >>> >>>>> >>> +extern void insert_section_boundary_note (void); >>>>> >>> + >>>>> >>> +extern void emit_barrier_after_bb (basic_block bb); >>>>> >>> + >>>>> >>> #endif >>>>> >>> Index: basic-block.h >>>>> >>> =================================================================== >>>>> >>> --- basic-block.h (revision 193376) >>>>> >>> +++ basic-block.h (working copy) >>>>> >>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect >>>>> >>> extern bool contains_no_active_insn_p (const_basic_block); >>>>> >>> extern bool forwarder_block_p (const_basic_block); >>>>> >>> extern bool can_fallthru (basic_block, basic_block); >>>>> >>> +extern void fixup_partitions (void); >>>>> >>> >>>>> >>> /* In cfgbuild.c. */ >>>>> >>> extern void find_many_sub_basic_blocks (sbitmap); >>>>> >>> Index: cfgrtl.c >>>>> >>> =================================================================== >>>>> >>> --- cfgrtl.c (revision 193376) >>>>> >>> +++ cfgrtl.c (working copy) >>>>> >>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. If not see >>>>> >>> #include "tree.h" >>>>> >>> #include "hard-reg-set.h" >>>>> >>> #include "basic-block.h" >>>>> >>> +#include "bb-reorder.h" >>>>> >>> #include "regs.h" >>>>> >>> #include "flags.h" >>>>> >>> #include "function.h" >>>>> >>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3. If not see >>>>> >>> Only applicable if the CFG is in cfglayout mode. 
*/ >>>>> >>> static GTY(()) rtx cfg_layout_function_footer; >>>>> >>> static GTY(()) rtx cfg_layout_function_header; >>>>> >>> +static bool had_sec_boundary_notes; >>>>> >>> >>>>> >>> static rtx skip_insns_after_block (basic_block); >>>>> >>> static void record_effective_endpoints (void); >>>>> >>> static rtx label_for_bb (basic_block); >>>>> >>> -static void fixup_reorder_chain (void); >>>>> >>> +static void fixup_reorder_chain (bool finalize_reorder_blocks); >>>>> >>> >>>>> >>> void verify_insn_chain (void); >>>>> >>> static void fixup_fallthru_exit_predecessor (void); >>>>> >>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc >>>>> >>> partition boundaries). See the comments at the top of >>>>> >>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ >>>>> >>> >>>>> >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) >>>>> >>> - || BB_PARTITION (src) != BB_PARTITION (target)) >>>>> >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) >>>>> >>> return NULL; >>>>> >>> >>>>> >>> /* We can replace or remove a complex jump only when we have exactly >>>>> >>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target) >>>>> >>> return e; >>>>> >>> } >>>>> >>> >>>>> >>> +/* Called when edge E has been redirected to a new destination, >>>>> >>> + in order to update the region crossing flag on the edge and >>>>> >>> + jump. */ >>>>> >>> + >>>>> >>> +static void >>>>> >>> +fixup_partition_crossing (edge e, basic_block target) >>>>> >>> +{ >>>>> >>> + rtx note; >>>>> >>> + >>>>> >>> + gcc_assert (e->dest == target); >>>>> >>> + >>>>> >>> + if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR) >>>>> >>> + return; >>>>> >>> + /* If we redirected an existing edge, it may already be marked >>>>> >>> + crossing, even though the new src is missing a reg crossing note. >>>>> >>> + But make sure reg crossing note doesn't already exist before >>>>> >>> + inserting. 
*/ >>>>> >>> + if (BB_PARTITION (e->src) != BB_PARTITION (target)) >>>>> >>> + { >>>>> >>> + e->flags |= EDGE_CROSSING; >>>>> >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>>>> >>> + if (JUMP_P (BB_END (e->src)) >>>>> >>> + && !note) >>>>> >>> + add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>>>> >>> + } >>>>> >>> + else if (BB_PARTITION (e->src) == BB_PARTITION (target)) >>>>> >>> + { >>>>> >>> + e->flags &= ~EDGE_CROSSING; >>>>> >>> + /* Remove the region crossing note from jump at end of >>>>> >>> + e->src if it exists. */ >>>>> >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>>>> >>> + if (note) >>>>> >>> + remove_note (BB_END (e->src), note); >>>>> >>> + } >>>>> >>> +} >>>>> >>> + >>>>> >>> +/* Called when block BB has been reassigned to a different partition, >>>>> >>> + to ensure that the region crossing attributes are updated. */ >>>>> >>> + >>>>> >>> +static void >>>>> >>> +fixup_bb_partition (basic_block bb) >>>>> >>> +{ >>>>> >>> + edge e; >>>>> >>> + edge_iterator ei; >>>>> >>> + >>>>> >>> + /* Now need to make bb's pred edges non-region crossing. */ >>>>> >>> + FOR_EACH_EDGE (e, ei, bb->preds) >>>>> >>> + { >>>>> >>> + fixup_partition_crossing (e, e->dest); >>>>> >>> + } >>>>> >>> + >>>>> >>> + /* Possibly need to make bb's successor edges region crossing, >>>>> >>> + or remove stale region crossing. */ >>>>> >>> + FOR_EACH_EDGE (e, ei, bb->succs) >>>>> >>> + { >>>>> >>> + if ((e->flags & EDGE_FALLTHRU) >>>>> >>> + && BB_PARTITION (bb) != BB_PARTITION (e->dest) >>>>> >>> + && e->dest != EXIT_BLOCK_PTR) >>>>> >>> + /* force_nonfallthru_and_redirect calls fixup_partition_crossing. */ >>>>> >>> + force_nonfallthru (e); >>>>> >>> + else >>>>> >>> + fixup_partition_crossing (e, e->dest); >>>>> >>> + } >>>>> >>> +} >>>>> >>> + >>>>> >>> /* Attempt to change code to redirect edge E to TARGET. Don't do that on >>>>> >>> expense of adding new instructions or reordering basic blocks. 
>>>>> >>> >>>>> >>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block >>>>> >>> { >>>>> >>> edge ret; >>>>> >>> basic_block src = e->src; >>>>> >>> + basic_block dest = e->dest; >>>>> >>> >>>>> >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) >>>>> >>> return NULL; >>>>> >>> >>>>> >>> - if (e->dest == target) >>>>> >>> + if (dest == target) >>>>> >>> return e; >>>>> >>> >>>>> >>> if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL) >>>>> >>> { >>>>> >>> df_set_bb_dirty (src); >>>>> >>> + fixup_partition_crossing (ret, target); >>>>> >>> return ret; >>>>> >>> } >>>>> >>> >>>>> >>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block >>>>> >>> return NULL; >>>>> >>> >>>>> >>> df_set_bb_dirty (src); >>>>> >>> + fixup_partition_crossing (ret, target); >>>>> >>> return ret; >>>>> >>> } >>>>> >>> >>>>> >>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc >>>>> >>> /* Make sure new block ends up in correct hot/cold section. */ >>>>> >>> >>>>> >>> BB_COPY_PARTITION (jump_block, e->src); >>>>> >>> - if (flag_reorder_blocks_and_partition >>>>> >>> - && targetm_common.have_named_sections >>>>> >>> - && JUMP_P (BB_END (jump_block)) >>>>> >>> - && !any_condjump_p (BB_END (jump_block)) >>>>> >>> - && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING)) >>>>> >>> - add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX); >>>>> >>> >>>>> >>> /* Wire edge in. */ >>>>> >>> new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU); >>>>> >>> new_edge->probability = probability; >>>>> >>> new_edge->count = count; >>>>> >>> >>>>> >>> + /* If e->src was previously region crossing, it no longer is >>>>> >>> + and the reg crossing note should be removed. */ >>>>> >>> + fixup_partition_crossing (new_edge, jump_block); >>>>> >>> + >>>>> >>> /* Redirect old edge. 
*/ >>>>> >>> redirect_edge_pred (e, jump_block); >>>>> >>> e->probability = REG_BR_PROB_BASE; >>>>> >>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc >>>>> >>> LABEL_NUSES (label)++; >>>>> >>> } >>>>> >>> >>>>> >>> - emit_barrier_after (BB_END (jump_block)); >>>>> >>> + /* We might be in cfg layout mode, and if so, the following routine will >>>>> >>> + insert the barrier correctly. */ >>>>> >>> + emit_barrier_after_bb (jump_block); >>>>> >>> redirect_edge_succ_nodup (e, target); >>>>> >>> >>>>> >>> if (abnormal_edge_flags) >>>>> >>> make_edge (src, target, abnormal_edge_flags); >>>>> >>> >>>>> >>> df_mark_solutions_dirty (); >>>>> >>> + fixup_partition_crossing (e, target); >>>>> >>> return new_bb; >>>>> >>> } >>>>> >>> >>>>> >>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU >>>>> >>> static basic_block >>>>> >>> rtl_split_edge (edge edge_in) >>>>> >>> { >>>>> >>> - basic_block bb; >>>>> >>> + basic_block bb, new_bb; >>>>> >>> rtx before; >>>>> >>> >>>>> >>> /* Abnormal edges cannot be split. */ >>>>> >>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in) >>>>> >>> else >>>>> >>> { >>>>> >>> bb = create_basic_block (before, NULL, edge_in->dest->prev_bb); >>>>> >>> - /* ??? Why not edge_in->dest->prev_bb here? */ >>>>> >>> - BB_COPY_PARTITION (bb, edge_in->dest); >>>>> >>> + if (edge_in->src == ENTRY_BLOCK_PTR) >>>>> >>> + BB_COPY_PARTITION (bb, edge_in->dest); >>>>> >>> + else >>>>> >>> + /* Put the split bb into the src partition, to avoid creating >>>>> >>> + a situation where a cold bb dominates a hot bb, in the case >>>>> >>> + where src is cold and dest is hot. The src will dominate >>>>> >>> + the new bb (whereas it might not have dominated dest). */ >>>>> >>> + BB_COPY_PARTITION (bb, edge_in->src); >>>>> >>> } >>>>> >>> >>>>> >>> make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU); >>>>> >>> >>>>> >>> + /* Can't allow a region crossing edge to be fallthrough. 
*/ >>>>> >>> + if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest) >>>>> >>> + && edge_in->dest != EXIT_BLOCK_PTR) >>>>> >>> + { >>>>> >>> + new_bb = force_nonfallthru (single_succ_edge (bb)); >>>>> >>> + gcc_assert (!new_bb); >>>>> >>> + } >>>>> >>> + >>>>> >>> /* For non-fallthru edges, we must adjust the predecessor's >>>>> >>> jump instruction to target our new block. */ >>>>> >>> if ((edge_in->flags & EDGE_FALLTHRU) == 0) >>>>> >>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e) >>>>> >>> else >>>>> >>> { >>>>> >>> bb = split_edge (e); >>>>> >>> - after = BB_END (bb); >>>>> >>> >>>>> >>> - if (flag_reorder_blocks_and_partition >>>>> >>> - && targetm_common.have_named_sections >>>>> >>> - && e->src != ENTRY_BLOCK_PTR >>>>> >>> - && BB_PARTITION (e->src) == BB_COLD_PARTITION >>>>> >>> - && !(e->flags & EDGE_CROSSING) >>>>> >>> - && JUMP_P (after) >>>>> >>> - && !any_condjump_p (after) >>>>> >>> - && (single_succ_edge (bb)->flags & EDGE_CROSSING)) >>>>> >>> - add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX); >>>>> >>> + /* If e crossed a partition boundary, we needed to make bb end in >>>>> >>> + a region-crossing jump, even though it was originally fallthru. */ >>>>> >>> + if (JUMP_P (BB_END (bb))) >>>>> >>> + before = BB_END (bb); >>>>> >>> + else >>>>> >>> + after = BB_END (bb); >>>>> >>> } >>>>> >>> >>>>> >>> /* Now that we've found the spot, do the insertion. */ >>>>> >>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void) >>>>> >>> { >>>>> >>> basic_block bb; >>>>> >>> >>>>> >>> + /* Optimization passes that invoke this routine can cause hot blocks >>>>> >>> + previously reached by both hot and cold blocks to become dominated only >>>>> >>> + by cold blocks. This will cause the verification below to fail, >>>>> >>> + and lead to now cold code in the hot section. In some cases this >>>>> >>> + may only be visible after newly unreachable blocks are deleted, >>>>> >>> + which will be done by fixup_partitions. 
*/ >>>>> >>> + fixup_partitions (); >>>>> >>> + >>>>> >>> #ifdef ENABLE_CHECKING >>>>> >>> verify_flow_info (); >>>>> >>> #endif >>>>> >>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb) >>>>> >>> >>>>> >>> return end; >>>>> >>> } >>>>> >>> - >>>>> >>> + >>>>> >>> +/* Perform cleanup on the hot/cold bb partitioning after optimization >>>>> >>> + passes that modify the cfg. */ >>>>> >>> + >>>>> >>> +void >>>>> >>> +fixup_partitions (void) >>>>> >>> +{ >>>>> >>> + basic_block bb; >>>>> >>> + >>>>> >>> + if (!crtl->has_bb_partition) >>>>> >>> + return; >>>>> >>> + >>>>> >>> + /* Delete any blocks that became unreachable and weren't >>>>> >>> + already cleaned up, for example during edge forwarding >>>>> >>> + and convert_jumps_to_returns. This will expose more >>>>> >>> + opportunities for fixing the partition boundaries here. >>>>> >>> + Also, the calculation of the dominance graph during verification >>>>> >>> + will assert if there are unreachable nodes. */ >>>>> >>> + delete_unreachable_blocks (); >>>>> >>> + >>>>> >>> + /* If there are partitions, do a sanity check on them: A basic block in >>>>> >>> + a cold partition cannot dominate a basic block in a hot partition. >>>>> >>> + Fixup any that now violate this requirement, as a result of edge >>>>> >>> + forwarding and unreachable block deletion. */ >>>>> >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >>>>> >>> + VEC (basic_block, heap) *bbs_to_fix = NULL; >>>>> >>> + FOR_EACH_BB (bb) >>>>> >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >>>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >>>>> >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) >>>>> >>> + { >>>>> >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>>>> >>> + basic_block son; >>>>> >>> + >>>>> >>> + if (dom_calculated_here) >>>>> >>> + calculate_dominance_info (CDI_DOMINATORS); >>>>> >>> + >>>>> >>> + while (! 
VEC_empty (basic_block, bbs_in_cold_partition)) >>>>> >>> + { >>>>> >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >>>>> >>> + /* If bb is not yet cold (because it was added below as >>>>> >>> + a block dominated by a cold bb) then mark it cold here. */ >>>>> >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >>>>> >>> + { >>>>> >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>>>> >>> + VEC_safe_push (basic_block, heap, bbs_to_fix, bb); >>>>> >>> + } >>>>> >>> + /* Any blocks dominated by a block in the cold section >>>>> >>> + must also be cold. */ >>>>> >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); >>>>> >>> + son; >>>>> >>> + son = next_dom_son (CDI_DOMINATORS, son)) >>>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >>>>> >>> + } >>>>> >>> + >>>>> >>> + if (dom_calculated_here) >>>>> >>> + free_dominance_info (CDI_DOMINATORS); >>>>> >>> + } >>>>> >>> + >>>>> >>> + /* Do the partition fixup after all necessary blocks have been converted to >>>>> >>> + cold, so that we only update the region crossings the minimum number of >>>>> >>> + places, which can require forcing edges to be non fallthru. */ >>>>> >>> + while (! VEC_empty (basic_block, bbs_to_fix)) >>>>> >>> + { >>>>> >>> + bb = VEC_pop (basic_block, bbs_to_fix); >>>>> >>> + fixup_bb_partition (bb); >>>>> >>> + } >>>>> >>> +} >>>>> >>> + >>>>> >>> /* Verify the CFG and RTL consistency common for both underlying RTL and >>>>> >>> cfglayout RTL. >>>>> >>> >>>>> >>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void) >>>>> >>> rtx x; >>>>> >>> int err = 0; >>>>> >>> basic_block bb; >>>>> >>> + bool have_partitions = false; >>>>> >>> >>>>> >>> /* Check the general integrity of the basic blocks. 
*/ >>>>> >>> FOR_EACH_BB_REVERSE (bb) >>>>> >>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void) >>>>> >>> >>>>> >>> if (e->flags & EDGE_ABNORMAL) >>>>> >>> n_abnormal++; >>>>> >>> + >>>>> >>> + have_partitions |= is_crossing; >>>>> >>> } >>>>> >>> >>>>> >>> if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX)) >>>>> >>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void) >>>>> >>> } >>>>> >>> } >>>>> >>> >>>>> >>> + /* If there are partitions, do a sanity check on them: A basic block in >>>>> >>> + a cold partition cannot dominate a basic block in a hot partition. */ >>>>> >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >>>>> >>> + if (have_partitions && !err) >>>>> >>> + FOR_EACH_BB (bb) >>>>> >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >>>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >>>>> >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) >>>>> >>> + { >>>>> >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>>>> >>> + basic_block son; >>>>> >>> + >>>>> >>> + if (dom_calculated_here) >>>>> >>> + calculate_dominance_info (CDI_DOMINATORS); >>>>> >>> + >>>>> >>> + while (! VEC_empty (basic_block, bbs_in_cold_partition)) >>>>> >>> + { >>>>> >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >>>>> >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >>>>> >>> + { >>>>> >>> + error ("non-cold basic block %d dominated " >>>>> >>> + "by a block in the cold partition", bb->index); >>>>> >>> + err = 1; >>>>> >>> + } >>>>> >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); >>>>> >>> + son; >>>>> >>> + son = next_dom_son (CDI_DOMINATORS, son)) >>>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >>>>> >>> + } >>>>> >>> + >>>>> >>> + if (dom_calculated_here) >>>>> >>> + free_dominance_info (CDI_DOMINATORS); >>>>> >>> + } >>>>> >>> + >>>>> >>> /* Clean up. 
*/ >>>>> >>> return err; >>>>> >>> } >>>>> >>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void) >>>>> >>> else >>>>> >>> cfg_layout_function_header = NULL_RTX; >>>>> >>> >>>>> >>> + had_sec_boundary_notes = false; >>>>> >>> + >>>>> >>> next_insn = get_insns (); >>>>> >>> FOR_EACH_BB (bb) >>>>> >>> { >>>>> >>> rtx end; >>>>> >>> >>>>> >>> if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb)) >>>>> >>> - BB_HEADER (bb) = unlink_insn_chain (next_insn, >>>>> >>> - PREV_INSN (BB_HEAD (bb))); >>>>> >>> + { >>>>> >>> + /* Rather than try to keep section boundary notes incrementally >>>>> >>> + up-to-date through cfg layout optimizations, simply remove them >>>>> >>> + and flag that they should be re-inserted when exiting >>>>> >>> + cfg layout mode. */ >>>>> >>> + rtx check_insn = next_insn; >>>>> >>> + while (check_insn) >>>>> >>> + { >>>>> >>> + if (NOTE_P (check_insn) >>>>> >>> + && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS) >>>>> >>> + { >>>>> >>> + had_sec_boundary_notes |= true; >>>>> >>> + /* Remove note from chain. Grab new next_insn first. */ >>>>> >>> + if (next_insn == check_insn) >>>>> >>> + next_insn = NEXT_INSN (check_insn); >>>>> >>> + /* Delete note. */ >>>>> >>> + delete_insn (check_insn); >>>>> >>> + /* There will only be one. */ >>>>> >>> + break; >>>>> >>> + } >>>>> >>> + check_insn = NEXT_INSN (check_insn); >>>>> >>> + } >>>>> >>> + /* If we still have header instructions left after above loop. 
*/ >>>>> >>> + if (next_insn != BB_HEAD (bb)) >>>>> >>> + BB_HEADER (bb) = unlink_insn_chain (next_insn, >>>>> >>> + PREV_INSN (BB_HEAD (bb))); >>>>> >>> + } >>>>> >>> end = skip_insns_after_block (bb); >>>>> >>> if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end) >>>>> >>> BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end); >>>>> >>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void) >>>>> >>> if (bb->next_bb != EXIT_BLOCK_PTR) >>>>> >>> bb->aux = bb->next_bb; >>>>> >>> >>>>> >>> - cfg_layout_finalize (); >>>>> >>> + cfg_layout_finalize (false); >>>>> >>> >>>>> >>> return 0; >>>>> >>> } >>>>> >>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode) >>>>> >>> } >>>>> >>> >>>>> >>> >>>>> >>> -/* Given a reorder chain, rearrange the code to match. */ >>>>> >>> +/* Given a reorder chain, rearrange the code to match. If >>>>> >>> + this is called when we will FINALIZE_REORDER_BLOCKS, or when >>>>> >>> + section boundary notes were removed on entry to cfg layout >>>>> >>> + mode, insert section boundary notes here. */ >>>>> >>> >>>>> >>> static void >>>>> >>> -fixup_reorder_chain (void) >>>>> >>> +fixup_reorder_chain (bool finalize_reorder_blocks) >>>>> >>> { >>>>> >>> basic_block bb; >>>>> >>> rtx insn = NULL; >>>>> >>> @@ -3150,7 +3373,7 @@ static void >>>>> >>> PREV_INSN (BB_HEADER (bb)) = insn; >>>>> >>> insn = BB_HEADER (bb); >>>>> >>> while (NEXT_INSN (insn)) >>>>> >>> - insn = NEXT_INSN (insn); >>>>> >>> + insn = NEXT_INSN (insn); >>>>> >>> } >>>>> >>> if (insn) >>>>> >>> NEXT_INSN (insn) = BB_HEAD (bb); >>>>> >>> @@ -3175,6 +3398,11 @@ static void >>>>> >>> insn = NEXT_INSN (insn); >>>>> >>> >>>>> >>> set_last_insn (insn); >>>>> >>> + >>>>> >>> + /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. 
*/ >>>>> >>> + if (had_sec_boundary_notes || finalize_reorder_blocks) >>>>> >>> + insert_section_boundary_note (); >>>>> >>> + >>>>> >>> #ifdef ENABLE_CHECKING >>>>> >>> verify_insn_chain (); >>>>> >>> #endif >>>>> >>> @@ -3187,7 +3415,7 @@ static void >>>>> >>> edge e_fall, e_taken, e; >>>>> >>> rtx bb_end_insn; >>>>> >>> rtx ret_label = NULL_RTX; >>>>> >>> - basic_block nb, src_bb; >>>>> >>> + basic_block nb; >>>>> >>> edge_iterator ei; >>>>> >>> >>>>> >>> if (EDGE_COUNT (bb->succs) == 0) >>>>> >>> @@ -3322,7 +3550,6 @@ static void >>>>> >>> /* We got here if we need to add a new jump insn. >>>>> >>> Note force_nonfallthru can delete E_FALL and thus we have to >>>>> >>> save E_FALL->src prior to the call to force_nonfallthru. */ >>>>> >>> - src_bb = e_fall->src; >>>>> >>> nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label); >>>>> >>> if (nb) >>>>> >>> { >>>>> >>> @@ -3330,17 +3557,6 @@ static void >>>>> >>> bb->aux = nb; >>>>> >>> /* Don't process this new block. */ >>>>> >>> bb = nb; >>>>> >>> - >>>>> >>> - /* Make sure new bb is tagged for correct section (same as >>>>> >>> - fall-thru source, since you cannot fall-thru across >>>>> >>> - section boundaries). */ >>>>> >>> - BB_COPY_PARTITION (src_bb, single_pred (bb)); >>>>> >>> - if (flag_reorder_blocks_and_partition >>>>> >>> - && targetm_common.have_named_sections >>>>> >>> - && JUMP_P (BB_END (bb)) >>>>> >>> - && !any_condjump_p (BB_END (bb)) >>>>> >>> - && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING)) >>>>> >>> - add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX); >>>>> >>> } >>>>> >>> } >>>>> >>> >>>>> >>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to) >>>>> >>> case NOTE_INSN_FUNCTION_BEG: >>>>> >>> /* There is always just single entry to function. */ >>>>> >>> case NOTE_INSN_BASIC_BLOCK: >>>>> >>> + /* We should only switch text sections once. 
*/ >>>>> >>> + case NOTE_INSN_SWITCH_TEXT_SECTIONS: >>>>> >>> break; >>>>> >>> >>>>> >>> case NOTE_INSN_EPILOGUE_BEG: >>>>> >>> - case NOTE_INSN_SWITCH_TEXT_SECTIONS: >>>>> >>> emit_note_copy (insn); >>>>> >>> break; >>>>> >>> >>>>> >>> @@ -3759,10 +3976,13 @@ break_superblocks (void) >>>>> >>> } >>>>> >>> >>>>> >>> /* Finalize the changes: reorder insn list according to the sequence specified >>>>> >>> - by aux pointers, enter compensation code, rebuild scope forest. */ >>>>> >>> + by aux pointers, enter compensation code, rebuild scope forest. If >>>>> >>> + this is called when we will FINALIZE_REORDER_BLOCKS, indicate that >>>>> >>> + to fixup_reorder_chain so that it can insert the proper switch text >>>>> >>> + section notes. */ >>>>> >>> >>>>> >>> void >>>>> >>> -cfg_layout_finalize (void) >>>>> >>> +cfg_layout_finalize (bool finalize_reorder_blocks) >>>>> >>> { >>>>> >>> #ifdef ENABLE_CHECKING >>>>> >>> verify_flow_info (); >>>>> >>> @@ -3775,7 +3995,7 @@ void >>>>> >>> #endif >>>>> >>> ) >>>>> >>> fixup_fallthru_exit_predecessor (); >>>>> >>> - fixup_reorder_chain (); >>>>> >>> + fixup_reorder_chain (finalize_reorder_blocks); >>>>> >>> >>>>> >>> rebuild_jump_labels (get_insns ()); >>>>> >>> delete_dead_jumptables (); >>>>> >>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e) >>>>> >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) >>>>> >>> return false; >>>>> >>> >>>>> >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) >>>>> >>> - || BB_PARTITION (src) != BB_PARTITION (target)) >>>>> >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) >>>>> >>> return false; >>>>> >>> >>>>> >>> if (!onlyjump_p (insn) >>>>> >>> >>>>> >>> -- >>>>> >>> This patch is available for review at http://codereview.appspot.com/6823047 >>>>> >> >>>>> >> >>>>> >> >>>>> >> -- >>>>> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 >>>>> >>>>> >>>>> >>>>> -- >>>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 
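The quoted fixup_partitions and rtl_verify_flow_info_1 hunks both walk the dominator tree from every cold block and require each dominated block to be cold as well. A minimal standalone sketch of that worklist, assuming a toy CFG where `idom[i]` holds the immediate dominator of block `i` (the array representation, `MAX_BLOCKS`, and the name `propagate_cold` are illustrative assumptions, not GCC's data structures or API):

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 64   /* hypothetical bound for this toy model */

/* Mark every block dominated by a cold block as cold.
   idom[i] is the immediate dominator of block i, or -1 for the entry.
   Returns the number of blocks that had to be changed to cold, which
   corresponds to the blocks the patch would collect in bbs_to_fix.  */
static int
propagate_cold (const int *idom, int n, bool *cold)
{
  int stack[MAX_BLOCKS];
  bool queued[MAX_BLOCKS] = { false };
  int top = 0;
  int fixed = 0;

  assert (n <= MAX_BLOCKS);

  /* Seed the worklist with the blocks that are already cold.  */
  for (int i = 0; i < n; i++)
    if (cold[i])
      {
        stack[top++] = i;
        queued[i] = true;
      }

  while (top > 0)
    {
      int bb = stack[--top];

      /* A block reached through a cold dominator must become cold.  */
      if (!cold[bb])
        {
          cold[bb] = true;
          fixed++;
        }

      /* Push the dominator-tree sons; each block is queued at most once,
         so the stack never holds more than n entries.  */
      for (int son = 0; son < n; son++)
        if (idom[son] == bb && !queued[son])
          {
            stack[top++] = son;
            queued[son] = true;
          }
    }

  return fixed;
}
```

With `idom = {-1, 0, 1, 1, 0, 4}` and only block 1 initially cold, blocks 2 and 3 are the ones that get re-marked. The verifier variant in the quoted patch would report an error at the point where this sketch increments `fixed`.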
Hello,

Sorry for the long delay (ref http://patchwork.ozlabs.org/patch/199397/)

On 6 December 2012 20:26, Teresa Johnson <tejohnson@google.com> wrote:
> On Wed, Nov 28, 2012 at 7:48 AM, Christophe Lyon
> <christophe.lyon@linaro.org> wrote:
>>
>> I have updated my trunk checkout, and I can confirm that eval.c now
>> compiles with your patch (and the other 4 patches I added to PR55121).
>
> good
>
>> Now, when looking at the whole Spec2k results:
>> - vpr passes now (used to fail)
>
> good
>
>> - gcc, parser, perlbmk bzip2 and twolf no longer build: they all fail
>> with the same error from gas:
>> can't resolve `.text.unlikely' {.text.unlikely section} - `.LBB171'
>> {.text section}
>> - gap still does not build (same error as above)
>>
>> I haven't looked in detail, so I may be missing an obvious patch here.
>
> Finally had a chance to get back to this. I was able to reproduce the
> failure using x86_64 linux with "-freorder-blocks-and-partition -g".
> However, I am also getting the same failure with a pristine copy of trunk.
> Can you confirm whether you were seeing any of these failures without my
> patches, because I believe they are probably a limitation with function
> splitting and debug info that is orthogonal to my patch.

Yes I confirm that I see these failures without your patch too; and both
-freorder-blocks-and-partition and -g are present in my command-line.
And now gap's integer.c fails to compile with a similar error message too.

>> And I still observe runtime mis-behaviour on crafty, galgel, facerec and
>> fma3d.
>
> I'm not seeing this on x86_64, unfortunately, so it might require some
> follow-on work to triage and fix.
>
> I'll look into the gas failure, but if someone could review this patch in
> the meantime given that it does improve things considerably (at least
> without -g), that would be great.

Indeed.

> Thanks,
> Teresa

Thanks
Christophe
Thanks for the confirmation that the -g issue is orthogonal. I did start
to try to address it but got pulled away by some other things for a while.
I'll see if I can take another stab at it.

In the meantime, could one of the global maintainers take a look at the
patch? I don't want it to get too stale, and without these fixes I am
unable to get -freorder-blocks-and-partition to work at all.

Thanks!
Teresa

On Thu, Jan 31, 2013 at 6:18 AM, Christophe Lyon
<christophe.lyon@linaro.org> wrote:
> Hello,
>
> Sorry for the long delay (ref http://patchwork.ozlabs.org/patch/199397/)
>
> On 6 December 2012 20:26, Teresa Johnson <tejohnson@google.com> wrote:
>> On Wed, Nov 28, 2012 at 7:48 AM, Christophe Lyon
>> <christophe.lyon@linaro.org> wrote:
>>>
>>> I have updated my trunk checkout, and I can confirm that eval.c now
>>> compiles with your patch (and the other 4 patches I added to PR55121).
>>
>> good
>>
>>> Now, when looking at the whole Spec2k results:
>>> - vpr passes now (used to fail)
>>
>> good
>>
>>> - gcc, parser, perlbmk bzip2 and twolf no longer build: they all fail
>>> with the same error from gas:
>>> can't resolve `.text.unlikely' {.text.unlikely section} - `.LBB171'
>>> {.text section}
>>> - gap still does not build (same error as above)
>>>
>>> I haven't looked in detail, so I may be missing an obvious patch here.
>>
>> Finally had a chance to get back to this. I was able to reproduce the
>> failure using x86_64 linux with "-freorder-blocks-and-partition -g".
>> However, I am also getting the same failure with a pristine copy of trunk.
>> Can you confirm whether you were seeing any of these failures without my
>> patches, because I believe they are probably a limitation with function
>> splitting and debug info that is orthogonal to my patch.
>
> Yes I confirm that I see these failures without your patch too; and
> both -freorder-blocks-and-partition and -g are present in my
> command-line.
> And now gap's integer.c fails to compile with a similar error message too.
>
>>> And I still observe runtime mis-behaviour on crafty, galgel, facerec and
>>> fma3d.
>>
>> I'm not seeing this on x86_64, unfortunately, so it might require some
>> follow-on work to triage and fix.
>>
>> I'll look into the gas failure, but if someone could review this patch in
>> the meantime given that it does improve things considerably (at least
>> without -g), that would be great.
>
> Indeed.
>
>> Thanks,
>> Teresa
>
> Thanks
> Christophe
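For readers following the attached patch, its bb-reorder.c hunk re-marks a cold block as hot whenever that block is the immediate dominator of a hot block, and repeats this up the dominator chain. A simplified standalone sketch of just that step, assuming a toy `idom[]` array rather than GCC's dominance API, and leaving out the patch's extra heuristic that also re-marks cold successors with no remaining cold predecessors (`heat_cold_dominators` is a hypothetical name):

```c
#include <assert.h>
#include <stdbool.h>

/* For every hot block, walk up the immediate-dominator chain and re-mark
   cold dominators as hot.  idom[i] is the immediate dominator of block i,
   or -1 for the entry block.  Returns the number of blocks re-marked.  */
static int
heat_cold_dominators (const int *idom, int n, bool *cold)
{
  int remarked = 0;

  assert (n >= 0);

  for (int bb = 0; bb < n; bb++)
    {
      if (cold[bb])
        continue;

      /* Stop at the first hot dominator: that block is hot itself, so the
         outer loop handles its own dominator chain.  */
      for (int d = idom[bb]; d >= 0 && cold[d]; d = idom[d])
        {
          cold[d] = false;
          remarked++;
        }
    }

  return remarked;
}
```

After the call, no hot block has a cold immediate dominator, which implies that no cold block dominates a hot one. With `idom = {-1, 0, 1, 2, 1}` and only block 2 hot, blocks 1 and 0 are re-marked while blocks 3 and 4 stay cold.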
Index: cfghooks.h =================================================================== --- cfghooks.h (revision 193376) +++ cfghooks.h (working copy) @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas void account_profile_record (struct profile_record *, int); extern void cfg_layout_initialize (unsigned int); -extern void cfg_layout_finalize (void); +extern void cfg_layout_finalize (bool); /* Hooks containers. */ extern struct cfg_hooks gimple_cfg_hooks; @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi extern void gimple_register_cfg_hooks (void); extern struct cfg_hooks get_cfg_hooks (void); extern void set_cfg_hooks (struct cfg_hooks); - Index: modulo-sched.c =================================================================== --- modulo-sched.c (revision 193376) +++ modulo-sched.c (working copy) @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void) if (bb->next_bb != EXIT_BLOCK_PTR) bb->aux = bb->next_bb; free_dominance_info (CDI_DOMINATORS); - cfg_layout_finalize (); + cfg_layout_finalize (false); #endif /* INSN_SCHEDULING */ return 0; } Index: ifcvt.c =================================================================== --- ifcvt.c (revision 193376) +++ ifcvt.c (working copy) @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg if (new_bb) { df_bb_replace (then_bb_index, new_bb); - /* Since the fallthru edge was redirected from test_bb to new_bb, - we need to ensure that new_bb is in the same partition as - test bb (you can not fall through across section boundaries). */ - BB_COPY_PARTITION (new_bb, test_bb); + /* This should have been done above via force_nonfallthru_and_redirect + (possibly called from redirect_edge_and_branch_force). 
*/ + gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb)); } num_true_changes++; Index: function.c =================================================================== --- function.c (revision 193376) +++ function.c (working copy) @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void) break; if (e) { - copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)), - NULL_RTX, e->src); + /* Make sure we insert after any barriers. */ + rtx end = get_last_bb_insn (e->src); + copy_bb = create_basic_block (NEXT_INSN (end), + NULL_RTX, e->src); BB_COPY_PARTITION (copy_bb, e->src); } else @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void) if (cur_bb->index >= NUM_FIXED_BLOCKS && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS) cur_bb->aux = cur_bb->next_bb; - cfg_layout_finalize (); + cfg_layout_finalize (false); } epilogue_done: @@ -6517,7 +6519,7 @@ epilogue_done: basic_block simple_return_block_cold = NULL; edge pending_edge_hot = NULL; edge pending_edge_cold = NULL; - basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb; + basic_block exit_pred; int i; gcc_assert (entry_edge != orig_entry_edge); @@ -6545,6 +6547,12 @@ epilogue_done: else pending_edge_cold = e; } + + /* Save a pointer to the exit's predecessor BB for use in + inserting new BBs at the end of the function. Do this + after the call to split_block above which may split + the original exit pred. */ + exit_pred = EXIT_BLOCK_PTR->prev_bb; FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e) { Index: function.h =================================================================== --- function.h (revision 193376) +++ function.h (working copy) @@ -459,6 +459,11 @@ struct GTY(()) rtl_data { sched2) and is useful only if the port defines LEAF_REGISTERS. */ bool uses_only_leaf_regs; + /* Nonzero if the function being compiled has undergone hot/cold partitioning + (under flag_reorder_blocks_and_partition) and has at least one cold + block. 
*/ + bool has_bb_partition; + /* Like regs_ever_live, but 1 if a reg is set or clobbered from an asm. Unlike regs_ever_live, elements of this array corresponding to eliminable regs (like the frame pointer) are set if an asm Index: hw-doloop.c =================================================================== --- hw-doloop.c (revision 193376) +++ hw-doloop.c (working copy) @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops) else bb->aux = NULL; } - cfg_layout_finalize (); + cfg_layout_finalize (false); clear_aux_for_blocks (); df_analyze (); } Index: cfgcleanup.c =================================================================== --- cfgcleanup.c (revision 193376) +++ cfgcleanup.c (working copy) @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2, partition boundaries). See the comments at the top of bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ - if (flag_reorder_blocks_and_partition && reload_completed) + if (crtl->has_bb_partition && reload_completed) return false; /* Search backward through forwarder blocks. We don't need to worry @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode) df_analyze (); } + if (changed) + { + /* Edge forwarding in particular can cause hot blocks previously + reached by both hot and cold blocks to become dominated only + by cold blocks. This will cause the verification below to fail, + and lead to now cold code in the hot section. This is not easy + to detect and fix during edge forwarding, and in some cases + is only visible after newly unreachable blocks are deleted, + which will be done in fixup_partitions. 
*/ + fixup_partitions (); + #ifdef ENABLE_CHECKING - if (changed) - verify_flow_info (); + verify_flow_info (); #endif + } changed_overall |= changed; first_pass = false; Index: bb-reorder.c =================================================================== --- bb-reorder.c (revision 193376) +++ bb-reorder.c (working copy) @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces current_partition = BB_PARTITION (traces[0].first); two_passes = false; - if (flag_reorder_blocks_and_partition) + if (crtl->has_bb_partition) for (i = 0; i < n_traces && !two_passes; i++) if (BB_PARTITION (traces[0].first) != BB_PARTITION (traces[i].first)) @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces } } - if (flag_reorder_blocks_and_partition) + if (crtl->has_bb_partition) try_copy = false; /* Copy tiny blocks always; copy larger blocks only when the @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void) return length; } -/* Emit a barrier into the footer of BB. */ +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode. */ -static void +void emit_barrier_after_bb (basic_block bb) { rtx barrier = emit_barrier_after (BB_END (bb)); - BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); + if (current_ir_type () == IR_RTL_CFGLAYOUT) + BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); } /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions. @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg { VEC(edge, heap) *crossing_edges = NULL; basic_block bb; - edge e; - edge_iterator ei; + edge e, e2; + edge_iterator ei, ei2; + unsigned int cold_bb_count = 0; + VEC (basic_block, heap) *bbs_in_hot_partition = NULL; + VEC (basic_block, heap) *bbs_newly_hot = NULL; /* Mark which partition (hot/cold) each basic block belongs in. 
*/ FOR_EACH_BB (bb) { if (probably_never_executed_bb_p (cfun, bb)) - BB_SET_PARTITION (bb, BB_COLD_PARTITION); + { + BB_SET_PARTITION (bb, BB_COLD_PARTITION); + cold_bb_count++; + } else - BB_SET_PARTITION (bb, BB_HOT_PARTITION); + { + BB_SET_PARTITION (bb, BB_HOT_PARTITION); + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb); + } } + /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of + several different possibilities. One is that there are edge weight insanities + due to optimization phases that do not properly update basic block profile + counts. The second is that the entry of the function may not be hot, because + it is entered fewer times than the number of profile training runs, but there + is a loop inside the function that causes blocks within the function to be + above the threshold for hotness. */ + if (cold_bb_count) + { + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); + + if (dom_calculated_here) + calculate_dominance_info (CDI_DOMINATORS); + + /* Keep examining hot bbs until we have either checked them all, or + re-marked all cold bbs hot. */ + while (! VEC_empty (basic_block, bbs_in_hot_partition) + && cold_bb_count) + { + basic_block dom_bb; + + bb = VEC_pop (basic_block, bbs_in_hot_partition); + dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb); + + /* If bb's immediate dominator is also hot then it is ok. */ + if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION) + continue; + + /* We have a hot bb with an immediate dominator that is cold. + The dominator needs to be re-marked to hot. */ + BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION); + cold_bb_count--; + + /* Now we need to examine newly-hot dom_bb to see if it is also + dominated by a cold bb. */ + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb); + + /* We should also adjust any cold blocks that the newly-hot bb + feeds and see if it makes sense to re-mark those as hot as + well. 
*/ + VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb); + while (! VEC_empty (basic_block, bbs_newly_hot)) + { + basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot); + /* Examine all successors of this newly-hot bb to see if they + are cold and should be re-marked as hot. */ + FOR_EACH_EDGE (e, ei, new_hot_bb->succs) + { + bool any_cold_preds = false; + basic_block succ = e->dest; + if (BB_PARTITION (succ) != BB_COLD_PARTITION) + continue; + /* Does this block have any cold predecessors now? */ + FOR_EACH_EDGE (e2, ei2, succ->preds) + { + if (BB_PARTITION (e2->src) == BB_COLD_PARTITION) + { + any_cold_preds = true; + break; + } + } + if (any_cold_preds) + continue; + + /* Here we have a successor of newly-hot bb that is cold + but no longer has any cold predecessors. Since the original + assignment of our newly-hot bb was incorrect, this successor's + assignment as cold is also suspect. Go ahead and re-mark it + as hot now too. Better heuristics may be in order here. */ + BB_SET_PARTITION (succ, BB_HOT_PARTITION); + cold_bb_count--; + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ); + /* Examine this successor as a newly-hot bb. */ + VEC_safe_push (basic_block, heap, bbs_newly_hot, succ); + } + } + } + + if (dom_calculated_here) + free_dominance_info (CDI_DOMINATORS); + } + /* The format of .gcc_except_table does not allow landing pads to be in a different partition as the throw. Fix this by either moving or duplicating the landing pads. */ @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void) new_bb->aux = cur_bb->aux; cur_bb->aux = new_bb; - /* Make sure new fall-through bb is in same - partition as bb it's falling through from. */ + /* This is done by force_nonfallthru_and_redirect. 
*/ + gcc_assert (BB_PARTITION (new_bb) + == BB_PARTITION (cur_bb)); - BB_COPY_PARTITION (new_bb, cur_bb); single_succ_edge (new_bb)->flags |= EDGE_CROSSING; } else @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void) FOR_EACH_BB (bb) FOR_EACH_EDGE (e, ei, bb->succs) if ((e->flags & EDGE_CROSSING) - && JUMP_P (BB_END (e->src))) + && JUMP_P (BB_END (e->src)) + /* Some notes were added during fix_up_fall_thru_edges, via + force_nonfallthru_and_redirect. */ + && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX)) add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); } @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void) dump_flow_info (dump_file, dump_flags); } - if (flag_reorder_blocks_and_partition) + if (crtl->has_bb_partition) verify_hot_cold_block_grouping (); } @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void) encountering this note will make the compiler switch between the hot and cold text sections. */ -static void +void insert_section_boundary_note (void) { basic_block bb; rtx new_note; int first_partition = 0; - if (!flag_reorder_blocks_and_partition) + if (!crtl->has_bb_partition) return; FOR_EACH_BB (bb) @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void) FOR_EACH_BB (bb) if (bb->next_bb != EXIT_BLOCK_PTR) bb->aux = bb->next_bb; - cfg_layout_finalize (); + cfg_layout_finalize (true); - /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. */ - insert_section_boundary_note (); return 0; } @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void) } done: - cfg_layout_finalize (); + cfg_layout_finalize (false); BITMAP_FREE (candidates); return 0; @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void) if (crossing_edges == NULL) return 0; + crtl->has_bb_partition = true; + /* Make sure the source of any crossing edge ends in a jump and the destination of any crossing edge has a label. 
*/ add_labels_and_missing_jumps (crossing_edges); Index: bb-reorder.h =================================================================== --- bb-reorder.h (revision 193376) +++ bb-reorder.h (working copy) @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re extern int get_uncond_jump_length (void); +extern void insert_section_boundary_note (void); + +extern void emit_barrier_after_bb (basic_block bb); + #endif Index: basic-block.h =================================================================== --- basic-block.h (revision 193376) +++ basic-block.h (working copy) @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect extern bool contains_no_active_insn_p (const_basic_block); extern bool forwarder_block_p (const_basic_block); extern bool can_fallthru (basic_block, basic_block); +extern void fixup_partitions (void); /* In cfgbuild.c. */ extern void find_many_sub_basic_blocks (sbitmap); Index: cfgrtl.c =================================================================== --- cfgrtl.c (revision 193376) +++ cfgrtl.c (working copy) @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. If not see #include "tree.h" #include "hard-reg-set.h" #include "basic-block.h" +#include "bb-reorder.h" #include "regs.h" #include "flags.h" #include "function.h" @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3. If not see Only applicable if the CFG is in cfglayout mode. */ static GTY(()) rtx cfg_layout_function_footer; static GTY(()) rtx cfg_layout_function_header; +static bool had_sec_boundary_notes; static rtx skip_insns_after_block (basic_block); static void record_effective_endpoints (void); static rtx label_for_bb (basic_block); -static void fixup_reorder_chain (void); +static void fixup_reorder_chain (bool finalize_reorder_blocks); void verify_insn_chain (void); static void fixup_fallthru_exit_predecessor (void); @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc partition boundaries). 
See the comments at the top of bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) - || BB_PARTITION (src) != BB_PARTITION (target)) + if (BB_PARTITION (src) != BB_PARTITION (target)) return NULL; /* We can replace or remove a complex jump only when we have exactly @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target) return e; } +/* Called when edge E has been redirected to a new destination, + in order to update the region crossing flag on the edge and + jump. */ + +static void +fixup_partition_crossing (edge e, basic_block target) +{ + rtx note; + + gcc_assert (e->dest == target); + + if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR) + return; + /* If we redirected an existing edge, it may already be marked + crossing, even though the new src is missing a reg crossing note. + But make sure reg crossing note doesn't already exist before + inserting. */ + if (BB_PARTITION (e->src) != BB_PARTITION (target)) + { + e->flags |= EDGE_CROSSING; + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); + if (JUMP_P (BB_END (e->src)) + && !note) + add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); + } + else if (BB_PARTITION (e->src) == BB_PARTITION (target)) + { + e->flags &= ~EDGE_CROSSING; + /* Remove the region crossing note from jump at end of + e->src if it exists. */ + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); + if (note) + remove_note (BB_END (e->src), note); + } +} + +/* Called when block BB has been reassigned to a different partition, + to ensure that the region crossing attributes are updated. */ + +static void +fixup_bb_partition (basic_block bb) +{ + edge e; + edge_iterator ei; + + /* Now need to make bb's pred edges non-region crossing. 
*/ + FOR_EACH_EDGE (e, ei, bb->preds) + { + fixup_partition_crossing (e, e->dest); + } + + /* Possibly need to make bb's successor edges region crossing, + or remove stale region crossing. */ + FOR_EACH_EDGE (e, ei, bb->succs) + { + if ((e->flags & EDGE_FALLTHRU) + && BB_PARTITION (bb) != BB_PARTITION (e->dest) + && e->dest != EXIT_BLOCK_PTR) + /* force_nonfallthru_and_redirect calls fixup_partition_crossing. */ + force_nonfallthru (e); + else + fixup_partition_crossing (e, e->dest); + } +} + /* Attempt to change code to redirect edge E to TARGET. Don't do that on expense of adding new instructions or reordering basic blocks. @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block { edge ret; basic_block src = e->src; + basic_block dest = e->dest; if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) return NULL; - if (e->dest == target) + if (dest == target) return e; if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL) { df_set_bb_dirty (src); + fixup_partition_crossing (ret, target); return ret; } @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block return NULL; df_set_bb_dirty (src); + fixup_partition_crossing (ret, target); return ret; } @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc /* Make sure new block ends up in correct hot/cold section. */ BB_COPY_PARTITION (jump_block, e->src); - if (flag_reorder_blocks_and_partition - && targetm_common.have_named_sections - && JUMP_P (BB_END (jump_block)) - && !any_condjump_p (BB_END (jump_block)) - && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING)) - add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX); /* Wire edge in. */ new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU); new_edge->probability = probability; new_edge->count = count; + /* If e->src was previously region crossing, it no longer is + and the reg crossing note should be removed. */ + fixup_partition_crossing (new_edge, jump_block); + /* Redirect old edge. 
          */
       redirect_edge_pred (e, jump_block);
       e->probability = REG_BR_PROB_BASE;
@@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
       LABEL_NUSES (label)++;
     }

-  emit_barrier_after (BB_END (jump_block));
+  /* We might be in cfg layout mode, and if so, the following routine will
+     insert the barrier correctly. */
+  emit_barrier_after_bb (jump_block);
   redirect_edge_succ_nodup (e, target);

   if (abnormal_edge_flags)
     make_edge (src, target, abnormal_edge_flags);

   df_mark_solutions_dirty ();
+  fixup_partition_crossing (e, target);
   return new_bb;
 }

@@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
 static basic_block
 rtl_split_edge (edge edge_in)
 {
-  basic_block bb;
+  basic_block bb, new_bb;
   rtx before;

   /* Abnormal edges cannot be split. */
@@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in)
   else
     {
       bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
-      /* ??? Why not edge_in->dest->prev_bb here? */
-      BB_COPY_PARTITION (bb, edge_in->dest);
+      if (edge_in->src == ENTRY_BLOCK_PTR)
+        BB_COPY_PARTITION (bb, edge_in->dest);
+      else
+        /* Put the split bb into the src partition, to avoid creating
+           a situation where a cold bb dominates a hot bb, in the case
+           where src is cold and dest is hot. The src will dominate
+           the new bb (whereas it might not have dominated dest). */
+        BB_COPY_PARTITION (bb, edge_in->src);
     }

   make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);

+  /* Can't allow a region crossing edge to be fallthrough. */
+  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
+      && edge_in->dest != EXIT_BLOCK_PTR)
+    {
+      new_bb = force_nonfallthru (single_succ_edge (bb));
+      gcc_assert (!new_bb);
+    }
+
   /* For non-fallthru edges, we must adjust the predecessor's
      jump instruction to target our new block.
      */
   if ((edge_in->flags & EDGE_FALLTHRU) == 0)
@@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e)
   else
     {
       bb = split_edge (e);
-      after = BB_END (bb);
-      if (flag_reorder_blocks_and_partition
-          && targetm_common.have_named_sections
-          && e->src != ENTRY_BLOCK_PTR
-          && BB_PARTITION (e->src) == BB_COLD_PARTITION
-          && !(e->flags & EDGE_CROSSING)
-          && JUMP_P (after)
-          && !any_condjump_p (after)
-          && (single_succ_edge (bb)->flags & EDGE_CROSSING))
-        add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
+      /* If e crossed a partition boundary, we needed to make bb end in
+         a region-crossing jump, even though it was originally fallthru. */
+      if (JUMP_P (BB_END (bb)))
+        before = BB_END (bb);
+      else
+        after = BB_END (bb);
     }

   /* Now that we've found the spot, do the insertion. */
@@ -1827,6 +1907,14 @@ commit_edge_insertions (void)
 {
   basic_block bb;

+  /* Optimization passes that invoke this routine can cause hot blocks
+     previously reached by both hot and cold blocks to become dominated only
+     by cold blocks. This will cause the verification below to fail,
+     and lead to now cold code in the hot section. In some cases this
+     may only be visible after newly unreachable blocks are deleted,
+     which will be done by fixup_partitions. */
+  fixup_partitions ();
+
 #ifdef ENABLE_CHECKING
   verify_flow_info ();
 #endif
@@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb)
   return end;
 }
-
+
+/* Perform cleanup on the hot/cold bb partitioning after optimization
+   passes that modify the cfg. */
+
+void
+fixup_partitions (void)
+{
+  basic_block bb;
+
+  if (!crtl->has_bb_partition)
+    return;
+
+  /* Delete any blocks that became unreachable and weren't
+     already cleaned up, for example during edge forwarding
+     and convert_jumps_to_returns. This will expose more
+     opportunities for fixing the partition boundaries here.
+     Also, the calculation of the dominance graph during verification
+     will assert if there are unreachable nodes.
+     */
+  delete_unreachable_blocks ();
+
+  /* If there are partitions, do a sanity check on them: A basic block in
+     a cold partition cannot dominate a basic block in a hot partition.
+     Fixup any that now violate this requirement, as a result of edge
+     forwarding and unreachable block deletion. */
+  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
+  VEC (basic_block, heap) *bbs_to_fix = NULL;
+  FOR_EACH_BB (bb)
+    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
+      VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
+  if (! VEC_empty (basic_block, bbs_in_cold_partition))
+    {
+      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+      basic_block son;
+
+      if (dom_calculated_here)
+        calculate_dominance_info (CDI_DOMINATORS);
+
+      while (! VEC_empty (basic_block, bbs_in_cold_partition))
+        {
+          bb = VEC_pop (basic_block, bbs_in_cold_partition);
+          /* If bb is not yet cold (because it was added below as
+             a block dominated by a cold bb) then mark it cold here. */
+          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
+            {
+              BB_SET_PARTITION (bb, BB_COLD_PARTITION);
+              VEC_safe_push (basic_block, heap, bbs_to_fix, bb);
+            }
+          /* Any blocks dominated by a block in the cold section
+             must also be cold. */
+          for (son = first_dom_son (CDI_DOMINATORS, bb);
+               son;
+               son = next_dom_son (CDI_DOMINATORS, son))
+            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
+        }
+
+      if (dom_calculated_here)
+        free_dominance_info (CDI_DOMINATORS);
+    }
+
+  /* Do the partition fixup after all necessary blocks have been converted to
+     cold, so that we only update the region crossings the minimum number of
+     places, which can require forcing edges to be non fallthru. */
+  while (! VEC_empty (basic_block, bbs_to_fix))
+    {
+      bb = VEC_pop (basic_block, bbs_to_fix);
+      fixup_bb_partition (bb);
+    }
+}
+
 /* Verify the CFG and RTL consistency common for both underlying RTL and
    cfglayout RTL.
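The worklist used by fixup_partitions in the hunk above can be tried outside of GCC. What follows is only a minimal sketch under stated assumptions, not the patch's code: struct toy_bb, propagate_cold, build_example and MAX_BLOCKS are hypothetical names, and the dominator tree is modelled with plain first-son/next-son indices rather than first_dom_son and next_dom_son. It seeds a stack with the blocks that are already cold, marks every block reached through the dominator tree as cold, and records the blocks whose partition changed (in the patch, those are the blocks later handed to fixup_bb_partition).

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 16

/* Hypothetical toy model of a basic block; this is not GCC's basic_block. */
struct toy_bb
{
  bool cold;      /* True if the block is in the cold partition. */
  int first_son;  /* First child in the dominator tree, or -1. */
  int next_son;   /* Next sibling in the dominator tree, or -1. */
};

/* Mark every block dominated by a cold block as cold. The indices of
   blocks that changed partition are stored in FIXED (capacity N); the
   return value is the number of such blocks. */
static int
propagate_cold (struct toy_bb *bbs, int n, int *fixed)
{
  /* A block can be pushed once as a seed and once per pop of its
     dominator, so N * N bounds the total number of pushes. */
  int stack[MAX_BLOCKS * MAX_BLOCKS];
  int top = 0, n_fixed = 0;

  assert (n >= 0 && n <= MAX_BLOCKS);

  /* Seed the worklist with every block that is already cold. */
  for (int i = 0; i < n; i++)
    if (bbs[i].cold)
      stack[top++] = i;

  while (top > 0)
    {
      int i = stack[--top];

      /* A block that arrives here while still hot was pushed as the
         dominator-tree son of a cold block, so it must become cold. */
      if (!bbs[i].cold)
        {
          bbs[i].cold = true;
          fixed[n_fixed++] = i;
        }

      /* Everything dominated by a cold block must also be cold. */
      for (int son = bbs[i].first_son; son != -1; son = bbs[son].next_son)
        stack[top++] = son;
    }

  return n_fixed;
}

/* Example tree: block 0 (hot) dominates 1 (cold) and 4 (hot);
   block 1 dominates 2 (hot) and 3 (hot). */
static void
build_example (struct toy_bb *bbs)
{
  bbs[0] = (struct toy_bb) { false, 1, -1 };
  bbs[1] = (struct toy_bb) { true, 2, 4 };
  bbs[2] = (struct toy_bb) { false, -1, 3 };
  bbs[3] = (struct toy_bb) { false, -1, -1 };
  bbs[4] = (struct toy_bb) { false, -1, -1 };
}
```

In this toy example only blocks 2 and 3 change partition; blocks 0 and 4 are not dominated by a cold block and stay hot. Collecting the changed blocks first and fixing their edges afterwards mirrors the patch's stated reason for the separate bbs_to_fix loop.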
@@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void)
   rtx x;
   int err = 0;
   basic_block bb;
+  bool have_partitions = false;

   /* Check the general integrity of the basic blocks. */
   FOR_EACH_BB_REVERSE (bb)
@@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void)

           if (e->flags & EDGE_ABNORMAL)
             n_abnormal++;
+
+          have_partitions |= is_crossing;
         }

       if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
@@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void)
         }
     }

+  /* If there are partitions, do a sanity check on them: A basic block in
+     a cold partition cannot dominate a basic block in a hot partition. */
+  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
+  if (have_partitions && !err)
+    FOR_EACH_BB (bb)
+      if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
+        VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
+  if (! VEC_empty (basic_block, bbs_in_cold_partition))
+    {
+      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+      basic_block son;
+
+      if (dom_calculated_here)
+        calculate_dominance_info (CDI_DOMINATORS);
+
+      while (! VEC_empty (basic_block, bbs_in_cold_partition))
+        {
+          bb = VEC_pop (basic_block, bbs_in_cold_partition);
+          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
+            {
+              error ("non-cold basic block %d dominated "
+                     "by a block in the cold partition", bb->index);
+              err = 1;
+            }
+          for (son = first_dom_son (CDI_DOMINATORS, bb);
+               son;
+               son = next_dom_son (CDI_DOMINATORS, son))
+            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
+        }
+
+      if (dom_calculated_here)
+        free_dominance_info (CDI_DOMINATORS);
+    }
+
   /* Clean up.
      */
   return err;
 }
@@ -2965,14 +3158,41 @@ record_effective_endpoints (void)
   else
     cfg_layout_function_header = NULL_RTX;

+  had_sec_boundary_notes = false;
+
   next_insn = get_insns ();
   FOR_EACH_BB (bb)
     {
       rtx end;

       if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb))
-        BB_HEADER (bb) = unlink_insn_chain (next_insn,
-                                            PREV_INSN (BB_HEAD (bb)));
+        {
+          /* Rather than try to keep section boundary notes incrementally
+             up-to-date through cfg layout optimizations, simply remove them
+             and flag that they should be re-inserted when exiting
+             cfg layout mode. */
+          rtx check_insn = next_insn;
+          while (check_insn)
+            {
+              if (NOTE_P (check_insn)
+                  && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
+                {
+                  had_sec_boundary_notes |= true;
+                  /* Remove note from chain. Grab new next_insn first. */
+                  if (next_insn == check_insn)
+                    next_insn = NEXT_INSN (check_insn);
+                  /* Delete note. */
+                  delete_insn (check_insn);
+                  /* There will only be one. */
+                  break;
+                }
+              check_insn = NEXT_INSN (check_insn);
+            }
+          /* If we still have header instructions left after above loop. */
+          if (next_insn != BB_HEAD (bb))
+            BB_HEADER (bb) = unlink_insn_chain (next_insn,
+                                                PREV_INSN (BB_HEAD (bb)));
+        }
       end = skip_insns_after_block (bb);
       if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end)
         BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end);
@@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void)
     if (bb->next_bb != EXIT_BLOCK_PTR)
       bb->aux = bb->next_bb;

-  cfg_layout_finalize ();
+  cfg_layout_finalize (false);

   return 0;
 }
@@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode)
 }


-/* Given a reorder chain, rearrange the code to match. */
+/* Given a reorder chain, rearrange the code to match. If
+   this is called when we will FINALIZE_REORDER_BLOCKS, or when
+   section boundary notes were removed on entry to cfg layout
+   mode, insert section boundary notes here.
+   */

 static void
-fixup_reorder_chain (void)
+fixup_reorder_chain (bool finalize_reorder_blocks)
 {
   basic_block bb;
   rtx insn = NULL;
@@ -3150,7 +3373,7 @@ static void
           PREV_INSN (BB_HEADER (bb)) = insn;
           insn = BB_HEADER (bb);
           while (NEXT_INSN (insn))
-            insn = NEXT_INSN (insn);
+            insn = NEXT_INSN (insn);
         }
       if (insn)
         NEXT_INSN (insn) = BB_HEAD (bb);
@@ -3175,6 +3398,11 @@ static void
     insn = NEXT_INSN (insn);

   set_last_insn (insn);
+
+  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. */
+  if (had_sec_boundary_notes || finalize_reorder_blocks)
+    insert_section_boundary_note ();
+
 #ifdef ENABLE_CHECKING
   verify_insn_chain ();
 #endif
@@ -3187,7 +3415,7 @@ static void
       edge e_fall, e_taken, e;
       rtx bb_end_insn;
       rtx ret_label = NULL_RTX;
-      basic_block nb, src_bb;
+      basic_block nb;
       edge_iterator ei;

       if (EDGE_COUNT (bb->succs) == 0)
@@ -3322,7 +3550,6 @@ static void
       /* We got here if we need to add a new jump insn.
          Note force_nonfallthru can delete E_FALL and thus we have to
          save E_FALL->src prior to the call to force_nonfallthru. */
-      src_bb = e_fall->src;
       nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
       if (nb)
         {
@@ -3330,17 +3557,6 @@ static void
           bb->aux = nb;
           /* Don't process this new block. */
           bb = nb;
-
-          /* Make sure new bb is tagged for correct section (same as
-             fall-thru source, since you cannot fall-thru across
-             section boundaries). */
-          BB_COPY_PARTITION (src_bb, single_pred (bb));
-          if (flag_reorder_blocks_and_partition
-              && targetm_common.have_named_sections
-              && JUMP_P (BB_END (bb))
-              && !any_condjump_p (BB_END (bb))
-              && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
-            add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
         }
     }

@@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to)
             case NOTE_INSN_FUNCTION_BEG:
               /* There is always just single entry to function. */
             case NOTE_INSN_BASIC_BLOCK:
+              /* We should only switch text sections once.
+                 */
+            case NOTE_INSN_SWITCH_TEXT_SECTIONS:
               break;

             case NOTE_INSN_EPILOGUE_BEG:
-            case NOTE_INSN_SWITCH_TEXT_SECTIONS:
               emit_note_copy (insn);
               break;

@@ -3759,10 +3976,13 @@ break_superblocks (void)
 }

 /* Finalize the changes: reorder insn list according to the sequence specified
-   by aux pointers, enter compensation code, rebuild scope forest. */
+   by aux pointers, enter compensation code, rebuild scope forest. If
+   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
+   to fixup_reorder_chain so that it can insert the proper switch text
+   section notes. */

 void
-cfg_layout_finalize (void)
+cfg_layout_finalize (bool finalize_reorder_blocks)
 {
 #ifdef ENABLE_CHECKING
   verify_flow_info ();
@@ -3775,7 +3995,7 @@ void
 #endif
       )
     fixup_fallthru_exit_predecessor ();
-  fixup_reorder_chain ();
+  fixup_reorder_chain (finalize_reorder_blocks);

   rebuild_jump_labels (get_insns ());
   delete_dead_jumptables ();
@@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e)
   if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
     return false;

-  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
-      || BB_PARTITION (src) != BB_PARTITION (target))
+  if (BB_PARTITION (src) != BB_PARTITION (target))
     return false;

   if (!onlyjump_p (insn)