
RFA: Rework FOR_BB_INSNS iterators

Message ID 87vbscppva.fsf@talisman.default
State New

Commit Message

Richard Sandiford June 7, 2014, 5:54 p.m. UTC
I noticed the head of FOR_BB_INSNS loops showing up high in the profile,
both for -O0 and -O2, so this patch tries to make the loops more efficient.

The current definition of FOR_BB_INSNS is:

#define FOR_BB_INSNS(BB, INSN)			\
  for ((INSN) = BB_HEAD (BB);			\
       (INSN) && (INSN) != NEXT_INSN (BB_END (BB));	\
       (INSN) = NEXT_INSN (INSN))

The two parts of the loop condition are really handling two different
kinds of block: ones like entry and exit that are completely empty
and normal ones that have at least a block note.  There's no real
need to check for null INSNs in normal blocks.

Also, refetching NEXT_INSN (BB_END (BB)) for each iteration can be expensive.
If we're prepared to say that the loop body can't insert instructions
for another block immediately after BB_END, then we could cache the
end point in the loop header.  I think that restriction already applies
to the special case of INSN == BB_END (bb) in FOR_BB_INSNS_SAFE:

#define FOR_BB_INSNS_SAFE(BB, INSN, CURR)			\
  for ((INSN) = BB_HEAD (BB), (CURR) = (INSN) ? NEXT_INSN ((INSN)): NULL;	\
       (INSN) && (INSN) != NEXT_INSN (BB_END (BB));	\
       (INSN) = (CURR), (CURR) = (INSN) ? NEXT_INSN ((INSN)) : NULL)

because otherwise CURR would end up being after NEXT_INSN (BB_END (bb))
and the loop wouldn't terminate properly.

It's easier to change these macros if they define the INSN variables
themselves.

Also, the modified version of FOR_BB_INSNS_SAFE ended up being significantly
faster than the modified version of FOR_BB_INSNS.  I.e. fetching NEXT_INSN
in the loop header is better than fetching at continuation time.
FOR_BB_INSNS_SAFE also has the (IMO) nice property of skipping instructions
that the loop body inserts immediately after INSN, just like it would skip
those that the body inserts immediately before INSN.  I tried to check for
loops that might be relying on the old behaviour but I couldn't see any.

So the patch also gets rid of the safe/unsafe distinction and makes
the normal iterators "safe".

This seems to give a consistent 1% speedup on both -O0 and -O2 compiles
that I've tried.  Tested on x86_64-linux-gnu.  I also checked that there
were no changes in asm output for gcc.dg, g++.dg and gcc.c-torture for
x86_64 and that one target from each directory still builds correctly.
OK to install?

Thanks,
Richard


gcc/
	* basic-block.h (FOR_BB_INSNS): Redefine as a "safe" iterator
	that declares the iteration variable itself.  Cache the value
	of the terminator.  Make the compiler aware that the loop iterates
	at least once if the first insn is nonnull.
	(FOR_BB_INSNS_REVERSE): Likewise.
	(FOR_BB_INSNS_SAFE, FOR_BB_INSNS_REVERSE_SAFE): Delete.
	* alias.c (init_alias_analysis): Remove separate variable
	declarations.
	* bb-reorder.c (copy_bb_p): Likewise.
	(pass_duplicate_computed_gotos::execute): Likewise.
	* cfgloop.c (get_loop_location): Likewise.
	* cfgloopanal.c (num_loop_insns, average_num_loop_insns): Likewise.
	* cfgrtl.c (rtl_verify_bb_pointers, rtl_block_empty_p): Likewise.
	(rtl_split_block_before_cond_jump): Likewise.
	(rtl_account_profile_record): Likewise.
	* combine.c (create_log_links, combine_instructions): Likewise.
	* cprop.c (compute_hash_table_work, local_cprop_pass): Likewise.
	(bypass_conditional_jumps, one_cprop_pass): Likewise.
	* cse.c (cse_prescan_path, cse_extended_basic_block): Likewise.
	* df-core.c (df_bb_regno_first_def_find): Likewise.
	(df_bb_regno_last_def_find): Likewise.
	* df-problems.c (df_rd_bb_local_compute): Likewise.
	(df_lr_bb_local_compute, df_live_bb_local_compute): Likewise.
	(df_chain_remove_problem, df_chain_create_bb): Likewise.
	(df_word_lr_bb_local_compute, df_note_bb_compute): Likewise.
	(df_md_bb_local_compute): Likewise.
	* df-scan.c (df_scan_free_bb_info, df_scan_start_dump): Likewise.
	(df_reorganize_refs_by_reg_by_insn): Likewise.
	(df_reorganize_refs_by_insn_bb, df_recompute_luids): Likewise.
	(df_bb_refs_record, df_bb_verify): Likewise.
	(df_insn_rescan_all, df_update_entry_exit_and_calls): Likewise (with
	reindentation).
	* dse.c (dse_step1): Likewise.
	* function.c (reposition_prologue_and_epilogue_notes): Likewise.
	(pass_match_asm_constraints::execute): Likewise.
	* fwprop.c (single_def_use_dom_walker::before_dom_children): Likewise.
	* gcse.c (compute_hash_table_work, hoist_code): Likewise.
	(calculate_bb_reg_pressure, compute_ld_motion_mems): Likewise.
	* ifcvt.c (noce_can_store_speculate_p): Likewise.
	(cond_move_convert_if_block, dead_or_predicable): Likewise.
	* init-regs.c (initialize_uninitialized_regs): Likewise.
	* ira-build.c (create_bb_allocnos): Likewise.
	* ira-conflicts.c (add_copies): Likewise.
	* ira-costs.c (process_bb_for_costs): Likewise.
	(process_bb_node_for_hard_reg_moves): Likewise.
	* ira-emit.c (change_loop, ira_emit): Likewise.
	* ira-lives.c (process_bb_node_lives): Likewise.
	* ira.c (decrease_live_ranges_number): Likewise.
	(compute_regs_asm_clobbered, update_equiv_regs): Likewise.
	(build_insn_chain, find_moveable_pseudos): Likewise.
	(split_live_ranges_for_shrink_wrap): Likewise.
	* jump.c (mark_all_labels): Likewise.
	* haifa-sched.c (initiate_bb_reg_pressure_info): Likewise.
	(sched_init_luids, haifa_init_h_i_d): Likewise (with reindentation).
	* loop-invariant.c (find_exits, find_invariants_bb): Likewise.
	(calculate_loop_reg_pressure): Likewise.
	* loop-iv.c (simplify_using_initial_values): Likewise.
	* lower-subreg.c (decompose_multiword_subregs): Likewise.
	* lra-constraints.c (get_last_insertion_point): Likewise.
	* lra-eliminations.c (init_elimination): Likewise.
	* lra.c (remove_scratches, check_rtl, update_inc_notes): Likewise.
	* mode-switching.c (optimize_mode_switching): Likewise.
	* postreload-gcse.c (alloc_mem): Likewise.
	(compute_hash_table): Likewise (with reindentation).
	(eliminate_partially_redundant_loads): Likewise.
	* postreload.c (reload_cse_regs_1): Likewise.
	* predict.c (expensive_function_p): Likewise (with reindentation).
	* ree.c (find_removable_extensions): Likewise.
	* reginfo.c (init_subregs_of_mode): Likewise.
	* regrename.c (regrename_analyze): Likewise.
	* regstat.c (regstat_bb_compute_ri): Likewise.
	(regstat_bb_compute_calls_crossed): Likewise.
	* reload1.c (calculate_elim_costs_all_insns): Likewise.
	* sched-rgn.c (is_cfg_nonregular): Likewise.
	(rgn_estimate_number_of_insns): Likewise (with reindentation).
	* sched-vis.c (rtl_dump_bb_for_graph): Likewise.
	* sel-sched-ir.c (sched_scan): Likewise (with reindentation).
	(sel_restore_notes, clear_outdated_rtx_info): Likewise.
	(sel_split_block): Likewise.
	* sel-sched.c (simplify_changed_insns): Likewise.
	* stack-ptr-mod.c (pass_stack_ptr_mod::execute): Likewise.
	* store-motion.c (compute_store_table, build_store_vectors): Likewise.
	* web.c (pass_web::execute): Likewise.
	* config/arm/arm.c (thumb2_reorg): Likewise.
	* config/epiphany/resolve-sw-modes.c (pass_resolve_sw_modes::execute):
	Likewise.
	* config/i386/i386.c (ix86_finalize_stack_realign_flags): Likewise.
	(ix86_count_insn_bb): Likewise.
	* config/mips/mips.c (mips_get_pic_call_symbol): Likewise (with
	reindentation).
	* config/mn10300/mn10300.c (mn10300_block_contains_call): Likewise.
	* config/s390/s390.c (s390_regs_ever_clobbered): Likewise.
	(s390_optimize_nonescaping_tx): Likewise.
	* lra-lives.c (process_bb_lives): Use a local "insn" variable and
	assign it to curr_insn.
	* loop-unroll.c (loop_exit_at_end_p): Remove separate variable
	declarations.
	(referenced_in_one_insn_in_loop_p, reset_debug_uses_in_loop): Likewise.
	(analyze_insns_in_loop): Likewise.
	(apply_opt_in_copies): Replace use of safe iterators with standard
	ones.
	* auto-inc-dec.c (merge_in_block): Likewise.
	* lra-coalesce.c (lra_coalesce): Likewise.
	* var-tracking.c (delete_debug_insns): Likewise.
	* shrink-wrap.c (prepare_shrink_wrap): Likewise.
	(try_shrink_wrapping): Remove separate variable declarations.
	* dce.c (word_dce_process_block, dce_process_block): Likewise.
	(reset_unmarked_insns_debug_uses): Replace use of safe iterators
	with standard ones.
	(delete_unmarked_insns, prescan_insns_for_dce): Likewise.
	* lra-spills.c (lra_final_code_change): Likewise.
	(assign_spill_hard_regs, spill_pseudos): Remove separate variable
	declarations.
	* config/c6x/c6x.c (conditionalize_after_sched): Likewise.
	(bb_earliest_end_cycle): Likewise.
	(filter_insns_above): Replace use of safe iterators with standard ones.

Comments

Steven Bosscher June 7, 2014, 8:25 p.m. UTC | #1
On Sat, Jun 7, 2014 at 7:54 PM, Richard Sandiford wrote:
> The two parts of the loop condition are really handling two different
> kinds of block: ones like entry and exit that are completely empty
> and normal ones that have at least a block note.  There's no real
> need to check for null INSNs in normal blocks.

Block notes should go away some day, they're just remains of a time
when there was no actual CFG in the compiler.


> Also, refetching NEXT_INSN (BB_END (BB)) for each iteration can be expensive.
> If we're prepared to say that the loop body can't insert instructions
> for another block immediately after BB_END,

This can happen with "block_label()" if e.g. a new jump is inserted
for one reason or another. Not very likely for passes working in
cfglayout mode, but post-RA there may be places that need this
(splitters, peepholes, machine dependent reorgs, etc.).

So even if we're prepared to say what you suggest, I don't think you
can easily enforce it.


> It's easier to change these macros if they define the INSN variables
> themselves.

If you're going to hack these iterators anyway (much appreciated BTW),
I suggest to make them similar to the gsi, loop, edge, and bitmap
iterators: A new "insn_iterator" structure to hold the variables and
static inline functions wrapped in the macros. This will also be
helpful if (when) we ever manage to make the type for an insn a
non-rtx...



> +/* For iterating over insns in a basic block.  The iterator allows the loop
> +   body to delete INSN.  It also ignores any instructions that the body
> +   inserts between INSN and the following instruction.  */
> +#define FOR_BB_INSNS(BB, INSN)                                         \
> +  for (rtx INSN = BB_HEAD (BB), INSN##_cond_ = INSN, INSN##_next_,     \
> +       INSN##_end_ = INSN ? NEXT_INSN (BB_END (BB)) : NULL_RTX;                \
> +       INSN##_cond_ && (INSN##_next_ = NEXT_INSN (INSN), true);                \
> +       INSN = INSN##_next_,                                            \
> +       INSN##_cond_ = (INSN != INSN##_end_ ? (rtx) 1 : NULL_RTX))

This just makes my eyes hurt...


What about cases where a FOR_BB_INSNS is terminated before reaching
the end of a basic block, and you need to know at what insn you
stopped? Up to now, you could do:

  rtx insn; basic_block bb;
  FOR_BB_INSNS (bb, insn)
    {
      ... // do stuff
      if (something) break;
    }
  do_something_with (insn);

Looks like this is no longer possible with the implementation of
FOR_BB_INSNS of your patch.

I would not approve this patch, but let's wait what others think of it.

Ciao!
Steven
Richard Sandiford June 9, 2014, 7:32 p.m. UTC | #2
Steven Bosscher <stevenb.gcc@gmail.com> writes:
> On Sat, Jun 7, 2014 at 7:54 PM, Richard Sandiford wrote:
>> The two parts of the loop condition are really handling two different
>> kinds of block: ones like entry and exit that are completely empty
>> and normal ones that have at least a block note.  There's no real
>> need to check for null INSNs in normal blocks.
>
> Block notes should go away some day, they're just remains of a time
> when there was no actual CFG in the compiler.

Yeah.  I suppose when that happens empty blocks would look just like
entry and exit as far as these iterators go.

>> Also, refetching NEXT_INSN (BB_END (BB)) for each iteration can be expensive.
>> If we're prepared to say that the loop body can't insert instructions
>> for another block immediately after BB_END,
>
> This can happen with "block_label()" if e.g. a new jump is inserted
> for one reason or another. Not very likely for passes working in
> cfglayout mode, but post-RA there may be places that need this
> (splitters, peepholes, machine dependent reorgs, etc.).
>
> So even if we're prepared to say what you suggest, I don't think you
> can easily enforce it.

Probably true.  But if we want to allow insertions after BB_END,
we need to make FOR_BB_INSNS_SAFE handle that for INSN == BB_END too.

The alternative definition:

  for (rtx INSN = BB_HEAD (BB), INSN##_cond_ = INSN, INSN##_next_;	\
       INSN##_cond_ 							\
	 && (INSN##_next_ = NEXT_INSN (INSN),				\
	     INSN##_cond_ = BB_END (BB),				\
	     true);							\
       INSN##_cond_ = (INSN == INSN##_cond_ ? NULL_RTX : (rtx) 1),	\
       INSN = INSN##_next_)

works too.  It isn't quite as fast, but obviously correctness comes first.

>> It's easier to change these macros if they define the INSN variables
>> themselves.
>
> If you're going to hack these iterators anyway (much appreciated BTW),
> I suggest to make them similar to the gsi, loop, edge, and bitmap
> iterators: A new "insn_iterator" structure to hold the variables and
> static inline functions wrapped in the macros. This will also be
> helpful if (when) we ever manage to make the type for an insn a
> non-rtx...

Well, with this and...

>> +/* For iterating over insns in a basic block.  The iterator allows the loop
>> +   body to delete INSN.  It also ignores any instructions that the body
>> +   inserts between INSN and the following instruction.  */
>> +#define FOR_BB_INSNS(BB, INSN)                                         \
>> +  for (rtx INSN = BB_HEAD (BB), INSN##_cond_ = INSN, INSN##_next_,     \
>> +       INSN##_end_ = INSN ? NEXT_INSN (BB_END (BB)) : NULL_RTX;                \
>> +       INSN##_cond_ && (INSN##_next_ = NEXT_INSN (INSN), true);                \
>> +       INSN = INSN##_next_,                                            \
>> +       INSN##_cond_ = (INSN != INSN##_end_ ? (rtx) 1 : NULL_RTX))
>
> This just makes my eyes hurt...
>
>
> What about cases where a FOR_BB_INSNS is terminated before reaching
> the end of a basic block, and you need to know at what insn you
> stopped? Up to now, you could do:
>
>   rtx insn; basic_block bb;
>   FOR_BB_INSNS (bb, insn)
>     {
>       ... // do stuff
>       if (something) break;
>     }
>   do_something_with (insn);
>
> Looks like this is no longer possible with the implementation of
> FOR_BB_INSNS of your patch.

...this I suppose it depends where we want to go with these iterators.
I'm hoping eventually we'll move to C++11, where the natural way of
writing the loop would be:

  for (rtx insn : bb->insns ())

(Or "auto *" instead of "rtx" if you prefer.)

And I think the idiom of having the FOR_* macro define the iterator
variable is much closer to that than:

  rtx insn;
  FOR_BB_INSNS (iter, bb, insn)

would be.

It's true that you can't leave "insn" with a significant value after
the loop, but no current code wants to do that.  Personally I like
the fact that loops that do want to set a final value have to make
that explicit.  When the vast majority (currently all) instances of:

  FOR_BB_INSNS (bb, insn)

treat "insn" as local to the loop, it's helpful when the exceptions
are obvious.

I think if anything the patch would make it easier to change the
type of insns to something other than rtx.  It would just mean changing
the "rtx" at the start of the two "for" loops to the new type,
whereas at the moment you would need to change all the separate
"rtx insn"s.  (In particular, it's common for "insn" to be defined
on the same line as other rtx variables that might need to stay
as "rtx"es after the change.)

Thanks,
Richard
David Malcolm June 23, 2014, 6:54 p.m. UTC | #3
On Mon, 2014-06-09 at 20:32 +0100, Richard Sandiford wrote:
> Steven Bosscher <stevenb.gcc@gmail.com> writes:
> > On Sat, Jun 7, 2014 at 7:54 PM, Richard Sandiford wrote:
> >> The two parts of the loop condition are really handling two different
> >> kinds of block: ones like entry and exit that are completely empty
> >> and normal ones that have at least a block note.  There's no real
> >> need to check for null INSNs in normal blocks.
> >
> > Block notes should go away some day, they're just remains of a time
> > when there was no actual CFG in the compiler.
> 
> Yeah.  I suppose when that happens empty blocks would look just like
> entry and exit as far as these iterators go.
> 
> >> Also, refetching NEXT_INSN (BB_END (BB)) for each iteration can be expensive.
> >> If we're prepared to say that the loop body can't insert instructions
> >> for another block immediately after BB_END,
> >
> > This can happen with "block_label()" if e.g. a new jump is inserted
> > for one reason or another. Not very likely for passes working in
> > cfglayout mode, but post-RA there may be places that need this
> > (splitters, peepholes, machine dependent reorgs, etc.).
> >
> > So even if we're prepared to say what you suggest, I don't think you
> > can easily enforce it.
> 
> Probably true.  But if we want to allow insertions after BB_END,
> we need to make FOR_BB_INSNS_SAFE handle that for INSN == BB_END too.
> 
> The alternative definition:
> 
>   for (rtx INSN = BB_HEAD (BB), INSN##_cond_ = INSN, INSN##_next_;	\
>        INSN##_cond_ 							\
> 	 && (INSN##_next_ = NEXT_INSN (INSN),				\
> 	     INSN##_cond_ = BB_END (BB),				\
> 	     true);							\
>        INSN##_cond_ = (INSN == INSN##_cond_ ? NULL_RTX : (rtx) 1),	\
>        INSN = INSN##_next_)
> 
> works too.  It isn't quite as fast, but obviously correctness comes first.
> 
> >> It's easier to change these macros if they define the INSN variables
> >> themselves.
> >
> > If you're going to hack these iterators anyway (much appreciated BTW),
> > I suggest to make them similar to the gsi, loop, edge, and bitmap
> > iterators: A new "insn_iterator" structure to hold the variables and
> > static inline functions wrapped in the macros. This will also be
> > helpful if (when) we ever manage to make the type for an insn a
> > non-rtx...
> 
> Well, with this and...
> 
> >> +/* For iterating over insns in a basic block.  The iterator allows the loop
> >> +   body to delete INSN.  It also ignores any instructions that the body
> >> +   inserts between INSN and the following instruction.  */
> >> +#define FOR_BB_INSNS(BB, INSN)                                         \
> >> +  for (rtx INSN = BB_HEAD (BB), INSN##_cond_ = INSN, INSN##_next_,     \
> >> +       INSN##_end_ = INSN ? NEXT_INSN (BB_END (BB)) : NULL_RTX;                \
> >> +       INSN##_cond_ && (INSN##_next_ = NEXT_INSN (INSN), true);                \
> >> +       INSN = INSN##_next_,                                            \
> >> +       INSN##_cond_ = (INSN != INSN##_end_ ? (rtx) 1 : NULL_RTX))
> >
> > This just makes my eyes hurt...
> >
> >
> > What about cases where a FOR_BB_INSNS is terminated before reaching
> > the end of a basic block, and you need to know at what insn you
> > stopped? Up to now, you could do:
> >
> >   rtx insn; basic_block bb;
> >   FOR_BB_INSNS (bb, insn)
> >     {
> >       ... // do stuff
> >       if (something) break;
> >     }
> >   do_something_with (insn);
> >
> > Looks like this is no longer possible with the implementation of
> > FOR_BB_INSNS of your patch.
> 
> ...this I suppose it depends where we want to go with these iterators.
> I'm hoping eventually we'll move to C++11, where the natural way of
> writing the loop would be:
> 
>   for (rtx insn : bb->insns ())
> 
> (Or "auto *" instead of "rtx" if you prefer.)
> 
> And I think the idiom of having the FOR_* macro define the iterator
> variable is much closer to that than:
> 
>   rtx insn;
>   FOR_BB_INSNS (iter, bb, insn)
> 
> would be.
> 
> It's true that you can't leave "insn" with a significant value after
> the loop, but no current code wants to do that.  Personally I like
> the fact that loops that do want to set a final value have to make
> that explicit.  When the vast majority (currently all) instances of:
> 
>   FOR_BB_INSNS (bb, insn)
> 
> treat "insn" as local to the loop, it's helpful when the exceptions
> are obvious.
> 
> I think if anything the patch would make it easier to change the
> type of insns to something other than rtx.

(sorry for the belated response; I was on vacation).

FWIW I'm actually working on such a change, or, at least, to make
instructions be a subclass of rtx_def; presumably this should make it
easier to make it an entirely separate type, if that's desirable.

Executive summary is that it's still a work in progress, and I'm going
to be giving a talk about it at Cauldron next month ("A proposal for
typesafe RTL", currently scheduled for the Sunday at 12.45 in Stream 2).

More detailed version:
I experimented with this a few months ago, and came up with a 159-patch
patch series:
http://dmalcolm.fedorapeople.org/gcc/patch-backups/rtx-classes/v1/

That patch kit builds on x86_64 (and regrtests iirc), and uses a
separate subclass for insns e.g. in the core basic block class, in the
scheduler, register allocators, etc, but:
  (A) it only builds on about 70 of the ~200 configurations in
config-list.mk
  (B) there was a tangled mess of dependencies between the patches
making it hard to fix (A)
  (C) I went with the rather verbose "rtx_base_insn" name, which in v1
is a typedef for a pointer to an insn, along with a family of other
typedefs, which I now realize is frowned upon e.g.:
https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01520.html
https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01632.html
  (D) I was using as_a *methods* rather than just as_a <>, like
in v1 of my gimple-classes patches:
(https://gcc.gnu.org/ml/gcc-patches/2014-05/msg00887.html)


So I've been reworking it, incorporating feedback from the gimple
statement classes work; the latest patch series is v7 and can be seen
here:
http://dmalcolm.fedorapeople.org/gcc/patch-backups/rtx-classes/v7/

It builds and regrtests on x86_64 on top of r211061 (aka
dcd5393fd5b3ac53775a5546f3103958b64789bb) and builds on 184 of the
configurations in config-list.mk; I have patches to fix the remaining
ones to achieve parity with an unpatched build.

The current class hierarchy looks like this:
/* Subclasses of rtx_def, using indentation to show the class
   hierarchy, along with the relevant invariant.  */
class rtx_def;
  class rtx_insn; /* (INSN_P (X) || NOTE_P (X)
	               || JUMP_TABLE_DATA_P (X) || BARRIER_P (X)
	               || LABEL_P (X)) */
    class rtx_real_insn;         /* INSN_P (X) */
      class rtx_debug_insn;      /* DEBUG_INSN_P (X) */
      class rtx_nonjump_insn;    /* NONJUMP_INSN_P (X) */
      class rtx_jump_insn;       /* JUMP_P (X) */
      class rtx_call_insn;       /* CALL_P (X) */
    class rtx_jump_table_data;   /* JUMP_TABLE_DATA_P (X) */
    class rtx_barrier;           /* BARRIER_P (X) */
    class rtx_code_label;        /* LABEL_P (X) */
    class rtx_note;              /* NOTE_P (X) */

and the code now uses "rtx_insn *" in hundreds of places where it would
previously have used "rtx".

My aim is that it always builds on every config: each patch is
self-contained.  To do this, the initial patches add scaffolding that
adds some strategically-placed checked downcasts to rtx_insn * (e.g. on
the result of PREV_INSN/NEXT_INSN).  The subsequent patches then use
these promises in order to strengthen the insn-handling code to work on
the new subclass. The idea is that the final patches then remove this
scaffolding, with the core BB types becoming rtx_insn *.  The drawback
of course is that during the process the resulting compiler is much
slower - until the scaffolding is removed.

v1 of the patch series also added some non-insn subclasses of rtx, for
EXPR_LIST, INSN_LIST, and for SEQUENCE, allowing rewriting of iterations
like this (in jump.c:rebuild_jump_labels_1), where "insn" and
"forced_labels" are INSN_LIST and hence are "rtx_insn_list *":

-    for (insn = forced_labels; insn; insn = XEXP (insn, 1))
-      if (LABEL_P (XEXP (insn, 0)))
-	  LABEL_NUSES (XEXP (insn, 0))++;
+    for (insn = forced_labels; insn; insn = insn->next ())
+      if (LABEL_P (insn->element ()))
+	  LABEL_NUSES (insn->element ())++;

Thoughts?
Dave

> It would just mean changing
> the "rtx" at the start of the two "for" loops to the new type,
> whereas at the moment you would need to change all the separate
> "rtx insn"s.  (In particular, it's common for "insn" to be defined
> on the same line as other rtx variables that might need to stay
> as "rtx"es after the change.)
Oleg Endo June 23, 2014, 8:38 p.m. UTC | #4
On Mon, 2014-06-23 at 14:54 -0400, David Malcolm wrote:

> FWIW I'm actually working on such a change, or, at least, to make
> instructions be a subclass of rtx_def; presumably this should make it
> easier to make it an entirely separate type, if that's desirable.
> 
> Executive summary is that it's still a work in progress, and I'm going
> to be giving a talk about it at Cauldron next month ("A proposal for
> typesafe RTL", currently scheduled for the Sunday at 12.45 in Stream 2).
> 
> More detailed version:
> I experimented with this a few months ago, and came up with a 159-patch
> patch series:
> http://dmalcolm.fedorapeople.org/gcc/patch-backups/rtx-classes/v1/
> 
> That patch kit builds on x86_64 (and regrtests iirc), and uses a
> separate subclass for insns e.g. in the core basic block class, in the
> scheduler, register allocators, etc, but:
>   (A) it only builds on about 70 of the ~200 configurations in
> config-list.mk
>   (B) there was a tangled mess of dependencies between the patches
> making it hard to fix (A)
>   (C) I went with the rather verbose "rtx_base_insn" name, which in v1
> is a typedef for a pointer to an insn, along with a family of other
> typedefs, which I now realize is frowned upon e.g.:
> https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01520.html
> https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01632.html
>   (D) I was using as_a *methods* rather than just as_a <>, like
> in v1 of my gimple-classes patches:
> (https://gcc.gnu.org/ml/gcc-patches/2014-05/msg00887.html)
> 
> 
> So I've been reworking it, incorporating feedback from the gimple
> statement classes work; the latest patch series is v7 and can be seen
> here:
> http://dmalcolm.fedorapeople.org/gcc/patch-backups/rtx-classes/v7/
> 
> It builds and regrtests on x86_64 on top of r211061 (aka
> dcd5393fd5b3ac53775a5546f3103958b64789bb) and builds on 184 of the
> configurations in config-list.mk; I have patches to fix the remaining
> ones to achieve parity with an unpatched build.
> 
> The current class hierarchy looks like this:
> /* Subclasses of rtx_def, using indentation to show the class
>    hierarchy, along with the relevant invariant.  */
> class rtx_def;
>   class rtx_insn; /* (INSN_P (X) || NOTE_P (X)
> 	               || JUMP_TABLE_DATA_P (X) || BARRIER_P (X)
> 	               || LABEL_P (X)) */
>     class rtx_real_insn;         /* INSN_P (X) */
>       class rtx_debug_insn;      /* DEBUG_INSN_P (X) */
>       class rtx_nonjump_insn;    /* NONJUMP_INSN_P (X) */
>       class rtx_jump_insn;       /* JUMP_P (X) */
>       class rtx_call_insn;       /* CALL_P (X) */
>     class rtx_jump_table_data;   /* JUMP_TABLE_DATA_P (X) */
>     class rtx_barrier;           /* BARRIER_P (X) */
>     class rtx_code_label;        /* LABEL_P (X) */
>     class rtx_note;              /* NOTE_P (X) */
> 
> and the code now uses "rtx_insn *" in hundreds of places where it would
> previously have used "rtx".
> 
> My aim is that it always builds on every config: each patch is
> self-contained.  To do this, the initial patches add scaffolding that
> adds some strategically-placed checked downcasts to rtx_insn * (e.g. on
> the result of PREV_INSN/NEXT_INSN).  The subsequent patches then use
> these promises in order to strengthen the insn-handling code to work on
> the new subclass. The idea is that the final patches then remove this
> scaffolding, with the core BB types becoming rtx_insn *.  The drawback
> of course is that during the process the resulting compiler is much
> slower - until the scaffolding is removed.
> 
> v1 of the patch series also added some non-insn subclasses of rtx, for
> EXPR_LIST, INSN_LIST, and for SEQUENCE, allowing rewriting of iterations
> like this (in jump.c:rebuild_jump_labels_1), where "insn" and
> "forced_labels" are INSN_LIST and hence are "rtx_insn_list *":
> 
> -    for (insn = forced_labels; insn; insn = XEXP (insn, 1))
> -      if (LABEL_P (XEXP (insn, 0)))
> -	  LABEL_NUSES (XEXP (insn, 0))++;
> +    for (insn = forced_labels; insn; insn = insn->next ())
> +      if (LABEL_P (insn->element ()))
> +	  LABEL_NUSES (insn->element ())++;
> 
> Thoughts?

Personally, I'd like to see usage of standard STL-like iterator usage.
I've proposed something for edge_iterator a while ago, but people don't
seem very fond of it.  See also
https://gcc.gnu.org/ml/gcc-patches/2013-12/msg01129.html

Have you also considered passing the new rtx_* types by value or
reference instead of pointer?  A long time ago I've quickly put together
a class 'rtxx' which was just a pointer wrapper for the rtx_def*
(basically the same as I proposed for edge_iterator).
I've converted the SH backend code to use it just to see what it would
look like.  The conversion itself was pretty straightforward -- just
replace 'rtx' with 'rtxx'.  Appropriate conversion
operators/constructors in 'class rtxx' made the two types
interchangeable, allowing them to co-exist and the code base to be
converted step by step.
Another advantage of passing around by value/ref is that it allows
operator overloading.  One use case could be instead of:

if (MEM_P (XEXP (x, 0)))
  *total = address_cost (XEXP (XEXP (x, 0), 0),
                         GET_MODE (XEXP (x, 0)),
                         MEM_ADDR_SPACE (XEXP (x, 0)), true);


something like that (overloading operator[]):
if (x[0] == rtx_mem::type)
  *total = address_cost (x[0][0], x[0].mode (),
                         x[0].mem_addr_space (), true);

... where rtx_mem::type would be some type for which 'rtxx' (or whatever
the base class ends up being called) would provide the corresponding
operator == and operator != overloads.

Another thing is rtx construction in C++ code, which could look like:

emit_insn (rtx_insn (rtx_set (rtx_reg (0),
                              rtx_plus (rtx_reg (1), rtx_reg (2)))));
emit_insn (rtx_barrier ());

Although I'm not sure how often this is needed in current practice,
since most of the time rtx instances are created from the .md patterns.
Maybe it could be useful for implementing some kind of ad-hoc rtx
matching, as it's found in cost calculations, predicates, constraints,
e.g.

if ((GET_CODE (XEXP (x, 0)) == SMAX || GET_CODE (XEXP (x, 0)) == SMIN)
    && CONST_INT_P (XEXP (XEXP (x, 0), 1))
    && REG_P (XEXP (XEXP (x, 0), 0))
    && CONST_INT_P (XEXP (x, 1)))

could become:
if (matches (x, rtx_smax (reg_rtx (), const_int (), const_int ()))
    || matches (x, rtx_smin (reg_rtx (), const_int (), const_int ())))

somehow I find that easier to read and write.

Cheers,
Oleg
Richard Sandiford June 25, 2014, 8:54 a.m. UTC | #5
David Malcolm <dmalcolm@redhat.com> writes:
> On Mon, 2014-06-09 at 20:32 +0100, Richard Sandiford wrote:
>> Steven Bosscher <stevenb.gcc@gmail.com> writes:
>> > On Sat, Jun 7, 2014 at 7:54 PM, Richard Sandiford wrote:
>> >> The two parts of the loop condition are really handling two different
>> >> kinds of block: ones like entry and exit that are completely empty
>> >> and normal ones that have at least a block note.  There's no real
>> >> need to check for null INSNs in normal blocks.
>> >
>> > Block notes should go away some day, they're just remains of a time
>> > when there was no actual CFG in the compiler.
>> 
>> Yeah.  I suppose when that happens empty blocks would look just like
>> entry and exit as far as these iterators go.
>> 
>> >> Also, refetching NEXT_INSN (BB_END (BB)) for each iteration can be
>> >> expensive.
>> >> If we're prepared to say that the loop body can't insert instructions
>> >> for another block immediately after BB_END,
>> >
>> > This can happen with "block_label()" if e.g. a new jump is inserted
>> > for one reason or another. Not very likely for passes working in
>> > cfglayout mode, but post-RA there may be places that need this
>> > (splitters, peepholes, machine dependent reorgs, etc.).
>> >
>> > So even if we're prepared to say what you suggest, I don't think you
>> > can easily enforce it.
>> 
>> Probably true.  But if we want to allow insertions after BB_END,
>> we need to make FOR_BB_INSNS_SAFE handle that for INSN == BB_END too.
>> 
>> The alternative definition:
>> 
>>   for (rtx INSN = BB_HEAD (BB), INSN##_cond_ = INSN, INSN##_next_;	\
>>        INSN##_cond_ 							\
>> 	 && (INSN##_next_ = NEXT_INSN (INSN),				\
>> 	     INSN##_cond_ = BB_END (BB),				\
>> 	     true);							\
>>        INSN##_cond_ = (INSN == INSN##_cond_ ? NULL_RTX : (rtx) 1),	\
>>        INSN = INSN##_next_)
>> 
>> works too.  It isn't quite as fast, but obviously correctness comes first.
>> 
>> >> It's easier to change these macros if they define the INSN variables
>> >> themselves.
>> >
>> > If you're going to hack these iterators anyway (much appreciated BTW),
>> > I suggest to make them similar to the gsi, loop, edge, and bitmap
>> > iterators: A new "insn_iterator" structure to hold the variables and
>> > static inline functions wrapped in the macros. This will also be
>> > helpful if (when) we ever manage to make the type for an insn a
>> > non-rtx...
>> 
>> Well, with this and...
>> 
>> >> +/* For iterating over insns in a basic block.  The iterator allows
>> >> the loop
>> >> +   body to delete INSN.  It also ignores any instructions that the body
>> >> +   inserts between INSN and the following instruction.  */
>> >> +#define FOR_BB_INSNS(BB, INSN)                                         \
>> >> +  for (rtx INSN = BB_HEAD (BB), INSN##_cond_ = INSN, INSN##_next_,     \
>> >> + INSN##_end_ = INSN ? NEXT_INSN (BB_END (BB)) : NULL_RTX; \
>> >> + INSN##_cond_ && (INSN##_next_ = NEXT_INSN (INSN), true); \
>> >> +       INSN = INSN##_next_,                                            \
>> >> +       INSN##_cond_ = (INSN != INSN##_end_ ? (rtx) 1 : NULL_RTX))
>> >
>> > This just makes my eyes hurt...
>> >
>> >
>> > What about cases where a FOR_BB_INSNS is terminated before reaching
>> > the end of a basic block, and you need to know at what insn you
>> > stopped? Up to now, you could do:
>> >
>> >   rtx insn; basic_block bb;
>> >   FOR_BB_INSNS (bb, insn)
>> >     {
>> >       ... // do stuff
>> >       if (something) break;
>> >     }
>> >   do_something_with (insn);
>> >
>> > Looks like this is no longer possible with the implementation of
>> > FOR_BB_INSNS of your patch.
>> 
>> ...this I suppose it depends where we want to go with these iterators.
>> I'm hoping eventually we'll move to C++11, where the natural way of
>> writing the loop would be:
>> 
>>   for (rtx insn : bb->insns ())
>> 
>> (Or "auto *" instead of "rtx" if you prefer.)
>> 
>> And I think the idiom of having the FOR_* macro define the iterator
>> variable is much closer to that than:
>> 
>>   rtx insn;
>>   FOR_BB_INSNS (iter, bb, insn)
>> 
>> would be.
>> 
>> It's true that you can't leave "insn" with a significant value after
>> the loop, but no current code wants to do that.  Personally I like
>> the fact that loops that do want to set a final value have to make
>> that explicit.  When the vast majority (currently all) instances of:
>> 
>>   FOR_BB_INSNS (bb, insn)
>> 
>> treat "insn" as local to the loop, it's helpful when the exceptions
>> are obvious.
>> 
>> I think if anything the patch would make it easier to change the
>> type of insns to something other than rtx.
>
> (sorry for the belated response; I was on vacation).
>
> FWIW I'm actually working on such a change, or, at least, to make
> instructions be a subclass of rtx_def; presumably this should make it
> easier to make it an entirely separate type, if that's desirable.
>
> Executive summary is that it's still a work in progress, and I'm going
> to be giving a talk about it at Cauldron next month ("A proposal for
> typesafe RTL", currently scheduled for the Sunday at 12.45 in Stream 2).
>
> More detailed version:
> I experimented with this a few months ago, and came up with a 159-patch
> patch series:
> http://dmalcolm.fedorapeople.org/gcc/patch-backups/rtx-classes/v1/
>
> That patch kit builds on x86_64 (and regrtests iirc), and uses a
> separate subclass for insns e.g. in the core basic block class, in the
> scheduler, register allocators, etc, but:
>   (A) it only builds on about 70 of the ~200 configurations in
> config-list.mk
>   (B) there was a tangled mess of dependencies between the patches
> making it hard to fix (A)
>   (C) I went with the rather verbose "rtx_base_insn" name, which in v1
> is a typedef for a pointer to an insn, along with a family of other
> typedefs, which I now realize is frowned upon e.g.:
> https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01520.html
> https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01632.html
>   (D) I was using as_a *methods* rather than just as_a <>, like
> in v1 of my gimple-classes patches:
> (https://gcc.gnu.org/ml/gcc-patches/2014-05/msg00887.html)
>
>
> So I've been reworking it, incorporating feedback from the gimple
> statement classes work; the latest patch series is v7 and can be seen
> here:
> http://dmalcolm.fedorapeople.org/gcc/patch-backups/rtx-classes/v7/
>
> It builds and regrtests on x86_64 on top of r211061 (aka
> dcd5393fd5b3ac53775a5546f3103958b64789bb) and builds on 184 of the
> configurations in config-list.mk; I have patches to fix the remaining
> ones to achieve parity with an unpatched build.
>
> The current class hierarchy looks like this:
> /* Subclasses of rtx_def, using indentation to show the class
>    hierarchy, along with the relevant invariant.  */
> class rtx_def;
>   class rtx_insn; /* (INSN_P (X) || NOTE_P (X)
> 	               || JUMP_TABLE_DATA_P (X) || BARRIER_P (X)
> 	               || LABEL_P (X)) */
>     class rtx_real_insn;         /* INSN_P (X) */
>       class rtx_debug_insn;      /* DEBUG_INSN_P (X) */
>       class rtx_nonjump_insn;    /* NONJUMP_INSN_P (X) */
>       class rtx_jump_insn;       /* JUMP_P (X) */
>       class rtx_call_insn;       /* CALL_P (X) */
>     class rtx_jump_table_data;   /* JUMP_TABLE_DATA_P (X) */
>     class rtx_barrier;           /* BARRIER_P (X) */
>     class rtx_code_label;        /* LABEL_P (X) */
>     class rtx_note;              /* NOTE_P (X) */
>
> and the code now uses "rtx_insn *" in hundreds of places where it would
> previously have used "rtx".

This looks really good to me FWIW.  The only thing I'm not sure about is
how useful rtx_nonjump_insn would be.  ISTM that jump_insn and call_insn
really are subclasses that add extra information on top of a "normal"
insn (JUMP_LABEL for jumps and CALL_INSN_FUNCTION_USAGE for calls).

I suppose there's also a bikeshed argument about whether "rtx_"
should be in the name, if we want to keep open the option of
moving to a different structure.

What do you think about adding something like:

  /* Account for the NEXT_INSN and BLOCK_FOR_INSN fields.  */
  rtunion GTY ((skip)) fld[2];

to rtx_insn, and similarly for the derived classes, so that each derived
class has the right size?  It might even help the compiler prove that it
can speculatively fetch later parts of the structure before a condition
based on something like BLOCK_FOR_INSN.

> My aim is that it always builds on every config: each patch is
> self-contained.  To do this, the initial patches add scaffolding that
> adds some strategically-placed checked downcasts to rtx_insn * (e.g. on
> the result of PREV_INSN/NEXT_INSN).  The subsequent patches then use
> these promises in order to strengthen the insn-handling code to work on
> the new subclass. The idea is that the final patches then remove this
> scaffolding, with the core BB types becoming rtx_insn *.  The drawback
> of course is that during the process the resulting compiler is much
> slower - until the scaffolding is removed.

Do you get to the point where things like NEXT_INSN and PREV_INSN can
require rtx_insns as argument?  I suppose that would be the end goal.
Same for JUMP_LABEL requiring an rtx_jump_insn, etc.

> v1 of the patch series also added some non-insn subclasses of rtx, for
> EXPR_LIST, INSN_LIST, and for SEQUENCE, allowing rewriting of iterations
> like this (in jump.c:rebuild_jump_labels_1), where "insn" and
> "forced_labels" are INSN_LIST and hence are "rtx_insn_list *":
>
> -    for (insn = forced_labels; insn; insn = XEXP (insn, 1))
> -      if (LABEL_P (XEXP (insn, 0)))
> -	  LABEL_NUSES (XEXP (insn, 0))++;
> +    for (insn = forced_labels; insn; insn = insn->next ())
> +      if (LABEL_P (insn->element ()))
> +	  LABEL_NUSES (insn->element ())++;

SEQUENCE is just weird though :-)  It would be good to have an alternative
representation, but that'd be a lot of work on reorg.

Definitely agree that EXPR_LIST, INSN_LIST and INT_LIST are another
natural and useful subclass.

Thanks,
Richard
Richard Sandiford June 25, 2014, 9:36 a.m. UTC | #6
Oleg Endo <oleg.endo@t-online.de> writes:
> Personally, I'd like to see standard STL-like iterator usage.
> I've proposed something for edge_iterator a while ago, but people don't
> seem very fond of it.  See also
> https://gcc.gnu.org/ml/gcc-patches/2013-12/msg01129.html
>
> Have you also considered passing the new rtx_* types by value or
> reference instead of pointer?  A long time ago I've quickly put together
> a class 'rtxx' which was just a pointer wrapper for the rtx_def*
> (basically the same as I proposed for edge_iterator).
> I've converted the SH backend code to use it just to see what it would
> look like.  The conversion itself was pretty straight forward -- just
> replace 'rtx' with 'rtxx'.  Appropriate conversion
> operators/constructors in 'class rtxx' made both interchangeable and
> allowed co-existence of both and thus step-by-step conversion of the
> code base.
> Another advantage of passing around by value/ref is that it allows
> operator overloading.  One use case could be instead of:
>
> if (MEM_P (XEXP (x, 0)))
>   *total = address_cost (XEXP (XEXP (x, 0), 0),
>                          GET_MODE (XEXP (x, 0)),
>                          MEM_ADDR_SPACE (XEXP (x, 0)), true);
>
>
> something like that (overloading operator[]):
> if (x[0] == rtx_mem::type)
>   *total = address_cost (x[0][0], x[0].mode (),
>                          x[0].mem_addr_space (), true);
>
> ... where rtx_mem::type would be some type for which 'rtxx' (or whatever
> the name of the base class is) would provide the according operator
> ==, != overloads.

I think this is an example of another problem with gcc coding style:
that we're far too afraid of temporary variables.  In David's scheme
I think this would be:

  if (rtx_mem *mem = as_a <rtx_mem *> (XEXP (x, 0)))
    *total = address_cost (XEXP (mem, 0), GET_MODE (mem),
                           MEM_ADDR_SPACE (mem), true);

which with members would become:

  if (rtx_mem *mem = as_a <rtx_mem *> (...))
    *total = address_cost (mem->address (), mem->mode (), mem->address_space (),
                           true);

(although if we go down that route, I hope we can add an exception to the
formatting rule so that no space should be used before "()".)

I suppose with the magic values it would be:

  if (rtx_mem mem = as_a <rtx_mem> (x[0]))
    *total = address_cost (mem[0], mem.mode (), mem.address_space (), true);

but I'm not sure that that would really be more readable.

FWIW I also did an experiment locally with replacing "rtx *" (rather than
"rtx") with a "smart" pointer so that we could have four allocation
areas: permanent, gty, function and temporary, with the pointer
automatically promoting rtxes to the right allocation area for the
containing object.

Having a smart pointer made it surprisingly uninvasive but there was
probably too much C++ magic involved in the handling of XEXP, etc.,
for the patch to be acceptable.  Still, it did noticeably reduce max RSS
and was a significant compile-time saving for extreme compiles like
insn-recog.ii.  Hope to return to it sometime...

> Another thing is rtx construction in C++ code, which could look like:
>
> emit_insn (rtx_insn (rtx_set (rtx_reg (0),
>                               rtx_plus (rtx_reg (1), rtx_reg (2)))));
> emit_insn (rtx_barrier ());
>
> Although I'm not sure how often this is needed in current practice,
> since most of the time rtx instances are created from the .md patterns.
> Maybe it could be useful for implementing some kind of ad-hoc rtx
> matching, as it's found in cost calculations, predicates, constrants,
> e.g.
>
> if ((GET_CODE (XEXP (x, 0)) == SMAX || GET_CODE (XEXP (x, 0)) == SMIN)
>     && CONST_INT_P (XEXP (XEXP (x, 0), 1))
>     && REG_P (XEXP (XEXP (x, 0), 0))
>     && CONST_INT_P (XEXP (x, 1)))
>
> could become:
> if (matches (x, rtx_smax (reg_rtx (), const_int (), const_int ()))
>     || matches (x, rtx_smin (reg_rtx (), const_int (), const_int ()))
>
> somehow I find that easier to read and write.

It sounds like this would be creating temporary rtxes though, which would
be too expensive for matching.  Maybe it would make sense to have a separate
set of matching objects that only live for the duration of the statement.
Then you could have matching objects like "range (min, max)" to match a
range of integers.

But like you say, I'm not sure whether it would really be a win in the end.
I think what makes this example so hard to read is again that we refuse
to put XEXP (x, 0) in a temporary variable and just write it out in full
4 times instead.

  if ((GET_CODE (op0) == SMAX || GET_CODE (op0) == SMIN)
      && CONST_INT_P (XEXP (op0, 1))
      && REG_P (XEXP (op0, 0))
      && CONST_INT_P (op1))

is a bit more obvious.

Thanks,
Richard
Jeff Law June 25, 2014, 8:39 p.m. UTC | #7
On 06/25/14 03:36, Richard Sandiford wrote:
>
> I think this is an example of another problem with gcc coding style:
> that we're far too afraid of temporary variables.  In David's scheme
> I think this would be:
Historical coding style :(    It's particularly bad in RTL land, but you 
see the same problems in places like fold-const.


>
>    if (rtx_mem *mem = as_a <rtx_mem *> (XEXP (x, 0)))
>      *total = address_cost (XEXP (mem, 0), GET_MODE (mem),
>                             MEM_ADDR_SPACE (mem), true);
>
> which with members would become:
>
>    if (rtx_mem *mem = as_a <rtx_mem *> (...))
>      *total = address_cost (mem->address (), mem->mode (), mem->address_space (),
>                             true);
>
> (although if we go down that route, I hope we can add an exception to the
> formatting rule so that no space should be used before "()".)
There was some talk of revamping the rules of parens for member function 
calls.  I don't recall what the final outcome was.

I prefer the latter, but I tend to worry about making David's patches 
even more invasive than they already are :-)

>
> I suppose with the magic values it would be:
>
>    if (rtx_mem mem = as_a <rtx_mem> (x[0]))
>      *total = address_cost (mem[0], mem.mode (), mem.address_space (), true);
>
> but I'm not sure that that would really be more readable.
Please, no...


>
> But like you say, I'm not sure whether it would really be a win in the end.
> I think what makes this example so hard to read is again that we refuse
> to put XEXP (x, 0) in a temporary variable and just write it out in full
> 4 times instead.
>
>    if ((GET_CODE (op0) == SMAX || GET_CODE (op0) == SMIN)
>        && CONST_INT_P (XEXP (op0, 1))
>        && REG_P (XEXP (op0, 0))
>        && CONST_INT_P (op1))
>
> is a bit more obvious.
And probably faster as well since we've CSE'd the memory reference.  In 
this specific example it probably doesn't matter, but often there's a 
call somewhere between the XEXPs we'd like to CSE, and that call destroys 
our ability to CSE away the memory references.

This kind of problem is pervasive in the RTL analysis/optimizers.
Jeff Law June 25, 2014, 8:46 p.m. UTC | #8
On 06/25/14 02:54, Richard Sandiford wrote:
>>
>> and the code now uses "rtx_insn *" in hundreds of places where it would
>> previously have used "rtx".
>
> This looks really good to me FWIW.  The only thing I'm not sure about is
> how useful rtx_nonjump_insn would be.  ISTM that jump_insn and call_insn
> really are subclasses that add extra information on top of a "normal"
> insn (JUMP_LABEL for jumps and CALL_INSN_FUNCTION_USAGE for calls).
Glad you think so -- I hope others chime in as the work progresses so 
they're not left saying "WTF, why did all this change?" when I approve the 
massive patchkit.


>
> I suppose there's also a bikeshed argument about whether "rtx_"
> should be in the name, if we want to keep open the option of
> moving to a different structure.
Well, as you're probably aware, I hate the "everything is an RTX" model 
we've got in GCC.  I'm not aware of anything that having the toplevel 
objects in the insn chain buys us, other than bugs.

>
> What do you think about adding something like:
>
>    /* Account for the NEXT_INSN and BLOCK_FOR_INSN fields.  */
>    rtunion GTY ((skip)) fld[2];
>
> to rtx_insn, and similarly for the derived classes, so that each derived
> class has the right size?  It might even help the compiler prove that it
> can speculatively fetch later parts of the structure before a condition
> based on something like BLOCK_FOR_INSN.
For objects which form the insn chain, I'd like the prev/next pointers, 
which each object must have, to be in the base object.

> Do you get to the point where things like NEXT_INSN and PREV_INSN can
> require rtx_insns as argument?  I suppose that would be the end goal.
> Same for JUMP_LABEL requiring an rtx_jump_insn, etc.
Yea, I think that's the ultimate goal of this stage.  In my mind I 
wanted to narrow the scope enough that it had a chance to get into the 
upcoming release -- thus I've asked David to focus on the objects that 
show up in the insn chain.  There may be some bleed out, but that's 
really the focus and if we could get to NEXT_INSN/PREV_INSN requiring an 
rtx_insn argument, then we'd be golden.

>
> SEQUENCE is just weird though :-)  It would be good to have an alternative
> representation, but that'd be a lot of work on reorg.
Yea.  And I don't think anyone is keen on rewriting reorg.

Jeff
Steven Bosscher June 25, 2014, 9:23 p.m. UTC | #9
On Wed, Jun 25, 2014 at 10:46 PM, Jeff Law wrote:
> On 06/25/14 02:54, Richard Sandiford wrote:
>>
>> SEQUENCE is just weird though :-)  It would be good to have an alternative
>> representation, but that'd be a lot of work on reorg.
>
> Yea.  And I don't think anyone is keen on rewriting reorg.

Rewriting/revamping reorg is not really the problem, IMHO. Last year I
hacked a bit on a new delayed-branch scheduler, and I got results that
were not too bad (especially considering my GCC time is only a few
hours per week).

The real problem is designing that alternative representation of
delay slots, and teaching the back ends about that. Just communicating
a delayed-branch sequence to the back ends is pretty awful (a global
variable in final.c) and a lot of back-end code has non-obvious
assumptions about jumping across insns contained in SEQUENCEs. There's
also one back end (mep?) that uses SEQUENCE for bundles (RTL abuse is
not considered bad practice by all maintainers ;-).

(I actually found SEQUENCE to be quite nice to work with when I
allowed them to be the last insn in a basic block. One of my goals was
to retain the CFG across dbr_sched, but that turned out to be blocked
by other things than dbr_sched, like fake insns that some back ends
emit between basic blocks, e.g. for constant pools).

Having some kind of "insns container" like SEQUENCE would IMHO not be
a bad thing, perhaps a necessity, and perhaps even an improvement
(like for representing bundles), as long as we can assign sane
semantics to it w.r.t. next/prev insn. SEQUENCE wasn't designed with
its current application in mind...

Ciao!
Steven
Oleg Endo June 27, 2014, 7:36 a.m. UTC | #10
On Wed, 2014-06-25 at 10:36 +0100, Richard Sandiford wrote:
> Oleg Endo <oleg.endo@t-online.de> writes:
> > Personally, I'd like to see usage of standard STL-like iterator usage.
> > I've proposed something for edge_iterator a while ago, but people don't
> > seem very fond of it.  See also
> > https://gcc.gnu.org/ml/gcc-patches/2013-12/msg01129.html
> >
> > Have you also considered passing the new rtx_* types by value or
> > reference instead of pointer?  A long time ago I've quickly put together
> > a class 'rtxx' which was just a pointer wrapper for the rtx_def*
> > (basically the same as I proposed for edge_iterator).
> > I've converted the SH backend code to use it just to see what it would
> > look like.  The conversion itself was pretty straight forward -- just
> > replace 'rtx' with 'rtxx'.  Appropriate conversion
> > operators/constructors in 'class rtxx' made both interchangeable and
> > allowed co-existence of both and thus step-by-step conversion of the
> > code base.
> > Another advantage of passing around by value/ref is that it allows
> > operator overloading.  One use case could be instead of:
> >
> > if (MEM_P (XEXP (x, 0)))
> >   *total = address_cost (XEXP (XEXP (x, 0), 0),
> >                          GET_MODE (XEXP (x, 0)),
> >                          MEM_ADDR_SPACE (XEXP (x, 0)), true);
> >
> >
> > something like that (overloading operator[]):
> > if (x[0] == rtx_mem::type)
> >   *total = address_cost (x[0][0], x[0].mode (),
> >                          x[0].mem_addr_space (), true);
> >
> > ... where rtx_mem::type would be some type for which 'rtxx' (or whatever
> > the name of the base class is) would provide the according operator
> > ==, != overloads.
> 
> I think this is an example of another problem with gcc coding style:
> that we're far too afraid of temporary variables.  In David's scheme
> I think this would be:
> 
>   if (rtx_mem *mem = as_a <rtx_mem *> (XEXP (x, 0)))
>     *total = address_cost (XEXP (mem, 0), GET_MODE (mem),
>                            MEM_ADDR_SPACE (mem), true);
> 
> which with members would become:
> 
>   if (rtx_mem *mem = as_a <rtx_mem *> (...))
>     *total = address_cost (mem->address (), mem->mode (), mem->address_space (),
>                            true);
> 
> (although if we go down that route, I hope we can add an exception to the
> formatting rule so that no space should be used before "()".)

If such an exception is introduced, I can imagine it'd be difficult to
judge on a case-by-case basis whether to apply the exceptional rule or
not, since those are just function calls.
But in the end it doesn't matter.  It's just a matter of habit :)

> 
> I suppose with the magic values it would be:
> 
>   if (rtx_mem mem = as_a <rtx_mem> (x[0]))
>     *total = address_cost (mem[0], mem.mode (), mem.address_space (), true);
> 
> but I'm not sure that that would really be more readable.

No, essentially it's the same.  Whether the current XEXP (x, n) is done
by operator [] or a member function 'xexp (n)' doesn't matter.  The
biggest (cosmetic) change is for nested XEXP...
    XEXP (XEXP (XEXP (x, a), b), c)
becomes
    x.exp (a).exp (b).exp (c)

> FWIW I also did an experiment locally with replacing "rtx *" (rather than
> "rtx") with a "smart" pointer so that we could have four allocation
> areas: permanent, gty, function and temporary, with the pointer
> automatically promoting rtxes to the right allocation area for the
> containing object.
> 
> Having a smart pointer made it suprisingly uninvasive but there was
> probably too much C++ magic involved in the handling of XEXP, etc.,
> for the patch to be acceptable.  Still, it did noticeably reduce max RSS
> and was a significant compile-time saving for extreme compiles like
> insn-recog.ii.  Hope to return to it sometime...

Yep, passing the new rtx classes by value instead by pointer opens other
doors, such as implementing smart pointers internally and what not. 

> 
> > Another thing is rtx construction in C++ code, which could look like:
> >
> > emit_insn (rtx_insn (rtx_set (rtx_reg (0),
> >                               rtx_plus (rtx_reg (1), rtx_reg (2)))));
> > emit_insn (rtx_barrier ());
> >
> > Although I'm not sure how often this is needed in current practice,
> > since most of the time rtx instances are created from the .md patterns.
> > Maybe it could be useful for implementing some kind of ad-hoc rtx
> > matching, as it's found in cost calculations, predicates, constrants,
> > e.g.
> >
> > if ((GET_CODE (XEXP (x, 0)) == SMAX || GET_CODE (XEXP (x, 0)) == SMIN)
> >     && CONST_INT_P (XEXP (XEXP (x, 0), 1))
> >     && REG_P (XEXP (XEXP (x, 0), 0))
> >     && CONST_INT_P (XEXP (x, 1)))
> >
> > could become:
> > if (matches (x, rtx_smax (reg_rtx (), const_int (), const_int ()))
> >     || matches (x, rtx_smin (reg_rtx (), const_int (), const_int ()))
> >
> > somehow I find that easier to read and write.
> 
> It sounds like this would be creating temporary rtxes though, which would
> be too expensive for matching.

Yeah, as I wrote it up there, probably.  It depends on the
implementation of those default constructed objects.

>   Maybe it would make sense to have a separate
> set of matching objects that only live for the duration of the statement.
> Then you could have matching objects like "range (min, max)" to match a
> range of integers.

Maybe something like this ...

static my_match_rtx = rtx_smax (reg_rtx (), const_int (), const_int ());
if (matches (x, my_match_rtx) ... 


> But like you say, I'm not sure whether it would really be a win in the end.
> I think what makes this example so hard to read is again that we refuse
> to put XEXP (x, 0) in a temporary variable and just write it out in full
> 4 times instead.
> 
>   if ((GET_CODE (op0) == SMAX || GET_CODE (op0) == SMIN)
>       && CONST_INT_P (XEXP (op0, 1))
>       && REG_P (XEXP (op0, 0))
>       && CONST_INT_P (op1))
> 
> is a bit more obvious.

In this case yes.  On the other hand, that if statement is part of a
bigger function.  There are cases where manual CSE is not so practical,
e.g. when it's an if / else-if / else-if / else chain.

Cheers,
Oleg
David Malcolm June 27, 2014, 2:26 p.m. UTC | #11
On Wed, 2014-06-25 at 14:39 -0600, Jeff Law wrote:
> On 06/25/14 03:36, Richard Sandiford wrote:
> >
> > I think this is an example of another problem with gcc coding style:
> > that we're far too afraid of temporary variables.  In David's scheme
> > I think this would be:
> Historical coding style :(    It's particularly bad in RTL land, but you 
> see the same problems in places like fold-const.
> 
> 
> >
> >    if (rtx_mem *mem = as_a <rtx_mem *> (XEXP (x, 0)))
> >      *total = address_cost (XEXP (mem, 0), GET_MODE (mem),
> >                             MEM_ADDR_SPACE (mem), true);
> >
> > which with members would become:
> >
> >    if (rtx_mem *mem = as_a <rtx_mem *> (...))
> >      *total = address_cost (mem->address (), mem->mode (), mem->address_space (),
> >                             true);
> >
> > (although if we go down that route, I hope we can add an exception to the
> > formatting rule so that no space should be used before "()".)
> There was some talk of revamping the rules of parens for member function 
> calls.  I don't recall what the final outcome was.
> 
> I prefer the latter, but I tend to worry about making David's patches 
> even more invasive than they already are :-)

:)

Yeah, that's probably my primary concern here.  The patch kit is going
to be big (currently at 133 patches [1]), and so I want something that
we can sanely keep track of, that is easily reviewable, and will be as
easy as possible to merge.

i.e. I don't want to get bogged down in a big revamp of the rest of the
RTL interface if I can help it.

I'm attempting to focus on splitting out insns from expressions.  As
mentioned before, in the v1 of this patch kit I also introduced rtx
subclasses for various core types like INSN_LIST, SEQUENCE, etc - but in
this iteration I'm attempting to do it without those - not yet sure if
it's possible.

If it's desirable to actually make insns be a separate class, I'm
considering the goal of making the attributes of insns become actual
fields, something like:

class rtx_insn /* we can bikeshed over the name */
{
public:
  rtx_insn *m_prev;
  rtx_insn *m_next;
  int m_uid;
};

#define PREV_INSN(INSN) ((INSN)->m_prev)
#define NEXT_INSN(INSN) ((INSN)->m_next)
#define INSN_UID(INSN)  ((INSN)->m_uid)
  /* or we could convert them to functions returning
     references, I guess */

...etc for the subclasses, giving something that gdb can print sanely,
and, I hope, more amenable to optimization when gcc itself is compiled.

But even if we don't get there and simply keep insns as subclasses of
rtx, I think that having insn-handling code marked as such in the
type-system is a win from a readability standpoint.

Either way, I think this should be much more "grokkable" by new
contributors.  My own experience is that RTL was the aspect of GCC I had
most difficulty with when I was a newbie - hence my motivation to "drain
this swamp".

Hope these ideas sound sane
Dave

[1] FWIW the latest version of the patches is here:
http://dmalcolm.fedorapeople.org/gcc/patch-backups/rtx-classes/v9/
at 133 patches (as before it's relative to r211061).  I'm aiming for
lots of *small* patches.  Successful bootstrap and regrtest on x86_64;
builds on 187 configurations.  Perhaps the biggest change since last
time is that the "scaffolding" phase of the patches introduces a hack
into the generated insn-*.c so that the .md files see rtx_insn * rather
than plain rtx (for "insn" and "curr_insn"), which should allow lots of
target-specific hooks used from .md files to be similarly converted:
http://dmalcolm.fedorapeople.org/gcc/patch-backups/rtx-classes/v9/0036-Use-rtx_insn-internally-within-generated-functions.patch


> > I suppose with the magic values it would be:
> >
> >    if (rtx_mem mem = as_a <rtx_mem> (x[0]))
> >      *total = address_cost (mem[0], mem.mode (), mem.address_space (), true);
> >
> > but I'm not sure that that would really be more readable.
> Please, no...
> 
> 
> >
> > But like you say, I'm not sure whether it would really be a win in the end.
> > I think what makes this example so hard to read is again that we refuse
> > to put XEXP (x, 0) in a temporary variable and just write it out in full
> > 4 times instead.
> >
> >    if ((GET_CODE (op0) == SMAX || GET_CODE (op0) == SMIN)
> >        && CONST_INT_P (XEXP (op0, 1))
> >        && REG_P (XEXP (op0, 0))
> >        && CONST_INT_P (op1))
> >
> > is a bit more obvious.
> And probably faster as well since we've CSE'd the memory reference.  In 
> this specific example it probably doesn't matter, but often there's a 
> call somewhere between XEXPs that we'd like to CSE that destroys our 
> ability to CSE away memory references.
> 
> This kind of problem is prevasive in the RTL analysis/optimizers.
>
David Malcolm June 27, 2014, 2:32 p.m. UTC | #12
On Wed, 2014-06-25 at 10:36 +0100, Richard Sandiford wrote:
> Oleg Endo <oleg.endo@t-online.de> writes:
> > Personally, I'd like to see usage of standard STL-like iterator usage.
> > I've proposed something for edge_iterator a while ago, but people don't
> > seem very fond of it.  See also
> > https://gcc.gnu.org/ml/gcc-patches/2013-12/msg01129.html
> >
> > Have you also considered passing the new rtx_* types by value or
> > reference instead of pointer?  A long time ago I've quickly put together
> > a class 'rtxx' which was just a pointer wrapper for the rtx_def*
> > (basically the same as I proposed for edge_iterator).
> > I've converted the SH backend code to use it just to see what it would
> > look like.  The conversion itself was pretty straight forward -- just
> > replace 'rtx' with 'rtxx'.  Appropriate conversion
> > operators/constructors in 'class rtxx' made both interchangeable and
> > allowed co-existence of both and thus step-by-step conversion of the
> > code base.
> > Another advantage of passing around by value/ref is that it allows
> > operator overloading.  One use case could be instead of:
> >
> > if (MEM_P (XEXP (x, 0)))
> >   *total = address_cost (XEXP (XEXP (x, 0), 0),
> >                          GET_MODE (XEXP (x, 0)),
> >                          MEM_ADDR_SPACE (XEXP (x, 0)), true);
> >
> >
> > something like that (overloading operator[]):
> > if (x[0] == rtx_mem::type)
> >   *total = address_cost (x[0][0], x[0].mode (),
> >                          x[0].mem_addr_space (), true);
> >
> > ... where rtx_mem::type would be some type for which 'rtxx' (or whatever
> > the name of the base class is) would provide the according operator
> > ==, != overloads.
> 
> I think this is an example of another problem with gcc coding style:
> that we're far too afraid of temporary variables.  In David's scheme
> I think this would be:
> 
>   if (rtx_mem *mem = as_a <rtx_mem *> (XEXP (x, 0)))
>     *total = address_cost (XEXP (mem, 0), GET_MODE (mem),
>                            MEM_ADDR_SPACE (mem), true);

FWIW you want a dyn_cast<> rather than an as_a<> here, giving:

  if (rtx_mem *mem = dyn_cast <rtx_mem *> (XEXP (x, 0)))
    *total = address_cost (XEXP (mem, 0), GET_MODE (mem),
                           MEM_ADDR_SPACE (mem), true);
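The distinction matters because as_a <> asserts that the cast is valid, while dyn_cast <> yields a null pointer on mismatch and so can be tested directly in an "if" condition.  A minimal, self-contained sketch of the two contracts — these are simplified stand-ins for GCC's is-a.h templates, and every name here (rtx_base, rtx_mem, dyn_cast_mem, as_a_mem) is illustrative only:

```cpp
#include <cassert>
#include <cstddef>

/* Hypothetical, simplified stand-ins for GCC's is-a.h casting helpers;
   the real versions are templates dispatching through is_a_helper, but
   the contract is the same.  */
struct rtx_base { int code; };          /* stand-in for rtx_def */
struct rtx_mem : rtx_base { };          /* stand-in for a MEM subclass */
enum { CODE_REG = 0, CODE_MEM = 1 };

/* dyn_cast: checked downcast that yields null on mismatch, so it can be
   tested directly in an "if" condition.  */
static rtx_mem *
dyn_cast_mem (rtx_base *x)
{
  return (x && x->code == CODE_MEM) ? static_cast<rtx_mem *> (x) : NULL;
}

/* as_a: downcast that asserts validity; use it only when the caller has
   already established that X really is a MEM.  */
static rtx_mem *
as_a_mem (rtx_base *x)
{
  assert (x && x->code == CODE_MEM);
  return static_cast<rtx_mem *> (x);
}
```

Using as_a <> in the "if" above would assert (and on a release compiler misbehave) whenever XEXP (x, 0) is not a MEM, which is exactly the case the condition is trying to filter out.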

> which with members would become:
> 
>   if (rtx_mem *mem = as_a <rtx_mem *> (...))
>     *total = address_cost (mem->address (), mem->mode (), mem->address_space (),
>                            true);

(likewise)

> (although if we go down that route, I hope we can add an exception to the
> formatting rule so that no space should be used before "()".)
> 
> I suppose with the magic values it would be:
> 
>   if (rtx_mem mem = as_a <rtx_mem> (x[0]))
>     *total = address_cost (mem[0], mem.mode (), mem.address_space (), true);

(likewise).

> but I'm not sure that that would really be more readable.

[...snip...; see my other mail for notes on restricting the scope of the
current patch kit to an insn vs expr separation, for the sake of my
sanity :) ]
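For concreteness, a minimal sketch of what Oleg's 'rtxx' pass-by-value wrapper might look like, assuming a hypothetical two-operand node layout; all names and fields here are illustrative, not GCC's actual rtx_def definition:

```cpp
#include <cassert>
#include <cstddef>

/* Hypothetical node layout standing in for rtx_def.  */
struct rtx_def
{
  int code;
  rtx_def *operands[2];
};

enum { CODE_REG = 0, CODE_MEM = 1 };

/* Value-type wrapper around an rtx pointer.  Because it is passed by
   value, operators can be overloaded on it: XEXP (x, n) becomes x[n],
   and chaining gives x[0][0] for XEXP (XEXP (x, 0), 0).  */
class rtxx
{
  rtx_def *p_;
public:
  rtxx (rtx_def *p = NULL) : p_ (p) {}
  rtxx operator[] (int n) const { return rtxx (p_->operands[n]); }
  bool is_code (int c) const { return p_ && p_->code == c; }
  rtx_def *get () const { return p_; }
};
```

The wrapper is the same size as the raw pointer, so passing it by value costs nothing; the open question in the thread is whether the bracket syntax is actually more readable than named accessors.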
Jeff Law June 27, 2014, 3:38 p.m. UTC | #13
On 06/27/14 08:26, David Malcolm wrote:
>
> Yeah, that's probably my primary concern here.  The patch kit is going
> to be big (currently at 133 patches [1]), and so I want something that
> we can sanely keep track of, that is easily reviewable, and will be as
> easy as possible to merge.
>
> i.e. I don't want to get bogged down in a big revamp of the rest of the
> RTL interface if I can help it.
Precisely.  After revamping the objects at the toplevel of the insn 
chain, we can evaluate what project makes the most sense to tackle.

>
> If it's desirable to actually make insns be a separate class, I'm
> considering the goal of making the attributes of insns become actual
> fields, something like:
I think having the toplevel objects in the insn chain as a separate 
class makes sense.  My biggest concerns are a variety of implementation 
details, such as whether there is code that wants to use the various RTL 
walkers on those toplevel objects.

Which (and I hate to say it) makes me wonder if this is a two-step 
process.  The first step is the subclass-style implementation.  Then 
we look deeper at what would need to change to break those toplevel 
objects out into a distinct class.  In theory if we do things right, we 
leverage the new types and static checking to catch all the "don't 
assume the toplevel objects in the insn chain are rtxs" issues.

A two-stage approach also gives others a chance to chime in if they're 
aware of good reasons not to make the change.

>
> But even if we don't get there and simply keep insns as subclasses of
> rtx, I think that having insn-handling code marked as such in the
> type-system is a win from a readability standpoint.
Absolutely.

>
> Hope these ideas sound sane
They do.  I think we're very much on the same page here.

Jeff
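The reworked FOR_BB_INSNS in the patch below computes the block's end point once and fetches the successor insn before the loop body runs, which is what makes it safe for the body to delete INSN or insert new insns after it.  A standalone model of that traversal shape — names and the insn struct are illustrative, not GCC's:

```cpp
#include <cassert>
#include <cstddef>

/* Minimal stand-in for an insn chain node.  */
struct insn { int uid; insn *next; };

/* Walk from HEAD up to (but not including) END, the same shape as the
   new FOR_BB_INSNS: the successor is cached before the body runs (like
   INSN##_next_), and a null HEAD models the empty entry/exit blocks.  */
static int
count_insns (insn *head, insn *end)
{
  int n = 0;
  for (insn *cur = head, *nxt; cur != end; cur = nxt)
    {
      nxt = cur->next;      /* fetched before the body, so the body may
                               delete CUR or link new insns after it */
      n++;                  /* the loop body would process CUR here */
    }
  return n;
}
```

Note the restriction the cover letter describes: because END is cached up front, the body must not insert insns for another block immediately after the last insn of this one.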

Patch

Index: gcc/basic-block.h
===================================================================
--- gcc/basic-block.h	2014-06-07 18:23:27.319167704 +0100
+++ gcc/basic-block.h	2014-06-07 18:51:42.821002739 +0100
@@ -336,28 +336,23 @@  #define FOR_EACH_BB_FN(BB, FN) \
 #define FOR_EACH_BB_REVERSE_FN(BB, FN) \
   FOR_BB_BETWEEN (BB, (FN)->cfg->x_exit_block_ptr->prev_bb, (FN)->cfg->x_entry_block_ptr, prev_bb)
 
-/* For iterating over insns in basic block.  */
-#define FOR_BB_INSNS(BB, INSN)			\
-  for ((INSN) = BB_HEAD (BB);			\
-       (INSN) && (INSN) != NEXT_INSN (BB_END (BB));	\
-       (INSN) = NEXT_INSN (INSN))
-
-/* For iterating over insns in basic block when we might remove the
-   current insn.  */
-#define FOR_BB_INSNS_SAFE(BB, INSN, CURR)			\
-  for ((INSN) = BB_HEAD (BB), (CURR) = (INSN) ? NEXT_INSN ((INSN)): NULL;	\
-       (INSN) && (INSN) != NEXT_INSN (BB_END (BB));	\
-       (INSN) = (CURR), (CURR) = (INSN) ? NEXT_INSN ((INSN)) : NULL)
+/* For iterating over insns in a basic block.  The iterator allows the loop
+   body to delete INSN.  It also ignores any instructions that the body
+   inserts between INSN and the following instruction.  */
+#define FOR_BB_INSNS(BB, INSN)						\
+  for (rtx INSN = BB_HEAD (BB), INSN##_cond_ = INSN, INSN##_next_,	\
+       INSN##_end_ = INSN ? NEXT_INSN (BB_END (BB)) : NULL_RTX;		\
+       INSN##_cond_ && (INSN##_next_ = NEXT_INSN (INSN), true);		\
+       INSN = INSN##_next_,						\
+       INSN##_cond_ = (INSN != INSN##_end_ ? (rtx) 1 : NULL_RTX))
 
+/* Likewise, but in reverse.  */
 #define FOR_BB_INSNS_REVERSE(BB, INSN)		\
-  for ((INSN) = BB_END (BB);			\
-       (INSN) && (INSN) != PREV_INSN (BB_HEAD (BB));	\
-       (INSN) = PREV_INSN (INSN))
-
-#define FOR_BB_INSNS_REVERSE_SAFE(BB, INSN, CURR)	\
-  for ((INSN) = BB_END (BB),(CURR) = (INSN) ? PREV_INSN ((INSN)) : NULL;	\
-       (INSN) && (INSN) != PREV_INSN (BB_HEAD (BB));	\
-       (INSN) = (CURR), (CURR) = (INSN) ? PREV_INSN ((INSN)) : NULL)
+  for (rtx INSN = BB_END (BB), INSN##_cond_ = INSN, INSN##_prev_,	\
+       INSN##_end_ = INSN ? PREV_INSN (BB_HEAD (BB)) : NULL_RTX;	\
+       INSN##_cond_ && (INSN##_prev_ = PREV_INSN (INSN), true);		\
+       INSN = INSN##_prev_,						\
+       INSN##_cond_ = (INSN != INSN##_end_ ? (rtx) 1 : NULL_RTX))
 
 /* Cycles through _all_ basic blocks, even the fake ones (entry and
    exit block).  */
Index: gcc/alias.c
===================================================================
--- gcc/alias.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/alias.c	2014-06-07 18:51:38.344967876 +0100
@@ -2840,7 +2840,7 @@  init_alias_analysis (void)
   int changed, pass;
   int i;
   unsigned int ui;
-  rtx insn, val;
+  rtx val;
   int rpo_cnt;
   int *rpo;
 
Index: gcc/bb-reorder.c
===================================================================
--- gcc/bb-reorder.c	2014-06-07 18:23:27.320167705 +0100
+++ gcc/bb-reorder.c	2014-06-07 18:51:38.345967884 +0100
@@ -1331,7 +1331,6 @@  copy_bb_p (const_basic_block bb, int cod
 {
   int size = 0;
   int max_size = uncond_jump_length;
-  rtx insn;
 
   if (!bb->frequency)
     return false;
@@ -2443,7 +2442,6 @@  pass_duplicate_computed_gotos::execute (
      mark it in the candidates.  */
   FOR_EACH_BB_FN (bb, fun)
     {
-      rtx insn;
       edge e;
       edge_iterator ei;
       int size, all_flags;
Index: gcc/cfgloop.c
===================================================================
--- gcc/cfgloop.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/cfgloop.c	2014-06-07 18:51:38.407968367 +0100
@@ -1736,7 +1736,6 @@  loop_exits_from_bb_p (struct loop *loop,
 location_t
 get_loop_location (struct loop *loop)
 {
-  rtx insn = NULL;
   struct niter_desc *desc = NULL;
   edge exit;
 
Index: gcc/cfgloopanal.c
===================================================================
--- gcc/cfgloopanal.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/cfgloopanal.c	2014-06-07 18:51:38.345967884 +0100
@@ -174,7 +174,6 @@  num_loop_insns (const struct loop *loop)
 {
   basic_block *bbs, bb;
   unsigned i, ninsns = 0;
-  rtx insn;
 
   bbs = get_loop_body (loop);
   for (i = 0; i < loop->num_nodes; i++)
@@ -198,7 +197,6 @@  average_num_loop_insns (const struct loo
 {
   basic_block *bbs, bb;
   unsigned i, binsns, ninsns, ratio;
-  rtx insn;
 
   ninsns = 0;
   bbs = get_loop_body (loop);
Index: gcc/cfgrtl.c
===================================================================
--- gcc/cfgrtl.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/cfgrtl.c	2014-06-07 18:51:38.347967899 +0100
@@ -2646,8 +2646,6 @@  rtl_verify_bb_pointers (void)
   /* Check the general integrity of the basic blocks.  */
   FOR_EACH_BB_REVERSE_FN (bb, cfun)
     {
-      rtx insn;
-
       if (!(bb->flags & BB_RTL))
 	{
 	  error ("BB_RTL flag not set for block %d", bb->index);
@@ -2664,7 +2662,7 @@  rtl_verify_bb_pointers (void)
 	    err = 1;
 	  }
 
-      for (insn = BB_HEADER (bb); insn; insn = NEXT_INSN (insn))
+      for (rtx insn = BB_HEADER (bb); insn; insn = NEXT_INSN (insn))
 	if (!BARRIER_P (insn)
 	    && BLOCK_FOR_INSN (insn) != NULL)
 	  {
@@ -2672,7 +2670,7 @@  rtl_verify_bb_pointers (void)
 		   INSN_UID (insn), bb->index);
 	    err = 1;
 	  }
-      for (insn = BB_FOOTER (bb); insn; insn = NEXT_INSN (insn))
+      for (rtx insn = BB_FOOTER (bb); insn; insn = NEXT_INSN (insn))
 	if (!BARRIER_P (insn)
 	    && BLOCK_FOR_INSN (insn) != NULL)
 	  {
@@ -4671,8 +4669,6 @@  rtl_make_forwarder_block (edge fallthru
 static bool
 rtl_block_empty_p (basic_block bb)
 {
-  rtx insn;
-
   if (bb == ENTRY_BLOCK_PTR_FOR_FN (cfun)
       || bb == EXIT_BLOCK_PTR_FOR_FN (cfun))
     return true;
@@ -4690,7 +4686,6 @@  rtl_block_empty_p (basic_block bb)
 static basic_block
 rtl_split_block_before_cond_jump (basic_block bb)
 {
-  rtx insn;
   rtx split_point = NULL;
   rtx last = NULL;
   bool found_code = false;
@@ -4999,7 +4994,6 @@  rtl_duplicate_bb (basic_block bb)
 rtl_account_profile_record (basic_block bb, int after_pass,
 			    struct profile_record *record)
 {
-  rtx insn;
   FOR_BB_INSNS (bb, insn)
     if (INSN_P (insn))
       {
Index: gcc/combine.c
===================================================================
--- gcc/combine.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/combine.c	2014-06-07 18:51:38.351967931 +0100
@@ -984,7 +984,7 @@  delete_noop_moves (void)
 create_log_links (void)
 {
   basic_block bb;
-  rtx *next_use, insn;
+  rtx *next_use;
   df_ref *def_vec, *use_vec;
 
   next_use = XCNEWVEC (rtx, max_reg_num ());
@@ -1108,7 +1108,7 @@  insn_a_feeds_b (rtx a, rtx b)
 static int
 combine_instructions (rtx f, unsigned int nregs)
 {
-  rtx insn, next;
+  rtx next;
 #ifdef HAVE_cc0
   rtx prev;
 #endif
@@ -1226,7 +1226,7 @@  combine_instructions (rtx f, unsigned in
       last_bb = this_basic_block;
 
       rtl_profile_for_bb (this_basic_block);
-      for (insn = BB_HEAD (this_basic_block);
+      for (rtx insn = BB_HEAD (this_basic_block);
 	   insn != NEXT_INSN (BB_END (this_basic_block));
 	   insn = next ? next : NEXT_INSN (insn))
 	{
Index: gcc/cprop.c
===================================================================
--- gcc/cprop.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/cprop.c	2014-06-07 18:51:38.376968126 +0100
@@ -402,8 +402,6 @@  compute_hash_table_work (struct hash_tab
 
   FOR_EACH_BB_FN (bb, cfun)
     {
-      rtx insn;
-
       /* Reset tables used to keep track of what's not yet invalid [since
 	 the end of the block].  */
       CLEAR_REG_SET (reg_set_bitmap);
@@ -1229,7 +1227,6 @@  do_local_cprop (rtx x, rtx insn)
 local_cprop_pass (void)
 {
   basic_block bb;
-  rtx insn;
   bool changed = false;
   unsigned i;
 
@@ -1660,7 +1657,6 @@  bypass_conditional_jumps (void)
   basic_block bb;
   int changed;
   rtx setcc;
-  rtx insn;
   rtx dest;
 
   /* Note we start at block 1.  */
@@ -1825,7 +1821,6 @@  one_cprop_pass (void)
   if (set_hash_table.n_elems > 0)
     {
       basic_block bb;
-      rtx insn;
 
       alloc_cprop_mem (last_basic_block_for_fn (cfun),
 		       set_hash_table.n_elems);
Index: gcc/cse.c
===================================================================
--- gcc/cse.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/cse.c	2014-06-07 18:51:38.378968141 +0100
@@ -6359,7 +6359,6 @@  cse_prescan_path (struct cse_basic_block
   for (path_entry = 0; path_entry < path_size; path_entry++)
     {
       basic_block bb;
-      rtx insn;
 
       bb = data->path[path_entry].bb;
 
@@ -6398,7 +6397,6 @@  cse_extended_basic_block (struct cse_bas
   for (path_entry = 0; path_entry < path_size; path_entry++)
     {
       basic_block bb;
-      rtx insn;
 
       bb = ebb_data->path[path_entry].bb;
 
@@ -6522,7 +6520,7 @@  cse_extended_basic_block (struct cse_bas
 
       /* If this is a conditional jump insn, record any known
 	 equivalences due to the condition being tested.  */
-      insn = BB_END (bb);
+      rtx insn = BB_END (bb);
       if (path_entry < path_size - 1
 	  && JUMP_P (insn)
 	  && single_set (insn)
Index: gcc/df-core.c
===================================================================
--- gcc/df-core.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/df-core.c	2014-06-07 18:51:38.378968141 +0100
@@ -1946,7 +1946,6 @@  df_set_clean_cfg (void)
 df_ref
 df_bb_regno_first_def_find (basic_block bb, unsigned int regno)
 {
-  rtx insn;
   df_ref *def_rec;
   unsigned int uid;
 
@@ -1972,7 +1971,6 @@  df_bb_regno_first_def_find (basic_block
 df_ref
 df_bb_regno_last_def_find (basic_block bb, unsigned int regno)
 {
-  rtx insn;
   df_ref *def_rec;
   unsigned int uid;
 
Index: gcc/df-problems.c
===================================================================
--- gcc/df-problems.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/df-problems.c	2014-06-07 18:51:38.380968157 +0100
@@ -355,7 +355,6 @@  df_rd_bb_local_compute (unsigned int bb_
 {
   basic_block bb = BASIC_BLOCK_FOR_FN (cfun, bb_index);
   struct df_rd_bb_info *bb_info = df_rd_get_bb_info (bb_index);
-  rtx insn;
 
   bitmap_clear (&seen_in_block);
   bitmap_clear (&seen_in_insn);
@@ -835,7 +834,6 @@  df_lr_bb_local_compute (unsigned int bb_
 {
   basic_block bb = BASIC_BLOCK_FOR_FN (cfun, bb_index);
   struct df_lr_bb_info *bb_info = df_lr_get_bb_info (bb_index);
-  rtx insn;
   df_ref *def_rec;
   df_ref *use_rec;
 
@@ -1462,7 +1460,6 @@  df_live_bb_local_compute (unsigned int b
 {
   basic_block bb = BASIC_BLOCK_FOR_FN (cfun, bb_index);
   struct df_live_bb_info *bb_info = df_live_get_bb_info (bb_index);
-  rtx insn;
   df_ref *def_rec;
   int luid = 0;
 
@@ -1982,7 +1979,6 @@  df_chain_remove_problem (void)
 
   EXECUTE_IF_SET_IN_BITMAP (df_chain->out_of_date_transfer_functions, 0, bb_index, bi)
     {
-      rtx insn;
       df_ref *def_rec;
       df_ref *use_rec;
       basic_block bb = BASIC_BLOCK_FOR_FN (cfun, bb_index);
@@ -2105,7 +2101,6 @@  df_chain_create_bb (unsigned int bb_inde
 {
   basic_block bb = BASIC_BLOCK_FOR_FN (cfun, bb_index);
   struct df_rd_bb_info *bb_info = df_rd_get_bb_info (bb_index);
-  rtx insn;
   bitmap_head cpy;
 
   bitmap_initialize (&cpy, &bitmap_default_obstack);
@@ -2531,7 +2526,6 @@  df_word_lr_bb_local_compute (unsigned in
 {
   basic_block bb = BASIC_BLOCK_FOR_FN (cfun, bb_index);
   struct df_word_lr_bb_info *bb_info = df_word_lr_get_bb_info (bb_index);
-  rtx insn;
   df_ref *def_rec;
   df_ref *use_rec;
 
@@ -3153,7 +3147,6 @@  df_note_bb_compute (unsigned int bb_inde
 		    bitmap live, bitmap do_not_gen, bitmap artificial_uses)
 {
   basic_block bb = BASIC_BLOCK_FOR_FN (cfun, bb_index);
-  rtx insn;
   df_ref *def_rec;
   df_ref *use_rec;
   struct dead_debug_local debug;
@@ -4271,7 +4264,6 @@  df_md_bb_local_compute (unsigned int bb_
 {
   basic_block bb = BASIC_BLOCK_FOR_FN (cfun, bb_index);
   struct df_md_bb_info *bb_info = df_md_get_bb_info (bb_index);
-  rtx insn;
 
   /* Artificials are only hard regs.  */
   if (!(df->changeable_flags & DF_NO_HARD_REGS))
Index: gcc/df-scan.c
===================================================================
--- gcc/df-scan.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/df-scan.c	2014-06-07 18:51:38.381968164 +0100
@@ -283,7 +283,6 @@  df_scan_free_bb_info (basic_block bb, vo
   /* See if bb_info is initialized.  */
   if (bb_info->artificial_defs)
     {
-      rtx insn;
       FOR_BB_INSNS (bb, insn)
 	{
 	  if (INSN_P (insn))
@@ -403,7 +402,6 @@  df_scan_start_dump (FILE *file ATTRIBUTE
   int icount = 0;
   int ccount = 0;
   basic_block bb;
-  rtx insn;
 
   fprintf (file, ";;  invalidated by call \t");
   df_print_regset (file, regs_invalidated_by_call_regset);
@@ -482,7 +480,6 @@  df_scan_start_block (basic_block bb, FIL
     }
 #if 0
   {
-    rtx insn;
     FOR_BB_INSNS (bb, insn)
       if (INSN_P (insn))
 	df_insn_debug (insn, false, file);
@@ -1416,13 +1413,8 @@  df_insn_rescan_all (void)
   bitmap_clear (&df->insns_to_notes_rescan);
 
   FOR_EACH_BB_FN (bb, cfun)
-    {
-      rtx insn;
-      FOR_BB_INSNS (bb, insn)
-	{
-	  df_insn_rescan (insn);
-	}
-    }
+    FOR_BB_INSNS (bb, insn)
+      df_insn_rescan (insn);
 
   if (no_insn_rescan)
     df_set_flags (DF_NO_INSN_RESCAN);
@@ -1638,7 +1630,6 @@  df_reorganize_refs_by_reg_by_insn (struc
   EXECUTE_IF_SET_IN_BITMAP (df->blocks_to_analyze, 0, bb_index, bi)
     {
       basic_block bb = BASIC_BLOCK_FOR_FN (cfun, bb_index);
-      rtx insn;
       df_ref *ref_rec;
 
       if (include_defs)
@@ -1692,7 +1683,6 @@  df_reorganize_refs_by_reg_by_insn (struc
   EXECUTE_IF_SET_IN_BITMAP (df->blocks_to_analyze, 0, bb_index, bi)
     {
       basic_block bb = BASIC_BLOCK_FOR_FN (cfun, bb_index);
-      rtx insn;
       df_ref *ref_rec;
 
       if (include_defs)
@@ -1827,8 +1817,6 @@  df_reorganize_refs_by_insn_bb (basic_blo
 			       bool include_defs, bool include_uses,
 			       bool include_eq_uses)
 {
-  rtx insn;
-
   if (include_defs)
     offset = df_add_refs_to_table (offset, ref_info,
 				   df_get_artificial_defs (bb->index));
@@ -3528,7 +3516,6 @@  df_insn_refs_collect (struct df_collecti
 void
 df_recompute_luids (basic_block bb)
 {
-  rtx insn;
   int luid = 0;
 
   df_grow_insn_info ();
@@ -3622,7 +3609,6 @@  df_bb_refs_collect (struct df_collection
 df_bb_refs_record (int bb_index, bool scan_insns)
 {
   basic_block bb = BASIC_BLOCK_FOR_FN (cfun, bb_index);
-  rtx insn;
   int luid = 0;
 
   if (!df)
@@ -4158,14 +4144,9 @@  df_update_entry_exit_and_calls (void)
   /* The call insns need to be rescanned because there may be changes
      in the set of registers clobbered across the call.  */
   FOR_EACH_BB_FN (bb, cfun)
-    {
-      rtx insn;
-      FOR_BB_INSNS (bb, insn)
-	{
-	  if (INSN_P (insn) && CALL_P (insn))
-	    df_insn_rescan (insn);
-	}
-    }
+    FOR_BB_INSNS (bb, insn)
+      if (INSN_P (insn) && CALL_P (insn))
+	df_insn_rescan (insn);
 }
 
 
@@ -4430,7 +4411,6 @@  df_insn_refs_verify (struct df_collectio
 static bool
 df_bb_verify (basic_block bb)
 {
-  rtx insn;
   struct df_scan_bb_info *bb_info = df_scan_get_bb_info (bb->index);
   struct df_collection_rec collection_rec;
 
Index: gcc/dse.c
===================================================================
--- gcc/dse.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/dse.c	2014-06-07 18:51:38.382968172 +0100
@@ -2710,7 +2710,6 @@  dse_step1 (void)
 
       if (bb->index >= NUM_FIXED_BLOCKS)
 	{
-	  rtx insn;
 
 	  cse_store_info_pool
 	    = create_alloc_pool ("cse_store_info_pool",
Index: gcc/function.c
===================================================================
--- gcc/function.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/function.c	2014-06-07 18:51:38.383968180 +0100
@@ -5979,7 +5979,7 @@  reposition_prologue_and_epilogue_notes (
 
       FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
 	{
-	  rtx insn, first = NULL, note = NULL;
+	  rtx first = NULL, note = NULL;
 	  basic_block bb = e->src;
 
 	  /* Scan from the beginning until we reach the first epilogue insn. */
@@ -6444,7 +6444,7 @@  const pass_data pass_data_match_asm_cons
 pass_match_asm_constraints::execute (function *fun)
 {
   basic_block bb;
-  rtx insn, pat, *p_sets;
+  rtx pat, *p_sets;
   int noutputs;
 
   if (!crtl->has_asm_statement)
Index: gcc/fwprop.c
===================================================================
--- gcc/fwprop.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/fwprop.c	2014-06-07 18:51:38.384968188 +0100
@@ -220,7 +220,6 @@  single_def_use_dom_walker::before_dom_ch
   int bb_index = bb->index;
   struct df_md_bb_info *md_bb_info = df_md_get_bb_info (bb_index);
   struct df_lr_bb_info *lr_bb_info = df_lr_get_bb_info (bb_index);
-  rtx insn;
 
   bitmap_copy (local_md, &md_bb_info->in);
   bitmap_copy (local_lr, &lr_bb_info->in);
Index: gcc/gcse.c
===================================================================
--- gcc/gcse.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/gcse.c	2014-06-07 18:51:38.385968195 +0100
@@ -1561,7 +1561,6 @@  compute_hash_table_work (struct hash_tab
 
   FOR_EACH_BB_FN (current_bb, cfun)
     {
-      rtx insn;
       unsigned int regno;
 
       /* First pass over the instructions records information used to
@@ -3234,7 +3233,6 @@  hoist_code (void)
 
   FOR_EACH_BB_FN (bb, cfun)
     {
-      rtx insn;
       int to_head;
 
       to_head = 0;
@@ -3564,7 +3562,6 @@  calculate_bb_reg_pressure (void)
 {
   int i;
   unsigned int j;
-  rtx insn;
   basic_block bb;
   bitmap curr_regs_live;
   bitmap_iterator bi;
@@ -3943,7 +3940,6 @@  compute_ld_motion_mems (void)
 {
   struct ls_expr * ptr;
   basic_block bb;
-  rtx insn;
 
   pre_ldst_mems = NULL;
   pre_ldst_table.create (13);
Index: gcc/ifcvt.c
===================================================================
--- gcc/ifcvt.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/ifcvt.c	2014-06-07 18:51:38.388968219 +0100
@@ -2429,8 +2429,6 @@  noce_can_store_speculate_p (basic_block
        dominator != NULL;
        dominator = get_immediate_dominator (CDI_POST_DOMINATORS, dominator))
     {
-      rtx insn;
-
       FOR_BB_INSNS (dominator, insn)
 	{
 	  /* If we see something that might be a memory barrier, we
@@ -2814,7 +2812,7 @@  cond_move_convert_if_block (struct noce_
 			    bool else_block_p)
 {
   enum rtx_code code;
-  rtx insn, cond_arg0, cond_arg1;
+  rtx cond_arg0, cond_arg1;
 
   code = GET_CODE (cond);
   cond_arg0 = XEXP (cond, 0);
@@ -4213,7 +4211,7 @@  dead_or_predicable (basic_block test_bb,
   /* Try the NCE path if the CE path did not result in any changes.  */
   if (n_validated_changes == 0)
     {
-      rtx cond, insn;
+      rtx cond;
       regset live;
       bool success;
 
Index: gcc/init-regs.c
===================================================================
--- gcc/init-regs.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/init-regs.c	2014-06-07 18:51:38.388968219 +0100
@@ -61,7 +61,6 @@  initialize_uninitialized_regs (void)
 
   FOR_EACH_BB_FN (bb, cfun)
     {
-      rtx insn;
       bitmap lr = DF_LR_IN (bb);
       bitmap ur = DF_LIVE_IN (bb);
       bitmap_clear (already_genned);
Index: gcc/ira-build.c
===================================================================
--- gcc/ira-build.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/ira-build.c	2014-06-07 18:51:38.413968413 +0100
@@ -1927,7 +1927,6 @@  create_insn_allocnos (rtx x, bool output
 create_bb_allocnos (ira_loop_tree_node_t bb_node)
 {
   basic_block bb;
-  rtx insn;
   unsigned int i;
   bitmap_iterator bi;
 
Index: gcc/ira-conflicts.c
===================================================================
--- gcc/ira-conflicts.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/ira-conflicts.c	2014-06-07 18:51:38.389968227 +0100
@@ -421,7 +421,6 @@  add_insn_allocno_copies (rtx insn)
 add_copies (ira_loop_tree_node_t loop_tree_node)
 {
   basic_block bb;
-  rtx insn;
 
   bb = loop_tree_node->bb;
   if (bb == NULL)
Index: gcc/ira-costs.c
===================================================================
--- gcc/ira-costs.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/ira-costs.c	2014-06-07 18:51:38.389968227 +0100
@@ -1495,8 +1495,6 @@  print_pseudo_costs (FILE *f)
 static void
 process_bb_for_costs (basic_block bb)
 {
-  rtx insn;
-
   frequency = REG_FREQ_FROM_BB (bb);
   if (frequency == 0)
     frequency = 1;
@@ -1888,7 +1886,7 @@  process_bb_node_for_hard_reg_moves (ira_
   ira_loop_tree_node_t curr_loop_tree_node;
   enum reg_class rclass;
   basic_block bb;
-  rtx insn, set, src, dst;
+  rtx set, src, dst;
 
   bb = loop_tree_node->bb;
   if (bb == NULL)
Index: gcc/ira-emit.c
===================================================================
--- gcc/ira-emit.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/ira-emit.c	2014-06-07 18:51:38.390968234 +0100
@@ -557,7 +557,7 @@  change_loop (ira_loop_tree_node_t node)
   int regno;
   bool used_p;
   ira_allocno_t allocno, parent_allocno, *map;
-  rtx insn, original_reg;
+  rtx original_reg;
   enum reg_class aclass, pclass;
   ira_loop_tree_node_t parent;
 
@@ -1234,7 +1234,6 @@  add_ranges_and_copies (void)
 ira_emit (bool loops_p)
 {
   basic_block bb;
-  rtx insn;
   edge_iterator ei;
   edge e;
   ira_allocno_t a;
Index: gcc/ira-lives.c
===================================================================
--- gcc/ira-lives.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/ira-lives.c	2014-06-07 18:51:38.414968422 +0100
@@ -1052,7 +1052,6 @@  process_bb_node_lives (ira_loop_tree_nod
   int i, freq;
   unsigned int j;
   basic_block bb;
-  rtx insn;
   bitmap_iterator bi;
   bitmap reg_live_out;
   unsigned int px;
Index: gcc/ira.c
===================================================================
--- gcc/ira.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/ira.c	2014-06-07 18:51:38.391968242 +0100
@@ -2014,7 +2014,7 @@  ira_get_dup_out_num (int op_num, HARD_RE
 decrease_live_ranges_number (void)
 {
   basic_block bb;
-  rtx insn, set, src, dest, dest_death, p, q, note;
+  rtx set, src, dest, dest_death, p, q, note;
   int sregno, dregno;
 
   if (! flag_expensive_optimizations)
@@ -2248,7 +2248,6 @@  compute_regs_asm_clobbered (void)
 
   FOR_EACH_BB_FN (bb, cfun)
     {
-      rtx insn;
       FOR_BB_INSNS_REVERSE (bb, insn)
 	{
 	  df_ref *def_rec;
@@ -3338,7 +3337,6 @@  adjust_cleared_regs (rtx loc, const_rtx
 static int
 update_equiv_regs (void)
 {
-  rtx insn;
   basic_block bb;
   int loop_depth;
   bitmap cleared_regs;
@@ -3373,7 +3371,7 @@  update_equiv_regs (void)
     {
       loop_depth = bb_loop_depth (bb);
 
-      for (insn = BB_HEAD (bb);
+      for (rtx insn = BB_HEAD (bb);
 	   insn != NEXT_INSN (BB_END (bb));
 	   insn = NEXT_INSN (insn))
 	{
@@ -3593,7 +3591,7 @@  update_equiv_regs (void)
   /* A second pass, to gather additional equivalences with memory.  This needs
      to be done after we know which registers we are going to replace.  */
 
-  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+  for (rtx insn = get_insns (); insn; insn = NEXT_INSN (insn))
     {
       rtx set, src, dest;
       unsigned regno;
@@ -3663,7 +3661,7 @@  update_equiv_regs (void)
   FOR_EACH_BB_REVERSE_FN (bb, cfun)
     {
       loop_depth = bb_loop_depth (bb);
-      for (insn = BB_END (bb);
+      for (rtx insn = BB_END (bb);
 	   insn != PREV_INSN (BB_HEAD (bb));
 	   insn = PREV_INSN (insn))
 	{
@@ -3805,7 +3803,7 @@  update_equiv_regs (void)
 
       /* Last pass - adjust debug insns referencing cleared regs.  */
       if (MAY_HAVE_DEBUG_INSNS)
-	for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+	for (rtx insn = get_insns (); insn; insn = NEXT_INSN (insn))
 	  if (DEBUG_INSN_P (insn))
 	    {
 	      rtx old_loc = INSN_VAR_LOCATION_LOC (insn);
@@ -4018,7 +4016,6 @@  build_insn_chain (void)
   FOR_EACH_BB_REVERSE_FN (bb, cfun)
     {
       bitmap_iterator bi;
-      rtx insn;
 
       CLEAR_REG_SET (live_relevant_regs);
       bitmap_clear (live_subregs_used);
@@ -4216,7 +4213,7 @@  build_insn_chain (void)
       /* FIXME!! The following code is a disaster.  Reload needs to see the
 	 labels and jump tables that are just hanging out in between
 	 the basic blocks.  See pr33676.  */
-      insn = BB_HEAD (bb);
+      rtx insn = BB_HEAD (bb);
 
       /* Skip over the barriers and cruft.  */
       while (insn && (BARRIER_P (insn) || NOTE_P (insn)
@@ -4422,7 +4419,6 @@  find_moveable_pseudos (void)
   bitmap_initialize (&unusable_as_input, 0);
   FOR_EACH_BB_FN (bb, cfun)
     {
-      rtx insn;
       bitmap transp = bb_transp_live + bb->index;
       bitmap moveable = bb_moveable_reg_sets + bb->index;
       bitmap local = bb_local + bb->index;
@@ -4486,7 +4482,6 @@  find_moveable_pseudos (void)
   FOR_EACH_BB_FN (bb, cfun)
     {
       bitmap local = bb_local + bb->index;
-      rtx insn;
 
       FOR_BB_INSNS (bb, insn)
 	if (NONDEBUG_INSN_P (insn))
@@ -4798,7 +4793,7 @@  split_live_ranges_for_shrink_wrap (void)
 {
   basic_block bb, call_dom = NULL;
   basic_block first = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
-  rtx insn, last_interesting_insn = NULL;
+  rtx last_interesting_insn = NULL;
   bitmap_head need_new, reachable;
   vec<basic_block> queue;
 
Index: gcc/jump.c
===================================================================
--- gcc/jump.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/jump.c	2014-06-07 18:51:38.414968422 +0100
@@ -269,8 +269,6 @@  maybe_propagate_label_ref (rtx jump_insn
 static void
 mark_all_labels (rtx f)
 {
-  rtx insn;
-
   if (current_ir_type () == IR_RTL_CFGLAYOUT)
     {
       basic_block bb;
@@ -289,10 +287,10 @@  mark_all_labels (rtx f)
 	  /* In cfglayout mode, there may be non-insns between the
 	     basic blocks.  If those non-insns represent tablejump data,
 	     they contain label references that we must record.  */
-	  for (insn = BB_HEADER (bb); insn; insn = NEXT_INSN (insn))
+	  for (rtx insn = BB_HEADER (bb); insn; insn = NEXT_INSN (insn))
 	    if (JUMP_TABLE_DATA_P (insn))
 	      mark_jump_label (PATTERN (insn), insn, 0);
-	  for (insn = BB_FOOTER (bb); insn; insn = NEXT_INSN (insn))
+	  for (rtx insn = BB_FOOTER (bb); insn; insn = NEXT_INSN (insn))
 	    if (JUMP_TABLE_DATA_P (insn))
 	      mark_jump_label (PATTERN (insn), insn, 0);
 	}
@@ -300,7 +298,7 @@  mark_all_labels (rtx f)
   else
     {
       rtx prev_nonjump_insn = NULL;
-      for (insn = f; insn; insn = NEXT_INSN (insn))
+      for (rtx insn = f; insn; insn = NEXT_INSN (insn))
 	{
 	  if (INSN_DELETED_P (insn))
 	    ;
Index: gcc/haifa-sched.c
===================================================================
--- gcc/haifa-sched.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/haifa-sched.c	2014-06-07 18:51:38.387968211 +0100
@@ -1037,7 +1037,6 @@  setup_ref_regs (rtx x)
 initiate_bb_reg_pressure_info (basic_block bb)
 {
   unsigned int i ATTRIBUTE_UNUSED;
-  rtx insn;
 
   if (current_nr_blocks > 1)
     FOR_BB_INSNS (bb, insn)
@@ -8423,12 +8422,8 @@  sched_init_luids (bb_vec_t bbs)
 
   sched_extend_luids ();
   FOR_EACH_VEC_ELT (bbs, i, bb)
-    {
-      rtx insn;
-
-      FOR_BB_INSNS (bb, insn)
-	sched_init_insn_luid (insn);
-    }
+    FOR_BB_INSNS (bb, insn)
+      sched_init_insn_luid (insn);
 }
 
 /* Free LUIDs.  */
@@ -8493,12 +8488,8 @@  haifa_init_h_i_d (bb_vec_t bbs)
 
   extend_h_i_d ();
   FOR_EACH_VEC_ELT (bbs, i, bb)
-    {
-      rtx insn;
-
-      FOR_BB_INSNS (bb, insn)
-	init_h_i_d (insn);
-    }
+    FOR_BB_INSNS (bb, insn)
+      init_h_i_d (insn);
 }
 
 /* Finalize haifa_insn_data.  */
Index: gcc/loop-invariant.c
===================================================================
--- gcc/loop-invariant.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/loop-invariant.c	2014-06-07 18:51:38.392968250 +0100
@@ -572,7 +572,6 @@  find_exits (struct loop *loop, basic_blo
   edge e;
   struct loop *outermost_exit = loop, *aexit;
   bool has_call = false;
-  rtx insn;
 
   for (i = 0; i < loop->num_nodes; i++)
     {
@@ -947,8 +946,6 @@  find_invariants_insn (rtx insn, bool alw
 static void
 find_invariants_bb (basic_block bb, bool always_reached, bool always_executed)
 {
-  rtx insn;
-
   FOR_BB_INSNS (bb, insn)
     {
       if (!NONDEBUG_INSN_P (insn))
@@ -1804,7 +1801,7 @@  calculate_loop_reg_pressure (void)
   unsigned int j;
   bitmap_iterator bi;
   basic_block bb;
-  rtx insn, link;
+  rtx link;
   struct loop *loop, *parent;
 
   FOR_EACH_LOOP (loop, 0)
Index: gcc/loop-iv.c
===================================================================
--- gcc/loop-iv.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/loop-iv.c	2014-06-07 18:51:38.415968429 +0100
@@ -1871,7 +1871,7 @@  eliminate_implied_conditions (enum rtx_c
 simplify_using_initial_values (struct loop *loop, enum rtx_code op, rtx *expr)
 {
   bool expression_valid;
-  rtx head, tail, insn, cond_list, last_valid_expr;
+  rtx head, tail, cond_list, last_valid_expr;
   rtx neutral, aggr;
   regset altered, this_altered;
   edge e;
@@ -1951,7 +1951,7 @@  simplify_using_initial_values (struct lo
   cond_list = NULL_RTX;
   while (1)
     {
-      insn = BB_END (e->src);
+      rtx insn = BB_END (e->src);
       if (any_condjump_p (insn))
 	{
 	  rtx cond = get_condition (BB_END (e->src), NULL, false, true);
Index: gcc/lower-subreg.c
===================================================================
--- gcc/lower-subreg.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/lower-subreg.c	2014-06-07 18:51:38.393968258 +0100
@@ -1465,8 +1465,6 @@  decompose_multiword_subregs (bool decomp
   speed_p = optimize_function_for_speed_p (cfun);
   FOR_EACH_BB_FN (bb, cfun)
     {
-      rtx insn;
-
       FOR_BB_INSNS (bb, insn)
 	{
 	  rtx set;
@@ -1545,8 +1543,6 @@  decompose_multiword_subregs (bool decomp
 
       FOR_EACH_BB_FN (bb, cfun)
 	{
-	  rtx insn;
-
 	  FOR_BB_INSNS (bb, insn)
 	    {
 	      rtx pat;
Index: gcc/lra-constraints.c
===================================================================
--- gcc/lra-constraints.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/lra-constraints.c	2014-06-07 18:51:38.416968437 +0100
@@ -4979,8 +4979,6 @@  add_to_inherit (int regno, rtx insns)
 static rtx
 get_last_insertion_point (basic_block bb)
 {
-  rtx insn;
-
   FOR_BB_INSNS_REVERSE (bb, insn)
     if (NONDEBUG_INSN_P (insn) || NOTE_INSN_BASIC_BLOCK_P (insn))
       return insn;
Index: gcc/lra-eliminations.c
===================================================================
--- gcc/lra-eliminations.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/lra-eliminations.c	2014-06-07 18:51:38.393968258 +0100
@@ -1290,7 +1290,6 @@  init_elimination (void)
 {
   bool stop_to_sp_elimination_p;
   basic_block bb;
-  rtx insn;
   struct elim_table *ep;
 
   init_elim_table ();
Index: gcc/lra.c
===================================================================
--- gcc/lra.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/lra.c	2014-06-07 18:51:38.394968266 +0100
@@ -1809,7 +1809,7 @@  remove_scratches (void)
   int i;
   bool insn_changed_p;
   basic_block bb;
-  rtx insn, reg;
+  rtx reg;
   sloc_t loc;
   lra_insn_recog_data_t id;
   struct lra_static_insn_data *static_id;
@@ -1903,7 +1903,6 @@  restore_scratches (void)
 check_rtl (bool final_p)
 {
   basic_block bb;
-  rtx insn;
 
   lra_assert (! final_p || reload_completed);
   FOR_EACH_BB_FN (bb, cfun)
@@ -2020,7 +2019,6 @@  update_inc_notes (void)
 {
   rtx *pnote;
   basic_block bb;
-  rtx insn;
 
   FOR_EACH_BB_FN (bb, cfun)
     FOR_BB_INSNS (bb, insn)
Index: gcc/mode-switching.c
===================================================================
--- gcc/mode-switching.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/mode-switching.c	2014-06-07 18:51:38.395968273 +0100
@@ -452,7 +452,6 @@  create_pre_exit (int n_entities, int *en
 static int
 optimize_mode_switching (void)
 {
-  rtx insn;
   int e;
   basic_block bb;
   int need_commit = 0;
Index: gcc/postreload-gcse.c
===================================================================
--- gcc/postreload-gcse.c	2014-06-07 18:23:27.320167705 +0100
+++ gcc/postreload-gcse.c	2014-06-07 18:51:38.395968273 +0100
@@ -261,7 +261,6 @@  alloc_mem (void)
 {
   int i;
   basic_block bb;
-  rtx insn;
 
   /* Find the largest UID and create a mapping from UIDs to CUIDs.  */
   uid_cuid = XCNEWVEC (int, get_max_uid () + 1);
@@ -830,8 +829,6 @@  compute_hash_table (void)
 
   FOR_EACH_BB_FN (bb, cfun)
     {
-      rtx insn;
-
       /* First pass over the instructions records information used to
 	 determine when registers and memory are last set.
 	 Since we compute a "local" AVAIL_OUT, reset the tables that
@@ -839,10 +836,8 @@  compute_hash_table (void)
 	 of the block.  */
       reset_opr_set_tables ();
       FOR_BB_INSNS (bb, insn)
-	{
-	  if (INSN_P (insn))
-            record_opr_changes (insn);
-	}
+	if (INSN_P (insn))
+	  record_opr_changes (insn);
 
       /* The next pass actually builds the hash table.  */
       FOR_BB_INSNS (bb, insn)
@@ -1153,7 +1148,6 @@  eliminate_partially_redundant_load (basi
 static void
 eliminate_partially_redundant_loads (void)
 {
-  rtx insn;
   basic_block bb;
 
   /* Note we start at block 1.  */
Index: gcc/postreload.c
===================================================================
--- gcc/postreload.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/postreload.c	2014-06-07 18:51:38.396968281 +0100
@@ -207,7 +207,6 @@  reload_cse_regs_1 (void)
 {
   bool cfg_changed = false;
   basic_block bb;
-  rtx insn;
   rtx testreg = gen_rtx_REG (VOIDmode, -1);
 
   cselib_init (CSELIB_RECORD_MEMORY);
Index: gcc/predict.c
===================================================================
--- gcc/predict.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/predict.c	2014-06-07 18:51:38.397968289 +0100
@@ -2890,17 +2890,13 @@  expensive_function_p (int threshold)
   /* Maximally BB_FREQ_MAX^2 so overflow won't happen.  */
   limit = ENTRY_BLOCK_PTR_FOR_FN (cfun)->frequency * threshold;
   FOR_EACH_BB_FN (bb, cfun)
-    {
-      rtx insn;
-
-      FOR_BB_INSNS (bb, insn)
-	if (active_insn_p (insn))
-	  {
-	    sum += bb->frequency;
-	    if (sum > limit)
-	      return true;
+    FOR_BB_INSNS (bb, insn)
+      if (active_insn_p (insn))
+	{
+	  sum += bb->frequency;
+	  if (sum > limit)
+	    return true;
 	}
-    }
 
   return false;
 }
Index: gcc/ree.c
===================================================================
--- gcc/ree.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/ree.c	2014-06-07 18:51:38.397968289 +0100
@@ -982,7 +982,7 @@  find_removable_extensions (void)
 {
   vec<ext_cand> insn_list = vNULL;
   basic_block bb;
-  rtx insn, set;
+  rtx set;
   unsigned *def_map = XCNEWVEC (unsigned, max_insn_uid);
 
   FOR_EACH_BB_FN (bb, cfun)
Index: gcc/reginfo.c
===================================================================
--- gcc/reginfo.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/reginfo.c	2014-06-07 18:51:38.398968297 +0100
@@ -1257,7 +1257,6 @@  find_subregs_of_mode (rtx x, bitmap subr
 init_subregs_of_mode (void)
 {
   basic_block bb;
-  rtx insn;
   bitmap_obstack srom_obstack;
   bitmap subregs_of_mode;
 
Index: gcc/regrename.c
===================================================================
--- gcc/regrename.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/regrename.c	2014-06-07 18:51:38.398968297 +0100
@@ -724,7 +724,6 @@  regrename_analyze (bitmap bb_mask)
 	  open_chains = NULL;
 	  if (insn_rr.exists ())
 	    {
-	      rtx insn;
 	      FOR_BB_INSNS (bb1, insn)
 		{
 		  insn_rr_info *p = &insn_rr[INSN_UID (insn)];
Index: gcc/regstat.c
===================================================================
--- gcc/regstat.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/regstat.c	2014-06-07 18:51:38.417968445 +0100
@@ -121,7 +121,6 @@  regstat_bb_compute_ri (unsigned int bb_i
 		       int *local_live_last_luid)
 {
   basic_block bb = BASIC_BLOCK_FOR_FN (cfun, bb_index);
-  rtx insn;
   df_ref *def_rec;
   df_ref *use_rec;
   int luid = 0;
@@ -441,7 +440,6 @@  regstat_get_setjmp_crosses (void)
 regstat_bb_compute_calls_crossed (unsigned int bb_index, bitmap live)
 {
   basic_block bb = BASIC_BLOCK_FOR_FN (cfun, bb_index);
-  rtx insn;
   df_ref *def_rec;
   df_ref *use_rec;
 
Index: gcc/reload1.c
===================================================================
--- gcc/reload1.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/reload1.c	2014-06-07 18:51:38.400968312 +0100
@@ -1605,7 +1605,6 @@  calculate_elim_costs_all_insns (void)
 
   FOR_EACH_BB_FN (bb, cfun)
     {
-      rtx insn;
       elim_bb = bb;
 
       FOR_BB_INSNS (bb, insn)
Index: gcc/sched-rgn.c
===================================================================
--- gcc/sched-rgn.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/sched-rgn.c	2014-06-07 18:51:38.401968320 +0100
@@ -256,7 +256,6 @@  static void free_pending_lists (void);
 is_cfg_nonregular (void)
 {
   basic_block b;
-  rtx insn;
 
   /* If we have a label that could be the target of a nonlocal goto, then
      the cfg is not well structured.  */
@@ -538,13 +537,9 @@  rgn_estimate_number_of_insns (basic_bloc
   count = INSN_LUID (BB_END (bb)) - INSN_LUID (BB_HEAD (bb));
 
   if (MAY_HAVE_DEBUG_INSNS)
-    {
-      rtx insn;
-
-      FOR_BB_INSNS (bb, insn)
-	if (DEBUG_INSN_P (insn))
-	  count--;
-    }
+    FOR_BB_INSNS (bb, insn)
+      if (DEBUG_INSN_P (insn))
+	count--;
 
   return count;
 }
Index: gcc/sched-vis.c
===================================================================
--- gcc/sched-vis.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/sched-vis.c	2014-06-07 18:51:38.401968320 +0100
@@ -832,7 +832,6 @@  dump_rtl_slim (FILE *f, const_rtx first,
 void
 rtl_dump_bb_for_graph (pretty_printer *pp, basic_block bb)
 {
-  rtx insn;
   bool first = true;
 
   /* TODO: inter-bb stuff.  */
Index: gcc/sel-sched-ir.c
===================================================================
--- gcc/sel-sched-ir.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/sel-sched-ir.c	2014-06-07 18:51:38.402968328 +0100
@@ -2801,12 +2801,8 @@  sched_scan (const struct sched_scan_info
 
   if (ssi->init_insn)
     FOR_EACH_VEC_ELT (bbs, i, bb)
-      {
-	rtx insn;
-
-	FOR_BB_INSNS (bb, insn)
-	  ssi->init_insn (insn);
-      }
+      FOR_BB_INSNS (bb, insn)
+	ssi->init_insn (insn);
 }
 
 /* Implement hooks for collecting fundamental insn properties like if insn is
@@ -4647,7 +4643,6 @@  sel_init_bbs (bb_vec_t bbs)
 sel_restore_notes (void)
 {
   int bb;
-  insn_t insn;
 
   for (bb = 0; bb < current_nr_blocks; bb++)
     {
@@ -4949,8 +4944,6 @@  recompute_rev_top_order (void)
 void
 clear_outdated_rtx_info (basic_block bb)
 {
-  rtx insn;
-
   FOR_BB_INSNS (bb, insn)
     if (INSN_P (insn))
       {
@@ -5404,7 +5397,6 @@  change_loops_latches (basic_block from,
 sel_split_block (basic_block bb, rtx after)
 {
   basic_block new_bb;
-  insn_t insn;
 
   new_bb = sched_split_block_1 (bb, after);
   sel_add_bb (new_bb);
Index: gcc/sel-sched.c
===================================================================
--- gcc/sel-sched.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/sel-sched.c	2014-06-07 18:51:38.404968344 +0100
@@ -7005,7 +7005,6 @@  simplify_changed_insns (void)
   for (i = 0; i < current_nr_blocks; i++)
     {
       basic_block bb = BASIC_BLOCK_FOR_FN (cfun, BB_TO_BLOCK (i));
-      rtx insn;
 
       FOR_BB_INSNS (bb, insn)
 	if (INSN_P (insn))
Index: gcc/stack-ptr-mod.c
===================================================================
--- gcc/stack-ptr-mod.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/stack-ptr-mod.c	2014-06-07 18:51:38.405968351 +0100
@@ -84,7 +84,6 @@  const pass_data pass_data_stack_ptr_mod
 pass_stack_ptr_mod::execute (function *fun)
 {
   basic_block bb;
-  rtx insn;
 
   /* Assume that the stack pointer is unchanging if alloca hasn't
      been used.  */
Index: gcc/store-motion.c
===================================================================
--- gcc/store-motion.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/store-motion.c	2014-06-07 18:51:38.405968351 +0100
@@ -644,7 +644,7 @@  compute_store_table (void)
 #ifdef ENABLE_CHECKING
   unsigned regno;
 #endif
-  rtx insn, tmp;
+  rtx tmp;
   df_ref *def_rec;
   int *last_set_in, *already_set;
   struct st_expr * ptr, **prev_next_ptr_ptr;
@@ -1010,7 +1010,7 @@  build_store_vectors (void)
 {
   basic_block bb;
   int *regs_set_in_block;
-  rtx insn, st;
+  rtx st;
   struct st_expr * ptr;
   unsigned int max_gcse_regno = max_reg_num ();
 
@@ -1028,7 +1028,7 @@  build_store_vectors (void)
     {
       for (st = ptr->avail_stores; st != NULL; st = XEXP (st, 1))
 	{
-	  insn = XEXP (st, 0);
+	  rtx insn = XEXP (st, 0);
 	  bb = BLOCK_FOR_INSN (insn);
 
 	  /* If we've already seen an available expression in this block,
@@ -1048,7 +1048,7 @@  build_store_vectors (void)
 
       for (st = ptr->antic_stores; st != NULL; st = XEXP (st, 1))
 	{
-	  insn = XEXP (st, 0);
+	  rtx insn = XEXP (st, 0);
 	  bb = BLOCK_FOR_INSN (insn);
 	  bitmap_set_bit (st_antloc[bb->index], ptr->index);
 	}
Index: gcc/web.c
===================================================================
--- gcc/web.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/web.c	2014-06-07 18:51:38.405968351 +0100
@@ -363,7 +363,6 @@  pass_web::execute (function *fun)
   unsigned int *used;
   basic_block bb;
   unsigned int uses_num = 0;
-  rtx insn;
 
   df_set_flags (DF_NO_HARD_REGS + DF_EQ_NOTES);
   df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/config/arm/arm.c	2014-06-07 18:51:38.412968406 +0100
@@ -17021,7 +17021,6 @@  thumb2_reorg (void)
 	  && optimize_bb_for_speed_p (bb))
 	continue;
 
-      rtx insn;
       Convert_Action action = SKIP;
       Convert_Action action_for_partial_flag_setting
 	= (current_tune->disparage_partial_flag_setting_t16_encodings
Index: gcc/config/epiphany/resolve-sw-modes.c
===================================================================
--- gcc/config/epiphany/resolve-sw-modes.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/config/epiphany/resolve-sw-modes.c	2014-06-07 18:51:38.354967954 +0100
@@ -78,7 +78,7 @@  const pass_data pass_data_resolve_sw_mod
 pass_resolve_sw_modes::execute (function *fun)
 {
   basic_block bb;
-  rtx insn, src;
+  rtx src;
   vec<basic_block> todo;
   sbitmap pushed;
   bool need_commit = false;
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/config/i386/i386.c	2014-06-07 18:51:38.366968047 +0100
@@ -10676,7 +10676,6 @@  ix86_finalize_stack_realign_flags (void)
 			   HARD_FRAME_POINTER_REGNUM);
       FOR_EACH_BB_FN (bb, cfun)
         {
-          rtx insn;
 	  FOR_BB_INSNS (bb, insn)
 	    if (NONDEBUG_INSN_P (insn)
 		&& requires_stack_frame_p (insn, prologue_used,
@@ -39233,7 +39232,6 @@  ix86_pad_returns (void)
 static int
 ix86_count_insn_bb (basic_block bb)
 {
-  rtx insn;
   int insn_count = 0;
 
   /* Count number of instructions in this block.  Return 4 if the number
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/config/mips/mips.c	2014-06-07 18:51:38.371968087 +0100
@@ -15516,30 +15516,29 @@  mips_get_pic_call_symbol (rtx *operands,
 mips_annotate_pic_calls (void)
 {
   basic_block bb;
-  rtx insn;
 
   FOR_EACH_BB_FN (bb, cfun)
     FOR_BB_INSNS (bb, insn)
-    {
-      rtx call, reg, symbol, second_call;
+      {
+	rtx call, reg, symbol, second_call;
 
-      second_call = 0;
-      call = mips_call_expr_from_insn (insn, &second_call);
-      if (!call)
-	continue;
-      gcc_assert (MEM_P (XEXP (call, 0)));
-      reg = XEXP (XEXP (call, 0), 0);
-      if (!REG_P (reg))
-	continue;
+	second_call = 0;
+	call = mips_call_expr_from_insn (insn, &second_call);
+	if (!call)
+	  continue;
+	gcc_assert (MEM_P (XEXP (call, 0)));
+	reg = XEXP (XEXP (call, 0), 0);
+	if (!REG_P (reg))
+	  continue;
 
-      symbol = mips_find_pic_call_symbol (insn, reg, true);
-      if (symbol)
-	{
-	  mips_annotate_pic_call_expr (call, symbol);
-	  if (second_call)
-	    mips_annotate_pic_call_expr (second_call, symbol);
-	}
-    }
+	symbol = mips_find_pic_call_symbol (insn, reg, true);
+	if (symbol)
+	  {
+	    mips_annotate_pic_call_expr (call, symbol);
+	    if (second_call)
+	      mips_annotate_pic_call_expr (second_call, symbol);
+	  }
+      }
 }
 
 /* A temporary variable used by for_each_rtx callbacks, etc.  */
Index: gcc/config/mn10300/mn10300.c
===================================================================
--- gcc/config/mn10300/mn10300.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/config/mn10300/mn10300.c	2014-06-07 18:51:38.372968094 +0100
@@ -3216,8 +3216,6 @@  mn10300_insert_setlb_lcc (rtx label, rtx
 static bool
 mn10300_block_contains_call (basic_block block)
 {
-  rtx insn;
-
   FOR_BB_INSNS (block, insn)
     if (CALL_P (insn))
       return true;
Index: gcc/config/s390/s390.c
===================================================================
--- gcc/config/s390/s390.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/config/s390/s390.c	2014-06-07 18:51:38.375968118 +0100
@@ -7419,7 +7419,6 @@  s390_reg_clobbered_rtx (rtx setreg, cons
 s390_regs_ever_clobbered (char regs_ever_clobbered[])
 {
   basic_block cur_bb;
-  rtx cur_insn;
   unsigned int i;
 
   memset (regs_ever_clobbered, 0, 32);
@@ -7968,7 +7967,6 @@  s390_optimize_nonescaping_tx (void)
   basic_block tbegin_bb = NULL;
   basic_block tend_bb = NULL;
   basic_block bb;
-  rtx insn;
   bool result = true;
   int bb_index;
   rtx tbegin_insn = NULL_RTX;
Index: gcc/lra-lives.c
===================================================================
--- gcc/lra-lives.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/lra-lives.c	2014-06-07 18:51:38.417968445 +0100
@@ -519,21 +519,22 @@  process_bb_lives (basic_block bb, int &c
      FOO will remain live until the beginning of the block.  Likewise
      if FOO is not set at all.	This is unnecessarily pessimistic, but
      it probably doesn't matter much in practice.  */
-  FOR_BB_INSNS_REVERSE (bb, curr_insn)
+  FOR_BB_INSNS_REVERSE (bb, insn)
     {
       bool call_p;
       int dst_regno, src_regno;
       rtx set;
       struct lra_insn_reg *reg;
 
-      if (!NONDEBUG_INSN_P (curr_insn))
+      if (!NONDEBUG_INSN_P (insn))
 	continue;
 
-      curr_id = lra_get_insn_recog_data (curr_insn);
+      curr_insn = insn;
+      curr_id = lra_get_insn_recog_data (insn);
       curr_static_id = curr_id->insn_static_data;
       if (lra_dump_file != NULL)
 	fprintf (lra_dump_file, "   Insn %u: point = %d\n",
-		 INSN_UID (curr_insn), curr_point);
+		 INSN_UID (insn), curr_point);
 
       /* Update max ref width and hard reg usage.  */
       for (reg = curr_id->regs; reg != NULL; reg = reg->next)
@@ -544,9 +545,9 @@  process_bb_lives (basic_block bb, int &c
 	else if (reg->regno < FIRST_PSEUDO_REGISTER)
 	  lra_hard_reg_usage[reg->regno] += freq;
 
-      call_p = CALL_P (curr_insn);
+      call_p = CALL_P (insn);
       if (complete_info_p
-	  && (set = single_set (curr_insn)) != NULL_RTX
+	  && (set = single_set (insn)) != NULL_RTX
 	  && REG_P (SET_DEST (set)) && REG_P (SET_SRC (set))
 	  /* Check that source regno does not conflict with
 	     destination regno to exclude most impossible
@@ -627,7 +628,7 @@  process_bb_lives (basic_block bb, int &c
 	  if (flag_use_caller_save)
 	    {
 	      HARD_REG_SET this_call_used_reg_set;
-	      get_call_reg_set_usage (curr_insn, &this_call_used_reg_set,
+	      get_call_reg_set_usage (insn, &this_call_used_reg_set,
 				      call_used_reg_set);
 
 	      EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, j)
@@ -638,7 +639,7 @@  process_bb_lives (basic_block bb, int &c
 	  sparseset_ior (pseudos_live_through_calls,
 			 pseudos_live_through_calls, pseudos_live);
 	  if (cfun->has_nonlocal_label
-	      || find_reg_note (curr_insn, REG_SETJMP,
+	      || find_reg_note (insn, REG_SETJMP,
 				NULL_RTX) != NULL_RTX)
 	    sparseset_ior (pseudos_live_through_setjumps,
 			   pseudos_live_through_setjumps, pseudos_live);
@@ -688,7 +689,7 @@  process_bb_lives (basic_block bb, int &c
 	next_program_point (curr_point, freq);
 
       /* Update notes.	*/
-      for (link_loc = &REG_NOTES (curr_insn); (link = *link_loc) != NULL_RTX;)
+      for (link_loc = &REG_NOTES (insn); (link = *link_loc) != NULL_RTX;)
 	{
 	  if (REG_NOTE_KIND (link) != REG_DEAD
 	      && REG_NOTE_KIND (link) != REG_UNUSED)
@@ -712,9 +713,9 @@  process_bb_lives (basic_block bb, int &c
 	  link_loc = &XEXP (link, 1);
 	}
       EXECUTE_IF_SET_IN_SPARSESET (dead_set, j)
-	add_reg_note (curr_insn, REG_DEAD, regno_reg_rtx[j]);
+	add_reg_note (insn, REG_DEAD, regno_reg_rtx[j]);
       EXECUTE_IF_SET_IN_SPARSESET (unused_set, j)
-	add_reg_note (curr_insn, REG_UNUSED, regno_reg_rtx[j]);
+	add_reg_note (insn, REG_UNUSED, regno_reg_rtx[j]);
     }
 
 #ifdef EH_RETURN_DATA_REGNO
Index: gcc/loop-unroll.c
===================================================================
--- gcc/loop-unroll.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/loop-unroll.c	2014-06-07 18:51:38.392968250 +0100
@@ -324,7 +324,6 @@  unroll_and_peel_loops (int flags)
 loop_exit_at_end_p (struct loop *loop)
 {
   struct niter_desc *desc = get_simple_loop_desc (loop);
-  rtx insn;
 
   if (desc->in_edge->dest != loop->latch)
     return false;
@@ -1689,7 +1688,6 @@  referenced_in_one_insn_in_loop_p (struct
   basic_block *body, bb;
   unsigned i;
   int count_ref = 0;
-  rtx insn;
 
   body = get_loop_body (loop);
   for (i = 0; i < loop->num_nodes; i++)
@@ -1715,7 +1713,6 @@  reset_debug_uses_in_loop (struct loop *l
 {
   basic_block *body, bb;
   unsigned i;
-  rtx insn;
 
   body = get_loop_body (loop);
   for (i = 0; debug_uses && i < loop->num_nodes; i++)
@@ -1959,7 +1956,6 @@  analyze_insns_in_loop (struct loop *loop
   basic_block *body, bb;
   unsigned i;
   struct opt_info *opt_info = XCNEW (struct opt_info);
-  rtx insn;
   struct iv_to_split *ivts = NULL;
   struct var_to_expand *ves = NULL;
   iv_to_split **slot1;
@@ -2398,7 +2394,7 @@  apply_opt_in_copies (struct opt_info *op
 {
   unsigned i, delta;
   basic_block bb, orig_bb;
-  rtx insn, orig_insn, next;
+  rtx orig_insn, next;
   struct iv_to_split ivts_templ, *ivts;
   struct var_to_expand ve_templ, *ves;
 
@@ -2424,7 +2420,7 @@  apply_opt_in_copies (struct opt_info *op
 					unrolling);
       bb->aux = 0;
       orig_insn = BB_HEAD (orig_bb);
-      FOR_BB_INSNS_SAFE (bb, insn, next)
+      FOR_BB_INSNS (bb, insn)
         {
 	  if (!INSN_P (insn)
 	      || (DEBUG_INSN_P (insn)
Index: gcc/auto-inc-dec.c
===================================================================
--- gcc/auto-inc-dec.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/auto-inc-dec.c	2014-06-07 18:51:38.406968359 +0100
@@ -1333,14 +1333,12 @@  find_mem (rtx *address_of_x)
 static void
 merge_in_block (int max_reg, basic_block bb)
 {
-  rtx insn;
-  rtx curr;
   int success_in_block = 0;
 
   if (dump_file)
     fprintf (dump_file, "\n\nstarting bb %d\n", bb->index);
 
-  FOR_BB_INSNS_REVERSE_SAFE (bb, insn, curr)
+  FOR_BB_INSNS_REVERSE (bb, insn)
     {
       unsigned int uid = INSN_UID (insn);
       bool insn_is_add_or_inc = true;
Index: gcc/lra-coalesce.c
===================================================================
--- gcc/lra-coalesce.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/lra-coalesce.c	2014-06-07 18:51:38.415968429 +0100
@@ -219,7 +219,7 @@  coalescable_pseudo_p (int regno)
 lra_coalesce (void)
 {
   basic_block bb;
-  rtx mv, set, insn, next, *sorted_moves;
+  rtx mv, set, *sorted_moves;
   int i, mv_num, sregno, dregno;
   unsigned int regno;
   int coalesced_moves;
@@ -244,7 +244,7 @@  lra_coalesce (void)
   coalesced_moves = 0;
   FOR_EACH_BB_FN (bb, cfun)
     {
-      FOR_BB_INSNS_SAFE (bb, insn, next)
+      FOR_BB_INSNS (bb, insn)
 	if (INSN_P (insn)
 	    && (set = single_set (insn)) != NULL_RTX
 	    && REG_P (SET_DEST (set)) && REG_P (SET_SRC (set))
@@ -304,7 +304,7 @@  lra_coalesce (void)
     {
       update_live_info (df_get_live_in (bb));
       update_live_info (df_get_live_out (bb));
-      FOR_BB_INSNS_SAFE (bb, insn, next)
+      FOR_BB_INSNS (bb, insn)
 	if (INSN_P (insn)
 	    && bitmap_bit_p (&involved_insns_bitmap, INSN_UID (insn)))
 	  {
Index: gcc/var-tracking.c
===================================================================
--- gcc/var-tracking.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/var-tracking.c	2014-06-07 18:51:38.419968460 +0100
@@ -10182,14 +10182,13 @@  static int debug_label_num = 1;
 delete_debug_insns (void)
 {
   basic_block bb;
-  rtx insn, next;
 
   if (!MAY_HAVE_DEBUG_INSNS)
     return;
 
   FOR_EACH_BB_FN (bb, cfun)
     {
-      FOR_BB_INSNS_SAFE (bb, insn, next)
+      FOR_BB_INSNS (bb, insn)
 	if (DEBUG_INSN_P (insn))
 	  {
 	    tree decl = INSN_VAR_LOCATION_DECL (insn);
Index: gcc/shrink-wrap.c
===================================================================
--- gcc/shrink-wrap.c	2014-06-07 18:23:27.320167705 +0100
+++ gcc/shrink-wrap.c	2014-06-07 18:51:38.404968344 +0100
@@ -331,7 +331,7 @@  move_insn_for_shrink_wrap (basic_block b
 void
 prepare_shrink_wrap (basic_block entry_block)
 {
-  rtx insn, curr, x;
+  rtx x;
   HARD_REG_SET uses, defs;
   df_ref *ref;
   bool split_p = false;
@@ -347,7 +347,7 @@  prepare_shrink_wrap (basic_block entry_b
 
   CLEAR_HARD_REG_SET (uses);
   CLEAR_HARD_REG_SET (defs);
-  FOR_BB_INSNS_REVERSE_SAFE (entry_block, insn, curr)
+  FOR_BB_INSNS_REVERSE (entry_block, insn)
     if (NONDEBUG_INSN_P (insn)
 	&& !move_insn_for_shrink_wrap (entry_block, insn, uses, defs,
 				       &split_p))
@@ -513,7 +513,6 @@  try_shrink_wrapping (edge *entry_edge, e
 
       FOR_EACH_BB_FN (bb, cfun)
 	{
-	  rtx insn;
 	  unsigned size = 0;
 
 	  FOR_BB_INSNS (bb, insn)
Index: gcc/dce.c
===================================================================
--- gcc/dce.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/dce.c	2014-06-07 18:51:38.413968413 +0100
@@ -509,10 +509,9 @@  remove_reg_equal_equiv_notes_for_defs (r
 reset_unmarked_insns_debug_uses (void)
 {
   basic_block bb;
-  rtx insn, next;
 
   FOR_EACH_BB_REVERSE_FN (bb, cfun)
-    FOR_BB_INSNS_REVERSE_SAFE (bb, insn, next)
+    FOR_BB_INSNS_REVERSE (bb, insn)
       if (DEBUG_INSN_P (insn))
 	{
 	  df_ref *use_rec;
@@ -547,11 +546,10 @@  reset_unmarked_insns_debug_uses (void)
 delete_unmarked_insns (void)
 {
   basic_block bb;
-  rtx insn, next;
   bool must_clean = false;
 
   FOR_EACH_BB_REVERSE_FN (bb, cfun)
-    FOR_BB_INSNS_REVERSE_SAFE (bb, insn, next)
+    FOR_BB_INSNS_REVERSE (bb, insn)
       if (NONDEBUG_INSN_P (insn))
 	{
 	  /* Always delete no-op moves.  */
@@ -614,7 +612,6 @@  delete_unmarked_insns (void)
 prescan_insns_for_dce (bool fast)
 {
   basic_block bb;
-  rtx insn, prev;
   bitmap arg_stores = NULL;
 
   if (dump_file)
@@ -625,7 +622,7 @@  prescan_insns_for_dce (bool fast)
 
   FOR_EACH_BB_FN (bb, cfun)
     {
-      FOR_BB_INSNS_REVERSE_SAFE (bb, insn, prev)
+      FOR_BB_INSNS_REVERSE (bb, insn)
 	if (NONDEBUG_INSN_P (insn))
 	  {
 	    /* Don't mark argument stores now.  They will be marked
@@ -839,7 +836,6 @@  word_dce_process_block (basic_block bb,
 			struct dead_debug_global *global_debug)
 {
   bitmap local_live = BITMAP_ALLOC (&dce_tmp_bitmap_obstack);
-  rtx insn;
   bool block_changed;
   struct dead_debug_local debug;
 
@@ -937,7 +933,6 @@  dce_process_block (basic_block bb, bool
 		   struct dead_debug_global *global_debug)
 {
   bitmap local_live = BITMAP_ALLOC (&dce_tmp_bitmap_obstack);
-  rtx insn;
   bool block_changed;
   df_ref *def_rec;
   struct dead_debug_local debug;
Index: gcc/lra-spills.c
===================================================================
--- gcc/lra-spills.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/lra-spills.c	2014-06-07 18:51:38.394968266 +0100
@@ -256,7 +256,7 @@  assign_spill_hard_regs (int *pseudo_regn
   enum reg_class rclass, spill_class;
   enum machine_mode mode;
   lra_live_range_t r;
-  rtx insn, set;
+  rtx set;
   basic_block bb;
   HARD_REG_SET conflict_hard_regs;
   bitmap_head ok_insn_bitmap;
@@ -463,7 +463,6 @@  remove_pseudos (rtx *loc, rtx insn)
 spill_pseudos (void)
 {
   basic_block bb;
-  rtx insn;
   int i;
   bitmap_head spilled_pseudos, changed_insns;
 
@@ -679,7 +678,6 @@  lra_final_code_change (void)
 {
   int i, hard_regno;
   basic_block bb;
-  rtx insn, curr;
   int max_regno = max_reg_num ();
 
   for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
@@ -687,7 +685,7 @@  lra_final_code_change (void)
 	&& (hard_regno = lra_get_regno_hard_regno (i)) >= 0)
       SET_REGNO (regno_reg_rtx[i], hard_regno);
   FOR_EACH_BB_FN (bb, cfun)
-    FOR_BB_INSNS_SAFE (bb, insn, curr)
+    FOR_BB_INSNS (bb, insn)
       if (INSN_P (insn))
 	{
 	  rtx pat = PATTERN (insn);
Index: gcc/config/c6x/c6x.c
===================================================================
--- gcc/config/c6x/c6x.c	2014-06-07 18:23:27.319167704 +0100
+++ gcc/config/c6x/c6x.c	2014-06-07 18:51:38.353967946 +0100
@@ -5383,7 +5383,6 @@  split_delayed_insns (void)
 conditionalize_after_sched (void)
 {
   basic_block bb;
-  rtx insn;
   FOR_EACH_BB_FN (bb, cfun)
     FOR_BB_INSNS (bb, insn)
       {
@@ -5427,7 +5426,6 @@  hwloop_pattern_reg (rtx insn)
 bb_earliest_end_cycle (basic_block bb, rtx ignore)
 {
   int earliest = 0;
-  rtx insn;
 
   FOR_BB_INSNS (bb, insn)
     {
@@ -5453,11 +5451,10 @@  bb_earliest_end_cycle (basic_block bb, r
 static void
 filter_insns_above (basic_block bb, int max_uid)
 {
-  rtx insn, next;
   bool prev_ti = false;
   int prev_cycle = -1;
 
-  FOR_BB_INSNS_SAFE (bb, insn, next)
+  FOR_BB_INSNS (bb, insn)
     {
       int this_cycle;
       if (!NONDEBUG_INSN_P (insn))