Message ID | 85b4098e-a72f-d013-ff17-8097971f71ba@linux.ibm.com
---|---
State | New
Series | [v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]
Hi,

Gentle ping for this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634201.html

BR,
Kewen

on 2023/10/25 10:45, Kewen.Lin wrote:
> Hi,
>
> This is almost a repost of v2, which was posted at [1] in March,
> except that:
>   1) it is rebased on r14-4810, which is relatively up to date;
>      some conflicts from the "int to bool" return type change
>      have been resolved;
>   2) the commit log is adjusted a bit;
>   3) a few instances of misspelled "articial" are fixed to
>      "artificial".
>
> --
> *v2 comments*:
>
> Addressing Alexander's comments, this patch v2 mainly makes the
> following changes against v1:
>
>   - Rename no_real_insns_p to no_real_nondebug_insns_p;
>   - Introduce enum rgn_bb_deps_free_action for the three kinds of
>     actions used to free deps;
>   - Change function free_deps_for_bb_no_real_insns_p into
>     resolve_forw_deps, which focuses only on forward deps;
>   - Extend the handling to cover dbg-cnt sched_block, and add one
>     test case for it;
>   - Move the free_trg_info call in schedule_region to a more
>     appropriate place.
>
> One thing I'm not sure about is the change in function
> sched_rgn_local_finish: currently the invocation of
> sched_rgn_local_free is guarded with !sel_sched_p (), so I just
> followed that, but the initialization of those structures (in
> sched_rgn_local_init) isn't guarded with !sel_sched_p (), which
> looks odd.
>
> --
>
> As PR108273 shows, when there is a block which has only NOTE_P and
> LABEL_P insns in non-debug mode but some extra DEBUG_INSN_P insns in
> debug mode, the DFA states after scheduling it differ between the
> two modes.  In non-debug mode the block satisfies no_real_insns_p,
> so it gets skipped; in debug mode it gets scheduled, and even though
> it has only NOTE_P, LABEL_P and DEBUG_INSN_P insns, the call to
> function advance_one_cycle changes the DFA state.  PR108519 also
> shows that this issue can be exposed by some scheduler changes.
>
> This patch changes function no_real_insns_p into function
> no_real_nondebug_insns_p by taking debug insns into account, which
> makes us not try to schedule blocks that have only NOTE_P, LABEL_P
> and DEBUG_INSN_P insns, resulting in consistent DFA states between
> non-debug and debug mode.
>
> Changing no_real_insns_p to no_real_nondebug_insns_p caused an ICE
> in free_block_dependencies.  The root cause is that we create
> dependencies for debug insns, and those dependencies are expected
> to be resolved during scheduling, but after this change they get
> skipped.  Looking at the code, it seems reasonable to skip the
> computation of block dependences for no_real_nondebug_insns_p
> blocks.  There is also another issue, exposed by SPEC2017 benchmark
> builds at -O2 -g: we could skip scheduling some block which already
> has its dependency graph built, so its dependencies are computed and
> rgn_n_insns accumulated; the later verification that the graph
> becomes exhausted by scheduling would then fail as follows:
>
>   /* Sanity check: verify that all region insns were scheduled.  */
>   gcc_assert (sched_rgn_n_insns == rgn_n_insns);
>
> , and some forward deps would also be left unresolved.
>
> As Alexander pointed out, the current debug count handling also
> suffers from a similar issue, so this patch handles these two cases
> together: one is for blocks skipped by !dbg_cnt (sched_block), the
> other is for blocks which are not no_real_nondebug_insns_p initially
> but become no_real_nondebug_insns_p due to speculative scheduling.
>
> This patch can be bootstrapped and regress-tested on
> x86_64-redhat-linux, aarch64-linux-gnu and
> powerpc64{,le}-linux-gnu.
>
> I also verified that this patch passes SPEC2017 intrate and fprate
> benchmark builds at -g -O2/-O3.
>
> Any thoughts?  Is it ok for trunk?
>
> [1] v2: https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614818.html
> [2] v1: https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614224.html
>
> BR,
> Kewen
> -----
> 	PR rtl-optimization/108273
>
> gcc/ChangeLog:
>
> 	* haifa-sched.cc (no_real_insns_p): Rename to ...
> 	(no_real_nondebug_insns_p): ... this, and consider DEBUG_INSN_P insn.
> 	* sched-ebb.cc (schedule_ebb): Replace no_real_insns_p with
> 	no_real_nondebug_insns_p.
> 	* sched-int.h (no_real_insns_p): Rename to ...
> 	(no_real_nondebug_insns_p): ... this.
> 	* sched-rgn.cc (enum rgn_bb_deps_free_action): New enum.
> 	(bb_deps_free_actions): New static variable.
> 	(compute_block_dependences): Skip for no_real_nondebug_insns_p.
> 	(resolve_forw_deps): New function.
> 	(free_block_dependencies): Check bb_deps_free_actions and call
> 	function resolve_forw_deps for RGN_BB_DEPS_FREE_ARTIFICIAL.
> 	(compute_priorities): Replace no_real_insns_p with
> 	no_real_nondebug_insns_p.
> 	(schedule_region): Replace no_real_insns_p with
> 	no_real_nondebug_insns_p, set RGN_BB_DEPS_FREE_ARTIFICIAL if the block
> 	gets dependencies computed before but is skipped now, and fix up the
> 	count sched_rgn_n_insns for it too.  Call free_trg_info when the block
> 	gets scheduled, and move sched_rgn_local_finish after the
> 	free_block_dependencies loop.
> 	(sched_rgn_local_init): Allocate and compute bb_deps_free_actions.
> 	(sched_rgn_local_finish): Free bb_deps_free_actions.
> 	* sel-sched.cc (sel_region_target_finish): Replace no_real_insns_p with
> 	no_real_nondebug_insns_p.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/powerpc/pr108273.c: New test.
> ---
>  gcc/haifa-sched.cc                          |   9 +-
>  gcc/sched-ebb.cc                            |   2 +-
>  gcc/sched-int.h                             |   2 +-
>  gcc/sched-rgn.cc                            | 148 +++++++++++++++-----
>  gcc/sel-sched.cc                            |   3 +-
>  gcc/testsuite/gcc.target/powerpc/pr108273.c |  26 ++++
>  6 files changed, 150 insertions(+), 40 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108273.c
>
> diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
> index 8e8add709b3..30cc90ec49f 100644
> --- a/gcc/haifa-sched.cc
> +++ b/gcc/haifa-sched.cc
> @@ -5033,14 +5033,17 @@ get_ebb_head_tail (basic_block beg, basic_block end,
>    *tailp = end_tail;
>  }
>
> -/* Return true if there are no real insns in the range [ HEAD, TAIL ].  */
> +/* Return true if there are no real nondebug insns in the range
> +   [ HEAD, TAIL ].  */
>
>  bool
> -no_real_insns_p (const rtx_insn *head, const rtx_insn *tail)
> +no_real_nondebug_insns_p (const rtx_insn *head, const rtx_insn *tail)
>  {
>    while (head != NEXT_INSN (tail))
>      {
> -      if (!NOTE_P (head) && !LABEL_P (head))
> +      if (!NOTE_P (head)
> +	  && !LABEL_P (head)
> +	  && !DEBUG_INSN_P (head))
>  	return false;
>        head = NEXT_INSN (head);
>      }
> diff --git a/gcc/sched-ebb.cc b/gcc/sched-ebb.cc
> index 110fcdbca4d..03d96290a7c 100644
> --- a/gcc/sched-ebb.cc
> +++ b/gcc/sched-ebb.cc
> @@ -491,7 +491,7 @@ schedule_ebb (rtx_insn *head, rtx_insn *tail, bool modulo_scheduling)
>    first_bb = BLOCK_FOR_INSN (head);
>    last_bb = BLOCK_FOR_INSN (tail);
>
> -  if (no_real_insns_p (head, tail))
> +  if (no_real_nondebug_insns_p (head, tail))
>      return BLOCK_FOR_INSN (tail);
>
>    gcc_assert (INSN_P (head) && INSN_P (tail));
> diff --git a/gcc/sched-int.h b/gcc/sched-int.h
> index 64a2f0bcff9..adca494ade5 100644
> --- a/gcc/sched-int.h
> +++ b/gcc/sched-int.h
> @@ -1397,7 +1397,7 @@ extern void free_global_sched_pressure_data (void);
>  extern int haifa_classify_insn (const_rtx);
>  extern void get_ebb_head_tail (basic_block, basic_block,
>  			       rtx_insn **, rtx_insn **);
> -extern bool no_real_insns_p (const rtx_insn *, const rtx_insn *);
> +extern bool no_real_nondebug_insns_p (const rtx_insn *, const rtx_insn *);
>
>  extern int insn_sched_cost (rtx_insn *);
>  extern int dep_cost_1 (dep_t, dw_t);
> diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
> index e5964f54ead..2549e834aa8 100644
> --- a/gcc/sched-rgn.cc
> +++ b/gcc/sched-rgn.cc
> @@ -213,6 +213,22 @@ static int rgn_nr_edges;
>  /* Array of size rgn_nr_edges.  */
>  static edge *rgn_edges;
>
> +/* Possible actions for dependencies freeing.  */
> +enum rgn_bb_deps_free_action
> +{
> +  /* This block doesn't get dependencies computed so don't need to free.  */
> +  RGN_BB_DEPS_FREE_NO,
> +  /* This block gets scheduled normally so free dependencies as usual.  */
> +  RGN_BB_DEPS_FREE_NORMAL,
> +  /* This block gets skipped in scheduling but has dependencies computed early,
> +     need to free the forward list artificially.  */
> +  RGN_BB_DEPS_FREE_ARTIFICIAL
> +};
> +
> +/* For basic block i, bb_deps_free_actions[i] indicates which action needs
> +   to be taken for freeing its dependencies.  */
> +static enum rgn_bb_deps_free_action *bb_deps_free_actions;
> +
>  /* Mapping from each edge in the graph to its number in the rgn.  */
>  #define EDGE_TO_BIT(edge) ((int)(size_t)(edge)->aux)
>  #define SET_EDGE_TO_BIT(edge,nr) ((edge)->aux = (void *)(size_t)(nr))
> @@ -2735,6 +2751,15 @@ compute_block_dependences (int bb)
>    gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
>    get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
>
> +  /* Don't compute block dependences if there are no real nondebug insns.  */
> +  if (no_real_nondebug_insns_p (head, tail))
> +    {
> +      if (current_nr_blocks > 1)
> +	propagate_deps (bb, &tmp_deps);
> +      free_deps (&tmp_deps);
> +      return;
> +    }
> +
>    sched_analyze (&tmp_deps, head, tail);
>
>    add_branch_dependences (head, tail);
> @@ -2749,6 +2774,24 @@ compute_block_dependences (int bb)
>      targetm.sched.dependencies_evaluation_hook (head, tail);
>  }
>
> +/* Artificially resolve forward dependencies for instructions HEAD to TAIL.  */
> +
> +static void
> +resolve_forw_deps (rtx_insn *head, rtx_insn *tail)
> +{
> +  rtx_insn *insn;
> +  rtx_insn *next_tail = NEXT_INSN (tail);
> +  sd_iterator_def sd_it;
> +  dep_t dep;
> +
> +  /* There could be some insns which get skipped in scheduling but we compute
> +     dependencies for them previously, so make them resolved.  */
> +  for (insn = head; insn != next_tail; insn = NEXT_INSN (insn))
> +    for (sd_it = sd_iterator_start (insn, SD_LIST_FORW);
> +	 sd_iterator_cond (&sd_it, &dep);)
> +      sd_resolve_dep (sd_it);
> +}
> +
>  /* Free dependencies of instructions inside BB.  */
>  static void
>  free_block_dependencies (int bb)
> @@ -2758,9 +2801,12 @@ free_block_dependencies (int bb)
>
>    get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
>
> -  if (no_real_insns_p (head, tail))
> +  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_NO)
>      return;
>
> +  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_ARTIFICIAL)
> +    resolve_forw_deps (head, tail);
> +
>    sched_free_deps (head, tail, true);
>  }
>
> @@ -3024,7 +3070,7 @@ compute_priorities (void)
>        gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
>        get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
>
> -      if (no_real_insns_p (head, tail))
> +      if (no_real_nondebug_insns_p (head, tail))
>  	continue;
>
>        rgn_n_insns += set_priorities (head, tail);
> @@ -3158,7 +3204,7 @@ schedule_region (int rgn)
>
>        get_ebb_head_tail (first_bb, last_bb, &head, &tail);
>
> -      if (no_real_insns_p (head, tail))
> +      if (no_real_nondebug_insns_p (head, tail))
>  	{
>  	  gcc_assert (first_bb == last_bb);
>  	  continue;
> @@ -3178,44 +3224,62 @@ schedule_region (int rgn)
>
>        get_ebb_head_tail (first_bb, last_bb, &head, &tail);
>
> -      if (no_real_insns_p (head, tail))
> +      if (no_real_nondebug_insns_p (head, tail))
>  	{
>  	  gcc_assert (first_bb == last_bb);
>  	  save_state_for_fallthru_edge (last_bb, bb_state[first_bb->index]);
> -	  continue;
> +
> +	  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_NO)
> +	    continue;
> +
> +	  /* As it's not no_real_nondebug_insns_p initially, then it has some
> +	     dependencies computed so free it artificially.  */
> +	  bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_ARTIFICIAL;
>  	}
> +      else
> +	{
> +	  current_sched_info->prev_head = PREV_INSN (head);
> +	  current_sched_info->next_tail = NEXT_INSN (tail);
>
> -      current_sched_info->prev_head = PREV_INSN (head);
> -      current_sched_info->next_tail = NEXT_INSN (tail);
> +	  remove_notes (head, tail);
>
> -      remove_notes (head, tail);
> +	  unlink_bb_notes (first_bb, last_bb);
>
> -      unlink_bb_notes (first_bb, last_bb);
> +	  target_bb = bb;
>
> -      target_bb = bb;
> +	  gcc_assert (flag_schedule_interblock || current_nr_blocks == 1);
> +	  current_sched_info->queue_must_finish_empty = current_nr_blocks == 1;
>
> -      gcc_assert (flag_schedule_interblock || current_nr_blocks == 1);
> -      current_sched_info->queue_must_finish_empty = current_nr_blocks == 1;
> +	  curr_bb = first_bb;
> +	  if (dbg_cnt (sched_block))
> +	    {
> +	      int saved_last_basic_block = last_basic_block_for_fn (cfun);
>
> -      curr_bb = first_bb;
> -      if (dbg_cnt (sched_block))
> -	{
> -	  int saved_last_basic_block = last_basic_block_for_fn (cfun);
> +	      schedule_block (&curr_bb, bb_state[first_bb->index]);
> +	      gcc_assert (EBB_FIRST_BB (bb) == first_bb);
> +	      sched_rgn_n_insns += sched_n_insns;
> +	      realloc_bb_state_array (saved_last_basic_block);
> +	      save_state_for_fallthru_edge (last_bb, curr_state);
>
> -	  schedule_block (&curr_bb, bb_state[first_bb->index]);
> -	  gcc_assert (EBB_FIRST_BB (bb) == first_bb);
> -	  sched_rgn_n_insns += sched_n_insns;
> -	  realloc_bb_state_array (saved_last_basic_block);
> -	  save_state_for_fallthru_edge (last_bb, curr_state);
> -	}
> -      else
> -	{
> -	  sched_rgn_n_insns += rgn_n_insns;
> -	}
> +	      /* Clean up.  */
> +	      if (current_nr_blocks > 1)
> +		free_trg_info ();
> +	    }
> +	  else
> +	    bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_ARTIFICIAL;
> +	}
>
> -      /* Clean up.  */
> -      if (current_nr_blocks > 1)
> -	free_trg_info ();
> +      /* We have counted this block when computing rgn_n_insns
> +	 previously, so need to fix up sched_rgn_n_insns now.  */
> +      if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_ARTIFICIAL)
> +	{
> +	  while (head != NEXT_INSN (tail))
> +	    {
> +	      if (INSN_P (head))
> +		sched_rgn_n_insns++;
> +	      head = NEXT_INSN (head);
> +	    }
> +	}
>      }
>
>    /* Sanity check: verify that all region insns were scheduled.  */
> @@ -3223,13 +3287,13 @@ schedule_region (int rgn)
>
>    sched_finish_ready_list ();
>
> -  /* Done with this region.  */
> -  sched_rgn_local_finish ();
> -
>    /* Free dependencies.  */
>    for (bb = 0; bb < current_nr_blocks; ++bb)
>      free_block_dependencies (bb);
>
> +  /* Done with this region.  */
> +  sched_rgn_local_finish ();
> +
>    gcc_assert (haifa_recovery_bb_ever_added_p
>  	      || deps_pools_are_empty_p ());
>  }
> @@ -3450,6 +3514,19 @@ sched_rgn_local_init (int rgn)
>  	    e->aux = NULL;
>  	}
>      }
> +
> +  /* Initialize bb_deps_free_actions.  */
> +  bb_deps_free_actions
> +    = XNEWVEC (enum rgn_bb_deps_free_action, current_nr_blocks);
> +  for (bb = 0; bb < current_nr_blocks; bb++)
> +    {
> +      rtx_insn *head, *tail;
> +      get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
> +      if (no_real_nondebug_insns_p (head, tail))
> +	bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_NO;
> +      else
> +	bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_NORMAL;
> +    }
>  }
>
>  /* Free data computed for the finished region.  */
> @@ -3467,9 +3544,12 @@ sched_rgn_local_free (void)
>  void
>  sched_rgn_local_finish (void)
>  {
> -  if (current_nr_blocks > 1 && !sel_sched_p ())
> +  if (!sel_sched_p ())
>      {
> -      sched_rgn_local_free ();
> +      if (current_nr_blocks > 1)
> +	sched_rgn_local_free ();
> +
> +      free (bb_deps_free_actions);
>      }
>  }
>
> diff --git a/gcc/sel-sched.cc b/gcc/sel-sched.cc
> index 1925f4a9461..8310c892e13 100644
> --- a/gcc/sel-sched.cc
> +++ b/gcc/sel-sched.cc
> @@ -7213,7 +7213,8 @@ sel_region_target_finish (bool reset_sched_cycles_p)
>
>        find_ebb_boundaries (EBB_FIRST_BB (i), scheduled_blocks);
>
> -      if (no_real_insns_p (current_sched_info->head, current_sched_info->tail))
> +      if (no_real_nondebug_insns_p (current_sched_info->head,
> +				    current_sched_info->tail))
>  	continue;
>
>        if (reset_sched_cycles_p)
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108273.c b/gcc/testsuite/gcc.target/powerpc/pr108273.c
> new file mode 100644
> index 00000000000..937224eaa69
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr108273.c
> @@ -0,0 +1,26 @@
> +/* { dg-options "-O2 -fdbg-cnt=sched_block:1" } */
> +/* { dg-prune-output {\*\*\*dbgcnt:.*limit.*reached} } */
> +
> +/* Verify there is no ICE.  */
> +
> +int a, b, c, e, f;
> +float d;
> +
> +void
> +g ()
> +{
> +  float h, i[1];
> +  for (; f;)
> +    if (c)
> +      {
> +	d *e;
> +	if (b)
> +	  {
> +	    float *j = i;
> +	    j[0] += 0;
> +	  }
> +	h += d;
> +      }
> +  if (h)
> +    a = i[0];
> +}
> --
> 2.39.1
"Kewen.Lin" <linkw@linux.ibm.com> writes:
> Hi,
>
> Gentle ping this:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634201.html

Sorry for the lack of review on this.  Personally, I've never looked
at this part of the code base in detail, so I don't think I can do a
proper review.  I'll try to have a look in stage 3 if no one more
qualified beats me to it.

Thanks,
Richard

> [full v3 patch quoted above; trimmed]
Hi Kewen,

Below are my comments. I don't want to override Alexander's review, and if the patch looks good to him, it's fine to ignore my concerns.

My main concern is that this adds a new entity -- forceful skipping of DEBUG_INSN-only basic blocks -- to the scheduler for a somewhat minor change in behavior. Unlike NOTEs and LABELs, DEBUG_INSNs are INSNS, and there is already quite a bit of logic in the scheduler to skip them _as part of normal operation_.

Would you please consider the two ideas below?

#1:
After a brief look, I'm guessing this part is causing the problem:
haifa-sched.cc:schedule_block():
=== [1]
  /* Loop until all the insns in BB are scheduled.  */
  while ((*current_sched_info->schedule_more_p) ())
    {
      perform_replacements_new_cycle ();
      do
        {
          start_clock_var = clock_var;

          clock_var++;

          advance_one_cycle ();
===
and then in the nested loop we have
=== [2]
          /* We don't want md sched reorder to even see debug isns, so put
             them out right away.  */
          if (ready.n_ready && DEBUG_INSN_P (ready_element (&ready, 0))
              && (*current_sched_info->schedule_more_p) ())
            {
              while (ready.n_ready && DEBUG_INSN_P (ready_element (&ready, 0)))
                {
                  rtx_insn *insn = ready_remove_first (&ready);
                  gcc_assert (DEBUG_INSN_P (insn));
                  (*current_sched_info->begin_schedule_ready) (insn);
                  scheduled_insns.safe_push (insn);
                  last_scheduled_insn = insn;
                  advance = schedule_insn (insn);
                  gcc_assert (advance == 0);
                  if (ready.n_ready > 0)
                    ready_sort (&ready);
                }
            }
===
At point [1] we already have a sorted ready list, and I don't see any blockers to doing [2] before calling advance_one_cycle().

#2:
Another approach, which might be even easier, is to save the DFA state before the initial advance_one_cycle(), and then restore it if no real insns have been scheduled.
Kind regards,

--
Maxim Kuvyrkov
https://www.linaro.org
On Thu, 9 Nov 2023, Maxim Kuvyrkov wrote:

> Hi Kewen,
>
> Below are my comments. I don't want to override Alexander's review, and if
> the patch looks good to him, it's fine to ignore my concerns.
>
> My main concern is that this adds a new entity -- forceful skipping of
> DEBUG_INSN-only basic blocks -- to the scheduler for a somewhat minor change
> in behavior. Unlike NOTEs and LABELs, DEBUG_INSNs are INSNS, and there is
> already quite a bit of logic in the scheduler to skip them _as part of normal
> operation_.

I agree with the concern. I hoped that solving the problem by skipping the BB, like the (bit-rotted) debug code needs to, would be a minor surgery. As things look now, it may be better to remove the non-working sched_block debug counter entirely and implement a good solution for the problem at hand.

>
> Would you please consider the two ideas below?
>
> #1:
> After a brief look, I'm guessing this part is causing the problem:
> haifa-sched.cc:schedule_block():
> === [1]
>   /* Loop until all the insns in BB are scheduled.  */
>   while ((*current_sched_info->schedule_more_p) ())
>     {
>       perform_replacements_new_cycle ();
>       do
>         {
>           start_clock_var = clock_var;
>
>           clock_var++;
>
>           advance_one_cycle ();

As I understand, we have spurious calls to advance_one_cycle on basic block boundaries, which don't model the hardware (the CPU doesn't see BB boundaries) and cause divergence when passing through a debug-only BB which would not be present at all without -g.

Since EBBs and regions may not have jump targets in the middle, advancing a cycle on BB boundaries does not seem well motivated. Can we remove it?

Can we teach haifa-sched to emit RTX NOTEs with hashes of DFA states on BB boundaries when -fcompare-debug is enabled? It should make the problem readily detectable by -fcompare-debug even when scheduling did not diverge.

Alexander
Hi Maxim and Alexander,

Thanks a lot for the review comments!

on 2023/11/10 01:40, Alexander Monakov wrote:
>
> On Thu, 9 Nov 2023, Maxim Kuvyrkov wrote:
>
>> Hi Kewen,
>>
>> Below are my comments. I don't want to override Alexander's review, and if
>> the patch looks good to him, it's fine to ignore my concerns.
>>
>> My main concern is that this adds a new entity -- forceful skipping of
>> DEBUG_INSN-only basic blocks -- to the scheduler for a somewhat minor change
>> in behavior. Unlike NOTEs and LABELs, DEBUG_INSNs are INSNS, and there is
>> already quite a bit of logic in the scheduler to skip them _as part of normal
>> operation_.

Yeah, I noticed that the scheduler takes care of DEBUG_INSNs as normal operations. When I started to work on this issue, initially I wanted to try something similar to your idea #2, but when checking the APIs, I realized why not just skip the basic block with NOTEs and LABELs, DEBUG_INSNs as well. IMHO there is no value to try to schedule this kind of BB (to be scheduled range), skipping it can save some resource allocation (like block dependencies) and make it more efficient (not enter function schedule_block etc.), from this perspective it seems an enhancement. Does it sound reasonable to you?

> I agree with the concern. I hoped that solving the problem by skipping the BB,
> like the (bit-rotted) debug code needs to, would be a minor surgery. As things
> look now, it may be better to remove the non-working sched_block debug counter
> entirely and implement a good solution for the problem at hand.

OK, if the sched_block debug counter is useless and can be removed, then the proposed new skipping becomes the only actual need for the artificial resolve_forw_deps.

>
>> Would you please consider the two ideas below?
>>
>> #1:
>> After a brief look, I'm guessing this part is causing the problem:
>> haifa-sched.cc:schedule_block():
>> === [1]
>>   /* Loop until all the insns in BB are scheduled.  */
>>   while ((*current_sched_info->schedule_more_p) ())
>>     {
>>       perform_replacements_new_cycle ();
>>       do
>>         {
>>           start_clock_var = clock_var;
>>
>>           clock_var++;
>>
>>           advance_one_cycle ();
>
> As I understand, we have spurious calls to advance_one_cycle on basic block
> boundaries, which don't model the hardware (the CPU doesn't see BB boundaries)
> and cause divergence when passing through a debug-only BB which would not be
> present at all without -g.
>
> Since EBBs and regions may not have jump targets in the middle, advancing
> a cycle on BB boundaries does not seem well motivated. Can we remove it?
>
> Can we teach haifa-sched to emit RTX NOTEs with hashes of DFA states on BB
> boundaries when -fcompare-debug is enabled? It should make the problem
> readily detectable by -fcompare-debug even when scheduling did not diverge.

Good idea! It would be easy to detect the inconsistency with such a note.

BR,
Kewen
On 11/9/23 10:40, Alexander Monakov wrote:
>
> On Thu, 9 Nov 2023, Maxim Kuvyrkov wrote:
>
>> Hi Kewen,
>>
>> Below are my comments. I don't want to override Alexander's review, and if
>> the patch looks good to him, it's fine to ignore my concerns.
>>
>> My main concern is that this adds a new entity -- forceful skipping of
>> DEBUG_INSN-only basic blocks -- to the scheduler for a somewhat minor change
>> in behavior. Unlike NOTEs and LABELs, DEBUG_INSNs are INSNS, and there is
>> already quite a bit of logic in the scheduler to skip them _as part of normal
>> operation_.
>
> I agree with the concern. I hoped that solving the problem by skipping the BB,
> like the (bit-rotted) debug code needs to, would be a minor surgery. As things
> look now, it may be better to remove the non-working sched_block debug counter
> entirely and implement a good solution for the problem at hand.

I wouldn't lose sleep over this -- if removing a debug counter from this code simplifies the implementation, that's fine with me. I believe they were all added when significant work in the scheduler was more common.

>
> Can we teach haifa-sched to emit RTX NOTEs with hashes of DFA states on BB
> boundaries when -fcompare-debug is enabled? It should make the problem
> readily detectable by -fcompare-debug even when scheduling did not diverge.

I like it.

jeff
On 11/9/23 18:57, Kewen.Lin wrote: > Hi Maxim and Alexander, > > Thanks a lot for the review comments! > > on 2023/11/10 01:40, Alexander Monakov wrote: >> >> On Thu, 9 Nov 2023, Maxim Kuvyrkov wrote: >> >>> Hi Kewen, >>> >>> Below are my comments. I don't want to override Alexander's review, and if >>> the patch looks good to him, it's fine to ignore my concerns. >>> >>> My main concern is that this adds a new entity -- forceful skipping of >>> DEBUG_INSN-only basic blocks -- to the scheduler for a somewhat minor change >>> in behavior. Unlike NOTEs and LABELs, DEBUG_INSNs are INSNS, and there is >>> already quite a bit of logic in the scheduler to skip them _as part of normal >>> operation_. > > Yeah, I noticed that the scheduler takes care of DEBUG_INSNs as normal operations. > When I started to work on this issue, initially I wanted to try something similar > to your idea #2, but when checking the APIs, I realized why not just skip the basic > block with NOTEs and LABELs, DEBUG_INSNs as well. IMHO there is no value to try to > schedule this kind of BB (to be scheduled range), skipping it can save some resource > allocation (like block dependencies) and make it more efficient (not enter function > schedule_block etc.), from this perspective it seems an enhancement. Does it sound > reasonable to you? It sounds reasonable, but only if doing so doesn't add significant implementation complexity. ie, the gains from doing less work here are likely to be very marginal, so I'm more interested in clean, easy to maintain code. Jeff
On Thu, 9 Nov 2023, Jeff Law wrote: > > Yeah, I noticed that the scheduler takes care of DEBUG_INSNs as normal > > operations. When I started to work on this issue, initially I wanted to try > > something similar to your idea #2, but when checking the APIs, I realized > > why not just skip the basic block with NOTEs and LABELs, DEBUG_INSNs as > > well. IMHO there is no value to try to schedule this kind of BB (to be > > scheduled range), skipping it can save some resource allocation (like block > > dependencies) and make it more efficient (not enter function schedule_block > > etc.), from this perspective it seems an enhancement. Does it sound > > reasonable to you? > It sounds reasonable, but only if doing so doesn't add significant > implementation complexity. ie, the gains from doing less work here are likely > to be very marginal, so I'm more interested in clean, easy to maintain code. I'm afraid ignoring debug-only BBs goes contrary to overall var-tracking design: DEBUG_INSNs participate in dependency graph so that schedulers can remove or mutate them as needed when moving real insns across them. Cc'ing Alexandre Oliva who can correct me on that if necessary. Alexander
On Fri, Nov 10, 2023 at 12:25 PM Alexander Monakov <amonakov@ispras.ru> wrote: > > > On Thu, 9 Nov 2023, Jeff Law wrote: > > > > Yeah, I noticed that the scheduler takes care of DEBUG_INSNs as normal > > > operations. When I started to work on this issue, initially I wanted to try > > > something similar to your idea #2, but when checking the APIs, I realized > > > why not just skip the basic block with NOTEs and LABELs, DEBUG_INSNs as > > > well. IMHO there is no value to try to schedule this kind of BB (to be > > > scheduled range), skipping it can save some resource allocation (like block > > > dependencies) and make it more efficient (not enter function schedule_block > > > etc.), from this perspective it seems an enhancement. Does it sound > > > reasonable to you? > > It sounds reasonable, but only if doing so doesn't add significant > > implementation complexity. ie, the gains from doing less work here are likely > > to be very marginal, so I'm more interested in clean, easy to maintain code. > > I'm afraid ignoring debug-only BBs goes contrary to overall var-tracking design: > DEBUG_INSNs participate in dependency graph so that schedulers can remove or > mutate them as needed when moving real insns across them. Note that debug-only BBs do not exist - the BB would be there even without debug insns! So instead you have to handle BBs with just debug insns the same you handle a completely empty BB. > Cc'ing Alexandre Oliva who can correct me on that if necessary. > > Alexander
On Fri, 10 Nov 2023, Richard Biener wrote: > > I'm afraid ignoring debug-only BBs goes contrary to overall var-tracking design: > > DEBUG_INSNs participate in dependency graph so that schedulers can remove or > > mutate them as needed when moving real insns across them. > > Note that debug-only BBs do not exist - the BB would be there even without debug > insns! Yep, sorry, I misspoke when I earlier said >> and cause divergence when passing through a debug-only BB which would not be >> present at all without -g. They are present in the region, but skipped via no_real_insns_p. > So instead you have to handle BBs with just debug insns the same you > handle a completely empty BB. Yeah. There would be no problem if the scheduler never used no_real_insns_p and handled empty and non-empty BBs the same way. Alexander
On Fri, Nov 10, 2023 at 3:18 PM Alexander Monakov <amonakov@ispras.ru> wrote: > > > On Fri, 10 Nov 2023, Richard Biener wrote: > > > > I'm afraid ignoring debug-only BBs goes contrary to overall var-tracking design: > > > DEBUG_INSNs participate in dependency graph so that schedulers can remove or > > > mutate them as needed when moving real insns across them. > > > > Note that debug-only BBs do not exist - the BB would be there even without debug > > insns! > > Yep, sorry, I misspoke when I earlier said > > >> and cause divergence when passing through a debug-only BB which would not be > >> present at all without -g. > > They are present in the region, but skipped via no_real_insns_p. > > > So instead you have to handle BBs with just debug insns the same you > > handle a completely empty BB. > > Yeah. There would be no problem if the scheduler never used no_real_insns_p > and handled empty and non-empty BBs the same way. And I suppose it would be OK to do that. Empty BBs are usually removed by CFG cleanup so the situation should only happen in rare corner cases where the fix would be to actually run CFG cleanup ... Richard. > Alexander
On Fri, 10 Nov 2023, Richard Biener wrote: > On Fri, Nov 10, 2023 at 3:18 PM Alexander Monakov <amonakov@ispras.ru> wrote: > > > > > > On Fri, 10 Nov 2023, Richard Biener wrote: > > > > > > I'm afraid ignoring debug-only BBs goes contrary to overall var-tracking design: > > > > DEBUG_INSNs participate in dependency graph so that schedulers can remove or > > > > mutate them as needed when moving real insns across them. > > > > > > Note that debug-only BBs do not exist - the BB would be there even without debug > > > insns! > > > > Yep, sorry, I misspoke when I earlier said > > > > >> and cause divergence when passing through a debug-only BB which would not be > > >> present at all without -g. > > > > They are present in the region, but skipped via no_real_insns_p. > > > > > So instead you have to handle BBs with just debug insns the same you > > > handle a completely empty BB. > > > > Yeah. There would be no problem if the scheduler never used no_real_insns_p > > and handled empty and non-empty BBs the same way. > > And I suppose it would be OK to do that. Empty BBs are usually removed by > CFG cleanup so the situation should only happen in rare corner cases where > the fix would be to actually run CFG cleanup ... Yeah, sel-sched invokes 'cfg_cleanup (0)' up front, and I suppose that may be a preferable compromise for sched-rgn as well. I'm afraid one does not simply remove all uses of no_real_insns_p from sched-rgn, but would be happy to be wrong about that. Alexander
Hi Alexander/Richard/Jeff,

Thanks for the insightful comments!

on 2023/11/10 22:41, Alexander Monakov wrote:
>
> On Fri, 10 Nov 2023, Richard Biener wrote:
>
>> On Fri, Nov 10, 2023 at 3:18 PM Alexander Monakov <amonakov@ispras.ru> wrote:
>>>
>>>
>>> On Fri, 10 Nov 2023, Richard Biener wrote:
>>>
>>>>> I'm afraid ignoring debug-only BBs goes contrary to overall var-tracking design:
>>>>> DEBUG_INSNs participate in dependency graph so that schedulers can remove or
>>>>> mutate them as needed when moving real insns across them.
>>>>
>>>> Note that debug-only BBs do not exist - the BB would be there even without debug
>>>> insns!
>>>
>>> Yep, sorry, I misspoke when I earlier said
>>>
>>>>> and cause divergence when passing through a debug-only BB which would not be
>>>>> present at all without -g.
>>>
>>> They are present in the region, but skipped via no_real_insns_p.
>>>> So instead you have to handle BBs with just debug insns the same you
>>>> handle a completely empty BB.
>>>
>>> Yeah. There would be no problem if the scheduler never used no_real_insns_p
>>> and handled empty and non-empty BBs the same way.
>>
>> And I suppose it would be OK to do that.  Empty BBs are usually removed by
>> CFG cleanup so the situation should only happen in rare corner cases where
>> the fix would be to actually run CFG cleanup ...
>
> Yeah, sel-sched invokes 'cfg_cleanup (0)' up front, and I suppose that
> may be a preferable compromise for sched-rgn as well.
>

Inspired by this discussion, I tested the attached patch 1 which is to run
cleanup_cfg (0) first in haifa_sched_init; it's bootstrapped and
regress-tested on x86_64-redhat-linux and powerpc64{,le}-linux-gnu.

Then I assumed some of the current uses of no_real_insns_p won't encounter
empty blocks any more, so made a patch 2 with some explicit assertions, but
unfortunately I got ICEs during bootstrapping, happening in function
compute_priorities.
I'm going to investigate it further and post more findings, but just a
heads-up to ensure this is on the right track.

BR,
Kewen

From 7652655f278cfe0f6271c50aecb56e68e0877cc2 Mon Sep 17 00:00:00 2001
From: Kewen Lin <linkw@linux.ibm.com>
Date: Tue, 14 Nov 2023 15:16:47 +0800
Subject: [PATCH] sched: cleanup cfg for empty blocks first in
 haifa_sched_init [PR108273]

PR108273 exposed an inconsistent-states issue between non-debug mode and
debug mode; as in the discussion in [1], we can follow the current
practice in sel_global_init and run cleanup_cfg (0) first to remove
empty blocks.  This patch follows that direction and removes empty
blocks with cleanup_cfg (0) in haifa_sched_init, which affects sms, ebb
and rgn schedulings.

	PR rtl-optimization/108273

gcc/ChangeLog:

	* haifa-sched.cc (haifa_sched_init): Call cleanup_cfg (0) to
	remove empty blocks.
---
 gcc/haifa-sched.cc | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
index 8e8add709b3..e348d1a2119 100644
--- a/gcc/haifa-sched.cc
+++ b/gcc/haifa-sched.cc
@@ -7375,6 +7375,12 @@ haifa_sched_init (void)
       sched_deps_info->generate_spec_deps = 1;
     }
 
+  /* Remove empty blocks to avoid some inconsistency like: we skip
+     empty block in scheduling but don't for empty block + only
+     debug_insn, it could result in different subsequent states
+     and unexpected insn sequence difference.  */
+  cleanup_cfg (0);
+
   /* Initialize luids, dependency caches, target and h_i_d for the
      whole function.  */
   {
On Wed, 15 Nov 2023, Kewen.Lin wrote:

> >> And I suppose it would be OK to do that. Empty BBs are usually removed by
> >> CFG cleanup so the situation should only happen in rare corner cases where
> >> the fix would be to actually run CFG cleanup ...
> >
> > Yeah, sel-sched invokes 'cfg_cleanup (0)' up front, and I suppose that
> > may be a preferable compromise for sched-rgn as well.
>
> Inspired by this discussion, I tested the attached patch 1 which is to run
> cleanup_cfg (0) first in haifa_sched_init, it's bootstrapped and
> regress-tested on x86_64-redhat-linux and powerpc64{,le}-linux-gnu.

I don't think you can run cleanup_cfg after sched_init. I would suggest
to put it early in schedule_insns.

> Then I assumed some of the current uses of no_real_insns_p won't encounter
> empty blocks any more, so made a patch 2 with some explicit assertions, but
> unfortunately I got ICEs during bootstrapping happens in function
> compute_priorities. I'm going to investigate it further and post more
> findings, but just heads-up to ensure if this is on the right track.

I suspect this may be caused by invoking cleanup_cfg too late.

Alexander
on 2023/11/15 17:43, Alexander Monakov wrote:
>
> On Wed, 15 Nov 2023, Kewen.Lin wrote:
>
>>>> And I suppose it would be OK to do that.  Empty BBs are usually removed by
>>>> CFG cleanup so the situation should only happen in rare corner cases where
>>>> the fix would be to actually run CFG cleanup ...
>>>
>>> Yeah, sel-sched invokes 'cfg_cleanup (0)' up front, and I suppose that
>>> may be a preferable compromise for sched-rgn as well.
>>
>> Inspired by this discussion, I tested the attached patch 1 which is to run
>> cleanup_cfg (0) first in haifa_sched_init, it's bootstrapped and
>> regress-tested on x86_64-redhat-linux and powerpc64{,le}-linux-gnu.
>
> I don't think you can run cleanup_cfg after sched_init. I would suggest
> to put it early in schedule_insns.

Thanks for the suggestion.  I placed it at the beginning of haifa_sched_init
instead, since schedule_insns invokes haifa_sched_init; although the calls
to rgn_setup_common_sched_info and rgn_setup_sched_infos are executed ahead,
they are all "setup" functions and shouldn't affect or be affected by this
placement.

>
>> Then I assumed some of the current uses of no_real_insns_p won't encounter
>> empty blocks any more, so made a patch 2 with some explicit assertions, but
>> unfortunately I got ICEs during bootstrapping happens in function
>> compute_priorities. I'm going to investigate it further and post more
>> findings, but just heads-up to ensure if this is on the right track.
>
> I suspect this may be caused by invoking cleanup_cfg too late.

By looking into some failures, I found that although cleanup_cfg is executed,
there would still be some empty blocks left; by analyzing a few failures,
there are at least such cases:
1. empty function body
2. block holding a label for return.
3. block without any successor.
4. block which becomes empty after scheduling some other block.
5. block which looks mergeable with its always successor but left.
...
For 1,2, there is one single successor EXIT block, I think they don't affect state transition, for 3, it's the same. For 4, it depends on if we can have the assumption this kind of empty block doesn't have the chance to have debug insn (like associated debug insn should be moved along), I'm not sure. For 5, a reduced test case is: #include <istream> namespace std { template <> basic_istream<char> & basic_istream<char>::getline (char_type *, streamsize, char_type) __try { __streambuf_type *a = rdbuf (); a->sgetc (); while (traits_type::eq_int_type) ; } __catch (...) {} } // namespace std slim RTL: 1: NOTE_INSN_DELETED 7: NOTE_INSN_BASIC_BLOCK 2 ... 15: r137:CCUNS=cmp(r135:DI,r136:DI) REG_DEAD r136:DI REG_DEAD r135:DI 16: pc={(ltu(r137:CCUNS,0))?L24:pc} REG_DEAD r137:CCUNS REG_BR_PROB 966367644 17: NOTE_INSN_BASIC_BLOCK 3 18: r138:DI=[r122:DI] 19: r139:DI=[r138:DI+0x48] REG_DEAD r138:DI 20: %3:DI=r122:DI REG_DEAD r122:DI 21: %12:DI=r139:DI 22: ctr:DI=r139:DI REG_DEAD r139:DI 23: %3:DI=call [ctr:DI] argc:0 REG_DEAD ctr:DI REG_DEAD %12:DI REG_UNUSED %3:DI REG_EH_REGION 0x1 REG_CALL_DECL (nil) 24: L24: 25: NOTE_INSN_BASIC_BLOCK 4 51: L51: 48: NOTE_INSN_BASIC_BLOCK 5 47: NOTE_INSN_DELETED 52: pc=L51 53: barrier 39: L39: 41: NOTE_INSN_BASIC_BLOCK 6 32: %3:DI=call [`__cxa_begin_catch'] argc:0 REG_UNUSED %3:DI REG_CALL_DECL `__cxa_begin_catch' REG_EH_REGION 0 33: call [`__cxa_end_catch'] argc:0 REG_CALL_DECL `__cxa_end_catch' 34: barrier ;; basic block 4, loop depth 0, count 10631108 (estimated locally, freq 1.0000), maybe hot ;; prev block 3, next block 5, flags: (REACHABLE, RTL) ;; pred: 3 [always] count:1063111 (estimated locally, freq 0.1000) (FALLTHRU) ;; 2 [90.0% (guessed)] count:9567997 (estimated locally, freq 0.9000) ;; succ: 5 [always] count:10631108 (estimated locally, freq 1.0000) (FALLTHRU) ;; basic block 5, loop depth 0, count 1073741824 (estimated locally, freq 101.0000), maybe hot ;; prev block 4, next block 6, flags: (REACHABLE, RTL, MODIFIED) ;; pred: 4 
[always] count:10631108 (estimated locally, freq 1.0000) (FALLTHRU) ;; 5 [always] count:1073741824 (estimated locally, freq 101.0000) ;; succ: 5 [always] count:1073741824 (estimated locally, freq 101.0000) It looks bb 4 can be merged with bb 5, a miss-opt? There can be some other cases similar to this, either it's miss-opt or not, it seems to affect our assumption. With the limited findings above so far, I wonder if we still want to go with this direction that running cleanup_cfg first and make the assumption that there is no such empty block which can cause debug and non-debug state inconsistency? IMHO it seems risky to make it. Thoughts? BR, Kewen
On Fri, Nov 17, 2023 at 10:04 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > > on 2023/11/15 17:43, Alexander Monakov wrote: > > > > On Wed, 15 Nov 2023, Kewen.Lin wrote: > > > >>>> And I suppose it would be OK to do that. Empty BBs are usually removed by > >>>> CFG cleanup so the situation should only happen in rare corner cases where > >>>> the fix would be to actually run CFG cleanup ... > >>> > >>> Yeah, sel-sched invokes 'cfg_cleanup (0)' up front, and I suppose that > >>> may be a preferable compromise for sched-rgn as well. > >> > >> Inspired by this discussion, I tested the attached patch 1 which is to run > >> cleanup_cfg (0) first in haifa_sched_init, it's bootstrapped and > >> regress-tested on x86_64-redhat-linux and powerpc64{,le}-linux-gnu. > > > > I don't think you can run cleanup_cfg after sched_init. I would suggest > > to put it early in schedule_insns. > > Thanks for the suggestion, I placed it at the beginning of haifa_sched_init > instead, since schedule_insns invokes haifa_sched_init, although the > calls rgn_setup_common_sched_info and rgn_setup_sched_infos are executed > ahead but they are all "setup" functions, shouldn't affect or be affected > by this placement. > > > > >> Then I assumed some of the current uses of no_real_insns_p won't encounter > >> empty blocks any more, so made a patch 2 with some explicit assertions, but > >> unfortunately I got ICEs during bootstrapping happens in function > >> compute_priorities. I'm going to investigate it further and post more > >> findings, but just heads-up to ensure if this is on the right track. > > > > I suspect this may be caused by invoking cleanup_cfg too late. > > By looking into some failures, I found that although cleanup_cfg is executed > there would be still some empty blocks left, by analyzing a few failures there > are at least such cases: > 1. empty function body > 2. block holding a label for return. > 3. block without any successor. > 4. 
block which becomes empty after scheduling some other block. > 5. block which looks mergeable with its always successor but left. > ... > > For 1,2, there is one single successor EXIT block, I think they don't affect > state transition, for 3, it's the same. For 4, it depends on if we can have > the assumption this kind of empty block doesn't have the chance to have debug > insn (like associated debug insn should be moved along), I'm not sure. For 5, > a reduced test case is: > > #include <istream> > namespace std { > template <> > basic_istream<char> & > basic_istream<char>::getline (char_type *, streamsize, char_type) __try > { > __streambuf_type *a = rdbuf (); > a->sgetc (); > while (traits_type::eq_int_type) > ; > } > __catch (...) {} > } // namespace std > > slim RTL: > > 1: NOTE_INSN_DELETED > 7: NOTE_INSN_BASIC_BLOCK 2 > ... > 15: r137:CCUNS=cmp(r135:DI,r136:DI) > REG_DEAD r136:DI > REG_DEAD r135:DI > 16: pc={(ltu(r137:CCUNS,0))?L24:pc} > REG_DEAD r137:CCUNS > REG_BR_PROB 966367644 > 17: NOTE_INSN_BASIC_BLOCK 3 > 18: r138:DI=[r122:DI] > 19: r139:DI=[r138:DI+0x48] > REG_DEAD r138:DI > 20: %3:DI=r122:DI > REG_DEAD r122:DI > 21: %12:DI=r139:DI > 22: ctr:DI=r139:DI > REG_DEAD r139:DI > 23: %3:DI=call [ctr:DI] argc:0 > REG_DEAD ctr:DI > REG_DEAD %12:DI > REG_UNUSED %3:DI > REG_EH_REGION 0x1 > REG_CALL_DECL (nil) > 24: L24: > 25: NOTE_INSN_BASIC_BLOCK 4 > 51: L51: > 48: NOTE_INSN_BASIC_BLOCK 5 > 47: NOTE_INSN_DELETED > 52: pc=L51 > 53: barrier > 39: L39: > 41: NOTE_INSN_BASIC_BLOCK 6 > 32: %3:DI=call [`__cxa_begin_catch'] argc:0 > REG_UNUSED %3:DI > REG_CALL_DECL `__cxa_begin_catch' > REG_EH_REGION 0 > 33: call [`__cxa_end_catch'] argc:0 > REG_CALL_DECL `__cxa_end_catch' > 34: barrier > > ;; basic block 4, loop depth 0, count 10631108 (estimated locally, freq 1.0000), maybe hot > ;; prev block 3, next block 5, flags: (REACHABLE, RTL) > ;; pred: 3 [always] count:1063111 (estimated locally, freq 0.1000) (FALLTHRU) > ;; 2 [90.0% (guessed)] count:9567997 (estimated 
locally, freq 0.9000) > ;; succ: 5 [always] count:10631108 (estimated locally, freq 1.0000) (FALLTHRU) > ;; basic block 5, loop depth 0, count 1073741824 (estimated locally, freq 101.0000), maybe hot > ;; prev block 4, next block 6, flags: (REACHABLE, RTL, MODIFIED) > ;; pred: 4 [always] count:10631108 (estimated locally, freq 1.0000) (FALLTHRU) > ;; 5 [always] count:1073741824 (estimated locally, freq 101.0000) > ;; succ: 5 [always] count:1073741824 (estimated locally, freq 101.0000) > > It looks bb 4 can be merged with bb 5, a miss-opt? There can be some other cases > similar to this, either it's miss-opt or not, it seems to affect our assumption. > > With the limited findings above so far, I wonder if we still want to go with this > direction that running cleanup_cfg first and make the assumption that there is no > such empty block which can cause debug and non-debug state inconsistency? > IMHO it seems risky to make it. Thoughts? Running CFG cleanup shouldn't be the fix to remove such blocks but instead scheduling shouldn't special case empty blocks as they usually do not appear. Richard. > > BR, > Kewen
On Fri, 17 Nov 2023, Kewen.Lin wrote:

> > I don't think you can run cleanup_cfg after sched_init. I would suggest
> > to put it early in schedule_insns.
>
> Thanks for the suggestion, I placed it at the beginning of haifa_sched_init
> instead, since schedule_insns invokes haifa_sched_init, although the
> calls rgn_setup_common_sched_info and rgn_setup_sched_infos are executed
> ahead but they are all "setup" functions, shouldn't affect or be affected
> by this placement.

I was worried because sched_init invokes df_analyze, and I'm not sure if
cfg_cleanup can invalidate it.

> > I suspect this may be caused by invoking cleanup_cfg too late.
>
> By looking into some failures, I found that although cleanup_cfg is executed
> there would be still some empty blocks left, by analyzing a few failures there
> are at least such cases:
> 1. empty function body
> 2. block holding a label for return.
> 3. block without any successor.
> 4. block which becomes empty after scheduling some other block.
> 5. block which looks mergeable with its always successor but left.
> ...
>
> For 1,2, there is one single successor EXIT block, I think they don't affect
> state transition, for 3, it's the same. For 4, it depends on if we can have
> the assumption this kind of empty block doesn't have the chance to have debug
> insn (like associated debug insn should be moved along), I'm not sure. For 5,
> a reduced test case is:

Oh, I should have thought of cases like these, really sorry about the slip
of attention, and thanks for showing a testcase for item 5.  As Richard was
saying in his response, cfg_cleanup cannot be a fix here.  The thing to check
would be changing no_real_insns_p to always return false, and see if the
situation looks recoverable (if it breaks bootstrap, regtest statistics of
a non-bootstrapped compiler are still informative).

Alexander
on 2023/11/17 20:55, Alexander Monakov wrote:
>
> On Fri, 17 Nov 2023, Kewen.Lin wrote:
>>> I don't think you can run cleanup_cfg after sched_init. I would suggest
>>> to put it early in schedule_insns.
>>
>> Thanks for the suggestion, I placed it at the beginning of haifa_sched_init
>> instead, since schedule_insns invokes haifa_sched_init, although the
>> calls rgn_setup_common_sched_info and rgn_setup_sched_infos are executed
>> ahead but they are all "setup" functions, shouldn't affect or be affected
>> by this placement.
>
> I was worried because sched_init invokes df_analyze, and I'm not sure if
> cfg_cleanup can invalidate it.

Thanks for further explaining!  By scanning cleanup_cfg, it seems that it
considers df, e.g. compact_blocks checks df, try_optimize_cfg invokes
df_analyze, etc., but I agree that moving cleanup_cfg before sched_init
makes more sense.

>
>>> I suspect this may be caused by invoking cleanup_cfg too late.
>>
>> By looking into some failures, I found that although cleanup_cfg is executed
>> there would be still some empty blocks left, by analyzing a few failures there
>> are at least such cases:
>> 1. empty function body
>> 2. block holding a label for return.
>> 3. block without any successor.
>> 4. block which becomes empty after scheduling some other block.
>> 5. block which looks mergeable with its always successor but left.
>> ...
>>
>> For 1,2, there is one single successor EXIT block, I think they don't affect
>> state transition, for 3, it's the same. For 4, it depends on if we can have
>> the assumption this kind of empty block doesn't have the chance to have debug
>> insn (like associated debug insn should be moved along), I'm not sure. For 5,
>> a reduced test case is:
>
> Oh, I should have thought of cases like these, really sorry about the slip
> of attention, and thanks for showing a testcase for item 5. As Richard was
> saying in his response, cfg_cleanup cannot be a fix here.
> The thing to check
> would be changing no_real_insns_p to always return false, and see if the
> situation looks recoverable (if it breaks bootstrap, regtest statistics of
> a non-bootstrapped compiler are still informative).

As you suggested, I forced no_real_insns_p to return false all the time; some
issues got exposed.  Almost all of them are assertions that a NOTE_P insn
shouldn't be encountered in those places, so the adjustments for most of them
are just to consider NOTE_P or this kind of special block and so on.  One
draft patch is attached; it can be bootstrapped and regress-tested on
ppc64{,le} and x86.  btw, it's without the previous cfg_cleanup adjustment
(hope it can get more empty blocks and expose more issues).  The draft isn't
qualified for code review, but I hope it can provide some information on what
kinds of changes are needed for the proposal.  If this is the direction which
we all agree on, I'll further refine it and post a formal patch.  One thing I
want to note is that this patch disables one assertion below:

diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
index e5964f54ead..abd334864fb 100644
--- a/gcc/sched-rgn.cc
+++ b/gcc/sched-rgn.cc
@@ -3219,7 +3219,7 @@ schedule_region (int rgn)
     }
 
   /* Sanity check: verify that all region insns were scheduled.  */
-  gcc_assert (sched_rgn_n_insns == rgn_n_insns);
+  // gcc_assert (sched_rgn_n_insns == rgn_n_insns);
 
   sched_finish_ready_list ();

Some cases can cause this assertion to fail; that's due to a mismatch between
the to-be-scheduled and scheduled insn counts.  The reason why it happens is
that one block previously has only one INSN_P, but while scheduling some
other blocks that insn gets moved as well, so we end up with an empty block
whose only NOTE_P insn gets counted at scheduling time; since this block
isn't empty initially and NOTE_P gets skipped in a normal block, the
to-be-scheduled count can't count it in.
It can be fixed with special-casing this kind of block for counting like initially recording which block is empty and if a block isn't recorded before then fix up the count for it accordingly. I'm not sure if someone may have an argument that all the complication make this proposal beaten by previous special-casing debug insn approach, looking forward to more comments. BR, Kewen From d350f411b23f6064a33a72a6ca7afc49b0ccea65 Mon Sep 17 00:00:00 2001 From: Kewen Lin <linkw@linux.ibm.com> Date: Wed, 22 Nov 2023 00:08:59 -0600 Subject: [PATCH] sched: Don't skip empty block in scheduling att --- gcc/haifa-sched.cc | 166 +++++++++++++++++++++++++++------------------ gcc/rtl.h | 4 +- gcc/sched-rgn.cc | 2 +- 3 files changed, 103 insertions(+), 69 deletions(-) diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc index 8e8add709b3..62377d99162 100644 --- a/gcc/haifa-sched.cc +++ b/gcc/haifa-sched.cc @@ -1207,6 +1207,11 @@ recompute_todo_spec (rtx_insn *next, bool for_backtrack) int n_replace = 0; bool first_p = true; + /* We don't skip no_real_insns_p any more, so it's possible to + meet NOTE insn now, early return if so. */ + if (NOTE_P (next)) + return 0; + if (sd_lists_empty_p (next, SD_LIST_BACK)) /* NEXT has all its dependencies resolved. */ return 0; @@ -1728,6 +1733,9 @@ setup_insn_reg_pressure_info (rtx_insn *insn) gcc_checking_assert (!DEBUG_INSN_P (insn)); + if (NOTE_P (insn)) + return; + excess_cost_change = 0; calculate_reg_deaths (insn, death); pressure_info = INSN_REG_PRESSURE (insn); @@ -4017,10 +4025,10 @@ schedule_insn (rtx_insn *insn) /* Scheduling instruction should have all its dependencies resolved and should have been removed from the ready list. */ - gcc_assert (sd_lists_empty_p (insn, SD_LIST_HARD_BACK)); + gcc_assert (NOTE_P (insn) || sd_lists_empty_p (insn, SD_LIST_HARD_BACK)); /* Reset debug insns invalidated by moving this insn. 
*/ - if (MAY_HAVE_DEBUG_BIND_INSNS && !DEBUG_INSN_P (insn)) + if (MAY_HAVE_DEBUG_BIND_INSNS && !DEBUG_INSN_P (insn) && !NOTE_P (insn)) for (sd_it = sd_iterator_start (insn, SD_LIST_BACK); sd_iterator_cond (&sd_it, &dep);) { @@ -4106,61 +4114,64 @@ schedule_insn (rtx_insn *insn) check_clobbered_conditions (insn); - /* Update dependent instructions. First, see if by scheduling this insn - now we broke a dependence in a way that requires us to change another - insn. */ - for (sd_it = sd_iterator_start (insn, SD_LIST_SPEC_BACK); - sd_iterator_cond (&sd_it, &dep); sd_iterator_next (&sd_it)) + if (!NOTE_P (insn)) { - struct dep_replacement *desc = DEP_REPLACE (dep); - rtx_insn *pro = DEP_PRO (dep); - if (QUEUE_INDEX (pro) != QUEUE_SCHEDULED - && desc != NULL && desc->insn == pro) - apply_replacement (dep, false); - } + /* Update dependent instructions. First, see if by scheduling this insn + now we broke a dependence in a way that requires us to change another + insn. */ + for (sd_it = sd_iterator_start (insn, SD_LIST_SPEC_BACK); + sd_iterator_cond (&sd_it, &dep); sd_iterator_next (&sd_it)) + { + struct dep_replacement *desc = DEP_REPLACE (dep); + rtx_insn *pro = DEP_PRO (dep); + if (QUEUE_INDEX (pro) != QUEUE_SCHEDULED && desc != NULL + && desc->insn == pro) + apply_replacement (dep, false); + } - /* Go through and resolve forward dependencies. */ - for (sd_it = sd_iterator_start (insn, SD_LIST_FORW); - sd_iterator_cond (&sd_it, &dep);) - { - rtx_insn *next = DEP_CON (dep); - bool cancelled = (DEP_STATUS (dep) & DEP_CANCELLED) != 0; + /* Go through and resolve forward dependencies. */ + for (sd_it = sd_iterator_start (insn, SD_LIST_FORW); + sd_iterator_cond (&sd_it, &dep);) + { + rtx_insn *next = DEP_CON (dep); + bool cancelled = (DEP_STATUS (dep) & DEP_CANCELLED) != 0; - /* Resolve the dependence between INSN and NEXT. - sd_resolve_dep () moves current dep to another list thus - advancing the iterator. 
*/ - sd_resolve_dep (sd_it); + /* Resolve the dependence between INSN and NEXT. + sd_resolve_dep () moves current dep to another list thus + advancing the iterator. */ + sd_resolve_dep (sd_it); - if (cancelled) - { - if (must_restore_pattern_p (next, dep)) - restore_pattern (dep, false); - continue; - } + if (cancelled) + { + if (must_restore_pattern_p (next, dep)) + restore_pattern (dep, false); + continue; + } - /* Don't bother trying to mark next as ready if insn is a debug - insn. If insn is the last hard dependency, it will have - already been discounted. */ - if (DEBUG_INSN_P (insn) && !DEBUG_INSN_P (next)) - continue; + /* Don't bother trying to mark next as ready if insn is a debug + insn. If insn is the last hard dependency, it will have + already been discounted. */ + if (DEBUG_INSN_P (insn) && !DEBUG_INSN_P (next)) + continue; - if (!IS_SPECULATION_BRANCHY_CHECK_P (insn)) - { - int effective_cost; + if (!IS_SPECULATION_BRANCHY_CHECK_P (insn)) + { + int effective_cost; - effective_cost = try_ready (next); + effective_cost = try_ready (next); - if (effective_cost >= 0 - && SCHED_GROUP_P (next) - && advance < effective_cost) - advance = effective_cost; - } - else - /* Check always has only one forward dependence (to the first insn in - the recovery block), therefore, this will be executed only once. */ - { - gcc_assert (sd_lists_empty_p (insn, SD_LIST_FORW)); - fix_recovery_deps (RECOVERY_BLOCK (insn)); + if (effective_cost >= 0 && SCHED_GROUP_P (next) + && advance < effective_cost) + advance = effective_cost; + } + else + /* Check always has only one forward dependence (to the first insn + in the recovery block), therefore, this will be executed only + once. */ + { + gcc_assert (sd_lists_empty_p (insn, SD_LIST_FORW)); + fix_recovery_deps (RECOVERY_BLOCK (insn)); + } } } @@ -4170,6 +4181,7 @@ schedule_insn (rtx_insn *insn) may use this information to decide how the instruction should be aligned. 
*/ if (issue_rate > 1 + && !NOTE_P (insn) && GET_CODE (PATTERN (insn)) != USE && GET_CODE (PATTERN (insn)) != CLOBBER && !DEBUG_INSN_P (insn)) @@ -5036,8 +5048,11 @@ get_ebb_head_tail (basic_block beg, basic_block end, /* Return true if there are no real insns in the range [ HEAD, TAIL ]. */ bool -no_real_insns_p (const rtx_insn *head, const rtx_insn *tail) +no_real_insns_p (const rtx_insn *head ATTRIBUTE_UNUSED, + const rtx_insn *tail ATTRIBUTE_UNUSED) { + return false; +#if 0 while (head != NEXT_INSN (tail)) { if (!NOTE_P (head) && !LABEL_P (head)) @@ -5045,6 +5060,7 @@ no_real_insns_p (const rtx_insn *head, const rtx_insn *tail) head = NEXT_INSN (head); } return true; +#endif } /* Restore-other-notes: NOTE_LIST is the end of a chain of notes @@ -6224,8 +6240,9 @@ commit_schedule (rtx_insn *prev_head, rtx_insn *tail, basic_block *target_bb) scheduled_insns.iterate (i, &insn); i++) { - if (control_flow_insn_p (last_scheduled_insn) - || current_sched_info->advance_target_bb (*target_bb, insn)) + if ((control_flow_insn_p (last_scheduled_insn) + || current_sched_info->advance_target_bb (*target_bb, insn)) + && !NOTE_P (insn)) { *target_bb = current_sched_info->advance_target_bb (*target_bb, 0); @@ -6245,7 +6262,7 @@ commit_schedule (rtx_insn *prev_head, rtx_insn *tail, basic_block *target_bb) (*current_sched_info->begin_move_insn) (insn, last_scheduled_insn); move_insn (insn, last_scheduled_insn, current_sched_info->next_tail); - if (!DEBUG_INSN_P (insn)) + if (!DEBUG_INSN_P (insn) && !NOTE_P (insn)) reemit_notes (insn); last_scheduled_insn = insn; } @@ -6296,7 +6313,7 @@ prune_ready_list (state_t temp_state, bool first_cycle_insn_p, int cost = 0; const char *reason = "resource conflict"; - if (DEBUG_INSN_P (insn)) + if (DEBUG_INSN_P (insn) || NOTE_P (insn)) continue; if (sched_group_found && !SCHED_GROUP_P (insn) @@ -6504,7 +6521,7 @@ schedule_block (basic_block *target_bb, state_t init_state) and caused problems because schedule_block and 
compute_forward_dependences had different notions of what the "head" insn was. */ - gcc_assert (head != tail || INSN_P (head)); + gcc_assert (head != tail || (INSN_P (head) || NOTE_P (head))); haifa_recovery_bb_recently_added_p = false; @@ -6544,9 +6561,10 @@ schedule_block (basic_block *target_bb, state_t init_state) last_nondebug_scheduled_insn = NULL; nonscheduled_insns_begin = NULL; - gcc_assert ((NOTE_P (last_scheduled_insn) - || DEBUG_INSN_P (last_scheduled_insn)) - && BLOCK_FOR_INSN (last_scheduled_insn) == *target_bb); + gcc_assert ( + ((NOTE_P (last_scheduled_insn) || DEBUG_INSN_P (last_scheduled_insn)) + && BLOCK_FOR_INSN (last_scheduled_insn) == *target_bb) + || (head == tail && NOTE_P (head))); /* Initialize INSN_QUEUE. Q_SIZE is the total number of insns in the queue. */ @@ -6744,6 +6762,21 @@ schedule_block (basic_block *target_bb, state_t init_state) } } + /* Handle the only NOTE here similar to the above for debug insn. */ + if (ready.n_ready && NOTE_P (ready_element (&ready, 0)) + && (*current_sched_info->schedule_more_p) ()) + { + rtx_insn *insn = ready_remove_first (&ready); + (*current_sched_info->begin_schedule_ready) (insn); + scheduled_insns.safe_push (insn); + last_scheduled_insn = insn; + advance = schedule_insn (insn); + gcc_assert (advance == 0); + if (ready.n_ready > 0) + ready_sort (&ready); + gcc_assert (!ready.n_ready); + } + if (ls.first_cycle_insn_p && !ready.n_ready) break; @@ -6886,7 +6919,7 @@ schedule_block (basic_block *target_bb, state_t init_state) /* Update counters, etc in the scheduler's front end. 
*/ (*current_sched_info->begin_schedule_ready) (insn); scheduled_insns.safe_push (insn); - gcc_assert (NONDEBUG_INSN_P (insn)); + gcc_assert (NONDEBUG_INSN_P (insn) || NOTE_P (insn)); last_nondebug_scheduled_insn = last_scheduled_insn = insn; if (recog_memoized (insn) >= 0) @@ -7145,17 +7178,16 @@ schedule_block (basic_block *target_bb, state_t init_state) int set_priorities (rtx_insn *head, rtx_insn *tail) { + /* This is a no_real_insns_p block. */ + if (head == tail && !INSN_P (head)) + return 1; + rtx_insn *insn; - int n_insn; + int n_insn = 0; int sched_max_insns_priority = current_sched_info->sched_max_insns_priority; rtx_insn *prev_head; - if (head == tail && ! INSN_P (head)) - gcc_unreachable (); - - n_insn = 0; - prev_head = PREV_INSN (head); for (insn = tail; insn != prev_head; insn = PREV_INSN (insn)) { @@ -7688,7 +7720,9 @@ fix_tick_ready (rtx_insn *next) { int tick, delay; - if (!DEBUG_INSN_P (next) && !sd_lists_empty_p (next, SD_LIST_RES_BACK)) + if (!DEBUG_INSN_P (next) + && !NOTE_P (next) + && !sd_lists_empty_p (next, SD_LIST_RES_BACK)) { int full_p; sd_iterator_def sd_it; diff --git a/gcc/rtl.h b/gcc/rtl.h index e4b6cc0dbb5..34b3f31d1ee 100644 --- a/gcc/rtl.h +++ b/gcc/rtl.h @@ -2695,8 +2695,8 @@ do { \ /* During sched, 1 if RTX is an insn that must be scheduled together with the preceding insn. */ #define SCHED_GROUP_P(RTX) \ - (RTL_FLAG_CHECK4 ("SCHED_GROUP_P", (RTX), DEBUG_INSN, INSN, \ - JUMP_INSN, CALL_INSN)->in_struct) + (RTL_FLAG_CHECK5 ("SCHED_GROUP_P", (RTX), DEBUG_INSN, INSN, \ + JUMP_INSN, CALL_INSN, NOTE)->in_struct) /* For a SET rtx, SET_DEST is the place that is set and SET_SRC is the value it is set to. */ diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc index e5964f54ead..abd334864fb 100644 --- a/gcc/sched-rgn.cc +++ b/gcc/sched-rgn.cc @@ -3219,7 +3219,7 @@ schedule_region (int rgn) } /* Sanity check: verify that all region insns were scheduled. 
*/ - gcc_assert (sched_rgn_n_insns == rgn_n_insns); + // gcc_assert (sched_rgn_n_insns == rgn_n_insns); sched_finish_ready_list ();
On Wed, Nov 22, 2023 at 10:31 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> on 2023/11/17 20:55, Alexander Monakov wrote:
> >
> > On Fri, 17 Nov 2023, Kewen.Lin wrote:
> >>> I don't think you can run cleanup_cfg after sched_init.  I would suggest to put it early in schedule_insns.
> >>
> >> Thanks for the suggestion, I placed it at the beginning of haifa_sched_init instead, since schedule_insns invokes haifa_sched_init; although the calls rgn_setup_common_sched_info and rgn_setup_sched_infos are executed ahead, they are all "setup" functions, so they shouldn't affect or be affected by this placement.
> >
> > I was worried because sched_init invokes df_analyze, and I'm not sure if cfg_cleanup can invalidate it.
>
> Thanks for further explaining!  By scanning cleanup_cfg, it seems that it considers df (compact_blocks checks df, try_optimize_cfg invokes df_analyze, etc.), but I agree that moving cleanup_cfg before sched_init makes more sense.
>
> >>> I suspect this may be caused by invoking cleanup_cfg too late.
> >>
> >> By looking into some failures, I found that although cleanup_cfg is executed, there would still be some empty blocks left; by analyzing a few failures, there are at least these cases:
> >>   1. empty function body
> >>   2. block holding a label for return.
> >>   3. block without any successor.
> >>   4. block which becomes empty after scheduling some other block.
> >>   5. block which looks mergeable with its always successor but is left.
> >>   ...
> >>
> >> For 1 and 2, there is one single successor EXIT block, so I think they don't affect the state transition; for 3, it's the same.  For 4, it depends on whether we can assume this kind of empty block has no chance of holding a debug insn (i.e. the associated debug insn should be moved along); I'm not sure.  For 5, a reduced test case is:
> >
> > Oh, I should have thought of cases like these, really sorry about the slip of attention, and thanks for showing a testcase for item 5.  As Richard was saying in his response, cfg_cleanup cannot be a fix here.  The thing to check would be changing no_real_insns_p to always return false, and see if the situation looks recoverable (if it breaks bootstrap, regtest statistics of a non-bootstrapped compiler are still informative).
>
> As you suggested, I forced no_real_insns_p to return false all the time, and some issues got exposed; almost all of them are assertions that a NOTE_P insn shouldn't be encountered in those places, so the adjustments for most of them just consider NOTE_P or this kind of special block.  One draft patch is attached; it can be bootstrapped and regress-tested on ppc64{,le} and x86.  Btw, it's without the previous cfg_cleanup adjustment (hoping it gets more empty blocks and exposes more issues).  The draft isn't qualified for code review, but I hope it provides some information on what kinds of changes are needed for the proposal.  If this is the direction we all agree on, I'll further refine it and post a formal patch.  One thing I want to note is that this patch disables one assertion:
>
> diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
> index e5964f54ead..abd334864fb 100644
> --- a/gcc/sched-rgn.cc
> +++ b/gcc/sched-rgn.cc
> @@ -3219,7 +3219,7 @@ schedule_region (int rgn)
>      }
>
>    /* Sanity check: verify that all region insns were scheduled.  */
> -  gcc_assert (sched_rgn_n_insns == rgn_n_insns);
> +  // gcc_assert (sched_rgn_n_insns == rgn_n_insns);
>
>    sched_finish_ready_list ();
>
> Some cases can cause this assertion to fail, due to a mismatch between the to-be-scheduled and scheduled insn counts.  The reason it happens is that one block previously has only one INSN_P, but while scheduling some other blocks that insn gets moved as well, so we end up with an empty block whose only NOTE_P insn is counted then; since this block isn't empty initially and NOTE_P gets skipped in a normal block, the to-be-scheduled count can't include it.  It can be fixed by special-casing this kind of block for counting, e.g. initially recording which blocks are empty and, if a block isn't recorded, fixing up the count for it accordingly.  I'm not sure if someone may argue that all this complication makes the proposal lose out to the previous special-casing-debug-insn approach; looking forward to more comments.

Just a comment that the NOTE_P thing is odd - do we only ever have those for otherwise empty BBs?  How are they skipped otherwise (and why does that not work for otherwise empty BBs)?

Richard.

> BR,
> Kewen
>
on 2023/11/22 18:25, Richard Biener wrote:
> On Wed, Nov 22, 2023 at 10:31 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>> [earlier quoted thread snipped; see the previous message]
>
> Just a comment that the NOTE_P thing is odd - do we only ever have those for otherwise empty BBs?  How are they skipped otherwise (and why does that not work for otherwise empty BBs)?

Yes, previously (bypassing empty BBs) there was no chance to encounter a NOTE_P when scheduling insns: for notes in normal BBs, some are skipped when setting up the head and tail (as in get_ebb_head_tail), and there are also the special handlings remove_notes and unlink_bb_notes to guarantee they are gone.  By disabling empty-BB bypassing, all empty BBs are effectively uniformed as (head == tail && NOTE_P (head)), so we have to deal with NOTE_P.

BR,
Kewen
On Thu, Nov 23, 2023 at 4:02 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> on 2023/11/22 18:25, Richard Biener wrote:
> > [earlier quoted thread snipped; see the previous messages]
> >
> > Just a comment that the NOTE_P thing is odd - do we only ever have those for otherwise empty BBs?  How are they skipped otherwise (and why does that not work for otherwise empty BBs)?
>
> Yes, previously (bypassing empty BBs) there was no chance to encounter a NOTE_P when scheduling insns: for notes in normal BBs, some are skipped when setting up the head and tail (as in get_ebb_head_tail), and there are also the special handlings remove_notes and unlink_bb_notes to guarantee they are gone.  By disabling empty-BB bypassing, all empty BBs are effectively uniformed as (head == tail && NOTE_P (head)), so we have to deal with NOTE_P.

I see.  I expected most of them to be naturally part of another EBB, so it's rather a limitation of the head/tail representation.

I wonder if there's a more minimal fix though.  But iff it is the head or tail of an EBB, then I guess either head or tail has to point to a stmt in said block, which necessarily then means either a debug insn or a note.

Richard.

> BR,
> Kewen
on 2023/11/23 16:20, Richard Biener wrote:
> On Thu, Nov 23, 2023 at 4:02 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>> [earlier quoted thread snipped; see the previous messages]
>
> I see.  I expected most of them to be naturally part of another EBB.  So it's rather a limitation of the head/tail representation.
>
> I wonder if there's a more minimal fix though.  But iff it is the head or tail

I'm not sure I understood the question correctly; if it is about this inconsistent-states issue, I think what Maxim suggested seems to be the minimal fix.

> of an EBB, then I guess either head or tail has to point to a stmt in said block, which necessarily then means either a debug insn or a note.

Yes, we will only have a NOTE_P for an empty BB; if there is a debug insn, then it wins (all NOTEs get dropped).  remove_notes runs from head to tail, and it's applied to the EBB's special head and tail.

BR,
Kewen
diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
index 8e8add709b3..30cc90ec49f 100644
--- a/gcc/haifa-sched.cc
+++ b/gcc/haifa-sched.cc
@@ -5033,14 +5033,17 @@ get_ebb_head_tail (basic_block beg, basic_block end,
   *tailp = end_tail;
 }
 
-/* Return true if there are no real insns in the range [ HEAD, TAIL ].  */
+/* Return true if there are no real nondebug insns in the range
+   [ HEAD, TAIL ].  */
 
 bool
-no_real_insns_p (const rtx_insn *head, const rtx_insn *tail)
+no_real_nondebug_insns_p (const rtx_insn *head, const rtx_insn *tail)
 {
   while (head != NEXT_INSN (tail))
     {
-      if (!NOTE_P (head) && !LABEL_P (head))
+      if (!NOTE_P (head)
+          && !LABEL_P (head)
+          && !DEBUG_INSN_P (head))
         return false;
       head = NEXT_INSN (head);
     }
diff --git a/gcc/sched-ebb.cc b/gcc/sched-ebb.cc
index 110fcdbca4d..03d96290a7c 100644
--- a/gcc/sched-ebb.cc
+++ b/gcc/sched-ebb.cc
@@ -491,7 +491,7 @@ schedule_ebb (rtx_insn *head, rtx_insn *tail, bool modulo_scheduling)
   first_bb = BLOCK_FOR_INSN (head);
   last_bb = BLOCK_FOR_INSN (tail);
 
-  if (no_real_insns_p (head, tail))
+  if (no_real_nondebug_insns_p (head, tail))
     return BLOCK_FOR_INSN (tail);
 
   gcc_assert (INSN_P (head) && INSN_P (tail));
diff --git a/gcc/sched-int.h b/gcc/sched-int.h
index 64a2f0bcff9..adca494ade5 100644
--- a/gcc/sched-int.h
+++ b/gcc/sched-int.h
@@ -1397,7 +1397,7 @@ extern void free_global_sched_pressure_data (void);
 extern int haifa_classify_insn (const_rtx);
 extern void get_ebb_head_tail (basic_block, basic_block,
                                rtx_insn **, rtx_insn **);
-extern bool no_real_insns_p (const rtx_insn *, const rtx_insn *);
+extern bool no_real_nondebug_insns_p (const rtx_insn *, const rtx_insn *);
 
 extern int insn_sched_cost (rtx_insn *);
 extern int dep_cost_1 (dep_t, dw_t);
diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
index e5964f54ead..2549e834aa8 100644
--- a/gcc/sched-rgn.cc
+++ b/gcc/sched-rgn.cc
@@ -213,6 +213,22 @@ static int rgn_nr_edges;
 /* Array of size rgn_nr_edges.  */
 static edge *rgn_edges;
 
+/* Possible actions for dependencies freeing.  */
+enum rgn_bb_deps_free_action
+{
+  /* This block doesn't get dependencies computed so don't need to free.  */
+  RGN_BB_DEPS_FREE_NO,
+  /* This block gets scheduled normally so free dependencies as usual.  */
+  RGN_BB_DEPS_FREE_NORMAL,
+  /* This block gets skipped in scheduling but has dependencies computed
+     early, need to free the forward list artificially.  */
+  RGN_BB_DEPS_FREE_ARTIFICIAL
+};
+
+/* For basic block i, bb_deps_free_actions[i] indicates which action needs
+   to be taken for freeing its dependencies.  */
+static enum rgn_bb_deps_free_action *bb_deps_free_actions;
+
 /* Mapping from each edge in the graph to its number in the rgn.  */
 #define EDGE_TO_BIT(edge) ((int)(size_t)(edge)->aux)
 #define SET_EDGE_TO_BIT(edge,nr) ((edge)->aux = (void *)(size_t)(nr))
@@ -2735,6 +2751,15 @@ compute_block_dependences (int bb)
   gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
   get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
 
+  /* Don't compute block dependences if there are no real nondebug insns.  */
+  if (no_real_nondebug_insns_p (head, tail))
+    {
+      if (current_nr_blocks > 1)
+        propagate_deps (bb, &tmp_deps);
+      free_deps (&tmp_deps);
+      return;
+    }
+
   sched_analyze (&tmp_deps, head, tail);
 
   add_branch_dependences (head, tail);
@@ -2749,6 +2774,24 @@ compute_block_dependences (int bb)
     targetm.sched.dependencies_evaluation_hook (head, tail);
 }
 
+/* Artificially resolve forward dependencies for instructions HEAD to
+   TAIL.  */
+
+static void
+resolve_forw_deps (rtx_insn *head, rtx_insn *tail)
+{
+  rtx_insn *insn;
+  rtx_insn *next_tail = NEXT_INSN (tail);
+  sd_iterator_def sd_it;
+  dep_t dep;
+
+  /* There could be some insns which get skipped in scheduling but we compute
+     dependencies for them previously, so make them resolved.  */
+  for (insn = head; insn != next_tail; insn = NEXT_INSN (insn))
+    for (sd_it = sd_iterator_start (insn, SD_LIST_FORW);
+         sd_iterator_cond (&sd_it, &dep);)
+      sd_resolve_dep (sd_it);
+}
+
 /* Free dependencies of instructions inside BB.  */
 static void
 free_block_dependencies (int bb)
@@ -2758,9 +2801,12 @@ free_block_dependencies (int bb)
 
   get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
 
-  if (no_real_insns_p (head, tail))
+  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_NO)
     return;
 
+  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_ARTIFICIAL)
+    resolve_forw_deps (head, tail);
+
   sched_free_deps (head, tail, true);
 }
 
@@ -3024,7 +3070,7 @@ compute_priorities (void)
       gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
       get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
 
-      if (no_real_insns_p (head, tail))
+      if (no_real_nondebug_insns_p (head, tail))
         continue;
 
       rgn_n_insns += set_priorities (head, tail);
@@ -3158,7 +3204,7 @@ schedule_region (int rgn)
 
           get_ebb_head_tail (first_bb, last_bb, &head, &tail);
 
-          if (no_real_insns_p (head, tail))
+          if (no_real_nondebug_insns_p (head, tail))
             {
               gcc_assert (first_bb == last_bb);
               continue;
@@ -3178,44 +3224,62 @@ schedule_region (int rgn)
 
       get_ebb_head_tail (first_bb, last_bb, &head, &tail);
 
-      if (no_real_insns_p (head, tail))
+      if (no_real_nondebug_insns_p (head, tail))
         {
           gcc_assert (first_bb == last_bb);
           save_state_for_fallthru_edge (last_bb, bb_state[first_bb->index]);
-          continue;
+
+          if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_NO)
+            continue;
+
+          /* As it's not no_real_nondebug_insns_p initially, then it has some
+             dependencies computed so free it artificially.  */
+          bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_ARTIFICIAL;
         }
+      else
+        {
+          current_sched_info->prev_head = PREV_INSN (head);
+          current_sched_info->next_tail = NEXT_INSN (tail);
-      current_sched_info->prev_head = PREV_INSN (head);
-      current_sched_info->next_tail = NEXT_INSN (tail);
+          remove_notes (head, tail);
-      remove_notes (head, tail);
+          unlink_bb_notes (first_bb, last_bb);
-      unlink_bb_notes (first_bb, last_bb);
+          target_bb = bb;
-      target_bb = bb;
+          gcc_assert (flag_schedule_interblock || current_nr_blocks == 1);
+          current_sched_info->queue_must_finish_empty = current_nr_blocks == 1;
-      gcc_assert (flag_schedule_interblock || current_nr_blocks == 1);
-      current_sched_info->queue_must_finish_empty = current_nr_blocks == 1;
+          curr_bb = first_bb;
+          if (dbg_cnt (sched_block))
+            {
+              int saved_last_basic_block = last_basic_block_for_fn (cfun);
-      curr_bb = first_bb;
-      if (dbg_cnt (sched_block))
-        {
-          int saved_last_basic_block = last_basic_block_for_fn (cfun);
+              schedule_block (&curr_bb, bb_state[first_bb->index]);
+              gcc_assert (EBB_FIRST_BB (bb) == first_bb);
+              sched_rgn_n_insns += sched_n_insns;
+              realloc_bb_state_array (saved_last_basic_block);
+              save_state_for_fallthru_edge (last_bb, curr_state);
-          schedule_block (&curr_bb, bb_state[first_bb->index]);
-          gcc_assert (EBB_FIRST_BB (bb) == first_bb);
-          sched_rgn_n_insns += sched_n_insns;
-          realloc_bb_state_array (saved_last_basic_block);
-          save_state_for_fallthru_edge (last_bb, curr_state);
-        }
-      else
-        {
-          sched_rgn_n_insns += rgn_n_insns;
-        }
+              /* Clean up.  */
+              if (current_nr_blocks > 1)
+                free_trg_info ();
+            }
+          else
+            bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_ARTIFICIAL;
+        }
-      /* Clean up.  */
-      if (current_nr_blocks > 1)
-        free_trg_info ();
+      /* We have counted this block when computing rgn_n_insns
+         previously, so need to fix up sched_rgn_n_insns now.  */
+      if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_ARTIFICIAL)
+        {
+          while (head != NEXT_INSN (tail))
+            {
+              if (INSN_P (head))
+                sched_rgn_n_insns++;
+              head = NEXT_INSN (head);
+            }
+        }
     }
 
   /* Sanity check: verify that all region insns were scheduled.  */
@@ -3223,13 +3287,13 @@ schedule_region (int rgn)
 
   sched_finish_ready_list ();
 
-  /* Done with this region.  */
-  sched_rgn_local_finish ();
-
   /* Free dependencies.  */
   for (bb = 0; bb < current_nr_blocks; ++bb)
     free_block_dependencies (bb);
 
+  /* Done with this region.  */
+  sched_rgn_local_finish ();
+
   gcc_assert (haifa_recovery_bb_ever_added_p || deps_pools_are_empty_p ());
 }
 
@@ -3450,6 +3514,19 @@ sched_rgn_local_init (int rgn)
             e->aux = NULL;
         }
     }
+
+  /* Initialize bb_deps_free_actions.  */
+  bb_deps_free_actions
+    = XNEWVEC (enum rgn_bb_deps_free_action, current_nr_blocks);
+  for (bb = 0; bb < current_nr_blocks; bb++)
+    {
+      rtx_insn *head, *tail;
+      get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
+      if (no_real_nondebug_insns_p (head, tail))
+        bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_NO;
+      else
+        bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_NORMAL;
+    }
 }
 
 /* Free data computed for the finished region.  */
@@ -3467,9 +3544,12 @@ sched_rgn_local_free (void)
 void
 sched_rgn_local_finish (void)
 {
-  if (current_nr_blocks > 1 && !sel_sched_p ())
+  if (!sel_sched_p ())
     {
-      sched_rgn_local_free ();
+      if (current_nr_blocks > 1)
+        sched_rgn_local_free ();
+
+      free (bb_deps_free_actions);
     }
 }
 
diff --git a/gcc/sel-sched.cc b/gcc/sel-sched.cc
index 1925f4a9461..8310c892e13 100644
--- a/gcc/sel-sched.cc
+++ b/gcc/sel-sched.cc
@@ -7213,7 +7213,8 @@ sel_region_target_finish (bool reset_sched_cycles_p)
 
       find_ebb_boundaries (EBB_FIRST_BB (i), scheduled_blocks);
 
-      if (no_real_insns_p (current_sched_info->head, current_sched_info->tail))
+      if (no_real_nondebug_insns_p (current_sched_info->head,
+                                    current_sched_info->tail))
         continue;
 
       if (reset_sched_cycles_p)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr108273.c b/gcc/testsuite/gcc.target/powerpc/pr108273.c
new file mode 100644
index 00000000000..937224eaa69
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr108273.c
@@ -0,0 +1,26 @@
+/* { dg-options "-O2 -fdbg-cnt=sched_block:1" } */
+/* { dg-prune-output {\*\*\*dbgcnt:.*limit.*reached} } */
+
+/* Verify there is no ICE.  */
+
+int a, b, c, e, f;
+float d;
+
+void
+g ()
+{
+  float h, i[1];
+  for (; f;)
+    if (c)
+      {
+        d *e;
+        if (b)
+          {
+            float *j = i;
+            j[0] += 0;
+          }
+        h += d;
+      }
+  if (h)
+    a = i[0];
+}
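[As a rough illustration of what the draft's resolve_forw_deps accomplishes, here is a hedged toy model. The types and the `resolve_forw_deps_toy` helper are invented for illustration and are not GCC's `sd_*` dependency API: a block skipped by the scheduler still had its dependencies computed earlier, so its forward dependencies must be resolved artificially, or consumers in later blocks would wait on them forever.]

```cpp
#include <cassert>
#include <vector>

// Toy dependency model; not GCC's rtx_insn or sd_* representation.
struct ToyInsn
{
  std::vector<ToyInsn *> forw_deps;  // consumers that depend on this insn
  int unresolved_back_deps = 0;      // producers this insn still waits on
};

// A consumer may enter the ready list only once every producer it depends
// on has been resolved.
static bool
ready_p (const ToyInsn &insn)
{
  return insn.unresolved_back_deps == 0;
}

// Analogue of the draft's resolve_forw_deps: for every insn of a skipped
// block, resolve each forward dependency so downstream consumers unblock
// even though the block itself is never scheduled.
static void
resolve_forw_deps_toy (std::vector<ToyInsn *> &skipped_block)
{
  for (ToyInsn *insn : skipped_block)
    for (ToyInsn *consumer : insn->forw_deps)
      consumer->unresolved_back_deps--;
}
```

[Without the artificial resolution step, a consumer whose only producer sits in a skipped block would never become ready; this is the situation the RGN_BB_DEPS_FREE_ARTIFICIAL action handles in the draft.]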