
[v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

Message ID 85b4098e-a72f-d013-ff17-8097971f71ba@linux.ibm.com
State New
Series [v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

Commit Message

Kewen.Lin Oct. 25, 2023, 2:45 a.m. UTC
Hi,

This is almost a repost of v2, which was posted at [1] in March,
except for:
  1) rebased on r14-4810, which is relatively up-to-date;
     some conflicts from the "int to bool" return type change
     have been resolved;
  2) adjusted the commit log a bit;
  3) fixed the misspelling "articial" -> "artificial" in a few places.

--
*v2 comments*:

To address Alexander's comments, this patch v2 mainly
makes the following changes against v1:

  - Rename no_real_insns_p to no_real_nondebug_insns_p;
  - Introduce enum rgn_bb_deps_free_action for three
    kinds of actions to free deps;
  - Change function free_deps_for_bb_no_real_insns_p to
    resolve_forw_deps, which only focuses on forward deps;
  - Extend the handling to cover dbg-cnt sched_block, and
    add one test case for it;
  - Move free_trg_info call in schedule_region to an
    appropriate place.

One thing I'm not sure about is the change in function
sched_rgn_local_finish: currently the invocation of
sched_rgn_local_free is guarded with !sel_sched_p (),
so I just followed that, but the initialization of those
structures (in sched_rgn_local_init) isn't guarded
with !sel_sched_p (), which looks odd.

--

As PR108273 shows, when a block has only NOTE_P and
LABEL_P insns in non-debug mode but some extra
DEBUG_INSN_P insns in debug mode, the DFA states after
scheduling it differ between debug and non-debug mode.
In non-debug mode the block satisfies no_real_insns_p and
gets skipped, while in debug mode it gets scheduled; even
though it has only NOTE_P, LABEL_P and DEBUG_INSN_P insns,
the call to function advance_one_cycle changes the DFA
state.  PR108519 also shows that this issue can be exposed
by some scheduler changes.

This patch changes function no_real_insns_p into function
no_real_nondebug_insns_p by taking debug insns into
account, so that we no longer try to schedule a block
having only NOTE_P, LABEL_P and DEBUG_INSN_P insns,
resulting in consistent DFA states between non-debug and
debug mode.
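
For reference, the renamed predicate (excerpted from the diff
quoted below) boils down to:

  bool
  no_real_nondebug_insns_p (const rtx_insn *head, const rtx_insn *tail)
  {
    while (head != NEXT_INSN (tail))
      {
        if (!NOTE_P (head)
            && !LABEL_P (head)
            && !DEBUG_INSN_P (head))
          return false;
        head = NEXT_INSN (head);
      }
    return true;
  }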

Changing no_real_insns_p to no_real_nondebug_insns_p caused
an ICE in free_block_dependencies.  The root cause is that
we create dependencies for debug insns and expect them to
be resolved during scheduling, but with this change those
insns get skipped.  Looking at the code, it seems
reasonable to skip computing block dependences for
no_real_nondebug_insns_p blocks.  There is also another
issue, exposed by SPEC2017 benchmark builds at -O2 -g: we
could skip scheduling some block which already has its
dependency graph built, so its dependencies are computed
and rgn_n_insns accumulated, and then the later
verification that the graph becomes exhausted by scheduling
would fail as follows:

  /* Sanity check: verify that all region insns were
     scheduled.  */
    gcc_assert (sched_rgn_n_insns == rgn_n_insns);

and also some forward deps aren't resolved.

As Alexander pointed out, the current debug count handling
also suffers from a similar issue, so this patch handles
these two cases together: one is a block that gets skipped
by !dbg_cnt (sched_block), the other is a block which is
not no_real_nondebug_insns_p initially but becomes
no_real_nondebug_insns_p due to speculative scheduling.
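
Both kinds of skipped blocks end up marked with the new per-block
action enum in sched-rgn.cc (excerpted from the diff quoted below),
which tells free_block_dependencies whether the remaining forward
deps need to be resolved artificially:

  /* Possible actions for dependencies freeing.  */
  enum rgn_bb_deps_free_action
  {
    /* This block doesn't get dependencies computed so don't need to free.  */
    RGN_BB_DEPS_FREE_NO,
    /* This block gets scheduled normally so free dependencies as usual.  */
    RGN_BB_DEPS_FREE_NORMAL,
    /* This block gets skipped in scheduling but has dependencies computed
       early, need to free the forward list artificially.  */
    RGN_BB_DEPS_FREE_ARTIFICIAL
  };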

This patch has been bootstrapped and regression-tested on
x86_64-redhat-linux, aarch64-linux-gnu and
powerpc64{,le}-linux-gnu.

I also verified that the SPEC2017 intrate and fprate
benchmark builds pass with this patch at -g -O2/-O3.

Any thoughts?  Is it ok for trunk?

[1] v2: https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614818.html
[2] v1: https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614224.html

BR,
Kewen
-----
	PR rtl-optimization/108273

gcc/ChangeLog:

	* haifa-sched.cc (no_real_insns_p): Rename to ...
	(no_real_nondebug_insns_p): ... this, and consider DEBUG_INSN_P insn.
	* sched-ebb.cc (schedule_ebb): Replace no_real_insns_p with
	no_real_nondebug_insns_p.
	* sched-int.h (no_real_insns_p): Rename to ...
	(no_real_nondebug_insns_p): ... this.
	* sched-rgn.cc (enum rgn_bb_deps_free_action): New enum.
	(bb_deps_free_actions): New static variable.
	(compute_block_dependences): Skip for no_real_nondebug_insns_p.
	(resolve_forw_deps): New function.
	(free_block_dependencies): Check bb_deps_free_actions and call
	function resolve_forw_deps for RGN_BB_DEPS_FREE_ARTIFICIAL.
	(compute_priorities): Replace no_real_insns_p with
	no_real_nondebug_insns_p.
	(schedule_region): Replace no_real_insns_p with
	no_real_nondebug_insns_p, set RGN_BB_DEPS_FREE_ARTIFICIAL if the block
	gets dependencies computed before but is skipped now, and fix up the
	count sched_rgn_n_insns for it too.  Call free_trg_info when the block
	gets scheduled, and move sched_rgn_local_finish after the
	free_block_dependencies loop.
	(sched_rgn_local_init): Allocate and compute bb_deps_free_actions.
	(sched_rgn_local_finish): Free bb_deps_free_actions.
	* sel-sched.cc (sel_region_target_finish): Replace no_real_insns_p with
	no_real_nondebug_insns_p.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr108273.c: New test.
---
 gcc/haifa-sched.cc                          |   9 +-
 gcc/sched-ebb.cc                            |   2 +-
 gcc/sched-int.h                             |   2 +-
 gcc/sched-rgn.cc                            | 148 +++++++++++++++-----
 gcc/sel-sched.cc                            |   3 +-
 gcc/testsuite/gcc.target/powerpc/pr108273.c |  26 ++++
 6 files changed, 150 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108273.c

--
2.39.1

Comments

Kewen.Lin Nov. 8, 2023, 2:49 a.m. UTC | #1
Hi,

Gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634201.html

BR,
Kewen

on 2023/10/25 10:45, Kewen.Lin wrote:
> [...]
> diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
> index 8e8add709b3..30cc90ec49f 100644
> --- a/gcc/haifa-sched.cc
> +++ b/gcc/haifa-sched.cc
> @@ -5033,14 +5033,17 @@ get_ebb_head_tail (basic_block beg, basic_block end,
>    *tailp = end_tail;
>  }
> 
> -/* Return true if there are no real insns in the range [ HEAD, TAIL ].  */
> +/* Return true if there are no real nondebug insns in the range
> +   [ HEAD, TAIL ].  */
> 
>  bool
> -no_real_insns_p (const rtx_insn *head, const rtx_insn *tail)
> +no_real_nondebug_insns_p (const rtx_insn *head, const rtx_insn *tail)
>  {
>    while (head != NEXT_INSN (tail))
>      {
> -      if (!NOTE_P (head) && !LABEL_P (head))
> +      if (!NOTE_P (head)
> +	  && !LABEL_P (head)
> +	  && !DEBUG_INSN_P (head))
>  	return false;
>        head = NEXT_INSN (head);
>      }
> diff --git a/gcc/sched-ebb.cc b/gcc/sched-ebb.cc
> index 110fcdbca4d..03d96290a7c 100644
> --- a/gcc/sched-ebb.cc
> +++ b/gcc/sched-ebb.cc
> @@ -491,7 +491,7 @@ schedule_ebb (rtx_insn *head, rtx_insn *tail, bool modulo_scheduling)
>    first_bb = BLOCK_FOR_INSN (head);
>    last_bb = BLOCK_FOR_INSN (tail);
> 
> -  if (no_real_insns_p (head, tail))
> +  if (no_real_nondebug_insns_p (head, tail))
>      return BLOCK_FOR_INSN (tail);
> 
>    gcc_assert (INSN_P (head) && INSN_P (tail));
> diff --git a/gcc/sched-int.h b/gcc/sched-int.h
> index 64a2f0bcff9..adca494ade5 100644
> --- a/gcc/sched-int.h
> +++ b/gcc/sched-int.h
> @@ -1397,7 +1397,7 @@ extern void free_global_sched_pressure_data (void);
>  extern int haifa_classify_insn (const_rtx);
>  extern void get_ebb_head_tail (basic_block, basic_block,
>  			       rtx_insn **, rtx_insn **);
> -extern bool no_real_insns_p (const rtx_insn *, const rtx_insn *);
> +extern bool no_real_nondebug_insns_p (const rtx_insn *, const rtx_insn *);
> 
>  extern int insn_sched_cost (rtx_insn *);
>  extern int dep_cost_1 (dep_t, dw_t);
> diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
> index e5964f54ead..2549e834aa8 100644
> --- a/gcc/sched-rgn.cc
> +++ b/gcc/sched-rgn.cc
> @@ -213,6 +213,22 @@ static int rgn_nr_edges;
>  /* Array of size rgn_nr_edges.  */
>  static edge *rgn_edges;
> 
> +/* Possible actions for dependencies freeing.  */
> +enum rgn_bb_deps_free_action
> +{
> +  /* This block doesn't get dependencies computed so don't need to free.  */
> +  RGN_BB_DEPS_FREE_NO,
> +  /* This block gets scheduled normally so free dependencies as usual.  */
> +  RGN_BB_DEPS_FREE_NORMAL,
> +  /* This block gets skipped in scheduling but has dependencies computed early,
> +     need to free the forward list artificially.  */
> +  RGN_BB_DEPS_FREE_ARTIFICIAL
> +};
> +
> +/* For basic block i, bb_deps_free_actions[i] indicates which action needs
> +   to be taken for freeing its dependencies.  */
> +static enum rgn_bb_deps_free_action *bb_deps_free_actions;
> +
>  /* Mapping from each edge in the graph to its number in the rgn.  */
>  #define EDGE_TO_BIT(edge) ((int)(size_t)(edge)->aux)
>  #define SET_EDGE_TO_BIT(edge,nr) ((edge)->aux = (void *)(size_t)(nr))
> @@ -2735,6 +2751,15 @@ compute_block_dependences (int bb)
>    gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
>    get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
> 
> +  /* Don't compute block dependences if there are no real nondebug insns.  */
> +  if (no_real_nondebug_insns_p (head, tail))
> +    {
> +      if (current_nr_blocks > 1)
> +	propagate_deps (bb, &tmp_deps);
> +      free_deps (&tmp_deps);
> +      return;
> +    }
> +
>    sched_analyze (&tmp_deps, head, tail);
> 
>    add_branch_dependences (head, tail);
> @@ -2749,6 +2774,24 @@ compute_block_dependences (int bb)
>      targetm.sched.dependencies_evaluation_hook (head, tail);
>  }
> 
> +/* Artificially resolve forward dependencies for instructions HEAD to TAIL.  */
> +
> +static void
> +resolve_forw_deps (rtx_insn *head, rtx_insn *tail)
> +{
> +  rtx_insn *insn;
> +  rtx_insn *next_tail = NEXT_INSN (tail);
> +  sd_iterator_def sd_it;
> +  dep_t dep;
> +
> +  /* There could be some insns which get skipped in scheduling but we compute
> +     dependencies for them previously, so make them resolved.  */
> +  for (insn = head; insn != next_tail; insn = NEXT_INSN (insn))
> +    for (sd_it = sd_iterator_start (insn, SD_LIST_FORW);
> +	 sd_iterator_cond (&sd_it, &dep);)
> +      sd_resolve_dep (sd_it);
> +}
> +
>  /* Free dependencies of instructions inside BB.  */
>  static void
>  free_block_dependencies (int bb)
> @@ -2758,9 +2801,12 @@ free_block_dependencies (int bb)
> 
>    get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
> 
> -  if (no_real_insns_p (head, tail))
> +  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_NO)
>      return;
> 
> +  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_ARTIFICIAL)
> +    resolve_forw_deps (head, tail);
> +
>    sched_free_deps (head, tail, true);
>  }
> 
> @@ -3024,7 +3070,7 @@ compute_priorities (void)
>        gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
>        get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
> 
> -      if (no_real_insns_p (head, tail))
> +      if (no_real_nondebug_insns_p (head, tail))
>  	continue;
> 
>        rgn_n_insns += set_priorities (head, tail);
> @@ -3158,7 +3204,7 @@ schedule_region (int rgn)
> 
>  	  get_ebb_head_tail (first_bb, last_bb, &head, &tail);
> 
> -	  if (no_real_insns_p (head, tail))
> +	  if (no_real_nondebug_insns_p (head, tail))
>  	    {
>  	      gcc_assert (first_bb == last_bb);
>  	      continue;
> @@ -3178,44 +3224,62 @@ schedule_region (int rgn)
> 
>        get_ebb_head_tail (first_bb, last_bb, &head, &tail);
> 
> -      if (no_real_insns_p (head, tail))
> +      if (no_real_nondebug_insns_p (head, tail))
>  	{
>  	  gcc_assert (first_bb == last_bb);
>  	  save_state_for_fallthru_edge (last_bb, bb_state[first_bb->index]);
> -	  continue;
> +
> +	  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_NO)
> +	    continue;
> +
> +	  /* As it's not no_real_nondebug_insns_p initially, then it has some
> +	     dependencies computed so free it artificially.  */
> +	  bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_ARTIFICIAL;
>  	}
> +      else
> +	{
> +	  current_sched_info->prev_head = PREV_INSN (head);
> +	  current_sched_info->next_tail = NEXT_INSN (tail);
> 
> -      current_sched_info->prev_head = PREV_INSN (head);
> -      current_sched_info->next_tail = NEXT_INSN (tail);
> +	  remove_notes (head, tail);
> 
> -      remove_notes (head, tail);
> +	  unlink_bb_notes (first_bb, last_bb);
> 
> -      unlink_bb_notes (first_bb, last_bb);
> +	  target_bb = bb;
> 
> -      target_bb = bb;
> +	  gcc_assert (flag_schedule_interblock || current_nr_blocks == 1);
> +	  current_sched_info->queue_must_finish_empty = current_nr_blocks == 1;
> 
> -      gcc_assert (flag_schedule_interblock || current_nr_blocks == 1);
> -      current_sched_info->queue_must_finish_empty = current_nr_blocks == 1;
> +	  curr_bb = first_bb;
> +	  if (dbg_cnt (sched_block))
> +	    {
> +	      int saved_last_basic_block = last_basic_block_for_fn (cfun);
> 
> -      curr_bb = first_bb;
> -      if (dbg_cnt (sched_block))
> -        {
> -	  int saved_last_basic_block = last_basic_block_for_fn (cfun);
> +	      schedule_block (&curr_bb, bb_state[first_bb->index]);
> +	      gcc_assert (EBB_FIRST_BB (bb) == first_bb);
> +	      sched_rgn_n_insns += sched_n_insns;
> +	      realloc_bb_state_array (saved_last_basic_block);
> +	      save_state_for_fallthru_edge (last_bb, curr_state);
> 
> -	  schedule_block (&curr_bb, bb_state[first_bb->index]);
> -	  gcc_assert (EBB_FIRST_BB (bb) == first_bb);
> -	  sched_rgn_n_insns += sched_n_insns;
> -	  realloc_bb_state_array (saved_last_basic_block);
> -	  save_state_for_fallthru_edge (last_bb, curr_state);
> -        }
> -      else
> -        {
> -          sched_rgn_n_insns += rgn_n_insns;
> -        }
> +	      /* Clean up.  */
> +	      if (current_nr_blocks > 1)
> +		free_trg_info ();
> +	    }
> +	  else
> +	    bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_ARTIFICIAL;
> +	}
> 
> -      /* Clean up.  */
> -      if (current_nr_blocks > 1)
> -	free_trg_info ();
> +      /* We have counted this block when computing rgn_n_insns
> +	 previously, so need to fix up sched_rgn_n_insns now.  */
> +      if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_ARTIFICIAL)
> +	{
> +	  while (head != NEXT_INSN (tail))
> +	    {
> +	      if (INSN_P (head))
> +		sched_rgn_n_insns++;
> +	      head = NEXT_INSN (head);
> +	    }
> +	}
>      }
> 
>    /* Sanity check: verify that all region insns were scheduled.  */
> @@ -3223,13 +3287,13 @@ schedule_region (int rgn)
> 
>    sched_finish_ready_list ();
> 
> -  /* Done with this region.  */
> -  sched_rgn_local_finish ();
> -
>    /* Free dependencies.  */
>    for (bb = 0; bb < current_nr_blocks; ++bb)
>      free_block_dependencies (bb);
> 
> +  /* Done with this region.  */
> +  sched_rgn_local_finish ();
> +
>    gcc_assert (haifa_recovery_bb_ever_added_p
>  	      || deps_pools_are_empty_p ());
>  }
> @@ -3450,6 +3514,19 @@ sched_rgn_local_init (int rgn)
>  	    e->aux = NULL;
>          }
>      }
> +
> +  /* Initialize bb_deps_free_actions.  */
> +  bb_deps_free_actions
> +    = XNEWVEC (enum rgn_bb_deps_free_action, current_nr_blocks);
> +  for (bb = 0; bb < current_nr_blocks; bb++)
> +    {
> +      rtx_insn *head, *tail;
> +      get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
> +      if (no_real_nondebug_insns_p (head, tail))
> +	bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_NO;
> +      else
> +	bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_NORMAL;
> +    }
>  }
> 
>  /* Free data computed for the finished region.  */
> @@ -3467,9 +3544,12 @@ sched_rgn_local_free (void)
>  void
>  sched_rgn_local_finish (void)
>  {
> -  if (current_nr_blocks > 1 && !sel_sched_p ())
> +  if (!sel_sched_p ())
>      {
> -      sched_rgn_local_free ();
> +      if (current_nr_blocks > 1)
> +	sched_rgn_local_free ();
> +
> +      free (bb_deps_free_actions);
>      }
>  }
> 
> diff --git a/gcc/sel-sched.cc b/gcc/sel-sched.cc
> index 1925f4a9461..8310c892e13 100644
> --- a/gcc/sel-sched.cc
> +++ b/gcc/sel-sched.cc
> @@ -7213,7 +7213,8 @@ sel_region_target_finish (bool reset_sched_cycles_p)
> 
>        find_ebb_boundaries (EBB_FIRST_BB (i), scheduled_blocks);
> 
> -      if (no_real_insns_p (current_sched_info->head, current_sched_info->tail))
> +      if (no_real_nondebug_insns_p (current_sched_info->head,
> +				    current_sched_info->tail))
>  	continue;
> 
>        if (reset_sched_cycles_p)
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108273.c b/gcc/testsuite/gcc.target/powerpc/pr108273.c
> new file mode 100644
> index 00000000000..937224eaa69
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr108273.c
> @@ -0,0 +1,26 @@
> +/* { dg-options "-O2 -fdbg-cnt=sched_block:1" } */
> +/* { dg-prune-output {\*\*\*dbgcnt:.*limit.*reached} } */
> +
> +/* Verify there is no ICE.  */
> +
> +int a, b, c, e, f;
> +float d;
> +
> +void
> +g ()
> +{
> +  float h, i[1];
> +  for (; f;)
> +    if (c)
> +      {
> +	d *e;
> +	if (b)
> +	  {
> +	    float *j = i;
> +	    j[0] += 0;
> +	  }
> +	h += d;
> +      }
> +  if (h)
> +    a = i[0];
> +}
> --
> 2.39.1
Richard Sandiford Nov. 8, 2023, 9:45 a.m. UTC | #2
"Kewen.Lin" <linkw@linux.ibm.com> writes:
> Hi,
>
> Gentle ping this:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634201.html

Sorry for the lack of review on this.  Personally, I've never looked
at this part of the code base in detail, so I don't think I can do a proper
review.  I'll try to have a look in stage 3 if no one more qualified
beats me to it.

Thanks,
Richard

Maxim Kuvyrkov Nov. 9, 2023, 9:10 a.m. UTC | #3
Hi Kewen,

Below are my comments.  I don't want to override Alexander's review, and if the patch looks good to him, it's fine to ignore my concerns.

My main concern is that this adds a new entity -- forceful skipping of DEBUG_INSN-only basic blocks -- to the scheduler for a somewhat minor change in behavior.  Unlike NOTEs and LABELs, DEBUG_INSNs are INSNS, and there is already quite a bit of logic in the scheduler to skip them _as part of normal operation_.

Would you please consider 2 ideas below.

#1:
After a brief look, I'm guessing this part is causing the problem:
haifa-sched.cc:schedule_block():
=== [1]
  /* Loop until all the insns in BB are scheduled.  */
  while ((*current_sched_info->schedule_more_p) ())
    {
      perform_replacements_new_cycle ();
      do
	{
	  start_clock_var = clock_var;

	  clock_var++;

	  advance_one_cycle ();
===

and then in the nested loop we have
=== [2]
	  /* We don't want md sched reorder to even see debug isns, so put
	     them out right away.  */
	  if (ready.n_ready && DEBUG_INSN_P (ready_element (&ready, 0))
	      && (*current_sched_info->schedule_more_p) ())
	    {
	      while (ready.n_ready && DEBUG_INSN_P (ready_element (&ready, 0)))
		{
		  rtx_insn *insn = ready_remove_first (&ready);
		  gcc_assert (DEBUG_INSN_P (insn));
		  (*current_sched_info->begin_schedule_ready) (insn);
		  scheduled_insns.safe_push (insn);
		  last_scheduled_insn = insn;
		  advance = schedule_insn (insn);
		  gcc_assert (advance == 0);
		  if (ready.n_ready > 0)
		    ready_sort (&ready);
		}
	    }
===
At point [1] we already have a sorted ready list, and I don't see any blockers to doing [2] before calling advance_one_cycle().
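
A rough, untested sketch of that reordering (the exact placement
relative to perform_replacements_new_cycle () is an assumption on my
part; the flush loop itself is just [2] hoisted up):
=== [sketch for #1]
	  /* Emit any DEBUG_INSNs at the head of the ready list before the
	     DFA is advanced, so a block that only has debug insns left
	     never gets its cycle advanced.  */
	  while (ready.n_ready && DEBUG_INSN_P (ready_element (&ready, 0)))
	    {
	      rtx_insn *insn = ready_remove_first (&ready);
	      gcc_assert (DEBUG_INSN_P (insn));
	      (*current_sched_info->begin_schedule_ready) (insn);
	      scheduled_insns.safe_push (insn);
	      last_scheduled_insn = insn;
	      advance = schedule_insn (insn);
	      gcc_assert (advance == 0);
	      if (ready.n_ready > 0)
		ready_sort (&ready);
	    }

	  /* If only debug insns were pending, don't advance the DFA at all;
	     whether schedule_more_p is the right re-check here is an
	     assumption.  */
	  if ((*current_sched_info->schedule_more_p) ())
	    advance_one_cycle ();
===
That would keep the existing debug-insn handling intact and only
change where it runs relative to the DFA advance.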

#2
Another approach, which might be even easier, is to save the state of the DFA before the initial advance_one_cycle(), and then restore it if no real insns have been scheduled.
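
Something along these lines (again only an untested sketch; the
entry_state buffer and the nondebug_insns_scheduled counter are
hypothetical names for this illustration, not existing scheduler
state):
=== [sketch for #2]
  /* On entry to schedule_block (), snapshot the DFA state.  */
  state_t entry_state = xmalloc (dfa_state_size);
  memcpy (entry_state, curr_state, dfa_state_size);

  /* ... existing scheduling loop runs here ...  */

  /* If nothing but debug insns got scheduled, undo the cycle advances so
     the final DFA state matches what a non-debug compilation would see.  */
  if (nondebug_insns_scheduled == 0)
    memcpy (curr_state, entry_state, dfa_state_size);
  free (entry_state);
===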

Kind regards,

--
Maxim Kuvyrkov
https://www.linaro.org


>> +  }
>> + h += d;
>> +      }
>> +  if (h)
>> +    a = i[0];
>> +}
>> --
>> 2.39.1
Alexander Monakov Nov. 9, 2023, 5:40 p.m. UTC | #4
On Thu, 9 Nov 2023, Maxim Kuvyrkov wrote:

> Hi Kewen,
> 
> Below are my comments.  I don't want to override Alexander's review, and if
> the patch looks good to him, it's fine to ignore my concerns.
> 
> My main concern is that this adds a new entity -- forceful skipping of
> DEBUG_INSN-only basic blocks -- to the scheduler for a somewhat minor change
> in behavior.  Unlike NOTEs and LABELs, DEBUG_INSNs are INSNS, and there is
> already quite a bit of logic in the scheduler to skip them _as part of normal
> operation_.

I agree with the concern. I hoped that solving the problem by skipping the BB
like the (bit-rotted) debug code needs to would be a minor surgery. As things
look now, it may be better to remove the non-working sched_block debug counter
entirely and implement a good solution for the problem at hand.

> 
> Would you please consider 2 ideas below.
> 
> #1:
> After a brief look, I'm guessing this part is causing the problem:
> haifa-sched.cc:schedule_block():
> === [1]
>   /* Loop until all the insns in BB are scheduled.  */
>   while ((*current_sched_info->schedule_more_p) ())
>     {
>       perform_replacements_new_cycle ();
>       do
> 	{
> 	  start_clock_var = clock_var;
> 
> 	  clock_var++;
> 
> 	  advance_one_cycle ();

As I understand, we have spurious calls to advance_one_cycle on basic block
boundaries, which don't model the hardware (the CPU doesn't see BB boundaries)
and cause divergence when passing through a debug-only BB which would not be
present at all without -g.

Since EBBs and regions may not have jump targets in the middle, advancing
a cycle on BB boundaries does not seem well motivated. Can we remove it?

Can we teach haifa-sched to emit RTX NOTEs with hashes of DFA states on BB
boundaries when -fcompare-debug is enabled? It should make the problem
readily detectable by -fcompare-debug even when scheduling did not diverge.
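
For illustration, a rough sketch of what the checksum itself could look like
(this is only a sketch, not part of any posted patch; sched_dfa_state_hash is
a made-up name, and whether the value gets attached to a new RTX NOTE kind --
which would need an entry in insn-notes.def -- or is merely dumped is left
open here):

/* Checksum the opaque DFA state so per-block hashes can be compared
   between -g and non-debug compilations.  dfa_state_size and curr_state
   are the existing scheduler globals.  */
static unsigned HOST_WIDE_INT
sched_dfa_state_hash (state_t state)
{
  const unsigned char *p = (const unsigned char *) state;
  unsigned HOST_WIDE_INT h = 0;
  for (size_t i = 0; i < dfa_state_size; i++)
    h = h * 131 + p[i];
  return h;
}

/* Possible use, e.g. in schedule_region right after schedule_block
   returns for FIRST_BB:  */
if (sched_verbose >= 2)
  fprintf (sched_dump, ";; DFA state hash after bb %d: "
	   HOST_WIDE_INT_PRINT_HEX "\n", first_bb->index,
	   sched_dfa_state_hash (curr_state));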

Alexander
Kewen.Lin Nov. 10, 2023, 1:57 a.m. UTC | #5
Hi Maxim and Alexander,

Thanks a lot for the review comments!

on 2023/11/10 01:40, Alexander Monakov wrote:
> 
> On Thu, 9 Nov 2023, Maxim Kuvyrkov wrote:
> 
>> Hi Kewen,
>>
>> Below are my comments.  I don't want to override Alexander's review, and if
>> the patch looks good to him, it's fine to ignore my concerns.
>>
>> My main concern is that this adds a new entity -- forceful skipping of
>> DEBUG_INSN-only basic blocks -- to the scheduler for a somewhat minor change
>> in behavior.  Unlike NOTEs and LABELs, DEBUG_INSNs are INSNS, and there is
>> already quite a bit of logic in the scheduler to skip them _as part of normal
>> operation_.

Yeah, I noticed that the scheduler takes care of DEBUG_INSNs as normal operations.
When I started to work on this issue, initially I wanted to try something similar
to your idea #2, but when checking the APIs, I realized why not just skip the basic
block with NOTEs and LABELs, DEBUG_INSNs as well.  IMHO there is no value to try to
schedule this kind of BB (to be scheduled range), skipping it can save some resource
allocation (like block dependencies) and make it more efficient (not enter function
schedule_block etc.), from this perspective it seems an enhancement.  Does it sound
reasonable to you?

> 
> I agree with the concern. I hoped that solving the problem by skipping the BB
> like the (bit-rotted) debug code needs to would be a minor surgery. As things
> look now, it may be better to remove the non-working sched_block debug counter
> entirely and implement a good solution for the problem at hand.

OK, if debug counter sched_block is useless and can be removed, then the proposed
new skipping becomes the only actual need for the artificial resolve_forw_deps.

> 
>>
>> Would you please consider 2 ideas below.
>>
>> #1:
>> After a brief look, I'm guessing this part is causing the problem:
>> haifa-sched.cc:schedule_block():
>> === [1]
>>   /* Loop until all the insns in BB are scheduled.  */
>>   while ((*current_sched_info->schedule_more_p) ())
>>     {
>>       perform_replacements_new_cycle ();
>>       do
>> 	{
>> 	  start_clock_var = clock_var;
>>
>> 	  clock_var++;
>>
>> 	  advance_one_cycle ();
> 
> As I understand, we have spurious calls to advance_one_cycle on basic block
> boundaries, which don't model the hardware (the CPU doesn't see BB boundaries)
> and cause divergence when passing through a debug-only BB which would not be
> present at all without -g.
> 
> Since EBBs and regions may not have jump targets in the middle, advancing
> a cycle on BB boundaries does not seem well motivated. Can we remove it?
> 
> Can we teach haifa-sched to emit RTX NOTEs with hashes of DFA states on BB
> boundaries when -fcompare-debug is enabled? It should make the problem
> readily detectable by -fcompare-debug even when scheduling did not diverge.

Good idea!  It would be easy to detect the inconsistency with such a note.

BR,
Kewen
Jeff Law Nov. 10, 2023, 3:49 a.m. UTC | #6
On 11/9/23 10:40, Alexander Monakov wrote:
> 
> On Thu, 9 Nov 2023, Maxim Kuvyrkov wrote:
> 
>> Hi Kewen,
>>
>> Below are my comments.  I don't want to override Alexander's review, and if
>> the patch looks good to him, it's fine to ignore my concerns.
>>
>> My main concern is that this adds a new entity -- forceful skipping of
>> DEBUG_INSN-only basic blocks -- to the scheduler for a somewhat minor change
>> in behavior.  Unlike NOTEs and LABELs, DEBUG_INSNs are INSNS, and there is
>> already quite a bit of logic in the scheduler to skip them _as part of normal
>> operation_.
> 
> I agree with the concern. I hoped that solving the problem by skipping the BB
> like the (bit-rotted) debug code needs to would be a minor surgery. As things
> look now, it may be better to remove the non-working sched_block debug counter
> entirely and implement a good solution for the problem at hand.
I wouldn't lose sleep over this -- if removing a debug counter from this 
code simplifies the implementation, that's fine with me.  I believe they 
were all added when significant work in the scheduler was more common.


> 
> Can we teach haifa-sched to emit RTX NOTEs with hashes of DFA states on BB
> boundaries when -fcompare-debug is enabled? It should make the problem
> readily detectable by -fcompare-debug even when scheduling did not diverge.
I like it.

jeff
Jeff Law Nov. 10, 2023, 3:49 a.m. UTC | #7
On 11/9/23 18:57, Kewen.Lin wrote:
> Hi Maxim and Alexander,
> 
> Thanks a lot for the review comments!
> 
> on 2023/11/10 01:40, Alexander Monakov wrote:
>>
>> On Thu, 9 Nov 2023, Maxim Kuvyrkov wrote:
>>
>>> Hi Kewen,
>>>
>>> Below are my comments.  I don't want to override Alexander's review, and if
>>> the patch looks good to him, it's fine to ignore my concerns.
>>>
>>> My main concern is that this adds a new entity -- forceful skipping of
>>> DEBUG_INSN-only basic blocks -- to the scheduler for a somewhat minor change
>>> in behavior.  Unlike NOTEs and LABELs, DEBUG_INSNs are INSNS, and there is
>>> already quite a bit of logic in the scheduler to skip them _as part of normal
>>> operation_.
> 
> Yeah, I noticed that the scheduler takes care of DEBUG_INSNs as normal operations.
> When I started to work on this issue, initially I wanted to try something similar
> to your idea #2, but when checking the APIs, I realized why not just skip the basic
> block with NOTEs and LABELs, DEBUG_INSNs as well.  IMHO there is no value to try to
> schedule this kind of BB (to be scheduled range), skipping it can save some resource
> allocation (like block dependencies) and make it more efficient (not enter function
> schedule_block etc.), from this perspective it seems an enhancement.  Does it sound
> reasonable to you?
It sounds reasonable, but only if doing so doesn't add significant 
implementation complexity.  ie, the gains from doing less work here are 
likely to be very marginal, so I'm more interested in clean, easy to 
maintain code.

Jeff
Alexander Monakov Nov. 10, 2023, 11:25 a.m. UTC | #8
On Thu, 9 Nov 2023, Jeff Law wrote:

> > Yeah, I noticed that the scheduler takes care of DEBUG_INSNs as normal
> > operations.  When I started to work on this issue, initially I wanted to try
> > something similar to your idea #2, but when checking the APIs, I realized
> > why not just skip the basic block with NOTEs and LABELs, DEBUG_INSNs as
> > well.  IMHO there is no value to try to schedule this kind of BB (to be
> > scheduled range), skipping it can save some resource allocation (like block
> > dependencies) and make it more efficient (not enter function schedule_block
> > etc.), from this perspective it seems an enhancement.  Does it sound
> > reasonable to you?
> It sounds reasonable, but only if doing so doesn't add significant
> implementation complexity.  ie, the gains from doing less work here are likely
> to be very marginal, so I'm more interested in clean, easy to maintain code.

I'm afraid ignoring debug-only BBs goes contrary to overall var-tracking design:
DEBUG_INSNs participate in dependency graph so that schedulers can remove or
mutate them as needed when moving real insns across them.

Cc'ing Alexandre Oliva who can correct me on that if necessary.

Alexander
Richard Biener Nov. 10, 2023, 1:32 p.m. UTC | #9
On Fri, Nov 10, 2023 at 12:25 PM Alexander Monakov <amonakov@ispras.ru> wrote:
>
>
> On Thu, 9 Nov 2023, Jeff Law wrote:
>
> > > Yeah, I noticed that the scheduler takes care of DEBUG_INSNs as normal
> > > operations.  When I started to work on this issue, initially I wanted to try
> > > something similar to your idea #2, but when checking the APIs, I realized
> > > why not just skip the basic block with NOTEs and LABELs, DEBUG_INSNs as
> > > well.  IMHO there is no value to try to schedule this kind of BB (to be
> > > scheduled range), skipping it can save some resource allocation (like block
> > > dependencies) and make it more efficient (not enter function schedule_block
> > > etc.), from this perspective it seems an enhancement.  Does it sound
> > > reasonable to you?
> > It sounds reasonable, but only if doing so doesn't add significant
> > implementation complexity.  ie, the gains from doing less work here are likely
> > to be very marginal, so I'm more interested in clean, easy to maintain code.
>
> I'm afraid ignoring debug-only BBs goes contrary to overall var-tracking design:
> DEBUG_INSNs participate in dependency graph so that schedulers can remove or
> mutate them as needed when moving real insns across them.

Note that debug-only BBs do not exist - the BB would be there even without debug
insns!  So instead you have to handle BBs with just debug insns the same you
handle a completely empty BB.

> Cc'ing Alexandre Oliva who can correct me on that if necessary.
>
> Alexander
Alexander Monakov Nov. 10, 2023, 2:18 p.m. UTC | #10
On Fri, 10 Nov 2023, Richard Biener wrote:

> > I'm afraid ignoring debug-only BBs goes contrary to overall var-tracking design:
> > DEBUG_INSNs participate in dependency graph so that schedulers can remove or
> > mutate them as needed when moving real insns across them.
> 
> Note that debug-only BBs do not exist - the BB would be there even without debug
> insns!

Yep, sorry, I misspoke when I earlier said

>> and cause divergence when passing through a debug-only BB which would not be
>> present at all without -g.

They are present in the region, but skipped via no_real_insns_p.

> So instead you have to handle BBs with just debug insns the same you
> handle a completely empty BB.

Yeah. There would be no problem if the scheduler never used no_real_insns_p
and handled empty and non-empty BBs the same way.

Alexander
Richard Biener Nov. 10, 2023, 2:20 p.m. UTC | #11
On Fri, Nov 10, 2023 at 3:18 PM Alexander Monakov <amonakov@ispras.ru> wrote:
>
>
> On Fri, 10 Nov 2023, Richard Biener wrote:
>
> > > I'm afraid ignoring debug-only BBs goes contrary to overall var-tracking design:
> > > DEBUG_INSNs participate in dependency graph so that schedulers can remove or
> > > mutate them as needed when moving real insns across them.
> >
> > Note that debug-only BBs do not exist - the BB would be there even without debug
> > insns!
>
> Yep, sorry, I misspoke when I earlier said
>
> >> and cause divergence when passing through a debug-only BB which would not be
> >> present at all without -g.
>
> They are present in the region, but skipped via no_real_insns_p.
>
> > So instead you have to handle BBs with just debug insns the same you
> > handle a completely empty BB.
>
> Yeah. There would be no problem if the scheduler never used no_real_insns_p
> and handled empty and non-empty BBs the same way.

And I suppose it would be OK to do that.  Empty BBs are usually removed by
CFG cleanup so the situation should only happen in rare corner cases where
the fix would be to actually run CFG cleanup ...

Richard.

> Alexander
Alexander Monakov Nov. 10, 2023, 2:41 p.m. UTC | #12
On Fri, 10 Nov 2023, Richard Biener wrote:

> On Fri, Nov 10, 2023 at 3:18 PM Alexander Monakov <amonakov@ispras.ru> wrote:
> >
> >
> > On Fri, 10 Nov 2023, Richard Biener wrote:
> >
> > > > I'm afraid ignoring debug-only BBs goes contrary to overall var-tracking design:
> > > > DEBUG_INSNs participate in dependency graph so that schedulers can remove or
> > > > mutate them as needed when moving real insns across them.
> > >
> > > Note that debug-only BBs do not exist - the BB would be there even without debug
> > > insns!
> >
> > Yep, sorry, I misspoke when I earlier said
> >
> > >> and cause divergence when passing through a debug-only BB which would not be
> > >> present at all without -g.
> >
> > They are present in the region, but skipped via no_real_insns_p.
> >
> > > So instead you have to handle BBs with just debug insns the same you
> > > handle a completely empty BB.
> >
> > Yeah. There would be no problem if the scheduler never used no_real_insns_p
> > and handled empty and non-empty BBs the same way.
> 
> And I suppose it would be OK to do that.  Empty BBs are usually removed by
> CFG cleanup so the situation should only happen in rare corner cases where
> the fix would be to actually run CFG cleanup ...

Yeah, sel-sched invokes 'cfg_cleanup (0)' up front, and I suppose that
may be a preferable compromise for sched-rgn as well.

I'm afraid one does not simply remove all uses of no_real_insns_p from
sched-rgn, but would be happy to be wrong about that.

Alexander
Kewen.Lin Nov. 15, 2023, 9:12 a.m. UTC | #13
Hi Alexander/Richard/Jeff,

Thanks for the insightful comments!

on 2023/11/10 22:41, Alexander Monakov wrote:
> 
> On Fri, 10 Nov 2023, Richard Biener wrote:
> 
>> On Fri, Nov 10, 2023 at 3:18 PM Alexander Monakov <amonakov@ispras.ru> wrote:
>>>
>>>
>>> On Fri, 10 Nov 2023, Richard Biener wrote:
>>>
>>>>> I'm afraid ignoring debug-only BBs goes contrary to overall var-tracking design:
>>>>> DEBUG_INSNs participate in dependency graph so that schedulers can remove or
>>>>> mutate them as needed when moving real insns across them.
>>>>
>>>> Note that debug-only BBs do not exist - the BB would be there even without debug
>>>> insns!
>>>
>>> Yep, sorry, I misspoke when I earlier said
>>>
>>>>> and cause divergence when passing through a debug-only BB which would not be
>>>>> present at all without -g.
>>>
>>> They are present in the region, but skipped via no_real_insns_p.
>>>
>>>> So instead you have to handle BBs with just debug insns the same you
>>>> handle a completely empty BB.
>>>
>>> Yeah. There would be no problem if the scheduler never used no_real_insns_p
>>> and handled empty and non-empty BBs the same way.
>>
>> And I suppose it would be OK to do that.  Empty BBs are usually removed by
>> CFG cleanup so the situation should only happen in rare corner cases where
>> the fix would be to actually run CFG cleanup ...
> 
> Yeah, sel-sched invokes 'cfg_cleanup (0)' up front, and I suppose that
> may be a preferable compromise for sched-rgn as well.
> 

Inspired by this discussion, I tested the attached patch 1 which is to run
cleanup_cfg (0) first in haifa_sched_init, it's bootstrapped and
regress-tested on x86_64-redhat-linux and powerpc64{,le}-linux-gnu.

Then I assumed some of the current uses of no_real_insns_p won't encounter
empty blocks any more, so made a patch 2 with some explicit assertions, but
unfortunately I got ICEs during bootstrapping in function
compute_priorities.  I'm going to investigate it further and post more
findings, but just heads-up to ensure if this is on the right track.

BR,
Kewen
From 7652655f278cfe0f6271c50aecb56e68e0877cc2 Mon Sep 17 00:00:00 2001
From: Kewen Lin <linkw@linux.ibm.com>
Date: Tue, 14 Nov 2023 15:16:47 +0800
Subject: [PATCH] sched: cleanup cfg for empty blocks first in haifa_sched_init
 [PR108273]

PR108273 exposed the inconsistent states issue between
non-debug mode and debug mode, as the discussion in [1],
we can follow the current practice in sel_global_init,
to run cleanup_cfg (0) first to remove empty blocks.

This patch is to follow this direction and remove empty
blocks by cleanup_cfg (0) in haifa_sched_init which
affects sms, ebb and rgn schedulings.

	PR rtl-optimization/108273

gcc/ChangeLog:

	* haifa-sched.cc (haifa_sched_init): Call cleanup_cfg (0) to remove
	empty blocks.
---
 gcc/haifa-sched.cc | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
index 8e8add709b3..e348d1a2119 100644
--- a/gcc/haifa-sched.cc
+++ b/gcc/haifa-sched.cc
@@ -7375,6 +7375,12 @@ haifa_sched_init (void)
       sched_deps_info->generate_spec_deps = 1;
     }
 
+  /* Remove empty blocks to avoid some inconsistency like: we skip
+     empty block in scheduling but don't for empty block + only
+     debug_insn, it could result in different subsequent states
+     and unexpected insn sequence difference.  */
+  cleanup_cfg (0);
+
   /* Initialize luids, dependency caches, target and h_i_d for the
      whole function.  */
   {
Alexander Monakov Nov. 15, 2023, 9:43 a.m. UTC | #14
On Wed, 15 Nov 2023, Kewen.Lin wrote:

> >> And I suppose it would be OK to do that.  Empty BBs are usually removed by
> >> CFG cleanup so the situation should only happen in rare corner cases where
> >> the fix would be to actually run CFG cleanup ...
> > 
> > Yeah, sel-sched invokes 'cfg_cleanup (0)' up front, and I suppose that
> > may be a preferable compromise for sched-rgn as well.
> 
> Inspired by this discussion, I tested the attached patch 1 which is to run
> cleanup_cfg (0) first in haifa_sched_init, it's bootstrapped and
> regress-tested on x86_64-redhat-linux and powerpc64{,le}-linux-gnu.

I don't think you can run cleanup_cfg after sched_init. I would suggest
to put it early in schedule_insns.

> Then I assumed some of the current uses of no_real_insns_p won't encounter
> empty blocks any more, so made a patch 2 with some explicit assertions, but
> unfortunately I got ICEs during bootstrapping in function
> compute_priorities.  I'm going to investigate it further and post more
> findings, but just heads-up to ensure if this is on the right track.

I suspect this may be caused by invoking cleanup_cfg too late.

Alexander
Kewen.Lin Nov. 17, 2023, 9:03 a.m. UTC | #15
on 2023/11/15 17:43, Alexander Monakov wrote:
> 
> On Wed, 15 Nov 2023, Kewen.Lin wrote:
> 
>>>> And I suppose it would be OK to do that.  Empty BBs are usually removed by
>>>> CFG cleanup so the situation should only happen in rare corner cases where
>>>> the fix would be to actually run CFG cleanup ...
>>>
>>> Yeah, sel-sched invokes 'cfg_cleanup (0)' up front, and I suppose that
>>> may be a preferable compromise for sched-rgn as well.
>>
>> Inspired by this discussion, I tested the attached patch 1 which is to run
>> cleanup_cfg (0) first in haifa_sched_init, it's bootstrapped and
>> regress-tested on x86_64-redhat-linux and powerpc64{,le}-linux-gnu.
> 
> I don't think you can run cleanup_cfg after sched_init. I would suggest
> to put it early in schedule_insns.

Thanks for the suggestion, I placed it at the beginning of haifa_sched_init
instead, since schedule_insns invokes haifa_sched_init, although the
calls rgn_setup_common_sched_info and rgn_setup_sched_infos are executed
ahead but they are all "setup" functions, shouldn't affect or be affected
by this placement.

> 
>> Then I assumed some of the current uses of no_real_insns_p won't encounter
>> empty blocks any more, so made a patch 2 with some explicit assertions, but
>> unfortunately I got ICEs during bootstrapping in function
>> compute_priorities.  I'm going to investigate it further and post more
>> findings, but just heads-up to ensure if this is on the right track.
> 
> I suspect this may be caused by invoking cleanup_cfg too late.

By looking into some failures, I found that although cleanup_cfg is executed
there would be still some empty blocks left, by analyzing a few failures there
are at least such cases:
  1. empty function body
  2. block holding a label for return.
  3. block without any successor.
  4. block which becomes empty after scheduling some other block.
  5. block which looks mergeable with its always successor but left.
  ...

For 1,2, there is one single successor EXIT block, I think they don't affect
state transition, for 3, it's the same.  For 4, it depends on if we can have
the assumption this kind of empty block doesn't have the chance to have debug
insn (like associated debug insn should be moved along), I'm not sure.  For 5,
a reduced test case is:

#include <istream>
namespace std {
template <>
basic_istream<char> &
basic_istream<char>::getline (char_type *, streamsize, char_type) __try
{
  __streambuf_type *a = rdbuf ();
  a->sgetc ();
  while (traits_type::eq_int_type)
    ;
}
__catch (...) {}
} // namespace std

slim RTL:

    1: NOTE_INSN_DELETED
    7: NOTE_INSN_BASIC_BLOCK 2
...
   15: r137:CCUNS=cmp(r135:DI,r136:DI)
      REG_DEAD r136:DI
      REG_DEAD r135:DI
   16: pc={(ltu(r137:CCUNS,0))?L24:pc}
      REG_DEAD r137:CCUNS
      REG_BR_PROB 966367644
   17: NOTE_INSN_BASIC_BLOCK 3
   18: r138:DI=[r122:DI]
   19: r139:DI=[r138:DI+0x48]
      REG_DEAD r138:DI
   20: %3:DI=r122:DI
      REG_DEAD r122:DI
   21: %12:DI=r139:DI
   22: ctr:DI=r139:DI
      REG_DEAD r139:DI
   23: %3:DI=call [ctr:DI] argc:0
      REG_DEAD ctr:DI
      REG_DEAD %12:DI
      REG_UNUSED %3:DI
      REG_EH_REGION 0x1
      REG_CALL_DECL (nil)
   24: L24:
   25: NOTE_INSN_BASIC_BLOCK 4
   51: L51:
   48: NOTE_INSN_BASIC_BLOCK 5
   47: NOTE_INSN_DELETED
   52: pc=L51
   53: barrier
   39: L39:
   41: NOTE_INSN_BASIC_BLOCK 6
   32: %3:DI=call [`__cxa_begin_catch'] argc:0
      REG_UNUSED %3:DI
      REG_CALL_DECL `__cxa_begin_catch'
      REG_EH_REGION 0
   33: call [`__cxa_end_catch'] argc:0
      REG_CALL_DECL `__cxa_end_catch'
   34: barrier

;; basic block 4, loop depth 0, count 10631108 (estimated locally, freq 1.0000), maybe hot
;;  prev block 3, next block 5, flags: (REACHABLE, RTL)
;;  pred:       3 [always]  count:1063111 (estimated locally, freq 0.1000) (FALLTHRU)
;;              2 [90.0% (guessed)]  count:9567997 (estimated locally, freq 0.9000)
;;  succ:       5 [always]  count:10631108 (estimated locally, freq 1.0000) (FALLTHRU)
;; basic block 5, loop depth 0, count 1073741824 (estimated locally, freq 101.0000), maybe hot
;;  prev block 4, next block 6, flags: (REACHABLE, RTL, MODIFIED)
;;  pred:       4 [always]  count:10631108 (estimated locally, freq 1.0000) (FALLTHRU)
;;              5 [always]  count:1073741824 (estimated locally, freq 101.0000)
;;  succ:       5 [always]  count:1073741824 (estimated locally, freq 101.0000)

It looks bb 4 can be merged with bb 5, a miss-opt?  There can be some other cases
similar to this, either it's miss-opt or not, it seems to affect our assumption.

With the limited findings so far, I wonder if we still want to go in this
direction, i.e. running cleanup_cfg first and assuming that there is no such
empty block which can cause debug and non-debug state inconsistency?
IMHO making that assumption seems risky.  Thoughts?

BR,
Kewen
Richard Biener Nov. 17, 2023, 10:13 a.m. UTC | #16
On Fri, Nov 17, 2023 at 10:04 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> on 2023/11/15 17:43, Alexander Monakov wrote:
> >
> > On Wed, 15 Nov 2023, Kewen.Lin wrote:
> >
> >>>> And I suppose it would be OK to do that.  Empty BBs are usually removed by
> >>>> CFG cleanup so the situation should only happen in rare corner cases where
> >>>> the fix would be to actually run CFG cleanup ...
> >>>
> >>> Yeah, sel-sched invokes 'cfg_cleanup (0)' up front, and I suppose that
> >>> may be a preferable compromise for sched-rgn as well.
> >>
> >> Inspired by this discussion, I tested the attached patch 1 which is to run
> >> cleanup_cfg (0) first in haifa_sched_init, it's bootstrapped and
> >> regress-tested on x86_64-redhat-linux and powerpc64{,le}-linux-gnu.
> >
> > I don't think you can run cleanup_cfg after sched_init. I would suggest
> > to put it early in schedule_insns.
>
> Thanks for the suggestion, I placed it at the beginning of haifa_sched_init
> instead, since schedule_insns invokes haifa_sched_init, although the
> calls rgn_setup_common_sched_info and rgn_setup_sched_infos are executed
> ahead but they are all "setup" functions, shouldn't affect or be affected
> by this placement.
>
> >
> >> Then I assumed some of the current uses of no_real_insns_p won't encounter
> >> empty blocks any more, so made a patch 2 with some explicit assertions, but
> >> unfortunately I got ICEs during bootstrapping in function
> >> compute_priorities.  I'm going to investigate it further and post more
> >> findings, but just heads-up to ensure if this is on the right track.
> >
> > I suspect this may be caused by invoking cleanup_cfg too late.
>
> By looking into some failures, I found that although cleanup_cfg is executed
> there would be still some empty blocks left, by analyzing a few failures there
> are at least such cases:
>   1. empty function body
>   2. block holding a label for return.
>   3. block without any successor.
>   4. block which becomes empty after scheduling some other block.
>   5. block which looks mergeable with its always successor but left.
>   ...
>
> For 1,2, there is one single successor EXIT block, I think they don't affect
> state transition, for 3, it's the same.  For 4, it depends on if we can have
> the assumption this kind of empty block doesn't have the chance to have debug
> insn (like associated debug insn should be moved along), I'm not sure.  For 5,
> a reduced test case is:
>
> #include <istream>
> namespace std {
> template <>
> basic_istream<char> &
> basic_istream<char>::getline (char_type *, streamsize, char_type) __try
> {
>   __streambuf_type *a = rdbuf ();
>   a->sgetc ();
>   while (traits_type::eq_int_type)
>     ;
> }
> __catch (...) {}
> } // namespace std
>
> slim RTL:
>
>     1: NOTE_INSN_DELETED
>     7: NOTE_INSN_BASIC_BLOCK 2
> ...
>    15: r137:CCUNS=cmp(r135:DI,r136:DI)
>       REG_DEAD r136:DI
>       REG_DEAD r135:DI
>    16: pc={(ltu(r137:CCUNS,0))?L24:pc}
>       REG_DEAD r137:CCUNS
>       REG_BR_PROB 966367644
>    17: NOTE_INSN_BASIC_BLOCK 3
>    18: r138:DI=[r122:DI]
>    19: r139:DI=[r138:DI+0x48]
>       REG_DEAD r138:DI
>    20: %3:DI=r122:DI
>       REG_DEAD r122:DI
>    21: %12:DI=r139:DI
>    22: ctr:DI=r139:DI
>       REG_DEAD r139:DI
>    23: %3:DI=call [ctr:DI] argc:0
>       REG_DEAD ctr:DI
>       REG_DEAD %12:DI
>       REG_UNUSED %3:DI
>       REG_EH_REGION 0x1
>       REG_CALL_DECL (nil)
>    24: L24:
>    25: NOTE_INSN_BASIC_BLOCK 4
>    51: L51:
>    48: NOTE_INSN_BASIC_BLOCK 5
>    47: NOTE_INSN_DELETED
>    52: pc=L51
>    53: barrier
>    39: L39:
>    41: NOTE_INSN_BASIC_BLOCK 6
>    32: %3:DI=call [`__cxa_begin_catch'] argc:0
>       REG_UNUSED %3:DI
>       REG_CALL_DECL `__cxa_begin_catch'
>       REG_EH_REGION 0
>    33: call [`__cxa_end_catch'] argc:0
>       REG_CALL_DECL `__cxa_end_catch'
>    34: barrier
>
> ;; basic block 4, loop depth 0, count 10631108 (estimated locally, freq 1.0000), maybe hot
> ;;  prev block 3, next block 5, flags: (REACHABLE, RTL)
> ;;  pred:       3 [always]  count:1063111 (estimated locally, freq 0.1000) (FALLTHRU)
> ;;              2 [90.0% (guessed)]  count:9567997 (estimated locally, freq 0.9000)
> ;;  succ:       5 [always]  count:10631108 (estimated locally, freq 1.0000) (FALLTHRU)
> ;; basic block 5, loop depth 0, count 1073741824 (estimated locally, freq 101.0000), maybe hot
> ;;  prev block 4, next block 6, flags: (REACHABLE, RTL, MODIFIED)
> ;;  pred:       4 [always]  count:10631108 (estimated locally, freq 1.0000) (FALLTHRU)
> ;;              5 [always]  count:1073741824 (estimated locally, freq 101.0000)
> ;;  succ:       5 [always]  count:1073741824 (estimated locally, freq 101.0000)
>
> It looks bb 4 can be merged with bb 5, a miss-opt?  There can be some other cases
> similar to this, either it's miss-opt or not, it seems to affect our assumption.
>
> With the limited findings so far, I wonder if we still want to go in this
> direction, i.e. running cleanup_cfg first and assuming that there is no such
> empty block which can cause debug and non-debug state inconsistency?
> IMHO making that assumption seems risky.  Thoughts?

Running CFG cleanup shouldn't be the fix to remove such blocks but instead
scheduling shouldn't special case empty blocks as they usually do not appear.

Richard.

>
> BR,
> Kewen
Alexander Monakov Nov. 17, 2023, 12:55 p.m. UTC | #17
On Fri, 17 Nov 2023, Kewen.Lin wrote:
> > I don't think you can run cleanup_cfg after sched_init. I would suggest
> > to put it early in schedule_insns.
> 
> Thanks for the suggestion, I placed it at the beginning of haifa_sched_init
> instead, since schedule_insns invokes haifa_sched_init, although the
> calls rgn_setup_common_sched_info and rgn_setup_sched_infos are executed
> ahead but they are all "setup" functions, shouldn't affect or be affected
> by this placement.

I was worried because sched_init invokes df_analyze, and I'm not sure if
cfg_cleanup can invalidate it.

> > I suspect this may be caused by invoking cleanup_cfg too late.
> 
> By looking into some failures, I found that although cleanup_cfg is executed
> there would be still some empty blocks left, by analyzing a few failures there
> are at least such cases:
>   1. empty function body
>   2. block holding a label for return.
>   3. block without any successor.
>   4. block which becomes empty after scheduling some other block.
>   5. block which looks mergeable with its always successor but left.
>   ...
> 
> For 1,2, there is one single successor EXIT block, I think they don't affect
> state transition, for 3, it's the same.  For 4, it depends on if we can have
> the assumption this kind of empty block doesn't have the chance to have debug
> insn (like associated debug insn should be moved along), I'm not sure.  For 5,
> a reduced test case is:

Oh, I should have thought of cases like these, really sorry about the slip
of attention, and thanks for showing a testcase for item 5. As Richard was
saying in his response, cfg_cleanup cannot be a fix here. The thing to check
would be changing no_real_insns_p to always return false, and see if the
situation looks recoverable (if it breaks bootstrap, regtest statistics of
a non-bootstrapped compiler are still informative).

Alexander
Kewen.Lin Nov. 22, 2023, 9:30 a.m. UTC | #18
on 2023/11/17 20:55, Alexander Monakov wrote:
> 
> On Fri, 17 Nov 2023, Kewen.Lin wrote:
>>> I don't think you can run cleanup_cfg after sched_init. I would suggest
>>> to put it early in schedule_insns.
>>
>> Thanks for the suggestion, I placed it at the beginning of haifa_sched_init
>> instead, since schedule_insns invokes haifa_sched_init, although the
>> calls rgn_setup_common_sched_info and rgn_setup_sched_infos are executed
>> ahead but they are all "setup" functions, shouldn't affect or be affected
>> by this placement.
> 
> I was worried because sched_init invokes df_analyze, and I'm not sure if
> cfg_cleanup can invalidate it.

Thanks for further explaining!  By scanning cleanup_cfg, it seems that it
considers df, like compact_blocks checks df, try_optimize_cfg invokes
df_analyze etc., but I agree that moving cleanup_cfg before sched_init
makes more sense.

> 
>>> I suspect this may be caused by invoking cleanup_cfg too late.
>>
>> By looking into some failures, I found that although cleanup_cfg is executed
>> there would be still some empty blocks left, by analyzing a few failures there
>> are at least such cases:
>>   1. empty function body
>>   2. block holding a label for return.
>>   3. block without any successor.
>>   4. block which becomes empty after scheduling some other block.
>>   5. block which looks mergeable with its always successor but left.
>>   ...
>>
>> For 1,2, there is one single successor EXIT block, I think they don't affect
>> state transition, for 3, it's the same.  For 4, it depends on if we can have
>> the assumption this kind of empty block doesn't have the chance to have debug
>> insn (like associated debug insn should be moved along), I'm not sure.  For 5,
>> a reduced test case is:
> 
> Oh, I should have thought of cases like these, really sorry about the slip
> of attention, and thanks for showing a testcase for item 5. As Richard was
> saying in his response, cfg_cleanup cannot be a fix here. The thing to check
> would be changing no_real_insns_p to always return false, and see if the
> situation looks recoverable (if it breaks bootstrap, regtest statistics of
> a non-bootstrapped compiler are still informative).

As you suggested, I forced no_real_insns_p to return false all the time, some
issues got exposed, almost all of them are asserting NOTE_P insn shouldn't be
encountered in those places, so the adjustments for most of them are just to
consider NOTE_P or this kind of special block and so on.  One draft patch is
attached, it can be bootstrapped and regress-tested on ppc64{,le} and x86.
btw, it's without the previous cfg_cleanup adjustment (hope it can get more
empty blocks and expose more issues).  The draft isn't qualified for code
review but I hope it can provide some information on what kinds of changes
are needed for the proposal.  If this is the direction which we all agree on,
I'll further refine it and post a formal patch.  One thing I want to note is
that this patch disable one assertion below:

diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
index e5964f54ead..abd334864fb 100644
--- a/gcc/sched-rgn.cc
+++ b/gcc/sched-rgn.cc
@@ -3219,7 +3219,7 @@ schedule_region (int rgn)
     }

   /* Sanity check: verify that all region insns were scheduled.  */
-  gcc_assert (sched_rgn_n_insns == rgn_n_insns);
+  // gcc_assert (sched_rgn_n_insns == rgn_n_insns);

   sched_finish_ready_list ();

Some cases can cause this assertion to fail, it's due to the mismatch on
to-be-scheduled and scheduled insn counts.  The reason why it happens is that
one block previously has only one INSN_P but while scheduling some other blocks
it gets moved as well then we ends up with an empty block so that the only
NOTE_P insn was counted then, but since this block isn't empty initially and
NOTE_P gets skipped in a normal block, the count to-be-scheduled can't count
it in.  It can be fixed with special-casing this kind of block for counting
like initially recording which block is empty and if a block isn't recorded
before then fix up the count for it accordingly.  I'm not sure if someone may
have an argument that all the complication make this proposal beaten by
previous special-casing debug insn approach, looking forward to more comments.
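
To make the bookkeeping concrete, a rough sketch of the special-casing
described above (only a sketch under my current understanding; the name
rgn_initially_empty_bbs is made up, and the exact place and sign of the
adjustment depend on how the leftover NOTE ends up being counted):

/* Record which blocks of the region had no real insns before scheduling,
   so a block that merely became empty later can be recognized.  */
static sbitmap rgn_initially_empty_bbs;

/* In sched_rgn_local_init:  */
rgn_initially_empty_bbs = sbitmap_alloc (current_nr_blocks);
bitmap_clear (rgn_initially_empty_bbs);
for (int bb = 0; bb < current_nr_blocks; bb++)
  {
    rtx_insn *head, *tail;
    get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
    if (head == tail && !INSN_P (head))
      bitmap_set_bit (rgn_initially_empty_bbs, bb);
  }

/* In schedule_region, after handling block BB: if it turned out empty even
   though it was not recorded above, compensate for the NOTE that got pushed
   onto the schedule so that sched_rgn_n_insns == rgn_n_insns still holds.  */
if (head == tail && !INSN_P (head)
    && !bitmap_bit_p (rgn_initially_empty_bbs, bb))
  sched_rgn_n_insns--;

/* In sched_rgn_local_finish:  */
sbitmap_free (rgn_initially_empty_bbs);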

BR,
Kewen
From d350f411b23f6064a33a72a6ca7afc49b0ccea65 Mon Sep 17 00:00:00 2001
From: Kewen Lin <linkw@linux.ibm.com>
Date: Wed, 22 Nov 2023 00:08:59 -0600
Subject: [PATCH] sched: Don't skip empty block in scheduling

att
---
 gcc/haifa-sched.cc | 166 +++++++++++++++++++++++++++------------------
 gcc/rtl.h          |   4 +-
 gcc/sched-rgn.cc   |   2 +-
 3 files changed, 103 insertions(+), 69 deletions(-)

diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
index 8e8add709b3..62377d99162 100644
--- a/gcc/haifa-sched.cc
+++ b/gcc/haifa-sched.cc
@@ -1207,6 +1207,11 @@ recompute_todo_spec (rtx_insn *next, bool for_backtrack)
   int n_replace = 0;
   bool first_p = true;
 
+  /* We don't skip no_real_insns_p any more, so it's possible to
+     meet NOTE insn now, early return if so.  */
+  if (NOTE_P (next))
+    return 0;
+
   if (sd_lists_empty_p (next, SD_LIST_BACK))
     /* NEXT has all its dependencies resolved.  */
     return 0;
@@ -1728,6 +1733,9 @@ setup_insn_reg_pressure_info (rtx_insn *insn)
 
   gcc_checking_assert (!DEBUG_INSN_P (insn));
 
+  if (NOTE_P (insn))
+    return;
+
   excess_cost_change = 0;
   calculate_reg_deaths (insn, death);
   pressure_info = INSN_REG_PRESSURE (insn);
@@ -4017,10 +4025,10 @@ schedule_insn (rtx_insn *insn)
 
   /* Scheduling instruction should have all its dependencies resolved and
      should have been removed from the ready list.  */
-  gcc_assert (sd_lists_empty_p (insn, SD_LIST_HARD_BACK));
+  gcc_assert (NOTE_P (insn) || sd_lists_empty_p (insn, SD_LIST_HARD_BACK));
 
   /* Reset debug insns invalidated by moving this insn.  */
-  if (MAY_HAVE_DEBUG_BIND_INSNS && !DEBUG_INSN_P (insn))
+  if (MAY_HAVE_DEBUG_BIND_INSNS && !DEBUG_INSN_P (insn) && !NOTE_P (insn))
     for (sd_it = sd_iterator_start (insn, SD_LIST_BACK);
 	 sd_iterator_cond (&sd_it, &dep);)
       {
@@ -4106,61 +4114,64 @@ schedule_insn (rtx_insn *insn)
 
   check_clobbered_conditions (insn);
 
-  /* Update dependent instructions.  First, see if by scheduling this insn
-     now we broke a dependence in a way that requires us to change another
-     insn.  */
-  for (sd_it = sd_iterator_start (insn, SD_LIST_SPEC_BACK);
-       sd_iterator_cond (&sd_it, &dep); sd_iterator_next (&sd_it))
+  if (!NOTE_P (insn))
     {
-      struct dep_replacement *desc = DEP_REPLACE (dep);
-      rtx_insn *pro = DEP_PRO (dep);
-      if (QUEUE_INDEX (pro) != QUEUE_SCHEDULED
-	  && desc != NULL && desc->insn == pro)
-	apply_replacement (dep, false);
-    }
+      /* Update dependent instructions.  First, see if by scheduling this insn
+	 now we broke a dependence in a way that requires us to change another
+	 insn.  */
+      for (sd_it = sd_iterator_start (insn, SD_LIST_SPEC_BACK);
+	   sd_iterator_cond (&sd_it, &dep); sd_iterator_next (&sd_it))
+	{
+	  struct dep_replacement *desc = DEP_REPLACE (dep);
+	  rtx_insn *pro = DEP_PRO (dep);
+	  if (QUEUE_INDEX (pro) != QUEUE_SCHEDULED && desc != NULL
+	      && desc->insn == pro)
+	    apply_replacement (dep, false);
+	}
 
-  /* Go through and resolve forward dependencies.  */
-  for (sd_it = sd_iterator_start (insn, SD_LIST_FORW);
-       sd_iterator_cond (&sd_it, &dep);)
-    {
-      rtx_insn *next = DEP_CON (dep);
-      bool cancelled = (DEP_STATUS (dep) & DEP_CANCELLED) != 0;
+      /* Go through and resolve forward dependencies.  */
+      for (sd_it = sd_iterator_start (insn, SD_LIST_FORW);
+	   sd_iterator_cond (&sd_it, &dep);)
+	{
+	  rtx_insn *next = DEP_CON (dep);
+	  bool cancelled = (DEP_STATUS (dep) & DEP_CANCELLED) != 0;
 
-      /* Resolve the dependence between INSN and NEXT.
-	 sd_resolve_dep () moves current dep to another list thus
-	 advancing the iterator.  */
-      sd_resolve_dep (sd_it);
+	  /* Resolve the dependence between INSN and NEXT.
+	     sd_resolve_dep () moves current dep to another list thus
+	     advancing the iterator.  */
+	  sd_resolve_dep (sd_it);
 
-      if (cancelled)
-	{
-	  if (must_restore_pattern_p (next, dep))
-	    restore_pattern (dep, false);
-	  continue;
-	}
+	  if (cancelled)
+	    {
+	      if (must_restore_pattern_p (next, dep))
+		restore_pattern (dep, false);
+	      continue;
+	    }
 
-      /* Don't bother trying to mark next as ready if insn is a debug
-	 insn.  If insn is the last hard dependency, it will have
-	 already been discounted.  */
-      if (DEBUG_INSN_P (insn) && !DEBUG_INSN_P (next))
-	continue;
+	  /* Don't bother trying to mark next as ready if insn is a debug
+	     insn.  If insn is the last hard dependency, it will have
+	     already been discounted.  */
+	  if (DEBUG_INSN_P (insn) && !DEBUG_INSN_P (next))
+	    continue;
 
-      if (!IS_SPECULATION_BRANCHY_CHECK_P (insn))
-	{
-	  int effective_cost;
+	  if (!IS_SPECULATION_BRANCHY_CHECK_P (insn))
+	    {
+	      int effective_cost;
 
-	  effective_cost = try_ready (next);
+	      effective_cost = try_ready (next);
 
-	  if (effective_cost >= 0
-	      && SCHED_GROUP_P (next)
-	      && advance < effective_cost)
-	    advance = effective_cost;
-	}
-      else
-	/* Check always has only one forward dependence (to the first insn in
-	   the recovery block), therefore, this will be executed only once.  */
-	{
-	  gcc_assert (sd_lists_empty_p (insn, SD_LIST_FORW));
-	  fix_recovery_deps (RECOVERY_BLOCK (insn));
+	      if (effective_cost >= 0 && SCHED_GROUP_P (next)
+		  && advance < effective_cost)
+		advance = effective_cost;
+	    }
+	  else
+	    /* Check always has only one forward dependence (to the first insn
+	       in the recovery block), therefore, this will be executed only
+	       once.  */
+	    {
+	      gcc_assert (sd_lists_empty_p (insn, SD_LIST_FORW));
+	      fix_recovery_deps (RECOVERY_BLOCK (insn));
+	    }
 	}
     }
 
@@ -4170,6 +4181,7 @@ schedule_insn (rtx_insn *insn)
      may use this information to decide how the instruction should
      be aligned.  */
   if (issue_rate > 1
+      && !NOTE_P (insn)
       && GET_CODE (PATTERN (insn)) != USE
       && GET_CODE (PATTERN (insn)) != CLOBBER
       && !DEBUG_INSN_P (insn))
@@ -5036,8 +5048,11 @@ get_ebb_head_tail (basic_block beg, basic_block end,
 /* Return true if there are no real insns in the range [ HEAD, TAIL ].  */
 
 bool
-no_real_insns_p (const rtx_insn *head, const rtx_insn *tail)
+no_real_insns_p (const rtx_insn *head ATTRIBUTE_UNUSED,
+		 const rtx_insn *tail ATTRIBUTE_UNUSED)
 {
+  return false;
+#if 0
   while (head != NEXT_INSN (tail))
     {
       if (!NOTE_P (head) && !LABEL_P (head))
@@ -5045,6 +5060,7 @@ no_real_insns_p (const rtx_insn *head, const rtx_insn *tail)
       head = NEXT_INSN (head);
     }
   return true;
+#endif
 }
 
 /* Restore-other-notes: NOTE_LIST is the end of a chain of notes
@@ -6224,8 +6240,9 @@ commit_schedule (rtx_insn *prev_head, rtx_insn *tail, basic_block *target_bb)
        scheduled_insns.iterate (i, &insn);
        i++)
     {
-      if (control_flow_insn_p (last_scheduled_insn)
-	  || current_sched_info->advance_target_bb (*target_bb, insn))
+      if ((control_flow_insn_p (last_scheduled_insn)
+	   || current_sched_info->advance_target_bb (*target_bb, insn))
+	  && !NOTE_P (insn))
 	{
 	  *target_bb = current_sched_info->advance_target_bb (*target_bb, 0);
 
@@ -6245,7 +6262,7 @@ commit_schedule (rtx_insn *prev_head, rtx_insn *tail, basic_block *target_bb)
 	(*current_sched_info->begin_move_insn) (insn, last_scheduled_insn);
       move_insn (insn, last_scheduled_insn,
 		 current_sched_info->next_tail);
-      if (!DEBUG_INSN_P (insn))
+      if (!DEBUG_INSN_P (insn) && !NOTE_P (insn))
 	reemit_notes (insn);
       last_scheduled_insn = insn;
     }
@@ -6296,7 +6313,7 @@ prune_ready_list (state_t temp_state, bool first_cycle_insn_p,
 	  int cost = 0;
 	  const char *reason = "resource conflict";
 
-	  if (DEBUG_INSN_P (insn))
+	  if (DEBUG_INSN_P (insn) || NOTE_P (insn))
 	    continue;
 
 	  if (sched_group_found && !SCHED_GROUP_P (insn)
@@ -6504,7 +6521,7 @@ schedule_block (basic_block *target_bb, state_t init_state)
      and caused problems because schedule_block and compute_forward_dependences
      had different notions of what the "head" insn was.  */
 
-  gcc_assert (head != tail || INSN_P (head));
+  gcc_assert (head != tail || (INSN_P (head) || NOTE_P (head)));
 
   haifa_recovery_bb_recently_added_p = false;
 
@@ -6544,9 +6561,10 @@ schedule_block (basic_block *target_bb, state_t init_state)
   last_nondebug_scheduled_insn = NULL;
   nonscheduled_insns_begin = NULL;
 
-  gcc_assert ((NOTE_P (last_scheduled_insn)
-	       || DEBUG_INSN_P (last_scheduled_insn))
-	      && BLOCK_FOR_INSN (last_scheduled_insn) == *target_bb);
+  gcc_assert (
+    ((NOTE_P (last_scheduled_insn) || DEBUG_INSN_P (last_scheduled_insn))
+     && BLOCK_FOR_INSN (last_scheduled_insn) == *target_bb)
+    || (head == tail && NOTE_P (head)));
 
   /* Initialize INSN_QUEUE.  Q_SIZE is the total number of insns in the
      queue.  */
@@ -6744,6 +6762,21 @@ schedule_block (basic_block *target_bb, state_t init_state)
 		}
 	    }
 
+	  /* Handle the only NOTE here similar to the above for debug insn.  */
+	  if (ready.n_ready && NOTE_P (ready_element (&ready, 0))
+	      && (*current_sched_info->schedule_more_p) ())
+	    {
+	      rtx_insn *insn = ready_remove_first (&ready);
+	      (*current_sched_info->begin_schedule_ready) (insn);
+	      scheduled_insns.safe_push (insn);
+	      last_scheduled_insn = insn;
+	      advance = schedule_insn (insn);
+	      gcc_assert (advance == 0);
+	      if (ready.n_ready > 0)
+		ready_sort (&ready);
+	      gcc_assert (!ready.n_ready);
+	    }
+
 	  if (ls.first_cycle_insn_p && !ready.n_ready)
 	    break;
 
@@ -6886,7 +6919,7 @@ schedule_block (basic_block *target_bb, state_t init_state)
 	  /* Update counters, etc in the scheduler's front end.  */
 	  (*current_sched_info->begin_schedule_ready) (insn);
 	  scheduled_insns.safe_push (insn);
-	  gcc_assert (NONDEBUG_INSN_P (insn));
+	  gcc_assert (NONDEBUG_INSN_P (insn) || NOTE_P (insn));
 	  last_nondebug_scheduled_insn = last_scheduled_insn = insn;
 
 	  if (recog_memoized (insn) >= 0)
@@ -7145,17 +7178,16 @@ schedule_block (basic_block *target_bb, state_t init_state)
 int
 set_priorities (rtx_insn *head, rtx_insn *tail)
 {
+  /* This is a no_real_insns_p block.  */
+  if (head == tail && !INSN_P (head))
+    return 1;
+
   rtx_insn *insn;
-  int n_insn;
+  int n_insn = 0;
   int sched_max_insns_priority =
 	current_sched_info->sched_max_insns_priority;
   rtx_insn *prev_head;
 
-  if (head == tail && ! INSN_P (head))
-    gcc_unreachable ();
-
-  n_insn = 0;
-
   prev_head = PREV_INSN (head);
   for (insn = tail; insn != prev_head; insn = PREV_INSN (insn))
     {
@@ -7688,7 +7720,9 @@ fix_tick_ready (rtx_insn *next)
 {
   int tick, delay;
 
-  if (!DEBUG_INSN_P (next) && !sd_lists_empty_p (next, SD_LIST_RES_BACK))
+  if (!DEBUG_INSN_P (next)
+      && !NOTE_P (next)
+      && !sd_lists_empty_p (next, SD_LIST_RES_BACK))
     {
       int full_p;
       sd_iterator_def sd_it;
diff --git a/gcc/rtl.h b/gcc/rtl.h
index e4b6cc0dbb5..34b3f31d1ee 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -2695,8 +2695,8 @@ do {								        \
 /* During sched, 1 if RTX is an insn that must be scheduled together
    with the preceding insn.  */
 #define SCHED_GROUP_P(RTX)						\
-  (RTL_FLAG_CHECK4 ("SCHED_GROUP_P", (RTX), DEBUG_INSN, INSN,		\
-		    JUMP_INSN, CALL_INSN)->in_struct)
+  (RTL_FLAG_CHECK5 ("SCHED_GROUP_P", (RTX), DEBUG_INSN, INSN,		\
+		    JUMP_INSN, CALL_INSN, NOTE)->in_struct)
 
 /* For a SET rtx, SET_DEST is the place that is set
    and SET_SRC is the value it is set to.  */
diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
index e5964f54ead..abd334864fb 100644
--- a/gcc/sched-rgn.cc
+++ b/gcc/sched-rgn.cc
@@ -3219,7 +3219,7 @@ schedule_region (int rgn)
     }
 
   /* Sanity check: verify that all region insns were scheduled.  */
-  gcc_assert (sched_rgn_n_insns == rgn_n_insns);
+  // gcc_assert (sched_rgn_n_insns == rgn_n_insns);
 
   sched_finish_ready_list ();
Richard Biener Nov. 22, 2023, 10:25 a.m. UTC | #19
On Wed, Nov 22, 2023 at 10:31 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> on 2023/11/17 20:55, Alexander Monakov wrote:
> >
> > On Fri, 17 Nov 2023, Kewen.Lin wrote:
> >>> I don't think you can run cleanup_cfg after sched_init. I would suggest
> >>> to put it early in schedule_insns.
> >>
> >> Thanks for the suggestion, I placed it at the beginning of haifa_sched_init
> >> instead, since schedule_insns invokes haifa_sched_init, although the
> >> calls rgn_setup_common_sched_info and rgn_setup_sched_infos are executed
> >> ahead but they are all "setup" functions, shouldn't affect or be affected
> >> by this placement.
> >
> > I was worried because sched_init invokes df_analyze, and I'm not sure if
> > cfg_cleanup can invalidate it.
>
> Thanks for further explaining!  By scanning cleanup_cfg, it seems that it
> considers df, like compact_blocks checks df, try_optimize_cfg invokes
> df_analyze etc., but I agree that moving cleanup_cfg before sched_init
> makes more sense.
>
> >
> >>> I suspect this may be caused by invoking cleanup_cfg too late.
> >>
> >> By looking into some failures, I found that although cleanup_cfg is executed
> >> there would be still some empty blocks left, by analyzing a few failures there
> >> are at least such cases:
> >>   1. empty function body
> >>   2. block holding a label for return.
> >>   3. block without any successor.
> >>   4. block which becomes empty after scheduling some other block.
> >>   5. block which looks mergeable with its always successor but left.
> >>   ...
> >>
> >> For 1,2, there is one single successor EXIT block, I think they don't affect
> >> state transition, for 3, it's the same.  For 4, it depends on if we can have
> >> the assumption this kind of empty block doesn't have the chance to have debug
> >> insn (like associated debug insn should be moved along), I'm not sure.  For 5,
> >> a reduced test case is:
> >
> > Oh, I should have thought of cases like these, really sorry about the slip
> > of attention, and thanks for showing a testcase for item 5. As Richard was
> > saying in his response, cfg_cleanup cannot be a fix here. The thing to check
> > would be changing no_real_insns_p to always return false, and see if the
> > situation looks recoverable (if it breaks bootstrap, regtest statistics of
> > a non-bootstrapped compiler are still informative).
>
> As you suggested, I forced no_real_insns_p to return false all the time, some
> issues got exposed, almost all of them are asserting NOTE_P insn shouldn't be
> encountered in those places, so the adjustments for most of them are just to
> consider NOTE_P or this kind of special block and so on.  One draft patch is
> attached, it can be bootstrapped and regress-tested on ppc64{,le} and x86.
> btw, it's without the previous cfg_cleanup adjustment (hope it can get more
> empty blocks and expose more issues).  The draft isn't qualified for code
> review but I hope it can provide some information on what kinds of changes
> are needed for the proposal.  If this is the direction which we all agree on,
> I'll further refine it and post a formal patch.  One thing I want to note is
> that this patch disable one assertion below:
>
> diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
> index e5964f54ead..abd334864fb 100644
> --- a/gcc/sched-rgn.cc
> +++ b/gcc/sched-rgn.cc
> @@ -3219,7 +3219,7 @@ schedule_region (int rgn)
>      }
>
>    /* Sanity check: verify that all region insns were scheduled.  */
> -  gcc_assert (sched_rgn_n_insns == rgn_n_insns);
> +  // gcc_assert (sched_rgn_n_insns == rgn_n_insns);
>
>    sched_finish_ready_list ();
>
> Some cases can cause this assertion to fail, it's due to the mismatch on
> to-be-scheduled and scheduled insn counts.  The reason why it happens is that
> one block previously has only one INSN_P but while scheduling some other blocks
> it gets moved as well, then we end up with an empty block so that the only
> NOTE_P insn was counted then, but since this block isn't empty initially and
> NOTE_P gets skipped in a normal block, the count to-be-scheduled can't count
> it in.  It can be fixed with special-casing this kind of block for counting
> like initially recording which block is empty and if a block isn't recorded
> before then fix up the count for it accordingly.  I'm not sure if someone may
> have an argument that all the complication make this proposal beaten by
> previous special-casing debug insn approach, looking forward to more comments.

Just a comment that the NOTE_P thing is odd - do we only ever have those for
otherwise empty BBs?  How are they skipped otherwise (and why does that not
work for otherwise empty BBs)?

Richard.

> BR,
> Kewen
>
Kewen.Lin Nov. 23, 2023, 2:36 a.m. UTC | #20
on 2023/11/22 18:25, Richard Biener wrote:
> 
> Just a comment that the NOTE_P thing is odd - do we only ever have those for
> otherwise empty BBs?  How are they skipped otherwise (and why does that not
> work for otherwise empty BBs)?

Yes, previously (when bypassing empty BBs) there was no chance to encounter
NOTE_P when scheduling insns: for notes in normal BBs, some are skipped when
setting up the head and tail (like in get_ebb_head_tail), and there are also
special handlings, remove_notes and unlink_bb_notes, to guarantee they are
gone.  With empty BB bypassing disabled, all empty BBs uniformly end up as
(head == tail && NOTE_P (head)), so we have to deal with NOTE_P.

BR,
Kewen
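As an illustration of the head/tail behaviour described above, here is a
deliberately simplified standalone model; get_head_tail below is a stand-in
for what is described, not the real get_ebb_head_tail, and the insn list is
equally artificial.  Boundary notes are stepped over for a normal block,
while an otherwise-empty block has nowhere else to go and is left with
head == tail pointing at its note.

/* Toy model: pick head/tail for a block, skipping boundary notes/labels.
   A block with a real insn ends up with head/tail on that insn; an
   otherwise-empty block degenerates to head == tail == its note, which is
   exactly what the scheduler then has to cope with.  Illustrative only.  */

#include <stdio.h>

enum insn_kind { INSN_REAL, INSN_NOTE, INSN_LABEL };

struct insn
{
  enum insn_kind kind;
  struct insn *prev, *next;
};

static void
get_head_tail (struct insn **head, struct insn **tail)
{
  /* Never move head past tail, so an empty block keeps its note.  */
  while (*head != *tail
         && ((*head)->kind == INSN_NOTE || (*head)->kind == INSN_LABEL))
    *head = (*head)->next;
  while (*tail != *head && (*tail)->kind == INSN_NOTE)
    *tail = (*tail)->prev;
}

int
main (void)
{
  /* Normal block: NOTE, REAL, NOTE.  */
  struct insn b1[3] = { { INSN_NOTE }, { INSN_REAL }, { INSN_NOTE } };
  b1[0].next = &b1[1]; b1[1].prev = &b1[0];
  b1[1].next = &b1[2]; b1[2].prev = &b1[1];

  struct insn *h = &b1[0], *t = &b1[2];
  get_head_tail (&h, &t);
  printf ("normal block: head/tail land on kind %d (the real insn)\n", h->kind);

  /* Otherwise-empty block: just its note.  */
  struct insn b2 = { INSN_NOTE, NULL, NULL };
  h = t = &b2;
  get_head_tail (&h, &t);
  printf ("empty block: head == tail, kind %d (a note)\n", h->kind);
  return 0;
}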
Richard Biener Nov. 23, 2023, 8:20 a.m. UTC | #21
On Thu, Nov 23, 2023 at 4:02 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> Yes, previously (when bypassing empty BBs) there was no chance to encounter
> NOTE_P when scheduling insns: for notes in normal BBs, some are skipped when
> setting up the head and tail (like in get_ebb_head_tail), and there are also
> special handlings, remove_notes and unlink_bb_notes, to guarantee they are
> gone.  With empty BB bypassing disabled, all empty BBs uniformly end up as
> (head == tail && NOTE_P (head)), so we have to deal with NOTE_P.

I see.  I expected most of them to be naturally part of another EBB.  So it's
rather a limitation of the head/tail representation.

I wonder if there's a more minimal fix though.  But if such a block is the
head or tail of an EBB, then I guess either head or tail has to point to a
stmt in said block, which then necessarily means either a debug insn or a
note.

Richard.

Kewen.Lin Nov. 23, 2023, 9:09 a.m. UTC | #22
on 2023/11/23 16:20, Richard Biener wrote:
> I see.  I expected most of them to be naturally part of another EBB.  So it's
> rather a limitation of the head/tail representation.
> 
> I wonder if there's a more minimal fix though.  But if such a block is the

I'm not sure I got the question correctly; if it is about this inconsistent
states issue, I think what Maxim suggested seems to be the minimal fix.

> head or tail of an EBB, then I guess either head or tail has to point to a
> stmt in said block, which then necessarily means either a debug insn or a
> note.

Yes, an empty BB will only have NOTE_P; if there is a debug insn then it wins
(all NOTEs get dropped).  remove_notes runs from head to tail, and it's
applied to the EBB's special head and tail.

BR,
Kewen
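Before the patch itself, one more standalone illustration of the point above.
The two functions below are simplified models of the old and new checks, not
the GCC implementations: once the notes are dropped and a lone debug insn
"wins", an otherwise-empty block is exactly the case where the two checks
disagree, and only the new one skips the block in both debug and non-debug
compiles.

/* Standalone model (not GCC code) of the old and new emptiness checks.
   Per the description above, an otherwise-empty block reduces to either a
   lone NOTE (without -g) or, once the notes are dropped, a lone DEBUG_INSN
   (with -g).  Only the new predicate treats both forms as empty, so the
   block is skipped consistently in both modes.  */

#include <stdbool.h>
#include <stdio.h>

enum insn_kind { INSN_REAL, INSN_NOTE, INSN_LABEL, INSN_DEBUG };

/* Old behaviour: only notes and labels are "not real".  */
static bool
old_no_real_insns_p (const enum insn_kind *insns, int n)
{
  for (int i = 0; i < n; i++)
    if (insns[i] != INSN_NOTE && insns[i] != INSN_LABEL)
      return false;
  return true;
}

/* New behaviour: debug insns are "not real" either.  */
static bool
new_no_real_nondebug_insns_p (const enum insn_kind *insns, int n)
{
  for (int i = 0; i < n; i++)
    if (insns[i] != INSN_NOTE && insns[i] != INSN_LABEL
        && insns[i] != INSN_DEBUG)
      return false;
  return true;
}

int
main (void)
{
  enum insn_kind without_g[] = { INSN_NOTE };  /* the block without -g */
  enum insn_kind with_g[] = { INSN_DEBUG };    /* the same block with -g */

  /* Old check: skipped without -g, scheduled with -g -> inconsistent.  */
  printf ("old check: %d (without -g) vs %d (with -g)\n",
          old_no_real_insns_p (without_g, 1),
          old_no_real_insns_p (with_g, 1));
  /* New check: skipped in both modes.  */
  printf ("new check: %d (without -g) vs %d (with -g)\n",
          new_no_real_nondebug_insns_p (without_g, 1),
          new_no_real_nondebug_insns_p (with_g, 1));
  return 0;
}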

Patch

diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
index 8e8add709b3..30cc90ec49f 100644
--- a/gcc/haifa-sched.cc
+++ b/gcc/haifa-sched.cc
@@ -5033,14 +5033,17 @@  get_ebb_head_tail (basic_block beg, basic_block end,
   *tailp = end_tail;
 }

-/* Return true if there are no real insns in the range [ HEAD, TAIL ].  */
+/* Return true if there are no real nondebug insns in the range
+   [ HEAD, TAIL ].  */

 bool
-no_real_insns_p (const rtx_insn *head, const rtx_insn *tail)
+no_real_nondebug_insns_p (const rtx_insn *head, const rtx_insn *tail)
 {
   while (head != NEXT_INSN (tail))
     {
-      if (!NOTE_P (head) && !LABEL_P (head))
+      if (!NOTE_P (head)
+	  && !LABEL_P (head)
+	  && !DEBUG_INSN_P (head))
 	return false;
       head = NEXT_INSN (head);
     }
diff --git a/gcc/sched-ebb.cc b/gcc/sched-ebb.cc
index 110fcdbca4d..03d96290a7c 100644
--- a/gcc/sched-ebb.cc
+++ b/gcc/sched-ebb.cc
@@ -491,7 +491,7 @@  schedule_ebb (rtx_insn *head, rtx_insn *tail, bool modulo_scheduling)
   first_bb = BLOCK_FOR_INSN (head);
   last_bb = BLOCK_FOR_INSN (tail);

-  if (no_real_insns_p (head, tail))
+  if (no_real_nondebug_insns_p (head, tail))
     return BLOCK_FOR_INSN (tail);

   gcc_assert (INSN_P (head) && INSN_P (tail));
diff --git a/gcc/sched-int.h b/gcc/sched-int.h
index 64a2f0bcff9..adca494ade5 100644
--- a/gcc/sched-int.h
+++ b/gcc/sched-int.h
@@ -1397,7 +1397,7 @@  extern void free_global_sched_pressure_data (void);
 extern int haifa_classify_insn (const_rtx);
 extern void get_ebb_head_tail (basic_block, basic_block,
 			       rtx_insn **, rtx_insn **);
-extern bool no_real_insns_p (const rtx_insn *, const rtx_insn *);
+extern bool no_real_nondebug_insns_p (const rtx_insn *, const rtx_insn *);

 extern int insn_sched_cost (rtx_insn *);
 extern int dep_cost_1 (dep_t, dw_t);
diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
index e5964f54ead..2549e834aa8 100644
--- a/gcc/sched-rgn.cc
+++ b/gcc/sched-rgn.cc
@@ -213,6 +213,22 @@  static int rgn_nr_edges;
 /* Array of size rgn_nr_edges.  */
 static edge *rgn_edges;

+/* Possible actions for dependencies freeing.  */
+enum rgn_bb_deps_free_action
+{
+  /* This block doesn't get dependencies computed so don't need to free.  */
+  RGN_BB_DEPS_FREE_NO,
+  /* This block gets scheduled normally so free dependencies as usual.  */
+  RGN_BB_DEPS_FREE_NORMAL,
+  /* This block gets skipped in scheduling but has dependencies computed early,
+     need to free the forward list artificially.  */
+  RGN_BB_DEPS_FREE_ARTIFICIAL
+};
+
+/* For basic block i, bb_deps_free_actions[i] indicates which action needs
+   to be taken for freeing its dependencies.  */
+static enum rgn_bb_deps_free_action *bb_deps_free_actions;
+
 /* Mapping from each edge in the graph to its number in the rgn.  */
 #define EDGE_TO_BIT(edge) ((int)(size_t)(edge)->aux)
 #define SET_EDGE_TO_BIT(edge,nr) ((edge)->aux = (void *)(size_t)(nr))
@@ -2735,6 +2751,15 @@  compute_block_dependences (int bb)
   gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
   get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);

+  /* Don't compute block dependences if there are no real nondebug insns.  */
+  if (no_real_nondebug_insns_p (head, tail))
+    {
+      if (current_nr_blocks > 1)
+	propagate_deps (bb, &tmp_deps);
+      free_deps (&tmp_deps);
+      return;
+    }
+
   sched_analyze (&tmp_deps, head, tail);

   add_branch_dependences (head, tail);
@@ -2749,6 +2774,24 @@  compute_block_dependences (int bb)
     targetm.sched.dependencies_evaluation_hook (head, tail);
 }

+/* Artificially resolve forward dependencies for instructions HEAD to TAIL.  */
+
+static void
+resolve_forw_deps (rtx_insn *head, rtx_insn *tail)
+{
+  rtx_insn *insn;
+  rtx_insn *next_tail = NEXT_INSN (tail);
+  sd_iterator_def sd_it;
+  dep_t dep;
+
+  /* There could be some insns which get skipped in scheduling but we compute
+     dependencies for them previously, so make them resolved.  */
+  for (insn = head; insn != next_tail; insn = NEXT_INSN (insn))
+    for (sd_it = sd_iterator_start (insn, SD_LIST_FORW);
+	 sd_iterator_cond (&sd_it, &dep);)
+      sd_resolve_dep (sd_it);
+}
+
 /* Free dependencies of instructions inside BB.  */
 static void
 free_block_dependencies (int bb)
@@ -2758,9 +2801,12 @@  free_block_dependencies (int bb)

   get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);

-  if (no_real_insns_p (head, tail))
+  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_NO)
     return;

+  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_ARTIFICIAL)
+    resolve_forw_deps (head, tail);
+
   sched_free_deps (head, tail, true);
 }

@@ -3024,7 +3070,7 @@  compute_priorities (void)
       gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
       get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);

-      if (no_real_insns_p (head, tail))
+      if (no_real_nondebug_insns_p (head, tail))
 	continue;

       rgn_n_insns += set_priorities (head, tail);
@@ -3158,7 +3204,7 @@  schedule_region (int rgn)

 	  get_ebb_head_tail (first_bb, last_bb, &head, &tail);

-	  if (no_real_insns_p (head, tail))
+	  if (no_real_nondebug_insns_p (head, tail))
 	    {
 	      gcc_assert (first_bb == last_bb);
 	      continue;
@@ -3178,44 +3224,62 @@  schedule_region (int rgn)

       get_ebb_head_tail (first_bb, last_bb, &head, &tail);

-      if (no_real_insns_p (head, tail))
+      if (no_real_nondebug_insns_p (head, tail))
 	{
 	  gcc_assert (first_bb == last_bb);
 	  save_state_for_fallthru_edge (last_bb, bb_state[first_bb->index]);
-	  continue;
+
+	  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_NO)
+	    continue;
+
+	  /* As it's not no_real_nondebug_insns_p initially, then it has some
+	     dependencies computed so free it artificially.  */
+	  bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_ARTIFICIAL;
 	}
+      else
+	{
+	  current_sched_info->prev_head = PREV_INSN (head);
+	  current_sched_info->next_tail = NEXT_INSN (tail);

-      current_sched_info->prev_head = PREV_INSN (head);
-      current_sched_info->next_tail = NEXT_INSN (tail);
+	  remove_notes (head, tail);

-      remove_notes (head, tail);
+	  unlink_bb_notes (first_bb, last_bb);

-      unlink_bb_notes (first_bb, last_bb);
+	  target_bb = bb;

-      target_bb = bb;
+	  gcc_assert (flag_schedule_interblock || current_nr_blocks == 1);
+	  current_sched_info->queue_must_finish_empty = current_nr_blocks == 1;

-      gcc_assert (flag_schedule_interblock || current_nr_blocks == 1);
-      current_sched_info->queue_must_finish_empty = current_nr_blocks == 1;
+	  curr_bb = first_bb;
+	  if (dbg_cnt (sched_block))
+	    {
+	      int saved_last_basic_block = last_basic_block_for_fn (cfun);

-      curr_bb = first_bb;
-      if (dbg_cnt (sched_block))
-        {
-	  int saved_last_basic_block = last_basic_block_for_fn (cfun);
+	      schedule_block (&curr_bb, bb_state[first_bb->index]);
+	      gcc_assert (EBB_FIRST_BB (bb) == first_bb);
+	      sched_rgn_n_insns += sched_n_insns;
+	      realloc_bb_state_array (saved_last_basic_block);
+	      save_state_for_fallthru_edge (last_bb, curr_state);

-	  schedule_block (&curr_bb, bb_state[first_bb->index]);
-	  gcc_assert (EBB_FIRST_BB (bb) == first_bb);
-	  sched_rgn_n_insns += sched_n_insns;
-	  realloc_bb_state_array (saved_last_basic_block);
-	  save_state_for_fallthru_edge (last_bb, curr_state);
-        }
-      else
-        {
-          sched_rgn_n_insns += rgn_n_insns;
-        }
+	      /* Clean up.  */
+	      if (current_nr_blocks > 1)
+		free_trg_info ();
+	    }
+	  else
+	    bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_ARTIFICIAL;
+	}

-      /* Clean up.  */
-      if (current_nr_blocks > 1)
-	free_trg_info ();
+      /* We have counted this block when computing rgn_n_insns
+	 previously, so need to fix up sched_rgn_n_insns now.  */
+      if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_ARTIFICIAL)
+	{
+	  while (head != NEXT_INSN (tail))
+	    {
+	      if (INSN_P (head))
+		sched_rgn_n_insns++;
+	      head = NEXT_INSN (head);
+	    }
+	}
     }

   /* Sanity check: verify that all region insns were scheduled.  */
@@ -3223,13 +3287,13 @@  schedule_region (int rgn)

   sched_finish_ready_list ();

-  /* Done with this region.  */
-  sched_rgn_local_finish ();
-
   /* Free dependencies.  */
   for (bb = 0; bb < current_nr_blocks; ++bb)
     free_block_dependencies (bb);

+  /* Done with this region.  */
+  sched_rgn_local_finish ();
+
   gcc_assert (haifa_recovery_bb_ever_added_p
 	      || deps_pools_are_empty_p ());
 }
@@ -3450,6 +3514,19 @@  sched_rgn_local_init (int rgn)
 	    e->aux = NULL;
         }
     }
+
+  /* Initialize bb_deps_free_actions.  */
+  bb_deps_free_actions
+    = XNEWVEC (enum rgn_bb_deps_free_action, current_nr_blocks);
+  for (bb = 0; bb < current_nr_blocks; bb++)
+    {
+      rtx_insn *head, *tail;
+      get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
+      if (no_real_nondebug_insns_p (head, tail))
+	bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_NO;
+      else
+	bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_NORMAL;
+    }
 }

 /* Free data computed for the finished region.  */
@@ -3467,9 +3544,12 @@  sched_rgn_local_free (void)
 void
 sched_rgn_local_finish (void)
 {
-  if (current_nr_blocks > 1 && !sel_sched_p ())
+  if (!sel_sched_p ())
     {
-      sched_rgn_local_free ();
+      if (current_nr_blocks > 1)
+	sched_rgn_local_free ();
+
+      free (bb_deps_free_actions);
     }
 }

diff --git a/gcc/sel-sched.cc b/gcc/sel-sched.cc
index 1925f4a9461..8310c892e13 100644
--- a/gcc/sel-sched.cc
+++ b/gcc/sel-sched.cc
@@ -7213,7 +7213,8 @@  sel_region_target_finish (bool reset_sched_cycles_p)

       find_ebb_boundaries (EBB_FIRST_BB (i), scheduled_blocks);

-      if (no_real_insns_p (current_sched_info->head, current_sched_info->tail))
+      if (no_real_nondebug_insns_p (current_sched_info->head,
+				    current_sched_info->tail))
 	continue;

       if (reset_sched_cycles_p)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr108273.c b/gcc/testsuite/gcc.target/powerpc/pr108273.c
new file mode 100644
index 00000000000..937224eaa69
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr108273.c
@@ -0,0 +1,26 @@ 
+/* { dg-options "-O2 -fdbg-cnt=sched_block:1" } */
+/* { dg-prune-output {\*\*\*dbgcnt:.*limit.*reached} } */
+
+/* Verify there is no ICE.  */
+
+int a, b, c, e, f;
+float d;
+
+void
+g ()
+{
+  float h, i[1];
+  for (; f;)
+    if (c)
+      {
+	d *e;
+	if (b)
+	  {
+	    float *j = i;
+	    j[0] += 0;
+	  }
+	h += d;
+      }
+  if (h)
+    a = i[0];
+}