Tweak ALAP calculation in SCHED_PRESSURE_MODEL

Message ID 5BE427B9.9020607@foss.arm.com
State New
Headers show
Series
  • Tweak ALAP calculation in SCHED_PRESSURE_MODEL
Related show

Commit Message

Kyrill Tkachov Nov. 8, 2018, 12:10 p.m.
<Sending this on behalf of Richard Sandiford>

This patch fixes a flaw in the relationship between the way that
SCHED_PRESSURE_MODEL calculates the alap and depth vs how it uses
them in model_order_p.  A comment in model_order_p says:

   /* Combine the length of the longest path of satisfied true dependencies
      that leads to each instruction (depth) with the length of the longest
      path of any dependencies that leads from the instruction (alap).
      Prefer instructions with the greatest combined length.  If the combined
      lengths are equal, prefer instructions with the greatest depth.

      The idea is that, if we have a set S of "equal" instructions that each
      have ALAP value X, and we pick one such instruction I, any true-dependent
      successors of I that have ALAP value X - 1 should be preferred over S.
      This encourages the schedule to be "narrow" rather than "wide".
      However, if I is a low-priority instruction that we decided to
      schedule because of its model_classify_pressure, and if there
      is a set of higher-priority instructions T, the aforementioned
      successors of I should not have the edge over T.  */

The expectation was that scheduling an instruction X would give a
greater priority to the highest-priority successor instructions Y than
X had: Y.depth would be X.depth + 1 and Y.alap would be X.alap - 1,
giving an equal combined height, but with the greater depth winning as
a tie-breaker. But this doesn't work if the alap value was ultimately
determined by an anti-dependence.

This is particularly bad when --param max-pending-list-length kicks in,
since we then start adding fake anti-dependencies in order to keep the
list length down.  These fake dependencies tend to be on the critical
path.

The attached patch avoids that by making the alap calculation only
look at true dependencies.  This shouldn't be too bad, since we use
INSN_PRIORITY as the final tie-breaker than that does take
anti-dependencies into account.

This reduces the number of spills in the hot function from 436.cactusADM
by 14% on aarch64 at -O3 (and the number of instructions in general).
SPEC2017 shows a minor improvement on Cortex-A72 (about 0.1% overall).
Thanks to Wilco for the benchmarking.

Bootstrapped and tested on aarch64-none-linux-gnu.

Is this ok for trunk?

Thanks,
Kyrill

2018-11-08  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
     * haifa-sched.c (model_analyze_insns): Only add 1 to the consumer's
     ALAP if the dependence is a true dependence.

Comments

Jeff Law Nov. 10, 2018, 12:04 a.m. | #1
On 11/8/18 5:10 AM, Kyrill Tkachov wrote:
> <Sending this on behalf of Richard Sandiford>
> 
> This patch fixes a flaw in the relationship between the way that
> SCHED_PRESSURE_MODEL calculates the alap and depth vs how it uses
> them in model_order_p.  A comment in model_order_p says:
> 
>   /* Combine the length of the longest path of satisfied true dependencies
>      that leads to each instruction (depth) with the length of the longest
>      path of any dependencies that leads from the instruction (alap).
>      Prefer instructions with the greatest combined length.  If the
> combined
>      lengths are equal, prefer instructions with the greatest depth.
> 
>      The idea is that, if we have a set S of "equal" instructions that each
>      have ALAP value X, and we pick one such instruction I, any
> true-dependent
>      successors of I that have ALAP value X - 1 should be preferred over S.
>      This encourages the schedule to be "narrow" rather than "wide".
>      However, if I is a low-priority instruction that we decided to
>      schedule because of its model_classify_pressure, and if there
>      is a set of higher-priority instructions T, the aforementioned
>      successors of I should not have the edge over T.  */
> 
> The expectation was that scheduling an instruction X would give a
> greater priority to the highest-priority successor instructions Y than
> X had: Y.depth would be X.depth + 1 and Y.alap would be X.alap - 1,
> giving an equal combined height, but with the greater depth winning as
> a tie-breaker. But this doesn't work if the alap value was ultimately
> determined by an anti-dependence.
> 
> This is particularly bad when --param max-pending-list-length kicks in,
> since we then start adding fake anti-dependencies in order to keep the
> list length down.  These fake dependencies tend to be on the critical
> path.
> 
> The attached patch avoids that by making the alap calculation only
> look at true dependencies.  This shouldn't be too bad, since we use
> INSN_PRIORITY as the final tie-breaker than that does take
> anti-dependencies into account.
> 
> This reduces the number of spills in the hot function from 436.cactusADM
> by 14% on aarch64 at -O3 (and the number of instructions in general).
> SPEC2017 shows a minor improvement on Cortex-A72 (about 0.1% overall).
> Thanks to Wilco for the benchmarking.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Is this ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2018-11-08  Richard Sandiford  <richard.sandiford@arm.com>
> 
> gcc/
>     * haifa-sched.c (model_analyze_insns): Only add 1 to the consumer's
>     ALAP if the dependence is a true dependence.
So at the least the documentation of the ALAP field would need to be
updated as well as the comment you referenced (the "any dependencies").

But more importantly, it seems like blindly ignoring anti dependencies
is just a hack that happens to work.  I wonder if we could somehow mark
the fake dependencies we add, and avoid bumping the ALAP when we
encounter those fake dependencies.

It probably wouldn't be a bad idea to look at the default for
MAX_PENDING_LIST_LENGTH.  Based on the current default value and the
comments in the code that value could well have been tuned 25 or more
years ago!  How often are we falling over that during a bootstrap and
during spec builds?


Jeff

Patch

diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index 1fdc9df9fb26f23758ec8326cec91eecc4c917c1..01825de440c2e818eceab5ab7411b20b05ee54f1 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -3504,8 +3504,10 @@  model_analyze_insns (void)
 	FOR_EACH_DEP (iter, SD_LIST_FORW, sd_it, dep)
 	  {
 	    con = MODEL_INSN_INFO (DEP_CON (dep));
-	    if (con->insn && insn->alap < con->alap + 1)
-	      insn->alap = con->alap + 1;
+	    unsigned int min_alap
+	      = con->alap + (DEP_TYPE (dep) == REG_DEP_TRUE);
+	    if (con->insn && insn->alap < min_alap)
+	      insn->alap = min_alap;
 	  }
 
 	insn->old_queue = QUEUE_INDEX (iter);