diff mbox series

Schedule by INSN_COST in case of tie

Message ID 5B97C539.1020009@arm.com
State New
Headers show
Series Schedule by INSN_COST in case of tie | expand

Commit Message

Vlad Lazar Sept. 11, 2018, 1:38 p.m. UTC
Hi.

This patch makes the scheduler prefer instructions with higher cost if two given instructions are equally good.
Issuing more restricted instructions first is particularly useful on in-order cores because it increases the
number of dual issue opportunities.

For example, on AArch64, instead of:

   add     x11, x2, 96
   mov     x4, x2
   mov     w10, 1
   ldrh    w5, [x0]
   ldrh    w13, [x0, 2]
   ldrh    w9, [x0, 4]
   ldrh    w12, [x0, 6]
   b       .L759

Generate:

   ldrh    w5, [x0]
   add     x11, x2, 96
   ldrh    w13, [x0, 2]
   mov     x4, x2
   ldrh    w9, [x0, 4]
   mov     w10, 1
   ldrh    w12, [x0, 6]
   b       .L759

Bootstrapped and regtested on aarch64-none-linux-gnu and there are no regressions.
Ok for trunk?

Thanks,
Vlad

gcc/
Changelog for gcc/Changelog
2018-09-11  Vlad Lazar  <vlad.lazar@arm.com>

	* haifa-sched.c (rank_for_schedule): Schedule by INSN_COST.
	(rfs_decision): New scheduling decision.

Comments

Ramana Radhakrishnan Sept. 11, 2018, 2:55 p.m. UTC | #1
On Tue, 11 Sep 2018, 14:38 Vlad Lazar, <vlad.lazar@arm.com> wrote:

> Hi.
>
> This patch makes the scheduler prefer instructions with higher cost if two
> given instructions are equally good.
> Issuing more restricted instructions first is particularly useful on
> in-order cores because it increases the
> number of dual issue opportunities.
>
> For example, on AArch64, instead of:
>
>    add     x11, x2, 96
>    mov     x4, x2
>    mov     w10, 1
>    ldrh    w5, [x0]
>    ldrh    w13, [x0, 2]
>    ldrh    w9, [x0, 4]
>    ldrh    w12, [x0, 6]
>    b       .L759
>
> Generate:
>
>    ldrh    w5, [x0]
>    add     x11, x2, 96
>    ldrh    w13, [x0, 2]
>    mov     x4, x2
>    ldrh    w9, [x0, 4]
>    mov     w10, 1
>    ldrh    w12, [x0, 6]
>    b       .L759
>
> Bootstrapped and regtested on aarch64-none-linux-gnu and there are no
> regressions.
> Ok for trunk?
>

This to me feels like the wrong approach as it feels like you are assuming
INSN_COST is latency in some way ? Surely, we shouldn't be introducing
INSN_COST based stuff into the scheduler.

Have you investigated  using TARGET_SCHED_ADJUST_COST (IIRC, look for the
right name in the internals documents) and such hooks that come from the
scheduler rather than trying to massage INSN_COST into the target
independent parts of the scheduler ?

Ramana


>
> Thanks,
> Vlad
>
> gcc/
> Changelog for gcc/Changelog
> 2018-09-11  Vlad Lazar  <vlad.lazar@arm.com>
>
>         * haifa-sched.c (rank_for_schedule): Schedule by INSN_COST.
>         (rfs_decision): New scheduling decision.
>
Kyrill Tkachov Sept. 11, 2018, 3 p.m. UTC | #2
Hi Ramana,

On 11/09/18 15:55, Ramana Radhakrishnan wrote:
> On Tue, 11 Sep 2018, 14:38 Vlad Lazar, <vlad.lazar@arm.com> wrote:
>
> > Hi.
> >
> > This patch makes the scheduler prefer instructions with higher cost if two
> > given instructions are equally good.
> > Issuing more restricted instructions first is particularly useful on
> > in-order cores because it increases the
> > number of dual issue opportunities.
> >
> > For example, on AArch64, instead of:
> >
> >    add     x11, x2, 96
> >    mov     x4, x2
> >    mov     w10, 1
> >    ldrh    w5, [x0]
> >    ldrh    w13, [x0, 2]
> >    ldrh    w9, [x0, 4]
> >    ldrh    w12, [x0, 6]
> >    b       .L759
> >
> > Generate:
> >
> >    ldrh    w5, [x0]
> >    add     x11, x2, 96
> >    ldrh    w13, [x0, 2]
> >    mov     x4, x2
> >    ldrh    w9, [x0, 4]
> >    mov     w10, 1
> >    ldrh    w12, [x0, 6]
> >    b       .L759
> >
> > Bootstrapped and regtested on aarch64-none-linux-gnu and there are no
> > regressions.
> > Ok for trunk?
> >
>
> This to me feels like the wrong approach as it feels like you are assuming
> INSN_COST is latency in some way ? Surely, we shouldn't be introducing
> INSN_COST based stuff into the scheduler.
>
> Have you investigated  using TARGET_SCHED_ADJUST_COST (IIRC, look for the
> right name in the internals documents) and such hooks that come from the
> scheduler rather than trying to massage INSN_COST into the target
> independent parts of the scheduler ?
>

In the context of haifa-sched.c, INSN_COST is the latency cost.
It is not the rtx_cost of the insn, as used by combine and others.
So this approach looks reasonable to me (though I haven't done a deep review).

Thanks,
Kyrill

> Ramana
>
>
> >
> > Thanks,
> > Vlad
> >
> > gcc/
> > Changelog for gcc/Changelog
> > 2018-09-11  Vlad Lazar  <vlad.lazar@arm.com>
> >
> >         * haifa-sched.c (rank_for_schedule): Schedule by INSN_COST.
> >         (rfs_decision): New scheduling decision.
> >
Ramana Radhakrishnan Sept. 11, 2018, 3:03 p.m. UTC | #3
>
> > This to me feels like the wrong approach as it feels like you are assuming
> > INSN_COST is latency in some way ? Surely, we shouldn't be introducing
> > INSN_COST based stuff into the scheduler.
> >
> > Have you investigated  using TARGET_SCHED_ADJUST_COST (IIRC, look for the
> > right name in the internals documents) and such hooks that come from the
> > scheduler rather than trying to massage INSN_COST into the target
> > independent parts of the scheduler ?
> >
>
> In the context of haifa-sched.c, INSN_COST is the latency cost.
> It is not the rtx_cost of the insn, as used by combine and others.

Ah, I was conflating rtx_cost with INSN_COST, sorry about the noise.

Ramana

> So this approach looks reasonable to me (though I haven't done a deep review).
>
> Thanks,
> Kyrill
>
> > Ramana
> >
> >
> > >
> > > Thanks,
> > > Vlad
> > >
> > > gcc/
> > > Changelog for gcc/Changelog
> > > 2018-09-11  Vlad Lazar  <vlad.lazar@arm.com>
> > >
> > >         * haifa-sched.c (rank_for_schedule): Schedule by INSN_COST.
> > >         (rfs_decision): New scheduling decision.
> > >
>
Jeff Law Sept. 11, 2018, 8:46 p.m. UTC | #4
On 9/11/18 9:00 AM, Kyrill Tkachov wrote:
> Hi Ramana,
> 
> On 11/09/18 15:55, Ramana Radhakrishnan wrote:
>> On Tue, 11 Sep 2018, 14:38 Vlad Lazar, <vlad.lazar@arm.com> wrote:
>>
>> > Hi.
>> >
>> > This patch makes the scheduler prefer instructions with higher cost
>> if two
>> > given instructions are equally good.
>> > Issuing more restricted instructions first is particularly useful on
>> > in-order cores because it increases the
>> > number of dual issue opportunities.
>> >
>> > For example, on AArch64, instead of:
>> >
>> >    add     x11, x2, 96
>> >    mov     x4, x2
>> >    mov     w10, 1
>> >    ldrh    w5, [x0]
>> >    ldrh    w13, [x0, 2]
>> >    ldrh    w9, [x0, 4]
>> >    ldrh    w12, [x0, 6]
>> >    b       .L759
>> >
>> > Generate:
>> >
>> >    ldrh    w5, [x0]
>> >    add     x11, x2, 96
>> >    ldrh    w13, [x0, 2]
>> >    mov     x4, x2
>> >    ldrh    w9, [x0, 4]
>> >    mov     w10, 1
>> >    ldrh    w12, [x0, 6]
>> >    b       .L759
>> >
>> > Bootstrapped and regtested on aarch64-none-linux-gnu and there are no
>> > regressions.
>> > Ok for trunk?
>> >
>>
>> This to me feels like the wrong approach as it feels like you are
>> assuming
>> INSN_COST is latency in some way ? Surely, we shouldn't be introducing
>> INSN_COST based stuff into the scheduler.
>>
>> Have you investigated  using TARGET_SCHED_ADJUST_COST (IIRC, look for the
>> right name in the internals documents) and such hooks that come from the
>> scheduler rather than trying to massage INSN_COST into the target
>> independent parts of the scheduler ?
>>
> 
> In the context of haifa-sched.c, INSN_COST is the latency cost.
> It is not the rtx_cost of the insn, as used by combine and others.
> So this approach looks reasonable to me (though I haven't done a deep
> review).
It looks reasonable to me as well.  Essentially it's a new tie-breaker
if everything else is equal.

jeff
Jeff Law Sept. 11, 2018, 8:46 p.m. UTC | #5
On 9/11/18 7:38 AM, Vlad Lazar wrote:
> Hi.
> 
> This patch makes the scheduler prefer instructions with higher cost if
> two given instructions are equally good.
> Issuing more restricted instructions first is particularly useful on
> in-order cores because it increases the
> number of dual issue opportunities.
> 
> For example, on AArch64, instead of:
> 
>   add     x11, x2, 96
>   mov     x4, x2
>   mov     w10, 1
>   ldrh    w5, [x0]
>   ldrh    w13, [x0, 2]
>   ldrh    w9, [x0, 4]
>   ldrh    w12, [x0, 6]
>   b       .L759
> 
> Generate:
> 
>   ldrh    w5, [x0]
>   add     x11, x2, 96
>   ldrh    w13, [x0, 2]
>   mov     x4, x2
>   ldrh    w9, [x0, 4]
>   mov     w10, 1
>   ldrh    w12, [x0, 6]
>   b       .L759
> 
> Bootstrapped and regtested on aarch64-none-linux-gnu and there are no
> regressions.
> Ok for trunk?
> 
> Thanks,
> Vlad
> 
> gcc/
> Changelog for gcc/Changelog
> 2018-09-11  Vlad Lazar  <vlad.lazar@arm.com>
> 
>     * haifa-sched.c (rank_for_schedule): Schedule by INSN_COST.
>     (rfs_decision): New scheduling decision.
OK.
jeff
diff mbox series

Patch

diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index 4f0221f6f43..3095e0375b5 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -2542,7 +2542,7 @@  enum rfs_decision {
   RFS_SCHED_GROUP, RFS_PRESSURE_DELAY, RFS_PRESSURE_TICK,
   RFS_FEEDS_BACKTRACK_INSN, RFS_PRIORITY, RFS_SPECULATION,
   RFS_SCHED_RANK, RFS_LAST_INSN, RFS_PRESSURE_INDEX,
-  RFS_DEP_COUNT, RFS_TIE, RFS_FUSION, RFS_N };
+  RFS_DEP_COUNT, RFS_TIE, RFS_FUSION, RFS_COST, RFS_N };
 
 /* Corresponding strings for print outs.  */
 static const char *rfs_str[RFS_N] = {
@@ -2550,7 +2550,7 @@  static const char *rfs_str[RFS_N] = {
   "RFS_SCHED_GROUP", "RFS_PRESSURE_DELAY", "RFS_PRESSURE_TICK",
   "RFS_FEEDS_BACKTRACK_INSN", "RFS_PRIORITY", "RFS_SPECULATION",
   "RFS_SCHED_RANK", "RFS_LAST_INSN", "RFS_PRESSURE_INDEX",
-  "RFS_DEP_COUNT", "RFS_TIE", "RFS_FUSION" };
+  "RFS_DEP_COUNT", "RFS_TIE", "RFS_FUSION", "RFS_COST" };
 
 /* Statistical breakdown of rank_for_schedule decisions.  */
 struct rank_for_schedule_stats_t { unsigned stats[RFS_N]; };
@@ -2803,6 +2803,14 @@  rank_for_schedule (const void *x, const void *y)
   if (flag_sched_dep_count_heuristic && val != 0)
     return rfs_result (RFS_DEP_COUNT, val, tmp, tmp2);
 
+  /* Sort by INSN_COST rather than INSN_LUID.  This means that instructions
+     which take longer to execute are prioritised and it leads to more
+     dual-issue opportunities on in-order cores which have this feature.  */
+
+  if (INSN_COST (tmp) != INSN_COST (tmp2))
+    return rfs_result (RFS_COST, INSN_COST (tmp2) - INSN_COST (tmp),
+		       tmp, tmp2);
+
   /* If insns are equally good, sort by INSN_LUID (original insn order),
      so that we make the sort stable.  This minimizes instruction movement,
      thus minimizing sched's effect on debugging and cross-jumping.  */