diff mbox series

[RFC,Instruction,Scheduler,Stage1] New hook/code to perform fusion of dependent instructions

Message ID a269003e-637a-44e9-2eab-211f11cb7bba@linux.ibm.com
State New
Headers show
Series [RFC,Instruction,Scheduler,Stage1] New hook/code to perform fusion of dependent instructions | expand

Commit Message

Li, Pan2 via Gcc-patches April 7, 2020, 8:45 p.m. UTC
The Power processor has the ability to fuse certain pairs of dependent
instructions to improve their performance if they appear back-to-back in
the instruction stream. In looking at the current support for
instruction fusion in GCC I saw the following 2 options.

1) TARGET_SCHED_MACRO_FUSION target hooks: Only looks at existing
back-to-back instructions and will ensure the scheduler keeps them together.

2) -fsched-fusion/TARGET_SCHED_FUSION_PRIORITY: Runs as a separate
scheduling pass before peephole2. Operates independently on a single
insn. Used by ARM backend to assign higher priorities to base/disp loads
and stores so that the scheduling pass will schedule loads/stores to
adjacent memory back-to-back. Later these insns will be transformed into
load/store pair insns.

Neither of these work for Power's purpose because they don't deal with
fusion of dependent insns that may not already be back-to-back. The
TARGET_SCHED_REORDER[2] hooks also don't work since the dependent insn
more than likely gets queued for N cycles so wouldn't be on the ready
list for the reorder hooks to process. We want the ability for the
scheduler to schedule dependent insn pairs back-to-back when possible
(i.e. other dependencies of both insns have been satisfied).

I have coded up a proof of concept that implements our needs via a new
target hook. The hook is passed a pair of dependent insns and returns if
they are a fusion candidate. It is called while removing the forward
dependencies of the just scheduled insn. If a dependent insn becomes
available to schedule and it's a fusion candidate with the just
scheduled insn, then the new code moves it to the ready list (if
necessary) and marks it as SCHED_GROUP (piggy-backing on the existing
code used by TARGET_SCHED_MACRO_FUSION) to make sure the fusion
candidate will be scheduled next. Following is the scheduling part of
the diff. Does this sound like a feasible approach? I welcome any
comments/discussion.

Thanks,
Pat

Comments

Li, Pan2 via Gcc-patches April 8, 2020, 7:46 a.m. UTC | #1
On Tue, Apr 7, 2020 at 10:45 PM Pat Haugen via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> The Power processor has the ability to fuse certain pairs of dependent
> instructions to improve their performance if they appear back-to-back in
> the instruction stream. In looking at the current support for
> instruction fusion in GCC I saw the following 2 options.
>
> 1) TARGET_SCHED_MACRO_FUSION target hooks: Only looks at existing
> back-to-back instructions and will ensure the scheduler keeps them together.

The i386 backend uses this for compare + branch fusion.

> 2) -fsched-fusion/TARGET_SCHED_FUSION_PRIORITY: Runs as a separate
> scheduling pass before peephole2. Operates independently on a single
> insn. Used by ARM backend to assign higher priorities to base/disp loads
> and stores so that the scheduling pass will schedule loads/stores to
> adjacent memory back-to-back. Later these insns will be transformed into
> load/store pair insns.
>
> Neither of these work for Power's purpose because they don't deal with
> fusion of dependent insns that may not already be back-to-back. The
> TARGET_SCHED_REORDER[2] hooks also don't work since the dependent insn
> more than likely gets queued for N cycles so wouldn't be on the ready
> list for the reorder hooks to process. We want the ability for the
> scheduler to schedule dependent insn pairs back-to-back when possible
> (i.e. other dependencies of both insns have been satisfied).
>
> I have coded up a proof of concept that implements our needs via a new
> target hook. The hook is passed a pair of dependent insns and returns if
> they are a fusion candidate. It is called while removing the forward
> dependencies of the just scheduled insn. If a dependent insn becomes
> available to schedule and it's a fusion candidate with the just
> scheduled insn, then the new code moves it to the ready list (if
> necessary) and marks it as SCHED_GROUP (piggy-backing on the existing
> code used by TARGET_SCHED_MACRO_FUSION) to make sure the fusion
> candidate will be scheduled next. Following is the scheduling part of
> the diff. Does this sound like a feasible approach? I welcome any
> comments/discussion.

It would be nice test if the i386 use of TARGET_SCHED_MACRO_FUSION hooks
for compare + branch fusion can be transitioned to that scheme.  I think your
proposal doesn't improve things there since we can never schedule
branches earlier
but we could schedule the compare later.  But your proposal only
affects scheduling
of later insns, not the order of insns getting onto the ready list?

Richard.

> Thanks,
> Pat
>
>
> diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
> index 80687fb5359..7a62136d497 100644
> --- a/gcc/haifa-sched.c
> +++ b/gcc/haifa-sched.c
> @@ -4152,6 +4152,39 @@ schedule_insn (rtx_insn *insn)
>               && SCHED_GROUP_P (next)
>               && advance < effective_cost)
>             advance = effective_cost;
> +
> +         /* If all dependencies of this insn have now been resolved and the
> +            just scheduled insn was not part of a SCHED_GROUP, check if
> +            this dependent insn can be fused with the just scheduled insn.  */
> +         else if (effective_cost >= 0 && !SCHED_GROUP_P (insn)
> +                  && targetm.sched.dep_fusion
> +                  && targetm.sched.dep_fusion (insn, next))
> +           {
> +             /* Move to ready list if necessary.  */
> +             if (effective_cost > 0)
> +               {
> +                 queue_remove (next);
> +                 ready_add (&ready, next, true);
> +               }
> +
> +             /* Mark as sched_group.  */
> +             SCHED_GROUP_P (next) = 1;
> +
> +             /* Fix insn_tick.  */
> +             INSN_TICK (next) = INSN_TICK (insn);
> +
> +             /* Dump some debug output for success.  */
> +             if (sched_verbose >= 5)
> +               {
> +                 fprintf (sched_dump, ";;\t\tFusing dependent insns: ");
> +                 fprintf (sched_dump, "%4d %-30s --> ", INSN_UID (insn),
> +                          str_pattern_slim (PATTERN (insn)));
> +                 fprintf (sched_dump, "%4d %-30s\n", INSN_UID (next),
> +                          str_pattern_slim (PATTERN (next)));
> +               }
> +           }
>         }
>        else
>         /* Check always has only one forward dependence (to the first insn in
Li, Pan2 via Gcc-patches April 8, 2020, 4:43 p.m. UTC | #2
On 4/8/20 2:46 AM, Richard Biener wrote:
>> I have coded up a proof of concept that implements our needs via a new
>> target hook. The hook is passed a pair of dependent insns and returns if
>> they are a fusion candidate. It is called while removing the forward
>> dependencies of the just scheduled insn. If a dependent insn becomes
>> available to schedule and it's a fusion candidate with the just
>> scheduled insn, then the new code moves it to the ready list (if
>> necessary) and marks it as SCHED_GROUP (piggy-backing on the existing
>> code used by TARGET_SCHED_MACRO_FUSION) to make sure the fusion
>> candidate will be scheduled next. Following is the scheduling part of
>> the diff. Does this sound like a feasible approach? I welcome any
>> comments/discussion.
> It would be nice test if the i386 use of TARGET_SCHED_MACRO_FUSION hooks
> for compare + branch fusion can be transitioned to that scheme.  I think your
> proposal doesn't improve things there since we can never schedule
> branches earlier
> but we could schedule the compare later.  But your proposal only
> affects scheduling
> of later insns, not the order of insns getting onto the ready list?

Yes, your understanding is correct, my approach only affects later
dependent insns. The first insn of the fusion pair will be scheduled
according to existing priority based scheduling.

-Pat
Jeff Law Nov. 12, 2020, 4:05 p.m. UTC | #3
On 4/7/20 2:45 PM, Pat Haugen via Gcc-patches wrote:
> The Power processor has the ability to fuse certain pairs of dependent
> instructions to improve their performance if they appear back-to-back in
> the instruction stream. In looking at the current support for
> instruction fusion in GCC I saw the following 2 options.
>
> 1) TARGET_SCHED_MACRO_FUSION target hooks: Only looks at existing
> back-to-back instructions and will ensure the scheduler keeps them together.
>
> 2) -fsched-fusion/TARGET_SCHED_FUSION_PRIORITY: Runs as a separate
> scheduling pass before peephole2. Operates independently on a single
> insn. Used by ARM backend to assign higher priorities to base/disp loads
> and stores so that the scheduling pass will schedule loads/stores to
> adjacent memory back-to-back. Later these insns will be transformed into
> load/store pair insns.
>
> Neither of these work for Power's purpose because they don't deal with
> fusion of dependent insns that may not already be back-to-back. The
> TARGET_SCHED_REORDER[2] hooks also don't work since the dependent insn
> more than likely gets queued for N cycles so wouldn't be on the ready
> list for the reorder hooks to process. We want the ability for the
> scheduler to schedule dependent insn pairs back-to-back when possible
> (i.e. other dependencies of both insns have been satisfied).
>
> I have coded up a proof of concept that implements our needs via a new
> target hook. The hook is passed a pair of dependent insns and returns if
> they are a fusion candidate. It is called while removing the forward
> dependencies of the just scheduled insn. If a dependent insn becomes
> available to schedule and it's a fusion candidate with the just
> scheduled insn, then the new code moves it to the ready list (if
> necessary) and marks it as SCHED_GROUP (piggy-backing on the existing
> code used by TARGET_SCHED_MACRO_FUSION) to make sure the fusion
> candidate will be scheduled next. Following is the scheduling part of
> the diff. Does this sound like a feasible approach? I welcome any
> comments/discussion.

It looks fairly reasonable to me.   Do you plan on trying to take this
forward at all?


jeff
Pat Haugen Nov. 13, 2020, 9:01 p.m. UTC | #4
On 11/12/20 10:05 AM, Jeff Law wrote:
>> I have coded up a proof of concept that implements our needs via a new
>> target hook. The hook is passed a pair of dependent insns and returns if
>> they are a fusion candidate. It is called while removing the forward
>> dependencies of the just scheduled insn. If a dependent insn becomes
>> available to schedule and it's a fusion candidate with the just
>> scheduled insn, then the new code moves it to the ready list (if
>> necessary) and marks it as SCHED_GROUP (piggy-backing on the existing
>> code used by TARGET_SCHED_MACRO_FUSION) to make sure the fusion
>> candidate will be scheduled next. Following is the scheduling part of
>> the diff. Does this sound like a feasible approach? I welcome any
>> comments/discussion.
> It looks fairly reasonable to me.   Do you plan on trying to take this
> forward at all?

Due to other requirements where this approach did not work, we are pursuing a different approach of creating combine patterns for the target.

-Pat
diff mbox series

Patch

diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index 80687fb5359..7a62136d497 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -4152,6 +4152,39 @@  schedule_insn (rtx_insn *insn)
 	      && SCHED_GROUP_P (next)
 	      && advance < effective_cost)
 	    advance = effective_cost;
+
+	  /* If all dependencies of this insn have now been resolved and the
+	     just scheduled insn was not part of a SCHED_GROUP, check if
+	     this dependent insn can be fused with the just scheduled insn.  */
+	  else if (effective_cost >= 0 && !SCHED_GROUP_P (insn)
+		   && targetm.sched.dep_fusion
+		   && targetm.sched.dep_fusion (insn, next))
+	    {
+	      /* Move to ready list if necessary.  */
+	      if (effective_cost > 0)
+		{
+		  queue_remove (next);
+		  ready_add (&ready, next, true);
+		}
+
+	      /* Mark as sched_group.  */
+	      SCHED_GROUP_P (next) = 1;
+
+	      /* Fix insn_tick.  */
+	      INSN_TICK (next) = INSN_TICK (insn);
+
+	      /* Dump some debug output for success.  */
+	      if (sched_verbose >= 5)
+		{
+		  fprintf (sched_dump, ";;\t\tFusing dependent insns: ");
+		  fprintf (sched_dump, "%4d %-30s --> ", INSN_UID (insn),
+			   str_pattern_slim (PATTERN (insn)));
+		  fprintf (sched_dump, "%4d %-30s\n", INSN_UID (next),
+			   str_pattern_slim (PATTERN (next)));
+		}
+	    }
 	}
       else
 	/* Check always has only one forward dependence (to the first insn in