diff mbox

Target hook for disabling the delay slot filler.

Message ID B83211783F7A334B926F0C0CA42E32CAF2CF01@hhmail02.hh.imgtec.org
State New
Headers show

Commit Message

Simon Dardis Sept. 15, 2015, 2:19 p.m. UTC
Hello all,

This patch adds a target hook for disabling the eager delay slot filler which when disabled can give better code. No new regressions. Ok to commit?

Thanks,
Simon

gcc/
	* target.def (use_eager_delay_filler_p): New hook for selectively
	disabling eager delay slot filler.
	* reorg.c (dbr_schedule): Use the new hook.
	* config/mips/mips.c (mips_use_eager_delay_filler_p): New static
	function.
	(TARGET_USE_EAGER_DELAY_FILLER_P): Define.
	* doc/tm.texi.in: Add placeholder for new hook.
	* doc/tm.texi: Regenerate.

gcc/testsuite/

	* gcc.target/mips/ds-schedule-1.c: New file.
	* gcc.target/mips/ds-schedule-2.c: Likewise.

Comments

Bernd Schmidt Sept. 15, 2015, 2:27 p.m. UTC | #1
On 09/15/2015 04:19 PM, Simon Dardis wrote:
> This patch adds a target hook for disabling the eager delay slot filler which when disabled can give better code. No new regressions. Ok to commit?

Hmm. Whether a branch was filled by the simple or eager filler is an 
implementation detail - is there some better way to describe which kind 
of branch is profitable?


Bernd
Jeff Law Sept. 15, 2015, 3:01 p.m. UTC | #2
On 09/15/2015 08:27 AM, Bernd Schmidt wrote:
> On 09/15/2015 04:19 PM, Simon Dardis wrote:
>> This patch adds a target hook for disabling the eager delay slot
>> filler which when disabled can give better code. No new regressions.
>> Ok to commit?
>
> Hmm. Whether a branch was filled by the simple or eager filler is an
> implementation detail - is there some better way to describe which kind
> of branch is profitable?
And more importantly, it's far better to be able to describe when it is 
not profitable to use eager filling rather than just disabling it 
completely.

Jeff
Simon Dardis Sept. 17, 2015, 9:52 a.m. UTC | #3
The profitability of using an ordinary branch over a delay slot branch
depends on how the delay slot is filled. If a delay slot can be filled from
an instruction preceding the branch or instructions proceeding that must be 
executed on both sides then it is profitable to use a delay slot branch.

For cases when instructions are chosen from one side of the branch, 
the proposed optimization strategy is to not speculatively execute 
instructions when ordinary branches could be used. Performance-wise
this avoids executing instructions which the eager delay filler picked
wrongly.

Since most branches have a compact form disabling the eager delay filler
should be no worse than altering it not to fill delay slots in this case.

Thanks,
Simon

-----Original Message-----
From: Jeff Law [mailto:law@redhat.com] 
Sent: 15 September 2015 16:02
To: Bernd Schmidt; Simon Dardis; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Target hook for disabling the delay slot filler.

On 09/15/2015 08:27 AM, Bernd Schmidt wrote:
> On 09/15/2015 04:19 PM, Simon Dardis wrote:
>> This patch adds a target hook for disabling the eager delay slot 
>> filler which when disabled can give better code. No new regressions.
>> Ok to commit?
>
> Hmm. Whether a branch was filled by the simple or eager filler is an 
> implementation detail - is there some better way to describe which 
> kind of branch is profitable?
And more importantly, it's far better to be able to describe when it is not profitable to use eager filling rather than just disabling it completely.

Jeff
Bernd Schmidt Sept. 17, 2015, 10:16 a.m. UTC | #4
On 09/17/2015 11:52 AM, Simon Dardis wrote:
> The profitability of using an ordinary branch over a delay slot branch
> depends on how the delay slot is filled. If a delay slot can be filled from
> an instruction preceding the branch or instructions proceeding that must be
> executed on both sides then it is profitable to use a delay slot branch.
>
> For cases when instructions are chosen from one side of the branch,
> the proposed optimization strategy is to not speculatively execute
> instructions when ordinary branches could be used. Performance-wise
> this avoids executing instructions which the eager delay filler picked
> wrongly.
>
> Since most branches have a compact form disabling the eager delay filler
> should be no worse than altering it not to fill delay slots in this case.

Ok, so in that case I think the patch would be reasonable if the target 
hook was named appropriately to say something like don't speculate when 
filling delay slots. It looks like fill_eager_delay_slots always 
speculates, so you needn't change your approach in reorg.c.
Possibly place the hook after rtx_cost/address_cost in target.def since 
it's cost related.


Bernd
Jeff Law Sept. 17, 2015, 4:55 p.m. UTC | #5
On 09/17/2015 03:52 AM, Simon Dardis wrote:
> The profitability of using an ordinary branch over a delay slot branch
> depends on how the delay slot is filled. If a delay slot can be filled from
> an instruction preceding the branch or instructions proceeding that must be
> executed on both sides then it is profitable to use a delay slot branch.
Agreed.  It's an over-simplification, but for the purposes of this 
discussion it's close enough.


>
> For cases when instructions are chosen from one side of the branch,
> the proposed optimization strategy is to not speculatively execute
> instructions when ordinary branches could be used. Performance-wise
> this avoids executing instructions which the eager delay filler picked
> wrongly.
Are you trying to say that you have the option as to what kind of branch 
to use?  ie, "ordinary", presumably without a delay slot or one with a 
delay slot?

Is the "ordinary" actually just a nullified delay slot or some form of 
likely/not likely static hint?



>
> Since most branches have a compact form disabling the eager delay filler
> should be no worse than altering it not to fill delay slots in this case.
But what is the compact form at the micro-architectural level?  My 
mips-fu has diminished greatly, but my recollection is the bubble is 
always there.   Is that not the case?

fill_eager_delay_slots is most definitely speculative and its 
profitability is largely dependent on the cost of what insns it finds to 
fill those delay slots and whether they're from the common or uncommon path.

If it is able to find insns from the commonly executed path that don't 
have a long latency, then the fill is usually profitable (since the 
pipeline bubble always exists).  However, pulling a long latency 
instruction (say anything that might cache miss or an fdiv/fsqrt) off 
the slow path and conditionally nullifying it can be *awful*. 
Everything else is in-between.



Jeff
diff mbox

Patch

Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	(revision 227676)
+++ gcc/config/mips/mips.c	(working copy)
@@ -14425,6 +14425,14 @@ 
   return cached_can_issue_more;
 }
 
+/* Implement USE_EAGER_DELAY_FILLER.  */
+
+static bool
+mips_use_eager_delay_filler_p ()
+{
+  return TARGET_CB_NEVER;
+}
+
 /* Update round-robin counters for ALU1/2 and FALU1/2.  */
 
 static void
@@ -19982,6 +19990,9 @@ 
 #undef TARGET_IN_SMALL_DATA_P
 #define TARGET_IN_SMALL_DATA_P mips_in_small_data_p
 
+#undef TARGET_USE_EAGER_DELAY_FILLER_P
+#define TARGET_USE_EAGER_DELAY_FILLER_P mips_use_eager_delay_filler_p
+
 #undef TARGET_MACHINE_DEPENDENT_REORG
 #define TARGET_MACHINE_DEPENDENT_REORG mips_reorg
 
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 227676)
+++ gcc/doc/tm.texi	(working copy)
@@ -10949,6 +10949,15 @@ 
 definition is null.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_USE_EAGER_DELAY_FILLER_P (void)
+This predicate controls the use of the eager delay slot filler.  Targets
+such as certain MIPS architectures possess both branches with and without
+delay slots.  As the eager delay slot filler can increase code size,
+disabling it is beneficial when ordinary branches are available.  Use of
+delay slot branches filled using the basic filler is often still desirable
+as the delay slot can hide a pipeline bubble.
+@end deftypefn
+
 @deftypefn {Target Hook} void TARGET_INIT_BUILTINS (void)
 Define this hook if you have any machine-specific built-in functions
 that need to be defined.  It should be a function that performs the
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 227676)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -7985,6 +7985,8 @@ 
 
 @hook TARGET_MACHINE_DEPENDENT_REORG
 
+@hook TARGET_USE_EAGER_DELAY_FILLER_P
+
 @hook TARGET_INIT_BUILTINS
 
 @hook TARGET_BUILTIN_DECL
Index: gcc/reorg.c
===================================================================
--- gcc/reorg.c	(revision 227676)
+++ gcc/reorg.c	(working copy)
@@ -3793,7 +3793,8 @@ 
     {
       fill_simple_delay_slots (1);
       fill_simple_delay_slots (0);
-      fill_eager_delay_slots ();
+      if (targetm.use_eager_delay_filler_p ())
+	fill_eager_delay_slots ();
       relax_delay_slots (first);
     }
 
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 227676)
+++ gcc/target.def	(working copy)
@@ -3618,6 +3618,17 @@ 
 definition is null.",
  void, (void), NULL)
 
+/* Control of eager delay slot filling in delayed-branch scheduling.  */
+DEFHOOK
+(use_eager_delay_filler_p,
+ "This predicate controls the use of the eager delay slot filler.  Targets\n\
+such as certain MIPS architectures possess both branches with and without\n\
+delay slots.  As the eager delay slot filler can increase code size,\n\
+disabling it is beneficial when ordinary branches are available.  Use of\n\
+delay slot branches filled using the basic filler is often still desirable\n\
+as the delay slot can hide a pipeline bubble.", bool, (void),
+  hook_bool_void_true)
+
 /* Create the __builtin_va_list type.  */
 DEFHOOK
 (build_builtin_va_list,
Index: gcc/testsuite/gcc.target/mips/ds-schedule-1.c
===================================================================
--- gcc/testsuite/gcc.target/mips/ds-schedule-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/mips/ds-schedule-1.c	(working copy)
@@ -0,0 +1,29 @@ 
+/* { dg-options "isa_rev>=6 -mcompact-branches=optimal -mno-abicalls -G4" } */
+/* { dg-final { scan-assembler-not "bne\t" } } */
+/* { dg-final { scan-assembler-not "beq\t" } } */
+/* { dg-final { scan-assembler-times "\\(foo\\)" 1 } } */
+
+/* Test that when compact branches are used, that a compact branch is
+   produced in the case where code expansion would have occurred if a
+   delay slot branch would have be used.  'foo' should only be
+   referenced once in the program text.  */
+
+struct list
+{
+  struct list *next;
+  int element;
+};
+
+struct list *gr;
+
+int foo;
+
+extern void t (int, int, int*);
+
+void
+f (struct list **ptr)
+{
+  if (gr)
+    *ptr = gr->next;
+  t (1, foo, &gr->element);
+}
Index: gcc/testsuite/gcc.target/mips/ds-schedule-2.c
===================================================================
--- gcc/testsuite/gcc.target/mips/ds-schedule-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/mips/ds-schedule-2.c	(working copy)
@@ -0,0 +1,28 @@ 
+/* { dg-options "-mcompact-branches=never -mno-abicalls -G4" } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" "-O1" "-Os" } { "" } } */
+/* { dg-final { scan-assembler "beq.*\n\tlw" } } */
+/* { dg-final { scan-assembler-times "\\(foo\\)" 2 } } */
+
+/* Test that when compact branches are explicitly disabled, that a non-compact
+   branch is produced. 'foo' should be referenced twice in the program text as the
+   eager delay slot filler will duplicate the load of foo. */
+
+struct list
+{
+  struct list *next;
+  int element;
+};
+
+struct list *gr;
+
+int foo;
+
+extern void t (int, int, int*);
+
+void
+f (struct list **ptr)
+{
+  if (gr)
+    *ptr = gr->next;
+  t (1, foo, &gr->element);
+}