Message ID | B83211783F7A334B926F0C0CA42E32CAF2CF01@hhmail02.hh.imgtec.org |
---|---|
State | New |
Headers | show |
On 09/15/2015 04:19 PM, Simon Dardis wrote:
> This patch adds a target hook for disabling the eager delay slot filler which when disabled can give better code. No new regressions. Ok to commit?
Hmm. Whether a branch was filled by the simple or eager filler is an
implementation detail - is there some better way to describe which kind
of branch is profitable?
Bernd
On 09/15/2015 08:27 AM, Bernd Schmidt wrote: > On 09/15/2015 04:19 PM, Simon Dardis wrote: >> This patch adds a target hook for disabling the eager delay slot >> filler which when disabled can give better code. No new regressions. >> Ok to commit? > > Hmm. Whether a branch was filled by the simple or eager filler is an > implementation detail - is there some better way to describe which kind > of branch is profitable? And more importantly, it's far better to be able to describe when it is not profitable to use eager filling rather than just disabling it completely. Jeff
The profitability of using an ordinary branch over a delay slot branch depends on how the delay slot is filled. If a delay slot can be filled from an instruction preceding the branch or instructions proceeding that must be executed on both sides then it is profitable to use a delay slot branch. For cases when instructions are chosen from one side of the branch, the proposed optimization strategy is to not speculatively execute instructions when ordinary branches could be used. Performance-wise this avoids executing instructions which the eager delay filler picked wrongly. Since most branches have a compact form disabling the eager delay filler should be no worse than altering it not to fill delay slots in this case. Thanks, Simon -----Original Message----- From: Jeff Law [mailto:law@redhat.com] Sent: 15 September 2015 16:02 To: Bernd Schmidt; Simon Dardis; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Target hook for disabling the delay slot filler. On 09/15/2015 08:27 AM, Bernd Schmidt wrote: > On 09/15/2015 04:19 PM, Simon Dardis wrote: >> This patch adds a target hook for disabling the eager delay slot >> filler which when disabled can give better code. No new regressions. >> Ok to commit? > > Hmm. Whether a branch was filled by the simple or eager filler is an > implementation detail - is there some better way to describe which > kind of branch is profitable? And more importantly, it's far better to be able to describe when it is not profitable to use eager filling rather than just disabling it completely. Jeff
On 09/17/2015 11:52 AM, Simon Dardis wrote: > The profitability of using an ordinary branch over a delay slot branch > depends on how the delay slot is filled. If a delay slot can be filled from > an instruction preceding the branch or instructions proceeding that must be > executed on both sides then it is profitable to use a delay slot branch. > > For cases when instructions are chosen from one side of the branch, > the proposed optimization strategy is to not speculatively execute > instructions when ordinary branches could be used. Performance-wise > this avoids executing instructions which the eager delay filler picked > wrongly. > > Since most branches have a compact form disabling the eager delay filler > should be no worse than altering it not to fill delay slots in this case. Ok, so in that case I think the patch would be reasonable if the target hook was named appropriately to say something like don't speculate when filling delay slots. It looks like fill_eager_delay_slots always speculates, so you needn't change your approach in reorg.c. Possibly place the hook after rtx_cost/address_cost in target.def since it's cost related. Bernd
On 09/17/2015 03:52 AM, Simon Dardis wrote: > The profitability of using an ordinary branch over a delay slot branch > depends on how the delay slot is filled. If a delay slot can be filled from > an instruction preceding the branch or instructions proceeding that must be > executed on both sides then it is profitable to use a delay slot branch. Agreed. It's an over-simplification, but for the purposes of this discussion it's close enough. > > For cases when instructions are chosen from one side of the branch, > the proposed optimization strategy is to not speculatively execute > instructions when ordinary branches could be used. Performance-wise > this avoids executing instructions which the eager delay filler picked > wrongly. Are you trying to say that you have the option as to what kind of branch to use? ie, "ordinary", presumably without a delay slot or one with a delay slot? Is the "ordinary" actually just a nullified delay slot or some form of likely/not likely static hint? > > Since most branches have a compact form disabling the eager delay filler > should be no worse than altering it not to fill delay slots in this case. But what is the compact form at the micro-architectural level? My mips-fu has diminished greatly, but my recollection is the bubble is always there. Is that not the case? fill_eager_delay_slots is most definitely speculative and its profitability is largely dependent on the cost of what insns it finds to fill those delay slots and whether they're from the common or uncommon path. If it is able to find insns from the commonly executed path that don't have a long latency, then the fill is usually profitable (since the pipeline bubble always exists). However, pulling a long latency instruction (say anything that might cache miss or an fdiv/fsqrt) off the slow path and conditionally nullifying it can be *awful*. Everything else is in-between. Jeff
Index: gcc/config/mips/mips.c =================================================================== --- gcc/config/mips/mips.c (revision 227676) +++ gcc/config/mips/mips.c (working copy) @@ -14425,6 +14425,14 @@ return cached_can_issue_more; } +/* Implement USE_EAGER_DELAY_FILLER. */ + +static bool +mips_use_eager_delay_filler_p () +{ + return TARGET_CB_NEVER; +} + /* Update round-robin counters for ALU1/2 and FALU1/2. */ static void @@ -19982,6 +19990,9 @@ #undef TARGET_IN_SMALL_DATA_P #define TARGET_IN_SMALL_DATA_P mips_in_small_data_p +#undef TARGET_USE_EAGER_DELAY_FILLER_P +#define TARGET_USE_EAGER_DELAY_FILLER_P mips_use_eager_delay_filler_p + #undef TARGET_MACHINE_DEPENDENT_REORG #define TARGET_MACHINE_DEPENDENT_REORG mips_reorg Index: gcc/doc/tm.texi =================================================================== --- gcc/doc/tm.texi (revision 227676) +++ gcc/doc/tm.texi (working copy) @@ -10949,6 +10949,15 @@ definition is null. @end deftypefn +@deftypefn {Target Hook} bool TARGET_USE_EAGER_DELAY_FILLER_P (void) +This predicate controls the use of the eager delay slot filler. Targets +such as certain MIPS architectures possess both branches with and without +delay slots. As the eager delay slot filler can increase code size, +disabling it is beneficial when ordinary branches are available. Use of +delay slot branches filled using the basic filler is often still desirable +as the delay slot can hide a pipeline bubble. +@end deftypefn + @deftypefn {Target Hook} void TARGET_INIT_BUILTINS (void) Define this hook if you have any machine-specific built-in functions that need to be defined. It should be a function that performs the Index: gcc/doc/tm.texi.in =================================================================== --- gcc/doc/tm.texi.in (revision 227676) +++ gcc/doc/tm.texi.in (working copy) @@ -7985,6 +7985,8 @@ @hook TARGET_MACHINE_DEPENDENT_REORG +@hook TARGET_USE_EAGER_DELAY_FILLER_P + @hook TARGET_INIT_BUILTINS @hook TARGET_BUILTIN_DECL Index: gcc/reorg.c =================================================================== --- gcc/reorg.c (revision 227676) +++ gcc/reorg.c (working copy) @@ -3793,7 +3793,8 @@ { fill_simple_delay_slots (1); fill_simple_delay_slots (0); - fill_eager_delay_slots (); + if (targetm.use_eager_delay_filler_p ()) + fill_eager_delay_slots (); relax_delay_slots (first); } Index: gcc/target.def =================================================================== --- gcc/target.def (revision 227676) +++ gcc/target.def (working copy) @@ -3618,6 +3618,17 @@ definition is null.", void, (void), NULL) +/* Control of eager delay slot filling in delayed-branch scheduling. */ +DEFHOOK +(use_eager_delay_filler_p, + "This predicate controls the use of the eager delay slot filler. Targets\n\ +such as certain MIPS architectures possess both branches with and without\n\ +delay slots. As the eager delay slot filler can increase code size,\n\ +disabling it is beneficial when ordinary branches are available. Use of\n\ +delay slot branches filled using the basic filler is often still desirable\n\ +as the delay slot can hide a pipeline bubble.", bool, (void), + hook_bool_void_true) + /* Create the __builtin_va_list type. */ DEFHOOK (build_builtin_va_list, Index: gcc/testsuite/gcc.target/mips/ds-schedule-1.c =================================================================== --- gcc/testsuite/gcc.target/mips/ds-schedule-1.c (revision 0) +++ gcc/testsuite/gcc.target/mips/ds-schedule-1.c (working copy) @@ -0,0 +1,29 @@ +/* { dg-options "isa_rev>=6 -mcompact-branches=optimal -mno-abicalls -G4" } */ +/* { dg-final { scan-assembler-not "bne\t" } } */ +/* { dg-final { scan-assembler-not "beq\t" } } */ +/* { dg-final { scan-assembler-times "\\(foo\\)" 1 } } */ + +/* Test that when compact branches are used, that a compact branch is + produced in the case where code expansion would have occurred if a + delay slot branch would have be used. 'foo' should only be + referenced once in the program text. */ + +struct list +{ + struct list *next; + int element; +}; + +struct list *gr; + +int foo; + +extern void t (int, int, int*); + +void +f (struct list **ptr) +{ + if (gr) + *ptr = gr->next; + t (1, foo, &gr->element); +} Index: gcc/testsuite/gcc.target/mips/ds-schedule-2.c =================================================================== --- gcc/testsuite/gcc.target/mips/ds-schedule-2.c (revision 0) +++ gcc/testsuite/gcc.target/mips/ds-schedule-2.c (working copy) @@ -0,0 +1,28 @@ +/* { dg-options "-mcompact-branches=never -mno-abicalls -G4" } */ +/* { dg-skip-if "code quality test" { *-*-* } { "-O0" "-O1" "-Os" } { "" } } */ +/* { dg-final { scan-assembler "beq.*\n\tlw" } } */ +/* { dg-final { scan-assembler-times "\\(foo\\)" 2 } } */ + +/* Test that when compact branches are explicitly disabled, that a non-compact + branch is produced. 'foo' should be referenced twice in the program text as the + eager delay slot filler will duplicate the load of foo. */ + +struct list +{ + struct list *next; + int element; +}; + +struct list *gr; + +int foo; + +extern void t (int, int, int*); + +void +f (struct list **ptr) +{ + if (gr) + *ptr = gr->next; + t (1, foo, &gr->element); +}