fix scheduling antideps

Message ID 0E3A13C8-561C-4587-AEAE-A067F9DD356D@comcast.net
State New

Commit Message

Mike Stump Dec. 11, 2015, 8:11 a.m. UTC
This patch allows a target to increase the cost of anti-deps to better reflect the actual cost on the machine.

This gets me 5% more performance on an important inner loop by exposing the actual cost of long dep chains that have lots of anti-deps in them.  By scheduling the longer chain first, we have more opportunities to fill in the holes with content from the other, less critical chains.

I’m unsure if all machines should have a cost of 1, or just some machines.  I suspect that OOO machines can hide the dep chains well enough that the value 0 is more appropriate for them.

Ok?
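
To illustrate the priority argument above, here is a minimal standalone sketch in C. It is a toy model rather than GCC code: it assumes the priority of the first instruction in a linear chain is the latency of the last instruction plus the sum of the dependence costs along the chain. The name chain_priority and its parameters are hypothetical.

```c
#include <assert.h>

/* Toy model, not GCC code: priority of the head of a linear chain of
   N_INSNS instructions.  Each link between neighbours costs LINK_COST
   cycles and the final instruction has latency LEAF_LATENCY.  */
int
chain_priority (int n_insns, int link_cost, int leaf_latency)
{
  assert (n_insns >= 1);

  int priority = leaf_latency;

  /* Walk back from the leaf, adding one dependence cost per link.  */
  for (int i = 1; i < n_insns; i++)
    priority += link_cost;

  return priority;
}
```

In this model, a chain of six instructions linked only by anti-deps has priority 1 when the anti-dep cost is 0, so a chain of three instructions linked by true deps of cost 1 (priority 3) looks more critical. With an anti-dep cost of 1, the six-instruction chain gets priority 6 and would be scheduled first.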

Comments

Eric Botcazou Dec. 11, 2015, 9:22 a.m. UTC | #1
> This patch allows a target to increase the cost of anti-deps to better
> reflect the actual cost on the machine.

But it can already do it via the TARGET_SCHED_ADJUST_COST hook, can't it?
Jeff Law Dec. 11, 2015, 2:09 p.m. UTC | #2
On 12/11/2015 02:22 AM, Eric Botcazou wrote:
>> This patch allows a target to increase the cost of anti-deps to better
>> reflect the actual cost on the machine.
>
> But it can already do it via the TARGET_SCHED_ADJUST_COST hook, can't it?
And can't this be done with define_bypass as well?

Jeff
Mike Stump Dec. 15, 2015, 6:47 p.m. UTC | #3
On Dec 11, 2015, at 1:22 AM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> This patch allows a target to increase the cost of anti-deps to better
>> reflect the actual cost on the machine.
> 
> But it can already do it via the TARGET_SCHED_ADJUST_COST hook, can't it?

The undocumented TARGET_SCHED_ADJUST_COST_2 seems a better fit.  Yes, that works, I can use it.  I’m assuming the lack of documentation is a simple error.
Mike Stump Dec. 15, 2015, 6:48 p.m. UTC | #4
On Dec 11, 2015, at 6:09 AM, Jeff Law <law@redhat.com> wrote:
> On 12/11/2015 02:22 AM, Eric Botcazou wrote:
>>> This patch allows a target to increase the cost of anti-deps to better
>>> reflect the actual cost on the machine.
>> 
>> But it can already do it via the TARGET_SCHED_ADJUST_COST hook, can't it?
> And can't this be done with define_bypass as well?

          if (dep_type == REG_DEP_ANTI)
            cost = 0;
          else if (dep_type == REG_DEP_OUTPUT)
            {
              cost = (insn_default_latency (insn)
                      - insn_default_latency (used));
              if (cost <= 0)
                cost = 1;
            }
          else if (bypass_p (insn))
            cost = insn_latency (insn, used);

If the first case (REG_DEP_ANTI) is true, I don’t see how one gets into the third (bypass_p) without code mods.  I opted for the adjust_cost_2 hook.
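
The control flow quoted above can be modelled in a few lines of standalone C. This is only a sketch under stated assumptions: toy_dep_type, toy_dep_cost, TOY_ANTI_DEP_COST and the parameter names are hypothetical stand-ins, and the latencies are passed in as plain integers instead of being looked up from the pipeline description.

```c
#include <assert.h>

/* Hypothetical stand-ins for the dependence kinds; not GCC's own enum.  */
enum toy_dep_type { TOY_DEP_TRUE, TOY_DEP_ANTI, TOY_DEP_OUTPUT };

/* Same default-if-undefined pattern that the patch adds to defaults.h.  */
#ifndef TOY_ANTI_DEP_COST
#define TOY_ANTI_DEP_COST 0
#endif

/* Toy model of the branch order: the anti-dep case is tested first, so
   the bypass case is never reached for an anti-dependency.  */
int
toy_dep_cost (enum toy_dep_type dep_type, int producer_latency,
              int consumer_latency, int has_bypass, int bypass_latency)
{
  int cost = producer_latency;

  if (dep_type == TOY_DEP_ANTI)
    cost = TOY_ANTI_DEP_COST;
  else if (dep_type == TOY_DEP_OUTPUT)
    {
      cost = producer_latency - consumer_latency;
      if (cost <= 0)
        cost = 1;
    }
  else if (has_bypass)
    cost = bypass_latency;

  return cost;
}
```

Even when a bypass is present, an anti-dependency in this model takes the first branch and returns TOY_ANTI_DEP_COST, which is why a bypass alone cannot raise its cost.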

Patch

Index: defaults.h
===================================================================
--- defaults.h  (revision 231539)
+++ defaults.h  (working copy)
@@ -1486,6 +1486,10 @@ 
 #define TARGET_VTABLE_USES_DESCRIPTORS 0
 #endif
 
+#ifndef TARGET_ANTI_DEP_COST
+#define TARGET_ANTI_DEP_COST 0
+#endif
+
 #endif /* GCC_INSN_FLAGS_H  */
 
 #endif  /* ! GCC_DEFAULTS_H */
Index: doc/tm.texi
===================================================================
--- doc/tm.texi (revision 231539)
+++ doc/tm.texi (working copy)
@@ -6970,6 +6970,13 @@ 
 the hook implementation for how different fusion types are supported.
 @end deftypefn
 
+@defmac TARGET_ANTI_DEP_COST
+The cost in cycles for an anti-dependency.  Defaults to 0.  On non-OOO
+multi-issue machines that can't issue instructions that have
+overlapping registers in the same cycle, a value of 1 will better
+reflect the actual cost of the instruction sequence.
+@end defmac
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
Index: doc/tm.texi.in
===================================================================
--- doc/tm.texi.in      (revision 231539)
+++ doc/tm.texi.in      (working copy)
@@ -4852,6 +4852,13 @@ 
 
 @hook TARGET_SCHED_FUSION_PRIORITY
 
+@defmac TARGET_ANTI_DEP_COST
+The cost in cycles for an anti-dependency.  Defaults to 0.  On non-OOO
+multi-issue machines that can't issue instructions that have
+overlapping registers in the same cycle, a value of 1 will better
+reflect the actual cost of the instruction sequence.
+@end defmac
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
Index: haifa-sched.c
===================================================================
--- haifa-sched.c       (revision 231539)
+++ haifa-sched.c       (working copy)
@@ -1470,7 +1470,7 @@ 
       if (INSN_CODE (insn) >= 0)
        {
          if (dep_type == REG_DEP_ANTI)
-           cost = 0;
+           cost = TARGET_ANTI_DEP_COST;
          else if (dep_type == REG_DEP_OUTPUT)
            {
              cost = (insn_default_latency (insn)