diff mbox

[RFC] sched: Do not move expensive insns speculatively (PR68664)

Message ID 9a0c95d7ab2d7c5532b51b898acae2ef8f756aa9.1485477070.git.segher@kernel.crashing.org
State New
Headers show

Commit Message

Segher Boessenkool Jan. 27, 2017, 12:38 a.m. UTC
Scheduling should never move very expensive instructions to places they
are executed more frequently.  This patch fixes that, reducing the
execution time of c-ray by over 40% (I tested on a BE Power7 system).

Is there some existing way to test for "very expensive insn" or "very
expensive insn we should not speculate"?  Should there be a new hook?
Is only disallowing (FP) SQRT and DIV a good solution?


Segher


---
 gcc/sched-rgn.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

Comments

Andrew Pinski Jan. 27, 2017, 1 a.m. UTC | #1
On Thu, Jan 26, 2017 at 4:38 PM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> Scheduling should never move very expensive instructions to places they
> are executed more frequently.  This patch fixes that, reducing the
> execution time of c-ray by over 40% (I tested on a BE Power7 system).
>
> Is there some existing way to test for "very expensive insn" or "very
> expensive insn we should not speculate"?  Should there be a new hook?
> Is only disallowing (FP) SQRT and DIV a good solution?

Seems like it should be checking the insn cost and compare that
against some parameter.  That is possibly set by the target if needed.

Thanks,
Andrew

>
>
> Segher
>
>
> ---
>  gcc/sched-rgn.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/sched-rgn.c b/gcc/sched-rgn.c
> index 2af3a03..6ccd973 100644
> --- a/gcc/sched-rgn.c
> +++ b/gcc/sched-rgn.c
> @@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "sel-sched.h"
>  #include "tree-pass.h"
>  #include "dbgcnt.h"
> +#include "rtl-iter.h"
>  #include "pretty-print.h"
>  #include "print-rtl.h"
>
> @@ -2147,12 +2148,20 @@ static int
>  can_schedule_ready_p (rtx_insn *insn)
>  {
>    /* An interblock motion?  */
> -  if (INSN_BB (insn) != target_bb
> -      && IS_SPECULATIVE_INSN (insn)
> -      && !check_live (insn, INSN_BB (insn)))
> -    return 0;
> -  else
> -    return 1;
> +  if (INSN_BB (insn) != target_bb && IS_SPECULATIVE_INSN (insn))
> +    {
> +      /* Cannot schedule this insn unless all operands are live.  */
> +      if (!check_live (insn, INSN_BB (insn)))
> +       return 0;
> +
> +      /* Should not move expensive instructions speculatively.  */
> +      subrtx_iterator::array_type array;
> +      FOR_EACH_SUBRTX (iter, array, insn, NONCONST)
> +        if (*iter && GET_CODE (*iter) == SQRT)
> +         return 0;
> +    }
> +
> +  return 1;
>  }
>
>  /* Updates counter and other information.  Split from can_schedule_ready_p ()
> --
> 1.9.3
>
Segher Boessenkool Jan. 27, 2017, 1:19 a.m. UTC | #2
On Thu, Jan 26, 2017 at 05:00:44PM -0800, Andrew Pinski wrote:
> On Thu, Jan 26, 2017 at 4:38 PM, Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> > Scheduling should never move very expensive instructions to places they
> > are executed more frequently.  This patch fixes that, reducing the
> > execution time of c-ray by over 40% (I tested on a BE Power7 system).
> >
> > Is there some existing way to test for "very expensive insn" or "very
> > expensive insn we should not speculate"?  Should there be a new hook?
> > Is only disallowing (FP) SQRT and DIV a good solution?
> 
> Seems like it should be checking the insn cost and compare that
> against some parameter.  That is possibly set by the target if needed.

But what is "insn cost"?  Latency is no good at all -- we *want* insns
with higher latency to be earlier.  fsqrt is not pipelined, and that is
what makes it so costly.  (This isn't modeled in the scheduling
description btw: that would make the automata sizes explode, the usual
problem).


Segher
Andrew Pinski Jan. 27, 2017, 1:27 a.m. UTC | #3
On Thu, Jan 26, 2017 at 5:19 PM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> On Thu, Jan 26, 2017 at 05:00:44PM -0800, Andrew Pinski wrote:
>> On Thu, Jan 26, 2017 at 4:38 PM, Segher Boessenkool
>> <segher@kernel.crashing.org> wrote:
>> > Scheduling should never move very expensive instructions to places they
>> > are executed more frequently.  This patch fixes that, reducing the
>> > execution time of c-ray by over 40% (I tested on a BE Power7 system).
>> >
>> > Is there some existing way to test for "very expensive insn" or "very
>> > expensive insn we should not speculate"?  Should there be a new hook?
>> > Is only disallowing (FP) SQRT and DIV a good solution?
>>
>> Seems like it should be checking the insn cost and compare that
>> against some parameter.  That is possibly set by the target if needed.
>
> But what is "insn cost"?  Latency is no good at all -- we *want* insns
> with higher latency to be earlier.  fsqrt is not pipelined, and that is
> what makes it so costly.  (This isn't modeled in the scheduling
> description btw: that would make the automata sizes explode, the usual
> problem).

I was just talking about RTX cost of the insn.  It seems like we don't
want to move any insn cost which is high.  Even if it is pipelined, it
does not make sense to be part of the hot path.
Even on in-order targets, we don't want to move instructions which are
pipelined either.

Say the default is 8 as declared as expensive.

Thanks,
Andrew

>
>
> Segher
Ramana Radhakrishnan Jan. 27, 2017, 11:32 a.m. UTC | #4
On Fri, Jan 27, 2017 at 1:27 AM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Thu, Jan 26, 2017 at 5:19 PM, Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
>> On Thu, Jan 26, 2017 at 05:00:44PM -0800, Andrew Pinski wrote:
>>> On Thu, Jan 26, 2017 at 4:38 PM, Segher Boessenkool
>>> <segher@kernel.crashing.org> wrote:
>>> > Scheduling should never move very expensive instructions to places they
>>> > are executed more frequently.  This patch fixes that, reducing the
>>> > execution time of c-ray by over 40% (I tested on a BE Power7 system).
>>> >
>>> > Is there some existing way to test for "very expensive insn" or "very
>>> > expensive insn we should not speculate"?  Should there be a new hook?
>>> > Is only disallowing (FP) SQRT and DIV a good solution?
>>>
>>> Seems like it should be checking the insn cost and compare that
>>> against some parameter.  That is possibly set by the target if needed.
>>
>> But what is "insn cost"?  Latency is no good at all -- we *want* insns
>> with higher latency to be earlier.  fsqrt is not pipelined, and that is
>> what makes it so costly.  (This isn't modeled in the scheduling
>> description btw: that would make the automata sizes explode, the usual
>> problem).
>
> I was just talking about RTX cost of the insn.  It seems like we don't
> want to move any insn cost which is high.  Even if it is pipelined, it
> does not make sense to be part of the hot path.
> Even on in-order targets, we don't want to move instructions which are
> pipelined either.
>
> Say the default is 8 as declared as expensive.

Either that or you add another hook to the backend to get these insn
codes which is also fragile ?

If you do proceed with 8 as your magic number then let backends
override it.  Magic numbers in target independent code irrespective of
whether they are 8 or 3 annoy me :)

regards
Ramana

>
> Thanks,
> Andrew
>
>>
>>
>> Segher
Segher Boessenkool Jan. 27, 2017, 11:47 a.m. UTC | #5
On Fri, Jan 27, 2017 at 11:32:32AM +0000, Ramana Radhakrishnan wrote:
> >>> Seems like it should be checking the insn cost and compare that
> >>> against some parameter.  That is possibly set by the target if needed.
> >>
> >> But what is "insn cost"?  Latency is no good at all -- we *want* insns
> >> with higher latency to be earlier.  fsqrt is not pipelined, and that is
> >> what makes it so costly.  (This isn't modeled in the scheduling
> >> description btw: that would make the automata sizes explode, the usual
> >> problem).
> >
> > I was just talking about RTX cost of the insn.  It seems like we don't
> > want to move any insn cost which is high.  Even if it is pipelined, it
> > does not make sense to be part of the hot path.
> > Even on in-order targets, we don't want to move instructions which are
> > pipelined either.
> >
> > Say the default is 8 as declared as expensive.
> 
> Either that or you add another hook to the backend to get these insn
> codes which is also fragile ?

How so?  The hook would get the full insn pattern of course.

> If you do proceed with 8 as your magic number then let backends
> override it.  Magic numbers in target independent code irrespective of
> whether they are 8 or 3 annoy me :)

If we abuse insn_rtx_cost for this then every target will need to tune
the cutoff, and there may not even be a good cutoff value at all.  It
also in the PowerPC case means we need to adjust our costs (sqrt does
not have any defined cost right now, we never needed it).

IMO it's much cleaner to have a nice hook that says exactly what it
does in its name, say, targetm.sched.do_not_speculate_insn or such.


Segher
Richard Biener Jan. 27, 2017, 12:15 p.m. UTC | #6
On Fri, Jan 27, 2017 at 1:38 AM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> Scheduling should never move very expensive instructions to places they
> are executed more frequently.  This patch fixes that, reducing the
> execution time of c-ray by over 40% (I tested on a BE Power7 system).
>
> Is there some existing way to test for "very expensive insn" or "very
> expensive insn we should not speculate"?  Should there be a new hook?
> Is only disallowing (FP) SQRT and DIV a good solution?

As both SQRT and DIV may trap I wonder how we can end up speculating
them at all?
Ok, maybe with -fno-trapping-math we don't consider that case but even
then generating
a NaN is usually dreadfully slow so avoiding speculation of such insns
looks good in
any case (w/o considering its cost).

Richard.

>
> Segher
>
>
> ---
>  gcc/sched-rgn.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/sched-rgn.c b/gcc/sched-rgn.c
> index 2af3a03..6ccd973 100644
> --- a/gcc/sched-rgn.c
> +++ b/gcc/sched-rgn.c
> @@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "sel-sched.h"
>  #include "tree-pass.h"
>  #include "dbgcnt.h"
> +#include "rtl-iter.h"
>  #include "pretty-print.h"
>  #include "print-rtl.h"
>
> @@ -2147,12 +2148,20 @@ static int
>  can_schedule_ready_p (rtx_insn *insn)
>  {
>    /* An interblock motion?  */
> -  if (INSN_BB (insn) != target_bb
> -      && IS_SPECULATIVE_INSN (insn)
> -      && !check_live (insn, INSN_BB (insn)))
> -    return 0;
> -  else
> -    return 1;
> +  if (INSN_BB (insn) != target_bb && IS_SPECULATIVE_INSN (insn))
> +    {
> +      /* Cannot schedule this insn unless all operands are live.  */
> +      if (!check_live (insn, INSN_BB (insn)))
> +       return 0;
> +
> +      /* Should not move expensive instructions speculatively.  */
> +      subrtx_iterator::array_type array;
> +      FOR_EACH_SUBRTX (iter, array, insn, NONCONST)
> +        if (*iter && GET_CODE (*iter) == SQRT)
> +         return 0;
> +    }
> +
> +  return 1;
>  }
>
>  /* Updates counter and other information.  Split from can_schedule_ready_p ()
> --
> 1.9.3
>
Segher Boessenkool Jan. 27, 2017, 12:38 p.m. UTC | #7
On Fri, Jan 27, 2017 at 01:15:27PM +0100, Richard Biener wrote:
> As both SQRT and DIV may trap I wonder how we can end up speculating
> them at all?

The testcase uses -ffast-math.

> Ok, maybe with -fno-trapping-math we don't consider that case but even
> then generating
> a NaN is usually dreadfully slow so avoiding speculation of such insns
> looks good in
> any case (w/o considering its cost).

And -ffast-math includes -ffinite-math-only.  No, the testcase never
takes the square root of number smaller than zero, it isn't *that* slow ;-)

Things slow down so much because there is a loop immediately followed
by a square root insn, and sched-rgn decides it is a good idea to move
it to inside the loop.  Which is a bad idea no matter what the frequency
of the loop is because 1) we do not get such profiles very correct, and
2) sqrt is really expensive.


Segher
Bernd Schmidt Jan. 27, 2017, 12:43 p.m. UTC | #8
On 01/27/2017 02:19 AM, Segher Boessenkool wrote:

> But what is "insn cost"?  Latency is no good at all -- we *want* insns
> with higher latency to be earlier.  fsqrt is not pipelined, and that is
> what makes it so costly.  (This isn't modeled in the scheduling
> description btw: that would make the automata sizes explode, the usual
> problem).

On other machines sqrt might be pipelined, so the patch as-is clearly 
isn't suitable. rtx_cost/insn_cost probably also won't do, since it 
doesn't model that property either.

Maybe instead of a hook you could have an insn attribute that the 
scheduler looks for.


Bernd
Richard Biener Jan. 27, 2017, 1:30 p.m. UTC | #9
On Fri, Jan 27, 2017 at 1:38 PM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> On Fri, Jan 27, 2017 at 01:15:27PM +0100, Richard Biener wrote:
>> As both SQRT and DIV may trap I wonder how we can end up speculating
>> them at all?
>
> The testcase uses -ffast-math.
>
>> Ok, maybe with -fno-trapping-math we don't consider that case but even
>> then generating
>> a NaN is usually dreadfully slow so avoiding speculation of such insns
>> looks good in
>> any case (w/o considering its cost).
>
> And -ffast-math includes -ffinite-math-only.  No, the testcase never
> takes the square root of number smaller than zero, it isn't *that* slow ;-)

Well, the testcase as written doesn't but if you speculate the sqrt it might?

> Things slow down so much because there is a loop immediately followed
> by a square root insn, and sched-rgn decides it is a good idea to move
> it to inside the loop.  Which is a bad idea no matter what the frequency
> of the loop is because 1) we do not get such profiles very correct, and
> 2) sqrt is really expensive.

I understood that but then moving sth inside a loop is almost never a win.

Can't "not modeled" insns not be marked somehow in the pipeline description?

Richard.

>
>
> Segher
Segher Boessenkool Jan. 27, 2017, 2:04 p.m. UTC | #10
On Fri, Jan 27, 2017 at 01:43:30PM +0100, Bernd Schmidt wrote:
> On 01/27/2017 02:19 AM, Segher Boessenkool wrote:
> 
> >But what is "insn cost"?  Latency is no good at all -- we *want* insns
> >with higher latency to be earlier.  fsqrt is not pipelined, and that is
> >what makes it so costly.  (This isn't modeled in the scheduling
> >description btw: that would make the automata sizes explode, the usual
> >problem).
> 
> On other machines sqrt might be pipelined, so the patch as-is clearly 
> isn't suitable. rtx_cost/insn_cost probably also won't do, since it 
> doesn't model that property either.

insn_cost is 0, even (because we have -fsched-fusion), so overloading it
would cause problems already before we start :-/

> Maybe instead of a hook you could have an insn attribute that the 
> scheduler looks for.

Eww.  Do we have any other attributes like that?  I sure hope not.  This
isn't a property of a define_insn (etc.), it is one of the machine insns
instead, so marking it with an attr makes it really hard to check if you
caught all patterns that end up as such a machine insn.

A target can decide to use an attr of course, in its own implementation
of the hook.  But generic it is not such a great interface.


Segher
Segher Boessenkool Jan. 27, 2017, 2:19 p.m. UTC | #11
On Fri, Jan 27, 2017 at 02:30:49PM +0100, Richard Biener wrote:
> >> Ok, maybe with -fno-trapping-math we don't consider that case but even
> >> then generating
> >> a NaN is usually dreadfully slow so avoiding speculation of such insns
> >> looks good in
> >> any case (w/o considering its cost).
> >
> > And -ffast-math includes -ffinite-math-only.  No, the testcase never
> > takes the square root of number smaller than zero, it isn't *that* slow ;-)
> 
> Well, the testcase as written doesn't but if you speculate the sqrt it might?

Yeah true.  Except we have -ffast-math so we told the compiler that is
just fine to do.

> > Things slow down so much because there is a loop immediately followed
> > by a square root insn, and sched-rgn decides it is a good idea to move
> > it to inside the loop.  Which is a bad idea no matter what the frequency
> > of the loop is because 1) we do not get such profiles very correct, and
> > 2) sqrt is really expensive.
> 
> I understood that but then moving sth inside a loop is almost never a win.

It defaults to moving something if it has space for it in the schedule
and it is executed at least 40% of the time (I think).

> Can't "not modeled" insns not be marked somehow in the pipeline description?

Well, the only thing from the pipeline description that is used here is
the insn latency, which isn't all that much higher than "normal" FP insns.
And simply "not decribed properly" won't do much good -- if we could
(without blowing up the automata) we would, and sched-rgn would then
still speculate this.


Segher
Jeff Law Jan. 27, 2017, 6:04 p.m. UTC | #12
On 01/27/2017 07:19 AM, Segher Boessenkool wrote:
> On Fri, Jan 27, 2017 at 02:30:49PM +0100, Richard Biener wrote:
>>>> Ok, maybe with -fno-trapping-math we don't consider that case but even
>>>> then generating
>>>> a NaN is usually dreadfully slow so avoiding speculation of such insns
>>>> looks good in
>>>> any case (w/o considering its cost).
>>>
>>> And -ffast-math includes -ffinite-math-only.  No, the testcase never
>>> takes the square root of number smaller than zero, it isn't *that* slow ;-)
>>
>> Well, the testcase as written doesn't but if you speculate the sqrt it might?
>
> Yeah true.  Except we have -ffast-math so we told the compiler that is
> just fine to do.
>
>>> Things slow down so much because there is a loop immediately followed
>>> by a square root insn, and sched-rgn decides it is a good idea to move
>>> it to inside the loop.  Which is a bad idea no matter what the frequency
>>> of the loop is because 1) we do not get such profiles very correct, and
>>> 2) sqrt is really expensive.
>>
>> I understood that but then moving sth inside a loop is almost never a win.
>
> It defaults to moving something if it has space for it in the schedule
> and it is executed at least 40% of the time (I think).
>
>> Can't "not modeled" insns not be marked somehow in the pipeline description?
>
> Well, the only thing from the pipeline description that is used here is
> the insn latency, which isn't all that much higher than "normal" FP insns.
> And simply "not decribed properly" won't do much good -- if we could
> (without blowing up the automata) we would, and sched-rgn would then
> still speculate this.
And I think this is the core of the issue.  We have multiple ports that 
don't necessarily fully describe the latency, issue rates, etc of 
certain insns like div/sqrt/rsqrt.  There are good reasons for doing that.

Because of the partial description, the scheduler may think those insns 
fit into a pipeline bubble within the loop, when reality they do not.

The scheduler currently has no way of knowing what insns have this 
property.   While there are cases where we'd like to speculate a div or 
sqrt to give it more time to complete without stalls -- there's no good 
way to do that without fully describing them to the scheduler.

My preference would be somehow either mark those insns as not fully 
modeled and avoid speculating on them.  Or invent a target hook to allow 
the scheduler to query the backend.

Note that these could be used elsewhere -- for example delay slot 
scheduling and predication.  Delay slot scheduling does speculation and 
there's ports that simply refuse to allow certain instructions (div/sqrt 
on one port, I think all FP stuff on another) to avoid these kinds of 
problems.

Similarly nullification/predication often work by wiping out the final 
posting of results into the register file.  So imagine a non-pipelined 
div/sqrt.  Predicating a div/sqrt instruction will actually keep the 
pipeline busy computing results that will be thrown away and preventing 
other useful work from occurring.  And, yes, this really does happen. 
THe PA suffered from these problems.

jeff
Jeff Law Jan. 27, 2017, 6:08 p.m. UTC | #13
On 01/27/2017 04:32 AM, Ramana Radhakrishnan wrote:
> On Fri, Jan 27, 2017 at 1:27 AM, Andrew Pinski <pinskia@gmail.com> wrote:
>> On Thu, Jan 26, 2017 at 5:19 PM, Segher Boessenkool
>> <segher@kernel.crashing.org> wrote:
>>> On Thu, Jan 26, 2017 at 05:00:44PM -0800, Andrew Pinski wrote:
>>>> On Thu, Jan 26, 2017 at 4:38 PM, Segher Boessenkool
>>>> <segher@kernel.crashing.org> wrote:
>>>>> Scheduling should never move very expensive instructions to places they
>>>>> are executed more frequently.  This patch fixes that, reducing the
>>>>> execution time of c-ray by over 40% (I tested on a BE Power7 system).
>>>>>
>>>>> Is there some existing way to test for "very expensive insn" or "very
>>>>> expensive insn we should not speculate"?  Should there be a new hook?
>>>>> Is only disallowing (FP) SQRT and DIV a good solution?
>>>>
>>>> Seems like it should be checking the insn cost and compare that
>>>> against some parameter.  That is possibly set by the target if needed.
>>>
>>> But what is "insn cost"?  Latency is no good at all -- we *want* insns
>>> with higher latency to be earlier.  fsqrt is not pipelined, and that is
>>> what makes it so costly.  (This isn't modeled in the scheduling
>>> description btw: that would make the automata sizes explode, the usual
>>> problem).
>>
>> I was just talking about RTX cost of the insn.  It seems like we don't
>> want to move any insn cost which is high.  Even if it is pipelined, it
>> does not make sense to be part of the hot path.
>> Even on in-order targets, we don't want to move instructions which are
>> pipelined either.
>>
>> Say the default is 8 as declared as expensive.
>
> Either that or you add another hook to the backend to get these insn
> codes which is also fragile ?
>
> If you do proceed with 8 as your magic number then let backends
> override it.  Magic numbers in target independent code irrespective of
> whether they are 8 or 3 annoy me :)
Magic numbers are definitely not the way to go here :-)

Consider a port where a particular commonly used insn blocks anything 
else from executing in the CPU or FPU for several cycles.  Given there's 
literally no way to avoid the lock, we don't bother modeling that insn 
for the purposes of scheduling.  We just claim single cycle latency and 
issue rate.

Jeff
Jeff Law Jan. 27, 2017, 6:11 p.m. UTC | #14
On 01/27/2017 05:43 AM, Bernd Schmidt wrote:
> On 01/27/2017 02:19 AM, Segher Boessenkool wrote:
>
>> But what is "insn cost"?  Latency is no good at all -- we *want* insns
>> with higher latency to be earlier.  fsqrt is not pipelined, and that is
>> what makes it so costly.  (This isn't modeled in the scheduling
>> description btw: that would make the automata sizes explode, the usual
>> problem).
>
> On other machines sqrt might be pipelined, so the patch as-is clearly
> isn't suitable. rtx_cost/insn_cost probably also won't do, since it
> doesn't model that property either.
>
> Maybe instead of a hook you could have an insn attribute that the
> scheduler looks for.
Even if it's pipelined, it may not be a good idea to speculate it :-) 
We may not describe the extent to which its pipelined due to DFA 
explosions.  Thus we may (incorrectly) decide that the insn fits into a 
pipeline bubble, when in fact it does not, even when pipelined.

It really seems this needs to either be a hook or attribute.

jeff
Segher Boessenkool Jan. 27, 2017, 10:01 p.m. UTC | #15
On Fri, Jan 27, 2017 at 11:04:42AM -0700, Jeff Law wrote:
> >Well, the only thing from the pipeline description that is used here is
> >the insn latency, which isn't all that much higher than "normal" FP insns.
> >And simply "not decribed properly" won't do much good -- if we could
> >(without blowing up the automata) we would, and sched-rgn would then
> >still speculate this.
> And I think this is the core of the issue.  We have multiple ports that 
> don't necessarily fully describe the latency, issue rates, etc of 
> certain insns like div/sqrt/rsqrt.  There are good reasons for doing that.
> 
> Because of the partial description, the scheduler may think those insns 
> fit into a pipeline bubble within the loop, when reality they do not.
> 
> The scheduler currently has no way of knowing what insns have this 
> property.   While there are cases where we'd like to speculate a div or 
> sqrt to give it more time to complete without stalls -- there's no good 
> way to do that without fully describing them to the scheduler.
> 
> My preference would be somehow either mark those insns as not fully 
> modeled and avoid speculating on them.  Or invent a target hook to allow 
> the scheduler to query the backend.

This is my preference -- have it in one location, not spread out over
many instruction patterns, or many scheduling descriptions.

> Note that these could be used elsewhere -- for example delay slot 
> scheduling and predication.  Delay slot scheduling does speculation and 
> there's ports that simply refuse to allow certain instructions (div/sqrt 
> on one port, I think all FP stuff on another) to avoid these kinds of 
> problems.

Are you saying there already is a hook we could use, maybe after
adjusting it a bit?  That would be ideal.

> Similarly nullification/predication often work by wiping out the final 
> posting of results into the register file.  So imagine a non-pipelined 
> div/sqrt.  Predicating a div/sqrt instruction will actually keep the 
> pipeline busy computing results that will be thrown away and preventing 
> other useful work from occurring.  And, yes, this really does happen. 
> THe PA suffered from these problems.


Segher
Jeff Law Jan. 27, 2017, 10:17 p.m. UTC | #16
On 01/27/2017 03:01 PM, Segher Boessenkool wrote:
> On Fri, Jan 27, 2017 at 11:04:42AM -0700, Jeff Law wrote:
>>> Well, the only thing from the pipeline description that is used here is
>>> the insn latency, which isn't all that much higher than "normal" FP insns.
>>> And simply "not decribed properly" won't do much good -- if we could
>>> (without blowing up the automata) we would, and sched-rgn would then
>>> still speculate this.
>> And I think this is the core of the issue.  We have multiple ports that
>> don't necessarily fully describe the latency, issue rates, etc of
>> certain insns like div/sqrt/rsqrt.  There are good reasons for doing that.
>>
>> Because of the partial description, the scheduler may think those insns
>> fit into a pipeline bubble within the loop, when reality they do not.
>>
>> The scheduler currently has no way of knowing what insns have this
>> property.   While there are cases where we'd like to speculate a div or
>> sqrt to give it more time to complete without stalls -- there's no good
>> way to do that without fully describing them to the scheduler.
>>
>> My preference would be somehow either mark those insns as not fully
>> modeled and avoid speculating on them.  Or invent a target hook to allow
>> the scheduler to query the backend.
>
> This is my preference -- have it in one location, not spread out over
> many instruction patterns, or many scheduling descriptions.
No strong opinions on the two approaches.  I could easily see defining a 
hook and ports implementing the hook via attributes if that were easier 
for them.  Other ports might implement the hook by scanning the insn for 
key RTL codes.

>
>> Note that these could be used elsewhere -- for example delay slot
>> scheduling and predication.  Delay slot scheduling does speculation and
>> there's ports that simply refuse to allow certain instructions (div/sqrt
>> on one port, I think all FP stuff on another) to avoid these kinds of
>> problems.
>
> Are you saying there already is a hook we could use, maybe after
> adjusting it a bit?  That would be ideal.
No -- ports handle this inside there eligible_for_delay_XXX support. 
It's not something that could be reasonably re-used within the scheduler.

Jeff
Segher Boessenkool Jan. 27, 2017, 10:29 p.m. UTC | #17
On Fri, Jan 27, 2017 at 03:17:42PM -0700, Jeff Law wrote:
> >>My preference would be somehow either mark those insns as not fully
> >>modeled and avoid speculating on them.  Or invent a target hook to allow
> >>the scheduler to query the backend.
> >
> >This is my preference -- have it in one location, not spread out over
> >many instruction patterns, or many scheduling descriptions.
> No strong opinions on the two approaches.  I could easily see defining a 
> hook and ports implementing the hook via attributes if that were easier 
> for them.  Other ports might implement the hook by scanning the insn for 
> key RTL codes.

Okay, I'll come up with something.

> >>Note that these could be used elsewhere -- for example delay slot
> >>scheduling and predication.  Delay slot scheduling does speculation and
> >>there's ports that simply refuse to allow certain instructions (div/sqrt
> >>on one port, I think all FP stuff on another) to avoid these kinds of
> >>problems.
> >
> >Are you saying there already is a hook we could use, maybe after
> >adjusting it a bit?  That would be ideal.
> No -- ports handle this inside there eligible_for_delay_XXX support. 
> It's not something that could be reasonably re-used within the scheduler.

Too bad.  Oh well.


Segher
diff mbox

Patch

diff --git a/gcc/sched-rgn.c b/gcc/sched-rgn.c
index 2af3a03..6ccd973 100644
--- a/gcc/sched-rgn.c
+++ b/gcc/sched-rgn.c
@@ -64,6 +64,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "sel-sched.h"
 #include "tree-pass.h"
 #include "dbgcnt.h"
+#include "rtl-iter.h"
 #include "pretty-print.h"
 #include "print-rtl.h"
 
@@ -2147,12 +2148,20 @@  static int
 can_schedule_ready_p (rtx_insn *insn)
 {
   /* An interblock motion?  */
-  if (INSN_BB (insn) != target_bb
-      && IS_SPECULATIVE_INSN (insn)
-      && !check_live (insn, INSN_BB (insn)))
-    return 0;
-  else
-    return 1;
+  if (INSN_BB (insn) != target_bb && IS_SPECULATIVE_INSN (insn))
+    {
+      /* Cannot schedule this insn unless all operands are live.  */
+      if (!check_live (insn, INSN_BB (insn)))
+	return 0;
+
+      /* Should not move expensive instructions speculatively.  */
+      subrtx_iterator::array_type array;
+      FOR_EACH_SUBRTX (iter, array, insn, NONCONST)
+        if (*iter && GET_CODE (*iter) == SQRT)
+	  return 0;
+    }
+
+  return 1;
 }
 
 /* Updates counter and other information.  Split from can_schedule_ready_p ()