diff mbox series

[RFC,1/1] OPAL : Support spr save-restore using new opal call

Message ID 20180720110051.40468-2-huntbag@linux.vnet.ibm.com
State RFC
Series Saving and restoring of SPRs in opal

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success master/apply_patch Successfully applied
snowpatch_ozlabs/make_check success Test make_check on branch master

Commit Message

Abhishek Goel July 20, 2018, 11 a.m. UTC
In an attempt to make the powernv idle code backward compatible,
and to some extent forward compatible, add support for pre-stop entry
and post-stop exit actions in OPAL. If a kernel knows about this
OPAL call, then only a firmware update supporting the newer hardware
is required, instead of waiting for kernel updates.
OPAL support for a stop state can be indicated by a compatibility
string in the newly proposed device-tree format (Ref:
https://patchwork.ozlabs.org/patch/923120/). Thus the kernel can
enable a stop state when OPAL support for it exists, even if
kernel support doesn't exist.
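The selection the commit message describes can be sketched as a per-state decision in the kernel. This is a hypothetical sketch only; the flag names, the helper, and the encoding are invented for illustration and are not part of this patch or the proposed device-tree binding:

```c
#include <stdint.h>

/* Hypothetical per-state flags a kernel could derive from the proposed
 * device-tree format; the names and encoding are invented here. */
#define STOP_FLAG_KERNEL_NATIVE		0x1	/* kernel knows this state */
#define STOP_FLAG_OPAL_SUPPORTED	0x2	/* firmware offers save/restore */

enum stop_mode { STOP_NATIVE, STOP_VIA_OPAL, STOP_DISABLED };

/* Prefer the native path; fall back to the OPAL save/restore calls when
 * only the firmware understands the state; otherwise drop the state. */
static enum stop_mode classify_stop_state(uint32_t flags)
{
	if (flags & STOP_FLAG_KERNEL_NATIVE)
		return STOP_NATIVE;
	if (flags & STOP_FLAG_OPAL_SUPPORTED)
		return STOP_VIA_OPAL;
	return STOP_DISABLED;
}
```

The key property is the ordering: an old kernel on new firmware never disables a state it merely doesn't recognise, as long as the firmware advertises support.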

Signed-off-by: Abhishek Goel <huntbag@linux.vnet.ibm.com>
---
 hw/chiptod.c        |   7 ++-
 hw/slw.c            | 145 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/opal-api.h  |   6 ++-
 include/processor.h |  12 +++++
 4 files changed, 168 insertions(+), 2 deletions(-)

Comments

Nicholas Piggin July 23, 2018, 5:30 a.m. UTC | #1
On Fri, 20 Jul 2018 16:30:51 +0530
Abhishek Goel <huntbag@linux.vnet.ibm.com> wrote:

> In an attempt to make the powernv idle code backward compatible,
> and to some extent forward compatible, add support for pre-stop entry
> and post-stop exit actions in OPAL. If a kernel knows about this
> opal call, then just a firmware supporting newer hardware is required,
> instead of waiting for kernel updates.
> This opal support for stop can be indicated by a compatibility string
> in the newly proposed device-tree format (Ref:
> https://patchwork.ozlabs.org/patch/923120/). Thus the kernel can
> enable a stop state, when the opal support for it exists, even if the
> kernel support doesn't exist.

I'm hoping rather than a save API it should just be an enter-idle
call, and it would expect another call in response to a powersave
wakeup (or it could return in the case of EC=0 type idle).

The problem is we may need a new special OPAL call API for
the wakeup case because that would be re-entrant. IMO we also
need something similar for OPAL machine check handling, so we
should think about it and see if something can fit both.

I'd like Linux not to have to know about any saving of core
vs thread SPRs or even PSSCR or STOP meanings. Just use an
idle state number it got from dt.

Thanks,
Nick
Benjamin Herrenschmidt July 23, 2018, 7:14 a.m. UTC | #2
On Mon, 2018-07-23 at 15:30 +1000, Nicholas Piggin wrote:
> I'm hoping rather than a save API it should just be an enter-idle
> call, and it would expect another call in response to a powersave
> wakeup (or it could return in the case of EC=0 type idle).
> 
> The problem is we may need a new special OPAL call API for
> the wakeup case because that would be re-entrant. IMO we also
> need something similar for OPAL machine check handling, so we
> should think about it and see if something can fit both.
> 
> I'd like Linux not to have to know about any saving of core
> vs thread SPRs or even PSSCR or STOP meanings. Just use an
> idle state number it got from dt.

We don't want OPAL to be the normal path. Only the "oops, we don't know
about that processor/idle_state combination/version, use OPAL as a
fallback". The cost of an OPAL call is too high otherwise.

We could have a separate OPAL entry point for the wakeup case but
putting the entirety of the enter/exit into OPAL is complicated.

We have to arbitrate between multiple threads entering different
states. We don't know until we exit how deep the core actually went and
what actually needs to be restored. It's a function of the SRR1 bits on
return *and* the state that was requested.

There's also the need to synchronize/rendez-vous threads on wakeup in
some cases such as TB resync, and dealing with big vs. small core.

Doing it all in OPAL would require *all* stop states to go to OPAL, I
don't think that's a great idea for performance.

Cheers
Ben.
Gautham R Shenoy July 23, 2018, 12:32 p.m. UTC | #3
Hello Ben, Nicholas,

On Mon, Jul 23, 2018 at 05:14:07PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2018-07-23 at 15:30 +1000, Nicholas Piggin wrote:
> > I'm hoping rather than a save API it should just be an enter-idle
> > call, and it would expect another call in response to a powersave
> > wakeup (or it could return in the case of EC=0 type idle).
> > 
> > The problem is we may need a new special OPAL call API for
> > the wakeup case because that would be re-entrant. IMO we also
> > need something similar for OPAL machine check handling, so we
> > should think about it and see if something can fit both.
> > 
> > I'd like Linux not to have to know about any saving of core
> > vs thread SPRs or even PSSCR or STOP meanings. Just use an
> > idle state number it got from dt.
> 
> We don't want OPAL to be the normal path. Only the "oops, we don't know
> about that processor/idle_state combination/version, use OPAL as a
> fallback". The cost of an OPAL call is too high otherwise.

Yes. This is the intended use case: the stop state is not dropped by
the kernel, and can still be used if support for save/restore exists
in OPAL. To support this, an additional change required in the newly
proposed device tree is different latency_ns and residency_ns values
for when the OPAL-based save/restore is used, so that the cpuidle
driver in the kernel will pick the state less often than if kernel
support for it were available.

> 
> We could have a separate OPAL entry point for the wakeup case but
> putting the entirety of the enter/exit into OPAL is complicated.

> 
> We have to arbitrate between multiple threads entering different
> states. We don't know until we exit how deep the core actually went and
> what actually needs to be restored. It's a function of the SRR1 bits on
> return *and* the state that was requested.

And the PLS value in PSSCR, which tells us precisely how deep a state
the CPU/core transitioned to.
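The masks below are the ones this patch adds in hw/slw.c; the helper functions are a minimal illustrative sketch of how the requested level (RL) and the level actually reached (PLS) are pulled out of PSSCR:

```c
#include <stdint.h>

/* Masks copied from the patch's hw/slw.c additions. */
#define PSSCR_RL_MASK	0xFULL			/* Requested Level, PSSCR[60:63] */
#define PSSCR_PLS_MASK	0xF000000000000000ULL	/* Power-saving Level Status */
#define PSSCR_PLS_SHIFT	60

/* Requested stop level, written by software before executing 'stop'. */
static unsigned int psscr_rl(uint64_t psscr)
{
	return (unsigned int)(psscr & PSSCR_RL_MASK);
}

/* Level the core actually reached, read back on wakeup; the patch
 * compares this against 4 to decide whether the wakeup came from a
 * state deep enough to have lost hypervisor resources. */
static unsigned int psscr_pls(uint64_t psscr)
{
	return (unsigned int)((psscr & PSSCR_PLS_MASK) >> PSSCR_PLS_SHIFT);
}
```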

> 
> There's also the need to synchronize/rendez-vous threads on wakeup in
> some cases such as TB resync, and dealing with big vs. small core.
> 
> Doing it all in OPAL would require *all* stop states to go to OPAL, I
> don't think that's a great idea for performance.

This was one of the reasons why the current implementation offloads
only the save/restore task to OPAL and retains everything else
(handling the synchronization, determining the first thread in the
core) in the kernel, since there can be stop states which the kernel
is capable of handling on its own.
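The last-thread-down / first-thread-up arbitration mentioned above is typically built on a per-core counter. A minimal C11 sketch, with the structure and function names invented for illustration (this is not code from the patch or the kernel):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative only: the kernel keeps a per-core count of awake threads
 * so that exactly one thread handles per-core (SCOPE_CORE) save/restore. */
struct core_idle_state {
	atomic_int awake_threads;	/* starts at threads-per-core */
};

/* Called on the way into stop; true for the last thread going down,
 * which is the one that would save per-core SPRs. */
static bool core_enter_idle(struct core_idle_state *s)
{
	return atomic_fetch_sub(&s->awake_threads, 1) == 1;
}

/* Called on wakeup; true for the first thread coming up, which is the
 * one that would restore per-core SPRs (and resync the timebase). */
static bool core_exit_idle(struct core_idle_state *s)
{
	return atomic_fetch_add(&s->awake_threads, 1) == 0;
}
```

Keeping this bookkeeping in the kernel is what lets the patch pass a scope argument to OPAL instead of moving the whole rendezvous into firmware.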


> 
> Cheers
> Ben.

--
Thanks and Regards
gautham.

Nicholas Piggin July 23, 2018, 1:41 p.m. UTC | #4
On Mon, 23 Jul 2018 17:14:07 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Mon, 2018-07-23 at 15:30 +1000, Nicholas Piggin wrote:
> > I'm hoping rather than a save API it should just be an enter-idle
> > call, and it would expect another call in response to a powersave
> > wakeup (or it could return in the case of EC=0 type idle).
> > 
> > The problem is we may need a new special OPAL call API for
> > the wakeup case because that would be re-entrant. IMO we also
> > need something similar for OPAL machine check handling, so we
> > should think about it and see if something can fit both.
> > 
> > I'd like Linux not to have to know about any saving of core
> > vs thread SPRs or even PSSCR or STOP meanings. Just use an
> > idle state number it got from dt.  
> 
> We don't want OPAL to be the normal path. Only the "oops, we don't know
> about that processor/idle_state combination/version, use OPAL as a
> fallback". The cost of an OPAL call is too high otherwise.

Right, so we should make OPAL do the whole thing (except catch the
SRESET wakeup which Linux has to do and has well architected SRR1
bits).
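On the SRR1 point: the patch later tests `srr1 & SRR1_WS_HVLOSS` with SRR1_WS_HVLOSS = 0x30000, but SRR1[46:47] is a two-bit state-loss field, so an exact compare is arguably safer. A hedged sketch (the interpretation of the encodings below is an assumption based on the patch's usage, not stated in it):

```c
#include <stdbool.h>
#include <stdint.h>

#define SRR1_WS_MASK	0x30000ULL	/* SRR1[46:47], wake state-loss field */
#define SRR1_WS_HVLOSS	0x30000ULL	/* both bits set: complete HV state loss */

/* Exact compare on the two-bit field; a plain (srr1 & SRR1_WS_HVLOSS)
 * test would also fire on the partial-loss encodings 0x10000/0x20000. */
static bool srr1_full_hv_loss(uint64_t srr1)
{
	return (srr1 & SRR1_WS_MASK) == SRR1_WS_HVLOSS;
}
```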

> 
> We could have a separate OPAL entry point for the wakeup case but
> putting the entirety of the enter/exit into OPAL is complicated.
> 
> We have to arbitrate between multiple threads entering different
> states. We don't know until we exit how deep the core actually went and
> what actually needs to be restored. It's a function of the SRR1 bits on
> return *and* the state that was requested.
> 
> There's also the need to synchronize/rendez-vous threads on wakeup in
> some cases such as TB resync, and dealing with big vs. small core.
> 
> Doing it all in OPAL would require *all* stop states to go to OPAL, I
> don't think that's a great idea for performance.

Oh you want Linux native and OPAL drivers to be used simultaneously?
Is that the right thing to optimise for?

Thanks,
Nick
Benjamin Herrenschmidt July 24, 2018, 12:06 a.m. UTC | #5
On Mon, 2018-07-23 at 23:41 +1000, Nicholas Piggin wrote:
> Right, so we should make OPAL do the whole thing (except catch the
> SRESET wakeup which Linux has to do and has well architected SRR1
> bits).
> 
> > 
> > We could have a separate OPAL entry point for the wakeup case but
> > putting the entirety of the enter/exit into OPAL is complicated.
> > 
> > We have to arbitrate between multiple threads entering different
> > states. We don't know until we exit how deep the core actually went and
> > what actually needs to be restored. It's a function of the SRR1 bits on
> > return *and* the state that was requested.
> > 
> > There's also the need to synchronize/rendez-vous threads on wakeup in
> > some cases such as TB resync, and dealing with big vs. small core.
> > 
> > Doing it all in OPAL would require *all* stop states to go to OPAL, I
> > don't think that's a great idea for performance.
> 
> Oh you want Linux native and OPAL drivers to be used simultaneously?
> Is that the right thing to optimise for?

We want Linux native for states that are known to work (hopefully
STOP0/1) and OPAL for things that might need additional resources saved
or restored and/or workarounds.

At least that's how I thought of it so far... you prefer having an "all
or nothing" approach ?

Cheers,
Ben.
Nicholas Piggin July 24, 2018, 5:36 a.m. UTC | #6
On Tue, 24 Jul 2018 10:06:12 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Mon, 2018-07-23 at 23:41 +1000, Nicholas Piggin wrote:
> > Right, so we should make OPAL do the whole thing (except catch the
> > SRESET wakeup which Linux has to do and has well architected SRR1
> > bits).
> >   
> > > 
> > > We could have a separate OPAL entry point for the wakeup case but
> > > putting the entirety of the enter/exit into OPAL is complicated.
> > > 
> > > We have to arbitrate between multiple threads entering different
> > > states. We don't know until we exit how deep the core actually went and
> > > what actually needs to be restored. It's a function of the SRR1 bits on
> > > return *and* the state that was requested.
> > > 
> > > There's also the need to synchronize/rendez-vous threads on wakeup in
> > > some cases such as TB resync, and dealing with big vs. small core.
> > > 
> > > Doing it all in OPAL would require *all* stop states to go to OPAL, I
> > > don't think that's a great idea for performance.  
> > 
> > Oh you want Linux native and OPAL drivers to be used simultaneously?
> > Is that the right thing to optimise for?  
> 
> We want Linux native for states that are known to work (hopefully
> STOP0/1) and OPAL for things that might need additional resources saved
> or restored and/or workarounds.
> 
> At least that's how I thought of it so far... you prefer having an "all
> or nothing" approach ?

As an option, yes. And once you have that option I don't think there is
much point requiring kernel to know anything about deep states or even
the STOP instruction or PSSCR register.

I would not object to an OPAL API that requires all deep states to be
entered via OPAL but allows shallow states (GPR loss only) to be used
natively by the kernel concurrently. The shallow states are pretty well
architected and the deep states not nearly so performance critical.

Thanks,
Nick
Vaidyanathan Srinivasan July 24, 2018, 5:44 a.m. UTC | #7
* Nicholas Piggin <npiggin@gmail.com> [2018-07-23 23:41:19]:

> On Mon, 23 Jul 2018 17:14:07 +1000
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> 
> > On Mon, 2018-07-23 at 15:30 +1000, Nicholas Piggin wrote:
> > > I'm hoping rather than a save API it should just be an enter-idle
> > > call, and it would expect another call in response to a powersave
> > > wakeup (or it could return in the case of EC=0 type idle).
> > > 
> > > The problem is we may need a new special OPAL call API for
> > > the wakeup case because that would be re-entrant. IMO we also
> > > need something similar for OPAL machine check handling, so we
> > > should think about it and see if something can fit both.
> > > 
> > > I'd like Linux not to have to know about any saving of core
> > > vs thread SPRs or even PSSCR or STOP meanings. Just use an
> > > idle state number it got from dt.  
> > 
> > We don't want OPAL to be the normal path. Only the "oops, we don't know
> > about that processor/idle_state combination/version, use OPAL as a
> > fallback". The cost of an OPAL call is too high otherwise.
> 
> Right, so we should make OPAL do the whole thing (except catch the
> SRESET wakeup which Linux has to do and has well architected SRR1
> bits).
> 
> > 
> > We could have a separate OPAL entry point for the wakeup case but
> > putting the entirety of the enter/exit into OPAL is complicated.
> > 
> > We have to arbitrate between multiple threads entering different
> > states. We don't know until we exit how deep the core actually went and
> > what actually needs to be restored. It's a function of the SRR1 bits on
> > return *and* the state that was requested.
> > 
> > There's also the need to synchronize/rendez-vous threads on wakeup in
> > some cases such as TB resync, and dealing with big vs. small core.
> > 
> > Doing it all in OPAL would require *all* stop states to go to OPAL, I
> > don't think that's a great idea for performance.
> 
> Oh you want Linux native and OPAL drivers to be used simultaneously?
> Is that the right thing to optimise for?

Yes, ideally Linux should handle all stop states and the related
save/restore.  We want to fall back to OPAL as an option to use the
deeper stop states even if it has some overheads.  This is a fallback
option to provide kernel-level compatibility across different platforms.

Ideally we would expect newer kernels to handle all stop states
optimally without calling OPAL, but if we still miss something, then
at least through the device tree we can tell the kernel to call OPAL
for that stop state rather than just disabling the state completely.

We would expect shallow and fast stop states to be handled by kernel
and fallback to OPAL only for deeper states.

--Vaidy
Vaidyanathan Srinivasan July 24, 2018, 5:49 a.m. UTC | #8
* Nicholas Piggin <npiggin@gmail.com> [2018-07-24 15:36:48]:

> On Tue, 24 Jul 2018 10:06:12 +1000
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> 
> > On Mon, 2018-07-23 at 23:41 +1000, Nicholas Piggin wrote:
> > > Right, so we should make OPAL do the whole thing (except catch the
> > > SRESET wakeup which Linux has to do and has well architected SRR1
> > > bits).
> > >   
> > > > 
> > > > We could have a separate OPAL entry point for the wakeup case but
> > > > putting the entirety of the enter/exit into OPAL is complicated.
> > > > 
> > > > We have to arbitrate between multiple threads entering different
> > > > states. We don't know until we exit how deep the core actually went and
> > > > what actually needs to be restored. It's a function of the SRR1 bits on
> > > > return *and* the state that was requested.
> > > > 
> > > > There's also the need to synchronize/rendez-vous threads on wakeup in
> > > > some cases such as TB resync, and dealing with big vs. small core.
> > > > 
> > > > Doing it all in OPAL would require *all* stop states to go to OPAL, I
> > > > don't think that's a great idea for performance.  
> > > 
> > > Oh you want Linux native and OPAL drivers to be used simultaneously?
> > > Is that the right thing to optimise for?  
> > 
> > We want Linux native for states that are known to work (hopefully
> > STOP0/1) and OPAL for things that might need additional resources saved
> > or restored and/or workarounds.
> > 
> > At least that's how I thought of it so far... you prefer having an "all
> > or nothing" approach ?
> 
> As an option, yes. And once you have that option I don't think there is
> much point requiring kernel to know anything about deep states or even
> the STOP instruction or PSSCR register.
> 
> I would not object to an OPAL API that requires all deep states to be
> entered via OPAL but allows shallow states (GPR loss only) to be used
> natively by the kernel concurrently. The shallow states are pretty well
> architected and the deep states not nearly so performance critical.

I agree with your design point.  However, based on our experience
across different platforms, we did not want to draw a line between
shallow and deep for this API.  The framework can be used as a fallback
for any state if required.  But in practical situations, we expect to
hook onto this one from the deepest state, like STOP11, where we still
have a few quirks to deal with.  The STOP11 case is a clear win for
the OPAL path because it is so slow that the kernel's context
management will not speed it up.

--Vaidy
Benjamin Herrenschmidt July 24, 2018, 5:58 a.m. UTC | #9
On Tue, 2018-07-24 at 15:36 +1000, Nicholas Piggin wrote:
> On Tue, 24 Jul 2018 10:06:12 +1000
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> 
> > On Mon, 2018-07-23 at 23:41 +1000, Nicholas Piggin wrote:
> > > Right, so we should make OPAL do the whole thing (except catch the
> > > SRESET wakeup which Linux has to do and has well architected SRR1
> > > bits).
> > >   
> > > > 
> > > > We could have a separate OPAL entry point for the wakeup case but
> > > > putting the entirety of the enter/exit into OPAL is complicated.
> > > > 
> > > > We have to arbitrate between multiple threads entering different
> > > > states. We don't know until we exit how deep the core actually went and
> > > > what actually needs to be restored. It's a function of the SRR1 bits on
> > > > return *and* the state that was requested.
> > > > 
> > > > There's also the need to synchronize/rendez-vous threads on wakeup in
> > > > some cases such as TB resync, and dealing with big vs. small core.
> > > > 
> > > > Doing it all in OPAL would require *all* stop states to go to OPAL, I
> > > > don't think that's a great idea for performance.  
> > > 
> > > Oh you want Linux native and OPAL drivers to be used simultaneously?
> > > Is that the right thing to optimise for?  
> > 
> > We want Linux native for states that are known to work (hopefully
> > STOP0/1) and OPAL for things that might need additional resources saved
> > or restored and/or workarounds.
> > 
> > At least that's how I thought of it so far... you prefer having an "all
> > or nothing" approach ?
> 
> As an option, yes. And once you have that option I don't think there is
> much point requiring kernel to know anything about deep states or even
> the STOP instruction or PSSCR register.
> 
> I would not object to an OPAL API that requires all deep states to be
> entered via OPAL but allows shallow states (GPR loss only) to be used
> natively by the kernel concurrently. The shallow states are pretty well
> architected and the deep states not nearly so performance critical.

Where do you put in-between states like 2 or 4/5 ? They are somewhat
performance critical, enough that you don't want the round trip to
OPAL, do you ?

Or do we have to go there to resync TB anyway ?

I wish the HW could handle the TB resync and state save/restore using
the CMEs and completely remove the problem...

Ben.
Nicholas Piggin July 24, 2018, 6:34 a.m. UTC | #10
On Tue, 24 Jul 2018 11:19:07 +0530
Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> wrote:

> * Nicholas Piggin <npiggin@gmail.com> [2018-07-24 15:36:48]:
> 
> > On Tue, 24 Jul 2018 10:06:12 +1000
> > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> >   
> > > On Mon, 2018-07-23 at 23:41 +1000, Nicholas Piggin wrote:  
> > > > Right, so we should make OPAL do the whole thing (except catch the
> > > > SRESET wakeup which Linux has to do and has well architected SRR1
> > > > bits).
> > > >     
> > > > > 
> > > > > We could have a separate OPAL entry point for the wakeup case but
> > > > > putting the entirety of the enter/exit into OPAL is complicated.
> > > > > 
> > > > > We have to arbitrate between multiple threads entering different
> > > > > states. We don't know until we exit how deep the core actually went and
> > > > > what actually needs to be restored. It's a function of the SRR1 bits on
> > > > > return *and* the state that was requested.
> > > > > 
> > > > > There's also the need to synchronize/rendez-vous threads on wakeup in
> > > > > some cases such as TB resync, and dealing with big vs. small core.
> > > > > 
> > > > > Doing it all in OPAL would require *all* stop states to go to OPAL, I
> > > > > don't think that's a great idea for performance.    
> > > > 
> > > > Oh you want Linux native and OPAL drivers to be used simultaneously?
> > > > Is that the right thing to optimise for?    
> > > 
> > > We want Linux native for states that are known to work (hopefully
> > > STOP0/1) and OPAL for things that might need additional resources saved
> > > or restored and/or workarounds.
> > > 
> > > At least that's how I thought of it so far... you prefer having an "all
> > > or nothing" approach ?  
> > 
> > As an option, yes. And once you have that option I don't think there is
> > much point requiring kernel to know anything about deep states or even
> > the STOP instruction or PSSCR register.
> > 
> > I would not object to an OPAL API that requires all deep states to be
> > entered via OPAL but allows shallow states (GPR loss only) to be used
> > natively by the kernel concurrently. The shallow states are pretty well
> > architected and the deep states not nearly so performance critical.  
> 
> I agree with your design point.  However based on our experience
> across different platforms, we did not want to draw a line for shallow
> vs deep for this API.

I don't see a problem with doing that. If the state came to require some
workaround other than simple GPR restore, then it can be marked in
device tree as being a deep state (which it is by definition, isn't it?)

The biggest performance win and most critical is SMT mode switch, so if
we only had stop0 available for native kernel I don't think that would
be a problem.

When we start turning off clocks and throwing away caches and SPRs, I
don't think the OPAL call will be a huge problem.

>  The framework can be used as fallback for any
> state if required.  But in practical situations, we expect to hook
> onto this one from the deepest state like STOP11 where we still have
> few quirks to deal with.  The STOP11 case is a clear win with OPAL
> path because it is so slow and kernel's context management will not
> speed it up.

The problem with that is that it's all still expecting the kernel to
do the right synchronisation and know about scope of resource loss.
What if we had some issue that required synchronisation between a pair
of cores? Or if we had to use a different entry sequence to execute
the 'stop' instruction (AFAIK that is not actually architected, but
assuming we don't do something crazy like change it, there could be
an errata)? Then we lose the whole API.

IMO we need a *true* back compatible API (at least one that relies on
no more than SRESET/MCE + SRR1 powersave wakeup indication). And then
we can think of reasonable additions to make performance a bit better.
I think allowing native shallow stop concurrently is reasonable. Is
allowing native concurrent deep stops? I'm not so sure. I'd like to see
some numbers though.

Thanks,
Nick
Nicholas Piggin July 24, 2018, 6:52 a.m. UTC | #11
On Tue, 24 Jul 2018 15:58:58 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Tue, 2018-07-24 at 15:36 +1000, Nicholas Piggin wrote:
> > On Tue, 24 Jul 2018 10:06:12 +1000
> > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> >   
> > > On Mon, 2018-07-23 at 23:41 +1000, Nicholas Piggin wrote:  
> > > > Right, so we should make OPAL do the whole thing (except catch the
> > > > SRESET wakeup which Linux has to do and has well architected SRR1
> > > > bits).
> > > >     
> > > > > 
> > > > > We could have a separate OPAL entry point for the wakeup case but
> > > > > putting the entirety of the enter/exit into OPAL is complicated.
> > > > > 
> > > > > We have to arbitrate between multiple threads entering different
> > > > > states. We don't know until we exit how deep the core actually went and
> > > > > what actually needs to be restored. It's a function of the SRR1 bits on
> > > > > return *and* the state that was requested.
> > > > > 
> > > > > There's also the need to synchronize/rendez-vous threads on wakeup in
> > > > > some cases such as TB resync, and dealing with big vs. small core.
> > > > > 
> > > > > Doing it all in OPAL would require *all* stop states to go to OPAL, I
> > > > > don't think that's a great idea for performance.    
> > > > 
> > > > Oh you want Linux native and OPAL drivers to be used simultaneously?
> > > > Is that the right thing to optimise for?    
> > > 
> > > We want Linux native for states that are known to work (hopefully
> > > STOP0/1) and OPAL for things that might need additional resources saved
> > > or restored and/or workarounds.
> > > 
> > > At least that's how I thought of it so far... you prefer having an "all
> > > or nothing" approach ?  
> > 
> > As an option, yes. And once you have that option I don't think there is
> > much point requiring kernel to know anything about deep states or even
> > the STOP instruction or PSSCR register.
> > 
> > I would not object to an OPAL API that requires all deep states to be
> > entered via OPAL but allows shallow states (GPR loss only) to be used
> > natively by the kernel concurrently. The shallow states are pretty well
> > architected and the deep states not nearly so performance critical.  
> 
> Where do you put in between states like 2 or 4/5 ? They are somwhat
> performance critical, enough that you don't want the round trip to
> OPAL, do you ?

I think an OPAL call for 4/5 would be acceptable. They get very
expensive when power turns off.

For 2 I'm not so sure. It still has a ~20000ns hardware cost.

> 
> Or do we have to go there to resync TB anyway ?

POWER9 is better about keeping TB, so it's not required I think.

> I wish the HW could handle the TB resync and stave save/restore using
> the CMEs and completely remove the problem...

Yeah, there's lots we could improve there...

Thanks,
Nick

Patch

diff --git a/hw/chiptod.c b/hw/chiptod.c
index df1274ca..7f52f6ac 100644
--- a/hw/chiptod.c
+++ b/hw/chiptod.c
@@ -1599,7 +1599,7 @@  error_out:
 	return rc;
 }
 
-static int64_t opal_resync_timebase(void)
+int64_t __opal_resync_timebase(void)
 {
 	if (!chiptod_wakeup_resync()) {
 		prerror("OPAL: Resync timebase failed on CPU 0x%04x\n",
@@ -1608,6 +1608,11 @@  static int64_t opal_resync_timebase(void)
 	}
 	return OPAL_SUCCESS;
 }
+
+static int64_t opal_resync_timebase(void)
+{
+	return __opal_resync_timebase();
+}
 opal_call(OPAL_RESYNC_TIMEBASE, opal_resync_timebase, 0);
 
 static void chiptod_print_tb(void *data __unused)
diff --git a/hw/slw.c b/hw/slw.c
index dfa9189b..7acecbb4 100644
--- a/hw/slw.c
+++ b/hw/slw.c
@@ -818,6 +818,151 @@  static void slw_late_init_p9(struct proc_chip *chip)
 	}
 }
 
+#define PTCR_IDX	0
+#define RPR_IDX		1
+#define SPURR_IDX	2
+#define PURR_IDX	3
+#define TSCR_IDX	4
+#define DSCR_IDX	5
+#define AMOR_IDX	6
+#define WORT_IDX	7
+#define WORC_IDX	8
+#define LPCR_IDX	9
+#define PID_IDX		10
+#define LDBAR_IDX	11
+#define FSCR_IDX	12
+#define HFSCR_IDX	13
+#define MMCRA_IDX	14
+#define MMCR0_IDX	15
+#define MMCR1_IDX	16
+#define MMCR2_IDX	17
+#define PSSCR_RL_MASK	0xF
+#define PSSCR_PLS_MASK	0xF000000000000000UL
+#define PSSCR_PLS_SHIFT	60
+#define SRR1_WS_HVLOSS  0x30000
+#define SCOPE_CORE	0
+#define SCOPE_THREAD	1
+
+/*
+ * opal_cpuidle_save: Save the SPRs and any other resources
+ *			when going into a deep idle stop state.
+ * @stop_sprs : Pointer to an array where the SPR values of
+ *		the relevant SPRs of this CPU have to be saved.
+ * @scope     : Defines if the saving needs to be done
+ *		for per-thread resources or per-core resources.
+ * @psscr     : The requested PSSCR value (the save path takes no
+ *		SRR1 argument; SRR1 is only known at wakeup).
+ * Returns OPAL_PARAMETER if scope is invalid.
+ * Returns OPAL_EMPTY when stop_sprs is NULL.
+ * Returns OPAL_UNSUPPORTED if we are going into a shallow state.
+ * Returns OPAL_SUCCESS otherwise.
+ */
+static int opal_cpuidle_save(u64 *stop_sprs, int scope, u64 psscr)
+{
+	if (scope != SCOPE_CORE && scope != SCOPE_THREAD) {
+		prlog(PR_ERR, "opal_cpuidle_save : invalid scope\n");
+		return OPAL_PARAMETER;
+	}
+	if (!stop_sprs) {
+		prlog(PR_ERR, "opal_cpuidle_save : unallocated memory pointer\n");
+		return OPAL_EMPTY;
+	}
+	/*
+	 * TODO Fix this to use the RL value of the first thread
+	 * that loses hypervisor resources.
+	 */
+	if ((psscr & PSSCR_RL_MASK) < 4) {
+		prlog(PR_ERR, "opal_cpuidle_save : unexpected opal call\n");
+		return OPAL_UNSUPPORTED;
+	}
+	/*
+	 * Save all the SPRs. In future, we may save core resources
+	 * only if the thread entering deep stop is the last one in
+	 * the core; scope can be used to make that decision.
+	 */
+	stop_sprs[RPR_IDX] = mfspr(SPR_RPR);
+	stop_sprs[WORC_IDX] = mfspr(SPR_WORC);
+	stop_sprs[PTCR_IDX] = mfspr(SPR_PTCR);
+	stop_sprs[TSCR_IDX] = mfspr(SPR_TSCR);
+	stop_sprs[AMOR_IDX] = mfspr(SPR_AMOR);
+
+	stop_sprs[WORT_IDX] = mfspr(SPR_WORT);
+	stop_sprs[PURR_IDX] = mfspr(SPR_PURR);
+	stop_sprs[SPURR_IDX] = mfspr(SPR_SPURR);
+	stop_sprs[DSCR_IDX] = mfspr(SPR_DSCR);
+	stop_sprs[LPCR_IDX] = mfspr(SPR_LPCR);
+	stop_sprs[PID_IDX] = mfspr(SPR_PID);
+	stop_sprs[LDBAR_IDX] = mfspr(SPR_LDBAR);
+	stop_sprs[FSCR_IDX] = mfspr(SPR_FSCR);
+	stop_sprs[HFSCR_IDX] = mfspr(SPR_HFSCR);
+	stop_sprs[MMCRA_IDX] = mfspr(SPR_MMCRA);
+	stop_sprs[MMCR0_IDX] = mfspr(SPR_MMCR0);
+	stop_sprs[MMCR1_IDX] = mfspr(SPR_MMCR1);
+	stop_sprs[MMCR2_IDX] = mfspr(SPR_MMCR2);
+	return OPAL_SUCCESS;
+}
+
+opal_call(OPAL_IDLE_SAVE, opal_cpuidle_save, 3);
+
+/*
+ * opal_cpuidle_restore: Restore the SPRs and any other resources
+ *			on wakeup from a deep idle stop state.
+ * @stop_sprs : Pointer to an array where the SPR values of
+ *		the relevant SPRs of this CPU have been stored.
+ * @scope     : Defines if the restoration needs to be done
+ *		for per-thread resources or per-core resources.
+ * @psscr     : The PSSCR value at wakeup from stop.
+ * @srr1      : The SRR1 value at wakeup from stop.
+ * Returns OPAL_PARAMETER if scope is invalid.
+ * Returns OPAL_EMPTY when stop_sprs is NULL.
+ * Returns OPAL_UNSUPPORTED if we woke up from a shallow state.
+ * Returns OPAL_SUCCESS otherwise.
+ */
+static int opal_cpuidle_restore(u64 *stop_sprs, int scope, u64 psscr, u64 srr1)
+{
+	if (scope != SCOPE_CORE && scope != SCOPE_THREAD) {
+		prlog(PR_ERR, "opal_cpuidle_restore : invalid scope\n");
+		return OPAL_PARAMETER;
+	}
+	if (!stop_sprs) {
+		prlog(PR_ERR, "opal_cpuidle_restore : incorrect pointer to save area\n");
+		return OPAL_EMPTY;
+	}
+	if ((psscr & PSSCR_PLS_MASK) >> PSSCR_PLS_SHIFT < 4) {
+		prlog(PR_ERR, "opal_cpuidle_restore : unexpected opal call\n");
+		return OPAL_UNSUPPORTED;
+	}
+	/* if CORE scope, restore core resources as well as thread resources */
+	if (scope == SCOPE_CORE) {
+		/* In case of complete hypervisor state loss
+		 * we need to resync timebase
+		 */
+		if (srr1 & SRR1_WS_HVLOSS)
+			__opal_resync_timebase();
+		mtspr(SPR_RPR, stop_sprs[RPR_IDX]);
+		mtspr(SPR_WORC, stop_sprs[WORC_IDX]);
+		mtspr(SPR_PTCR, stop_sprs[PTCR_IDX]);
+		mtspr(SPR_TSCR, stop_sprs[TSCR_IDX]);
+		mtspr(SPR_AMOR, stop_sprs[AMOR_IDX]);
+	}
+	mtspr(SPR_WORT, stop_sprs[WORT_IDX]);
+	mtspr(SPR_PURR, stop_sprs[PURR_IDX]);
+	mtspr(SPR_SPURR, stop_sprs[SPURR_IDX]);
+	mtspr(SPR_DSCR, stop_sprs[DSCR_IDX]);
+	mtspr(SPR_LPCR, stop_sprs[LPCR_IDX]);
+	mtspr(SPR_PID, stop_sprs[PID_IDX]);
+	mtspr(SPR_LDBAR, stop_sprs[LDBAR_IDX]);
+	mtspr(SPR_FSCR, stop_sprs[FSCR_IDX]);
+	mtspr(SPR_HFSCR, stop_sprs[HFSCR_IDX]);
+	mtspr(SPR_MMCRA, stop_sprs[MMCRA_IDX]);
+	mtspr(SPR_MMCR0, stop_sprs[MMCR0_IDX]);
+	mtspr(SPR_MMCR1, stop_sprs[MMCR1_IDX]);
+	mtspr(SPR_MMCR2, stop_sprs[MMCR2_IDX]);
+	return OPAL_SUCCESS;
+}
+
+opal_call(OPAL_IDLE_RESTORE, opal_cpuidle_restore, 4);
+
 /* Add device tree properties to describe idle states */
 void add_cpu_idle_state_properties(void)
 {
diff --git a/include/opal-api.h b/include/opal-api.h
index f766dce9..24c98ed0 100644
--- a/include/opal-api.h
+++ b/include/opal-api.h
@@ -224,7 +224,9 @@ 
 #define OPAL_PCI_SET_PBCQ_TUNNEL_BAR		165
 #define OPAL_HANDLE_HMI2			166
 #define OPAL_NX_COPROC_INIT			167
-#define OPAL_LAST				167
+#define OPAL_IDLE_SAVE				168
+#define OPAL_IDLE_RESTORE			169
+#define OPAL_LAST				169
 
 #define QUIESCE_HOLD			1 /* Spin all calls at entry */
 #define QUIESCE_REJECT			2 /* Fail all calls with OPAL_BUSY */
@@ -1312,6 +1314,8 @@  enum {
 	OPAL_PCI_P2P_TARGET	= 1,
 };
 
+extern int64_t __opal_resync_timebase(void);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __OPAL_API_H */
diff --git a/include/processor.h b/include/processor.h
index 6b262b45..cfda56d3 100644
--- a/include/processor.h
+++ b/include/processor.h
@@ -87,6 +87,18 @@ 
 #define SPR_HID4	0x3f4
 #define SPR_HID5	0x3f6
 #define SPR_PIR		0x3ff	/* RO: Processor Identification */
+#define SPR_PTCR	0x1D0	/* Partition table control Register */
+#define SPR_WORT	0x37f	/* Workload optimization register - thread */
+#define SPR_WORC	0x35f	/* Workload optimization register - core */
+#define SPR_FSCR	0x099	/* Facility Status & Control Register */
+#define SPR_HFSCR	0xbe	/* HV=1 Facility Status & Control Register */
+#define SPR_LDBAR	0x352	/* LD Base Address Register */
+#define SPR_PID		0x030	/* Process ID */
+/* Performance counter Registers */
+#define SPR_MMCR0	0x31b
+#define SPR_MMCRA	0x312
+#define SPR_MMCR1	0x31e
+#define SPR_MMCR2	0x311
 
 
 /* Bits in LPCR */