[1/2] SLW: Remove stop1_lite and stop0 stop states

Message ID 1525079529-2284-1-git-send-email-akshay.adiga@linux.vnet.ibm.com
State Superseded
Headers show
Series
  • [1/2] SLW: Remove stop1_lite and stop0 stop states
Related show

Commit Message

Akshay Adiga April 30, 2018, 9:12 a.m.
Powersaving for stop0_lite and stop1_lite is observed to be quite similar
and both states resume without state loss. Using context_switch test [1]
we observe that stop0_lite has slightly lower latency, hence removing
stop1_lite.

[1] linux/tools/testing/selftests/powerpc/benchmarks/context_switch.c

Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
---
 hw/slw.c | 30 ------------------------------
 1 file changed, 30 deletions(-)

Comments

Nicholas Piggin May 1, 2018, 3:47 a.m. | #1
On Mon, 30 Apr 2018 14:42:08 +0530
Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> wrote:

> Powersaving for stop0_lite and stop1_lite is observed to be quite similar
> and both states resume without state loss. Using context_switch test [1]
> we observe that stop0_lite has slightly lower latency, hence removing
> stop1_lite.
> 
> [1] linux/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> 
> Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>

I'm okay for removing stop1_lite and stop2_lite -- SMT switching
is very latency critical. If we decide to actually start saving
real power then SMT should already have been switched.

So I would put stop1_lite and stop2_lite removal in the same patch.

Then what do we have? stop0_lite, stop0, stop1 for our fast idle
states.

I would be against removing stop0 if that is our fastest way to
release SMT resources, even if there is only a small advantage. Why
not remove stop1 instead?

We also need to better evaluate stop0_lite. How much advantage does
that have over snooze?

Thanks,
Nick


> ---
>  hw/slw.c | 30 ------------------------------
>  1 file changed, 30 deletions(-)
> 
> diff --git a/hw/slw.c b/hw/slw.c
> index 3f9abaa..edfc783 100644
> --- a/hw/slw.c
> +++ b/hw/slw.c
> @@ -521,36 +521,6 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>  				 | OPAL_PM_PSSCR_TR(3),
>  		.pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK },
>  	{
> -		.name = "stop0",
> -		.latency_ns = 2000,
> -		.residency_ns = 20000,
> -		.flags = 0*OPAL_PM_DEC_STOP \
> -		       | 0*OPAL_PM_TIMEBASE_STOP  \
> -		       | 1*OPAL_PM_LOSE_USER_CONTEXT \
> -		       | 0*OPAL_PM_LOSE_HYP_CONTEXT \
> -		       | 0*OPAL_PM_LOSE_FULL_CONTEXT \
> -		       | 1*OPAL_PM_STOP_INST_FAST,
> -		.pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(0) \
> -				 | OPAL_PM_PSSCR_MTL(3) \
> -				 | OPAL_PM_PSSCR_TR(3) \
> -				 | OPAL_PM_PSSCR_ESL \
> -				 | OPAL_PM_PSSCR_EC,
> -		.pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK },
> -	{
> -		.name = "stop1_lite", /* Enter stop1 with no state loss */
> -		.latency_ns = 4900,
> -		.residency_ns = 49000,
> -		.flags = 0*OPAL_PM_DEC_STOP \
> -		       | 0*OPAL_PM_TIMEBASE_STOP  \
> -		       | 0*OPAL_PM_LOSE_USER_CONTEXT \
> -		       | 0*OPAL_PM_LOSE_HYP_CONTEXT \
> -		       | 0*OPAL_PM_LOSE_FULL_CONTEXT \
> -		       | 1*OPAL_PM_STOP_INST_FAST,
> -		.pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(1) \
> -				 | OPAL_PM_PSSCR_MTL(3) \
> -				 | OPAL_PM_PSSCR_TR(3),
> -		.pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK },
> -	{
>  		.name = "stop1",
>  		.latency_ns = 5000,
>  		.residency_ns = 50000,
Akshay Adiga May 3, 2018, 9:06 a.m. | #2
On Tue, May 01, 2018 at 01:47:23PM +1000, Nicholas Piggin wrote:
> On Mon, 30 Apr 2018 14:42:08 +0530
> Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> wrote:
> 
> > Powersaving for stop0_lite and stop1_lite is observed to be quite similar
> > and both states resume without state loss. Using context_switch test [1]
> > we observe that stop0_lite has slightly lower latency, hence removing
> > stop1_lite.
> > 
> > [1] linux/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> > 
> > Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
> 
> I'm okay for removing stop1_lite and stop2_lite -- SMT switching
> is very latency critical. If we decide to actually start saving
> real power then SMT should already have been switched.
> 
> So I would put stop1_lite and stop2_lite removal in the same patch.

I can do this.

> 
> Then what do we have? stop0_lite, stop0, stop1 for our fast idle
> states.

Currently we were looking at  stop0_lite , stop1 as the fast idle states
because stop0 and stop1 have similar latency and powersaving.
Having so many low latency states does not make sense.

> 
> I would be against removing stop0 if that is our fastest way to
> release SMT resources, even if there is only a small advantage. Why
> not remove stop1 instead?
>
SMT-folding comes into picture only when we have at least one thread
running in the core. stop0 and stop1 has exactly same power-saving and
both will release SMT resources if at least one thread in the core is
running.

As soon as all threads are idle core enters stop0/stop1, where stop1
does a bit more powersaving than stop0.

> We also need to better evaluate stop0_lite. How much advantage does
> that have over snooze?

I evaluated snooze and stop0_lite, there is an additional ipi latency of
a few microseconds in case of stop0_lite. So snooze cannot still be
replaced by stop0_lite.

> 
> Thanks,
> Nick
> 
> 
> > ---
> >  hw/slw.c | 30 ------------------------------
> >  1 file changed, 30 deletions(-)
> > 
> > diff --git a/hw/slw.c b/hw/slw.c
> > index 3f9abaa..edfc783 100644
> > --- a/hw/slw.c
> > +++ b/hw/slw.c
> > @@ -521,36 +521,6 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
> >  				 | OPAL_PM_PSSCR_TR(3),
> >  		.pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK },
> >  	{
> > -		.name = "stop0",
> > -		.latency_ns = 2000,
> > -		.residency_ns = 20000,
> > -		.flags = 0*OPAL_PM_DEC_STOP \
> > -		       | 0*OPAL_PM_TIMEBASE_STOP  \
> > -		       | 1*OPAL_PM_LOSE_USER_CONTEXT \
> > -		       | 0*OPAL_PM_LOSE_HYP_CONTEXT \
> > -		       | 0*OPAL_PM_LOSE_FULL_CONTEXT \
> > -		       | 1*OPAL_PM_STOP_INST_FAST,
> > -		.pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(0) \
> > -				 | OPAL_PM_PSSCR_MTL(3) \
> > -				 | OPAL_PM_PSSCR_TR(3) \
> > -				 | OPAL_PM_PSSCR_ESL \
> > -				 | OPAL_PM_PSSCR_EC,
> > -		.pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK },
> > -	{
> > -		.name = "stop1_lite", /* Enter stop1 with no state loss */
> > -		.latency_ns = 4900,
> > -		.residency_ns = 49000,
> > -		.flags = 0*OPAL_PM_DEC_STOP \
> > -		       | 0*OPAL_PM_TIMEBASE_STOP  \
> > -		       | 0*OPAL_PM_LOSE_USER_CONTEXT \
> > -		       | 0*OPAL_PM_LOSE_HYP_CONTEXT \
> > -		       | 0*OPAL_PM_LOSE_FULL_CONTEXT \
> > -		       | 1*OPAL_PM_STOP_INST_FAST,
> > -		.pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(1) \
> > -				 | OPAL_PM_PSSCR_MTL(3) \
> > -				 | OPAL_PM_PSSCR_TR(3),
> > -		.pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK },
> > -	{
> >  		.name = "stop1",
> >  		.latency_ns = 5000,
> >  		.residency_ns = 50000,
>
Nicholas Piggin May 3, 2018, 9:28 a.m. | #3
On Thu, 3 May 2018 14:36:47 +0530
Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> wrote:

> On Tue, May 01, 2018 at 01:47:23PM +1000, Nicholas Piggin wrote:
> > On Mon, 30 Apr 2018 14:42:08 +0530
> > Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> wrote:
> >   
> > > Powersaving for stop0_lite and stop1_lite is observed to be quite similar
> > > and both states resume without state loss. Using context_switch test [1]
> > > we observe that stop0_lite has slightly lower latency, hence removing
> > > stop1_lite.
> > > 
> > > [1] linux/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> > > 
> > > Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>  
> > 
> > I'm okay for removing stop1_lite and stop2_lite -- SMT switching
> > is very latency critical. If we decide to actually start saving
> > real power then SMT should already have been switched.
> > 
> > So I would put stop1_lite and stop2_lite removal in the same patch.  
> 
> I can do this.
> 
> > 
> > Then what do we have? stop0_lite, stop0, stop1 for our fast idle
> > states.  
> 
> Currently we were looking at  stop0_lite , stop1 as the fast idle states
> because stop0 and stop1 have similar latency and powersaving.
> Having so many low latency states does not make sense.
> 
> > 
> > I would be against removing stop0 if that is our fastest way to
> > release SMT resources, even if there is only a small advantage. Why
> > not remove stop1 instead?
> >  
> SMT-folding comes into picture only when we have at least one thread
> running in the core. stop0 and stop1 has exactly same power-saving and
> both will release SMT resources if at least one thread in the core is
> running.

Right, but you don't know that other threads are running or will remain
running when you enter stop. If not, then latency is higher for stop1,
no? So we need to be using stop0.

> 
> As soon as all threads are idle core enters stop0/stop1, where stop1
> does a bit more powersaving than stop0.
> 
> > We also need to better evaluate stop0_lite. How much advantage does
> > that have over snooze?  
> 
> I evaluated snooze and stop0_lite, there is an additional ipi latency of
> a few microseconds in case of stop0_lite. So snooze cannot still be
> replaced by stop0_lite.

I meant the other way around. Replace stop0_lite with snooze.

So we would have snooze, stop0, stop2, and stop4 and/or 5.

Thanks,
Nick
Stewart Smith May 3, 2018, 10:03 a.m. | #4
Nicholas Piggin <npiggin@gmail.com> writes:
> On Thu, 3 May 2018 14:36:47 +0530
> Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> wrote:
>
>> On Tue, May 01, 2018 at 01:47:23PM +1000, Nicholas Piggin wrote:
>> > On Mon, 30 Apr 2018 14:42:08 +0530
>> > Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> wrote:
>> >   
>> > > Powersaving for stop0_lite and stop1_lite is observed to be quite similar
>> > > and both states resume without state loss. Using context_switch test [1]
>> > > we observe that stop0_lite has slightly lower latency, hence removing
>> > > stop1_lite.
>> > > 
>> > > [1] linux/tools/testing/selftests/powerpc/benchmarks/context_switch.c
>> > > 
>> > > Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>  
>> > 
>> > I'm okay for removing stop1_lite and stop2_lite -- SMT switching
>> > is very latency critical. If we decide to actually start saving
>> > real power then SMT should already have been switched.
>> > 
>> > So I would put stop1_lite and stop2_lite removal in the same patch.  
>> 
>> I can do this.
>> 
>> > 
>> > Then what do we have? stop0_lite, stop0, stop1 for our fast idle
>> > states.  
>> 
>> Currently we were looking at  stop0_lite , stop1 as the fast idle states
>> because stop0 and stop1 have similar latency and powersaving.
>> Having so many low latency states does not make sense.
>> 
>> > 
>> > I would be against removing stop0 if that is our fastest way to
>> > release SMT resources, even if there is only a small advantage. Why
>> > not remove stop1 instead?
>> >  
>> SMT-folding comes into picture only when we have at least one thread
>> running in the core. stop0 and stop1 has exactly same power-saving and
>> both will release SMT resources if at least one thread in the core is
>> running.
>
> Right, but you don't know that other threads are running or will remain
> running when you enter stop. If not, then latency is higher for stop1,
> no? So we need to be using stop0.
>
>> 
>> As soon as all threads are idle core enters stop0/stop1, where stop1
>> does a bit more powersaving than stop0.
>> 
>> > We also need to better evaluate stop0_lite. How much advantage does
>> > that have over snooze?  
>> 
>> I evaluated snooze and stop0_lite, there is an additional ipi latency of
>> a few microseconds in case of stop0_lite. So snooze cannot still be
>> replaced by stop0_lite.
>
> I meant the other way around. Replace stop0_lite with snooze.
>
> So we would have snooze, stop0, stop2, and stop4 and/or 5.

Slightly stupid question: should we be disabling these here or should
Linux be better and deciding what states to use?

I'm inclined to say this is a Linux problem as it should make the
decision of what hardware feature to used based on the ones OPAL says
*can* be used.

I'm also open to be being convinced otherwise though...
Nicholas Piggin May 3, 2018, 10:15 a.m. | #5
On Thu, 03 May 2018 20:03:55 +1000
Stewart Smith <stewart@linux.vnet.ibm.com> wrote:

> Nicholas Piggin <npiggin@gmail.com> writes:
> > On Thu, 3 May 2018 14:36:47 +0530
> > Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> wrote:
> >  
> >> On Tue, May 01, 2018 at 01:47:23PM +1000, Nicholas Piggin wrote:  
> >> > On Mon, 30 Apr 2018 14:42:08 +0530
> >> > Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> wrote:
> >> >     
> >> > > Powersaving for stop0_lite and stop1_lite is observed to be quite similar
> >> > > and both states resume without state loss. Using context_switch test [1]
> >> > > we observe that stop0_lite has slightly lower latency, hence removing
> >> > > stop1_lite.
> >> > > 
> >> > > [1] linux/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> >> > > 
> >> > > Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>    
> >> > 
> >> > I'm okay for removing stop1_lite and stop2_lite -- SMT switching
> >> > is very latency critical. If we decide to actually start saving
> >> > real power then SMT should already have been switched.
> >> > 
> >> > So I would put stop1_lite and stop2_lite removal in the same patch.    
> >> 
> >> I can do this.
> >>   
> >> > 
> >> > Then what do we have? stop0_lite, stop0, stop1 for our fast idle
> >> > states.    
> >> 
> >> Currently we were looking at  stop0_lite , stop1 as the fast idle states
> >> because stop0 and stop1 have similar latency and powersaving.
> >> Having so many low latency states does not make sense.
> >>   
> >> > 
> >> > I would be against removing stop0 if that is our fastest way to
> >> > release SMT resources, even if there is only a small advantage. Why
> >> > not remove stop1 instead?
> >> >    
> >> SMT-folding comes into picture only when we have at least one thread
> >> running in the core. stop0 and stop1 has exactly same power-saving and
> >> both will release SMT resources if at least one thread in the core is
> >> running.  
> >
> > Right, but you don't know that other threads are running or will remain
> > running when you enter stop. If not, then latency is higher for stop1,
> > no? So we need to be using stop0.
> >  
> >> 
> >> As soon as all threads are idle core enters stop0/stop1, where stop1
> >> does a bit more powersaving than stop0.
> >>   
> >> > We also need to better evaluate stop0_lite. How much advantage does
> >> > that have over snooze?    
> >> 
> >> I evaluated snooze and stop0_lite, there is an additional ipi latency of
> >> a few microseconds in case of stop0_lite. So snooze cannot still be
> >> replaced by stop0_lite.  
> >
> > I meant the other way around. Replace stop0_lite with snooze.
> >
> > So we would have snooze, stop0, stop2, and stop4 and/or 5.  
> 
> Slightly stupid question: should we be disabling these here or should
> Linux be better and deciding what states to use?

Yeah not a bad question, I don't have a good answer. I don't know how
smart Linux is at deciding what to use and when.

I am pretty sure the way we set our _lite states wrong -- we don't
want to go into stop2_lite as a deeper sleep state than stop0 for
example, because that then prevents SMT folding.

> 
> I'm inclined to say this is a Linux problem as it should make the
> decision of what hardware feature to used based on the ones OPAL says
> *can* be used.
> 
> I'm also open to be being convinced otherwise though...
> 

I would say we should manually decide what states we want, and then
work backwards and try to make the dt metadata reach that result
without fudging it too much. If we can't do that, then we should try
to improve the kernel so it can be made to work.

At some we may decide to trim the states by hand in skiboot just
so existing kernels work without so much fuss, and aim to do a bit
better with later devices.

Thanks,
Nick
Michael Ellerman May 4, 2018, 1:02 a.m. | #6
Stewart Smith <stewart@linux.vnet.ibm.com> writes:
...
>
> Slightly stupid question: should we be disabling these here or should
> Linux be better and deciding what states to use?
>
> I'm inclined to say this is a Linux problem as it should make the
> decision of what hardware feature to used based on the ones OPAL says
> *can* be used.

Yeah I agree.

Firmware shouldn't be implementing the policy around what states to use,
it should tell the operating system (which might be Linux) what states
are available and what their features are.

The exception to that would be that we have unfixable crash bugs in
existing kernels, in that case firmware might have to filter out states
that are known to cause those.

cheers
Stewart Smith May 6, 2018, 3:37 p.m. | #7
Michael Ellerman <mpe@ellerman.id.au> writes:
> Stewart Smith <stewart@linux.vnet.ibm.com> writes:
> ...
>>
>> Slightly stupid question: should we be disabling these here or should
>> Linux be better and deciding what states to use?
>>
>> I'm inclined to say this is a Linux problem as it should make the
>> decision of what hardware feature to used based on the ones OPAL says
>> *can* be used.
>
> Yeah I agree.
>
> Firmware shouldn't be implementing the policy around what states to use,
> it should tell the operating system (which might be Linux) what states
> are available and what their features are.

Yeah... I think I should work out somewhere to put this in the
documentation, a kind of design philosophy we can point back to.

> The exception to that would be that we have unfixable crash bugs in
> existing kernels, in that case firmware might have to filter out states
> that are known to cause those.

s/in/with/ and *cough* stop11 for example.
Akshay Adiga May 10, 2018, 8:59 a.m. | #8
On Thu, May 03, 2018 at 08:15:59PM +1000, Nicholas Piggin wrote:
> On Thu, 03 May 2018 20:03:55 +1000
> Stewart Smith <stewart@linux.vnet.ibm.com> wrote:
> 
> > Nicholas Piggin <npiggin@gmail.com> writes:
> > > On Thu, 3 May 2018 14:36:47 +0530
> > > Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> wrote:
> > >  
> > >> On Tue, May 01, 2018 at 01:47:23PM +1000, Nicholas Piggin wrote:  
> > >> > On Mon, 30 Apr 2018 14:42:08 +0530
> > >> > Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> wrote:
> > >> >     
> > >> > > Powersaving for stop0_lite and stop1_lite is observed to be quite similar
> > >> > > and both states resume without state loss. Using context_switch test [1]
> > >> > > we observe that stop0_lite has slightly lower latency, hence removing
> > >> > > stop1_lite.
> > >> > > 
> > >> > > [1] linux/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> > >> > > 
> > >> > > Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>    
> > >> > 
> > >> > I'm okay for removing stop1_lite and stop2_lite -- SMT switching
> > >> > is very latency critical. If we decide to actually start saving
> > >> > real power then SMT should already have been switched.
> > >> > 
> > >> > So I would put stop1_lite and stop2_lite removal in the same patch.    
> > >> 
> > >> I can do this.
> > >>   
> > >> > 
> > >> > Then what do we have? stop0_lite, stop0, stop1 for our fast idle
> > >> > states.    
> > >> 
> > >> Currently we were looking at  stop0_lite , stop1 as the fast idle states
> > >> because stop0 and stop1 have similar latency and powersaving.
> > >> Having so many low latency states does not make sense.
> > >>   
> > >> > 
> > >> > I would be against removing stop0 if that is our fastest way to
> > >> > release SMT resources, even if there is only a small advantage. Why
> > >> > not remove stop1 instead?
> > >> >    
> > >> SMT-folding comes into picture only when we have at least one thread
> > >> running in the core. stop0 and stop1 has exactly same power-saving and
> > >> both will release SMT resources if at least one thread in the core is
> > >> running.  
> > >
> > > Right, but you don't know that other threads are running or will remain
> > > running when you enter stop. If not, then latency is higher for stop1,
> > > no? So we need to be using stop0.
> > >  
> > >> 
> > >> As soon as all threads are idle core enters stop0/stop1, where stop1
> > >> does a bit more powersaving than stop0.
> > >>   
> > >> > We also need to better evaluate stop0_lite. How much advantage does
> > >> > that have over snooze?    
> > >> 
> > >> I evaluated snooze and stop0_lite, there is an additional ipi latency of
> > >> a few microseconds in case of stop0_lite. So snooze cannot still be
> > >> replaced by stop0_lite.  
> > >
> > > I meant the other way around. Replace stop0_lite with snooze.
> > >
> > > So we would have snooze, stop0, stop2, and stop4 and/or 5.  
> > 
> > Slightly stupid question: should we be disabling these here or should
> > Linux be better and deciding what states to use?
> 
> Yeah not a bad question, I don't have a good answer. I don't know how
> smart Linux is at deciding what to use and when.
> 
> I am pretty sure the way we set our _lite states wrong -- we don't
> want to go into stop2_lite as a deeper sleep state than stop0 for
> example, because that then prevents SMT folding.

I think we should keep both stop0 and stop1, i was not able to get
a good enough reason to remove stop0.

I a diffrent patch we need to tweak residencies so that we can bias
to more useful stop states.
Nicholas Piggin May 10, 2018, 10:02 a.m. | #9
On Thu, 10 May 2018 14:29:44 +0530
Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> wrote:

> On Thu, May 03, 2018 at 08:15:59PM +1000, Nicholas Piggin wrote:
> > On Thu, 03 May 2018 20:03:55 +1000
> > Stewart Smith <stewart@linux.vnet.ibm.com> wrote:
> >   
> > > Nicholas Piggin <npiggin@gmail.com> writes:  
> > > > On Thu, 3 May 2018 14:36:47 +0530
> > > > Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> wrote:
> > > >    
> > > >> On Tue, May 01, 2018 at 01:47:23PM +1000, Nicholas Piggin wrote:    
> > > >> > On Mon, 30 Apr 2018 14:42:08 +0530
> > > >> > Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> wrote:
> > > >> >       
> > > >> > > Powersaving for stop0_lite and stop1_lite is observed to be quite similar
> > > >> > > and both states resume without state loss. Using context_switch test [1]
> > > >> > > we observe that stop0_lite has slightly lower latency, hence removing
> > > >> > > stop1_lite.
> > > >> > > 
> > > >> > > [1] linux/tools/testing/selftests/powerpc/benchmarks/context_switch.c
> > > >> > > 
> > > >> > > Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>      
> > > >> > 
> > > >> > I'm okay for removing stop1_lite and stop2_lite -- SMT switching
> > > >> > is very latency critical. If we decide to actually start saving
> > > >> > real power then SMT should already have been switched.
> > > >> > 
> > > >> > So I would put stop1_lite and stop2_lite removal in the same patch.      
> > > >> 
> > > >> I can do this.
> > > >>     
> > > >> > 
> > > >> > Then what do we have? stop0_lite, stop0, stop1 for our fast idle
> > > >> > states.      
> > > >> 
> > > >> Currently we were looking at  stop0_lite , stop1 as the fast idle states
> > > >> because stop0 and stop1 have similar latency and powersaving.
> > > >> Having so many low latency states does not make sense.
> > > >>     
> > > >> > 
> > > >> > I would be against removing stop0 if that is our fastest way to
> > > >> > release SMT resources, even if there is only a small advantage. Why
> > > >> > not remove stop1 instead?
> > > >> >      
> > > >> SMT-folding comes into picture only when we have at least one thread
> > > >> running in the core. stop0 and stop1 has exactly same power-saving and
> > > >> both will release SMT resources if at least one thread in the core is
> > > >> running.    
> > > >
> > > > Right, but you don't know that other threads are running or will remain
> > > > running when you enter stop. If not, then latency is higher for stop1,
> > > > no? So we need to be using stop0.
> > > >    
> > > >> 
> > > >> As soon as all threads are idle core enters stop0/stop1, where stop1
> > > >> does a bit more powersaving than stop0.
> > > >>     
> > > >> > We also need to better evaluate stop0_lite. How much advantage does
> > > >> > that have over snooze?      
> > > >> 
> > > >> I evaluated snooze and stop0_lite, there is an additional ipi latency of
> > > >> a few microseconds in case of stop0_lite. So snooze cannot still be
> > > >> replaced by stop0_lite.    
> > > >
> > > > I meant the other way around. Replace stop0_lite with snooze.
> > > >
> > > > So we would have snooze, stop0, stop2, and stop4 and/or 5.    
> > > 
> > > Slightly stupid question: should we be disabling these here or should
> > > Linux be better and deciding what states to use?  
> > 
> > Yeah not a bad question, I don't have a good answer. I don't know how
> > smart Linux is at deciding what to use and when.
> > 
> > I am pretty sure the way we set our _lite states wrong -- we don't
> > want to go into stop2_lite as a deeper sleep state than stop0 for
> > example, because that then prevents SMT folding.  
> 
> I think we should keep both stop0 and stop1, i was not able to get
> a good enough reason to remove stop0.
> 
> I a diffrent patch we need to tweak residencies so that we can bias
> to more useful stop states.
> 

Well I would say stop1_lite and stop2_lite should not be used if we
have stop0_lite, particularly they should not be used after stop0 in
the case that we have SMT enabled. If SMT is disabled then possibly
we could use the _lite states instead.

So we need some way to advertise and respond to that in Linux.

After that I guess it's a matter of measuring and tuning the others.
I think between snooze and stop0, it is an open question whether it
is worth using stop0_lite.

Thanks,
Nick

Patch

diff --git a/hw/slw.c b/hw/slw.c
index 3f9abaa..edfc783 100644
--- a/hw/slw.c
+++ b/hw/slw.c
@@ -521,36 +521,6 @@  static struct cpu_idle_states power9_cpu_idle_states[] = {
 				 | OPAL_PM_PSSCR_TR(3),
 		.pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK },
 	{
-		.name = "stop0",
-		.latency_ns = 2000,
-		.residency_ns = 20000,
-		.flags = 0*OPAL_PM_DEC_STOP \
-		       | 0*OPAL_PM_TIMEBASE_STOP  \
-		       | 1*OPAL_PM_LOSE_USER_CONTEXT \
-		       | 0*OPAL_PM_LOSE_HYP_CONTEXT \
-		       | 0*OPAL_PM_LOSE_FULL_CONTEXT \
-		       | 1*OPAL_PM_STOP_INST_FAST,
-		.pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(0) \
-				 | OPAL_PM_PSSCR_MTL(3) \
-				 | OPAL_PM_PSSCR_TR(3) \
-				 | OPAL_PM_PSSCR_ESL \
-				 | OPAL_PM_PSSCR_EC,
-		.pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK },
-	{
-		.name = "stop1_lite", /* Enter stop1 with no state loss */
-		.latency_ns = 4900,
-		.residency_ns = 49000,
-		.flags = 0*OPAL_PM_DEC_STOP \
-		       | 0*OPAL_PM_TIMEBASE_STOP  \
-		       | 0*OPAL_PM_LOSE_USER_CONTEXT \
-		       | 0*OPAL_PM_LOSE_HYP_CONTEXT \
-		       | 0*OPAL_PM_LOSE_FULL_CONTEXT \
-		       | 1*OPAL_PM_STOP_INST_FAST,
-		.pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(1) \
-				 | OPAL_PM_PSSCR_MTL(3) \
-				 | OPAL_PM_PSSCR_TR(3),
-		.pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK },
-	{
 		.name = "stop1",
 		.latency_ns = 5000,
 		.residency_ns = 50000,