Message ID | 1527063458-16710-1-git-send-email-akshay.adiga@linux.vnet.ibm.com |
---|---|
State | Accepted |
Headers | show |
Series | [v2] SLW: Remove stop1_lite and stop2_lite | expand |
* Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> [2018-05-23 13:47:38]: Lite stop states are designed to stop dispatch of instructions but not relinquish thread resources. They primarily reduce switching power with a benefit of resuming operation without any state loss. This could save upto a micro-second in wakeup latency. However there is a down side to this. Since thread resources are not relinquished, other running threads in the core cannot benefit from automatically folding down to lower SMT mode. We have stop0_lite and stop1_lite that are pretty close in operational characteristics, hence we can choose to keep only stop0_lite and remove stop1_lite. However, stop2_lite provides some minimal benefits over stop2. But since the wakeup latency of stop2 to re-clock the core is 10-20us, the context preservation benefit is reduced. This patch removes all stop_lite states except stop0_lite. We should re-introduce stop2_lite once kernel cpuidle governor can understand SMT folding thresholds and auto promote to a non-lite state. > Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> > --- > hw/slw.c | 36 ++++++++---------------------------- > 1 file changed, 8 insertions(+), 28 deletions(-) > > diff --git a/hw/slw.c b/hw/slw.c > index 2b305db..ff6e2a4 100644 > --- a/hw/slw.c > +++ b/hw/slw.c > @@ -529,20 +529,9 @@ static struct cpu_idle_states power9_cpu_idle_states[] = { > | OPAL_PM_PSSCR_ESL \ > | OPAL_PM_PSSCR_EC, > .pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK }, > - { > - .name = "stop1_lite", /* Enter stop1 with no state loss */ > - .latency_ns = 4900, > - .residency_ns = 49000, > - .flags = 0*OPAL_PM_DEC_STOP \ > - | 0*OPAL_PM_TIMEBASE_STOP \ > - | 0*OPAL_PM_LOSE_USER_CONTEXT \ > - | 0*OPAL_PM_LOSE_HYP_CONTEXT \ > - | 0*OPAL_PM_LOSE_FULL_CONTEXT \ > - | 1*OPAL_PM_STOP_INST_FAST, > - .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(1) \ > - | OPAL_PM_PSSCR_MTL(3) \ > - | OPAL_PM_PSSCR_TR(3), > - .pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK }, > + > + /* stop1_lite has been removed since it adds no additional benefit over stop0_lite */ > + > { > .name = "stop1", > .latency_ns = 5000, > @@ -559,20 +548,11 @@ static struct cpu_idle_states power9_cpu_idle_states[] = { > | OPAL_PM_PSSCR_ESL \ > | OPAL_PM_PSSCR_EC, > .pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK }, > - { > - .name = "stop2_lite", /* Enter stop2 with no state loss */ > - .latency_ns = 9900, > - .residency_ns = 99000, > - .flags = 0*OPAL_PM_DEC_STOP \ > - | 0*OPAL_PM_TIMEBASE_STOP \ > - | 0*OPAL_PM_LOSE_USER_CONTEXT \ > - | 0*OPAL_PM_LOSE_HYP_CONTEXT \ > - | 0*OPAL_PM_LOSE_FULL_CONTEXT \ > - | 1*OPAL_PM_STOP_INST_FAST, > - .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(2) \ > - | OPAL_PM_PSSCR_MTL(3) \ > - | OPAL_PM_PSSCR_TR(3), > - .pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK }, > + /* > + * stop2_lite has been removed since currently it adds minimal benefit over stop2. > + * However, the benefit is eclipsed by the time required to ungate the clocks > + */ > + > { > .name = "stop2", > .latency_ns = 10000, > -- > 2.5.5 >
On Wed, 23 May 2018 14:49:29 +0530 Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> wrote: > * Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> [2018-05-23 13:47:38]: > > Lite stop states are designed to stop dispatch of instructions but not > relinquish thread resources. They primarily reduce switching power > with a benefit of resuming operation without any state loss. This > could save upto a micro-second in wakeup latency. However there is > a down side to this. Since thread resources are not relinquished, > other running threads in the core cannot benefit from automatically > folding down to lower SMT mode. > > We have stop0_lite and stop1_lite that are pretty close in operational > characteristics, hence we can choose to keep only stop0_lite and > remove stop1_lite. > > However, stop2_lite provides some minimal benefits over stop2. But > since the wakeup latency of stop2 to re-clock the core is 10-20us, the > context preservation benefit is reduced. > > This patch removes all stop_lite states except stop0_lite. We should > re-introduce stop2_lite once kernel cpuidle governor can understand > SMT folding thresholds and auto promote to a non-lite state. > > > Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> > > Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
On Wed, May 23, 2018 at 6:17 PM, Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> wrote: > stop1_lite has been removed since it adds no additional benefit > over stop0_lite. stop2_lite has been removed since currently it adds > minimal benefit over stop2. However, the benefit is eclipsed by the time > required to ungate the clocks > > Moreover, Lite states don't give up the SMT resources, can potentially > have a performance impact on sibling threads. > > Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> > --- > hw/slw.c | 36 ++++++++---------------------------- > 1 file changed, 8 insertions(+), 28 deletions(-) > > diff --git a/hw/slw.c b/hw/slw.c > index 2b305db..ff6e2a4 100644 > --- a/hw/slw.c > +++ b/hw/slw.c > @@ -529,20 +529,9 @@ static struct cpu_idle_states power9_cpu_idle_states[] = { > | OPAL_PM_PSSCR_ESL \ > | OPAL_PM_PSSCR_EC, > .pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK }, > - { > - .name = "stop1_lite", /* Enter stop1 with no state loss */ > - .latency_ns = 4900, > - .residency_ns = 49000, How did we come up with the latency and residency values for stop1_lite, stop2_lite? It sounds like the values are roughtly 2% of the corresponding non lite states? Are these values empirical, guesses, retrofitted to the OS? Balbir Singh
Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> writes: > stop1_lite has been removed since it adds no additional benefit > over stop0_lite. stop2_lite has been removed since currently it adds > minimal benefit over stop2. However, the benefit is eclipsed by the time > required to ungate the clocks > > Moreover, Lite states don't give up the SMT resources, can potentially > have a performance impact on sibling threads. > > Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> > --- > hw/slw.c | 36 ++++++++---------------------------- > 1 file changed, 8 insertions(+), 28 deletions(-) As discussed on phone, I've merged this into master as of 34e9c3c1edb3eed02f428f9cbf97d99b3db43d4d and we can work on getting things back and having smarter kernel and firmware over the next few months. I gather this should also head to stable so that it ends up in customer hands in a timely fashion?
* Stewart Smith <stewart@linux.vnet.ibm.com> [2018-05-25 10:26:21]: > Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> writes: > > stop1_lite has been removed since it adds no additional benefit > > over stop0_lite. stop2_lite has been removed since currently it adds > > minimal benefit over stop2. However, the benefit is eclipsed by the time > > required to ungate the clocks > > > > Moreover, Lite states don't give up the SMT resources, can potentially > > have a performance impact on sibling threads. > > > > Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> > > --- > > hw/slw.c | 36 ++++++++---------------------------- > > 1 file changed, 8 insertions(+), 28 deletions(-) > > As discussed on phone, I've merged this into master as of > 34e9c3c1edb3eed02f428f9cbf97d99b3db43d4d and we can work on getting > things back and having smarter kernel and firmware over the next few > months. > > I gather this should also head to stable so that it ends up in customer > hands in a timely fashion? Hi Stewart, Thanks a lot. Yes, once the kernel gets smarter we should add some of these back. This should go to stable for P9 customers and builds. Not needed for older stable branches of skiboot. --Vaidy
* Balbir Singh <bsingharora@gmail.com> [2018-05-23 21:37:44]: > On Wed, May 23, 2018 at 6:17 PM, Akshay Adiga > <akshay.adiga@linux.vnet.ibm.com> wrote: > > stop1_lite has been removed since it adds no additional benefit > > over stop0_lite. stop2_lite has been removed since currently it adds > > minimal benefit over stop2. However, the benefit is eclipsed by the time > > required to ungate the clocks > > > > Moreover, Lite states don't give up the SMT resources, can potentially > > have a performance impact on sibling threads. > > > > Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> > > --- > > hw/slw.c | 36 ++++++++---------------------------- > > 1 file changed, 8 insertions(+), 28 deletions(-) > > > > diff --git a/hw/slw.c b/hw/slw.c > > index 2b305db..ff6e2a4 100644 > > --- a/hw/slw.c > > +++ b/hw/slw.c > > @@ -529,20 +529,9 @@ static struct cpu_idle_states power9_cpu_idle_states[] = { > > | OPAL_PM_PSSCR_ESL \ > > | OPAL_PM_PSSCR_EC, > > .pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK }, > > - { > > - .name = "stop1_lite", /* Enter stop1 with no state loss */ > > - .latency_ns = 4900, > > - .residency_ns = 49000, > > How did we come up with the latency and residency values for > stop1_lite, stop2_lite? It sounds like the values are roughtly 2% of > the corresponding non lite states? Are these values empirical, > guesses, retrofitted to the OS? These values are based on real latency, but bumped up to fit into kernel with sufficient space among other stop states to make an independent selection. It is a mix of measurements and heuristics. This patch reduces the number of stop states so that we keep the ones that make a difference. This will help us tune these number better. --Vaidy
On 25/05/18 14:36, Vaidyanathan Srinivasan wrote: > * Balbir Singh <bsingharora@gmail.com> [2018-05-23 21:37:44]: > >> On Wed, May 23, 2018 at 6:17 PM, Akshay Adiga >> <akshay.adiga@linux.vnet.ibm.com> wrote: >>> stop1_lite has been removed since it adds no additional benefit >>> over stop0_lite. stop2_lite has been removed since currently it adds >>> minimal benefit over stop2. However, the benefit is eclipsed by the time >>> required to ungate the clocks >>> >>> Moreover, Lite states don't give up the SMT resources, can potentially >>> have a performance impact on sibling threads. >>> >>> Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> >>> --- >>> hw/slw.c | 36 ++++++++---------------------------- >>> 1 file changed, 8 insertions(+), 28 deletions(-) >>> >>> diff --git a/hw/slw.c b/hw/slw.c >>> index 2b305db..ff6e2a4 100644 >>> --- a/hw/slw.c >>> +++ b/hw/slw.c >>> @@ -529,20 +529,9 @@ static struct cpu_idle_states power9_cpu_idle_states[] = { >>> | OPAL_PM_PSSCR_ESL \ >>> | OPAL_PM_PSSCR_EC, >>> .pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK }, >>> - { >>> - .name = "stop1_lite", /* Enter stop1 with no state loss */ >>> - .latency_ns = 4900, >>> - .residency_ns = 49000, >> >> How did we come up with the latency and residency values for >> stop1_lite, stop2_lite? It sounds like the values are roughtly 2% of >> the corresponding non lite states? Are these values empirical, >> guesses, retrofitted to the OS? > > These values are based on real latency, but bumped up to fit into > kernel with sufficient space among other stop states to make an > independent selection. It is a mix of measurements and heuristics. > This patch reduces the number of stop states so that we keep the ones > that make a difference. This will help us tune these number better. > OK, so the next question is why did we take out the stop lite states and not the stop states? Is there a huge saving with the actual stop states compared to the lite states? IOW, can we get similar numbers without the state loss? Balbir Singh.
On Mon, May 28, 2018 at 10:35 AM, Balbir Singh <bsingharora@gmail.com> wrote: > > > > On 25/05/18 14:36, Vaidyanathan Srinivasan wrote: > > * Balbir Singh <bsingharora@gmail.com> [2018-05-23 21:37:44]: > > > >> On Wed, May 23, 2018 at 6:17 PM, Akshay Adiga > >> <akshay.adiga@linux.vnet.ibm.com> wrote: > >>> stop1_lite has been removed since it adds no additional benefit > >>> over stop0_lite. stop2_lite has been removed since currently it adds > >>> minimal benefit over stop2. However, the benefit is eclipsed by the time > >>> required to ungate the clocks > >>> > >>> Moreover, Lite states don't give up the SMT resources, can potentially > >>> have a performance impact on sibling threads. > >>> > >>> Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> > >>> --- > >>> hw/slw.c | 36 ++++++++---------------------------- > >>> 1 file changed, 8 insertions(+), 28 deletions(-) > >>> > >>> diff --git a/hw/slw.c b/hw/slw.c > >>> index 2b305db..ff6e2a4 100644 > >>> --- a/hw/slw.c > >>> +++ b/hw/slw.c > >>> @@ -529,20 +529,9 @@ static struct cpu_idle_states power9_cpu_idle_states[] = { > >>> | OPAL_PM_PSSCR_ESL \ > >>> | OPAL_PM_PSSCR_EC, > >>> .pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK }, > >>> - { > >>> - .name = "stop1_lite", /* Enter stop1 with no state loss */ > >>> - .latency_ns = 4900, > >>> - .residency_ns = 49000, > >> > >> How did we come up with the latency and residency values for > >> stop1_lite, stop2_lite? It sounds like the values are roughtly 2% of > >> the corresponding non lite states? Are these values empirical, > >> guesses, retrofitted to the OS? > > > > These values are based on real latency, but bumped up to fit into > > kernel with sufficient space among other stop states to make an > > independent selection. It is a mix of measurements and heuristics. > > This patch reduces the number of stop states so that we keep the ones > > that make a difference. This will help us tune these number better. > > > > OK, so the next question is why did we take out the stop lite states and not > the stop states? Is there a huge saving with the actual stop states compared > to the lite states? IOW, can we get similar numbers without the state loss? Never mind, I just re-read the changelog about SMT resources Balbir Singh.
Balbir Singh <bsingharora@gmail.com> writes: > On Mon, May 28, 2018 at 10:35 AM, Balbir Singh <bsingharora@gmail.com> wrote: >> On 25/05/18 14:36, Vaidyanathan Srinivasan wrote: >> > * Balbir Singh <bsingharora@gmail.com> [2018-05-23 21:37:44]: >> > >> >> On Wed, May 23, 2018 at 6:17 PM, Akshay Adiga >> >> <akshay.adiga@linux.vnet.ibm.com> wrote: >> >>> stop1_lite has been removed since it adds no additional benefit >> >>> over stop0_lite. stop2_lite has been removed since currently it adds >> >>> minimal benefit over stop2. However, the benefit is eclipsed by the time >> >>> required to ungate the clocks >> >>> >> >>> Moreover, Lite states don't give up the SMT resources, can potentially >> >>> have a performance impact on sibling threads. >> >>> >> >>> Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> >> >>> --- >> >>> hw/slw.c | 36 ++++++++---------------------------- >> >>> 1 file changed, 8 insertions(+), 28 deletions(-) >> >>> >> >>> diff --git a/hw/slw.c b/hw/slw.c >> >>> index 2b305db..ff6e2a4 100644 >> >>> --- a/hw/slw.c >> >>> +++ b/hw/slw.c >> >>> @@ -529,20 +529,9 @@ static struct cpu_idle_states power9_cpu_idle_states[] = { >> >>> | OPAL_PM_PSSCR_ESL \ >> >>> | OPAL_PM_PSSCR_EC, >> >>> .pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK }, >> >>> - { >> >>> - .name = "stop1_lite", /* Enter stop1 with no state loss */ >> >>> - .latency_ns = 4900, >> >>> - .residency_ns = 49000, >> >> >> >> How did we come up with the latency and residency values for >> >> stop1_lite, stop2_lite? It sounds like the values are roughtly 2% of >> >> the corresponding non lite states? Are these values empirical, >> >> guesses, retrofitted to the OS? >> > >> > These values are based on real latency, but bumped up to fit into >> > kernel with sufficient space among other stop states to make an >> > independent selection. It is a mix of measurements and heuristics. >> > This patch reduces the number of stop states so that we keep the ones >> > that make a difference. This will help us tune these number better. >> > >> >> OK, so the next question is why did we take out the stop lite states and not >> the stop states? Is there a huge saving with the actual stop states compared >> to the lite states? IOW, can we get similar numbers without the state loss? > > Never mind, I just re-read the changelog about SMT resources Yeah, and it turns out we should both inform the kernel about that *and* fix the kernel to take it into consideration.
Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> writes: > * Stewart Smith <stewart@linux.vnet.ibm.com> [2018-05-25 10:26:21]: > >> Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> writes: >> > stop1_lite has been removed since it adds no additional benefit >> > over stop0_lite. stop2_lite has been removed since currently it adds >> > minimal benefit over stop2. However, the benefit is eclipsed by the time >> > required to ungate the clocks >> > >> > Moreover, Lite states don't give up the SMT resources, can potentially >> > have a performance impact on sibling threads. >> > >> > Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> >> > --- >> > hw/slw.c | 36 ++++++++---------------------------- >> > 1 file changed, 8 insertions(+), 28 deletions(-) >> >> As discussed on phone, I've merged this into master as of >> 34e9c3c1edb3eed02f428f9cbf97d99b3db43d4d and we can work on getting >> things back and having smarter kernel and firmware over the next few >> months. >> >> I gather this should also head to stable so that it ends up in customer >> hands in a timely fashion? > > Hi Stewart, > > Thanks a lot. Yes, once the kernel gets smarter we should add some of > these back. > > This should go to stable for P9 customers and builds. Not needed for > older stable branches of skiboot. Ack. Cherry-picked into 6.0.x as of cc52c56200956485aee67cb933b2a3d0132cc7fd, and I'll go tag a 6.0.4 with it now.
diff --git a/hw/slw.c b/hw/slw.c index 2b305db..ff6e2a4 100644 --- a/hw/slw.c +++ b/hw/slw.c @@ -529,20 +529,9 @@ static struct cpu_idle_states power9_cpu_idle_states[] = { | OPAL_PM_PSSCR_ESL \ | OPAL_PM_PSSCR_EC, .pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK }, - { - .name = "stop1_lite", /* Enter stop1 with no state loss */ - .latency_ns = 4900, - .residency_ns = 49000, - .flags = 0*OPAL_PM_DEC_STOP \ - | 0*OPAL_PM_TIMEBASE_STOP \ - | 0*OPAL_PM_LOSE_USER_CONTEXT \ - | 0*OPAL_PM_LOSE_HYP_CONTEXT \ - | 0*OPAL_PM_LOSE_FULL_CONTEXT \ - | 1*OPAL_PM_STOP_INST_FAST, - .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(1) \ - | OPAL_PM_PSSCR_MTL(3) \ - | OPAL_PM_PSSCR_TR(3), - .pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK }, + + /* stop1_lite has been removed since it adds no additional benefit over stop0_lite */ + { .name = "stop1", .latency_ns = 5000, @@ -559,20 +548,11 @@ static struct cpu_idle_states power9_cpu_idle_states[] = { | OPAL_PM_PSSCR_ESL \ | OPAL_PM_PSSCR_EC, .pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK }, - { - .name = "stop2_lite", /* Enter stop2 with no state loss */ - .latency_ns = 9900, - .residency_ns = 99000, - .flags = 0*OPAL_PM_DEC_STOP \ - | 0*OPAL_PM_TIMEBASE_STOP \ - | 0*OPAL_PM_LOSE_USER_CONTEXT \ - | 0*OPAL_PM_LOSE_HYP_CONTEXT \ - | 0*OPAL_PM_LOSE_FULL_CONTEXT \ - | 1*OPAL_PM_STOP_INST_FAST, - .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(2) \ - | OPAL_PM_PSSCR_MTL(3) \ - | OPAL_PM_PSSCR_TR(3), - .pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK }, + /* + * stop2_lite has been removed since currently it adds minimal benefit over stop2. + * However, the benefit is eclipsed by the time required to ungate the clocks + */ + { .name = "stop2", .latency_ns = 10000,
stop1_lite has been removed since it adds no additional benefit over stop0_lite. stop2_lite has been removed since currently it adds minimal benefit over stop2. However, the benefit is eclipsed by the time required to ungate the clocks Moreover, Lite states don't give up the SMT resources, can potentially have a performance impact on sibling threads. Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> --- hw/slw.c | 36 ++++++++---------------------------- 1 file changed, 8 insertions(+), 28 deletions(-)