
cpuidle: powernv/pseries: Decrease the snooze residency

Message ID 1432902728-31476-1-git-send-email-shilpa.bhat@linux.vnet.ibm.com (mailing list archive)
State Not Applicable

Commit Message

Shilpasri G Bhat May 29, 2015, 12:32 p.m. UTC
Idle cpus which stay in snooze for a long period can degrade the
performance of the sibling cpus. If the cpu stays in snooze for longer
than the target residency of the next available idle state, then exit
from snooze. This gives the cpuidle governor a chance to re-evaluate the
last idle state of the cpu and promote it to deeper idle states.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
---
 drivers/cpuidle/cpuidle-powernv.c | 12 ++++++++++++
 drivers/cpuidle/cpuidle-pseries.c | 11 +++++++++++
 2 files changed, 23 insertions(+)
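
To make the timeout computation concrete: both probe functions convert the
next idle state's target residency (in microseconds) into timebase ticks, so
that the snooze loop can compare it directly against get_tb(). Below is a
minimal standalone sketch of that arithmetic, using hypothetical numbers (a
512 MHz timebase and a 100 us target residency) that are not taken from the
patch:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Hypothetical example values: a 512 MHz timebase gives
	 * 512 timebase ticks per microsecond. */
	uint64_t tb_ticks_per_usec = 512;
	uint64_t target_residency_us = 100;	/* next idle state after snooze */

	/* Same computation the patch does in the probe functions. */
	uint64_t snooze_timeout = target_residency_us * tb_ticks_per_usec;

	/* snooze_loop() breaks out once get_tb() passes
	 * entry time + snooze_timeout. */
	printf("snooze_timeout = %llu timebase ticks (= %llu us)\n",
	       (unsigned long long)snooze_timeout,
	       (unsigned long long)target_residency_us);
	return 0;
}

With these example numbers the cpu would poll in snooze for at most ~100 us
(51200 timebase ticks) before breaking out and giving the governor a chance
to pick a deeper state.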

Comments

Preeti U Murthy May 29, 2015, 1:47 p.m. UTC | #1
Hi Shilpa,

The subject does not convey the purpose of this patch clearly IMO.
I would definitely suggest changing the subject to something like
"Auto promotion of snooze to deeper idle state" or similar.

On 05/29/2015 06:02 PM, Shilpasri G Bhat wrote:
> Idle cpus which stay in snooze for a long period can degrade the
> performance of the sibling cpus. If the cpu stays in snooze for longer
> than the target residency of the next available idle state, then exit
> from snooze. This gives the cpuidle governor a chance to re-evaluate the
> last idle state of the cpu and promote it to deeper idle states.
> 
> Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
> ---
>  drivers/cpuidle/cpuidle-powernv.c | 12 ++++++++++++
>  drivers/cpuidle/cpuidle-pseries.c | 11 +++++++++++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
> index bb9e2b6..07135e0 100644
> --- a/drivers/cpuidle/cpuidle-pseries.c
> +++ b/drivers/cpuidle/cpuidle-pseries.c
> @@ -27,6 +27,8 @@ struct cpuidle_driver pseries_idle_driver = {
>  
>  static int max_idle_state;
>  static struct cpuidle_state *cpuidle_state_table;
> +static u64 snooze_timeout;
> +static bool snooze_timeout_en;
>  
>  static inline void idle_loop_prolog(unsigned long *in_purr)
>  {
> @@ -58,14 +60,18 @@ static int snooze_loop(struct cpuidle_device *dev,
>  			int index)
>  {
>  	unsigned long in_purr;
> +	u64 snooze_exit_time;
>  
>  	idle_loop_prolog(&in_purr);
>  	local_irq_enable();
>  	set_thread_flag(TIF_POLLING_NRFLAG);
> +	snooze_exit_time = get_tb() + snooze_timeout;
>  
>  	while (!need_resched()) {
>  		HMT_low();
>  		HMT_very_low();
> +		if (snooze_timeout_en && get_tb() > snooze_exit_time)
> +			break;
>  	}
>  
>  	HMT_medium();
> @@ -244,6 +250,11 @@ static int pseries_idle_probe(void)
>  	} else
>  		return -ENODEV;
>  
> +	if (max_idle_state > 1) {
> +		snooze_timeout_en = true;
> +		snooze_timeout = cpuidle_state_table[1].target_residency *
> +				 tb_ticks_per_usec;
> +	}

Any idea why we don't have snooze defined on the shared lpar configuration ?

Regards
Preeti U Murthy
>  	return 0;
>  }
>  
>
Vaidyanathan Srinivasan May 30, 2015, 6:01 a.m. UTC | #2
* Preeti U Murthy <preeti@linux.vnet.ibm.com> [2015-05-29 19:17:17]:

[snip]

> > +	if (max_idle_state > 1) {
> > +		snooze_timeout_en = true;
> > +		snooze_timeout = cpuidle_state_table[1].target_residency *
> > +				 tb_ticks_per_usec;
> > +	}
> 
> Any idea why we don't have snooze defined on the shared lpar configuration ?

In shared lpar case, spinning in guest context may potentially take
away cycles from other lpars waiting to run on the same physical cpu.

So the policy in shared lpar case is to let PowerVM hypervisor know
immediately that the guest cpu is idle which will allow the hypervisor
to use the cycles for other tasks/lpars.
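
For reference, the state-table selection in pseries_idle_probe() looked
roughly like the sketch below at the time (paraphrased and trimmed from the
mainline driver, not part of this patch); shared_states intentionally
contains only the CEDE entry and no snooze state:

static int pseries_idle_probe(void)
{
	if (cpuidle_disable != IDLE_NO_OVERRIDE)
		return -ENODEV;

	if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
		if (lppaca_shared_proc(get_lppaca())) {
			/* Shared LPAR: cede to the hypervisor right away,
			 * so no polling snooze state is offered. */
			cpuidle_state_table = shared_states;
			max_idle_state = ARRAY_SIZE(shared_states);
		} else {
			/* Dedicated LPAR: snooze first, then CEDE. */
			cpuidle_state_table = dedicated_states;
			max_idle_state = ARRAY_SIZE(dedicated_states);
		}
	} else
		return -ENODEV;

	return 0;
}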

--Vaidy
Preeti U Murthy May 30, 2015, 8 a.m. UTC | #3
On 05/30/2015 11:31 AM, Vaidyanathan Srinivasan wrote:
> * Preeti U Murthy <preeti@linux.vnet.ibm.com> [2015-05-29 19:17:17]:
> 
> [snip]
> 
>>> +	if (max_idle_state > 1) {
>>> +		snooze_timeout_en = true;
>>> +		snooze_timeout = cpuidle_state_table[1].target_residency *
>>> +				 tb_ticks_per_usec;
>>> +	}
>>
>> Any idea why we don't have snooze defined on the shared lpar configuration ?
> 
> In shared lpar case, spinning in guest context may potentially take
> away cycles from other lpars waiting to run on the same physical cpu.
> 
> So the policy in shared lpar case is to let PowerVM hypervisor know
> immediately that the guest cpu is idle which will allow the hypervisor
> to use the cycles for other tasks/lpars.
> 

Oh Ok! Thanks for the clarification!

Regards
Preeti U Murthy
> --Vaidy
>
Benjamin Herrenschmidt May 31, 2015, 1:38 a.m. UTC | #4
On Sat, 2015-05-30 at 11:31 +0530, Vaidyanathan Srinivasan wrote:
> In shared lpar case, spinning in guest context may potentially take
> away cycles from other lpars waiting to run on the same physical cpu.
> 
> So the policy in shared lpar case is to let PowerVM hypervisor know
> immediately that the guest cpu is idle which will allow the hypervisor
> to use the cycles for other tasks/lpars.

But that will have negative side effects under KVM no ?

Suresh mentioned something with his new directed interrupts code that we
had many cases where the interrupts ended up arriving shortly after we
exited to host for NAP'ing ...

Snooze might fix it...

Ben.
Vaidyanathan Srinivasan June 3, 2015, 3:35 p.m. UTC | #5
* Benjamin Herrenschmidt <benh@au1.ibm.com> [2015-05-30 20:38:22]:

> On Sat, 2015-05-30 at 11:31 +0530, Vaidyanathan Srinivasan wrote:
> > In shared lpar case, spinning in guest context may potentially take
> > away cycles from other lpars waiting to run on the same physical cpu.
> > 
> > So the policy in shared lpar case is to let PowerVM hypervisor know
> > immediately that the guest cpu is idle which will allow the hypervisor
> > to use the cycles for other tasks/lpars.
> 
> But that will have negative side effects under KVM no ?

Yes, you have a good point.  If one of the threads in the core goes to
cede, it can still come back quickly since the KVM guest context is
not switched yet.  But in a single-threaded guest, this can force an
unnecessary exit/context switch overhead.

Now that we have fixed the snooze loop to be bounded and to exit
predictably, a KVM guest should actually use the snooze state to improve
latency.

I will test this scenario and enable the snooze state for KVM guests.
 
> Suresh mentioned something with his new directed interrupts code that we
> had many cases where the interrupts ended up arriving shortly after we
> exited to host for NAP'ing ...
> 
> Snooze might fix it...

Right.  This scenario is worth experimenting with, and then we can
introduce the snooze loop for the guest.

--Vaidy

Patch

diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 5937207..1e3ef5e 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -29,18 +29,25 @@  struct cpuidle_driver powernv_idle_driver = {
 
 static int max_idle_state;
 static struct cpuidle_state *cpuidle_state_table;
+static u64 snooze_timeout;
+static bool snooze_timeout_en;
 
 static int snooze_loop(struct cpuidle_device *dev,
 			struct cpuidle_driver *drv,
 			int index)
 {
+	u64 snooze_exit_time;
+
 	local_irq_enable();
 	set_thread_flag(TIF_POLLING_NRFLAG);
 
+	snooze_exit_time = get_tb() + snooze_timeout;
 	ppc64_runlatch_off();
 	while (!need_resched()) {
 		HMT_low();
 		HMT_very_low();
+		if (snooze_timeout_en && get_tb() > snooze_exit_time)
+			break;
 	}
 
 	HMT_medium();
@@ -252,6 +259,11 @@  static int powernv_idle_probe(void)
 		cpuidle_state_table = powernv_states;
 		/* Device tree can indicate more idle states */
 		max_idle_state = powernv_add_idle_states();
+		if (max_idle_state > 1) {
+			snooze_timeout_en = true;
+			snooze_timeout = powernv_states[1].target_residency *
+					 tb_ticks_per_usec;
+		}
  	} else
  		return -ENODEV;
 
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index bb9e2b6..07135e0 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -27,6 +27,8 @@  struct cpuidle_driver pseries_idle_driver = {
 
 static int max_idle_state;
 static struct cpuidle_state *cpuidle_state_table;
+static u64 snooze_timeout;
+static bool snooze_timeout_en;
 
 static inline void idle_loop_prolog(unsigned long *in_purr)
 {
@@ -58,14 +60,18 @@  static int snooze_loop(struct cpuidle_device *dev,
 			int index)
 {
 	unsigned long in_purr;
+	u64 snooze_exit_time;
 
 	idle_loop_prolog(&in_purr);
 	local_irq_enable();
 	set_thread_flag(TIF_POLLING_NRFLAG);
+	snooze_exit_time = get_tb() + snooze_timeout;
 
 	while (!need_resched()) {
 		HMT_low();
 		HMT_very_low();
+		if (snooze_timeout_en && get_tb() > snooze_exit_time)
+			break;
 	}
 
 	HMT_medium();
@@ -244,6 +250,11 @@  static int pseries_idle_probe(void)
 	} else
 		return -ENODEV;
 
+	if (max_idle_state > 1) {
+		snooze_timeout_en = true;
+		snooze_timeout = cpuidle_state_table[1].target_residency *
+				 tb_ticks_per_usec;
+	}
 	return 0;
 }