
[RFC,v2,4/4] cpuidle: (POWER) Handle power_save=off

Message ID 20111117112906.9191.54050.stgit@localhost6.localdomain6 (mailing list archive)
State Changes Requested
Delegated to: Benjamin Herrenschmidt

Commit Message

Deepthi Dharwar Nov. 17, 2011, 11:29 a.m. UTC
This patch prevents pseries_idle_driver from being registered when the
power_save=off kernel boot option is specified. The
boot_option_idle_override variable is used here in the same way as
on x86.
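A minimal userspace sketch of the check this patch adds, assuming (as the diff implies) that the boot-option parser sets boot_option_idle_override to IDLE_POWERSAVE_OFF. The names parse_power_save_option() and sketch_idle_probe(), and the has_splpar flag, are hypothetical stand-ins for the kernel's __setup() handler, pseries_idle_probe() and firmware_has_feature(FW_FEATURE_SPLPAR); the earlier -EPERM check in the real probe is omitted.

```c
#include <errno.h>
#include <string.h>

/* Mirrors the enum declared in asm/processor.h by this patch. */
enum idle_boot_override { IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF };

/* No override until a boot option says otherwise. */
unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;

/* Hypothetical stand-in for the power_save= boot option handler. */
void parse_power_save_option(const char *arg)
{
	if (strcmp(arg, "off") == 0)
		boot_option_idle_override = IDLE_POWERSAVE_OFF;
}

/* Hypothetical stand-in for pseries_idle_probe(): a non-zero return
 * means the cpuidle driver must not be registered. */
int sketch_idle_probe(int has_splpar)
{
	if (boot_option_idle_override != IDLE_NO_OVERRIDE)
		return -ENODEV;	/* power_save=off: leave cpuidle unregistered */

	if (!has_splpar)
		return -ENODEV;	/* not a shared-processor LPAR: default idle */

	return 0;
}
```

Until the option is parsed the probe succeeds on an SPLPAR system; afterwards it returns -ENODEV regardless of firmware features.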

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Signed-off-by: Arun R Bharadwaj <arun.r.bharadwaj@gmail.com>
---
 arch/powerpc/include/asm/processor.h            |    1 +
 arch/powerpc/platforms/pseries/processor_idle.c |    4 ++++
 2 files changed, 5 insertions(+), 0 deletions(-)

Comments

Benjamin Herrenschmidt Nov. 27, 2011, 11:07 p.m. UTC | #1
On Thu, 2011-11-17 at 16:59 +0530, Deepthi Dharwar wrote:
> This patch makes pseries_idle_driver not to be registered when
> power_save=off kernel boot option is specified. The
> boot_option_idle_override variable used here is similar to
> its usage on x86.

Quick Q. With your changes, the CPU will never get into idle at all
until cpuidle initializes and the driver loads.

That means not only much later in the boot process, but potentially
never if the distro has the driver as a module and fails to load it, or
similar.

Can't that be an issue? Shouldn't we keep at least one of the basic
idle functions as a fallback?

Cheers,
Ben.
Deepthi Dharwar Nov. 28, 2011, 11:03 a.m. UTC | #2
On 11/28/2011 04:37 AM, Benjamin Herrenschmidt wrote:

> On Thu, 2011-11-17 at 16:59 +0530, Deepthi Dharwar wrote:
>> This patch makes pseries_idle_driver not to be registered when
>> power_save=off kernel boot option is specified. The
>> boot_option_idle_override variable used here is similar to
>> its usage on x86.
> 
> Quick Q. With your changes, the CPU will never get into idle at all
> until cpuidle initializes and the driver loads.
> 
> That means not only much later in the boot process, but potentially
> never if the distro has the driver as a module and fails to load it, or
> similar.
> 
> Can't that be an issue ? Shouldn't we keep at least one of the basic
> idle functions as a fallback ?
> 


On an LPAR, if cpuidle is disabled, ppc_md.power_save is still set to
cpuidle_idle_call by default. As a result, cpuidle_idle_call is called
repeatedly, only to return -ENODEV each time, and the default idle
routine is never executed.
This is a major design flaw: there is no fallback idle routine.
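To make the flaw concrete, here is a hedged userspace sketch (not the actual powerpc idle loop). fake_cpuidle_idle_call(), power_save_void(), default_idle() and idle_iteration_void() are hypothetical names; the point is only that a void hook discards the -ENODEV, so the loop can never tell that it should fall back.

```c
#include <errno.h>

/* Counters so the behaviour can be observed. */
static int default_idle_calls;
static int cpuidle_attempts;

/* Stand-in for cpuidle_idle_call() with no registered cpuidle driver. */
static int fake_cpuidle_idle_call(void)
{
	cpuidle_attempts++;
	return -ENODEV;
}

/* With a void-returning hook, the error is thrown away here. */
static void power_save_void(void)
{
	(void)fake_cpuidle_idle_call();
}

/* Stand-in for the platform's basic idle routine. */
static void default_idle(void)
{
	default_idle_calls++;
}

/* Hypothetical single idle-loop iteration: the fallback only runs
 * when no hook is installed, never when the hook fails. */
void idle_iteration_void(void (*power_save)(void))
{
	if (power_save)
		power_save();
	else
		default_idle();
}
```

However many times the loop runs, cpuidle is retried and the default idle routine is never reached.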

We propose to fix this by changing the return type of
ppc_md.power_save() from void to int and checking its return value.
This would solve two problems: it would remove the cast to a function
pointer introduced in the previous patch, and it would fix the design
flaw described above.

By checking the return value of ppc_md.power_save(), we can invoke the
default idle routine on failure. My only concern is the effect of
changing ppc_md.power_save() to return int on other powerpc platforms.
Would it be a good idea to change the return type to int, so that we can
flag an error and fall back to the default idle routine?
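A minimal sketch of the proposed calling convention, assuming the hook returns 0 when it actually idled and a negative errno otherwise. idle_iteration(), no_driver_idle(), working_idle() and default_idle() are hypothetical names, not existing kernel functions.

```c
#include <errno.h>
#include <stddef.h>

static int default_idle_calls;

/* Stand-in for cpuidle_idle_call() with no registered driver. */
static int no_driver_idle(void)
{
	return -ENODEV;
}

/* Stand-in for a hook that successfully entered an idle state. */
static int working_idle(void)
{
	return 0;
}

/* Stand-in for the platform's basic idle routine. */
static void default_idle(void)
{
	default_idle_calls++;
}

/* Hypothetical idle-loop step with an int-returning hook: a missing
 * hook or any non-zero return falls back to the default routine. */
void idle_iteration(int (*power_save)(void))
{
	if (!power_save || power_save() != 0)
		default_idle();
}
```

With this convention, the "cpuidle disabled" case degrades to the default idle routine instead of spinning on -ENODEV.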

> Cheers,
> Ben.


Regards,
Deepthi
Benjamin Herrenschmidt Nov. 28, 2011, 8:39 p.m. UTC | #3
On Mon, 2011-11-28 at 16:33 +0530, Deepthi Dharwar wrote:

> On an LPAR if cpuidle is disabled, ppc_md.power_save is still set to
> cpuidle_idle_call by default here. This would result in calling of
> cpuidle_idle_call repeatedly, only for the call to return -ENODEV. The
> default idle is never executed.
> This would be a major design flaw. No fallback idle routine.
> 
> We propose to fix this by checking the return value of
> ppc_md.power_save() call from void to int.
> Right now return value is void, but if we change this to int, this
> would solve two problems. One being removing the cast to a function
> pointer in the prev patch and this design flaw stated above.
> 
> So by checking the return value of ppc_md.power_save(), we can invoke
> the default idle on failure. But my only concern is about the effects of
> changing the ppc_md.power_save() to return int on other powerpc
> architectures. Would it be a good idea to change the return type to int
> which would help us flag an error and fallback to default idle?

I would have preferred an approach where the cpuidle module sets
ppc_md.power_save when loaded and restores it when unloaded ... but that
would have to go into the cpuidle core as a powerpc-specific tweak and
might not be generally well received.
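A rough sketch of the save/restore approach mentioned above, under simplifying assumptions: a single-threaded userspace model with a hypothetical one-field machdep_calls_sketch structure. A real implementation would also have to make sure no CPU is still executing the old hook when it is swapped, which this sketch ignores.

```c
#include <stddef.h>

/* Hypothetical subset of machdep_calls: only the idle hook. */
struct machdep_calls_sketch {
	void (*power_save)(void);
};

struct machdep_calls_sketch ppc_md_sketch;

/* Platform hook saved while the cpuidle module is loaded. */
static void (*saved_power_save)(void);

static void platform_idle(void) { }
static void cpuidle_hook(void) { }

/* On module load: remember the platform hook, install cpuidle's. */
void cpuidle_module_load(void)
{
	saved_power_save = ppc_md_sketch.power_save;
	ppc_md_sketch.power_save = cpuidle_hook;
}

/* On module unload: put the platform hook back. */
void cpuidle_module_unload(void)
{
	ppc_md_sketch.power_save = saved_power_save;
	saved_power_save = NULL;
}
```

If the module never loads, the platform hook simply stays in place, which is exactly the fallback being asked for.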

So go for it, add the return value, but you'll have to update all the
idle functions (grep for power_save in arch/powerpc to find them).

Cheers,
Ben.
Deepthi Dharwar Nov. 29, 2011, 6:44 a.m. UTC | #4
On 11/29/2011 02:09 AM, Benjamin Herrenschmidt wrote:

> On Mon, 2011-11-28 at 16:33 +0530, Deepthi Dharwar wrote:
> 
>> On an LPAR if cpuidle is disabled, ppc_md.power_save is still set to
>> cpuidle_idle_call by default here. This would result in calling of
>> cpuidle_idle_call repeatedly, only for the call to return -ENODEV. The
>> default idle is never executed.
>> This would be a major design flaw. No fallback idle routine.
>>
>> We propose to fix this by checking the return value of
>> ppc_md.power_save() call from void to int.
>> Right now return value is void, but if we change this to int, this
>> would solve two problems. One being removing the cast to a function
>> pointer in the prev patch and this design flaw stated above.
>>
>> So by checking the return value of ppc_md.power_save(), we can invoke
>> the default idle on failure. But my only concern is about the effects of
>> changing the ppc_md.power_save() to return int on other powerpc
>> architectures. Would it be a good idea to change the return type to int
>> which would help us flag an error and fallback to default idle?
> 
> I would have preferred an approach where the cpuidle module sets
> ppc_md.power_save when loaded and restores it when unloaded ... but that
> would have to go into the cpuidle core as a powerpc specific tweak and
> might not be generally well received.
> 
> So go for it, add the return value, but you'll have to update all the
> idle functions (grep for power_save in arch/powerpc to find them).
> 


Thanks Ben. Yes, I will update all the idle functions under powerpc.
I will re-work these patches with the discussed changes.

Regards,
Deepthi
Deepthi Dharwar Nov. 30, 2011, 1:25 a.m. UTC | #5
On 11/29/2011 12:14 PM, Deepthi Dharwar wrote:

> On 11/29/2011 02:09 AM, Benjamin Herrenschmidt wrote:
> 
>> On Mon, 2011-11-28 at 16:33 +0530, Deepthi Dharwar wrote:
>>
>>> On an LPAR if cpuidle is disabled, ppc_md.power_save is still set to
>>> cpuidle_idle_call by default here. This would result in calling of
>>> cpuidle_idle_call repeatedly, only for the call to return -ENODEV. The
>>> default idle is never executed.
>>> This would be a major design flaw. No fallback idle routine.
>>>
>>> We propose to fix this by checking the return value of
>>> ppc_md.power_save() call from void to int.
>>> Right now return value is void, but if we change this to int, this
>>> would solve two problems. One being removing the cast to a function
>>> pointer in the prev patch and this design flaw stated above.
>>>
>>> So by checking the return value of ppc_md.power_save(), we can invoke
>>> the default idle on failure. But my only concern is about the effects of
>>> changing the ppc_md.power_save() to return int on other powerpc
>>> architectures. Would it be a good idea to change the return type to int
>>> which would help us flag an error and fallback to default idle?
>>
>> I would have preferred an approach where the cpuidle module sets
>> ppc_md.power_save when loaded and restores it when unloaded ... but that
>> would have to go into the cpuidle core as a powerpc specific tweak and
>> might not be generally well received.
>>
>> So go for it, add the return value, but you'll have to update all the
>> idle functions (grep for power_save in arch/powerpc to find them).
>>
> 
> 
> Thanks Ben. Yes, I will update all the idle functions under powerpc.
> I will re-work these patches with the discussed changes.
> 
> Regards,
> Deepthi
> 
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/linux-pm
> 
> 

Hi Ben,

I was trying to add a return value to power_save for all arch/powerpc
idle functions, but a few of them directly call assembly routines in
*.S files.

What would be a good way to change the return value for the asm
routines? Should we change only the asm and put the return value in r3,
or write a wrapper function that calls these asm routines and returns
an int?

Regards,
Deepthi
Benjamin Herrenschmidt Nov. 30, 2011, 4:52 a.m. UTC | #6
On Wed, 2011-11-30 at 06:55 +0530, Deepthi Dharwar wrote:
> I was trying to add a return value for power_save for all arch/powepc
> idle functions but a few of them directly call *.S  routines, as they
> are asm.
> 
> What would be a good way to change the return value  for asm
> routines ?
> Do we make a change in asm only, put the return value in r3 or write a
> wrapper function which would call these asm routines and return an
> int ?

No, add li r3,0 at the end, but beware that their return point might not
be obvious, since we often return from an interrupt which modifies the
return address ... Let me know if there are some you can't figure out
and I'll help you.

Cheers,
Ben.

Patch

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 811b7e7..b286fb7 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -382,6 +382,7 @@  static inline unsigned long get_clean_sp(struct pt_regs *regs, int is_32)
 }
 #endif
 
+extern unsigned long  boot_option_idle_override;
 enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/platforms/pseries/processor_idle.c b/arch/powerpc/platforms/pseries/processor_idle.c
index b5addd7..5f74b4e 100644
--- a/arch/powerpc/platforms/pseries/processor_idle.c
+++ b/arch/powerpc/platforms/pseries/processor_idle.c
@@ -260,6 +260,10 @@  static int pseries_idle_probe(void)
 		return -EPERM;
 	}
 
+	if (boot_option_idle_override != IDLE_NO_OVERRIDE) {
+		return -ENODEV;
+	}
+
 	if (!firmware_has_feature(FW_FEATURE_SPLPAR)) {
 		printk(KERN_DEBUG "Using default idle\n");
 		return -ENODEV;