Message ID | 20111117112906.9191.54050.stgit@localhost6.localdomain6 (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Benjamin Herrenschmidt |
On Thu, 2011-11-17 at 16:59 +0530, Deepthi Dharwar wrote:
> This patch makes pseries_idle_driver not to be registered when
> power_save=off kernel boot option is specified. The
> boot_option_idle_override variable used here is similar to
> its usage on x86.

Quick Q. With your changes, the CPU will never get into idle at all until cpuidle initializes and the driver loads.

That means not only much later in the boot process, but potentially never if the distro has the driver as a module and fails to load it, or similar.

Can't that be an issue ? Shouldn't we keep at least one of the basic idle functions as a fallback ?

Cheers,
Ben.

On 11/28/2011 04:37 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2011-11-17 at 16:59 +0530, Deepthi Dharwar wrote:
>> This patch makes pseries_idle_driver not to be registered when
>> power_save=off kernel boot option is specified. The
>> boot_option_idle_override variable used here is similar to
>> its usage on x86.
>
> Quick Q. With your changes, the CPU will never get into idle at all
> until cpuidle initializes and the driver loads.
>
> That means not only much later in the boot process, but potentially
> never if the distro has the driver as a module and fails to load it, or
> similar.
>
> Can't that be an issue ? Shouldn't we keep at least one of the basic
> idle functions as a fallback ?
>

On an LPAR, if cpuidle is disabled, ppc_md.power_save is still set to cpuidle_idle_call by default here. This results in cpuidle_idle_call being called repeatedly, only for the call to return -ENODEV. The default idle is never executed. This would be a major design flaw: there is no fallback idle routine.

We propose to fix this by changing the return type of ppc_md.power_save() from void to int. Right now the return type is void, but changing it to int would solve two problems: it removes the cast to a function pointer in the previous patch, and it fixes the design flaw stated above.

By checking the return value of ppc_md.power_save(), we can invoke the default idle on failure. My only concern is the effect of changing ppc_md.power_save() to return int on other powerpc platforms. Would it be a good idea to change the return type to int, which would help us flag an error and fall back to default idle?

> Cheers,
> Ben.
>
>

Regards,
Deepthi

On Mon, 2011-11-28 at 16:33 +0530, Deepthi Dharwar wrote:
> On an LPAR if cpuidle is disabled, ppc_md.power_save is still set to
> cpuidle_idle_call by default here. This would result in calling of
> cpuidle_idle_call repeatedly, only for the call to return -ENODEV. The
> default idle is never executed.
> This would be a major design flaw. No fallback idle routine.
>
> We propose to fix this by checking the return value of
> ppc_md.power_save() call from void to int.
> Right now return value is void, but if we change this to int, this
> would solve two problems. One being removing the cast to a function
> pointer in the prev patch and this design flaw stated above.
>
> So by checking the return value of ppc_md.power_save(), we can invoke
> the default idle on failure. But my only concern is about the effects of
> changing the ppc_md.power_save() to return int on other powerpc
> architectures. Would it be a good idea to change the return type to int
> which would help us flag an error and fallback to default idle?

I would have preferred an approach where the cpuidle module sets ppc_md.power_save when loaded and restores it when unloaded ... but that would have to go into the cpuidle core as a powerpc specific tweak and might not be generally well received.

So go for it, add the return value, but you'll have to update all the idle functions (grep for power_save in arch/powerpc to find them).

Cheers,
Ben.

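The fallback agreed on in this exchange can be sketched in user-space C. This is only an illustration of the idea, not the patch that was eventually posted: `struct machdep_calls_sketch`, `cpuidle_idle_call_stub`, `platform_idle_ok`, `default_idle` and `idle_once` are all hypothetical stand-ins, and the only things taken from the thread are that the hook returns int and that a failing call (such as -ENODEV) should lead to the default idle.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's machdep_calls; only the
 * power_save hook is modelled, returning int as proposed in the thread. */
struct machdep_calls_sketch {
	int (*power_save)(void);
};

static int default_idle_runs;	/* counts how often the fallback ran */

/* Stand-in for cpuidle_idle_call() when no cpuidle driver is registered. */
static int cpuidle_idle_call_stub(void)
{
	return -ENODEV;
}

/* Stand-in for a platform idle routine that succeeds. */
static int platform_idle_ok(void)
{
	return 0;
}

/* Stand-in for the basic fallback idle routine. */
static void default_idle(void)
{
	default_idle_runs++;
}

/* One pass of an idle loop: try the hook and fall back if it is missing
 * or reports failure. Returns 1 if the fallback was used, 0 otherwise. */
static int idle_once(const struct machdep_calls_sketch *md)
{
	assert(md != NULL);

	if (md->power_save != NULL && md->power_save() == 0)
		return 0;

	default_idle();
	return 1;
}
```

With `power_save` pointing at `cpuidle_idle_call_stub`, every call to `idle_once` ends in `default_idle`, which is the behaviour missing from the void-returning version described above.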
On 11/29/2011 02:09 AM, Benjamin Herrenschmidt wrote:
> On Mon, 2011-11-28 at 16:33 +0530, Deepthi Dharwar wrote:
>
>> On an LPAR if cpuidle is disabled, ppc_md.power_save is still set to
>> cpuidle_idle_call by default here. This would result in calling of
>> cpuidle_idle_call repeatedly, only for the call to return -ENODEV. The
>> default idle is never executed.
>> This would be a major design flaw. No fallback idle routine.
>>
>> We propose to fix this by checking the return value of
>> ppc_md.power_save() call from void to int.
>> Right now return value is void, but if we change this to int, this
>> would solve two problems. One being removing the cast to a function
>> pointer in the prev patch and this design flaw stated above.
>>
>> So by checking the return value of ppc_md.power_save(), we can invoke
>> the default idle on failure. But my only concern is about the effects of
>> changing the ppc_md.power_save() to return int on other powerpc
>> architectures. Would it be a good idea to change the return type to int
>> which would help us flag an error and fallback to default idle?
>
> I would have preferred an approach where the cpuidle module sets
> ppc_md.power_save when loaded and restores it when unloaded ... but that
> would have to go into the cpuidle core as a powerpc specific tweak and
> might not be generally well received.
>
> So go for it, add the return value, but you'll have to update all the
> idle functions (grep for power_save in arch/powerpc to find them).
>

Thanks Ben. Yes, I will update all the idle functions under powerpc. I will re-work these patches with the discussed changes.

Regards,
Deepthi

On 11/29/2011 12:14 PM, Deepthi Dharwar wrote:
> On 11/29/2011 02:09 AM, Benjamin Herrenschmidt wrote:
>
>> On Mon, 2011-11-28 at 16:33 +0530, Deepthi Dharwar wrote:
>>
>>> On an LPAR if cpuidle is disabled, ppc_md.power_save is still set to
>>> cpuidle_idle_call by default here. This would result in calling of
>>> cpuidle_idle_call repeatedly, only for the call to return -ENODEV. The
>>> default idle is never executed.
>>> This would be a major design flaw. No fallback idle routine.
>>>
>>> We propose to fix this by checking the return value of
>>> ppc_md.power_save() call from void to int.
>>> Right now return value is void, but if we change this to int, this
>>> would solve two problems. One being removing the cast to a function
>>> pointer in the prev patch and this design flaw stated above.

kernel/idle.c: ppc_md.power_save = NULL;

>>>
>>> So by checking the return value of ppc_md.power_save(), we can invoke
>>> the default idle on failure. But my only concern is about the effects of
>>> changing the ppc_md.power_save() to return int on other powerpc
>>> architectures. Would it be a good idea to change the return type to int
>>> which would help us flag an error and fallback to default idle?
>>
>> I would have preferred an approach where the cpuidle module sets
>> ppc_md.power_save when loaded and restores it when unloaded ... but that
>> would have to go into the cpuidle core as a powerpc specific tweak and
>> might not be generally well received.
>>
>> So go for it, add the return value, but you'll have to update all the
>> idle functions (grep for power_save in arch/powerpc to find them).
>>
>
>
> Thanks Ben. Yes, I will update all the idle functions under powerpc.
> I will re-work these patches with the discussed changes.
>
> Regards,
> Deepthi
>

Hi Ben,

I was trying to add a return value for power_save for all arch/powerpc idle functions, but a few of them directly call *.S routines, as they are asm.

What would be a good way to change the return value for asm routines? Do we make a change in asm only and put the return value in r3, or write a wrapper function which would call these asm routines and return an int?

Regards,
Deepthi

On Wed, 2011-11-30 at 06:55 +0530, Deepthi Dharwar wrote:
> I was trying to add a return value for power_save for all arch/powerpc
> idle functions but a few of them directly call *.S routines, as they
> are asm.
>
> What would be a good way to change the return value for asm
> routines ?
> Do we make a change in asm only, put the return value in r3 or write a
> wrapper function which would call these asm routines and return an
> int ?

No, add li r3,0 at the end, but beware that their return point might not be obvious since we often return from an interrupt which modifies the return address ...

Let me know if there's some you can't figure out and I'll help you.

Cheers,
Ben.

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 811b7e7..b286fb7 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -382,6 +382,7 @@ static inline unsigned long get_clean_sp(struct pt_regs *regs, int is_32)
 }
 #endif
 
+extern unsigned long boot_option_idle_override;
 enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/platforms/pseries/processor_idle.c b/arch/powerpc/platforms/pseries/processor_idle.c
index b5addd7..5f74b4e 100644
--- a/arch/powerpc/platforms/pseries/processor_idle.c
+++ b/arch/powerpc/platforms/pseries/processor_idle.c
@@ -260,6 +260,10 @@ static int pseries_idle_probe(void)
 		return -EPERM;
 	}
 
+	if (boot_option_idle_override != IDLE_NO_OVERRIDE) {
+		return -ENODEV;
+	}
+
 	if (!firmware_has_feature(FW_FEATURE_SPLPAR)) {
 		printk(KERN_DEBUG "Using default idle\n");
 		return -ENODEV;
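
The effect of the check added to pseries_idle_probe() can be tried outside the kernel with a small mock. This is a sketch under assumptions: the enum and the variable name come from the diff above, but `parse_power_save_arg` and `idle_probe_sketch` are hypothetical helpers, and the excerpt does not show where the kernel actually sets `boot_option_idle_override`.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Same enum as in the patch's processor.h hunk. */
enum idle_boot_override { IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF };

/* In the kernel this is declared extern; defined here for the mock. */
static unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;

/* Hypothetical handler for the value of a power_save= boot argument.
 * Only "off" is recognised; anything else leaves the override unset. */
static int parse_power_save_arg(const char *val)
{
	assert(val != NULL);

	if (strcmp(val, "off") == 0) {
		boot_option_idle_override = IDLE_POWERSAVE_OFF;
		return 0;
	}
	return -EINVAL;
}

/* Models only the check the patch adds to the probe routine: with any
 * override set, the idle driver declines to register. */
static int idle_probe_sketch(void)
{
	if (boot_option_idle_override != IDLE_NO_OVERRIDE)
		return -ENODEV;
	return 0;
}
```

Calling `parse_power_save_arg("off")` before `idle_probe_sketch()` makes the probe return -ENODEV, which is the case the review comments above are concerned with: once the driver declines to register, something else has to provide an idle routine.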