Patchwork [RFC,4/6] PM / Runtime: Introduce flag can_power_off

Submitter Lin Ming
Date Feb. 13, 2012, 9:11 a.m.
Message ID <1329124271-29464-5-git-send-email-ming.m.lin@intel.com>
Permalink /patch/140875/
State Not Applicable
Delegated to: David Miller

Comments

Lin Ming - Feb. 13, 2012, 9:11 a.m.
From: Zhang Rui <rui.zhang@intel.com>

Introduce flag can_power_off in device structure to support runtime
power off/on.

Note that, for a specific device driver,
"support runtime power off/on" means that the driver's .runtime_suspend
callback needs to:
1) save all the context so that it can restore the device to its previous
   working state after it is powered on again.
2) set the can_power_off flag to tell the driver model that it is ready for
   power off.

The following example shows how this works.

device A
 |---------|
 v         v
device B  device C

A is the parent of device B and device C, and devices A, B and C share the
same power logic
(only device A knows how to turn the power on/off).

In order to power off A, B and C at runtime:
1) device B and device C should support runtime power off
   (runtime suspended with the can_power_off flag set).
2) a pm idle request for device A is fired by the runtime PM core.
3) in its .runtime_suspend callback, device A tries to set the can_power_off flag.
4) if that succeeds, all its children are ready for power off
   and it can turn off the power at any time.
5) if that fails, at least one of its children does not support runtime
   power off, so the power cannot be turned off.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
---
 include/linux/pm.h         |    1 +
 include/linux/pm_runtime.h |   30 ++++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+), 0 deletions(-)
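
The diff itself is not reproduced on this page. As an illustration only, the
five steps in the description above can be modelled in user space roughly as
follows. This is a sketch, not the code from the patch: struct model_dev and
all model_* functions are hypothetical names, and the sketch assumes that
"trying to set the flag" simply means checking that every child is runtime
suspended with its own flag already set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_CHILDREN 4

/* Hypothetical stand-in for a device; NOT the kernel's struct device. */
struct model_dev {
	struct model_dev *children[MODEL_MAX_CHILDREN];
	size_t nr_children;
	bool runtime_suspended;
	bool can_power_off;
};

/*
 * Try to set can_power_off on dev. This succeeds only if every child is
 * runtime suspended with its own flag set (trivially true for a leaf).
 */
static bool model_allow_power_off(struct model_dev *dev)
{
	assert(dev != NULL);
	for (size_t i = 0; i < dev->nr_children; i++) {
		const struct model_dev *child = dev->children[i];

		if (!child->runtime_suspended || !child->can_power_off)
			return false;
	}
	dev->can_power_off = true;
	return true;
}

/*
 * Model of a .runtime_suspend callback (context saving is omitted).
 * Returns true if the device ended up ready for power off.
 */
static bool model_runtime_suspend(struct model_dev *dev, bool supports_power_off)
{
	dev->runtime_suspended = true;
	dev->can_power_off = false;
	if (!supports_power_off)
		return false;
	return model_allow_power_off(dev);
}

/* Resuming clears the flag, so the parent can no longer cut the power. */
static void model_runtime_resume(struct model_dev *dev)
{
	dev->runtime_suspended = false;
	dev->can_power_off = false;
}
```

With A as the parent of B and C, model_runtime_suspend() on A returns true
only after both B and C have been suspended with supports_power_off set; if
either child resumes or lacks support, the attempt on A fails (step 5).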
Alan Stern - Feb. 13, 2012, 3:01 p.m.
On Mon, 13 Feb 2012, Lin Ming wrote:

> From: Zhang Rui <rui.zhang@intel.com>
> 
> Introduce flag can_power_off in device structure to support runtime
> power off/on.
> 
> Note that, for a specific device driver,
> "support runtime power off/on" means that the driver .runtime_suspend
> callback needs to
> 1) save all the context so that it can restore the device back to the previous
>    working state after powered on.
> 2) set can_power_off flag to tell the driver model that it's ready for power off.
> 
> The following example shows how this works.
> 
> device A
>  |---------|
>  v         v
> device B  device C
> 
> A is the parent of device B and device C, and device A/B/C shares the
> same power logic
> (Only device A knows how to turn on/off the power).
> 
> In order to power off A, B, C at runtime,
> 1) device B and device C should support runtime power off
>    (runtime suspended with can_power_off flag set)
> 2) pm idle request for device A is fired by runtime PM core.
> 3) in device A .runtime_suspend callback, it tries to set can_power_off flag.
> 4) if succeed, it means all its children have been ready for power off
>    and it can turn off the power at any time.
> 5) if failed, it means at least one of its children does not support runtime
>    power off, thus the power can not be turned off.

I'm not sure if this is really the right approach.  What you're trying 
to do is implement two different low-power states, basically D3hot and 
D3cold.  Currently the runtime PM core doesn't support such things; all 
it knows about is low power and full power.

Before doing an ad-hoc implementation, it would be best to step back
and think about other subsystems.  Other sorts of devices may well have
multiple low-power states.  What's the best way for this to be
supported by the PM core?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki - Feb. 13, 2012, 7:38 p.m.
On Monday, February 13, 2012, Alan Stern wrote:
> On Mon, 13 Feb 2012, Lin Ming wrote:
> 
> > From: Zhang Rui <rui.zhang@intel.com>
> > 
> > Introduce flag can_power_off in device structure to support runtime
> > power off/on.
> > 
> > Note that, for a specific device driver,
> > "support runtime power off/on" means that the driver .runtime_suspend
> > callback needs to
> > 1) save all the context so that it can restore the device back to the previous
> >    working state after powered on.
> > 2) set can_power_off flag to tell the driver model that it's ready for power off.
> > 
> > The following example shows how this works.
> > 
> > device A
> >  |---------|
> >  v         v
> > device B  device C
> > 
> > A is the parent of device B and device C, and device A/B/C shares the
> > same power logic
> > (Only device A knows how to turn on/off the power).
> > 
> > In order to power off A, B, C at runtime,
> > 1) device B and device C should support runtime power off
> >    (runtime suspended with can_power_off flag set)
> > 2) pm idle request for device A is fired by runtime PM core.
> > 3) in device A .runtime_suspend callback, it tries to set can_power_off flag.
> > 4) if succeed, it means all its children have been ready for power off
> >    and it can turn off the power at any time.
> > 5) if failed, it means at least one of its children does not support runtime
> >    power off, thus the power can not be turned off.
> 
> I'm not sure if this is really the right approach.  What you're trying 
> to do is implement two different low-power states, basically D3hot and 
> D3cold.  Currently the runtime PM core doesn't support such things; all 
> it knows about is low power and full power.

I'd rather say all it knows about is "suspended" and "active", which mean
"the device is not processing I/O" and "the device may be processing I/O",
respectively.  A "suspended" device may or may not be in a low-power state,
but the runtime PM core doesn't care about that.

> Before doing an ad-hoc implementation, it would be best to step back
> and think about other subsystems.  Other sorts of devices may well have
> multiple low-power states.  What's the best way for this to be
> supported by the PM core?

Well, I honestly don't think there's any way they all can be covered at the
same time and that's why we chose to support only "suspended" and "active"
as defined above.  The handling of multiple low-power states must be
implemented outside of the runtime PM core (like in the PCI core, for example).

Thanks,
Rafael
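
The split described in the message above can be illustrated with a small
user-space model: the core-visible status has only two values, while the
choice between concrete low-power states stays private to the bus layer.
This is a sketch under assumptions, not kernel code; every model_* name and
the power_off_ok field are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All that the modelled "core" tracks: suspended or active. */
enum model_rpm_status { MODEL_ACTIVE, MODEL_SUSPENDED };

/* Concrete power states known only to the modelled bus layer. */
enum model_bus_state { MODEL_D0, MODEL_D3HOT, MODEL_D3COLD };

struct model_bus_dev {
	enum model_rpm_status status;  /* core view */
	enum model_bus_state state;    /* bus-private view */
	bool power_off_ok;             /* hypothetical: bus may remove power */
};

/* Bus-level suspend: pick the deepest allowed state, report "suspended". */
static void model_bus_runtime_suspend(struct model_bus_dev *dev)
{
	assert(dev != NULL);
	dev->state = dev->power_off_ok ? MODEL_D3COLD : MODEL_D3HOT;
	dev->status = MODEL_SUSPENDED;
}

/* Bus-level resume: back to full power, report "active". */
static void model_bus_runtime_resume(struct model_bus_dev *dev)
{
	assert(dev != NULL);
	dev->state = MODEL_D0;
	dev->status = MODEL_ACTIVE;
}
```

In this model the status field reads MODEL_SUSPENDED whether the bus chose
MODEL_D3HOT or MODEL_D3COLD, which is the sense in which the core does not
care about the actual power state.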
Alan Stern - Feb. 13, 2012, 8:41 p.m.
On Mon, 13 Feb 2012, Rafael J. Wysocki wrote:

> > I'm not sure if this is really the right approach.  What you're trying 
> > to do is implement two different low-power states, basically D3hot and 
> > D3cold.  Currently the runtime PM core doesn't support such things; all 
> > it knows about is low power and full power.
> 
> I'd rather say all it knows about is "suspended" and "active", which mean
> "the device is not processing I/O" and "the device may be processing I/O",
> respectively.  A "suspended" device may or may not be in a low-power state,
> but the runtime PM core doesn't care about that.

Yes, okay.  We can say that this patch tries to implement two different 
"suspended" states, basically "low power" and "power off" (or D3hot and 
D3cold).

> > Before doing an ad-hoc implementation, it would be best to step back
> > and think about other subsystems.  Other sorts of devices may well have
> > multiple low-power states.  What's the best way for this to be
> > supported by the PM core?
> 
> Well, I honestly don't think there's any way they all can be covered at the
> same time and that's why we chose to support only "suspended" and "active"
> as defined above.  The handling of multiple low-power states must be
> implemented outside of the runtime PM core (like in the PCI core, for example).

That's the point.  If this is to be implemented outside of the runtime
PM core, should the patch be allowed to add new fields to struct
dev_pm_info (which has to be shared among all subsystems)?

Or to put it another way, if we do add new fields to struct dev_pm_info
(like can_power_off) in order to help support multiple "suspended"  
states, shouldn't these new fields be such that they can be used by
many different subsystems rather than being special for the
full-power/no-power situation?

Likewise, should new routines like pm_runtime_allow_power_off() be
added to the runtime PM core if they are going to be used just by PCI?

Alan Stern

Rafael J. Wysocki - Feb. 13, 2012, 8:50 p.m.
On Monday, February 13, 2012, Alan Stern wrote:
> On Mon, 13 Feb 2012, Rafael J. Wysocki wrote:
> 
> > > I'm not sure if this is really the right approach.  What you're trying 
> > > to do is implement two different low-power states, basically D3hot and 
> > > D3cold.  Currently the runtime PM core doesn't support such things; all 
> > > it knows about is low power and full power.
> > 
> > I'd rather say all it knows about is "suspended" and "active", which mean
> > "the device is not processing I/O" and "the device may be processing I/O",
> > respectively.  A "suspended" device may or may not be in a low-power state,
> > but the runtime PM core doesn't care about that.
> 
> Yes, okay.  We can say that this patch tries to implement two different 
> "suspended" states, basically "low power" and "power off" (or D3hot and 
> D3cold).
> 
> > > Before doing an ad-hoc implementation, it would be best to step back
> > > and think about other subsystems.  Other sorts of devices may well have
> > > multiple low-power states.  What's the best way for this to be
> > > supported by the PM core?
> > 
> > Well, I honestly don't think there's any way they all can be covered at the
> > same time and that's why we chose to support only "suspended" and "active"
> > as defined above.  The handling of multiple low-power states must be
> > implemented outside of the runtime PM core (like in the PCI core, for example).
> 
> That's the point.  If this is to be implemented outside of the runtime
> PM core, should the patch be allowed to add new fields to struct
> dev_pm_info (which has to be shared among all subsystems)?
> 
> Or to put it another way, if we do add new fields to struct dev_pm_info
> (like can_power_off) in order to help support multiple "suspended"  
> states, shouldn't these new fields be such that they can be used by
> many different subsystems rather than being special for the
> full-power/no-power situation?
> 
> Likewise, should new routines like pm_runtime_allow_power_off() be
> added to the runtime PM core if they are going to be used just by PCI?

No, they shouldn't.

Thanks,
Rafael
Zhang Rui - Feb. 14, 2012, 6:07 a.m.
On Mon, 2012-02-13 at 10:01 -0500, Alan Stern wrote:
> On Mon, 13 Feb 2012, Lin Ming wrote:
> 
> > From: Zhang Rui <rui.zhang@intel.com>
> > 
> > Introduce flag can_power_off in device structure to support runtime
> > power off/on.
> > 
> > Note that, for a specific device driver,
> > "support runtime power off/on" means that the driver .runtime_suspend
> > callback needs to
> > 1) save all the context so that it can restore the device back to the previous
> >    working state after powered on.
> > 2) set can_power_off flag to tell the driver model that it's ready for power off.
> > 
> > The following example shows how this works.
> > 
> > device A
> >  |---------|
> >  v         v
> > device B  device C
> > 
> > A is the parent of device B and device C, and device A/B/C shares the
> > same power logic
> > (Only device A knows how to turn on/off the power).
> > 
> > In order to power off A, B, C at runtime,
> > 1) device B and device C should support runtime power off
> >    (runtime suspended with can_power_off flag set)
> > 2) pm idle request for device A is fired by runtime PM core.
> > 3) in device A .runtime_suspend callback, it tries to set can_power_off flag.
> > 4) if succeed, it means all its children have been ready for power off
> >    and it can turn off the power at any time.
> > 5) if failed, it means at least one of its children does not support runtime
> >    power off, thus the power can not be turned off.
> 
> I'm not sure if this is really the right approach.  What you're trying 
> to do is implement two different low-power states, basically D3hot and 
> D3cold.  Currently the runtime PM core doesn't support such things; all 
> it knows about is low power and full power.
> 
Exactly.
What I'm trying to do here is add a "special" runtime low-power
state, namely power off.

> Before doing an ad-hoc implementation, it would be best to step back
> and think about other subsystems.  Other sorts of devices may well have
> multiple low-power states.  What's the best way for this to be
> supported by the PM core?
> 
I thought about this before, e.g. introducing support for multiple runtime
low-power states in the runtime PM core, like suspend/hibernate for system
low-power states. But I'm not sure this is workable, because the low-power
states vary between devices/buses/platforms.

So I decided to first introduce one special low-power state, namely runtime
power off, which means the same thing to different
devices/buses/platforms.

thanks,
rui


Zhang Rui - Feb. 14, 2012, 6:17 a.m.
On Mon, 2012-02-13 at 20:38 +0100, Rafael J. Wysocki wrote:
> On Monday, February 13, 2012, Alan Stern wrote:
> > On Mon, 13 Feb 2012, Lin Ming wrote:
> > 
> > > From: Zhang Rui <rui.zhang@intel.com>
> > > 
> > > Introduce flag can_power_off in device structure to support runtime
> > > power off/on.
> > > 
> > > Note that, for a specific device driver,
> > > "support runtime power off/on" means that the driver .runtime_suspend
> > > callback needs to
> > > 1) save all the context so that it can restore the device back to the previous
> > >    working state after powered on.
> > > 2) set can_power_off flag to tell the driver model that it's ready for power off.
> > > 
> > > The following example shows how this works.
> > > 
> > > device A
> > >  |---------|
> > >  v         v
> > > device B  device C
> > > 
> > > A is the parent of device B and device C, and device A/B/C shares the
> > > same power logic
> > > (Only device A knows how to turn on/off the power).
> > > 
> > > In order to power off A, B, C at runtime,
> > > 1) device B and device C should support runtime power off
> > >    (runtime suspended with can_power_off flag set)
> > > 2) pm idle request for device A is fired by runtime PM core.
> > > 3) in device A .runtime_suspend callback, it tries to set can_power_off flag.
> > > 4) if succeed, it means all its children have been ready for power off
> > >    and it can turn off the power at any time.
> > > 5) if failed, it means at least one of its children does not support runtime
> > >    power off, thus the power can not be turned off.
> > 
> > I'm not sure if this is really the right approach.  What you're trying 
> > to do is implement two different low-power states, basically D3hot and 
> > D3cold.  Currently the runtime PM core doesn't support such things; all 
> > it knows about is low power and full power.
> 
> I'd rather say all it knows about is "suspended" and "active", which mean
> "the device is not processing I/O" and "the device may be processing I/O",
> respectively.  A "suspended" device may or may not be in a low-power state,
> but the runtime PM core doesn't care about that.
> 
yes, I know that.

> > Before doing an ad-hoc implementation, it would be best to step back
> > and think about other subsystems.  Other sorts of devices may well have
> > multiple low-power states.  What's the best way for this to be
> > supported by the PM core?
> 
> Well, I honestly don't think there's any way they all can be covered at the
> same time and that's why we chose to support only "suspended" and "active"
> as defined above.

> The handling of multiple low-power states must be
> implemented outside of the runtime PM core (like in the PCI core, for example).

Surely I'd prefer to implement it in the bus code :), but the problem
is that several buses may be involved at the same time.
Let's take ZPODD as an example.
A ZPODD is attached to a SATA port. Only the SATA port knows that it can be
powered off at runtime, because its ACPI node has _PR3._OFF.
But when the ATA layer code tries to put the SATA port into D3_COLD at
runtime, it must make sure all the devices/drivers in the same power domain
are ready for power off, and in this case we need to get this information
from the SCSI layer.

thanks,
rui

Zhang Rui - Feb. 14, 2012, 7:11 a.m.
Hi, Alan,

On Mon, 2012-02-13 at 15:41 -0500, Alan Stern wrote:
> On Mon, 13 Feb 2012, Rafael J. Wysocki wrote:
> 
> > > I'm not sure if this is really the right approach.  What you're trying 
> > > to do is implement two different low-power states, basically D3hot and 
> > > D3cold.  Currently the runtime PM core doesn't support such things; all 
> > > it knows about is low power and full power.
> > 
> > I'd rather say all it knows about is "suspended" and "active", which mean
> > "the device is not processing I/O" and "the device may be processing I/O",
> > respectively.  A "suspended" device may or may not be in a low-power state,
> > but the runtime PM core doesn't care about that.
> 
> Yes, okay.  We can say that this patch tries to implement two different 
> "suspended" states, basically "low power" and "power off" (or D3hot and 
> D3cold).
> 
Right!

> > > Before doing an ad-hoc implementation, it would be best to step back
> > > and think about other subsystems.  Other sorts of devices may well have
> > > multiple low-power states.  What's the best way for this to be
> > > supported by the PM core?
> > 
> > Well, I honestly don't think there's any way they all can be covered at the
> > same time and that's why we chose to support only "suspended" and "active"
> > as defined above.  The handling of multiple low-power states must be
> > implemented outside of the runtime PM core (like in the PCI core, for example).
> 
> That's the point.  If this is to be implemented outside of the runtime
> PM core, should the patch be allowed to add new fields to struct
> dev_pm_info (which has to be shared among all subsystems)?
> 
Surely it shouldn't in this case.

> Or to put it another way, if we do add new fields to struct dev_pm_info
> (like can_power_off) in order to help support multiple "suspended"  
> states, shouldn't these new fields be such that they can be used by
> many different subsystems rather than being special for the
> full-power/no-power situation?
> 
My opinion is that the concept of a "no-power state" is the same for all
devices/buses/platforms.
If any of them supports it, it can use the routines without any
confusion.

thanks,
rui

Rafael J. Wysocki - Feb. 14, 2012, 10:38 p.m.
On Tuesday, February 14, 2012, Zhang Rui wrote:
> Hi, Alan,
> 
> On Mon, 2012-02-13 at 15:41 -0500, Alan Stern wrote:
> > On Mon, 13 Feb 2012, Rafael J. Wysocki wrote:
> > 
> > > > I'm not sure if this is really the right approach.  What you're trying 
> > > > to do is implement two different low-power states, basically D3hot and 
> > > > D3cold.  Currently the runtime PM core doesn't support such things; all 
> > > > it knows about is low power and full power.
> > > 
> > > I'd rather say all it knows about is "suspended" and "active", which mean
> > > "the device is not processing I/O" and "the device may be processing I/O",
> > > respectively.  A "suspended" device may or may not be in a low-power state,
> > > but the runtime PM core doesn't care about that.
> > 
> > Yes, okay.  We can say that this patch tries to implement two different 
> > "suspended" states, basically "low power" and "power off" (or D3hot and 
> > D3cold).
> > 
> Right!
> 
> > > > Before doing an ad-hoc implementation, it would be best to step back
> > > > and think about other subsystems.  Other sorts of devices may well have
> > > > multiple low-power states.  What's the best way for this to be
> > > > supported by the PM core?
> > > 
> > > Well, I honestly don't think there's any way they all can be covered at the
> > > same time and that's why we chose to support only "suspended" and "active"
> > > as defined above.  The handling of multiple low-power states must be
> > > implemented outside of the runtime PM core (like in the PCI core, for example).
> > 
> > That's the point.  If this is to be implemented outside of the runtime
> > PM core, should the patch be allowed to add new fields to struct
> > dev_pm_info (which has to be shared among all subsystems)?
> > 
> Surely it shouldn't in this case.
> 
> > Or to put it another way, if we do add new fields to struct dev_pm_info
> > (like can_power_off) in order to help support multiple "suspended"  
> > states, shouldn't these new fields be such that they can be used by
> > many different subsystems rather than being special for the
> > full-power/no-power situation?
> > 
> My opinion is that the concept of "no-power state" is unique for all
> devices/buses/platforms.

No, it is not, basically because of power domains.  If they are used,
then individual device power states are not well defined at all.

> If any of them support this, they can use the routines without any
> confusion.

No, they can't.

Thanks,
Rafael
Rafael J. Wysocki - Feb. 14, 2012, 10:39 p.m.
On Tuesday, February 14, 2012, Zhang Rui wrote:
> On Mon, 2012-02-13 at 20:38 +0100, Rafael J. Wysocki wrote:
> > On Monday, February 13, 2012, Alan Stern wrote:
> > > On Mon, 13 Feb 2012, Lin Ming wrote:
> > > 
> > > > From: Zhang Rui <rui.zhang@intel.com>
> > > > 
> > > > Introduce flag can_power_off in device structure to support runtime
> > > > power off/on.
> > > > 
> > > > Note that, for a specific device driver,
> > > > "support runtime power off/on" means that the driver .runtime_suspend
> > > > callback needs to
> > > > 1) save all the context so that it can restore the device back to the previous
> > > >    working state after powered on.
> > > > 2) set can_power_off flag to tell the driver model that it's ready for power off.
> > > > 
> > > > The following example shows how this works.
> > > > 
> > > > device A
> > > >  |---------|
> > > >  v         v
> > > > device B  device C
> > > > 
> > > > A is the parent of device B and device C, and device A/B/C shares the
> > > > same power logic
> > > > (Only device A knows how to turn on/off the power).
> > > > 
> > > > In order to power off A, B, C at runtime,
> > > > 1) device B and device C should support runtime power off
> > > >    (runtime suspended with can_power_off flag set)
> > > > 2) pm idle request for device A is fired by runtime PM core.
> > > > 3) in device A .runtime_suspend callback, it tries to set can_power_off flag.
> > > > 4) if succeed, it means all its children have been ready for power off
> > > >    and it can turn off the power at any time.
> > > > 5) if failed, it means at least one of its children does not support runtime
> > > >    power off, thus the power can not be turned off.
> > > 
> > > I'm not sure if this is really the right approach.  What you're trying 
> > > to do is implement two different low-power states, basically D3hot and 
> > > D3cold.  Currently the runtime PM core doesn't support such things; all 
> > > it knows about is low power and full power.
> > 
> > I'd rather say all it knows about is "suspended" and "active", which mean
> > "the device is not processing I/O" and "the device may be processing I/O",
> > respectively.  A "suspended" device may or may not be in a low-power state,
> > but the runtime PM core doesn't care about that.
> > 
> yes, I know that.
> 
> > > Before doing an ad-hoc implementation, it would be best to step back
> > > and think about other subsystems.  Other sorts of devices may well have
> > > multiple low-power states.  What's the best way for this to be
> > > supported by the PM core?
> > 
> > Well, I honestly don't think there's any way they all can be covered at the
> > same time and that's why we chose to support only "suspended" and "active"
> > as defined above.
> 
> > The handling of multiple low-power states must be
> > implemented outside of the runtime PM core (like in the PCI core, for example).
> 
> Surely I'd prefer to implement it in the bus code, :), but the problem
> is that several buses maybe involved at the same time.
> Let's take ZPODD for example,
> ZPODD is attached to a SATA port. Only SATA port knows that it can be
> runtime powered off, because its ACPI node has _PR3._OFF.
> But when ATA layer code tries to put SATA port to D3_COLD at runtime,it
> must make sure all the devices/drivers in the same power domain are
> ready for power off, and in this case, we need to get this info from
> SCSI layer.

Then you need to get it from there.  I know that this is a difficult problem;
I have been working on a similar one for several months now. :-)

Thanks,
Rafael
Zhang Rui - Feb. 16, 2012, 7:41 a.m.
On Tue, 2012-02-14 at 23:39 +0100, Rafael J. Wysocki wrote:
> On Tuesday, February 14, 2012, Zhang Rui wrote:
> > On Mon, 2012-02-13 at 20:38 +0100, Rafael J. Wysocki wrote:
> > > On Monday, February 13, 2012, Alan Stern wrote:
> > > > On Mon, 13 Feb 2012, Lin Ming wrote:
> > > > 
> > > > > From: Zhang Rui <rui.zhang@intel.com>
> > > > > 
> > > > > Introduce flag can_power_off in device structure to support runtime
> > > > > power off/on.
> > > > > 
> > > > > Note that, for a specific device driver,
> > > > > "support runtime power off/on" means that the driver .runtime_suspend
> > > > > callback needs to
> > > > > 1) save all the context so that it can restore the device back to the previous
> > > > >    working state after powered on.
> > > > > 2) set can_power_off flag to tell the driver model that it's ready for power off.
> > > > > 
> > > > > The following example shows how this works.
> > > > > 
> > > > > device A
> > > > >  |---------|
> > > > >  v         v
> > > > > device B  device C
> > > > > 
> > > > > A is the parent of device B and device C, and device A/B/C shares the
> > > > > same power logic
> > > > > (Only device A knows how to turn on/off the power).
> > > > > 
> > > > > In order to power off A, B, C at runtime,
> > > > > 1) device B and device C should support runtime power off
> > > > >    (runtime suspended with can_power_off flag set)
> > > > > 2) pm idle request for device A is fired by runtime PM core.
> > > > > 3) in device A .runtime_suspend callback, it tries to set can_power_off flag.
> > > > > 4) if succeed, it means all its children have been ready for power off
> > > > >    and it can turn off the power at any time.
> > > > > 5) if failed, it means at least one of its children does not support runtime
> > > > >    power off, thus the power can not be turned off.
> > > > 
> > > > I'm not sure if this is really the right approach.  What you're trying 
> > > > to do is implement two different low-power states, basically D3hot and 
> > > > D3cold.  Currently the runtime PM core doesn't support such things; all 
> > > > it knows about is low power and full power.
> > > 
> > > I'd rather say all it knows about is "suspended" and "active", which mean
> > > "the device is not processing I/O" and "the device may be processing I/O",
> > > respectively.  A "suspended" device may or may not be in a low-power state,
> > > but the runtime PM core doesn't care about that.
> > > 
> > yes, I know that.
> > 
> > > > Before doing an ad-hoc implementation, it would be best to step back
> > > > and think about other subsystems.  Other sorts of devices may well have
> > > > multiple low-power states.  What's the best way for this to be
> > > > supported by the PM core?
> > > 
> > > Well, I honestly don't think there's any way they all can be covered at the
> > > same time and that's why we chose to support only "suspended" and "active"
> > > as defined above.
> > 
> > > The handling of multiple low-power states must be
> > > implemented outside of the runtime PM core (like in the PCI core, for example).
> > 
> > Surely I'd prefer to implement it in the bus code, :), but the problem
> > is that several buses maybe involved at the same time.
> > Let's take ZPODD for example,
> > ZPODD is attached to a SATA port. Only SATA port knows that it can be
> > runtime powered off, because its ACPI node has _PR3._OFF.
> > But when ATA layer code tries to put SATA port to D3_COLD at runtime,it
> > must make sure all the devices/drivers in the same power domain are
> > ready for power off, and in this case, we need to get this info from
> > SCSI layer.
> 
> Then you need to get it from there.  I know that this is a difficult problem,

Yeah, I have thought about this for quite a while, and there ARE
several ways to do it, but they need a lot of changes in bus code, at
least for the buses that support device runtime D3 (off) via ACPI.

Let's again take the SATA port and ZPODD as an example.
Proposal one:
1) introduce scsi_can_power_off and ata_can_power_off.
2) the sr driver sets the scsi_can_power_off bit and the SCSI layer is aware
of this, so the SCSI host can set this bit as well.
3) in its .runtime_suspend callback, the ATA port knows that its SCSI
host interface can be powered off, so it invokes ata_can_power_off to
tell the ATA layer.
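
As an illustration only, proposal one could be modelled in user space along
these lines. It is a sketch, not code from this series: the model_* structs
and functions are hypothetical, and the two flags are represented as plain
bool fields named after the identifiers in the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical models of the three layers involved. */
struct model_sr {                    /* SCSI device driven by sr */
	bool scsi_can_power_off;
};

struct model_scsi_host {
	struct model_sr *sdev;
	bool scsi_can_power_off;
};

struct model_ata_port {
	struct model_scsi_host *host;
	bool ata_can_power_off;
};

/* Step 2a: the sr driver declares that it is ready for power off. */
static void model_sr_runtime_suspend(struct model_sr *sr)
{
	assert(sr != NULL);
	sr->scsi_can_power_off = true;
}

/* Step 2b: the SCSI host copies the bit from its (single) device. */
static void model_host_runtime_suspend(struct model_scsi_host *host)
{
	assert(host != NULL && host->sdev != NULL);
	host->scsi_can_power_off = host->sdev->scsi_can_power_off;
}

/* Step 3: the ATA port passes the information on to the ATA layer. */
static bool model_port_runtime_suspend(struct model_ata_port *port)
{
	assert(port != NULL && port->host != NULL);
	port->ata_can_power_off = port->host->scsi_can_power_off;
	return port->ata_can_power_off;
}
```

The model shows why this approach touches every layer: each bus needs its
own flag and its own code to forward the flag upward.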

Proposal two:
introduce a platform callback for each bus.
It is invoked in scsi_bus->runtime_suspend immediately after
scsi_driver->runtime_suspend has been invoked.
The platform callback checks the SCSI low-power state of the
scsi_device and chooses a compatible ACPI D-state for the device.
The decision whether to use ACPI D3 (off) or not is made in the
platform callback.
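
As an illustration only, proposal two could be modelled in user space as
below. It is a sketch under assumptions, not code from this series: all
model_* names are hypothetical, the SCSI low-power state is reduced to
"context saved or not", and has_power_off stands for an ACPI node that
provides _PR3/_OFF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical SCSI-level low-power states. */
enum model_scsi_lp { MODEL_SCSI_IDLE, MODEL_SCSI_CONTEXT_SAVED };

/* Hypothetical names for the ACPI D-states of interest. */
enum model_acpi_d { MODEL_ACPI_D0, MODEL_ACPI_D3_HOT, MODEL_ACPI_D3_COLD };

struct model_scsi_dev;
typedef enum model_acpi_d (*model_platform_cb)(const struct model_scsi_dev *);

struct model_scsi_dev {
	enum model_scsi_lp lp_state;
	bool has_power_off;            /* assumption: node has _PR3/_OFF */
	enum model_acpi_d d_state;
	model_platform_cb platform_cb; /* per-bus platform callback */
};

/* Platform callback: the only place that decides on D3 (off). */
static enum model_acpi_d model_acpi_choose_state(const struct model_scsi_dev *sdev)
{
	if (sdev->has_power_off && sdev->lp_state == MODEL_SCSI_CONTEXT_SAVED)
		return MODEL_ACPI_D3_COLD;
	return MODEL_ACPI_D3_HOT;
}

/* Driver-level suspend: records which low-power state was reached. */
static void model_driver_runtime_suspend(struct model_scsi_dev *sdev, bool context_saved)
{
	sdev->lp_state = context_saved ? MODEL_SCSI_CONTEXT_SAVED : MODEL_SCSI_IDLE;
}

/* Bus-level suspend: driver callback first, then the platform callback. */
static void model_bus_runtime_suspend(struct model_scsi_dev *sdev, bool context_saved)
{
	assert(sdev != NULL);
	model_driver_runtime_suspend(sdev, context_saved);
	if (sdev->platform_cb)
		sdev->d_state = sdev->platform_cb(sdev);
}
```

Compared with proposal one, the bus code here only needs a hook; the policy
lives entirely in the platform callback.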

what do you think?

> have been working on a similar one for several months now. :-)

Is that why generic power domains were introduced?
Could you please tell me what your idea is?
It would be GREAT if you could share your experience with this.

thanks,
rui

Rafael J. Wysocki - Feb. 17, 2012, 11:54 p.m.
On Thursday, February 16, 2012, Zhang Rui wrote:
> On Tue, 2012-02-14 at 23:39 +0100, Rafael J. Wysocki wrote:
> > On Tuesday, February 14, 2012, Zhang Rui wrote:
> > > On Mon, 2012-02-13 at 20:38 +0100, Rafael J. Wysocki wrote:
> > > > On Monday, February 13, 2012, Alan Stern wrote:
> > > > > On Mon, 13 Feb 2012, Lin Ming wrote:
> > > > > 
> > > > > > From: Zhang Rui <rui.zhang@intel.com>
> > > > > > 
> > > > > > Introduce flag can_power_off in device structure to support runtime
> > > > > > power off/on.
> > > > > > 
> > > > > > Note that, for a specific device driver,
> > > > > > "support runtime power off/on" means that the driver .runtime_suspend
> > > > > > callback needs to
> > > > > > 1) save all the context so that it can restore the device back to the previous
> > > > > >    working state after powered on.
> > > > > > 2) set can_power_off flag to tell the driver model that it's ready for power off.
> > > > > > 
> > > > > > The following example shows how this works.
> > > > > > 
> > > > > > device A
> > > > > >  |---------|
> > > > > >  v         v
> > > > > > device B  device C
> > > > > > 
> > > > > > A is the parent of device B and device C, and device A/B/C shares the
> > > > > > same power logic
> > > > > > (Only device A knows how to turn on/off the power).
> > > > > > 
> > > > > > In order to power off A, B, C at runtime,
> > > > > > 1) device B and device C should support runtime power off
> > > > > >    (runtime suspended with can_power_off flag set)
> > > > > > 2) pm idle request for device A is fired by runtime PM core.
> > > > > > 3) in device A .runtime_suspend callback, it tries to set can_power_off flag.
> > > > > > 4) if succeed, it means all its children have been ready for power off
> > > > > >    and it can turn off the power at any time.
> > > > > > 5) if failed, it means at least one of its children does not support runtime
> > > > > >    power off, thus the power can not be turned off.
> > > > > 
> > > > > I'm not sure if this is really the right approach.  What you're trying 
> > > > > to do is implement two different low-power states, basically D3hot and 
> > > > > D3cold.  Currently the runtime PM core doesn't support such things; all 
> > > > > it knows about is low power and full power.
> > > > 
> > > > I'd rather say all it knows about is "suspended" and "active", which mean
> > > > "the device is not processing I/O" and "the device may be processing I/O",
> > > > respectively.  A "suspended" device may or may not be in a low-power state,
> > > > but the runtime PM core doesn't care about that.
> > > > 
> > > yes, I know that.
> > > 
> > > > > Before doing an ad-hoc implementation, it would be best to step back
> > > > > and think about other subsystems.  Other sorts of devices may well have
> > > > > multiple low-power states.  What's the best way for this to be
> > > > > supported by the PM core?
> > > > 
> > > > Well, I honestly don't think there's any way they all can be covered at the
> > > > same time and that's why we chose to support only "suspended" and "active"
> > > > as defined above.
> > > 
> > > > The handling of multiple low-power states must be
> > > > implemented outside of the runtime PM core (like in the PCI core, for example).
> > > 
> > > Surely I'd prefer to implement it in the bus code, :), but the problem
> > > is that several buses maybe involved at the same time.
> > > Let's take ZPODD for example,
> > > ZPODD is attached to a SATA port. Only SATA port knows that it can be
> > > runtime powered off, because its ACPI node has _PR3._OFF.
> > > But when ATA layer code tries to put SATA port to D3_COLD at runtime,it
> > > must make sure all the devices/drivers in the same power domain are
> > > ready for power off, and in this case, we need to get this info from
> > > SCSI layer.
> > 
> > Then you need to get it from there.  I know that this is a difficult problem,
> 
> Yeah, I have thought about this for quite a while before, there ARE
> several ways to do this, but these need a lot of changes in bus code, at
> least for the buses that support device runtime D3 (off) by ACPI.
> 
> Lets also take SATA port and ZPODD for example,
> proposal one,
> 1) introduce scsi_can_power_off and ata_can_power_off.
> 2) sr driver set scsi_can_power_off bit and scsi layer is aware of this,
> thus the scsi host can set this bit as well.
> 3) in the .runtime_suspend callback of ata port, it knows that its scsi
> host interface can be powered off, thus it invokes ata_can_power_off to
> tell the ata layer.

Hmm.  I'm not sure why you want to introduce this special "power off"
condition.  In fact, it's nothing special, it only means that the device
in question shouldn't be accessed by software, which pretty much is equivalent
to the "suspended" condition (as defined in the runtime PM docs).

> proposal two,
> introduce a platform callback for each bus.
> And it is invoked immediately after the scsi_driver->runtime_suspend
> being invoked in scsi_bus->runtime_suspend.
> The platform callback checks the scsi lower power state of the
> scsi_device and choose a compatible ACPI D-state for the device.
> The decision of whether to use ACPI D3 (off) or not is made in the
> platform callback.
> 
> what do you think?

I think you need to consider that at a more abstract level.

> > have been working on a similar one for several months now. :-)
> 
> That's why generic power domain is introduced?
> Can you tell me what's your idea please?
> It would be GREAT if you can share your experience on this.

Well, a power domain (which seems to be what you have in the ZPODD case)
is analogous to a package with multiple CPU cores.  In that case you
can put individual cores into per-core low-power ("idle") states (that
roughly corresponds to the D1-D3hot device states) or you can put the
whole package into a low-power state ("package idle") resulting in the
removal of power from all the cores (more-or-less).  Now, it has to be
decided which approach to use and if the "package idle" is used, it may
be necessary to restore the cores' "state" when they are "resumed".

Analogously, for devices in a power domain you usually can use some
programmable mechanism to put each of them into some sort of a low-power
state (e.g. D3hot or "stop clock" etc.) such that the device may be programmed
to go out of it.  Alternatively, you can use a different mechanism to
remove power from the entire domain, in which case devices, when power is
restored, may need to be re-initialized.  Of course, you need to know when
this happens, so that you know when to carry out the re-initialization.

Our approach in the generic PM domains framework is, essentially, to provide
a special set of PM callbacks ("domain callbacks") that are run (by the PM
core) instead of bus-type PM callbacks.  Those domain callbacks are added to
every device in the domain through its pm_domain pointer.  Of course, this
means that devices have to be added to the domains explicitly and we have some
helpers for that.  We also use some additional data structures allowing the
domain callbacks to track devices in the domain.

Now, when a device in a domain is "suspended" (meaning its runtime PM status
changes from "active" to "suspended"), the domain callbacks check if this is
the last device in the domain whose status is "active" at that point.  If
that is not the case, they simply call a special .stop() callback to put the
device into a "normal" per-device low-power state (the .stop() callback may be
defined per device and in principle it may be designed to call the bus-type
or driver .runtime_suspend() callback for the device).  Otherwise (i.e. if
this is the last device in the domain whose status was "active" before) and if
the PM QoS constraints allow that to happen, power is removed from the domain
as a whole.  Then, all devices in the domain are marked as "need re-init upon
resume" and the resume domain callbacks take care of re-initializing them as
appropriate when their status changes from "suspended" back to "active".  [The
domain callbacks use the subsys_data pointer in dev_pm_info to attach their own
data to device objects.]

The actual code is more complicated than that, but that's the idea.
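
The flow described above can be sketched as a small user-space model. This is
not the genpd code: struct dev_model, struct domain_model and the
qos_allows_off field are hypothetical stand-ins, and the .stop() callback is
reduced to a status change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEVS 8

struct dev_model {
	bool active;        /* runtime PM status: active vs. suspended */
	bool need_reinit;   /* set when the domain lost power meanwhile */
};

struct domain_model {
	struct dev_model *devs[MAX_DEVS];
	size_t ndevs;
	bool powered;
	bool qos_allows_off; /* stands in for the PM QoS check */
};

static size_t active_count(const struct domain_model *d)
{
	size_t i, n = 0;

	for (i = 0; i < d->ndevs; i++)
		if (d->devs[i]->active)
			n++;
	return n;
}

/* Domain suspend callback: stop the device, then power the domain off
 * only if no other device is active and the QoS constraints allow it. */
void domain_runtime_suspend(struct domain_model *d, struct dev_model *dev)
{
	size_t i;

	assert(d != NULL && dev != NULL);
	dev->active = false;            /* models .stop() + status change */
	if (active_count(d) > 0 || !d->qos_allows_off)
		return;                 /* per-device low-power state only */

	d->powered = false;
	for (i = 0; i < d->ndevs; i++)
		d->devs[i]->need_reinit = true;
}

/* Domain resume callback: restore power, re-initialize this device. */
void domain_runtime_resume(struct domain_model *d, struct dev_model *dev)
{
	assert(d != NULL && dev != NULL);
	d->powered = true;
	dev->need_reinit = false;       /* models the re-initialization */
	dev->active = true;
}
```

Note that the other devices keep their need_reinit mark until they are
resumed themselves, which matches the "need re-init upon resume" wording.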

Thanks,
Rafael
huang ying - Feb. 18, 2012, 12:54 p.m.
On Sat, Feb 18, 2012 at 7:54 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Thursday, February 16, 2012, Zhang Rui wrote:
>> On Tue, 2012-02-14 at 23:39 +0100, Rafael J. Wysocki wrote:
>> > On Tuesday, February 14, 2012, Zhang Rui wrote:
>> > > On Mon, 2012-02-13 at 20:38 +0100, Rafael J. Wysocki wrote:
>> > > > On Monday, February 13, 2012, Alan Stern wrote:
>> > > > > On Mon, 13 Feb 2012, Lin Ming wrote:
[snip]
>> Yeah, I have thought about this for quite a while before, there ARE
>> several ways to do this, but these need a lot of changes in bus code, at
>> least for the buses that support device runtime D3 (off) by ACPI.
>>
>> Lets also take SATA port and ZPODD for example,
>> proposal one,
>> 1) introduce scsi_can_power_off and ata_can_power_off.
>> 2) sr driver set scsi_can_power_off bit and scsi layer is aware of this,
>> thus the scsi host can set this bit as well.
>> 3) in the .runtime_suspend callback of ata port, it knows that its scsi
>> host interface can be powered off, thus it invokes ata_can_power_off to
>> tell the ata layer.
>
> Hmm.  I'm not sure why you want to introduce this special "power off"
> condition.  In fact, it's nothing special, it only means that the device
> in question shouldn't be accessed by software, which pretty much is equivalent
> to the "suspended" condition (as defined in the runtime PM docs).

I think some reasons to introduce can_power_off could be:

1) To indicate that the implementation of .runtime_suspend/.runtime_resume
is compatible with power off.  That is, .runtime_suspend will save all
needed information and .runtime_resume can work on the uninitialized
device.

If this is already a requirement of .runtime_suspend/.runtime_resume,
then this is not needed.  Maybe we can make that explicit for these
callbacks via some kind of documentation.

2) To support something like PM QoS.  A powered-off device may have a
higher exit latency than a normal low-power state (such as D3hot).  Some
devices may disable can_power_off based on that.

3) Whether to go to power off should be determined by the leaf device
(such as a SATA disk), but the power-off itself may be done by its parent
device (such as the SATA port).  The flag is a way for the leaf device to
tell its parent device whether it wants to go to power off.

[snip]

Best Regards,
Huang Ying
Rafael J. Wysocki - Feb. 18, 2012, 8:35 p.m.
On Saturday, February 18, 2012, huang ying wrote:
> On Sat, Feb 18, 2012 at 7:54 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Thursday, February 16, 2012, Zhang Rui wrote:
> >> On Tue, 2012-02-14 at 23:39 +0100, Rafael J. Wysocki wrote:
> >> > On Tuesday, February 14, 2012, Zhang Rui wrote:
> >> > > On Mon, 2012-02-13 at 20:38 +0100, Rafael J. Wysocki wrote:
> >> > > > On Monday, February 13, 2012, Alan Stern wrote:
> >> > > > > On Mon, 13 Feb 2012, Lin Ming wrote:
> [snip]
> >> Yeah, I have thought about this for quite a while before, there ARE
> >> several ways to do this, but these need a lot of changes in bus code, at
> >> least for the buses that support device runtime D3 (off) by ACPI.
> >>
> >> Lets also take SATA port and ZPODD for example,
> >> proposal one,
> >> 1) introduce scsi_can_power_off and ata_can_power_off.
> >> 2) sr driver set scsi_can_power_off bit and scsi layer is aware of this,
> >> thus the scsi host can set this bit as well.
> >> 3) in the .runtime_suspend callback of ata port, it knows that its scsi
> >> host interface can be powered off, thus it invokes ata_can_power_off to
> >> tell the ata layer.
> >
> > Hmm.  I'm not sure why you want to introduce this special "power off"
> > condition.  In fact, it's nothing special, it only means that the device
> > in question shouldn't be accessed by software, which pretty much is equivalent
> > to the "suspended" condition (as defined in the runtime PM docs).
> 
> I think some reasons to introduce can_poweroff can be:
> 
> 1) To indicate the implementation of .runtime_suspend/.runtime_resume
> is compatible with power off.  That is, .runtime_suspend will save all
> needed information and .runtime_resume can work on the uninitialized
> device.
> 
> If this is already the requirement of
> .runtime_suspend/.runtime_resume.

Yes, it is.

> Then this is not needed.   Maybe we
> can make that explicitly for these callbacks via some kind of
> documentation.

I thought it was documented.

> 2) To support something like pm-qos.  power off device may have more
> exit.latency than normal low power state (such as D3Hot).  Some device
> may disable can_power_off based on that.

No, please.  There would be totally _no_ _meaning_ of that flag at the core
level.  Please use subsys_data in struct dev_pm_info for subsystem-specific
data (which is this one).
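
As a rough illustration of keeping such a flag out of the core, the sketch
below hangs a subsystem-private structure off a subsys_data pointer. It is a
user-space model: struct pm_info_model and struct device_model are simplified
stand-ins, subsys_data is modelled as an opaque pointer, and struct
ata_pm_data with its fields is purely hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for struct dev_pm_info and struct device. */
struct pm_info_model {
	void *subsys_data;           /* opaque to the core in this model */
};

struct device_model {
	struct pm_info_model power;
};

/* Hypothetical subsystem-private data attached through subsys_data. */
struct ata_pm_data {
	bool can_power_off;
	unsigned int exit_latency_us;
};

/*
 * Subsystem helper: the core never interprets the flag, only the
 * subsystem that owns the data does.  The latency bound models a
 * PM QoS style constraint supplied by the caller.
 */
bool ata_dev_can_power_off(const struct device_model *dev,
			   unsigned int max_latency_us)
{
	const struct ata_pm_data *pd = dev->power.subsys_data;

	return pd != NULL && pd->can_power_off &&
	       pd->exit_latency_us <= max_latency_us;
}
```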

> 3) Whether to go to power off should be determined by leaf device
> (such as SATA disk), but that may be done by its parent device (such
> as SATA port).  It's a way for leaf device to tell its parent device
> whether it want to go to power off.

Well, please see above.

Thanks,
Rafael
Zhang Rui - Feb. 20, 2012, 3:23 a.m.
On Sat, 2012-02-18 at 00:54 +0100, Rafael J. Wysocki wrote:
> 
> > > have been working on a similar one for several months now. :-)
> > 
> > That's why generic power domain is introduced?
> > Can you tell me what's your idea please?
> > It would be GREAT if you can share your experience on this.
> 
> Well, a power domain (which seems to be what you have in the ZPODD case)
> is analogous to a package with multiple CPU cores.  In that case you
> can put individual cores into per-core low-power ("idle") states (that
> roughly corresponds to the D1-D3hot device states) or you can put the
> whole package into a low-power state ("package idle") resulting in the
> removal of power from all the cores (more-or-less).  Now, it has to be
> decided which approach to use and if the "package idle" is used, it may
> be necessary to restore the cores' "state" when they are "resumed".
> 
> Analogously, for devices in a power domain you usually can use some
> programmable mechanism to put each of them into some sort of a low-power
> state (e.g. D3hot or "stop clock" etc.) such that the device may be programmed
> to go out of it.  Alternatively, you can use a different mechanism to
> remove power from the entire domain, in which case devices, when power is
> restored, may need to be re-initialized.  Of course, you need to know when
> this happens, so that you know when to carry out the re-initialization.
> 
> Our approach in the generic PM domains framework is, essentially, to provide
> a special set of PM callbacks ("domain callbacks") that are run (by the PM
> core) instead of bus-type PM callbacks.  Those domain callbacks are added to
> every device in the domain through its pm_domain pointer.  Of course, this
> means that devices have to be added to the domains explicitly and we have some
> helpers for that.  We also use some additional data structures allowing the
> domain callbacks to track devices in the domain.
> 
> Now, when a device in a domain is "suspended" (meaning its runtime PM status
> changes from "active" to "suspended"), the domain callbacks check if this is
> the last device in the domain whose status is "active" at that point.  If
> that is not the case, they simply call a special .stop() callback to put the
> device into a "normal" per-device low-power state (the .stop() callback may be
> defined per device and in principle it may be designed to call the bus-type
> or driver .runtime_suspend() callback for the device).  Otherwise (i.e. if
> this is the last device in the domain whose status was "active" before) and if
> the PM QoS constraints allow that to happen, power is removed from the domain
> as a whole.  Then, all devices in the domain are marked as "need re-init upon
> resume" and the resume domain callbacks take care of re-initializing them as
> appropriate when their status changes from "suspended" back to "active".  [The
> domain callbacks use the subsys_data pointer in dev_pm_info to attach their own
> data to device objects.]
> 
> The actual code is more complicated than that, but that's the idea.
> 
Yeah, I have read the generic PM domain code before, and I have a
question about it.

genpd->power_off is invoked if all devices in a generic PM domain are
pm_runtime_suspended(). This suggests that the device driver can set the
RPM_SUSPENDED flag only if it is able to bring the device back from a cold
power off, right?

So how do we handle this case: say a device in the generic PM domain
supports 2 different low-power states, D1 and D2.
D2 is deeper than D1, and it is a kind of cold power off with remote
wakeup disabled. If the driver needs to runtime suspend the device with
remote wakeup enabled, it should put the device into D1, but then it
cannot set RPM_SUSPENDED?

thanks,
rui


Rafael J. Wysocki - Feb. 20, 2012, 11:13 p.m.
On Monday, February 20, 2012, Zhang Rui wrote:
> On Sat, 2012-02-18 at 00:54 +0100, Rafael J. Wysocki wrote:
> > 
> > > > have been working on a similar one for several months now. :-)
> > > 
> > > That's why generic power domain is introduced?
> > > Can you tell me what's your idea please?
> > > It would be GREAT if you can share your experience on this.
> > 
> > Well, a power domain (which seems to be what you have in the ZPODD case)
> > is analogous to a package with multiple CPU cores.  In that case you
> > can put individual cores into per-core low-power ("idle") states (that
> > roughly corresponds to the D1-D3hot device states) or you can put the
> > whole package into a low-power state ("package idle") resulting in the
> > removal of power from all the cores (more-or-less).  Now, it has to be
> > decided which approach to use and if the "package idle" is used, it may
> > be necessary to restore the cores' "state" when they are "resumed".
> > 
> > Analogously, for devices in a power domain you usually can use some
> > programmable mechanism to put each of them into some sort of a low-power
> > state (e.g. D3hot or "stop clock" etc.) such that the device may be programmed
> > to go out of it.  Alternatively, you can use a different mechanism to
> > remove power from the entire domain, in which case devices, when power is
> > restored, may need to be re-initialized.  Of course, you need to know when
> > this happens, so that you know when to carry out the re-initialization.
> > 
> > Our approach in the generic PM domains framework is, essentially, to provide
> > a special set of PM callbacks ("domain callbacks") that are run (by the PM
> > core) instead of bus-type PM callbacks.  Those domain callbacks are added to
> > every device in the domain through its pm_domain pointer.  Of course, this
> > means that devices have to be added to the domains explicitly and we have some
> > helpers for that.  We also use some additional data structures allowing the
> > domain callbacks to track devices in the domain.
> > 
> > Now, when a device in a domain is "suspended" (meaning its runtime PM status
> > changes from "active" to "suspended"), the domain callbacks check if this is
> > the last device in the domain whose status is "active" at that point.  If
> > that is not the case, they simply call a special .stop() callback to put the
> > device into a "normal" per-device low-power state (the .stop() callback may be
> > defined per device and in principle it may be designed to call the bus-type
> > or driver .runtime_suspend() callback for the device).  Otherwise (i.e. if
> > this is the last device in the domain whose status was "active" before) and if
> > the PM QoS constraints allow that to happen, power is removed from the domain
> > as a whole.  Then, all devices in the domain are marked as "need re-init upon
> > resume" and the resume domain callbacks take care of re-initializing them as
> > appropriate when their status changes from "suspended" back to "active".  [The
> > domain callbacks use the subsys_data pointer in dev_pm_info to attach their own
> > data to device objects.]
> > 
> > The actual code is more complicated than that, but that's the idea.
> > 
> Yeah, I have read the generic PM domain code before. and I have a
> question about the generic PM domain code.
> 
> genpd->pow_off is invoked if all devices in a generic PM domain are
> pm_runtime_suspended(). This suggests that the device driver can set
> RPM_SUSPENDED flag only if it is able to bring the device from a cold
> power off, right?

A device driver can _never_ set RPM_SUSPENDED; the core does that.

> So how to handle this case, say, for a device in the generic PM domain
> that supports 2 different low power state, D1 and D2.
> D2 is deeper than D1, and it is kind of cold power off with remote
> wakeup disabled. If the driver needs to runtime suspend the device with
> remote wakeup enabled, it should set the device to D1, but it can not
> set the RPM_SUSPEND?

The device is regarded as "suspended" if its bus type's (or PM domain's)
.runtime_suspend() callback has been executed and has returned 0 (success).
What the callback has actually done is not of any interest to the core.

Now, the D1 and D2 case has to be handled by the bus (PM domain) and
driver.  In both cases the device will be regarded as "suspended" and the
core doesn't track the actual device state.
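
A small user-space model of that split, assuming hypothetical names
throughout (the MODEL_ enums and struct model_dev are not kernel types): the
core-side function looks only at the callback's return value, while the
chosen D-state stays private to the bus or driver.

```c
enum model_rpm_status { MODEL_RPM_ACTIVE, MODEL_RPM_SUSPENDED };
enum model_d_state { MODEL_D0, MODEL_D1, MODEL_D2 };

struct model_dev {
	enum model_rpm_status status;   /* tracked by the core */
	enum model_d_state d_state;     /* known only to bus/driver */
	int (*runtime_suspend)(struct model_dev *dev);
};

/* Core side: the status depends only on the callback's return value. */
int model_rpm_suspend(struct model_dev *dev)
{
	int ret = dev->runtime_suspend(dev);

	if (ret == 0)
		dev->status = MODEL_RPM_SUSPENDED;
	return ret;
}

/* Bus/driver side: two callbacks that pick different low-power states. */
int suspend_to_d1(struct model_dev *dev)
{
	dev->d_state = MODEL_D1;
	return 0;
}

int suspend_to_d2(struct model_dev *dev)
{
	dev->d_state = MODEL_D2;
	return 0;
}
```

Both devices end up "suspended" from the core's point of view, whichever
D-state their callback selected.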

I think the problem here is that the PCI bus type's runtime PM callbacks
aren't very sophisticated (they just choose the lowest possible low-power
state and attempt to put the device into it) and I can see two possible
ways to address that.

First, you can modify pci_pm_runtime_suspend/_resume() to handle multiple
states (for example, to choose the target low-power state more intelligently
than they do right now).  Second, you can add a PM domain that will do what
you want from pci_pm_runtime_suspend/_resume() for a specific set of devices.
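
For the first option, "more intelligently" might mean something like the
sketch below: pick the deepest supported state, but skip states from which
remote wakeup does not work when wakeup is required. This is an illustrative
user-space model with hypothetical types (struct model_caps, enum
model_state); it is not how pci_pm_runtime_suspend() is implemented.

```c
#include <stdbool.h>

enum model_state { ST_D0, ST_D1, ST_D2, ST_COUNT };

struct model_caps {
	bool supported[ST_COUNT];       /* state can be entered at all */
	bool wakeup_from[ST_COUNT];     /* remote wakeup works in this state */
};

/*
 * Pick the deepest supported low-power state.  If remote wakeup is
 * needed, only states that can signal wakeup qualify.  Returns ST_D0
 * when no low-power state fits.
 */
enum model_state choose_target_state(const struct model_caps *caps,
				     bool need_wakeup)
{
	int s;

	for (s = ST_COUNT - 1; s > ST_D0; s--) {
		if (!caps->supported[s])
			continue;
		if (need_wakeup && !caps->wakeup_from[s])
			continue;
		return (enum model_state)s;
	}
	return ST_D0;
}
```

With the D1/D2 device from the question, this would select D2 when wakeup is
not needed and D1 when it is.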

Thanks,
Rafael
Zhang Rui - Feb. 21, 2012, 1:13 a.m.
On Tue, 2012-02-21 at 00:13 +0100, Rafael J. Wysocki wrote:
> > So how to handle this case, say, for a device in the generic PM domain
> > that supports 2 different low power state, D1 and D2.
> > D2 is deeper than D1, and it is kind of cold power off with remote
> > wakeup disabled. If the driver needs to runtime suspend the device with
> > remote wakeup enabled, it should set the device to D1, but it can not
> > set the RPM_SUSPEND?
> 
> The device is regarded as "suspended" if its bus type's (or PM domain's)
> .runtime_suspend() callback has been executed and has returned 0 (success).
> What the callback has actually done is not of any interest to the core.
> 
right.

> Now, the D1 and D2 case has to be handled by the bus (PM domain) and
> driver.  In both cases the device will be regarded as "suspended" and the
> core doesn't track the actual device state.
> 


> I think the problem here is that the PCI bus type's runtime PM callbacks
> aren't very sophisticated (they just choose the lowest possible low-power
> state and attempt to put the device into it) and I can see two possible
> ways to address that.
> 
> First, you can modify pci_pm_runtime_suspend/_resume() to handle multiple
> states (for example, to choose the target low-power state more intelligently
> than they do right now).  Second, you can add a PM domain that will do what
> you want from pci_pm_runtime_suspend/_resume() for a specific set of devices.
> 
But RPM_SUSPENDED is set by the PM core after .runtime_suspend() has been
invoked, even if the device is in D1 instead of D2, right?

So the problem is: if a device in a generic power domain supports
two low-power states, one compatible with generic power domain power
off and the other not, how can the device driver pass this information
to the generic power domain, i.e. how can it runtime suspend a device while
keeping the generic power domain always on?

thanks,
rui

Rafael J. Wysocki - Feb. 21, 2012, 9:43 p.m.
On Tuesday, February 21, 2012, Zhang Rui wrote:
> On Tue, 2012-02-21 at 00:13 +0100, Rafael J. Wysocki wrote:
> > > So how to handle this case, say, for a device in the generic PM domain
> > > that supports 2 different low power state, D1 and D2.
> > > D2 is deeper than D1, and it is kind of cold power off with remote
> > > wakeup disabled. If the driver needs to runtime suspend the device with
> > > remote wakeup enabled, it should set the device to D1, but it can not
> > > set the RPM_SUSPEND?
> > 
> > The device is regarded as "suspended" if its bus type's (or PM domain's)
> > .runtime_suspend() callback has been executed and has returned 0 (success).
> > What the callback has actually done is not of any interest to the core.
> > 
> right.
> 
> > Now, the D1 and D2 case has to be handled by the bus (PM domain) and
> > driver.  In both cases the device will be regarded as "suspended" and the
> > core doesn't track the actual device state.
> > 
> 
> 
> > I think the problem here is that the PCI bus type's runtime PM callbacks
> > aren't very sophisticated (they just choose the lowest possible low-power
> > state and attempt to put the device into it) and I can see two possible
> > ways to address that.
> > 
> > First, you can modify pci_pm_runtime_suspend/_resume() to handle multiple
> > states (for example, to choose the target low-power state more intelligently
> > than they do right now).  Second, you can add a PM domain that will do what
> > you want from pci_pm_runtime_suspend/_resume() for a specific set of devices.
> > 
> But RPM_SUSPENDED is set by PM core after .runtime_suspend() being
> invoked, even if device is in D1 instead of D2, right?
> 
> So the problem is that, if a device in a generic power domain supports
> two low power state, one is compatible with generic power domain power
> off and another is not, how can the device driver pass this information
> to the generic power domain, i.e. how to runtime suspend a device while
> keep the generic power domain always on?

There are two "low-power" levels in the generic PM domains framework.  The
first one is the per-device low-power in which devices are put into their
individual (programmable) low-power states by the domain .dev_ops->stop()
callback.  The second one is when .stop() has been called for all devices,
so presumably all of them are in programmable low-power states and it's
possible to switch the entire domain off.  This is done by the domain
.power_off() callback.

It seems that the trick might be to make .dev_ops->stop() avoid turning off
power resources for the last suspending device in the domain and leave that
to domain .power_off().
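
A minimal user-space sketch of that split, assuming hypothetical structures
(struct pd_dev, struct pd_domain) rather than the real genpd types: the
per-device stop step never touches the shared power resources, and only the
domain-level step switches them off once every device has been stopped.

```c
#include <stdbool.h>
#include <stddef.h>

struct pd_dev {
	bool stopped;            /* .stop() has run for this device */
};

struct pd_domain {
	struct pd_dev *devs;
	size_t ndevs;
	bool power_resources_on; /* shared by all devices in the domain */
};

/* Per-device level: quiesce the device, leave shared power alone. */
void pd_dev_stop(struct pd_domain *dom, size_t idx)
{
	dom->devs[idx].stopped = true;
}

/*
 * Domain level: only here are the shared power resources switched off.
 * Returns true if the domain was actually powered off.
 */
bool pd_domain_power_off(struct pd_domain *dom)
{
	size_t i;

	for (i = 0; i < dom->ndevs; i++)
		if (!dom->devs[i].stopped)
			return false;
	dom->power_resources_on = false;
	return true;
}
```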

Thanks,
Rafael
Zhang Rui - Feb. 22, 2012, 12:57 a.m.
On Tue, 2012-02-21 at 22:43 +0100, Rafael J. Wysocki wrote:
> On Tuesday, February 21, 2012, Zhang Rui wrote:
> > On Tue, 2012-02-21 at 00:13 +0100, Rafael J. Wysocki wrote:
> > > > So how to handle this case, say, for a device in the generic PM domain
> > > > that supports 2 different low power state, D1 and D2.
> > > > D2 is deeper than D1, and it is kind of cold power off with remote
> > > > wakeup disabled. If the driver needs to runtime suspend the device with
> > > > remote wakeup enabled, it should set the device to D1, but it can not
> > > > set the RPM_SUSPEND?
> > > 
> > > The device is regarded as "suspended" if its bus type's (or PM domain's)
> > > .runtime_suspend() callback has been executed and has returned 0 (success).
> > > What the callback has actually done is not of any interest to the core.
> > > 
> > right.
> > 
> > > Now, the D1 and D2 case has to be handled by the bus (PM domain) and
> > > driver.  In both cases the device will be regarded as "suspended" and the
> > > core doesn't track the actual device state.
> > > 
> > 
> > 
> > > I think the problem here is that the PCI bus type's runtime PM callbacks
> > > aren't very sophisticated (they just choose the lowest possible low-power
> > > state and attempt to put the device into it) and I can see two possible
> > > ways to address that.
> > > 
> > > First, you can modify pci_pm_runtime_suspend/_resume() to handle multiple
> > > states (for example, to choose the target low-power state more intelligently
> > > than they do right now).  Second, you can add a PM domain that will do what
> > > you want from pci_pm_runtime_suspend/_resume() for a specific set of devices.
> > > 
> > But RPM_SUSPENDED is set by PM core after .runtime_suspend() being
> > invoked, even if device is in D1 instead of D2, right?
> > 
> > So the problem is that, if a device in a generic power domain supports
> > two low power state, one is compatible with generic power domain power
> > off and another is not, how can the device driver pass this information
> > to the generic power domain, i.e. how to runtime suspend a device while
> > keep the generic power domain always on?
> 
> There are two "low-power" levels in the generic PM domains framework.  The
> first one is the per-device low-power in which devices are put into their
> individual (programmable) low-power states by the domain .dev_ops->stop()
> callback.  The second one is when .stop() has been called for all devices,
> so presumably all of them are in programmable low-power states and it's
> possible to switch the entire domain off.  This is done by the domain
> .power_off() callback.
> 
> It seems that the trick might be to make .dev_ops->stop() avoid turning off
> power resources for the last suspending device in the domain and leave that
> to domain .power_off().
> 
Yeah, that's a good idea.
So how about this proposal for ZPODD:
1. Create an ACPI generic power domain for every device that has a _PR3
   method, the SATA port in this case.
2. Add the sr device to the generic power domain via the genpd APIs.
   This can be done either in the ATA port driver or in the sr driver.
3. The .power_off callback of the generic power domain
   a) checks whether all the devices allow power off (for the trick above)
   b) turns off the power resources in _PR3.
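
Step 3 could look roughly like the following user-space model. Everything in
it is an assumption made for illustration: struct zp_dev, struct zp_domain,
the allows_power_off field and the pr3_on flag standing in for the _PR3 power
resources are hypothetical, and no genpd or ACPI API is used.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct zp_dev {
	bool suspended;
	bool allows_power_off;   /* e.g. false while wakeup must stay armed */
};

struct zp_domain {
	struct zp_dev **devs;    /* SATA port plus the sr device (step 2) */
	size_t ndevs;
	bool pr3_on;             /* models the _PR3 power resources */
};

/* Hypothetical .power_off callback of the per-port power domain. */
int zp_domain_power_off(struct zp_domain *dom)
{
	size_t i;

	for (i = 0; i < dom->ndevs; i++) {
		const struct zp_dev *d = dom->devs[i];

		if (!d->suspended || !d->allows_power_off)
			return -EBUSY;   /* step 3a failed, keep power on */
	}
	dom->pr3_on = false;             /* step 3b */
	return 0;
}
```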

thanks,
rui


Patch

diff --git a/include/linux/pm.h b/include/linux/pm.h
index e4982ac..4a09c76 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -474,6 +474,7 @@  struct dev_pm_info {
 	bool			is_prepared:1;	/* Owned by the PM core */
 	bool			is_suspended:1;	/* Ditto */
 	bool			ignore_children:1;
+	bool			can_power_off:1;
 	spinlock_t		lock;
 #ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
index 609daae..81f3f13 100644
--- a/include/linux/pm_runtime.h
+++ b/include/linux/pm_runtime.h
@@ -100,6 +100,33 @@  static inline void pm_runtime_mark_last_busy(struct device *dev)
 	ACCESS_ONCE(dev->power.last_busy) = jiffies;
 }
 
+static inline bool pm_runtime_can_power_off(struct device *dev)
+{
+	return !!dev->power.can_power_off;
+}
+
+static inline int check_fn(struct device *dev, void *data)
+{
+	return pm_runtime_can_power_off(dev) ? 0 : -1;
+}
+
+static inline bool pm_runtime_allow_power_off(struct device *dev)
+{
+	return device_for_each_child(dev, NULL, check_fn) ? false : true;
+}
+
+static inline void pm_runtime_enable_power_off(struct device *dev)
+{
+	if (!dev->power.is_prepared)
+		dev->power.can_power_off = pm_runtime_allow_power_off(dev);
+}
+
+static inline void pm_runtime_disable_power_off(struct device *dev)
+{
+	if (!dev->power.is_prepared)
+		dev->power.can_power_off = false;
+}
+
 #else /* !CONFIG_PM_RUNTIME */
 
 static inline int __pm_runtime_idle(struct device *dev, int rpmflags)
@@ -149,6 +176,9 @@  static inline void pm_runtime_set_autosuspend_delay(struct device *dev,
 						int delay) {}
 static inline unsigned long pm_runtime_autosuspend_expiration(
 				struct device *dev) { return 0; }
+static inline bool pm_runtime_can_power_off(struct device *dev) { return false; }
+static inline void pm_runtime_enable_power_off(struct device *dev) {}
+static inline void pm_runtime_disable_power_off(struct device *dev) {}
 
 static inline void pm_runtime_update_max_time_suspended(struct device *dev,
 							s64 delta_ns) {}
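
The semantics of the helpers added by this patch can be tried out in a small
user-space model. struct m_device and the m_ functions below are hypothetical
stand-ins that mirror pm_runtime_allow_power_off() and
pm_runtime_enable_power_off() from the diff; the child list replaces
device_for_each_child().

```c
#include <stdbool.h>
#include <stddef.h>

struct m_device {
	bool can_power_off;
	bool is_prepared;            /* system sleep transition in progress */
	struct m_device **children;
	size_t nchildren;
};

/* Mirrors pm_runtime_allow_power_off(): true only if every child has
 * the flag set.  A device without children always qualifies. */
bool m_allow_power_off(const struct m_device *dev)
{
	size_t i;

	for (i = 0; i < dev->nchildren; i++)
		if (!dev->children[i]->can_power_off)
			return false;
	return true;
}

/* Mirrors pm_runtime_enable_power_off(): a no-op while is_prepared. */
void m_enable_power_off(struct m_device *dev)
{
	if (!dev->is_prepared)
		dev->can_power_off = m_allow_power_off(dev);
}
```

With the A/B/C topology from the changelog, A's attempt to set the flag fails
until both B and C have set theirs.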