Patchwork Adding runtime PM support to sata_mv driver

login
register
mail settings
Submitter Priyanka Gupta
Date Jan. 3, 2011, 5:07 a.m.
Message ID <AANLkTiksyiR3b3QZu-B7VnfhK_P4t1Y5kHxhAZ4XU+en@mail.gmail.com>
Download mbox | patch
Permalink /patch/77218/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

Priyanka Gupta - Jan. 3, 2011, 5:07 a.m.
Hi,

Thanks for your replies. I am ccing the ide list now (and re-pasting
the ide patch).

Let me try and elaborate on what I am trying to achieve: I would like
to add a way to move the marvell sata controller to D3hot to save some
power. For my short term use case, I am just interested in turning off
the controllers which don't have any disks attached to them. For this,
I guess I don't really need to do much as the pci pm subsystem takes
care of suspending the controller. However, my goal is to keep the
patch generic, i.e. be able to turn off any marvell controller. So, if
there are disks attached to it, then I think that it should spin down
all the disks and power off the controller.

I would think that some userspace algorithm would decide if the
sata_mv controller should be moved into auto mode (i.e runtime power
managed). This might be done when there are no disks attached or the
system doesn't expect to use the disks behind the controller. If there
is intermittent disk usage, then the user space would not move it into
auto and keep the state as ON.

I am using the ata_pci_device_suspend for the runtime _suspend method
also because AFAICT it does exactly what I am trying to achieve. This
will try and do error handling (scsi_error_handler) which tries to
resume a device if its in the suspending state causing both threads to
be stuck.

Rafael, you mentioned in one of your emails that the patch doesn't
seem to be doing anything useful. How else do I suspend the sata_mv
controller in a proper way? Do you think that there isn't a need to
suspend the disks attached to the controller?

Thanks a lot,
Priyanka


From: Priyanka Gupta <ankaguptaca@gmail.com>
Date: Tue, 28 Dec 2010 20:49:45 -0600
Subject: [PATCH] Marvell controller Runtime Power Management

---
 arch/x86/configs/x86_64_defconfig |    1 +
 drivers/ata/sata_mv.c             |   46 +++++++++++++++++++++++++++++++++++-
 2 files changed, 45 insertions(+), 2 deletions(-)

 /* move to PCI layer or libata core? */
@@ -4284,6 +4299,18 @@ static void mv_print_info(struct ata_host *host)
 }

 /**
+ *      mv_pci_remove_one - handle a removal of the marvell controller
+ *      @pdev: PCI device to be removed
+ *
+ *      LOCKING:
+ *      Inherited from caller.
+ */
+void mv_pci_remove_one(struct pci_dev *pdev) {
+       ata_pci_remove_one(pdev);
+       pm_runtime_get_noresume(&pdev->dev);
+}
+
+/**
 *      mv_pci_init_one - handle a positive probe of a PCI Marvell host
 *      @pdev: PCI device found
 *      @ent: PCI device ID entry for the matched host
@@ -4359,10 +4386,25 @@ static int mv_pci_init_one(struct pci_dev *pdev,

       pci_set_master(pdev);
       pci_try_set_mwi(pdev);
+       pm_runtime_put_noidle(&pdev->dev);
       return ata_host_activate(host, pdev->irq, mv_interrupt, IRQF_SHARED,
                                IS_GEN_I(hpriv) ? &mv5_sht : &mv6_sht);
 }

+#ifdef CONFIG_PM_OPS
+static int mv_pci_runtime_device_suspend(struct device *dev)
+{
+       struct pci_dev *pdev = to_pci_dev(dev);
+       return ata_pci_device_suspend(pdev, PMSG_SUSPEND);
+}
+
+static int mv_pci_runtime_device_resume(struct device *dev)
+{
+       struct pci_dev *pdev = to_pci_dev(dev);
+       return mv_pci_device_resume(pdev);
+}
+#endif
+
 #ifdef CONFIG_PM
 static int mv_pci_device_resume(struct pci_dev *pdev)
 {
--
1.7.3.1


On Fri, Dec 31, 2010 at 6:01 AM, Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Fri, Dec 31, 2010 at 12:09:11AM +0100, Rafael J. Wysocki wrote:
> > Adding Tejun to the CC list, because he's our SATA expert.
> >
> > Also appending the patch so that he can actually see it.
>
> And you forgot. :-)
>
> > [Priyanka, it usually is better do discuss such things on mailing
> > lists, so that people can look into the archives and see the patches
> > etc.]
>
> Yes, when you reply, please at least cc linux-ide@vger.kernel.org.
>
> > > > already been moved to SUSPENDING state by the __pm_runtime_suspend).
> > > > This in turn invokes the scsi_error_handler to run, which ends up
> > > > calling scsi_autopm_get_host which then tries to get the device and
> > > > hence tries to resume it. And they end up stuck, one thread trying to
> > > > suspend which won't succeed, and the other thread trying to resume.
> > > > See below the stack trace for both.
> > > >
> > > > If I skip the error handling the suspend works fine as you would
> > > > expect. However, that isn't a solution. I was wondering if you can
> > > > suggest what might be the best possible solution to this problem?
> > > >
> > > > I am attaching a patch as an attachment. Hope you can help.
> > >
> > > The problem is that you call ata_pci_device_suspend().  That routine
> > > apparently was not meant for use by runtime PM.
> >
> > Agreed.
>
> Hmmm... I'm not familiar with runtime PM.  What's it supposed to
> achieve?  Should it spin down the drive and power off the controller?
> If so, the only proper way would be going through EH, but how often is
> this gonna be activated?  Even auto-spindown feature implemented in
> hardware often doesn't help much (or even consumes more power) unless
> the idle period is quite long.
>
> Thanks.
>
> --
> tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo - Jan. 3, 2011, 2:39 p.m.
Hello,

On Sun, Jan 02, 2011 at 11:07:43PM -0600, Priyanka Gupta wrote:
> Let me try and elaborate on what I am trying to achieve: I would like
> to add a way to move the marvell sata controller to D3hot to save some
> power.

Oh I see.  You wanna put the controller into sleep.  You'll have to
implement EH actions for it and invoke EH to do it.  The problem is
that for it to work in generic manner, the operations need to be
synchronized with other accesses to the hardware and going through EH
is the easiest way to achieve that.  Something like the following
would work.

On runtime suspend

* Tell EH to kick in and suspend the controller (and wait (or not)).

* EH kicks in.  Mark the port inactive so that further commands
  processing won't happen (it would probably be nice if this is
  something the PM framework can guarantee but I don't think it's done
  that way, is it?)

* EH skips all other EH actions and put the controller in sleep.

On runtime resume

* Tell EH to kick in and wake up the controller.

* EH kicks in.  Powers up the controller and revalidates all the
  attached devices.

So, it's gonna take a bit more code than posted.

Thanks.
Alan Stern - Jan. 3, 2011, 3:15 p.m.
On Mon, 3 Jan 2011, Tejun Heo wrote:

> Hello,
> 
> On Sun, Jan 02, 2011 at 11:07:43PM -0600, Priyanka Gupta wrote:
> > Let me try and elaborate on what I am trying to achieve: I would like
> > to add a way to move the marvell sata controller to D3hot to save some
> > power.
> 
> Oh I see.  You wanna put the controller into sleep.  You'll have to
> implement EH actions for it and invoke EH to do it.  The problem is
> that for it to work in generic manner, the operations need to be
> synchronized with other accesses to the hardware and going through EH
> is the easiest way to achieve that.  Something like the following
> would work.

Are you talking about the usual SCSI error handler or something else 
specific to ATA drivers?

Runtime suspend should not occur unless all the child devices (i.e.,
the attached drives) are already suspended.  This means it is not
necessary to synchronize any operations, since there shouldn't be any
other accesses to the hardware going on.  Therefore there is no need to
use the error handler.

> On runtime suspend
> 
> * Tell EH to kick in and suspend the controller (and wait (or not)).
> 
> * EH kicks in.  Mark the port inactive so that further commands
>   processing won't happen (it would probably be nice if this is
>   something the PM framework can guarantee but I don't think it's done
>   that way, is it?)

Why must the port be marked inactive?  If any further commands need to
be processed then they should cause the controller to be resumed, at
which point the port will be active again.

> * EH skips all other EH actions and put the controller in sleep.

The PCI core already takes care of putting device into D3.  The EH 
doesn't need to do it.  And there shouldn't be any other EH actions 
pending.

> On runtime resume
> 
> * Tell EH to kick in and wake up the controller.

The PCI core already puts devices back into D0.

> * EH kicks in.  Powers up the controller and revalidates all the
>   attached devices.

At this point the attached devices should still be suspended.  
Presumably you would not want to revalidate them until their own 
runtime resumes occur.

> So, it's gonna take a bit more code than posted.

On the contrary, there should be almost no new code needed.  In fact,
calling pm_runtime_put_noidle() in the probe routine and
pm_runtime_get_noresume() in the remove routine should be almost good 
enough.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo - Jan. 3, 2011, 3:31 p.m.
Hello, Alan.

On Mon, Jan 03, 2011 at 10:15:40AM -0500, Alan Stern wrote:
> > Oh I see.  You wanna put the controller into sleep.  You'll have to
> > implement EH actions for it and invoke EH to do it.  The problem is
> > that for it to work in generic manner, the operations need to be
> > synchronized with other accesses to the hardware and going through EH
> > is the easiest way to achieve that.  Something like the following
> > would work.
> 
> Are you talking about the usual SCSI error handler or something else 
> specific to ATA drivers?

I'm talking specifically about libata EH.

> Runtime suspend should not occur unless all the child devices (i.e.,
> the attached drives) are already suspended.  This means it is not
> necessary to synchronize any operations, since there shouldn't be any
> other accesses to the hardware going on.  Therefore there is no need to
> use the error handler.

Well, it's not that simple I'm afraid.  EH actions are asynchronous.
Even if all the downstream devices are suspended, PHY events can
happen any time and EH could be active.  Hmmm... a delta but it would
make more sense to put only the controller into hot sleep while
leaving the disk alone for rotating devices.

Also, on resume, as the controller was out, libata needs to do full
revalidation & reconfiguration.  There's no way to avoid EH.

> > * EH kicks in.  Mark the port inactive so that further commands
> >   processing won't happen (it would probably be nice if this is
> >   something the PM framework can guarantee but I don't think it's done
> >   that way, is it?)
> 
> Why must the port be marked inactive?  If any further commands need to
> be processed then they should cause the controller to be resumed, at
> which point the port will be active again.

Then, at least we would need to plug EH because commands aren't the
only intiation vector.

> > * EH skips all other EH actions and put the controller in sleep.
> 
> The PCI core already takes care of putting device into D3.  The EH 
> doesn't need to do it.  And there shouldn't be any other EH actions 
> pending.

So, it behaves differently from the usual suspend/resume?  We have
.suspend callback which puts the controller into D3.  Are you saying
for runtime PM that isn't necessary?  If so, wouldn't it be better to
unify behaviors between the two paths?

> > On runtime resume
> > 
> > * Tell EH to kick in and wake up the controller.
> 
> The PCI core already puts devices back into D0.

I still can't see how this would work without low level driver's help.
Who's gonna reconfigure the controller?  Or are controllers supposed
to maintain all configurations across D3(hot) sleep?

> > * EH kicks in.  Powers up the controller and revalidates all the
> >   attached devices.
> 
> At this point the attached devices should still be suspended.  
> Presumably you would not want to revalidate them until their own 
> runtime resumes occur.

Controller resume involves SATA bus resets.  As there can be multiple
devices per bus, you can't really divide things that neatly.  The
hardware isn't designed that way.

> > So, it's gonna take a bit more code than posted.
> 
> On the contrary, there should be almost no new code needed.  In fact,
> calling pm_runtime_put_noidle() in the probe routine and
> pm_runtime_get_noresume() in the remove routine should be almost good 
> enough.

Yeah, that would be nice.  I'm quite doubtful that would work as well
as you would expect it to tho.

Thanks.
Alan Stern - Jan. 3, 2011, 4:42 p.m.
On Mon, 3 Jan 2011, Tejun Heo wrote:

> > Runtime suspend should not occur unless all the child devices (i.e.,
> > the attached drives) are already suspended.  This means it is not
> > necessary to synchronize any operations, since there shouldn't be any
> > other accesses to the hardware going on.  Therefore there is no need to
> > use the error handler.
> 
> Well, it's not that simple I'm afraid.  EH actions are asynchronous.
> Even if all the downstream devices are suspended, PHY events can
> happen any time and EH could be active.  Hmmm... a delta but it would
> make more sense to put only the controller into hot sleep while
> leaving the disk alone for rotating devices.

That could be done.  How do you tell whether a particular drive is
rotational?

Do you have to worry about a rotating drive that gets its power from
the SATA bus (what happens if that power is suddenly no longer
available because the controller is suspended)?

> Also, on resume, as the controller was out, libata needs to do full
> revalidation & reconfiguration.  There's no way to avoid EH.

There's no way to tell if a drive was disconnected while the controller 
was suspended?

> > > * EH kicks in.  Mark the port inactive so that further commands
> > >   processing won't happen (it would probably be nice if this is
> > >   something the PM framework can guarantee but I don't think it's done
> > >   that way, is it?)
> > 
> > Why must the port be marked inactive?  If any further commands need to
> > be processed then they should cause the controller to be resumed, at
> > which point the port will be active again.
> 
> Then, at least we would need to plug EH because commands aren't the
> only intiation vector.

I have no idea how libata is designed, and of course the runtime PM
implementation will depend heavily on those details.  All I can do is 
offer advice about runtime PM.  There are two major design options: You 
can leave the controller powered on as long as any of the attached 
drives are running, or you can power-down the controller in between 
accesses.  The runtime PM core supports both approaches.

> > > * EH skips all other EH actions and put the controller in sleep.
> > 
> > The PCI core already takes care of putting device into D3.  The EH 
> > doesn't need to do it.  And there shouldn't be any other EH actions 
> > pending.
> 
> So, it behaves differently from the usual suspend/resume?

The PCI subsystem's implementation is somewhat different.

>  We have
> .suspend callback which puts the controller into D3.

It does so by calling pci_prepare_to_sleep()?  Compare that with 
pci_finish_runtime_suspend(), which is called directly by 
pci_pm_runtime_suspend().

>  Are you saying
> for runtime PM that isn't necessary?  If so, wouldn't it be better to
> unify behaviors between the two paths?

I don't know.  Certainly for runtime suspend it is necessary to put a
PCI device into D3hot.  For system suspend it might not be necessary,
depending on the platform.

> > > On runtime resume
> > > 
> > > * Tell EH to kick in and wake up the controller.
> > 
> > The PCI core already puts devices back into D0.
> 
> I still can't see how this would work without low level driver's help.
> Who's gonna reconfigure the controller?  Or are controllers supposed
> to maintain all configurations across D3(hot) sleep?

The low-level driver has to take care of all these special
requirements.  Note that many PCI controllers _do_ retain their
configuration across D3 sleep -- maybe not SATA controllers, though.

> > > * EH kicks in.  Powers up the controller and revalidates all the
> > >   attached devices.
> > 
> > At this point the attached devices should still be suspended.  
> > Presumably you would not want to revalidate them until their own 
> > runtime resumes occur.
> 
> Controller resume involves SATA bus resets.  As there can be multiple
> devices per bus, you can't really divide things that neatly.  The
> hardware isn't designed that way.

Obviously, runtime PM has to be implemented in a way that will work 
with the hardware design.

> > > So, it's gonna take a bit more code than posted.
> > 
> > On the contrary, there should be almost no new code needed.  In fact,
> > calling pm_runtime_put_noidle() in the probe routine and
> > pm_runtime_get_noresume() in the remove routine should be almost good 
> > enough.
> 
> Yeah, that would be nice.  I'm quite doubtful that would work as well
> as you would expect it to tho.

Well, perhaps not...

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg Freemyer - Jan. 3, 2011, 7:01 p.m.
On Mon, Jan 3, 2011 at 11:42 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Mon, 3 Jan 2011, Tejun Heo wrote:
>
>> > Runtime suspend should not occur unless all the child devices (i.e.,
>> > the attached drives) are already suspended.  This means it is not
>> > necessary to synchronize any operations, since there shouldn't be any
>> > other accesses to the hardware going on.  Therefore there is no need to
>> > use the error handler.
>>
>> Well, it's not that simple I'm afraid.  EH actions are asynchronous.
>> Even if all the downstream devices are suspended, PHY events can
>> happen any time and EH could be active.  Hmmm... a delta but it would
>> make more sense to put only the controller into hot sleep while
>> leaving the disk alone for rotating devices.
>
> That could be done.  How do you tell whether a particular drive is
> rotational?
>

I don't think there is a kernel flag for it (yet).

Per ATA-8 there is flag in the identify block.

http://markmail.org/message/2vonazw4bck7nefa

As that email says, I don't know how reliable the flag is.

Greg
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg Freemyer - Jan. 3, 2011, 11:10 p.m.
On Mon, Jan 3, 2011 at 2:01 PM, Greg Freemyer <greg.freemyer@gmail.com> wrote:
> On Mon, Jan 3, 2011 at 11:42 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
>> On Mon, 3 Jan 2011, Tejun Heo wrote:
>>
>>> > Runtime suspend should not occur unless all the child devices (i.e.,
>>> > the attached drives) are already suspended.  This means it is not
>>> > necessary to synchronize any operations, since there shouldn't be any
>>> > other accesses to the hardware going on.  Therefore there is no need to
>>> > use the error handler.
>>>
>>> Well, it's not that simple I'm afraid.  EH actions are asynchronous.
>>> Even if all the downstream devices are suspended, PHY events can
>>> happen any time and EH could be active.  Hmmm... a delta but it would
>>> make more sense to put only the controller into hot sleep while
>>> leaving the disk alone for rotating devices.
>>
>> That could be done.  How do you tell whether a particular drive is
>> rotational?
>>
>
> I don't think there is a kernel flag for it (yet).
>
> Per ATA-8 there is flag in the identify block.
>
> http://markmail.org/message/2vonazw4bck7nefa
>
> As that email says, I don't know how reliable the flag is.
>
> Greg

Ignore my noise, as of 2.6.36 you get it via ata_id_rotation_rate().
If its set to 1, then its a SSD.

Unfortunately, needed horkage is not yet implemented.

And the 2 proposed patches I've seen to do it were in the scsi-ata
translation logic ( drivers/ata/libata-scsi.c) and not readily usable
other code directly.

Looks to me like a post_horkage_ata_id_rotation_rate() routine needs
to be created between libata-scsi and ata_id_rotation_rate().

(More in the above referenced thread if you want to follow it.)

Greg
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Priyanka Gupta - Jan. 5, 2011, 4:31 p.m.
Tejun,

The patch that I sent does invoke EH, both for resume and suspend. I
am calling ata_pci_device_suspend as the runtime suspend method. This
will call ata_host_suspend which will end up calling
ata_host_request_pm, which sets some flags including
ATA_PFLAG_PM_PENDING and basically schedules the error handling
(ata_port_schedule_eh). Isn't that what is needed?

The runtime resume that I use, does the same - schedules error
handling with ATA_EH_RESET.

Is there any reason that the system suspend and resume methods can't
be used for runtime suspend and resume? I mean sure, we probably don't
need the ata_pci_device_do_suspend part since the runtime PM will take
care of that, but other than that it looks like we should just be able
to do a ata_host_suspend during runtime suspend.

Thanks,
Priyanka

On Mon, Jan 3, 2011 at 9:31 AM, Tejun Heo <tj@kernel.org> wrote:
> Hello, Alan.
>
> On Mon, Jan 03, 2011 at 10:15:40AM -0500, Alan Stern wrote:
>> > Oh I see.  You wanna put the controller into sleep.  You'll have to
>> > implement EH actions for it and invoke EH to do it.  The problem is
>> > that for it to work in generic manner, the operations need to be
>> > synchronized with other accesses to the hardware and going through EH
>> > is the easiest way to achieve that.  Something like the following
>> > would work.
>>
>> Are you talking about the usual SCSI error handler or something else
>> specific to ATA drivers?
>
> I'm talking specifically about libata EH.
>
>> Runtime suspend should not occur unless all the child devices (i.e.,
>> the attached drives) are already suspended.  This means it is not
>> necessary to synchronize any operations, since there shouldn't be any
>> other accesses to the hardware going on.  Therefore there is no need to
>> use the error handler.
>
> Well, it's not that simple I'm afraid.  EH actions are asynchronous.
> Even if all the downstream devices are suspended, PHY events can
> happen any time and EH could be active.  Hmmm... a delta but it would
> make more sense to put only the controller into hot sleep while
> leaving the disk alone for rotating devices.
>
> Also, on resume, as the controller was out, libata needs to do full
> revalidation & reconfiguration.  There's no way to avoid EH.
>
>> > * EH kicks in.  Mark the port inactive so that further commands
>> >   processing won't happen (it would probably be nice if this is
>> >   something the PM framework can guarantee but I don't think it's done
>> >   that way, is it?)
>>
>> Why must the port be marked inactive?  If any further commands need to
>> be processed then they should cause the controller to be resumed, at
>> which point the port will be active again.
>
> Then, at least we would need to plug EH because commands aren't the
> only intiation vector.
>
>> > * EH skips all other EH actions and put the controller in sleep.
>>
>> The PCI core already takes care of putting device into D3.  The EH
>> doesn't need to do it.  And there shouldn't be any other EH actions
>> pending.
>
> So, it behaves differently from the usual suspend/resume?  We have
> .suspend callback which puts the controller into D3.  Are you saying
> for runtime PM that isn't necessary?  If so, wouldn't it be better to
> unify behaviors between the two paths?
>
>> > On runtime resume
>> >
>> > * Tell EH to kick in and wake up the controller.
>>
>> The PCI core already puts devices back into D0.
>
> I still can't see how this would work without low level driver's help.
> Who's gonna reconfigure the controller?  Or are controllers supposed
> to maintain all configurations across D3(hot) sleep?
>
>> > * EH kicks in.  Powers up the controller and revalidates all the
>> >   attached devices.
>>
>> At this point the attached devices should still be suspended.
>> Presumably you would not want to revalidate them until their own
>> runtime resumes occur.
>
> Controller resume involves SATA bus resets.  As there can be multiple
> devices per bus, you can't really divide things that neatly.  The
> hardware isn't designed that way.
>
>> > So, it's gonna take a bit more code than posted.
>>
>> On the contrary, there should be almost no new code needed.  In fact,
>> calling pm_runtime_put_noidle() in the probe routine and
>> pm_runtime_get_noresume() in the remove routine should be almost good
>> enough.
>
> Yeah, that would be nice.  I'm quite doubtful that would work as well
> as you would expect it to tho.
>
> Thanks.
>
> --
> tejun
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo - Jan. 6, 2011, 4:53 a.m.
Hello, sorry about the delay.

On Mon, Jan 03, 2011 at 11:42:51AM -0500, Alan Stern wrote:
> > Well, it's not that simple I'm afraid.  EH actions are asynchronous.
> > Even if all the downstream devices are suspended, PHY events can
> > happen any time and EH could be active.  Hmmm... a delta but it would
> > make more sense to put only the controller into hot sleep while
> > leaving the disk alone for rotating devices.
> 
> That could be done.  How do you tell whether a particular drive is
> rotational?

As pointed out already, it's in the identify data.  Block layer also
keeps track of it using QUEUE_FLAG_NONROT (sd queries libata and sets
it), but I don't think it would be necessary to discern them.  Issuing
STANDBYNOW to SSDs would usually be noop.  Handling HDDs and SSDs the
same would work just fine.

> Do you have to worry about a rotating drive that gets its power from
> the SATA bus (what happens if that power is suddenly no longer
> available because the controller is suspended)?

No, SATA bus doesn't carry any power.  There are some drives (WD
mybook) which power down when SATA link goes offline tho and they may
spin down/up too often if the controller gets turned off aggressively.
Anyways, there are only a handful of them so no big deal.

> > Also, on resume, as the controller was out, libata needs to do full
> > revalidation & reconfiguration.  There's no way to avoid EH.
> 
> There's no way to tell if a drive was disconnected while the controller 
> was suspended?

Well, that's the EH reset/revalidation.  SATA has PHY events but there
are some issues.

* Link powersave effectively disables PHY event detection.

* PHY events can and do happen for other reasons so we can't afford to
  detach and reattach a SATA hard drive after a PHY event.  It could
  result in "when my air conditioner compressor kicks in, the root fs
  sometimes goes away".

So, the PHY events can be used as trigger to start rescan/revalidation
of the bus but not as the sole indication to base further actions
upon.

> > Then, at least we would need to plug EH because commands aren't the
> > only intiation vector.
> 
> I have no idea how libata is designed, and of course the runtime PM
> implementation will depend heavily on those details.  All I can do is 
> offer advice about runtime PM.  There are two major design options: You 
> can leave the controller powered on as long as any of the attached 
> drives are running, or you can power-down the controller in between 
> accesses.  The runtime PM core supports both approaches.

There are multiple layers of powersaving, so it's a bit complicated.

* Spinning down hard drives: This is difficult to get right, because
  spinning up not only consumes quite a bit of power but also involves
  significant latency (usually somewhere around or over 10secs) which
  can affect the general interactiveness of the system.  Furthermore,
  ATA hard drives already have hardware features to control this so
  implementing this in software again could be a bit silly.  Another
  danger is that hardware standby and STANDBYNOW issued by OS may
  interact weirdly.  Some (too many) drives would spin up to just spin
  down again if they receieve STANDBYNOW while already spun down.

* SATA link: This is called LPM and basically puts the link into
  powersave mode.  Again, there is hardware support for it on both
  controller (HIPM) and drive side (DIPM).  I personally think it
  should just have been DIPM but well...  Anyways, libata implements
  LPM and it's exported via SCSI sysfs.  The support can easily be
  extended to allow powering down unused ports.  Given the low latency
  dynamic nature of LPM, I'm not sure whether this would fit software
  runtime PM very well.

* SATA port/controller: This currently isn't implemented and could fit
  software runtime PM but for offline ports at least it can also be
  achieved using LPM support.  Not quite sure which way would be
  better.  It would be much nicer to integrate everything into the
  runtime PM framework but then again LPM doesn't really fit it.  If
  powering off occupied controllers has enough benefits, it definitely
  makes sense to implement it in the runtime PM framework but then we
  end up with separate LPM and runtime PM impelmentations.  Maybe it's
  inevitable.  I don't know.

> > So, it behaves differently from the usual suspend/resume?
> 
> The PCI subsystem's implementation is somewhat different.
> 
> >  We have
> > .suspend callback which puts the controller into D3.
> 
> It does so by calling pci_prepare_to_sleep()?  Compare that with 
> pci_finish_runtime_suspend(), which is called directly by 
> pci_pm_runtime_suspend().

Hmm... the PCI part of libata suspend is ata_pci_device_do_suspend()
and it calls pci_set_power_state() explicitly.

> > Are you saying for runtime PM that isn't necessary?  If so,
> > wouldn't it be better to unify behaviors between the two paths?
> 
> I don't know.  Certainly for runtime suspend it is necessary to put a
> PCI device into D3hot.  For system suspend it might not be necessary,
> depending on the platform.

Omitting pci_set_power_state() on system suspend may be an
alternative.  Again, I don't know.  The code has always been like that
and now I'm a bit afraid to change.  BTW, is there any case where
putting device into D3hot is necessary before going into system
suspend?  Aren't power always cut to the controllers anyway?

> > I still can't see how this would work without low level driver's help.
> > Who's gonna reconfigure the controller?  Or are controllers supposed
> > to maintain all configurations across D3(hot) sleep?
> 
> The low-level driver has to take care of all these special
> requirements.  Note that many PCI controllers _do_ retain their
> configuration across D3 sleep -- maybe not SATA controllers, though.

Even if some controllers retain the configuration, I think we're
better off with always reconfiguring them.  We need to reconfiguration
path for system resume anyway and it's always better to use common
code paths.  Maybe we would be able to do D3cold during runtime in the
future, who knows?

So, yeah, it's not a bit complicated.  If the only goal is to turn off
unoccpuied ports, using LPM framework would be the path with the least
amount of resistance but whether that is the right thing to do is a
different issue.  :-(
Rafael J. Wysocki - Jan. 6, 2011, 3:26 p.m.
On Thursday, January 06, 2011, Tejun Heo wrote:
> Hello, sorry about the delay.
...
> 
> Omitting pci_set_power_state() on system suspend may be an
> alternative.  Again, I don't know.  The code has always been like that
> and now I'm a bit afraid to change.

It probably is not necessary to call pci_set_power_state() on suspend, but
that would require some other changes (generally speaking, SATA will have to
be converted to using struct dev_pm_ops instead of the legacy PCI-specific
callbacks - which would be a good idea anyway and should be done _before_
trying to play with SATA runtime PM IMO).

> BTW, is there any case where putting device into D3hot is necessary before
> going into system suspend?  Aren't power always cut to the controllers
> anyway?

Not necessarily.  Well, that probably doesn't affect SATA controllers, but
there are known cases where devices draw power in a sleep state if they aren't
put into D3 on suspend directly.

> > > I still can't see how this would work without low level driver's help.
> > > Who's gonna reconfigure the controller?  Or are controllers supposed
> > > to maintain all configurations across D3(hot) sleep?
> > 
> > The low-level driver has to take care of all these special
> > requirements.  Note that many PCI controllers _do_ retain their
> > configuration across D3 sleep -- maybe not SATA controllers, though.
> 
> Even if some controllers retain the configuration, I think we're
> better off with always reconfiguring them.  We need to reconfiguration
> path for system resume anyway and it's always better to use common
> code paths.  Maybe we would be able to do D3cold during runtime in the
> future, who knows?

If you mean PCI D3cold here, we can't.  In fact we can't programmatically put
any device into PCI D3cold in any other way than by either putting its bus
segment into B3 or switching its power resources off (provided that we know
how to do that).

> So, yeah, it's not a bit complicated.  If the only goal is to turn off
> unoccpuied ports, using LPM framework would be the path with the least
> amount of resistance but whether that is the right thing to do is a
> different issue.  :-(

Still, putting unused _controllers_ into D3 might be a good idea too.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern - Jan. 6, 2011, 3:40 p.m.
On Wed, 5 Jan 2011, Tejun Heo wrote:

> > Do you have to worry about a rotating drive that gets its power from
> > the SATA bus (what happens if that power is suddenly no longer
> > available because the controller is suspended)?
> 
> No, SATA bus doesn't carry any power.  There are some drives (WD
> mybook) which power down when SATA link goes offline tho and they may
> spin down/up too often if the controller gets turned off aggressively.
> Anyways, there are only a handful of them so no big deal.

If you say so.

The drives we're talking about use the SCSI sd driver, right?  And the 
libata error handler is invoked by the regular SCSI EH?

This means that the call to scsi_autopm_get_host() in
scsi_error_handler() has to be removed.  I was planning to do that
while integrating runtime PM into the block layer.  But it will be
difficult; it will require that every block request sent while the
device is suspending or resuming be marked with a special request type
(something like REQ_TYPE_PM_SUSPEND and REQ_TYPE_PM_RESUME, as used
currently by the IDE driver).  Otherwise the block core will think that
it's a regular I/O request and will delay it until the device is fully
powered up again.

> There are multiple layers of powersaving, so it's a bit complicated.
> 
> * Spinning down hard drives: This is difficult to get right, because
>   spinning up not only consumes quite a bit of power but also involves
>   significant latency (usually somewhere around or over 10secs) which
>   can affect the general interactiveness of the system.  Furthermore,
>   ATA hard drives already have hardware features to control this so
>   implementing this in software again could be a bit silly.  Another
>   danger is that hardware standby and STANDBYNOW issued by OS may
>   interact weirdly.  Some (too many) drives would spin up to just spin
>   down again if they receieve STANDBYNOW while already spun down.

These are not new problems.  The SCSI disk driver has to cope with them 
already during system suspend.

> * SATA link: This is called LPM and basically puts the link into
>   powersave mode.  Again, there is hardware support for it on both
>   controller (HIPM) and drive side (DIPM).  I personally think it
>   should just have been DIPM but well...  Anyways, libata implements
>   LPM and it's exported via SCSI sysfs.  The support can easily be
>   extended to allow powering down unused ports.  Given the low latency
>   dynamic nature of LPM, I'm not sure whether this would fit software
>   runtime PM very well.

I don't know either.  In theory runtime PM can be pretty low latency,
i.e., usable from within an interrupt handler, if the underlying I/O
operations can be carried out without blocking.

> * SATA port/controller: This currently isn't implemented and could fit
>   software runtime PM but for offline ports at least it can also be
>   achieved using LPM support.  Not quite sure which way would be
>   better.  It would be much nicer to integrate everything into the
>   runtime PM framework but then again LPM doesn't really fit it.  If
>   powering off occupied controllers has enough benefits, it definitely
>   makes sense to implement it in the runtime PM framework but then we
>   end up with separate LPM and runtime PM impelmentations.  Maybe it's
>   inevitable.  I don't know.

It might be a good idea to step back and think about what form the 
final implementation should take.  At this point we may not have enough 
information to do that, however.

> > I don't know.  Certainly for runtime suspend it is necessary to put a
> > PCI device into D3hot.  For system suspend it might not be necessary,
> > depending on the platform.
> 
> Omitting pci_set_power_state() on system suspend may be an
> alternative.  Again, I don't know.  The code has always been like that
> and now I'm a bit afraid to change.  BTW, is there any case where
> putting device into D3hot is necessary before going into system
> suspend?  Aren't power always cut to the controllers anyway?

I think it depends on the platform.  Non-ACPI systems tend to be 
different from ACPI-based systems.

> So, yeah, it's not a bit complicated.  If the only goal is to turn off
> unoccpuied ports, using LPM framework would be the path with the least
> amount of resistance but whether that is the right thing to do is a
> different issue.  :-(

For now it might be the easiest.  Runtime PM can be revisited later.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/arch/x86/configs/x86_64_defconfig
b/arch/x86/configs/x86_64_defconfig
index ee01a9d..b48d6ae 100644
--- a/arch/x86/configs/x86_64_defconfig
+++ b/arch/x86/configs/x86_64_defconfig
@@ -55,6 +55,7 @@  CONFIG_KEXEC=y
 CONFIG_CRASH_DUMP=y
 # CONFIG_COMPAT_VDSO is not set
 CONFIG_PM=y
+CONFIG_PM_RUNTIME=y
 CONFIG_PM_DEBUG=y
 CONFIG_PM_TRACE_RTC=y
 CONFIG_HIBERNATION=y
diff --git a/drivers/ata/sata_mv.c b/drivers/ata/sata_mv.c
index bf74a36..34a2eed 100644
--- a/drivers/ata/sata_mv.c
+++ b/drivers/ata/sata_mv.c
@@ -65,6 +65,7 @@ 
 #include <linux/mbus.h>
 #include <linux/bitops.h>
 #include <linux/gfp.h>
+#include <linux/pm_runtime.h>
 #include <scsi/scsi_host.h>
 #include <scsi/scsi_cmnd.h>
 #include <scsi/scsi_device.h>
@@ -4191,21 +4192,35 @@  static struct platform_driver mv_platform_driver = {
 #ifdef CONFIG_PCI
 static int mv_pci_init_one(struct pci_dev *pdev,
                          const struct pci_device_id *ent);
+
+void mv_pci_remove_one(struct pci_dev *pdev);
+
 #ifdef CONFIG_PM
 static int mv_pci_device_resume(struct pci_dev *pdev);
 #endif

+#ifdef CONFIG_PM_OPS
+static int mv_pci_runtime_device_suspend(struct device *dev);
+static int mv_pci_runtime_device_resume(struct device *dev);
+
+static const struct dev_pm_ops mv_pm_ops = {
+       SET_SYSTEM_SLEEP_PM_OPS(ata_pci_device_suspend, mv_pci_device_resume)
+       SET_RUNTIME_PM_OPS(mv_pci_runtime_device_suspend,
mv_pci_runtime_device_resume, NULL)
+};
+#endif

 static struct pci_driver mv_pci_driver = {
       .name                   = DRV_NAME,
       .id_table               = mv_pci_tbl,
       .probe                  = mv_pci_init_one,
-       .remove                 = ata_pci_remove_one,
+       .remove                 = mv_pci_remove_one,
 #ifdef CONFIG_PM
       .suspend                = ata_pci_device_suspend,
       .resume                 = mv_pci_device_resume,
 #endif
-
+#ifdef CONFIG_PM_OPS
+       .driver.pm              = &mv_pm_ops,
+#endif
 };