Patchwork libata: skip EH autopsy and recovery during suspend

login
register
mail settings
Submitter Tejun Heo
Date Sept. 2, 2010, 10:02 a.m.
Message ID <4C7F7649.3010409@gmail.com>
Download mbox | patch
Permalink /patch/63466/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

Tejun Heo - Sept. 2, 2010, 10:02 a.m.
For some mysterious reason, certain hardware reacts badly to usual EH
actions while the system is going for suspend.  As the devices won't
be needed until the system is resumed, ask EH to skip usual autopsy
and recovery and proceed directly to suspend.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
Can you please verify this one?

Thanks.

 drivers/ata/libata-core.c |   14 +++++++++++++-
 drivers/ata/libata-eh.c   |    4 ++++
 include/linux/libata.h    |    1 +
 3 files changed, 18 insertions(+), 1 deletion(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stephan Diestelhorst - Sept. 2, 2010, 2:33 p.m.
On Thursday 02 September 2010, 12:02:49 Tejun Heo wrote:
> For some mysterious reason, certain hardware reacts badly to usual EH
> actions while the system is going for suspend.  As the devices won't
> be needed until the system is resumed, ask EH to skip usual autopsy
> and recovery and proceed directly to suspend.

Thanks Tejun for this patch. Will test in a second!

Cheers,
  Stephan
Rafael J. Wysocki - Sept. 2, 2010, 8:11 p.m.
On Thursday, September 02, 2010, Stephan Diestelhorst wrote:
> On Thursday 02 September 2010, 12:02:49 Tejun Heo wrote:
> > For some mysterious reason, certain hardware reacts badly to usual EH
> > actions while the system is going for suspend.  As the devices won't
> > be needed until the system is resumed, ask EH to skip usual autopsy
> > and recovery and proceed directly to suspend.
> 
> Thanks Tejun for this patch. Will test in a second!

I'll wait, then, because it takes a couple of days to reproduce it on my machine.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki - Sept. 2, 2010, 8:16 p.m.
On Thursday, September 02, 2010, Tejun Heo wrote:
> For some mysterious reason, certain hardware reacts badly to usual EH
> actions while the system is going for suspend.  As the devices won't
> be needed until the system is resumed, ask EH to skip usual autopsy
> and recovery and proceed directly to suspend.

Putting the issue at hand aside, I'm not really sure if using SCSI EH for
suspending the controller is a good idea.  It seems overly complicated and
it doesn't match the new PCI suspend model with separate ->suspend(),
->freeze() and ->poweroff() callbacks.  Moreover, the passing of pm_message_t
back and forth doesn't make things clear either.

Would it be possible to rework this thing entirely at one point?

Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo - Sept. 2, 2010, 8:25 p.m.
Hello,

On 09/02/2010 10:16 PM, Rafael J. Wysocki wrote:
> Putting the issue at hand aside, I'm not really sure if using SCSI EH for
> suspending the controller is a good idea.  It seems overly complicated and
> it doesn't match the new PCI suspend model with separate ->suspend(),
> ->freeze() and ->poweroff() callbacks.  Moreover, the passing of pm_message_t
> back and forth doesn't make things clear either.
> 
> Would it be possible to rework this thing entirely at one point?

Well, I think I would need more than that to rework the whole thing.
There are a lot of benefits in sharing the same path between probing /
error handling and suspend/resuming.  ATA has a lot of quirks which
have to be dealt with and it will be very fragile to scatter handling
logics over multiple separate paths.  We definitely can try to make
the plumbing from power management easier to follow.

Thanks.
Rafael J. Wysocki - Sept. 2, 2010, 8:28 p.m.
On Thursday, September 02, 2010, Tejun Heo wrote:
> Hello,
> 
> On 09/02/2010 10:16 PM, Rafael J. Wysocki wrote:
> > Putting the issue at hand aside, I'm not really sure if using SCSI EH for
> > suspending the controller is a good idea.  It seems overly complicated and
> > it doesn't match the new PCI suspend model with separate ->suspend(),
> > ->freeze() and ->poweroff() callbacks.  Moreover, the passing of pm_message_t
> > back and forth doesn't make things clear either.
> > 
> > Would it be possible to rework this thing entirely at one point?
> 
> Well, I think I would need more than that to rework the whole thing.
> There are a lot of benefits in sharing the same path between probing /
> error handling and suspend/resuming.  ATA has a lot of quirks which
> have to be dealt with and it will be very fragile to scatter handling
> logics over multiple separate paths.  We definitely can try to make
> the plumbing from power management easier to follow.

That would be very nice.  In particular, I'd like to get rid of the
pm_message_t thing if possible.  And I'd like to avoid putting the
controller into D3 before creating hibernation image. :-)

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo - Sept. 2, 2010, 8:33 p.m.
Hello,

On 09/02/2010 10:28 PM, Rafael J. Wysocki wrote:
> That would be very nice.  In particular, I'd like to get rid of the
> pm_message_t thing if possible.  And I'd like to avoid putting the
> controller into D3 before creating hibernation image. :-)

Oh, yeah, things like that can definitely be changed, but I think it
would still need to be piped through EH.  That's how the queue gets
quiesced for those special operations and resume is basically probing,
so it doesn't make much sense to split them.  Please let me know how
it should work from power management POV and I'll be happy to convert
libata to fit the new behavior.

Thanks.
Stephan Diestelhorst - Sept. 2, 2010, 8:52 p.m.
On Thu, Sep 2, 2010 at 4:33 PM, Stephan Diestelhorst
<stephan.diestelhorst@amd.com> wrote:
> On Thursday 02 September 2010, 12:02:49 Tejun Heo wrote:
>> For some mysterious reason, certain hardware reacts badly to usual EH
>> actions while the system is going for suspend.  As the devices won't
>> be needed until the system is resumed, ask EH to skip usual autopsy
>> and recovery and proceed directly to suspend.
>
> Thanks Tejun for this patch. Will test in a second!

Applied, compiled and worked for the 5 iterations I managed to test it.
Much better than your previous patch, thanks a lot! I'd give it some more spins
tomorrow, though.


Thanks,
  Stephan
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern - Sept. 2, 2010, 9:01 p.m.
On Thu, 2 Sep 2010, Tejun Heo wrote:

> Hello,
> 
> On 09/02/2010 10:28 PM, Rafael J. Wysocki wrote:
> > That would be very nice.  In particular, I'd like to get rid of the
> > pm_message_t thing if possible.  And I'd like to avoid putting the
> > controller into D3 before creating hibernation image. :-)

Ultimately I don't think it will be possible to get rid of pm_message_t
throughout the entire kernel.  Not just because it's a legacy thing
spread all over the place, but because it sometimes is genuinely
useful.  In situations where there are only minimal differences between
the various suspend/resume paths, it makes a lot of sense to call a
single function and tell it which type of operation to perform.

> Oh, yeah, things like that can definitely be changed, but I think it
> would still need to be piped through EH.  That's how the queue gets
> quiesced for those special operations and resume is basically probing,
> so it doesn't make much sense to split them.  Please let me know how
> it should work from power management POV and I'll be happy to convert
> libata to fit the new behavior.

Tejun, I'm planning to make a few changes to the block layer and the
SCSI core in order to implement delayed autosuspend.  These changes
will affect system sleep too, so you may need to know about them.

Each request_queue structure will have an rpm_status variable that gets
updated as the corresponding device is suspended or resumed.  When the
status is SUSPENDING or RESUMING, requests will not be sent from the
request_queue to the driver unless they have cmd_type set to
REQ_TYPE_PM_SUSPEND or REQ_TYPE_PM_RESUME (actually I may combine those
two into a single REQ_TYPE_PM type).  If other types of request are
added to the queue when the status isn't ACTIVE, they will cause a
runtime resume to be started.

Initially this will affect only drivers using the SCSI layer.  
Commands sent during error handling won't be affected though, because
the SCSI error handler doesn't use requests or queues -- it sends
commands directly to the lower-level driver.

Does that all sound okay to you?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki - Sept. 2, 2010, 9:09 p.m.
On Thursday, September 02, 2010, Alan Stern wrote:
> On Thu, 2 Sep 2010, Tejun Heo wrote:
> 
> > Hello,
> > 
> > On 09/02/2010 10:28 PM, Rafael J. Wysocki wrote:
> > > That would be very nice.  In particular, I'd like to get rid of the
> > > pm_message_t thing if possible.  And I'd like to avoid putting the
> > > controller into D3 before creating hibernation image. :-)
> 
> Ultimately I don't think it will be possible to get rid of pm_message_t
> throughout the entire kernel.  Not just because it's a legacy thing
> spread all over the place, but because it sometimes is genuinely
> useful.  In situations where there are only minimal differences between
> the various suspend/resume paths, it makes a lot of sense to call a
> single function and tell it which type of operation to perform.

Well, that's why I said "if possible" above. :-)

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo - Sept. 3, 2010, 8:55 a.m.
cc'ing Jens and Christoph and quoting whole body for them.

On 09/02/2010 11:01 PM, Alan Stern wrote:
> On Thu, 2 Sep 2010, Tejun Heo wrote:
> 
>> Hello,
>>
>> On 09/02/2010 10:28 PM, Rafael J. Wysocki wrote:
>>> That would be very nice.  In particular, I'd like to get rid of the
>>> pm_message_t thing if possible.  And I'd like to avoid putting the
>>> controller into D3 before creating hibernation image. :-)
> 
> Ultimately I don't think it will be possible to get rid of pm_message_t
> throughout the entire kernel.  Not just because it's a legacy thing
> spread all over the place, but because it sometimes is genuinely
> useful.  In situations where there are only minimal differences between
> the various suspend/resume paths, it makes a lot of sense to call a
> single function and tell it which type of operation to perform.
> 
>> Oh, yeah, things like that can definitely be changed, but I think it
>> would still need to be piped through EH.  That's how the queue gets
>> quiesced for those special operations and resume is basically probing,
>> so it doesn't make much sense to split them.  Please let me know how
>> it should work from power management POV and I'll be happy to convert
>> libata to fit the new behavior.
> 
> Tejun, I'm planning to make a few changes to the block layer and the
> SCSI core in order to implement delayed autosuspend.  These changes
> will affect system sleep too, so you may need to know about them.
> 
> Each request_queue structure will have an rpm_status variable that gets
> updated as the corresponding device is suspended or resumed.  When the
> status is SUSPENDING or RESUMING, requests will not be sent from the
> request_queue to the driver unless they have cmd_type set to
> REQ_TYPE_PM_SUSPEND or REQ_TYPE_PM_RESUME (actually I may combine those
> two into a single REQ_TYPE_PM type).  If other types of request are
> added to the queue when the status isn't ACTIVE, they will cause a
> runtime resume to be started.
> 
> Initially this will affect only drivers using the SCSI layer.  
> Commands sent during error handling won't be affected though, because
> the SCSI error handler doesn't use requests or queues -- it sends
> commands directly to the lower-level driver.
> 
> Does that all sound okay to you?

Hmm... I can't really tell much without looking at the actual patch
but I'm a bit doubtful about using REQ_TYPE_* for suspend/resumes,
well, for that matter, for any other purpose than actual IO requests.
The current REQ_TYPE_PM_* are only used by ide and it's really messy.

Those special requests are very difficult to use in generic manner.
ie. How should remapping / cloning drivers like md, dm and loopback
handle them?  They end up just being a mechanism for specific low
level drivers to feed certain requests back to themselves to implement
suspend/resume sequence and as there's no clear defined common usage,
the meaning of those flags becomes ambiguous and confusing depending
on drivers and so on.

Another issue is that power management is probably better handled as a
sub problem of generic IO exception handling.  If you throw in
hot[un]plugs and transport errors and consider what should happen when
those events happen together, things get hairy if power management is
not well integrated with the usual exceptionn handling.

Thanks.
Alan Stern - Sept. 3, 2010, 2:16 p.m.
On Fri, 3 Sep 2010, Tejun Heo wrote:

> > Tejun, I'm planning to make a few changes to the block layer and the
> > SCSI core in order to implement delayed autosuspend.  These changes
> > will affect system sleep too, so you may need to know about them.
> > 
> > Each request_queue structure will have an rpm_status variable that gets
> > updated as the corresponding device is suspended or resumed.  When the
> > status is SUSPENDING or RESUMING, requests will not be sent from the
> > request_queue to the driver unless they have cmd_type set to
> > REQ_TYPE_PM_SUSPEND or REQ_TYPE_PM_RESUME (actually I may combine those
> > two into a single REQ_TYPE_PM type).  If other types of request are
> > added to the queue when the status isn't ACTIVE, they will cause a
> > runtime resume to be started.
> > 
> > Initially this will affect only drivers using the SCSI layer.  
> > Commands sent during error handling won't be affected though, because
> > the SCSI error handler doesn't use requests or queues -- it sends
> > commands directly to the lower-level driver.
> > 
> > Does that all sound okay to you?
> 
> Hmm... I can't really tell much without looking at the actual patch
> but I'm a bit doubtful about using REQ_TYPE_* for suspend/resumes,
> well, for that matter, for any other purpose than actual IO requests.
> The current REQ_TYPE_PM_* are only used by ide and it's really messy.

Yes.  I'm hoping that these changes will help simplify IDE, but I 
haven't looked at it yet.

> Those special requests are very difficult to use in generic manner.
> ie. How should remapping / cloning drivers like md, dm and loopback
> handle them?

They should never need to handle them.  These requests get used only
for preparing a device to change power modes.  Logical devices don't
need them, only physical devices do.  For example, you can't put a
logical RAID device into low-power mode; all you can do is put the
underlying physical drives into low power.

>  They end up just being a mechanism for specific low
> level drivers to feed certain requests back to themselves to implement
> suspend/resume sequence and as there's no clear defined common usage,
> the meaning of those flags becomes ambiguous and confusing depending
> on drivers and so on.

With my changes there now will be a clear defined usage.

> Another issue is that power management is probably better handled as a
> sub problem of generic IO exception handling.  If you throw in
> hot[un]plugs and transport errors and consider what should happen when
> those events happen together, things get hairy if power management is
> not well integrated with the usual exceptionn handling.

I don't see why.  Power management is not exceptional.  The relation
with hot [un]plug is already taken care of.  Transport error recovery
will block runtime power management if it occurs while the device is at
full power; it is an issue only if it occurs during a power transition.  
The SCSI error handler should work okay since it doesn't send commands
through the block layer.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stephan Diestelhorst - Sept. 7, 2010, 11:54 a.m.
On Thursday 02 September 2010, 22:52:56 I wrote:
> On Thu, Sep 2, 2010 at 4:33 PM, Stephan Diestelhorst
> <stephan.diestelhorst@amd.com> wrote:
> > On Thursday 02 September 2010, 12:02:49 Tejun Heo wrote:
> >> For some mysterious reason, certain hardware reacts badly to usual EH
> >> actions while the system is going for suspend.  As the devices won't
> >> be needed until the system is resumed, ask EH to skip usual autopsy
> >> and recovery and proceed directly to suspend.
> >
> > Thanks Tejun for this patch. Will test in a second!
> 
> Applied, compiled and worked for the 5 iterations I managed to test it.
> Much better than your previous patch, thanks a lot! I'd give it some more spins
> tomorrow, though.

Alright, I have tested for more than 20 iterations now. It works
great! Thanks for the fix!

Are there any plans for upstream inclusion?

Thanks,
  Stephan

Patch

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 9ceb493..92cd5f3 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5427,6 +5427,7 @@  static int ata_host_request_pm(struct ata_host *host, pm_message_t mesg,
  */
 int ata_host_suspend(struct ata_host *host, pm_message_t mesg)
 {
+	unsigned int ehi_flags = ATA_EHI_QUIET;
 	int rc;

 	/*
@@ -5435,7 +5436,18 @@  int ata_host_suspend(struct ata_host *host, pm_message_t mesg)
 	 */
 	ata_lpm_enable(host);

-	rc = ata_host_request_pm(host, mesg, 0, ATA_EHI_QUIET, 1);
+	/*
+	 * On some hardware, device fails to respond after spun down
+	 * for suspend.  As the device won't be used before being
+	 * resumed, we don't need to touch the device.  Ask EH to skip
+	 * the usual stuff and proceed directly to suspend.
+	 *
+	 * http://thread.gmane.org/gmane.linux.ide/46764
+	 */
+	if (mesg.event == PM_EVENT_SUSPEND)
+		ehi_flags |= ATA_EHI_NO_AUTOPSY | ATA_EHI_NO_RECOVERY;
+
+	rc = ata_host_request_pm(host, mesg, 0, ehi_flags, 1);
 	if (rc == 0)
 		host->dev->power.power_state = mesg;
 	return rc;
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 0108731..95838b3 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -3242,6 +3242,10 @@  static int ata_eh_skip_recovery(struct ata_link *link)
 	if (link->flags & ATA_LFLAG_DISABLED)
 		return 1;

+	/* skip if explicitly requested */
+	if (ehc->i.flags & ATA_EHI_NO_RECOVERY)
+		return 1;
+
 	/* thaw frozen port and recover failed devices */
 	if ((ap->pflags & ATA_PFLAG_FROZEN) || ata_link_nr_enabled(link))
 		return 0;
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 89115f8..d1ef41b 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -335,6 +335,7 @@  enum {
 	ATA_EHI_HOTPLUGGED	= (1 << 0),  /* could have been hotplugged */
 	ATA_EHI_NO_AUTOPSY	= (1 << 2),  /* no autopsy */
 	ATA_EHI_QUIET		= (1 << 3),  /* be quiet */
+	ATA_EHI_NO_RECOVERY	= (1 << 4),  /* no recovery */

 	ATA_EHI_DID_SOFTRESET	= (1 << 16), /* already soft-reset this port */
 	ATA_EHI_DID_HARDRESET	= (1 << 17), /* already soft-reset this port */