Patchwork STANDBY IMMEDIATE failed on NVIDIA MCP5x controllers when system suspend

login
register
mail settings
Submitter James Bottomley
Date March 11, 2013, 8:49 a.m.
Message ID <1362991742.2395.9.camel@dabdike.int.hansenpartnership.com>
Download mbox | patch
Permalink /patch/226498/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

James Bottomley - March 11, 2013, 8:49 a.m.
On Mon, 2013-03-11 at 11:42 +0800, Aaron Lu wrote:
> Hi all,
> 
> I've seen some reports on STANDBY IMMEDIATE failed on NVIDIA MCP5x
> controllers when system goes to suspend(this command is sent by scsi sd
> driver on system suspend as a SCSI STOP command, which is translated to
> STANDBY IMMEDIATE ATA command). I've no idea of why this happened, so
> I wrote this email in hope of getting some new idea.
> 
> The related bug report:
> https://bugzilla.kernel.org/show_bug.cgi?id=48951
> 
> And google search showed that Peter reported a similar problem here:
> http://marc.info/?l=linux-ide&m=133534061316338&w=2
> 
> And bladud has found that, disable asyn suspend for the scsi target
> device can work around this problem.
> 
> Please feel free to suggest what can possibly be the cause, thanks.

I sometimes despair of people getting PM stuff right.  What on earth is
the point of refusing to suspend if the disk refuses to stop?  In theory
it gives the device more time to park its head, but almost no modern
drive requires this.  The next action suspend will take (if allowed) is
to power down peripherals which will forcibly stop the device.  The stop
request is purely informational for the device.  If it ignores it, then
the bigger hammer still works.

You really need to discriminate better between conditions we should and
shouldn't care about for suspend.  so in sd_suspend, we definitely care
if we can't flush the cache of a write back device because the power off
could lose data.  We don't really care if the disk says I can't stop, so
this is probably the correct fix.

James

---


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aaron Lu - March 11, 2013, 1:51 p.m.
On 2013-03-11 16:49, James Bottomley wrote:
> On Mon, 2013-03-11 at 11:42 +0800, Aaron Lu wrote:
>> Hi all,
>>
>> I've seen some reports on STANDBY IMMEDIATE failed on NVIDIA MCP5x
>> controllers when system goes to suspend(this command is sent by scsi sd
>> driver on system suspend as a SCSI STOP command, which is translated to
>> STANDBY IMMEDIATE ATA command). I've no idea of why this happened, so
>> I wrote this email in hope of getting some new idea.
>>
>> The related bug report:
>> https://bugzilla.kernel.org/show_bug.cgi?id=48951
>>
>> And google search showed that Peter reported a similar problem here:
>> http://marc.info/?l=linux-ide&m=133534061316338&w=2
>>
>> And bladud has found that, disable asyn suspend for the scsi target
>> device can work around this problem.
>>
>> Please feel free to suggest what can possibly be the cause, thanks.
>
> You really need to discriminate better between conditions we should and
> shouldn't care about for suspend.  so in sd_suspend, we definitely care
> if we can't flush the cache of a write back device because the power off
> could lose data.  We don't really care if the disk says I can't stop, so
> this is probably the correct fix.

Yes, this will make suspend succeed, but the problem is still there -
the drive does not respond to this command, while it will if async
suspend for the target device is disabled. I don't think scsi code has
any fault here, it feels more like a chip bug to me.

And if we solve it this way, affected user may complain about long
delay when suspending or uncomfortable with the error messages in dmesg,
since that command would eventually timeout. So I hope we can nderstand
what is going wrong here and perhaps do something to avoid it.

Thanks,
Aaron

>
> ---
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 7992635..384b621d 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3079,7 +3079,11 @@ static int sd_suspend(struct device *dev)
>
>   	if (sdkp->device->manage_start_stop) {
>   		sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
> -		ret = sd_start_stop_device(sdkp, 0);
> +		/*
> +		 * this is informational for the disk we're going to power it
> +		 * off anyway, so don't bother about the return status
> +		 */
> +		sd_start_stop_device(sdkp, 0);
>   	}
>
>   done:
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley - March 11, 2013, 2:34 p.m.
On Mon, 2013-03-11 at 21:51 +0800, Aaron Lu wrote:
> On 2013-03-11 16:49, James Bottomley wrote:
> > On Mon, 2013-03-11 at 11:42 +0800, Aaron Lu wrote:
> >> Hi all,
> >>
> >> I've seen some reports on STANDBY IMMEDIATE failed on NVIDIA MCP5x
> >> controllers when system goes to suspend(this command is sent by scsi sd
> >> driver on system suspend as a SCSI STOP command, which is translated to
> >> STANDBY IMMEDIATE ATA command). I've no idea of why this happened, so
> >> I wrote this email in hope of getting some new idea.
> >>
> >> The related bug report:
> >> https://bugzilla.kernel.org/show_bug.cgi?id=48951
> >>
> >> And google search showed that Peter reported a similar problem here:
> >> http://marc.info/?l=linux-ide&m=133534061316338&w=2
> >>
> >> And bladud has found that, disable asyn suspend for the scsi target
> >> device can work around this problem.
> >>
> >> Please feel free to suggest what can possibly be the cause, thanks.
> >
> > You really need to discriminate better between conditions we should and
> > shouldn't care about for suspend.  so in sd_suspend, we definitely care
> > if we can't flush the cache of a write back device because the power off
> > could lose data.  We don't really care if the disk says I can't stop, so
> > this is probably the correct fix.
> 
> Yes, this will make suspend succeed, but the problem is still there -
> the drive does not respond to this command, while it will if async
> suspend for the target device is disabled. I don't think scsi code has
> any fault here, it feels more like a chip bug to me.
> 
> And if we solve it this way, affected user may complain about long
> delay when suspending or uncomfortable with the error messages in dmesg,
> since that command would eventually timeout. So I hope we can nderstand
> what is going wrong here and perhaps do something to avoid it.

Oh, that seems to be the suspend order isn't careful enough.
__device_suspend() waits for its children, but the host disk are too far
separated in the device tree.  If the immediate children of the host are
all sync, that wait never actually waits for anything.

James
 

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Robert Hancock - March 11, 2013, 2:35 p.m.
(resending due to GMail mess-up, sorry)

On Mon, Mar 11, 2013 at 2:49 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Mon, 2013-03-11 at 11:42 +0800, Aaron Lu wrote:
>> Hi all,
>>
>> I've seen some reports on STANDBY IMMEDIATE failed on NVIDIA MCP5x
>> controllers when system goes to suspend(this command is sent by scsi sd
>> driver on system suspend as a SCSI STOP command, which is translated to
>> STANDBY IMMEDIATE ATA command). I've no idea of why this happened, so
>> I wrote this email in hope of getting some new idea.
>>
>> The related bug report:
>> https://bugzilla.kernel.org/show_bug.cgi?id=48951
>>
>> And google search showed that Peter reported a similar problem here:
>> http://marc.info/?l=linux-ide&m=133534061316338&w=2
>>
>> And bladud has found that, disable asyn suspend for the scsi target
>> device can work around this problem.
>>
>> Please feel free to suggest what can possibly be the cause, thanks.
>
> I sometimes despair of people getting PM stuff right.  What on earth is
> the point of refusing to suspend if the disk refuses to stop?  In theory
> it gives the device more time to park its head, but almost no modern
> drive requires this.  The next action suspend will take (if allowed) is
> to power down peripherals which will forcibly stop the device.  The stop
> request is purely informational for the device.  If it ignores it, then
> the bigger hammer still works.

This really does matter. Especially on laptop hard drives (but on many
desktop ones as well), we really want to stop the drive, and therefore
unload the heads, before the power is shut off. Drives are often rated
for a much lower number of emergency head unloads (caused by power
loss) over their lifespan than normal software-commanded ones. So over
a long period of time, repeatedly failing to stop the drive before
power off will shorten its life. Maybe aborting suspend for this is a
bit harsh but it's not something that should be ignored.

Just to add to this point, it doesn't just matter for rotational
drives, either. A lot of SSDs use the "standby now" command to do some
kind of cleanup before power off. Some drives document that a
subsequent power up may take longer for the drive to be ready if it
wasn't "spun down" prior to the last power off. It's conceivable that
in some crappy drives, lack of proper power-off sequence could even
cause data loss.

The question is, at the point where we are stopping the drive, has
anything been done to the ATA controller to suspend it? If so, that
sounds like a definite bug somewhere. It has to be something we are
doing on the kernel side, otherwise why would this work if async
suspend was disabled?

>
> You really need to discriminate better between conditions we should and
> shouldn't care about for suspend.  so in sd_suspend, we definitely care
> if we can't flush the cache of a write back device because the power off
> could lose data.  We don't really care if the disk says I can't stop, so
> this is probably the correct fix.
>
> James
>
> ---
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 7992635..384b621d 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3079,7 +3079,11 @@ static int sd_suspend(struct device *dev)
>
>         if (sdkp->device->manage_start_stop) {
>                 sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
> -               ret = sd_start_stop_device(sdkp, 0);
> +               /*
> +                * this is informational for the disk we're going to power it
> +                * off anyway, so don't bother about the return status
> +                */
> +               sd_start_stop_device(sdkp, 0);
>         }
>
>  done:
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley - March 11, 2013, 2:51 p.m.
On Mon, 2013-03-11 at 08:35 -0600, Robert Hancock wrote:
> (resending due to GMail mess-up, sorry)
> 
> On Mon, Mar 11, 2013 at 2:49 AM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> > On Mon, 2013-03-11 at 11:42 +0800, Aaron Lu wrote:
> >> Hi all,
> >>
> >> I've seen some reports on STANDBY IMMEDIATE failed on NVIDIA MCP5x
> >> controllers when system goes to suspend(this command is sent by scsi sd
> >> driver on system suspend as a SCSI STOP command, which is translated to
> >> STANDBY IMMEDIATE ATA command). I've no idea of why this happened, so
> >> I wrote this email in hope of getting some new idea.
> >>
> >> The related bug report:
> >> https://bugzilla.kernel.org/show_bug.cgi?id=48951
> >>
> >> And google search showed that Peter reported a similar problem here:
> >> http://marc.info/?l=linux-ide&m=133534061316338&w=2
> >>
> >> And bladud has found that, disable asyn suspend for the scsi target
> >> device can work around this problem.
> >>
> >> Please feel free to suggest what can possibly be the cause, thanks.
> >
> > I sometimes despair of people getting PM stuff right.  What on earth is
> > the point of refusing to suspend if the disk refuses to stop?  In theory
> > it gives the device more time to park its head, but almost no modern
> > drive requires this.  The next action suspend will take (if allowed) is
> > to power down peripherals which will forcibly stop the device.  The stop
> > request is purely informational for the device.  If it ignores it, then
> > the bigger hammer still works.
> 
> This really does matter. Especially on laptop hard drives (but on many
> desktop ones as well), we really want to stop the drive, and therefore
> unload the heads, before the power is shut off. Drives are often rated
> for a much lower number of emergency head unloads (caused by power
> loss) over their lifespan than normal software-commanded ones. So over
> a long period of time, repeatedly failing to stop the drive before
> power off will shorten its life. Maybe aborting suspend for this is a
> bit harsh but it's not something that should be ignored.

Where do you get this information from?  The only smart parameter
tracking this is the power off retract count (it may originally have
been the emergency head retract count, but it was renamed a while ago),
and that happens for any head unload, however caused.  I know SCSI
devices long ago ceased caring about this, because the in-drive
capacitance is sufficient to achieve a head unload before the device
completely spins down in forced power off (I admit the really old IDE
devices ... the ones that required the OS to do everything did have a
nasty habit of crashing their heads onto the surface, but they stopped
being manufactured years ago) ... I really don't see how any modern SATA
device would fail to do this.

> Just to add to this point, it doesn't just matter for rotational
> drives, either. A lot of SSDs use the "standby now" command to do some
> kind of cleanup before power off. Some drives document that a
> subsequent power up may take longer for the drive to be ready if it
> wasn't "spun down" prior to the last power off. It's conceivable that
> in some crappy drives, lack of proper power-off sequence could even
> cause data loss.

Well, no, they shouldn't ... not unless the drive violates its data
integrity commitment, which is going to cause a whole load of FS
corruption.  All our Jornalled FS guarantees rely on this data integrity
commitment.  If they're violated, we have a whole boat load of data
safety issues.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern - March 11, 2013, 3 p.m.
On Mon, 11 Mar 2013, James Bottomley wrote:

> Oh, that seems to be the suspend order isn't careful enough.
> __device_suspend() waits for its children, but the host disk are too far
> separated in the device tree.  If the immediate children of the host are
> all sync, that wait never actually waits for anything.

I was going to make exactly this same point.  During async suspend, the
PM core is careful to make sure that no device is suspended before its
children.  But there aren't any other checks, so if device A isn't an
ancestor of device B then it's possible for async suspend to power down
A before B.  This can cause problems if B needs A to be active while B
is suspending.

Does the ATA system have any non-ancestor dependencies like this?  If 
it does, the appropriate driver can be fixed to take them into account.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Robert Hancock - March 11, 2013, 7:30 p.m.
On Mon, Mar 11, 2013 at 8:51 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Mon, 2013-03-11 at 08:35 -0600, Robert Hancock wrote:
>> (resending due to GMail mess-up, sorry)
>>
>> On Mon, Mar 11, 2013 at 2:49 AM, James Bottomley
>> <James.Bottomley@hansenpartnership.com> wrote:
>> > On Mon, 2013-03-11 at 11:42 +0800, Aaron Lu wrote:
>> >> Hi all,
>> >>
>> >> I've seen some reports on STANDBY IMMEDIATE failed on NVIDIA MCP5x
>> >> controllers when system goes to suspend(this command is sent by scsi sd
>> >> driver on system suspend as a SCSI STOP command, which is translated to
>> >> STANDBY IMMEDIATE ATA command). I've no idea of why this happened, so
>> >> I wrote this email in hope of getting some new idea.
>> >>
>> >> The related bug report:
>> >> https://bugzilla.kernel.org/show_bug.cgi?id=48951
>> >>
>> >> And google search showed that Peter reported a similar problem here:
>> >> http://marc.info/?l=linux-ide&m=133534061316338&w=2
>> >>
>> >> And bladud has found that, disable asyn suspend for the scsi target
>> >> device can work around this problem.
>> >>
>> >> Please feel free to suggest what can possibly be the cause, thanks.
>> >
>> > I sometimes despair of people getting PM stuff right.  What on earth is
>> > the point of refusing to suspend if the disk refuses to stop?  In theory
>> > it gives the device more time to park its head, but almost no modern
>> > drive requires this.  The next action suspend will take (if allowed) is
>> > to power down peripherals which will forcibly stop the device.  The stop
>> > request is purely informational for the device.  If it ignores it, then
>> > the bigger hammer still works.
>>
>> This really does matter. Especially on laptop hard drives (but on many
>> desktop ones as well), we really want to stop the drive, and therefore
>> unload the heads, before the power is shut off. Drives are often rated
>> for a much lower number of emergency head unloads (caused by power
>> loss) over their lifespan than normal software-commanded ones. So over
>> a long period of time, repeatedly failing to stop the drive before
>> power off will shorten its life. Maybe aborting suspend for this is a
>> bit harsh but it's not something that should be ignored.
>
> Where do you get this information from?  The only smart parameter
> tracking this is the power off retract count (it may originally have
> been the emergency head retract count, but it was renamed a while ago),
> and that happens for any head unload, however caused.  I know SCSI
> devices long ago ceased caring about this, because the in-drive
> capacitance is sufficient to achieve a head unload before the device
> completely spins down in forced power off (I admit the really old IDE
> devices ... the ones that required the OS to do everything did have a
> nasty habit of crashing their heads onto the surface, but they stopped
> being manufactured years ago) ... I really don't see how any modern SATA
> device would fail to do this.

They do still unload the heads if you cut power (I seem to recall
hearing that some of them actually used the rotational energy in the
spinning platters as a power source to do this). But it's not nearly
as nice for the drive as the normal commanded unload. You can kind of
tell just from the sound of it that it's not really the same at all.

Many drives do track these separately in SMART attributes. The one for
normal unloads is Load_Cycle_Count, the other is
Power-Off_Retract_Count. The drive on the machine I'm on right now,
for example, has a Load_Cycle_Count of 88 and Power-Off_Retract_Count
of 27. I haven't seen what the actual cycle limits are for these but
I'm willing to bet that the power-off retract count limit is a lot
lower.

>
>> Just to add to this point, it doesn't just matter for rotational
>> drives, either. A lot of SSDs use the "standby now" command to do some
>> kind of cleanup before power off. Some drives document that a
>> subsequent power up may take longer for the drive to be ready if it
>> wasn't "spun down" prior to the last power off. It's conceivable that
>> in some crappy drives, lack of proper power-off sequence could even
>> cause data loss.
>
> Well, no, they shouldn't ... not unless the drive violates its data
> integrity commitment, which is going to cause a whole load of FS
> corruption.  All our Jornalled FS guarantees rely on this data integrity
> commitment.  If they're violated, we have a whole boat load of data
> safety issues.

I'm sure there are some SSDs that do violate their data integrity
commitments - a while ago some tests were done (don't have a link
handy and I don't think they identified the actual vendor/model of the
drives in any case) but there were definitely some SSDs that did do
nasty things like trash unrelated data if power was lost while
writing, etc. So we definitely don't want to risk this sort of thing
occurring on a normal power-off or suspend.

There's a reference to this in SMART here, for example:
http://www.thomas-krenn.com/en/wiki/SMART_attributes_of_Intel_SSDs
They refer to what's normally called "Power-off Retract Count" for
HDDs as "Unsafe Shutdown Count ": "The raw value reports the
cumulative number of unsafe (unclean) shutdown events over the life of
the device. An unsafe shutdown occurs whenever the device is powered
off without STANDBY IMMEDIATE being the last command."
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aaron Lu - March 12, 2013, 2:53 a.m.
Hi James and Alan,

On 03/11/2013 11:00 PM, Alan Stern wrote:
> On Mon, 11 Mar 2013, James Bottomley wrote:
> 
>> Oh, that seems to be the suspend order isn't careful enough.
>> __device_suspend() waits for its children, but the host disk are too far
>> separated in the device tree.  If the immediate children of the host are
>> all sync, that wait never actually waits for anything.
> 
> I was going to make exactly this same point.  During async suspend, the
> PM core is careful to make sure that no device is suspended before its
> children.  But there aren't any other checks, so if device A isn't an
> ancestor of device B then it's possible for async suspend to power down
> A before B.  This can cause problems if B needs A to be active while B
> is suspending.

Thanks for the suggestions.

> 
> Does the ATA system have any non-ancestor dependencies like this?  If 
> it does, the appropriate driver can be fixed to take them into account.

I don't think there is, and the relationship is like this:

    ata_host_controller* (named sata_nv xxx)
	    |
	ata_port* (named atax, while "ata_port atax" is another device)
	/	\
scsi_host	ata_link
    |		   |
scsi_target	ata_device
    |
scsi_device* (named sd x:x:x:x)

With the devices that have actual PM operation functions defined have
the asterisk next to it.

So ata_host_controller waits for all of the ata_ports, and the ata_port
waits for both scsi_host and ata_link. scsi_host waits for scsi_target,
and scsi_target waits for scsi_device. So if scsi_device is not done,
ata_port will not start. Doesn't look like a problem to me.

And from the log:
https://bugzilla.kernel.org/attachment.cgi?id=95101
It also looks like the order is correct.

Thanks,
Aaron
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley - March 12, 2013, 12:10 p.m.
On Tue, 2013-03-12 at 10:53 +0800, Aaron Lu wrote:
> Hi James and Alan,
> 
> On 03/11/2013 11:00 PM, Alan Stern wrote:
> > On Mon, 11 Mar 2013, James Bottomley wrote:
> > 
> >> Oh, that seems to be the suspend order isn't careful enough.
> >> __device_suspend() waits for its children, but the host disk are too far
> >> separated in the device tree.  If the immediate children of the host are
> >> all sync, that wait never actually waits for anything.
> > 
> > I was going to make exactly this same point.  During async suspend, the
> > PM core is careful to make sure that no device is suspended before its
> > children.  But there aren't any other checks, so if device A isn't an
> > ancestor of device B then it's possible for async suspend to power down
> > A before B.  This can cause problems if B needs A to be active while B
> > is suspending.
> 
> Thanks for the suggestions.
> 
> > 
> > Does the ATA system have any non-ancestor dependencies like this?  If 
> > it does, the appropriate driver can be fixed to take them into account.
> 
> I don't think there is, and the relationship is like this:
> 
>     ata_host_controller* (named sata_nv xxx)
> 	    |
> 	ata_port* (named atax, while "ata_port atax" is another device)
> 	/	\
> scsi_host	ata_link
>     |		   |
> scsi_target	ata_device
>     |
> scsi_device* (named sd x:x:x:x)
> 
> With the devices that have actual PM operation functions defined have
> the asterisk next to it.
> 
> So ata_host_controller waits for all of the ata_ports, and the ata_port
> waits for both scsi_host and ata_link. scsi_host waits for scsi_target,
> and scsi_target waits for scsi_device. So if scsi_device is not done,
> ata_port will not start. Doesn't look like a problem to me.
> 
> And from the log:
> https://bugzilla.kernel.org/attachment.cgi?id=95101
> It also looks like the order is correct.

That's not what that log appears to say.  Here are the relevant bits

[ 7377.813634] sd 2:0:0:0: async_suspend: scheduled
[ 7377.813636] sd 2:0:0:0: __device_suspend: starts
[ 7377.813639] sd 2:0:0:0: [sdb] Synchronizing SCSI cache
... so now we've begun suspend]
[ 7377.813750] sd 2:0:0:0: [sdb] Stopping disk
[... here we send STANDBY IMMEDIATE ]
[ 7378.237627] sata_nv 0000:00:05.2: async_suspend: scheduled
[ 7378.237631] sata_nv 0000:00:05.2: __device_suspend: starts
[... we begin to shut down the host ]
[ 7378.249333] sata_nv 0000:00:05.2: __device_suspend: done
[... host shutdown complete ]
[ 7408.372642] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 7408.372647] ata3.00: failed command: STANDBY IMMEDIATE
[ ... command times out ]
[ 7408.870675] dpm_run_callback(): scsi_bus_suspend+0x0/0x20 [scsi_mod] returns 134217730
[ 7408.870681] sd 2:0:0:0: __device_suspend: done

We shut down the host controller before the command completed.  This
appears to cause the timeout

James


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aaron Lu - March 12, 2013, 1:46 p.m.
On 03/12/2013 08:10 PM, James Bottomley wrote:
> On Tue, 2013-03-12 at 10:53 +0800, Aaron Lu wrote:
>> Hi James and Alan,
>>
>> On 03/11/2013 11:00 PM, Alan Stern wrote:
>>> On Mon, 11 Mar 2013, James Bottomley wrote:
>>>
>>>> Oh, that seems to be the suspend order isn't careful enough.
>>>> __device_suspend() waits for its children, but the host disk are too far
>>>> separated in the device tree.  If the immediate children of the host are
>>>> all sync, that wait never actually waits for anything.
>>>
>>> I was going to make exactly this same point.  During async suspend, the
>>> PM core is careful to make sure that no device is suspended before its
>>> children.  But there aren't any other checks, so if device A isn't an
>>> ancestor of device B then it's possible for async suspend to power down
>>> A before B.  This can cause problems if B needs A to be active while B
>>> is suspending.
>>
>> Thanks for the suggestions.
>>
>>>
>>> Does the ATA system have any non-ancestor dependencies like this?  If
>>> it does, the appropriate driver can be fixed to take them into account.
>>
>> I don't think there is, and the relationship is like this:
>>
>>      ata_host_controller* (named sata_nv xxx)
>> 	    |
>> 	ata_port* (named atax, while "ata_port atax" is another device)
>> 	/	\
>> scsi_host	ata_link
>>      |		   |
>> scsi_target	ata_device
>>      |
>> scsi_device* (named sd x:x:x:x)
>>
>> With the devices that have actual PM operation functions defined have
>> the asterisk next to it.
>>
>> So ata_host_controller waits for all of the ata_ports, and the ata_port
>> waits for both scsi_host and ata_link. scsi_host waits for scsi_target,
>> and scsi_target waits for scsi_device. So if scsi_device is not done,
>> ata_port will not start. Doesn't look like a problem to me.
>>
>> And from the log:
>> https://bugzilla.kernel.org/attachment.cgi?id=95101
>> It also looks like the order is correct.
>
> That's not what that log appears to say.  Here are the relevant bits
>
> [ 7377.813634] sd 2:0:0:0: async_suspend: scheduled
> [ 7377.813636] sd 2:0:0:0: __device_suspend: starts
> [ 7377.813639] sd 2:0:0:0: [sdb] Synchronizing SCSI cache
> ... so now we've begun suspend]
> [ 7377.813750] sd 2:0:0:0: [sdb] Stopping disk
> [... here we send STANDBY IMMEDIATE ]
> [ 7378.237627] sata_nv 0000:00:05.2: async_suspend: scheduled
> [ 7378.237631] sata_nv 0000:00:05.2: __device_suspend: starts
> [... we begin to shut down the host ]
> [ 7378.249333] sata_nv 0000:00:05.2: __device_suspend: done
> [... host shutdown complete ]

I think sata_nv 0000:00:05.0 is the host controller for sd 2:0:0:0, and
sata_nv 0000:00:05.1 is the host controller for sd 4:0:0:0. I've asked
bladud@gmail.com to attach the full dmesg, which can make it easier for
us to decide which port belongs to which host controller. Note that this
system has multiple ata host controllers.

Thanks,
Aaron

> [ 7408.372642] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> [ 7408.372647] ata3.00: failed command: STANDBY IMMEDIATE
> [ ... command times out ]
> [ 7408.870675] dpm_run_callback(): scsi_bus_suspend+0x0/0x20 [scsi_mod] returns 134217730
> [ 7408.870681] sd 2:0:0:0: __device_suspend: done
>
> We shut down the host controller before the command completed.  This
> appears to cause the timeout
>
> James
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern - March 12, 2013, 3:08 p.m.
On Tue, 12 Mar 2013, Aaron Lu wrote:

> So ata_host_controller waits for all of the ata_ports, and the ata_port
> waits for both scsi_host and ata_link. scsi_host waits for scsi_target,
> and scsi_target waits for scsi_device. So if scsi_device is not done,
> ata_port will not start. Doesn't look like a problem to me.
> 
> And from the log:
> https://bugzilla.kernel.org/attachment.cgi?id=95101
> It also looks like the order is correct.

You could compare the PM debugging logs for an async suspend and a 
synchronous suspend.  There might be an obvious difference in the 
ordering that would explain the problem.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Lord - March 12, 2013, 4:34 p.m.
On 13-03-11 03:30 PM, Robert Hancock wrote:
..
> I'm sure there are some SSDs that do violate their data integrity
> commitments - a while ago some tests were done (don't have a link
> handy and I don't think they identified the actual vendor/model of the
> drives in any case) but there were definitely some SSDs that did do
> nasty things like trash unrelated data if power was lost while
> writing, etc. So we definitely don't want to risk this sort of thing
> occurring on a normal power-off or suspend.


Just about all current SATA SSDs advertise having some form(s)
of "background garbage collection".  That "feature" has always concerned me,
because of the real possibility of power being removed (system shutdown)
while the drive firmware is in the midst of re-shuffling my data around.

One would hope that "STANDBY IMMEDIATE" is taken by the drive
as a signal to stop that fussing about and prepare for full/sudden power-off.

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 7992635..384b621d 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3079,7 +3079,11 @@  static int sd_suspend(struct device *dev)
 
 	if (sdkp->device->manage_start_stop) {
 		sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
-		ret = sd_start_stop_device(sdkp, 0);
+		/*
+		 * this is informational for the disk we're going to power it
+		 * off anyway, so don't bother about the return status
+		 */
+		sd_start_stop_device(sdkp, 0);
 	}
 
 done: