Message ID | 1362991742.2395.9.camel@dabdike.int.hansenpartnership.com |
---|---|
State | Not Applicable |
Delegated to: | David Miller |
Headers | show |
On 2013-03-11 16:49, James Bottomley wrote: > On Mon, 2013-03-11 at 11:42 +0800, Aaron Lu wrote: >> Hi all, >> >> I've seen some reports on STANDBY IMMEDIATE failed on NVIDIA MCP5x >> controllers when system goes to suspend(this command is sent by scsi sd >> driver on system suspend as a SCSI STOP command, which is translated to >> STANDBY IMMEDIATE ATA command). I've no idea of why this happened, so >> I wrote this email in hope of getting some new idea. >> >> The related bug report: >> https://bugzilla.kernel.org/show_bug.cgi?id=48951 >> >> And google search showed that Peter reported a similar problem here: >> http://marc.info/?l=linux-ide&m=133534061316338&w=2 >> >> And bladud has found that, disable asyn suspend for the scsi target >> device can work around this problem. >> >> Please feel free to suggest what can possibly be the cause, thanks. > > You really need to discriminate better between conditions we should and > shouldn't care about for suspend. so in sd_suspend, we definitely care > if we can't flush the cache of a write back device because the power off > could lose data. We don't really care if the disk says I can't stop, so > this is probably the correct fix. Yes, this will make suspend succeed, but the problem is still there - the drive does not respond to this command, while it will if async suspend for the target device is disabled. I don't think scsi code has any fault here, it feels more like a chip bug to me. And if we solve it this way, affected user may complain about long delay when suspending or uncomfortable with the error messages in dmesg, since that command would eventually timeout. So I hope we can nderstand what is going wrong here and perhaps do something to avoid it. Thanks, Aaron > > --- > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c > index 7992635..384b621d 100644 > --- a/drivers/scsi/sd.c > +++ b/drivers/scsi/sd.c > @@ -3079,7 +3079,11 @@ static int sd_suspend(struct device *dev) > > if (sdkp->device->manage_start_stop) { > sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n"); > - ret = sd_start_stop_device(sdkp, 0); > + /* > + * this is informational for the disk we're going to power it > + * off anyway, so don't bother about the return status > + */ > + sd_start_stop_device(sdkp, 0); > } > > done: > > -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2013-03-11 at 21:51 +0800, Aaron Lu wrote: > On 2013-03-11 16:49, James Bottomley wrote: > > On Mon, 2013-03-11 at 11:42 +0800, Aaron Lu wrote: > >> Hi all, > >> > >> I've seen some reports on STANDBY IMMEDIATE failed on NVIDIA MCP5x > >> controllers when system goes to suspend(this command is sent by scsi sd > >> driver on system suspend as a SCSI STOP command, which is translated to > >> STANDBY IMMEDIATE ATA command). I've no idea of why this happened, so > >> I wrote this email in hope of getting some new idea. > >> > >> The related bug report: > >> https://bugzilla.kernel.org/show_bug.cgi?id=48951 > >> > >> And google search showed that Peter reported a similar problem here: > >> http://marc.info/?l=linux-ide&m=133534061316338&w=2 > >> > >> And bladud has found that, disable asyn suspend for the scsi target > >> device can work around this problem. > >> > >> Please feel free to suggest what can possibly be the cause, thanks. > > > > You really need to discriminate better between conditions we should and > > shouldn't care about for suspend. so in sd_suspend, we definitely care > > if we can't flush the cache of a write back device because the power off > > could lose data. We don't really care if the disk says I can't stop, so > > this is probably the correct fix. > > Yes, this will make suspend succeed, but the problem is still there - > the drive does not respond to this command, while it will if async > suspend for the target device is disabled. I don't think scsi code has > any fault here, it feels more like a chip bug to me. > > And if we solve it this way, affected user may complain about long > delay when suspending or uncomfortable with the error messages in dmesg, > since that command would eventually timeout. So I hope we can nderstand > what is going wrong here and perhaps do something to avoid it. Oh, that seems to be the suspend order isn't careful enough. __device_suspend() waits for its children, but the host disk are too far separated in the device tree. If the immediate children of the host are all sync, that wait never actually waits for anything. James -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
(resending due to GMail mess-up, sorry) On Mon, Mar 11, 2013 at 2:49 AM, James Bottomley <James.Bottomley@hansenpartnership.com> wrote: > On Mon, 2013-03-11 at 11:42 +0800, Aaron Lu wrote: >> Hi all, >> >> I've seen some reports on STANDBY IMMEDIATE failed on NVIDIA MCP5x >> controllers when system goes to suspend(this command is sent by scsi sd >> driver on system suspend as a SCSI STOP command, which is translated to >> STANDBY IMMEDIATE ATA command). I've no idea of why this happened, so >> I wrote this email in hope of getting some new idea. >> >> The related bug report: >> https://bugzilla.kernel.org/show_bug.cgi?id=48951 >> >> And google search showed that Peter reported a similar problem here: >> http://marc.info/?l=linux-ide&m=133534061316338&w=2 >> >> And bladud has found that, disable asyn suspend for the scsi target >> device can work around this problem. >> >> Please feel free to suggest what can possibly be the cause, thanks. > > I sometimes despair of people getting PM stuff right. What on earth is > the point of refusing to suspend if the disk refuses to stop? In theory > it gives the device more time to park its head, but almost no modern > drive requires this. The next action suspend will take (if allowed) is > to power down peripherals which will forcibly stop the device. The stop > request is purely informational for the device. If it ignores it, then > the bigger hammer still works. This really does matter. Especially on laptop hard drives (but on many desktop ones as well), we really want to stop the drive, and therefore unload the heads, before the power is shut off. Drives are often rated for a much lower number of emergency head unloads (caused by power loss) over their lifespan than normal software-commanded ones. So over a long period of time, repeatedly failing to stop the drive before power off will shorten its life. Maybe aborting suspend for this is a bit harsh but it's not something that should be ignored. Just to add to this point, it doesn't just matter for rotational drives, either. A lot of SSDs use the "standby now" command to do some kind of cleanup before power off. Some drives document that a subsequent power up may take longer for the drive to be ready if it wasn't "spun down" prior to the last power off. It's conceivable that in some crappy drives, lack of proper power-off sequence could even cause data loss. The question is, at the point where we are stopping the drive, has anything been done to the ATA controller to suspend it? If so, that sounds like a definite bug somewhere. It has to be something we are doing on the kernel side, otherwise why would this work if async suspend was disabled? > > You really need to discriminate better between conditions we should and > shouldn't care about for suspend. so in sd_suspend, we definitely care > if we can't flush the cache of a write back device because the power off > could lose data. We don't really care if the disk says I can't stop, so > this is probably the correct fix. > > James > > --- > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c > index 7992635..384b621d 100644 > --- a/drivers/scsi/sd.c > +++ b/drivers/scsi/sd.c > @@ -3079,7 +3079,11 @@ static int sd_suspend(struct device *dev) > > if (sdkp->device->manage_start_stop) { > sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n"); > - ret = sd_start_stop_device(sdkp, 0); > + /* > + * this is informational for the disk we're going to power it > + * off anyway, so don't bother about the return status > + */ > + sd_start_stop_device(sdkp, 0); > } > > done: > > -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2013-03-11 at 08:35 -0600, Robert Hancock wrote: > (resending due to GMail mess-up, sorry) > > On Mon, Mar 11, 2013 at 2:49 AM, James Bottomley > <James.Bottomley@hansenpartnership.com> wrote: > > On Mon, 2013-03-11 at 11:42 +0800, Aaron Lu wrote: > >> Hi all, > >> > >> I've seen some reports on STANDBY IMMEDIATE failed on NVIDIA MCP5x > >> controllers when system goes to suspend(this command is sent by scsi sd > >> driver on system suspend as a SCSI STOP command, which is translated to > >> STANDBY IMMEDIATE ATA command). I've no idea of why this happened, so > >> I wrote this email in hope of getting some new idea. > >> > >> The related bug report: > >> https://bugzilla.kernel.org/show_bug.cgi?id=48951 > >> > >> And google search showed that Peter reported a similar problem here: > >> http://marc.info/?l=linux-ide&m=133534061316338&w=2 > >> > >> And bladud has found that, disable asyn suspend for the scsi target > >> device can work around this problem. > >> > >> Please feel free to suggest what can possibly be the cause, thanks. > > > > I sometimes despair of people getting PM stuff right. What on earth is > > the point of refusing to suspend if the disk refuses to stop? In theory > > it gives the device more time to park its head, but almost no modern > > drive requires this. The next action suspend will take (if allowed) is > > to power down peripherals which will forcibly stop the device. The stop > > request is purely informational for the device. If it ignores it, then > > the bigger hammer still works. > > This really does matter. Especially on laptop hard drives (but on many > desktop ones as well), we really want to stop the drive, and therefore > unload the heads, before the power is shut off. Drives are often rated > for a much lower number of emergency head unloads (caused by power > loss) over their lifespan than normal software-commanded ones. So over > a long period of time, repeatedly failing to stop the drive before > power off will shorten its life. Maybe aborting suspend for this is a > bit harsh but it's not something that should be ignored. Where do you get this information from? The only smart parameter tracking this is the power off retract count (it may originally have been the emergency head retract count, but it was renamed a while ago), and that happens for any head unload, however caused. I know SCSI devices long ago ceased caring about this, because the in-drive capacitance is sufficient to achieve a head unload before the device completely spins down in forced power off (I admit the really old IDE devices ... the ones that required the OS to do everything did have a nasty habit of crashing their heads onto the surface, but they stopped being manufactured years ago) ... I really don't see how any modern SATA device would fail to do this. > Just to add to this point, it doesn't just matter for rotational > drives, either. A lot of SSDs use the "standby now" command to do some > kind of cleanup before power off. Some drives document that a > subsequent power up may take longer for the drive to be ready if it > wasn't "spun down" prior to the last power off. It's conceivable that > in some crappy drives, lack of proper power-off sequence could even > cause data loss. Well, no, they shouldn't ... not unless the drive violates its data integrity commitment, which is going to cause a whole load of FS corruption. All our Jornalled FS guarantees rely on this data integrity commitment. If they're violated, we have a whole boat load of data safety issues. James -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 11 Mar 2013, James Bottomley wrote: > Oh, that seems to be the suspend order isn't careful enough. > __device_suspend() waits for its children, but the host disk are too far > separated in the device tree. If the immediate children of the host are > all sync, that wait never actually waits for anything. I was going to make exactly this same point. During async suspend, the PM core is careful to make sure that no device is suspended before its children. But there aren't any other checks, so if device A isn't an ancestor of device B then it's possible for async suspend to power down A before B. This can cause problems if B needs A to be active while B is suspending. Does the ATA system have any non-ancestor dependencies like this? If it does, the appropriate driver can be fixed to take them into account. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Mar 11, 2013 at 8:51 AM, James Bottomley <James.Bottomley@hansenpartnership.com> wrote: > On Mon, 2013-03-11 at 08:35 -0600, Robert Hancock wrote: >> (resending due to GMail mess-up, sorry) >> >> On Mon, Mar 11, 2013 at 2:49 AM, James Bottomley >> <James.Bottomley@hansenpartnership.com> wrote: >> > On Mon, 2013-03-11 at 11:42 +0800, Aaron Lu wrote: >> >> Hi all, >> >> >> >> I've seen some reports on STANDBY IMMEDIATE failed on NVIDIA MCP5x >> >> controllers when system goes to suspend(this command is sent by scsi sd >> >> driver on system suspend as a SCSI STOP command, which is translated to >> >> STANDBY IMMEDIATE ATA command). I've no idea of why this happened, so >> >> I wrote this email in hope of getting some new idea. >> >> >> >> The related bug report: >> >> https://bugzilla.kernel.org/show_bug.cgi?id=48951 >> >> >> >> And google search showed that Peter reported a similar problem here: >> >> http://marc.info/?l=linux-ide&m=133534061316338&w=2 >> >> >> >> And bladud has found that, disable asyn suspend for the scsi target >> >> device can work around this problem. >> >> >> >> Please feel free to suggest what can possibly be the cause, thanks. >> > >> > I sometimes despair of people getting PM stuff right. What on earth is >> > the point of refusing to suspend if the disk refuses to stop? In theory >> > it gives the device more time to park its head, but almost no modern >> > drive requires this. The next action suspend will take (if allowed) is >> > to power down peripherals which will forcibly stop the device. The stop >> > request is purely informational for the device. If it ignores it, then >> > the bigger hammer still works. >> >> This really does matter. Especially on laptop hard drives (but on many >> desktop ones as well), we really want to stop the drive, and therefore >> unload the heads, before the power is shut off. Drives are often rated >> for a much lower number of emergency head unloads (caused by power >> loss) over their lifespan than normal software-commanded ones. So over >> a long period of time, repeatedly failing to stop the drive before >> power off will shorten its life. Maybe aborting suspend for this is a >> bit harsh but it's not something that should be ignored. > > Where do you get this information from? The only smart parameter > tracking this is the power off retract count (it may originally have > been the emergency head retract count, but it was renamed a while ago), > and that happens for any head unload, however caused. I know SCSI > devices long ago ceased caring about this, because the in-drive > capacitance is sufficient to achieve a head unload before the device > completely spins down in forced power off (I admit the really old IDE > devices ... the ones that required the OS to do everything did have a > nasty habit of crashing their heads onto the surface, but they stopped > being manufactured years ago) ... I really don't see how any modern SATA > device would fail to do this. They do still unload the heads if you cut power (I seem to recall hearing that some of them actually used the rotational energy in the spinning platters as a power source to do this). But it's not nearly as nice for the drive as the normal commanded unload. You can kind of tell just from the sound of it that it's not really the same at all. Many drives do track these separately in SMART attributes. The one for normal unloads is Load_Cycle_Count, the other is Power-Off_Retract_Count. The drive on the machine I'm on right now, for example, has a Load_Cycle_Count of 88 and Power-Off_Retract_Count of 27. I haven't seen what the actual cycle limits are for these but I'm willing to bet that the power-off retract count limit is a lot lower. > >> Just to add to this point, it doesn't just matter for rotational >> drives, either. A lot of SSDs use the "standby now" command to do some >> kind of cleanup before power off. Some drives document that a >> subsequent power up may take longer for the drive to be ready if it >> wasn't "spun down" prior to the last power off. It's conceivable that >> in some crappy drives, lack of proper power-off sequence could even >> cause data loss. > > Well, no, they shouldn't ... not unless the drive violates its data > integrity commitment, which is going to cause a whole load of FS > corruption. All our Jornalled FS guarantees rely on this data integrity > commitment. If they're violated, we have a whole boat load of data > safety issues. I'm sure there are some SSDs that do violate their data integrity commitments - a while ago some tests were done (don't have a link handy and I don't think they identified the actual vendor/model of the drives in any case) but there were definitely some SSDs that did do nasty things like trash unrelated data if power was lost while writing, etc. So we definitely don't want to risk this sort of thing occurring on a normal power-off or suspend. There's a reference to this in SMART here, for example: http://www.thomas-krenn.com/en/wiki/SMART_attributes_of_Intel_SSDs They refer to what's normally called "Power-off Retract Count" for HDDs as "Unsafe Shutdown Count ": "The raw value reports the cumulative number of unsafe (unclean) shutdown events over the life of the device. An unsafe shutdown occurs whenever the device is powered off without STANDBY IMMEDIATE being the last command." -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi James and Alan, On 03/11/2013 11:00 PM, Alan Stern wrote: > On Mon, 11 Mar 2013, James Bottomley wrote: > >> Oh, that seems to be the suspend order isn't careful enough. >> __device_suspend() waits for its children, but the host disk are too far >> separated in the device tree. If the immediate children of the host are >> all sync, that wait never actually waits for anything. > > I was going to make exactly this same point. During async suspend, the > PM core is careful to make sure that no device is suspended before its > children. But there aren't any other checks, so if device A isn't an > ancestor of device B then it's possible for async suspend to power down > A before B. This can cause problems if B needs A to be active while B > is suspending. Thanks for the suggestions. > > Does the ATA system have any non-ancestor dependencies like this? If > it does, the appropriate driver can be fixed to take them into account. I don't think there is, and the relationship is like this: ata_host_controller* (named sata_nv xxx) | ata_port* (named atax, while "ata_port atax" is another device) / \ scsi_host ata_link | | scsi_target ata_device | scsi_device* (named sd x:x:x:x) With the devices that have actual PM operation functions defined have the asterisk next to it. So ata_host_controller waits for all of the ata_ports, and the ata_port waits for both scsi_host and ata_link. scsi_host waits for scsi_target, and scsi_target waits for scsi_device. So if scsi_device is not done, ata_port will not start. Doesn't look like a problem to me. And from the log: https://bugzilla.kernel.org/attachment.cgi?id=95101 It also looks like the order is correct. Thanks, Aaron -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2013-03-12 at 10:53 +0800, Aaron Lu wrote: > Hi James and Alan, > > On 03/11/2013 11:00 PM, Alan Stern wrote: > > On Mon, 11 Mar 2013, James Bottomley wrote: > > > >> Oh, that seems to be the suspend order isn't careful enough. > >> __device_suspend() waits for its children, but the host disk are too far > >> separated in the device tree. If the immediate children of the host are > >> all sync, that wait never actually waits for anything. > > > > I was going to make exactly this same point. During async suspend, the > > PM core is careful to make sure that no device is suspended before its > > children. But there aren't any other checks, so if device A isn't an > > ancestor of device B then it's possible for async suspend to power down > > A before B. This can cause problems if B needs A to be active while B > > is suspending. > > Thanks for the suggestions. > > > > > Does the ATA system have any non-ancestor dependencies like this? If > > it does, the appropriate driver can be fixed to take them into account. > > I don't think there is, and the relationship is like this: > > ata_host_controller* (named sata_nv xxx) > | > ata_port* (named atax, while "ata_port atax" is another device) > / \ > scsi_host ata_link > | | > scsi_target ata_device > | > scsi_device* (named sd x:x:x:x) > > With the devices that have actual PM operation functions defined have > the asterisk next to it. > > So ata_host_controller waits for all of the ata_ports, and the ata_port > waits for both scsi_host and ata_link. scsi_host waits for scsi_target, > and scsi_target waits for scsi_device. So if scsi_device is not done, > ata_port will not start. Doesn't look like a problem to me. > > And from the log: > https://bugzilla.kernel.org/attachment.cgi?id=95101 > It also looks like the order is correct. That's not what that log appears to say. Here are the relevant bits [ 7377.813634] sd 2:0:0:0: async_suspend: scheduled [ 7377.813636] sd 2:0:0:0: __device_suspend: starts [ 7377.813639] sd 2:0:0:0: [sdb] Synchronizing SCSI cache ... so now we've begun suspend] [ 7377.813750] sd 2:0:0:0: [sdb] Stopping disk [... here we send STANDBY IMMEDIATE ] [ 7378.237627] sata_nv 0000:00:05.2: async_suspend: scheduled [ 7378.237631] sata_nv 0000:00:05.2: __device_suspend: starts [... we begin to shut down the host ] [ 7378.249333] sata_nv 0000:00:05.2: __device_suspend: done [... host shutdown complete ] [ 7408.372642] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [ 7408.372647] ata3.00: failed command: STANDBY IMMEDIATE [ ... command times out ] [ 7408.870675] dpm_run_callback(): scsi_bus_suspend+0x0/0x20 [scsi_mod] returns 134217730 [ 7408.870681] sd 2:0:0:0: __device_suspend: done We shut down the host controller before the command completed. This appears to cause the timeout James -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 03/12/2013 08:10 PM, James Bottomley wrote: > On Tue, 2013-03-12 at 10:53 +0800, Aaron Lu wrote: >> Hi James and Alan, >> >> On 03/11/2013 11:00 PM, Alan Stern wrote: >>> On Mon, 11 Mar 2013, James Bottomley wrote: >>> >>>> Oh, that seems to be the suspend order isn't careful enough. >>>> __device_suspend() waits for its children, but the host disk are too far >>>> separated in the device tree. If the immediate children of the host are >>>> all sync, that wait never actually waits for anything. >>> >>> I was going to make exactly this same point. During async suspend, the >>> PM core is careful to make sure that no device is suspended before its >>> children. But there aren't any other checks, so if device A isn't an >>> ancestor of device B then it's possible for async suspend to power down >>> A before B. This can cause problems if B needs A to be active while B >>> is suspending. >> >> Thanks for the suggestions. >> >>> >>> Does the ATA system have any non-ancestor dependencies like this? If >>> it does, the appropriate driver can be fixed to take them into account. >> >> I don't think there is, and the relationship is like this: >> >> ata_host_controller* (named sata_nv xxx) >> | >> ata_port* (named atax, while "ata_port atax" is another device) >> / \ >> scsi_host ata_link >> | | >> scsi_target ata_device >> | >> scsi_device* (named sd x:x:x:x) >> >> With the devices that have actual PM operation functions defined have >> the asterisk next to it. >> >> So ata_host_controller waits for all of the ata_ports, and the ata_port >> waits for both scsi_host and ata_link. scsi_host waits for scsi_target, >> and scsi_target waits for scsi_device. So if scsi_device is not done, >> ata_port will not start. Doesn't look like a problem to me. >> >> And from the log: >> https://bugzilla.kernel.org/attachment.cgi?id=95101 >> It also looks like the order is correct. > > That's not what that log appears to say. Here are the relevant bits > > [ 7377.813634] sd 2:0:0:0: async_suspend: scheduled > [ 7377.813636] sd 2:0:0:0: __device_suspend: starts > [ 7377.813639] sd 2:0:0:0: [sdb] Synchronizing SCSI cache > ... so now we've begun suspend] > [ 7377.813750] sd 2:0:0:0: [sdb] Stopping disk > [... here we send STANDBY IMMEDIATE ] > [ 7378.237627] sata_nv 0000:00:05.2: async_suspend: scheduled > [ 7378.237631] sata_nv 0000:00:05.2: __device_suspend: starts > [... we begin to shut down the host ] > [ 7378.249333] sata_nv 0000:00:05.2: __device_suspend: done > [... host shutdown complete ] I think sata_nv 0000:00:05.0 is the host controller for sd 2:0:0:0, and sata_nv 0000:00:05.1 is the host controller for sd 4:0:0:0. I've asked bladud@gmail.com to attach the full dmesg, which can make it easier for us to decide which port belongs to which host controller. Note that this system has multiple ata host controllers. Thanks, Aaron > [ 7408.372642] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen > [ 7408.372647] ata3.00: failed command: STANDBY IMMEDIATE > [ ... command times out ] > [ 7408.870675] dpm_run_callback(): scsi_bus_suspend+0x0/0x20 [scsi_mod] returns 134217730 > [ 7408.870681] sd 2:0:0:0: __device_suspend: done > > We shut down the host controller before the command completed. This > appears to cause the timeout > > James > > -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 12 Mar 2013, Aaron Lu wrote: > So ata_host_controller waits for all of the ata_ports, and the ata_port > waits for both scsi_host and ata_link. scsi_host waits for scsi_target, > and scsi_target waits for scsi_device. So if scsi_device is not done, > ata_port will not start. Doesn't look like a problem to me. > > And from the log: > https://bugzilla.kernel.org/attachment.cgi?id=95101 > It also looks like the order is correct. You could compare the PM debugging logs for an async suspend and a synchronous suspend. There might be an obvious difference in the ordering that would explain the problem. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 13-03-11 03:30 PM, Robert Hancock wrote: .. > I'm sure there are some SSDs that do violate their data integrity > commitments - a while ago some tests were done (don't have a link > handy and I don't think they identified the actual vendor/model of the > drives in any case) but there were definitely some SSDs that did do > nasty things like trash unrelated data if power was lost while > writing, etc. So we definitely don't want to risk this sort of thing > occurring on a normal power-off or suspend. Just about all current SATA SSDs advertise having some form(s) of "background garbage collection". That "feature" has always concerned me, because of the real possibility of power being removed (system shutdown) while the drive firmware is in the midst of re-shuffling my data around. One would hope that "STANDBY IMMEDIATE" is taken by the drive as a signal to stop that fussing about and prepare for full/sudden power-off. -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 7992635..384b621d 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -3079,7 +3079,11 @@ static int sd_suspend(struct device *dev) if (sdkp->device->manage_start_stop) { sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n"); - ret = sd_start_stop_device(sdkp, 0); + /* + * this is informational for the disk we're going to power it + * off anyway, so don't bother about the return status + */ + sd_start_stop_device(sdkp, 0); } done: