Message ID | 20110525094127.GF10146@htj.dyndns.org |
---|---|
State | Not Applicable |
Delegated to: | David Miller |
Headers | show |
> This is a different issue. libata EH plugs the port if reset fails > repeatedly. This behavior was implemented to avoid causing continuous > resets on a port in case it has flaky PHY state reporting; however, it > seems to cause more trouble than fixing issues - ie. plugging in a > broken device may end up plugging the port even after the offending > device is removed until manual rescan or reboot. I've been pondering > about changing the behavior like the following. Thanks for the explanation, Tejun, and for sending those changes upstream. I shall incorporate these patches into my kernel and keep you informed as to how they behave. Mark, did you get a chance to clean up the sata_mv spinlock addition in mv_set_main_irq_mask? Many Thanks, Bruce. Latest News at: http://www.indigovision.com/index.php/en/news.html -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11-05-25 09:36 AM, Bruce Stenning wrote: >> This is a different issue. libata EH plugs the port if reset fails >> repeatedly. This behavior was implemented to avoid causing continuous >> resets on a port in case it has flaky PHY state reporting; however, it >> seems to cause more trouble than fixing issues - ie. plugging in a >> broken device may end up plugging the port even after the offending >> device is removed until manual rescan or reboot. I've been pondering >> about changing the behavior like the following. > > Thanks for the explanation, Tejun, and for sending those changes upstream. > I shall incorporate these patches into my kernel and keep you informed as > to how they behave. > > Mark, did you get a chance to clean up the sata_mv spinlock addition in > mv_set_main_irq_mask? Not yet. I've been waiting to hear back from you as to whether they visibly fix anything or not. -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
PiA+IE1hcmssIGRpZCB5b3UgZ2V0IGEgY2hhbmNlIHRvIGNsZWFuIHVwIHRoZSBzYXRhX212IHNw aW5sb2NrIGFkZGl0aW9uIGluDQo+ID4gbXZfc2V0X21haW5faXJxX21hc2s/DQo+DQo+IE5vdCB5 ZXQuICBJJ3ZlIGJlZW4gd2FpdGluZyB0byBoZWFyIGJhY2sgZnJvbSB5b3UgYXMgdG8NCj4gd2hl dGhlciB0aGV5IHZpc2libHkgZml4IGFueXRoaW5nIG9yIG5vdC4NCg0KSGF2aW5nIG5vdyB0ZXN0 ZWQgd2l0aCBUZWp1bidzIHR3byBwYXRjaGVzIGFuZCB3aXRob3V0IHRoZSBzcGlubG9jayBpbg0K bXZfc2V0X21haW5faXJxX21hc2sgZm9yIGEgcmVhc29uYWJsZSB0aW1lLCB0aGUgaG90cGx1ZyBz ZWVtcyB0byBiZSB2ZXJ5DQpyZWxpYWJsZS4gIFNvIEkgdGhpbmsgaXQncyBwcm9iYWJseSBiZXN0 IHRvIGxlYXZlIHRoZSBzcGlubG9jayBvdXQgYW5kDQpjb250aW51ZSB0byBvYnNlcnZlIHRoZSBi ZWhhdmlvdXIgd2l0aCBmdXJ0aGVyIHRlc3RpbmcuDQoNClRoYW5rcywNCkJydWNlLg0KDQoNCkxh dGVzdCBOZXdzIGF0OiBodHRwOi8vd3d3LmluZGlnb3Zpc2lvbi5jb20vaW5kZXgucGhwL2VuL25l d3MuaHRtbA0K -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11-05-30 09:07 AM, Bruce Stenning wrote: >>> Mark, did you get a chance to clean up the sata_mv spinlock addition in >>> mv_set_main_irq_mask? >> >> Not yet. I've been waiting to hear back from you as to >> whether they visibly fix anything or not. > > Having now tested with Tejun's two patches and without the spinlock in > mv_set_main_irq_mask for a reasonable time, the hotplug seems to be very > reliable. So I think it's probably best to leave the spinlock out and > continue to observe the behaviour with further testing. Peachy. I'll have another look at some point -- I think I thought the spinlock was still needed for ATAPI under some conditions. Gotta look closer to make sure. Thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11-05-25 05:41 AM, Tejun Heo wrote: > Hello, sorry about the long delay. > > On Tue, May 17, 2011 at 04:30:20PM +0100, Bruce Stenning wrote: >> __ata_port_freeze: ata4 port frozen >> ata4: hard resetting link >> sata_link_hardreset: ENTER >> ata4: COMRESET failed (errno=-32) >> sata_link_hardreset: EXIT, rc=-32 >> ata4: reset failed (errno=-32), retrying in 33 secs >> __ata_port_freeze: ata4 port frozen >> ata4: hard resetting link >> sata_link_hardreset: ENTER >> ata4: COMRESET failed (errno=-32) >> sata_link_hardreset: EXIT, rc=-32 >> ata4: reset failed, giving up >> ata_eh_recover: EXIT, rc=-32 >> ata4.00: disabled >> ata4: EH complete >> ata_scsi_error: EXIT >> >> The IRQ for that port is masked off afterwards. > > This is a different issue. libata EH plugs the port if reset fails > repeatedly. This behavior was implemented to avoid causing continuous > resets on a port in case it has flaky PHY state reporting; however, it > seems to cause more trouble than fixing issues - ie. plugging in a > broken device may end up plugging the port even after the offending > device is removed until manual rescan or reboot. I've been pondering > about changing the behavior like the following. > > diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c > index dfb6e9d..05797fe 100644 > --- a/drivers/ata/libata-eh.c > +++ b/drivers/ata/libata-eh.c > @@ -2885,8 +2885,17 @@ int ata_eh_reset(struct ata_link *link, int classify, > sata_scr_read(link, SCR_STATUS, &sstatus)) > rc = -ERESTART; > > - if (rc == -ERESTART || try >= max_tries) > + if (rc == -ERESTART || try >= max_tries) { > + /* > + * Thaw host port even if reset failed, so that the port > + * can be retried on the next phy event. This risks > + * repeated EH runs but seems to be a better tradeoff than > + * shutting down a port after a botched hotplug attempt. > + */ > + if (ata_is_host_link(link)) > + ata_eh_thaw_port(ap); > goto out; > + } > > now = jiffies; > if (time_before(now, deadline)) { Tejun, did this ever go upstream and to -stable ?? I'm asking because I see the same issue with other SATA controllers, in particular with sata_sil boards. Hot plug generally works _once_ per port, and then stops working. -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, (cc'ing Jeff) On Fri, Jun 10, 2011 at 08:28:51AM -0400, Mark Lord wrote: > Tejun, did this ever go upstream and to -stable ?? > I'm asking because I see the same issue with other SATA controllers, > in particular with sata_sil boards. Hot plug generally works _once_ > per port, and then stops working. The patch is still pending, http://article.gmane.org/gmane.linux.ide/49613 I don't think it would be wise to include the patch for -stable tho. It's not a regression fix and has some possibility of introducing regressions, so.... Thanks.
On 11-06-10 10:01 AM, Tejun Heo wrote: > Hello, > > (cc'ing Jeff) > > On Fri, Jun 10, 2011 at 08:28:51AM -0400, Mark Lord wrote: >> Tejun, did this ever go upstream and to -stable ?? >> I'm asking because I see the same issue with other SATA controllers, >> in particular with sata_sil boards. Hot plug generally works _once_ >> per port, and then stops working. > > The patch is still pending, > > http://article.gmane.org/gmane.linux.ide/49613 > > I don't think it would be wise to include the patch for -stable tho. > It's not a regression fix and has some possibility of introducing > regressions, so.... Not a regression fix? Excuse me? SATA hotplug always used to work fine, and is more or less essential to reboot-free operations. Cheers -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 06/10/2011 01:32 PM, Mark Lord wrote: > On 11-06-10 10:01 AM, Tejun Heo wrote: >> Hello, >> >> (cc'ing Jeff) >> >> On Fri, Jun 10, 2011 at 08:28:51AM -0400, Mark Lord wrote: >>> Tejun, did this ever go upstream and to -stable ?? >>> I'm asking because I see the same issue with other SATA controllers, >>> in particular with sata_sil boards. Hot plug generally works _once_ >>> per port, and then stops working. >> >> The patch is still pending, >> >> http://article.gmane.org/gmane.linux.ide/49613 >> >> I don't think it would be wise to include the patch for -stable tho. >> It's not a regression fix and has some possibility of introducing >> regressions, so.... > > Not a regression fix? Excuse me? > SATA hotplug always used to work fine, > and is more or less essential to reboot-free operations. I can move it from #upstream (linux-next) to #upstream-fixes, if we feel it is sufficient urgent. It seemed like it needed more testing, but maybe the best way to do that is #upstream-fixes? Was mainly a -rc question, to me. Jeff -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11-06-10 01:39 PM, Jeff Garzik wrote: > On 06/10/2011 01:32 PM, Mark Lord wrote: >> On 11-06-10 10:01 AM, Tejun Heo wrote: >>> Hello, >>> >>> (cc'ing Jeff) >>> >>> On Fri, Jun 10, 2011 at 08:28:51AM -0400, Mark Lord wrote: >>>> Tejun, did this ever go upstream and to -stable ?? >>>> I'm asking because I see the same issue with other SATA controllers, >>>> in particular with sata_sil boards. Hot plug generally works _once_ >>>> per port, and then stops working. >>> >>> The patch is still pending, >>> >>> http://article.gmane.org/gmane.linux.ide/49613 >>> >>> I don't think it would be wise to include the patch for -stable tho. >>> It's not a regression fix and has some possibility of introducing >>> regressions, so.... >> >> Not a regression fix? Excuse me? >> SATA hotplug always used to work fine, >> and is more or less essential to reboot-free operations. > > I can move it from #upstream (linux-next) to #upstream-fixes, if we feel it is > sufficient urgent. It seemed like it needed more testing, but maybe the best > way to do that is #upstream-fixes? > > Was mainly a -rc question, to me. Well, it's not getting any testing sitting outside of the kernel right now. Cheers -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 06/10/2011 04:49 PM, Mark Lord wrote:
> Well, it's not getting any testing sitting outside of the kernel right now.
It gets whatever testing linux-next gets...
Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jun 10, 2011 at 01:32:22PM -0400, Mark Lord wrote: > > I don't think it would be wise to include the patch for -stable tho. > > It's not a regression fix and has some possibility of introducing > > regressions, so.... > > Not a regression fix? Excuse me? > SATA hotplug always used to work fine, > and is more or less essential to reboot-free operations. There are two separate patches. One is a fix which got pushed to Linus by Jeff early yesterday. The other changes behavior which has been like that since the beginning of EH. The former is marked properly w/ -stable. The latter shouldn't be in -stable. Thanks.
On 11-06-11 08:19 AM, Tejun Heo wrote: > On Fri, Jun 10, 2011 at 01:32:22PM -0400, Mark Lord wrote: >>> I don't think it would be wise to include the patch for -stable tho. >>> It's not a regression fix and has some possibility of introducing >>> regressions, so.... >> >> Not a regression fix? Excuse me? >> SATA hotplug always used to work fine, >> and is more or less essential to reboot-free operations. > > There are two separate patches. One is a fix which got pushed to > Linus by Jeff early yesterday. The other changes behavior which has > been like that since the beginning of EH. The former is marked > properly w/ -stable. The latter shouldn't be in -stable. Peachy. Got links for each of those? I may try them with the sata_sil board that was having issues earlier. Cheers -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, On Sun, Jun 12, 2011 at 01:10:35PM -0400, Mark Lord wrote: > On 11-06-11 08:19 AM, Tejun Heo wrote: > > On Fri, Jun 10, 2011 at 01:32:22PM -0400, Mark Lord wrote: > >>> I don't think it would be wise to include the patch for -stable tho. > >>> It's not a regression fix and has some possibility of introducing > >>> regressions, so.... > >> > >> Not a regression fix? Excuse me? > >> SATA hotplug always used to work fine, > >> and is more or less essential to reboot-free operations. > > > > There are two separate patches. One is a fix which got pushed to > > Linus by Jeff early yesterday. The other changes behavior which has > > been like that since the beginning of EH. The former is marked > > properly w/ -stable. The latter shouldn't be in -stable. > > Peachy. Got links for each of those? > I may try them with the sata_sil board that was having issues earlier. The fix is http://article.gmane.org/gmane.linux.ide/49615 and the behavior change is http://article.gmane.org/gmane.linux.ide/49613
Hi guys, While the hotplug is now far more reliable, I have recently been able to coax it into locking up SATA ports again while the device is heavily loaded. This looks very similar to the previous lockups, with the SATA port left frozen. I have pasted some tracing below; at the end of the trace ata4 is no longer responsive to hotplug events. I have attempted to add further tracing to find out where the freeze is occurring, but so far this has just prevented me from triggering the condition. I am currently trying to locate the problem by inspecting the sources, but you may have some intuition that could shortcut my efforts. I should mention that we are now working on kernel 2.6.39.4 (and are choosing not to go with 3.x to limit changes as we progress towards final testing.) Kind Regards, Bruce. [...] ata4.00: configured for UDMA/33 ata_eh_recover: EXIT, rc=0 ata4: EH complete ata_scsi_error: EXIT ata_scsiop_noop: ENTER ata_scsiop_read_cap: ENTER ata_scsiop_mode_sense: ENTER ata_scsiop_mode_sense: ENTER ata_scsiop_mode_sense: ENTER __ata_port_freeze: ata4 port frozen ata_scsi_timed_out: ENTER ata_scsi_timed_out: EXIT, ret=0 ata_scsi_error: ENTER ata_sff_flush_pio_task: ENTER ata_eh_link_autopsy: ENTER ata_eh_link_autopsy: EXIT ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x190002 action 0xe frozen ata4.00: edma_err_cause=00000020 pp_flags=00000000, SError=00180000 ata4: SError: { RecovComm PHYRdyChg 10B8B Dispar } ata4.00: failed command: FLUSH CACHE EXT ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 res 50/00:00:30:5a:27/00:08:00:00:00/a0 Emask 0x10 (ATA bus error) ata4.00: status: { DRDY } ata_eh_recover: ENTER __ata_port_freeze: ata4 port frozen ata4: hard resetting link sata_link_hardreset: ENTER sata_link_hardreset: EXIT, rc=0 ata_sff_softreset: ENTER ata_sff_softreset: about to softreset, devmask=1 ata_bus_softreset: ata4: bus reset via SRST ata_dev_classify: found ATA device by sig ata_sff_softreset: EXIT, classes[0]=1 [1]=0 ata_eh_thaw_port: ata4 port thawed ata_std_postreset: ENTER ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310) ata_std_postreset: EXIT ata_eh_revalidate_and_attach: ENTER ata_sff_flush_pio_task: ENTER ata_sff_flush_pio_task: ENTER ata_dump_id: 49==0x2f00 53==0x0007 63==0x0007 64==0x0003 75==0x001f ata_dump_id: 80==0x01f0 81==0x0029 82==0x346b 83==0x7d61 84==0x4133 ata_dump_id: 88==0x047f 93==0x0000 ata_dev_set_xfermode: set features - xfer mode ata_sff_flush_pio_task: ENTER ata_dev_set_xfermode: EXIT, err_mask=0 ata_sff_flush_pio_task: ENTER ata_sff_flush_pio_task: ENTER ata_dump_id: 49==0x2f00 53==0x0007 63==0x0007 64==0x0003 75==0x001f ata_dump_id: 80==0x01f0 81==0x0029 82==0x346b 83==0x7d61 84==0x4133 ata_dump_id: 88==0x047f 93==0x0000 ata_dev_set_mode: xfer_shift=12, xfer_mode=0x42 ata4.00: configured for UDMA/33 ata4.00: retrying FLUSH 0xea Emask 0x10 ata_sff_flush_pio_task: ENTER ata_eh_recover: EXIT, rc=0 ata4: EH complete ata_scsi_error: EXIT __ata_port_freeze: ata4 port frozen ata_port_schedule_eh: port EH scheduled ata_scsiop_noop: ENTER ata_scsiop_read_cap: ENTER ata_scsiop_mode_sense: ENTER ata_scsiop_mode_sense: ENTER ata_scsiop_mode_sense: ENTER Latest News at: http://www.indigovision.com/index.php/en/news.html -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, On Mon, Aug 29, 2011 at 04:49:29PM +0100, Bruce Stenning wrote: > I should mention that we are now working on kernel 2.6.39.4 (and are choosing > not to go with 3.x to limit changes as we progress towards final testing.) I don't think it would make any difference. The core EH hasn't changed for some time now. > [...] > ata4: EH complete > ata_scsi_error: EXIT > __ata_port_freeze: ata4 port frozen > ata_port_schedule_eh: port EH scheduled > ata_scsiop_noop: ENTER > ata_scsiop_read_cap: ENTER > ata_scsiop_mode_sense: ENTER > ata_scsiop_mode_sense: ENTER > ata_scsiop_mode_sense: ENTER That's weird. Another EH is scheduled after one completes but EH doesn't kick in. Hmmm... can't see how that can happen. How easily can you reproduce the problem? Thanks.
> That's weird. Another EH is scheduled after one completes but EH > doesn't kick in. Hmmm... can't see how that can happen. How easily > can you reproduce the problem? > > Thanks. Hi Tejun, I hope you are well :-) I have been trying to determine if the problem is that it is failing to schedule the EH, or whether it is correctly scheduling it but for some reason it is getting blocked or lost. Unfortunately it has so far been quite difficult to reproduce when specifically attempting to. In normal use cases I reproduced it twice by unplugging a drive from a RAID array with redundancy intact. This was out of around a dozen cycles of waiting until redundancy was restored while the unit was under load, popping the disk, reinserting, and triggering a RAID rebuild. I have only twice managed to trigger a lockup deliberately. In both cases the tracing showed a scheduled EH which was subsequently not enacted. How long could it take for the EH to be enacted? In the lockups that I have reproduced it did not seem to have recovered minutes later, but perhaps if I had waited longer...? I have noticed that error recovery sometimes backs off for 8s and even 33s, but it always warns when that sort of delay is coming up. I shall continue to try to track down why the scheduled EH does not happen. Kind Regards, Bruce. Latest News at: http://www.indigovision.com/index.php/en/news.html -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, On Fri, Sep 02, 2011 at 05:22:38PM +0100, Bruce Stenning wrote: > Unfortunately it has so far been quite difficult to reproduce when specifically > attempting to. In normal use cases I reproduced it twice by unplugging a drive > from a RAID array with redundancy intact. This was out of around a dozen > cycles of waiting until redundancy was restored while the unit was under load, > popping the disk, reinserting, and triggering a RAID rebuild. Hmm... that's unfortunate. > I have only twice managed to trigger a lockup deliberately. In both cases the > tracing showed a scheduled EH which was subsequently not enacted. > > How long could it take for the EH to be enacted? In the lockups that I > have reproduced it did not seem to have recovered minutes later, but perhaps > if I had waited longer...? I have noticed that error recovery sometimes backs > off for 8s and even 33s, but it always warns when that sort of delay is coming > up. It should happen pretty quickly. In such cases, fastdrain is activated and all pending commands are aborted if they complete in 3 secs and then EH should kick in. The backoff is from reset path only to give breathing time for devices which take long time to spin up and doesn't apply in this case. > I shall continue to try to track down why the scheduled EH does not happen. Can you please add some debug printk's to scsi_schedule_eh() and see whether scsi_eh_wakeup() is invoked from there? It seems likely that the problem is caused by race conditions around SHOST_[CANCEL_]RECOVERY flags. Thanks.
> Can you please add some debug printk's to scsi_schedule_eh() and see > whether scsi_eh_wakeup() is invoked from there? It seems likely that > the problem is caused by race conditions around > SHOST_[CANCEL_]RECOVERY flags. I did manage to reproduce the lockup again yesterday with a slightly different mix of tracing, including adding tracing to scsi_eh_wakeup() and scsi_schedule_eh(). It looks like the EH is being scheduled, but the EH thread goes immediately back to sleep and doesn't wake up: ata4: EH complete Waking error handler thread scsi_eh_wakeup: succeeded scsi_schedule_eh: succeeded scsi_restart_operations: waking up host to restart Error handler scsi_eh_3 sleeping Is it attempting to wake the scsi_eh_3 thread while scsi_error_handler is still processing an EH, which then calls scsi_restart_operations and puts the scsi_eh_3 thread back to sleep again? Some while after the lockup, there was some tracing relating to SCSI operations timing out, but the port was still unresponsive. The unit is not entirely stable in this state, and our application software was no longer able to strobe softdog, so the unit rebooted. Enough was running for the serial console to be responsive before the reboot, however. Thanks, Bruce. Latest News at: http://www.indigovision.com/index.php/en/news.html -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
I wrote: > ata4: EH complete > Waking error handler thread > scsi_eh_wakeup: succeeded > scsi_schedule_eh: succeeded > scsi_restart_operations: waking up host to restart > Error handler scsi_eh_3 sleeping Sorry, I hit 'send' before thinking fully. A short while before the above, the following was logged: Waking error handler thread scsi_eh_wakeup: succeeded Error handler scsi_eh_3 waking up This suggests that scsi_error_handler was indeed running when the subsequent wake-up was attempted. Bruce. Latest News at: http://www.indigovision.com/index.php/en/news.html -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
I wrote: > Is it attempting to wake the scsi_eh_3 thread while scsi_error_handler > is still processing an EH, which then calls scsi_restart_operations and > puts the scsi_eh_3 thread back to sleep again? It does look to me as if there is a potential race between the scsi_eh thread and the wake-up mechanism. host_eh_scheduled is read here in scsi_error_handler: set_current_state(TASK_INTERRUPTIBLE); while (!kthread_should_stop()) { if ((shost->host_failed == 0 && shost->host_eh_scheduled == 0) || For scheduling an EH, scsi_schedule_eh takes the shost->host_lock, increments shost->host_eh_scheduled, and then wakes the EH thread. If this happens between the scsi_eh thread reading host_eh_scheduled and sending itself to sleep, then nothing will wake up the thread and host_eh_scheduled will not be inspected again. Is there some mechanism that I've missed to prevent this from happening? Bruce. Latest News at: http://www.indigovision.com/index.php/en/news.html -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> Is there some mechanism that I've missed to prevent this from happening?
Okay, no, I guess the thread will naturally be rescheduled at some point,
at which point the shost->host_eh_scheduled will be re-inspected.
Latest News at: http://www.indigovision.com/index.php/en/news.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c index dfb6e9d..05797fe 100644 --- a/drivers/ata/libata-eh.c +++ b/drivers/ata/libata-eh.c @@ -2885,8 +2885,17 @@ int ata_eh_reset(struct ata_link *link, int classify, sata_scr_read(link, SCR_STATUS, &sstatus)) rc = -ERESTART; - if (rc == -ERESTART || try >= max_tries) + if (rc == -ERESTART || try >= max_tries) { + /* + * Thaw host port even if reset failed, so that the port + * can be retried on the next phy event. This risks + * repeated EH runs but seems to be a better tradeoff than + * shutting down a port after a botched hotplug attempt. + */ + if (ata_is_host_link(link)) + ata_eh_thaw_port(ap); goto out; + } now = jiffies; if (time_before(now, deadline)) {