Patchwork sata_mv port lockup on hotplug (kernel 2.6.38.2)

login
register
mail settings
Submitter Tejun Heo
Date Sept. 6, 2011, 4:33 p.m.
Message ID <20110906163355.GG18425@mtj.dyndns.org>
Download mbox | patch
Permalink /patch/113600/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

Tejun Heo - Sept. 6, 2011, 4:33 p.m.
Hello,

On Tue, Sep 06, 2011 at 01:19:44PM +0100, Bruce Stenning wrote:
> ata4: EH complete
> Waking error handler thread
> scsi_eh_wakeup: succeeded
> scsi_schedule_eh: succeeded
> scsi_restart_operations: waking up host to restart
> Error handler scsi_eh_3 sleeping

I think the following should fix the problem.  The code there is from
the time when libata shared scsi_host->host_lock.  libata no longer
does that so, in the current state, host_eh_scheduled can be cleared
with actual pending EH condition.

You should be able to reproduce the problem more easily by adding
delay (something like mdelay(5)) before host_eh_scheduled clearing
without the following patch applied.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo - Sept. 6, 2011, 4:42 p.m.
On Wed, Sep 07, 2011 at 01:33:55AM +0900, Tejun Heo wrote:
> Hello,
> 
> On Tue, Sep 06, 2011 at 01:19:44PM +0100, Bruce Stenning wrote:
> > ata4: EH complete
> > Waking error handler thread
> > scsi_eh_wakeup: succeeded
> > scsi_schedule_eh: succeeded
> > scsi_restart_operations: waking up host to restart
> > Error handler scsi_eh_3 sleeping
> 
> I think the following should fix the problem.  The code there is from
> the time when libata shared scsi_host->host_lock.  libata no longer
> does that so, in the current state, host_eh_scheduled can be cleared
> with actual pending EH condition.

Hmmm... maybe not.  Such race condition exists iff host_eh_scheduled
is incremented outside of ap->lock, which I can't see how.  Weird.
Can you please instrument the followings?

* print the caller of scsi_eh_wakeup().  "%pF" w/ (void *)_RET_IP_
  should do it.

* print why scsi_eh is going back to sleep immediately.
  ie. shost->host_failed, host_eh_scheduled, host_busy in
  scsi_error_handler().  It would also be nice to add some printks
  around schedule() and enable PRINTK_TIME.

Thanks.
Bruce Stenning - Sept. 7, 2011, 12:09 p.m.
Hi Tejun,

Sorry for sending so many emails yesterday; I blame the dental anaesthetic
I received in the morning for being so jumpy on the send button ;-)

> Can you please instrument the followings?
>
> * print the caller of scsi_eh_wakeup().  "%pF" w/ (void *)_RET_IP_
>   should do it.
>
> * print why scsi_eh is going back to sleep immediately.
>   ie. shost->host_failed, host_eh_scheduled, host_busy in
>   scsi_error_handler().  It would also be nice to add some printks
>   around schedule() and enable PRINTK_TIME.

I can certainly try this.  Could you confirm whether my thoughts about a race
between the scsi_eh thread and the wake-up are plausible?  I backtracked
yesterday because I thought the scsi_eh thread would get rescheduled naturally,
not realising that when the task state is TASK_INTERRUPTIBLE schedule() takes
the task off the run queue (so it needs to be explicitly woken.)

Here is my thinking again:

shost->host_eh_scheduled is read here in scsi_error_handler:

        set_current_state(TASK_INTERRUPTIBLE);
        while (!kthread_should_stop()) {
                if ((shost->host_failed == 0 && shost->host_eh_scheduled == 0) ||

There's no locking in scsi_error_handler (though functions it calls probably
claim locks.)

When scheduling an EH, scsi_schedule_eh takes the shost->host_lock, increments
shost->host_eh_scheduled, and then wakes the EH thread.  If this happens
between the scsi_eh thread reading host_eh_scheduled and sending itself back
to sleep (when the scsi_eh thread's state is TASK_INTERRUPTIBLE) nothing will
wake up the thread again and host_eh_scheduled will not get inspected.
host_eh_scheduled is stuck at 1 with the scsi_eh thread asleep, and it won't
get woken again because the ata port has been frozen and irqs are masked off.

Bruce.


Latest News at: http://www.indigovision.com/index.php/en/news.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo - Sept. 8, 2011, 1:16 a.m.
Hello,

On Wed, Sep 07, 2011 at 01:09:10PM +0100, Bruce Stenning wrote:
> Sorry for sending so many emails yesterday; I blame the dental anaesthetic
> I received in the morning for being so jumpy on the send button ;-)

Oh the fun. :)

> I can certainly try this.  Could you confirm whether my thoughts about a race
> between the scsi_eh thread and the wake-up are plausible?  I backtracked
> yesterday because I thought the scsi_eh thread would get rescheduled naturally,
> not realising that when the task state is TASK_INTERRUPTIBLE schedule() takes
> the task off the run queue (so it needs to be explicitly woken.)
> 
> Here is my thinking again:
> 
> shost->host_eh_scheduled is read here in scsi_error_handler:
> 
>         set_current_state(TASK_INTERRUPTIBLE);
>         while (!kthread_should_stop()) {
>                 if ((shost->host_failed == 0 && shost->host_eh_scheduled == 0) ||
> 
> There's no locking in scsi_error_handler (though functions it calls probably
> claim locks.)
> 
> When scheduling an EH, scsi_schedule_eh takes the shost->host_lock, increments
> shost->host_eh_scheduled, and then wakes the EH thread.  If this happens
> between the scsi_eh thread reading host_eh_scheduled and sending itself back
> to sleep (when the scsi_eh thread's state is TASK_INTERRUPTIBLE) nothing will
> wake up the thread again and host_eh_scheduled will not get inspected.
> host_eh_scheduled is stuck at 1 with the scsi_eh thread asleep, and it won't
> get woken again because the ata port has been frozen and irqs are masked off.

I don't think there's a race condition there.  set_current_state()
implies memory barrier and wake_up_process() implies wmb().  host_eh
either sees the inrecremented eh_scheduled count or TASK_RUNNING set
by wake_up_process(), so it can't miss an event.

Thanks.

Patch

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index ed16fbe..e971784 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -771,6 +771,14 @@  void ata_scsi_port_error_handler(struct Scsi_Host *host, struct ata_port *ap)
 		/* process port suspend request */
 		ata_eh_handle_port_suspend(ap);
 
+		/*
+		 * Single iteration complete.  Clear SCSI EH scheduled
+		 * state and check whether another iteration is necessary.
+		 */
+		spin_lock_irqsave(host->host_lock, flags);
+		host->host_eh_scheduled = 0;
+		spin_unlock_irqrestore(host->host_lock, flags);
+
 		/* Exception might have happened after ->error_handler
 		 * recovered the port but before this point.  Repeat
 		 * EH in such case.
@@ -792,13 +800,6 @@  void ata_scsi_port_error_handler(struct Scsi_Host *host, struct ata_port *ap)
 		ata_for_each_link(link, ap, HOST_FIRST)
 			memset(&link->eh_info, 0, sizeof(link->eh_info));
 
-		/* Clear host_eh_scheduled while holding ap->lock such
-		 * that if exception occurs after this point but
-		 * before EH completion, SCSI midlayer will
-		 * re-initiate EH.
-		 */
-		host->host_eh_scheduled = 0;
-
 		spin_unlock_irqrestore(ap->lock, flags);
 		ata_eh_release(ap);
 	} else {