
scsi: fix race between simultaneous decrements of ->host_failed

Message ID 1464407471-3712-1-git-send-email-fangwei1@huawei.com
State Not Applicable
Delegated to: David Miller

Commit Message

fangwei May 28, 2016, 3:51 a.m. UTC
async_sas_ata_eh(), which may call scsi_eh_finish_cmd() in some cases,
can run concurrently when invoked from sas_ata_strategy_handler(). In
that case, ->host_failed may be decremented simultaneously in
scsi_eh_finish_cmd() on different CPUs and end up with an incorrect
value.

This can leave ->host_failed permanently unequal to ->host_busy. The
SCSI error handler thread will then never run, and subsequent SCSI
errors will never be handled.

Use an atomic type for ->host_failed to fix this race.

Signed-off-by: Wei Fang <fangwei1@huawei.com>
---
 drivers/ata/libata-eh.c             |    2 +-
 drivers/scsi/libsas/sas_scsi_host.c |    5 +++--
 drivers/scsi/scsi.c                 |    2 +-
 drivers/scsi/scsi_error.c           |   15 +++++++++------
 drivers/scsi/scsi_lib.c             |    3 ++-
 include/scsi/scsi_host.h            |    2 +-
 6 files changed, 17 insertions(+), 12 deletions(-)

Comments

Christoph Hellwig May 29, 2016, 6:54 a.m. UTC | #1
On Sat, May 28, 2016 at 11:51:11AM +0800, Wei Fang wrote:
> async_sas_ata_eh(), which may call scsi_eh_finish_cmd() in some cases,
> can run concurrently when invoked from sas_ata_strategy_handler(). In
> that case, ->host_failed may be decremented simultaneously in
> scsi_eh_finish_cmd() on different CPUs and end up with an incorrect
> value.
> 
> This can leave ->host_failed permanently unequal to ->host_busy. The
> SCSI error handler thread will then never run, and subsequent SCSI
> errors will never be handled.
> 
> Use an atomic type for ->host_failed to fix this race.

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

But please also update Documentation/scsi/scsi_eh.txt for this
change.
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley May 29, 2016, 3:41 p.m. UTC | #2
On Sat, 2016-05-28 at 23:54 -0700, Christoph Hellwig wrote:
> On Sat, May 28, 2016 at 11:51:11AM +0800, Wei Fang wrote:
> > async_sas_ata_eh(), which may call scsi_eh_finish_cmd() in some
> > cases, can run concurrently when invoked from
> > sas_ata_strategy_handler(). In that case, ->host_failed may be
> > decremented simultaneously in scsi_eh_finish_cmd() on different CPUs
> > and end up with an incorrect value.
> > 
> > This can leave ->host_failed permanently unequal to ->host_busy. The
> > SCSI error handler thread will then never run, and subsequent SCSI
> > errors will never be handled.
> > 
> > Use an atomic type for ->host_failed to fix this race.
> 
> Looks fine,

Actually, it doesn't look fine at all.  The same mechanism that's
supposed to protect the host_failed decrement is also supposed to
protect the list_move_tail().  If there's a problem with the former
then we're also in danger of corrupting the list.

Can we go back to the theory of what the problem is, since it's not
spelled out very clearly in the change log.  Our usual reason for not
requiring locking in eh routines is that the eh is single threaded on
the eh thread per host, so any host manipulations can't have
concurrency problems.  In this case, the sas_ata routines are trying to
be clever and use asynchronous workqueues for the port error handler
and you theorise that these can execute concurrently on two CPUs, thus
causing the problem?

James


Christoph Hellwig May 29, 2016, 6:06 p.m. UTC | #3
On Sun, May 29, 2016 at 08:41:13AM -0700, James Bottomley wrote:
> Actually, it doesn't look fine at all.  The same mechanism that's
> supposed to protect the host_failed decrement is also supposed to
> protect the list_move_tail().  If there's a problem with the former
> then we're also in danger of corrupting the list.

No, that's not the case.  eh_entry is used for two things:

 a) shost->eh_cmd_q, which is used to queue up commands for the EH
    thread, and is locked using the host lock.
 b) various on-stack lists in the EH thread

scsi_eh_finish_cmd() is only called for case b), as all EH thread
implementations move the commands from eh_cmd_q to a local list
as the very first thing.

host_failed, on the other hand, is incremented under the host_lock
in scsi_eh_scmd_add() but decremented without any lock from the
EH thread.
James Bottomley May 29, 2016, 7:15 p.m. UTC | #4
On Sun, 2016-05-29 at 11:06 -0700, Christoph Hellwig wrote:
> On Sun, May 29, 2016 at 08:41:13AM -0700, James Bottomley wrote:
> > Actually, it doesn't look fine at all.  The same mechanism that's
> > supposed to protect the host_failed decrement is also supposed to
> > protect the list_move_tail().  If there's a problem with the former
> > then we're also in danger of corrupting the list.
> 
> No, that's not the case.  eh_entry is used for two things:
> 
>  a) shost->eh_cmd_q, which is used to queue up commands for the EH
>     thread, and is locked using the host lock.
>  b) various on-stack lists in the EH thread

Actually, no: in the ATA error handler case we have per-port queues
which are part of struct ata_port, so it's neither a) nor b).

However, because the ata_port is per SAS domain device, and
sas_ata_strategy_handler() fires off one async thread per domain
device, we're concurrency-safe on the per-ata_port queues.

Just checking some of the other SCSI EH assumptions this may be
violating.

James


> scsi_eh_finish_cmd() is only called for case b), as all EH thread
> implementations move the commands from eh_cmd_q to a local list
> as the very first thing.
> 
> host_failed, on the other hand, is incremented under the host_lock
> in scsi_eh_scmd_add() but decremented without any lock from the
> EH thread.

fangwei May 30, 2016, 7:27 a.m. UTC | #5
Hi James, Christoph,

On 2016/5/29 23:41, James Bottomley wrote:
> On Sat, 2016-05-28 at 23:54 -0700, Christoph Hellwig wrote:
>> On Sat, May 28, 2016 at 11:51:11AM +0800, Wei Fang wrote:
>>> async_sas_ata_eh(), which may call scsi_eh_finish_cmd() in some
>>> cases, can run concurrently when invoked from
>>> sas_ata_strategy_handler(). In that case, ->host_failed may be
>>> decremented simultaneously in scsi_eh_finish_cmd() on different CPUs
>>> and end up with an incorrect value.
>>>
>>> This can leave ->host_failed permanently unequal to ->host_busy. The
>>> SCSI error handler thread will then never run, and subsequent SCSI
>>> errors will never be handled.
>>>
>>> Use an atomic type for ->host_failed to fix this race.
>>
>> Looks fine,
> 
> Actually, it doesn't look fine at all.  The same mechanism that's
> supposed to protect the host_failed decrement is also supposed to
> protect the list_move_tail().  If there's a problem with the former
> then we're also in danger of corrupting the list.

The scmd is moved to the local eh_done_q list here, and I've checked
that this list won't be touched concurrently.

> Can we go back to the theory of what the problem is, since it's not
> spelled out very clearly in the change log.  Our usual reason for not
> requiring locking in eh routines is that the eh is single threaded on
> the eh thread per host, so any host manipulations can't have
> concurrency problems.  In this case, the sas_ata routines are trying to
> be clever and use asynchronous workqueues for the port error handler
> and you theorise that these can execute concurrently on two CPUs, thus
> causing the problem?

Yes, that's the case. The port error handler's work items are queued
on system_unbound_wq and can run concurrently on different CPUs. We
have already hit this problem on our machines.

Thanks,
Wei

> James

fangwei May 30, 2016, 7:43 a.m. UTC | #6
Hi, Christoph,

On 2016/5/29 14:54, Christoph Hellwig wrote:
> On Sat, May 28, 2016 at 11:51:11AM +0800, Wei Fang wrote:
>> async_sas_ata_eh(), which may call scsi_eh_finish_cmd() in some cases,
>> can run concurrently when invoked from sas_ata_strategy_handler(). In
>> that case, ->host_failed may be decremented simultaneously in
>> scsi_eh_finish_cmd() on different CPUs and end up with an incorrect
>> value.
>>
>> This can leave ->host_failed permanently unequal to ->host_busy. The
>> SCSI error handler thread will then never run, and subsequent SCSI
>> errors will never be handled.
>>
>> Use an atomic type for ->host_failed to fix this race.
> 
> Looks fine,
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> But please also update Documentation/scsi/scsi_eh.txt for this
> change.

Thanks for reviewing the patch.
I looked through the file but couldn't find the part that should be
updated. Could you point it out?

Thanks,
Wei


James Bottomley May 30, 2016, 4:04 p.m. UTC | #7
On Mon, 2016-05-30 at 15:27 +0800, Wei Fang wrote:
> Hi James, Christoph,
> 
> On 2016/5/29 23:41, James Bottomley wrote:
> > On Sat, 2016-05-28 at 23:54 -0700, Christoph Hellwig wrote:
> > > On Sat, May 28, 2016 at 11:51:11AM +0800, Wei Fang wrote:
> > > > async_sas_ata_eh(), which may call scsi_eh_finish_cmd() in 
> > > > some cases, can run concurrently when invoked from
> > > > sas_ata_strategy_handler(). In that case, ->host_failed may be 
> > > > decremented simultaneously in scsi_eh_finish_cmd() on different 
> > > > CPUs and end up with an incorrect value.
> > > > 
> > > > This can leave ->host_failed permanently unequal to ->host_busy.
> > > > The SCSI error handler thread will then never run, and subsequent
> > > > SCSI errors will never be handled.
> > > > 
> > > > Use an atomic type for ->host_failed to fix this race.
> > > 
> > > Looks fine,
> > 
> > Actually, it doesn't look fine at all.  The same mechanism that's
> > supposed to protect the host_failed decrement is also supposed to
> > protect the list_move_tail().  If there's a problem with the former
> > then we're also in danger of corrupting the list.
> 
> The scmd is moved to the local eh_done_q list here, and I've checked
> that this list won't be touched concurrently.
>
> > Can we go back to the theory of what the problem is, since it's not
> > spelled out very clearly in the change log.  Our usual reason for 
> > not requiring locking in eh routines is that the eh is single 
> > threaded on the eh thread per host, so any host manipulations can't 
> > have concurrency problems.  In this case, the sas_ata routines are
> > trying to be clever and use asynchronous workqueues for the port 
> > error handler and you theorise that these can execute concurrently 
> > on two CPUs, thus causing the problem?
> 
> Yes, that's the case. The port error handler's work items are queued
> on system_unbound_wq and can run concurrently on different CPUs. We
> have already hit this problem on our machines.

OK, add that to the changelog and also that this fixes

commit 50824d6c5657ce340e3911171865a8d99fdd8eba
Author: Dan Williams <dan.j.williams@intel.com>
Date:   Sun Dec 4 01:06:24 2011 -0800

    [SCSI] libsas: async ata-eh

Because that's where the concurrency rules weren't verified when this
async threading was added.

One final thing is that we don't need this replaced by atomics.  The
only atomic check we need is the up count, which is already serialised
by the host lock.  Nothing actually ever bothers with the down count,
so it can just be eliminated and host_failed set to zero after the
strategy handler is complete (but before scsi_restart_operations) in
the eh thread.

Once this change is made, scsi_eh_finish_cmd() and
scsi_eh_flush_done_q() are safe provided the done_q list is not
modifiable by any other thread.

As Christoph said, the documentation needs updating to reflect these
new concurrency rules.

James

Christoph Hellwig May 30, 2016, 7:10 p.m. UTC | #8
On Mon, May 30, 2016 at 03:43:43PM +0800, Wei Fang wrote:
> I looked through the file but couldn't find the part that should be
> updated. Could you point it out?

Lines 255 and 266 in Documentation/scsi/scsi_eh.txt

Patch

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 961acc7..a0e7612 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -606,7 +606,7 @@  void ata_scsi_error(struct Scsi_Host *host)
 	ata_scsi_port_error_handler(host, ap);
 
 	/* finish or retry handled scmd's and clean up */
-	WARN_ON(host->host_failed || !list_empty(&eh_work_q));
+	WARN_ON(atomic_read(&host->host_failed) || !list_empty(&eh_work_q));
 
 	DPRINTK("EXIT\n");
 }
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 519dac4..8d74003 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -757,7 +757,8 @@  retry:
 	spin_unlock_irq(shost->host_lock);
 
 	SAS_DPRINTK("Enter %s busy: %d failed: %d\n",
-		    __func__, atomic_read(&shost->host_busy), shost->host_failed);
+		    __func__, atomic_read(&shost->host_busy),
+		    atomic_read(&shost->host_failed));
 	/*
 	 * Deal with commands that still have SAS tasks (i.e. they didn't
 	 * complete via the normal sas_task completion mechanism),
@@ -800,7 +801,7 @@  out:
 
 	SAS_DPRINTK("--- Exit %s: busy: %d failed: %d tries: %d\n",
 		    __func__, atomic_read(&shost->host_busy),
-		    shost->host_failed, tries);
+		    atomic_read(&shost->host_failed), tries);
 }
 
 enum blk_eh_timer_return sas_scsi_timed_out(struct scsi_cmnd *cmd)
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 1deb6ad..7840915 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -527,7 +527,7 @@  void scsi_log_completion(struct scsi_cmnd *cmd, int disposition)
 				scmd_printk(KERN_INFO, cmd,
 					    "scsi host busy %d failed %d\n",
 					    atomic_read(&cmd->device->host->host_busy),
-					    cmd->device->host->host_failed);
+					    atomic_read(&cmd->device->host->host_failed));
 		}
 	}
 }
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 984ddcb..f37666f 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -62,7 +62,8 @@  static int scsi_try_to_abort_cmd(struct scsi_host_template *,
 /* called with shost->host_lock held */
 void scsi_eh_wakeup(struct Scsi_Host *shost)
 {
-	if (atomic_read(&shost->host_busy) == shost->host_failed) {
+	if (atomic_read(&shost->host_busy) ==
+	    atomic_read(&shost->host_failed)) {
 		trace_scsi_eh_wakeup(shost);
 		wake_up_process(shost->ehandler);
 		SCSI_LOG_ERROR_RECOVERY(5, shost_printk(KERN_INFO, shost,
@@ -250,7 +251,7 @@  int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
 		eh_flag &= ~SCSI_EH_CANCEL_CMD;
 	scmd->eh_eflags |= eh_flag;
 	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
-	shost->host_failed++;
+	atomic_inc(&shost->host_failed);
 	scsi_eh_wakeup(shost);
  out_unlock:
 	spin_unlock_irqrestore(shost->host_lock, flags);
@@ -1127,7 +1128,7 @@  static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn)
  */
 void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q)
 {
-	scmd->device->host->host_failed--;
+	atomic_dec(&scmd->device->host->host_failed);
 	scmd->eh_eflags = 0;
 	list_move_tail(&scmd->eh_entry, done_q);
 }
@@ -2190,8 +2191,10 @@  int scsi_error_handler(void *data)
 		if (kthread_should_stop())
 			break;
 
-		if ((shost->host_failed == 0 && shost->host_eh_scheduled == 0) ||
-		    shost->host_failed != atomic_read(&shost->host_busy)) {
+		if ((atomic_read(&shost->host_failed) == 0 &&
+		     shost->host_eh_scheduled == 0) ||
+		    (atomic_read(&shost->host_failed) !=
+		     atomic_read(&shost->host_busy))) {
 			SCSI_LOG_ERROR_RECOVERY(1,
 				shost_printk(KERN_INFO, shost,
 					     "scsi_eh_%d: sleeping\n",
@@ -2205,7 +2208,7 @@  int scsi_error_handler(void *data)
 			shost_printk(KERN_INFO, shost,
 				     "scsi_eh_%d: waking up %d/%d/%d\n",
 				     shost->host_no, shost->host_eh_scheduled,
-				     shost->host_failed,
+				     atomic_read(&shost->host_failed),
 				     atomic_read(&shost->host_busy)));
 
 		/*
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 8106515..fb3cc5d 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -318,7 +318,8 @@  void scsi_device_unbusy(struct scsi_device *sdev)
 		atomic_dec(&starget->target_busy);
 
 	if (unlikely(scsi_host_in_recovery(shost) &&
-		     (shost->host_failed || shost->host_eh_scheduled))) {
+		     (atomic_read(&shost->host_failed) ||
+		      shost->host_eh_scheduled))) {
 		spin_lock_irqsave(shost->host_lock, flags);
 		scsi_eh_wakeup(shost);
 		spin_unlock_irqrestore(shost->host_lock, flags);
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index fcfa3d7..654435f 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -576,7 +576,7 @@  struct Scsi_Host {
 	atomic_t host_busy;		   /* commands actually active on low-level */
 	atomic_t host_blocked;
 
-	unsigned int host_failed;	   /* commands that failed.
+	atomic_t host_failed;		   /* commands that failed.
 					      protected by host_lock */
 	unsigned int host_eh_scheduled;    /* EH scheduled without command */