Patchwork [SCSI] scsi_remove_target: fix softlockup regression on hot remove

login
register
mail settings
Submitter Louis Bouchard
Date Oct. 19, 2012, 10:48 a.m.
Message ID <1350643737-14836-2-git-send-email-louis.bouchard@canonical.com>
Download mbox | patch
Permalink /patch/192662/
State New
Headers show

Comments

Louis Bouchard - Oct. 19, 2012, 10:48 a.m.
From: Dan Williams <djbw@fb.com>

John reports:
 BUG: soft lockup - CPU#2 stuck for 23s! [kworker/u:8:2202]
 [..]
 Call Trace:
  [<ffffffff8141782a>] scsi_remove_target+0xda/0x1f0
  [<ffffffff81421de5>] sas_rphy_remove+0x55/0x60
  [<ffffffff81421e01>] sas_rphy_delete+0x11/0x20
  [<ffffffff81421e35>] sas_port_delete+0x25/0x160
  [<ffffffff814549a3>] mptsas_del_end_device+0x183/0x270

...introduced by commit 3b661a9 "[SCSI] fix hot unplug vs async scan race".

Don't restart lookup of more stargets in the multi-target case, just
arrange to traverse the list once, on the assumption that new targets
are always added at the end.  There is no guarantee that the target will
change state in scsi_target_reap() so we can end up spinning if we
restart.

Cc: <stable@vger.kernel.org>
Acked-by: Jack Wang <jack_wang@usish.com>
LKML-Reference: <CAEhu1-6wq1YsNiscGMwP4ud0Q+MrViRzv=kcWCQSBNc8c68N5Q@mail.gmail.com>
Reported-by: John Drescher <drescherjm@gmail.com>
Tested-by: John Drescher <drescherjm@gmail.com>
Signed-off-by: Dan Williams <djbw@fb.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
(cherry picked from commit bc3f02a795d3b4faa99d37390174be2a75d091bd)

BugLink: http://bugs.launchpad.net/bugs/1056746

Signed-off-by: Louis Bouchard <louis.bouchard@canonical.com>
---
 drivers/scsi/scsi_sysfs.c |   30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)
Andy Whitcroft - Oct. 19, 2012, 12:26 p.m.
On Fri, Oct 19, 2012 at 12:48:57PM +0200, Louis Bouchard wrote:
> From: Dan Williams <djbw@fb.com>
> 
> John reports:
>  BUG: soft lockup - CPU#2 stuck for 23s! [kworker/u:8:2202]
>  [..]
>  Call Trace:
>   [<ffffffff8141782a>] scsi_remove_target+0xda/0x1f0
>   [<ffffffff81421de5>] sas_rphy_remove+0x55/0x60
>   [<ffffffff81421e01>] sas_rphy_delete+0x11/0x20
>   [<ffffffff81421e35>] sas_port_delete+0x25/0x160
>   [<ffffffff814549a3>] mptsas_del_end_device+0x183/0x270
> 
> ...introduced by commit 3b661a9 "[SCSI] fix hot unplug vs async scan race".
> 
> Don't restart lookup of more stargets in the multi-target case, just
> arrange to traverse the list once, on the assumption that new targets
> are always added at the end.  There is no guarantee that the target will
> change state in scsi_target_reap() so we can end up spinning if we
> restart.
> 
> Cc: <stable@vger.kernel.org>
> Acked-by: Jack Wang <jack_wang@usish.com>
> LKML-Reference: <CAEhu1-6wq1YsNiscGMwP4ud0Q+MrViRzv=kcWCQSBNc8c68N5Q@mail.gmail.com>
> Reported-by: John Drescher <drescherjm@gmail.com>
> Tested-by: John Drescher <drescherjm@gmail.com>
> Signed-off-by: Dan Williams <djbw@fb.com>
> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
> (cherry picked from commit bc3f02a795d3b4faa99d37390174be2a75d091bd)
> 
> BugLink: http://bugs.launchpad.net/bugs/1056746
> 
> Signed-off-by: Louis Bouchard <louis.bouchard@canonical.com>
> ---
>  drivers/scsi/scsi_sysfs.c |   30 ++++++++++++++----------------
>  1 file changed, 14 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index bb7c482..08d48a3 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1023,33 +1023,31 @@ static void __scsi_remove_target(struct scsi_target *starget)
>  void scsi_remove_target(struct device *dev)
>  {
>  	struct Scsi_Host *shost = dev_to_shost(dev->parent);
> -	struct scsi_target *starget, *found;
> +	struct scsi_target *starget, *last = NULL;
>  	unsigned long flags;
>  
> - restart:
> -	found = NULL;
> +	/* remove targets being careful to lookup next entry before
> +	 * deleting the last
> +	 */
>  	spin_lock_irqsave(shost->host_lock, flags);
>  	list_for_each_entry(starget, &shost->__targets, siblings) {
>  		if (starget->state == STARGET_DEL)
>  			continue;
>  		if (starget->dev.parent == dev || &starget->dev == dev) {
> -			found = starget;
> -			found->reap_ref++;
> -			break;
> +			/* assuming new targets arrive at the end */
> +			starget->reap_ref++;
> +			spin_unlock_irqrestore(shost->host_lock, flags);
> +			if (last)
> +				scsi_target_reap(last);
> +			last = starget;
> +			__scsi_remove_target(starget);
> +			spin_lock_irqsave(shost->host_lock, flags);
>  		}
>  	}
>  	spin_unlock_irqrestore(shost->host_lock, flags);
>  
> -	if (found) {
> -		__scsi_remove_target(found);
> -		scsi_target_reap(found);
> -		/* in the case where @dev has multiple starget children,
> -		 * continue removing.
> -		 *
> -		 * FIXME: does such a case exist?
> -		 */
> -		goto restart;
> -	}
> +	if (last)
> +		scsi_target_reap(last);
>  }
>  EXPORT_SYMBOL(scsi_remove_target);

Seems to do what is claimed.  Testing also shows good results.

Acked-by: Andy Whitcroft <apw@canonical.com>

-apw

Patch

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index bb7c482..08d48a3 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1023,33 +1023,31 @@  static void __scsi_remove_target(struct scsi_target *starget)
 void scsi_remove_target(struct device *dev)
 {
 	struct Scsi_Host *shost = dev_to_shost(dev->parent);
-	struct scsi_target *starget, *found;
+	struct scsi_target *starget, *last = NULL;
 	unsigned long flags;
 
- restart:
-	found = NULL;
+	/* remove targets being careful to lookup next entry before
+	 * deleting the last
+	 */
 	spin_lock_irqsave(shost->host_lock, flags);
 	list_for_each_entry(starget, &shost->__targets, siblings) {
 		if (starget->state == STARGET_DEL)
 			continue;
 		if (starget->dev.parent == dev || &starget->dev == dev) {
-			found = starget;
-			found->reap_ref++;
-			break;
+			/* assuming new targets arrive at the end */
+			starget->reap_ref++;
+			spin_unlock_irqrestore(shost->host_lock, flags);
+			if (last)
+				scsi_target_reap(last);
+			last = starget;
+			__scsi_remove_target(starget);
+			spin_lock_irqsave(shost->host_lock, flags);
 		}
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
 
-	if (found) {
-		__scsi_remove_target(found);
-		scsi_target_reap(found);
-		/* in the case where @dev has multiple starget children,
-		 * continue removing.
-		 *
-		 * FIXME: does such a case exist?
-		 */
-		goto restart;
-	}
+	if (last)
+		scsi_target_reap(last);
 }
 EXPORT_SYMBOL(scsi_remove_target);