[12/12] scsi_transport_sas: fix delete vs scan race

Message ID CAA9_cmeL5h_5xESis06pyT-7bt+K2eQrN5SR6_b25qLBSDVXvA@mail.gmail.com
State Not Applicable
Delegated to: David Miller

Commit Message

Dan Williams May 20, 2012, 7:20 p.m.
On Sat, May 5, 2012 at 2:52 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Sun, Apr 22, 2012 at 10:15 AM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
>> Async scan here means any scan in a different thread, right ... it just
>> has to be asynchronous relative to us?  So that includes the manually
>> initiated ones and hotplug ones, doesn't it?
> [ resend since I notice this never hit the lists ]
> Hmm, well no I don't think so.  This literally means the initial async
> scan, and the failure window is between when we skip the call to
> scsi_sysfs_add_sdev() (in scsi_add_lun() under the scan_mutex) and
> finally call scsi_sysfs_add_sdev() again via scsi_finish_async_scan().
> I don't see how that fixes it because when we fail the sequence goes:
> mutex_lock(scan_mutex)
> starget->parent = end_device;
> scsi_add_lun()
> mutex_unlock(scan_mutex)
> device_del(end_device)
> mutex_lock(scan_mutex)
> device_add(starget)
> <crash>
> As far as I can see taking the scan_mutex in sas_rphy_remove() does
> not change this failure window.  Unless I missed something?
> I am going to re-submit this patch as is with the proposed libsas batch for 3.5.

It turns out this patch can cause a deadlock in the scenario where we
have two hosts scanning and the "previous" host (according to the
async scan queue) experiences a device removal event.  I think the
following should be all we need:


...since starget removal will mark the sdevs as deleted under
scan_mutex, so scsi_sysfs_add_devices() can simply ignore deleted devices.
I'll post this patch after Darek has a chance to try it out.
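
For anyone following along, here is a rough userspace model of the idea.
It is not kernel code: every model_* name is made up for illustration,
and only SDEV_DEL, scan_mutex and scsi_sysfs_add_devices() come from the
thread.  The assumption is that target removal has already flagged its
devices as deleted (under scan_mutex in the real code) before the
deferred add pass runs, so the pass only needs to skip them:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's sdev_state; only two states modeled. */
enum model_sdev_state { MODEL_SDEV_RUNNING, MODEL_SDEV_DEL };

/* Hypothetical miniature of struct scsi_device. */
struct model_sdev {
	enum model_sdev_state state;
	int in_sysfs;	/* set to 1 once the device has been "added" */
};

/*
 * Stand-in for target removal: flag every device of the target as
 * deleted.  The real code is assumed to do this under scan_mutex.
 */
void model_remove_target(struct model_sdev *devs, size_t n)
{
	size_t i;

	assert(devs != NULL || n == 0);
	for (i = 0; i < n; i++)
		devs[i].state = MODEL_SDEV_DEL;
}

/*
 * Stand-in for the deferred add pass: skip anything already flagged
 * as deleted, register the rest.  Returns the number registered.
 */
size_t model_sysfs_add_devices(struct model_sdev *devs, size_t n)
{
	size_t i, added = 0;

	assert(devs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		/* target removed before the device could be added */
		if (devs[i].state == MODEL_SDEV_DEL)
			continue;
		devs[i].in_sysfs = 1;
		added++;
	}
	return added;
}
```

With three devices, and the middle one's target removed before the
deferred pass, the pass registers two devices and never touches the
deleted one.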



diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 01b0374..8906557 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1714,6 +1714,9 @@ static void scsi_sysfs_add_devices(struct Scsi_Host *shost)
        struct scsi_device *sdev;
        shost_for_each_device(sdev, shost) {
+               /* target removed before the device could be added */
+               if (sdev->sdev_state == SDEV_DEL)
+                       continue;
                if (!scsi_host_scan_allowed(shost) ||
                    scsi_sysfs_add_sdev(sdev) != 0)