Patchwork [#upstream-fixes] libata: fix incorrect link online check during probe

Submitter Tejun Heo
Date Oct. 6, 2009, 8:08 a.m.
Message ID <4ACAFB08.8020501@kernel.org>
Permalink /patch/35072/
State Not Applicable
Delegated to: David Miller

Comments

Tejun Heo - Oct. 6, 2009, 8:08 a.m.
While trying to work around spurious detection retries for
non-existent devices on slave links, commit
816ab89782ac139a8b65147cca990822bb7e8675 incorrectly added link
offline check logic before ata_eh_thaw() was called.  This means that
if an occupied link goes down briefly at the moment the offline check
is performed, the device class is cleared to ATA_DEV_NONE and libata
does not retry, so detection of the device fails.

The offline check should be done after the port is thawed, together
with the online check, so that such link glitches can be detected by
the interrupt handler and handled properly.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Tim Blechmann <tim@klingt.org>
Cc: stable@kernel.org
--
 drivers/ata/libata-eh.c |   50 ++++++++++++++++++++++++++++++------------------
 1 file changed, 32 insertions(+), 18 deletions(-)
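
The post-thaw cross check described above can be sketched as a small
stand-alone model.  This is only an illustration under stated
assumptions, not the kernel code: enum dev_class, enum link_state and
cross_check_classes() are hypothetical names invented for this sketch,
and the tri-state link value stands in for the case where neither
ata_phys_link_online() nor ata_phys_link_offline() reports true.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for libata device classes (not the kernel's values). */
enum dev_class { DEV_UNKNOWN, DEV_NONE, DEV_ATA };

/*
 * Hypothetical tri-state link status.  LINK_STATUS_UNKNOWN models the case
 * where the link can be confirmed neither online nor offline.
 */
enum link_state { LINK_ONLINE, LINK_OFFLINE, LINK_STATUS_UNKNOWN };

/*
 * Cross-check classification results against link state.
 * Returns the number of devices whose link is online but whose class is
 * still unknown; a caller would retry the reset when this is non-zero.
 */
static int cross_check_classes(enum dev_class *classes,
			       const enum link_state *links, size_t n)
{
	int nr_unknown = 0;
	size_t i;

	assert(classes != NULL && links != NULL);

	for (i = 0; i < n; i++) {
		switch (links[i]) {
		case LINK_ONLINE:
			/* online but unclassified: count it for a retry */
			if (classes[i] == DEV_UNKNOWN) {
				classes[i] = DEV_NONE;
				nr_unknown++;
			}
			break;
		case LINK_OFFLINE:
			/* nothing attached: clear whatever class was set */
			classes[i] = DEV_NONE;
			break;
		case LINK_STATUS_UNKNOWN:
			/* keep a real class, only clear UNKNOWN */
			if (classes[i] == DEV_UNKNOWN)
				classes[i] = DEV_NONE;
			break;
		}
	}
	return nr_unknown;
}
```

In this model a device that was classified but whose link status cannot
be read keeps its class, while an offline link always ends up as
DEV_NONE; only the online-but-unknown case contributes to the retry
count.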

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik - Oct. 7, 2009, 1:03 a.m.
On 10/06/2009 04:08 AM, Tejun Heo wrote:
> While trying to work around spurious detection retries for
> non-existent devices on slave links, commit
> 816ab89782ac139a8b65147cca990822bb7e8675 incorrectly added link
> offline check logic before ata_eh_thaw() was called.  This means that
> if an occupied link goes down briefly at the time that offline check
> was performed, device class will be cleared to ATA_DEV_NONE and libata
> wouldn't retry thus failing detection of the device.
>
> The offline check should be done after the port is thawed together
> with online check so that such link glitches can be detected by the
> interrupt handler and handled properly.
>
> Signed-off-by: Tejun Heo<tj@kernel.org>
> Reported-by: Tim Blechmann<tim@klingt.org>
> Cc: stable@kernel.org
> --
>   drivers/ata/libata-eh.c |   50 ++++++++++++++++++++++++++++++------------------
>   1 file changed, 32 insertions(+), 18 deletions(-)

applied.  BTW, note your separator lost a dash... it should be three 
dashes (---).

Tejun Heo - Oct. 7, 2009, 6:17 a.m.
Jeff Garzik wrote:
> On 10/06/2009 04:08 AM, Tejun Heo wrote:
>> While trying to work around spurious detection retries for
>> non-existent devices on slave links, commit
>> 816ab89782ac139a8b65147cca990822bb7e8675 incorrectly added link
>> offline check logic before ata_eh_thaw() was called.  This means that
>> if an occupied link goes down briefly at the time that offline check
>> was performed, device class will be cleared to ATA_DEV_NONE and libata
>> wouldn't retry thus failing detection of the device.
>>
>> The offline check should be done after the port is thawed together
>> with online check so that such link glitches can be detected by the
>> interrupt handler and handled properly.
>>
>> Signed-off-by: Tejun Heo<tj@kernel.org>
>> Reported-by: Tim Blechmann<tim@klingt.org>
>> Cc: stable@kernel.org
>> -- 
>>   drivers/ata/libata-eh.c |   50
>> ++++++++++++++++++++++++++++++------------------
>>   1 file changed, 32 insertions(+), 18 deletions(-)
> 
> applied.  BTW, note your separator lost a dash... it should be three
> dashes (---).

Oops, sorry about that.

Patch

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index a04488f..0a97822 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2667,14 +2667,14 @@  int ata_eh_reset(struct ata_link *link, int classify,
 		dev->pio_mode = XFER_PIO_0;
 		dev->flags &= ~ATA_DFLAG_SLEEPING;
 
-		if (!ata_phys_link_offline(ata_dev_phys_link(dev))) {
-			/* apply class override */
-			if (lflags & ATA_LFLAG_ASSUME_ATA)
-				classes[dev->devno] = ATA_DEV_ATA;
-			else if (lflags & ATA_LFLAG_ASSUME_SEMB)
-				classes[dev->devno] = ATA_DEV_SEMB_UNSUP;
-		} else
-			classes[dev->devno] = ATA_DEV_NONE;
+		if (ata_phys_link_offline(ata_dev_phys_link(dev)))
+			continue;
+
+		/* apply class override */
+		if (lflags & ATA_LFLAG_ASSUME_ATA)
+			classes[dev->devno] = ATA_DEV_ATA;
+		else if (lflags & ATA_LFLAG_ASSUME_SEMB)
+			classes[dev->devno] = ATA_DEV_SEMB_UNSUP;
 	}
 
 	/* record current link speed */
@@ -2713,34 +2713,48 @@  int ata_eh_reset(struct ata_link *link, int classify,
 	ap->pflags &= ~ATA_PFLAG_EH_PENDING;
 	spin_unlock_irqrestore(link->ap->lock, flags);
 
-	/* Make sure onlineness and classification result correspond.
+	/*
+	 * Make sure onlineness and classification result correspond.
 	 * Hotplug could have happened during reset and some
 	 * controllers fail to wait while a drive is spinning up after
 	 * being hotplugged causing misdetection.  By cross checking
-	 * link onlineness and classification result, those conditions
-	 * can be reliably detected and retried.
+	 * link on/offlineness and classification result, those
+	 * conditions can be reliably detected and retried.
 	 */
 	nr_unknown = 0;
 	ata_for_each_dev(dev, link, ALL) {
-		/* convert all ATA_DEV_UNKNOWN to ATA_DEV_NONE */
-		if (classes[dev->devno] == ATA_DEV_UNKNOWN) {
-			classes[dev->devno] = ATA_DEV_NONE;
-			if (ata_phys_link_online(ata_dev_phys_link(dev)))
+		if (ata_phys_link_online(ata_dev_phys_link(dev))) {
+			if (classes[dev->devno] == ATA_DEV_UNKNOWN) {
+				ata_dev_printk(dev, KERN_DEBUG, "link online "
+					       "but device misclassified\n");
+				classes[dev->devno] = ATA_DEV_NONE;
 				nr_unknown++;
+			}
+		} else if (ata_phys_link_offline(ata_dev_phys_link(dev))) {
+			if (ata_class_enabled(classes[dev->devno]))
+				ata_dev_printk(dev, KERN_DEBUG, "link offline, "
+					       "clearing class %d to NONE\n",
+					       classes[dev->devno]);
+			classes[dev->devno] = ATA_DEV_NONE;
+		} else if (classes[dev->devno] == ATA_DEV_UNKNOWN) {
+			ata_dev_printk(dev, KERN_DEBUG, "link status unknown, "
+				       "clearing UNKNOWN to NONE\n");
+			classes[dev->devno] = ATA_DEV_NONE;
 		}
 	}
 
 	if (classify && nr_unknown) {
 		if (try < max_tries) {
 			ata_link_printk(link, KERN_WARNING, "link online but "
-				       "device misclassified, retrying\n");
+					"%d devices misclassified, retrying\n",
+					nr_unknown);
 			failed_link = link;
 			rc = -EAGAIN;
 			goto fail;
 		}
 		ata_link_printk(link, KERN_WARNING,
-			       "link online but device misclassified, "
-			       "device detection might fail\n");
+				"link online but %d devices misclassified, "
+				"device detection might fail\n", nr_unknown);
 	}
 
 	/* reset successful, schedule revalidation */