Patchwork 2.6.31-rc5 regression: hd don't show up

login
register
mail settings
Submitter Tejun Heo
Date Sept. 16, 2009, noon
Message ID <4AB0D34A.2010502@kernel.org>
Download mbox | patch
Permalink /patch/33703/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

Tejun Heo - Sept. 16, 2009, noon
Tim Blechmann wrote:
>> Can you please try the attached patch and report the result?  The bug
>> shouldn't happen now and for cases where it would have happened,
>> libata will report "link online but device misclassified" and retry.
> 
> this patch cannot be applied onto stable-2.6.31/master ... e.g. the
> first chunk is already in there ....

Eh... the first chunk was for the current devel branch.  Here's a
version without that one.

Thanks.
Tejun Heo - Sept. 25, 2009, 4:20 a.m.
Tejun Heo wrote:
> Tim Blechmann wrote:
>>> Can you please try the attached patch and report the result?  The bug
>>> shouldn't happen now and for cases where it would have happened,
>>> libata will report "link online but device misclassified" and retry.
>> this patch cannot be applied onto stable-2.6.31/master ... e.g. the
>> first chunk is already in there ....
> 
> Eh... the first chunk was for the current devel branch.  Here's a
> version without that one.

Ping.
Tim Blechmann - Sept. 25, 2009, 7:46 a.m.
>>>> Can you please try the attached patch and report the result?  The bug
>>>> shouldn't happen now and for cases where it would have happened,
>>>> libata will report "link online but device misclassified" and retry.
>>> this patch cannot be applied onto stable-2.6.31/master ... e.g. the
>>> first chunk is already in there ....
>>
>> Eh... the first chunk was for the current devel branch.  Here's a
>> version without that one.
> 
> Ping.

i haven't run into any further issues with this patch ... so i somehow
guess, it solved the problem ...

tim
Tejun Heo - Sept. 25, 2009, 7:53 a.m.
Tim Blechmann wrote:
>>>>> Can you please try the attached patch and report the result?  The bug
>>>>> shouldn't happen now and for cases where it would have happened,
>>>>> libata will report "link online but device misclassified" and retry.
>>>> this patch cannot be applied onto stable-2.6.31/master ... e.g. the
>>>> first chunk is already in there ....
>>> Eh... the first chunk was for the current devel branch.  Here's a
>>> version without that one.
>> Ping.
> 
> i haven't run into any further issues with this patch ... so i somehow
> guess, it solved the problem ...

In your boot logs, do you ever see messages like "link online but
device misclassified, retrying"?
Tim Blechmann - Sept. 25, 2009, 11:47 a.m.
>>>>>> Can you please try the attached patch and report the result?  The bug
>>>>>> shouldn't happen now and for cases where it would have happened,
>>>>>> libata will report "link online but device misclassified" and retry.
>>>>> this patch cannot be applied onto stable-2.6.31/master ... e.g. the
>>>>> first chunk is already in there ....
>>>> Eh... the first chunk was for the current devel branch.  Here's a
>>>> version without that one.
>>> Ping.
>>
>> i haven't run into any further issues with this patch ... so i somehow
>> guess, it solved the problem ...
> 
> In your boot logs, do you ever see messages like "link online but
> device misclassified, retrying"?

i didn't really pay attention to it ... at my currently running system,
it doesn't show up ... but i will keep my eyes open ...

tim
Tejun Heo - Sept. 25, 2009, 1:21 p.m.
Tim Blechmann wrote:
>>>>>>> Can you please try the attached patch and report the result?  The bug
>>>>>>> shouldn't happen now and for cases where it would have happened,
>>>>>>> libata will report "link online but device misclassified" and retry.
>>>>>> this patch cannot be applied onto stable-2.6.31/master ... e.g. the
>>>>>> first chunk is already in there ....
>>>>> Eh... the first chunk was for the current devel branch.  Here's a
>>>>> version without that one.
>>>> Ping.
>>> i haven't run into any further issues with this patch ... so i somehow
>>> guess, it solved the problem ...
>> In your boot logs, do you ever see messages like "link online but
>> device misclassified, retrying"?
> 
> i didn't really pay attention to it ... at my currently running system,
> it doesn't show up ... but i will keep my eyes open ...

Can you please reboot several times and see whether that happens?  The
bug is quite obscure and I wanna be sure before pushing it upstream.

Thanks.
Tejun Heo - Oct. 2, 2009, 5:20 a.m.
Tejun Heo wrote:
> Tim Blechmann wrote:
>>>>>>>> Can you please try the attached patch and report the result?  The bug
>>>>>>>> shouldn't happen now and for cases where it would have happened,
>>>>>>>> libata will report "link online but device misclassified" and retry.
>>>>>>> this patch cannot be applied onto stable-2.6.31/master ... e.g. the
>>>>>>> first chunk is already in there ....
>>>>>> Eh... the first chunk was for the current devel branch.  Here's a
>>>>>> version without that one.
>>>>> Ping.
>>>> i haven't run into any further issues with this patch ... so i somehow
>>>> guess, it solved the problem ...
>>> In your boot logs, do you ever see messages like "link online but
>>> device misclassified, retrying"?
>> i didn't really pay attention to it ... at my currently running system,
>> it doesn't show up ... but i will keep my eyes open ...
> 
> Can you please reboot several times and see whether that happens?  The
> bug is quite obscure and I wanna be sure before pushing it upstream.

Ping?
Tim Blechmann - Oct. 2, 2009, 7:30 a.m.
On 10/02/2009 07:20 AM, Tejun Heo wrote:
> Tejun Heo wrote:
>> Tim Blechmann wrote:
>>>>>>>>> Can you please try the attached patch and report the result?  The bug
>>>>>>>>> shouldn't happen now and for cases where it would have happened,
>>>>>>>>> libata will report "link online but device misclassified" and retry.
>>>>>>>> this patch cannot be applied onto stable-2.6.31/master ... e.g. the
>>>>>>>> first chunk is already in there ....
>>>>>>> Eh... the first chunk was for the current devel branch.  Here's a
>>>>>>> version without that one.
>>>>>> Ping.
>>>>> i haven't run into any further issues with this patch ... so i somehow
>>>>> guess, it solved the problem ...
>>>> In your boot logs, do you ever see messages like "link online but
>>>> device misclassified, retrying"?
>>> i didn't really pay attention to it ... at my currently running system,
>>> it doesn't show up ... but i will keep my eyes open ...
>>
>> Can you please reboot several times and see whether that happens?  The
>> bug is quite obscure and I wanna be sure before pushing it upstream.
> 
> Ping?

since i have this kernel running, i didn't see any of these messages,
still, my hds always showed up ...
not sure, if this is related to your patch or if the buggy behavior,
just didn't occur ...

tim
Tejun Heo - Oct. 2, 2009, 8:59 a.m.
Hello,

Tim Blechmann wrote:
> since i have this kernel running, i didn't see any of these messages,
> still, my hds always showed up ...
> not sure, if this is related to your patch or if the buggy behavior,
> just didn't occur ...

Can you please reboot several times and see whether libata probing
messages show anything interesting?

Thanks.
Tim Blechmann - Oct. 5, 2009, 9:59 a.m.
>> since i have this kernel running, i didn't see any of these messages,
>> still, my hds always showed up ...
>> not sure, if this is related to your patch or if the buggy behavior,
>> just didn't occur ...
> 
> Can you please reboot several times and see whether libata probing
> messages show anything interesting?

unfortunately, i didn't see any libata related messages, which could be 
of interest ... 
the relevant part looks mostly like this:

[    1.575438] libata version 3.00 loaded.
[    2.000903] ata_piix 0000:00:1f.2: version 2.13
[    2.000916] ata_piix 0000:00:1f.2: PCI INT B -> GSI 20 (level, low) -> IRQ 20
[    2.000978] ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
[    2.001212] ata_piix 0000:00:1f.2: setting latency timer to 64
[    2.001255] scsi0 : ata_piix
[    2.001372] scsi1 : ata_piix
[    2.002491] ata1: SATA max UDMA/133 cmd 0xa000 ctl 0x9c00 bmdma 0x9480 irq 20
[    2.002555] ata2: SATA max UDMA/133 cmd 0x9880 ctl 0x9800 bmdma 0x9488 irq 20
[    2.002628] ata_piix 0000:00:1f.5: PCI INT B -> GSI 20 (level, low) -> IRQ 20
[    2.002691] ata_piix 0000:00:1f.5: MAP [ P0 -- P1 -- ]
[    2.002920] ata_piix 0000:00:1f.5: setting latency timer to 64
[    2.002951] scsi2 : ata_piix
[    2.003034] scsi3 : ata_piix
[    2.004126] ata3: SATA max UDMA/133 cmd 0xb000 ctl 0xac00 bmdma 0xa480 irq 20
[    2.004188] ata4: SATA max UDMA/133 cmd 0xa880 ctl 0xa800 bmdma 0xa488 irq 20
[    2.516133] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    2.519253] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    2.536470] ata3.00: ATA-7: SAMSUNG HD753LJ, 1AA01112, max UDMA7
[    2.536536] ata3.00: 1465149168 sectors, multi 16: LBA48 NCQ (depth 0/32)
[    2.556453] ata4.00: ATA-8: SAMSUNG HD501LJ, CR100-12, max UDMA7
[    2.556528] ata4.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 0/32)
[    2.556843] ata3.00: configured for UDMA/133
[    2.576492] ata4.00: configured for UDMA/133
[    2.856006] ata2.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    2.856081] ata2.01: SATA link down (SStatus 0 SControl 300)
[    2.856305] ata1.00: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    2.856373] ata1.01: SATA link down (SStatus 0 SControl 300)
[    2.856437] ata1.01: link offline, clearing class 3 to NONE
[    2.876344] ata1.00: ATAPI: TSSTcorp CDDVDW SH-S223B, SB01, max UDMA/100
[    2.876716] ata2.00: ATA-7: SAMSUNG HD103UJ, 1AA01118, max UDMA7
[    2.876782] ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 0/32)
[    2.896332] ata2.00: configured for UDMA/133
[    2.916132] ata1.00: configured for UDMA/100

not sure, what this means, though ...

tim
Tejun Heo - Oct. 6, 2009, 7:30 a.m.
Hello,

Tim Blechmann wrote:
>>> since i have this kernel running, i didn't see any of these messages,
>>> still, my hds always showed up ...
>>> not sure, if this is related to your patch or if the buggy behavior,
>>> just didn't occur ...
>> Can you please reboot several times and see whether libata probing
>> messages show anything interesting?
> 
> unfortunately, i didn't see any libata related messages, which could be 
> of interest ... 
> the relevant part looks mostly like this:
> 
> not sure, what this means, though ...

It means the original condition didn't trigger most likely due to
timing differences. :-( I'll push the patch upstream.

Thanks.

Patch

---
 drivers/ata/libata-eh.c |   50 ++++++++++++++++++++++++++++++------------------
 1 file changed, 32 insertions(+), 18 deletions(-)

Index: tree0/drivers/ata/libata-eh.c
===================================================================
--- tree0.orig/drivers/ata/libata-eh.c
+++ tree0/drivers/ata/libata-eh.c
@@ -2541,14 +2541,14 @@  int ata_eh_reset(struct ata_link *link,
 		dev->pio_mode = XFER_PIO_0;
 		dev->flags &= ~ATA_DFLAG_SLEEPING;
 
-		if (!ata_phys_link_offline(ata_dev_phys_link(dev))) {
-			/* apply class override */
-			if (lflags & ATA_LFLAG_ASSUME_ATA)
-				classes[dev->devno] = ATA_DEV_ATA;
-			else if (lflags & ATA_LFLAG_ASSUME_SEMB)
-				classes[dev->devno] = ATA_DEV_SEMB_UNSUP;
-		} else
-			classes[dev->devno] = ATA_DEV_NONE;
+		if (ata_phys_link_offline(ata_dev_phys_link(dev)))
+			continue;
+
+		/* apply class override */
+		if (lflags & ATA_LFLAG_ASSUME_ATA)
+			classes[dev->devno] = ATA_DEV_ATA;
+		else if (lflags & ATA_LFLAG_ASSUME_SEMB)
+			classes[dev->devno] = ATA_DEV_SEMB_UNSUP;
 	}
 
 	/* record current link speed */
@@ -2581,34 +2581,48 @@  int ata_eh_reset(struct ata_link *link,
 		slave->eh_info.serror = 0;
 	spin_unlock_irqrestore(link->ap->lock, flags);
 
-	/* Make sure onlineness and classification result correspond.
+	/*
+	 * Make sure onlineness and classification result correspond.
 	 * Hotplug could have happened during reset and some
 	 * controllers fail to wait while a drive is spinning up after
 	 * being hotplugged causing misdetection.  By cross checking
-	 * link onlineness and classification result, those conditions
-	 * can be reliably detected and retried.
+	 * link on/offlineness and classification result, those
+	 * conditions can be reliably detected and retried.
 	 */
 	nr_unknown = 0;
 	ata_for_each_dev(dev, link, ALL) {
-		/* convert all ATA_DEV_UNKNOWN to ATA_DEV_NONE */
-		if (classes[dev->devno] == ATA_DEV_UNKNOWN) {
-			classes[dev->devno] = ATA_DEV_NONE;
-			if (ata_phys_link_online(ata_dev_phys_link(dev)))
+		if (ata_phys_link_online(ata_dev_phys_link(dev))) {
+			if (classes[dev->devno] == ATA_DEV_UNKNOWN) {
+				ata_dev_printk(dev, KERN_DEBUG, "link online "
+					       "but device misclassifed\n");
+				classes[dev->devno] = ATA_DEV_NONE;
 				nr_unknown++;
+			}
+		} else if (ata_phys_link_offline(ata_dev_phys_link(dev))) {
+			if (ata_class_enabled(classes[dev->devno]))
+				ata_dev_printk(dev, KERN_DEBUG, "link offline, "
+					       "clearing class %d to NONE\n",
+					       classes[dev->devno]);
+			classes[dev->devno] = ATA_DEV_NONE;
+		} else if (classes[dev->devno] == ATA_DEV_UNKNOWN) {
+			ata_dev_printk(dev, KERN_DEBUG, "link status unknown, "
+				       "clearing UNKNOWN to NONE\n");
+			classes[dev->devno] = ATA_DEV_NONE;
 		}
 	}
 
 	if (classify && nr_unknown) {
 		if (try < max_tries) {
 			ata_link_printk(link, KERN_WARNING, "link online but "
-				       "device misclassified, retrying\n");
+					"%d devices misclassified, retrying\n",
+					nr_unknown);
 			failed_link = link;
 			rc = -EAGAIN;
 			goto fail;
 		}
 		ata_link_printk(link, KERN_WARNING,
-			       "link online but device misclassified, "
-			       "device detection might fail\n");
+				"link online but %d devices misclassified, "
+				"device detection might fail\n", nr_unknown);
 	}
 
 	/* reset successful, schedule revalidation */