Message ID | 4C3DACFF.2060504@kernel.org |
---|---|
State | Not Applicable |
Delegated to: | David Miller |
I applied your patch to the 2.6.34 kernel, and on my very first reboot had
a drive missing. What information do you want? Just /var/log/messages?

I don't want to keep rebooting forever. Was this patch supposed to fix the
problem, or just give you information to debug? Is the problem even known?

Regards,
Paul

> Hello,
>
> On 07/12/2010 06:36 PM, Paul Check wrote:
>>> Tejun & Co:
>>>
>>> I have finally upgraded to the 2.6.34 kernel, and I am still having
>>> problems with some of my drives not coming up some of the time
>>> (different drives, at different times, never more than one).
>>>
>>> Here are some lines from /var/log/messages on the most recent boot
>>> below. Do you have any suggestions for this? I'm getting tired of
>>> having to reconstitute my raid 30-50% of the time, and will try
>>> anything to see if it fixes. Note the "link up (unknown)" and "link
>>> down" lines. I don't know what should appear, but I have 4 hard drives
>>> and one optical drive plugged into 5 of the 6 SATA ports on the board.
> ...
>>> Jul 12 12:25:22 min kernel: [ 2.078489] ata2.01: SATA link up <unknown> (SStatus 300 SControl 123)
>
> Hmm, yeah, it seems like SCR access via SIDPR is more flaky than
> covered by the previous commit. There's another thread where similar
> problem is being debugged. Can you please do the following? I'm
> attaching patch here too.
>
> http://thread.gmane.org/gmane.linux.kernel/1005983/focus=46749
>
> Thanks.
>
> --
> tejun

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello,

On 07/14/2010 07:58 PM, Paul Check wrote:
> I applied your patch to the 2.6.34 kernel, and on my very first reboot had
> a drive missing. What information do you want? Just /var/log/messages?

The kernel boot log should suffice, i.e. dmesg output after boot.

> I don't want to keep rebooting forever. Was this patch supposed to fix the
> problem, or just give you information to debug? Is the problem even known?

It should give more information, but it actually seems to make the other
reporter's problem go away. It looks like SIDPR SCR access can be flaky
depending on timing, but I'm still trying to determine how to work around it.

Thanks.
Ok, well we didn't have to wait long. Once again one of my drives failed
to appear. This time it was a different drive than the times before.

I have attached the output from dmesg. Can you tell me from this output
what the problem is?

Regards,
Paul
Hello,

On 07/15/2010 02:43 AM, Paul Check wrote:
> Ok, well we didn't have to wait long. Once again one of my drives failed
> to appear. This time it was a different drive than the times before.
>
> I have attached the output from dmesg. Can you tell me from this output
> what the problem is?

SStatus and SControl are behaving erratically. I'm not sure yet why that's
happening.

[ 2.562157] ata4.01: SATA link up <unknown> (SStatus 300 SControl 123)

Hmm... it looks like SStatus and SControl are swapped here. Maybe there's a
race condition in the SIDPR code. I'll look into it a bit more.

Thanks.
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 2984e45..ce87bfe 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -3712,7 +3712,7 @@ int sata_link_resume(struct ata_link *link, const unsigned long *params,
 		     unsigned long deadline)
 {
 	int tries = ATA_LINK_RESUME_TRIES;
-	u32 scontrol, serror;
+	u32 scontrol, scontrol1, serror;
 	int rc;
 
 	if ((rc = sata_scr_read(link, SCR_CONTROL, &scontrol)))
@@ -3739,6 +3739,14 @@ int sata_link_resume(struct ata_link *link, const unsigned long *params,
 			return rc;
 	} while ((scontrol & 0xf0f) != 0x300 && --tries);
 
+	/* check once more */
+	msleep(100);
+	if ((rc = sata_scr_read(link, SCR_CONTROL, &scontrol1)))
+		return rc;
+	ata_link_printk(link, KERN_ERR,
+			"XXX SControl after resume = %X %X, tries=%d\n",
+			scontrol, scontrol1, ATA_LINK_RESUME_TRIES - tries + 1);
+
 	if ((scontrol & 0xf0f) != 0x300) {
 		ata_link_printk(link, KERN_ERR,
 				"failed to resume link (SControl %X)\n",
@@ -6007,7 +6015,7 @@ static void async_port_probe(void *data, async_cookie_t cookie)
 
 	ehi->probe_mask |= ATA_ALL_DEVICES;
 	ehi->action |= ATA_EH_RESET | ATA_EH_LPM;
-	ehi->flags |= ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET;
+	ehi->flags |= ATA_EHI_NO_AUTOPSY/* | ATA_EHI_QUIET*/;
 
 	ap->pflags &= ~ATA_PFLAG_INITIALIZING;
 	ap->pflags |= ATA_PFLAG_LOADING;