Message ID | 4C34A1D5.1090202@kernel.org |
---|---|
State | Not Applicable |
Delegated to: | David Miller |
Headers | show |
On Wed, Jul 7, 2010 at 8:48 AM, Tejun Heo <tj@kernel.org> wrote: > Hello, > > On 07/07/2010 05:34 PM, Mark Knecht wrote: >> OK - I don't know if this was you intention but since adding this >> patch I've not had a single drive missing failure. I've cold booted >> about 8 times and warm booted at least 20 times. Every one has come up >> fine. I've even gone so far as to turn off the UPS and sit for 5 >> minutes before cold booting. Still nothing fails right now. >> >> I've had this sort of statistical thing happen before where it hasn't >> failed for days, maybe even weeks, but then it starts failing and >> fails every time for awhile. Over the past few days working with you >> I've never had to reboot more than twice to get you a file. Now I've >> tried 30 times this morning and I've come up with nothing. >> >> I will continue to watch the machine and send you the failing dmesg >> file whenever I finally get it. For now I can only attach the passing >> file showing the patch is now included. > > It seems like SIDPR is a bit more unreliable than the current code can > handle and the added delay and read could have affected the result. > Eh... weird. Can you please apply the attached patch instead? The > only difference is it will print out two SControl values instead of > one. ie. "XXX SControl after resume = AAA BBB, tries=T". Can you > please try to boot multiple times and see if AAA and BBB differ > anytime? If that happens, please attach the boot log. Also, if you > see one with T > 1, please attach that one too. > > Thanks. > > -- > tejun > Certainly. Is there a way to reverse the previous patch? c2stable linux # patch -p1 --dry-run <~mark/Downloads/resume-dbg-1.patch patching file drivers/ata/libata-core.c Hunk #1 succeeded at 3798 (offset 86 lines). Hunk #2 succeeded at 3833 with fuzz 2 (offset 94 lines). Hunk #3 FAILED at 6109. 1 out of 3 hunks FAILED -- saving rejects to file drivers/ata/libata-core.c.rej c2stable linux # I assume this is failing because your patch is over the plain kernel, not the one I've patched? Thanks, Mark -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/07/2010 06:15 PM, Mark Knecht wrote: > Certainly. Is there a way to reverse the previous patch? > > c2stable linux # patch -p1 --dry-run <~mark/Downloads/resume-dbg-1.patch > patching file drivers/ata/libata-core.c > Hunk #1 succeeded at 3798 (offset 86 lines). > Hunk #2 succeeded at 3833 with fuzz 2 (offset 94 lines). > Hunk #3 FAILED at 6109. > 1 out of 3 hunks FAILED -- saving rejects to file drivers/ata/libata-core.c.rej > c2stable linux # > > I assume this is failing because your patch is over the plain kernel, > not the one I've patched? $ patch -R -p1 < ~mark/Downloads/resume-dbg.patch $ patch -p1 < ~mark/Downloads/resume-dbg-1.patch
On Wed, Jul 7, 2010 at 9:19 AM, Tejun Heo <tj@kernel.org> wrote: > On 07/07/2010 06:15 PM, Mark Knecht wrote: >> Certainly. Is there a way to reverse the previous patch? >> >> c2stable linux # patch -p1 --dry-run <~mark/Downloads/resume-dbg-1.patch >> patching file drivers/ata/libata-core.c >> Hunk #1 succeeded at 3798 (offset 86 lines). >> Hunk #2 succeeded at 3833 with fuzz 2 (offset 94 lines). >> Hunk #3 FAILED at 6109. >> 1 out of 3 hunks FAILED -- saving rejects to file drivers/ata/libata-core.c.rej >> c2stable linux # >> >> I assume this is failing because your patch is over the plain kernel, >> not the one I've patched? > > $ patch -R -p1 < ~mark/Downloads/resume-dbg.patch > $ patch -p1 < ~mark/Downloads/resume-dbg-1.patch > > -- > tejun > Thanks. Building the new kernel now. I'll start trying to save the data you're looking for. Cheers, Mark -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jul 7, 2010 at 8:48 AM, Tejun Heo <tj@kernel.org> wrote: > Hello, > > On 07/07/2010 05:34 PM, Mark Knecht wrote: >> OK - I don't know if this was you intention but since adding this >> patch I've not had a single drive missing failure. I've cold booted >> about 8 times and warm booted at least 20 times. Every one has come up >> fine. I've even gone so far as to turn off the UPS and sit for 5 >> minutes before cold booting. Still nothing fails right now. >> >> I've had this sort of statistical thing happen before where it hasn't >> failed for days, maybe even weeks, but then it starts failing and >> fails every time for awhile. Over the past few days working with you >> I've never had to reboot more than twice to get you a file. Now I've >> tried 30 times this morning and I've come up with nothing. >> >> I will continue to watch the machine and send you the failing dmesg >> file whenever I finally get it. For now I can only attach the passing >> file showing the patch is now included. > > It seems like SIDPR is a bit more unreliable than the current code can > handle and the added delay and read could have affected the result. > Eh... weird. Can you please apply the attached patch instead? The > only difference is it will print out two SControl values instead of > one. ie. "XXX SControl after resume = AAA BBB, tries=T". Can you > please try to boot multiple times and see if AAA and BBB differ > anytime? If that happens, please attach the boot log. Also, if you > see one with T > 1, please attach that one too. > > Thanks. > > -- > tejun > 4 warm reboots. All 4 said 300 300. However the 4th one only showed an extra attempt at running the patch code with and also showed Tries = 2. I'm attaching boot #1 and boot #4 for now. I've saved them all if you need or just want them. Please note that in all 4 cases all drives were found. Nothing is missing in any test yet after adding these either of these patches. I've not tried cold boots yet. That's next. Cheers, Mark
Hello, On 07/07/2010 07:06 PM, Mark Knecht wrote: > 4 warm reboots. All 4 said 300 300. However the 4th one only showed an > extra attempt at running the patch code with and also showed Tries = > 2. I'm attaching boot #1 and boot #4 for now. I've saved them all if > you need or just want them. > > Please note that in all 4 cases all drives were found. Nothing is > missing in any test yet after adding these either of these patches. > > I've not tried cold boots yet. That's next. Hmm... just in case you're being lucky, please keep an eye on it over several days and report the result. I think all that's necessary is slight modification to the resume logic but let's watch a bit first. Thanks.
On Wed, Jul 7, 2010 at 10:26 AM, Tejun Heo <tj@kernel.org> wrote: > Hello, > > On 07/07/2010 07:06 PM, Mark Knecht wrote: >> 4 warm reboots. All 4 said 300 300. However the 4th one only showed an >> extra attempt at running the patch code with and also showed Tries = >> 2. I'm attaching boot #1 and boot #4 for now. I've saved them all if >> you need or just want them. >> >> Please note that in all 4 cases all drives were found. Nothing is >> missing in any test yet after adding these either of these patches. >> >> I've not tried cold boots yet. That's next. > > Hmm... just in case you're being lucky, please keep an eye on it over > several days and report the result. I think all that's necessary is > slight modification to the resume logic but let's watch a bit first. > > Thanks. > > -- > tejun > I've tried two cold boots so far. One of them had that same extra Tries = 2 at the same place. I'll do a couple more just to see what happens. I'm happy to watch it as long as it takes and will certainly save any results if and when a drive isn't found. This problem has run hot and cold for a few months. Sometimes I go a week with every boot is good. This last two weeks was finally bad enough to get me to report it. I also want to investigate actually getting the BIOS AHCI setting working, but I think I'll leave that alone for afew days so as to not upset this experiment. Performance on the machine isn't critical right now, assuming AHCI helps. I'll get back to you when I've got something new to report. Cheers, Mark -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jul 7, 2010 at 10:26 AM, Tejun Heo <tj@kernel.org> wrote: > Hello, > > On 07/07/2010 07:06 PM, Mark Knecht wrote: >> 4 warm reboots. All 4 said 300 300. However the 4th one only showed an >> extra attempt at running the patch code with and also showed Tries = >> 2. I'm attaching boot #1 and boot #4 for now. I've saved them all if >> you need or just want them. >> >> Please note that in all 4 cases all drives were found. Nothing is >> missing in any test yet after adding these either of these patches. >> >> I've not tried cold boots yet. That's next. > > Hmm... just in case you're being lucky, please keep an eye on it over > several days and report the result. I think all that's necessary is > slight modification to the resume logic but let's watch a bit first. > > Thanks. > > -- > tejun > Tejun, With about 10-12 day of testing, 1-2 boots/day, I've not had a single boot failure since adding the patch. Only twice has it said tries=2. Every other time it's tries=1. The machine seems to work fine either way. Thanks, Mark -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index 2984e45..ce87bfe 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -3712,7 +3712,7 @@ int sata_link_resume(struct ata_link *link, const unsigned long *params, unsigned long deadline) { int tries = ATA_LINK_RESUME_TRIES; - u32 scontrol, serror; + u32 scontrol, scontrol1, serror; int rc; if ((rc = sata_scr_read(link, SCR_CONTROL, &scontrol))) @@ -3739,6 +3739,14 @@ int sata_link_resume(struct ata_link *link, const unsigned long *params, return rc; } while ((scontrol & 0xf0f) != 0x300 && --tries); + /* check once more */ + msleep(100); + if ((rc = sata_scr_read(link, SCR_CONTROL, &scontrol1))) + return rc; + ata_link_printk(link, KERN_ERR, + "XXX SControl after resume = %X %X, tries=%d\n", + scontrol, scontrol1, ATA_LINK_RESUME_TRIES - tries + 1); + if ((scontrol & 0xf0f) != 0x300) { ata_link_printk(link, KERN_ERR, "failed to resume link (SControl %X)\n", @@ -6007,7 +6015,7 @@ static void async_port_probe(void *data, async_cookie_t cookie) ehi->probe_mask |= ATA_ALL_DEVICES; ehi->action |= ATA_EH_RESET | ATA_EH_LPM; - ehi->flags |= ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET; + ehi->flags |= ATA_EHI_NO_AUTOPSY/* | ATA_EHI_QUIET*/; ap->pflags &= ~ATA_PFLAG_INITIALIZING; ap->pflags |= ATA_PFLAG_LOADING;