Message ID | 4C44BD42.3030904@kernel.org |
---|---|
State | Not Applicable |
Delegated to: | David Miller |
Headers | show |
Hey Tejun: I guess this is the same patch that you sent me to fix my issue with missing drives. Good news: I've been through about 10 reboots now and no problems. Based on my prior experience, I'd say with the old setup, 10 clean boots in a row was probably less than a 1% event. So, it seems that this has fixed my problem. Thanks! Regards, Paul > Hello, > > On 07/19/2010 09:31 PM, Mark Knecht wrote: >> With about 10-12 day of testing, 1-2 boots/day, I've not had a >> single boot failure since adding the patch. Only twice has it said >> tries=2. Every other time it's tries=1. The machine seems to work fine >> either way. > > Hmmm... can you please test the attached patch instead? It seems > likely that the root cause is not flakiness of SIDPR but incorrect > locking in libata EH code. > > Thanks. > > -- > tejun > -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Helo, On 07/20/2010 05:14 AM, Paul Check wrote: > Hey Tejun: I guess this is the same patch that you sent me to fix my issue > with missing drives. Good news: I've been through about 10 reboots now > and no problems. Based on my prior experience, I'd say with the old > setup, 10 clean boots in a row was probably less than a 1% event. So, it > seems that this has fixed my problem. Yeap, it's the same one. I'm forwarding the patch upstream now but, Mark, please let me know the test result. Thanks.
On Tue, Jul 20, 2010 at 7:14 AM, Tejun Heo <tj@kernel.org> wrote: > Helo, > > On 07/20/2010 05:14 AM, Paul Check wrote: >> Hey Tejun: I guess this is the same patch that you sent me to fix my issue >> with missing drives. Good news: I've been through about 10 reboots now >> and no problems. Based on my prior experience, I'd say with the old >> setup, 10 clean boots in a row was probably less than a 1% event. So, it >> seems that this has fixed my problem. > > Yeap, it's the same one. I'm forwarding the patch upstream now but, > Mark, please let me know the test result. > > Thanks. > > -- > tejun > Tejun, I'm traveling but back tonight. I'll try it then. Thanks, Mark -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jul 20, 2010 at 7:53 AM, Mark Knecht <markknecht@gmail.com> wrote: > On Tue, Jul 20, 2010 at 7:14 AM, Tejun Heo <tj@kernel.org> wrote: >> Helo, >> >> On 07/20/2010 05:14 AM, Paul Check wrote: >>> Hey Tejun: I guess this is the same patch that you sent me to fix my issue >>> with missing drives. Good news: I've been through about 10 reboots now >>> and no problems. Based on my prior experience, I'd say with the old >>> setup, 10 clean boots in a row was probably less than a 1% event. So, it >>> seems that this has fixed my problem. >> >> Yeap, it's the same one. I'm forwarding the patch upstream now but, >> Mark, please let me know the test result. >> >> Thanks. >> >> -- >> tejun >> > > Tejun, > I'm traveling but back tonight. I'll try it then. > > Thanks, > Mark > OK, I was able to get into the machine remotely for a few minutes. I think the patch applied correctly and the machine cold booted cleanly. (My wife powered it up for me.) I'll do more boots later when you confirm everything looks reasonable. I don't see any print statement in the patch so I don't know what to look for. I'm attaching a dmesg file for you to review. Assuming I did this right then everything seems good so far. Cheers, Mark c2stable linux # patch --verbose -p1 <~mark/Downloads/ata_piix-sidpr-lock.patch Hmm... Looks like a unified diff to me... The text leading up to this was: -------------------------- |diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c |index 7409f98..3971bc0 100644 |--- a/drivers/ata/ata_piix.c |+++ b/drivers/ata/ata_piix.c -------------------------- Patching file drivers/ata/ata_piix.c using Plan A... Hunk #1 succeeded at 158. Hunk #2 succeeded at 952. Hunk #3 succeeded at 968. Hunk #4 succeeded at 1573. done c2stable linux #
On Mon, Jul 19, 2010 at 8:14 PM, Paul Check <paul@thechecks.ca> wrote: > Hey Tejun: I guess this is the same patch that you sent me to fix my issue > with missing drives. Good news: I've been through about 10 reboots now > and no problems. Based on my prior experience, I'd say with the old > setup, 10 clean boots in a row was probably less than a 1% event. So, it > seems that this has fixed my problem. > > Thanks! > > Regards, Paul > Hey Paul. Glad it worked for you as it did for me. Was your hardware in any way similar to mine? - Mark -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
I missed your hardware. I have an ASUS PT 6 Deluxe V2 board with 4 drives. I think they are 1TB each, Seagate SATA drives. P > On Mon, Jul 19, 2010 at 8:14 PM, Paul Check <paul@thechecks.ca> wrote: >> Hey Tejun: I guess this is the same patch that you sent me to fix my >> issue >> with missing drives. Â Good news: I've been through about 10 reboots now >> and no problems. Â Based on my prior experience, I'd say with the old >> setup, 10 clean boots in a row was probably less than a 1% event. Â So, >> it >> seems that this has fixed my problem. >> >> Thanks! >> >> Regards, Paul >> > > Hey Paul. Glad it worked for you as it did for me. > > Was your hardware in any way similar to mine? > > - Mark > -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jul 20, 2010 at 2:19 PM, Paul Check <paul@thechecks.ca> wrote: > I missed your hardware. I have an ASUS PT 6 Deluxe V2 board with 4 drives. > I think they are 1TB each, Seagate SATA drives. P > What processor? I've got the Core i7-980X. I've wondered if this problem showed up for me because possibly the 12 thread processor moves through some part of boot setup pretty quickly. Mine is an Asus Rampage II Extreme with five 500GB drives. Whatever the reason great thanks to Tejun for fixing it. I keep wanting to add an 'r' to his last name to get 'Hero'! Cheers, Mark -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Oh, yeah, my processor is the i7 975. That's 4 cores 8 threads, right? Yes, nice work Tejun, thanks. I'll have to run custom kernels until this gets fed into Debian. P > On Tue, Jul 20, 2010 at 2:19 PM, Paul Check <paul@thechecks.ca> wrote: >> I missed your hardware. I have an ASUS PT 6 Deluxe V2 board with 4 >> drives. >> I think they are 1TB each, Seagate SATA drives. Â P >> > > What processor? I've got the Core i7-980X. I've wondered if this > problem showed up for me because possibly the 12 thread processor > moves through some part of boot setup pretty quickly. > > Mine is an Asus Rampage II Extreme with five 500GB drives. > > Whatever the reason great thanks to Tejun for fixing it. I keep > wanting to add an 'r' to his last name to get 'Hero'! > > Cheers, > Mark > -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jul 20, 2010 at 7:14 AM, Tejun Heo <tj@kernel.org> wrote: > Helo, > > On 07/20/2010 05:14 AM, Paul Check wrote: >> Hey Tejun: I guess this is the same patch that you sent me to fix my issue >> with missing drives. Good news: I've been through about 10 reboots now >> and no problems. Based on my prior experience, I'd say with the old >> setup, 10 clean boots in a row was probably less than a 1% event. So, it >> seems that this has fixed my problem. > > Yeap, it's the same one. I'm forwarding the patch upstream now but, > Mark, please let me know the test result. > > Thanks. > > -- > tejun > Tejun, Looks like I had a failure today. First one in weeks and only the 3rd or 4th boot with this newer patch file. One of the two drives making a RAID0 wasn't found so /dev/md11 (constructed from /dev/sdd and /dev/sde) couldn't be started. I did a cold reboot and the drive was found. If it matters, and it probably doesn't, the failure came on a boot which had a scheduled fsck to do of /dev/md5 - my main / drive. I don't see how that would make a difference but I figure why leave the info out. That's why the times are so much larger in the dmesg file. (I think) dmesg attached. I patched the Gentoo kernel if it makes a difference, same as I did with the earlier patch. mark@c2stable ~ $ uname -a Linux c2stable 2.6.34-gentoo-r2 #1 SMP PREEMPT Sun Jul 18 14:09:48 PDT 2010 x86_64 Intel(R) Core(TM) i7 CPU X 980 @ 3.33GHz GenuineIntel GNU/Linux mark@c2stable ~ $ Sorry, Mark
That's unfortunate. FYI I have continued to be trouble free, but my processor is a bit weaker than Mark's, although I would find it surprising that this would cause a problem. Also, FYI Mark, I have 12GB of Corsair RAM, and have it bumped up to the Intel XMP profile. P > On Tue, Jul 20, 2010 at 7:14 AM, Tejun Heo <tj@kernel.org> wrote: >> Helo, >> >> On 07/20/2010 05:14 AM, Paul Check wrote: >>> Hey Tejun: I guess this is the same patch that you sent me to fix my >>> issue >>> with missing drives. Â Good news: I've been through about 10 reboots >>> now >>> and no problems. Â Based on my prior experience, I'd say with the old >>> setup, 10 clean boots in a row was probably less than a 1% event. Â So, >>> it >>> seems that this has fixed my problem. >> >> Yeap, it's the same one. Â I'm forwarding the patch upstream now but, >> Mark, please let me know the test result. >> >> Thanks. >> >> -- >> tejun >> > > Tejun, > Looks like I had a failure today. First one in weeks and only the > 3rd or 4th boot with this newer patch file. One of the two drives > making a RAID0 wasn't found so /dev/md11 (constructed from /dev/sdd > and /dev/sde) couldn't be started. I did a cold reboot and the drive > was found. > > If it matters, and it probably doesn't, the failure came on a boot > which had a scheduled fsck to do of /dev/md5 - my main / drive. I > don't see how that would make a difference but I figure why leave the > info out. That's why the times are so much larger in the dmesg file. > (I think) > > dmesg attached. I patched the Gentoo kernel if it makes a > difference, same as I did with the earlier patch. > > mark@c2stable ~ $ uname -a > Linux c2stable 2.6.34-gentoo-r2 #1 SMP PREEMPT Sun Jul 18 14:09:48 PDT > 2010 x86_64 Intel(R) Core(TM) i7 CPU X 980 @ 3.33GHz GenuineIntel > GNU/Linux > mark@c2stable ~ $ > > Sorry, > Mark > -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, On 07/21/2010 10:54 PM, Mark Knecht wrote: > Looks like I had a failure today. First one in weeks and only the > 3rd or 4th boot with this newer patch file. One of the two drives > making a RAID0 wasn't found so /dev/md11 (constructed from /dev/sdd > and /dev/sde) couldn't be started. I did a cold reboot and the drive > was found. > > If it matters, and it probably doesn't, the failure came on a boot > which had a scheduled fsck to do of /dev/md5 - my main / drive. I > don't see how that would make a difference but I figure why leave the > info out. That's why the times are so much larger in the dmesg file. > (I think) > > dmesg attached. I patched the Gentoo kernel if it makes a > difference, same as I did with the earlier patch. > > mark@c2stable ~ $ uname -a > Linux c2stable 2.6.34-gentoo-r2 #1 SMP PREEMPT Sun Jul 18 14:09:48 PDT > 2010 x86_64 Intel(R) Core(TM) i7 CPU X 980 @ 3.33GHz GenuineIntel > GNU/Linux > mark@c2stable ~ $ Hmmm... that's weird. Can you please make sure the patch is actually applied? Adding a printk("XXX patch applied!\n") near other changes usually is easy enough. Also, can you please apply resume-dbg-1.patch too and reproduce the failure and post log? Thanks.
diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c index 7409f98..3971bc0 100644 --- a/drivers/ata/ata_piix.c +++ b/drivers/ata/ata_piix.c @@ -158,6 +158,7 @@ struct piix_map_db { struct piix_host_priv { const int *map; u32 saved_iocfg; + spinlock_t sidpr_lock; /* FIXME: remove once locking in EH is fixed */ void __iomem *sidpr; }; @@ -951,12 +952,15 @@ static int piix_sidpr_scr_read(struct ata_link *link, unsigned int reg, u32 *val) { struct piix_host_priv *hpriv = link->ap->host->private_data; + unsigned long flags; if (reg >= ARRAY_SIZE(piix_sidx_map)) return -EINVAL; + spin_lock_irqsave(&hpriv->sidpr_lock, flags); piix_sidpr_sel(link, reg); *val = ioread32(hpriv->sidpr + PIIX_SIDPR_DATA); + spin_unlock_irqrestore(&hpriv->sidpr_lock, flags); return 0; } @@ -964,12 +968,15 @@ static int piix_sidpr_scr_write(struct ata_link *link, unsigned int reg, u32 val) { struct piix_host_priv *hpriv = link->ap->host->private_data; + unsigned long flags; if (reg >= ARRAY_SIZE(piix_sidx_map)) return -EINVAL; + spin_lock_irqsave(&hpriv->sidpr_lock, flags); piix_sidpr_sel(link, reg); iowrite32(val, hpriv->sidpr + PIIX_SIDPR_DATA); + spin_unlock_irqrestore(&hpriv->sidpr_lock, flags); return 0; } @@ -1566,6 +1573,7 @@ static int __devinit piix_init_one(struct pci_dev *pdev, hpriv = devm_kzalloc(dev, sizeof(*hpriv), GFP_KERNEL); if (!hpriv) return -ENOMEM; + spin_lock_init(&hpriv->sidpr_lock); /* Save IOCFG, this will be used for cable detection, quirk * detection and restoration on detach. This is necessary