Message ID | 20110514102804.GB23665@htj.dyndns.org |
---|---|
State | Not Applicable |
Delegated to: | David Miller |
On Sat, May 14, 2011 at 3:28 AM, Tejun Heo <tj@kernel.org> wrote:
>
> As we're already in -rc7, I'm sending the revert patch to both Jeff
> and Linus.

Applied,

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/14/2011 01:47 PM, Linus Torvalds wrote:
> On Sat, May 14, 2011 at 3:28 AM, Tejun Heo <tj@kernel.org> wrote:
>>
>> As we're already in -rc7, I'm sending the revert patch to both Jeff
>> and Linus.
>
> Applied,

ACK, thanks
On Sunday, May 15, 2011, Jian Peng wrote:
> Hi, Michael/Rafael/Gu,
>
> Could you help me understand the testing environment? I like to reproduce it
> if it is possible and debug it. The host controller this patch dealing with
> is not used on PC so my testing environment is quite different.

Well, the system I can reproduce it on is a production one, so I can't
really test too much on it. What exactly would you like to do?

Rafael
On Sun, 15 May 2011 11:25:19 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> On Sunday, May 15, 2011, Jian Peng wrote:
> > Hi, Michael/Rafael/Gu,
> >
> > Could you help me understand the testing environment? I like to
> > reproduce it if it is possible and debug it. The host controller
> > this patch dealing with is not used on PC so my testing environment
> > is quite different.

We have seen it on at least 3 different controllers from at least 2
different vendors (I do not know what hardware Gu has). So, for now, it
seems to happen on more controllers than not.

Testing environment - uhm, what shall I say? Each of the two affected
notebooks runs 64bit openSuSE 11.4 in a more or less default
configuration (apart from the kernel, of course). Just do a "ls -lR /"
and watch the kernel log while plugging/unplugging power.

Maybe running a 64bit system/kernel is part of the picture? As far as I
can see all affected machines run 64bit.

> Well, the system I can reproduce it on is a production one, so I can't
> really test too much on it. What exactly would you like to do?

I have at least one machine available where I can test. So, if there is
anything (preferably non destructive ;-) ) you want me to test, this
should be possible.

Maybe I'll have a look whether my netbook (running 32bit) is also affected.
On Sat, 14 May 2011 12:28:04 +0200, Tejun Heo said:
> This reverts commit 270dac35c26433d06a89150c51e75ca0181ca7e4.
>
> The commit causes command timeouts on AC plug/unplug. It isn't yet
> clear why. As the commit was for a single rather obscure controller,
> revert the change for now.
>
> The problem was reported and bisected by Gu Rui in bug#34692.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=34692
>
> Also, reported by Rafael and Michael in the following thread.
>
> http://thread.gmane.org/gmane.linux.kernel/1138771

This also fixes the issue I had with a 10-second pause on my Dell Latitude
E6500 laptop at boot while detecting the CD/DVD drive.
On Wednesday, May 18, 2011, Jian Peng wrote:
> Hi, Valdis/Rafael/Michael,
>
> Could you help me test the following change?
>
> After reverting 81ca7e4, add 5ms delay as follow since that seems also
> fixing the issue on my SATA host controller that requires 81ca7e4.
>
> In drivers/ata/libahci.c, inside ahci_hardreset() function,
>
>	ahci_start_engine(ap);
>
>	msleep(5);
>
>	if (online)
>		*class = ahci_dev_classify(ap);
>
> Since my host controller requires time to switch internal state to be ready.
> Please let me know your testing result.

Could you simply post a patch?

Rafael
On Wed, 18 May 2011 17:14:56 PDT, Jian Peng said:
>
> @@ -1353,6 +1332,8 @@
>
>	ahci_start_engine(ap);
>
> +	msleep(5);
> +
>	if (online)
>		*class = ahci_dev_classify(ap);

It may very well be that adding a magic msleep(5) here just Makes It Work, but
I have a gut feeling that it's in the wrong place (for starters, 'online' can't change
during the msleep() unless somebody *else* sets it - in which case the locking
is screwed up as we're not forcing a re-read of the value). The msleep() probably
needs to be before something else further down in the code (but I have no idea
exactly what).
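The point about 'online' can be shown in isolation: a value copied out of a register before a delay does not change during the delay, so code that wants the post-delay state has to read the register again. A minimal userspace sketch, assuming nothing about libata internals; `struct mock_link`, `link_online()` and `sample_around_delay()` are hypothetical names used only for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a port's PxSSTS register; not a libata type. */
struct mock_link {
	volatile uint32_t sstatus;
};

/* DET == 3h in the low nibble: device present and communication established. */
bool link_online(const struct mock_link *link)
{
	return (link->sstatus & 0xf) == 0x3;
}

/*
 * Take a snapshot, let the "hardware" change state (standing in for the
 * delay), then report both the stale snapshot and a fresh read.
 */
void sample_around_delay(struct mock_link *link, uint32_t new_sstatus,
			 bool *snapshot, bool *reread)
{
	*snapshot = link_online(link);	/* value captured before the delay */
	link->sstatus = new_sstatus;	/* state changes while we "sleep" */
	*reread = link_online(link);	/* only a re-read sees the change */
}
```

If the delay is meant to let the controller settle, whatever is evaluated after it has to come from a fresh register read, which is why the placement of the msleep() matters more than its length.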
On Fri, May 20, 2011 at 11:40:20AM -0400, Valdis.Kletnieks@vt.edu wrote:
> On Wed, 18 May 2011 17:14:56 PDT, Jian Peng said:
> >
> > @@ -1353,6 +1332,8 @@
> >
> >	ahci_start_engine(ap);
> >
> > +	msleep(5);
> > +
> >	if (online)
> >		*class = ahci_dev_classify(ap);
>
> It may very well be that adding a magic msleep(5) here just Makes It Work, but
> I have a gut feeling that it's in the wrong place (for starters, 'online' can't change
> during the msleep() unless somebody *else* sets it - in which case the locking
> is screwed up as we're not forcing a re-read of the value). The msleep() probably
> needs to be before something else further down in the code (but I have no idea
> exactly what).

At this point, I think it would be better to simply add a flag and
enable the check for the affected controller.

Thanks.
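A flag that turns the check on only for the affected controller could look roughly like the following. This is a userspace sketch under stated assumptions, not libahci code: `QUIRK_STRICT_START_CHECK` and `may_start_engine()` are hypothetical names, and only the bit tests (BSY, DRQ, DET == 3h) are taken from the check in the reverted commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-controller quirk bit; not an existing libahci flag. */
#define QUIRK_STRICT_START_CHECK (1u << 0)

/* Status bits in the low byte of PxTFD, as tested by the reverted check. */
#define ATA_BUSY 0x80u
#define ATA_DRQ  0x08u

/*
 * Decide whether the engine may be started.
 * Controllers without the quirk always start, matching the post-revert
 * behaviour. Flagged controllers start only when BSY=0, DRQ=0 and DET=3h.
 */
bool may_start_engine(uint32_t quirks, uint32_t tfdata, uint32_t sstatus)
{
	if (!(quirks & QUIRK_STRICT_START_CHECK))
		return true;
	if ((tfdata & 0xFF) & (ATA_BUSY | ATA_DRQ))
		return false;
	return (sstatus & 0xf) == 0x3;
}
```

With this shape, the systems that regressed never take the early-return path, and only a controller that explicitly sets the quirk gets the stricter behaviour.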
Sorry, I needed to fix my gmail settings to make this show up on LKML.

Hi, Tejun/Valdis,

Since this is an interoperability issue of SATA host controllers, the first
step I want to try is to make sure the tweak that MAKES my controller WORK
does not break other controllers.
You are both right that adding this magic 5ms delay at this place should not
be the final solution.

If this magic 5ms delay works on other affected systems, I plan to post a
new patch that, inside ahci_start_engine(), still performs the same check and
shows a warning message if it fails, but sets a flag, then still sets the START
bit, and if the failure flag is set, adds a 5ms delay.

Valdis, could you apply the following patch and retest it?
Tejun, please review it.

--- a/drivers/ata/libahci.c	2011-05-18 14:23:36.564665643 -0700
+++ c/drivers/ata/libahci.c	2011-05-20 09:48:06.194663506 -0700
@@ -540,6 +540,7 @@
 	void __iomem *port_mmio = ahci_port_base(ap);
 	u32 tmp;
 	u8 status;
+	int err = 0;
 
 	status = readl(port_mmio + PORT_TFDATA) & 0xFF;
 
@@ -553,12 +554,12 @@
 	 * specific controller will fail under this condition
 	 */
 	if (status & (ATA_BUSY | ATA_DRQ))
-		return;
+		err = 1;
 	else {
 		ahci_scr_read(&ap->link, SCR_STATUS, &tmp);
 
 		if ((tmp & 0xf) != 0x3)
-			return;
+			err = 1;
 	}
 
 	/* start DMA */
@@ -566,6 +567,13 @@
 	tmp |= PORT_CMD_START;
 	writel(tmp, port_mmio + PORT_CMD);
 	readl(port_mmio + PORT_CMD); /* flush */
+
+	/* Some controllers need longer time to be ready */
+	if(err) {
+		printk(KERN_WARNING
+			"Controller in wrong state when setting START bit\n");
+		msleep(5);
+	}
 }
 EXPORT_SYMBOL_GPL(ahci_start_engine);
On Fri, 20 May 2011 10:02:56 PDT, Jian Peng said:

> You are both right that adding this magic 5ms delay at this place should not
> be the final solution.
>
> If this magic 5ms delay works on other affected systems, I plan to post a
> new patch that inside ahci_start_engine(), still perform same check, and
> show warning message if failed, but will set a flag, then still set START
> bit, and if there is such failure flag, add 5ms delay.
>
> Valdis, could you apply the following patch and retest it?

I should be able to do that this weekend. To clarify - should this be with the
problem commit 270dac35c26433d06a89150c51e75ca0181ca7e4 applied, or reverted?
Hi, Valdis,

This patch goes on top of commit 81ca7e4 (the one that was reverted), so you
should not revert 81ca7e4 before applying this new one.

Best regards,
Jian
Hello,

On Fri, May 20, 2011 at 10:02:56AM -0700, Jian Peng wrote:
> Hi, Tejun/Valdis,
>
> Since this is an interoperability issue of SATA host controller, the first
> step I want to try it to make sure the tweak that MAKE my controller WORK
> does not break other controllers.
> You are both right that adding this magic 5ms delay at this place should not
> be the final solution.
>
> If this magic 5ms delay works on other affected systems, I plan to post a
> new patch that inside ahci_start_engine(), still perform same check, and
> show warning message if failed, but will set a flag, then still set START
> bit, and if there is such failure flag, add 5ms delay.

Yeah, sounds like a plan but please add ample comments explaining
what's going on for which controller, including links to the mailing
list threads. As we're basically adding black magic, I would still
like to enable it only for the affected controllers so that we don't
have to worry whether there are controllers which are affected by this
problem but we don't know about.

> --- a/drivers/ata/libahci.c	2011-05-18 14:23:36.564665643 -0700
> +++ c/drivers/ata/libahci.c	2011-05-20 09:48:06.194663506 -0700
> @@ -540,6 +540,7 @@
>  	void __iomem *port_mmio = ahci_port_base(ap);
>  	u32 tmp;
>  	u8 status;
> +	int err = 0;

I think

	bool stat_failed = false;

would be more in line with recent coding style.

>  	status = readl(port_mmio + PORT_TFDATA) & 0xFF;
>
> @@ -553,12 +554,12 @@
>  	 * specific controller will fail under this condition
>  	 */
>  	if (status & (ATA_BUSY | ATA_DRQ))
> -		return;
> +		err = 1;
>  	else {
>  		ahci_scr_read(&ap->link, SCR_STATUS, &tmp);
>
>  		if ((tmp & 0xf) != 0x3)
> -			return;
> +			err = 1;
>  	}
>
>  	/* start DMA */
> @@ -566,6 +567,13 @@
>  	tmp |= PORT_CMD_START;
>  	writel(tmp, port_mmio + PORT_CMD);
>  	readl(port_mmio + PORT_CMD); /* flush */
> +
> +	/* Some controllers need longer time to be ready */
> +	if(err) {
> +		printk(KERN_WARNING
> +			"Controller in wrong state when setting START bit\n");
> +		msleep(5);

ata_port_printk()?

Thanks.
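Taking the proposed flow together with the review comments (a `bool stat_failed` instead of `int err`, and a per-port message), the logic would be roughly as follows. This is a userspace sketch only: `struct mock_port` and `mock_start_engine()` are hypothetical stand-ins, the delay is recorded in a counter instead of calling `msleep(5)`, and `fprintf` stands in for the per-port kernel message. It leaves out the per-controller gating requested in the review.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define ATA_BUSY       0x80u		/* BSY bit in the status byte */
#define ATA_DRQ        0x08u		/* DRQ bit in the status byte */
#define PORT_CMD_START (1u << 0)	/* PxCMD.ST */

/* Hypothetical stand-in for one port's registers; not a kernel structure. */
struct mock_port {
	unsigned int id;
	uint32_t tfdata;		/* PxTFD */
	uint32_t sstatus;		/* PxSSTS */
	uint32_t cmd;			/* PxCMD */
	unsigned int delayed_ms;	/* records the delay instead of sleeping */
};

/* Returns true when the port was not in the state the spec asks for. */
bool mock_start_engine(struct mock_port *p)
{
	bool stat_failed = false;

	if ((p->tfdata & 0xFF) & (ATA_BUSY | ATA_DRQ))
		stat_failed = true;
	else if ((p->sstatus & 0xf) != 0x3)
		stat_failed = true;

	/* START is set in either case, unlike the reverted commit. */
	p->cmd |= PORT_CMD_START;

	if (stat_failed) {
		fprintf(stderr,
			"port %u: wrong state when setting START bit\n", p->id);
		p->delayed_ms += 5;	/* the proposed patch calls msleep(5) here */
	}
	return stat_failed;
}
```

The difference from the reverted commit is that a failed check no longer skips setting START; it only adds a warning and a delay afterwards.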
Hi, Tejun,

Definitely. I will clarify it as much as possible and, meanwhile, wait for it
to be tested by the reporters.

Thanks,
Jian
diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index ff9d832..d38c40f 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -561,27 +561,6 @@ void ahci_start_engine(struct ata_port *ap)
 {
 	void __iomem *port_mmio = ahci_port_base(ap);
 	u32 tmp;
-	u8 status;
-
-	status = readl(port_mmio + PORT_TFDATA) & 0xFF;
-
-	/*
-	 * At end of section 10.1 of AHCI spec (rev 1.3), it states
-	 * Software shall not set PxCMD.ST to 1 until it is determined
-	 * that a functoinal device is present on the port as determined by
-	 * PxTFD.STS.BSY=0, PxTFD.STS.DRQ=0 and PxSSTS.DET=3h
-	 *
-	 * Even though most AHCI host controllers work without this check,
-	 * specific controller will fail under this condition
-	 */
-	if (status & (ATA_BUSY | ATA_DRQ))
-		return;
-	else {
-		ahci_scr_read(&ap->link, SCR_STATUS, &tmp);
-
-		if ((tmp & 0xf) != 0x3)
-			return;
-	}
 
 	/* start DMA */
 	tmp = readl(port_mmio + PORT_CMD);
This reverts commit 270dac35c26433d06a89150c51e75ca0181ca7e4.

The commit causes command timeouts on AC plug/unplug. It isn't yet
clear why. As the commit was for a single rather obscure controller,
revert the change for now.

The problem was reported and bisected by Gu Rui in bug#34692.

 https://bugzilla.kernel.org/show_bug.cgi?id=34692

Also, reported by Rafael and Michael in the following thread.

 http://thread.gmane.org/gmane.linux.kernel/1138771

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Gu Rui <chaos.proton@gmail.com>
Reported-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-by: Michael Leun <lkml20100708@newton.leun.net>
Cc: Jian Peng <jipeng2005@gmail.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
---
As we're already in -rc7, I'm sending the revert patch to both Jeff
and Linus.

Thank you.

 drivers/ata/libahci.c | 21 ---------------------
 1 files changed, 0 insertions(+), 21 deletions(-)