Patchwork [v2.6.38-rc7] Revert "libata: ahci_start_engine compliant to AHCI spec"

Submitter Tejun Heo
Date May 14, 2011, 10:28 a.m.
Message ID <20110514102804.GB23665@htj.dyndns.org>
Permalink /patch/95550/
State Not Applicable
Delegated to: David Miller

Comments

Tejun Heo - May 14, 2011, 10:28 a.m.
This reverts commit 270dac35c26433d06a89150c51e75ca0181ca7e4.

The commit causes command timeouts on AC plug/unplug.  It isn't yet
clear why.  As the commit was for a single rather obscure controller,
revert the change for now.

The problem was reported and bisected by Gu Rui in bug#34692.

 https://bugzilla.kernel.org/show_bug.cgi?id=34692

Also, reported by Rafael and Michael in the following thread.

 http://thread.gmane.org/gmane.linux.kernel/1138771

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Gu Rui <chaos.proton@gmail.com>
Reported-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-by: Michael Leun <lkml20100708@newton.leun.net>
Cc: Jian Peng <jipeng2005@gmail.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
---
As we're already in -rc7, I'm sending the revert patch to both Jeff
and Linus.

Thank you.

 drivers/ata/libahci.c |   21 ---------------------
 1 files changed, 0 insertions(+), 21 deletions(-)
Linus Torvalds - May 14, 2011, 5:47 p.m.
On Sat, May 14, 2011 at 3:28 AM, Tejun Heo <tj@kernel.org> wrote:
>
> As we're already in -rc7, I'm sending the revert patch to both Jeff
> and Linus.

Applied,

                   Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik - May 14, 2011, 6:49 p.m.
On 05/14/2011 01:47 PM, Linus Torvalds wrote:
> On Sat, May 14, 2011 at 3:28 AM, Tejun Heo<tj@kernel.org>  wrote:
>>
>> As we're already in -rc7, I'm sending the revert patch to both Jeff
>> and Linus.
>
> Applied,

ACK, thanks


Rafael J. Wysocki - May 15, 2011, 9:25 a.m.
On Sunday, May 15, 2011, Jian Peng wrote:
> Hi, Michael/Rafael/Gu,
> 
> Could you help me understand the testing environment? I'd like to reproduce it
> if possible and debug it. The host controller this patch deals with
> is not used on PCs, so my testing environment is quite different.

Well, the system I can reproduce it on is a production one, so I can't
really test too much on it.  What exactly would you like to do?

Rafael
lkml20100708@newton.leun.net - May 15, 2011, 12:20 p.m.
On Sun, 15 May 2011 11:25:19 +0200
"Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> On Sunday, May 15, 2011, Jian Peng wrote:
> > Hi, Michael/Rafael/Gu,
> > 
> > Could you help me understand the testing environment? I'd like to
> > reproduce it if possible and debug it. The host controller
> > this patch deals with is not used on PCs, so my testing environment
> > is quite different.

We have seen it on at least 3 different controllers from at least 2
different vendors (I do not know what hardware Gu has).

So, for now, it looks like it happens on more controllers than not.

Testing environment - uhm, what shall I say? Each of the two affected
notebooks runs 64bit openSuSE 11.4 in a more or less default
configuration (apart from the kernel, of course).

Just do a "ls -lR /" and watch kernel log while plugging/unplugging
power.

Maybe running a 64bit system/kernel is part of the picture? As far as I can
see all affected machines run 64bit.

> Well, the system I can reproduce it on is a production one, so I can't
> really test too much on it.  What exactly would you like to do?

I've at least one machine available where I can test. So, if there is
anything (preferably non destructive ;-) ) you want me to test this
should be possible.

Maybe I'll have a look if my netbook (running 32bit) is also affected.
Valdis.Kletnieks@vt.edu - May 16, 2011, 5:02 p.m.
On Sat, 14 May 2011 12:28:04 +0200, Tejun Heo said:
> This reverts commit 270dac35c26433d06a89150c51e75ca0181ca7e4.
> 
> The commit causes command timeouts on AC plug/unplug.  It isn't yet
> clear why.  As the commit was for a single rather obscure controller,
> revert the change for now.
> 
> The problem was reported and bisected by Gu Rui in bug#34692.
> 
>  https://bugzilla.kernel.org/show_bug.cgi?id=34692
> 
> Also, reported by Rafael and Michael in the following thread.
> 
>  http://thread.gmane.org/gmane.linux.kernel/1138771

This also fixes the issue I had with a 10-second pause on my Dell Latitude E6500
laptop at boot while detecting the CD/DVD drive.
Rafael J. Wysocki - May 18, 2011, 7:44 p.m.
On Wednesday, May 18, 2011, Jian Peng wrote:
> Hi, Valdis/Rafael/Michael,
> 
> Could you help me test the following change?
> 
> After reverting 81ca7e4, add a 5ms delay as follows, since that also seems to
> fix the issue on my SATA host controller, which requires 81ca7e4.
> 
> In drivers/ata/libahci.c, inside ahci_hardreset() function,
> 
> 
> 	ahci_start_engine(ap);
>
> 	msleep(5);
>
> 	if (online)
> 		*class = ahci_dev_classify(ap);
> 
> Since my host controller requires time to switch internal state to be ready.
> Please let me know your testing result.

Could you simply post a patch?

Rafael
Valdis.Kletnieks@vt.edu - May 20, 2011, 3:40 p.m.
On Wed, 18 May 2011 17:14:56 PDT, Jian Peng said:

> > @@ -1353,6 +1332,8 @@
> >   ahci_start_engine(ap);
> >
> > + msleep(5);
> > +
> >   if (online)
> >    *class = ahci_dev_classify(ap);

It may very well be that adding a magic msleep(5) here just Makes It Work, but
I have a gut feeling that it's in the wrong place (for starters, 'online' can't change
during the msleep() unless somebody *else* sets it - in which case the locking
is screwed up as we're not forcing a re-read of the value).  The msleep() probably
needs to be before something else further down in the code (but I have no idea
exactly what).
Tejun Heo - May 20, 2011, 3:43 p.m.
On Fri, May 20, 2011 at 11:40:20AM -0400, Valdis.Kletnieks@vt.edu wrote:
> On Wed, 18 May 2011 17:14:56 PDT, Jian Peng said:
> 
> > > @@ -1353,6 +1332,8 @@
> > >   ahci_start_engine(ap);
> > >
> > > + msleep(5);
> > > +
> > >   if (online)
> > >    *class = ahci_dev_classify(ap);
> 
> It may very well be that adding a magic msleep(5) here just Makes It Work, but
> I have a gut feeling that it's in the wrong place (for starters, 'online' can't change
> during the msleep() unless somebody *else* sets it - in which case the locking
> is screwed up as we're not forcing a re-read of the value).  The msleep() probably
> needs to be before something else further down in the code (but I have no idea
> exactly what).

At this point, I think it would be better to simply add a flag and
enable the check for the affected controller.

Thanks.
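
A minimal sketch of what gating the check behind a per-controller flag
could look like, written as a userspace model rather than kernel code.
struct fake_port, QUIRK_CHECK_BEFORE_START and fake_start_engine() are
hypothetical names invented for illustration; only the ATA_BUSY, ATA_DRQ
and PORT_CMD_START bit values and the DET == 3h test come from the patch
under discussion.

```c
#include <stdbool.h>
#include <stdint.h>

#define ATA_BUSY        0x80u /* BSY, bit 7 of the ATA status byte */
#define ATA_DRQ         0x08u /* DRQ, bit 3 of the ATA status byte */
#define PORT_CMD_START  0x01u /* PxCMD.ST, bit 0 */

/* Hypothetical quirk flag: only controllers that set it get the check. */
#define QUIRK_CHECK_BEFORE_START (1u << 0)

/* Userspace stand-in for a port; fields mirror PxTFD, PxSSTS and PxCMD. */
struct fake_port {
	uint32_t quirks;
	uint32_t tfdata;
	uint32_t sstatus;
	uint32_t cmd;
};

/* Returns true if START was set, false if the gated check refused. */
static bool fake_start_engine(struct fake_port *p)
{
	if (p->quirks & QUIRK_CHECK_BEFORE_START) {
		uint8_t status = p->tfdata & 0xff;

		if (status & (ATA_BUSY | ATA_DRQ))
			return false;
		if ((p->sstatus & 0xf) != 0x3)
			return false;
	}

	/* Controllers without the quirk keep the pre-270dac35 behaviour. */
	p->cmd |= PORT_CMD_START;
	return true;
}
```

With this shape, controllers that do not opt in behave exactly as they
did before the reverted commit, which is the point of limiting the
check to the affected hardware.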
Jian Peng - May 20, 2011, 6:21 p.m.
Sorry, I needed to fix my Gmail settings to make this show up on LKML.

Hi, Tejun/Valdis,

Since this is an interoperability issue of the SATA host controller, the
first step I want to try is to make sure the tweak that MAKES my
controller WORK does not break other controllers.
You are both right that adding this magic 5ms delay at this place
should not be the final solution.

If this magic 5ms delay works on other affected systems, I plan to
post a new patch that, inside ahci_start_engine(), still performs the same
check and shows a warning message if it fails, but sets a flag and
still sets the START bit; if the failure flag is set, it adds a 5ms delay.

Valdis, could you apply the following patch and retest it?

Tejun, please review it.

--- a/drivers/ata/libahci.c 2011-05-18 14:23:36.564665643 -0700
+++ c/drivers/ata/libahci.c 2011-05-20 09:48:06.194663506 -0700
@@ -540,6 +540,7 @@
  void __iomem *port_mmio = ahci_port_base(ap);
  u32 tmp;
  u8 status;
+ int err = 0;

  status = readl(port_mmio + PORT_TFDATA) & 0xFF;

@@ -553,12 +554,12 @@
   * specific controller will fail under this condition
   */
  if (status & (ATA_BUSY | ATA_DRQ))
-  return;
+  err = 1;
  else {
   ahci_scr_read(&ap->link, SCR_STATUS, &tmp);

   if ((tmp & 0xf) != 0x3)
-   return;
+   err = 1;
  }

  /* start DMA */
@@ -566,6 +567,13 @@
  tmp |= PORT_CMD_START;
  writel(tmp, port_mmio + PORT_CMD);
  readl(port_mmio + PORT_CMD); /* flush */
+
+ /* Some controllers need longer time to be ready */
+ if(err) {
+  printk(KERN_WARNING
+   "Controller in wrong state when setting START bit\n");
+  msleep(5);
+ }
 }
 EXPORT_SYMBOL_GPL(ahci_start_engine);

Valdis.Kletnieks@vt.edu - May 20, 2011, 6:25 p.m.
On Fri, 20 May 2011 10:02:56 PDT, Jian Peng said:

> You are both right that adding this magic 5ms delay at this place should not
> be the final solution.
> 
> If this magic 5ms delay works on other affected systems, I plan to post a
> new patch that inside ahci_start_engine(), still perform same check, and
> show warning message if failed, but will set a flag, then still set START
> bit, and if there is such failure flag, add 5ms delay.
> 
> Valdis, could you apply the following patch and retest it?

I should be able to do that this weekend.  To clarify - should this be with the
problem commit 270dac35c26433d06a89150c51e75ca0181ca7e4 applied, or reverted?
Jian Peng - May 22, 2011, 2 a.m.
Hi, Valdis,

This patch goes on top of commit 81ca7e4, the one that has since been
reverted, so you should not revert 81ca7e4 before applying this new one.

Best regards,
Jian

Tejun Heo - May 23, 2011, 12:13 p.m.
Hello,

On Fri, May 20, 2011 at 10:02:56AM -0700, Jian Peng wrote:
> Hi, Tejun/Valdis,
> 
> Since this is an interoperability issue of SATA host controller, the first
> step I want to try is to make sure the tweak that MAKE my controller WORK
> does not break other controllers.
> You are both right that adding this magic 5ms delay at this place should not
> be the final solution.
> 
> If this magic 5ms delay works on other affected systems, I plan to post a
> new patch that inside ahci_start_engine(), still perform same check, and
> show warning message if failed, but will set a flag, then still set START
> bit, and if there is such failure flag, add 5ms delay.

Yeah, sounds like a plan but please add ample comments explaining
what's going on for which controller including link to the mailing
list threads.  As we're basically adding black magic, I would still
like to enable it only for the affected controllers so that we don't
have to worry whether there are controllers which are affected by this
problem but we don't know about.

> --- a/drivers/ata/libahci.c 2011-05-18 14:23:36.564665643 -0700
> +++ c/drivers/ata/libahci.c 2011-05-20 09:48:06.194663506 -0700
> @@ -540,6 +540,7 @@
>   void __iomem *port_mmio = ahci_port_base(ap);
>   u32 tmp;
>   u8 status;
> + int err = 0;

I think

 bool stat_failed = false;

would be more in line with recent coding style.

>   status = readl(port_mmio + PORT_TFDATA) & 0xFF;
> 
> @@ -553,12 +554,12 @@
>    * specific controller will fail under this condition
>    */
>   if (status & (ATA_BUSY | ATA_DRQ))
> -  return;
> +  err = 1;
>   else {
>    ahci_scr_read(&ap->link, SCR_STATUS, &tmp);
> 
>    if ((tmp & 0xf) != 0x3)
> -   return;
> +   err = 1;
>   }
> 
>   /* start DMA */
> @@ -566,6 +567,13 @@
>   tmp |= PORT_CMD_START;
>   writel(tmp, port_mmio + PORT_CMD);
>   readl(port_mmio + PORT_CMD); /* flush */
> +
> + /* Some controllers need longer time to be ready */
> + if(err) {
> +  printk(KERN_WARNING
> +   "Controller in wrong state when setting START bit\n");
> +  msleep(5);

 ata_port_printk()?

Thanks.
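
For reference, a rough userspace sketch of the control flow in the
proposed patch with the review comments above folded in: a bool named
stat_failed instead of int err, START always set, and the delay applied
only when the readiness check failed. struct fake_port and
fake_start_engine_delayed() are hypothetical, the slept_ms counter
stands in for msleep(5), and fprintf() stands in for the kernel's
port-level printk helper; none of this is the actual follow-up patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define ATA_BUSY        0x80u /* BSY, bit 7 of the ATA status byte */
#define ATA_DRQ         0x08u /* DRQ, bit 3 of the ATA status byte */
#define PORT_CMD_START  0x01u /* PxCMD.ST, bit 0 */

/* Userspace stand-in for a port; fields mirror PxTFD, PxSSTS and PxCMD. */
struct fake_port {
	uint32_t tfdata;
	uint32_t sstatus;
	uint32_t cmd;
	unsigned int slept_ms; /* models time spent in msleep() */
};

/* Returns true if the port was not ready when START was set. */
static bool fake_start_engine_delayed(struct fake_port *p)
{
	bool stat_failed = false;
	uint8_t status = p->tfdata & 0xff;

	if (status & (ATA_BUSY | ATA_DRQ))
		stat_failed = true;
	else if ((p->sstatus & 0xf) != 0x3)
		stat_failed = true;

	/* Unlike the reverted commit, START is set either way. */
	p->cmd |= PORT_CMD_START;

	if (stat_failed) {
		fprintf(stderr, "port not ready when setting START bit\n");
		p->slept_ms += 5; /* the patch calls msleep(5) here */
	}
	return stat_failed;
}
```

The difference from the reverted commit is that a failed check no
longer skips the START write; it only triggers the warning and the
extra wait.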
Jian Peng - May 24, 2011, 1:04 a.m.
Hi, Tejun,

Definitely. I will clarify it as much as possible and meanwhile, wait
for it to be tested by reporters.

Thanks,
Jian


Patch

diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index ff9d832..d38c40f 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -561,27 +561,6 @@  void ahci_start_engine(struct ata_port *ap)
 {
 	void __iomem *port_mmio = ahci_port_base(ap);
 	u32 tmp;
-	u8 status;
-
-	status = readl(port_mmio + PORT_TFDATA) & 0xFF;
-
-	/*
-	 * At end of section 10.1 of AHCI spec (rev 1.3), it states
-	 * Software shall not set PxCMD.ST to 1 until it is determined
-	 * that a functoinal device is present on the port as determined by
-	 * PxTFD.STS.BSY=0, PxTFD.STS.DRQ=0 and PxSSTS.DET=3h
-	 *
-	 * Even though most AHCI host controllers work without this check,
-	 * specific controller will fail under this condition
-	 */
-	if (status & (ATA_BUSY | ATA_DRQ))
-		return;
-	else {
-		ahci_scr_read(&ap->link, SCR_STATUS, &tmp);
-
-		if ((tmp & 0xf) != 0x3)
-			return;
-	}
 
 	/* start DMA */
 	tmp = readl(port_mmio + PORT_CMD);
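
The condition that the removed lines tested can be written as a single
predicate. This is only an illustrative sketch: port_ready_for_start()
is a hypothetical helper, not a libahci function, and it takes the raw
PxTFD and PxSSTS values as plain integers so that it can be compiled
outside the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

#define ATA_BUSY          0x80u /* PxTFD.STS.BSY, bit 7 */
#define ATA_DRQ           0x08u /* PxTFD.STS.DRQ, bit 3 */
#define SSTS_DET_MASK     0x0fu /* PxSSTS.DET is bits 3:0 */
#define SSTS_DET_PRESENT  0x03u /* device present, PHY communication up */

/*
 * Hypothetical helper: true when BSY=0, DRQ=0 and DET=3h, i.e. the
 * precondition quoted from section 10.1 of the AHCI spec in the
 * removed comment.
 */
static bool port_ready_for_start(uint32_t tfdata, uint32_t sstatus)
{
	uint8_t status = tfdata & 0xff; /* STS is the low byte of PxTFD */

	if (status & (ATA_BUSY | ATA_DRQ))
		return false;

	return (sstatus & SSTS_DET_MASK) == SSTS_DET_PRESENT;
}
```

The reverted code returned early from ahci_start_engine() whenever this
predicate was false, so PxCMD.ST was never set in that case.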