Patchwork ahci_start_engine compliance with AHCI spec

login
register
mail settings
Submitter Brian Norris
Date July 21, 2011, 5:13 p.m.
Message ID <CAN8TOE_dCg6Wi_trE97hXjBPnZi4jpLyoeUEVrbMpgDH-BwKWA@mail.gmail.com>
Download mbox | patch
Permalink /patch/106110/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

Brian Norris - July 21, 2011, 5:13 p.m.
On Thu, Jul 21, 2011 at 1:49 AM, Tejun Heo <tj@kernel.org> wrote:
> On Mon, Jul 18, 2011 at 11:40:17AM -0700, Brian Norris wrote:
>> On Wed, Jul 13, 2011 at 6:14 AM, Tejun Heo <tj@kernel.org> wrote:
>> > Hmmm... what happens if you don't comment out ahci_start_engine() call
>> > from ahci_start_port()?
>>
>> I wasn't commenting out the ahci_start_engine() from
>> ahci_start_port(). Can you clarify what you mean?
>
> Oh, I meant "what if you comment out..."  I wrote that sentence in
> negative and then switched but forgot removing "don't".

OK, well I tried simply commenting out that ahci_start_engine() on
both my special controller and on the Dell E6410 laptop and it worked
just fine (solved my issues and didn't cause any issues on the Dell).
Is this safe? It seems like we end up calling ahci_start_engine() at
the end of the error handling process anyway, so maybe this call is
not really necessary in the first place?

Anyway, I also tried my own fix for this: adding a small delay to wait
for some link recognition at the end of ahci_power_up(). I'm not sure
if this is the greatest, but it also works for both systems I'm
testing. I included the test patch here (based on linux-2.6). BTW, I'm
not sure my mail will be formatted perfectly here. I can resend with
my other mailer if needed.

>> > Is this the same IP block that Jian Peng was using?
>>
>> Yes, it is. I'm taking over some of his work.
>
> Is there any way to detect that particular IP block.  It's the only
> one with this problem so maybe we just should treat it specially.

I'm not sure how easy it would be to detect this particular version.
Besides, that is not our first choice solution, as our special-case
fix would not be very well tested for logic bugs.

Brian

---
 drivers/ata/libahci.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo - July 22, 2011, 9:03 a.m.
Hello, Brian.

On Thu, Jul 21, 2011 at 10:13:16AM -0700, Brian Norris wrote:
> On Thu, Jul 21, 2011 at 1:49 AM, Tejun Heo <tj@kernel.org> wrote:
> > On Mon, Jul 18, 2011 at 11:40:17AM -0700, Brian Norris wrote:
> >> On Wed, Jul 13, 2011 at 6:14 AM, Tejun Heo <tj@kernel.org> wrote:
> >> > Hmmm... what happens if you don't comment out ahci_start_engine() call
> >> > from ahci_start_port()?
> >>
> >> I wasn't commenting out the ahci_start_engine() from
> >> ahci_start_port(). Can you clarify what you mean?
> >
> > Oh, I meant "what if you comment out..."  I wrote that sentence in
> > negative and then switched but forgot removing "don't".
> 
> OK, well I tried simply commenting out that ahci_start_engine() on
> both my special controller and on the Dell E6410 laptop and it worked
> just fine (solved my issues and didn't cause any issues on the Dell).
> Is this safe? It seems like we end up calling ahci_start_engine() at
> the end of the error handling process anyway, so maybe this call is
> not really necessary in the first place?

Yes, I believe so.

> Anyway, I also tried my own fix for this: adding a small delay to wait
> for some link recognition at the end of ahci_power_up(). I'm not sure
> if this is the greatest, but it also works for both systems I'm
> testing. I included the test patch here (based on linux-2.6). BTW, I'm
> not sure my mail will be formatted perfectly here. I can resend with
> my other mailer if needed.

The problem is that both my and your approach aren't ultimately safe
on this particular IP block.  I don't think it's possible make things
completely safe for it.  There's no mutual exclusion against PHY
events - be it flaky signal, power surge or actual hotplug - and
driver operation.  No matter how careful the driver behaves, if PHY
events happen after the last check before starting DMA engine, DRQ may
be set by the time driver gets to it.

The IP block you're dealing with is inherently buggy.  What the spec
means, I think, is the DMA engine might not start or behave properly
if enabled while DRQ is set, which is fine.  Driver will notice that,
reset stuff and retry.  It is *completely* different from "the
controller becomes brick until power cycled if that happens".  So, we
can work around all we want but that is one buggy controller.  If
possible, please tell the manufacturer or licensor to fix it.

For now, let's first try removing ahci_start_engine() call from
port_start and see how that goes.

Thanks.
Brian Norris - Aug. 3, 2011, 12:06 a.m.
Hi Tejun,

I wanted to follow up a bit on your last comments.

On Fri, Jul 22, 2011 at 2:03 AM, Tejun Heo <tj@kernel.org> wrote:
> The problem is that both my and your approach aren't ultimately safe
> on this particular IP block.  I don't think it's possible make things
> completely safe for it.  There's no mutual exclusion against PHY
> events - be it flaky signal, power surge or actual hotplug - and
> driver operation.  No matter how careful the driver behaves, if PHY
> events happen after the last check before starting DMA engine, DRQ may
> be set by the time driver gets to it.

Can DRQ be set from 0->1 without a software-initiated action? I didn't
think it was directly tied to PHY events, and so we can fairly well
predict that it will remain 0.

On the other hand, PxSSTS.DET can be affected by PHY, but I don't
believe DET != 3 directly triggers this hardware bug.

> The IP block you're dealing with is inherently buggy.  What the spec
> means, I think, is the DMA engine might not start or behave properly
> if enabled while DRQ is set, which is fine.  Driver will notice that,
> reset stuff and retry.  It is *completely* different from "the
> controller becomes brick until power cycled if that happens".  So, we
> can work around all we want but that is one buggy controller.  If
> possible, please tell the manufacturer or licensor to fix it.

Yes, I believe the hardware designers know how buggy this is...but
it's still worth some effort to fix the software as well as possible
for current hardware behavior.

> For now, let's first try removing ahci_start_engine() call from
> port_start and see how that goes.

Thanks for the help, and happy testing!

Brian
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo - Aug. 4, 2011, 9:44 a.m.
Hello,

On Tue, Aug 02, 2011 at 05:06:13PM -0700, Brian Norris wrote:
> Can DRQ be set from 0->1 without a software-initiated action? I didn't
> think it was directly tied to PHY events, and so we can fairly well
> predict that it will remain 0.

Sure it can.  It's updated by Register D2H FISes sent by the device.
It can change after a PHY event; otherwise, I don't think it would get
set during init without any command in flight, right?

> Yes, I believe the hardware designers know how buggy this is...but
> it's still worth some effort to fix the software as well as possible
> for current hardware behavior.

Yeap, sure, let's get it working for majority of cases.  I just wanted
to point out that the hardware eventually needs to be fixed.

Thanks.

Patch

diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index 41223c7..0c0c444 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -657,6 +657,19 @@  static void ahci_power_up(struct ata_port *ap)

        /* wake up link */
        writel(cmd | PORT_CMD_ICC_ACTIVE, port_mmio + PORT_CMD);
+
+       /*
+        * Wait for PxSSTS.DET == 1h or 3h, step 5 of 10.1.1 in
+        * AHCI 1.3 spec. See 10.10.2 for spin-up procedure.
+        */
+       if (ata_wait_register(ap, port_mmio + ahci_scr_offset(ap, SCR_STATUS),
+                               0x01, 0x00, 1, 10)) {
+               /* also wait for DRQ to be cleared */
+               ata_wait_register(ap, port_mmio + PORT_TFDATA, ATA_DRQ,
+                               ATA_DRQ, 1, 10);
+
+               /* proceed with step 6, 7 of 10.1.1 in AHCI 1.3 spec? */
+       }
 }

 static int ahci_set_lpm(struct ata_link *link, enum ata_lpm_policy policy,