diff mbox series

usb: ehci-hcd: Add IAA handshake for removing async QH

Message ID 1615174511-14399-1-git-send-email-ye.li@nxp.com
State Superseded
Delegated to: Marek Vasut
Headers show
Series usb: ehci-hcd: Add IAA handshake for removing async QH | expand

Commit Message

Ye Li March 8, 2021, 3:35 a.m. UTC
According to EHCI spec, software needs to do handshake with HC for
safely removing QH from async list. This handshake is implemented by
setting IAAD (Interrupt on Async Advance Doorbell) bit in USB_USBCMD
register and poll the IAA (Interrupt on Async Advance bit) in the
USB_USBSTS to ensure the HC has released all on-chip state that may
potentially reference one of the data structures just removed.

Current codes only check active status of the last QTD, but this can't
ensure the QH is released from HC. We can meet unrecoverable
"EHCI timed out on TD" errors when running SCT tests on USB disk. The
USB_ASYNCLISTADDR register is changed to a invalid address when the
issue happens. It is fixed after adding the IAA handshake.

Signed-off-by: Ye Li <ye.li@nxp.com>
---
 drivers/usb/host/ehci-hcd.c | 25 +++++++++++++++++++++++++
 drivers/usb/host/ehci.h     |  1 +
 2 files changed, 26 insertions(+)

Comments

Marek Vasut March 8, 2021, 8:50 a.m. UTC | #1
On 3/8/21 4:35 AM, Ye Li wrote:
[...]
> +static int ehci_iaa_cycle(struct ehci_ctrl *ctrl)
> +{
> +	u32 cmd, status;
> +	int ret;
> +
> +	/* Enable Interrupt on Async Advance Doorbell. */
> +	cmd = ehci_readl(&ctrl->hcor->or_usbcmd);
> +	cmd |= CMD_IAAD;
> +	ehci_writel(&ctrl->hcor->or_usbcmd, cmd);
> +
> +	ret = handshake((uint32_t *)&ctrl->hcor->or_usbsts, STS_IAA, STS_IAA,

Is the (uint32_t *) cast really needed ?

> +			10 * 1000); /* 10ms timeout */
> +	if (ret < 0)
> +		printf("EHCI fail timeout STS_IAA set\n");

Shouldn't there be some abort ^ if ret < 0 ?
Also, add the return value into the printf, it is useful for debugging.

> +	status = ehci_readl(&ctrl->hcor->or_usbsts);
> +	if ((status & STS_IAA))

Drop the double parenthesis here, one set of () is enough.

> +		ehci_writel(&ctrl->hcor->or_usbsts, STS_IAA);
> +
> +	return ret;
> +}

Is this a bugfix for this release or is this for next release ?
We're in rc3 already, so I would suggest to be careful.
Ye Li March 9, 2021, 3:18 a.m. UTC | #2
Hi Marek,

On Mon, 2021-03-08 at 09:50 +0100, Marek Vasut wrote:
> Caution: EXT Email
> 
> On 3/8/21 4:35 AM, Ye Li wrote:
> [...]
> > 
> > +static int ehci_iaa_cycle(struct ehci_ctrl *ctrl)
> > +{
> > +     u32 cmd, status;
> > +     int ret;
> > +
> > +     /* Enable Interrupt on Async Advance Doorbell. */
> > +     cmd = ehci_readl(&ctrl->hcor->or_usbcmd);
> > +     cmd |= CMD_IAAD;
> > +     ehci_writel(&ctrl->hcor->or_usbcmd, cmd);
> > +
> > +     ret = handshake((uint32_t *)&ctrl->hcor->or_usbsts, STS_IAA,
> > STS_IAA,
> Is the (uint32_t *) cast really needed ?
> 
I will remove it in v2.

> > 
> > +                     10 * 1000); /* 10ms timeout */
> > +     if (ret < 0)
> > +             printf("EHCI fail timeout STS_IAA set\n");
> Shouldn't there be some abort ^ if ret < 0 ?
> Also, add the return value into the printf, it is useful for
> debugging.
I don't think need to abort anything when ret < 0. The ret value is
very simple in handshake, when ret < 0, it must be -1. 
Actually the codes refer the ehci_enable_async for the usage of
handshake.


> 
> > 
> > +     status = ehci_readl(&ctrl->hcor->or_usbsts);
> > +     if ((status & STS_IAA))
> Drop the double parenthesis here, one set of () is enough.
> 
I will remove it v2.

> > 
> > +             ehci_writel(&ctrl->hcor->or_usbsts, STS_IAA);
> > +
> > +     return ret;
> > +}
> Is this a bugfix for this release or is this for next release ?
> We're in rc3 already, so I would suggest to be careful.
It is ok to add the fix for next release. From our tests, the issue is
not very easy to reproduce by just using usb read/write.  In 
SCT, generally we need to run about 1 hour to see the issue. 

Best regards,
Ye Li
Marek Vasut March 9, 2021, 8:08 a.m. UTC | #3
On 3/9/21 4:18 AM, Ye Li wrote:
> Hi Marek,

Hi,

[...]

>>>
>>> +                     10 * 1000); /* 10ms timeout */
>>> +     if (ret < 0)
>>> +             printf("EHCI fail timeout STS_IAA set\n");
>> Shouldn't there be some abort ^ if ret < 0 ?
>> Also, add the return value into the printf, it is useful for
>> debugging.
> I don't think need to abort anything when ret < 0. The ret value is
> very simple in handshake, when ret < 0, it must be -1.
> Actually the codes refer the ehci_enable_async for the usage of
> handshake.

If the handshake timed out, that indicates a problem and that problem 
shouldn't be ignored, but rather propagated, no ? Maybe the entire ehci 
transfer should be aborted ?

[...]

>>>
>>> +             ehci_writel(&ctrl->hcor->or_usbsts, STS_IAA);
>>> +
>>> +     return ret;
>>> +}
>> Is this a bugfix for this release or is this for next release ?
>> We're in rc3 already, so I would suggest to be careful.
> It is ok to add the fix for next release. From our tests, the issue is
> not very easy to reproduce by just using usb read/write.  In
> SCT, generally we need to run about 1 hour to see the issue.

Can you add details on how to reproduce the issue into the commit 
message ? That would be highly helpful.
Ye Li March 9, 2021, 9:32 a.m. UTC | #4
Hi Marek,

On Tue, 2021-03-09 at 09:08 +0100, Marek Vasut wrote:
> Caution: EXT Email
> 
> On 3/9/21 4:18 AM, Ye Li wrote:
> > 
> > Hi Marek,
> Hi,
> 
> [...]
> 
> > 
> > > 
> > > > 
> > > > 
> > > > +                     10 * 1000); /* 10ms timeout */
> > > > +     if (ret < 0)
> > > > +             printf("EHCI fail timeout STS_IAA set\n");
> > > Shouldn't there be some abort ^ if ret < 0 ?
> > > Also, add the return value into the printf, it is useful for
> > > debugging.
> > I don't think need to abort anything when ret < 0. The ret value is
> > very simple in handshake, when ret < 0, it must be -1.
> > Actually the codes refer the ehci_enable_async for the usage of
> > handshake.
> If the handshake timed out, that indicates a problem and that problem
> shouldn't be ignored, but rather propagated, no ? Maybe the entire
> ehci
> transfer should be aborted ?
> 
I supposed the problem means QH does not advance, so next submit of
async transfer will have QTD timeout. But it is ok to abort transfer as
the error handling.


> [...]
> 
> > 
> > > 
> > > > 
> > > > 
> > > > +             ehci_writel(&ctrl->hcor->or_usbsts, STS_IAA);
> > > > +
> > > > +     return ret;
> > > > +}
> > > Is this a bugfix for this release or is this for next release ?
> > > We're in rc3 already, so I would suggest to be careful.
> > It is ok to add the fix for next release. From our tests, the issue
> > is
> > not very easy to reproduce by just using usb read/write.  In
> > SCT, generally we need to run about 1 hour to see the issue.
> Can you add details on how to reproduce the issue into the commit
> message ? That would be highly helpful.
ok. I will add more.

Best regards,
Ye Li
Marek Vasut March 9, 2021, 10 a.m. UTC | #5
On 3/9/21 10:32 AM, Ye Li wrote:
> Hi Marek,

Hi,

[...]

>>>>> +                     10 * 1000); /* 10ms timeout */
>>>>> +     if (ret < 0)
>>>>> +             printf("EHCI fail timeout STS_IAA set\n");
>>>> Shouldn't there be some abort ^ if ret < 0 ?
>>>> Also, add the return value into the printf, it is useful for
>>>> debugging.
>>> I don't think need to abort anything when ret < 0. The ret value is
>>> very simple in handshake, when ret < 0, it must be -1.
>>> Actually the codes refer the ehci_enable_async for the usage of
>>> handshake.
>> If the handshake timed out, that indicates a problem and that problem
>> shouldn't be ignored, but rather propagated, no ? Maybe the entire
>> ehci
>> transfer should be aborted ?
>>
> I supposed the problem means QH does not advance, so next submit of
> async transfer will have QTD timeout. But it is ok to abort transfer as
> the error handling.

If this is an error which would always cause subsequent error, abort 
here already.

>>>>> +             ehci_writel(&ctrl->hcor->or_usbsts, STS_IAA);
>>>>> +
>>>>> +     return ret;
>>>>> +}
>>>> Is this a bugfix for this release or is this for next release ?
>>>> We're in rc3 already, so I would suggest to be careful.
>>> It is ok to add the fix for next release. From our tests, the issue
>>> is
>>> not very easy to reproduce by just using usb read/write.  In
>>> SCT, generally we need to run about 1 hour to see the issue.
>> Can you add details on how to reproduce the issue into the commit
>> message ? That would be highly helpful.
> ok. I will add more.

Thank you
diff mbox series

Patch

diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c
index 8933f60..b52a903 100644
--- a/drivers/usb/host/ehci-hcd.c
+++ b/drivers/usb/host/ehci-hcd.c
@@ -346,6 +346,28 @@  static int ehci_disable_async(struct ehci_ctrl *ctrl)
 	return ret;
 }
 
+static int ehci_iaa_cycle(struct ehci_ctrl *ctrl)
+{
+	u32 cmd, status;
+	int ret;
+
+	/* Enable Interrupt on Async Advance Doorbell. */
+	cmd = ehci_readl(&ctrl->hcor->or_usbcmd);
+	cmd |= CMD_IAAD;
+	ehci_writel(&ctrl->hcor->or_usbcmd, cmd);
+
+	ret = handshake((uint32_t *)&ctrl->hcor->or_usbsts, STS_IAA, STS_IAA,
+			10 * 1000); /* 10ms timeout */
+	if (ret < 0)
+		printf("EHCI fail timeout STS_IAA set\n");
+
+	status = ehci_readl(&ctrl->hcor->or_usbsts);
+	if ((status & STS_IAA))
+		ehci_writel(&ctrl->hcor->or_usbsts, STS_IAA);
+
+	return ret;
+}
+
 static int
 ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer,
 		   int length, struct devrequest *req)
@@ -631,6 +653,9 @@  ehci_submit_async(struct usb_device *dev, unsigned long pipe, void *buffer,
 	flush_dcache_range((unsigned long)&ctrl->qh_list,
 		ALIGN_END_ADDR(struct QH, &ctrl->qh_list, 1));
 
+	/* Set IAAD, poll IAA */
+	ehci_iaa_cycle(ctrl);
+
 	/*
 	 * Invalidate the memory area occupied by buffer
 	 * Don't try to fix the buffer alignment, if it isn't properly
diff --git a/drivers/usb/host/ehci.h b/drivers/usb/host/ehci.h
index 8e07554..e9e6f2a5 100644
--- a/drivers/usb/host/ehci.h
+++ b/drivers/usb/host/ehci.h
@@ -44,6 +44,7 @@  struct ehci_hcor {
 #define STS_ASS		(1 << 15)
 #define	STS_PSS		(1 << 14)
 #define STS_HALT	(1 << 12)
+#define STS_IAA		(1 << 5)
 	uint32_t or_usbintr;
 #define INTR_UE         (1 << 0)                /* USB interrupt enable */
 #define INTR_UEE        (1 << 1)                /* USB error interrupt enable */