Patchwork [Oneiric,ARM] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

Submitter Ming Lei
Date Sept. 2, 2011, 1:24 p.m.
Message ID <1314969863-4914-1-git-send-email-ming.lei@canonical.com>
Permalink /patch/113137/
State New

Comments

Ming Lei - Sept. 2, 2011, 1:24 p.m.
From: Ming Lei <ming.lei@canonical.com>

This patch introduces a helper, ehci_sync_mem(), which flushes
qtd/qh descriptors to memory immediately on some ARM platforms,
so that the HC can see the up-to-date qtd/qh descriptors ASAP.
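
The ordering that the helper enforces can be sketched in user space as
follows. This is only an analogue under stated assumptions, not the
driver code: the demo_* names are hypothetical, C11 fences stand in for
the kernel's wmb() and mb(), and a C11 fence does not by itself drain
an ARM L2 write buffer the way mb() does when
CONFIG_ARM_DMA_MEM_BUFFERABLE is set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a qtd descriptor. */
struct demo_qtd {
	uint32_t hw_next;           /* link to the next descriptor */
	_Atomic uint32_t hw_token;  /* the field the consumer polls */
};

/* Assumption: a full fence plays the role of ehci_sync_mem(). */
static inline void demo_sync_mem(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/*
 * Fill in the descriptor, order the earlier stores before the token
 * store (the role of wmb() in qh_append_tds), then issue a full
 * barrier right after the token store instead of leaving it pending.
 */
static inline void demo_publish_token(struct demo_qtd *dummy,
				      uint32_t next, uint32_t token)
{
	assert(dummy != NULL);

	dummy->hw_next = next;
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&dummy->hw_token, token,
			      memory_order_relaxed);
	demo_sync_mem();
}
```

In the patch itself the equivalent call is placed right after the
hw_token store in qh_append_tds and after the hw_next update in
qh_link_async.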

This patch fixes a performance bug on ARM Cortex-A9 dual-core
platforms, which has been reported on quite a few ARM machines
(OMAP4, Tegra 2, Snowball...); see
https://bugs.launchpad.net/bugs/709245 for details.

The patch has been tested on an OMAP4 Panda A1 board, where the
throughput of 'dd' over USB mass storage increases from
4~5MB/sec to 14~16MB/sec with the patch applied.

SRU Justification:

Impact:
        - Without the patch, 'dd' over USB mass storage runs at about
          4~5MB/sec.

Fix:
        - With the patch applied, 'dd' over USB mass storage runs at
          about 14~16MB/sec.

BugLink: http://bugs.launchpad.net/bugs/709245

Upstream discussion:
	https://patchwork.kernel.org/patch/1113332/

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
The patch has been agreed to (Signed-off-by) by the upstream EHCI
maintainer (Alan Stern), but it has not entered the upstream
kernel yet. The current upstream discussion is focused on whether
a new DMA API should be introduced to flush data into DMA coherent
memory. I think the patch will land in 3.2 instead of 3.1 if a
new DMA API needs to be introduced, so I am posting it now so that
it can fix this beta 1 bug in Oneiric.
---
 drivers/usb/host/ehci-q.c |   18 ++++++++++++++++++
 drivers/usb/host/ehci.h   |   17 +++++++++++++++++
 2 files changed, 35 insertions(+), 0 deletions(-)
Paolo Pisati - Sept. 2, 2011, 1:47 p.m.
On 09/02/2011 03:24 PM, ming.lei@canonical.com wrote:
> From: Ming Lei <ming.lei@canonical.com>
> 
> This patch introduces the helper of ehci_sync_mem to flush
> qtd/qh into memory immediately on some ARM, so that HC can
> see the up-to-date qtd/qh descriptor asap.

Thanks, I'll incorporate this in the next oneiric/ti-omap4 update.

BTW, shouldn't we apply this for all the previous releases too?
Ming Lei - Sept. 2, 2011, 1:53 p.m.
Hi,

On Fri, Sep 2, 2011 at 9:47 PM, Paolo Pisati <paolo.pisati@canonical.com> wrote:
> BTW, shouldn't we apply this for all the previous releases too?

Yes, of course, we should apply this for natty, ...


thanks,
--
Ming Lei

Patch

diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c
index 0917e3a..2719879 100644
--- a/drivers/usb/host/ehci-q.c
+++ b/drivers/usb/host/ehci-q.c
@@ -995,6 +995,12 @@  static void qh_link_async (struct ehci_hcd *ehci, struct ehci_qh *qh)
 	head->qh_next.qh = qh;
 	head->hw->hw_next = dma;
 
+	/*
+	 * flush qh descriptor into memory immediately,
+	 * see comments in qh_append_tds.
+	 */
+	ehci_sync_mem();
+
 	qh_get(qh);
 	qh->xacterrs = 0;
 	qh->qh_state = QH_STATE_LINKED;
@@ -1082,6 +1088,18 @@  static struct ehci_qh *qh_append_tds (
 			wmb ();
 			dummy->hw_token = token;
 
+			/*
+			 * Writing to dma coherent buffer on ARM may
+			 * be delayed to reach memory, so HC may not see
+			 * hw_token of dummy qtd in time, which can cause
+			 * the qtd transaction to be executed very late,
+			 * and degrade performance a lot. ehci_sync_mem
+			 * is added to flush 'token' immediately into
+			 * memory, so that ehci can execute the transaction
+			 * ASAP.
+			 */
+			ehci_sync_mem();
+
 			urb->hcpriv = qh_get (qh);
 		}
 	}
diff --git a/drivers/usb/host/ehci.h b/drivers/usb/host/ehci.h
index cc7d337..313d9d6 100644
--- a/drivers/usb/host/ehci.h
+++ b/drivers/usb/host/ehci.h
@@ -738,6 +738,23 @@  static inline u32 hc32_to_cpup (const struct ehci_hcd *ehci, const __hc32 *x)
 
 #endif
 
+/*
+ * Writes to DMA coherent memory on ARM may be delayed by the L2
+ * write buffer, so introduce a helper that flushes the L2 write
+ * buffer into memory immediately; it is used in particular to flush
+ * ehci descriptors to memory.
+ */
+#ifdef	CONFIG_ARM_DMA_MEM_BUFFERABLE
+static inline void ehci_sync_mem(void)
+{
+	mb();
+}
+#else
+static inline void ehci_sync_mem(void)
+{
+}
+#endif
+
 /*-------------------------------------------------------------------------*/
 
 #ifndef DEBUG