diff mbox

[U-Boot,2/6] usb:udc:samsung: Remove redundant cache operation from Samsung UDC driver

Message ID 1391170589-14340-3-git-send-email-l.majewski@samsung.com
State Superseded
Headers show

Commit Message

Łukasz Majewski Jan. 31, 2014, 12:16 p.m. UTC
A set of cache operations (both invalidation and flush) were redundant
in the S3C HS OTG Samsung driver.

Test condition
- test HW + measurement: Trats - Exynos4210 rev.1
- test HW Trats2 - Exynos4412 rev.1
400 MiB compressed rootfs image download with `thor 0 mmc 0`

Measurements:

Base values (without improvement):
Transmission speed: 9.51 MiB/s

After the change:
Transmission speed: 10.15 MiB/s

Change-Id: I0d55da4de724b14e988fefdb37a012562f7da8a2
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Cc: Marek Vasut <marex@denx.de>
---
 drivers/usb/gadget/s3c_udc_otg_xfer_dma.c |   13 -------------
 1 file changed, 13 deletions(-)

Comments

Marek Vasut Feb. 1, 2014, 2:50 a.m. UTC | #1
On Friday, January 31, 2014 at 01:16:25 PM, Lukasz Majewski wrote:
> A set of cache operations (both invalidation and flush) were redundant
> in the S3C HS OTG Samsung driver.
> 
> Test condition
> - test HW + measurement: Trats - Exynos4210 rev.1
> - test HW Trats2 - Exynos4412 rev.1
> 400 MiB compressed rootfs image download with `thor 0 mmc 0`

The commit message is missing a proper explanation _WHY_ were they redundant. I 
do not understand why they were redundant ... and no, the test you performed 
does not justify removal of cache management calls.

Best regards,
Marek Vasut
Lukasz Majewski Feb. 1, 2014, 9:56 a.m. UTC | #2
On Sat, 1 Feb 2014 03:50:26 +0100
Marek Vasut <marex@denx.de> wrote:

> On Friday, January 31, 2014 at 01:16:25 PM, Lukasz Majewski wrote:
> > A set of cache operations (both invalidation and flush) were
> > redundant in the S3C HS OTG Samsung driver.
> > 
> > Test condition
> > - test HW + measurement: Trats - Exynos4210 rev.1
> > - test HW Trats2 - Exynos4412 rev.1
> > 400 MiB compressed rootfs image download with `thor 0 mmc 0`
> 
> The commit message is missing a proper explanation _WHY_ were they
> redundant. I do not understand why they were redundant ... and no,
> the test you performed does not justify removal of cache management
> calls.
> 

The s3c UDC driver is in u-boot since 2011. It has been added when
at Samsung boards (s5p_goni) we didn't have cache enabled.

Then there was a transition, after which L1 was enabled. Since UDC is
co-working with gadgets on the beginning it was easier to perform the
cache management inside the UDC driver.

That is why we had to copy the buffers (since e.g. device descriptors
tends to be unaligned) - which also degraded performance.

Now, all gadget code seems to be memalign'ed and ready for direct buffer
passing (despite the two overlooked kmallocs in the mass storage
gadget - which I fix in this patch set).

To sum up:

1. s3c_udc_ep0_zlp - EP0 ZLP packets don't need to invalidate the cache
(since it is zero length transmission)

2. s3c_udc_pre_setup - cache invalidation is not needed when I setup
buffer for OUT EP0 transmission.

The above two invalidation calls had been added by me, and are mine
mistakes. Those don't contribute to transmission speed up (and shall be
regarded as a cosmetic changes)

3. setdma_rx - here I invalidate parts of the s3c UDC driver's internal
buffer. This call is not needed anymore since we reuse the buffers
passed from gadgets.

This is the key speed improvement here.

Best regards,
Lukasz Majewski

> Best regards,
> Marek Vasut
> _______________________________________________
> U-Boot mailing list
> U-Boot@lists.denx.de
> http://lists.denx.de/mailman/listinfo/u-boot
Marek Vasut Feb. 1, 2014, 10:49 p.m. UTC | #3
On Saturday, February 01, 2014 at 10:56:27 AM, Lukasz Majewski wrote:
> On Sat, 1 Feb 2014 03:50:26 +0100
> 
> Marek Vasut <marex@denx.de> wrote:
> > On Friday, January 31, 2014 at 01:16:25 PM, Lukasz Majewski wrote:
> > > A set of cache operations (both invalidation and flush) were
> > > redundant in the S3C HS OTG Samsung driver.
> > > 
> > > Test condition
> > > - test HW + measurement: Trats - Exynos4210 rev.1
> > > - test HW Trats2 - Exynos4412 rev.1
> > > 400 MiB compressed rootfs image download with `thor 0 mmc 0`
> > 
> > The commit message is missing a proper explanation _WHY_ were they
> > redundant. I do not understand why they were redundant ... and no,
> > the test you performed does not justify removal of cache management
> > calls.
> 
> The s3c UDC driver is in u-boot since 2011. It has been added when
> at Samsung boards (s5p_goni) we didn't have cache enabled.
> 
> Then there was a transition, after which L1 was enabled. Since UDC is
> co-working with gadgets on the beginning it was easier to perform the
> cache management inside the UDC driver.
> 
> That is why we had to copy the buffers (since e.g. device descriptors
> tends to be unaligned) - which also degraded performance.
> 
> Now, all gadget code seems to be memalign'ed and ready for direct buffer
> passing (despite the two overlooked kmallocs in the mass storage
> gadget - which I fix in this patch set).
> 
> To sum up:
> 
> 1. s3c_udc_ep0_zlp - EP0 ZLP packets don't need to invalidate the cache
> (since it is zero length transmission)
> 
> 2. s3c_udc_pre_setup - cache invalidation is not needed when I setup
> buffer for OUT EP0 transmission.
> 
> The above two invalidation calls had been added by me, and are mine
> mistakes. Those don't contribute to transmission speed up (and shall be
> regarded as a cosmetic changes)
> 
> 3. setdma_rx - here I invalidate parts of the s3c UDC driver's internal
> buffer. This call is not needed anymore since we reuse the buffers
> passed from gadgets.

And you do correct cache management on those in the UDC driver or in the gadget 
driver ?
 
> This is the key speed improvement here.

This should be in the commit message really ;-)

Best regards,
Marek Vasut
Łukasz Majewski Feb. 3, 2014, 8:05 a.m. UTC | #4
Hi Marek,

> On Saturday, February 01, 2014 at 10:56:27 AM, Lukasz Majewski wrote:
> > On Sat, 1 Feb 2014 03:50:26 +0100
> > 
> > Marek Vasut <marex@denx.de> wrote:
> > > On Friday, January 31, 2014 at 01:16:25 PM, Lukasz Majewski wrote:
> > > > A set of cache operations (both invalidation and flush) were
> > > > redundant in the S3C HS OTG Samsung driver.
> > > > 
> > > > Test condition
> > > > - test HW + measurement: Trats - Exynos4210 rev.1
> > > > - test HW Trats2 - Exynos4412 rev.1
> > > > 400 MiB compressed rootfs image download with `thor 0 mmc 0`
> > > 
> > > The commit message is missing a proper explanation _WHY_ were they
> > > redundant. I do not understand why they were redundant ... and no,
> > > the test you performed does not justify removal of cache
> > > management calls.
> > 
> > The s3c UDC driver is in u-boot since 2011. It has been added when
> > at Samsung boards (s5p_goni) we didn't have cache enabled.
> > 
> > Then there was a transition, after which L1 was enabled. Since UDC
> > is co-working with gadgets on the beginning it was easier to
> > perform the cache management inside the UDC driver.
> > 
> > That is why we had to copy the buffers (since e.g. device
> > descriptors tends to be unaligned) - which also degraded
> > performance.
> > 
> > Now, all gadget code seems to be memalign'ed and ready for direct
> > buffer passing (despite the two overlooked kmallocs in the mass
> > storage gadget - which I fix in this patch set).
> > 
> > To sum up:
> > 
> > 1. s3c_udc_ep0_zlp - EP0 ZLP packets don't need to invalidate the
> > cache (since it is zero length transmission)
> > 
> > 2. s3c_udc_pre_setup - cache invalidation is not needed when I setup
> > buffer for OUT EP0 transmission.
> > 
> > The above two invalidation calls had been added by me, and are mine
> > mistakes. Those don't contribute to transmission speed up (and
> > shall be regarded as a cosmetic changes)
> > 
> > 3. setdma_rx - here I invalidate parts of the s3c UDC driver's
> > internal buffer. This call is not needed anymore since we reuse the
> > buffers passed from gadgets.
> 
> And you do correct cache management on those in the UDC driver or in
> the gadget driver ?

For download, buffers are allocated in gadgets. Then buffer is passed
to the UDC driver in a USB request. 
After receiving data via USB the UDC driver takes care to invalidate
cache, hence the gadget can work on the data.

Cache management is performed in the UDC driver.

>  
> > This is the key speed improvement here.
> 
> This should be in the commit message really ;-)

I wrongly assumed, that code explains what was the rationale :-). I'm
going to prepare more verbose commit message for v2.

> 
> Best regards,
> Marek Vasut
Marek Vasut Feb. 3, 2014, 6:06 p.m. UTC | #5
On Monday, February 03, 2014 at 09:05:06 AM, Lukasz Majewski wrote:

[...]

> > > To sum up:
> > > 
> > > 1. s3c_udc_ep0_zlp - EP0 ZLP packets don't need to invalidate the
> > > cache (since it is zero length transmission)
> > > 
> > > 2. s3c_udc_pre_setup - cache invalidation is not needed when I setup
> > > buffer for OUT EP0 transmission.
> > > 
> > > The above two invalidation calls had been added by me, and are mine
> > > mistakes. Those don't contribute to transmission speed up (and
> > > shall be regarded as a cosmetic changes)
> > > 
> > > 3. setdma_rx - here I invalidate parts of the s3c UDC driver's
> > > internal buffer. This call is not needed anymore since we reuse the
> > > buffers passed from gadgets.
> > 
> > And you do correct cache management on those in the UDC driver or in
> > the gadget driver ?
> 
> For download, buffers are allocated in gadgets. Then buffer is passed
> to the UDC driver in a USB request.
> After receiving data via USB the UDC driver takes care to invalidate
> cache, hence the gadget can work on the data.
> 
> Cache management is performed in the UDC driver.

OK, this is the correct place. I just wanted to make sure about this. Thanks :)

> > > This is the key speed improvement here.
> > 
> > This should be in the commit message really ;-)
> 
> I wrongly assumed, that code explains what was the rationale :-). I'm
> going to prepare more verbose commit message for v2.

Please do, thanks!

Best regards,
Marek Vasut
Łukasz Majewski Feb. 4, 2014, 6:23 a.m. UTC | #6
Hi Marek,

> On Monday, February 03, 2014 at 09:05:06 AM, Lukasz Majewski wrote:
> 
> [...]
> 
> > > > To sum up:
> > > > 
> > > > 1. s3c_udc_ep0_zlp - EP0 ZLP packets don't need to invalidate
> > > > the cache (since it is zero length transmission)
> > > > 
> > > > 2. s3c_udc_pre_setup - cache invalidation is not needed when I
> > > > setup buffer for OUT EP0 transmission.
> > > > 
> > > > The above two invalidation calls had been added by me, and are
> > > > mine mistakes. Those don't contribute to transmission speed up
> > > > (and shall be regarded as a cosmetic changes)
> > > > 
> > > > 3. setdma_rx - here I invalidate parts of the s3c UDC driver's
> > > > internal buffer. This call is not needed anymore since we reuse
> > > > the buffers passed from gadgets.
> > > 
> > > And you do correct cache management on those in the UDC driver or
> > > in the gadget driver ?
> > 
> > For download, buffers are allocated in gadgets. Then buffer is
> > passed to the UDC driver in a USB request.
> > After receiving data via USB the UDC driver takes care to invalidate
> > cache, hence the gadget can work on the data.
> > 
> > Cache management is performed in the UDC driver.
> 
> OK, this is the correct place. I just wanted to make sure about this.
> Thanks :)

No problem :-)

> 
> > > > This is the key speed improvement here.
> > > 
> > > This should be in the commit message really ;-)
> > 
> > I wrongly assumed, that code explains what was the rationale :-).
> > I'm going to prepare more verbose commit message for v2.
> 
> Please do, thanks!
> 
> Best regards,
> Marek Vasut
diff mbox

Patch

diff --git a/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c b/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c
index 1cbf8f6..eaa3a20 100644
--- a/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c
+++ b/drivers/usb/gadget/s3c_udc_otg_xfer_dma.c
@@ -29,10 +29,6 @@  static inline void s3c_udc_ep0_zlp(struct s3c_udc *dev)
 {
 	u32 ep_ctrl;
 
-	flush_dcache_range((unsigned long) usb_ctrl_dma_addr,
-			   (unsigned long) usb_ctrl_dma_addr
-			   + DMA_BUFFER_SIZE);
-
 	writel(usb_ctrl_dma_addr, &reg->in_endp[EP0_CON].diepdma);
 	writel(DIEPT_SIZ_PKT_CNT(1), &reg->in_endp[EP0_CON].dieptsiz);
 
@@ -52,10 +48,6 @@  void s3c_udc_pre_setup(void)
 	debug_cond(DEBUG_IN_EP,
 		   "%s : Prepare Setup packets.\n", __func__);
 
-	invalidate_dcache_range((unsigned long) usb_ctrl_dma_addr,
-				(unsigned long) usb_ctrl_dma_addr
-				+ DMA_BUFFER_SIZE);
-
 	writel(DOEPT_SIZ_PKT_CNT(1) | sizeof(struct usb_ctrlrequest),
 	       &reg->out_endp[EP0_CON].doeptsiz);
 	writel(usb_ctrl_dma_addr, &reg->out_endp[EP0_CON].doepdma);
@@ -115,11 +107,6 @@  static int setdma_rx(struct s3c_ep *ep, struct s3c_request *req)
 	ep->len = length;
 	ep->dma_buf = buf;
 
-	invalidate_dcache_range((unsigned long) ep->dev->dma_buf[ep_num],
-				(unsigned long) ep->dev->dma_buf[ep_num]
-				+ ROUND(ep->ep.maxpacket,
-					CONFIG_SYS_CACHELINE_SIZE));
-
 	if (length == 0)
 		pktcnt = 1;
 	else