diff mbox

usb: dwc2: regression on MyBook Live Duo / Canyonlands since 4.3.0-rc4

Message ID 1569777.jHIobOl9fm@debian64 (mailing list archive)
State Not Applicable
Headers show

Commit Message

Unknown sender due to SPF May 18, 2016, 7:14 p.m. UTC
On Tuesday, May 17, 2016 04:50:48 PM John Youn wrote:
> On 5/14/2016 6:11 AM, Christian Lamparter wrote:
> > On Thursday, May 12, 2016 11:40:28 AM John Youn wrote:
> >> On 5/12/2016 6:30 AM, Christian Lamparter wrote:
> >>> On Thursday, May 12, 2016 01:55:44 PM Arnd Bergmann wrote:
> >>>> On Thursday 12 May 2016 11:58:18 Christian Lamparter wrote:
> >>>>>>>> Detecting the endianess of the
> >>>>>>>> device is probably the best future-proof solution, but it's also
> >>>>>>>> considerably more work to do in the driver, and comes with a
> >>>>>>>> tiny runtime overhead.
> >>>>>>>
> >>>>>>> The runtime overhead is probably non-measurable compared with the cost
> >>>>>>> of the actual MMIOs.
> >>>>>>
> >>>>>> Right. The code size increase is probably measurable (but still small),
> >>>>>> the runtime overhead is not.
> >>>>>
> >>>>> Ok, so no rebuts or complains have been posted.
> >>>>>
> >>>>> I've tested the patch you made in: https://lkml.org/lkml/2016/5/9/354
> >>>>> and it works: 
> >>>>>
> >>>>> Tested-by: Christian Lamparter <chunkeey@googlemail.com>
> >>>>>
> >>>>> So, how do we go from here? There is are two small issues with the
> >>>>> original patch (#ifdef DWC2_LOG_WRITES got converted to lower case:
> >>>>> #ifdef dwc2_log_writes) and I guess a proper subject would be nice.  
> >>>>>
> >>>>> Arnd, can you please respin and post it (cc'd stable as well)?
> >>>>> So this is can be picked up? Or what's your plan?
> >>>>
> >>>> (I just realized my reply was stuck in my outbox, so the patch
> >>>> went out first)
> >>>>
> >>>> If I recall correctly, the rough consensus was to go with your longer
> >>>> patch in the future (fixed up for the comments that BenH and
> >>>> I sent), and I'd suggest basing it on top of a fixed version of
> >>>> my patch.
> >>> Well, but it comes with the "overhead"! So this was just as I said:
> >>> "Let's look at it and see if it's any good"... And I think it isn't
> >>> since the usb/host/ehci people also opted for #ifdef CONFIG_BIG_ENDIAN
> >>> archs etc...
> >>
> >> I slightly prefer the more general patch for future kernel versions.
> >> The overhead will probably be negligible, but we can perform some
> >> testing to make sure.
> >>
> >> Can you resubmit with all gathered feedback?
> > 
> > Yes, here are the changes.
> > 
> > I've tested it on my MyBook Live Duo. The usbotg comes right up:
> > [12610.540004] dwc2 4bff80000.usbotg: USB bus 1 deregistered
> > [12612.513934] dwc2 4bff80000.usbotg: Specified GNPTXFDEP=1024 > 256
> > [12612.518756] dwc2 4bff80000.usbotg: EPs: 3, shared fifos, 2042 entries in SPRAM
> > [12612.530112] dwc2 4bff80000.usbotg: DWC OTG Controller
> > [12612.533948] dwc2 4bff80000.usbotg: new USB bus registered, assigned bus number 1
> > [12612.540083] dwc2 4bff80000.usbotg: irq 33, io mem 0x00000000
> > 
> > John: Can you run some perf test with it?
> > 
> > I've based this on:
> > 
> > commit 6ea2fffc9057a67df1994d85a7c085d899eaa25a
> > Author: Arnd Bergmann <arnd@arndb.de>
> > Date:   Fri May 13 15:52:27 2016 +0200
> > 
> >     usb: dwc2: fix regression on big-endian PowerPC/ARM systems
> > 
> > so naturally, it needs to be applied first.
> > Most of the conversion work was done by the attached
> > coccinelle semantic patches. 
> > 
> > I had to edit the __bic32 and __orr32 helpers by hand.
> > As well as some debugfs code and stuff in gadget.c.
> > 
> 
> Thanks Christian.
> 
> I'll keep this in our internal tree and send it to Felipe later. This
> causes a bunch of conflicts that I have to fix up and I should do a
> bit of testing as well.
> 
> And since there is a patch that fixes the regression this is can wait.
> 
> Regards,
> John
---
Hey, that's really nice of you to do that :-D. Please keep me in the
loop (Cc) for those then. 

Yes, this needs definitely testing on all the affected ARCHs. 
I've attached a diff to a updated version of the patch. It
drops the special MIPS case (as requested by Arnd).

BTW, I looked into the ioread32_rep and iowrite32_rep again. I'm
not entirely convinced that the hardware FIFOs are actually endian
neutral. But I can't verify it since my Western Digital My Book Live
only supports the host configuration (forces host mode), so I don't
know what a device in dual-mode or peripheral do here.

The reason why I think it was broken is because there's a PIO copy
to and from the HCFIFO(x) in dwc2_hc_write_packet and
dwc2_hc_read_packet access in the hcd.c file as well... And there,
the code was using the dwc2_readl and dwc2_writel to access the data.
I added special accessors for the FIFOS now:
	dwc2_readl_rep and dwc2_writel_rep.

I went all the way and implemented the helpers to do unaligned access
if necessary (not sure if adding likely branches is a good idea, as
this could be either always true or false for a specific driver the
whole time).

NB: it also fixes a "regs variable not used in dwc2_hsotg_dump" warning
if DEBUG isn't selected.

NB2: If it you need a patch against a specific tree, please
let me know.
---

Comments

Arnd Bergmann May 18, 2016, 9:09 p.m. UTC | #1
On Wednesday 18 May 2016 21:14:55 Christian Lamparter wrote:
> On Tuesday, May 17, 2016 04:50:48 PM John Youn wrote:
> > On 5/14/2016 6:11 AM, Christian Lamparter wrote:

> Hey, that's really nice of you to do that :-D. Please keep me in the
> loop (Cc) for those then. 
> 
> Yes, this needs definitely testing on all the affected ARCHs. 
> I've attached a diff to a updated version of the patch. It
> drops the special MIPS case (as requested by Arnd).

Ok, thanks!

> BTW, I looked into the ioread32_rep and iowrite32_rep again. I'm
> not entirely convinced that the hardware FIFOs are actually endian
> neutral. But I can't verify it since my Western Digital My Book Live
> only supports the host configuration (forces host mode), so I don't
> know what a device in dual-mode or peripheral do here.

I think it's highly unlikely that designware would have screwed up
their part in such an unusual way, by intentionally adding a
byte-reversal on something that is not connected to a 32-bit
register.

Note that the reason why ioread32_rep() doesn't swap is not that
the registers are endian neutral, but that they use endianess in
the same way that the memory does: If you want to look at the first
byte of a (theoretical) four-byte USB data packet, we read four
bytes from the FIFO register using __raw_readl() (a pointer dereference)
and store it to memory using a 32-bit write:

	*(u32 *)buffer = __raw_readl(FIFO);

Then we expect the first byte of the packet to be at the start:

	byte0 = *(u8*)buffer;

If you replace the __raw_readl() with ioread32(), it gets byteswapped
on big-endian *CPUs*, and then written to memory without an extra
swap. This means that now you get the wrong data depending on the
kernel endianess configuration, and independent of the device endianess.

If the big-endian mode of the dwc2 block indeed contains a byteswap
on the FIFO, that would mean not having to use ioread32_be() for
the FIFO, but using

fifo_read32(void *buffer)
{
	u32 data = __raw_readl(FIFO_ADDRESS);
	if (big_endian_registers)
		data = bswap32(data);
	*(u32*)buffer = data;
}

so we byteswap the FIFO contents back, regardless of the CPU
endianess. As I said, it's unlikely that the hardware is this broken,
but not impossible.

> The reason why I think it was broken is because there's a PIO copy
> to and from the HCFIFO(x) in dwc2_hc_write_packet and
> dwc2_hc_read_packet access in the hcd.c file as well... And there,
> the code was using the dwc2_readl and dwc2_writel to access the data.

Well, we know for a fact that those functions get endianess wrong,
see dwc2_hc_write_packet:

        if (((unsigned long)data_buf & 0x3) == 0) {
                /* xfer_buf is DWORD aligned */
                for (i = 0; i < dword_count; i++, data_buf++)
                        dwc2_writel(*data_buf, data_fifo);
        } else {
                /* xfer_buf is not DWORD aligned */
                for (i = 0; i < dword_count; i++, data_buf++) {
                        u32 data = data_buf[0] | data_buf[1] << 8 |
                                   data_buf[2] << 16 | data_buf[3] << 24;
                        dwc2_writel(data, data_fifo);
                }
        }

On big-endian machines, unaligned case performs a byte swap while the
aligned case does not. My best guess is that this function never
got called on either the MIPS machine that first got the fix or
your PowerPC machine.

Another possibility is that you are right that there is a byteswap
on the FIFO register in big-endian mode, and that this function
always gets unaligned buffers, so the byteswap here cancels out
the byteswap on the FIFO when both the CPU and the device are
big-endian.

> -	if (((unsigned long)data_buf & 0x3) == 0) {
> -		/* xfer_buf is DWORD aligned */
> -		for (i = 0; i < dword_count; i++, data_buf++)
> -			dwc2_writel(hsotg, *data_buf, HCFIFO(chan->hc_num));
> -	} else {
> -		/* xfer_buf is not DWORD aligned */
> -		for (i = 0; i < dword_count; i++, data_buf++) {
> -			u32 data = data_buf[0] | data_buf[1] << 8 |
> -				   data_buf[2] << 16 | data_buf[3] << 24;
> -			dwc2_writel(hsotg, data, HCFIFO(chan->hc_num));
> -		}
> -	}
> +	dwc2_writel_rep(hsotg, HCFIFO(chan->hc_num), data_buf, byte_count);
>  

and here you are dropping the byteswap on big-endian.

	Arnd
John Youn May 19, 2016, 12:36 a.m. UTC | #2
On 5/18/2016 12:15 PM, Christian Lamparter wrote:
> On Tuesday, May 17, 2016 04:50:48 PM John Youn wrote:
>> On 5/14/2016 6:11 AM, Christian Lamparter wrote:
>>> On Thursday, May 12, 2016 11:40:28 AM John Youn wrote:
>>>> On 5/12/2016 6:30 AM, Christian Lamparter wrote:
>>>>> On Thursday, May 12, 2016 01:55:44 PM Arnd Bergmann wrote:
>>>>>> On Thursday 12 May 2016 11:58:18 Christian Lamparter wrote:
>>>>>>>>>> Detecting the endianess of the
>>>>>>>>>> device is probably the best future-proof solution, but it's also
>>>>>>>>>> considerably more work to do in the driver, and comes with a
>>>>>>>>>> tiny runtime overhead.
>>>>>>>>>
>>>>>>>>> The runtime overhead is probably non-measurable compared with the cost
>>>>>>>>> of the actual MMIOs.
>>>>>>>>
>>>>>>>> Right. The code size increase is probably measurable (but still small),
>>>>>>>> the runtime overhead is not.
>>>>>>>
>>>>>>> Ok, so no rebuts or complains have been posted.
>>>>>>>
>>>>>>> I've tested the patch you made in: https://lkml.org/lkml/2016/5/9/354
>>>>>>> and it works: 
>>>>>>>
>>>>>>> Tested-by: Christian Lamparter <chunkeey@googlemail.com>
>>>>>>>
>>>>>>> So, how do we go from here? There is are two small issues with the
>>>>>>> original patch (#ifdef DWC2_LOG_WRITES got converted to lower case:
>>>>>>> #ifdef dwc2_log_writes) and I guess a proper subject would be nice.  
>>>>>>>
>>>>>>> Arnd, can you please respin and post it (cc'd stable as well)?
>>>>>>> So this is can be picked up? Or what's your plan?
>>>>>>
>>>>>> (I just realized my reply was stuck in my outbox, so the patch
>>>>>> went out first)
>>>>>>
>>>>>> If I recall correctly, the rough consensus was to go with your longer
>>>>>> patch in the future (fixed up for the comments that BenH and
>>>>>> I sent), and I'd suggest basing it on top of a fixed version of
>>>>>> my patch.
>>>>> Well, but it comes with the "overhead"! So this was just as I said:
>>>>> "Let's look at it and see if it's any good"... And I think it isn't
>>>>> since the usb/host/ehci people also opted for #ifdef CONFIG_BIG_ENDIAN
>>>>> archs etc...
>>>>
>>>> I slightly prefer the more general patch for future kernel versions.
>>>> The overhead will probably be negligible, but we can perform some
>>>> testing to make sure.
>>>>
>>>> Can you resubmit with all gathered feedback?
>>>
>>> Yes, here are the changes.
>>>
>>> I've tested it on my MyBook Live Duo. The usbotg comes right up:
>>> [12610.540004] dwc2 4bff80000.usbotg: USB bus 1 deregistered
>>> [12612.513934] dwc2 4bff80000.usbotg: Specified GNPTXFDEP=1024 > 256
>>> [12612.518756] dwc2 4bff80000.usbotg: EPs: 3, shared fifos, 2042 entries in SPRAM
>>> [12612.530112] dwc2 4bff80000.usbotg: DWC OTG Controller
>>> [12612.533948] dwc2 4bff80000.usbotg: new USB bus registered, assigned bus number 1
>>> [12612.540083] dwc2 4bff80000.usbotg: irq 33, io mem 0x00000000
>>>
>>> John: Can you run some perf test with it?
>>>
>>> I've based this on:
>>>
>>> commit 6ea2fffc9057a67df1994d85a7c085d899eaa25a
>>> Author: Arnd Bergmann <arnd@arndb.de>
>>> Date:   Fri May 13 15:52:27 2016 +0200
>>>
>>>     usb: dwc2: fix regression on big-endian PowerPC/ARM systems
>>>
>>> so naturally, it needs to be applied first.
>>> Most of the conversion work was done by the attached
>>> coccinelle semantic patches. 
>>>
>>> I had to edit the __bic32 and __orr32 helpers by hand.
>>> As well as some debugfs code and stuff in gadget.c.
>>>
>>
>> Thanks Christian.
>>
>> I'll keep this in our internal tree and send it to Felipe later. This
>> causes a bunch of conflicts that I have to fix up and I should do a
>> bit of testing as well.
>>
>> And since there is a patch that fixes the regression this is can wait.
>>
>> Regards,
>> John
> ---
> Hey, that's really nice of you to do that :-D. Please keep me in the
> loop (Cc) for those then. 

Sure no problem.

> 
> Yes, this needs definitely testing on all the affected ARCHs. 
> I've attached a diff to a updated version of the patch. It
> drops the special MIPS case (as requested by Arnd).
> 
> BTW, I looked into the ioread32_rep and iowrite32_rep again. I'm
> not entirely convinced that the hardware FIFOs are actually endian
> neutral. But I can't verify it since my Western Digital My Book Live
> only supports the host configuration (forces host mode), so I don't
> know what a device in dual-mode or peripheral do here.
> 
> The reason why I think it was broken is because there's a PIO copy
> to and from the HCFIFO(x) in dwc2_hc_write_packet and
> dwc2_hc_read_packet access in the hcd.c file as well... And there,
> the code was using the dwc2_readl and dwc2_writel to access the data.
> I added special accessors for the FIFOS now:
> 	dwc2_readl_rep and dwc2_writel_rep.
> 

Hmmm, you could be right in that case. I'll have to check with the IP
engineers and maybe try to run some tests on our platforms. So native
access to the host FIFO will fail then? This platform is a BE CPU with
the IP connected as LE, right?


> I went all the way and implemented the helpers to do unaligned access
> if necessary (not sure if adding likely branches is a good idea, as
> this could be either always true or false for a specific driver the
> whole time).
> 
> NB: it also fixes a "regs variable not used in dwc2_hsotg_dump" warning
> if DEBUG isn't selected.
> 
> NB2: If it you need a patch against a specific tree, please
> let me know.

I can't provide you a tree to rebase on just yet. I'm hoping to get a
few things queued for 4.8 and just apply this on top.

It would help if you could resend these as proper patches with a
commit message and signed-off-by line, and the CONFIG_MIPS removal and
compile warning fix merged in. And the fifo accessors in a separate
patch. That way I can do any simple fix-ups if needed.

Regards,
John
diff mbox

Patch

diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
index 2fa57cd..69030bb 100644
--- a/drivers/usb/dwc2/core.h
+++ b/drivers/usb/dwc2/core.h
@@ -42,6 +42,7 @@ 
 #include <linux/usb/gadget.h>
 #include <linux/usb/otg.h>
 #include <linux/usb/phy.h>
+#include <asm/unaligned.h>
 #include "hw.h"
 
 /*
@@ -958,50 +959,6 @@  enum dwc2_halt_status {
 	DWC2_HC_XFER_URB_DEQUEUE,
 };
 
-#ifdef CONFIG_MIPS
-/*
- * There are some MIPS machines that can run in either big-endian
- * or little-endian mode and that use the dwc2 register without
- * a byteswap in both ways.
- * Unlike other architectures, MIPS apparently does not require a
- * barrier before the __raw_writel() to synchronize with DMA but does
- * require the barrier after the __raw_writel() to serialize a set of
- * writes. This set of operations was added specifically for MIPS and
- * should only be used there.
- */
-static inline u32 dwc2_readl(struct dwc2_hsotg *hsotg,
-			     ptrdiff_t reg)
-{
-	const void __iomem *addr = hsotg->regs + reg;
-	u32 value = __raw_readl(addr);
-
-	/*
-	 * In order to preserve endianness __raw_* operation is used. Therefore
-	 * a barrier is needed to ensure IO access is not re-ordered across
-	 * reads or writes
-	 */
-	mb();
-	return value;
-}
-
-static inline void dwc2_writel(struct dwc2_hsotg *hsotg, u32 value,
-			       ptrdiff_t reg)
-{
-	const void __iomem *addr = hsotg->regs + reg;
-	__raw_writel(value, addr);
-
-	/*
-	 * In order to preserve endianness __raw_* operation is used. Therefore
-	 * a barrier is needed to ensure IO access is not re-ordered across
-	 * reads or writes
-	 */
-	mb();
-#ifdef DWC2_LOG_WRITES
-	pr_info("INFO:: wrote %08x to %p\n", value, addr);
-#endif
-}
-#else
-/* Normal architectures just use readl/write_be */
 static inline u32 dwc2_readl(struct dwc2_hsotg *hsotg,
 			     ptrdiff_t reg)
 {
@@ -1014,7 +971,8 @@  static inline u32 dwc2_readl(struct dwc2_hsotg *hsotg,
 }
 
 
-static inline void dwc2_writel(struct dwc2_hsotg *hsotg, u32 value,
+static inline void dwc2_writel(struct dwc2_hsotg *hsotg,
+			       const u32 value,
 			       ptrdiff_t reg)
 {
 	void __iomem *addr = hsotg->regs + reg;
@@ -1028,7 +986,103 @@  static inline void dwc2_writel(struct dwc2_hsotg *hsotg, u32 value,
 	pr_info("info:: wrote %08x to %p\n", value, addr);
 #endif
 }
-#endif
+
+static inline void dwc2_readl_rep(struct dwc2_hsotg *hsotg,
+				  ptrdiff_t reg,
+				  u32 *buf, const size_t len)
+{
+	void __iomem *addr = hsotg->regs + reg;
+	size_t i, remaining = len & ~4;
+
+	if (hsotg->is_big_endian) {
+		if (likely(IS_ALIGNED(*buf, 0x4))) {
+			for (i = len >> 2; i > 0; i--)
+				*buf++ = ioread32be(addr);
+		} else {
+			/* xfer_buf is not DWORD aligned */
+			for (i = len >> 2; i > 0; i--) {
+				u32 data = ioread32be(addr);
+
+				put_unaligned(data, buf);
+				buf++;
+			}
+		}
+	} else {
+		/* little-endian accessors */
+		if (likely(IS_ALIGNED(*buf, 0x4))) {
+			for (i = len >> 2; i > 0; i--)
+				*buf++ = ioread32(addr);
+
+		} else {
+			/* xfer_buf is not DWORD aligned */
+			for (i = len >> 2; i > 0; i--) {
+				u32 data = ioread32be(addr);
+
+				put_unaligned(data, buf);
+				buf++;
+			}
+		}
+	}
+
+	if (unlikely(remaining)) {
+		u32 data_u32;
+		u8 *buf_u8 = (u8 *) buf;
+		u8 *data_u8 = (u8 *) &data_u32;
+
+		data_u32 = dwc2_readl(hsotg, reg);
+
+		while (remaining--)
+			*buf_u8++ = *data_u8++;
+	}
+}
+
+static inline void dwc2_writel_rep(struct dwc2_hsotg *hsotg,
+				   const ptrdiff_t reg,
+				   const u32 *buf, const size_t len)
+{
+	void __iomem *addr = hsotg->regs + reg;
+	size_t i, remaining = len & ~4;
+
+	if (hsotg->is_big_endian) {
+		if (likely(IS_ALIGNED(*buf, 0x4))) {
+			for (i = len >> 2; i > 0; i--)
+				iowrite32be(*buf++, addr);
+		} else {
+			/* xfer_buf is not DWORD aligned */
+			for (i = len >> 2; i > 0; i--) {
+				u32 data = get_unaligned(buf);
+
+				iowrite32be(data, addr);
+				buf++;
+			}
+		}
+	} else {
+		/* little-endian accessors */
+		if (likely(IS_ALIGNED(*buf, 0x4))) {
+			for (i = len >> 2; i > 0; i--)
+				iowrite32(*buf++, addr);
+		} else {
+			/* xfer_buf is not DWORD aligned */
+			for (i = len >> 2; i > 0; i--) {
+				u32 data = get_unaligned(buf);
+
+				iowrite32(data, addr);
+				buf++;
+			}
+		}
+	}
+
+	if (unlikely(remaining)) {
+		u32 data_u32;
+		u8 *buf_u8 = (u8 *) buf;
+		u8 *data_u8 = (u8 *) &data_u32;
+
+		while (remaining--)
+			*data_u8++ = *buf_u8++;
+
+		dwc2_writel(hsotg, data_u32, reg);
+	}
+}
 
 extern int dwc2_detect_endiannes(struct dwc2_hsotg *hsotg);
 
diff --git a/drivers/usb/dwc2/gadget.c b/drivers/usb/dwc2/gadget.c
index 2c687d9..531b30f 100644
--- a/drivers/usb/dwc2/gadget.c
+++ b/drivers/usb/dwc2/gadget.c
@@ -317,7 +317,7 @@  static int dwc2_hsotg_write_fifo(struct dwc2_hsotg *hsotg,
 	u32 gnptxsts = dwc2_readl(hsotg, GNPTXSTS);
 	int buf_pos = hs_req->req.actual;
 	int to_write = hs_ep->size_loaded;
-	void *data;
+	u32 *data;
 	int can_write;
 	int pkt_round;
 	int max_transfer;
@@ -457,10 +457,9 @@  static int dwc2_hsotg_write_fifo(struct dwc2_hsotg *hsotg,
 	if (periodic)
 		hs_ep->fifo_load += to_write;
 
-	to_write = DIV_ROUND_UP(to_write, 4);
 	data = hs_req->req.buf + buf_pos;
 
-	iowrite32_rep(hsotg->regs + EPFIFO(hs_ep->index), data, to_write);
+	dwc2_writel_rep(hsotg, EPFIFO(hs_ep->index), data, to_write);
 
 	return (to_write >= can_write) ? -ENOSPC : 0;
 }
@@ -1439,12 +1438,11 @@  static void dwc2_hsotg_rx_data(struct dwc2_hsotg *hsotg, int ep_idx, int size)
 {
 	struct dwc2_hsotg_ep *hs_ep = hsotg->eps_out[ep_idx];
 	struct dwc2_hsotg_req *hs_req = hs_ep->req;
-	void __iomem *fifo = hsotg->regs + EPFIFO(ep_idx);
+	u32 *data;
 	int to_read;
 	int max_req;
 	int read_ptr;
 
-
 	if (!hs_req) {
 		u32 epctl = dwc2_readl(hsotg, DOEPCTL(ep_idx));
 		int ptr;
@@ -1455,7 +1453,7 @@  static void dwc2_hsotg_rx_data(struct dwc2_hsotg *hsotg, int ep_idx, int size)
 
 		/* dump the data from the FIFO, we've nothing we can do */
 		for (ptr = 0; ptr < size; ptr += 4)
-			(void)dwc2_readl(hsotg, EPFIFO(ep_idx));
+			(void)__raw_readl(hsotg->regs + EPFIFO(ep_idx));
 
 		return;
 	}
@@ -1479,13 +1477,14 @@  static void dwc2_hsotg_rx_data(struct dwc2_hsotg *hsotg, int ep_idx, int size)
 
 	hs_ep->total_data += to_read;
 	hs_req->req.actual += to_read;
-	to_read = DIV_ROUND_UP(to_read, 4);
 
 	/*
 	 * note, we might over-write the buffer end by 3 bytes depending on
 	 * alignment of the data.
 	 */
-	ioread32_rep(fifo, hs_req->req.buf + read_ptr, to_read);
+	data = hs_req->req.buf + read_ptr;
+
+	dwc2_readl_rep(hsotg, EPFIFO(ep_idx), data, to_read);
 }
 
 /**
@@ -3411,7 +3416,6 @@  static void dwc2_hsotg_dump(struct dwc2_hsotg *hsotg)
 {
 #ifdef DEBUG
 	struct device *dev = hsotg->dev;
-	void __iomem *regs = hsotg->regs;
 	u32 val;
 	int idx;
 
diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index dcd6338..8568ff4 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -567,19 +567,10 @@  u32 dwc2_calc_frame_interval(struct dwc2_hsotg *hsotg)
 void dwc2_read_packet(struct dwc2_hsotg *hsotg, u8 *dest, u16 bytes)
 {
 	u32 *data_buf = (u32 *)dest;
-	int word_count = (bytes + 3) / 4;
-	int i;
-
-	/*
-	 * Todo: Account for the case where dest is not dword aligned. This
-	 * requires reading data from the FIFO into a u32 temp buffer, then
-	 * moving it into the data buffer.
-	 */
 
 	dev_vdbg(hsotg->dev, "%s(%p,%p,%d)\n", __func__, hsotg, dest, bytes);
 
-	for (i = 0; i < word_count; i++, data_buf++)
-		*data_buf = dwc2_readl(hsotg, HCFIFO(0));
+	dwc2_readl_rep(hsotg, HCFIFO(0), data_buf, bytes);
 }
 
 /**
@@ -1236,10 +1227,8 @@  static void dwc2_set_pid_isoc(struct dwc2_host_chan *chan)
 static void dwc2_hc_write_packet(struct dwc2_hsotg *hsotg,
 				 struct dwc2_host_chan *chan)
 {
-	u32 i;
 	u32 remaining_count;
 	u32 byte_count;
-	u32 dword_count;
 	u32 *data_buf = (u32 *)chan->xfer_buf;
 
 	if (dbg_hc(chan))
@@ -1251,20 +1240,7 @@  static void dwc2_hc_write_packet(struct dwc2_hsotg *hsotg,
 	else
 		byte_count = remaining_count;
 
-	dword_count = (byte_count + 3) / 4;
-
-	if (((unsigned long)data_buf & 0x3) == 0) {
-		/* xfer_buf is DWORD aligned */
-		for (i = 0; i < dword_count; i++, data_buf++)
-			dwc2_writel(hsotg, *data_buf, HCFIFO(chan->hc_num));
-	} else {
-		/* xfer_buf is not DWORD aligned */
-		for (i = 0; i < dword_count; i++, data_buf++) {
-			u32 data = data_buf[0] | data_buf[1] << 8 |
-				   data_buf[2] << 16 | data_buf[3] << 24;
-			dwc2_writel(hsotg, data, HCFIFO(chan->hc_num));
-		}
-	}
+	dwc2_writel_rep(hsotg, HCFIFO(chan->hc_num), data_buf, byte_count);
 
 	chan->xfer_count += byte_count;
 	chan->xfer_buf += byte_count;