diff mbox

usb: dwc2: regression on MyBook Live Duo / Canyonlands since 4.3.0-rc4

Message ID 4162108.qmr2GZCaDN@wuerfel (mailing list archive)
State Not Applicable
Headers show

Commit Message

Arnd Bergmann May 9, 2016, 10:36 a.m. UTC
On Monday 09 May 2016 10:23:22 Benjamin Herrenschmidt wrote:
> On Sun, 2016-05-08 at 13:44 +0200, Christian Lamparter wrote:
> > On Sunday, May 08, 2016 08:40:55 PM Benjamin Herrenschmidt wrote:
> > > 
> > > On Sun, 2016-05-08 at 00:54 +0200, Christian Lamparter via Linuxppc-dev 
> > > wrote:
> > > > 
> > > > I've been looking in getting the MyBook Live Duo's USB OTG port
> > > > to function. The SoC is a APM82181. Which has a PowerPC 464 core
> > > > and related to the supported canyonlands architecture in
> > > > arch/powerpc/.
> > > > 
> > > > Currently in -next the dwc2 module doesn't load: 
> > > Smells like the APM implementation is little endian. You might need to
> > > use a flag to indicate what endian to use instead and set it
> > > appropriately based on some DT properties.
> > I tried. As per common-properties[0], I added little-endian; but it has no
> > effect. I looked in dwc2_driver_probe and found no way of specifying the
> > endian of the device. It all comes down to the dwc2_readl & dwc2_writel
> > accessors. These - sadly - have been hardwired to use __raw_readl and
> > __raw_writel. So, it's always "native-endian". While common-properties
> > says little-endian should be preferred.
> 
> Right, I meant, you should produce a patch adding a runtime test inside
> those functions based on a device-tree property, a bit like we do for
> some of the HCDs like OHCI, EHCI etc...
> 
> 

The patch that caused the problem had multiple issues:

- it broke big-endian ARM kernels: any machine that was working
  correctly with a little-endian kernel is no longer using byteswaps
  on big-endian kernels, which clearly breaks them.
- On PowerPC the same thing must be true: if it was working before,
  using big-endian kernels is now broken. Unlike ARM, 32-bit PowerPC
  usually uses big-endian kernels, so they are likely all broken.
- The barrier for dwc2_writel is on the wrong side of the __raw_writel(),
  so the MMIO no longer synchronizes with DMA operations.
- On architectures that require specific CPU instructions for MMIO
  access, using the __raw_ variant may turn this into a pointer
  dereference that does not have the same effect as the readl/writel.

I think we can simply make this set of accessors architecture-dependent
(MIPS vs. the rest of the world) to revert ARM and PowerPC back to
the working version.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Comments

Felipe Balbi May 9, 2016, 10:39 a.m. UTC | #1
Hi,

Arnd Bergmann <arnd@arndb.de> writes:
> On Monday 09 May 2016 10:23:22 Benjamin Herrenschmidt wrote:
>> On Sun, 2016-05-08 at 13:44 +0200, Christian Lamparter wrote:
>> > On Sunday, May 08, 2016 08:40:55 PM Benjamin Herrenschmidt wrote:
>> > > 
>> > > On Sun, 2016-05-08 at 00:54 +0200, Christian Lamparter via Linuxppc-dev 
>> > > wrote:
>> > > > 
>> > > > I've been looking in getting the MyBook Live Duo's USB OTG port
>> > > > to function. The SoC is a APM82181. Which has a PowerPC 464 core
>> > > > and related to the supported canyonlands architecture in
>> > > > arch/powerpc/.
>> > > > 
>> > > > Currently in -next the dwc2 module doesn't load: 
>> > > Smells like the APM implementation is little endian. You might need to
>> > > use a flag to indicate what endian to use instead and set it
>> > > appropriately based on some DT properties.
>> > I tried. As per common-properties[0], I added little-endian; but it has no
>> > effect. I looked in dwc2_driver_probe and found no way of specifying the
>> > endian of the device. It all comes down to the dwc2_readl & dwc2_writel
>> > accessors. These - sadly - have been hardwired to use __raw_readl and
>> > __raw_writel. So, it's always "native-endian". While common-properties
>> > says little-endian should be preferred.
>> 
>> Right, I meant, you should produce a patch adding a runtime test inside
>> those functions based on a device-tree property, a bit like we do for
>> some of the HCDs like OHCI, EHCI etc...
>> 
>> 
>
> The patch that caused the problem had multiple issues:
>
> - it broke big-endian ARM kernels: any machine that was working
>   correctly with a little-endian kernel is no longer using byteswaps
>   on big-endian kernels, which clearly breaks them.
> - On PowerPC the same thing must be true: if it was working before,
>   using big-endian kernels is now broken. Unlike ARM, 32-bit PowerPC
>   usually uses big-endian kernels, so they are likely all broken.
> - The barrier for dwc2_writel is on the wrong side of the __raw_writel(),
>   so the MMIO no longer synchronizes with DMA operations.
> - On architectures that require specific CPU instructions for MMIO
>   access, using the __raw_ variant may turn this into a pointer
>   dereference that does not have the same effect as the readl/writel.
>
> I think we can simply make this set of accessors architecture-dependent
> (MIPS vs. the rest of the world) to revert ARM and PowerPC back to
> the working version.

and patch all drivers similarly? Shouldn't arch/mips itself deal with it
and hide it from drivers ?
Benjamin Herrenschmidt May 9, 2016, 2:02 p.m. UTC | #2
On Mon, 2016-05-09 at 12:36 +0200, Arnd Bergmann wrote:

> I think we can simply make this set of accessors architecture-
> dependent
> (MIPS vs. the rest of the world) to revert ARM and PowerPC back to
> the working version.

Or use writel_be which mips seems to support...

Really, make it a BE vs. LE device test is a much better solution.

For now, since dwc2_readl() and writel don't take the device as an
argument, you can make it a function of a compile time #define, or
maybe a driver global, but the right way is really something like

	if (device_is_be())
		return readl_be(...)
	else
		return readl(...)

With the device_is_be() being temporarily set to true for MIPS for
example, and later, a second pass, add the device argument and make it
a device-flag initialized from the probe routine, possibly from the DT.

Cheers,
Ben.

> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> 
> diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
> index 3c58d633ce80..1f8ed149a40f 100644
> --- a/drivers/usb/dwc2/core.h
> +++ b/drivers/usb/dwc2/core.h
> @@ -64,12 +64,24 @@
>  	DWC2_TRACE_SCHEDULER_VB(pr_fmt("%s: SCH: " fmt),		
> \
>  				dev_name(hsotg->dev), ##__VA_ARGS__)
>  
> +
> +#ifdef CONFIG_MIPS
> +/*
> + * There are some MIPS machines that can run in either big-endian
> + * or little-endian mode and that use the dwc2 register without
> + * a byteswap in both ways.
> + * Unlike other architectures, MIPS does not require a barrier
> + * before the __raw_writel() to synchronize with DMA but does
> + * require the barrier after the writel() to serialize a series
> + * of writes. This set of operations was added specifically for
> + * MIPS and should only be used there.
> + */
>  static inline u32 dwc2_readl(const void __iomem *addr)
>  {
>  	u32 value = __raw_readl(addr);
>  
> -	/* In order to preserve endianness __raw_* operation is
> used. Therefore
> -	 * a barrier is needed to ensure IO access is not re-ordered 
> across
> +	/* in order to preserve endianness __raw_* operation is
> used. therefore
> +	 * a barrier is needed to ensure io access is not re-ordered 
> across
>  	 * reads or writes
>  	 */
>  	mb();
> @@ -81,15 +93,32 @@ static inline void dwc2_writel(u32 value, void
> __iomem *addr)
>  	__raw_writel(value, addr);
>  
>  	/*
> -	 * In order to preserve endianness __raw_* operation is
> used. Therefore
> -	 * a barrier is needed to ensure IO access is not re-ordered 
> across
> +	 * in order to preserve endianness __raw_* operation is
> used. therefore
> +	 * a barrier is needed to ensure io access is not re-ordered 
> across
>  	 * reads or writes
>  	 */
>  	mb();
> -#ifdef DWC2_LOG_WRITES
> -	pr_info("INFO:: wrote %08x to %p\n", value, addr);
> +#ifdef dwc2_log_writes
> +	pr_info("info:: wrote %08x to %p\n", value, addr);
>  #endif
>  }
> +#else
> +/* Normal architectures just use readl/write */
> +static inline u32 dwc2_readl(const void __iomem *addr)
> +{
> +	u32 value = readl(addr);
> +	return value;
> +}
> +
> +static inline void dwc2_writel(u32 value, void __iomem *addr)
> +{
> +	writel(value, addr);
> +
> +#ifdef dwc2_log_writes
> +	pr_info("info:: wrote %08x to %p\n", value, addr);
> +#endif
> +}
> +#endif
>  
>  /* Maximum number of Endpoints/HostChannels */
>  #define MAX_EPS_CHANNELS	16
Arnd Bergmann May 9, 2016, 3:08 p.m. UTC | #3
On Monday 09 May 2016 13:39:50 Felipe Balbi wrote:
> Arnd Bergmann <arnd@arndb.de> writes:
> > On Monday 09 May 2016 10:23:22 Benjamin Herrenschmidt wrote:
> >> On Sun, 2016-05-08 at 13:44 +0200, Christian Lamparter wrote:
> >
> > The patch that caused the problem had multiple issues:
> >
> > - it broke big-endian ARM kernels: any machine that was working
> >   correctly with a little-endian kernel is no longer using byteswaps
> >   on big-endian kernels, which clearly breaks them.
> > - On PowerPC the same thing must be true: if it was working before,
> >   using big-endian kernels is now broken. Unlike ARM, 32-bit PowerPC
> >   usually uses big-endian kernels, so they are likely all broken.
> > - The barrier for dwc2_writel is on the wrong side of the __raw_writel(),
> >   so the MMIO no longer synchronizes with DMA operations.
> > - On architectures that require specific CPU instructions for MMIO
> >   access, using the __raw_ variant may turn this into a pointer
> >   dereference that does not have the same effect as the readl/writel.
> >
> > I think we can simply make this set of accessors architecture-dependent
> > (MIPS vs. the rest of the world) to revert ARM and PowerPC back to
> > the working version.
> 
> and patch all drivers similarly? Shouldn't arch/mips itself deal with it
> and hide it from drivers ?
> 

Unfortunately, I don't see any way this could be done in MIPS specific
code: There is typically a byteswap between the internal bus and the PCI
bus on big-endian MIPS systems, so the PCI MMIO ends up being little-endian,
which matches the expected behavior of readl/writel. However, drivers
for non-PCI devices often use the same readl/writel accessors because
that is how it's done on ARMv6/ARMv7.

Doing it hardcoded by architecture is just the simplest way to deal
with it, working on the assumption that nothing actually needs the
runtime detection that Ben suggested. Detecting the endianess of the
device is probably the best future-proof solution, but it's also
considerably more work to do in the driver, and comes with a
tiny runtime overhead.

	Arnd
John Youn May 9, 2016, 8:22 p.m. UTC | #4
On 5/9/2016 3:36 AM, Arnd Bergmann wrote:
> On Monday 09 May 2016 10:23:22 Benjamin Herrenschmidt wrote:
>> On Sun, 2016-05-08 at 13:44 +0200, Christian Lamparter wrote:
>>> On Sunday, May 08, 2016 08:40:55 PM Benjamin Herrenschmidt wrote:
>>>>
>>>> On Sun, 2016-05-08 at 00:54 +0200, Christian Lamparter via Linuxppc-dev 
>>>> wrote:
>>>>>
>>>>> I've been looking in getting the MyBook Live Duo's USB OTG port
>>>>> to function. The SoC is a APM82181. Which has a PowerPC 464 core
>>>>> and related to the supported canyonlands architecture in
>>>>> arch/powerpc/.
>>>>>
>>>>> Currently in -next the dwc2 module doesn't load: 
>>>> Smells like the APM implementation is little endian. You might need to
>>>> use a flag to indicate what endian to use instead and set it
>>>> appropriately based on some DT properties.
>>> I tried. As per common-properties[0], I added little-endian; but it has no
>>> effect. I looked in dwc2_driver_probe and found no way of specifying the
>>> endian of the device. It all comes down to the dwc2_readl & dwc2_writel
>>> accessors. These - sadly - have been hardwired to use __raw_readl and
>>> __raw_writel. So, it's always "native-endian". While common-properties
>>> says little-endian should be preferred.
>>
>> Right, I meant, you should produce a patch adding a runtime test inside
>> those functions based on a device-tree property, a bit like we do for
>> some of the HCDs like OHCI, EHCI etc...
>>
>>
> 
> The patch that caused the problem had multiple issues:
> 
> - it broke big-endian ARM kernels: any machine that was working
>   correctly with a little-endian kernel is no longer using byteswaps
>   on big-endian kernels, which clearly breaks them.


I'm a bit confused about how this is supposed to work. My
understanding was that the readl() and writel() are defined as little
endian. So byte-swapping was performed if the architecture is big
endian. And the raw versions never swapped, always using the "native"
endianness.

dwc2 is always treating the result of readl/writel as if it was read
in native endian. So it needs to read the registers in big-endian on
big-endian systems.

This was the premise on which this patch was made.

So for big endian systems, isn't what we want is to read in big-endian
without any byteswapping to little-endian? But your saying this breaks
big-endian ARM systems as well. Am I missing something?

The rest of the feedback makes sense. Just confused about this bit.

Regards,
John
Arnd Bergmann May 9, 2016, 8:38 p.m. UTC | #5
On Monday 09 May 2016 13:22:48 John Youn wrote:
> On 5/9/2016 3:36 AM, Arnd Bergmann wrote:
> > On Monday 09 May 2016 10:23:22 Benjamin Herrenschmidt wrote:
> >> On Sun, 2016-05-08 at 13:44 +0200, Christian Lamparter wrote:
> >>> On Sunday, May 08, 2016 08:40:55 PM Benjamin Herrenschmidt wrote:
> >>>>
> >>>> On Sun, 2016-05-08 at 00:54 +0200, Christian Lamparter via Linuxppc-dev 
> >>>> wrote:
> >>>>>
> >>>>> I've been looking in getting the MyBook Live Duo's USB OTG port
> >>>>> to function. The SoC is a APM82181. Which has a PowerPC 464 core
> >>>>> and related to the supported canyonlands architecture in
> >>>>> arch/powerpc/.
> >>>>>
> >>>>> Currently in -next the dwc2 module doesn't load: 
> >>>> Smells like the APM implementation is little endian. You might need to
> >>>> use a flag to indicate what endian to use instead and set it
> >>>> appropriately based on some DT properties.
> >>> I tried. As per common-properties[0], I added little-endian; but it has no
> >>> effect. I looked in dwc2_driver_probe and found no way of specifying the
> >>> endian of the device. It all comes down to the dwc2_readl & dwc2_writel
> >>> accessors. These - sadly - have been hardwired to use __raw_readl and
> >>> __raw_writel. So, it's always "native-endian". While common-properties
> >>> says little-endian should be preferred.
> >>
> >> Right, I meant, you should produce a patch adding a runtime test inside
> >> those functions based on a device-tree property, a bit like we do for
> >> some of the HCDs like OHCI, EHCI etc...
> >>
> >>
> > 
> > The patch that caused the problem had multiple issues:
> > 
> > - it broke big-endian ARM kernels: any machine that was working
> >   correctly with a little-endian kernel is no longer using byteswaps
> >   on big-endian kernels, which clearly breaks them.
> 
> 
> I'm a bit confused about how this is supposed to work. My
> understanding was that the readl() and writel() are defined as little
> endian. So byte-swapping was performed if the architecture is big
> endian. And the raw versions never swapped, always using the "native"
> endianness.
> 
> dwc2 is always treating the result of readl/writel as if it was read
> in native endian. So it needs to read the registers in big-endian on
> big-endian systems.

The hardware has no idea of what endianess the CPU uses at any
given time, it's fixed by the SoC design, so there is no such
thing as "native" endianess for a random IP block.

The readl/writel accessors accomodate for that by swapping the
data on big-endian kernels, because most SoC designers tend to
pick little-endian device registers by default.

> This was the premise on which this patch was made.
> 
> So for big endian systems, isn't what we want is to read in big-endian
> without any byteswapping to little-endian? But your saying this breaks
> big-endian ARM systems as well. Am I missing something?

The systems are not a particular endianess, only the current state
of the CPU is, and that may change independent of the way the
hardware block got synthesized. We don't support swapping endianess
at runtime in Linux, but the system normally doesn't care what we
run.

The normal behavior is for the register contents to be read as
little-endian, and then swapped on big-endian kernel builds to
match what the kernel expects.

MIPS is a special case here, because the endianess of the CPU
core is fixed in hardware (or using a strapping pin) and is often
tied to the endianess of all the IP blocks. There are a couple
of other architectures like this (e.g. ARM ixp4xx, but none of the
modern ARM systems).

	Arnd
John Youn May 9, 2016, 9:11 p.m. UTC | #6
On 5/9/2016 1:39 PM, Arnd Bergmann wrote:
> On Monday 09 May 2016 13:22:48 John Youn wrote:
>> On 5/9/2016 3:36 AM, Arnd Bergmann wrote:
>>> On Monday 09 May 2016 10:23:22 Benjamin Herrenschmidt wrote:
>>>> On Sun, 2016-05-08 at 13:44 +0200, Christian Lamparter wrote:
>>>>> On Sunday, May 08, 2016 08:40:55 PM Benjamin Herrenschmidt wrote:
>>>>>>
>>>>>> On Sun, 2016-05-08 at 00:54 +0200, Christian Lamparter via Linuxppc-dev 
>>>>>> wrote:
>>>>>>>
>>>>>>> I've been looking in getting the MyBook Live Duo's USB OTG port
>>>>>>> to function. The SoC is a APM82181. Which has a PowerPC 464 core
>>>>>>> and related to the supported canyonlands architecture in
>>>>>>> arch/powerpc/.
>>>>>>>
>>>>>>> Currently in -next the dwc2 module doesn't load: 
>>>>>> Smells like the APM implementation is little endian. You might need to
>>>>>> use a flag to indicate what endian to use instead and set it
>>>>>> appropriately based on some DT properties.
>>>>> I tried. As per common-properties[0], I added little-endian; but it has no
>>>>> effect. I looked in dwc2_driver_probe and found no way of specifying the
>>>>> endian of the device. It all comes down to the dwc2_readl & dwc2_writel
>>>>> accessors. These - sadly - have been hardwired to use __raw_readl and
>>>>> __raw_writel. So, it's always "native-endian". While common-properties
>>>>> says little-endian should be preferred.
>>>>
>>>> Right, I meant, you should produce a patch adding a runtime test inside
>>>> those functions based on a device-tree property, a bit like we do for
>>>> some of the HCDs like OHCI, EHCI etc...
>>>>
>>>>
>>>
>>> The patch that caused the problem had multiple issues:
>>>
>>> - it broke big-endian ARM kernels: any machine that was working
>>>   correctly with a little-endian kernel is no longer using byteswaps
>>>   on big-endian kernels, which clearly breaks them.
>>
>>
>> I'm a bit confused about how this is supposed to work. My
>> understanding was that the readl() and writel() are defined as little
>> endian. So byte-swapping was performed if the architecture is big
>> endian. And the raw versions never swapped, always using the "native"
>> endianness.
>>
>> dwc2 is always treating the result of readl/writel as if it was read
>> in native endian. So it needs to read the registers in big-endian on
>> big-endian systems.
> 
> The hardware has no idea of what endianess the CPU uses at any
> given time, it's fixed by the SoC design, so there is no such
> thing as "native" endianess for a random IP block.
> 
> The readl/writel accessors accomodate for that by swapping the
> data on big-endian kernels, because most SoC designers tend to
> pick little-endian device registers by default.
> 
>> This was the premise on which this patch was made.
>>
>> So for big endian systems, isn't what we want is to read in big-endian
>> without any byteswapping to little-endian? But your saying this breaks
>> big-endian ARM systems as well. Am I missing something?
> 
> The systems are not a particular endianess, only the current state
> of the CPU is, and that may change independent of the way the
> hardware block got synthesized. We don't support swapping endianess
> at runtime in Linux, but the system normally doesn't care what we
> run.
> 
> The normal behavior is for the register contents to be read as
> little-endian, and then swapped on big-endian kernel builds to
> match what the kernel expects.
> 
> MIPS is a special case here, because the endianess of the CPU
> core is fixed in hardware (or using a strapping pin) and is often
> tied to the endianess of all the IP blocks. There are a couple
> of other architectures like this (e.g. ARM ixp4xx, but none of the
> modern ARM systems).

Ok thanks. What you're saying is clear now.

Is there a standard way to handle this? Must all drivers either check
some endianness configuration or do a runtime check?

Regards,
John
Arnd Bergmann May 9, 2016, 9:30 p.m. UTC | #7
On Monday 09 May 2016 14:11:23 John Youn wrote:
> On 5/9/2016 1:39 PM, Arnd Bergmann wrote:
>
> > The systems are not a particular endianess, only the current state
> > of the CPU is, and that may change independent of the way the
> > hardware block got synthesized. We don't support swapping endianess
> > at runtime in Linux, but the system normally doesn't care what we
> > run.
> > 
> > The normal behavior is for the register contents to be read as
> > little-endian, and then swapped on big-endian kernel builds to
> > match what the kernel expects.
> > 
> > MIPS is a special case here, because the endianess of the CPU
> > core is fixed in hardware (or using a strapping pin) and is often
> > tied to the endianess of all the IP blocks. There are a couple
> > of other architectures like this (e.g. ARM ixp4xx, but none of the
> > modern ARM systems).
> 
> Ok thanks. What you're saying is clear now.
> 
> Is there a standard way to handle this? Must all drivers either check
> some endianness configuration or do a runtime check?

There are four common approaches:

1 hardcode a particular endianess because an IP block is always
  used the same way

2 pick the right endianess at compile-time because we know
  what we are building for (typically by CPU architecture, but
  there are some other ways)

3 do runtime configuration based on a DT property or platform data

4 do automatic runtime configuration based on well-known register
  contents

Approach 1 is the most common, in particular as most IP blocks
are not configurable and are not used on big-endian MIPS machines.

Approach 4 as implemented by Christian Lamparter's patch is the
most reliable way, and probably makes most sense for DesignWare
IP blocks in general, because they are configurable and may
get used in all kinds of systems.

Approaches 2 and 3 are somewhat inferior because you have to
rely on everyone else doing it the way you planned, but 2
is particularly easy and 3 is used when there is no obvious
way to detect endianess of the device.

The patch I suggested would be approach 2, and as Felipe correctly
mentioned, it is not ideal, but it may be more appropriate for
backports to the 4.4-longterm, 4.5-stable and 4.6-stable kernels
than the more elaborate patch we are now discussing.

	Arnd
Benjamin Herrenschmidt May 9, 2016, 10:33 p.m. UTC | #8
On Mon, 2016-05-09 at 13:39 +0300, Felipe Balbi wrote:
> and patch all drivers similarly? Shouldn't arch/mips itself deal with
> it and hide it from drivers ?

Not sure what you mean, but we never had "endian neutral" accessors. It
would be a bit of an endeavour and we already have so many accessors
that adding more need a very strong justification.

Most IP blocks have a fixed endian...

Ben.
Benjamin Herrenschmidt May 9, 2016, 10:37 p.m. UTC | #9
On Mon, 2016-05-09 at 17:08 +0200, Arnd Bergmann wrote:
> 
> Unfortunately, I don't see any way this could be done in MIPS specific
> code: There is typically a byteswap between the internal bus and the PCI
> bus on big-endian MIPS systems, so the PCI MMIO ends up being little-endian,

Ugh ... not exactly, re-watch my talk on the matter :-) While there is
a specific lane wiring to preserve byte addresss, in the end it's the
end device itself that is either BE or LE. Regardless of any "bus
endianness".
 
> which matches the expected behavior of readl/writel. However, drivers
> for non-PCI devices often use the same readl/writel accessors because
> that is how it's done on ARMv6/ARMv7.

Even then, you can have on-SoC (non-PCI) devices that also have a
different endianness from the main CPU. How does it work on ARM for
example ? The device endianness should be fixed, regardless of the
endianness of the core, no ?

> Doing it hardcoded by architecture is just the simplest way to deal
> with it, working on the assumption that nothing actually needs the
> runtime detection that Ben suggested. 

No, it's not an archicture problem. It's a problem specific to that one
SoC that the device was synthetized to be a certain endian while it was
synthetized differently on another SoC... that also happens to be a
different architecture. But doesn't have to.

For example, we had in the past cases of both LE and BE EHCI
implementations on the same architecture (PowerPC).

> Detecting the endianess of the
> device is probably the best future-proof solution, but it's also
> considerably more work to do in the driver, and comes with a
> tiny runtime overhead.

The runtime overhead is probably non-measurable compared with the cost
of the actual MMIOs.

Cheers,
Ben.
Arnd Bergmann May 10, 2016, 7:23 a.m. UTC | #10
On Tuesday 10 May 2016 08:37:52 Benjamin Herrenschmidt wrote:
> On Mon, 2016-05-09 at 17:08 +0200, Arnd Bergmann wrote:
> > 
> > Unfortunately, I don't see any way this could be done in MIPS specific
> > code: There is typically a byteswap between the internal bus and the PCI
> > bus on big-endian MIPS systems, so the PCI MMIO ends up being little-endian,
> 
> Ugh ... not exactly, re-watch my talk on the matter :-) While there is
> a specific lane wiring to preserve byte addresss, in the end it's the
> end device itself that is either BE or LE. Regardless of any "bus
> endianness".

I found your slides on

http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/09/2012-lpc-ref-big-little-endian-herrenschmidt.odp

but there are at least two more twists that you completely missed here:

- Some architectures (e.g. ARMv5 "BE32" mode in IXP4xx, surely some others)
  do not implement big-endian mode by wiring up the data lines between the
  bus and the CPU differently between big- and little-endian mode like
  powerpc and armv7 "BE8" do, but instead they swizzle the *address* lines
  on 8-bit and 16-bit addresses. The effect of that is that normal RAM
  accesses work as expected both ways, and devices that are accessed using
  32-bit MMIO ops never need any byteswap (you actually get "native
  endian") while MMIO with 8 and 16 bit width does something completely
  unexpected and touches the wrong register. Having an explicit byteswap
  on the PCI host bridge gets you the expected addresses again for 8-bit
  cycles but it also means that readl()/writel() again need to swap the
  data.

- Some other architectures (e.g. Broadcom MIPS) apparently are even fancier
  and use a strapping pin on the SoC flips the endianess of the CPU core
  at the same time as all the peripheral MMIO registers, with the intention
  of never requiring any byte swaps. I believe they are implemented careful
  enough to actually get this right, but it confuses the heck out of
  Linux drivers that don't expect this.

> > which matches the expected behavior of readl/writel. However, drivers
> > for non-PCI devices often use the same readl/writel accessors because
> > that is how it's done on ARMv6/ARMv7.
> 
> Even then, you can have on-SoC (non-PCI) devices that also have a
> different endianness from the main CPU. How does it work on ARM for
> example ? The device endianness should be fixed, regardless of the
> endianness of the core, no ?

ARMv6/v7 is uses BE8 mode like powerpc: each peripheral is fixed-endian
and you have to know what it is. Only Freescale managed to put identical
IP blocks on various (powerpc-derived) SoCs and have a subset of them
treat the access as little-endian while others remain big-endian, so all
those drivers now require runtime detection.

> > Doing it hardcoded by architecture is just the simplest way to deal
> > with it, working on the assumption that nothing actually needs the
> > runtime detection that Ben suggested. 
> 
> No, it's not an archicture problem. It's a problem specific to that one
> SoC that the device was synthetized to be a certain endian while it was
> synthetized differently on another SoC... that also happens to be a
> different architecture. But doesn't have to.
> 
> For example, we had in the past cases of both LE and BE EHCI
> implementations on the same architecture (PowerPC).

I understand this, but from what I see in this history of this particular
driver, all ARM and PowerPC implementations chose to use LE registers for
DWC2 because the normal approach for these is to not mess with endianess,
while presumably all MIPS users of the same block wired up the endian-select
line of the IP block to match that of the CPU core, again because it's
what you are expected to do on a MIPS based SoC.

So hardcoding it per architecture would make an assumption based on
the mindset of the SoC designers rather than strict technical differences,
and that can fail as soon as someone does things differently on any of
them (see the Freescale example), but I still think it's the easiest
workaround for backporting to stable kernels. A revert of the original
patch would be even easier, but that would break the one big-endian
MIPS machine we know about.

> > Detecting the endianess of the
> > device is probably the best future-proof solution, but it's also
> > considerably more work to do in the driver, and comes with a
> > tiny runtime overhead.
> 
> The runtime overhead is probably non-measurable compared with the cost
> of the actual MMIOs.

Right. The code size increase is probably measurable (but still small),
the runtime overhead is not.

	Arnd
Unknown sender due to SPF May 12, 2016, 9:58 a.m. UTC | #11
On Tuesday, May 10, 2016 09:23:59 AM Arnd Bergmann wrote:
> On Tuesday 10 May 2016 08:37:52 Benjamin Herrenschmidt wrote:
> > On Mon, 2016-05-09 at 17:08 +0200, Arnd Bergmann wrote:
> > > 
> > > Unfortunately, I don't see any way this could be done in MIPS specific
> > > code: There is typically a byteswap between the internal bus and the PCI
> > > bus on big-endian MIPS systems, so the PCI MMIO ends up being little-endian,
> > 
> > Ugh ... not exactly, re-watch my talk on the matter :-) While there is
> > a specific lane wiring to preserve byte addresss, in the end it's the
> > end device itself that is either BE or LE. Regardless of any "bus
> > endianness".
> 
> I found your slides on
> 
> http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/09/2012-lpc-ref-big-little-endian-herrenschmidt.odp
> 
> but there are at least two more twists that you completely missed here:
> 
> - Some architectures (e.g. ARMv5 "BE32" mode in IXP4xx, surely some others)
>   do not implement big-endian mode by wiring up the data lines between the
>   bus and the CPU differently between big- and little-endian mode like
>   powerpc and armv7 "BE8" do, but instead they swizzle the *address* lines
>   on 8-bit and 16-bit addresses. The effect of that is that normal RAM
>   accesses work as expected both ways, and devices that are accessed using
>   32-bit MMIO ops never need any byteswap (you actually get "native
>   endian") while MMIO with 8 and 16 bit width does something completely
>   unexpected and touches the wrong register. Having an explicit byteswap
>   on the PCI host bridge gets you the expected addresses again for 8-bit
>   cycles but it also means that readl()/writel() again need to swap the
>   data.
> 
> - Some other architectures (e.g. Broadcom MIPS) apparently are even fancier
>   and use a strapping pin on the SoC flips the endianess of the CPU core
>   at the same time as all the peripheral MMIO registers, with the intention
>   of never requiring any byte swaps. I believe they are implemented careful
>   enough to actually get this right, but it confuses the heck out of
>   Linux drivers that don't expect this.
> 
> > > which matches the expected behavior of readl/writel. However, drivers
> > > for non-PCI devices often use the same readl/writel accessors because
> > > that is how it's done on ARMv6/ARMv7.
> > 
> > Even then, you can have on-SoC (non-PCI) devices that also have a
> > different endianness from the main CPU. How does it work on ARM for
> > example ? The device endianness should be fixed, regardless of the
> > endianness of the core, no ?
> 
> ARMv6/v7 is uses BE8 mode like powerpc: each peripheral is fixed-endian
> and you have to know what it is. Only Freescale managed to put identical
> IP blocks on various (powerpc-derived) SoCs and have a subset of them
> treat the access as little-endian while others remain big-endian, so all
> those drivers now require runtime detection.
> 
> > > Doing it hardcoded by architecture is just the simplest way to deal
> > > with it, working on the assumption that nothing actually needs the
> > > runtime detection that Ben suggested. 
> > 
> > No, it's not an archicture problem. It's a problem specific to that one
> > SoC that the device was synthetized to be a certain endian while it was
> > synthetized differently on another SoC... that also happens to be a
> > different architecture. But doesn't have to.
> > 
> > For example, we had in the past cases of both LE and BE EHCI
> > implementations on the same architecture (PowerPC).
> 
> I understand this, but from what I see in this history of this particular
> driver, all ARM and PowerPC implementations chose to use LE registers for
> DWC2 because the normal approach for these is to not mess with endianess,
> while presumably all MIPS users of the same block wired up the endian-select
> line of the IP block to match that of the CPU core, again because it's
> what you are expected to do on a MIPS based SoC.
> 
> So hardcoding it per architecture would make an assumption based on
> the mindset of the SoC designers rather than strict technical differences,
> and that can fail as soon as someone does things differently on any of
> them (see the Freescale example), but I still think it's the easiest
> workaround for backporting to stable kernels. A revert of the original
> patch would be even easier, but that would break the one big-endian
> MIPS machine we know about.
> 
> > > Detecting the endianess of the
> > > device is probably the best future-proof solution, but it's also
> > > considerably more work to do in the driver, and comes with a
> > > tiny runtime overhead.
> > 
> > The runtime overhead is probably non-measurable compared with the cost
> > of the actual MMIOs.
> 
> Right. The code size increase is probably measurable (but still small),
> the runtime overhead is not.

Ok, so no rebuts or complains have been posted.

I've tested the patch you made in: https://lkml.org/lkml/2016/5/9/354
and it works: 

Tested-by: Christian Lamparter <chunkeey@googlemail.com>

So, how do we go from here? There is are two small issues with the
original patch (#ifdef DWC2_LOG_WRITES got converted to lower case:
#ifdef dwc2_log_writes) and I guess a proper subject would be nice.  

Arnd, can you please respin and post it (cc'd stable as well)?
So this is can be picked up? Or what's your plan?

Regards,
Christian
Arnd Bergmann May 12, 2016, 11:55 a.m. UTC | #12
On Thursday 12 May 2016 11:58:18 Christian Lamparter wrote:
> > > > Detecting the endianess of the
> > > > device is probably the best future-proof solution, but it's also
> > > > considerably more work to do in the driver, and comes with a
> > > > tiny runtime overhead.
> > > 
> > > The runtime overhead is probably non-measurable compared with the cost
> > > of the actual MMIOs.
> > 
> > Right. The code size increase is probably measurable (but still small),
> > the runtime overhead is not.
> 
> Ok, so no rebuts or complains have been posted.
> 
> I've tested the patch you made in: https://lkml.org/lkml/2016/5/9/354
> and it works: 
> 
> Tested-by: Christian Lamparter <chunkeey@googlemail.com>
> 
> So, how do we go from here? There is are two small issues with the
> original patch (#ifdef DWC2_LOG_WRITES got converted to lower case:
> #ifdef dwc2_log_writes) and I guess a proper subject would be nice.  
> 
> Arnd, can you please respin and post it (cc'd stable as well)?
> So this is can be picked up? Or what's your plan?

(I just realized my reply was stuck in my outbox, so the patch
went out first)

If I recall correctly, the rough consensus was to go with your longer
patch in the future (fixed up for the comments that BenH and
I sent), and I'd suggest basing it on top of a fixed version of
my patch.

Felipe just had another idea, to change the endianess of the dwc2
block by setting a registers (if that exists). That would indeed
be preferable, then we can just revert the broken change that
went into 4.4 and backport that fix instead.

	Arnd
Unknown sender due to SPF May 12, 2016, 1:30 p.m. UTC | #13
On Thursday, May 12, 2016 01:55:44 PM Arnd Bergmann wrote:
> On Thursday 12 May 2016 11:58:18 Christian Lamparter wrote:
> > > > > Detecting the endianess of the
> > > > > device is probably the best future-proof solution, but it's also
> > > > > considerably more work to do in the driver, and comes with a
> > > > > tiny runtime overhead.
> > > > 
> > > > The runtime overhead is probably non-measurable compared with the cost
> > > > of the actual MMIOs.
> > > 
> > > Right. The code size increase is probably measurable (but still small),
> > > the runtime overhead is not.
> > 
> > Ok, so no rebuts or complains have been posted.
> > 
> > I've tested the patch you made in: https://lkml.org/lkml/2016/5/9/354
> > and it works: 
> > 
> > Tested-by: Christian Lamparter <chunkeey@googlemail.com>
> > 
> > So, how do we go from here? There is are two small issues with the
> > original patch (#ifdef DWC2_LOG_WRITES got converted to lower case:
> > #ifdef dwc2_log_writes) and I guess a proper subject would be nice.  
> > 
> > Arnd, can you please respin and post it (cc'd stable as well)?
> > So this is can be picked up? Or what's your plan?
> 
> (I just realized my reply was stuck in my outbox, so the patch
> went out first)
> 
> If I recall correctly, the rough consensus was to go with your longer
> patch in the future (fixed up for the comments that BenH and
> I sent), and I'd suggest basing it on top of a fixed version of
> my patch.
Well, but it comes with the "overhead"! So this was just as I said:
"Let's look at it and see if it's any good"... And I think it isn't
since the usb/host/ehci people also opted for #ifdef CONFIG_BIG_ENDIAN
archs etc...
 
> Felipe just had another idea, to change the endianess of the dwc2
> block by setting a registers (if that exists). That would indeed
> be preferable, then we can just revert the broken change that
> went into 4.4 and backport that fix instead.
Just a quick reply. I have the docs for the thing. There's something
like that in GAHBCFG at Bit 24... BUT it only switches the endiannes
for the DMA descriptors (which is not always used, there are devices
with PIO only)! It doesn't deal with the MMIO access at all. 

The pin that would select which endian the device uses is probably
connected to a DCR or GPIO but I don't know which or where so this
is more or less useless. (Or the selectable endianness was dropped
during synth).

Regards,
Christian
John Youn May 12, 2016, 6:40 p.m. UTC | #14
On 5/12/2016 6:30 AM, Christian Lamparter wrote:
> On Thursday, May 12, 2016 01:55:44 PM Arnd Bergmann wrote:
>> On Thursday 12 May 2016 11:58:18 Christian Lamparter wrote:
>>>>>> Detecting the endianess of the
>>>>>> device is probably the best future-proof solution, but it's also
>>>>>> considerably more work to do in the driver, and comes with a
>>>>>> tiny runtime overhead.
>>>>>
>>>>> The runtime overhead is probably non-measurable compared with the cost
>>>>> of the actual MMIOs.
>>>>
>>>> Right. The code size increase is probably measurable (but still small),
>>>> the runtime overhead is not.
>>>
>>> Ok, so no rebuts or complains have been posted.
>>>
>>> I've tested the patch you made in: https://lkml.org/lkml/2016/5/9/354
>>> and it works: 
>>>
>>> Tested-by: Christian Lamparter <chunkeey@googlemail.com>
>>>
>>> So, how do we go from here? There is are two small issues with the
>>> original patch (#ifdef DWC2_LOG_WRITES got converted to lower case:
>>> #ifdef dwc2_log_writes) and I guess a proper subject would be nice.  
>>>
>>> Arnd, can you please respin and post it (cc'd stable as well)?
>>> So this is can be picked up? Or what's your plan?
>>
>> (I just realized my reply was stuck in my outbox, so the patch
>> went out first)
>>
>> If I recall correctly, the rough consensus was to go with your longer
>> patch in the future (fixed up for the comments that BenH and
>> I sent), and I'd suggest basing it on top of a fixed version of
>> my patch.
> Well, but it comes with the "overhead"! So this was just as I said:
> "Let's look at it and see if it's any good"... And I think it isn't
> since the usb/host/ehci people also opted for #ifdef CONFIG_BIG_ENDIAN
> archs etc...

I slightly prefer the more general patch for future kernel versions.
The overhead will probably be negligible, but we can perform some
testing to make sure.

Can you resubmit with all gathered feedback?

>  
>> Felipe just had another idea, to change the endianess of the dwc2
>> block by setting a registers (if that exists). That would indeed
>> be preferable, then we can just revert the broken change that
>> went into 4.4 and backport that fix instead.
> Just a quick reply. I have the docs for the thing. There's something
> like that in GAHBCFG at Bit 24... BUT it only switches the endiannes
> for the DMA descriptors (which is not always used, there are devices
> with PIO only)! It doesn't deal with the MMIO access at all. 

That's correct. It only affects descriptor endianness for DMA
descriptor mode of operation.

Regards,
John
Unknown sender due to SPF May 12, 2016, 8:39 p.m. UTC | #15
On Thursday, May 12, 2016 11:40:28 AM John Youn wrote:
> On 5/12/2016 6:30 AM, Christian Lamparter wrote:
> > On Thursday, May 12, 2016 01:55:44 PM Arnd Bergmann wrote:
> >> On Thursday 12 May 2016 11:58:18 Christian Lamparter wrote:
> >>>>>> Detecting the endianess of the
> >>>>>> device is probably the best future-proof solution, but it's also
> >>>>>> considerably more work to do in the driver, and comes with a
> >>>>>> tiny runtime overhead.
> >>>>>
> >>>>> The runtime overhead is probably non-measurable compared with the cost
> >>>>> of the actual MMIOs.
> >>>>
> >>>> Right. The code size increase is probably measurable (but still small),
> >>>> the runtime overhead is not.
> >>>
> >>> Ok, so no rebuts or complains have been posted.
> >>>
> >>> I've tested the patch you made in: https://lkml.org/lkml/2016/5/9/354
> >>> and it works: 
> >>>
> >>> Tested-by: Christian Lamparter <chunkeey@googlemail.com>
> >>>
> >>> So, how do we go from here? There is are two small issues with the
> >>> original patch (#ifdef DWC2_LOG_WRITES got converted to lower case:
> >>> #ifdef dwc2_log_writes) and I guess a proper subject would be nice.  
> >>>
> >>> Arnd, can you please respin and post it (cc'd stable as well)?
> >>> So this is can be picked up? Or what's your plan?
> >>
> >> (I just realized my reply was stuck in my outbox, so the patch
> >> went out first)
> >>
> >> If I recall correctly, the rough consensus was to go with your longer
> >> patch in the future (fixed up for the comments that BenH and
> >> I sent), and I'd suggest basing it on top of a fixed version of
> >> my patch.
> > Well, but it comes with the "overhead"! So this was just as I said:
> > "Let's look at it and see if it's any good"... And I think it isn't
> > since the usb/host/ehci people also opted for #ifdef CONFIG_BIG_ENDIAN
> > archs etc...
> 
> I slightly prefer the more general patch for future kernel versions.
> The overhead will probably be negligible, but we can perform some
> testing to make sure.
> 
> Can you resubmit with all gathered feedback?
Yes I think I can do that. But I would really like to get the
regression out of the way. So for that: I back Arnd's patch.
It explains the problem much better and doesn't kill MIPS 
like the revert I was doing in my initial post to the MLs.
Also, another bonus: his patch is suited to port to stable.

The auto-detection approach is not that easy to get right,
given all the stuff that's going on with BE8, LE4, ... So
can we have your "blessing" for Arnd's patch for now? since
that way, I can base my patch on top of his work about the
issues of endiannes? (Just say: ACK :) )

Arnd: do you have a version with the #ifdef lower/uppercase
fix? Or should I give it a try (and fail in a different way ;) )
 
> >> Felipe just had another idea, to change the endianess of the dwc2
> >> block by setting a registers (if that exists). That would indeed
> >> be preferable, then we can just revert the broken change that
> >> went into 4.4 and backport that fix instead.
> > Just a quick reply. I have the docs for the thing. There's something
> > like that in GAHBCFG at Bit 24... BUT it only switches the endiannes
> > for the DMA descriptors (which is not always used, there are devices
> > with PIO only)! It doesn't deal with the MMIO access at all. 
> 
> That's correct. It only affects descriptor endianness for DMA
> descriptor mode of operation.

Ok. The funny thing is that for the MyBook Live Duo this setting might
be important since the PLB_DMA engine is not part of the DWC library...
Instead it's from IBM and operates in: Big Endian :-D.

Regards,
Christian
Arnd Bergmann May 12, 2016, 8:50 p.m. UTC | #16
On Thursday 12 May 2016 22:39:36 Christian Lamparter wrote:
> On Thursday, May 12, 2016 11:40:28 AM John Youn wrote:
> > On 5/12/2016 6:30 AM, Christian Lamparter wrote:
> > > On Thursday, May 12, 2016 01:55:44 PM Arnd Bergmann wrote:
> > >>
> > >> If I recall correctly, the rough consensus was to go with your longer
> > >> patch in the future (fixed up for the comments that BenH and
> > >> I sent), and I'd suggest basing it on top of a fixed version of
> > >> my patch.
> > > Well, but it comes with the "overhead"! So this was just as I said:
> > > "Let's look at it and see if it's any good"... And I think it isn't
> > > since the usb/host/ehci people also opted for #ifdef CONFIG_BIG_ENDIAN
> > > archs etc...
> > 
> > I slightly prefer the more general patch for future kernel versions.
> > The overhead will probably be negligible, but we can perform some
> > testing to make sure.
> > 
> > Can you resubmit with all gathered feedback?
> Yes I think I can do that. But I would really like to get the
> regression out of the way. So for that: I back Arnd's patch.
> It explains the problem much better and doesn't kill MIPS 
> like the revert I was doing in my initial post to the MLs.
> Also, another bonus: his patch is suited to port to stable.
> 
> The auto-detection approach is not that easy to get right,
> given all the stuff that's going on with BE8, LE4, ... So
> can we have your "blessing" for Arnd's patch for now? since
> that way, I can base my patch on top of his work about the
> issues of endiannes? (Just say: ACK  )
> 
> Arnd: do you have a version with the #ifdef lower/uppercase
> fix? Or should I give it a try (and fail in a different way  )

I've already fixed it up locally, will send the latest version
so it's out there, whether Felipe takes it or not.

	Arnd
John Youn May 12, 2016, 8:55 p.m. UTC | #17
On 5/12/2016 1:39 PM, Christian Lamparter wrote:
> On Thursday, May 12, 2016 11:40:28 AM John Youn wrote:
>> On 5/12/2016 6:30 AM, Christian Lamparter wrote:
>>> On Thursday, May 12, 2016 01:55:44 PM Arnd Bergmann wrote:
>>>> On Thursday 12 May 2016 11:58:18 Christian Lamparter wrote:
>>>>>>>> Detecting the endianess of the
>>>>>>>> device is probably the best future-proof solution, but it's also
>>>>>>>> considerably more work to do in the driver, and comes with a
>>>>>>>> tiny runtime overhead.
>>>>>>>
>>>>>>> The runtime overhead is probably non-measurable compared with the cost
>>>>>>> of the actual MMIOs.
>>>>>>
>>>>>> Right. The code size increase is probably measurable (but still small),
>>>>>> the runtime overhead is not.
>>>>>
>>>>> Ok, so no rebuts or complains have been posted.
>>>>>
>>>>> I've tested the patch you made in: https://lkml.org/lkml/2016/5/9/354
>>>>> and it works: 
>>>>>
>>>>> Tested-by: Christian Lamparter <chunkeey@googlemail.com>
>>>>>
>>>>> So, how do we go from here? There is are two small issues with the
>>>>> original patch (#ifdef DWC2_LOG_WRITES got converted to lower case:
>>>>> #ifdef dwc2_log_writes) and I guess a proper subject would be nice.  
>>>>>
>>>>> Arnd, can you please respin and post it (cc'd stable as well)?
>>>>> So this is can be picked up? Or what's your plan?
>>>>
>>>> (I just realized my reply was stuck in my outbox, so the patch
>>>> went out first)
>>>>
>>>> If I recall correctly, the rough consensus was to go with your longer
>>>> patch in the future (fixed up for the comments that BenH and
>>>> I sent), and I'd suggest basing it on top of a fixed version of
>>>> my patch.
>>> Well, but it comes with the "overhead"! So this was just as I said:
>>> "Let's look at it and see if it's any good"... And I think it isn't
>>> since the usb/host/ehci people also opted for #ifdef CONFIG_BIG_ENDIAN
>>> archs etc...
>>
>> I slightly prefer the more general patch for future kernel versions.
>> The overhead will probably be negligible, but we can perform some
>> testing to make sure.
>>
>> Can you resubmit with all gathered feedback?
> Yes I think I can do that. But I would really like to get the
> regression out of the way. So for that: I back Arnd's patch.
> It explains the problem much better and doesn't kill MIPS 
> like the revert I was doing in my initial post to the MLs.
> Also, another bonus: his patch is suited to port to stable.
> 
> The auto-detection approach is not that easy to get right,
> given all the stuff that's going on with BE8, LE4, ... So
> can we have your "blessing" for Arnd's patch for now? since
> that way, I can base my patch on top of his work about the
> issues of endiannes? (Just say: ACK :) )
> 

I agree Arnd's patch is best for stable. We can also apply it to
mainline until we get the autodection working as well.

Unless Felipe has objections.


> Arnd: do you have a version with the #ifdef lower/uppercase
> fix? Or should I give it a try (and fail in a different way ;) )
>  
>>>> Felipe just had another idea, to change the endianess of the dwc2
>>>> block by setting a registers (if that exists). That would indeed
>>>> be preferable, then we can just revert the broken change that
>>>> went into 4.4 and backport that fix instead.
>>> Just a quick reply. I have the docs for the thing. There's something
>>> like that in GAHBCFG at Bit 24... BUT it only switches the endiannes
>>> for the DMA descriptors (which is not always used, there are devices
>>> with PIO only)! It doesn't deal with the MMIO access at all. 
>>
>> That's correct. It only affects descriptor endianness for DMA
>> descriptor mode of operation.
> 
> Ok. The funny thing is that for the MyBook Live Duo this setting might
> be important since the PLB_DMA engine is not part of the DWC library...
> Instead it's from IBM and operates in: Big Endian :-D.
> 

Are you sure the controller is using descriptor DMA? It's more likely
using buffer DMA which this setting doesn't affect. DWC2 doesn't
support Descriptor DMA in device mode on mainline yet. If it's a host
then it might be.

Regards,
John
Benjamin Herrenschmidt May 12, 2016, 10:17 p.m. UTC | #18
On Thu, 2016-05-12 at 11:58 +0200, Christian Lamparter wrote:
> 
> > http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/09/2012-lpc-ref-big-little-endian-herrenschmidt.odp
> > 
> > but there are at least two more twists that you completely missed here:
> > 
> > - Some architectures (e.g. ARMv5 "BE32" mode in IXP4xx, surely some others)
> >   do not implement big-endian mode by wiring up the data lines between the
> >   bus and the CPU differently between big- and little-endian mode like
> >   powerpc and armv7 "BE8" do, but instead they swizzle the *address* lines
> >   on 8-bit and 16-bit addresses. The effect of that is that normal RAM
> >   accesses work as expected both ways, and devices that are accessed using
> >   32-bit MMIO ops never need any byteswap (you actually get "native
> >   endian") while MMIO with 8 and 16 bit width does something completely
> >   unexpected and touches the wrong register. Having an explicit byteswap
> >   on the PCI host bridge gets you the expected addresses again for 8-bit
> >   cycles but it also means that readl()/writel() again need to swap the
> >   data.

Right. Old PowerPC did that too and it's completely stupid. Thankfully
most vendors grew a clue since then and this practice has slowly fallen
into oblivion.

> > - Some other architectures (e.g. Broadcom MIPS) apparently are even fancier
> >   and use a strapping pin on the SoC flips the endianess of the CPU core
> >   at the same time as all the peripheral MMIO registers, with the intention
> >   of never requiring any byte swaps. I believe they are implemented careful
> >   enough to actually get this right, but it confuses the heck out of
> >   Linux drivers that don't expect this.

Right. Drivers like that will need an explicit test in the accessors.

Cheers,
Ben.
diff mbox

Patch

diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
index 3c58d633ce80..1f8ed149a40f 100644
--- a/drivers/usb/dwc2/core.h
+++ b/drivers/usb/dwc2/core.h
@@ -64,12 +64,24 @@ 
 	DWC2_TRACE_SCHEDULER_VB(pr_fmt("%s: SCH: " fmt),		\
 				dev_name(hsotg->dev), ##__VA_ARGS__)
 
+
+#ifdef CONFIG_MIPS
+/*
+ * There are some MIPS machines that can run in either big-endian
+ * or little-endian mode and that use the dwc2 register without
+ * a byteswap in both ways.
+ * Unlike other architectures, MIPS does not require a barrier
+ * before the __raw_writel() to synchronize with DMA but does
+ * require the barrier after the writel() to serialize a series
+ * of writes. This set of operations was added specifically for
+ * MIPS and should only be used there.
+ */
 static inline u32 dwc2_readl(const void __iomem *addr)
 {
 	u32 value = __raw_readl(addr);
 
-	/* In order to preserve endianness __raw_* operation is used. Therefore
-	 * a barrier is needed to ensure IO access is not re-ordered across
+	/* in order to preserve endianness __raw_* operation is used. therefore
+	 * a barrier is needed to ensure io access is not re-ordered across
 	 * reads or writes
 	 */
 	mb();
@@ -81,15 +93,32 @@  static inline void dwc2_writel(u32 value, void __iomem *addr)
 	__raw_writel(value, addr);
 
 	/*
-	 * In order to preserve endianness __raw_* operation is used. Therefore
-	 * a barrier is needed to ensure IO access is not re-ordered across
+	 * in order to preserve endianness __raw_* operation is used. therefore
+	 * a barrier is needed to ensure io access is not re-ordered across
 	 * reads or writes
 	 */
 	mb();
-#ifdef DWC2_LOG_WRITES
-	pr_info("INFO:: wrote %08x to %p\n", value, addr);
+#ifdef dwc2_log_writes
+	pr_info("info:: wrote %08x to %p\n", value, addr);
 #endif
 }
+#else
+/* Normal architectures just use readl/write */
+static inline u32 dwc2_readl(const void __iomem *addr)
+{
+	u32 value = readl(addr);
+	return value;
+}
+
+static inline void dwc2_writel(u32 value, void __iomem *addr)
+{
+	writel(value, addr);
+
+#ifdef dwc2_log_writes
+	pr_info("info:: wrote %08x to %p\n", value, addr);
+#endif
+}
+#endif
 
 /* Maximum number of Endpoints/HostChannels */
 #define MAX_EPS_CHANNELS	16