diff mbox

arc: use little endian accesses

Message ID 1457544064-16167-1-git-send-email-ltrimas@synopsys.com
State Accepted
Headers show

Commit Message

Lada Trimasova March 9, 2016, 5:21 p.m. UTC
Memory access primitives should use cpu_to_le16, cpu_to_le32, le16_to_cpu
and le32_to_cpu because it is not really guaranteed that drivers handles
any ordering themselves.
For example, serial port driver doesn't work when kernel is build for
arc big endian architecture.

Signed-off-by: Lada Trimasova <ltrimas@synopsys.com>
Cc: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/io.h | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

Comments

Vineet Gupta March 10, 2016, 5:05 a.m. UTC | #1
+CC Noam

On Wednesday 09 March 2016 10:51 PM, Lada Trimasova wrote:
> Memory access primitives should use cpu_to_le16, cpu_to_le32, le16_to_cpu
> and le32_to_cpu because it is not really guaranteed that drivers handles
> any ordering themselves.

That is the driver issue. readxx as API simply returns data in native endianness.
We've had EZChip running big endian and so far and they didn't need this change.

> For example, serial port driver doesn't work when kernel is build for
> arc big endian architecture.

Last I tested Big Endian on SDP with 8250 part + 8250 driver it was working fine.
I presume this is the systemC model for device and standard 8250 driver and very
likely the model is not fixed endian or something.

Alexey knows about this stuff - this was discussed on lkml back in 2013 when he
was fighting the Xilinx systemAce driver endian issues

> Signed-off-by: Lada Trimasova <ltrimas@synopsys.com>

Sorry NACK on this ! If you still think we need it I need more data / details on
what exactly is failing in 8250 and how !

-Vineet

> Cc: Alexey Brodkin <abrodkin@synopsys.com>
> Cc: Vineet Gupta <vgupta@synopsys.com>
> ---
>  arch/arc/include/asm/io.h | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arc/include/asm/io.h b/arch/arc/include/asm/io.h
> index 694ece8..0b3d5ea 100644
> --- a/arch/arc/include/asm/io.h
> +++ b/arch/arc/include/asm/io.h
> @@ -129,15 +129,17 @@ static inline void __raw_writel(u32 w, volatile void __iomem *addr)
>  #define writel(v,c)		({ __iowmb(); writel_relaxed(v,c); })
>  
>  /*
> - * Relaxed API for drivers which can handle any ordering themselves
> + * This are defined to perform little endian accesses
>   */
>  #define readb_relaxed(c)	__raw_readb(c)
> -#define readw_relaxed(c)	__raw_readw(c)
> -#define readl_relaxed(c)	__raw_readl(c)
> +#define readw_relaxed(c) ({ u16 __r = le16_to_cpu((__force __le16) \
> +					__raw_readw(c)); __r; })
> +#define readl_relaxed(c) ({ u32 __r = le32_to_cpu((__force __le32) \
> +					__raw_readl(c)); __r; })
>  
>  #define writeb_relaxed(v,c)	__raw_writeb(v,c)
> -#define writew_relaxed(v,c)	__raw_writew(v,c)
> -#define writel_relaxed(v,c)	__raw_writel(v,c)
> +#define writew_relaxed(v,c)	__raw_writew((__force u16) cpu_to_le16(v),c)
> +#define writel_relaxed(v,c)	__raw_writel((__force u32) cpu_to_le32(v),c)
>  
>  #include <asm-generic/io.h>
>
Vineet Gupta March 10, 2016, 5:19 a.m. UTC | #2
On Thursday 10 March 2016 10:35 AM, Vineet Gupta wrote:
> +CC Noam
>
> On Wednesday 09 March 2016 10:51 PM, Lada Trimasova wrote:
>> > Memory access primitives should use cpu_to_le16, cpu_to_le32, le16_to_cpu
>> > and le32_to_cpu because it is not really guaranteed that drivers handles
>> > any ordering themselves.
> That is the driver issue. readxx as API simply returns data in native endianness.
> We've had EZChip running big endian and so far and they didn't need this change.
>
>> > For example, serial port driver doesn't work when kernel is build for
>> > arc big endian architecture.
> Last I tested Big Endian on SDP with 8250 part + 8250 driver it was working fine.
> I presume this is the systemC model for device and standard 8250 driver and very
> likely the model is not fixed endian or something.
>
> Alexey knows about this stuff - this was discussed on lkml back in 2013 when he
> was fighting the Xilinx systemAce driver endian issues
>

Do you need "native-endian" DT entry in nsimosci DT bindings for uart ?
Alexey Brodkin March 10, 2016, 7:44 a.m. UTC | #3
Hi Vineet,

On Thu, 2016-03-10 at 05:05 +0000, Vineet Gupta wrote:
> +CC Noam
> 
> On Wednesday 09 March 2016 10:51 PM, Lada Trimasova wrote:
> > 
> > Memory access primitives should use cpu_to_le16, cpu_to_le32, le16_to_cpu
> > and le32_to_cpu because it is not really guaranteed that drivers handles
> > any ordering themselves.
> That is the driver issue. readxx as API simply returns data in native endianness.
> We've had EZChip running big endian and so far and they didn't need this change.

Let me disagree with you here.
See what is said in "include/asm-generic/io.h":
---------------------->8---------------------
/*
 * __raw_{read,write}{b,w,l,q}() access memory in native endianness.
 *
 * On some architectures memory mapped IO needs to be accessed differently.
 * On the simple architectures, we just read/write the memory location
 * directly.
 */

...

/*
 * {read,write}{b,w,l,q}() access little endian memory and return result in
 * native endianness.
 */
---------------------->8---------------------

And that's an implementation we have for ARC:
---------------------->8---------------------
#define readb(c)		({ u8  __v = readb_relaxed(c); __iormb(); __v; })
#define readw(c)		({ u16 __v = readw_relaxed(c); __iormb(); __v; })
#define readl(c)		({ u32 __v = readl_relaxed(c); __iormb(); __v; })

#define writeb(v,c)		({ __iowmb(); writeb_relaxed(v,c); })
#define writew(v,c)		({ __iowmb(); writew_relaxed(v,c); })
#define writel(v,c)		({ __iowmb(); writel_relaxed(v,c); })

/*
 * Relaxed API for drivers which can handle any ordering themselves
 */
#define readb_relaxed(c)	__raw_readb(c)
#define readw_relaxed(c)	__raw_readw(c)
#define readl_relaxed(c)	__raw_readl(c)

#define writeb_relaxed(v,c)	__raw_writeb(v,c)
#define writew_relaxed(v,c)	__raw_writew(v,c)
#define writel_relaxed(v,c)	__raw_writel(v,c)
---------------------->8---------------------

Which is effectively (related to endianess discussion):
---------------------->8---------------------
#define readX(c)	__raw_readX(c)
#define writeX(v,c)	__raw_writeX(v,c)
---------------------->8---------------------

That looks IMHO incorrect if we read API description in "include/asm-generic/io.h".
BTW description of {read,write}{b,w,l,q}() is a bit misleading in part saying
"... and return result in __native_endianness__".

But real implementation of {read,write}{b,w,l,q}() in "include/asm-generic/io.h"
really shows what was meant - note __leXX_to_cpu() and cpu_to_leXX are used.

> > 
> > For example, serial port driver doesn't work when kernel is build for
> > arc big endian architecture.
> Last I tested Big Endian on SDP with 8250 part + 8250 driver it was working fine.
> I presume this is the systemC model for device and standard 8250 driver and very
> likely the model is not fixed endian or something.

Model is indeed little-endian. We build it only once and than changing
only "nsim_isa_big_endian" property (which changes only CPU core endianess) may use
it equally well with little- and big-endian builds of Linux kernel.

-Alexey
Arnd Bergmann March 10, 2016, 7:45 a.m. UTC | #4
On Thursday 10 March 2016, Vineet Gupta wrote:
> On Wednesday 09 March 2016 10:51 PM, Lada Trimasova wrote:
> > Memory access primitives should use cpu_to_le16, cpu_to_le32, le16_to_cpu
> > and le32_to_cpu because it is not really guaranteed that drivers handles
> > any ordering themselves.
> 
> That is the driver issue. readxx as API simply returns data in native endianness.
> We've had EZChip running big endian and so far and they didn't need this change.

Most drivers using readl() or readl_relaxed() expect those to perform byte
swaps on big-endian architectures, as the registers tend to be fixed endian,
so the change seems reasonable.

It depends a little bit on how endian mode is implemented in the CPU: do you
follow the normal model of swapping byte order in the load/store instructions
of the CPU when running big-endian, or is the CPU always running little-endian
but the bus addresses get mangled on byte accesses to give the illusion
of a big-endian system?

	Arnd
Vineet Gupta March 10, 2016, 9:55 a.m. UTC | #5
On Thursday 10 March 2016 01:14 PM, Alexey Brodkin wrote:
>>> For example, serial port driver doesn't work when kernel is build for
>>> arc big endian architecture.
>> Last I tested Big Endian on SDP with 8250 part + 8250 driver it was working fine.
>> I presume this is the systemC model for device and standard 8250 driver and very
>> likely the model is not fixed endian or something.
> Model is indeed little-endian. We build it only once and than changing
> only "nsim_isa_big_endian" property (which changes only CPU core endianess) may use
> it equally well with little- and big-endian builds of Linux kernel.

Can you or Lada provide more details as to exactly what is not working - what
driver to be precise 8250 or dw-8250.
And where exactly the failure shows up. I want to understand this more
Noam told me off list that he has no issues with both big endian ARC + 8250 in
systemc model or silicon.

-Vineet
Lada Trimasova March 10, 2016, 6:57 p.m. UTC | #6
Hi Vineet, Alexey, Arnd,
On Thu, 2016-03-10 at 09:55 +0000, Vineet Gupta wrote:

Can you or Lada provide more details as to exactly what is not working - what
driver to be precise 8250 or dw-8250.
And where exactly the failure shows up. I want to understand this more
Noam told me off list that he has no issues with both big endian ARC + 8250 in
systemc model or silicon.

-Vineet



Driver is 8250, kernel is built for BE arc, nsim option in model "nsim_isa_big_endian = 1".

With current "readl" and "writel" implementation for ARC we read word from memory without any endianess manipulation. So in case of little endian architecture we get what we want: the first memory byte is the low byte in the word. But in case of big endian architecture the word endianess is swapped: the first memory byte is the high word byte.

And for example, let's see what happens when we use "readl" in  function "serial8250_early_in" in driver/tty/serial/8250.

Take a look to one line from memory dump:
0xf0000010: 0x0b    0x00    0x00    0x00    0x60    0x00    0x00    0x00

When kernel is built for little endian architecture, we read this data in status register in function "serial_putc" using "readl" function in driver/tty/serial/8250 as:
r0: 0x0000006
The low byte is 0x0b, so the condition "if ((status & BOTH_EMPTY) == BOTH_EMPTY)"  is true, as BOTH_EMPTY is some mask with low bytes set.

When kernel is built with "CPU_BIG_ENDIAN" and model nsim option is "nsim_isa_big_endian=1", we read this data as:
r0: 0x6000000
So as you can see the low byte is 0x00 and above mentioned condition never becomes true, we can't continue initialization.


Regards,
Lada
Arnd Bergmann March 10, 2016, 7:23 p.m. UTC | #7
On Thursday 10 March 2016, Lada Trimasova wrote:
> Driver is 8250, kernel is built for BE arc, nsim option in model "nsim_isa_big_endian = 1".
> 
> With current "readl" and "writel" implementation for ARC we read word from memory without any endianess manipulation. So in case of little endian architecture we get what we want: the first memory byte is the low byte in the word. But in case of big endian architecture the word endianess is swapped: the first memory byte is the high word byte.
> 
> And for example, let's see what happens when we use "readl" in  function "serial8250_early_in" in driver/tty/serial/8250.
> 
> Take a look to one line from memory dump:
> 0xf0000010: 0x0b    0x00    0x00    0x00    0x60    0x00    0x00    0x00
> 
> When kernel is built for little endian architecture, we read this data in status register in function "serial_putc" using "readl" function in driver/tty/serial/8250 as:
> r0: 0x0000006
> The low byte is 0x0b, so the condition "if ((status & BOTH_EMPTY) == BOTH_EMPTY)"  is true, as BOTH_EMPTY is some mask with low bytes set.
> 
> When kernel is built with "CPU_BIG_ENDIAN" and model nsim option is "nsim_isa_big_endian=1", we read this data as:
> r0: 0x6000000
> So as you can see the low byte is 0x00 and above mentioned condition never becomes true, we can't continue initialization.
> 

Ok, this sounds like a completely normal architecture implementation then,
and your patch looks correct.

If some other driver breaks because of this change, you should investigate
what went wrong there, and treat it as a driver specific problem.

	Arnd
Noam Camus March 11, 2016, 5:07 a.m. UTC | #8
> From: Lada Trimasova <Lada.Trimasova@synopsys.com>
> Sent: Thursday, March 10, 2016 8:57 PM

> And for example, let's see what happens when we use "readl" in  function "serial8250_early_in" in driver/tty/serial/8250.

Is your DTS entry includes for serial node following entries:
reg-io-width = <4>
native-endian;

if answer is yes you should expect (in case of big endian compilation) that:
port->iotype would be equal to UPIO_MEM32BE, hence no readl() use for this case.

This is how it works for me here, it sounds like your case is similar.

Regards,
Noam
Vineet Gupta March 11, 2016, 5:18 a.m. UTC | #9
On Thursday 10 March 2016 01:15 PM, Arnd Bergmann wrote:
> On Thursday 10 March 2016, Vineet Gupta wrote:
>> On Wednesday 09 March 2016 10:51 PM, Lada Trimasova wrote:
>>> Memory access primitives should use cpu_to_le16, cpu_to_le32, le16_to_cpu
>>> and le32_to_cpu because it is not really guaranteed that drivers handles
>>> any ordering themselves.
>> That is the driver issue. readxx as API simply returns data in native endianness.
>> We've had EZChip running big endian and so far and they didn't need this change.
> Most drivers using readl() or readl_relaxed() expect those to perform byte
> swaps on big-endian architectures, as the registers tend to be fixed endian,
> so the change seems reasonable.
>
> It depends a little bit on how endian mode is implemented in the CPU: do you
> follow the normal model of swapping byte order in the load/store instructions
> of the CPU when running big-endian, or is the CPU always running little-endian
> but the bus addresses get mangled on byte accesses to give the illusion
> of a big-endian system?

OK I got the response from hardware guys that we follow the normal mode of
swapping byte order for big-endian mode. Arnd can u please explain how that might
impact the io accessors, it at all.

And what exactly are semantics of readX() and ioreadX() - even if arch specific
and I'd be glad to change that for ARC.

I can also help with documenting them properly some where as went digging into
mailing list first thing Lada posted this patch :-)

Thx,
-Vineet
Vineet Gupta March 11, 2016, 12:44 p.m. UTC | #10
On Friday 11 March 2016 12:54 AM, Arnd Bergmann wrote:
> On Thursday 10 March 2016, Lada Trimasova wrote:
>> Driver is 8250, kernel is built for BE arc, nsim option in model "nsim_isa_big_endian = 1".
>>
>> With current "readl" and "writel" implementation for ARC we read word from memory without any endianess manipulation. So in case of little endian architecture we get what we want: the first memory byte is the low byte in the word. But in case of big endian architecture the word endianess is swapped: the first memory byte is the high word byte.
>>
>> And for example, let's see what happens when we use "readl" in  function "serial8250_early_in" in driver/tty/serial/8250.
>>
>> Take a look to one line from memory dump:
>> 0xf0000010: 0x0b    0x00    0x00    0x00    0x60    0x00    0x00    0x00
>>
>> When kernel is built for little endian architecture, we read this data in status register in function "serial_putc" using "readl" function in driver/tty/serial/8250 as:
>> r0: 0x0000006
>> The low byte is 0x0b, so the condition "if ((status & BOTH_EMPTY) == BOTH_EMPTY)"  is true, as BOTH_EMPTY is some mask with low bytes set.
>>
>> When kernel is built with "CPU_BIG_ENDIAN" and model nsim option is "nsim_isa_big_endian=1", we read this data as:
>> r0: 0x6000000
>> So as you can see the low byte is 0x00 and above mentioned condition never becomes true, we can't continue initialization.
>>
> Ok, this sounds like a completely normal architecture implementation then,
> and your patch looks correct.
>
> If some other driver breaks because of this change, you should investigate
> what went wrong there, and treat it as a driver specific problem.

Although I believe Arnd the most, I was not convinced abt this (see below why) so
did a bit of investigation and looks like this patch is indeed correct and Arnd as
always is right :-)

The current upstream kernel doesn't boot on BE configured AXS103 is stuck
similarly in uart. With this patch it comes up fine. My trouble was I'd seen BE
work (w/o this patch) in our internal 3.18 branch. Turns out that in 3.18, ARC
io.h used the asm-generic version of readX(). My patch b8a033023994 ("ARCv2:
barriers") to upstream introduced the io barriers for ARCHS and in the process
introduced our own readX() accessors, which were not the same as correct
asm-generic ver. Thus I inadvertently introduced this bug.

@Lada I will fix up the changelog to add some of the background behind this
change, and mark this for stable backport as well.

Thx,
-Vineet

>
> 	Arnd
>
diff mbox

Patch

diff --git a/arch/arc/include/asm/io.h b/arch/arc/include/asm/io.h
index 694ece8..0b3d5ea 100644
--- a/arch/arc/include/asm/io.h
+++ b/arch/arc/include/asm/io.h
@@ -129,15 +129,17 @@  static inline void __raw_writel(u32 w, volatile void __iomem *addr)
 #define writel(v,c)		({ __iowmb(); writel_relaxed(v,c); })
 
 /*
- * Relaxed API for drivers which can handle any ordering themselves
+ * This are defined to perform little endian accesses
  */
 #define readb_relaxed(c)	__raw_readb(c)
-#define readw_relaxed(c)	__raw_readw(c)
-#define readl_relaxed(c)	__raw_readl(c)
+#define readw_relaxed(c) ({ u16 __r = le16_to_cpu((__force __le16) \
+					__raw_readw(c)); __r; })
+#define readl_relaxed(c) ({ u32 __r = le32_to_cpu((__force __le32) \
+					__raw_readl(c)); __r; })
 
 #define writeb_relaxed(v,c)	__raw_writeb(v,c)
-#define writew_relaxed(v,c)	__raw_writew(v,c)
-#define writel_relaxed(v,c)	__raw_writel(v,c)
+#define writew_relaxed(v,c)	__raw_writew((__force u16) cpu_to_le16(v),c)
+#define writel_relaxed(v,c)	__raw_writel((__force u32) cpu_to_le32(v),c)
 
 #include <asm-generic/io.h>