Patchwork [U-Boot,RFC] ARM: byteorder: add optimized swab16 and swab32

login
register
mail settings
Submitter Dirk Behme
Date May 10, 2013, 4:23 p.m.
Message ID <1368203000-3386-1-git-send-email-dirk.behme@gmail.com>
Download mbox | patch
Permalink /patch/243022/
State Rejected
Delegated to: Albert ARIBAUD
Headers show

Comments

Dirk Behme - May 10, 2013, 4:23 p.m.
Use the specialized ARM instructions for swapping 16 and 32 bit values
instead of using the generic ones from include/linux/byteorder/swab.h.

The x86 version in arch/x86/include/asm/byteorder.h was taken as an
example for this.

E.g. for the mx6qsabrelite target this results in ~4k less code.

Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
---

This patch is marked as RFC due to two questions:

1) These ARM instructions are available in ARMv6 and above. Do we have to limit
   this code somehow to >= ARMv6? I couldn't find any #define to do this in
   U-Boot (?)

2) I'm not sure about the rev16. The rev16 instruction swaps *both* half words,
   the lower _and_ the upper one: 

   REV16 Rd, Rm => Rd[15:8] := Rm[7:0], Rd[7:0] := Rm[15:8], Rd[31:24] := Rm[23:16], Rd[23:16] := Rm[31:24]

   I'm not sure if we want this? Most probably we only want to swap the lower
   half word with swap16()?

 arch/arm/include/asm/byteorder.h |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)
Måns Rullgård - May 10, 2013, 4:56 p.m.
Dirk Behme <dirk.behme@gmail.com> writes:

> Use the specialized ARM instructions for swapping 16 and 32 bit values
> instead of using the generic ones from include/linux/byteorder/swab.h.
>
> The x86 version in arch/x86/include/asm/byteorder.h was taken as an
> example for this.
>
> E.g. for the mx6qsabrelite target this results in ~4k less code.

Which compiler are you using?  GCC 4.5 and later recognises the byteswap
pattern and emits these instructions automatically.  This gives better
code than the inline asm since it allows the compiler to do proper
instruction scheduling, and in the 16-bit case it avoids extraneous
masking.
Dirk Behme - May 10, 2013, 5:42 p.m.
Am 10.05.2013 18:56, schrieb Måns Rullgård:
> Dirk Behme <dirk.behme@gmail.com> writes:
>
>> Use the specialized ARM instructions for swapping 16 and 32 bit values
>> instead of using the generic ones from include/linux/byteorder/swab.h.
>>
>> The x86 version in arch/x86/include/asm/byteorder.h was taken as an
>> example for this.
>>
>> E.g. for the mx6qsabrelite target this results in ~4k less code.
>
> Which compiler are you using?

An older gcc 4.4.4 version.

> GCC 4.5 and later recognises the byteswap
> pattern and emits these instructions automatically.  This gives better
> code than the inline asm since it allows the compiler to do proper
> instruction scheduling, and in the 16-bit case it avoids extraneous
> masking.

Thanks,

Dirk
Wolfgang Denk - May 11, 2013, 12:05 a.m.
Dear Dirk Behme,

In message <518D3191.7060205@gmail.com> you wrote:
>
> > Which compiler are you using?
> 
> An older gcc 4.4.4 version.

We should not really care about these.  And if so, then onlya fter
testing we are really suing such old stuff, i. e. those who are using
recent tools (eventually the majority of users) should not suffer from
hacst to optimize for ancient code.

Best regards,

Wolfgang Denk

Patch

diff --git a/arch/arm/include/asm/byteorder.h b/arch/arm/include/asm/byteorder.h
index c3489f1..f21baf0 100644
--- a/arch/arm/include/asm/byteorder.h
+++ b/arch/arm/include/asm/byteorder.h
@@ -15,6 +15,23 @@ 
 #ifndef __ASM_ARM_BYTEORDER_H
 #define __ASM_ARM_BYTEORDER_H
 
+static __inline__ __u32 ___arch__swab32(__u32 x)
+{
+	__asm__("rev %0,%1"		/* swap bytes */ \
+		: "=r" (x) \
+		:  "0" (x)); \
+		return x;
+}
+#define __arch__swab32(x) ___arch__swab32(x)
+
+static __inline__ __u16 ___arch__swab16(__u16 x)
+{
+	__asm__("rev16 %0,%1"		/* swap bytes */ \
+		: "=r" (x) \
+		:  "0" (x)); \
+		return x;
+}
+#define __arch__swab16(x) ___arch__swab16(x)
 
 #include <asm/types.h>