[U-Boot] arm: Add armv6 and armv7 optimized swab functions

Message ID 1292425994-24331-1-git-send-email-robherring2@gmail.com
State Deferred

Commit Message

Rob Herring Dec. 15, 2010, 3:13 p.m. UTC
From: Rob Herring <rob.herring@calxeda.com>

swab functions are heavily used by FDT code, so enable
optimized assembly code for ARMv6 and later.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
---
 arch/arm/include/asm/byteorder.h |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

Comments

Wolfgang Denk Dec. 17, 2010, 8:21 p.m. UTC | #1
Dear Rob Herring,

In message <1292425994-24331-1-git-send-email-robherring2@gmail.com> you wrote:
> From: Rob Herring <rob.herring@calxeda.com>
> 
> swab functions are heavily used by FDT code, so enable
> optimized assembly code for ARMv6 and later.
> 
> Signed-off-by: Rob Herring <rob.herring@calxeda.com>
> ---
>  arch/arm/include/asm/byteorder.h |   16 ++++++++++++++++
>  1 files changed, 16 insertions(+), 0 deletions(-)

Do you have any numbers showing that this change gives any measurable
improvement?

Best regards,

Wolfgang Denk

Rob Herring Dec. 17, 2010, 8:52 p.m. UTC | #2
Wolfgang,

On 12/17/2010 02:21 PM, Wolfgang Denk wrote:
> Dear Rob Herring,
>
> In message<1292425994-24331-1-git-send-email-robherring2@gmail.com>  you wrote:
>> From: Rob Herring<rob.herring@calxeda.com>
>>
>> swab functions are heavily used by FDT code, so enable
>> optimized assembly code for ARMv6 and later.
>>
>> Signed-off-by: Rob Herring<rob.herring@calxeda.com>
>> ---
>>   arch/arm/include/asm/byteorder.h |   16 ++++++++++++++++
>>   1 files changed, 16 insertions(+), 0 deletions(-)
>
> Do you have any numbers showing that this change gives any measurable
> improvement?

I have an instruction trace capture that shows repeated calls to swab32
from the fdt code, so this is obvious low-hanging fruit. Booting with a
device tree takes noticeably longer than booting without one, but I don't
have any formal measurements.

Rob
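
For illustration only (not part of the posted patch): flattened device
tree blobs store their 32-bit cells big-endian, so on a little-endian ARM
core every cell read goes through a 32-bit byte swap. The sketch below
shows a generic shift-and-mask swap of the kind a single `rev` instruction
would replace. The names generic_swab32 and read_be32_cell are
hypothetical, and the helper assumes a little-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Generic shift-and-mask swap: four masks, four shifts, three ORs. */
static inline uint32_t generic_swab32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) <<  8) |
	       ((x & 0x00ff0000u) >>  8) |
	       ((x & 0xff000000u) >> 24);
}

/*
 * Hypothetical helper: read one big-endian 32-bit cell from a blob.
 * Assumes a little-endian CPU, so the raw load must be byte-swapped.
 */
static inline uint32_t read_be32_cell(const unsigned char *p)
{
	uint32_t raw;

	memcpy(&raw, p, sizeof(raw));	/* avoids unaligned access */
	return generic_swab32(raw);
}

/* Small self-check; the byte sequence is the FDT magic number. */
static void swab32_selfcheck(void)
{
	static const unsigned char magic[4] = { 0xd0, 0x0d, 0xfe, 0xed };

	assert(generic_swab32(0x12345678u) == 0x78563412u);
	assert(read_be32_cell(magic) == 0xd00dfeedu);
}
```

Whether replacing this with `rev` is measurable at boot still depends on
how much of the total time is spent in these swaps, which the trace alone
does not quantify.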

Måns Rullgård Dec. 17, 2010, 9:27 p.m. UTC | #3
Rob Herring <robherring2@gmail.com> writes:

> From: Rob Herring <rob.herring@calxeda.com>
>
> swab functions are heavily used by FDT code, so enable
> optimized assembly code for ARMv6 and later.
>
> Signed-off-by: Rob Herring <rob.herring@calxeda.com>
> ---
>  arch/arm/include/asm/byteorder.h |   16 ++++++++++++++++
>  1 files changed, 16 insertions(+), 0 deletions(-)
>
> diff --git a/arch/arm/include/asm/byteorder.h b/arch/arm/include/asm/byteorder.h
> index c3489f1..9df5844 100644
> --- a/arch/arm/include/asm/byteorder.h
> +++ b/arch/arm/include/asm/byteorder.h
> @@ -23,6 +23,22 @@
>  #  define __SWAB_64_THRU_32__
>  #endif
>
> +#if defined(__ARM_ARCH_7A__) || defined(__ARM_ARCH_6__)
> +static inline __u16 __attribute__((const)) ___arch_swab16(__u16 x)
> +{
> +	__asm__ ("rev16 %0, %1" : "=r" (x) : "r" (x));
> +	return x;
> +}

Pay close attention to what gcc does with this as it is prone to add
unnecessary masking of the low halfword.  If the callers are
well-behaved (argument having top halfword clear), making the
parameter and return types here plain unsigned (or u32) gives better
code.
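
A minimal sketch of the two signatures being compared, offered as an
illustration and not as the posted patch. The names swab16_narrow and
swab16_wide are hypothetical. The inline asm is only used on 32-bit ARMv6
and later; elsewhere a portable fallback keeps the sketch buildable. With
the narrow types the compiler may append a zero-extension after the asm,
because it cannot see that `rev16` leaves the top halfword clear when the
input's top halfword is clear.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 6)
#define SKETCH_HAVE_REV16 1
#else
#define SKETCH_HAVE_REV16 0
#endif

/* Narrow variant: same shape as the patch (16-bit in, 16-bit out). */
static inline uint16_t swab16_narrow(uint16_t x)
{
#if SKETCH_HAVE_REV16
	__asm__ ("rev16 %0, %1" : "=r" (x) : "r" (x));
	return x;
#else
	return (uint16_t)((x << 8) | (x >> 8));
#endif
}

/*
 * Wide variant: the caller guarantees that bits 31..16 of x are zero,
 * so the result needs no masking after the swap.
 */
static inline uint32_t swab16_wide(uint32_t x)
{
#if SKETCH_HAVE_REV16
	__asm__ ("rev16 %0, %1" : "=r" (x) : "r" (x));
	return x;
#else
	return ((x & 0x00ffu) << 8) | ((x & 0xff00u) >> 8);
#endif
}

/* Both variants agree for well-behaved (16-bit) inputs. */
static void swab16_selfcheck(void)
{
	assert(swab16_narrow(0x1234) == 0x3412);
	assert(swab16_wide(0x1234u) == 0x3412u);
}
```

Comparing the disassembly of callers of each variant (for example with
objdump -d) shows whether the extra masking instruction is actually
emitted by a given compiler version.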

Rob Herring Dec. 18, 2010, 5:12 p.m. UTC | #4
On 12/17/2010 03:27 PM, Måns Rullgård wrote:
> Rob Herring<robherring2@gmail.com>  writes:
>
>> From: Rob Herring<rob.herring@calxeda.com>
>>
>> swab functions are heavily used by FDT code, so enable
>> optimized assembly code for ARMv6 and later.
>>
>> Signed-off-by: Rob Herring<rob.herring@calxeda.com>
>> ---
>>   arch/arm/include/asm/byteorder.h |   16 ++++++++++++++++
>>   1 files changed, 16 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/byteorder.h b/arch/arm/include/asm/byteorder.h
>> index c3489f1..9df5844 100644
>> --- a/arch/arm/include/asm/byteorder.h
>> +++ b/arch/arm/include/asm/byteorder.h
>> @@ -23,6 +23,22 @@
>>   #  define __SWAB_64_THRU_32__
>>   #endif
>>
>> +#if defined(__ARM_ARCH_7A__) || defined(__ARM_ARCH_6__)
>> +static inline __u16 __attribute__((const)) ___arch_swab16(__u16 x)
>> +{
>> +	__asm__ ("rev16 %0, %1" : "=r" (x) : "r" (x));
>> +	return x;
>> +}
>
> Pay close attention to what gcc does with this as it is prone to add
> unnecessary masking of the low halfword.  If the callers are
> well-behaved (argument having top halfword clear), making the
> parameter and return types here plain unsigned (or u32) gives better
> code.

This is straight from the Linux code and there are only a few users of
swab16 (none in my build).

Rob

Måns Rullgård Dec. 18, 2010, 6:17 p.m. UTC | #5
Rob Herring <robherring2@gmail.com> writes:

> On 12/17/2010 03:27 PM, Måns Rullgård wrote:
>> Rob Herring<robherring2@gmail.com>  writes:
>>
>>> From: Rob Herring<rob.herring@calxeda.com>
>>>
>>> swab functions are heavily used by FDT code, so enable
>>> optimized assembly code for ARMv6 and later.
>>>
>>> Signed-off-by: Rob Herring<rob.herring@calxeda.com>
>>> ---
>>>   arch/arm/include/asm/byteorder.h |   16 ++++++++++++++++
>>>   1 files changed, 16 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/arch/arm/include/asm/byteorder.h b/arch/arm/include/asm/byteorder.h
>>> index c3489f1..9df5844 100644
>>> --- a/arch/arm/include/asm/byteorder.h
>>> +++ b/arch/arm/include/asm/byteorder.h
>>> @@ -23,6 +23,22 @@
>>>   #  define __SWAB_64_THRU_32__
>>>   #endif
>>>
>>> +#if defined(__ARM_ARCH_7A__) || defined(__ARM_ARCH_6__)
>>> +static inline __u16 __attribute__((const)) ___arch_swab16(__u16 x)
>>> +{
>>> +	__asm__ ("rev16 %0, %1" : "=r" (x) : "r" (x));
>>> +	return x;
>>> +}
>>
>> Pay close attention to what gcc does with this as it is prone to add
>> unnecessary masking of the low halfword.  If the callers are
>> well-behaved (argument having top halfword clear), making the
>> parameter and return types here plain unsigned (or u32) gives better
>> code.
>
> This is straight from the Linux code and there are only a few users of
> swab16 (none in my build).

Look at the generated code if you don't believe me.

Wolfgang Denk Dec. 18, 2010, 9:59 p.m. UTC | #6
Dear Rob Herring,

In message <4D0CEB67.2040502@gmail.com> you wrote:
>
> This is straight from the Linux code and there are only a few users of
> swab16 (none in my build).

Given that we have no idea whether this code really gives any measurable
performance improvement, and that it appears to be dangerous as well,
I am inclined not to include it as is.

Thanks.


Wolfgang Denk

Patch

diff --git a/arch/arm/include/asm/byteorder.h b/arch/arm/include/asm/byteorder.h
index c3489f1..9df5844 100644
--- a/arch/arm/include/asm/byteorder.h
+++ b/arch/arm/include/asm/byteorder.h
@@ -23,6 +23,22 @@ 
 #  define __SWAB_64_THRU_32__
 #endif
 
+#if defined(__ARM_ARCH_7A__) || defined(__ARM_ARCH_6__)
+static inline __u16 __attribute__((const)) ___arch_swab16(__u16 x)
+{
+	__asm__ ("rev16 %0, %1" : "=r" (x) : "r" (x));
+	return x;
+}
+#define __arch_swab16 ___arch_swab16
+
+static inline __u32 __attribute__((const)) ___arch_swab32(__u32 x)
+{
+	__asm__ ("rev %0, %1" : "=r" (x) : "r" (x));
+	return x;
+}
+#define __arch_swab32 ___arch_swab32
+#endif
+
 #ifdef __ARMEB__
 #include <linux/byteorder/big_endian.h>
 #else