Message ID: 1292425994-24331-1-git-send-email-robherring2@gmail.com
State: Deferred
Dear Rob Herring,

In message <1292425994-24331-1-git-send-email-robherring2@gmail.com> you wrote:
> From: Rob Herring <rob.herring@calxeda.com>
>
> swab functions are heavily used by FDT code, so enable
> optimized assembly code for ARMv6 and later.
>
> Signed-off-by: Rob Herring <rob.herring@calxeda.com>
> ---
>  arch/arm/include/asm/byteorder.h |   16 ++++++++++++++++
>  1 files changed, 16 insertions(+), 0 deletions(-)

Do you have any numbers showing whether this change gives any measurable
improvement?

Best regards,

Wolfgang Denk
Wolfgang,

On 12/17/2010 02:21 PM, Wolfgang Denk wrote:
> Dear Rob Herring,
>
> In message <1292425994-24331-1-git-send-email-robherring2@gmail.com> you wrote:
>> From: Rob Herring <rob.herring@calxeda.com>
>>
>> swab functions are heavily used by FDT code, so enable
>> optimized assembly code for ARMv6 and later.
>>
>> Signed-off-by: Rob Herring <rob.herring@calxeda.com>
>> ---
>>  arch/arm/include/asm/byteorder.h |   16 ++++++++++++++++
>>  1 files changed, 16 insertions(+), 0 deletions(-)
>
> Do you have any numbers showing whether this change gives any measurable
> improvement?

I have an instruction trace capture and see repeated calls to swab32 by
the fdt code. It's obvious low-hanging fruit. The boot time with a
device tree is noticeably longer than without one, but I don't have any
formal measurements.

Rob
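For context, the flattened device tree stores every 32-bit cell big-endian, so on a little-endian ARM target each fdt32_to_cpu() in the parsing path boils down to a 32-bit byte swap. A simplified sketch of that relationship (not the exact U-Boot definitions):

#include <stdint.h>

typedef uint32_t fdt32_t;	/* big-endian 32-bit cell as stored in the blob */

/* Generic byte reversal; this is the operation the rev instruction replaces. */
static inline uint32_t swab32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) <<  8) |
	       ((x & 0x00ff0000u) >>  8) |
	       ((x & 0xff000000u) >> 24);
}

/* On a little-endian CPU, be32_to_cpu() is a byte swap, so every FDT
 * header and property access ends up in swab32(). */
static inline uint32_t fdt32_to_cpu(fdt32_t x)
{
	return swab32(x);
}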
Rob Herring <robherring2@gmail.com> writes:

> From: Rob Herring <rob.herring@calxeda.com>
>
> swab functions are heavily used by FDT code, so enable
> optimized assembly code for ARMv6 and later.
>
> Signed-off-by: Rob Herring <rob.herring@calxeda.com>
> ---
>  arch/arm/include/asm/byteorder.h |   16 ++++++++++++++++
>  1 files changed, 16 insertions(+), 0 deletions(-)
>
> diff --git a/arch/arm/include/asm/byteorder.h b/arch/arm/include/asm/byteorder.h
> index c3489f1..9df5844 100644
> --- a/arch/arm/include/asm/byteorder.h
> +++ b/arch/arm/include/asm/byteorder.h
> @@ -23,6 +23,22 @@
>  # define __SWAB_64_THRU_32__
>  #endif
>
> +#if defined(__ARM_ARCH_7A__) || defined(__ARM_ARCH_6__)
> +static inline __u16 __attribute__((const)) ___arch_swab16(__u16 x)
> +{
> +	__asm__ ("rev16 %0, %1" : "=r" (x) : "r" (x));
> +	return x;
> +}

Pay close attention to what gcc does with this, as it is prone to add
unnecessary masking of the low halfword. If the callers are well-behaved
(argument having the top halfword clear), making the parameter and return
types here plain unsigned (or u32) gives better code.
On 12/17/2010 03:27 PM, Måns Rullgård wrote:
> Rob Herring <robherring2@gmail.com> writes:
>
>> From: Rob Herring <rob.herring@calxeda.com>
>>
>> swab functions are heavily used by FDT code, so enable
>> optimized assembly code for ARMv6 and later.
>>
>> Signed-off-by: Rob Herring <rob.herring@calxeda.com>
>> ---
>>  arch/arm/include/asm/byteorder.h |   16 ++++++++++++++++
>>  1 files changed, 16 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/byteorder.h b/arch/arm/include/asm/byteorder.h
>> index c3489f1..9df5844 100644
>> --- a/arch/arm/include/asm/byteorder.h
>> +++ b/arch/arm/include/asm/byteorder.h
>> @@ -23,6 +23,22 @@
>>  # define __SWAB_64_THRU_32__
>>  #endif
>>
>> +#if defined(__ARM_ARCH_7A__) || defined(__ARM_ARCH_6__)
>> +static inline __u16 __attribute__((const)) ___arch_swab16(__u16 x)
>> +{
>> +	__asm__ ("rev16 %0, %1" : "=r" (x) : "r" (x));
>> +	return x;
>> +}
>
> Pay close attention to what gcc does with this, as it is prone to add
> unnecessary masking of the low halfword. If the callers are well-behaved
> (argument having the top halfword clear), making the parameter and return
> types here plain unsigned (or u32) gives better code.

This is straight from the Linux code, and there are only a few users of
swab16 (none in my build).

Rob
Rob Herring <robherring2@gmail.com> writes:

> On 12/17/2010 03:27 PM, Måns Rullgård wrote:
>> Rob Herring <robherring2@gmail.com> writes:
>>
>>> From: Rob Herring <rob.herring@calxeda.com>
>>>
>>> swab functions are heavily used by FDT code, so enable
>>> optimized assembly code for ARMv6 and later.
>>>
>>> Signed-off-by: Rob Herring <rob.herring@calxeda.com>
>>> ---
>>>  arch/arm/include/asm/byteorder.h |   16 ++++++++++++++++
>>>  1 files changed, 16 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/arch/arm/include/asm/byteorder.h b/arch/arm/include/asm/byteorder.h
>>> index c3489f1..9df5844 100644
>>> --- a/arch/arm/include/asm/byteorder.h
>>> +++ b/arch/arm/include/asm/byteorder.h
>>> @@ -23,6 +23,22 @@
>>>  # define __SWAB_64_THRU_32__
>>>  #endif
>>>
>>> +#if defined(__ARM_ARCH_7A__) || defined(__ARM_ARCH_6__)
>>> +static inline __u16 __attribute__((const)) ___arch_swab16(__u16 x)
>>> +{
>>> +	__asm__ ("rev16 %0, %1" : "=r" (x) : "r" (x));
>>> +	return x;
>>> +}
>>
>> Pay close attention to what gcc does with this, as it is prone to add
>> unnecessary masking of the low halfword. If the callers are well-behaved
>> (argument having the top halfword clear), making the parameter and return
>> types here plain unsigned (or u32) gives better code.
>
> This is straight from the Linux code, and there are only a few users of
> swab16 (none in my build).

Look at the generated code if you don't believe me.
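For illustration, the alternative Måns describes might look something like the sketch below; the function name is hypothetical and this is not part of the posted patch. Keeping the value in a full 32-bit type gives gcc no reason to re-narrow the result to 16 bits after the rev16.

#include <stdint.h>

/*
 * Sketch of the suggested alternative: take and return a full 32-bit
 * value so gcc does not emit extra masking to truncate the result back
 * to 16 bits. Callers must keep the top halfword clear themselves;
 * rev16 then preserves that property in the result.
 */
static inline uint32_t ___arch_swab16_u32(uint32_t x)
{
	__asm__ ("rev16 %0, %1" : "=r" (x) : "r" (x));
	return x;
}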
Dear Rob Herring,

In message <4D0CEB67.2040502@gmail.com> you wrote:
>
> This is straight from the Linux code, and there are only a few users of
> swab16 (none in my build).

Given that we have no idea if this code really gives any measurable
performance improvement, and that it appears to be dangerous as well, I
tend not to include it as is.

Thanks.

Wolfgang Denk
diff --git a/arch/arm/include/asm/byteorder.h b/arch/arm/include/asm/byteorder.h
index c3489f1..9df5844 100644
--- a/arch/arm/include/asm/byteorder.h
+++ b/arch/arm/include/asm/byteorder.h
@@ -23,6 +23,22 @@
 # define __SWAB_64_THRU_32__
 #endif
 
+#if defined(__ARM_ARCH_7A__) || defined(__ARM_ARCH_6__)
+static inline __u16 __attribute__((const)) ___arch_swab16(__u16 x)
+{
+	__asm__ ("rev16 %0, %1" : "=r" (x) : "r" (x));
+	return x;
+}
+#define __arch_swab16 ___arch_swab16
+
+static inline __u32 __attribute__((const)) ___arch_swab32(__u32 x)
+{
+	__asm__ ("rev %0, %1" : "=r" (x) : "r" (x));
+	return x;
+}
+#define __arch_swab32 ___arch_swab32
+#endif
+
 #ifdef __ARMEB__
 #include <linux/byteorder/big_endian.h>
 #else
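As a quick standalone sanity check of the rev-based 32-bit swap (a sketch only, not part of the patch; it needs an ARMv6 or later target to assemble):

#include <assert.h>
#include <stdint.h>

/* Same rev-based swap as the patch, expressed with standard types. */
static inline uint32_t swab32_rev(uint32_t x)
{
	__asm__ ("rev %0, %1" : "=r" (x) : "r" (x));
	return x;
}

int main(void)
{
	/* rev reverses byte order within the 32-bit word. */
	assert(swab32_rev(0x12345678u) == 0x78563412u);
	return 0;
}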