Message ID | 546A31DB.7040102@arm.com |
---|---|
State | New |
Headers | show |
On 17 November 2014 17:35, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote: > 2014-11-17 Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic. > > 2014-11-17 Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > * gcc.target/aarch64/simd/vsqrt_f64_1.c OK /Marcus
Hi Kyrill, On 21 November 2014 at 16:52, Marcus Shawcroft <marcus.shawcroft@gmail.com> wrote: > On 17 November 2014 17:35, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote: > >> 2014-11-17 Kyrylo Tkachov <kyrylo.tkachov@arm.com> >> >> * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic. >> >> 2014-11-17 Kyrylo Tkachov <kyrylo.tkachov@arm.com> >> >> * gcc.target/aarch64/simd/vsqrt_f64_1.c > > OK /Marcus Your new test fails at the scan-assembly step because all the code is optimized away (even at -O1). Christophe.
On Mon, Nov 17, 2014 at 05:35:23PM +0000, Kyrill Tkachov wrote: > Hi all, > > This patch implements the vsqrt_f64 intrinsic in arm_neon.h. > There's not much to it, we can reuse __builtin_sqrt. > It's a fairly straightforward and self-contained patch, > do we still want it at this stage? > > A new execute test is added. > > Tested aarch64-none-elf. > > > Thanks, > Kyrill > > 2014-11-17 Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic. > > 2014-11-17 Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > * gcc.target/aarch64/simd/vsqrt_f64_1.c > commit d9e42debe2655287eef7b8c3ecf29bbdd11e6425 > Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com> > Date: Mon Nov 17 15:02:01 2014 +0000 > > [AArch64] Implement vsqrt_f64 intrinsic > > diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h > index b3b80b8..c58213a 100644 > --- a/gcc/config/aarch64/arm_neon.h > +++ b/gcc/config/aarch64/arm_neon.h > @@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a) > return __builtin_aarch64_sqrtv4sf (a); > } > > +__extension__ static __inline float64x1_t __attribute__ ((__always_inline__)) > +vsqrt_f64 (float64x1_t a) > +{ > + return (float64x1_t) { __builtin_sqrt (a[0]) }; > +} Hi Kyrill, Does this introduce an implicit need to link against a maths library if we want arm_neon.h to work correctly? If so, I think we need to take a different approach. At O0 I've started to see: " undefined reference to `sqrt' " When checking a large arm_neon.h testcase. It does seem strange that the mid-end would convert a __builtin_sqrt back to a library call at O0 when the target has an optab for it, so perhaps there is a bug there to go hunt? Thanks, James
On Mon, 15 Dec 2014, James Greenhalgh wrote: > > @@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a) > > return __builtin_aarch64_sqrtv4sf (a); > > } > > > > +__extension__ static __inline float64x1_t __attribute__ ((__always_inline__)) > > +vsqrt_f64 (float64x1_t a) > > +{ > > + return (float64x1_t) { __builtin_sqrt (a[0]) }; > > +} > > Hi Kyrill, > > Does this introduce an implicit need to link against a maths library if > we want arm_neon.h to work correctly? If so, I think we need to take a > different approach. > > At O0 I've started to see: > > " undefined reference to `sqrt' " > > When checking a large arm_neon.h testcase. > > It does seem strange that the mid-end would convert a __builtin_sqrt back > to a library call at O0 when the target has an optab for it, so perhaps > there is a bug there to go hunt? __builtin_sqrt has the same semantics as the sqrt library function. This includes setting errno for negative arguments (other than -0 and -NaN). The semantics also include that it's always OK to expand to a call to that library function (generally, __builtin_foo may always expand to a call to foo, if there is such a library function).
On 17/12/14 00:04, Joseph Myers wrote: > On Mon, 15 Dec 2014, James Greenhalgh wrote: > >>> @@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a) >>> return __builtin_aarch64_sqrtv4sf (a); >>> } >>> >>> +__extension__ static __inline float64x1_t __attribute__ ((__always_inline__)) >>> +vsqrt_f64 (float64x1_t a) >>> +{ >>> + return (float64x1_t) { __builtin_sqrt (a[0]) }; >>> +} >> Hi Kyrill, >> >> Does this introduce an implicit need to link against a maths library if >> we want arm_neon.h to work correctly? If so, I think we need to take a >> different approach. >> >> At O0 I've started to see: >> >> " undefined reference to `sqrt' " >> >> When checking a large arm_neon.h testcase. >> >> It does seem strange that the mid-end would convert a __builtin_sqrt back >> to a library call at O0 when the target has an optab for it, so perhaps >> there is a bug there to go hunt? > __builtin_sqrt has the same semantics as the sqrt library function. This > includes setting errno for negative arguments (other than -0 and -NaN). > The semantics also include that it's always OK to expand to a call to that > library function (generally, __builtin_foo may always expand to a call to > foo, if there is such a library function). Thanks for the explanation. I see that there are some intrinsics implemented in terms of __builtin_fabsf, presumable they can be at 'risk' too of having a library call emitted? Kyrill >
On Thu, 18 Dec 2014, Kyrill Tkachov wrote: > I see that there are some intrinsics implemented in terms of __builtin_fabsf, > presumable they can be at 'risk' too of having a library call emitted? The semantics of fabsf/fabs/fabsl are to clear the sign bit, never raise any exceptions (even for signaling NaN arguments, subject to any limitations imposed by calling conventions / floating-point load conventions on the processor) and never set errno, as per IEEE 754-2008 (and the C bindings thereto, TS 18661-1:2014). So there is certainly no need to call a library function for __builtin_fabsf, but I don't know whether the implementation guarantees never to generate such a call.
On 17/12/14 00:04, Joseph Myers wrote: > On Mon, 15 Dec 2014, James Greenhalgh wrote: > >>> @@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a) >>> return __builtin_aarch64_sqrtv4sf (a); >>> } >>> >>> +__extension__ static __inline float64x1_t __attribute__ ((__always_inline__)) >>> +vsqrt_f64 (float64x1_t a) >>> +{ >>> + return (float64x1_t) { __builtin_sqrt (a[0]) }; >>> +} >> Hi Kyrill, >> >> Does this introduce an implicit need to link against a maths library if >> we want arm_neon.h to work correctly? If so, I think we need to take a >> different approach. >> >> At O0 I've started to see: >> >> " undefined reference to `sqrt' " >> >> When checking a large arm_neon.h testcase. >> >> It does seem strange that the mid-end would convert a __builtin_sqrt back >> to a library call at O0 when the target has an optab for it, so perhaps >> there is a bug there to go hunt? > __builtin_sqrt has the same semantics as the sqrt library function. This > includes setting errno for negative arguments (other than -0 and -NaN). > The semantics also include that it's always OK to expand to a call to that > library function (generally, __builtin_foo may always expand to a call to > foo, if there is such a library function). So my first attempt at this patch had created a target builtin (__builtin_aarch64_sqrtdf) and used that. Eventually though I went for the shorter __builtin_sqrt because I thought we could benefit from the tree-level information about the semantics rather than the RTL-level expansion that the target-specific builtin would provide. But if there's a risk for it to expand to a library function call, I guess it's better to go with the target builtin. I'll prepare a patch. Thanks for the explanations, Kyrill >
commit d9e42debe2655287eef7b8c3ecf29bbdd11e6425 Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com> Date: Mon Nov 17 15:02:01 2014 +0000 [AArch64] Implement vsqrt_f64 intrinsic diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index b3b80b8..c58213a 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a) return __builtin_aarch64_sqrtv4sf (a); } +__extension__ static __inline float64x1_t __attribute__ ((__always_inline__)) +vsqrt_f64 (float64x1_t a) +{ + return (float64x1_t) { __builtin_sqrt (a[0]) }; +} + __extension__ static __inline float64x2_t __attribute__ ((__always_inline__)) vsqrtq_f64 (float64x2_t a) { diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c new file mode 100644 index 0000000..57fb6bb --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c @@ -0,0 +1,25 @@ +/* Test the vsqrt_f64 AArch64 SIMD intrinsic. */ + +/* { dg-do run } */ +/* { dg-options "-save-temps -O3" } */ + +#include "arm_neon.h" + +extern void abort (void); + + +int +main (void) +{ + float64x1_t in = vcreate_f64(0x3febd3e560634d7bULL); + float64x1_t result = vsqrt_f64 (in); + float64_t expected = 0.9325321502142351; + + if (result[0] != expected) + abort (); + + return 0; +} + +/* { dg-final { scan-assembler "fsqrt\[ \t\]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+\n" } } */ +/* { dg-final { cleanup-saved-temps } } */