diff mbox

[AArch64] Implement vsqrt_f64 intrinsic

Message ID 546A31DB.7040102@arm.com
State New
Headers show

Commit Message

Kyrylo Tkachov Nov. 17, 2014, 5:35 p.m. UTC
Hi all,

This patch implements the vsqrt_f64 intrinsic in arm_neon.h.
There's not much to it, we can reuse __builtin_sqrt.
It's a fairly straightforward and self-contained patch,
do we still want it at this stage?

A new execute test is added.

Tested aarch64-none-elf.


Thanks,
Kyrill

2014-11-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic.

2014-11-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/simd/vsqrt_f64_1.c

Comments

Marcus Shawcroft Nov. 21, 2014, 3:52 p.m. UTC | #1
On 17 November 2014 17:35, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:

> 2014-11-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic.
>
> 2014-11-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     * gcc.target/aarch64/simd/vsqrt_f64_1.c

OK /Marcus
Christophe Lyon Nov. 26, 2014, 10:14 a.m. UTC | #2
Hi Kyrill,


On 21 November 2014 at 16:52, Marcus Shawcroft
<marcus.shawcroft@gmail.com> wrote:
> On 17 November 2014 17:35, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>
>> 2014-11-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>     * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic.
>>
>> 2014-11-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>     * gcc.target/aarch64/simd/vsqrt_f64_1.c
>
> OK /Marcus

Your new test fails at the scan-assembly step because all the code is
optimized away (even at -O1).

Christophe.
James Greenhalgh Dec. 15, 2014, 12:13 p.m. UTC | #3
On Mon, Nov 17, 2014 at 05:35:23PM +0000, Kyrill Tkachov wrote:
> Hi all,
> 
> This patch implements the vsqrt_f64 intrinsic in arm_neon.h.
> There's not much to it, we can reuse __builtin_sqrt.
> It's a fairly straightforward and self-contained patch,
> do we still want it at this stage?
> 
> A new execute test is added.
> 
> Tested aarch64-none-elf.
> 
> 
> Thanks,
> Kyrill
> 
> 2014-11-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>      * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic.
> 
> 2014-11-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>      * gcc.target/aarch64/simd/vsqrt_f64_1.c

> commit d9e42debe2655287eef7b8c3ecf29bbdd11e6425
> Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
> Date:   Mon Nov 17 15:02:01 2014 +0000
> 
>     [AArch64] Implement vsqrt_f64 intrinsic
> 
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index b3b80b8..c58213a 100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a)
>    return __builtin_aarch64_sqrtv4sf (a);
>  }
>  
> +__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
> +vsqrt_f64 (float64x1_t a)
> +{
> +  return (float64x1_t) { __builtin_sqrt (a[0]) };
> +}

Hi Kyrill,

Does this introduce an implicit need to link against a maths library if
we want arm_neon.h to work correctly? If so, I think we need to take a
different approach.

At O0 I've started to see:

  " undefined reference to `sqrt' "

When checking a large arm_neon.h testcase.

It does seem strange that the mid-end would convert a __builtin_sqrt back
to a library call at O0 when the target has an optab for it, so perhaps
there is a bug there to go hunt?

Thanks,
James
Joseph Myers Dec. 17, 2014, 12:04 a.m. UTC | #4
On Mon, 15 Dec 2014, James Greenhalgh wrote:

> > @@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a)
> >    return __builtin_aarch64_sqrtv4sf (a);
> >  }
> >  
> > +__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
> > +vsqrt_f64 (float64x1_t a)
> > +{
> > +  return (float64x1_t) { __builtin_sqrt (a[0]) };
> > +}
> 
> Hi Kyrill,
> 
> Does this introduce an implicit need to link against a maths library if
> we want arm_neon.h to work correctly? If so, I think we need to take a
> different approach.
> 
> At O0 I've started to see:
> 
>   " undefined reference to `sqrt' "
> 
> When checking a large arm_neon.h testcase.
> 
> It does seem strange that the mid-end would convert a __builtin_sqrt back
> to a library call at O0 when the target has an optab for it, so perhaps
> there is a bug there to go hunt?

__builtin_sqrt has the same semantics as the sqrt library function.  This 
includes setting errno for negative arguments (other than -0 and -NaN).  
The semantics also include that it's always OK to expand to a call to that 
library function (generally, __builtin_foo may always expand to a call to 
foo, if there is such a library function).
Kyrylo Tkachov Dec. 18, 2014, 11:39 a.m. UTC | #5
On 17/12/14 00:04, Joseph Myers wrote:
> On Mon, 15 Dec 2014, James Greenhalgh wrote:
>
>>> @@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a)
>>>     return __builtin_aarch64_sqrtv4sf (a);
>>>   }
>>>   
>>> +__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
>>> +vsqrt_f64 (float64x1_t a)
>>> +{
>>> +  return (float64x1_t) { __builtin_sqrt (a[0]) };
>>> +}
>> Hi Kyrill,
>>
>> Does this introduce an implicit need to link against a maths library if
>> we want arm_neon.h to work correctly? If so, I think we need to take a
>> different approach.
>>
>> At O0 I've started to see:
>>
>>    " undefined reference to `sqrt' "
>>
>> When checking a large arm_neon.h testcase.
>>
>> It does seem strange that the mid-end would convert a __builtin_sqrt back
>> to a library call at O0 when the target has an optab for it, so perhaps
>> there is a bug there to go hunt?
> __builtin_sqrt has the same semantics as the sqrt library function.  This
> includes setting errno for negative arguments (other than -0 and -NaN).
> The semantics also include that it's always OK to expand to a call to that
> library function (generally, __builtin_foo may always expand to a call to
> foo, if there is such a library function).

Thanks for the explanation.
I see that there are some intrinsics implemented in terms of 
__builtin_fabsf, presumable they can be at 'risk' too of having a 
library call emitted?

Kyrill

>
Joseph Myers Dec. 18, 2014, 1:27 p.m. UTC | #6
On Thu, 18 Dec 2014, Kyrill Tkachov wrote:

> I see that there are some intrinsics implemented in terms of __builtin_fabsf,
> presumable they can be at 'risk' too of having a library call emitted?

The semantics of fabsf/fabs/fabsl are to clear the sign bit, never raise 
any exceptions (even for signaling NaN arguments, subject to any 
limitations imposed by calling conventions / floating-point load 
conventions on the processor) and never set errno, as per IEEE 754-2008 
(and the C bindings thereto, TS 18661-1:2014).

So there is certainly no need to call a library function for 
__builtin_fabsf, but I don't know whether the implementation guarantees 
never to generate such a call.
Kyrylo Tkachov Jan. 9, 2015, 11:45 a.m. UTC | #7
On 17/12/14 00:04, Joseph Myers wrote:
> On Mon, 15 Dec 2014, James Greenhalgh wrote:
>
>>> @@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a)
>>>     return __builtin_aarch64_sqrtv4sf (a);
>>>   }
>>>   
>>> +__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
>>> +vsqrt_f64 (float64x1_t a)
>>> +{
>>> +  return (float64x1_t) { __builtin_sqrt (a[0]) };
>>> +}
>> Hi Kyrill,
>>
>> Does this introduce an implicit need to link against a maths library if
>> we want arm_neon.h to work correctly? If so, I think we need to take a
>> different approach.
>>
>> At O0 I've started to see:
>>
>>    " undefined reference to `sqrt' "
>>
>> When checking a large arm_neon.h testcase.
>>
>> It does seem strange that the mid-end would convert a __builtin_sqrt back
>> to a library call at O0 when the target has an optab for it, so perhaps
>> there is a bug there to go hunt?
> __builtin_sqrt has the same semantics as the sqrt library function.  This
> includes setting errno for negative arguments (other than -0 and -NaN).
> The semantics also include that it's always OK to expand to a call to that
> library function (generally, __builtin_foo may always expand to a call to
> foo, if there is such a library function).

So my first attempt at this patch had created a target builtin 
(__builtin_aarch64_sqrtdf) and used that. Eventually though I went for 
the shorter __builtin_sqrt because I thought we could benefit from the 
tree-level information about the semantics rather than the RTL-level 
expansion that the target-specific builtin would provide.
But if there's a risk for it to expand to a library function call, I 
guess it's better to go with the target builtin. I'll prepare a patch.

Thanks for the explanations,
Kyrill

>
diff mbox

Patch

commit d9e42debe2655287eef7b8c3ecf29bbdd11e6425
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Mon Nov 17 15:02:01 2014 +0000

    [AArch64] Implement vsqrt_f64 intrinsic

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index b3b80b8..c58213a 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -22792,6 +22792,12 @@  vsqrtq_f32 (float32x4_t a)
   return __builtin_aarch64_sqrtv4sf (a);
 }
 
+__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+vsqrt_f64 (float64x1_t a)
+{
+  return (float64x1_t) { __builtin_sqrt (a[0]) };
+}
+
 __extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
 vsqrtq_f64 (float64x2_t a)
 {
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c
new file mode 100644
index 0000000..57fb6bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c
@@ -0,0 +1,25 @@ 
+/* Test the vsqrt_f64 AArch64 SIMD intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-options "-save-temps -O3" } */
+
+#include "arm_neon.h"
+
+extern void abort (void);
+
+
+int
+main (void)
+{
+  float64x1_t in = vcreate_f64(0x3febd3e560634d7bULL);
+  float64x1_t result = vsqrt_f64 (in);
+  float64_t expected = 0.9325321502142351;
+
+  if (result[0] != expected)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler "fsqrt\[ \t\]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+\n" } } */
+/* { dg-final { cleanup-saved-temps } } */