Message ID: 559BC75A.1080606@arm.com
State: New
Hi Alan,
On 07/07/15 13:34, Alan Lawrence wrote:
> As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html
For some context, the reference for these is at:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf
This patch is ok once you and Charles decide on how to proceed with the two prerequisites.
Thanks,
Kyrill
On 07/07/15 14:09, Kyrill Tkachov wrote:
> Hi Alan,
>
> On 07/07/15 13:34, Alan Lawrence wrote:
>> As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html
> For some context, the reference for these is at:
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf
>
> This patch is ok once you and Charles decide on how to proceed with the two prerequisites.

On second thought, the ACLE document at
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf
says in 12.2.1:

"float16 types are only available when the __fp16 type is defined, i.e. when supported by the hardware"

This indicates that float16 type and intrinsic availability should be gated on
the availability of fp16 in the specified -mfpu. Look at some existing
intrinsics like vcvt_f16_f32 for a way to gate these.

I notice that float32x4_t is unconditionally defined in our arm_neon.h,
however. I think this is a bug and its definition should be #ifdef'd properly
as well.

Thanks,
Kyrill

> Thanks,
> Kyrill
Kyrill Tkachov wrote:
> On 07/07/15 14:09, Kyrill Tkachov wrote:
>> Hi Alan,
>>
>> On 07/07/15 13:34, Alan Lawrence wrote:
>>> As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html
>> For some context, the reference for these is at:
>> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf
>>
>> This patch is ok once you and Charles decide on how to proceed with the two prerequisites.
>
> On second thought, the ACLE document at
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf
> says in 12.2.1:
> "float16 types are only available when the __fp16 type is defined, i.e. when supported by the hardware"

However, we support __fp16 whenever the user specifies -mfp16-format=ieee or
-mfp16-format=alternative, regardless of whether we have hardware support or not.

(Without hardware support, gcc generates calls to __gnu_f2h_ieee or
__gnu_f2h_alternative instead of vcvtb.f16.f32, and __gnu_h2f_ieee or
__gnu_h2f_alternative instead of vcvtb.f32.f16. However, there is no way to
support __fp16 using just those hardware instructions without caring about
which format is in use.)

Thus we cannot be consistent with both sides of that 'i.e.', unless we also
change when __fp16 is available.

> I notice that float32x4_t is unconditionally defined in our arm_neon.h, however.
> I think this is a bug and its definition should be #ifdef'd properly as well.

Hmmm. Is this becoming a question of which potentially-existing code we want
to break?

Cheers, Alan
On 07/07/15 17:34, Alan Lawrence wrote:
> Kyrill Tkachov wrote:
>> On 07/07/15 14:09, Kyrill Tkachov wrote:
>>> Hi Alan,
>>>
>>> On 07/07/15 13:34, Alan Lawrence wrote:
>>>> As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html
>>> For some context, the reference for these is at:
>>> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf
>>>
>>> This patch is ok once you and Charles decide on how to proceed with the two prerequisites.
>> On second thought, the ACLE document at
>> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf
>> says in 12.2.1:
>> "float16 types are only available when the __fp16 type is defined, i.e. when supported by the hardware"
> However, we support __fp16 whenever the user specifies -mfp16-format=ieee or
> -mfp16-format=alternative, regardless of whether we have hardware support or not.
>
> (Without hardware support, gcc generates calls to __gnu_f2h_ieee or
> __gnu_f2h_alternative instead of vcvtb.f16.f32, and __gnu_h2f_ieee or
> __gnu_h2f_alternative instead of vcvtb.f32.f16. However, there is no way to
> support __fp16 using just those hardware instructions without caring about
> which format is in use.)

Hmmm... In my opinion intrinsics should aim to map to instructions rather than
go away and call library functions, but this is existing functionality that
current users might depend on :(

> Thus we cannot be consistent with both sides of that 'i.e.', unless we also
> change when __fp16 is available.
>
>> I notice that float32x4_t is unconditionally defined in our arm_neon.h, however.
>> I think this is a bug and its definition should be #ifdef'd properly as well.
> Hmmm. Is this becoming a question of which potentially-existing code we want
> to break?

CC'ing the ARM maintainers and Tejas for an ACLE perspective.

I think that we'd want to gate the definition of __fp16 on hardware
availability as well (the -mfpu option) rather than just arm_fp16_format, but
I'm not sure of the impact this will have on existing users.

Kyrill

> Cheers, Alan
Kyrill Tkachov wrote:
> On 07/07/15 17:34, Alan Lawrence wrote:
>> Kyrill Tkachov wrote:
>>> On 07/07/15 14:09, Kyrill Tkachov wrote:
>>>> Hi Alan,
>>>>
>>>> On 07/07/15 13:34, Alan Lawrence wrote:
>>>>> As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html
>>>> For some context, the reference for these is at:
>>>> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf
>>>>
>>>> This patch is ok once you and Charles decide on how to proceed with the two prerequisites.
>>> On second thought, the ACLE document at
>>> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf
>>> says in 12.2.1:
>>> "float16 types are only available when the __fp16 type is defined, i.e. when supported by the hardware"
>> However, we support __fp16 whenever the user specifies -mfp16-format=ieee or
>> -mfp16-format=alternative, regardless of whether we have hardware support or not.
>>
>> (Without hardware support, gcc generates calls to __gnu_f2h_ieee or
>> __gnu_f2h_alternative instead of vcvtb.f16.f32, and __gnu_h2f_ieee or
>> __gnu_h2f_alternative instead of vcvtb.f32.f16. However, there is no way to
>> support __fp16 using just those hardware instructions without caring about
>> which format is in use.)
>
> Hmmm... In my opinion intrinsics should aim to map to instructions rather than
> go away and call library functions, but this is existing functionality that
> current users might depend on :(

Sorry - to clarify: currently we generate __gnu_f2h_ieee / __gnu_h2f_ieee to
convert between single __fp16 and 'float' values when there is no HW. General
operations on scalar __fp16 values are performed by converting to float,
performing the operations on float, and converting back. The __fp16 type is
available and "usable" without HW support, but only when -mfp16-format is
specified.

(The existing) intrinsics operating on float16x[48] vectors (converting
to/from float32x4) are *not* available without hardware support; these
intrinsics *are* available without specifying -mfp16-format.

ACLE (4.1.2) allows toolchains to provide __fp16 when not implemented in HW,
even if this is not required.

> CC'ing the ARM maintainers and Tejas for an ACLE perspective.
> I think that we'd want to gate the definition of __fp16 on hardware
> availability as well (the -mfpu option) rather than just arm_fp16_format, but
> I'm not sure of the impact this will have on existing users.

Sure... but do we require -mfpu *and* -mfp16-format? s/and/or/? Do we require
-mfp16-format for float16x[48] intrinsics, or allow format-agnostic code (as
HW support allows us to!)?

I don't have very strong opinions as to which way we should go; I merely tried
to be consistent with the existing codebase and to support as much code as
possible, although I agree I ignored cases where defining functions
unexpectedly might cause problems.

Cheers, Alan
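To make the software-emulation path concrete: without FP16 hardware, a scalar __fp16-to-float conversion becomes a call to a libgcc helper such as __gnu_h2f_ieee. The following is a portable sketch of what such an IEEE binary16-to-binary32 widening computes; it is an illustration only, not libgcc's actual implementation.

```c
#include <stdint.h>
#include <string.h>

/* Widen an IEEE binary16 bit pattern to a binary32 value -- the job
   __gnu_h2f_ieee does when no vcvtb.f32.f16 instruction is available.  */
float half_to_float (uint16_t h)
{
  uint32_t sign = (uint32_t) (h & 0x8000) << 16;
  uint32_t exp  = (h >> 10) & 0x1F;
  uint32_t mant = h & 0x3FF;
  uint32_t bits;
  float f;

  if (exp == 0x1F)                    /* Inf or NaN: maximum exponent.  */
    bits = sign | 0x7F800000u | (mant << 13);
  else if (exp != 0)                  /* Normal number: rebias 15 -> 127.  */
    bits = sign | ((exp + (127 - 15)) << 23) | (mant << 13);
  else if (mant == 0)                 /* Signed zero.  */
    bits = sign;
  else                                /* Subnormal: normalise the mantissa.  */
    {
      exp = 127 - 15 + 1;
      while (!(mant & 0x400))
        {
          mant <<= 1;
          exp--;
        }
      bits = sign | (exp << 23) | ((mant & 0x3FF) << 13);
    }

  memcpy (&f, &bits, sizeof f);       /* Reinterpret the bits as a float.  */
  return f;
}
```

Note that this direction (half to float) is exact for every input; the alternative (ARM) format differs only in treating the maximum exponent as normal numbers instead of Inf/NaN, which is why the compiler must know the format to pick the right helper.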
I haven't seen the patch yet but here are my thoughts on where this should be
going.

On 07/07/15 18:17, Alan Lawrence wrote:
> Kyrill Tkachov wrote:
>> On 07/07/15 17:34, Alan Lawrence wrote:
>>> Kyrill Tkachov wrote:
>>>> On 07/07/15 14:09, Kyrill Tkachov wrote:
>>>>> Hi Alan,
>>>>>
>>>>> On 07/07/15 13:34, Alan Lawrence wrote:
>>>>>> As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html
>>>>> For some context, the reference for these is at:
>>>>> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf
>>>>>
>>>>> This patch is ok once you and Charles decide on how to proceed with the two prerequisites.
>>>> On second thought, the ACLE document at
>>>> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf
>>>> says in 12.2.1:
>>>> "float16 types are only available when the __fp16 type is defined, i.e. when supported by the hardware"
>>> However, we support __fp16 whenever the user specifies -mfp16-format=ieee or
>>> -mfp16-format=alternative, regardless of whether we have hardware support or not.
>>>
>>> (Without hardware support, gcc generates calls to __gnu_f2h_ieee or
>>> __gnu_f2h_alternative instead of vcvtb.f16.f32, and __gnu_h2f_ieee or
>>> __gnu_h2f_alternative instead of vcvtb.f32.f16. However, there is no way to
>>> support __fp16 using just those hardware instructions without caring about
>>> which format is in use.)
>>
>> Hmmm... In my opinion intrinsics should aim to map to instructions rather than
>> go away and call library functions, but this is existing functionality that
>> current users might depend on :(
>
> Sorry - to clarify: currently we generate __gnu_f2h_ieee / __gnu_h2f_ieee to
> convert between single __fp16 and 'float' values when there is no HW. General
> operations on scalar __fp16 values are performed by converting to float,
> performing the operations on float, and converting back. The __fp16 type is
> available and "usable" without HW support, but only when -mfp16-format is
> specified.
>
> (The existing) intrinsics operating on float16x[48] vectors (converting
> to/from float32x4) are *not* available without hardware support; these
> intrinsics *are* available without specifying -mfp16-format.
>
> ACLE (4.1.2) allows toolchains to provide __fp16 when not implemented in HW,
> even if this is not required.

The type should exist with the presence of the SIMD unit, and all the
intrinsics that treat this as a bag of bits should just work (TM). The only
intrinsics to be guarded by -mfpu=neon-fp16 should really be the intrinsics
for the instructions that interpret the 16 bits as float16 types.

>> CC'ing the ARM maintainers and Tejas for an ACLE perspective.
>> I think that we'd want to gate the definition of __fp16 on hardware
>> availability as well (the -mfpu option) rather than just arm_fp16_format, but
>> I'm not sure of the impact this will have on existing users.

This is just a storage format in the scalar world, and the ACLE allows folks
to have fp16 support without hardware. There are helper routines that were put
in in the first place for this purpose.

> Sure... but do we require -mfpu *and* -mfp16-format? s/and/or/? Do we require
> -mfp16-format for float16x[48] intrinsics, or allow format-agnostic code (as
> HW support allows us to!)?

I'd say we require the -mfpu option for the intrinsics that interpret the
float16 type, but there is no bearing on the float16 format being chosen for
this purpose: the actual instruction being emitted takes care of doing the
right thing as per the format specified by the AHP bit in the FPSCR. This is
unlike the scalar case, where the compiler *needs* to know the fp16 format
that the user intended to use in order to call the correct emulation function.

Thus in summary:

1. -mfpu=neon implies the presence of the float16x(4/8) types and all the
   intrinsics that treat these values as bags of bits.
2. -mfpu=neon-fp16 implies the presence of the vcvt* intrinsics that are
   needed for the float16 types.

Thoughts?

regards
Ramana

> Cheers, Alan
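The bag-of-bits distinction Ramana draws can be sketched in portable C. The struct types and function below are hypothetical stand-ins (the real float16x4_t/int16x4_t are compiler built-in vector types); the point is only that a vreinterpret-class intrinsic relabels storage without reading any lane as a half-precision number.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for float16x4_t and int16x4_t, purely to show the
   semantics; the real NEON types are compiler built-ins.  */
typedef struct { uint16_t lane[4]; } f16x4;
typedef struct { int16_t  lane[4]; } s16x4;

/* "Bag of bits": like vreinterpret_s16_f16, this copies the 64 bits of
   storage unchanged and never interprets a lane as a float16 value, so
   logically it needs only the SIMD unit (-mfpu=neon).  By contrast, a
   vcvt-class intrinsic such as vcvt_f32_f16 reads each 16-bit lane as a
   half-precision number, and those are the ones to guard with
   -mfpu=neon-fp16.  */
s16x4 reinterpret_s16_f16 (f16x4 v)
{
  s16x4 r;
  memcpy (&r, &v, sizeof r);   /* no conversion, no lane inspection */
  return r;
}
```

Because no lane is interpreted, this class of intrinsic is also indifferent to -mfp16-format: IEEE and alternative-format bit patterns pass through identically.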
Ramana Radhakrishnan wrote:
> I haven't seen the patch yet but here are my thoughts on where this should be
> going.
>
> Thus in summary:
>
> 1. -mfpu=neon implies the presence of the float16x(4/8) types and all the
>    intrinsics that treat these values as bags of bits.
> 2. -mfpu=neon-fp16 implies the presence of the vcvt* intrinsics that are
>    needed for the float16 types.

So I think the "problems" are statements in ACLE that (a) we should only have
float16x(4/8)_t types when we have scalar types as well; (b) whenever we have
a scalar __fp16 type, we should have one or other of the
__ARM_FP16_FORMAT_(IEEE/ALTERNATIVE) macros defined to indicate the format
that's in use.

Sadly these seem to forbid the current situation, whereby we expose hardware
conversion instructions (that work with either fp16 format, according to the
status of the FPSCR bit) and allow compiling a binary that will work with
either format :(

The situation is further complicated by GCC's support for the alternative
format (not mandated by ACLE), by the fact that we can support either format
in the absence of any hardware (as we have software emulation routines for
scalar conversions in either format, as long as we know which at compile
time), and by the fact that object files compiled with different -mfp16-format
cannot be linked together (the ABI attributes conflict).

However, I think we can still go with Ramana's point 1, albeit _only_when_ an
-mfp16-format is specified; 2 similarly (i.e. -mfpu=neon-fp16 will not provide
any additional intrinsics unless an -mfp16-format is specified). I'll repost
the patch series shortly with those changes implemented.

In the meantime: are patches 1 & 2 (ARM __builtin_arm_neon_lane_bounds and
qualifier_lane_index) OK to commit? These contain nothing float16-specific,
and would unblock Charles Baylis' work on PR63870
(https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00545.html). I'll ping the
AArch64 changes separately.

Cheers, Alan
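The gating Alan settles on can be sketched as preprocessor logic. The __ARM_FP16_FORMAT_* macro names are the real ACLE ones; everything else here (the hand-written #define simulating the driver, the GCC_FLOAT16_TYPES_DEFINED macro, the probe function) is illustrative only, to make the sketch self-contained.

```c
/* In real use the compiler predefines __ARM_FP16_FORMAT_IEEE or
   __ARM_FP16_FORMAT_ALTERNATIVE when the user passes -mfp16-format=...;
   defining one by hand here only simulates that for this sketch.  */
#define __ARM_FP16_FORMAT_IEEE 1

#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
/* Here arm_neon.h would define float16_t, float16x4_t, float16x8_t and the
   intrinsics operating on them.  */
#define GCC_FLOAT16_TYPES_DEFINED 1
#else
/* No -mfp16-format on the command line: the types stay hidden -- the
   compatibility cost for format-agnostic code discussed above.  */
#define GCC_FLOAT16_TYPES_DEFINED 0
#endif

/* Probe used only so the sketch has observable behaviour.  */
int float16_types_defined (void)
{
  return GCC_FLOAT16_TYPES_DEFINED;
}
```

With the simulated -mfp16-format=ieee in effect, the probe reports the types as available; removing the hand-written #define flips it to 0, mirroring a command line with no -mfp16-format.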
commit 54a89a084fbd00e4de036f549ca893b74b8f58fb
Author: Alan Lawrence <alan.lawrence@arm.com>
Date:   Mon Dec 8 18:40:03 2014 +0000

    ARM: float16x4_t intrinsics (v2 - fix v[sg]et_lane_f16 at -O0, no vdup_n/vmov_n)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index c923e29..b4100c8 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -41,6 +41,7 @@ typedef __simd64_int8_t int8x8_t;
 typedef __simd64_int16_t int16x4_t;
 typedef __simd64_int32_t int32x2_t;
 typedef __builtin_neon_di int64x1_t;
+typedef __builtin_neon_hf float16_t;
 typedef __simd64_float16_t float16x4_t;
 typedef __simd64_float32_t float32x2_t;
 typedef __simd64_poly8_t poly8x8_t;
@@ -5201,6 +5202,19 @@ vget_lane_s32 (int32x2_t __a, const int __b)
   return (int32_t)__builtin_neon_vget_lanev2si (__a, __b);
 }

+/* Functions cannot accept or return __FP16 types.  Even if the function
+   were marked always-inline so there were no call sites, the declaration
+   would nonetheless raise an error.  Hence, we must use a macro instead.  */
+
+#define vget_lane_f16(__v, __idx)		\
+  __extension__					\
+  ({						\
+    float16x4_t __vec = (__v);			\
+    __builtin_arm_lane_check (4, __idx);	\
+    float16_t __res = __vec[__idx];		\
+    __res;					\
+  })
+
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
 vget_lane_f32 (float32x2_t __a, const int __b)
 {
@@ -5333,6 +5347,16 @@ vset_lane_s32 (int32_t __a, int32x2_t __b, const int __c)
   return (int32x2_t)__builtin_neon_vset_lanev2si ((__builtin_neon_si) __a, __b, __c);
 }

+#define vset_lane_f16(__e, __v, __idx)		\
+  __extension__					\
+  ({						\
+    float16_t __elem = (__e);			\
+    float16x4_t __vec = (__v);			\
+    __builtin_arm_lane_check (4, __idx);	\
+    __vec[__idx] = __elem;			\
+    __vec;					\
+  })
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vset_lane_f32 (float32_t __a, float32x2_t __b, const int __c)
 {
@@ -5479,6 +5503,12 @@ vcreate_s64 (uint64_t __a)
   return (int64x1_t)__builtin_neon_vcreatedi ((__builtin_neon_di) __a);
 }

+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcreate_f16 (uint64_t __a)
+{
+  return (float16x4_t) __a;
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vcreate_f32 (uint64_t __a)
 {
@@ -8796,6 +8826,12 @@ vld1_lane_s32 (const int32_t * __a, int32x2_t __b, const int __c)
   return (int32x2_t)__builtin_neon_vld1_lanev2si ((const __builtin_neon_si *) __a, __b, __c);
 }

+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vld1_lane_f16 (const float16_t * __a, float16x4_t __b, const int __c)
+{
+  return vset_lane_f16 (*__a, __b, __c);
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vld1_lane_f32 (const float32_t * __a, float32x2_t __b, const int __c)
 {
@@ -8944,6 +8980,13 @@ vld1_dup_s32 (const int32_t * __a)
   return (int32x2_t)__builtin_neon_vld1_dupv2si ((const __builtin_neon_si *) __a);
 }

+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vld1_dup_f16 (const float16_t * __a)
+{
+  float16_t __f = *__a;
+  return (float16x4_t) { __f, __f, __f, __f };
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vld1_dup_f32 (const float32_t * __a)
 {
@@ -11828,6 +11871,12 @@ vreinterpret_p8_p16 (poly16x4_t __a)
 }

 __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vreinterpret_p8_f16 (float16x4_t __a)
+{
+  return (poly8x8_t) __a;
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
 vreinterpret_p8_f32 (float32x2_t __a)
 {
   return (poly8x8_t)__builtin_neon_vreinterpretv8qiv2sf (__a);
@@ -11896,6 +11945,12 @@ vreinterpret_p16_p8 (poly8x8_t __a)
 }

 __extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+vreinterpret_p16_f16 (float16x4_t __a)
+{
+  return (poly16x4_t) __a;
+}
+
+__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
 vreinterpret_p16_f32 (float32x2_t __a)
 {
   return (poly16x4_t)__builtin_neon_vreinterpretv4hiv2sf (__a);
@@ -11957,6 +12012,80 @@ vreinterpret_p16_u32 (uint32x2_t __a)
   return (poly16x4_t)__builtin_neon_vreinterpretv4hiv2si ((int32x2_t) __a);
 }

+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_p8 (poly8x8_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_p16 (poly16x4_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_f32 (float32x2_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_p64 (poly64x1_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+#endif
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_s64 (int64x1_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_u64 (uint64x1_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_s8 (int8x8_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_s16 (int16x4_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_s32 (int32x2_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_u8 (uint8x8_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_u16 (uint16x4_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_u32 (uint32x2_t __a)
+{
+  return (float16x4_t) __a;
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vreinterpret_f32_p8 (poly8x8_t __a)
 {
@@ -11969,6 +12098,12 @@ vreinterpret_f32_p16 (poly16x4_t __a)
   return (float32x2_t)__builtin_neon_vreinterpretv2sfv4hi ((int16x4_t) __a);
 }

+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vreinterpret_f32_f16 (float16x4_t __a)
+{
+  return (float32x2_t) __a;
+}
+
 #ifdef __ARM_FEATURE_CRYPTO
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vreinterpret_f32_p64 (poly64x1_t __a)
@@ -12043,6 +12178,14 @@ vreinterpret_p64_p16 (poly16x4_t __a)
 #endif
 #ifdef __ARM_FEATURE_CRYPTO
 __extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
+vreinterpret_p64_f16 (float16x4_t __a)
+{
+  return (poly64x1_t) __a;
+}
+
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
 vreinterpret_p64_f32 (float32x2_t __a)
 {
   return (poly64x1_t)__builtin_neon_vreinterpretdiv2sf (__a);
@@ -12126,6 +12269,12 @@ vreinterpret_s64_p16 (poly16x4_t __a)
 }

 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+vreinterpret_s64_f16 (float16x4_t __a)
+{
+  return (int64x1_t) __a;
+}
+
+__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vreinterpret_s64_f32 (float32x2_t __a)
 {
   return (int64x1_t)__builtin_neon_vreinterpretdiv2sf (__a);
@@ -12194,6 +12343,12 @@ vreinterpret_u64_p16 (poly16x4_t __a)
 }

 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vreinterpret_u64_f16 (float16x4_t __a)
+{
+  return (uint64x1_t) __a;
+}
+
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
 vreinterpret_u64_f32 (float32x2_t __a)
 {
   return (uint64x1_t)__builtin_neon_vreinterpretdiv2sf (__a);
@@ -12262,6 +12417,12 @@ vreinterpret_s8_p16 (poly16x4_t __a)
 }

 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vreinterpret_s8_f16 (float16x4_t __a)
+{
+  return (int8x8_t) __a;
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vreinterpret_s8_f32 (float32x2_t __a)
 {
   return (int8x8_t)__builtin_neon_vreinterpretv8qiv2sf (__a);
@@ -12330,6 +12491,12 @@ vreinterpret_s16_p16 (poly16x4_t __a)
 }

 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vreinterpret_s16_f16 (float16x4_t __a)
+{
+  return (int16x4_t) __a;
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vreinterpret_s16_f32 (float32x2_t __a)
 {
   return (int16x4_t)__builtin_neon_vreinterpretv4hiv2sf (__a);
@@ -12398,6 +12565,12 @@ vreinterpret_s32_p16 (poly16x4_t __a)
 }

 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vreinterpret_s32_f16 (float16x4_t __a)
+{
+  return (int32x2_t) __a;
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vreinterpret_s32_f32 (float32x2_t __a)
 {
   return (int32x2_t)__builtin_neon_vreinterpretv2siv2sf (__a);
@@ -12466,6 +12639,12 @@ vreinterpret_u8_p16 (poly16x4_t __a)
 }

 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vreinterpret_u8_f16 (float16x4_t __a)
+{
+  return (uint8x8_t) __a;
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vreinterpret_u8_f32 (float32x2_t __a)
 {
   return (uint8x8_t)__builtin_neon_vreinterpretv8qiv2sf (__a);
@@ -12534,6 +12713,12 @@ vreinterpret_u16_p16 (poly16x4_t __a)
 }

 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vreinterpret_u16_f16 (float16x4_t __a)
+{
+  return (uint16x4_t) __a;
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vreinterpret_u16_f32 (float32x2_t __a)
 {
   return (uint16x4_t)__builtin_neon_vreinterpretv4hiv2sf (__a);
@@ -12602,6 +12787,12 @@ vreinterpret_u32_p16 (poly16x4_t __a)
 }

 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vreinterpret_u32_f16 (float16x4_t __a)
+{
+  return (uint32x2_t) __a;
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vreinterpret_u32_f32 (float32x2_t __a)
 {
   return (uint32x2_t)__builtin_neon_vreinterpretv2siv2sf (__a);