diff mbox

[ARM,1/2] Add support for vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsics

Message ID 20130413150344.2a80da8e@octopus
State New
Headers show

Commit Message

Julian Brown April 13, 2013, 2:03 p.m. UTC
On Fri, 12 Apr 2013 20:09:39 +0100
Julian Brown <julian@codesourcery.com> wrote:

> On Fri, 12 Apr 2013 15:19:18 +0100
> Kyrylo Tkachov <kyrylo.tkachov@arm.com> wrote:
> 
> > Hi all,
> > 
> > This patch adds the vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsic
> > to arm_neon.h through the generator ML scripts and also adds the
> > built-ins to which the intrinsics will map to. The generator ML
> > scripts are updated and used to generate the relevant .texi
> > documentation, arm_neon.h and the tests in gcc.target/arm/neon .
> 
> FWIW, some of the changes to neon*.ml can be simplified somewhat -- my
> attempt at an improved version of those bits is attached. I'm still
> not too happy with mode_suffix, but these new instructions require
> adding semantics to parts of the generator program which weren't
> really very well-defined to start with :-). I appreciate that it's a
> bit of a tangle...

I thought of an improvement to the mode_suffix part from the last
version of the patch, so here it is. I'm done fiddling with this now,
so back to you!

Cheers,

Julian

Comments

Kyrylo Tkachov April 17, 2013, 11:06 a.m. UTC | #1
Hi Julian,

> From: Julian Brown [mailto:julian@codesourcery.com]
> Sent: 13 April 2013 15:04
> To: Julian Brown
> Cc: Kyrylo Tkachov; gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana
> Radhakrishnan
> Subject: Re: [PATCH][ARM][1/2] Add support for vcvt_f16_f32 and
> vcvt_f32_f16 NEON intrinsics
> 
> On Fri, 12 Apr 2013 20:09:39 +0100
> Julian Brown <julian@codesourcery.com> wrote:
> 
> > On Fri, 12 Apr 2013 15:19:18 +0100
> > Kyrylo Tkachov <kyrylo.tkachov@arm.com> wrote:
> >
> > > Hi all,
> > >
> > > This patch adds the vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsic
> > > to arm_neon.h through the generator ML scripts and also adds the
> > > built-ins to which the intrinsics will map to. The generator ML
> > > scripts are updated and used to generate the relevant .texi
> > > documentation, arm_neon.h and the tests in gcc.target/arm/neon .
> >
> > FWIW, some of the changes to neon*.ml can be simplified somewhat --
> my
> > attempt at an improved version of those bits is attached. I'm still
> > not too happy with mode_suffix, but these new instructions require
> > adding semantics to parts of the generator program which weren't
> > really very well-defined to start with :-). I appreciate that it's a
> > bit of a tangle...
> 
> I thought of an improvement to the mode_suffix part from the last
> version of the patch, so here it is. I'm done fiddling with this now,
> so back to you!

Thanks for looking at it! My Ocaml-fu is rather limited.
It does look cleaner now.
Here it is together with all the other parts of the patch, plus some
minor formatting changes.

Ok for trunk now?

gcc/ChangeLog
2013-04-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
            Julian Brown  <julian@codesourcery.com>

	* config/arm/arm.c (neon_builtin_type_mode): Add T_V4HF.
	(TB_DREG): Add T_V4HF.
	(v4hf_UP): New macro.
	(neon_itype): Add NEON_FLOAT_WIDEN, NEON_FLOAT_NARROW.
	(arm_init_neon_builtins): Handle NEON_FLOAT_WIDEN,
	NEON_FLOAT_NARROW.
	Handle initialisation of V4HF. Adjust initialisation of reinterpret
	built-ins.
	(arm_expand_neon_builtin): Handle NEON_FLOAT_WIDEN,
	NEON_FLOAT_NARROW.
	(arm_vector_mode_supported_p): Handle V4HF.
	(arm_mangle_map): Handle V4HFmode.
	* config/arm/arm.h (VALID_NEON_DREG_MODE): Add V4HF.
	* config/arm/arm_neon_builtins.def: Add entries for
	vcvtv4hfv4sf, vcvtv4sfv4hf.
	* config/arm/neon.md (neon_vcvtv4sfv4hf): New pattern.
	(neon_vcvtv4hfv4sf): Likewise.
	* config/arm/neon-gen.ml: Handle half-precision floating point
	features.
	* config/arm/neon-testgen.ml: Handle Requires_FP_bit feature.
	* config/arm/arm_neon.h: Regenerate.
	* config/arm/neon.ml (type elts): Add F16.
	(type vectype): Add T_float16x4, T_floatHF.
	(type vecmode): Add V4HF.
	(type features): Add Requires_FP_bit feature.
	(elt_width): Handle F16.
	(elt_class): Likewise.
	(elt_of_class_width): Likewise.
	(mode_of_elt): Refactor.
	(type_for_elt): Handle F16, fix error messages.
	(vectype_size): Handle T_float16x4.
	(vcvt_sh): New function.
	(ops): Add entries for vcvt_f16_f32, vcvt_f32_f16.
	(string_of_vectype): Handle T_floatHF, T_float16, T_float16x4.
	(string_of_mode): Handle V4HF.
	* doc/arm-neon-intrinsics.texi: Regenerate.


gcc/testsuite/ChangeLog
2013-04-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
            Julian Brown  <julian@codesourcery.com>

	* gcc.target/arm/neon/vcvtf16_f32.c: New test. Generated.
	* gcc.target/arm/neon/vcvtf32_f16.c: Likewise.
Richard Earnshaw April 17, 2013, 3:03 p.m. UTC | #2
On 17/04/13 12:06, Kyrylo Tkachov wrote:
> Hi Julian,
>
>> From: Julian Brown [mailto:julian@codesourcery.com]
>> Sent: 13 April 2013 15:04
>> To: Julian Brown
>> Cc: Kyrylo Tkachov; gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana
>> Radhakrishnan
>> Subject: Re: [PATCH][ARM][1/2] Add support for vcvt_f16_f32 and
>> vcvt_f32_f16 NEON intrinsics
>>
>> On Fri, 12 Apr 2013 20:09:39 +0100
>> Julian Brown <julian@codesourcery.com> wrote:
>>
>>> On Fri, 12 Apr 2013 15:19:18 +0100
>>> Kyrylo Tkachov <kyrylo.tkachov@arm.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> This patch adds the vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsic
>>>> to arm_neon.h through the generator ML scripts and also adds the
>>>> built-ins to which the intrinsics will map to. The generator ML
>>>> scripts are updated and used to generate the relevant .texi
>>>> documentation, arm_neon.h and the tests in gcc.target/arm/neon .
>>>
>>> FWIW, some of the changes to neon*.ml can be simplified somewhat --
>> my
>>> attempt at an improved version of those bits is attached. I'm still
>>> not too happy with mode_suffix, but these new instructions require
>>> adding semantics to parts of the generator program which weren't
>>> really very well-defined to start with :-). I appreciate that it's a
>>> bit of a tangle...
>>
>> I thought of an improvement to the mode_suffix part from the last
>> version of the patch, so here it is. I'm done fiddling with this now,
>> so back to you!
>
> Thanks for looking at it! My Ocaml-fu is rather limited.
> It does look cleaner now.
> Here it is together with all the other parts of the patch, plus some
> minor formatting changes.
>
> Ok for trunk now?
>
> gcc/ChangeLog
> 2013-04-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>              Julian Brown  <julian@codesourcery.com>
>
> 	* config/arm/arm.c (neon_builtin_type_mode): Add T_V4HF.
> 	(TB_DREG): Add T_V4HF.
> 	(v4hf_UP): New macro.
> 	(neon_itype): Add NEON_FLOAT_WIDEN, NEON_FLOAT_NARROW.
> 	(arm_init_neon_builtins): Handle NEON_FLOAT_WIDEN,
> 	NEON_FLOAT_NARROW.
> 	Handle initialisation of V4HF. Adjust initialisation of reinterpret
> 	built-ins.
> 	(arm_expand_neon_builtin): Handle NEON_FLOAT_WIDEN,
> 	NEON_FLOAT_NARROW.
> 	(arm_vector_mode_supported_p): Handle V4HF.
> 	(arm_mangle_map): Handle V4HFmode.
> 	* config/arm/arm.h (VALID_NEON_DREG_MODE): Add V4HF.
> 	* config/arm/arm_neon_builtins.def: Add entries for
> 	vcvtv4hfv4sf, vcvtv4sfv4hf.
> 	* config/arm/neon.md (neon_vcvtv4sfv4hf): New pattern.
> 	(neon_vcvtv4hfv4sf): Likewise.
> 	* config/arm/neon-gen.ml: Handle half-precision floating point
> 	features.
> 	* config/arm/neon-testgen.ml: Handle Requires_FP_bit feature.
> 	* config/arm/arm_neon.h: Regenerate.
> 	* config/arm/neon.ml (type elts): Add F16.
> 	(type vectype): Add T_float16x4, T_floatHF.
> 	(type vecmode): Add V4HF.
> 	(type features): Add Requires_FP_bit feature.
> 	(elt_width): Handle F16.
> 	(elt_class): Likewise.
> 	(elt_of_class_width): Likewise.
> 	(mode_of_elt): Refactor.
> 	(type_for_elt): Handle F16, fix error messages.
> 	(vectype_size): Handle T_float16x4.
> 	(vcvt_sh): New function.
> 	(ops): Add entries for vcvt_f16_f32, vcvt_f32_f16.
> 	(string_of_vectype): Handle T_floatHF, T_float16, T_float16x4.
> 	(string_of_mode): Handle V4HF.
> 	* doc/arm-neon-intrinsics.texi: Regenerate.
>
>
> gcc/testsuite/ChangeLog
> 2013-04-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>              Julian Brown  <julian@codesourcery.com>
>
> 	* gcc.target/arm/neon/vcvtf16_f32.c: New test. Generated.
> 	* gcc.target/arm/neon/vcvtf32_f16.c: Likewise.
>
>
> neon-vcvt-intrinsics.patch
>

Please give Julian 24 hours for one final review of the Ocaml bits. 
Otherwise OK.

R.
diff mbox

Patch

Index: neon-gen.ml
===================================================================
--- neon-gen.ml	(revision 197804)
+++ neon-gen.ml	(working copy)
@@ -121,6 +121,7 @@  let rec signed_ctype = function
   | T_uint16 | T_int16 -> T_intHI
   | T_uint32 | T_int32 -> T_intSI
   | T_uint64 | T_int64 -> T_intDI
+  | T_float16 -> T_floatHF
   | T_float32 -> T_floatSF
   | T_poly8 -> T_intQI
   | T_poly16 -> T_intHI
@@ -275,8 +276,8 @@  let rec mode_suffix elttype shape =
     let mode = mode_of_elt elttype shape in
     string_of_mode mode
   with MixedMode (dst, src) ->
-    let dstmode = mode_of_elt dst shape
-    and srcmode = mode_of_elt src shape in
+    let dstmode = mode_of_elt ~argpos:0 dst shape
+    and srcmode = mode_of_elt ~argpos:1 src shape in
     string_of_mode dstmode ^ string_of_mode srcmode
 
 let get_shuffle features =
@@ -291,19 +292,24 @@  let print_feature_test_start features =
     match List.find (fun feature ->
                        match feature with Requires_feature _ -> true
                                         | Requires_arch _ -> true
+                                        | Requires_FP_bit _ -> true
                                         | _ -> false)
                      features with
-      Requires_feature feature -> 
+      Requires_feature feature ->
         Format.printf "#ifdef __ARM_FEATURE_%s@\n" feature
     | Requires_arch arch ->
         Format.printf "#if __ARM_ARCH >= %d@\n" arch
+    | Requires_FP_bit bit ->
+        Format.printf "#if ((__ARM_FP & 0x%X) != 0)@\n"
+                      (1 lsl bit)
     | _ -> assert false
   with Not_found -> assert true
 
 let print_feature_test_end features =
   let feature =
-    List.exists (function Requires_feature x -> true
-                          | Requires_arch x -> true
+    List.exists (function Requires_feature _ -> true
+                          | Requires_arch _ -> true
+                          | Requires_FP_bit _ -> true
                           |  _ -> false) features in
   if feature then Format.printf "#endif@\n"
 
@@ -365,6 +371,7 @@  let deftypes () =
     "__builtin_neon_hi", "int", 16, 4;
     "__builtin_neon_si", "int", 32, 2;
     "__builtin_neon_di", "int", 64, 1;
+    "__builtin_neon_hf", "float", 16, 4;
     "__builtin_neon_sf", "float", 32, 2;
     "__builtin_neon_poly8", "poly", 8, 8;
     "__builtin_neon_poly16", "poly", 16, 4;
Index: neon.ml
===================================================================
--- neon.ml	(revision 197804)
+++ neon.ml	(working copy)
@@ -21,7 +21,7 @@ 
    <http://www.gnu.org/licenses/>.  *)
 
 (* Shorthand types for vector elements.  *)
-type elts = S8 | S16 | S32 | S64 | F32 | U8 | U16 | U32 | U64 | P8 | P16
+type elts = S8 | S16 | S32 | S64 | F16 | F32 | U8 | U16 | U32 | U64 | P8 | P16
           | I8 | I16 | I32 | I64 | B8 | B16 | B32 | B64 | Conv of elts * elts
           | Cast of elts * elts | NoElts
 
@@ -37,6 +37,7 @@  type vectype = T_int8x8    | T_int8x16
 	     | T_uint16x4  | T_uint16x8
 	     | T_uint32x2  | T_uint32x4
 	     | T_uint64x1  | T_uint64x2
+	     | T_float16x4
 	     | T_float32x2 | T_float32x4
 	     | T_poly8x8   | T_poly8x16
 	     | T_poly16x4  | T_poly16x8
@@ -46,11 +47,13 @@  type vectype = T_int8x8    | T_int8x16
              | T_uint8     | T_uint16
              | T_uint32    | T_uint64
              | T_poly8     | T_poly16
-             | T_float32   | T_arrayof of int * vectype
+             | T_float16   | T_float32
+             | T_arrayof of int * vectype
              | T_ptrto of vectype | T_const of vectype
              | T_void      | T_intQI
              | T_intHI     | T_intSI
-             | T_intDI     | T_floatSF
+             | T_intDI     | T_floatHF
+             | T_floatSF
 
 (* The meanings of the following are:
      TImode : "Tetra", two registers (four words).
@@ -93,7 +96,7 @@  type arity = Arity0 of vectype
            | Arity4 of vectype * vectype * vectype * vectype * vectype
 
 type vecmode = V8QI | V4HI | V2SI | V2SF | DI
-             | V16QI | V8HI | V4SI | V4SF | V2DI
+             | V16QI | V8HI | V4SI | V4SF | V4HF | V2DI
              | QI | HI | SI | SF
 
 type opcode =
@@ -284,18 +287,22 @@  type features =
   | Fixed_core_reg
     (* Mark that the intrinsic requires __ARM_FEATURE_string to be defined.  *)
   | Requires_feature of string
+    (* Mark that the intrinsic requires a particular architecture version.  *)
   | Requires_arch of int
+    (* Mark that the intrinsic requires a particular bit in __ARM_FP to
+    be set.   *)
+  | Requires_FP_bit of int
 
 exception MixedMode of elts * elts
 
 let rec elt_width = function
     S8 | U8 | P8 | I8 | B8 -> 8
-  | S16 | U16 | P16 | I16 | B16 -> 16
+  | S16 | U16 | P16 | I16 | B16 | F16 -> 16
   | S32 | F32 | U32 | I32 | B32 -> 32
   | S64 | U64 | I64 | B64 -> 64
   | Conv (a, b) ->
       let wa = elt_width a and wb = elt_width b in
-      if wa = wb then wa else failwith "element width?"
+      if wa = wb then wa else raise (MixedMode (a, b))
   | Cast (a, b) -> raise (MixedMode (a, b))
   | NoElts -> failwith "No elts"
 
@@ -303,7 +310,7 @@  let rec elt_class = function
     S8 | S16 | S32 | S64 -> Signed
   | U8 | U16 | U32 | U64 -> Unsigned
   | P8 | P16 -> Poly
-  | F32 -> Float
+  | F16 | F32 -> Float
   | I8 | I16 | I32 | I64 -> Int
   | B8 | B16 | B32 | B64 -> Bits
   | Conv (a, b) | Cast (a, b) -> ConvClass (elt_class a, elt_class b)
@@ -315,6 +322,7 @@  let elt_of_class_width c w =
   | Signed, 16 -> S16
   | Signed, 32 -> S32
   | Signed, 64 -> S64
+  | Float, 16 -> F16
   | Float, 32 -> F32
   | Unsigned, 8 -> U8
   | Unsigned, 16 -> U16
@@ -384,7 +392,12 @@  let find_key_operand operands =
   in
     scan ((Array.length operands) - 1)
 
-let rec mode_of_elt elt shape =
+(* Find a vecmode from a shape_elt ELT for an instruction with shape_form
+   SHAPE.  For a Use_operands shape, if ARGPOS is passed then return the mode
+   for the given argument position, else determine which argument to return a
+   mode for automatically.  *)
+
+let rec mode_of_elt ?argpos elt shape =
   let flt = match elt_class elt with
     Float | ConvClass(_, Float) -> true | _ -> false in
   let idx =
@@ -394,7 +407,10 @@  let rec mode_of_elt elt shape =
   in match shape with
     All (_, Dreg) | By_scalar Dreg | Pair_result Dreg | Unary_scalar Dreg
   | Binary_imm Dreg | Long_noreg Dreg | Wide_noreg Dreg ->
-      [| V8QI; V4HI; if flt then V2SF else V2SI; DI |].(idx)
+      if flt then
+        [| V8QI; V4HF; V2SF; DI |].(idx)
+      else
+        [| V8QI; V4HI; V2SI; DI |].(idx)
   | All (_, Qreg) | By_scalar Qreg | Pair_result Qreg | Unary_scalar Qreg
   | Binary_imm Qreg | Long_noreg Qreg | Wide_noreg Qreg ->
       [| V16QI; V8HI; if flt then V4SF else V4SI; V2DI |].(idx)
@@ -404,7 +420,11 @@  let rec mode_of_elt elt shape =
   | Long_imm ->
       [| V8QI; V4HI; V2SI; DI |].(idx)
   | Narrow | Narrow_imm -> [| V16QI; V8HI; V4SI; V2DI |].(idx)
-  | Use_operands ops -> mode_of_elt elt (All (0, (find_key_operand ops)))
+  | Use_operands ops ->
+      begin match argpos with
+        None -> mode_of_elt ?argpos elt (All (0, (find_key_operand ops)))
+      | Some pos -> mode_of_elt ?argpos elt (All (0, ops.(pos)))
+      end
   | _ -> failwith "invalid shape"
 
 (* Modify an element type dependent on the shape of the instruction and the
@@ -454,10 +474,11 @@  let type_for_elt shape elt no =
         | U16 -> T_uint16x4
         | U32 -> T_uint32x2
         | U64 -> T_uint64x1
+        | F16 -> T_float16x4
         | F32 -> T_float32x2
         | P8 -> T_poly8x8
         | P16 -> T_poly16x4
-        | _ -> failwith "Bad elt type"
+        | _ -> failwith "Bad elt type for Dreg"
         end
     | Qreg ->
         begin match elt with
@@ -472,7 +493,7 @@  let type_for_elt shape elt no =
         | F32 -> T_float32x4
         | P8 -> T_poly8x16
         | P16 -> T_poly16x8
-        | _ -> failwith "Bad elt type"
+        | _ -> failwith "Bad elt type for Qreg"
         end
     | Corereg ->
         begin match elt with
@@ -487,7 +508,7 @@  let type_for_elt shape elt no =
         | P8 -> T_poly8
         | P16 -> T_poly16
         | F32 -> T_float32
-        | _ -> failwith "Bad elt type"
+        | _ -> failwith "Bad elt type for Corereg"
         end
     | Immed ->
         T_immediate (0, 0)
@@ -506,7 +527,7 @@  let type_for_elt shape elt no =
 let vectype_size = function
     T_int8x8 | T_int16x4 | T_int32x2 | T_int64x1
   | T_uint8x8 | T_uint16x4 | T_uint32x2 | T_uint64x1
-  | T_float32x2 | T_poly8x8 | T_poly16x4 -> 64
+  | T_float32x2 | T_poly8x8 | T_poly16x4 | T_float16x4 -> 64
   | T_int8x16 | T_int16x8 | T_int32x4 | T_int64x2
   | T_uint8x16 | T_uint16x8  | T_uint32x4  | T_uint64x2
   | T_float32x4 | T_poly8x16 | T_poly16x8 -> 128
@@ -1217,6 +1238,10 @@  let ops =
       [Conv (S32, F32); Conv (U32, F32); Conv (F32, S32); Conv (F32, U32)];
     Vcvt, [InfoWord], All (2, Qreg), "vcvtQ", conv_1,
       [Conv (S32, F32); Conv (U32, F32); Conv (F32, S32); Conv (F32, U32)];
+    Vcvt, [Builtin_name "vcvt" ; Requires_FP_bit 1],
+          Use_operands [| Dreg; Qreg; |], "vcvt", conv_1, [Conv (F16, F32)];
+    Vcvt, [Builtin_name "vcvt" ; Requires_FP_bit 1],
+          Use_operands [| Qreg; Dreg; |], "vcvt", conv_1, [Conv (F32, F16)];
     Vcvt_n, [InfoWord], Use_operands [| Dreg; Dreg; Immed |], "vcvt_n", conv_2,
       [Conv (S32, F32); Conv (U32, F32); Conv (F32, S32); Conv (F32, U32)];
     Vcvt_n, [InfoWord], Use_operands [| Qreg; Qreg; Immed |], "vcvtQ_n", conv_2,
@@ -1782,7 +1807,7 @@  let rec string_of_elt = function
   | U8 -> "u8" | U16 -> "u16" | U32 -> "u32" | U64 -> "u64"
   | I8 -> "i8" | I16 -> "i16" | I32 -> "i32" | I64 -> "i64"
   | B8 -> "8" | B16 -> "16" | B32 -> "32" | B64 -> "64"
-  | F32 -> "f32" | P8 -> "p8" | P16 -> "p16"
+  | F32 -> "f32" | P8 -> "p8" | P16 -> "p16" | F16 -> "f16"
   | Conv (a, b) | Cast (a, b) -> string_of_elt a ^ "_" ^ string_of_elt b
   | NoElts -> failwith "No elts"
 
@@ -1809,6 +1834,7 @@  let string_of_vectype vt =
   | T_uint32x4 -> affix "uint32x4"
   | T_uint64x1 -> affix "uint64x1"
   | T_uint64x2 -> affix "uint64x2"
+  | T_float16x4 -> affix "float16x4"
   | T_float32x2 -> affix "float32x2"
   | T_float32x4 -> affix "float32x4"
   | T_poly8x8 -> affix "poly8x8"
@@ -1825,6 +1851,7 @@  let string_of_vectype vt =
   | T_uint64 -> affix "uint64"
   | T_poly8 -> affix "poly8"
   | T_poly16 -> affix "poly16"
+  | T_float16 -> affix "float16"
   | T_float32 -> affix "float32"
   | T_immediate _ -> "const int"
   | T_void -> "void"
@@ -1832,6 +1859,7 @@  let string_of_vectype vt =
   | T_intHI -> "__builtin_neon_hi"
   | T_intSI -> "__builtin_neon_si"
   | T_intDI -> "__builtin_neon_di"
+  | T_floatHF -> "__builtin_neon_hf"
   | T_floatSF -> "__builtin_neon_sf"
   | T_arrayof (num, base) ->
       let basename = name (fun x -> x) base in
@@ -1853,10 +1881,10 @@  let string_of_inttype = function
   | B_XImode -> "__builtin_neon_xi"
 
 let string_of_mode = function
-    V8QI -> "v8qi" | V4HI  -> "v4hi"  | V2SI -> "v2si" | V2SF -> "v2sf"
-  | DI   -> "di"   | V16QI -> "v16qi" | V8HI -> "v8hi" | V4SI -> "v4si"
-  | V4SF -> "v4sf" | V2DI  -> "v2di"  | QI -> "qi" | HI -> "hi" | SI -> "si"
-  | SF -> "sf"
+    V8QI -> "v8qi" | V4HI -> "v4hi" | V4HF  -> "v4hf"  | V2SI -> "v2si"
+  | V2SF -> "v2sf" | DI   -> "di"   | V16QI -> "v16qi" | V8HI -> "v8hi"
+  | V4SI -> "v4si" | V4SF -> "v4sf" | V2DI  -> "v2di"  | QI   -> "qi"
+  | HI -> "hi" | SI -> "si" | SF -> "sf"
 
 (* Use uppercase chars for letters which form part of the intrinsic name, but
    should be omitted from the builtin name (the info is passed in an extra
Index: neon-testgen.ml
===================================================================
--- neon-testgen.ml	(revision 197804)
+++ neon-testgen.ml	(working copy)
@@ -163,10 +163,12 @@  let effective_target features =
     match List.find (fun feature ->
                        match feature with Requires_feature _ -> true
                                         | Requires_arch _ -> true
+                                        | Requires_FP_bit 1 -> true
                                         | _ -> false)
                      features with
       Requires_feature "FMA" -> "arm_neonv2"
     | Requires_arch 8 -> "arm_v8_neon"
+    | Requires_FP_bit 1 -> "arm_neon_fp16"
     | _ -> assert false
   with Not_found -> "arm_neon"