Patchwork [ARM,1/2] Add support for vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsics

login
register
mail settings
Submitter Julian Brown
Date April 13, 2013, 2:03 p.m.
Message ID <20130413150344.2a80da8e@octopus>
Download mbox | patch
Permalink /patch/236355/
State New
Headers show

Comments

Julian Brown - April 13, 2013, 2:03 p.m.
On Fri, 12 Apr 2013 20:09:39 +0100
Julian Brown <julian@codesourcery.com> wrote:

> On Fri, 12 Apr 2013 15:19:18 +0100
> Kyrylo Tkachov <kyrylo.tkachov@arm.com> wrote:
> 
> > Hi all,
> > 
> > This patch adds the vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsic
> > to arm_neon.h through the generator ML scripts and also adds the
> > built-ins to which the intrinsics will map to. The generator ML
> > scripts are updated and used to generate the relevant .texi
> > documentation, arm_neon.h and the tests in gcc.target/arm/neon .
> 
> FWIW, some of the changes to neon*.ml can be simplified somewhat -- my
> attempt at an improved version of those bits is attached. I'm still
> not too happy with mode_suffix, but these new instructions require
> adding semantics to parts of the generator program which weren't
> really very well-defined to start with :-). I appreciate that it's a
> bit of a tangle...

I thought of an improvement to the mode_suffix part from the last
version of the patch, so here it is. I'm done fiddling with this now,
so back to you!

Cheers,

Julian
Kyrylo Tkachov - April 17, 2013, 11:06 a.m.
Hi Julian,

> From: Julian Brown [mailto:julian@codesourcery.com]
> Sent: 13 April 2013 15:04
> To: Julian Brown
> Cc: Kyrylo Tkachov; gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana
> Radhakrishnan
> Subject: Re: [PATCH][ARM][1/2] Add support for vcvt_f16_f32 and
> vcvt_f32_f16 NEON intrinsics
> 
> On Fri, 12 Apr 2013 20:09:39 +0100
> Julian Brown <julian@codesourcery.com> wrote:
> 
> > On Fri, 12 Apr 2013 15:19:18 +0100
> > Kyrylo Tkachov <kyrylo.tkachov@arm.com> wrote:
> >
> > > Hi all,
> > >
> > > This patch adds the vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsic
> > > to arm_neon.h through the generator ML scripts and also adds the
> > > built-ins to which the intrinsics will map to. The generator ML
> > > scripts are updated and used to generate the relevant .texi
> > > documentation, arm_neon.h and the tests in gcc.target/arm/neon .
> >
> > FWIW, some of the changes to neon*.ml can be simplified somewhat --
> my
> > attempt at an improved version of those bits is attached. I'm still
> > not too happy with mode_suffix, but these new instructions require
> > adding semantics to parts of the generator program which weren't
> > really very well-defined to start with :-). I appreciate that it's a
> > bit of a tangle...
> 
> I thought of an improvement to the mode_suffix part from the last
> version of the patch, so here it is. I'm done fiddling with this now,
> so back to you!

Thanks for looking at it! My Ocaml-fu is rather limited.
It does look cleaner now.
Here it is together with all the other parts of the patch, plus some
minor formatting changes.

Ok for trunk now?

gcc/ChangeLog
2013-04-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
            Julian Brown  <julian@codesourcery.com>

	* config/arm/arm.c (neon_builtin_type_mode): Add T_V4HF.
	(TB_DREG): Add T_V4HF.
	(v4hf_UP): New macro.
	(neon_itype): Add NEON_FLOAT_WIDEN, NEON_FLOAT_NARROW.
	(arm_init_neon_builtins): Handle NEON_FLOAT_WIDEN,
	NEON_FLOAT_NARROW.
	Handle initialisation of V4HF. Adjust initialisation of reinterpret
	built-ins.
	(arm_expand_neon_builtin): Handle NEON_FLOAT_WIDEN,
	NEON_FLOAT_NARROW.
	(arm_vector_mode_supported_p): Handle V4HF.
	(arm_mangle_map): Handle V4HFmode.
	* config/arm/arm.h (VALID_NEON_DREG_MODE): Add V4HF.
	* config/arm/arm_neon_builtins.def: Add entries for
	vcvtv4hfv4sf, vcvtv4sfv4hf.
	* config/arm/neon.md (neon_vcvtv4sfv4hf): New pattern.
	(neon_vcvtv4hfv4sf): Likewise.
	* config/arm/neon-gen.ml: Handle half-precision floating point
	features.
	* config/arm/neon-testgen.ml: Handle Requires_FP_bit feature.
	* config/arm/arm_neon.h: Regenerate.
	* config/arm/neon.ml (type elts): Add F16.
	(type vectype): Add T_float16x4, T_floatHF.
	(type vecmode): Add V4HF.
	(type features): Add Requires_FP_bit feature.
	(elt_width): Handle F16.
	(elt_class): Likewise.
	(elt_of_class_width): Likewise.
	(mode_of_elt): Refactor.
	(type_for_elt): Handle F16, fix error messages.
	(vectype_size): Handle T_float16x4.
	(vcvt_sh): New function.
	(ops): Add entries for vcvt_f16_f32, vcvt_f32_f16.
	(string_of_vectype): Handle T_floatHF, T_float16, T_float16x4.
	(string_of_mode): Handle V4HF.
	* doc/arm-neon-intrinsics.texi: Regenerate.


gcc/testsuite/ChangeLog
2013-04-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
            Julian Brown  <julian@codesourcery.com>

	* gcc.target/arm/neon/vcvtf16_f32.c: New test. Generated.
	* gcc.target/arm/neon/vcvtf32_f16.c: Likewise.
Richard Earnshaw - April 17, 2013, 3:03 p.m.
On 17/04/13 12:06, Kyrylo Tkachov wrote:
> Hi Julian,
>
>> From: Julian Brown [mailto:julian@codesourcery.com]
>> Sent: 13 April 2013 15:04
>> To: Julian Brown
>> Cc: Kyrylo Tkachov; gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana
>> Radhakrishnan
>> Subject: Re: [PATCH][ARM][1/2] Add support for vcvt_f16_f32 and
>> vcvt_f32_f16 NEON intrinsics
>>
>> On Fri, 12 Apr 2013 20:09:39 +0100
>> Julian Brown <julian@codesourcery.com> wrote:
>>
>>> On Fri, 12 Apr 2013 15:19:18 +0100
>>> Kyrylo Tkachov <kyrylo.tkachov@arm.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> This patch adds the vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsic
>>>> to arm_neon.h through the generator ML scripts and also adds the
>>>> built-ins to which the intrinsics will map to. The generator ML
>>>> scripts are updated and used to generate the relevant .texi
>>>> documentation, arm_neon.h and the tests in gcc.target/arm/neon .
>>>
>>> FWIW, some of the changes to neon*.ml can be simplified somewhat --
>> my
>>> attempt at an improved version of those bits is attached. I'm still
>>> not too happy with mode_suffix, but these new instructions require
>>> adding semantics to parts of the generator program which weren't
>>> really very well-defined to start with :-). I appreciate that it's a
>>> bit of a tangle...
>>
>> I thought of an improvement to the mode_suffix part from the last
>> version of the patch, so here it is. I'm done fiddling with this now,
>> so back to you!
>
> Thanks for looking at it! My Ocaml-fu is rather limited.
> It does look cleaner now.
> Here it is together with all the other parts of the patch, plus some
> minor formatting changes.
>
> Ok for trunk now?
>
> gcc/ChangeLog
> 2013-04-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>              Julian Brown  <julian@codesourcery.com>
>
> 	* config/arm/arm.c (neon_builtin_type_mode): Add T_V4HF.
> 	(TB_DREG): Add T_V4HF.
> 	(v4hf_UP): New macro.
> 	(neon_itype): Add NEON_FLOAT_WIDEN, NEON_FLOAT_NARROW.
> 	(arm_init_neon_builtins): Handle NEON_FLOAT_WIDEN,
> 	NEON_FLOAT_NARROW.
> 	Handle initialisation of V4HF. Adjust initialisation of reinterpret
> 	built-ins.
> 	(arm_expand_neon_builtin): Handle NEON_FLOAT_WIDEN,
> 	NEON_FLOAT_NARROW.
> 	(arm_vector_mode_supported_p): Handle V4HF.
> 	(arm_mangle_map): Handle V4HFmode.
> 	* config/arm/arm.h (VALID_NEON_DREG_MODE): Add V4HF.
> 	* config/arm/arm_neon_builtins.def: Add entries for
> 	vcvtv4hfv4sf, vcvtv4sfv4hf.
> 	* config/arm/neon.md (neon_vcvtv4sfv4hf): New pattern.
> 	(neon_vcvtv4hfv4sf): Likewise.
> 	* config/arm/neon-gen.ml: Handle half-precision floating point
> 	features.
> 	* config/arm/neon-testgen.ml: Handle Requires_FP_bit feature.
> 	* config/arm/arm_neon.h: Regenerate.
> 	* config/arm/neon.ml (type elts): Add F16.
> 	(type vectype): Add T_float16x4, T_floatHF.
> 	(type vecmode): Add V4HF.
> 	(type features): Add Requires_FP_bit feature.
> 	(elt_width): Handle F16.
> 	(elt_class): Likewise.
> 	(elt_of_class_width): Likewise.
> 	(mode_of_elt): Refactor.
> 	(type_for_elt): Handle F16, fix error messages.
> 	(vectype_size): Handle T_float16x4.
> 	(vcvt_sh): New function.
> 	(ops): Add entries for vcvt_f16_f32, vcvt_f32_f16.
> 	(string_of_vectype): Handle T_floatHF, T_float16, T_float16x4.
> 	(string_of_mode): Handle V4HF.
> 	* doc/arm-neon-intrinsics.texi: Regenerate.
>
>
> gcc/testsuite/ChangeLog
> 2013-04-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>              Julian Brown  <julian@codesourcery.com>
>
> 	* gcc.target/arm/neon/vcvtf16_f32.c: New test. Generated.
> 	* gcc.target/arm/neon/vcvtf32_f16.c: Likewise.
>
>
> neon-vcvt-intrinsics.patch
>

Please give Julian 24 hours for one final review of the Ocaml bits. 
Otherwise OK.

R.

Patch

Index: neon-gen.ml
===================================================================
--- neon-gen.ml	(revision 197804)
+++ neon-gen.ml	(working copy)
@@ -121,6 +121,7 @@  let rec signed_ctype = function
   | T_uint16 | T_int16 -> T_intHI
   | T_uint32 | T_int32 -> T_intSI
   | T_uint64 | T_int64 -> T_intDI
+  | T_float16 -> T_floatHF
   | T_float32 -> T_floatSF
   | T_poly8 -> T_intQI
   | T_poly16 -> T_intHI
@@ -275,8 +276,8 @@  let rec mode_suffix elttype shape =
     let mode = mode_of_elt elttype shape in
     string_of_mode mode
   with MixedMode (dst, src) ->
-    let dstmode = mode_of_elt dst shape
-    and srcmode = mode_of_elt src shape in
+    let dstmode = mode_of_elt ~argpos:0 dst shape
+    and srcmode = mode_of_elt ~argpos:1 src shape in
     string_of_mode dstmode ^ string_of_mode srcmode
 
 let get_shuffle features =
@@ -291,19 +292,24 @@  let print_feature_test_start features =
     match List.find (fun feature ->
                        match feature with Requires_feature _ -> true
                                         | Requires_arch _ -> true
+                                        | Requires_FP_bit _ -> true
                                         | _ -> false)
                      features with
-      Requires_feature feature -> 
+      Requires_feature feature ->
         Format.printf "#ifdef __ARM_FEATURE_%s@\n" feature
     | Requires_arch arch ->
         Format.printf "#if __ARM_ARCH >= %d@\n" arch
+    | Requires_FP_bit bit ->
+        Format.printf "#if ((__ARM_FP & 0x%X) != 0)@\n"
+                      (1 lsl bit)
     | _ -> assert false
   with Not_found -> assert true
 
 let print_feature_test_end features =
   let feature =
-    List.exists (function Requires_feature x -> true
-                          | Requires_arch x -> true
+    List.exists (function Requires_feature _ -> true
+                          | Requires_arch _ -> true
+                          | Requires_FP_bit _ -> true
                           |  _ -> false) features in
   if feature then Format.printf "#endif@\n"
 
@@ -365,6 +371,7 @@  let deftypes () =
     "__builtin_neon_hi", "int", 16, 4;
     "__builtin_neon_si", "int", 32, 2;
     "__builtin_neon_di", "int", 64, 1;
+    "__builtin_neon_hf", "float", 16, 4;
     "__builtin_neon_sf", "float", 32, 2;
     "__builtin_neon_poly8", "poly", 8, 8;
     "__builtin_neon_poly16", "poly", 16, 4;
Index: neon.ml
===================================================================
--- neon.ml	(revision 197804)
+++ neon.ml	(working copy)
@@ -21,7 +21,7 @@ 
    <http://www.gnu.org/licenses/>.  *)
 
 (* Shorthand types for vector elements.  *)
-type elts = S8 | S16 | S32 | S64 | F32 | U8 | U16 | U32 | U64 | P8 | P16
+type elts = S8 | S16 | S32 | S64 | F16 | F32 | U8 | U16 | U32 | U64 | P8 | P16
           | I8 | I16 | I32 | I64 | B8 | B16 | B32 | B64 | Conv of elts * elts
           | Cast of elts * elts | NoElts
 
@@ -37,6 +37,7 @@  type vectype = T_int8x8    | T_int8x16
 	     | T_uint16x4  | T_uint16x8
 	     | T_uint32x2  | T_uint32x4
 	     | T_uint64x1  | T_uint64x2
+	     | T_float16x4
 	     | T_float32x2 | T_float32x4
 	     | T_poly8x8   | T_poly8x16
 	     | T_poly16x4  | T_poly16x8
@@ -46,11 +47,13 @@  type vectype = T_int8x8    | T_int8x16
              | T_uint8     | T_uint16
              | T_uint32    | T_uint64
              | T_poly8     | T_poly16
-             | T_float32   | T_arrayof of int * vectype
+             | T_float16   | T_float32
+             | T_arrayof of int * vectype
              | T_ptrto of vectype | T_const of vectype
              | T_void      | T_intQI
              | T_intHI     | T_intSI
-             | T_intDI     | T_floatSF
+             | T_intDI     | T_floatHF
+             | T_floatSF
 
 (* The meanings of the following are:
      TImode : "Tetra", two registers (four words).
@@ -93,7 +96,7 @@  type arity = Arity0 of vectype
            | Arity4 of vectype * vectype * vectype * vectype * vectype
 
 type vecmode = V8QI | V4HI | V2SI | V2SF | DI
-             | V16QI | V8HI | V4SI | V4SF | V2DI
+             | V16QI | V8HI | V4SI | V4SF | V4HF | V2DI
              | QI | HI | SI | SF
 
 type opcode =
@@ -284,18 +287,22 @@  type features =
   | Fixed_core_reg
     (* Mark that the intrinsic requires __ARM_FEATURE_string to be defined.  *)
   | Requires_feature of string
+    (* Mark that the intrinsic requires a particular architecture version.  *)
   | Requires_arch of int
+    (* Mark that the intrinsic requires a particular bit in __ARM_FP to
+    be set.   *)
+  | Requires_FP_bit of int
 
 exception MixedMode of elts * elts
 
 let rec elt_width = function
     S8 | U8 | P8 | I8 | B8 -> 8
-  | S16 | U16 | P16 | I16 | B16 -> 16
+  | S16 | U16 | P16 | I16 | B16 | F16 -> 16
   | S32 | F32 | U32 | I32 | B32 -> 32
   | S64 | U64 | I64 | B64 -> 64
   | Conv (a, b) ->
       let wa = elt_width a and wb = elt_width b in
-      if wa = wb then wa else failwith "element width?"
+      if wa = wb then wa else raise (MixedMode (a, b))
   | Cast (a, b) -> raise (MixedMode (a, b))
   | NoElts -> failwith "No elts"
 
@@ -303,7 +310,7 @@  let rec elt_class = function
     S8 | S16 | S32 | S64 -> Signed
   | U8 | U16 | U32 | U64 -> Unsigned
   | P8 | P16 -> Poly
-  | F32 -> Float
+  | F16 | F32 -> Float
   | I8 | I16 | I32 | I64 -> Int
   | B8 | B16 | B32 | B64 -> Bits
   | Conv (a, b) | Cast (a, b) -> ConvClass (elt_class a, elt_class b)
@@ -315,6 +322,7 @@  let elt_of_class_width c w =
   | Signed, 16 -> S16
   | Signed, 32 -> S32
   | Signed, 64 -> S64
+  | Float, 16 -> F16
   | Float, 32 -> F32
   | Unsigned, 8 -> U8
   | Unsigned, 16 -> U16
@@ -384,7 +392,12 @@  let find_key_operand operands =
   in
     scan ((Array.length operands) - 1)
 
-let rec mode_of_elt elt shape =
+(* Find a vecmode from a shape_elt ELT for an instruction with shape_form
+   SHAPE.  For a Use_operands shape, if ARGPOS is passed then return the mode
+   for the given argument position, else determine which argument to return a
+   mode for automatically.  *)
+
+let rec mode_of_elt ?argpos elt shape =
   let flt = match elt_class elt with
     Float | ConvClass(_, Float) -> true | _ -> false in
   let idx =
@@ -394,7 +407,10 @@  let rec mode_of_elt elt shape =
   in match shape with
     All (_, Dreg) | By_scalar Dreg | Pair_result Dreg | Unary_scalar Dreg
   | Binary_imm Dreg | Long_noreg Dreg | Wide_noreg Dreg ->
-      [| V8QI; V4HI; if flt then V2SF else V2SI; DI |].(idx)
+      if flt then
+        [| V8QI; V4HF; V2SF; DI |].(idx)
+      else
+        [| V8QI; V4HI; V2SI; DI |].(idx)
   | All (_, Qreg) | By_scalar Qreg | Pair_result Qreg | Unary_scalar Qreg
   | Binary_imm Qreg | Long_noreg Qreg | Wide_noreg Qreg ->
       [| V16QI; V8HI; if flt then V4SF else V4SI; V2DI |].(idx)
@@ -404,7 +420,11 @@  let rec mode_of_elt elt shape =
   | Long_imm ->
       [| V8QI; V4HI; V2SI; DI |].(idx)
   | Narrow | Narrow_imm -> [| V16QI; V8HI; V4SI; V2DI |].(idx)
-  | Use_operands ops -> mode_of_elt elt (All (0, (find_key_operand ops)))
+  | Use_operands ops ->
+      begin match argpos with
+        None -> mode_of_elt ?argpos elt (All (0, (find_key_operand ops)))
+      | Some pos -> mode_of_elt ?argpos elt (All (0, ops.(pos)))
+      end
   | _ -> failwith "invalid shape"
 
 (* Modify an element type dependent on the shape of the instruction and the
@@ -454,10 +474,11 @@  let type_for_elt shape elt no =
         | U16 -> T_uint16x4
         | U32 -> T_uint32x2
         | U64 -> T_uint64x1
+        | F16 -> T_float16x4
         | F32 -> T_float32x2
         | P8 -> T_poly8x8
         | P16 -> T_poly16x4
-        | _ -> failwith "Bad elt type"
+        | _ -> failwith "Bad elt type for Dreg"
         end
     | Qreg ->
         begin match elt with
@@ -472,7 +493,7 @@  let type_for_elt shape elt no =
         | F32 -> T_float32x4
         | P8 -> T_poly8x16
         | P16 -> T_poly16x8
-        | _ -> failwith "Bad elt type"
+        | _ -> failwith "Bad elt type for Qreg"
         end
     | Corereg ->
         begin match elt with
@@ -487,7 +508,7 @@  let type_for_elt shape elt no =
         | P8 -> T_poly8
         | P16 -> T_poly16
         | F32 -> T_float32
-        | _ -> failwith "Bad elt type"
+        | _ -> failwith "Bad elt type for Corereg"
         end
     | Immed ->
         T_immediate (0, 0)
@@ -506,7 +527,7 @@  let type_for_elt shape elt no =
 let vectype_size = function
     T_int8x8 | T_int16x4 | T_int32x2 | T_int64x1
   | T_uint8x8 | T_uint16x4 | T_uint32x2 | T_uint64x1
-  | T_float32x2 | T_poly8x8 | T_poly16x4 -> 64
+  | T_float32x2 | T_poly8x8 | T_poly16x4 | T_float16x4 -> 64
   | T_int8x16 | T_int16x8 | T_int32x4 | T_int64x2
   | T_uint8x16 | T_uint16x8  | T_uint32x4  | T_uint64x2
   | T_float32x4 | T_poly8x16 | T_poly16x8 -> 128
@@ -1217,6 +1238,10 @@  let ops =
       [Conv (S32, F32); Conv (U32, F32); Conv (F32, S32); Conv (F32, U32)];
     Vcvt, [InfoWord], All (2, Qreg), "vcvtQ", conv_1,
       [Conv (S32, F32); Conv (U32, F32); Conv (F32, S32); Conv (F32, U32)];
+    Vcvt, [Builtin_name "vcvt" ; Requires_FP_bit 1],
+          Use_operands [| Dreg; Qreg; |], "vcvt", conv_1, [Conv (F16, F32)];
+    Vcvt, [Builtin_name "vcvt" ; Requires_FP_bit 1],
+          Use_operands [| Qreg; Dreg; |], "vcvt", conv_1, [Conv (F32, F16)];
     Vcvt_n, [InfoWord], Use_operands [| Dreg; Dreg; Immed |], "vcvt_n", conv_2,
       [Conv (S32, F32); Conv (U32, F32); Conv (F32, S32); Conv (F32, U32)];
     Vcvt_n, [InfoWord], Use_operands [| Qreg; Qreg; Immed |], "vcvtQ_n", conv_2,
@@ -1782,7 +1807,7 @@  let rec string_of_elt = function
   | U8 -> "u8" | U16 -> "u16" | U32 -> "u32" | U64 -> "u64"
   | I8 -> "i8" | I16 -> "i16" | I32 -> "i32" | I64 -> "i64"
   | B8 -> "8" | B16 -> "16" | B32 -> "32" | B64 -> "64"
-  | F32 -> "f32" | P8 -> "p8" | P16 -> "p16"
+  | F32 -> "f32" | P8 -> "p8" | P16 -> "p16" | F16 -> "f16"
   | Conv (a, b) | Cast (a, b) -> string_of_elt a ^ "_" ^ string_of_elt b
   | NoElts -> failwith "No elts"
 
@@ -1809,6 +1834,7 @@  let string_of_vectype vt =
   | T_uint32x4 -> affix "uint32x4"
   | T_uint64x1 -> affix "uint64x1"
   | T_uint64x2 -> affix "uint64x2"
+  | T_float16x4 -> affix "float16x4"
   | T_float32x2 -> affix "float32x2"
   | T_float32x4 -> affix "float32x4"
   | T_poly8x8 -> affix "poly8x8"
@@ -1825,6 +1851,7 @@  let string_of_vectype vt =
   | T_uint64 -> affix "uint64"
   | T_poly8 -> affix "poly8"
   | T_poly16 -> affix "poly16"
+  | T_float16 -> affix "float16"
   | T_float32 -> affix "float32"
   | T_immediate _ -> "const int"
   | T_void -> "void"
@@ -1832,6 +1859,7 @@  let string_of_vectype vt =
   | T_intHI -> "__builtin_neon_hi"
   | T_intSI -> "__builtin_neon_si"
   | T_intDI -> "__builtin_neon_di"
+  | T_floatHF -> "__builtin_neon_hf"
   | T_floatSF -> "__builtin_neon_sf"
   | T_arrayof (num, base) ->
       let basename = name (fun x -> x) base in
@@ -1853,10 +1881,10 @@  let string_of_inttype = function
   | B_XImode -> "__builtin_neon_xi"
 
 let string_of_mode = function
-    V8QI -> "v8qi" | V4HI  -> "v4hi"  | V2SI -> "v2si" | V2SF -> "v2sf"
-  | DI   -> "di"   | V16QI -> "v16qi" | V8HI -> "v8hi" | V4SI -> "v4si"
-  | V4SF -> "v4sf" | V2DI  -> "v2di"  | QI -> "qi" | HI -> "hi" | SI -> "si"
-  | SF -> "sf"
+    V8QI -> "v8qi" | V4HI -> "v4hi" | V4HF  -> "v4hf"  | V2SI -> "v2si"
+  | V2SF -> "v2sf" | DI   -> "di"   | V16QI -> "v16qi" | V8HI -> "v8hi"
+  | V4SI -> "v4si" | V4SF -> "v4sf" | V2DI  -> "v2di"  | QI   -> "qi"
+  | HI -> "hi" | SI -> "si" | SF -> "sf"
 
 (* Use uppercase chars for letters which form part of the intrinsic name, but
    should be omitted from the builtin name (the info is passed in an extra
Index: neon-testgen.ml
===================================================================
--- neon-testgen.ml	(revision 197804)
+++ neon-testgen.ml	(working copy)
@@ -163,10 +163,12 @@  let effective_target features =
     match List.find (fun feature ->
                        match feature with Requires_feature _ -> true
                                         | Requires_arch _ -> true
+                                        | Requires_FP_bit 1 -> true
                                         | _ -> false)
                      features with
       Requires_feature "FMA" -> "arm_neonv2"
     | Requires_arch 8 -> "arm_v8_neon"
+    | Requires_FP_bit 1 -> "arm_neon_fp16"
     | _ -> assert false
   with Not_found -> "arm_neon"