Patchwork [ARM,3/3] AArch32 NEON vrint builtins and intrinsics

login
register
mail settings
Submitter Kyrylo Tkachov
Date Nov. 29, 2012, 2:27 p.m.
Message ID <006b01cdce3d$ae0ff260$0a2fd720$@tkachov@arm.com>
Download mbox | patch
Permalink /patch/202757/
State New
Headers show

Comments

Kyrylo Tkachov - Nov. 29, 2012, 2:27 p.m.
Hi all,
This patch adds the intrinsics support for the vrnd intrinsics that are
implemented by the vrint instructions.
The .ml scripts contain the new information and should used to regenerate
the arm_neon.h header file, tests and documentation.
In particular:
* config/arm/arm_neon.h should be regenerated using config/arm/neon-gen.ml.
* doc/arm-neon-intrinsics.texi should be regenerated using
config/arm/neon-docgen.ml.
* The tests in testsuite/gcc.target/arm/neon/ should be generated using
config/arm/neon-testgen.ml.
All three of these scripts should be linked against the compiled neon.ml
file i.e:
$ ocamlc -c neon.ml
$ ocamlc -o neon-gen neon.cmo neon-gen.ml


The following intrinsics are defined:
vrnd_f32 (float32x2_t a)       (generating a vrintz instruction)
vrndq_f32 (float32x4_t a)      (generating a vrintz instruction)
vrnda_f32 (float32x2_t a)      (generating a vrinta instruction)	
vrndqa_f32 (float32x4_t a)     (generating a vrinta instruction)
vrndm_f32 (float32x2_t a)      (generating a vrintm instruction)
vrndqm_f32 (float32x4_t a)     (generating a vrintm instruction)
vrndn_f32 (float32x2_t a)      (generating a vrintn instruction)
vrndqn_f32 (float32x4_t a)     (generating a vrintn instruction)
vrndp_f32 (float32x2_t a)      (generating a vrintp instruction)
vrndqp_f32 (float32x4_t a)     (generating a vrintp instruction)

Note that AArch32 NEON does not support double precision floats, so we don't
have _f64 versions.

Tested on arm-none-eabi. New tests pass, no regressions (once the effective
target checks patch is added).

Ok for trunk?

Thanks,
Kyrill

2012-11-29  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

	* config/arm/neon.ml (opcode): Add Vrintn, Vrinta, Vrintp, Vrintm,
	Vrintz to type.
	(type features): Add Requires_arch type constructor.
	(ops): Define Vrintn, Vrinta, Vrintp, Vrintm, Vrintz features.
	* config/arm/neon-docgen.ml (intrinsic_groups): Define Vrintn,
	Vrinta, Vrintp, Vrintm, Vrintz, Vrintx.
	* config/arm/neon-testgen.ml (effective_target): Define check for 
	Requires_arch 8.
	* config/arm/neon-gen.ml 
	(print_feature_test_start): Handle Requires_arch.
	(print_feature_test_end): Likewise.
	* doc/arm-neon-intrinsics.texi: Regenerate.
	* config/arm/arm_neon.h: Regenerate.

	
gcc/testsuite/ChangeLog

2012-11-29  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

	* gcc.target/arm/neon/vrndaf32.c: New test.
	* gcc.target/arm/neon/vrndqaf32.c: Likewise.
	* gcc.target/arm/neon/vrndf32.c: Likewise.
	* gcc.target/arm/neon/vrndqf32.c: Likewise.
	* gcc.target/arm/neon/vrndmf32.c: Likewise.
	* gcc.target/arm/neon/vrndqmf32.c: Likewise.
	* gcc.target/arm/neon/vrndnf32.c: Likewise.
	* gcc.target/arm/neon/vrndqnf32.c: Likewise.
	* gcc.target/arm/neon/vrndpf32.c: Likewise.
	* gcc.target/arm/neon/vrndqpf32.c: Likewise.
Ramana Radhakrishnan - Dec. 10, 2012, 10:45 a.m.
On 11/29/12 14:27, Kyrylo Tkachov wrote:
> Hi all,
> This patch adds the intrinsics support for the vrnd intrinsics that are
> implemented by the vrint instructions.
> The .ml scripts contain the new information and should used to regenerate
> the arm_neon.h header file, tests and documentation.
> In particular:
> * config/arm/arm_neon.h should be regenerated using config/arm/neon-gen.ml.
> * doc/arm-neon-intrinsics.texi should be regenerated using
> config/arm/neon-docgen.ml.
> * The tests in testsuite/gcc.target/arm/neon/ should be generated using
> config/arm/neon-testgen.ml.
> All three of these scripts should be linked against the compiled neon.ml
> file i.e:
> $ ocamlc -c neon.ml
> $ ocamlc -o neon-gen neon.cmo neon-gen.ml
>
>
> The following intrinsics are defined:
> vrnd_f32 (float32x2_t a)       (generating a vrintz instruction)
> vrndq_f32 (float32x4_t a)      (generating a vrintz instruction)
> vrnda_f32 (float32x2_t a)      (generating a vrinta instruction)	
> vrndqa_f32 (float32x4_t a)     (generating a vrinta instruction)
> vrndm_f32 (float32x2_t a)      (generating a vrintm instruction)
> vrndqm_f32 (float32x4_t a)     (generating a vrintm instruction)
> vrndn_f32 (float32x2_t a)      (generating a vrintn instruction)
> vrndqn_f32 (float32x4_t a)     (generating a vrintn instruction)
> vrndp_f32 (float32x2_t a)      (generating a vrintp instruction)
> vrndqp_f32 (float32x4_t a)     (generating a vrintp instruction)
>
> Note that AArch32 NEON does not support double precision floats, so we don't
> have _f64 versions.
>
> Tested on arm-none-eabi. New tests pass, no regressions (once the effective
> target checks patch is added).
>
> Ok for trunk?

Please add 2012 as a copyright year for arm_neon.h in the generator program.

When I regenerated the documents, it did look like there were more 
changes in arm-neon-intrinsics.texi but my suspicion is this is because 
the document hasn't been regenerated in some time. The main change here 
is the order in which some of the intrinsics are listed which has 
changed over time.

Ok with that change.


regards,
Ramana

Patch

diff --git a/gcc/config/arm/neon-docgen.ml b/gcc/config/arm/neon-docgen.ml
index 043b1e0..228de16 100644
--- a/gcc/config/arm/neon-docgen.ml
+++ b/gcc/config/arm/neon-docgen.ml
@@ -105,6 +105,11 @@  let intrinsic_groups =
     "Multiply-subtract", single_opcode Vmls;
     "Fused-multiply-accumulate", single_opcode Vfma;
     "Fused-multiply-subtract", single_opcode Vfms;
+    "Round to integral (to nearest, ties to even)", single_opcode Vrintn;
+    "Round to integral (to nearest, ties away from zero)", single_opcode Vrinta;
+    "Round to integral (towards +Inf)", single_opcode Vrintp;
+    "Round to integral (towards -Inf)", single_opcode Vrintm;
+    "Round to integral (towards 0)", single_opcode Vrintz;
     "Subtraction", single_opcode Vsub;
     "Comparison (equal-to)", single_opcode Vceq;
     "Comparison (greater-than-or-equal-to)", single_opcode Vcge;
diff --git a/gcc/config/arm/neon-gen.ml b/gcc/config/arm/neon-gen.ml
index 6c4e272..c5f0583 100644
--- a/gcc/config/arm/neon-gen.ml
+++ b/gcc/config/arm/neon-gen.ml
@@ -290,17 +290,21 @@  let print_feature_test_start features =
   try
     match List.find (fun feature ->
                        match feature with Requires_feature _ -> true
+                                        | Requires_arch _ -> true
                                         | _ -> false)
                      features with
       Requires_feature feature -> 
         Format.printf "#ifdef __ARM_FEATURE_%s@\n" feature
+    | Requires_arch arch ->
+        Format.printf "#if __ARM_ARCH >= %d@\n" arch
     | _ -> assert false
   with Not_found -> assert true
 
 let print_feature_test_end features =
   let feature =
     List.exists (function Requires_feature x -> true
-                                        |  _ -> false) features in
+                          | Requires_arch x -> true
+                          |  _ -> false) features in
   if feature then Format.printf "#endif@\n"
 
 
diff --git a/gcc/config/arm/neon-testgen.ml b/gcc/config/arm/neon-testgen.ml
index 4645f39..f6c8d9a 100644
--- a/gcc/config/arm/neon-testgen.ml
+++ b/gcc/config/arm/neon-testgen.ml
@@ -162,9 +162,11 @@  let effective_target features =
   try
     match List.find (fun feature ->
                        match feature with Requires_feature _ -> true
+                                        | Requires_arch _ -> true
                                         | _ -> false)
                      features with
       Requires_feature "FMA" -> "arm_neonv2"
+    | Requires_arch 8 -> "arm_v8_neon"
     | _ -> assert false
   with Not_found -> "arm_neon"
 
diff --git a/gcc/config/arm/neon.ml b/gcc/config/arm/neon.ml
index 101f8f6..c968f6d 100644
--- a/gcc/config/arm/neon.ml
+++ b/gcc/config/arm/neon.ml
@@ -152,6 +152,11 @@  type opcode =
   | Vqdmulh_n
   | Vqdmulh_lane
   (* Unary ops.  *)
+  | Vrintn
+  | Vrinta
+  | Vrintp
+  | Vrintm
+  | Vrintz
   | Vabs
   | Vneg
   | Vcls
@@ -279,6 +285,7 @@  type features =
   | Fixed_core_reg
     (* Mark that the intrinsic requires __ARM_FEATURE_string to be defined.  *)
   | Requires_feature of string
+  | Requires_arch of int
 
 exception MixedMode of elts * elts
 
@@ -812,6 +819,27 @@  let ops =
     Vfms, [Requires_feature "FMA"], All (3, Dreg), "vfms", elts_same_io, [F32];
     Vfms, [Requires_feature "FMA"], All (3, Qreg), "vfmsQ", elts_same_io, [F32];
 
+    (* Round to integral. *)
+    Vrintn, [Builtin_name "vrintn"; Requires_arch 8], Use_operands [| Dreg; Dreg |],
+            "vrndn", elts_same_1, [F32];
+    Vrintn, [Builtin_name "vrintn"; Requires_arch 8], Use_operands [| Qreg; Qreg |],
+            "vrndqn", elts_same_1, [F32];
+    Vrinta, [Builtin_name "vrinta"; Requires_arch 8], Use_operands [| Dreg; Dreg |],
+            "vrnda", elts_same_1, [F32];
+    Vrinta, [Builtin_name "vrinta"; Requires_arch 8], Use_operands [| Qreg; Qreg |],
+            "vrndqa", elts_same_1, [F32];
+    Vrintp, [Builtin_name "vrintp"; Requires_arch 8], Use_operands [| Dreg; Dreg |],
+            "vrndp", elts_same_1, [F32];
+    Vrintp, [Builtin_name "vrintp"; Requires_arch 8], Use_operands [| Qreg; Qreg |],
+            "vrndqp", elts_same_1, [F32];
+    Vrintm, [Builtin_name "vrintm"; Requires_arch 8], Use_operands [| Dreg; Dreg |],
+            "vrndm", elts_same_1, [F32];
+    Vrintm, [Builtin_name "vrintm"; Requires_arch 8], Use_operands [| Qreg; Qreg |],
+            "vrndqm", elts_same_1, [F32];
+    Vrintz, [Builtin_name "vrintz"; Requires_arch 8], Use_operands [| Dreg; Dreg |],
+            "vrnd", elts_same_1, [F32];
+    Vrintz, [Builtin_name "vrintz"; Requires_arch 8], Use_operands [| Qreg; Qreg |],
+            "vrndq", elts_same_1, [F32];
     (* Subtraction.  *)
     Vsub, [], All (3, Dreg), "vsub", sign_invar_2, F32 :: su_8_32;
     Vsub, [No_op], All (3, Dreg), "vsub", sign_invar_2,  [S64; U64];