[AArch64] Add HF vector modes to lane-to-lane INS pattern

Submitted by Kyrill Tkachov on April 21, 2017, 8:34 a.m.

Details

Message ID 58F9C40C.3080502@foss.arm.com
State New

Commit Message

Kyrill Tkachov April 21, 2017, 8:34 a.m.
Hi all,

For the testcase in the patch we currently fail to combine the lane extract and the lane insert into a single instruction, and generate:
foo:
         dup     h1, v1.h[2]
         ins     v0.h[3], v1.h[0]
         ret

bar:
         dup     h1, v1.h[2]
         ins     v0.h[3], v1.h[0]
         ret
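To make the cost of the current sequence concrete, here is a lane-by-lane model of it. This is a plain C sketch, not NEON code. It treats a vector register as an array of 16-bit lanes holding raw half-precision bit patterns, and the type and helper names (vreg, dup_scalar_h, ins_lane_h, current_sequence) are hypothetical. It assumes the usual AArch64 rule that a write to a scalar register such as h1 zeroes the rest of the vector register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LANES 8 /* V8HF has 8 half-precision lanes; V4HF uses the first 4.  */

/* Hypothetical model of a SIMD register: raw 16-bit lane values.  */
typedef struct {
  uint16_t h[MAX_LANES];
} vreg;

/* Models "dup hD, vN.h[idx]": the selected lane lands in lane 0 and,
   as for any scalar register write, every other lane is zeroed.  */
static vreg dup_scalar_h (vreg n, size_t idx, size_t lanes)
{
  assert (idx < lanes && lanes <= MAX_LANES);
  vreg d = { { 0 } };
  d.h[0] = n.h[idx];
  return d;
}

/* Models "ins vD.h[dst], vN.h[src]": only lane DST of D changes.  */
static vreg ins_lane_h (vreg d, size_t dst, vreg n, size_t src, size_t lanes)
{
  assert (dst < lanes && src < lanes && lanes <= MAX_LANES);
  d.h[dst] = n.h[src];
  return d;
}

/* Current output for foo/bar: dup h1, v1.h[2]; ins v0.h[3], v1.h[0].  */
static vreg current_sequence (vreg v0, vreg v1, size_t lanes)
{
  v1 = dup_scalar_h (v1, 2, lanes);
  return ins_lane_h (v0, 3, v1, 0, lanes);
}
```

For any inputs, current_sequence (v0, v1, lanes) leaves the same lanes as the single ins_lane_h (v0, 3, v1, 2, lanes), so the DUP is pure overhead.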

This is because the *aarch64_simd_vec_copy_lane<mode> pattern is not defined
for the HF vector modes. I think that's just a simple oversight. Using the
VALL_F16 mode iterator instead of VALL (it only adds V4HF and V8HF on top of VALL)
lets us use the lane-to-lane INS pattern and generate:
foo:
         ins     v0.h[3], v1.h[2]
         ret

bar:
         ins     v0.h[3], v1.h[2]
         ret
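The RTL that the pattern matches, (vec_merge (vec_duplicate (vec_select op3 [op4])) op1 op2), can also be modelled in plain C to show why one lane-to-lane INS implements it. In a vec_merge, bit i of the mask operand selects lane i from the first operand and otherwise takes it from the second, so a one-hot mask (1 << 3 for foo and bar) copies exactly one lane. This is a sketch, not GCC code: the names hf_vec, rtl_vec_select, rtl_vec_duplicate, rtl_vec_merge and copy_lane_rtl are hypothetical, lanes are raw 16-bit bit patterns, and any big-endian lane renumbering is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LANES 8 /* V8HF: 8 x 16-bit lanes; V4HF uses the first 4.  */

/* Hypothetical model of a V4HF/V8HF value as raw 16-bit lanes.  */
typedef struct {
  uint16_t h[MAX_LANES];
} hf_vec;

/* (vec_select:HF op3 (parallel [op4])): extract a single lane.  */
static uint16_t rtl_vec_select (hf_vec op3, size_t op4, size_t lanes)
{
  assert (op4 < lanes && lanes <= MAX_LANES);
  return op3.h[op4];
}

/* (vec_duplicate:VnHF x): broadcast a scalar to every lane.  */
static hf_vec rtl_vec_duplicate (uint16_t x, size_t lanes)
{
  assert (lanes <= MAX_LANES);
  hf_vec r = { { 0 } };
  for (size_t i = 0; i < lanes; i++)
    r.h[i] = x;
  return r;
}

/* (vec_merge:VnHF a b mask): bit i of MASK set selects lane i of A,
   otherwise lane i of B.  */
static hf_vec rtl_vec_merge (hf_vec a, hf_vec b, unsigned mask, size_t lanes)
{
  assert (lanes <= MAX_LANES);
  hf_vec r = { { 0 } };
  for (size_t i = 0; i < lanes; i++)
    r.h[i] = ((mask >> i) & 1u) ? a.h[i] : b.h[i];
  return r;
}

/* The whole expression, using the operand numbers from the pattern:
   op1 is the destination's old value, op2 the lane mask,
   op3 the source vector and op4 the source lane.  */
static hf_vec copy_lane_rtl (hf_vec op1, unsigned op2, hf_vec op3,
                             size_t op4, size_t lanes)
{
  hf_vec dup = rtl_vec_duplicate (rtl_vec_select (op3, op4, lanes), lanes);
  return rtl_vec_merge (dup, op1, op2, lanes);
}
```

With op2 = 1u << 3 and op4 = 2 the result is op1 with only lane 3 replaced by lane 2 of op3, which is exactly what "ins v0.h[3], v1.h[2]" computes, for both the 4-lane and the 8-lane case.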

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for GCC 8?

Thanks,
Kyrill

2017-04-21  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

	* config/aarch64/aarch64-simd.md (*aarch64_simd_vec_copy_lane<mode>):
	Use VALL_F16 iterator rather than VALL.

2017-04-21  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

	* gcc.target/aarch64/hfmode_ins_1.c: New test.

Patch

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 7ad3a76c8fa8bc28b8e0c6314958be7dfcf43457..3eeb54bdd512c729f43f3a19ebb0e58567767d20 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -565,14 +565,14 @@  (define_insn "aarch64_simd_vec_set<mode>"
 )
 
 (define_insn "*aarch64_simd_vec_copy_lane<mode>"
-  [(set (match_operand:VALL 0 "register_operand" "=w")
-	(vec_merge:VALL
-	    (vec_duplicate:VALL
+  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
+	(vec_merge:VALL_F16
+	    (vec_duplicate:VALL_F16
 	      (vec_select:<VEL>
-		(match_operand:VALL 3 "register_operand" "w")
+		(match_operand:VALL_F16 3 "register_operand" "w")
 		(parallel
 		  [(match_operand:SI 4 "immediate_operand" "i")])))
-	    (match_operand:VALL 1 "register_operand" "0")
+	    (match_operand:VALL_F16 1 "register_operand" "0")
 	    (match_operand:SI 2 "immediate_operand" "i")))]
   "TARGET_SIMD"
   {
diff --git a/gcc/testsuite/gcc.target/aarch64/hfmode_ins_1.c b/gcc/testsuite/gcc.target/aarch64/hfmode_ins_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..7fafe92f49042b64d24ad4d5219251645da3abfd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/hfmode_ins_1.c
@@ -0,0 +1,21 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* Check that we can perform this in a single INS without doing any DUPs.  */
+
+#include <arm_neon.h>
+
+float16x8_t
+foo (float16x8_t a, float16x8_t b)
+{
+  return vsetq_lane_f16 (vgetq_lane_f16 (b, 2), a, 3);
+}
+
+float16x4_t
+bar (float16x4_t a, float16x4_t b)
+{
+  return vset_lane_f16 (vget_lane_f16 (b, 2), a, 3);
+}
+
+/* { dg-final { scan-assembler-times "ins\\t" 2 } } */
+/* { dg-final { scan-assembler-not "dup\\t" } } */