[AArch64] Add HF vector modes to lane-to-lane INS pattern

Message ID 58F9C40C.3080502@foss.arm.com
State New

Commit Message

Kyrill Tkachov April 21, 2017, 8:34 a.m. UTC
Hi all,

For the testcase in the patch we currently miss a combine opportunity and generate:
foo:
         dup     h1, v1.h[2]
         ins     v0.h[3], v1.h[0]
         ret

bar:
         dup     h1, v1.h[2]
         ins     v0.h[3], v1.h[0]
         ret

This is because the *aarch64_simd_vec_copy_lane<mode> pattern is not defined
for HF vector modes. I think that's just a simple oversight. It is fixed by using
the VALL_F16 mode iterator instead of VALL (VALL_F16 just adds V4HF and V8HF on
top of VALL). With that we can match the proper INS pattern and generate:
foo:
         ins     v0.h[3], v1.h[2]
         ret

bar:
         ins     v0.h[3], v1.h[2]
         ret

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for GCC 8?

Thanks,
Kyrill

2017-04-21  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

	* config/aarch64/aarch64-simd.md (*aarch64_simd_vec_copy_lane<mode>):
	Use VALL_F16 iterator rather than VALL.

2017-04-21  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

	* gcc.target/aarch64/hfmode_ins_1.c: New test.
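
To illustrate why the two instruction sequences above are interchangeable, here
is a minimal sketch in portable C. It is not part of the patch and does not use
arm_neon.h. The hvec8 type and the model_* helpers are hypothetical names
invented for this illustration, and lanes are held as raw 16-bit patterns so no
half-precision support is assumed on the host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 8  /* a float16x8_t holds eight 16-bit lanes */

/* Hypothetical stand-in for a 128-bit vector register of 16-bit lanes.  */
typedef struct { uint16_t lane[LANES]; } hvec8;

/* Models "dup hD, vS.h[idx]": lane idx of src is written to lane 0 of
   the result and the remaining lanes are cleared.  */
static hvec8
model_dup_scalar (hvec8 src, unsigned idx)
{
  assert (idx < LANES);
  hvec8 res;
  memset (&res, 0, sizeof res);
  res.lane[0] = src.lane[idx];
  return res;
}

/* Models "ins vD.h[d], vS.h[s]": copy one lane, leave the others alone.  */
static hvec8
model_ins (hvec8 dst, unsigned d, hvec8 src, unsigned s)
{
  assert (d < LANES && s < LANES);
  dst.lane[d] = src.lane[s];
  return dst;
}

/* The sequence generated before the patch: DUP followed by INS.  */
static hvec8
two_insn_sequence (hvec8 a, hvec8 b)
{
  hvec8 tmp = model_dup_scalar (b, 2);
  return model_ins (a, 3, tmp, 0);
}

/* The sequence generated after the patch: a single lane-to-lane INS.  */
static hvec8
one_insn_sequence (hvec8 a, hvec8 b)
{
  return model_ins (a, 3, b, 2);
}

/* Returns nonzero when every lane of x and y matches.  */
static int
hvec8_equal (hvec8 x, hvec8 y)
{
  return memcmp (x.lane, y.lane, sizeof x.lane) == 0;
}
```

Under this model both sequences leave lane 2 of b in lane 3 of a and keep every
other lane of a unchanged, so the DUP only adds an instruction.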

Comments

Kyrill Tkachov May 11, 2017, 10:15 a.m. UTC | #1
Ping.

https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00933.html

Thanks,
Kyrill

Kyrill Tkachov June 2, 2017, 10:39 a.m. UTC | #2
Ping.

Thanks,
Kyrill

James Greenhalgh June 2, 2017, 1:53 p.m. UTC | #3
On Fri, Apr 21, 2017 at 09:34:20AM +0100, Kyrill Tkachov wrote:
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for GCC 8?

Yes, this is OK.

Thanks,
James

Patch

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 7ad3a76c8fa8bc28b8e0c6314958be7dfcf43457..3eeb54bdd512c729f43f3a19ebb0e58567767d20 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -565,14 +565,14 @@  (define_insn "aarch64_simd_vec_set<mode>"
 )
 
 (define_insn "*aarch64_simd_vec_copy_lane<mode>"
-  [(set (match_operand:VALL 0 "register_operand" "=w")
-	(vec_merge:VALL
-	    (vec_duplicate:VALL
+  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
+	(vec_merge:VALL_F16
+	    (vec_duplicate:VALL_F16
 	      (vec_select:<VEL>
-		(match_operand:VALL 3 "register_operand" "w")
+		(match_operand:VALL_F16 3 "register_operand" "w")
 		(parallel
 		  [(match_operand:SI 4 "immediate_operand" "i")])))
-	    (match_operand:VALL 1 "register_operand" "0")
+	    (match_operand:VALL_F16 1 "register_operand" "0")
 	    (match_operand:SI 2 "immediate_operand" "i")))]
   "TARGET_SIMD"
   {
diff --git a/gcc/testsuite/gcc.target/aarch64/hfmode_ins_1.c b/gcc/testsuite/gcc.target/aarch64/hfmode_ins_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..7fafe92f49042b64d24ad4d5219251645da3abfd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/hfmode_ins_1.c
@@ -0,0 +1,21 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* Check that we can perform this in a single INS without doing any DUPs.  */
+
+#include <arm_neon.h>
+
+float16x8_t
+foo (float16x8_t a, float16x8_t b)
+{
+  return vsetq_lane_f16 (vgetq_lane_f16 (b, 2), a, 3);
+}
+
+float16x4_t
+bar (float16x4_t a, float16x4_t b)
+{
+  return vset_lane_f16 (vget_lane_f16 (b, 2), a, 3);
+}
+
+/* { dg-final { scan-assembler-times "ins\\t" 2 } } */
+/* { dg-final { scan-assembler-not "dup\\t" } } */