Patchwork [ARM] Implement support for NEON vmovn.

Submitter tejas belagod
Date Aug. 31, 2010, 2:26 p.m.
Message ID <1283264810.30429.128.camel@e102484-lin.cambridge.arm.com>
Permalink /patch/63253/
State New

Comments

tejas belagod - Aug. 31, 2010, 2:26 p.m.
OK. Attached is a new patch that changes the vect_pack_trunc flag in
target-supports.exp. There are tests already in gcc.dg/vect (eg.
vect-multitypes-14.c) that check for vect_pack_trunc support and pass if
loops are vectorized. These tests pass with this patch. But they are not
written to xfail if vect_pack_trunc is not supported - should they have
been?

--
Tejas Belagod
ARM.

New Changelog:

gcc/

2010-08-31  Tejas Belagod  <tejas.belagod@arm.com>

	* config/arm/neon.md (vec_pack_trunc_<mode>): Instruction
	pattern for vmovn. Expansion in case of non 
	-mvectorize-with-neon-quad.
	(neon_vec_pack_trunc_<mode>): Instruction pattern for vmovn for
	non- -mvectorize-with-neon-quad case.
	(move_lo_quad_<mode>): New expansion to vmov into low part.
	(move_hi_quad_<mode>): New expansion to vmov into high part.
	(move_lo_quad_v4si): Refactor to move_lo_quad_<mode> expansion.
	(move_lo_quad_v4sf): Likewise.
	(move_lo_quad_v8hi): Likewise.
	(neon_move_lo_quad_<mode>): Instruction pattern for vmov into
	low part.
	(neon_move_hi_quad_<mode>): Instruction pattern for vmov into
	high part.
	* config/arm/iterators.md (ANY128): New mode iterator.
	(V_narrow_pack): New mode attribute.
	(V_HALF): Add attribute.
	(V_DOUBLE): Add attribute.
	(V_mode_nunits): Add attribute.

gcc/testsuite

2010-08-31  Tejas Belagod  <tejas.belagod@arm.com>

	* lib/target-supports.exp
	(check_effective_target_vect_pack_trunc): Set vect_pack_trunc
	supported flag to true for neon.
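
The two code paths named in the ChangeLog can be modelled in scalar C. This is only an illustrative sketch, not code from the patch: the function names are hypothetical, only the 32-bit to 16-bit element size is shown, and lanes are assumed to be numbered in little-endian order. The quad case issues two vmovn instructions, one per half of the result; the non-quad case first assembles a temporary quad value with move_lo_quad and move_hi_quad and then narrows it once.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of one "vmovn.i32": keep the low 16 bits of each of the
   four 32-bit lanes of a quad register.  (Hypothetical helper name.)  */
void model_vmovn_i32(uint16_t dst[4], const uint32_t src[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = (uint16_t)src[i];
}

/* Quad case (V4SI, V4SI -> V8HI): two vmovn instructions, the first
   writing the low half of the result and the second the high half.  */
void model_pack_trunc_quad(uint16_t dst[8],
                           const uint32_t a[4], const uint32_t b[4])
{
    model_vmovn_i32(dst, a);
    model_vmovn_i32(dst + 4, b);
}

/* Non-quad case (V2SI, V2SI -> V4HI): place the first input in the low
   half and the second in the high half of a temporary quad value, then
   narrow the temporary with a single vmovn.  */
void model_pack_trunc_double(uint16_t dst[4],
                             const uint32_t a[2], const uint32_t b[2])
{
    uint32_t tmp[4];

    memcpy(tmp, a, 2 * sizeof tmp[0]);      /* move_lo_quad */
    memcpy(tmp + 2, b, 2 * sizeof tmp[0]);  /* move_hi_quad */
    model_vmovn_i32(dst, tmp);              /* neon_vec_pack_trunc */
}
```

In both cases the first input ends up in the low lanes of the result and the second input in the high lanes.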


On Tue, 2010-08-31 at 11:51 +0000, Joseph S. Myers wrote:
> On Tue, 31 Aug 2010, Tejas Belagod wrote:
> 
> > Hi,
> > 
> > Attached is a patch that implements support for generating NEON
> > VMOVN.i<size>. This patch also refactors
> 
> Is it possible to add a testcase to the testsuite that fails before and 
> passes after this patch and demonstrates that it generates the desired 
> code on appropriate testcases?
>
Joseph S. Myers - Aug. 31, 2010, 2:31 p.m.
On Tue, 31 Aug 2010, Tejas Belagod wrote:

> OK. Attached is a new patch that changes the vect_pack_trunc flag in
> target-supports.exp. There are tests already in gcc.dg/vect (eg.
> vect-multitypes-14.c) that check for vect_pack_trunc support and pass if
> loops are vectorized. These tests pass with this patch. But they are not
> written to xfail if vect_pack_trunc is not supported - should they have
> been?

No, it's fine as long as they would have failed if you applied just the 
target-supports.exp part of the patch and not the ARM part; the intent is 
that gcc.dg/vect tests are skipped altogether if the processor simply 
doesn't support a particular feature in a way that means it would be 
sensible to vectorize a given test.  (But I can't review this patch.)
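
To illustrate the convention described above, a gcc.dg/vect-style test might gate its vectorization check on the effective target roughly as follows. This is a hypothetical sketch, not an existing testsuite file: the function name narrow_ints, the value of N and the expected loop count are assumptions. The target selector means the scan only runs where vect_pack_trunc is reported as supported, so no xfail is needed elsewhere.

```c
/* { dg-do compile } */
/* { dg-require-effective-target vect_int } */

#define N 64

/* Each int element is truncated to short, so a vectorized version of
   this loop has to pack two vectors of ints into one vector of shorts.  */
__attribute__ ((noinline)) void
narrow_ints (short *dst, const int *src)
{
  int i;

  for (i = 0; i < N; i++)
    dst[i] = (short) src[i];
}

/* The scan below is only evaluated on targets where the
   vect_pack_trunc effective-target check succeeds.  */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_pack_trunc } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
```

With only the target-supports.exp change applied, a scan like this would be expected to fail on NEON targets, which is the before/after behaviour being asked for.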
Richard Earnshaw - Sept. 1, 2010, 2:55 p.m.
On Tue, 2010-08-31 at 15:26 +0100, Tejas Belagod wrote:
> OK. Attached is a new patch that changes the vect_pack_trunc flag in
> target-supports.exp. There are tests already in gcc.dg/vect (eg.
> vect-multitypes-14.c) that check for vect_pack_trunc support and pass if
> loops are vectorized. These tests pass with this patch. But they are not
> written to xfail if vect_pack_trunc is not supported - should they have
> been?

This is OK.

R.
tejas belagod - Sept. 10, 2010, 10:26 a.m.
On Wed, 2010-09-01 at 15:55 +0100, Richard Earnshaw wrote:
> On Tue, 2010-08-31 at 15:26 +0100, Tejas Belagod wrote:
> > OK. Attached is a new patch that changes the vect_pack_trunc flag in
> > target-supports.exp. There are tests already in gcc.dg/vect (eg.
> > vect-multitypes-14.c) that check for vect_pack_trunc support and pass if
> > loops are vectorized. These tests pass with this patch. But they are not
> > written to xfail if vect_pack_trunc is not supported - should they have
> > been?
> 
> This is OK.
> 
> R.
> 

I don't have commit rights. Please could someone check this patch in?

Thanks,
Tejas.
tejas belagod - Sept. 15, 2010, 11:21 a.m.
On Wed, 2010-09-01 at 15:55 +0100, Richard Earnshaw wrote:
> On Tue, 2010-08-31 at 15:26 +0100, Tejas Belagod wrote:
> > OK. Attached is a new patch that changes the vect_pack_trunc flag in
> > target-supports.exp. There are tests already in gcc.dg/vect (eg.
> > vect-multitypes-14.c) that check for vect_pack_trunc support and pass if
> > loops are vectorized. These tests pass with this patch. But they are not
> > written to xfail if vect_pack_trunc is not supported - should they have
> > been?
> 
> This is OK.
> 
> R.
> 

Hi,

I have checked this patch in.

Thanks,
Tejas.

Patch

Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp	(revision 163672)
+++ gcc/testsuite/lib/target-supports.exp	(working copy)
@@ -2617,7 +2617,8 @@ 
         if { ([istarget powerpc*-*-*] && ![istarget powerpc-*-linux*paired*])
              || [istarget i?86-*-*]
              || [istarget x86_64-*-*]
-             || [istarget spu-*-*] } {
+             || [istarget spu-*-*]
+             || ([istarget arm*-*-*] && [check_effective_target_arm_neon]) } {
             set et_vect_pack_trunc_saved 1
         }
     }
Index: gcc/config/arm/neon.md
===================================================================
--- gcc/config/arm/neon.md	(revision 163672)
+++ gcc/config/arm/neon.md	(working copy)
@@ -1115,12 +1115,13 @@ 
 ; vector registers. Make an attempt at removing unnecessary moves, though
 ; we're really at the mercy of the register allocator.
 
-(define_insn "move_lo_quad_v4si"
-  [(set (match_operand:V4SI 0 "s_register_operand" "+w")
-        (vec_concat:V4SI
-          (match_operand:V2SI 1 "s_register_operand" "w")
-          (vec_select:V2SI (match_dup 0)
-			   (parallel [(const_int 2) (const_int 3)]))))]
+(define_insn "neon_move_lo_quad_<mode>"
+  [(set (match_operand:ANY128 0 "s_register_operand" "+w")
+        (vec_concat:ANY128
+          (match_operand:<V_HALF> 1 "s_register_operand" "w")
+          (vec_select:<V_HALF> 
+		(match_dup 0)
+	        (match_operand:ANY128 2 "vect_par_constant_high" ""))))]
   "TARGET_NEON"
 {
   int dest = REGNO (operands[0]);
@@ -1134,67 +1135,62 @@ 
   [(set_attr "neon_type" "neon_bp_simple")]
 )
 
-(define_insn "move_lo_quad_v4sf"
-  [(set (match_operand:V4SF 0 "s_register_operand" "+w")
-        (vec_concat:V4SF
-          (match_operand:V2SF 1 "s_register_operand" "w")
-          (vec_select:V2SF (match_dup 0)
-			   (parallel [(const_int 2) (const_int 3)]))))]
+(define_insn "neon_move_hi_quad_<mode>"
+  [(set (match_operand:ANY128 0 "s_register_operand" "+w")
+        (vec_concat:ANY128
+          (match_operand:<V_HALF> 1 "s_register_operand" "w")
+          (vec_select:<V_HALF>
+		(match_dup 0)
+	        (match_operand:ANY128 2 "vect_par_constant_low" ""))))]
   "TARGET_NEON"
 {
   int dest = REGNO (operands[0]);
   int src = REGNO (operands[1]);
 
   if (dest != src)
-    return "vmov\t%e0, %P1";
+    return "vmov\t%f0, %P1";
   else
     return "";
 }
   [(set_attr "neon_type" "neon_bp_simple")]
 )
 
-(define_insn "move_lo_quad_v8hi"
-  [(set (match_operand:V8HI 0 "s_register_operand" "+w")
-        (vec_concat:V8HI
-          (match_operand:V4HI 1 "s_register_operand" "w")
-          (vec_select:V4HI (match_dup 0)
-                           (parallel [(const_int 4) (const_int 5)
-                                      (const_int 6) (const_int 7)]))))]
-  "TARGET_NEON"
+(define_expand "move_hi_quad_<mode>"
+ [(match_operand:ANY128 0 "s_register_operand" "")
+  (match_operand:<V_HALF> 1 "s_register_operand" "")]
+ "TARGET_NEON"
 {
-  int dest = REGNO (operands[0]);
-  int src = REGNO (operands[1]);
+  rtvec v = rtvec_alloc (<V_mode_nunits>/2);
+  rtx t1;
+  int i;
 
-  if (dest != src)
-    return "vmov\t%e0, %P1";
-  else
-    return "";
-}
-  [(set_attr "neon_type" "neon_bp_simple")]
-)
+  for (i=0; i < (<V_mode_nunits>/2); i++)
+     RTVEC_ELT (v, i) = GEN_INT (i);
 
-(define_insn "move_lo_quad_v16qi"
-  [(set (match_operand:V16QI 0 "s_register_operand" "+w")
-        (vec_concat:V16QI
-          (match_operand:V8QI 1 "s_register_operand" "w")
-          (vec_select:V8QI (match_dup 0)
-                           (parallel [(const_int 8)  (const_int 9)
-                                      (const_int 10) (const_int 11)
-                                      (const_int 12) (const_int 13)
-                                      (const_int 14) (const_int 15)]))))]
-  "TARGET_NEON"
+  t1 = gen_rtx_PARALLEL (<MODE>mode, v);
+  emit_insn (gen_neon_move_hi_quad_<mode> (operands[0], operands[1], t1));
+
+  DONE;
+})
+
+(define_expand "move_lo_quad_<mode>"
+ [(match_operand:ANY128 0 "s_register_operand" "")
+  (match_operand:<V_HALF> 1 "s_register_operand" "")]
+ "TARGET_NEON"
 {
-  int dest = REGNO (operands[0]);
-  int src = REGNO (operands[1]);
+  rtvec v = rtvec_alloc (<V_mode_nunits>/2);
+  rtx t1;
+  int i;
 
-  if (dest != src)
-    return "vmov\t%e0, %P1";
-  else
-    return "";
-}
-  [(set_attr "neon_type" "neon_bp_simple")]
-)
+  for (i=0; i < (<V_mode_nunits>/2); i++)
+     RTVEC_ELT (v, i) = GEN_INT ((<V_mode_nunits>/2) + i);
 
+  t1 = gen_rtx_PARALLEL (<MODE>mode, v);
+  emit_insn (gen_neon_move_lo_quad_<mode> (operands[0], operands[1], t1));
+
+  DONE;
+})
+
 ;; Reduction operations
 
 (define_expand "reduc_splus_<mode>"
@@ -5179,3 +5175,38 @@ 
 
  }
 )
+
+(define_insn "vec_pack_trunc_<mode>"
+ [(set (match_operand:<V_narrow_pack> 0 "register_operand" "=&w")
+       (vec_concat:<V_narrow_pack> 
+		(truncate:<V_narrow> 
+			(match_operand:VN 1 "register_operand" "w"))
+		(truncate:<V_narrow>
+			(match_operand:VN 2 "register_operand" "w"))))]
+ "TARGET_NEON"
+ "vmovn.i<V_sz_elem>\t%e0, %q1\n\tvmovn.i<V_sz_elem>\t%f0, %q2"
+ [(set_attr "neon_type" "neon_shift_1")]
+)
+
+;; For the non-quad case.
+(define_insn "neon_vec_pack_trunc_<mode>"
+ [(set (match_operand:<V_narrow> 0 "register_operand" "=w")
+       (truncate:<V_narrow> (match_operand:VN 1 "register_operand" "w")))]
+ "TARGET_NEON"
+ "vmovn.i<V_sz_elem>\t%0, %q1"
+ [(set_attr "neon_type" "neon_shift_1")]
+)
+
+(define_expand "vec_pack_trunc_<mode>"
+ [(match_operand:<V_narrow_pack> 0 "register_operand" "")
+  (match_operand:VSHFT 1 "register_operand" "")
+  (match_operand:VSHFT 2 "register_operand")]
+ "TARGET_NEON"
+{
+  rtx tempreg = gen_reg_rtx (<V_DOUBLE>mode);
+  
+  emit_insn (gen_move_lo_quad_<V_double> (tempreg, operands[1])); 
+  emit_insn (gen_move_hi_quad_<V_double> (tempreg, operands[2])); 
+  emit_insn (gen_neon_vec_pack_trunc_<V_double> (operands[0], tempreg));
+  DONE;
+})
Index: gcc/config/arm/iterators.md
===================================================================
--- gcc/config/arm/iterators.md	(revision 163672)
+++ gcc/config/arm/iterators.md	(working copy)
@@ -28,6 +28,8 @@ 
 ;; registers.
 (define_mode_iterator ANY64 [DI DF V8QI V4HI V2SI V2SF])
 
+(define_mode_iterator ANY128 [V2DI V2DF V16QI V8HI V4SI V4SF])
+
 ;; A list of integer modes that are up to one word long
 (define_mode_iterator QHSI [QI HI SI])
 
@@ -227,9 +229,13 @@ 
 ;; Narrower modes with the same number of elements.
 (define_mode_attr V_narrow [(V8HI "V8QI") (V4SI "V4HI") (V2DI "V2SI")])
 
+;; Narrower modes with double the number of elements.
+(define_mode_attr V_narrow_pack [(V4SI "V8HI") (V8HI "V16QI") (V2DI "V4SI")
+				 (V4HI "V8QI") (V2SI "V4HI")  (DI "V2SI")])
+
 ;; Modes with half the number of equal-sized elements.
 (define_mode_attr V_HALF [(V16QI "V8QI") (V8HI "V4HI")
-              (V4SI  "V2SI") (V4SF "V2SF")
+              (V4SI  "V2SI") (V4SF "V2SF") (V2DF "DF")
                           (V2DI "DI")])
 
 ;; Same, but lower-case.
@@ -239,7 +245,7 @@ 
 
 ;; Modes with twice the number of equal-sized elements.
 (define_mode_attr V_DOUBLE [(V8QI "V16QI") (V4HI "V8HI")
-                (V2SI "V4SI") (V2SF "V4SF")
+                (V2SI "V4SI") (V2SF "V4SF") (DF "V2DF")
                             (DI "V2DI")])
 
 ;; Same, but lower-case.
@@ -362,7 +368,8 @@ 
                                  (V4HI "4") (V8HI "8")
                                  (V2SI "2") (V4SI "4")
                                  (V2SF "2") (V4SF "4")
-                                 (DI "1")   (V2DI "2")])
+                                 (DI "1")   (V2DI "2")
+                                 (DF "1")   (V2DF "2")])
 
 ;; Same as V_widen, but lower-case.
 (define_mode_attr V_widen_l [(V8QI "v8hi") (V4HI "v4si") ( V2SI "v2di")])