Patchwork [ARM] Discourage use of NEON on Cortex-A8

login
register
mail settings
Submitter Andrew Stubbs
Date March 13, 2011, 4:31 p.m.
Message ID <4D7CF174.1090608@codesourcery.com>
Download mbox | patch
Permalink /patch/86605/
State New
Headers show

Comments

Andrew Stubbs - March 13, 2011, 4:31 p.m.
This patch discourages the use of NEON for integer operations on ARM 
Cortex-A8.

The problem is that transferring data from NEON/VFP registers to core 
registers is prohibitively expensive on A8. This should not affect 
Cortex-A9 in the same way.

This change gives a 6% increase in performance on SPEC2000 crafty, on an 
imx51 board.

An older version of the patch has been used for some time in the 
CodeSourcery and Linaro toolchains, so it's fairly well tested.

OK (for stage 1)?

Andrew
Richard Earnshaw - March 14, 2011, 5:10 p.m.
On Sun, 2011-03-13 at 16:31 +0000, Andrew Stubbs wrote:
> This patch discourages the use of NEON for integer operations on ARM 
> Cortex-A8.
> 
> The problem is that transferring data from NEON/VFP registers to core 
> registers is prohibitively expensive on A8. This should not affect 
> Cortex-A9 in the same way.
> 
> This change gives a 6% increase in performance on SPEC2000 crafty, on an 
> imx51 board.
> 
> An older version of the patch has been used for some time in the 
> CodeSourcery and Linaro toolchains, so it's fairly well tested.
> 
> OK (for stage 1)?
> 
> Andrew

There's no denying numbers like that, so I'm going to approve this for
stage 1, but I'm far from convinced that this isn't papering over other
problems.

Did you look at REGISTER_MOVE_COST?  That code for ARM looks rather
crufty these days and should really be the place where the cost of
moving between the classes is expressed.  I also wonder whether IRA is
really taking these costs into account when doing preferencing.

Finally, alternatives from an insn are normally selected left-to-right
from those available in a pattern, all other things being equal.  So
really the A8-only alternative should come after the core registers
alternatives if its less preferable.

R.
Bernd Schmidt - March 14, 2011, 5:37 p.m.
On 03/14/2011 06:10 PM, Richard Earnshaw wrote:
> 
> On Sun, 2011-03-13 at 16:31 +0000, Andrew Stubbs wrote:
>> This patch discourages the use of NEON for integer operations on ARM 
>> Cortex-A8.
>>
>> The problem is that transferring data from NEON/VFP registers to core 
>> registers is prohibitively expensive on A8. This should not affect 
>> Cortex-A9 in the same way.
>>
>> This change gives a 6% increase in performance on SPEC2000 crafty, on an 
>> imx51 board.
>>
>> An older version of the patch has been used for some time in the 
>> CodeSourcery and Linaro toolchains, so it's fairly well tested.
>>
>> OK (for stage 1)?
>>
>> Andrew
> 
> There's no denying numbers like that, so I'm going to approve this for
> stage 1, but I'm far from convinced that this isn't papering over other
> problems.

Oh, I agree. I seem to recall we tried using REGISTER_MOVE_COST, but it
didn't help.
Ideally, we'd fix this in the register allocator and in lower-subreg,
but I believe that would be a significantly larger project. You'd have
to detect that if a certain pseudo is allocated to a NEON register, this
would cause an expensive register move (because it's used in an insn
that only allows GENERAL_REGS), and then recursively every other
operation that involves this pseudo also needs to avoid NEON registers
for the same reason. Doing this in lower-subreg would give the best
results (splitting pseudos that shouldn't go into NEON regs and leaving
the others alone), but it would also work to do it in IRA.


Bernd
Andrew Stubbs - March 14, 2011, 5:55 p.m.
On 14/03/11 17:10, Richard Earnshaw wrote:
> Finally, alternatives from an insn are normally selected left-to-right
> from those available in a pattern, all other things being equal.  So
> really the A8-only alternative should come after the core registers
> alternatives if its less preferable.

OK, I can do that, but I'm not exactly sure how far to the right you 
want them moved?

Do you want the onlya8 alternatives moved the the right of all 
alternatives without a '?' modifier (so, changing only iordi3_neon and 
anddi3_neon)? Or do you mean moving it all the way to the right hand end?

Or something completely different?

Andrew
Richard Earnshaw - March 14, 2011, 6:17 p.m.
On Mon, 2011-03-14 at 17:55 +0000, Andrew Stubbs wrote:
> On 14/03/11 17:10, Richard Earnshaw wrote:
> > Finally, alternatives from an insn are normally selected left-to-right
> > from those available in a pattern, all other things being equal.  So
> > really the A8-only alternative should come after the core registers
> > alternatives if its less preferable.
> 
> OK, I can do that, but I'm not exactly sure how far to the right you 
> want them moved?
> 
> Do you want the onlya8 alternatives moved the the right of all 
> alternatives without a '?' modifier (so, changing only iordi3_neon and 
> anddi3_neon)? Or do you mean moving it all the way to the right hand end?
> 
> Or something completely different?
> 
> Andrew
> 
So taking the current adddi3 pattern we have alternatives (in order of
preference): not-a8, only-a8, core-regs, core-regs.

I think the order should be: not-a8, core-regs, core-regs, only-a8.

R.
Andrew Stubbs - March 22, 2011, 2:05 p.m.
On 14/03/11 18:17, Richard Earnshaw wrote:
 > I think the order should be: not-a8, core-regs, core-regs, only-a8.

Ok, how about this?

I've tested that it still builds spec2k crafty.

Andrew
Richard Earnshaw - March 22, 2011, 2:06 p.m.
On Tue, 2011-03-22 at 14:05 +0000, Andrew Stubbs wrote:
> On 14/03/11 18:17, Richard Earnshaw wrote:
>  > I think the order should be: not-a8, core-regs, core-regs, only-a8.
> 
> Ok, how about this?
> 
> I've tested that it still builds spec2k crafty.
> 
> Andrew
> 

ENOPATCH

R.

Patch

2011-03-13  Bernd Schmidt  <bernds@codesourcery.com>
	    Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* config/arm/vfp.md (arm_movdi_vfp): Enable only when not tuning
	for Cortex-A8.
	(arm_movdi_vfp_cortexa8): New pattern.
	* config/arm/neon.md (adddi3_neon, subdi3_neon, anddi3_neon,
	iordi3_neon, xordi3_neon): Add alternatives to discourage Neon
	instructions when tuning for Cortex-A8.  Set attribute "arch".
	* config/arm/arm.md: Move include arm-tune.md up a bit.
	(define_attr "arch"): Add "onlya8" and "nota8" values.
	(define_attr "arch_enabled"): Handle "onlya8" and "nota8".

--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -149,6 +149,9 @@ 
 ;;---------------------------------------------------------------------------
 ;; Attributes
 
+;; Processor type.  This is created automatically from arm-cores.def.
+(include "arm-tune.md")
+
 ; IS_THUMB is set to 'yes' when we are generating Thumb code, and 'no' when
 ; generating ARM code.  This is used to control the length of some insn
 ; patterns that share the same RTL in both ARM and Thumb code.
@@ -192,7 +195,7 @@ 
 ; for ARM or Thumb-2 with arm_arch6, and nov6 for ARM without
 ; arm_arch6.  This attribute is used to compute attribute "enabled",
 ; use type "any" to enable an alternative in all cases.
-(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6"
+(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,onlya8,nota8"
   (const_string "any"))
 
 (define_attr "arch_enabled" "no,yes"
@@ -225,6 +228,14 @@ 
 
 	 (and (eq_attr "arch" "nov6")
 	      (ne (symbol_ref "(TARGET_32BIT && !arm_arch6)") (const_int 0)))
+	 (const_string "yes")
+
+	 (and (eq_attr "arch" "onlya8")
+	      (eq_attr "tune" "cortexa8"))
+	 (const_string "yes")
+
+	 (and (eq_attr "arch" "nota8")
+	      (not (eq_attr "tune" "cortexa8")))
 	 (const_string "yes")]
 	(const_string "no")))
 
@@ -485,9 +496,6 @@ 
 ;;---------------------------------------------------------------------------
 ;; Pipeline descriptions
 
-;; Processor type.  This is created automatically from arm-cores.def.
-(include "arm-tune.md")
-
 (define_attr "tune_cortexr4" "yes,no"
   (const (if_then_else
 	  (eq_attr "tune" "cortexr4,cortexr4f")
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -583,23 +583,25 @@ 
 )
 
 (define_insn "adddi3_neon"
-  [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r")
-        (plus:DI (match_operand:DI 1 "s_register_operand" "%w,0,0")
-                 (match_operand:DI 2 "s_register_operand" "w,r,0")))
+  [(set (match_operand:DI 0 "s_register_operand" "=w,?w,?&r,?&r")
+        (plus:DI (match_operand:DI 1 "s_register_operand" "%w,w,0,0")
+                 (match_operand:DI 2 "s_register_operand" "w,w,r,0")))
    (clobber (reg:CC CC_REGNUM))]
   "TARGET_NEON"
 {
   switch (which_alternative)
     {
-    case 0: return "vadd.i64\t%P0, %P1, %P2";
-    case 1: return "#";
+    case 0: /* fall through */
+    case 1: return "vadd.i64\t%P0, %P1, %P2";
     case 2: return "#";
+    case 3: return "#";
     default: gcc_unreachable ();
     }
 }
-  [(set_attr "neon_type" "neon_int_1,*,*")
-   (set_attr "conds" "*,clob,clob")
-   (set_attr "length" "*,8,8")]
+  [(set_attr "neon_type" "neon_int_1,neon_int_1,*,*")
+   (set_attr "conds" "*,*,clob,clob")
+   (set_attr "length" "*,*,8,8")
+   (set_attr "arch" "nota8,onlya8,*,*")]
 )
 
 (define_insn "*sub<mode>3_neon"
@@ -617,24 +619,26 @@ 
 )
 
 (define_insn "subdi3_neon"
-  [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?&r")
-        (minus:DI (match_operand:DI 1 "s_register_operand" "w,0,r,0")
-                  (match_operand:DI 2 "s_register_operand" "w,r,0,0")))
+  [(set (match_operand:DI 0 "s_register_operand" "=w,?w,?&r,?&r,?&r")
+        (minus:DI (match_operand:DI 1 "s_register_operand" "w,w,0,r,0")
+                  (match_operand:DI 2 "s_register_operand" "w,w,r,0,0")))
    (clobber (reg:CC CC_REGNUM))]
   "TARGET_NEON"
 {
   switch (which_alternative)
     {
-    case 0: return "vsub.i64\t%P0, %P1, %P2";
-    case 1: /* fall through */ 
-    case 2: /* fall through */
-    case 3: return  "subs\\t%Q0, %Q1, %Q2\;sbc\\t%R0, %R1, %R2";
+    case 0: /* fall through */
+    case 1: return "vsub.i64\t%P0, %P1, %P2";
+    case 2: /* fall through */ 
+    case 3: /* fall through */
+    case 4: return  "subs\\t%Q0, %Q1, %Q2\;sbc\\t%R0, %R1, %R2";
     default: gcc_unreachable ();
     }
 }
-  [(set_attr "neon_type" "neon_int_2,*,*,*")
-   (set_attr "conds" "*,clob,clob,clob")
-   (set_attr "length" "*,8,8,8")]
+  [(set_attr "neon_type" "neon_int_2,neon_int_2,*,*,*")
+   (set_attr "conds" "*,*,clob,clob,clob")
+   (set_attr "length" "*,*,8,8,8")
+   (set_attr "arch" "nota8,onlya8,*,*,*")]
 )
 
 (define_insn "*mul<mode>3_neon"
@@ -720,23 +724,26 @@ 
 )
 
 (define_insn "iordi3_neon"
-  [(set (match_operand:DI 0 "s_register_operand" "=w,w,?&r,?&r")
-        (ior:DI (match_operand:DI 1 "s_register_operand" "%w,0,0,r")
-		(match_operand:DI 2 "neon_logic_op2" "w,Dl,r,r")))]
+  [(set (match_operand:DI 0 "s_register_operand" "=w,?w,w,?w,?&r,?&r")
+        (ior:DI (match_operand:DI 1 "s_register_operand" "%w,w,0,0,0,r")
+		(match_operand:DI 2 "neon_logic_op2" "w,w,Dl,Dl,r,r")))]
   "TARGET_NEON"
 {
   switch (which_alternative)
     {
-    case 0: return "vorr\t%P0, %P1, %P2";
-    case 1: return neon_output_logic_immediate ("vorr", &operands[2],
+    case 0: /* fall through */
+    case 1: return "vorr\t%P0, %P1, %P2";
+    case 2: /* fall through */
+    case 3: return neon_output_logic_immediate ("vorr", &operands[2],
 		     DImode, 0, VALID_NEON_QREG_MODE (DImode));
-    case 2: return "#";
-    case 3: return "#";
+    case 4: return "#";
+    case 5: return "#";
     default: gcc_unreachable ();
     }
 }
-  [(set_attr "neon_type" "neon_int_1,neon_int_1,*,*")
-   (set_attr "length" "*,*,8,8")]
+  [(set_attr "neon_type" "neon_int_1,neon_int_1,neon_int_1,neon_int_1,*,*")
+   (set_attr "length" "*,*,*,*,8,8")
+   (set_attr "arch" "nota8,onlya8,nota8,onlya8,*,*")]
 )
 
 ;; The concrete forms of the Neon immediate-logic instructions are vbic and
@@ -762,23 +769,26 @@ 
 )
 
 (define_insn "anddi3_neon"
-  [(set (match_operand:DI 0 "s_register_operand" "=w,w,?&r,?&r")
-        (and:DI (match_operand:DI 1 "s_register_operand" "%w,0,0,r")
-		(match_operand:DI 2 "neon_inv_logic_op2" "w,DL,r,r")))]
+  [(set (match_operand:DI 0 "s_register_operand" "=w,?w,w,?w,?&r,?&r")
+        (and:DI (match_operand:DI 1 "s_register_operand" "%w,w,0,0,0,r")
+		(match_operand:DI 2 "neon_inv_logic_op2" "w,w,DL,DL,r,r")))]
   "TARGET_NEON"
 {
   switch (which_alternative)
     {
-    case 0: return "vand\t%P0, %P1, %P2";
-    case 1: return neon_output_logic_immediate ("vand", &operands[2],
+    case 0: /* fall through */
+    case 1: return "vand\t%P0, %P1, %P2";
+    case 2: /* fall through */
+    case 3: return neon_output_logic_immediate ("vand", &operands[2],
     		     DImode, 1, VALID_NEON_QREG_MODE (DImode));
-    case 2: return "#";
-    case 3: return "#";
+    case 4: return "#";
+    case 5: return "#";
     default: gcc_unreachable ();
     }
 }
-  [(set_attr "neon_type" "neon_int_1,neon_int_1,*,*")
-   (set_attr "length" "*,*,8,8")]
+  [(set_attr "neon_type" "neon_int_1,neon_int_1,neon_int_1,neon_int_1,*,*")
+   (set_attr "length" "*,*,*,*,8,8")
+   (set_attr "arch" "nota8,onlya8,nota8,onlya8,*,*")]
 )
 
 (define_insn "orn<mode>3_neon"
@@ -836,16 +846,18 @@ 
 )
 
 (define_insn "xordi3_neon"
-  [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r")
-        (xor:DI (match_operand:DI 1 "s_register_operand" "%w,0,r")
-	        (match_operand:DI 2 "s_register_operand" "w,r,r")))]
+  [(set (match_operand:DI 0 "s_register_operand" "=w,?w,?&r,?&r")
+        (xor:DI (match_operand:DI 1 "s_register_operand" "%w,w,0,r")
+	        (match_operand:DI 2 "s_register_operand" "w,w,r,r")))]
   "TARGET_NEON"
   "@
    veor\t%P0, %P1, %P2
+   veor\t%P0, %P1, %P2
    #
    #"
-  [(set_attr "neon_type" "neon_int_1,*,*")
-   (set_attr "length" "*,8,8")]
+  [(set_attr "neon_type" "neon_int_1,neon_int_1,*,*")
+   (set_attr "length" "*,*,8,8")
+   (set_attr "arch" "nota8,onlya8,*,*")]
 )
 
 (define_insn "one_cmpl<mode>2"
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -134,9 +134,51 @@ 
 ;; DImode moves
 
 (define_insn "*arm_movdi_vfp"
-  [(set (match_operand:DI 0 "nonimmediate_di_operand" "=r, r,m,w,r,w,w, Uv")
+  [(set (match_operand:DI 0 "nonimmediate_di_operand" "=r, r, m,w,r,w,w, Uv")
 	(match_operand:DI 1 "di_operand"              "rIK,mi,r,r,w,w,Uvi,w"))]
-  "TARGET_ARM && TARGET_HARD_FLOAT && TARGET_VFP
+  "TARGET_ARM && TARGET_HARD_FLOAT && TARGET_VFP && arm_tune != cortexa8
+   && (   register_operand (operands[0], DImode)
+       || register_operand (operands[1], DImode))"
+  "*
+  switch (which_alternative)
+    {
+    case 0: 
+      return \"#\";
+    case 1:
+    case 2:
+      return output_move_double (operands);
+    case 3:
+      return \"fmdrr%?\\t%P0, %Q1, %R1\\t%@ int\";
+    case 4:
+      return \"fmrrd%?\\t%Q0, %R0, %P1\\t%@ int\";
+    case 5:
+      if (TARGET_VFP_SINGLE)
+	return \"fcpys%?\\t%0, %1\\t%@ int\;fcpys%?\\t%p0, %p1\\t%@ int\";
+      else
+	return \"fcpyd%?\\t%P0, %P1\\t%@ int\";
+    case 6: case 7:
+      return output_move_vfp (operands);
+    default:
+      gcc_unreachable ();
+    }
+  "
+  [(set_attr "type" "*,load2,store2,r_2_f,f_2_r,ffarithd,f_loadd,f_stored")
+   (set_attr "neon_type" "*,*,*,neon_mcr_2_mcrr,neon_mrrc,neon_vmov,*,*")
+   (set (attr "length") (cond [(eq_attr "alternative" "0,1,2") (const_int 8)
+			       (eq_attr "alternative" "5")
+				(if_then_else
+				 (eq (symbol_ref "TARGET_VFP_SINGLE") (const_int 1))
+				 (const_int 8)
+				 (const_int 4))]
+			      (const_int 4)))
+   (set_attr "pool_range"     "*,1020,*,*,*,*,1020,*")
+   (set_attr "neg_pool_range" "*,1008,*,*,*,*,1008,*")]
+)
+
+(define_insn "*arm_movdi_vfp_cortexa8"
+  [(set (match_operand:DI 0 "nonimmediate_di_operand" "=r, r,m,w,!r,w,w, Uv")
+	(match_operand:DI 1 "di_operand"              "rIK,mi,r,r,w,w,Uvi,w"))]
+  "TARGET_ARM && TARGET_HARD_FLOAT && TARGET_VFP && arm_tune == cortexa8
    && (   register_operand (operands[0], DImode)
        || register_operand (operands[1], DImode))"
   "*