Patchwork [ARM] Fix PR50313 and handle PIC addresses properly.

login
register
mail settings
Submitter Ramana Radhakrishnan
Date Jan. 18, 2012, 3:03 p.m.
Message ID <CACUk7=XfRyo2bWkQ_AWnhe8dz2cykFBmmUiZj_f6W7GYM48_Mg@mail.gmail.com>
Download mbox | patch
Permalink /patch/136638/
State New
Headers show

Comments

Ramana Radhakrishnan - Jan. 18, 2012, 3:03 p.m.
Hi ,

PR50313 is a case where having the 2 patterns "pic_load_addr_*" and
"pic_add_dot_plus_eight/four" from expand time becomes a problem as
discussed by Jakub in his comment in the audit trail for PR48308.
There is a separate problem in combine as explained by my comment in
the audit trail for PR48308 and my RFC patch for the same sometime
last week.

What I've done in this case is to take the existing patterns merged
them into a UNIFIED form which means that we can hoist the entire
computation out with any of the loop optimizers rather than piece-meal
hoping to have each one being hoisted correctly. I've had to adjust
the cost of this to be equivalent to that of a memory load ( it's in
fact the cost of a memory load + the cost of an add instruction but
it's good enough for the moment). It's reasonably close enough for
arm_size_rtx_costs, and thumb1_rtx_costs pushes this high enough that
it will warrant a hoist anyway. I've also been conservative in
adjusting cannot_copy_insn_p to disallow copying such insns to prevent
any cases where you might have duplicate labels being generated.

Once this happens the original bug report in PR50313 appears to be
fixed and I've had this survive a bootstrap and regression test on an
A9 board and this has successfully cross-built a version of eglibc
with an ARM v7-a configuration (which is currently undergoing
testing).

There are a few ways by which we can get better code - one is to
replace the load from the constant pool with a movw / movt pair of the
actual differences in which case we'd be able to get performant code
in libraries for this case, the other which as RichardE suggested in a
conversation was to allow the splitter to actually generate the
internal labels in the pic_load_addr and pic_add_dot_plus_eight which
could then mean that the label number is really not reserved until
this complex pattern has been split. I don't think these enhancements
are worth doing so late in stage4 and I do want to backport this patch
as it stands into the 4.6 branch.

OK (and to backport to 4.6 branch ?) if no regressions ?

cheers
Ramana


2012-01-18  Ramana Radhakrishnan  <ramana.radhakrishnan@linaro.org>

	PR target/50313
	* config/arm/arm.c (arm_load_pic_register): Use
	gen_pic_load_addr_unified. Delete calls to gen_pic_load_addr_32bit
	, gen_pic_add_dot_plus_eight and gen_pic_add_dot_plus_four.
	(arm_pic_static_addr): Likewise.
	(arm_rtx_costs_1): Adjust cost for UNSPEC_PIC_UNIFIED.
	(arm_note_pic_base): Handle UNSPEC_PIC_UNIFIED.
	* config/arm/arm.md (UNSPEC_PIC_UNIFIED): Define.
	(pic_load_addr_unified): New.
Jakub Jelinek - Jan. 18, 2012, 3:26 p.m.
On Wed, Jan 18, 2012 at 03:03:47PM +0000, Ramana Radhakrishnan wrote:
> +;; operand1 is the memory address to go into pic_load_addr_32bit.
> +;; operand2 is the PIC label to be emitted from pic_add_dot_plus_*.
> +;; We do this to allow hoisting of the entire c
> +(define_insn_and_split "pic_load_addr_unified"
> +  [(set (match_operand:SI 0 "s_register_operand" "=r,r,l")
> +	(unspec:SI [(match_operand:SI 1 "" "mX,mX,mX")
> +		    (match_operand:SI 2 "" "")]
> +		    UNSPEC_PIC_UNIFIED))]
> + "flag_pic"
> + "#"
> + "flag_pic"

What's the reason to duplicate the condition here?  If flag_pic isn't
non-zero, then the insn wouldn't match, therefore wouldn't be split...

Also, while this fixes the testcase, at least if you split it before reload
e.g. first scheduling pass might move away the two insns far away from each
other, register allocation might decide to spill the pseudo set by the first
insn to the stack and the same problem could reappear.
That doesn't mean your patch isn't an incremental improvement.

> + [(set (match_dup 3) (unspec:SI [(match_dup 1)] UNSPEC_PIC_SYM))
> +  (set (match_dup 0) (unspec:SI [(match_dup 3) (match_dup 4)
> +       		     		 (match_dup 2)] UNSPEC_PIC_BASE))]
> + "operands[3] = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];
> +  operands[4] = TARGET_THUMB ? GEN_INT (4) : GEN_INT (8);"
> + [(set_attr "type" "load1,load1,load1")
> +  (set_attr "pool_range" "4096,4096,1024")
> +  (set_attr "neg_pool_range" "0,4084,0")
> +  (set_attr "arch"  "a,t2,t1")
> +  (set_attr "length" "8,6,4")]
> +)
> +

	Jakub
Ramana Radhakrishnan - Jan. 18, 2012, 4:54 p.m.
On 18 January 2012 15:26, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Jan 18, 2012 at 03:03:47PM +0000, Ramana Radhakrishnan wrote:
>> +;; operand1 is the memory address to go into pic_load_addr_32bit.
>> +;; operand2 is the PIC label to be emitted from pic_add_dot_plus_*.
>> +;; We do this to allow hoisting of the entire c
>> +(define_insn_and_split "pic_load_addr_unified"
>> +  [(set (match_operand:SI 0 "s_register_operand" "=r,r,l")
>> +     (unspec:SI [(match_operand:SI 1 "" "mX,mX,mX")
>> +                 (match_operand:SI 2 "" "")]
>> +                 UNSPEC_PIC_UNIFIED))]
>> + "flag_pic"
>> + "#"
>> + "flag_pic"
>
> What's the reason to duplicate the condition here?  If flag_pic isn't
> non-zero, then the insn wouldn't match, therefore wouldn't be split...

True -

>
> Also, while this fixes the testcase, at least if you split it before reload
> e.g. first scheduling pass might move away the two insns far away from each
> other, register allocation might decide to spill the pseudo set by the first
> insn to the stack and the same problem could reappear.

Since this was always being scheduled as a standard load I didn't have
expected this to be pulled too far apart from the use - we only
schedule loads as though these were off the cache anyway. I will
change to get this split after reload to make this as robust as it
possibly can be .

I will note that the tls patterns also would need a similar overhaul
but I don't want to do that till stage1 reopens again.

There is another problem with this patch in that I've got the
neg_pool_range for ARM and Thumb2 back to front - will change that and
then commit.


regards,
Ramana

Patch

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 4c310d4..240434f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -5578,11 +5578,7 @@  arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED)
 
       if (TARGET_32BIT)
 	{
-	  emit_insn (gen_pic_load_addr_32bit (pic_reg, pic_rtx));
-	  if (TARGET_ARM)
-	    emit_insn (gen_pic_add_dot_plus_eight (pic_reg, pic_reg, labelno));
-	  else
-	    emit_insn (gen_pic_add_dot_plus_four (pic_reg, pic_reg, labelno));
+	  emit_insn (gen_pic_load_addr_unified (pic_reg, pic_rtx, labelno));
 	}
       else /* TARGET_THUMB1 */
 	{
@@ -5595,10 +5591,10 @@  arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED)
 				     thumb_find_work_register (saved_regs));
 	      emit_insn (gen_pic_load_addr_thumb1 (pic_tmp, pic_rtx));
 	      emit_insn (gen_movsi (pic_offset_table_rtx, pic_tmp));
+	      emit_insn (gen_pic_add_dot_plus_four (pic_reg, pic_reg, labelno));
 	    }
 	  else
-	    emit_insn (gen_pic_load_addr_thumb1 (pic_reg, pic_rtx));
-	  emit_insn (gen_pic_add_dot_plus_four (pic_reg, pic_reg, labelno));
+	    emit_insn (gen_pic_load_addr_unified (pic_reg, pic_rtx, labelno));
 	}
     }
 
@@ -5628,20 +5624,7 @@  arm_pic_static_addr (rtx orig, rtx reg)
                                UNSPEC_SYMBOL_OFFSET);
   offset_rtx = gen_rtx_CONST (Pmode, offset_rtx);
 
-  if (TARGET_32BIT)
-    {
-      emit_insn (gen_pic_load_addr_32bit (reg, offset_rtx));
-      if (TARGET_ARM)
-        insn = emit_insn (gen_pic_add_dot_plus_eight (reg, reg, labelno));
-      else
-        insn = emit_insn (gen_pic_add_dot_plus_four (reg, reg, labelno));
-    }
-  else /* TARGET_THUMB1 */
-    {
-      emit_insn (gen_pic_load_addr_thumb1 (reg, offset_rtx));
-      insn = emit_insn (gen_pic_add_dot_plus_four (reg, reg, labelno));
-    }
-
+  insn = emit_insn (gen_pic_load_addr_unified (reg, offset_rtx, labelno));
   return insn;
 }
 
@@ -7648,6 +7631,15 @@  arm_rtx_costs_1 (rtx x, enum rtx_code outer, int* total, bool speed)
 
     case SET:
       return false;
+      
+    case UNSPEC:
+      /* We cost this as high as our memory costs to allow this to
+	 be hoisted from loops.  */
+      if (XINT (x, 1) == UNSPEC_PIC_UNIFIED)
+	{
+	  *total = COSTS_N_INSNS (2 + ARM_NUM_REGS (mode));
+	}
+      return true;
 
     default:
       *total = COSTS_N_INSNS (4);
@@ -10008,7 +10000,8 @@  static int
 arm_note_pic_base (rtx *x, void *date ATTRIBUTE_UNUSED)
 {
   if (GET_CODE (*x) == UNSPEC
-      && XINT (*x, 1) == UNSPEC_PIC_BASE)
+      && (XINT (*x, 1) == UNSPEC_PIC_BASE
+	  || XINT (*x, 1) == UNSPEC_PIC_UNIFIED))
     return 1;
   return 0;
 }
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 5620d7d..8842846 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -116,6 +116,7 @@ 
 			; unaligned locations, on architectures which support
 			; that.
   UNSPEC_UNALIGNED_STORE ; Same for str/strh.
+  UNSPEC_PIC_UNIFIED    ; Create a common pic addressing form.
 ])
 
 ;; UNSPEC_VOLATILE Usage:
@@ -5601,7 +5602,7 @@ 
   "flag_pic"
 )
 
-;; Split calculate_pic_address into pic_load_addr_* and a move.
+;; Split calculate_pic_address into pic_load_addr_32bit/thumb1 and a move.
 (define_split
   [(set (match_operand:SI 0 "register_operand" "")
 	(mem:SI (plus:SI (match_operand:SI 1 "register_operand" "")
@@ -5613,6 +5614,29 @@ 
   "operands[3] = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];"
 )
 
+;; operand1 is the memory address to go into pic_load_addr_32bit.
+;; operand2 is the PIC label to be emitted from pic_add_dot_plus_*.
+;; We do this to allow hoisting of the entire c
+(define_insn_and_split "pic_load_addr_unified"
+  [(set (match_operand:SI 0 "s_register_operand" "=r,r,l")
+	(unspec:SI [(match_operand:SI 1 "" "mX,mX,mX")
+		    (match_operand:SI 2 "" "")]
+		    UNSPEC_PIC_UNIFIED))]
+ "flag_pic"
+ "#"
+ "flag_pic"
+ [(set (match_dup 3) (unspec:SI [(match_dup 1)] UNSPEC_PIC_SYM))
+  (set (match_dup 0) (unspec:SI [(match_dup 3) (match_dup 4)
+       		     		 (match_dup 2)] UNSPEC_PIC_BASE))]
+ "operands[3] = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];
+  operands[4] = TARGET_THUMB ? GEN_INT (4) : GEN_INT (8);"
+ [(set_attr "type" "load1,load1,load1")
+  (set_attr "pool_range" "4096,4096,1024")
+  (set_attr "neg_pool_range" "0,4084,0")
+  (set_attr "arch"  "a,t2,t1")
+  (set_attr "length" "8,6,4")]
+)
+
 ;; The rather odd constraints on the following are to force reload to leave
 ;; the insn alone, and to force the minipool generation pass to then move
 ;; the GOT symbol to memory.