Patchwork Wrap calculation of PIC address into a single instruction

login
register
mail settings
Submitter Maxim Kuvyrkov
Date June 23, 2010, 9:19 p.m.
Message ID <4C227A74.1070201@codesourcery.com>
Download mbox | patch
Permalink /patch/56720/
State New
Headers show

Comments

Maxim Kuvyrkov - June 23, 2010, 9:19 p.m.
This patch enables optimizations, particularly GCSE, handle calculation 
of PIC addresses.  GCSE tracks only single instructions, so it can't 
handle two-instruction calculation of PIC address.

With this patch, calculations of PIC addresses are represented as single 
instructions allowing GCSE eliminate all but the first address 
calculation for global variables.

Any comments?  OK to check in?
Andrew Pinski - June 23, 2010, 9:22 p.m.
On Wed, Jun 23, 2010 at 2:19 PM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote:
> This patch enables optimizations, particularly GCSE, handle calculation of
> PIC addresses.  GCSE tracks only single instructions, so it can't handle
> two-instruction calculation of PIC address.

The reg_equal note on the second instruction should have been enough
to solve this issue.  This is how it is optimized on PowerPC and some
other targets.  Why is not working for arm?

Thanks,
Andrew Pinski
Steven Bosscher - June 23, 2010, 9:22 p.m.
On Wed, Jun 23, 2010 at 11:19 PM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote:
> This patch enables optimizations, particularly GCSE, handle calculation of
> PIC addresses.  GCSE tracks only single instructions, so it can't handle
> two-instruction calculation of PIC address.
>
> With this patch, calculations of PIC addresses are represented as single
> instructions allowing GCSE eliminate all but the first address calculation
> for global variables.
>
> Any comments?

Yes. This is what we added GCSE's ability to eliminate redundancies
from REG_EQUAL notes for. If your PIC addresses have a REG_EQUAL note,
GCSE is (or should be) already able to eliminate redundant address
calculations.

Ciao!
Steven
Maxim Kuvyrkov - June 23, 2010, 9:50 p.m.
On 6/24/10 1:22 AM, Steven Bosscher wrote:
> On Wed, Jun 23, 2010 at 11:19 PM, Maxim Kuvyrkov<maxim@codesourcery.com>  wrote:
>> This patch enables optimizations, particularly GCSE, handle calculation of
>> PIC addresses.  GCSE tracks only single instructions, so it can't handle
>> two-instruction calculation of PIC address.
>>
>> With this patch, calculations of PIC addresses are represented as single
>> instructions allowing GCSE eliminate all but the first address calculation
>> for global variables.
>>
>> Any comments?
>
> Yes. This is what we added GCSE's ability to eliminate redundancies
> from REG_EQUAL notes for. If your PIC addresses have a REG_EQUAL note,
> GCSE is (or should be) already able to eliminate redundant address
> calculations.

You know, it turns out GCSE can eliminate calculation of PIC addresses 
on ARM.  When I started working on improving code hoisting the example 
with PIC address wasn't fully optimized without this patch.

Now, apparently, one of the other GCSE improvements (VBEout computation, 
probably) fixed the underlying problem.

Richard, unless you think the patch may be valuable for some other 
reason, I'm dropping it.

Thanks,
Maxim Kuvyrkov - June 24, 2010, 11:04 a.m.
On 6/24/10 1:50 AM, Maxim Kuvyrkov wrote:
> On 6/24/10 1:22 AM, Steven Bosscher wrote:
>> On Wed, Jun 23, 2010 at 11:19 PM, Maxim
>> Kuvyrkov<maxim@codesourcery.com> wrote:
>>> This patch enables optimizations, particularly GCSE, handle
>>> calculation of
>>> PIC addresses. GCSE tracks only single instructions, so it can't handle
>>> two-instruction calculation of PIC address.
>>>
>>> With this patch, calculations of PIC addresses are represented as single
>>> instructions allowing GCSE eliminate all but the first address
>>> calculation
>>> for global variables.
>>>
>>> Any comments?
>>
>> Yes. This is what we added GCSE's ability to eliminate redundancies
>> from REG_EQUAL notes for. If your PIC addresses have a REG_EQUAL note,
>> GCSE is (or should be) already able to eliminate redundant address
>> calculations.
>
> You know, it turns out GCSE can eliminate calculation of PIC addresses
> on ARM. When I started working on improving code hoisting the example
> with PIC address wasn't fully optimized without this patch.

It was late in the night when I checked the generated code and, although 
GCSE of PIC addresses for ARM is now better, it is still not as good as 
with this patch.

GCSE cannot use (REG_EQUAL (symbol_ref)) note on the second instruction 
because can_assign_to_reg_without_clobbers returns false for symbol_ref 
when compiling PIC code.  (Symbol_ref) is not a 
LEGITIMATE_PIC_OPERAND_P, so it not a general_operand either.  The 
second check in can_assign_to_reg_without_clobbers returns false as (set 
(reg) (symbol_ref)) yields invalid instruction.

AFAICT, GCSE cannot optimize a bare symbol_ref for ARM PIC code because 
it has no guarantee that emit_move_insn (reg, symbol_ref) will generate 
simple enough code.

>
> Now, apparently, one of the other GCSE improvements (VBEout computation,
> probably) fixed the underlying problem.

Improvements to GCSE'ing constants were able to optimize half of second 
and subsequent address calculations.

>
> Richard, unless you think the patch may be valuable for some other
> reason, I'm dropping it.

The patch stands for now.

Thank you,
Maxim Kuvyrkov - June 29, 2010, 6:28 p.m.
On 6/24/10 3:04 PM, Maxim Kuvyrkov wrote:
...
> GCSE cannot use (REG_EQUAL (symbol_ref)) note on the second instruction
> because can_assign_to_reg_without_clobbers returns false for symbol_ref
> when compiling PIC code. (Symbol_ref) is not a LEGITIMATE_PIC_OPERAND_P,
> so it not a general_operand either. The second check in
> can_assign_to_reg_without_clobbers returns false as (set (reg)
> (symbol_ref)) yields invalid instruction.
...

Ping.

Improvements to code hoisting provide about 0.8% size reduction on Thumb 
PIC code.  Without this patch space reduction is much less.

Thank you,
Richard Earnshaw - July 1, 2010, 12:40 p.m.
On Thu, 2010-06-24 at 01:19 +0400, Maxim Kuvyrkov wrote:
> This patch enables optimizations, particularly GCSE, handle calculation 
> of PIC addresses.  GCSE tracks only single instructions, so it can't 
> handle two-instruction calculation of PIC address.
> 
> With this patch, calculations of PIC addresses are represented as single 
> instructions allowing GCSE eliminate all but the first address 
> calculation for global variables.
> 
> Any comments?  OK to check in?

+/* Return true to X will surely end up in an index register after the first
+   splitting pass.  */

s/Return true to/Return true if/

Other than that, this is ok.

R.

Patch

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 5671587..d846557 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -4897,17 +4897,13 @@  legitimize_pic_address (rtx orig, enum machine_mode mode, rtx reg)
   if (GET_CODE (orig) == SYMBOL_REF
       || GET_CODE (orig) == LABEL_REF)
     {
-      rtx pic_ref, address;
       rtx insn;
 
       if (reg == 0)
 	{
 	  gcc_assert (can_create_pseudo_p ());
 	  reg = gen_reg_rtx (Pmode);
-	  address = gen_reg_rtx (Pmode);
 	}
-      else
-	address = reg;
 
       /* VxWorks does not impose a fixed gap between segments; the run-time
 	 gap can be different from the object-file gap.  We therefore can't
@@ -4923,18 +4919,21 @@  legitimize_pic_address (rtx orig, enum machine_mode mode, rtx reg)
 	insn = arm_pic_static_addr (orig, reg);
       else
 	{
+	  rtx pat;
+	  rtx mem;
+
 	  /* If this function doesn't have a pic register, create one now.  */
 	  require_pic_register ();
 
-	  if (TARGET_32BIT)
-	    emit_insn (gen_pic_load_addr_32bit (address, orig));
-	  else /* TARGET_THUMB1 */
-	    emit_insn (gen_pic_load_addr_thumb1 (address, orig));
+	  pat = gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig);
 
-	  pic_ref = gen_const_mem (Pmode,
-				   gen_rtx_PLUS (Pmode, cfun->machine->pic_reg,
-					         address));
-	  insn = emit_move_insn (reg, pic_ref);
+	  /* Make the MEM as close to a constant as possible.  */
+	  mem = SET_SRC (pat);
+	  gcc_assert (MEM_P (mem) && !MEM_VOLATILE_P (mem));
+	  MEM_READONLY_P (mem) = 1;
+	  MEM_NOTRAP_P (mem) = 1;
+
+	  insn = emit_insn (pat);
 	}
 
       /* Put a REG_EQUAL note on this insn, so that it can be optimized
@@ -5214,6 +5213,15 @@  pcrel_constant_p (rtx x)
   return FALSE;
 }
 
+/* Return true to X will surely end up in an index register after the first
+   splitting pass.  */
+static bool
+will_be_in_index_register (const_rtx x)
+{
+  /* arm.md: calculate_pic_address will split this into a register.  */
+  return GET_CODE (x) == UNSPEC && XINT (x, 1) == UNSPEC_PIC_SYM;
+}
+
 /* Return nonzero if X is a valid ARM state address operand.  */
 int
 arm_legitimate_address_outer_p (enum machine_mode mode, rtx x, RTX_CODE outer,
@@ -5271,8 +5279,9 @@  arm_legitimate_address_outer_p (enum machine_mode mode, rtx x, RTX_CODE outer,
       rtx xop1 = XEXP (x, 1);
 
       return ((arm_address_register_rtx_p (xop0, strict_p)
-	       && GET_CODE(xop1) == CONST_INT
-	       && arm_legitimate_index_p (mode, xop1, outer, strict_p))
+	       && ((GET_CODE(xop1) == CONST_INT
+		    && arm_legitimate_index_p (mode, xop1, outer, strict_p))
+		   || (!strict_p && will_be_in_index_register (xop1))))
 	      || (arm_address_register_rtx_p (xop1, strict_p)
 		  && arm_legitimate_index_p (mode, xop0, outer, strict_p)));
     }
@@ -5358,7 +5367,8 @@  thumb2_legitimate_address_p (enum machine_mode mode, rtx x, int strict_p)
       rtx xop1 = XEXP (x, 1);
 
       return ((arm_address_register_rtx_p (xop0, strict_p)
-	       && thumb2_legitimate_index_p (mode, xop1, strict_p))
+	       && (thumb2_legitimate_index_p (mode, xop1, strict_p)
+		   || (!strict_p && will_be_in_index_register (xop1))))
 	      || (arm_address_register_rtx_p (xop1, strict_p)
 		  && thumb2_legitimate_index_p (mode, xop0, strict_p)));
     }
@@ -5661,7 +5671,8 @@  thumb1_legitimate_address_p (enum machine_mode mode, rtx x, int strict_p)
 	  && XEXP (x, 0) != frame_pointer_rtx
 	  && XEXP (x, 1) != frame_pointer_rtx
 	  && thumb1_index_register_rtx_p (XEXP (x, 0), strict_p)
-	  && thumb1_index_register_rtx_p (XEXP (x, 1), strict_p))
+	  && (thumb1_index_register_rtx_p (XEXP (x, 1), strict_p)
+	      || (!strict_p && will_be_in_index_register (XEXP (x, 1)))))
 	return 1;
 
       /* REG+const has 5-7 bit offset for non-SP registers.  */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index b6cca49..534bfc7 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5231,6 +5231,34 @@ 
 ;; we use an unspec.  The offset will be loaded from a constant pool entry,
 ;; since that is the only type of relocation we can use.
 
+;; Wrap calculation of the whole PIC address in a single pattern for the
+;; benefit of optimizers, particularly, PRE and HOIST.  Calculation of
+;; a PIC address involves two loads from memory, so we want to CSE it
+;; as often as possible.
+;; This pattern will be split into one of the pic_load_addr_* patterns
+;; and a move after GCSE optimizations.
+;;
+;; Note: Update arm.c: legitimize_pic_address() when changing this pattern.
+(define_expand "calculate_pic_address"
+  [(set (match_operand:SI 0 "register_operand" "")
+	(mem:SI (plus:SI (match_operand:SI 1 "register_operand" "")
+			 (unspec:SI [(match_operand:SI 2 "" "")]
+				    UNSPEC_PIC_SYM))))]
+  "flag_pic"
+)
+
+;; Split calculate_pic_address into pic_load_addr_* and a move.
+(define_split
+  [(set (match_operand:SI 0 "register_operand" "")
+	(mem:SI (plus:SI (match_operand:SI 1 "register_operand" "")
+			 (unspec:SI [(match_operand:SI 2 "" "")]
+				    UNSPEC_PIC_SYM))))]
+  "flag_pic"
+  [(set (match_dup 3) (unspec:SI [(match_dup 2)] UNSPEC_PIC_SYM))
+   (set (match_dup 0) (mem:SI (plus:SI (match_dup 1) (match_dup 3))))]
+  "operands[3] = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];"
+)
+
 ;; The rather odd constraints on the following are to force reload to leave
 ;; the insn alone, and to force the minipool generation pass to then move
 ;; the GOT symbol to memory.