diff mbox

Record equivalences for spill registers

Message ID 87inlfsr6e.fsf@linaro.org
State New
Headers show

Commit Message

Richard Sandiford May 5, 2017, 7:23 a.m. UTC
If we decide to allocate a call-clobbered register R to a value that
is live across a call, LRA will create a new spill register TMPR,
insert:

   TMPR <- R

before the call and

   R <- TMPR

after it.  But if we then failed to allocate a register to TMPR, we would
always spill it to the stack, even if R was known to be equivalent to
a constant or to some existing memory location.  And on AArch64, we'd
always fail to allocate such a register for 128-bit Advanced SIMD modes,
since no registers of those modes are call-preserved.

This patch avoids the problem by copying the equivalence information
from the original pseudo to the spill register.  It means that the
code for the testcase is as good with -O2 as it is with -O,
whereas previously the -O code was better.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[Based on commit branches/ARM/sve-branch@247248]

2017-05-05  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* lra-constraints.c (lra_copy_reg_equiv): New function.
	(split_reg): Use it to copy equivalence information from the
	original register to the spill register.

gcc/testsuite/
	* gcc.target/aarch64/spill_1.c: New test.

Comments

Jeff Law May 5, 2017, 4:50 p.m. UTC | #1
On 05/05/2017 01:23 AM, Richard Sandiford wrote:
> If we decide to allocate a call-clobbered register R to a value that
> is live across a call, LRA will create a new spill register TMPR,
> insert:
> 
>     TMPR <- R
> 
> before the call and
> 
>     R <- TMPR
> 
> after it.  But if we then failed to allocate a register to TMPR, we would
> always spill it to the stack, even if R was known to be equivalent to
> a constant or to some existing memory location.  And on AArch64, we'd
> always fail to allocate such a register for 128-bit Advanced SIMD modes,
> since no registers of those modes are call-preserved.
> 
> This patch avoids the problem by copying the equivalence information
> from the original pseudo to the spill register.  It means that the
> code for the testcase is as good with -O2 as it is with -O,
> whereas previously the -O code was better.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
> 
> Thanks,
> Richard
> 
> 
> [Based on commit branches/ARM/sve-branch@247248]
> 
> 2017-05-05  Richard Sandiford  <richard.sandiford@linaro.org>
> 
> gcc/
> 	* lra-constraints.c (lra_copy_reg_equiv): New function.
> 	(split_reg): Use it to copy equivalence information from the
> 	original register to the spill register.
> 
> gcc/testsuite/
> 	* gcc.target/aarch64/spill_1.c: New test.
OK.
jeff
Jim Wilson May 8, 2017, 4:30 a.m. UTC | #2
On 05/05/2017 12:23 AM, Richard Sandiford wrote:
> 2017-05-05  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
> 	* lra-constraints.c (lra_copy_reg_equiv): New function.
> 	(split_reg): Use it to copy equivalence information from the
> 	original register to the spill register.

This patch breaks aarch64 bootstrap.  I get a link error for lto1 and f951

godump.o: In function `go_define(unsigned int, char const*)':
godump.c:(.text+0x36c): relocation truncated to fit:
R_AARCH64_ADR_PREL_LO21 against `.rodata'
godump.c:(.text+0x4b4): relocation truncated to fit: 
R_AARCH64_ADR_PREL_LO21 against `.rodata'

The godump.c.271r.ira file looks OK, I see

(insn 237 223 225 10 (set (reg/f:DI 489)
         (high:DI (label_ref 240))) "../../gcc-svn/gcc/godump.c":174 49 
{*movdi_aarch64}
      (expr_list:REG_EQUIV (high:DI (label_ref 240))
         (insn_list:REG_LABEL_OPERAND 240 (nil))))
...
(insn 238 115 1157 10 (set (reg/f:DI 490)
         (lo_sum:DI (reg/f:DI 489)
             (label_ref 240))) "../../gcc-svn/gcc/godump.c":174 929 
{add_losym_di}
      (expr_list:REG_DEAD (reg/f:DI 489)
         (expr_list:REG_EQUIV (label_ref 240)
             (insn_list:REG_LABEL_OPERAND 240 (nil)))))

But in the godump.c.272r.reload file I see in a different basic block

(insn 1244 76 1161 22 (set (reg/f:DI 7 x7 [490])
         (label_ref 240)) "../../gcc-svn/gcc/godump.c":221 49 
{*movdi_aarch64}
      (nil))

which is not OK.  This label ref is the address of a jumptable in the 
rodata section, and can't be loaded with a single instruction.  It looks 
like there needs to be some extra work when rematerializing, to handle 
equiv values that can't just be copied to a register.

I haven't had a chance to step through this in a debugger to see what is 
going on yet.

Jim
Andrew Pinski May 8, 2017, 5:28 a.m. UTC | #3
On Sun, May 7, 2017 at 10:26 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Sun, May 7, 2017 at 9:30 PM, Jim Wilson <jim.wilson@linaro.org> wrote:
>> On 05/05/2017 12:23 AM, Richard Sandiford wrote:
>>>
>>> 2017-05-05  Richard Sandiford  <richard.sandiford@linaro.org>
>>>
>>> gcc/
>>>         * lra-constraints.c (lra_copy_reg_equiv): New function.
>>>         (split_reg): Use it to copy equivalence information from the
>>>         original register to the spill register.
>>
>>
>> This patch breaks aarch64 bootstrap.  I get a link error for lto1 and f951
>>
>> godump.o: In function `go_define(unsigned int, char const*)':
>> godump.c:(.text+0x36c): relocation truncated to fit:
>> R_AARCH64_ADR_PREL_LO21 against `.rodata'
>> godump.c:(.text+0x4b4): relocation truncated to fit: R_AARCH64_ADR_PREL_LO21
>> against `.rodata'
>>
>> The godump.c.271r.ira file looks OK, I see
>>
>> (insn 237 223 225 10 (set (reg/f:DI 489)
>>         (high:DI (label_ref 240))) "../../gcc-svn/gcc/godump.c":174 49
>> {*movdi_aarch64}
>>      (expr_list:REG_EQUIV (high:DI (label_ref 240))
>>         (insn_list:REG_LABEL_OPERAND 240 (nil))))
>> ...
>> (insn 238 115 1157 10 (set (reg/f:DI 490)
>>         (lo_sum:DI (reg/f:DI 489)
>>             (label_ref 240))) "../../gcc-svn/gcc/godump.c":174 929
>> {add_losym_di}
>>      (expr_list:REG_DEAD (reg/f:DI 489)
>>         (expr_list:REG_EQUIV (label_ref 240)
>>             (insn_list:REG_LABEL_OPERAND 240 (nil)))))
>>
>> But in the godump.c.272r.reload file I see in a different basic block
>>
>> (insn 1244 76 1161 22 (set (reg/f:DI 7 x7 [490])
>>         (label_ref 240)) "../../gcc-svn/gcc/godump.c":221 49
>> {*movdi_aarch64}
>>      (nil))
>>
>> which is not OK.  This label ref is the address of a jumptable in the rodata
>> section, and can't be loaded with a single instruction.  It looks like there
>> needs to be some extra work when rematerializing, to handle equiv values
>> that can't just be copied to a register.
>
> Hmm, shouldn't have the mov rejected as being invalid?  I think that
> is one issue with the aarch64 backend there.
>
> See https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01800.html
>
>
> I can't remember if the following patch was ever submitted or committed.
>
> Here are my notes about this patch from the internal bug report we got
> here at Cavium (back in 2013):
>
> Switch tables are implemented using the tiny model but they are stored
> in the rodata section which means they could overflow the address.


Some more notes:

(In reply to comment #15)
> (In reply to comment #14)
> > //(insn 793 35 795 (set (reg/f:DI 2 x2 [450])
> > //        (label_ref 456)) 32 {*movdi_aarch64}
> > //     (insn_list:REG_LABEL_OPERAND 456 (expr_list:REG_EQUAL (label_ref 456)
> > //            (nil))))
> > adr x2, .L806 // 793 *movdi_aarch64/9 [length = 4]
>
> Which is generated by the register allocator and we never split it into the
> adrp/add again.

The register allocator is doing an ok job as the backend said this is
a valid pattern.  We need a constraint for
*movdi_aarch64/*movsi_aarch64 which says this is not a valid pattern
and to go through the standard gen_movinsn path.


Thanks,
Andrew Pinski

>
> Note this patch most likely won't apply directly either:
>
> From: Andrew Pinski <apinski@cavium.com>
> Date: Thu, 15 Aug 2013 20:42:12 +0000 (-0700)
> Subject: 2013-08-11  Andrew Pinski  <apinski@cavium.com>
> X-Git-Tag: tc47_SDK_3_1_0_build_28~9
> X-Git-Url: http://cagit1.caveonetworks.com/git/?p=toolchain%2Fgcc.git;a=commitdiff_plain;h=1714f167b575956801264db5cd2300203a58e3e9
>
> 2013-08-11  Andrew Pinski  <apinski@cavium.com>
>
> Bug #7079
> * config/aarch64/aarch64.md (*movsi_aarch64): Use Ust instead of S constrain.
> (*movdi_aarch64): Likewise.
> (ldr_got_small_sidi): Add type attribute.
> * config/aarch64/constraints.md (Ust): New constraint like S but only
> if the symbol is SYMBOL_TINY_ABSOLUTE.
> ---
>
> diff --git a/gcc/ChangeLog.aarch64 b/gcc/ChangeLog.aarch64
> index 3c8ff13..f4e1e91 100644
> --- a/gcc/ChangeLog.aarch64
> +++ b/gcc/ChangeLog.aarch64
> @@ -1,3 +1,12 @@
> +2013-08-11  Andrew Pinski  <apinski@cavium.com>
> +
> + Bug #7079
> + * config/aarch64/aarch64.md (*movsi_aarch64): Use Ust instead of S constrain.
> + (*movdi_aarch64): Likewise.
> + (ldr_got_small_sidi): Add type attribute.
> + * config/aarch64/constraints.md (Ust): New constraint like S but only
> + if the symbol is SYMBOL_TINY_ABSOLUTE.
> +
>  2013-08-14  Yufeng Zhang  <yufeng.zhang@arm.com>
>
>   * function.c (assign_parm_find_data_types): Set passed_mode and
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 485bd59..0cd6a41 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -810,7 +810,7 @@
>
>  (define_insn "*movsi_aarch64"
>    [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,*w,m,
> m,r,r  ,*w, r,*w")
> - (match_operand:SI 1 "aarch64_mov_operand"  " r,r,k,M,m,
> m,rZ,*w,S,Ush,rZ,*w,*w"))]
> + (match_operand:SI 1 "aarch64_mov_operand"  " r,r,k,M,m,
> m,rZ,*w,Ust,Ush,rZ,*w,*w"))]
>    "(register_operand (operands[0], SImode)
>      || aarch64_reg_or_zero (operands[1], SImode))"
>    "@
> @@ -836,7 +836,7 @@
>
>  (define_insn "*movdi_aarch64"
>    [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,*w,m,
> m,r,r,  *w, r,*w,w")
> - (match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,m,
> m,rZ,*w,S,Ush,rZ,*w,*w,Dd"))]
> + (match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,m,
> m,rZ,*w,Ust,Ush,rZ,*w,*w,Dd"))]
>    "(register_operand (operands[0], DImode)
>      || aarch64_reg_or_zero (operands[1], DImode))"
>    "@
> @@ -3978,6 +3978,7 @@
>    "TARGET_ILP32"
>    "ldr\\t%w0, [%1, #:got_lo12:%a2]"
>    [(set_attr "v8type" "load1")
> +   (set_attr "type" "load1")
>     (set_attr "mode" "DI")]
>  )
>
> diff --git a/gcc/config/aarch64/constraints.md
> b/gcc/config/aarch64/constraints.md
> index 2570400..a36bdaf 100644
> --- a/gcc/config/aarch64/constraints.md
> +++ b/gcc/config/aarch64/constraints.md
> @@ -114,6 +114,12 @@
>    (and (match_code "const_int")
>         (match_test "(unsigned) exact_log2 (ival) <= 4")))
>
> +(define_constraint "Ust"
> +  "A constraint that matches an absolute symbolic address; only for
> tiny model."
> +  (and (match_code "const,symbol_ref,label_ref")
> +       (match_test "aarch64_symbolic_address_p (op)
> +    && aarch64_classify_symbolic_expression (op, SYMBOL_CONTEXT_ADR)
> == SYMBOL_TINY_ABSOLUTE")))
> +
>  (define_memory_constraint "Q"
>   "A memory address which uses a single base register with no offset."
>   (and (match_code "mem")
>
>
>>
>> I haven't had a chance to step through this in a debugger to see what is
>> going on yet.
>>
>> Jim
>>
diff mbox

Patch

Index: gcc/lra-constraints.c
===================================================================
--- gcc/lra-constraints.c	2017-04-18 19:52:35.062175087 +0100
+++ gcc/lra-constraints.c	2017-05-05 08:19:18.243479648 +0100
@@ -5394,6 +5394,29 @@  choose_split_class (enum reg_class alloc
 #endif
 }
 
+/* Copy any equivalence information from ORIGINAL_REGNO to NEW_REGNO.
+   It only makes sense to call this function if NEW_REGNO is always
+   equal to ORIGINAL_REGNO.  */
+
+static void
+lra_copy_reg_equiv (unsigned int new_regno, unsigned int original_regno)
+{
+  if (!ira_reg_equiv[original_regno].defined_p)
+    return;
+
+  ira_expand_reg_equiv ();
+  ira_reg_equiv[new_regno].defined_p = true;
+  if (ira_reg_equiv[original_regno].memory)
+    ira_reg_equiv[new_regno].memory
+      = copy_rtx (ira_reg_equiv[original_regno].memory);
+  if (ira_reg_equiv[original_regno].constant)
+    ira_reg_equiv[new_regno].constant
+      = copy_rtx (ira_reg_equiv[original_regno].constant);
+  if (ira_reg_equiv[original_regno].invariant)
+    ira_reg_equiv[new_regno].invariant
+      = copy_rtx (ira_reg_equiv[original_regno].invariant);
+}
+
 /* Do split transformations for insn INSN, which defines or uses
    ORIGINAL_REGNO.  NEXT_USAGE_INSNS specifies which instruction in
    the EBB next uses ORIGINAL_REGNO; it has the same form as the
@@ -5515,6 +5538,7 @@  split_reg (bool before_p, int original_r
       new_reg = lra_create_new_reg (mode, original_reg, rclass, "split");
       reg_renumber[REGNO (new_reg)] = hard_regno;
     }
+  int new_regno = REGNO (new_reg);
   save = emit_spill_move (true, new_reg, original_reg);
   if (NEXT_INSN (save) != NULL_RTX && !call_save_p)
     {
@@ -5523,7 +5547,7 @@  split_reg (bool before_p, int original_r
 	  fprintf
 	    (lra_dump_file,
 	     "	  Rejecting split %d->%d resulting in > 2 save insns:\n",
-	     original_regno, REGNO (new_reg));
+	     original_regno, new_regno);
 	  dump_rtl_slim (lra_dump_file, save, NULL, -1, 0);
 	  fprintf (lra_dump_file,
 		   "	))))))))))))))))))))))))))))))))))))))))))))))))\n");
@@ -5538,18 +5562,24 @@  split_reg (bool before_p, int original_r
 	  fprintf (lra_dump_file,
 		   "	Rejecting split %d->%d "
 		   "resulting in > 2 restore insns:\n",
-		   original_regno, REGNO (new_reg));
+		   original_regno, new_regno);
 	  dump_rtl_slim (lra_dump_file, restore, NULL, -1, 0);
 	  fprintf (lra_dump_file,
 		   "	))))))))))))))))))))))))))))))))))))))))))))))))\n");
 	}
       return false;
     }
+  /* Transfer equivalence information to the spill register, so that
+     if we fail to allocate the spill register, we have the option of
+     rematerializing the original value instead of spilling to the stack.  */
+  if (!HARD_REGISTER_NUM_P (original_regno)
+      && mode == PSEUDO_REGNO_MODE (original_regno))
+    lra_copy_reg_equiv (new_regno, original_regno);
   after_p = usage_insns[original_regno].after_p;
-  lra_reg_info[REGNO (new_reg)].restore_rtx = regno_reg_rtx[original_regno];
-  bitmap_set_bit (&check_only_regs, REGNO (new_reg));
+  lra_reg_info[new_regno].restore_rtx = regno_reg_rtx[original_regno];
+  bitmap_set_bit (&check_only_regs, new_regno);
   bitmap_set_bit (&check_only_regs, original_regno);
-  bitmap_set_bit (&lra_split_regs, REGNO (new_reg));
+  bitmap_set_bit (&lra_split_regs, new_regno);
   for (;;)
     {
       if (GET_CODE (next_usage_insns) != INSN_LIST)
@@ -5565,7 +5595,7 @@  split_reg (bool before_p, int original_r
       if (lra_dump_file != NULL)
 	{
 	  fprintf (lra_dump_file, "    Split reuse change %d->%d:\n",
-		   original_regno, REGNO (new_reg));
+		   original_regno, new_regno);
 	  dump_insn_slim (lra_dump_file, as_a <rtx_insn *> (usage_insn));
 	}
     }
Index: gcc/testsuite/gcc.target/aarch64/spill_1.c
===================================================================
--- /dev/null	2017-05-05 07:30:08.141451281 +0100
+++ gcc/testsuite/gcc.target/aarch64/spill_1.c	2017-05-05 08:19:18.244456716 +0100
@@ -0,0 +1,18 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef int v4si __attribute__ ((vector_size (16)));
+
+void bar (void);
+void
+foo (void)
+{
+  v4si x = { 1, 1, 1, 1 };
+  asm ("# %0" :: "w" (x));
+  bar ();
+  asm ("# %0" :: "w" (x));
+}
+
+/* { dg-final { scan-assembler-times {\tmovi\tv[0-9]+\.4s,} 2 } } */
+/* { dg-final { scan-assembler-not {\tldr\t} } } */
+/* { dg-final { scan-assembler-not {\tstr\t} } } */