diff mbox

[AArch64] Improve stack adjustment

Message ID AM5PR0802MB2610F570B8BD4EAD6E5514D583D00@AM5PR0802MB2610.eurprd08.prod.outlook.com
State New
Headers show

Commit Message

Wilco Dijkstra Oct. 17, 2016, 12:38 p.m. UTC
ping


From: Wilco Dijkstra
Sent: 10 August 2016 17:20
To: Richard Earnshaw; GCC Patches
Cc: nd
Subject: Re: [PATCH][AArch64] Improve stack adjustment
    
Richard Earnshaw wrote:
> I see you've added a default argument for your new parameter.  I think
> doing that is fine, but I have two comments about how we might use that
> in this case.
> 
> Firstly, if this parameter is suitable for having a default value, then
> I think the preceding one should also be treated in the same way.
> 
> Secondly, I think (due to these parameters being BOOL with no useful
> context to make it clear which is which) that having wrapper functions
> (inlined, of course) that describe the intended behaviour more clearly
> would be useful.  So, defining
> 
> static inline void
> aarch64_add_frame_constant (mode, regnum, scratchreg, delta)
> {
>    aarch64_add_frame_constant (mode, regnum, scratchreg, delta, true);
> }
> 
> would make it clearer at the call point than having a lot of true and
> false parameters scattered round the code.
> 
> Alternatively we might remove all the default parameters and require
> wrappers like the above to make it more explicit which API is intended -
> this might make more sense if not all combinations make sense.

OK here is the new version using simple wrapper functions and no
default parameters:

ChangeLog:
2016-08-10  Wilco Dijkstra  <wdijkstr@arm.com>

gcc/
        * config/aarch64/aarch64.c (aarch64_add_constant_internal):
        Add extra argument to allow emitting the move immediate.
        Use add/sub with positive immediate.
        (aarch64_add_constant): Add inline function.
        (aarch64_add_sp): Likewise.
        (aarch64_sub_sp): Likewise.
        (aarch64_expand_prologue): Call aarch64_sub_sp.
        (aarch64_expand_epilogue): Call aarch64_add_sp.
        Decide when to leave out move.
        (aarch64_output_mi_thunk): Call aarch64_add_constant.

testsuite/
        * gcc.target/aarch64/test_frame_17.c: New test.
--

Comments

James Greenhalgh Oct. 18, 2016, 5:05 p.m. UTC | #1
On Mon, Oct 17, 2016 at 12:38:36PM +0000, Wilco Dijkstra wrote:
> 
> ping
> 
> 
> From: Wilco Dijkstra
> Sent: 10 August 2016 17:20
> To: Richard Earnshaw; GCC Patches
> Cc: nd
> Subject: Re: [PATCH][AArch64] Improve stack adjustment
>     
> Richard Earnshaw wrote:
> > I see you've added a default argument for your new parameter.  I think
> > doing that is fine, but I have two comments about how we might use that
> > in this case.
> > 
> > Firstly, if this parameter is suitable for having a default value, then
> > I think the preceding one should also be treated in the same way.
> > 
> > Secondly, I think (due to these parameters being BOOL with no useful
> > context to make it clear which is which) that having wrapper functions
> > (inlined, of course) that describe the intended behaviour more clearly
> > would be useful.  So, defining
> > 
> > static inline void
> > aarch64_add_frame_constant (mode, regnum, scratchreg, delta)
> > {
> >    aarch64_add_frame_constant (mode, regnum, scratchreg, delta, true);
> > }
> > 
> > would make it clearer at the call point than having a lot of true and
> > false parameters scattered round the code.
> > 
> > Alternatively we might remove all the default parameters and require
> > wrappers like the above to make it more explicit which API is intended -
> > this might make more sense if not all combinations make sense.
> 
> OK here is the new version using simple wrapper functions and no
> default parameters:

I've got a simple suggestion for one of your comments, and I'd like
clarification on something I'm not understanding...

> gcc/
>         * config/aarch64/aarch64.c (aarch64_add_constant_internal):
>         Add extra argument to allow emitting the move immediate.
>         Use add/sub with positive immediate.
>         (aarch64_add_constant): Add inline function.
>         (aarch64_add_sp): Likewise.
>         (aarch64_sub_sp): Likewise.
>         (aarch64_expand_prologue): Call aarch64_sub_sp.
>         (aarch64_expand_epilogue): Call aarch64_add_sp.
>         Decide when to leave out move.
>         (aarch64_output_mi_thunk): Call aarch64_add_constant.
> @@ -1967,11 +1973,11 @@ aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
>        return;
>      }
>  
> -  /* We need two add/sub instructions, each one performing part of the
> -     calculation.  Don't do this if the addend can be loaded into register with
> -     a single instruction, in that case we prefer a move to a scratch register
> -     following by an addition.  */
> -  if (mdelta < 0x1000000 && !aarch64_move_imm (delta, mode))
> +  /* We need two add/sub instructions, each one perform part of the
> +     addition/subtraction, but don't this if the addend can be loaded into
> +     register by single instruction, in that case we prefer a move to scratch
> +     register following by addition.  */

This sentence is missing some words.

> +  if (mdelta < 0x1000000 && !aarch64_move_imm (mdelta, mode))

Could you explain this change? The comment makes it seem like delta would
still be correct. Probably the comment needs to say "followed by
addition/subtraction" rather than what is currently written?

>      {
>        HOST_WIDE_INT low_off = mdelta & 0xfff;
>  
> @@ -1985,8 +1991,10 @@ aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
>  
>    /* Otherwise use generic function to handle all other situations.  */
>    rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
> -  aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode);
> -  insn = emit_insn (gen_add2_insn (this_rtx, scratch_rtx));
> +  if (emit_move_imm)
> +    aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (mdelta), true, mode);
> +  insn = emit_insn (delta < 0 ? gen_sub2_insn (this_rtx, scratch_rtx)
> +                             : gen_add2_insn (this_rtx, scratch_rtx));

What is contained in scratch_rtx here if we didn't set it up with
aarch64_internal_move_immediate? Are you not adding junk values in the
!emit_move_imm case?

Thanks,
James

>    if (frame_related_p)
>      {
>        RTX_FRAME_RELATED_P (insn) = frame_related_p;
diff mbox

Patch

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b8536175a84b76f8c2939e61f1379ae279b20d43..5a25fce17785af9f9dc12e0f2a9609af09af0b35 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1941,15 +1941,21 @@  aarch64_expand_mov_immediate (rtx dest, rtx imm)
 }
 
 /* Add DELTA to REGNUM in mode MODE.  SCRATCHREG can be used to held
-   intermediate value if necessary.
+   intermediate value if necessary.  FRAME_RELATED_P should be true if
+   the RTX_FRAME_RELATED flag should be set and CFA adjustments added
+   to the generated instructions.  If SCRATCHREG is known to hold
+   abs (delta), EMIT_MOVE_IMM can be set to false to avoid emitting the
+   immediate again.
 
-   This function is sometimes used to adjust the stack pointer, so we must
-   ensure that it can never cause transient stack deallocation by writing an
-   invalid value into REGNUM.  */
+   Since this function may be used to adjust the stack pointer, we must
+   ensure that it cannot cause transient stack deallocation (for example
+   by first incrementing SP and then decrementing when adjusting by a
+   large immediate).  */
 
 static void
-aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
-                     HOST_WIDE_INT delta, bool frame_related_p)
+aarch64_add_constant_internal (machine_mode mode, int regnum, int scratchreg,
+                              HOST_WIDE_INT delta, bool frame_related_p,
+                              bool emit_move_imm)
 {
   HOST_WIDE_INT mdelta = abs_hwi (delta);
   rtx this_rtx = gen_rtx_REG (mode, regnum);
@@ -1967,11 +1973,11 @@  aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
       return;
     }
 
-  /* We need two add/sub instructions, each one performing part of the
-     calculation.  Don't do this if the addend can be loaded into register with
-     a single instruction, in that case we prefer a move to a scratch register
-     following by an addition.  */
-  if (mdelta < 0x1000000 && !aarch64_move_imm (delta, mode))
+  /* We need two add/sub instructions, each one perform part of the
+     addition/subtraction, but don't this if the addend can be loaded into
+     register by single instruction, in that case we prefer a move to scratch
+     register following by addition.  */
+  if (mdelta < 0x1000000 && !aarch64_move_imm (mdelta, mode))
     {
       HOST_WIDE_INT low_off = mdelta & 0xfff;
 
@@ -1985,8 +1991,10 @@  aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
 
   /* Otherwise use generic function to handle all other situations.  */
   rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
-  aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode);
-  insn = emit_insn (gen_add2_insn (this_rtx, scratch_rtx));
+  if (emit_move_imm)
+    aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (mdelta), true, mode);
+  insn = emit_insn (delta < 0 ? gen_sub2_insn (this_rtx, scratch_rtx)
+                             : gen_add2_insn (this_rtx, scratch_rtx));
   if (frame_related_p)
     {
       RTX_FRAME_RELATED_P (insn) = frame_related_p;
@@ -1995,6 +2003,27 @@  aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
     }
 }
 
+static inline void
+aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
+                     HOST_WIDE_INT delta)
+{
+  aarch64_add_constant_internal (mode, regnum, scratchreg, delta, false, true);
+}
+
+static inline void
+aarch64_add_sp (int scratchreg, HOST_WIDE_INT delta, bool emit_move_imm)
+{
+  aarch64_add_constant_internal (Pmode, SP_REGNUM, scratchreg, delta,
+                                true, emit_move_imm);
+}
+
+static inline void
+aarch64_sub_sp (int scratchreg, HOST_WIDE_INT delta, bool frame_related_p)
+{
+  aarch64_add_constant_internal (Pmode, SP_REGNUM, scratchreg, -delta,
+                                 frame_related_p, true);
+}
+
 static bool
 aarch64_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED,
                                  tree exp ATTRIBUTE_UNUSED)
@@ -3201,7 +3230,7 @@  aarch64_expand_prologue (void)
         aarch64_emit_probe_stack_range (STACK_CHECK_PROTECT, frame_size);
     }
 
-  aarch64_add_constant (Pmode, SP_REGNUM, IP0_REGNUM, -initial_adjust, true);
+  aarch64_sub_sp (IP0_REGNUM, initial_adjust, true);
 
   if (callee_adjust != 0)
     aarch64_push_regs (reg1, reg2, callee_adjust);
@@ -3222,8 +3251,7 @@  aarch64_expand_prologue (void)
                              callee_adjust != 0 || frame_pointer_needed);
   aarch64_save_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM,
                              callee_adjust != 0 || frame_pointer_needed);
-  aarch64_add_constant (Pmode, SP_REGNUM, IP1_REGNUM, -final_adjust,
-                       !frame_pointer_needed);
+  aarch64_sub_sp (IP1_REGNUM, final_adjust, !frame_pointer_needed);
 }
 
 /* Return TRUE if we can use a simple_return insn.
@@ -3288,7 +3316,7 @@  aarch64_expand_epilogue (bool for_sibcall)
       RTX_FRAME_RELATED_P (insn) = callee_adjust == 0;
     }
   else
-    aarch64_add_constant (Pmode, SP_REGNUM, IP1_REGNUM, final_adjust, true);
+    aarch64_add_sp (IP1_REGNUM, final_adjust, df_regs_ever_live_p (IP1_REGNUM));
 
   aarch64_restore_callee_saves (DImode, callee_offset, R0_REGNUM, R30_REGNUM,
                                 callee_adjust != 0, &cfi_ops);
@@ -3311,7 +3339,7 @@  aarch64_expand_epilogue (bool for_sibcall)
       cfi_ops = NULL;
     }
 
-  aarch64_add_constant (Pmode, SP_REGNUM, IP0_REGNUM, initial_adjust, true);
+  aarch64_add_sp (IP0_REGNUM, initial_adjust, df_regs_ever_live_p (IP0_REGNUM));
 
   if (cfi_ops)
     {
@@ -3406,7 +3434,7 @@  aarch64_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
   emit_note (NOTE_INSN_PROLOGUE_END);
 
   if (vcall_offset == 0)
-    aarch64_add_constant (Pmode, this_regno, IP1_REGNUM, delta, false);
+    aarch64_add_constant (Pmode, this_regno, IP1_REGNUM, delta);
   else
     {
       gcc_assert ((vcall_offset & (POINTER_BYTES - 1)) == 0);
@@ -3422,7 +3450,7 @@  aarch64_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
             addr = gen_rtx_PRE_MODIFY (Pmode, this_rtx,
                                        plus_constant (Pmode, this_rtx, delta));
           else
-           aarch64_add_constant (Pmode, this_regno, IP1_REGNUM, delta, false);
+           aarch64_add_constant (Pmode, this_regno, IP1_REGNUM, delta);
         }
 
       if (Pmode == ptr_mode)
diff --git a/gcc/testsuite/gcc.target/aarch64/test_frame_17.c b/gcc/testsuite/gcc.target/aarch64/test_frame_17.c
new file mode 100644
index 0000000000000000000000000000000000000000..c214431999b60cce8a75204876a8c73ec6304128
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/test_frame_17.c
@@ -0,0 +1,21 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 --save-temps" } */
+
+/* Test reuse of stack adjustment temporaries.  */
+
+void foo ();
+
+int reuse_mov (int i)
+{
+  int arr[1025];
+  return arr[i];
+}
+
+int no_reuse_mov (int i)
+{
+  int arr[1025];
+  foo ();
+  return arr[i];
+}
+
+/* { dg-final { scan-assembler-times "mov\tx16, \[0-9\]+" 3 } } */