diff mbox

[ARM] Fix register r3 wrongly used to save ip in nested APCS frame

Message ID 17166670.OmqLAYtanj@polaris
State New
Headers show

Commit Message

Eric Botcazou Dec. 2, 2013, 10:11 a.m. UTC
> My apologies for taking so long to look at this.

No problem, all the more so that...

> > 2013-09-05  Eric Botcazou  <ebotcazou@adacore.com>
> > 
> > 	* config/arm/arm.c (arm_expand_prologue): In a nested APCS frame with
> > 	arguments to push onto the stack and no varargs, save ip into a stack
> > 	slot if r3 isn't available on entry.
> 
> Sorry, but this is not quite right either, as shown by the attached
> testcase (in C this time, so we can commit it to gcc.target/arm :-)
> 
> The problem is that if we have some alignment padding we end up storing
> ip in one location but restoring it from another.
> 
> 	str	ip, [sp, #4]
> 	add	ip, sp, #8
> 	stmfd	sp!, {fp, ip, lr, pc}
> 	sub	fp, ip, #12
> 	ldr	ip, [fp, #4]	// <==== Should be fp + 8
> 	@ ip needed
> 	str	r3, [fp, #8]
> 	ldr	ip, [ip]

...ugh, right, the computation of the slot address was supposed to be clever 
but is totally broken instead. :-(  Thanks for spotting it, I don't understand 
how we haven't seen this on VxWorks (although your testcase does not fail on 
VxWorks, since args_to_push is 4 instead of 8 as on arm-eabi).

The computation of the slot address is:

+               {
+                 rtx x = plus_constant (Pmode, stack_pointer_rtx,
+                                        args_to_push - 4);
+                 insn = emit_insn (gen_addsi3 (stack_pointer_rtx,
+                                               stack_pointer_rtx,
+                                               GEN_INT (- args_to_push)));
+                 emit_set_insn (gen_frame_mem (SImode, x), ip_rtx);
+               }

meaning that IP will be saved into the first slot on the stack.  But the 
restore code is:

	      /* Recover the static chain register.  */
	      if (!arm_r3_live_at_start_p () || saved_pretend_args)
		insn = gen_rtx_REG (SImode, 3);
	      else
		{
		  insn = plus_constant (Pmode, hard_frame_pointer_rtx, 4);
		  insn = gen_frame_mem (SImode, insn);
		}
	      emit_set_insn (ip_rtx, insn);

meaning that IP is restored from the last slot on the stack, so this can work 
only if args_to_push == 4.

Revised patch attached, your testcase passes on arm-eabi with it.  Does it 
look OK to you?  If so, I'll run a testing cycle on arm-vxworks and arm-eabi.


	* config/arm/arm.c (arm_expand_prologue): In a nested APCS frame with
	arguments to push onto the stack and no varargs, save ip into the last
	stack slot if r3 isn't available on entry.

Comments

Eric Botcazou Dec. 11, 2013, 12:13 p.m. UTC | #1
> Revised patch attached, your testcase passes on arm-eabi with it.  Does it
> look OK to you?  If so, I'll run a testing cycle on arm-vxworks and
> arm-eabi.
> 
> 
> 	* config/arm/arm.c (arm_expand_prologue): In a nested APCS frame with
> 	arguments to push onto the stack and no varargs, save ip into the last
> 	stack slot if r3 isn't available on entry.

No problems detected for the patch on arm-vxworks or arm-eabi.
Eric Botcazou Dec. 18, 2013, 10:35 a.m. UTC | #2
> Revised patch attached, your testcase passes on arm-eabi with it.  Does it
> look OK to you?  If so, I'll run a testing cycle on arm-vxworks and
> arm-eabi.
> 
> 
> 	* config/arm/arm.c (arm_expand_prologue): In a nested APCS frame with
> 	arguments to push onto the stack and no varargs, save ip into the last
> 	stack slot if r3 isn't available on entry.

It would be nice if we could settle this before Christmas. ;-)
Richard Earnshaw Dec. 19, 2013, 3:38 p.m. UTC | #3
On 02/12/13 10:11, Eric Botcazou wrote:
>> My apologies for taking so long to look at this.
> 
> No problem, all the more so that...
> 
>>> 2013-09-05  Eric Botcazou  <ebotcazou@adacore.com>
>>>
>>> 	* config/arm/arm.c (arm_expand_prologue): In a nested APCS frame with
>>> 	arguments to push onto the stack and no varargs, save ip into a stack
>>> 	slot if r3 isn't available on entry.
>>
>> Sorry, but this is not quite right either, as shown by the attached
>> testcase (in C this time, so we can commit it to gcc.target/arm :-)
>>
>> The problem is that if we have some alignment padding we end up storing
>> ip in one location but restoring it from another.
>>
>> 	str	ip, [sp, #4]
>> 	add	ip, sp, #8
>> 	stmfd	sp!, {fp, ip, lr, pc}
>> 	sub	fp, ip, #12
>> 	ldr	ip, [fp, #4]	// <==== Should be fp + 8
>> 	@ ip needed
>> 	str	r3, [fp, #8]
>> 	ldr	ip, [ip]
> 
> ...ugh, right, the computation of the slot address was supposed to be clever 
> but is totally broken instead. :-(  Thanks for spotting it, I don't understand 
> how we haven't seen this on VxWorks (although your testcase does not fail on 
> VxWorks, since args_to_push is 4 instead of 8 as on arm-eabi).
> 
> The computation of the slot address is:
> 
> +               {
> +                 rtx x = plus_constant (Pmode, stack_pointer_rtx,
> +                                        args_to_push - 4);
> +                 insn = emit_insn (gen_addsi3 (stack_pointer_rtx,
> +                                               stack_pointer_rtx,
> +                                               GEN_INT (- args_to_push)));
> +                 emit_set_insn (gen_frame_mem (SImode, x), ip_rtx);
> +               }
> 
> meaning that IP will be saved into the first slot on the stack.  But the 
> restore code is:
> 
> 	      /* Recover the static chain register.  */
> 	      if (!arm_r3_live_at_start_p () || saved_pretend_args)
> 		insn = gen_rtx_REG (SImode, 3);
> 	      else
> 		{
> 		  insn = plus_constant (Pmode, hard_frame_pointer_rtx, 4);
> 		  insn = gen_frame_mem (SImode, insn);
> 		}
> 	      emit_set_insn (ip_rtx, insn);
> 
> meaning that IP is restored from the last slot on the stack, so this can work 
> only if args_to_push == 4.
> 
> Revised patch attached, your testcase passes on arm-eabi with it.  Does it 
> look OK to you?  If so, I'll run a testing cycle on arm-vxworks and arm-eabi.
> 
> 
> 	* config/arm/arm.c (arm_expand_prologue): In a nested APCS frame with
> 	arguments to push onto the stack and no varargs, save ip into the last
> 	stack slot if r3 isn't available on entry.
> 
> 

This generates:

	sub	sp, sp, #n
	str	ip, [sp]

which can be done in one instruction, vis:

	str	ip, [sp, #-n]!

Very similar code to do this essentially already exists a few lines
earlier (in the args_to_push == 0 case), but needs tweaking to use
pre-modify rather than pre-dec when n != 4.

I'll pre-approve a patch that makes that change.

R.
diff mbox

Patch

Index: config/arm/arm.c
===================================================================
--- config/arm/arm.c	(revision 205579)
+++ config/arm/arm.c	(working copy)
@@ -18549,8 +18549,7 @@  arm_r3_live_at_start_p (void)
   /* Just look at cfg info, which is still close enough to correct at this
      point.  This gives false positives for broken functions that might use
      uninitialized data that happens to be allocated in r3, but who cares?  */
-  return REGNO_REG_SET_P (df_get_live_out (ENTRY_BLOCK_PTR_FOR_FN (cfun)),
-			  3);
+  return REGNO_REG_SET_P (df_get_live_out (ENTRY_BLOCK_PTR_FOR_FN (cfun)), 3);
 }
 
 /* Compute the number of bytes used to store the static chain register on the
@@ -20576,11 +20575,13 @@  arm_expand_prologue (void)
 	     whilst the frame is being created.  We try the following
 	     places in order:
 
-	       1. The last argument register r3.
-	       2. A slot on the stack above the frame.  (This only
-	          works if the function is not a varargs function).
+	       1. The last argument register r3 if it is available.
+	       2. A slot on the stack above the frame if there are no
+		  arguments to push onto the stack.
 	       3. Register r3 again, after pushing the argument registers
-	          onto the stack.
+	          onto the stack, if this is a varargs function.
+	       4. The last slot on the stack created for the arguments to
+		  push, if this isn't a varargs function.
 
 	     Note - we only need to tell the dwarf2 backend about the SP
 	     adjustment in the second variant; the static chain register
@@ -20611,21 +20612,24 @@  arm_expand_prologue (void)
 	    {
 	      /* Store the args on the stack.  */
 	      if (cfun->machine->uses_anonymous_args)
-		insn = emit_multi_reg_push
-		  ((0xf0 >> (args_to_push / 4)) & 0xf);
+		{
+		  insn
+		    = emit_multi_reg_push ((0xf0 >> (args_to_push / 4)) & 0xf);
+		  emit_set_insn (gen_rtx_REG (SImode, 3), ip_rtx);
+		  saved_pretend_args = 1;
+		}
 	      else
-		insn = emit_insn
-		  (gen_addsi3 (stack_pointer_rtx, stack_pointer_rtx,
-			       GEN_INT (- args_to_push)));
+		{
+		  insn = emit_insn (gen_addsi3 (stack_pointer_rtx,
+						stack_pointer_rtx,
+						GEN_INT (- args_to_push)));
+		  emit_set_insn (gen_frame_mem (SImode, stack_pointer_rtx),
+				 ip_rtx);
+		}
 
 	      RTX_FRAME_RELATED_P (insn) = 1;
-
-	      saved_pretend_args = 1;
 	      fp_offset = args_to_push;
 	      args_to_push = 0;
-
-	      /* Now reuse r3 to preserve IP.  */
-	      emit_set_insn (gen_rtx_REG (SImode, 3), ip_rtx);
 	    }
 	}
 
@@ -20731,7 +20735,7 @@  arm_expand_prologue (void)
 	      /* Recover the static chain register.  */
 	      if (!arm_r3_live_at_start_p () || saved_pretend_args)
 		insn = gen_rtx_REG (SImode, 3);
-	      else /* if (crtl->args.pretend_args_size == 0) */
+	      else
 		{
 		  insn = plus_constant (Pmode, hard_frame_pointer_rtx, 4);
 		  insn = gen_frame_mem (SImode, insn);