diff mbox

[expr.c] PR 65358 Avoid clobbering partial argument during sibcall

Message ID 5565C761.1060808@foss.arm.com
State New
Headers show

Commit Message

Kyrill Tkachov May 27, 2015, 1:32 p.m. UTC
Hi Jeff,

On 12/05/15 23:04, Jeff Law wrote:
> On 05/11/2015 03:28 AM, Kyrill Tkachov wrote:
>>> The more I think about this, the more I think it's an ugly can of
>>> worms and maybe we should just disable sibcalls for partial
>>> arguments.  I doubt it's a big performance issue in general.
>>
>> We already have quite a bit of code in calls.c to detect cases with
>> partial argument overlap for the
>> explicit purpose of allowing sibcalls when partial arguments occur in
>> the general case. However, that
>> code only detects when a partial argument overlaps with other arguments
>> in a call. In this PR the
>> partial argument overlaps with itself. It would be a shame to disable
>> sibcalls for all partial arguments
>> when there is already infrastructure in place to handle them.
> I didn't even realize we had support for partial arguments in sibcalls.   Ah, Kazu added that in 2005, I totally missed it.  I probably would have suggested failing the sibcall for those cases back then too...
>
> Is there any way to re-use that infrastructure to deal with the case at hand?
>
>
>
>>
>>>
>>>
>>> In addition to the argument/stack direction stuff, I've been pondering
>>> the stack/frame/arg pointer issues.  Your approach assumes that the
>>> incoming and outgoing areas are always referenced off the same base
>>> register.  If they aren't, then the routine returns no overlap.
>>>
>>> But we'd need to consider the case where we have a reference to the
>>> arg or frame pointer which later gets rewritten into a stack pointer
>>> relative address.
>>>
>>> Is it too late at the point were you do the checks to reject the
>>> sibling call?  If not, then maybe the overlap routine should return a
>>> tri-state.  No overlap, overlap, don't know.  The last would be used
>>> when the two addresses use a different register.
>>
>> Ok, here is my attempt at that. The overlap functions returns -2 when it
>> cannot staticall compare the
>> two pointers (i.e. when the base registers are different) and the caller
>> then disables sibcalls.
>> The code in calls.c that calls this code will undo any emitted
>> instructions in the meantime if sibcall
>> optimisation fails. This required me to change the type of
>> emit_push_insn to bool and add an extra
>> parameter, so this patch touches a bit more code than the original version.
>>
>> Bootstrapped on x86_64 and tested on arm. The testcase in this PR still
>> performs a sibcall correctly on arm.
>>
>> What do you think of this?
>>
>> Thanks,
>> Kyrill
>>
>>
>> 2015-05-11  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>
>>      PR target/65358
>>      * expr.c (memory_load_overlap): New function.
>>      (emit_push_insn): When pushing partial args to the stack would
>>      clobber the register part load the overlapping part into a pseudo
>>      and put it into the hard reg after pushing.  Change return type
>>      to bool.  Add bool argument.
>>      * expr.h (emit_push_insn): Change return type to bool.
>>      Add bool argument.
>>      * calls.c (expand_call): Cancel sibcall optimisation when encountering
>>      partial argument on targets with ARGS_GROW_DOWNWARD and
>>      !STACK_GROWS_DOWNWARD.
>>      (emit_library_call_value_1): Update callsite of emit_push_insn.
>>      (store_one_arg): Likewise.
>>
>>
>> 2015-05-11  Honggyu Kim <hong.gyu.kim@lge.com>
>>
>>      PR target/65358
>>      * gcc.dg/pr65358.c: New test.
>>
>>>
>>> Jeff
>>>
>>>
>>
>>
>> expr.patch
>>
>>
>> commit 5b596f10846b6d3b143442a306801c8262d8b10a
>> Author: Kyrylo Tkachov<kyrylo.tkachov@arm.com>
>> Date:   Wed Mar 18 13:42:37 2015 +0000
>>
>>      [expr.c] PR 65358 Avoid clobbering partial argument during sibcall
>>
>> diff --git a/gcc/calls.c b/gcc/calls.c
>> index caa7d60..81ef2c9 100644
>> --- a/gcc/calls.c
>> +++ b/gcc/calls.c
>> @@ -3225,6 +3225,13 @@ expand_call (tree exp, rtx target, int ignore)
>>           {
>>             rtx_insn *before_arg = get_last_insn ();
>>
>> +         /* On targets with weird calling conventions (e.g. PA) it's
>> +        hard to ensure that all cases of argument overlap between
>> +        stack and registers work.  Play it safe and bail out. */
>> +#if defined (ARGS_GROW_DOWNWARD) && !defined (STACK_GROWS_DOWNWARD)
>> +          sibcall_failure = 1;
>> +          break;
>> +#endif
> So we're trying to get away from this kind of conditional compilation.
>
> Instead we want to write
>
> if (ARGS_GROW_DOWNWARD && !STACK_GROWS_DOWNWARD)
>
> ARGS_GROW_DOWNWARD is already a testable value.  But STACK_GROWS_DOWNWARD is not.  The way folks have been dealing with this is something like this after the #includes:
>
> /* Redefine STACK_GROWS_DOWNWARD in terms of 0 or 1.  */
> #ifdef STACK_GROWS_DOWNWARD
> # undef STACK_GROWS_DOWNWARD
> # define STACK_GROWS_DOWNWARD 1
> #else
> # define STACK_GROWS_DOWNWARD 0
> #endif
>
>
> With that in place you can change the test into the more desirable
> if (ARGS_GROW_DOWNWARD && !STACK_GROWS_DOWNWARD)
>
>
>
>> diff --git a/gcc/expr.c b/gcc/expr.c
>> index 25aa11f..712fa0b 100644
>> --- a/gcc/expr.c
>> +++ b/gcc/expr.c
>> @@ -4121,12 +4121,35 @@ emit_single_push_insn (machine_mode mode, rtx x, tree type)
>>   }
>>   #endif
>>
>> +/* If reading SIZE bytes from X will end up reading from
>> +   Y return the number of bytes that overlap.  Return -1
>> +   if there is no overlap or -2 if we can't determing
> s/determing/determine/
>
>> +   partial argument during a sibcall optimisation (as specified by
> s/optimisation/optimization/
>
>
> I guess I'm still not real comfortable that we've got the issues here resolved and if we're going to try and support this case, then re-using the existing infrastructure would be better.
>
> I'll approve with the changes noted above, but would ask that you look into trying to re-use the existing infrastructure as a follow-up.

Thanks, I'm attaching what I committed with r223753.
I've raised PR 66307 for the infrastructure work.

Thanks again for helping beat this into shape.

Kyrill

>
> Jeff
>
>
diff mbox

Patch

commit 462eda51471ef6cb0ffcf4d45c8aa464c3d750e0
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Wed Mar 18 13:42:37 2015 +0000

    [expr.c] PR 65358 Avoid clobbering partial argument during sibcall

diff --git a/gcc/calls.c b/gcc/calls.c
index afe61f4..2158eba 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -3234,6 +3234,15 @@  expand_call (tree exp, rtx target, int ignore)
 	    {
 	      rtx_insn *before_arg = get_last_insn ();
 
+	     /* On targets with weird calling conventions (e.g. PA) it's
+		hard to ensure that all cases of argument overlap between
+		stack and registers work.  Play it safe and bail out.  */
+	      if (ARGS_GROW_DOWNWARD && !STACK_GROWS_DOWNWARD)
+		{
+		  sibcall_failure = 1;
+		  break;
+		}
+
 	      if (store_one_arg (&args[i], argblock, flags,
 				 adjusted_args_size.var != 0,
 				 reg_parm_stack_space)
@@ -4276,7 +4285,7 @@  emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
 			  partial, reg, 0, argblock,
 			  GEN_INT (argvec[argnum].locate.offset.constant),
 			  reg_parm_stack_space,
-			  ARGS_SIZE_RTX (argvec[argnum].locate.alignment_pad));
+			  ARGS_SIZE_RTX (argvec[argnum].locate.alignment_pad), false);
 
 	  /* Now mark the segment we just used.  */
 	  if (ACCUMULATE_OUTGOING_ARGS)
@@ -4886,10 +4895,11 @@  store_one_arg (struct arg_data *arg, rtx argblock, int flags,
 
       /* This isn't already where we want it on the stack, so put it there.
 	 This can either be done with push or copy insns.  */
-      emit_push_insn (arg->value, arg->mode, TREE_TYPE (pval), NULL_RTX,
+      if (!emit_push_insn (arg->value, arg->mode, TREE_TYPE (pval), NULL_RTX,
 		      parm_align, partial, reg, used - size, argblock,
 		      ARGS_SIZE_RTX (arg->locate.offset), reg_parm_stack_space,
-		      ARGS_SIZE_RTX (arg->locate.alignment_pad));
+		      ARGS_SIZE_RTX (arg->locate.alignment_pad), true))
+	sibcall_failure = 1;
 
       /* Unless this is a partially-in-register argument, the argument is now
 	 in the stack.  */
@@ -4994,7 +5004,7 @@  store_one_arg (struct arg_data *arg, rtx argblock, int flags,
       emit_push_insn (arg->value, arg->mode, TREE_TYPE (pval), size_rtx,
 		      parm_align, partial, reg, excess, argblock,
 		      ARGS_SIZE_RTX (arg->locate.offset), reg_parm_stack_space,
-		      ARGS_SIZE_RTX (arg->locate.alignment_pad));
+		      ARGS_SIZE_RTX (arg->locate.alignment_pad), false);
 
       /* Unless this is a partially-in-register argument, the argument is now
 	 in the stack.
diff --git a/gcc/expr.c b/gcc/expr.c
index a613beb..1dd1cf3 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -4104,12 +4104,35 @@  emit_single_push_insn (machine_mode mode, rtx x, tree type)
 }
 #endif
 
+/* If reading SIZE bytes from X will end up reading from
+   Y return the number of bytes that overlap.  Return -1
+   if there is no overlap or -2 if we can't determine
+   (for example when X and Y have different base registers).  */
+
+static int
+memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
+{
+  rtx tmp = plus_constant (Pmode, x, size);
+  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
+
+  if (!CONST_INT_P (sub))
+    return -2;
+
+  HOST_WIDE_INT val = INTVAL (sub);
+
+  return IN_RANGE (val, 1, size) ? val : -1;
+}
+
 /* Generate code to push X onto the stack, assuming it has mode MODE and
    type TYPE.
    MODE is redundant except when X is a CONST_INT (since they don't
    carry mode info).
    SIZE is an rtx for the size of data to be copied (in bytes),
    needed only if X is BLKmode.
+   Return true if successful.  May return false if asked to push a
+   partial argument during a sibcall optimization (as specified by
+   SIBCALL_P) and the incoming and outgoing pointers cannot be shown
+   to not overlap.
 
    ALIGN (in bits) is maximum alignment we can assume.
 
@@ -4135,11 +4158,11 @@  emit_single_push_insn (machine_mode mode, rtx x, tree type)
    for arguments passed in registers.  If nonzero, it will be the number
    of bytes required.  */
 
-void
+bool
 emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 		unsigned int align, int partial, rtx reg, int extra,
 		rtx args_addr, rtx args_so_far, int reg_parm_stack_space,
-		rtx alignment_pad)
+		rtx alignment_pad, bool sibcall_p)
 {
   rtx xinner;
   enum direction stack_direction = STACK_GROWS_DOWNWARD ? downward : upward;
@@ -4157,6 +4180,10 @@  emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 
   xinner = x;
 
+  int nregs = partial / UNITS_PER_WORD;
+  rtx *tmp_regs = NULL;
+  int overlapping = 0;
+
   if (mode == BLKmode
       || (STRICT_ALIGNMENT && align < GET_MODE_ALIGNMENT (mode)))
     {
@@ -4287,6 +4314,43 @@  emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 	     PARM_BOUNDARY.  Assume the caller isn't lying.  */
 	  set_mem_align (target, align);
 
+	  /* If part should go in registers and pushing to that part would
+	     overwrite some of the values that need to go into regs, load the
+	     overlapping values into temporary pseudos to be moved into the hard
+	     regs at the end after the stack pushing has completed.
+	     We cannot load them directly into the hard regs here because
+	     they can be clobbered by the block move expansions.
+	     See PR 65358.  */
+
+	  if (partial > 0 && reg != 0 && mode == BLKmode
+	      && GET_CODE (reg) != PARALLEL)
+	    {
+	      overlapping = memory_load_overlap (XEXP (x, 0), temp, partial);
+	      if (overlapping > 0)
+	        {
+		  gcc_assert (overlapping % UNITS_PER_WORD == 0);
+		  overlapping /= UNITS_PER_WORD;
+
+		  tmp_regs = XALLOCAVEC (rtx, overlapping);
+
+		  for (int i = 0; i < overlapping; i++)
+		    tmp_regs[i] = gen_reg_rtx (word_mode);
+
+		  for (int i = 0; i < overlapping; i++)
+		    emit_move_insn (tmp_regs[i],
+				    operand_subword_force (target, i, mode));
+	        }
+	      else if (overlapping == -1)
+		overlapping = 0;
+	      /* Could not determine whether there is overlap.
+	         Fail the sibcall.  */
+	      else
+		{
+		  overlapping = 0;
+		  if (sibcall_p)
+		    return false;
+		}
+	    }
 	  emit_block_move (target, xinner, size, BLOCK_OP_CALL_PARM);
 	}
     }
@@ -4341,12 +4405,13 @@  emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 	 has a size a multiple of a word.  */
       for (i = size - 1; i >= not_stack; i--)
 	if (i >= not_stack + offset)
-	  emit_push_insn (operand_subword_force (x, i, mode),
+	  if (!emit_push_insn (operand_subword_force (x, i, mode),
 			  word_mode, NULL_TREE, NULL_RTX, align, 0, NULL_RTX,
 			  0, args_addr,
 			  GEN_INT (args_offset + ((i - not_stack + skip)
 						  * UNITS_PER_WORD)),
-			  reg_parm_stack_space, alignment_pad);
+			  reg_parm_stack_space, alignment_pad, sibcall_p))
+	    return false;
     }
   else
     {
@@ -4389,9 +4454,8 @@  emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 	}
     }
 
-  /* If part should go in registers, copy that part
-     into the appropriate registers.  Do this now, at the end,
-     since mem-to-mem copies above may do function calls.  */
+  /* Move the partial arguments into the registers and any overlapping
+     values that we moved into the pseudos in tmp_regs.  */
   if (partial > 0 && reg != 0)
     {
       /* Handle calls that pass values in multiple non-contiguous locations.
@@ -4399,9 +4463,15 @@  emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
       if (GET_CODE (reg) == PARALLEL)
 	emit_group_load (reg, x, type, -1);
       else
-	{
+        {
 	  gcc_assert (partial % UNITS_PER_WORD == 0);
-	  move_block_to_reg (REGNO (reg), x, partial / UNITS_PER_WORD, mode);
+	  move_block_to_reg (REGNO (reg), x, nregs - overlapping, mode);
+
+	  for (int i = 0; i < overlapping; i++)
+	    emit_move_insn (gen_rtx_REG (word_mode, REGNO (reg)
+						    + nregs - overlapping + i),
+			    tmp_regs[i]);
+
 	}
     }
 
@@ -4410,6 +4480,8 @@  emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 
   if (alignment_pad && args_addr == 0)
     anti_adjust_stack (alignment_pad);
+
+  return true;
 }
 
 /* Return X if X can be used as a subtarget in a sequence of arithmetic
diff --git a/gcc/expr.h b/gcc/expr.h
index e3afa8d..7b28ffd 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -219,8 +219,8 @@  extern rtx emit_move_resolve_push (machine_mode, rtx);
 extern rtx push_block (rtx, int, int);
 
 /* Generate code to push something onto the stack, given its mode and type.  */
-extern void emit_push_insn (rtx, machine_mode, tree, rtx, unsigned int,
-			    int, rtx, int, rtx, rtx, int, rtx);
+extern bool emit_push_insn (rtx, machine_mode, tree, rtx, unsigned int,
+			    int, rtx, int, rtx, rtx, int, rtx, bool);
 
 /* Expand an assignment that stores the value of FROM into TO.  */
 extern void expand_assignment (tree, tree, bool);
diff --git a/gcc/testsuite/gcc.dg/pr65358.c b/gcc/testsuite/gcc.dg/pr65358.c
new file mode 100644
index 0000000..ba89fd4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr65358.c
@@ -0,0 +1,33 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+struct pack
+{
+  int fine;
+  int victim;
+  int killer;
+};
+
+int __attribute__ ((__noinline__, __noclone__))
+bar (int a, int b, struct pack p)
+{
+  if (a != 20 || b != 30)
+    __builtin_abort ();
+  if (p.fine != 40 || p.victim != 50 || p.killer != 60)
+    __builtin_abort ();
+  return 0;
+}
+
+int __attribute__ ((__noinline__, __noclone__))
+foo (int arg1, int arg2, int arg3, struct pack p)
+{
+  return bar (arg2, arg3, p);
+}
+
+int main (void)
+{
+  struct pack p = { 40, 50, 60 };
+
+  (void) foo (10, 20, 30, p);
+  return 0;
+}