diff mbox

[expr.c] PR 65358 Avoid clobbering partial argument during sibcall

Message ID 553F58BB.4070508@foss.arm.com
State New
Headers show

Commit Message

Kyrill Tkachov April 28, 2015, 9:54 a.m. UTC
On 27/04/15 21:13, Jeff Law wrote:
> On 04/21/2015 11:33 AM, Kyrill Tkachov wrote:
>> On 21/04/15 15:09, Jeff Law wrote:
>>> On 04/21/2015 02:30 AM, Kyrill Tkachov wrote:
>>>>    From reading config/stormy16/stormy-abi it seems to me that we don't
>>>> pass arguments partially in stormy16, so this code would never be called
>>>> there. That leaves pa as the potential problematic target.
>>>> I don't suppose there's an easy way to test on pa? My checkout of
>>>> binutils
>>>> doesn't seem to include a sim target for it.
>>> No simulator, no machines in the testfarm, the box I had access to via
>>> parisc-linux.org seems dead and my ancient PA overheats well before a
>>> bootstrap could complete.  I often regret knowing about the backwards
>>> way many things were done on the PA because it makes me think about
>>> cases that only matter on dead architectures.
>> So what should be the action plan here? I can't add an assert on
>> positive result as a negative result is valid.
>>
>> We want to catch the case where this would cause trouble on
>> pa, or change the patch until we're confident that it's fine
>> for pa.
>>
>> That being said, reading the documentation of STACK_GROWS_UPWARD
>> and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case
>> where this would cause trouble on pa.
>>
>> Is the problem that in the function:
>>
>> +/* Add SIZE to X and check whether it's greater than Y.
>> +   If it is, return the constant amount by which it's greater or smaller.
>> +   If the two are not statically comparable (for example, X and Y contain
>> +   different registers) return -1.  This is used in expand_push_insn to
>> +   figure out if reading SIZE bytes from location X will end up reading
>> from
>> +   location Y.  */
>> +static int
>> +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
>> +{
>> +  rtx tmp = plus_constant (Pmode, x, size);
>> +  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
>> +
>> +  if (!CONST_INT_P (sub))
>> +    return -1;
>> +
>> +  return INTVAL (sub);
>> +}
>>
>> for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x,
>> so the function should something like the following?
> So I had to go back and compile some simple examples.
>
> References to outgoing arguments will be SP relative.  References to the
> incoming arguments will be ARGP relative.  And that brings me to the
> another issue.  Isn't X in this context the incoming argument slot and
> the destination an outgoing argument slot?
>
> If so, the approach of memory_load_overlap simply won't work on a target
> with calling conventions like the PA.  And you might really want to
> consider punting for these kind of calling conventions

Ok, thanks for the guidance.
How about this? This patch disables sibcall optimisation when
encountering a partial argument when ARGS_GROW_DOWNWARD && !STACK_GROWS_DOWNWARD.
Hopefully this shouldn't harm codegen on parisc if, as you say, it's rare to have
partial arguments anyway on PA due to the large number of argument regs.

I tested this on arm and bootstrapped on x86_64.
I am now going through the process of getting access to a Debian PA machine to
give it a test there (thanks Dave!)

Ok if testing comes clean?

Thanks,
Kyrill

2015-04-28  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR target/65358
     * calls.c (expand_call): Cancel sibcall optimisation when encountering
     partial argument on targets with ARGS_GROW_DOWNWARD and
     !STACK_GROWS_DOWNWARD.
     * expr.c (memory_load_overlap): New function.
     (emit_push_insn): When pushing partial args to the stack would
     clobber the register part load the overlapping part into a pseudo
     and put it into the hard reg after pushing.

2015-04-28  Honggyu Kim  <hong.gyu.kim@lge.com>

     PR target/65358
     * gcc.dg/pr65358.c: New test.

>
> If you hadn't already done the work, I'd suggest punting for any case
> where we have args partially in regs and partially in memory :-)
>
> More thoughts when I can get an hour or two to remind myself how all
> this stuff works on the PA.
>
> I will note that testing on the PA is unlikely to show anything simply
> because it uses 8 parameter passing registers.  So it's rare to pass
> anything in memory at all.  Even rarer to have something partially in
> memory and partially in registers.
>
>
>
> Jeff
>
>

Comments

Kyrylo Tkachov April 30, 2015, 12:06 p.m. UTC | #1
On 28/04/15 10:54, Kyrill Tkachov wrote:
> On 27/04/15 21:13, Jeff Law wrote:
>> On 04/21/2015 11:33 AM, Kyrill Tkachov wrote:
>>> On 21/04/15 15:09, Jeff Law wrote:
>>>> On 04/21/2015 02:30 AM, Kyrill Tkachov wrote:
>>>>>     From reading config/stormy16/stormy-abi it seems to me that we don't
>>>>> pass arguments partially in stormy16, so this code would never be called
>>>>> there. That leaves pa as the potential problematic target.
>>>>> I don't suppose there's an easy way to test on pa? My checkout of
>>>>> binutils
>>>>> doesn't seem to include a sim target for it.
>>>> No simulator, no machines in the testfarm, the box I had access to via
>>>> parisc-linux.org seems dead and my ancient PA overheats well before a
>>>> bootstrap could complete.  I often regret knowing about the backwards
>>>> way many things were done on the PA because it makes me think about
>>>> cases that only matter on dead architectures.
>>> So what should be the action plan here? I can't add an assert on
>>> positive result as a negative result is valid.
>>>
>>> We want to catch the case where this would cause trouble on
>>> pa, or change the patch until we're confident that it's fine
>>> for pa.
>>>
>>> That being said, reading the documentation of STACK_GROWS_UPWARD
>>> and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case
>>> where this would cause trouble on pa.
>>>
>>> Is the problem that in the function:
>>>
>>> +/* Add SIZE to X and check whether it's greater than Y.
>>> +   If it is, return the constant amount by which it's greater or smaller.
>>> +   If the two are not statically comparable (for example, X and Y contain
>>> +   different registers) return -1.  This is used in expand_push_insn to
>>> +   figure out if reading SIZE bytes from location X will end up reading
>>> from
>>> +   location Y.  */
>>> +static int
>>> +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
>>> +{
>>> +  rtx tmp = plus_constant (Pmode, x, size);
>>> +  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
>>> +
>>> +  if (!CONST_INT_P (sub))
>>> +    return -1;
>>> +
>>> +  return INTVAL (sub);
>>> +}
>>>
>>> for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x,
>>> so the function should something like the following?
>> So I had to go back and compile some simple examples.
>>
>> References to outgoing arguments will be SP relative.  References to the
>> incoming arguments will be ARGP relative.  And that brings me to the
>> another issue.  Isn't X in this context the incoming argument slot and
>> the destination an outgoing argument slot?
>>
>> If so, the approach of memory_load_overlap simply won't work on a target
>> with calling conventions like the PA.  And you might really want to
>> consider punting for these kind of calling conventions
> Ok, thanks for the guidance.
> How about this? This patch disables sibcall optimisation when
> encountering a partial argument when ARGS_GROW_DOWNWARD && !STACK_GROWS_DOWNWARD.
> Hopefully this shouldn't harm codegen on parisc if, as you say, it's rare to have
> partial arguments anyway on PA due to the large number of argument regs.
>
> I tested this on arm and bootstrapped on x86_64.
> I am now going through the process of getting access to a Debian PA machine to
> give it a test there (thanks Dave!)
>
> Ok if testing comes clean?
Hi Jeff,

So I got access to an hppa machine.
But as mentioned here https://gcc.gnu.org/ml/gcc/2015-04/msg00364.html
I don't think I can bootstrap and run a 64-bit pa testsuite.
I've verified that the compiler builds, but is there anything more I can
do in testing this patch?
The latest version is at https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01713.html
and disables sibcall optimisation on a target like pa when a partial argument is encountered.

Thanks,
Kyrill


>
> Thanks,
> Kyrill
>
> 2015-04-28  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>       PR target/65358
>       * calls.c (expand_call): Cancel sibcall optimisation when encountering
>       partial argument on targets with ARGS_GROW_DOWNWARD and
>       !STACK_GROWS_DOWNWARD.
>       * expr.c (memory_load_overlap): New function.
>       (emit_push_insn): When pushing partial args to the stack would
>       clobber the register part load the overlapping part into a pseudo
>       and put it into the hard reg after pushing.
>
> 2015-04-28  Honggyu Kim  <hong.gyu.kim@lge.com>
>
>       PR target/65358
>       * gcc.dg/pr65358.c: New test.
>
>> If you hadn't already done the work, I'd suggest punting for any case
>> where we have args partially in regs and partially in memory :-)
>>
>> More thoughts when I can get an hour or two to remind myself how all
>> this stuff works on the PA.
>>
>> I will note that testing on the PA is unlikely to show anything simply
>> because it uses 8 parameter passing registers.  So it's rare to pass
>> anything in memory at all.  Even rarer to have something partially in
>> memory and partially in registers.
>>
>>
>>
>> Jeff
>>
>>
Jeff Law May 1, 2015, 6:51 p.m. UTC | #2
On 04/28/2015 03:54 AM, Kyrill Tkachov wrote:
>
> On 27/04/15 21:13, Jeff Law wrote:
>> On 04/21/2015 11:33 AM, Kyrill Tkachov wrote:
>>> On 21/04/15 15:09, Jeff Law wrote:
>>>> On 04/21/2015 02:30 AM, Kyrill Tkachov wrote:
>>>>>    From reading config/stormy16/stormy-abi it seems to me that we
>>>>> don't
>>>>> pass arguments partially in stormy16, so this code would never be
>>>>> called
>>>>> there. That leaves pa as the potential problematic target.
>>>>> I don't suppose there's an easy way to test on pa? My checkout of
>>>>> binutils
>>>>> doesn't seem to include a sim target for it.
>>>> No simulator, no machines in the testfarm, the box I had access to via
>>>> parisc-linux.org seems dead and my ancient PA overheats well before a
>>>> bootstrap could complete.  I often regret knowing about the backwards
>>>> way many things were done on the PA because it makes me think about
>>>> cases that only matter on dead architectures.
>>> So what should be the action plan here? I can't add an assert on
>>> positive result as a negative result is valid.
>>>
>>> We want to catch the case where this would cause trouble on
>>> pa, or change the patch until we're confident that it's fine
>>> for pa.
>>>
>>> That being said, reading the documentation of STACK_GROWS_UPWARD
>>> and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case
>>> where this would cause trouble on pa.
>>>
>>> Is the problem that in the function:
>>>
>>> +/* Add SIZE to X and check whether it's greater than Y.
>>> +   If it is, return the constant amount by which it's greater or
>>> smaller.
>>> +   If the two are not statically comparable (for example, X and Y
>>> contain
>>> +   different registers) return -1.  This is used in expand_push_insn to
>>> +   figure out if reading SIZE bytes from location X will end up reading
>>> from
>>> +   location Y.  */
>>> +static int
>>> +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
>>> +{
>>> +  rtx tmp = plus_constant (Pmode, x, size);
>>> +  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
>>> +
>>> +  if (!CONST_INT_P (sub))
>>> +    return -1;
>>> +
>>> +  return INTVAL (sub);
>>> +}
>>>
>>> for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x,
>>> so the function should something like the following?
>> So I had to go back and compile some simple examples.
>>
>> References to outgoing arguments will be SP relative.  References to the
>> incoming arguments will be ARGP relative.  And that brings me to the
>> another issue.  Isn't X in this context the incoming argument slot and
>> the destination an outgoing argument slot?
>>
>> If so, the approach of memory_load_overlap simply won't work on a target
>> with calling conventions like the PA.  And you might really want to
>> consider punting for these kind of calling conventions
>
> Ok, thanks for the guidance.
> How about this? This patch disables sibcall optimisation when
> encountering a partial argument when ARGS_GROW_DOWNWARD &&
> !STACK_GROWS_DOWNWARD.
> Hopefully this shouldn't harm codegen on parisc if, as you say, it's
> rare to have
> partial arguments anyway on PA due to the large number of argument regs.
>
> I tested this on arm and bootstrapped on x86_64.
> I am now going through the process of getting access to a Debian PA
> machine to
> give it a test there (thanks Dave!)
>
> Ok if testing comes clean?
>
> Thanks,
> Kyrill
>
> 2015-04-28  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>      PR target/65358
>      * calls.c (expand_call): Cancel sibcall optimisation when encountering
>      partial argument on targets with ARGS_GROW_DOWNWARD and
>      !STACK_GROWS_DOWNWARD.
>      * expr.c (memory_load_overlap): New function.
>      (emit_push_insn): When pushing partial args to the stack would
>      clobber the register part load the overlapping part into a pseudo
>      and put it into the hard reg after pushing.
>
> 2015-04-28  Honggyu Kim  <hong.gyu.kim@lge.com>
>
>      PR target/65358
>      * gcc.dg/pr65358.c: New test.
The more I think about this, the more I think it's an ugly can of worms 
and maybe we should just disable sibcalls for partial arguments.  I 
doubt it's a big performance issue in general.

In addition to the argument/stack direction stuff, I've been pondering 
the stack/frame/arg pointer issues.  Your approach assumes that the 
incoming and outgoing areas are always referenced off the same base 
register.  If they aren't, then the routine returns no overlap.

But we'd need to consider the case where we have a reference to the arg 
or frame pointer which later gets rewritten into a stack pointer 
relative address.

Is it too late at the point were you do the checks to reject the sibling 
call?  If not, then maybe the overlap routine should return a tri-state. 
  No overlap, overlap, don't know.  The last would be used when the two 
addresses use a different register.

Jeff
diff mbox

Patch

commit 61beb521f63db2f7178e2c41e3747982ddcc8869
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Wed Mar 18 13:42:37 2015 +0000

    [expr.c] PR 65358 Avoid clobbering partial argument during sibcall

diff --git a/gcc/calls.c b/gcc/calls.c
index 3be7ca5..8159232 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -3229,6 +3229,13 @@  expand_call (tree exp, rtx target, int ignore)
 	    {
 	      rtx_insn *before_arg = get_last_insn ();
 
+	     /* On targets with weird calling conventions (e.g. PA) it's
+		hard to ensure that all cases of argument overlap between
+		stack and registers work.  Play it safe and bail out.  */
+#if defined (ARGS_GROW_DOWNWARD) && !defined (STACK_GROWS_DOWNWARD)
+	      sibcall_failure = 1;
+	      break;
+#endif
 	      if (store_one_arg (&args[i], argblock, flags,
 				 adjusted_args_size.var != 0,
 				 reg_parm_stack_space)
diff --git a/gcc/expr.c b/gcc/expr.c
index 530a944..071a335 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -4121,6 +4121,24 @@  emit_single_push_insn (machine_mode mode, rtx x, tree type)
 }
 #endif
 
+/* If reading SIZE bytes from X will end up reading from
+   Y return the number of bytes that overlap.  Return -1
+   in all other cases.  */
+
+static int
+memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
+{
+  rtx tmp = plus_constant (Pmode, x, size);
+  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
+
+  if (!CONST_INT_P (sub))
+    return -1;
+
+  HOST_WIDE_INT val = INTVAL (sub);
+
+  return IN_RANGE (val, 1, size) ? val : -1;
+}
+
 /* Generate code to push X onto the stack, assuming it has mode MODE and
    type TYPE.
    MODE is redundant except when X is a CONST_INT (since they don't
@@ -4179,6 +4197,10 @@  emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 
   xinner = x;
 
+  int nregs = partial / UNITS_PER_WORD;
+  rtx *tmp_regs = NULL;
+  int overlapping = 0;
+
   if (mode == BLKmode
       || (STRICT_ALIGNMENT && align < GET_MODE_ALIGNMENT (mode)))
     {
@@ -4309,6 +4331,35 @@  emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 	     PARM_BOUNDARY.  Assume the caller isn't lying.  */
 	  set_mem_align (target, align);
 
+	  /* If part should go in registers and pushing to that part would
+	     overwrite some of the values that need to go into regs, load the
+	     overlapping values into temporary pseudos to be moved into the hard
+	     regs at the end after the stack pushing has completed.
+	     We cannot load them directly into the hard regs here because
+	     they can be clobbered by the block move expansions.
+	     See PR 65358.  */
+
+	  if (partial > 0 && reg != 0 && mode == BLKmode
+	      && GET_CODE (reg) != PARALLEL)
+	    {
+	      overlapping = memory_load_overlap (XEXP (x, 0), temp, partial);
+	      if (overlapping > 0)
+	        {
+		  gcc_assert (overlapping % UNITS_PER_WORD == 0);
+		  overlapping /= UNITS_PER_WORD;
+
+		  tmp_regs = XALLOCAVEC (rtx, overlapping);
+
+		  for (int i = 0; i < overlapping; i++)
+		    tmp_regs[i] = gen_reg_rtx (word_mode);
+
+		  for (int i = 0; i < overlapping; i++)
+		    emit_move_insn (tmp_regs[i],
+				    operand_subword_force (target, i, mode));
+	        }
+	      else
+		overlapping = 0;
+	    }
 	  emit_block_move (target, xinner, size, BLOCK_OP_CALL_PARM);
 	}
     }
@@ -4411,9 +4462,8 @@  emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 	}
     }
 
-  /* If part should go in registers, copy that part
-     into the appropriate registers.  Do this now, at the end,
-     since mem-to-mem copies above may do function calls.  */
+  /* Move the partial arguments into the registers and any overlapping
+     values that we moved into the pseudos in tmp_regs.  */
   if (partial > 0 && reg != 0)
     {
       /* Handle calls that pass values in multiple non-contiguous locations.
@@ -4421,9 +4471,15 @@  emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
       if (GET_CODE (reg) == PARALLEL)
 	emit_group_load (reg, x, type, -1);
       else
-	{
+        {
 	  gcc_assert (partial % UNITS_PER_WORD == 0);
-	  move_block_to_reg (REGNO (reg), x, partial / UNITS_PER_WORD, mode);
+	  move_block_to_reg (REGNO (reg), x, nregs - overlapping, mode);
+
+	  for (int i = 0; i < overlapping; i++)
+	    emit_move_insn (gen_rtx_REG (word_mode, REGNO (reg)
+						    + nregs - overlapping + i),
+			    tmp_regs[i]);
+
 	}
     }
 
diff --git a/gcc/testsuite/gcc.dg/pr65358.c b/gcc/testsuite/gcc.dg/pr65358.c
new file mode 100644
index 0000000..ba89fd4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr65358.c
@@ -0,0 +1,33 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+struct pack
+{
+  int fine;
+  int victim;
+  int killer;
+};
+
+int __attribute__ ((__noinline__, __noclone__))
+bar (int a, int b, struct pack p)
+{
+  if (a != 20 || b != 30)
+    __builtin_abort ();
+  if (p.fine != 40 || p.victim != 50 || p.killer != 60)
+    __builtin_abort ();
+  return 0;
+}
+
+int __attribute__ ((__noinline__, __noclone__))
+foo (int arg1, int arg2, int arg3, struct pack p)
+{
+  return bar (arg2, arg3, p);
+}
+
+int main (void)
+{
+  struct pack p = { 40, 50, 60 };
+
+  (void) foo (10, 20, 30, p);
+  return 0;
+}