Patchwork PING: PATCH: PR middle-end/45234: [4.4/4.5/4.6 Regression] ICE in expand_call, at calls.c:2845 when passing aligned function argument from unaligned stack after alloca

login
register
mail settings
Submitter H.J. Lu
Date Sept. 22, 2010, 10:01 p.m.
Message ID <AANLkTinhpMJjRWZG9+r210ebSQ7A+UpvnfC36SoKgEEK@mail.gmail.com>
Download mbox | patch
Permalink /patch/65461/
State New
Headers show

Comments

H.J. Lu - Sept. 22, 2010, 10:01 p.m.
On Wed, Sep 22, 2010 at 9:47 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Sep 21, 2010 at 09:30:16PM +0200, Eric Botcazou wrote:
>> > > OK to backport it to 4.4/4.5 branches?
>> >
>> > Yeah, sure.
>>
>> This badly breaks on i586 though.  On the 4.5 branch:
>
> Here is a shorter testcase:
>
> /* { dg-do compile } */
> /* { dg-options "-march=i586" { target ilp32 } } */
>
> struct S { union { double b[4]; } a[18]; } s, a[5];
> void foo (struct S);
> struct S bar (struct S, struct S *, struct S);
>
> void
> foo (struct S arg)
> {
> }
>
> void
> baz (void)
> {
>  foo (bar (s, &a[1], a[2]));
> }
>
> Unless this is resolved soon, I think the 4.5/4.4 backports of this patch
> should be reverted.

I am testing this patch on trunk and 4.5.
H.J. Lu - Sept. 23, 2010, 12:28 a.m.
On Wed, Sep 22, 2010 at 3:01 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Sep 22, 2010 at 9:47 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Tue, Sep 21, 2010 at 09:30:16PM +0200, Eric Botcazou wrote:
>>> > > OK to backport it to 4.4/4.5 branches?
>>> >
>>> > Yeah, sure.
>>>
>>> This badly breaks on i586 though.  On the 4.5 branch:
>>
>> Here is a shorter testcase:
>>
>> /* { dg-do compile } */
>> /* { dg-options "-march=i586" { target ilp32 } } */
>>
>> struct S { union { double b[4]; } a[18]; } s, a[5];
>> void foo (struct S);
>> struct S bar (struct S, struct S *, struct S);
>>
>> void
>> foo (struct S arg)
>> {
>> }
>>
>> void
>> baz (void)
>> {
>>  foo (bar (s, &a[1], a[2]));
>> }
>>
>> Unless this is resolved soon, I think the 4.5/4.4 backports of this patch
>> should be reverted.
>
> I am testing this patch on trunk and 4.5.
>
> --
> H.J.
> ---
> gcc/
>
> 2010-09-22  H.J. Lu  <hongjiu.lu@intel.com>
>
>        PR middle-end/45234
>        * calls.c (expand_call): Don't call check all variable sized
>        adjustments are multiple of preferred stack boundary after
>        stack alignment when calling builtin functions.
>
> gcc/testsuite/
>
> 2010-09-22  Jakub Jelinek  <jakub@redhat.com>
>
>        PR middle-end/45234
>        * gcc.target/i386/pr45234.c: New.
>

There are no regressions on trunk and 4.5. I configured
--with-arch=i586 for 32bit
gcc.  I am testing it on 4.4.
Jakub Jelinek - Sept. 23, 2010, 6:56 a.m.
On Wed, Sep 22, 2010 at 03:01:49PM -0700, H.J. Lu wrote:
> 2010-09-22  H.J. Lu  <hongjiu.lu@intel.com>
> 
> 	PR middle-end/45234
> 	* calls.c (expand_call): Don't call check all variable sized
> 	adjustments are multiple of preferred stack boundary after
> 	stack alignment when calling builtin functions.

> --- a/gcc/calls.c
> +++ b/gcc/calls.c
> @@ -2369,7 +2369,7 @@ expand_call (tree exp, rtx target, int ignore)
>  
>    preferred_unit_stack_boundary = preferred_stack_boundary / BITS_PER_UNIT;
>  
> -  if (SUPPORTS_STACK_ALIGNMENT)
> +  if (SUPPORTS_STACK_ALIGNMENT && fndecl && !DECL_IS_BUILTIN (fndecl))
>      {
>        /* All variable sized adjustments must be multiple of preferred
>  	 stack boundary.  Stack alignment may change preferred stack

I'd like to hear explanation on why you think this is the right approach.
What is so special about builtins in this case?  If they are to be expanded
as calls instead of expanded inline, I'd say they behave like any other
call.

Isn't the problem instead with nested expand_calls (which is happening
in the testcase that is now failing)?  Not sure if we still ever TER const/pure
calls into other call arguments or not[*], so if that is possible for arbitrary
calls, or the problem is just memcpy emitted through
emit_block_move_hints with method BLOCK_OP_CALL_PARM.
I admit I haven't debugged the original PR45234 testcase, so I don't know
why exactly is your patch needed and thus why it isn't needed when expanding
the call param move calls.

[*] In
long long foo (int, int) __attribute__((const));
int bar (int, int, long long, int, int);
int baz (int x)
{
  return bar (2, 3, foo (4, 5), 6, 7) + 1;
}
foo is now evaluated before starting the pushes with -m32 -O2 -march=i586.
I vaguely remember const/pure calls used to be TERed at some point, but
perhaps I'm just using a wrong testcase.

	Jakub
H.J. Lu - Sept. 23, 2010, 8:25 a.m.
On Wed, Sep 22, 2010 at 11:56 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Sep 22, 2010 at 03:01:49PM -0700, H.J. Lu wrote:
>> 2010-09-22  H.J. Lu  <hongjiu.lu@intel.com>
>>
>>       PR middle-end/45234
>>       * calls.c (expand_call): Don't call check all variable sized
>>       adjustments are multiple of preferred stack boundary after
>>       stack alignment when calling builtin functions.
>
>> --- a/gcc/calls.c
>> +++ b/gcc/calls.c
>> @@ -2369,7 +2369,7 @@ expand_call (tree exp, rtx target, int ignore)
>>
>>    preferred_unit_stack_boundary = preferred_stack_boundary / BITS_PER_UNIT;
>>
>> -  if (SUPPORTS_STACK_ALIGNMENT)
>> +  if (SUPPORTS_STACK_ALIGNMENT && fndecl && !DECL_IS_BUILTIN (fndecl))
>>      {
>>        /* All variable sized adjustments must be multiple of preferred
>>        stack boundary.  Stack alignment may change preferred stack
>
> I'd like to hear explanation on why you think this is the right approach.
> What is so special about builtins in this case?  If they are to be expanded
> as calls instead of expanded inline, I'd say they behave like any other
> call.
>
> Isn't the problem instead with nested expand_calls (which is happening
> in the testcase that is now failing)?  Not sure if we still ever TER const/pure
> calls into other call arguments or not[*], so if that is possible for arbitrary
> calls, or the problem is just memcpy emitted through
> emit_block_move_hints with method BLOCK_OP_CALL_PARM.
> I admit I haven't debugged the original PR45234 testcase, so I don't know
> why exactly is your patch needed and thus why it isn't needed when expanding
> the call param move calls.
>
> [*] In
> long long foo (int, int) __attribute__((const));
> int bar (int, int, long long, int, int);
> int baz (int x)
> {
>  return bar (2, 3, foo (4, 5), 6, 7) + 1;
> }
> foo is now evaluated before starting the pushes with -m32 -O2 -march=i586.
> I vaguely remember const/pure calls used to be TERed at some point, but
> perhaps I'm just using a wrong testcase.
>

I didn't find exactly why. From what I can see, builtin functions are
treated differently from the normal ones.
Jakub Jelinek - Sept. 23, 2010, 8:46 a.m.
On Thu, Sep 23, 2010 at 01:25:03AM -0700, H.J. Lu wrote:
> > I'd like to hear explanation on why you think this is the right approach.
> > What is so special about builtins in this case?  If they are to be expanded
> > as calls instead of expanded inline, I'd say they behave like any other
> > call.
> >
> > Isn't the problem instead with nested expand_calls (which is happening
> > in the testcase that is now failing)?  Not sure if we still ever TER const/pure
> > calls into other call arguments or not[*], so if that is possible for arbitrary
> > calls, or the problem is just memcpy emitted through
> > emit_block_move_hints with method BLOCK_OP_CALL_PARM.
> > I admit I haven't debugged the original PR45234 testcase, so I don't know
> > why exactly is your patch needed and thus why it isn't needed when expanding
> > the call param move calls.
> >
> > [*] In
> > long long foo (int, int) __attribute__((const));
> > int bar (int, int, long long, int, int);
> > int baz (int x)
> > {
> >  return bar (2, 3, foo (4, 5), 6, 7) + 1;
> > }
> > foo is now evaluated before starting the pushes with -m32 -O2 -march=i586.
> > I vaguely remember const/pure calls used to be TERed at some point, but
> > perhaps I'm just using a wrong testcase.
> >
> 
> I didn't find exactly why. From what I can see, builtin functions are
> treated differently from the normal ones.

I've briefly looked at the original PR45234 testcase, and the specifics of
the call are that it increased crtl->preferred_stack_boundary in between
the start of the expand_call function and the if (SUPPORTS_STACK_ALIGNMENT)
block you've added, I'd quite understand we need to align the stack
sufficiently in that case somewhere.  The question is where exactly and by
what exact value.  On the currently failing testcase obviously no increase
in crtl->preferred_stack_boundary for the nested memcpys happens, so if that
could be used as a condition, that testcase would work.  I guess what should
be investigated is where the nested memcpy calls on that testcase actually
do align the stack properly (and how), because it looks like they all have
%esp % 16 == 0 on the call insn.

	Jakub
Jakub Jelinek - Sept. 23, 2010, 11:58 a.m.
On Thu, Sep 23, 2010 at 01:25:03AM -0700, H.J. Lu wrote:
> I didn't find exactly why. From what I can see, builtin functions are
> treated differently from the normal ones.

I have looked into it some more and I believe the PR45234 "fix" is just
completely wrong, the problem is elsewhere (allocate_dynamic_stack_space)
and there are actually many issues in there.

So, IMNSHO that calls.c change should be reverted.

The problems I see in allocate_dynamic_stack_space for
SUPPORTS_STACK_ALIGNMENT targets are:
1) In

  /* We can't attempt to minimize alignment necessary, because we don't
     know the final value of preferred_stack_boundary yet while executing
     this code.  */
  crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;

   I'm afraid this can actually decrease crtl->preferred_stack_boundary
   if it has been increased above PREFERRED_STACK_BOUNDARY already earlier.
   IMNSHO, if it needs to be done, it should be
  if (crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY)
    crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;

2) I'd say MUST_ALIGN macro could use
   crtl->preferred_stack_boundary instead of PREFERRED_STACK_BOUNDARY,
   if we already know we want to align stack more than
   PREFERRED_STACK_BOUNDARY, we wouldn't need to try to align the result.

3) Most importantly, in

  if (!known_align_valid || known_align % PREFERRED_STACK_BOUNDARY != 0)
    {
      size = round_push (size);
     
      if (flag_stack_usage)
        {
          int align = PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT;
          stack_usage_size = (stack_usage_size + align - 1) / align * align;
        }
    }

   I think the first % PREFERRED_STACK_BOUNDARY should be
   % MAX_SUPPORTED_STACK_ALIGNMENT, because crtl->preferred_stack_boundary
   can always grow afterwards, and round_push if
   crtl->preferred_stack_boundary is smaller than
   MAX_SUPPORTED_STACK_ALIGNMENT should as align in bytes use a new
   virtual register which would be replaced by constant equal to the actual
   final crtl->preferred_stack_boundary (and let cse/combiner optimize
   all the calculations up).  For flag_stack_usage, I'm afraid it can't
   be computed reliably.

4) When this function uses anti_adjust_stack_and_probe or anti_adjust_stack,
   it should IMNSHO ensure that stack_pointer_delta isn't modified
   even when size is constant (e.g. by saving/restoring the value around
   it).  The constant allocas should have the same properties as dynamic
   allocas.  For i?86/x86_64 with the change in 3) constants won't happen
   any longer, but for other targets.  For the future, it would be nice
   if we had some REG note for al the alloca sp adjustments, because e.g.
   the unwinding code is confused by the constant size allocas e.g. when
   computing DW_OP_GNU_args_size.

5) == CONST_INT and == REG should be replaced with CONST_INT_P and REG_P

The alternative to this would be to prescan the trees right before actual
expansion and (conservatively) guess what the stack alignment will actually
need to be (e.g. take into account call argument alignments, etc.), and
don't allow crtl->preferred_stack_boundary etc. actually grow during the
expansion.

I'm attaching a testcase that shows that non-constant alloca has exactly
the same issue as the original PR testcase, but the "fix" obviously can't do
anything about it.

	Jakub
/* PR middle-end/45234 */
/* { dg-do run { target { { i?86-*-* x86_64-*-* } && ilp32 } } } */
/* { dg-options "-mincoming-stack-boundary=2 -mpreferred-stack-boundary=2" } */

#include "check.h"

void
__attribute__ ((noinline))
bar (__float128 f)
{
  check (&f, __alignof__(f));
}

volatile int z = 6;

int
main (void)
{
  char *p = __builtin_alloca (z);

  bar (0);

  __builtin_strncpy (p, "good", 5);
  if (__builtin_strncmp (p, "good", 5) != 0)
    {
#ifdef DEBUG
      p[z - 1] = '\0';
      printf ("Failed: %s != good\n", p);
#endif
     abort ();
    }

  return 0;
}

Patch

diff --git a/gcc/calls.c b/gcc/calls.c
index aef823f..0c7588a 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -2369,7 +2369,7 @@  expand_call (tree exp, rtx target, int ignore)
 
   preferred_unit_stack_boundary = preferred_stack_boundary / BITS_PER_UNIT;
 
-  if (SUPPORTS_STACK_ALIGNMENT)
+  if (SUPPORTS_STACK_ALIGNMENT && fndecl && !DECL_IS_BUILTIN (fndecl))
     {
       /* All variable sized adjustments must be multiple of preferred
 	 stack boundary.  Stack alignment may change preferred stack
--- /dev/null	2010-09-09 09:16:30.485584932 -0700
+++ gcc-release/gcc/testsuite/gcc.target/i386/pr45234.c	2010-09-22 13:34:53.415871265 -0700
@@ -0,0 +1,17 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=i586" { target ilp32 } } */
+
+struct S { union { double b[4]; } a[18]; } s, a[5];
+void foo (struct S);
+struct S bar (struct S, struct S *, struct S);
+
+void
+foo (struct S arg)
+{
+}
+
+void
+baz (void)
+{
+ foo (bar (s, &a[1], a[2]));
+}