diff mbox

PATCH: PR middle-end/45678: [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse

Message ID 20100917165833.GA17163@intel.com
State New
Headers show

Commit Message

H.J. Lu Sept. 17, 2010, 4:58 p.m. UTC
We failed to update stack alignment when we increase alignment of local
variable.  This patch fixes it.  OK for trunk/4.5/4.4?

Thanks.


H.J.
---
gcc/

2010-09-17  H.J. Lu  <hongjiu.lu@intel.com>

	PR middle-end/45678
	* cfgexpand.c (update_stack_alignment): New.
	(get_decl_align_unit): Use it.
	(expand_one_stack_var_at): Call update_stack_alignment.

gcc/testsuite/

2010-09-17  H.J. Lu  <hongjiu.lu@intel.com>

	PR middle-end/45678
	* gcc.dg/torture/pr45678-2.c: New.

Comments

Richard Henderson Sept. 17, 2010, 5:46 p.m. UTC | #1
On 09/17/2010 09:58 AM, H.J. Lu wrote:
> 	PR middle-end/45678
> 	* cfgexpand.c (update_stack_alignment): New.
> 	(get_decl_align_unit): Use it.
> 	(expand_one_stack_var_at): Call update_stack_alignment.

Ok.


r~
Jakub Jelinek Sept. 17, 2010, 6:15 p.m. UTC | #2
On Fri, Sep 17, 2010 at 10:46:00AM -0700, Richard Henderson wrote:
> On 09/17/2010 09:58 AM, H.J. Lu wrote:
> > 	PR middle-end/45678
> > 	* cfgexpand.c (update_stack_alignment): New.
> > 	(get_decl_align_unit): Use it.
> > 	(expand_one_stack_var_at): Call update_stack_alignment.
> 
> Ok.

I don't deny this fixes the problem and probably is a good idea in any case,
the question I have is just if this won't pessimize i?86 code too much,
especially with non-default options where incoming stack boundary is smaller
than PREFERRED_STACK_BOUNDARY.
The thing is that this expand_one_stack_var_at is just an optimization,
but we don't know if anything will actually make use of the increased
alignment.  If there will be just variables that need 32-bit alignment
and say incoming stack alignment is also 32-bit, but preferred stack
boundary is 128-bit, as soon as some variable will be placed in slot 16
bytes from the base, we'll force realignment even when nothing actually
needs it.

So, I wonder if
      max_align = MAX (crtl->max_used_stack_slot_alignment,
                       PREFERRED_STACK_BOUNDARY);
shouldn't be replaced by something like:
1)
      max_align = MAX (crtl->max_used_stack_slot_alignment,
                       SUPPORTS_STACK_ALIGNMENT
		       ? STACK_BOUNDARY : PREFERRED_STACK_BOUNDARY);
or:
2)
      max_align = MAX (crtl->max_used_stack_slot_alignment,
                       SUPPORTS_STACK_ALIGNMENT
		       ? INCOMING_STACK_BOUNDARY : PREFERRED_STACK_BOUNDARY);
or:
3)
      max_align = SUPPORTS_STACK_ALIGNMENT
		  ? MAX (crtl->max_used_stack_slot_alignment, STACK_BOUNDARY)
		  : PREFERRED_STACK_BOUNDARY;
or:
4)
      max_align = SUPPORTS_STACK_ALIGNMENT
		  ? MAX (crtl->max_used_stack_slot_alignment,
			 INCOMING_STACK_BOUNDARY)
		  : PREFERRED_STACK_BOUNDARY;
For 2) or 4) we'd need to do
  if (targetm.calls.update_stack_boundary)
    targetm.calls.update_stack_boundary ();
earlier (before all the expand_one_stack_var_at calls).
1) and 3) are conservative choices for i?86, seems without H.J's patch gcc
would often happily use just 32-bit alignment of the stack base and even if
incoming stack boundary was 128-bit, it would tell other routines they don't
need to bother with that alignment, but e.g. on x86-64 it would still assume
max_align of 8 bytes initially, even when e.g. x86-64 doesn't allow setting
PREFERRED_STACK_BOUNDARY below 128 bits.  I think usually it doesn't buy us
too much to align below INCOMING_STACK_BOUNDARY.  So perhaps 2)/4), where
needlessly bumping alignment in expand_one_stack_var_at too much would be
far more costly (would cause stack realignment).
The 1)/2) vs. 3)/4) difference is for !SUPPORTS_STACK_ALIGNMENT targets,
I wonder as they are never realigning the stack whether we should ever
use there higher alignment.

	Jakub
H.J. Lu Sept. 17, 2010, 6:32 p.m. UTC | #3
On Fri, Sep 17, 2010 at 11:15 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Sep 17, 2010 at 10:46:00AM -0700, Richard Henderson wrote:
>> On 09/17/2010 09:58 AM, H.J. Lu wrote:
>> >     PR middle-end/45678
>> >     * cfgexpand.c (update_stack_alignment): New.
>> >     (get_decl_align_unit): Use it.
>> >     (expand_one_stack_var_at): Call update_stack_alignment.
>>
>> Ok.
>
> I don't deny this fixes the problem and probably is a good idea in any case,
> the question I have is just if this won't pessimize i?86 code too much,
> especially with non-default options where incoming stack boundary is smaller
> than PREFERRED_STACK_BOUNDARY.
> The thing is that this expand_one_stack_var_at is just an optimization,
> but we don't know if anything will actually make use of the increased
> alignment.  If there will be just variables that need 32-bit alignment
> and say incoming stack alignment is also 32-bit, but preferred stack
> boundary is 128-bit, as soon as some variable will be placed in slot 16
> bytes from the base, we'll force realignment even when nothing actually
> needs it.
>
> So, I wonder if
>      max_align = MAX (crtl->max_used_stack_slot_alignment,
>                       PREFERRED_STACK_BOUNDARY);
> shouldn't be replaced by something like:
> 1)
>      max_align = MAX (crtl->max_used_stack_slot_alignment,
>                       SUPPORTS_STACK_ALIGNMENT
>                       ? STACK_BOUNDARY : PREFERRED_STACK_BOUNDARY);
> or:
> 2)
>      max_align = MAX (crtl->max_used_stack_slot_alignment,
>                       SUPPORTS_STACK_ALIGNMENT
>                       ? INCOMING_STACK_BOUNDARY : PREFERRED_STACK_BOUNDARY);
> or:
> 3)
>      max_align = SUPPORTS_STACK_ALIGNMENT
>                  ? MAX (crtl->max_used_stack_slot_alignment, STACK_BOUNDARY)
>                  : PREFERRED_STACK_BOUNDARY;
> or:
> 4)
>      max_align = SUPPORTS_STACK_ALIGNMENT
>                  ? MAX (crtl->max_used_stack_slot_alignment,
>                         INCOMING_STACK_BOUNDARY)
>                  : PREFERRED_STACK_BOUNDARY;
> For 2) or 4) we'd need to do
>  if (targetm.calls.update_stack_boundary)
>    targetm.calls.update_stack_boundary ();
> earlier (before all the expand_one_stack_var_at calls).
> 1) and 3) are conservative choices for i?86, seems without H.J's patch gcc
> would often happily use just 32-bit alignment of the stack base and even if
> incoming stack boundary was 128-bit, it would tell other routines they don't
> need to bother with that alignment, but e.g. on x86-64 it would still assume
> max_align of 8 bytes initially, even when e.g. x86-64 doesn't allow setting
> PREFERRED_STACK_BOUNDARY below 128 bits.  I think usually it doesn't buy us
> too much to align below INCOMING_STACK_BOUNDARY.  So perhaps 2)/4), where
> needlessly bumping alignment in expand_one_stack_var_at too much would be
> far more costly (would cause stack realignment).
> The 1)/2) vs. 3)/4) difference is for !SUPPORTS_STACK_ALIGNMENT targets,
> I wonder as they are never realigning the stack whether we should ever
> use there higher alignment.
>

I think it is related to

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44542

One of my attempts may work.

FWIW, I am fine with any solution as long as when we do

DECL_ALIGN (decl) = align;

we guarantee that it will be aligned at that boundary at run-time.
H.J. Lu Sept. 17, 2010, 7:18 p.m. UTC | #4
On Fri, Sep 17, 2010 at 11:32 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Sep 17, 2010 at 11:15 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Fri, Sep 17, 2010 at 10:46:00AM -0700, Richard Henderson wrote:
>>> On 09/17/2010 09:58 AM, H.J. Lu wrote:
>>> >     PR middle-end/45678
>>> >     * cfgexpand.c (update_stack_alignment): New.
>>> >     (get_decl_align_unit): Use it.
>>> >     (expand_one_stack_var_at): Call update_stack_alignment.
>>>
>>> Ok.
>>
>> I don't deny this fixes the problem and probably is a good idea in any case,
>> the question I have is just if this won't pessimize i?86 code too much,
>> especially with non-default options where incoming stack boundary is smaller
>> than PREFERRED_STACK_BOUNDARY.
>> The thing is that this expand_one_stack_var_at is just an optimization,
>> but we don't know if anything will actually make use of the increased
>> alignment.  If there will be just variables that need 32-bit alignment
>> and say incoming stack alignment is also 32-bit, but preferred stack
>> boundary is 128-bit, as soon as some variable will be placed in slot 16
>> bytes from the base, we'll force realignment even when nothing actually
>> needs it.
>>
>> So, I wonder if
>>      max_align = MAX (crtl->max_used_stack_slot_alignment,
>>                       PREFERRED_STACK_BOUNDARY);
>> shouldn't be replaced by something like:
>> 1)
>>      max_align = MAX (crtl->max_used_stack_slot_alignment,
>>                       SUPPORTS_STACK_ALIGNMENT
>>                       ? STACK_BOUNDARY : PREFERRED_STACK_BOUNDARY);
>> or:
>> 2)
>>      max_align = MAX (crtl->max_used_stack_slot_alignment,
>>                       SUPPORTS_STACK_ALIGNMENT
>>                       ? INCOMING_STACK_BOUNDARY : PREFERRED_STACK_BOUNDARY);
>> or:
>> 3)
>>      max_align = SUPPORTS_STACK_ALIGNMENT
>>                  ? MAX (crtl->max_used_stack_slot_alignment, STACK_BOUNDARY)
>>                  : PREFERRED_STACK_BOUNDARY;
>> or:
>> 4)
>>      max_align = SUPPORTS_STACK_ALIGNMENT
>>                  ? MAX (crtl->max_used_stack_slot_alignment,
>>                         INCOMING_STACK_BOUNDARY)
>>                  : PREFERRED_STACK_BOUNDARY;
>> For 2) or 4) we'd need to do
>>  if (targetm.calls.update_stack_boundary)
>>    targetm.calls.update_stack_boundary ();
>> earlier (before all the expand_one_stack_var_at calls).
>> 1) and 3) are conservative choices for i?86, seems without H.J's patch gcc
>> would often happily use just 32-bit alignment of the stack base and even if
>> incoming stack boundary was 128-bit, it would tell other routines they don't
>> need to bother with that alignment, but e.g. on x86-64 it would still assume
>> max_align of 8 bytes initially, even when e.g. x86-64 doesn't allow setting
>> PREFERRED_STACK_BOUNDARY below 128 bits.  I think usually it doesn't buy us
>> too much to align below INCOMING_STACK_BOUNDARY.  So perhaps 2)/4), where
>> needlessly bumping alignment in expand_one_stack_var_at too much would be
>> far more costly (would cause stack realignment).
>> The 1)/2) vs. 3)/4) difference is for !SUPPORTS_STACK_ALIGNMENT targets,
>> I wonder as they are never realigning the stack whether we should ever
>> use there higher alignment.
>>
>
> I think it is related to
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44542
>
> One of my attempts may work.
>
> FWIW, I am fine with any solution as long as when we do
>
> DECL_ALIGN (decl) = align;
>
> we guarantee that it will be aligned at that boundary at run-time.

I don't think we should use offset as alignment unless the variable
should be aligned at offset bytes.  My patch

http://gcc.gnu.org/bugzilla/attachment.cgi?id=20922&action=view

may align the local variable at boundary specified for the variable.
H.J. Lu Sept. 17, 2010, 8:03 p.m. UTC | #5
On Fri, Sep 17, 2010 at 12:18 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Sep 17, 2010 at 11:32 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Fri, Sep 17, 2010 at 11:15 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>>> On Fri, Sep 17, 2010 at 10:46:00AM -0700, Richard Henderson wrote:
>>>> On 09/17/2010 09:58 AM, H.J. Lu wrote:
>>>> >     PR middle-end/45678
>>>> >     * cfgexpand.c (update_stack_alignment): New.
>>>> >     (get_decl_align_unit): Use it.
>>>> >     (expand_one_stack_var_at): Call update_stack_alignment.
>>>>
>>>> Ok.
>>>
>>> I don't deny this fixes the problem and probably is a good idea in any case,
>>> the question I have is just if this won't pessimize i?86 code too much,
>>> especially with non-default options where incoming stack boundary is smaller
>>> than PREFERRED_STACK_BOUNDARY.
>>> The thing is that this expand_one_stack_var_at is just an optimization,
>>> but we don't know if anything will actually make use of the increased
>>> alignment.  If there will be just variables that need 32-bit alignment
>>> and say incoming stack alignment is also 32-bit, but preferred stack
>>> boundary is 128-bit, as soon as some variable will be placed in slot 16
>>> bytes from the base, we'll force realignment even when nothing actually
>>> needs it.
>>>
>>> So, I wonder if
>>>      max_align = MAX (crtl->max_used_stack_slot_alignment,
>>>                       PREFERRED_STACK_BOUNDARY);
>>> shouldn't be replaced by something like:
>>> 1)
>>>      max_align = MAX (crtl->max_used_stack_slot_alignment,
>>>                       SUPPORTS_STACK_ALIGNMENT
>>>                       ? STACK_BOUNDARY : PREFERRED_STACK_BOUNDARY);
>>> or:
>>> 2)
>>>      max_align = MAX (crtl->max_used_stack_slot_alignment,
>>>                       SUPPORTS_STACK_ALIGNMENT
>>>                       ? INCOMING_STACK_BOUNDARY : PREFERRED_STACK_BOUNDARY);
>>> or:
>>> 3)
>>>      max_align = SUPPORTS_STACK_ALIGNMENT
>>>                  ? MAX (crtl->max_used_stack_slot_alignment, STACK_BOUNDARY)
>>>                  : PREFERRED_STACK_BOUNDARY;
>>> or:
>>> 4)
>>>      max_align = SUPPORTS_STACK_ALIGNMENT
>>>                  ? MAX (crtl->max_used_stack_slot_alignment,
>>>                         INCOMING_STACK_BOUNDARY)
>>>                  : PREFERRED_STACK_BOUNDARY;
>>> For 2) or 4) we'd need to do
>>>  if (targetm.calls.update_stack_boundary)
>>>    targetm.calls.update_stack_boundary ();
>>> earlier (before all the expand_one_stack_var_at calls).
>>> 1) and 3) are conservative choices for i?86, seems without H.J's patch gcc
>>> would often happily use just 32-bit alignment of the stack base and even if
>>> incoming stack boundary was 128-bit, it would tell other routines they don't
>>> need to bother with that alignment, but e.g. on x86-64 it would still assume
>>> max_align of 8 bytes initially, even when e.g. x86-64 doesn't allow setting
>>> PREFERRED_STACK_BOUNDARY below 128 bits.  I think usually it doesn't buy us
>>> too much to align below INCOMING_STACK_BOUNDARY.  So perhaps 2)/4), where
>>> needlessly bumping alignment in expand_one_stack_var_at too much would be
>>> far more costly (would cause stack realignment).
>>> The 1)/2) vs. 3)/4) difference is for !SUPPORTS_STACK_ALIGNMENT targets,
>>> I wonder as they are never realigning the stack whether we should ever
>>> use there higher alignment.
>>>
>>
>> I think it is related to
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44542
>>
>> One of my attempts may work.
>>
>> FWIW, I am fine with any solution as long as when we do
>>
>> DECL_ALIGN (decl) = align;
>>
>> we guarantee that it will be aligned at that boundary at run-time.
>
> I don't think we should use offset as alignment unless the variable
> should be aligned at offset bytes.  My patch
>
> http://gcc.gnu.org/bugzilla/attachment.cgi?id=20922&action=view
>
> may align the local variable at boundary specified for the variable.
>
>
>
> --
> H.J.
>

Using stack offset for local variable alignment is a bad idea. My patch
caused:

FAIL: gcc.target/i386/incoming-9.c scan-assembler-not andl[\\t
]*\\$-16,[\\t ]*%esp
diff mbox

Patch

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 4237276..1e67e77 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -205,19 +205,11 @@  static bool has_protected_decls;
    smaller than our cutoff threshold.  Used for -Wstack-protector.  */
 static bool has_short_buffer;
 
-/* Discover the byte alignment to use for DECL.  Ignore alignment
-   we can't do with expected alignment of the stack boundary.  */
+/* Update stack alignment requirement.  */
 
-static unsigned int
-get_decl_align_unit (tree decl)
+static void
+update_stack_alignment (unsigned int align)
 {
-  unsigned int align;
-
-  align = LOCAL_DECL_ALIGNMENT (decl);
-
-  if (align > MAX_SUPPORTED_STACK_ALIGNMENT)
-    align = MAX_SUPPORTED_STACK_ALIGNMENT;
-
   if (SUPPORTS_STACK_ALIGNMENT)
     {
       if (crtl->stack_alignment_estimated < align)
@@ -233,6 +225,22 @@  get_decl_align_unit (tree decl)
     crtl->stack_alignment_needed = align;
   if (crtl->max_used_stack_slot_alignment < align)
     crtl->max_used_stack_slot_alignment = align;
+}
+
+/* Discover the byte alignment to use for DECL.  Ignore alignment
+   we can't do with expected alignment of the stack boundary.  */
+
+static unsigned int
+get_decl_align_unit (tree decl)
+{
+  unsigned int align;
+
+  align = LOCAL_DECL_ALIGNMENT (decl);
+
+  if (align > MAX_SUPPORTED_STACK_ALIGNMENT)
+    align = MAX_SUPPORTED_STACK_ALIGNMENT;
+
+  update_stack_alignment (align);
 
   return align / BITS_PER_UNIT;
 }
@@ -735,6 +743,7 @@  expand_one_stack_var_at (tree decl, HOST_WIDE_INT offset)
       if (align == 0 || align > max_align)
 	align = max_align;
 
+      update_stack_alignment (align);
       DECL_ALIGN (decl) = align;
       DECL_USER_ALIGN (decl) = 0;
     }
--- /dev/null	2010-09-09 09:16:30.485584932 -0700
+++ gcc/gcc/testsuite/gcc.dg/torture/pr45678-2.c	2010-09-17 09:53:20.510888017 -0700
@@ -0,0 +1,16 @@ 
+/* { dg-do run } */
+
+typedef float V __attribute__ ((vector_size (16)));
+V g;
+
+int
+main ()
+{
+  float d[4] = { 4, 3, 2, 1 };
+  V e;
+  __builtin_memcpy (&e, &d, sizeof (d));
+  V f = { 5, 15, 25, 35 };
+  e = e * f;
+  g = e;
+  return 0;
+}