i386: Always set cfun->machine->max_used_stack_alignment

Message ID 20180802225548.GA1730@intel.com
State New
Headers show
Series
  • i386: Always set cfun->machine->max_used_stack_alignment
Related show

Commit Message

H.J. Lu Aug. 2, 2018, 10:55 p.m.
We should always set cfun->machine->max_used_stack_alignment if the
maximum stack slot alignment may be greater than 64 bits.

Tested on i686 and x86-64.  OK for master and backport for GCC 8?

Thanks.

H.J.
----
gcc/

	PR target/86386
	* config/i386/i386.c (ix86_finalize_stack_frame_flags): Set
	cfun->machine->max_used_stack_alignment if needed.

gcc/testsuite/

	PR target/86386
	* gcc.target/i386/pr86386.c: New file.
---
 gcc/config/i386/i386.c                  |  6 +++---
 gcc/testsuite/gcc.target/i386/pr86386.c | 26 +++++++++++++++++++++++++
 2 files changed, 29 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr86386.c

Comments

Uros Bizjak Aug. 4, 2018, 10:42 a.m. | #1
On Fri, Aug 3, 2018 at 12:55 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
> We should always set cfun->machine->max_used_stack_alignment if the
> maximum stack slot alignment may be greater than 64 bits.
>
> Tested on i686 and x86-64.  OK for master and backport for GCC 8?

Can you explain why 64 bits, and what this value represents? Is this
value the same for 64bit and 32bit targets?

Should crtl->max_used_stack_slot_alignment be compared to
STACK_BOUNDARY or even MIN_STACK_BOUNDARY instead?

Uros.

> Thanks.
>
> H.J.
> ----
> gcc/
>
>         PR target/86386
>         * config/i386/i386.c (ix86_finalize_stack_frame_flags): Set
>         cfun->machine->max_used_stack_alignment if needed.
>
> gcc/testsuite/
>
>         PR target/86386
>         * gcc.target/i386/pr86386.c: New file.
> ---
>  gcc/config/i386/i386.c                  |  6 +++---
>  gcc/testsuite/gcc.target/i386/pr86386.c | 26 +++++++++++++++++++++++++
>  2 files changed, 29 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr86386.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index ee409cfe7e4..4a0a050b3a2 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -13281,12 +13281,12 @@ ix86_finalize_stack_frame_flags (void)
>           recompute_frame_layout_p = true;
>         }
>      }
> -  else if (crtl->max_used_stack_slot_alignment
> -          > crtl->preferred_stack_boundary)
> +  else if (crtl->max_used_stack_slot_alignment > 64)
>      {
>        /* We don't need to realign stack.  But we still need to keep
>          stack frame properly aligned to satisfy the largest alignment
> -        of stack slots.  */
> +        of stack slots if the maximum stack slot alignment may be
> +        greater than 64 bits.  */
>        if (ix86_find_max_used_stack_alignment (stack_alignment, true))
>         cfun->machine->max_used_stack_alignment
>           = stack_alignment / BITS_PER_UNIT;
> diff --git a/gcc/testsuite/gcc.target/i386/pr86386.c b/gcc/testsuite/gcc.target/i386/pr86386.c
> new file mode 100644
> index 00000000000..a67cf45444e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr86386.c
> @@ -0,0 +1,26 @@
> +/* PR target/86386 */
> +/* { dg-do run { target { avx_runtime && int128 } } } */
> +/* { dg-options "-Os -fno-tree-dce -mstringop-strategy=vector_loop -mavx" } */
> +
> +unsigned c, d, e, f;
> +
> +unsigned __attribute__((noipa))
> +foo (unsigned char g, unsigned short h, unsigned i, unsigned long long j,
> +     unsigned char k, unsigned short l, unsigned m, unsigned __int128 n)
> +{
> +  __builtin_memset (&e, 0, 3);
> +  n <<= m;
> +  __builtin_memcpy (&m, 2 + (char *) &n, 1);
> +  m >>= 0;
> +  d ^= __builtin_mul_overflow (l, n, &m);
> +  return m;
> +}
> +
> +int
> +main ()
> +{
> +  unsigned __int128 x = foo (0, 0, 0, 0, 0, 4, 1, 3);
> +  if (x != 24)
> +    __builtin_abort ();
> +  return 0;
> +}
> --
> 2.17.1
>
H.J. Lu Aug. 4, 2018, 1:59 p.m. | #2
On Sat, Aug 4, 2018 at 3:42 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Fri, Aug 3, 2018 at 12:55 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>> We should always set cfun->machine->max_used_stack_alignment if the
>> maximum stack slot alignment may be greater than 64 bits.
>>
>> Tested on i686 and x86-64.  OK for master and backport for GCC 8?
>
> Can you explain why 64 bits, and what this value represents? Is this
> value the same for 64bit and 32bit targets?
>
> Should crtl->max_used_stack_slot_alignment be compared to
> STACK_BOUNDARY or even MIN_STACK_BOUNDARY instead?

In this case, we don't need to realign the incoming stack since both
crtl->max_used_stack_slot_alignment and crtl->preferred_stack_boundary
are 128 bits.  We don't compute the largest alignment of stack slots to
keep stack frame properly aligned for it.  Normally it is OK.   But if
the largest alignment of stack slots > 64 bits and we don't keep stack
frame proper aligned, we will get segfault if aligned vector load/store
are used on these unaligned stack slots. My patch computes the
largest alignment of stack slots, which are actually used,  if the
largest alignment of stack slots allocated is > 64 bits, which is
the smallest alignment for misaligned load/store.

Here is the diff of before and after:

diff -up old/x.s new/x.s
--- old/x.s 2018-08-02 12:39:22.916400504 -0700
+++ new/x.s 2018-08-02 12:38:35.853729415 -0700
@@ -15,6 +15,7 @@ foo:
  movq %rsp, %rbp
  .cfi_def_cfa_register 6
  pushq %rbx
+ subq $8, %rsp  <<<<<<<<<<< Stack frame is properly aligned to 16 bytes.
  .cfi_offset 3, -24
  stosw
  movl 16(%rbp), %ecx
@@ -65,6 +66,7 @@ foo:
 .L9:
  xorl %r8d, d(%rip)
  movl %esi, %eax
+ popq %rdx
  popq %rbx
  popq %rbp
  .cfi_def_cfa 7, 8
Uros Bizjak Aug. 4, 2018, 7:09 p.m. | #3
On Sat, Aug 4, 2018 at 3:59 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Sat, Aug 4, 2018 at 3:42 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Fri, Aug 3, 2018 at 12:55 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>>> We should always set cfun->machine->max_used_stack_alignment if the
>>> maximum stack slot alignment may be greater than 64 bits.
>>>
>>> Tested on i686 and x86-64.  OK for master and backport for GCC 8?
>>
>> Can you explain why 64 bits, and what this value represents? Is this
>> value the same for 64bit and 32bit targets?
>>
>> Should crtl->max_used_stack_slot_alignment be compared to
>> STACK_BOUNDARY or even MIN_STACK_BOUNDARY instead?
>
> In this case, we don't need to realign the incoming stack since both
> crtl->max_used_stack_slot_alignment and crtl->preferred_stack_boundary
> are 128 bits.  We don't compute the largest alignment of stack slots to
> keep stack frame properly aligned for it.  Normally it is OK.   But if
> the largest alignment of stack slots > 64 bits and we don't keep stack
> frame proper aligned, we will get segfault if aligned vector load/store
> are used on these unaligned stack slots. My patch computes the
> largest alignment of stack slots, which are actually used,  if the
> largest alignment of stack slots allocated is > 64 bits, which is
> the smallest alignment for misaligned load/store.

Does > 64 bits also hold for 32 bit targets, and -mabi=ms? I think
that we need to compare to STACK_BOUNDARY instead:

--cut here--
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c  (revision 263307)
+++ config/i386/i386.c  (working copy)
@@ -13281,8 +13281,7 @@
          recompute_frame_layout_p = true;
        }
     }
-  else if (crtl->max_used_stack_slot_alignment
-          > crtl->preferred_stack_boundary)
+  else if (crtl->max_used_stack_slot_alignment > STACK_BOUNDARY)
     {
       /* We don't need to realign stack.  But we still need to keep
         stack frame properly aligned to satisfy the largest alignment
--cut here--

(The testcase works OK with -mabi=ms, which somehow suggests that we
don't need realignment in this case).

Uros.
H.J. Lu Aug. 4, 2018, 7:49 p.m. | #4
On Sat, Aug 4, 2018 at 12:09 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sat, Aug 4, 2018 at 3:59 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Sat, Aug 4, 2018 at 3:42 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> On Fri, Aug 3, 2018 at 12:55 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>>>> We should always set cfun->machine->max_used_stack_alignment if the
>>>> maximum stack slot alignment may be greater than 64 bits.
>>>>
>>>> Tested on i686 and x86-64.  OK for master and backport for GCC 8?
>>>
>>> Can you explain why 64 bits, and what this value represents? Is this
>>> value the same for 64bit and 32bit targets?
>>>
>>> Should crtl->max_used_stack_slot_alignment be compared to
>>> STACK_BOUNDARY or even MIN_STACK_BOUNDARY instead?
>>
>> In this case, we don't need to realign the incoming stack since both
>> crtl->max_used_stack_slot_alignment and crtl->preferred_stack_boundary
>> are 128 bits.  We don't compute the largest alignment of stack slots to
>> keep stack frame properly aligned for it.  Normally it is OK.   But if
>> the largest alignment of stack slots > 64 bits and we don't keep stack
>> frame proper aligned, we will get segfault if aligned vector load/store
>> are used on these unaligned stack slots. My patch computes the
>> largest alignment of stack slots, which are actually used,  if the
>> largest alignment of stack slots allocated is > 64 bits, which is
>> the smallest alignment for misaligned load/store.
>
> Does > 64 bits also hold for 32 bit targets, and -mabi=ms? I think
> that we need to compare to STACK_BOUNDARY instead:

64 bits requirement is independent of any psABIs nor 32-bit vs 64-bit.
cfun->machine->max_used_stack_alignment is used to decide how
stack frame should be aligned.  It is always safe to compute it.  I used

else if (crtl->max_used_stack_slot_alignment > 64)

to compute cfun->machine->max_used_stack_alignment only if
we have to.

> --cut here--
> Index: config/i386/i386.c
> ===================================================================
> --- config/i386/i386.c  (revision 263307)
> +++ config/i386/i386.c  (working copy)
> @@ -13281,8 +13281,7 @@
>           recompute_frame_layout_p = true;
>         }
>      }
> -  else if (crtl->max_used_stack_slot_alignment
> -          > crtl->preferred_stack_boundary)
> +  else if (crtl->max_used_stack_slot_alignment > STACK_BOUNDARY)
>      {

This isn't correct..  cygming.h has

#define STACK_BOUNDARY (TARGET_64BIT && ix86_abi == MS_ABI ? 128 :
BITS_PER_WORD)

darwin.h has

#undef STACK_BOUNDARY
#define STACK_BOUNDARY \
 ((profile_flag || (TARGET_64BIT && ix86_abi == MS_ABI)) \
  ? 128 : BITS_PER_WORD)

i386.h has

/* Boundary (in *bits*) on which stack pointer should be aligned.  */
#define STACK_BOUNDARY \
 (TARGET_64BIT && ix86_abi == MS_ABI ? 128 : BITS_PER_WORD)

If STACK_BOUNDARY is 128 and max_used_stack_slot_alignment is 128,
we will get segment when 128bit aligned load/store is generated on misaligned
stack slot.

>        /* We don't need to realign stack.  But we still need to keep
>          stack frame properly aligned to satisfy the largest alignment
> --cut here--
>
> (The testcase works OK with -mabi=ms, which somehow suggests that we
> don't need realignment in this case).

We may not hit 128bit aligned load/store on misaligned stack slot in this
case.  It doesn't mean that won't happen.

else if (crtl->max_used_stack_slot_alignment > 64)

is the correct thing to do here.
Uros Bizjak Aug. 4, 2018, 9:48 p.m. | #5
On Sat, Aug 4, 2018 at 9:49 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Sat, Aug 4, 2018 at 12:09 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Sat, Aug 4, 2018 at 3:59 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Sat, Aug 4, 2018 at 3:42 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>> On Fri, Aug 3, 2018 at 12:55 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>>>>> We should always set cfun->machine->max_used_stack_alignment if the
>>>>> maximum stack slot alignment may be greater than 64 bits.
>>>>>
>>>>> Tested on i686 and x86-64.  OK for master and backport for GCC 8?
>>>>
>>>> Can you explain why 64 bits, and what this value represents? Is this
>>>> value the same for 64bit and 32bit targets?
>>>>
>>>> Should crtl->max_used_stack_slot_alignment be compared to
>>>> STACK_BOUNDARY or even MIN_STACK_BOUNDARY instead?
>>>
>>> In this case, we don't need to realign the incoming stack since both
>>> crtl->max_used_stack_slot_alignment and crtl->preferred_stack_boundary
>>> are 128 bits.  We don't compute the largest alignment of stack slots to
>>> keep stack frame properly aligned for it.  Normally it is OK.   But if
>>> the largest alignment of stack slots > 64 bits and we don't keep stack
>>> frame proper aligned, we will get segfault if aligned vector load/store
>>> are used on these unaligned stack slots. My patch computes the
>>> largest alignment of stack slots, which are actually used,  if the
>>> largest alignment of stack slots allocated is > 64 bits, which is
>>> the smallest alignment for misaligned load/store.
>>
>> Does > 64 bits also hold for 32 bit targets, and -mabi=ms? I think
>> that we need to compare to STACK_BOUNDARY instead:
>
> 64 bits requirement is independent of any psABIs nor 32-bit vs 64-bit.
> cfun->machine->max_used_stack_alignment is used to decide how
> stack frame should be aligned.  It is always safe to compute it.  I used
>
> else if (crtl->max_used_stack_slot_alignment > 64)
>
> to compute cfun->machine->max_used_stack_alignment only if
> we have to.
>
>> --cut here--
>> Index: config/i386/i386.c
>> ===================================================================
>> --- config/i386/i386.c  (revision 263307)
>> +++ config/i386/i386.c  (working copy)
>> @@ -13281,8 +13281,7 @@
>>           recompute_frame_layout_p = true;
>>         }
>>      }
>> -  else if (crtl->max_used_stack_slot_alignment
>> -          > crtl->preferred_stack_boundary)
>> +  else if (crtl->max_used_stack_slot_alignment > STACK_BOUNDARY)
>>      {
>
> This isn't correct..  cygming.h has
>
> #define STACK_BOUNDARY (TARGET_64BIT && ix86_abi == MS_ABI ? 128 :
> BITS_PER_WORD)
>
> darwin.h has
>
> #undef STACK_BOUNDARY
> #define STACK_BOUNDARY \
>  ((profile_flag || (TARGET_64BIT && ix86_abi == MS_ABI)) \
>   ? 128 : BITS_PER_WORD)
>
> i386.h has
>
> /* Boundary (in *bits*) on which stack pointer should be aligned.  */
> #define STACK_BOUNDARY \
>  (TARGET_64BIT && ix86_abi == MS_ABI ? 128 : BITS_PER_WORD)
>
> If STACK_BOUNDARY is 128 and max_used_stack_slot_alignment is 128,
> we will get segment when 128bit aligned load/store is generated on misaligned
> stack slot.
>
>>        /* We don't need to realign stack.  But we still need to keep
>>          stack frame properly aligned to satisfy the largest alignment
>> --cut here--
>>
>> (The testcase works OK with -mabi=ms, which somehow suggests that we
>> don't need realignment in this case).
>
> We may not hit 128bit aligned load/store on misaligned stack slot in this
> case.  It doesn't mean that won't happen.
>
> else if (crtl->max_used_stack_slot_alignment > 64)
>
> is the correct thing to do here.

OK, but please add a comment, so in the future we will still know the
purpose of the magic number.

Thanks,
Uros.
H.J. Lu Aug. 4, 2018, 10:48 p.m. | #6
On Sat, Aug 04, 2018 at 11:48:15PM +0200, Uros Bizjak wrote:
> On Sat, Aug 4, 2018 at 9:49 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> > On Sat, Aug 4, 2018 at 12:09 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> >> On Sat, Aug 4, 2018 at 3:59 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> >>> On Sat, Aug 4, 2018 at 3:42 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> >>>> On Fri, Aug 3, 2018 at 12:55 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
> >>>>> We should always set cfun->machine->max_used_stack_alignment if the
> >>>>> maximum stack slot alignment may be greater than 64 bits.
> >>>>>
> >>>>> Tested on i686 and x86-64.  OK for master and backport for GCC 8?
> >>>>
> >>>> Can you explain why 64 bits, and what this value represents? Is this
> >>>> value the same for 64bit and 32bit targets?
> >>>>
> >>>> Should crtl->max_used_stack_slot_alignment be compared to
> >>>> STACK_BOUNDARY or even MIN_STACK_BOUNDARY instead?
> >>>
> >>> In this case, we don't need to realign the incoming stack since both
> >>> crtl->max_used_stack_slot_alignment and crtl->preferred_stack_boundary
> >>> are 128 bits.  We don't compute the largest alignment of stack slots to
> >>> keep stack frame properly aligned for it.  Normally it is OK.   But if
> >>> the largest alignment of stack slots > 64 bits and we don't keep stack
> >>> frame proper aligned, we will get segfault if aligned vector load/store
> >>> are used on these unaligned stack slots. My patch computes the
> >>> largest alignment of stack slots, which are actually used,  if the
> >>> largest alignment of stack slots allocated is > 64 bits, which is
> >>> the smallest alignment for misaligned load/store.
> >>
> >> Does > 64 bits also hold for 32 bit targets, and -mabi=ms? I think
> >> that we need to compare to STACK_BOUNDARY instead:
> >
> > 64 bits requirement is independent of any psABIs nor 32-bit vs 64-bit.
> > cfun->machine->max_used_stack_alignment is used to decide how
> > stack frame should be aligned.  It is always safe to compute it.  I used
> >
> > else if (crtl->max_used_stack_slot_alignment > 64)
> >
> > to compute cfun->machine->max_used_stack_alignment only if
> > we have to.
> >
> >> --cut here--
> >> Index: config/i386/i386.c
> >> ===================================================================
> >> --- config/i386/i386.c  (revision 263307)
> >> +++ config/i386/i386.c  (working copy)
> >> @@ -13281,8 +13281,7 @@
> >>           recompute_frame_layout_p = true;
> >>         }
> >>      }
> >> -  else if (crtl->max_used_stack_slot_alignment
> >> -          > crtl->preferred_stack_boundary)
> >> +  else if (crtl->max_used_stack_slot_alignment > STACK_BOUNDARY)
> >>      {
> >
> > This isn't correct..  cygming.h has
> >
> > #define STACK_BOUNDARY (TARGET_64BIT && ix86_abi == MS_ABI ? 128 :
> > BITS_PER_WORD)
> >
> > darwin.h has
> >
> > #undef STACK_BOUNDARY
> > #define STACK_BOUNDARY \
> >  ((profile_flag || (TARGET_64BIT && ix86_abi == MS_ABI)) \
> >   ? 128 : BITS_PER_WORD)
> >
> > i386.h has
> >
> > /* Boundary (in *bits*) on which stack pointer should be aligned.  */
> > #define STACK_BOUNDARY \
> >  (TARGET_64BIT && ix86_abi == MS_ABI ? 128 : BITS_PER_WORD)
> >
> > If STACK_BOUNDARY is 128 and max_used_stack_slot_alignment is 128,
> > we will get segment when 128bit aligned load/store is generated on misaligned
> > stack slot.
> >
> >>        /* We don't need to realign stack.  But we still need to keep
> >>          stack frame properly aligned to satisfy the largest alignment
> >> --cut here--
> >>
> >> (The testcase works OK with -mabi=ms, which somehow suggests that we
> >> don't need realignment in this case).
> >
> > We may not hit 128bit aligned load/store on misaligned stack slot in this
> > case.  It doesn't mean that won't happen.
> >
> > else if (crtl->max_used_stack_slot_alignment > 64)
> >
> > is the correct thing to do here.
> 
> OK, but please add a comment, so in the future we will still know the
> purpose of the magic number.
> 

Like this?

H.J.
---
cfun->machine->max_used_stack_alignment is used to decide how stack frame
should be aligned.  This is independent of any psABIs nor 32-bit vs 64-bit.
It is always safe to compute max_used_stack_alignment.  We compute it only
if 128-bit aligned load/store may be generated on misaligned stack slot
which will lead to segfault.

gcc/

	PR target/86386
	* config/i386/i386.c (ix86_finalize_stack_frame_flags): Set
	cfun->machine->max_used_stack_alignment if needed.

gcc/testsuite/

	PR target/86386
	* gcc.target/i386/pr86386.c: New file.
---
 gcc/config/i386/i386.c                  | 14 +++++++------
 gcc/testsuite/gcc.target/i386/pr86386.c | 26 +++++++++++++++++++++++++
 2 files changed, 34 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr86386.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ee409cfe7e4..cf8c33bd909 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13281,12 +13281,14 @@ ix86_finalize_stack_frame_flags (void)
 	  recompute_frame_layout_p = true;
 	}
     }
-  else if (crtl->max_used_stack_slot_alignment
-	   > crtl->preferred_stack_boundary)
-    {
-      /* We don't need to realign stack.  But we still need to keep
-	 stack frame properly aligned to satisfy the largest alignment
-	 of stack slots.  */
+  else if (crtl->max_used_stack_slot_alignment > 64)
+    {
+      /* We don't need to realign stack.  max_used_stack_alignment is
+	 used to decide how stack frame should be aligned.  This is
+	 independent of any psABIs nor 32-bit vs 64-bit.  It is always
+	 safe to compute max_used_stack_alignment.  We compute it only
+	 if 128-bit aligned load/store may be generated on misaligned
+	 stack slot which will lead to segfault.   */
       if (ix86_find_max_used_stack_alignment (stack_alignment, true))
 	cfun->machine->max_used_stack_alignment
 	  = stack_alignment / BITS_PER_UNIT;
diff --git a/gcc/testsuite/gcc.target/i386/pr86386.c b/gcc/testsuite/gcc.target/i386/pr86386.c
new file mode 100644
index 00000000000..a67cf45444e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr86386.c
@@ -0,0 +1,26 @@
+/* PR target/86386 */
+/* { dg-do run { target { avx_runtime && int128 } } } */
+/* { dg-options "-Os -fno-tree-dce -mstringop-strategy=vector_loop -mavx" } */
+
+unsigned c, d, e, f;
+
+unsigned __attribute__((noipa))
+foo (unsigned char g, unsigned short h, unsigned i, unsigned long long j,
+     unsigned char k, unsigned short l, unsigned m, unsigned __int128 n)
+{
+  __builtin_memset (&e, 0, 3);
+  n <<= m;
+  __builtin_memcpy (&m, 2 + (char *) &n, 1);
+  m >>= 0;
+  d ^= __builtin_mul_overflow (l, n, &m);
+  return m;
+}
+
+int
+main ()
+{
+  unsigned __int128 x = foo (0, 0, 0, 0, 0, 4, 1, 3);
+  if (x != 24)
+    __builtin_abort ();
+  return 0;
+}
Uros Bizjak Aug. 5, 2018, 7:15 a.m. | #7
On Sun, Aug 5, 2018 at 12:48 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Sat, Aug 04, 2018 at 11:48:15PM +0200, Uros Bizjak wrote:
>> On Sat, Aug 4, 2018 at 9:49 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> > On Sat, Aug 4, 2018 at 12:09 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> >> On Sat, Aug 4, 2018 at 3:59 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> >>> On Sat, Aug 4, 2018 at 3:42 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> >>>> On Fri, Aug 3, 2018 at 12:55 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>> >>>>> We should always set cfun->machine->max_used_stack_alignment if the
>> >>>>> maximum stack slot alignment may be greater than 64 bits.
>> >>>>>
>> >>>>> Tested on i686 and x86-64.  OK for master and backport for GCC 8?
>> >>>>
>> >>>> Can you explain why 64 bits, and what this value represents? Is this
>> >>>> value the same for 64bit and 32bit targets?
>> >>>>
>> >>>> Should crtl->max_used_stack_slot_alignment be compared to
>> >>>> STACK_BOUNDARY or even MIN_STACK_BOUNDARY instead?
>> >>>
>> >>> In this case, we don't need to realign the incoming stack since both
>> >>> crtl->max_used_stack_slot_alignment and crtl->preferred_stack_boundary
>> >>> are 128 bits.  We don't compute the largest alignment of stack slots to
>> >>> keep stack frame properly aligned for it.  Normally it is OK.   But if
>> >>> the largest alignment of stack slots > 64 bits and we don't keep stack
>> >>> frame proper aligned, we will get segfault if aligned vector load/store
>> >>> are used on these unaligned stack slots. My patch computes the
>> >>> largest alignment of stack slots, which are actually used,  if the
>> >>> largest alignment of stack slots allocated is > 64 bits, which is
>> >>> the smallest alignment for misaligned load/store.
>> >>
>> >> Does > 64 bits also hold for 32 bit targets, and -mabi=ms? I think
>> >> that we need to compare to STACK_BOUNDARY instead:
>> >
>> > 64 bits requirement is independent of any psABIs nor 32-bit vs 64-bit.
>> > cfun->machine->max_used_stack_alignment is used to decide how
>> > stack frame should be aligned.  It is always safe to compute it.  I used
>> >
>> > else if (crtl->max_used_stack_slot_alignment > 64)
>> >
>> > to compute cfun->machine->max_used_stack_alignment only if
>> > we have to.
>> >
>> >> --cut here--
>> >> Index: config/i386/i386.c
>> >> ===================================================================
>> >> --- config/i386/i386.c  (revision 263307)
>> >> +++ config/i386/i386.c  (working copy)
>> >> @@ -13281,8 +13281,7 @@
>> >>           recompute_frame_layout_p = true;
>> >>         }
>> >>      }
>> >> -  else if (crtl->max_used_stack_slot_alignment
>> >> -          > crtl->preferred_stack_boundary)
>> >> +  else if (crtl->max_used_stack_slot_alignment > STACK_BOUNDARY)
>> >>      {
>> >
>> > This isn't correct..  cygming.h has
>> >
>> > #define STACK_BOUNDARY (TARGET_64BIT && ix86_abi == MS_ABI ? 128 :
>> > BITS_PER_WORD)
>> >
>> > darwin.h has
>> >
>> > #undef STACK_BOUNDARY
>> > #define STACK_BOUNDARY \
>> >  ((profile_flag || (TARGET_64BIT && ix86_abi == MS_ABI)) \
>> >   ? 128 : BITS_PER_WORD)
>> >
>> > i386.h has
>> >
>> > /* Boundary (in *bits*) on which stack pointer should be aligned.  */
>> > #define STACK_BOUNDARY \
>> >  (TARGET_64BIT && ix86_abi == MS_ABI ? 128 : BITS_PER_WORD)
>> >
>> > If STACK_BOUNDARY is 128 and max_used_stack_slot_alignment is 128,
>> > we will get segment when 128bit aligned load/store is generated on misaligned
>> > stack slot.
>> >
>> >>        /* We don't need to realign stack.  But we still need to keep
>> >>          stack frame properly aligned to satisfy the largest alignment
>> >> --cut here--
>> >>
>> >> (The testcase works OK with -mabi=ms, which somehow suggests that we
>> >> don't need realignment in this case).
>> >
>> > We may not hit 128bit aligned load/store on misaligned stack slot in this
>> > case.  It doesn't mean that won't happen.
>> >
>> > else if (crtl->max_used_stack_slot_alignment > 64)
>> >
>> > is the correct thing to do here.
>>
>> OK, but please add a comment, so in the future we will still know the
>> purpose of the magic number.
>>
>
> Like this?
>
> H.J.
> ---
> cfun->machine->max_used_stack_alignment is used to decide how stack frame
> should be aligned.  This is independent of any psABIs nor 32-bit vs 64-bit.
> It is always safe to compute max_used_stack_alignment.  We compute it only
> if 128-bit aligned load/store may be generated on misaligned stack slot
> which will lead to segfault.
>
> gcc/
>
>         PR target/86386
>         * config/i386/i386.c (ix86_finalize_stack_frame_flags): Set
>         cfun->machine->max_used_stack_alignment if needed.
>
> gcc/testsuite/
>
>         PR target/86386
>         * gcc.target/i386/pr86386.c: New file.

OK, but please write the condition as ">= 128". The number 64 confused
me; I was thinking that it has something to do with minimum alignment
on 64bit target, while 128 clearly shows that alignment is related to
SSE moves. With ">= 128", I think that code is clear enough that a
long comment is not needed.

Thanks, and sorry for the confusion,
Uros.

> ---
>  gcc/config/i386/i386.c                  | 14 +++++++------
>  gcc/testsuite/gcc.target/i386/pr86386.c | 26 +++++++++++++++++++++++++
>  2 files changed, 34 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr86386.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index ee409cfe7e4..cf8c33bd909 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -13281,12 +13281,14 @@ ix86_finalize_stack_frame_flags (void)
>           recompute_frame_layout_p = true;
>         }
>      }
> -  else if (crtl->max_used_stack_slot_alignment
> -          > crtl->preferred_stack_boundary)
> -    {
> -      /* We don't need to realign stack.  But we still need to keep
> -        stack frame properly aligned to satisfy the largest alignment
> -        of stack slots.  */
> +  else if (crtl->max_used_stack_slot_alignment > 64)
> +    {
> +      /* We don't need to realign stack.  max_used_stack_alignment is
> +        used to decide how stack frame should be aligned.  This is
> +        independent of any psABIs nor 32-bit vs 64-bit.  It is always
> +        safe to compute max_used_stack_alignment.  We compute it only
> +        if 128-bit aligned load/store may be generated on misaligned
> +        stack slot which will lead to segfault.   */
>        if (ix86_find_max_used_stack_alignment (stack_alignment, true))
>         cfun->machine->max_used_stack_alignment
>           = stack_alignment / BITS_PER_UNIT;
> diff --git a/gcc/testsuite/gcc.target/i386/pr86386.c b/gcc/testsuite/gcc.target/i386/pr86386.c
> new file mode 100644
> index 00000000000..a67cf45444e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr86386.c
> @@ -0,0 +1,26 @@
> +/* PR target/86386 */
> +/* { dg-do run { target { avx_runtime && int128 } } } */
> +/* { dg-options "-Os -fno-tree-dce -mstringop-strategy=vector_loop -mavx" } */
> +
> +unsigned c, d, e, f;
> +
> +unsigned __attribute__((noipa))
> +foo (unsigned char g, unsigned short h, unsigned i, unsigned long long j,
> +     unsigned char k, unsigned short l, unsigned m, unsigned __int128 n)
> +{
> +  __builtin_memset (&e, 0, 3);
> +  n <<= m;
> +  __builtin_memcpy (&m, 2 + (char *) &n, 1);
> +  m >>= 0;
> +  d ^= __builtin_mul_overflow (l, n, &m);
> +  return m;
> +}
> +
> +int
> +main ()
> +{
> +  unsigned __int128 x = foo (0, 0, 0, 0, 0, 4, 1, 3);
> +  if (x != 24)
> +    __builtin_abort ();
> +  return 0;
> +}
> --
> 2.17.1
>
H.J. Lu Aug. 5, 2018, 12:51 p.m. | #8
On Sun, Aug 5, 2018 at 12:15 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> OK, but please add a comment, so in the future we will still know the
>>> purpose of the magic number.
>>>
>>
>> Like this?
>>
>> H.J.
>> ---
>> cfun->machine->max_used_stack_alignment is used to decide how stack frame
>> should be aligned.  This is independent of any psABIs nor 32-bit vs 64-bit.
>> It is always safe to compute max_used_stack_alignment.  We compute it only
>> if 128-bit aligned load/store may be generated on misaligned stack slot
>> which will lead to segfault.
>>
>> gcc/
>>
>>         PR target/86386
>>         * config/i386/i386.c (ix86_finalize_stack_frame_flags): Set
>>         cfun->machine->max_used_stack_alignment if needed.
>>
>> gcc/testsuite/
>>
>>         PR target/86386
>>         * gcc.target/i386/pr86386.c: New file.
>
> OK, but please write the condition as ">= 128". The number 64 confused
> me; I was thinking that it has something to do with minimum alignment
> on 64bit target, while 128 clearly shows that alignment is related to
> SSE moves. With ">= 128", I think that code is clear enough that a
> long comment is not needed.
>
> Thanks, and sorry for the confusion,
> Uros.
>

This is what I checked in.  I kept the comment change.  I will backport it
to GCC 8 after a few days.

Thanks.

Patch

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ee409cfe7e4..4a0a050b3a2 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13281,12 +13281,12 @@  ix86_finalize_stack_frame_flags (void)
 	  recompute_frame_layout_p = true;
 	}
     }
-  else if (crtl->max_used_stack_slot_alignment
-	   > crtl->preferred_stack_boundary)
+  else if (crtl->max_used_stack_slot_alignment > 64)
     {
       /* We don't need to realign stack.  But we still need to keep
 	 stack frame properly aligned to satisfy the largest alignment
-	 of stack slots.  */
+	 of stack slots if the maximum stack slot alignment may be
+	 greater than 64 bits.  */
       if (ix86_find_max_used_stack_alignment (stack_alignment, true))
 	cfun->machine->max_used_stack_alignment
 	  = stack_alignment / BITS_PER_UNIT;
diff --git a/gcc/testsuite/gcc.target/i386/pr86386.c b/gcc/testsuite/gcc.target/i386/pr86386.c
new file mode 100644
index 00000000000..a67cf45444e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr86386.c
@@ -0,0 +1,26 @@ 
+/* PR target/86386 */
+/* { dg-do run { target { avx_runtime && int128 } } } */
+/* { dg-options "-Os -fno-tree-dce -mstringop-strategy=vector_loop -mavx" } */
+
+unsigned c, d, e, f;
+
+unsigned __attribute__((noipa))
+foo (unsigned char g, unsigned short h, unsigned i, unsigned long long j,
+     unsigned char k, unsigned short l, unsigned m, unsigned __int128 n)
+{
+  __builtin_memset (&e, 0, 3);
+  n <<= m;
+  __builtin_memcpy (&m, 2 + (char *) &n, 1);
+  m >>= 0;
+  d ^= __builtin_mul_overflow (l, n, &m);
+  return m;
+}
+
+int
+main ()
+{
+  unsigned __int128 x = foo (0, 0, 0, 0, 0, 4, 1, 3);
+  if (x != 24)
+    __builtin_abort ();
+  return 0;
+}