Message ID | 20180802225548.GA1730@intel.com |
---|---|
State | New |
Headers | show |
Series | i386: Always set cfun->machine->max_used_stack_alignment | expand |
On Fri, Aug 3, 2018 at 12:55 AM, H.J. Lu <hongjiu.lu@intel.com> wrote: > We should always set cfun->machine->max_used_stack_alignment if the > maximum stack slot alignment may be greater than 64 bits. > > Tested on i686 and x86-64. OK for master and backport for GCC 8? Can you explain why 64 bits, and what this value represents? Is this value the same for 64bit and 32bit targets? Should crtl->max_used_stack_slot_alignment be compared to STACK_BOUNDARY or even MIN_STACK_BOUNDARY instead? Uros. > Thanks. > > H.J. > ---- > gcc/ > > PR target/86386 > * config/i386/i386.c (ix86_finalize_stack_frame_flags): Set > cfun->machine->max_used_stack_alignment if needed. > > gcc/testsuite/ > > PR target/86386 > * gcc.target/i386/pr86386.c: New file. > --- > gcc/config/i386/i386.c | 6 +++--- > gcc/testsuite/gcc.target/i386/pr86386.c | 26 +++++++++++++++++++++++++ > 2 files changed, 29 insertions(+), 3 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr86386.c > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index ee409cfe7e4..4a0a050b3a2 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -13281,12 +13281,12 @@ ix86_finalize_stack_frame_flags (void) > recompute_frame_layout_p = true; > } > } > - else if (crtl->max_used_stack_slot_alignment > - > crtl->preferred_stack_boundary) > + else if (crtl->max_used_stack_slot_alignment > 64) > { > /* We don't need to realign stack. But we still need to keep > stack frame properly aligned to satisfy the largest alignment > - of stack slots. */ > + of stack slots if the maximum stack slot alignment may be > + greater than 64 bits. */ > if (ix86_find_max_used_stack_alignment (stack_alignment, true)) > cfun->machine->max_used_stack_alignment > = stack_alignment / BITS_PER_UNIT; > diff --git a/gcc/testsuite/gcc.target/i386/pr86386.c b/gcc/testsuite/gcc.target/i386/pr86386.c > new file mode 100644 > index 00000000000..a67cf45444e > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr86386.c > @@ -0,0 +1,26 @@ > +/* PR target/86386 */ > +/* { dg-do run { target { avx_runtime && int128 } } } */ > +/* { dg-options "-Os -fno-tree-dce -mstringop-strategy=vector_loop -mavx" } */ > + > +unsigned c, d, e, f; > + > +unsigned __attribute__((noipa)) > +foo (unsigned char g, unsigned short h, unsigned i, unsigned long long j, > + unsigned char k, unsigned short l, unsigned m, unsigned __int128 n) > +{ > + __builtin_memset (&e, 0, 3); > + n <<= m; > + __builtin_memcpy (&m, 2 + (char *) &n, 1); > + m >>= 0; > + d ^= __builtin_mul_overflow (l, n, &m); > + return m; > +} > + > +int > +main () > +{ > + unsigned __int128 x = foo (0, 0, 0, 0, 0, 4, 1, 3); > + if (x != 24) > + __builtin_abort (); > + return 0; > +} > -- > 2.17.1 >
On Sat, Aug 4, 2018 at 3:42 AM, Uros Bizjak <ubizjak@gmail.com> wrote: > On Fri, Aug 3, 2018 at 12:55 AM, H.J. Lu <hongjiu.lu@intel.com> wrote: >> We should always set cfun->machine->max_used_stack_alignment if the >> maximum stack slot alignment may be greater than 64 bits. >> >> Tested on i686 and x86-64. OK for master and backport for GCC 8? > > Can you explain why 64 bits, and what this value represents? Is this > value the same for 64bit and 32bit targets? > > Should crtl->max_used_stack_slot_alignment be compared to > STACK_BOUNDARY or even MIN_STACK_BOUNDARY instead? In this case, we don't need to realign the incoming stack since both crtl->max_used_stack_slot_alignment and crtl->preferred_stack_boundary are 128 bits. We don't compute the largest alignment of stack slots to keep stack frame properly aligned for it. Normally it is OK. But if the largest alignment of stack slots > 64 bits and we don't keep stack frame proper aligned, we will get segfault if aligned vector load/store are used on these unaligned stack slots. My patch computes the largest alignment of stack slots, which are actually used, if the largest alignment of stack slots allocated is > 64 bits, which is the smallest alignment for misaligned load/store. Here is the diff of before and after: diff -up old/x.s new/x.s --- old/x.s 2018-08-02 12:39:22.916400504 -0700 +++ new/x.s 2018-08-02 12:38:35.853729415 -0700 @@ -15,6 +15,7 @@ foo: movq %rsp, %rbp .cfi_def_cfa_register 6 pushq %rbx + subq $8, %rsp <<<<<<<<<<< Stack frame is properly aligned to 16 bytes. .cfi_offset 3, -24 stosw movl 16(%rbp), %ecx @@ -65,6 +66,7 @@ foo: .L9: xorl %r8d, d(%rip) movl %esi, %eax + popq %rdx popq %rbx popq %rbp .cfi_def_cfa 7, 8
On Sat, Aug 4, 2018 at 3:59 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Sat, Aug 4, 2018 at 3:42 AM, Uros Bizjak <ubizjak@gmail.com> wrote: >> On Fri, Aug 3, 2018 at 12:55 AM, H.J. Lu <hongjiu.lu@intel.com> wrote: >>> We should always set cfun->machine->max_used_stack_alignment if the >>> maximum stack slot alignment may be greater than 64 bits. >>> >>> Tested on i686 and x86-64. OK for master and backport for GCC 8? >> >> Can you explain why 64 bits, and what this value represents? Is this >> value the same for 64bit and 32bit targets? >> >> Should crtl->max_used_stack_slot_alignment be compared to >> STACK_BOUNDARY or even MIN_STACK_BOUNDARY instead? > > In this case, we don't need to realign the incoming stack since both > crtl->max_used_stack_slot_alignment and crtl->preferred_stack_boundary > are 128 bits. We don't compute the largest alignment of stack slots to > keep stack frame properly aligned for it. Normally it is OK. But if > the largest alignment of stack slots > 64 bits and we don't keep stack > frame proper aligned, we will get segfault if aligned vector load/store > are used on these unaligned stack slots. My patch computes the > largest alignment of stack slots, which are actually used, if the > largest alignment of stack slots allocated is > 64 bits, which is > the smallest alignment for misaligned load/store. Does > 64 bits also hold for 32 bit targets, and -mabi=ms? I think that we need to compare to STACK_BOUNDARY instead: --cut here-- Index: config/i386/i386.c =================================================================== --- config/i386/i386.c (revision 263307) +++ config/i386/i386.c (working copy) @@ -13281,8 +13281,7 @@ recompute_frame_layout_p = true; } } - else if (crtl->max_used_stack_slot_alignment - > crtl->preferred_stack_boundary) + else if (crtl->max_used_stack_slot_alignment > STACK_BOUNDARY) { /* We don't need to realign stack. But we still need to keep stack frame properly aligned to satisfy the largest alignment --cut here-- (The testcase works OK with -mabi=ms, which somehow suggests that we don't need realignment in this case). Uros.
On Sat, Aug 4, 2018 at 12:09 PM, Uros Bizjak <ubizjak@gmail.com> wrote: > On Sat, Aug 4, 2018 at 3:59 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >> On Sat, Aug 4, 2018 at 3:42 AM, Uros Bizjak <ubizjak@gmail.com> wrote: >>> On Fri, Aug 3, 2018 at 12:55 AM, H.J. Lu <hongjiu.lu@intel.com> wrote: >>>> We should always set cfun->machine->max_used_stack_alignment if the >>>> maximum stack slot alignment may be greater than 64 bits. >>>> >>>> Tested on i686 and x86-64. OK for master and backport for GCC 8? >>> >>> Can you explain why 64 bits, and what this value represents? Is this >>> value the same for 64bit and 32bit targets? >>> >>> Should crtl->max_used_stack_slot_alignment be compared to >>> STACK_BOUNDARY or even MIN_STACK_BOUNDARY instead? >> >> In this case, we don't need to realign the incoming stack since both >> crtl->max_used_stack_slot_alignment and crtl->preferred_stack_boundary >> are 128 bits. We don't compute the largest alignment of stack slots to >> keep stack frame properly aligned for it. Normally it is OK. But if >> the largest alignment of stack slots > 64 bits and we don't keep stack >> frame proper aligned, we will get segfault if aligned vector load/store >> are used on these unaligned stack slots. My patch computes the >> largest alignment of stack slots, which are actually used, if the >> largest alignment of stack slots allocated is > 64 bits, which is >> the smallest alignment for misaligned load/store. > > Does > 64 bits also hold for 32 bit targets, and -mabi=ms? I think > that we need to compare to STACK_BOUNDARY instead: 64 bits requirement is independent of any psABIs nor 32-bit vs 64-bit. cfun->machine->max_used_stack_alignment is used to decide how stack frame should be aligned. It is always safe to compute it. I used else if (crtl->max_used_stack_slot_alignment > 64) to compute cfun->machine->max_used_stack_alignment only if we have to. > --cut here-- > Index: config/i386/i386.c > =================================================================== > --- config/i386/i386.c (revision 263307) > +++ config/i386/i386.c (working copy) > @@ -13281,8 +13281,7 @@ > recompute_frame_layout_p = true; > } > } > - else if (crtl->max_used_stack_slot_alignment > - > crtl->preferred_stack_boundary) > + else if (crtl->max_used_stack_slot_alignment > STACK_BOUNDARY) > { This isn't correct.. cygming.h has #define STACK_BOUNDARY (TARGET_64BIT && ix86_abi == MS_ABI ? 128 : BITS_PER_WORD) darwin.h has #undef STACK_BOUNDARY #define STACK_BOUNDARY \ ((profile_flag || (TARGET_64BIT && ix86_abi == MS_ABI)) \ ? 128 : BITS_PER_WORD) i386.h has /* Boundary (in *bits*) on which stack pointer should be aligned. */ #define STACK_BOUNDARY \ (TARGET_64BIT && ix86_abi == MS_ABI ? 128 : BITS_PER_WORD) If STACK_BOUNDARY is 128 and max_used_stack_slot_alignment is 128, we will get segment when 128bit aligned load/store is generated on misaligned stack slot. > /* We don't need to realign stack. But we still need to keep > stack frame properly aligned to satisfy the largest alignment > --cut here-- > > (The testcase works OK with -mabi=ms, which somehow suggests that we > don't need realignment in this case). We may not hit 128bit aligned load/store on misaligned stack slot in this case. It doesn't mean that won't happen. else if (crtl->max_used_stack_slot_alignment > 64) is the correct thing to do here.
On Sat, Aug 4, 2018 at 9:49 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Sat, Aug 4, 2018 at 12:09 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >> On Sat, Aug 4, 2018 at 3:59 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >>> On Sat, Aug 4, 2018 at 3:42 AM, Uros Bizjak <ubizjak@gmail.com> wrote: >>>> On Fri, Aug 3, 2018 at 12:55 AM, H.J. Lu <hongjiu.lu@intel.com> wrote: >>>>> We should always set cfun->machine->max_used_stack_alignment if the >>>>> maximum stack slot alignment may be greater than 64 bits. >>>>> >>>>> Tested on i686 and x86-64. OK for master and backport for GCC 8? >>>> >>>> Can you explain why 64 bits, and what this value represents? Is this >>>> value the same for 64bit and 32bit targets? >>>> >>>> Should crtl->max_used_stack_slot_alignment be compared to >>>> STACK_BOUNDARY or even MIN_STACK_BOUNDARY instead? >>> >>> In this case, we don't need to realign the incoming stack since both >>> crtl->max_used_stack_slot_alignment and crtl->preferred_stack_boundary >>> are 128 bits. We don't compute the largest alignment of stack slots to >>> keep stack frame properly aligned for it. Normally it is OK. But if >>> the largest alignment of stack slots > 64 bits and we don't keep stack >>> frame proper aligned, we will get segfault if aligned vector load/store >>> are used on these unaligned stack slots. My patch computes the >>> largest alignment of stack slots, which are actually used, if the >>> largest alignment of stack slots allocated is > 64 bits, which is >>> the smallest alignment for misaligned load/store. >> >> Does > 64 bits also hold for 32 bit targets, and -mabi=ms? I think >> that we need to compare to STACK_BOUNDARY instead: > > 64 bits requirement is independent of any psABIs nor 32-bit vs 64-bit. > cfun->machine->max_used_stack_alignment is used to decide how > stack frame should be aligned. It is always safe to compute it. I used > > else if (crtl->max_used_stack_slot_alignment > 64) > > to compute cfun->machine->max_used_stack_alignment only if > we have to. > >> --cut here-- >> Index: config/i386/i386.c >> =================================================================== >> --- config/i386/i386.c (revision 263307) >> +++ config/i386/i386.c (working copy) >> @@ -13281,8 +13281,7 @@ >> recompute_frame_layout_p = true; >> } >> } >> - else if (crtl->max_used_stack_slot_alignment >> - > crtl->preferred_stack_boundary) >> + else if (crtl->max_used_stack_slot_alignment > STACK_BOUNDARY) >> { > > This isn't correct.. cygming.h has > > #define STACK_BOUNDARY (TARGET_64BIT && ix86_abi == MS_ABI ? 128 : > BITS_PER_WORD) > > darwin.h has > > #undef STACK_BOUNDARY > #define STACK_BOUNDARY \ > ((profile_flag || (TARGET_64BIT && ix86_abi == MS_ABI)) \ > ? 128 : BITS_PER_WORD) > > i386.h has > > /* Boundary (in *bits*) on which stack pointer should be aligned. */ > #define STACK_BOUNDARY \ > (TARGET_64BIT && ix86_abi == MS_ABI ? 128 : BITS_PER_WORD) > > If STACK_BOUNDARY is 128 and max_used_stack_slot_alignment is 128, > we will get segment when 128bit aligned load/store is generated on misaligned > stack slot. > >> /* We don't need to realign stack. But we still need to keep >> stack frame properly aligned to satisfy the largest alignment >> --cut here-- >> >> (The testcase works OK with -mabi=ms, which somehow suggests that we >> don't need realignment in this case). > > We may not hit 128bit aligned load/store on misaligned stack slot in this > case. It doesn't mean that won't happen. > > else if (crtl->max_used_stack_slot_alignment > 64) > > is the correct thing to do here. OK, but please add a comment, so in the future we will still know the purpose of the magic number. Thanks, Uros.
On Sat, Aug 04, 2018 at 11:48:15PM +0200, Uros Bizjak wrote: > On Sat, Aug 4, 2018 at 9:49 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > > On Sat, Aug 4, 2018 at 12:09 PM, Uros Bizjak <ubizjak@gmail.com> wrote: > >> On Sat, Aug 4, 2018 at 3:59 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > >>> On Sat, Aug 4, 2018 at 3:42 AM, Uros Bizjak <ubizjak@gmail.com> wrote: > >>>> On Fri, Aug 3, 2018 at 12:55 AM, H.J. Lu <hongjiu.lu@intel.com> wrote: > >>>>> We should always set cfun->machine->max_used_stack_alignment if the > >>>>> maximum stack slot alignment may be greater than 64 bits. > >>>>> > >>>>> Tested on i686 and x86-64. OK for master and backport for GCC 8? > >>>> > >>>> Can you explain why 64 bits, and what this value represents? Is this > >>>> value the same for 64bit and 32bit targets? > >>>> > >>>> Should crtl->max_used_stack_slot_alignment be compared to > >>>> STACK_BOUNDARY or even MIN_STACK_BOUNDARY instead? > >>> > >>> In this case, we don't need to realign the incoming stack since both > >>> crtl->max_used_stack_slot_alignment and crtl->preferred_stack_boundary > >>> are 128 bits. We don't compute the largest alignment of stack slots to > >>> keep stack frame properly aligned for it. Normally it is OK. But if > >>> the largest alignment of stack slots > 64 bits and we don't keep stack > >>> frame proper aligned, we will get segfault if aligned vector load/store > >>> are used on these unaligned stack slots. My patch computes the > >>> largest alignment of stack slots, which are actually used, if the > >>> largest alignment of stack slots allocated is > 64 bits, which is > >>> the smallest alignment for misaligned load/store. > >> > >> Does > 64 bits also hold for 32 bit targets, and -mabi=ms? I think > >> that we need to compare to STACK_BOUNDARY instead: > > > > 64 bits requirement is independent of any psABIs nor 32-bit vs 64-bit. > > cfun->machine->max_used_stack_alignment is used to decide how > > stack frame should be aligned. It is always safe to compute it. I used > > > > else if (crtl->max_used_stack_slot_alignment > 64) > > > > to compute cfun->machine->max_used_stack_alignment only if > > we have to. > > > >> --cut here-- > >> Index: config/i386/i386.c > >> =================================================================== > >> --- config/i386/i386.c (revision 263307) > >> +++ config/i386/i386.c (working copy) > >> @@ -13281,8 +13281,7 @@ > >> recompute_frame_layout_p = true; > >> } > >> } > >> - else if (crtl->max_used_stack_slot_alignment > >> - > crtl->preferred_stack_boundary) > >> + else if (crtl->max_used_stack_slot_alignment > STACK_BOUNDARY) > >> { > > > > This isn't correct.. cygming.h has > > > > #define STACK_BOUNDARY (TARGET_64BIT && ix86_abi == MS_ABI ? 128 : > > BITS_PER_WORD) > > > > darwin.h has > > > > #undef STACK_BOUNDARY > > #define STACK_BOUNDARY \ > > ((profile_flag || (TARGET_64BIT && ix86_abi == MS_ABI)) \ > > ? 128 : BITS_PER_WORD) > > > > i386.h has > > > > /* Boundary (in *bits*) on which stack pointer should be aligned. */ > > #define STACK_BOUNDARY \ > > (TARGET_64BIT && ix86_abi == MS_ABI ? 128 : BITS_PER_WORD) > > > > If STACK_BOUNDARY is 128 and max_used_stack_slot_alignment is 128, > > we will get segment when 128bit aligned load/store is generated on misaligned > > stack slot. > > > >> /* We don't need to realign stack. But we still need to keep > >> stack frame properly aligned to satisfy the largest alignment > >> --cut here-- > >> > >> (The testcase works OK with -mabi=ms, which somehow suggests that we > >> don't need realignment in this case). > > > > We may not hit 128bit aligned load/store on misaligned stack slot in this > > case. It doesn't mean that won't happen. > > > > else if (crtl->max_used_stack_slot_alignment > 64) > > > > is the correct thing to do here. > > OK, but please add a comment, so in the future we will still know the > purpose of the magic number. > Like this? H.J. --- cfun->machine->max_used_stack_alignment is used to decide how stack frame should be aligned. This is independent of any psABIs nor 32-bit vs 64-bit. It is always safe to compute max_used_stack_alignment. We compute it only if 128-bit aligned load/store may be generated on misaligned stack slot which will lead to segfault. gcc/ PR target/86386 * config/i386/i386.c (ix86_finalize_stack_frame_flags): Set cfun->machine->max_used_stack_alignment if needed. gcc/testsuite/ PR target/86386 * gcc.target/i386/pr86386.c: New file. --- gcc/config/i386/i386.c | 14 +++++++------ gcc/testsuite/gcc.target/i386/pr86386.c | 26 +++++++++++++++++++++++++ 2 files changed, 34 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr86386.c diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index ee409cfe7e4..cf8c33bd909 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -13281,12 +13281,14 @@ ix86_finalize_stack_frame_flags (void) recompute_frame_layout_p = true; } } - else if (crtl->max_used_stack_slot_alignment - > crtl->preferred_stack_boundary) - { - /* We don't need to realign stack. But we still need to keep - stack frame properly aligned to satisfy the largest alignment - of stack slots. */ + else if (crtl->max_used_stack_slot_alignment > 64) + { + /* We don't need to realign stack. max_used_stack_alignment is + used to decide how stack frame should be aligned. This is + independent of any psABIs nor 32-bit vs 64-bit. It is always + safe to compute max_used_stack_alignment. We compute it only + if 128-bit aligned load/store may be generated on misaligned + stack slot which will lead to segfault. */ if (ix86_find_max_used_stack_alignment (stack_alignment, true)) cfun->machine->max_used_stack_alignment = stack_alignment / BITS_PER_UNIT; diff --git a/gcc/testsuite/gcc.target/i386/pr86386.c b/gcc/testsuite/gcc.target/i386/pr86386.c new file mode 100644 index 00000000000..a67cf45444e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr86386.c @@ -0,0 +1,26 @@ +/* PR target/86386 */ +/* { dg-do run { target { avx_runtime && int128 } } } */ +/* { dg-options "-Os -fno-tree-dce -mstringop-strategy=vector_loop -mavx" } */ + +unsigned c, d, e, f; + +unsigned __attribute__((noipa)) +foo (unsigned char g, unsigned short h, unsigned i, unsigned long long j, + unsigned char k, unsigned short l, unsigned m, unsigned __int128 n) +{ + __builtin_memset (&e, 0, 3); + n <<= m; + __builtin_memcpy (&m, 2 + (char *) &n, 1); + m >>= 0; + d ^= __builtin_mul_overflow (l, n, &m); + return m; +} + +int +main () +{ + unsigned __int128 x = foo (0, 0, 0, 0, 0, 4, 1, 3); + if (x != 24) + __builtin_abort (); + return 0; +}
On Sun, Aug 5, 2018 at 12:48 AM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Sat, Aug 04, 2018 at 11:48:15PM +0200, Uros Bizjak wrote: >> On Sat, Aug 4, 2018 at 9:49 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >> > On Sat, Aug 4, 2018 at 12:09 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >> >> On Sat, Aug 4, 2018 at 3:59 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >> >>> On Sat, Aug 4, 2018 at 3:42 AM, Uros Bizjak <ubizjak@gmail.com> wrote: >> >>>> On Fri, Aug 3, 2018 at 12:55 AM, H.J. Lu <hongjiu.lu@intel.com> wrote: >> >>>>> We should always set cfun->machine->max_used_stack_alignment if the >> >>>>> maximum stack slot alignment may be greater than 64 bits. >> >>>>> >> >>>>> Tested on i686 and x86-64. OK for master and backport for GCC 8? >> >>>> >> >>>> Can you explain why 64 bits, and what this value represents? Is this >> >>>> value the same for 64bit and 32bit targets? >> >>>> >> >>>> Should crtl->max_used_stack_slot_alignment be compared to >> >>>> STACK_BOUNDARY or even MIN_STACK_BOUNDARY instead? >> >>> >> >>> In this case, we don't need to realign the incoming stack since both >> >>> crtl->max_used_stack_slot_alignment and crtl->preferred_stack_boundary >> >>> are 128 bits. We don't compute the largest alignment of stack slots to >> >>> keep stack frame properly aligned for it. Normally it is OK. But if >> >>> the largest alignment of stack slots > 64 bits and we don't keep stack >> >>> frame proper aligned, we will get segfault if aligned vector load/store >> >>> are used on these unaligned stack slots. My patch computes the >> >>> largest alignment of stack slots, which are actually used, if the >> >>> largest alignment of stack slots allocated is > 64 bits, which is >> >>> the smallest alignment for misaligned load/store. >> >> >> >> Does > 64 bits also hold for 32 bit targets, and -mabi=ms? I think >> >> that we need to compare to STACK_BOUNDARY instead: >> > >> > 64 bits requirement is independent of any psABIs nor 32-bit vs 64-bit. >> > cfun->machine->max_used_stack_alignment is used to decide how >> > stack frame should be aligned. It is always safe to compute it. I used >> > >> > else if (crtl->max_used_stack_slot_alignment > 64) >> > >> > to compute cfun->machine->max_used_stack_alignment only if >> > we have to. >> > >> >> --cut here-- >> >> Index: config/i386/i386.c >> >> =================================================================== >> >> --- config/i386/i386.c (revision 263307) >> >> +++ config/i386/i386.c (working copy) >> >> @@ -13281,8 +13281,7 @@ >> >> recompute_frame_layout_p = true; >> >> } >> >> } >> >> - else if (crtl->max_used_stack_slot_alignment >> >> - > crtl->preferred_stack_boundary) >> >> + else if (crtl->max_used_stack_slot_alignment > STACK_BOUNDARY) >> >> { >> > >> > This isn't correct.. cygming.h has >> > >> > #define STACK_BOUNDARY (TARGET_64BIT && ix86_abi == MS_ABI ? 128 : >> > BITS_PER_WORD) >> > >> > darwin.h has >> > >> > #undef STACK_BOUNDARY >> > #define STACK_BOUNDARY \ >> > ((profile_flag || (TARGET_64BIT && ix86_abi == MS_ABI)) \ >> > ? 128 : BITS_PER_WORD) >> > >> > i386.h has >> > >> > /* Boundary (in *bits*) on which stack pointer should be aligned. */ >> > #define STACK_BOUNDARY \ >> > (TARGET_64BIT && ix86_abi == MS_ABI ? 128 : BITS_PER_WORD) >> > >> > If STACK_BOUNDARY is 128 and max_used_stack_slot_alignment is 128, >> > we will get segment when 128bit aligned load/store is generated on misaligned >> > stack slot. >> > >> >> /* We don't need to realign stack. But we still need to keep >> >> stack frame properly aligned to satisfy the largest alignment >> >> --cut here-- >> >> >> >> (The testcase works OK with -mabi=ms, which somehow suggests that we >> >> don't need realignment in this case). >> > >> > We may not hit 128bit aligned load/store on misaligned stack slot in this >> > case. It doesn't mean that won't happen. >> > >> > else if (crtl->max_used_stack_slot_alignment > 64) >> > >> > is the correct thing to do here. >> >> OK, but please add a comment, so in the future we will still know the >> purpose of the magic number. >> > > Like this? > > H.J. > --- > cfun->machine->max_used_stack_alignment is used to decide how stack frame > should be aligned. This is independent of any psABIs nor 32-bit vs 64-bit. > It is always safe to compute max_used_stack_alignment. We compute it only > if 128-bit aligned load/store may be generated on misaligned stack slot > which will lead to segfault. > > gcc/ > > PR target/86386 > * config/i386/i386.c (ix86_finalize_stack_frame_flags): Set > cfun->machine->max_used_stack_alignment if needed. > > gcc/testsuite/ > > PR target/86386 > * gcc.target/i386/pr86386.c: New file. OK, but please write the condition as ">= 128". The number 64 confused me; I was thinking that it has something to do with minimum alignment on 64bit target, while 128 clearly shows that alignment is related to SSE moves. With ">= 128", I think that code is clear enough that a long comment is not needed. Thanks, and sorry for the confusion, Uros. > --- > gcc/config/i386/i386.c | 14 +++++++------ > gcc/testsuite/gcc.target/i386/pr86386.c | 26 +++++++++++++++++++++++++ > 2 files changed, 34 insertions(+), 6 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr86386.c > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index ee409cfe7e4..cf8c33bd909 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -13281,12 +13281,14 @@ ix86_finalize_stack_frame_flags (void) > recompute_frame_layout_p = true; > } > } > - else if (crtl->max_used_stack_slot_alignment > - > crtl->preferred_stack_boundary) > - { > - /* We don't need to realign stack. But we still need to keep > - stack frame properly aligned to satisfy the largest alignment > - of stack slots. */ > + else if (crtl->max_used_stack_slot_alignment > 64) > + { > + /* We don't need to realign stack. max_used_stack_alignment is > + used to decide how stack frame should be aligned. This is > + independent of any psABIs nor 32-bit vs 64-bit. It is always > + safe to compute max_used_stack_alignment. We compute it only > + if 128-bit aligned load/store may be generated on misaligned > + stack slot which will lead to segfault. */ > if (ix86_find_max_used_stack_alignment (stack_alignment, true)) > cfun->machine->max_used_stack_alignment > = stack_alignment / BITS_PER_UNIT; > diff --git a/gcc/testsuite/gcc.target/i386/pr86386.c b/gcc/testsuite/gcc.target/i386/pr86386.c > new file mode 100644 > index 00000000000..a67cf45444e > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr86386.c > @@ -0,0 +1,26 @@ > +/* PR target/86386 */ > +/* { dg-do run { target { avx_runtime && int128 } } } */ > +/* { dg-options "-Os -fno-tree-dce -mstringop-strategy=vector_loop -mavx" } */ > + > +unsigned c, d, e, f; > + > +unsigned __attribute__((noipa)) > +foo (unsigned char g, unsigned short h, unsigned i, unsigned long long j, > + unsigned char k, unsigned short l, unsigned m, unsigned __int128 n) > +{ > + __builtin_memset (&e, 0, 3); > + n <<= m; > + __builtin_memcpy (&m, 2 + (char *) &n, 1); > + m >>= 0; > + d ^= __builtin_mul_overflow (l, n, &m); > + return m; > +} > + > +int > +main () > +{ > + unsigned __int128 x = foo (0, 0, 0, 0, 0, 4, 1, 3); > + if (x != 24) > + __builtin_abort (); > + return 0; > +} > -- > 2.17.1 >
On Sun, Aug 5, 2018 at 12:15 AM, Uros Bizjak <ubizjak@gmail.com> wrote: >>> OK, but please add a comment, so in the future we will still know the >>> purpose of the magic number. >>> >> >> Like this? >> >> H.J. >> --- >> cfun->machine->max_used_stack_alignment is used to decide how stack frame >> should be aligned. This is independent of any psABIs nor 32-bit vs 64-bit. >> It is always safe to compute max_used_stack_alignment. We compute it only >> if 128-bit aligned load/store may be generated on misaligned stack slot >> which will lead to segfault. >> >> gcc/ >> >> PR target/86386 >> * config/i386/i386.c (ix86_finalize_stack_frame_flags): Set >> cfun->machine->max_used_stack_alignment if needed. >> >> gcc/testsuite/ >> >> PR target/86386 >> * gcc.target/i386/pr86386.c: New file. > > OK, but please write the condition as ">= 128". The number 64 confused > me; I was thinking that it has something to do with minimum alignment > on 64bit target, while 128 clearly shows that alignment is related to > SSE moves. With ">= 128", I think that code is clear enough that a > long comment is not needed. > > Thanks, and sorry for the confusion, > Uros. > This is what I checked in. I kept the comment change. I will backport it to GCC 8 after a few days. Thanks.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index ee409cfe7e4..4a0a050b3a2 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -13281,12 +13281,12 @@ ix86_finalize_stack_frame_flags (void) recompute_frame_layout_p = true; } } - else if (crtl->max_used_stack_slot_alignment - > crtl->preferred_stack_boundary) + else if (crtl->max_used_stack_slot_alignment > 64) { /* We don't need to realign stack. But we still need to keep stack frame properly aligned to satisfy the largest alignment - of stack slots. */ + of stack slots if the maximum stack slot alignment may be + greater than 64 bits. */ if (ix86_find_max_used_stack_alignment (stack_alignment, true)) cfun->machine->max_used_stack_alignment = stack_alignment / BITS_PER_UNIT; diff --git a/gcc/testsuite/gcc.target/i386/pr86386.c b/gcc/testsuite/gcc.target/i386/pr86386.c new file mode 100644 index 00000000000..a67cf45444e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr86386.c @@ -0,0 +1,26 @@ +/* PR target/86386 */ +/* { dg-do run { target { avx_runtime && int128 } } } */ +/* { dg-options "-Os -fno-tree-dce -mstringop-strategy=vector_loop -mavx" } */ + +unsigned c, d, e, f; + +unsigned __attribute__((noipa)) +foo (unsigned char g, unsigned short h, unsigned i, unsigned long long j, + unsigned char k, unsigned short l, unsigned m, unsigned __int128 n) +{ + __builtin_memset (&e, 0, 3); + n <<= m; + __builtin_memcpy (&m, 2 + (char *) &n, 1); + m >>= 0; + d ^= __builtin_mul_overflow (l, n, &m); + return m; +} + +int +main () +{ + unsigned __int128 x = foo (0, 0, 0, 0, 0, 4, 1, 3); + if (x != 24) + __builtin_abort (); + return 0; +}