
x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)

Message ID 878r2ifdrx.fsf@oldenburg.str.redhat.com
State New
Series x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)

Commit Message

Florian Weimer March 16, 2024, 2:32 p.m. UTC
In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
This realignment does not take into account that the function has
already used part of the red zone at this point, thus clobbering
the initially saved register values located there if the stack
alignment inherited from the caller is unfortunate.

(Note: I do not know how to write a good test case for this in the
existing framework.  We saw this as a random LTO plugin crash when
building GCC with -mtls-dialect=gnu2.  The existing tests pass on
x86_64-linux-gnu with this change here.)
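[Editorial note: the failure mode can be sketched with simple address
arithmetic.  This is illustrative Python, not part of the patch; the
24-byte slot offset and the 64-byte STATE_SAVE_ALIGNMENT come from the
code being patched, while the base address and the assumption that
stores into the realigned frame can reach all the way up to the aligned
boundary are mine.]

```python
# Model of the stack layout before and after the fix, using plain
# integers as addresses.  %rbx is stored at -24(%rsp) (in the red
# zone), then %rsp is aligned down with and $-STATE_SAVE_ALIGNMENT.
# Without the fix, the save slot can end up *below* the aligned top,
# i.e. inside the realigned frame, where later stores (xsavec) can
# overwrite it.

STATE_SAVE_ALIGNMENT = 64
SLOT_OFFSET = 24              # movq %rbx, -24(%rsp)

def slot_inside_frame(rsp, pre_sub):
    """True if the %rbx save slot lies inside the realigned frame
    (below the aligned top) and can therefore be clobbered."""
    slot = rsp - SLOT_OFFSET              # where %rbx was saved
    rsp -= pre_sub                        # the fix: subq $24, %rsp
    aligned = rsp & -STATE_SAVE_ALIGNMENT # and $-64, %rsp
    return slot < aligned                 # frame occupies [..., aligned)

base = 1 << 20  # arbitrary 64-byte-aligned address

# Without the fix, entry alignment (%rsp % 64) == 8 puts the slot
# inside the frame; (%rsp % 64) == 56 happens to keep it above.
assert slot_inside_frame(base + 8, pre_sub=0) is True
assert slot_inside_frame(base + 56, pre_sub=0) is False

# With the fix, aligned <= %rsp - 24 == slot, so the slot stays above
# the frame for every entry alignment.
assert all(not slot_inside_frame(base + m, pre_sub=24)
           for m in range(0, 64, 8))
```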

---
 sysdeps/x86_64/dl-tlsdesc-dynamic.h | 4 ++++
 1 file changed, 4 insertions(+)


base-commit: 5ebc24f785dc0dff494a93ca82a369497c3cdc68

Comments

H.J. Lu March 16, 2024, 2:37 p.m. UTC | #1
On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
> realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
> This realignment does not take into account that the function has
> already used part of the red zone at this point, thus clobbering
> the initially saved register values located there if the stack
> alignment inherited from the caller is unfortunate.
>
> (Note: I do not know how to write a good test case for this in the existing
> framework.  We saw this as a random LTO plugin crash when building GCC
> with -mtls-dialect=gnu2.  The existing tests pass on x86_64-linux-gnu
> with this change here.)

Will a different STATE_SAVE_OFFSET for TLS descriptor work?

> ---
>  sysdeps/x86_64/dl-tlsdesc-dynamic.h | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/sysdeps/x86_64/dl-tlsdesc-dynamic.h b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> index 9f02cfc3eb..8e49e7eece 100644
> --- a/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> +++ b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> @@ -83,6 +83,8 @@ _dl_tlsdesc_dynamic:
>  2:
>  #if DL_RUNTIME_RESOLVE_REALIGN_STACK
>         movq    %rbx, -24(%rsp)
> +       subq    $24, %rsp
> +       cfi_adjust_cfa_offset(24)
>         mov     %RSP_LP, %RBX_LP
>         cfi_def_cfa_register(%rbx)
>         and     $-STATE_SAVE_ALIGNMENT, %RSP_LP
> @@ -153,6 +155,8 @@ _dl_tlsdesc_dynamic:
>  #if DL_RUNTIME_RESOLVE_REALIGN_STACK
>         mov     %RBX_LP, %RSP_LP
>         cfi_def_cfa_register(%rsp)
> +       addq    $24, %rsp
> +       cfi_adjust_cfa_offset(-24)
>         movq    -24(%rsp), %rbx
>         cfi_restore(%rbx)
>  #else
>
> base-commit: 5ebc24f785dc0dff494a93ca82a369497c3cdc68
>
H.J. Lu March 16, 2024, 2:43 p.m. UTC | #2
On Sat, Mar 16, 2024 at 7:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
> > realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
> > This realignment does not take into account that the function has
> > already used part of the red zone at this point, thus clobbering
> > the initially saved register values located there if the stack
> > alignment inherited from the caller is unfortunate.
> >
> > (Note: I do not know how to write a good test case for this in the existing
> > framework.  We saw this as a random LTO plugin crash when building GCC
> > with -mtls-dialect=gnu2.  The existing tests pass on x86_64-linux-gnu
> > with this change here.)
>
> Will a different STATE_SAVE_OFFSET for TLS descriptor work?

Correction.  REGISTER_SAVE_AREA is for this purpose.  Will a different
value for the TLS descriptor work?

> > ---
> >  sysdeps/x86_64/dl-tlsdesc-dynamic.h | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/sysdeps/x86_64/dl-tlsdesc-dynamic.h b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > index 9f02cfc3eb..8e49e7eece 100644
> > --- a/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > +++ b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > @@ -83,6 +83,8 @@ _dl_tlsdesc_dynamic:
> >  2:
> >  #if DL_RUNTIME_RESOLVE_REALIGN_STACK
> >         movq    %rbx, -24(%rsp)
> > +       subq    $24, %rsp
> > +       cfi_adjust_cfa_offset(24)
> >         mov     %RSP_LP, %RBX_LP
> >         cfi_def_cfa_register(%rbx)
> >         and     $-STATE_SAVE_ALIGNMENT, %RSP_LP
> > @@ -153,6 +155,8 @@ _dl_tlsdesc_dynamic:
> >  #if DL_RUNTIME_RESOLVE_REALIGN_STACK
> >         mov     %RBX_LP, %RSP_LP
> >         cfi_def_cfa_register(%rsp)
> > +       addq    $24, %rsp
> > +       cfi_adjust_cfa_offset(-24)
> >         movq    -24(%rsp), %rbx
> >         cfi_restore(%rbx)
> >  #else
> >
> > base-commit: 5ebc24f785dc0dff494a93ca82a369497c3cdc68
> >
>
>
> --
> H.J.
H.J. Lu March 16, 2024, 2:47 p.m. UTC | #3
On Sat, Mar 16, 2024 at 7:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 7:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >
> > > In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
> > > realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
> > > This realignment does not take into account that the function has
> > > already used part of the red zone at this point, thus clobbering
> > > the initially saved register values located there if the stack
> > > alignment inherited from the caller is unfortunate.
> > >
> > > (Note: I do not know how to write a good test case for this in the existing
> > > framework.  We saw this as a random LTO plugin crash when building GCC
> > > with -mtls-dialect=gnu2.  The existing tests pass on x86_64-linux-gnu
> > > with this change here.)

We should try to find a testcase.  Can you provide a backtrace when it
happens?  It should be possible to write a testcase with the backtrace.


> > Will a different STATE_SAVE_OFFSET for TLS descriptor work?
>
> Correction.  REGISTER_SAVE_AREA is for this purpose.   Will a different
> value for TLS descriptor work?
>
> > > ---
> > >  sysdeps/x86_64/dl-tlsdesc-dynamic.h | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/sysdeps/x86_64/dl-tlsdesc-dynamic.h b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > > index 9f02cfc3eb..8e49e7eece 100644
> > > --- a/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > > +++ b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > > @@ -83,6 +83,8 @@ _dl_tlsdesc_dynamic:
> > >  2:
> > >  #if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > >         movq    %rbx, -24(%rsp)
> > > +       subq    $24, %rsp
> > > +       cfi_adjust_cfa_offset(24)
> > >         mov     %RSP_LP, %RBX_LP
> > >         cfi_def_cfa_register(%rbx)
> > >         and     $-STATE_SAVE_ALIGNMENT, %RSP_LP
> > > @@ -153,6 +155,8 @@ _dl_tlsdesc_dynamic:
> > >  #if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > >         mov     %RBX_LP, %RSP_LP
> > >         cfi_def_cfa_register(%rsp)
> > > +       addq    $24, %rsp
> > > +       cfi_adjust_cfa_offset(-24)
> > >         movq    -24(%rsp), %rbx
> > >         cfi_restore(%rbx)
> > >  #else
> > >
> > > base-commit: 5ebc24f785dc0dff494a93ca82a369497c3cdc68
> > >
> >
> >
> > --
> > H.J.
>
>
>
> --
> H.J.



--
H.J.
Florian Weimer March 16, 2024, 2:57 p.m. UTC | #4
* H. J. Lu:

> On Sat, Mar 16, 2024 at 7:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
>> >
>> > In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
>> > realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
>> > This realignment does not take into account that the function has
>> > already used part of the red zone at this point, thus clobbering
>> > the initially saved register values located there if the stack
>> > alignment inherited from the caller is unfortunate.
>> >
>> > (Note: I do not know how to write a good test case for this in the existing
>> > framework.  We saw this as a random LTO plugin crash when building GCC
>> > with -mtls-dialect=gnu2.  The existing tests pass on x86_64-linux-gnu
>> > with this change here.)
>>
>> Will a different STATE_SAVE_OFFSET for TLS descriptor work?
>
> Correction.  REGISTER_SAVE_AREA is for this purpose.   Will a different
> value for TLS descriptor work?

I think REGISTER_SAVE_AREA is for the later register saves?

This use of the red zone is specific to the TLS trampoline.  The lazy
binding trampoline doesn't do that.  REGISTER_SAVE_AREA is used by both.

Thanks,
Florian
H.J. Lu March 16, 2024, 2:59 p.m. UTC | #5
On Sat, Mar 16, 2024 at 7:57 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> > On Sat, Mar 16, 2024 at 7:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >>
> >> On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> >> >
> >> > In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
> >> > realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
> >> > This realignment does not take into account that the function has
> >> > already used part of the red zone at this point, thus clobbering
> >> > the initially saved register values located there if the stack
> >> > alignment inherited from the caller is unfortunate.
> >> >
> >> > (Note: I do not know how to write a good test case for this in the existing
> >> > framework.  We saw this as a random LTO plugin crash when building GCC
> >> > with -mtls-dialect=gnu2.  The existing tests pass on x86_64-linux-gnu
> >> > with this change here.)
> >>
> >> Will a different STATE_SAVE_OFFSET for TLS descriptor work?
> >
> > Correction.  REGISTER_SAVE_AREA is for this purpose.   Will a different
> > value for TLS descriptor work?
>
> I think REGISTER_SAVE_AREA is for the later register saves?
>
> This use of the red zone is specific to the TLS trampoline.  The lazy
> binding trampoline doesn't do that.  REGISTER_SAVE_AREA is used by both.
>

# if DL_RUNTIME_RESOLVE_REALIGN_STACK
        /* STATE_SAVE_OFFSET has space for 8 integer registers.  But we
           need space for RCX, RDX, RSI, RDI, R8, R9, R10 and R11, plus
           RBX above.  */
        sub     $(REGISTER_SAVE_AREA + STATE_SAVE_ALIGNMENT), %RSP_LP
# else
        sub     $REGISTER_SAVE_AREA, %RSP_LP
        cfi_adjust_cfa_offset(REGISTER_SAVE_AREA)
# endif
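[Editorial note: the sizing in the realigned branch above can be
checked with a small sketch.  The REGISTER_SAVE_AREA value below is an
assumption for illustration (room for eight integer registers); the
real constants live in sysdeps/x86.]

```python
# After and $-STATE_SAVE_ALIGNMENT, %rsp is 64-byte aligned.
# Subtracting REGISTER_SAVE_AREA + STATE_SAVE_ALIGNMENT then keeps it
# aligned (the total is a multiple of 64 under this assumption) and
# leaves one extra alignment unit of room, the "plus RBX above" in
# the comment.

STATE_SAVE_ALIGNMENT = 64
REGISTER_SAVE_AREA = 8 * 8    # assumed: 8 integer registers

def frame_top_and_bottom(rsp):
    aligned = rsp & -STATE_SAVE_ALIGNMENT                   # and $-64, %rsp
    bottom = aligned - (REGISTER_SAVE_AREA + STATE_SAVE_ALIGNMENT)
    return aligned, bottom

for m in range(64):
    top, bottom = frame_top_and_bottom((1 << 20) + m)
    # %rsp stays 64-byte aligned for the later state save...
    assert bottom % STATE_SAVE_ALIGNMENT == 0
    # ...and the frame always has the register-save space plus the
    # extra alignment unit, for any entry alignment.
    assert top - bottom >= REGISTER_SAVE_AREA + STATE_SAVE_ALIGNMENT
```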

Let's find a testcase first.
Florian Weimer March 16, 2024, 3:04 p.m. UTC | #6
* H. J. Lu:

> On Sat, Mar 16, 2024 at 7:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Sat, Mar 16, 2024 at 7:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>> >
>> > On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
>> > >
>> > > In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
>> > > realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
>> > > This realignment does not take into account that the function has
>> > > already used part of the red zone at this point, thus clobbering
>> > > the initially saved register values located there if the stack
>> > > alignment inherited from the caller is unfortunate.
>> > >
>> > > (Note: I do not know how to write a good test case for this in the existing
>> > > framework.  We saw this as a random LTO plugin crash when building GCC
>> > > with -mtls-dialect=gnu2.  The existing tests pass on x86_64-linux-gnu
>> > > with this change here.)
>
> We should try to find a testcase.  Can you provide a backtrace when it
> happens?  It should be possible to write a testcase with the backtrace.

In my reproducer, when %rbx is about to be clobbered, I see
(%rsp % 64) == 8 at the start of _dl_tlsdesc_dynamic_xsavec.

The %rbx register does not get clobbered if (%rsp % 64) == 56.
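[Editorial note: both observations follow from the slot/alignment
arithmetic; a sketch, ignoring how far the actual xsave area reaches:]

```python
# %rbx is saved at -24(%rsp), then %rsp is aligned down to 64 bytes.
# The slot is at risk exactly when it falls below the aligned top,
# i.e. when the entry alignment (%rsp % 64) is less than 24.

def slot_below_aligned_top(rsp_mod_64):
    rsp = (1 << 20) + rsp_mod_64
    slot = rsp - 24               # movq %rbx, -24(%rsp)
    aligned = rsp & -64           # and $-STATE_SAVE_ALIGNMENT, %RSP_LP
    return slot < aligned

assert slot_below_aligned_top(8)         # clobber observed
assert not slot_below_aligned_top(56)    # no clobber observed
# Across 8-byte-aligned entry alignments, only 0, 8 and 16 put the
# slot inside the realigned frame.
assert [m for m in range(0, 64, 8)
        if slot_below_aligned_top(m)] == [0, 8, 16]
```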

Does this help?

Thanks,
Florian
H.J. Lu March 16, 2024, 3:18 p.m. UTC | #7
On Sat, Mar 16, 2024 at 8:04 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> > On Sat, Mar 16, 2024 at 7:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >>
> >> On Sat, Mar 16, 2024 at 7:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >> >
> >> > On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> >> > >
> >> > > In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
> >> > > realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
> >> > > This realignment does not take into account that the function has
> >> > > already used part of the red zone at this point, thus clobbering
> >> > > the initially saved register values located there if the stack
> >> > > alignment inherited from the caller is unfortunate.
> >> > >
> >> > > (Note: I do not know how to write a good test case for this in the existing
> >> > > framework.  We saw this as a random LTO plugin crash when building GCC
> >> > > with -mtls-dialect=gnu2.  The existing tests pass on x86_64-linux-gnu
> >> > > with this change here.)
> >
> > We should try to find a testcase.  Can you provide a backtrace when it
> > happens?  It should be possible to write a testcase with the backtrace.
>
> In my reproducer, when %rbx is about to be clobbered, I see
> (%rsp % 64) == 8 at the start of _dl_tlsdesc_dynamic_xsavec.
>
> The %rbx register does not get clobbered if (%rsp % 64) == 56.
>
> Does this help?
>

Yes.  I am working on a testcase.
H.J. Lu March 16, 2024, 4:32 p.m. UTC | #8
On Sat, Mar 16, 2024 at 8:18 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 8:04 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * H. J. Lu:
> >
> > > On Sat, Mar 16, 2024 at 7:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >>
> > >> On Sat, Mar 16, 2024 at 7:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >> >
> > >> > On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >> > >
> > >> > > In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
> > >> > > realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
> > >> > > This realignment does not take into account that the function has
> > >> > > already used part of the red zone at this point, thus clobbering
> > >> > > the initially saved register values located there if the stack
> > >> > > alignment inherited from the caller is unfortunate.
> > >> > >
> > >> > > (Note: I do not know how to write a good test case for this in the existing
> > >> > > framework.  We saw this as a random LTO plugin crash when building GCC
> > >> > > with -mtls-dialect=gnu2.  The existing tests pass on x86_64-linux-gnu
> > >> > > with this change here.)
> > >
> > > We should try to find a testcase.  Can you provide a backtrace when it
> > > happens?  It should be possible to write a testcase with the backtrace.
> >
> > In my reproducer, when %rbx is about to be clobbered, I see
> > (%rsp % 64) == 8 at the start of _dl_tlsdesc_dynamic_xsavec.
> >
> > The %rbx register does not get clobbered if (%rsp % 64) == 56.
> >
> > Does this help?
> >
>
> Yes.  I am working on a testcase.

Hi Florian,

Please verify if this is the right testcase.

Thanks.
H.J. Lu March 16, 2024, 4:37 p.m. UTC | #9
On Sat, Mar 16, 2024 at 9:32 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 8:18 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Sat, Mar 16, 2024 at 8:04 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >
> > > * H. J. Lu:
> > >
> > > > On Sat, Mar 16, 2024 at 7:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > >>
> > > >> On Sat, Mar 16, 2024 at 7:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > >> >
> > > >> > On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> > > >> > >
> > > >> > > In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
> > > >> > > realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
> > > >> > > This realignment does not take into account that the function has
> > > >> > > already used part of the red zone at this point, thus clobbering
> > > >> > > the initially saved register values located there if the stack
> > > >> > > alignment inherited from the caller is unfortunate.
> > > >> > >
> > > >> > > (Note: I do not know how to write a good test case for this in the existing
> > > >> > > framework.  We saw this as a random LTO plugin crash when building GCC
> > > >> > > with -mtls-dialect=gnu2.  The existing tests pass on x86_64-linux-gnu
> > > >> > > with this change here.)
> > > >
> > > > We should try to find a testcase.  Can you provide a backtrace when it
> > > > happens?  It should be possible to write a testcase with the backtrace.
> > >
> > > In my reproducer, when %rbx is about to be clobbered, I see
> > > (%rsp % 64) == 8 at the start of _dl_tlsdesc_dynamic_xsavec.
> > >
> > > The %rbx register does not get clobbered if (%rsp % 64) == 56.
> > >
> > > Does this help?
> > >
> >
> > Yes.  I am working on a testcase.
>
> Hi Florian,
>
> Please verify if this is the right testcase.

This test fails only on AVX512 machines.
Florian Weimer March 16, 2024, 5:42 p.m. UTC | #10
* H. J. Lu:

> Please verify if this is the right testcase.

Test case works (fails without my fix, succeeds with my fix).  Some
comments below.

> diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> new file mode 100644
> index 0000000000..8129b28061
> --- /dev/null
> +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> @@ -0,0 +1,57 @@

> +	.text
> +	.p2align 4
> +	.globl	apply_tls
> +	.type	apply_tls, @function
> +apply_tls:
> +	.cfi_startproc

Missing CET marker.

> +	subq	$24, %rsp
> +	.cfi_def_cfa_offset 32
> +	movdqu	(%rdi), %xmm0
> +	movq	%fs:40, %rax
> +	movq	%rax, 8(%rsp)
> +	xorl	%eax, %eax
> +	leaq	tls_var0@TLSDESC(%rip), %rax
> +	call	*tls_var0@TLSCALL(%rax)
> +	addq	%fs:0, %rax
> +	movups	%xmm0, (%rax)
> +	movdqu	16(%rdi), %xmm1
> +	movups	%xmm1, 16(%rax)
> +	movq	8(%rsp), %rdx
> +	subq	%fs:40, %rdx
> +	jne	.L5
> +	addq	$24, %rsp
> +	.cfi_remember_state
> +	.cfi_def_cfa_offset 8
> +	ret
> +.L5:
> +	.cfi_restore_state
> +	call	__stack_chk_fail@PLT

Not sure if we need this?

Maybe add some comment what exactly this subtest is exercising?

These are present in the other TLS modules as well.

> diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> new file mode 100644
> index 0000000000..af4b7ca761
> --- /dev/null
> +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S

> +/* Select an offset which will cause _dl_tlsdesc_dynamic_xsavec to
> +   clobber %rbx.  */
> +#define OFFSET (56 + 16 + 16 + 16)
> +
> +	.text
> +	.p2align 4
> +	.globl	apply_tls
> +	.type	apply_tls, @function
> +apply_tls:
> +	.cfi_startproc
> +	pushq	%rbp
> +	.cfi_def_cfa_offset 16
> +	.cfi_offset 6, -16
> +	movq	%rsp, %rbp
> +	.cfi_def_cfa_register 6
> +	/* Align stack to 64 bytes.  */
> +	andq	$-64, %rsp
> +	pushq	%rbx
> +	subq	$OFFSET, %rsp

The offset could be loaded from a global variable or something like
that.  We should exercise a wide range of stack alignments—the
individual tests are cheap.  And maybe check extra registers.
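[Editorial note: a sketch of which OFFSET values such a driver would
need.  It models only the prologue shown above plus the call's
return-address push; anything extra the compiler or linker inserts is
ignored.]

```python
# Model the test prologue: andq $-64, %rsp; pushq %rbx;
# subq $OFFSET, %rsp; then the call pushes an 8-byte return address.
# OFFSET values stepping by 8 reach every 8-byte-aligned residue of
# (%rsp % 64) at function entry.

def rsp_mod_at_call(offset):
    rsp = 1 << 20          # 64-byte aligned after andq $-64, %rsp
    rsp -= 8               # pushq %rbx
    rsp -= offset          # subq $OFFSET, %rsp
    rsp -= 8               # call pushes the return address
    return rsp % 64

# Eight offsets cover all eight residues a driver would want to test.
assert {rsp_mod_at_call(off) for off in range(0, 64, 8)} \
       == {0, 8, 16, 24, 32, 40, 48, 56}

# The OFFSET in the patch above, 56 + 16 + 16 + 16 = 104, lands on
# residue 8, the clobbering alignment reported earlier in the thread.
assert rsp_mod_at_call(56 + 16 + 16 + 16) == 8
```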

Thanks,
Florian
H.J. Lu March 16, 2024, 5:51 p.m. UTC | #11
On Sat, Mar 16, 2024 at 10:42 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> > Please verify if this is the right testcase.
>
> Test case works (fails without my fix, succeeds with my fix).  Some
> comments below.
>
> > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > new file mode 100644
> > index 0000000000..8129b28061
> > --- /dev/null
> > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > @@ -0,0 +1,57 @@
>
> > +     .text
> > +     .p2align 4
> > +     .globl  apply_tls
> > +     .type   apply_tls, @function
> > +apply_tls:
> > +     .cfi_startproc
>
> Missing CET marker.
>
> > +     subq    $24, %rsp
> > +     .cfi_def_cfa_offset 32
> > +     movdqu  (%rdi), %xmm0
> > +     movq    %fs:40, %rax
> > +     movq    %rax, 8(%rsp)
> > +     xorl    %eax, %eax
> > +     leaq    tls_var0@TLSDESC(%rip), %rax
> > +     call    *tls_var0@TLSCALL(%rax)
> > +     addq    %fs:0, %rax
> > +     movups  %xmm0, (%rax)
> > +     movdqu  16(%rdi), %xmm1
> > +     movups  %xmm1, 16(%rax)
> > +     movq    8(%rsp), %rdx
> > +     subq    %fs:40, %rdx
> > +     jne     .L5
> > +     addq    $24, %rsp
> > +     .cfi_remember_state
> > +     .cfi_def_cfa_offset 8
> > +     ret
> > +.L5:
> > +     .cfi_restore_state
> > +     call    __stack_chk_fail@PLT
>
> Not sure if we need this?
>
> Maybe add some comment what exactly this subtest is exercising?
>
> These are present in the other TLS modules as well.
>
> > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> > new file mode 100644
> > index 0000000000..af4b7ca761
> > --- /dev/null
> > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
>
> > +/* Select an offset which will cause _dl_tlsdesc_dynamic_xsavec to
> > +   clobber %rbx.  */
> > +#define OFFSET (56 + 16 + 16 + 16)
> > +
> > +     .text
> > +     .p2align 4
> > +     .globl  apply_tls
> > +     .type   apply_tls, @function
> > +apply_tls:
> > +     .cfi_startproc
> > +     pushq   %rbp
> > +     .cfi_def_cfa_offset 16
> > +     .cfi_offset 6, -16
> > +     movq    %rsp, %rbp
> > +     .cfi_def_cfa_register 6
> > +     /* Align stack to 64 bytes.  */
> > +     andq    $-64, %rsp
> > +     pushq   %rbx
> > +     subq    $OFFSET, %rsp
>
> The offset could be loaded from a global variable or something like
> that.  We should exercise a wide range of stack alignments—the
> individual tests are cheap.  And maybe check extra registers.

I will clean it up with a different fix.
H.J. Lu March 16, 2024, 10:05 p.m. UTC | #12
On Sat, Mar 16, 2024 at 10:51 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 10:42 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * H. J. Lu:
> >
> > > Please verify if this is the right testcase.
> >
> > Test case works (fails without my fix, succeeds with my fix).  Some
> > comments below.
> >
> > > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > > new file mode 100644
> > > index 0000000000..8129b28061
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > > @@ -0,0 +1,57 @@
> >
> > > +     .text
> > > +     .p2align 4
> > > +     .globl  apply_tls
> > > +     .type   apply_tls, @function
> > > +apply_tls:
> > > +     .cfi_startproc
> >
> > Missing CET marker.
> >
> > > +     subq    $24, %rsp
> > > +     .cfi_def_cfa_offset 32
> > > +     movdqu  (%rdi), %xmm0
> > > +     movq    %fs:40, %rax
> > > +     movq    %rax, 8(%rsp)
> > > +     xorl    %eax, %eax
> > > +     leaq    tls_var0@TLSDESC(%rip), %rax
> > > +     call    *tls_var0@TLSCALL(%rax)
> > > +     addq    %fs:0, %rax
> > > +     movups  %xmm0, (%rax)
> > > +     movdqu  16(%rdi), %xmm1
> > > +     movups  %xmm1, 16(%rax)
> > > +     movq    8(%rsp), %rdx
> > > +     subq    %fs:40, %rdx
> > > +     jne     .L5
> > > +     addq    $24, %rsp
> > > +     .cfi_remember_state
> > > +     .cfi_def_cfa_offset 8
> > > +     ret
> > > +.L5:
> > > +     .cfi_restore_state
> > > +     call    __stack_chk_fail@PLT
> >
> > Not sure if we need this?
> >
> > Maybe add some comment what exactly this subtest is exercising?
> >
> > These are present in the other TLS modules as well.
> >
> > > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> > > new file mode 100644
> > > index 0000000000..af4b7ca761
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> >
> > > +/* Select an offset which will cause _dl_tlsdesc_dynamic_xsavec to
> > > +   clobber %rbx.  */
> > > +#define OFFSET (56 + 16 + 16 + 16)
> > > +
> > > +     .text
> > > +     .p2align 4
> > > +     .globl  apply_tls
> > > +     .type   apply_tls, @function
> > > +apply_tls:
> > > +     .cfi_startproc
> > > +     pushq   %rbp
> > > +     .cfi_def_cfa_offset 16
> > > +     .cfi_offset 6, -16
> > > +     movq    %rsp, %rbp
> > > +     .cfi_def_cfa_register 6
> > > +     /* Align stack to 64 bytes.  */
> > > +     andq    $-64, %rsp
> > > +     pushq   %rbx
> > > +     subq    $OFFSET, %rsp
> >
> > The offset could be loaded from a global variable or something like
> > that.  We should exercise a wide range of stack alignments—the
> > individual tests are cheap.  And maybe check extra registers.
>
> I will clean it up with a different fix.
>

I submitted a patch with a testcase:

https://patchwork.sourceware.org/project/glibc/list/?series=31963

My patch allocates 64 more bytes so that xsave does not clobber the
saved RDI, RSI and RBX values on the stack.  It avoids 2 stack
adjustments.  Either my fix or Florian's should fix the issue.
I don't have a strong preference as long as my testcase is
included.

Thanks.
H.J. Lu March 17, 2024, 1:19 a.m. UTC | #13
On Sat, Mar 16, 2024 at 3:05 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 10:51 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Sat, Mar 16, 2024 at 10:42 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >
> > > * H. J. Lu:
> > >
> > > > Please verify if this is the right testcase.
> > >
> > > Test case works (fails without my fix, succeeds with my fix).  Some
> > > comments below.
> > >
> > > > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > > > new file mode 100644
> > > > index 0000000000..8129b28061
> > > > --- /dev/null
> > > > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > > > @@ -0,0 +1,57 @@
> > >
> > > > +     .text
> > > > +     .p2align 4
> > > > +     .globl  apply_tls
> > > > +     .type   apply_tls, @function
> > > > +apply_tls:
> > > > +     .cfi_startproc
> > >
> > > Missing CET marker.
> > >
> > > > +     subq    $24, %rsp
> > > > +     .cfi_def_cfa_offset 32
> > > > +     movdqu  (%rdi), %xmm0
> > > > +     movq    %fs:40, %rax
> > > > +     movq    %rax, 8(%rsp)
> > > > +     xorl    %eax, %eax
> > > > +     leaq    tls_var0@TLSDESC(%rip), %rax
> > > > +     call    *tls_var0@TLSCALL(%rax)
> > > > +     addq    %fs:0, %rax
> > > > +     movups  %xmm0, (%rax)
> > > > +     movdqu  16(%rdi), %xmm1
> > > > +     movups  %xmm1, 16(%rax)
> > > > +     movq    8(%rsp), %rdx
> > > > +     subq    %fs:40, %rdx
> > > > +     jne     .L5
> > > > +     addq    $24, %rsp
> > > > +     .cfi_remember_state
> > > > +     .cfi_def_cfa_offset 8
> > > > +     ret
> > > > +.L5:
> > > > +     .cfi_restore_state
> > > > +     call    __stack_chk_fail@PLT
> > >
> > > Not sure if we need this?
> > >
> > > Maybe add some comment what exactly this subtest is exercising?
> > >
> > > These are present in the other TLS modules as well.
> > >
> > > > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> > > > new file mode 100644
> > > > index 0000000000..af4b7ca761
> > > > --- /dev/null
> > > > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> > >
> > > > +/* Select an offset which will cause _dl_tlsdesc_dynamic_xsavec to
> > > > +   clobber %rbx.  */
> > > > +#define OFFSET (56 + 16 + 16 + 16)
> > > > +
> > > > +     .text
> > > > +     .p2align 4
> > > > +     .globl  apply_tls
> > > > +     .type   apply_tls, @function
> > > > +apply_tls:
> > > > +     .cfi_startproc
> > > > +     pushq   %rbp
> > > > +     .cfi_def_cfa_offset 16
> > > > +     .cfi_offset 6, -16
> > > > +     movq    %rsp, %rbp
> > > > +     .cfi_def_cfa_register 6
> > > > +     /* Align stack to 64 bytes.  */
> > > > +     andq    $-64, %rsp
> > > > +     pushq   %rbx
> > > > +     subq    $OFFSET, %rsp
> > >
> > > The offset could be loaded from a global variable or something like
> > > that.  We should exercise a wide range of stack alignments—the
> > > individual tests are cheap.  And maybe check extra registers.
> >
> > I will clean it up with a different fix.
> >
>
> I submitted a patch with a testcase:
>
> https://patchwork.sourceware.org/project/glibc/list/?series=31963
>
> My patch allocates 64 more bytes to avoid clobbering saved RDI,
> RSI and RBX values on stack by xsave.  It avoids 2 stack
> adjustments.  Either my fix or Florian's fix should fix the issue.
> I don't have a strong preference as long as my testcase is
> included.
>
>
I think my testcase may fail on AMD AVX CPUs without the
fix.  On Intel AVX CPUs, the state size is 960 bytes.  But the
last 128 bytes may be unused.
H.J. Lu March 17, 2024, 3:14 a.m. UTC | #14
On Sat, Mar 16, 2024 at 6:19 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 3:05 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Sat, Mar 16, 2024 at 10:51 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > On Sat, Mar 16, 2024 at 10:42 AM Florian Weimer <fweimer@redhat.com> wrote:
> > > >
> > > > * H. J. Lu:
> > > >
> > > > > Please verify if this is the right testcase.
> > > >
> > > > Test case works (fails without my fix, succeeds with my fix).  Some
> > > > comments below.
> > > >
> > > > > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > > > > new file mode 100644
> > > > > index 0000000000..8129b28061
> > > > > --- /dev/null
> > > > > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > > > > @@ -0,0 +1,57 @@
> > > >
> > > > > +     .text
> > > > > +     .p2align 4
> > > > > +     .globl  apply_tls
> > > > > +     .type   apply_tls, @function
> > > > > +apply_tls:
> > > > > +     .cfi_startproc
> > > >
> > > > Missing CET marker.
> > > >
> > > > > +     subq    $24, %rsp
> > > > > +     .cfi_def_cfa_offset 32
> > > > > +     movdqu  (%rdi), %xmm0
> > > > > +     movq    %fs:40, %rax
> > > > > +     movq    %rax, 8(%rsp)
> > > > > +     xorl    %eax, %eax
> > > > > +     leaq    tls_var0@TLSDESC(%rip), %rax
> > > > > +     call    *tls_var0@TLSCALL(%rax)
> > > > > +     addq    %fs:0, %rax
> > > > > +     movups  %xmm0, (%rax)
> > > > > +     movdqu  16(%rdi), %xmm1
> > > > > +     movups  %xmm1, 16(%rax)
> > > > > +     movq    8(%rsp), %rdx
> > > > > +     subq    %fs:40, %rdx
> > > > > +     jne     .L5
> > > > > +     addq    $24, %rsp
> > > > > +     .cfi_remember_state
> > > > > +     .cfi_def_cfa_offset 8
> > > > > +     ret
> > > > > +.L5:
> > > > > +     .cfi_restore_state
> > > > > +     call    __stack_chk_fail@PLT
> > > >
> > > > Not sure if we need this?
> > > >
> > > > Maybe add some comment what exactly this subtest is exercising?
> > > >
> > > > These are present in the other TLS modules as well.
> > > >
> > > > > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> > > > > new file mode 100644
> > > > > index 0000000000..af4b7ca761
> > > > > --- /dev/null
> > > > > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> > > >
> > > > > +/* Select an offset which will cause _dl_tlsdesc_dynamic_xsavec to
> > > > > +   clobber %rbx.  */
> > > > > +#define OFFSET (56 + 16 + 16 + 16)
> > > > > +
> > > > > +     .text
> > > > > +     .p2align 4
> > > > > +     .globl  apply_tls
> > > > > +     .type   apply_tls, @function
> > > > > +apply_tls:
> > > > > +     .cfi_startproc
> > > > > +     pushq   %rbp
> > > > > +     .cfi_def_cfa_offset 16
> > > > > +     .cfi_offset 6, -16
> > > > > +     movq    %rsp, %rbp
> > > > > +     .cfi_def_cfa_register 6
> > > > > +     /* Align stack to 64 bytes.  */
> > > > > +     andq    $-64, %rsp
> > > > > +     pushq   %rbx
> > > > > +     subq    $OFFSET, %rsp
> > > >
> > > > The offset could be loaded from a global variable or something like
> > > > that.  We should exercise a wide range of stack alignments—the
> > > > individual tests are cheap.  And maybe check extra registers.
> > >
> > > I will clean it up with a different fix.
> > >
> >
> > I submitted a patch with a testcase:
> >
> > https://patchwork.sourceware.org/project/glibc/list/?series=31963
> >
> > My patch allocates 64 more bytes to avoid clobbering saved RDI,
> > RSI and RBX values on stack by xsave.  It avoids 2 stack
> > adjustments.  Either my fix or Florian's fix should fix the issue.
> > I don't have a strong preference as long as my testcase is
> > included.
> >
> >
> I think my testcase may fail on AMD AVX CPUs without the
> fix.  On Intel AVX CPUs, the state size is 960 bytes.  But the
> last 128 bytes may be unused.
>

I sent out the v2 patch:

https://patchwork.sourceware.org/project/glibc/list/?series=31966

to simplify the testcase.

Patch

diff --git a/sysdeps/x86_64/dl-tlsdesc-dynamic.h b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
index 9f02cfc3eb..8e49e7eece 100644
--- a/sysdeps/x86_64/dl-tlsdesc-dynamic.h
+++ b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
@@ -83,6 +83,8 @@  _dl_tlsdesc_dynamic:
 2:
 #if DL_RUNTIME_RESOLVE_REALIGN_STACK
 	movq	%rbx, -24(%rsp)
+	subq    $24, %rsp
+	cfi_adjust_cfa_offset(24)
 	mov	%RSP_LP, %RBX_LP
 	cfi_def_cfa_register(%rbx)
 	and	$-STATE_SAVE_ALIGNMENT, %RSP_LP
@@ -153,6 +155,8 @@  _dl_tlsdesc_dynamic:
 #if DL_RUNTIME_RESOLVE_REALIGN_STACK
 	mov	%RBX_LP, %RSP_LP
 	cfi_def_cfa_register(%rsp)
+	addq    $24, %rsp
+	cfi_adjust_cfa_offset(-24)
 	movq	-24(%rsp), %rbx
 	cfi_restore(%rbx)
 #else