mbox series

[0/2] nptl: Update struct pthread_unwind_buf

Message ID 20180201205757.51911-1-hjl.tools@gmail.com
Headers show
Series nptl: Update struct pthread_unwind_buf | expand

Message

H.J. Lu Feb. 1, 2018, 8:57 p.m. UTC
struct pthread_unwind_buf is updated to save and restore shadow stack
register with backward binary compatibility.

H.J. Lu (2):
  Revert "Revert Intel CET changes to __jmp_buf_tag (Bug 22743)"
  nptl: Update struct pthread_unwind_buf [BZ #22743]

 bits/types/__cancel_jmp_buf_tag.h                  | 28 ++++++++
 csu/libc-start.c                                   |  6 +-
 nptl/Makefile                                      |  3 +-
 nptl/cleanup.c                                     |  9 ++-
 nptl/cleanup_defer.c                               | 16 +++--
 nptl/descr.h                                       | 78 +++++++++++++++++-----
 nptl/pthread_create.c                              |  9 ++-
 nptl/unwind.c                                      |  6 +-
 sysdeps/i386/nptl/tcb-offsets.sym                  |  1 +
 sysdeps/i386/nptl/tls.h                            |  4 ++
 sysdeps/nptl/pthread.h                             |  7 +-
 sysdeps/unix/sysv/linux/hppa/pthread.h             |  7 +-
 .../linux/x86/bits/types/__cancel_jmp_buf_tag.h    | 31 +++++++++
 sysdeps/unix/sysv/linux/x86/nptl/pthreadP.h        | 36 ++++++++++
 sysdeps/unix/sysv/linux/x86/pthreaddef.h           | 36 ++++++++++
 sysdeps/x86_64/nptl/tcb-offsets.sym                |  1 +
 sysdeps/x86_64/nptl/tls.h                          |  5 +-
 17 files changed, 238 insertions(+), 45 deletions(-)
 create mode 100644 bits/types/__cancel_jmp_buf_tag.h
 create mode 100644 sysdeps/unix/sysv/linux/x86/bits/types/__cancel_jmp_buf_tag.h
 create mode 100644 sysdeps/unix/sysv/linux/x86/nptl/pthreadP.h
 create mode 100644 sysdeps/unix/sysv/linux/x86/pthreaddef.h

Comments

Carlos O'Donell Feb. 8, 2018, 9:25 a.m. UTC | #1
On 02/01/2018 12:57 PM, H.J. Lu wrote:
> struct pthread_unwind_buf is updated to save and restore shadow stack
> register with backward binary compatibility.
> 
> H.J. Lu (2):
>   Revert "Revert Intel CET changes to __jmp_buf_tag (Bug 22743)"
>   nptl: Update struct pthread_unwind_buf [BZ #22743]
> 

High Level:

To enable Intel CET we need to make sure that all places that change
execution context abruptly also adjust the shadow stack such that the
subsequent return matches what is presently on the stack. The current
setjmp/longjmp are just such functions that change execution context
abruptly, and to use longjmp to return to a setjmp location we must
also restore the shadow stack on the return jump.

The cancellation push/pop sequences also use a jump buffer, and at a
high-level your solution is to expand those buffers with enough space
(an ABI change) to store the shadow stack pointer. The ABI transition
is managed using the CET note markup, and safety is ensured in that way
as a all-or-nothing ABI change.

Note that it depends heavily CET markup rolling out smoothly with glibc
2.28. If any distro enables CET markup without glibc 2.28 then it might
be possible to have objects that use the old-ABI jump buffers in a CET
enabled application and crash. Similarly if users build and use their
own runtime.

Why not version __pthread_register_cancel instead? The versioned symbol
would seem to me a more robust indicator of the larger jump buffer than
the requirement on coordinating markup + glibc 2.28 in all distros and
for all users?

I would like to see an argument made for CET markup against versioning
__pthread_register_cleanup.

Design:

Commit f33632ccd1dec3217583fcfdd965afb62954203c finds space in the
existing sigset_t by reducing the number of possible signals down to
513, the kernel is only using 64, and the remaining space is used to
store the 64-bit shadow stack pointer.

However, this isn't the only place that jump buffers like those used
in setjmp/longjmp are used. Internally in pthread_cleanup_push and
pthread_clenaup_pop the cancellation is handled by a special
structure that is almost identical to the jump buffers used by
setjmp/longjmp. However, because they are not identical, and were
designed to avoid needing to save/restore the process signal mask
during cancellation (as an optimization), these buffers also do
not have enough space to store the shadow stack pointer.

Therefore there needs to be an expansion in the size of the jump
buffer used for cancellation to accomodate the shadow stack pointer.

Your patches present the most logical next step here, which is to
make the cancellation jump buffer the same size as the normal jump
buffer used for setjmp/longjmp.

Changing the cancellation jump buffer is an ABI change though since
the jump buffer is allocated in the application stack via the the
pthread_cleanup_push macro. Some way must be found to determine which
size of jump buffer the caller is using to know if the newer larger
version can be used.

It is not sufficient to version pthread_create to make this distinction
because an application may have been compiled with a new glibc, get
the new version of pthread_create, and then call a DSO which used the
old pthread_cleanup_push, and so expecte the old-ABI jump buffer.

It is not sufficient to use priv.pad[3] because on ABIs with 32-bit
pointers like x32 there is insufficient space to store the 64-bit
shadow stack pointer.

It *is* sufficient to version __pthread_register_cleanup in theory
because on a caller-by-caller basis this would indicate if the caller
had been compiled with the old or new-ABI jump buffer and read
accordingly.

Your patches take another approach, which is to use the binutils CET
markup to indicate an all-or-nothing ABI transition to the new larger
sized jump buffer. If CET is enabled, then the application would have
been compiled with a CET enabled gcc, CET enabled glibc, and CET
enabled binutils, and the result is a CET enabled application which
is then assumed to have the newer larger sized cancellation jump buffer.
Your solution avoids adding another symbol version and uses the existing
CET markup as the deciding factor.

Why might we reject versioning __pthread_register_cleanup? It would seem
that since glibc internally controls the pthread_unwind_buf it will never
be passed by pointer to another TU to be used in ways we don't control
(the problem s390x faced when trying to increase the size of jmpbuf
for setjmp/longjmp) e.g. loaded by a newer version of __pthread_register_cleanup
that expects a different sized object. The internals and their use
as expected from pthread_cleanup_push/pop are all in the same translation
unit.

Again, I would like to see an argument made for CET markup against versioning
__pthread_register_cleanup.

Details:

The name NEED_SAVED_MASK_IN_CANCEL_JMP_BUF while true is slightly
misleading IMO. While it is true that you need the saved mask there, the
actual logical goal is to make the structure match in layout with the
non-cancel jmp_buf. I would rename this to NEED_SETJMP_LAYOUT or something
like that to indicate the intent is to make the structure match the layout
used by setjmp jump buffers.

Both UNWIND_BUF_PRIV and THREAD_COPY_ADDITONAL_INFO are exmaples of typo-prone
macro APIs. We should work hard to define defaults and have unconditional
users of these macros such that -Wundef can warn us if we make mistakes
e.g. https://sourceware.org/glibc/wiki/Wundef

Notes:
- This current patch set passes the ABI tests I ran last time so it is good
  to note that your patches do not appear to introduce any other ABI regressions.
Florian Weimer Feb. 8, 2018, 11:55 a.m. UTC | #2
* Carlos O'Donell:

> I would like to see an argument made for CET markup against versioning
> __pthread_register_cleanup.

I don't think __pthread_register_cleanup.  It's __sigsetjmp.  We would
have to version __sigsetjmp.  Changing the name would be ideal, but
this will be difficult because of its returns-twice nature.

I also don't have a problem with requiring -fexceptions for
cancellation handlers with CET support.  For additional safety, we
could change sigsetjmp to write the shadow stack pointer before the
kernel signal mask, so that it will still be in bounds for legacy
cancellation handler allocation.  __pthread_register_cleanup will
overwrite it, but I don't think we ever need to restore it, so that
shouldn't be a problem.
H.J. Lu Feb. 8, 2018, 1:27 p.m. UTC | #3
On Thu, Feb 8, 2018 at 1:25 AM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 02/01/2018 12:57 PM, H.J. Lu wrote:
>> struct pthread_unwind_buf is updated to save and restore shadow stack
>> register with backward binary compatibility.
>>
>> H.J. Lu (2):
>>   Revert "Revert Intel CET changes to __jmp_buf_tag (Bug 22743)"
>>   nptl: Update struct pthread_unwind_buf [BZ #22743]
>>
>
> High Level:
>
> To enable Intel CET we need to make sure that all places that change
> execution context abruptly also adjust the shadow stack such that the
> subsequent return matches what is presently on the stack. The current
> setjmp/longjmp are just such functions that change execution context
> abruptly, and to use longjmp to return to a setjmp location we must
> also restore the shadow stack on the return jump.
>
> The cancellation push/pop sequences also use a jump buffer, and at a
> high-level your solution is to expand those buffers with enough space
> (an ABI change) to store the shadow stack pointer. The ABI transition
> is managed using the CET note markup, and safety is ensured in that way
> as a all-or-nothing ABI change.
>
> Note that it depends heavily CET markup rolling out smoothly with glibc
> 2.28. If any distro enables CET markup without glibc 2.28 then it might
> be possible to have objects that use the old-ABI jump buffers in a CET
> enabled application and crash. Similarly if users build and use their
> own runtime.
>
> Why not version __pthread_register_cancel instead? The versioned symbol
> would seem to me a more robust indicator of the larger jump buffer than
> the requirement on coordinating markup + glibc 2.28 in all distros and
> for all users?
>
> I would like to see an argument made for CET markup against versioning
> __pthread_register_cleanup.
>
> Design:
>
> Commit f33632ccd1dec3217583fcfdd965afb62954203c finds space in the
> existing sigset_t by reducing the number of possible signals down to
> 513, the kernel is only using 64, and the remaining space is used to
> store the 64-bit shadow stack pointer.
>
> However, this isn't the only place that jump buffers like those used
> in setjmp/longjmp are used. Internally in pthread_cleanup_push and
> pthread_clenaup_pop the cancellation is handled by a special
> structure that is almost identical to the jump buffers used by
> setjmp/longjmp. However, because they are not identical, and were
> designed to avoid needing to save/restore the process signal mask
> during cancellation (as an optimization), these buffers also do
> not have enough space to store the shadow stack pointer.
>
> Therefore there needs to be an expansion in the size of the jump
> buffer used for cancellation to accomodate the shadow stack pointer.
>
> Your patches present the most logical next step here, which is to
> make the cancellation jump buffer the same size as the normal jump
> buffer used for setjmp/longjmp.
>
> Changing the cancellation jump buffer is an ABI change though since
> the jump buffer is allocated in the application stack via the the
> pthread_cleanup_push macro. Some way must be found to determine which
> size of jump buffer the caller is using to know if the newer larger
> version can be used.
>
> It is not sufficient to version pthread_create to make this distinction
> because an application may have been compiled with a new glibc, get
> the new version of pthread_create, and then call a DSO which used the
> old pthread_cleanup_push, and so expecte the old-ABI jump buffer.
>
> It is not sufficient to use priv.pad[3] because on ABIs with 32-bit
> pointers like x32 there is insufficient space to store the 64-bit
> shadow stack pointer.
>
> It *is* sufficient to version __pthread_register_cleanup in theory
> because on a caller-by-caller basis this would indicate if the caller
> had been compiled with the old or new-ABI jump buffer and read
> accordingly.
>
> Your patches take another approach, which is to use the binutils CET
> markup to indicate an all-or-nothing ABI transition to the new larger
> sized jump buffer. If CET is enabled, then the application would have
> been compiled with a CET enabled gcc, CET enabled glibc, and CET
> enabled binutils, and the result is a CET enabled application which
> is then assumed to have the newer larger sized cancellation jump buffer.
> Your solution avoids adding another symbol version and uses the existing
> CET markup as the deciding factor.
>
> Why might we reject versioning __pthread_register_cleanup? It would seem
> that since glibc internally controls the pthread_unwind_buf it will never
> be passed by pointer to another TU to be used in ways we don't control
> (the problem s390x faced when trying to increase the size of jmpbuf
> for setjmp/longjmp) e.g. loaded by a newer version of __pthread_register_cleanup
> that expects a different sized object. The internals and their use
> as expected from pthread_cleanup_push/pop are all in the same translation
> unit.
>
> Again, I would like to see an argument made for CET markup against versioning
> __pthread_register_cleanup.

Symbol versioning works well when only opaque data pointers are used.
A pointer of struct pthread_unwind_buf is passed to libpthread from user
programs.  Within libpthread, we need to know the exact layout of struct
pthread_unwind_buf.  It is next to impossible to use symbol versioning here.

> Details:
>
> The name NEED_SAVED_MASK_IN_CANCEL_JMP_BUF while true is slightly
> misleading IMO. While it is true that you need the saved mask there, the
> actual logical goal is to make the structure match in layout with the
> non-cancel jmp_buf. I would rename this to NEED_SETJMP_LAYOUT or something

Will do.

> like that to indicate the intent is to make the structure match the layout
> used by setjmp jump buffers.
>
> Both UNWIND_BUF_PRIV and THREAD_COPY_ADDITONAL_INFO are exmaples of typo-prone
> macro APIs. We should work hard to define defaults and have unconditional
> users of these macros such that -Wundef can warn us if we make mistakes
> e.g. https://sourceware.org/glibc/wiki/Wundef

Will do.

> Notes:
> - This current patch set passes the ABI tests I ran last time so it is good
>   to note that your patches do not appear to introduce any other ABI regressions.
>

Thanks.
Carlos O'Donell Feb. 9, 2018, 6:29 a.m. UTC | #4
On 02/08/2018 03:55 AM, Florian Weimer wrote:
> * Carlos O'Donell:
> 
>> I would like to see an argument made for CET markup against versioning
>> __pthread_register_cleanup.
> 
> I don't think __pthread_register_cleanup.  It's __sigsetjmp.  We would
> have to version __sigsetjmp.  Changing the name would be ideal, but
> this will be difficult because of its returns-twice nature.

I'm looking for a solution that yields a clearly diagnosable failure
when users mix a compiler which emits CET markup and a glibc which has
the smaller sized cancellation jump buffer.

If we version __sigsetjmp, we need to decide the semantics of the old
and new version.

A naive old version would not save the shadow stack pointer to the
reclaimed space after the newer smaller internal __sigset_t, and a newer
version would save the shadow stack pointer.

This leads to sigsegv when a user mixes as above by accident, and they
will have no warning about the problem. The CET markup will be present
in all objects they have, but when run with a newer glibc it will crash
by corrupting the stack when it writes the shadow stack pointer.

My suggestion with __pthread_register_cancel et.al. (and yes we would 
have to version all the interfaces) was to basically data version the
structure, but my original idea revolved around the idea of shifting
__pad[3] up to the start of the structure and use it for versioning.

However, your idea to version __sigsetjmp has reminded me that
__saved_mask need only be non-zero to work properly and we could
store a version cookie there, which could allow us to do the following:

* Version __sigsetjmp, and store a version cookie in __saved_mask
  indicating the version of the structure.
* Have the old __sigsetjmp disable CET since it clearly indicates
  that we have the wrong sized structure regardless of CET flags.
* Adjust the unwinder to look at __saved_mask version cooke to decide
  which sized structure it is dealing with.

Is this feasible? It would give high quality diagnostics and potentially
allow us to extend the cancellation jump buffer with more data in the
future.

> I also don't have a problem with requiring -fexceptions for
> cancellation handlers with CET support.  For additional safety, we
> could change sigsetjmp to write the shadow stack pointer before the
> kernel signal mask, so that it will still be in bounds for legacy
> cancellation handler allocation.  __pthread_register_cleanup will
> overwrite it, but I don't think we ever need to restore it, so that
> shouldn't be a problem.
 
In the truncated sigjmp_buf used by pthread cancellation we have only 
void * which is not large enough on x32, where you need a 64-bit
shadow stack pointer. It would work on every other machine though.
HJ has mentioned this problem already IIRC.

Why would __pthread_register_cancel overwrite it?
Carlos O'Donell Feb. 9, 2018, 6:40 a.m. UTC | #5
On 02/08/2018 05:27 AM, H.J. Lu wrote:
> On Thu, Feb 8, 2018 at 1:25 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>> Again, I would like to see an argument made for CET markup against versioning
>> __pthread_register_cleanup.
> 
> Symbol versioning works well when only opaque data pointers are used.
> A pointer of struct pthread_unwind_buf is passed to libpthread from user
> programs.  Within libpthread, we need to know the exact layout of struct
> pthread_unwind_buf.  It is next to impossible to use symbol versioning here.

My worry is about incorrectly marked up objects which use the wrong unwind
buffer size resulting in difficult to diagnose crashes.

The exact issue would be users using a gcc and binutils which marks up binaries
with CET notes, but glibc headers which are not CET enabled.

This will happen when we ship Developer Toolset with a gcc and binutils that
has CET support, but the system glibc is still < 2.28. Then users will take
these binaries and try to run them on systems with glibc >= 2.28 because we
allow this backwards compatibility, and this will segfault because we are
turning on CET, but mixing different sized unwind buffers.

Please tell me if you think my worries are unfounded, or of sufficiently
low risk that they will not be a problem, or that we can somehow say that
what was done was unsupported e.g. saying that Developer Toolset used with
Intel CET flags to build anything on any system with glibc < 2.28 generates
unsupported binaries.

My suggestion in the other email is that we briefly investigate what it would
take version the data in the unwind buffer such that we *can* handle different
sized objects properly, and enable us to potentially grow the unwind data
we have in the future.

To reiterate from my other email:
* Version __sigsetjmp, and store a version cookie in __saved_mask
  indicating the version of the structure.
* Have the old __sigsetjmp disable CET since it clearly indicates
  that we have the wrong sized structure regardless of CET flags.
* Adjust the unwinder to look at __saved_mask version cooke to decide
  which sized structure it is dealing with.

The third bullet point is effectively what you are *already* doing
with the CET flags, but using the flags in a coarse way to decide
which of 2 sizes of unwind buffers we are using. This has the problems that
I outline above which is that it is error prone. While the symbol version
mechanism will always tell us exactly which sized object was used when
the TU was compiled.
Florian Weimer Feb. 9, 2018, 10:48 a.m. UTC | #6
* Carlos O'Donell:

> In the truncated sigjmp_buf used by pthread cancellation we have only 
> void * which is not large enough on x32, where you need a 64-bit
> shadow stack pointer. It would work on every other machine though.
> HJ has mentioned this problem already IIRC.
>
> Why would __pthread_register_cancel overwrite it?

__pthread_unwind_buf_t is defined as:

typedef struct
{
  struct
  {
    __jmp_buf __cancel_jmp_buf;
    int __mask_was_saved;
  } __cancel_jmp_buf[1];
  void *__pad[4];
} __pthread_unwind_buf_t __attribute__ ((__aligned__));

pthread_cleanup_push does this:

# define pthread_cleanup_push(routine, arg) \
  do {									      \
    __pthread_unwind_buf_t __cancel_buf;				      \
    void (*__cancel_routine) (void *) = (routine);			      \
    void *__cancel_arg = (arg);						      \
    int __not_first_call = __sigsetjmp ((struct __jmp_buf_tag *) (void *)     \
					__cancel_buf.__cancel_jmp_buf, 0);    \
    if (__glibc_unlikely (__not_first_call))				      \
      {									      \
	__cancel_routine (__cancel_arg);				      \
	__pthread_unwind_next (&__cancel_buf);				      \
	/* NOTREACHED */						      \
      }									      \
									      \
    __pthread_register_cancel (&__cancel_buf);				      \
    do {

__pthread_register_cancel overwrites __cancel_buf.__pad.  If
__sigsetjmp were to write to that memory area, it would not matter, as
long as we skip restoring the shadow stack pointer during unwinding
(which we do not need to do because we never return along the regular
execution path recorded on the shadow stack).

In short, the only thing need to ensure is that the over-write from
__sigsetjmp stays within __cancel_buf.  Then we are good, without
changing the stack layout for cancellation.

My proposal is still rather hackish, but so is the existing code (the
truncated jump buffer), and HJ's approach of storing the shadow stack
pointer in the signal save area of the non-truncated jump buffer.  But
I think we can make it work.
H.J. Lu Feb. 9, 2018, 11:13 a.m. UTC | #7
On Fri, Feb 9, 2018 at 2:48 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
> * Carlos O'Donell:
>
>> In the truncated sigjmp_buf used by pthread cancellation we have only
>> void * which is not large enough on x32, where you need a 64-bit
>> shadow stack pointer. It would work on every other machine though.
>> HJ has mentioned this problem already IIRC.
>>
>> Why would __pthread_register_cancel overwrite it?
>
> __pthread_unwind_buf_t is defined as:
>
> typedef struct
> {
>   struct
>   {
>     __jmp_buf __cancel_jmp_buf;
>     int __mask_was_saved;
>   } __cancel_jmp_buf[1];
>   void *__pad[4];
> } __pthread_unwind_buf_t __attribute__ ((__aligned__));
>
> pthread_cleanup_push does this:
>
> # define pthread_cleanup_push(routine, arg) \
>   do {                                                                        \
>     __pthread_unwind_buf_t __cancel_buf;                                      \
>     void (*__cancel_routine) (void *) = (routine);                            \
>     void *__cancel_arg = (arg);                                               \
>     int __not_first_call = __sigsetjmp ((struct __jmp_buf_tag *) (void *)     \
>                                         __cancel_buf.__cancel_jmp_buf, 0);    \
>     if (__glibc_unlikely (__not_first_call))                                  \
>       {                                                                       \
>         __cancel_routine (__cancel_arg);                                      \
>         __pthread_unwind_next (&__cancel_buf);                                \
>         /* NOTREACHED */                                                      \
>       }                                                                       \
>                                                                               \
>     __pthread_register_cancel (&__cancel_buf);                                \
>     do {
>
> __pthread_register_cancel overwrites __cancel_buf.__pad.  If
> __sigsetjmp were to write to that memory area, it would not matter, as
> long as we skip restoring the shadow stack pointer during unwinding
> (which we do not need to do because we never return along the regular
> execution path recorded on the shadow stack).

That is true.

> In short, the only thing need to ensure is that the over-write from
> __sigsetjmp stays within __cancel_buf.  Then we are good, without
> changing the stack layout for cancellation.

That is correct.

> My proposal is still rather hackish, but so is the existing code (the

A pointer to a buffer in user program is passed to libpthread.  There is a
jmp buf in the buffer followed by other fields.  Since the size of jmp buf is
increased in glibc 2.28, we need to know the offset of other fields. Otherwise
libpthread may write beyond the buffer in user program.  I don't see how
symbol versioning can help us here since the INTERNAL libpthread functions
don't know the layout of __pthread_unwind_buf_t of USER programs.

> truncated jump buffer), and HJ's approach of storing the shadow stack
> pointer in the signal save area of the non-truncated jump buffer.  But
> I think we can make it work.
Florian Weimer Feb. 9, 2018, 12:11 p.m. UTC | #8
* H. J. Lu:

>> My proposal is still rather hackish, but so is the existing code (the
>
> A pointer to a buffer in user program is passed to libpthread.
> There is a jmp buf in the buffer followed by other fields.  Since
> the size of jmp buf is increased in glibc 2.28, we need to know the
> offset of other fields. Otherwise libpthread may write beyond the
> buffer in user program.  I don't see how symbol versioning can help
> us here since the INTERNAL libpthread functions don't know the
> layout of __pthread_unwind_buf_t of USER programs.

I suggest *not* to increase the size of the jump buffer.

CET markup will not be correct for static libraries compiled against
2.27 or earlier with a CET-enabled toolchain, so this is the only
completely safe approach.
H.J. Lu Feb. 9, 2018, 12:34 p.m. UTC | #9
On Fri, Feb 9, 2018 at 4:11 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
> * H. J. Lu:
>
>>> My proposal is still rather hackish, but so is the existing code (the
>>
>> A pointer to a buffer in user program is passed to libpthread.
>> There is a jmp buf in the buffer followed by other fields.  Since
>> the size of jmp buf is increased in glibc 2.28, we need to know the
>> offset of other fields. Otherwise libpthread may write beyond the
>> buffer in user program.  I don't see how symbol versioning can help
>> us here since the INTERNAL libpthread functions don't know the
>> layout of __pthread_unwind_buf_t of USER programs.
>
> I suggest *not* to increase the size of the jump buffer.

Where do we save shadow stack pointer?

> CET markup will not be correct for static libraries compiled against
> 2.27 or earlier with a CET-enabled toolchain, so this is the only
> completely safe approach.

I don't believe so.  As long as there is a single input .o file which
isn't marked
with CET, the output static binary won't be marked with CET.  If what
you said is true, please file a glibc bug report.
H.J. Lu Feb. 9, 2018, 2:13 p.m. UTC | #10
On Fri, Feb 9, 2018 at 4:34 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Feb 9, 2018 at 4:11 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>> * H. J. Lu:
>>
>>>> My proposal is still rather hackish, but so is the existing code (the
>>>
>>> A pointer to a buffer in user program is passed to libpthread.
>>> There is a jmp buf in the buffer followed by other fields.  Since
>>> the size of jmp buf is increased in glibc 2.28, we need to know the
>>> offset of other fields. Otherwise libpthread may write beyond the
>>> buffer in user program.  I don't see how symbol versioning can help
>>> us here since the INTERNAL libpthread functions don't know the
>>> layout of __pthread_unwind_buf_t of USER programs.
>>
>> I suggest *not* to increase the size of the jump buffer.
>
> Where do we save shadow stack pointer?
>
>> CET markup will not be correct for static libraries compiled against
>> 2.27 or earlier with a CET-enabled toolchain, so this is the only
>> completely safe approach.
>
> I don't believe so.  As long as there is a single input .o file which
> isn't marked
> with CET, the output static binary won't be marked with CET.  If what
> you said is true, please file a glibc bug report.
>
>

I built glibc master with gcc-8.0.1 -mcet -fcf-protection.  Some object
files do get CET marker as expected.  But static executable isn't:

[hjl@gnu-skx-1 build-x86_64-linux]$ readelf -n elf/sln.o

Displaying notes found in: .note.gnu.property
  Owner                 Data size Description
  GNU                  0x00000010 NT_GNU_PROPERTY_TYPE_0
      Properties: x86 feature: IBT, SHSTK
[hjl@gnu-skx-1 build-x86_64-linux]$ readelf -n elf/sln

Displaying notes found in: .note.ABI-tag
  Owner                 Data size Description
  GNU                  0x00000010 NT_GNU_ABI_TAG (ABI version tag)
    OS: Linux, ABI: 3.2.0

Displaying notes found in: .note.gnu.build-id
  Owner                 Data size Description
  GNU                  0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: ab9587f0ef16086b740923f72bc1a024b6185b06
[hjl@gnu-skx-1 build-x86_64-linux]$

My CET ecosystem design prevents what you worried from happening.
Florian Weimer Feb. 9, 2018, 2:33 p.m. UTC | #11
On 02/09/2018 03:13 PM, H.J. Lu wrote:
> I built glibc master with gcc-8.0.1 -mcet -fcf-protection.  Some object
> files do get CET marker as expected.  But static executable isn't:

I said static libraries.  If I compile this code (based on the example 
from the manual page) on Fedora rawhide:

#include <pthread.h>
#include <stdio.h>

static int done = 0;
static int cleanup_pop_arg = 0;
static int cnt = 0;

static void
cleanup_handler(void *arg)
{
   printf("Called clean-up handler\n");
   cnt = 0;
}

void *
thread_start(void *arg)
{
   time_t start, curr;

   printf("New thread started\n");

   pthread_cleanup_push(cleanup_handler, NULL);

   curr = start = time(NULL);

   while (!done) {
     pthread_testcancel();           /* A cancellation point */
     if (curr < time(NULL)) {
       curr = time(NULL);
       printf("cnt = %d\n", cnt);  /* A cancellation point */
       cnt++;
     }
   }

   pthread_cleanup_pop(cleanup_pop_arg);
   return NULL;
}

the small jump buffer is used:

0000000000000030 <thread_start>:
   30:   f3 0f 1e fa             endbr64
   34:   53                      push   %rbx
   35:   bf 00 00 00 00          mov    $0x0,%edi
                         36: R_X86_64_32 .rodata.str1.1+0x18
   3a:   48 83 ec 70             sub    $0x70,%rsp
   3e:   e8 00 00 00 00          callq  43 <thread_start+0x13>
                         3f: R_X86_64_PC32       puts-0x4
   43:   31 f6                   xor    %esi,%esi
   45:   48 89 e7                mov    %rsp,%rdi
   48:   e8 00 00 00 00          callq  4d <thread_start+0x1d>
                         49: R_X86_64_PC32       __sigsetjmp-0x4
   4d:   f3 0f 1e fa             endbr64
   51:   85 c0                   test   %eax,%eax
   53:   75 51                   jne    a6 <thread_start+0x76>
   55:   48 89 e7                mov    %rsp,%rdi
   58:   e8 00 00 00 00          callq  5d <thread_start+0x2d>
                         59: R_X86_64_PC32  __pthread_register_cancel-0x4

And it looks to me that readelf says the object file is compatible with CET:

Displaying notes found in: .note.gnu.property
   Owner                 Data size       Description
   GNU                  0x00000010       NT_GNU_PROPERTY_TYPE_0
       Properties: x86 feature: IBT, SHSTK

Thanks,
Florian
H.J. Lu Feb. 9, 2018, 3:24 p.m. UTC | #12
On Fri, Feb 9, 2018 at 6:33 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 02/09/2018 03:13 PM, H.J. Lu wrote:
>>
>> I built glibc master with gcc-8.0.1 -mcet -fcf-protection.  Some object
>> files do get CET marker as expected.  But static executable isn't:
>
>
> I said static libraries.  If I compile this code (based on the example from
> the manual page) on Fedora rawhide:
>

Glibc never provides binary compatibility with static libraries.  My suggestions
are

1. Recompile static libraries after CET is enabled in glibc.  Or
2. Don't compile static libraries with CET.

BTW, we don't have space to save shadow stack register with existing
cancel buf.
Carlos O'Donell Feb. 24, 2018, 5:46 a.m. UTC | #13
On 02/09/2018 04:34 AM, H.J. Lu wrote:
> On Fri, Feb 9, 2018 at 4:11 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>> * H. J. Lu:
>>
>>>> My proposal is still rather hackish, but so is the existing code (the
>>>
>>> A pointer to a buffer in user program is passed to libpthread.
>>> There is a jmp buf in the buffer followed by other fields.  Since
>>> the size of jmp buf is increased in glibc 2.28, we need to know the
>>> offset of other fields. Otherwise libpthread may write beyond the
>>> buffer in user program.  I don't see how symbol versioning can help
>>> us here since the INTERNAL libpthread functions don't know the
>>> layout of __pthread_unwind_buf_t of USER programs.
>>
>> I suggest *not* to increase the size of the jump buffer.
> 
> Where do we save shadow stack pointer?

typedef struct
{
  struct
  {
    __jmp_buf __cancel_jmp_buf;
    int __mask_was_saved;
  } __cancel_jmp_buf[1];


  void *__pad[4];
  ^^^^^^^^^^^^^^^ Save the shadow stack pointer here.


} __pthread_unwind_buf_t __attribute__ ((__aligned__));

Save the shadow stack pointer to __pad[4] by making the
internal sigset_t smaller and moving it down.

The key aspect of Florian's recommendation is a realization
that a pthread_cleanup_pop can only restore you to the *same*
function e.g. the earlier pthread_cleanup_push, and therefore
does not need to change the shadow stack pointer.

All subsequent unwinding will proceed from one jump buffer
to the next, and through the unwinder, until it reaches
pthread_create. When will we ever return via a normal return
instruction and need to verify via the shadow stack?
Carlos O'Donell Feb. 24, 2018, 6:41 a.m. UTC | #14
On 02/09/2018 02:48 AM, Florian Weimer wrote:
> In short, the only thing need to ensure is that the over-write from
> __sigsetjmp stays within __cancel_buf.  Then we are good, without
> changing the stack layout for cancellation.
> 
> My proposal is still rather hackish, but so is the existing code (the
> truncated jump buffer), and HJ's approach of storing the shadow stack
> pointer in the signal save area of the non-truncated jump buffer.  But
> I think we can make it work.
 
I disagree completely. I think this is an an elegant solution to an ugly
problem. We need to see if HJ has any other requirements which may conflict
with this suggestion.
H.J. Lu Feb. 24, 2018, 3:19 p.m. UTC | #15
On Fri, Feb 23, 2018 at 9:46 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 02/09/2018 04:34 AM, H.J. Lu wrote:
>> On Fri, Feb 9, 2018 at 4:11 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>>> * H. J. Lu:
>>>
>>>>> My proposal is still rather hackish, but so is the existing code (the
>>>>
>>>> A pointer to a buffer in user program is passed to libpthread.
>>>> There is a jmp buf in the buffer followed by other fields.  Since
>>>> the size of jmp buf is increased in glibc 2.28, we need to know the
>>>> offset of other fields. Otherwise libpthread may write beyond the
>>>> buffer in user program.  I don't see how symbol versioning can help
>>>> us here since the INTERNAL libpthread functions don't know the
>>>> layout of __pthread_unwind_buf_t of USER programs.
>>>
>>> I suggest *not* to increase the size of the jump buffer.
>>
>> Where do we save shadow stack pointer?
>
> typedef struct
> {
>   struct
>   {
>     __jmp_buf __cancel_jmp_buf;
>     int __mask_was_saved;
>   } __cancel_jmp_buf[1];
>
>
>   void *__pad[4];
>   ^^^^^^^^^^^^^^^ Save the shadow stack pointer here.
>
>
> } __pthread_unwind_buf_t __attribute__ ((__aligned__));
>
> Save the shadow stack pointer to __pad[4] by making the
> internal sigset_t smaller and moving it down.
>
> The key aspect of Florian's recommendation is a realization
> that a pthread_cleanup_pop can only restore you to the *same*
> function e.g. the earlier pthread_cleanup_push, and therefore
> does not need to change the shadow stack pointer.

PLEASE take a closer look:

Yes, there are

void *__pad[4];

But the name is misleading.   It isn't real padding.  This is
an opaque array:

/* Private data in the cleanup buffer.  */
union pthread_unwind_buf_data
{
  /* This is the placeholder of the public version.  */
  void *pad[4];

  struct
  {
    /* Pointer to the previous cleanup buffer.  */
    struct pthread_unwind_buf *prev;

    /* Backward compatibility: state of the old-style cleanup
       handler at the time of the previous new-style cleanup handler
       installment.  */
    struct _pthread_cleanup_buffer *cleanup;

    /* Cancellation type before the push call.  */
    int canceltype;
  } data;
};

Only the last element in __pad[4] is unused.  There is

---
Note: There is an unused pointer space in pthread_unwind_buf_data.  But
it isn't suitable for saving and restoring shadow stack register since
x32 is a 64-bit process with 32-bit software pointer and kernel may
place x32 shadow stack above 4GB.  We need to save and restore 64-bit
shadow stack register for x32.
---

in my commit log to explain why it isn't suitable for shadow stack.
Florian Weimer Feb. 24, 2018, 3:46 p.m. UTC | #16
* H. J. Lu:

> PLEASE take a closer look:
>
> Yes, there are
>
> void *__pad[4];
>
> But the name is misleading.   It isn't real padding.  This is
> an opaque array:
>
> /* Private data in the cleanup buffer.  */
> union pthread_unwind_buf_data
> {
>   /* This is the placeholder of the public version.  */
>   void *pad[4];
>
>   struct
>   {
>     /* Pointer to the previous cleanup buffer.  */
>     struct pthread_unwind_buf *prev;
>
>     /* Backward compatibility: state of the old-style cleanup
>        handler at the time of the previous new-style cleanup handler
>        installment.  */
>     struct _pthread_cleanup_buffer *cleanup;
>
>     /* Cancellation type before the push call.  */
>     int canceltype;
>   } data;
> };
>
> Only the last element in __pad[4] is unused.  There is

The entire __pad array is unused until the handler is registered,
which happens *after* the call to __sigsetjmp, in the
__pthread_register_cancel function.  This means that __sigsetjmp may
clobber it.
H.J. Lu Feb. 25, 2018, 2:04 a.m. UTC | #17
On Sat, Feb 24, 2018 at 7:46 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
> * H. J. Lu:
>
>> PLEASE take a closer look:
>>
>> Yes, there are
>>
>> void *__pad[4];
>>
>> But the name is misleading.   It isn't real padding.  This is
>> an opaque array:
>>
>> /* Private data in the cleanup buffer.  */
>> union pthread_unwind_buf_data
>> {
>>   /* This is the placeholder of the public version.  */
>>   void *pad[4];
>>
>>   struct
>>   {
>>     /* Pointer to the previous cleanup buffer.  */
>>     struct pthread_unwind_buf *prev;
>>
>>     /* Backward compatibility: state of the old-style cleanup
>>        handler at the time of the previous new-style cleanup handler
>>        installment.  */
>>     struct _pthread_cleanup_buffer *cleanup;
>>
>>     /* Cancellation type before the push call.  */
>>     int canceltype;
>>   } data;
>> };
>>
>> Only the last element in __pad[4] is unused.  There is
>
> The entire __pad array is unused until the handler is registered,
> which happens *after* the call to __sigsetjmp, in the
> __pthread_register_cancel function.  This means that __sigsetjmp may
> clobber it.

Please check out hjl/setjmp/pad branch and check it on x86-64.

1. It uses pad array in struct pthread_unwind_buf to save and restore shadow
stack register if size of struct pthread_unwind_buf is no less than
offset of shadow stack pointer + shadow stack pointer size.

2. It stores (int64_t) -1 as shadow stack register in x86-64 setjmp and read
it back in x86-64 longjmp to verify that it is unchanged.

I got

FAIL: nptl/tst-basic3
FAIL: nptl/tst-cancel-self
FAIL: nptl/tst-cancel-self-cancelstate
FAIL: nptl/tst-cancel-self-canceltype
FAIL: nptl/tst-cancel-self-testcancel
FAIL: nptl/tst-cancel1
FAIL: nptl/tst-cancel10
FAIL: nptl/tst-cancel11
FAIL: nptl/tst-cancel12
FAIL: nptl/tst-cancel13
FAIL: nptl/tst-cancel14
FAIL: nptl/tst-cancel15
FAIL: nptl/tst-cancel16
FAIL: nptl/tst-cancel17
FAIL: nptl/tst-cancel18
FAIL: nptl/tst-cancel20
FAIL: nptl/tst-cancel21
FAIL: nptl/tst-cancel21-static
FAIL: nptl/tst-cancel24
FAIL: nptl/tst-cancel24-static
FAIL: nptl/tst-cancel25
FAIL: nptl/tst-cancel4
FAIL: nptl/tst-cancel4_1
FAIL: nptl/tst-cancel4_2
FAIL: nptl/tst-cancel5
FAIL: nptl/tst-cancel7
FAIL: nptl/tst-cancel9
FAIL: nptl/tst-cancelx13
FAIL: nptl/tst-cancelx15
FAIL: nptl/tst-cancelx21
FAIL: nptl/tst-cancelx7
FAIL: nptl/tst-cleanup0
FAIL: nptl/tst-cleanup0-cmp
FAIL: nptl/tst-cleanup1
FAIL: nptl/tst-cleanup3
FAIL: nptl/tst-cleanup4
FAIL: nptl/tst-cleanupx0
FAIL: nptl/tst-cleanupx4
FAIL: nptl/tst-cond-except
FAIL: nptl/tst-cond22
FAIL: nptl/tst-cond25
FAIL: nptl/tst-cond7
FAIL: nptl/tst-cond8
FAIL: nptl/tst-cond8-static
FAIL: nptl/tst-execstack
FAIL: nptl/tst-exit2
FAIL: nptl/tst-exit3
FAIL: nptl/tst-join1
FAIL: nptl/tst-join5
FAIL: nptl/tst-mutex8
FAIL: nptl/tst-mutex8-static
FAIL: nptl/tst-mutexpi8
FAIL: nptl/tst-mutexpi8-static
FAIL: nptl/tst-once3
FAIL: nptl/tst-once4
FAIL: nptl/tst-oncex3
FAIL: nptl/tst-oncex4
FAIL: nptl/tst-sem11
FAIL: nptl/tst-sem11-static
FAIL: nptl/tst-sem12
FAIL: nptl/tst-sem12-static
FAIL: nptl/tst-tsd5
FAIL: nss/tst-cancel-getpwuid_r
FAIL: rt/tst-mqueue8
FAIL: nptl/tst-setuid2

For nptl/tst-tsd5, it went like this:

1. __libc_start_main calls __sigsetjmp:

 /* Memory for the cancellation buffer.  */
  struct pthread_unwind_buf unwind_buf;

  int not_first_call;
  not_first_call = setjmp ((struct __jmp_buf_tag *) unwind_buf.cancel_jmp_buf);
  if (__glibc_likely (! not_first_call))

__sigsetjmp stores -1 as shadow stack pointer.

2.  After calling  __sigsetjmp, __libc_start_main does

      /* Store old info.  */
      unwind_buf.priv.data.prev = THREAD_GETMEM (self, cleanup_jmp_buf);
      unwind_buf.priv.data.cleanup = THREAD_GETMEM (self, cleanup);

which overrides shadow stack pointer.

What have I done wrong?
Florian Weimer Feb. 25, 2018, 9:26 a.m. UTC | #18
* H. J. Lu:

> What have I done wrong?

My understanding (shared by Carlos as far as I know) is that you do
not need to restore the shadow stack pointer when unwinding for
cancellation or thread exit.
H.J. Lu Feb. 25, 2018, 11:37 a.m. UTC | #19
On Sun, Feb 25, 2018 at 1:26 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
> * H. J. Lu:
>
>> What have I done wrong?
>
> My understanding (shared by Carlos as far as I know) is that you do
> not need to restore the shadow stack pointer when unwinding for
> cancellation or thread exit.

It may be true for thread exit.  Are you saying that a function will never
return after cancellation point?
Florian Weimer Feb. 25, 2018, 11:59 a.m. UTC | #20
* H. J. Lu:

> On Sun, Feb 25, 2018 at 1:26 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>> * H. J. Lu:
>>
>>> What have I done wrong?
>>
>> My understanding (shared by Carlos as far as I know) is that you do
>> not need to restore the shadow stack pointer when unwinding for
>> cancellation or thread exit.
>
> It may be true for thread exit.  Are you saying that a function will never
> return after cancellation point?

That's certainly the intent.  Cancellation is supposed to be
unstoppable.
H.J. Lu Feb. 25, 2018, 12:53 p.m. UTC | #21
On Sun, Feb 25, 2018 at 3:59 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
> * H. J. Lu:
>
>> On Sun, Feb 25, 2018 at 1:26 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>>> * H. J. Lu:
>>>
>>>> What have I done wrong?
>>>
>>> My understanding (shared by Carlos as far as I know) is that you do
>>> not need to restore the shadow stack pointer when unwinding for
>>> cancellation or thread exit.
>>
>> It may be true for thread exit.  Are you saying that a function will never
>> return after cancellation point?
>
> That's certainly the intent.  Cancellation is supposed to be
> unstoppable.

The question is what happens after cancellation.  Can a function return
after cancellation?
H.J. Lu Feb. 25, 2018, 12:55 p.m. UTC | #22
On Sun, Feb 25, 2018 at 4:53 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Sun, Feb 25, 2018 at 3:59 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>> * H. J. Lu:
>>
>>> On Sun, Feb 25, 2018 at 1:26 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>>>> * H. J. Lu:
>>>>
>>>>> What have I done wrong?
>>>>
>>>> My understanding (shared by Carlos as far as I know) is that you do
>>>> not need to restore the shadow stack pointer when unwinding for
>>>> cancellation or thread exit.
>>>
>>> It may be true for thread exit.  Are you saying that a function will never
>>> return after cancellation point?
>>
>> That's certainly the intent.  Cancellation is supposed to be
>> unstoppable.
>
> The question is what happens after cancellation.  Can a function return
> after cancellation?
>

Let me rephrase.  Can a function, which contains cancellation points,
return to its caller after cancellation happened?
Florian Weimer Feb. 25, 2018, 12:58 p.m. UTC | #23
* H. J. Lu:

> Let me rephrase.  Can a function, which contains cancellation points,
> return to its caller after cancellation happened?

I don't think it can.  It's undefined to use longjmp to jump out of a
cancellation handler.  On the C++ side, we make sure that the
unwinding always proceeds, even across catch (...) blocks which
swallow the exception.
H.J. Lu Feb. 25, 2018, 1:23 p.m. UTC | #24
On Sun, Feb 25, 2018 at 4:58 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
> * H. J. Lu:
>
>> Let me rephrase.  Can a function, which contains cancellation points,
>> return to its caller after cancellation happened?
>
> I don't think it can.  It's undefined to use longjmp to jump out of a
> cancellation handler.  On the C++ side, we make sure that the
> unwinding always proceeds, even across catch (...) blocks which
> swallow the exception.

libpthread cancellation implementation passes cancel_jmp_buf to
libgcc unwinder, which saves and restores shadow stack register
for C++ exception.  How can libgcc unwinder tell when to save and
restore shadow stack register?
Florian Weimer Feb. 25, 2018, 1:31 p.m. UTC | #25
* H. J. Lu:

> libpthread cancellation implementation passes cancel_jmp_buf to
> libgcc unwinder,

Oh.  Where does it do that?  If you mean _Unwind_ForcedUnwind, I think
that's just an opaque closure argument for the callback.
H.J. Lu Feb. 25, 2018, 1:36 p.m. UTC | #26
On Sun, Feb 25, 2018 at 5:31 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
> * H. J. Lu:
>
>> libpthread cancellation implementation passes cancel_jmp_buf to
>> libgcc unwinder,
>
> Oh.  Where does it do that?  If you mean _Unwind_ForcedUnwind, I think
> that's just an opaque closure argument for the callback.

Yes.  Libgcc unwinder needs to deal with it.
H.J. Lu Feb. 25, 2018, 1:49 p.m. UTC | #27
On Sun, Feb 25, 2018 at 5:36 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Sun, Feb 25, 2018 at 5:31 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>> * H. J. Lu:
>>
>>> libpthread cancellation implementation passes cancel_jmp_buf to
>>> libgcc unwinder,
>>
>> Oh.  Where does it do that?  If you mean _Unwind_ForcedUnwind, I think
>> that's just an opaque closure argument for the callback.
>
> Yes.  Libgcc unwinder needs to deal with it.
>

Because of  libgcc unwinder,  provide another set of setjmp/longjmp
without saving and restoring shadow stack register for thread cancellation
won't solve our problem.
Florian Weimer Feb. 25, 2018, 1:49 p.m. UTC | #28
* H. J. Lu:

> On Sun, Feb 25, 2018 at 5:31 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>> * H. J. Lu:
>>
>>> libpthread cancellation implementation passes cancel_jmp_buf to
>>> libgcc unwinder,
>>
>> Oh.  Where does it do that?  If you mean _Unwind_ForcedUnwind, I think
>> that's just an opaque closure argument for the callback.
>
> Yes.  Libgcc unwinder needs to deal with it.

Please point me to the code.  Thanks.
H.J. Lu Feb. 25, 2018, 2 p.m. UTC | #29
On Sun, Feb 25, 2018 at 5:49 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
> * H. J. Lu:
>
>> On Sun, Feb 25, 2018 at 5:31 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>>> * H. J. Lu:
>>>
>>>> libpthread cancellation implementation passes cancel_jmp_buf to
>>>> libgcc unwinder,
>>>
>>> Oh.  Where does it do that?  If you mean _Unwind_ForcedUnwind, I think
>>> that's just an opaque closure argument for the callback.
>>
>> Yes.  Libgcc unwinder needs to deal with it.
>
> Please point me to the code.  Thanks.

sysdeps/nptl/unwind-forcedunwind.c has

_Unwind_Reason_Code
_Unwind_ForcedUnwind (struct _Unwind_Exception *exc, _Unwind_Stop_Fn stop,
      void *stop_argument)
{
  if (__glibc_unlikely (libgcc_s_handle == NULL))
    pthread_cancel_init ();
  else
    atomic_read_barrier ();

  _Unwind_Reason_Code (*forcedunwind)
    (struct _Unwind_Exception *, _Unwind_Stop_Fn, void *)
    = libgcc_s_forcedunwind;
  PTR_DEMANGLE (forcedunwind);
  return forcedunwind (exc, stop, stop_argument);
}
Florian Weimer Feb. 25, 2018, 2:13 p.m. UTC | #30
* H. J. Lu:

> On Sun, Feb 25, 2018 at 5:49 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>> * H. J. Lu:
>>
>>> On Sun, Feb 25, 2018 at 5:31 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>>>> * H. J. Lu:
>>>>
>>>>> libpthread cancellation implementation passes cancel_jmp_buf to
>>>>> libgcc unwinder,
>>>>
>>>> Oh.  Where does it do that?  If you mean _Unwind_ForcedUnwind, I think
>>>> that's just an opaque closure argument for the callback.
>>>
>>> Yes.  Libgcc unwinder needs to deal with it.
>>
>> Please point me to the code.  Thanks.
>
> sysdeps/nptl/unwind-forcedunwind.c has
>
> _Unwind_Reason_Code
> _Unwind_ForcedUnwind (struct _Unwind_Exception *exc, _Unwind_Stop_Fn stop,
>       void *stop_argument)
> {
>   if (__glibc_unlikely (libgcc_s_handle == NULL))
>     pthread_cancel_init ();
>   else
>     atomic_read_barrier ();
>
>   _Unwind_Reason_Code (*forcedunwind)
>     (struct _Unwind_Exception *, _Unwind_Stop_Fn, void *)
>     = libgcc_s_forcedunwind;
>   PTR_DEMANGLE (forcedunwind);
>   return forcedunwind (exc, stop, stop_argument);
> }

Thanks.  I think stop_argument ends up in the private_2 member inside
unwind.inc, which is only passed back to the callback (the stop
function pointer) in _Unwind_ForcedUnwind_Phase2, and not interpreted
by libgcc itself.  So this shouldn't be a problem.
H.J. Lu Feb. 26, 2018, 3:55 a.m. UTC | #31
On Sun, Feb 25, 2018 at 6:13 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
> * H. J. Lu:
>
>> On Sun, Feb 25, 2018 at 5:49 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>>> * H. J. Lu:
>>>
>>>> On Sun, Feb 25, 2018 at 5:31 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>>>>> * H. J. Lu:
>>>>>
>>>>>> libpthread cancellation implementation passes cancel_jmp_buf to
>>>>>> libgcc unwinder,
>>>>>
>>>>> Oh.  Where does it do that?  If you mean _Unwind_ForcedUnwind, I think
>>>>> that's just an opaque closure argument for the callback.
>>>>
>>>> Yes.  Libgcc unwinder needs to deal with it.
>>>
>>> Please point me to the code.  Thanks.
>>
>> sysdeps/nptl/unwind-forcedunwind.c has
>>
>> _Unwind_Reason_Code
>> _Unwind_ForcedUnwind (struct _Unwind_Exception *exc, _Unwind_Stop_Fn stop,
>>       void *stop_argument)
>> {
>>   if (__glibc_unlikely (libgcc_s_handle == NULL))
>>     pthread_cancel_init ();
>>   else
>>     atomic_read_barrier ();
>>
>>   _Unwind_Reason_Code (*forcedunwind)
>>     (struct _Unwind_Exception *, _Unwind_Stop_Fn, void *)
>>     = libgcc_s_forcedunwind;
>>   PTR_DEMANGLE (forcedunwind);
>>   return forcedunwind (exc, stop, stop_argument);
>> }
>
> Thanks.  I think stop_argument ends up in the private_2 member inside
> unwind.inc, which is only passed back to the callback (the stop
> function pointer) in _Unwind_ForcedUnwind_Phase2, and not interpreted
> by libgcc itself.  So this shouldn't be a problem.

Please take a look at hjl/setjmp/cancel branch:

https://github.com/hjl-tools/glibc/tree/hjl/setjmp/cancel

Functions, like LIBC_START_MAIN, START_THREAD_DEFN as well as these
with thread cancellation, call setjmp, but never return to their
callers after longjmp returns.  This patch adds <bits/setjmp-cancel.h>
and <setjmp-cancelP.h> to provide a version of setjmp family functions,
__setjmp_cancel and __sigsetjmp_cancel, which are used to implement
thread cancellation.  The default __setjmp_cancel and __sigsetjmp_cancel
are defined as setjmp and __sigsetjmp, respectively, which are the same
as before.

On x86, a different version is added to avoid saving and restoring
shadow stack register.  __libc_longjmp, which is a private interface
for cancellation implementation in libpthread, is changed to call
__longjmp_cancel instead of __longjmp.

This leaves cancel_jmp_buf unchanged.  But is it worth it?

1. I have to add  __setjmp_cancel and __sigsetjmp_cancel which won't
save and restore shadow stack register.
2. I still need yet to add the new version of __sigsetjmp for older binaries.
3, Older .o files compiled against glibc 2.27 are still incompatible with
glibc 2.28.
Carlos O'Donell Feb. 28, 2018, 11:23 p.m. UTC | #32
On 02/25/2018 07:55 PM, H.J. Lu wrote:
> On Sun, Feb 25, 2018 at 6:13 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>> * H. J. Lu:
>>
>>> On Sun, Feb 25, 2018 at 5:49 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>>>> * H. J. Lu:
>>>>
>>>>> On Sun, Feb 25, 2018 at 5:31 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>>>>>> * H. J. Lu:
>>>>>>
>>>>>>> libpthread cancellation implementation passes cancel_jmp_buf to
>>>>>>> libgcc unwinder,
>>>>>>
>>>>>> Oh.  Where does it do that?  If you mean _Unwind_ForcedUnwind, I think
>>>>>> that's just an opaque closure argument for the callback.
>>>>>
>>>>> Yes.  Libgcc unwinder needs to deal with it.
>>>>
>>>> Please point me to the code.  Thanks.
>>>
>>> sysdeps/nptl/unwind-forcedunwind.c has
>>>
>>> _Unwind_Reason_Code
>>> _Unwind_ForcedUnwind (struct _Unwind_Exception *exc, _Unwind_Stop_Fn stop,
>>>       void *stop_argument)
>>> {
>>>   if (__glibc_unlikely (libgcc_s_handle == NULL))
>>>     pthread_cancel_init ();
>>>   else
>>>     atomic_read_barrier ();
>>>
>>>   _Unwind_Reason_Code (*forcedunwind)
>>>     (struct _Unwind_Exception *, _Unwind_Stop_Fn, void *)
>>>     = libgcc_s_forcedunwind;
>>>   PTR_DEMANGLE (forcedunwind);
>>>   return forcedunwind (exc, stop, stop_argument);
>>> }
>>
>> Thanks.  I think stop_argument ends up in the private_2 member inside
>> unwind.inc, which is only passed back to the callback (the stop
>> function pointer) in _Unwind_ForcedUnwind_Phase2, and not interpreted
>> by libgcc itself.  So this shouldn't be a problem.
> 
> Please take a look at hjl/setjmp/cancel branch:
> 
> https://github.com/hjl-tools/glibc/tree/hjl/setjmp/cancel

OK.

> Functions, like LIBC_START_MAIN, START_THREAD_DEFN as well as these
> with thread cancellation, call setjmp, but never return to their
> callers after longjmp returns.  This patch adds <bits/setjmp-cancel.h>
> and <setjmp-cancelP.h> to provide a version of setjmp family functions,
> __setjmp_cancel and __sigsetjmp_cancel, which are used to implement
> thread cancellation.  The default __setjmp_cancel and __sigsetjmp_cancel
> are defined as setjmp and __sigsetjmp, respectively, which are the same
> as before.

> On x86, a different version is added to avoid saving and restoring
> shadow stack register.  __libc_longjmp, which is a private interface
> for cancellation implementation in libpthread, is changed to call
> __longjmp_cancel instead of __longjmp.

The consequence of this solution is that any function calls after the
first longjmp will continue to extend the shadow stack because there is
no restoration? I think this is OK, you effectively need to have enough
shadow stack to run through all of the unwind functions as if they were
nested frames on the shadow stack.

Do we see that being a problem?
 
> This leaves cancel_jmp_buf unchanged.  But is it worth it?

Absolutely it is worth it. This new design looks much simpler to support.

The technical trade is as follows:

* Three new versioned symbols for __sigsetjmp, __sigsetjmp_cancel,
  and __setjmp_cancel.

  * You can avoid versioning by placing the shadow stack pointer such
    that it is written into the pad[4] of the cancel_jmp_buf, and thus
    it would always have space to be written. However, it is cleaner
    to have cancel versions of these functions that do less.

vs.

* An ABI change that requires coordination across glibc, gcc, and
  binutils to ensure there is a "flag day" where the ABI can be correctly
  selected at runtime.

The latter is not easy to execute for downstream, and the versioned
symbols provide a good barrier in terms of compatibility and errors
reported by the dynamic loader.
 
> 1. I have to add  __setjmp_cancel and __sigsetjmp_cancel which won't
> save and restore shadow stack register.

OK.

> 2. I still need yet to add the new version of __sigsetjmp for older binaries.

OK.

> 3, Older .o files compiled against glibc 2.27 are still incompatible with
> glibc 2.28.
 
This is normal, and has never been assured.
H.J. Lu March 7, 2018, 11:56 a.m. UTC | #33
On Wed, Feb 28, 2018 at 3:23 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 02/25/2018 07:55 PM, H.J. Lu wrote:
>> On Sun, Feb 25, 2018 at 6:13 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>>> * H. J. Lu:
>>>
>>>> On Sun, Feb 25, 2018 at 5:49 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>>>>> * H. J. Lu:
>>>>>
>>>>>> On Sun, Feb 25, 2018 at 5:31 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>>>>>>> * H. J. Lu:
>>>>>>>
>>>>>>>> libpthread cancellation implementation passes cancel_jmp_buf to
>>>>>>>> libgcc unwinder,
>>>>>>>
>>>>>>> Oh.  Where does it do that?  If you mean _Unwind_ForcedUnwind, I think
>>>>>>> that's just an opaque closure argument for the callback.
>>>>>>
>>>>>> Yes.  Libgcc unwinder needs to deal with it.
>>>>>
>>>>> Please point me to the code.  Thanks.
>>>>
>>>> sysdeps/nptl/unwind-forcedunwind.c has
>>>>
>>>> _Unwind_Reason_Code
>>>> _Unwind_ForcedUnwind (struct _Unwind_Exception *exc, _Unwind_Stop_Fn stop,
>>>>       void *stop_argument)
>>>> {
>>>>   if (__glibc_unlikely (libgcc_s_handle == NULL))
>>>>     pthread_cancel_init ();
>>>>   else
>>>>     atomic_read_barrier ();
>>>>
>>>>   _Unwind_Reason_Code (*forcedunwind)
>>>>     (struct _Unwind_Exception *, _Unwind_Stop_Fn, void *)
>>>>     = libgcc_s_forcedunwind;
>>>>   PTR_DEMANGLE (forcedunwind);
>>>>   return forcedunwind (exc, stop, stop_argument);
>>>> }
>>>
>>> Thanks.  I think stop_argument ends up in the private_2 member inside
>>> unwind.inc, which is only passed back to the callback (the stop
>>> function pointer) in _Unwind_ForcedUnwind_Phase2, and not interpreted
>>> by libgcc itself.  So this shouldn't be a problem.
>>
>> Please take a look at hjl/setjmp/cancel branch:
>>
>> https://github.com/hjl-tools/glibc/tree/hjl/setjmp/cancel
>
> OK.
>
>> Functions, like LIBC_START_MAIN, START_THREAD_DEFN as well as these
>> with thread cancellation, call setjmp, but never return to their
>> callers after longjmp returns.  This patch adds <bits/setjmp-cancel.h>
>> and <setjmp-cancelP.h> to provide a version of setjmp family functions,
>> __setjmp_cancel and __sigsetjmp_cancel, which are used to implement
>> thread cancellation.  The default __setjmp_cancel and __sigsetjmp_cancel
>> are defined as setjmp and __sigsetjmp, respectively, which are the same
>> as before.
>
>> On x86, a different version is added to avoid saving and restoring
>> shadow stack register.  __libc_longjmp, which is a private interface
>> for cancellation implementation in libpthread, is changed to call
>> __longjmp_cancel instead of __longjmp.
>
> The consequence of this solution is that any function calls after the
> first longjmp will continue to extend the shadow stack because there is
> no restoration? I think this is OK, you effectively need to have enough
> shadow stack to run through all of the unwind functions as if they were
> nested frames on the shadow stack.
>
> Do we see that being a problem?
>
>> This leaves cancel_jmp_buf unchanged.  But is it worth it?
>
> Absolutely it is worth it. This new design looks much simpler to support.
>
> The technical trade is as follows:
>
> * Three new versioned symbols for __sigsetjmp, __sigsetjmp_cancel,
>   and __setjmp_cancel.
>
>   * You can avoid versioning by placing the shadow stack pointer such
>     that it is written into the pad[4] of the cancel_jmp_buf, and thus
>     it would always have space to be written. However, it is cleaner
>     to have cancel versions of these functions that do less.
>
> vs.
>
> * An ABI change that requires coordination across glibc, gcc, and
>   binutils to ensure there is a "flag day" where the ABI can be correctly
>   selected at runtime.
>
> The latter is not easy to execute for downstream, and the versioned
> symbols provide a good barrier in terms of compatibility and errors
> reported by the dynamic loader.
>
>> 1. I have to add  __setjmp_cancel and __sigsetjmp_cancel which won't
>> save and restore shadow stack register.

I have been testing this.  I ran into one issue.  GCC knows that setjmp will
return via longjmp and inserts ENDBR after it.  But it doesn't know
  __setjmp_cancel and __sigsetjmp_cancel.   We can either add them to GCC
or add NOTRACK prefix to the corresponding longjmps.

> OK.
>
>> 2. I still need yet to add the new version of __sigsetjmp for older binaries.
>
> OK.
>
>> 3, Older .o files compiled against glibc 2.27 are still incompatible with
>> glibc 2.28.
>
> This is normal, and has never been assured.
>
> --
> Cheers,
> Carlos.
Carlos O'Donell March 7, 2018, 5:34 p.m. UTC | #34
On 03/07/2018 03:56 AM, H.J. Lu wrote:
>>> 1. I have to add  __setjmp_cancel and __sigsetjmp_cancel which won't
>>> save and restore shadow stack register.
> 
> I have been testing this.  I ran into one issue.  GCC knows that setjmp will
> return via longjmp and inserts ENDBR after it.  But it doesn't know
>   __setjmp_cancel and __sigsetjmp_cancel.   We can either add them to GCC
> or add NOTRACK prefix to the corresponding longjmps.

I would rather GCC did not know about these implementation details.

I have no objection to the NOTRACK prefix in the corresponding longjmps.

What would be a downside to this choice?
H.J. Lu March 7, 2018, 7:47 p.m. UTC | #35
On Wed, Mar 7, 2018 at 9:34 AM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 03/07/2018 03:56 AM, H.J. Lu wrote:
>>>> 1. I have to add  __setjmp_cancel and __sigsetjmp_cancel which won't
>>>> save and restore shadow stack register.
>>
>> I have been testing this.  I ran into one issue.  GCC knows that setjmp will
>> return via longjmp and inserts ENDBR after it.  But it doesn't know
>>   __setjmp_cancel and __sigsetjmp_cancel.   We can either add them to GCC
>> or add NOTRACK prefix to the corresponding longjmps.
>
> I would rather GCC did not know about these implementation details.
>
> I have no objection to the NOTRACK prefix in the corresponding longjmps.
>
> What would be a downside to this choice?
>

NOTRACK prefix is typically generated by compiler for switch table.  Compiler
knows each indirect jump target is valid and pointer load for indirect jump is
generated by compiler in read-only section.  This is pretty safe since there is
very little chance for malicious codes to temper the pointer value.  However,
in case of longjmp, the indirect jump target is in jmpbuf.   There is
a possilibty
for malicious codes to change the indirect jump target such that longjmp wil
jump to the wrong place.  Use NOTRACK prefix here defeats the purpose of
indirect branch tracking in CET.
H.J. Lu March 7, 2018, 8:14 p.m. UTC | #36
On Wed, Mar 7, 2018 at 11:47 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Mar 7, 2018 at 9:34 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>> On 03/07/2018 03:56 AM, H.J. Lu wrote:
>>>>> 1. I have to add  __setjmp_cancel and __sigsetjmp_cancel which won't
>>>>> save and restore shadow stack register.
>>>
>>> I have been testing this.  I ran into one issue.  GCC knows that setjmp will
>>> return via longjmp and inserts ENDBR after it.  But it doesn't know
>>>   __setjmp_cancel and __sigsetjmp_cancel.   We can either add them to GCC
>>> or add NOTRACK prefix to the corresponding longjmps.
>>
>> I would rather GCC did not know about these implementation details.
>>
>> I have no objection to the NOTRACK prefix in the corresponding longjmps.
>>
>> What would be a downside to this choice?
>>
>
> NOTRACK prefix is typically generated by compiler for switch table.  Compiler
> knows each indirect jump target is valid and pointer load for indirect jump is
> generated by compiler in read-only section.  This is pretty safe since there is
> very little chance for malicious codes to temper the pointer value.  However,
> in case of longjmp, the indirect jump target is in jmpbuf.   There is
> a possilibty
> for malicious codes to change the indirect jump target such that longjmp wil
> jump to the wrong place.  Use NOTRACK prefix here defeats the purpose of
> indirect branch tracking in CET.
>

Also GCC needs to know that __setjmp_cancel and __sigsetjmp_cancel may
return twice, similar to setjmp.
H.J. Lu March 7, 2018, 10:06 p.m. UTC | #37
On Wed, Mar 7, 2018 at 12:14 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Mar 7, 2018 at 11:47 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Wed, Mar 7, 2018 at 9:34 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>>> On 03/07/2018 03:56 AM, H.J. Lu wrote:
>>>>>> 1. I have to add  __setjmp_cancel and __sigsetjmp_cancel which won't
>>>>>> save and restore shadow stack register.
>>>>
>>>> I have been testing this.  I ran into one issue.  GCC knows that setjmp will
>>>> return via longjmp and inserts ENDBR after it.  But it doesn't know
>>>>   __setjmp_cancel and __sigsetjmp_cancel.   We can either add them to GCC
>>>> or add NOTRACK prefix to the corresponding longjmps.
>>>
>>> I would rather GCC did not know about these implementation details.
>>>
>>> I have no objection to the NOTRACK prefix in the corresponding longjmps.
>>>
>>> What would be a downside to this choice?
>>>
>>
>> NOTRACK prefix is typically generated by compiler for switch table.  Compiler
>> knows each indirect jump target is valid and pointer load for indirect jump is
>> generated by compiler in read-only section.  This is pretty safe since there is
>> very little chance for malicious codes to temper the pointer value.  However,
>> in case of longjmp, the indirect jump target is in jmpbuf.   There is
>> a possilibty
>> for malicious codes to change the indirect jump target such that longjmp wil
>> jump to the wrong place.  Use NOTRACK prefix here defeats the purpose of
>> indirect branch tracking in CET.
>>
>
> Also GCC needs to know that __setjmp_cancel and __sigsetjmp_cancel may
> return twice, similar to setjmp.
>

Here is the GCC patch:


diff --git a/gcc/calls.c b/gcc/calls.c
index 19c95b8455b..d1f436dfa91 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -604,7 +604,7 @@ special_function_p (const_tree fndecl, int flags)
     name_decl = DECL_NAME (cgraph_node::get (fndecl)->orig_decl);

   if (fndecl && name_decl
-      && IDENTIFIER_LENGTH (name_decl) <= 11
+      && IDENTIFIER_LENGTH (name_decl) <= 18
       /* Exclude functions not at the file scope, or not `extern',
    since they are not the magic functions we would otherwise
    think they are.
@@ -637,8 +637,8 @@ special_function_p (const_tree fndecl, int flags)
   }

       /* ECF_RETURNS_TWICE is safe even for -ffreestanding.  */
-      if (! strcmp (tname, "setjmp")
-    || ! strcmp (tname, "sigsetjmp")
+      if (! strncmp (tname, "setjmp", 6)
+    || ! strncmp (tname, "sigsetjmp", 9)
     || ! strcmp (name, "savectx")
     || ! strcmp (name, "vfork")
     || ! strcmp (name, "getcontext"))
Tsimbalist, Igor V March 8, 2018, 12:24 p.m. UTC | #38
> -----Original Message-----

> From: H.J. Lu [mailto:hjl.tools@gmail.com]

> Sent: Wednesday, March 7, 2018 11:07 PM

> To: Carlos O'Donell <carlos@redhat.com>; Tsimbalist, Igor V

> <igor.v.tsimbalist@intel.com>

> Cc: Florian Weimer <fw@deneb.enyo.de>; GNU C Library <libc-

> alpha@sourceware.org>

> Subject: Re: [PATCH 0/2] nptl: Update struct pthread_unwind_buf

> 

> On Wed, Mar 7, 2018 at 12:14 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

> > On Wed, Mar 7, 2018 at 11:47 AM, H.J. Lu <hjl.tools@gmail.com> wrote:

> >> On Wed, Mar 7, 2018 at 9:34 AM, Carlos O'Donell <carlos@redhat.com>

> wrote:

> >>> On 03/07/2018 03:56 AM, H.J. Lu wrote:

> >>>>>> 1. I have to add  __setjmp_cancel and __sigsetjmp_cancel which

> won't

> >>>>>> save and restore shadow stack register.

> >>>>

> >>>> I have been testing this.  I ran into one issue.  GCC knows that setjmp

> will

> >>>> return via longjmp and inserts ENDBR after it.  But it doesn't know

> >>>>   __setjmp_cancel and __sigsetjmp_cancel.   We can either add them to

> GCC

> >>>> or add NOTRACK prefix to the corresponding longjmps.

> >>>

> >>> I would rather GCC did not know about these implementation details.

> >>>

> >>> I have no objection to the NOTRACK prefix in the corresponding

> longjmps.

> >>>

> >>> What would be a downside to this choice?

> >>>

> >>

> >> NOTRACK prefix is typically generated by compiler for switch table.

> Compiler

> >> knows each indirect jump target is valid and pointer load for indirect

> jump is

> >> generated by compiler in read-only section.  This is pretty safe since there

> is

> >> very little chance for malicious codes to temper the pointer value.

> However,

> >> in case of longjmp, the indirect jump target is in jmpbuf.   There is

> >> a possilibty

> >> for malicious codes to change the indirect jump target such that longjmp

> wil

> >> jump to the wrong place.  Use NOTRACK prefix here defeats the purpose

> of

> >> indirect branch tracking in CET.

> >>

> >

> > Also GCC needs to know that __setjmp_cancel and __sigsetjmp_cancel may

> > return twice, similar to setjmp.

> >

> 

> Here is the GCC patch:

> 

> 

> diff --git a/gcc/calls.c b/gcc/calls.c

> index 19c95b8455b..d1f436dfa91 100644

> --- a/gcc/calls.c

> +++ b/gcc/calls.c

> @@ -604,7 +604,7 @@ special_function_p (const_tree fndecl, int flags)

>      name_decl = DECL_NAME (cgraph_node::get (fndecl)->orig_decl);

> 

>    if (fndecl && name_decl

> -      && IDENTIFIER_LENGTH (name_decl) <= 11

> +      && IDENTIFIER_LENGTH (name_decl) <= 18

>        /* Exclude functions not at the file scope, or not `extern',

>     since they are not the magic functions we would otherwise

>     think they are.

> @@ -637,8 +637,8 @@ special_function_p (const_tree fndecl, int flags)

>    }

> 

>        /* ECF_RETURNS_TWICE is safe even for -ffreestanding.  */

> -      if (! strcmp (tname, "setjmp")

> -    || ! strcmp (tname, "sigsetjmp")

> +      if (! strncmp (tname, "setjmp", 6)

> +    || ! strncmp (tname, "sigsetjmp", 9)

>      || ! strcmp (name, "savectx")

>      || ! strcmp (name, "vfork")

>      || ! strcmp (name, "getcontext"))


Should it be compared with the whole function name (__setjmp_cancel and
__sigsetjmp_cancel) as something like setjmp_my_func will be also detected?

Also there was an error in detection of __getcontext, which is 12 bytes, but it
will be fixed by this patch.

Igor

> --

> H.J.
H.J. Lu March 8, 2018, 12:48 p.m. UTC | #39
On Thu, Mar 8, 2018 at 4:24 AM, Tsimbalist, Igor V
<igor.v.tsimbalist@intel.com> wrote:
>> -----Original Message-----
>> From: H.J. Lu [mailto:hjl.tools@gmail.com]
>> Sent: Wednesday, March 7, 2018 11:07 PM
>> To: Carlos O'Donell <carlos@redhat.com>; Tsimbalist, Igor V
>> <igor.v.tsimbalist@intel.com>
>> Cc: Florian Weimer <fw@deneb.enyo.de>; GNU C Library <libc-
>> alpha@sourceware.org>
>> Subject: Re: [PATCH 0/2] nptl: Update struct pthread_unwind_buf
>>
>> On Wed, Mar 7, 2018 at 12:14 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> > On Wed, Mar 7, 2018 at 11:47 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> >> On Wed, Mar 7, 2018 at 9:34 AM, Carlos O'Donell <carlos@redhat.com>
>> wrote:
>> >>> On 03/07/2018 03:56 AM, H.J. Lu wrote:
>> >>>>>> 1. I have to add  __setjmp_cancel and __sigsetjmp_cancel which
>> won't
>> >>>>>> save and restore shadow stack register.
>> >>>>
>> >>>> I have been testing this.  I ran into one issue.  GCC knows that setjmp
>> will
>> >>>> return via longjmp and inserts ENDBR after it.  But it doesn't know
>> >>>>   __setjmp_cancel and __sigsetjmp_cancel.   We can either add them to
>> GCC
>> >>>> or add NOTRACK prefix to the corresponding longjmps.
>> >>>
>> >>> I would rather GCC did not know about these implementation details.
>> >>>
>> >>> I have no objection to the NOTRACK prefix in the corresponding
>> longjmps.
>> >>>
>> >>> What would be a downside to this choice?
>> >>>
>> >>
>> >> NOTRACK prefix is typically generated by compiler for switch table.
>> Compiler
>> >> knows each indirect jump target is valid and pointer load for indirect
>> jump is
>> >> generated by compiler in read-only section.  This is pretty safe since there
>> is
>> >> very little chance for malicious codes to temper the pointer value.
>> However,
>> >> in case of longjmp, the indirect jump target is in jmpbuf.   There is
>> >> a possilibty
>> >> for malicious codes to change the indirect jump target such that longjmp
>> wil
>> >> jump to the wrong place.  Use NOTRACK prefix here defeats the purpose
>> of
>> >> indirect branch tracking in CET.
>> >>
>> >
>> > Also GCC needs to know that __setjmp_cancel and __sigsetjmp_cancel may
>> > return twice, similar to setjmp.
>> >
>>
>> Here is the GCC patch:
>>
>>
>> diff --git a/gcc/calls.c b/gcc/calls.c
>> index 19c95b8455b..d1f436dfa91 100644
>> --- a/gcc/calls.c
>> +++ b/gcc/calls.c
>> @@ -604,7 +604,7 @@ special_function_p (const_tree fndecl, int flags)
>>      name_decl = DECL_NAME (cgraph_node::get (fndecl)->orig_decl);
>>
>>    if (fndecl && name_decl
>> -      && IDENTIFIER_LENGTH (name_decl) <= 11
>> +      && IDENTIFIER_LENGTH (name_decl) <= 18
>>        /* Exclude functions not at the file scope, or not `extern',
>>     since they are not the magic functions we would otherwise
>>     think they are.
>> @@ -637,8 +637,8 @@ special_function_p (const_tree fndecl, int flags)
>>    }
>>
>>        /* ECF_RETURNS_TWICE is safe even for -ffreestanding.  */
>> -      if (! strcmp (tname, "setjmp")
>> -    || ! strcmp (tname, "sigsetjmp")
>> +      if (! strncmp (tname, "setjmp", 6)
>> +    || ! strncmp (tname, "sigsetjmp", 9)
>>      || ! strcmp (name, "savectx")
>>      || ! strcmp (name, "vfork")
>>      || ! strcmp (name, "getcontext"))
>
> Should it be compared with the whole function name (__setjmp_cancel and
> __sigsetjmp_cancel) as something like setjmp_my_func will be also detected?

True.  This is the patch I have tested:

https://github.com/hjl-tools/gcc/commit/e98087865405f051e93d5f35588789ef9686db4a

> Also there was an error in detection of __getcontext, which is 12 bytes, but it
> will be fixed by this patch.
Carlos O'Donell March 9, 2018, 12:47 a.m. UTC | #40
On 03/08/2018 04:48 AM, H.J. Lu wrote:
> True.  This is the patch I have tested:
> 
> https://github.com/hjl-tools/gcc/commit/e98087865405f051e93d5f35588789ef9686db4a

I assume NOTRACK prefix is what Intel calls 'NO_TRACK_EN' in the CET
documentation since binutils uses the 0x3E prefix for it and that matches
the Intel CET docs. Please correct me if I'm wrong.

In which case I agree, using NOTRACK is going to prevent a useful use
of CET against writes to the cancellation jump buffer.

This patch looks good to me, but is not a *correctness* issue, it is
simply that we want to extend coverage to the private cancellation
setjmp/longjmp buffers.

Is there any way we can do this with source markup instead of via
a fragile list in the compiler?

Presumably users may want to markup their own code like this also
if they have custom implementations of functions that behave like
setjmp/longjmp?
H.J. Lu March 9, 2018, 5:23 a.m. UTC | #41
On Thu, Mar 8, 2018 at 4:47 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 03/08/2018 04:48 AM, H.J. Lu wrote:
>> True.  This is the patch I have tested:
>>
>> https://github.com/hjl-tools/gcc/commit/e98087865405f051e93d5f35588789ef9686db4a
>
> I assume NOTRACK prefix is what Intel calls 'NO_TRACK_EN' in the CET
> documentation since binutils uses the 0x3E prefix for it and that matches
> the Intel CET docs. Please correct me if I'm wrong.

That is correct.

> In which case I agree, using NOTRACK is going to prevent a useful use
> of CET against writes to the cancellation jump buffer.

True.

> This patch looks good to me, but is not a *correctness* issue, it is
> simply that we want to extend coverage to the private cancellation
> setjmp/longjmp buffers.
>
> Is there any way we can do this with source markup instead of via
> a fragile list in the compiler?
>
> Presumably users may want to markup their own code like this also
> if they have custom implementations of functions that behave like
> setjmp/longjmp?
>

Yes, we can use __attribute__((__returns_twice__)).  I updated
hjl/setjmp/cancel branch to do that.  No GCC changes are needed.
Carlos O'Donell March 15, 2018, 4:20 a.m. UTC | #42
On 03/08/2018 10:23 PM, H.J. Lu wrote:
> On Thu, Mar 8, 2018 at 4:47 PM, Carlos O'Donell <carlos@redhat.com> wrote:
>> On 03/08/2018 04:48 AM, H.J. Lu wrote:
>>> True.  This is the patch I have tested:
>>>
>>> https://github.com/hjl-tools/gcc/commit/e98087865405f051e93d5f35588789ef9686db4a
>>
>> I assume NOTRACK prefix is what Intel calls 'NO_TRACK_EN' in the CET
>> documentation since binutils uses the 0x3E prefix for it and that matches
>> the Intel CET docs. Please correct me if I'm wrong.
> 
> That is correct.
> 
>> In which case I agree, using NOTRACK is going to prevent a useful use
>> of CET against writes to the cancellation jump buffer.
> 
> True.
> 
>> This patch looks good to me, but is not a *correctness* issue, it is
>> simply that we want to extend coverage to the private cancellation
>> setjmp/longjmp buffers.
>>
>> Is there any way we can do this with source markup instead of via
>> a fragile list in the compiler?
>>
>> Presumably users may want to markup their own code like this also
>> if they have custom implementations of functions that behave like
>> setjmp/longjmp?
>>
> 
> Yes, we can use __attribute__((__returns_twice__)).  I updated
> hjl/setjmp/cancel branch to do that.  No GCC changes are needed.

Is the next step for me to do another round of review on this branch?

Cheers,
Carlos.