[RFC,3/8] nptl: Start new threads with all signals blocked [BZ #25098]

Message ID 20200323131607.15185-4-mathieu.desnoyers@efficios.com
State New
Series Restartable Sequences enablement

Commit Message

Mathieu Desnoyers via Libc-alpha March 23, 2020, 1:16 p.m. UTC
From: Florian Weimer <fweimer@redhat.com>

New threads inherit the signal mask from the current thread.  This
means that signal handlers can run on the newly created thread
immediately after the kernel has created the userspace thread, even
before glibc has initialized the TCB.  Consequently, new threads can
observe uninitialized ctype data, among other things.

To address this, block all signals before starting the thread, and
pass the original signal mask to the start routine wrapper.  On the
new thread, first perform all thread initialization, and then unblock
signals.
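
The same pattern can be sketched in application code with the public
pthread_sigmask interface.  This is only an illustration under stated
assumptions, not the code in this patch: the names start_info,
start_wrapper, create_thread_signals_blocked and sigusr1_is_blocked are
hypothetical, and glibc itself uses internal helpers and stores the
mask in struct pthread instead of a heap allocation.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical startup record passed to the new thread.  */
struct start_info
{
  void *(*start) (void *);
  void *arg;
  sigset_t mask;		/* Mask to install before user code runs.  */
};

static void *
start_wrapper (void *p)
{
  struct start_info info = *(struct start_info *) p;
  free (p);

  /* Per-thread initialization would go here.  All signals are still
     blocked, so no handler can observe half-initialized state.  */

  /* One call on the new thread: install the creator's original mask.  */
  pthread_sigmask (SIG_SETMASK, &info.mask, NULL);
  return info.start (info.arg);
}

/* Hypothetical wrapper: create a thread that starts with all signals
   blocked and restores the creator's mask after initialization.  */
int
create_thread_signals_blocked (pthread_t *t, void *(*start) (void *),
			       void *arg)
{
  assert (start != NULL);

  struct start_info *info = malloc (sizeof (*info));
  if (info == NULL)
    return ENOMEM;
  info->start = start;
  info->arg = arg;

  /* First call on the old thread: block everything, save the old mask.  */
  sigset_t all, original;
  sigfillset (&all);
  int ret = pthread_sigmask (SIG_SETMASK, &all, &original);
  if (ret != 0)
    {
      free (info);
      return ret;
    }
  info->mask = original;

  ret = pthread_create (t, NULL, start_wrapper, info);

  /* Second call on the old thread: return to the previous mask.  Use
     the local copy, because the new thread may already have freed INFO.  */
  pthread_sigmask (SIG_SETMASK, &original, NULL);

  if (ret != 0)
    free (info);
  return ret;
}

/* Example start routine: reports whether SIGUSR1 is blocked.  */
void *
sigusr1_is_blocked (void *unused)
{
  (void) unused;
  sigset_t cur;
  pthread_sigmask (SIG_SETMASK, NULL, &cur);
  return (void *) (intptr_t) sigismember (&cur, SIGUSR1);
}
```

A caller that has SIGUSR1 blocked and passes sigusr1_is_blocked as the
start routine should see the new thread report SIGUSR1 as blocked,
since the wrapper restores the creator's mask before user code runs.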

The cost of doing this is two rt_sigprocmask system calls on the old
thread, and one rt_sigprocmask system call on the new thread.  (If
there was a way to clone a new thread with signals disabled, this
could be brought down to one system call each.)  The thread descriptor
increases in size, too, and sigset_t is fairly large.  This increase
could be brought down by reusing space in the descriptor which is
not needed before running user code, or by switching to an internal
sigset_t definition which only covers the signals supported by the
kernel definition.  (Part of the thread descriptor size increase is
already offset by reduced stack usage in the thread start wrapper
routine after this commit.)
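
To put a rough number on "fairly large", the two sizes can be compared
directly.  A minimal sketch, assuming a glibc-style <signal.h> that
exposes _NSIG; the function names are hypothetical.  The kernel-side
size is the same _NSIG / 8 value that is passed to rt_sigprocmask in
the code removed by this patch.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>

/* Size of the userspace sigset_t, as embedded in the descriptor.  */
size_t
libc_sigset_bytes (void)
{
  return sizeof (sigset_t);
}

/* Size of the signal mask as passed to the rt_sigprocmask system call.  */
size_t
kernel_sigset_bytes (void)
{
  return _NSIG / 8;
}
```

On typical glibc builds the first value is considerably larger than
the second, which is the gap an internal, kernel-sized sigset_t
definition would close.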

---
 nptl/descr.h          | 10 +++++++---
 nptl/pthread_create.c | 46 +++++++++++++++++++++++--------------------
 2 files changed, 32 insertions(+), 24 deletions(-)

Comments

Christian Brauner March 23, 2020, 3:31 p.m. UTC | #1
On Mon, Mar 23, 2020 at 09:16:02AM -0400, Mathieu Desnoyers via Libc-alpha wrote:
> From: Florian Weimer <fweimer@redhat.com>
> 
> New threads inherit the signal mask from the current thread.  This
> means that signal handlers can run on the newly created thread
> immediately after the kernel has created the userspace thread, even
> before glibc has initialized the TCB.  Consequently, new threads can
> observe uninitialized ctype data, among other things.
> 
> To address this, block all signals before starting the thread, and
> pass the original signal mask to the start routine wrapper.  On the
> new thread, first perform all thread initialization, and then unblock
> signals.
> 
> The cost of doing this is two rt_sigprocmask system calls on the old
> thread, and one rt_sigprocmask system call on the new thread.  (If
> there was a way to clone a new thread with a signals disabled, this

This could be a new clone3() flag. If someone wants to send a patch I'd
take it.

Christian
Mathieu Desnoyers via Libc-alpha March 23, 2020, 5:02 p.m. UTC | #2
----- On Mar 23, 2020, at 11:31 AM, Christian Brauner christian.brauner@ubuntu.com wrote:

> On Mon, Mar 23, 2020 at 09:16:02AM -0400, Mathieu Desnoyers via Libc-alpha
> wrote:
>> From: Florian Weimer <fweimer@redhat.com>
>> 
>> New threads inherit the signal mask from the current thread.  This
>> means that signal handlers can run on the newly created thread
>> immediately after the kernel has created the userspace thread, even
>> before glibc has initialized the TCB.  Consequently, new threads can
>> observe uninitialized ctype data, among other things.
>> 
>> To address this, block all signals before starting the thread, and
>> pass the original signal mask to the start routine wrapper.  On the
>> new thread, first perform all thread initialization, and then unblock
>> signals.
>> 
>> The cost of doing this is two rt_sigprocmask system calls on the old
>> thread, and one rt_sigprocmask system call on the new thread.  (If
>> there was a way to clone a new thread with a signals disabled, this
> 
> This could be a new clone3() flag. If someone wants to send a patch I'd
> take it.

I agree that it would be a nice improvement to alleviate the overhead of
tweaking the signal masks on thread creation. I suspect the code proposed in
this patch is still needed, because glibc would have to support the currently
existing kernels. The improvement you envision involves adding this new flag
into the Linux kernel clone3 system call and then wiring up glibc support.

I don't expect this should delay rseq integration into glibc ?

Thanks,

Mathieu
Christian Brauner March 23, 2020, 5:05 p.m. UTC | #3
On Mon, Mar 23, 2020 at 01:02:12PM -0400, Mathieu Desnoyers via Libc-alpha wrote:
> ----- On Mar 23, 2020, at 11:31 AM, Christian Brauner christian.brauner@ubuntu.com wrote:
> 
> > On Mon, Mar 23, 2020 at 09:16:02AM -0400, Mathieu Desnoyers via Libc-alpha
> > wrote:
> >> From: Florian Weimer <fweimer@redhat.com>
> >> 
> >> New threads inherit the signal mask from the current thread.  This
> >> means that signal handlers can run on the newly created thread
> >> immediately after the kernel has created the userspace thread, even
> >> before glibc has initialized the TCB.  Consequently, new threads can
> >> observe uninitialized ctype data, among other things.
> >> 
> >> To address this, block all signals before starting the thread, and
> >> pass the original signal mask to the start routine wrapper.  On the
> >> new thread, first perform all thread initialization, and then unblock
> >> signals.
> >> 
> >> The cost of doing this is two rt_sigprocmask system calls on the old
> >> thread, and one rt_sigprocmask system call on the new thread.  (If
> >> there was a way to clone a new thread with a signals disabled, this
> > 
> > This could be a new clone3() flag. If someone wants to send a patch I'd
> > take it.
> 
> I agree that it would be a nice improvement to alleviate the overhead of
> tweaking the signal masks on thread creation. I suspect the code proposed in
> this patch is still needed, because glibc would have to support the currently
> existing kernels. The improvement you envision involves adding this new flag
> into the Linux kernel clone3 system call and then wiring up glibc support.
> 
> I don't expect this should delay rseq integration into glibc ?

Oh no, I wasn't trying to say that rseq() in glibc should be blocked on
this, not at all. I was just saying we should probably do this since
it's a valuable improvement.

Christian
Carlos O'Donell April 22, 2020, 4:38 p.m. UTC | #4
On 3/23/20 1:05 PM, Christian Brauner wrote:
> On Mon, Mar 23, 2020 at 01:02:12PM -0400, Mathieu Desnoyers via Libc-alpha wrote:
>> ----- On Mar 23, 2020, at 11:31 AM, Christian Brauner christian.brauner@ubuntu.com wrote:
>>
>>> On Mon, Mar 23, 2020 at 09:16:02AM -0400, Mathieu Desnoyers via Libc-alpha
>>> wrote:
>>>> From: Florian Weimer <fweimer@redhat.com>
>>>>
>>>> New threads inherit the signal mask from the current thread.  This
>>>> means that signal handlers can run on the newly created thread
>>>> immediately after the kernel has created the userspace thread, even
>>>> before glibc has initialized the TCB.  Consequently, new threads can
>>>> observe uninitialized ctype data, among other things.
>>>>
>>>> To address this, block all signals before starting the thread, and
>>>> pass the original signal mask to the start routine wrapper.  On the
>>>> new thread, first perform all thread initialization, and then unblock
>>>> signals.
>>>>
>>>> The cost of doing this is two rt_sigprocmask system calls on the old
>>>> thread, and one rt_sigprocmask system call on the new thread.  (If
>>>> there was a way to clone a new thread with a signals disabled, this
>>>
>>> This could be a new clone3() flag. If someone wants to send a patch I'd
>>> take it.
>>
>> I agree that it would be a nice improvement to alleviate the overhead of
>> tweaking the signal masks on thread creation. I suspect the code proposed in
>> this patch is still needed, because glibc would have to support the currently
>> existing kernels. The improvement you envision involves adding this new flag
>> into the Linux kernel clone3 system call and then wiring up glibc support.
>>
>> I don't expect this should delay rseq integration into glibc ?
> 
> Oh no, I wasn't trying to say that rseq() in glibc should be blocked on
> this, not at all. I was just saying we should probably do this since
> it's a valuable improvement.

That's up to you to decide, and it depends on the workload. I'd be hesitant
to spend a flag bit on this. To clean up the glibc code would require you to
(a) implement this in the kernel
(b) have glibc move the minimum kernel baseline above the new kernel that has this
We're talking a minor cleanup that would take 10-15 years to arrive... if ever.
Moving the minimum kernel baseline has container workload impacts that are
going to become real problems soon.

Mathieu, Florian, and I talked about this very issue at GNU Tools
Cauldron 2019 (Montreal). We agreed that even if we could fix it immediately
we would still need the above implementation.

IMO you are doing something wrong if this shows up in your profiling though.
You should not be starting up and killing threads that fast. You should be
using thread pools etc. The kernel reaping of tasks is quite slow and even
in glibc we often hit EAGAIN limits on heavily loaded test boxes.

The above code is required for correctness so certain state is not observable,
but it need not be that fast. Someone correct me if I'm wrong though and I've
missed a use case.

Patch

diff --git a/nptl/descr.h b/nptl/descr.h
index 9dcf480bdf..e1c7db5473 100644
--- a/nptl/descr.h
+++ b/nptl/descr.h
@@ -332,9 +332,8 @@  struct pthread
   /* True if thread must stop at startup time.  */
   bool stopped_start;
 
-  /* The parent's cancel handling at the time of the pthread_create
-     call.  This might be needed to undo the effects of a cancellation.  */
-  int parent_cancelhandling;
+  /* Formerly used for dealing with cancellation.  */
+  int parent_cancelhandling_unsed;
 
   /* Lock to synchronize access to the descriptor.  */
   int lock;
@@ -391,6 +390,11 @@  struct pthread
   /* Resolver state.  */
   struct __res_state res;
 
+  /* Signal mask for the new thread.  Used during thread startup to
+     restore the signal mask.  (Threads are launched with all signals
+     masked.)  */
+  sigset_t sigmask;
+
   /* Indicates whether is a C11 thread created by thrd_creat.  */
   bool c11;
 
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index 7c752d0f99..afd379e89a 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -369,7 +369,6 @@  __free_tcb (struct pthread *pd)
     }
 }
 
-
 /* Local function to start thread and handle cleanup.
    createthread.c defines the macro START_THREAD_DEFN to the
    declaration that its create_thread function will refer to, and
@@ -385,10 +384,6 @@  START_THREAD_DEFN
   /* Initialize pointers to locale data.  */
   __ctype_init ();
 
-  /* Allow setxid from now onwards.  */
-  if (__glibc_unlikely (atomic_exchange_acq (&pd->setxid_futex, 0) == -2))
-    futex_wake (&pd->setxid_futex, 1, FUTEX_PRIVATE);
-
 #ifndef __ASSUME_SET_ROBUST_LIST
   if (__set_robust_list_avail >= 0)
 #endif
@@ -399,18 +394,6 @@  START_THREAD_DEFN
 			     sizeof (struct robust_list_head));
     }
 
-  /* If the parent was running cancellation handlers while creating
-     the thread the new thread inherited the signal mask.  Reset the
-     cancellation signal mask.  */
-  if (__glibc_unlikely (pd->parent_cancelhandling & CANCELING_BITMASK))
-    {
-      sigset_t mask;
-      __sigemptyset (&mask);
-      __sigaddset (&mask, SIGCANCEL);
-      INTERNAL_SYSCALL_CALL (rt_sigprocmask, SIG_UNBLOCK, &mask,
-			     NULL, _NSIG / 8);
-    }
-
   /* This is where the try/finally block should be created.  For
      compilers without that support we do use setjmp.  */
   struct pthread_unwind_buf unwind_buf;
@@ -432,6 +415,12 @@  START_THREAD_DEFN
   unwind_buf.priv.data.prev = NULL;
   unwind_buf.priv.data.cleanup = NULL;
 
+  __libc_signal_restore_set (&pd->sigmask);
+
+  /* Allow setxid from now onwards.  */
+  if (__glibc_unlikely (atomic_exchange_acq (&pd->setxid_futex, 0) == -2))
+    futex_wake (&pd->setxid_futex, 1, FUTEX_PRIVATE);
+
   if (__glibc_likely (! not_first_call))
     {
       /* Store the new cleanup handler info.  */
@@ -722,10 +711,6 @@  __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
   CHECK_THREAD_SYSINFO (pd);
 #endif
 
-  /* Inform start_thread (above) about cancellation state that might
-     translate into inherited signal state.  */
-  pd->parent_cancelhandling = THREAD_GETMEM (THREAD_SELF, cancelhandling);
-
   /* Determine scheduling parameters for the thread.  */
   if (__builtin_expect ((iattr->flags & ATTR_FLAG_NOTINHERITSCHED) != 0, 0)
       && (iattr->flags & (ATTR_FLAG_SCHED_SET | ATTR_FLAG_POLICY_SET)) != 0)
@@ -771,6 +756,21 @@  __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
      ownership of PD (see CONCURRENCY NOTES above).  */
   bool stopped_start = false; bool thread_ran = false;
 
+  /* Block all signals, so that the new thread starts out with
+     signals disabled.  This avoids race conditions in the thread
+     startup.  */
+  sigset_t original_sigmask;
+  __libc_signal_block_all (&original_sigmask);
+
+  /* Conceptually, the new thread needs to inherit the signal mask of
+     this thread.  Therefore, it needs to restore the saved signal
+     mask of this thread, so save it in the startup information.  */
+  pd->sigmask = original_sigmask;
+
+  /* Reset the cancellation signal mask in case this thread is running
+     cancellation.  */
+  __sigdelset (&pd->sigmask, SIGCANCEL);
+
   /* Start the thread.  */
   if (__glibc_unlikely (report_thread_creation (pd)))
     {
@@ -813,6 +813,10 @@  __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
     retval = create_thread (pd, iattr, &stopped_start,
 			    STACK_VARIABLES_ARGS, &thread_ran);
 
+  /* Return to the previous signal mask, after creating the new
+     thread.  */
+  __libc_signal_restore_set (&original_sigmask);
+
   if (__glibc_unlikely (retval != 0))
     {
       if (thread_ran)