diff mbox series

[v3] nptl: Deallocate the thread stack on setup failure (BZ #19511)

Message ID 20210602125644.3725112-1-adhemerval.zanella@linaro.org
State New
Headers show
Series [v3] nptl: Deallocate the thread stack on setup failure (BZ #19511) | expand

Commit Message

Adhemerval Zanella June 2, 2021, 12:56 p.m. UTC
To setup either the thread scheduling parameters or affinity,
pthread_create enforce synchronization on created thread to wait until
its parent either release PD ownership or send a cancellation signal if
a failure occurs.

However, cancelling the thread does not deallocate the newly created
stack since cancellation expects that a pthread_join to deallocate any
allocated thread resouces (threads stack or TLS).

This patch changes on how the thread resource is deallocate in case of
failure to be synchrnous, where the creating thread will signal the
created thread to early exit so it could be joined.  The creating thread
will be reponsible for the resource cleanup before return to caller.

To signal the creating thread a failure has occured, an unused
'struct pthread' member, parent_cancelhandling_unsed, now indicates
whether the setup has failed so creating thread can proper exit.

This strategy also simplifies by not using thread cancellation and
thus not running libgcc_so load in the signal handler (which is
avoided in thread cancellation since 'pthread_cancel' is the one
responsible to dlopen libgcc_s).  Another advantage is since the
early exit is move to first step at thread creation, the signal
mask is not already set and thus it can not act on change ID setxid
handler.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.
---
 nptl/allocatestack.c  |   1 +
 nptl/descr.h          |   5 +-
 nptl/pthread_create.c | 140 ++++++++++++++++++++----------------------
 3 files changed, 71 insertions(+), 75 deletions(-)

Comments

Andreas Schwab June 2, 2021, 1:08 p.m. UTC | #1
On Jun 02 2021, Adhemerval Zanella via Libc-alpha wrote:

> diff --git a/nptl/descr.h b/nptl/descr.h
> index 3de9535449..9d8297b45f 100644
> --- a/nptl/descr.h
> +++ b/nptl/descr.h
> @@ -340,8 +340,9 @@ struct pthread
>    /* True if thread must stop at startup time.  */
>    bool stopped_start;
>  
> -  /* Formerly used for dealing with cancellation.  */
> -  int parent_cancelhandling_unsed;
> +  /* Indicate that a thread creation setup has failed (for instance the
> +     scheduler or affinity).  */
> +  int setup_failed;

Perhaps this should be turned into a bitfield?

Andreas.
Adhemerval Zanella June 2, 2021, 1:39 p.m. UTC | #2
On 02/06/2021 10:08, Andreas Schwab wrote:
> On Jun 02 2021, Adhemerval Zanella via Libc-alpha wrote:
> 
>> diff --git a/nptl/descr.h b/nptl/descr.h
>> index 3de9535449..9d8297b45f 100644
>> --- a/nptl/descr.h
>> +++ b/nptl/descr.h
>> @@ -340,8 +340,9 @@ struct pthread
>>    /* True if thread must stop at startup time.  */
>>    bool stopped_start;
>>  
>> -  /* Formerly used for dealing with cancellation.  */
>> -  int parent_cancelhandling_unsed;
>> +  /* Indicate that a thread creation setup has failed (for instance the
>> +     scheduler or affinity).  */
>> +  int setup_failed;
> 
> Perhaps this should be turned into a bitfield?

I do not want to change the 'struct parent' layout or size with this
patch.  I discussed briefly with Florian on another thread and maybe
we can cleanup the struct pthread, there are a couple of unused fields
and others that can use a better type.
Andreas Schwab June 2, 2021, 1:41 p.m. UTC | #3
On Jun 02 2021, Adhemerval Zanella wrote:

> On 02/06/2021 10:08, Andreas Schwab wrote:
>> On Jun 02 2021, Adhemerval Zanella via Libc-alpha wrote:
>> 
>>> diff --git a/nptl/descr.h b/nptl/descr.h
>>> index 3de9535449..9d8297b45f 100644
>>> --- a/nptl/descr.h
>>> +++ b/nptl/descr.h
>>> @@ -340,8 +340,9 @@ struct pthread
>>>    /* True if thread must stop at startup time.  */
>>>    bool stopped_start;
>>>  
>>> -  /* Formerly used for dealing with cancellation.  */
>>> -  int parent_cancelhandling_unsed;
>>> +  /* Indicate that a thread creation setup has failed (for instance the
>>> +     scheduler or affinity).  */
>>> +  int setup_failed;
>> 
>> Perhaps this should be turned into a bitfield?
>
> I do not want to change the 'struct parent' layout or size with this
> patch.

I didn't say you should change the layout.

Andreas.
Adhemerval Zanella June 2, 2021, 2:01 p.m. UTC | #4
On 02/06/2021 10:41, Andreas Schwab wrote:
> On Jun 02 2021, Adhemerval Zanella wrote:
> 
>> On 02/06/2021 10:08, Andreas Schwab wrote:
>>> On Jun 02 2021, Adhemerval Zanella via Libc-alpha wrote:
>>>
>>>> diff --git a/nptl/descr.h b/nptl/descr.h
>>>> index 3de9535449..9d8297b45f 100644
>>>> --- a/nptl/descr.h
>>>> +++ b/nptl/descr.h
>>>> @@ -340,8 +340,9 @@ struct pthread
>>>>    /* True if thread must stop at startup time.  */
>>>>    bool stopped_start;
>>>>  
>>>> -  /* Formerly used for dealing with cancellation.  */
>>>> -  int parent_cancelhandling_unsed;
>>>> +  /* Indicate that a thread creation setup has failed (for instance the
>>>> +     scheduler or affinity).  */
>>>> +  int setup_failed;
>>>
>>> Perhaps this should be turned into a bitfield?
>>
>> I do not want to change the 'struct parent' layout or size with this
>> patch.
> 
> I didn't say you should change the layout.

I don't have a strong preference here, maybe use a 'bool' then?
Andreas Schwab June 2, 2021, 2:07 p.m. UTC | #5
On Jun 02 2021, Adhemerval Zanella wrote:

> I don't have a strong preference here, maybe use a 'bool' then?

bool has architecture dependent size.

Andreas.
Adhemerval Zanella June 2, 2021, 2:15 p.m. UTC | #6
On 02/06/2021 11:07, Andreas Schwab wrote:
> On Jun 02 2021, Adhemerval Zanella wrote:
> 
>> I don't have a strong preference here, maybe use a 'bool' then?
> 
> bool has architecture dependent size.

Fair enough, I will change to:

  int setup_failed:1;
  int ununsed_0:31;
Andreas Schwab June 2, 2021, 2:30 p.m. UTC | #7
On Jun 02 2021, Adhemerval Zanella wrote:

> On 02/06/2021 11:07, Andreas Schwab wrote:
>> On Jun 02 2021, Adhemerval Zanella wrote:
>> 
>>> I don't have a strong preference here, maybe use a 'bool' then?
>> 
>> bool has architecture dependent size.
>
> Fair enough, I will change to:
>
>   int setup_failed:1;

That should be unsigned int.

Andreas.
Adhemerval Zanella June 2, 2021, 2:42 p.m. UTC | #8
On 02/06/2021 11:30, Andreas Schwab wrote:
> On Jun 02 2021, Adhemerval Zanella wrote:
> 
>> On 02/06/2021 11:07, Andreas Schwab wrote:
>>> On Jun 02 2021, Adhemerval Zanella wrote:
>>>
>>>> I don't have a strong preference here, maybe use a 'bool' then?
>>>
>>> bool has architecture dependent size.
>>
>> Fair enough, I will change to:
>>
>>   int setup_failed:1;
> 
> That should be unsigned int.
> 
> Andreas.
> 

Ack.
Joseph Myers June 2, 2021, 6:20 p.m. UTC | #9
On Wed, 2 Jun 2021, Andreas Schwab wrote:

> On Jun 02 2021, Adhemerval Zanella wrote:
> 
> > I don't have a strong preference here, maybe use a 'bool' then?
> 
> bool has architecture dependent size.

bool is one byte for all configurations supported by glibc (the only 
configuration supported by GCC for which it isn't is powerpc-darwin).  
Although I don't see any likely reason for glibc code to depend on it 
being one byte.

(GCC 2.95 had a different, four-byte bool in <stdbool.h>; if any future 
glibc interfaces involve bool in ABIs, I expect we should simply not 
support use of those interfaces with GCC 2.95.)
Florian Weimer June 2, 2021, 6:30 p.m. UTC | #10
* Joseph Myers:

> On Wed, 2 Jun 2021, Andreas Schwab wrote:
>
>> On Jun 02 2021, Adhemerval Zanella wrote:
>> 
>> > I don't have a strong preference here, maybe use a 'bool' then?
>> 
>> bool has architecture dependent size.
>
> bool is one byte for all configurations supported by glibc (the only 
> configuration supported by GCC for which it isn't is powerpc-darwin).  
> Although I don't see any likely reason for glibc code to depend on it 
> being one byte.

However, one reason not to use bool in internal code is that our own
atomics do not support one-byte types on all architectures.

For example this definition in sysdeps/powerpc/atomic-machine.h:

#define atomic_exchange_acq(mem, value) \
  ({                                                                          \
    __typeof (*(mem)) __result;                                               \
    if (sizeof (*mem) == 4)                                                   \
      __result = __arch_atomic_exchange_32_acq (mem, value);                  \
    else if (sizeof (*mem) == 8)                                              \
      __result = __arch_atomic_exchange_64_acq (mem, value);                  \
    else                                                                      \
       abort ();                                                              \
    __result;                                                                 \
  })

Thanks,
Florian
Adhemerval Zanella June 2, 2021, 7:02 p.m. UTC | #11
On 02/06/2021 15:30, Florian Weimer via Libc-alpha wrote:
> * Joseph Myers:
> 
>> On Wed, 2 Jun 2021, Andreas Schwab wrote:
>>
>>> On Jun 02 2021, Adhemerval Zanella wrote:
>>>
>>>> I don't have a strong preference here, maybe use a 'bool' then?
>>>
>>> bool has architecture dependent size.
>>
>> bool is one byte for all configurations supported by glibc (the only 
>> configuration supported by GCC for which it isn't is powerpc-darwin).  
>> Although I don't see any likely reason for glibc code to depend on it 
>> being one byte.
> 
> However, one reason not to use bool in internal code is that our own
> atomics do not support one-byte types on all architectures.
> 
> For example this definition in sysdeps/powerpc/atomic-machine.h:
> 
> #define atomic_exchange_acq(mem, value) \
>   ({                                                                          \
>     __typeof (*(mem)) __result;                                               \
>     if (sizeof (*mem) == 4)                                                   \
>       __result = __arch_atomic_exchange_32_acq (mem, value);                  \
>     else if (sizeof (*mem) == 8)                                              \
>       __result = __arch_atomic_exchange_64_acq (mem, value);                  \
>     else                                                                      \
>        abort ();                                                              \
>     __result;                                                                 \
>   })

I think it should be safe to use on code that won't be used for atomic
operations.  The this specific flag, for instance, is protected by
the PD lock.
Florian Weimer June 2, 2021, 7:11 p.m. UTC | #12
* Adhemerval Zanella via Libc-alpha:

> I think it should be safe to use on code that won't be used for atomic
> operations.  The this specific flag, for instance, is protected by
> the PD lock.

Right, the atomics limitation does not seem to apply in this case.

Thanks,
Florian
Florian Weimer June 8, 2021, 10:56 a.m. UTC | #13
* Adhemerval Zanella via Libc-alpha:

> To setup either the thread scheduling parameters or affinity,
> pthread_create enforce synchronization on created thread to wait until
> its parent either release PD ownership or send a cancellation signal if
> a failure occurs.
>
> However, cancelling the thread does not deallocate the newly created
> stack since cancellation expects that a pthread_join to deallocate any
> allocated thread resouces (threads stack or TLS).
>
> This patch changes on how the thread resource is deallocate in case of
> failure to be synchrnous, where the creating thread will signal the

“synchronous”

> created thread to early exit so it could be joined.  The creating thread

“to exit early”

> will be reponsible for the resource cleanup before return to caller.

“returning to the caller”

> To signal the creating thread a failure has occured, an unused

“that a failure has occured”

> 'struct pthread' member, parent_cancelhandling_unsed, now indicates

(okay, existing typo)

> whether the setup has failed so creating thread can proper exit.
>
> This strategy also simplifies by not using thread cancellation and
> thus not running libgcc_so load in the signal handler (which is
> avoided in thread cancellation since 'pthread_cancel' is the one
> responsible to dlopen libgcc_s).  Another advantage is since the
> early exit is move to first step at thread creation, the signal
> mask is not already set and thus it can not act on change ID setxid
> handler.
>
> Checked on x86_64-linux-gnu and aarch64-linux-gnu.
> ---
>  nptl/allocatestack.c  |   1 +
>  nptl/descr.h          |   5 +-
>  nptl/pthread_create.c | 140 ++++++++++++++++++++----------------------
>  3 files changed, 71 insertions(+), 75 deletions(-)
>
> diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
> index dc81a2ca73..2114bd2e27 100644
> --- a/nptl/allocatestack.c
> +++ b/nptl/allocatestack.c
> @@ -161,6 +161,7 @@ get_cached_stack (size_t *sizep, void **memp)
>    /* Cancellation handling is back to the default.  */
>    result->cancelhandling = 0;
>    result->cleanup = NULL;
> +  result->setup_failed = 0;
>  
>    /* No pending event.  */
>    result->nextevent = NULL;
> diff --git a/nptl/descr.h b/nptl/descr.h
> index 3de9535449..9d8297b45f 100644
> --- a/nptl/descr.h
> +++ b/nptl/descr.h
> @@ -340,8 +340,9 @@ struct pthread
>    /* True if thread must stop at startup time.  */
>    bool stopped_start;
>  
> -  /* Formerly used for dealing with cancellation.  */
> -  int parent_cancelhandling_unsed;
> +  /* Indicate that a thread creation setup has failed (for instance the
> +     scheduler or affinity).  */
> +  int setup_failed;
>  
>    /* Lock to synchronize access to the descriptor.  */
>    int lock;
> diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
> index 52d6738f7f..4b143a5016 100644
> --- a/nptl/pthread_create.c
> +++ b/nptl/pthread_create.c
> @@ -167,19 +167,19 @@ late_init (void)
>         without error, then the created thread owns PD; otherwise, see
>         (c) and (e) below.
>  
> -   (c) If the detached thread setup failed and THREAD_RAN is true, then
> -       the creating thread releases ownership to the new thread by
> -       sending a cancellation signal.  All threads set THREAD_RAN to
> -       true as quickly as possible after returning from the OS kernel's
> -       thread creation routine.
> +   (c) If the detached thread setup failed and THREAD_RAN is true, then the
> +       creating thread releases ownership to the new thread, the created
> +       thread see the failed setup through a PD field and releases the
> +       PD ownership and early exit.  The creating thread will be one
> +       responsible for cleanup state.  All threads set THREAD_RAN to true as
> +       quickly as possible after returning from the OS kernel's thread
> +       creation routine.

“the created thread see[s]”, “and [exits early]”,
“will be [] responsible”.

I don't think (c) is no longer described correctly.  THREAD_RAN is set
even before user code runs, and ownership is not released in this case.
(I think this shows the limits of describing what happens in terms of
ownership of PD.)

Perhaps add a note that the distinction between (c) and (d) only matters
briefly within create_thread?

>     (d) If the joinable thread setup failed and THREAD_RAN is true, then
> -       then the creating thread retains ownership of PD and must cleanup
> +       the creating thread retains ownership of PD and must cleanup
>         state.  Ownership cannot be released to the process via the
>         return of pthread_create since a non-zero result entails PD is
>         undefined and therefore cannot be joined to free the resources.
> -       We privately call pthread_join on the thread to finish handling
> -       the resource shutdown (Or at least we should, see bug 19511).
>  
>     (e) If the thread creation failed and THREAD_RAN is false, then the
>         creating thread retains ownership of PD and must cleanup state.


And I'm not sure if THREAD_RAN matters for the distinction between (d)
and (e), either.

This suggests to me that we should keep THREAD_RAN local to
create_thread, and handle all setup errors there.

> @@ -239,8 +239,8 @@ late_init (void)
>     The return value is zero for success or an errno code for failure.
>     If the return value is ENOMEM, that will be translated to EAGAIN,
>     so create_thread need not do that.  On failure, *THREAD_RAN should
> -   be set to true iff the thread actually started up and then got
> -   canceled before calling user code (*PD->start_routine).  */
> +   be set to true iff the thread actually started up but before calling
> +   the user code (*PD->start_routine).  */
>  
>  static int _Noreturn start_thread (void *arg);
>  
> @@ -308,35 +308,23 @@ static int create_thread (struct pthread *pd, const struct pthread_attr *attr,
>  			== -1))
>      return errno;
>  
> -  /* It's started now, so if we fail below, we'll have to cancel it
> -     and let it clean itself up.  */
> +  /* It's started now, so if we fail below, we'll have to let it clean itself
> +     up.  */
>    *thread_ran = true;
>  
>    /* Now we have the possibility to set scheduling parameters etc.  */
>    if (attr != NULL)
>      {
> -      int res;
> -
>        /* Set the affinity mask if necessary.  */
>        if (need_setaffinity)
>  	{
>  	  assert (*stopped_start);
>  
> -	  res = INTERNAL_SYSCALL_CALL (sched_setaffinity, pd->tid,
> -				       attr->extension->cpusetsize,
> -				       attr->extension->cpuset);
> -
> +	  int res = INTERNAL_SYSCALL_CALL (sched_setaffinity, pd->tid,
> +					   attr->extension->cpusetsize,
> +					   attr->extension->cpuset);
>  	  if (__glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (res)))
> -	  err_out:
> -	    {
> -	      /* The operation failed.  We have to kill the thread.
> -		 We let the normal cancellation mechanism do the work.  */
> -
> -	      pid_t pid = __getpid ();
> -	      INTERNAL_SYSCALL_CALL (tgkill, pid, pd->tid, SIGCANCEL);
> -
> -	      return INTERNAL_SYSCALL_ERRNO (res);
> -	    }
> +	    return INTERNAL_SYSCALL_ERRNO (res);
>  	}
>  
>        /* Set the scheduling parameters.  */
> @@ -344,11 +332,10 @@ static int create_thread (struct pthread *pd, const struct pthread_attr *attr,
>  	{
>  	  assert (*stopped_start);
>  
> -	  res = INTERNAL_SYSCALL_CALL (sched_setscheduler, pd->tid,
> -				       pd->schedpolicy, &pd->schedparam);
> -
> +	  int res = INTERNAL_SYSCALL_CALL (sched_setscheduler, pd->tid,
> +					   pd->schedpolicy, &pd->schedparam);
>  	  if (__glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (res)))
> -	    goto err_out;
> +	    return INTERNAL_SYSCALL_ERRNO (res);
>  	}
>      }
>  
> @@ -361,6 +348,29 @@ start_thread (void *arg)
>  {
>    struct pthread *pd = arg;
>  
> +  /* We are either in (a) or (b), and in either case we either own PD already
> +     (2) or are about to own PD (1), and so our only restriction would be that
> +     we can't free PD until we know we have ownership (see CONCURRENCY NOTES
> +     above).  */
> +  bool setup_failed = false;
> +  if (__glibc_unlikely (pd->stopped_start))

I think we should remove __glibc_unlikely for cases which can always be
true due to particular application use cases.

> +    {
> +      /* Get the lock the parent locked to force synchronization.  */
> +      lll_lock (pd->lock, LLL_PRIVATE);
> +
> +      /* We have ownership of PD now, for detached threads with setup failure
> +	 we set it as joinable so the creating thread could synchronous join
> +         and free any resource prior return to the pthread_create caller.  */
> +      setup_failed = pd->setup_failed == 1;
> +      if (setup_failed)
> +	pd->joinid = NULL;
> +
> +      /* And give it up right away.  */
> +      lll_unlock (pd->lock, LLL_PRIVATE);
> +    }
> +  if (setup_failed)
> +    goto out;

I think it would be clearer if that goto were nested within the previous
if block.

> @@ -814,30 +805,33 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>    if (__glibc_unlikely (retval != 0))
>      {
>        if (thread_ran)
> -	/* State (c) or (d) and we may not have PD ownership (see
> -	   CONCURRENCY NOTES above).  We can assert that STOPPED_START
> -	   must have been true because thread creation didn't fail, but
> -	   thread attribute setting did.  */
> -	/* See bug 19511 which explains why doing nothing here is a
> -	   resource leak for a joinable thread.  */
> -	assert (stopped_start);
> -      else
> -	{
> -	  /* State (e) and we have ownership of PD (see CONCURRENCY
> -	     NOTES above).  */
> +	/* State (c) and we not have PD ownership (see CONCURRENCY NOTES
> +	   above).  We can assert that STOPPED_START must have been true
> +	   because thread creation didn't fail, but thread attribute setting
> +	   did.  */
> +        {
> +	  assert (stopped_start);
> +	  /* Signal the creating thread to release PD ownership and early
> +	     exit so it could be joined.  */
> +	  pd->setup_failed = 1;
> +	  lll_unlock (pd->lock, LLL_PRIVATE);

“Signal the creat[ed] thread” (this code runs on the creating thread).

> +	  /* Similar to pthread_join, but since thread creation has failed at
> +	     startup there is no need to handle all the steps.  */
> +	  pid_t tid;
> +	  while ((tid = atomic_load_acquire (&pd->tid)) != 0)
> +	    __futex_abstimed_wait_cancelable64 ((unsigned int *) &pd->tid,
> +						tid, 0, NULL, LLL_SHARED);
> +        }
>  
> +      /* State (c), (d), or (e) and we have ownership of PD (see CONCURRENCY
> +	 NOTES above).  */
>  
> +      /* Oops, we lied for a second.  */
> +      atomic_decrement (&__nptl_nthreads);
> +
> +      /* Free the resources.  */
> +      __nptl_deallocate_stack (pd);
>  
>        /* We have to translate error codes.  */
>        if (retval == ENOMEM)

What's missing further below is pd->setup_failed handling if there is an
early failure on the created thread (I have my sigaltstack changes in
this mind).  No such code exits in your patch, so this is consistent in
this regard.

The actual code changes are fine, I think.

Thanks,
Florian
Adhemerval Zanella June 8, 2021, 5:01 p.m. UTC | #14
On 08/06/2021 07:56, Florian Weimer wrote:
> * Adhemerval Zanella via Libc-alpha:
> 
>> To setup either the thread scheduling parameters or affinity,
>> pthread_create enforce synchronization on created thread to wait until
>> its parent either release PD ownership or send a cancellation signal if
>> a failure occurs.
>>
>> However, cancelling the thread does not deallocate the newly created
>> stack since cancellation expects that a pthread_join to deallocate any
>> allocated thread resouces (threads stack or TLS).
>>
>> This patch changes on how the thread resource is deallocate in case of
>> failure to be synchrnous, where the creating thread will signal the
> 
> “synchronous”


Ack.

> 
>> created thread to early exit so it could be joined.  The creating thread
> 
> “to exit early”

Ack.

> 
>> will be reponsible for the resource cleanup before return to caller.
> 
> “returning to the caller”

Ack.

> 
>> To signal the creating thread a failure has occured, an unused
> 
> “that a failure has occured”

Ack.

> 
>> 'struct pthread' member, parent_cancelhandling_unsed, now indicates
> 
> (okay, existing typo)
> 
>> whether the setup has failed so creating thread can proper exit.
>>
>> This strategy also simplifies by not using thread cancellation and
>> thus not running libgcc_so load in the signal handler (which is
>> avoided in thread cancellation since 'pthread_cancel' is the one
>> responsible to dlopen libgcc_s).  Another advantage is since the
>> early exit is move to first step at thread creation, the signal
>> mask is not already set and thus it can not act on change ID setxid
>> handler.
>>
>> Checked on x86_64-linux-gnu and aarch64-linux-gnu.
>> ---
>>  nptl/allocatestack.c  |   1 +
>>  nptl/descr.h          |   5 +-
>>  nptl/pthread_create.c | 140 ++++++++++++++++++++----------------------
>>  3 files changed, 71 insertions(+), 75 deletions(-)
>>
>> diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
>> index dc81a2ca73..2114bd2e27 100644
>> --- a/nptl/allocatestack.c
>> +++ b/nptl/allocatestack.c
>> @@ -161,6 +161,7 @@ get_cached_stack (size_t *sizep, void **memp)
>>    /* Cancellation handling is back to the default.  */
>>    result->cancelhandling = 0;
>>    result->cleanup = NULL;
>> +  result->setup_failed = 0;
>>  
>>    /* No pending event.  */
>>    result->nextevent = NULL;
>> diff --git a/nptl/descr.h b/nptl/descr.h
>> index 3de9535449..9d8297b45f 100644
>> --- a/nptl/descr.h
>> +++ b/nptl/descr.h
>> @@ -340,8 +340,9 @@ struct pthread
>>    /* True if thread must stop at startup time.  */
>>    bool stopped_start;
>>  
>> -  /* Formerly used for dealing with cancellation.  */
>> -  int parent_cancelhandling_unsed;
>> +  /* Indicate that a thread creation setup has failed (for instance the
>> +     scheduler or affinity).  */
>> +  int setup_failed;
>>  
>>    /* Lock to synchronize access to the descriptor.  */
>>    int lock;
>> diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
>> index 52d6738f7f..4b143a5016 100644
>> --- a/nptl/pthread_create.c
>> +++ b/nptl/pthread_create.c
>> @@ -167,19 +167,19 @@ late_init (void)
>>         without error, then the created thread owns PD; otherwise, see
>>         (c) and (e) below.
>>  
>> -   (c) If the detached thread setup failed and THREAD_RAN is true, then
>> -       the creating thread releases ownership to the new thread by
>> -       sending a cancellation signal.  All threads set THREAD_RAN to
>> -       true as quickly as possible after returning from the OS kernel's
>> -       thread creation routine.
>> +   (c) If the detached thread setup failed and THREAD_RAN is true, then the
>> +       creating thread releases ownership to the new thread, the created
>> +       thread see the failed setup through a PD field and releases the
>> +       PD ownership and early exit.  The creating thread will be one
>> +       responsible for cleanup state.  All threads set THREAD_RAN to true as
>> +       quickly as possible after returning from the OS kernel's thread
>> +       creation routine.
> 
> “the created thread see[s]”, “and [exits early]”,
> “will be [] responsible”.

Ack.

> 
> I don't think (c) is no longer described correctly.  THREAD_RAN is set
> even before user code runs, and ownership is not released in this case.
> (I think this shows the limits of describing what happens in terms of
> ownership of PD.)
> 
> Perhaps add a note that the distinction between (c) and (d) only matters
> briefly within create_thread?
> 
>>     (d) If the joinable thread setup failed and THREAD_RAN is true, then
>> -       then the creating thread retains ownership of PD and must cleanup
>> +       the creating thread retains ownership of PD and must cleanup
>>         state.  Ownership cannot be released to the process via the
>>         return of pthread_create since a non-zero result entails PD is
>>         undefined and therefore cannot be joined to free the resources.
>> -       We privately call pthread_join on the thread to finish handling
>> -       the resource shutdown (Or at least we should, see bug 19511).
>>  
>>     (e) If the thread creation failed and THREAD_RAN is false, then the
>>         creating thread retains ownership of PD and must cleanup state.
> 
> 
> And I'm not sure if THREAD_RAN matters for the distinction between (d)
> and (e), either.
> 
> This suggests to me that we should keep THREAD_RAN local to
> create_thread, and handle all setup errors there.

THREAD_RAN only signals if whether clone() or thread setup has failed, 
there is no synchronization involved and it is already local to
creating thread.  I changed the concurrency comment to the following,
to indicate that (c) and (d) are essentially the same as you noted:

   (c) If either a joinable or detached thread setup failed and THREAD_RAN
       is true, then the creating thread releases ownership to the new thread,
       the created thread sees the failed setup through PD SETUP_FAILED
       member, releases the PD ownership, and exits.  The creating thread will
       be responsible for cleanup the allocated resources.  The THREAD_RAN is
       local to creating thread and indicate whether thread creation or setup
       has failed.
   
   (d) If the thread creation failed and THREAD_RAN is false (meaning
       ARCH_CLONE has failed), then the creating thread retains ownership
       of PD and must cleanup he allocated resource.  No waiting for the new
       thread is required because it never started.

(with the adjusted references to (c), (d), and (e) adjusted on other
comments).
> 
>> @@ -239,8 +239,8 @@ late_init (void)
>>     The return value is zero for success or an errno code for failure.
>>     If the return value is ENOMEM, that will be translated to EAGAIN,
>>     so create_thread need not do that.  On failure, *THREAD_RAN should
>> -   be set to true iff the thread actually started up and then got
>> -   canceled before calling user code (*PD->start_routine).  */
>> +   be set to true iff the thread actually started up but before calling
>> +   the user code (*PD->start_routine).  */
>>  
>>  static int _Noreturn start_thread (void *arg);
>>  
>> @@ -308,35 +308,23 @@ static int create_thread (struct pthread *pd, const struct pthread_attr *attr,
>>  			== -1))
>>      return errno;
>>  
>> -  /* It's started now, so if we fail below, we'll have to cancel it
>> -     and let it clean itself up.  */
>> +  /* It's started now, so if we fail below, we'll have to let it clean itself
>> +     up.  */
>>    *thread_ran = true;
>>  
>>    /* Now we have the possibility to set scheduling parameters etc.  */
>>    if (attr != NULL)
>>      {
>> -      int res;
>> -
>>        /* Set the affinity mask if necessary.  */
>>        if (need_setaffinity)
>>  	{
>>  	  assert (*stopped_start);
>>  
>> -	  res = INTERNAL_SYSCALL_CALL (sched_setaffinity, pd->tid,
>> -				       attr->extension->cpusetsize,
>> -				       attr->extension->cpuset);
>> -
>> +	  int res = INTERNAL_SYSCALL_CALL (sched_setaffinity, pd->tid,
>> +					   attr->extension->cpusetsize,
>> +					   attr->extension->cpuset);
>>  	  if (__glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (res)))
>> -	  err_out:
>> -	    {
>> -	      /* The operation failed.  We have to kill the thread.
>> -		 We let the normal cancellation mechanism do the work.  */
>> -
>> -	      pid_t pid = __getpid ();
>> -	      INTERNAL_SYSCALL_CALL (tgkill, pid, pd->tid, SIGCANCEL);
>> -
>> -	      return INTERNAL_SYSCALL_ERRNO (res);
>> -	    }
>> +	    return INTERNAL_SYSCALL_ERRNO (res);
>>  	}
>>  
>>        /* Set the scheduling parameters.  */
>> @@ -344,11 +332,10 @@ static int create_thread (struct pthread *pd, const struct pthread_attr *attr,
>>  	{
>>  	  assert (*stopped_start);
>>  
>> -	  res = INTERNAL_SYSCALL_CALL (sched_setscheduler, pd->tid,
>> -				       pd->schedpolicy, &pd->schedparam);
>> -
>> +	  int res = INTERNAL_SYSCALL_CALL (sched_setscheduler, pd->tid,
>> +					   pd->schedpolicy, &pd->schedparam);
>>  	  if (__glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (res)))
>> -	    goto err_out;
>> +	    return INTERNAL_SYSCALL_ERRNO (res);
>>  	}
>>      }
>>  
>> @@ -361,6 +348,29 @@ start_thread (void *arg)
>>  {
>>    struct pthread *pd = arg;
>>  
>> +  /* We are either in (a) or (b), and in either case we either own PD already
>> +     (2) or are about to own PD (1), and so our only restriction would be that
>> +     we can't free PD until we know we have ownership (see CONCURRENCY NOTES
>> +     above).  */
>> +  bool setup_failed = false;
>> +  if (__glibc_unlikely (pd->stopped_start))
> 
> I think we should remove __glibc_unlikely for cases which can always be
> true due to particular application use cases.

Ack.

> 
>> +    {
>> +      /* Get the lock the parent locked to force synchronization.  */
>> +      lll_lock (pd->lock, LLL_PRIVATE);
>> +
>> +      /* We have ownership of PD now, for detached threads with setup failure
>> +	 we set it as joinable so the creating thread could synchronous join
>> +         and free any resource prior return to the pthread_create caller.  */
>> +      setup_failed = pd->setup_failed == 1;
>> +      if (setup_failed)
>> +	pd->joinid = NULL;
>> +
>> +      /* And give it up right away.  */
>> +      lll_unlock (pd->lock, LLL_PRIVATE);
>> +    }
>> +  if (setup_failed)
>> +    goto out;
> 
> I think it would be clearer if that goto were nested within the previous
> if block.

Ack, I moved both the 'setup_failed' and the goto check inside the if block.

> 
>> @@ -814,30 +805,33 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>>    if (__glibc_unlikely (retval != 0))
>>      {
>>        if (thread_ran)
>> -	/* State (c) or (d) and we may not have PD ownership (see
>> -	   CONCURRENCY NOTES above).  We can assert that STOPPED_START
>> -	   must have been true because thread creation didn't fail, but
>> -	   thread attribute setting did.  */
>> -	/* See bug 19511 which explains why doing nothing here is a
>> -	   resource leak for a joinable thread.  */
>> -	assert (stopped_start);
>> -      else
>> -	{
>> -	  /* State (e) and we have ownership of PD (see CONCURRENCY
>> -	     NOTES above).  */
>> +	/* State (c) and we not have PD ownership (see CONCURRENCY NOTES
>> +	   above).  We can assert that STOPPED_START must have been true
>> +	   because thread creation didn't fail, but thread attribute setting
>> +	   did.  */
>> +        {
>> +	  assert (stopped_start);
>> +	  /* Signal the creating thread to release PD ownership and early
>> +	     exit so it could be joined.  */
>> +	  pd->setup_failed = 1;
>> +	  lll_unlock (pd->lock, LLL_PRIVATE);
> 
> “Signal the creat[ed] thread” (this code runs on the creating thread).

Ack.

> 
>> +	  /* Similar to pthread_join, but since thread creation has failed at
>> +	     startup there is no need to handle all the steps.  */
>> +	  pid_t tid;
>> +	  while ((tid = atomic_load_acquire (&pd->tid)) != 0)
>> +	    __futex_abstimed_wait_cancelable64 ((unsigned int *) &pd->tid,
>> +						tid, 0, NULL, LLL_SHARED);
>> +        }
>>  
>> +      /* State (c), (d), or (e) and we have ownership of PD (see CONCURRENCY
>> +	 NOTES above).  */
>>  
>> +      /* Oops, we lied for a second.  */
>> +      atomic_decrement (&__nptl_nthreads);
>> +
>> +      /* Free the resources.  */
>> +      __nptl_deallocate_stack (pd);
>>  
>>        /* We have to translate error codes.  */
>>        if (retval == ENOMEM)
> 
> What's missing further below is pd->setup_failed handling if there is an
> early failure on the created thread (I have my sigaltstack changes in
> this mind).  No such code exits in your patch, so this is consistent in
> this regard.

I think for such case you will need to set PD->stopped_start to true
before thread creation (similar to what affinity and schedule set do),
add the required initialization after the setup failure check, and
handle potential errors.

Something like (I removed the comments to simplify):


  static int _Noreturn
  start_thread (void *arg)
  { 
    struct pthread *pd = arg;

    if (pd->stopped_start)
      { 
        bool setup_failed = false;

        /* Get the lock the parent locked to force synchronization.  */
        lll_lock (pd->lock, LLL_PRIVATE);

        setup_failed = pd->setup_failed == 1;
        if (setup_failed)
          pd->joinid = NULL;

	/* Handle any thread setup requirement.  If a failure occurs, set
	   PD->setup_failed and 'setup_failed' to true to exit early.  */

        lll_unlock (pd->lock, LLL_PRIVATE);

        if (setup_failed)
          goto out;
      }
    [...]
  }

Then on the 'create_thread' error handling check for errors similar to 
what my patch proposes:

  if (__glibc_unlikely (retval != 0))
    { 
      [...]
    }
  else
    {
      if (stopped_start)
	{
          lll_unlock (pd->lock, LLL_PRIVATE);
          /* Thread starts the its own setup code.  */

	  lll_lock (pd->lock, LLL_PRIVATE);
	  /* Handle any thread setup failure similar to early setup
	     failure.  */
	  if (pd->setup_failed)
	    {
	       pid_t tid;
	       while ((tid = atomic_load_acquire (&pd->tid)) != 0)
		 __futex_abstimed_wait_cancelable64 ((unsigned int *) &pd->tid,
						     tid, 0, NULL, LLL_SHARED);
	    }
          else
	    lll_unlock (pd->lock)
    }

It would require a slight more synchronization so the created thread
signal that its own setup has failed, but I don't see a better
strategy here.


> 
> The actual code changes are fine, I think.

Below it is the updated patch based on your suggestion, including the concurrency
state descriptor regarding THREAD_RAN.  Are this version ok to commit?

---

[PATCH] nptl: Deallocate the thread stack on setup failure (BZ#19511)

To setup either the thread scheduling parameters or affinity,
pthread_create enforce synchronization on created thread to wait until
its parent either release PD ownership or send a cancellation signal if
a failure occurs.

However, cancelling the thread does not deallocate the newly created
stack since cancellation expects that a pthread_join to deallocate any
allocated thread resouces (threads stack or TLS).

This patch changes on how the thread resource is deallocate in case of
failure to be synchronous, where the creating thread will signal the
created thread to exit early so it could be joined.  The creating thread
will be reponsible for the resource cleanup before returning to the
caller.

To signal the creating thread that a failure has occured, an unused
'struct pthread' member, parent_cancelhandling_unsed, now indicates
whether the setup has failed so creating thread can proper exit.

This strategy also simplifies by not using thread cancellation and
thus not running libgcc_so load in the signal handler (which is
avoided in thread cancellation since 'pthread_cancel' is the one
responsible to dlopen libgcc_s).  Another advantage is since the
early exit is move to first step at thread creation, the signal
mask is not already set and thus it can not act on change ID setxid
handler.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.
---
 nptl/allocatestack.c  |   1 +
 nptl/descr.h          |   5 +-
 nptl/pthread_create.c | 162 ++++++++++++++++++++----------------------
 3 files changed, 80 insertions(+), 88 deletions(-)

diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index dc81a2ca73..2114bd2e27 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -161,6 +161,7 @@ get_cached_stack (size_t *sizep, void **memp)
   /* Cancellation handling is back to the default.  */
   result->cancelhandling = 0;
   result->cleanup = NULL;
+  result->setup_failed = 0;
 
   /* No pending event.  */
   result->nextevent = NULL;
diff --git a/nptl/descr.h b/nptl/descr.h
index 3de9535449..9d8297b45f 100644
--- a/nptl/descr.h
+++ b/nptl/descr.h
@@ -340,8 +340,9 @@ struct pthread
   /* True if thread must stop at startup time.  */
   bool stopped_start;
 
-  /* Formerly used for dealing with cancellation.  */
-  int parent_cancelhandling_unsed;
+  /* Indicate that a thread creation setup has failed (for instance the
+     scheduler or affinity).  */
+  int setup_failed;
 
   /* Lock to synchronize access to the descriptor.  */
   int lock;
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index 52d6738f7f..9cca28419d 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -158,33 +158,27 @@ late_init (void)
        or joinable (default PTHREAD_CREATE_JOINABLE) state and
        STOPPED_START is true, then the creating thread has ownership of
        PD until the PD->lock is released by pthread_create.  If any
-       errors occur we are in states (c), (d), or (e) below.
+       errors occur we are in states (c) or (d) below.
 
    (b) If the created thread is in a detached state
        (PTHREAD_CREATED_DETACHED), and STOPPED_START is false, then the
        creating thread has ownership of PD until it invokes the OS
        kernel's thread creation routine.  If this routine returns
        without error, then the created thread owns PD; otherwise, see
-       (c) and (e) below.
-
-   (c) If the detached thread setup failed and THREAD_RAN is true, then
-       the creating thread releases ownership to the new thread by
-       sending a cancellation signal.  All threads set THREAD_RAN to
-       true as quickly as possible after returning from the OS kernel's
-       thread creation routine.
-
-   (d) If the joinable thread setup failed and THREAD_RAN is true, then
-       then the creating thread retains ownership of PD and must cleanup
-       state.  Ownership cannot be released to the process via the
-       return of pthread_create since a non-zero result entails PD is
-       undefined and therefore cannot be joined to free the resources.
-       We privately call pthread_join on the thread to finish handling
-       the resource shutdown (Or at least we should, see bug 19511).
-
-   (e) If the thread creation failed and THREAD_RAN is false, then the
-       creating thread retains ownership of PD and must cleanup state.
-       No waiting for the new thread is required because it never
-       started.
+       (c) or (d) below.
+
+   (c) If either a joinable or detached thread setup failed and THREAD_RAN
+       is true, then the creating thread releases ownership to the new thread,
+       the created thread sees the failed setup through PD SETUP_FAILED
+       member, releases the PD ownership, and exits.  The creating thread will
+       be responsible for cleanup the allocated resources.  The THREAD_RAN is
+       local to creating thread and indicate whether thread creation or setup
+       has failed.
+
+   (d) If the thread creation failed and THREAD_RAN is false (meaning
+       ARCH_CLONE has failed), then the creating thread retains ownership
+       of PD and must cleanup he allocated resource.  No waiting for the new
+       thread is required because it never started.
 
    The nptl_db interface:
 
@@ -239,8 +233,8 @@ late_init (void)
    The return value is zero for success or an errno code for failure.
    If the return value is ENOMEM, that will be translated to EAGAIN,
    so create_thread need not do that.  On failure, *THREAD_RAN should
-   be set to true iff the thread actually started up and then got
-   canceled before calling user code (*PD->start_routine).  */
+   be set to true iff the thread actually started up but before calling
+   the user code (*PD->start_routine).  */
 
 static int _Noreturn start_thread (void *arg);
 
@@ -308,35 +302,23 @@ static int create_thread (struct pthread *pd, const struct pthread_attr *attr,
 			== -1))
     return errno;
 
-  /* It's started now, so if we fail below, we'll have to cancel it
-     and let it clean itself up.  */
+  /* It's started now, so if we fail below, we'll have to let it clean itself
+     up.  */
   *thread_ran = true;
 
   /* Now we have the possibility to set scheduling parameters etc.  */
   if (attr != NULL)
     {
-      int res;
-
       /* Set the affinity mask if necessary.  */
       if (need_setaffinity)
 	{
 	  assert (*stopped_start);
 
-	  res = INTERNAL_SYSCALL_CALL (sched_setaffinity, pd->tid,
-				       attr->extension->cpusetsize,
-				       attr->extension->cpuset);
-
+	  int res = INTERNAL_SYSCALL_CALL (sched_setaffinity, pd->tid,
+					   attr->extension->cpusetsize,
+					   attr->extension->cpuset);
 	  if (__glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (res)))
-	  err_out:
-	    {
-	      /* The operation failed.  We have to kill the thread.
-		 We let the normal cancellation mechanism do the work.  */
-
-	      pid_t pid = __getpid ();
-	      INTERNAL_SYSCALL_CALL (tgkill, pid, pd->tid, SIGCANCEL);
-
-	      return INTERNAL_SYSCALL_ERRNO (res);
-	    }
+	    return INTERNAL_SYSCALL_ERRNO (res);
 	}
 
       /* Set the scheduling parameters.  */
@@ -344,11 +326,10 @@ static int create_thread (struct pthread *pd, const struct pthread_attr *attr,
 	{
 	  assert (*stopped_start);
 
-	  res = INTERNAL_SYSCALL_CALL (sched_setscheduler, pd->tid,
-				       pd->schedpolicy, &pd->schedparam);
-
+	  int res = INTERNAL_SYSCALL_CALL (sched_setscheduler, pd->tid,
+					   pd->schedpolicy, &pd->schedparam);
 	  if (__glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (res)))
-	    goto err_out;
+	    return INTERNAL_SYSCALL_ERRNO (res);
 	}
     }
 
@@ -361,6 +342,31 @@ start_thread (void *arg)
 {
   struct pthread *pd = arg;
 
+  /* We are either in (a) or (b), and in either case we either own PD already
+     (2) or are about to own PD (1), and so our only restriction would be that
+     we can't free PD until we know we have ownership (see CONCURRENCY NOTES
+     above).  */
+  if (pd->stopped_start)
+    {
+      bool setup_failed = false;
+
+      /* Get the lock the parent locked to force synchronization.  */
+      lll_lock (pd->lock, LLL_PRIVATE);
+
+      /* We have ownership of PD now, for detached threads with setup failure
+	 we set it as joinable so the creating thread could synchronous join
+         and free any resource prior return to the pthread_create caller.  */
+      setup_failed = pd->setup_failed == 1;
+      if (setup_failed)
+	pd->joinid = NULL;
+
+      /* And give it up right away.  */
+      lll_unlock (pd->lock, LLL_PRIVATE);
+
+      if (setup_failed)
+	goto out;
+    }
+
   /* Initialize resolver state pointer.  */
   __resp = &pd->res;
 
@@ -418,25 +424,6 @@ start_thread (void *arg)
       /* Store the new cleanup handler info.  */
       THREAD_SETMEM (pd, cleanup_jmp_buf, &unwind_buf);
 
-      /* We are either in (a) or (b), and in either case we either own
-         PD already (2) or are about to own PD (1), and so our only
-	 restriction would be that we can't free PD until we know we
-	 have ownership (see CONCURRENCY NOTES above).  */
-      if (__glibc_unlikely (pd->stopped_start))
-	{
-	  int oldtype = LIBC_CANCEL_ASYNC ();
-
-	  /* Get the lock the parent locked to force synchronization.  */
-	  lll_lock (pd->lock, LLL_PRIVATE);
-
-	  /* We have ownership of PD now.  */
-
-	  /* And give it up right away.  */
-	  lll_unlock (pd->lock, LLL_PRIVATE);
-
-	  LIBC_CANCEL_RESET (oldtype);
-	}
-
       LIBC_PROBE (pthread_start, 3, (pthread_t) pd, pd->start_routine, pd->arg);
 
       /* Run the code the user provided.  */
@@ -566,6 +553,7 @@ start_thread (void *arg)
     /* Free the TCB.  */
     __nptl_free_tcb (pd);
 
+out:
   /* We cannot call '_exit' here.  '_exit' will terminate the process.
 
      The 'exit' implementation in the kernel will signal when the
@@ -759,7 +747,6 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
 	 signal mask of this thread, so save it in the startup
 	 information.  */
       pd->sigmask = original_sigmask;
-
       /* Reset the cancellation signal mask in case this thread is
 	 running cancellation.  */
       __sigdelset (&pd->sigmask, SIGCANCEL);
@@ -814,30 +801,33 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
   if (__glibc_unlikely (retval != 0))
     {
       if (thread_ran)
-	/* State (c) or (d) and we may not have PD ownership (see
-	   CONCURRENCY NOTES above).  We can assert that STOPPED_START
-	   must have been true because thread creation didn't fail, but
-	   thread attribute setting did.  */
-	/* See bug 19511 which explains why doing nothing here is a
-	   resource leak for a joinable thread.  */
-	assert (stopped_start);
-      else
-	{
-	  /* State (e) and we have ownership of PD (see CONCURRENCY
-	     NOTES above).  */
+	/* State (c) and we not have PD ownership (see CONCURRENCY NOTES
+	   above).  We can assert that STOPPED_START must have been true
+	   because thread creation didn't fail, but thread attribute setting
+	   did.  */
+        {
+	  assert (stopped_start);
+	  /* Signal the created thread to release PD ownership and early
+	     exit so it could be joined.  */
+	  pd->setup_failed = 1;
+	  lll_unlock (pd->lock, LLL_PRIVATE);
+
+	  /* Similar to pthread_join, but since thread creation has failed at
+	     startup there is no need to handle all the steps.  */
+	  pid_t tid;
+	  while ((tid = atomic_load_acquire (&pd->tid)) != 0)
+	    __futex_abstimed_wait_cancelable64 ((unsigned int *) &pd->tid,
+						tid, 0, NULL, LLL_SHARED);
+        }
 
-	  /* Oops, we lied for a second.  */
-	  atomic_decrement (&__nptl_nthreads);
+      /* State (c) or (d) and we have ownership of PD (see CONCURRENCY
+	 NOTES above).  */
 
-	  /* Perhaps a thread wants to change the IDs and is waiting for this
-	     stillborn thread.  */
-	  if (__glibc_unlikely (atomic_exchange_acq (&pd->setxid_futex, 0)
-				== -2))
-	    futex_wake (&pd->setxid_futex, 1, FUTEX_PRIVATE);
+      /* Oops, we lied for a second.  */
+      atomic_decrement (&__nptl_nthreads);
 
-	  /* Free the resources.  */
-	  __nptl_deallocate_stack (pd);
-	}
+      /* Free the resources.  */
+      __nptl_deallocate_stack (pd);
 
       /* We have to translate error codes.  */
       if (retval == ENOMEM)
Florian Weimer June 9, 2021, 1:49 p.m. UTC | #15
* Adhemerval Zanella:

> THREAD_RAN only signals if whether clone() or thread setup has failed, 
> there is no synchronization involved and it is already local to
> creating thread.  I changed the concurrency comment to the following,
> to indicate that (c) and (d) are essentially the same as you noted:
>
>    (c) If either a joinable or detached thread setup failed and THREAD_RAN
>        is true, then the creating thread releases ownership to the new thread,
>        the created thread sees the failed setup through PD SETUP_FAILED

PD->setup_failed (I think this is the GNU convention).

>        member, releases the PD ownership, and exits.  The creating thread will
>        be responsible for cleanup the allocated resources.  The THREAD_RAN is
>        local to creating thread and indicate whether thread creation or setup
>        has failed.
>    
>    (d) If the thread creation failed and THREAD_RAN is false (meaning
>        ARCH_CLONE has failed), then the creating thread retains ownership
>        of PD and must cleanup he allocated resource.  No waiting for the new
>        thread is required because it never started.

Yes, that's much better.

> I think for such case you will need to set PD->stopped_start to true
> before thread creation (similar to what affinity and schedule set do),
> add the required initialization after the setup failure check, and
> handle potential errors.
>
> Something like (I removed the comments to simplify):

Yeah, that is what I expect as well.

> Below it is the updated patch based on your suggestion, including the
> concurrency state descriptor regarding THREAD_RAN.  Are this version
> ok to commit?

This version looks okay to me, with the PD->setup_failed change above.

Thanks,
Florian
Adhemerval Zanella June 9, 2021, 5:07 p.m. UTC | #16
On 09/06/2021 10:49, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> THREAD_RAN only signals if whether clone() or thread setup has failed, 
>> there is no synchronization involved and it is already local to
>> creating thread.  I changed the concurrency comment to the following,
>> to indicate that (c) and (d) are essentially the same as you noted:
>>
>>    (c) If either a joinable or detached thread setup failed and THREAD_RAN
>>        is true, then the creating thread releases ownership to the new thread,
>>        the created thread sees the failed setup through PD SETUP_FAILED
> 
> PD->setup_failed (I think this is the GNU convention).

Ack (I was in doubt which would the rule here).

> 
>>        member, releases the PD ownership, and exits.  The creating thread will
>>        be responsible for cleanup the allocated resources.  The THREAD_RAN is
>>        local to creating thread and indicate whether thread creation or setup
>>        has failed.
>>    
>>    (d) If the thread creation failed and THREAD_RAN is false (meaning
>>        ARCH_CLONE has failed), then the creating thread retains ownership
>>        of PD and must cleanup he allocated resource.  No waiting for the new
>>        thread is required because it never started.
> 
> Yes, that's much better.
> 
>> I think for such case you will need to set PD->stopped_start to true
>> before thread creation (similar to what affinity and schedule set do),
>> add the required initialization after the setup failure check, and
>> handle potential errors.
>>
>> Something like (I removed the comments to simplify):
> 
> Yeah, that is what I expect as well.
> 
>> Below it is the updated patch based on your suggestion, including the
>> concurrency state descriptor regarding THREAD_RAN.  Are this version
>> ok to commit?
> 
> This version looks okay to me, with the PD->setup_failed change above.

Thanks, I will run a check on some architecture and commit the patchset.
diff mbox series

Patch

diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index dc81a2ca73..2114bd2e27 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -161,6 +161,7 @@  get_cached_stack (size_t *sizep, void **memp)
   /* Cancellation handling is back to the default.  */
   result->cancelhandling = 0;
   result->cleanup = NULL;
+  result->setup_failed = 0;
 
   /* No pending event.  */
   result->nextevent = NULL;
diff --git a/nptl/descr.h b/nptl/descr.h
index 3de9535449..9d8297b45f 100644
--- a/nptl/descr.h
+++ b/nptl/descr.h
@@ -340,8 +340,9 @@  struct pthread
   /* True if thread must stop at startup time.  */
   bool stopped_start;
 
-  /* Formerly used for dealing with cancellation.  */
-  int parent_cancelhandling_unsed;
+  /* Indicate that a thread creation setup has failed (for instance the
+     scheduler or affinity).  */
+  int setup_failed;
 
   /* Lock to synchronize access to the descriptor.  */
   int lock;
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index 52d6738f7f..4b143a5016 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -167,19 +167,19 @@  late_init (void)
        without error, then the created thread owns PD; otherwise, see
        (c) and (e) below.
 
-   (c) If the detached thread setup failed and THREAD_RAN is true, then
-       the creating thread releases ownership to the new thread by
-       sending a cancellation signal.  All threads set THREAD_RAN to
-       true as quickly as possible after returning from the OS kernel's
-       thread creation routine.
+   (c) If the detached thread setup failed and THREAD_RAN is true, then the
+       creating thread releases ownership to the new thread, the created
+       thread see the failed setup through a PD field and releases the
+       PD ownership and early exit.  The creating thread will be one
+       responsible for cleanup state.  All threads set THREAD_RAN to true as
+       quickly as possible after returning from the OS kernel's thread
+       creation routine.
 
    (d) If the joinable thread setup failed and THREAD_RAN is true, then
-       then the creating thread retains ownership of PD and must cleanup
+       the creating thread retains ownership of PD and must cleanup
        state.  Ownership cannot be released to the process via the
        return of pthread_create since a non-zero result entails PD is
        undefined and therefore cannot be joined to free the resources.
-       We privately call pthread_join on the thread to finish handling
-       the resource shutdown (Or at least we should, see bug 19511).
 
    (e) If the thread creation failed and THREAD_RAN is false, then the
        creating thread retains ownership of PD and must cleanup state.
@@ -239,8 +239,8 @@  late_init (void)
    The return value is zero for success or an errno code for failure.
    If the return value is ENOMEM, that will be translated to EAGAIN,
    so create_thread need not do that.  On failure, *THREAD_RAN should
-   be set to true iff the thread actually started up and then got
-   canceled before calling user code (*PD->start_routine).  */
+   be set to true iff the thread actually started up but before calling
+   the user code (*PD->start_routine).  */
 
 static int _Noreturn start_thread (void *arg);
 
@@ -308,35 +308,23 @@  static int create_thread (struct pthread *pd, const struct pthread_attr *attr,
 			== -1))
     return errno;
 
-  /* It's started now, so if we fail below, we'll have to cancel it
-     and let it clean itself up.  */
+  /* It's started now, so if we fail below, we'll have to let it clean itself
+     up.  */
   *thread_ran = true;
 
   /* Now we have the possibility to set scheduling parameters etc.  */
   if (attr != NULL)
     {
-      int res;
-
       /* Set the affinity mask if necessary.  */
       if (need_setaffinity)
 	{
 	  assert (*stopped_start);
 
-	  res = INTERNAL_SYSCALL_CALL (sched_setaffinity, pd->tid,
-				       attr->extension->cpusetsize,
-				       attr->extension->cpuset);
-
+	  int res = INTERNAL_SYSCALL_CALL (sched_setaffinity, pd->tid,
+					   attr->extension->cpusetsize,
+					   attr->extension->cpuset);
 	  if (__glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (res)))
-	  err_out:
-	    {
-	      /* The operation failed.  We have to kill the thread.
-		 We let the normal cancellation mechanism do the work.  */
-
-	      pid_t pid = __getpid ();
-	      INTERNAL_SYSCALL_CALL (tgkill, pid, pd->tid, SIGCANCEL);
-
-	      return INTERNAL_SYSCALL_ERRNO (res);
-	    }
+	    return INTERNAL_SYSCALL_ERRNO (res);
 	}
 
       /* Set the scheduling parameters.  */
@@ -344,11 +332,10 @@  static int create_thread (struct pthread *pd, const struct pthread_attr *attr,
 	{
 	  assert (*stopped_start);
 
-	  res = INTERNAL_SYSCALL_CALL (sched_setscheduler, pd->tid,
-				       pd->schedpolicy, &pd->schedparam);
-
+	  int res = INTERNAL_SYSCALL_CALL (sched_setscheduler, pd->tid,
+					   pd->schedpolicy, &pd->schedparam);
 	  if (__glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (res)))
-	    goto err_out;
+	    return INTERNAL_SYSCALL_ERRNO (res);
 	}
     }
 
@@ -361,6 +348,29 @@  start_thread (void *arg)
 {
   struct pthread *pd = arg;
 
+  /* We are either in (a) or (b), and in either case we either own PD already
+     (2) or are about to own PD (1), and so our only restriction would be that
+     we can't free PD until we know we have ownership (see CONCURRENCY NOTES
+     above).  */
+  bool setup_failed = false;
+  if (__glibc_unlikely (pd->stopped_start))
+    {
+      /* Get the lock the parent locked to force synchronization.  */
+      lll_lock (pd->lock, LLL_PRIVATE);
+
+      /* We have ownership of PD now, for detached threads with setup failure
+	 we set it as joinable so the creating thread could synchronous join
+         and free any resource prior return to the pthread_create caller.  */
+      setup_failed = pd->setup_failed == 1;
+      if (setup_failed)
+	pd->joinid = NULL;
+
+      /* And give it up right away.  */
+      lll_unlock (pd->lock, LLL_PRIVATE);
+    }
+  if (setup_failed)
+    goto out;
+
   /* Initialize resolver state pointer.  */
   __resp = &pd->res;
 
@@ -418,25 +428,6 @@  start_thread (void *arg)
       /* Store the new cleanup handler info.  */
       THREAD_SETMEM (pd, cleanup_jmp_buf, &unwind_buf);
 
-      /* We are either in (a) or (b), and in either case we either own
-         PD already (2) or are about to own PD (1), and so our only
-	 restriction would be that we can't free PD until we know we
-	 have ownership (see CONCURRENCY NOTES above).  */
-      if (__glibc_unlikely (pd->stopped_start))
-	{
-	  int oldtype = LIBC_CANCEL_ASYNC ();
-
-	  /* Get the lock the parent locked to force synchronization.  */
-	  lll_lock (pd->lock, LLL_PRIVATE);
-
-	  /* We have ownership of PD now.  */
-
-	  /* And give it up right away.  */
-	  lll_unlock (pd->lock, LLL_PRIVATE);
-
-	  LIBC_CANCEL_RESET (oldtype);
-	}
-
       LIBC_PROBE (pthread_start, 3, (pthread_t) pd, pd->start_routine, pd->arg);
 
       /* Run the code the user provided.  */
@@ -566,6 +557,7 @@  start_thread (void *arg)
     /* Free the TCB.  */
     __nptl_free_tcb (pd);
 
+out:
   /* We cannot call '_exit' here.  '_exit' will terminate the process.
 
      The 'exit' implementation in the kernel will signal when the
@@ -759,7 +751,6 @@  __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
 	 signal mask of this thread, so save it in the startup
 	 information.  */
       pd->sigmask = original_sigmask;
-
       /* Reset the cancellation signal mask in case this thread is
 	 running cancellation.  */
       __sigdelset (&pd->sigmask, SIGCANCEL);
@@ -814,30 +805,33 @@  __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
   if (__glibc_unlikely (retval != 0))
     {
       if (thread_ran)
-	/* State (c) or (d) and we may not have PD ownership (see
-	   CONCURRENCY NOTES above).  We can assert that STOPPED_START
-	   must have been true because thread creation didn't fail, but
-	   thread attribute setting did.  */
-	/* See bug 19511 which explains why doing nothing here is a
-	   resource leak for a joinable thread.  */
-	assert (stopped_start);
-      else
-	{
-	  /* State (e) and we have ownership of PD (see CONCURRENCY
-	     NOTES above).  */
+	/* State (c) and we not have PD ownership (see CONCURRENCY NOTES
+	   above).  We can assert that STOPPED_START must have been true
+	   because thread creation didn't fail, but thread attribute setting
+	   did.  */
+        {
+	  assert (stopped_start);
+	  /* Signal the creating thread to release PD ownership and early
+	     exit so it could be joined.  */
+	  pd->setup_failed = 1;
+	  lll_unlock (pd->lock, LLL_PRIVATE);
 
-	  /* Oops, we lied for a second.  */
-	  atomic_decrement (&__nptl_nthreads);
+	  /* Similar to pthread_join, but since thread creation has failed at
+	     startup there is no need to handle all the steps.  */
+	  pid_t tid;
+	  while ((tid = atomic_load_acquire (&pd->tid)) != 0)
+	    __futex_abstimed_wait_cancelable64 ((unsigned int *) &pd->tid,
+						tid, 0, NULL, LLL_SHARED);
+        }
 
-	  /* Perhaps a thread wants to change the IDs and is waiting for this
-	     stillborn thread.  */
-	  if (__glibc_unlikely (atomic_exchange_acq (&pd->setxid_futex, 0)
-				== -2))
-	    futex_wake (&pd->setxid_futex, 1, FUTEX_PRIVATE);
+      /* State (c), (d), or (e) and we have ownership of PD (see CONCURRENCY
+	 NOTES above).  */
 
-	  /* Free the resources.  */
-	  __nptl_deallocate_stack (pd);
-	}
+      /* Oops, we lied for a second.  */
+      atomic_decrement (&__nptl_nthreads);
+
+      /* Free the resources.  */
+      __nptl_deallocate_stack (pd);
 
       /* We have to translate error codes.  */
       if (retval == ENOMEM)