Fix blocking pthread_join.

Message ID: 8aeb7b8b-6b1e-9a60-e961-75cde1aa463b@linux.vnet.ibm.com
State: New
Series: Fix blocking pthread_join.

Commit Message

Stefan Liebler April 25, 2018, 11:27 a.m. UTC
Hi,

On s390 (31bit), if glibc is built with -Os, pthread_join sometimes
blocks indefinitely. This is observable, e.g., with the
testcase intl/tst-gettext6.

pthread_join calls lll_wait_tid(tid), which performs the futex-wait
syscall in a loop as long as tid != 0 (the thread is still alive).

On s390 (and built with -Os), tid is loaded from memory before
comparing it against zero, and then tid is loaded a second time
in order to pass it to the futex-wait syscall.
If the thread exits in between, the futex-wait syscall is
called with the value zero and waits until a futex-wake occurs.
As the thread has already exited, there won't be a futex-wake.

In lll_wait_tid, the tid is stored in the local variable __tid,
which is then used as the argument for the futex-wait syscall.
Unfortunately, the compiler is allowed to reload the value
from memory.

With this patch, the tid is loaded by dereferencing a volatile pointer.
Then the compiler is not allowed to reload the value for __tid from memory.
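
To make the hazard concrete, here is a minimal standalone sketch (the
function names, the static tid variable and the raw syscall() call that
stands in for lll_futex_wait are made up for illustration; this is not
the glibc macro itself):

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static int tid;   /* holds the thread ID, reset to 0 by the kernel on exit */

/* Broken variant: with a plain access the compiler may load 'tid' once for
   the comparison and a second time for the syscall argument.  If the thread
   exits in between, futex-wait is issued with val == 0 and blocks forever,
   because no further futex-wake will arrive.  */
void
wait_tid_broken (void)
{
  int val;
  while ((val = tid) != 0)
    syscall (SYS_futex, &tid, FUTEX_WAIT, val, NULL);
}

/* Variant corresponding to this patch: dereferencing a volatile-qualified
   pointer forces exactly one load, so the value compared against zero is
   the same value passed to futex-wait.  */
void
wait_tid_volatile (void)
{
  int val;
  while ((val = *(volatile int *) &tid) != 0)
    syscall (SYS_futex, &tid, FUTEX_WAIT, val, NULL);
}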

Okay to commit?

Bye
Stefan

---

ChangeLog:

	* sysdeps/nptl/lowlevellock.h (lll_wait_tid):
	Use a volatile pointer to load __tid.

Comments

Carlos O'Donell April 25, 2018, 12:39 p.m. UTC | #1
On 04/25/2018 06:27 AM, Stefan Liebler wrote:
> Hi,
> 
> On s390 (31bit), if glibc is built with -Os, pthread_join sometimes
> blocks indefinitely. This is observable, e.g., with the
> testcase intl/tst-gettext6.
>
> pthread_join calls lll_wait_tid(tid), which performs the futex-wait
> syscall in a loop as long as tid != 0 (the thread is still alive).
>
> On s390 (and built with -Os), tid is loaded from memory before
> comparing it against zero, and then tid is loaded a second time
> in order to pass it to the futex-wait syscall.
> If the thread exits in between, the futex-wait syscall is
> called with the value zero and waits until a futex-wake occurs.
> As the thread has already exited, there won't be a futex-wake.
>
> In lll_wait_tid, the tid is stored in the local variable __tid,
> which is then used as the argument for the futex-wait syscall.
> Unfortunately, the compiler is allowed to reload the value
> from memory.
> 
> With this patch, the tid is loaded by dereferencing a volatile pointer.
> Then the compiler is not allowed to reload the value for __tid from memory.
> 
> Okay to commit?

Would using an atomic type and an atomic load MO relaxed prevent the
compiler from reloading from memory?

I'm unhappy with the use of volatile here because it's not quite
the real semantics. Sure, the memory is volatile, it may change at
any point, but that's not what matters. What matters is that we load
from that memory once and only once.
Torvald Riegel April 25, 2018, 3:25 p.m. UTC | #2
On Wed, 2018-04-25 at 07:39 -0500, Carlos O'Donell wrote:
> On 04/25/2018 06:27 AM, Stefan Liebler wrote:
> > With this patch, the tid is loaded by dereferencing a volatile pointer.
> > Then the compiler is not allowed to reload the value for __tid from memory.

We always use atomic accesses when it comes to concurrently accessed
data (there are exceptions, but these are tightly controlled).
We never use volatile to "fix" concurrent accesses.

> > Okay to commit?
> 
> Would using an atomic type and an atomic load MO relaxed prevent the
> compiler from reloading from memory?

That's the right fix, and it should be an acquire MO load to synchronize
with the kernel's store to 0.  (We should make it a requirement for the
kernel to use a release store; IIRC, it is on many archs, but it isn't
documented.)

The accesses to the TID should be changed to use atomics everywhere, and
some (simple) concurrency notes should be added.
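
As a self-contained illustration of that proposal (using C11 <stdatomic.h>
in place of glibc's internal atomic_load_acquire, a raw futex syscall in
place of lll_futex_wait, and treating the kernel's store of 0 as a release
store, which as noted above is assumed rather than documented):

#include <stdatomic.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static atomic_int tid;   /* reset to 0 when the clone terminates */

void
wait_tid_acquire (void)
{
  int val;
  /* Acquire MO pairs with the assumed release store of 0, so the joiner
     observes everything the exiting thread did before it terminated, and
     the single atomic load cannot be split into two reads.  */
  while ((val = atomic_load_explicit (&tid, memory_order_acquire)) != 0)
    syscall (SYS_futex, &tid, FUTEX_WAIT, val, NULL);
}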

> I'm unhappy with the use of volatile here because it's not quite
> the real semantics. Sure, the memory is volatile, it may change at
> any point, but that's not what matters. What matters is that we load
> from that memory once and only once.

It's a normal concurrent access, so we're using atomics for it.
Volatile but non-atomic is for cases where one would communicate with an
external device or something like that, and that device's memory accesses
would appear to interrupt the thread that's using the volatile accesses.
IOW, it's like sequential code from a memory-model perspective, just
that the device's accesses can interleave with the CPU thread's
accesses.  There's no such simple interleaving when it comes to
concurrent accesses.
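
A tiny illustrative contrast of the two cases (made-up names; not glibc
code):

#include <stdatomic.h>
#include <stdint.h>

/* Device-like memory: volatile makes every access really happen, in
   program order, and the device's updates simply interleave with this
   sequential code.  */
uint32_t
read_device_status (volatile uint32_t *status_reg)
{
  return *status_reg;
}

/* Concurrently accessed program data: use atomics, which avoid the data
   race and provide the ordering (e.g. acquire MO) needed to synchronize
   with another thread's or the kernel's writes.  */
int
read_shared_flag (atomic_int *flag)
{
  return atomic_load_explicit (flag, memory_order_acquire);
}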
Stefan Liebler April 30, 2018, 3:19 p.m. UTC | #3
On 04/25/2018 05:25 PM, Torvald Riegel wrote:
> On Wed, 2018-04-25 at 07:39 -0500, Carlos O'Donell wrote:
>> On 04/25/2018 06:27 AM, Stefan Liebler wrote:
>>> With this patch, the tid is loaded by dereferencing a volatile pointer.
>>> Then the compiler is not allowed to reload the value for __tid from memory.
> 
> We always use atomic accesses when it comes to concurrently accessed
> data (there are exceptions, but these are tightly controlled).
> We never use volatile to "fix" concurrent accesses.
> 
>>> Okay to commit?
>>
>> Would using an atomic type and an atomic load MO relaxed prevent the
>> compiler from reloading from memory?
> 
> That's the right fix, and it should be an acquire MO load to synchronize
> with the kernel's store to 0.  (We should make it a requirement for the
> kernel to use a release store; IIRC, it is on many archs, but it isn't
> documented.)
> 
See the attached patch for lll_wait_tid.
This prevents the compiler from reloading from memory if built with -Os
on s390 (31bit).

> The accesses to the TID should be changed to use atomics everywhere, and
> some (simple) concurrency notes should be added.
>
There are some functions which use the loaded pd->tid as an argument,
e.g. for passing it to a syscall.
Then this syscall "operates" on the thread with the given tid, or on the
calling thread if zero was specified, e.g.:
- nptl/pthread_setschedparam.c: The INVALID_TD_P macro is used in order
  to check whether pd->tid is valid, but pd->tid is reloaded before the
  call to __sched_setscheduler().
- sysdeps/unix/sysv/linux/pthread_[s|g]etaffinity.c: pd is not checked
  with the INVALID_TD_P macro in order to return ESRCH. If the thread
  has already exited, this function won't fail with ESRCH.

Can we enhance the INVALID_TD_P macro in such a way that it additionally
stores the evaluated tid in a local variable?
Then we could, e.g., pass this tid value to the mentioned syscalls.
Is atomic_load_relaxed enough for loading pd->tid within INVALID_TD_P?
In the examples above, the syscall will fail if the thread has just exited.
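
A purely hypothetical sketch of such a combined check (the struct, the
helper name and the <= 0 test are invented here; this is not glibc's
actual INVALID_TD_P):

#include <stdatomic.h>
#include <stdbool.h>

struct fake_pthread { atomic_int tid; };

/* Load pd->tid exactly once and hand the value back to the caller, so the
   validity check and the later syscall argument agree.  A relaxed load is
   what the question above proposes; if the thread exits right after the
   check, the syscall itself is expected to fail.  */
static bool
invalid_td_and_load (struct fake_pthread *pd, int *tid_out)
{
  *tid_out = atomic_load_explicit (&pd->tid, memory_order_relaxed);
  return *tid_out <= 0;
}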

>> I'm unhappy with the use of volatile here because it's not quite
>> the real semantics. Sure, the memory is volatile, it may change at
>> any point, but that's not what matters. What matters is that we load
>> from that memory once and only once.
> 
> It's a normal concurrent access, so we're using atomics for it.
> Volatile but non-atomic is for cases where one would communicate with an
> external device or something like that, and that device's memory accesses
> would appear to interrupt the thread that's using the volatile accesses.
> IOW, it's like sequential code from a memory-model perspective, just
> that the device's accesses can interleave with the CPU thread's
> accesses.  There's no such simple interleaving when it comes to
> concurrent accesses.
diff --git a/sysdeps/nptl/lowlevellock.h b/sysdeps/nptl/lowlevellock.h
index 8326e2805c..bfbda99940 100644
--- a/sysdeps/nptl/lowlevellock.h
+++ b/sysdeps/nptl/lowlevellock.h
@@ -181,11 +181,14 @@ extern int __lll_timedlock_wait (int *futex, const struct timespec *,
    thread ID while the clone is running and is reset to zero by the kernel
    afterwards.  The kernel up to version 3.16.3 does not use the private futex
    operations for futex wake-up when the clone terminates.  */
-#define lll_wait_tid(tid) \
-  do {					\
-    __typeof (tid) __tid;		\
-    while ((__tid = (tid)) != 0)	\
-      lll_futex_wait (&(tid), __tid, LLL_SHARED);\
+#define lll_wait_tid(tid)				\
+  do {							\
+    __typeof (tid) __tid;				\
+    /* We need acquire MO here so that we synchronize	\
+       with the kernel's store to 0 when the clone	\
+       terminates. (see above)  */			\
+    while ((__tid = atomic_load_acquire (&(tid))) != 0)	\
+      lll_futex_wait (&(tid), __tid, LLL_SHARED);	\
   } while (0)
 
 extern int __lll_timedwait_tid (int *, const struct timespec *)
Carlos O'Donell May 2, 2018, 4:27 a.m. UTC | #4
On 04/30/2018 11:19 AM, Stefan Liebler wrote:
> diff --git a/sysdeps/nptl/lowlevellock.h b/sysdeps/nptl/lowlevellock.h
> index 8326e2805c..bfbda99940 100644
> --- a/sysdeps/nptl/lowlevellock.h
> +++ b/sysdeps/nptl/lowlevellock.h
> @@ -181,11 +181,14 @@ extern int __lll_timedlock_wait (int *futex, const struct timespec *,
>     thread ID while the clone is running and is reset to zero by the kernel
>     afterwards.  The kernel up to version 3.16.3 does not use the private futex
>     operations for futex wake-up when the clone terminates.  */
> -#define lll_wait_tid(tid) \
> -  do {					\
> -    __typeof (tid) __tid;		\
> -    while ((__tid = (tid)) != 0)	\
> -      lll_futex_wait (&(tid), __tid, LLL_SHARED);\
> +#define lll_wait_tid(tid)				\
> +  do {							\
> +    __typeof (tid) __tid;				\
> +    /* We need acquire MO here so that we synchronize	\
> +       with the kernel's store to 0 when the clone	\
> +       terminates. (see above)  */			\
> +    while ((__tid = atomic_load_acquire (&(tid))) != 0)	\
> +      lll_futex_wait (&(tid), __tid, LLL_SHARED);	\
>    } while (0)
>  
>  extern int __lll_timedwait_tid (int *, const struct timespec *)

This looks good to me, and improves the situation.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

I haven't had a chance to review the other P&C issues discussed by Torvald,
but we should probably raise them in a new thread related to tid reloading
and the consequences.
Rich Felker May 2, 2018, 4:29 p.m. UTC | #5
On Wed, Apr 25, 2018 at 01:27:07PM +0200, Stefan Liebler wrote:
> Hi,
> 
> On s390 (31bit), if glibc is built with -Os, pthread_join sometimes
> blocks indefinitely. This is observable, e.g., with the
> testcase intl/tst-gettext6.
>
> pthread_join calls lll_wait_tid(tid), which performs the futex-wait
> syscall in a loop as long as tid != 0 (the thread is still alive).
>
> On s390 (and built with -Os), tid is loaded from memory before
> comparing it against zero, and then tid is loaded a second time
> in order to pass it to the futex-wait syscall.
> If the thread exits in between, the futex-wait syscall is
> called with the value zero and waits until a futex-wake occurs.
> As the thread has already exited, there won't be a futex-wake.
>
> In lll_wait_tid, the tid is stored in the local variable __tid,
> which is then used as the argument for the futex-wait syscall.
> Unfortunately, the compiler is allowed to reload the value
> from memory.
> 
> With this patch, the tid is loaded by dereferencing a volatile pointer.
> Then the compiler is not allowed to reload the value for __tid from memory.
> 
> Okay to commit?

There should probably be a bugzilla issue for this, no?

Rich
Carlos O'Donell May 2, 2018, 5:54 p.m. UTC | #6
On 05/02/2018 12:29 PM, Rich Felker wrote:
> On Wed, Apr 25, 2018 at 01:27:07PM +0200, Stefan Liebler wrote:
>> Hi,
>>
>> On s390 (31bit), if glibc is built with -Os, pthread_join sometimes
>> blocks indefinitely. This is observable, e.g., with the
>> testcase intl/tst-gettext6.
>>
>> pthread_join calls lll_wait_tid(tid), which performs the futex-wait
>> syscall in a loop as long as tid != 0 (the thread is still alive).
>>
>> On s390 (and built with -Os), tid is loaded from memory before
>> comparing it against zero, and then tid is loaded a second time
>> in order to pass it to the futex-wait syscall.
>> If the thread exits in between, the futex-wait syscall is
>> called with the value zero and waits until a futex-wake occurs.
>> As the thread has already exited, there won't be a futex-wake.
>>
>> In lll_wait_tid, the tid is stored in the local variable __tid,
>> which is then used as the argument for the futex-wait syscall.
>> Unfortunately, the compiler is allowed to reload the value
>> from memory.
>>
>> With this patch, the tid is loaded by dereferencing a volatile pointer.
>> Then the compiler is not allowed to reload the value for __tid from memory.
>>
>> Okay to commit?
> 
> There should probably be a bugzilla issue for this, no?

Yes. Publicly visible bugs need one.
Stefan Liebler May 4, 2018, 8:07 a.m. UTC | #7
On 05/02/2018 06:27 AM, Carlos O'Donell wrote:
> On 04/30/2018 11:19 AM, Stefan Liebler wrote:
>> diff --git a/sysdeps/nptl/lowlevellock.h b/sysdeps/nptl/lowlevellock.h
>> index 8326e2805c..bfbda99940 100644
>> --- a/sysdeps/nptl/lowlevellock.h
>> +++ b/sysdeps/nptl/lowlevellock.h
>> @@ -181,11 +181,14 @@ extern int __lll_timedlock_wait (int *futex, const struct timespec *,
>>      thread ID while the clone is running and is reset to zero by the kernel
>>      afterwards.  The kernel up to version 3.16.3 does not use the private futex
>>      operations for futex wake-up when the clone terminates.  */
>> -#define lll_wait_tid(tid) \
>> -  do {					\
>> -    __typeof (tid) __tid;		\
>> -    while ((__tid = (tid)) != 0)	\
>> -      lll_futex_wait (&(tid), __tid, LLL_SHARED);\
>> +#define lll_wait_tid(tid)				\
>> +  do {							\
>> +    __typeof (tid) __tid;				\
>> +    /* We need acquire MO here so that we synchronize	\
>> +       with the kernel's store to 0 when the clone	\
>> +       terminates. (see above)  */			\
>> +    while ((__tid = atomic_load_acquire (&(tid))) != 0)	\
>> +      lll_futex_wait (&(tid), __tid, LLL_SHARED);	\
>>     } while (0)
>>   
>>   extern int __lll_timedwait_tid (int *, const struct timespec *)
> 
> This looks good to me, and improves the situation.
> 
> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
> 
I've just committed this part and opened the
"Bug 23137 - s390: pthread_join sometimes block indefinitely (on 31bit 
and libc build with -Os)"
(https://sourceware.org/bugzilla/show_bug.cgi?id=23137)

> I haven't had a chance to review the other P&C issues discussed by Torvald,
> but we should probably raise them in a new thread related to tid reloading
> and the consequences.
>

Patch

commit be2e80e32fa4d0c7f7d021f550d21ab102aa8c42
Author: Stefan Liebler <stli@linux.vnet.ibm.com>
Date:   Wed Apr 25 12:51:27 2018 +0200

    Fix blocking pthread_join.
    
    On s390 (31bit), if glibc is built with -Os, pthread_join sometimes
    blocks indefinitely. This is observable, e.g., with the
    testcase intl/tst-gettext6.

    pthread_join calls lll_wait_tid(tid), which performs the futex-wait
    syscall in a loop as long as tid != 0 (the thread is still alive).

    On s390 (and built with -Os), tid is loaded from memory before
    comparing it against zero, and then tid is loaded a second time
    in order to pass it to the futex-wait syscall.
    If the thread exits in between, the futex-wait syscall is
    called with the value zero and waits until a futex-wake occurs.
    As the thread has already exited, there won't be a futex-wake.

    In lll_wait_tid, the tid is stored in the local variable __tid,
    which is then used as the argument for the futex-wait syscall.
    Unfortunately, the compiler is allowed to reload the value
    from memory.
    
    With this patch, the tid is loaded by dereferencing a volatile pointer.
    Then the compiler is not allowed to reload the value for __tid from memory.
    
    ChangeLog:
    
            * sysdeps/nptl/lowlevellock.h (lll_wait_tid):
            Use a volatile pointer to load __tid.

diff --git a/sysdeps/nptl/lowlevellock.h b/sysdeps/nptl/lowlevellock.h
index 8326e2805c..6e06e8498a 100644
--- a/sysdeps/nptl/lowlevellock.h
+++ b/sysdeps/nptl/lowlevellock.h
@@ -184,7 +184,7 @@  extern int __lll_timedlock_wait (int *futex, const struct timespec *,
 #define lll_wait_tid(tid) \
   do {					\
     __typeof (tid) __tid;		\
-    while ((__tid = (tid)) != 0)	\
+    while ((__tid = *(volatile __typeof(tid) *) &(tid)) != 0)	\
       lll_futex_wait (&(tid), __tid, LLL_SHARED);\
   } while (0)