diff mbox

[PR18457] Don't require rtld lock to compute DTV addr for static TLS

Message ID 20150603152448.GC32684@spoyarek.pnq.redhat.com
State New
Headers show

Commit Message

Siddhesh Poyarekar June 3, 2015, 3:24 p.m. UTC
On Wed, Jun 03, 2015 at 03:44:58AM -0300, Alexandre Oliva wrote:
> We used to store static TLS addrs in the DTV at module load time, but
> this required one thread to modify another thread's DTV.  Now that we
> defer the DTV update to the first use in the thread, we should do so
> without taking the rtld lock if the module is already assigned to static
> TLS.  Taking the lock introduces deadlocks where there weren't any
> before.
> 
> This patch fixes the deadlock caused by tls_get_addr's unnecessarily
> taking the rtld lock to initialize the DTV entry for tls_dtor_list
> within __call_dtors_list, which deadlocks with a dlclose of a module
> whose finalizer joins with that thread.  The patch does not, however,
> attempt to fix other potential sources of similar deadlocks, such as
> the explicit rtld locks taken by call_dtors_list, when the dtor list
> is not empty; lazy relocation of the reference to tls_dtor_list, when
> TLS Descriptors are in use; when tls dtors call functions through the
> PLT and lazy relocation needs to be performed, or any such called
> functions interact with the dynamic loader in ways that require its
> lock to be taken.
> 
> Ok to install?

It's not good enough and is in fact probably just dancing around the
problem.  The simple patch to the test case below will cause the test
case to deadlock.  Andreas' reproducer can be fixed by simply setting
the TLS variables in cxa_thread_atexit as initial exec; I've got a
patch for it that I'll post shortly.  That would leave two other
problems:

1. All of the lock taking and NODELETE flag clearing in
   cxa_thread_atexit.  Not only can it cause a deadlock, clearing the
   flag like that may actually be wrong.  We may be better off not
   unloading the DSO at all, but I'll see if there's another way out.

2. The lock taking in tls_get_addr_tail.  That has to go and we need
   to figure out another way to wait for another dlopen to complete.
   I haven't wrapped my head around this bit of the code properly yet
   and you may be better placed to debug this.  If you don't have
   enough time, I could run with any kind of help/guidance you may
   provide.

Siddhesh

Comments

Alexandre Oliva June 3, 2015, 4:51 p.m. UTC | #1
On Jun  3, 2015, Siddhesh Poyarekar <siddhesh@redhat.com> wrote:

> On Wed, Jun 03, 2015 at 03:44:58AM -0300, Alexandre Oliva wrote:
>> We used to store static TLS addrs in the DTV at module load time, but
>> this required one thread to modify another thread's DTV.  Now that we
>> defer the DTV update to the first use in the thread, we should do so
>> without taking the rtld lock if the module is already assigned to static
>> TLS.  Taking the lock introduces deadlocks where there weren't any
>> before.
>> 
>> This patch fixes the deadlock caused by tls_get_addr's unnecessarily
>> taking the rtld lock to initialize the DTV entry for tls_dtor_list
>> within __call_dtors_list, which deadlocks with a dlclose of a module
>> whose finalizer joins with that thread.  The patch does not, however,
>> attempt to fix other potential sources of similar deadlocks, such as
>> the explicit rtld locks taken by call_dtors_list, when the dtor list
>> is not empty; lazy relocation of the reference to tls_dtor_list, when
>> TLS Descriptors are in use; when tls dtors call functions through the
>> PLT and lazy relocation needs to be performed, or any such called
>> functions interact with the dynamic loader in ways that require its
>> lock to be taken.
>> 
>> Ok to install?

> It's not good enough

... for what?

It sure isn't good enough to enable anyone to do whatever they like in a
module finalizer, including blocking waiting for other threads that
interact, explicitly or implicitly, with the dynamic loader before the
finalizer is unblocked.  Most of these issues are preexisting, and
explicitly not addressed by the patch.

It is good enough, however, to fix the regression, namely, that
___tls_get_addr for an IE variable used to be able to complete even if
the rtld lock was taken by another thread, while now it's slower than
needed and it may deadlock.  The patch is intended to fix this
performance and deadlock regression, not any of the preexisting
problems.

> The simple patch to the test case below will cause the test
> case to deadlock.

I very much doubt it would have completed before the DTV race fix.
___tls_get_addr would find the l_tls_offset for tst-join7mod.so to be
undecided, and would attempt to take the lock to make it forced dynamic.
Deadlock.

It's not like accessing TLS variables is special WRT interacting
implicitly with the dynamic loader.  It's just the tip of the iceberg.
Any lazily-bound function called for the first time will do so as well.
libdl functions will also interact with the dynamic loader, obviously,
and there are various cases in which localization libraries deeply
hidden in library internals will dlopen modules to translate an error
message or somesuch.  There are so many things that can go wrong if a
module initializer or finalizer is called with the rtld lock held that
I'm convinced the solution is not to go case after case after case
trying to somehow make it lock-free, but rather to release the rtld lock
before handing control over to user-supplied code (init and fini).  This
would require, before releasing the lock, placing the modules in
intermediate states and making a list of things without the lock and
after taking the lock again, and then, after we're done with the
lockless list and take the lock back, checking that the things to do are
still appropriate.  We might even require additional corner-case states,
such as a module that was to be finalized and released, but that got
dlopened again by some other concurrent finalizer.  Tricky, but doable.

The alternative, IMHO, is not to go after each case that takes the rtld
lock and somehow avoid that, but rather to document what can and cannot
be done in module finalizers.  We might want to rule out synchronization
with other threads, or just document that they are called while holding
a lock that various infrastructure bits have to take at unanticipated
times, so any such synchronization is prone to deadlocks.

> Andreas' reproducer can be fixed by simply setting
> the TLS variables in cxa_thread_atexit as initial exec

It's not clear to me that we want to make libc.so variables IE.  If we
do, there are a number of others that could benefit from this treatment.

> 1. All of the lock taking and NODELETE flag clearing in
>    cxa_thread_atexit.

As I wrote:

>> The patch does not, however,
>> attempt to fix other potential sources of similar deadlocks, such as
>> the explicit rtld locks taken by call_dtors_list

> 2. The lock taking in tls_get_addr_tail.

Thanks, I failed to list this one as a remaining non-regression problem
we might have to address.

>    That has to go and we need
>    to figure out another way to wait for another dlopen to complete.

I don't think so, but I won't try to stop you ;-)
Siddhesh Poyarekar June 3, 2015, 5:24 p.m. UTC | #2
On Wed, Jun 03, 2015 at 01:51:32PM -0300, Alexandre Oliva wrote:
> It sure isn't good enough to enable anyone to do whatever they like in a
> module finalizer, including blocking waiting for other threads that
> interact, explicitly or implicitly, with the dynamic loader before the
> finalizer is unblocked.  Most of these issues are preexisting, and
> explicitly not addressed by the patch.
> 
> It is good enough, however, to fix the regression, namely, that
> ___tls_get_addr for an IE variable used to be able to complete even if
> the rtld lock was taken by another thread, while now it's slower than
> needed and it may deadlock.  The patch is intended to fix this
> performance and deadlock regression, not any of the preexisting
> problems.

Here's the core of my lack of understanding I guess - calling
__tls_get_addr for an IE variable.  Does that ever happen?  An IE
variable resolution does not result in a call to __tls_get_addr
AFAICT.  Maybe you mean a variable that could be relaxed as an IE but
was compiled to use GD?

> I very much doubt it would have completed before the DTV race fix.
> ___tls_get_addr would find the l_tls_offset for tst-join7mod.so to be
> undecided, and would attempt to take the lock to make it forced dynamic.
> Deadlock.

I haven't tested that, but I did test with master and it works just
fine.  This is because __tls_get_addr doesn't get called at all when
the variables are set as IE.  And for that reason, I don't see why it
shouldn't work otherwise.

> It's not like accessing TLS variables is special WRT interacting
> implicitly with the dynamic loader.  It's just the tip of the iceberg.
> Any lazily-bound function called for the first time will do so as well.
> libdl functions will also interact with the dynamic loader, obviously,
> and there are various cases in which localization libraries deeply
> hidden in library internals will dlopen modules to translate an error
> message or somesuch.  There are so many things that can go wrong if a
> module initializer or finalizer is called with the rtld lock held that
> I'm convinced the solution is not to go case after case after case
> trying to somehow make it lock-free, but rather to release the rtld lock
> before handing control over to user-supplied code (init and fini).  This
> would require, before releasing the lock, placing the modules in
> intermediate states and making a list of things without the lock and
> after taking the lock again, and then, after we're done with the
> lockless list and take the lock back, checking that the things to do are
> still appropriate.  We might even require additional corner-case states,
> such as a module that was to be finalized and released, but that got
> dlopened again by some other concurrent finalizer.  Tricky, but doable.
> 
> The alternative, IMHO, is not to go after each case that takes the rtld
> lock and somehow avoid that, but rather to document what can and cannot
> be done in module finalizers.  We might want to rule out synchronization
> with other threads, or just document that they are called while holding
> a lock that various infrastructure bits have to take at unanticipated
> times, so any such synchronization is prone to deadlocks.

Synchronization with other threads in a module finalizer seems like a
common use case.  For example, a module waiting for worker threads to
exit before being deleted.  The reproducer in fact is such a case.
The fact that the module doesn't have its own TLS is just sheer luck.

> It's not clear to me that we want to make libc.so variables IE.  If we
> do, there are a number of others that could benefit from this treatment.

These are static variables that are only referenced inside libc.so.
Also, libc.so will always be loaded before the executable is running.
I don't see why they shouldn't be IE.

> >    That has to go and we need
> >    to figure out another way to wait for another dlopen to complete.
> 
> I don't think so, but I won't try to stop you ;-)

:)

Siddhesh
Alexandre Oliva June 3, 2015, 9:42 p.m. UTC | #3
On Jun  3, 2015, Siddhesh Poyarekar <siddhesh@redhat.com> wrote:

> Here's the core of my lack of understanding I guess - calling
> __tls_get_addr for an IE variable.  Does that ever happen?

I guess it depends on what is meant by "IE variable".

I meant it as a TLS variable defined as part of the initial module set,
i.e., the main executable and the transitive closure of its
dependencies.

Now, if you understood it as a variable declared with the initial-exec
tls_model attribute, I apologize for the confusion.

>> I very much doubt it would have completed before the DTV race fix.
>> ___tls_get_addr would find the l_tls_offset for tst-join7mod.so to be
>> undecided, and would attempt to take the lock to make it forced dynamic.
>> Deadlock.

> I haven't tested that, but I did test with master and it works just
> fine.  This is because __tls_get_addr doesn't get called at all when
> the variables are set as IE.  And for that reason, I don't see why it
> shouldn't work otherwise.

Declaring the variable with IE tls_model will indeed go through a
completely different path.  It is the GD tls_model, as in the patch for
the testcase that you posted, that will go through __tls_get_addr and
deadlock.

>> It's not clear to me that we want to make libc.so variables IE.  If we
>> do, there are a number of others that could benefit from this treatment.

> These are static variables that are only referenced inside libc.so.
> Also, libc.so will always be loaded before the executable is running.
> I don't see why they shouldn't be IE.

The case to worry about is a static executable, that for whatever reason
doesn't get these variables linked in, but that dlopens some library
that depends on libc.so, which then brings these variables into the
running image.  If libc.so had these variables as IE, this dlopen could
fail.
Siddhesh Poyarekar June 4, 2015, 5:38 a.m. UTC | #4
On Wed, Jun 03, 2015 at 06:42:28PM -0300, Alexandre Oliva wrote:
> I guess it depends on what is meant by "IE variable".
> 
> I meant it as a TLS variable defined as part of the initial module set,
> i.e., the main executable and the transitive closure of its
> dependencies.
> 
> Now, if you understood it as a variable declared with the initial-exec
> tls_model attribute, I apologize for the confusion.

OK, that's what I thought.  And I also understand now what your fix
attempts to do and I agree that is a better and more general approach.

> The case to worry about is a static executable, that for whatever reason
> doesn't get these variables linked in, but that dlopens some library
> that depends on libc.so, which then brings these variables into the
> running image.  If libc.so had these variables as IE, this dlopen could
> fail.

This seems to make a case for not having any variables in libc.so as
IE, yet a number of them already are.

Siddhesh
Carlos O'Donell June 4, 2015, 4:49 p.m. UTC | #5
On 06/03/2015 11:24 AM, Siddhesh Poyarekar wrote:
> On Wed, Jun 03, 2015 at 03:44:58AM -0300, Alexandre Oliva wrote:
>> We used to store static TLS addrs in the DTV at module load time, but
>> this required one thread to modify another thread's DTV.  Now that we
>> defer the DTV update to the first use in the thread, we should do so
>> without taking the rtld lock if the module is already assigned to static
>> TLS.  Taking the lock introduces deadlocks where there weren't any
>> before.
>>
>> This patch fixes the deadlock caused by tls_get_addr's unnecessarily
>> taking the rtld lock to initialize the DTV entry for tls_dtor_list
>> within __call_dtors_list, which deadlocks with a dlclose of a module
>> whose finalizer joins with that thread.  The patch does not, however,
>> attempt to fix other potential sources of similar deadlocks, such as
>> the explicit rtld locks taken by call_dtors_list, when the dtor list
>> is not empty; lazy relocation of the reference to tls_dtor_list, when
>> TLS Descriptors are in use; when tls dtors call functions through the
>> PLT and lazy relocation needs to be performed, or any such called
>> functions interact with the dynamic loader in ways that require its
>> lock to be taken.
>>
>> Ok to install?
> 
> It's not good enough and is in fact probably just dancing around the
> problem.  The simple patch to the test case below will cause the test
> case to deadlock.  Andreas' reproducer can be fixed by simply setting
> the TLS variables in cxa_thread_atexit as initial exec; I've got a
> patch for it that I'll post shortly.  That would leave two other
> problems:

I agree with Siddhesh and Torvald, it's not a good enough solution.

I spoke at length with Siddhesh about this, and I think we need to
sit down and really fix this once and for all by reasoning through
the TLS code and providing the appropriate atomic accesses, avoiding
data races, and documenting synchronization.

Siddhesh has the ball on this. Thanks for taking a first look, but
we're going to need to spend more time fixing this than I think you've
got :-(

Cheers,
Carlos.
Carlos O'Donell June 4, 2015, 4:56 p.m. UTC | #6
On 06/03/2015 05:42 PM, Alexandre Oliva wrote:
> On Jun  3, 2015, Siddhesh Poyarekar <siddhesh@redhat.com> wrote:
> 
>> Here's the core of my lack of understanding I guess - calling
>> __tls_get_addr for an IE variable.  Does that ever happen?
> 
> I guess it depends on what is meant by "IE variable".
> 
> I meant it as a TLS variable defined as part of the initial module set,
> i.e., the main executable and the transitive closure of its
> dependencies.
> 
> Now, if you understood it as a variable declared with the initial-exec
> tls_model attribute, I apologize for the confusion.

An IE variable is exactly a variable using initial-exec tls model.

There is no other definition that I am aware of.

Cheers,
Carlos.
Carlos O'Donell June 4, 2015, 4:58 p.m. UTC | #7
On 06/04/2015 01:38 AM, Siddhesh Poyarekar wrote:
>> The case to worry about is a static executable, that for whatever reason
>> doesn't get these variables linked in, but that dlopens some library
>> that depends on libc.so, which then brings these variables into the
>> running image.  If libc.so had these variables as IE, this dlopen could
>> fail.
> 
> This seems to make a case for not having any variables in libc.so as
> IE, yet a number of them already are.

In practice all IE variables should be removed from dynamic libraries
and be replaced with the more flexible TLS descriptors.

However, presently, we allocate extra static TLS image space and
surplus DTV entries to allow for some core coordinated set of DSOs
to be able to be loaded with static-TLS-using IE variables.

Making all core DSOs IE-free is a worthwhile but orthogonal problem.

Cheers,
Carlos.
Alexandre Oliva June 5, 2015, 4:23 a.m. UTC | #8
On Jun  4, 2015, "Carlos O'Donell" <carlos@redhat.com> wrote:

> I agree with Siddhesh and Torvald, it's not a good enough solution.

Again, good enough for what?

It is undisputably enough to fix the performance regression that in
certain cases introduced new deadlocks to the pile of other possible
deadlocks that variants of the testcase could exercise.  It fixes a
small problem, and it's a trivial patch.

Sure, it doesn't fix a much larger preexisting problem, that's being
misrepresented as a TLS issue, but that is actually a far more pervasive
dynamic loader problem.  I have no interest or time in tackling that.

Sure, if you guys want to leave the regression unfixed until someone
figures out some way to fix a larger but mostly unrelated problem, it's
your choice.

But if this larger fix does not contain the very change I'm proposing,
tls_get_addr on variables assigned static TLS will remain much slower
than needed, because there's absolutely no need to take a lock when it
is already decided whether the module should use static or dynamic TLS.

So what is this simple regression fix waiting for to get installed,
again?
Siddhesh Poyarekar June 5, 2015, 6:04 a.m. UTC | #9
On Fri, Jun 05, 2015 at 01:23:45AM -0300, Alexandre Oliva wrote:
> It is undisputably enough to fix the performance regression that in
> certain cases introduced new deadlocks to the pile of other possible
> deadlocks that variants of the testcase could exercise.  It fixes a
> small problem, and it's a trivial patch.
> 
> Sure, it doesn't fix a much larger preexisting problem, that's being
> misrepresented as a TLS issue, but that is actually a far more pervasive
> dynamic loader problem.  I have no interest or time in tackling that.
> 
> Sure, if you guys want to leave the regression unfixed until someone
> figures out some way to fix a larger but mostly unrelated problem, it's
> your choice.

No, IMO we should fix this regression in the interim and your approach
in general seems fine to me.  I intended to point out that it doesn't
fix the larger problem and I think we're on the same page on that.

I haven't reviewed the patch in detail nor your conversation with
Torvald yet, so I can't comment on whether Torvald and I agree,
although from a quick skimming of the thread ISTM that we're talking
about different things.

Siddhesh
Carlos O'Donell June 5, 2015, 6:14 p.m. UTC | #10
On 06/05/2015 12:23 AM, Alexandre Oliva wrote:
> On Jun  4, 2015, "Carlos O'Donell" <carlos@redhat.com> wrote:
> 
>> I agree with Siddhesh and Torvald, it's not a good enough solution.
> 
> Again, good enough for what?

The solution does not meet my standards for fixing atomicity issues.
It lacks proper documentation on the synchronization changes being made.

I've asked Siddhesh to look into this issue and document some of the
expectations and perhaps resolve some of the broader issues as well.
We have received real customer reports of at least 3 kinds of P&C
related failures in this code. So we are going to expand the nature
of the fix and see if there isn't an overall cleaner solution.

> It is undisputably enough to fix the performance regression that in
> certain cases introduced new deadlocks to the pile of other possible
> deadlocks that variants of the testcase could exercise.  It fixes a
> small problem, and it's a trivial patch.

I agree.
 
> Sure, it doesn't fix a much larger preexisting problem, that's being
> misrepresented as a TLS issue, but that is actually a far more pervasive
> dynamic loader problem.  I have no interest or time in tackling that.

I appreciate your honesty in expressing what you are able to do and what
you want to do.

> Sure, if you guys want to leave the regression unfixed until someone
> figures out some way to fix a larger but mostly unrelated problem, it's
> your choice.

Siddhesh and I will be working on it right now since more than one of the
problems are impacting real customer applications.

> But if this larger fix does not contain the very change I'm proposing,
> tls_get_addr on variables assigned static TLS will remain much slower
> than needed, because there's absolutely no need to take a lock when it
> is already decided whether the module should use static or dynamic TLS.

I agree there is no need to take the lock, but your patch needs to be
expanded in more detail to make it crystal clear exactly what
synchronization is being done and why.

> So what is this simple regression fix waiting for to get installed,
> again?

As of today only Fedora is using your previous patches which reveal these
older bugs, and until we branch for 2.22, this patch can wait while we
look into a more holistic fix.

Cheers,
Carlos.
Alexandre Oliva June 5, 2015, 7:18 p.m. UTC | #11
On Jun  5, 2015, "Carlos O'Donell" <carlos@redhat.com> wrote:

> The solution does not meet my standards for fixing atomicity issues.
> It lacks proper documentation on the synchronization changes being made.

Could it be because there aren't any synchronization changes being made?
Seriously, the changes, if any, were made in the previous patch, that
didn't meet such a storm of stonewalling and nitpicking.  This one just
fixes a tiny part of the change in the previous patch.

Why is it drawing so much attention?  Why are so different standards
being applied to this one?  It's not like our consensus has changed
since then.  It's not like the problem fixed by this patch is less
serious than the one fixed before.

>> But if this larger fix does not contain the very change I'm proposing,
>> tls_get_addr on variables assigned static TLS will remain much slower
>> than needed, because there's absolutely no need to take a lock when it
>> is already decided whether the module should use static or dynamic TLS.

> I agree there is no need to take the lock, but your patch needs to be
> expanded in more detail to make it crystal clear exactly what
> synchronization is being done and why.

  /* If the TLS block for the map is already assigned to dynamic, or
     to some static TLS offset, the decision is final, and no lock is
     required.  Now, if the decision hasn't been made, take the rtld
     lock, so that an ongoing dlopen gets a chance to complete,
     possibly assigning the module to static TLS and initializing the
     corresponding TLS area for all threads, and then retest; if the
     decision is still pending, force the module to dynamic TLS.

     The risk that the thread accesses an earlier value in that memory
     location, from before it was recycled into a link map in another
     thread, is removed by the need for some happens before
     relationship between the loader that set that up and the TLS
     access that referenced the module id.  In the presence of such a
     relationship, the value will be at least as recent as the
     initialization, and in its absence, calling tls_get_addr with its
     module id invokes undefined behavior.  */

Seriously, what's missing from this?  What's not crystal clear about it?

>> So what is this simple regression fix waiting for to get installed,
>> again?

> As of today only Fedora is using your previous patches which reveal these
> older bugs, and until we branch for 2.22, this patch can wait while we
> look into a more holistic fix.

And then you're going to have a single patch fixing two or more
different issues?  Isn't that against our conventions, that prefer
separate patches to address separate issues?
Carlos O'Donell June 5, 2015, 8:31 p.m. UTC | #12
On 06/05/2015 03:18 PM, Alexandre Oliva wrote:
> On Jun  5, 2015, "Carlos O'Donell" <carlos@redhat.com> wrote:
> 
>> The solution does not meet my standards for fixing atomicity issues.
>> It lacks proper documentation on the synchronization changes being made.
> 
> Could it be because there aren't any synchronization changes being made?
> Seriously, the changes, if any, were made in the previous patch, that
> didn't meet such a storm of stonewalling and nitpicking.  This one just
> fixes a tiny part of the change in the previous patch.
> 
> Why is it drawing so much attention?  Why are so different standards
> being applied to this one?  It's not like our consensus has changed
> since then.  It's not like the problem fixed by this patch is less
> serious than the one fixed before.

Different people review different patches.

It has drawn attention because it is impacting certain use cases
which I prioritize as needing to be fixed.

>>> But if this larger fix does not contain the very change I'm proposing,
>>> tls_get_addr on variables assigned static TLS will remain much slower
>>> than needed, because there's absolutely no need to take a lock when it
>>> is already decided whether the module should use static or dynamic TLS.
> 
>> I agree there is no need to take the lock, but your patch needs to be
>> expanded in more detail to make it crystal clear exactly what
>> synchronization is being done and why.
> 
>   /* If the TLS block for the map is already assigned to dynamic, or
>      to some static TLS offset, the decision is final, and no lock is
>      required.  Now, if the decision hasn't been made, take the rtld
>      lock, so that an ongoing dlopen gets a chance to complete,
>      possibly assigning the module to static TLS and initializing the
>      corresponding TLS area for all threads, and then retest; if the
>      decision is still pending, force the module to dynamic TLS.
> 
>      The risk that the thread accesses an earlier value in that memory
>      location, from before it was recycled into a link map in another
>      thread, is removed by the need for some happens before
>      relationship between the loader that set that up and the TLS
>      access that referenced the module id.  In the presence of such a
>      relationship, the value will be at least as recent as the
>      initialization, and in its absence, calling tls_get_addr with its
>      module id invokes undefined behavior.  */
> 
> Seriously, what's missing from this?  What's not crystal clear about it?

For synchronization issues I want to see comments at all the sites
of code that coordinate to provide the synchronization, followed by
a higher level comment explaining how the synchronization happens.

One comment that covers all sites and fails to explain why the
synchronization is correct is insufficient.

>>> So what is this simple regression fix waiting for to get installed,
>>> again?
> 
>> As of today only Fedora is using your previous patches which reveal these
>> older bugs, and until we branch for 2.22, this patch can wait while we
>> look into a more holistic fix.
> 
> And then you're going to have a single patch fixing two or more
> different issues?  Isn't that against our conventions, that prefer
> separate patches to address separate issues?

No. We are going to fix one issue per commit.

Cheers,
Carlos.
Torvald Riegel June 7, 2015, 8:48 p.m. UTC | #13
On Fri, 2015-06-05 at 16:18 -0300, Alexandre Oliva wrote:
> On Jun  5, 2015, "Carlos O'Donell" <carlos@redhat.com> wrote:
> 
> > The solution does not meet my standards for fixing atomicity issues.
> > It lacks proper documentation on the synchronization changes being made.
> 
> Could it be because there aren't any synchronization changes being made?
> Seriously, the changes, if any, were made in the previous patch,

I'm not aware of another one with synchronization bits in it.  I
probably simply didn't notice it.

> that
> didn't meet such a storm of stonewalling and nitpicking.  This one just
> fixes a tiny part of the change in the previous patch.
> 
> Why is it drawing so much attention?  Why are so different standards
> being applied to this one?

As a counter-example, I reviewed Szabolcs' Lazy TLSDESC relocation data
race fix at the same level of detail, and requested comments at the same
level of detail as for your patch.

If you feel that there are other patches that haven't been reviewed
closely enough, please mention them specifically so we can see whether
we need to improve the affected code.

> >> But if this larger fix does not contain the very change I'm proposing,
> >> tls_get_addr on variables assigned static TLS will remain much slower
> >> than needed, because there's absolutely no need to take a lock when it
> >> is already decided whether the module should use static or dynamic TLS.
> 
> > I agree there is no need to take the lock, but your patch needs to be
> > expanded in more detail to make it crystal clear exactly what
> > synchronization is being done and why.
> 
>   /* If the TLS block for the map is already assigned to dynamic, or
>      to some static TLS offset, the decision is final, and no lock is
>      required.  Now, if the decision hasn't been made, take the rtld
>      lock, so that an ongoing dlopen gets a chance to complete,
>      possibly assigning the module to static TLS and initializing the
>      corresponding TLS area for all threads,

That sounds like it is not just about reaching consensus on the final
value of l_tls_offset:  For example, you mention initialization in
there, which seems to indicate that there is some dependency on
happening after initialization (ie, wait for dlopen to complete
initialization); but elsewhere in the thread you say there are no such
constraints and it's really just consensus on just this value.

Also, it sounds as if any thread concurrent with the dlopen could only
actually create a concurrent access if dlopen has the lock already --
but that doesn't seem to be a requirement based on the "spin"
example/analogy you gave.

> and then retest; if the
>      decision is still pending, force the module to dynamic TLS.
> 
>      The risk that the thread accesses an earlier value in that memory
>      location, from before it was recycled into a link map in another
>      thread,

I wouldn't point out "earlier" here, but rather specifically address
which stores (eg, initialization) the code needs a happens-before for.

> is removed by the need for some happens before
>      relationship between the loader that set that up and the TLS
>      access that referenced the module id.  In the presence of such a
>      relationship, the value will be at least as recent as the
>      initialization, and in its absence, calling tls_get_addr with its
>      module id invokes undefined behavior.  */
> 
> Seriously, what's missing from this?  What's not crystal clear about it?

See above.  The lack of atomic accesses in the code also made this seem
inconsistent.
Alexandre Oliva June 9, 2015, 2:30 a.m. UTC | #14
On Jun  7, 2015, Torvald Riegel <triegel@redhat.com> wrote:

> That sounds like it is not just about reaching consensus on the final
> value of l_tls_offset:  For example, you mention initialization in
> there,

Yeah, we've already covered the relocation in the tls initialization
block, performed by the dynamic loader.  Why do we have to go in
circles?

> Also, it sounds as if any thread concurrent with the dlopen could only
> actually create a concurrent access if dlopen has the lock already --
> but that doesn't seem to be a requirement based on the "spin"
> example/analogy you gave.

There are two roles dlopen plays here:

1. loading the module that defines the TLS variable

2. applying TLS replications to modules that reference the TLS variable

It might turn out that 1. and 2. are performed under a single rtld lock
taking, and a subsequent dlopen could bring in additional references to
TLS variables.
Torvald Riegel June 9, 2015, 12:05 p.m. UTC | #15
On Mon, 2015-06-08 at 23:30 -0300, Alexandre Oliva wrote:
> On Jun  7, 2015, Torvald Riegel <triegel@redhat.com> wrote:
> 
> > That sounds like it is not just about reaching consensus on the final
> > value of l_tls_offset:  For example, you mention initialization in
> > there,
> 
> Yeah, we've already covered the relocation in the tls initialization
> block, performed by the dynamic loader.  Why do we have to go in
> circles?

Your comment, that you claimed was crystal-clear, seems inconsistent
with what you described elsewhere.  That's the reason.

> > Also, it sounds as if any thread concurrent with the dlopen could only
> > actually create a concurrent access if dlopen has the lock already --
> > but that doesn't seem to be a requirement based on the "spin"
> > example/analogy you gave.
> 
> There are two roles dlopen plays here:
> 
> 1. loading the module that defines the TLS variable
> 
> 2. applying TLS replications to modules that reference the TLS variable
> 
> It might turn out that 1. and 2. are performed under a single rtld lock
> taking, and a subsequent dlopen could bring in additional references to
> TLS variables.
> 

I can't follow you; I don't see how it addresses my comment.  But given
that you'll stop working on this patch anywhere and have handed it off
to Siddhesh, I'll discuss with Siddhesh.
Alexandre Oliva June 9, 2015, 10:32 p.m. UTC | #16
On Jun  9, 2015, Torvald Riegel <triegel@redhat.com> wrote:

> On Mon, 2015-06-08 at 23:30 -0300, Alexandre Oliva wrote:
>> On Jun  7, 2015, Torvald Riegel <triegel@redhat.com> wrote:

>> > Also, it sounds as if any thread concurrent with the dlopen could only
>> > actually create a concurrent access if dlopen has the lock already --
>> > but that doesn't seem to be a requirement based on the "spin"
>> > example/analogy you gave.
>> 
>> There are two roles dlopen plays here:
>> 
>> 1. loading the module that defines the TLS variable
>> 
>> 2. applying TLS replications to modules that reference the TLS variable
>> 
>> It might turn out that 1. and 2. are performed under a single rtld lock
>> taking, and a subsequent dlopen could bring in additional references to
>> TLS variables.

> I can't follow you; I don't see how it addresses my comment.

rtld's role 1 is analogous to init in the electron spin example.

rtld's role 2 is analogous to make_positive in the electron spin
example.

In some cases, rtld roles 1 and 2 are performed without releasing the
lock.  In this case, no concurrent accesses are possible, because nobody
else can have the pointer returned by init, to pass to the other
functions, before init returns.

Now, in case rtld's role 2 is performed separately, as in a subsequent
dlopen (analogous to make_positive), there may be concurrent
tls_get_addr accesses (analogous to concurrent positive_p calls).  At
this point, role 1 has already released the lock and set up the module
defining the TLS variable and possibly processed some dynamic TLS access
model relocations (analogous to init returning), so there is room for
concurrency between those accesses and a subsequent dlopen, that may or
may not require the variable to be placed in static TLS.

Does this clarify how my response addresses your comment?
diff mbox

Patch

diff --git a/nptl/tst-join7mod.c b/nptl/tst-join7mod.c
index a8c7bc0..a066a1f 100644
--- a/nptl/tst-join7mod.c
+++ b/nptl/tst-join7mod.c
@@ -4,12 +4,15 @@ 
 static pthread_t th;
 static int running = 1;
 
+static __thread int foo;
+
 static void *
 test_run (void *p)
 {
   while (running)
     fprintf (stderr, "XXX test_run\n");
   fprintf (stderr, "XXX test_run FINISHED\n");
+  foo = 42;
   return NULL;
 }