diff mbox

do not take mutex in _Unwind_Find_registered_FDE if there are no registered objects

Message ID 20160725134453.GL1018@scylladb.com
State New

Commit Message

Gleb Natapov July 25, 2016, 1:44 p.m. UTC
_Unwind_Find_FDE calls _Unwind_Find_registered_FDE, which takes the lock even
when there are no registered objects. As far as I can see, only statically
linked applications call the __register_frame_info* functions, so for
dynamically linked executables taking the lock just to check unseen_objects
and seen_objects is a pessimization. Since the function is called on
each thrown exception, this adds up to a lot of unneeded locking.  This patch
checks unseen_objects and seen_objects outside of the lock and returns
early if both are NULL.

--
			Gleb.

Comments

Jeff Law July 27, 2016, 11:12 p.m. UTC | #1
On 07/25/2016 07:44 AM, Gleb Natapov wrote:
> _Unwind_Find_FDE calls _Unwind_Find_registered_FDE and it takes lock even
> when there is no registered objects. As far as I see only statically
> linked applications call __register_frame_info* functions, so for
> dynamically linked executables taking the lock to check unseen_objects
> and seen_objects is a pessimization. Since the function is called on
> each thrown exception this is a lot of unneeded locking.  This patch
> checks unseen_objects and seen_objects outside of the lock and returns
> earlier if both are NULL.
>
> diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
> index 5b16a1f..41de746 100644
> --- a/libgcc/unwind-dw2-fde.c
> +++ b/libgcc/unwind-dw2-fde.c
> @@ -1001,6 +1001,13 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
>    struct object *ob;
>    const fde *f = NULL;
>
> +  /* __atomic_write is not used to modify unseen_objects and seen_objects
> +     since they are modified in locked sections only and unlock provides
> +     release semantics. */
> +  if (!__atomic_load_n(&unseen_objects, __ATOMIC_ACQUIRE)
> +      && !__atomic_load_n(&seen_objects, __ATOMIC_ACQUIRE))
> +      return NULL;
As Andrew noted, this might be bad on targets that don't have 
atomics.  For those we could easily end up inside a libfunc, which would 
be unfortunate.  Certain mips/arm variants come to mind here.

For targets that don't have atomics or any kind of synchronization 
libfunc, we'll emit nop-asm-barriers to at least stop the optimizers 
from munging things too badly.

It's been a couple of years since I've really thought about these kinds of 
synchronization issues -- is it really safe on a weakly ordered 
processor to rely on the mutex lock/unlock of "object_mutex" to 
order the loads/stores of "unseen_objects" and "seen_objects"?



Jeff
Gleb Natapov July 28, 2016, 4:24 a.m. UTC | #2
On Wed, Jul 27, 2016 at 05:12:18PM -0600, Jeff Law wrote:
> On 07/25/2016 07:44 AM, Gleb Natapov wrote:
> > _Unwind_Find_FDE calls _Unwind_Find_registered_FDE and it takes lock even
> > when there is no registered objects. As far as I see only statically
> > linked applications call __register_frame_info* functions, so for
> > dynamically linked executables taking the lock to check unseen_objects
> > and seen_objects is a pessimization. Since the function is called on
> > each thrown exception this is a lot of unneeded locking.  This patch
> > checks unseen_objects and seen_objects outside of the lock and returns
> > earlier if both are NULL.
> > 
> > diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
> > index 5b16a1f..41de746 100644
> > --- a/libgcc/unwind-dw2-fde.c
> > +++ b/libgcc/unwind-dw2-fde.c
> > @@ -1001,6 +1001,13 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
> >    struct object *ob;
> >    const fde *f = NULL;
> > 
> > +  /* __atomic_write is not used to modify unseen_objects and seen_objects
> > +     since they are modified in locked sections only and unlock provides
> > +     release semantics. */
> > +  if (!__atomic_load_n(&unseen_objects, __ATOMIC_ACQUIRE)
> > +      && !__atomic_load_n(&seen_objects, __ATOMIC_ACQUIRE))
> > +      return NULL;
> As as Andrew noted, this might be bad on targets that don't have atomics.
> For those we could easily end up inside a libfunc which would be
> unfortunate.  Certain mips/arm variants come to mind here.
> 
> For targets that don't have atomics or any kind of synchronization libfunc,
> we'll emit nop-asm-barriers to at least stop the optimizers from munging
> things too badly.
> 
> It's been a couple years since I've really thought about these kinds of
> synchronization issues -- is it really safe in a weakly ordered processor to
> rely on the mutex lock/unlock of the "object_mutex" to order the
> loads/stores of "unseen_objects" and "seen_objects"?
> 
I am pretty sure it is. After a mutex unlock, another CPU will not see
the old value (provided it uses acquire semantics to read it).

But when I wrote the patch I did not notice that Jakub had already written
one that addresses the same issue and avoids both of your concerns. It can
be found here: https://gcc.gnu.org/bugzilla/attachment.cgi?id=38852. Can we
apply his version?


--
			Gleb.
Jeff Law July 29, 2016, 4:03 p.m. UTC | #3
On 07/27/2016 10:24 PM, Gleb Natapov wrote:
> On Wed, Jul 27, 2016 at 05:12:18PM -0600, Jeff Law wrote:
>> On 07/25/2016 07:44 AM, Gleb Natapov wrote:
>>> _Unwind_Find_FDE calls _Unwind_Find_registered_FDE and it takes lock even
>>> when there is no registered objects. As far as I see only statically
>>> linked applications call __register_frame_info* functions, so for
>>> dynamically linked executables taking the lock to check unseen_objects
>>> and seen_objects is a pessimization. Since the function is called on
>>> each thrown exception this is a lot of unneeded locking.  This patch
>>> checks unseen_objects and seen_objects outside of the lock and returns
>>> earlier if both are NULL.
>>>
>>> diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
>>> index 5b16a1f..41de746 100644
>>> --- a/libgcc/unwind-dw2-fde.c
>>> +++ b/libgcc/unwind-dw2-fde.c
>>> @@ -1001,6 +1001,13 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
>>>    struct object *ob;
>>>    const fde *f = NULL;
>>>
>>> +  /* __atomic_write is not used to modify unseen_objects and seen_objects
>>> +     since they are modified in locked sections only and unlock provides
>>> +     release semantics. */
>>> +  if (!__atomic_load_n(&unseen_objects, __ATOMIC_ACQUIRE)
>>> +      && !__atomic_load_n(&seen_objects, __ATOMIC_ACQUIRE))
>>> +      return NULL;
>> As as Andrew noted, this might be bad on targets that don't have atomics.
>> For those we could easily end up inside a libfunc which would be
>> unfortunate.  Certain mips/arm variants come to mind here.
>>
>> For targets that don't have atomics or any kind of synchronization libfunc,
>> we'll emit nop-asm-barriers to at least stop the optimizers from munging
>> things too badly.
>>
>> It's been a couple years since I've really thought about these kinds of
>> synchronization issues -- is it really safe in a weakly ordered processor to
>> rely on the mutex lock/unlock of the "object_mutex" to order the
>> loads/stores of "unseen_objects" and "seen_objects"?
>>
> I am pretty sure it is. After mutex unlock another cpu will not see
> old value (provided it uses acquire semantics to read value).
>
> But when I wrote the patch I did not notice that Jakub already wrote one
> that address the same issue and avoids both of your concerns. It can be
> found here: https://gcc.gnu.org/bugzilla/attachment.cgi?id=38852. Can we
> apply his version?
Jakub is on PTO for another week.  I'm sure he'll run that issue through 
to completion after he returns.

I'll table your patch on that assumption.
jeff
Jeff Law July 29, 2016, 4:54 p.m. UTC | #4
On 07/29/2016 10:03 AM, Jeff Law wrote:
>>> It's been a couple years since I've really thought about these kinds of
>>> synchronization issues -- is it really safe in a weakly ordered
>>> processor to
>>> rely on the mutex lock/unlock of the "object_mutex" to order the
>>> loads/stores of "unseen_objects" and "seen_objects"?
>>>
>> I am pretty sure it is. After mutex unlock another cpu will not see
>> old value (provided it uses acquire semantics to read value).
Just a followup because this has been bugging me...

Release/acquire provides synchronization between threads releasing and 
acquiring the same atomic variable.  A mutex implemented using 
acquire/release semantics would provide a memory ordering of accesses to 
the mutex variable itself, which is used to negotiate entry to and exit 
from critical sections.

I don't see how using a mutex on "object_mutex" gives us an ordering on 
"seen_objects" and "unseen_objects" -- unless the mutex under the hood 
does a full memory barrier/fence, which AFAIU is not required.

This is a bit moot since we're likely going to go with Jakub's patch...

jeff
Gleb Natapov July 29, 2016, 5:19 p.m. UTC | #5
On Fri, Jul 29, 2016 at 10:03:53AM -0600, Jeff Law wrote:
> On 07/27/2016 10:24 PM, Gleb Natapov wrote:
> > On Wed, Jul 27, 2016 at 05:12:18PM -0600, Jeff Law wrote:
> > > On 07/25/2016 07:44 AM, Gleb Natapov wrote:
> > > > _Unwind_Find_FDE calls _Unwind_Find_registered_FDE and it takes lock even
> > > > when there is no registered objects. As far as I see only statically
> > > > linked applications call __register_frame_info* functions, so for
> > > > dynamically linked executables taking the lock to check unseen_objects
> > > > and seen_objects is a pessimization. Since the function is called on
> > > > each thrown exception this is a lot of unneeded locking.  This patch
> > > > checks unseen_objects and seen_objects outside of the lock and returns
> > > > earlier if both are NULL.
> > > > 
> > > > diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
> > > > index 5b16a1f..41de746 100644
> > > > --- a/libgcc/unwind-dw2-fde.c
> > > > +++ b/libgcc/unwind-dw2-fde.c
> > > > @@ -1001,6 +1001,13 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
> > > >    struct object *ob;
> > > >    const fde *f = NULL;
> > > > 
> > > > +  /* __atomic_write is not used to modify unseen_objects and seen_objects
> > > > +     since they are modified in locked sections only and unlock provides
> > > > +     release semantics. */
> > > > +  if (!__atomic_load_n(&unseen_objects, __ATOMIC_ACQUIRE)
> > > > +      && !__atomic_load_n(&seen_objects, __ATOMIC_ACQUIRE))
> > > > +      return NULL;
> > > As as Andrew noted, this might be bad on targets that don't have atomics.
> > > For those we could easily end up inside a libfunc which would be
> > > unfortunate.  Certain mips/arm variants come to mind here.
> > > 
> > > For targets that don't have atomics or any kind of synchronization libfunc,
> > > we'll emit nop-asm-barriers to at least stop the optimizers from munging
> > > things too badly.
> > > 
> > > It's been a couple years since I've really thought about these kinds of
> > > synchronization issues -- is it really safe in a weakly ordered processor to
> > > rely on the mutex lock/unlock of the "object_mutex" to order the
> > > loads/stores of "unseen_objects" and "seen_objects"?
> > > 
> > I am pretty sure it is. After mutex unlock another cpu will not see
> > old value (provided it uses acquire semantics to read value).
> > 
> > But when I wrote the patch I did not notice that Jakub already wrote one
> > that address the same issue and avoids both of your concerns. It can be
> > found here: https://gcc.gnu.org/bugzilla/attachment.cgi?id=38852. Can we
> > apply his version?
> Jakub is on PTO for another week.  I'm sure he'll run that issue through to
> completion after he returns.
> 
> I'll table your patch on that assumption.
OK. Can you ping him about it after he is back, or should I (although in
another week I will be on PTO :))? We are suffering from unwind
scalability problems with C++ exceptions, and although this one patch will
not solve the whole issue, it will leave only one lock (in dl_iterate_phdr)
to deal with.

--
			Gleb.
Jeff Law July 29, 2016, 5:22 p.m. UTC | #6
On 07/29/2016 11:19 AM, Gleb Natapov wrote:

>> I'll table your patch on that assumption.
> OK. Can you ping him about it after he will back or should I (although in
> another week I will be on PTO :))? We are suffering from unwind
> scalability problems in C++ exceptions and although this one patch will
> not solve the issue it will leave only one lock in dl_iterate_phdr to
> deal with.
Jakub is quite good about monitoring his in-progress patches.  I'm 
confident he'll address it after he returns.

jeff
Gleb Natapov July 29, 2016, 5:23 p.m. UTC | #7
On Fri, Jul 29, 2016 at 10:54:21AM -0600, Jeff Law wrote:
> On 07/29/2016 10:03 AM, Jeff Law wrote:
> > > > It's been a couple years since I've really thought about these kinds of
> > > > synchronization issues -- is it really safe in a weakly ordered
> > > > processor to
> > > > rely on the mutex lock/unlock of the "object_mutex" to order the
> > > > loads/stores of "unseen_objects" and "seen_objects"?
> > > > 
> > > I am pretty sure it is. After mutex unlock another cpu will not see
> > > old value (provided it uses acquire semantics to read value).
> Just a followup because this has been bugging me...
> 
> Release/acquire provides synchronization between threads releasing and
> acquiring the same atomic variable.  A mutex, implemented using
> acquire/release semantics would provide a memory ordering of accesses to
> the mutex variable which is used to negotiate entry/exits to critical
> sections.
> 
> I don't see how using a mutex on "object_mutex" gives us an ordering on
> "seen_objects" and "unseen objects" -- unless the mutex under the hood does
> a full memory barrier/fence, which AFAIU is not required.
> 
But a mutex has to make sure that no write leaks out of the critical
section, not only writes to the mutex variable itself.

> This is a bit moot since we're likely going to go with Jakub's patch...
> 
Still an interesting topic to understand, though. Jakub's patch checks only
one variable in the common case, so in that regard it's even better for my
use case.

--
			Gleb.
Gleb Natapov Aug. 21, 2016, 12:13 p.m. UTC | #8
On Fri, Jul 29, 2016 at 11:22:18AM -0600, Jeff Law wrote:
> On 07/29/2016 11:19 AM, Gleb Natapov wrote:
> 
> > > I'll table your patch on that assumption.
> > OK. Can you ping him about it after he will back or should I (although in
> > another week I will be on PTO :))? We are suffering from unwind
> > scalability problems in C++ exceptions and although this one patch will
> > not solve the issue it will leave only one lock in dl_iterate_phdr to
> > deal with.
> Jakub is quite good about monitoring his in-progress patches.  I'm confident
> he'll address it after he returns.
> 
Jakub, is there a reason to not apply https://gcc.gnu.org/bugzilla/attachment.cgi?id=38852?

--
			Gleb.

Patch

diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
index 5b16a1f..41de746 100644
--- a/libgcc/unwind-dw2-fde.c
+++ b/libgcc/unwind-dw2-fde.c
@@ -1001,6 +1001,13 @@  _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
   struct object *ob;
   const fde *f = NULL;
 
+  /* __atomic_write is not used to modify unseen_objects and seen_objects
+     since they are modified in locked sections only and unlock provides
+     release semantics. */
+  if (!__atomic_load_n(&unseen_objects, __ATOMIC_ACQUIRE)
+      && !__atomic_load_n(&seen_objects, __ATOMIC_ACQUIRE))
+      return NULL;
+
   init_object_mutex_once ();
   __gthread_mutex_lock (&object_mutex);
 
@@ -1020,8 +1027,8 @@  _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
   while ((ob = unseen_objects))
     {
       struct object **p;
+      struct object *next = ob->next;
 
-      unseen_objects = ob->next;
       f = search_object (ob, pc);
 
       /* Insert the object into the classified list.  */
@@ -1031,6 +1038,8 @@  _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
       ob->next = *p;
       *p = ob;
 
+      unseen_objects = next;
+
       if (f)
 	goto fini;
     }