diff mbox

Partially improve scalability of the unwinder (PR libgcc/71744)

Message ID 20160915072249.GU7282@tucnak.redhat.com
State New
Headers show

Commit Message

Jakub Jelinek Sept. 15, 2016, 7:22 a.m. UTC
Hi!

These days on many targets that use dl_iterate_phdr to find .eh_frame_hdr
that way in most of the programs the old style EH registry is never used,
yet we still lock a global mutex and unlock it soon afterwards to find out
it is the case.

This patch adds a fast path to that, by replacing that with an atomic load
of a flag "has anything ever been registered in the old style registry"
and if the flag says no, it avoids the locking.

Of course, there is another locking in dl_iterate_phdr, which will need
agreement and cooperation in between the glibc and GCC projects, so this
isn't a full solution, but a path towards it.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-09-15  Jakub Jelinek  <jakub@redhat.com>

	PR libgcc/71744
	* unwind-dw2-fde.c (ATOMIC_FDE_FAST_PATH): Define if __register_frame*
	is not the primary registry and atomics are available.
	(any_objects_registered): New variable.
	(__register_frame_info_bases, __register_frame_info_table_bases):
	Atomically store 1 to any_objects_registered after registering first
	unwind info.
	(_Unwind_Find_FDE): Return early if any_objects_registered is 0.


	Jakub

Comments

Ian Lance Taylor Sept. 15, 2016, 1:05 p.m. UTC | #1
Jakub Jelinek <jakub@redhat.com> writes:

> 2016-09-15  Jakub Jelinek  <jakub@redhat.com>
>
> 	PR libgcc/71744
> 	* unwind-dw2-fde.c (ATOMIC_FDE_FAST_PATH): Define if __register_frame*
> 	is not the primary registry and atomics are available.
> 	(any_objects_registered): New variable.
> 	(__register_frame_info_bases, __register_frame_info_table_bases):
> 	Atomically store 1 to any_objects_registered after registering first
> 	unwind info.
> 	(_Unwind_Find_FDE): Return early if any_objects_registered is 0.

This is OK.

> +#ifdef ATOMIC_FDE_FAST_PATH
> +  /* For targets where unwind info is usually not registered through these
> +     APIs anymore, avoid taking a global lock.  */
> +  if (__builtin_expect (!__atomic_load_n (&any_objects_registered,
> +					  __ATOMIC_ACQUIRE), 1))
> +    return NULL;
> +#endif
> +
>    init_object_mutex_once ();
>    __gthread_mutex_lock (&object_mutex);

I doubt it matters, but I don't think you need to use __ATOMIC_ACQUIRE
in the atomic_load_n.  You could use __ATOMIC_RELAXED.  Acquiring the
mutex is going to enforce cross-thread sequential consistency anyhow.

Thanks.

Ian
Torvald Riegel Sept. 16, 2016, 6:40 p.m. UTC | #2
On Thu, 2016-09-15 at 09:22 +0200, Jakub Jelinek wrote:
> Hi!
> 
> These days on many targets that use dl_iterate_phdr to find .eh_frame_hdr
> that way in most of the programs the old style EH registry is never used,
> yet we still lock a global mutex and unlock it soon afterwards to find out
> it is the case.
> 
> This patch adds a fast path to that, by replacing that with an atomic load
> of a flag "has anything ever been registered in the old style registry"
> and if the flag says no, it avoids the locking.

I think this needs a more detailed reasoning about the synchronization,
and comments in the code that explain this reasoning.  For example,
there are no comments why you need the acquire/release pair, nor what
the requirements or external synchronization is that would make it
sufficient to use relaxed memory order.

The current code with the critical sections seems to only ensure that
_Unwind_Find_FDE observes calls to  __register_frame_info_bases or
__register_frame_info_table_bases if those already happen-before
(referring to the so-called relation in the memory model) the
_Unwind_Find_FDE call.  This happen-before can be due to other
synchronization or sequenced-before.
If that is all you need, relaxed MO would be sufficient, but then this
should get documented.  If that's not all that you need, then what the
patch adds seems insufficient (as well as the existing code).

The acquire/release pair you currently have in the patch would ensure
that the changes to unseen_objects are observed if
any_objects_registered is either 0 or 1; if 1, you use the critical
section, so the critical section that stores the 1 that you read from
would happen-before the _Unwind_Find_FDE critical section anyway.  If
one observes 0, I guess you don't have any invariant wrt other stores
(that the caller would have to observe too).

Also, it might be possible that the critical section is required to
serve as a fence wrt something else, under some conditions.  But that
would be a weird way to synchronize, so I guess that wouldn't be the
case.  OTOH, I don't know enough about the callers, so maybe it's worth
considering ;)
Torvald Riegel Sept. 16, 2016, 6:47 p.m. UTC | #3
On Thu, 2016-09-15 at 06:05 -0700, Ian Lance Taylor wrote:
> Jakub Jelinek <jakub@redhat.com> writes:
> 
> > 2016-09-15  Jakub Jelinek  <jakub@redhat.com>
> >
> > 	PR libgcc/71744
> > 	* unwind-dw2-fde.c (ATOMIC_FDE_FAST_PATH): Define if __register_frame*
> > 	is not the primary registry and atomics are available.
> > 	(any_objects_registered): New variable.
> > 	(__register_frame_info_bases, __register_frame_info_table_bases):
> > 	Atomically store 1 to any_objects_registered after registering first
> > 	unwind info.
> > 	(_Unwind_Find_FDE): Return early if any_objects_registered is 0.
> 
> This is OK.

In glibc, we have a rule that we add sufficient code comments for glibc.
I'm not in the position to set rules for GCC, but I think this would
help here too.  Trying to explain in comments why a certain memory order
is used would have at least brought up the issue you mention below.

> > +#ifdef ATOMIC_FDE_FAST_PATH
> > +  /* For targets where unwind info is usually not registered through these
> > +     APIs anymore, avoid taking a global lock.  */
> > +  if (__builtin_expect (!__atomic_load_n (&any_objects_registered,
> > +					  __ATOMIC_ACQUIRE), 1))
> > +    return NULL;
> > +#endif
> > +
> >    init_object_mutex_once ();
> >    __gthread_mutex_lock (&object_mutex);
> 
> I doubt it matters, but I don't think you need to use __ATOMIC_ACQUIRE
> in the atomic_load_n.  You could use __ATOMIC_RELAXED.  Acquiring the
> mutex is going to enforce cross-thread sequential consistency anyhow.

But then the release MO wouldn't be required either.  The
__gthread_mutex_lock call has no synchronizes-with relation with the
release MO stores the patch adds, if that was what you were assuming.

Note that in the C11 / C++11 memory model, lock acquisitions are not
considered to automatically also be acquire MO fences.  This may be the
case on many archs, but it's not something the memory model guarantees.
diff mbox

Patch

--- libgcc/unwind-dw2-fde.c.jj	2016-01-04 15:14:09.000000000 +0100
+++ libgcc/unwind-dw2-fde.c	2016-07-07 13:25:39.248845579 +0200
@@ -35,6 +35,11 @@  see the files COPYING3 and COPYING.RUNTI
 #include "unwind-pe.h"
 #include "unwind-dw2-fde.h"
 #include "gthr.h"
+#else
+#if (defined(__GTHREAD_MUTEX_INIT) || defined(__GTHREAD_MUTEX_INIT_FUNCTION)) \
+    && defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4)
+#define ATOMIC_FDE_FAST_PATH 1
+#endif
 #endif
 
 /* The unseen_objects list contains objects that have been registered
@@ -43,6 +48,9 @@  see the files COPYING3 and COPYING.RUNTI
    by decreasing value of pc_begin.  */
 static struct object *unseen_objects;
 static struct object *seen_objects;
+#ifdef ATOMIC_FDE_FAST_PATH
+static int any_objects_registered;
+#endif
 
 #ifdef __GTHREAD_MUTEX_INIT
 static __gthread_mutex_t object_mutex = __GTHREAD_MUTEX_INIT;
@@ -96,6 +104,10 @@  __register_frame_info_bases (const void
 
   ob->next = unseen_objects;
   unseen_objects = ob;
+#ifdef ATOMIC_FDE_FAST_PATH
+  if (!any_objects_registered)
+    __atomic_store_n (&any_objects_registered, 1, __ATOMIC_RELEASE);
+#endif
 
   __gthread_mutex_unlock (&object_mutex);
 }
@@ -140,6 +152,10 @@  __register_frame_info_table_bases (void
 
   ob->next = unseen_objects;
   unseen_objects = ob;
+#ifdef ATOMIC_FDE_FAST_PATH
+  if (!any_objects_registered)
+    __atomic_store_n (&any_objects_registered, 1, __ATOMIC_RELEASE);
+#endif
 
   __gthread_mutex_unlock (&object_mutex);
 }
@@ -1001,6 +1017,14 @@  _Unwind_Find_FDE (void *pc, struct dwarf
   struct object *ob;
   const fde *f = NULL;
 
+#ifdef ATOMIC_FDE_FAST_PATH
+  /* For targets where unwind info is usually not registered through these
+     APIs anymore, avoid taking a global lock.  */
+  if (__builtin_expect (!__atomic_load_n (&any_objects_registered,
+					  __ATOMIC_ACQUIRE), 1))
+    return NULL;
+#endif
+
   init_object_mutex_once ();
   __gthread_mutex_lock (&object_mutex);