Message ID | 20160915072249.GU7282@tucnak.redhat.com |
---|---|
State | New |
Headers | show |
Jakub Jelinek <jakub@redhat.com> writes: > 2016-09-15 Jakub Jelinek <jakub@redhat.com> > > PR libgcc/71744 > * unwind-dw2-fde.c (ATOMIC_FDE_FAST_PATH): Define if __register_frame* > is not the primary registry and atomics are available. > (any_objects_registered): New variable. > (__register_frame_info_bases, __register_frame_info_table_bases): > Atomically store 1 to any_objects_registered after registering first > unwind info. > (_Unwind_Find_FDE): Return early if any_objects_registered is 0. This is OK. > +#ifdef ATOMIC_FDE_FAST_PATH > + /* For targets where unwind info is usually not registered through these > + APIs anymore, avoid taking a global lock. */ > + if (__builtin_expect (!__atomic_load_n (&any_objects_registered, > + __ATOMIC_ACQUIRE), 1)) > + return NULL; > +#endif > + > init_object_mutex_once (); > __gthread_mutex_lock (&object_mutex); I doubt it matters, but I don't think you need to use __ATOMIC_ACQUIRE in the atomic_load_n. You could use __ATOMIC_RELAXED. Acquiring the mutex is going to enforce cross-thread sequential consistency anyhow. Thanks. Ian
On Thu, 2016-09-15 at 09:22 +0200, Jakub Jelinek wrote: > Hi! > > These days on many targets that use dl_iterate_phdr to find .eh_frame_hdr > that way in most of the programs the old style EH registry is never used, > yet we still lock a global mutex and unlock it soon afterwards to find out > it is the case. > > This patch adds a fast path to that, by replacing that with an atomic load > of a flag "has anything ever been registered in the old style registry" > and if the flag says no, it avoids the locking. I think this needs a more detailed reasoning about the synchronization, and comments in the code that explain this reasoning. For example, there are no comments why you need the acquire/release pair, nor what the requirements or external synchronization is that would make it sufficient to use relaxed memory order. The current code with the critical sections seems to only ensure that _Unwind_Find_FDE observes calls to __register_frame_info_bases or __register_frame_info_table_bases if those already happen-before (referring to the so-called relation in the memory model) the _Unwind_Find_FDE call. This happen-before can be due to other synchronization or sequenced-before. If that is all you need, relaxed MO would be sufficient, but then this should get documented. If that's not all that you need, then what the patch adds seems insufficient (as well as the existing code). The acquire/release pair you currently have in the patch would ensure that the changes to unseen_objects are observed if any_objects_registered is either 0 or 1; if 1, you use the critical section, so the critical section that stores the 1 that you read from would happen-before the _Unwind_Find_FDE critical section anyway. If one observes 0, I guess you don't have any invariant wrt other stores (that the caller would have to observe too). Also, it might be possible that the critical section is required to serve as a fence wrt something else, under some conditions. But that would be a weird way to synchronize, so I guess that wouldn't be the case. OTOH, I don't know enough about the callers, so maybe it's worth considering ;)
On Thu, 2016-09-15 at 06:05 -0700, Ian Lance Taylor wrote: > Jakub Jelinek <jakub@redhat.com> writes: > > > 2016-09-15 Jakub Jelinek <jakub@redhat.com> > > > > PR libgcc/71744 > > * unwind-dw2-fde.c (ATOMIC_FDE_FAST_PATH): Define if __register_frame* > > is not the primary registry and atomics are available. > > (any_objects_registered): New variable. > > (__register_frame_info_bases, __register_frame_info_table_bases): > > Atomically store 1 to any_objects_registered after registering first > > unwind info. > > (_Unwind_Find_FDE): Return early if any_objects_registered is 0. > > This is OK. In glibc, we have a rule that we add sufficient code comments for glibc. I'm not in the position to set rules for GCC, but I think this would help here too. Trying to explain in comments why a certain memory order is used would have at least brought up the issue you mention below. > > +#ifdef ATOMIC_FDE_FAST_PATH > > + /* For targets where unwind info is usually not registered through these > > + APIs anymore, avoid taking a global lock. */ > > + if (__builtin_expect (!__atomic_load_n (&any_objects_registered, > > + __ATOMIC_ACQUIRE), 1)) > > + return NULL; > > +#endif > > + > > init_object_mutex_once (); > > __gthread_mutex_lock (&object_mutex); > > I doubt it matters, but I don't think you need to use __ATOMIC_ACQUIRE > in the atomic_load_n. You could use __ATOMIC_RELAXED. Acquiring the > mutex is going to enforce cross-thread sequential consistency anyhow. But then the release MO wouldn't be required either. The __gthread_mutex_lock call has no synchronizes-with relation with the release MO stores the patch adds, if that was what you were assuming. Note that in the C11 / C++11 memory model, lock acquisitions are not considered to automatically also be acquire MO fences. This may be the case on many archs, but it's not something the memory model guarantees.
--- libgcc/unwind-dw2-fde.c.jj 2016-01-04 15:14:09.000000000 +0100 +++ libgcc/unwind-dw2-fde.c 2016-07-07 13:25:39.248845579 +0200 @@ -35,6 +35,11 @@ see the files COPYING3 and COPYING.RUNTI #include "unwind-pe.h" #include "unwind-dw2-fde.h" #include "gthr.h" +#else +#if (defined(__GTHREAD_MUTEX_INIT) || defined(__GTHREAD_MUTEX_INIT_FUNCTION)) \ + && defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4) +#define ATOMIC_FDE_FAST_PATH 1 +#endif #endif /* The unseen_objects list contains objects that have been registered @@ -43,6 +48,9 @@ see the files COPYING3 and COPYING.RUNTI by decreasing value of pc_begin. */ static struct object *unseen_objects; static struct object *seen_objects; +#ifdef ATOMIC_FDE_FAST_PATH +static int any_objects_registered; +#endif #ifdef __GTHREAD_MUTEX_INIT static __gthread_mutex_t object_mutex = __GTHREAD_MUTEX_INIT; @@ -96,6 +104,10 @@ __register_frame_info_bases (const void ob->next = unseen_objects; unseen_objects = ob; +#ifdef ATOMIC_FDE_FAST_PATH + if (!any_objects_registered) + __atomic_store_n (&any_objects_registered, 1, __ATOMIC_RELEASE); +#endif __gthread_mutex_unlock (&object_mutex); } @@ -140,6 +152,10 @@ __register_frame_info_table_bases (void ob->next = unseen_objects; unseen_objects = ob; +#ifdef ATOMIC_FDE_FAST_PATH + if (!any_objects_registered) + __atomic_store_n (&any_objects_registered, 1, __ATOMIC_RELEASE); +#endif __gthread_mutex_unlock (&object_mutex); } @@ -1001,6 +1017,14 @@ _Unwind_Find_FDE (void *pc, struct dwarf struct object *ob; const fde *f = NULL; +#ifdef ATOMIC_FDE_FAST_PATH + /* For targets where unwind info is usually not registered through these + APIs anymore, avoid taking a global lock. */ + if (__builtin_expect (!__atomic_load_n (&any_objects_registered, + __ATOMIC_ACQUIRE), 1)) + return NULL; +#endif + init_object_mutex_once (); __gthread_mutex_lock (&object_mutex);