Message ID: 20160725134453.GL1018@scylladb.com
State: New
On 07/25/2016 07:44 AM, Gleb Natapov wrote:
> _Unwind_Find_FDE calls _Unwind_Find_registered_FDE and it takes the lock even
> when there are no registered objects. As far as I can see, only statically
> linked applications call the __register_frame_info* functions, so for
> dynamically linked executables taking the lock to check unseen_objects
> and seen_objects is a pessimization. Since the function is called on
> each thrown exception, this is a lot of unneeded locking. This patch
> checks unseen_objects and seen_objects outside of the lock and returns
> earlier if both are NULL.
>
> diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
> index 5b16a1f..41de746 100644
> --- a/libgcc/unwind-dw2-fde.c
> +++ b/libgcc/unwind-dw2-fde.c
> @@ -1001,6 +1001,13 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
>    struct object *ob;
>    const fde *f = NULL;
>
> +  /* __atomic_write is not used to modify unseen_objects and seen_objects
> +     since they are modified in locked sections only and unlock provides
> +     release semantics.  */
> +  if (!__atomic_load_n(&unseen_objects, __ATOMIC_ACQUIRE)
> +      && !__atomic_load_n(&seen_objects, __ATOMIC_ACQUIRE))
> +    return NULL;

As Andrew noted, this might be bad on targets that don't have atomics.
For those we could easily end up inside a libfunc, which would be
unfortunate. Certain mips/arm variants come to mind here.

For targets that don't have atomics or any kind of synchronization libfunc,
we'll emit nop-asm-barriers to at least stop the optimizers from munging
things too badly.

It's been a couple of years since I've really thought about these kinds of
synchronization issues -- is it really safe on a weakly ordered processor to
rely on the mutex lock/unlock of "object_mutex" to order the
loads/stores of "unseen_objects" and "seen_objects"?

Jeff
On Wed, Jul 27, 2016 at 05:12:18PM -0600, Jeff Law wrote:
> On 07/25/2016 07:44 AM, Gleb Natapov wrote:
> > _Unwind_Find_FDE calls _Unwind_Find_registered_FDE and it takes the lock even
> > when there are no registered objects. As far as I can see, only statically
> > linked applications call the __register_frame_info* functions, so for
> > dynamically linked executables taking the lock to check unseen_objects
> > and seen_objects is a pessimization. Since the function is called on
> > each thrown exception, this is a lot of unneeded locking. This patch
> > checks unseen_objects and seen_objects outside of the lock and returns
> > earlier if both are NULL.
> >
> > diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
> > index 5b16a1f..41de746 100644
> > --- a/libgcc/unwind-dw2-fde.c
> > +++ b/libgcc/unwind-dw2-fde.c
> > @@ -1001,6 +1001,13 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
> >    struct object *ob;
> >    const fde *f = NULL;
> >
> > +  /* __atomic_write is not used to modify unseen_objects and seen_objects
> > +     since they are modified in locked sections only and unlock provides
> > +     release semantics.  */
> > +  if (!__atomic_load_n(&unseen_objects, __ATOMIC_ACQUIRE)
> > +      && !__atomic_load_n(&seen_objects, __ATOMIC_ACQUIRE))
> > +    return NULL;
> As Andrew noted, this might be bad on targets that don't have atomics.
> For those we could easily end up inside a libfunc, which would be
> unfortunate. Certain mips/arm variants come to mind here.
>
> For targets that don't have atomics or any kind of synchronization libfunc,
> we'll emit nop-asm-barriers to at least stop the optimizers from munging
> things too badly.
>
> It's been a couple of years since I've really thought about these kinds of
> synchronization issues -- is it really safe on a weakly ordered processor to
> rely on the mutex lock/unlock of "object_mutex" to order the
> loads/stores of "unseen_objects" and "seen_objects"?
>
I am pretty sure it is. After a mutex unlock another CPU will not see the
old value (provided it uses acquire semantics to read the value).

But when I wrote the patch I did not notice that Jakub had already written
one that addresses the same issue and avoids both of your concerns. It can
be found here: https://gcc.gnu.org/bugzilla/attachment.cgi?id=38852. Can we
apply his version?

--
			Gleb.
On 07/27/2016 10:24 PM, Gleb Natapov wrote:
> On Wed, Jul 27, 2016 at 05:12:18PM -0600, Jeff Law wrote:
>> On 07/25/2016 07:44 AM, Gleb Natapov wrote:
>>> _Unwind_Find_FDE calls _Unwind_Find_registered_FDE and it takes the lock even
>>> when there are no registered objects. As far as I can see, only statically
>>> linked applications call the __register_frame_info* functions, so for
>>> dynamically linked executables taking the lock to check unseen_objects
>>> and seen_objects is a pessimization. Since the function is called on
>>> each thrown exception, this is a lot of unneeded locking. This patch
>>> checks unseen_objects and seen_objects outside of the lock and returns
>>> earlier if both are NULL.
>>>
>>> diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
>>> index 5b16a1f..41de746 100644
>>> --- a/libgcc/unwind-dw2-fde.c
>>> +++ b/libgcc/unwind-dw2-fde.c
>>> @@ -1001,6 +1001,13 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
>>>    struct object *ob;
>>>    const fde *f = NULL;
>>>
>>> +  /* __atomic_write is not used to modify unseen_objects and seen_objects
>>> +     since they are modified in locked sections only and unlock provides
>>> +     release semantics.  */
>>> +  if (!__atomic_load_n(&unseen_objects, __ATOMIC_ACQUIRE)
>>> +      && !__atomic_load_n(&seen_objects, __ATOMIC_ACQUIRE))
>>> +    return NULL;
>> As Andrew noted, this might be bad on targets that don't have atomics.
>> For those we could easily end up inside a libfunc, which would be
>> unfortunate. Certain mips/arm variants come to mind here.
>>
>> For targets that don't have atomics or any kind of synchronization libfunc,
>> we'll emit nop-asm-barriers to at least stop the optimizers from munging
>> things too badly.
>>
>> It's been a couple of years since I've really thought about these kinds of
>> synchronization issues -- is it really safe on a weakly ordered processor to
>> rely on the mutex lock/unlock of "object_mutex" to order the
>> loads/stores of "unseen_objects" and "seen_objects"?
>>
> I am pretty sure it is. After a mutex unlock another CPU will not see the
> old value (provided it uses acquire semantics to read the value).
>
> But when I wrote the patch I did not notice that Jakub had already written
> one that addresses the same issue and avoids both of your concerns. It can
> be found here: https://gcc.gnu.org/bugzilla/attachment.cgi?id=38852. Can we
> apply his version?
Jakub is on PTO for another week. I'm sure he'll run that issue through to
completion after he returns.

I'll table your patch on that assumption.

jeff
On 07/29/2016 10:03 AM, Jeff Law wrote:
>>> It's been a couple of years since I've really thought about these kinds of
>>> synchronization issues -- is it really safe on a weakly ordered processor to
>>> rely on the mutex lock/unlock of "object_mutex" to order the
>>> loads/stores of "unseen_objects" and "seen_objects"?
>>>
>> I am pretty sure it is. After a mutex unlock another CPU will not see the
>> old value (provided it uses acquire semantics to read the value).
Just a followup because this has been bugging me...

Release/acquire provides synchronization between threads releasing and
acquiring the same atomic variable. A mutex implemented using
acquire/release semantics would provide a memory ordering of accesses to
the mutex variable, which is used to negotiate entries to and exits from
critical sections.

I don't see how using a mutex on "object_mutex" gives us an ordering on
"seen_objects" and "unseen_objects" -- unless the mutex under the hood
does a full memory barrier/fence, which AFAIU is not required.

This is a bit moot since we're likely going to go with Jakub's patch...

jeff
On Fri, Jul 29, 2016 at 10:03:53AM -0600, Jeff Law wrote:
> On 07/27/2016 10:24 PM, Gleb Natapov wrote:
> > On Wed, Jul 27, 2016 at 05:12:18PM -0600, Jeff Law wrote:
> > > On 07/25/2016 07:44 AM, Gleb Natapov wrote:
> > > > _Unwind_Find_FDE calls _Unwind_Find_registered_FDE and it takes the lock even
> > > > when there are no registered objects. As far as I can see, only statically
> > > > linked applications call the __register_frame_info* functions, so for
> > > > dynamically linked executables taking the lock to check unseen_objects
> > > > and seen_objects is a pessimization. Since the function is called on
> > > > each thrown exception, this is a lot of unneeded locking. This patch
> > > > checks unseen_objects and seen_objects outside of the lock and returns
> > > > earlier if both are NULL.
> > > >
> > > > diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
> > > > index 5b16a1f..41de746 100644
> > > > --- a/libgcc/unwind-dw2-fde.c
> > > > +++ b/libgcc/unwind-dw2-fde.c
> > > > @@ -1001,6 +1001,13 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
> > > >    struct object *ob;
> > > >    const fde *f = NULL;
> > > >
> > > > +  /* __atomic_write is not used to modify unseen_objects and seen_objects
> > > > +     since they are modified in locked sections only and unlock provides
> > > > +     release semantics.  */
> > > > +  if (!__atomic_load_n(&unseen_objects, __ATOMIC_ACQUIRE)
> > > > +      && !__atomic_load_n(&seen_objects, __ATOMIC_ACQUIRE))
> > > > +    return NULL;
> > > As Andrew noted, this might be bad on targets that don't have atomics.
> > > For those we could easily end up inside a libfunc, which would be
> > > unfortunate. Certain mips/arm variants come to mind here.
> > >
> > > For targets that don't have atomics or any kind of synchronization libfunc,
> > > we'll emit nop-asm-barriers to at least stop the optimizers from munging
> > > things too badly.
> > >
> > > It's been a couple of years since I've really thought about these kinds of
> > > synchronization issues -- is it really safe on a weakly ordered processor to
> > > rely on the mutex lock/unlock of "object_mutex" to order the
> > > loads/stores of "unseen_objects" and "seen_objects"?
> > >
> > I am pretty sure it is. After a mutex unlock another CPU will not see the
> > old value (provided it uses acquire semantics to read the value).
> >
> > But when I wrote the patch I did not notice that Jakub already wrote one
> > that addresses the same issue and avoids both of your concerns. It can be
> > found here: https://gcc.gnu.org/bugzilla/attachment.cgi?id=38852. Can we
> > apply his version?
> Jakub is on PTO for another week. I'm sure he'll run that issue through to
> completion after he returns.
>
> I'll table your patch on that assumption.
OK. Can you ping him about it after he's back, or should I (although in
another week I will be on PTO :))? We are suffering from unwind
scalability problems with C++ exceptions, and although this one patch
will not solve the whole issue, it will leave only one lock, in
dl_iterate_phdr, to deal with.

--
			Gleb.
On 07/29/2016 11:19 AM, Gleb Natapov wrote:
>> I'll table your patch on that assumption.
> OK. Can you ping him about it after he's back, or should I (although in
> another week I will be on PTO :))? We are suffering from unwind
> scalability problems with C++ exceptions, and although this one patch
> will not solve the whole issue, it will leave only one lock, in
> dl_iterate_phdr, to deal with.
Jakub is quite good about monitoring his in-progress patches. I'm confident
he'll address it after he returns.

jeff
On Fri, Jul 29, 2016 at 10:54:21AM -0600, Jeff Law wrote:
> On 07/29/2016 10:03 AM, Jeff Law wrote:
> > > > It's been a couple of years since I've really thought about these kinds of
> > > > synchronization issues -- is it really safe on a weakly ordered processor to
> > > > rely on the mutex lock/unlock of "object_mutex" to order the
> > > > loads/stores of "unseen_objects" and "seen_objects"?
> > > >
> > > I am pretty sure it is. After a mutex unlock another CPU will not see the
> > > old value (provided it uses acquire semantics to read the value).
> Just a followup because this has been bugging me...
>
> Release/acquire provides synchronization between threads releasing and
> acquiring the same atomic variable. A mutex implemented using
> acquire/release semantics would provide a memory ordering of accesses to
> the mutex variable, which is used to negotiate entries to and exits from
> critical sections.
>
> I don't see how using a mutex on "object_mutex" gives us an ordering on
> "seen_objects" and "unseen_objects" -- unless the mutex under the hood
> does a full memory barrier/fence, which AFAIU is not required.
>
But a mutex has to make sure that no write leaks out of the critical
section, not only writes to the mutex variable itself.

> This is a bit moot since we're likely going to go with Jakub's patch...
>
Still an interesting topic to understand. Jakub's patch checks only one
variable in the common case, so in that regard it's even better for my
use case.

--
			Gleb.
On Fri, Jul 29, 2016 at 11:22:18AM -0600, Jeff Law wrote:
> On 07/29/2016 11:19 AM, Gleb Natapov wrote:
> > > I'll table your patch on that assumption.
> > OK. Can you ping him about it after he's back, or should I (although in
> > another week I will be on PTO :))? We are suffering from unwind
> > scalability problems with C++ exceptions, and although this one patch
> > will not solve the whole issue, it will leave only one lock, in
> > dl_iterate_phdr, to deal with.
> Jakub is quite good about monitoring his in-progress patches. I'm confident
> he'll address it after he returns.
>
Jakub, is there a reason not to apply
https://gcc.gnu.org/bugzilla/attachment.cgi?id=38852?

--
			Gleb.
diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
index 5b16a1f..41de746 100644
--- a/libgcc/unwind-dw2-fde.c
+++ b/libgcc/unwind-dw2-fde.c
@@ -1001,6 +1001,13 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
   struct object *ob;
   const fde *f = NULL;
 
+  /* __atomic_write is not used to modify unseen_objects and seen_objects
+     since they are modified in locked sections only and unlock provides
+     release semantics.  */
+  if (!__atomic_load_n(&unseen_objects, __ATOMIC_ACQUIRE)
+      && !__atomic_load_n(&seen_objects, __ATOMIC_ACQUIRE))
+    return NULL;
+
   init_object_mutex_once ();
   __gthread_mutex_lock (&object_mutex);
 
@@ -1020,8 +1027,8 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
   while ((ob = unseen_objects))
     {
       struct object **p;
+      struct object *next = ob->next;
 
-      unseen_objects = ob->next;
       f = search_object (ob, pc);
 
       /* Insert the object into the classified list.  */
@@ -1031,6 +1038,8 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
       ob->next = *p;
       *p = ob;
 
+      unseen_objects = next;
+
       if (f)
 	goto fini;
     }