Message ID: 20220325015050.2065523-1-wangyang.guo@intel.com
State: New
Series: nptl: Add backoff mechanism to spinlock loop
On Thu, Mar 24, 2022 at 8:51 PM Wangyang Guo via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> When multiple threads are waiting for a lock at the same time, once the
> lock owner releases the lock, the waiters will all see the lock as
> available and try to take it at once, which may cause an expensive CAS
> storm.
>
> Binary exponential backoff with random jitter is introduced. As the
> number of try-lock attempts increases, it is more likely that a larger
> number of threads are competing for the adaptive mutex lock, so the wait
> time is increased exponentially. A random jitter is also added to avoid
> synchronized try-lock attempts from different threads.
>
> Signed-off-by: Wangyang Guo <wangyang.guo@intel.com>
> ---
>  nptl/pthread_mutex_lock.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c
> index d2e652d151..ec57dc3627 100644
> --- a/nptl/pthread_mutex_lock.c
> +++ b/nptl/pthread_mutex_lock.c
> @@ -26,6 +26,7 @@
>  #include <futex-internal.h>
>  #include <stap-probe.h>
>  #include <shlib-compat.h>
> +#include <random-bits.h>
>
>  /* Some of the following definitions differ when pthread_mutex_cond_lock.c
>     includes this file.  */
> @@ -138,14 +139,26 @@ PTHREAD_MUTEX_LOCK (pthread_mutex_t *mutex)
>            int cnt = 0;
>            int max_cnt = MIN (max_adaptive_count (),
>                               mutex->__data.__spins * 2 + 10);
> +          int spin_count, exp_backoff = 1;
> +          unsigned int jitter = random_bits ();
>            do
>              {
> -              if (cnt++ >= max_cnt)
> +              /* In each loop, spin count is exponential backoff plus
> +                 random jitter, random range is [0, exp_backoff-1].  */
> +              spin_count = exp_backoff + (jitter & (exp_backoff - 1));
> +              cnt += spin_count;
> +              if (cnt >= max_cnt)
>                  {
> +                  /* If cnt exceeds max spin count, just go to wait
> +                     queue.  */
>                    LLL_MUTEX_LOCK (mutex);
>                    break;
>                  }
> -              atomic_spin_nop ();
> +              do
> +                atomic_spin_nop ();
> +              while (--spin_count > 0);
> +              /* Binary exponential backoff, prepare for next loop.  */
> +              exp_backoff <<= 1;
>              }
>            while (LLL_MUTEX_READ_LOCK (mutex) != 0

Does this load not already prevent against the 'CAS storm'?

>                   || LLL_MUTEX_TRYLOCK (mutex) != 0);
> --
> 2.35.1
>
>> +              do
>> +                atomic_spin_nop ();
>> +              while (--spin_count > 0);
>> +              /* Binary exponential backoff, prepare for next loop.  */
>> +              exp_backoff <<= 1;
>> }
>> while (LLL_MUTEX_READ_LOCK (mutex) != 0
> Does this load not already prevent against the 'CAS storm'?

This only prevents the CAS while the lock is held for a long time. But if
multiple threads are waiting for the lock at the same time, many of them
will be polling the lock state; the moment the owner releases the lock,
all of those waiters see the lock as unlocked. In the next step, they will
all issue the try-lock at the same time. Backoff is introduced to solve
this problem.

>> || LLL_MUTEX_TRYLOCK (mutex) != 0);
On Thu, Mar 24, 2022 at 10:24 PM Guo, Wangyang <wangyang.guo@intel.com> wrote:
>
> >> +              do
> >> +                atomic_spin_nop ();
> >> +              while (--spin_count > 0);
> >> +              /* Binary exponential backoff, prepare for next loop.  */
> >> +              exp_backoff <<= 1;
> >> }
> >> while (LLL_MUTEX_READ_LOCK (mutex) != 0
> > Does this load not already prevent against the 'CAS storm'?
> This only prevents the CAS while the lock is held for a long time. But if
> multiple threads are waiting for the lock at the same time, many of them
> will be polling the lock state; the moment the owner releases the lock,
> all of those waiters see the lock as unlocked. In the next step, they
> will all issue the try-lock at the same time. Backoff is introduced to
> solve this problem.

The loop isn't spinning on CAS failure, which is typically where you see
the poor performance. I get that there can still be some contention on the
CAS, but shouldn't the read check limit the evict ping-ponging?

> >> || LLL_MUTEX_TRYLOCK (mutex) != 0);
Noah Goldstein wrote:
> > >> +              do
> > >> +                atomic_spin_nop ();
> > >> +              while (--spin_count > 0);
> > >> +              /* Binary exponential backoff, prepare for next loop.  */
> > >> +              exp_backoff <<= 1;
> > >> }
> > >> while (LLL_MUTEX_READ_LOCK (mutex) != 0
> > > Does this load not already prevent against the 'CAS storm'?
> > This only prevents the CAS while the lock is held for a long time. But
> > if multiple threads are waiting for the lock at the same time, many of
> > them will be polling the lock state; the moment the owner releases the
> > lock, all of those waiters see the lock as unlocked. In the next step,
> > they will all issue the try-lock at the same time. Backoff is
> > introduced to solve this problem.
>
> The loop isn't spinning on CAS failure, which is typically where you see
> the poor performance. I get that there can still be some contention on
> the CAS, but shouldn't the read check limit the evict ping-ponging?

Yes, the read check can help. But in a very high contention case, it
becomes much easier to run into the problem described above, which is what
the backoff addresses.

> > >> || LLL_MUTEX_TRYLOCK (mutex) != 0);
On Thu, Mar 24, 2022 at 11:32 PM Guo, Wangyang <wangyang.guo@intel.com> wrote:
>
> Noah Goldstein wrote:
> > > >> +              do
> > > >> +                atomic_spin_nop ();
> > > >> +              while (--spin_count > 0);
> > > >> +              /* Binary exponential backoff, prepare for next loop.  */
> > > >> +              exp_backoff <<= 1;
> > > >> }
> > > >> while (LLL_MUTEX_READ_LOCK (mutex) != 0
> > > > Does this load not already prevent against the 'CAS storm'?
> > > This only prevents the CAS while the lock is held for a long time.
> > > But if multiple threads are waiting for the lock at the same time,
> > > many of them will be polling the lock state; the moment the owner
> > > releases the lock, all of those waiters see the lock as unlocked. In
> > > the next step, they will all issue the try-lock at the same time.
> > > Backoff is introduced to solve this problem.
> >
> > The loop isn't spinning on CAS failure, which is typically where you
> > see the poor performance. I get that there can still be some contention
> > on the CAS, but shouldn't the read check limit the evict ping-ponging?
>
> Yes, the read check can help. But in a very high contention case, it
> becomes much easier to run into the problem described above, which is
> what the backoff addresses.

Do we need both then?
I made a quick benchmark to test this out and you're right: with the lock
used for a tiny critical section at least (incrementing an int), I see
fewer failed CAS attempts with this patch and better performance :)

> > > > >> || LLL_MUTEX_TRYLOCK (mutex) != 0);
> On Thu, Mar 24, 2022 at 11:32 PM Guo, Wangyang <wangyang.guo@intel.com> wrote:
> >
> > Yes, the read check can help. But in a very high contention case, it
> > becomes much easier to run into the problem described above, which is
> > what the backoff addresses.
>
> Do we need both then?
> I made a quick benchmark to test this out and you're right: with the
> lock used for a tiny critical section at least (incrementing an int), I
> see fewer failed CAS attempts with this patch and better performance :)

In theory, the read check still helps with a long-held lock, but it adds
extra overhead when the lock can be acquired immediately. In my testing,
the version without the read check performs better.

> > > > > >> || LLL_MUTEX_TRYLOCK (mutex) != 0);
diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c
index d2e652d151..ec57dc3627 100644
--- a/nptl/pthread_mutex_lock.c
+++ b/nptl/pthread_mutex_lock.c
@@ -26,6 +26,7 @@
 #include <futex-internal.h>
 #include <stap-probe.h>
 #include <shlib-compat.h>
+#include <random-bits.h>

 /* Some of the following definitions differ when pthread_mutex_cond_lock.c
    includes this file.  */
@@ -138,14 +139,26 @@ PTHREAD_MUTEX_LOCK (pthread_mutex_t *mutex)
           int cnt = 0;
           int max_cnt = MIN (max_adaptive_count (),
                              mutex->__data.__spins * 2 + 10);
+          int spin_count, exp_backoff = 1;
+          unsigned int jitter = random_bits ();
           do
             {
-              if (cnt++ >= max_cnt)
+              /* In each loop, spin count is exponential backoff plus
+                 random jitter, random range is [0, exp_backoff-1].  */
+              spin_count = exp_backoff + (jitter & (exp_backoff - 1));
+              cnt += spin_count;
+              if (cnt >= max_cnt)
                 {
+                  /* If cnt exceeds max spin count, just go to wait
+                     queue.  */
                   LLL_MUTEX_LOCK (mutex);
                   break;
                 }
-              atomic_spin_nop ();
+              do
+                atomic_spin_nop ();
+              while (--spin_count > 0);
+              /* Binary exponential backoff, prepare for next loop.  */
+              exp_backoff <<= 1;
             }
           while (LLL_MUTEX_READ_LOCK (mutex) != 0
                  || LLL_MUTEX_TRYLOCK (mutex) != 0);
When multiple threads are waiting for a lock at the same time, once the
lock owner releases the lock, the waiters will all see the lock as
available and try to take it at once, which may cause an expensive CAS
storm.

Binary exponential backoff with random jitter is introduced. As the number
of try-lock attempts increases, it is more likely that a larger number of
threads are competing for the adaptive mutex lock, so the wait time is
increased exponentially. A random jitter is also added to avoid
synchronized try-lock attempts from different threads.

Signed-off-by: Wangyang Guo <wangyang.guo@intel.com>
---
 nptl/pthread_mutex_lock.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)