Message ID | 1522394093-9835-3-git-send-email-kemi.wang@intel.com |
---|---|
State | New |
Headers | show |
Series | [1/3] Tunables: Add tunables of spin count for adaptive spin mutex | expand |
On 30/03/2018 04:14, Kemi Wang wrote: > Usually, we can't set too short time out while spinning on the lock, that > probably makes a thread which is trying to acquire the lock go to sleep > quickly, thus weakens the benefit of pthread adaptive spin lock. > > However, there is also a problem if we set the time out large in > case of protecting a small critical section with severe lock contention. > As we can see the test result in the last patch, the performance is highly > effected by the spin count tunables, smaller spin count, better performance > improvement. This is because the thread probably spins on the lock until > timeout in severe lock contention before going to sleep. > > In this patch, we avoid the useless spin by making the spinner sleep > if it fails to acquire the lock when the lock is available, as suggested > by Tim Chen. > > nr_threads base COUNT=1000(head~1) COUNT=1000(head) > 1 51644585 51323778(-0.6%) 51378551(-0.5%) > 2 7914789 9867343(+24.7%) 11503559(+45.3%) > 7 1687620 3430504(+103.3%) 7817383(+363.2%) > 14 1026555 1843458(+79.6%) 7360883(+617.0%) > 28 962001 681965(-29.1%) 5681945(+490.6%) > 56 883770 364879(-58.7%) 3416068(+286.5%) > 112 1150589 415261(-63.9%) 3255700(+183.0%) As before [2], I checked the change on a 64 cores aarch64 machine, but differently than previous patch this one seems to show improvements: nr_threads base head(SPIN_COUNT=10) head(SPIN_COUNT=1000) 1 27566206 28776779 (4.206770) 28778073 (4.211078) 2 8498813 9129102 (6.904173) 7042975 (-20.670782) 7 5019434 5832195 (13.935765) 5098511 (1.550982) 14 4379155 6507212 (32.703053) 5200018 (15.785772) 28 4397464 4584480 (4.079329) 4456767 (1.330628) 56 4020956 3534899 (-13.750237) 4096197 (1.836850) I would suggest you to squash both patch in only one for version 2. > > Suggested-by: Tim Chen <tim.c.chen@intel.com> > Signed-off-by: Kemi Wang <kemi.wang@intel.com> > --- > nptl/pthread_mutex_lock.c | 25 +++++++++++-------------- > 1 file changed, 11 insertions(+), 14 deletions(-) > > diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c > index c3aca93..0faee1a 100644 > --- a/nptl/pthread_mutex_lock.c > +++ b/nptl/pthread_mutex_lock.c > @@ -127,22 +127,19 @@ __pthread_mutex_lock (pthread_mutex_t *mutex) > int cnt = 0; > int max_cnt = MIN (__mutex_aconf.spin_count, > mutex->__data.__spins * 2 + 100); > + > + /* MO read while spinning */ > do > - { > - if (cnt >= max_cnt) > - { > - LLL_MUTEX_LOCK (mutex); > - break; > - } > - /* MO read while spinning */ > - do > - { > - atomic_spin_nop (); > - } > - while (atomic_load_relaxed (&mutex->__data.__lock) != 0 && > + { > + atomic_spin_nop (); > + } > + while (atomic_load_relaxed (&mutex->__data.__lock) != 0 && > ++cnt < max_cnt); > - } > - while (LLL_MUTEX_TRYLOCK (mutex) != 0); > + /* Try to acquire the lock if lock is available or the spin count > + * is run out, go to sleep if fails > + */ > + if (LLL_MUTEX_TRYLOCK (mutex) != 0) > + LLL_MUTEX_LOCK (mutex); > > mutex->__data.__spins += (cnt - mutex->__data.__spins) / 8; > } > Please fix the format issue here.
On 2018年04月06日 04:59, Adhemerval Zanella wrote: > > > On 30/03/2018 04:14, Kemi Wang wrote: >> Usually, we can't set too short time out while spinning on the lock, that >> probably makes a thread which is trying to acquire the lock go to sleep >> quickly, thus weakens the benefit of pthread adaptive spin lock. >> >> However, there is also a problem if we set the time out large in >> case of protecting a small critical section with severe lock contention. >> As we can see the test result in the last patch, the performance is highly >> effected by the spin count tunables, smaller spin count, better performance >> improvement. This is because the thread probably spins on the lock until >> timeout in severe lock contention before going to sleep. >> >> In this patch, we avoid the useless spin by making the spinner sleep >> if it fails to acquire the lock when the lock is available, as suggested >> by Tim Chen. >> >> nr_threads base COUNT=1000(head~1) COUNT=1000(head) >> 1 51644585 51323778(-0.6%) 51378551(-0.5%) >> 2 7914789 9867343(+24.7%) 11503559(+45.3%) >> 7 1687620 3430504(+103.3%) 7817383(+363.2%) >> 14 1026555 1843458(+79.6%) 7360883(+617.0%) >> 28 962001 681965(-29.1%) 5681945(+490.6%) >> 56 883770 364879(-58.7%) 3416068(+286.5%) >> 112 1150589 415261(-63.9%) 3255700(+183.0%) > > As before [2], I checked the change on a 64 cores aarch64 machine, but > differently than previous patch this one seems to show improvements: > > nr_threads base head(SPIN_COUNT=10) head(SPIN_COUNT=1000) > 1 27566206 28776779 (4.206770) 28778073 (4.211078) > 2 8498813 9129102 (6.904173) 7042975 (-20.670782) > 7 5019434 5832195 (13.935765) 5098511 (1.550982) > 14 4379155 6507212 (32.703053) 5200018 (15.785772) > 28 4397464 4584480 (4.079329) 4456767 (1.330628) > 56 4020956 3534899 (-13.750237) 4096197 (1.836850) > > I would suggest you to squash both patch in only one for version 2. > OK, the separation here is easy for review. >> >> Suggested-by: Tim Chen <tim.c.chen@intel.com> >> Signed-off-by: Kemi Wang <kemi.wang@intel.com> >> --- >> nptl/pthread_mutex_lock.c | 25 +++++++++++-------------- >> 1 file changed, 11 insertions(+), 14 deletions(-) >> >> diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c >> index c3aca93..0faee1a 100644 >> --- a/nptl/pthread_mutex_lock.c >> +++ b/nptl/pthread_mutex_lock.c >> @@ -127,22 +127,19 @@ __pthread_mutex_lock (pthread_mutex_t *mutex) >> int cnt = 0; >> int max_cnt = MIN (__mutex_aconf.spin_count, >> mutex->__data.__spins * 2 + 100); >> + >> + /* MO read while spinning */ >> do >> - { >> - if (cnt >= max_cnt) >> - { >> - LLL_MUTEX_LOCK (mutex); >> - break; >> - } >> - /* MO read while spinning */ >> - do >> - { >> - atomic_spin_nop (); >> - } >> - while (atomic_load_relaxed (&mutex->__data.__lock) != 0 && >> + { >> + atomic_spin_nop (); >> + } >> + while (atomic_load_relaxed (&mutex->__data.__lock) != 0 && >> ++cnt < max_cnt); >> - } >> - while (LLL_MUTEX_TRYLOCK (mutex) != 0); >> + /* Try to acquire the lock if lock is available or the spin count >> + * is run out, go to sleep if fails >> + */ >> + if (LLL_MUTEX_TRYLOCK (mutex) != 0) >> + LLL_MUTEX_LOCK (mutex); >> >> mutex->__data.__spins += (cnt - mutex->__data.__spins) / 8; >> } >> > > Please fix the format issue here. >
diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c index c3aca93..0faee1a 100644 --- a/nptl/pthread_mutex_lock.c +++ b/nptl/pthread_mutex_lock.c @@ -127,22 +127,19 @@ __pthread_mutex_lock (pthread_mutex_t *mutex) int cnt = 0; int max_cnt = MIN (__mutex_aconf.spin_count, mutex->__data.__spins * 2 + 100); + + /* MO read while spinning */ do - { - if (cnt >= max_cnt) - { - LLL_MUTEX_LOCK (mutex); - break; - } - /* MO read while spinning */ - do - { - atomic_spin_nop (); - } - while (atomic_load_relaxed (&mutex->__data.__lock) != 0 && + { + atomic_spin_nop (); + } + while (atomic_load_relaxed (&mutex->__data.__lock) != 0 && ++cnt < max_cnt); - } - while (LLL_MUTEX_TRYLOCK (mutex) != 0); + /* Try to acquire the lock if lock is available or the spin count + * is run out, go to sleep if fails + */ + if (LLL_MUTEX_TRYLOCK (mutex) != 0) + LLL_MUTEX_LOCK (mutex); mutex->__data.__spins += (cnt - mutex->__data.__spins) / 8; }
Usually, we can't set too short time out while spinning on the lock, that probably makes a thread which is trying to acquire the lock go to sleep quickly, thus weakens the benefit of pthread adaptive spin lock. However, there is also a problem if we set the time out large in case of protecting a small critical section with severe lock contention. As we can see the test result in the last patch, the performance is highly effected by the spin count tunables, smaller spin count, better performance improvement. This is because the thread probably spins on the lock until timeout in severe lock contention before going to sleep. In this patch, we avoid the useless spin by making the spinner sleep if it fails to acquire the lock when the lock is available, as suggested by Tim Chen. nr_threads base COUNT=1000(head~1) COUNT=1000(head) 1 51644585 51323778(-0.6%) 51378551(-0.5%) 2 7914789 9867343(+24.7%) 11503559(+45.3%) 7 1687620 3430504(+103.3%) 7817383(+363.2%) 14 1026555 1843458(+79.6%) 7360883(+617.0%) 28 962001 681965(-29.1%) 5681945(+490.6%) 56 883770 364879(-58.7%) 3416068(+286.5%) 112 1150589 415261(-63.9%) 3255700(+183.0%) Suggested-by: Tim Chen <tim.c.chen@intel.com> Signed-off-by: Kemi Wang <kemi.wang@intel.com> --- nptl/pthread_mutex_lock.c | 25 +++++++++++-------------- 1 file changed, 11 insertions(+), 14 deletions(-)