diff mbox series

[v2,1/3] Tunables: Add tunables of spin count for pthread adaptive spin mutex

Message ID 1524624988-29141-1-git-send-email-kemi.wang@intel.com
State New
Headers show
Series [v2,1/3] Tunables: Add tunables of spin count for pthread adaptive spin mutex | expand

Commit Message

kemi April 25, 2018, 2:56 a.m. UTC
This patch does not have any functionality change, we only provide a spin
count tunes for pthread adaptive spin mutex. The tunable
glibc.mutex.spin_count tunes can be used by system administrator to squeeze
system performance according to different hardware capability and workload
model.

This is the preparation work for the next patch, in which the way of
adaptive spin would be changed from an expensive cmpxchg to read while
spinning.

   * elf/dl-tunables.list: Add glibc.mutex.spin_count entry.
   * manual/tunables.texi: Add glibc.mutex.spin_count description.
   * nptl/Makefile: Add pthread_mutex_conf.c for compilation.
   * nptl/pthread_mutex_conf.h: New file.
   * nptl/pthread_mutex_conf.c: New file.

ChangeLog:
    V1->V2
    a) Renamed nptl/mutex-conf.h -> nptl/pthread_mutex_conf.h
    b) Renamed nptl/mutex-conf.c -> nptl/pthread_mutex_conf.c
    c) Change the Makefile to compile pthread_mutex_conf.c
    d) Modify the copyright "2013-2018" -> "2018" for new added files
    e) Fix the indentation issue (tab -> double space) in
    elf/dl-tunables.list
    f) Remove the env alias LD_SPIN_COUNT in elf/dl-tunables.list
    g) Fix the typo errors and refresh glibc.mutex.spin_count tunable
    description in manual/tunables.texi.
    h) Fix the indentation issue in nptl/pthread_mutex_conf.c
    i) Fix the indentation issue for nested preprocessor (add one space for
    each level)

Suggested-by: Andi Kleen <andi.kleen@intel.com>
Signed-off-by: Kemi Wang <kemi.wang@intel.com>
---
 ChangeLog                 |  8 +++++
 elf/dl-tunables.list      |  9 ++++++
 manual/tunables.texi      | 21 +++++++++++++
 nptl/Makefile             |  3 +-
 nptl/pthread_mutex_conf.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++
 nptl/pthread_mutex_conf.h | 31 +++++++++++++++++++
 6 files changed, 148 insertions(+), 1 deletion(-)
 create mode 100644 nptl/pthread_mutex_conf.c
 create mode 100644 nptl/pthread_mutex_conf.h

Comments

Rical Jasan April 25, 2018, 4:02 a.m. UTC | #1
On 04/24/2018 07:56 PM, Kemi Wang wrote:
...
> diff --git a/manual/tunables.texi b/manual/tunables.texi
> index be33c9f..c2cf8c6 100644
> --- a/manual/tunables.texi
> +++ b/manual/tunables.texi
> @@ -281,6 +281,27 @@ of try lock attempts.
>  The default value of this tunable is @samp{3}.
>  @end deftp
>  
> +@node Pthread Mutex Tunables
> +@section Pthread Mutex Tunables
> +@cindex pthread mutex tunables
> +
> +@deftp {Tunable namespace} glibc.mutex
> +Behavior of pthread mutex can be tuned to gain performance improvement
> +according to specific hardware capability and workload character by setting
> +the following tunables in the @code{mutex} namespace.

The behavior of pthread mutexes can be tuned to gain performance
improvements according to specific hardware capabilities and workload
characteristics by setting the following tunables in the @code{mutex}
namespace:

> +@end deftp
> +
> +@deftp Tunable glibc.mutex.spin_count
> +The @code{glibc.mutex.spin_count} tunable sets the maximum spin times that
> +a thread should spin on the lock before calling into the kernel to block.

"maximum number of times a thread should spin"

> +Adaptive spin is used for the mutex initialized with PTHREAD_MUTEX_ADAPTIVE_NP

"used for mutexes initialized with the"

> +GNU extension. It affects both pthread_mutex_lock and pthread_mutex_timedlock.

Double spaces between sentences.

> +The spinning is done in case of either the maximum spin times is reached or
> +the lock is acquired during spinning.

Why would we spin after we've maxed out spinning (or acquired the lock)?
 Perhaps this should read:

"Spinning is done until either the maximum spin count is reached or the
lock is acquired."

> +
> +The default value of this tunable is @samp{1000}.
> +@end deftp
> +
>  @node Hardware Capability Tunables
>  @section Hardware Capability Tunables
>  @cindex hardware capability tunables

Rical
kemi April 25, 2018, 5:11 a.m. UTC | #2
Hi, Rical
   Thanks for your help to polish this description.
That's much better than before, I will fold these changes in the next version.

On 2018年04月25日 12:02, Rical Jasan wrote:
> On 04/24/2018 07:56 PM, Kemi Wang wrote:
> ...
>> diff --git a/manual/tunables.texi b/manual/tunables.texi
>> index be33c9f..c2cf8c6 100644
>> --- a/manual/tunables.texi
>> +++ b/manual/tunables.texi
>> @@ -281,6 +281,27 @@ of try lock attempts.
>>  The default value of this tunable is @samp{3}.
>>  @end deftp
>>  
>> +@node Pthread Mutex Tunables
>> +@section Pthread Mutex Tunables
>> +@cindex pthread mutex tunables
>> +
>> +@deftp {Tunable namespace} glibc.mutex
>> +Behavior of pthread mutex can be tuned to gain performance improvement
>> +according to specific hardware capability and workload character by setting
>> +the following tunables in the @code{mutex} namespace.
> 
> The behavior of pthread mutexes can be tuned to gain performance
> improvements according to specific hardware capabilities and workload
> characteristics by setting the following tunables in the @code{mutex}
> namespace:
> 

OK.

>> +@end deftp
>> +
>> +@deftp Tunable glibc.mutex.spin_count
>> +The @code{glibc.mutex.spin_count} tunable sets the maximum spin times that
>> +a thread should spin on the lock before calling into the kernel to block.
> 
> "maximum number of times a thread should spin"
>

OK

>> +Adaptive spin is used for the mutex initialized with PTHREAD_MUTEX_ADAPTIVE_NP
> 
> "used for mutexes initialized with the"
> 
>> +GNU extension. It affects both pthread_mutex_lock and pthread_mutex_timedlock.
> 
> Double spaces between sentences.
> 

Sure.

>> +The spinning is done in case of either the maximum spin times is reached or
>> +the lock is acquired during spinning.
> 
> Why would we spin after we've maxed out spinning (or acquired the lock)?
>  Perhaps this should read:
> 
> "Spinning is done until either the maximum spin count is reached or the
> lock is acquired."
> 

more readable, thanks.

>> +
>> +The default value of this tunable is @samp{1000}.
>> +@end deftp
>> +
>>  @node Hardware Capability Tunables
>>  @section Hardware Capability Tunables
>>  @cindex hardware capability tunables
> 
> Rical
>
kemi May 2, 2018, 1:51 a.m. UTC | #3
PING for series

On 2018年04月25日 10:56, Kemi Wang wrote:
> This patch does not have any functionality change, we only provide a spin
> count tunes for pthread adaptive spin mutex. The tunable
> glibc.mutex.spin_count tunes can be used by system administrator to squeeze
> system performance according to different hardware capability and workload
> model.
> 
> This is the preparation work for the next patch, in which the way of
> adaptive spin would be changed from an expensive cmpxchg to read while
> spinning.
> 
>    * elf/dl-tunables.list: Add glibc.mutex.spin_count entry.
>    * manual/tunables.texi: Add glibc.mutex.spin_count description.
>    * nptl/Makefile: Add pthread_mutex_conf.c for compilation.
>    * nptl/pthread_mutex_conf.h: New file.
>    * nptl/pthread_mutex_conf.c: New file.
> 
> ChangeLog:
>     V1->V2
>     a) Renamed nptl/mutex-conf.h -> nptl/pthread_mutex_conf.h
>     b) Renamed nptl/mutex-conf.c -> nptl/pthread_mutex_conf.c
>     c) Change the Makefile to compile pthread_mutex_conf.c
>     d) Modify the copyright "2013-2018" -> "2018" for new added files
>     e) Fix the indentation issue (tab -> double space) in
>     elf/dl-tunables.list
>     f) Remove the env alias LD_SPIN_COUNT in elf/dl-tunables.list
>     g) Fix the typo errors and refresh glibc.mutex.spin_count tunable
>     description in manual/tunables.texi.
>     h) Fix the indentation issue in nptl/pthread_mutex_conf.c
>     i) Fix the indentation issue for nested preprocessor (add one space for
>     each level)
> 
> Suggested-by: Andi Kleen <andi.kleen@intel.com>
> Signed-off-by: Kemi Wang <kemi.wang@intel.com>
> ---
>  ChangeLog                 |  8 +++++
>  elf/dl-tunables.list      |  9 ++++++
>  manual/tunables.texi      | 21 +++++++++++++
>  nptl/Makefile             |  3 +-
>  nptl/pthread_mutex_conf.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++
>  nptl/pthread_mutex_conf.h | 31 +++++++++++++++++++
>  6 files changed, 148 insertions(+), 1 deletion(-)
>  create mode 100644 nptl/pthread_mutex_conf.c
>  create mode 100644 nptl/pthread_mutex_conf.h
> 
> diff --git a/ChangeLog b/ChangeLog
> index efa8d39..4750b11 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,3 +1,11 @@
> +2018-04-24  Kemi Wang <kemi.wang@intel.com>
> +
> +	* elf/dl-tunables.list: Add glibc.mutex.spin_count entry.
> +	* manual/tunables.texi: Add glibc.mutex.spin_count description.
> +	* nptl/Makefile: Add pthread_mutex_conf.c for compilation.
> +	* nptl/pthread_mutex_conf.h: New file.
> +	* nptl/pthread_mutex_conf.c: New file.
> +
>  2018-04-24  Joseph Myers  <joseph@codesourcery.com>
>  
>  	* sysdeps/mach/hurd/dl-sysdep.c: Include <not-errno.h>.
> diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
> index 1f8ecb8..dc1e8f4 100644
> --- a/elf/dl-tunables.list
> +++ b/elf/dl-tunables.list
> @@ -121,4 +121,13 @@ glibc {
>        default: 3
>      }
>    }
> +
> +  mutex {
> +    spin_count {
> +      type: INT_32
> +      minval: 0
> +      maxval: 30000
> +      default: 1000
> +    }
> +  }
>  }
> diff --git a/manual/tunables.texi b/manual/tunables.texi
> index be33c9f..c2cf8c6 100644
> --- a/manual/tunables.texi
> +++ b/manual/tunables.texi
> @@ -281,6 +281,27 @@ of try lock attempts.
>  The default value of this tunable is @samp{3}.
>  @end deftp
>  
> +@node Pthread Mutex Tunables
> +@section Pthread Mutex Tunables
> +@cindex pthread mutex tunables
> +
> +@deftp {Tunable namespace} glibc.mutex
> +Behavior of pthread mutex can be tuned to gain performance improvement
> +according to specific hardware capability and workload character by setting
> +the following tunables in the @code{mutex} namespace.
> +@end deftp
> +
> +@deftp Tunable glibc.mutex.spin_count
> +The @code{glibc.mutex.spin_count} tunable sets the maximum spin times that
> +a thread should spin on the lock before calling into the kernel to block.
> +Adaptive spin is used for the mutex initialized with PTHREAD_MUTEX_ADAPTIVE_NP
> +GNU extension. It affects both pthread_mutex_lock and pthread_mutex_timedlock.
> +The spinning is done in case of either the maximum spin times is reached or
> +the lock is acquired during spinning.
> +
> +The default value of this tunable is @samp{1000}.
> +@end deftp
> +
>  @node Hardware Capability Tunables
>  @section Hardware Capability Tunables
>  @cindex hardware capability tunables
> diff --git a/nptl/Makefile b/nptl/Makefile
> index 94be92c..bd1096f 100644
> --- a/nptl/Makefile
> +++ b/nptl/Makefile
> @@ -139,7 +139,8 @@ libpthread-routines = nptl-init vars events version pt-interp \
>  		      pthread_mutex_getprioceiling \
>  		      pthread_mutex_setprioceiling \
>  		      pthread_setname pthread_getname \
> -		      pthread_setattr_default_np pthread_getattr_default_np
> +		      pthread_setattr_default_np pthread_getattr_default_np \
> +		      pthread_mutex_conf
>  #		      pthread_setuid pthread_seteuid pthread_setreuid \
>  #		      pthread_setresuid \
>  #		      pthread_setgid pthread_setegid pthread_setregid \
> diff --git a/nptl/pthread_mutex_conf.c b/nptl/pthread_mutex_conf.c
> new file mode 100644
> index 0000000..6340b5d
> --- /dev/null
> +++ b/nptl/pthread_mutex_conf.c
> @@ -0,0 +1,77 @@
> +/* pthread_mutex_conf.c: Pthread mutex tunable parameters.
> +   Copyright (C) 2018 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include <pthreadP.h>
> +#include <init-arch.h>
> +#include <pthread_mutex_conf.h>
> +#include <unistd.h>
> +
> +#if HAVE_TUNABLES
> +# define TUNABLE_NAMESPACE mutex
> +#endif
> +#include <elf/dl-tunables.h>
> +
> +
> +struct mutex_config __mutex_aconf =
> +  {
> +    /* The maximum times a thread spin on the lock before going to block */
> +    .spin_count = 1000,
> +  };
> +
> +#if HAVE_TUNABLES
> +# define TUNABLE_CALLBACK_FNDECL(__name, __type)			\
> +static inline void						\
> +__always_inline							\
> +do_set_mutex_ ## __name (__type value)			\
> +{								\
> +  __mutex_aconf.__name = value;				\
> +}								\
> +void								\
> +TUNABLE_CALLBACK (set_mutex_ ## __name) (tunable_val_t *valp) \
> +{								\
> +  __type value = (__type) (valp)->numval;			\
> +  do_set_mutex_ ## __name (value);				\
> +}
> +
> +TUNABLE_CALLBACK_FNDECL (spin_count, int32_t);
> +#endif
> +
> +static void
> +mutex_tunables_init (int argc __attribute__ ((unused)),
> +			      char **argv  __attribute__ ((unused)),
> +					      char **environ)
> +{
> +#if HAVE_TUNABLES
> +
> +  TUNABLE_GET (spin_count, int32_t,
> +		  TUNABLE_CALLBACK (set_mutex_spin_count));
> +#endif
> +}
> +
> +#ifdef SHARED
> +# define INIT_SECTION ".init_array"
> +#else
> +# define INIT_SECTION ".preinit_array"
> +#endif
> +
> +void (*const __pthread_mutex_tunables_init_array []) (int, char **, char **)
> +  __attribute__ ((section (INIT_SECTION), aligned (sizeof (void *)))) =
> +{
> +  &mutex_tunables_init
> +};
> diff --git a/nptl/pthread_mutex_conf.h b/nptl/pthread_mutex_conf.h
> new file mode 100644
> index 0000000..e5b027c
> --- /dev/null
> +++ b/nptl/pthread_mutex_conf.h
> @@ -0,0 +1,31 @@
> +/* pthread_mutex_conf.h: Pthread mutex tunable parameters.
> +   Copyright (C) 2018 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +#ifndef _PTHREAD_MUTEX_CONF_H
> +#define _PTHREAD_MUTEX_CONF_H 1
> +
> +#include <pthread.h>
> +#include <time.h>
> +
> +struct mutex_config
> +{
> +  int spin_count;
> +};
> +
> +extern struct mutex_config __mutex_aconf attribute_hidden;
> +
> +#endif
>
Florian Weimer May 2, 2018, 8:04 a.m. UTC | #4
On 04/25/2018 04:56 AM, Kemi Wang wrote:

> +  mutex {
> +    spin_count {
> +      type: INT_32
> +      minval: 0
> +      maxval: 30000
> +      default: 1000
> +    }

How did you come up with the default and maximum values?  Larger maximum 
values might be useful for testing boundary conditions.

> +# define TUNABLE_CALLBACK_FNDECL(__name, __type)			\
> +static inline void						\
> +__always_inline							\
> +do_set_mutex_ ## __name (__type value)			\
> +{								\
> +  __mutex_aconf.__name = value;				\
> +}								\
> +void								\
> +TUNABLE_CALLBACK (set_mutex_ ## __name) (tunable_val_t *valp) \
> +{								\
> +  __type value = (__type) (valp)->numval;			\
> +  do_set_mutex_ ## __name (value);				\
> +}
> +
> +TUNABLE_CALLBACK_FNDECL (spin_count, int32_t);

I'm not sure if the macro is helpful in this context.

> +static void
> +mutex_tunables_init (int argc __attribute__ ((unused)),
> +			      char **argv  __attribute__ ((unused)),
> +					      char **environ)
> +{
> +#if HAVE_TUNABLES
> +
> +  TUNABLE_GET (spin_count, int32_t,
> +		  TUNABLE_CALLBACK (set_mutex_spin_count));
> +#endif
> +}
> +
> +#ifdef SHARED
> +# define INIT_SECTION ".init_array"
> +#else
> +# define INIT_SECTION ".preinit_array"
> +#endif
> +
> +void (*const __pthread_mutex_tunables_init_array []) (int, char **, char **)
> +  __attribute__ ((section (INIT_SECTION), aligned (sizeof (void *)))) =
> +{
> +  &mutex_tunables_init
> +};

Can't you perform the initialization as part of overall pthread 
initialization?  This would avoid the extra relocation.

Thanks,
Florian
kemi May 2, 2018, 11:06 a.m. UTC | #5
Hi, Florian
    Thanks for your time to review.

On 2018年05月02日 16:04, Florian Weimer wrote:
> On 04/25/2018 04:56 AM, Kemi Wang wrote:
> 
>> +  mutex {
>> +    spin_count {
>> +      type: INT_32
>> +      minval: 0
>> +      maxval: 30000
>> +      default: 1000
>> +    }
> 
> How did you come up with the default and maximum values?  Larger maximum values might be useful for testing boundary conditions.
> 

For the maximum value of spin count:
Please notice that mutex->__data.__spins += (cnt - mutex->__data.__spins) / 8, and the variable *cnt* could reach 
the value of spin count due to spinning timeout. In such case, mutex->__data.__spins is increased and could be close to *cnt*
(close to the value of spin count). Keeping the value of spin count less than MAX_SHORT can avoid the overflow of
 mutex->__data.__spins variable with the possible type of short.


For the default value of spin count:
I referred to the previous number of 100 times for trylock in the loop. When this mode is changed to read only while spinning.
I suppose the value could be larger because of lower overhead and latency of read compared with cmpxchg.

Perhaps we should make the default value of spin count differently according to architecture. 

>> +# define TUNABLE_CALLBACK_FNDECL(__name, __type)            \
>> +static inline void                        \
>> +__always_inline                            \
>> +do_set_mutex_ ## __name (__type value)            \
>> +{                                \
>> +  __mutex_aconf.__name = value;                \
>> +}                                \
>> +void                                \
>> +TUNABLE_CALLBACK (set_mutex_ ## __name) (tunable_val_t *valp) \
>> +{                                \
>> +  __type value = (__type) (valp)->numval;            \
>> +  do_set_mutex_ ## __name (value);                \
>> +}
>> +
>> +TUNABLE_CALLBACK_FNDECL (spin_count, int32_t);
> 
> I'm not sure if the macro is helpful in this context.
> 

It is a matter of taste.
But, perhaps we have other mutex tunables in future.

>> +static void
>> +mutex_tunables_init (int argc __attribute__ ((unused)),
>> +                  char **argv  __attribute__ ((unused)),
>> +                          char **environ)
>> +{
>> +#if HAVE_TUNABLES
>> +
>> +  TUNABLE_GET (spin_count, int32_t,
>> +          TUNABLE_CALLBACK (set_mutex_spin_count));
>> +#endif
>> +}
>> +
>> +#ifdef SHARED
>> +# define INIT_SECTION ".init_array"
>> +#else
>> +# define INIT_SECTION ".preinit_array"
>> +#endif
>> +
>> +void (*const __pthread_mutex_tunables_init_array []) (int, char **, char **)
>> +  __attribute__ ((section (INIT_SECTION), aligned (sizeof (void *)))) =
>> +{
>> +  &mutex_tunables_init
>> +};
> 
> Can't you perform the initialization as part of overall pthread initialization?  This would avoid the extra relocation.
> 

Thanks for your suggestion. I am not sure how to do it now and will take a look at it.

> Thanks,
> Florian
Florian Weimer May 8, 2018, 3:44 p.m. UTC | #6
On 05/02/2018 01:06 PM, kemi wrote:
> Hi, Florian
>      Thanks for your time to review.
> 
> On 2018年05月02日 16:04, Florian Weimer wrote:
>> On 04/25/2018 04:56 AM, Kemi Wang wrote:
>>
>>> +  mutex {
>>> +    spin_count {
>>> +      type: INT_32
>>> +      minval: 0
>>> +      maxval: 30000
>>> +      default: 1000
>>> +    }
>>
>> How did you come up with the default and maximum values?  Larger maximum values might be useful for testing boundary conditions.
>>
> 
> For the maximum value of spin count:
> Please notice that mutex->__data.__spins += (cnt - mutex->__data.__spins) / 8, and the variable *cnt* could reach
> the value of spin count due to spinning timeout. In such case, mutex->__data.__spins is increased and could be close to *cnt*
> (close to the value of spin count). Keeping the value of spin count less than MAX_SHORT can avoid the overflow of
>   mutex->__data.__spins variable with the possible type of short.

Could you add this as a comment, please?

> For the default value of spin count:
> I referred to the previous number of 100 times for trylock in the loop. When this mode is changed to read only while spinning.
> I suppose the value could be larger because of lower overhead and latency of read compared with cmpxchg.

Ahh, makes sense.  Perhaps put this information into the commit message.

> Perhaps we should make the default value of spin count differently according to architecture.

Sure, or if there is just a single good choice for the tunable, just use 
that and remove the tunable again.  I guess one aspect here is to 
experiment with different values and see if there's a clear winner.

>>> +# define TUNABLE_CALLBACK_FNDECL(__name, __type)            \
>>> +static inline void                        \
>>> +__always_inline                            \
>>> +do_set_mutex_ ## __name (__type value)            \
>>> +{                                \
>>> +  __mutex_aconf.__name = value;                \
>>> +}                                \
>>> +void                                \
>>> +TUNABLE_CALLBACK (set_mutex_ ## __name) (tunable_val_t *valp) \
>>> +{                                \
>>> +  __type value = (__type) (valp)->numval;            \
>>> +  do_set_mutex_ ## __name (value);                \
>>> +}
>>> +
>>> +TUNABLE_CALLBACK_FNDECL (spin_count, int32_t);
>>
>> I'm not sure if the macro is helpful in this context.

> It is a matter of taste.
> But, perhaps we have other mutex tunables in future.

We can still macroize the code at that point.  But no strong preference 
here.

>>> +void (*const __pthread_mutex_tunables_init_array []) (int, char **, char **)
>>> +  __attribute__ ((section (INIT_SECTION), aligned (sizeof (void *)))) =
>>> +{
>>> +  &mutex_tunables_init
>>> +};
>>
>> Can't you perform the initialization as part of overall pthread initialization?  This would avoid the extra relocation.

> Thanks for your suggestion. I am not sure how to do it now and will take a look at it.

The code would go into nptl/nptl-init.c.  It's just an idea, but I think 
it should be possible to make it work.

Thanks,
Florian
kemi May 14, 2018, 4:03 a.m. UTC | #7
On 2018年05月08日 23:44, Florian Weimer wrote:
> On 05/02/2018 01:06 PM, kemi wrote:
>> Hi, Florian
>>      Thanks for your time to review.
>>
>> On 2018年05月02日 16:04, Florian Weimer wrote:
>>> On 04/25/2018 04:56 AM, Kemi Wang wrote:
>>>
>>>> +  mutex {
>>>> +    spin_count {
>>>> +      type: INT_32
>>>> +      minval: 0
>>>> +      maxval: 30000
>>>> +      default: 1000
>>>> +    }
>>>
>>> How did you come up with the default and maximum values?  Larger maximum values might be useful for testing boundary conditions.
>>>
>>
>> For the maximum value of spin count:
>> Please notice that mutex->__data.__spins += (cnt - mutex->__data.__spins) / 8, and the variable *cnt* could reach
>> the value of spin count due to spinning timeout. In such case, mutex->__data.__spins is increased and could be close to *cnt*
>> (close to the value of spin count). Keeping the value of spin count less than MAX_SHORT can avoid the overflow of
>>   mutex->__data.__spins variable with the possible type of short.
> 
> Could you add this as a comment, please?
> 

Sure:)

>> For the default value of spin count:
>> I referred to the previous number of 100 times for trylock in the loop. When this mode is changed to read only while spinning.
>> I suppose the value could be larger because of lower overhead and latency of read compared with cmpxchg.
> 
> Ahh, makes sense.  Perhaps put this information into the commit message.
> 

I investigated more on the default value of spin count recently.

It's obvious that we should provide a larger default value since the spinning way would be changed from "TRYLOCK"(cmpxchg) to 
read only while spinning. But it's not a trivial issue to determine which one is the best if possible.
The latency of atomic operation and read (e.g. cmpxchg) is determined by many factors, such as the position of cache line by which
the data is owned, the state of cache line(M/E/S/I or even O), cache line transformation and etc.
And some research report[1](Fig2 in that paper) before has shown that the latency of cmpxchg is 1.5x longer than single read at
the same condition in Haswell.

So, let's set the default value of spin count as 150, and run some benchmark to test it.

What's your idea?

[1] Lesani, Mohsen, Todd Millstein, and Jens Palsberg. "Automatic Atomicity Verification for Clients of Concurrent Data Structures." 
International Conference on Computer Aided Verification. Springer, Cham, 2014.

>> Perhaps we should make the default value of spin count differently according to architecture.
> 
> Sure, or if there is just a single good choice for the tunable, just use that and remove the tunable again.  I guess one aspect here is to experiment with different values and see if there's a clear winner.
>

Two reasons for keeping the tunables here:
1) The overhead of instructions are architecture-specific, so, it is hard or even impossible to have a perfect default value that fits all the architecture well.
E.g. The pause instruction in the Skylake platform is 10x expensive than before.

2) There are kinds of workload which may need a different spin timeout.
I have heard many grumble of the pthread adaptive spin mutex from customers that does not work well in their practical workload. 
Let's keep a tunable here for them.

>>>> +# define TUNABLE_CALLBACK_FNDECL(__name, __type)            \
>>>> +static inline void                        \
>>>> +__always_inline                            \
>>>> +do_set_mutex_ ## __name (__type value)            \
>>>> +{                                \
>>>> +  __mutex_aconf.__name = value;                \
>>>> +}                                \
>>>> +void                                \
>>>> +TUNABLE_CALLBACK (set_mutex_ ## __name) (tunable_val_t *valp) \
>>>> +{                                \
>>>> +  __type value = (__type) (valp)->numval;            \
>>>> +  do_set_mutex_ ## __name (value);                \
>>>> +}
>>>> +
>>>> +TUNABLE_CALLBACK_FNDECL (spin_count, int32_t);
>>>
>>> I'm not sure if the macro is helpful in this context.
> 
>> It is a matter of taste.
>> But, perhaps we have other mutex tunables in future.
> 
> We can still macroize the code at that point.  But no strong preference here.
> 

That's all right.

>>>> +void (*const __pthread_mutex_tunables_init_array []) (int, char **, char **)
>>>> +  __attribute__ ((section (INIT_SECTION), aligned (sizeof (void *)))) =
>>>> +{
>>>> +  &mutex_tunables_init
>>>> +};
>>>
>>> Can't you perform the initialization as part of overall pthread initialization?  This would avoid the extra relocation.
> 
>> Thanks for your suggestion. I am not sure how to do it now and will take a look at it.
> 
> The code would go into nptl/nptl-init.c.  It's just an idea, but I think it should be possible to make it work.
> 

thanks, will take a look at it and see whether we can get benefit.
> Thanks,
> Florian
kemi May 14, 2018, 5:03 a.m. UTC | #8
On 2018年05月14日 12:03, kemi wrote:
> 
> 
> On 2018年05月08日 23:44, Florian Weimer wrote:
>> On 05/02/2018 01:06 PM, kemi wrote:
>>> Hi, Florian
>>>      Thanks for your time to review.
>>>
>>> On 2018年05月02日 16:04, Florian Weimer wrote:
>>>> On 04/25/2018 04:56 AM, Kemi Wang wrote:
>>>>
>>>>> +  mutex {
>>>>> +    spin_count {
>>>>> +      type: INT_32
>>>>> +      minval: 0
>>>>> +      maxval: 30000
>>>>> +      default: 1000
>>>>> +    }
>>>>
>>>> How did you come up with the default and maximum values?  Larger maximum values might be useful for testing boundary conditions.
>>>>
>>>
>>> For the maximum value of spin count:
>>> Please notice that mutex->__data.__spins += (cnt - mutex->__data.__spins) / 8, and the variable *cnt* could reach
>>> the value of spin count due to spinning timeout. In such case, mutex->__data.__spins is increased and could be close to *cnt*
>>> (close to the value of spin count). Keeping the value of spin count less than MAX_SHORT can avoid the overflow of
>>>   mutex->__data.__spins variable with the possible type of short.
>>
>> Could you add this as a comment, please?
>>
> 
> Sure:)
> 
>>> For the default value of spin count:
>>> I referred to the previous number of 100 times for trylock in the loop. When this mode is changed to read only while spinning.
>>> I suppose the value could be larger because of lower overhead and latency of read compared with cmpxchg.
>>
>> Ahh, makes sense.  Perhaps put this information into the commit message.
>>
> 
> I investigated more on the default value of spin count recently.
> 
> It's obvious that we should provide a larger default value since the spinning way would be changed from "TRYLOCK"(cmpxchg) to 
> read only while spinning. But it's not a trivial issue to determine which one is the best if possible.
> The latency of atomic operation and read (e.g. cmpxchg) is determined by many factors, such as the position of cache line by which
> the data is owned, the state of cache line(M/E/S/I or even O), cache line transformation and etc.
> And some research report[1](Fig2 in that paper) before has shown that the latency of cmpxchg is 1.5x longer than single read at
> the same condition in Haswell.
> 
> So, let's set the default value of spin count as 150, and run some benchmark to test it.
> 
> What's your idea?
> 
> [1] Lesani, Mohsen, Todd Millstein, and Jens Palsberg. "Automatic Atomicity Verification for Clients of Concurrent Data Structures." 
> International Conference on Computer Aided Verification. Springer, Cham, 2014.
> 

The follow paper is the correct reference. Misoperation before.
Schweizer, Hermann, Maciej Besta, and Torsten Hoefler. "Evaluating the cost of atomic operations on modern architectures." Parallel 
Architecture and Compilation (PACT), 2015 International Conference on. IEEE, 2015.

>>> Perhaps we should make the default value of spin count differently according to architecture.
>>
>> Sure, or if there is just a single good choice for the tunable, just use that and remove the tunable again.  I guess one aspect here is to experiment with different values and see if there's a clear winner.
>>
> 
> Two reasons for keeping the tunables here:
> 1) The overhead of instructions are architecture-specific, so, it is hard or even impossible to have a perfect default value that fits all the architecture well.
> E.g. The pause instruction in the Skylake platform is 10x expensive than before.
> 
> 2) There are kinds of workload which may need a different spin timeout.
> I have heard many grumble of the pthread adaptive spin mutex from customers that does not work well in their practical workload. 
> Let's keep a tunable here for them.
> 
>>>>> +# define TUNABLE_CALLBACK_FNDECL(__name, __type)            \
>>>>> +static inline void                        \
>>>>> +__always_inline                            \
>>>>> +do_set_mutex_ ## __name (__type value)            \
>>>>> +{                                \
>>>>> +  __mutex_aconf.__name = value;                \
>>>>> +}                                \
>>>>> +void                                \
>>>>> +TUNABLE_CALLBACK (set_mutex_ ## __name) (tunable_val_t *valp) \
>>>>> +{                                \
>>>>> +  __type value = (__type) (valp)->numval;            \
>>>>> +  do_set_mutex_ ## __name (value);                \
>>>>> +}
>>>>> +
>>>>> +TUNABLE_CALLBACK_FNDECL (spin_count, int32_t);
>>>>
>>>> I'm not sure if the macro is helpful in this context.
>>
>>> It is a matter of taste.
>>> But, perhaps we have other mutex tunables in future.
>>
>> We can still macroize the code at that point.  But no strong preference here.
>>
> 
> That's all right.
> 
>>>>> +void (*const __pthread_mutex_tunables_init_array []) (int, char **, char **)
>>>>> +  __attribute__ ((section (INIT_SECTION), aligned (sizeof (void *)))) =
>>>>> +{
>>>>> +  &mutex_tunables_init
>>>>> +};
>>>>
>>>> Can't you perform the initialization as part of overall pthread initialization?  This would avoid the extra relocation.
>>
>>> Thanks for your suggestion. I am not sure how to do it now and will take a look at it.
>>
>> The code would go into nptl/nptl-init.c.  It's just an idea, but I think it should be possible to make it work.
>>
> 
> thanks, will take a look at it and see whether we can get benefit.
>> Thanks,
>> Florian
Florian Weimer May 14, 2018, 7:30 a.m. UTC | #9
On 05/14/2018 06:03 AM, kemi wrote:

> So, let's set the default value of spin count as 150, and run some benchmark to test it.
> 
> What's your idea?

Picking an arbitrary value for now is good enough, and using benchmarks 
to tune it in the future certainly makes sense.

> Two reasons for keeping the tunables here:
> 1) The overhead of instructions are architecture-specific, so, it is hard or even impossible to have a perfect default value that fits all the architecture well.
> E.g. The pause instruction in the Skylake platform is 10x expensive than before.

If we have particularly good numbers for one architecture, we could 
supply an architecture-specific default.

Thanks,
Florian
kemi May 14, 2018, 7:37 a.m. UTC | #10
On 2018年05月14日 15:30, Florian Weimer wrote:
> On 05/14/2018 06:03 AM, kemi wrote:
> 
>> So, let's set the default value of spin count as 150, and run some benchmark to test it.
>>
>> What's your idea?
> 
> Picking an arbitrary value for now is good enough, and using benchmarks to tune it in the future certainly makes sense.
> 

Agree.
And will enhance lock benchmark in V3.

>> Two reasons for keeping the tunables here:
>> 1) The overhead of instructions are architecture-specific, so, it is hard or even impossible to have a perfect default value that fits all the architecture well.
>> E.g. The pause instruction in the Skylake platform is 10x expensive than before.
> 
> If we have particularly good numbers for one architecture, we could supply an architecture-specific default.
> 

Agree, and I will do later for x86 architecture.

> Thanks,
> Florian
diff mbox series

Patch

diff --git a/ChangeLog b/ChangeLog
index efa8d39..4750b11 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,11 @@ 
+2018-04-24  Kemi Wang <kemi.wang@intel.com>
+
+	* elf/dl-tunables.list: Add glibc.mutex.spin_count entry.
+	* manual/tunables.texi: Add glibc.mutex.spin_count description.
+	* nptl/Makefile: Add pthread_mutex_conf.c for compilation.
+	* nptl/pthread_mutex_conf.h: New file.
+	* nptl/pthread_mutex_conf.c: New file.
+
 2018-04-24  Joseph Myers  <joseph@codesourcery.com>
 
 	* sysdeps/mach/hurd/dl-sysdep.c: Include <not-errno.h>.
diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
index 1f8ecb8..dc1e8f4 100644
--- a/elf/dl-tunables.list
+++ b/elf/dl-tunables.list
@@ -121,4 +121,13 @@  glibc {
       default: 3
     }
   }
+
+  mutex {
+    spin_count {
+      type: INT_32
+      minval: 0
+      maxval: 30000
+      default: 1000
+    }
+  }
 }
diff --git a/manual/tunables.texi b/manual/tunables.texi
index be33c9f..c2cf8c6 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -281,6 +281,27 @@  of try lock attempts.
 The default value of this tunable is @samp{3}.
 @end deftp
 
+@node Pthread Mutex Tunables
+@section Pthread Mutex Tunables
+@cindex pthread mutex tunables
+
+@deftp {Tunable namespace} glibc.mutex
+Behavior of pthread mutex can be tuned to gain performance improvement
+according to specific hardware capability and workload character by setting
+the following tunables in the @code{mutex} namespace.
+@end deftp
+
+@deftp Tunable glibc.mutex.spin_count
+The @code{glibc.mutex.spin_count} tunable sets the maximum spin times that
+a thread should spin on the lock before calling into the kernel to block.
+Adaptive spin is used for the mutex initialized with PTHREAD_MUTEX_ADAPTIVE_NP
+GNU extension. It affects both pthread_mutex_lock and pthread_mutex_timedlock.
+The spinning is done in case of either the maximum spin times is reached or
+the lock is acquired during spinning.
+
+The default value of this tunable is @samp{1000}.
+@end deftp
+
 @node Hardware Capability Tunables
 @section Hardware Capability Tunables
 @cindex hardware capability tunables
diff --git a/nptl/Makefile b/nptl/Makefile
index 94be92c..bd1096f 100644
--- a/nptl/Makefile
+++ b/nptl/Makefile
@@ -139,7 +139,8 @@  libpthread-routines = nptl-init vars events version pt-interp \
 		      pthread_mutex_getprioceiling \
 		      pthread_mutex_setprioceiling \
 		      pthread_setname pthread_getname \
-		      pthread_setattr_default_np pthread_getattr_default_np
+		      pthread_setattr_default_np pthread_getattr_default_np \
+		      pthread_mutex_conf
 #		      pthread_setuid pthread_seteuid pthread_setreuid \
 #		      pthread_setresuid \
 #		      pthread_setgid pthread_setegid pthread_setregid \
diff --git a/nptl/pthread_mutex_conf.c b/nptl/pthread_mutex_conf.c
new file mode 100644
index 0000000..6340b5d
--- /dev/null
+++ b/nptl/pthread_mutex_conf.c
@@ -0,0 +1,77 @@ 
+/* pthread_mutex_conf.c: Pthread mutex tunable parameters.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include <pthreadP.h>
+#include <init-arch.h>
+#include <pthread_mutex_conf.h>
+#include <unistd.h>
+
+#if HAVE_TUNABLES
+# define TUNABLE_NAMESPACE mutex
+#endif
+#include <elf/dl-tunables.h>
+
+
+struct mutex_config __mutex_aconf =
+  {
+    /* The maximum times a thread spin on the lock before going to block */
+    .spin_count = 1000,
+  };
+
+#if HAVE_TUNABLES
+# define TUNABLE_CALLBACK_FNDECL(__name, __type)			\
+static inline void						\
+__always_inline							\
+do_set_mutex_ ## __name (__type value)			\
+{								\
+  __mutex_aconf.__name = value;				\
+}								\
+void								\
+TUNABLE_CALLBACK (set_mutex_ ## __name) (tunable_val_t *valp) \
+{								\
+  __type value = (__type) (valp)->numval;			\
+  do_set_mutex_ ## __name (value);				\
+}
+
+TUNABLE_CALLBACK_FNDECL (spin_count, int32_t);
+#endif
+
+static void
+mutex_tunables_init (int argc __attribute__ ((unused)),
+			      char **argv  __attribute__ ((unused)),
+					      char **environ)
+{
+#if HAVE_TUNABLES
+
+  TUNABLE_GET (spin_count, int32_t,
+		  TUNABLE_CALLBACK (set_mutex_spin_count));
+#endif
+}
+
+#ifdef SHARED
+# define INIT_SECTION ".init_array"
+#else
+# define INIT_SECTION ".preinit_array"
+#endif
+
+void (*const __pthread_mutex_tunables_init_array []) (int, char **, char **)
+  __attribute__ ((section (INIT_SECTION), aligned (sizeof (void *)))) =
+{
+  &mutex_tunables_init
+};
diff --git a/nptl/pthread_mutex_conf.h b/nptl/pthread_mutex_conf.h
new file mode 100644
index 0000000..e5b027c
--- /dev/null
+++ b/nptl/pthread_mutex_conf.h
@@ -0,0 +1,31 @@ 
+/* pthread_mutex_conf.h: Pthread mutex tunable parameters.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+#ifndef _PTHREAD_MUTEX_CONF_H
+#define _PTHREAD_MUTEX_CONF_H 1
+
+#include <pthread.h>
+#include <time.h>
+
+struct mutex_config
+{
+  int spin_count;
+};
+
+extern struct mutex_config __mutex_aconf attribute_hidden;
+
+#endif