Message ID | 20170525143624.GA7747@lucon.org |
---|---|
State | New |
On Thu, May 25, 2017 at 7:36 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
> __x86_shared_non_temporal_threshold was set to 6 times of per-core
> shared cache size, based on the large memcpy micro benchmark in glibc
> on a 8-core processor. For a processor with more than 8 cores, the
> threshold is too low. Set __x86_shared_non_temporal_threshold to the
> 3/4 of the total shared cache size so that it is unchanged on 8-core
> processors. On processors with less than 8 cores, the threshold is
> lower.
>
> Any comments?

Any objections?

> H.J.
> ---
>         * sysdeps/x86/cacheinfo.c (__x86_shared_non_temporal_threshold):
>         Set to the 3/4 of the total shared cache size.
> ---
>  sysdeps/x86/cacheinfo.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
> index 1ccbe41..3434d97 100644
> --- a/sysdeps/x86/cacheinfo.c
> +++ b/sysdeps/x86/cacheinfo.c
> @@ -766,6 +766,8 @@ intel_bug_no_cache_info:
>
>    /* The large memcpy micro benchmark in glibc shows that 6 times of
>       shared cache size is the approximate value above which non-temporal
> -     store becomes faster. */
> -  __x86_shared_non_temporal_threshold = __x86_shared_cache_size * 6;
> +     store becomes faster on a 8-core processor. This is the 3/4 of the
> +     total shared cache size. */
> +  __x86_shared_non_temporal_threshold
> +    = __x86_shared_cache_size * threads * 3 / 4;
>  }
> --
> 2.9.4
>
On Tue, May 30, 2017 at 7:44 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, May 25, 2017 at 7:36 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>> __x86_shared_non_temporal_threshold was set to 6 times of per-core
>> shared cache size, based on the large memcpy micro benchmark in glibc
>> on a 8-core processor. For a processor with more than 8 cores, the
>> threshold is too low. Set __x86_shared_non_temporal_threshold to the
>> 3/4 of the total shared cache size so that it is unchanged on 8-core
>> processors. On processors with less than 8 cores, the threshold is
>> lower.
>>
>> Any comments?
>
> Any objections?
>
>> H.J.
>> ---
>>         * sysdeps/x86/cacheinfo.c (__x86_shared_non_temporal_threshold):
>>         Set to the 3/4 of the total shared cache size.
>> ---
>>  sysdeps/x86/cacheinfo.c | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
>> index 1ccbe41..3434d97 100644
>> --- a/sysdeps/x86/cacheinfo.c
>> +++ b/sysdeps/x86/cacheinfo.c
>> @@ -766,6 +766,8 @@ intel_bug_no_cache_info:
>>
>>    /* The large memcpy micro benchmark in glibc shows that 6 times of
>>       shared cache size is the approximate value above which non-temporal
>> -     store becomes faster. */
>> -  __x86_shared_non_temporal_threshold = __x86_shared_cache_size * 6;
>> +     store becomes faster on a 8-core processor. This is the 3/4 of the
>> +     total shared cache size. */
>> +  __x86_shared_non_temporal_threshold
>> +    = __x86_shared_cache_size * threads * 3 / 4;
>>  }
>> --
>> 2.9.4
>>

I will check it today.
diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
index 1ccbe41..3434d97 100644
--- a/sysdeps/x86/cacheinfo.c
+++ b/sysdeps/x86/cacheinfo.c
@@ -766,6 +766,8 @@ intel_bug_no_cache_info:

   /* The large memcpy micro benchmark in glibc shows that 6 times of
      shared cache size is the approximate value above which non-temporal
-     store becomes faster. */
-  __x86_shared_non_temporal_threshold = __x86_shared_cache_size * 6;
+     store becomes faster on a 8-core processor. This is the 3/4 of the
+     total shared cache size. */
+  __x86_shared_non_temporal_threshold
+    = __x86_shared_cache_size * threads * 3 / 4;
 }
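The arithmetic behind the patch can be checked with a small standalone sketch. It is not glibc code: `old_threshold` and `new_threshold` are hypothetical helpers that only mirror the two expressions in the diff, `per_core_size` stands in for `__x86_shared_cache_size`, and `threads` is assumed to be the number of cores sharing the cache, as the patch description implies. The 2 MiB figure used in the note below is an illustrative assumption, not a measured value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the expression removed by the patch:
   6 times the per-core share of the shared cache.  */
size_t
old_threshold (size_t per_core_size)
{
  return per_core_size * 6;
}

/* Hypothetical helper mirroring the expression added by the patch:
   3/4 of the total shared cache size, where the total is assumed to be
   the per-core share multiplied by the number of sharing threads.  */
size_t
new_threshold (size_t per_core_size, unsigned int threads)
{
  return per_core_size * threads * 3 / 4;
}
```

With an assumed per-core share of 2 MiB (2097152 bytes), `new_threshold (2097152, 8)` equals `old_threshold (2097152)`, since 8 * 3 / 4 = 6, so both give 12 MiB. With 4 threads the new formula gives 6 MiB, and with 16 threads it gives 24 MiB, which matches the description: unchanged at 8 cores, lower below, higher above.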