Message ID | 20200519203053.9854-1-hjl.tools@gmail.com |
---|---|
State | New |
Series | benchtests: Restore the clock_gettime option |
On Tue, 19 May 2020, H.J. Lu via Libc-alpha wrote:

> commit 7621e38bf3c58b2d0359545f1f2898017fd89d05
> Author: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
> Date: Tue Jan 29 17:43:45 2019 +0000
>
>     Add generic hp-timing support
>
> removed the clock_gettime option. On x86, fewer cycles doesn't
> necessarily mean faster execution due to frequency drop. We should
> restore the clock_gettime option.

Can you please elaborate on which x86 CPUs you have in mind here? Since Nehalem (2008), when invariant TSC was introduced, the rdtsc(p) instructions have counted at a fixed rate rather than at the CPU clock rate. And before 2008, there were no turbo frequencies and no AVX frequency drop.

Alexander
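Whether a given x86 CPU advertises an invariant TSC can be checked through CPUID. The following is a minimal sketch, assuming GCC or Clang on x86: `<cpuid.h>` and `__get_cpuid` are compiler-provided, and `has_invariant_tsc` is a hypothetical helper name that is not part of glibc. It reads bit 8 of EDX from CPUID leaf 0x80000007, which is the documented invariant TSC flag.

```c
#if (defined (__x86_64__) || defined (__i386__)) && defined (__GNUC__)
# include <cpuid.h>   /* __get_cpuid, provided by GCC and Clang */
#endif

/* Hypothetical helper: returns 1 if the CPU reports an invariant TSC,
   0 if it does not, and -1 if this cannot be determined here (non-x86
   target, unsupported compiler, or CPUID leaf not available).  */
int
has_invariant_tsc (void)
{
#if (defined (__x86_64__) || defined (__i386__)) && defined (__GNUC__)
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

  /* __get_cpuid returns 0 when the requested leaf is not supported.  */
  if (!__get_cpuid (0x80000007u, &eax, &ebx, &ecx, &edx))
    return -1;

  /* CPUID.80000007H:EDX[8] is the invariant TSC flag.  */
  return (edx >> 8) & 1u;
#else
  return -1;
#endif
}
```

On Linux, the same information usually appears as the `constant_tsc` and `nonstop_tsc` flags in `/proc/cpuinfo`.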
On Tue, May 19, 2020 at 2:18 PM Alexander Monakov <amonakov@ispras.ru> wrote:
>
> Can you please elaborate more which x86 CPUs you have in mind here,
> as since Nehalem (2008) when invariant TSC was introduced, the rdtsc(p)
> instructions count at a fixed rate rather than at CPU clock rate.
> And before 2008, there were no turbo frequencies and no AVX frequency drop.

https://stackoverflow.com/questions/56852812/simd-instructions-lowering-cpu-frequency
On Tue, 19 May 2020, H.J. Lu wrote:

> https://stackoverflow.com/questions/56852812/simd-instructions-lowering-cpu-frequency

I am well aware. Again: rdtsc does not count CPU cycles on recent Intel CPUs. RDTSC reads a register that increments at a fixed rate, so its increment is proportional to wall-clock time. When a workload causes a reduction in the actual CPU frequency, the RDTSC increment frequency is not affected, so it remains suitable for measuring actual wall-clock time.

Alexander
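The claim that the TSC advances in proportion to wall-clock time can be checked empirically. The sketch below is an illustration only, assuming GCC or Clang (`__rdtsc` from `<x86intrin.h>`) and a POSIX `clock_gettime`; `monotonic_ns` and `tsc_ticks_per_ns` are hypothetical helper names. It measures how many TSC ticks elapse per nanosecond of CLOCK_MONOTONIC time.

```c
#define _POSIX_C_SOURCE 200809L   /* clock_gettime under strict -std= modes */
#include <assert.h>
#include <stdint.h>
#include <time.h>
#if defined (__x86_64__) || defined (__i386__)
# include <x86intrin.h>           /* __rdtsc (GCC and Clang) */
#endif

/* Wall-clock time in nanoseconds, read from CLOCK_MONOTONIC.  */
uint64_t
monotonic_ns (void)
{
  struct timespec ts;
  int rc = clock_gettime (CLOCK_MONOTONIC, &ts);
  assert (rc == 0);
  (void) rc;                      /* avoid an unused warning with NDEBUG */
  return (uint64_t) ts.tv_sec * 1000000000ull + (uint64_t) ts.tv_nsec;
}

/* Hypothetical helper: spin for at least MIN_NS nanoseconds and return
   the number of TSC ticks per nanosecond observed over that interval.
   Returns 0.0 on non-x86 targets or if no time elapsed.  */
double
tsc_ticks_per_ns (uint64_t min_ns)
{
#if defined (__x86_64__) || defined (__i386__)
  uint64_t t0 = monotonic_ns ();
  uint64_t c0 = __rdtsc ();
  uint64_t t1;

  do
    t1 = monotonic_ns ();
  while (t1 - t0 < min_ns);

  uint64_t c1 = __rdtsc ();
  if (t1 == t0)
    return 0.0;
  return (double) (c1 - c0) / (double) (t1 - t0);
#else
  (void) min_ns;
  return 0.0;
#endif
}
```

On a CPU with an invariant TSC, calling `tsc_ticks_per_ns` once while the machine is idle and once while it runs a workload that lowers the core frequency should give roughly the same ratio. A noticeably different ratio would suggest the TSC on that machine tracks the core clock.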
On Tue, May 19, 2020 at 3:16 PM Alexander Monakov <amonakov@ispras.ru> wrote:
>
> I am well aware. Again: rdtsc does not count CPU cycles on recent Intel CPUs.
> RDTSC reads a register that increments at a fixed rate. So its increment is
> proportional to wall clock time. When a workload is causing a reduction in
> actual CPU frequency, RDTSC increment frequency is not affected and so it
> remains suitable for measuring the actual wall-clock time.
>
> Alexander

We'd like to have it as an option on x86.
On Wed, 20 May 2020, H.J. Lu wrote:

> We'd like to have it as an option on x86.

Then I would suggest rewording the proposed commit message to explain that. As written, the commit message gives a misleading/wrong motivation.

Alexander
* Alexander Monakov via Libc-alpha:

> I am well aware. Again: rdtsc does not count CPU cycles on recent
> Intel CPUs.

H.J. probably has a different view on what those “recent Intel CPUs” are. 8-)

I have not reviewed the mechanics of the patch, but if we need this for some CPUs, we should make the change.

Thanks,
Florian
    diff --git a/benchtests/Makefile b/benchtests/Makefile
    index 335d643ecb..99e90d17a0 100644
    --- a/benchtests/Makefile
    +++ b/benchtests/Makefile
    @@ -132,11 +132,17 @@ endif
     
     CPPFLAGS-nonlib += -DDURATION=$(BENCH_DURATION) -D_ISOMAC
     
    +# Use clock_gettime to measure performance of functions. The default is
    +# to use the architecture-specific high precision timing instructions.
    +ifdef USE_CLOCK_GETTIME
    +CPPFLAGS-nonlib += -DUSE_CLOCK_GETTIME
    +else
     # On x86 processors, use RDTSCP, instead of RDTSC, to measure performance
     # of functions. All x86 processors since 2010 support RDTSCP instruction.
     ifdef USE_RDTSCP
     CPPFLAGS-nonlib += -DUSE_RDTSCP
     endif
    +endif
     
     DETAILED_OPT :=
     
    diff --git a/benchtests/README b/benchtests/README
    index c4f03fd872..f440f3295a 100644
    --- a/benchtests/README
    +++ b/benchtests/README
    @@ -27,7 +27,12 @@ BENCH_DURATION.
     
     The benchmark suite does function call measurements using architecture-specific
     high precision timing instructions whenever available. When such support is
    -not available, it uses clock_gettime (CLOCK_MONOTONIC).
    +not available, it uses clock_gettime (CLOCK_MONOTONIC). One can force the
    +benchmark to use clock_gettime by invoking make as follows:
    +
    +  $ make USE_CLOCK_GETTIME=1 bench
    +
    +Again, one must run `make bench-clean' before changing the measurement method.
     
     On x86 processors, RDTSCP instruction provides more precise timing data
     than RDTSC instruction. All x86 processors since 2010 support RDTSCP
    diff --git a/benchtests/bench-timing.h b/benchtests/bench-timing.h
    index 5b9a8384bb..844a7727c9 100644
    --- a/benchtests/bench-timing.h
    +++ b/benchtests/bench-timing.h
    @@ -19,7 +19,11 @@
     #undef attribute_hidden
     #define attribute_hidden
     #define __clock_gettime clock_gettime
    -#include <hp-timing.h>
    +#ifdef USE_CLOCK_GETTIME
    +# include <sysdeps/generic/hp-timing.h>
    +#else
    +# include <hp-timing.h>
    +#endif
     #include <stdint.h>
     
     #define GL(x) _##x
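For readers unfamiliar with the fallback that `USE_CLOCK_GETTIME` selects, the general technique is to read clock_gettime (CLOCK_MONOTONIC) before and after a batch of calls and divide the difference by the iteration count. The sketch below illustrates that idea only; it is not the glibc implementation, and `bench_now_ns`, `bench_mean_ns`, `sample_workload`, and `workload_calls` are hypothetical names introduced for the example.

```c
#define _POSIX_C_SOURCE 200809L   /* clock_gettime under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time as a single nanosecond count.  */
uint64_t
bench_now_ns (void)
{
  struct timespec ts;
  int rc = clock_gettime (CLOCK_MONOTONIC, &ts);
  assert (rc == 0);
  (void) rc;                      /* avoid an unused warning with NDEBUG */
  return (uint64_t) ts.tv_sec * 1000000000ull + (uint64_t) ts.tv_nsec;
}

/* Stand-in for the function under test; it only counts its calls.  */
size_t workload_calls;

void
sample_workload (void)
{
  ++workload_calls;
}

/* Hypothetical helper: call FN ITERS times and return the mean
   wall-clock time per call in nanoseconds (0.0 when ITERS is 0).  */
double
bench_mean_ns (void (*fn) (void), size_t iters)
{
  if (iters == 0)
    return 0.0;

  uint64_t start = bench_now_ns ();
  for (size_t i = 0; i < iters; ++i)
    fn ();
  uint64_t stop = bench_now_ns ();

  return (double) (stop - start) / (double) iters;
}
```

Timing the whole batch rather than each call keeps the overhead of clock_gettime itself out of the per-call figure, at the cost of hiding per-call variance.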