Message ID | CAMe9rOp7EdV2NPzq0Tz-GKNtf=WpTyvaL9anA8-dTufa-CS-tQ@mail.gmail.com |
---|---|
State | New |
Headers | show |
Series | V2 [PATCH] benchtests: Restore the clock_gettime option | expand |
* H. J. Lu: > On Wed, May 20, 2020 at 11:05 AM Florian Weimer <fweimer@redhat.com> wrote: >> >> * Alexander Monakov via Libc-alpha: >> >> > I am well aware. Again: rdtsc does not count CPU cycles on recent >> > Intel CPUs. >> >> H.J. probably has a different view on what those “recent Intel CPUs” >> are. 8-) I have not reviewed the mechanics of the patch, but if we need >> this for some CPUs, we should make the change. >> > > Here the patch with updated commit message: > > commit 7621e38bf3c58b2d0359545f1f2898017fd89d05 > Author: Wilco Dijkstra <Wilco.Dijkstra@arm.com> > Date: Tue Jan 29 17:43:45 2019 +0000 > > Add generic hp-timing support > > removed the clock_gettime option. Restore the clock_gettime option for > some x86 CPUs on which value from RDTSC may not be incremented at a fixed > rate. > > OK for master? Patch looks okay to me. Thanks, Florian
On 20/05/2020 15:14, H.J. Lu via Libc-alpha wrote: > On Wed, May 20, 2020 at 11:05 AM Florian Weimer <fweimer@redhat.com> wrote: >> >> * Alexander Monakov via Libc-alpha: >> >>> I am well aware. Again: rdtsc does not count CPU cycles on recent >>> Intel CPUs. >> >> H.J. probably has a different view on what those “recent Intel CPUs” >> are. 8-) I have not reviewed the mechanics of the patch, but if we need >> this for some CPUs, we should make the change. >> > > Here the patch with updated commit message: > > commit 7621e38bf3c58b2d0359545f1f2898017fd89d05 > Author: Wilco Dijkstra <Wilco.Dijkstra@arm.com> > Date: Tue Jan 29 17:43:45 2019 +0000 > > Add generic hp-timing support > > removed the clock_gettime option. Restore the clock_gettime option for > some x86 CPUs on which value from RDTSC may not be incremented at a fixed > rate. > > OK for master? What kind of result discrepancies are you seeing using hp-timing.h on x86? The clock_gettime help in what exactly here (it was not clear from discussion, neither from patch submission)? I am asking because we rely on hp-timing.h to get the loader profiling, so if this does provide accurate information in some cases it might be the case to disable on ld.so/libc.so as well.
On Wed, May 20, 2020 at 11:38 AM Adhemerval Zanella via Libc-alpha <libc-alpha@sourceware.org> wrote: > > > > On 20/05/2020 15:14, H.J. Lu via Libc-alpha wrote: > > On Wed, May 20, 2020 at 11:05 AM Florian Weimer <fweimer@redhat.com> wrote: > >> > >> * Alexander Monakov via Libc-alpha: > >> > >>> I am well aware. Again: rdtsc does not count CPU cycles on recent > >>> Intel CPUs. > >> > >> H.J. probably has a different view on what those “recent Intel CPUs” > >> are. 8-) I have not reviewed the mechanics of the patch, but if we need > >> this for some CPUs, we should make the change. > >> > > > > Here the patch with updated commit message: > > > > commit 7621e38bf3c58b2d0359545f1f2898017fd89d05 > > Author: Wilco Dijkstra <Wilco.Dijkstra@arm.com> > > Date: Tue Jan 29 17:43:45 2019 +0000 > > > > Add generic hp-timing support > > > > removed the clock_gettime option. Restore the clock_gettime option for > > some x86 CPUs on which value from RDTSC may not be incremented at a fixed > > rate. > > > > OK for master? > > What kind of result discrepancies are you seeing using hp-timing.h on x86? > The clock_gettime help in what exactly here (it was not clear from > discussion, neither from patch submission)? > > I am asking because we rely on hp-timing.h to get the loader profiling, > so if this does provide accurate information in some cases it might be > the case to disable on ld.so/libc.so as well. We are using glibc benchmarks for tuning memory/string functions. We want to compare results between RDTSC and clock_gettime.
On 5/20/20 2:14 PM, H.J. Lu via Libc-alpha wrote: > On Wed, May 20, 2020 at 11:05 AM Florian Weimer <fweimer@redhat.com> wrote: >> >> * Alexander Monakov via Libc-alpha: >> >>> I am well aware. Again: rdtsc does not count CPU cycles on recent >>> Intel CPUs. >> >> H.J. probably has a different view on what those “recent Intel CPUs” >> are. 8-) I have not reviewed the mechanics of the patch, but if we need >> this for some CPUs, we should make the change. >> > > Here the patch with updated commit message: > > commit 7621e38bf3c58b2d0359545f1f2898017fd89d05 > Author: Wilco Dijkstra <Wilco.Dijkstra@arm.com> > Date: Tue Jan 29 17:43:45 2019 +0000 > > Add generic hp-timing support > > removed the clock_gettime option. Restore the clock_gettime option for > some x86 CPUs on which value from RDTSC may not be incremented at a fixed > rate. > > OK for master? OK for master. Reviewed-by: Carlos O'Donell <carlos@redhat.com> > Thanks. > > From 7e48f7adbd53c18df7ab5fdd488bfcc134627480 Mon Sep 17 00:00:00 2001 > From: "H.J. Lu" <hjl.tools@gmail.com> > Date: Mon, 18 May 2020 17:28:42 -0700 > Subject: [PATCH] benchtests: Restore the clock_gettime option > > commit 7621e38bf3c58b2d0359545f1f2898017fd89d05 > Author: Wilco Dijkstra <Wilco.Dijkstra@arm.com> > Date: Tue Jan 29 17:43:45 2019 +0000 > > Add generic hp-timing support > > removed the clock_gettime option. Restore the clock_gettime option for > some x86 CPUs on which value from RDTSC may not be incremented at a fixed > rate. > --- > benchtests/Makefile | 6 ++++++ > benchtests/README | 7 ++++++- > benchtests/bench-timing.h | 6 +++++- > 3 files changed, 17 insertions(+), 2 deletions(-) > > diff --git a/benchtests/Makefile b/benchtests/Makefile > index 335d643ecb..99e90d17a0 100644 > --- a/benchtests/Makefile > +++ b/benchtests/Makefile > @@ -132,11 +132,17 @@ endif > > CPPFLAGS-nonlib += -DDURATION=$(BENCH_DURATION) -D_ISOMAC > > +# Use clock_gettime to measure performance of functions. The default is > +# to use the architecture-specific high precision timing instructions. > +ifdef USE_CLOCK_GETTIME > +CPPFLAGS-nonlib += -DUSE_CLOCK_GETTIME > +else > # On x86 processors, use RDTSCP, instead of RDTSC, to measure performance > # of functions. All x86 processors since 2010 support RDTSCP instruction. > ifdef USE_RDTSCP > CPPFLAGS-nonlib += -DUSE_RDTSCP > endif > +endif > > DETAILED_OPT := > > diff --git a/benchtests/README b/benchtests/README > index c4f03fd872..f440f3295a 100644 > --- a/benchtests/README > +++ b/benchtests/README > @@ -27,7 +27,12 @@ BENCH_DURATION. > > The benchmark suite does function call measurements using architecture-specific > high precision timing instructions whenever available. When such support is > -not available, it uses clock_gettime (CLOCK_MONOTONIC). > +not available, it uses clock_gettime (CLOCK_MONOTONIC). One can force the > +benchmark to use clock_gettime by invoking make as follows: > + > + $ make USE_CLOCK_GETTIME=1 bench > + > +Again, one must run `make bench-clean' before changing the measurement method. > > On x86 processors, RDTSCP instruction provides more precise timing data > than RDTSC instruction. All x86 processors since 2010 support RDTSCP > diff --git a/benchtests/bench-timing.h b/benchtests/bench-timing.h > index 5b9a8384bb..844a7727c9 100644 > --- a/benchtests/bench-timing.h > +++ b/benchtests/bench-timing.h > @@ -19,7 +19,11 @@ > #undef attribute_hidden > #define attribute_hidden > #define __clock_gettime clock_gettime > -#include <hp-timing.h> > +#ifdef USE_CLOCK_GETTIME > +# include <sysdeps/generic/hp-timing.h> > +#else > +# include <hp-timing.h> > +#endif > #include <stdint.h> > > #define GL(x) _##x > -- > 2.26.2 >
From 7e48f7adbd53c18df7ab5fdd488bfcc134627480 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" <hjl.tools@gmail.com> Date: Mon, 18 May 2020 17:28:42 -0700 Subject: [PATCH] benchtests: Restore the clock_gettime option commit 7621e38bf3c58b2d0359545f1f2898017fd89d05 Author: Wilco Dijkstra <Wilco.Dijkstra@arm.com> Date: Tue Jan 29 17:43:45 2019 +0000 Add generic hp-timing support removed the clock_gettime option. Restore the clock_gettime option for some x86 CPUs on which value from RDTSC may not be incremented at a fixed rate. --- benchtests/Makefile | 6 ++++++ benchtests/README | 7 ++++++- benchtests/bench-timing.h | 6 +++++- 3 files changed, 17 insertions(+), 2 deletions(-) diff --git a/benchtests/Makefile b/benchtests/Makefile index 335d643ecb..99e90d17a0 100644 --- a/benchtests/Makefile +++ b/benchtests/Makefile @@ -132,11 +132,17 @@ endif CPPFLAGS-nonlib += -DDURATION=$(BENCH_DURATION) -D_ISOMAC +# Use clock_gettime to measure performance of functions. The default is +# to use the architecture-specific high precision timing instructions. +ifdef USE_CLOCK_GETTIME +CPPFLAGS-nonlib += -DUSE_CLOCK_GETTIME +else # On x86 processors, use RDTSCP, instead of RDTSC, to measure performance # of functions. All x86 processors since 2010 support RDTSCP instruction. ifdef USE_RDTSCP CPPFLAGS-nonlib += -DUSE_RDTSCP endif +endif DETAILED_OPT := diff --git a/benchtests/README b/benchtests/README index c4f03fd872..f440f3295a 100644 --- a/benchtests/README +++ b/benchtests/README @@ -27,7 +27,12 @@ BENCH_DURATION. The benchmark suite does function call measurements using architecture-specific high precision timing instructions whenever available. When such support is -not available, it uses clock_gettime (CLOCK_MONOTONIC). +not available, it uses clock_gettime (CLOCK_MONOTONIC). One can force the +benchmark to use clock_gettime by invoking make as follows: + + $ make USE_CLOCK_GETTIME=1 bench + +Again, one must run `make bench-clean' before changing the measurement method. On x86 processors, RDTSCP instruction provides more precise timing data than RDTSC instruction. All x86 processors since 2010 support RDTSCP diff --git a/benchtests/bench-timing.h b/benchtests/bench-timing.h index 5b9a8384bb..844a7727c9 100644 --- a/benchtests/bench-timing.h +++ b/benchtests/bench-timing.h @@ -19,7 +19,11 @@ #undef attribute_hidden #define attribute_hidden #define __clock_gettime clock_gettime -#include <hp-timing.h> +#ifdef USE_CLOCK_GETTIME +# include <sysdeps/generic/hp-timing.h> +#else +# include <hp-timing.h> +#endif #include <stdint.h> #define GL(x) _##x -- 2.26.2