x86: Use _rdtsc intrinsic for HP_TIMING_NOW

Message ID 20181001220836.20131-1-hjl.tools@gmail.com
State New
Headers show
Series
  • x86: Use _rdtsc intrinsic for HP_TIMING_NOW
Related show

Commit Message

H.J. Lu Oct. 1, 2018, 10:08 p.m.
Since _rdtsc intrinsic is supported in GCC 4.9, we can use it for
HP_TIMING_NOW.

	* sysdeps/i386/i686/hp-timing.h: Include <x86intrin.h>.
	(HP_TIMING_NOW): Use _rdtsc.
	* sysdeps/x86_64/hp-timing.h: Just include
	<sysdeps/i386/i686/hp-timing.h>.
---
 sysdeps/i386/i686/hp-timing.h |  4 +++-
 sysdeps/x86_64/hp-timing.h    | 41 +----------------------------------
 2 files changed, 4 insertions(+), 41 deletions(-)

Comments

Adhemerval Zanella Oct. 2, 2018, 1:15 p.m. | #1
On 01/10/2018 19:08, H.J. Lu wrote:
> Since _rdtsc intrinsic is supported in GCC 4.9, we can use it for
> HP_TIMING_NOW.
> 
> 	* sysdeps/i386/i686/hp-timing.h: Include <x86intrin.h>.
> 	(HP_TIMING_NOW): Use _rdtsc.
> 	* sysdeps/x86_64/hp-timing.h: Just include
> 	<sysdeps/i386/i686/hp-timing.h>.
> ---
>  sysdeps/i386/i686/hp-timing.h |  4 +++-
>  sysdeps/x86_64/hp-timing.h    | 41 +----------------------------------
>  2 files changed, 4 insertions(+), 41 deletions(-)
> 
> diff --git a/sysdeps/i386/i686/hp-timing.h b/sysdeps/i386/i686/hp-timing.h
> index 59af526fdb..6eee8c58be 100644
> --- a/sysdeps/i386/i686/hp-timing.h
> +++ b/sysdeps/i386/i686/hp-timing.h
> @@ -20,6 +20,8 @@
>  #ifndef _HP_TIMING_H
>  #define _HP_TIMING_H	1
>  
> +#include <x86intrin.h>
> +
>  /* We always assume having the timestamp register.  */
>  #define HP_TIMING_AVAIL		(1)
>  #define HP_SMALL_TIMING_AVAIL	(1)
> @@ -35,7 +37,7 @@ typedef unsigned long long int hp_timing_t;
>     running in this moment.  This could be changed by using a barrier like
>     'cpuid' right before the `rdtsc' instruciton.  But we are not interested
>     in accurate clock cycles here so we don't do this.  */
> -#define HP_TIMING_NOW(Var)	__asm__ __volatile__ ("rdtsc" : "=A" (Var))
> +#define HP_TIMING_NOW(Var)	({ (Var) = _rdtsc (); })

Do we need a compound statement here? It does not use loops, switches, or
local variables.

>  
>  #include <hp-timing-common.h>
>  
> diff --git a/sysdeps/x86_64/hp-timing.h b/sysdeps/x86_64/hp-timing.h
> index ec543bef03..46236b9777 100644
> --- a/sysdeps/x86_64/hp-timing.h
> +++ b/sysdeps/x86_64/hp-timing.h
> @@ -1,40 +1 @@

[...]

> -
> -#endif /* hp-timing.h */
> +#include <sysdeps/i386/i686/hp-timing.h>
> 

I am not very found of cross abi includes like this one, wouldn't be better
to create a 'sysdeps/x86/hp-timing.h' and for 32-bit use _rdtsc iff
__i686__ is defined otherwise include generic header?
H.J. Lu Oct. 2, 2018, 1:45 p.m. | #2
On Tue, Oct 2, 2018 at 6:15 AM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 01/10/2018 19:08, H.J. Lu wrote:
> > Since _rdtsc intrinsic is supported in GCC 4.9, we can use it for
> > HP_TIMING_NOW.
> >
> >       * sysdeps/i386/i686/hp-timing.h: Include <x86intrin.h>.
> >       (HP_TIMING_NOW): Use _rdtsc.
> >       * sysdeps/x86_64/hp-timing.h: Just include
> >       <sysdeps/i386/i686/hp-timing.h>.
> > ---
> >  sysdeps/i386/i686/hp-timing.h |  4 +++-
> >  sysdeps/x86_64/hp-timing.h    | 41 +----------------------------------
> >  2 files changed, 4 insertions(+), 41 deletions(-)
> >
> > diff --git a/sysdeps/i386/i686/hp-timing.h b/sysdeps/i386/i686/hp-timing.h
> > index 59af526fdb..6eee8c58be 100644
> > --- a/sysdeps/i386/i686/hp-timing.h
> > +++ b/sysdeps/i386/i686/hp-timing.h
> > @@ -20,6 +20,8 @@
> >  #ifndef _HP_TIMING_H
> >  #define _HP_TIMING_H 1
> >
> > +#include <x86intrin.h>
> > +
> >  /* We always assume having the timestamp register.  */
> >  #define HP_TIMING_AVAIL              (1)
> >  #define HP_SMALL_TIMING_AVAIL        (1)
> > @@ -35,7 +37,7 @@ typedef unsigned long long int hp_timing_t;
> >     running in this moment.  This could be changed by using a barrier like
> >     'cpuid' right before the `rdtsc' instruciton.  But we are not interested
> >     in accurate clock cycles here so we don't do this.  */
> > -#define HP_TIMING_NOW(Var)   __asm__ __volatile__ ("rdtsc" : "=A" (Var))
> > +#define HP_TIMING_NOW(Var)   ({ (Var) = _rdtsc (); })
>
> Do we need a compound statement here? It does not use loops, switches, or
> local variables.

I have a followup patch to use _rdtscp which will need a local variable.

> >
> >  #include <hp-timing-common.h>
> >
> > diff --git a/sysdeps/x86_64/hp-timing.h b/sysdeps/x86_64/hp-timing.h
> > index ec543bef03..46236b9777 100644
> > --- a/sysdeps/x86_64/hp-timing.h
> > +++ b/sysdeps/x86_64/hp-timing.h
> > @@ -1,40 +1 @@
>
> [...]
>
> > -
> > -#endif /* hp-timing.h */
> > +#include <sysdeps/i386/i686/hp-timing.h>
> >
>
> I am not very found of cross abi includes like this one, wouldn't be better
> to create a 'sysdeps/x86/hp-timing.h' and for 32-bit use _rdtsc iff
> __i686__ is defined otherwise include generic header?

I can do that.
H.J. Lu Oct. 2, 2018, 4:56 p.m. | #3
On Tue, Oct 2, 2018 at 6:45 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Tue, Oct 2, 2018 at 6:15 AM Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:

> > I am not very found of cross abi includes like this one, wouldn't be better
> > to create a 'sysdeps/x86/hp-timing.h' and for 32-bit use _rdtsc iff
> > __i686__ is defined otherwise include generic header?
>
> I can do that.

Here is the V2 patch.  OK for master?
Adhemerval Zanella Oct. 2, 2018, 5:23 p.m. | #4
On 02/10/2018 13:56, H.J. Lu wrote:
> On Tue, Oct 2, 2018 at 6:45 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Tue, Oct 2, 2018 at 6:15 AM Adhemerval Zanella
>> <adhemerval.zanella@linaro.org> wrote:
> 
>>> I am not very found of cross abi includes like this one, wouldn't be better
>>> to create a 'sysdeps/x86/hp-timing.h' and for 32-bit use _rdtsc iff
>>> __i686__ is defined otherwise include generic header?
>>
>> I can do that.
> 
> Here is the V2 patch.  OK for master?
> 

> NB: Checking if __i686__ isn't sufficient since __i686__ may not be
> defined when building for i686 class processors.

Right, it seems gcc does not define it for -march newer than pentium4.

> diff --git a/sysdeps/i386/i586/init-arch.h b/sysdeps/i386/i586/isa.h
> similarity index 89%
> rename from sysdeps/i386/i586/init-arch.h
> rename to sysdeps/i386/i586/isa.h
> index 72fb46c61e..a2b5d585d4 100644
> --- a/sysdeps/i386/i586/init-arch.h
> +++ b/sysdeps/i386/i586/isa.h
> @@ -1,4 +1,4 @@
> -/* Copyright (C) 2015-2018 Free Software Foundation, Inc.
> +/* Copyright (C) 2018 Free Software Foundation, Inc.
>     This file is part of the GNU C Library.
>  
>     The GNU C Library is free software; you can redistribute it and/or
> @@ -15,5 +15,9 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> +#ifndef _isa_h
> +#define _isa_h
> +

Not sure if this is explicit on our coding rules, but usually we
use uppercase caps for preprocessor guards.  I think it requires
a one-line description as well (same for other isa.h files).

> -#ifndef _HP_TIMING_H
> -#define _HP_TIMING_H	1
> +#include <isa.h>
> +
> +#if MINIMUM_ISA == 686 || MINIMUM_ISA == 8664
> +# ifndef _HP_TIMING_H
> +#  define _HP_TIMING_H	1
> +

The include guard inside the MINIMUM_ISA check seems off, why do
you need it?
Joseph Myers Oct. 2, 2018, 5:38 p.m. | #5
On Tue, 2 Oct 2018, Adhemerval Zanella wrote:

> > NB: Checking if __i686__ isn't sufficient since __i686__ may not be
> > defined when building for i686 class processors.
> 
> Right, it seems gcc does not define it for -march newer than pentium4.

Specifically, you need a negative test (testing not building for an older 
processor) as in sysdeps/x86/cpu-features.h.  (There's probably a case for 
factoring it out so there's a single header, usable from assembly sources 
as well as from C code, that defines a macro, suitable for #if tests, that 
can be used to test "known to be i686 or later".)
H.J. Lu Oct. 2, 2018, 6:06 p.m. | #6
On Tue, Oct 2, 2018 at 10:23 AM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 02/10/2018 13:56, H.J. Lu wrote:
> > On Tue, Oct 2, 2018 at 6:45 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >>
> >> On Tue, Oct 2, 2018 at 6:15 AM Adhemerval Zanella
> >> <adhemerval.zanella@linaro.org> wrote:
> >
> >>> I am not very found of cross abi includes like this one, wouldn't be better
> >>> to create a 'sysdeps/x86/hp-timing.h' and for 32-bit use _rdtsc iff
> >>> __i686__ is defined otherwise include generic header?
> >>
> >> I can do that.
> >
> > Here is the V2 patch.  OK for master?
> >
>
> > NB: Checking if __i686__ isn't sufficient since __i686__ may not be
> > defined when building for i686 class processors.
>
> Right, it seems gcc does not define it for -march newer than pentium4.
>
> > diff --git a/sysdeps/i386/i586/init-arch.h b/sysdeps/i386/i586/isa.h
> > similarity index 89%
> > rename from sysdeps/i386/i586/init-arch.h
> > rename to sysdeps/i386/i586/isa.h
> > index 72fb46c61e..a2b5d585d4 100644
> > --- a/sysdeps/i386/i586/init-arch.h
> > +++ b/sysdeps/i386/i586/isa.h
> > @@ -1,4 +1,4 @@
> > -/* Copyright (C) 2015-2018 Free Software Foundation, Inc.
> > +/* Copyright (C) 2018 Free Software Foundation, Inc.
> >     This file is part of the GNU C Library.
> >
> >     The GNU C Library is free software; you can redistribute it and/or
> > @@ -15,5 +15,9 @@
> >     License along with the GNU C Library; if not, see
> >     <http://www.gnu.org/licenses/>.  */
> >
> > +#ifndef _isa_h
> > +#define _isa_h
> > +
>
> Not sure if this is explicit on our coding rules, but usually we
> use uppercase caps for preprocessor guards.  I think it requires
> a one-line description as well (same for other isa.h files).

I will change that.

> > -#ifndef _HP_TIMING_H
> > -#define _HP_TIMING_H 1
> > +#include <isa.h>
> > +
> > +#if MINIMUM_ISA == 686 || MINIMUM_ISA == 8664
> > +# ifndef _HP_TIMING_H
> > +#  define _HP_TIMING_H       1
> > +
>
> The include guard inside the MINIMUM_ISA check seems off, why do
> you need it?

<sysdeps/generic/hp-timing.h> will be skipped if _HP_TIMING_H is
defined.  That is the draw back for the single x86 hp-timing.h.

Patch

diff --git a/sysdeps/i386/i686/hp-timing.h b/sysdeps/i386/i686/hp-timing.h
index 59af526fdb..6eee8c58be 100644
--- a/sysdeps/i386/i686/hp-timing.h
+++ b/sysdeps/i386/i686/hp-timing.h
@@ -20,6 +20,8 @@ 
 #ifndef _HP_TIMING_H
 #define _HP_TIMING_H	1
 
+#include <x86intrin.h>
+
 /* We always assume having the timestamp register.  */
 #define HP_TIMING_AVAIL		(1)
 #define HP_SMALL_TIMING_AVAIL	(1)
@@ -35,7 +37,7 @@  typedef unsigned long long int hp_timing_t;
    running in this moment.  This could be changed by using a barrier like
    'cpuid' right before the `rdtsc' instruciton.  But we are not interested
    in accurate clock cycles here so we don't do this.  */
-#define HP_TIMING_NOW(Var)	__asm__ __volatile__ ("rdtsc" : "=A" (Var))
+#define HP_TIMING_NOW(Var)	({ (Var) = _rdtsc (); })
 
 #include <hp-timing-common.h>
 
diff --git a/sysdeps/x86_64/hp-timing.h b/sysdeps/x86_64/hp-timing.h
index ec543bef03..46236b9777 100644
--- a/sysdeps/x86_64/hp-timing.h
+++ b/sysdeps/x86_64/hp-timing.h
@@ -1,40 +1 @@ 
-/* High precision, low overhead timing functions.  x86-64 version.
-   Copyright (C) 2002-2018 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#ifndef _HP_TIMING_H
-#define _HP_TIMING_H	1
-
-/* We always assume having the timestamp register.  */
-#define HP_TIMING_AVAIL		(1)
-#define HP_SMALL_TIMING_AVAIL	(1)
-
-/* We indeed have inlined functions.  */
-#define HP_TIMING_INLINE	(1)
-
-/* We use 64bit values for the times.  */
-typedef unsigned long long int hp_timing_t;
-
-/* The "=A" constraint used in 32-bit mode does not work in 64-bit mode.  */
-#define HP_TIMING_NOW(Var) \
-  ({ unsigned int _hi, _lo; \
-     asm volatile ("rdtsc" : "=a" (_lo), "=d" (_hi)); \
-     (Var) = ((unsigned long long int) _hi << 32) | _lo; })
-
-#include <hp-timing-common.h>
-
-#endif /* hp-timing.h */
+#include <sysdeps/i386/i686/hp-timing.h>