V2 [PATCH] x86: Use _rdtsc intrinsic for HP_TIMING_NOW

Message ID CAMe9rOpeMG7ES2q3iLt9AjZ04mynkWKm1BJs5taZi3B+a9=HsQ@mail.gmail.com
State New
Headers show
Series
  • V2 [PATCH] x86: Use _rdtsc intrinsic for HP_TIMING_NOW
Related show

Commit Message

H.J. Lu Oct. 3, 2018, 11:08 p.m.
On Tue, Oct 2, 2018 at 11:06 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Tue, Oct 2, 2018 at 10:23 AM Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
> >
> >
> >
> > On 02/10/2018 13:56, H.J. Lu wrote:
> > > On Tue, Oct 2, 2018 at 6:45 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >>
> > >> On Tue, Oct 2, 2018 at 6:15 AM Adhemerval Zanella
> > >> <adhemerval.zanella@linaro.org> wrote:
> > >
> > >>> I am not very found of cross abi includes like this one, wouldn't be better
> > >>> to create a 'sysdeps/x86/hp-timing.h' and for 32-bit use _rdtsc iff
> > >>> __i686__ is defined otherwise include generic header?
> > >>
> > >> I can do that.
> > >
> > > Here is the V2 patch.  OK for master?
> > >
> >
> > > NB: Checking if __i686__ isn't sufficient since __i686__ may not be
> > > defined when building for i686 class processors.
> >
> > Right, it seems gcc does not define it for -march newer than pentium4.
> >
> > > diff --git a/sysdeps/i386/i586/init-arch.h b/sysdeps/i386/i586/isa.h
> > > similarity index 89%
> > > rename from sysdeps/i386/i586/init-arch.h
> > > rename to sysdeps/i386/i586/isa.h
> > > index 72fb46c61e..a2b5d585d4 100644
> > > --- a/sysdeps/i386/i586/init-arch.h
> > > +++ b/sysdeps/i386/i586/isa.h
> > > @@ -1,4 +1,4 @@
> > > -/* Copyright (C) 2015-2018 Free Software Foundation, Inc.
> > > +/* Copyright (C) 2018 Free Software Foundation, Inc.
> > >     This file is part of the GNU C Library.
> > >
> > >     The GNU C Library is free software; you can redistribute it and/or
> > > @@ -15,5 +15,9 @@
> > >     License along with the GNU C Library; if not, see
> > >     <http://www.gnu.org/licenses/>.  */
> > >
> > > +#ifndef _isa_h
> > > +#define _isa_h
> > > +
> >
> > Not sure if this is explicit on our coding rules, but usually we
> > use uppercase caps for preprocessor guards.  I think it requires
> > a one-line description as well (same for other isa.h files).
>
> I will change that.
>
> > > -#ifndef _HP_TIMING_H
> > > -#define _HP_TIMING_H 1
> > > +#include <isa.h>
> > > +
> > > +#if MINIMUM_ISA == 686 || MINIMUM_ISA == 8664
> > > +# ifndef _HP_TIMING_H
> > > +#  define _HP_TIMING_H       1
> > > +
> >
> > The include guard inside the MINIMUM_ISA check seems off, why do
> > you need it?
>
> <sysdeps/generic/hp-timing.h> will be skipped if _HP_TIMING_H is
> defined.  That is the draw back for the single x86 hp-timing.h.

Here is the updated patch.  OK for master?

Comments

Adhemerval Zanella Oct. 17, 2018, 1:15 p.m. | #1
On 03/10/2018 20:08, H.J. Lu wrote:
> On Tue, Oct 2, 2018 at 11:06 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Tue, Oct 2, 2018 at 10:23 AM Adhemerval Zanella
>> <adhemerval.zanella@linaro.org> wrote:
>>>
>>>
>>> On 02/10/2018 13:56, H.J. Lu wrote:
>>>> On Tue, Oct 2, 2018 at 6:45 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Tue, Oct 2, 2018 at 6:15 AM Adhemerval Zanella
>>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>> I am not very found of cross abi includes like this one, wouldn't be better
>>>>>> to create a 'sysdeps/x86/hp-timing.h' and for 32-bit use _rdtsc iff
>>>>>> __i686__ is defined otherwise include generic header?
>>>>> I can do that.
>>>> Here is the V2 patch.  OK for master?
>>>>
>>>> NB: Checking if __i686__ isn't sufficient since __i686__ may not be
>>>> defined when building for i686 class processors.
>>> Right, it seems gcc does not define it for -march newer than pentium4.
>>>
>>>> diff --git a/sysdeps/i386/i586/init-arch.h b/sysdeps/i386/i586/isa.h
>>>> similarity index 89%
>>>> rename from sysdeps/i386/i586/init-arch.h
>>>> rename to sysdeps/i386/i586/isa.h
>>>> index 72fb46c61e..a2b5d585d4 100644
>>>> --- a/sysdeps/i386/i586/init-arch.h
>>>> +++ b/sysdeps/i386/i586/isa.h
>>>> @@ -1,4 +1,4 @@
>>>> -/* Copyright (C) 2015-2018 Free Software Foundation, Inc.
>>>> +/* Copyright (C) 2018 Free Software Foundation, Inc.
>>>>     This file is part of the GNU C Library.
>>>>
>>>>     The GNU C Library is free software; you can redistribute it and/or
>>>> @@ -15,5 +15,9 @@
>>>>     License along with the GNU C Library; if not, see
>>>>     <http://www.gnu.org/licenses/>.  */
>>>>
>>>> +#ifndef _isa_h
>>>> +#define _isa_h
>>>> +
>>> Not sure if this is explicit on our coding rules, but usually we
>>> use uppercase caps for preprocessor guards.  I think it requires
>>> a one-line description as well (same for other isa.h files).
>> I will change that.
>>
>>>> -#ifndef _HP_TIMING_H
>>>> -#define _HP_TIMING_H 1
>>>> +#include <isa.h>
>>>> +
>>>> +#if MINIMUM_ISA == 686 || MINIMUM_ISA == 8664
>>>> +# ifndef _HP_TIMING_H
>>>> +#  define _HP_TIMING_H       1
>>>> +
>>> The include guard inside the MINIMUM_ISA check seems off, why do
>>> you need it?
>> <sysdeps/generic/hp-timing.h> will be skipped if _HP_TIMING_H is
>> defined.  That is the draw back for the single x86 hp-timing.h.
> Here is the updated patch.  OK for master?

LGTM.

> index 59af526fdb..1c20e9d828 100644
> --- a/sysdeps/i386/i686/hp-timing.h
> +++ b/sysdeps/x86/hp-timing.h
> @@ -1,7 +1,6 @@
> -/* High precision, low overhead timing functions.  i686 version.
> -   Copyright (C) 1998-2018 Free Software Foundation, Inc.
> +/* High precision, low overhead timing functions.  x86 version.
> +   Copyright (C) 2018 Free Software Foundation, Inc.
>     This file is part of the GNU C Library.
> -   Contributed by Ulrich Drepper <drepper@cygnus.com>, 1998.
>  
>     The GNU C Library is free software; you can redistribute it and/or
>     modify it under the terms of the GNU Lesser General Public
> @@ -20,12 +19,17 @@
>  #ifndef _HP_TIMING_H
>  #define _HP_TIMING_H	1

Another possibility is define _HP_TIMING_X86_1 and not undefine it below,
I don't have a strong opinion, so either way look good to me.
Florian Weimer Oct. 20, 2018, 4 p.m. | #2
* H. J. Lu:

> Since _rdtsc intrinsic is supported in GCC 4.9, we can use it for
> HP_TIMING_NOW.  This patch
>
> 1. Create x86 hp-timing.h to replace i686 and x86_64 hp-timing.h.
> 2. Move MINIMUM_ISA from init-arch.h to isa.h so that x86 hp-timing.h
> can check minimum x86 ISA to decide if _rdtsc can be used.
>
> NB: Checking if __i686__ isn't sufficient since __i686__ may not be
> defined when building for i686 class processors.
>
> 	* sysdeps/i386/init-arch.h: Removed.
> 	* sysdeps/i386/i586/init-arch.h: Likewise.
> 	* sysdeps/i386/i686/init-arch.h: Likewise.
> 	* sysdeps/i386/i686/hp-timing.h: Likewise.
> 	* sysdeps/x86_64/hp-timing.h: Likewise.
> 	* sysdeps/i386/isa.h: New file.
> 	* sysdeps/i386/i586/isa.h: Likewise.
> 	* sysdeps/i386/i686/isa.h: Likewise.
> 	* sysdeps/x86_64/isa.h: Likewise.
> 	* sysdeps/x86/hp-timing.h: New file.
> 	* sysdeps/x86/init-arch.h: Include <isa.h>.

This patch results in a substantial build time increase on Debian with
GCC 6: instead of less than two minutes I now see more than four
minutes for a full build (excluding testing).

I tried to fix this by avoiding the #undef in sysdeps/x86/hp-timing.h,
as Adhemerval suggested, but it did not help.  It turns out that
including <x86intrin.h> in the old header (sysdeps/x86_64/hp-timing.h)
is sufficient to trigger the regression in build performance.

I think the regression substantial enough to revert the patch.
Comments?

I'm trying to remove the #include <hp-timing.h> from
<libc-internal.h>, but I don't know how hard this will be.
Florian Weimer Oct. 20, 2018, 4:15 p.m. | #3
* Florian Weimer:


> I'm trying to remove the #include <hp-timing.h> from
> <libc-internal.h>, but I don't know how hard this will be.

That doesn't help.  It's pulled in via <tls.h> and <descr.h> as well.
We would need a separate <hp_timing_t.h> header file to avoid that
because hp_timing_t is required in a few central places (thread
descriptor and link map).
H.J. Lu Oct. 20, 2018, 10:27 p.m. | #4
On 10/20/18, Florian Weimer <fw@deneb.enyo.de> wrote:
> * Florian Weimer:
>
>
>> I'm trying to remove the #include <hp-timing.h> from
>> <libc-internal.h>, but I don't know how hard this will be.
>
> That doesn't help.  It's pulled in via <tls.h> and <descr.h> as well.
> We would need a separate <hp_timing_t.h> header file to avoid that
> because hp_timing_t is required in a few central places (thread
> descriptor and link map).
>

Please try this.
Florian Weimer Oct. 21, 2018, 6:57 a.m. | #5
* H. J. Lu:

> +   NB: Use __builtin_ia32_rdtsc directly since including <x86intrin.h>
> +   is very slow with GCC 6.  */

Thanks for the patch.

It fixes the issue, but the comment is wrong: I see the same slowdown
with GCC 8.

Patch

From 01974a82401e0facbefe46f5d5a1423314083d7d Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Mon, 1 Oct 2018 15:05:34 -0700
Subject: [PATCH] x86: Use _rdtsc intrinsic for HP_TIMING_NOW

Since _rdtsc intrinsic is supported in GCC 4.9, we can use it for
HP_TIMING_NOW.  This patch

1. Create x86 hp-timing.h to replace i686 and x86_64 hp-timing.h.
2. Move MINIMUM_ISA from init-arch.h to isa.h so that x86 hp-timing.h
can check minimum x86 ISA to decide if _rdtsc can be used.

NB: Checking if __i686__ isn't sufficient since __i686__ may not be
defined when building for i686 class processors.

	* sysdeps/i386/init-arch.h: Removed.
	* sysdeps/i386/i586/init-arch.h: Likewise.
	* sysdeps/i386/i686/init-arch.h: Likewise.
	* sysdeps/i386/i686/hp-timing.h: Likewise.
	* sysdeps/x86_64/hp-timing.h: Likewise.
	* sysdeps/i386/isa.h: New file.
	* sysdeps/i386/i586/isa.h: Likewise.
	* sysdeps/i386/i686/isa.h: Likewise.
	* sysdeps/x86_64/isa.h: Likewise.
	* sysdeps/x86/hp-timing.h: New file.
	* sysdeps/x86/init-arch.h: Include <isa.h>.
---
 sysdeps/i386/i586/{init-arch.h => isa.h} |  9 ++++--
 sysdeps/i386/i686/{init-arch.h => isa.h} |  9 ++++--
 sysdeps/i386/{init-arch.h => isa.h}      |  9 ++++--
 sysdeps/{i386/i686 => x86}/hp-timing.h   | 28 +++++++++++------
 sysdeps/x86/init-arch.h                  |  1 +
 sysdeps/x86_64/hp-timing.h               | 40 ------------------------
 sysdeps/x86_64/isa.h                     | 24 ++++++++++++++
 7 files changed, 65 insertions(+), 55 deletions(-)
 rename sysdeps/i386/i586/{init-arch.h => isa.h} (85%)
 rename sysdeps/i386/i686/{init-arch.h => isa.h} (85%)
 rename sysdeps/i386/{init-arch.h => isa.h} (85%)
 rename sysdeps/{i386/i686 => x86}/hp-timing.h (69%)
 delete mode 100644 sysdeps/x86_64/hp-timing.h
 create mode 100644 sysdeps/x86_64/isa.h

diff --git a/sysdeps/i386/i586/init-arch.h b/sysdeps/i386/i586/isa.h
similarity index 85%
rename from sysdeps/i386/i586/init-arch.h
rename to sysdeps/i386/i586/isa.h
index 72fb46c61e..79481ce680 100644
--- a/sysdeps/i386/i586/init-arch.h
+++ b/sysdeps/i386/i586/isa.h
@@ -1,4 +1,5 @@ 
-/* Copyright (C) 2015-2018 Free Software Foundation, Inc.
+/* x86 ISA info.  i586 version.
+   Copyright (C) 2018 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -15,5 +16,9 @@ 
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#ifndef _ISA_H
+#define _ISA_H
+
 #define MINIMUM_ISA 586
-#include <sysdeps/x86/init-arch.h>
+
+#endif
diff --git a/sysdeps/i386/i686/init-arch.h b/sysdeps/i386/i686/isa.h
similarity index 85%
rename from sysdeps/i386/i686/init-arch.h
rename to sysdeps/i386/i686/isa.h
index ab99392b58..584e26bd4f 100644
--- a/sysdeps/i386/i686/init-arch.h
+++ b/sysdeps/i386/i686/isa.h
@@ -1,4 +1,5 @@ 
-/* Copyright (C) 2015-2018 Free Software Foundation, Inc.
+/* x86 ISA info.  i686 version.
+   Copyright (C) 2018 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -15,5 +16,9 @@ 
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#ifndef _ISA_H
+#define _ISA_H
+
 #define MINIMUM_ISA 686
-#include <sysdeps/x86/init-arch.h>
+
+#endif
diff --git a/sysdeps/i386/init-arch.h b/sysdeps/i386/isa.h
similarity index 85%
rename from sysdeps/i386/init-arch.h
rename to sysdeps/i386/isa.h
index 043089ceb9..e0a1e9c84f 100644
--- a/sysdeps/i386/init-arch.h
+++ b/sysdeps/i386/isa.h
@@ -1,4 +1,5 @@ 
-/* Copyright (C) 2015-2018 Free Software Foundation, Inc.
+/* x86 ISA info.  i486 version.
+   Copyright (C) 2018 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -15,5 +16,9 @@ 
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#ifndef _ISA_H
+#define _ISA_H
+
 #define MINIMUM_ISA 486
-#include <sysdeps/x86/init-arch.h>
+
+#endif
diff --git a/sysdeps/i386/i686/hp-timing.h b/sysdeps/x86/hp-timing.h
similarity index 69%
rename from sysdeps/i386/i686/hp-timing.h
rename to sysdeps/x86/hp-timing.h
index 59af526fdb..1c20e9d828 100644
--- a/sysdeps/i386/i686/hp-timing.h
+++ b/sysdeps/x86/hp-timing.h
@@ -1,7 +1,6 @@ 
-/* High precision, low overhead timing functions.  i686 version.
-   Copyright (C) 1998-2018 Free Software Foundation, Inc.
+/* High precision, low overhead timing functions.  x86 version.
+   Copyright (C) 2018 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
-   Contributed by Ulrich Drepper <drepper@cygnus.com>, 1998.
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -20,12 +19,17 @@ 
 #ifndef _HP_TIMING_H
 #define _HP_TIMING_H	1
 
+#include <isa.h>
+
+#if MINIMUM_ISA == 686 || MINIMUM_ISA == 8664
+# include <x86intrin.h>
+
 /* We always assume having the timestamp register.  */
-#define HP_TIMING_AVAIL		(1)
-#define HP_SMALL_TIMING_AVAIL	(1)
+# define HP_TIMING_AVAIL	(1)
+# define HP_SMALL_TIMING_AVAIL	(1)
 
 /* We indeed have inlined functions.  */
-#define HP_TIMING_INLINE	(1)
+# define HP_TIMING_INLINE	(1)
 
 /* We use 64bit values for the times.  */
 typedef unsigned long long int hp_timing_t;
@@ -35,8 +39,14 @@  typedef unsigned long long int hp_timing_t;
    running in this moment.  This could be changed by using a barrier like
    'cpuid' right before the `rdtsc' instruciton.  But we are not interested
    in accurate clock cycles here so we don't do this.  */
-#define HP_TIMING_NOW(Var)	__asm__ __volatile__ ("rdtsc" : "=A" (Var))
+# define HP_TIMING_NOW(Var)	((Var) = _rdtsc ())
 
-#include <hp-timing-common.h>
+# include <hp-timing-common.h>
+#else
+/* NB: Undefine _HP_TIMING_H so that <sysdeps/generic/hp-timing.h> will
+   be included.  */
+# undef _HP_TIMING_H
+# include <sysdeps/generic/hp-timing.h>
+#endif
 
-#endif	/* hp-timing.h */
+#endif /* hp-timing.h */
diff --git a/sysdeps/x86/init-arch.h b/sysdeps/x86/init-arch.h
index a81ca8a4eb..bc860fcd69 100644
--- a/sysdeps/x86/init-arch.h
+++ b/sysdeps/x86/init-arch.h
@@ -21,6 +21,7 @@ 
 # include <ldsodefs.h>
 #endif
 #include <ifunc-init.h>
+#include <isa.h>
 
 #ifndef __x86_64__
 /* Due to the reordering and the other nifty extensions in i686, it is
diff --git a/sysdeps/x86_64/hp-timing.h b/sysdeps/x86_64/hp-timing.h
deleted file mode 100644
index ec543bef03..0000000000
--- a/sysdeps/x86_64/hp-timing.h
+++ /dev/null
@@ -1,40 +0,0 @@ 
-/* High precision, low overhead timing functions.  x86-64 version.
-   Copyright (C) 2002-2018 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#ifndef _HP_TIMING_H
-#define _HP_TIMING_H	1
-
-/* We always assume having the timestamp register.  */
-#define HP_TIMING_AVAIL		(1)
-#define HP_SMALL_TIMING_AVAIL	(1)
-
-/* We indeed have inlined functions.  */
-#define HP_TIMING_INLINE	(1)
-
-/* We use 64bit values for the times.  */
-typedef unsigned long long int hp_timing_t;
-
-/* The "=A" constraint used in 32-bit mode does not work in 64-bit mode.  */
-#define HP_TIMING_NOW(Var) \
-  ({ unsigned int _hi, _lo; \
-     asm volatile ("rdtsc" : "=a" (_lo), "=d" (_hi)); \
-     (Var) = ((unsigned long long int) _hi << 32) | _lo; })
-
-#include <hp-timing-common.h>
-
-#endif /* hp-timing.h */
diff --git a/sysdeps/x86_64/isa.h b/sysdeps/x86_64/isa.h
new file mode 100644
index 0000000000..452bce75eb
--- /dev/null
+++ b/sysdeps/x86_64/isa.h
@@ -0,0 +1,24 @@ 
+/* x86 ISA info.  x86-64 version.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _ISA_H
+#define _ISA_H
+
+#define MINIMUM_ISA 8664
+
+#endif
-- 
2.17.1