[v5,2/8] libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait

Message ID 796560f571e7787a1bd0aa655bc25b873e2d49b4.1590732962.git-series.mac@mcrowe.com
State New
Series std::future::wait_* and std::condition_variable improvements

Commit Message

Mike Crowe May 29, 2020, 6:17 a.m. UTC
The futex system call supports waiting for an absolute time if
FUTEX_WAIT_BITSET is used rather than FUTEX_WAIT.  Doing so provides two
benefits:

1. The call to gettimeofday is not required in order to calculate a
   relative timeout.

2. If someone changes the system clock during the wait then the futex
   timeout will correctly expire earlier or later.  Currently that only
   happens if the clock is changed prior to the call to gettimeofday.

According to futex(2), support for FUTEX_CLOCK_REALTIME was added in the
v2.6.28 Linux kernel and FUTEX_WAIT_BITSET was added in v2.6.25.  To ensure
that the code still works correctly with earlier kernel versions, an ENOSYS
error from futex[1] results in the futex_clock_realtime_unavailable flag
being set.  Once the flag is set, later calls skip the unsupported futex
operation and fall back directly to the previous gettimeofday and
relative timeout implementation.
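
As an illustration of the call described above, here is a minimal standalone
sketch of an absolute-time futex wait.  It is not the libstdc++ code: the
function name futex_wait_until_realtime and the WaitResult enum are
hypothetical, and the constants simply mirror the values the patch defines.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

// Same numeric values as the constants added by the patch.
constexpr unsigned futex_wait_bitset_op = 9;
constexpr unsigned futex_clock_realtime_flag = 256;
constexpr unsigned futex_bitset_match_any = ~0u;

// Hypothetical result type for this sketch.
enum class WaitResult { Woken, TimedOut, Unsupported };

// Block while *addr == expected, until the absolute CLOCK_REALTIME
// deadline passes.  No relative timeout is computed in user space.
WaitResult
futex_wait_until_realtime(std::uint32_t* addr, std::uint32_t expected,
			  const timespec& deadline)
{
  assert(addr != nullptr);
  if (syscall(SYS_futex, addr,
	      futex_wait_bitset_op | futex_clock_realtime_flag,
	      expected, &deadline, nullptr, futex_bitset_match_any) == -1)
    {
      if (errno == ETIMEDOUT)
	return WaitResult::TimedOut;
      if (errno == ENOSYS)
	return WaitResult::Unsupported; // caller should use a fallback
      // Other errors (EINTR, EAGAIN for a value mismatch, ...) are
      // lumped together here: the caller re-checks the value.
    }
  return WaitResult::Woken;
}
```

A caller would treat Unsupported as the cue to switch to the older
gettimeofday plus relative FUTEX_WAIT path.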

glibc applied an equivalent switch in pthread_cond_timedwait to use
FUTEX_CLOCK_REALTIME and FUTEX_WAIT_BITSET rather than FUTEX_WAIT for
glibc-2.10 back in 2009.  See
glibc:cbd8aeb836c8061c23a5e00419e0fb25a34abee7

The futex_clock_realtime_unavailable flag is accessed using
std::memory_order_relaxed to stop it becoming a bottleneck.  If the first
two calls to _M_futex_wait_until happen simultaneously then the only
consequence is that both will try to use FUTEX_CLOCK_REALTIME, both
risk discovering that it doesn't work and, if so, both set the flag.
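
The flag handling can be sketched in isolation from the futex details.  The
following is only an illustration of the "try the fast path, remember if it
is unsupported" pattern; wait_with_fallback, fast_path_unavailable and the
two callables are hypothetical names, not part of the patch.

```cpp
#include <atomic>
#include <cassert>

namespace
{
  // Set once the fast path is known not to work on this system.
  std::atomic<bool> fast_path_unavailable{false};
}

// try_fast(result) returns false if the fast path is unsupported,
// otherwise it stores its answer in result.  slow_path() always works.
template <class Fast, class Slow>
bool
wait_with_fallback(Fast try_fast, Slow slow_path)
{
  if (!fast_path_unavailable.load(std::memory_order_relaxed))
    {
      bool result = false;
      if (try_fast(result))
	return result;
      // Racing threads may all reach this store; that is harmless
      // because they all write the same value.
      fast_path_unavailable.store(true, std::memory_order_relaxed);
    }
  return slow_path();
}

// Self-check: an unsupported fast path is attempted only once.
inline void
demo_fast_path_tried_once()
{
  int fast_calls = 0, slow_calls = 0;
  auto unsupported_fast = [&](bool&) { ++fast_calls; return false; };
  auto slow = [&] { ++slow_calls; return true; };
  wait_with_fallback(unsupported_fast, slow);
  wait_with_fallback(unsupported_fast, slow);
  assert(fast_calls == 1);
  assert(slow_calls == 2);
}
```

Relaxed ordering is enough in this sketch because the flag guards no other
data; a thread that reads a stale false merely repeats the failed attempt.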

[1] This is how glibc's nptl-init.c determines whether these flags are
    supported.

	* libstdc++-v3/src/c++11/futex.cc: Add new constants for required
	futex flags.  Add futex_clock_realtime_unavailable flag to store
	result of trying to use FUTEX_CLOCK_REALTIME.
	(__atomic_futex_unsigned_base::_M_futex_wait_until): Try to use
	FUTEX_WAIT_BITSET with FUTEX_CLOCK_REALTIME and only fall back to
	using gettimeofday and FUTEX_WAIT if that's not supported.
---
 libstdc++-v3/src/c++11/futex.cc | 37 ++++++++++++++++++++++++++++++++++-
 1 file changed, 37 insertions(+)

Comments

Jonathan Wakely Nov. 12, 2020, 11:07 p.m. UTC | #1
Mike,

I've been doing some performance comparisons and this patch seems to
make quite a big difference to code that polls a future by calling
fut.wait_until(t) using any t < now() as the timeout. For example,
fut.wait_until(chrono::system_clock::time_point{}) to wait until the
UNIX epoch.

With GCC 10 (or with the if (!futex_clock_realtime_unavailable.load(...))
block commented out) I see that polling takes < 100ns. With the change, it
takes 3000ns or more.

Now this is still far better than polling using fut.wait_for(0s) which
takes around 50000ns due to the clock_gettime call, but I'm about to
fix that.

I'm not sure how important it is for wait_until(past) to be fast, but
the difference from 100ns to 3000ns seems significant. Do you see the
same kind of numbers? Is this just a property of the futex wait with
an absolute time?

N.B. using wait_until(system_clock::time_point::min()) or any other
time before the epoch doesn't work. The futex syscall returns EINVAL
which we don't check for. I'm about to fix that too.
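
One possible guard for the pre-epoch case is sketched below.  This is an
assumption for illustration, not necessarily the fix referred to above: the
helper name to_futex_deadline is hypothetical.  It splits a
system_clock::time_point into the seconds and nanoseconds that the futex
call expects and reports times before the epoch, so that a caller could
return a timeout without making the syscall.

```cpp
#include <cassert>
#include <chrono>
#include <time.h>

// Convert tp into an absolute CLOCK_REALTIME timespec.  Returns false
// (leaving out untouched) if tp is before the epoch, because a negative
// tv_sec would make the futex call fail with EINVAL.
bool
to_futex_deadline(std::chrono::system_clock::time_point tp, timespec& out)
{
  using namespace std::chrono;
  auto since_epoch = tp.time_since_epoch();
  if (since_epoch < system_clock::duration::zero())
    return false;

  auto s = duration_cast<seconds>(since_epoch);
  auto ns = duration_cast<nanoseconds>(since_epoch - s);
  out.tv_sec = static_cast<time_t>(s.count());
  out.tv_nsec = static_cast<long>(ns.count());
  assert(out.tv_nsec >= 0 && out.tv_nsec < 1000000000L);
  return true;
}
```

With this helper, system_clock::time_point{} converts to a zero timespec,
while system_clock::time_point::min() is rejected before any syscall.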


Mike Crowe Nov. 13, 2020, 9:58 p.m. UTC | #2
I see similar behaviour. I suppose this is because the
gettimeofday/clock_gettime system calls are in the VDSO and therefore
usually much cheaper to call than the real system call SYS_futex.

If rather than bailing out early when the relative timeout is negative, I
call the relative SYS_futex with rt.tv_sec = rt.tv_nsec = 0 then the
wait_until call takes about ten times longer than when using the absolute
SYS_futex. I can't really explain that.

Calling these functions with a time in the past is probably quite common if
you calculate a single timeout for several operations in sequence. What's
less clear is whether the performance matters that much when the return
value indicates a timeout anyway.

If gettimeofday/clock_gettime are cheap enough then I suppose we can call
them even in the absolute timeout case (losing benefit 1 above, which
appears not to really exist). That would restore the performance for
timeouts in the past whilst retaining the correct behaviour when the clock
is warped, which is what this patch addressed (benefit 2 above).
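
A rough sketch of that idea follows, assuming Linux and CLOCK_REALTIME.
The names deadline_has_passed and wait_until_checked are hypothetical, and
this is not the code that was eventually committed anywhere: it only shows
the cheap clock read being used to skip the syscall for deadlines in the
past, while the wait itself still uses the absolute timeout.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

// Same numeric values as the constants added by the patch.
constexpr unsigned futex_wait_bitset_op = 9;
constexpr unsigned futex_clock_realtime_flag = 256;
constexpr unsigned futex_bitset_match_any = ~0u;

// True if deadline is not later than the current CLOCK_REALTIME time.
// clock_gettime is normally served from the vDSO, so this is cheap.
inline bool
deadline_has_passed(const timespec& deadline)
{
  timespec now;
  clock_gettime(CLOCK_REALTIME, &now);
  return deadline.tv_sec < now.tv_sec
    || (deadline.tv_sec == now.tv_sec && deadline.tv_nsec <= now.tv_nsec);
}

// Returns false on timeout, true otherwise (woken, value mismatch, EINTR).
inline bool
wait_until_checked(std::uint32_t* addr, std::uint32_t expected,
		   const timespec& deadline)
{
  assert(addr != nullptr);
  if (deadline_has_passed(deadline))
    return false; // no syscall needed for a deadline in the past

  // The kernel still sees the absolute deadline, so a clock change
  // during the wait moves the expiry as intended.
  if (syscall(SYS_futex, addr,
	      futex_wait_bitset_op | futex_clock_realtime_flag,
	      expected, &deadline, nullptr, futex_bitset_match_any) == -1
      && errno == ETIMEDOUT)
    return false;
  return true;
}
```

A side effect of the early check in this sketch is that a deadline with a
negative tv_sec never reaches the kernel.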

I'll try to come up with some standalone test cases with results for
further discussion. I suspect that the glibc people will be interested too.

Thanks for investigating this.

Mike.
Jonathan Wakely Nov. 13, 2020, 10:22 p.m. UTC | #3
On 13/11/20 21:58 +0000, Mike Crowe via Libstdc++ wrote:
>I'll try to come up with some standalone test cases with results for
>further discussion. I suspect that the glibc people will be interested too.

Thanks, that would be great. I have about twenty things on my plate
already.
Mike Crowe Nov. 14, 2020, 5:46 p.m. UTC | #4
I wrote the attached standalone program to measure the relative performance
of wait operations in the past (or with zero timeout in the relative case)
and ran it on a variety of machines. The results below are in nanoseconds:

|--------------------+---------+----------+-----------+----------+---------|
|                    |  Kernel |    futex |     futex |    futex |   clock |
| CPU                | version | realtime | monotonic | relative | gettime |
|--------------------+---------+----------+-----------+----------+---------|
| x86_64 E5-2690 v2  |    4.19 |     6942 |      6675 |    61175 |      85 |
| x86_64 i7-10510U   |     5.4 |    27950 |     36650 |    69969 |     433 |
| x86_64 i7-3770K    |     5.9 |    17152 |     17232 |    59827 |     322 |
| x86_64 i7-4790K    |    4.19 |    13280 |     12219 |    58225 |     413 |
| x86 Celeron G1610T |     4.9 |    18245 |     18626 |    58445 |     407 |
| Raspberry Pi 3     |     5.9 |    30765 |     30851 |    72776 |     300 |
| Raspberry Pi 2     |     5.4 |    23830 |     24104 |    91539 |    1062 |
| mips64 gcc24       |    4.19 |    23102 |     23503 |    69343 |    1236 |
| sparc32 gcc102     |     5.9 |    42657 |     39306 |    87568 |     688 |
|--------------------+---------+----------+-----------+----------+---------|

The first machine is a virtual machine on ESXi (it appears that Xeons
really are much faster at this stuff!). The last two machines are
.fsffrance.org GCC farm machines.
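
The attached program is not reproduced here.  Purely as an illustration of
the kind of measurement behind the table, a hypothetical micro-benchmark
could look like the sketch below.  All names are made up for this example,
the futex constants are written out numerically as in the patch, and the
figures it produces will vary by machine and kernel.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

// Average wall-clock cost, in nanoseconds, of one call to op.
template <class Op>
double
average_ns(Op op, int iterations)
{
  assert(iterations > 0);
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i)
    op();
  auto elapsed = std::chrono::steady_clock::now() - start;
  return std::chrono::duration<double, std::nano>(elapsed).count()
    / iterations;
}

// Absolute wait (FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME) on a deadline
// at the epoch, which is always in the past.
inline void
futex_realtime_wait_in_past()
{
  static std::uint32_t word = 0;
  timespec epoch{};
  syscall(SYS_futex, &word, 9 | 256, 0u, &epoch, nullptr, ~0u);
}

// Relative wait (FUTEX_WAIT) with a zero timeout.
inline void
futex_relative_zero_wait()
{
  static std::uint32_t word = 0;
  timespec zero{};
  syscall(SYS_futex, &word, 0, 0u, &zero, nullptr, 0u);
}

// Plain clock read, for comparison with the syscalls above.
inline void
read_realtime_clock()
{
  timespec now;
  clock_gettime(CLOCK_REALTIME, &now);
}
```

For example, average_ns(futex_realtime_wait_in_past, 1000) and
average_ns(read_realtime_clock, 1000) would correspond roughly to the
"futex realtime" and "clock gettime" columns.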

The pthread_cond_timedwait durations for CLOCK_REALTIME were generally a
little bit better than the futex realtime durations.

The pthread_cond_timedwait durations for CLOCK_MONOTONIC differed greatly
by glibc version. With glibc v2.28 (from Debian 10) it completed very
quickly, but with glibc v2.31 (from Ubuntu 20.04) it took slightly longer
than the realtime version. This is presumably because glibc v2.28 would use
a relative timeout in this case and realise that there was no point in
calling futex. I changed that in
glibc:99d01ffcc386d1bfb681fb0684fcf6a6a996beb3.

So, it's clear to me that my changes have caused an absolute wait on a time
in the past to take longer in both libstdc++ and glibc. The question now is
whether that matters to anyone. I'll take my findings to the glibc list and
see what they think.

Thanks.

Mike.

Patch

diff --git a/libstdc++-v3/src/c++11/futex.cc b/libstdc++-v3/src/c++11/futex.cc
index c9de11a..25b3e05 100644
--- a/libstdc++-v3/src/c++11/futex.cc
+++ b/libstdc++-v3/src/c++11/futex.cc
@@ -35,8 +35,16 @@ 
 
 // Constants for the wait/wake futex syscall operations
 const unsigned futex_wait_op = 0;
+const unsigned futex_wait_bitset_op = 9;
+const unsigned futex_clock_realtime_flag = 256;
+const unsigned futex_bitset_match_any = ~0;
 const unsigned futex_wake_op = 1;
 
+namespace
+{
+  std::atomic<bool> futex_clock_realtime_unavailable;
+}
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -58,6 +66,35 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       }
     else
       {
+	if (!futex_clock_realtime_unavailable.load(std::memory_order_relaxed))
+	  {
+	    struct timespec rt;
+	    rt.tv_sec = __s.count();
+	    rt.tv_nsec = __ns.count();
+	    if (syscall (SYS_futex, __addr,
+			 futex_wait_bitset_op | futex_clock_realtime_flag,
+			 __val, &rt, nullptr, futex_bitset_match_any) == -1)
+	      {
+		__glibcxx_assert(errno == EINTR || errno == EAGAIN
+				|| errno == ETIMEDOUT || errno == ENOSYS);
+		if (errno == ETIMEDOUT)
+		  return false;
+		if (errno == ENOSYS)
+		  {
+		    futex_clock_realtime_unavailable.store(true,
+						    std::memory_order_relaxed);
+		    // Fall through to legacy implementation if the system
+		    // call is unavailable.
+		  }
+		else
+		  return true;
+	      }
+	    else
+	      return true;
+	  }
+
+	// We only get to here if futex_clock_realtime_unavailable was
+	// true or has just been set to true.
 	struct timeval tv;
 	gettimeofday (&tv, NULL);
 	// Convert the absolute timeout value to a relative timeout