diff mbox

[1/2] Add futex wrappers with error checking

Message ID 1417726487.22797.48.camel@triegel.csb
State New
Headers show

Commit Message

Torvald Riegel Dec. 4, 2014, 8:54 p.m. UTC
This adds new functions for futex operations, starting with wait,
timedwait, wake.  They add documentation and error checking according to
the outcomes of this thread:
https://sourceware.org/ml/libc-alpha/2014-09/msg00381.html

My intent is to move over other futex callers incrementally, over time.
Right now, this is not intended to replace the futex operations aimed at
lowlevellocks, but just those in other places that uses the lll_futex_*
functions now.

It is implemented on top of lll_futex_*, so that we can expose the raw
futex to users via a syscall wrapper (or external futex_wait,...
functions), should we want to do that in the future.

We can add other futex operations as necessary, and when there is a need
(e.g., PI cmp requeue).

They have been tested on x86 and x86_64 with pthread_once (patch 2/2)
and the new semaphore implementation I'm about to post.


2014-12-05  Torvald Riegel  <triegel@redhat.com>

	* nptl/futex-internal.h: New file.

Comments

Roland McGrath Dec. 5, 2014, 12:33 a.m. UTC | #1
I'm not entirely clear on why this is separate from the lll_futex_*
layer rather than replacing it.  I understand the benefits of
incremental change, of course.  Is that the only reason?  I don't
think we want to have both layers as such in the long run, and it's
not clear to me what the end state of this cleanup will be in your
vision.  Perhaps we should do some more thorough design of the final
internal API we want to have, and then figure out the incremental path
to get there with bite-sized changes.

This is dovetailing (or colliding, to be pessimistic) with more
cleanup and refactoring that I'm starting to do as part of my Native
Client port.  So I'll start by just throwing out there all the issues
I'm aware of off hand, in hopes that solving each might naturally be
folded into what you're doing.

The internal API of lowlevellock-futex.h needs to be cleaned up and
specified more thoroughly in a few ways.

* FUTEX_PRIVATE_FLAG is not part of the stated API (as described in
  the stub file, sysdeps/nptl/lowlevellock-futex.h).  But it is used
  in "generic" NPTL code, with implicit assumptions about its
  relationship to LLL_PRIVATE and LLL_SHARED.  What I'd like to see is
  both use of FUTEX_PRIVATE_FLAG and __ASSUME_PRIVATE_FUTEX disappear
  from generic code.  Instead, we can have some sysdeps-defined
  inlines akin to your futex_private_if_supported, but both sides of
  the coin: one for the cases like pthread_barrier_init and one for
  the cases like pthread_barrier_wait.  Ideally, this would also cover
  detection of support in the opposite direction from what we've ever
  dealt with before: when shared is the unavailable option, so that
  pthread_barrierattr_setpshared et al can fail with ENOTSUP for any
  request for pshared semantics.

  Today we actually do not have any configurations where the answer is
  dynamic, though I'd like to support them for the future.  On Linux,
  we always set __ASSUME_PRIVATE_FUTEX.  (Older Linux kernels did not
  always support private, but shared can always stand in for private
  so it's not a user-visible distinction.  Nowadays our minimum
  supported kernel version is one that supports private.)  On NaCl,
  all futexes are private and shared is just not available.  Hence I'd
  like to make setpshared fail properly rather than lie.  But it's not
  unlikely that at some point in the future, NaCl will support shared
  and then it would be a dynamic check to determine whether it's
  available or not.

  I'm sure this can be done in a way that does not change the compiled
  code at all for Linux.  I'm even sure it can be done in a way that
  would not change the compiled code for Linux cases without
  __ASSUME_PRIVATE_FUTEX, if there still were such.  But I haven't
  thought up the right API for that off hand.  And frankly, I get a
  bit dizzy every time I try to think through all the XORs and which
  places store the bit in which sense.

* We haven't properly specified the exact types of pointer arguments
  in lll_futex_* calls.  In NaCl these are implemented by eventually
  calling actual C functions with similar signatures, as opposed to
  many layers of macro turning into asm with operands that just have
  to be pointer-sized.  Our uses are actually inconsistent about
  whether it's an 'int *' or an 'unsigned int *' and about whether
  it's volatile or not.  So my build has lots of volatileness and
  pointee signedness warnings that are not easily vanquished without
  resorting to casts that could mask real bugs.  The Linux
  implementations indirectly have casts that could indeed mask real
  bugs.  It would be far better to have inlines with specific types
  and clean up all our usage.

* I really want to completely excise the inane "negated errno" return
  value convention from all our internal APIs.  That is not even a
  true Linuxism, it's a style copied from Linux kernel internals that
  does not even map to what the user ABI for syscall errors is on all
  machines.  All new or cleaned-up interfaces should just use the
  straightforward POSIXy "errno or zero" convention instead.  (That's
  what the underlying OS interface C functions in NaCl do, so today my
  macros all do -function(...) and I get oodles of -Wunused-value
  warnings for all the places we lack error checking today.)
  
  The only reason not to use that convention is if you needed some
  content in the return value other than error indication.  The Linux
  futex syscall interface does return such values (FUTEX_WAKE), but we
  do not actually use them anywhere at all.  If we ever did need them,
  I'd advocate for using out parameters in our internal APIs instead,
  even if the Linux implementation transmutes part of the return value
  space into the out parameter.  (Such layers will be all inlines
  anyway, so it shouldn't even make a microoptimization difference.)

> It is implemented on top of lll_futex_*, so that we can expose the raw
> futex to users via a syscall wrapper (or external futex_wait,...
> functions), should we want to do that in the future.

I'm not sure how that potential future relates to the layering
choices.  At any rate, I don't think we should choose our internal
layering based on speculation about such future uses if it results in
deciding on more complex internals (extra layers and the like) now.
We can always refactor in the future when everything is clearer.

> 	* nptl/futex-internal.h: New file.

Just as a procedural matter, I'm inclined to say that a new file like
this should come in the same commit as the first use of it.
Otherwise, even total build-breaking errors in the file wouldn't be
noticed until the next change.  Likewise, it seems best to leave out
things (e.g. futex_private_if_supported) until the commit that
actually introduces a use.

> +#include <lowlevellock.h>

Include only what you need: lowlevellock-futex.h here.  That changes
which code you're getting today, because all the machine-specific
lowlevellock.h files still need to be removed.  But we should be
finishing that cleanup this cycle anyway (though everyone seems to
have forgotten).

> +   glibc or in the calling application, or when an error code is not know.

s/know/&n/

> +   Due to POSIX requirements on when synchronization data structures such
> +   as mutexes or semaphores can be destructed and due to the futex design

s/destructed/destroyed/ (throughout)

> +static int __attribute__ ((unused))

I think we have been tending towards 'static inline int' for these
sorts of cases, but I don't really recall any more what we said is
"best practice".  inline alone is enough to avoid the unused-function
warnings, and these are cases where we probably do want to be alerted
by warnings if inlining fails (because that would be so unexpected).

> +#ifdef __ASSUME_PRIVATE_FUTEX
> +  return pshared != 0 ? FUTEX_SHARED : FUTEX_PRIVATE;
> +#else
> +  return pshared != 0 ? FUTEX_SHARED :
> +      THREAD_GETMEM (THREAD_SELF, header.private_futex) ^ FUTEX_PRIVATE_FLAG;
> +#endif

Always conditionalize as little as possible, rather than duplicating
even a little bit of logic.  e.g.:

	  if (pshared != 0)
	    return FUTEX_SHARED;
	#ifdef __ASSUME_PRIVATE_FUTEX
	  return FUTEX_PRIVATE;
	#else
	  return THREAD_GETMEM ...;
	#endif

> +/* Atomically wrt. other futex operations, this blocks iff the value at
> +   *futex matches the expected value.  This is semantically equivalent to:

Put parameter or local variable names in all caps in comments: FUTEX.

> +     l = <get lock associated with futex> (futex);
> +     wait_flag = <get wait_flag associated with futex> (futex);
> +     lock (l);
> +     val = atomic_load_relaxed (futex);
> +     if (val != expected) { unlock (l); return EWOULDBLOCK; }

EAGAIN is the canonical name (since 1988), so use that instead of
EWOULDBLOCK in all code.  (I won't cite other instances of EWOULDBLOCK.)

> +   Note that some previous code in glibc assumed the futex syscall to start
> +   with the equivalent of a seq_cst fence; this allows one to avoid an
> +   explicit seq_cst fence before a futex_wait call when synchronizing similar
> +   to Dekker synchronization.  However, this is not documented by the kernel
> +   to be guaranteed, so we do not assume it here.
> +   */

These comments describe a libc-internal API that is generic, not
purely Linux-specific.  So they should talk primarily about what this
interface is on its own terms rather than with reference to Linux
kernel idiosyncrasies.  Of course it's still worthwhile to note the
relevant history and any nonobvious details of what the interface is
not as well as what it is.  But this gives too much of an impression
that the Linux kernel documentation is the actual arbiter of this
internal API.  Be clear that this internal API is wholly specified by
these comments rather than necessarily by any alignment with kernels.

> +static int __attribute__ ((unused))
> +futex_wait (unsigned* futex, unsigned expected, int private)

Never use implicit 'int' in libc code, always 'unsigned int' et al.
Never use the wrongheaded C++ style for placing * vs whitespace.

> +  if (err == 0 || err == -EWOULDBLOCK || err == -EINTR)
> +    return err;
> +  /* ETIMEDOUT cannot have happened because we provided no timeout.
> +     EFAULT must have been caused by a glibc or application bug.
> +     EINVAL (wrong alignment or timeout is not normalized) must have been
> +     caused by a glibc or application bug.
> +     ENOSYS must have been caused by a glibc bug.
> +     No other errors are documented at this time.  */

This kind of thing is far easier to read as a switch.  Then you can
also use "case ETIMEDOUT: /* comment */" et al, fall-through cases
before default: to add some easier-to-read structure to the
documentation of what codes we think are possible.

> +  abort();

Space before paren.  Also, need #include <stdlib.h> to use abort.

So, now I'm seeing a potential reason to have this layer exist
distinct from the OS-encapsulation layer.  Perhaps we should have the
checks for expected errno values be in an OS-independent layer rather
than just saying in the specification of the OS-encapsulation layer
that it must yield only the particular set.

> +/* Like futex_timed_wait, but will eventually time out (i.e., stop being
> +   blocked) after the absolute point in time provided has passed (abstime).

Upcase ABSTIME throughout the comment.

> +futex_timed_wait (unsigned* futex, unsigned expected,
> +    const struct timespec* abstime, int private)

Same issues mentioned in the previous decl, plus indentation should
have the second line of arguments line up after the open-paren.

> +/* Atomically wrt. to other futex operations, this unblocks the specified

"wrt." abbreviates "with regard to", so "wrt. to" is redundant.  
Also, "wrt" with no period is the common style.

> +   Returns the number of processes woken up.  Note that we need to support
> +   futex_wake calls to past futexes whose memory has potentially been reused
> +   due to POSIX' requirements on synchronization object destruction (see
> +   above);  therefore, we must not report or abort on most errors.  */

Split this paragraph after the first sentence.  Return value protocol
is unrelated to the other caveats.  Only one space after semicolon.

As mentioned above, we have no need of the non-error return value, so
don't include it in the API of this function.


Thanks,
Roland
diff mbox

Patch

commit c3ee46bfe1d25142bb572a28643bfebf460e8285
Author: Torvald Riegel <triegel@redhat.com>
Date:   Thu Dec 4 14:12:23 2014 +0100

    Add wrappers for futex operations for glibc-internal use.

diff --git a/nptl/futex-internal.h b/nptl/futex-internal.h
new file mode 100644
index 0000000..ea55e61
--- /dev/null
+++ b/nptl/futex-internal.h
@@ -0,0 +1,173 @@ 
+/* futex operations for glibc-internal use.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef FUTEX_INTERNAL_H
+#define FUTEX_INTERNAL_H
+
+#include <errno.h>
+#include <lowlevellock.h>
+#include <sys/time.h>
+
+/* This file defines futex operations used internally in glibc.  They are
+   wrappers for the syscalls and add glibc-specific error checking of the
+   syscall return value.  We abort on error codes that are caused by bugs in
+   glibc or in the calling application, or when an error code is not know.
+   We return error codes that can arise in correct executions to the caller.
+   Each operation calls out exactly the return values that callers need to
+   handle.
+
+   The private flag must be either FUTEX_PRIVATE or FUTEX_SHARED.
+
+   We expect callers to only use these operations if futexes are supported.
+
+   Due to POSIX requirements on when synchronization data structures such
+   as mutexes or semaphores can be destructed and due to the futex design
+   having separate fast/slow paths for wake-ups, we need to consider that
+   futex_wake calls might effectively target a data structure that has been
+   destructed and reused for another object, or unmapped; thus, some
+   errors or spurious wake-ups can happen in correct executions that would
+   not be possible in a program using just a single futex whose lifetime
+   does not end before the program terminates.  For background, see:
+   https://sourceware.org/ml/libc-alpha/2014-04/msg00075.html
+   https://lkml.org/lkml/2014/11/27/472  */
+
+#define FUTEX_PRIVATE LLL_PRIVATE
+#define FUTEX_SHARED  LLL_SHARED
+
+/* Returns FUTEX_PRIVATE if pshared is zero and private futexes are supported;
+   returns FUTEX_SHARED otherwise.  */
+static int __attribute__ ((unused))
+futex_private_if_supported (int pshared)
+{
+#ifdef __ASSUME_PRIVATE_FUTEX
+  return pshared != 0 ? FUTEX_SHARED : FUTEX_PRIVATE;
+#else
+  return pshared != 0 ? FUTEX_SHARED :
+      THREAD_GETMEM (THREAD_SELF, header.private_futex) ^ FUTEX_PRIVATE_FLAG;
+#endif
+}
+
+
+/* Atomically wrt. other futex operations, this blocks iff the value at
+   *futex matches the expected value.  This is semantically equivalent to:
+     l = <get lock associated with futex> (futex);
+     wait_flag = <get wait_flag associated with futex> (futex);
+     lock (l);
+     val = atomic_load_relaxed (futex);
+     if (val != expected) { unlock (l); return EWOULDBLOCK; }
+     atomic_store_relaxed (wait_flag, 1);
+     unlock (l);
+     // Now block; can time out in futex_time_wait (see below)
+     while (atomic_load_relaxed(wait_flag));
+
+   Note that no guarantee of a happens-before relation between a woken
+   futex_wait and a futex_wake is documented; however, this does not matter
+   in practice because we have to consider spurious wake-ups (see below),
+   and thus would not be able to reason which futex_wake woke us anyway.
+
+   Returns 0 if woken by a futex operation or spuriously.  (Note that due to
+   the POSIX requirements mentioned above, we need to conservatively assume
+   that unrelated futex_wake operations could wake this futex; it is easiest
+   to just be prepared for spurious wake-ups.)
+   Returns -EWOULDBLOCK if the futex' value did not match the expected value.
+   Returns -EINTR if signals or other spurious wake-ups happened.
+
+   Note that some previous code in glibc assumed the futex syscall to start
+   with the equivalent of a seq_cst fence; this allows one to avoid an
+   explicit seq_cst fence before a futex_wait call when synchronizing similar
+   to Dekker synchronization.  However, this is not documented by the kernel
+   to be guaranteed, so we do not assume it here.
+   */
+static int __attribute__ ((unused))
+futex_wait (unsigned* futex, unsigned expected, int private)
+{
+  int err = lll_futex_timed_wait (futex, expected, NULL, private);
+  if (err == 0 || err == -EWOULDBLOCK || err == -EINTR)
+    return err;
+  /* ETIMEDOUT cannot have happened because we provided no timeout.
+     EFAULT must have been caused by a glibc or application bug.
+     EINVAL (wrong alignment or timeout is not normalized) must have been
+     caused by a glibc or application bug.
+     ENOSYS must have been caused by a glibc bug.
+     No other errors are documented at this time.  */
+  abort();
+}
+
+/* Like futex_timed_wait, but will eventually time out (i.e., stop being
+   blocked) after the absolute point in time provided has passed (abstime).
+   We require the caller to provide a normalized abstime.
+
+   Returns 0 if woken by a futex operation or spuriously.  (Note that due to
+   the POSIX requirements mentioned above, we need to conservatively assume
+   that unrelated futex_wake operations could wake this futex; it is easiest
+   to just be prepared for spurious wake-ups.)
+   Returns -EWOULDBLOCK if the futex' value did not match the expected value.
+   Returns -EINTR if signals or other spurious wake-ups happened.
+   Returns -ETIMEDOUT if the timeout expired.
+   */
+static int __attribute__ ((unused))
+futex_timed_wait (unsigned* futex, unsigned expected,
+    const struct timespec* abstime, int private)
+{
+  int err = lll_futex_timed_wait (futex, expected, abstime, private);
+  if (err == 0 || err == -EWOULDBLOCK || err == -EINTR || err == -ETIMEDOUT)
+    return err;
+  /* EFAULT must have been caused by a glibc or application bug.
+     EINVAL (wrong alignment or timeout is not normalized) must have been
+     caused by a glibc or application bug.
+     ENOSYS must have been caused by a glibc bug.
+     No other errors are documented at this time.  */
+  abort();
+}
+
+/* Atomically wrt. to other futex operations, this unblocks the specified
+   number of processes, or all processes blocked on this futex if there are
+   fewer than the specified number.  Semantically, this is equivalent to:
+     l = <get lock associated with futex> (futex);
+     lock (l);
+     for (res = 0; processes_to_wake > 0; processes_to_wake--, res++) {
+       if (<no process blocked on futex>) break;
+       wf = <get wait_flag of a process blocked on futex> (futex);
+       // No happens-before guarantee with woken futex_wait (see above)
+       atomic_store_relaxed (wf, 0);
+     }
+     return res;
+
+   Returns the number of processes woken up.  Note that we need to support
+   futex_wake calls to past futexes whose memory has potentially been reused
+   due to POSIX' requirements on synchronization object destruction (see
+   above);  therefore, we must not report or abort on most errors.  */
+static int __attribute__ ((unused))
+futex_wake (unsigned* futex, int processes_to_wake, int private)
+{
+  int res = lll_futex_wake (futex, processes_to_wake, private);
+  /* EFAULT could have happened due to memory reuse.
+     EINVAL could be either due to incorrect alignment (a bug in glibc or the
+     application) or due to memory being reused for a PI futex.  We cannot
+     distinguish between the two causes, and one of them is correct use, so
+     we do not act in this case.  */
+  if (res == -EFAULT || res == -EINVAL)
+    return 0;
+  /* ENOSYS must have been caused by a glibc bug.
+     No other errors are documented at this time.  */
+  if (res < 0)
+    abort ();
+  return res;
+}
+
+#endif  /* futex-internal.h */