From patchwork Wed Jan 21 05:57:12 2015
X-Patchwork-Submitter: Carlos O'Donell
X-Patchwork-Id: 431358
Message-ID: <54BF3FB8.30203@redhat.com>
Date: Wed, 21 Jan 2015 00:57:12 -0500
From: "Carlos O'Donell"
To: Torvald Riegel
CC: GLIBC Devel
Subject: Re: [PATCH] Update to new generic semaphore algorithm v2
References: <1421111716.23151.35.camel@triegel.csb>
In-Reply-To: <1421111716.23151.35.camel@triegel.csb>

On 01/12/2015 08:15 PM, Torvald Riegel wrote:
> This patch is a revision of that patch:
> https://sourceware.org/ml/libc-alpha/2014-12/msg00527.html
>
> The changes compare to that patch are aimed at making this as
> non-invasive as possible:
> * Not based on top of the futex clean-up patches anymore. The futex
> wrappers from those patches have simply been copied, so can be replaced
> easily once we do the futex clean-up; furthermore, we still get to do
> proper futex error handling in the new semaphore.

OK.

> * It assumes that EINTR is only returned by FUTEX_WAIT if the syscall
> was indeed interrupted by a signal. This conflicts with what the
> current Linux-kernel-side info on when EINTR can be returned, but keeps
> doing what glibc's semaphore has assumed so far. As a result, we don't
> need to change nptl/tst-sem4.

Good. I like this conservative first step until we decide what to do next for this situation.

> * Fixes a linknamespace issue by moving sem_timedwait into its own file.

Also good.

> Tested and no regressions on Linux x86_64 and i686. I haven't tested
> the old semaphore version (ie, __old_sem*), but the change is minimal
> (one release fence added -- was already present in the alpha and powerpc
> implementations).

I agree.

> OK?

Yes, I've committed this myself after more testing. I ran some custom sem_post/sem_wait programs on multicore boxes in an attempt to stress test the implementation.
I didn't find any detectable defects. I made one change: I pushed enabling async cancellation down a level, into futex_abstimed_wait. This was required to prevent async cancellation from being enabled while entering abort() and __gettimeofday(), neither of which is safe there. We want to keep async cancellation to a minimum. There are actually several other instances of this problem in NPTL; for example, asserts inside of AC regions are bad.

v3
- Adjust ChangeLog.
- Use braces to make the statement clearer.
- Push cancellation down a function to increase AC safety.

NEWS:

* A new semaphore algorithm has been implemented in generic C code for all
  machines. The previous custom assembly implementations of semaphore were
  difficult to reason about and hard to verify as safe. The new version of
  semaphore supports machines with 64-bit or 32-bit atomic operations. The
  new semaphore algorithm is used by sem_init, sem_open, sem_post, sem_wait,
  sem_timedwait, sem_trywait, and sem_getvalue.

---

> 2015-01-13 Torvald Riegel
>
> [BZ #12674]
> * nptl/sem_waitcommon.c: New file. Implement new semaphore algorithm.

Just say "New file." The file itself comments the implementation.

> * nptl/sem_wait.c: Include sem_waitcommon.c.
> (__sem_wait_cleanup, do_futex_wait): Remove.
> (__new_sem_wait): Adapt.
> (__new_sem_trywait): New function.
> (__old_sem_trywait): Moved here from nptl/sem_trywait.c.
> * nptl/sem_timedwait.c: Include sem_waitcommon.c.
> (__sem_wait_cleanup, do_futex_timed_wait): Remove.
> (sem_timedwait): Adapt.
> * nptl/sem_post.c (__new_sem_post): Adapt.
> (futex_wake): New function.
> (__old_sem_post): Add release MO fence.
> * nptl/sem_open.c (sem_open): Adapt.
> * nptl/sem_init.c (__new_sem_init): Adapt.
> (futex_private_if_supported): New function.
> * nptl/sem_getvalue.c (__new_sem_getvalue): Adapt.
> (__old_sem_getvalue): Add using previous code.
> * sysdeps/nptl/internaltypes.h: Adapt.
> * nptl/tst-sem13.c (do_test): Adapt.
> * nptl/tst-sem11.c (main): Adapt.
> * nptl/sem_trywait.c: Remove.
> * nptl/DESIGN-sem.txt: Remove.
> * nptl/Makefile (libpthread-routines): Remove sem_trywait.
> (gen-as-const-headers): Remove structsem.sym.
> * nptl/structsem.sym: Remove.
> * sysdeps/unix/sysv/linux/alpha/sem_post.c: Remove.
> * sysdeps/unix/sysv/linux/i386/i486/sem_post.S: Remove.
> * sysdeps/unix/sysv/linux/i386/i486/sem_timedwait.S: Remove.
> * sysdeps/unix/sysv/linux/i386/i486/sem_trywait.S: Remove.
> * sysdeps/unix/sysv/linux/i386/i486/sem_wait.S: Remove.
> * sysdeps/unix/sysv/linux/i386/i586/sem_post.S: Remove.
> * sysdeps/unix/sysv/linux/i386/i586/sem_timedwait.S: Remove.
> * sysdeps/unix/sysv/linux/i386/i586/sem_trywait.S: Remove.
> * sysdeps/unix/sysv/linux/i386/i586/sem_wait.S: Remove.
> * sysdeps/unix/sysv/linux/i386/i686/sem_post.S: Remove.
> * sysdeps/unix/sysv/linux/i386/i686/sem_timedwait.S: Remove.
> * sysdeps/unix/sysv/linux/i386/i686/sem_trywait.S: Remove.
> * sysdeps/unix/sysv/linux/i386/i686/sem_wait.S: Remove.
> * sysdeps/unix/sysv/linux/powerpc/sem_post.c: Remove.
> * sysdeps/unix/sysv/linux/sh/sem_post.S: Remove.
> * sysdeps/unix/sysv/linux/sh/sem_timedwait.S: Remove.
> * sysdeps/unix/sysv/linux/sh/sem_trywait.S: Remove.
> * sysdeps/unix/sysv/linux/sh/sem_wait.S: Remove.
> * sysdeps/unix/sysv/linux/x86_64/sem_post.S: Remove.
> * sysdeps/unix/sysv/linux/x86_64/sem_timedwait.S: Remove.
> * sysdeps/unix/sysv/linux/x86_64/sem_trywait.S: Remove.
> * sysdeps/unix/sysv/linux/x86_64/sem_wait.S: Remove.
>

OK.

> Author: Torvald Riegel
> Date: Thu Dec 18 15:15:22 2014 +0100
>
> Update to new generic semaphore algorithm.
>
> diff --git a/nptl/DESIGN-sem.txt b/nptl/DESIGN-sem.txt
> deleted file mode 100644
> index 17eb0c1..0000000
> --- a/nptl/DESIGN-sem.txt
> +++ /dev/null

OK, no longer valid.
> diff --git a/nptl/Makefile b/nptl/Makefile > index ce2d0e4..43d8510 100644 > --- a/nptl/Makefile > +++ b/nptl/Makefile > @@ -100,7 +100,7 @@ libpthread-routines = nptl-init vars events version \ > sem_init sem_destroy \ > sem_open sem_close sem_unlink \ > sem_getvalue \ > - sem_wait sem_trywait sem_timedwait sem_post \ > + sem_wait sem_timedwait sem_post \ OK, sem_trywait moved into the same file as sem_wait. No other machine implements sem_trywait.S, so this won't impact anyone else. > cleanup cleanup_defer cleanup_compat \ > cleanup_defer_compat unwind \ > pt-longjmp pt-cleanup\ > @@ -283,8 +283,7 @@ tests-nolibpthread = tst-unload > gen-as-const-headers = pthread-errnos.sym \ > lowlevelcond.sym lowlevelrwlock.sym \ > lowlevelbarrier.sym unwindbuf.sym \ > - lowlevelrobustlock.sym pthread-pi-defines.sym \ > - structsem.sym > + lowlevelrobustlock.sym pthread-pi-defines.sym OK, remove the .sym file; its assembly users no longer exist. > LDFLAGS-pthread.so = -Wl,--enable-new-dtags,-z,nodelete,-z,initfirst > diff --git a/nptl/sem_getvalue.c b/nptl/sem_getvalue.c > index c3c91e1..1432cc7 100644 > --- a/nptl/sem_getvalue.c > +++ b/nptl/sem_getvalue.c > @@ -19,23 +19,37 @@ > #include > #include > #include "semaphoreP.h" > +#include > > > int > -__new_sem_getvalue (sem, sval) > - sem_t *sem; > - int *sval; > +__new_sem_getvalue (sem_t *sem, int *sval) > { > struct new_sem *isem = (struct new_sem *) sem; > > /* XXX Check for valid SEM parameter. */ > - > - *sval = isem->value; > + /* FIXME This uses relaxed MO, even though POSIX specifies that this function > + should be linearizable. However, its debatable whether linearizability > + is the right requirement. We need to follow up with POSIX and, if > + necessary, use a stronger MO here and elsewhere (e.g., potentially > + release MO in all places where we consume a token).
*/ > + > +#if __HAVE_64B_ATOMICS > + *sval = atomic_load_relaxed (&isem->data) & SEM_VALUE_MASK; > +#else > + *sval = atomic_load_relaxed (&isem->value) >> SEM_VALUE_SHIFT; > +#endif OK. > > return 0; > } > versioned_symbol (libpthread, __new_sem_getvalue, sem_getvalue, GLIBC_2_1); > #if SHLIB_COMPAT(libpthread, GLIBC_2_0, GLIBC_2_1) > -strong_alias (__new_sem_getvalue, __old_sem_getvalue) > +int > +__old_sem_getvalue (sem_t *sem, int *sval) > +{ > + struct old_sem *isem = (struct old_sem *) sem; > + *sval = isem->value; > + return 0; > +} OK. > compat_symbol (libpthread, __old_sem_getvalue, sem_getvalue, GLIBC_2_0); > #endif > diff --git a/nptl/sem_init.c b/nptl/sem_init.c > index 5350f16..575b661 100644 > --- a/nptl/sem_init.c > +++ b/nptl/sem_init.c > @@ -18,17 +18,29 @@ > > #include > #include > -#include > #include > #include "semaphoreP.h" > #include > > +/* Returns FUTEX_PRIVATE if pshared is zero and private futexes are supported; > + returns FUTEX_SHARED otherwise. > + TODO Remove when cleaning up the futex API throughout glibc. */ > +static __always_inline int > +futex_private_if_supported (int pshared) > +{ > + if (pshared != 0) > + return LLL_SHARED; > +#ifdef __ASSUME_PRIVATE_FUTEX > + return LLL_PRIVATE; > +#else > + return THREAD_GETMEM (THREAD_SELF, header.private_futex) > + ^ FUTEX_PRIVATE_FLAG; > +#endif > +} > + OK. > > int > -__new_sem_init (sem, pshared, value) > - sem_t *sem; > - int pshared; > - unsigned int value; > +__new_sem_init (sem_t *sem, int pshared, unsigned int value) > { > /* Parameter sanity check. */ > if (__glibc_unlikely (value > SEM_VALUE_MAX)) > @@ -40,16 +52,15 @@ __new_sem_init (sem, pshared, value) > /* Map to the internal type. */ > struct new_sem *isem = (struct new_sem *) sem; > > - /* Use the values the user provided. */ > - isem->value = value; > -#ifdef __ASSUME_PRIVATE_FUTEX > - isem->private = pshared ? 0 : FUTEX_PRIVATE_FLAG; > + /* Use the values the caller provided. 
*/ > +#if __HAVE_64B_ATOMICS > + isem->data = value; OK. I thought about masking value, but that isn't needed given the input is int, and the only time it would matter would be with a compiler bug, and then all bets are off anyway. > #else > - isem->private = pshared ? 0 : THREAD_GETMEM (THREAD_SELF, > - header.private_futex); > + isem->value = value << SEM_VALUE_SHIFT; > + isem->nwaiters = 0; > #endif > > - isem->nwaiters = 0; > + isem->private = futex_private_if_supported (pshared); > > return 0; > } > diff --git a/nptl/sem_open.c b/nptl/sem_open.c > index b5a5de4..bfd2dea 100644 > --- a/nptl/sem_open.c > +++ b/nptl/sem_open.c > @@ -193,9 +193,14 @@ sem_open (const char *name, int oflag, ...) > struct new_sem newsem; > } sem; > > - sem.newsem.value = value; > - sem.newsem.private = 0; > +#if __HAVE_64B_ATOMICS > + sem.newsem.data = value; OK. > +#else > + sem.newsem.value = value << SEM_VALUE_SHIFT; > sem.newsem.nwaiters = 0; > +#endif > + /* This always is a shared semaphore. */ > + sem.newsem.private = LLL_SHARED; OK. > > /* Initialize the remaining bytes as well. */ > memset ((char *) &sem.initsem + sizeof (struct new_sem), '\0', > diff --git a/nptl/sem_post.c b/nptl/sem_post.c > index d1c39ff..f7b6985 100644 > --- a/nptl/sem_post.c > +++ b/nptl/sem_post.c > @@ -26,34 +26,78 @@ > > #include > > +/* Wrapper for lll_futex_wake, with error checking. > + TODO Remove when cleaning up the futex API throughout glibc. */ > +static __always_inline void > +futex_wake (unsigned int* futex, int processes_to_wake, int private) > +{ > + int res = lll_futex_wake (futex, processes_to_wake, private); > + /* No error. Ignore the number of woken processes. */ > + if (res >= 0) > + return; > + switch (res) > + { > + case -EFAULT: /* Could have happened due to memory reuse. */ > + case -EINVAL: /* Could be either due to incorrect alignment (a bug in > + glibc or in the application) or due to memory being > + reused for a PI futex. 
We cannot distinguish between the > + two causes, and one of them is correct use, so we do not > + act in this case. */ > + return; > + case -ENOSYS: /* Must have been caused by a glibc bug. */ > + /* No other errors are documented at this time. */ > + default: > + abort (); > + } > +} OK. > + > + > +/* See sem_wait for an explanation of the algorithm. */ > int > __new_sem_post (sem_t *sem) > { > struct new_sem *isem = (struct new_sem *) sem; > + int private = isem->private; > > - __typeof (isem->value) cur; > +#if __HAVE_64B_ATOMICS > + /* Add a token to the semaphore. We use release MO to make sure that a > + thread acquiring this token synchronizes with us and other threads that > + added tokens before (the release sequence includes atomic RMW operations > + by other threads). */ > + /* TODO Use atomic_fetch_add to make it scale better than a CAS loop? */ > + unsigned long int d = atomic_load_relaxed (&isem->data); OK. > do > { > - cur = isem->value; > - if (isem->value == SEM_VALUE_MAX) > + if ((d & SEM_VALUE_MASK) == SEM_VALUE_MAX) OK. > { > __set_errno (EOVERFLOW); > return -1; > } > } > - while (atomic_compare_and_exchange_bool_rel (&isem->value, cur + 1, cur)); > + while (!atomic_compare_exchange_weak_release (&isem->data, &d, d + 1)); OK. > > - atomic_full_barrier (); > - if (isem->nwaiters > 0) > + /* If there is any potentially blocked waiter, wake one of them. */ > + if (d >> SEM_NWAITERS_SHIFT > 0) Changed: if ((d >> SEM_NWAITERS_SHIFT) > 0) to clarify the expression. > + futex_wake (((unsigned int *) &isem->data) + SEM_VALUE_OFFSET, 1, private); > +#else > + /* Add a token to the semaphore. Similar to 64b version. */ > + unsigned int v = atomic_load_relaxed (&isem->value); > + do > { > - int err = lll_futex_wake (&isem->value, 1, > - isem->private ^ FUTEX_PRIVATE_FLAG); > - if (__builtin_expect (err, 0) < 0) > + if ((v << SEM_VALUE_SHIFT) == SEM_VALUE_MAX) > { > - __set_errno (-err); > + __set_errno (EOVERFLOW); OK. 
> return -1; > } > } > + while (!atomic_compare_exchange_weak_release (&isem->value, > + &v, v + (1 << SEM_VALUE_SHIFT))); > + > + /* If there is any potentially blocked waiter, wake one of them. */ > + if ((v & SEM_NWAITERS_MASK) != 0) > + futex_wake (&isem->value, 1, private); > +#endif > + > return 0; > } > versioned_symbol (libpthread, __new_sem_post, sem_post, GLIBC_2_1); > @@ -66,6 +110,9 @@ __old_sem_post (sem_t *sem) > { > int *futex = (int *) sem; > > + /* We must need to synchronize with consumers of this token, so the atomic > + increment must have release MO semantics. */ > + atomic_write_barrier (); > (void) atomic_increment_val (futex); > /* We always have to assume it is a shared semaphore. */ > int err = lll_futex_wake (futex, 1, LLL_SHARED); > diff --git a/nptl/sem_timedwait.c b/nptl/sem_timedwait.c > index d04bcdf..042b0ac 100644 > --- a/nptl/sem_timedwait.c > +++ b/nptl/sem_timedwait.c > @@ -1,4 +1,4 @@ > -/* sem_timedwait -- wait on a semaphore. Generic futex-using version. > +/* sem_timedwait -- wait on a semaphore with timeout. > Copyright (C) 2003-2015 Free Software Foundation, Inc. > This file is part of the GNU C Library. > Contributed by Paul Mackerras , 2003. > @@ -17,101 +17,21 @@ > License along with the GNU C Library; if not, see > . */ > > -#include > -#include > -#include > -#include > -#include > -#include > - > -#include > -#include > - > - > -extern void __sem_wait_cleanup (void *arg) attribute_hidden; > - > -/* This is in a seperate function in order to make sure gcc > - puts the call site into an exception region, and thus the > - cleanups get properly run. 
*/ > -static int > -__attribute__ ((noinline)) > -do_futex_timed_wait (struct new_sem *isem, struct timespec *rt) > -{ > - int err, oldtype = __pthread_enable_asynccancel (); > - > - err = lll_futex_timed_wait (&isem->value, 0, rt, > - isem->private ^ FUTEX_PRIVATE_FLAG); > - > - __pthread_disable_asynccancel (oldtype); > - return err; > -} > +#include "sem_waitcommon.c" > > +/* This is in a separate file because because sem_timedwait is only provided > + if __USE_XOPEN2K is defined. */ OK. > int > sem_timedwait (sem_t *sem, const struct timespec *abstime) > { > - struct new_sem *isem = (struct new_sem *) sem; > - int err; > - > - if (atomic_decrement_if_positive (&isem->value) > 0) > - return 0; > - > if (abstime->tv_nsec < 0 || abstime->tv_nsec >= 1000000000) > { > __set_errno (EINVAL); > return -1; > } > > - atomic_increment (&isem->nwaiters); > - > - pthread_cleanup_push (__sem_wait_cleanup, isem); > - > - while (1) > - { > - struct timeval tv; > - struct timespec rt; > - int sec, nsec; > - > - /* Get the current time. */ > - __gettimeofday (&tv, NULL); > - > - /* Compute relative timeout. */ > - sec = abstime->tv_sec - tv.tv_sec; > - nsec = abstime->tv_nsec - tv.tv_usec * 1000; > - if (nsec < 0) > - { > - nsec += 1000000000; > - --sec; > - } > - > - /* Already timed out? */ > - if (sec < 0) > - { > - __set_errno (ETIMEDOUT); > - err = -1; > - break; > - } > - > - /* Do wait. */ > - rt.tv_sec = sec; > - rt.tv_nsec = nsec; > - err = do_futex_timed_wait(isem, &rt); > - if (err != 0 && err != -EWOULDBLOCK) > - { > - __set_errno (-err); > - err = -1; > - break; > - } > - > - if (atomic_decrement_if_positive (&isem->value) > 0) > - { > - err = 0; > - break; > - } > - } > - > - pthread_cleanup_pop (0); > - > - atomic_decrement (&isem->nwaiters); > - > - return err; > + if (__new_sem_wait_fast ((struct new_sem *) sem, 0) == 0) > + return 0; > + else > + return __new_sem_wait_slow((struct new_sem *) sem, abstime); OK. 
> } > diff --git a/nptl/sem_trywait.c b/nptl/sem_trywait.c > deleted file mode 100644 > index d434098..0000000 > --- a/nptl/sem_trywait.c > +++ /dev/null > @@ -1,50 +0,0 @@ OK. > diff --git a/nptl/sem_wait.c b/nptl/sem_wait.c > index df11933..c1fd10c 100644 > --- a/nptl/sem_wait.c > +++ b/nptl/sem_wait.c > @@ -17,80 +17,18 @@ > License along with the GNU C Library; if not, see > . */ > > -#include > -#include > -#include > -#include > -#include > - > -#include > -#include > -#include > - > - > -void > -attribute_hidden > -__sem_wait_cleanup (void *arg) > -{ > - struct new_sem *isem = (struct new_sem *) arg; > - > - atomic_decrement (&isem->nwaiters); > -} > - > -/* This is in a seperate function in order to make sure gcc > - puts the call site into an exception region, and thus the > - cleanups get properly run. */ > -static int > -__attribute__ ((noinline)) > -do_futex_wait (struct new_sem *isem) > -{ > - int err, oldtype = __pthread_enable_asynccancel (); > - > - err = lll_futex_wait (&isem->value, 0, isem->private ^ FUTEX_PRIVATE_FLAG); > - > - __pthread_disable_asynccancel (oldtype); > - return err; > -} > +#include "sem_waitcommon.c" > > int > __new_sem_wait (sem_t *sem) > { > - struct new_sem *isem = (struct new_sem *) sem; > - int err; > - > - if (atomic_decrement_if_positive (&isem->value) > 0) > + if (__new_sem_wait_fast ((struct new_sem *) sem, 0) == 0) > return 0; > - > - atomic_increment (&isem->nwaiters); > - > - pthread_cleanup_push (__sem_wait_cleanup, isem); > - > - while (1) > - { > - err = do_futex_wait(isem); > - if (err != 0 && err != -EWOULDBLOCK) > - { > - __set_errno (-err); > - err = -1; > - break; > - } > - > - if (atomic_decrement_if_positive (&isem->value) > 0) > - { > - err = 0; > - break; > - } > - } > - > - pthread_cleanup_pop (0); > - > - atomic_decrement (&isem->nwaiters); > - > - return err; > + else > + return __new_sem_wait_slow((struct new_sem *) sem, NULL); OK. 
> } > versioned_symbol (libpthread, __new_sem_wait, sem_wait, GLIBC_2_1); > > - > #if SHLIB_COMPAT (libpthread, GLIBC_2_0, GLIBC_2_1) > int > attribute_compat_text_section > @@ -121,3 +59,34 @@ __old_sem_wait (sem_t *sem) > > compat_symbol (libpthread, __old_sem_wait, sem_wait, GLIBC_2_0); > #endif > + > +int > +__new_sem_trywait (sem_t *sem) > +{ > + /* We must not fail spuriously, so require a definitive result even if this > + may lead to a long execution time. */ > + if (__new_sem_wait_fast ((struct new_sem *) sem, 1) == 0) > + return 0; > + __set_errno (EAGAIN); > + return -1; > +} OK. > +versioned_symbol (libpthread, __new_sem_trywait, sem_trywait, GLIBC_2_1); > +#if SHLIB_COMPAT (libpthread, GLIBC_2_0, GLIBC_2_1) > +int > +__old_sem_trywait (sem_t *sem) > +{ > + int *futex = (int *) sem; > + int val; > + > + if (*futex > 0) > + { > + val = atomic_decrement_if_positive (futex); > + if (val > 0) > + return 0; > + } > + > + __set_errno (EAGAIN); > + return -1; > +} > +compat_symbol (libpthread, __old_sem_trywait, sem_trywait, GLIBC_2_0); > +#endif OK. > diff --git a/nptl/sem_waitcommon.c b/nptl/sem_waitcommon.c > new file mode 100644 > index 0000000..5f3e87f > --- /dev/null > +++ b/nptl/sem_waitcommon.c > @@ -0,0 +1,459 @@ > +/* sem_waitcommon -- wait on a semaphore, shared code. > + Copyright (C) 2003-2015 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + Contributed by Paul Mackerras , 2003. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. 
> + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > +#include > + > +/* Wrapper for lll_futex_wait with absolute timeout and error checking. > + TODO Remove when cleaning up the futex API throughout glibc. */ > +static __always_inline int > +futex_abstimed_wait (unsigned int* futex, unsigned int expected, > + const struct timespec* abstime, int private) > +{ > + int err; > + if (abstime == NULL) > + err = lll_futex_wait (futex, expected, private); > + else > + { > + struct timeval tv; > + struct timespec rt; > + int sec, nsec; > + > + /* Get the current time. */ > + __gettimeofday (&tv, NULL); > + > + /* Compute relative timeout. */ > + sec = abstime->tv_sec - tv.tv_sec; > + nsec = abstime->tv_nsec - tv.tv_usec * 1000; > + if (nsec < 0) > + { > + nsec += 1000000000; > + --sec; > + } > + > + /* Already timed out? */ > + if (sec < 0) > + return ETIMEDOUT; > + > + /* Do wait. */ > + rt.tv_sec = sec; > + rt.tv_nsec = nsec; > + > + err = lll_futex_timed_wait (futex, expected, &rt, private); OK. > + } > + switch (err) > + { > + case 0: > + case -EAGAIN: > + case -EINTR: > + case -ETIMEDOUT: > + return -err; > + > + case -EFAULT: /* Must have been caused by a glibc or application bug. */ > + case -EINVAL: /* Either due to wrong alignment or due to the timeout not > + being normalized. Must have been caused by a glibc or > + application bug. */ > + case -ENOSYS: /* Must have been caused by a glibc bug. */ > + /* No other errors are documented at this time. */ > + default: > + abort (); OK. > + } > +} > + > +/* Wrapper for lll_futex_wake, with error checking. > + TODO Remove when cleaning up the futex API throughout glibc. 
*/ > +static __always_inline void > +futex_wake (unsigned int* futex, int processes_to_wake, int private) > +{ > + int res = lll_futex_wake (futex, processes_to_wake, private); > + /* No error. Ignore the number of woken processes. */ > + if (res >= 0) > + return; > + switch (res) > + { > + case -EFAULT: /* Could have happened due to memory reuse. */ > + case -EINVAL: /* Could be either due to incorrect alignment (a bug in > + glibc or in the application) or due to memory being > + reused for a PI futex. We cannot distinguish between the > + two causes, and one of them is correct use, so we do not > + act in this case. */ > + return; > + case -ENOSYS: /* Must have been caused by a glibc bug. */ > + /* No other errors are documented at this time. */ > + default: > + abort (); > + } OK. > +} > + > + > +/* The semaphore provides two main operations: sem_post adds a token to the > + semaphore; sem_wait grabs a token from the semaphore, potentially waiting > + until there is a token available. A sem_wait needs to synchronize with > + the sem_post that provided the token, so that whatever lead to the sem_post > + happens before the code after sem_wait. > + > + Conceptually, available tokens can simply be counted; let's call that the > + value of the semaphore. However, we also want to know whether there might > + be a sem_wait that is blocked on the value because it was zero (using a > + futex with the value being the futex variable); if there is no blocked > + sem_wait, sem_post does not need to execute a futex_wake call. Therefore, > + we also need to count the number of potentially blocked sem_wait calls > + (which we call nwaiters). > + > + What makes this tricky is that POSIX requires that a semaphore can be > + destroyed as soon as the last remaining sem_wait has returned, and no > + other sem_wait or sem_post calls are executing concurrently. 
However, the > + sem_post call whose token was consumed by the last sem_wait is considered > + to have finished once it provided the token to the sem_wait. > + Thus, sem_post must not access the semaphore struct anymore after it has > + made a token available; IOW, it needs to be able to atomically provide > + a token and check whether any blocked sem_wait calls might exist. > + > + This is straightforward to do if the architecture provides 64b atomics > + because we can just put both the value and nwaiters into one variable that > + we access atomically: This is the data field, the value is in the > + least-significant 32 bits, and nwaiters in the other bits. When sem_post > + makes a value available, it can atomically check nwaiters. > + > + If we have only 32b atomics available, we cannot put both nwaiters and > + value into one 32b value because then we might have too few bits for both > + of those counters. Therefore, we need to use two distinct fields. > + > + To allow sem_post to atomically make a token available and check for > + blocked sem_wait calls, we use one bit in value to indicate whether > + nwaiters is nonzero. That allows sem_post to use basically the same > + algorithm as with 64b atomics, but requires sem_wait to update the bit; it > + can't do this atomically with another access to nwaiters, but it can compute > + a conservative value for the bit because it's benign if the bit is set > + even if nwaiters is zero (all we get is an unnecessary futex wake call by > + sem_post). > + Specifically, sem_wait will unset the bit speculatively if it believes that > + there is no other concurrently executing sem_wait. If it misspeculated, > + it will have to clean up by waking any other sem_wait call (i.e., what > + sem_post would do otherwise). This does not conflict with the destruction > + requirement because the semaphore must not be destructed while any sem_wait > + is still executing. */ OK. Great comment. 
> + > +/* Set this to true if you assume that, in contrast to current Linux futex > + documentation, lll_futex_wake can return -EINTR only if interrupted by a > + signal, not spuriously due to some other reason. > + TODO Discuss EINTR conditions with the Linux kernel community. For > + now, we set this to true to not change behavior of semaphores compared > + to previous glibc builds. */ > +static const int sem_assume_only_signals_cause_futex_EINTR = 1; OK :-) > + > +#if !__HAVE_64B_ATOMICS > +static void > +__sem_wait_32_finish (struct new_sem *sem); > +#endif > + > +static void > +__sem_wait_cleanup (void *arg) > +{ > + struct new_sem *sem = (struct new_sem *) arg; > + > +#if __HAVE_64B_ATOMICS > + /* Stop being registered as a waiter. See below for MO. */ > + atomic_fetch_add_relaxed (&sem->data, -(1UL << SEM_NWAITERS_SHIFT)); > +#else > + __sem_wait_32_finish (sem); > +#endif > +} OK. > + > +/* Wait until at least one token is available, possibly with a timeout. > + This is in a separate function in order to make sure gcc > + puts the call site into an exception region, and thus the > + cleanups get properly run. TODO still necessary? Other futex_wait > + users don't seem to need it. */ > +static int > +__attribute__ ((noinline)) > +do_futex_wait (struct new_sem *sem, const struct timespec *abstime) > +{ > + int err, oldtype = __pthread_enable_asynccancel (); > + > +#if __HAVE_64B_ATOMICS > + err = futex_abstimed_wait ((unsigned int *) &sem->data + SEM_VALUE_OFFSET, 0, > + abstime, sem->private); > +#else > + err = futex_abstimed_wait (&sem->value, SEM_NWAITERS_MASK, abstime, > + sem->private); > +#endif > + > + __pthread_disable_asynccancel (oldtype); > + return err; > +} This won't work; you have to push the cancellation down into futex_abstimed_wait and wrap just the syscall, otherwise you have async cancellation enabled during __gettimeofday and abort(), both of which are dangerous.
The easy fix is to add `int cancel` to futex_abstimed_wait and then cancel there if required. Which is what I did. It doesn't impact anything, but makes this safer until Adhemerval can get the proper cancellation fixes in place. > + > +/* Fast path: Try to grab a token without blocking. */ > +static int > +__new_sem_wait_fast (struct new_sem *sem, int definitive_result) > +{ > + /* We need acquire MO if we actually grab a token, so that this > + synchronizes with all token providers (i.e., the RMW operation we read > + from or all those before it in modification order; also see sem_post). > + We do not need to guarantee any ordering if we observed that there is > + no token (POSIX leaves it unspecified whether functions that fail > + synchronize memory); thus, relaxed MO is sufficient for the initial load > + and the failure path of the CAS. If the weak CAS fails and we need a > + definitive result, retry. */ OK. > +#if __HAVE_64B_ATOMICS > + unsigned long d = atomic_load_relaxed (&sem->data); > + do > + { > + if ((d & SEM_VALUE_MASK) == 0) > + break; > + if (atomic_compare_exchange_weak_acquire (&sem->data, &d, d - 1)) > + return 0; > + } > + while (definitive_result); > + return -1; > +#else > + unsigned int v = atomic_load_relaxed (&sem->value); > + do > + { > + if ((v >> SEM_VALUE_SHIFT) == 0) > + break; > + if (atomic_compare_exchange_weak_acquire (&sem->value, > + &v, v - (1 << SEM_VALUE_SHIFT))) > + return 0; > + } > + while (definitive_result); > + return -1; > +#endif > +} > + > +/* Slow path that blocks. */ > +static int > +__attribute__ ((noinline)) > +__new_sem_wait_slow (struct new_sem *sem, const struct timespec *abstime) > +{ > + int err = 0; > + > +#if __HAVE_64B_ATOMICS > + /* Add a waiter. Relaxed MO is sufficient because we can rely on the > + ordering provided by the RMW operations we use. 
*/ > + unsigned long d = atomic_fetch_add_relaxed (&sem->data, > + 1UL << SEM_NWAITERS_SHIFT); > + > + pthread_cleanup_push (__sem_wait_cleanup, sem); > + > + /* Wait for a token to be available. Retry until we can grab one. */ > + for (;;) > + { > + /* If there is no token available, sleep until there is. */ > + if ((d & SEM_VALUE_MASK) == 0) > + { > + err = do_futex_wait (sem, abstime); > + /* A futex return value of 0 or EAGAIN is due to a real or spurious > + wake-up, or due to a change in the number of tokens. We retry in > + these cases. > + If we timed out, forward this to the caller. > + EINTR could be either due to being interrupted by a signal, or > + due to a spurious wake-up. Thus, we cannot distinguish between > + both, and are not allowed to return EINTR to the caller but have > + to retry; this is because we may not have been interrupted by a > + signal. However, if we assume that only signals cause a futex > + return of EINTR, we forward EINTR to the caller. > + > + Retrying on EINTR is technically always allowed because to > + reliably interrupt sem_wait with a signal, the signal handler > + must call sem_post (which is AS-Safe). In executions where the > + signal handler does not do that, the implementation can correctly > + claim that sem_wait hadn't actually started to execute yet, and > + thus the signal never actually interrupted sem_wait. We make no > + timing guarantees, so the program can never observe that sem_wait > + actually did start to execute. Thus, in a correct program, we > + can expect a signal that wanted to interrupt the sem_wait to have > + provided a token, and can just try to grab this token if > + futex_wait returns EINTR. */ OK. > + if (err == ETIMEDOUT || > + (err == EINTR && sem_assume_only_signals_cause_futex_EINTR)) > + { > + __set_errno (err); > + err = -1; > + /* Stop being registered as a waiter. 
*/ > + atomic_fetch_add_relaxed (&sem->data, > + -(1UL << SEM_NWAITERS_SHIFT)); > + break; > + } > + /* Relaxed MO is sufficient; see below. */ > + d = atomic_load_relaxed (&sem->data); > + } > + else > + { > + /* Try to grab both a token and stop being a waiter. We need > + acquire MO so this synchronizes with all token providers (i.e., > + the RMW operation we read from or all those before it in > + modification order; also see sem_post). On the failure path, > + relaxed MO is sufficient because we only eventually need the > + up-to-date value; the futex_wait or the CAS perform the real > + work. */ OK. > + if (atomic_compare_exchange_weak_acquire (&sem->data, > + &d, d - 1 - (1UL << SEM_NWAITERS_SHIFT))) > + { > + err = 0; > + break; > + } > + } > + } > + > + pthread_cleanup_pop (0); > +#else > + /* The main difference to the 64b-atomics implementation is that we need to > + access value and nwaiters in separate steps, and that the nwaiters bit > + in the value can temporarily not be set even if nwaiters is nonzero. > + We work around incorrectly unsetting the nwaiters bit by letting sem_wait > + set the bit again and waking the number of waiters that could grab a > + token. There are two additional properties we need to ensure: > + (1) We make sure that whenever unsetting the bit, we see the increment of > + nwaiters by the other thread that set the bit. IOW, we will notice if > + we make a mistake. > + (2) When setting the nwaiters bit, we make sure that we see the unsetting > + of the bit by another waiter that happened before us. This avoids having > + to blindly set the bit whenever we need to block on it. We set/unset > + the bit while having incremented nwaiters (i.e., are a registered > + waiter), and the problematic case only happens when one waiter indeed > + followed another (i.e., nwaiters was never larger than 1); thus, this > + works similarly as with a critical section using nwaiters (see the MOs > + and related comments below). 
> + > + An alternative approach would be to unset the bit after decrementing > + nwaiters; however, that would result in needing Dekker-like > + synchronization and thus full memory barriers. We also would not be able > + to prevent misspeculation, so this alternative scheme does not seem > + beneficial. */ > + unsigned int v; > + > + /* Add a waiter. We need acquire MO so this synchronizes with the release > + MO we use when decrementing nwaiters below; it ensures that if another > + waiter unset the bit before us, we see that and set it again. Also see > + property (2) above. */ > + atomic_fetch_add_acquire (&sem->nwaiters, 1); > + > + pthread_cleanup_push (__sem_wait_cleanup, sem); > + > + /* Wait for a token to be available. Retry until we can grab one. */ > + /* We do not need any ordering wrt. to this load's reads-from, so relaxed > + MO is sufficient. The acquire MO above ensures that in the problematic > + case, we do see the unsetting of the bit by another waiter. */ > + v = atomic_load_relaxed (&sem->value); > + do > + { > + do > + { > + /* We are about to block, so make sure that the nwaiters bit is > + set. We need release MO on the CAS to ensure that when another > + waiter unsets the nwaiters bit, it will also observe that we > + incremented nwaiters in the meantime (also see the unsetting of > + the bit below). Relaxed MO on CAS failure is sufficient (see > + above). */ > + do > + { > + if ((v & SEM_NWAITERS_MASK) != 0) > + break; > + } > + while (!atomic_compare_exchange_weak_release (&sem->value, > + &v, v | SEM_NWAITERS_MASK)); > + /* If there is no token, wait. */ > + if ((v >> SEM_VALUE_SHIFT) == 0) > + { > + /* See __HAVE_64B_ATOMICS variant. */ > + err = do_futex_wait(sem, abstime); > + if (err == ETIMEDOUT || > + (err == EINTR && sem_assume_only_signals_cause_futex_EINTR)) > + { > + __set_errno (err); > + err = -1; > + goto error; > + } > + err = 0; > + /* We blocked, so there might be a token now. Relaxed MO is > + sufficient (see above). 
*/ > + v = atomic_load_relaxed (&sem->value); > + } > + } > + /* If there is no token, we must not try to grab one. */ > + while ((v >> SEM_VALUE_SHIFT) == 0); > + } > + /* Try to grab a token. We need acquire MO so this synchronizes with > + all token providers (i.e., the RMW operation we read from or all those > + before it in modification order; also see sem_post). */ > + while (!atomic_compare_exchange_weak_acquire (&sem->value, > + &v, v - (1 << SEM_VALUE_SHIFT))); OK. > + > +error: > + pthread_cleanup_pop (0); > + > + __sem_wait_32_finish (sem); > +#endif > + > + return err; > +} > + > +/* Stop being a registered waiter (non-64b-atomics code only). */ > +#if !__HAVE_64B_ATOMICS > +static void > +__sem_wait_32_finish (struct new_sem *sem) > +{ > + /* The nwaiters bit is still set, try to unset it now if this seems > + necessary. We do this before decrementing nwaiters so that the unsetting > + is visible to other waiters entering after us. Relaxed MO is sufficient > + because we are just speculating here; a stronger MO would not prevent > + misspeculation. */ > + unsigned int wguess = atomic_load_relaxed (&sem->nwaiters); > + if (wguess == 1) > + /* We might be the last waiter, so unset. This needs acquire MO so that > + it synchronizes with the release MO when setting the bit above; if we > + overwrite someone else that set the bit, we'll read in the following > + decrement of nwaiters at least from that release sequence, so we'll > + see if the other waiter is still active or if another writer entered > + in the meantime (i.e., using the check below). */ > + atomic_fetch_and_acquire (&sem->value, ~SEM_NWAITERS_MASK); > + > + /* Now stop being a waiter, and see whether our guess was correct. > + This needs release MO so that it synchronizes with the acquire MO when > + a waiter increments nwaiters; this makes sure that newer writers see that > + we reset the waiters_present bit.
*/ > + unsigned int wfinal = atomic_fetch_add_release (&sem->nwaiters, -1); > + if (wfinal > 1 && wguess == 1) > + { > + /* We guessed wrong, and so need to clean up after the mistake and > + unblock any waiters that could have not been woken. There is no > + additional ordering that we need to set up, so relaxed MO is > + sufficient. */ > + unsigned int v = atomic_fetch_or_relaxed (&sem->value, > + SEM_NWAITERS_MASK); > + /* If there are available tokens, then wake as many waiters. If there > + aren't any, then there is no need to wake anyone because there is > + none to grab for another waiter. If tokens become available > + subsequently, then the respective sem_post calls will do the wake-up > + due to us having set the nwaiters bit again. */ > + v >>= SEM_VALUE_SHIFT; > + if (v > 0) > + futex_wake (&sem->value, v, sem->private); OK. > + } > +} > +#endif > diff --git a/nptl/structsem.sym b/nptl/structsem.sym > deleted file mode 100644 > index 0e2a15f..0000000 > --- a/nptl/structsem.sym > +++ /dev/null OK. > diff --git a/nptl/tst-sem11.c b/nptl/tst-sem11.c > index 5248eba..1a2dbaf 100644 > --- a/nptl/tst-sem11.c > +++ b/nptl/tst-sem11.c > @@ -34,8 +34,11 @@ main (void) > puts ("sem_init failed"); > return 1; > } > - > +#if __HAVE_64B_ATOMICS > + if ((u.ns.data >> SEM_NWAITERS_SHIFT) != 0) > +#else > if (u.ns.nwaiters != 0) > +#endif OK. > { > puts ("nwaiters not initialized"); > return 1; > @@ -68,7 +71,11 @@ main (void) > goto again; > } > > +#if __HAVE_64B_ATOMICS > + if ((u.ns.data >> SEM_NWAITERS_SHIFT) != 0) > +#else > if (u.ns.nwaiters != 0) > +#endif OK. 
> { > puts ("nwaiters not reset"); > return 1; > diff --git a/nptl/tst-sem13.c b/nptl/tst-sem13.c > index 068d79e..1560e91 100644 > --- a/nptl/tst-sem13.c > +++ b/nptl/tst-sem13.c > @@ -33,9 +33,14 @@ do_test (void) > perror ("sem_timedwait did not fail with EINVAL"); > return 1; > } > - if (u.ns.nwaiters != 0) > +#if __HAVE_64B_ATOMICS > + unsigned int nwaiters = (u.ns.data >> SEM_NWAITERS_SHIFT); > +#else > + unsigned int nwaiters = u.ns.nwaiters; > +#endif > + if (nwaiters != 0) > { > - printf ("sem_timedwait modified nwaiters: %ld\n", u.ns.nwaiters); > + printf ("sem_timedwait modified nwaiters: %d\n", nwaiters); OK. > return 1; > } > > @@ -52,9 +57,14 @@ do_test (void) > perror ("2nd sem_timedwait did not fail with ETIMEDOUT"); > return 1; > } > - if (u.ns.nwaiters != 0) > +#if __HAVE_64B_ATOMICS > + nwaiters = (u.ns.data >> SEM_NWAITERS_SHIFT); > +#else > + nwaiters = u.ns.nwaiters; > +#endif > + if (nwaiters != 0) > { > - printf ("2nd sem_timedwait modified nwaiters: %ld\n", u.ns.nwaiters); > + printf ("2nd sem_timedwait modified nwaiters: %d\n", nwaiters); OK. > return 1; > } > > diff --git a/sysdeps/nptl/internaltypes.h b/sysdeps/nptl/internaltypes.h > index 2f10545..7c0d240 100644 > --- a/sysdeps/nptl/internaltypes.h > +++ b/sysdeps/nptl/internaltypes.h > @@ -20,6 +20,8 @@ > #define _INTERNALTYPES_H 1 > > #include > +#include > +#include > > > struct pthread_attr > @@ -141,9 +143,29 @@ struct pthread_key_struct > /* Semaphore variable structure. */ > struct new_sem > { > +#if __HAVE_64B_ATOMICS > + /* The data field holds both value (in the least-significant 32 bits) and > + nwaiters. */ > +# if __BYTE_ORDER == __LITTLE_ENDIAN > +# define SEM_VALUE_OFFSET 0 > +# elif __BYTE_ORDER == __BIG_ENDIAN > +# define SEM_VALUE_OFFSET 1 > +# else > +# error Unsupported byte order.
> +# endif > +# define SEM_NWAITERS_SHIFT 32 > +# define SEM_VALUE_MASK (~(unsigned int)0) > + unsigned long int data; > + int private; > + int pad; > +#else > +# define SEM_VALUE_SHIFT 1 > +# define SEM_NWAITERS_MASK ((unsigned int)1) OK. > unsigned int value; > int private; > - unsigned long int nwaiters; > + int pad; > + unsigned int nwaiters; OK. > +#endif > }; > > struct old_sem > diff --git a/sysdeps/unix/sysv/linux/alpha/sem_post.c b/sysdeps/unix/sysv/linux/alpha/sem_post.c > deleted file mode 100644 > index 9d44953..0000000 > --- a/sysdeps/unix/sysv/linux/alpha/sem_post.c > +++ /dev/null > @@ -1,5 +0,0 @@ > -/* ??? This is an ass-backwards way to do this. We should simply define > - the acquire/release semantics of atomic_exchange_and_add. And even if > - we don't do this, we should be using atomic_full_barrier or otherwise. */ > -#define __lll_rel_instr "mb" > -#include Please ask Richard Henderson about removing this. I suggest a short email to libc-alpha and TO him and ask for an ACK to remove this. > diff --git a/sysdeps/unix/sysv/linux/i386/i486/sem_post.S b/sysdeps/unix/sysv/linux/i386/i486/sem_post.S > deleted file mode 100644 > index 7b553bb..0000000 > --- a/sysdeps/unix/sysv/linux/i386/i486/sem_post.S > +++ /dev/null OK. > diff --git a/sysdeps/unix/sysv/linux/i386/i486/sem_timedwait.S b/sysdeps/unix/sysv/linux/i386/i486/sem_timedwait.S > deleted file mode 100644 > index a8b9164..0000000 > --- a/sysdeps/unix/sysv/linux/i386/i486/sem_timedwait.S > +++ /dev/null OK. > diff --git a/sysdeps/unix/sysv/linux/i386/i486/sem_trywait.S b/sysdeps/unix/sysv/linux/i386/i486/sem_trywait.S > deleted file mode 100644 > index 2524d96..0000000 > --- a/sysdeps/unix/sysv/linux/i386/i486/sem_trywait.S > +++ /dev/null > @@ -1,67 +0,0 @@ OK. 
> diff --git a/sysdeps/unix/sysv/linux/i386/i486/sem_wait.S b/sysdeps/unix/sysv/linux/i386/i486/sem_wait.S > deleted file mode 100644 > index 9121041..0000000 > --- a/sysdeps/unix/sysv/linux/i386/i486/sem_wait.S > +++ /dev/null > @@ -1,343 +0,0 @@ OK. > diff --git a/sysdeps/unix/sysv/linux/i386/i586/sem_post.S b/sysdeps/unix/sysv/linux/i386/i586/sem_post.S > deleted file mode 100644 > index 4534d56..0000000 > --- a/sysdeps/unix/sysv/linux/i386/i586/sem_post.S > +++ /dev/null OK. > diff --git a/sysdeps/unix/sysv/linux/i386/i586/sem_timedwait.S b/sysdeps/unix/sysv/linux/i386/i586/sem_timedwait.S > deleted file mode 100644 > index fa4ad49..0000000 > --- a/sysdeps/unix/sysv/linux/i386/i586/sem_timedwait.S > +++ /dev/null OK. > diff --git a/sysdeps/unix/sysv/linux/i386/i586/sem_trywait.S b/sysdeps/unix/sysv/linux/i386/i586/sem_trywait.S > deleted file mode 100644 > index 6f3a690..0000000 > --- a/sysdeps/unix/sysv/linux/i386/i586/sem_trywait.S > +++ /dev/null > @@ -1,19 +0,0 @@ OK. > diff --git a/sysdeps/unix/sysv/linux/i386/i586/sem_wait.S b/sysdeps/unix/sysv/linux/i386/i586/sem_wait.S > deleted file mode 100644 > index 718d50d..0000000 > --- a/sysdeps/unix/sysv/linux/i386/i586/sem_wait.S > +++ /dev/null OK. > diff --git a/sysdeps/unix/sysv/linux/i386/i686/sem_post.S b/sysdeps/unix/sysv/linux/i386/i686/sem_post.S > deleted file mode 100644 > index 4534d56..0000000 > --- a/sysdeps/unix/sysv/linux/i386/i686/sem_post.S > +++ /dev/null OK. > diff --git a/sysdeps/unix/sysv/linux/i386/i686/sem_timedwait.S b/sysdeps/unix/sysv/linux/i386/i686/sem_timedwait.S > deleted file mode 100644 > index fa4ad49..0000000 > --- a/sysdeps/unix/sysv/linux/i386/i686/sem_timedwait.S > +++ /dev/null OK. > diff --git a/sysdeps/unix/sysv/linux/i386/i686/sem_trywait.S b/sysdeps/unix/sysv/linux/i386/i686/sem_trywait.S > deleted file mode 100644 > index 6f3a690..0000000 > --- a/sysdeps/unix/sysv/linux/i386/i686/sem_trywait.S > +++ /dev/null OK. 
> diff --git a/sysdeps/unix/sysv/linux/i386/i686/sem_wait.S b/sysdeps/unix/sysv/linux/i386/i686/sem_wait.S > deleted file mode 100644 > index 718d50d..0000000 > --- a/sysdeps/unix/sysv/linux/i386/i686/sem_wait.S > +++ /dev/null OK. > diff --git a/sysdeps/unix/sysv/linux/powerpc/sem_post.c b/sysdeps/unix/sysv/linux/powerpc/sem_post.c > deleted file mode 100644 > index 6a4e46f..0000000 > --- a/sysdeps/unix/sysv/linux/powerpc/sem_post.c > +++ /dev/null > @@ -1,71 +0,0 @@ > -/* sem_post -- post to a POSIX semaphore. Powerpc version. > - Copyright (C) 2003-2015 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - Contributed by Paul Mackerras , 2003. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - . 
*/ > - > -#include > -#include > -#include > -#include > -#include > - > -#include > - > -int > -__new_sem_post (sem_t *sem) > -{ > - struct new_sem *isem = (struct new_sem *) sem; > - > - __asm __volatile (__ARCH_REL_INSTR ::: "memory"); > - atomic_increment (&isem->value); > - __asm __volatile (__ARCH_ACQ_INSTR ::: "memory"); > - if (isem->nwaiters > 0) > - { > - int err = lll_futex_wake (&isem->value, 1, > - isem->private ^ FUTEX_PRIVATE_FLAG); > - if (__builtin_expect (err, 0) < 0) > - { > - __set_errno (-err); > - return -1; > - } > - } > - return 0; > -} > -versioned_symbol (libpthread, __new_sem_post, sem_post, GLIBC_2_1); > - > -#if SHLIB_COMPAT (libpthread, GLIBC_2_0, GLIBC_2_1) > - > -int > -attribute_compat_text_section > -__old_sem_post (sem_t *sem) > -{ > - int *futex = (int *) sem; > - > - __asm __volatile (__ARCH_REL_INSTR ::: "memory"); > - (void) atomic_increment_val (futex); > - /* We always have to assume it is a shared semaphore. */ > - int err = lll_futex_wake (futex, 1, LLL_SHARED); > - if (__builtin_expect (err, 0) < 0) > - { > - __set_errno (-err); > - return -1; > - } > - return 0; > -} > - > -compat_symbol (libpthread, __old_sem_post, sem_post, GLIBC_2_0); > -#endif Please ask Adhemerval Zanella (IBM) about removing this. I suggest a short email to libc-alpha and TO him and ask for an ACK to remove this. > diff --git a/sysdeps/unix/sysv/linux/x86_64/sem_post.S b/sysdeps/unix/sysv/linux/x86_64/sem_post.S > deleted file mode 100644 > index f5dfb05..0000000 > --- a/sysdeps/unix/sysv/linux/x86_64/sem_post.S > +++ /dev/null OK. > diff --git a/sysdeps/unix/sysv/linux/x86_64/sem_timedwait.S b/sysdeps/unix/sysv/linux/x86_64/sem_timedwait.S > deleted file mode 100644 > index 091b241..0000000 > --- a/sysdeps/unix/sysv/linux/x86_64/sem_timedwait.S > +++ /dev/null OK. 
> diff --git a/sysdeps/unix/sysv/linux/x86_64/sem_trywait.S b/sysdeps/unix/sysv/linux/x86_64/sem_trywait.S > deleted file mode 100644 > index 1838e96..0000000 > --- a/sysdeps/unix/sysv/linux/x86_64/sem_trywait.S > +++ /dev/null OK. > diff --git a/sysdeps/unix/sysv/linux/x86_64/sem_wait.S b/sysdeps/unix/sysv/linux/x86_64/sem_wait.S > deleted file mode 100644 > index 2e7b131..0000000 > --- a/sysdeps/unix/sysv/linux/x86_64/sem_wait.S > +++ /dev/null OK. Cheers, Carlos. diff --git a/ChangeLog b/ChangeLog index fef1322..ae8d932 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,7 +1,7 @@ 2015-01-20 Torvald Riegel [BZ #12674] - * nptl/sem_waitcommon.c: New file. Implement new semaphore algorithm. + * nptl/sem_waitcommon.c: New file. * nptl/sem_wait.c: Include sem_waitcommon.c. (__sem_wait_cleanup, do_futex_wait): Remove. (__new_sem_wait): Adapt. diff --git a/nptl/sem_post.c b/nptl/sem_post.c index f7b6985..9162e4c 100644 --- a/nptl/sem_post.c +++ b/nptl/sem_post.c @@ -77,7 +77,7 @@ __new_sem_post (sem_t *sem) while (!atomic_compare_exchange_weak_release (&isem->data, &d, d + 1)); /* If there is any potentially blocked waiter, wake one of them. */ - if (d >> SEM_NWAITERS_SHIFT > 0) + if ((d >> SEM_NWAITERS_SHIFT) > 0) futex_wake (((unsigned int *) &isem->data) + SEM_VALUE_OFFSET, 1, private); #else /* Add a token to the semaphore. Similar to 64b version. */ diff --git a/nptl/sem_waitcommon.c b/nptl/sem_waitcommon.c index 5f3e87f..96848d7 100644 --- a/nptl/sem_waitcommon.c +++ b/nptl/sem_waitcommon.c @@ -32,11 +32,17 @@ TODO Remove when cleaning up the futex API throughout glibc. 
*/ static __always_inline int futex_abstimed_wait (unsigned int* futex, unsigned int expected, - const struct timespec* abstime, int private) + const struct timespec* abstime, int private, bool cancel) { - int err; + int err, oldtype; if (abstime == NULL) - err = lll_futex_wait (futex, expected, private); + { + if (cancel) + oldtype = __pthread_enable_asynccancel (); + err = lll_futex_wait (futex, expected, private); + if (cancel) + __pthread_disable_asynccancel (oldtype); + } else { struct timeval tv; @@ -62,8 +68,11 @@ futex_abstimed_wait (unsigned int* futex, unsigned int expected, /* Do wait. */ rt.tv_sec = sec; rt.tv_nsec = nsec; - + if (cancel) + oldtype = __pthread_enable_asynccancel (); err = lll_futex_timed_wait (futex, expected, &rt, private); + if (cancel) + __pthread_disable_asynccancel (oldtype); } switch (err) { @@ -193,17 +202,16 @@ static int __attribute__ ((noinline)) do_futex_wait (struct new_sem *sem, const struct timespec *abstime) { - int err, oldtype = __pthread_enable_asynccancel (); + int err; #if __HAVE_64B_ATOMICS err = futex_abstimed_wait ((unsigned int *) &sem->data + SEM_VALUE_OFFSET, 0, - abstime, sem->private); + abstime, sem->private, true); #else err = futex_abstimed_wait (&sem->value, SEM_NWAITERS_MASK, abstime, - sem->private); + sem->private, true); #endif - __pthread_disable_asynccancel (oldtype); return err; }