Patchwork Fix libgomp semaphores

login
register
mail settings
Submitter Alan Modra
Date Nov. 29, 2011, 12:49 p.m.
Message ID <20111129124911.GR7827@bubble.grove.modra.org>
Download mbox | patch
Permalink /patch/128286/
State New
Headers show

Comments

Alan Modra - Nov. 29, 2011, 12:49 p.m.
Since there were a number of changes requested, and some confusion
over what was happening in this code, it might be better if I post
a new patch and we start over.  In this patch I'm still using the MSB
for the wait flag, and count in the low 31 bits, as that way is
slightly more efficient on powerpc.  However, if some other target
generates really bad code for the masking operations, then I'm happy
enough to change the comment and use
#define SEM_WAIT 1
#define SEM_INC 2

Tested both ways on powerpc-linux.

	PR libgomp/51249
	* config/linux/sem.h: Rewrite.
	* config/linux/sem.c: Rewrite.
Richard Henderson - Nov. 29, 2011, 4:09 p.m.
On 11/29/2011 04:49 AM, Alan Modra wrote:
> Since there were a number of changes requested, and some confusion
> over what was happening in this code, it might be better if I post
> a new patch and we start over.  In this patch I'm still using the MSB
> for the wait flag, and count in the low 31 bits, as that way is
> slightly more efficient on powerpc.  However, if some other target
> generates really bad code for the masking operations, then I'm happy
> enough to change the comment and use
> #define SEM_WAIT 1
> #define SEM_INC 2

Agreed we can wait for more data for this.  I think readability of the file with these two names is significantly better already.

> +static inline bool
> +likely_compare_exchange (int *ptr, int *expected, int val, bool weak,
> +			 enum memmodel succ, enum memmodel fail)
>  {
> +  return __builtin_expect (__atomic_compare_exchange_n (ptr, expected, val,
> +							weak, succ, fail), 1);
>  }

Please move this to libgomp.h just below the memmodel definition, as a macro so that it's not tied to int.  It seems likely that we'll want this elsewhere in libgomp as we convert things.

> +gomp_sem_wait (gomp_sem_t *sem)
>  {
> +  int count = *sem;
> +
> +  while ((count & ~SEM_WAIT) != 0)
> +    if (likely_compare_exchange (sem, &count, count - SEM_INC,
> +				 false, MEMMODEL_ACQUIRE, MEMMODEL_RELAXED))
> +      return;

Tight loops like this should use the weak compare-exchange so that we don't get two nested loops without need.

> +gomp_sem_post (gomp_sem_t *sem)
> +{
> +  int count = *sem;
> +
> +  while (1)
> +    if (likely_compare_exchange (sem, &count, ((count + SEM_INC) & ~SEM_WAIT),
> +				 false, MEMMODEL_RELEASE, MEMMODEL_RELAXED))
> +      break;

Likewise.

And this morning I finally follow the logic.  I agree it's correct.  I do think we need more commentary here so that next month I can still follow it.

> +gomp_sem_wait_slow (gomp_sem_t *sem, int count)
>  {
> +  /* First loop spins a while.  */
> +  while (count == 0)
> +    if (do_spin (sem, 0)
> +	/* Spin timeout, nothing changed.  Set waiting flag.  */
> +	&& likely_compare_exchange (sem, &count, SEM_WAIT,
> +				    false, MEMMODEL_ACQUIRE, MEMMODEL_RELAXED))
> +      {

For the record, here I agree we should use the strong CAS, because the CAS is not the only thing in the loop.  We don't want a syscall just because the store-conditional failed.

Ok with those changes.


r~
Richard Henderson - Nov. 29, 2011, 5:50 p.m.
On 11/29/2011 08:09 AM, Richard Henderson wrote:
>> +static inline bool
>> +likely_compare_exchange (int *ptr, int *expected, int val, bool weak,
>> +			 enum memmodel succ, enum memmodel fail)
>>  {
>> +  return __builtin_expect (__atomic_compare_exchange_n (ptr, expected, val,
>> +							weak, succ, fail), 1);
>>  }
> 
> Please move this to libgomp.h just below the memmodel definition, as
> a macro so that it's not tied to int.  It seems likely that we'll
> want this elsewhere in libgomp as we convert things.

... although it does occur to me that it would be more useful to adjust gcc/predict.c so that we get this by default.  Then we don't have to contort loops like

> +  while (1)
> +    if (likely_compare_exchange (sem, &count, ((count + SEM_INC) & ~SEM_WAIT),
> +				 false, MEMMODEL_RELEASE, MEMMODEL_RELAXED))
> +      break;

this.  This is surely more natural as

  while (!__atomic_compare_exchange_n (sem, &count, ...))
    continue;


r~
Alan Modra - Nov. 30, 2011, 3:44 a.m.
On Tue, Nov 29, 2011 at 08:09:31AM -0800, Richard Henderson wrote:
> Tight loops like this should use the weak compare-exchange so that we don't get two nested loops without need.

Done.

> I do think we need more commentary here so that next month I can still follow it.

Added this in gomp_sem_post

  /* Clear SEM_WAIT here so that if there are no more waiting threads
     we transition back to the uncontended state that does not make
     futex syscalls.  If there are waiting threads then when one is
     awoken it will set SEM_WAIT again, so other waiting threads are
     woken on a future gomp_sem_post.  Furthermore, the awoken thread
     will wake other threads in case gomp_sem_post was called again
     before it had time to set SEM_WAIT.  */

and committed revision 181831 without likely_compare_exchange as
discussed.

Patch

Index: libgomp/config/linux/sem.h
===================================================================
--- libgomp/config/linux/sem.h	(revision 181770)
+++ libgomp/config/linux/sem.h	(working copy)
@@ -1,4 +1,4 @@ 
-/* Copyright (C) 2005, 2009 Free Software Foundation, Inc.
+/* Copyright (C) 2005, 2009, 2011 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
    This file is part of the GNU OpenMP Library (libgomp).
@@ -24,34 +24,65 @@ 
 
 /* This is a Linux specific implementation of a semaphore synchronization
    mechanism for libgomp.  This type is private to the library.  This 
-   implementation uses atomic instructions and the futex syscall.  */
+   counting semaphore implementation uses atomic instructions and the
+   futex syscall, and a single 32-bit int to store semaphore state.
+   The low 31 bits are the count, the top bit is a flag set when some
+   threads may be waiting.  */
 
 #ifndef GOMP_SEM_H
 #define GOMP_SEM_H 1
 
+#include <limits.h> /* For INT_MIN */
+
 typedef int gomp_sem_t;
+#define SEM_WAIT INT_MIN
+#define SEM_INC 1
+
+extern void gomp_sem_wait_slow (gomp_sem_t *, int);
+extern void gomp_sem_post_slow (gomp_sem_t *);
 
-static inline void gomp_sem_init (gomp_sem_t *sem, int value)
+static inline void
+gomp_sem_init (gomp_sem_t *sem, int value)
 {
-  *sem = value;
+  *sem = value * SEM_INC;
 }
 
-extern void gomp_sem_wait_slow (gomp_sem_t *);
-static inline void gomp_sem_wait (gomp_sem_t *sem)
+static inline void
+gomp_sem_destroy (gomp_sem_t *sem)
 {
-  if (!__sync_bool_compare_and_swap (sem, 1, 0))
-    gomp_sem_wait_slow (sem);
 }
 
-extern void gomp_sem_post_slow (gomp_sem_t *);
-static inline void gomp_sem_post (gomp_sem_t *sem)
+static inline bool
+likely_compare_exchange (int *ptr, int *expected, int val, bool weak,
+			 enum memmodel succ, enum memmodel fail)
 {
-  if (!__sync_bool_compare_and_swap (sem, 0, 1))
-    gomp_sem_post_slow (sem);
+  return __builtin_expect (__atomic_compare_exchange_n (ptr, expected, val,
+							weak, succ, fail), 1);
 }
 
-static inline void gomp_sem_destroy (gomp_sem_t *sem)
+static inline void
+gomp_sem_wait (gomp_sem_t *sem)
 {
+  int count = *sem;
+
+  while ((count & ~SEM_WAIT) != 0)
+    if (likely_compare_exchange (sem, &count, count - SEM_INC,
+				 false, MEMMODEL_ACQUIRE, MEMMODEL_RELAXED))
+      return;
+  gomp_sem_wait_slow (sem, count);
 }
 
+static inline void
+gomp_sem_post (gomp_sem_t *sem)
+{
+  int count = *sem;
+
+  while (1)
+    if (likely_compare_exchange (sem, &count, ((count + SEM_INC) & ~SEM_WAIT),
+				 false, MEMMODEL_RELEASE, MEMMODEL_RELAXED))
+      break;
+
+  if (__builtin_expect (count & SEM_WAIT, 0))
+    gomp_sem_post_slow (sem);
+}
 #endif /* GOMP_SEM_H */
Index: libgomp/config/linux/sem.c
===================================================================
--- libgomp/config/linux/sem.c	(revision 181770)
+++ libgomp/config/linux/sem.c	(working copy)
@@ -1,4 +1,4 @@ 
-/* Copyright (C) 2005, 2008, 2009 Free Software Foundation, Inc.
+/* Copyright (C) 2005, 2008, 2009, 2011 Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>.
 
    This file is part of the GNU OpenMP Library (libgomp).
@@ -28,34 +28,56 @@ 
 
 #include "wait.h"
 
-
 void
-gomp_sem_wait_slow (gomp_sem_t *sem)
+gomp_sem_wait_slow (gomp_sem_t *sem, int count)
 {
+  /* First loop spins a while.  */
+  while (count == 0)
+    if (do_spin (sem, 0)
+	/* Spin timeout, nothing changed.  Set waiting flag.  */
+	&& likely_compare_exchange (sem, &count, SEM_WAIT,
+				    false, MEMMODEL_ACQUIRE, MEMMODEL_RELAXED))
+      {
+	futex_wait (sem, SEM_WAIT);
+	count = *sem;
+	break;
+      }
+  /* Something changed.  If it wasn't the wait flag, we're good to go.  */
+    else if (__builtin_expect (((count = *sem) & SEM_WAIT) == 0 && count != 0,
+			       1))
+      {
+	if (likely_compare_exchange (sem, &count, count - SEM_INC,
+				     false, MEMMODEL_ACQUIRE, MEMMODEL_RELAXED))
+	  return;
+      }
+
+  /* Second loop waits until semaphore is posted.  We always exit this
+     loop with wait flag set, so next post will awaken a thread.  */
   while (1)
     {
-      int val = __sync_val_compare_and_swap (sem, 0, -1);
-      if (val > 0)
+      unsigned int wake = count & ~SEM_WAIT;
+      int newval = SEM_WAIT;
+
+      if (wake != 0)
+	newval |= wake - SEM_INC;
+      if (likely_compare_exchange (sem, &count, newval,
+				   false, MEMMODEL_ACQUIRE, MEMMODEL_RELAXED))
 	{
-	  if (__sync_bool_compare_and_swap (sem, val, val - 1))
-	    return;
+	  if (wake != 0)
+	    {
+	      /* If we can wake more threads, do so now.  */
+	      if (wake > SEM_INC)
+		gomp_sem_post_slow (sem);
+	      break;
+	    }
+	  do_wait (sem, SEM_WAIT);
+	  count = *sem;
 	}
-      do_wait (sem, -1);
     }
 }
 
 void
 gomp_sem_post_slow (gomp_sem_t *sem)
 {
-  int old, tmp = *sem, wake;
-
-  do
-    {
-      old = tmp;
-      wake = old > 0 ? old + 1 : 1;
-      tmp = __sync_val_compare_and_swap (sem, old, wake);
-    }
-  while (old != tmp);
-
-  futex_wake (sem, wake);
+  futex_wake (sem, 1);
 }