Patchwork semaphore: fix a hangup problem under load on NetBSD hosts.

login
register
mail settings
Submitter Izumi Tsutsui
Date June 29, 2013, 10:22 a.m.
Message ID <1372501374-3550-1-git-send-email-tsutsui@ceres.dti.ne.jp>
Download mbox | patch
Permalink /patch/255740/
State New
Headers show

Comments

Izumi Tsutsui - June 29, 2013, 10:22 a.m.
Fix following bugs in "fallback implementation of counting semaphores
with mutex+condvar" added in c166cb72f1676855816340666c3b618beef4b976:
 - waiting threads are not restarted properly if more than one threads
   are waiting unblock signals in qemu_sem_timedwait()
 - possible missing pthread_cond_signal(3) calls when waiting threads
   are returned by ETIMEDOUT
 - fix an uninitialized variable

The problem is analyzed by and fix is provided by Noriyuki Soda.

Signed-off-by: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
---
 util/qemu-thread-posix.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)
Laszlo Ersek - July 1, 2013, 9:54 a.m.
On 06/29/13 12:22, Izumi Tsutsui wrote:
> Fix following bugs in "fallback implementation of counting semaphores
> with mutex+condvar" added in c166cb72f1676855816340666c3b618beef4b976:
>  - waiting threads are not restarted properly if more than one threads
>    are waiting unblock signals in qemu_sem_timedwait()
>  - possible missing pthread_cond_signal(3) calls when waiting threads
>    are returned by ETIMEDOUT
>  - fix an uninitialized variable
> 
> The problem is analyzed by and fix is provided by Noriyuki Soda.
> 
> Signed-off-by: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
> ---
>  util/qemu-thread-posix.c | 17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
> index 4489abf..db7a15b 100644
> --- a/util/qemu-thread-posix.c
> +++ b/util/qemu-thread-posix.c
> @@ -172,10 +172,9 @@ void qemu_sem_post(QemuSemaphore *sem)
>      pthread_mutex_lock(&sem->lock);
>      if (sem->count == INT_MAX) {
>          rc = EINVAL;
> -    } else if (sem->count++ < 0) {
> -        rc = pthread_cond_signal(&sem->cond);
>      } else {
> -        rc = 0;
> +        sem->count++;
> +        rc = pthread_cond_signal(&sem->cond);
>      }
>      pthread_mutex_unlock(&sem->lock);
>      if (rc != 0) {
> @@ -207,19 +206,21 @@ int qemu_sem_timedwait(QemuSemaphore *sem, int ms)
>      struct timespec ts;
>  
>  #if defined(__APPLE__) || defined(__NetBSD__)
> +    rc = 0;
>      compute_abs_deadline(&ts, ms);
>      pthread_mutex_lock(&sem->lock);
> -    --sem->count;
> -    while (sem->count < 0) {
> +    while (sem->count <= 0) {
>          rc = pthread_cond_timedwait(&sem->cond, &sem->lock, &ts);
>          if (rc == ETIMEDOUT) {
> -            ++sem->count;
>              break;
>          }
>          if (rc != 0) {
>              error_exit(rc, __func__);
>          }
>      }
> +    if (rc != ETIMEDOUT) {
> +        --sem->count;
> +    }
>      pthread_mutex_unlock(&sem->lock);
>      return (rc == ETIMEDOUT ? -1 : 0);
>  #else
> @@ -251,10 +252,10 @@ void qemu_sem_wait(QemuSemaphore *sem)
>  {
>  #if defined(__APPLE__) || defined(__NetBSD__)
>      pthread_mutex_lock(&sem->lock);
> -    --sem->count;
> -    while (sem->count < 0) {
> +    while (sem->count <= 0) {
>          pthread_cond_wait(&sem->cond, &sem->lock);
>      }
> +    --sem->count;
>      pthread_mutex_unlock(&sem->lock);
>  #else
>      int rc;
> 

I agree with this patch, but I'd propose something more intrusive (feel
free to ignore it anyway): "QemuSemaphore.count" has no business with
negative values; it should be an unsigned int.

The condition on which consumers block is exactly (count == 0).
Conversely, the only time we need to send a signal is the 0->1 count
transition (*). Checks for negative values should be eliminated in
parallel with the int->unsigned type change.

Also I'd feel safer if pthread_cond_*() and pthread_mutex_*() were
retval-checked consistently, but that's tangential.

Reviewed-by: Laszlo Ersek <lersek@redhat.com>

(*) 100% tangential: this reminds me of when I made an attempt to
dissect condvars & co on reddit [1]. I considered pthread_cond_signal()
vs. pthread_cond_broadcast() too; alas my two conclusions there against
the former were wrong. See [2] why -- in short when a wakeup signal is
delivered, the victim thread is removed from the set of potential
victims. In other words, pthread_cond_signal() itself (vs. broadcast)
*is* correct here.

I also like that the signal is sent with the mutex held [3] [4].

[1] http://www.reddit.com/r/programming/comments/9ynxv/utter_verbiage_how_to_design_condition_variables/
[2] http://thread.gmane.org/gmane.comp.standards.posix.austin.general/4844/focus=4850
[3] http://thread.gmane.org/gmane.comp.standards.posix.austin.general/1822/focus=1823
[4] http://www.domaigne.com/blog/computing/condvars-signal-with-mutex-locked-or-not/

Thanks,
Laszlo
Izumi Tsutsui - July 2, 2013, 3:27 p.m.
Laszlo Ersek wrote:

> On 06/29/13 12:22, Izumi Tsutsui wrote:
> > Fix following bugs in "fallback implementation of counting semaphores
> > with mutex+condvar" added in c166cb72f1676855816340666c3b618beef4b976:
> >  - waiting threads are not restarted properly if more than one threads
> >    are waiting unblock signals in qemu_sem_timedwait()
> >  - possible missing pthread_cond_signal(3) calls when waiting threads
> >    are returned by ETIMEDOUT
> >  - fix an uninitialized variable
> > 
> > The problem is analyzed by and fix is provided by Noriyuki Soda.
> > 
> > Signed-off-by: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
> > ---
> >  util/qemu-thread-posix.c | 17 +++++++++--------
> >  1 file changed, 9 insertions(+), 8 deletions(-)
> > 
> > diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
> > index 4489abf..db7a15b 100644
> > --- a/util/qemu-thread-posix.c
> > +++ b/util/qemu-thread-posix.c
> > @@ -172,10 +172,9 @@ void qemu_sem_post(QemuSemaphore *sem)
> >      pthread_mutex_lock(&sem->lock);
> >      if (sem->count == INT_MAX) {
> >          rc = EINVAL;
> > -    } else if (sem->count++ < 0) {
> > -        rc = pthread_cond_signal(&sem->cond);
> >      } else {
> > -        rc = 0;
> > +        sem->count++;
> > +        rc = pthread_cond_signal(&sem->cond);
> >      }
> >      pthread_mutex_unlock(&sem->lock);
> >      if (rc != 0) {
> > @@ -207,19 +206,21 @@ int qemu_sem_timedwait(QemuSemaphore *sem, int ms)
> >      struct timespec ts;
> >  
> >  #if defined(__APPLE__) || defined(__NetBSD__)
> > +    rc = 0;
> >      compute_abs_deadline(&ts, ms);
> >      pthread_mutex_lock(&sem->lock);
> > -    --sem->count;
> > -    while (sem->count < 0) {
> > +    while (sem->count <= 0) {
> >          rc = pthread_cond_timedwait(&sem->cond, &sem->lock, &ts);
> >          if (rc == ETIMEDOUT) {
> > -            ++sem->count;
> >              break;
> >          }
> >          if (rc != 0) {
> >              error_exit(rc, __func__);
> >          }
> >      }
> > +    if (rc != ETIMEDOUT) {
> > +        --sem->count;
> > +    }
> >      pthread_mutex_unlock(&sem->lock);
> >      return (rc == ETIMEDOUT ? -1 : 0);
> >  #else
> > @@ -251,10 +252,10 @@ void qemu_sem_wait(QemuSemaphore *sem)
> >  {
> >  #if defined(__APPLE__) || defined(__NetBSD__)
> >      pthread_mutex_lock(&sem->lock);
> > -    --sem->count;
> > -    while (sem->count < 0) {
> > +    while (sem->count <= 0) {
> >          pthread_cond_wait(&sem->cond, &sem->lock);
> >      }
> > +    --sem->count;
> >      pthread_mutex_unlock(&sem->lock);
> >  #else
> >      int rc;
> > 
> 
> I agree with this patch, but I'd propose something more intrusive (feel
> free to ignore it anyway): "QemuSemaphore.count" has no business with
> negative values; it should be an unsigned int.
> 
> The condition on which consumers block is exactly (count == 0).

Sure, I'll post an updated patch v2 later.

> Conversely, the only time we need to send a signal is the 0->1 count
> transition (*).

Per comments from Soda, signals could be required even on count >0,
if more than one threads are sleeping in qemu_cond_timedwait(),
and more than one qemu_sem_post() are called at once, then
the second qemu_sem_post() gets the mutex before sleeping threads
in qemu_sem_timedwait().

> Checks for negative values should be eliminated in
> parallel with the int->unsigned type change.

I'll also eliminate them.

> Also I'd feel safer if pthread_cond_*() and pthread_mutex_*() were
> retval-checked consistently, but that's tangential.

I'll add a retval check of pthread_cond_wait() in qemu_sem_wait()
as pthread_cond_timedwait() in qemu_sem_timedwait().
But I'll leave pthread_mutex_{lock,unlock} because there are
many other sources which don't check retvals of them.

> Reviewed-by: Laszlo Ersek <lersek@redhat.com>

Thanks,

> (*) 100% tangential: this reminds me of when I made an attempt to
> dissect condvars & co on reddit [1]. I considered pthread_cond_signal()
> vs. pthread_cond_broadcast() too; alas my two conclusions there against
> the former were wrong. See [2] why -- in short when a wakeup signal is
> delivered, the victim thread is removed from the set of potential
> victims. In other words, pthread_cond_signal() itself (vs. broadcast)
> *is* correct here.
> 
> I also like that the signal is sent with the mutex held [3] [4].
> 
> [1] http://www.reddit.com/r/programming/comments/9ynxv/utter_verbiage_how_to_design_condition_variables/
> [2] http://thread.gmane.org/gmane.comp.standards.posix.austin.general/4844/focus=4850
> [3] http://thread.gmane.org/gmane.comp.standards.posix.austin.general/1822/focus=1823
> [4] http://www.domaigne.com/blog/computing/condvars-signal-with-mutex-locked-or-not/
> 
> Thanks,
> Laszlo
> 
---
Izumi Tsutsui
Laszlo Ersek - July 2, 2013, 3:47 p.m.
On 07/02/13 17:27, Izumi Tsutsui wrote:
> Laszlo Ersek wrote:

>> Conversely, the only time we need to send a signal is the 0->1 count
>> transition (*).
> 
> Per comments from Soda, signals could be required even on count >0,
> if more than one threads are sleeping in qemu_cond_timedwait(),
> and more than one qemu_sem_post() are called at once, then
> the second qemu_sem_post() gets the mutex before sleeping threads
> in qemu_sem_timedwait().

You're right.

Laszlo

Patch

diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index 4489abf..db7a15b 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -172,10 +172,9 @@  void qemu_sem_post(QemuSemaphore *sem)
     pthread_mutex_lock(&sem->lock);
     if (sem->count == INT_MAX) {
         rc = EINVAL;
-    } else if (sem->count++ < 0) {
-        rc = pthread_cond_signal(&sem->cond);
     } else {
-        rc = 0;
+        sem->count++;
+        rc = pthread_cond_signal(&sem->cond);
     }
     pthread_mutex_unlock(&sem->lock);
     if (rc != 0) {
@@ -207,19 +206,21 @@  int qemu_sem_timedwait(QemuSemaphore *sem, int ms)
     struct timespec ts;
 
 #if defined(__APPLE__) || defined(__NetBSD__)
+    rc = 0;
     compute_abs_deadline(&ts, ms);
     pthread_mutex_lock(&sem->lock);
-    --sem->count;
-    while (sem->count < 0) {
+    while (sem->count <= 0) {
         rc = pthread_cond_timedwait(&sem->cond, &sem->lock, &ts);
         if (rc == ETIMEDOUT) {
-            ++sem->count;
             break;
         }
         if (rc != 0) {
             error_exit(rc, __func__);
         }
     }
+    if (rc != ETIMEDOUT) {
+        --sem->count;
+    }
     pthread_mutex_unlock(&sem->lock);
     return (rc == ETIMEDOUT ? -1 : 0);
 #else
@@ -251,10 +252,10 @@  void qemu_sem_wait(QemuSemaphore *sem)
 {
 #if defined(__APPLE__) || defined(__NetBSD__)
     pthread_mutex_lock(&sem->lock);
-    --sem->count;
-    while (sem->count < 0) {
+    while (sem->count <= 0) {
         pthread_cond_wait(&sem->cond, &sem->lock);
     }
+    --sem->count;
     pthread_mutex_unlock(&sem->lock);
 #else
     int rc;