diff mbox series

[ovs-dev,v2,6/8] ovs-thread: Quiesce when joining pthreads

Message ID 08ee1d5d77b710a12bdd236a54a3621e85914791.1621517561.git.grive@u256.net
State Changes Requested
Headers show
Series RCU: Add blocking mode for debugging | expand

Commit Message

Gaetan Rivet May 20, 2021, 1:35 p.m. UTC
Joining pthreads makes the caller quiescent. It should register as such,
as joined threads may wait on an RCU callback executing before quitting,
deadlocking the caller.

Signed-off-by: Gaetan Rivet <grive@u256.net>
---
 lib/ovs-thread.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

Comments

Ilya Maximets June 30, 2021, 9:09 a.m. UTC | #1
On 5/20/21 3:35 PM, Gaetan Rivet wrote:
> Joining pthreads makes the caller quiescent. It should register as such,
> as joined threads may wait on an RCU callback executing before quitting,
> deadlocking the caller.

Hi, Gaetan.

This patch doesn't look right to me.  The problem is that users of this
function has no idea that the quiescent state will be entered by this
function.  And this is really hard to track down, because it can be called
very deep inside some separate part of the code base that user at the
top level might not even know about.  For example, a lot of call chains
in ovsdb-server may lead to xpthread_join called from ovsdb/log.c.
Even though ovsdb-server is single-threaded now, insertion of the
ovsrcu_quiesce_start() into xpthread_join() will effectively mean that
ovsdb-server will never be able to use RCU if it will become multi-threaded
someday, e.g. it will not be able to use CMAPs.

So, instead of doing that, callers should enter quiescent state before
joining, and this should be done at the highest level of a call chain
possible.  We have the same thing with cond_wait() implementation, you
may add similar comment to the join() function as we have for cond_wait().

Best regards, Ilya Maximets.
Gaetan Rivet June 30, 2021, 1:11 p.m. UTC | #2
On Wed, Jun 30, 2021, at 11:09, Ilya Maximets wrote:
> On 5/20/21 3:35 PM, Gaetan Rivet wrote:
> > Joining pthreads makes the caller quiescent. It should register as such,
> > as joined threads may wait on an RCU callback executing before quitting,
> > deadlocking the caller.
> 
> Hi, Gaetan.
> 
> This patch doesn't look right to me.  The problem is that users of this
> function has no idea that the quiescent state will be entered by this
> function.  And this is really hard to track down, because it can be called
> very deep inside some separate part of the code base that user at the
> top level might not even know about.  For example, a lot of call chains
> in ovsdb-server may lead to xpthread_join called from ovsdb/log.c.
> Even though ovsdb-server is single-threaded now, insertion of the
> ovsrcu_quiesce_start() into xpthread_join() will effectively mean that
> ovsdb-server will never be able to use RCU if it will become multi-threaded
> someday, e.g. it will not be able to use CMAPs.
> 
> So, instead of doing that, callers should enter quiescent state before
> joining, and this should be done at the highest level of a call chain
> possible.  We have the same thing with cond_wait() implementation, you
> may add similar comment to the join() function as we have for cond_wait().
> 
> Best regards, Ilya Maximets.
> 

Hi Ilya,

I think you are right, this is dangerous and puts the caller in a position
where they might break RCU without realizing.

Without quiescing, joining on a thread that has RCU callbacks scheduled
has 2 different behaviors: if the RCU is not blocking this is fine and
joining will resolve ; if it is blocking then both threads enter a deadlock.

The solution would be to explicitly quiesce as you mentioned, but this is what
I tried to avoid with this series: requiring developers to 'fix' code that is
only an issue if this very specific compilation option is used.

I don't yet see a good solution to it.

Documenting only is bound to leave incorrect code through, that would be fine
most of the time but will break on people trying to use RCU blocking.
I could try to detect the deadlock (I don't see how yet), issue a warning and
make the joined thread non-blocking / wake it if already blocked.

Thanks for the review,
diff mbox series

Patch

diff --git a/lib/ovs-thread.c b/lib/ovs-thread.c
index 805cba622..bf58923f8 100644
--- a/lib/ovs-thread.c
+++ b/lib/ovs-thread.c
@@ -180,8 +180,6 @@  XPTHREAD_FUNC1(pthread_cond_destroy, pthread_cond_t *);
 XPTHREAD_FUNC1(pthread_cond_signal, pthread_cond_t *);
 XPTHREAD_FUNC1(pthread_cond_broadcast, pthread_cond_t *);
 
-XPTHREAD_FUNC2(pthread_join, pthread_t, void **);
-
 typedef void destructor_func(void *);
 XPTHREAD_FUNC2(pthread_key_create, pthread_key_t *, destructor_func *);
 XPTHREAD_FUNC1(pthread_key_delete, pthread_key_t);
@@ -191,6 +189,20 @@  XPTHREAD_FUNC2(pthread_setspecific, pthread_key_t, const void *);
 XPTHREAD_FUNC3(pthread_sigmask, int, const sigset_t *, sigset_t *);
 #endif
 
+void
+xpthread_join(pthread_t thread, void **retval)
+{
+    int error;
+
+    ovsrcu_quiesce_start();
+    error = pthread_join(thread, retval);
+    ovsrcu_quiesce_end();
+
+    if (OVS_UNLIKELY(error)) {
+        ovs_abort(error, "%s failed", __func__);
+    }
+}
+
 static void
 ovs_mutex_init__(const struct ovs_mutex *l_, int type)
 {