Message ID | 08ee1d5d77b710a12bdd236a54a3621e85914791.1621517561.git.grive@u256.net |
---|---|
State | Changes Requested |
Headers | show |
Series | RCU: Add blocking mode for debugging | expand |
On 5/20/21 3:35 PM, Gaetan Rivet wrote: > Joining pthreads makes the caller quiescent. It should register as such, > as joined threads may wait on an RCU callback executing before quitting, > deadlocking the caller. Hi, Gaetan. This patch doesn't look right to me. The problem is that users of this function has no idea that the quiescent state will be entered by this function. And this is really hard to track down, because it can be called very deep inside some separate part of the code base that user at the top level might not even know about. For example, a lot of call chains in ovsdb-server may lead to xpthread_join called from ovsdb/log.c. Even though ovsdb-server is single-threaded now, insertion of the ovsrcu_quiesce_start() into xpthread_join() will effectively mean that ovsdb-server will never be able to use RCU if it will become multi-threaded someday, e.g. it will not be able to use CMAPs. So, instead of doing that, callers should enter quiescent state before joining, and this should be done at the highest level of a call chain possible. We have the same thing with cond_wait() implementation, you may add similar comment to the join() function as we have for cond_wait(). Best regards, Ilya Maximets.
On Wed, Jun 30, 2021, at 11:09, Ilya Maximets wrote: > On 5/20/21 3:35 PM, Gaetan Rivet wrote: > > Joining pthreads makes the caller quiescent. It should register as such, > > as joined threads may wait on an RCU callback executing before quitting, > > deadlocking the caller. > > Hi, Gaetan. > > This patch doesn't look right to me. The problem is that users of this > function has no idea that the quiescent state will be entered by this > function. And this is really hard to track down, because it can be called > very deep inside some separate part of the code base that user at the > top level might not even know about. For example, a lot of call chains > in ovsdb-server may lead to xpthread_join called from ovsdb/log.c. > Even though ovsdb-server is single-threaded now, insertion of the > ovsrcu_quiesce_start() into xpthread_join() will effectively mean that > ovsdb-server will never be able to use RCU if it will become multi-threaded > someday, e.g. it will not be able to use CMAPs. > > So, instead of doing that, callers should enter quiescent state before > joining, and this should be done at the highest level of a call chain > possible. We have the same thing with cond_wait() implementation, you > may add similar comment to the join() function as we have for cond_wait(). > > Best regards, Ilya Maximets. > Hi Ilya, I think you are right, this is dangerous and puts the caller in a position where they might break RCU without realizing. Without quiescing, joining on a thread that has RCU callbacks scheduled has 2 different behaviors: if the RCU is not blocking this is fine and joining will resolve ; if it is blocking then both threads enter a deadlock. The solution would be to explicitly quiesce as you mentioned, but this is what I tried to avoid with this series: requiring developers to 'fix' code that is only an issue if this very specific compilation option is used. I don't yet see a good solution to it. Documenting only is bound to leave incorrect code through, that would be fine most of the time but will break on people trying to use RCU blocking. I could try to detect the deadlock (I don't see how yet), issue a warning and make the joined thread non-blocking / wake it if already blocked. Thanks for the review,
diff --git a/lib/ovs-thread.c b/lib/ovs-thread.c index 805cba622..bf58923f8 100644 --- a/lib/ovs-thread.c +++ b/lib/ovs-thread.c @@ -180,8 +180,6 @@ XPTHREAD_FUNC1(pthread_cond_destroy, pthread_cond_t *); XPTHREAD_FUNC1(pthread_cond_signal, pthread_cond_t *); XPTHREAD_FUNC1(pthread_cond_broadcast, pthread_cond_t *); -XPTHREAD_FUNC2(pthread_join, pthread_t, void **); - typedef void destructor_func(void *); XPTHREAD_FUNC2(pthread_key_create, pthread_key_t *, destructor_func *); XPTHREAD_FUNC1(pthread_key_delete, pthread_key_t); @@ -191,6 +189,20 @@ XPTHREAD_FUNC2(pthread_setspecific, pthread_key_t, const void *); XPTHREAD_FUNC3(pthread_sigmask, int, const sigset_t *, sigset_t *); #endif +void +xpthread_join(pthread_t thread, void **retval) +{ + int error; + + ovsrcu_quiesce_start(); + error = pthread_join(thread, retval); + ovsrcu_quiesce_end(); + + if (OVS_UNLIKELY(error)) { + ovs_abort(error, "%s failed", __func__); + } +} + static void ovs_mutex_init__(const struct ovs_mutex *l_, int type) {
Joining pthreads makes the caller quiescent. It should register as such, as joined threads may wait on an RCU callback executing before quitting, deadlocking the caller. Signed-off-by: Gaetan Rivet <grive@u256.net> --- lib/ovs-thread.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-)