Message ID: 53BC2579.9060200@redhat.com
State: New
On 08/07/14 19:08, Paolo Bonzini wrote:
> On 08/07/2014 17:59, Stefan Hajnoczi wrote:
>> I sent Christian an initial patch to fix this but now both threads are
>> stuck in rfifolock_lock() inside cond wait. That's very strange and
>> should never happen.
>
> I had this patch pending for 2.2:
>
> commit 6c81e31615c3cda5ea981a998ba8b1b8ed17de6f
> Author: Paolo Bonzini <pbonzini@redhat.com>
> Date:   Mon Jul 7 10:39:49 2014 +0200
>
>     iothread: do not rely on aio_poll(ctx, true) result to end a loop
>
>     Currently, whenever aio_poll(ctx, true) has completed all pending
>     work it returns true *and* the next call to aio_poll(ctx, true)
>     will not block.
>
>     This invariant has its roots in qemu_aio_flush()'s implementation
>     as "while (qemu_aio_wait()) {}". However, qemu_aio_flush() does
>     not exist anymore and bdrv_drain_all() is implemented differently;
>     and this invariant is complicated to maintain and subtly different
>     from the return value of GMainLoop's g_main_context_iteration.
>
>     All calls to aio_poll(ctx, true) except one are guarded by a
>     while() loop checking for a request to be incomplete, or a
>     BlockDriverState to be idle. Modify that one exception in
>     iothread.c.
>
>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

The hangs are gone. Looks like 2.1 material now...

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>

> diff --git a/iothread.c b/iothread.c
> index 1fbf9f1..d9403cf 100644
> --- a/iothread.c
> +++ b/iothread.c
> @@ -30,6 +30,7 @@ typedef ObjectClass IOThreadClass;
>  static void *iothread_run(void *opaque)
>  {
>      IOThread *iothread = opaque;
> +    bool blocking;
>
>      qemu_mutex_lock(&iothread->init_done_lock);
>      iothread->thread_id = qemu_get_thread_id();
> @@ -38,8 +39,10 @@ static void *iothread_run(void *opaque)
>
>      while (!iothread->stopping) {
>          aio_context_acquire(iothread->ctx);
> -        while (!iothread->stopping && aio_poll(iothread->ctx, true)) {
> +        blocking = true;
> +        while (!iothread->stopping && aio_poll(iothread->ctx, blocking)) {
>              /* Progress was made, keep going */
> +            blocking = false;
>          }
>          aio_context_release(iothread->ctx);
>      }
>
> Christian, can you test it?
>
> Paolo
On 08/07/2014 21:07, Christian Borntraeger wrote:
> [...]
> The hangs are gone. Looks like 2.1 material now...
>
> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>

Great, I'll send it out tomorrow morning.

Paolo
On Tue, Jul 8, 2014 at 9:07 PM, Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> On 08/07/14 19:08, Paolo Bonzini wrote:
>> I had this patch pending for 2.2:
>>
>> commit 6c81e31615c3cda5ea981a998ba8b1b8ed17de6f
>> [...]
>>     iothread: do not rely on aio_poll(ctx, true) result to end a loop
>> [...]
>
> The hangs are gone. Looks like 2.1 material now...
>
> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
>
>> [...]
>>      while (!iothread->stopping) {
>>          aio_context_acquire(iothread->ctx);
>> -        while (!iothread->stopping && aio_poll(iothread->ctx, true)) {
>> +        blocking = true;
>> +        while (!iothread->stopping && aio_poll(iothread->ctx, blocking)) {
>>              /* Progress was made, keep going */
>> +            blocking = false;
>>          }
>>          aio_context_release(iothread->ctx);
>>      }
>>
>> Christian, can you test it?

This could affect performance because of the extra poll/release/acquire cycle, but it is a clean solution for the broken iothread_run().

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

Good for 2.1
diff --git a/iothread.c b/iothread.c
index 1fbf9f1..d9403cf 100644
--- a/iothread.c
+++ b/iothread.c
@@ -30,6 +30,7 @@ typedef ObjectClass IOThreadClass;
 static void *iothread_run(void *opaque)
 {
     IOThread *iothread = opaque;
+    bool blocking;
 
     qemu_mutex_lock(&iothread->init_done_lock);
     iothread->thread_id = qemu_get_thread_id();
@@ -38,8 +39,10 @@ static void *iothread_run(void *opaque)
 
     while (!iothread->stopping) {
         aio_context_acquire(iothread->ctx);
-        while (!iothread->stopping && aio_poll(iothread->ctx, true)) {
+        blocking = true;
+        while (!iothread->stopping && aio_poll(iothread->ctx, blocking)) {
             /* Progress was made, keep going */
+            blocking = false;
         }
         aio_context_release(iothread->ctx);
     }