Message ID | 1364893452-10604-1-git-send-email-peter.crosthwaite@xilinx.com |
---|---|
State | New |
Headers | show |
Il 02/04/2013 11:04, Peter Crosthwaite ha scritto: > Public bug: 1154328 > Broken Commit: a29753f8aa79a34a324afebe340182a51a5aef11 > > ATM, the timeout from g_pollfds_fill is inhibiting unlocking of the > iothread. This is capable of causing a total deadlock when hw/serial > is used as a device. The bug manifests when you go -nographic -serial > mon:stdio and then paste 40 or more chars into the terminal. > > My knowledge of this g_foo is vague at best, but my best working > theory is this: > > - First 8 chars are recieved by the serial device no complaints. > - The next 32 chars, serial returns false for can_receive() so they > are buffered by the MuxDriver object - mux_chr_read() > - Buffer is full, so 41st char causes false return from Muxes own > can_read() > - This propagates all the way up to glib_pollfds_fill and manifests > as a timeout I suppose you mean "manifests as timeout==0". The question is *which* GSource has a timeout of zero? Not the mux's: if mux_chr_can_read() returns zero, the prepare function returns FALSE without touching the timeout at all... static gboolean io_watch_poll_prepare(GSource *source, gint *timeout_) { IOWatchPoll *iwp = io_watch_poll_from_source(source); iwp->max_size = iwp->fd_can_read(iwp->opaque); if (iwp->max_size == 0) { return FALSE; } return g_io_watch_funcs.prepare(source, timeout_); } > - Timeout means no unlock of IOthread. Device land never sees any more > cycles so the serial port never progresses - no flushing of > buffer Still, this is plausible, so the patch looks correct. Paolo > - Deadlock > > Tested on petalogix_ml605 microblazeel machine model, which was faulty > due to 1154328. > > Fix by removing the conditions on unlocking the iothread. Don't know > what else this will break but the timeout is certainly the wrong > condition for the unlock. Probably the real solution is to have a more > selective unlock policy. > > I'm happy for someone to take this patch off my hands, or educate me on > the correct implementation. For the peeps doing automated testing on > nographic platforms this will get your build working again. > > Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> > --- > main-loop.c | 8 ++------ > 1 files changed, 2 insertions(+), 6 deletions(-) > > diff --git a/main-loop.c b/main-loop.c > index eb80ff3..a376898 100644 > --- a/main-loop.c > +++ b/main-loop.c > @@ -194,15 +194,11 @@ static int os_host_main_loop_wait(uint32_t timeout) > > glib_pollfds_fill(&timeout); > > - if (timeout > 0) { > - qemu_mutex_unlock_iothread(); > - } > + qemu_mutex_unlock_iothread(); > > ret = g_poll((GPollFD *)gpollfds->data, gpollfds->len, timeout); > > - if (timeout > 0) { > - qemu_mutex_lock_iothread(); > - } > + qemu_mutex_lock_iothread(); > > glib_pollfds_poll(); > return ret; >
diff --git a/main-loop.c b/main-loop.c index eb80ff3..a376898 100644 --- a/main-loop.c +++ b/main-loop.c @@ -194,15 +194,11 @@ static int os_host_main_loop_wait(uint32_t timeout) glib_pollfds_fill(&timeout); - if (timeout > 0) { - qemu_mutex_unlock_iothread(); - } + qemu_mutex_unlock_iothread(); ret = g_poll((GPollFD *)gpollfds->data, gpollfds->len, timeout); - if (timeout > 0) { - qemu_mutex_lock_iothread(); - } + qemu_mutex_lock_iothread(); glib_pollfds_poll(); return ret;
Public bug: 1154328 Broken Commit: a29753f8aa79a34a324afebe340182a51a5aef11 ATM, the timeout from g_pollfds_fill is inhibiting unlocking of the iothread. This is capable of causing a total deadlock when hw/serial is used as a device. The bug manifests when you go -nographic -serial mon:stdio and then paste 40 or more chars into the terminal. My knowledge of this g_foo is vague at best, but my best working theory is this: - First 8 chars are recieved by the serial device no complaints. - The next 32 chars, serial returns false for can_receive() so they are buffered by the MuxDriver object - mux_chr_read() - Buffer is full, so 41st char causes false return from Muxes own can_read() - This propagates all the way up to glib_pollfds_fill and manifests as a timeout - Timeout means no unlock of IOthread. Device land never sees any more cycles so the serial port never progresses - no flushing of buffer - Deadlock Tested on petalogix_ml605 microblazeel machine model, which was faulty due to 1154328. Fix by removing the conditions on unlocking the iothread. Don't know what else this will break but the timeout is certainly the wrong condition for the unlock. Probably the real solution is to have a more selective unlock policy. I'm happy for someone to take this patch off my hands, or educate me on the correct implementation. For the peeps doing automated testing on nographic platforms this will get your build working again. Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> --- main-loop.c | 8 ++------ 1 files changed, 2 insertions(+), 6 deletions(-)