diff mbox

[RFC] main-loop: Unconditionally unlock iothread

Message ID 1364893452-10604-1-git-send-email-peter.crosthwaite@xilinx.com
State New
Headers show

Commit Message

Peter Crosthwaite April 2, 2013, 9:04 a.m. UTC
Public bug: 1154328
Broken Commit: a29753f8aa79a34a324afebe340182a51a5aef11

ATM, the timeout from g_pollfds_fill is inhibiting unlocking of the
iothread. This is capable of causing a total deadlock when hw/serial
is used as a device. The bug manifests when you go -nographic -serial
mon:stdio and then paste 40 or more chars into the terminal.

My knowledge of this g_foo is vague at best, but my best working
theory is this:

- First 8 chars are recieved by the serial device no complaints.
- The next 32 chars, serial returns false for can_receive() so they
	are buffered by the MuxDriver object - mux_chr_read()
- Buffer is full, so 41st char causes false return from Muxes own
	can_read()
- This propagates all the way up to glib_pollfds_fill and manifests
	as a timeout
- Timeout means no unlock of IOthread. Device land never sees any more
	cycles so the serial port never progresses - no flushing of
	buffer
- Deadlock

Tested on petalogix_ml605 microblazeel machine model, which was faulty
due to 1154328.

Fix by removing the conditions on unlocking the iothread. Don't know
what else this will break but the timeout is certainly the wrong
condition for the unlock. Probably the real solution is to have a more
selective unlock policy.

I'm happy for someone to take this patch off my hands, or educate me on
the correct implementation. For the peeps doing automated testing on
nographic platforms this will get your build working again.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
---
 main-loop.c |    8 ++------
 1 files changed, 2 insertions(+), 6 deletions(-)

Comments

Paolo Bonzini April 2, 2013, 11:11 a.m. UTC | #1
Il 02/04/2013 11:04, Peter Crosthwaite ha scritto:
> Public bug: 1154328
> Broken Commit: a29753f8aa79a34a324afebe340182a51a5aef11
> 
> ATM, the timeout from g_pollfds_fill is inhibiting unlocking of the
> iothread. This is capable of causing a total deadlock when hw/serial
> is used as a device. The bug manifests when you go -nographic -serial
> mon:stdio and then paste 40 or more chars into the terminal.
> 
> My knowledge of this g_foo is vague at best, but my best working
> theory is this:
> 
> - First 8 chars are recieved by the serial device no complaints.
> - The next 32 chars, serial returns false for can_receive() so they
> 	are buffered by the MuxDriver object - mux_chr_read()
> - Buffer is full, so 41st char causes false return from Muxes own
> 	can_read()
> - This propagates all the way up to glib_pollfds_fill and manifests
> 	as a timeout

I suppose you mean "manifests as timeout==0".  The question is *which*
GSource has a timeout of zero?  Not the mux's: if mux_chr_can_read()
returns zero, the prepare function returns FALSE without touching the
timeout at all...

static gboolean io_watch_poll_prepare(GSource *source, gint *timeout_)
{
    IOWatchPoll *iwp = io_watch_poll_from_source(source);

    iwp->max_size = iwp->fd_can_read(iwp->opaque);
    if (iwp->max_size == 0) {
        return FALSE;
    }

    return g_io_watch_funcs.prepare(source, timeout_);
}

> - Timeout means no unlock of IOthread. Device land never sees any more
> 	cycles so the serial port never progresses - no flushing of
> 	buffer

Still, this is plausible, so the patch looks correct.

Paolo

> - Deadlock
> 
> Tested on petalogix_ml605 microblazeel machine model, which was faulty
> due to 1154328.
> 
> Fix by removing the conditions on unlocking the iothread. Don't know
> what else this will break but the timeout is certainly the wrong
> condition for the unlock. Probably the real solution is to have a more
> selective unlock policy.
> 
> I'm happy for someone to take this patch off my hands, or educate me on
> the correct implementation. For the peeps doing automated testing on
> nographic platforms this will get your build working again.
> 
> Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
> ---
>  main-loop.c |    8 ++------
>  1 files changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/main-loop.c b/main-loop.c
> index eb80ff3..a376898 100644
> --- a/main-loop.c
> +++ b/main-loop.c
> @@ -194,15 +194,11 @@ static int os_host_main_loop_wait(uint32_t timeout)
>  
>      glib_pollfds_fill(&timeout);
>  
> -    if (timeout > 0) {
> -        qemu_mutex_unlock_iothread();
> -    }
> +    qemu_mutex_unlock_iothread();
>  
>      ret = g_poll((GPollFD *)gpollfds->data, gpollfds->len, timeout);
>  
> -    if (timeout > 0) {
> -        qemu_mutex_lock_iothread();
> -    }
> +    qemu_mutex_lock_iothread();
>  
>      glib_pollfds_poll();
>      return ret;
>
diff mbox

Patch

diff --git a/main-loop.c b/main-loop.c
index eb80ff3..a376898 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -194,15 +194,11 @@  static int os_host_main_loop_wait(uint32_t timeout)
 
     glib_pollfds_fill(&timeout);
 
-    if (timeout > 0) {
-        qemu_mutex_unlock_iothread();
-    }
+    qemu_mutex_unlock_iothread();
 
     ret = g_poll((GPollFD *)gpollfds->data, gpollfds->len, timeout);
 
-    if (timeout > 0) {
-        qemu_mutex_lock_iothread();
-    }
+    qemu_mutex_lock_iothread();
 
     glib_pollfds_poll();
     return ret;