Patchwork [v2] qemu-char: eliminate busy waiting on can_read returning zero

login
register
mail settings
Submitter Paolo Bonzini
Date April 7, 2013, 6:49 p.m.
Message ID <5161BFAC.4030403@redhat.com>
Download mbox | patch
Permalink /patch/234517/
State New
Headers show

Comments

Paolo Bonzini - April 7, 2013, 6:49 p.m.
Il 06/04/2013 21:00, Amit Shah ha scritto:
> On (Fri) 05 Apr 2013 [17:59:33], Paolo Bonzini wrote:
>> The character backend refactoring introduced an undesirable busy wait.
>> The busy wait happens if can_read returns zero and there is data available
>> on the character device's file descriptor.  Then, the I/O watch will
>> fire continuously and, with TCG, the CPU thread will never run.
>>
>>     1) Char backend asks front end if it can write
>>     2) Front end says no
>>     3) poll() finds the char backend's descriptor is available
>>     4) Goto (1)
>>
>> What we really want is this (note that step 3 avoids the busy wait):
>>
>>     1) Char backend asks front end if it can write
>>     2) Front end says no
>>     3) poll() goes on without char backend's descriptor
>>     4) Goto (1) until qemu_chr_accept_input() called
>>
>>     5) Char backend asks front end if it can write
>>     6) Front end says yes
>>     7) poll() finds the char backend's descriptor is available
>>     8) Backend handler called
>>
>> After this patch, the IOWatchPoll source and the watch source are
>> separated.  The IOWatchPoll is simply a hook that runs during the prepare
>> phase on each main loop iteration.  The hook adds/removes the actual
>> source depending on the return value from can_read.
>>
>> A simple reproducer is
>>
>>     qemu-system-i386 -serial mon:stdio
>>
>> ... followed by banging on the terminal as much as you can. :)  Without
>> this patch, emulation will hang.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>         v1->v2: use g_source_get_context to find if the watch was active
> 
>>  static gboolean io_watch_poll_prepare(GSource *source, gint *timeout_)
>>  {
>>      IOWatchPoll *iwp = io_watch_poll_from_source(source);
>> -
>> -    iwp->max_size = iwp->fd_can_read(iwp->opaque);
>> -    if (iwp->max_size == 0) {
>> +    bool now_active = iwp->fd_can_read(iwp->opaque) > 0;
>> +    bool was_active = g_source_get_context(iwp->src) != NULL;
> 
> This gives me a bunch of
> 
> (process:30075): GLib-CRITICAL **: g_source_get_context: assertion `!SOURCE_DESTROYED (source)' failed
> 
> messages

This should fix it, but unfortunately Peter reports it is not enough:

     }


I'm not sure what is different between the microblaze and x86 cases.
Peter, please send us reproduction instructions.

Paolo
Peter Crosthwaite - April 7, 2013, 10:43 p.m.
Hi Paolo,

On Mon, Apr 8, 2013 at 4:49 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 06/04/2013 21:00, Amit Shah ha scritto:
>> On (Fri) 05 Apr 2013 [17:59:33], Paolo Bonzini wrote:
>>> The character backend refactoring introduced an undesirable busy wait.
>>> The busy wait happens if can_read returns zero and there is data available
>>> on the character device's file descriptor.  Then, the I/O watch will
>>> fire continuously and, with TCG, the CPU thread will never run.
>>>
>>>     1) Char backend asks front end if it can write
>>>     2) Front end says no
>>>     3) poll() finds the char backend's descriptor is available
>>>     4) Goto (1)
>>>
>>> What we really want is this (note that step 3 avoids the busy wait):
>>>
>>>     1) Char backend asks front end if it can write
>>>     2) Front end says no
>>>     3) poll() goes on without char backend's descriptor
>>>     4) Goto (1) until qemu_chr_accept_input() called
>>>
>>>     5) Char backend asks front end if it can write
>>>     6) Front end says yes
>>>     7) poll() finds the char backend's descriptor is available
>>>     8) Backend handler called
>>>
>>> After this patch, the IOWatchPoll source and the watch source are
>>> separated.  The IOWatchPoll is simply a hook that runs during the prepare
>>> phase on each main loop iteration.  The hook adds/removes the actual
>>> source depending on the return value from can_read.
>>>
>>> A simple reproducer is
>>>
>>>     qemu-system-i386 -serial mon:stdio
>>>
>>> ... followed by banging on the terminal as much as you can. :)  Without
>>> this patch, emulation will hang.
>>>
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>>         v1->v2: use g_source_get_context to find if the watch was active
>>
>>>  static gboolean io_watch_poll_prepare(GSource *source, gint *timeout_)
>>>  {
>>>      IOWatchPoll *iwp = io_watch_poll_from_source(source);
>>> -
>>> -    iwp->max_size = iwp->fd_can_read(iwp->opaque);
>>> -    if (iwp->max_size == 0) {
>>> +    bool now_active = iwp->fd_can_read(iwp->opaque) > 0;
>>> +    bool was_active = g_source_get_context(iwp->src) != NULL;
>>
>> This gives me a bunch of
>>
>> (process:30075): GLib-CRITICAL **: g_source_get_context: assertion `!SOURCE_DESTROYED (source)' failed
>>
>> messages
>
> This should fix it, but unfortunately Peter reports it is not enough:
>
> diff --git a/qemu-char.c b/qemu-char.c
> index 85ebcf9..c2fb985 100644
> --- a/qemu-char.c
> +++ b/qemu-char.c
> @@ -610,8 +610,14 @@ static IOWatchPoll
> *io_watch_poll_from_source(GSource *source)
>  static gboolean io_watch_poll_prepare(GSource *source, gint *timeout_)
>  {
>      IOWatchPoll *iwp = io_watch_poll_from_source(source);
> -    bool was_active = g_source_get_context(iwp->src) != NULL;
> -    bool now_active = iwp->fd_can_read(iwp->opaque) > 0;
> +    bool was_active, now_active;
> +
> +    if (g_source_is_destroyed(iwp->src)) {
> +        return FALSE;
> +    }
> +
> +    was_active = g_source_get_context(iwp->src) != NULL;
> +    now_active = iwp->fd_can_read(iwp->opaque) > 0;
>      if (was_active == now_active) {
>          return FALSE;
>      }
>

I too was getting the asserts. And this hunk does fix them. But the
bug persists.

>
> I'm not sure what is different between the microblaze and x86 cases.
> Peter, please send us reproduction instructions.
>

Heres a guest (Linux):

http://www.wiki.xilinx.com/file/view/image.elf/420852602/image.elf

./microblazeel-softmmu/qemu-system-microblazeel -M petalogix-ml605 -m
256 --serial mon:stdio --nographic -kernel image.elf

Let it run until you get the login prompt. Then paste 100 or so chars
into the console at once. Some will come through (before this patch
none came through), but the guest will then hang.

My configure:

./configure --target-list="microblazeel-softmmu" --disable-kvm --disable-fdt

Apply my "mainloop.c: Keep unlocking BQL during busy-wait spin-out"
and revert this patch to make it work. If you just revert this it will
work but be very slow.

Regards,
Peter

> Paolo
>

Patch

diff --git a/qemu-char.c b/qemu-char.c
index 85ebcf9..c2fb985 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -610,8 +610,14 @@  static IOWatchPoll
*io_watch_poll_from_source(GSource *source)
 static gboolean io_watch_poll_prepare(GSource *source, gint *timeout_)
 {
     IOWatchPoll *iwp = io_watch_poll_from_source(source);
-    bool was_active = g_source_get_context(iwp->src) != NULL;
-    bool now_active = iwp->fd_can_read(iwp->opaque) > 0;
+    bool was_active, now_active;
+
+    if (g_source_is_destroyed(iwp->src)) {
+        return FALSE;
+    }
+
+    was_active = g_source_get_context(iwp->src) != NULL;
+    now_active = iwp->fd_can_read(iwp->opaque) > 0;
     if (was_active == now_active) {
         return FALSE;