Patchwork event_notifier: move to top-level directory

Submitter Anthony Liguori
Date Sept. 27, 2011, 4:05 p.m.
Message ID <4E81F43B.1040708@codemonkey.ws>
Permalink /patch/116625/
State New
Headers show

Comments

Anthony Liguori - Sept. 27, 2011, 4:05 p.m.
On 09/27/2011 10:28 AM, Jan Kiszka wrote:
> On 2011-09-27 17:17, Stefan Hajnoczi wrote:
>> On Tue, Sep 27, 2011 at 3:26 PM, Avi Kivity<avi@redhat.com>  wrote:
>>> Has no business in hw/.
>>>
>>> Signed-off-by: Avi Kivity<avi@redhat.com>
>>> ---
>>>   hw/event_notifier.c =>  event_notifier.c |    1 -
>>>   hw/event_notifier.h =>  event_notifier.h |    0
>>>   2 files changed, 0 insertions(+), 1 deletions(-)
>>>   rename hw/event_notifier.c =>  event_notifier.c (98%)
>>>   rename hw/event_notifier.h =>  event_notifier.h (100%)
>>
>> Yay.  Now perhaps we can kill qemu_eventfd(), whose users are
>> typically poking around with write(2) and read(2) when really they
>> could use the high-level event_notifier.h API.
>
> EventNotifiers will have to be superseded by something more generic
> first. It's not fully covering the use cases of cpus.c and
> posix-aio-compat.c.

Actually, for posix-aio we can just switch to using g_idle_add().  It is built on 
g_source_attach(), which is thread safe, so it gives you a thread-safe mechanism 
to defer a piece of work to the main loop, and that is really what we want 
here.

This can actually be made to work with sync I/O emulation too, by having another 
GMainLoop in the sync I/O loop, although I thought I recalled a patch series to 
remove that code.

Kevin/Stefan, what are the plans for sync I/O emulation?

See untested patch below.

Regards,

Anthony Liguori

>
> Jan
>
Paolo Bonzini - Sept. 27, 2011, 4:39 p.m.
On 09/27/2011 06:05 PM, Anthony Liguori wrote:
> Actually, for posix-aio, we can just switch to using g_idle_add().
> g_idle_add() uses g_source_attach which is thread safe.  g_idle_add()
>  gives you a thread safe mechanism to defer a piece of work to the
> main loop which is really what we want here.

For that, a bottom half would also do (apart from the fact that I am not sure it 
is async-safe with TCG).  In fact, that would make sense, since all of 
posix_aio_process_queue could become a bottom half.

But the problem to be solved here is waking up qemu_aio_wait, and
scheduling bottom halves currently wakes up only the vl.c main loop.

> This can actually be made to work with sync I/O emulation too by
> having another GMainLoop in the sync I/O loop although I thought I
> recalled a patch series to remove that stuff.

... which stuff? :)  Another GMainLoop in the sync I/O loop is 
problematic for two reasons: 1) the sync I/O loop does not relinquish 
the I/O thread mutex, which makes it very different from the outer loop; 
2) a nested GMainLoop keeps polling all the file descriptors in the 
outer loop, which requires you to cope with reentrancy in those monitor 
commands that flush AIO.

Paolo
Anthony Liguori - Sept. 27, 2011, 9:23 p.m.
On 09/27/2011 11:39 AM, Paolo Bonzini wrote:
> On 09/27/2011 06:05 PM, Anthony Liguori wrote:
>> Actually, for posix-aio, we can just switch to using g_idle_add().
>> g_idle_add() uses g_source_attach which is thread safe. g_idle_add()
>> gives you a thread safe mechanism to defer a piece of work to the
>> main loop which is really what we want here.
>
> For that, a bottom half would also do (apart that I am not sure it is
> async-safe with TCG). In fact, that would make sense since all of
> posix_aio_process_queue could become a bottom half.

Bottom halves are signal safe, not thread safe.

To make bottom halves thread safe, you would (at the very least) have to add 
some barriers when reading/writing the scheduling flag.  I think it's much 
better to just use GIdle sources, though.

>> This can actually be made to work with sync I/O emulation too by
>> having another GMainLoop in the sync I/O loop although I thought I
>> recalled a patch series to remove that stuff.
>
> ... which stuff? :)

The sync I/O emulation.  Since sync I/O is done in block drivers, they can just 
use coroutine I/O instead of sync I/O.

> Another GMainLoop in the sync I/O loop is problematic for
> two reasons: 1) the sync I/O loop does not relinquish the I/O thread mutex,
> which makes it very different from the outer loop; 2) a nested GMainLoop keeps
> polling all the file descriptors in the outer loop, which requires you to cope
> with reentrancy in those monitor commands that flush AIO.

Re: (2), you can use a separate aio GMainContext.  The trick is that you need to 
keep a global current main context that switches when you enter the AIO main loop.

It's doable, but since I think we're on the verge of eliminating sync I/O 
emulation, that's probably a better path to pursue.

Regards,

Anthony Liguori

>
> Paolo
>
Paolo Bonzini - Sept. 28, 2011, 6:27 a.m.
On 09/27/2011 11:23 PM, Anthony Liguori wrote:
> On 09/27/2011 11:39 AM, Paolo Bonzini wrote:
>> On 09/27/2011 06:05 PM, Anthony Liguori wrote:
>>> Actually, for posix-aio, we can just switch to using g_idle_add().
>>> g_idle_add() uses g_source_attach which is thread safe. g_idle_add()
>>> gives you a thread safe mechanism to defer a piece of work to the
>>> main loop which is really what we want here.
>>
>> For that, a bottom half would also do (apart that I am not sure it is
>> async-safe with TCG). In fact, that would make sense since all of
>> posix_aio_process_queue could become a bottom half.
>
> Bottom halves are signal safe, not thread safe.
>
> To make bottom halves thread safe, you would (in the very least) have to
> add some barriers when reading/writing the scheduling flag.

You can probably assume that qemu_notify_event (and, dually, the read in 
the main loop) act as write and read memory barriers respectively, or even as 
full barriers.

If we switch entirely to GSources, it would be nice to use them.  But 
since we aren't, and our main loop functionality is quite different from 
glib's (it doesn't rely on abstractions for file descriptors, for 
example), it is just a painful incomplete transition to use glib's idle 
sources to do the exact same thing that is done by bottom halves (which 
are already in our toolbox).

Paolo
Jan Kiszka - Sept. 28, 2011, 7:52 a.m.
On 2011-09-28 08:27, Paolo Bonzini wrote:
> On 09/27/2011 11:23 PM, Anthony Liguori wrote:
>> On 09/27/2011 11:39 AM, Paolo Bonzini wrote:
>>> On 09/27/2011 06:05 PM, Anthony Liguori wrote:
>>>> Actually, for posix-aio, we can just switch to using g_idle_add().
>>>> g_idle_add() uses g_source_attach which is thread safe. g_idle_add()
>>>> gives you a thread safe mechanism to defer a piece of work to the
>>>> main loop which is really what we want here.
>>>
>>> For that, a bottom half would also do (apart that I am not sure it is
>>> async-safe with TCG). In fact, that would make sense since all of
>>> posix_aio_process_queue could become a bottom half.
>>
>> Bottom halves are signal safe, not thread safe.
>>
>> To make bottom halves thread safe, you would (in the very least) have to
>> add some barriers when reading/writing the scheduling flag.
> 
> You can probably assume that qemu_notify_event (and dually the read in 
> the main loop) are resp. write/read memory barriers.  Or even full.
> 
> If we switch entirely to GSources, it would be nice to use them.  But 
> since we aren't, and our main loop functionality is quite different from 
> glib's (it doesn't rely on abstractions for file descriptors, for 
> example), it is just a painful incomplete transition to use glib's idle 
> sources to do the exact same thing that is done by bottom halves (which 
> are already in our toolbox).

BTW, I just wondered whether there is anything conceptually preventing us from
skipping this ping-pong between the AIO thread and the main loop and just
running the completion in the former context (under global lock protection, of
course).

Jan
Paolo Bonzini - Sept. 28, 2011, 8 a.m.
On 09/28/2011 09:52 AM, Jan Kiszka wrote:
> >  You can probably assume that qemu_notify_event (and dually the read in
> >  the main loop) are resp. write/read memory barriers.  Or even full.
> >
> >  If we switch entirely to GSources, it would be nice to use them.  But
> >  since we aren't, and our main loop functionality is quite different from
> >  glib's (it doesn't rely on abstractions for file descriptors, for
> >  example), it is just a painful incomplete transition to use glib's idle
> >  sources to do the exact same thing that is done by bottom halves (which
> >  are already in our toolbox).
>
> BTW, I just wondered if there is anything conceptually preventing to
> skip this ping pong between AIO thread and main loop and just run the
> completion over the former context (under global lock protection of course).

Would be a good idea, but it would require some refactoring because 
tools do not have a global lock.

Paolo
Stefan Hajnoczi - Sept. 28, 2011, 9:12 a.m.
On Tue, Sep 27, 2011 at 10:23 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
> On 09/27/2011 11:39 AM, Paolo Bonzini wrote:
>>
>> On 09/27/2011 06:05 PM, Anthony Liguori wrote:
>>>
>>> Actually, for posix-aio, we can just switch to using g_idle_add().
>>> g_idle_add() uses g_source_attach which is thread safe. g_idle_add()
>>> gives you a thread safe mechanism to defer a piece of work to the
>>> main loop which is really what we want here.
>>
>> For that, a bottom half would also do (apart that I am not sure it is
>> async-safe with TCG). In fact, that would make sense since all of
>> posix_aio_process_queue could become a bottom half.
>
> Bottom halves are signal safe, not thread safe.
>
> To make bottom halves thread safe, you would (in the very least) have to add
> some barriers when reading/writing the scheduling flag.  I think it's much
> better to just use GIdle sources though.
>
>>> This can actually be made to work with sync I/O emulation too by
>>> having another GMainLoop in the sync I/O loop although I thought I
>>> recalled a patch series to remove that stuff.
>>
>> ... which stuff? :)
>
> The sync I/O emulation.  Since sync I/O is done in block drivers, they can
> just use coroutine I/O instead of sync I/O.

Yes, I think we should convert the sync I/O code to use coroutines, which
is quite natural.  The users of sync I/O today are:
1. Hardware emulation - lesser-used or non-performance-critical code
paths still use bdrv_read/write() in places, e.g. sd, nand, fdc, ide
pio.
2. Block drivers - some image formats are synchronous, but converting
them to coroutines is pretty easy.
3. The qemu tools, including qemu-img and qemu-io.

With Paolo's work to make the event loop available in the qemu tools, it
should be possible to convert these users and eliminate the synchronous
I/O interfaces.

Stefan

Patch

From cf036a192f09ea76d89648b83ec84a4226d14172 Mon Sep 17 00:00:00 2001
From: Anthony Liguori <aliguori@us.ibm.com>
Date: Tue, 27 Sep 2011 11:01:51 -0500
Subject: [PATCH] posix-aio-compat: use g_idle_add()

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
---
 posix-aio-compat.c |   51 +++++++--------------------------------------------
 1 files changed, 7 insertions(+), 44 deletions(-)

diff --git a/posix-aio-compat.c b/posix-aio-compat.c
index d3c1174..1f746be 100644
--- a/posix-aio-compat.c
+++ b/posix-aio-compat.c
@@ -52,7 +52,6 @@  struct qemu_paiocb {
 };
 
 typedef struct PosixAioState {
-    int rfd, wfd;
     struct qemu_paiocb *first_aio;
 } PosixAioState;
 
@@ -466,7 +465,7 @@  static int qemu_paio_error(struct qemu_paiocb *aiocb)
     return ret;
 }
 
-static int posix_aio_process_queue(void *opaque)
+static gboolean posix_aio_process_queue(void *opaque)
 {
     PosixAioState *s = opaque;
     struct qemu_paiocb *acb, **pacb;
@@ -477,8 +476,9 @@  static int posix_aio_process_queue(void *opaque)
         pacb = &s->first_aio;
         for(;;) {
             acb = *pacb;
-            if (!acb)
-                return result;
+            if (!acb) {
+                goto out;
+            }
 
             ret = qemu_paio_error(acb);
             if (ret == ECANCELED) {
@@ -513,27 +513,8 @@  static int posix_aio_process_queue(void *opaque)
         }
     }
 
-    return result;
-}
-
-static void posix_aio_read(void *opaque)
-{
-    PosixAioState *s = opaque;
-    ssize_t len;
-
-    /* read all bytes from signal pipe */
-    for (;;) {
-        char bytes[16];
-
-        len = read(s->rfd, bytes, sizeof(bytes));
-        if (len == -1 && errno == EINTR)
-            continue; /* try again */
-        if (len == sizeof(bytes))
-            continue; /* more to read */
-        break;
-    }
-
-    posix_aio_process_queue(s);
+out:
+    return FALSE;
 }
 
 static int posix_aio_flush(void *opaque)
@@ -546,12 +527,7 @@  static PosixAioState *posix_aio_state;
 
 static void posix_aio_notify_event(void)
 {
-    char byte = 0;
-    ssize_t ret;
-
-    ret = write(posix_aio_state->wfd, &byte, sizeof(byte));
-    if (ret < 0 && errno != EAGAIN)
-        die("write()");
+    g_idle_add(posix_aio_process_queue, posix_aio_state);
 }
 
 static void paio_remove(struct qemu_paiocb *acb)
@@ -665,19 +641,6 @@  int paio_init(void)
     s = g_malloc(sizeof(PosixAioState));
 
     s->first_aio = NULL;
-    if (qemu_pipe(fds) == -1) {
-        fprintf(stderr, "failed to create pipe\n");
-        return -1;
-    }
-
-    s->rfd = fds[0];
-    s->wfd = fds[1];
-
-    fcntl(s->rfd, F_SETFL, O_NONBLOCK);
-    fcntl(s->wfd, F_SETFL, O_NONBLOCK);
-
-    qemu_aio_set_fd_handler(s->rfd, posix_aio_read, NULL, posix_aio_flush,
-        posix_aio_process_queue, s);
 
     ret = pthread_attr_init(&attr);
     if (ret)
-- 
1.7.4.1