Patchwork aio-posix: Fix return value of aio_poll()

Submitter Kevin Wolf
Date Jan. 16, 2013, 6:11 p.m.
Message ID <1358359893-718-1-git-send-email-kwolf@redhat.com>
Permalink /patch/212914/
State New

Comments

Kevin Wolf - Jan. 16, 2013, 6:11 p.m.
aio_poll() must return true if any work is still pending, even if it
didn't make progress, so that qemu_aio_wait() doesn't return too early.
The possibility of returning early occasionally led to a failed
assertion in bdrv_drain_all(), when some in-flight request was missed
and the function didn't really drain all requests.

In order to make that change, the return value as specified in the
function comment must change for blocking = false; fortunately, the
return value of blocking = false callers is only used in test cases, so
this change shouldn't cause any trouble.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 aio-posix.c         |    3 ++-
 include/block/aio.h |    6 ++----
 tests/test-aio.c    |    4 ++--
 3 files changed, 6 insertions(+), 7 deletions(-)
Paolo Bonzini - Jan. 16, 2013, 6:13 p.m.
Il 16/01/2013 19:11, Kevin Wolf ha scritto:
> aio_poll() must return true if any work is still pending, even if it
> didn't make progress, so that qemu_aio_wait() doesn't return too early.
> The possibility of returning early occasionally led to a failed
> assertion in bdrv_drain_all(), when some in-flight request was missed
> and the function didn't really drain all requests.
> 
> In order to make that change, the return value as specified in the
> function comment must change for blocking = false; fortunately, the
> return value of blocking = false callers is only used in test cases, so
> this change shouldn't cause any trouble.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Looks good, but please make the same change in aio-win32.c, and add a Cc
to qemu-stable@nongnu.org

Paolo

> ---
>  aio-posix.c         |    3 ++-
>  include/block/aio.h |    6 ++----
>  tests/test-aio.c    |    4 ++--
>  3 files changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/aio-posix.c b/aio-posix.c
> index 88d09e1..fe4dbb4 100644
> --- a/aio-posix.c
> +++ b/aio-posix.c
> @@ -264,5 +264,6 @@ bool aio_poll(AioContext *ctx, bool blocking)
>          }
>      }
>  
> -    return progress;
> +    assert(progress || busy);
> +    return true;
>  }
> diff --git a/include/block/aio.h b/include/block/aio.h
> index 0933f05..8eda924 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -173,16 +173,14 @@ bool aio_pending(AioContext *ctx);
>   * aio as a result of executing I/O completion or bh callbacks.
>   *
>   * If there is no pending AIO operation or completion (bottom half),
> - * return false.  If there are pending bottom halves, return true.
> + * return false.  If there are pending AIO operations or bottom halves,
> + * return true.
>   *
>   * If there are no pending bottom halves, but there are pending AIO
>   * operations, it may not be possible to make any progress without
>   * blocking.  If @blocking is true, this function will wait until one
>   * or more AIO events have completed, to ensure something has moved
>   * before returning.
> - *
> - * If @blocking is false, this function will also return false if the
> - * function cannot make any progress without blocking.
>   */
>  bool aio_poll(AioContext *ctx, bool blocking);
>  
> diff --git a/tests/test-aio.c b/tests/test-aio.c
> index e4ebef7..c173870 100644
> --- a/tests/test-aio.c
> +++ b/tests/test-aio.c
> @@ -315,13 +315,13 @@ static void test_wait_event_notifier_noflush(void)
>      event_notifier_set(&data.e);
>      g_assert(aio_poll(ctx, false));
>      g_assert_cmpint(data.n, ==, 1);
> -    g_assert(!aio_poll(ctx, false));
> +    g_assert(aio_poll(ctx, false));
>      g_assert_cmpint(data.n, ==, 1);
>  
>      event_notifier_set(&data.e);
>      g_assert(aio_poll(ctx, false));
>      g_assert_cmpint(data.n, ==, 2);
> -    g_assert(!aio_poll(ctx, false));
> +    g_assert(aio_poll(ctx, false));
>      g_assert_cmpint(data.n, ==, 2);
>  
>      event_notifier_set(&dummy.e);
>
Dietmar Maurer - Jan. 22, 2013, 7:02 a.m.
> aio_poll() must return true if any work is still pending, even if it didn't make
> progress, so that qemu_aio_wait() doesn't return too early.
> The possibility of returning early occasionally led to a failed assertion in
> bdrv_drain_all(), when some in-flight request was missed and the function
> didn't really drain all requests.

I still have a problem with bdrv_drain_all() and my backup block job.
If I reset/suspend/resume the VM during a backup job run I get:

block.c:1221: bdrv_drain_all: Assertion `((&bs->tracked_requests)->lh_first == ((void *)0))' failed.
Aborted

I am not 100% sure, but I think a simple qemu_aio_wait() is not enough to ensure that
a copy-on-write action has finished.

Any idea how to solve that problem?
Paolo Bonzini - Jan. 22, 2013, 7:49 a.m.
Il 22/01/2013 08:02, Dietmar Maurer ha scritto:
>> aio_poll() must return true if any work is still pending, even if it didn't make
>> > progress, so that qemu_aio_wait() doesn't return too early.
>> > The possibility of returning early occasionally led to a failed assertion in
>> > bdrv_drain_all(), when some in-flight request was missed and the function
>> > didn't really drain all requests.
> I still have problem with bdrv_drain_all() and my backup block job.
> If I reset/suspend/resume/ the VM during a backup job run I get:
> 
> block.c:1221: bdrv_drain_all: Assertion `((&bs->tracked_requests)->lh_first == ((void *)0))' failed.
> Aborted
> 
> I am not 100% sure, but I think a simple qemu_aio_wait() is not enough to ensure that
> a copy-on-write action has finished.

Are you using timers in any way?

Paolo

> Any idea how to solve that problem?
Dietmar Maurer - Jan. 22, 2013, 7:56 a.m.
> > If I reset/suspend/resume/ the VM during a backup job run I get:
> >
> > block.c:1221: bdrv_drain_all: Assertion `((&bs->tracked_requests)->lh_first == ((void *)0))' failed.
> > Aborted
> >
> > I am not 100% sure, but I think a simple qemu_aio_wait() is not enough
> > to ensure that a copy-on-write action has finished.
> 
> Are you using timers in any way?

Yes, I call co_sleep_ns(rt_clock, delay) to limit the rate to the output stream.
Paolo Bonzini - Jan. 22, 2013, 8:38 a.m.
> > > If I reset/suspend/resume/ the VM during a backup job run I get:
> > >
> > > block.c:1221: bdrv_drain_all: Assertion `((&bs->tracked_requests)->lh_first == ((void *)0))' failed.
> > > Aborted
> > >
> > > I am not 100% sure, but I think a simple qemu_aio_wait() is not
> > > enough
> > > to ensure that a copy-on-write action has finished.
> > 
> > Are you using timers in any way?
> 
> Yes, I call co_sleep_ns(rt_clock, delay) to limit rate to output
> stream.

Use block_job_sleep_ns instead, and only call it when no I/O
is pending.

Paolo
Dietmar Maurer - Jan. 22, 2013, 10:09 a.m.
> > > Are you using timers in any way?
> >
> > Yes, I call co_sleep_ns(rt_clock, delay) to limit rate to output
> > stream.
> 
> Use block_job_sleep_ns instead, and only call it when no I/O is pending.

Thanks, that works!
Dietmar Maurer - Jan. 23, 2013, 8:58 a.m.
> > > > Are you using timers in any way?
> > >
> > > Yes, I call co_sleep_ns(rt_clock, delay) to limit rate to output
> > > stream.
> >
> > Use block_job_sleep_ns instead, and only call it when no I/O is pending.
> 
> Thanks, that works!


I currently use qemu_aio_set_fd_handler() to implement async output to
the backup stream. While that works, performance is much better (during backup)
if I use a separate thread to write the data.

Is that known behavior, or should aio give about the same performance as 
using a thread?

Or would I get better performance if I use Linux native AIO support?
Paolo Bonzini - Jan. 23, 2013, 11:10 a.m.
Il 23/01/2013 09:58, Dietmar Maurer ha scritto:
>>>>> Are you using timers in any way?
>>>>
>>>> Yes, I call co_sleep_ns(rt_clock, delay) to limit rate to output
>>>> stream.
>>>
>>> Use block_job_sleep_ns instead, and only call it when no I/O is pending.
>>
>> Thanks, that works!
> 
> I currently use qemu_aio_set_fd_handler() to implement async output to
> the backup stream. While that works, performance is much better (during backup)
> if I use a separate thread to write the data.
> 
> Is that known behavior, or should aio give about the same performance as 
> using a thread?

It depends, but this is what we saw for migration too.  You may also
have less contention on the big QEMU lock if you use a thread.

> Or would I get better performance if I use Linux native AIO support?

That also depends, but in general it should be faster.

Paolo
Dietmar Maurer - Jan. 23, 2013, 11:37 a.m.
> > I currently use qemu_aio_set_fd_handler() to implement async output to
> > the backup stream. While that works, performance is much better
> > (during backup) if I use a separate thread to write the data.
> >
> > Is that known behavior, or should aio give about the same performance
> > as using a thread?
> 
> It depends, but this is what we saw for migration too.  You may also have
> less contention on the big QEMU lock if you use a thread.
But when I use a thread it triggers the bug in bdrv_drain_all(). So how can
I fix bdrv_drain_all() if I use a separate thread to write data?

I currently use CoQueue to wait for the output thread.

Maybe there is some example code somewhere?
 
> > Or would I get better performance if I use Linux native AIO support?
> 
> That also depends, but in general it should be faster.


Is there an easy way to switch to native AIO? The code in block/linux-aio.c
seems to be usable only inside block drivers. I simply need to write to a unix file descriptor.
Paolo Bonzini - Jan. 23, 2013, 11:43 a.m.
Il 23/01/2013 12:37, Dietmar Maurer ha scritto:
>>> I currently use qemu_aio_set_fd_handler() to implement async output to
>>> the backup stream. While that works, performance is much better
>>> (during backup) if I use a separate thread to write the data.
>>>
>>> Is that known behavior, or should aio give about the same performance
>>> as using a thread?
>>
>> It depends, but this is what we saw for migration too.  You may also have
>> less contention on the big QEMU lock if you use a thread.
> 
> But when I use a thread it triggers the bug in bdrv_drain_all(). So how can
> I fix  bdrv_drain_all() if I use a separate thread to write data?

The bug is, in all likelihood, in your own code.  Sorry. :)

> I currently use CoQueue to wait for the output thread.

The simplest way to synchronize the main event loop with another thread
is a bottom half.  There is no example yet, but I will post it soon for
migration.

In the other direction, use an EventNotifier (for which bottom halves
are just a nice wrapper).

> Maybe there is some example code somewhere?
>  
>>> Or would I get better performance if I use Linux native AIO support?
>>
>> That also depends, but in general it should be faster.
> 
> Is there an easy way to switch to native AIO? The code in block/linux-aio.c is only
> usable inside block drivers? I simply need to write to a unix file descriptor.

First of all, AIO is only usable for files and block devices, not
character devices or sockets.  If you have one, you can use the block
device API rather than the file descriptor API to write it.  You'll get
native AIO support for free.

Support for running the block drivers in a separate thread is not there
yet, but I predict it will come later this year.

Paolo
Dietmar Maurer - Jan. 23, 2013, 2 p.m.
> >>> Is that known behavior, or should aio give about the same
> >>> performance as using a thread?
> >>
> >> It depends, but this is what we saw for migration too.  You may also
> >> have less contention on the big QEMU lock if you use a thread.

I managed to fix the performance issue by doing the following:

- use O_DIRECT to avoid host cache on write
- limit write size to max 4096 bytes (throughput gets a bit lower, but VM is
much more responsive during backup)

Now I get about the same behavior/performance as using a thread.

Unfortunately, the bug in bdrv_drain_all() triggers more often.

> > But when I use a thread it triggers the bug in bdrv_drain_all(). So
> > how can I fix bdrv_drain_all() if I use a separate thread to write data?
> 
> The bug is, in all likelihood, in your own code.  Sorry. :)
> 
> > I currently use CoQueue to wait for the output thread.
> 
> The simplest way to synchronize the main event loop with another thread is
> a bottom half.  There is no example yet, but I will post it soon for migration.


The problem is not really how I can synchronize. The problem is that I need to halt the
blockjob when the output buffer is full, and this causes the bug in bdrv_drain_all().

I am currently a bit out of ideas - will do more testing.
Dietmar Maurer - Jan. 24, 2013, 7:42 a.m.
> > But when I use a thread it triggers the bug in bdrv_drain_all(). So
> > how can I fix bdrv_drain_all() if I use a separate thread to write data?
> 
> The bug is, in all likelihood, in your own code.  Sorry. :)

Yes, I still don't fully understand the aio code.

If I detect an incomplete write, I register an aio handler like this:

        ret = write(fd, buf, count);
        if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            ...
            qemu_aio_set_fd_handler(fd, NULL, vma_co_continue_write, NULL, mydata);
            qemu_coroutine_yield();

But it seems that qemu_aio_wait() returns false after that.

I thought it should return true instead?
Paolo Bonzini - Jan. 24, 2013, 8:25 a.m.
Il 24/01/2013 08:42, Dietmar Maurer ha scritto:
>>> But when I use a thread it triggers the bug in bdrv_drain_all(). So
>>> how can I fix  bdrv_drain_all() if I use a separate thread to write data?
>>
>> The bug is, in all likelihood, in your own code.  Sorry. :)
> 
> yes. I still not fully understand that aio code.
> 
> If I detect a incomplete write, I register an aio handler like this:
> 
>         ret = write(fd, buf, count) 
>          if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK) ) {
>               ...
>               qemu_aio_set_fd_handler(fd, NULL, vma_co_continue_write,  NULL, mydata);
>               qemu_coroutine_yield();
> 
> But seems that qemu_aio_wait() returns false after that.
> 
> I thought that should return true instead?

You need a flush handler (fourth argument to qemu_aio_set_fd_handler)
that returns true.

Paolo
Dietmar Maurer - Jan. 24, 2013, 9:05 a.m.
> > If I detect a incomplete write, I register an aio handler like this:
> >
> >         ret = write(fd, buf, count)
> >          if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK) ) {
> >               ...
> >               qemu_aio_set_fd_handler(fd, NULL, vma_co_continue_write, NULL, mydata);
> >               qemu_coroutine_yield();
> >
> > But seems that qemu_aio_wait() returns false after that.
> >
> > I thought that should return true instead?
> 
> You need a flush handler (fourth argument to qemu_aio_set_fd_handler)
> that returns true.

Ok, that helps :-) Many thanks!

Patch

diff --git a/aio-posix.c b/aio-posix.c
index 88d09e1..fe4dbb4 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -264,5 +264,6 @@  bool aio_poll(AioContext *ctx, bool blocking)
         }
     }
 
-    return progress;
+    assert(progress || busy);
+    return true;
 }
diff --git a/include/block/aio.h b/include/block/aio.h
index 0933f05..8eda924 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -173,16 +173,14 @@  bool aio_pending(AioContext *ctx);
  * aio as a result of executing I/O completion or bh callbacks.
  *
  * If there is no pending AIO operation or completion (bottom half),
- * return false.  If there are pending bottom halves, return true.
+ * return false.  If there are pending AIO operations or bottom halves,
+ * return true.
  *
  * If there are no pending bottom halves, but there are pending AIO
  * operations, it may not be possible to make any progress without
  * blocking.  If @blocking is true, this function will wait until one
  * or more AIO events have completed, to ensure something has moved
  * before returning.
- *
- * If @blocking is false, this function will also return false if the
- * function cannot make any progress without blocking.
  */
 bool aio_poll(AioContext *ctx, bool blocking);
 
diff --git a/tests/test-aio.c b/tests/test-aio.c
index e4ebef7..c173870 100644
--- a/tests/test-aio.c
+++ b/tests/test-aio.c
@@ -315,13 +315,13 @@  static void test_wait_event_notifier_noflush(void)
     event_notifier_set(&data.e);
     g_assert(aio_poll(ctx, false));
     g_assert_cmpint(data.n, ==, 1);
-    g_assert(!aio_poll(ctx, false));
+    g_assert(aio_poll(ctx, false));
     g_assert_cmpint(data.n, ==, 1);
 
     event_notifier_set(&data.e);
     g_assert(aio_poll(ctx, false));
     g_assert_cmpint(data.n, ==, 2);
-    g_assert(!aio_poll(ctx, false));
+    g_assert(aio_poll(ctx, false));
     g_assert_cmpint(data.n, ==, 2);
 
     event_notifier_set(&dummy.e);