
[2/4] block: unify flush implementations

Message ID 1318581692-18338-3-git-send-email-pbonzini@redhat.com
State New

Commit Message

Paolo Bonzini Oct. 14, 2011, 8:41 a.m. UTC
Add coroutine support for flush and apply the same emulation that
we already do for read/write.  bdrv_aio_flush is simplified to always
go through a coroutine.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c     |  160 ++++++++++++++++++++++++++---------------------------------
 block_int.h |    1 +
 2 files changed, 71 insertions(+), 90 deletions(-)

Comments

Kevin Wolf Oct. 14, 2011, 11:08 a.m. UTC | #1
Am 14.10.2011 10:41, schrieb Paolo Bonzini:
> Add coroutine support for flush and apply the same emulation that
> we already do for read/write.  bdrv_aio_flush is simplified to always
> go through a coroutine.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

To make the implementation more consistent with read/write operations,
wouldn't it make sense to provide a bdrv_co_flush() globally instead of
using the synchronous version as the preferred public interface?

This is the semantics that I would expect of a bdrv_co_flush() anyway;
your use of it for an AIO emulation function confused me a bit at first.

Kevin
Paolo Bonzini Oct. 14, 2011, 11:30 a.m. UTC | #2
On 10/14/2011 01:08 PM, Kevin Wolf wrote:
> Am 14.10.2011 10:41, schrieb Paolo Bonzini:
>> Add coroutine support for flush and apply the same emulation that
>> we already do for read/write.  bdrv_aio_flush is simplified to always
>> go through a coroutine.
>>
>> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
>
> To make the implementation more consistent with read/write operations,
> wouldn't it make sense to provide a bdrv_co_flush() globally instead of
> using the synchronous version as the preferred public interface?

I thought about it, but then it turned out that I would have

         bdrv_flush
         -> create coroutine or just fast-path to bdrv_flush_co_entry
            -> bdrv_flush_co_entry
               -> driver

and

         bdrv_co_flush
         -> bdrv_flush_co_entry
            -> driver

In other words, the code would be exactly the same, save for an "if 
(qemu_in_coroutine())".  The reason is that, unlike read/write, neither 
flush nor discard take a qiov.
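
Concretely, the unified entry point in the patch boils down to this
(condensed from bdrv_flush in the diff at the end of this message):

    int bdrv_flush(BlockDriverState *bs)
    {
        RwCo rwco = { .bs = bs, .ret = NOT_DONE };

        if (qemu_in_coroutine()) {
            /* Fast path: we are already in coroutine context */
            bdrv_flush_co_entry(&rwco);
        } else {
            Coroutine *co = qemu_coroutine_create(bdrv_flush_co_entry);
            qemu_coroutine_enter(co, &rwco);
            while (rwco.ret == NOT_DONE) {
                qemu_aio_wait();
            }
        }
        return rwco.ret;
    }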

In general, I think that with Stefan's cleanup, having specialized
coroutine versions has a much smaller benefit.  The code-reading
benefit of naming routines like bdrv_co_* is already lost, for
example, since bdrv_read can yield when called from coroutine context.

Let me show how this might go.  Right now you have

         bdrv_read/write
         -> bdrv_rw_co
            -> create qiov
            -> create coroutine or just fast-path to bdrv_rw_co_entry
	      -> bdrv_rw_co_entry
                  -> bdrv_co_do_readv/writev
                     -> driver

         bdrv_co_readv/writev
         -> bdrv_co_do_readv/writev
            -> driver

But starting from here, you might just as well reorganize it like this:

         bdrv_read/writev
         -> bdrv_rw_co
            -> create qiov
            -> bdrv_readv/writev

         bdrv_readv/writev
         -> create coroutine or just fast-path to bdrv_rw_co_entry
            -> bdrv_rw_co_entry
               -> bdrv_co_do_readv/writev
                  -> driver

and just drop bdrv_co_readv, since it would just be hard-coding the 
fast-path of bdrv_readv.
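
As a purely hypothetical sketch (bdrv_readv does not exist in this form
today, and the RwCo fields are the ones the current emulation code uses),
the unified entry point would look like:

    int bdrv_readv(BlockDriverState *bs, int64_t sector_num,
                   QEMUIOVector *qiov, int nb_sectors)
    {
        RwCo rwco = {
            .bs         = bs,
            .sector_num = sector_num,
            .nb_sectors = nb_sectors,
            .qiov       = qiov,
            .is_write   = false,
            .ret        = NOT_DONE,
        };

        if (qemu_in_coroutine()) {
            /* Fast path, i.e. what bdrv_co_readv hard-codes today */
            bdrv_rw_co_entry(&rwco);
        } else {
            Coroutine *co = qemu_coroutine_create(bdrv_rw_co_entry);
            qemu_coroutine_enter(co, &rwco);
            while (rwco.ret == NOT_DONE) {
                qemu_aio_wait();
            }
        }
        return rwco.ret;
    }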

Since some amount of synchronous I/O would likely always be there, for 
example in qemu-img, I think this unification would make more sense than 
providing two separate entrypoints for bdrv_co_flush and bdrv_flush.

Paolo
Kevin Wolf Oct. 14, 2011, 11:54 a.m. UTC | #3
Am 14.10.2011 13:30, schrieb Paolo Bonzini:
> On 10/14/2011 01:08 PM, Kevin Wolf wrote:
>> Am 14.10.2011 10:41, schrieb Paolo Bonzini:
>>> Add coroutine support for flush and apply the same emulation that
>>> we already do for read/write.  bdrv_aio_flush is simplified to always
>>> go through a coroutine.
>>>
>>> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
>>
>> To make the implementation more consistent with read/write operations,
>> wouldn't it make sense to provide a bdrv_co_flush() globally instead of
>> using the synchronous version as the preferred public interface?
> 
> I thought about it, but then it turned out that I would have
> 
>          bdrv_flush
>          -> create coroutine or just fast-path to bdrv_flush_co_entry
>             -> bdrv_flush_co_entry
>                -> driver
> 
> and
> 
>          bdrv_co_flush
>          -> bdrv_flush_co_entry
>             -> driver
> 
> In other words, the code would be exactly the same, save for an "if 
> (qemu_in_coroutine())".  The reason is that, unlike read/write, neither 
> flush nor discard take a qiov.

What I was thinking of looks a bit different:

    bdrv_flush
    -> create coroutine or just fast-path to bdrv_flush_co_entry
       -> bdrv_flush_co_entry
          -> bdrv_co_flush

and

    bdrv_co_flush
    -> driver

And the reason for this is that bdrv_co_flush would be a function that
does little more than pass the call on to the driver (just like
most bdrv_* functions do), with no emulation going on at all.
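
In other words, something along these lines (just a sketch; the fallbacks
to bdrv_aio_flush/bdrv_flush that your bdrv_flush_co_entry has would live
in bdrv_co_flush as well):

    /* Thin public coroutine interface, no emulation */
    int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
    {
        if (bs->open_flags & BDRV_O_NO_FLUSH || !bs->drv) {
            return 0;
        }
        if (bs->drv->bdrv_co_flush) {
            return bs->drv->bdrv_co_flush(bs);
        }
        /* ... fall back to bdrv_aio_flush/bdrv_flush here ... */
        return 0;
    }

    /* The emulation entry point just unmarshals the RwCo */
    static void coroutine_fn bdrv_flush_co_entry(void *opaque)
    {
        RwCo *rwco = opaque;

        rwco->ret = bdrv_co_flush(rwco->bs);
    }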

> In general, I think that with Stefan's cleanup, having specialized
> coroutine versions has a much smaller benefit.  The code-reading
> benefit of naming routines like bdrv_co_* is already lost, for
> example, since bdrv_read can yield when called from coroutine context.

Instead of taking a void* and working on a RwCo structure that is really
meant for emulation, bdrv_co_flush would take a BlockDriverState and
improve readability this way.

The more complicated and ugly code would be left separated and only used
for emulation. I think that would make it easier to understand the
common path without being distracted by emulation code.

> Let me show how this might go.  Right now you have
> 
>          bdrv_read/write
>          -> bdrv_rw_co
>             -> create qiov
>             -> create coroutine or just fast-path to bdrv_rw_co_entry
> 	      -> bdrv_rw_co_entry
>                   -> bdrv_co_do_readv/writev
>                      -> driver
> 
>          bdrv_co_readv/writev
>          -> bdrv_co_do_readv/writev
>             -> driver
> 
> But starting from here, you might just as well reorganize it like this:
> 
>          bdrv_read/writev
>          -> bdrv_rw_co
>             -> create qiov
>             -> bdrv_readv/writev
> 
>          bdrv_readv/writev
>          -> create coroutine or just fast-path to bdrv_rw_co_entry
>             -> bdrv_rw_co_entry
>                -> bdrv_co_do_readv/writev
>                   -> driver
> 
> and just drop bdrv_co_readv, since it would just be hard-coding the 
> fast-path of bdrv_readv.

I guess it's all a matter of taste. Stefan, what do you think?

> Since some amount of synchronous I/O would likely always be there, for 
> example in qemu-img, I think this unification would make more sense than 
> providing two separate entrypoints for bdrv_co_flush and bdrv_flush.

Actually, I'm not so sure about qemu-img. I think we have thought of
scenarios where converting it to a coroutine based version with a main
loop would be helpful (can't remember the details, though).

Kevin
Paolo Bonzini Oct. 14, 2011, 12:42 p.m. UTC | #4
On 10/14/2011 01:54 PM, Kevin Wolf wrote:
> Am 14.10.2011 13:30, schrieb Paolo Bonzini:
>> On 10/14/2011 01:08 PM, Kevin Wolf wrote:
>>> Am 14.10.2011 10:41, schrieb Paolo Bonzini:
>>>> Add coroutine support for flush and apply the same emulation that
>>>> we already do for read/write.  bdrv_aio_flush is simplified to always
>>>> go through a coroutine.
>>>>
>>>> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
>>>
>>> To make the implementation more consistent with read/write operations,
>>> wouldn't it make sense to provide a bdrv_co_flush() globally instead of
>>> using the synchronous version as the preferred public interface?
>>
> What I was thinking of looks a bit different:
>
>      bdrv_flush
>      ->  create coroutine or just fast-path to bdrv_flush_co_entry
>         ->  bdrv_flush_co_entry
>            ->  bdrv_co_flush
>
> and
>
>      bdrv_co_flush
>      ->  driver
>
> And the reason for this is that bdrv_co_flush would be a function that
> does little more than pass the call on to the driver (just like
> most bdrv_* functions do), with no emulation going on at all.

It would still host the checks on BDRV_O_NO_FLUSH and bs->drv->*_flush. 
  It would be the same as bdrv_flush_co_entry is now, minus the 
marshalling in/out of the RwCo.

> Instead of taking a void* and working on a RwCo structure that is really
> meant for emulation, bdrv_co_flush would take a BlockDriverState and
> improve readability this way.

I see.  Yeah, that's doable, but I'd still need two coroutines (one for 
bdrv_flush, one for bdrv_aio_flush) and the patch would be bigger overall...

> The more complicated and ugly code would be left separated and only used
> for emulation. I think that would make it easier to understand the
> common path without being distracted by emulation code.

... and on the other hand, the length of the call chain would increase.
It easily gets confusing; it already is for me in the read/write case.

Would bdrv_co_flush be static or not?  If not, you also get an 
additional entry point of dubious additional value, i.e. more complexity.

> Actually, I'm not so sure about qemu-img. I think we have thought of
> scenarios where converting it to a coroutine based version with a main
> loop would be helpful (can't remember the details, though).

qemu-img convert might benefit from multiple in-flight requests if one of
the endpoints is remote or perhaps even sparse, I guess.

Paolo
Stefan Hajnoczi Oct. 14, 2011, 1:20 p.m. UTC | #5
On Fri, Oct 14, 2011 at 01:54:42PM +0200, Kevin Wolf wrote:
> Am 14.10.2011 13:30, schrieb Paolo Bonzini:
> > On 10/14/2011 01:08 PM, Kevin Wolf wrote:
> >> Am 14.10.2011 10:41, schrieb Paolo Bonzini:
> > Let me show how this might go.  Right now you have
> > 
> >          bdrv_read/write
> >          -> bdrv_rw_co
> >             -> create qiov
> >             -> create coroutine or just fast-path to bdrv_rw_co_entry
> > 	      -> bdrv_rw_co_entry
> >                   -> bdrv_co_do_readv/writev
> >                      -> driver
> > 
> >          bdrv_co_readv/writev
> >          -> bdrv_co_do_readv/writev
> >             -> driver
> > 
> > But starting from here, you might just as well reorganize it like this:
> > 
> >          bdrv_read/writev
> >          -> bdrv_rw_co
> >             -> create qiov
> >             -> bdrv_readv/writev
> > 
> >          bdrv_readv/writev
> >          -> create coroutine or just fast-path to bdrv_rw_co_entry
> >             -> bdrv_rw_co_entry
> >                -> bdrv_co_do_readv/writev
> >                   -> driver
> > 
> > and just drop bdrv_co_readv, since it would just be hard-coding the 
> > fast-path of bdrv_readv.
> 
> I guess it's all a matter of taste. Stefan, what do you think?
> 
> > Since some amount of synchronous I/O would likely always be there, for 
> > example in qemu-img, I think this unification would make more sense than 
> > providing two separate entrypoints for bdrv_co_flush and bdrv_flush.
> 
> Actually, I'm not so sure about qemu-img. I think we have thought of
> scenarios where converting it to a coroutine based version with a main
> loop would be helpful (can't remember the details, though).

I'd like to completely remove synchronous bdrv_*(), including for
qemu-img.

It's just too tempting to call these functions in contexts where it is
not okay to do so.  The bdrv_co_*() functions are all tagged as
coroutine_fn and make it clear that they can yield.

We already have an event loop in qemu-img except it's the nested event
loop in synchronous bdrv_*() emulation functions.  The nested event loop
is a mini event loop and can't really do things like timers.  It would
be nicer to remove it in favor of a single top-level event loop with the
qemu-img code running in a coroutine.
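
A rough, hypothetical sketch of what that could look like (the names here
are invented for illustration):

    /* Run the command body in a coroutine and drive a single top-level
     * event loop until it signals completion. */
    static bool img_command_done;

    static void coroutine_fn img_command_co(void *opaque)
    {
        /* ... bdrv_co_readv()/bdrv_co_writev() calls go here ... */
        img_command_done = true;
    }

    static void run_img_command(void)
    {
        Coroutine *co = qemu_coroutine_create(img_command_co);

        img_command_done = false;
        qemu_coroutine_enter(co, NULL);
        while (!img_command_done) {
            qemu_aio_wait();
        }
    }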

So I'm in favor of keeping bdrv_co_*() explicit for now and refactoring
both hw/ and qemu-tool users of synchronous functions.

Stefan
Paolo Bonzini Oct. 14, 2011, 1:47 p.m. UTC | #6
On 10/14/2011 03:20 PM, Stefan Hajnoczi wrote:
> It's just too tempting to call these functions in contexts where it is
> not okay to do so.  The bdrv_co_*() functions are all tagged as
> coroutine_fn and make it clear that they can yield.

Yes, I agree.

> We already have an event loop in qemu-img except it's the nested event
> loop in synchronous bdrv_*() emulation functions.  The nested event loop
> is a mini event loop and can't really do things like timers.  It would
> be nicer to remove it in favor of a single top-level event loop with the
> qemu-img code running in a coroutine.

Note that the nested event loop cannot go away because of 
qemu_aio_flush, though. :(

Paolo
Kevin Wolf Oct. 14, 2011, 2:02 p.m. UTC | #7
Am 14.10.2011 14:42, schrieb Paolo Bonzini:
> On 10/14/2011 01:54 PM, Kevin Wolf wrote:
>> Am 14.10.2011 13:30, schrieb Paolo Bonzini:
>>> On 10/14/2011 01:08 PM, Kevin Wolf wrote:
>>>> Am 14.10.2011 10:41, schrieb Paolo Bonzini:
>>>>> Add coroutine support for flush and apply the same emulation that
>>>>> we already do for read/write.  bdrv_aio_flush is simplified to always
>>>>> go through a coroutine.
>>>>>
>>>>> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
>>>>
>>>> To make the implementation more consistent with read/write operations,
>>>> wouldn't it make sense to provide a bdrv_co_flush() globally instead of
>>>> using the synchronous version as the preferred public interface?
>>>
>> What I was thinking of looks a bit different:
>>
>>      bdrv_flush
>>      ->  create coroutine or just fast-path to bdrv_flush_co_entry
>>         ->  bdrv_flush_co_entry
>>            ->  bdrv_co_flush
>>
>> and
>>
>>      bdrv_co_flush
>>      ->  driver
>>
>> And the reason for this is that bdrv_co_flush would be a function that
>> does only little more than passing the function to the driver (just like
>> most bdrv_* functions do), with no emulation going on at all.
> 
> It would still host the checks on BDRV_O_NO_FLUSH and bs->drv->*_flush. 
>   It would be the same as bdrv_flush_co_entry is now, minus the 
> marshalling in/out of the RwCo.

Right.

By the way, I like how you handle all three backends in the same
function. I think this is a lot more readable than the solution used by
read/write (changing the function pointers on driver registration).

>> Instead of taking a void* and working on a RwCo structure that is really
>> meant for emulation, bdrv_co_flush would take a BlockDriverState and
>> improve readability this way.
> 
> I see.  Yeah, that's doable, but I'd still need two coroutines (one for 
> bdrv_flush, one for bdrv_aio_flush) and the patch would be bigger overall...

You already have two of them (bdrv_co_flush for AIO and
bdrv_flush_co_entry for synchronous), so I don't think it makes a
difference there.

>> The more complicated and ugly code would be left separated and only used
>> for emulation. I think that would make it easier to understand the
>> common path without being distracted by emulation code.
> 
> ... and on the other hand, the length of the call chain would increase.
> It easily gets confusing; it already is for me in the read/write case.

Well, depends on what you're looking at. The call chain length would
increase for AIO and synchronous bdrv_flush, but it would become shorter
for bdrv_co_flush.

If we want to declare coroutines as the preferred interface, I think
such a change makes sense.

> Would bdrv_co_flush be static or not?  If not, you also get an 
> additional entry point of dubious additional value, i.e. more complexity.

I think I would make it public. The one that has to go eventually is the
synchronous bdrv_flush(). Which is another reason why I wouldn't design
everything around it.

>> Actually, I'm not so sure about qemu-img. I think we have thought of
>> scenarios where converting it to a coroutine based version with a main
>> loop would be helpful (can't remember the details, though).
> 
> qemu-img convert might benefit from multiple in-flight requests if one of
> the endpoints is remote or perhaps even sparse, I guess.

Quite possible, yes.

Kevin
Paolo Bonzini Oct. 14, 2011, 2:04 p.m. UTC | #8
On 10/14/2011 04:02 PM, Kevin Wolf wrote:
>> >
>> >  It would still host the checks on BDRV_O_NO_FLUSH and bs->drv->*_flush.
>> >     It would be the same as bdrv_flush_co_entry is now, minus the
>> >  marshalling in/out of the RwCo.
> Right.
>
> By the way, I like how you handle all three backends in the same
> function. I think this is a lot more readable than the solution used by
> read/write (changing the function pointers on driver registration).
>

Yeah, and it makes sense to handle all of them in the bdrv_co_* version.

Will resubmit.

Paolo

Patch

diff --git a/block.c b/block.c
index 7184a0f..0af9a89 100644
--- a/block.c
+++ b/block.c
@@ -53,17 +53,12 @@  static BlockDriverAIOCB *bdrv_aio_readv_em(BlockDriverState *bs,
 static BlockDriverAIOCB *bdrv_aio_writev_em(BlockDriverState *bs,
         int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
         BlockDriverCompletionFunc *cb, void *opaque);
-static BlockDriverAIOCB *bdrv_aio_flush_em(BlockDriverState *bs,
-        BlockDriverCompletionFunc *cb, void *opaque);
-static BlockDriverAIOCB *bdrv_aio_noop_em(BlockDriverState *bs,
-        BlockDriverCompletionFunc *cb, void *opaque);
 static int coroutine_fn bdrv_co_readv_em(BlockDriverState *bs,
                                          int64_t sector_num, int nb_sectors,
                                          QEMUIOVector *iov);
 static int coroutine_fn bdrv_co_writev_em(BlockDriverState *bs,
                                          int64_t sector_num, int nb_sectors,
                                          QEMUIOVector *iov);
-static int coroutine_fn bdrv_co_flush_em(BlockDriverState *bs);
 static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
 static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
@@ -203,9 +198,6 @@  void bdrv_register(BlockDriver *bdrv)
         }
     }
 
-    if (!bdrv->bdrv_aio_flush)
-        bdrv->bdrv_aio_flush = bdrv_aio_flush_em;
-
     QLIST_INSERT_HEAD(&bdrv_drivers, bdrv, list);
 }
 
@@ -1027,11 +1019,6 @@  static int bdrv_check_request(BlockDriverState *bs, int64_t sector_num,
                                    nb_sectors * BDRV_SECTOR_SIZE);
 }
 
-static inline bool bdrv_has_async_flush(BlockDriver *drv)
-{
-    return drv->bdrv_aio_flush != bdrv_aio_flush_em;
-}
-
 typedef struct RwCo {
     BlockDriverState *bs;
     int64_t sector_num;
@@ -1759,33 +1746,6 @@  const char *bdrv_get_device_name(BlockDriverState *bs)
     return bs->device_name;
 }
 
-int bdrv_flush(BlockDriverState *bs)
-{
-    if (bs->open_flags & BDRV_O_NO_FLUSH) {
-        return 0;
-    }
-
-    if (bs->drv && bdrv_has_async_flush(bs->drv) && qemu_in_coroutine()) {
-        return bdrv_co_flush_em(bs);
-    }
-
-    if (bs->drv && bs->drv->bdrv_flush) {
-        return bs->drv->bdrv_flush(bs);
-    }
-
-    /*
-     * Some block drivers always operate in either writethrough or unsafe mode
-     * and don't support bdrv_flush therefore. Usually qemu doesn't know how
-     * the server works (because the behaviour is hardcoded or depends on
-     * server-side configuration), so we can't ensure that everything is safe
-     * on disk. Returning an error doesn't work because that would break guests
-     * even if the server operates in writethrough mode.
-     *
-     * Let's hope the user knows what he's doing.
-     */
-    return 0;
-}
-
 void bdrv_flush_all(void)
 {
     BlockDriverState *bs;
@@ -2610,22 +2570,6 @@  fail:
     return -1;
 }
 
-BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs,
-        BlockDriverCompletionFunc *cb, void *opaque)
-{
-    BlockDriver *drv = bs->drv;
-
-    trace_bdrv_aio_flush(bs, opaque);
-
-    if (bs->open_flags & BDRV_O_NO_FLUSH) {
-        return bdrv_aio_noop_em(bs, cb, opaque);
-    }
-
-    if (!drv)
-        return NULL;
-    return drv->bdrv_aio_flush(bs, cb, opaque);
-}
-
 void bdrv_aio_cancel(BlockDriverAIOCB *acb)
 {
     acb->pool->cancel(acb);
@@ -2785,41 +2729,28 @@  static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
     return &acb->common;
 }
 
-static BlockDriverAIOCB *bdrv_aio_flush_em(BlockDriverState *bs,
-        BlockDriverCompletionFunc *cb, void *opaque)
+static void coroutine_fn bdrv_co_flush(void *opaque)
 {
-    BlockDriverAIOCBSync *acb;
-
-    acb = qemu_aio_get(&bdrv_em_aio_pool, bs, cb, opaque);
-    acb->is_write = 1; /* don't bounce in the completion hadler */
-    acb->qiov = NULL;
-    acb->bounce = NULL;
-    acb->ret = 0;
-
-    if (!acb->bh)
-        acb->bh = qemu_bh_new(bdrv_aio_bh_cb, acb);
+    BlockDriverAIOCBCoroutine *acb = opaque;
+    BlockDriverState *bs = acb->common.bs;
 
-    bdrv_flush(bs);
+    acb->req.error = bdrv_flush(bs);
+    acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
     qemu_bh_schedule(acb->bh);
-    return &acb->common;
 }
 
-static BlockDriverAIOCB *bdrv_aio_noop_em(BlockDriverState *bs,
+BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs,
         BlockDriverCompletionFunc *cb, void *opaque)
 {
-    BlockDriverAIOCBSync *acb;
+    trace_bdrv_aio_flush(bs, opaque);
 
-    acb = qemu_aio_get(&bdrv_em_aio_pool, bs, cb, opaque);
-    acb->is_write = 1; /* don't bounce in the completion handler */
-    acb->qiov = NULL;
-    acb->bounce = NULL;
-    acb->ret = 0;
+    Coroutine *co;
+    BlockDriverAIOCBCoroutine *acb;
 
-    if (!acb->bh) {
-        acb->bh = qemu_bh_new(bdrv_aio_bh_cb, acb);
-    }
+    acb = qemu_aio_get(&bdrv_em_co_aio_pool, bs, cb, opaque);
+    co = qemu_coroutine_create(bdrv_co_flush);
+    qemu_coroutine_enter(co, acb);
 
-    qemu_bh_schedule(acb->bh);
     return &acb->common;
 }
 
@@ -2916,19 +2847,68 @@  static int coroutine_fn bdrv_co_writev_em(BlockDriverState *bs,
     return bdrv_co_io_em(bs, sector_num, nb_sectors, iov, true);
 }
 
-static int coroutine_fn bdrv_co_flush_em(BlockDriverState *bs)
+static void bdrv_flush_co_entry(void *opaque)
 {
-    CoroutineIOCompletion co = {
-        .coroutine = qemu_coroutine_self(),
+    RwCo *rwco = opaque;
+    BlockDriverState *bs = rwco->bs;
+
+    if (bs->open_flags & BDRV_O_NO_FLUSH) {
+        rwco->ret = 0;
+    } else if (!bs->drv) {
+        rwco->ret = 0;
+    } else if (bs->drv->bdrv_co_flush) {
+        rwco->ret = bs->drv->bdrv_co_flush(bs);
+    } else if (bs->drv->bdrv_aio_flush) {
+        BlockDriverAIOCB *acb;
+        CoroutineIOCompletion co = {
+            .coroutine = qemu_coroutine_self(),
+        };
+
+        acb = bs->drv->bdrv_aio_flush(bs, bdrv_co_io_em_complete, &co);
+        if (acb == NULL) {
+            rwco->ret = -EIO;
+        } else {
+            qemu_coroutine_yield();
+            rwco->ret = co.ret;
+        }
+    } else if (bs->drv->bdrv_flush) {
+        rwco->ret = bs->drv->bdrv_flush(bs);
+    } else {
+        /*
+         * Some block drivers always operate in either writethrough or unsafe
+         * mode and don't support bdrv_flush therefore. Usually qemu doesn't
+         * know how the server works (because the behaviour is hardcoded or
+         * depends on server-side configuration), so we can't ensure that
+         * everything is safe on disk. Returning an error doesn't work because
+         * that would break guests even if the server operates in writethrough
+         * mode.
+         *
+         * Let's hope the user knows what he's doing.
+         */
+        rwco->ret = 0;
+    }
+}
+
+int bdrv_flush(BlockDriverState *bs)
+{
+    Coroutine *co;
+    RwCo rwco = {
+        .bs = bs,
+        .ret = NOT_DONE,
     };
-    BlockDriverAIOCB *acb;
 
-    acb = bdrv_aio_flush(bs, bdrv_co_io_em_complete, &co);
-    if (!acb) {
-        return -EIO;
+    if (qemu_in_coroutine()) {
+        /* Fast-path if already in coroutine context */
+        bdrv_flush_co_entry(&rwco);
+    } else {
+        co = qemu_coroutine_create(bdrv_flush_co_entry);
+        qemu_coroutine_enter(co, &rwco);
+        while (rwco.ret == NOT_DONE) {
+            qemu_aio_wait();
+        }
     }
-    qemu_coroutine_yield();
-    return co.ret;
+
+    return rwco.ret;
 }
 
 /**************************************************************/
diff --git a/block_int.h b/block_int.h
index f2f4f2d..9cb536d 100644
--- a/block_int.h
+++ b/block_int.h
@@ -83,6 +83,7 @@  struct BlockDriver {
         int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
     int coroutine_fn (*bdrv_co_writev)(BlockDriverState *bs,
         int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
+    int coroutine_fn (*bdrv_co_flush)(BlockDriverState *bs);
 
     int (*bdrv_aio_multiwrite)(BlockDriverState *bs, BlockRequest *reqs,
         int num_reqs);