
[14/19] block: Defer .bdrv_drain_begin callback to polling phase

Message ID 20180411163940.2523-15-kwolf@redhat.com
State New
Series Drain fixes and cleanups, part 3

Commit Message

Kevin Wolf April 11, 2018, 4:39 p.m. UTC
We cannot allow aio_poll() in bdrv_drain_invoke(begin=true) until we're
done with propagating the drain through the graph and are doing the
single final BDRV_POLL_WHILE().

Just schedule the coroutine with the callback and increase bs->in_flight
to make sure that the polling phase will wait for it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/io.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

Comments

Max Reitz June 27, 2018, 2:30 p.m. UTC | #1
On 2018-04-11 18:39, Kevin Wolf wrote:
> We cannot allow aio_poll() in bdrv_drain_invoke(begin=true) until we're
> done with propagating the drain through the graph and are doing the
> single final BDRV_POLL_WHILE().
> 
> Just schedule the coroutine with the callback and increase bs->in_flight
> to make sure that the polling phase will wait for it.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/io.c | 28 +++++++++++++++++++++++-----
>  1 file changed, 23 insertions(+), 5 deletions(-)

According to bisect, this breaks blockdev-snapshot with QED:

$ ./qemu-img create -f qed foo.qed 64M
Formatting 'foo.qed', fmt=qed size=67108864 cluster_size=65536
$ echo "{'execute':'qmp_capabilities'}
        {'execute':'blockdev-snapshot',
         'arguments':{'node':'backing','overlay':'overlay'}}
        {'execute':'quit'}" | \
    x86_64-softmmu/qemu-system-x86_64 -qmp stdio -nodefaults \
        -blockdev "{'node-name':'backing','driver':'null-co'}" \
        -blockdev "{'node-name':'overlay','driver':'qed',
                    'file':{'driver':'file','filename':'foo.qed'}}"
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 12, "major": 2},
"package": "v2.12.0-1422-g0109e7e6f8"}, "capabilities": []}}
{"return": {}}
qemu-system-x86_64: block.c:3434: bdrv_replace_node: Assertion
`!atomic_read(&to->in_flight)' failed.
[1]    5252 done                 echo  |
       5253 abort (core dumped)  x86_64-softmmu/qemu-system-x86_64 -qmp
stdio -nodefaults -blockdev  -blockdev

Max
Kevin Wolf June 29, 2018, 3:14 p.m. UTC | #2
On 2018-06-27 16:30, Max Reitz wrote:
> On 2018-04-11 18:39, Kevin Wolf wrote:
> > We cannot allow aio_poll() in bdrv_drain_invoke(begin=true) until we're
> > done with propagating the drain through the graph and are doing the
> > single final BDRV_POLL_WHILE().
> > 
> > Just schedule the coroutine with the callback and increase bs->in_flight
> > to make sure that the polling phase will wait for it.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  block/io.c | 28 +++++++++++++++++++++++-----
> >  1 file changed, 23 insertions(+), 5 deletions(-)
> 
> According to bisect, this breaks blockdev-snapshot with QED:

I have no idea why it would trigger only with qed, but I think the real
bug is in commit dcf94a23b1 ('block: Don't poll in parent drain
callbacks').

The crash is specifically in the place where the new overlay needs to be
drained because its new backing file is drained. First,
bdrv_drain_invoke() creates new activity on the overlay when it gets the
old active layer attached as a backing file:

#0  0x000055b90eb73b88 in bdrv_drain_invoke (bs=0x55b910fd8440, begin=<optimized out>) at block/io.c:217
#1  0x000055b90eb1e2b0 in bdrv_replace_child_noperm (child=0x55b911f32450, new_bs=<optimized out>) at block.c:2069
#2  0x000055b90eb20875 in bdrv_replace_child (child=<optimized out>, new_bs=0x55b910fd2130) at block.c:2098
#3  0x000055b90eb20c53 in bdrv_root_attach_child (child_bs=child_bs@entry=0x55b910fd2130, child_name=child_name@entry=0x55b90eda75e5 "backing", child_role=child_role@entry=0x55b90f3d3300 <child_backing>, perm=0, shared_perm=31, opaque=opaque@entry=0x55b910fd8440, errp=0x7fffb8943620) at block.c:2141
#4  0x000055b90eb20d60 in bdrv_attach_child (parent_bs=parent_bs@entry=0x55b910fd8440, child_bs=child_bs@entry=0x55b910fd2130, child_name=child_name@entry=0x55b90eda75e5 "backing", child_role=child_role@entry=0x55b90f3d3300 <child_backing>, errp=0x7fffb8943620) at block.c:2162
#5  0x000055b90eb23c30 in bdrv_set_backing_hd (bs=bs@entry=0x55b910fd8440, backing_hd=backing_hd@entry=0x55b910fd2130, errp=errp@entry=0x7fffb8943620) at block.c:2249
#6  0x000055b90eb24a76 in bdrv_append (bs_new=0x55b910fd8440, bs_top=0x55b910fd2130, errp=errp@entry=0x7fffb8943680) at block.c:3535
#7  0x000055b90e937a89 in external_snapshot_prepare (common=0x55b910f5cf90, errp=0x7fffb89436f8) at blockdev.c:1680

And then, when trying to move all users of the old active layer to the
new overlay, it turns out that the bdrv_drain_invoke() BH hasn't been
executed yet:

#0  0x00007fdfcef5d9fb in raise () at /lib64/libc.so.6
#1  0x00007fdfcef5f800 in abort () at /lib64/libc.so.6
#2  0x00007fdfcef560da in __assert_fail_base () at /lib64/libc.so.6
#3  0x00007fdfcef56152 in  () at /lib64/libc.so.6
#4  0x000055b90eb24867 in bdrv_replace_node (from=<optimized out>, to=0x55b910fd8440, errp=0x7fffb8943620) at block.c:3470
#5  0x000055b90eb24abe in bdrv_append (bs_new=0x55b910fd8440, bs_top=0x55b910fd2130, errp=errp@entry=0x7fffb8943680) at block.c:3541
#6  0x000055b90e937a89 in external_snapshot_prepare (common=0x55b910f5cf90, errp=0x7fffb89436f8) at blockdev.c:1680

Not polling in bdrv_child_cb_drained_begin() is great if the original
drain is still polling, but if the original drain_begin has already
returned, we obviously do need to poll so we don't enter a drained
section while requests are still running.
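
In other words, the parent drain path would need to know whether an outer
drain is still going to poll for it. As a rough illustration only (the extra
'poll' parameter and its plumbing are hypothetical, not necessarily how the
eventual fix will look; recursion and quiesce_counter handling are omitted),
bdrv_do_drained_begin() could take a flag like this:

/* Condensed, hypothetical sketch: only skip polling when an outer drain
 * is known to do the final BDRV_POLL_WHILE() for us. */
static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
                                  BdrvChild *parent, bool poll)
{
    bdrv_parent_drained_begin(bs, parent);
    bdrv_drain_invoke(bs, true);   /* only schedules .bdrv_co_drain_begin now */

    if (poll) {
        /* No outer drain_begin is going to poll for us (e.g. we were called
         * because a node got attached to an already drained parent), so wait
         * here until the scheduled coroutine and all in-flight requests have
         * completed before the node counts as drained. */
        BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
    }
}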

Another thing I noticed is that qmp_transaction() only calls a one-time
bdrv_drain_all() instead of using a proper begin/end section. I suppose
this should be fixed as well.
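
As a rough sketch of that pattern (just an illustration of where the drained
section would sit, not an actual patch to qmp_transaction()):

/* Proper begin/end drained section instead of a one-shot bdrv_drain_all();
 * the real placement would need to cover prepare and commit/abort of all
 * transaction actions. */
bdrv_drain_all_begin();

/* ... run the transaction actions while all nodes stay quiesced ... */

bdrv_drain_all_end();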

Kevin

Patch

diff --git a/block/io.c b/block/io.c
index f372b9ffb0..fc26c1a2f8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -178,22 +178,40 @@  static void coroutine_fn bdrv_drain_invoke_entry(void *opaque)
 
     /* Set data->done before reading bs->wakeup.  */
     atomic_mb_set(&data->done, true);
-    bdrv_wakeup(bs);
+    bdrv_dec_in_flight(bs);
+
+    if (data->begin) {
+        g_free(data);
+    }
 }
 
 /* Recursively call BlockDriver.bdrv_co_drain_begin/end callbacks */
 static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
 {
-    BdrvCoDrainData data = { .bs = bs, .done = false, .begin = begin};
+    BdrvCoDrainData *data;
 
     if (!bs->drv || (begin && !bs->drv->bdrv_co_drain_begin) ||
             (!begin && !bs->drv->bdrv_co_drain_end)) {
         return;
     }
 
-    data.co = qemu_coroutine_create(bdrv_drain_invoke_entry, &data);
-    bdrv_coroutine_enter(bs, data.co);
-    BDRV_POLL_WHILE(bs, !data.done);
+    data = g_new(BdrvCoDrainData, 1);
+    *data = (BdrvCoDrainData) {
+        .bs = bs,
+        .done = false,
+        .begin = begin
+    };
+
+    /* Make sure the driver callback completes during the polling phase for
+     * drain_begin. */
+    bdrv_inc_in_flight(bs);
+    data->co = qemu_coroutine_create(bdrv_drain_invoke_entry, data);
+    aio_co_schedule(bdrv_get_aio_context(bs), data->co);
+
+    if (!begin) {
+        BDRV_POLL_WHILE(bs, !data->done);
+        g_free(data);
+    }
 }
 
 /* Returns true if BDRV_POLL_WHILE() should go into a blocking aio_poll() */