Message ID: 20180411163940.2523-15-kwolf@redhat.com
State: New
Series: Drain fixes and cleanups, part 3
On 2018-04-11 18:39, Kevin Wolf wrote:
> We cannot allow aio_poll() in bdrv_drain_invoke(begin=true) until we're
> done with propagating the drain through the graph and are doing the
> single final BDRV_POLL_WHILE().
>
> Just schedule the coroutine with the callback and increase bs->in_flight
> to make sure that the polling phase will wait for it.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/io.c | 28 +++++++++++++++++++++++-----
>  1 file changed, 23 insertions(+), 5 deletions(-)

According to bisect, this breaks blockdev-snapshot with QED:

$ ./qemu-img create -f qed foo.qed 64M
Formatting 'foo.qed', fmt=qed size=67108864 cluster_size=65536
$ echo "{'execute':'qmp_capabilities'}
        {'execute':'blockdev-snapshot',
         'arguments':{'node':'backing','overlay':'overlay'}}
        {'execute':'quit'}" | \
    x86_64-softmmu/qemu-system-x86_64 -qmp stdio -nodefaults \
        -blockdev "{'node-name':'backing','driver':'null-co'}" \
        -blockdev "{'node-name':'overlay','driver':'qed',
                    'file':{'driver':'file','filename':'foo.qed'}}"
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 12, "major": 2}, "package": "v2.12.0-1422-g0109e7e6f8"}, "capabilities": []}}
{"return": {}}
qemu-system-x86_64: block.c:3434: bdrv_replace_node: Assertion `!atomic_read(&to->in_flight)' failed.
[1]  5252 done       echo |
     5253 abort (core dumped)  x86_64-softmmu/qemu-system-x86_64 -qmp stdio -nodefaults -blockdev -blockdev

Max
Am 27.06.2018 um 16:30 hat Max Reitz geschrieben:
> On 2018-04-11 18:39, Kevin Wolf wrote:
> > We cannot allow aio_poll() in bdrv_drain_invoke(begin=true) until we're
> > done with propagating the drain through the graph and are doing the
> > single final BDRV_POLL_WHILE().
> >
> > Just schedule the coroutine with the callback and increase bs->in_flight
> > to make sure that the polling phase will wait for it.
> >
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  block/io.c | 28 +++++++++++++++++++++++-----
> >  1 file changed, 23 insertions(+), 5 deletions(-)
>
> According to bisect, this breaks blockdev-snapshot with QED:

I have no idea why it would trigger only with qed, but I think the real
bug is in commit dcf94a23b1 ('block: Don't poll in parent drain
callbacks'). The crash is specifically in the place where the new
overlay needs to be drained because its new backing file is drained.

First, bdrv_drain_invoke() creates new activity on the overlay when it
gets the old active layer attached as a backing file:

#0  0x000055b90eb73b88 in bdrv_drain_invoke (bs=0x55b910fd8440, begin=<optimized out>) at block/io.c:217
#1  0x000055b90eb1e2b0 in bdrv_replace_child_noperm (child=0x55b911f32450, new_bs=<optimized out>) at block.c:2069
#2  0x000055b90eb20875 in bdrv_replace_child (child=<optimized out>, new_bs=0x55b910fd2130) at block.c:2098
#3  0x000055b90eb20c53 in bdrv_root_attach_child (child_bs=child_bs@entry=0x55b910fd2130,
    child_name=child_name@entry=0x55b90eda75e5 "backing",
    child_role=child_role@entry=0x55b90f3d3300 <child_backing>, perm=0, shared_perm=31,
    opaque=opaque@entry=0x55b910fd8440, errp=0x7fffb8943620) at block.c:2141
#4  0x000055b90eb20d60 in bdrv_attach_child (parent_bs=parent_bs@entry=0x55b910fd8440,
    child_bs=child_bs@entry=0x55b910fd2130, child_name=child_name@entry=0x55b90eda75e5 "backing",
    child_role=child_role@entry=0x55b90f3d3300 <child_backing>, errp=0x7fffb8943620) at block.c:2162
#5  0x000055b90eb23c30 in bdrv_set_backing_hd (bs=bs@entry=0x55b910fd8440,
    backing_hd=backing_hd@entry=0x55b910fd2130, errp=errp@entry=0x7fffb8943620) at block.c:2249
#6  0x000055b90eb24a76 in bdrv_append (bs_new=0x55b910fd8440, bs_top=0x55b910fd2130,
    errp=errp@entry=0x7fffb8943680) at block.c:3535
#7  0x000055b90e937a89 in external_snapshot_prepare (common=0x55b910f5cf90, errp=0x7fffb89436f8)
    at blockdev.c:1680

And then, when trying to move all users of the old active layer to the
new overlay, it turns out that the bdrv_drain_invoke() BH hasn't been
executed yet:

#0  0x00007fdfcef5d9fb in raise () at /lib64/libc.so.6
#1  0x00007fdfcef5f800 in abort () at /lib64/libc.so.6
#2  0x00007fdfcef560da in __assert_fail_base () at /lib64/libc.so.6
#3  0x00007fdfcef56152 in () at /lib64/libc.so.6
#4  0x000055b90eb24867 in bdrv_replace_node (from=<optimized out>, to=0x55b910fd8440,
    errp=0x7fffb8943620) at block.c:3470
#5  0x000055b90eb24abe in bdrv_append (bs_new=0x55b910fd8440, bs_top=0x55b910fd2130,
    errp=errp@entry=0x7fffb8943680) at block.c:3541
#6  0x000055b90e937a89 in external_snapshot_prepare (common=0x55b910f5cf90, errp=0x7fffb89436f8)
    at blockdev.c:1680

Not polling in bdrv_child_cb_drained_begin() is great if the original
drain is still polling, but if the original drain_begin has already
returned, we obviously do need to poll so we don't enter a drained
section while requests are still running.

Another thing I noticed is that qmp_transaction() only calls a one-time
bdrv_drain_all() instead of using a proper begin/end section. I suppose
this should be fixed as well.

Kevin
diff --git a/block/io.c b/block/io.c
index f372b9ffb0..fc26c1a2f8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -178,22 +178,40 @@ static void coroutine_fn bdrv_drain_invoke_entry(void *opaque)
 
     /* Set data->done before reading bs->wakeup. */
     atomic_mb_set(&data->done, true);
-    bdrv_wakeup(bs);
+    bdrv_dec_in_flight(bs);
+
+    if (data->begin) {
+        g_free(data);
+    }
 }
 
 /* Recursively call BlockDriver.bdrv_co_drain_begin/end callbacks */
 static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
 {
-    BdrvCoDrainData data = { .bs = bs, .done = false, .begin = begin};
+    BdrvCoDrainData *data;
 
     if (!bs->drv || (begin && !bs->drv->bdrv_co_drain_begin) ||
             (!begin && !bs->drv->bdrv_co_drain_end)) {
         return;
     }
 
-    data.co = qemu_coroutine_create(bdrv_drain_invoke_entry, &data);
-    bdrv_coroutine_enter(bs, data.co);
-    BDRV_POLL_WHILE(bs, !data.done);
+    data = g_new(BdrvCoDrainData, 1);
+    *data = (BdrvCoDrainData) {
+        .bs = bs,
+        .done = false,
+        .begin = begin
+    };
+
+    /* Make sure the driver callback completes during the polling phase for
+     * drain_begin. */
+    bdrv_inc_in_flight(bs);
+    data->co = qemu_coroutine_create(bdrv_drain_invoke_entry, data);
+    aio_co_schedule(bdrv_get_aio_context(bs), data->co);
+
+    if (!begin) {
+        BDRV_POLL_WHILE(bs, !data->done);
+        g_free(data);
+    }
 }
 
 /* Returns true if BDRV_POLL_WHILE() should go into a blocking aio_poll() */
We cannot allow aio_poll() in bdrv_drain_invoke(begin=true) until we're
done with propagating the drain through the graph and are doing the
single final BDRV_POLL_WHILE().

Just schedule the coroutine with the callback and increase bs->in_flight
to make sure that the polling phase will wait for it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/io.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)