diff mbox

[for-2.6,2/3] mirror: Skip BH for mirror_exit if in main loop

Message ID 1461306907-2837-3-git-send-email-famz@redhat.com
State New
Headers show

Commit Message

Fam Zheng April 22, 2016, 6:35 a.m. UTC
Commit 5a7e7a0ba moved mirror_exit to a BH handler but didn't add any
protection against guest requests that could sneak in before the BH is
dispatched. For example, this could happen (assuming a code base at that
commit):

        main_loop_wait # 1
          os_host_main_loop_wait
            g_main_context_dispatch
              aio_ctx_dispatch
                aio_dispatch
                  ...
                    mirror_run
                      bdrv_drain
    (a)               block_job_defer_to_main_loop
          qemu_iohandler_poll
            virtio_queue_host_notifier_read
              ...
                virtio_submit_multiwrite
    (b)           blk_aio_multiwrite

        main_loop_wait # 2
          <snip>
                aio_dispatch
                  aio_bh_poll
    (c)             mirror_exit

At (a) we know the BDS has no pending request. However, the same
main_loop_wait call is going to dispatch iohandlers (EventNotifier
events), which may lead to a new I/O from guest. So the invariant is
already broken at (c). Data loss.

Commit f3926945c8 made iohandler to use aio API.  The order of
virtio_queue_host_notifier_read and block_job_defer_to_main_loop within
a main_loop_wait becomes unpredictable, and even worse, if the host
notifier event arrives at the next main_loop_wait call, the
unpredictable order between mirror_exit and
virtio_queue_host_notifier_read is also a trouble. As shown below, this
commit made the bug easier to trigger:

    - Bug case 1:

        main_loop_wait # 1
          os_host_main_loop_wait
            g_main_context_dispatch
              aio_ctx_dispatch (qemu_aio_context)
                ...
                  mirror_run
                    bdrv_drain
    (a)             block_job_defer_to_main_loop
              aio_ctx_dispatch (iohandler_ctx)
                virtio_queue_host_notifier_read
                  ...
                    virtio_submit_multiwrite
    (b)               blk_aio_multiwrite

        main_loop_wait # 2
          ...
                aio_dispatch
                  aio_bh_poll
    (c)             mirror_exit

    - Bug case 2:

        main_loop_wait # 1
          os_host_main_loop_wait
            g_main_context_dispatch
              aio_ctx_dispatch (qemu_aio_context)
                ...
                  mirror_run
                    bdrv_drain
    (a)             block_job_defer_to_main_loop

        main_loop_wait # 2
          ...
            aio_ctx_dispatch (iohandler_ctx)
              virtio_queue_host_notifier_read
                ...
                  virtio_submit_multiwrite
    (b)             blk_aio_multiwrite
              aio_dispatch
                aio_bh_poll
    (c)           mirror_exit

In both cases, (b) breaks the invariant wanted by (a) and (c).

Unfortunately, the request loss has been silent, until 3f09bfbc7be added
an assertion at (c) to check the invariant in
bdrv_replace_in_backing_chain. Max reproduced an assertion failure after
that commit, by doing active committing while the guest is running
bonnie++.

2.5 added bdrv_drained_begin at (a) to protect the dataplane case from
similar problems, but we never realize the main loop bug until now.

As a bandage, this patch undoes the change of 5a7e7a0bad17 in
non-dataplane case, and calls mirror_replace directly in mirror_run.

The next step after 2.6 is to complete bdrv_drained_begin complete
(including both fixing the dispatching code in the main loop and marking
host event notifiers etc. as external) and then this special casing is
not necessary.

Launchpad Bug: 1570134

Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/mirror.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Fam Zheng April 22, 2016, 9:37 a.m. UTC | #1
On Fri, 04/22 14:35, Fam Zheng wrote:
> As a bandage, this patch undoes the change of 5a7e7a0bad17 in
> non-dataplane case, and calls mirror_replace directly in mirror_run.

mirror_exit minus mirror_replace was kept in the BH because it's not safe to
call in coroutine. After discussing with kwolf on IRC, now it turns out that
mirror_replace isn't either, due to the bdrv_drain_all in bdrv_reopen. So,

NACK

The only feasible fix for 2.6 is with the previous direction, but we need
is_external check added to the main loop. Patches are being worked on.

Fam
diff mbox

Patch

diff --git a/block/mirror.c b/block/mirror.c
index 6c3fe43..bc77773 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -726,6 +726,13 @@  immediate_exit:
     /* Before we switch to target in mirror_exit, make sure data doesn't
      * change. */
     bdrv_drained_begin(s->common.bs);
+    if (qemu_get_aio_context() == bdrv_get_aio_context(bs)) {
+        /* FIXME: if we are in the main loop, the iohandler doesn't honor
+         * bdrv_drained_begin yet, and guest requests can sneak in before BH
+         * callback runs. Do the replace now to avoid it. */
+        mirror_replace(s, &data->ret);
+        data->should_replace = false;
+    }
     block_job_defer_to_main_loop(&s->common, mirror_exit, data);
 }