[v2,7/8] block: Move some bdrv_*_all() functions to BB

Message ID 1447126069-6185-8-git-send-email-mreitz@redhat.com
State New

Commit Message

Max Reitz Nov. 10, 2015, 3:27 a.m. UTC
Move bdrv_drain_all(), bdrv_commit_all(), bdrv_flush_all() and
bdrv_invalidate_cache_all() to BB.

The only operation left is bdrv_close_all(), which cannot be moved to
the BB because it should not only close all BBs, but also all
monitor-owned BDSs.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block.c                 |  38 -------------
 block/block-backend.c   | 138 +++++++++++++++++++++++++++++++++++++++++++-----
 block/io.c              |  88 ------------------------------
 include/block/block.h   |   4 --
 stubs/Makefile.objs     |   2 +-
 stubs/bdrv-commit-all.c |   7 ---
 stubs/blk-commit-all.c  |   7 +++
 7 files changed, 134 insertions(+), 150 deletions(-)
 delete mode 100644 stubs/bdrv-commit-all.c
 create mode 100644 stubs/blk-commit-all.c
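
For orientation, here is a minimal sketch of a caller outside the block layer
that simply exercises the moved entry points in one place; the
do_guest_quiesce() wrapper and its call order are invented for illustration
and are not part of this patch:

/* Hypothetical caller, for illustration only: quiesce all guest-visible
 * I/O and persist it through the BlockBackend-level helpers that this
 * patch introduces. */
#include "sysemu/block-backend.h"

static int do_guest_quiesce(void)
{
    int ret;

    blk_drain_all();        /* wait for all in-flight requests on all BBs */

    ret = blk_commit_all(); /* commit each image's changes into its backing file */
    if (ret < 0) {
        return ret;
    }

    return blk_flush_all(); /* flush device and format caches to disk */
}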

Comments

Kevin Wolf Dec. 1, 2015, 4:01 p.m. UTC | #1
On 10.11.2015 at 04:27, Max Reitz wrote:
> Move bdrv_drain_all(), bdrv_commit_all(), bdrv_flush_all() and
> bdrv_invalidate_cache_all() to BB.
> 
> The only operation left is bdrv_close_all(), which cannot be moved to
> the BB because it should not only close all BBs, but also all
> monitor-owned BDSs.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>

bdrv_commit_all() and bdrv_flush_all() are relatively obvious. They are
meant to commit/flush changes made by a guest, so moving to the BB level
seems right.

I'm not so sure about bdrv_invalidate_cache_all(). Even if a BB isn't
attached to a BDS, we still need to invalidate the caches of any images
that have been opened before migration has completed, including those
images that are only owned by the monitor.

Similarly I'm concerned about bdrv_drain_all(). Anything outside the
block layer should certainly call blk_drain_all() because it can only be
interested in those images that it can see. The calls in block.c,
however, look like they should consider BDSes without a BB as well. (The
monitor, which can hold direct BDS references, may be another valid user
of a bdrv_drain_all that also covers BDSes without a BB.)

And block.c calling blk_*() functions smells a bit fishy anyway.

Kevin
Max Reitz Dec. 4, 2015, 11:15 p.m. UTC | #2
On 01.12.2015 17:01, Kevin Wolf wrote:
> On 10.11.2015 at 04:27, Max Reitz wrote:
>> Move bdrv_drain_all(), bdrv_commit_all(), bdrv_flush_all() and
>> bdrv_invalidate_cache_all() to BB.
>>
>> The only operation left is bdrv_close_all(), which cannot be moved to
>> the BB because it should not only close all BBs, but also all
>> monitor-owned BDSs.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> 
> bdrv_commit_all() and bdrv_flush_all() are relatively obvious. They are
> meant to commit/flush changes made by a guest, so moving to the BB level
> seems right.

OK, I wanted to just follow this comment, but since monitor_bdrv_states
is local to blockdev.c (and I consider that correct) that would mean
either making it global (which I wouldn't like) or keeping bdrv_states
(which I wouldn't like either, not least because dropping bdrv_states
allows us to drop bdrv_new_root(), which may or may not be crucial to
the follow-up series to this one).

So, I'll be trying to defend what this patch does for now.

> I'm not so sure about bdrv_invalidate_cache_all(). Even if a BB isn't
> attached to a BDS, we still need to invalidate the caches of any images
> that have been opened before migration has completed, including those
> images that are only owned by the monitor.

I consider BB-less BDSs to be in a temporary state between blockdev-add
and binding them to a device, or between unbinding them from a device
and blockdev-del. So I don't even know if we should allow them to be
migrated.

Anyway, in my opinion, a BB-less BDS should be totally static.
Invalidating its cache shouldn't do anything. Instead, its cache should
be invalidated when it is detached from a BB (just like this patch adds
a bdrv_drain() call to blk_remove_bs()).

> Similarly I'm concerned about bdrv_drain_all(). Anything outside the
> block layer should certainly call blk_drain_all() because it can only be
> interested in those images that it can see. The calls in block.c,
> however, look like they should consider BDSes without a BB as well. (The
> monitor, which can hold direct BDS references, may be another valid user
> of a bdrv_drain_all that also covers BDSes without a BB.)

Similarly, in my opinion, bdrv_drain() shouldn't do anything on a
BB-less BDS. That is why this patch adds a bdrv_drain() call to
blk_remove_bs() (a bdrv_invalidate_cache() call is still missing, though).
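
To make that concrete, here is an excerpt-style sketch of blk_remove_bs() as
described above; the bdrv_drain() call is the one this patch adds (see the
hunk below), while the bdrv_invalidate_cache() call is the one noted as still
missing and appears here purely for illustration:

void blk_remove_bs(BlockBackend *blk)
{
    notifier_list_notify(&blk->remove_bs_notifiers, blk);

    blk_update_root_state(blk);

    /* Settle the BDS before it becomes invisible to the blk_*_all()
     * functions: no requests in flight (added by this patch)... */
    bdrv_drain(blk->bs);
    /* ...and, hypothetically, no stale metadata cache left to worry
     * about while the BDS is BB-less. */
    bdrv_invalidate_cache(blk->bs, NULL);

    blk->bs->blk = NULL;
    bdrv_unref(blk->bs);
    blk->bs = NULL;
}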

But I just remembered that block jobs don't use BBs... Did I mention
before how wrong I think that is?

Now I'm torn between trying to make block jobs use BBs once more
(because I really think BB-less BDSs should not have any active I/O) and
doing some ugly things with bdrv_states and/or monitor_bdrv_states. I'll
have to think about it.

> And block.c calling blk_*() functions smells a bit fishy anyway.

I don't find it any more fishy than single-BDS functions calling
bdrv_*_all() functions. And that is: not fishy at all. If some function
wants to drain all I/O, and we define I/O to always originate from a BB,
then calling blk_drain_all() is the only reasonable approach.

Max
Kevin Wolf Dec. 7, 2015, 11:16 a.m. UTC | #3
On 05.12.2015 at 00:15, Max Reitz wrote:
> On 01.12.2015 17:01, Kevin Wolf wrote:
> > On 10.11.2015 at 04:27, Max Reitz wrote:
> >> Move bdrv_drain_all(), bdrv_commit_all(), bdrv_flush_all() and
> >> bdrv_invalidate_cache_all() to BB.
> >>
> >> The only operation left is bdrv_close_all(), which cannot be moved to
> >> the BB because it should not only close all BBs, but also all
> >> monitor-owned BDSs.
> >>
> >> Signed-off-by: Max Reitz <mreitz@redhat.com>
> > 
> > bdrv_commit_all() and bdrv_flush_all() are relatively obvious. They are
> > meant to commit/flush changes made by a guest, so moving to the BB level
> > seems right.
> 
> OK, I wanted to just follow this comment, but since monitor_bdrv_states
> is local to blockdev.c (and I consider that correct) that would mean
> either making it global (which I wouldn't like) or keeping bdrv_states
> (which I wouldn't like either, not least because dropping bdrv_states
> allows us to drop bdrv_new_root(), which may or may not be crucial to
> the follow-up series to this one).
> 
> So, I'll be trying to defend what this patch does for now.
> 
> > I'm not so sure about bdrv_invalidate_cache_all(). Even if a BB isn't
> > attached to a BDS, we still need to invalidate the caches of any images
> > that have been opened before migration has completed, including those
> > images that are only owned by the monitor.
> 
> I consider BB-less BDSs to be in a temporary state between blockdev-add
> and binding them to a device, or between unbinding them from a device
> and blockdev-del. So I don't even know if we should allow them to be
> migrated.

If we don't want to, we should add a migration blocker while such a BDS
exists.

I don't think that would be right, though. Definitely not in the strict
way you phrased it ("binding them to a device"), but probably also not
if you interpret "device" as any kind of user, including block jobs or
NBD servers; actually, I think even the monitor would belong in this
list, but then you always have "something" attached, otherwise the
refcount becomes zero and the BDS is deleted.

If you don't include the monitor, though, you prevent entirely
reasonable use cases like opening upfront multiple images for a device
with removable media and then changing between them later on, which
leaves some BDSes unattached while another medium is inserted.

I think BDSes without a BB attached should still be first-class
citizens.

> Anyway, in my opinion, a BB-less BDS should be totally static.
> Invalidating its cache shouldn't do anything. Instead, its cache should
> be invalidated when it is detached from a BB (just like this patch adds
> a bdrv_drain() call to blk_remove_bs()).

Are you sure you remember what bdrv_invalidate_cache() is for? Its job
is dropping stale cache contents after a mere bdrv_open() when migrating
with shared storage. It is wrong to ever use this after actually using
(instead of just opening) an image.

The order is like this:

1. Start destination qemu -incoming. This opens the image and
   potentially reads in some metadata (qcow2 L1 table, etc.)
2. Source keeps writing to the image.
3. Migration completes. Source flushes the image and stops writing.
4. Destination must call bdrv_invalidate_cache() to make sure it gets
   all the updates from step 2 instead of using stale data from step 1.

Note how there is no detaching from a BB involved at all.
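
In code terms, the ordering above boils down to something like the following
on the destination side; the function name and surrounding structure are
schematic rather than taken from the actual migration code:

/* Schematic destination-side view of the four steps above. */
static void incoming_migration_completed(Error **errp)
{
    /* Steps 1-3 have already happened: the image was opened by the
     * qemu started with -incoming, the source kept writing to it, and
     * the source has now flushed and stopped. */

    /* Step 4: drop anything cached since the initial bdrv_open() so that
     * the destination re-reads the metadata the source wrote in step 2.
     * With this patch, the call goes through the BlockBackend layer. */
    blk_invalidate_cache_all(errp);
}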

> > Similarly I'm concerned about bdrv_drain_all(). Anything outside the
> > block layer should certainly call blk_drain_all() because it can only be
> > interested in those images that it can see. The calls in block.c,
> > however, look like they should consider BDSes without a BB as well. (The
> > monitor, which can hold direct BDS references, may be another valid user
> > of a bdrv_drain_all that also covers BDSes without a BB.)
> 
> Similarly, in my opinion, bdrv_drain() shouldn't do anything on a
> BB-less BDS. That is why this patch adds a bdrv_drain() call to
> blk_remove_bs() (a bdrv_invalidate_cache() call is still missing, though).
> 
> But I just remembered that block jobs don't use BBs... Did I mention
> before how wrong I think that is?
> 
> Now I'm torn between trying to make block jobs use BBs once more
> (because I really think BB-less BDSs should not have any active I/O) and
> doing some ugly things with bdrv_states and/or monitor_bdrv_states. I'll
> have to think about it.

I think we first need to allow multiple BBs per BDS, so that block jobs
can create their own BB.
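
Purely as an illustration of that direction, a block job acquiring its own BB
could look roughly like the sketch below; blk_new() and blk_insert_bs() are
assumed counterparts of blk_remove_bs() here, and neither the names nor the
signatures are taken from this series:

/* Hypothetical sketch: a block job obtains its own BlockBackend so that
 * its I/O originates from a BB, just like guest I/O does. */
static BlockBackend *block_job_acquire_blk(BlockDriverState *bs, Error **errp)
{
    BlockBackend *blk = blk_new("job-backend", errp);  /* assumed helper */
    if (!blk) {
        return NULL;
    }
    blk_insert_bs(blk, bs);                            /* assumed helper */
    return blk;
}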

And anyway, this seems to be a great start for discussing our whole BB
model in person on Friday. As I said, I think what we're currently doing
is wrong. I am concerned that introducing a model that actually makes
sense is hard to do in a compatible way, but I'm getting less and less
confident that doing it like we do now is the right conclusion from that.

> > And block.c calling blk_*() functions smells a bit fishy anyway.
> 
> I don't find it any more fishy than single-BDS functions calling
> bdrv_*_all() functions. And that is: not fishy at all. If some function
> wants to drain all I/O, and we define I/O to always originate from a BB,
> then calling blk_drain_all() is the only reasonable approach.

bdrv_*_all() calls are always fishy, at least if done outside shutdown.

Kevin

Patch

diff --git a/block.c b/block.c
index f8c1e99..c80675e 100644
--- a/block.c
+++ b/block.c
@@ -2297,26 +2297,6 @@  ro_cleanup:
     return ret;
 }
 
-int bdrv_commit_all(void)
-{
-    BlockDriverState *bs;
-
-    QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
-        AioContext *aio_context = bdrv_get_aio_context(bs);
-
-        aio_context_acquire(aio_context);
-        if (bs->drv && bs->backing) {
-            int ret = bdrv_commit(bs);
-            if (ret < 0) {
-                aio_context_release(aio_context);
-                return ret;
-            }
-        }
-        aio_context_release(aio_context);
-    }
-    return 0;
-}
-
 /*
  * Return values:
  * 0        - success
@@ -3074,24 +3054,6 @@  void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
     }
 }
 
-void bdrv_invalidate_cache_all(Error **errp)
-{
-    BlockDriverState *bs;
-    Error *local_err = NULL;
-
-    QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
-        AioContext *aio_context = bdrv_get_aio_context(bs);
-
-        aio_context_acquire(aio_context);
-        bdrv_invalidate_cache(bs, &local_err);
-        aio_context_release(aio_context);
-        if (local_err) {
-            error_propagate(errp, local_err);
-            return;
-        }
-    }
-}
-
 /**************************************************************/
 /* removable device support */
 
diff --git a/block/block-backend.c b/block/block-backend.c
index 609a045..b0bf3b2 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -409,6 +409,9 @@  void blk_remove_bs(BlockBackend *blk)
     notifier_list_notify(&blk->remove_bs_notifiers, blk);
 
     blk_update_root_state(blk);
+    /* After this function, the BDS may no longer be accessible to
+     * blk_*_all() functions */
+    bdrv_drain(blk->bs);
 
     blk->bs->blk = NULL;
     bdrv_unref(blk->bs);
@@ -902,11 +905,6 @@  int blk_flush(BlockBackend *blk)
     return bdrv_flush(blk->bs);
 }
 
-int blk_flush_all(void)
-{
-    return bdrv_flush_all();
-}
-
 void blk_drain(BlockBackend *blk)
 {
     if (blk->bs) {
@@ -914,11 +912,6 @@  void blk_drain(BlockBackend *blk)
     }
 }
 
-void blk_drain_all(void)
-{
-    bdrv_drain_all();
-}
-
 void blk_set_on_error(BlockBackend *blk, BlockdevOnError on_read_error,
                       BlockdevOnError on_write_error)
 {
@@ -1343,12 +1336,133 @@  BlockBackendRootState *blk_get_root_state(BlockBackend *blk)
     return &blk->root_state;
 }
 
+/*
+ * Wait for pending requests to complete across all BlockBackends
+ *
+ * This function does not flush data to disk, use blk_flush_all() for that
+ * after calling this function.
+ */
+void blk_drain_all(void)
+{
+    /* Always run first iteration so any pending completion BHs run */
+    bool busy = true;
+    BlockBackend *blk;
+    GSList *aio_ctxs = NULL, *ctx;
+
+    QTAILQ_FOREACH(blk, &blk_backends, link) {
+        AioContext *aio_context = blk_get_aio_context(blk);
+
+        aio_context_acquire(aio_context);
+        if (blk_is_inserted(blk) && blk->bs->job) {
+            block_job_pause(blk->bs->job);
+        }
+        aio_context_release(aio_context);
+
+        if (!g_slist_find(aio_ctxs, aio_context)) {
+            aio_ctxs = g_slist_prepend(aio_ctxs, aio_context);
+        }
+    }
+
+    /* Note that completion of an asynchronous I/O operation can trigger any
+     * number of other I/O operations on other devices---for example a
+     * coroutine can submit an I/O request to another device in response to
+     * request completion.  Therefore we must keep looping until there was no
+     * more activity rather than simply draining each device independently.
+     */
+    while (busy) {
+        busy = false;
+
+        for (ctx = aio_ctxs; ctx != NULL; ctx = ctx->next) {
+            AioContext *aio_context = ctx->data;
+
+            aio_context_acquire(aio_context);
+            QTAILQ_FOREACH(blk, &blk_backends, link) {
+                if (blk_is_inserted(blk) &&
+                    aio_context == blk_get_aio_context(blk))
+                {
+                    bdrv_flush_io_queue(blk->bs);
+                    if (bdrv_requests_pending(blk->bs)) {
+                        busy = true;
+                        aio_poll(aio_context, busy);
+                    }
+                }
+            }
+            busy |= aio_poll(aio_context, false);
+            aio_context_release(aio_context);
+        }
+    }
+
+    QTAILQ_FOREACH(blk, &blk_backends, link) {
+        AioContext *aio_context = blk_get_aio_context(blk);
+
+        aio_context_acquire(aio_context);
+        if (blk_is_inserted(blk) && blk->bs->job) {
+            block_job_resume(blk->bs->job);
+        }
+        aio_context_release(aio_context);
+    }
+    g_slist_free(aio_ctxs);
+}
+
+
 int blk_commit_all(void)
 {
-    return bdrv_commit_all();
+    BlockBackend *blk;
+
+    QTAILQ_FOREACH(blk, &blk_backends, link) {
+        AioContext *aio_context = blk_get_aio_context(blk);
+
+        aio_context_acquire(aio_context);
+        if (blk_is_available(blk) && blk->bs->drv && blk->bs->backing) {
+            int ret = bdrv_commit(blk->bs);
+            if (ret < 0) {
+                aio_context_release(aio_context);
+                return ret;
+            }
+        }
+        aio_context_release(aio_context);
+    }
+    return 0;
+}
+
+int blk_flush_all(void)
+{
+    BlockBackend *blk;
+    int result = 0;
+
+    QTAILQ_FOREACH(blk, &blk_backends, link) {
+        AioContext *aio_context = blk_get_aio_context(blk);
+        int ret;
+
+        aio_context_acquire(aio_context);
+        if (blk_is_inserted(blk)) {
+            ret = blk_flush(blk);
+            if (ret < 0 && !result) {
+                result = ret;
+            }
+        }
+        aio_context_release(aio_context);
+    }
+
+    return result;
 }
 
 void blk_invalidate_cache_all(Error **errp)
 {
-    bdrv_invalidate_cache_all(errp);
+    BlockBackend *blk;
+    Error *local_err = NULL;
+
+    QTAILQ_FOREACH(blk, &blk_backends, link) {
+        AioContext *aio_context = blk_get_aio_context(blk);
+
+        aio_context_acquire(aio_context);
+        if (blk_is_inserted(blk)) {
+            blk_invalidate_cache(blk, &local_err);
+        }
+        aio_context_release(aio_context);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
 }
diff --git a/block/io.c b/block/io.c
index 8dcad3b..110f7b7 100644
--- a/block/io.c
+++ b/block/io.c
@@ -259,74 +259,6 @@  void bdrv_drain(BlockDriverState *bs)
     }
 }
 
-/*
- * Wait for pending requests to complete across all BlockDriverStates
- *
- * This function does not flush data to disk, use bdrv_flush_all() for that
- * after calling this function.
- */
-void bdrv_drain_all(void)
-{
-    /* Always run first iteration so any pending completion BHs run */
-    bool busy = true;
-    BlockDriverState *bs = NULL;
-    GSList *aio_ctxs = NULL, *ctx;
-
-    while ((bs = bdrv_next(bs))) {
-        AioContext *aio_context = bdrv_get_aio_context(bs);
-
-        aio_context_acquire(aio_context);
-        if (bs->job) {
-            block_job_pause(bs->job);
-        }
-        aio_context_release(aio_context);
-
-        if (!g_slist_find(aio_ctxs, aio_context)) {
-            aio_ctxs = g_slist_prepend(aio_ctxs, aio_context);
-        }
-    }
-
-    /* Note that completion of an asynchronous I/O operation can trigger any
-     * number of other I/O operations on other devices---for example a
-     * coroutine can submit an I/O request to another device in response to
-     * request completion.  Therefore we must keep looping until there was no
-     * more activity rather than simply draining each device independently.
-     */
-    while (busy) {
-        busy = false;
-
-        for (ctx = aio_ctxs; ctx != NULL; ctx = ctx->next) {
-            AioContext *aio_context = ctx->data;
-            bs = NULL;
-
-            aio_context_acquire(aio_context);
-            while ((bs = bdrv_next(bs))) {
-                if (aio_context == bdrv_get_aio_context(bs)) {
-                    bdrv_flush_io_queue(bs);
-                    if (bdrv_requests_pending(bs)) {
-                        busy = true;
-                        aio_poll(aio_context, busy);
-                    }
-                }
-            }
-            busy |= aio_poll(aio_context, false);
-            aio_context_release(aio_context);
-        }
-    }
-
-    bs = NULL;
-    while ((bs = bdrv_next(bs))) {
-        AioContext *aio_context = bdrv_get_aio_context(bs);
-
-        aio_context_acquire(aio_context);
-        if (bs->job) {
-            block_job_resume(bs->job);
-        }
-        aio_context_release(aio_context);
-    }
-    g_slist_free(aio_ctxs);
-}
-
 /**
  * Remove an active request from the tracked requests list
  *
@@ -1417,26 +1349,6 @@  int coroutine_fn bdrv_co_write_zeroes(BlockDriverState *bs,
                              BDRV_REQ_ZERO_WRITE | flags);
 }
 
-int bdrv_flush_all(void)
-{
-    BlockDriverState *bs = NULL;
-    int result = 0;
-
-    while ((bs = bdrv_next(bs))) {
-        AioContext *aio_context = bdrv_get_aio_context(bs);
-        int ret;
-
-        aio_context_acquire(aio_context);
-        ret = bdrv_flush(bs);
-        if (ret < 0 && !result) {
-            result = ret;
-        }
-        aio_context_release(aio_context);
-    }
-
-    return result;
-}
-
 typedef struct BdrvCoGetBlockStatusData {
     BlockDriverState *bs;
     BlockDriverState *base;
diff --git a/include/block/block.h b/include/block/block.h
index 03551d3..f47c45a 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -268,7 +268,6 @@  int64_t bdrv_get_allocated_file_size(BlockDriverState *bs);
 void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
 void bdrv_refresh_limits(BlockDriverState *bs, Error **errp);
 int bdrv_commit(BlockDriverState *bs);
-int bdrv_commit_all(void);
 int bdrv_change_backing_file(BlockDriverState *bs,
     const char *backing_file, const char *backing_fmt);
 void bdrv_register(BlockDriver *bdrv);
@@ -355,15 +354,12 @@  BlockAIOCB *bdrv_aio_ioctl(BlockDriverState *bs,
 
 /* Invalidate any cached metadata used by image formats */
 void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp);
-void bdrv_invalidate_cache_all(Error **errp);
 
 /* Ensure contents are flushed to disk.  */
 int bdrv_flush(BlockDriverState *bs);
 int coroutine_fn bdrv_co_flush(BlockDriverState *bs);
-int bdrv_flush_all(void);
 void bdrv_close_all(void);
 void bdrv_drain(BlockDriverState *bs);
-void bdrv_drain_all(void);
 
 int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors);
 int bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors);
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index c1f02be..f872ba0 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -1,5 +1,5 @@ 
 stub-obj-y += arch-query-cpu-def.o
-stub-obj-y += bdrv-commit-all.o
+stub-obj-y += blk-commit-all.o
 stub-obj-y += blockdev-close-all-bdrv-states.o
 stub-obj-y += clock-warp.o
 stub-obj-y += cpu-get-clock.o
diff --git a/stubs/bdrv-commit-all.c b/stubs/bdrv-commit-all.c
deleted file mode 100644
index a8e0a95..0000000
--- a/stubs/bdrv-commit-all.c
+++ /dev/null
@@ -1,7 +0,0 @@ 
-#include "qemu-common.h"
-#include "block/block.h"
-
-int bdrv_commit_all(void)
-{
-    return 0;
-}
diff --git a/stubs/blk-commit-all.c b/stubs/blk-commit-all.c
new file mode 100644
index 0000000..90b59e5
--- /dev/null
+++ b/stubs/blk-commit-all.c
@@ -0,0 +1,7 @@ 
+#include "qemu-common.h"
+#include "sysemu/block-backend.h"
+
+int blk_commit_all(void)
+{
+    return 0;
+}