file-posix: add drop-cache=on|off option

Message ID 20190226153549.17867-1-stefanha@redhat.com
State New

Commit Message

Stefan Hajnoczi Feb. 26, 2019, 3:35 p.m. UTC
Commit dd577a26ff03b6829721b1ffbbf9e7c411b72378 ("block/file-posix:
implement bdrv_co_invalidate_cache() on Linux") introduced page cache
invalidation so that cache.direct=off live migration is safe on Linux.

The invalidation takes a significant amount of time when the file is
large and present in the page cache.  Normally this is not the case for
cross-host live migration but it can happen when migrating between QEMU
processes on the same host.

On same-host migration we don't need to invalidate pages for correctness
anyway, so an option to skip page cache invalidation is useful.  I
investigated optimizing invalidation and detecting same-host migration,
but both are hard to achieve so a user-visible option will suffice.

As a bonus this option means that the cache invalidation feature will
now be detectable by libvirt via QMP schema introspection.

Suggested-by: Neil Skrypuch <neil@tembosocial.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 qapi/block-core.json |  5 +++++
 block/file-posix.c   | 14 ++++++++++++++
 2 files changed, 19 insertions(+)

Comments

Eric Blake Feb. 26, 2019, 4:54 p.m. UTC | #1
On 2/26/19 9:35 AM, Stefan Hajnoczi wrote:
> Commit dd577a26ff03b6829721b1ffbbf9e7c411b72378 ("block/file-posix:
> implement bdrv_co_invalidate_cache() on Linux") introduced page cache
> invalidation so that cache.direct=off live migration is safe on Linux.
> 
> The invalidation takes a significant amount of time when the file is
> large and present in the page cache.  Normally this is not the case for
> cross-host live migration but it can happen when migrating between QEMU
> processes on the same host.
> 
> On same-host migration we don't need to invalidate pages for correctness
> anyway, so an option to skip page cache invalidation is useful.  I
> investigated optimizing invalidation and detecting same-host migration,
> but both are hard to achieve so a user-visible option will suffice.
> 
> As a bonus this option means that the cache invalidation feature will
> now be detectable by libvirt via QMP schema introspection.

Do you still want to pursue the QMP query-qemu-features command, or does
this delay that for another day?

> 
> Suggested-by: Neil Skrypuch <neil@tembosocial.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  qapi/block-core.json |  5 +++++
>  block/file-posix.c   | 14 ++++++++++++++
>  2 files changed, 19 insertions(+)
> 
Reviewed-by: Eric Blake <eblake@redhat.com>
Neil Skrypuch Feb. 26, 2019, 8:02 p.m. UTC | #2
On Tuesday, February 26, 2019 10:35:49 AM EST Stefan Hajnoczi wrote:
> Suggested-by: Neil Skrypuch <neil@tembosocial.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  qapi/block-core.json |  5 +++++
>  block/file-posix.c   | 14 ++++++++++++++
>  2 files changed, 19 insertions(+)

Tested-by: Neil Skrypuch <neil@tembosocial.com>

Applied this patch against 3.1.0 and tested with
-drive ...,file.drop-cache=off; it resolves the issue mentioned in my
previous email to the list, "[regression] Clock jump on VM migration".

Thanks,

- Neil
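
The on/off semantics exercised in this test can be sketched in plain C. This is only an illustration and not QEMU's option parser: parse_on_off() is a hypothetical helper that accepts just the two literal strings and falls back to a caller-supplied default when the option is absent, mirroring the default-on behaviour the patch describes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a QEMU function): interpret an "on"/"off"
 * option value.  A NULL value means the option was not given, so the
 * default applies.  Unrecognised strings also fall back to the default
 * here; a real parser would more likely report an error.
 */
static bool parse_on_off(const char *value, bool default_value)
{
    if (value == NULL) {
        return default_value;
    }
    if (strcmp(value, "on") == 0) {
        return true;
    }
    if (strcmp(value, "off") == 0) {
        return false;
    }
    return default_value;
}
```

With a default of true, omitting the option keeps invalidation enabled, and only an explicit "off" disables it.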
Stefano Garzarella Feb. 27, 2019, 9:57 a.m. UTC | #3
On Tue, Feb 26, 2019 at 03:35:49PM +0000, Stefan Hajnoczi wrote:
> Commit dd577a26ff03b6829721b1ffbbf9e7c411b72378 ("block/file-posix:
> implement bdrv_co_invalidate_cache() on Linux") introduced page cache
> invalidation so that cache.direct=off live migration is safe on Linux.
> 
> The invalidation takes a significant amount of time when the file is
> large and present in the page cache.  Normally this is not the case for
> cross-host live migration but it can happen when migrating between QEMU
> processes on the same host.
> 
> On same-host migration we don't need to invalidate pages for correctness
> anyway, so an option to skip page cache invalidation is useful.  I
> investigated optimizing invalidation and detecting same-host migration,
> but both are hard to achieve so a user-visible option will suffice.
> 
> As a bonus this option means that the cache invalidation feature will
> now be detectable by libvirt via QMP schema introspection.
> 
> Suggested-by: Neil Skrypuch <neil@tembosocial.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  qapi/block-core.json |  5 +++++
>  block/file-posix.c   | 14 ++++++++++++++
>  2 files changed, 19 insertions(+)
> 

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Stefan Hajnoczi Feb. 27, 2019, 3:33 p.m. UTC | #4
On Tue, Feb 26, 2019 at 10:54:55AM -0600, Eric Blake wrote:
> On 2/26/19 9:35 AM, Stefan Hajnoczi wrote:
> > Commit dd577a26ff03b6829721b1ffbbf9e7c411b72378 ("block/file-posix:
> > implement bdrv_co_invalidate_cache() on Linux") introduced page cache
> > invalidation so that cache.direct=off live migration is safe on Linux.
> > 
> > The invalidation takes a significant amount of time when the file is
> > large and present in the page cache.  Normally this is not the case for
> > cross-host live migration but it can happen when migrating between QEMU
> > processes on the same host.
> > 
> > On same-host migration we don't need to invalidate pages for correctness
> > anyway, so an option to skip page cache invalidation is useful.  I
> > investigated optimizing invalidation and detecting same-host migration,
> > but both are hard to achieve so a user-visible option will suffice.
> > 
> > As a bonus this option means that the cache invalidation feature will
> > now be detectable by libvirt via QMP schema introspection.
> 
> Do you still want to pursue the QMP query-qemu-features command, or does
> this delay that for another day?

The presence of this new option doesn't guarantee that dropping caches
works.  It is currently only implemented on Linux.

We still need query-qemu-features so that libvirt can detect whether
this QEMU binary can drop caches (e.g. on Linux vs FreeBSD).

Stefan
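
The distinction drawn here, between the option being present and the host actually being able to drop caches, can be illustrated with a small C sketch. Both function names are hypothetical and do not come from QEMU; the sketch only assumes, as stated above, that the invalidation is implemented on Linux alone.

```c
#include <stdbool.h>

/*
 * Hypothetical capability check: true only where cache invalidation is
 * assumed to be implemented (Linux, per the discussion above).
 */
static bool host_can_drop_cache(void)
{
#if defined(__linux__)
    return true;
#else
    return false;
#endif
}

/*
 * The user-visible option can be accepted on every host, but it has an
 * effect only where the host supports dropping caches.
 */
static bool drop_cache_effective(bool option_enabled)
{
    return option_enabled && host_can_drop_cache();
}
```

A management tool that sees the option in the schema learns only that option_enabled can be set, not what host_can_drop_cache() would return, which is why a separate feature query is still wanted.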
Daniel P. Berrangé Feb. 27, 2019, 3:56 p.m. UTC | #5
On Wed, Feb 27, 2019 at 03:33:11PM +0000, Stefan Hajnoczi wrote:
> On Tue, Feb 26, 2019 at 10:54:55AM -0600, Eric Blake wrote:
> > On 2/26/19 9:35 AM, Stefan Hajnoczi wrote:
> > > Commit dd577a26ff03b6829721b1ffbbf9e7c411b72378 ("block/file-posix:
> > > implement bdrv_co_invalidate_cache() on Linux") introduced page cache
> > > invalidation so that cache.direct=off live migration is safe on Linux.
> > > 
> > > The invalidation takes a significant amount of time when the file is
> > > large and present in the page cache.  Normally this is not the case for
> > > cross-host live migration but it can happen when migrating between QEMU
> > > processes on the same host.
> > > 
> > > On same-host migration we don't need to invalidate pages for correctness
> > > anyway, so an option to skip page cache invalidation is useful.  I
> > > investigated optimizing invalidation and detecting same-host migration,
> > > but both are hard to achieve so a user-visible option will suffice.
> > > 
> > > As a bonus this option means that the cache invalidation feature will
> > > now be detectable by libvirt via QMP schema introspection.
> > 
> > Do you still want to pursue the QMP query-qemu-features command, or does
> > this delay that for another day?
> 
> The presence of this new option doesn't guarantee that dropping caches
> works.  It is currently only implemented on Linux.
> 
> We still need query-qemu-features so that libvirt can detect whether
> this QEMU binary can drop caches (e.g. on Linux vs FreeBSD).

The commit message said that libvirt would use this new option to
detect availability of the cache drop feature. That should probably
be removed from the commit message, as this caveat about non-portable
impl means libvirt can't actually rely on it.

Regards,
Daniel
Stefan Hajnoczi Feb. 27, 2019, 5:36 p.m. UTC | #6
On Wed, Feb 27, 2019 at 03:56:27PM +0000, Daniel P. Berrangé wrote:
> On Wed, Feb 27, 2019 at 03:33:11PM +0000, Stefan Hajnoczi wrote:
> > On Tue, Feb 26, 2019 at 10:54:55AM -0600, Eric Blake wrote:
> > > On 2/26/19 9:35 AM, Stefan Hajnoczi wrote:
> > > > Commit dd577a26ff03b6829721b1ffbbf9e7c411b72378 ("block/file-posix:
> > > > implement bdrv_co_invalidate_cache() on Linux") introduced page cache
> > > > invalidation so that cache.direct=off live migration is safe on Linux.
> > > > 
> > > > The invalidation takes a significant amount of time when the file is
> > > > large and present in the page cache.  Normally this is not the case for
> > > > cross-host live migration but it can happen when migrating between QEMU
> > > > processes on the same host.
> > > > 
> > > > On same-host migration we don't need to invalidate pages for correctness
> > > > anyway, so an option to skip page cache invalidation is useful.  I
> > > > investigated optimizing invalidation and detecting same-host migration,
> > > > but both are hard to achieve so a user-visible option will suffice.
> > > > 
> > > > As a bonus this option means that the cache invalidation feature will
> > > > now be detectable by libvirt via QMP schema introspection.
> > > 
> > > Do you still want to pursue the QMP query-qemu-features command, or does
> > > this delay that for another day?
> > 
> > The presence of this new option doesn't guarantee that dropping caches
> > works.  It is currently only implemented on Linux.
> > 
> > We still need query-qemu-features so that libvirt can detect whether
> > this QEMU binary can drop caches (e.g. on Linux vs FreeBSD).
> 
> The commit message said that libvirt would use this new option to
> detect availability of the cache drop feature. That should probably
> be removed from the commit message, as this caveat about non-portable
> impl means libvirt can't actually rely on it.

Okay, will fix.

Stefan
Patch

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 2b8afbb924..d4cc3c4294 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2807,6 +2807,10 @@ 
 # @locking:     whether to enable file locking. If set to 'auto', only enable
 #               when Open File Descriptor (OFD) locking API is available
 #               (default: auto, since 2.10)
+# @drop-cache:  invalidate page cache during live migration.  This prevents
+#               stale data on the migration destination with cache.direct=off.
+#               Currently only supported on Linux hosts.
+#               (default: on, since: 4.0)
 # @x-check-cache-dropped: whether to check that page cache was dropped on live
 #                         migration.  May cause noticeable delays if the image
 #                         file is large, do not use in production.
@@ -2819,6 +2823,7 @@ 
             '*pr-manager': 'str',
             '*locking': 'OnOffAuto',
             '*aio': 'BlockdevAioOptions',
+            '*drop-cache': 'bool',
             '*x-check-cache-dropped': 'bool' } }
 
 ##
diff --git a/block/file-posix.c b/block/file-posix.c
index ba6ab62a38..7bb2c4762f 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -154,6 +154,7 @@  typedef struct BDRVRawState {
     bool page_cache_inconsistent:1;
     bool has_fallocate;
     bool needs_alignment;
+    bool drop_cache;
     bool check_cache_dropped;
 
     PRManager *pr_mgr;
@@ -162,6 +163,7 @@  typedef struct BDRVRawState {
 typedef struct BDRVRawReopenState {
     int fd;
     int open_flags;
+    bool drop_cache;
     bool check_cache_dropped;
 } BDRVRawReopenState;
 
@@ -422,6 +424,11 @@  static QemuOptsList raw_runtime_opts = {
             .type = QEMU_OPT_STRING,
             .help = "id of persistent reservation manager object (default: none)",
         },
+        {
+            .name = "drop-cache",
+            .type = QEMU_OPT_BOOL,
+            .help = "invalidate page cache during live migration (default: on)",
+        },
         {
             .name = "x-check-cache-dropped",
             .type = QEMU_OPT_BOOL,
@@ -511,6 +518,7 @@  static int raw_open_common(BlockDriverState *bs, QDict *options,
         }
     }
 
+    s->drop_cache = qemu_opt_get_bool(opts, "drop-cache", true);
     s->check_cache_dropped = qemu_opt_get_bool(opts, "x-check-cache-dropped",
                                                false);
 
@@ -869,6 +877,7 @@  static int raw_reopen_prepare(BDRVReopenState *state,
         goto out;
     }
 
+    rs->drop_cache = qemu_opt_get_bool_del(opts, "drop-cache", true);
     rs->check_cache_dropped =
         qemu_opt_get_bool_del(opts, "x-check-cache-dropped", false);
 
@@ -946,6 +955,7 @@  static void raw_reopen_commit(BDRVReopenState *state)
     BDRVRawState *s = state->bs->opaque;
     Error *local_err = NULL;
 
+    s->drop_cache = rs->drop_cache;
     s->check_cache_dropped = rs->check_cache_dropped;
     s->open_flags = rs->open_flags;
 
@@ -2531,6 +2541,10 @@  static void coroutine_fn raw_co_invalidate_cache(BlockDriverState *bs,
         return;
     }
 
+    if (!s->drop_cache) {
+        return;
+    }
+
     if (s->open_flags & O_DIRECT) {
         return; /* No host kernel page cache */
     }
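
The early-return ordering added in the last hunk can be sketched as a standalone C function for a Linux host. This is a simplified illustration under stated assumptions, not the code in block/file-posix.c: struct file_state and invalidate_page_cache() are hypothetical names, and the sketch assumes that flushing dirty pages with fdatasync() and then advising the kernel with POSIX_FADV_DONTNEED is an acceptable way to drop cached pages. The patch itself does not show how the invalidation is performed.

```c
#define _GNU_SOURCE /* exposes O_DIRECT on Linux/glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for the per-file driver state. */
struct file_state {
    int fd;
    int open_flags;
    bool drop_cache;
};

/*
 * Returns 0 when invalidation is skipped or succeeds, otherwise a
 * positive error number.
 */
static int invalidate_page_cache(const struct file_state *s)
{
    if (!s->drop_cache) {
        return 0; /* user opted out, e.g. for same-host migration */
    }
#ifdef O_DIRECT
    if (s->open_flags & O_DIRECT) {
        return 0; /* no host kernel page cache to drop */
    }
#endif
    /* Dirty pages cannot be dropped, so write them back first. */
    if (fdatasync(s->fd) < 0) {
        return errno;
    }
    /* offset 0 and len 0 cover the whole file; returns an error number. */
    return posix_fadvise(s->fd, 0, 0, POSIX_FADV_DONTNEED);
}
```

Checking the opt-out flag first means that a disabled option costs nothing, regardless of file size or how much of the file is cached.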