[v2] file-posix: add drop-cache=on|off option

Message ID 20190301160929.19892-1-stefanha@redhat.com
State New

Commit Message

Stefan Hajnoczi March 1, 2019, 4:09 p.m. UTC
Commit dd577a26ff03b6829721b1ffbbf9e7c411b72378 ("block/file-posix:
implement bdrv_co_invalidate_cache() on Linux") introduced page cache
invalidation so that cache.direct=off live migration is safe on Linux.

The invalidation takes a significant amount of time when the file is
large and present in the page cache.  Normally this is not the case for
cross-host live migration but it can happen when migrating between QEMU
processes on the same host.

On same-host migration we don't need to invalidate pages for correctness
anyway, so an option to skip page cache invalidation is useful.  I
investigated optimizing invalidation and detecting same-host migration,
but both are hard to achieve so a user-visible option will suffice.

Suggested-by: Neil Skrypuch <neil@tembosocial.com>
Tested-by: Neil Skrypuch <neil@tembosocial.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
v2:
 * Remove outdated comment about libvirt feature detection [danpb]
---
 qapi/block-core.json |  5 +++++
 block/file-posix.c   | 14 ++++++++++++++
 2 files changed, 19 insertions(+)

Comments

Eric Blake March 1, 2019, 4:30 p.m. UTC | #1
On 3/1/19 10:09 AM, Stefan Hajnoczi wrote:
> Commit dd577a26ff03b6829721b1ffbbf9e7c411b72378 ("block/file-posix:
> implement bdrv_co_invalidate_cache() on Linux") introduced page cache
> invalidation so that cache.direct=off live migration is safe on Linux.
> 
> The invalidation takes a significant amount of time when the file is
> large and present in the page cache.  Normally this is not the case for
> cross-host live migration but it can happen when migrating between QEMU
> processes on the same host.
> 
> On same-host migration we don't need to invalidate pages for correctness
> anyway, so an option to skip page cache invalidation is useful.  I
> investigated optimizing invalidation and detecting same-host migration,
> but both are hard to achieve so a user-visible option will suffice.
> 
> Suggested-by: Neil Skrypuch <neil@tembosocial.com>
> Tested-by: Neil Skrypuch <neil@tembosocial.com>
> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> v2:
>  * Remove outdated comment about libvirt feature detection [danpb]

Question - if we used qapi's 'if':COND to only declare the field on
platforms where we know at compile time that we can support it, would
that be enough for libvirt to introspect that if the field exists then
migration is safe, without having to rely on a query-qemu-features command?

> +++ b/qapi/block-core.json
> @@ -2807,6 +2807,10 @@
>  # @locking:     whether to enable file locking. If set to 'auto', only enable
>  #               when Open File Descriptor (OFD) locking API is available
>  #               (default: auto, since 2.10)
> +# @drop-cache:  invalidate page cache during live migration.  This prevents
> +#               stale data on the migration destination with cache.direct=off.
> +#               Currently only supported on Linux hosts.
> +#               (default: on, since: 4.0)
>  # @x-check-cache-dropped: whether to check that page cache was dropped on live
>  #                         migration.  May cause noticeable delays if the image
>  #                         file is large, do not use in production.
> @@ -2819,6 +2823,7 @@
>              '*pr-manager': 'str',
>              '*locking': 'OnOffAuto',
>              '*aio': 'BlockdevAioOptions',
> +            '*drop-cache': 'bool',
>              '*x-check-cache-dropped': 'bool' } }

In other words, now that we can use 'if' to hide features that aren't
supported based on compile-time knowledge, shouldn't we use that to make
the doc comment "only supported on Linux hosts" introspectible?

Markus Armbruster March 4, 2019, 8:51 a.m. UTC | #2
Eric Blake <eblake@redhat.com> writes:

> On 3/1/19 10:09 AM, Stefan Hajnoczi wrote:
>> Commit dd577a26ff03b6829721b1ffbbf9e7c411b72378 ("block/file-posix:
>> implement bdrv_co_invalidate_cache() on Linux") introduced page cache
>> invalidation so that cache.direct=off live migration is safe on Linux.
>> 
>> The invalidation takes a significant amount of time when the file is
>> large and present in the page cache.  Normally this is not the case for
>> cross-host live migration but it can happen when migrating between QEMU
>> processes on the same host.
>> 
>> On same-host migration we don't need to invalidate pages for correctness
>> anyway, so an option to skip page cache invalidation is useful.  I
>> investigated optimizing invalidation and detecting same-host migration,
>> but both are hard to achieve so a user-visible option will suffice.
>> 
>> Suggested-by: Neil Skrypuch <neil@tembosocial.com>
>> Tested-by: Neil Skrypuch <neil@tembosocial.com>
>> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>> v2:
>>  * Remove outdated comment about libvirt feature detection [danpb]
>
> Question - if we used qapi's 'if':COND to only declare the field on
> platforms where we know at compile time that we can support it, would
> that be enough for libvirt to introspect that if the field exists then
> migration is safe, without having to rely on a query-qemu-features command?
>
>> +++ b/qapi/block-core.json
>> @@ -2807,6 +2807,10 @@
>>  # @locking:     whether to enable file locking. If set to 'auto', only enable
>>  #               when Open File Descriptor (OFD) locking API is available
>>  #               (default: auto, since 2.10)
>> +# @drop-cache:  invalidate page cache during live migration.  This prevents
>> +#               stale data on the migration destination with cache.direct=off.
>> +#               Currently only supported on Linux hosts.
>> +#               (default: on, since: 4.0)
>>  # @x-check-cache-dropped: whether to check that page cache was dropped on live
>>  #                         migration.  May cause noticeable delays if the image
>>  #                         file is large, do not use in production.
>> @@ -2819,6 +2823,7 @@
>>              '*pr-manager': 'str',
>>              '*locking': 'OnOffAuto',
>>              '*aio': 'BlockdevAioOptions',
>> +            '*drop-cache': 'bool',
>>              '*x-check-cache-dropped': 'bool' } }
>
> In other words, now that we can use 'if' to hide features that aren't
> supported based on compile-time knowledge, shouldn't we use that to make
> the doc comment "only supported on Linux hosts" introspectible?

When the code behind some QAPI entity is under #if conditional, we
should consider putting the QAPI entity under the same conditional.
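
As a rough illustration of keeping a schema member and the code behind it under the same conditional, here is a minimal C sketch. It is not generated QAPI output. `SketchFileOptions`, `SKETCH_CONFIG_LINUX`, and `sketch_effective_drop_cache()` are hypothetical names, and `__linux__` merely stands in for whatever build-time condition QEMU would actually use.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a build-time macro such as CONFIG_LINUX. */
#if defined(__linux__)
#define SKETCH_CONFIG_LINUX 1
#endif

/* Hypothetical, trimmed-down options struct (not real generated code). */
struct SketchFileOptions {
    bool has_x_check_cache_dropped;
    bool x_check_cache_dropped;
#if defined(SKETCH_CONFIG_LINUX)
    /* The member exists only in builds that implement the feature. */
    bool has_drop_cache;
    bool drop_cache;
#endif
};

/*
 * Resolve the effective setting.  The code that touches the member sits
 * under the same conditional as the member itself, so the two cannot
 * drift apart.
 */
static bool sketch_effective_drop_cache(const struct SketchFileOptions *o)
{
#if defined(SKETCH_CONFIG_LINUX)
    return o->has_drop_cache ? o->drop_cache : true; /* default: on */
#else
    (void)o;
    return false; /* feature not compiled in, nothing to drop */
#endif
}
```

With this shape, a build without the feature has no member to set at all, so a client that sees the member can assume the build implements it.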

On introspection strategy, my general advice is to make do with
query-qmp-schema whenever that can be done easily and "naturally".
Adding stuff to the schema that nobody needs except to detect it in
query-qmp-schema wouldn't be "natural".

However, consider these three cases:

1. Feature is not compiled into this build

   E.g. @drop-cache is not implemented in this build configuration.

2. Compiled into this build, always works

   E.g. @drop-cache is implemented in this build configuration, and will
   work on any supported host capable of running this build.

3. Compiled into this build, but whether it works is known only at
   run-time

   E.g. @drop-cache is implemented in this build configuration, but
   drop-cache: true might be rejected on some hosts.

As far as I understand, case 3. doesn't exist right now for @drop-cache.

As long as that's the case, query-qmp-schema suffices.

If it's not the case, we may want to use other means for it, such as
query-qmp-features.  If we anticipate it not being the case, we might
want to use other means even sooner.

Stefan Hajnoczi March 6, 2019, 10:18 a.m. UTC | #3
On Fri, Mar 01, 2019 at 10:30:07AM -0600, Eric Blake wrote:
> On 3/1/19 10:09 AM, Stefan Hajnoczi wrote:
> > Commit dd577a26ff03b6829721b1ffbbf9e7c411b72378 ("block/file-posix:
> > implement bdrv_co_invalidate_cache() on Linux") introduced page cache
> > invalidation so that cache.direct=off live migration is safe on Linux.
> > 
> > The invalidation takes a significant amount of time when the file is
> > large and present in the page cache.  Normally this is not the case for
> > cross-host live migration but it can happen when migrating between QEMU
> > processes on the same host.
> > 
> > On same-host migration we don't need to invalidate pages for correctness
> > anyway, so an option to skip page cache invalidation is useful.  I
> > investigated optimizing invalidation and detecting same-host migration,
> > but both are hard to achieve so a user-visible option will suffice.
> > 
> > Suggested-by: Neil Skrypuch <neil@tembosocial.com>
> > Tested-by: Neil Skrypuch <neil@tembosocial.com>
> > Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
> > Reviewed-by: Eric Blake <eblake@redhat.com>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> > v2:
> >  * Remove outdated comment about libvirt feature detection [danpb]
> 
> Question - if we used qapi's 'if':COND to only declare the field on
> platforms where we know at compile time that we can support it, would
> that be enough for libvirt to introspect that if the field exists then
> migration is safe, without having to rely on a query-qemu-features command?

Yes, although this raises another question:

The drop-cache implementation is not #ifdefed in file-posix.c.  If we
make the QMP schema conditional, should we also #ifdef the command-line
option in raw_runtime_opts[] to prevent QEMU from silently ignoring this
option?

Stefan

Eric Blake March 6, 2019, 12:44 p.m. UTC | #4
On 3/6/19 4:18 AM, Stefan Hajnoczi wrote:

>> Question - if we used qapi's 'if':COND to only declare the field on
>> platforms where we know at compile time that we can support it, would
>> that be enough for libvirt to introspect that if the field exists then
>> migration is safe, without having to rely on a query-qemu-features command?
> 
> Yes, although this raises another question:
> 
> The drop-cache implementation is not #ifdefed in file-posix.c.  If we
> make the QMP schema conditional, should we also #ifdef the command-line
> option in raw_runtime_opts[] to prevent QEMU from silently ignoring this
> option?

Yes, that would make sense to me - if we don't advertise the feature,
then we should not silently ignore it on the command line.

Markus Armbruster March 6, 2019, 12:50 p.m. UTC | #5
Stefan Hajnoczi <stefanha@gmail.com> writes:

> On Fri, Mar 01, 2019 at 10:30:07AM -0600, Eric Blake wrote:
>> On 3/1/19 10:09 AM, Stefan Hajnoczi wrote:
>> > Commit dd577a26ff03b6829721b1ffbbf9e7c411b72378 ("block/file-posix:
>> > implement bdrv_co_invalidate_cache() on Linux") introduced page cache
>> > invalidation so that cache.direct=off live migration is safe on Linux.
>> > 
>> > The invalidation takes a significant amount of time when the file is
>> > large and present in the page cache.  Normally this is not the case for
>> > cross-host live migration but it can happen when migrating between QEMU
>> > processes on the same host.
>> > 
>> > On same-host migration we don't need to invalidate pages for correctness
>> > anyway, so an option to skip page cache invalidation is useful.  I
>> > investigated optimizing invalidation and detecting same-host migration,
>> > but both are hard to achieve so a user-visible option will suffice.
>> > 
>> > Suggested-by: Neil Skrypuch <neil@tembosocial.com>
>> > Tested-by: Neil Skrypuch <neil@tembosocial.com>
>> > Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
>> > Reviewed-by: Eric Blake <eblake@redhat.com>
>> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>> > ---
>> > v2:
>> >  * Remove outdated comment about libvirt feature detection [danpb]
>> 
>> Question - if we used qapi's 'if':COND to only declare the field on
>> platforms where we know at compile time that we can support it, would
>> that be enough for libvirt to introspect that if the field exists then
>> migration is safe, without having to rely on a query-qemu-features command?
>
> Yes, although this raises another question:
>
> The drop-cache implementation is not #ifdefed in file-posix.c.  If we
> make the QMP schema conditional, should we also #ifdef the command-line
> option in raw_runtime_opts[] to prevent QEMU from silently ignoring this
> option?

Silently ignoring user directives is generally a bad idea.  Possible
exceptions include directives of a "$frobnicate if you can" kind.
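
To make the "reject rather than silently ignore" point concrete, here is a small self-contained C sketch. It assumes a simplified option table. `sketch_opt_desc`, `sketch_runtime_opts`, and `sketch_validate_option()` are hypothetical and are not QEMU's QemuOptsList API. `__linux__` stands in for the real build condition.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified option descriptor. */
struct sketch_opt_desc {
    const char *name;
    const char *help;
};

static const struct sketch_opt_desc sketch_runtime_opts[] = {
    { "locking", "file locking mode (on/off/auto)" },
#if defined(__linux__)
    /* Listed only where the feature is implemented. */
    { "drop-cache",
      "invalidate page cache during live migration (default: on)" },
#endif
    { "x-check-cache-dropped", "check that page cache was dropped" },
};

/*
 * Returns 0 if the option is known to this build, -1 otherwise.  A
 * caller that treats -1 as a hard error never silently ignores an
 * option the build cannot honor.
 */
static int sketch_validate_option(const char *name)
{
    size_t n = sizeof(sketch_runtime_opts) / sizeof(sketch_runtime_opts[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(sketch_runtime_opts[i].name, name) == 0) {
            return 0;
        }
    }
    return -1;
}
```

On a build without the entry, `sketch_validate_option("drop-cache")` returns -1, so the user gets an error instead of an option that quietly does nothing.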

Patch

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 2b8afbb924..d4cc3c4294 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2807,6 +2807,10 @@ 
 # @locking:     whether to enable file locking. If set to 'auto', only enable
 #               when Open File Descriptor (OFD) locking API is available
 #               (default: auto, since 2.10)
+# @drop-cache:  invalidate page cache during live migration.  This prevents
+#               stale data on the migration destination with cache.direct=off.
+#               Currently only supported on Linux hosts.
+#               (default: on, since: 4.0)
 # @x-check-cache-dropped: whether to check that page cache was dropped on live
 #                         migration.  May cause noticeable delays if the image
 #                         file is large, do not use in production.
@@ -2819,6 +2823,7 @@ 
             '*pr-manager': 'str',
             '*locking': 'OnOffAuto',
             '*aio': 'BlockdevAioOptions',
+            '*drop-cache': 'bool',
             '*x-check-cache-dropped': 'bool' } }
 
 ##
diff --git a/block/file-posix.c b/block/file-posix.c
index ba6ab62a38..7bb2c4762f 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -154,6 +154,7 @@  typedef struct BDRVRawState {
     bool page_cache_inconsistent:1;
     bool has_fallocate;
     bool needs_alignment;
+    bool drop_cache;
     bool check_cache_dropped;
 
     PRManager *pr_mgr;
@@ -162,6 +163,7 @@  typedef struct BDRVRawState {
 typedef struct BDRVRawReopenState {
     int fd;
     int open_flags;
+    bool drop_cache;
     bool check_cache_dropped;
 } BDRVRawReopenState;
 
@@ -422,6 +424,11 @@  static QemuOptsList raw_runtime_opts = {
             .type = QEMU_OPT_STRING,
             .help = "id of persistent reservation manager object (default: none)",
         },
+        {
+            .name = "drop-cache",
+            .type = QEMU_OPT_BOOL,
+            .help = "invalidate page cache during live migration (default: on)",
+        },
         {
             .name = "x-check-cache-dropped",
             .type = QEMU_OPT_BOOL,
@@ -511,6 +518,7 @@  static int raw_open_common(BlockDriverState *bs, QDict *options,
         }
     }
 
+    s->drop_cache = qemu_opt_get_bool(opts, "drop-cache", true);
     s->check_cache_dropped = qemu_opt_get_bool(opts, "x-check-cache-dropped",
                                                false);
 
@@ -869,6 +877,7 @@  static int raw_reopen_prepare(BDRVReopenState *state,
         goto out;
     }
 
+    rs->drop_cache = qemu_opt_get_bool_del(opts, "drop-cache", true);
     rs->check_cache_dropped =
         qemu_opt_get_bool_del(opts, "x-check-cache-dropped", false);
 
@@ -946,6 +955,7 @@  static void raw_reopen_commit(BDRVReopenState *state)
     BDRVRawState *s = state->bs->opaque;
     Error *local_err = NULL;
 
+    s->drop_cache = rs->drop_cache;
     s->check_cache_dropped = rs->check_cache_dropped;
     s->open_flags = rs->open_flags;
 
@@ -2531,6 +2541,10 @@  static void coroutine_fn raw_co_invalidate_cache(BlockDriverState *bs,
         return;
     }
 
+    if (!s->drop_cache) {
+        return;
+    }
+
     if (s->open_flags & O_DIRECT) {
         return; /* No host kernel page cache */
     }
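
For readers without the tree at hand, the control flow added by the last hunk can be sketched standalone. This is a minimal illustration, not the code in block/file-posix.c. `sketch_raw_state` and `sketch_invalidate_cache()` are hypothetical, and using posix_fadvise() with POSIX_FADV_DONTNEED as the invalidation step is an assumption about how the page cache would be dropped on Linux.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for BDRVRawState. */
struct sketch_raw_state {
    int fd;
    bool direct;     /* true if the file was opened with O_DIRECT */
    bool drop_cache; /* mirrors the new drop-cache option */
};

/*
 * Returns 1 if invalidation was skipped, 0 if the kernel accepted the
 * request, or a negative errno value on failure.
 */
static int sketch_invalidate_cache(const struct sketch_raw_state *s)
{
    if (!s->drop_cache) {
        return 1; /* user opted out, e.g. same-host migration */
    }
    if (s->direct) {
        return 1; /* no host kernel page cache to drop */
    }
#if defined(__linux__)
    /* Assumed invalidation step: ask the kernel to drop cached pages. */
    int ret = posix_fadvise(s->fd, 0, 0, POSIX_FADV_DONTNEED);
    return ret == 0 ? 0 : -ret;
#else
    return 1; /* not implemented on this host */
#endif
}
```

The drop_cache check comes first, so drop-cache=off skips the expensive path whatever the other settings are.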