Message ID | 20220701155935.482503-3-leobras@redhat.com |
---|---|
State | New |
Headers | show |
Series | Zero copy improvements (QIOChannel + multifd) | expand |
Leonardo Bras <leobras@redhat.com> writes: > Signed-off-by: Leonardo Bras <leobras@redhat.com> > --- > qapi/migration.json | 5 ++++- > migration/migration.c | 1 + > monitor/hmp-cmds.c | 4 ++++ > 3 files changed, 9 insertions(+), 1 deletion(-) > > diff --git a/qapi/migration.json b/qapi/migration.json > index 7102e474a6..925f009868 100644 > --- a/qapi/migration.json > +++ b/qapi/migration.json > @@ -55,6 +55,9 @@ > # @postcopy-bytes: The number of bytes sent during the post-copy phase > # (since 7.0). > # > +# @zero-copy-copied: The number of zero-copy flushes that reported data sent > +# using zero-copy that ended up being copied. (since 7.2) The description feels awkward. What's a "zero-copy flush", and why should the user care? I figure what users care about is the number of all-zero pages we had to "copy", i.e. send the bulky way. Is this what @zero-copy-copied reports? > +# > # Since: 0.14 > ## > { 'struct': 'MigrationStats', > @@ -65,7 +68,7 @@ > 'postcopy-requests' : 'int', 'page-size' : 'int', > 'multifd-bytes' : 'uint64', 'pages-per-second' : 'uint64', > 'precopy-bytes' : 'uint64', 'downtime-bytes' : 'uint64', > - 'postcopy-bytes' : 'uint64' } } > + 'postcopy-bytes' : 'uint64', 'zero-copy-copied' : 'uint64' } } > > ## > # @XBZRLECacheStats:
On Mon, Jul 04, 2022 at 08:18:54AM +0200, Markus Armbruster wrote: > Leonardo Bras <leobras@redhat.com> writes: > > > Signed-off-by: Leonardo Bras <leobras@redhat.com> > > --- > > qapi/migration.json | 5 ++++- > > migration/migration.c | 1 + > > monitor/hmp-cmds.c | 4 ++++ > > 3 files changed, 9 insertions(+), 1 deletion(-) > > > > diff --git a/qapi/migration.json b/qapi/migration.json > > index 7102e474a6..925f009868 100644 > > --- a/qapi/migration.json > > +++ b/qapi/migration.json > > @@ -55,6 +55,9 @@ > > # @postcopy-bytes: The number of bytes sent during the post-copy phase > > # (since 7.0). > > # > > +# @zero-copy-copied: The number of zero-copy flushes that reported data sent > > +# using zero-copy that ended up being copied. (since 7.2) > > The description feels awkward. What's a "zero-copy flush", and why > should the user care? I figure what users care about is the number of > all-zero pages we had to "copy", i.e. send the bulky way. Is this what > @zero-copy-copied reports? MigrationCapability field @zero-copy-send instructs QEMU to try to avoid copying data between userspace and kernel space when transmitting RAM region. Even if the kernel supports zero copy, it is not guaranteed to happen, it is merely a request to try. QEMU periodically (once per migration iteration) flushes outstanding zero-copy requests and gets an indication back of whether any copies took place or not. So this counter is a reflection of how many iterations resulted in zero-copy not being fully honoured. IOW, ideally this counter will always be zero. If it is non-zero, then the magnitude gives a very very very rough guide to what's going on. If it is '1' then it was just a transient limitation. If it matches the number of migration iterations, then it is a more systemic limitation. Incidentally, do we report the migration iteration count ? I thought we did, but i'm not finding it now that I look. With regards, Daniel
Daniel P. Berrangé <berrange@redhat.com> writes: > On Mon, Jul 04, 2022 at 08:18:54AM +0200, Markus Armbruster wrote: >> Leonardo Bras <leobras@redhat.com> writes: >> >> > Signed-off-by: Leonardo Bras <leobras@redhat.com> >> > --- >> > qapi/migration.json | 5 ++++- >> > migration/migration.c | 1 + >> > monitor/hmp-cmds.c | 4 ++++ >> > 3 files changed, 9 insertions(+), 1 deletion(-) >> > >> > diff --git a/qapi/migration.json b/qapi/migration.json >> > index 7102e474a6..925f009868 100644 >> > --- a/qapi/migration.json >> > +++ b/qapi/migration.json >> > @@ -55,6 +55,9 @@ >> > # @postcopy-bytes: The number of bytes sent during the post-copy phase >> > # (since 7.0). >> > # >> > +# @zero-copy-copied: The number of zero-copy flushes that reported data sent >> > +# using zero-copy that ended up being copied. (since 7.2) >> >> The description feels awkward. What's a "zero-copy flush", and why >> should the user care? I figure what users care about is the number of >> all-zero pages we had to "copy", i.e. send the bulky way. Is this what >> @zero-copy-copied reports? > > MigrationCapability field @zero-copy-send instructs QEMU to try to > avoid copying data between userspace and kernel space when transmitting > RAM region. > > Even if the kernel supports zero copy, it is not guaranteed to happen, > it is merely a request to try. > > QEMU periodically (once per migration iteration) flushes outstanding > zero-copy requests and gets an indication back of whether any copies > took place or not. > > So this counter is a reflection of how many iterations resulted in > zero-copy not being fully honoured. Aha. Thanks! > IOW, ideally this counter will always be zero. If it is non-zero, > then the magnitude gives a very very very rough guide to what's > going on. If it is '1' then it was just a transient limitation. > If it matches the number of migration iterations, then it is a > more systemic limitation. I think we should rephrase the documentation in terms of migration iterations instead of "flushes", because the latter is a non-obvious term without a definition. > Incidentally, do we report the migration iteration count ? I > thought we did, but i'm not finding it now that I look. I was about to ask exactly that. I'm not sure the value of @zero-copy-copied can be usefully interpreted without.
* Daniel P. Berrangé (berrange@redhat.com) wrote: > On Mon, Jul 04, 2022 at 08:18:54AM +0200, Markus Armbruster wrote: > > Leonardo Bras <leobras@redhat.com> writes: > > > > > Signed-off-by: Leonardo Bras <leobras@redhat.com> > > > --- > > > qapi/migration.json | 5 ++++- > > > migration/migration.c | 1 + > > > monitor/hmp-cmds.c | 4 ++++ > > > 3 files changed, 9 insertions(+), 1 deletion(-) > > > > > > diff --git a/qapi/migration.json b/qapi/migration.json > > > index 7102e474a6..925f009868 100644 > > > --- a/qapi/migration.json > > > +++ b/qapi/migration.json > > > @@ -55,6 +55,9 @@ > > > # @postcopy-bytes: The number of bytes sent during the post-copy phase > > > # (since 7.0). > > > # > > > +# @zero-copy-copied: The number of zero-copy flushes that reported data sent > > > +# using zero-copy that ended up being copied. (since 7.2) > > > > The description feels awkward. What's a "zero-copy flush", and why > > should the user care? I figure what users care about is the number of > > all-zero pages we had to "copy", i.e. send the bulky way. Is this what > > @zero-copy-copied reports? > > MigrationCapability field @zero-copy-send instructs QEMU to try to > avoid copying data between userspace and kernel space when transmitting > RAM region. > > Even if the kernel supports zero copy, it is not guaranteed to happen, > it is merely a request to try. > > QEMU periodically (once per migration iteration) flushes outstanding > zero-copy requests and gets an indication back of whether any copies > took place or not. > > So this counter is a reflection of how many iterations resulted in > zero-copy not being fully honoured. > > IOW, ideally this counter will always be zero. If it is non-zero, > then the magnitude gives a very very very rough guide to what's > going on. If it is '1' then it was just a transient limitation. > If it matches the number of migration iterations, then it is a > more systemic limitation. > > Incidentally, do we report the migration iteration count ? I > thought we did, but i'm not finding it now that I look. Yes we do; it's dirty-sync-count Dave > > With regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| >
"Dr. David Alan Gilbert" <dgilbert@redhat.com> writes: > * Daniel P. Berrangé (berrange@redhat.com) wrote: >> On Mon, Jul 04, 2022 at 08:18:54AM +0200, Markus Armbruster wrote: >> > Leonardo Bras <leobras@redhat.com> writes: >> > >> > > Signed-off-by: Leonardo Bras <leobras@redhat.com> >> > > --- >> > > qapi/migration.json | 5 ++++- >> > > migration/migration.c | 1 + >> > > monitor/hmp-cmds.c | 4 ++++ >> > > 3 files changed, 9 insertions(+), 1 deletion(-) >> > > >> > > diff --git a/qapi/migration.json b/qapi/migration.json >> > > index 7102e474a6..925f009868 100644 >> > > --- a/qapi/migration.json >> > > +++ b/qapi/migration.json >> > > @@ -55,6 +55,9 @@ >> > > # @postcopy-bytes: The number of bytes sent during the post-copy phase >> > > # (since 7.0). >> > > # >> > > +# @zero-copy-copied: The number of zero-copy flushes that reported data sent >> > > +# using zero-copy that ended up being copied. (since 7.2) since 7.1, unless you're planning for really tortuous review :) >> > >> > The description feels awkward. What's a "zero-copy flush", and why >> > should the user care? I figure what users care about is the number of >> > all-zero pages we had to "copy", i.e. send the bulky way. Is this what >> > @zero-copy-copied reports? >> >> MigrationCapability field @zero-copy-send instructs QEMU to try to >> avoid copying data between userspace and kernel space when transmitting >> RAM region. >> >> Even if the kernel supports zero copy, it is not guaranteed to happen, >> it is merely a request to try. >> >> QEMU periodically (once per migration iteration) flushes outstanding >> zero-copy requests and gets an indication back of whether any copies >> took place or not. >> >> So this counter is a reflection of how many iterations resulted in >> zero-copy not being fully honoured. >> >> IOW, ideally this counter will always be zero. If it is non-zero, >> then the magnitude gives a very very very rough guide to what's >> going on. If it is '1' then it was just a transient limitation. >> If it matches the number of migration iterations, then it is a >> more systemic limitation. >> >> Incidentally, do we report the migration iteration count ? I >> thought we did, but i'm not finding it now that I look. > > Yes we do; it's dirty-sync-count Please rephrase the documentation of @zero-copy-copied in terms of @dirty-sync-count. Here's my attempt. # @zero-copy-copied: Number of times dirty RAM synchronization could not # avoid copying zero pages. This is between 0 and # @dirty-sync-count. (since 7.1)
On Mon, Jul 04, 2022 at 02:04:41PM +0200, Markus Armbruster wrote: > "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes: > > > * Daniel P. Berrangé (berrange@redhat.com) wrote: > >> On Mon, Jul 04, 2022 at 08:18:54AM +0200, Markus Armbruster wrote: > >> > Leonardo Bras <leobras@redhat.com> writes: > >> > > >> > > Signed-off-by: Leonardo Bras <leobras@redhat.com> > >> > > --- > >> > > qapi/migration.json | 5 ++++- > >> > > migration/migration.c | 1 + > >> > > monitor/hmp-cmds.c | 4 ++++ > >> > > 3 files changed, 9 insertions(+), 1 deletion(-) > >> > > > >> > > diff --git a/qapi/migration.json b/qapi/migration.json > >> > > index 7102e474a6..925f009868 100644 > >> > > --- a/qapi/migration.json > >> > > +++ b/qapi/migration.json > >> > > @@ -55,6 +55,9 @@ > >> > > # @postcopy-bytes: The number of bytes sent during the post-copy phase > >> > > # (since 7.0). > >> > > # > >> > > +# @zero-copy-copied: The number of zero-copy flushes that reported data sent > >> > > +# using zero-copy that ended up being copied. (since 7.2) > > since 7.1, unless you're planning for really tortuous review :) > > >> > > >> > The description feels awkward. What's a "zero-copy flush", and why > >> > should the user care? I figure what users care about is the number of > >> > all-zero pages we had to "copy", i.e. send the bulky way. Is this what > >> > @zero-copy-copied reports? > >> > >> MigrationCapability field @zero-copy-send instructs QEMU to try to > >> avoid copying data between userspace and kernel space when transmitting > >> RAM region. > >> > >> Even if the kernel supports zero copy, it is not guaranteed to happen, > >> it is merely a request to try. > >> > >> QEMU periodically (once per migration iteration) flushes outstanding > >> zero-copy requests and gets an indication back of whether any copies > >> took place or not. > >> > >> So this counter is a reflection of how many iterations resulted in > >> zero-copy not being fully honoured. > >> > >> IOW, ideally this counter will always be zero. If it is non-zero, > >> then the magnitude gives a very very very rough guide to what's > >> going on. If it is '1' then it was just a transient limitation. > >> If it matches the number of migration iterations, then it is a > >> more systemic limitation. > >> > >> Incidentally, do we report the migration iteration count ? I > >> thought we did, but i'm not finding it now that I look. > > > > Yes we do; it's dirty-sync-count > > Please rephrase the documentation of @zero-copy-copied in terms of > @dirty-sync-count. Here's my attempt. > > # @zero-copy-copied: Number of times dirty RAM synchronization could not > # avoid copying zero pages. This is between 0 and > # @dirty-sync-count. (since 7.1) Any one have preferences on the name - i get slight put off by the repeated word in the property name here. @zero-copy-rejects ? @zero-copy-misses ? @zero-copy-fails ? With regards, Daniel
Daniel P. Berrangé <berrange@redhat.com> writes: > On Mon, Jul 04, 2022 at 02:04:41PM +0200, Markus Armbruster wrote: [...] >> Please rephrase the documentation of @zero-copy-copied in terms of >> @dirty-sync-count. Here's my attempt. >> >> # @zero-copy-copied: Number of times dirty RAM synchronization could not >> # avoid copying zero pages. This is between 0 and >> # @dirty-sync-count. (since 7.1) > > Any one have preferences on the name - i get slight put off by the > repeated word in the property name here. > > @zero-copy-rejects ? > @zero-copy-misses ? > @zero-copy-fails ? I'd consider any of these an improvement. Not a native speaker, but perhaps "failures" instead of "fails". We could also express the connection to @dirty-sync-count right in the name, like @dirty-sync-rejected-zero-copy, @dirty-sync-missed-zero-copy, @dirty-sync-failed-zero-copy. Or maybe -copies.
On Mon, Jul 04, 2022 at 02:59:50PM +0200, Markus Armbruster wrote: > Daniel P. Berrangé <berrange@redhat.com> writes: > > > On Mon, Jul 04, 2022 at 02:04:41PM +0200, Markus Armbruster wrote: > > [...] > > >> Please rephrase the documentation of @zero-copy-copied in terms of > >> @dirty-sync-count. Here's my attempt. > >> > >> # @zero-copy-copied: Number of times dirty RAM synchronization could not > >> # avoid copying zero pages. This is between 0 and > >> # @dirty-sync-count. (since 7.1) > > > > Any one have preferences on the name - i get slight put off by the > > repeated word in the property name here. > > > > @zero-copy-rejects ? > > @zero-copy-misses ? > > @zero-copy-fails ? > > I'd consider any of these an improvement. Not a native speaker, but > perhaps "failures" instead of "fails". > > We could also express the connection to @dirty-sync-count right in the > name, like @dirty-sync-rejected-zero-copy, @dirty-sync-missed-zero-copy, > @dirty-sync-failed-zero-copy. Or maybe -copies. Yeah, @dirty-sync-missed-zero-copy is probably my favourite. With regards, Daniel
Thanks Daniel, Markus and Dave for the feedback! On Mon, 2022-07-04 at 14:14 +0100, Daniel P. Berrangé wrote: > On Mon, Jul 04, 2022 at 02:59:50PM +0200, Markus Armbruster wrote: > > Daniel P. Berrangé <berrange@redhat.com> writes: > > > > > On Mon, Jul 04, 2022 at 02:04:41PM +0200, Markus Armbruster wrote: > > [...] > > since 7.1, unless you're planning for really tortuous review :) > Ok, updated :) > > [...] > > > > > > Please rephrase the documentation of @zero-copy-copied in terms of > > > > @dirty-sync-count. Here's my attempt. > > > > > > > > # @zero-copy-copied: Number of times dirty RAM synchronization could not > > > > # avoid copying zero pages. This is between 0 and > > > > # @dirty-sync-count. (since 7.1) That's a great description! And it's almost 100% correct. IIUC dirty-sync-count increments on migration_bitmap_sync() once after each full dirty-bitmap scan, and even with multifd syncing at the same point, it could potentially have a increment per multifd channel. The only change would be having: # This is between 0 and @dirty-sync-count * @multifd-channel. > > > > > > Any one have preferences on the name - i get slight put off by the > > > repeated word in the property name here. > > > > > > @zero-copy-rejects ? > > > @zero-copy-misses ? > > > @zero-copy-fails ? > > > > I'd consider any of these an improvement. Not a native speaker, but > > perhaps "failures" instead of "fails". > > > > We could also express the connection to @dirty-sync-count right in the > > name, like @dirty-sync-rejected-zero-copy, @dirty-sync-missed-zero-copy, > > @dirty-sync-failed-zero-copy. Or maybe -copies. > > Yeah, @dirty-sync-missed-zero-copy is probably my favourite. Ok then, this one there is :) I will update my patchset with the suggestions, and I should publish a v3 shortly. Best regards, Leo > > With regards, > Daniel
Leonardo Brás <leobras@redhat.com> writes: > Thanks Daniel, Markus and Dave for the feedback! > > On Mon, 2022-07-04 at 14:14 +0100, Daniel P. Berrangé wrote: >> On Mon, Jul 04, 2022 at 02:59:50PM +0200, Markus Armbruster wrote: >> > Daniel P. Berrangé <berrange@redhat.com> writes: >> > >> > > On Mon, Jul 04, 2022 at 02:04:41PM +0200, Markus Armbruster wrote: >> > > [...] > >> >> since 7.1, unless you're planning for really tortuous review :) >> > > Ok, updated :) > >> > [...] >> > >> > > > Please rephrase the documentation of @zero-copy-copied in terms of >> > > > @dirty-sync-count. Here's my attempt. >> > > > >> > > > # @zero-copy-copied: Number of times dirty RAM synchronization could not >> > > > # avoid copying zero pages. This is between 0 and >> > > > # @dirty-sync-count. (since 7.1) > > That's a great description! And it's almost 100% correct. That's why we do patch review :) > IIUC dirty-sync-count increments on migration_bitmap_sync() once after each full > dirty-bitmap scan, and even with multifd syncing at the same point, it could > potentially have a increment per multifd channel. > > The only change would be having: > # This is between 0 and @dirty-sync-count * @multifd-channel. Makes sense to me. [...]
diff --git a/qapi/migration.json b/qapi/migration.json index 7102e474a6..925f009868 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -55,6 +55,9 @@ # @postcopy-bytes: The number of bytes sent during the post-copy phase # (since 7.0). # +# @zero-copy-copied: The number of zero-copy flushes that reported data sent +# using zero-copy that ended up being copied. (since 7.2) +# # Since: 0.14 ## { 'struct': 'MigrationStats', @@ -65,7 +68,7 @@ 'postcopy-requests' : 'int', 'page-size' : 'int', 'multifd-bytes' : 'uint64', 'pages-per-second' : 'uint64', 'precopy-bytes' : 'uint64', 'downtime-bytes' : 'uint64', - 'postcopy-bytes' : 'uint64' } } + 'postcopy-bytes' : 'uint64', 'zero-copy-copied' : 'uint64' } } ## # @XBZRLECacheStats: diff --git a/migration/migration.c b/migration/migration.c index 78f5057373..68fc3894ba 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1034,6 +1034,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s) info->ram->precopy_bytes = ram_counters.precopy_bytes; info->ram->downtime_bytes = ram_counters.downtime_bytes; info->ram->postcopy_bytes = ram_counters.postcopy_bytes; + info->ram->zero_copy_copied = ram_counters.zero_copy_copied; if (migrate_use_xbzrle()) { info->has_xbzrle_cache = true; diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c index ca98df0495..bcb1799543 100644 --- a/monitor/hmp-cmds.c +++ b/monitor/hmp-cmds.c @@ -307,6 +307,10 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict) monitor_printf(mon, "postcopy ram: %" PRIu64 " kbytes\n", info->ram->postcopy_bytes >> 10); } + if (info->ram->zero_copy_copied) { + monitor_printf(mon, "zero-copy copied: %" PRIu64 " iterations\n", + info->ram->zero_copy_copied); + } } if (info->has_disk) {
Signed-off-by: Leonardo Bras <leobras@redhat.com> --- qapi/migration.json | 5 ++++- migration/migration.c | 1 + monitor/hmp-cmds.c | 4 ++++ 3 files changed, 9 insertions(+), 1 deletion(-)