[16/46] Add migration-capability boolean for postcopy-ram.

Message ID 1404495717-4239-17-git-send-email-dgilbert@redhat.com
State New

Commit Message

Dr. David Alan Gilbert July 4, 2014, 5:41 p.m. UTC
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h | 1 +
 migration.c                   | 9 +++++++++
 qapi-schema.json              | 6 +++++-
 3 files changed, 15 insertions(+), 1 deletion(-)

Comments

Eric Blake July 7, 2014, 7:41 p.m. UTC | #1
On 07/04/2014 11:41 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/migration/migration.h | 1 +
>  migration.c                   | 9 +++++++++
>  qapi-schema.json              | 6 +++++-
>  3 files changed, 15 insertions(+), 1 deletion(-)
> 

> +++ b/qapi-schema.json
> @@ -491,10 +491,14 @@
>  # @auto-converge: If enabled, QEMU will automatically throttle down the guest
>  #          to speed up convergence of RAM migration. (since 1.6)
>  #
> +# @x-postcopy-ram: Start executing on the migration target before all of RAM has been
> +#          migrated, pulling the remaining pages along as needed. NOTE: If the
> +#          migration fails during postcopy the VM will fail.  (since 2.2)

How does this work with libvirt's current insistence that it manually
resumes the guest on the destination in order to give feedback to the
source on whether it was successful? I'm not sure if enabling this bool
is the right thing to do, or if we just need more visibility (such as
events rather than the current state of polling) for libvirt to know
that it is time to resume the destination and start the post-copy phase.
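
For reference, a minimal sketch of how a management client might build the request that turns the boolean on. It assumes the capability keeps the name proposed in this patch (x-postcopy-ram) and is set through the existing QMP command migrate-set-capabilities; format_set_capability() is a hypothetical helper for illustration, not a QEMU or libvirt function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format a QMP "migrate-set-capabilities" command that sets a single
 * capability.  Hypothetical helper, not part of QEMU or libvirt.
 *
 * Returns the number of bytes written (excluding the trailing NUL),
 * or -1 if the buffer is too small or formatting failed.
 */
static int format_set_capability(char *buf, size_t len,
                                 const char *capability, bool state)
{
    int n;

    assert(buf != NULL && capability != NULL);

    n = snprintf(buf, len,
                 "{ \"execute\": \"migrate-set-capabilities\", "
                 "\"arguments\": { \"capabilities\": [ "
                 "{ \"capability\": \"%s\", \"state\": %s } ] } }",
                 capability, state ? "true" : "false");

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Calling format_set_capability(buf, sizeof(buf), "x-postcopy-ram", true) fills buf with a command that could be sent on the QMP socket before issuing migrate. The sketch only builds the string; it says nothing about when the destination should be resumed, which is the open question above.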

Dr. David Alan Gilbert July 7, 2014, 8:23 p.m. UTC | #2
* Eric Blake (eblake@redhat.com) wrote:
> On 07/04/2014 11:41 AM, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  include/migration/migration.h | 1 +
> >  migration.c                   | 9 +++++++++
> >  qapi-schema.json              | 6 +++++-
> >  3 files changed, 15 insertions(+), 1 deletion(-)
> > 
> 
> > +++ b/qapi-schema.json
> > @@ -491,10 +491,14 @@
> >  # @auto-converge: If enabled, QEMU will automatically throttle down the guest
> >  #          to speed up convergence of RAM migration. (since 1.6)
> >  #
> > +# @x-postcopy-ram: Start executing on the migration target before all of RAM has been
> > +#          migrated, pulling the remaining pages along as needed. NOTE: If the
> > +#          migration fails during postcopy the VM will fail.  (since 2.2)
> 
> How does this work with libvirt's current insistence that it manually
> resumes the guest on the destination in order to give feedback to the
> source on whether it was successful? I'm not sure if enabling this bool
> is the right thing to do, or if we just need more visibility (such as
> events rather than the current state of polling) for libvirt to know
> that it is time to resume the destination and start the post-copy phase.

That's an interesting overlap with Paolo's question.
(and approximately the same answer)

I think what I need to do for that is:
   1) As for precopy add the option not to start the destination CPU on entry to postcopy;
      I think that's OK, because we can carry on in postcopy mode even if the destination
      CPU isn't running, we just won't generate page requests.
   2) Finally fix up the old request libvirt has for events based on migration state.

Admittedly I don't quite understand how (1) is supposed to interact with device
state.

Dave

> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Paolo Bonzini July 10, 2014, 4:17 p.m. UTC | #3
Il 07/07/2014 22:23, Dr. David Alan Gilbert ha scritto:
> I think what I need to do for that is:
>    1) As for precopy add the option not to start the destination CPU on entry to postcopy;
>       I think that's OK, because we can carry on in postcopy mode even if the destination
>       CPU isn't running, we just won't generate page requests.
>
> Admittedly I don't quite understand how (1) is supposed to interact with device
> state.

This is just passing "-S" on the destination side.  Device state is 
treated the same as without "-S" and can still generate page requests. 
The only difference is whether you have a vm_start() or not.
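
A toy model of that point, as a sketch only: DestState, dest_enter_postcopy() and dest_touch_missing_page() are hypothetical names and do not exist in QEMU. The model assumes that any access to a page that has not arrived yet produces a page request, whether or not the CPUs have been started.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the destination; not a QEMU structure. */
typedef struct DestState {
    bool autostart;          /* false when the destination was run with -S */
    bool cpus_running;       /* true once the modelled vm_start() happened */
    unsigned page_requests;  /* requests sent back to the source */
} DestState;

/* Entering postcopy: the only -S dependent step is starting the CPUs. */
static void dest_enter_postcopy(DestState *d)
{
    assert(d != NULL);
    if (d->autostart) {
        d->cpus_running = true;  /* stands in for vm_start() */
    }
}

/*
 * Any access to a page that has not arrived yet asks the source for it.
 * The access may come from device state loading, so it does not depend
 * on cpus_running.
 */
static void dest_touch_missing_page(DestState *d)
{
    assert(d != NULL);
    d->page_requests++;
}
```

With autostart set to false, dest_enter_postcopy() leaves cpus_running false, yet a later dest_touch_missing_page() still increments page_requests. That is the behaviour described above: device state is handled the same way with or without "-S".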

I think it should be possible to restart the VM on the source side after 
postcopy migration, as long as migration has failed or has been 
canceled.  Whether that makes sense or causes dire disk corruption 
depends on the particular scenario, but then the same holds for precopy 
and we don't try at all to prevent "cont" at the end of migration.  It 
makes it much easier for libvirt to restart the source if it cannot 
continue on the destination.

Paolo

Dr. David Alan Gilbert July 10, 2014, 7:02 p.m. UTC | #4
* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 07/07/2014 22:23, Dr. David Alan Gilbert ha scritto:
> >I think what I need to do for that is:
> >   1) As for precopy add the option not to start the destination CPU on entry to postcopy;
> >      I think that's OK, because we can carry on in postcopy mode even if the destination
> >      CPU isn't running, we just won't generate page requests.
> >
> >Admittedly I don't quite understand how (1) is supposed to interact with device
> >state.
> 
> This is just passing "-S" on the destination side.  Device state is treated
> the same as without "-S" and can still generate page requests. The only
> difference is whether you have a vm_start() or not.

Good, that sounds easy enough.

> I think it should be possible to restart the VM on the source side after
> postcopy migration, as long as migration has failed or has been canceled.
> Whether that makes sense or causes dire disk corruption depends on the
> particular scenario, but then the same holds for precopy and we don't try at
> all to prevent "cont" at the end of migration.  It makes it much easier for
> libvirt to restart the source if it cannot continue on the destination.

Interesting; Andrea accidentally started his source and was somewhat
surprised.
I was just going to add the RUN_STATE_MEMORY_STALE that Lei Li added
in the exec-migration patchset.

Dave

> 
> Paolo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Patch

diff --git a/include/migration/migration.h b/include/migration/migration.h
index a1ed7a3..35ad1f6 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -171,6 +171,7 @@  void migrate_add_blocker(Error *reason);
  */
 void migrate_del_blocker(Error *reason);
 
+bool migrate_postcopy_ram(void);
 bool migrate_rdma_pin_all(void);
 bool migrate_zero_blocks(void);
 
diff --git a/migration.c b/migration.c
index e69a49e..67cdfd6 100644
--- a/migration.c
+++ b/migration.c
@@ -612,6 +612,15 @@  bool migrate_rdma_pin_all(void)
     return s->enabled_capabilities[MIGRATION_CAPABILITY_RDMA_PIN_ALL];
 }
 
+bool migrate_postcopy_ram(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_X_POSTCOPY_RAM];
+}
+
 bool migrate_auto_converge(void)
 {
     MigrationState *s;
diff --git a/qapi-schema.json b/qapi-schema.json
index b11aad2..eac3739 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -491,10 +491,14 @@ 
 # @auto-converge: If enabled, QEMU will automatically throttle down the guest
 #          to speed up convergence of RAM migration. (since 1.6)
 #
+# @x-postcopy-ram: Start executing on the migration target before all of RAM has been
+#          migrated, pulling the remaining pages along as needed. NOTE: If the
+#          migration fails during postcopy the VM will fail.  (since 2.2)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
-  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks'] }
+  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', 'x-postcopy-ram'] }
 
 ##
 # @MigrationCapabilityStatus