
[v6,25/28] qmp/migration: new command migrate-recover

Message ID 20180208103132.28452-26-peterx@redhat.com
State New
Series [v6,01/28] migration: better error handling with QEMUFile

Commit Message

Peter Xu Feb. 8, 2018, 10:31 a.m. UTC
The first allow-oob=true command.  It's used on destination side when
the postcopy migration is paused and ready for a recovery.  After
execution, a new migration channel will be established for postcopy to
continue.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 26 ++++++++++++++++++++++++++
 migration/migration.h |  1 +
 migration/savevm.c    |  3 +++
 qapi/migration.json   | 20 ++++++++++++++++++++
 4 files changed, 50 insertions(+)

Comments

Dr. David Alan Gilbert Feb. 13, 2018, 6:56 p.m. UTC | #1
* Peter Xu (peterx@redhat.com) wrote:
> The first allow-oob=true command.  It's used on destination side when
> the postcopy migration is paused and ready for a recovery.  After
> execution, a new migration channel will be established for postcopy to
> continue.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c | 26 ++++++++++++++++++++++++++
>  migration/migration.h |  1 +
>  migration/savevm.c    |  3 +++
>  qapi/migration.json   | 20 ++++++++++++++++++++
>  4 files changed, 50 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index cf3a3f416c..bb57ed9ade 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1422,6 +1422,32 @@ void qmp_migrate_incoming(const char *uri, Error **errp)
>      once = false;
>  }
>  
> +void qmp_migrate_recover(const char *uri, Error **errp)
> +{
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +
> +    if (mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
> +        error_setg(errp, "Migrate recover can only be run "
> +                   "when postcopy is paused.");
> +        return;
> +    }

OK, if it did come back as Paused I don't think it can leave that state
again except this way, so I'm not too worried about thread safety here.

> +    if (mis->postcopy_recover_triggered) {
> +        error_setg(errp, "Migrate recovery is triggered already");
> +        return;
> +    }
> +
> +    /* This will make sure we'll only allow one recover for one pause */
> +    mis->postcopy_recover_triggered = true;

However, does that need to be done with an atomic compare-and-swap:
   if (atomic_cmpxchg(&mis->postcopy_recover_triggered, false, true) ==
       true) {
      error_setg(errp, "Migrate recovery is triggered already");
      return;
   }

for the slim chance that someone issued this command on both the main
and the oob monitor?

Dave
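
The race behind this suggestion: with a separate check and store, two
monitors can both read false before either writes true, so both would go
on to start a recovery. A compare-and-swap folds the check and the store
into one atomic step. What follows is a minimal standalone sketch using
C11 <stdatomic.h> rather than QEMU's atomic_cmpxchg() helper;
IncomingModel, claim_recovery(), rearm_recovery() and demo_two_claims()
are hypothetical names for illustration only, not QEMU code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the flag kept in MigrationIncomingState. */
typedef struct {
    atomic_bool recover_triggered;
} IncomingModel;

/*
 * Try to claim the single recovery slot.  The compare-exchange only
 * stores true if the flag is still false, so exactly one caller per
 * re-arm gets true back; every other caller gets false.
 */
static bool claim_recovery(IncomingModel *m)
{
    bool expected = false;
    return atomic_compare_exchange_strong(&m->recover_triggered,
                                          &expected, true);
}

/* A new pause re-arms the slot so one more recovery is allowed. */
static void rearm_recovery(IncomingModel *m)
{
    atomic_store(&m->recover_triggered, false);
}

/* Usage: two back-to-back claims, only the first one wins. */
static int demo_two_claims(void)
{
    IncomingModel m;
    int wins = 0;

    atomic_init(&m.recover_triggered, false);
    wins += claim_recovery(&m);   /* first caller claims the slot */
    wins += claim_recovery(&m);   /* second caller is rejected */
    assert(wins == 1);
    return wins;
}
```

In the real handler the losing caller would set the "triggered already"
error and return before touching the migration channel.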

> +    /*
> +     * Note that this call will never start a real migration; it will
> +     * only re-setup the migration stream and poke existing migration
> +     * to continue using that newly established channel.
> +     */
> +    qemu_start_incoming_migration(uri, errp);
> +}
> +
>  bool migration_is_blocked(Error **errp)
>  {
>      if (qemu_savevm_state_blocked(errp)) {
> diff --git a/migration/migration.h b/migration/migration.h
> index 88f5614b90..581bf4668b 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -65,6 +65,7 @@ struct MigrationIncomingState {
>      QemuSemaphore colo_incoming_sem;
>  
>      /* notify PAUSED postcopy incoming migrations to try to continue */
> +    bool postcopy_recover_triggered;
>      QemuSemaphore postcopy_pause_sem_dst;
>      QemuSemaphore postcopy_pause_sem_fault;
>  };
> diff --git a/migration/savevm.c b/migration/savevm.c
> index d40092a2b6..5f41b062ba 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2182,6 +2182,9 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
>      /* Notify the fault thread for the invalidated file handle */
>      postcopy_fault_thread_notify(mis);
>  
> +    /* Clear the triggered bit to allow one recovery */
> +    mis->postcopy_recover_triggered = false;
> +
>      error_report("Detected IO failure for postcopy. "
>                   "Migration paused.");
>  
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 055130314d..dfbcb02d4c 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1172,3 +1172,23 @@
>  # Since: 2.9
>  ##
>  { 'command': 'xen-colo-do-checkpoint' }
> +
> +##
> +# @migrate-recover:
> +#
> +# Provide a recovery migration stream URI.
> +#
> +# @uri: the URI to be used for the recovery of migration stream.
> +#
> +# Returns: nothing.
> +#
> +# Example:
> +#
> +# -> { "execute": "migrate-recover",
> +#      "arguments": { "uri": "tcp:192.168.1.200:12345" } }
> +# <- { "return": {} }
> +#
> +# Since: 2.12
> +##
> +{ 'command': 'migrate-recover', 'data': { 'uri': 'str' },
> +  'allow-oob': true }
> -- 
> 2.14.3
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Peter Xu Feb. 14, 2018, 4:30 a.m. UTC | #2
On Tue, Feb 13, 2018 at 06:56:51PM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > The first allow-oob=true command.  It's used on destination side when
> > the postcopy migration is paused and ready for a recovery.  After
> > execution, a new migration channel will be established for postcopy to
> > continue.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/migration.c | 26 ++++++++++++++++++++++++++
> >  migration/migration.h |  1 +
> >  migration/savevm.c    |  3 +++
> >  qapi/migration.json   | 20 ++++++++++++++++++++
> >  4 files changed, 50 insertions(+)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index cf3a3f416c..bb57ed9ade 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -1422,6 +1422,32 @@ void qmp_migrate_incoming(const char *uri, Error **errp)
> >      once = false;
> >  }
> >  
> > +void qmp_migrate_recover(const char *uri, Error **errp)
> > +{
> > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > +
> > +    if (mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
> > +        error_setg(errp, "Migrate recover can only be run "
> > +                   "when postcopy is paused.");
> > +        return;
> > +    }
> 
> OK, if it did come back as Paused I don't think it can leave that state
> again except this way, so I'm not too worried about thread safety here.
> 
> > +    if (mis->postcopy_recover_triggered) {
> > +        error_setg(errp, "Migrate recovery is triggered already");
> > +        return;
> > +    }
> > +
> > +    /* This will make sure we'll only allow one recover for one pause */
> > +    mis->postcopy_recover_triggered = true;
> 
> However, does that need to be done with an atomic compare-and-swap:
>    if (atomic_cmpxchg(&mis->postcopy_recover_triggered, false, true) ==
>        true) {
>       error_setg(errp, "Migrate recovery is triggered already");
>       return;
>    }
> 
> for the slim chance that someone issued this command on both the main
> and the oob monitor?

Yes, slim chance, but I agree. :)

I wasn't that strict about this, but I should be.  While we are at it,
maybe I'll also...

> 
> Dave
> 
> > +    /*
> > +     * Note that this call will never start a real migration; it will
> > +     * only re-setup the migration stream and poke existing migration
> > +     * to continue using that newly established channel.
> > +     */
> > +    qemu_start_incoming_migration(uri, errp);
> > +}
> > +
> >  bool migration_is_blocked(Error **errp)
> >  {
> >      if (qemu_savevm_state_blocked(errp)) {
> > diff --git a/migration/migration.h b/migration/migration.h
> > index 88f5614b90..581bf4668b 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -65,6 +65,7 @@ struct MigrationIncomingState {
> >      QemuSemaphore colo_incoming_sem;
> >  
> >      /* notify PAUSED postcopy incoming migrations to try to continue */
> > +    bool postcopy_recover_triggered;
> >      QemuSemaphore postcopy_pause_sem_dst;
> >      QemuSemaphore postcopy_pause_sem_fault;
> >  };
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index d40092a2b6..5f41b062ba 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -2182,6 +2182,9 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
> >      /* Notify the fault thread for the invalidated file handle */
> >      postcopy_fault_thread_notify(mis);
> >  
> > +    /* Clear the triggered bit to allow one recovery */
> > +    mis->postcopy_recover_triggered = false;
> > +

... move this assignment above migrate_set_state(), since there is also
a slim chance that we handle migrate-recover before
postcopy_recover_triggered has been reset to false.

Thanks,
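
The ordering concern above can be modelled in isolation. If the paused
state becomes visible before the flag is cleared, a migrate-recover
arriving in that window either sees a stale "true" and is wrongly
rejected, or claims the flag only to have the pause path clear it again
and let a second recovery through. The sketch below is a rough model
under stated assumptions: it uses C11 atomics with release/acquire
ordering, and PauseModel, model_pause(), model_recover() and
demo_pause_then_recover() are hypothetical names, not QEMU's
migrate_set_state() or its actual memory-ordering guarantees.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum model_state { MODEL_ACTIVE, MODEL_PAUSED };

/* Hypothetical model of the two fields involved in the race. */
typedef struct {
    _Atomic int state;
    atomic_bool recover_triggered;
} PauseModel;

/* Pause path: re-arm the flag first, then publish the paused state. */
static void model_pause(PauseModel *m)
{
    atomic_store_explicit(&m->recover_triggered, false,
                          memory_order_relaxed);
    /* Release: the cleared flag is visible to whoever observes PAUSED. */
    atomic_store_explicit(&m->state, MODEL_PAUSED, memory_order_release);
}

/* Command path: only proceed when paused, and only once per pause. */
static bool model_recover(PauseModel *m)
{
    if (atomic_load_explicit(&m->state, memory_order_acquire)
        != MODEL_PAUSED) {
        return false;               /* not paused: reject */
    }
    bool expected = false;
    return atomic_compare_exchange_strong(&m->recover_triggered,
                                          &expected, true);
}

/* Usage: a stale flag from an earlier pause must not block recovery. */
static bool demo_pause_then_recover(void)
{
    PauseModel m;

    atomic_init(&m.state, MODEL_ACTIVE);
    atomic_init(&m.recover_triggered, true);   /* stale from last time */
    model_pause(&m);
    bool first = model_recover(&m);            /* accepted */
    bool second = model_recover(&m);           /* rejected */
    assert(first && !second);
    return first && !second;
}
```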

Patch

diff --git a/migration/migration.c b/migration/migration.c
index cf3a3f416c..bb57ed9ade 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1422,6 +1422,32 @@  void qmp_migrate_incoming(const char *uri, Error **errp)
     once = false;
 }
 
+void qmp_migrate_recover(const char *uri, Error **errp)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    if (mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
+        error_setg(errp, "Migrate recover can only be run "
+                   "when postcopy is paused.");
+        return;
+    }
+
+    if (mis->postcopy_recover_triggered) {
+        error_setg(errp, "Migrate recovery is triggered already");
+        return;
+    }
+
+    /* This will make sure we'll only allow one recover for one pause */
+    mis->postcopy_recover_triggered = true;
+
+    /*
+     * Note that this call will never start a real migration; it will
+     * only re-setup the migration stream and poke existing migration
+     * to continue using that newly established channel.
+     */
+    qemu_start_incoming_migration(uri, errp);
+}
+
 bool migration_is_blocked(Error **errp)
 {
     if (qemu_savevm_state_blocked(errp)) {
diff --git a/migration/migration.h b/migration/migration.h
index 88f5614b90..581bf4668b 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -65,6 +65,7 @@  struct MigrationIncomingState {
     QemuSemaphore colo_incoming_sem;
 
     /* notify PAUSED postcopy incoming migrations to try to continue */
+    bool postcopy_recover_triggered;
     QemuSemaphore postcopy_pause_sem_dst;
     QemuSemaphore postcopy_pause_sem_fault;
 };
diff --git a/migration/savevm.c b/migration/savevm.c
index d40092a2b6..5f41b062ba 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2182,6 +2182,9 @@  static bool postcopy_pause_incoming(MigrationIncomingState *mis)
     /* Notify the fault thread for the invalidated file handle */
     postcopy_fault_thread_notify(mis);
 
+    /* Clear the triggered bit to allow one recovery */
+    mis->postcopy_recover_triggered = false;
+
     error_report("Detected IO failure for postcopy. "
                  "Migration paused.");
 
diff --git a/qapi/migration.json b/qapi/migration.json
index 055130314d..dfbcb02d4c 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1172,3 +1172,23 @@ 
 # Since: 2.9
 ##
 { 'command': 'xen-colo-do-checkpoint' }
+
+##
+# @migrate-recover:
+#
+# Provide a recovery migration stream URI.
+#
+# @uri: the URI to be used for the recovery of migration stream.
+#
+# Returns: nothing.
+#
+# Example:
+#
+# -> { "execute": "migrate-recover",
+#      "arguments": { "uri": "tcp:192.168.1.200:12345" } }
+# <- { "return": {} }
+#
+# Since: 2.12
+##
+{ 'command': 'migrate-recover', 'data': { 'uri': 'str' },
+  'allow-oob': true }