[v5,26/28] migration: allow migrate_cancel to pause postcopy

Message ID 20171205065307.21853-27-peterx@redhat.com
State New
Headers show
Series
  • [v5,01/28] migration: better error handling with QEMUFile
Related show

Commit Message

Peter Xu Dec. 5, 2017, 6:53 a.m.
It was allowed in the past to even cancel a postcopy migration, but it
does not really make sense, and no one should be using it, since
cancelling a migration during postcopy means crashing the VM at no time.

Let's just use re-use this command as a way to pause the postcopy
migration when we detected that we are in postcopy migration.  This can
be used when we want to trigger the postcopy recovery manually.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hmp-commands.hx       |  8 ++++++--
 migration/migration.c | 18 +++++++++++++++++-
 qapi/migration.json   |  3 ++-
 3 files changed, 25 insertions(+), 4 deletions(-)

Comments

Dr. David Alan Gilbert Dec. 19, 2017, 10:58 a.m. | #1
* Peter Xu (peterx@redhat.com) wrote:
> It was allowed in the past to even cancel a postcopy migration, but it
> does not really make sense, and no one should be using it, since
> cancelling a migration during postcopy means crashing the VM at no time.
> 
> Let's just use re-use this command as a way to pause the postcopy
> migration when we detected that we are in postcopy migration.  This can
> be used when we want to trigger the postcopy recovery manually.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Yes, OK, this is a little odd without having any flags or anything, but
it's essentially the saem reason that cancel exists - to stop a
migration we know that's broken for some reason.

(I could argue whether this should be special cased in migrate_fd_cancel
instead, but that's just a preference).


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

>  hmp-commands.hx       |  8 ++++++--
>  migration/migration.c | 18 +++++++++++++++++-
>  qapi/migration.json   |  3 ++-
>  3 files changed, 25 insertions(+), 4 deletions(-)
> 
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index ffcdc34652..32fdd52212 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -952,14 +952,18 @@ ETEXI
>          .name       = "migrate_cancel",
>          .args_type  = "",
>          .params     = "",
> -        .help       = "cancel the current VM migration",
> +        .help       = "cancel the current VM precopy migration, or "
> +                      "pause the migration if it's in postcopy state, "
> +                      "so that a postcopy recovery can be carried out later.",
>          .cmd        = hmp_migrate_cancel,
>      },
>  
>  STEXI
>  @item migrate_cancel
>  @findex migrate_cancel
> -Cancel the current VM migration.
> +If during a precopy, this command cancels the migration.  If during
> +postcopy, this command pauses the migration (so that a postcopy
> +recovery can be carried out afterward).
>  ETEXI
>  
>      {
> diff --git a/migration/migration.c b/migration/migration.c
> index 8762fad9be..0c1a783df2 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1535,7 +1535,23 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
>  
>  void qmp_migrate_cancel(Error **errp)
>  {
> -    migrate_fd_cancel(migrate_get_current());
> +    MigrationState *ms = migrate_get_current();
> +    int ret;
> +
> +    if (ms->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> +        /*
> +         * Cancelling a postcopy migration does not really make sense.
> +         * Here instead we pause the migration only, so another
> +         * recovery can take place if needed.
> +         */
> +        ret = qemu_file_shutdown(ms->to_dst_file);
> +        if (ret) {
> +            error_setg(errp, "Failed to pause migration stream.");
> +        }
> +        return;
> +    }
> +
> +    migrate_fd_cancel(ms);
>  }
>  
>  void qmp_migrate_continue(MigrationStatus state, Error **errp)
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 5dfac0681d..e15bb516cd 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -881,7 +881,8 @@
>  ##
>  # @migrate_cancel:
>  #
> -# Cancel the current executing migration process.
> +# Cancel the current executing migration process for precopy.  For
> +# postcopy, it'll pause the migration instead.
>  #
>  # Returns: nothing on success
>  #
> -- 
> 2.14.3
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Peter Xu Jan. 24, 2018, 8:28 a.m. | #2
On Tue, Dec 19, 2017 at 10:58:59AM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > It was allowed in the past to even cancel a postcopy migration, but it
> > does not really make sense, and no one should be using it, since
> > cancelling a migration during postcopy means crashing the VM at no time.
> > 
> > Let's just use re-use this command as a way to pause the postcopy
> > migration when we detected that we are in postcopy migration.  This can
> > be used when we want to trigger the postcopy recovery manually.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> Yes, OK, this is a little odd without having any flags or anything, but
> it's essentially the saem reason that cancel exists - to stop a
> migration we know that's broken for some reason.
> 
> (I could argue whether this should be special cased in migrate_fd_cancel
> instead, but that's just a preference).
> 
> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Firstly, thanks for the r-b.

Now after knowing your iptable test, I think we reached a consensus
that we need to provide a command (for example, reuse migrate-cancel)
to allow destination to shutdown its incoming migration port too.  At
the same time, if we want to make sure the command can always work on
destination, we'd better also let that command to be OOB-capable.

However I'm not sure whether I can let migrate-cancel be OOB-capable
since recently we added bdrv_invalidate_cache_all() into
migrate_fd_cancel().  That invalidation needs some mutex which might
block.  I don't know whether it means migrate-cancel will no longer be
suitable as an OOB command now.

So, maybe now I can do this: firstly, drop this patch; then add a new
command to do the shutdown explicitly (allow either src/dst to
shutdown its migration fd) and keep migrate-cancel untouched. In that
case, I can make sure the new command will be OOB-compatible.

What do you think?
Dr. David Alan Gilbert Jan. 24, 2018, 9:06 a.m. | #3
* Peter Xu (peterx@redhat.com) wrote:
> On Tue, Dec 19, 2017 at 10:58:59AM +0000, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > It was allowed in the past to even cancel a postcopy migration, but it
> > > does not really make sense, and no one should be using it, since
> > > cancelling a migration during postcopy means crashing the VM at no time.
> > > 
> > > Let's just use re-use this command as a way to pause the postcopy
> > > migration when we detected that we are in postcopy migration.  This can
> > > be used when we want to trigger the postcopy recovery manually.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > 
> > Yes, OK, this is a little odd without having any flags or anything, but
> > it's essentially the saem reason that cancel exists - to stop a
> > migration we know that's broken for some reason.
> > 
> > (I could argue whether this should be special cased in migrate_fd_cancel
> > instead, but that's just a preference).
> > 
> > 
> > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Firstly, thanks for the r-b.
> 
> Now after knowing your iptable test, I think we reached a consensus
> that we need to provide a command (for example, reuse migrate-cancel)
> to allow destination to shutdown its incoming migration port too.  At
> the same time, if we want to make sure the command can always work on
> destination, we'd better also let that command to be OOB-capable.
> 
> However I'm not sure whether I can let migrate-cancel be OOB-capable
> since recently we added bdrv_invalidate_cache_all() into
> migrate_fd_cancel().  That invalidation needs some mutex which might
> block.  I don't know whether it means migrate-cancel will no longer be
> suitable as an OOB command now.
> 
> So, maybe now I can do this: firstly, drop this patch; then add a new
> command to do the shutdown explicitly (allow either src/dst to
> shutdown its migration fd) and keep migrate-cancel untouched. In that
> case, I can make sure the new command will be OOB-compatible.
> 
> What do you think?

Yes I'm fine with that; misusing another command did feel a bit odd.

Dave

> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Patch

diff --git a/hmp-commands.hx b/hmp-commands.hx
index ffcdc34652..32fdd52212 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -952,14 +952,18 @@  ETEXI
         .name       = "migrate_cancel",
         .args_type  = "",
         .params     = "",
-        .help       = "cancel the current VM migration",
+        .help       = "cancel the current VM precopy migration, or "
+                      "pause the migration if it's in postcopy state, "
+                      "so that a postcopy recovery can be carried out later.",
         .cmd        = hmp_migrate_cancel,
     },
 
 STEXI
 @item migrate_cancel
 @findex migrate_cancel
-Cancel the current VM migration.
+If during a precopy, this command cancels the migration.  If during
+postcopy, this command pauses the migration (so that a postcopy
+recovery can be carried out afterward).
 ETEXI
 
     {
diff --git a/migration/migration.c b/migration/migration.c
index 8762fad9be..0c1a783df2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1535,7 +1535,23 @@  void qmp_migrate(const char *uri, bool has_blk, bool blk,
 
 void qmp_migrate_cancel(Error **errp)
 {
-    migrate_fd_cancel(migrate_get_current());
+    MigrationState *ms = migrate_get_current();
+    int ret;
+
+    if (ms->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        /*
+         * Cancelling a postcopy migration does not really make sense.
+         * Here instead we pause the migration only, so another
+         * recovery can take place if needed.
+         */
+        ret = qemu_file_shutdown(ms->to_dst_file);
+        if (ret) {
+            error_setg(errp, "Failed to pause migration stream.");
+        }
+        return;
+    }
+
+    migrate_fd_cancel(ms);
 }
 
 void qmp_migrate_continue(MigrationStatus state, Error **errp)
diff --git a/qapi/migration.json b/qapi/migration.json
index 5dfac0681d..e15bb516cd 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -881,7 +881,8 @@ 
 ##
 # @migrate_cancel:
 #
-# Cancel the current executing migration process.
+# Cancel the current executing migration process for precopy.  For
+# postcopy, it'll pause the migration instead.
 #
 # Returns: nothing on success
 #