Message ID | 20171205065307.21853-27-peterx@redhat.com |
---|---|
State | New |
Headers | show |
Series | [v5,01/28] migration: better error handling with QEMUFile | expand |
* Peter Xu (peterx@redhat.com) wrote: > It was allowed in the past to even cancel a postcopy migration, but it > does not really make sense, and no one should be using it, since > cancelling a migration during postcopy means crashing the VM at no time. > > Let's just use re-use this command as a way to pause the postcopy > migration when we detected that we are in postcopy migration. This can > be used when we want to trigger the postcopy recovery manually. > > Signed-off-by: Peter Xu <peterx@redhat.com> Yes, OK, this is a little odd without having any flags or anything, but it's essentially the saem reason that cancel exists - to stop a migration we know that's broken for some reason. (I could argue whether this should be special cased in migrate_fd_cancel instead, but that's just a preference). Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > hmp-commands.hx | 8 ++++++-- > migration/migration.c | 18 +++++++++++++++++- > qapi/migration.json | 3 ++- > 3 files changed, 25 insertions(+), 4 deletions(-) > > diff --git a/hmp-commands.hx b/hmp-commands.hx > index ffcdc34652..32fdd52212 100644 > --- a/hmp-commands.hx > +++ b/hmp-commands.hx > @@ -952,14 +952,18 @@ ETEXI > .name = "migrate_cancel", > .args_type = "", > .params = "", > - .help = "cancel the current VM migration", > + .help = "cancel the current VM precopy migration, or " > + "pause the migration if it's in postcopy state, " > + "so that a postcopy recovery can be carried out later.", > .cmd = hmp_migrate_cancel, > }, > > STEXI > @item migrate_cancel > @findex migrate_cancel > -Cancel the current VM migration. > +If during a precopy, this command cancels the migration. If during > +postcopy, this command pauses the migration (so that a postcopy > +recovery can be carried out afterward). > ETEXI > > { > diff --git a/migration/migration.c b/migration/migration.c > index 8762fad9be..0c1a783df2 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -1535,7 +1535,23 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, > > void qmp_migrate_cancel(Error **errp) > { > - migrate_fd_cancel(migrate_get_current()); > + MigrationState *ms = migrate_get_current(); > + int ret; > + > + if (ms->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) { > + /* > + * Cancelling a postcopy migration does not really make sense. > + * Here instead we pause the migration only, so another > + * recovery can take place if needed. > + */ > + ret = qemu_file_shutdown(ms->to_dst_file); > + if (ret) { > + error_setg(errp, "Failed to pause migration stream."); > + } > + return; > + } > + > + migrate_fd_cancel(ms); > } > > void qmp_migrate_continue(MigrationStatus state, Error **errp) > diff --git a/qapi/migration.json b/qapi/migration.json > index 5dfac0681d..e15bb516cd 100644 > --- a/qapi/migration.json > +++ b/qapi/migration.json > @@ -881,7 +881,8 @@ > ## > # @migrate_cancel: > # > -# Cancel the current executing migration process. > +# Cancel the current executing migration process for precopy. For > +# postcopy, it'll pause the migration instead. > # > # Returns: nothing on success > # > -- > 2.14.3 > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Tue, Dec 19, 2017 at 10:58:59AM +0000, Dr. David Alan Gilbert wrote: > * Peter Xu (peterx@redhat.com) wrote: > > It was allowed in the past to even cancel a postcopy migration, but it > > does not really make sense, and no one should be using it, since > > cancelling a migration during postcopy means crashing the VM at no time. > > > > Let's just use re-use this command as a way to pause the postcopy > > migration when we detected that we are in postcopy migration. This can > > be used when we want to trigger the postcopy recovery manually. > > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > Yes, OK, this is a little odd without having any flags or anything, but > it's essentially the saem reason that cancel exists - to stop a > migration we know that's broken for some reason. > > (I could argue whether this should be special cased in migrate_fd_cancel > instead, but that's just a preference). > > > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Firstly, thanks for the r-b. Now after knowing your iptable test, I think we reached a consensus that we need to provide a command (for example, reuse migrate-cancel) to allow destination to shutdown its incoming migration port too. At the same time, if we want to make sure the command can always work on destination, we'd better also let that command to be OOB-capable. However I'm not sure whether I can let migrate-cancel be OOB-capable since recently we added bdrv_invalidate_cache_all() into migrate_fd_cancel(). That invalidation needs some mutex which might block. I don't know whether it means migrate-cancel will no longer be suitable as an OOB command now. So, maybe now I can do this: firstly, drop this patch; then add a new command to do the shutdown explicitly (allow either src/dst to shutdown its migration fd) and keep migrate-cancel untouched. In that case, I can make sure the new command will be OOB-compatible. What do you think?
* Peter Xu (peterx@redhat.com) wrote: > On Tue, Dec 19, 2017 at 10:58:59AM +0000, Dr. David Alan Gilbert wrote: > > * Peter Xu (peterx@redhat.com) wrote: > > > It was allowed in the past to even cancel a postcopy migration, but it > > > does not really make sense, and no one should be using it, since > > > cancelling a migration during postcopy means crashing the VM at no time. > > > > > > Let's just use re-use this command as a way to pause the postcopy > > > migration when we detected that we are in postcopy migration. This can > > > be used when we want to trigger the postcopy recovery manually. > > > > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > > > Yes, OK, this is a little odd without having any flags or anything, but > > it's essentially the saem reason that cancel exists - to stop a > > migration we know that's broken for some reason. > > > > (I could argue whether this should be special cased in migrate_fd_cancel > > instead, but that's just a preference). > > > > > > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > > Firstly, thanks for the r-b. > > Now after knowing your iptable test, I think we reached a consensus > that we need to provide a command (for example, reuse migrate-cancel) > to allow destination to shutdown its incoming migration port too. At > the same time, if we want to make sure the command can always work on > destination, we'd better also let that command to be OOB-capable. > > However I'm not sure whether I can let migrate-cancel be OOB-capable > since recently we added bdrv_invalidate_cache_all() into > migrate_fd_cancel(). That invalidation needs some mutex which might > block. I don't know whether it means migrate-cancel will no longer be > suitable as an OOB command now. > > So, maybe now I can do this: firstly, drop this patch; then add a new > command to do the shutdown explicitly (allow either src/dst to > shutdown its migration fd) and keep migrate-cancel untouched. In that > case, I can make sure the new command will be OOB-compatible. > > What do you think? Yes I'm fine with that; misusing another command did feel a bit odd. Dave > -- > Peter Xu -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
diff --git a/hmp-commands.hx b/hmp-commands.hx index ffcdc34652..32fdd52212 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -952,14 +952,18 @@ ETEXI .name = "migrate_cancel", .args_type = "", .params = "", - .help = "cancel the current VM migration", + .help = "cancel the current VM precopy migration, or " + "pause the migration if it's in postcopy state, " + "so that a postcopy recovery can be carried out later.", .cmd = hmp_migrate_cancel, }, STEXI @item migrate_cancel @findex migrate_cancel -Cancel the current VM migration. +If during a precopy, this command cancels the migration. If during +postcopy, this command pauses the migration (so that a postcopy +recovery can be carried out afterward). ETEXI { diff --git a/migration/migration.c b/migration/migration.c index 8762fad9be..0c1a783df2 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1535,7 +1535,23 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, void qmp_migrate_cancel(Error **errp) { - migrate_fd_cancel(migrate_get_current()); + MigrationState *ms = migrate_get_current(); + int ret; + + if (ms->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) { + /* + * Cancelling a postcopy migration does not really make sense. + * Here instead we pause the migration only, so another + * recovery can take place if needed. + */ + ret = qemu_file_shutdown(ms->to_dst_file); + if (ret) { + error_setg(errp, "Failed to pause migration stream."); + } + return; + } + + migrate_fd_cancel(ms); } void qmp_migrate_continue(MigrationStatus state, Error **errp) diff --git a/qapi/migration.json b/qapi/migration.json index 5dfac0681d..e15bb516cd 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -881,7 +881,8 @@ ## # @migrate_cancel: # -# Cancel the current executing migration process. +# Cancel the current executing migration process for precopy. For +# postcopy, it'll pause the migration instead. # # Returns: nothing on success #
It was allowed in the past to even cancel a postcopy migration, but it does not really make sense, and no one should be using it, since cancelling a migration during postcopy means crashing the VM at no time. Let's just use re-use this command as a way to pause the postcopy migration when we detected that we are in postcopy migration. This can be used when we want to trigger the postcopy recovery manually. Signed-off-by: Peter Xu <peterx@redhat.com> --- hmp-commands.hx | 8 ++++++-- migration/migration.c | 18 +++++++++++++++++- qapi/migration.json | 3 ++- 3 files changed, 25 insertions(+), 4 deletions(-)