diff mbox series

[PULL,30/33] migration: do not restart VM after successful snapshot-load

Message ID 20210511081350.419428-31-pbonzini@redhat.com
State New
Headers show
Series [PULL,01/33] target/i386: Rename helper_fldt, helper_fstt | expand

Commit Message

Paolo Bonzini May 11, 2021, 8:13 a.m. UTC
The HMP loadvm code is calling load_snapshot rather than
qmp_snapshot_load, in order to bypass the job infrastructure.  The code
around it is almost the same, with one difference: hmp_loadvm is
restarting the VM if load_snapshot fails, qmp_snapshot_load is doing so
if load_snapshot succeeds.

Fix the bug in QMP by moving the common code to load_snapshot.

Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 migration/savevm.c | 16 ++++++++--------
 monitor/hmp-cmds.c |  7 +------
 2 files changed, 9 insertions(+), 14 deletions(-)

Comments

Dr. David Alan Gilbert May 11, 2021, 8:56 a.m. UTC | #1
* Paolo Bonzini (pbonzini@redhat.com) wrote:
> The HMP loadvm code is calling load_snapshot rather than
> qmp_snapshot_load, in order to bypass the job infrastructure.  The code
> around it is almost the same, with one difference: hmp_loadvm is
> restarting the VM if load_snapshot fails, qmp_snapshot_load is doing so
> if load_snapshot succeeds.
> 
> Fix the bug in QMP by moving the common code to load_snapshot.

See my comment:
https://lists.gnu.org/archive/html/qemu-devel/2021-05/msg01103.html

but you've also lost Eric's Rb.

Dave

> Cc: qemu-stable@nongnu.org
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  migration/savevm.c | 16 ++++++++--------
>  monitor/hmp-cmds.c |  7 +------
>  2 files changed, 9 insertions(+), 14 deletions(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 52e2d72e4b..a899191cbf 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2992,6 +2992,7 @@ bool load_snapshot(const char *name, const char *vmstate,
>      int ret;
>      AioContext *aio_context;
>      MigrationIncomingState *mis = migration_incoming_get_current();
> +    int saved_vm_running  = runstate_is_running();
>  
>      if (!bdrv_all_can_snapshot(has_devices, devices, errp)) {
>          return false;
> @@ -3024,6 +3025,8 @@ bool load_snapshot(const char *name, const char *vmstate,
>          return false;
>      }
>  
> +    vm_stop(RUN_STATE_RESTORE_VM);
> +
>      /*
>       * Flush the record/replay queue. Now the VM state is going
>       * to change. Therefore we don't need to preserve its consistency
> @@ -3061,13 +3064,17 @@ bool load_snapshot(const char *name, const char *vmstate,
>  
>      if (ret < 0) {
>          error_setg(errp, "Error %d while loading VM state", ret);
> -        return false;
> +        goto err_restart;
>      }
>  
>      return true;
>  
>  err_drain:
>      bdrv_drain_all_end();
> +err_restart:
> +    if (saved_vm_running) {
> +        vm_start();
> +    }
>      return false;
>  }
>  
> @@ -3135,17 +3142,10 @@ static void snapshot_load_job_bh(void *opaque)
>  {
>      Job *job = opaque;
>      SnapshotJob *s = container_of(job, SnapshotJob, common);
> -    int orig_vm_running;
>  
>      job_progress_set_remaining(&s->common, 1);
>  
> -    orig_vm_running = runstate_is_running();
> -    vm_stop(RUN_STATE_RESTORE_VM);
> -
>      s->ret = load_snapshot(s->tag, s->vmstate, true, s->devices, s->errp);
> -    if (s->ret && orig_vm_running) {
> -        vm_start();
> -    }
>  
>      job_progress_update(&s->common, 1);
>  
> diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> index 0ad5b77477..a39436c8cb 100644
> --- a/monitor/hmp-cmds.c
> +++ b/monitor/hmp-cmds.c
> @@ -1127,15 +1127,10 @@ void hmp_balloon(Monitor *mon, const QDict *qdict)
>  
>  void hmp_loadvm(Monitor *mon, const QDict *qdict)
>  {
> -    int saved_vm_running  = runstate_is_running();
>      const char *name = qdict_get_str(qdict, "name");
>      Error *err = NULL;
>  
> -    vm_stop(RUN_STATE_RESTORE_VM);
> -
> -    if (!load_snapshot(name, NULL, false, NULL, &err) && saved_vm_running) {
> -        vm_start();
> -    }
> +    load_snapshot(name, NULL, false, NULL, &err);
>      hmp_handle_error(mon, err);
>  }
>  
> -- 
> 2.26.2
> 
> 
>
Daniel P. Berrangé May 11, 2021, 9:09 a.m. UTC | #2
On Tue, May 11, 2021 at 04:13:47AM -0400, Paolo Bonzini wrote:
> The HMP loadvm code is calling load_snapshot rather than
> qmp_snapshot_load, in order to bypass the job infrastructure.  The code
> around it is almost the same, with one difference: hmp_loadvm is
> restarting the VM if load_snapshot fails, qmp_snapshot_load is doing so
> if load_snapshot succeeds.
> 
> Fix the bug in QMP by moving the common code to load_snapshot.
> 
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  migration/savevm.c | 16 ++++++++--------
>  monitor/hmp-cmds.c |  7 +------
>  2 files changed, 9 insertions(+), 14 deletions(-)

David had a question about safety on this patch that probably
ought to be answered before merging

  https://lists.gnu.org/archive/html/qemu-devel/2021-05/msg01103.html


Regards,
Daniel
Daniel P. Berrangé May 11, 2021, 5:06 p.m. UTC | #3
On Tue, May 11, 2021 at 04:13:47AM -0400, Paolo Bonzini wrote:
> The HMP loadvm code is calling load_snapshot rather than
> qmp_snapshot_load, in order to bypass the job infrastructure.  The code
> around it is almost the same, with one difference: hmp_loadvm is
> restarting the VM if load_snapshot fails, qmp_snapshot_load is doing so
> if load_snapshot succeeds.
> 
> Fix the bug in QMP by moving the common code to load_snapshot.

This doesn't appear to have actually fixed the HMP regression.
Instead I think it is duplicated the HMP bug in the QMP code
too.

See the 068  iotest enhancement here that validates the expected
state on success:

https://lists.gnu.org/archive/html/qemu-devel/2021-05/msg03091.html


Regards,
Daniel
Kevin Wolf May 12, 2021, 7:45 a.m. UTC | #4
Am 11.05.2021 um 19:06 hat Daniel P. Berrangé geschrieben:
> On Tue, May 11, 2021 at 04:13:47AM -0400, Paolo Bonzini wrote:
> > The HMP loadvm code is calling load_snapshot rather than
> > qmp_snapshot_load, in order to bypass the job infrastructure.  The code
> > around it is almost the same, with one difference: hmp_loadvm is
> > restarting the VM if load_snapshot fails, qmp_snapshot_load is doing so
> > if load_snapshot succeeds.
> > 
> > Fix the bug in QMP by moving the common code to load_snapshot.
> 
> This doesn't appear to have actually fixed the HMP regression.
> Instead I think it is duplicated the HMP bug in the QMP code
> too.
> 
> See the 068  iotest enhancement here that validates the expected
> state on success:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2021-05/msg03091.html

I guess I'll wait for this new QMP bug to hit master, and then rebase my
fix on top of it, with now two Fixes: lines.

Or do you want to send a v2 of this pull request without this patch,
Paolo?

Kevin
Paolo Bonzini May 12, 2021, 8:05 a.m. UTC | #5
On 11/05/21 10:56, Dr. David Alan Gilbert wrote:
> * Paolo Bonzini (pbonzini@redhat.com) wrote:
>> The HMP loadvm code is calling load_snapshot rather than
>> qmp_snapshot_load, in order to bypass the job infrastructure.  The code
>> around it is almost the same, with one difference: hmp_loadvm is
>> restarting the VM if load_snapshot fails, qmp_snapshot_load is doing so
>> if load_snapshot succeeds.
>>
>> Fix the bug in QMP by moving the common code to load_snapshot.
> 
> See my comment:
> https://lists.gnu.org/archive/html/qemu-devel/2021-05/msg01103.html
> 
> but you've also lost Eric's Rb.

Sorry, I missed both replies.  As Daniel pointed out, the bug is in the 
QMP version.  Kevin has the correct fix, I'll send the cleanup of 
vm_stop/vm_start separately on top of his patch.

Paolo

> Dave
> 
>> Cc: qemu-stable@nongnu.org
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>   migration/savevm.c | 16 ++++++++--------
>>   monitor/hmp-cmds.c |  7 +------
>>   2 files changed, 9 insertions(+), 14 deletions(-)
>>
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index 52e2d72e4b..a899191cbf 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -2992,6 +2992,7 @@ bool load_snapshot(const char *name, const char *vmstate,
>>       int ret;
>>       AioContext *aio_context;
>>       MigrationIncomingState *mis = migration_incoming_get_current();
>> +    int saved_vm_running  = runstate_is_running();
>>   
>>       if (!bdrv_all_can_snapshot(has_devices, devices, errp)) {
>>           return false;
>> @@ -3024,6 +3025,8 @@ bool load_snapshot(const char *name, const char *vmstate,
>>           return false;
>>       }
>>   
>> +    vm_stop(RUN_STATE_RESTORE_VM);
>> +
>>       /*
>>        * Flush the record/replay queue. Now the VM state is going
>>        * to change. Therefore we don't need to preserve its consistency
>> @@ -3061,13 +3064,17 @@ bool load_snapshot(const char *name, const char *vmstate,
>>   
>>       if (ret < 0) {
>>           error_setg(errp, "Error %d while loading VM state", ret);
>> -        return false;
>> +        goto err_restart;
>>       }
>>   
>>       return true;
>>   
>>   err_drain:
>>       bdrv_drain_all_end();
>> +err_restart:
>> +    if (saved_vm_running) {
>> +        vm_start();
>> +    }
>>       return false;
>>   }
>>   
>> @@ -3135,17 +3142,10 @@ static void snapshot_load_job_bh(void *opaque)
>>   {
>>       Job *job = opaque;
>>       SnapshotJob *s = container_of(job, SnapshotJob, common);
>> -    int orig_vm_running;
>>   
>>       job_progress_set_remaining(&s->common, 1);
>>   
>> -    orig_vm_running = runstate_is_running();
>> -    vm_stop(RUN_STATE_RESTORE_VM);
>> -
>>       s->ret = load_snapshot(s->tag, s->vmstate, true, s->devices, s->errp);
>> -    if (s->ret && orig_vm_running) {
>> -        vm_start();
>> -    }
>>   
>>       job_progress_update(&s->common, 1);
>>   
>> diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
>> index 0ad5b77477..a39436c8cb 100644
>> --- a/monitor/hmp-cmds.c
>> +++ b/monitor/hmp-cmds.c
>> @@ -1127,15 +1127,10 @@ void hmp_balloon(Monitor *mon, const QDict *qdict)
>>   
>>   void hmp_loadvm(Monitor *mon, const QDict *qdict)
>>   {
>> -    int saved_vm_running  = runstate_is_running();
>>       const char *name = qdict_get_str(qdict, "name");
>>       Error *err = NULL;
>>   
>> -    vm_stop(RUN_STATE_RESTORE_VM);
>> -
>> -    if (!load_snapshot(name, NULL, false, NULL, &err) && saved_vm_running) {
>> -        vm_start();
>> -    }
>> +    load_snapshot(name, NULL, false, NULL, &err);
>>       hmp_handle_error(mon, err);
>>   }
>>   
>> -- 
>> 2.26.2
>>
>>
>>
Paolo Bonzini May 12, 2021, 6:11 p.m. UTC | #6
On 12/05/21 09:45, Kevin Wolf wrote:
> I guess I'll wait for this new QMP bug to hit master, and then rebase my
> fix on top of it, with now two Fixes: lines.
> 
> Or do you want to send a v2 of this pull request without this patch,
> Paolo?

I managed to remove it from the tag before Peter pulled, so problem averted.

Paolo
diff mbox series

Patch

diff --git a/migration/savevm.c b/migration/savevm.c
index 52e2d72e4b..a899191cbf 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2992,6 +2992,7 @@  bool load_snapshot(const char *name, const char *vmstate,
     int ret;
     AioContext *aio_context;
     MigrationIncomingState *mis = migration_incoming_get_current();
+    int saved_vm_running  = runstate_is_running();
 
     if (!bdrv_all_can_snapshot(has_devices, devices, errp)) {
         return false;
@@ -3024,6 +3025,8 @@  bool load_snapshot(const char *name, const char *vmstate,
         return false;
     }
 
+    vm_stop(RUN_STATE_RESTORE_VM);
+
     /*
      * Flush the record/replay queue. Now the VM state is going
      * to change. Therefore we don't need to preserve its consistency
@@ -3061,13 +3064,17 @@  bool load_snapshot(const char *name, const char *vmstate,
 
     if (ret < 0) {
         error_setg(errp, "Error %d while loading VM state", ret);
-        return false;
+        goto err_restart;
     }
 
     return true;
 
 err_drain:
     bdrv_drain_all_end();
+err_restart:
+    if (saved_vm_running) {
+        vm_start();
+    }
     return false;
 }
 
@@ -3135,17 +3142,10 @@  static void snapshot_load_job_bh(void *opaque)
 {
     Job *job = opaque;
     SnapshotJob *s = container_of(job, SnapshotJob, common);
-    int orig_vm_running;
 
     job_progress_set_remaining(&s->common, 1);
 
-    orig_vm_running = runstate_is_running();
-    vm_stop(RUN_STATE_RESTORE_VM);
-
     s->ret = load_snapshot(s->tag, s->vmstate, true, s->devices, s->errp);
-    if (s->ret && orig_vm_running) {
-        vm_start();
-    }
 
     job_progress_update(&s->common, 1);
 
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 0ad5b77477..a39436c8cb 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1127,15 +1127,10 @@  void hmp_balloon(Monitor *mon, const QDict *qdict)
 
 void hmp_loadvm(Monitor *mon, const QDict *qdict)
 {
-    int saved_vm_running  = runstate_is_running();
     const char *name = qdict_get_str(qdict, "name");
     Error *err = NULL;
 
-    vm_stop(RUN_STATE_RESTORE_VM);
-
-    if (!load_snapshot(name, NULL, false, NULL, &err) && saved_vm_running) {
-        vm_start();
-    }
+    load_snapshot(name, NULL, false, NULL, &err);
     hmp_handle_error(mon, err);
 }