
[v4,30/32] migration: delay the postcopy-active state switch

Message ID 20171108060130.3772-31-peterx@redhat.com
State New
Series Migration: postcopy failure recovery

Commit Message

Peter Xu Nov. 8, 2017, 6:01 a.m. UTC
Postpone the state switch until we try to start the VM on the
destination side.  The problem is that without doing this there is a
small window in which we can be in this state:

- dst VM is in postcopy-active state,
- main thread is handling MIG_CMD_PACKAGED message, which loads all the
  device states,
- ram load thread is reading memory data from source.

If we fail at this point while reading the migration stream, we will
also switch to the postcopy-paused state, but that is not what we want:
if the device states fail to load, we should fail the migration
directly instead of pausing it.

Postpone the state switch to the point where we have already loaded the
device states and are ready to start running the destination VM.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/savevm.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Dr. David Alan Gilbert Dec. 1, 2017, 12:34 p.m. UTC | #1
* Peter Xu (peterx@redhat.com) wrote:
> Postpone the state switch until we try to start the VM on the
> destination side.  The problem is that without doing this there is a
> small window in which we can be in this state:
> 
> - dst VM is in postcopy-active state,
> - main thread is handling MIG_CMD_PACKAGED message, which loads all the
>   device states,
> - ram load thread is reading memory data from source.
> 
> If we fail at this point while reading the migration stream, we will
> also switch to the postcopy-paused state, but that is not what we want:
> if the device states fail to load, we should fail the migration
> directly instead of pausing it.
> 
> Postpone the state switch to the point where we have already loaded the
> device states and are ready to start running the destination VM.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

If it's the only way, then this is OK; but I'd prefer you use a separate
flag somewhere to track this, because this means that POSTCOPY_ACTIVE on
the destination happens at a different point than it does on the source
(and changing it on the source, I think, will break lots of things).
Can't you use the PostcopyState value and check if it's in
POSTCOPY_INCOMING_RUNNING?

Dave

> ---
>  migration/savevm.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index bc87b0e5b1..3bc792e320 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1584,8 +1584,6 @@ static void *postcopy_ram_listen_thread(void *opaque)
>      QEMUFile *f = mis->from_src_file;
>      int load_res;
>  
> -    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> -                                   MIGRATION_STATUS_POSTCOPY_ACTIVE);
>      qemu_sem_post(&mis->listen_thread_sem);
>      trace_postcopy_ram_listen_thread_start();
>  
> @@ -1748,6 +1746,14 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
>          return -1;
>      }
>  
> +    /*
> +     * Declare that we are in postcopy now.  We should already have
> +     * all the device states loaded and the ram load thread running
> +     * when we reach here.
> +     */
> +    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> +                                   MIGRATION_STATUS_POSTCOPY_ACTIVE);
> +
>      data = g_new(HandleRunBhData, 1);
>      data->bh = qemu_bh_new(loadvm_postcopy_handle_run_bh, data);
>      qemu_bh_schedule(data->bh);
> -- 
> 2.13.6
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Peter Xu Dec. 4, 2017, 4:14 a.m. UTC | #2
On Fri, Dec 01, 2017 at 12:34:32PM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > Postpone the state switch until we try to start the VM on the
> > destination side.  The problem is that without doing this there is a
> > small window in which we can be in this state:
> > 
> > - dst VM is in postcopy-active state,
> > - main thread is handling MIG_CMD_PACKAGED message, which loads all
> >   the device states,
> > - ram load thread is reading memory data from source.
> > 
> > If we fail at this point while reading the migration stream, we will
> > also switch to the postcopy-paused state, but that is not what we
> > want: if the device states fail to load, we should fail the migration
> > directly instead of pausing it.
> > 
> > Postpone the state switch to the point where we have already loaded
> > the device states and are ready to start running the destination VM.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> If it's the only way, then this is OK; but I'd prefer you use a separate
> flag somewhere to track this, because this means that POSTCOPY_ACTIVE on
> the destination happens at a different point than it does on the source
> (and changing it on the source, I think, will break lots of things).

Yes, I thought it was fine to postpone it a bit, to a point even after
receiving the packaged data, but I fully understand your point.

> Can't you use the PostcopyState value and check if it's in
> POSTCOPY_INCOMING_RUNNING?

I think, yes. :)

Let me drop this patch; instead I'll check explicitly for the
POSTCOPY_INCOMING_RUNNING state in patch 6.  Thanks,

Patch

diff --git a/migration/savevm.c b/migration/savevm.c
index bc87b0e5b1..3bc792e320 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1584,8 +1584,6 @@ static void *postcopy_ram_listen_thread(void *opaque)
     QEMUFile *f = mis->from_src_file;
     int load_res;
 
-    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
-                                   MIGRATION_STATUS_POSTCOPY_ACTIVE);
     qemu_sem_post(&mis->listen_thread_sem);
     trace_postcopy_ram_listen_thread_start();
 
@@ -1748,6 +1746,14 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
         return -1;
     }
 
+    /*
+     * Declare that we are in postcopy now.  We should already have
+     * all the device states loaded and the ram load thread running
+     * when we reach here.
+     */
+    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+                                   MIGRATION_STATUS_POSTCOPY_ACTIVE);
+
     data = g_new(HandleRunBhData, 1);
     data->bh = qemu_bh_new(loadvm_postcopy_handle_run_bh, data);
     qemu_bh_schedule(data->bh);