Message ID | 20171108060130.3772-31-peterx@redhat.com
---|---
State | New
Series | Migration: postcopy failure recovery
* Peter Xu (peterx@redhat.com) wrote:
> Switch the state until we try to start the VM on destination side. The
> problem is that without doing this we may have a very small window that
> we'll be in such a state:
>
> - dst VM is in postcopy-active state,
> - main thread is handling MIG_CMD_PACKAGED message, which loads all the
>   device states,
> - ram load thread is reading memory data from source.
>
> Then if we failed at this point when reading the migration stream we'll
> also switch to postcopy-paused state, but that is not what we want. If
> device states failed to load, we should fail the migration directly
> instead of pause.
>
> Postponing the state switch to the point when we have already loaded the
> devices' states and been ready to start running destination VM.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

If it's the only way, then this is OK; but I'd prefer you use a separate
flag somewhere to let you know this, because this means that
POSTCOPY_ACTIVE on the destination happens at a different point than it
does on the source (and changing it on the source I think will break
lots of things).

Can't you use the PostcopyState value and check if it's in
POSTCOPY_INCOMING_RUNNING?

Dave

> ---
>  migration/savevm.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/migration/savevm.c b/migration/savevm.c
> index bc87b0e5b1..3bc792e320 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1584,8 +1584,6 @@ static void *postcopy_ram_listen_thread(void *opaque)
>      QEMUFile *f = mis->from_src_file;
>      int load_res;
>
> -    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> -                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
>      qemu_sem_post(&mis->listen_thread_sem);
>      trace_postcopy_ram_listen_thread_start();
>
> @@ -1748,6 +1746,14 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
>          return -1;
>      }
>
> +    /*
> +     * Declare that we are in postcopy now. We should already have
> +     * all the device states loaded ready when reach here, and also
> +     * the ram load thread running.
> +     */
> +    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> +                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
> +
>      data = g_new(HandleRunBhData, 1);
>      data->bh = qemu_bh_new(loadvm_postcopy_handle_run_bh, data);
>      qemu_bh_schedule(data->bh);
> --
> 2.13.6

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Fri, Dec 01, 2017 at 12:34:32PM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > Switch the state until we try to start the VM on destination side. The
> > problem is that without doing this we may have a very small window that
> > we'll be in such a state:
> >
> > - dst VM is in postcopy-active state,
> > - main thread is handling MIG_CMD_PACKAGED message, which loads all the
> >   device states,
> > - ram load thread is reading memory data from source.
> >
> > Then if we failed at this point when reading the migration stream we'll
> > also switch to postcopy-paused state, but that is not what we want. If
> > device states failed to load, we should fail the migration directly
> > instead of pause.
> >
> > Postponing the state switch to the point when we have already loaded the
> > devices' states and been ready to start running destination VM.
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
>
> If it's the only way, then this is OK; but I'd prefer you use a separate
> flag somewhere to let you know this, because this means that
> POSTCOPY_ACTIVE on the destination happens at a different point that it
> does on the source (and changing it on the source I think will break
> lots of things).

Yes, I thought it was fine to postpone it a bit to the point even after
receiving the packed data, but I fully understand your point.

> Can't you use the PostcopyState value and check if it's in
> POSTCOPY_INCOMING_RUNNING?

I think, yes. :) Let me drop this patch; instead I'll check explicitly
on POSTCOPY_INCOMING_RUNNING state in patch 6.

Thanks,
diff --git a/migration/savevm.c b/migration/savevm.c
index bc87b0e5b1..3bc792e320 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1584,8 +1584,6 @@ static void *postcopy_ram_listen_thread(void *opaque)
     QEMUFile *f = mis->from_src_file;
     int load_res;
 
-    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
-                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
     qemu_sem_post(&mis->listen_thread_sem);
     trace_postcopy_ram_listen_thread_start();
 
@@ -1748,6 +1746,14 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
         return -1;
     }
 
+    /*
+     * Declare that we are in postcopy now. We should already have
+     * all the device states loaded ready when reach here, and also
+     * the ram load thread running.
+     */
+    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
+
     data = g_new(HandleRunBhData, 1);
     data->bh = qemu_bh_new(loadvm_postcopy_handle_run_bh, data);
     qemu_bh_schedule(data->bh);
Postpone the postcopy-active state switch until we try to start the VM
on the destination side. The problem is that without doing this we have
a very small window in which we can be in the following state:

- dst VM is in postcopy-active state,
- main thread is handling the MIG_CMD_PACKAGED message, which loads all
  the device states,
- ram load thread is reading memory data from the source.

If we fail at this point while reading the migration stream, we will
also switch to the postcopy-paused state, but that is not what we want.
If the device states fail to load, we should fail the migration directly
instead of pausing.

Postpone the state switch to the point when we have already loaded the
devices' states and are ready to start running the destination VM.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/savevm.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)