Message ID | 1374501718-2581-8-git-send-email-mrhines@linux.vnet.ibm.com |
---|---|
State | New |
Headers | show |
On 07/22/2013 08:01 AM, mrhines@linux.vnet.ibm.com wrote: > From: "Michael R. Hines" <mrhines@us.ibm.com> > > As described in the previous patch, until now, the MIG_STATE_SETUP > state was not really a 'formal' state. It has been used as a 'zero' state > (what we're calling 'NONE' here) and QEMU has been unconditionally transitioning > into this state when the QMP migration command was called. Instead we want to > introduce MIG_STATE_NONE, which is our starting state in the state machine, and > then immediately transition into the MIG_STATE_SETUP state when the QMP migrate > command is issued. > > In order to do this, we must delay the transition into MIG_STATE_ACTIVE until > later in the migration_thread(). This is done to be able to timestamp the amount of > time spent in the SETUP state for proper accounting to the user during > an RDMA migration. > > Furthermore, the management software, until now, has never been aware of the > existence of the SETUP state whatsoever. This must change, because, timing of this > state implies that the state actually exists. > > These two patches cannot be separated because the 'query_migrate' QMP > switch statement needs to know how to handle this new state transition. This patch causes an issue with libvirt migration: https://bugzilla.redhat.com/show_bug.cgi?id=1015636 > + case MIG_STATE_SETUP: > + info->has_status = true; > + info->status = g_strdup("setup"); > + break; You are now returning a state that older libvirt versions are not expecting. Obviously, we can patch newer libvirt to make migration work again, but should we be thinking about damage control by stating that an older management app should still be able to drive migration on a new qemu? Or is it acceptable to state that newer qemu requires newer management tools? If we try to support this working under older management tools, maybe the approach is that we have to add some new migration capability; newer management tools set the capability to true and are therefore allowed to see the new state name; older management tools do not set the capability and when that is the case we guarantee that we do not return a state string that the older tool is not expecting. Thoughts on whether we should pursue this?
Il 08/10/2013 16:49, Eric Blake ha scritto: > You are now returning a state that older libvirt versions are not > expecting. Obviously, we can patch newer libvirt to make migration work > again, but should we be thinking about damage control by stating that an > older management app should still be able to drive migration on a new > qemu? Or is it acceptable to state that newer qemu requires newer > management tools? We strive for that, but sometimes we break because we do not really know what management expects from QEMU. In this case, in particular, I think a capability is a bit overkill. Libvirt needs to be somewhat liberal in what it accepts; in this case it can treat any unknown state as "active". Paolo > If we try to support this working under older management tools, maybe > the approach is that we have to add some new migration capability; newer > management tools set the capability to true and are therefore allowed to > see the new state name; older management tools do not set the capability > and when that is the case we guarantee that we do not return a state > string that the older tool is not expecting. Thoughts on whether we > should pursue this? >
On 10/08/2013 12:05 PM, Paolo Bonzini wrote: > Il 08/10/2013 16:49, Eric Blake ha scritto: >> You are now returning a state that older libvirt versions are not >> expecting. Obviously, we can patch newer libvirt to make migration work >> again, but should we be thinking about damage control by stating that an >> older management app should still be able to drive migration on a new >> qemu? Or is it acceptable to state that newer qemu requires newer >> management tools? > We strive for that, but sometimes we break because we do not really know > what management expects from QEMU. > > In this case, in particular, I think a capability is a bit overkill. > Libvirt needs to be somewhat liberal in what it accepts; in this case it > can treat any unknown state as "active". > > Paolo That makes sense to me too - the setup state is "effectively" the same as active and can be safely treated as such. There's a similar issue MigrationInfo statistics - what's the backwards/forwards-compatible procedure for not breaking libvirt when new statics appear? - such as "setup-time", which timestamps the new state that was introduced? - Michael for adding new statistics - Michael
On 10/08/2013 11:32 AM, Michael R. Hines wrote: >> >> In this case, in particular, I think a capability is a bit overkill. >> Libvirt needs to be somewhat liberal in what it accepts; in this case it >> can treat any unknown state as "active". >> >> Paolo > > That makes sense to me too - the setup state is "effectively" the same > as active and can be safely treated as such. > > There's a similar issue MigrationInfo statistics - what's the > backwards/forwards-compatible procedure for not breaking libvirt > when new statics appear? - such as "setup-time", which timestamps > the new state that was introduced? Libvirt ignores fields it does not expect. But for fields it expects, it was not ignoring new enum values within those fields that it was not expecting. The 'setup-time' change is not a problem (new field); only the new state ('setup') - and even then, Paolo is right that newer libvirt will be taught to be tolerant of unknown states, and that it is okay to require newer libvirt when using newer qemu (for example, when qemu 1.0 first came out, you had to upgrade to a new-enough libvirt that knew what to do with the different version number). So sounds like nothing further to worry about in qemu.
On Oct 8, 2013, at 12:05 PM, Paolo Bonzini <pbonzini@redhat.com> wrote: > Il 08/10/2013 16:49, Eric Blake ha scritto: >> You are now returning a state that older libvirt versions are not >> expecting. Obviously, we can patch newer libvirt to make migration work >> again, but should we be thinking about damage control by stating that an >> older management app should still be able to drive migration on a new >> qemu? Or is it acceptable to state that newer qemu requires newer >> management tools? > > We strive for that, but sometimes we break because we do not really know > what management expects from QEMU. > > In this case, in particular, I think a capability is a bit overkill. > Libvirt needs to be somewhat liberal in what it accepts; in this case it > can treat any unknown state as "active". > > Paolo > >> If we try to support this working under older management tools, maybe >> the approach is that we have to add some new migration capability; newer >> management tools set the capability to true and are therefore allowed to >> see the new state name; older management tools do not set the capability >> and when that is the case we guarantee that we do not return a state >> string that the older tool is not expecting. Thoughts on whether we >> should pursue this? > >
diff --git a/migration.c b/migration.c index d212871..c259d93 100644 --- a/migration.c +++ b/migration.c @@ -36,7 +36,8 @@ #endif enum { - MIG_STATE_ERROR, + MIG_STATE_ERROR = -1, + MIG_STATE_NONE, MIG_STATE_SETUP, MIG_STATE_CANCELLED, MIG_STATE_ACTIVE, @@ -63,7 +64,7 @@ static NotifierList migration_state_notifiers = MigrationState *migrate_get_current(void) { static MigrationState current_migration = { - .state = MIG_STATE_SETUP, + .state = MIG_STATE_NONE, .bandwidth_limit = MAX_THROTTLE, .xbzrle_cache_size = DEFAULT_MIGRATE_CACHE_SIZE, .mbps = -1, @@ -184,9 +185,13 @@ MigrationInfo *qmp_query_migrate(Error **errp) MigrationState *s = migrate_get_current(); switch (s->state) { - case MIG_STATE_SETUP: + case MIG_STATE_NONE: /* no migration has happened ever */ break; + case MIG_STATE_SETUP: + info->has_status = true; + info->status = g_strdup("setup"); + break; case MIG_STATE_ACTIVE: info->has_status = true; info->status = g_strdup("active"); @@ -257,7 +262,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params, MigrationState *s = migrate_get_current(); MigrationCapabilityStatusList *cap; - if (s->state == MIG_STATE_ACTIVE) { + if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP) { error_set(errp, QERR_MIGRATION_ACTIVE); return; } @@ -392,7 +397,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, params.blk = blk; params.shared = inc; - if (s->state == MIG_STATE_ACTIVE) { + if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP) { error_set(errp, QERR_MIGRATION_ACTIVE); return; } @@ -524,6 +529,8 @@ static void *migration_thread(void *opaque) DPRINTF("beginning savevm\n"); qemu_savevm_state_begin(s->file, &s->params); + migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ACTIVE); + while (s->state == MIG_STATE_ACTIVE) { int64_t current_time; uint64_t pending_size; @@ -603,8 +610,8 @@ static void *migration_thread(void *opaque) void migrate_fd_connect(MigrationState *s) { - s->state = MIG_STATE_ACTIVE; - trace_migrate_set_state(MIG_STATE_ACTIVE); + s->state = MIG_STATE_SETUP; + trace_migrate_set_state(MIG_STATE_SETUP); /* This is a best 1st approximation. ns to ms */ s->expected_downtime = max_downtime/1000000;