Message ID | 20190923174942.12182-1-dgilbert@redhat.com |
---|---|
State | New |
Series | migration/postcopy: Recognise the recovery states as 'in_postcopy' |

Dr. David Alan Gilbert (git) <dgilbert@redhat.com> writes:

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Various parts of the migration code do different things when they're
> in postcopy mode; prior to this patch this has been 'postcopy-active'.
> This patch extends 'in_postcopy' to include 'postcopy-paused' and
> 'postcopy-recover'.
>
> In particular, when you set the max-postcopy-bandwidth parameter, this
> only affects the current migration fd if we're 'in_postcopy';
> this leads to a race in the postcopy recovery test where it increases
> the speed from 4k/sec to unlimited, but that increase can get ignored
> if the change is made between the point at which the reconnection
> happens and it transitions back to active.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

I'm stress testing it now.

> ---
>  migration/migration.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 01863a95f5..5f7e4d15e9 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1659,7 +1659,14 @@ bool migration_in_postcopy(void)
>  {
>      MigrationState *s = migrate_get_current();
>
> -    return (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> +    switch (s->state) {
> +    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> +    case MIGRATION_STATUS_POSTCOPY_PAUSED:
> +    case MIGRATION_STATUS_POSTCOPY_RECOVER:
> +        return true;
> +    default:
> +        return false;
> +    }
>  }
>
>  bool migration_in_postcopy_after_devices(MigrationState *s)

--
Alex Bennée

On Mon, Sep 23, 2019 at 06:49:42PM +0100, Dr. David Alan Gilbert (git) wrote:

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Various parts of the migration code do different things when they're
> in postcopy mode; prior to this patch this has been 'postcopy-active'.
> This patch extends 'in_postcopy' to include 'postcopy-paused' and
> 'postcopy-recover'.
>
> In particular, when you set the max-postcopy-bandwidth parameter, this
> only affects the current migration fd if we're 'in_postcopy';
> this leads to a race in the postcopy recovery test where it increases
> the speed from 4k/sec to unlimited, but that increase can get ignored
> if the change is made between the point at which the reconnection
> happens and it transitions back to active.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Yeh this makes quite a lot of sense to me...

Reviewed-by: Peter Xu <peterx@redhat.com>

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Various parts of the migration code do different things when they're
> in postcopy mode; prior to this patch this has been 'postcopy-active'.
> This patch extends 'in_postcopy' to include 'postcopy-paused' and
> 'postcopy-recover'.
>
> In particular, when you set the max-postcopy-bandwidth parameter, this
> only affects the current migration fd if we're 'in_postcopy';
> this leads to a race in the postcopy recovery test where it increases
> the speed from 4k/sec to unlimited, but that increase can get ignored
> if the change is made between the point at which the reconnection
> happens and it transitions back to active.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

Dr. David Alan Gilbert (git) <dgilbert@redhat.com> writes:

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Various parts of the migration code do different things when they're
> in postcopy mode; prior to this patch this has been 'postcopy-active'.
> This patch extends 'in_postcopy' to include 'postcopy-paused' and
> 'postcopy-recover'.
>
> In particular, when you set the max-postcopy-bandwidth parameter, this
> only affects the current migration fd if we're 'in_postcopy';
> this leads to a race in the postcopy recovery test where it increases
> the speed from 4k/sec to unlimited, but that increase can get ignored
> if the change is made between the point at which the reconnection
> happens and it transitions back to active.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

In my xenial stress test I run 100 times and it never triggered the 180s
timeout I set on my retry.py script:

Tested-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  migration/migration.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 01863a95f5..5f7e4d15e9 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1659,7 +1659,14 @@ bool migration_in_postcopy(void)
>  {
>      MigrationState *s = migrate_get_current();
>
> -    return (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> +    switch (s->state) {
> +    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> +    case MIGRATION_STATUS_POSTCOPY_PAUSED:
> +    case MIGRATION_STATUS_POSTCOPY_RECOVER:
> +        return true;
> +    default:
> +        return false;
> +    }
>  }
>
>  bool migration_in_postcopy_after_devices(MigrationState *s)

--
Alex Bennée

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> writes:

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Various parts of the migration code do different things when they're
> in postcopy mode; prior to this patch this has been 'postcopy-active'.
> This patch extends 'in_postcopy' to include 'postcopy-paused' and
> 'postcopy-recover'.
>
> In particular, when you set the max-postcopy-bandwidth parameter, this
> only affects the current migration fd if we're 'in_postcopy';
> this leads to a race in the postcopy recovery test where it increases
> the speed from 4k/sec to unlimited, but that increase can get ignored
> if the change is made between the point at which the reconnection
> happens and it transitions back to active.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

This seems to fix the intermittent hangs I observed and bisected to
commit 8504ddeca0 "migration: Fix postcopy bw for recovery".

Tested-by: Markus Armbruster <armbru@redhat.com>

* Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Various parts of the migration code do different things when they're
> in postcopy mode; prior to this patch this has been 'postcopy-active'.
> This patch extends 'in_postcopy' to include 'postcopy-paused' and
> 'postcopy-recover'.
>
> In particular, when you set the max-postcopy-bandwidth parameter, this
> only affects the current migration fd if we're 'in_postcopy';
> this leads to a race in the postcopy recovery test where it increases
> the speed from 4k/sec to unlimited, but that increase can get ignored
> if the change is made between the point at which the reconnection
> happens and it transitions back to active.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Queued

> ---
>  migration/migration.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 01863a95f5..5f7e4d15e9 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1659,7 +1659,14 @@ bool migration_in_postcopy(void)
>  {
>      MigrationState *s = migrate_get_current();
>
> -    return (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> +    switch (s->state) {
> +    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> +    case MIGRATION_STATUS_POSTCOPY_PAUSED:
> +    case MIGRATION_STATUS_POSTCOPY_RECOVER:
> +        return true;
> +    default:
> +        return false;
> +    }
>  }
>
>  bool migration_in_postcopy_after_devices(MigrationState *s)
> --
> 2.21.0
>
>

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

diff --git a/migration/migration.c b/migration/migration.c
index 01863a95f5..5f7e4d15e9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1659,7 +1659,14 @@ bool migration_in_postcopy(void)
 {
     MigrationState *s = migrate_get_current();
 
-    return (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
+    switch (s->state) {
+    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_PAUSED:
+    case MIGRATION_STATUS_POSTCOPY_RECOVER:
+        return true;
+    default:
+        return false;
+    }
 }
 
 bool migration_in_postcopy_after_devices(MigrationState *s)
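
The race described in the commit message can be illustrated with a small standalone model. This is only a sketch and not QEMU code: the `Status` enum, the `Model` struct, `set_postcopy_bandwidth()` and `postcopy_demo()` are hypothetical names invented for the example, the value 4096 merely stands in for the test's "4k/sec", and treating 0 as "unlimited" is an assumption of the model. It shows how a bandwidth update guarded by the old predicate would be dropped while the state is 'postcopy-recover', whereas the extended predicate lets it through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the migration states; not QEMU's real enum. */
typedef enum {
    STATUS_ACTIVE,
    STATUS_POSTCOPY_ACTIVE,
    STATUS_POSTCOPY_PAUSED,
    STATUS_POSTCOPY_RECOVER,
    STATUS_COMPLETED,
} Status;

/* Predicate as it was before the patch: only postcopy-active counts. */
bool in_postcopy_old(Status s)
{
    return s == STATUS_POSTCOPY_ACTIVE;
}

/* Predicate as extended by the patch: the recovery states count too. */
bool in_postcopy_new(Status s)
{
    switch (s) {
    case STATUS_POSTCOPY_ACTIVE:
    case STATUS_POSTCOPY_PAUSED:
    case STATUS_POSTCOPY_RECOVER:
        return true;
    default:
        return false;
    }
}

/* Hypothetical model of a migration with a rate-limited channel. */
typedef struct {
    Status state;
    int64_t channel_limit; /* bytes/sec; 0 means unlimited in this model */
} Model;

/*
 * Apply a new limit to the live channel only when the supplied predicate
 * says the migration is in postcopy. Returns true if the limit was applied.
 */
bool set_postcopy_bandwidth(Model *m, int64_t limit,
                            bool (*in_postcopy)(Status))
{
    if (!in_postcopy(m->state)) {
        return false; /* update silently ignored for the live channel */
    }
    m->channel_limit = limit;
    return true;
}

/* Walk through the window between reconnection and the return to active. */
void postcopy_demo(void)
{
    Model m = { STATUS_POSTCOPY_RECOVER, 4096 };

    /* Old predicate: the increase to unlimited is lost. */
    assert(!set_postcopy_bandwidth(&m, 0, in_postcopy_old));
    assert(m.channel_limit == 4096);

    /* New predicate: the increase takes effect during recovery. */
    assert(set_postcopy_bandwidth(&m, 0, in_postcopy_new));
    assert(m.channel_limit == 0);
}
```

Calling `postcopy_demo()` from a `main()` exercises both paths; in the model, the old predicate leaves the channel throttled at the low rate, which is the kind of situation that would make a recovery test appear to hang.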