Message ID | 20230130080307.1792-6-quintela@redhat.com |
---|---|
State | New |
Headers | show |
Series | [PULL,1/5] migration: Fix migration crash when target psize larger than host | expand |
On 30.01.23 11:03, Juan Quintela wrote: > Signed-off-by: Juan Quintela <quintela@redhat.com> > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > --- > migration/migration.c | 24 ++++++++++++------------ > 1 file changed, 12 insertions(+), 12 deletions(-) > > diff --git a/migration/migration.c b/migration/migration.c > index 594a42f085..644c61e91d 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -3764,23 +3764,23 @@ static MigIterateState migration_iteration_run(MigrationState *s) > pend_pre, pend_compat, pend_post); > } > > - if (pending_size && pending_size >= s->threshold_size) { > - /* Still a significant amount to transfer */ > - if (!in_postcopy && pend_pre <= s->threshold_size && > - qatomic_read(&s->start_postcopy)) { > - if (postcopy_start(s)) { > - error_report("%s: postcopy failed to start", __func__); > - } > - return MIG_ITERATE_SKIP; > - } > - /* Just another iteration step */ > - qemu_savevm_state_iterate(s->to_dst_file, in_postcopy); > - } else { > + if (pending_size < s->threshold_size) { to keep the logic, formally it should be "if (!pending_size || pending_size < s->threshold_size)"... Actually, could s->threshold_size be 0 here? Or, worth an assertion assert(s->threshold_size) ? > trace_migration_thread_low_pending(pending_size); > migration_completion(s); > return MIG_ITERATE_BREAK; > } > > + /* Still a significant amount to transfer */ > + if (!in_postcopy && pend_pre <= s->threshold_size && > + qatomic_read(&s->start_postcopy)) { > + if (postcopy_start(s)) { > + error_report("%s: postcopy failed to start", __func__); > + } > + return MIG_ITERATE_SKIP; > + } > + > + /* Just another iteration step */ > + qemu_savevm_state_iterate(s->to_dst_file, in_postcopy); > return MIG_ITERATE_RESUME; > } >
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> wrote: > On 30.01.23 11:03, Juan Quintela wrote: >> Signed-off-by: Juan Quintela <quintela@redhat.com> >> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> >> --- >> migration/migration.c | 24 ++++++++++++------------ >> 1 file changed, 12 insertions(+), 12 deletions(-) >> diff --git a/migration/migration.c b/migration/migration.c >> index 594a42f085..644c61e91d 100644 >> --- a/migration/migration.c >> +++ b/migration/migration.c >> @@ -3764,23 +3764,23 @@ static MigIterateState migration_iteration_run(MigrationState *s) >> pend_pre, pend_compat, pend_post); >> } >> - if (pending_size && pending_size >= s->threshold_size) { >> - /* Still a significant amount to transfer */ >> - if (!in_postcopy && pend_pre <= s->threshold_size && >> - qatomic_read(&s->start_postcopy)) { >> - if (postcopy_start(s)) { >> - error_report("%s: postcopy failed to start", __func__); >> - } >> - return MIG_ITERATE_SKIP; >> - } >> - /* Just another iteration step */ >> - qemu_savevm_state_iterate(s->to_dst_file, in_postcopy); >> - } else { >> + if (pending_size < s->threshold_size) { > > to keep the logic, formally it should be "if (!pending_size || pending_size < s->threshold_size)"... > Actually, could s->threshold_size be 0 here? Or, worth an assertion assert(s->threshold_size) ? It is weird, but it could: bandwidth = (double)transferred / time_spent; s->threshold_size = bandwidth * s->parameters.downtime_limit; That is the (real) only place when we set it, and if the network was completely down, bandwidth could be zero. But I think that it is better to explain in the other way. What does the current code do: if (too much dirty memory to converge) do another iteration else do final iteration What the new code do is if (low enough memory to converge) do final iteration. do another iteration. So, how we decide if we converge? if pending_size < s->threshold_size we "could" change it to pending_size <= s->threshold_size for the zero case But I think that we would be shooting ourselves in the foot, because we are having network trouble (only reason that s->theshold_size could be zero) and we still have all the devices state to migrate. And once that we enter that state, there is no way back, guest is stopped in source and we are not able to send anything else. So I think it is better this way. What do you think? Later, Juan.
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> wrote: > On 30.01.23 11:03, Juan Quintela wrote: >> Signed-off-by: Juan Quintela <quintela@redhat.com> >> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> >> --- >> migration/migration.c | 24 ++++++++++++------------ >> 1 file changed, 12 insertions(+), 12 deletions(-) >> diff --git a/migration/migration.c b/migration/migration.c >> index 594a42f085..644c61e91d 100644 >> --- a/migration/migration.c >> +++ b/migration/migration.c >> @@ -3764,23 +3764,23 @@ static MigIterateState migration_iteration_run(MigrationState *s) >> pend_pre, pend_compat, pend_post); >> } >> - if (pending_size && pending_size >= s->threshold_size) { >> - /* Still a significant amount to transfer */ >> - if (!in_postcopy && pend_pre <= s->threshold_size && >> - qatomic_read(&s->start_postcopy)) { >> - if (postcopy_start(s)) { >> - error_report("%s: postcopy failed to start", __func__); >> - } >> - return MIG_ITERATE_SKIP; >> - } >> - /* Just another iteration step */ >> - qemu_savevm_state_iterate(s->to_dst_file, in_postcopy); >> - } else { >> + if (pending_size < s->threshold_size) { > > to keep the logic, formally it should be "if (!pending_size || pending_size < s->threshold_size)"... And here I am, back. To stand corrected O:-) I have to do the change that you suggested. Why? Because it never ever happens with "real" migrations. But qemu-iotest 181, well, sometimes it fails. Never when you compile x86_64. You need to compile all architectures. And it fails only sometimes. Ok, changed to if (!pending_size || pending_size < s->threshold_size) Thanks, Juan. > Actually, could s->threshold_size be 0 here? Or, worth an assertion > assert(s->threshold_size) ? This test in particular sets the "bandwidth_limit" to 4k, that is one page, so it could be that gets zero. Later, Juan.
diff --git a/migration/migration.c b/migration/migration.c index 594a42f085..644c61e91d 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -3764,23 +3764,23 @@ static MigIterateState migration_iteration_run(MigrationState *s) pend_pre, pend_compat, pend_post); } - if (pending_size && pending_size >= s->threshold_size) { - /* Still a significant amount to transfer */ - if (!in_postcopy && pend_pre <= s->threshold_size && - qatomic_read(&s->start_postcopy)) { - if (postcopy_start(s)) { - error_report("%s: postcopy failed to start", __func__); - } - return MIG_ITERATE_SKIP; - } - /* Just another iteration step */ - qemu_savevm_state_iterate(s->to_dst_file, in_postcopy); - } else { + if (pending_size < s->threshold_size) { trace_migration_thread_low_pending(pending_size); migration_completion(s); return MIG_ITERATE_BREAK; } + /* Still a significant amount to transfer */ + if (!in_postcopy && pend_pre <= s->threshold_size && + qatomic_read(&s->start_postcopy)) { + if (postcopy_start(s)) { + error_report("%s: postcopy failed to start", __func__); + } + return MIG_ITERATE_SKIP; + } + + /* Just another iteration step */ + qemu_savevm_state_iterate(s->to_dst_file, in_postcopy); return MIG_ITERATE_RESUME; }