Message ID | 20210708190653.252961-3-peterx@redhat.com |
---|---|
State | New |
Headers | show |
Series | migration: Three more fixes for postcopy recovery | expand |
* Peter Xu (peterx@redhat.com) wrote: > Below process could crash qemu with postcopy recovery: > > 1. (hmp) migrate -d .. > 2. (hmp) migrate_start_postcopy > 3. [network down, postcopy paused] > 4. (hmp) migrate -r $WRONG_PORT > when try the recover on an invalid $WRONG_PORT, cleanup_bh will be cleared > 5. (hmp) migrate -r $RIGHT_PORT > [qemu crash on assert(cleanup_bh)] > > The thing is we shouldn't cleanup if it's postcopy resume; the error is set > mostly because the channel is wrong, so we return directly waiting for the user > to retry. > > migrate_fd_cleanup() should only be called when migration is cancelled or > completed. > > Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > --- > migration/migration.c | 13 ++++++++++++- > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/migration/migration.c b/migration/migration.c > index 8786104c9a..bb1edf862a 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -3975,7 +3975,18 @@ void migrate_fd_connect(MigrationState *s, Error *error_in) > } > if (error_in) { > migrate_fd_error(s, error_in); > - migrate_fd_cleanup(s); > + if (resume) { > + /* > + * Don't do cleanup for resume if channel is invalid, but only dump > + * the error. We wait for another channel connect from the user. > + * The error_report still gives HMP user a hint on what failed. > + * It's normally done in migrate_fd_cleanup(), but call it here > + * explicitly. > + */ > + error_report_err(error_copy(s->error)); > + } else { > + migrate_fd_cleanup(s); > + } > return; > } > > -- > 2.31.1 >
diff --git a/migration/migration.c b/migration/migration.c index 8786104c9a..bb1edf862a 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -3975,7 +3975,18 @@ void migrate_fd_connect(MigrationState *s, Error *error_in) } if (error_in) { migrate_fd_error(s, error_in); - migrate_fd_cleanup(s); + if (resume) { + /* + * Don't do cleanup for resume if channel is invalid, but only dump + * the error. We wait for another channel connect from the user. + * The error_report still gives HMP user a hint on what failed. + * It's normally done in migrate_fd_cleanup(), but call it here + * explicitly. + */ + error_report_err(error_copy(s->error)); + } else { + migrate_fd_cleanup(s); + } return; }
Below process could crash qemu with postcopy recovery: 1. (hmp) migrate -d .. 2. (hmp) migrate_start_postcopy 3. [network down, postcopy paused] 4. (hmp) migrate -r $WRONG_PORT when try the recover on an invalid $WRONG_PORT, cleanup_bh will be cleared 5. (hmp) migrate -r $RIGHT_PORT [qemu crash on assert(cleanup_bh)] The thing is we shouldn't cleanup if it's postcopy resume; the error is set mostly because the channel is wrong, so we return directly waiting for the user to retry. migrate_fd_cleanup() should only be called when migration is cancelled or completed. Signed-off-by: Peter Xu <peterx@redhat.com> --- migration/migration.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)