Message ID | 20210629050522.147057-1-leobras@redhat.com |
---|---|
State | New |
Series | [v2,1/1] migration: Unregister yank if migration setup fails |
* Leonardo Bras (leobras@redhat.com) wrote:
> Currently, if a qemu instance is started with "-incoming defer" and
> an incorrect parameter is passed to "migrate_incoming", it will print the
> expected error and reply with "duplicate yank instance" for any upcoming
> "migrate_incoming" command.
>
> This renders current qemu process unusable, and requires a new qemu
> process to be started before accepting a migration.
>
> This is caused by a yank_register_instance() that happens in
> qemu_start_incoming_migration() but is never reverted if any error
> happens.
>
> Solves this by unregistering the instance if anything goes wrong
> in the function, allowing a new "migrate_incoming" command to be
> accepted.
>
> Fixes: b5eea99ec2f ("migration: Add yank feature", 2021-01-13)
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1974366
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>
>
> ---
> Changes since v1:
> - Add ERRP_GUARD() at the beginning of the function, so it deals with
>   errp passed as NULL, and does correct error propagation.

Thanks;

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 4228635d18..af0c72609f 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -454,6 +454,7 @@ void migrate_add_address(SocketAddress *address)
>
>  static void qemu_start_incoming_migration(const char *uri, Error **errp)
>  {
> +    ERRP_GUARD();
>      const char *p = NULL;
>
>      if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) {
> @@ -474,9 +475,13 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
>      } else if (strstart(uri, "fd:", &p)) {
>          fd_start_incoming_migration(p, errp);
>      } else {
> -        yank_unregister_instance(MIGRATION_YANK_INSTANCE);
>          error_setg(errp, "unknown migration protocol: %s", uri);
>      }
> +
> +    if (*errp) {
> +        yank_unregister_instance(MIGRATION_YANK_INSTANCE);
> +    }
> +
>  }
>
>  static void process_incoming_migration_bh(void *opaque)
> --
> 2.32.0
>
On Tue, Jun 29, 2021 at 02:05:23AM -0300, Leonardo Bras wrote:
> Currently, if a qemu instance is started with "-incoming defer" and
> an incorrect parameter is passed to "migrate_incoming", it will print the
> expected error and reply with "duplicate yank instance" for any upcoming
> "migrate_incoming" command.
>
> This renders current qemu process unusable, and requires a new qemu
> process to be started before accepting a migration.
>
> This is caused by a yank_register_instance() that happens in
> qemu_start_incoming_migration() but is never reverted if any error
> happens.
>
> Solves this by unregistering the instance if anything goes wrong
> in the function, allowing a new "migrate_incoming" command to be
> accepted.
>
> Fixes: b5eea99ec2f ("migration: Add yank feature", 2021-01-13)
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1974366
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>
>
> ---
> Changes since v1:
> - Add ERRP_GUARD() at the beginning of the function, so it deals with
>   errp passed as NULL, and does correct error propagation.
> ---
>  migration/migration.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 4228635d18..af0c72609f 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -454,6 +454,7 @@ void migrate_add_address(SocketAddress *address)
>
>  static void qemu_start_incoming_migration(const char *uri, Error **errp)
>  {
> +    ERRP_GUARD();
>      const char *p = NULL;
>
>      if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) {
> @@ -474,9 +475,13 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
>      } else if (strstart(uri, "fd:", &p)) {
>          fd_start_incoming_migration(p, errp);
>      } else {
> -        yank_unregister_instance(MIGRATION_YANK_INSTANCE);
>          error_setg(errp, "unknown migration protocol: %s", uri);
>      }
> +
> +    if (*errp) {
> +        yank_unregister_instance(MIGRATION_YANK_INSTANCE);
> +    }
> +
>  }
>
>  static void process_incoming_migration_bh(void *opaque)
> --

Leo,

The patch looks great to me, thanks. However, I found that we may need to fix this in a different way, for a reason outside the scope of this issue.

Today I hit another yank problem that crashes qemu when trying to do postcopy recovery twice. Both the initial incoming migration and postcopy recovery use qemu_start_incoming_migration(), and in qmp_migrate_recover() commit b5eea99ec2f5c calls unregister before calling qemu_start_incoming_migration(). I believe the intent was to let the following call to yank_register_instance() succeed, but I think that's wrong...

Firstly, during recovery we should keep the yank instance in place, not "quickly remove it and add it back".

Meanwhile, as I mentioned, calling qmp_migrate_recover() twice with b5eea99ec2f5c will crash the dest qemu directly: the 1st call of qmp_migrate_recover() unregisters permanently when the channel fails to establish, and then the 2nd call of qmp_migrate_recover() crashes at yank_unregister_instance().

I'll post an alternative fix for this issue (plus another postcopy recovery fix) to better show what I meant.

Thanks,
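For readers following the thread, the double-recover sequence described above can be sketched with a small stand-alone mock. This is not QEMU code: the single-slot registry and every `rec_*` name are hypothetical, and the mock counts an unregister of a missing instance where the mail describes the real process as crashing. It also assumes that a failed channel setup leaves the instance unregistered, as the mail states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-slot stand-in for the yank instance registry. */
static bool rec_registered;
/* Counts unregister calls made while no instance exists. */
static int rec_bad_unregisters;

/* State after a normal initial incoming migration: instance present. */
static void rec_initial_migration(void)
{
    rec_registered = true;
    rec_bad_unregisters = 0;
}

static void rec_unregister(void)
{
    if (!rec_registered) {
        /* The mail describes the real process as crashing at this point. */
        rec_bad_unregisters++;
        return;
    }
    rec_registered = false;
}

/* Mock setup: registers, then unregisters again if the channel fails. */
static bool rec_start_incoming(bool channel_ok)
{
    rec_registered = true;
    if (!channel_ok) {
        rec_unregister();
        return false;
    }
    return true;
}

/* Recovery as described: unregister first, then start incoming again. */
static bool rec_recover(bool channel_ok)
{
    rec_unregister();
    return rec_start_incoming(channel_ok);
}
```

Calling `rec_initial_migration()` and then `rec_recover(false)` twice shows the hazard: the first failed recovery leaves no instance behind, so the second recovery's leading unregister has nothing to remove.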
diff --git a/migration/migration.c b/migration/migration.c
index 4228635d18..af0c72609f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -454,6 +454,7 @@ void migrate_add_address(SocketAddress *address)

 static void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
+    ERRP_GUARD();
     const char *p = NULL;

     if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) {
@@ -474,9 +475,13 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
     } else if (strstart(uri, "fd:", &p)) {
         fd_start_incoming_migration(p, errp);
     } else {
-        yank_unregister_instance(MIGRATION_YANK_INSTANCE);
         error_setg(errp, "unknown migration protocol: %s", uri);
     }
+
+    if (*errp) {
+        yank_unregister_instance(MIGRATION_YANK_INSTANCE);
+    }
+
 }

 static void process_incoming_migration_bh(void *opaque)
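The register-then-clean-up-on-error pattern in the patch can be illustrated with a minimal stand-alone sketch. It is not QEMU code: the one-slot registry, the `mock_*` names, the error strings other than "duplicate yank instance", and the `cleanup_on_error` flag are all assumptions made for illustration. As a simplification, the flag toggles cleanup for every failure, whereas the pre-patch code already unregistered on the unknown-protocol path.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the yank registry: a single instance slot. */
static bool mock_instance_registered;

static void mock_reset(void)
{
    mock_instance_registered = false;
}

/* Returns false and sets *err when the instance is already registered. */
static bool mock_yank_register(const char **err)
{
    if (mock_instance_registered) {
        *err = "duplicate yank instance";
        return false;
    }
    mock_instance_registered = true;
    return true;
}

static void mock_yank_unregister(void)
{
    assert(mock_instance_registered);
    mock_instance_registered = false;
}

/*
 * Mock of the incoming-migration setup. Returns NULL on success or an
 * error string on failure. With cleanup_on_error == false a failure
 * leaves the instance registered; with true it is unregistered again.
 */
static const char *mock_start_incoming(const char *uri, bool cleanup_on_error)
{
    const char *err = NULL;

    if (!mock_yank_register(&err)) {
        return err;
    }

    if (strncmp(uri, "tcp:", 4) != 0) {
        err = "unknown migration protocol";
    } else if (uri[4] == '\0') {
        /* Stands in for a transport setup failure on a bad parameter. */
        err = "invalid address";
    }

    /* Single exit-path check, mirroring the "if (*errp)" block above. */
    if (err && cleanup_on_error) {
        mock_yank_unregister();
    }
    return err;
}
```

Without cleanup, a failed `mock_start_incoming("tcp:", false)` makes every later call return "duplicate yank instance". With cleanup, a retry using a valid URI succeeds.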