Message ID | 20180619163552.18206-1-berrange@redhat.com |
---|---|
State | New |
Headers | show |
Series | [v2] migration: fix crash in when incoming client channel setup fails | expand |
* Daniel P. Berrangé (berrange@redhat.com) wrote: > The way we determine if we can start the incoming migration was > changed to use migration_has_all_channels() in: > > commit 428d89084c709e568f9cd301c2f6416a54c53d6d > Author: Juan Quintela <quintela@redhat.com> > Date: Mon Jul 24 13:06:25 2017 +0200 > > migration: Create migration_has_all_channels > > This method in turn calls multifd_recv_all_channels_created() > which is hardcoded to always return 'true' when multifd is > not in use. This is a latent bug... > > ...activated in in a following commit where that return result > ends up acting as the flag to indicate whether it is possible > to start processing the migration: > > commit 36c2f8be2c4eb0003ac77a14910842b7ddd7337e > Author: Juan Quintela <quintela@redhat.com> > Date: Wed Mar 7 08:40:52 2018 +0100 > > migration: Delay start of migration main routines > > This means that if channel initialization fails with normal > migration, it'll never notice and attempt to start the > incoming migration regardless and crash on a NULL pointer. > > This can be seen, for example, if a client connects to a server > requiring TLS, but has an invalid x509 certificate: > > qemu-system-x86_64: The certificate hasn't got a known issuer > qemu-system-x86_64: migration/migration.c:386: process_incoming_migration_co: Assertion `mis->from_src_file' failed. > > #0 0x00007fffebd24f2b in raise () at /lib64/libc.so.6 > #1 0x00007fffebd0f561 in abort () at /lib64/libc.so.6 > #2 0x00007fffebd0f431 in _nl_load_domain.cold.0 () at /lib64/libc.so.6 > #3 0x00007fffebd1d692 in () at /lib64/libc.so.6 > #4 0x0000555555ad027e in process_incoming_migration_co (opaque=<optimized out>) at migration/migration.c:386 > #5 0x0000555555c45e8b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:116 > #6 0x00007fffebd3a6a0 in __start_context () at /lib64/libc.so.6 > #7 0x0000000000000000 in () > > To handle the non-multifd case, we check whether mis->from_src_file > is non-NULL. With this in place, the migration server drops the > rejected client and stays around waiting for another, hopefully > valid, client to arrive. > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > --- > > Changed in v2: > > - Clarify exactly when it broke > - Clarify that we expect the server to stay active > > migration/migration.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/migration/migration.c b/migration/migration.c > index e1eaa97df4..38ad818b23 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -518,11 +518,12 @@ void migration_ioc_process_incoming(QIOChannel *ioc) > */ > bool migration_has_all_channels(void) > { > + MigrationIncomingState *mis = migration_incoming_get_current(); > bool all_channels; > > all_channels = multifd_recv_all_channels_created(); > > - return all_channels; > + return all_channels && mis->from_src_file != NULL; > } > > /* > -- > 2.17.0 > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 06/19/2018 11:35 AM, Daniel P. Berrangé wrote: > The way we determine if we can start the incoming migration was > changed to use migration_has_all_channels() in: > > commit 428d89084c709e568f9cd301c2f6416a54c53d6d > Author: Juan Quintela <quintela@redhat.com> > Date: Mon Jul 24 13:06:25 2017 +0200 > > migration: Create migration_has_all_channels > > This method in turn calls multifd_recv_all_channels_created() > which is hardcoded to always return 'true' when multifd is > not in use. This is a latent bug... > > ...activated in in a following commit where that return result s/in in/in/
Daniel P. Berrangé <berrange@redhat.com> wrote: > The way we determine if we can start the incoming migration was > changed to use migration_has_all_channels() in: > > commit 428d89084c709e568f9cd301c2f6416a54c53d6d > Author: Juan Quintela <quintela@redhat.com> > Date: Mon Jul 24 13:06:25 2017 +0200 > > migration: Create migration_has_all_channels > > This method in turn calls multifd_recv_all_channels_created() > which is hardcoded to always return 'true' when multifd is > not in use. This is a latent bug... > > ...activated in in a following commit where that return result > ends up acting as the flag to indicate whether it is possible > to start processing the migration: > > commit 36c2f8be2c4eb0003ac77a14910842b7ddd7337e > Author: Juan Quintela <quintela@redhat.com> > Date: Wed Mar 7 08:40:52 2018 +0100 > > migration: Delay start of migration main routines > > This means that if channel initialization fails with normal > migration, it'll never notice and attempt to start the > incoming migration regardless and crash on a NULL pointer. > > This can be seen, for example, if a client connects to a server > requiring TLS, but has an invalid x509 certificate: > > qemu-system-x86_64: The certificate hasn't got a known issuer > qemu-system-x86_64: migration/migration.c:386: process_incoming_migration_co: Assertion `mis->from_src_file' failed. > > #0 0x00007fffebd24f2b in raise () at /lib64/libc.so.6 > #1 0x00007fffebd0f561 in abort () at /lib64/libc.so.6 > #2 0x00007fffebd0f431 in _nl_load_domain.cold.0 () at /lib64/libc.so.6 > #3 0x00007fffebd1d692 in () at /lib64/libc.so.6 > #4 0x0000555555ad027e in process_incoming_migration_co (opaque=<optimized out>) at migration/migration.c:386 > #5 0x0000555555c45e8b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:116 > #6 0x00007fffebd3a6a0 in __start_context () at /lib64/libc.so.6 > #7 0x0000000000000000 in () > > To handle the non-multifd case, we check whether mis->from_src_file > is non-NULL. With this in place, the migration server drops the > rejected client and stays around waiting for another, hopefully > valid, client to arrive. > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com>
diff --git a/migration/migration.c b/migration/migration.c index e1eaa97df4..38ad818b23 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -518,11 +518,12 @@ void migration_ioc_process_incoming(QIOChannel *ioc) */ bool migration_has_all_channels(void) { + MigrationIncomingState *mis = migration_incoming_get_current(); bool all_channels; all_channels = multifd_recv_all_channels_created(); - return all_channels; + return all_channels && mis->from_src_file != NULL; } /*
The way we determine if we can start the incoming migration was changed to use migration_has_all_channels() in: commit 428d89084c709e568f9cd301c2f6416a54c53d6d Author: Juan Quintela <quintela@redhat.com> Date: Mon Jul 24 13:06:25 2017 +0200 migration: Create migration_has_all_channels This method in turn calls multifd_recv_all_channels_created() which is hardcoded to always return 'true' when multifd is not in use. This is a latent bug... ...activated in in a following commit where that return result ends up acting as the flag to indicate whether it is possible to start processing the migration: commit 36c2f8be2c4eb0003ac77a14910842b7ddd7337e Author: Juan Quintela <quintela@redhat.com> Date: Wed Mar 7 08:40:52 2018 +0100 migration: Delay start of migration main routines This means that if channel initialization fails with normal migration, it'll never notice and attempt to start the incoming migration regardless and crash on a NULL pointer. This can be seen, for example, if a client connects to a server requiring TLS, but has an invalid x509 certificate: qemu-system-x86_64: The certificate hasn't got a known issuer qemu-system-x86_64: migration/migration.c:386: process_incoming_migration_co: Assertion `mis->from_src_file' failed. #0 0x00007fffebd24f2b in raise () at /lib64/libc.so.6 #1 0x00007fffebd0f561 in abort () at /lib64/libc.so.6 #2 0x00007fffebd0f431 in _nl_load_domain.cold.0 () at /lib64/libc.so.6 #3 0x00007fffebd1d692 in () at /lib64/libc.so.6 #4 0x0000555555ad027e in process_incoming_migration_co (opaque=<optimized out>) at migration/migration.c:386 #5 0x0000555555c45e8b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:116 #6 0x00007fffebd3a6a0 in __start_context () at /lib64/libc.so.6 #7 0x0000000000000000 in () To handle the non-multifd case, we check whether mis->from_src_file is non-NULL. With this in place, the migration server drops the rejected client and stays around waiting for another, hopefully valid, client to arrive. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> --- Changed in v2: - Clarify exactly when it broke - Clarify that we expect the server to stay active migration/migration.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)