Patchwork [v2] Fix segfault on migration completion

Submitter Luiz Capitulino
Date Oct. 28, 2011, 4:59 p.m.
Message ID <20111028145952.4bb63294@doriath>
Permalink /patch/122443/
State New

Comments

Luiz Capitulino - Oct. 28, 2011, 4:59 p.m.
A simple migration reproduces it:

1. Start the source VM with:

   # qemu [...] -S

2. Start the destination VM with:

   # qemu <source VM cmd-line> -incoming tcp:0:4444

3. In the source VM:

   (qemu) migrate -d tcp:0:4444

4. The source VM will segfault as soon as migration completes (it might not
   happen on the first try).

What is happening here is that qemu_file_put_notify() can end up closing
's->file' (in which case it's also set to NULL), after which
migrate_fd_put_notify() passes that NULL pointer to qemu_file_get_error().
The call stack is rather complex, but Eduardo helped track it down to:

select loop -> migrate_fd_put_notify() -> qemu_file_put_notify() ->
buffered_put_buffer() -> migrate_fd_put_ready() ->
migrate_fd_completed() -> migrate_fd_cleanup().

To be honest, it's not completely clear to me in which cases 's->file'
is not closed (on error, maybe?), but I doubt this fix will make anything
worse.
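The reentrancy described above can be modeled in a few lines of C. This is a minimal sketch, not QEMU code: File, State, fd_put_notify(), make_state() and the completed flag are hypothetical stand-ins. The notify step either reaches the completion path, which frees the file and clears the pointer, or returns with the file still open and an error recorded. In the real code the chain reaches the MigrationState through opaque pointers instead of receiving it directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for QEMUFile: only the error state is modeled. */
typedef struct {
    int last_error;             /* 0 means no error recorded */
} File;

/* Hypothetical stand-in for MigrationState. */
typedef struct {
    File *file;
    bool completed;             /* true: the last buffer goes out on this notify */
    int error_reports;          /* how many times fd_error() ran */
} State;

static int file_get_error(const File *f)
{
    return f->last_error;       /* dereferences f: crashes if f is NULL */
}

/* Models migrate_fd_cleanup(): close the file and clear the pointer. */
static void fd_cleanup(State *s)
{
    free(s->file);
    s->file = NULL;
}

/* Models qemu_file_put_notify() -> buffered_put_buffer() ->
 * migrate_fd_put_ready() -> migrate_fd_completed() -> migrate_fd_cleanup(). */
static void file_put_notify(State *s)
{
    if (s->completed) {
        fd_cleanup(s);
    }
}

/* Models migrate_fd_error(): here it only counts the report. */
static void fd_error(State *s)
{
    s->error_reports++;
}

/* Models the patched migrate_fd_put_notify(). */
static void fd_put_notify(State *s)
{
    file_put_notify(s);
    /* Before the patch this was file_get_error(s->file) alone, which
     * reads through a NULL pointer once the completion path has run. */
    if (s->file && file_get_error(s->file)) {
        fd_error(s);
    }
}

/* Hypothetical test helper: a state whose file carries error code err. */
static State make_state(bool completed, int err)
{
    State s = { calloc(1, sizeof(File)), completed, 0 };
    assert(s.file != NULL);
    s.file->last_error = err;
    return s;
}
```

On the completion path, fd_put_notify() leaves s->file NULL and skips the error check; when the file is still open and carries an error, the error is reported just as it was before the patch.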

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
---

V2: better commit log

 migration.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

Juan Quintela - Oct. 31, 2011, 12:27 a.m.
Luiz Capitulino <lcapitulino@redhat.com> wrote:
> select loop -> migrate_fd_put_notify() -> qemu_file_put_notify() ->
> buffered_put_buffer() -> migrate_fd_put_ready() ->
> migrate_fd_completed() -> migrate_fd_cleanup().

Acked-by: Juan Quintela <quintela@redhat.com>

And people wonder why error handling on migration is difficult, sniff
:-(

Markus Armbruster - Oct. 31, 2011, 6:15 a.m.
Luiz Capitulino <lcapitulino@redhat.com> writes:

> diff --git a/migration.c b/migration.c
> index bdca72e..f6e6208 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -252,7 +252,7 @@ static void migrate_fd_put_notify(void *opaque)
>  
>      qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
>      qemu_file_put_notify(s->file);
> -    if (qemu_file_get_error(s->file)) {
> +    if (s->file && qemu_file_get_error(s->file)) {
>          migrate_fd_error(s);
>      }
>  }

I wonder whether we can lose the error in s->file by closing s->file
before we get here.  But even if we can, we still report more errors
than before the series this patch fixes.
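The concern about losing the error can be illustrated with a similar toy model. Again, every name here is hypothetical, and the saved_error field is only an illustration, not something this patch or QEMU is claimed to have. If an error is recorded on the file and the completion path then closes it, the patched guard sees a NULL s->file and reports nothing; one possible way to keep that error is to copy it into the state before the file is freed. The sketch does not try to answer whether QEMU's cleanup path already surfaces such an error some other way.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for QEMUFile. */
typedef struct {
    int last_error;
} File;

/* Hypothetical stand-in for MigrationState. */
typedef struct {
    File *file;
    int saved_error;            /* hypothetical: error captured at close time */
    int error_reports;
} State;

/* Cleanup variant that remembers the file's error before freeing it. */
static void fd_cleanup_keep_error(State *s)
{
    if (s->file) {
        if (s->file->last_error && !s->saved_error) {
            s->saved_error = s->file->last_error;
        }
        free(s->file);
        s->file = NULL;
    }
}

/* Error lookup that still works after the file has been closed. */
static int state_get_error(const State *s)
{
    return s->file ? s->file->last_error : s->saved_error;
}

/* Guard as in the patch: an error recorded before the close is dropped. */
static void check_patched(State *s)
{
    if (s->file && s->file->last_error) {
        s->error_reports++;
    }
}

/* Hypothetical alternative: also consult the saved error. */
static void check_saved(State *s)
{
    if (state_get_error(s)) {
        s->error_reports++;
    }
}

/* Hypothetical test helper: a file that failed and was then closed. */
static State make_closed_with_error(int err)
{
    State s = { calloc(1, sizeof(File)), 0, 0 };
    assert(s.file != NULL);
    s.file->last_error = err;
    fd_cleanup_keep_error(&s);  /* simulates the close inside put_notify */
    return s;
}
```

With only the NULL check, the closed-with-error state produces no report; consulting the saved error reports it once.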
Anthony Liguori - Nov. 1, 2011, 6:04 p.m.
On 10/28/2011 11:59 AM, Luiz Capitulino wrote:
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Acked-by: Eduardo Habkost <ehabkost@redhat.com>
> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>

Applied.  Thanks.

Regards,

Anthony Liguori


Patch

diff --git a/migration.c b/migration.c
index bdca72e..f6e6208 100644
--- a/migration.c
+++ b/migration.c
@@ -252,7 +252,7 @@ static void migrate_fd_put_notify(void *opaque)
 
     qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
     qemu_file_put_notify(s->file);
-    if (qemu_file_get_error(s->file)) {
+    if (s->file && qemu_file_get_error(s->file)) {
         migrate_fd_error(s);
     }
 }