Fix segfault after migration completes

Submitter Luiz Capitulino
Date Oct. 28, 2011, 2:58 p.m.
Message ID <20111028125804.751c99de@doriath>
Permalink /patch/122416/
State New

Comments

Luiz Capitulino - Oct. 28, 2011, 2:58 p.m.
To reproduce:

1. Start the source VM with:

 # qemu [...] -S

2. Start the destination VM with:

 # qemu <source VM cmd-line> -incoming tcp:0:4444

3. In the source VM:

  (qemu) migrate -d tcp:0:4444

4. The source VM will segfault as soon as migration completes (might not
   happen on the first try)

Here's the backtrace:

#0  0x0000000000516f39 in qemu_file_get_error (f=0x0) at /home/lcapitulino/src/qmp-unstable/savevm.c:431
431	    return f->last_error;

#0  0x0000000000516f39 in qemu_file_get_error (f=0x0) at /home/lcapitulino/src/qmp-unstable/savevm.c:431
#1  0x00000000004e7a9a in migrate_fd_put_notify (opaque=0x987640) at /home/lcapitulino/src/qmp-unstable/migration.c:255
#2  0x000000000046d59a in qemu_iohandler_poll (readfds=0x7fff45ccfe50, writefds=0x7fff45ccfdd0, xfds=0x7fff45ccfd50, ret=1)
    at /home/lcapitulino/src/qmp-unstable/iohandler.c:124
#3  0x00000000004e6033 in main_loop_wait (nonblocking=0) at /home/lcapitulino/src/qmp-unstable/main-loop.c:463
#4  0x00000000004db5b0 in main_loop () at /home/lcapitulino/src/qmp-unstable/vl.c:1478
#5  0x00000000004dffed in main (argc=16, argv=0x7fff45cd0318, envp=0x7fff45cd03a0) at /home/lcapitulino/src/qmp-unstable/vl.c:3449

So, 's->file' is NULL in migrate_fd_put_notify(). The interesting thing
is that it's valid in the qemu_file_put_notify() call, which makes me
think that either: there's a race somewhere or qemu_file_put_notify() is
itself clearing 's->file'. In both cases the fix below could just be hiding
the real issue, but let's get started...

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
---
 migration.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
Paolo Bonzini - Oct. 28, 2011, 3:19 p.m.
On 10/28/2011 04:58 PM, Luiz Capitulino wrote:
> To reproduce:
>
> 1. Start the source VM with:
>
>   # qemu [...] -S
>
> 2. Start the destination VM with:
>
>   # qemu <source VM cmd-line> -incoming tcp:0:4444
>
> 3. In the source VM:
>
>    (qemu) migrate -d tcp:0:4444
>
> 3. The source VM will segfault as soon as migration completes (might not
>     happen in the first try)
>
> Here's the backtrace:
>
> #0  0x0000000000516f39 in qemu_file_get_error (f=0x0) at /home/lcapitulino/src/qmp-unstable/savevm.c:431
> 431	    return f->last_error;
>
> #0  0x0000000000516f39 in qemu_file_get_error (f=0x0) at /home/lcapitulino/src/qmp-unstable/savevm.c:431
> #1  0x00000000004e7a9a in migrate_fd_put_notify (opaque=0x987640) at /home/lcapitulino/src/qmp-unstable/migration.c:255
> #2  0x000000000046d59a in qemu_iohandler_poll (readfds=0x7fff45ccfe50, writefds=0x7fff45ccfdd0, xfds=0x7fff45ccfd50, ret=1)
>      at /home/lcapitulino/src/qmp-unstable/iohandler.c:124
> #3  0x00000000004e6033 in main_loop_wait (nonblocking=0) at /home/lcapitulino/src/qmp-unstable/main-loop.c:463
> #4  0x00000000004db5b0 in main_loop () at /home/lcapitulino/src/qmp-unstable/vl.c:1478
> #5  0x00000000004dffed in main (argc=16, argv=0x7fff45cd0318, envp=0x7fff45cd03a0) at /home/lcapitulino/src/qmp-unstable/vl.c:3449
>
> So, 's->file' is NULL in migrate_fd_put_notify(). The interesting thing
> is that it's valid in the qemu_file_put_notify() call, which makes me
> think that either: there's a race somewhere or qemu_file_put_notify() is
> itself clearing 's->file'. In both cases the fix below could just be hiding
> the real issue, but let's get started...
>
> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> ---
>   migration.c |    2 +-
>   1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/migration.c b/migration.c
> index bdca72e..f6e6208 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -252,7 +252,7 @@ static void migrate_fd_put_notify(void *opaque)
>
>       qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
>       qemu_file_put_notify(s->file);
> -    if (qemu_file_get_error(s->file)) {
> +    if (s->file && qemu_file_get_error(s->file)) {
>           migrate_fd_error(s);
>       }
>   }

Just one comment, it would be good to mention in the commit message the 
call chain.  The one that Eduardo had tracked offlist looks indeed 
correct to me:

select loop -> migrate_fd_put_notify() -> qemu_file_put_notify() -> 
buffered_put_buffer() -> migrate_fd_put_ready() -> 
migrate_fd_completed() -> migrate_fd_cleanup().
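
As an illustration of that call chain, here is a minimal standalone sketch,
not QEMU code. All demo_* names and both structs are hypothetical stand-ins,
and the intermediate hops (buffered_put_buffer(), migrate_fd_put_ready(),
migrate_fd_completed()) are collapsed into a single call for brevity. It
assumes the cleanup step frees the file and sets the pointer to NULL, which
is what the backtrace (f=0x0) suggests.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for QEMUFile and the migration state. */
typedef struct DemoFile {
    int last_error;
} DemoFile;

typedef struct DemoState {
    DemoFile *file;
    int error_reported;
} DemoState;

/* Models migrate_fd_cleanup(): releases the file and clears the pointer. */
static void demo_cleanup(DemoState *s)
{
    free(s->file);
    s->file = NULL;
}

/* Models qemu_file_put_notify() on the final write: the completion path
 * runs re-entrantly and ends in cleanup. Taking the state directly is a
 * simplification made for this sketch. */
static void demo_file_put_notify(DemoState *s)
{
    demo_cleanup(s);
}

/* Models qemu_file_get_error(); the assert stands in for the segfault. */
static int demo_file_get_error(DemoFile *f)
{
    assert(f != NULL);
    return f->last_error;
}

/* Models the patched migrate_fd_put_notify(): the file pointer is checked
 * again because the call above may have cleared it. */
static void demo_fd_put_notify(DemoState *s)
{
    demo_file_put_notify(s);
    if (s->file && demo_file_get_error(s->file)) {
        s->error_reported = 1;
    }
}
```

Dropping the `s->file &&` test in demo_fd_put_notify() makes the assert fire,
which mirrors frame #0 of the backtrace.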

Anyway, code-wise:

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo
Eduardo Habkost - Oct. 28, 2011, 3:37 p.m.
On Fri, Oct 28, 2011 at 12:58:04PM -0200, Luiz Capitulino wrote:
[...]
> 
> So, 's->file' is NULL in migrate_fd_put_notify(). The interesting thing
> is that it's valid in the qemu_file_put_notify() call, which makes me
> think that either: there's a race somewhere or qemu_file_put_notify() is
> itself clearing 's->file'. In both cases the fix below could just be hiding
> the real issue, but let's get started...
> 
> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>

Acked-by: Eduardo Habkost <ehabkost@redhat.com>

However, it looks like the error-check interface of QEMUFile is really
easy to misuse, and can be improved:

- Either errors are always triggered synchronously inside
  qemu_file_put_notify(), or they can be triggered asynchronously
  elsewhere too.
- If they are always triggered synchronously during the
  qemu_file_put_notify() call, then qemu_file_put_notify() should return
  error information itself instead of requiring a qemu_file_get_error()
  call.
- If errors can be triggered asynchronously, then we need an error
  notification mechanism that makes sure no error is ever missed,
  instead of this error check on migrate_fd_put_notify().
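
A rough sketch of the second option, under the assumption that errors are
only raised synchronously. None of this is the real QEMUFile interface: the
sketch_* names, the fail_next_write flag and the -EIO value are made up for
illustration. The point is only that the status is captured before any
cleanup runs, so the caller never has to touch the file pointer afterwards.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins; fail_next_write simulates a failing write. */
typedef struct SketchFile {
    int fail_next_write;
} SketchFile;

typedef struct SketchState {
    SketchFile *file;
    int failed;
} SketchState;

/* Releases the file and clears the pointer, as the cleanup path would. */
static void sketch_cleanup(SketchState *s)
{
    free(s->file);
    s->file = NULL;
}

/* Returns 0 on success or a negative errno value. The status is read
 * before cleanup, so it stays valid even though the file is gone when
 * the function returns. Always cleaning up here is a simplification
 * that exercises the worst case. */
static int sketch_put_notify(SketchState *s)
{
    int ret = s->file->fail_next_write ? -EIO : 0;

    sketch_cleanup(s);
    return ret;
}

/* The caller acts on the return value and never dereferences s->file. */
static void sketch_fd_put_notify(SketchState *s)
{
    if (sketch_put_notify(s) < 0) {
        s->failed = 1;
    }
}
```

With this shape there is no separate error query that can be called on a
stale or NULL file, which is the misuse described above.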

> ---
>  migration.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/migration.c b/migration.c
> index bdca72e..f6e6208 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -252,7 +252,7 @@ static void migrate_fd_put_notify(void *opaque)
>  
>      qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
>      qemu_file_put_notify(s->file);
> -    if (qemu_file_get_error(s->file)) {
> +    if (s->file && qemu_file_get_error(s->file)) {
>          migrate_fd_error(s);
>      }
>  }
> -- 
> 1.7.7.1.488.ge8e1c.dirty
>

Patch

diff --git a/migration.c b/migration.c
index bdca72e..f6e6208 100644
--- a/migration.c
+++ b/migration.c
@@ -252,7 +252,7 @@  static void migrate_fd_put_notify(void *opaque)
 
     qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
     qemu_file_put_notify(s->file);
-    if (qemu_file_get_error(s->file)) {
+    if (s->file && qemu_file_get_error(s->file)) {
         migrate_fd_error(s);
     }
 }