Patchwork RFC: exit on incoming exec migrate failure

Submitter Andrew Farmer
Date Dec. 9, 2009, 9:10 p.m.
Message ID <8994198D-0AA5-45C0-8A46-375BCA34E201@hq.newdream.net>
Permalink /patch/40762/
State New

Comments

Andrew Farmer - Dec. 9, 2009, 9:10 p.m.
Right now, if an incoming migrate through exec fails, the qemu process will end up chewing CPU indefinitely - it looks like it closes the migration FD but doesn't remove its IO handler properly. An easy way to reproduce this is to try launching with -incoming exec:/bin/false. This is obviously useless, but illustrates the issue handily.

One solution might be to retry the command on migrate failure, but that won't really help in all circumstances (for instance, if the migration command is broken!), so it seems equally appropriate to just die if an incoming exec migration fails. The patch is trivial, and follows - does this look sensible? (I'm new to qemu development, but trying to pick it up.)
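
To illustrate the symptom described above, here is a minimal sketch of one general way an event loop can end up spinning: a descriptor whose peer has gone away stays in the watch set, so every wait returns immediately. This is not a trace of what QEMU actually does; the helper name count_immediate_wakeups and the use of poll() on a pipe are assumptions made purely for illustration.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper, not QEMU code: create a pipe, close the write end
 * (standing in for a migration command that exits at once, such as
 * /bin/false), then keep polling the read end without ever removing it
 * from the watch set. Returns how many of the `iterations` poll() calls
 * reported the descriptor as ready, or -1 if the pipe cannot be created.
 */
int count_immediate_wakeups(int iterations)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    close(fds[1]);  /* the writer is gone; the reader is now at EOF */

    struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
    int wakeups = 0;

    for (int i = 0; i < iterations; i++) {
        /* The 1000 ms timeout is never used: the fd is always "ready". */
        int n = poll(&pfd, 1, 1000);
        if (n > 0 && (pfd.revents & (POLLIN | POLLHUP)))
            wakeups++;
        /* A well-behaved loop would stop watching the fd at this point.
         * Leaving it registered is what turns the loop into a busy loop. */
    }

    close(fds[0]);
    return wakeups;
}
```

Calling count_immediate_wakeups(5) should report a wakeup on every iteration, which is the same pattern that shows up as 100% CPU when the loop has no iteration limit.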
Daniel P. Berrange - Dec. 11, 2009, 9:19 p.m.
On Wed, Dec 09, 2009 at 01:10:18PM -0800, Andrew Farmer wrote:
> Right now, if an incoming migrate through exec fails, the qemu process 
> will end up chewing CPU indefinitely - it looks like it closes the 
> migration FD but doesn't remove its IO handler properly. An easy way 
> to reproduce this is to try launching with -incoming exec:/bin/false.
> This is obviously useless, but illustrates the issue handily.

I've hit this in real life too, with restore from a file containing
the saved state which had got corrupted/truncated. I only discovered
the failure when I wondered why QEMU was chewing 100% cpu.

> One solution might be to retry the command on migrate failure, but that
> won't really help in all circumstances (for instance, if the migration 
> command is broken!), so it seems equally appropriate to just die if an 
> incoming exec migration fails. The patch is trivial, and follows - does
> this look sensible? (I'm new to qemu development, but trying to pick it up.)

It looks like a reasonable approach to me. If we carried on running, it would
be hard for apps to determine whether migration succeeded & thus QEMU is
running, or whether it failed and is just idling. By exiting we give the
management app/user the option to retry simply by relaunching.

> diff --git a/migration-exec.c b/migration-exec.c
> index c830669..0292c19 100644
> --- a/migration-exec.c
> +++ b/migration-exec.c
> @@ -114,7 +114,7 @@ static void exec_accept_incoming_migration(void *opaque)
>      ret = qemu_loadvm_state(f);
>      if (ret < 0) {
>          fprintf(stderr, "load of migration failed\n");
> -        goto err;
> +        exit(0);
>      }
>      qemu_announce_self();
>      dprintf("successfully loaded vm state\n");
> @@ -123,7 +123,6 @@ static void exec_accept_incoming_migration(void *opaque)
>      if (autostart)
>          vm_start();
>  
> -err:
>      qemu_fclose(f);
>  }


Daniel
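
As a rough sketch of the point about management apps, the following shows how a launcher could observe that a child process has exited and read its exit status. The helper name exit_status_of and the use of /bin/sh are assumptions for illustration only; this is not code from QEMU or from any management tool. Note that the patch as posted calls exit(0), so a launcher would have to treat the exit itself (or the message on stderr) as the failure signal rather than rely on a nonzero status.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical launcher helper: run `cmd` through /bin/sh, wait for it to
 * exit, and return its exit status. Returns -1 if the child could not be
 * created, could not be waited for, or was terminated by a signal.
 */
int exit_status_of(const char *cmd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace this process with a shell running the command. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);  /* only reached if execl() itself failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    /* A process that is still running keeps waitpid() blocked; one that
     * has exited yields a status the launcher can act on, e.g. by retrying. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A launcher built this way can tell "exited" apart from "still running", which is the distinction that is lost when a failed incoming migration leaves the process alive.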
Andrew Farmer - Dec. 14, 2009, 7:05 p.m.
On 11 Dec 2009, at 13:19, Daniel P. Berrange wrote:
> On Wed, Dec 09, 2009 at 01:10:18PM -0800, Andrew Farmer wrote:
>> Right now, if an incoming migrate through exec fails, the qemu process 
>> will end up chewing CPU indefinitely - it looks like it closes the 
>> migration FD but doesn't remove its IO handler properly. An easy way 
>> to reproduce this is to try launching with -incoming exec:/bin/false.
>> This is obviously useless, but illustrates the issue handily.
> 
> I've hit this in real life too, with restore from a file containing
> the saved state which had got corrupted/truncated. I only discovered
> the failure when I wondered why QEMU was chewing 100% cpu.

Hrm... actually, if this also happens on state restore, the problem might not be in migration-exec at all (or there might be multiple bugs with similar symptoms), as the fix I identified was specific to exec failures.

Patch

diff --git a/migration-exec.c b/migration-exec.c
index c830669..0292c19 100644
--- a/migration-exec.c
+++ b/migration-exec.c
@@ -114,7 +114,7 @@  static void exec_accept_incoming_migration(void *opaque)
     ret = qemu_loadvm_state(f);
     if (ret < 0) {
         fprintf(stderr, "load of migration failed\n");
-        goto err;
+        exit(0);
     }
     qemu_announce_self();
     dprintf("successfully loaded vm state\n");
@@ -123,7 +123,6 @@  static void exec_accept_incoming_migration(void *opaque)
     if (autostart)
         vm_start();
 
-err:
     qemu_fclose(f);
 }