Patchwork [1/5] Exit if incoming migration fails

login
register
mail settings
Submitter Juan Quintela
Date May 25, 2010, 2:21 p.m.
Message ID <889abbffe3359f5160234e580cb663ec6189174e.1274796992.git.quintela@redhat.com>
Download mbox | patch
Permalink /patch/53553/
State New
Headers show

Comments

Juan Quintela - May 25, 2010, 2:21 p.m.
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration.c |   16 ++++++++++------
 migration.h |    2 +-
 vl.c        |    7 ++++++-
 3 files changed, 17 insertions(+), 8 deletions(-)
Luiz Capitulino - May 25, 2010, 6:01 p.m.
On Tue, 25 May 2010 16:21:01 +0200
Juan Quintela <quintela@redhat.com> wrote:

> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  migration.c |   16 ++++++++++------
>  migration.h |    2 +-
>  vl.c        |    7 ++++++-
>  3 files changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/migration.c b/migration.c
> index 05f6cc5..9c1d4b6 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -36,22 +36,26 @@ static uint32_t max_throttle = (32 << 20);
> 
>  static MigrationState *current_migration;
> 
> -void qemu_start_incoming_migration(const char *uri)
> +int qemu_start_incoming_migration(const char *uri)
>  {
>      const char *p;
> +    int ret;
> 
>      if (strstart(uri, "tcp:", &p))
> -        tcp_start_incoming_migration(p);
> +        ret = tcp_start_incoming_migration(p);
>  #if !defined(WIN32)
>      else if (strstart(uri, "exec:", &p))
> -        exec_start_incoming_migration(p);
> +        ret =  exec_start_incoming_migration(p);
>      else if (strstart(uri, "unix:", &p))
> -        unix_start_incoming_migration(p);
> +        ret = unix_start_incoming_migration(p);
>      else if (strstart(uri, "fd:", &p))
> -        fd_start_incoming_migration(p);
> +        ret = fd_start_incoming_migration(p);
>  #endif
> -    else
> +    else {
>          fprintf(stderr, "unknown migration protocol: %s\n", uri);
> +        ret = -EPROTONOSUPPORT;
> +    }
> +    return ret;
>  }
> 
>  int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data)
> diff --git a/migration.h b/migration.h
> index 385423f..dd423a1 100644
> --- a/migration.h
> +++ b/migration.h
> @@ -50,7 +50,7 @@ struct FdMigrationState
>      void *opaque;
>  };
> 
> -void qemu_start_incoming_migration(const char *uri);
> +int qemu_start_incoming_migration(const char *uri);
> 
>  int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data);
> 
> diff --git a/vl.c b/vl.c
> index 328395e..d13440d 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -3823,7 +3823,12 @@ int main(int argc, char **argv, char **envp)
>      }
> 
>      if (incoming) {
> -        qemu_start_incoming_migration(incoming);
> +        int ret = qemu_start_incoming_migration(incoming);
> +        if (ret < 0) {
> +            fprintf(stderr, "Migration failed. Exit code %s(%d), exiting.\n",
> +                    incoming, ret);
> +            exit(ret);

 While I agree on the change, I have two comments:

1. By taking a look at the code I have the impression that most of the
   fun failures will happen on the handler passed to qemu_set_fd_handler2(),
   do you agree? Any plan to address that?

1. Is exit()ing the best thing to be done? I understand it's the easiest
   and maybe better than nothing, but wouldn't it be better to enter in
   paused-forever state so that clients can query and decide what to do?

> +        }
>      } else if (autostart) {
>          vm_start();
>      }
Juan Quintela - May 25, 2010, 6:37 p.m.
Luiz Capitulino <lcapitulino@redhat.com> wrote:
> On Tue, 25 May 2010 16:21:01 +0200
> Juan Quintela <quintela@redhat.com> wrote:
>
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>> ---
>>  migration.c |   16 ++++++++++------
>>  migration.h |    2 +-
>>  vl.c        |    7 ++++++-
>>  3 files changed, 17 insertions(+), 8 deletions(-)
>> 

>  While I agree on the change, I have two comments:
>
> 1. By taking a look at the code I have the impression that most of the
>    fun failures will happen on the handler passed to qemu_set_fd_handler2(),
>    do you agree? Any plan to address that?

That is outgoing migration, not incoming migration.
Incoming migration in synchronous..


> 1. Is exit()ing the best thing to be done? I understand it's the easiest
>    and maybe better than nothing, but wouldn't it be better to enter in
>    paused-forever state so that clients can query and decide what to do?

For incoming migration, if it fails in the middle, every bet is off.
You are in a really inconsistent state, not sure which one, and if
migration was live, with the other host possibly retaking the disks to
continue.

In some cases, you can't do anything:
- you got passed an fd, and fd got closed/image corrupted/...
- you got passed an exec command like "exec: gzip -d < foo.gz"
  If gzip failed once, it will fail forever.

If you are running it by hand, cursor up + enter, and you are back
If you are using a management application, it is going to be easier to
restart the process that trying to cleanup everything.

Experience shows that people really tries to do weird things when
machine is in this state.

Later, Juan.
Anthony Liguori - May 25, 2010, 6:52 p.m.
On 05/25/2010 01:37 PM, Juan Quintela wrote:
> Luiz Capitulino<lcapitulino@redhat.com>  wrote:
>    
>> On Tue, 25 May 2010 16:21:01 +0200
>> Juan Quintela<quintela@redhat.com>  wrote:
>>
>>      
>>> Signed-off-by: Juan Quintela<quintela@redhat.com>
>>> ---
>>>   migration.c |   16 ++++++++++------
>>>   migration.h |    2 +-
>>>   vl.c        |    7 ++++++-
>>>   3 files changed, 17 insertions(+), 8 deletions(-)
>>>
>>>        
>    
>>   While I agree on the change, I have two comments:
>>
>> 1. By taking a look at the code I have the impression that most of the
>>     fun failures will happen on the handler passed to qemu_set_fd_handler2(),
>>     do you agree? Any plan to address that?
>>      
> That is outgoing migration, not incoming migration.
> Incoming migration in synchronous..
>
>
>    
>> 1. Is exit()ing the best thing to be done? I understand it's the easiest
>>     and maybe better than nothing, but wouldn't it be better to enter in
>>     paused-forever state so that clients can query and decide what to do?
>>      
> For incoming migration, if it fails in the middle, every bet is off.
> You are in a really inconsistent state, not sure which one, and if
> migration was live, with the other host possibly retaking the disks to
> continue.
>    

I agree that exiting is the only sane behavior for the destination.

Regards,

Anthony Liguori

> In some cases, you can't do anything:
> - you got passed an fd, and fd got closed/image corrupted/...
> - you got passed an exec command like "exec: gzip -d<  foo.gz"
>    If gzip failed once, it will fail forever.
>
> If you are running it by hand, cursor up + enter, and you are back
> If you are using a management application, it is going to be easier to
> restart the process that trying to cleanup everything.
>
> Experience shows that people really tries to do weird things when
> machine is in this state.
>
> Later, Juan.
>
>

Patch

diff --git a/migration.c b/migration.c
index 05f6cc5..9c1d4b6 100644
--- a/migration.c
+++ b/migration.c
@@ -36,22 +36,26 @@  static uint32_t max_throttle = (32 << 20);

 static MigrationState *current_migration;

-void qemu_start_incoming_migration(const char *uri)
+int qemu_start_incoming_migration(const char *uri)
 {
     const char *p;
+    int ret;

     if (strstart(uri, "tcp:", &p))
-        tcp_start_incoming_migration(p);
+        ret = tcp_start_incoming_migration(p);
 #if !defined(WIN32)
     else if (strstart(uri, "exec:", &p))
-        exec_start_incoming_migration(p);
+        ret =  exec_start_incoming_migration(p);
     else if (strstart(uri, "unix:", &p))
-        unix_start_incoming_migration(p);
+        ret = unix_start_incoming_migration(p);
     else if (strstart(uri, "fd:", &p))
-        fd_start_incoming_migration(p);
+        ret = fd_start_incoming_migration(p);
 #endif
-    else
+    else {
         fprintf(stderr, "unknown migration protocol: %s\n", uri);
+        ret = -EPROTONOSUPPORT;
+    }
+    return ret;
 }

 int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data)
diff --git a/migration.h b/migration.h
index 385423f..dd423a1 100644
--- a/migration.h
+++ b/migration.h
@@ -50,7 +50,7 @@  struct FdMigrationState
     void *opaque;
 };

-void qemu_start_incoming_migration(const char *uri);
+int qemu_start_incoming_migration(const char *uri);

 int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data);

diff --git a/vl.c b/vl.c
index 328395e..d13440d 100644
--- a/vl.c
+++ b/vl.c
@@ -3823,7 +3823,12 @@  int main(int argc, char **argv, char **envp)
     }

     if (incoming) {
-        qemu_start_incoming_migration(incoming);
+        int ret = qemu_start_incoming_migration(incoming);
+        if (ret < 0) {
+            fprintf(stderr, "Migration failed. Exit code %s(%d), exiting.\n",
+                    incoming, ret);
+            exit(ret);
+        }
     } else if (autostart) {
         vm_start();
     }