Patchwork: [PATCH] Add warmup phase for live migration of large memory apps

Submitter Shribman, Aidan
Date May 12, 2011, 8:42 a.m.
Message ID <AB5A8C7661872E428D6B8E1C2DFA35085CAAEEB246@DEWDFECCR02.wdf.sap.corp>
Permalink /patch/95271/
State New

Comments

Shribman, Aidan - May 12, 2011, 8:42 a.m.
> On Wed, May 11, 2011 at 8:58 AM, Shribman, Aidan 
> <aidan.shribman@sap.com> wrote:
> > From: Aidan Shribman <aidan.shribman@sap.com>
> >
> > [PATCH] Add warmup phase for live migration of large memory apps
> >
> > By invoking "migrate -w <url>" we initiate a background 
> live-migration
> > transferring of dirty pages continuously until invocation 
> of "migrate_end"
> > which attempts to complete the live migration operation.
> 
> What is the purpose of this patch?  How and when do I use it?
> 

The warmup patch adds a non-converging background transfer of dirty guest memory during live migration, so that when completion is requested (via the "migrate_end" command) we get a much faster response. This is especially needed when running a payload of large enterprise applications with high memory demands.
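A sketch of the intended flow from the HMP monitor, assuming the patch is applied (the destination URI below is a placeholder):

```
(qemu) migrate -w tcp:desthost:4444    # start background warmup; returns immediately
... guest keeps running; dirty pages are streamed continuously ...
(qemu) migrate_end                     # ask the warmup migration to complete
```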

> Some nitpicks:
> 
> > @@ -81,6 +83,11 @@ int do_migrate(Monitor *mon, const QDict 
> *qdict, QObject **ret_data)
> >     int blk = qdict_get_try_bool(qdict, "blk", 0);
> >     int inc = qdict_get_try_bool(qdict, "inc", 0);
> >     const char *uri = qdict_get_str(qdict, "uri");
> > +    is_migrate_warmup = qdict_get_try_bool(qdict, "warmup", 0);
> > +
> > +    if (is_migrate_warmup) {
> > +        detach = 1; /* as we need migrate_end to complte */
> 
> s/complte/complete/
> 
> > +       }
> 
> Please follow the coding style and put the closing curly brace on the
> same column as the 'if' statement.
> 
> > +int qemu_savevm_state_warmup(Monitor *mon, QEMUFile *f) {
> > +    int ret = 1;
> 
> 1 is overwritten immediately.

A new patch follows with corrections:

---
From: Aidan Shribman <aidan.shribman@sap.com>
Subject: Add warmup phase for live migration of large memory apps
 By invoking "migrate -w <url>" we initiate a background live migration
 that transfers dirty pages continuously until "migrate_end" is invoked,
 which attempts to complete the live migration operation.
 Qemu host: Ubuntu 10.10
 Testing: live migration (with/without warmup phase) tested successfully.

Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
Signed-off-by: Petter Svard <petters@cs.umu.se>
Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
---

 hmp-commands.hx |   33 +++++++++++++++++++++++++--------
 migration.c     |   23 ++++++++++++++++++++++-
 migration.h     |    2 ++
 qmp-commands.hx |   41 ++++++++++++++++++++++++++++++++++-------
 savevm.c        |   11 +++++++++++
 sysemu.h        |    1 +
 6 files changed, 95 insertions(+), 16 deletions(-)

---
Juan Quintela - May 12, 2011, 10:39 a.m.
"Shribman, Aidan" <aidan.shribman@sap.com> wrote:
>> On Wed, May 11, 2011 at 8:58 AM, Shribman, Aidan 
>> <aidan.shribman@sap.com> wrote:
>> > From: Aidan Shribman <aidan.shribman@sap.com>
>> >
>> > [PATCH] Add warmup phase for live migration of large memory apps
>> >
>> > By invoking "migrate -w <url>" we initiate a background 
>> live-migration
>> > transferring of dirty pages continuously until invocation 
>> of "migrate_end"
>> > which attempts to complete the live migration operation.
>> 
>> What is the purpose of this patch?  How and when do I use it?
>> 
>
> The warmup patch adds none-converging background update of guest
> memory during live-migration such that on request of live-migration
> completion (via "migrate_end" command) we get much faster
> response. This is especially needed when running a payload of large
> enterprise applications which have high memory demands.

We should integrate this with Kemari (Kemari is doing something like
this, just that it has more requirements).  Isaku, do you have any comments?

BTW, what loads have you tested for this?

If I set up an image with 1GB RAM and a DVD ISO image, and run in the
guest:

while true; do find /media/cdrom -type f | xargs md5sum; done

migration never converges with the current code (if you use more than 1GB
of memory, the whole DVD ends up cached in guest memory).

So I see this as only useful for guests that are almost idle, and in that
case migration speed is not the biggest of your problems, no?

Later, Juan.
Isaku Yamahata - May 12, 2011, 10:54 a.m.
On Thu, May 12, 2011 at 12:39:22PM +0200, Juan Quintela wrote:
> "Shribman, Aidan" <aidan.shribman@sap.com> wrote:
> >> On Wed, May 11, 2011 at 8:58 AM, Shribman, Aidan 
> >> <aidan.shribman@sap.com> wrote:
> >> > From: Aidan Shribman <aidan.shribman@sap.com>
> >> >
> >> > [PATCH] Add warmup phase for live migration of large memory apps
> >> >
> >> > By invoking "migrate -w <url>" we initiate a background 
> >> live-migration
> >> > transferring of dirty pages continuously until invocation 
> >> of "migrate_end"
> >> > which attempts to complete the live migration operation.
> >> 
> >> What is the purpose of this patch?  How and when do I use it?
> >> 
> >
> > The warmup patch adds none-converging background update of guest
> > memory during live-migration such that on request of live-migration
> > completion (via "migrate_end" command) we get much faster
> > response. This is especially needed when running a payload of large
> > enterprise applications which have high memory demands.
> 
> We should integrate this with Kemari (Kemari is doing something like
> this, just that it has more requirements).  Isaku, do you have any comments?

Yochi and Kei are familiar with Kemari. Not me. Cced to them.


> 
> BTW, what loads have you tested for this?
> 
> if I setup an image with 1GB RAM and a DVD iso image, and do in the
> guest:
> 
> while true; do find /media/cdrom -type f | xargs md5sum; done
> 
> Migration never converges with current code (if you use more than 1GB
> memory, then all the DVD will be cached inside).
> 
> So, I see this only useful for guests that are almost idle, and on that
> case, migration speed is not the bigger of your problems, no?
> 
> Later, Juan.
>
Yoshiaki Tamura - May 13, 2011, 2:55 a.m.
2011/5/12 Isaku Yamahata <yamahata@valinux.co.jp>:
> On Thu, May 12, 2011 at 12:39:22PM +0200, Juan Quintela wrote:
>> "Shribman, Aidan" <aidan.shribman@sap.com> wrote:
>> >> On Wed, May 11, 2011 at 8:58 AM, Shribman, Aidan
>> >> <aidan.shribman@sap.com> wrote:
>> >> > From: Aidan Shribman <aidan.shribman@sap.com>
>> >> >
>> >> > [PATCH] Add warmup phase for live migration of large memory apps
>> >> >
>> >> > By invoking "migrate -w <url>" we initiate a background
>> >> live-migration
>> >> > transferring of dirty pages continuously until invocation
>> >> of "migrate_end"
>> >> > which attempts to complete the live migration operation.
>> >>
>> >> What is the purpose of this patch?  How and when do I use it?
>> >>
>> >
>> > The warmup patch adds none-converging background update of guest
>> > memory during live-migration such that on request of live-migration
>> > completion (via "migrate_end" command) we get much faster
>> > response. This is especially needed when running a payload of large
>> > enterprise applications which have high memory demands.
>>
>> We should integrate this with Kemari (Kemari is doing something like
>> this, just that it has more requirements).  Isaku, do you have any comments?
>
> Yochi and Kei are familiar with Kemari. Not me. Cced to them.

I think it's OK to have this feature by checking max_downtime ==
0.  But I'm wondering what happens if users type commands like:

migrate_set_downtime 0
migrate <url> # w/o -d

It'll lock the monitor forever in most cases.  So forcing users to
set -d, or doing it automatically internally when max_downtime == 0,
seems better to me.  Sorry if I'm missing the point...

Yoshi

>
>
>>
>> BTW, what loads have you tested for this?
>>
>> if I setup an image with 1GB RAM and a DVD iso image, and do in the
>> guest:
>>
>> while true; do find /media/cdrom -type f | xargs md5sum; done
>>
>> Migration never converges with current code (if you use more than 1GB
>> memory, then all the DVD will be cached inside).
>>
>> So, I see this only useful for guests that are almost idle, and on that
>> case, migration speed is not the bigger of your problems, no?
>>
>> Later, Juan.
>>
>
> --
> yamahata
>
Shribman, Aidan - May 15, 2011, 2:25 p.m.
> From: Yoshiaki Tamura [mailto:tamura.yoshiaki@gmail.com] 
> I think it's OK to have this feature by checking max_downtime ==
> 0.  But I'm wondering that if users type commands like:
> 
> migrate_set_downtime 0
> migrate <url> # w/o -d
> 
> it'll lock the monitor forever in most cases.  So forcing users to
> set -d or automatically doing inside in case of max_downtime == 0
> seems better to me.  Sorry if I'm missing the point...
> 
> Yoshi

The suggested warmup phase implementation (never converging the migration when max_downtime == 0) should not be treated as a special case (requiring an implicit -d or the like), since we already end up with a monitor lockup for any non-detached migration of a VM with a high memory write rate whenever max_downtime is small enough (such as 0).

Aidan 


Patch

diff --git a/hmp-commands.hx b/hmp-commands.hx
index e5585ba..215ea41 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -717,24 +717,28 @@  ETEXI
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
-        .params     = "[-d] [-b] [-i] uri",
-        .help       = "migrate to URI (using -d to not wait for completion)"
-		      "\n\t\t\t -b for migration without shared storage with"
-		      " full copy of disk\n\t\t\t -i for migration without "
-		      "shared storage with incremental copy of disk "
-		      "(base image shared between src and destination)",
+        .args_type  = "detach:-d,blk:-b,inc:-i,warmup:-w,uri:s",
+        .params     = "[-d] [-b] [-i] [-w] uri",
+        .help       = "migrate to URI"
+                      "\n\t -d to not wait for completion"
+                      "\n\t -b for migration without shared storage with"
+                      " full copy of disk"
+                      "\n\t -i for migration without"
+                      " shared storage with incremental copy of disk"
+                      " (base image shared between source and destination)"
+                      "\n\t -w to enter warmup phase",
         .user_print = monitor_user_noop,	
 	.mhandler.cmd_new = do_migrate,
     },
 
 
 STEXI
-@item migrate [-d] [-b] [-i] @var{uri}
+@item migrate [-d] [-b] [-i] [-w] @var{uri}
 @findex migrate
 Migrate to @var{uri} (using -d to not wait for completion).
 	-b for migration with full copy of disk
 	-i for migration with incremental copy of disk (base image is shared)
+	-w to enter warmup phase
 ETEXI
 
     {
@@ -753,6 +757,19 @@  Cancel the current VM migration.
 ETEXI
 
     {
+        .name       = "migrate_end",
+        .args_type  = "",
+        .params     = "",
+        .help       = "Complete warmup and move to full live migration",
+        .mhandler.cmd = do_migrate_end,
+    },
+
+STEXI
+@item migrate_end
+Complete warmup and move to full live migration.
+ETEXI
+
+    {
         .name       = "migrate_set_speed",
         .args_type  = "value:o",
         .params     = "value",
diff --git a/migration.c b/migration.c
index 9ee8b17..c178a77 100644
--- a/migration.c
+++ b/migration.c
@@ -31,6 +31,8 @@ 
     do { } while (0)
 #endif
 
+static int is_migrate_warmup = 0;
+
 /* Migration speed throttling */
 static uint32_t max_throttle = (32 << 20);
 
@@ -81,6 +83,11 @@  int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data)
     int blk = qdict_get_try_bool(qdict, "blk", 0);
     int inc = qdict_get_try_bool(qdict, "inc", 0);
     const char *uri = qdict_get_str(qdict, "uri");
+    is_migrate_warmup = qdict_get_try_bool(qdict, "warmup", 0);
+
+    if (is_migrate_warmup) {
+        detach = 1; /* as we need migrate_end to complete */
+    }
 
     if (current_migration &&
         current_migration->get_status(current_migration) == MIG_STATE_ACTIVE) {
@@ -361,7 +368,9 @@  void migrate_fd_put_ready(void *opaque)
     }
 
     DPRINTF("iterate\n");
-    if (qemu_savevm_state_iterate(s->mon, s->file) == 1) {
+    if (is_migrate_warmup) {
+            qemu_savevm_state_warmup(s->mon, s->file);
+    } else if (qemu_savevm_state_iterate(s->mon, s->file) == 1) {
         int state;
         int old_vm_running = vm_running;
 
@@ -448,3 +457,15 @@  int migrate_fd_close(void *opaque)
     qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
     return s->close(s);
 }
+
+void do_migrate_end(Monitor *mon, const QDict *qdict)
+{
+    if (!vm_running) {
+        return;
+    }
+    if (!is_migrate_warmup) {
+        return;
+    }
+    is_migrate_warmup = 0;
+}
+
diff --git a/migration.h b/migration.h
index d13ed4f..6a96b29 100644
--- a/migration.h
+++ b/migration.h
@@ -134,4 +134,6 @@  static inline FdMigrationState *migrate_to_fms(MigrationState *mig_state)
     return container_of(mig_state, FdMigrationState, mig_state);
 }
 
+void do_migrate_end(Monitor *mon, const QDict *qdict);
+
 #endif
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 793cf1c..1f85c09 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -431,13 +431,16 @@  EQMP
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
-        .params     = "[-d] [-b] [-i] uri",
-        .help       = "migrate to URI (using -d to not wait for completion)"
-		      "\n\t\t\t -b for migration without shared storage with"
-		      " full copy of disk\n\t\t\t -i for migration without "
-		      "shared storage with incremental copy of disk "
-		      "(base image shared between src and destination)",
+        .args_type  = "detach:-d,blk:-b,inc:-i,warmup:-w,uri:s",
+        .params     = "[-d] [-b] [-i] [-w] uri",
+        .help       = "migrate to URI"
+                      "\n\t -d to not wait for completion"
+                      "\n\t -b for migration without shared storage with"
+                      " full copy of disk"
+                      "\n\t -i for migration without"
+                      " shared storage with incremental copy of disk"
+                      " (base image shared between source and destination)"
+                      "\n\t -w to enter warmup phase",
         .user_print = monitor_user_noop,	
 	.mhandler.cmd_new = do_migrate,
     },
@@ -453,6 +456,7 @@  Arguments:
 - "blk": block migration, full disk copy (json-bool, optional)
 - "inc": incremental disk copy (json-bool, optional)
 - "uri": Destination URI (json-string)
+- "warmup": enter warmup phase (json-bool, optional)
 
 Example:
 
@@ -494,6 +498,29 @@  Example:
 EQMP
 
     {
+        .name       = "migrate_end",
+        .args_type  = "",
+        .params     = "",
+        .help       = "Complete warmup and move to full live migration",
+        .mhandler.cmd = do_migrate_end,
+    },
+
+SQMP
+migrate_end
+-----------
+
+End the current migration warmup.
+
+Arguments: None.
+
+Example:
+
+-> { "execute": "migrate_end" }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "migrate_set_speed",
         .args_type  = "value:f",
         .params     = "value",
diff --git a/savevm.c b/savevm.c
index 4e49765..425c7b7 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1471,6 +1471,17 @@  int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
     return 0;
 }
 
+int qemu_savevm_state_warmup(Monitor *mon, QEMUFile *f)
+{
+    int ret;
+
+    if ((ret = qemu_savevm_state_iterate(mon, f)) < 0) {
+        return ret;
+    }
+
+    return 0;
+}
+
 int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f)
 {
     SaveStateEntry *se;
diff --git a/sysemu.h b/sysemu.h
index b81a70e..74e8a48 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -76,6 +76,7 @@  void main_loop_wait(int nonblocking);
 int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
                             int shared);
 int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f);
+int qemu_savevm_state_warmup(Monitor *mon, QEMUFile *f);
 int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f);
 void qemu_savevm_state_cancel(Monitor *mon, QEMUFile *f);
 int qemu_loadvm_state(QEMUFile *f);
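Assuming the patch is applied and QEMU was started with a QMP socket (e.g. -qmp unix:/tmp/qmp.sock,server), the new flow could be driven roughly as below. The socket path, URI, and helper names are illustrative, not part of the patch.

```python
import json
import socket

def qmp_command(sock_file, name, **arguments):
    """Send one QMP command over a file-like socket wrapper, return the parsed reply."""
    cmd = {"execute": name}
    if arguments:
        cmd["arguments"] = arguments
    sock_file.write(json.dumps(cmd) + "\n")
    sock_file.flush()
    return json.loads(sock_file.readline())

def warmup_then_migrate(path, uri):
    """Start a warmup migration, then complete it with migrate_end."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(path)
    f = s.makefile("rw")
    json.loads(f.readline())                         # consume the QMP greeting banner
    qmp_command(f, "qmp_capabilities")               # leave capabilities negotiation mode
    qmp_command(f, "migrate", uri=uri, warmup=True)  # begin the warmup phase
    # ... later, when ready to actually move the guest:
    return qmp_command(f, "migrate_end")
```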