[RFC,qemu,4/4] migration: filter out guest's free pages in ram bulk stage

Message ID 1457001868-15949-5-git-send-email-liang.z.li@intel.com
State New
Commit Message

Li, Liang Z March 3, 2016, 10:44 a.m. UTC
Get the free pages information through virtio and filter out the free
pages in the ram bulk stage. This can significantly reduce the total
live migration time as well as network traffic.

Signed-off-by: Liang Li <liang.z.li@intel.com>
---
 migration/ram.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 46 insertions(+), 6 deletions(-)
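
To illustrate what "filtering out" free pages means at the bitmap level, here is a minimal standalone sketch. It is not the code from this patch: the helper name bitmap_clear_free_pages is hypothetical, and it uses a plain and-not over two bitmaps, whereas the patch calls slow_bitmap_complement() on the dirty-memory blocks.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper (not part of QEMU): clear every bit in `dirty`
 * whose corresponding bit is set in `free_pages`, for the first `nbits`
 * pages. Bits at or beyond `nbits` are left untouched.
 */
static void bitmap_clear_free_pages(unsigned long *dirty,
                                    const unsigned long *free_pages,
                                    size_t nbits)
{
    assert(dirty != NULL && free_pages != NULL);

    size_t full = nbits / BITS_PER_LONG;   /* number of whole words */
    size_t rem = nbits % BITS_PER_LONG;    /* bits in the last partial word */

    for (size_t i = 0; i < full; i++) {
        dirty[i] &= ~free_pages[i];
    }
    if (rem) {
        unsigned long mask = (1UL << rem) - 1;
        dirty[full] &= ~(free_pages[full] & mask);
    }
}
```

With every page initially marked dirty, as in the bulk stage, the bits that remain set afterwards are the pages that still need to be sent.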

Comments

Cornelia Huck March 3, 2016, 12:16 p.m. UTC | #1
On Thu,  3 Mar 2016 18:44:28 +0800
Liang Li <liang.z.li@intel.com> wrote:

> Get the free pages information through virtio and filter out the free
> pages in the ram bulk stage. This can significantly reduce the total
> live migration time as well as network traffic.
> 
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> ---
>  migration/ram.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 46 insertions(+), 6 deletions(-)
> 

> @@ -1945,6 +1971,20 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>                                              DIRTY_MEMORY_MIGRATION);
>      }
>      memory_global_dirty_log_start();
> +
> +    if (balloon_free_pages_support() &&
> +        balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
> +                               &free_pages_count) == 0) {
> +        qemu_mutex_unlock_iothread();
> +        while (balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
> +                                      &free_pages_count) == 0) {
> +            usleep(1000);
> +        }
> +        qemu_mutex_lock_iothread();
> +
> +        filter_out_guest_free_pages(migration_bitmap_rcu->free_pages_bmap);

A general comment: Using the ballooner to get information about pages
that can be filtered out is too limited (there may be other ways to do
this; we might be able to use cmma on s390, for example), and I don't
like hardcoding to a specific method.

What about the reverse approach: Code may register a handler that
populates the free_pages_bitmap which is called during this stage?

<I like the idea of filtering in general, but I haven't looked at the
code yet>
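
A rough standalone sketch of the handler idea follows. Every name in it (FreePageProvider, free_page_provider_register, free_pages_query, demo_provider) is hypothetical and does not exist in QEMU; it only shows the shape of a registration interface in which the migration code asks whichever provider is registered, instead of calling the balloon directly.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define MAX_PROVIDERS 4

/*
 * Hypothetical callback type: set a bit in `bmap` for each of the first
 * `nbits` pages that is free. Returns 0 on success, -1 if no data.
 */
typedef int (*FreePageProvider)(unsigned long *bmap, size_t nbits,
                                void *opaque);

static struct {
    FreePageProvider fn;
    void *opaque;
} providers[MAX_PROVIDERS];
static size_t nr_providers;

/* Hypothetical hook a balloon device or CMMA code could call at init. */
static int free_page_provider_register(FreePageProvider fn, void *opaque)
{
    assert(fn != NULL);
    if (nr_providers == MAX_PROVIDERS) {
        return -1;
    }
    providers[nr_providers].fn = fn;
    providers[nr_providers].opaque = opaque;
    nr_providers++;
    return 0;
}

/* Called from migration setup: use the first provider that succeeds. */
static int free_pages_query(unsigned long *bmap, size_t nbits)
{
    size_t nlongs = (nbits + BITS_PER_LONG - 1) / BITS_PER_LONG;

    for (size_t i = 0; i < nr_providers; i++) {
        /* Start each attempt from an empty bitmap. */
        memset(bmap, 0, nlongs * sizeof(unsigned long));
        if (providers[i].fn(bmap, nbits, providers[i].opaque) == 0) {
            return 0;
        }
    }
    return -1; /* no provider registered, or none had data */
}

/* Toy provider for illustration only: reports pages 1 and 3 as free. */
static int demo_provider(unsigned long *bmap, size_t nbits, void *opaque)
{
    (void)opaque;
    if (nbits < 4) {
        return -1;
    }
    bmap[0] |= (1UL << 1) | (1UL << 3);
    return 0;
}
```

With this shape, ram_save_setup() would only depend on the query function, and a return of -1 would simply mean "send everything".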

> +    }
> +
>      migration_bitmap_sync();
>      qemu_mutex_unlock_ramlist();
>      qemu_mutex_unlock_iothread();
Daniel P. Berrangé March 3, 2016, 12:45 p.m. UTC | #2
On Thu, Mar 03, 2016 at 06:44:28PM +0800, Liang Li wrote:
> Get the free pages information through virtio and filter out the free
> pages in the ram bulk stage. This can significantly reduce the total
> live migration time as well as network traffic.
> 
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> ---
>  migration/ram.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 46 insertions(+), 6 deletions(-)

> @@ -1945,6 +1971,20 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>                                              DIRTY_MEMORY_MIGRATION);
>      }
>      memory_global_dirty_log_start();
> +
> +    if (balloon_free_pages_support() &&
> +        balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
> +                               &free_pages_count) == 0) {
> +        qemu_mutex_unlock_iothread();
> +        while (balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
> +                                      &free_pages_count) == 0) {
> +            usleep(1000);
> +        }
> +        qemu_mutex_lock_iothread();
> +
> +        filter_out_guest_free_pages(migration_bitmap_rcu->free_pages_bmap);
> +    }

IIUC, this code is synchronous wrt the guest OS balloon driver, i.e. it
is asking the guest for free pages and waiting for a response. If the
guest OS has crashed this is going to mean QEMU waits forever and thus
migration won't complete. Similarly you need to consider that the guest
OS may be malicious and simply never respond.

So if the migration code is going to use the guest balloon driver to get
info about free pages it has to be done in an asynchronous manner so that
migration can never be stalled by a slow/crashed/malicious guest driver.
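
One possible shape for this, as a standalone sketch with hypothetical names only (FreePagesRequest, free_pages_request_start and free_pages_request_poll are not QEMU APIs): the request carries a deadline, the migration code polls it without blocking, and on timeout migration simply continues without the filter. The caller is assumed to pass in a monotonic clock reading in milliseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    FREE_PAGES_PENDING,   /* request sent, no answer yet */
    FREE_PAGES_READY,     /* guest answered before the deadline */
    FREE_PAGES_TIMED_OUT  /* give up, migrate without filtering */
} FreePagesState;

typedef struct {
    int64_t deadline_ms;
    FreePagesState state;
} FreePagesRequest;

/* Record when the request was issued and when we stop waiting for it. */
static void free_pages_request_start(FreePagesRequest *req,
                                     int64_t now_ms, int64_t timeout_ms)
{
    assert(req != NULL && timeout_ms >= 0);
    req->deadline_ms = now_ms + timeout_ms;
    req->state = FREE_PAGES_PENDING;
}

/*
 * Non-blocking check. `guest_replied` stands in for whatever signal the
 * device model would provide once the guest driver has responded.
 * Once the request is READY or TIMED_OUT the state never changes, so a
 * late reply cannot resurrect a request that was already abandoned.
 */
static FreePagesState free_pages_request_poll(FreePagesRequest *req,
                                              int64_t now_ms,
                                              bool guest_replied)
{
    assert(req != NULL);
    if (req->state != FREE_PAGES_PENDING) {
        return req->state;
    }
    if (guest_replied) {
        req->state = FREE_PAGES_READY;
    } else if (now_ms >= req->deadline_ms) {
        req->state = FREE_PAGES_TIMED_OUT;
    }
    return req->state;
}
```

The point of the design is that a crashed or malicious guest can at worst cost the timeout, after which the bulk stage sends all pages as it does today.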

Regards,
Daniel
Li, Liang Z March 4, 2016, 2:32 a.m. UTC | #3
> On Thu,  3 Mar 2016 18:44:28 +0800
> Liang Li <liang.z.li@intel.com> wrote:
> 
> > Get the free pages information through virtio and filter out the free
> > pages in the ram bulk stage. This can significantly reduce the total
> > live migration time as well as network traffic.
> >
> > Signed-off-by: Liang Li <liang.z.li@intel.com>
> > ---
> > >  migration/ram.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 46 insertions(+), 6 deletions(-)
> >
> 
> > @@ -1945,6 +1971,20 @@ static int ram_save_setup(QEMUFile *f, void
> *opaque)
> >                                              DIRTY_MEMORY_MIGRATION);
> >      }
> >      memory_global_dirty_log_start();
> > +
> > +    if (balloon_free_pages_support() &&
> > +        balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
> > +                               &free_pages_count) == 0) {
> > +        qemu_mutex_unlock_iothread();
> > > +        while (balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
> > +                                      &free_pages_count) == 0) {
> > +            usleep(1000);
> > +        }
> > +        qemu_mutex_lock_iothread();
> > +
> > +
> > + filter_out_guest_free_pages(migration_bitmap_rcu-
> >free_pages_bmap);
> 
> A general comment: Using the ballooner to get information about pages that
> can be filtered out is too limited (there may be other ways to do this; we
> might be able to use cmma on s390, for example), and I don't like hardcoding
> to a specific method.
> 
> What about the reverse approach: Code may register a handler that
> populates the free_pages_bitmap which is called during this stage?

Good suggestion, thanks!

Liang
> <I like the idea of filtering in general, but I haven't looked at the code yet>
>
Li, Liang Z March 4, 2016, 2:43 a.m. UTC | #4
> On Thu, Mar 03, 2016 at 06:44:28PM +0800, Liang Li wrote:
> > Get the free pages information through virtio and filter out the free
> > pages in the ram bulk stage. This can significantly reduce the total
> > live migration time as well as network traffic.
> >
> > Signed-off-by: Liang Li <liang.z.li@intel.com>
> > ---
> >  migration/ram.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 46 insertions(+), 6 deletions(-)
> 
> > @@ -1945,6 +1971,20 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> >                                              DIRTY_MEMORY_MIGRATION);
> >      }
> >      memory_global_dirty_log_start();
> > +
> > +    if (balloon_free_pages_support() &&
> > +        balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
> > +                               &free_pages_count) == 0) {
> > +        qemu_mutex_unlock_iothread();
> > +        while (balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
> > +                                      &free_pages_count) == 0) {
> > +            usleep(1000);
> > +        }
> > +        qemu_mutex_lock_iothread();
> > +
> > +        filter_out_guest_free_pages(migration_bitmap_rcu->free_pages_bmap);
> > +    }
> 
> IIUC, this code is synchronous wrt the guest OS balloon driver, i.e. it is asking
> the guest for free pages and waiting for a response. If the guest OS has
> crashed this is going to mean QEMU waits forever and thus migration won't
> complete. Similarly you need to consider that the guest OS may be malicious
> and simply never respond.
> 
> So if the migration code is going to use the guest balloon driver to get info
> about free pages it has to be done in an asynchronous manner so that
> migration can never be stalled by a slow/crashed/malicious guest driver.
> 
> Regards,
> Daniel

Really, thanks a lot!

Liang
Patch

diff --git a/migration/ram.c b/migration/ram.c
index ee2547d..819553b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -40,6 +40,7 @@ 
 #include "trace.h"
 #include "exec/ram_addr.h"
 #include "qemu/rcu_queue.h"
+#include "sysemu/balloon.h"
 
 #ifdef DEBUG_MIGRATION_RAM
 #define DPRINTF(fmt, ...) \
@@ -241,6 +242,7 @@  static struct BitmapRcu {
     struct rcu_head rcu;
     /* Main migration bitmap */
     unsigned long *bmap;
+    unsigned long *free_pages_bmap;
     /* bitmap of pages that haven't been sent even once
      * only maintained and used in postcopy at the moment
      * where it's used to send the dirtymap at the start
@@ -561,12 +563,7 @@  ram_addr_t migration_bitmap_find_dirty(RAMBlock *rb,
     unsigned long next;
 
     bitmap = atomic_rcu_read(&migration_bitmap_rcu)->bmap;
-    if (ram_bulk_stage && nr > base) {
-        next = nr + 1;
-    } else {
-        next = find_next_bit(bitmap, size, nr);
-    }
-
+    next = find_next_bit(bitmap, size, nr);
     *ram_addr_abs = next << TARGET_PAGE_BITS;
     return (next - base) << TARGET_PAGE_BITS;
 }
@@ -1415,6 +1412,9 @@  void free_xbzrle_decoded_buf(void)
 static void migration_bitmap_free(struct BitmapRcu *bmap)
 {
     g_free(bmap->bmap);
+    if (balloon_free_pages_support()) {
+        g_free(bmap->free_pages_bmap);
+    }
     g_free(bmap->unsentmap);
     g_free(bmap);
 }
@@ -1873,6 +1873,28 @@  err:
     return ret;
 }
 
+static void filter_out_guest_free_pages(unsigned long *free_pages_bmap)
+{
+    RAMBlock *block;
+    DirtyMemoryBlocks *blocks;
+    unsigned long end, page;
+
+    blocks = atomic_rcu_read(&ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]);
+    block = QLIST_FIRST_RCU(&ram_list.blocks);
+    end = TARGET_PAGE_ALIGN(block->offset +
+                            block->used_length) >> TARGET_PAGE_BITS;
+    page = block->offset >> TARGET_PAGE_BITS;
+
+    while (page < end) {
+        unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
+        unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
+        unsigned long num = MIN(end - page, DIRTY_MEMORY_BLOCK_SIZE - offset);
+        unsigned long *p = free_pages_bmap + BIT_WORD(page);
+
+        slow_bitmap_complement(blocks->blocks[idx], p, num);
+        page += num;
+    }
+}
 
 /* Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code
@@ -1884,6 +1906,7 @@  static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMBlock *block;
     int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */
+    uint64_t free_pages_count = 0;
 
     dirty_rate_high_cnt = 0;
     bitmap_sync_count = 0;
@@ -1931,6 +1954,9 @@  static int ram_save_setup(QEMUFile *f, void *opaque)
     ram_bitmap_pages = last_ram_offset() >> TARGET_PAGE_BITS;
     migration_bitmap_rcu = g_new0(struct BitmapRcu, 1);
     migration_bitmap_rcu->bmap = bitmap_new(ram_bitmap_pages);
+    if (balloon_free_pages_support()) {
+        migration_bitmap_rcu->free_pages_bmap = bitmap_new(ram_bitmap_pages);
+    }
 
     if (migrate_postcopy_ram()) {
         migration_bitmap_rcu->unsentmap = bitmap_new(ram_bitmap_pages);
@@ -1945,6 +1971,20 @@  static int ram_save_setup(QEMUFile *f, void *opaque)
                                             DIRTY_MEMORY_MIGRATION);
     }
     memory_global_dirty_log_start();
+
+    if (balloon_free_pages_support() &&
+        balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
+                               &free_pages_count) == 0) {
+        qemu_mutex_unlock_iothread();
+        while (balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
+                                      &free_pages_count) == 0) {
+            usleep(1000);
+        }
+        qemu_mutex_lock_iothread();
+
+        filter_out_guest_free_pages(migration_bitmap_rcu->free_pages_bmap);
+    }
+
     migration_bitmap_sync();
     qemu_mutex_unlock_ramlist();
     qemu_mutex_unlock_iothread();