Patchwork qemu-kvm: Speed up of the dirty-bitmap-traveling

Submitter OHMURA Kei
Date Feb. 5, 2010, 10:18 a.m.
Message ID <4B6BF06D.1090909@lab.ntt.co.jp>
Permalink /patch/44700/
State New

Comments

OHMURA Kei - Feb. 5, 2010, 10:18 a.m.
dirty-bitmap-traveling is carried out one byte at a time in qemu-kvm.c.
We think that dirty-bitmap-traveling one long at a time is faster than one byte
at a time, especially when most of memory is not dirty.

Signed-off-by: OHMURA Kei <ohmura.kei@lab.ntt.co.jp>
---
 qemu-kvm.c |   49 ++++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 42 insertions(+), 7 deletions(-)

-- 1.6.3.3
OHMURA Kei - Feb. 8, 2010, 6:14 a.m.
> Sounds logical - do you have numbers on the improvement?

Sure.  The patch showed approximately 3-7 times speed up when measured with 
rdtsc.  The test environment and detailed results are described below.

---
tmp = rdtsc();
/* function of original code*/
t1 += rdtsc() - tmp;
    
tmp = rdtsc();
/* function of this patch */
t2 += rdtsc() - tmp;
---
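
The measurement pattern above can be sketched as a small harness. This is only
an illustration, not the code used for the numbers below: the names
read_ticks, tick_stats, timed_call, count_up and ticks_to_msec are hypothetical,
the non-x86 fallback clock is an assumption, and converting ticks to msec
assumes the TSC runs at the nominal CPU frequency (2.66GHz on the test host).

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the CPU time-stamp counter (the rdtsc() of the snippet above). */
static inline uint64_t read_ticks(void)
{
    return __rdtsc();
}
#else
/* Assumption: on non-x86 hosts, use a monotonic clock in nanoseconds. */
static inline uint64_t read_ticks(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* Hypothetical accumulator: elapsed ticks and call count for one variant. */
struct tick_stats {
    uint64_t total;
    uint64_t calls;
};

/* Time one invocation of fn(arg) and add the elapsed ticks to *s. */
void timed_call(struct tick_stats *s, void (*fn)(void *), void *arg)
{
    uint64_t start = read_ticks();
    fn(arg);
    s->total += read_ticks() - start;
    s->calls++;
}

/* Convert a tick count to milliseconds, given the tick rate in Hz. */
double ticks_to_msec(uint64_t ticks, double ticks_per_sec)
{
    return (double)ticks * 1000.0 / ticks_per_sec;
}

/* Hypothetical stand-in for the function being measured. */
void count_up(void *arg)
{
    (*(unsigned int *)arg)++;
}
```

One tick_stats would be kept per variant (original and patched), with both
variants called on the same bitmap so that the accumulated totals are
comparable.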

Test Environment:
CPU: 4x Intel Xeon Quad Core 2.66GHz
Mem size: 6GB
kvm version: 2.6.31-17-server
qemu version: commit ed880109f74f0a4dd5b7ec09e6a2d9ba4903d9a5

Host OS: Ubuntu 9.10 (kernel 2.6.31)
Guest OS: Debian/GNU Linux lenny (kernel 2.6.26)
Guest Mem size: 512MB

We executed live migration three times for each test.  The tables below show,
for each live migration run, how many times the function was called (#called),
the runtime of the original code (orig.), the runtime with this patch (patch),
and the speedup ratio (ratio).

Experimental results:
Test1: Guest OS reads a 3GB file, which is bigger than memory.
#called     orig.(msec)     patch(msec)     ratio
114         1.00            0.15            6.76
132         1.57            0.25            6.26
96          1.00            0.16            6.27
 
Test2: Guest OS reads/writes a 3GB file, which is bigger than memory.
#called     orig.(msec)     patch(msec)     ratio
2196        38.1            10.6            3.59
2256        39.6            10.8            3.68
2112        36.3            10.3            3.53



> Would be great if you could provide a version for upstream as well
> because it will likely replace this qemu-kvm code one day.

O.K.  We'll prepare it.

We'll also post a patch set soon to speed up dirty page checking in
ram_save_block and ram_save_live.
Avi Kivity - Feb. 8, 2010, 12:40 p.m.
On 02/05/2010 12:18 PM, OHMURA Kei wrote:
> dirty-bitmap-traveling is carried out one byte at a time in qemu-kvm.c.
> We think that dirty-bitmap-traveling one long at a time is faster than one byte
> at a time, especially when most of memory is not dirty.
>
>
>
> +
> +static int kvm_get_dirty_pages_log_range_by_long(unsigned long start_addr,
> +                                                 unsigned char *bitmap,
> +                                                 unsigned long offset,
> +                                                 unsigned long mem_size)
> +{
> +    unsigned int i;
> +    unsigned int len;
> +    unsigned long *bitmap_ul = (unsigned long *)bitmap;
> +
> +    /* bitmap-traveling by long size is faster than by byte size
> +     * especially when most of memory is not dirty.
> +     * bitmap should be long-size aligned for traveling by long.
> +     */
> +    if (((unsigned long)bitmap & (TARGET_LONG_SIZE - 1)) == 0) {
>   

Since we allocate the bitmap, we can be sure that it is aligned on a
long boundary (qemu_malloc() should guarantee that). So you can
eliminate the fallback.
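
The alignment argument can be illustrated with a short sketch. It assumes
that qemu_malloc() ends up in the C library allocator, whose results are
suitably aligned for any fundamental type; alloc_bitmap and is_long_aligned
are hypothetical names, not functions from qemu-kvm.c.

```c
#include <stdint.h>
#include <stdlib.h>

/* Nonzero if p sits on a sizeof(unsigned long) boundary (the patch's test). */
int is_long_aligned(const void *p)
{
    return ((uintptr_t)p % sizeof(unsigned long)) == 0;
}

/*
 * Hypothetical stand-in for the bitmap allocation.  Assumption: the real
 * allocator wraps malloc()/calloc(), so the returned block is aligned for
 * unsigned long and can be walked through an unsigned long pointer.
 */
unsigned char *alloc_bitmap(size_t nbytes)
{
    return calloc(1, nbytes);
}
```

Under that assumption the alignment test in the patch is always true for a
freshly allocated bitmap, so the byte-wise slow path would never run.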

> +        len = ((mem_size / TARGET_PAGE_SIZE) + TARGET_LONG_BITS - 1) /
> +            TARGET_LONG_BITS;
> +        for (i = 0; i < len; i++)
> +            if (bitmap_ul[i] != 0)
> +                kvm_get_dirty_pages_log_range_by_byte(i * TARGET_LONG_SIZE, 
> +                    (i + 1) * TARGET_LONG_SIZE, bitmap, offset);
>   

Better to just use the original loop here (since we don't need the
function as a fallback).

> +        /* 
> +         * We will check the remaining dirty-bitmap, 
> +         * when the mem_size is not a multiple of TARGET_LONG_SIZE. 
> +         */ 
> +        if ((mem_size & (TARGET_LONG_SIZE - 1)) != 0) {
> +            len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
> +            kvm_get_dirty_pages_log_range_by_byte(i * TARGET_LONG_SIZE, 
> +                len, bitmap, offset);
> +        }
>   

Seems like the bitmap size is aligned as well (allocated using
BITMAP_SIZE which aligns using HOST_LONG_BITS), so this is also
unnecessary.
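
Taken together, the review comments point toward a single loop that walks the
bitmap one long at a time, with no fallback and no tail handling. A minimal
sketch of that idea follows; it is not the revised patch. The names
BITS_PER_LONG, bitmap_bytes, dirty_page_fn, scan_dirty_bitmap, page_list and
record_page are hypothetical, __builtin_ctzl is a GCC/Clang builtin used here
in place of ffsl(), and the word/bit numbering assumes a little-endian host
such as x86.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Bitmap size in bytes, rounded up to a whole number of longs. */
size_t bitmap_bytes(size_t nr_pages)
{
    return (nr_pages + BITS_PER_LONG - 1) / BITS_PER_LONG
           * sizeof(unsigned long);
}

/* Hypothetical callback, invoked once per dirty page number. */
typedef void (*dirty_page_fn)(unsigned long page_number, void *opaque);

/*
 * Visit every set bit, skipping all-zero longs in a single comparison.
 * Assumes the bitmap is long-aligned, its size is a multiple of
 * sizeof(long), and any padding bits past nr_pages are zero.
 * Returns the number of dirty pages found.
 */
unsigned long scan_dirty_bitmap(const unsigned long *bitmap, size_t nr_pages,
                                dirty_page_fn fn, void *opaque)
{
    size_t len = (nr_pages + BITS_PER_LONG - 1) / BITS_PER_LONG;
    unsigned long n = 0;
    size_t i;

    assert(((uintptr_t)bitmap % sizeof(unsigned long)) == 0);

    for (i = 0; i < len; i++) {
        unsigned long w = bitmap[i];
        while (w != 0) {
            unsigned int j = (unsigned int)__builtin_ctzl(w); /* lowest set bit */
            fn(i * BITS_PER_LONG + j, opaque);
            w &= w - 1;                         /* clear that bit */
            n++;
        }
    }
    return n;
}

/* Hypothetical collector used to demonstrate the callback. */
struct page_list {
    unsigned long pages[16];
    size_t count;
};

void record_page(unsigned long page_number, void *opaque)
{
    struct page_list *l = opaque;
    if (l->count < 16)
        l->pages[l->count++] = page_number;
}
```

When most of memory is clean, nearly every iteration of the outer loop is a
single load and compare, which is where the speedup over the byte-wise loop
would come from.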

Patch

diff --git a/qemu-kvm.c b/qemu-kvm.c
index a305907..5459cdd 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2433,22 +2433,21 @@  int kvm_physical_memory_set_dirty_tracking(int enable)
 }
 
 /* get kvm's dirty pages bitmap and update qemu's */
-static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
-                                         unsigned char *bitmap,
-                                         unsigned long offset,
-                                         unsigned long mem_size)
+static void kvm_get_dirty_pages_log_range_by_byte(unsigned int start,
+                                                  unsigned int end,
+                                                  unsigned char *bitmap,
+                                                  unsigned long offset)
 {
     unsigned int i, j, n = 0;
     unsigned char c;
     unsigned long page_number, addr, addr1;
     ram_addr_t ram_addr;
-    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
 
     /* 
      * bitmap-traveling is faster than memory-traveling (for addr...) 
      * especially when most of the memory is not dirty.
      */
-    for (i = 0; i < len; i++) {
+    for (i = start; i < end; i++) {
         c = bitmap[i];
         while (c > 0) {
             j = ffsl(c) - 1;
@@ -2461,13 +2460,49 @@  static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
             n++;
         }
     }
+}
+
+static int kvm_get_dirty_pages_log_range_by_long(unsigned long start_addr,
+                                                 unsigned char *bitmap,
+                                                 unsigned long offset,
+                                                 unsigned long mem_size)
+{
+    unsigned int i;
+    unsigned int len;
+    unsigned long *bitmap_ul = (unsigned long *)bitmap;
+
+    /* bitmap-traveling by long size is faster than by byte size
+     * especially when most of memory is not dirty.
+     * bitmap should be long-size aligned for traveling by long.
+     */
+    if (((unsigned long)bitmap & (TARGET_LONG_SIZE - 1)) == 0) {
+        len = ((mem_size / TARGET_PAGE_SIZE) + TARGET_LONG_BITS - 1) /
+            TARGET_LONG_BITS;
+        for (i = 0; i < len; i++)
+            if (bitmap_ul[i] != 0)
+                kvm_get_dirty_pages_log_range_by_byte(i * TARGET_LONG_SIZE, 
+                    (i + 1) * TARGET_LONG_SIZE, bitmap, offset);
+        /* 
+         * We will check the remaining dirty-bitmap, 
+         * when the mem_size is not a multiple of TARGET_LONG_SIZE. 
+         */ 
+        if ((mem_size & (TARGET_LONG_SIZE - 1)) != 0) {
+            len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
+            kvm_get_dirty_pages_log_range_by_byte(i * TARGET_LONG_SIZE, 
+                len, bitmap, offset);
+        }
+    } else { /* slow path: traveling by byte. */
+        len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
+        kvm_get_dirty_pages_log_range_by_byte(0, len, bitmap, offset);
+    }
+
     return 0;
 }
 
 static int kvm_get_dirty_bitmap_cb(unsigned long start, unsigned long len,
                                    void *bitmap, void *opaque)
 {
-    return kvm_get_dirty_pages_log_range(start, bitmap, start, len);
+    return kvm_get_dirty_pages_log_range_by_long(start, bitmap, start, len);
 }
 
 /*