diff mbox

Re: [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling

Message ID 4B7130E9.7060809@lab.ntt.co.jp
State New
Headers show

Commit Message

OHMURA Kei Feb. 9, 2010, 9:54 a.m. UTC
Thank you for your comments.  We have implemented the code which applied your 
comments.  This is patch for qemu-kvm.c.

Signed-off-by: OHMURA Kei <ohmura.kei@lab.ntt.co.jp>
---
 qemu-kvm.c |   31 +++++++++++++++++++------------
 1 files changed, 19 insertions(+), 12 deletions(-)

Comments

Avi Kivity Feb. 9, 2010, 10:26 a.m. UTC | #1
On 02/09/2010 11:54 AM, OHMURA Kei wrote:
> Thank you for your comments.  We have implemented the code which applied your 
> comments.  This is patch for qemu-kvm.c.
>   

Please reuse the changelog when reposing a patch, this makes it easier
for me to apply it.

> @@ -2438,27 +2438,34 @@ static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
>                                           unsigned long offset,
>                                           unsigned long mem_size)
>  {
> -    unsigned int i, j, n = 0;
> +    unsigned int i, j, k, start, end;
>      unsigned char c;
>      unsigned long page_number, addr, addr1;
>      ram_addr_t ram_addr;
> -    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
> +    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + TARGET_LONG_BITS - 1) /
> +        TARGET_LONG_BITS;
> +    unsigned long *bitmap_ul = (unsigned long *)bitmap;
>  
>      /* 
>       * bitmap-traveling is faster than memory-traveling (for addr...) 
>       * especially when most of the memory is not dirty.
>       */
>      for (i = 0; i < len; i++) {
> -        c = bitmap[i];
> -        while (c > 0) {
> -            j = ffsl(c) - 1;
> -            c &= ~(1u << j);
> -            page_number = i * 8 + j;
> -            addr1 = page_number * TARGET_PAGE_SIZE;
> -            addr = offset + addr1;
> -            ram_addr = cpu_get_physical_page_desc(addr);
> -            cpu_physical_memory_set_dirty(ram_addr);
> -            n++;
> +        if (bitmap_ul[i] != 0) {
> +            start = i * TARGET_LONG_SIZE;
> +            end = (i + 1) * TARGET_LONG_SIZE;
>   

Should be a host long size, not guest. This will fail when running a
32-bit qemu-system-x86_64 binary.

> +            for (j = start; j < end; j++) {
> +                c = bitmap[j];
> +                while (c > 0) {
> +                    k = ffsl(c) - 1;
> +                    c &= ~(1u << k);
> +                    page_number = j * 8 + k;
> +                    addr1 = page_number * TARGET_PAGE_SIZE;
> +                    addr = offset + addr1;
> +                    ram_addr = cpu_get_physical_page_desc(addr);
> +                    cpu_physical_memory_set_dirty(ram_addr);
> +                }
> +            }
>   

Instead of using a nested loop if bitmap_ul[i] != 0, it is possible to
use just a single loop (while (c > 0)), and process a long's worth of data.

The only trickery is with big endian hosts, where the conversion from
bit number to page number is a bit complicated.

If we do this, we can convert the bitmap's type to unsigned long
throughout, and avoid the casts.
OHMURA Kei Feb. 10, 2010, 9:55 a.m. UTC | #2
> Please reuse the changelog when reposing a patch, this makes it easier
> for me to apply it.

Thanks.  Will follow it from next time.


> Should be a host long size, not guest. This will fail when running a
> 32-bit qemu-system-x86_64 binary.

Sorry. That was our mistake.


> Instead of using a nested loop if bitmap_ul[i] != 0, it is possible to
> use just a single loop (while (c>  0)), and process a long's worth of data.
>
> The only trickery is with big endian hosts, where the conversion from
> bit number to page number is a bit complicated.

To convert the bitmap from big endian to little endian, le_bswap macro in
bswap.h seems useful, which is now undefined.  What do you think about this
approach?

This is an example bitmap-traveling code using le_bswap:
     /* 
      * bitmap-traveling is faster than memory-traveling (for addr...) 
      * especially when most of the memory is not dirty.
      */
     for (i = 0; i < len; i++) {
        if (bitmap_ul[i] != 0) {
            c = le_bswap(bitmap_ul[i], HOST_LONG_BITS);
            while (c > 0) {
                j = ffsl(c) - 1;
                c &= ~(1ul << j);
                page_number = i * HOST_LONG_BITS + j;
                addr1 = page_number * TARGET_PAGE_SIZE;
                addr = offset + addr1;
                ram_addr = cpu_get_physical_page_desc(addr);
                cpu_physical_memory_set_dirty(ram_addr);
            }
         }
     }
Avi Kivity Feb. 10, 2010, 10:24 a.m. UTC | #3
On 02/10/2010 11:55 AM, OHMURA Kei wrote:
>
>> Instead of using a nested loop if bitmap_ul[i] != 0, it is possible to
>> use just a single loop (while (c>  0)), and process a long's worth of data.
>>
>> The only trickery is with big endian hosts, where the conversion from
>> bit number to page number is a bit complicated.
>>     
> To convert the bitmap from big endian to little endian, le_bswap macro in
> bswap.h seems useful, which is now undefined.  What do you think about this
> approach?
>
> This is an example bitmap-traveling code using le_bswap:
>      /* 
>       * bitmap-traveling is faster than memory-traveling (for addr...) 
>       * especially when most of the memory is not dirty.
>       */
>      for (i = 0; i < len; i++) {
>         if (bitmap_ul[i] != 0) {
>             c = le_bswap(bitmap_ul[i], HOST_LONG_BITS);
>             while (c > 0) {
>                 j = ffsl(c) - 1;
>                 c &= ~(1ul << j);
>                 page_number = i * HOST_LONG_BITS + j;
>                 addr1 = page_number * TARGET_PAGE_SIZE;
>                 addr = offset + addr1;
>                 ram_addr = cpu_get_physical_page_desc(addr);
>                 cpu_physical_memory_set_dirty(ram_addr);
>             }
>          }
>      }
>   

Yes, that solves the problem very neatly.
diff mbox

Patch

diff --git a/qemu-kvm.c b/qemu-kvm.c
index a305907..d7474ea 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2438,27 +2438,34 @@  static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
                                          unsigned long offset,
                                          unsigned long mem_size)
 {
-    unsigned int i, j, n = 0;
+    unsigned int i, j, k, start, end;
     unsigned char c;
     unsigned long page_number, addr, addr1;
     ram_addr_t ram_addr;
-    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
+    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + TARGET_LONG_BITS - 1) /
+        TARGET_LONG_BITS;
+    unsigned long *bitmap_ul = (unsigned long *)bitmap;
 
     /* 
      * bitmap-traveling is faster than memory-traveling (for addr...) 
      * especially when most of the memory is not dirty.
      */
     for (i = 0; i < len; i++) {
-        c = bitmap[i];
-        while (c > 0) {
-            j = ffsl(c) - 1;
-            c &= ~(1u << j);
-            page_number = i * 8 + j;
-            addr1 = page_number * TARGET_PAGE_SIZE;
-            addr = offset + addr1;
-            ram_addr = cpu_get_physical_page_desc(addr);
-            cpu_physical_memory_set_dirty(ram_addr);
-            n++;
+        if (bitmap_ul[i] != 0) {
+            start = i * TARGET_LONG_SIZE;
+            end = (i + 1) * TARGET_LONG_SIZE;
+            for (j = start; j < end; j++) {
+                c = bitmap[j];
+                while (c > 0) {
+                    k = ffsl(c) - 1;
+                    c &= ~(1u << k);
+                    page_number = j * 8 + k;
+                    addr1 = page_number * TARGET_PAGE_SIZE;
+                    addr = offset + addr1;
+                    ram_addr = cpu_get_physical_page_desc(addr);
+                    cpu_physical_memory_set_dirty(ram_addr);
+                }
+            }
         }
     }
     return 0;