Patchwork [1/3] qemu-kvm: Wrap phys_ram_dirty with additional inline functions.

login
register
mail settings
Submitter OHMURA Kei
Date Feb. 8, 2010, 10:22 a.m.
Message ID <4B6FE5DA.6000502@lab.ntt.co.jp>
Download mbox | patch
Permalink /patch/44772/
State New
Headers show

Comments

OHMURA Kei - Feb. 8, 2010, 10:22 a.m.
We think access phys_ram_dirty through inline functions is better
than directly for encoupseling reason.

We devided the ram in a 64 pages block. Each block has a counter, which is
stored in phys_ram_dirty_by_word. It shows the number of dirty pages.
We will find the 64 pages block is dirty or non-dirty using
phys_ram_dirty_by_word.

Signed-off-by: OHMURA Kei <ohmura.kei@lab.ntt.co.jp>
---
 cpu-all.h  |   74 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 cpu-defs.h |    1 +
 2 files changed, 75 insertions(+), 0 deletions(-)
Avi Kivity - Feb. 11, 2010, 12:42 p.m.
On 02/08/2010 12:22 PM, OHMURA Kei wrote:
> We think access phys_ram_dirty through inline functions is better
> than directly for encoupseling reason.
>
> We devided the ram in a 64 pages block. Each block has a counter, which is
> stored in phys_ram_dirty_by_word. It shows the number of dirty pages.
> We will find the 64 pages block is dirty or non-dirty using
> phys_ram_dirty_by_word.
>
> Signed-off-by: OHMURA Kei <ohmura.kei@lab.ntt.co.jp>
> ---
>  cpu-all.h  |   74 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  cpu-defs.h |    1 +
>  2 files changed, 75 insertions(+), 0 deletions(-)
>
> diff --git a/cpu-all.h b/cpu-all.h
> index 8ed76c7..2251f14 100644
> --- a/cpu-all.h
> +++ b/cpu-all.h
> @@ -168,6 +168,33 @@ typedef union {
>  } CPU_QuadU;
>  #endif
>  
> +static inline unsigned long unroll_flags_to_ul(int flags)
> +{
> +    unsigned long ret = 0, flags_ul = (unsigned long)flags;
> +
> +#if TARGET_LONG_SIZE == 4
> +    ret |= flags_ul <<  0;
> +    ret |= flags_ul <<  8;
> +    ret |= flags_ul << 16;
> +    ret |= flags_ul << 24;
> +#elif TARGET_LONG_SIZE == 8
> +    ret |= flags_ul <<  0;
> +    ret |= flags_ul <<  8;
> +    ret |= flags_ul << 16;
> +    ret |= flags_ul << 24;
> +    ret |= flags_ul << 32;
> +    ret |= flags_ul << 40;
> +    ret |= flags_ul << 48;
> +    ret |= flags_ul << 56;
>   

HOST_LONG_SIZE, not TARGET_LONG_SIZE.

> @@ -890,9 +923,50 @@ static inline int cpu_physical_memory_get_dirty(ram_addr_t addr,
>  
>  static inline void cpu_physical_memory_set_dirty(ram_addr_t addr)
>  {
> +    if (phys_ram_dirty[addr >> TARGET_PAGE_BITS] != 0xff)
> +        ++phys_ram_dirty_by_word[(addr >> TARGET_PAGE_BITS) / 
> +                                 TARGET_LONG_BITS];
> +
>      phys_ram_dirty[addr >> TARGET_PAGE_BITS] = 0xff;
>  }
>   

Why do you need a counter? It may be sufficient to set a single bit.
This reduces the memory overhead and perhaps cache thrashing.
OHMURA Kei - Feb. 12, 2010, 2:08 a.m.
> Why do you need a counter? It may be sufficient to set a single bit.
> This reduces the memory overhead and perhaps cache thrashing.

Thanks for looking into this.  I agree with your opinion.

Our motivation here is to skip traveling when the dirty bitmap is really sparse
or dense, so either setting a bit or counting up would be fine.

There is one advantage to the counter approach that we can make this large
traveling granularity flexible.  In case of the bit approach, the maximum
granularity is limited to HOST_LONG_BITS.  If you think this flexibility is to
be useless, we would take the bit approach.

By the way, this is about filling the gap of the dirty bitmap management 
between kvm and qemu.  Do you think we should set a bit when qemu's 
phys_ram_dirty is 0xff or !0?

Radically, if we could have a bit-based phys_ram_dirty_by_word, we may just OR
the dirty bitmap of kvm with qemu in kvm_get_dirty_pages_log_range()...
Avi Kivity - Feb. 13, 2010, 7:33 a.m.
On 02/12/2010 04:08 AM, OHMURA Kei wrote:
>> Why do you need a counter? It may be sufficient to set a single bit.
>> This reduces the memory overhead and perhaps cache thrashing.
>>      
> Thanks for looking into this.  I agree with your opinion.
>
> Our motivation here is to skip traveling when the dirty bitmap is really sparse
> or dense, so either setting a bit or counting up would be fine.
>
> There is one advantage to the counter approach that we can make this large
> traveling granularity flexible.  In case of the bit approach, the maximum
> granularity is limited to HOST_LONG_BITS.  If you think this flexibility is to
> be useless, we would take the bit approach.
>    

The bit approach can be used for any packing ratio; for example you can 
pack 64 pages in a single bit.  The rule is that if one or more pages is 
dirty, the bit is set; otherwise it is clear.  This makes clearing a 
single page expensive (you have to examine the state of 63 other pages) 
but IIRC we always clear in ranges, so apart from the edges, you can use 
a memset.

> By the way, this is about filling the gap of the dirty bitmap management
> between kvm and qemu.  Do you think we should set a bit when qemu's
> phys_ram_dirty is 0xff or !0?
>
> Radically, if we could have a bit-based phys_ram_dirty_by_word, we may just OR
> the dirty bitmap of kvm with qemu in kvm_get_dirty_pages_log_range()...
>    

The problem is that the qemu uses the dirty information for at least 
three different purposes: live migration, vga updates, and tcg 
self-modifying code.  But I think that's solvable: keep a separate 
bitmap for each purpose, and OR the kvm bitmap into any used qemu bitmap 
whenever we get it from the kernel.

That has many advantages; foremost, when vnc is not connected and we 
aren't live migrating, we can drop all of the bitmaps and save some 
memory.  If you can make that work I think that's best.
Yoshiaki Tamura - Feb. 15, 2010, 5:05 a.m.
Avi Kivity wrote:
> On 02/12/2010 04:08 AM, OHMURA Kei wrote:
>>> Why do you need a counter? It may be sufficient to set a single bit.
>>> This reduces the memory overhead and perhaps cache thrashing.
>> Thanks for looking into this. I agree with your opinion.
>>
>> Our motivation here is to skip traveling when the dirty bitmap is
>> really sparse
>> or dense, so either setting a bit or counting up would be fine.
>>
>> There is one advantage to the counter approach that we can make this
>> large
>> traveling granularity flexible. In case of the bit approach, the maximum
>> granularity is limited to HOST_LONG_BITS. If you think this
>> flexibility is to
>> be useless, we would take the bit approach.
>
> The bit approach can be used for any packing ratio; for example you can
> pack 64 pages in a single bit. The rule is that if one or more pages is
> dirty, the bit is set; otherwise it is clear. This makes clearing a
> single page expensive (you have to examine the state of 63 other pages)
> but IIRC we always clear in ranges, so apart from the edges, you can use
> a memset.

Sounds good.
If we could extend the packing ratio to kvm (in kernel),
it would be more efficient.

>> By the way, this is about filling the gap of the dirty bitmap management
>> between kvm and qemu. Do you think we should set a bit when qemu's
>> phys_ram_dirty is 0xff or !0?
>>
>> Radically, if we could have a bit-based phys_ram_dirty_by_word, we may
>> just OR
>> the dirty bitmap of kvm with qemu in kvm_get_dirty_pages_log_range()...
>
> The problem is that the qemu uses the dirty information for at least
> three different purposes: live migration, vga updates, and tcg
> self-modifying code. But I think that's solvable: keep a separate bitmap
> for each purpose, and OR the kvm bitmap into any used qemu bitmap
> whenever we get it from the kernel.
>
> That has many advantages; foremost, when vnc is not connected and we
> aren't live migrating, we can drop all of the bitmaps and save some
> memory. If you can make that work I think that's best.

Would be happy to do it.
We were also thinking that approach originally, but hesitating because the 
changes might be radical.  I hope this plan is fine for upstream qemu too.

Patch

diff --git a/cpu-all.h b/cpu-all.h
index 8ed76c7..2251f14 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -168,6 +168,33 @@  typedef union {
 } CPU_QuadU;
 #endif
 
+static inline unsigned long unroll_flags_to_ul(int flags)
+{
+    unsigned long ret = 0, flags_ul = (unsigned long)flags;
+
+#if TARGET_LONG_SIZE == 4
+    ret |= flags_ul <<  0;
+    ret |= flags_ul <<  8;
+    ret |= flags_ul << 16;
+    ret |= flags_ul << 24;
+#elif TARGET_LONG_SIZE == 8
+    ret |= flags_ul <<  0;
+    ret |= flags_ul <<  8;
+    ret |= flags_ul << 16;
+    ret |= flags_ul << 24;
+    ret |= flags_ul << 32;
+    ret |= flags_ul << 40;
+    ret |= flags_ul << 48;
+    ret |= flags_ul << 56;
+#else
+    int i;
+    for (i = 0; i < sizeof(unsigned long); i++)
+        ret |= flags_ul << (i * 8);
+#endif
+
+    return ret;
+}
+
 /* CPU memory access without any memory or io remapping */
 
 /*
@@ -847,6 +874,7 @@  int cpu_str_to_log_mask(const char *str);
 
 extern int phys_ram_fd;
 extern uint8_t *phys_ram_dirty;
+extern int *phys_ram_dirty_by_word;
 extern ram_addr_t ram_size;
 extern ram_addr_t last_ram_offset;
 extern uint8_t *bios_mem;
@@ -882,6 +910,11 @@  static inline int cpu_physical_memory_is_dirty(ram_addr_t addr)
     return phys_ram_dirty[addr >> TARGET_PAGE_BITS] == 0xff;
 }
 
+static inline int cpu_physical_memory_get_dirty_flags(ram_addr_t addr)
+{
+    return phys_ram_dirty[addr >> TARGET_PAGE_BITS];
+}
+
 static inline int cpu_physical_memory_get_dirty(ram_addr_t addr,
                                                 int dirty_flags)
 {
@@ -890,9 +923,50 @@  static inline int cpu_physical_memory_get_dirty(ram_addr_t addr,
 
 static inline void cpu_physical_memory_set_dirty(ram_addr_t addr)
 {
+    if (phys_ram_dirty[addr >> TARGET_PAGE_BITS] != 0xff)
+        ++phys_ram_dirty_by_word[(addr >> TARGET_PAGE_BITS) / 
+                                 TARGET_LONG_BITS];
+
     phys_ram_dirty[addr >> TARGET_PAGE_BITS] = 0xff;
 }
 
+static inline int cpu_physical_memory_set_dirty_flags(ram_addr_t addr,
+                                                      int dirty_flags)
+{
+    if ((phys_ram_dirty[addr >> TARGET_PAGE_BITS] != 0xff) &&
+        ((phys_ram_dirty[addr >> TARGET_PAGE_BITS] | dirty_flags) == 0xff)) 
+        ++phys_ram_dirty_by_word[(addr >> TARGET_PAGE_BITS) / 
+                                 TARGET_LONG_BITS];
+
+    return phys_ram_dirty[addr >> TARGET_PAGE_BITS] |= dirty_flags;
+}
+
+static inline void cpu_physical_memory_mask_dirty_range(ram_addr_t start,
+                                                        int length,
+                                                        int dirty_flags)
+{
+    int i, mask, len, *pw;
+    uint8_t *p;
+
+    len = length >> TARGET_PAGE_BITS;
+    mask = ~dirty_flags;
+    p = phys_ram_dirty + (start >> TARGET_PAGE_BITS);
+    pw = phys_ram_dirty_by_word + (start >> TARGET_PAGE_BITS) / 
+        TARGET_LONG_BITS;
+
+    for (i = 0; i < len; i++) {
+        if (p[i] == 0xff)
+            --pw[i / TARGET_LONG_BITS];
+        p[i] &= mask;
+    }
+}
+
+int cpu_physical_memory_get_dirty_range(ram_addr_t start, ram_addr_t end, 
+                                        int dirty_flags);
+
+int cpu_physical_memory_get_non_dirty_range(ram_addr_t start, ram_addr_t end, 
+                                            int dirty_flags);
+
 void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end,
                                      int dirty_flags);
 void cpu_tlb_update_dirty(CPUState *env);
diff --git a/cpu-defs.h b/cpu-defs.h
index cf502e9..8e89e96 100644
--- a/cpu-defs.h
+++ b/cpu-defs.h
@@ -37,6 +37,7 @@ 
 #endif
 
 #define TARGET_LONG_SIZE (TARGET_LONG_BITS / 8)
+#define TARGET_LONG_ALIGN(addr) (((addr) + TARGET_LONG_BITS - 1) / TARGET_LONG_BITS) 
 
 /* target_ulong is the type of a virtual address */
 #if TARGET_LONG_SIZE == 4