Patchwork Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

Submitter OHMURA Kei
Date Feb. 12, 2010, 2:03 a.m.
Message ID <4B74B70A.4030805@lab.ntt.co.jp>
Permalink /patch/45174/
State New

Comments

OHMURA Kei - Feb. 12, 2010, 2:03 a.m.
On 02/11/2010 Anthony Liguori <anthony@codemonkey.ws> wrote:
> Oh, I see what's happening here. Yes, I think a leul_to_cpu() makes more
> sense.

Maybe I'm missing something here.
I couldn't find leul_to_cpu(), so I have defined it in bswap.h.
Is that correct?


On 02/10/2010 Ulrich Drepper <drepper@redhat.com> wrote:
> If you're optimizing this code you might want to do it all.  The
> compiler might not see through the bswap call and create unnecessary
> data dependencies.  Especially problematic if the bitmap is really
> sparse.  Also, the outer test is != while the inner test is >.  Be
> consistent.  I suggest to replace the inner loop with
> 
>      do {
>        ...
>      } while (c != 0);
> 
> Depending on how sparse the bitmap is populated this might reduce the
> number of data dependencies quite a bit.

Combining all the comments, the code would look like this:
     
 if (bitmap_ul[i] != 0) {
     c = leul_to_cpu(bitmap_ul[i]);
     do {
         j = ffsl(c) - 1;
         c &= ~(1ul << j);
         page_number = i * HOST_LONG_BITS + j;
         addr1 = page_number * TARGET_PAGE_SIZE;
         addr = offset + addr1;
         ram_addr = cpu_get_physical_page_desc(addr);
         cpu_physical_memory_set_dirty(ram_addr);
     } while (c != 0);
 }
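
For reference, the same traversal can be tried outside QEMU. The sketch below is only an illustration under stated assumptions: collect_dirty_pages() and LONG_BITS_SKETCH are hypothetical names (the latter stands in for HOST_LONG_BITS), the bitmap is assumed to be in host byte order already, and the GCC/Clang builtin __builtin_ffsl() is used in place of ffsl() so that no feature-test macro is needed. It records page numbers instead of calling cpu_physical_memory_set_dirty().

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for HOST_LONG_BITS in the thread. */
#define LONG_BITS_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/*
 * Collect the page numbers of all set bits in a bitmap made of
 * host-endian unsigned longs. Writes at most out_cap entries to out[]
 * and returns the number of entries written.
 */
static size_t collect_dirty_pages(const unsigned long *bitmap, size_t nwords,
                                  unsigned long *out, size_t out_cap)
{
    size_t n = 0;

    for (size_t i = 0; i < nwords; i++) {
        unsigned long c = bitmap[i];

        if (c == 0) {
            continue;   /* sparse case: a whole word is skipped with one test */
        }
        do {
            /* Index of the lowest set bit (ffs-style result is 1-based). */
            int j = __builtin_ffsl((long)c) - 1;

            c &= ~(1ul << j);   /* clear that bit */
            if (n < out_cap) {
                out[n++] = i * LONG_BITS_SKETCH + (size_t)j;
            }
        } while (c != 0);
    }
    return n;
}
```

Because the outer test already guarantees c != 0, the do/while form avoids one redundant comparison per word, and the loop body runs once per set bit rather than once per bit position.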
Avi Kivity - Feb. 14, 2010, 12:34 p.m.
On 02/12/2010 04:03 AM, OHMURA Kei wrote:
> On 02/11/2010 Anthony Liguori <anthony@codemonkey.ws> wrote:
>   
>> Oh, I see what's happening here. Yes, I think a leul_to_cpu() makes more
>> sense.
>>     
> Maybe I'm missing something here.
> I couldn't find leul_to_cpu(), so have defined it in bswap.h.
> Correct?
>
> --- a/bswap.h
> +++ b/bswap.h
> @@ -205,8 +205,10 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
>  
>  #ifdef HOST_WORDS_BIGENDIAN
>  #define cpu_to_32wu cpu_to_be32wu
> +#define leul_to_cpu(v) le ## HOST_LONG_BITS ## _to_cpu(v)
>  #else
>  #define cpu_to_32wu cpu_to_le32wu
> +#define leul_to_cpu(v) (v)
>  #endif
>
>
>
> On 02/10/2010 Ulrich Drepper <drepper@redhat.com> wrote:
>   
>> If you're optimizing this code you might want to do it all.  The
>> compiler might not see through the bswap call and create unnecessary
>> data dependencies.  Especially problematic if the bitmap is really
>> sparse.  Also, the outer test is != while the inner test is >.  Be
>> consistent.  I suggest to replace the inner loop with
>>
>>      do {
>>        ...
>>      } while (c != 0);
>>
>> Depending on how sparse the bitmap is populated this might reduce the
>> number of data dependencies quite a bit.
>>     
> Combining all comments, the code would be like this.
>      
>  if (bitmap_ul[i] != 0) {
>      c = leul_to_cpu(bitmap_ul[i]);
>      do {
>          j = ffsl(c) - 1;
>          c &= ~(1ul << j);
>          page_number = i * HOST_LONG_BITS + j;
>          addr1 = page_number * TARGET_PAGE_SIZE;
>          addr = offset + addr1;
>          ram_addr = cpu_get_physical_page_desc(addr);
>          cpu_physical_memory_set_dirty(ram_addr);
>      } while (c != 0);
>  }
>   

Except you don't need bitmap_ul any more - you can change the type of
the bitmap variable, since all accesses should now be ulongs.
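
One possible reading of this suggestion, as a standalone sketch: size and allocate the bitmap as an array of unsigned long from the start, so that no separate unsigned long * alias of a byte buffer is needed. The names sk_bitmap_words(), sk_bitmap_alloc() and SK_BITS_PER_LONG are hypothetical and do not come from the QEMU tree.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-in for HOST_LONG_BITS. */
#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Number of unsigned longs needed to hold one bit per page (rounded up). */
static size_t sk_bitmap_words(size_t npages)
{
    return (npages + SK_BITS_PER_LONG - 1) / SK_BITS_PER_LONG;
}

/*
 * Allocate a zeroed bitmap that is typed as unsigned long from the
 * start. The caller releases it with free().
 */
static unsigned long *sk_bitmap_alloc(size_t npages)
{
    return calloc(sk_bitmap_words(npages), sizeof(unsigned long));
}
```

With the storage declared this way, the traversal loop can index the bitmap directly, and the rounding in sk_bitmap_words() ensures the last partial word is still fully addressable.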

Patch

--- a/bswap.h
+++ b/bswap.h
@@ -205,8 +205,10 @@  static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
 
 #ifdef HOST_WORDS_BIGENDIAN
 #define cpu_to_32wu cpu_to_be32wu
+#define leul_to_cpu(v) le ## HOST_LONG_BITS ## _to_cpu(v)
 #else
 #define cpu_to_32wu cpu_to_le32wu
+#define leul_to_cpu(v) (v)
 #endif
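
A note on the big-endian branch of the macro: in C, the operands of ## are pasted before any macro expansion takes place, so le ## HOST_LONG_BITS ## _to_cpu yields the single identifier leHOST_LONG_BITS_to_cpu rather than le32_to_cpu or le64_to_cpu. The usual workaround is a two-level paste helper, which lets the argument expand first. The sketch below shows the idea in isolation. Every name in it (SK_PASTE, SK_LONG_BITS, sk_le32_to_cpu, sk_le64_to_cpu, sk_leul_to_cpu) is a hypothetical stand-in, not the QEMU definition, and the conversion helpers are simple portable versions written for this example.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for HOST_LONG_BITS; must expand to a plain 32 or 64. */
#if ULONG_MAX == 0xffffffffUL
#define SK_LONG_BITS 32
#else
#define SK_LONG_BITS 64
#endif

/* Interpret the bytes of v as a little-endian 32-bit value. */
static inline uint32_t sk_le32_to_cpu(uint32_t v)
{
    const unsigned char *p = (const unsigned char *)&v;

    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Interpret the bytes of v as a little-endian 64-bit value. */
static inline uint64_t sk_le64_to_cpu(uint64_t v)
{
    const unsigned char *p = (const unsigned char *)&v;
    uint64_t r = 0;

    for (int k = 7; k >= 0; k--) {
        r = (r << 8) | p[k];
    }
    return r;
}

/*
 * Two-level paste: SK_PASTE expands its arguments (so SK_LONG_BITS
 * becomes 32 or 64) and only then hands them to SK_PASTE_, which
 * applies ##.
 */
#define SK_PASTE_(a, b) a ## b
#define SK_PASTE(a, b) SK_PASTE_(a, b)

/* Expands to sk_le32_to_cpu(v) or sk_le64_to_cpu(v). */
#define sk_leul_to_cpu(v) SK_PASTE(SK_PASTE(sk_le, SK_LONG_BITS), _to_cpu)(v)
```

Whether the tree already provides a paste helper that bswap.h can use is not stated in this thread, so the helper is defined locally here.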