Patchwork [Xen-devel] frequently ballooning results in qemu exit

Submitter Stefano Stabellini
Date March 29, 2013, 12:37 p.m.
Message ID <alpine.DEB.2.02.1303291209020.4430@kaball.uk.xensource.com>
Permalink /patch/232392/
State New

Comments

Stefano Stabellini - March 29, 2013, 12:37 p.m.
On Mon, 25 Mar 2013, Hanweidong wrote:
> We fixed this issue by below patch which computed the correct size for test_bits(). qemu_get_ram_ptr() and qemu_safe_ram_ptr() will call xen_map_cache() with size is 0 if the requested address is in the RAM. Then xen_map_cache() will pass the size 0 to test_bits() for checking if the corresponding pfn was mapped in cache. But test_bits() will always return 1 when size is 0 without any bit testing. Actually, for this case, test_bits should check one bit. So this patch introduced a __test_bit_size which is greater than 0 and a multiple of XC_PAGE_SIZE, then test_bits can work correctly with __test_bit_size >> XC_PAGE_SHIFT as its size.
> 
> Signed-off-by: Zhenguo Wang <wangzhenguo@huawei.com>
> Signed-off-by: Weidong Han <hanweidong@huawei.com>

Thanks for the patch and for debugging this difficult problem.
The reality is that size is never actually 0: when qemu_get_ram_ptr
calls xen_map_cache with size 0, it actually means "map until the end of
the page". As a consequence test_bits should always test at least 1 bit,
like you wrote.

I tried to simplify your patch a bit, does this one work for you?
Hanweidong - March 30, 2013, 3:04 p.m.
> -----Original Message-----
> From: Stefano Stabellini [mailto:stefano.stabellini@eu.citrix.com]
> Sent: March 29, 2013, 20:37
> To: Hanweidong
> Cc: Tim (Xen.org); George Dunlap; Andrew Cooper; Yanqiangjun; qemu-devel@nongnu.org; xen-devel@lists.xen.org; Gonglei (Arei); Anthony Perard; Wangzhenguo
> Subject: Re: [Qemu-devel] [Xen-devel] frequently ballooning results in qemu exit
> 
> On Mon, 25 Mar 2013, Hanweidong wrote:
> > We fixed this issue by below patch which computed the correct size for test_bits(). qemu_get_ram_ptr() and qemu_safe_ram_ptr() will call xen_map_cache() with size is 0 if the requested address is in the RAM. Then xen_map_cache() will pass the size 0 to test_bits() for checking if the corresponding pfn was mapped in cache. But test_bits() will always return 1 when size is 0 without any bit testing. Actually, for this case, test_bits should check one bit. So this patch introduced a __test_bit_size which is greater than 0 and a multiple of XC_PAGE_SIZE, then test_bits can work correctly with __test_bit_size >> XC_PAGE_SHIFT as its size.
> >
> > Signed-off-by: Zhenguo Wang <wangzhenguo@huawei.com>
> > Signed-off-by: Weidong Han <hanweidong@huawei.com>
> 
> Thanks for the patch and for debugging this difficult problem.
> The reality is that size is never actually 0: when qemu_get_ram_ptr
> calls xen_map_cache with size 0, it actually means "map until the end
> of the page". As a consequence test_bits should always test at least 1 bit,
> like you wrote.
Yes, when size is 0 we can simply set __test_bit_size to 1. But for size > 0, I think setting __test_bit_size to size >> XC_PAGE_SHIFT is incorrect. If size is not a multiple of XC_PAGE_SIZE, then the remaining (size % XC_PAGE_SIZE) part also needs one bit tested. For example, if size is XC_PAGE_SIZE + 1, then 2 bits need to be tested, but size >> XC_PAGE_SHIFT is only 1.

-weidong

> 
> I tried to simplify your patch a bit, does this one work for you?
> 
> diff --git a/xen-mapcache.c b/xen-mapcache.c
> index dc6d1fa..b03b373 100644
> --- a/xen-mapcache.c
> +++ b/xen-mapcache.c
> @@ -202,6 +202,7 @@ uint8_t *xen_map_cache(hwaddr phys_addr, hwaddr size,
>      hwaddr address_index;
>      hwaddr address_offset;
>      hwaddr __size = size;
> +    hwaddr __test_bit_size = size >> XC_PAGE_SHIFT;
>      bool translated = false;
> 
>  tryagain:
> @@ -224,12 +225,16 @@ tryagain:
>      } else {
>          __size = MCACHE_BUCKET_SIZE;
>      }
> +    /* always test at least one page */
> +    if (!__test_bit_size) {
> +        __test_bit_size = 1UL;
> +    }
> 
>      entry = &mapcache->entry[address_index % mapcache->nr_buckets];
> 
>      while (entry && entry->lock && entry->vaddr_base &&
>              (entry->paddr_index != address_index || entry->size != __size ||
> -             !test_bits(address_offset >> XC_PAGE_SHIFT, size >> XC_PAGE_SHIFT,
> +             !test_bits(address_offset >> XC_PAGE_SHIFT, __test_bit_size,
>                   entry->valid_mapping))) {
>          pentry = entry;
>          entry = entry->next;
> @@ -241,13 +246,13 @@ tryagain:
>      } else if (!entry->lock) {
>          if (!entry->vaddr_base || entry->paddr_index != address_index ||
>                  entry->size != __size ||
> -                !test_bits(address_offset >> XC_PAGE_SHIFT, size >> XC_PAGE_SHIFT,
> +                !test_bits(address_offset >> XC_PAGE_SHIFT, __test_bit_size,
>                      entry->valid_mapping)) {
>              xen_remap_bucket(entry, __size, address_index);
>          }
>      }
> 
> -    if(!test_bits(address_offset >> XC_PAGE_SHIFT, size >> XC_PAGE_SHIFT,
> +    if(!test_bits(address_offset >> XC_PAGE_SHIFT, __test_bit_size,
>                  entry->valid_mapping)) {
>          mapcache->last_address_index = -1;
>          if (!translated && mapcache->phys_offset_to_gaddr) {
Stefano Stabellini - April 1, 2013, 2:39 p.m.
On Sat, 30 Mar 2013, Hanweidong wrote:
> > -----Original Message-----
> > From: Stefano Stabellini [mailto:stefano.stabellini@eu.citrix.com]
> > Sent: March 29, 2013, 20:37
> > To: Hanweidong
> > Cc: Tim (Xen.org); George Dunlap; Andrew Cooper; Yanqiangjun; qemu-
> > devel@nongnu.org; xen-devel@lists.xen.org; Gonglei (Arei); Anthony
> > Perard; Wangzhenguo
> > Subject: Re: [Qemu-devel] [Xen-devel] frequently ballooning results in
> > qemu exit
> > 
> > On Mon, 25 Mar 2013, Hanweidong wrote:
> > > We fixed this issue by below patch which computed the correct size
> > for test_bits(). qemu_get_ram_ptr() and qemu_safe_ram_ptr() will call
> > xen_map_cache() with size is 0 if the requested address is in the RAM.
> > Then xen_map_cache() will pass the size 0 to test_bits() for checking
> > if the corresponding pfn was mapped in cache. But test_bits() will
> > always return 1 when size is 0 without any bit testing. Actually, for
> > this case, test_bits should check one bit. So this patch introduced a
> > __test_bit_size which is greater than 0 and a multiple of XC_PAGE_SIZE,
> > then test_bits can work correctly with __test_bit_size >> XC_PAGE_SHIFT
> > as its size.
> > >
> > > Signed-off-by: Zhenguo Wang <wangzhenguo@huawei.com>
> > > Signed-off-by: Weidong Han <hanweidong@huawei.com>
> > 
> > Thanks for the patch and for debugging this difficult problem.
> > The reality is that size is never actually 0: when qemu_get_ram_ptr
> > calls xen_map_cache with size 0, it actually means "map until the end
> > of
> > the page". As a consequence test_bits should always test at least 1 bit,
> > like you wrote.
> 
> Yes, for the case of size is 0, we can just simply set __test_bit_size 1. But for size > 0, I think set __test_bit_size to size >> XC_PAGE_SHIFT is incorrect. If size is not multiple of XC_PAGE_SIZE, then the part of (size % XC_PAGE_SIZE) also needs to test 1 bit. For example size is XC_PAGE_SIZE + 1, then it needs to test 2 bits, but size >> XC_PAGE_SHIFT is only 1. 
> 

I was assuming that the size is always page aligned.
Looking through the code actually I think that it's better not to rely
on this assumption.

Looking back at your original patch:



> We fixed this issue by below patch which computed the correct size for test_bits(). qemu_get_ram_ptr() and qemu_safe_ram_ptr() will call xen_map_cache() with size is 0 if the requested address is in the RAM. Then xen_map_cache() will pass the size 0 to test_bits() for checking if the corresponding pfn was mapped in cache. But test_bits() will always return 1 when size is 0 without any bit testing. Actually, for this case, test_bits should check one bit. So this patch introduced a __test_bit_size which is greater than 0 and a multiple of XC_PAGE_SIZE, then test_bits can work correctly with __test_bit_size >> XC_PAGE_SHIFT as its size.
> 
> Signed-off-by: Zhenguo Wang <wangzhenguo@huawei.com>
> Signed-off-by: Weidong Han <hanweidong@huawei.com>
> 
> diff --git a/xen-mapcache.c b/xen-mapcache.c
> index 31c06dc..bd4a97f 100644
> --- a/xen-mapcache.c
> +++ b/xen-mapcache.c
> @@ -202,6 +202,7 @@ uint8_t *xen_map_cache(hwaddr phys_addr, hwaddr size,
>      hwaddr address_index;
>      hwaddr address_offset;
>      hwaddr __size = size;
> +    hwaddr __test_bit_size = size;
>      bool translated = false;
> 
>  tryagain:
> @@ -210,7 +211,23 @@ tryagain:
> 
>      trace_xen_map_cache(phys_addr);
> 
> -    if (address_index == mapcache->last_address_index && !lock && !__size) {
> +    entry = &mapcache->entry[address_index % mapcache->nr_buckets];

there is no need to move this line up here, see below


> +    /* __test_bit_size is always a multiple of XC_PAGE_SIZE */
> +    if (size) {
> +        __test_bit_size = size + (phys_addr & (XC_PAGE_SIZE - 1));
> +
> +        if (__test_bit_size % XC_PAGE_SIZE) {
> +            __test_bit_size += XC_PAGE_SIZE - (__test_bit_size % XC_PAGE_SIZE);
> +        }
> +    } else {
> +        __test_bit_size = XC_PAGE_SIZE;
> +    }

this is OK


> +    if (address_index == mapcache->last_address_index && !lock && !__size &&
> +        test_bits(address_offset >> XC_PAGE_SHIFT,
> +                  __test_bit_size >> XC_PAGE_SHIFT,
> +                  entry->valid_mapping)) {
>          trace_xen_map_cache_return(mapcache->last_address_vaddr + address_offset);
>          return mapcache->last_address_vaddr + address_offset;
>      }

Unless I am missing something this change is unnecessary: if the mapping
is not valid then mapcache->last_address_index is set to -1.


> @@ -225,11 +242,10 @@ tryagain:
>          __size = MCACHE_BUCKET_SIZE;
>      }
> 
> -    entry = &mapcache->entry[address_index % mapcache->nr_buckets];
> -

just leave entry where it is


>      while (entry && entry->lock && entry->vaddr_base &&
>              (entry->paddr_index != address_index || entry->size != __size ||
> -             !test_bits(address_offset >> XC_PAGE_SHIFT, size >> XC_PAGE_SHIFT,
> +             !test_bits(address_offset >> XC_PAGE_SHIFT,
> +                 __test_bit_size >> XC_PAGE_SHIFT,
>                   entry->valid_mapping))) {
>          pentry = entry;
>          entry = entry->next;
> @@ -241,13 +257,15 @@ tryagain:
>      } else if (!entry->lock) {
>          if (!entry->vaddr_base || entry->paddr_index != address_index ||
>                  entry->size != __size ||
> -                !test_bits(address_offset >> XC_PAGE_SHIFT, size >> XC_PAGE_SHIFT,
> +                !test_bits(address_offset >> XC_PAGE_SHIFT,
> +                    __test_bit_size >> XC_PAGE_SHIFT,
>                      entry->valid_mapping)) {
>              xen_remap_bucket(entry, __size, address_index);
>          }
>      }
> 
> -    if(!test_bits(address_offset >> XC_PAGE_SHIFT, size >> XC_PAGE_SHIFT,
> +    if(!test_bits(address_offset >> XC_PAGE_SHIFT,
> +                __test_bit_size >> XC_PAGE_SHIFT,
>                  entry->valid_mapping)) {
>          mapcache->last_address_index = -1;
>          if (!translated && mapcache->phys_offset_to_gaddr) {

This is fine
Hanweidong - April 2, 2013, 1:06 a.m.
> -----Original Message-----
> From: Stefano Stabellini [mailto:stefano.stabellini@eu.citrix.com]
> Sent: April 1, 2013, 22:39
> To: Hanweidong
> Cc: Stefano Stabellini; Tim (Xen.org); George Dunlap; Andrew Cooper; Yanqiangjun; qemu-devel@nongnu.org; xen-devel@lists.xen.org; Gonglei (Arei); Anthony Perard; Wangzhenguo
> Subject: RE: [Qemu-devel] [Xen-devel] frequently ballooning results in qemu exit
> 
> On Sat, 30 Mar 2013, Hanweidong wrote:
> > [...]
> > Yes, for the case of size is 0, we can just simply set __test_bit_size 1. But for size > 0, I think set __test_bit_size to size >> XC_PAGE_SHIFT is incorrect. If size is not multiple of XC_PAGE_SIZE, then the part of (size % XC_PAGE_SIZE) also needs to test 1 bit. For example size is XC_PAGE_SIZE + 1, then it needs to test 2 bits, but size >> XC_PAGE_SHIFT is only 1.
> 
> I was assuming that the size is always page aligned.
> Looking through the code actually I think that it's better not to rely
> on this assumption.
> 
> Looking back at your original patch:
> 
> > diff --git a/xen-mapcache.c b/xen-mapcache.c
> > index 31c06dc..bd4a97f 100644
> > --- a/xen-mapcache.c
> > +++ b/xen-mapcache.c
> > @@ -202,6 +202,7 @@ uint8_t *xen_map_cache(hwaddr phys_addr, hwaddr size,
> >      hwaddr address_index;
> >      hwaddr address_offset;
> >      hwaddr __size = size;
> > +    hwaddr __test_bit_size = size;
> >      bool translated = false;
> > 
> >  tryagain:
> > @@ -210,7 +211,23 @@ tryagain:
> > 
> >      trace_xen_map_cache(phys_addr);
> > 
> > -    if (address_index == mapcache->last_address_index && !lock && !__size) {
> > +    entry = &mapcache->entry[address_index % mapcache->nr_buckets];
> 
> there is no need to move this line up here, see below
> 
> > +    /* __test_bit_size is always a multiple of XC_PAGE_SIZE */
> > +    if (size) {
> > +        __test_bit_size = size + (phys_addr & (XC_PAGE_SIZE - 1));
> > +
> > +        if (__test_bit_size % XC_PAGE_SIZE) {
> > +            __test_bit_size += XC_PAGE_SIZE - (__test_bit_size % XC_PAGE_SIZE);
> > +        }
> > +    } else {
> > +        __test_bit_size = XC_PAGE_SIZE;
> > +    }
> 
> this is OK
> 
> > +    if (address_index == mapcache->last_address_index && !lock && !__size &&
> > +        test_bits(address_offset >> XC_PAGE_SHIFT,
> > +                  __test_bit_size >> XC_PAGE_SHIFT,
> > +                  entry->valid_mapping)) {
> >          trace_xen_map_cache_return(mapcache->last_address_vaddr + address_offset);
> >          return mapcache->last_address_vaddr + address_offset;
> >      }
> 
> Unless I am missing something this change is unnecessary: if the mapping
> is not valid then mapcache->last_address_index is set to -1.

mapcache->last_address_index means the corresponding bucket (1MB) was mapped, but we noticed that some pages of the bucket may not be mapped. So we need to check whether the page is mapped even when the address_index is equal to last_address_index.

-weidong
 
> 
> > @@ -225,11 +242,10 @@ tryagain:
> >          __size = MCACHE_BUCKET_SIZE;
> >      }
> > 
> > -    entry = &mapcache->entry[address_index % mapcache->nr_buckets];
> > -
> 
> just leave entry where it is
> 
> >      while (entry && entry->lock && entry->vaddr_base &&
> >              (entry->paddr_index != address_index || entry->size != __size ||
> > -             !test_bits(address_offset >> XC_PAGE_SHIFT, size >> XC_PAGE_SHIFT,
> > +             !test_bits(address_offset >> XC_PAGE_SHIFT,
> > +                 __test_bit_size >> XC_PAGE_SHIFT,
> >                   entry->valid_mapping))) {
> >          pentry = entry;
> >          entry = entry->next;
> > @@ -241,13 +257,15 @@ tryagain:
> >      } else if (!entry->lock) {
> >          if (!entry->vaddr_base || entry->paddr_index != address_index ||
> >                  entry->size != __size ||
> > -                !test_bits(address_offset >> XC_PAGE_SHIFT, size >> XC_PAGE_SHIFT,
> > +                !test_bits(address_offset >> XC_PAGE_SHIFT,
> > +                    __test_bit_size >> XC_PAGE_SHIFT,
> >                      entry->valid_mapping)) {
> >              xen_remap_bucket(entry, __size, address_index);
> >          }
> >      }
> > 
> > -    if(!test_bits(address_offset >> XC_PAGE_SHIFT, size >> XC_PAGE_SHIFT,
> > +    if(!test_bits(address_offset >> XC_PAGE_SHIFT,
> > +                __test_bit_size >> XC_PAGE_SHIFT,
> >                  entry->valid_mapping)) {
> >          mapcache->last_address_index = -1;
> >          if (!translated && mapcache->phys_offset_to_gaddr) {
> 
> This is fine
Stefano Stabellini - April 2, 2013, 1:27 p.m.
On Tue, 2 Apr 2013, Hanweidong wrote:
> > -----Original Message-----
> > From: Stefano Stabellini [mailto:stefano.stabellini@eu.citrix.com]
> > Sent: April 1, 2013, 22:39
> > To: Hanweidong
> > Cc: Stefano Stabellini; Tim (Xen.org); George Dunlap; Andrew Cooper;
> > Yanqiangjun; qemu-devel@nongnu.org; xen-devel@lists.xen.org; Gonglei
> > (Arei); Anthony Perard; Wangzhenguo
> > Subject: RE: [Qemu-devel] [Xen-devel] frequently ballooning results in
> > qemu exit
> > 
> > On Sat, 30 Mar 2013, Hanweidong wrote:
> > > > -----Original Message-----
> > > > From: Stefano Stabellini [mailto:stefano.stabellini@eu.citrix.com]
> > > > Sent: March 29, 2013, 20:37
> > > > To: Hanweidong
> > > > Cc: Tim (Xen.org); George Dunlap; Andrew Cooper; Yanqiangjun; qemu-
> > > > devel@nongnu.org; xen-devel@lists.xen.org; Gonglei (Arei); Anthony
> > > > Perard; Wangzhenguo
> > > > Subject: Re: [Qemu-devel] [Xen-devel] frequently ballooning results
> > in
> > > > qemu exit
> > > >
> > > > On Mon, 25 Mar 2013, Hanweidong wrote:
> > > > > We fixed this issue by below patch which computed the correct
> > size
> > > > for test_bits(). qemu_get_ram_ptr() and qemu_safe_ram_ptr() will
> > call
> > > > xen_map_cache() with size is 0 if the requested address is in the
> > RAM.
> > > > Then xen_map_cache() will pass the size 0 to test_bits() for
> > checking
> > > > if the corresponding pfn was mapped in cache. But test_bits() will
> > > > always return 1 when size is 0 without any bit testing. Actually,
> > for
> > > > this case, test_bits should check one bit. So this patch introduced
> > a
> > > > __test_bit_size which is greater than 0 and a multiple of
> > XC_PAGE_SIZE,
> > > > then test_bits can work correctly with __test_bit_size >>
> > XC_PAGE_SHIFT
> > > > as its size.
> > > > >
> > > > > Signed-off-by: Zhenguo Wang <wangzhenguo@huawei.com>
> > > > > Signed-off-by: Weidong Han <hanweidong@huawei.com>
> > > >
> > > > Thanks for the patch and for debugging this difficult problem.
> > > > The reality is that size is never actually 0: when qemu_get_ram_ptr
> > > > calls xen_map_cache with size 0, it actually means "map until the
> > end
> > > > of
> > > > the page". As a consequence test_bits should always test at least 1
> > bit,
> > > > like you wrote.
> > >
> > > Yes, for the case of size is 0, we can just simply set
> > __test_bit_size 1. But for size > 0, I think set __test_bit_size to
> > size >> XC_PAGE_SHIFT is incorrect. If size is not multiple of
> > XC_PAGE_SIZE, then the part of (size % XC_PAGE_SIZE) also needs to test
> > 1 bit. For example size is XC_PAGE_SIZE + 1, then it needs to test 2
> > bits, but size >> XC_PAGE_SHIFT is only 1.
> > >
> > 
> > I was assuming that the size is always page aligned.
> > Looking through the code actually I think that it's better not to rely
> > on this assumption.
> > 
> > Looking back at your original patch:
> > 
> > 
> > 
> > > We fixed this issue by below patch which computed the correct size
> > for test_bits(). qemu_get_ram_ptr() and qemu_safe_ram_ptr() will call
> > xen_map_cache() with size is 0 if the requested address is in the RAM.
> > Then xen_map_cache() will pass the size 0 to test_bits() for checking
> > if the corresponding pfn was mapped in cache. But test_bits() will
> > always return 1 when size is 0 without any bit testing. Actually, for
> > this case, test_bits should check one bit. So this patch introduced a
> > __test_bit_size which is greater than 0 and a multiple of XC_PAGE_SIZE,
> > then test_bits can work correctly with __test_bit_size >> XC_PAGE_SHIFT
> > as its size.
> > >
> > > Signed-off-by: Zhenguo Wang <wangzhenguo@huawei.com>
> > > Signed-off-by: Weidong Han <hanweidong@huawei.com>
> > >
> > > diff --git a/xen-mapcache.c b/xen-mapcache.c
> > > index 31c06dc..bd4a97f 100644
> > > --- a/xen-mapcache.c
> > > +++ b/xen-mapcache.c
> > > @@ -202,6 +202,7 @@ uint8_t *xen_map_cache(hwaddr phys_addr, hwaddr
> > size,
> > >      hwaddr address_index;
> > >      hwaddr address_offset;
> > >      hwaddr __size = size;
> > > +    hwaddr __test_bit_size = size;
> > >      bool translated = false;
> > >
> > >  tryagain:
> > > @@ -210,7 +211,23 @@ tryagain:
> > >
> > >      trace_xen_map_cache(phys_addr);
> > >
> > > -    if (address_index == mapcache->last_address_index && !lock
> > && !__size) {
> > > +    entry = &mapcache->entry[address_index % mapcache->nr_buckets];
> > 
> > there is no need to move this line up here, see below
> > 
> > 
> > > +    /* __test_bit_size is always a multiple of XC_PAGE_SIZE */
> > > +    if (size) {
> > > +        __test_bit_size = size + (phys_addr & (XC_PAGE_SIZE - 1));
> > > +
> > > +        if (__test_bit_size % XC_PAGE_SIZE) {
> > > +            __test_bit_size += XC_PAGE_SIZE - (__test_bit_size %
> > XC_PAGE_SIZE);
> > > +        }
> > > +    } else {
> > > +        __test_bit_size = XC_PAGE_SIZE;
> > > +    }
> > 
> > this is OK
> > 
> > 
> > > +    if (address_index == mapcache->last_address_index && !lock
> > && !__size &&
> > > +        test_bits(address_offset >> XC_PAGE_SHIFT,
> > > +                  __test_bit_size >> XC_PAGE_SHIFT,
> > > +                  entry->valid_mapping)) {
> > >          trace_xen_map_cache_return(mapcache->last_address_vaddr +
> > address_offset);
> > >          return mapcache->last_address_vaddr + address_offset;
> > >      }
> > 
> > Unless I am missing something this change is unnecessary: if the
> > mapping
> > is not valid than mapcache->last_address_index is set to -1.
> 
> mapcache->last_address_index means the corresponding bucket (1MB) was mapped, but we noticed that some pages of the bucket may be not mapped. So we need to check if it's mapped even the address_index is equal to last_address_index.

That is a good point, but the current fix doesn't fully address that
problem: the first entry found in the cache might not be the one
corresponding to last_address_index.

I think that the right fix here would be to replace last_address_index
and last_address_vaddr with a last_entry pointer.

I have sent a small patch series that includes your patch, can you
please let me know if it does solve your problem and if you think that
is correct?

The patch series is here: 

http://marc.info/?l=qemu-devel&m=136490915902679
http://marc.info/?l=qemu-devel&m=136490915602678
Hanweidong - April 3, 2013, 8:15 a.m.
> -----Original Message-----
> From: Stefano Stabellini [mailto:stefano.stabellini@eu.citrix.com]
> Sent: April 2, 2013, 21:28
> To: Hanweidong
> Cc: Stefano Stabellini; Tim (Xen.org); George Dunlap; Andrew Cooper; Yanqiangjun; qemu-devel@nongnu.org; xen-devel@lists.xen.org; Gonglei (Arei); Anthony Perard; Wangzhenguo
> Subject: RE: [Qemu-devel] [Xen-devel] frequently ballooning results in qemu exit
> 
> On Tue, 2 Apr 2013, Hanweidong wrote:
> > [...]
> > mapcache->last_address_index means the corresponding bucket (1MB) was mapped, but we noticed that some pages of the bucket may be not mapped. So we need to check if it's mapped even the address_index is equal to last_address_index.
> 
> That is a good point, but the current fix doesn't fully address that
> problem: the first entry found in the cache might not be the one
> corresponding to last_address_index.
> 
> I think that the right fix here would be to replace last_address_index
> and last_address_vaddr with a last_entry pointer.
> 
> I have sent a small patch series that includes your patch, can you
> please let me know if it does solve your problem and if you think that
> is correct?

The patches look good for me. We verified that the patches solved our problem. 

--weidong

> The patch series is here:

> 

> http://marc.info/?l=qemu-devel&m=136490915902679

> http://marc.info/?l=qemu-devel&m=136490915602678
Stefano Stabellini - April 3, 2013, 10:36 a.m.
On Wed, 3 Apr 2013, Hanweidong wrote:
> > -----Original Message-----
> > From: Stefano Stabellini [mailto:stefano.stabellini@eu.citrix.com]
> > Sent: 2013年4月2日 21:28
> > To: Hanweidong
> > Cc: Stefano Stabellini; Tim (Xen.org); George Dunlap; Andrew Cooper;
> > Yanqiangjun; qemu-devel@nongnu.org; xen-devel@lists.xen.org; Gonglei
> > (Arei); Anthony Perard; Wangzhenguo
> > Subject: RE: [Qemu-devel] [Xen-devel] frequently ballooning results in
> > qemu exit
> > 
> > On Tue, 2 Apr 2013, Hanweidong wrote:
> > > > -----Original Message-----
> > > > From: Stefano Stabellini [mailto:stefano.stabellini@eu.citrix.com]
> > > > Sent: 2013年4月1日 22:39
> > > > To: Hanweidong
> > > > Cc: Stefano Stabellini; Tim (Xen.org); George Dunlap; Andrew Cooper;
> > > > Yanqiangjun; qemu-devel@nongnu.org; xen-devel@lists.xen.org;
> > Gonglei
> > > > (Arei); Anthony Perard; Wangzhenguo
> > > > Subject: RE: [Qemu-devel] [Xen-devel] frequently ballooning results
> > in
> > > > qemu exit
> > > >
> > > > On Sat, 30 Mar 2013, Hanweidong wrote:
> > > > > > -----Original Message-----
> > > > > > From: Stefano Stabellini
> > [mailto:stefano.stabellini@eu.citrix.com]
> > > > > > Sent: 2013年3月29日 20:37
> > > > > > To: Hanweidong
> > > > > > Cc: Tim (Xen.org); George Dunlap; Andrew Cooper; Yanqiangjun;
> > qemu-
> > > > > > devel@nongnu.org; xen-devel@lists.xen.org; Gonglei (Arei);
> > Anthony
> > > > > > Perard; Wangzhenguo
> > > > > > Subject: Re: [Qemu-devel] [Xen-devel] frequently ballooning
> > results
> > > > in
> > > > > > qemu exit
> > > > > >
> > > > > > On Mon, 25 Mar 2013, Hanweidong wrote:
> > > > > > > We fixed this issue with the patch below, which computes the
> > > > > > > correct size for test_bits(). qemu_get_ram_ptr() and
> > > > > > > qemu_safe_ram_ptr() will call xen_map_cache() with a size of 0
> > > > > > > if the requested address is in RAM. xen_map_cache() then passes
> > > > > > > the size 0 on to test_bits() to check whether the corresponding
> > > > > > > pfn was mapped in the cache. But test_bits() always returns 1
> > > > > > > when size is 0, without testing any bits. Actually, in this
> > > > > > > case test_bits() should check one bit. So this patch introduces
> > > > > > > __test_bit_size, which is greater than 0 and a multiple of
> > > > > > > XC_PAGE_SIZE, so that test_bits() works correctly with
> > > > > > > __test_bit_size >> XC_PAGE_SHIFT as its size.
> > > > > > >
> > > > > > > Signed-off-by: Zhenguo Wang <wangzhenguo@huawei.com>
> > > > > > > Signed-off-by: Weidong Han <hanweidong@huawei.com>
> > > > > >
> > > > > > Thanks for the patch and for debugging this difficult problem.
> > > > > > The reality is that size is never actually 0: when qemu_get_ram_ptr
> > > > > > calls xen_map_cache with size 0, it actually means "map until the
> > > > > > end of the page". As a consequence test_bits should always test at
> > > > > > least 1 bit, like you wrote.
> > > > >
> > > > > Yes, in the case where size is 0, we can simply set __test_bit_size
> > > > > to 1. But for size > 0, I think setting __test_bit_size to
> > > > > size >> XC_PAGE_SHIFT is incorrect. If size is not a multiple of
> > > > > XC_PAGE_SIZE, then the remaining (size % XC_PAGE_SIZE) part also
> > > > > needs to test 1 bit. For example, if size is XC_PAGE_SIZE + 1, then
> > > > > it needs to test 2 bits, but size >> XC_PAGE_SHIFT is only 1.
> > > > >
> > > >
> > > > I was assuming that the size is always page aligned.
> > > > Looking through the code, actually I think it's better not to rely
> > > > on this assumption.
> > > >
> > > > Looking back at your original patch:
> > > >
> > > >
> > > >
> > > > > We fixed this issue with the patch below, which computes the
> > > > > correct size for test_bits(). qemu_get_ram_ptr() and
> > > > > qemu_safe_ram_ptr() will call xen_map_cache() with a size of 0
> > > > > if the requested address is in RAM. xen_map_cache() then passes
> > > > > the size 0 on to test_bits() to check whether the corresponding
> > > > > pfn was mapped in the cache. But test_bits() always returns 1
> > > > > when size is 0, without testing any bits. Actually, in this case
> > > > > test_bits() should check one bit. So this patch introduces
> > > > > __test_bit_size, which is greater than 0 and a multiple of
> > > > > XC_PAGE_SIZE, so that test_bits() works correctly with
> > > > > __test_bit_size >> XC_PAGE_SHIFT as its size.
> > > > >
> > > > > Signed-off-by: Zhenguo Wang <wangzhenguo@huawei.com>
> > > > > Signed-off-by: Weidong Han <hanweidong@huawei.com>
> > > > >
> > > > > diff --git a/xen-mapcache.c b/xen-mapcache.c
> > > > > index 31c06dc..bd4a97f 100644
> > > > > --- a/xen-mapcache.c
> > > > > +++ b/xen-mapcache.c
> > > > > @@ -202,6 +202,7 @@ uint8_t *xen_map_cache(hwaddr phys_addr, hwaddr size,
> > > > >      hwaddr address_index;
> > > > >      hwaddr address_offset;
> > > > >      hwaddr __size = size;
> > > > > +    hwaddr __test_bit_size = size;
> > > > >      bool translated = false;
> > > > >
> > > > >  tryagain:
> > > > > @@ -210,7 +211,23 @@ tryagain:
> > > > >
> > > > >      trace_xen_map_cache(phys_addr);
> > > > >
> > > > > -    if (address_index == mapcache->last_address_index && !lock && !__size) {
> > > > > +    entry = &mapcache->entry[address_index % mapcache->nr_buckets];
> > > >
> > > > there is no need to move this line up here, see below
> > > >
> > > >
> > > > > +    /* __test_bit_size is always a multiple of XC_PAGE_SIZE */
> > > > > +    if (size) {
> > > > > +        __test_bit_size = size + (phys_addr & (XC_PAGE_SIZE - 1));
> > > > > +
> > > > > +        if (__test_bit_size % XC_PAGE_SIZE) {
> > > > > +            __test_bit_size += XC_PAGE_SIZE - (__test_bit_size % XC_PAGE_SIZE);
> > > > > +        }
> > > > > +    } else {
> > > > > +        __test_bit_size = XC_PAGE_SIZE;
> > > > > +    }
> > > >
> > > > this is OK
> > > >
> > > >
> > > > > +    if (address_index == mapcache->last_address_index && !lock && !__size &&
> > > > > +        test_bits(address_offset >> XC_PAGE_SHIFT,
> > > > > +                  __test_bit_size >> XC_PAGE_SHIFT,
> > > > > +                  entry->valid_mapping)) {
> > > > >          trace_xen_map_cache_return(mapcache->last_address_vaddr + address_offset);
> > > > >          return mapcache->last_address_vaddr + address_offset;
> > > > >      }
> > > >
> > > > Unless I am missing something this change is unnecessary: if the
> > > > mapping is not valid then mapcache->last_address_index is set to -1.
> > >
> > > mapcache->last_address_index means the corresponding bucket (1MB) was
> > > mapped, but we noticed that some pages of the bucket may not be mapped.
> > > So we need to check whether it's mapped even when address_index is
> > > equal to last_address_index.
> > 
> > That is a good point, but the current fix doesn't fully address that
> > problem: the first entry found in the cache might not be the one
> > corresponding to last_address_index.
> > 
> > I think that the right fix here would be to replace last_address_index
> > and last_address_vaddr with a last_entry pointer.
> > 
> > I have sent a small patch series that includes your patch, can you
> > please let me know if it does solve your problem and if you think that
> > is correct?
> > 
> 
> The patches look good to me. We verified that they solved our problem.

Great, thanks!
I'll submit a pull request.

Patch

diff --git a/xen-mapcache.c b/xen-mapcache.c
index dc6d1fa..b03b373 100644
--- a/xen-mapcache.c
+++ b/xen-mapcache.c
@@ -202,6 +202,7 @@  uint8_t *xen_map_cache(hwaddr phys_addr, hwaddr size,
     hwaddr address_index;
     hwaddr address_offset;
     hwaddr __size = size;
+    hwaddr __test_bit_size = size >> XC_PAGE_SHIFT;
     bool translated = false;
 
 tryagain:
@@ -224,12 +225,16 @@  tryagain:
     } else {
         __size = MCACHE_BUCKET_SIZE;
     }
+    /* always test at least one page */
+    if (!__test_bit_size) {
+        __test_bit_size = 1UL;
+    }
 
     entry = &mapcache->entry[address_index % mapcache->nr_buckets];
 
     while (entry && entry->lock && entry->vaddr_base &&
             (entry->paddr_index != address_index || entry->size != __size ||
-             !test_bits(address_offset >> XC_PAGE_SHIFT, size >> XC_PAGE_SHIFT,
+             !test_bits(address_offset >> XC_PAGE_SHIFT, __test_bit_size,
                  entry->valid_mapping))) {
         pentry = entry;
         entry = entry->next;
@@ -241,13 +246,13 @@  tryagain:
     } else if (!entry->lock) {
         if (!entry->vaddr_base || entry->paddr_index != address_index ||
                 entry->size != __size ||
-                !test_bits(address_offset >> XC_PAGE_SHIFT, size >> XC_PAGE_SHIFT,
+                !test_bits(address_offset >> XC_PAGE_SHIFT, __test_bit_size,
                     entry->valid_mapping)) {
             xen_remap_bucket(entry, __size, address_index);
         }
     }
 
-    if(!test_bits(address_offset >> XC_PAGE_SHIFT, size >> XC_PAGE_SHIFT,
+    if(!test_bits(address_offset >> XC_PAGE_SHIFT, __test_bit_size,
                 entry->valid_mapping)) {
         mapcache->last_address_index = -1;
         if (!translated && mapcache->phys_offset_to_gaddr) {