[Xen-devel] frequently ballooning results in qemu exit

Submitter Hanweidong
Date March 25, 2013, 12:40 p.m.
Message ID <FAB5C136CA8BEA4DBEA2F641E3F536384A8BC8A0@szxeml538-mbx.china.huawei.com>
Permalink /patch/230714/
State New

Comments

Hanweidong - March 25, 2013, 12:40 p.m.
We fixed this issue with the patch below, which computes the correct size for test_bits(). qemu_get_ram_ptr() and qemu_safe_ram_ptr() call xen_map_cache() with a size of 0 if the requested address is in RAM. xen_map_cache() then passes that size of 0 on to test_bits() to check whether the corresponding pfn was mapped in the cache. But test_bits() always returns 1 when size is 0, without testing any bits at all. In this case test_bits() should actually check one bit. This patch therefore introduces __test_bit_size, which is always greater than 0 and a multiple of XC_PAGE_SIZE, so that test_bits() works correctly with __test_bit_size >> XC_PAGE_SHIFT as its size.

Signed-off-by: Zhenguo Wang <wangzhenguo@huawei.com>

Signed-off-by: Weidong Han <hanweidong@huawei.com>
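
To illustrate the failure mode: a minimal, self-contained sketch of how test_bits() misbehaves when size is 0. These are simplified stand-ins, not QEMU's actual bitops implementation, but they reproduce the early-return behaviour quoted in the thread below:

#include <stdio.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Simplified stand-in for qemu's find_next_zero_bit(): when offset >= size
 * it returns size immediately, without scanning anything. */
static unsigned long find_next_zero_bit(const unsigned long *addr,
                                        unsigned long size,
                                        unsigned long offset)
{
    unsigned long i;

    if (offset >= size) {
        return size;
    }
    for (i = offset; i < size; i++) {
        if (!(addr[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG)))) {
            return i;
        }
    }
    return size;
}

/* Modelled on xen-mapcache.c's test_bits(): nonzero iff bits
 * [nr, nr + size) are all set in addr. */
static int test_bits(int nr, int size, const unsigned long *addr)
{
    return find_next_zero_bit(addr, size + nr, nr) >= (unsigned long)(nr + size);
}

int main(void)
{
    unsigned long valid_mapping = 0;    /* no page validly mapped */

    /* size 0: find_next_zero_bit() bails out early, so test_bits()
     * reports the unmapped page as valid. */
    printf("size 0 -> %d\n", test_bits(0, 0, &valid_mapping));  /* prints 1 */
    /* size 1: the zero bit is actually examined and found. */
    printf("size 1 -> %d\n", test_bits(0, 1, &valid_mapping));  /* prints 0 */
    return 0;
}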

> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> bounces@lists.xen.org] On Behalf Of Hanweidong
> Sent: March 21, 2013 21:33
> To: Tim Deegan
> Cc: George Dunlap; Andrew Cooper; Yanqiangjun; xen-devel@lists.xen.org;
> Gonglei (Arei); Anthony PERARD
> Subject: Re: [Xen-devel] frequently ballooning results in qemu exit
> 
> > -----Original Message-----
> > From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> > bounces@lists.xen.org] On Behalf Of Tim Deegan
> > Sent: March 21, 2013 20:16
> > To: Hanweidong
> > Cc: George Dunlap; Andrew Cooper; Yanqiangjun; xen-devel@lists.xen.org;
> > Gonglei (Arei); Anthony PERARD
> > Subject: Re: [Xen-devel] frequently ballooning results in qemu exit
> >
> > At 05:54 +0000 on 15 Mar (1363326854), Hanweidong wrote:
> > > > > I'm also curious about this. There is a window between memory
> > > > > being ballooned out and QEMU invalidating the mapcache.
> > > >
> > > > That by itself is OK; I don't think we need to provide any
> > > > meaningful semantics if the guest is accessing memory that it's
> > > > ballooned out.
> > > >
> > > > The question is where the SIGBUS comes from: either qemu has a
> > > > mapping of the old memory, in which case it can write to it safely,
> > > > or it doesn't, in which case it shouldn't try.
> > >
> > > The error always happened at the memcpy in the if (is_write) branch
> > > in address_space_rw.
> >
> > Sure, but _why_?  Why does this access cause SIGBUS?  Presumably
> > there's some part of the mapcache code that thinks it has a mapping
> > there when it doesn't.
> >
> > > We found that, after the last xen_invalidate_map_cache, the mapcache
> > > entry related to the failed address was mapped:
> > > 	==xen_map_cache== phys_addr=7a3c1ec0 size=0 lock=0
> > > 	==xen_remap_bucket== begin size=1048576 ,address_index=7a3
> > > 	==xen_remap_bucket== end entry->paddr_index=7a3,entry->vaddr_base=2a2d9000,size=1048576,address_index=7a3
> >
> > OK, so that's 0x2a2d9000 -- 0x2a3d8fff.
> >
> > > 	==address_space_rw== ptr=2a39aec0
> > > 	==xen_map_cache== phys_addr=7a3c1ec4 size=0 lock=0
> > > 	==xen_map_cache==first return 2a2d9000+c1ec4=2a39aec4
> > > 	==address_space_rw== ptr=2a39aec4
> > > 	==xen_map_cache== phys_addr=7a3c1ec8 size=0 lock=0
> > > 	==xen_map_cache==first return 2a2d9000+c1ec8=2a39aec8
> > > 	==address_space_rw== ptr=2a39aec8
> > > 	==xen_map_cache== phys_addr=7a3c1ecc size=0 lock=0
> > > 	==xen_map_cache==first return 2a2d9000+c1ecc=2a39aecc
> > > 	==address_space_rw== ptr=2a39aecc
> >
> > These are all to page 0x2a39a___.
> >
> > > 	==xen_map_cache== phys_addr=7a16c108 size=0 lock=0
> > > 	==xen_map_cache== return 2a407000+6c108=2a473108
> > > 	==xen_map_cache== phys_addr=7a16c10c size=0 lock=0
> > > 	==xen_map_cache==first return 2a407000+6c10c=2a47310c
> > > 	==xen_map_cache== phys_addr=7a16c110 size=0 lock=0
> > > 	==xen_map_cache==first return 2a407000+6c110=2a473110
> > > 	==xen_map_cache== phys_addr=7a395000 size=0 lock=0
> > > 	==xen_map_cache== return 2a2d9000+95000=2a36e000
> > > 	==address_space_rw== ptr=2a36e000
> >
> > And this is to page 0x2a36e___, a different page in the same bucket.
> >
> > >       here, the SIGBUS error occurred.
> >
> > So that page isn't mapped.  Which means:
> > - it was never mapped (and the mapcache code didn't handle the error
> >   correctly at map time); or
> > - it was never mapped (and the mapcache hasn't checked its own records
> >   before using the map); or
> > - it was mapped (and something unmapped it in the meantime).
> >
> > Why not add some tests in xen_remap_bucket to check that all the pages
> > that qemu records as mapped are actually there?
> >
> 
> Hi Tim,
> 
> We did more debugging these days, and have now found the root cause.
> The pfn that caused the SIGBUS was never mapped. We verified this by
> printing the *err* array returned from xc_map_foreign_bulk(), and the
> corresponding bit was also not set in entry->valid_mapping. But
> test_bits() didn't return the right value in this situation.
> qemu_get_ram_ptr() almost always passes size 0 to xen_map_cache(),
> which then calls test_bits() to check valid_mapping, and test_bits()
> always returns 1 when size is 0 because qemu's find_next_zero_bit()
> contains the following code:
> 	if (offset >= size) {
> 		return size;
> 	}
> 
> One way to fix the test_bits() issue is to change size >> XC_PAGE_SHIFT
> to (size + 1 << XC_PAGE_SHIFT) >> XC_PAGE_SHIFT in xen_map_cache(). But
> we think that still cannot solve the problem completely, because
> test_bits() will only test one bit when size is 0. Why does
> qemu_get_ram_ptr() always call xen_map_cache() with size 0 when
> block->offset == 0? If some operation needs to access a range of more
> than one page, this approach still has a problem. So I think it should
> pass the actual size to xen_map_cache().
> 
> --weidong
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
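
To make the fix concrete: a small self-contained sketch of the __test_bit_size rounding the patch introduces. The helper below is illustrative, not code from the patch, and assumes XC_PAGE_SIZE is 4096 (as on x86). A zero-size request is widened to exactly one page, and a request straddling a page boundary covers every page it touches:

#include <stdio.h>

#define XC_PAGE_SHIFT 12
#define XC_PAGE_SIZE  (1UL << XC_PAGE_SHIFT)   /* 4096, as on x86 */

/* Mirrors the patch's computation: round the request up to whole pages,
 * measured from the start of the page containing phys_addr; a size of 0
 * still needs one page checked. */
static unsigned long test_bit_size(unsigned long phys_addr, unsigned long size)
{
    unsigned long n;

    if (size) {
        n = size + (phys_addr & (XC_PAGE_SIZE - 1));
        if (n % XC_PAGE_SIZE) {
            n += XC_PAGE_SIZE - (n % XC_PAGE_SIZE);
        }
    } else {
        n = XC_PAGE_SIZE;
    }
    return n;
}

int main(void)
{
    /* The qemu_get_ram_ptr() case from the log above: size 0, so exactly
     * one bit (one page) is now passed to test_bits(). */
    printf("pages checked: %lu\n",
           test_bit_size(0x7a3c1ec0, 0) >> XC_PAGE_SHIFT);     /* 1 */
    /* An 8-byte access straddling a page boundary covers two pages. */
    printf("pages checked: %lu\n",
           test_bit_size(0x7a3c0ffc, 8) >> XC_PAGE_SHIFT);     /* 2 */
    return 0;
}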

Patch

diff --git a/xen-mapcache.c b/xen-mapcache.c
index 31c06dc..bd4a97f 100644
--- a/xen-mapcache.c
+++ b/xen-mapcache.c
@@ -202,6 +202,7 @@ uint8_t *xen_map_cache(hwaddr phys_addr, hwaddr size,
     hwaddr address_index;
     hwaddr address_offset;
     hwaddr __size = size;
+    hwaddr __test_bit_size = size;
     bool translated = false;
 
 tryagain:
@@ -210,7 +211,23 @@ tryagain:
 
     trace_xen_map_cache(phys_addr);
 
-    if (address_index == mapcache->last_address_index && !lock && !__size) {
+    entry = &mapcache->entry[address_index % mapcache->nr_buckets];
+
+    /* __test_bit_size is always a multiple of XC_PAGE_SIZE */
+    if (size) {
+        __test_bit_size = size + (phys_addr & (XC_PAGE_SIZE - 1));
+
+        if (__test_bit_size % XC_PAGE_SIZE) {
+            __test_bit_size += XC_PAGE_SIZE - (__test_bit_size % XC_PAGE_SIZE);
+        }
+    } else {
+        __test_bit_size = XC_PAGE_SIZE;
+    }
+
+    if (address_index == mapcache->last_address_index && !lock && !__size &&
+        test_bits(address_offset >> XC_PAGE_SHIFT,
+                  __test_bit_size >> XC_PAGE_SHIFT,
+                  entry->valid_mapping)) {
         trace_xen_map_cache_return(mapcache->last_address_vaddr + address_offset);
         return mapcache->last_address_vaddr + address_offset;
     }
@@ -225,11 +242,10 @@ tryagain:
         __size = MCACHE_BUCKET_SIZE;
     }
 
-    entry = &mapcache->entry[address_index % mapcache->nr_buckets];
-
     while (entry && entry->lock && entry->vaddr_base &&
             (entry->paddr_index != address_index || entry->size != __size ||
-             !test_bits(address_offset >> XC_PAGE_SHIFT, size >> XC_PAGE_SHIFT,
+             !test_bits(address_offset >> XC_PAGE_SHIFT,
+                 __test_bit_size >> XC_PAGE_SHIFT,
                  entry->valid_mapping))) {
         pentry = entry;
         entry = entry->next;
@@ -241,13 +257,15 @@ tryagain:
     } else if (!entry->lock) {
         if (!entry->vaddr_base || entry->paddr_index != address_index ||
                 entry->size != __size ||
-                !test_bits(address_offset >> XC_PAGE_SHIFT, size >> XC_PAGE_SHIFT,
+                !test_bits(address_offset >> XC_PAGE_SHIFT,
+                    __test_bit_size >> XC_PAGE_SHIFT,
                     entry->valid_mapping)) {
             xen_remap_bucket(entry, __size, address_index);
         }
     }
 
-    if(!test_bits(address_offset >> XC_PAGE_SHIFT, size >> XC_PAGE_SHIFT,
+    if(!test_bits(address_offset >> XC_PAGE_SHIFT,
+                __test_bit_size >> XC_PAGE_SHIFT,
                 entry->valid_mapping)) {
         mapcache->last_address_index = -1;
         if (!translated && mapcache->phys_offset_to_gaddr) {