[v1,2/2] reduce qemu's heap Rss size from 12252kB to 2752KB

Message ID	4712D8F4B26E034E80552F30A67BE0B1A1561C@ORSMSX112.amr.corp.intel.com
State	New
Headers	show Return-Path: <qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org> From: "Xu, Anthony" <anthony.xu@intel.com> To: Paolo Bonzini <pbonzini@redhat.com> Thread-Topic: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from 12252kB to 2752KB Thread-Index: AQHSmW5s7VgxEgkfD0KNa0Uxoqhl0qGOUJAAgAAAp8AZksd/6v842HEggADaOoD///owIA== Date: Tue, 14 Mar 2017 21:23:29 +0000 Message-ID: <4712D8F4B26E034E80552F30A67BE0B1A1561C@ORSMSX112.amr.corp.intel.com> References: <1489158897-9206-1-git-send-email-yang.zhong@intel.com> <1489158897-9206-2-git-send-email-yang.zhong@intel.com> <53522014-8f70-9358-7d60-8b6c491c9611@redhat.com> <4712D8F4B26E034E80552F30A67BE0B1A0F91A@ORSMSX112.amr.corp.intel.com> <920352018.1797331.1489251855464.JavaMail.zimbra@redhat.com> <4712D8F4B26E034E80552F30A67BE0B1A13045@ORSMSX112.amr.corp.intel.com> <991a0f3d-5e1f-d17c-a8f3-a42c02ac2f17@redhat.com> In-Reply-To: <991a0f3d-5e1f-d17c-a8f3-a42c02ac2f17@redhat.com> Accept-Language: en-US Content-Language: en-US Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 MIME-Version: 1.0 Subject: Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from 12252kB to 2752KB Precedence: list Cc: "Zhong, Yang" <yang.zhong@intel.com>, "Peng, Chao P" <chao.p.peng@intel.com>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org> Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>

Message ID

4712D8F4B26E034E80552F30A67BE0B1A1561C@ORSMSX112.amr.corp.intel.com

State

New

Headers

From: "Xu, Anthony" <anthony.xu@intel.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Thread-Topic: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from
	12252kB to 2752KB
Thread-Index: AQHSmW5s7VgxEgkfD0KNa0Uxoqhl0qGOUJAAgAAAp8AZksd/6v842HEggADaOoD///owIA==
Date: Tue, 14 Mar 2017 21:23:29 +0000
Message-ID: <4712D8F4B26E034E80552F30A67BE0B1A1561C@ORSMSX112.amr.corp.intel.com>
References: <1489158897-9206-1-git-send-email-yang.zhong@intel.com>
	<1489158897-9206-2-git-send-email-yang.zhong@intel.com>
	<53522014-8f70-9358-7d60-8b6c491c9611@redhat.com>
	<4712D8F4B26E034E80552F30A67BE0B1A0F91A@ORSMSX112.amr.corp.intel.com>
	<920352018.1797331.1489251855464.JavaMail.zimbra@redhat.com>
	<4712D8F4B26E034E80552F30A67BE0B1A13045@ORSMSX112.amr.corp.intel.com>
	<991a0f3d-5e1f-d17c-a8f3-a42c02ac2f17@redhat.com>
In-Reply-To: <991a0f3d-5e1f-d17c-a8f3-a42c02ac2f17@redhat.com>
Accept-Language: en-US
Content-Language: en-US
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
MIME-Version: 1.0
Subject: Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from
	12252kB to 2752KB
Precedence: list
Cc: "Zhong, Yang" <yang.zhong@intel.com>, "Peng,
	Chao P" <chao.p.peng@intel.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org
Sender: "Qemu-devel"
	<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>

Commit Message

Xu, Anthony March 14, 2017, 9:23 p.m. UTC

> -----Original Message-----

> From: Paolo Bonzini [mailto:pbonzini@redhat.com]

> Sent: Tuesday, March 14, 2017 3:15 AM

> To: Xu, Anthony <anthony.xu@intel.com>

> Cc: Zhong, Yang <yang.zhong@intel.com>; qemu-devel@nongnu.org; Peng,

> Chao P <chao.p.peng@intel.com>

> Subject: Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from

> 12252kB to 2752KB

> 

> 

> 

> On 14/03/2017 06:14, Xu, Anthony wrote:

> > Below functions are registered in RCU thread

> > address_space_dispatch_free,

> > do_address_space_destroy

> > flatview_unref

> > reclaim_ramblock,

> > qht_map_destroy,

> > migration_bitmap_free

> >

> > first three are address space related, should work without global lock per

> above analysis.

> > The rest are very simple, seems doesn't need global lock.

> 

> flatview_unref can call object_unref and thus reach:


Okay,  flatview_unref is the one you worried about,

Flatview_unref is registered as a RCU callback only in address_space_update_topology,
Strangely, it is registered as a RCU callback, and is called directly in this function. 
Basially flatview_unref is called twice for old view. There are some comments, 
but it is not clear to me, is this a bug or by design? Is flatview_destroy called in current thread 
or RCU thread?


static void address_space_update_topology(AddressSpace *as)
{
    FlatView *old_view = address_space_get_flatview(as);
    FlatView *new_view = generate_memory_topology(as->root);

    address_space_update_topology_pass(as, old_view, new_view, false);
    address_space_update_topology_pass(as, old_view, new_view, true);

    /* Writes are protected by the BQL.  */
    atomic_rcu_set(&as->current_map, new_view);
    call_rcu(old_view, flatview_unref, rcu);

    /* Note that all the old MemoryRegions are still alive up to this
     * point.  This relieves most MemoryListeners from the need to
     * ref/unref the MemoryRegions they get---unless they use them
     * outside the iothread mutex, in which case precise reference
     * counting is necessary.
     */
    flatview_unref(old_view);

    address_space_update_ioeventfds(as);
}


Let me split the patch, Do you think below patch is correct?














> 

> - all QOM instance_finalize callbacks

> 

> - all QOM property release callbacks

> 

> In turn, of QOM property release callbacks the more important ones are

> release_drive (which calls blockdev_auto_del and blk_detach_dev) and

> release_chr (which calls qemu_chr_fe_deinit).

> 

> Your patch is incorrect, sorry.  If it were that simple, it would have

> been done already...

> 

> Paolo

Comments

Paolo Bonzini March 15, 2017, 8:36 a.m. UTC | #1

On 14/03/2017 22:23, Xu, Anthony wrote:
>> flatview_unref can call object_unref and thus reach:
> 
> Okay,  flatview_unref is the one you worried about,
> 
> Flatview_unref is registered as a RCU callback only in address_space_update_topology,
> Strangely, it is registered as a RCU callback, and is called directly in this function. 
> Basially flatview_unref is called twice for old view.

The first unref is done after as->current_map is overwritten.
as->current_map is accessed under RCU, so it needs call_rcu.  It
balances the initial reference that is present since flatview_init.

The second unref is done to balance the flatview_ref in
address_space_get_flatview.

> There are some comments, 
> but it is not clear to me, is this a bug or by design? Is flatview_destroy called in current thread 
> or RCU thread?

You cannot know, because there are also other callers of
address_space_get_flatview.  Usually it's from the RCU thread.

> Let me split the patch, Do you think below patch is correct?
> 
> --- a/memory.c
> +++ b/memory.c
> @@ -1503,15 +1503,9 @@ static void memory_region_finalize(Object *obj)
>       * and cause an infinite loop.
>       */
>      mr->enabled = false;
> -    memory_region_transaction_begin();
> -    while (!QTAILQ_EMPTY(&mr->subregions)) {
> -        MemoryRegion *subregion = QTAILQ_FIRST(&mr->subregions);
> -        memory_region_del_subregion(mr, subregion);
> -    }
> -    memory_region_transaction_commit();
> -
> +    assert(QTAILQ_EMPTY(&mr->subregions));
>      mr->destructor(mr);
> -    memory_region_clear_coalescing(m
> +    assert(QTAILQ_EMPTY(&mr->coalesced));
>      g_free((char *)mr->name);
>      g_free(mr->ioeventfds);
>  }

No.  Callers can:

- let memory_region_finalize remove subregions (commit 2e2b8eb, "memory:
allow destroying a non-empty MemoryRegion", 2015-10-09)

- let memory_region_finalize disable coalesced I/O (in fact there are no
callers of memory_region_clear_coalescing outside memory.c)

> Then let us take a look at flatview_unref, memory_region_unref may call any QOM finalize through 
> Object_unref(mr->owner), that's your concern, I got it. It is way too complicated to look at each QOM 
> object release callbacks and each QOM property release callbacks.  I gave up this path.
> How about fall back to synchronize_rcu?

I'm afraid it would be too slow, but you can test.

Something like the kernel's synchronize_rcu_expedited would work, but we
don't have anything like that in QEMU yet.

> As for address space, the RCU read lock is used to protect PhysPageMap, but not the regular MemoryRegions,

Nope.  The RCU read lock protects all MemoryRegions through
flatview_unref: RCU keeps the FlatView alive, and the FlatView keeps
MemoryRegions alive.

Hence...

> The MemoryRegions returned from address_space_translate are regular MemoryRegions, so 
> address_space_write_continue and address_space_read_continue don't need RCU read lock.

... this is wrong too.

Paolo

--- a/memory.c
+++ b/memory.c
@@ -1503,15 +1503,9 @@  static void memory_region_finalize(Object *obj)
      * and cause an infinite loop.
      */
     mr->enabled = false;
-    memory_region_transaction_begin();
-    while (!QTAILQ_EMPTY(&mr->subregions)) {
-        MemoryRegion *subregion = QTAILQ_FIRST(&mr->subregions);
-        memory_region_del_subregion(mr, subregion);
-    }
-    memory_region_transaction_commit();
-
+    assert(QTAILQ_EMPTY(&mr->subregions));
     mr->destructor(mr);
-    memory_region_clear_coalescing(m
+    assert(QTAILQ_EMPTY(&mr->coalesced));
     g_free((char *)mr->name);
     g_free(mr->ioeventfds);
 }


Then let us take a look at flatview_unref, memory_region_unref may call any QOM finalize through 
Object_unref(mr->owner), that's your concern, I got it. It is way too complicated to look at each QOM 
object release callbacks and each QOM property release callbacks.  I gave up this path.

How about fall back to synchronize_rcu?
As for address space, the RCU read lock is used to protect PhysPageMap, but not the regular MemoryRegions, 
because that are not allocated in PhysPageMap build. Subpage MemoryRegion is an exception, because that is 
allocated in PhysPageMap build.

The MemoryRegions returned from address_space_translate are regular MemoryRegions, so 
address_space_write_continue and address_space_read_continue don't need RCU read lock.

Below patch reclaim address space in synchronized way, it reduces heap size from ~12MB to ~3MB.
You need to apply this patch with above patch. And when address_space_dispatch_free is called 
it already holds the global lock, we don't need to acquire the global lock in address_space_dispatch_free.
Please review this patch.

Thanks,
Anthony


diff --git a/cputlb.c b/cputlb.c
index 6c39927..98bd21f 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -347,6 +347,7 @@  void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
         tlb_add_large_page(env, vaddr, size);
     }

+    rcu_read_lock();
     sz = size;
     section = address_space_translate_for_iotlb(cpu, asidx, paddr, &xlat, &sz);
     assert(sz >= TARGET_PAGE_SIZE);
@@ -406,6 +407,7 @@  void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
     } else {
         te->addr_write = -1;
     }
+    rcu_read_unlock();
 }

 /* Add a new TLB entry, but without specifying the memory
diff --git a/exec.c b/exec.c
index 6fa337b..446d622 100644
--- a/exec.c
+++ b/exec.c
@@ -455,7 +455,7 @@  IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
     IOMMUTLBEntry iotlb = {0};
     MemoryRegionSection *section;
     MemoryRegion *mr;
-
+    rcu_read_lock();
     for (;;) {
         AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch);
         section = address_space_lookup_region(d, addr, false);
@@ -477,7 +477,7 @@  IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
                 | (addr & iotlb.addr_mask));
         as = iotlb.target_as;
     }
-
+    rcu_read_unlock();
     return iotlb;
 }

@@ -490,6 +490,7 @@  MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
     MemoryRegionSection *section;
     MemoryRegion *mr;

+    rcu_read_lock();
     for (;;) {
         AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch);
         section = address_space_translate_internal(d, addr, &addr, plen, true);
@@ -517,6 +518,7 @@  MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
     }

     *xlat = addr;
+    rcu_read_unlock();
     return mr;
 }

@@ -2413,7 +2415,8 @@  static void mem_commit(MemoryListener *listener)

     atomic_rcu_set(&as->dispatch, next);
     if (cur) {
-        call_rcu(cur, address_space_dispatch_free, rcu);
+        synchronize_rcu();
+        address_space_dispatch_free(cur);
     }
 }

@@ -2459,7 +2462,8 @@  void address_space_destroy_dispatch(AddressSpace *as)

     atomic_rcu_set(&as->dispatch, NULL);
     if (d) {
-        call_rcu(d, address_space_dispatch_free, rcu);
+        synchronize_rcu();
+        address_space_dispatch_free(d);
     }
 }

@@ -2686,12 +2690,10 @@  MemTxResult address_space_write(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
     MemTxResult result = MEMTX_OK;

     if (len > 0) {
-        rcu_read_lock();
         l = len;
         mr = address_space_translate(as, addr, &addr1, &l, true);
         result = address_space_write_continue(as, addr, attrs, buf, len,
                                               addr1, l, mr);
-        rcu_read_unlock();
     }

     return result;
@@ -2776,12 +2778,10 @@  MemTxResult address_space_read_full(AddressSpace *as, hwaddr addr,
     MemTxResult result = MEMTX_OK;

     if (len > 0) {
-        rcu_read_lock();
         l = len;
         mr = address_space_translate(as, addr, &addr1, &l, false);
         result = address_space_read_continue(as, addr, attrs, buf, len,
                                              addr1, l, mr);
-        rcu_read_unlock();
     }

     return result;
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index febe519..3557ac5 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -914,8 +914,6 @@  void vhost_device_iotlb_miss(struct vhost_dev *dev, uint64_t iova, int write)
     IOMMUTLBEntry iotlb;
     uint64_t uaddr, len;

-    rcu_read_lock();
-
     iotlb = address_space_get_iotlb_entry(dev->vdev->dma_as,
                                           iova, write);
     if (iotlb.target_as != NULL) {
@@ -936,7 +934,7 @@  void vhost_device_iotlb_miss(struct vhost_dev *dev, uint64_t iova, int write)
         }
     }
 out:
-    rcu_read_unlock();
+    return;
 }

 static int vhost_virtqueue_start(struct vhost_dev *dev,

[v1,2/2] reduce qemu's heap Rss size from 12252kB to 2752KB

Commit Message

Comments

Patch