Patchwork [06/15] memory: use refcnt to manage MemoryRegion

login
register
mail settings
Submitter pingfan liu
Date Aug. 8, 2012, 6:25 a.m.
Message ID <1344407156-25562-7-git-send-email-qemulist@gmail.com>
Download mbox | patch
Permalink /patch/175874/
State New
Headers show

Comments

pingfan liu - Aug. 8, 2012, 6:25 a.m.
From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Using refcnt for mr, so we can separate mr's life cycle management
from refered object.
  When mr->ref 0->1, inc the refered object.
  When mr->ref 1->0, dec the refered object.

The refered object can be DeviceStae, another mr, or other opaque.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 memory.c |   18 ++++++++++++++++++
 memory.h |    5 +++++
 2 files changed, 23 insertions(+), 0 deletions(-)
Avi Kivity - Aug. 8, 2012, 9:20 a.m.
On 08/08/2012 09:25 AM, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> Using refcnt for mr, so we can separate mr's life cycle management
> from refered object.
>   When mr->ref 0->1, inc the refered object.
>   When mr->ref 1->0, dec the refered object.
> 
> The refered object can be DeviceStae, another mr, or other opaque.

Please explain the motivation more fully.

Usually a MemoryRegion will be embedded within some DeviceState, or its
lifecycle will be managed by the DeviceState.  So long as we keep the
DeviceState alive all associated MemoryRegions should be alive as well.
 Why not do this directly?
pingfan liu - Aug. 9, 2012, 7:27 a.m.
On Wed, Aug 8, 2012 at 5:20 PM, Avi Kivity <avi@redhat.com> wrote:
> On 08/08/2012 09:25 AM, Liu Ping Fan wrote:
>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>
>> Using refcnt for mr, so we can separate mr's life cycle management
>> from refered object.
>>   When mr->ref 0->1, inc the refered object.
>>   When mr->ref 1->0, dec the refered object.
>>
>> The refered object can be DeviceStae, another mr, or other opaque.
>
> Please explain the motivation more fully.
>
Actually, the aim is to mange the reference of an object, used by mem view.
DeviceState can be referred by different system, when it comes to the
view of subsystem, we hold dev's ref. And any indirect reference will
just mr->ref++, not dev's.
This can help us avoid the down-walk through the referred chain, like
alias----> mr ---> DeviceState.

In the previous discussion, you have suggest add dev->ref++ in
core_region_add.  But I think, if we can move it to higher layer --
memory_region_{add,del}_subregion, so we can avoid to duplicate do
this in other xx_region_add.
As a payment for this, we need to handle alias which can be avoid at
core_region_add().  And mr's ref can help to avoid
 the down-walk.

Regards,
pingfan
> Usually a MemoryRegion will be embedded within some DeviceState, or its
> lifecycle will be managed by the DeviceState.  So long as we keep the
> DeviceState alive all associated MemoryRegions should be alive as well.
>  Why not do this directly?
>
>
> --
> error compiling committee.c: too many arguments to function
Avi Kivity - Aug. 9, 2012, 8:38 a.m.
On 08/09/2012 10:27 AM, liu ping fan wrote:
> On Wed, Aug 8, 2012 at 5:20 PM, Avi Kivity <avi@redhat.com> wrote:
>> On 08/08/2012 09:25 AM, Liu Ping Fan wrote:
>>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>
>>> Using refcnt for mr, so we can separate mr's life cycle management
>>> from refered object.
>>>   When mr->ref 0->1, inc the refered object.
>>>   When mr->ref 1->0, dec the refered object.
>>>
>>> The refered object can be DeviceStae, another mr, or other opaque.
>>
>> Please explain the motivation more fully.
>>
> Actually, the aim is to mange the reference of an object, used by mem view.
> DeviceState can be referred by different system, when it comes to the
> view of subsystem, we hold dev's ref. And any indirect reference will
> just mr->ref++, not dev's.
> This can help us avoid the down-walk through the referred chain, like
> alias----> mr ---> DeviceState.

That is a lot of complexity, for no gain.  Manipulating memory regions
is a slow path, and can be done under the bit qemu lock without any
complications.

> 
> In the previous discussion, you have suggest add dev->ref++ in
> core_region_add.  But I think, if we can move it to higher layer --
> memory_region_{add,del}_subregion, so we can avoid to duplicate do
> this in other xx_region_add.

Why would other memory listeners be impacted?  They all operate under
the big qemu lock.  If they start using devices outside the lock, then
they need to take a reference.

> As a payment for this, we need to handle alias which can be avoid at
> core_region_add().  And mr's ref can help to avoid
>  the down-walk.

The payment is two systems of reference counts.
pingfan liu - Aug. 10, 2012, 6:44 a.m.
On Thu, Aug 9, 2012 at 4:38 PM, Avi Kivity <avi@redhat.com> wrote:
> On 08/09/2012 10:27 AM, liu ping fan wrote:
>> On Wed, Aug 8, 2012 at 5:20 PM, Avi Kivity <avi@redhat.com> wrote:
>>> On 08/08/2012 09:25 AM, Liu Ping Fan wrote:
>>>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>>
>>>> Using refcnt for mr, so we can separate mr's life cycle management
>>>> from refered object.
>>>>   When mr->ref 0->1, inc the refered object.
>>>>   When mr->ref 1->0, dec the refered object.
>>>>
>>>> The refered object can be DeviceStae, another mr, or other opaque.
>>>
>>> Please explain the motivation more fully.
>>>
>> Actually, the aim is to mange the reference of an object, used by mem view.
>> DeviceState can be referred by different system, when it comes to the
>> view of subsystem, we hold dev's ref. And any indirect reference will
>> just mr->ref++, not dev's.
>> This can help us avoid the down-walk through the referred chain, like
>> alias----> mr ---> DeviceState.
>
> That is a lot of complexity, for no gain.  Manipulating memory regions
> is a slow path, and can be done under the bit qemu lock without any
> complications.
>
OK. I will discard this design.
>>
>> In the previous discussion, you have suggest add dev->ref++ in
>> core_region_add.  But I think, if we can move it to higher layer --
>> memory_region_{add,del}_subregion, so we can avoid to duplicate do
>> this in other xx_region_add.
>
> Why would other memory listeners be impacted?  They all operate under
> the big qemu lock.  If they start using devices outside the lock, then
> they need to take a reference.
>
Yes, if unplug path in the protection of big lock.
And just one extra question, for ram-unplug scene, how do we protect from:
  updater:  ram-unplug -->qemu free() --> brk() invalidate this vaddr interval
  reader:  vhost-thread copy data from the interval
I guess something like lock/ref used by them, but can not find such
mechanism in vhost_set_memory() to protect the scene against
vhost_worker()

Thanks and regards,
pingfan

>> As a payment for this, we need to handle alias which can be avoid at
>> core_region_add().  And mr's ref can help to avoid
>>  the down-walk.
>
> The payment is two systems of reference counts.
>
> --
> error compiling committee.c: too many arguments to function
Avi Kivity - Aug. 12, 2012, 8:43 a.m.
On 08/10/2012 09:44 AM, liu ping fan wrote:
>>> In the previous discussion, you have suggest add dev->ref++ in
>>> core_region_add.  But I think, if we can move it to higher layer --
>>> memory_region_{add,del}_subregion, so we can avoid to duplicate do
>>> this in other xx_region_add.
>>
>> Why would other memory listeners be impacted?  They all operate under
>> the big qemu lock.  If they start using devices outside the lock, then
>> they need to take a reference.
>>
> Yes, if unplug path in the protection of big lock.
> And just one extra question, for ram-unplug scene, how do we protect from:
>   updater:  ram-unplug -->qemu free() --> brk() invalidate this vaddr interval
>   reader:  vhost-thread copy data from the interval
> I guess something like lock/ref used by them, but can not find such
> mechanism in vhost_set_memory() to protect the scene against
> vhost_worker()

VHOST_SET_MEM_TABLE uses synchronize_srcu() to ensure no readers are
active before returning.

Patch

diff --git a/memory.c b/memory.c
index 80c7529..5dc8b59 100644
--- a/memory.c
+++ b/memory.c
@@ -811,6 +811,7 @@  void memory_region_init(MemoryRegion *mr,
     if (size == UINT64_MAX) {
         mr->size = int128_2_64();
     }
+    atomic_set(&mr->ref, 0);
     mr->life_ops = &nops;
     mr->addr = 0;
     mr->subpage = false;
@@ -1090,6 +1091,23 @@  static const MemoryRegionOps reservation_ops = {
     .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
+void memory_region_get(MemoryRegion *mr)
+{
+    if (atomic_add_and_return(1, &mr->ref) == 1) {
+        mr->life_ops->get(mr);
+    }
+}
+
+void memory_region_put(MemoryRegion *mr)
+{
+    assert(atomic_read(&mr->ref) > 0);
+
+    if (atomic_dec_and_test(&mr->ref)) {
+        /* to fix, using call_rcu( ,release) */
+        mr->life_ops->put(mr);
+    }
+}
+
 void memory_region_init_reservation(MemoryRegion *mr,
                                     const char *name,
                                     uint64_t size)
diff --git a/memory.h b/memory.h
index 8fb543b..740f018 100644
--- a/memory.h
+++ b/memory.h
@@ -18,6 +18,7 @@ 
 
 #include <stdint.h>
 #include <stdbool.h>
+#include "qemu/atomic.h"
 #include "qemu-common.h"
 #include "cpu-common.h"
 #include "targphys.h"
@@ -26,6 +27,7 @@ 
 #include "ioport.h"
 #include "int128.h"
 #include "qemu-thread.h"
+#include "qemu/reclaimer.h"
 
 typedef struct MemoryRegionOps MemoryRegionOps;
 typedef struct MemoryRegionLifeOps MemoryRegionLifeOps;
@@ -126,6 +128,7 @@  typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
 struct MemoryRegion {
     /* All fields are private - violators will be prosecuted */
     const MemoryRegionOps *ops;
+    Atomic ref;
     MemoryRegionLifeOps *life_ops;
     void *opaque;
     MemoryRegion *parent;
@@ -766,6 +769,8 @@  void memory_global_dirty_log_stop(void);
 
 void mtree_info(fprintf_function mon_printf, void *f);
 
+void memory_region_get(MemoryRegion *mr);
+void memory_region_put(MemoryRegion *mr);
 #endif
 
 #endif