Patchwork [2/5] exec.c: use refcnt to protect device during dispatching

Submitter pingfan liu
Date July 25, 2012, 3:31 a.m.
Message ID <1343187070-27371-3-git-send-email-qemulist@gmail.com>
Permalink /patch/173098/
State New

Comments

pingfan liu - July 25, 2012, 3:31 a.m.
From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Acquire the device's refcnt under the qemu_device_tree_mutex rwlock, so we
can safely handle it during mmio dispatch.

If the leaf in the radix tree is a subpage, take a further step to acquire
its opaque pointer, which is of type DeviceState.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 exec.c   |   38 ++++++++++++++++++++++++++++++++++++++
 memory.h |    2 ++
 2 files changed, 40 insertions(+), 0 deletions(-)
Stefan Hajnoczi - July 25, 2012, 7:43 a.m.
On Wed, Jul 25, 2012 at 4:31 AM, Liu Ping Fan <qemulist@gmail.com> wrote:
> @@ -3396,13 +3420,25 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>      uint32_t val;
>      target_phys_addr_t page;
>      MemoryRegionSection *section;
> +    Object *bk;
>
>      while (len > 0) {
>          page = addr & TARGET_PAGE_MASK;
>          l = (page + TARGET_PAGE_SIZE) - addr;
>          if (l > len)
>              l = len;
> +
> +        qemu_rwlock_rdlock_devtree();
>          section = phys_page_find(page >> TARGET_PAGE_BITS);
> +        if (!(memory_region_is_ram(section->mr) ||
> +            memory_region_is_romd(section->mr)) && !is_write) {
> +            bk = get_backend(section->mr, addr);
> +            object_ref(bk);
> +        } else if (!memory_region_is_ram(section->mr) && is_write) {
> +            bk = get_backend(section->mr, addr);
> +            object_ref(bk);
> +        }
> +        qemu_rwlock_unlock_devtree();
>
>          if (is_write) {
>              if (!memory_region_is_ram(section->mr)) {
> @@ -3426,6 +3462,7 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>                      io_mem_write(section->mr, addr1, val, 1);
>                      l = 1;
>                  }
> +                object_unref(bk);

Currently object_ref()/object_unref() are not atomic.  Will you send
another patch to perform atomic increment/decrement or how will
per-device synchronization work?

Stefan
pingfan liu - July 25, 2012, 8:12 a.m.
On Wed, Jul 25, 2012 at 3:43 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Wed, Jul 25, 2012 at 4:31 AM, Liu Ping Fan <qemulist@gmail.com> wrote:
>> @@ -3396,13 +3420,25 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>      uint32_t val;
>>      target_phys_addr_t page;
>>      MemoryRegionSection *section;
>> +    Object *bk;
>>
>>      while (len > 0) {
>>          page = addr & TARGET_PAGE_MASK;
>>          l = (page + TARGET_PAGE_SIZE) - addr;
>>          if (l > len)
>>              l = len;
>> +
>> +        qemu_rwlock_rdlock_devtree();
>>          section = phys_page_find(page >> TARGET_PAGE_BITS);
>> +        if (!(memory_region_is_ram(section->mr) ||
>> +            memory_region_is_romd(section->mr)) && !is_write) {
>> +            bk = get_backend(section->mr, addr);
>> +            object_ref(bk);
>> +        } else if (!memory_region_is_ram(section->mr) && is_write) {
>> +            bk = get_backend(section->mr, addr);
>> +            object_ref(bk);
>> +        }
>> +        qemu_rwlock_unlock_devtree();
>>
>>          if (is_write) {
>>              if (!memory_region_is_ram(section->mr)) {
>> @@ -3426,6 +3462,7 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>                      io_mem_write(section->mr, addr1, val, 1);
>>                      l = 1;
>>                  }
>> +                object_unref(bk);
>
> Currently object_ref()/object_unref() are not atomic.  Will you send

We obey the rule:
           rdlock -> search -> ref_get,
           wrlock -> remove -> ref_put
So can it cause problems if object_ref()/object_unref() are not atomic?
Thanx, pingfan
> another patch to perform atomic increment/decrement or how will
> per-device synchronization work?
>
> Stefan
Paolo Bonzini - July 25, 2012, 9:18 a.m.
Il 25/07/2012 10:12, liu ping fan ha scritto:
>>> >> +        qemu_rwlock_rdlock_devtree();
>>> >>          section = phys_page_find(page >> TARGET_PAGE_BITS);
>>> >> +        if (!(memory_region_is_ram(section->mr) ||
>>> >> +            memory_region_is_romd(section->mr)) && !is_write) {
>>> >> +            bk = get_backend(section->mr, addr);
>>> >> +            object_ref(bk);
>>> >> +        } else if (!memory_region_is_ram(section->mr) && is_write) {
>>> >> +            bk = get_backend(section->mr, addr);
>>> >> +            object_ref(bk);
>>> >> +        }
>>> >> +        qemu_rwlock_unlock_devtree();
>>> >>
>>> >>          if (is_write) {
>>> >>              if (!memory_region_is_ram(section->mr)) {
>>> >> @@ -3426,6 +3462,7 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>> >>                      io_mem_write(section->mr, addr1, val, 1);
>>> >>                      l = 1;
>>> >>                  }
>>> >> +                object_unref(bk);
>> >
>> > Currently object_ref()/object_unref() are not atomic.  Will you send
> We obey the rule:
>            rdlock -> search -> ref_get,
>            wrlock -> remove -> ref_put
> So can it cause problems if object_ref()/object_unref() are not atomic?

Yes, two CPUs can perform object_ref at the same time.

You can find a header file for atomic operations here:
https://github.com/bonzini/qemu/commit/atomics.patch

Paolo
Avi Kivity - July 25, 2012, 10:58 a.m.
On 07/25/2012 06:31 AM, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> Acquire the device's refcnt under the qemu_device_tree_mutex rwlock, so we
> can safely handle it during mmio dispatch.
> 
> If the leaf in the radix tree is a subpage, take a further step to acquire
> its opaque pointer, which is of type DeviceState.
> 
>  
> +static MemoryRegionSection *subpage_get_backend(subpage_t *mmio,
> +    target_phys_addr_t addr)
> +{
> +    MemoryRegionSection *section;
> +    unsigned int idx = SUBPAGE_IDX(addr);
> +
> +    section = &phys_sections[mmio->sub_section[idx]];
> +    return section;
> +}
> +
> +void *get_backend(MemoryRegion* mr,  target_phys_addr_t addr)
> +{
> +    MemoryRegionSection *p;
> +    Object *ret;
> +
> +    if (mr->subpage) {
> +        p = subpage_get_backend(mr->opaque, addr);
> +        ret = OBJECT(p->mr->opaque);
> +    } else {
> +        ret = OBJECT(mr->opaque);
> +    }
> +    return ret;
> +}
> +

You don't enforce that mr->opaque is an object.

The name 'backend' is inappropriate here (actually I don't like it
anywhere).  If we s/opaque/object/ (and change the type too), we can
call it get_object() (and return an Object *).

>  static const MemoryRegionOps subpage_ops = {
>      .read = subpage_read,
>      .write = subpage_write,
> @@ -3396,13 +3420,25 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>      uint32_t val;
>      target_phys_addr_t page;
>      MemoryRegionSection *section;
> +    Object *bk;
>  
>      while (len > 0) {
>          page = addr & TARGET_PAGE_MASK;
>          l = (page + TARGET_PAGE_SIZE) - addr;
>          if (l > len)
>              l = len;
> +
> +        qemu_rwlock_rdlock_devtree();
>          section = phys_page_find(page >> TARGET_PAGE_BITS);

Does the devtree lock also protect the data structures accessed by
phys_page_find()?  Seems wrong.

> +        if (!(memory_region_is_ram(section->mr) ||
> +            memory_region_is_romd(section->mr)) && !is_write) {
> +            bk = get_backend(section->mr, addr);
> +            object_ref(bk);
> +        } else if (!memory_region_is_ram(section->mr) && is_write) {
> +            bk = get_backend(section->mr, addr);
> +            object_ref(bk);
> +        }

Best push the ugliness that computes bk into a small helper, and do just
the object_ref() here.

> +        qemu_rwlock_unlock_devtree();
>  
>          if (is_write) {
>              if (!memory_region_is_ram(section->mr)) {
> @@ -3426,6 +3462,7 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>                      io_mem_write(section->mr, addr1, val, 1);
>                      l = 1;
>                  }
> +                object_unref(bk);
>              } else if (!section->readonly) {
>                  ram_addr_t addr1;
>                  addr1 = memory_region_get_ram_addr(section->mr)
> @@ -3464,6 +3501,7 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>                      stb_p(buf, val);
>                      l = 1;
>                  }
> +                object_unref(bk);
>              } else {
>                  /* RAM case */
>                  ptr = qemu_get_ram_ptr(section->mr->ram_addr
> diff --git a/memory.h b/memory.h
> index 740c48e..e5a86dc 100644
> --- a/memory.h
> +++ b/memory.h
> @@ -748,6 +748,8 @@ void memory_global_dirty_log_stop(void);
>  
>  void mtree_info(fprintf_function mon_printf, void *f);
>  
> +void *get_backend(MemoryRegion* mr,  target_phys_addr_t addr);
> +

This is a private interface, shouldn't be in memory.h.
Avi Kivity - July 25, 2012, 12:27 p.m.
On 07/25/2012 01:58 PM, Avi Kivity wrote:
>>      while (len > 0) {
>>          page = addr & TARGET_PAGE_MASK;
>>          l = (page + TARGET_PAGE_SIZE) - addr;
>>          if (l > len)
>>              l = len;
>> +
>> +        qemu_rwlock_rdlock_devtree();
>>          section = phys_page_find(page >> TARGET_PAGE_BITS);
> 
> Does the devtree lock also protect the data structures accessed by
> phys_page_find()?  Seems wrong.

The right way is to object_ref() in core_region_add() and object_unref()
in core_region_del().  We're guaranteed that mr->object is alive during
_add(), and DeviceClass::unmap() ensures that the extra ref doesn't
block destruction.
pingfan liu - July 26, 2012, 1 p.m.
On Wed, Jul 25, 2012 at 5:18 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 25/07/2012 10:12, liu ping fan ha scritto:
>>>> >> +        qemu_rwlock_rdlock_devtree();
>>>> >>          section = phys_page_find(page >> TARGET_PAGE_BITS);
>>>> >> +        if (!(memory_region_is_ram(section->mr) ||
>>>> >> +            memory_region_is_romd(section->mr)) && !is_write) {
>>>> >> +            bk = get_backend(section->mr, addr);
>>>> >> +            object_ref(bk);
>>>> >> +        } else if (!memory_region_is_ram(section->mr) && is_write) {
>>>> >> +            bk = get_backend(section->mr, addr);
>>>> >> +            object_ref(bk);
>>>> >> +        }
>>>> >> +        qemu_rwlock_unlock_devtree();
>>>> >>
>>>> >>          if (is_write) {
>>>> >>              if (!memory_region_is_ram(section->mr)) {
>>>> >> @@ -3426,6 +3462,7 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>>> >>                      io_mem_write(section->mr, addr1, val, 1);
>>>> >>                      l = 1;
>>>> >>                  }
>>>> >> +                object_unref(bk);
>>> >
>>> > Currently object_ref()/object_unref() are not atomic.  Will you send
>> We obey the rule:
>>            rdlock -> search -> ref_get,
>>            wrlock -> remove -> ref_put
>> So can it cause problems if object_ref()/object_unref() are not atomic?
>
> Yes, two CPUs can perform object_ref at the same time.
>
> You can find a header file for atomic operations here:
> https://github.com/bonzini/qemu/commit/atomics.patch
>
Got it, thanks

> Paolo
pingfan liu - July 26, 2012, 1:06 p.m.
On Wed, Jul 25, 2012 at 8:27 PM, Avi Kivity <avi@redhat.com> wrote:
> On 07/25/2012 01:58 PM, Avi Kivity wrote:
>>>      while (len > 0) {
>>>          page = addr & TARGET_PAGE_MASK;
>>>          l = (page + TARGET_PAGE_SIZE) - addr;
>>>          if (l > len)
>>>              l = len;
>>> +
>>> +        qemu_rwlock_rdlock_devtree();
>>>          section = phys_page_find(page >> TARGET_PAGE_BITS);
>>
>> Does the devtree lock also protect the data structures accessed by
>> phys_page_find()?  Seems wrong.
>
> The right way is to object_ref() in core_region_add() and object_unref()
> in core_region_del().  We're guaranteed that mr->object is alive during
> _add(), and DeviceClass::unmap() ensures that the extra ref doesn't
> block destruction.
>
OK, I see. I will try it that way. But when
memory_region_destroy()->..->core_region_del() runs, should we reset
lp.ptr to phys_section_unassigned? Otherwise, if a removed
target_phys_addr_t is used, we will still get a pointer to an invalid
MemoryRegion.

Thanx, pingfan
> --
> error compiling committee.c: too many arguments to function
>
>
Avi Kivity - July 26, 2012, 1:13 p.m.
On 07/26/2012 04:06 PM, liu ping fan wrote:
> On Wed, Jul 25, 2012 at 8:27 PM, Avi Kivity <avi@redhat.com> wrote:
>> On 07/25/2012 01:58 PM, Avi Kivity wrote:
>>>>      while (len > 0) {
>>>>          page = addr & TARGET_PAGE_MASK;
>>>>          l = (page + TARGET_PAGE_SIZE) - addr;
>>>>          if (l > len)
>>>>              l = len;
>>>> +
>>>> +        qemu_rwlock_rdlock_devtree();
>>>>          section = phys_page_find(page >> TARGET_PAGE_BITS);
>>>
>>> Does the devtree lock also protect the data structures accessed by
>>> phys_page_find()?  Seems wrong.
>>
>> The right way is to object_ref() in core_region_add() and object_unref()
>> in core_region_del().  We're guaranteed that mr->object is alive during
>> _add(), and DeviceClass::unmap() ensures that the extra ref doesn't
>> block destruction.
>>
> OK, I see. I will try it that way. But when
> memory_region_destroy()->..->core_region_del() runs, should we reset
> lp.ptr to phys_section_unassigned? Otherwise, if a removed
> target_phys_addr_t is used, we will still get a pointer to an invalid
> MemoryRegion.

The intent was to use rcu, so when we rebuild phys_map we build it into
a new tree, use rcu_assign_pointer() to switch into the new tree, then
synchronize_rcu() and drop the old tree.

Since we don't have rcu yet, we can emulate it with a lock.  We can
start with a simple mutex around the lookup and rebuild, then switch to
a rwlock or rcu if needed.

(without the lock or rcu, just changing lp.ptr is dangerous, since it is
a bit field)

Patch

diff --git a/exec.c b/exec.c
index 8244d54..d2a6d08 100644
--- a/exec.c
+++ b/exec.c
@@ -3032,6 +3032,30 @@  static void subpage_write(void *opaque, target_phys_addr_t addr,
     io_mem_write(section->mr, addr, value, len);
 }
 
+static MemoryRegionSection *subpage_get_backend(subpage_t *mmio,
+    target_phys_addr_t addr)
+{
+    MemoryRegionSection *section;
+    unsigned int idx = SUBPAGE_IDX(addr);
+
+    section = &phys_sections[mmio->sub_section[idx]];
+    return section;
+}
+
+void *get_backend(MemoryRegion* mr,  target_phys_addr_t addr)
+{
+    MemoryRegionSection *p;
+    Object *ret;
+
+    if (mr->subpage) {
+        p = subpage_get_backend(mr->opaque, addr);
+        ret = OBJECT(p->mr->opaque);
+    } else {
+        ret = OBJECT(mr->opaque);
+    }
+    return ret;
+}
+
 static const MemoryRegionOps subpage_ops = {
     .read = subpage_read,
     .write = subpage_write,
@@ -3396,13 +3420,25 @@  void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
     uint32_t val;
     target_phys_addr_t page;
     MemoryRegionSection *section;
+    Object *bk;
 
     while (len > 0) {
         page = addr & TARGET_PAGE_MASK;
         l = (page + TARGET_PAGE_SIZE) - addr;
         if (l > len)
             l = len;
+
+        qemu_rwlock_rdlock_devtree();
         section = phys_page_find(page >> TARGET_PAGE_BITS);
+        if (!(memory_region_is_ram(section->mr) ||
+            memory_region_is_romd(section->mr)) && !is_write) {
+            bk = get_backend(section->mr, addr);
+            object_ref(bk);
+        } else if (!memory_region_is_ram(section->mr) && is_write) {
+            bk = get_backend(section->mr, addr);
+            object_ref(bk);
+        }
+        qemu_rwlock_unlock_devtree();
 
         if (is_write) {
             if (!memory_region_is_ram(section->mr)) {
@@ -3426,6 +3462,7 @@  void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
                     io_mem_write(section->mr, addr1, val, 1);
                     l = 1;
                 }
+                object_unref(bk);
             } else if (!section->readonly) {
                 ram_addr_t addr1;
                 addr1 = memory_region_get_ram_addr(section->mr)
@@ -3464,6 +3501,7 @@  void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
                     stb_p(buf, val);
                     l = 1;
                 }
+                object_unref(bk);
             } else {
                 /* RAM case */
                 ptr = qemu_get_ram_ptr(section->mr->ram_addr
diff --git a/memory.h b/memory.h
index 740c48e..e5a86dc 100644
--- a/memory.h
+++ b/memory.h
@@ -748,6 +748,8 @@  void memory_global_dirty_log_stop(void);
 
 void mtree_info(fprintf_function mon_printf, void *f);
 
+void *get_backend(MemoryRegion* mr,  target_phys_addr_t addr);
+
 #endif
 
 #endif