diff mbox

[01/16] qdev: fix hot-unplug

Message ID 1328201142-26145-2-git-send-email-pbonzini@redhat.com
State New
Headers show

Commit Message

Paolo Bonzini Feb. 2, 2012, 4:45 p.m. UTC
The reference that is returned by qdev_device_add is never given
back, so that device_del does not cause the refcount to go to zero
(and thus does nothing).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 vl.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

Comments

Anthony Liguori Feb. 2, 2012, 5:03 p.m. UTC | #1
On 02/02/2012 10:45 AM, Paolo Bonzini wrote:
> The reference that is returned by qdev_device_add is never given
> back, so that device_del does not cause the refcount to go to zero
> (and thus does nothing).
>
> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
> ---
>   vl.c |    1 +
>   1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/vl.c b/vl.c
> index d88a18c..c63af69 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1746,6 +1746,7 @@ static int device_init_func(QemuOpts *opts, void *opaque)
>       dev = qdev_device_add(opts);
>       if (!dev)
>           return -1;
> +    object_unref(OBJECT(dev));
>       return 0;

Is this still needed with qom-upstream.14?  I fixed a bug on .14 that involved 
child properties that was making device-del sometimes fail.

If it is, what's your test case?  I have a device_del test case that seems to be 
working right now without this patch.

Regards,

Anthony Liguori

>   }
>
Paolo Bonzini Feb. 2, 2012, 5:29 p.m. UTC | #2
On 02/02/2012 06:03 PM, Anthony Liguori wrote:
>>
>
> Is this still needed with qom-upstream.14?  I fixed a bug on .14 that
> involved child properties that was making device-del sometimes fail.

Not sure, I tried with .13 but, from the look of it, it should still be 
there.  Regarding the .13->.14 diff:

- you need QTAILQ_FOREACH_SAFE in object_property_del_child.

- you need to check for the existence of the non-aliased name when 
accessing the alias table, because s390 does not have PCI.

> If it is, what's your test case?

I check that the device disappears from "info qtree".  I check with gdb 
that after object_unparent the refcount is zero.

Paolo
Anthony Liguori Feb. 2, 2012, 7:01 p.m. UTC | #3
On 02/02/2012 11:29 AM, Paolo Bonzini wrote:
> On 02/02/2012 06:03 PM, Anthony Liguori wrote:
>>>
>>
>> Is this still needed with qom-upstream.14? I fixed a bug on .14 that
>> involved child properties that was making device-del sometimes fail.
>
> Not sure, I tried with .13 but, from the look of it, it should still be there.
> Regarding the .13->.14 diff:
>
> - you need QTAILQ_FOREACH_SAFE in object_property_del_child.

Ack.

>
> - you need to check for the existence of the non-aliased name when accessing the
> alias table, because s390 does not have PCI.

I don't think that's the right strategy as it means that s390 only works if we 
don't include the PCI objects in the build (regardless of whether it uses PCI). 
  This would be defeated if/when we move to having all device objects in a 
single shared library used by all of the qemu executables.

I'd prefer to just drop the aliases for s390.  I don't see a lot of value in it 
and I don't think there are tons of s390 users that will be affected.

>
>> If it is, what's your test case?
>
> I check that the device disappears from "info qtree". I check with gdb that
> after object_unparent the refcount is zero.

Ah, okay, I'll look at this more closely.  Thanks.

Regards,

Anthony Liguori

>
> Paolo
>
Alexander Graf Feb. 2, 2012, 7:07 p.m. UTC | #4
On 02.02.2012, at 20:01, Anthony Liguori wrote:

> On 02/02/2012 11:29 AM, Paolo Bonzini wrote:
>> On 02/02/2012 06:03 PM, Anthony Liguori wrote:
>>>> 
>>> 
>>> Is this still needed with qom-upstream.14? I fixed a bug on .14 that
>>> involved child properties that was making device-del sometimes fail.
>> 
>> Not sure, I tried with .13 but, from the look of it, it should still be there.
>> Regarding the .13->.14 diff:
>> 
>> - you need QTAILQ_FOREACH_SAFE in object_property_del_child.
> 
> Ack.
> 
>> 
>> - you need to check for the existence of the non-aliased name when accessing the
>> alias table, because s390 does not have PCI.
> 
> I don't think that's the right strategy as it means that s390 only works if we don't include the PCI objects in the build (regardless of whether it uses PCI).  This would be defeated if/when we move to having all device objects in a single shared library used by all of the qemu executables.
> 
> I'd prefer to just drop the aliases for s390.  I don't see a lot of value in it and I don't think there are tons of s390 users that will be affected.

The reason for the aliases is to make -drive and -net work. If you have alternatives to aliases there, I'm happy to go with them.


Alex
Anthony Liguori Feb. 2, 2012, 8:03 p.m. UTC | #5
On 02/02/2012 01:07 PM, Alexander Graf wrote:
>
> On 02.02.2012, at 20:01, Anthony Liguori wrote:
>
>> On 02/02/2012 11:29 AM, Paolo Bonzini wrote:
>>> On 02/02/2012 06:03 PM, Anthony Liguori wrote:
>>>>>
>>>>
>>>> Is this still needed with qom-upstream.14? I fixed a bug on .14 that
>>>> involved child properties that was making device-del sometimes fail.
>>>
>>> Not sure, I tried with .13 but, from the look of it, it should still be there.
>>> Regarding the .13->.14 diff:
>>>
>>> - you need QTAILQ_FOREACH_SAFE in object_property_del_child.
>>
>> Ack.
>>
>>>
>>> - you need to check for the existence of the non-aliased name when accessing the
>>> alias table, because s390 does not have PCI.
>>
>> I don't think that's the right strategy as it means that s390 only works if we don't include the PCI objects in the build (regardless of whether it uses PCI).  This would be defeated if/when we move to having all device objects in a single shared library used by all of the qemu executables.
>>
>> I'd prefer to just drop the aliases for s390.  I don't see a lot of value in it and I don't think there are tons of s390 users that will be affected.
>
> The reason for the aliases is to make -drive and -net work. If you have alternatives to aliases there, I'm happy to go with them.

We can simply do a const char *target_get_virtio_net_type(void) in arch_init.c.

Not pretty, but we can later fix the -drive/-net calls to not require this.

Regards,

Anthony Liguori

>
>
> Alex
>
>
Alexander Graf Feb. 2, 2012, 8:31 p.m. UTC | #6
On 02.02.2012, at 21:03, Anthony Liguori wrote:

> On 02/02/2012 01:07 PM, Alexander Graf wrote:
>> 
>> On 02.02.2012, at 20:01, Anthony Liguori wrote:
>> 
>>> On 02/02/2012 11:29 AM, Paolo Bonzini wrote:
>>>> On 02/02/2012 06:03 PM, Anthony Liguori wrote:
>>>>>> 
>>>>> 
>>>>> Is this still needed with qom-upstream.14? I fixed a bug on .14 that
>>>>> involved child properties that was making device-del sometimes fail.
>>>> 
>>>> Not sure, I tried with .13 but, from the look of it, it should still be there.
>>>> Regarding the .13->.14 diff:
>>>> 
>>>> - you need QTAILQ_FOREACH_SAFE in object_property_del_child.
>>> 
>>> Ack.
>>> 
>>>> 
>>>> - you need to check for the existence of the non-aliased name when accessing the
>>>> alias table, because s390 does not have PCI.
>>> 
>>> I don't think that's the right strategy as it means that s390 only works if we don't include the PCI objects in the build (regardless of whether it uses PCI).  This would be defeated if/when we move to having all device objects in a single shared library used by all of the qemu executables.
>>> 
>>> I'd prefer to just drop the aliases for s390.  I don't see a lot of value in it and I don't think there are tons of s390 users that will be affected.
>> 
>> The reason for the aliases is to make -drive and -net work. If you have alternatives to aliases there, I'm happy to go with them.
> 
> We can simply do a const char *target_get_virtio_net_type(void) in arch_init.c.
> 
> Not pretty, but we can later fix the -drive/-net calls to not require this.

Anything that works. The only reason to have the aliases for me really was to not have target awareness in -drive and -net. So if you're feeling better with an arch callback, I'm definitely fine with that too.

Alex
Anthony Liguori Feb. 3, 2012, 2:27 p.m. UTC | #7
On 02/02/2012 10:45 AM, Paolo Bonzini wrote:
> The reference that is returned by qdev_device_add is never given
> back, so that device_del does not cause the refcount to go to zero
> (and thus does nothing).
>
> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>

This isn't needed in qom-upstream.14.  Here's why:

object_init does not increase the reference count

object_property_add_child increases the reference count
object_new increases the reference count

object_delete decrements the reference count
object_property_del_child decreases the reference count

object_delete calls object_property_del_child(obj->parent, obj)

qdev_device_add calls object_new and object_property_add_child
  -> ref == 2

qdev_device_del calls object_delete
  -> ref -= 2

In qom-upstream.13, object_delete wasn't calling object_property_del_child which 
is why you saw the behavior you did.  This problem would still exist with a 
composed device so dropping the reference here isn't enough.

Regards,

Anthony Liguori

> ---
>   vl.c |    1 +
>   1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/vl.c b/vl.c
> index d88a18c..c63af69 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1746,6 +1746,7 @@ static int device_init_func(QemuOpts *opts, void *opaque)
>       dev = qdev_device_add(opts);
>       if (!dev)
>           return -1;
> +    object_unref(OBJECT(dev));
>       return 0;
>   }
>
Paolo Bonzini Feb. 4, 2012, 12:27 a.m. UTC | #8
On 02/03/2012 03:27 PM, Anthony Liguori wrote:
> On 02/02/2012 10:45 AM, Paolo Bonzini wrote:
>> The reference that is returned by qdev_device_add is never given
>> back, so that device_del does not cause the refcount to go to zero
>> (and thus does nothing).
>>
>> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
>
> This isn't needed in qom-upstream.14. Here's why:
>
> object_init does not increase the reference count
>
> object_property_add_child increases the reference count
> object_new increases the reference count
>
> object_delete decrements the reference count
> object_property_del_child decreases the reference count
>
> object_delete calls object_property_del_child(obj->parent, obj)
>
> qdev_device_add calls object_new and object_property_add_child
> -> ref == 2
>
> qdev_device_del calls object_delete
> -> ref -= 2
>
> In qom-upstream.13, object_delete wasn't calling
> object_property_del_child which is why you saw the behavior you did.
> This problem would still exist with a composed device so dropping the
> reference here isn't enough.

I trust you for now. :)

It really seems like my patch is obviously correct so, if it's not 
needed anymore there may be another bug elsewhere that masks it.

Paolo
Anthony Liguori Feb. 4, 2012, 3:03 a.m. UTC | #9
On 02/03/2012 06:27 PM, Paolo Bonzini wrote:
> On 02/03/2012 03:27 PM, Anthony Liguori wrote:
>> On 02/02/2012 10:45 AM, Paolo Bonzini wrote:
>>> The reference that is returned by qdev_device_add is never given
>>> back, so that device_del does not cause the refcount to go to zero
>>> (and thus does nothing).
>>>
>>> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
>>
>> This isn't needed in qom-upstream.14. Here's why:
>>
>> object_init does not increase the reference count
>>
>> object_property_add_child increases the reference count
>> object_new increases the reference count
>>
>> object_delete decrements the reference count
>> object_property_del_child decreases the reference count
>>
>> object_delete calls object_property_del_child(obj->parent, obj)
>>
>> qdev_device_add calls object_new and object_property_add_child
>> -> ref == 2
>>
>> qdev_device_del calls object_delete
>> -> ref -= 2
>>
>> In qom-upstream.13, object_delete wasn't calling
>> object_property_del_child which is why you saw the behavior you did.
>> This problem would still exist with a composed device so dropping the
>> reference here isn't enough.
>
> I trust you for now. :)
>
> It really seems like my patch is obviously correct so, if it's not needed
> anymore there may be another bug elsewhere that masks it.

There's no object_ref() in qdev_device_add().  The 2 references come from adding 
a child link to /peripheral and via object_new().

object_free() drops a reference (it's called in qdev_device_del()) and in the 
process of calling object_free(), it also calls object_unparent() which will 
drop the reference from the parent.

I'm not thrilled about the way reference counting is done now.  Perhaps we 
should do a gobject style floating reference...

Regards,

Anthony Liguori

>
> Paolo
>
>
Paolo Bonzini Feb. 4, 2012, 6:51 a.m. UTC | #10
On 02/04/2012 04:03 AM, Anthony Liguori wrote:
> There's no object_ref() in qdev_device_add().  The 2 references come
> from adding a child link to /peripheral and via object_new().

Sure, but there's when the object_new() reference becomes unreachable. 
At this point, if it weren't for /peripheral the device should have 
disappeared.

> object_free() drops a reference (it's called in qdev_device_del()) and
> in the process of calling object_free(), it also calls object_unparent()
> which will drop the reference from the parent.
>
> I'm not thrilled about the way reference counting is done now.  Perhaps
> we should do a gobject style floating reference...

I'm not sure that's a problem.  Rather, the problem is that we are 
(still) mixing manual memory management and refcounting by making 
object_delete drop a reference.

Can you remind me of why you have object_unref separate from 
object_delete?  Is it because you must not delete objects that were 
object_initialize'd rather than object_new'd?  Perhaps we can take care 
of that with a flag elsewhere saying "do not free this object when 
object_unref drops the last ref" (only finalize it).

Thanks for analyzing the behavior.  We don't have to get it right 
immediately as long as we know what's going on, the transition is not 
complete anyway.

Paolo
Anthony Liguori Feb. 4, 2012, 5:13 p.m. UTC | #11
On 02/04/2012 12:51 AM, Paolo Bonzini wrote:
> On 02/04/2012 04:03 AM, Anthony Liguori wrote:
>> There's no object_ref() in qdev_device_add(). The 2 references come
>> from adding a child link to /peripheral and via object_new().
>
> Sure, but there's when the object_new() reference becomes unreachable. At this
> point, if it weren't for /peripheral the device should have disappeared.
>
>> object_free() drops a reference (it's called in qdev_device_del()) and
>> in the process of calling object_free(), it also calls object_unparent()
>> which will drop the reference from the parent.
>>
>> I'm not thrilled about the way reference counting is done now. Perhaps
>> we should do a gobject style floating reference...
>
> I'm not sure that's a problem. Rather, the problem is that we are (still) mixing
> manual memory management and refcounting by making object_delete drop a reference.
>
> Can you remind me of why you have object_unref separate from object_delete? Is
> it because you must not delete objects that were object_initialize'd rather than
> object_new'd? Perhaps we can take care of that with a flag elsewhere saying "do
> not free this object when object_unref drops the last ref" (only finalize it).

I really didn't want to bake allocation into the type interface.  I think 
there's another more robust way to handle this.

We need signal support.  I'd really like to have generic signals that showed up 
a signal<> property type, that could also be registered over QMP, but in the 
interim, we can probably just an ad-hoc NotifierList and do something better later.

Anyway, we should have a "destroy" signal and a "delete" signal.  destroy is 
fired when object_finalize() is called at the very beginning of the function. 
delete is fired at the very end of object_finalize().

We would change object_delete() to call object_unref and then object_finalize(). 
  We would connect a "delete" event in object_new() that would free the memory.

This would also allow object_property_add_child() to connect to the "destroy" 
event such that it could remove the reference it holds on the child.

I'll work up a patch.  I've got some ideas about how to do generic signals too 
but I'd prefer to wait to introduce all of that.

Regards,

Anthony Liguori


> Thanks for analyzing the behavior. We don't have to get it right immediately as
> long as we know what's going on, the transition is not complete anyway.
>
> Paolo
>
diff mbox

Patch

diff --git a/vl.c b/vl.c
index d88a18c..c63af69 100644
--- a/vl.c
+++ b/vl.c
@@ -1746,6 +1746,7 @@  static int device_init_func(QemuOpts *opts, void *opaque)
     dev = qdev_device_add(opts);
     if (!dev)
         return -1;
+    object_unref(OBJECT(dev));
     return 0;
 }