Patchwork qdev: Reset hotplugged devices

Submitter Anthony Liguori
Date Aug. 20, 2010, 6:12 p.m.
Message ID <4C6EC5AA.6050502@codemonkey.ws>
Permalink /patch/62300/
State New

Comments

Anthony Liguori - Aug. 20, 2010, 6:12 p.m.
On 08/20/2010 11:14 AM, Markus Armbruster wrote:
>> The real problem is how we do reset.  We shouldn't register a reset
>> handler for every qdev device but rather register a single reset
>> handler that walks the device tree and calls reset on every reachable
>> device.
>>
>> Then we can always call reset in init() and there's no need to have a
>> dev->hotplugged check.  The qdev device tree reset handler should not
>> be registered until *after* we call qemu_system_reset() after creating
>> the device model which will ensure that we don't do a double reset.
>>      
> Fine with me.
>
> But we need to merge something short term (pre 0.13) to fix hot plug of
> e1000 et al.  Use Alex's patch as such a stop-gap?
>    

No, we're accumulating crud in base qdev at an alarming rate.  It's 
important to fix these things now before it gets prohibitively hard to 
take care of.

Can you and Alex review/try the following patch?  It seems to work for 
me although I'm not sure how to trigger the original bug.

Regards,

Anthony Liguori
Alex Williamson - Aug. 20, 2010, 10:05 p.m.
On Fri, 2010-08-20 at 13:12 -0500, Anthony Liguori wrote:
> On 08/20/2010 11:14 AM, Markus Armbruster wrote:
> >> The real problem is how we do reset.  We shouldn't register a reset
> >> handler for every qdev device but rather register a single reset
> >> handler that walks the device tree and calls reset on every reachable
> >> device.
> >>
> >> Then we can always call reset in init() and there's no need to have a
> >> dev->hotplugged check.  The qdev device tree reset handler should not
> >> be registered until *after* we call qemu_system_reset() after creating
> >> the device model which will ensure that we don't do a double reset.
> >>      
> > Fine with me.
> >
> > But we need to merge something short term (pre 0.13) to fix hot plug of
> > e1000 et al.  Use Alex's patch as such a stop-gap?
> >    
> 
> No, we're accumulating crud in base qdev at an alarming rate.  It's 
> important to fix these things now before it gets prohibitively hard to 
> take care of.
> 
> Can you and Alex review/try the following patch?  It seems to work for 
> me although I'm not sure how to trigger the original bug.

Yep, that works.  The test is simply to hot add an e1000.  Much of the
register state is set up in the reset function, so the guest won't be able
to make use of the device unless reset is called somewhere along the
way.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
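
To illustrate why a hot-added device is unusable without a reset, here is a
minimal sketch of the init-then-reset ordering that the patch moves into
qdev_init().  ToyDevice, toy_init(), toy_reset() and TOY_STATUS_READY are
hypothetical stand-ins invented for this example; they are not QEMU APIs and
the register value is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative register value only; not taken from any real device. */
#define TOY_STATUS_READY 0x80000002u

/* Hypothetical device whose register defaults are written by reset(). */
typedef struct ToyDevice ToyDevice;
struct ToyDevice {
    uint32_t status;                /* register populated by reset() */
    int initialized;
    void (*reset)(ToyDevice *dev);  /* optional, may be NULL */
};

void toy_reset(ToyDevice *dev)
{
    dev->status = TOY_STATUS_READY;
}

/* With reset_in_init != 0 this follows the shape of the patched init path:
 * mark the device initialized, then reset it if it has a reset hook.
 * With reset_in_init == 0 it models relying on a later system reset. */
int toy_init(ToyDevice *dev, int reset_in_init)
{
    assert(dev != NULL);
    dev->initialized = 1;
    if (reset_in_init && dev->reset) {
        dev->reset(dev);
    }
    return 0;
}

/* Simulates a hot add: no system-wide reset runs after init. */
uint32_t toy_hot_add_status(int reset_in_init)
{
    ToyDevice dev = { 0, 0, toy_reset };

    toy_init(&dev, reset_in_init);
    return dev.status;
}
```

In this sketch, toy_hot_add_status(0) leaves the status register at zero,
while toy_hot_add_status(1) leaves it at TOY_STATUS_READY, which is the
difference the hot-add test is meant to expose.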
Markus Armbruster - Aug. 21, 2010, 10:07 a.m.
Anthony Liguori <anthony@codemonkey.ws> writes:

> On 08/20/2010 11:14 AM, Markus Armbruster wrote:
>>> The real problem is how we do reset.  We shouldn't register a reset
>>> handler for every qdev device but rather register a single reset
>>> handler that walks the device tree and calls reset on every reachable
>>> device.
>>>
>>> Then we can always call reset in init() and there's no need to have a
>>> dev->hotplugged check.  The qdev device tree reset handler should not
>>> be registered until *after* we call qemu_system_reset() after creating
>>> the device model which will ensure that we don't do a double reset.
>>>      
>> Fine with me.
>>
>> But we need to merge something short term (pre 0.13) to fix hot plug of
>> e1000 et al.  Use Alex's patch as such a stop-gap?
>>    
>
> No, we're accumulating crud in base qdev at an alarming rate.  It's
> important to fix these things now before it gets prohibitively hard to
> take care of.
>
> Can you and Alex review/try the following patch?  It seems to work for
> me although I'm not sure how to trigger the original bug.
>
> Regards,
>
> Anthony Liguori

Looks good to me, except I dislike one little thing:

> From df719f1cc6ae2cd430e1cc47896a13d25af81e67 Mon Sep 17 00:00:00 2001
> From: Anthony Liguori <aliguori@us.ibm.com>
> Date: Fri, 20 Aug 2010 13:06:22 -0500
> Subject: [PATCH] qdev: fix reset with hotplug
>
> Devices expect to be reset after being initialized.  Today, we achieve this by
> registering a reset handler in each qdev device.  We then rely on this reset
> handler getting called after device init but before CPU execution runs.
>
> Since hot plug results in a device being initialized outside of the normal
> system reset, things go badly today.
>
> This patch changes the reset handling so that qdev has no knowledge of the
> global system reset.  Instead, qdev devices are reset after initialization and
> then a new bus level function is introduced that allows all devices on the bus
> to be reset using a depth-first traversal.
>
> We still need to do a system_reset before CPU init to preserve behavior of
> non-qdev devices so we make sure to register the qdev-based reset handler after
> that reset.
>
> N.B. we have to expose the implicit system bus because we have various hacks
> that result in an implicit system bus existing.  Instead, we ought to have an
> explicitly created system bus that we can trigger reset from.  That's a topic
> for a future patch though.
>
> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
>
> diff --git a/hw/qdev.c b/hw/qdev.c
> index e99c73f..dfd91d7 100644
> --- a/hw/qdev.c
> +++ b/hw/qdev.c
[...]
> +BusState *sysbus_get_default(void)
> +{
> +    return main_system_bus;
> +}
> +
> +void qbus_reset_all(BusState *bus)
> +{
> +    qbus_walk_children(bus, qdev_reset_one, NULL);
> +}
> +
>  /* can be used as ->unplug() callback for the simple cases */
>  int qdev_simple_unplug_cb(DeviceState *dev)
>  {
[...]
> diff --git a/vl.c b/vl.c
> index b3e3676..5de1688 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -2968,6 +2968,9 @@ int main(int argc, char **argv, char **envp)
>      }
>  
>      qemu_system_reset();
> +
> +    qemu_register_reset((void *)qbus_reset_all, sysbus_get_default());
> +

This is inconsistent with qdev_create().  qdev_create() uses null.

I agree with the N.B. in your commit message: the root of the tree
should be explicit.  Implicit is too much magic.  But you create a
second kind of magic.  I don't object to how that works, only to having
two different kinds.

I'd suggest you either make your qbus_reset_all() work like existing
qdev_create(), i.e. null means root.  Or change qdev_create() to work
like your qbus_reset_all(), i.e. use sysbus_get_default() instead of
null.
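
For reference, the first option (null means root) could look roughly like the
sketch below.  ToyBus, toy_sysbus_get_default(), toy_resolve_bus() and
toy_reset_all() are hypothetical names made up for this illustration; the
real functions in hw/qdev.c have different types and do more work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bus; counts how often it was reset. */
typedef struct ToyBus {
    const char *name;
    int resets;
} ToyBus;

/* Plays the role of the implicit root bus in this sketch. */
ToyBus toy_main_system_bus = { "main-system-bus", 0 };

ToyBus *toy_sysbus_get_default(void)
{
    return &toy_main_system_bus;
}

/* "NULL means root": map a NULL bus argument to the default bus in one
 * place, so every caller shares the same convention. */
ToyBus *toy_resolve_bus(ToyBus *bus)
{
    return bus ? bus : toy_sysbus_get_default();
}

/* Reset entry point that accepts either an explicit bus or NULL. */
void toy_reset_all(ToyBus *bus)
{
    bus = toy_resolve_bus(bus);
    assert(bus != NULL);
    bus->resets++;
}
```

The second option would drop toy_resolve_bus() and have every caller pass
toy_sysbus_get_default() explicitly, which is easier to grep for but touches
more call sites.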

>      if (loadvm) {
>          if (load_vmstate(loadvm) < 0) {
>              autostart = 0;
Anthony Liguori - Aug. 21, 2010, 3:19 p.m.
On 08/21/2010 05:07 AM, Markus Armbruster wrote:
>> diff --git a/vl.c b/vl.c
>> index b3e3676..5de1688 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -2968,6 +2968,9 @@ int main(int argc, char **argv, char **envp)
>>       }
>>
>>       qemu_system_reset();
>> +
>> +    qemu_register_reset((void *)qbus_reset_all, sysbus_get_default());
>> +
>>      
> This is inconsistent with qdev_create().  qdev_create() uses null.
>
> I agree with the N.B. in your commit message: the root of the tree
> should be explicit.  Implicit is too much magic.  But you create a
> second kind of magic.  I don't object to how that works, only to having
> two different kinds.
>
> I'd suggest you either make your qbus_reset_all() work like existing
> qdev_create(), i.e. null means root.  Or change qdev_create() to work
> like your qbus_reset_all(), i.e. use sysbus_get_default() instead of
> null.
>    

I'm getting rid of the NULL crap too, although it's lower on my qdev
TODO.  sysbus_get_default() is a heck of a lot easier to grep for than
NULL, though, so I'd prefer to use this for now.

Regards,

Anthony Liguori

>>       if (loadvm) {
>>           if (load_vmstate(loadvm) < 0) {
>>               autostart = 0;
>>
Paolo Bonzini - Aug. 23, 2010, 11:25 a.m.
On 08/20/2010 08:12 PM, Anthony Liguori wrote:
> +/* Returns false to terminate walk; true to continue */
> +typedef int (qdev_walkerfn)(DeviceState *dev, void *opaque);
> +

Since you're introducing qbus_walk_children, I suggest a different 
interface: qdev_walkerfn should return 0 to walk children, -1 to skip 
walking children, and anything else to terminate walk.  If anything ever 
returns x > 0, qbus_walk_children returns that x, else 
qbus_walk_children returns 0.  This interface is inspired by a similar 
one in GCC and it works well.

If you don't want to introduce the full complication, removing the "-1
to skip walking children" part would still give the same flexibility WRT
the return values, which is the important part.

Paolo
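
The return-value convention proposed above can be sketched as follows.  This
is only an illustration on a plain tree: ToyNode, toy_walk() and the two
walkers are hypothetical, whereas the real qbus_walk_children() alternates
between buses and devices.  The sketch assumes walkers return only 0, -1 or a
positive value; it treats any other negative value like -1.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 4

/* Hypothetical tree node; children[] is NULL-terminated unless full. */
typedef struct ToyNode ToyNode;
struct ToyNode {
    int id;
    ToyNode *children[TOY_MAX_CHILDREN];
};

/* Walker contract in this sketch:
 *    0  -> descend into the node's children
 *   -1  -> skip the node's children, continue with its siblings
 *   >0  -> stop the whole walk; toy_walk() returns that value */
typedef int toy_walkerfn(ToyNode *node, void *opaque);

int toy_walk(ToyNode *node, toy_walkerfn *walker, void *opaque)
{
    size_t i;
    int rc;

    assert(node != NULL && walker != NULL);
    rc = walker(node, opaque);
    if (rc > 0) {
        return rc;              /* terminate and propagate the value */
    }
    if (rc == 0) {
        for (i = 0; i < TOY_MAX_CHILDREN && node->children[i]; i++) {
            rc = toy_walk(node->children[i], walker, opaque);
            if (rc != 0) {
                return rc;      /* a descendant terminated the walk */
            }
        }
    }
    return 0;                   /* completed, or subtree skipped */
}

/* Example tree: 1 -> { 2 -> { 4 }, 3 } */
ToyNode toy_n4 = { 4, { NULL } };
ToyNode toy_n3 = { 3, { NULL } };
ToyNode toy_n2 = { 2, { &toy_n4 } };
ToyNode toy_root = { 1, { &toy_n2, &toy_n3 } };

/* Counts visited nodes but does not descend below node 2. */
int toy_count_skipping_2(ToyNode *node, void *opaque)
{
    int *count = opaque;

    (*count)++;
    return node->id == 2 ? -1 : 0;
}

/* Stops the walk at the node with the wanted id and returns that id. */
int toy_find_id(ToyNode *node, void *opaque)
{
    const int *wanted = opaque;

    return node->id == *wanted ? node->id : 0;
}
```

With this convention a search walker can hand its result back through the
return value, while a walker such as a reset pass simply returns 0 for every
node.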
Anthony Liguori - Aug. 23, 2010, 1:27 p.m.
On 08/23/2010 06:25 AM, Paolo Bonzini wrote:
> On 08/20/2010 08:12 PM, Anthony Liguori wrote:
>> +/* Returns false to terminate walk; true to continue */
>> +typedef int (qdev_walkerfn)(DeviceState *dev, void *opaque);
>> +
>
> Since you're introducing qbus_walk_children, I suggest a different 
> interface: qdev_walkerfn should return 0 to walk children, -1 to skip 
> walking children, and anything else to terminate walk.  If anything 
> ever returns x > 0, qbus_walk_children returns that x, else 
> qbus_walk_children returns 0.  This interface is inspired by a similar 
> one in GCC and it works well.

Good suggestion.

Regards,

Anthony Liguori

> If you don't want to introduce the full complication, removing the "-1
> to skip walking children" part would still give the same flexibility
> WRT the return values, which is the important part.
>
> Paolo

Patch

From df719f1cc6ae2cd430e1cc47896a13d25af81e67 Mon Sep 17 00:00:00 2001
From: Anthony Liguori <aliguori@us.ibm.com>
Date: Fri, 20 Aug 2010 13:06:22 -0500
Subject: [PATCH] qdev: fix reset with hotplug

Devices expect to be reset after being initialized.  Today, we achieve this by
registering a reset handler in each qdev device.  We then rely on this reset
handler getting called after device init but before CPU execution runs.

Since hot plug results in a device being initialized outside of the normal
system reset, things go badly today.

This patch changes the reset handling so that qdev has no knowledge of the
global system reset.  Instead, qdev devices are reset after initialization and
then a new bus level function is introduced that allows all devices on the bus
to be reset using a depth-first traversal.

We still need to do a system_reset before CPU init to preserve behavior of
non-qdev devices so we make sure to register the qdev-based reset handler after
that reset.

N.B. we have to expose the implicit system bus because we have various hacks
that result in an implicit system bus existing.  Instead, we ought to have an
explicitly created system bus that we can trigger reset from.  That's a topic
for a future patch though.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

diff --git a/hw/qdev.c b/hw/qdev.c
index e99c73f..dfd91d7 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
@@ -256,13 +256,6 @@  DeviceState *qdev_device_add(QemuOpts *opts)
     return qdev;
 }
 
-static void qdev_reset(void *opaque)
-{
-    DeviceState *dev = opaque;
-    if (dev->info->reset)
-        dev->info->reset(dev);
-}
-
 /* Initialize a device.  Device properties should be set before calling
    this function.  IRQs and MMIO regions should be connected/mapped after
    calling this function.
@@ -278,13 +271,15 @@  int qdev_init(DeviceState *dev)
         qdev_free(dev);
         return rc;
     }
-    qemu_register_reset(qdev_reset, dev);
     if (dev->info->vmsd) {
         vmstate_register_with_alias_id(dev, -1, dev->info->vmsd, dev,
                                        dev->instance_id_alias,
                                        dev->alias_required_for_version);
     }
     dev->state = DEV_STATE_INITIALIZED;
+    if (dev->info->reset) {
+        dev->info->reset(dev);
+    }
     return 0;
 }
 
@@ -307,6 +302,25 @@  int qdev_unplug(DeviceState *dev)
     return dev->info->unplug(dev);
 }
 
+static int qdev_reset_one(DeviceState *dev, void *opaque)
+{
+    if (dev->info->reset) {
+        dev->info->reset(dev);
+    }
+
+    return 1;
+}
+
+BusState *sysbus_get_default(void)
+{
+    return main_system_bus;
+}
+
+void qbus_reset_all(BusState *bus)
+{
+    qbus_walk_children(bus, qdev_reset_one, NULL);
+}
+
 /* can be used as ->unplug() callback for the simple cases */
 int qdev_simple_unplug_cb(DeviceState *dev)
 {
@@ -350,7 +364,6 @@  void qdev_free(DeviceState *dev)
         if (dev->opts)
             qemu_opts_del(dev->opts);
     }
-    qemu_unregister_reset(qdev_reset, dev);
     QLIST_REMOVE(dev, sibling);
     for (prop = dev->info->props; prop && prop->name; prop++) {
         if (prop->info->free) {
@@ -448,6 +461,27 @@  BusState *qdev_get_child_bus(DeviceState *dev, const char *name)
     return NULL;
 }
 
+int qbus_walk_children(BusState *bus, qdev_walkerfn *walker, void *opaque)
+{
+    DeviceState *dev;
+
+    QLIST_FOREACH(dev, &bus->children, sibling) {
+        BusState *child;
+
+        if (!walker(dev, opaque)) {
+            return 0;
+        }
+
+        QLIST_FOREACH(child, &dev->child_bus, sibling) {
+            if (!qbus_walk_children(child, walker, opaque)) {
+                return 0;
+            }
+        }
+    }
+
+    return 1;
+}
+
 static BusState *qbus_find_recursive(BusState *bus, const char *name,
                                      const BusInfo *info)
 {
diff --git a/hw/qdev.h b/hw/qdev.h
index 678f8b7..1e5f983 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -174,13 +174,21 @@  BusState *qdev_get_parent_bus(DeviceState *dev);
 
 /*** BUS API. ***/
 
+/* Returns false to terminate walk; true to continue */
+typedef int (qdev_walkerfn)(DeviceState *dev, void *opaque);
+
 void qbus_create_inplace(BusState *bus, BusInfo *info,
                          DeviceState *parent, const char *name);
 BusState *qbus_create(BusInfo *info, DeviceState *parent, const char *name);
+int qbus_walk_children(BusState *bus, qdev_walkerfn *walker, void *opaque);
+void qbus_reset_all(BusState *bus);
 void qbus_free(BusState *bus);
 
 #define FROM_QBUS(type, dev) DO_UPCAST(type, qbus, dev)
 
+/* This should go away once we get rid of the NULL bus hack */
+BusState *sysbus_get_default(void);
+
 /*** monitor commands ***/
 
 void do_info_qtree(Monitor *mon);
diff --git a/vl.c b/vl.c
index b3e3676..5de1688 100644
--- a/vl.c
+++ b/vl.c
@@ -2968,6 +2968,9 @@  int main(int argc, char **argv, char **envp)
     }
 
     qemu_system_reset();
+
+    qemu_register_reset((void *)qbus_reset_all, sysbus_get_default());
+
     if (loadvm) {
         if (load_vmstate(loadvm) < 0) {
             autostart = 0;
-- 
1.7.0.4