Message ID | 1436460692-5142-2-git-send-email-cornelia.huck@de.ibm.com |
---|---|
State | New |
Headers | show |
Am 09.07.2015 um 18:51 schrieb Cornelia Huck: > Devices that don't live on a bus aren't caught by the normal device > reset logic. Let's register a reset handler for those devices during > device realization that calls the reset handler for the associated > device class. > > Suggested-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> > Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> > --- > hw/core/qdev.c | 15 +++++++++++++++ > 1 file changed, 15 insertions(+) Looks acceptable as solution for 2.4... However, I would like to understand why the parent device cannot reset it? Someone created that device in the first place. I would prefer you call device_reset() from there without this generic patch. A similar discussion took place for the x86 APIC. Regards, Andreas
On Thu, Jul 9, 2015 at 9:59 AM, Andreas Färber <afaerber@suse.de> wrote: > Am 09.07.2015 um 18:51 schrieb Cornelia Huck: >> Devices that don't live on a bus aren't caught by the normal device >> reset logic. Let's register a reset handler for those devices during >> device realization that calls the reset handler for the associated >> device class. >> >> Suggested-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> >> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> >> --- >> hw/core/qdev.c | 15 +++++++++++++++ >> 1 file changed, 15 insertions(+) > > Looks acceptable as solution for 2.4... However, I would like to > understand why the parent device cannot reset it? Someone created that > device in the first place. I would prefer you call device_reset() from > there without this generic patch. > So it is entirely possible for a device to be completely orphaned if it neither fits in a bus or a container. In that case I guess the container would be /machine. Particularly tricky for hotplug scenarios. I don't think we wan't concrete-class containers having to do explicit child resets and any generalization of that in TYPE_DEVICE is not too far removed from this patch. Any reason to not just rip though the whole QOM tree and reset every device regardless of bus/containuer arch? IIRC the semantics for ->reset (and at least devices_reset()) are "pull the plug out of wall and put it back in again" so there is currently supposed to be no scope for controlling reset events. This is a good lower risk solution that handled to obvious degenerate case of orphaned devices. Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> Regards, Peter > A similar discussion took place for the x86 APIC. > > Regards, > Andreas > > -- > SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany > GF: Felix Imendörffer, Jane Smithard, Dilip Upmanyu, Graham Norton; HRB > 21284 (AG Nürnberg) >
Am 09.07.2015 um 18:51 schrieb Cornelia Huck: > Devices that don't live on a bus aren't caught by the normal device > reset logic. Let's register a reset handler for those devices during > device realization that calls the reset handler for the associated > device class. > > Suggested-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> > Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> reboot (from within guest) and external reset (system_reset in monitor) now work fine with the s390 watchdog. Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> > --- > hw/core/qdev.c | 15 +++++++++++++++ > 1 file changed, 15 insertions(+) > > diff --git a/hw/core/qdev.c b/hw/core/qdev.c > index b2f404a..5c7c27b 100644 > --- a/hw/core/qdev.c > +++ b/hw/core/qdev.c > @@ -1018,6 +1018,13 @@ static bool device_get_realized(Object *obj, Error **errp) > return dev->realized; > } > > +static void do_device_reset(void *opaque) > +{ > + DeviceState *dev = opaque; > + > + device_reset(dev); > +} > + > static void device_set_realized(Object *obj, bool value, Error **errp) > { > DeviceState *dev = DEVICE(obj); > @@ -1061,6 +1068,11 @@ static void device_set_realized(Object *obj, bool value, Error **errp) > goto post_realize_fail; > } > > + if (!dev->parent_bus) { > + /* Make sure that reset is called for bus-less devices. */ > + qemu_register_reset(do_device_reset, dev); > + } > + > if (qdev_get_vmsd(dev)) { > vmstate_register_with_alias_id(dev, -1, qdev_get_vmsd(dev), dev, > dev->instance_id_alias, > @@ -1094,6 +1106,9 @@ static void device_set_realized(Object *obj, bool value, Error **errp) > } > dev->pending_deleted_event = true; > DEVICE_LISTENER_CALL(unrealize, Reverse, dev); > + if (!dev->parent_bus) { > + qemu_unregister_reset(do_device_reset, dev); > + } > } > > if (local_err != NULL) { >
On Mon, 13 Jul 2015 14:22:05 +0200 Christian Borntraeger <borntraeger@de.ibm.com> wrote: > Am 09.07.2015 um 18:51 schrieb Cornelia Huck: > > Devices that don't live on a bus aren't caught by the normal device > > reset logic. Let's register a reset handler for those devices during > > device realization that calls the reset handler for the associated > > device class. > > > > Suggested-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> > > Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> > reboot (from within guest) and external reset (system_reset in monitor) > now work fine with the s390 watchdog. > > Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> > > > --- > > hw/core/qdev.c | 15 +++++++++++++++ > > 1 file changed, 15 insertions(+) Thanks. Any objections against taking this through s390-next? I'd like to fix diag288 reset (+ that annoying migration regession) for 2.4-rc1 and send a pull request soon.
Am 13.07.2015 um 16:11 schrieb Cornelia Huck: > On Mon, 13 Jul 2015 14:22:05 +0200 > Christian Borntraeger <borntraeger@de.ibm.com> wrote: > >> Am 09.07.2015 um 18:51 schrieb Cornelia Huck: >>> Devices that don't live on a bus aren't caught by the normal device >>> reset logic. Let's register a reset handler for those devices during >>> device realization that calls the reset handler for the associated >>> device class. >>> >>> Suggested-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> >>> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> >> reboot (from within guest) and external reset (system_reset in monitor) >> now work fine with the s390 watchdog. >> >> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> >> >>> --- >>> hw/core/qdev.c | 15 +++++++++++++++ >>> 1 file changed, 15 insertions(+) > > Thanks. > > Any objections against taking this through s390-next? I'd like to fix > diag288 reset (+ that annoying migration regession) for 2.4-rc1 and > send a pull request soon. Which device does this fix (only this diag88?), and is it really not possible to register a reset handler where it's being created? Peter C.'s theory does not match practice for x86, and this patch will lead to bus-less devices that are properly being reset by their parent getting reset twice, potentially causing issues due to qemu_irqs. I'd rather avoid that. One workaround would be to amend this patch with a DeviceClass flag for whether to enable this new behavior, defaulting to no and getting overridden by your affected device. Regards, Andreas
On 09/07/2015 20:53, Peter Crosthwaite wrote: > > Any reason to not just rip though the whole QOM tree and reset every > device regardless of bus/containuer arch? IIRC the semantics for > ->reset (and at least devices_reset()) are "pull the plug out of wall > and put it back in again" so there is currently supposed to be no > scope for controlling reset events. Most bus ->reset methods expect the children to be reset before the parent (postorder). At least SCSI and PCI do. Paolo
Am 13.07.2015 um 16:20 schrieb Andreas Färber: > Am 13.07.2015 um 16:11 schrieb Cornelia Huck: >> On Mon, 13 Jul 2015 14:22:05 +0200 >> Christian Borntraeger <borntraeger@de.ibm.com> wrote: >> >>> Am 09.07.2015 um 18:51 schrieb Cornelia Huck: >>>> Devices that don't live on a bus aren't caught by the normal device >>>> reset logic. Let's register a reset handler for those devices during >>>> device realization that calls the reset handler for the associated >>>> device class. >>>> >>>> Suggested-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> >>>> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> >>> reboot (from within guest) and external reset (system_reset in monitor) >>> now work fine with the s390 watchdog. >>> >>> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> >>> >>>> --- >>>> hw/core/qdev.c | 15 +++++++++++++++ >>>> 1 file changed, 15 insertions(+) >> >> Thanks. >> >> Any objections against taking this through s390-next? I'd like to fix >> diag288 reset (+ that annoying migration regession) for 2.4-rc1 and >> send a pull request soon. > > Which device does this fix (only this diag88?), and is it really not > possible to register a reset handler where it's being created? A sysbus device reset is also not registered or called by its parent. It is resetted by the generic qdev handler walking all children (qdev_reset_all and qbus_reset_all), no? Christian
On Mon, 13 Jul 2015 16:20:09 +0200 Andreas Färber <afaerber@suse.de> wrote: > Am 13.07.2015 um 16:11 schrieb Cornelia Huck: > > On Mon, 13 Jul 2015 14:22:05 +0200 > > Christian Borntraeger <borntraeger@de.ibm.com> wrote: > > > >> Am 09.07.2015 um 18:51 schrieb Cornelia Huck: > >>> Devices that don't live on a bus aren't caught by the normal device > >>> reset logic. Let's register a reset handler for those devices during > >>> device realization that calls the reset handler for the associated > >>> device class. > >>> > >>> Suggested-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> > >>> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> > >> reboot (from within guest) and external reset (system_reset in monitor) > >> now work fine with the s390 watchdog. > >> > >> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> > >> > >>> --- > >>> hw/core/qdev.c | 15 +++++++++++++++ > >>> 1 file changed, 15 insertions(+) > > > > Thanks. > > > > Any objections against taking this through s390-next? I'd like to fix > > diag288 reset (+ that annoying migration regession) for 2.4-rc1 and > > send a pull request soon. > > Which device does this fix (only this diag88?), and is it really not > possible to register a reset handler where it's being created? The original patch did this (<1436259202-20509-2-git-send-email-cornelia.huck@de.ibm.com>). Peter C. suspected NAND may also be affected (<CAEgOgz6Fa5TSApDrj9iHA+2joDt1yBg6amCAp-634Bbe5bgNvA@mail.gmail.com>). > > Peter C.'s theory does not match practice for x86, and this patch will > lead to bus-less devices that are properly being reset by their parent > getting reset twice, potentially causing issues due to qemu_irqs. I'd > rather avoid that. Introducing new bugs is not something I want to do. Are double resets a problem in practice, though? > > One workaround would be to amend this patch with a DeviceClass flag for > whether to enable this new behavior, defaulting to no and getting > overridden by your affected device. That is still something that can be easily missed. Providing a reset handler that isn't called until you do some magical incantations is somewhat surprising. But in the end, what I care about is that resetting diag288 works with 2.4 :)
Am 13.07.2015 um 16:30 schrieb Christian Borntraeger: > Am 13.07.2015 um 16:20 schrieb Andreas Färber: >> Am 13.07.2015 um 16:11 schrieb Cornelia Huck: >>> On Mon, 13 Jul 2015 14:22:05 +0200 >>> Christian Borntraeger <borntraeger@de.ibm.com> wrote: >>> >>>> Am 09.07.2015 um 18:51 schrieb Cornelia Huck: >>>>> Devices that don't live on a bus aren't caught by the normal device >>>>> reset logic. Let's register a reset handler for those devices during >>>>> device realization that calls the reset handler for the associated >>>>> device class. >>>>> >>>>> Suggested-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> >>>>> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> >>>> reboot (from within guest) and external reset (system_reset in monitor) >>>> now work fine with the s390 watchdog. >>>> >>>> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> >>>> >>>>> --- >>>>> hw/core/qdev.c | 15 +++++++++++++++ >>>>> 1 file changed, 15 insertions(+) >>> >>> Thanks. >>> >>> Any objections against taking this through s390-next? I'd like to fix >>> diag288 reset (+ that annoying migration regession) for 2.4-rc1 and >>> send a pull request soon. >> >> Which device does this fix (only this diag88?), and is it really not >> possible to register a reset handler where it's being created? > > A sysbus device reset is also not registered or called by its parent. It is > resetted by the generic qdev handler walking all children (qdev_reset_all and > qbus_reset_all), no? It seems you're missing my point. Having a SysBusDevice reset by the SysBus is the expected way, rather than having the SysBusDevice register its own handler. Reset propagates along defined paths - buses and composition - and doesn't hit randomly. Therefore resetting random bus-less devices is wrong by design. And a step backwards, too. Which btw is the reason that my recursive realization patches stalled - we were potentially violating these rules. Anyway, here it's not about a SysBusDevice, it's about a pure Device, unless I've misunderstood something. Regards, Andreas
On Mon, 13 Jul 2015 17:05:31 +0200 Andreas Färber <afaerber@suse.de> wrote: > Am 13.07.2015 um 16:30 schrieb Christian Borntraeger: > > Am 13.07.2015 um 16:20 schrieb Andreas Färber: > >> Am 13.07.2015 um 16:11 schrieb Cornelia Huck: > >>> On Mon, 13 Jul 2015 14:22:05 +0200 > >>> Christian Borntraeger <borntraeger@de.ibm.com> wrote: > >>> > >>>> Am 09.07.2015 um 18:51 schrieb Cornelia Huck: > >>>>> Devices that don't live on a bus aren't caught by the normal device > >>>>> reset logic. Let's register a reset handler for those devices during > >>>>> device realization that calls the reset handler for the associated > >>>>> device class. > >>>>> > >>>>> Suggested-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> > >>>>> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> > >>>> reboot (from within guest) and external reset (system_reset in monitor) > >>>> now work fine with the s390 watchdog. > >>>> > >>>> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> > >>>> > >>>>> --- > >>>>> hw/core/qdev.c | 15 +++++++++++++++ > >>>>> 1 file changed, 15 insertions(+) > >>> > >>> Thanks. > >>> > >>> Any objections against taking this through s390-next? I'd like to fix > >>> diag288 reset (+ that annoying migration regession) for 2.4-rc1 and > >>> send a pull request soon. > >> > >> Which device does this fix (only this diag88?), and is it really not > >> possible to register a reset handler where it's being created? > > > > A sysbus device reset is also not registered or called by its parent. It is > > resetted by the generic qdev handler walking all children (qdev_reset_all and > > qbus_reset_all), no? > > It seems you're missing my point. > > Having a SysBusDevice reset by the SysBus is the expected way, rather > than having the SysBusDevice register its own handler. Reset propagates > along defined paths - buses and composition - and doesn't hit randomly. > Therefore resetting random bus-less devices is wrong by design. And a > step backwards, too. > > Which btw is the reason that my recursive realization patches stalled - > we were potentially violating these rules. > > Anyway, here it's not about a SysBusDevice, it's about a pure Device, > unless I've misunderstood something. So why does a pure Device have a reset callback then that is not called by default? Really, I think we're moving in circles here. First, the device should not live on the sysbus as it does not fit the perceived sysbus semantics. As there is no natural bus for it to live on, it becomes a pure device. Which is not reset, because somehow a generic callback is not called generically. I think we're really doing ourselves a disservice by this confused calling convention. How likely is it that someone introducing a pure device is immediately aware that some callbacks happen automatically while others don't? I wouldn't be surprised if there were some headscratchers that are solved by adding the reset call.
Am 13.07.2015 um 17:38 schrieb Cornelia Huck: > Really, I think we're moving in circles here. First, the device should > not live on the sysbus as it does not fit the perceived sysbus > semantics. As there is no natural bus for it to live on, it becomes a > pure device. Which is not reset, because somehow a generic callback is > not called generically. > > I think we're really doing ourselves a disservice by this confused > calling convention. How likely is it that someone introducing a pure > device is immediately aware that some callbacks happen automatically > while others don't? I wouldn't be surprised if there were some > headscratchers that are solved by adding the reset call. Well, I'd say the real bug is that this device is on the /machine in the first place, without a parent that propagates reset. Either it becomes self-responsible like this patch or my manual alternative are suggesting, or it is managed by some other object - either by /machine or indicating that this device is misplaced as a child of /machine. Fixing qdev reset to automatically propagate to children wouldn't help here, as /machine is not a device. I would've suggested putting that on the KVM call agenda, but the call is not tomorrow but next week. Regards, Andreas
On 13 July 2015 at 16:38, Cornelia Huck <cornelia.huck@de.ibm.com> wrote: > So why does a pure Device have a reset callback then that is not called > by default? This is (as I understand it) basically historical legacy from qdev. The qdev model puts every device on a bus of some kind, and then says that it's the bus that has responsibility for resetting the devices on it. We've now generalised that to allowing devices that don't really sit on buses, but because we still have a lot of code in the system that assumes the reset-is-by-the-bus model (and for cases like PCI depends on that ordering) we can't shift to "just reset every Device object in the system". So the current setup is that every Device needs to either (a) live on a bus that handles reset for it or (b) arrange to call reset itself or (c) require the code which creates it to arrange for reset to be called (which is what CPU objects do, for instance -- the board code has to register a reset handler which calls cpu_reset()). > Really, I think we're moving in circles here. First, the device should > not live on the sysbus as it does not fit the perceived sysbus > semantics. As there is no natural bus for it to live on, it becomes a > pure device. Which is not reset, because somehow a generic callback is > not called generically. I agree it's confusing, but I think adding a generic call to reset at this point in the release cycle is really dangerous. For better or worse, a lot of devices in the system expect to be reset in the way they are currently. There's definitely scope for improving/fixing the current code, but I think it's more upheaval than we can do for this release. thanks -- PMM
Am 13.07.2015 um 16:37 schrieb Cornelia Huck: > On Mon, 13 Jul 2015 16:20:09 +0200 > Andreas Färber <afaerber@suse.de> wrote: > >> Am 13.07.2015 um 16:11 schrieb Cornelia Huck: >>> On Mon, 13 Jul 2015 14:22:05 +0200 >>> Christian Borntraeger <borntraeger@de.ibm.com> wrote: >>> >>> Any objections against taking this through s390-next? I'd like to fix >>> diag288 reset (+ that annoying migration regession) for 2.4-rc1 and >>> send a pull request soon. >> >> Which device does this fix (only this diag88?), and is it really not >> possible to register a reset handler where it's being created? > > The original patch did this > (<1436259202-20509-2-git-send-email-cornelia.huck@de.ibm.com>). Peter > C. suspected NAND may also be affected > (<CAEgOgz6Fa5TSApDrj9iHA+2joDt1yBg6amCAp-634Bbe5bgNvA@mail.gmail.com>). Found it. So the problem *is* different from what I understood! It's not directly attached to /machine by s390x code, but rather instantiated via -device by the user. Peter C. suggested you to do it in realize, which affects all devices. The solution would be to instead either do the reset registration in qdev-monitor.c, where it's specific to devices that do not have a bus and on /machine/peripheral or /machine/periph-anon are not managed by a parent, or to add a further check here in realized. Right now I can only think of a hot-plug flag...? Not sure about unrealizing in the qdev-monitor case, but I think we can ignore that in this case? >> Peter C.'s theory does not match practice for x86, and this patch will >> lead to bus-less devices that are properly being reset by their parent >> getting reset twice, potentially causing issues due to qemu_irqs. I'd >> rather avoid that. > > Introducing new bugs is not something I want to do. Are double resets a > problem in practice, though? Not the double part, the relative ordering. Regards, Andreas
On Mon, 13 Jul 2015 18:06:45 +0200 Andreas Färber <afaerber@suse.de> wrote: > Found it. So the problem *is* different from what I understood! It's not > directly attached to /machine by s390x code, but rather instantiated via > -device by the user. > > Peter C. suggested you to do it in realize, which affects all devices. > > The solution would be to instead either do the reset registration in > qdev-monitor.c, where it's specific to devices that do not have a bus > and on /machine/peripheral or /machine/periph-anon are not managed by a > parent, or to add a further check here in realized. Right now I can only > think of a hot-plug flag...? Not sure about unrealizing in the > qdev-monitor case, but I think we can ignore that in this case? This sounds not like something we'll want to do during hardfreeze, though. I'll just fall back to the original patch that registered a reset so this works for now, and we can think of a more generic solution later.
diff --git a/hw/core/qdev.c b/hw/core/qdev.c index b2f404a..5c7c27b 100644 --- a/hw/core/qdev.c +++ b/hw/core/qdev.c @@ -1018,6 +1018,13 @@ static bool device_get_realized(Object *obj, Error **errp) return dev->realized; } +static void do_device_reset(void *opaque) +{ + DeviceState *dev = opaque; + + device_reset(dev); +} + static void device_set_realized(Object *obj, bool value, Error **errp) { DeviceState *dev = DEVICE(obj); @@ -1061,6 +1068,11 @@ static void device_set_realized(Object *obj, bool value, Error **errp) goto post_realize_fail; } + if (!dev->parent_bus) { + /* Make sure that reset is called for bus-less devices. */ + qemu_register_reset(do_device_reset, dev); + } + if (qdev_get_vmsd(dev)) { vmstate_register_with_alias_id(dev, -1, qdev_get_vmsd(dev), dev, dev->instance_id_alias, @@ -1094,6 +1106,9 @@ static void device_set_realized(Object *obj, bool value, Error **errp) } dev->pending_deleted_event = true; DEVICE_LISTENER_CALL(unrealize, Reverse, dev); + if (!dev->parent_bus) { + qemu_unregister_reset(do_device_reset, dev); + } } if (local_err != NULL) {
Devices that don't live on a bus aren't caught by the normal device reset logic. Let's register a reset handler for those devices during device realization that calls the reset handler for the associated device class. Suggested-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> --- hw/core/qdev.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+)