
[v2,00/22] ppc/xics: simplify ICS and ICP creation

Message ID e58f7cfa-c03c-ca25-a973-7894253d2dcc@kaod.org
State New

Commit Message

Cédric Le Goater Feb. 22, 2017, 10:55 a.m. UTC
On 02/22/2017 04:34 AM, David Gibson wrote:
> On Thu, Feb 16, 2017 at 02:47:23PM +0100, Cédric Le Goater wrote:
>> Hello,
>>
>> The goal behind this series is to simplify the XICS interface by
>> moving back in the machine the way the ICS and ICP objects interact
>> together. It's up to the machine to implement this "fabric" logic by
>> providing a set of handlers of a QOM interface. These handlers are
>> used to grab an ICS or an ICP object and also do irq resends. This
>> idea was suggested by David Gibson.
>>
>> The patchset is organised as follows. It starts with a preliminary
>> cleanup to get rid of the set_nr_irqs() and set_nr_servers()
>> handlers. It also moves the creation of the ICS and ICP objects from
>> the XICS object to the sPAPR machine. This simplifies the code
>> significantly and prepares ground for future changes.
>>
>> As the sPAPR machine only makes use of a single ICS, we can store it
>> at the machine level. This lets us remove dependencies on the list of
>> ICS of the XICS object and simplify even more the code for the
>> following changes.
>>
>> The QOM interface to interact with the ICS and ICP objects is then
>> introduced. These are moved under the machine and cleanups are done
>> accordingly.
>>
>> Finally, the XICSState classes are removed as they have been
>> deprecated by the QOM interface.
>>
>>
>> After the initial cleanups, which are rather big, I have tried to keep
>> each patch small enough to ease the review and to spot any
>> problems. Each should be bisectable. The tree is available here:
>>
>> 	   https://github.com/legoater/qemu/tree/ppc-2.9
> 
> So, after you posted this, I discovered that the patch I sent the
> other day, changing XICS away from a SysBusDevice, breaks the postcopy
> migration test on KVM.  I haven't had a chance to debug this yet, so
> for the time being I've pulled my patch from ppc-for-2.9.  I've moved
> it into a new 'xics-cleanup' branch.

It is even worse than that: the kernel does not start. This is because
the ICS and ICP objects are no longer reset, so the mfrr and irq
priority values are incorrect: 0x0 instead of 0xFF.

Before that patch, the reset was implicit because the device was a
SysBusDevice, and all the devices on the bus are reset when the bus is.

Other devices (not on a bus and/or plain QOM objects) need to be
attached to the SysBus to be reset:
  
	qdev_set_parent_bus(dev, sysbus_get_default());

or register a handler with:

	qemu_register_reset()

which will then be called by qemu_devices_reset().
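
To illustrate the second option, here is a minimal standalone C model of
the mechanism. It is not QEMU code: register_reset(), devices_reset(),
ToyXics and toy_xics_reset() are hypothetical stand-ins for
qemu_register_reset(), qemu_devices_reset() and the real ICS/ICP state.
It only shows that an object whose handler was never registered keeps
its zero-initialised values after a system reset.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a reset-handler list; not QEMU code. */
typedef void ResetHandler(void *opaque);

#define MAX_RESET_HANDLERS 8

static struct {
    ResetHandler *func;
    void *opaque;
} reset_handlers[MAX_RESET_HANDLERS];
static size_t nr_reset_handlers;

/* Model of qemu_register_reset(): remember a callback and its argument. */
int register_reset(ResetHandler *func, void *opaque)
{
    if (nr_reset_handlers == MAX_RESET_HANDLERS) {
        return -1;
    }
    reset_handlers[nr_reset_handlers].func = func;
    reset_handlers[nr_reset_handlers].opaque = opaque;
    nr_reset_handlers++;
    return 0;
}

/* Model of qemu_devices_reset(): run every registered callback once. */
void devices_reset(void)
{
    for (size_t i = 0; i < nr_reset_handlers; i++) {
        reset_handlers[i].func(reset_handlers[i].opaque);
    }
}

/* Hypothetical state holding only the two values mentioned above. */
typedef struct {
    uint8_t mfrr;          /* must be 0xFF after reset */
    uint8_t irq_priority;  /* must be 0xFF after reset */
} ToyXics;

/* Reset handler: puts the state into its expected post-reset values. */
void toy_xics_reset(void *opaque)
{
    ToyXics *x = opaque;

    x->mfrr = 0xFF;
    x->irq_priority = 0xFF;
}
```

An object passed to register_reset() ends up with 0xFF in both fields
once devices_reset() runs, while one that was never registered stays
at 0x0, which is the symptom seen in the guest.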

I fell into this trap a few times with PowerNV and I should have 
spotted it before adding my Reviewed-by. Sorry about that.


So, to move on, we can use the fix below (You can merge it in your 
patch). I also updated my branch with it : 

	https://github.com/legoater/qemu/commits/ppc-2.9

I have checked that KVM and TCG migration still work with the
patchset and also rebased PowerNV on it. All seems to work. Tell
me if you want a resend. The patchset needs some review anyhow,
and there should be some comments to address, so it might be a bit
too early for a resend.



FYI, the xics-cleanup branch has some issue with migration :

qemu-system-ppc64: VQ 0 size 0x80 < last_avail_idx 0x9f9 - used_idx 0x0
qemu-system-ppc64: Failed to load virtio-blk:virtio
qemu-system-ppc64: error while loading state for instance 0x0 of device 'pci@800000020000000:01.0/virtio-blk'

This is most probably a temporary regression, unrelated to XICS 
though. 
 
Thanks,

C.

From f01dd87954b818096e4fb8c85265ea71a0075975 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@kaod.org>
Date: Wed, 22 Feb 2017 10:50:25 +0100
Subject: [PATCH] ppc/xics: fix ICP and ICS reset
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

commit 5b17c7207938 ("xics: XICS should not be a SysBusDevice")
changed the nature of the XICS object to be a descendent of
TYPE_DEVICE. By doing so, the object is not on a bus and its reset
handler is not called anymore. The direct consequence is that the ICP
and ICS objects are not correctly initialized and so the IRQ subsystem
is broken in the guest.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/ppc/spapr.c        | 1 +
 include/hw/ppc/xics.h | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

Comments

David Gibson Feb. 23, 2017, 3:07 a.m. UTC | #1
On Wed, Feb 22, 2017 at 11:55:40AM +0100, Cédric Le Goater wrote:
> On 02/22/2017 04:34 AM, David Gibson wrote:
> > On Thu, Feb 16, 2017 at 02:47:23PM +0100, Cédric Le Goater wrote:
> >> Hello,
> >>
> >> The goal behind this series is to simplify the XICS interface by
> >> moving back in the machine the way the ICS and ICP objects interact
> >> together. It's up to the machine to implement this "fabric" logic by
> >> providing a set of handlers of a QOM interface. These handlers are
> >> used to grab an ICS or an ICP object and also do irq resends. This
> >> idea was suggested by David Gibson.
> >>
> >> The patchset is organised as follows. It starts with a preliminary
> >> cleanup to get rid of the set_nr_irqs() and set_nr_servers()
> >> handlers. It also moves the creation of the ICS and ICP objects from
> >> the XICS object to the sPAPR machine. This simplifies the code
> >> significantly and prepares ground for future changes.
> >>
> >> As the sPAPR machine only makes use of a single ICS, we can store it
> >> at the machine level. This lets us remove dependencies on the list of
> >> ICS of the XICS object and simplify even more the code for the
> >> following changes.
> >>
> >> The QOM interface to interact with the ICS and ICP objects is then
> >> introduced. These are moved under the machine and cleanups are done
> >> accordingly.
> >>
> >> Finally, the XICSState classes are removed as they have been
> >> deprecated by the QOM interface.
> >>
> >>
> >> After the initial cleanups, which are rather big, I have tried to keep
> >> each patch small enough to ease the review and to spot any
> >> problems. Each should be bisectable. The tree is available here:
> >>
> >> 	   https://github.com/legoater/qemu/tree/ppc-2.9
> > 
> > So, after you posted this, I discovered that the patch I sent the
> > other day, changing XICS away from a SysBusDevice, breaks the postcopy
> > migration test on KVM.  I haven't had a chance to debug this yet, so
> > for the time being I've pulled my patch from ppc-for-2.9.  I've moved
> > it into a new 'xics-cleanup' branch.
> 
> It is even worse than that: the kernel does not start. This is because
> the ICS and ICP objects are no longer reset, so the mfrr and irq
> priority values are incorrect: 0x0 instead of 0xFF.
> 
> Before that patch, the reset was implicit because the device was a
> SysBusDevice, and all the devices on the bus are reset when the bus is.
> 
> Other devices (not on a bus and/or plain QOM objects) need to be
> attached to the SysBus to be reset:
>   
> 	qdev_set_parent_bus(dev, sysbus_get_default());
> 
> or register a handler with:
> 
> 	qemu_register_reset()
> 
> which will then be called by qemu_devices_reset().
> 
> I fell into this trap a few times with PowerNV and I should have 
> spotted it before adding my Reviewed-by. Sorry about that.

Ah!  Well, thanks for spotting it now and saving me the debugging.

> 
> So, to move on, we can use the fix below (You can merge it in your 
> patch). I also updated my branch with it : 
> 
> 	https://github.com/legoater/qemu/commits/ppc-2.9
> 
> I have checked that KVM and TCG migration still work with the
> patchset and also rebased PowerNV on it. All seems to work. Tell
> me if you want a resend. The patchset needs some review anyhow,
> and there should be some comments to address, so it might be a bit
> too early for a resend.
> 
> 
> 
> FYI, the xics-cleanup branch has some issue with migration :
> 
> qemu-system-ppc64: VQ 0 size 0x80 < last_avail_idx 0x9f9 - used_idx 0x0
> qemu-system-ppc64: Failed to load virtio-blk:virtio
> qemu-system-ppc64: error while loading state for instance 0x0 of device 'pci@800000020000000:01.0/virtio-blk'
> 
> This is most probably a temporary regression, unrelated to XICS 
> though.

Hmm.  I'm less sure.  This series changes the qom paths of ics and icp
devices, which I'd expect to mess with migration, though I haven't had
a chance to actually check yet.

So, as mentioned in one of my patch comments it hadn't been my
intention for the ICS and ICPs to assume that the machine implements
the fabric, but rather to replace their current "concrete" xics
pointer with a xics interface pointer that would point to the (spapr)
machine in practice.

Apart from that I'm pretty happy with the endpoint you reach.  I'm a
bit less convinced about the path taken to get there.  I'm not sure if
it's worth the churn of doing this reorg, but I think we'd get there
more clearly and with fewer intermediate abstraction violations if it
was done by:

     1. Introduce the xics qom interface, but have it implemented by
        the existing xics object
     2. Change the ics and icp to only interact with the xics object
        via the qom interface
     3. Implement the qom interface in the spapr machine
     4. Change to spapr directly creating ics and icp objects,
        pointing back to itself as the xics interface provider
     5. Remove the xics concrete object

This also has the advantage that the qom path changing parts are
isolated to step (4), meaning problems with migration should be easier
to localize.
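
The indirection behind steps 1 to 4 can be sketched in plain C as
follows. This is only an illustration under stated assumptions, not
QEMU's QOM API: ToyFabricOps, ToyMachine and the toy_* helpers are
hypothetical names, and a table of function pointers stands in for a
QOM interface. The ICS holds an interface pointer rather than a
concrete xics pointer, and here the machine is what implements it.

```c
#include <stddef.h>

#define TOY_NR_SERVERS 2
#define TOY_IRQ_BASE   0x1000
#define TOY_NR_IRQS    16

typedef struct ToyICS ToyICS;
typedef struct ToyICP ToyICP;

/* Hypothetical "fabric" interface: how an ICS or ICP finds its peers. */
typedef struct ToyFabricOps {
    ToyICS *(*ics_get)(void *fabric, int irq);
    ToyICP *(*icp_get)(void *fabric, int server);
} ToyFabricOps;

struct ToyICP {
    int server;
};

struct ToyICS {
    int offset;               /* first irq number served by this ICS */
    int nr_irqs;
    void *fabric;             /* whoever implements the interface */
    const ToyFabricOps *ops;  /* interface pointer, not a concrete type */
};

/* The machine owns a single ICS and the ICPs, and implements the fabric. */
typedef struct ToyMachine {
    ToyICS ics;
    ToyICP icps[TOY_NR_SERVERS];
} ToyMachine;

ToyICS *toy_machine_ics_get(void *fabric, int irq)
{
    ToyMachine *m = fabric;
    int idx = irq - m->ics.offset;

    return (idx >= 0 && idx < m->ics.nr_irqs) ? &m->ics : NULL;
}

ToyICP *toy_machine_icp_get(void *fabric, int server)
{
    ToyMachine *m = fabric;

    return (server >= 0 && server < TOY_NR_SERVERS) ? &m->icps[server] : NULL;
}

const ToyFabricOps toy_machine_ops = {
    .ics_get = toy_machine_ics_get,
    .icp_get = toy_machine_icp_get,
};

/* The machine creates the ICS and points it back at itself. */
void toy_machine_init(ToyMachine *m)
{
    m->ics.offset = TOY_IRQ_BASE;
    m->ics.nr_irqs = TOY_NR_IRQS;
    m->ics.fabric = m;
    m->ics.ops = &toy_machine_ops;
    for (int i = 0; i < TOY_NR_SERVERS; i++) {
        m->icps[i].server = i;
    }
}

/* The ICS reaches an ICP only through the interface. */
ToyICP *toy_ics_find_server(ToyICS *ics, int server)
{
    return ics->ops->icp_get(ics->fabric, server);
}
```

Because toy_ics_find_server() never names ToyMachine, the same ICS
code would work unchanged if another object provided the ops table,
which is the property the ordering above tries to preserve at every
intermediate step.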

Cédric Le Goater Feb. 23, 2017, 6:49 a.m. UTC | #2
On 02/23/2017 04:07 AM, David Gibson wrote:
>> FYI, the xics-cleanup branch has some issue with migration :
>>
>> qemu-system-ppc64: VQ 0 size 0x80 < last_avail_idx 0x9f9 - used_idx 0x0
>> qemu-system-ppc64: Failed to load virtio-blk:virtio
>> qemu-system-ppc64: error while loading state for instance 0x0 of device 'pci@800000020000000:01.0/virtio-blk'
>>
>> This is most probably a temporary regression, unrelated to XICS 
>> though.
> Hmm.  I'm less sure.  This series changes the qom paths of ics and icp
> devices, which I'd expect to mess with migration, though I haven't had
> a chance to actually check yet.
> 

Just to be clear, the problem occurs without this patchset. 
It's inherent to the branch, but you will need the fix on 
XICS to start the guest.

C.

Cédric Le Goater Feb. 23, 2017, 7:19 a.m. UTC | #3
> Apart from that I'm pretty happy with the endpoint you reach.  I'm a
> bit less convinced about the path taken to get there.  I'm not sure if
> it's worth the churn of doing this reorg, but I think we'd get there
> more clearly and with fewer intermediate abstraction violations if it
> was done by:
> 
>      1. Introduce the xics qom interface, but have it implemented by
>         the existing xics object
>      2. Change the ics and icp to only interact with the xics object
>         via the qom interface
>      3. Implement the qom interface in the spapr machine
>      4. Change to spapr directly creating ics and icp objects,
>         pointing back to itself as the xics interface provider
>      5. Remove the xics concrete object

So that's a full rewrite of the patchset to reach the same point.
I can only grumble at such a proposal :/

> This also has the advantage that the qom path changing parts are
> isolated to step (4), meaning problems with migration should be easier
> to localize.

and migration works.

C.

David Gibson Feb. 23, 2017, 10:55 p.m. UTC | #4
On Thu, Feb 23, 2017 at 08:19:31AM +0100, Cédric Le Goater wrote:
> > Apart from that I'm pretty happy with the endpoint you reach.  I'm a
> > bit less convinced about the path taken to get there.  I'm not sure if
> > it's worth the churn of doing this reorg, but I think we'd get there
> > more clearly and with fewer intermediate abstraction violations if it
> > was done by:
> > 
> >      1. Introduce the xics qom interface, but have it implemented by
> >         the existing xics object
> >      2. Change the ics and icp to only interact with the xics object
> >         via the qom interface
> >      3. Implement the qom interface in the spapr machine
> >      4. Change to spapr directly creating ics and icp objects,
> >         pointing back to itself as the xics interface provider
> >      5. Remove the xics concrete object
> 
> So that's a full rewrite of the patchset to reach the same point.
> I can only grumble at such a proposal :/

Yeah.. point taken.

> > This also has the advantage that the qom path changing parts are
> > isolated to step (4), meaning problems with migration should be easier
> > to localize.
> 
> and migration works.

Oh, that's a nice surprise.  Ok never mind about the rework, just
address the other comments and repost.

Patch

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 8af54494f166..fa6a2947c791 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -104,6 +104,7 @@  static XICSState *try_create_xics(const char *type, int nr_servers,
     dev = DEVICE(object_new(type));
     qdev_prop_set_uint32(dev, "nr_servers", nr_servers);
     qdev_prop_set_uint32(dev, "nr_irqs", nr_irqs);
+    qdev_set_parent_bus(dev, sysbus_get_default());
     object_property_set_bool(OBJECT(dev), true, "realized", &err);
     if (err) {
         error_propagate(errp, err);
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 3f0c31610aa4..1aefd3d52257 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -80,7 +80,7 @@  struct XICSStateClass {
 
 struct XICSState {
     /*< private >*/
-    SysBusDevice parent_obj;
+    DeviceState parent_obj;
     /*< public >*/
     uint32_t nr_servers;
     uint32_t nr_irqs;