Message ID | 20170911171235.29331-1-clg@kaod.org |
---|---|
Series | Guest exploitation of the XIVE interrupt controller (POWER9) |

On Mon, Sep 11, 2017 at 07:12:14PM +0200, Cédric Le Goater wrote:
> On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
> negotiation process determines whether the guest operates with an
> interrupt controller using the XICS legacy model, as found on POWER8,
> or in XIVE exploitation mode, the newer POWER9 interrupt model. This
> patchset is a proposal to add XIVE support in POWER9 sPAPR machine.
>
> Follows a model for the XIVE interrupt controller and support for the
> Hypervisor's calls which are used to configure the interrupt sources
> and the event/notification queues of the guest. The last patch
> integrates XIVE in the sPAPR machine.
>
> Code is here:

An overall comment:

I note in several replies here that I think the way XICS objects are
re-used for XIVE is really ugly, and I think it will make future
maintenance pretty painful.

I'm thinking maybe trying to support the CAS negotiation of interrupt
controller from day 1 is warping the design. A better approach might
be first to implement XIVE only when given a specific machine option -
guest gets one or the other and can't negotiate.

That should allow a more natural XIVE design to emerge, *then* we can
look at what's necessary to make boot-time negotiation possible.

>
>   https://github.com/legoater/qemu/commits/xive
>
> Caveats :
>
>  - IRQ allocator : making progress
>
>    The sPAPR machine makes use of the interrupt controller very early
>    in the initialization sequence to allocate IRQ numbers and populate
>    the device tree. CAS requires XIVE to be able to switch interrupt
>    model and consequently have the models share a common IRQ allocator.
>
>    I have chosen to link the sPAPR XICS interrupt source into XIVE to
>    share the ICSIRQState array which acts as an IRQ allocator. This
>    can be improved.
>
>  - Interrupt presenter :
>
>    The register data is directly stored under the ICPState structure
>    which is shared with all other sPAPR interrupt controller models.
>
>  - KVM support : not addressed yet
>
>    The guest needs to be run with kernel_irqchip=off on a POWER9 system.
>
>  - LSI : lightly tested.
>
> Thanks,
>
> C.
>
> Changes since RFC v1:
>
>  - removed initial complexity due to a tentative try to support
>    PowerNV. This will come later.
>  - removed specific XIVE interrupt source and presenter models
>  - renamed files and typedefs
>  - removed print_info() handler
>  - introduced a CAS reset to rebuild the device tree
>  - linked the XIVE model with the sPAPR XICS interrupt source to share
>    the IRQ allocator
>  - improved hcall support (still some missing but they are not used
>    under Linux)
>  - improved device tree
>  - should have addressed comments in first RFC
>  - and much more ... Next version should have a better changelog.
>
>
> Cédric Le Goater (21):
>   ppc/xive: introduce a skeleton for the sPAPR XIVE interrupt controller
>   migration: add VMSTATE_STRUCT_VARRAY_UINT32_ALLOC
>   ppc/xive: define the XIVE internal tables
>   ppc/xive: provide a link to the sPAPR ICS object under XIVE
>   ppc/xive: allocate IRQ numbers for the IPIs
>   ppc/xive: introduce handlers for interrupt sources
>   ppc/xive: add MMIO handlers for the XIVE interrupt sources
>   ppc/xive: describe the XIVE interrupt source flags
>   ppc/xive: extend the interrupt presenter model for XIVE
>   ppc/xive: add MMIO handlers for the XIVE TIMA
>   ppc/xive: push the EQ data in OS event queue
>   ppc/xive: notify the CPU when interrupt priority is more privileged
>   ppc/xive: handle interrupt acknowledgment by the O/S
>   ppc/xive: add support for the SET_OS_PENDING command
>   spapr: modify spapr_populate_pci_dt() to use a 'nr_irqs' argument
>   spapr: add a XIVE object to the sPAPR machine
>   ppc/xive: add hcalls support
>   ppc/xive: add device tree support
>   ppc/xive: introduce a helper to map the XIVE memory regions
>   ppc/xics: introduce a qirq_get() helper in the XICSFabric
>   spapr: activate XIVE exploitation mode
>
>  default-configs/ppc64-softmmu.mak |   1 +
>  hw/intc/Makefile.objs             |   1 +
>  hw/intc/spapr_xive.c              | 821 +++++++++++++++++++++++++++++++++
>  hw/intc/spapr_xive_hcall.c        | 930 ++++++++++++++++++++++++++++++++++++++
>  hw/intc/xics.c                    |  11 +-
>  hw/intc/xive-internal.h           | 189 ++++++++
>  hw/ppc/spapr.c                    | 110 ++++-
>  hw/ppc/spapr_hcall.c              |   6 +
>  hw/ppc/spapr_pci.c                |   4 +-
>  include/hw/pci-host/spapr.h       |   2 +-
>  include/hw/ppc/spapr.h            |  17 +-
>  include/hw/ppc/spapr_xive.h       |  75 +++
>  include/hw/ppc/xics.h             |   7 +
>  include/migration/vmstate.h       |  10 +
>  14 files changed, 2169 insertions(+), 15 deletions(-)
>  create mode 100644 hw/intc/spapr_xive.c
>  create mode 100644 hw/intc/spapr_xive_hcall.c
>  create mode 100644 hw/intc/xive-internal.h
>  create mode 100644 include/hw/ppc/spapr_xive.h

On Tue, Sep 19, 2017 at 06:20:20PM +1000, David Gibson wrote:
> An overall comment:
>
> I note in several replies here that I think the way XICS objects are
> re-used for XIVE is really ugly, and I think it will make future
> maintenance pretty painful.
>
> I'm thinking maybe trying to support the CAS negotiation of interrupt
> controller from day 1 is warping the design. A better approach might
> be first to implement XIVE only when given a specific machine option -
> guest gets one or the other and can't negotiate.
>
> That should allow a more natural XIVE design to emerge, *then* we can
> look at what's necessary to make boot-time negotiation possible.

Actually, it just occurred to me that we might be making life hard for
ourselves by trying to actually switch between full XICS and XIVE
models. Couldn't we have new machine types always construct the XIVE
infrastructure, but then implement the XICS RTAS and hcalls in terms
of the XIVE virtual hardware? Since something more or less equivalent
has already been done in both OPAL and the host kernel, I'm guessing
this shouldn't be too hard at this point.

On 09/19/2017 10:46 AM, David Gibson wrote:
> On Tue, Sep 19, 2017 at 06:20:20PM +1000, David Gibson wrote:
>> An overall comment:
>>
>> I note in several replies here that I think the way XICS objects are
>> re-used for XIVE is really ugly, and I think it will make future
>> maintenance pretty painful.

I agree. That was one way to identify what we need for migration
compatibility and CAS reset.

>> I'm thinking maybe trying to support the CAS negotiation of interrupt
>> controller from day 1 is warping the design. A better approach might
>> be first to implement XIVE only when given a specific machine option -
>> guest gets one or the other and can't negotiate.

ok.

CAS is not the most complex problem, we mostly need to share
the ICSIRQState array and the source offset. Migration from older
machine is a problem. We are doomed to keep the existing XICS
framework available.

>> That should allow a more natural XIVE design to emerge, *then* we can
>> look at what's necessary to make boot-time negotiation possible.
>
> Actually, it just occurred to me that we might be making life hard for
> ourselves by trying to actually switch between full XICS and XIVE
> models. Couldn't we have new machine types always construct the XIVE
> infrastructure,

yes.

> but then implement the XICS RTAS and hcalls in terms of the XIVE virtual
> hardware.

ok but migration will not be supported.

> Since something more or less equivalent
> has already been done in both OPAL and the host kernel, I'm guessing
> this shouldn't be too hard at this point.

Indeed that is how it is working currently on P9 kvm guests. hcalls are
implemented on top of XIVE native.

Thanks,

C.

On Wed, Sep 20, 2017 at 02:33:37PM +0200, Cédric Le Goater wrote:
> On 09/19/2017 10:46 AM, David Gibson wrote:
> >> I'm thinking maybe trying to support the CAS negotiation of interrupt
> >> controller from day 1 is warping the design. A better approach might
> >> be first to implement XIVE only when given a specific machine option -
> >> guest gets one or the other and can't negotiate.
>
> ok.
>
> CAS is not the most complex problem, we mostly need to share
> the ICSIRQState array and the source offset. migration from older
> machine is a problem.

Uh.. what? Migration from an older machine isn't a thing. We can
migrate from an older qemu, but the machine type (and version) has to
be identical at each end. That's *why* we keep around the older
machine types on newer qemus.

> We are doomed to keep the existing XICS
> framework available.
>
> >> That should allow a more natural XIVE design to emerge, *then* we can
> >> look at what's necessary to make boot-time negotiation possible.
> >
> > Actually, it just occurred to me that we might be making life hard for
> > ourselves by trying to actually switch between full XICS and XIVE
> > models. Couldn't we have new machine types always construct the XIVE
> > infrastructure,
>
> yes.
>
> > but then implement the XICS RTAS and hcalls in terms of the XIVE virtual
> > hardware.
>
> ok but migration will not be supported.

Right, this would only be for newer machine types, and you can never
migrate between different machine types.

On 09/21/2017 03:25 AM, David Gibson wrote:
> On Wed, Sep 20, 2017 at 02:33:37PM +0200, Cédric Le Goater wrote:
>> CAS is not the most complex problem, we mostly need to share
>> the ICSIRQState array and the source offset. migration from older
>> machine is a problem.
>
> Uh.. what? Migration from an older machine isn't a thing. We can
> migrate from an older qemu, but the machine type (and version) has to
> be identical at each end. That's *why* we keep around the older
> machine types on newer qemus.

yes. I am just wondering how I am going to handle a xics-only
machine migrating to a xics/xive machine.

The xive machine option we are talking about will activate
the xive interrupt mode and instantiate the objects behind it.
So when we migrate from an older machine we will need to start
the target machine with xive=off. I guess that is OK.

Thanks for the insights and the time to review the code,

C.

On Thu, Sep 21, 2017 at 04:18:33PM +0200, Cédric Le Goater wrote:
> On 09/21/2017 03:25 AM, David Gibson wrote:
> > Uh.. what? Migration from an older machine isn't a thing. We can
> > migrate from an older qemu, but the machine type (and version) has to
> > be identical at each end. That's *why* we keep around the older
> > machine types on newer qemus.
>
> yes. I am just wondering how I am going to handle a xics-only
> machine migrating to a xics/xive machine.

Won't ever happen. Older machine types will always be xics, newer
machine types will always be xive (at least with POWER9).

> The xive machine option we are talking about will activate
> the xive interrupt mode and instantiate the objects behind it.
> So when we migrate from an older machine we will need to start
> the target machine with xive=off. I guess that is OK.

Again, we *don't* migrate from an older machine. Ever. We only ever
migrate from an older qemu version to a newer qemu using the older
machine type.
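
The rule stated in the reply above can be modeled in a few lines. The sketch below is illustrative Python rather than QEMU code: the machine type names `pseries-old` and `pseries-new`, the `MachineType` class, and both helper functions are hypothetical names chosen for the example. It assumes, as proposed in the thread, that the interrupt model is fixed by the machine type and that migration requires the same machine type at both ends.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineType:
    """A versioned machine type and the interrupt model it always uses."""
    name: str
    irq_model: str  # "xics" or "xive"


# Hypothetical machine types (placeholder names, not real QEMU types):
# the older one stays XICS, the newer one is always XIVE.
MACHINE_TYPES = {
    "pseries-old": MachineType("pseries-old", "xics"),
    "pseries-new": MachineType("pseries-new", "xive"),
}


def irq_model_for(machine: str) -> str:
    """Return the interrupt model implied by the machine type."""
    return MACHINE_TYPES[machine].irq_model


def can_migrate(src_machine: str, dst_machine: str) -> bool:
    """Only the machine type is compared in this model; it must match."""
    return src_machine == dst_machine
```

Under this model a guest started as `pseries-old` on an older QEMU is received as `pseries-old` on the newer QEMU and stays on XICS, so an XICS-to-XIVE migration never arises.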

On 09/22/2017 12:33 PM, David Gibson wrote:
> On Thu, Sep 21, 2017 at 04:18:33PM +0200, Cédric Le Goater wrote:
>> yes. I am just wondering how I am going to handle a xics-only
>> machine migrating to a xics/xive machine.
>
> Won't ever happen. Older machine types will always be xics, newer
> machine types will always be xive (at least with POWER9).
>
>> The xive machine option we are talking about will activate
>> the xive interrupt mode and instantiate the objects behind it.
>> So when we migrate from an older machine we will need to start
>> the target machine with xive=off. I guess that is OK.
>
> Again, we *don't* migrate from an older machine. Ever. We only ever
> migrate from an older qemu version to a newer qemu using the older
> machine type.

Sorry, I was talking about the QEMU version, and not the machine version.
I still have to look at how both machines will cohabitate in the newer QEMU.

Thanks,

C.

On Wed, 2017-09-20 at 14:33 +0200, Cédric Le Goater wrote:
> > > I'm thinking maybe trying to support the CAS negotiation of interrupt
> > > controller from day 1 is warping the design. A better approach might
> > > be first to implement XIVE only when given a specific machine option -
> > > guest gets one or the other and can't negotiate.
>
> ok.
>
> CAS is not the most complex problem, we mostly need to share
> the ICSIRQState array and the source offset. migration from older
> machine is a problem. We are doomed to keep the existing XICS
> framework available.

I don't like sharing anything.

I'd rather we had separate objects altogether. If needed we can
implement CAS by doing a partition reboot like pHyp does, at least
initially, until we add ways to tear down and rebuild objects.

The main issue is whether we can keep a consistent number space so
the DT doesn't have to be completely rebuilt. If it does, then reboot
will be the only practical option I'm afraid.

> > > That should allow a more natural XIVE design to emerge, *then* we can
> > > look at what's necessary to make boot-time negotiation possible.
> >
> > Actually, it just occurred to me that we might be making life hard for
> > ourselves by trying to actually switch between full XICS and XIVE
> > models. Couldn't we have new machine types always construct the XIVE
> > infrastructure,
>
> yes.
>
> > but then implement the XICS RTAS and hcalls in terms of the XIVE virtual
> > hardware.

That's gross :-)

This is also exactly what KVM does with real XIVE HW and there's also
such an emulation in OPAL. I'd be wary of creating a 3rd one...

I'd much prefer if we managed to:

 - Split the source numbering from the various state tracking objects
   so we can have that common

 - Either delay the creation to after CAS or tear down & re-create
   the state tracking objects at CAS time.

> ok but migration will not be supported.
>
> > Since something more or less equivalent
> > has already been done in both OPAL and the host kernel, I'm guessing
> > this shouldn't be too hard at this point.

It would very much suck to have yet another one of these. Also we need
to understand how that would work in a KVM context, the kernel will
provide a "XICS" state even on top of XIVE unless we switch the kernel
object to native, but then the kernel will expect full exploitation.
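
The two preferences listed in the message above (a common source numbering, and state tracking objects that are re-created at CAS time) can be sketched as follows. This is an illustrative Python model, not QEMU code: `IrqNumberSpace`, `XicsState`, `XiveState` and `rebuild_at_cas` are hypothetical names, and the per-source state is reduced to a placeholder. It assumes the IRQ numbers live in one allocator that survives CAS, while the controller state object is discarded and rebuilt for whichever model the guest negotiated.

```python
class IrqNumberSpace:
    """Owns the IRQ numbers only; holds no controller-specific state."""

    def __init__(self, base: int, count: int):
        self.base = base
        self.count = count
        self.allocated = set()

    def alloc(self) -> int:
        """Hand out the lowest free IRQ number in the range."""
        for irq in range(self.base, self.base + self.count):
            if irq not in self.allocated:
                self.allocated.add(irq)
                return irq
        raise RuntimeError("no free IRQ number")


class XicsState:
    """Placeholder for XICS per-source state tracking."""
    model = "xics"

    def __init__(self, numbers: IrqNumberSpace):
        self.sources = {irq: {"pending": False} for irq in sorted(numbers.allocated)}


class XiveState:
    """Placeholder for XIVE per-source state tracking."""
    model = "xive"

    def __init__(self, numbers: IrqNumberSpace):
        self.sources = {irq: {"pending": False} for irq in sorted(numbers.allocated)}


def rebuild_at_cas(negotiated: str, numbers: IrqNumberSpace):
    """Build a fresh state object for the negotiated model; numbers are untouched."""
    state_cls = {"xics": XicsState, "xive": XiveState}[negotiated]
    return state_cls(numbers)
```

In this model the IRQ numbers are unchanged across the rebuild, which corresponds to the "consistent number space" condition raised in the message; whether the real device tree can be kept that stable is exactly the open question in the thread.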