[RFC,v2,00/21] Guest exploitation of the XIVE interrupt controller (POWER9)

Message ID 20170911171235.29331-1-clg@kaod.org

Message

Cédric Le Goater Sept. 11, 2017, 5:12 p.m. UTC
On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
negotiation process determines whether the guest operates with an
interrupt controller using the XICS legacy model, as found on POWER8,
or in XIVE exploitation mode, the newer POWER9 interrupt model. This
patchset is a proposal to add XIVE support to the POWER9 sPAPR machine.

What follows is a model for the XIVE interrupt controller and support
for the hypervisor calls used to configure the interrupt sources and
the event/notification queues of the guest. The last patch integrates
XIVE in the sPAPR machine.

Code is here:

  https://github.com/legoater/qemu/commits/xive

Caveats :

 - IRQ allocator : making progress

   The sPAPR machine makes use of the interrupt controller very early
   in the initialization sequence to allocate IRQ numbers and populate
   the device tree. CAS requires XIVE to be able to switch interrupt
   models, and consequently the models must share a common IRQ allocator.

   I have chosen to link the sPAPR XICS interrupt source into XIVE to
   share the ICSIRQState array which acts as an IRQ allocator. This
   can be improved.

 - Interrupt presenter :

   The register data is directly stored under the ICPState structure
   which is shared with all other sPAPR interrupt controller models.

 - KVM support : not addressed yet

   The guest needs to be run with kernel_irqchip=off on a POWER9 system.

 - LSI : lightly tested.
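
The shared-allocator idea from the first caveat can be sketched roughly
as below. This is only an illustration under stated assumptions, not code
from the series: IrqSlot and IrqTable are hypothetical, simplified
stand-ins for the ICSIRQState array and the source offset, and the
"flags == 0 means free" convention is assumed for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the QEMU ICSIRQState definition. */
#define IRQ_FLAG_MSI 0x1
#define IRQ_FLAG_LSI 0x2

typedef struct {
    uint8_t flags;          /* assumption: 0 means the slot is free */
} IrqSlot;

typedef struct {
    uint32_t offset;        /* first IRQ number exposed to the guest */
    uint32_t nr_irqs;       /* number of slots in the table */
    IrqSlot *slots;         /* one table, referenced by both models */
} IrqTable;

/* Claim the first free slot; return its guest IRQ number, or -1 if full. */
static int irq_table_alloc(IrqTable *t, uint8_t flags)
{
    assert(flags != 0);     /* a claimed slot must differ from a free one */
    for (uint32_t i = 0; i < t->nr_irqs; i++) {
        if (t->slots[i].flags == 0) {
            t->slots[i].flags = flags;
            return (int)(t->offset + i);
        }
    }
    return -1;
}

/* Release a previously allocated IRQ number. */
static void irq_table_free(IrqTable *t, int irq)
{
    assert(irq >= (int)t->offset && (uint32_t)irq < t->offset + t->nr_irqs);
    t->slots[(uint32_t)irq - t->offset].flags = 0;
}
```

Because both interrupt models would point at the same table, an IRQ
number handed out before CAS stays valid whichever model the guest ends
up using.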
   
Thanks,

C.

Changes since RFC v1:

 - removed initial complexity due to a tentative attempt to support
   PowerNV. This will come later.
 - removed specific XIVE interrupt source and presenter models
 - renamed files and typedefs
 - removed print_info() handler
 - introduced a CAS reset to rebuild the device tree
 - linked the XIVE model with the sPAPR XICS interrupt source to share
   the IRQ allocator   
 - improved hcall support (still some missing but they are not used
   under Linux)
 - improved device tree
 - should have addressed comments in first RFC
 - and much more ... Next version should have a better changelog.
 

Cédric Le Goater (21):
  ppc/xive: introduce a skeleton for the sPAPR XIVE interrupt controller
  migration: add VMSTATE_STRUCT_VARRAY_UINT32_ALLOC
  ppc/xive: define the XIVE internal tables
  ppc/xive: provide a link to the sPAPR ICS object under XIVE
  ppc/xive: allocate IRQ numbers for the IPIs
  ppc/xive: introduce handlers for interrupt sources
  ppc/xive: add MMIO handlers for the XIVE interrupt sources
  ppc/xive: describe the XIVE interrupt source flags
  ppc/xive: extend the interrupt presenter model for XIVE
  ppc/xive: add MMIO handlers for the XIVE TIMA
  ppc/xive: push the EQ data in OS event queue
  ppc/xive: notify the CPU when interrupt priority is more privileged
  ppc/xive: handle interrupt acknowledgment by the O/S
  ppc/xive: add support for the SET_OS_PENDING command
  spapr: modify spapr_populate_pci_dt() to use a 'nr_irqs' argument
  spapr: add a XIVE object to the sPAPR machine
  ppc/xive: add hcalls support
  ppc/xive: add device tree support
  ppc/xive: introduce a helper to map the XIVE memory regions
  ppc/xics: introduce a qirq_get() helper in the XICSFabric
  spapr: activate XIVE exploitation mode

 default-configs/ppc64-softmmu.mak |   1 +
 hw/intc/Makefile.objs             |   1 +
 hw/intc/spapr_xive.c              | 821 +++++++++++++++++++++++++++++++++
 hw/intc/spapr_xive_hcall.c        | 930 ++++++++++++++++++++++++++++++++++++++
 hw/intc/xics.c                    |  11 +-
 hw/intc/xive-internal.h           | 189 ++++++++
 hw/ppc/spapr.c                    | 110 ++++-
 hw/ppc/spapr_hcall.c              |   6 +
 hw/ppc/spapr_pci.c                |   4 +-
 include/hw/pci-host/spapr.h       |   2 +-
 include/hw/ppc/spapr.h            |  17 +-
 include/hw/ppc/spapr_xive.h       |  75 +++
 include/hw/ppc/xics.h             |   7 +
 include/migration/vmstate.h       |  10 +
 14 files changed, 2169 insertions(+), 15 deletions(-)
 create mode 100644 hw/intc/spapr_xive.c
 create mode 100644 hw/intc/spapr_xive_hcall.c
 create mode 100644 hw/intc/xive-internal.h
 create mode 100644 include/hw/ppc/spapr_xive.h

Comments

David Gibson Sept. 19, 2017, 8:20 a.m. UTC | #1
On Mon, Sep 11, 2017 at 07:12:14PM +0200, Cédric Le Goater wrote:
> On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
> negotiation process determines whether the guest operates with an
> interrupt controller using the XICS legacy model, as found on POWER8,
> or in XIVE exploitation mode, the newer POWER9 interrupt model. This
> patchset is a proposal to add XIVE support in POWER9 sPAPR machine.
> 
> Follows a model for the XIVE interrupt controller and support for the
> Hypervisor's calls which are used to configure the interrupt sources
> and the event/notification queues of the guest. The last patch
> integrates XIVE in the sPAPR machine.
> 
> Code is here:


An overall comment:

I note in several replies here that I think the way XICS objects are
re-used for XIVE is really ugly, and I think it will make future
maintenance pretty painful.

I'm thinking maybe trying to support the CAS negotiation of interrupt
controller from day 1 is warping the design.  A better approach might
be first to implement XIVE only when given a specific machine option -
guest gets one or the other and can't negotiate.

That should allow a more natural XIVE design to emerge, *then* we can
look at what's necessary to make boot-time negotiation possible.
David Gibson Sept. 19, 2017, 8:46 a.m. UTC | #2
On Tue, Sep 19, 2017 at 06:20:20PM +1000, David Gibson wrote:
> On Mon, Sep 11, 2017 at 07:12:14PM +0200, Cédric Le Goater wrote:
> > On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
> > negotiation process determines whether the guest operates with an
> > interrupt controller using the XICS legacy model, as found on POWER8,
> > or in XIVE exploitation mode, the newer POWER9 interrupt model. This
> > patchset is a proposal to add XIVE support in POWER9 sPAPR machine.
> > 
> > Follows a model for the XIVE interrupt controller and support for the
> > Hypervisor's calls which are used to configure the interrupt sources
> > and the event/notification queues of the guest. The last patch
> > integrates XIVE in the sPAPR machine.
> > 
> > Code is here:
> 
> 
> An overall comment:
> 
> I note in several replies here that I think the way XICS objects are
> re-used for XIVE is really ugly, and I think it will make future
> maintenance pretty painful.
> 
> I'm thinking maybe trying to support the CAS negotiation of interrupt
> controller from day 1 is warping the design.  A better approach might
> be first to implement XIVE only when given a specific machine option -
> guest gets one or the other and can't negotiate.
> 
> That should allow a more natural XIVE design to emerge, *then* we can
> look at what's necessary to make boot-time negotiation possible.

Actually, it just occurred to me that we might be making life hard for
ourselves by trying to actually switch between full XICS and XIVE
models.  Couldn't we have new machine types always construct the XIVE
infrastructure, but then implement the XICS RTAS and hcalls in terms
of the XIVE virtual hardware?  Since something more or less equivalent
has already been done in both OPAL and the host kernel, I'm guessing
this shouldn't be too hard at this point.
Cédric Le Goater Sept. 20, 2017, 12:33 p.m. UTC | #3
On 09/19/2017 10:46 AM, David Gibson wrote:
> On Tue, Sep 19, 2017 at 06:20:20PM +1000, David Gibson wrote:
>> On Mon, Sep 11, 2017 at 07:12:14PM +0200, Cédric Le Goater wrote:
>>> On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
>>> negotiation process determines whether the guest operates with an
>>> interrupt controller using the XICS legacy model, as found on POWER8,
>>> or in XIVE exploitation mode, the newer POWER9 interrupt model. This
>>> patchset is a proposal to add XIVE support in POWER9 sPAPR machine.
>>>
>>> Follows a model for the XIVE interrupt controller and support for the
>>> Hypervisor's calls which are used to configure the interrupt sources
>>> and the event/notification queues of the guest. The last patch
>>> integrates XIVE in the sPAPR machine.
>>>
>>> Code is here:
>>
>>
>> An overall comment:
>>
>> I note in several replies here that I think the way XICS objects are
>> re-used for XIVE is really ugly, and I think it will make future
>> maintenance pretty painful.

I agree. That was one way to identify what we need for migration 
compatibility and CAS reset.   

>> I'm thinking maybe trying to support the CAS negotiation of interrupt
>> controller from day 1 is warping the design.  A better approach might
>> be first to implement XIVE only when given a specific machine option -
>> guest gets one or the other and can't negotiate.

ok. 

CAS is not the most complex problem; we mostly need to share
the ICSIRQState array and the source offset. Migration from an older
machine is a problem. We are doomed to keep the existing XICS
framework available.

>> That should allow a more natural XIVE design to emerge, *then* we can
>> look at what's necessary to make boot-time negotiation possible.
> 
> Actually, it just occurred to me that we might be making life hard for
> ourselves by trying to actually switch between full XICS and XIVE
> models.  Couldn't we have new machine types always construct the XIVE
> infrastructure, 

yes.

> but then implement the XICS RTAS and hcalls in terms of the XIVE virtual 
> hardware.

ok but migration will not be supported.

> Since something more or less equivalent
> has already been done in both OPAL and the host kernel, I'm guessing
> this shouldn't be too hard at this point.

Indeed, that is how it currently works on P9 KVM guests: the hcalls are
implemented on top of native XIVE.

Thanks,


C.
David Gibson Sept. 21, 2017, 1:25 a.m. UTC | #4
On Wed, Sep 20, 2017 at 02:33:37PM +0200, Cédric Le Goater wrote:
> On 09/19/2017 10:46 AM, David Gibson wrote:
> > On Tue, Sep 19, 2017 at 06:20:20PM +1000, David Gibson wrote:
> >> On Mon, Sep 11, 2017 at 07:12:14PM +0200, Cédric Le Goater wrote:
> >>> On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
> >>> negotiation process determines whether the guest operates with an
> >>> interrupt controller using the XICS legacy model, as found on POWER8,
> >>> or in XIVE exploitation mode, the newer POWER9 interrupt model. This
> >>> patchset is a proposal to add XIVE support in POWER9 sPAPR machine.
> >>>
> >>> Follows a model for the XIVE interrupt controller and support for the
> >>> Hypervisor's calls which are used to configure the interrupt sources
> >>> and the event/notification queues of the guest. The last patch
> >>> integrates XIVE in the sPAPR machine.
> >>>
> >>> Code is here:
> >>
> >>
> >> An overall comment:
> >>
> >> I note in several replies here that I think the way XICS objects are
> >> re-used for XIVE is really ugly, and I think it will make future
> >> maintenance pretty painful.
> 
> I agree. That was one way to identify what we need for migration 
> compatibility and CAS reset.   
> 
> >> I'm thinking maybe trying to support the CAS negotiation of interrupt
> >> controller from day 1 is warping the design.  A better approach might
> >> be first to implement XIVE only when given a specific machine option -
> >> guest gets one or the other and can't negotiate.
> 
> ok. 
> 
> CAS is not the most complex problem, we mostly need to share 
> the ICSIRQState array and the source offset. migration from older
> machine is a problem.

Uh.. what?  Migration from an older machine isn't a thing.  We can
migrate from an older qemu, but the machine type (and version) has to
be identical at each end.  That's *why* we keep around the older
machine types on newer qemus.
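
The rule stated above can be written down as a small check. This is only
a sketch of the rule, not how QEMU enforces it: MigrationEnd and
migration_allowed() are hypothetical names, and the version strings in
the comments are merely examples.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical description of one end of a migration. */
typedef struct {
    const char *qemu_version;   /* e.g. "2.10"; allowed to differ */
    const char *machine_type;   /* e.g. "pseries-2.10"; must be identical */
} MigrationEnd;

/*
 * Migration is acceptable only when the versioned machine type matches
 * at both ends. The QEMU version is deliberately not compared.
 */
static bool migration_allowed(const MigrationEnd *src, const MigrationEnd *dst)
{
    assert(src->machine_type && dst->machine_type);
    return strcmp(src->machine_type, dst->machine_type) == 0;
}
```

Under this rule, moving a "pseries-2.10" guest from an older QEMU to a
newer QEMU is fine as long as the destination is also started with
"pseries-2.10"; starting it with a newer machine type is not.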

> We are doomed to keep the existing XICS
> framework available.
> 
> >> That should allow a more natural XIVE design to emerge, *then* we can
> >> look at what's necessary to make boot-time negotiation possible.
> > 
> > Actually, it just occurred to me that we might be making life hard for
> > ourselves by trying to actually switch between full XICS and XIVE
> > models.  Couldn't we have new machine types always construct the XIVE
> > infrastructure, 
> 
> yes.
> 
> > but then implement the XICS RTAS and hcalls in terms of the XIVE virtual 
> > hardware.
> 
> ok but migration will not be supported.

Right, this would only be for newer machine types, and you can never
migrate between different machine types.

Cédric Le Goater Sept. 21, 2017, 2:18 p.m. UTC | #5
On 09/21/2017 03:25 AM, David Gibson wrote:
> On Wed, Sep 20, 2017 at 02:33:37PM +0200, Cédric Le Goater wrote:
>> On 09/19/2017 10:46 AM, David Gibson wrote:
>>> On Tue, Sep 19, 2017 at 06:20:20PM +1000, David Gibson wrote:
>>>> On Mon, Sep 11, 2017 at 07:12:14PM +0200, Cédric Le Goater wrote:
>>>>> On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
>>>>> negotiation process determines whether the guest operates with an
>>>>> interrupt controller using the XICS legacy model, as found on POWER8,
>>>>> or in XIVE exploitation mode, the newer POWER9 interrupt model. This
>>>>> patchset is a proposal to add XIVE support in POWER9 sPAPR machine.
>>>>>
>>>>> Follows a model for the XIVE interrupt controller and support for the
>>>>> Hypervisor's calls which are used to configure the interrupt sources
>>>>> and the event/notification queues of the guest. The last patch
>>>>> integrates XIVE in the sPAPR machine.
>>>>>
>>>>> Code is here:
>>>>
>>>>
>>>> An overall comment:
>>>>
>>>> I note in several replies here that I think the way XICS objects are
>>>> re-used for XIVE is really ugly, and I think it will make future
>>>> maintenance pretty painful.
>>
>> I agree. That was one way to identify what we need for migration 
>> compatibility and CAS reset.   
>>
>>>> I'm thinking maybe trying to support the CAS negotiation of interrupt
>>>> controller from day 1 is warping the design.  A better approach might
>>>> be first to implement XIVE only when given a specific machine option -
>>>> guest gets one or the other and can't negotiate.
>>
>> ok. 
>>
>> CAS is not the most complex problem, we mostly need to share 
>> the ICSIRQState array and the source offset. migration from older
>> machine is a problem.
> 
> Uh.. what?  Migration from an older machine isn't a thing.  We can
> migrate from an older qemu, but the machine type (and version) has to
> be identical at each end.  That's *why* we keep around the older
> machine types on newer qemus.

yes. I am just wondering how I am going to handle a xics-only 
machine migrating to a xics/xive machine. 

The xive machine option we are talking about will activate 
the xive interrupt mode and instantiate the objects behind it. 
So when we migrate from an older machine we will need to start 
the target machine with xive=off. I guess that is OK.   

Thanks for the insights and the time to review the code,

C. 

David Gibson Sept. 22, 2017, 10:33 a.m. UTC | #6
On Thu, Sep 21, 2017 at 04:18:33PM +0200, Cédric Le Goater wrote:
> On 09/21/2017 03:25 AM, David Gibson wrote:
> > On Wed, Sep 20, 2017 at 02:33:37PM +0200, Cédric Le Goater wrote:
> >> On 09/19/2017 10:46 AM, David Gibson wrote:
> >>> On Tue, Sep 19, 2017 at 06:20:20PM +1000, David Gibson wrote:
> >>>> On Mon, Sep 11, 2017 at 07:12:14PM +0200, Cédric Le Goater wrote:
> >>>>> On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
> >>>>> negotiation process determines whether the guest operates with an
> >>>>> interrupt controller using the XICS legacy model, as found on POWER8,
> >>>>> or in XIVE exploitation mode, the newer POWER9 interrupt model. This
> >>>>> patchset is a proposal to add XIVE support in POWER9 sPAPR machine.
> >>>>>
> >>>>> Follows a model for the XIVE interrupt controller and support for the
> >>>>> Hypervisor's calls which are used to configure the interrupt sources
> >>>>> and the event/notification queues of the guest. The last patch
> >>>>> integrates XIVE in the sPAPR machine.
> >>>>>
> >>>>> Code is here:
> >>>>
> >>>>
> >>>> An overall comment:
> >>>>
> >>>> I note in several replies here that I think the way XICS objects are
> >>>> re-used for XIVE is really ugly, and I think it will make future
> >>>> maintenance pretty painful.
> >>
> >> I agree. That was one way to identify what we need for migration 
> >> compatibility and CAS reset.   
> >>
> >>>> I'm thinking maybe trying to support the CAS negotiation of interrupt
> >>>> controller from day 1 is warping the design.  A better approach might
> >>>> be first to implement XIVE only when given a specific machine option -
> >>>> guest gets one or the other and can't negotiate.
> >>
> >> ok. 
> >>
> >> CAS is not the most complex problem, we mostly need to share 
> >> the ICSIRQState array and the source offset. migration from older
> >> machine is a problem.
> > 
> > Uh.. what?  Migration from an older machine isn't a thing.  We can
> > migrate from an older qemu, but the machine type (and version) has to
> > be identical at each end.  That's *why* we keep around the older
> > machine types on newer qemus.
> 
> yes. I am just wondering how I am going to handle a xics-only 
> machine migrating to a xics/xive machine. 

Won't ever happen.  Older machine types will always be xics, newer
machine types will always be xive (at least with POWER9).

> The xive machine option we are talking about will activate 
> the xive interrupt mode and instantiate the objects behind it. 
> So when we migrate from an older machine we will need to start 
> the target machine with xive=off. I guess that is OK.

Again, we *don't* migrate from an older machine.  Ever.  We only ever
migrate from an older qemu version to a newer qemu using the older
machine type.
Cédric Le Goater Sept. 22, 2017, 12:32 p.m. UTC | #7
On 09/22/2017 12:33 PM, David Gibson wrote:
> On Thu, Sep 21, 2017 at 04:18:33PM +0200, Cédric Le Goater wrote:
>> On 09/21/2017 03:25 AM, David Gibson wrote:
>>> On Wed, Sep 20, 2017 at 02:33:37PM +0200, Cédric Le Goater wrote:
>>>> On 09/19/2017 10:46 AM, David Gibson wrote:
>>>>> On Tue, Sep 19, 2017 at 06:20:20PM +1000, David Gibson wrote:
>>>>>> On Mon, Sep 11, 2017 at 07:12:14PM +0200, Cédric Le Goater wrote:
>>>>>>> On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
>>>>>>> negotiation process determines whether the guest operates with an
>>>>>>> interrupt controller using the XICS legacy model, as found on POWER8,
>>>>>>> or in XIVE exploitation mode, the newer POWER9 interrupt model. This
>>>>>>> patchset is a proposal to add XIVE support in POWER9 sPAPR machine.
>>>>>>>
>>>>>>> Follows a model for the XIVE interrupt controller and support for the
>>>>>>> Hypervisor's calls which are used to configure the interrupt sources
>>>>>>> and the event/notification queues of the guest. The last patch
>>>>>>> integrates XIVE in the sPAPR machine.
>>>>>>>
>>>>>>> Code is here:
>>>>>>
>>>>>>
>>>>>> An overall comment:
>>>>>>
>>>>>> I note in several replies here that I think the way XICS objects are
>>>>>> re-used for XIVE is really ugly, and I think it will make future
>>>>>> maintenance pretty painful.
>>>>
>>>> I agree. That was one way to identify what we need for migration 
>>>> compatibility and CAS reset.   
>>>>
>>>>>> I'm thinking maybe trying to support the CAS negotiation of interrupt
>>>>>> controller from day 1 is warping the design.  A better approach might
>>>>>> be first to implement XIVE only when given a specific machine option -
>>>>>> guest gets one or the other and can't negotiate.
>>>>
>>>> ok. 
>>>>
>>>> CAS is not the most complex problem, we mostly need to share 
>>>> the ICSIRQState array and the source offset. migration from older
>>>> machine is a problem.
>>>
>>> Uh.. what?  Migration from an older machine isn't a thing.  We can
>>> migrate from an older qemu, but the machine type (and version) has to
>>> be identical at each end.  That's *why* we keep around the older
>>> machine types on newer qemus.
>>
>> yes. I am just wondering how I am going to handle a xics-only 
>> machine migrating to a xics/xive machine. 
> 
> Won't ever happen.  Older machine types will always be xics, newer
> machine type will always be xive (at least with POWER9).
> 
>> The xive machine option we are talking about will activate 
>> the xive interrupt mode and instantiate the objects behind it. 
>> So when we migrate from an older machine we will need to start 
>> the target machine with xive=off. I guess that is OK.
> 
> Again, we *don't* migrate from an older machine.  Ever.  We only ever
> migrate from an older qemu version to a newer qemu using the older
> machine type.

Sorry, I was talking about the QEMU version, not the machine version.
I still have to look at how both machines will coexist in the
newer QEMU.

Thanks,

C. 


Benjamin Herrenschmidt Sept. 28, 2017, 8:23 a.m. UTC | #8
On Wed, 2017-09-20 at 14:33 +0200, Cédric Le Goater wrote:
> > > I'm thinking maybe trying to support the CAS negotiation of interrupt
> > > controller from day 1 is warping the design.  A better approach might
> > > be first to implement XIVE only when given a specific machine option -
> > > guest gets one or the other and can't negotiate.
> 
> ok. 
> 
> CAS is not the most complex problem, we mostly need to share 
> the ICSIRQState array and the source offset. migration from older
> machine is a problem. We are doomed to keep the existing XICS
> framework available.

I don't like sharing anything. I'd rather we had separate objects
altogether. If needed we can implement CAS by doing a partition reboot
like pHyp does, at least initially, until we add ways to tear down and
rebuild objects.

The main issue is whether we can keep a consistent number space so the
DT doesn't have to be completely rebuilt. If it does, then reboot will
be the only practical option I'm afraid.

> > > That should allow a more natural XIVE design to emerge, *then* we can
> > > look at what's necessary to make boot-time negotiation possible.
> > 
> > Actually, it just occurred to me that we might be making life hard for
> > ourselves by trying to actually switch between full XICS and XIVE
> > models.  Couldn't we have new machine types always construct the XIVE
> > infrastructure, 
> 
> yes.
> 
> > but then implement the XICS RTAS and hcalls in terms of the XIVE virtual 
> > hardware.

That's gross :-)

This is also exactly what KVM does with real XIVE HW and there's also
such an emulation in OPAL. I'd be wary of creating a 3rd one...

I'd much prefer if we managed to:

 - Split the source numbering from the various state tracking objects
   so we can have that in common

 - Either delay the creation until after CAS or tear down & re-create
   the state tracking objects at CAS time.
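
A rough sketch of the two points above, under stated assumptions: the
IRQ number space lives in its own object, and the model-specific state
is torn down and re-created when CAS picks a model. All type and
function names here (IrqNumberSpace, IrqModelState, Machine,
machine_cas_switch) are hypothetical and do not come from the series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef enum { MODEL_XICS, MODEL_XIVE } IrqModel;   /* hypothetical */

/* Common number space: survives a change of interrupt model. */
typedef struct {
    uint32_t base;
    uint32_t count;
} IrqNumberSpace;

/* Model-specific state, rebuilt whenever the model changes. */
typedef struct {
    IrqModel model;
    uint8_t *per_irq_state;     /* opaque per-source state, zeroed on creation */
} IrqModelState;

typedef struct {
    IrqNumberSpace numbers;
    IrqModelState *state;
} Machine;

static IrqModelState *model_state_new(IrqModel model, const IrqNumberSpace *ns)
{
    assert(ns->count > 0);
    IrqModelState *s = malloc(sizeof(*s));
    assert(s);
    s->model = model;
    s->per_irq_state = calloc(ns->count, sizeof(uint8_t));
    assert(s->per_irq_state);
    return s;
}

static void model_state_free(IrqModelState *s)
{
    if (s) {
        free(s->per_irq_state);
        free(s);
    }
}

/* At CAS time: drop the old state objects, keep the number space. */
static void machine_cas_switch(Machine *m, IrqModel negotiated)
{
    model_state_free(m->state);
    m->state = model_state_new(negotiated, &m->numbers);
}
```

Since the numbers never move, a device tree built from them would not
need renumbering after the switch; only the state objects are replaced.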

> ok but migration will not be supported.
> 
> > Since something more or less equivalent
> > has already been done in both OPAL and the host kernel, I'm guessing
> > this shouldn't be too hard at this point.

It would very much suck to have yet another one of these.

Also, we need to understand how that would work in a KVM context: the
kernel will provide a "XICS" state even on top of XIVE unless we switch
the kernel object to native, but then the kernel will expect full
exploitation.
