Message ID | 20190107183946.7230-13-clg@kaod.org |
---|---|
State | New |
Headers | show |
Series | spapr: add KVM support to the XIVE interrupt mode | expand |
On Mon, Jan 07, 2019 at 07:39:45PM +0100, Cédric Le Goater wrote: > The IRQ number space of the XIVE and XICS interrupt mode are aligned > when using the dual interrupt mode for the machine. This means that > the ICS offset is set to zero in QEMU and that the KVM XICS device > should be informed of this new value. Unfortunately, there is now way > to do so and KVM still maintains the XICS_IRQ_BASE (0x1000) offset. > > Ignore the lower 4K which are not used under the XICS interrupt > mode. These IRQ numbers are only claimed by XIVE for the CPU IPIs. > > Signed-off-by: Cédric Le Goater <clg@kaod.org> > --- > hw/intc/xics_kvm.c | 18 ++++++++++++++++++ > 1 file changed, 18 insertions(+) > > diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c > index 651bbfdf6966..1d21ff217b82 100644 > --- a/hw/intc/xics_kvm.c > +++ b/hw/intc/xics_kvm.c > @@ -238,6 +238,15 @@ static void ics_get_kvm_state(ICSState *ics) > for (i = 0; i < ics->nr_irqs; i++) { > ICSIRQState *irq = &ics->irqs[i]; > > + /* > + * The KVM XICS device considers that the IRQ numbers should > + * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K > + * numbers (only claimed by XIVE for the CPU IPIs). > + */ > + if (i + ics->offset < XICS_IRQ_BASE) { > + continue; > + } > + This seems bogus to me. The guest-visible irq numbers need to line up between xics and xive mode, yes, but that doesn't mean we need to keep around a great big array of unused array of ICS irq states, even in TCG mode. > kvm_device_access(kernel_xics_fd, KVM_DEV_XICS_GRP_SOURCES, > i + ics->offset, &state, false, &error_fatal); > > @@ -303,6 +312,15 @@ static int ics_set_kvm_state(ICSState *ics, int version_id) > ICSIRQState *irq = &ics->irqs[i]; > int ret; > > + /* > + * The KVM XICS device considers that the IRQ numbers should > + * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K > + * numbers (only claimed by XIVE for the CPU IPIs). > + */ > + if (i + ics->offset < XICS_IRQ_BASE) { > + continue; > + } > + > state = irq->server; > state |= (uint64_t)(irq->saved_priority & KVM_XICS_PRIORITY_MASK) > << KVM_XICS_PRIORITY_SHIFT;
On 2/12/19 2:06 AM, David Gibson wrote: > On Mon, Jan 07, 2019 at 07:39:45PM +0100, Cédric Le Goater wrote: >> The IRQ number space of the XIVE and XICS interrupt mode are aligned >> when using the dual interrupt mode for the machine. This means that >> the ICS offset is set to zero in QEMU and that the KVM XICS device >> should be informed of this new value. Unfortunately, there is now way >> to do so and KVM still maintains the XICS_IRQ_BASE (0x1000) offset. >> >> Ignore the lower 4K which are not used under the XICS interrupt >> mode. These IRQ numbers are only claimed by XIVE for the CPU IPIs. >> >> Signed-off-by: Cédric Le Goater <clg@kaod.org> >> --- >> hw/intc/xics_kvm.c | 18 ++++++++++++++++++ >> 1 file changed, 18 insertions(+) >> >> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c >> index 651bbfdf6966..1d21ff217b82 100644 >> --- a/hw/intc/xics_kvm.c >> +++ b/hw/intc/xics_kvm.c >> @@ -238,6 +238,15 @@ static void ics_get_kvm_state(ICSState *ics) >> for (i = 0; i < ics->nr_irqs; i++) { >> ICSIRQState *irq = &ics->irqs[i]; >> >> + /* >> + * The KVM XICS device considers that the IRQ numbers should >> + * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K >> + * numbers (only claimed by XIVE for the CPU IPIs). >> + */ >> + if (i + ics->offset < XICS_IRQ_BASE) { >> + continue; >> + } >> + > > This seems bogus to me. The guest-visible irq numbers need to line up > between xics and xive mode, yes, but that doesn't mean we need to keep > around a great big array of unused array of ICS irq states, even in > TCG mode. This is because the qirqs[] array is under the machine and shared between both interrupt modes, xics and xive. C. > >> kvm_device_access(kernel_xics_fd, KVM_DEV_XICS_GRP_SOURCES, >> i + ics->offset, &state, false, &error_fatal); >> >> @@ -303,6 +312,15 @@ static int ics_set_kvm_state(ICSState *ics, int version_id) >> ICSIRQState *irq = &ics->irqs[i]; >> int ret; >> >> + /* >> + * The KVM XICS device considers that the IRQ numbers should >> + * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K >> + * numbers (only claimed by XIVE for the CPU IPIs). >> + */ >> + if (i + ics->offset < XICS_IRQ_BASE) { >> + continue; >> + } >> + >> state = irq->server; >> state |= (uint64_t)(irq->saved_priority & KVM_XICS_PRIORITY_MASK) >> << KVM_XICS_PRIORITY_SHIFT; >
On Tue, Feb 12, 2019 at 08:05:53AM +0100, Cédric Le Goater wrote: > On 2/12/19 2:06 AM, David Gibson wrote: > > On Mon, Jan 07, 2019 at 07:39:45PM +0100, Cédric Le Goater wrote: > >> The IRQ number space of the XIVE and XICS interrupt mode are aligned > >> when using the dual interrupt mode for the machine. This means that > >> the ICS offset is set to zero in QEMU and that the KVM XICS device > >> should be informed of this new value. Unfortunately, there is now way > >> to do so and KVM still maintains the XICS_IRQ_BASE (0x1000) offset. > >> > >> Ignore the lower 4K which are not used under the XICS interrupt > >> mode. These IRQ numbers are only claimed by XIVE for the CPU IPIs. > >> > >> Signed-off-by: Cédric Le Goater <clg@kaod.org> > >> --- > >> hw/intc/xics_kvm.c | 18 ++++++++++++++++++ > >> 1 file changed, 18 insertions(+) > >> > >> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c > >> index 651bbfdf6966..1d21ff217b82 100644 > >> --- a/hw/intc/xics_kvm.c > >> +++ b/hw/intc/xics_kvm.c > >> @@ -238,6 +238,15 @@ static void ics_get_kvm_state(ICSState *ics) > >> for (i = 0; i < ics->nr_irqs; i++) { > >> ICSIRQState *irq = &ics->irqs[i]; > >> > >> + /* > >> + * The KVM XICS device considers that the IRQ numbers should > >> + * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K > >> + * numbers (only claimed by XIVE for the CPU IPIs). > >> + */ > >> + if (i + ics->offset < XICS_IRQ_BASE) { > >> + continue; > >> + } > >> + > > > > This seems bogus to me. The guest-visible irq numbers need to line up > > between xics and xive mode, yes, but that doesn't mean we need to keep > > around a great big array of unused array of ICS irq states, even in > > TCG mode. > > This is because the qirqs[] array is under the machine and shared between > both interrupt modes, xics and xive. I don't see how that follows. ICSIRQState is indexed in terms of the ICS source number, not the global irq number, so I don't see why it has to match up with the qirq array. > > C. > > > > >> kvm_device_access(kernel_xics_fd, KVM_DEV_XICS_GRP_SOURCES, > >> i + ics->offset, &state, false, &error_fatal); > >> > >> @@ -303,6 +312,15 @@ static int ics_set_kvm_state(ICSState *ics, int version_id) > >> ICSIRQState *irq = &ics->irqs[i]; > >> int ret; > >> > >> + /* > >> + * The KVM XICS device considers that the IRQ numbers should > >> + * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K > >> + * numbers (only claimed by XIVE for the CPU IPIs). > >> + */ > >> + if (i + ics->offset < XICS_IRQ_BASE) { > >> + continue; > >> + } > >> + > >> state = irq->server; > >> state |= (uint64_t)(irq->saved_priority & KVM_XICS_PRIORITY_MASK) > >> << KVM_XICS_PRIORITY_SHIFT; > > >
On 2/13/19 2:33 AM, David Gibson wrote: > On Tue, Feb 12, 2019 at 08:05:53AM +0100, Cédric Le Goater wrote: >> On 2/12/19 2:06 AM, David Gibson wrote: >>> On Mon, Jan 07, 2019 at 07:39:45PM +0100, Cédric Le Goater wrote: >>>> The IRQ number space of the XIVE and XICS interrupt mode are aligned >>>> when using the dual interrupt mode for the machine. This means that >>>> the ICS offset is set to zero in QEMU and that the KVM XICS device >>>> should be informed of this new value. Unfortunately, there is now way >>>> to do so and KVM still maintains the XICS_IRQ_BASE (0x1000) offset. >>>> >>>> Ignore the lower 4K which are not used under the XICS interrupt >>>> mode. These IRQ numbers are only claimed by XIVE for the CPU IPIs. >>>> >>>> Signed-off-by: Cédric Le Goater <clg@kaod.org> >>>> --- >>>> hw/intc/xics_kvm.c | 18 ++++++++++++++++++ >>>> 1 file changed, 18 insertions(+) >>>> >>>> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c >>>> index 651bbfdf6966..1d21ff217b82 100644 >>>> --- a/hw/intc/xics_kvm.c >>>> +++ b/hw/intc/xics_kvm.c >>>> @@ -238,6 +238,15 @@ static void ics_get_kvm_state(ICSState *ics) >>>> for (i = 0; i < ics->nr_irqs; i++) { >>>> ICSIRQState *irq = &ics->irqs[i]; >>>> >>>> + /* >>>> + * The KVM XICS device considers that the IRQ numbers should >>>> + * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K >>>> + * numbers (only claimed by XIVE for the CPU IPIs). >>>> + */ >>>> + if (i + ics->offset < XICS_IRQ_BASE) { >>>> + continue; >>>> + } >>>> + >>> >>> This seems bogus to me. The guest-visible irq numbers need to line up >>> between xics and xive mode, yes, but that doesn't mean we need to keep >>> around a great big array of unused array of ICS irq states, even in >>> TCG mode. >> >> This is because the qirqs[] array is under the machine and shared between >> both interrupt modes, xics and xive. > > I don't see how that follows. ICSIRQState is indexed in terms of the > ICS source number, not the global irq number, so I don't see why it > has to match up with the qirq array. The root cause is the use of spapr->irq->nr_irqs to initialize the ICS and sPAPRXive object. In case of the 'dual' backend, it covers the full XIVE IRQ number space (0x2000 today) but XICS only needs 0x1000. I think we can fix the offset issue by using the appropriate nr_irqs which should be for the XICS backend : spapr->irq->nr_irqs - ics->offset I keep in mind the XIVE support for nested guests and I think we will need to extend the IRQ number space in L1 and have the L2 use a portion of it (using an offset). C. >>> >>>> kvm_device_access(kernel_xics_fd, KVM_DEV_XICS_GRP_SOURCES, >>>> i + ics->offset, &state, false, &error_fatal); >>>> >>>> @@ -303,6 +312,15 @@ static int ics_set_kvm_state(ICSState *ics, int version_id) >>>> ICSIRQState *irq = &ics->irqs[i]; >>>> int ret; >>>> >>>> + /* >>>> + * The KVM XICS device considers that the IRQ numbers should >>>> + * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K >>>> + * numbers (only claimed by XIVE for the CPU IPIs). >>>> + */ >>>> + if (i + ics->offset < XICS_IRQ_BASE) { >>>> + continue; >>>> + } >>>> + >>>> state = irq->server; >>>> state |= (uint64_t)(irq->saved_priority & KVM_XICS_PRIORITY_MASK) >>>> << KVM_XICS_PRIORITY_SHIFT; >>> >> >
On Wed, 13 Feb 2019 09:03:33 +0100 Cédric Le Goater <clg@kaod.org> wrote: > On 2/13/19 2:33 AM, David Gibson wrote: > > On Tue, Feb 12, 2019 at 08:05:53AM +0100, Cédric Le Goater wrote: > >> On 2/12/19 2:06 AM, David Gibson wrote: > >>> On Mon, Jan 07, 2019 at 07:39:45PM +0100, Cédric Le Goater wrote: > >>>> The IRQ number space of the XIVE and XICS interrupt mode are aligned > >>>> when using the dual interrupt mode for the machine. This means that > >>>> the ICS offset is set to zero in QEMU and that the KVM XICS device > >>>> should be informed of this new value. Unfortunately, there is now way > >>>> to do so and KVM still maintains the XICS_IRQ_BASE (0x1000) offset. > >>>> > >>>> Ignore the lower 4K which are not used under the XICS interrupt > >>>> mode. These IRQ numbers are only claimed by XIVE for the CPU IPIs. > >>>> > >>>> Signed-off-by: Cédric Le Goater <clg@kaod.org> > >>>> --- > >>>> hw/intc/xics_kvm.c | 18 ++++++++++++++++++ > >>>> 1 file changed, 18 insertions(+) > >>>> > >>>> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c > >>>> index 651bbfdf6966..1d21ff217b82 100644 > >>>> --- a/hw/intc/xics_kvm.c > >>>> +++ b/hw/intc/xics_kvm.c > >>>> @@ -238,6 +238,15 @@ static void ics_get_kvm_state(ICSState *ics) > >>>> for (i = 0; i < ics->nr_irqs; i++) { > >>>> ICSIRQState *irq = &ics->irqs[i]; > >>>> > >>>> + /* > >>>> + * The KVM XICS device considers that the IRQ numbers should > >>>> + * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K > >>>> + * numbers (only claimed by XIVE for the CPU IPIs). > >>>> + */ > >>>> + if (i + ics->offset < XICS_IRQ_BASE) { > >>>> + continue; > >>>> + } > >>>> + > >>> > >>> This seems bogus to me. The guest-visible irq numbers need to line up > >>> between xics and xive mode, yes, but that doesn't mean we need to keep > >>> around a great big array of unused array of ICS irq states, even in > >>> TCG mode. > >> > >> This is because the qirqs[] array is under the machine and shared between > >> both interrupt modes, xics and xive. > > > > I don't see how that follows. ICSIRQState is indexed in terms of the > > ICS source number, not the global irq number, so I don't see why it > > has to match up with the qirq array. > > The root cause is the use of spapr->irq->nr_irqs to initialize the ICS > and sPAPRXive object. In case of the 'dual' backend, it covers the full > XIVE IRQ number space (0x2000 today) but XICS only needs 0x1000. > > I think we can fix the offset issue by using the appropriate nr_irqs > which should be for the XICS backend : spapr->irq->nr_irqs - ics->offset > Since the root cause is that the value of spapr->irq->nr_irqs should be different in XIVE and XICS, what about fixing it during reset ? Something like: static void spapr_irq_reset_dual(sPAPRMachineState *spapr, Error **errp) { [...] spapr->irq->nr_irqs = spapr_irq_current(spapr)->nr_irqs; spapr_irq_current(spapr)->reset(spapr, errp); } > > I keep in mind the XIVE support for nested guests and I think we will > need to extend the IRQ number space in L1 and have the L2 use a portion > of it (using an offset). > > C. > > >>> > >>>> kvm_device_access(kernel_xics_fd, KVM_DEV_XICS_GRP_SOURCES, > >>>> i + ics->offset, &state, false, &error_fatal); > >>>> > >>>> @@ -303,6 +312,15 @@ static int ics_set_kvm_state(ICSState *ics, int version_id) > >>>> ICSIRQState *irq = &ics->irqs[i]; > >>>> int ret; > >>>> > >>>> + /* > >>>> + * The KVM XICS device considers that the IRQ numbers should > >>>> + * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K > >>>> + * numbers (only claimed by XIVE for the CPU IPIs). > >>>> + */ > >>>> + if (i + ics->offset < XICS_IRQ_BASE) { > >>>> + continue; > >>>> + } > >>>> + > >>>> state = irq->server; > >>>> state |= (uint64_t)(irq->saved_priority & KVM_XICS_PRIORITY_MASK) > >>>> << KVM_XICS_PRIORITY_SHIFT; > >>> > >> > > > >
On Wed, 13 Feb 2019 12:27:13 +0100 Greg Kurz <groug@kaod.org> wrote: > On Wed, 13 Feb 2019 09:03:33 +0100 > Cédric Le Goater <clg@kaod.org> wrote: > > > On 2/13/19 2:33 AM, David Gibson wrote: > > > On Tue, Feb 12, 2019 at 08:05:53AM +0100, Cédric Le Goater wrote: > > >> On 2/12/19 2:06 AM, David Gibson wrote: > > >>> On Mon, Jan 07, 2019 at 07:39:45PM +0100, Cédric Le Goater wrote: > > >>>> The IRQ number space of the XIVE and XICS interrupt mode are aligned > > >>>> when using the dual interrupt mode for the machine. This means that > > >>>> the ICS offset is set to zero in QEMU and that the KVM XICS device > > >>>> should be informed of this new value. Unfortunately, there is now way > > >>>> to do so and KVM still maintains the XICS_IRQ_BASE (0x1000) offset. > > >>>> > > >>>> Ignore the lower 4K which are not used under the XICS interrupt > > >>>> mode. These IRQ numbers are only claimed by XIVE for the CPU IPIs. > > >>>> > > >>>> Signed-off-by: Cédric Le Goater <clg@kaod.org> > > >>>> --- > > >>>> hw/intc/xics_kvm.c | 18 ++++++++++++++++++ > > >>>> 1 file changed, 18 insertions(+) > > >>>> > > >>>> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c > > >>>> index 651bbfdf6966..1d21ff217b82 100644 > > >>>> --- a/hw/intc/xics_kvm.c > > >>>> +++ b/hw/intc/xics_kvm.c > > >>>> @@ -238,6 +238,15 @@ static void ics_get_kvm_state(ICSState *ics) > > >>>> for (i = 0; i < ics->nr_irqs; i++) { > > >>>> ICSIRQState *irq = &ics->irqs[i]; > > >>>> > > >>>> + /* > > >>>> + * The KVM XICS device considers that the IRQ numbers should > > >>>> + * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K > > >>>> + * numbers (only claimed by XIVE for the CPU IPIs). > > >>>> + */ > > >>>> + if (i + ics->offset < XICS_IRQ_BASE) { > > >>>> + continue; > > >>>> + } > > >>>> + > > >>> > > >>> This seems bogus to me. The guest-visible irq numbers need to line up > > >>> between xics and xive mode, yes, but that doesn't mean we need to keep > > >>> around a great big array of unused array of ICS irq states, even in > > >>> TCG mode. > > >> > > >> This is because the qirqs[] array is under the machine and shared between > > >> both interrupt modes, xics and xive. > > > > > > I don't see how that follows. ICSIRQState is indexed in terms of the > > > ICS source number, not the global irq number, so I don't see why it > > > has to match up with the qirq array. > > > > The root cause is the use of spapr->irq->nr_irqs to initialize the ICS > > and sPAPRXive object. In case of the 'dual' backend, it covers the full > > XIVE IRQ number space (0x2000 today) but XICS only needs 0x1000. > > > > I think we can fix the offset issue by using the appropriate nr_irqs > > which should be for the XICS backend : spapr->irq->nr_irqs - ics->offset > > > > Since the root cause is that the value of spapr->irq->nr_irqs should > be different in XIVE and XICS, what about fixing it during reset ? > Nah this doesn't make sense :) But if XICS always needs 0x1000, why just not change spapr_irq_init_xics() to use SPAPR_IRQ_XICS_NR_IRQS instead of spapr->irq->nr_irqs ? > Something like: > > static void spapr_irq_reset_dual(sPAPRMachineState *spapr, Error **errp) > { > [...] > > spapr->irq->nr_irqs = spapr_irq_current(spapr)->nr_irqs; > > spapr_irq_current(spapr)->reset(spapr, errp); > } > > > > > I keep in mind the XIVE support for nested guests and I think we will > > need to extend the IRQ number space in L1 and have the L2 use a portion > > of it (using an offset). > > > > C. > > > > >>> > > >>>> kvm_device_access(kernel_xics_fd, KVM_DEV_XICS_GRP_SOURCES, > > >>>> i + ics->offset, &state, false, &error_fatal); > > >>>> > > >>>> @@ -303,6 +312,15 @@ static int ics_set_kvm_state(ICSState *ics, int version_id) > > >>>> ICSIRQState *irq = &ics->irqs[i]; > > >>>> int ret; > > >>>> > > >>>> + /* > > >>>> + * The KVM XICS device considers that the IRQ numbers should > > >>>> + * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K > > >>>> + * numbers (only claimed by XIVE for the CPU IPIs). > > >>>> + */ > > >>>> + if (i + ics->offset < XICS_IRQ_BASE) { > > >>>> + continue; > > >>>> + } > > >>>> + > > >>>> state = irq->server; > > >>>> state |= (uint64_t)(irq->saved_priority & KVM_XICS_PRIORITY_MASK) > > >>>> << KVM_XICS_PRIORITY_SHIFT; > > >>> > > >> > > > > > > > > >
diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c index 651bbfdf6966..1d21ff217b82 100644 --- a/hw/intc/xics_kvm.c +++ b/hw/intc/xics_kvm.c @@ -238,6 +238,15 @@ static void ics_get_kvm_state(ICSState *ics) for (i = 0; i < ics->nr_irqs; i++) { ICSIRQState *irq = &ics->irqs[i]; + /* + * The KVM XICS device considers that the IRQ numbers should + * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K + * numbers (only claimed by XIVE for the CPU IPIs). + */ + if (i + ics->offset < XICS_IRQ_BASE) { + continue; + } + kvm_device_access(kernel_xics_fd, KVM_DEV_XICS_GRP_SOURCES, i + ics->offset, &state, false, &error_fatal); @@ -303,6 +312,15 @@ static int ics_set_kvm_state(ICSState *ics, int version_id) ICSIRQState *irq = &ics->irqs[i]; int ret; + /* + * The KVM XICS device considers that the IRQ numbers should + * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K + * numbers (only claimed by XIVE for the CPU IPIs). + */ + if (i + ics->offset < XICS_IRQ_BASE) { + continue; + } + state = irq->server; state |= (uint64_t)(irq->saved_priority & KVM_XICS_PRIORITY_MASK) << KVM_XICS_PRIORITY_SHIFT;
The IRQ number space of the XIVE and XICS interrupt mode are aligned when using the dual interrupt mode for the machine. This means that the ICS offset is set to zero in QEMU and that the KVM XICS device should be informed of this new value. Unfortunately, there is now way to do so and KVM still maintains the XICS_IRQ_BASE (0x1000) offset. Ignore the lower 4K which are not used under the XICS interrupt mode. These IRQ numbers are only claimed by XIVE for the CPU IPIs. Signed-off-by: Cédric Le Goater <clg@kaod.org> --- hw/intc/xics_kvm.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)