diff mbox series

[12/13] spapr/xics: ignore the lower 4K in the IRQ number space

Message ID 20190107183946.7230-13-clg@kaod.org
State New
Headers show
Series spapr: add KVM support to the XIVE interrupt mode | expand

Commit Message

Cédric Le Goater Jan. 7, 2019, 6:39 p.m. UTC
The IRQ number space of the XIVE and XICS interrupt mode are aligned
when using the dual interrupt mode for the machine. This means that
the ICS offset is set to zero in QEMU and that the KVM XICS device
should be informed of this new value. Unfortunately, there is now way
to do so and KVM still maintains the XICS_IRQ_BASE (0x1000) offset.

Ignore the lower 4K which are not used under the XICS interrupt
mode. These IRQ numbers are only claimed by XIVE for the CPU IPIs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/xics_kvm.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

Comments

David Gibson Feb. 12, 2019, 1:06 a.m. UTC | #1
On Mon, Jan 07, 2019 at 07:39:45PM +0100, Cédric Le Goater wrote:
> The IRQ number space of the XIVE and XICS interrupt mode are aligned
> when using the dual interrupt mode for the machine. This means that
> the ICS offset is set to zero in QEMU and that the KVM XICS device
> should be informed of this new value. Unfortunately, there is now way
> to do so and KVM still maintains the XICS_IRQ_BASE (0x1000) offset.
> 
> Ignore the lower 4K which are not used under the XICS interrupt
> mode. These IRQ numbers are only claimed by XIVE for the CPU IPIs.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/xics_kvm.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
> index 651bbfdf6966..1d21ff217b82 100644
> --- a/hw/intc/xics_kvm.c
> +++ b/hw/intc/xics_kvm.c
> @@ -238,6 +238,15 @@ static void ics_get_kvm_state(ICSState *ics)
>      for (i = 0; i < ics->nr_irqs; i++) {
>          ICSIRQState *irq = &ics->irqs[i];
>  
> +        /*
> +         * The KVM XICS device considers that the IRQ numbers should
> +         * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K
> +         * numbers (only claimed by XIVE for the CPU IPIs).
> +         */
> +        if (i + ics->offset < XICS_IRQ_BASE) {
> +            continue;
> +        }
> +

This seems bogus to me.  The guest-visible irq numbers need to line up
between xics and xive mode, yes, but that doesn't mean we need to keep
around a great big array of unused array of ICS irq states, even in
TCG mode.

>          kvm_device_access(kernel_xics_fd, KVM_DEV_XICS_GRP_SOURCES,
>                            i + ics->offset, &state, false, &error_fatal);
>  
> @@ -303,6 +312,15 @@ static int ics_set_kvm_state(ICSState *ics, int version_id)
>          ICSIRQState *irq = &ics->irqs[i];
>          int ret;
>  
> +        /*
> +         * The KVM XICS device considers that the IRQ numbers should
> +         * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K
> +         * numbers (only claimed by XIVE for the CPU IPIs).
> +         */
> +        if (i + ics->offset < XICS_IRQ_BASE) {
> +            continue;
> +        }
> +
>          state = irq->server;
>          state |= (uint64_t)(irq->saved_priority & KVM_XICS_PRIORITY_MASK)
>              << KVM_XICS_PRIORITY_SHIFT;
Cédric Le Goater Feb. 12, 2019, 7:05 a.m. UTC | #2
On 2/12/19 2:06 AM, David Gibson wrote:
> On Mon, Jan 07, 2019 at 07:39:45PM +0100, Cédric Le Goater wrote:
>> The IRQ number space of the XIVE and XICS interrupt mode are aligned
>> when using the dual interrupt mode for the machine. This means that
>> the ICS offset is set to zero in QEMU and that the KVM XICS device
>> should be informed of this new value. Unfortunately, there is now way
>> to do so and KVM still maintains the XICS_IRQ_BASE (0x1000) offset.
>>
>> Ignore the lower 4K which are not used under the XICS interrupt
>> mode. These IRQ numbers are only claimed by XIVE for the CPU IPIs.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/xics_kvm.c | 18 ++++++++++++++++++
>>  1 file changed, 18 insertions(+)
>>
>> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
>> index 651bbfdf6966..1d21ff217b82 100644
>> --- a/hw/intc/xics_kvm.c
>> +++ b/hw/intc/xics_kvm.c
>> @@ -238,6 +238,15 @@ static void ics_get_kvm_state(ICSState *ics)
>>      for (i = 0; i < ics->nr_irqs; i++) {
>>          ICSIRQState *irq = &ics->irqs[i];
>>  
>> +        /*
>> +         * The KVM XICS device considers that the IRQ numbers should
>> +         * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K
>> +         * numbers (only claimed by XIVE for the CPU IPIs).
>> +         */
>> +        if (i + ics->offset < XICS_IRQ_BASE) {
>> +            continue;
>> +        }
>> +
> 
> This seems bogus to me.  The guest-visible irq numbers need to line up
> between xics and xive mode, yes, but that doesn't mean we need to keep
> around a great big array of unused array of ICS irq states, even in
> TCG mode.

This is because the qirqs[] array is under the machine and shared between 
both interrupt modes, xics and xive.

C.

> 
>>          kvm_device_access(kernel_xics_fd, KVM_DEV_XICS_GRP_SOURCES,
>>                            i + ics->offset, &state, false, &error_fatal);
>>  
>> @@ -303,6 +312,15 @@ static int ics_set_kvm_state(ICSState *ics, int version_id)
>>          ICSIRQState *irq = &ics->irqs[i];
>>          int ret;
>>  
>> +        /*
>> +         * The KVM XICS device considers that the IRQ numbers should
>> +         * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K
>> +         * numbers (only claimed by XIVE for the CPU IPIs).
>> +         */
>> +        if (i + ics->offset < XICS_IRQ_BASE) {
>> +            continue;
>> +        }
>> +
>>          state = irq->server;
>>          state |= (uint64_t)(irq->saved_priority & KVM_XICS_PRIORITY_MASK)
>>              << KVM_XICS_PRIORITY_SHIFT;
>
David Gibson Feb. 13, 2019, 1:33 a.m. UTC | #3
On Tue, Feb 12, 2019 at 08:05:53AM +0100, Cédric Le Goater wrote:
> On 2/12/19 2:06 AM, David Gibson wrote:
> > On Mon, Jan 07, 2019 at 07:39:45PM +0100, Cédric Le Goater wrote:
> >> The IRQ number space of the XIVE and XICS interrupt mode are aligned
> >> when using the dual interrupt mode for the machine. This means that
> >> the ICS offset is set to zero in QEMU and that the KVM XICS device
> >> should be informed of this new value. Unfortunately, there is now way
> >> to do so and KVM still maintains the XICS_IRQ_BASE (0x1000) offset.
> >>
> >> Ignore the lower 4K which are not used under the XICS interrupt
> >> mode. These IRQ numbers are only claimed by XIVE for the CPU IPIs.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  hw/intc/xics_kvm.c | 18 ++++++++++++++++++
> >>  1 file changed, 18 insertions(+)
> >>
> >> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
> >> index 651bbfdf6966..1d21ff217b82 100644
> >> --- a/hw/intc/xics_kvm.c
> >> +++ b/hw/intc/xics_kvm.c
> >> @@ -238,6 +238,15 @@ static void ics_get_kvm_state(ICSState *ics)
> >>      for (i = 0; i < ics->nr_irqs; i++) {
> >>          ICSIRQState *irq = &ics->irqs[i];
> >>  
> >> +        /*
> >> +         * The KVM XICS device considers that the IRQ numbers should
> >> +         * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K
> >> +         * numbers (only claimed by XIVE for the CPU IPIs).
> >> +         */
> >> +        if (i + ics->offset < XICS_IRQ_BASE) {
> >> +            continue;
> >> +        }
> >> +
> > 
> > This seems bogus to me.  The guest-visible irq numbers need to line up
> > between xics and xive mode, yes, but that doesn't mean we need to keep
> > around a great big array of unused array of ICS irq states, even in
> > TCG mode.
> 
> This is because the qirqs[] array is under the machine and shared between 
> both interrupt modes, xics and xive.

I don't see how that follows.  ICSIRQState is indexed in terms of the
ICS source number, not the global irq number, so I don't see why it
has to match up with the qirq array.

> 
> C.
> 
> > 
> >>          kvm_device_access(kernel_xics_fd, KVM_DEV_XICS_GRP_SOURCES,
> >>                            i + ics->offset, &state, false, &error_fatal);
> >>  
> >> @@ -303,6 +312,15 @@ static int ics_set_kvm_state(ICSState *ics, int version_id)
> >>          ICSIRQState *irq = &ics->irqs[i];
> >>          int ret;
> >>  
> >> +        /*
> >> +         * The KVM XICS device considers that the IRQ numbers should
> >> +         * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K
> >> +         * numbers (only claimed by XIVE for the CPU IPIs).
> >> +         */
> >> +        if (i + ics->offset < XICS_IRQ_BASE) {
> >> +            continue;
> >> +        }
> >> +
> >>          state = irq->server;
> >>          state |= (uint64_t)(irq->saved_priority & KVM_XICS_PRIORITY_MASK)
> >>              << KVM_XICS_PRIORITY_SHIFT;
> > 
>
Cédric Le Goater Feb. 13, 2019, 8:03 a.m. UTC | #4
On 2/13/19 2:33 AM, David Gibson wrote:
> On Tue, Feb 12, 2019 at 08:05:53AM +0100, Cédric Le Goater wrote:
>> On 2/12/19 2:06 AM, David Gibson wrote:
>>> On Mon, Jan 07, 2019 at 07:39:45PM +0100, Cédric Le Goater wrote:
>>>> The IRQ number space of the XIVE and XICS interrupt mode are aligned
>>>> when using the dual interrupt mode for the machine. This means that
>>>> the ICS offset is set to zero in QEMU and that the KVM XICS device
>>>> should be informed of this new value. Unfortunately, there is now way
>>>> to do so and KVM still maintains the XICS_IRQ_BASE (0x1000) offset.
>>>>
>>>> Ignore the lower 4K which are not used under the XICS interrupt
>>>> mode. These IRQ numbers are only claimed by XIVE for the CPU IPIs.
>>>>
>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>> ---
>>>>  hw/intc/xics_kvm.c | 18 ++++++++++++++++++
>>>>  1 file changed, 18 insertions(+)
>>>>
>>>> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
>>>> index 651bbfdf6966..1d21ff217b82 100644
>>>> --- a/hw/intc/xics_kvm.c
>>>> +++ b/hw/intc/xics_kvm.c
>>>> @@ -238,6 +238,15 @@ static void ics_get_kvm_state(ICSState *ics)
>>>>      for (i = 0; i < ics->nr_irqs; i++) {
>>>>          ICSIRQState *irq = &ics->irqs[i];
>>>>  
>>>> +        /*
>>>> +         * The KVM XICS device considers that the IRQ numbers should
>>>> +         * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K
>>>> +         * numbers (only claimed by XIVE for the CPU IPIs).
>>>> +         */
>>>> +        if (i + ics->offset < XICS_IRQ_BASE) {
>>>> +            continue;
>>>> +        }
>>>> +
>>>
>>> This seems bogus to me.  The guest-visible irq numbers need to line up
>>> between xics and xive mode, yes, but that doesn't mean we need to keep
>>> around a great big array of unused array of ICS irq states, even in
>>> TCG mode.
>>
>> This is because the qirqs[] array is under the machine and shared between 
>> both interrupt modes, xics and xive.
> 
> I don't see how that follows.  ICSIRQState is indexed in terms of the
> ICS source number, not the global irq number, so I don't see why it
> has to match up with the qirq array.

The root cause is the use of spapr->irq->nr_irqs to initialize the ICS 
and sPAPRXive object. In case of the 'dual' backend, it covers the full 
XIVE IRQ number space (0x2000 today) but XICS only needs 0x1000.

I think we can fix the offset issue by using the appropriate nr_irqs 
which should be for the XICS backend : spapr->irq->nr_irqs - ics->offset


I keep in mind the XIVE support for nested guests and I think we will
need to extend the IRQ number space in L1 and have the L2 use a portion
of it (using an offset).     

C.
 
>>>
>>>>          kvm_device_access(kernel_xics_fd, KVM_DEV_XICS_GRP_SOURCES,
>>>>                            i + ics->offset, &state, false, &error_fatal);
>>>>  
>>>> @@ -303,6 +312,15 @@ static int ics_set_kvm_state(ICSState *ics, int version_id)
>>>>          ICSIRQState *irq = &ics->irqs[i];
>>>>          int ret;
>>>>  
>>>> +        /*
>>>> +         * The KVM XICS device considers that the IRQ numbers should
>>>> +         * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K
>>>> +         * numbers (only claimed by XIVE for the CPU IPIs).
>>>> +         */
>>>> +        if (i + ics->offset < XICS_IRQ_BASE) {
>>>> +            continue;
>>>> +        }
>>>> +
>>>>          state = irq->server;
>>>>          state |= (uint64_t)(irq->saved_priority & KVM_XICS_PRIORITY_MASK)
>>>>              << KVM_XICS_PRIORITY_SHIFT;
>>>
>>
>
Greg Kurz Feb. 13, 2019, 11:27 a.m. UTC | #5
On Wed, 13 Feb 2019 09:03:33 +0100
Cédric Le Goater <clg@kaod.org> wrote:

> On 2/13/19 2:33 AM, David Gibson wrote:
> > On Tue, Feb 12, 2019 at 08:05:53AM +0100, Cédric Le Goater wrote:  
> >> On 2/12/19 2:06 AM, David Gibson wrote:  
> >>> On Mon, Jan 07, 2019 at 07:39:45PM +0100, Cédric Le Goater wrote:  
> >>>> The IRQ number space of the XIVE and XICS interrupt mode are aligned
> >>>> when using the dual interrupt mode for the machine. This means that
> >>>> the ICS offset is set to zero in QEMU and that the KVM XICS device
> >>>> should be informed of this new value. Unfortunately, there is now way
> >>>> to do so and KVM still maintains the XICS_IRQ_BASE (0x1000) offset.
> >>>>
> >>>> Ignore the lower 4K which are not used under the XICS interrupt
> >>>> mode. These IRQ numbers are only claimed by XIVE for the CPU IPIs.
> >>>>
> >>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >>>> ---
> >>>>  hw/intc/xics_kvm.c | 18 ++++++++++++++++++
> >>>>  1 file changed, 18 insertions(+)
> >>>>
> >>>> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
> >>>> index 651bbfdf6966..1d21ff217b82 100644
> >>>> --- a/hw/intc/xics_kvm.c
> >>>> +++ b/hw/intc/xics_kvm.c
> >>>> @@ -238,6 +238,15 @@ static void ics_get_kvm_state(ICSState *ics)
> >>>>      for (i = 0; i < ics->nr_irqs; i++) {
> >>>>          ICSIRQState *irq = &ics->irqs[i];
> >>>>  
> >>>> +        /*
> >>>> +         * The KVM XICS device considers that the IRQ numbers should
> >>>> +         * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K
> >>>> +         * numbers (only claimed by XIVE for the CPU IPIs).
> >>>> +         */
> >>>> +        if (i + ics->offset < XICS_IRQ_BASE) {
> >>>> +            continue;
> >>>> +        }
> >>>> +  
> >>>
> >>> This seems bogus to me.  The guest-visible irq numbers need to line up
> >>> between xics and xive mode, yes, but that doesn't mean we need to keep
> >>> around a great big array of unused array of ICS irq states, even in
> >>> TCG mode.  
> >>
> >> This is because the qirqs[] array is under the machine and shared between 
> >> both interrupt modes, xics and xive.  
> > 
> > I don't see how that follows.  ICSIRQState is indexed in terms of the
> > ICS source number, not the global irq number, so I don't see why it
> > has to match up with the qirq array.  
> 
> The root cause is the use of spapr->irq->nr_irqs to initialize the ICS 
> and sPAPRXive object. In case of the 'dual' backend, it covers the full 
> XIVE IRQ number space (0x2000 today) but XICS only needs 0x1000.
> 
> I think we can fix the offset issue by using the appropriate nr_irqs 
> which should be for the XICS backend : spapr->irq->nr_irqs - ics->offset
> 

Since the root cause is that the value of spapr->irq->nr_irqs should
be different in XIVE and XICS, what about fixing it during reset ?

Something like:

static void spapr_irq_reset_dual(sPAPRMachineState *spapr, Error **errp)
{
    [...]

    spapr->irq->nr_irqs = spapr_irq_current(spapr)->nr_irqs;

    spapr_irq_current(spapr)->reset(spapr, errp);
}

> 
> I keep in mind the XIVE support for nested guests and I think we will
> need to extend the IRQ number space in L1 and have the L2 use a portion
> of it (using an offset).     
> 
> C.
>  
> >>>  
> >>>>          kvm_device_access(kernel_xics_fd, KVM_DEV_XICS_GRP_SOURCES,
> >>>>                            i + ics->offset, &state, false, &error_fatal);
> >>>>  
> >>>> @@ -303,6 +312,15 @@ static int ics_set_kvm_state(ICSState *ics, int version_id)
> >>>>          ICSIRQState *irq = &ics->irqs[i];
> >>>>          int ret;
> >>>>  
> >>>> +        /*
> >>>> +         * The KVM XICS device considers that the IRQ numbers should
> >>>> +         * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K
> >>>> +         * numbers (only claimed by XIVE for the CPU IPIs).
> >>>> +         */
> >>>> +        if (i + ics->offset < XICS_IRQ_BASE) {
> >>>> +            continue;
> >>>> +        }
> >>>> +
> >>>>          state = irq->server;
> >>>>          state |= (uint64_t)(irq->saved_priority & KVM_XICS_PRIORITY_MASK)
> >>>>              << KVM_XICS_PRIORITY_SHIFT;  
> >>>  
> >>  
> >   
> 
>
Greg Kurz Feb. 13, 2019, 12:11 p.m. UTC | #6
On Wed, 13 Feb 2019 12:27:13 +0100
Greg Kurz <groug@kaod.org> wrote:

> On Wed, 13 Feb 2019 09:03:33 +0100
> Cédric Le Goater <clg@kaod.org> wrote:
> 
> > On 2/13/19 2:33 AM, David Gibson wrote:  
> > > On Tue, Feb 12, 2019 at 08:05:53AM +0100, Cédric Le Goater wrote:    
> > >> On 2/12/19 2:06 AM, David Gibson wrote:    
> > >>> On Mon, Jan 07, 2019 at 07:39:45PM +0100, Cédric Le Goater wrote:    
> > >>>> The IRQ number space of the XIVE and XICS interrupt mode are aligned
> > >>>> when using the dual interrupt mode for the machine. This means that
> > >>>> the ICS offset is set to zero in QEMU and that the KVM XICS device
> > >>>> should be informed of this new value. Unfortunately, there is now way
> > >>>> to do so and KVM still maintains the XICS_IRQ_BASE (0x1000) offset.
> > >>>>
> > >>>> Ignore the lower 4K which are not used under the XICS interrupt
> > >>>> mode. These IRQ numbers are only claimed by XIVE for the CPU IPIs.
> > >>>>
> > >>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> > >>>> ---
> > >>>>  hw/intc/xics_kvm.c | 18 ++++++++++++++++++
> > >>>>  1 file changed, 18 insertions(+)
> > >>>>
> > >>>> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
> > >>>> index 651bbfdf6966..1d21ff217b82 100644
> > >>>> --- a/hw/intc/xics_kvm.c
> > >>>> +++ b/hw/intc/xics_kvm.c
> > >>>> @@ -238,6 +238,15 @@ static void ics_get_kvm_state(ICSState *ics)
> > >>>>      for (i = 0; i < ics->nr_irqs; i++) {
> > >>>>          ICSIRQState *irq = &ics->irqs[i];
> > >>>>  
> > >>>> +        /*
> > >>>> +         * The KVM XICS device considers that the IRQ numbers should
> > >>>> +         * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K
> > >>>> +         * numbers (only claimed by XIVE for the CPU IPIs).
> > >>>> +         */
> > >>>> +        if (i + ics->offset < XICS_IRQ_BASE) {
> > >>>> +            continue;
> > >>>> +        }
> > >>>> +    
> > >>>
> > >>> This seems bogus to me.  The guest-visible irq numbers need to line up
> > >>> between xics and xive mode, yes, but that doesn't mean we need to keep
> > >>> around a great big array of unused array of ICS irq states, even in
> > >>> TCG mode.    
> > >>
> > >> This is because the qirqs[] array is under the machine and shared between 
> > >> both interrupt modes, xics and xive.    
> > > 
> > > I don't see how that follows.  ICSIRQState is indexed in terms of the
> > > ICS source number, not the global irq number, so I don't see why it
> > > has to match up with the qirq array.    
> > 
> > The root cause is the use of spapr->irq->nr_irqs to initialize the ICS 
> > and sPAPRXive object. In case of the 'dual' backend, it covers the full 
> > XIVE IRQ number space (0x2000 today) but XICS only needs 0x1000.
> > 
> > I think we can fix the offset issue by using the appropriate nr_irqs 
> > which should be for the XICS backend : spapr->irq->nr_irqs - ics->offset
> >   
> 
> Since the root cause is that the value of spapr->irq->nr_irqs should
> be different in XIVE and XICS, what about fixing it during reset ?
> 

Nah this doesn't make sense :)

But if XICS always needs 0x1000, why just not change spapr_irq_init_xics()
to use SPAPR_IRQ_XICS_NR_IRQS instead of spapr->irq->nr_irqs ?

> Something like:
> 
> static void spapr_irq_reset_dual(sPAPRMachineState *spapr, Error **errp)
> {
>     [...]
> 
>     spapr->irq->nr_irqs = spapr_irq_current(spapr)->nr_irqs;
> 
>     spapr_irq_current(spapr)->reset(spapr, errp);
> }
> 
> > 
> > I keep in mind the XIVE support for nested guests and I think we will
> > need to extend the IRQ number space in L1 and have the L2 use a portion
> > of it (using an offset).     
> > 
> > C.
> >    
> > >>>    
> > >>>>          kvm_device_access(kernel_xics_fd, KVM_DEV_XICS_GRP_SOURCES,
> > >>>>                            i + ics->offset, &state, false, &error_fatal);
> > >>>>  
> > >>>> @@ -303,6 +312,15 @@ static int ics_set_kvm_state(ICSState *ics, int version_id)
> > >>>>          ICSIRQState *irq = &ics->irqs[i];
> > >>>>          int ret;
> > >>>>  
> > >>>> +        /*
> > >>>> +         * The KVM XICS device considers that the IRQ numbers should
> > >>>> +         * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K
> > >>>> +         * numbers (only claimed by XIVE for the CPU IPIs).
> > >>>> +         */
> > >>>> +        if (i + ics->offset < XICS_IRQ_BASE) {
> > >>>> +            continue;
> > >>>> +        }
> > >>>> +
> > >>>>          state = irq->server;
> > >>>>          state |= (uint64_t)(irq->saved_priority & KVM_XICS_PRIORITY_MASK)
> > >>>>              << KVM_XICS_PRIORITY_SHIFT;    
> > >>>    
> > >>    
> > >     
> > 
> >   
> 
>
diff mbox series

Patch

diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
index 651bbfdf6966..1d21ff217b82 100644
--- a/hw/intc/xics_kvm.c
+++ b/hw/intc/xics_kvm.c
@@ -238,6 +238,15 @@  static void ics_get_kvm_state(ICSState *ics)
     for (i = 0; i < ics->nr_irqs; i++) {
         ICSIRQState *irq = &ics->irqs[i];
 
+        /*
+         * The KVM XICS device considers that the IRQ numbers should
+         * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K
+         * numbers (only claimed by XIVE for the CPU IPIs).
+         */
+        if (i + ics->offset < XICS_IRQ_BASE) {
+            continue;
+        }
+
         kvm_device_access(kernel_xics_fd, KVM_DEV_XICS_GRP_SOURCES,
                           i + ics->offset, &state, false, &error_fatal);
 
@@ -303,6 +312,15 @@  static int ics_set_kvm_state(ICSState *ics, int version_id)
         ICSIRQState *irq = &ics->irqs[i];
         int ret;
 
+        /*
+         * The KVM XICS device considers that the IRQ numbers should
+         * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K
+         * numbers (only claimed by XIVE for the CPU IPIs).
+         */
+        if (i + ics->offset < XICS_IRQ_BASE) {
+            continue;
+        }
+
         state = irq->server;
         state |= (uint64_t)(irq->saved_priority & KVM_XICS_PRIORITY_MASK)
             << KVM_XICS_PRIORITY_SHIFT;