diff mbox

[v5,00/18] IOMMU: Enable interrupt remapping for Intel IOMMU

Message ID 20160503032216.GA3296@pxdev.xzpeter.org
State New
Headers show

Commit Message

Peter Xu May 3, 2016, 3:22 a.m. UTC
On Fri, Apr 29, 2016 at 09:52:14PM +0200, Radim Krčmář wrote:
> 2016-04-28 17:18+0800, Peter Xu:
> > On Thu, Apr 28, 2016 at 09:19:28AM +0200, Jan Kiszka wrote:
> >> Instead of fiddling with irq routes for the IOAPIC - where we don't need
> >> it -, I would suggest to do the following: Send IOAPIC events via
> >> kvm_irqchip_send_msi to the kernel. Only irqfd users (vhost, vfio)
> >> should use the pattern you are now applying to the IOAPIC: establish
> >> static routes as an irqfd is set up, and that route should be translated
> >> by the iommu first, register an IEC notifier to update any affected
> >> route when the iommu translation changes.
> > 
> > Yes, maybe that's the right thing to do. Or say, when split irqchip,
> > IOAPIC can avoid using GSI routes any more. If with that, I should
> > also remove lots of things, like: IEC notifiers for IOAPIC, and all
> > things related to msi route sync-up in IOAPIC codes with KVM (so I
> > suppose we will save 24 gsi route entries for KVM, which sounds
> > good).
> 
> Sadly, we can't get rid of those GSI routes.  KVM uses them to decide
> whether it should forward EOI to userspace.  And QEMU also has to remap
> them, because KVM uses dest_id for that decision.

Thanks to point out.  Actually I have started to test the new patch
and did encountered issue when using split irqchip mode. The above
explains. :)

So I think I will keep this part of codes un-touched in this
series, and I can move forward with IR changes.

However, I am still wondering whether we can avoid this in KVM when
when using split irqchip mode? E.g., what if we do not do this check
in KVM and let all EOI be passed to userspace? Like:

---


---

I believe this will brings more user-space context switches, I just
do not know how this will impact the system (only for curiosity :).

Thanks,

-- peterx

Comments

Jan Kiszka May 3, 2016, 4:38 a.m. UTC | #1
On 2016-05-03 05:22, Peter Xu wrote:
> On Fri, Apr 29, 2016 at 09:52:14PM +0200, Radim Krčmář wrote:
>> 2016-04-28 17:18+0800, Peter Xu:
>>> On Thu, Apr 28, 2016 at 09:19:28AM +0200, Jan Kiszka wrote:
>>>> Instead of fiddling with irq routes for the IOAPIC - where we don't need
>>>> it -, I would suggest to do the following: Send IOAPIC events via
>>>> kvm_irqchip_send_msi to the kernel. Only irqfd users (vhost, vfio)
>>>> should use the pattern you are now applying to the IOAPIC: establish
>>>> static routes as an irqfd is set up, and that route should be translated
>>>> by the iommu first, register an IEC notifier to update any affected
>>>> route when the iommu translation changes.
>>>
>>> Yes, maybe that's the right thing to do. Or say, when split irqchip,
>>> IOAPIC can avoid using GSI routes any more. If with that, I should
>>> also remove lots of things, like: IEC notifiers for IOAPIC, and all
>>> things related to msi route sync-up in IOAPIC codes with KVM (so I
>>> suppose we will save 24 gsi route entries for KVM, which sounds
>>> good).
>>
>> Sadly, we can't get rid of those GSI routes.  KVM uses them to decide
>> whether it should forward EOI to userspace.  And QEMU also has to remap
>> them, because KVM uses dest_id for that decision.
> 
> Thanks to point out.  Actually I have started to test the new patch
> and did encountered issue when using split irqchip mode. The above
> explains. :)
> 
> So I think I will keep this part of codes un-touched in this
> series, and I can move forward with IR changes.
> 
> However, I am still wondering whether we can avoid this in KVM when
> when using split irqchip mode? E.g., what if we do not do this check
> in KVM and let all EOI be passed to userspace? Like:
> 
> ---
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 36591fa..c0a6e51 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -936,10 +936,6 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
>  {
>         int trigger_mode;
> 
> -       /* Eoi the ioapic only if the ioapic doesn't own the vector. */
> -       if (!kvm_ioapic_handles_vector(apic, vector))
> -               return;
> -
>         /* Request a KVM exit to inform the userspace IOAPIC. */
>         if (irqchip_split(apic->vcpu->kvm)) {
>                 apic->vcpu->arch.pending_ioapic_eoi = vector;
> @@ -947,6 +943,10 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
>                 return;
>         }
> 
> +       /* Eoi the ioapic only if the ioapic doesn't own the vector. */
> +       if (!kvm_ioapic_handles_vector(apic, vector))
> +               return;
> +
>         if (apic_test_vector(vector, apic->regs + APIC_TMR))
>                 trigger_mode = IOAPIC_LEVEL_TRIG;
>         else
> 
> ---
> 
> I believe this will brings more user-space context switches, I just
> do not know how this will impact the system (only for curiosity :).

Yes, this doesn't look good from the performance POV. Given that most
EOIs of the APICs will not trigger a message to an IOAPIC and userspace
exits are expensive, that should be measurable.

But you should optimize route updating a bit: I noticed real delays
(almost a second) when reprogramming all IOAPIC pins during Jailhouse
handover. That's because of the heavyweight synchronize_srcu we have on
each route change. Even if that is an extreme case, you should try to
reduce route updates to only those that were actually affected by an
IRTE change.

Jan
Peter Xu May 3, 2016, 5:30 a.m. UTC | #2
On Tue, May 03, 2016 at 06:38:28AM +0200, Jan Kiszka wrote:
> On 2016-05-03 05:22, Peter Xu wrote:
> > On Fri, Apr 29, 2016 at 09:52:14PM +0200, Radim Krčmář wrote:
> >> 2016-04-28 17:18+0800, Peter Xu:
> >>> On Thu, Apr 28, 2016 at 09:19:28AM +0200, Jan Kiszka wrote:
> >>>> Instead of fiddling with irq routes for the IOAPIC - where we don't need
> >>>> it -, I would suggest to do the following: Send IOAPIC events via
> >>>> kvm_irqchip_send_msi to the kernel. Only irqfd users (vhost, vfio)
> >>>> should use the pattern you are now applying to the IOAPIC: establish
> >>>> static routes as an irqfd is set up, and that route should be translated
> >>>> by the iommu first, register an IEC notifier to update any affected
> >>>> route when the iommu translation changes.
> >>>
> >>> Yes, maybe that's the right thing to do. Or say, when split irqchip,
> >>> IOAPIC can avoid using GSI routes any more. If with that, I should
> >>> also remove lots of things, like: IEC notifiers for IOAPIC, and all
> >>> things related to msi route sync-up in IOAPIC codes with KVM (so I
> >>> suppose we will save 24 gsi route entries for KVM, which sounds
> >>> good).
> >>
> >> Sadly, we can't get rid of those GSI routes.  KVM uses them to decide
> >> whether it should forward EOI to userspace.  And QEMU also has to remap
> >> them, because KVM uses dest_id for that decision.
> > 
> > Thanks to point out.  Actually I have started to test the new patch
> > and did encountered issue when using split irqchip mode. The above
> > explains. :)
> > 
> > So I think I will keep this part of codes un-touched in this
> > series, and I can move forward with IR changes.
> > 
> > However, I am still wondering whether we can avoid this in KVM when
> > when using split irqchip mode? E.g., what if we do not do this check
> > in KVM and let all EOI be passed to userspace? Like:
> > 
> > ---
> > 
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 36591fa..c0a6e51 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -936,10 +936,6 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
> >  {
> >         int trigger_mode;
> > 
> > -       /* Eoi the ioapic only if the ioapic doesn't own the vector. */
> > -       if (!kvm_ioapic_handles_vector(apic, vector))
> > -               return;
> > -
> >         /* Request a KVM exit to inform the userspace IOAPIC. */
> >         if (irqchip_split(apic->vcpu->kvm)) {
> >                 apic->vcpu->arch.pending_ioapic_eoi = vector;
> > @@ -947,6 +943,10 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
> >                 return;
> >         }
> > 
> > +       /* Eoi the ioapic only if the ioapic doesn't own the vector. */
> > +       if (!kvm_ioapic_handles_vector(apic, vector))
> > +               return;
> > +
> >         if (apic_test_vector(vector, apic->regs + APIC_TMR))
> >                 trigger_mode = IOAPIC_LEVEL_TRIG;
> >         else
> > 
> > ---
> > 
> > I believe this will brings more user-space context switches, I just
> > do not know how this will impact the system (only for curiosity :).
> 
> Yes, this doesn't look good from the performance POV. Given that most
> EOIs of the APICs will not trigger a message to an IOAPIC and userspace
> exits are expensive, that should be measurable.
> 
> But you should optimize route updating a bit: I noticed real delays
> (almost a second) when reprogramming all IOAPIC pins during Jailhouse
> handover. That's because of the heavyweight synchronize_srcu we have on
> each route change. Even if that is an extreme case, you should try to
> reduce route updates to only those that were actually affected by an
> IRTE change.

Yes, this solution will possibly need one more IOMMU API besides
int_remap(), for IOAPIC to know whether specific IOAPIC entry needs
to be updated (IOAPIC needs to know the index information of each
IOAPIC entry, and see whether it is matched with IEC notified index
and mask, in ioapic_iec_notifier()).

Another problem that may cause this issue is that: now we are
committing route information every time when updating route entries
(please check kvm_update_routing_entry()). IMHO this is not
necessary - we will need 25 times of committing for each IEC
invalication, while actually we only need one at the end.

So... We'll have another solution for this: how about not committing
for each update, and just do one commit after all changes settled?
This seems to be much cleaner, and easier comparing to the above
proposal.

(These two solutions actually solves different problems, and either
 of them should help in reducing the 1 sec delay mentioned
 above. For now, I'd prefer first choose the 2nd solution to do
 explicit route commit, and add the 1st one into my todo list)

Thanks,

-- peterx
Jan Kiszka May 3, 2016, 5:40 a.m. UTC | #3
On 2016-05-03 07:30, Peter Xu wrote:
> On Tue, May 03, 2016 at 06:38:28AM +0200, Jan Kiszka wrote:
>> On 2016-05-03 05:22, Peter Xu wrote:
>>> On Fri, Apr 29, 2016 at 09:52:14PM +0200, Radim Krčmář wrote:
>>>> 2016-04-28 17:18+0800, Peter Xu:
>>>>> On Thu, Apr 28, 2016 at 09:19:28AM +0200, Jan Kiszka wrote:
>>>>>> Instead of fiddling with irq routes for the IOAPIC - where we don't need
>>>>>> it -, I would suggest to do the following: Send IOAPIC events via
>>>>>> kvm_irqchip_send_msi to the kernel. Only irqfd users (vhost, vfio)
>>>>>> should use the pattern you are now applying to the IOAPIC: establish
>>>>>> static routes as an irqfd is set up, and that route should be translated
>>>>>> by the iommu first, register an IEC notifier to update any affected
>>>>>> route when the iommu translation changes.
>>>>>
>>>>> Yes, maybe that's the right thing to do. Or say, when split irqchip,
>>>>> IOAPIC can avoid using GSI routes any more. If with that, I should
>>>>> also remove lots of things, like: IEC notifiers for IOAPIC, and all
>>>>> things related to msi route sync-up in IOAPIC codes with KVM (so I
>>>>> suppose we will save 24 gsi route entries for KVM, which sounds
>>>>> good).
>>>>
>>>> Sadly, we can't get rid of those GSI routes.  KVM uses them to decide
>>>> whether it should forward EOI to userspace.  And QEMU also has to remap
>>>> them, because KVM uses dest_id for that decision.
>>>
>>> Thanks to point out.  Actually I have started to test the new patch
>>> and did encountered issue when using split irqchip mode. The above
>>> explains. :)
>>>
>>> So I think I will keep this part of codes un-touched in this
>>> series, and I can move forward with IR changes.
>>>
>>> However, I am still wondering whether we can avoid this in KVM when
>>> when using split irqchip mode? E.g., what if we do not do this check
>>> in KVM and let all EOI be passed to userspace? Like:
>>>
>>> ---
>>>
>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>> index 36591fa..c0a6e51 100644
>>> --- a/arch/x86/kvm/lapic.c
>>> +++ b/arch/x86/kvm/lapic.c
>>> @@ -936,10 +936,6 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
>>>  {
>>>         int trigger_mode;
>>>
>>> -       /* Eoi the ioapic only if the ioapic doesn't own the vector. */
>>> -       if (!kvm_ioapic_handles_vector(apic, vector))
>>> -               return;
>>> -
>>>         /* Request a KVM exit to inform the userspace IOAPIC. */
>>>         if (irqchip_split(apic->vcpu->kvm)) {
>>>                 apic->vcpu->arch.pending_ioapic_eoi = vector;
>>> @@ -947,6 +943,10 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
>>>                 return;
>>>         }
>>>
>>> +       /* Eoi the ioapic only if the ioapic doesn't own the vector. */
>>> +       if (!kvm_ioapic_handles_vector(apic, vector))
>>> +               return;
>>> +
>>>         if (apic_test_vector(vector, apic->regs + APIC_TMR))
>>>                 trigger_mode = IOAPIC_LEVEL_TRIG;
>>>         else
>>>
>>> ---
>>>
>>> I believe this will brings more user-space context switches, I just
>>> do not know how this will impact the system (only for curiosity :).
>>
>> Yes, this doesn't look good from the performance POV. Given that most
>> EOIs of the APICs will not trigger a message to an IOAPIC and userspace
>> exits are expensive, that should be measurable.
>>
>> But you should optimize route updating a bit: I noticed real delays
>> (almost a second) when reprogramming all IOAPIC pins during Jailhouse
>> handover. That's because of the heavyweight synchronize_srcu we have on
>> each route change. Even if that is an extreme case, you should try to
>> reduce route updates to only those that were actually affected by an
>> IRTE change.
> 
> Yes, this solution will possibly need one more IOMMU API besides
> int_remap(), for IOAPIC to know whether specific IOAPIC entry needs
> to be updated (IOAPIC needs to know the index information of each
> IOAPIC entry, and see whether it is matched with IEC notified index
> and mask, in ioapic_iec_notifier()).
> 
> Another problem that may cause this issue is that: now we are
> committing route information every time when updating route entries
> (please check kvm_update_routing_entry()). IMHO this is not
> necessary - we will need 25 times of committing for each IEC
> invalication, while actually we only need one at the end.
> 
> So... We'll have another solution for this: how about not committing
> for each update, and just do one commit after all changes settled?
> This seems to be much cleaner, and easier comparing to the above
> proposal.

If you see a number of invalidation requests queued in a row, you may
also process them atomically from the guest POV, which includes a single
route update at the end, before the guest gets told that everything was
done.

Jan

> 
> (These two solutions actually solves different problems, and either
>  of them should help in reducing the 1 sec delay mentioned
>  above. For now, I'd prefer first choose the 2nd solution to do
>  explicit route commit, and add the 1st one into my todo list)
> 
> Thanks,
> 
> -- peterx
>
Peter Xu May 3, 2016, 6 a.m. UTC | #4
On Tue, May 03, 2016 at 07:40:50AM +0200, Jan Kiszka wrote:
> On 2016-05-03 07:30, Peter Xu wrote:
> > On Tue, May 03, 2016 at 06:38:28AM +0200, Jan Kiszka wrote:
> >> On 2016-05-03 05:22, Peter Xu wrote:
> >>> On Fri, Apr 29, 2016 at 09:52:14PM +0200, Radim Krčmář wrote:
> >>>> 2016-04-28 17:18+0800, Peter Xu:
> >>>>> On Thu, Apr 28, 2016 at 09:19:28AM +0200, Jan Kiszka wrote:
> >>>>>> Instead of fiddling with irq routes for the IOAPIC - where we don't need
> >>>>>> it -, I would suggest to do the following: Send IOAPIC events via
> >>>>>> kvm_irqchip_send_msi to the kernel. Only irqfd users (vhost, vfio)
> >>>>>> should use the pattern you are now applying to the IOAPIC: establish
> >>>>>> static routes as an irqfd is set up, and that route should be translated
> >>>>>> by the iommu first, register an IEC notifier to update any affected
> >>>>>> route when the iommu translation changes.
> >>>>>
> >>>>> Yes, maybe that's the right thing to do. Or say, when split irqchip,
> >>>>> IOAPIC can avoid using GSI routes any more. If with that, I should
> >>>>> also remove lots of things, like: IEC notifiers for IOAPIC, and all
> >>>>> things related to msi route sync-up in IOAPIC codes with KVM (so I
> >>>>> suppose we will save 24 gsi route entries for KVM, which sounds
> >>>>> good).
> >>>>
> >>>> Sadly, we can't get rid of those GSI routes.  KVM uses them to decide
> >>>> whether it should forward EOI to userspace.  And QEMU also has to remap
> >>>> them, because KVM uses dest_id for that decision.
> >>>
> >>> Thanks to point out.  Actually I have started to test the new patch
> >>> and did encountered issue when using split irqchip mode. The above
> >>> explains. :)
> >>>
> >>> So I think I will keep this part of codes un-touched in this
> >>> series, and I can move forward with IR changes.
> >>>
> >>> However, I am still wondering whether we can avoid this in KVM when
> >>> when using split irqchip mode? E.g., what if we do not do this check
> >>> in KVM and let all EOI be passed to userspace? Like:
> >>>
> >>> ---
> >>>
> >>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> >>> index 36591fa..c0a6e51 100644
> >>> --- a/arch/x86/kvm/lapic.c
> >>> +++ b/arch/x86/kvm/lapic.c
> >>> @@ -936,10 +936,6 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
> >>>  {
> >>>         int trigger_mode;
> >>>
> >>> -       /* Eoi the ioapic only if the ioapic doesn't own the vector. */
> >>> -       if (!kvm_ioapic_handles_vector(apic, vector))
> >>> -               return;
> >>> -
> >>>         /* Request a KVM exit to inform the userspace IOAPIC. */
> >>>         if (irqchip_split(apic->vcpu->kvm)) {
> >>>                 apic->vcpu->arch.pending_ioapic_eoi = vector;
> >>> @@ -947,6 +943,10 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
> >>>                 return;
> >>>         }
> >>>
> >>> +       /* Eoi the ioapic only if the ioapic doesn't own the vector. */
> >>> +       if (!kvm_ioapic_handles_vector(apic, vector))
> >>> +               return;
> >>> +
> >>>         if (apic_test_vector(vector, apic->regs + APIC_TMR))
> >>>                 trigger_mode = IOAPIC_LEVEL_TRIG;
> >>>         else
> >>>
> >>> ---
> >>>
> >>> I believe this will brings more user-space context switches, I just
> >>> do not know how this will impact the system (only for curiosity :).
> >>
> >> Yes, this doesn't look good from the performance POV. Given that most
> >> EOIs of the APICs will not trigger a message to an IOAPIC and userspace
> >> exits are expensive, that should be measurable.
> >>
> >> But you should optimize route updating a bit: I noticed real delays
> >> (almost a second) when reprogramming all IOAPIC pins during Jailhouse
> >> handover. That's because of the heavyweight synchronize_srcu we have on
> >> each route change. Even if that is an extreme case, you should try to
> >> reduce route updates to only those that were actually affected by an
> >> IRTE change.
> > 
> > Yes, this solution will possibly need one more IOMMU API besides
> > int_remap(), for IOAPIC to know whether specific IOAPIC entry needs
> > to be updated (IOAPIC needs to know the index information of each
> > IOAPIC entry, and see whether it is matched with IEC notified index
> > and mask, in ioapic_iec_notifier()).
> > 
> > Another problem that may cause this issue is that: now we are
> > committing route information every time when updating route entries
> > (please check kvm_update_routing_entry()). IMHO this is not
> > necessary - we will need 25 times of committing for each IEC
> > invalication, while actually we only need one at the end.
> > 
> > So... We'll have another solution for this: how about not committing
> > for each update, and just do one commit after all changes settled?
> > This seems to be much cleaner, and easier comparing to the above
> > proposal.
> 
> If you see a number of invalidation requests queued in a row, you may
> also process them atomically from the guest POV, which includes a single
> route update at the end, before the guest gets told that everything was
> done.

Yes, this would be better as a 3rd step after the above two. If you
would not mind, I'll noting this down as well with the 1st proposal
(considering that this will need more code changes, not only in QI
logic, also in IEC notification path), and will only contain the 2nd
change (explicit route commit) in v6, to partially solve the problem
for now.

Thanks,

-- peterx
Jan Kiszka May 3, 2016, 6:09 a.m. UTC | #5
On 2016-05-03 08:00, Peter Xu wrote:
> On Tue, May 03, 2016 at 07:40:50AM +0200, Jan Kiszka wrote:
>> On 2016-05-03 07:30, Peter Xu wrote:
>>> On Tue, May 03, 2016 at 06:38:28AM +0200, Jan Kiszka wrote:
>>>> On 2016-05-03 05:22, Peter Xu wrote:
>>>>> On Fri, Apr 29, 2016 at 09:52:14PM +0200, Radim Krčmář wrote:
>>>>>> 2016-04-28 17:18+0800, Peter Xu:
>>>>>>> On Thu, Apr 28, 2016 at 09:19:28AM +0200, Jan Kiszka wrote:
>>>>>>>> Instead of fiddling with irq routes for the IOAPIC - where we don't need
>>>>>>>> it -, I would suggest to do the following: Send IOAPIC events via
>>>>>>>> kvm_irqchip_send_msi to the kernel. Only irqfd users (vhost, vfio)
>>>>>>>> should use the pattern you are now applying to the IOAPIC: establish
>>>>>>>> static routes as an irqfd is set up, and that route should be translated
>>>>>>>> by the iommu first, register an IEC notifier to update any affected
>>>>>>>> route when the iommu translation changes.
>>>>>>>
>>>>>>> Yes, maybe that's the right thing to do. Or say, when split irqchip,
>>>>>>> IOAPIC can avoid using GSI routes any more. If with that, I should
>>>>>>> also remove lots of things, like: IEC notifiers for IOAPIC, and all
>>>>>>> things related to msi route sync-up in IOAPIC codes with KVM (so I
>>>>>>> suppose we will save 24 gsi route entries for KVM, which sounds
>>>>>>> good).
>>>>>>
>>>>>> Sadly, we can't get rid of those GSI routes.  KVM uses them to decide
>>>>>> whether it should forward EOI to userspace.  And QEMU also has to remap
>>>>>> them, because KVM uses dest_id for that decision.
>>>>>
>>>>> Thanks to point out.  Actually I have started to test the new patch
>>>>> and did encountered issue when using split irqchip mode. The above
>>>>> explains. :)
>>>>>
>>>>> So I think I will keep this part of codes un-touched in this
>>>>> series, and I can move forward with IR changes.
>>>>>
>>>>> However, I am still wondering whether we can avoid this in KVM when
>>>>> when using split irqchip mode? E.g., what if we do not do this check
>>>>> in KVM and let all EOI be passed to userspace? Like:
>>>>>
>>>>> ---
>>>>>
>>>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>>>> index 36591fa..c0a6e51 100644
>>>>> --- a/arch/x86/kvm/lapic.c
>>>>> +++ b/arch/x86/kvm/lapic.c
>>>>> @@ -936,10 +936,6 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
>>>>>  {
>>>>>         int trigger_mode;
>>>>>
>>>>> -       /* Eoi the ioapic only if the ioapic doesn't own the vector. */
>>>>> -       if (!kvm_ioapic_handles_vector(apic, vector))
>>>>> -               return;
>>>>> -
>>>>>         /* Request a KVM exit to inform the userspace IOAPIC. */
>>>>>         if (irqchip_split(apic->vcpu->kvm)) {
>>>>>                 apic->vcpu->arch.pending_ioapic_eoi = vector;
>>>>> @@ -947,6 +943,10 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
>>>>>                 return;
>>>>>         }
>>>>>
>>>>> +       /* Eoi the ioapic only if the ioapic doesn't own the vector. */
>>>>> +       if (!kvm_ioapic_handles_vector(apic, vector))
>>>>> +               return;
>>>>> +
>>>>>         if (apic_test_vector(vector, apic->regs + APIC_TMR))
>>>>>                 trigger_mode = IOAPIC_LEVEL_TRIG;
>>>>>         else
>>>>>
>>>>> ---
>>>>>
>>>>> I believe this will brings more user-space context switches, I just
>>>>> do not know how this will impact the system (only for curiosity :).
>>>>
>>>> Yes, this doesn't look good from the performance POV. Given that most
>>>> EOIs of the APICs will not trigger a message to an IOAPIC and userspace
>>>> exits are expensive, that should be measurable.
>>>>
>>>> But you should optimize route updating a bit: I noticed real delays
>>>> (almost a second) when reprogramming all IOAPIC pins during Jailhouse
>>>> handover. That's because of the heavyweight synchronize_srcu we have on
>>>> each route change. Even if that is an extreme case, you should try to
>>>> reduce route updates to only those that were actually affected by an
>>>> IRTE change.
>>>
>>> Yes, this solution will possibly need one more IOMMU API besides
>>> int_remap(), for IOAPIC to know whether specific IOAPIC entry needs
>>> to be updated (IOAPIC needs to know the index information of each
>>> IOAPIC entry, and see whether it is matched with IEC notified index
>>> and mask, in ioapic_iec_notifier()).
>>>
>>> Another problem that may cause this issue is that: now we are
>>> committing route information every time when updating route entries
>>> (please check kvm_update_routing_entry()). IMHO this is not
>>> necessary - we will need 25 times of committing for each IEC
>>> invalication, while actually we only need one at the end.
>>>
>>> So... We'll have another solution for this: how about not committing
>>> for each update, and just do one commit after all changes settled?
>>> This seems to be much cleaner, and easier comparing to the above
>>> proposal.
>>
>> If you see a number of invalidation requests queued in a row, you may
>> also process them atomically from the guest POV, which includes a single
>> route update at the end, before the guest gets told that everything was
>> done.
> 
> Yes, this would be better as a 3rd step after the above two. If you
> would not mind, I'll noting this down as well with the 1st proposal
> (considering that this will need more code changes, not only in QI
> logic, also in IEC notification path), and will only contain the 2nd
> change (explicit route commit) in v6, to partially solve the problem
> for now.

Ack.

Jan
Paolo Bonzini May 9, 2016, 11:50 a.m. UTC | #6
On 03/05/2016 06:38, Jan Kiszka wrote:
> Yes, this doesn't look good from the performance POV. Given that most
> EOIs of the APICs will not trigger a message to an IOAPIC and userspace
> exits are expensive, that should be measurable.
> 
> But you should optimize route updating a bit: I noticed real delays
> (almost a second) when reprogramming all IOAPIC pins during Jailhouse
> handover. That's because of the heavyweight synchronize_srcu we have on
> each route change. Even if that is an extreme case, you should try to
> reduce route updates to only those that were actually affected by an
> IRTE change.

It's okay to make it synchronize_srcu_expedited.  Unlike RCU, expediting
SRCU grace periods doesn't have an impact on the whole system (the
reason why KVM uses SRCU here is exactly to avoid
synchronize_rcu_expedited).

Of course, optimizing userspace cannot hurt either.

Paolo
diff mbox

Patch

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 36591fa..c0a6e51 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -936,10 +936,6 @@  static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
 {
        int trigger_mode;

-       /* Eoi the ioapic only if the ioapic doesn't own the vector. */
-       if (!kvm_ioapic_handles_vector(apic, vector))
-               return;
-
        /* Request a KVM exit to inform the userspace IOAPIC. */
        if (irqchip_split(apic->vcpu->kvm)) {
                apic->vcpu->arch.pending_ioapic_eoi = vector;
@@ -947,6 +943,10 @@  static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
                return;
        }

+       /* Eoi the ioapic only if the ioapic doesn't own the vector. */
+       if (!kvm_ioapic_handles_vector(apic, vector))
+               return;
+
        if (apic_test_vector(vector, apic->regs + APIC_TMR))
                trigger_mode = IOAPIC_LEVEL_TRIG;
        else