diff mbox

[v4,0/5] vfio/pci: Fix up breakage against split irqchip and INTx

Message ID 20200318145204.74483-1-peterx@redhat.com
State New
Headers show

Commit Message

Peter Xu March 18, 2020, 2:51 p.m. UTC
v4:
- pick r-b and a-b for Alex without patch 4
- only kick resamplefd for level triggered irq (as 3.1 change on patch
  4) [Alex]
- fix mingw build error with below squashed into patch 4:

----------------------------------------
----------------------------------------

v3:
- collect r-bs for Eric
- unconditionally call kvm_resample_fd_notify(), change comment [Alex]
- remove the split irqchip check in kvm_resample_fd_notify(), then let
  it return nothing [Alex]
- test against shared irq to make sure it won't break

v2:
- pick tags
- don't register resamplefd with KVM kernel when the userspace
  resamplefd path is enabled (should enable fast path on new kernels)
- fix resamplefd mem leak
- fix commit message of patch 4 [Eric]
- let kvm_resample_fd_notify() return a boolean, skip ioapic check if
  returned true
- more comments here and there in the code to state the fact that
  userspace ioapic irr & remote-irr are bypassed [Paolo]

VFIO INTx is not working with split irqchip.  On new kernels KVM_IRQFD
will directly fail with resamplefd attached so QEMU will automatically
fallback to the INTx slow path.  However on old kernels it's still
broken.

Only until recently I noticed that this could also break PXE boot for
assigned NICs [1].  My wild guess is that the PXE ROM will be mostly
using INTx as well, which means we can't bypass that even if we
enables MSI for the guest kernel.

This series tries to first fix this issue function-wise, then speed up
for the INTx again with resamplefd (mostly following the ideas
proposed by Paolo one year ago [2]).  My TCP_RR test shows that:

  - Before this series: this is broken, no number to show

  - After patch 1 (enable slow path): get 63% perf comparing to full
    kernel irqchip

  - After whole series (enable fast path partly, irq injection will be
    the same as fast path, however userspace needs to intercept for
    EOI broadcast to resamplefd, though should still be faster than
    the MMIO trick for intx eoi): get 93% perf comparing to full
    kernel irqchip, which is a 46% performance boost

I think we can consider to apply patch 1 even sooner than the rest of
the series to unbreak intx+split first.

The whole test matrix for reference:

  |----------+---------+-------------------+--------------+--------------------|
  | IRQ type | irqchip | TCP_STREAM (Gbps) | TCP_RR (pps) | note               |
  |----------+---------+-------------------+--------------+--------------------|
  | msi      | on      |              9.39 |        17567 |                    |
  | nomsi    | on      |              9.29 |        14056 |                    |
  | msi      | split   |              9.36 |        17330 |                    |
  | nomsi    | split   |                 / |            / | currently broken   |
  | nomsi    | split   |              8.98 |         8977 | after patch 1      |
  | nomsi    | split   |              9.21 |        13142 | after whole series |
  |----------+---------+-------------------+--------------+--------------------|

Any review comment is welcomed.  Thanks,

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1786404
[2] https://patchwork.kernel.org/patch/10738541/#22609933

Peter Xu (5):
  vfio/pci: Disable INTx fast path if using split irqchip
  vfio/pci: Use kvm_irqchip_add_irqfd_notifier_gsi() for irqfds
  KVM: Pass EventNotifier into kvm_irqchip_assign_irqfd
  KVM: Kick resamplefd for split kernel irqchip
  Revert "vfio/pci: Disable INTx fast path if using split irqchip"

 accel/kvm/kvm-all.c    | 95 ++++++++++++++++++++++++++++++++++++++----
 accel/kvm/trace-events |  1 +
 hw/intc/ioapic.c       | 19 +++++++++
 hw/vfio/pci.c          | 37 +++++++---------
 include/sysemu/kvm.h   |  4 ++
 5 files changed, 126 insertions(+), 30 deletions(-)

Comments

Alex Williamson March 31, 2020, 10:32 p.m. UTC | #1
On Wed, 18 Mar 2020 10:51:59 -0400
Peter Xu <peterx@redhat.com> wrote:

> v4:
> - pick r-b and a-b for Alex without patch 4
> - only kick resamplefd for level triggered irq (as 3.1 change on patch
>   4) [Alex]
> - fix mingw build error with below squashed into patch 4:

IMO, it'd be nice to get this in for QEMU 5.0, were others thinking
something different?  I provided acks thinking Paolo might take it, but
I can send a pull request for it if Paolo wants to ack instead.  Thanks,

Alex

> ----------------------------------------
> diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
> index 2ae96e10be..b9ec570c03 100644
> --- a/hw/intc/ioapic.c
> +++ b/hw/intc/ioapic.c
> @@ -241,6 +241,7 @@ void ioapic_eoi_broadcast(int vector)
>                  continue;
>              }
> 
> +#ifdef CONFIG_KVM
>              /*
>               * When IOAPIC is in the userspace while APIC is still in
>               * the kernel (i.e., split irqchip), we have a trick to
> @@ -257,6 +258,7 @@ void ioapic_eoi_broadcast(int vector)
>               * emulated devices that are using/sharing the same IRQ.
>               */
>              kvm_resample_fd_notify(n);
> +#endif
> 
>              if (!(entry & IOAPIC_LVT_REMOTE_IRR)) {
>                  continue;
> ----------------------------------------
> 
> v3:
> - collect r-bs for Eric
> - unconditionally call kvm_resample_fd_notify(), change comment [Alex]
> - remove the split irqchip check in kvm_resample_fd_notify(), then let
>   it return nothing [Alex]
> - test against shared irq to make sure it won't break
> 
> v2:
> - pick tags
> - don't register resamplefd with KVM kernel when the userspace
>   resamplefd path is enabled (should enable fast path on new kernels)
> - fix resamplefd mem leak
> - fix commit message of patch 4 [Eric]
> - let kvm_resample_fd_notify() return a boolean, skip ioapic check if
>   returned true
> - more comments here and there in the code to state the fact that
>   userspace ioapic irr & remote-irr are bypassed [Paolo]
> 
> VFIO INTx is not working with split irqchip.  On new kernels KVM_IRQFD
> will directly fail with resamplefd attached so QEMU will automatically
> fallback to the INTx slow path.  However on old kernels it's still
> broken.
> 
> Only until recently I noticed that this could also break PXE boot for
> assigned NICs [1].  My wild guess is that the PXE ROM will be mostly
> using INTx as well, which means we can't bypass that even if we
> enables MSI for the guest kernel.
> 
> This series tries to first fix this issue function-wise, then speed up
> for the INTx again with resamplefd (mostly following the ideas
> proposed by Paolo one year ago [2]).  My TCP_RR test shows that:
> 
>   - Before this series: this is broken, no number to show
> 
>   - After patch 1 (enable slow path): get 63% perf comparing to full
>     kernel irqchip
> 
>   - After whole series (enable fast path partly, irq injection will be
>     the same as fast path, however userspace needs to intercept for
>     EOI broadcast to resamplefd, though should still be faster than
>     the MMIO trick for intx eoi): get 93% perf comparing to full
>     kernel irqchip, which is a 46% performance boost
> 
> I think we can consider to apply patch 1 even sooner than the rest of
> the series to unbreak intx+split first.
> 
> The whole test matrix for reference:
> 
>   |----------+---------+-------------------+--------------+--------------------|
>   | IRQ type | irqchip | TCP_STREAM (Gbps) | TCP_RR (pps) | note               |
>   |----------+---------+-------------------+--------------+--------------------|
>   | msi      | on      |              9.39 |        17567 |                    |
>   | nomsi    | on      |              9.29 |        14056 |                    |
>   | msi      | split   |              9.36 |        17330 |                    |
>   | nomsi    | split   |                 / |            / | currently broken   |
>   | nomsi    | split   |              8.98 |         8977 | after patch 1      |
>   | nomsi    | split   |              9.21 |        13142 | after whole series |
>   |----------+---------+-------------------+--------------+--------------------|
> 
> Any review comment is welcomed.  Thanks,
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1786404
> [2] https://patchwork.kernel.org/patch/10738541/#22609933
> 
> Peter Xu (5):
>   vfio/pci: Disable INTx fast path if using split irqchip
>   vfio/pci: Use kvm_irqchip_add_irqfd_notifier_gsi() for irqfds
>   KVM: Pass EventNotifier into kvm_irqchip_assign_irqfd
>   KVM: Kick resamplefd for split kernel irqchip
>   Revert "vfio/pci: Disable INTx fast path if using split irqchip"
> 
>  accel/kvm/kvm-all.c    | 95 ++++++++++++++++++++++++++++++++++++++----
>  accel/kvm/trace-events |  1 +
>  hw/intc/ioapic.c       | 19 +++++++++
>  hw/vfio/pci.c          | 37 +++++++---------
>  include/sysemu/kvm.h   |  4 ++
>  5 files changed, 126 insertions(+), 30 deletions(-)
>
Peter Xu March 31, 2020, 11:07 p.m. UTC | #2
On Tue, Mar 31, 2020 at 04:32:45PM -0600, Alex Williamson wrote:
> On Wed, 18 Mar 2020 10:51:59 -0400
> Peter Xu <peterx@redhat.com> wrote:
> 
> > v4:
> > - pick r-b and a-b for Alex without patch 4
> > - only kick resamplefd for level triggered irq (as 3.1 change on patch
> >   4) [Alex]
> > - fix mingw build error with below squashed into patch 4:
> 
> IMO, it'd be nice to get this in for QEMU 5.0, were others thinking
> something different?  I provided acks thinking Paolo might take it, but
> I can send a pull request for it if Paolo wants to ack instead.  Thanks,

I didn't ask because I thought it has already missed 5.0 (/me didn't
watch the schedule before hand, softfreeze Mar 17).  It would be
definitely good if this can still make it somehow.

Thanks,
Paolo Bonzini May 21, 2020, 3:57 p.m. UTC | #3
On 18/03/20 15:51, Peter Xu wrote:
> v4:
> - pick r-b and a-b for Alex without patch 4
> - only kick resamplefd for level triggered irq (as 3.1 change on patch
>   4) [Alex]
> - fix mingw build error with below squashed into patch 4:
> 
> ----------------------------------------
> diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
> index 2ae96e10be..b9ec570c03 100644
> --- a/hw/intc/ioapic.c
> +++ b/hw/intc/ioapic.c
> @@ -241,6 +241,7 @@ void ioapic_eoi_broadcast(int vector)
>                  continue;
>              }
> 
> +#ifdef CONFIG_KVM
>              /*
>               * When IOAPIC is in the userspace while APIC is still in
>               * the kernel (i.e., split irqchip), we have a trick to
> @@ -257,6 +258,7 @@ void ioapic_eoi_broadcast(int vector)
>               * emulated devices that are using/sharing the same IRQ.
>               */
>              kvm_resample_fd_notify(n);
> +#endif
> 
>              if (!(entry & IOAPIC_LVT_REMOTE_IRR)) {
>                  continue;
> ----------------------------------------
> 
> v3:
> - collect r-bs for Eric
> - unconditionally call kvm_resample_fd_notify(), change comment [Alex]
> - remove the split irqchip check in kvm_resample_fd_notify(), then let
>   it return nothing [Alex]
> - test against shared irq to make sure it won't break
> 
> v2:
> - pick tags
> - don't register resamplefd with KVM kernel when the userspace
>   resamplefd path is enabled (should enable fast path on new kernels)
> - fix resamplefd mem leak
> - fix commit message of patch 4 [Eric]
> - let kvm_resample_fd_notify() return a boolean, skip ioapic check if
>   returned true
> - more comments here and there in the code to state the fact that
>   userspace ioapic irr & remote-irr are bypassed [Paolo]
> 
> VFIO INTx is not working with split irqchip.  On new kernels KVM_IRQFD
> will directly fail with resamplefd attached so QEMU will automatically
> fallback to the INTx slow path.  However on old kernels it's still
> broken.
> 
> Only until recently I noticed that this could also break PXE boot for
> assigned NICs [1].  My wild guess is that the PXE ROM will be mostly
> using INTx as well, which means we can't bypass that even if we
> enables MSI for the guest kernel.
> 
> This series tries to first fix this issue function-wise, then speed up
> for the INTx again with resamplefd (mostly following the ideas
> proposed by Paolo one year ago [2]).  My TCP_RR test shows that:
> 
>   - Before this series: this is broken, no number to show
> 
>   - After patch 1 (enable slow path): get 63% perf comparing to full
>     kernel irqchip
> 
>   - After whole series (enable fast path partly, irq injection will be
>     the same as fast path, however userspace needs to intercept for
>     EOI broadcast to resamplefd, though should still be faster than
>     the MMIO trick for intx eoi): get 93% perf comparing to full
>     kernel irqchip, which is a 46% performance boost
> 
> I think we can consider to apply patch 1 even sooner than the rest of
> the series to unbreak intx+split first.
> 
> The whole test matrix for reference:
> 
>   |----------+---------+-------------------+--------------+--------------------|
>   | IRQ type | irqchip | TCP_STREAM (Gbps) | TCP_RR (pps) | note               |
>   |----------+---------+-------------------+--------------+--------------------|
>   | msi      | on      |              9.39 |        17567 |                    |
>   | nomsi    | on      |              9.29 |        14056 |                    |
>   | msi      | split   |              9.36 |        17330 |                    |
>   | nomsi    | split   |                 / |            / | currently broken   |
>   | nomsi    | split   |              8.98 |         8977 | after patch 1      |
>   | nomsi    | split   |              9.21 |        13142 | after whole series |
>   |----------+---------+-------------------+--------------+--------------------|
> 
> Any review comment is welcomed.  Thanks,
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1786404
> [2] https://patchwork.kernel.org/patch/10738541/#22609933
> 
> Peter Xu (5):
>   vfio/pci: Disable INTx fast path if using split irqchip
>   vfio/pci: Use kvm_irqchip_add_irqfd_notifier_gsi() for irqfds
>   KVM: Pass EventNotifier into kvm_irqchip_assign_irqfd
>   KVM: Kick resamplefd for split kernel irqchip
>   Revert "vfio/pci: Disable INTx fast path if using split irqchip"
> 
>  accel/kvm/kvm-all.c    | 95 ++++++++++++++++++++++++++++++++++++++----
>  accel/kvm/trace-events |  1 +
>  hw/intc/ioapic.c       | 19 +++++++++
>  hw/vfio/pci.c          | 37 +++++++---------
>  include/sysemu/kvm.h   |  4 ++
>  5 files changed, 126 insertions(+), 30 deletions(-)
> 

Queued, thanks.

Paolo
diff mbox

Patch

diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 2ae96e10be..b9ec570c03 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -241,6 +241,7 @@  void ioapic_eoi_broadcast(int vector)
                 continue;
             }

+#ifdef CONFIG_KVM
             /*
              * When IOAPIC is in the userspace while APIC is still in
              * the kernel (i.e., split irqchip), we have a trick to
@@ -257,6 +258,7 @@  void ioapic_eoi_broadcast(int vector)
              * emulated devices that are using/sharing the same IRQ.
              */
             kvm_resample_fd_notify(n);
+#endif

             if (!(entry & IOAPIC_LVT_REMOTE_IRR)) {
                 continue;