PCI/VMD: White list for fast interrupt handlers

Message ID 20180508160022.30954-1-keith.busch@intel.com
State Accepted
Delegated to: Lorenzo Pieralisi
Headers show
Series
  • PCI/VMD: White list for fast interrupt handlers
Related show

Commit Message

Keith Busch May 8, 2018, 4 p.m.
Devices with slow interrupt handlers are significantly harming performance
when their interrupt vector is shared with a fast device. This patch
creates a class code white list for devices with known fast interrupt
handlers, and all other devices will share a single vector so they don't
interfere with performance.

At the moment, only the NVM Express class code is on the list, but more
may be added if VMD users desire to use other low-latency devices in
these domains.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/host/vmd.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Comments

Christoph Hellwig May 9, 2018, 4:38 a.m. | #1
On Tue, May 08, 2018 at 10:00:22AM -0600, Keith Busch wrote:
> Devices with slow interrupt handlers are significantly harming performance
> when their interrupt vector is shared with a fast device. This patch
> creates a class code white list for devices with known fast interrupt
> handlers, and all other devices will share a single vector so they don't
> interfere with performance.
> 
> At the moment, only the NVM Express class code is on the list, but more
> may be added if VMD users desire to use other low-latency devices in
> these domains.

I think this is far too much of a hack.   Just don't use VMD if your
care about performance.
Keith Busch May 9, 2018, 3:26 p.m. | #2
On Tue, May 08, 2018 at 09:38:28PM -0700, Christoph Hellwig wrote:
> On Tue, May 08, 2018 at 10:00:22AM -0600, Keith Busch wrote:
> > Devices with slow interrupt handlers are significantly harming performance
> > when their interrupt vector is shared with a fast device. This patch
> > creates a class code white list for devices with known fast interrupt
> > handlers, and all other devices will share a single vector so they don't
> > interfere with performance.
> > 
> > At the moment, only the NVM Express class code is on the list, but more
> > may be added if VMD users desire to use other low-latency devices in
> > these domains.
> 
> I think this is far too much of a hack.   Just don't use VMD if your
> care about performance.

I'm not aware of an easier way you can direct-assign an entire PCIe domain
to a virtual machine. :)
Derrick, Jonathan May 11, 2018, 3:39 p.m. | #3
On Wed, 2018-05-09 at 09:26 -0600, Keith Busch wrote:
> On Tue, May 08, 2018 at 09:38:28PM -0700, Christoph Hellwig wrote:
> > On Tue, May 08, 2018 at 10:00:22AM -0600, Keith Busch wrote:
> > > Devices with slow interrupt handlers are significantly harming
> > > performance
> > > when their interrupt vector is shared with a fast device. This
> > > patch
> > > creates a class code white list for devices with known fast
> > > interrupt
> > > handlers, and all other devices will share a single vector so
> > > they don't
> > > interfere with performance.
> > > 
> > > At the moment, only the NVM Express class code is on the list,
> > > but more
> > > may be added if VMD users desire to use other low-latency devices
> > > in
> > > these domains.
> > 
> > I think this is far too much of a hack.   Just don't use VMD if
> > your
> > care about performance.
> 
> I'm not aware of an easier way you can direct-assign an entire PCIe
> domain
> to a virtual machine. :)

It's fine with me

Acked-by: Jon Derrick: <jonathan.derrick@intel.com>
Christoph Hellwig May 14, 2018, 2:15 p.m. | #4
On Wed, May 09, 2018 at 09:26:53AM -0600, Keith Busch wrote:
> I'm not aware of an easier way you can direct-assign an entire PCIe domain
> to a virtual machine. :)

libvirt will handle it just fine for you.
Derrick, Jonathan May 25, 2018, 7:11 p.m. | #5
+Lorenzo

On Fri, 2018-05-11 at 09:39 -0600, Jonathan Derrick wrote:
> On Wed, 2018-05-09 at 09:26 -0600, Keith Busch wrote:
> > On Tue, May 08, 2018 at 09:38:28PM -0700, Christoph Hellwig wrote:
> > > On Tue, May 08, 2018 at 10:00:22AM -0600, Keith Busch wrote:
> > > > Devices with slow interrupt handlers are significantly harming
> > > > performance
> > > > when their interrupt vector is shared with a fast device. This
> > > > patch
> > > > creates a class code white list for devices with known fast
> > > > interrupt
> > > > handlers, and all other devices will share a single vector so
> > > > they don't
> > > > interfere with performance.
> > > > 
> > > > At the moment, only the NVM Express class code is on the list,
> > > > but more
> > > > may be added if VMD users desire to use other low-latency
> > > > devices
> > > > in
> > > > these domains.
> > > 
> > > I think this is far too much of a hack.   Just don't use VMD if
> > > your
> > > care about performance.
> > 
> > I'm not aware of an easier way you can direct-assign an entire PCIe
> > domain
> > to a virtual machine. :)
> 
> It's fine with me
> 
> Acked-by: Jon Derrick: <jonathan.derrick@intel.com>


We've seen this actually fix an issue with one vendor's multi-function
switch.
I'd like to see it get into 4.18 if possible
Lorenzo Pieralisi June 28, 2018, 11:22 a.m. | #6
On Fri, May 25, 2018 at 07:11:14PM +0000, Derrick, Jonathan wrote:
> +Lorenzo
> 
> On Fri, 2018-05-11 at 09:39 -0600, Jonathan Derrick wrote:
> > On Wed, 2018-05-09 at 09:26 -0600, Keith Busch wrote:
> > > On Tue, May 08, 2018 at 09:38:28PM -0700, Christoph Hellwig wrote:
> > > > On Tue, May 08, 2018 at 10:00:22AM -0600, Keith Busch wrote:
> > > > > Devices with slow interrupt handlers are significantly harming
> > > > > performance
> > > > > when their interrupt vector is shared with a fast device. This
> > > > > patch
> > > > > creates a class code white list for devices with known fast
> > > > > interrupt
> > > > > handlers, and all other devices will share a single vector so
> > > > > they don't
> > > > > interfere with performance.
> > > > > 
> > > > > At the moment, only the NVM Express class code is on the list,
> > > > > but more
> > > > > may be added if VMD users desire to use other low-latency
> > > > > devices
> > > > > in
> > > > > these domains.
> > > > 
> > > > I think this is far too much of a hack.   Just don't use VMD if
> > > > your
> > > > care about performance.
> > > 
> > > I'm not aware of an easier way you can direct-assign an entire PCIe
> > > domain
> > > to a virtual machine. :)
> > 
> > It's fine with me
> > 
> > Acked-by: Jon Derrick: <jonathan.derrick@intel.com>
> 
> 
> We've seen this actually fix an issue with one vendor's multi-function
> switch.
> I'd like to see it get into 4.18 if possible

Sorry for the delay in getting back to this.

It seems like Christoph is not too happy about this patch, it is your
code so I would apply it unless there is a cleaner alternative so please
do let me know.

Thanks,
Lorenzo
Keith Busch June 28, 2018, 2:02 p.m. | #7
On Thu, Jun 28, 2018 at 12:22:17PM +0100, Lorenzo Pieralisi wrote:
> On Fri, May 25, 2018 at 07:11:14PM +0000, Derrick, Jonathan wrote:
> > We've seen this actually fix an issue with one vendor's multi-function
> > switch. I'd like to see it get into 4.18 if possible
> 
> Sorry for the delay in getting back to this.
> 
> It seems like Christoph is not too happy about this patch, it is your
> code so I would apply it unless there is a cleaner alternative so please
> do let me know.

VMD hardware really relies on the host software to bring all the logic,
and it was intended to prioritize PCIe attached storage. The patch just
captures that intention, so yes please, let's queue this up.

I understands Christoph's lack of enthusiasm for this, but at least this
isn't a PCIe HBA with proprietary interfaces and firmware. :)
Lorenzo Pieralisi June 28, 2018, 4:32 p.m. | #8
On Thu, Jun 28, 2018 at 08:02:40AM -0600, Keith Busch wrote:
> On Thu, Jun 28, 2018 at 12:22:17PM +0100, Lorenzo Pieralisi wrote:
> > On Fri, May 25, 2018 at 07:11:14PM +0000, Derrick, Jonathan wrote:
> > > We've seen this actually fix an issue with one vendor's multi-function
> > > switch. I'd like to see it get into 4.18 if possible
> > 
> > Sorry for the delay in getting back to this.
> > 
> > It seems like Christoph is not too happy about this patch, it is your
> > code so I would apply it unless there is a cleaner alternative so please
> > do let me know.
> 
> VMD hardware really relies on the host software to bring all the logic,
> and it was intended to prioritize PCIe attached storage. The patch just
> captures that intention, so yes please, let's queue this up.
> 
> I understands Christoph's lack of enthusiasm for this, but at least this
> isn't a PCIe HBA with proprietary interfaces and firmware. :)

Applied to pci/vmd with a slightly updated changelog, thanks.

Lorenzo

Patch

diff --git a/drivers/pci/host/vmd.c b/drivers/pci/host/vmd.c
index 930a8fa08bd6..f94f3e1d3470 100644
--- a/drivers/pci/host/vmd.c
+++ b/drivers/pci/host/vmd.c
@@ -175,9 +175,20 @@  static struct vmd_irq_list *vmd_next_irq(struct vmd_dev *vmd, struct msi_desc *d
 	int i, best = 1;
 	unsigned long flags;
 
-	if (pci_is_bridge(msi_desc_to_pci_dev(desc)) || vmd->msix_count == 1)
+	if (vmd->msix_count == 1)
 		return &vmd->irqs[0];
 
+	/*
+	 * White list for fast-interrupt handlers. All others will share the
+	 * "slow" interrupt vector.
+	 */
+	switch (msi_desc_to_pci_dev(desc)->class) {
+	case PCI_CLASS_STORAGE_EXPRESS:
+		break;
+	default:
+		return &vmd->irqs[0];
+	}
+
 	raw_spin_lock_irqsave(&list_lock, flags);
 	for (i = 1; i < vmd->msix_count; i++)
 		if (vmd->irqs[i].count < vmd->irqs[best].count)