[RESEND] pci-assign: Enable MSIX on device to match guest

Message ID 20130107041458.29065.77211.stgit@bling.home
State New

Commit Message

Alex Williamson Jan. 7, 2013, 4:30 a.m. UTC
When a guest enables MSIX on a device we evaluate the MSIX vector
table, typically find no unmasked vectors and don't switch the device
to MSIX mode.  This generally works fine and the device will be
switched once the guest enables and therefore unmasks a vector.
Unfortunately some drivers enable MSIX, then use interfaces to send
commands between VF & PF or PF & firmware that act based on the host
state of the device.  These therefore may break when MSIX is managed
lazily.  This change re-enables the previous test used to enable MSIX
(see qemu-kvm a6b402c9), which basically guesses whether a vector
will be used based on the data field of the vector table.

Cc: qemu-stable@nongnu.org
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---

Michael has now ack'd this patch as the correct first step, so I'm
resending with his ack included.  I'm not sure what the expected
upstream path is for this file now that it's part of qemu.  There's
no entry for hw/kvm/* in MAINTAINERS, nor anything specific to this
file.  Is kvm still the upstream for this, via the uq branch, or does
qemu take anything not specifically part of a kvm interface?
Anthony, Gleb, Marcelo, Michael, feel free to add this to your tree,
any path is fine by me.  Thanks,

Alex

 hw/kvm/pci-assign.c |   17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)
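
To illustrate why the choice of test matters, here is a minimal standalone
sketch, not the pci-assign code itself.  DemoMSIXEntry and the demo_*
helpers are hypothetical stand-ins for MSIXTableEntry and the two tests in
the patch.  The sketch assumes host byte order (the real code uses
cpu_to_le32() on the ctrl field) and assumes a guest that has already
written message data for some vectors while leaving every vector masked;
the address and data values are purely illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one 16-byte MSI-X table entry. */
typedef struct {
    uint32_t addr_lo;
    uint32_t addr_hi;
    uint32_t data;
    uint32_t ctrl;      /* bit 0 is the per-vector mask bit */
} DemoMSIXEntry;

typedef bool (*demo_skip_fn)(const DemoMSIXEntry *entry);

/* Old test: skip any vector whose mask bit is set. */
static bool demo_masked(const DemoMSIXEntry *entry)
{
    return (entry->ctrl & 0x1) != 0;
}

/* Test restored by the patch: skip only vectors with no data written. */
static bool demo_skipped(const DemoMSIXEntry *entry)
{
    return !entry->data;
}

/* Count the entries that a given skip test would leave enabled. */
static int demo_count(const DemoMSIXEntry *table, int n, demo_skip_fn skip)
{
    int i, count = 0;

    assert(table != NULL && skip != NULL);
    for (i = 0; i < n; i++) {
        if (skip(&table[i])) {
            continue;
        }
        count++;
    }
    return count;
}

/*
 * Assumed guest state right after MSI-X enable: every vector is masked,
 * and the first 'programmed' vectors already carry non-zero message data.
 */
static void demo_fill_after_enable(DemoMSIXEntry *table, int n, int programmed)
{
    int i;

    assert(table != NULL);
    for (i = 0; i < n; i++) {
        table[i].addr_lo = 0xfee00000u;               /* illustrative */
        table[i].addr_hi = 0;
        table[i].data = (i < programmed) ? 0x4041u + (uint32_t)i : 0;
        table[i].ctrl = 0x1;                          /* masked */
    }
}
```

For a four-entry table filled with demo_fill_after_enable(table, 4, 3),
demo_count() with demo_masked yields 0, so the device would stay out of
MSI-X mode, while demo_count() with demo_skipped yields 3.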

Comments

Michael S. Tsirkin Jan. 7, 2013, 4:01 p.m. UTC | #1
On Sun, Jan 06, 2013 at 09:30:31PM -0700, Alex Williamson wrote:
> Anthony, Gleb, Marcelo, Michael, feel free to add this to your tree,
> any path is fine by me.  Thanks,
> 
> Alex

I can merge this if there are no other takers.

Marcelo Tosatti Jan. 7, 2013, 10:41 p.m. UTC | #2
On Mon, Jan 07, 2013 at 06:01:19PM +0200, Michael S. Tsirkin wrote:
> I can merge this if there are no other takers.

Go for it.


Patch

diff --git a/hw/kvm/pci-assign.c b/hw/kvm/pci-assign.c
index 8ee9428..896cfe8 100644
--- a/hw/kvm/pci-assign.c
+++ b/hw/kvm/pci-assign.c
@@ -1031,6 +1031,19 @@  static bool assigned_dev_msix_masked(MSIXTableEntry *entry)
     return (entry->ctrl & cpu_to_le32(0x1)) != 0;
 }
 
+/*
+ * When MSI-X is first enabled the vector table typically has all the
+ * vectors masked, so we can't use that as the obvious test to figure out
+ * how many vectors to initially enable.  Instead we look at the data field
+ * because this is what worked for pci-assign for a long time.  This makes
+ * sure the physical MSI-X state tracks the guest's view, which is important
+ * for some VF/PF and PF/fw communication channels.
+ */
+static bool assigned_dev_msix_skipped(MSIXTableEntry *entry)
+{
+    return !entry->data;
+}
+
 static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
 {
     AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
@@ -1041,7 +1054,7 @@  static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
 
     /* Get the usable entry number for allocating */
     for (i = 0; i < adev->msix_max; i++, entry++) {
-        if (assigned_dev_msix_masked(entry)) {
+        if (assigned_dev_msix_skipped(entry)) {
             continue;
         }
         entries_nr++;
@@ -1070,7 +1083,7 @@  static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
     for (i = 0; i < adev->msix_max; i++, entry++) {
         adev->msi_virq[i] = -1;
 
-        if (assigned_dev_msix_masked(entry)) {
+        if (assigned_dev_msix_skipped(entry)) {
             continue;
         }