mbox series

[RFC,0/7] vfio/pci: SR-IOV support

Message ID 158085337582.9445.17682266437583505502.stgit@gimli.home
Headers show
Series vfio/pci: SR-IOV support | expand

Message

Alex Williamson Feb. 4, 2020, 11:05 p.m. UTC
There seems to be an ongoing desire to use userspace, vfio-based
drivers for both SR-IOV PF and VF devices.  The fundamental issue
with this concept is that the VF is not fully independent of the PF
driver.  Minimally the PF driver might be able to deny service to the
VF, VF data paths might be dependent on the state of the PF device,
or the PF my have some degree of ability to inspect or manipulate the
VF data.  It therefore would seem irresponsible to unleash VFs onto
the system, managed by a user owned PF.

We address this in a few ways in this series.  First, we can use a bus
notifier and the driver_override facility to make sure VFs are bound
to the vfio-pci driver by default.  This should eliminate the chance
that a VF is accidentally bound and used by host drivers.  We don't
however remove the ability for a host admin to change this override.

The next issue we need to address is how we let userspace drivers
opt-in to this participation with the PF driver.  We do not want an
admin to be able to unwittingly assign one of these VFs to a tenant
that isn't working in collaboration with the PF driver.  We could use
IOMMU grouping, but this seems to push too far towards tightly coupled
PF and VF drivers.  This series introduces a "VF token", implemented
as a UUID, as a shared secret between PF and VF drivers.  The token
needs to be set by the PF driver and used as part of the device
matching by the VF driver.  Provisions in the code also account for
restarting the PF driver with active VF drivers, requiring the PF to
use the current token to re-gain access to the PF.

The above solutions introduce a bit of a modification to the VFIO ABI
and an additional ABI extension.  The modification is that the
VFIO_GROUP_GET_DEVICE_FD ioctl is specified to require a char string
from the user providing the device name.  For this solution, we extend
the syntax to allow the device name followed by key/value pairs.  In
this case we add "vf_token=3e7e882e-1daf-417f-ad8d-882eea5ee337", for
example.  These options are expected to be space separated.  Matching
these key/value pairs is entirely left to the vfio bus driver (ex.
vfio-pci) and the internal ops structure is extended to allow this
optional support.  This extension should be fully backwards compatible
to existing userspace, such code will simply fail to open these newly
exposed devices, as intended.

I've been debating whether instead of the above we should allow the
user to get the device fd as normal, but restrict the interfaces until
the user authenticates, but I'm afraid this would be a less backwards
compatible solution.  It would be just as unclear to the user why a
device read/write/mmap/ioctl failed as it might be to why getting the
device fd could fail.  However in the latter case, I believe we do a
better job of restricting how far userspace code might go before they
ultimately fail.  I'd welcome discussion in the space, and or course
the extension of the GET_DEVICE_FD string.

Finally, the user needs to be able to set a VF token.  I add a
VFIO_DEVICE_FEATURE ioctl for this that's meant to be reusable for
getting, setting, and probing arbitrary features of a device.

I'll reply to this cover letter with a very basic example of a QEMU
update to support this interface, though I haven't found a device yet
that behaves well with the PF running in one VM with the VF in
another, or really even just a PF running in a VM with SR-IOV enabled.
I know these devices exist though, and I suspect QEMU will not be the
primary user of this support for now, but this behavior reaffirms my
concerns to prevent mis-use.

Please comment.  In particular, does this approach meet the DPDK needs
for userspace PF and VF drivers, with the hopefully minor hurdle of
sharing a token between drivers.  The token is of course left to
userspace how to manage, and might be static (and not very secret) for
a given set of drivers.  Thanks,

Alex

---

Alex Williamson (7):
      vfio: Include optional device match in vfio_device_ops callbacks
      vfio/pci: Implement match ops
      vfio/pci: Introduce VF token
      vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user
      vfio/pci: Add sriov_configure support
      vfio/pci: Remove dev_fmt definition
      vfio/pci: Cleanup .probe() exit paths


 drivers/vfio/pci/vfio_pci.c         |  315 ++++++++++++++++++++++++++++++++---
 drivers/vfio/pci/vfio_pci_private.h |   10 +
 drivers/vfio/vfio.c                 |   19 ++
 include/linux/vfio.h                |    3 
 include/uapi/linux/vfio.h           |   37 ++++
 5 files changed, 356 insertions(+), 28 deletions(-)

Comments

Alex Williamson Feb. 4, 2020, 11:17 p.m. UTC | #1
Promised example QEMU test case...

commit 3557c63bcb286c71f3f7242cad632edd9e297d26
Author: Alex Williamson <alex.williamson@redhat.com>
Date:   Tue Feb 4 13:47:41 2020 -0700

    vfio-pci: QEMU support for vfio-pci VF tokens
    
    Example support for using a vf_token to gain access to a device as
    well as using the VFIO_DEVICE_FEATURE interface to set the VF token.
    Note that the kernel will disregard the additional option where it's
    not required, such as opening the PF with no VF users, so we can
    always provide it.
    
    NB. It's unclear whether there's value to this QEMU support without
    further exposure of SR-IOV within a VM.  This is meant mostly as a
    test case where the real initial users will likely be DPDK drivers.
    
    Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 337a173ce7c6..b755b4caa870 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2816,12 +2816,45 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
         goto error;
     }
 
-    ret = vfio_get_device(group, vdev->vbasedev.name, &vdev->vbasedev, errp);
+    if (!qemu_uuid_is_null(&vdev->vf_token)) {
+        char *uuid = qemu_uuid_unparse_strdup(&vdev->vf_token);
+
+        tmp = g_strdup_printf("%s vf_token=%s", vdev->vbasedev.name, uuid);
+        g_free(uuid);
+    } else {
+        tmp = g_strdup_printf("%s", vdev->vbasedev.name);
+    }
+
+    ret = vfio_get_device(group, tmp, &vdev->vbasedev, errp);
+
+    g_free(tmp);
+
     if (ret) {
         vfio_put_group(group);
         goto error;
     }
 
+    if (!qemu_uuid_is_null(&vdev->vf_token)) {
+        struct vfio_device_feature *feature;
+
+        feature = g_malloc0(sizeof(*feature) + 16);
+        feature->argsz = sizeof(*feature) + 16;
+        feature->flags = VFIO_DEVICE_FEATURE_PROBE | VFIO_DEVICE_FEATURE_SET |
+                         VFIO_DEVICE_FEATURE_PCI_VF_TOKEN;
+
+        if (!ioctl(vdev->vbasedev.fd, VFIO_DEVICE_FEATURE, feature)) {
+            feature->flags &= ~VFIO_DEVICE_FEATURE_PROBE;
+            memcpy(&feature->data, vdev->vf_token.data, 16);
+            if (ioctl(vdev->vbasedev.fd, VFIO_DEVICE_FEATURE, feature)) {
+                g_free(feature);
+                error_setg_errno(errp, errno, "failed to set vf_token UUID");
+                goto error;
+            }
+        }
+
+        g_free(feature);
+    }
+
     vfio_populate_device(vdev, &err);
     if (err) {
         error_propagate(errp, err);
@@ -3149,6 +3182,7 @@ static void vfio_instance_init(Object *obj)
 static Property vfio_pci_dev_properties[] = {
     DEFINE_PROP_PCI_HOST_DEVADDR("host", VFIOPCIDevice, host),
     DEFINE_PROP_STRING("sysfsdev", VFIOPCIDevice, vbasedev.sysfsdev),
+    DEFINE_PROP_UUID_NODEFAULT("vf_token", VFIOPCIDevice, vf_token),
     DEFINE_PROP_ON_OFF_AUTO("display", VFIOPCIDevice,
                             display, ON_OFF_AUTO_OFF),
     DEFINE_PROP_UINT32("xres", VFIOPCIDevice, display_xres, 0),
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 35626cd63e9d..7f25672d7500 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -18,6 +18,7 @@
 #include "qemu/event_notifier.h"
 #include "qemu/queue.h"
 #include "qemu/timer.h"
+#include "qemu/uuid.h"
 
 #define PCI_ANY_ID (~0)
 
@@ -170,6 +171,7 @@ typedef struct VFIOPCIDevice {
     VFIODisplay *dpy;
     Error *migration_blocker;
     Notifier irqchip_change_notifier;
+    QemuUUID vf_token;
 } VFIOPCIDevice;
 
 uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index fb10370d2928..9bc28cc1d272 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -707,6 +707,43 @@ struct vfio_device_ioeventfd {
 
 #define VFIO_DEVICE_IOEVENTFD		_IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/**
+ * VFIO_DEVICE_FEATURE - _IORW(VFIO_TYPE, VFIO_BASE + 17,
+ *			       struct vfio_device_feature
+ *
+ * Get, set, or probe feature data of the device.  The feature is selected
+ * using the FEATURE_MASK portion of the flags field.  Support for a feature
+ * can be probed by setting both the FEATURE_MASK and PROBE bits.  A probe
+ * may optionally include the GET and/or SET bits to determine read vs write
+ * access of the feature respectively.  Probing a feature will return success
+ * if the feature is supporedt and all of the optionally indicated GET/SET
+ * methods are supported.  The format of the data portion of the structure is
+ * specific to the given feature.  The data portion is not required for
+ * probing.
+ *
+ * Return 0 on success, -errno on failure.
+ */
+struct vfio_device_feature {
+	__u32	argsz;
+	__u32	flags;
+#define VFIO_DEVICE_FEATURE_MASK	(0xffff) /* 16-bit feature index */
+#define VFIO_DEVICE_FEATURE_GET		(1 << 16) /* Get feature into data[] */
+#define VFIO_DEVICE_FEATURE_SET		(1 << 17) /* Set feature from data[] */
+#define VFIO_DEVICE_FEATURE_PROBE	(1 << 18) /* Probe feature support */
+	__u8	data[];
+};
+
+#define VFIO_DEVICE_FEATURE		_IO(VFIO_TYPE, VFIO_BASE + 17)
+
+/*
+ * Provide support for setting a PCI VF Token, which is used as a shared
+ * secret between PF and VF drivers.  This feature may only be set on a
+ * PCI SR-IOV PF when SR-IOV is enabled on the PF and there are no existing
+ * open VFs.  Data provided when setting this feature is a 16-byte array
+ * (__u8 b[16]), representing a UUID.
+ */
+#define VFIO_DEVICE_FEATURE_PCI_VF_TOKEN	(0)
+
 /* -------- API for Type1 VFIO IOMMU -------- */
 
 /**
Christoph Hellwig Feb. 5, 2020, 7:01 a.m. UTC | #2
On Tue, Feb 04, 2020 at 04:05:34PM -0700, Alex Williamson wrote:
> We address this in a few ways in this series.  First, we can use a bus
> notifier and the driver_override facility to make sure VFs are bound
> to the vfio-pci driver by default.  This should eliminate the chance
> that a VF is accidentally bound and used by host drivers.  We don't
> however remove the ability for a host admin to change this override.

That is just such a bad idea.  Using VFs in the host is a perfectly
valid use case that you are breaking.
Yi Liu Feb. 5, 2020, 7:57 a.m. UTC | #3
Hi Alex,

Silly questions on the background:

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Wednesday, February 5, 2020 7:06 AM
> Subject: [RFC PATCH 0/7] vfio/pci: SR-IOV support
> 
> There seems to be an ongoing desire to use userspace, vfio-based
> drivers for both SR-IOV PF and VF devices. 

Is this series to make PF be bound-able to vfio-pci even SR-IOV is
enabled on such PFs? If yes, is it allowed to assign PF to a VM? or
it can only be used by userspace applications like DPDK?

> The fundamental issue
> with this concept is that the VF is not fully independent of the PF
> driver.  Minimally the PF driver might be able to deny service to the
> VF, VF data paths might be dependent on the state of the PF device,
> or the PF my have some degree of ability to inspect or manipulate the
> VF data.  It therefore would seem irresponsible to unleash VFs onto
> the system, managed by a user owned PF.
> 
> We address this in a few ways in this series.  First, we can use a bus
> notifier and the driver_override facility to make sure VFs are bound
> to the vfio-pci driver by default.  This should eliminate the chance
> that a VF is accidentally bound and used by host drivers.  We don't
> however remove the ability for a host admin to change this override.
> 
> The next issue we need to address is how we let userspace drivers
> opt-in to this participation with the PF driver.  We do not want an
> admin to be able to unwittingly assign one of these VFs to a tenant
> that isn't working in collaboration with the PF driver.  We could use
> IOMMU grouping, but this seems to push too far towards tightly coupled
> PF and VF drivers.  This series introduces a "VF token", implemented
> as a UUID, as a shared secret between PF and VF drivers.  The token
> needs to be set by the PF driver and used as part of the device
> matching by the VF driver.  Provisions in the code also account for
> restarting the PF driver with active VF drivers, requiring the PF to
> use the current token to re-gain access to the PF.

How about the scenario in which PF driver is vfio-based userspace
driver but VF drivers are mixed. This means not all VFs are bound
to vfio-based userspace driver. Is it also supported here? :-)

Regards,
Yi Liu
Yi Liu Feb. 5, 2020, 7:57 a.m. UTC | #4
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Wednesday, February 5, 2020 7:18 AM
> To: kvm@vger.kernel.org
> Subject: Re: [RFC PATCH 0/7] vfio/pci: SR-IOV support
> 
> 
> Promised example QEMU test case...
> 
> commit 3557c63bcb286c71f3f7242cad632edd9e297d26
> Author: Alex Williamson <alex.williamson@redhat.com>
> Date:   Tue Feb 4 13:47:41 2020 -0700
> 
>     vfio-pci: QEMU support for vfio-pci VF tokens
> 
>     Example support for using a vf_token to gain access to a device as
>     well as using the VFIO_DEVICE_FEATURE interface to set the VF token.
>     Note that the kernel will disregard the additional option where it's
>     not required, such as opening the PF with no VF users, so we can
>     always provide it.
> 
>     NB. It's unclear whether there's value to this QEMU support without
>     further exposure of SR-IOV within a VM.  This is meant mostly as a
>     test case where the real initial users will likely be DPDK drivers.
> 
>     Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Just curious how UUID is used across the test. Should the QEMU
which opens VFs add the vfio_token=UUID or the QEMU which
opens PF add the vfio_token=UUID? or both should add vfio_token=UUID.

Regards,
Yi Liu
Alex Williamson Feb. 5, 2020, 1:58 p.m. UTC | #5
On Tue, 4 Feb 2020 23:01:09 -0800
Christoph Hellwig <hch@infradead.org> wrote:

> On Tue, Feb 04, 2020 at 04:05:34PM -0700, Alex Williamson wrote:
> > We address this in a few ways in this series.  First, we can use a bus
> > notifier and the driver_override facility to make sure VFs are bound
> > to the vfio-pci driver by default.  This should eliminate the chance
> > that a VF is accidentally bound and used by host drivers.  We don't
> > however remove the ability for a host admin to change this override.  
> 
> That is just such a bad idea.  Using VFs in the host is a perfectly
> valid use case that you are breaking.

vfio-pci currently does not allow binding to a PF with VFs enabled and
does not provide an sriov_configure callback, so it's not possible to
have VFs on a vfio-pci bound PF.  Therefore I'm not breaking any
existing use cases.  I'm also not preventing VFs from being used in the
host, I only set a default driver_override value, which can be replaced
if a different driver binding is desired.  So I also don't see that I'm
breaking a usage model here.  I do stand by the idea that VFs sourced
from a user owned PF should not by default be used in the host (ie.
autoprobed on device add).  There's a pci-pf-stub driver that can be
used to create VFs on a PF if no userspace access of the PF is required.
Thanks,

Alex
Alex Williamson Feb. 5, 2020, 2:10 p.m. UTC | #6
On Wed, 5 Feb 2020 07:57:21 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> Hi Alex,
> 
> Silly questions on the background:
> 
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Wednesday, February 5, 2020 7:06 AM
> > Subject: [RFC PATCH 0/7] vfio/pci: SR-IOV support
> > 
> > There seems to be an ongoing desire to use userspace, vfio-based
> > drivers for both SR-IOV PF and VF devices.   
> 
> Is this series to make PF be bound-able to vfio-pci even SR-IOV is
> enabled on such PFs? If yes, is it allowed to assign PF to a VM? or
> it can only be used by userspace applications like DPDK?

No, this series does not change the behavior of vfio-pci with respect
to probing a PF where VFs are already enabled.  This is still
disallowed.  I haven't seen a use case that requires this and allowing
it tends to subvert the restrictions here.  For instance, if an
existing VF is already in use by a vfio-pci driver, the PF can
transition from a trusted host driver to an unknown userspace driver.

> > The fundamental issue
> > with this concept is that the VF is not fully independent of the PF
> > driver.  Minimally the PF driver might be able to deny service to the
> > VF, VF data paths might be dependent on the state of the PF device,
> > or the PF my have some degree of ability to inspect or manipulate the
> > VF data.  It therefore would seem irresponsible to unleash VFs onto
> > the system, managed by a user owned PF.
> > 
> > We address this in a few ways in this series.  First, we can use a bus
> > notifier and the driver_override facility to make sure VFs are bound
> > to the vfio-pci driver by default.  This should eliminate the chance
> > that a VF is accidentally bound and used by host drivers.  We don't
> > however remove the ability for a host admin to change this override.
> > 
> > The next issue we need to address is how we let userspace drivers
> > opt-in to this participation with the PF driver.  We do not want an
> > admin to be able to unwittingly assign one of these VFs to a tenant
> > that isn't working in collaboration with the PF driver.  We could use
> > IOMMU grouping, but this seems to push too far towards tightly coupled
> > PF and VF drivers.  This series introduces a "VF token", implemented
> > as a UUID, as a shared secret between PF and VF drivers.  The token
> > needs to be set by the PF driver and used as part of the device
> > matching by the VF driver.  Provisions in the code also account for
> > restarting the PF driver with active VF drivers, requiring the PF to
> > use the current token to re-gain access to the PF.  
> 
> How about the scenario in which PF driver is vfio-based userspace
> driver but VF drivers are mixed. This means not all VFs are bound
> to vfio-based userspace driver. Is it also supported here? :-)

It's allowed.  Userspace VF drivers will need to participate in the VF
token scheme, host drivers may be bound to VFs normally after removing
the default driver_override.  Thanks,

Alex
Alex Williamson Feb. 5, 2020, 2:18 p.m. UTC | #7
On Wed, 5 Feb 2020 07:57:36 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Wednesday, February 5, 2020 7:18 AM
> > To: kvm@vger.kernel.org
> > Subject: Re: [RFC PATCH 0/7] vfio/pci: SR-IOV support
> > 
> > 
> > Promised example QEMU test case...
> > 
> > commit 3557c63bcb286c71f3f7242cad632edd9e297d26
> > Author: Alex Williamson <alex.williamson@redhat.com>
> > Date:   Tue Feb 4 13:47:41 2020 -0700
> > 
> >     vfio-pci: QEMU support for vfio-pci VF tokens
> > 
> >     Example support for using a vf_token to gain access to a device as
> >     well as using the VFIO_DEVICE_FEATURE interface to set the VF token.
> >     Note that the kernel will disregard the additional option where it's
> >     not required, such as opening the PF with no VF users, so we can
> >     always provide it.
> > 
> >     NB. It's unclear whether there's value to this QEMU support without
> >     further exposure of SR-IOV within a VM.  This is meant mostly as a
> >     test case where the real initial users will likely be DPDK drivers.
> > 
> >     Signed-off-by: Alex Williamson <alex.williamson@redhat.com>  
> 
> Just curious how UUID is used across the test. Should the QEMU
> which opens VFs add the vfio_token=UUID or the QEMU which
> opens PF add the vfio_token=UUID? or both should add vfio_token=UUID.

In this example we do both as this covers the case where there are
existing VF users, which requires the PF to also provide the vf_token.
If there are no VF users, the PF is not required to provide a vf_token
and vfio-pci will not fail the device match if a vf_token is provided
but not needed.  In fact, when a PF is probed by vfio-pci a random
vf_token is set, so it's required to use a PF driver to set a known
vf_token before any VF users can access their VFs.  Thanks,

Alex
Jerin Jacob Feb. 11, 2020, 11:18 a.m. UTC | #8
On Wed, Feb 5, 2020 at 4:35 AM Alex Williamson
<alex.williamson@redhat.com> wrote:
>
> There seems to be an ongoing desire to use userspace, vfio-based
> drivers for both SR-IOV PF and VF devices.  The fundamental issue
> with this concept is that the VF is not fully independent of the PF
> driver.  Minimally the PF driver might be able to deny service to the
> VF, VF data paths might be dependent on the state of the PF device,
> or the PF my have some degree of ability to inspect or manipulate the
> VF data.  It therefore would seem irresponsible to unleash VFs onto
> the system, managed by a user owned PF.
>
> We address this in a few ways in this series.  First, we can use a bus
> notifier and the driver_override facility to make sure VFs are bound
> to the vfio-pci driver by default.  This should eliminate the chance
> that a VF is accidentally bound and used by host drivers.  We don't
> however remove the ability for a host admin to change this override.
>
> The next issue we need to address is how we let userspace drivers
> opt-in to this participation with the PF driver.  We do not want an
> admin to be able to unwittingly assign one of these VFs to a tenant
> that isn't working in collaboration with the PF driver.  We could use
> IOMMU grouping, but this seems to push too far towards tightly coupled
> PF and VF drivers.  This series introduces a "VF token", implemented
> as a UUID, as a shared secret between PF and VF drivers.  The token
> needs to be set by the PF driver and used as part of the device
> matching by the VF driver.  Provisions in the code also account for
> restarting the PF driver with active VF drivers, requiring the PF to
> use the current token to re-gain access to the PF.

Thanks Alex for the series. DPDK realizes this use-case through, an out of
tree igb_uio module, for non VFIO devices. Supporting this use case, with
VFIO, will be a great enhancement for DPDK as we are planning to
get rid of out of tree modules any focus only on userspace aspects.

From the DPDK perspective, we have following use-cases

1) VF representer or OVS/vSwitch  use cases where
DPDK PF acts as an HW switch to steer traffic to VF
using the rte_flow library backed by HW CAMs.

2) Unlike, other PCI class of devices, Network class of PCIe devices
would have additional
capability on the PF devices such as promiscuous mode support etc
leverage that in DPDK
PF and VF use cases.

That would boil down to the use of the following topology.
a)  PF bound to DPDK/VFIO  and  VF bound to Linux
b)  PF bound to DPDK/VFIO  and  VF bound to DPDK/VFIO

Tested the use case (a) and it works this patch. Tested use case(b), it
works with patch provided both PF and VF under the same application.

Regarding the use case where  PF bound to DPDK/VFIO and
VF bound to DPDK/VFIO are _two different_ processes then sharing the UUID
will be a little tricky thing in terms of usage. But if that is the
purpose of bringing
UUID to the equation then it fine.

Overall this series looks good to me.  We can test the next non-RFC
series and give
Tested-by by after testing with DPDK.


>
> The above solutions introduce a bit of a modification to the VFIO ABI
> and an additional ABI extension.  The modification is that the
> VFIO_GROUP_GET_DEVICE_FD ioctl is specified to require a char string
> from the user providing the device name.  For this solution, we extend
> the syntax to allow the device name followed by key/value pairs.  In
> this case we add "vf_token=3e7e882e-1daf-417f-ad8d-882eea5ee337", for
> example.  These options are expected to be space separated.  Matching
> these key/value pairs is entirely left to the vfio bus driver (ex.
> vfio-pci) and the internal ops structure is extended to allow this
> optional support.  This extension should be fully backwards compatible
> to existing userspace, such code will simply fail to open these newly
> exposed devices, as intended.
>
> I've been debating whether instead of the above we should allow the
> user to get the device fd as normal, but restrict the interfaces until
> the user authenticates, but I'm afraid this would be a less backwards
> compatible solution.  It would be just as unclear to the user why a
> device read/write/mmap/ioctl failed as it might be to why getting the
> device fd could fail.  However in the latter case, I believe we do a
> better job of restricting how far userspace code might go before they
> ultimately fail.  I'd welcome discussion in the space, and or course
> the extension of the GET_DEVICE_FD string.
>
> Finally, the user needs to be able to set a VF token.  I add a
> VFIO_DEVICE_FEATURE ioctl for this that's meant to be reusable for
> getting, setting, and probing arbitrary features of a device.
>
> I'll reply to this cover letter with a very basic example of a QEMU
> update to support this interface, though I haven't found a device yet
> that behaves well with the PF running in one VM with the VF in
> another, or really even just a PF running in a VM with SR-IOV enabled.
> I know these devices exist though, and I suspect QEMU will not be the
> primary user of this support for now, but this behavior reaffirms my
> concerns to prevent mis-use.
>
> Please comment.  In particular, does this approach meet the DPDK needs
> for userspace PF and VF drivers, with the hopefully minor hurdle of
> sharing a token between drivers.  The token is of course left to
> userspace how to manage, and might be static (and not very secret) for
> a given set of drivers.  Thanks,
>
> Alex
>
> ---
>
> Alex Williamson (7):
>       vfio: Include optional device match in vfio_device_ops callbacks
>       vfio/pci: Implement match ops
>       vfio/pci: Introduce VF token
>       vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user
>       vfio/pci: Add sriov_configure support
>       vfio/pci: Remove dev_fmt definition
>       vfio/pci: Cleanup .probe() exit paths
>
>
>  drivers/vfio/pci/vfio_pci.c         |  315 ++++++++++++++++++++++++++++++++---
>  drivers/vfio/pci/vfio_pci_private.h |   10 +
>  drivers/vfio/vfio.c                 |   19 ++
>  include/linux/vfio.h                |    3
>  include/uapi/linux/vfio.h           |   37 ++++
>  5 files changed, 356 insertions(+), 28 deletions(-)
>
Thomas Monjalon Feb. 11, 2020, 1:57 p.m. UTC | #9
11/02/2020 12:18, Jerin Jacob:
> On Wed, Feb 5, 2020 at 4:35 AM Alex Williamson wrote:
> >
> > There seems to be an ongoing desire to use userspace, vfio-based
> > drivers for both SR-IOV PF and VF devices.  The fundamental issue
> > with this concept is that the VF is not fully independent of the PF
> > driver.  Minimally the PF driver might be able to deny service to the
> > VF, VF data paths might be dependent on the state of the PF device,
> > or the PF my have some degree of ability to inspect or manipulate the
> > VF data.  It therefore would seem irresponsible to unleash VFs onto
> > the system, managed by a user owned PF.
> >
> > We address this in a few ways in this series.  First, we can use a bus
> > notifier and the driver_override facility to make sure VFs are bound
> > to the vfio-pci driver by default.  This should eliminate the chance
> > that a VF is accidentally bound and used by host drivers.  We don't
> > however remove the ability for a host admin to change this override.
> >
> > The next issue we need to address is how we let userspace drivers
> > opt-in to this participation with the PF driver.  We do not want an
> > admin to be able to unwittingly assign one of these VFs to a tenant
> > that isn't working in collaboration with the PF driver.  We could use
> > IOMMU grouping, but this seems to push too far towards tightly coupled
> > PF and VF drivers.  This series introduces a "VF token", implemented
> > as a UUID, as a shared secret between PF and VF drivers.  The token
> > needs to be set by the PF driver and used as part of the device
> > matching by the VF driver.  Provisions in the code also account for
> > restarting the PF driver with active VF drivers, requiring the PF to
> > use the current token to re-gain access to the PF.
> 
> Thanks Alex for the series. DPDK realizes this use-case through, an out of
> tree igb_uio module, for non VFIO devices. Supporting this use case, with
> VFIO, will be a great enhancement for DPDK as we are planning to
> get rid of out of tree modules any focus only on userspace aspects.
[..]
> Regarding the use case where  PF bound to DPDK/VFIO and
> VF bound to DPDK/VFIO are _two different_ processes then sharing the UUID
> will be a little tricky thing in terms of usage. But if that is the
> purpose of bringing UUID to the equation then it fine.
> 
> Overall this series looks good to me.  We can test the next non-RFC
> series and give
> Tested-by by after testing with DPDK.
[..]
> > Please comment.  In particular, does this approach meet the DPDK needs
> > for userspace PF and VF drivers, with the hopefully minor hurdle of
> > sharing a token between drivers.  The token is of course left to
> > userspace how to manage, and might be static (and not very secret) for
> > a given set of drivers.  Thanks,

Thanks Alex, it looks to be a great improvement.

In the meantime, DPDK is going to move igb_uio (an out-of-tree
Linux kernel module) from the main DPDK repository to a side-repo.
This move and this patchset will hopefully encourage using VFIO.
As Jerin said, DPDK prefers relying on upstream Linux modules.
Alex Williamson Feb. 11, 2020, 5:06 p.m. UTC | #10
On Tue, 11 Feb 2020 16:48:47 +0530
Jerin Jacob <jerinjacobk@gmail.com> wrote:

> On Wed, Feb 5, 2020 at 4:35 AM Alex Williamson
> <alex.williamson@redhat.com> wrote:
> >
> > There seems to be an ongoing desire to use userspace, vfio-based
> > drivers for both SR-IOV PF and VF devices.  The fundamental issue
> > with this concept is that the VF is not fully independent of the PF
> > driver.  Minimally the PF driver might be able to deny service to the
> > VF, VF data paths might be dependent on the state of the PF device,
> > or the PF my have some degree of ability to inspect or manipulate the
> > VF data.  It therefore would seem irresponsible to unleash VFs onto
> > the system, managed by a user owned PF.
> >
> > We address this in a few ways in this series.  First, we can use a bus
> > notifier and the driver_override facility to make sure VFs are bound
> > to the vfio-pci driver by default.  This should eliminate the chance
> > that a VF is accidentally bound and used by host drivers.  We don't
> > however remove the ability for a host admin to change this override.
> >
> > The next issue we need to address is how we let userspace drivers
> > opt-in to this participation with the PF driver.  We do not want an
> > admin to be able to unwittingly assign one of these VFs to a tenant
> > that isn't working in collaboration with the PF driver.  We could use
> > IOMMU grouping, but this seems to push too far towards tightly coupled
> > PF and VF drivers.  This series introduces a "VF token", implemented
> > as a UUID, as a shared secret between PF and VF drivers.  The token
> > needs to be set by the PF driver and used as part of the device
> > matching by the VF driver.  Provisions in the code also account for
> > restarting the PF driver with active VF drivers, requiring the PF to
> > use the current token to re-gain access to the PF.  
> 
> Thanks Alex for the series. DPDK realizes this use-case through, an out of
> tree igb_uio module, for non VFIO devices. Supporting this use case, with
> VFIO, will be a great enhancement for DPDK as we are planning to
> get rid of out of tree modules any focus only on userspace aspects.
> 
> From the DPDK perspective, we have following use-cases
> 
> 1) VF representer or OVS/vSwitch  use cases where
> DPDK PF acts as an HW switch to steer traffic to VF
> using the rte_flow library backed by HW CAMs.
> 
> 2) Unlike, other PCI class of devices, Network class of PCIe devices
> would have additional
> capability on the PF devices such as promiscuous mode support etc
> leverage that in DPDK
> PF and VF use cases.
> 
> That would boil down to the use of the following topology.
> a)  PF bound to DPDK/VFIO  and  VF bound to Linux
> b)  PF bound to DPDK/VFIO  and  VF bound to DPDK/VFIO
> 
> Tested the use case (a) and it works this patch. Tested use case(b), it
> works with patch provided both PF and VF under the same application.
> 
> Regarding the use case where  PF bound to DPDK/VFIO and
> VF bound to DPDK/VFIO are _two different_ processes then sharing the UUID
> will be a little tricky thing in terms of usage. But if that is the
> purpose of bringing
> UUID to the equation then it fine.
> 
> Overall this series looks good to me.  We can test the next non-RFC
> series and give
> Tested-by by after testing with DPDK.

Thanks Jerin, that's great feedback.  For case b), it is rather the
intention of the shared VF token proposed here that it imposes some
small barrier in validating the collaboration between the PF and VF
drivers.  In a trusted environment, a common UUID might be exposed in a
shared file and the same token could be used by all PFs and VFs on the
system, or datacenter.  The goal is simply to make sure the
collaboration is explicit, I don't want to be fielding support issues
from users assigning PFs and VFs to unrelated VM instances or
unintentionally creating your scenario a) configuration.

With the positive response from you and Thomas, I'll post a non-RFC
version and barring any blockers maybe we can get this in for the v5.7
kernel.  Thanks,

Alex
Jerin Jacob Feb. 11, 2020, 6:03 p.m. UTC | #11
On Tue, Feb 11, 2020 at 10:36 PM Alex Williamson
<alex.williamson@redhat.com> wrote:
>
> On Tue, 11 Feb 2020 16:48:47 +0530
> Jerin Jacob <jerinjacobk@gmail.com> wrote:
>
> > On Wed, Feb 5, 2020 at 4:35 AM Alex Williamson
> > <alex.williamson@redhat.com> wrote:
> > >
> > > There seems to be an ongoing desire to use userspace, vfio-based
> > > drivers for both SR-IOV PF and VF devices.  The fundamental issue
> > > with this concept is that the VF is not fully independent of the PF
> > > driver.  Minimally the PF driver might be able to deny service to the
> > > VF, VF data paths might be dependent on the state of the PF device,
> > > or the PF my have some degree of ability to inspect or manipulate the
> > > VF data.  It therefore would seem irresponsible to unleash VFs onto
> > > the system, managed by a user owned PF.
> > >
> > > We address this in a few ways in this series.  First, we can use a bus
> > > notifier and the driver_override facility to make sure VFs are bound
> > > to the vfio-pci driver by default.  This should eliminate the chance
> > > that a VF is accidentally bound and used by host drivers.  We don't
> > > however remove the ability for a host admin to change this override.
> > >
> > > The next issue we need to address is how we let userspace drivers
> > > opt-in to this participation with the PF driver.  We do not want an
> > > admin to be able to unwittingly assign one of these VFs to a tenant
> > > that isn't working in collaboration with the PF driver.  We could use
> > > IOMMU grouping, but this seems to push too far towards tightly coupled
> > > PF and VF drivers.  This series introduces a "VF token", implemented
> > > as a UUID, as a shared secret between PF and VF drivers.  The token
> > > needs to be set by the PF driver and used as part of the device
> > > matching by the VF driver.  Provisions in the code also account for
> > > restarting the PF driver with active VF drivers, requiring the PF to
> > > use the current token to re-gain access to the PF.
> >
> > Thanks Alex for the series. DPDK realizes this use-case through, an out of
> > tree igb_uio module, for non VFIO devices. Supporting this use case, with
> > VFIO, will be a great enhancement for DPDK as we are planning to
> > get rid of out of tree modules any focus only on userspace aspects.
> >
> > From the DPDK perspective, we have following use-cases
> >
> > 1) VF representer or OVS/vSwitch  use cases where
> > DPDK PF acts as an HW switch to steer traffic to VF
> > using the rte_flow library backed by HW CAMs.
> >
> > 2) Unlike, other PCI class of devices, Network class of PCIe devices
> > would have additional
> > capability on the PF devices such as promiscuous mode support etc
> > leverage that in DPDK
> > PF and VF use cases.
> >
> > That would boil down to the use of the following topology.
> > a)  PF bound to DPDK/VFIO  and  VF bound to Linux
> > b)  PF bound to DPDK/VFIO  and  VF bound to DPDK/VFIO
> >
> > Tested the use case (a) and it works this patch. Tested use case(b), it
> > works with patch provided both PF and VF under the same application.
> >
> > Regarding the use case where  PF bound to DPDK/VFIO and
> > VF bound to DPDK/VFIO are _two different_ processes then sharing the UUID
> > will be a little tricky thing in terms of usage. But if that is the
> > purpose of bringing
> > UUID to the equation then it fine.
> >
> > Overall this series looks good to me.  We can test the next non-RFC
> > series and give
> > Tested-by by after testing with DPDK.
>
> Thanks Jerin, that's great feedback.  For case b), it is rather the
> intention of the shared VF token proposed here that it imposes some
> small barrier in validating the collaboration between the PF and VF
> drivers.  In a trusted environment, a common UUID might be exposed in a
> shared file and the same token could be used by all PFs and VFs on the
> system, or datacenter.  The goal is simply to make sure the
> collaboration is explicit, I don't want to be fielding support issues
> from users assigning PFs and VFs to unrelated VM instances or
> unintentionally creating your scenario a) configuration.

Yes. Makes sense from kernel PoV.

DPDK side, probably we will end in hardcoded UUID value.

The tricky part would DPDK PF and QEMU VF integration case.
I am not sure about that integration( a command-line option for UUID) or
something more sophisticated orchestration. Anyway, it is clear from
kernel side,
Something needs to be sorted with the QEMU community.

> With the positive response from you and Thomas, I'll post a non-RFC
> version and barring any blockers maybe we can get this in for the v5.7
> kernel.  Thanks,

Great.

>
> Alex
>