[RFC] vfio/pci: Use kernel VPD access functions

Message ID 20150911181332.32399.74472.stgit@gimli.home
State Not Applicable
Commit Message

Alex Williamson Sept. 11, 2015, 6:16 p.m. UTC
The PCI VPD capability operates on a set of window registers in PCI
config space.  Writing to the address register triggers either a read
or write, depending on the setting of the PCI_VPD_ADDR_F bit within
the address register.  The data register provides either the source
for writes or the target for reads.
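As a rough illustration of the window protocol described above, here is a
minimal user-space model.  It is a sketch only: struct vpd_model,
vpd_model_write_addr() and VPD_MODEL_SIZE are hypothetical names, and the
model touches no real hardware.  The two register constants use the same
values as PCI_VPD_ADDR_MASK and PCI_VPD_ADDR_F in the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VPD_ADDR_MASK  0x7fff /* same value as the kernel's PCI_VPD_ADDR_MASK */
#define VPD_ADDR_F     0x8000 /* same value as the kernel's PCI_VPD_ADDR_F */
#define VPD_MODEL_SIZE 32     /* hypothetical size of the modeled VPD storage */

/* Hypothetical model of the VPD window registers plus backing storage. */
struct vpd_model {
	uint16_t addr;                  /* address register, F flag in bit 15 */
	uint32_t data;                  /* data register */
	uint8_t  store[VPD_MODEL_SIZE]; /* modeled VPD contents */
};

/*
 * A write to the address register starts a transfer.  With F set the
 * data register is stored and F is cleared on completion; with F clear
 * the data register is loaded and F is set on completion.
 */
static void vpd_model_write_addr(struct vpd_model *m, uint16_t val)
{
	uint16_t off = val & VPD_ADDR_MASK;
	int i;

	assert(m != NULL);
	m->addr = val;

	/* Out of range: F never flips, so a polling caller would time out. */
	if (off + 4 > VPD_MODEL_SIZE)
		return;

	if (val & VPD_ADDR_F) {
		/* Write: data register is the source, low byte first. */
		for (i = 0; i < 4; i++)
			m->store[off + i] = (uint8_t)(m->data >> (8 * i));
		m->addr = off;
	} else {
		/* Read: data register is the target, low byte first. */
		m->data = 0;
		for (i = 0; i < 4; i++)
			m->data |= (uint32_t)m->store[off + i] << (8 * i);
		m->addr = off | VPD_ADDR_F;
	}
}
```

Because one address register and one data register are shared by every
user of the window, a second caller writing the address register between
the first caller's trigger and its read of the data register would
replace the first caller's result.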

This model is easily broken by concurrent access, so the kernel
provides a set of access functions that serialize use of these
registers.  Additionally, commits like 932c435caba8 ("PCI: Add
dev_flags bit to access VPD through function 0") and 7aa6ca4d39ed
("PCI: Add VPD function 0 quirk for Intel Ethernet devices") indicate
that VPD registers can be shared between functions on multifunction
devices, creating dependencies between otherwise independent devices.

Fortunately, it's quite easy to emulate the VPD registers: we store
copies of the address and data registers in memory and trigger a VPD
read or write through the kernel access functions on writes to the
address register.  This allows vfio users to avoid seeing spurious
register changes from accesses on other devices and enables the use of
shared quirks in the host kernel.  We can theoretically still race
with access through sysfs, but the window of opportunity is much
smaller.
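The trigger condition and completion signal for this emulation can be
sketched in isolation.  This is a simplified illustration, not the patch
code: vpd_write_hits_flag_byte() and vpd_signal_completion() are
hypothetical helper names, and offset is assumed to be relative to the
start of the VPD capability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VPD_ADDR_OFF 2      /* same value as the kernel's PCI_VPD_ADDR */
#define VPD_ADDR_F   0x8000 /* same value as the kernel's PCI_VPD_ADDR_F */

/*
 * True when a config write covering [offset, offset + count) includes
 * the upper byte of the address register, which holds the F flag.
 * Only such writes should start an emulated VPD transfer.
 */
static bool vpd_write_hits_flag_byte(int offset, int count)
{
	int flag_byte = VPD_ADDR_OFF + 1;

	assert(count > 0);
	return offset <= flag_byte && offset + count > flag_byte;
}

/*
 * After a successful transfer, flip F in the shadow address register
 * so that a polling driver sees the operation complete.
 */
static uint16_t vpd_signal_completion(uint16_t shadow_addr)
{
	return shadow_addr ^ VPD_ADDR_F;
}
```

For example, a 2-byte write at offset 2 or a 1-byte write at offset 3
starts a transfer, while a 1-byte write at offset 2 only updates the
low byte of the shadow address register.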

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

RFC - Is this something we should do?  Should we consider providing
similar emulation through PCI sysfs to allow lspci to also make use
of the vpd interfaces?

 drivers/vfio/pci/vfio_pci_config.c |   70 +++++++++++++++++++++++++++++++++++-
 1 file changed, 69 insertions(+), 1 deletion(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Rustad, Mark D Sept. 12, 2015, 1:11 a.m. UTC | #1
Alex,

> On Sep 11, 2015, at 11:16 AM, Alex Williamson <alex.williamson@redhat.com> wrote:
> 
> RFC - Is this something we should do?

Superficially this looks pretty good. I need to think harder to be sure of the details.

> Should we consider providing
> similar emulation through PCI sysfs to allow lspci to also make use
> of the vpd interfaces?

It looks to me like lspci already uses the vpd attribute in sysfs to access VPD, so maybe nothing more than this is needed. No doubt lspci can be coerced into accessing VPD directly, but is that really worth going after? I'm not so sure.

An strace of lspci accessing a device with VPD shows me:

write(1, "\tCapabilities: [e0] Vital Produc"..., 39	Capabilities: [e0] Vital Product Data
) = 39
open("/sys/bus/pci/devices/0000:02:00.0/vpd", O_RDONLY) = 4
----------------------------------------^^^ accesses to this should be safe, I think

pread(4, "\202", 1, 0)                  = 1
pread(4, "\10\0", 2, 1)                 = 2
pread(4, "PVL Dell", 8, 3)              = 8
write(1, "\t\tProduct Name: PVL Dell\n", 25		Product Name: PVL Dell
) = 25

and so forth.
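
The three pread() calls above match the layout of a VPD large resource:
a one-byte tag (0x82, Identifier String), a two-byte little-endian
length, then the string itself.  A minimal sketch of parsing that
header from a buffer already read from the vpd attribute follows;
vpd_parse_id_string() is a hypothetical helper, not a function from
lspci or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VPD_TAG_ID_STRING 0x82 /* large resource tag: Identifier String */

/*
 * Copy the identifier string at the start of a VPD image into out as a
 * NUL-terminated string.  Returns the string length, or -1 if the tag
 * is not an Identifier String or either buffer is too small.
 */
static int vpd_parse_id_string(const uint8_t *buf, size_t len,
			       char *out, size_t out_len)
{
	size_t n;

	assert(buf != NULL && out != NULL);

	/* Need at least the tag byte and the two length bytes. */
	if (len < 3 || buf[0] != VPD_TAG_ID_STRING)
		return -1;

	/* The length is stored little-endian in bytes 1 and 2. */
	n = (size_t)buf[1] | ((size_t)buf[2] << 8);
	if (n > len - 3 || n + 1 > out_len)
		return -1;

	memcpy(out, buf + 3, n);
	out[n] = '\0';
	return (int)n;
}
```

Fed the 11 bytes shown in the strace, this would produce the same
"PVL Dell" product name that lspci prints.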

--
Mark Rustad, Networking Division, Intel Corporation
Alex Williamson Sept. 14, 2015, 4:54 p.m. UTC | #2
On Sat, 2015-09-12 at 01:11 +0000, Rustad, Mark D wrote:
> Alex,
> 
> > On Sep 11, 2015, at 11:16 AM, Alex Williamson <alex.williamson@redhat.com> wrote:
> > 
> > RFC - Is this something we should do?
> 
> Superficially this looks pretty good. I need to think harder to be sure of the details.
> 
> > Should we consider providing
> > similar emulation through PCI sysfs to allow lspci to also make use
> > of the vpd interfaces?
> 
> It looks to me like lspci already uses the vpd attribute in sysfs to access VPD, so maybe nothing more than this is needed. No doubt lspci can be coerced into accessing VPD directly, but is that really worth going after? I'm not so sure.
> 
> An strace of lspci accessing a device with VPD shows me:
> 
> write(1, "\tCapabilities: [e0] Vital Produc"..., 39	Capabilities: [e0] Vital Product Data
> ) = 39
> open("/sys/bus/pci/devices/0000:02:00.0/vpd", O_RDONLY) = 4
> ----------------------------------------^^^ accesses to this should be safe, I think
> 
> pread(4, "\202", 1, 0)                  = 1
> pread(4, "\10\0", 2, 1)                 = 2
> pread(4, "PVL Dell", 8, 3)              = 8
> write(1, "\t\tProduct Name: PVL Dell\n", 25		Product Name: PVL Dell
> ) = 25
> 
> and so forth.

Oh good, so aside from some rogue admin poking around with setpci,
access through pci-sysfs is hopefully not an issue.  Thanks for looking
into it.

Alex

Rustad, Mark D Sept. 14, 2015, 9:34 p.m. UTC | #3
> On Sep 11, 2015, at 6:11 PM, Rustad, Mark D <mark.d.rustad@intel.com> wrote:
> 
> Superficially this looks pretty good. I need to think harder to be sure of the details.

This is the first time I've looked at any of the vfio code, but this still looks good to me. Thanks for taking this on and exposing the vfio code to me. I hope more devices will be able to take advantage of the quirk and get their VPD issues resolved.

I did run this on a host with a device with VPD assigned to a guest and did not see any trouble when accessing it concurrently from both the guest and the host on the same and different functions. I don't think my particular environment is ideal to fully reproduce the problem (no writable VPD area), but my initial testing looks good.

Acked-by: Mark Rustad <mark.d.rustad@intel.com>

--
Mark Rustad, Networking Division, Intel Corporation

Patch

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index ff75ca3..a8657ef 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -671,6 +671,73 @@  static int __init init_pci_cap_pm_perm(struct perm_bits *perm)
 	return 0;
 }
 
+static int vfio_vpd_config_write(struct vfio_pci_device *vdev, int pos,
+				 int count, struct perm_bits *perm,
+				 int offset, __le32 val)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	__le16 *paddr = (__le16 *)(vdev->vconfig + pos - offset + PCI_VPD_ADDR);
+	__le32 *pdata = (__le32 *)(vdev->vconfig + pos - offset + PCI_VPD_DATA);
+	u16 addr;
+	u32 data;
+
+	/*
+	 * Write through to emulation.  If the write includes the upper byte
+	 * of PCI_VPD_ADDR, then the PCI_VPD_ADDR_F bit is written and we
+	 * have work to do.
+	 */
+	count = vfio_default_config_write(vdev, pos, count, perm, offset, val);
+	if (count < 0 || offset > PCI_VPD_ADDR + 1 ||
+	    offset + count <= PCI_VPD_ADDR + 1)
+		return count;
+
+	addr = le16_to_cpu(*paddr);
+
+	if (addr & PCI_VPD_ADDR_F) {
+		data = le32_to_cpu(*pdata);
+		if (pci_write_vpd(pdev, addr & ~PCI_VPD_ADDR_F, 4, &data) != 4)
+			return count;
+	} else {
+		if (pci_read_vpd(pdev, addr, 4, &data) != 4)
+			return count;
+		*pdata = cpu_to_le32(data);
+	}
+
+	/*
+	 * Toggle PCI_VPD_ADDR_F in the emulated PCI_VPD_ADDR register to
+	 * signal completion.  If an error occurs above, we assume that not
+	 * toggling this bit will induce a driver timeout.
+	 */
+	addr ^= PCI_VPD_ADDR_F;
+	*paddr = cpu_to_le16(addr);
+
+	return count;
+}
+
+/* Permissions for Vital Product Data capability */
+static int __init init_pci_cap_vpd_perm(struct perm_bits *perm)
+{
+	if (alloc_perm_bits(perm, pci_cap_length[PCI_CAP_ID_VPD]))
+		return -ENOMEM;
+
+	perm->writefn = vfio_vpd_config_write;
+
+	/*
+	 * We always virtualize the next field so we can remove
+	 * capabilities from the chain if we want to.
+	 */
+	p_setb(perm, PCI_CAP_LIST_NEXT, (u8)ALL_VIRT, NO_WRITE);
+
+	/*
+	 * Both the address and data registers are virtualized to
+	 * enable access through the pci_read_vpd/pci_write_vpd functions
+	 */
+	p_setw(perm, PCI_VPD_ADDR, (u16)ALL_VIRT, (u16)ALL_WRITE);
+	p_setd(perm, PCI_VPD_DATA, ALL_VIRT, ALL_WRITE);
+
+	return 0;
+}
+
 /* Permissions for PCI-X capability */
 static int __init init_pci_cap_pcix_perm(struct perm_bits *perm)
 {
@@ -790,6 +857,7 @@  void vfio_pci_uninit_perm_bits(void)
 	free_perm_bits(&cap_perms[PCI_CAP_ID_BASIC]);
 
 	free_perm_bits(&cap_perms[PCI_CAP_ID_PM]);
+	free_perm_bits(&cap_perms[PCI_CAP_ID_VPD]);
 	free_perm_bits(&cap_perms[PCI_CAP_ID_PCIX]);
 	free_perm_bits(&cap_perms[PCI_CAP_ID_EXP]);
 	free_perm_bits(&cap_perms[PCI_CAP_ID_AF]);
@@ -807,7 +875,7 @@  int __init vfio_pci_init_perm_bits(void)
 
 	/* Capabilities */
 	ret |= init_pci_cap_pm_perm(&cap_perms[PCI_CAP_ID_PM]);
-	cap_perms[PCI_CAP_ID_VPD].writefn = vfio_raw_config_write;
+	ret |= init_pci_cap_vpd_perm(&cap_perms[PCI_CAP_ID_VPD]);
 	ret |= init_pci_cap_pcix_perm(&cap_perms[PCI_CAP_ID_PCIX]);
 	cap_perms[PCI_CAP_ID_VNDR].writefn = vfio_raw_config_write;
 	ret |= init_pci_cap_exp_perm(&cap_perms[PCI_CAP_ID_EXP]);