| Message ID | a04c44aa4625a6edfadaf9c9e2c2afb460ad1857.1760368250.git.leon@kernel.org |
|---|---|
| State | New |
| Series | vfio/pci: Allow MMIO regions to be exported through dma-buf |
Hi Leon,

On Mon, Oct 13, 2025 at 06:26:10PM +0300, Leon Romanovsky wrote:
> @@ -2090,6 +2092,9 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
>  	INIT_LIST_HEAD(&vdev->dummy_resources_list);
>  	INIT_LIST_HEAD(&vdev->ioeventfds_list);
>  	INIT_LIST_HEAD(&vdev->sriov_pfs_item);
> +	ret = pcim_p2pdma_init(vdev->pdev);
> +	if (ret != -EOPNOTSUPP)
> +		return ret;
>  	init_rwsem(&vdev->memory_lock);
>  	xa_init(&vdev->ctx);

I think this should be:
	if (ret && ret != -EOPNOTSUPP)
		return ret;

Otherwise, init_rwsem() and xa_init() would be missed if ret==0.

Thanks
Nicolin
On Wed, Oct 15, 2025 at 09:09:53PM -0700, Nicolin Chen wrote:
> Hi Leon,
>
> On Mon, Oct 13, 2025 at 06:26:10PM +0300, Leon Romanovsky wrote:
> > @@ -2090,6 +2092,9 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
> >  	INIT_LIST_HEAD(&vdev->dummy_resources_list);
> >  	INIT_LIST_HEAD(&vdev->ioeventfds_list);
> >  	INIT_LIST_HEAD(&vdev->sriov_pfs_item);
> > +	ret = pcim_p2pdma_init(vdev->pdev);
> > +	if (ret != -EOPNOTSUPP)
> > +		return ret;
> >  	init_rwsem(&vdev->memory_lock);
> >  	xa_init(&vdev->ctx);
>
> I think this should be:
> 	if (ret && ret != -EOPNOTSUPP)
> 		return ret;
>
> Otherwise, init_rwsem() and xa_init() would be missed if ret==0.

You are absolutely right.

Thanks

>
> Thanks
> Nicolin
>
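The difference between the two checks can be illustrated with a minimal userspace sketch. This is not kernel code: `struct mock_vdev`, `init_dev_posted()`, `init_dev_fixed()` and `demo()` are hypothetical names introduced only for this illustration, and the return value of pcim_p2pdma_init() is passed in as a plain parameter instead of being obtained from a real call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state set up after the P2P call. */
struct mock_vdev {
	bool lock_ready;	/* models init_rwsem(&vdev->memory_lock) */
	bool ctx_ready;		/* models xa_init(&vdev->ctx) */
};

/* Check as posted: every value except -EOPNOTSUPP returns early, including 0. */
int init_dev_posted(struct mock_vdev *vdev, int p2pdma_ret)
{
	if (p2pdma_ret != -EOPNOTSUPP)
		return p2pdma_ret;
	vdev->lock_ready = true;
	vdev->ctx_ready = true;
	return 0;
}

/* Check as suggested in the review: only real errors other than -EOPNOTSUPP abort. */
int init_dev_fixed(struct mock_vdev *vdev, int p2pdma_ret)
{
	if (p2pdma_ret && p2pdma_ret != -EOPNOTSUPP)
		return p2pdma_ret;
	vdev->lock_ready = true;
	vdev->ctx_ready = true;
	return 0;
}

/* Small self-check that exercises both variants. */
void demo(void)
{
	struct mock_vdev posted = { false, false };
	struct mock_vdev fixed = { false, false };

	/* Success (0) from the P2P init: the posted check skips the later setup. */
	assert(init_dev_posted(&posted, 0) == 0);
	assert(!posted.lock_ready && !posted.ctx_ready);

	/* The suggested check falls through and completes the setup. */
	assert(init_dev_fixed(&fixed, 0) == 0);
	assert(fixed.lock_ready && fixed.ctx_ready);
}
```

With the posted condition, a successful return of 0 leaves the function before the lock and xarray are initialized, while still reporting success to the caller. With the suggested condition, both 0 and -EOPNOTSUPP continue to the remaining initialization, and only other error codes are propagated.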
On Mon, Oct 13, 2025 at 06:26:10PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> Make sure that all VFIO PCI devices have peer-to-peer capabilities
> enabled, so we would be able to export their MMIO memory through DMABUF,

How do you know that they are safe to use with P2P?
On Thu, Oct 16, 2025 at 11:32:59PM -0700, Christoph Hellwig wrote:
> On Mon, Oct 13, 2025 at 06:26:10PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> >
> > Make sure that all VFIO PCI devices have peer-to-peer capabilities
> > enabled, so we would be able to export their MMIO memory through DMABUF,
>
> How do you know that they are safe to use with P2P?

All PCI devices are "safe" for P2P by spec. I've never heard of a
non-compliant device causing problems in this area.

The issue is always SOC support inside the CPU and that is dealt with
inside the P2P subsystem logic.

If we ever see a problem it would be dealt with by quirking the broken
device through pci-quirks and having the p2p subsystem refuse any p2p
with that device.

Jason
On Fri, Oct 17, 2025 at 08:55:24AM -0300, Jason Gunthorpe wrote:
> On Thu, Oct 16, 2025 at 11:32:59PM -0700, Christoph Hellwig wrote:
> > On Mon, Oct 13, 2025 at 06:26:10PM +0300, Leon Romanovsky wrote:
> > > From: Leon Romanovsky <leonro@nvidia.com>
> > >
> > > Make sure that all VFIO PCI devices have peer-to-peer capabilities
> > > enabled, so we would be able to export their MMIO memory through DMABUF,
> >
> > How do you know that they are safe to use with P2P?
>
> All PCI devices are "safe" for P2P by spec. I've never heard of a
> non-compliant device causing problems in this area.

Real PCIe devices, yes. But we have a lot of stuff masquerading as
such which is just emulated or specially integrated. I.e. a lot of
integrated Intel GPUs are claimed to have had issues there.
On Mon, Oct 20, 2025 at 05:28:02AM -0700, Christoph Hellwig wrote:
> On Fri, Oct 17, 2025 at 08:55:24AM -0300, Jason Gunthorpe wrote:
> > On Thu, Oct 16, 2025 at 11:32:59PM -0700, Christoph Hellwig wrote:
> > > On Mon, Oct 13, 2025 at 06:26:10PM +0300, Leon Romanovsky wrote:
> > > > From: Leon Romanovsky <leonro@nvidia.com>
> > > >
> > > > Make sure that all VFIO PCI devices have peer-to-peer capabilities
> > > > enabled, so we would be able to export their MMIO memory through DMABUF,
> > >
> > > How do you know that they are safe to use with P2P?
> >
> > All PCI devices are "safe" for P2P by spec. I've never heard of a
> > non-compliant device causing problems in this area.
>
> Real PCIe devices, yes. But we have a lot of stuff masquerading as
> such which is just emulated or specially integrated. I.e. a lot of
> integrated Intel GPUs are claimed to have had issues there.

Sure, but this should be handled by the P2P subsystem and PCI quirks,
IMHO. It isn't VFIO's job. If people complain about broken HW then it
is easy to add those things.

I think the majority of stuff is OK, there is a chunk of configurations
that will have clean failures - meaning the initiating device gets an
error indication and handles it. Then there is a small minority where
the platform crashes with a machine check.

IDK where Intel GPU lands on this, but VFIO has always supported P2P
and userspace/VMs have always been able to trigger these kinds of
bugs. If nobody has complained so far I'm not inclined to do anything
right now.

VFIO has always kind of come along with a footnote that if you
actually want fully safe VFIO then it is up to the user to validate
that the SOC and device implementations are sane.

Jason
On Mon, Oct 20, 2025 at 10:08:55AM -0300, Jason Gunthorpe wrote:
> Sure, but this should be handled by the P2P subsystem and PCI quirks,
> IMHO. It isn't VFIO's job. If people complain about broken HW then it
> is easy to add those things.

I think it is. You now open up behavior generally that previously
had specific drivers in charge.

> IDK where Intel GPU lands on this, but VFIO has always supported P2P

How?
On Wed, Oct 22, 2025 at 12:08:48AM -0700, Christoph Hellwig wrote:
> On Mon, Oct 20, 2025 at 10:08:55AM -0300, Jason Gunthorpe wrote:
> > Sure, but this should be handled by the P2P subsystem and PCI quirks,
> > IMHO. It isn't VFIO's job. If people complain about broken HW then it
> > is easy to add those things.
>
> I think it is. You now open up behavior generally that previously
> had specific drivers in charge.

It has always been available in VFIO. This series is fixing it up to
not have the lifetime bugs.

> > IDK where Intel GPU lands on this, but VFIO has always supported P2P
>
> How?

It uses follow_pfnmap_start()/etc to fish the MMIO PFNs out of a VMA
and stick them into the iommu.

Jason
On Mon, Oct 13, 2025 at 06:26:10PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> Make sure that all VFIO PCI devices have peer-to-peer capabilities
> enabled, so we would be able to export their MMIO memory through DMABUF,

Let's enhance this:

VFIO has always supported P2P mappings with itself. VFIO type 1
insecurely reads PFNs directly out of a VMA's PTEs and programs them
into the IOMMU allowing any two VFIO devices to perform P2P to each
other.

All existing VMMs use this capability to export P2P into a VM where
the VM could set up any kind of DMA it likes. Projects like DPDK/SPDK
are also known to make use of this, though less frequently.

As a first step to more properly integrating VFIO with the P2P
subsystem unconditionally enable P2P support for VFIO PCI devices. The
struct p2pdma_provider will act as a handle to the P2P subsystem to do
things like DMA mapping.

While real PCI devices have to support P2P (they can't even tell if an
IOVA is P2P or not) there may be fake PCI devices that may trigger
some kind of catastrophic system failure. To date VFIO has never
tripped up on such a case, but if one is discovered the plan is to add
a PCI quirk and have pcim_p2pdma_init() fail. This will fully block
the broken device throughout any users of the P2P subsystem in the
kernel.

Thus P2P through DMABUF will follow the historical VFIO model and be
unconditionally enabled by vfio-pci.

Jason
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index ca9a95716a85..fe247d0e2831 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -28,6 +28,7 @@
 #include <linux/nospec.h>
 #include <linux/sched/mm.h>
 #include <linux/iommufd.h>
+#include <linux/pci-p2pdma.h>
 #if IS_ENABLED(CONFIG_EEH)
 #include <asm/eeh.h>
 #endif
@@ -2081,6 +2082,7 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
 {
 	struct vfio_pci_core_device *vdev =
 		container_of(core_vdev, struct vfio_pci_core_device, vdev);
+	int ret;
 
 	vdev->pdev = to_pci_dev(core_vdev->dev);
 	vdev->irq_type = VFIO_PCI_NUM_IRQS;
@@ -2090,6 +2092,9 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
 	INIT_LIST_HEAD(&vdev->dummy_resources_list);
 	INIT_LIST_HEAD(&vdev->ioeventfds_list);
 	INIT_LIST_HEAD(&vdev->sriov_pfs_item);
+	ret = pcim_p2pdma_init(vdev->pdev);
+	if (ret != -EOPNOTSUPP)
+		return ret;
 	init_rwsem(&vdev->memory_lock);
 	xa_init(&vdev->ctx);