[v5,8/9] vfio/pci: Enable peer-to-peer DMA transactions by default

Message ID a04c44aa4625a6edfadaf9c9e2c2afb460ad1857.1760368250.git.leon@kernel.org
State New
Series vfio/pci: Allow MMIO regions to be exported through dma-buf

Commit Message

Leon Romanovsky Oct. 13, 2025, 3:26 p.m. UTC
From: Leon Romanovsky <leonro@nvidia.com>

Make sure that all VFIO PCI devices have peer-to-peer capabilities
enabled, so that we can export their MMIO memory through DMABUF.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Nicolin Chen Oct. 16, 2025, 4:09 a.m. UTC | #1
Hi Leon,

On Mon, Oct 13, 2025 at 06:26:10PM +0300, Leon Romanovsky wrote:
> @@ -2090,6 +2092,9 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
>  	INIT_LIST_HEAD(&vdev->dummy_resources_list);
>  	INIT_LIST_HEAD(&vdev->ioeventfds_list);
>  	INIT_LIST_HEAD(&vdev->sriov_pfs_item);
> +	ret = pcim_p2pdma_init(vdev->pdev);
> +	if (ret != -EOPNOTSUPP)
> +		return ret;
>  	init_rwsem(&vdev->memory_lock);
>  	xa_init(&vdev->ctx);

I think this should be:
	if (ret && ret != -EOPNOTSUPP)
		return ret;

Otherwise, init_rwsem() and xa_init() would be missed if ret==0.
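
To make the difference concrete, here is a small userspace sketch of the two
checks. It is only a model: init_dev_as_posted(), init_dev_as_suggested() and
the locks_ready flag are hypothetical stand-ins for vfio_pci_core_init_dev()
and for "init_rwsem()/xa_init() were reached", and p2p_ret stands for whatever
pcim_p2pdma_init() returned.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the check as posted. p2p_ret plays the role of
 * the pcim_p2pdma_init() return value; *locks_ready records whether the
 * later init_rwsem()/xa_init() steps would have been reached.
 */
static int init_dev_as_posted(int p2p_ret, bool *locks_ready)
{
	*locks_ready = false;
	if (p2p_ret != -EOPNOTSUPP)
		return p2p_ret;	/* also taken when p2p_ret == 0 */
	*locks_ready = true;
	return 0;
}

/* Same model with the suggested check: only real errors bail out. */
static int init_dev_as_suggested(int p2p_ret, bool *locks_ready)
{
	*locks_ready = false;
	if (p2p_ret && p2p_ret != -EOPNOTSUPP)
		return p2p_ret;
	*locks_ready = true;
	return 0;
}

/* Compare both variants for success, "not supported" and a real error. */
static void compare_checks(void)
{
	bool ready;

	/* Success: the posted check returns 0 but skips the remaining init. */
	assert(init_dev_as_posted(0, &ready) == 0 && !ready);
	assert(init_dev_as_suggested(0, &ready) == 0 && ready);

	/* -EOPNOTSUPP is tolerated by both variants. */
	assert(init_dev_as_posted(-EOPNOTSUPP, &ready) == 0 && ready);
	assert(init_dev_as_suggested(-EOPNOTSUPP, &ready) == 0 && ready);

	/* Any other error is propagated by both variants. */
	assert(init_dev_as_posted(-ENOMEM, &ready) == -ENOMEM && !ready);
	assert(init_dev_as_suggested(-ENOMEM, &ready) == -ENOMEM && !ready);
}
```

Calling compare_checks() from a main() shows that the two variants differ
only in the ret == 0 case, which is the case that matters here.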

Thanks
Nicolin
Leon Romanovsky Oct. 16, 2025, 6:10 a.m. UTC | #2
On Wed, Oct 15, 2025 at 09:09:53PM -0700, Nicolin Chen wrote:
> Hi Leon,
> 
> On Mon, Oct 13, 2025 at 06:26:10PM +0300, Leon Romanovsky wrote:
> > @@ -2090,6 +2092,9 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
> >  	INIT_LIST_HEAD(&vdev->dummy_resources_list);
> >  	INIT_LIST_HEAD(&vdev->ioeventfds_list);
> >  	INIT_LIST_HEAD(&vdev->sriov_pfs_item);
> > +	ret = pcim_p2pdma_init(vdev->pdev);
> > +	if (ret != -EOPNOTSUPP)
> > +		return ret;
> >  	init_rwsem(&vdev->memory_lock);
> >  	xa_init(&vdev->ctx);
> 
> I think this should be:
> 	if (ret && ret != -EOPNOTSUPP)
> 		return ret;
> 
> Otherwise, init_rwsem() and xa_init() would be missed if ret==0.

You are absolutely right.

Thanks

> 
> Thanks
> Nicolin
>
Christoph Hellwig Oct. 17, 2025, 6:32 a.m. UTC | #3
On Mon, Oct 13, 2025 at 06:26:10PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Make sure that all VFIO PCI devices have peer-to-peer capabilities
> enabled, so that we can export their MMIO memory through DMABUF.

How do you know that they are safe to use with P2P?
Jason Gunthorpe Oct. 17, 2025, 11:55 a.m. UTC | #4
On Thu, Oct 16, 2025 at 11:32:59PM -0700, Christoph Hellwig wrote:
> On Mon, Oct 13, 2025 at 06:26:10PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > Make sure that all VFIO PCI devices have peer-to-peer capabilities
> > enabled, so that we can export their MMIO memory through DMABUF.
> 
> How do you know that they are safe to use with P2P?

All PCI devices are "safe" for P2P by spec. I've never heard of a
non-compliant device causing problems in this area.

The issue is always SOC support inside the CPU and that is dealt with
inside the P2P subsystem logic.

If we ever see a problem it would be dealt with by quirking the broken
device through pci-quirks and having the p2p subsystem refuse any p2p
with that device.

Jason
Christoph Hellwig Oct. 20, 2025, 12:28 p.m. UTC | #5
On Fri, Oct 17, 2025 at 08:55:24AM -0300, Jason Gunthorpe wrote:
> On Thu, Oct 16, 2025 at 11:32:59PM -0700, Christoph Hellwig wrote:
> > On Mon, Oct 13, 2025 at 06:26:10PM +0300, Leon Romanovsky wrote:
> > > From: Leon Romanovsky <leonro@nvidia.com>
> > > 
> > > Make sure that all VFIO PCI devices have peer-to-peer capabilities
> > > enabled, so that we can export their MMIO memory through DMABUF.
> > 
> > How do you know that they are safe to use with P2P?
> 
> All PCI devices are "safe" for P2P by spec. I've never heard of a
> non-compliant device causing problems in this area.

Real PCIe devices, yes.  But we have a lot of stuff masquerading as
such which is just emulated or specially integrated.  E.g. a lot of
integrated Intel GPUs are claimed to have had issues there.
Jason Gunthorpe Oct. 20, 2025, 1:08 p.m. UTC | #6
On Mon, Oct 20, 2025 at 05:28:02AM -0700, Christoph Hellwig wrote:
> On Fri, Oct 17, 2025 at 08:55:24AM -0300, Jason Gunthorpe wrote:
> > On Thu, Oct 16, 2025 at 11:32:59PM -0700, Christoph Hellwig wrote:
> > > On Mon, Oct 13, 2025 at 06:26:10PM +0300, Leon Romanovsky wrote:
> > > > From: Leon Romanovsky <leonro@nvidia.com>
> > > > 
> > > > Make sure that all VFIO PCI devices have peer-to-peer capabilities
> > > > enabled, so that we can export their MMIO memory through DMABUF.
> > > 
> > > How do you know that they are safe to use with P2P?
> > 
> > All PCI devices are "safe" for P2P by spec. I've never heard of a
> > non-compliant device causing problems in this area.
> 
> Real PCIe devices, yes.  But we have a lot of stuff masquerading as
> such which is just emulated or specially integrated.  E.g. a lot of
> integrated Intel GPUs are claimed to have had issues there.

Sure, but this should be handled by the P2P subsystem and PCI quirks,
IMHO. It isn't VFIO's job. If people complain about broken HW then it
is easy to add those things.

I think the majority of stuff is OK; there is a chunk of
configurations that will have clean failures - meaning the initiating
device gets an error indication and handles it. Then there is a small
minority where the platform crashes with a machine check.

IDK where Intel GPU lands on this, but VFIO has always supported P2P
and userspace/VMs have always been able to trigger these kinds of
bugs. If nobody has complained so far I'm not inclined to do anything
right now.

VFIO has always kind of come along with a footnote that if you
actually want fully safe VFIO then it is up to the user to validate
the SOC and device implementations are sane.

Jason
Christoph Hellwig Oct. 22, 2025, 7:08 a.m. UTC | #7
On Mon, Oct 20, 2025 at 10:08:55AM -0300, Jason Gunthorpe wrote:
> Sure, but this should be handled by the P2P subsystem and PCI quirks,
> IMHO. It isn't VFIO's job. If people complain about broken HW then it
> is easy to add those things.

I think it is.  You now open up behavior generally that previously
had specific drivers in charge.

> IDK where Intel GPU lands on this, but VFIO has always supported P2P

How?
Jason Gunthorpe Oct. 22, 2025, 11:38 a.m. UTC | #8
On Wed, Oct 22, 2025 at 12:08:48AM -0700, Christoph Hellwig wrote:
> On Mon, Oct 20, 2025 at 10:08:55AM -0300, Jason Gunthorpe wrote:
> > Sure, but this should be handled by the P2P subsystem and PCI quirks,
> > IMHO. It isn't VFIO's job. If people complain about broken HW then it
> > is easy to add those things.
> 
> I think it is.  You now open up behavior generally that previously
> had specific drivers in charge.

It has always been available in VFIO. This series is fixing it up to
not have the lifetime bugs.

> > IDK where Intel GPU lands on this, but VFIO has always supported P2P
> 
> How?

It uses follow_pfnmap_start()/etc to fish the MMIO PFNs out of a VMA and
stick them into the iommu.

Jason
Jason Gunthorpe Oct. 22, 2025, 11:54 a.m. UTC | #9
On Mon, Oct 13, 2025 at 06:26:10PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Make sure that all VFIO PCI devices have peer-to-peer capabilities
> enabled, so that we can export their MMIO memory through DMABUF.

Let's enhance this:

VFIO has always supported P2P mappings with itself. VFIO type 1
insecurely reads PFNs directly out of a VMA's PTEs and programs them
into the IOMMU allowing any two VFIO devices to perform P2P to each
other.

All existing VMMs use this capability to export P2P into a VM where
the VM could set up any kind of DMA it likes. Projects like DPDK/SPDK
are also known to make use of this, though less frequently.

As a first step to more properly integrating VFIO with the P2P
subsystem, unconditionally enable P2P support for VFIO PCI devices. The
struct p2pdma_provider will act as a handle to the P2P subsystem to
do things like DMA mapping.

While real PCI devices have to support P2P (they can't even tell if an
IOVA is P2P or not) there may be fake PCI devices that may trigger
some kind of catastrophic system failure. To date VFIO has never
tripped up on such a case, but if one is discovered the plan is to add
a PCI quirk and have pcim_p2pdma_init() fail. This will fully block
the broken device throughout any users of the P2P subsystem in the
kernel.

Thus P2P through DMABUF will follow the historical VFIO model and be
unconditionally enabled by vfio-pci.
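
The quirk plan above can be pictured with a small userspace model. This is
only a sketch under assumptions: the table, the made-up vendor/device IDs,
model_p2pdma_init() and model_vfio_init() are all hypothetical, and the
-ENODEV error code is picked purely for illustration. It does not show how
the real quirk or pcim_p2pdma_init() would be implemented.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical quirk entry; the IDs below are made up. */
struct model_pci_id {
	uint16_t vendor;
	uint16_t device;
};

/* Hypothetical list of devices known to be unsafe for P2P. */
static const struct model_pci_id model_p2p_broken[] = {
	{ 0xffff, 0x0001 },
};

/*
 * Stand-in for pcim_p2pdma_init(): fails for quirked devices.
 * The -ENODEV return value is an assumption, not the kernel's choice.
 */
static int model_p2pdma_init(uint16_t vendor, uint16_t device)
{
	size_t i;

	for (i = 0; i < MODEL_ARRAY_SIZE(model_p2p_broken); i++) {
		if (model_p2p_broken[i].vendor == vendor &&
		    model_p2p_broken[i].device == device)
			return -ENODEV;
	}
	return 0;
}

/*
 * Stand-in for a P2P user such as vfio-pci: it tolerates -EOPNOTSUPP
 * but propagates every other failure, so a quirked device is refused
 * by every caller of the P2P subsystem in the same way.
 */
static int model_vfio_init(uint16_t vendor, uint16_t device)
{
	int ret = model_p2pdma_init(vendor, device);

	if (ret && ret != -EOPNOTSUPP)
		return ret;
	return 0;
}
```

The point of the model is that the decision lives in one place, the P2P
init path, so vfio-pci itself carries no per-device knowledge.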

Jason

Patch

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index ca9a95716a85..fe247d0e2831 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -28,6 +28,7 @@ 
 #include <linux/nospec.h>
 #include <linux/sched/mm.h>
 #include <linux/iommufd.h>
+#include <linux/pci-p2pdma.h>
 #if IS_ENABLED(CONFIG_EEH)
 #include <asm/eeh.h>
 #endif
@@ -2081,6 +2082,7 @@  int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
 {
 	struct vfio_pci_core_device *vdev =
 		container_of(core_vdev, struct vfio_pci_core_device, vdev);
+	int ret;
 
 	vdev->pdev = to_pci_dev(core_vdev->dev);
 	vdev->irq_type = VFIO_PCI_NUM_IRQS;
@@ -2090,6 +2092,9 @@  int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
 	INIT_LIST_HEAD(&vdev->dummy_resources_list);
 	INIT_LIST_HEAD(&vdev->ioeventfds_list);
 	INIT_LIST_HEAD(&vdev->sriov_pfs_item);
+	ret = pcim_p2pdma_init(vdev->pdev);
+	if (ret != -EOPNOTSUPP)
+		return ret;
 	init_rwsem(&vdev->memory_lock);
 	xa_init(&vdev->ctx);