[11/14] PCI/P2PDMA: dma_map P2PDMA map requests that traverse the host bridge

Message ID 20190722230859.5436-12-logang@deltatee.com
State Superseded
Delegated to: Bjorn Helgaas
Series: PCI/P2PDMA: Support transactions that hit the host bridge

Commit Message

Logan Gunthorpe July 22, 2019, 11:08 p.m. UTC
Any requests that traverse the host bridge will need to be mapped into
the IOMMU, so call dma_map_sg() inside pci_p2pdma_map_sg() when
appropriate.

Similarly, call dma_unmap_sg() inside pci_p2pdma_unmap_sg().

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/pci/p2pdma.c | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

Comments

Christoph Hellwig July 24, 2019, 6:32 a.m. UTC | #1
>  	struct dev_pagemap *pgmap = sg_page(sg)->pgmap;
> +	struct pci_dev *client;
> +	int dist;
> +
> +	client = find_parent_pci_dev(dev);
> +	if (WARN_ON_ONCE(!client))
> +		return 0;
>  
> +	dist = upstream_bridge_distance(pgmap->pci_p2pdma_provider,
> +					client, NULL);

Doing this on every mapping call sounds expensive..

> +	if (WARN_ON_ONCE(dist & P2PDMA_NOT_SUPPORTED))
> +		return 0;
> +
> +	if (dist & P2PDMA_THRU_HOST_BRIDGE)
> +		return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
> +	else
> +		return __pci_p2pdma_map_sg(pgmap, dev, sg, nents);

Can't we organize the values so that we can switch on the return
value instead of doing flag checks?

>  }
>  EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg_attrs);
>  
> @@ -847,6 +861,21 @@ EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg_attrs);
>  void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
>  		int nents, enum dma_data_direction dir, unsigned long attrs)
>  {
> +	struct dev_pagemap *pgmap = sg_page(sg)->pgmap;
> +	struct pci_dev *client;
> +	int dist;
> +
> +	client = find_parent_pci_dev(dev);
> +	if (!client)
> +		return;
> +
> +	dist = upstream_bridge_distance(pgmap->pci_p2pdma_provider,
> +					client, NULL);

And then we do it for every unmap again..
Logan Gunthorpe July 24, 2019, 3:58 p.m. UTC | #2
On 2019-07-24 12:32 a.m., Christoph Hellwig wrote:
>>  	struct dev_pagemap *pgmap = sg_page(sg)->pgmap;
>> +	struct pci_dev *client;
>> +	int dist;
>> +
>> +	client = find_parent_pci_dev(dev);
>> +	if (WARN_ON_ONCE(!client))
>> +		return 0;
>>  
>> +	dist = upstream_bridge_distance(pgmap->pci_p2pdma_provider,
>> +					client, NULL);
> 
> Doing this on every mapping call sounds expensive..

The result of this function is cached in an xarray (per patch 4), so on
the hot path it should just be a single xa_load(), a relatively fast
lookup of the kind already used in other hot paths.
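
For reference, a minimal sketch of the hot-path lookup described here;
the xarray field name is illustrative, not the actual patch 4 code:

	static int upstream_bridge_distance_cached(struct pci_dev *provider,
						   struct pci_dev *client)
	{
		/* hot path: a single xa_load() keyed on the client */
		void *entry = xa_load(&provider->p2pdma->map_types,
				      (unsigned long)pci_dev_id(client));

		if (entry)
			return xa_to_value(entry);

		/* slow path: walk the PCI topology (patch 4 also stores
		 * the result in the xarray) */
		return upstream_bridge_distance(provider, client, NULL);
	}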

> 
>> +	if (WARN_ON_ONCE(dist & P2PDMA_NOT_SUPPORTED))
>> +		return 0;
>> +
>> +	if (dist & P2PDMA_THRU_HOST_BRIDGE)
>> +		return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
>> +	else
>> +		return __pci_p2pdma_map_sg(pgmap, dev, sg, nents);
> 
> Can't we organize the values so that we can switch on the return
> value instead of doing flag checks?

Sorry, I don't follow what you are saying here. If you mean for
upstream_bridge_distance() to just return how to map and not the
distance, that would interfere with other uses of that function.

>>  }
>>  EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg_attrs);
>>  
>> @@ -847,6 +861,21 @@ EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg_attrs);
>>  void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
>>  		int nents, enum dma_data_direction dir, unsigned long attrs)
>>  {
>> +	struct dev_pagemap *pgmap = sg_page(sg)->pgmap;
>> +	struct pci_dev *client;
>> +	int dist;
>> +
>> +	client = find_parent_pci_dev(dev);
>> +	if (!client)
>> +		return;
>> +
>> +	dist = upstream_bridge_distance(pgmap->pci_p2pdma_provider,
>> +					client, NULL);
> 
> And then we do it for every unmap again..

Yup, I don't think there's much else we can do here.

This is why I was fighting against doing lookups by phys_addr_t:
that means you have to do these additional lookups.

My hope is that if we can move to the phys_addr_t and flags as I
described here[1], we can get rid of these hot-path lookups; but with
the way things are structured now, this is necessary.

Logan

[1]
https://lore.kernel.org/linux-block/e63d0259-e17f-effe-b76d-43dbfda8ae3a@deltatee.com/
Christoph Hellwig July 25, 2019, 6:10 a.m. UTC | #3
On Wed, Jul 24, 2019 at 09:58:59AM -0600, Logan Gunthorpe wrote:
> 
> 
> On 2019-07-24 12:32 a.m., Christoph Hellwig wrote:
> >>  	struct dev_pagemap *pgmap = sg_page(sg)->pgmap;
> >> +	struct pci_dev *client;
> >> +	int dist;
> >> +
> >> +	client = find_parent_pci_dev(dev);
> >> +	if (WARN_ON_ONCE(!client))
> >> +		return 0;
> >>  
> >> +	dist = upstream_bridge_distance(pgmap->pci_p2pdma_provider,
> >> +					client, NULL);
> > 
> > Doing this on every mapping call sounds expensive..
> 
> The result of this function is cached in an xarray (per patch 4), so on
> the hot path it should just be a single xa_load(), a relatively fast
> lookup of the kind already used in other hot paths.

We don't cache find_parent_pci_dev, though.  So we should probably
export find_parent_pci_dev with a properly namespaced name and cache
that in the caller.

> > 
> >> +	if (WARN_ON_ONCE(dist & P2PDMA_NOT_SUPPORTED))
> >> +		return 0;
> >> +
> >> +	if (dist & P2PDMA_THRU_HOST_BRIDGE)
> >> +		return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
> >> +	else
> >> +		return __pci_p2pdma_map_sg(pgmap, dev, sg, nents);
> > 
> > Can't we organize the values so that we can switch on the return
> > value instead of doing flag checks?
> 
> Sorry, I don't follow what you are saying here. If you mean for
> upstream_bridge_distance() to just return how to map and not the
> distance, that would interfere with other uses of that function.

The point is that in the map path we don't even care about the
distance.  I think we should just have a function that returns the
P2PDMA_ values from the xarray (maybe also store it there as two
values, but that isn't quite as important), and get rid of even
the concept of distance in the map path. e.g.:

	switch (pci_p2pdma_supported(pgmap->pci_p2pdma_provider, client)) {
	case P2PDMA_HOST_BRIDGE:
		return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
	case P2PDMA_SWITCH:
		return __pci_p2pdma_map_sg(pgmap, dev, sg, nents);
	default:
		WARN_ON_ONCE(1);
		return 0;
	}
Logan Gunthorpe July 25, 2019, 4 p.m. UTC | #4
On 2019-07-25 12:10 a.m., Christoph Hellwig wrote:
> On Wed, Jul 24, 2019 at 09:58:59AM -0600, Logan Gunthorpe wrote:
>>
>>
>> On 2019-07-24 12:32 a.m., Christoph Hellwig wrote:
>>>>  	struct dev_pagemap *pgmap = sg_page(sg)->pgmap;
>>>> +	struct pci_dev *client;
>>>> +	int dist;
>>>> +
>>>> +	client = find_parent_pci_dev(dev);
>>>> +	if (WARN_ON_ONCE(!client))
>>>> +		return 0;
>>>>  
>>>> +	dist = upstream_bridge_distance(pgmap->pci_p2pdma_provider,
>>>> +					client, NULL);
>>>
>>> Doing this on every mapping call sounds expensive..
>>
>> The result of this function is cached in an xarray (per patch 4), so on
>> the hot path it should just be a single xa_load(), a relatively fast
>> lookup of the kind already used in other hot paths.
> 
> We don't cache find_parent_pci_dev, though.  So we should probably
> export find_parent_pci_dev with a properly namespaced name and cache
> that in the caller.

Oh, yes, I'll take a look at this. Of the two callers, NVMe should be
easy; we could just pass the PCI device instead of the struct device.
RDMA is significantly more unclear: would you add a pci_dev to struct
ib_device? Or maybe we should be able to simply rely on the fact that
the DMA device *must* be a PCI device and just use to_pci_dev() directly?

>>>
>>>> +	if (WARN_ON_ONCE(dist & P2PDMA_NOT_SUPPORTED))
>>>> +		return 0;
>>>> +
>>>> +	if (dist & P2PDMA_THRU_HOST_BRIDGE)
>>>> +		return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
>>>> +	else
>>>> +		return __pci_p2pdma_map_sg(pgmap, dev, sg, nents);
>>>
>>> Can't we organize the values so that we can switch on the return
>>> value instead of doing flag checks?
>>
>> Sorry, I don't follow what you are saying here. If you mean for
>> upstream_bridge_distance() to just return how to map and not the
>> distance, that would interfere with other uses of that function.
> 
> The point is that in the map path we don't even care about the
> distance.  I think we should just have a function that returns the
> P2PDMA_ values from the xarray (maybe also store it there as two
> values, but that isn't quite as important), and get rid of even
> the concept of distance in the map path. e.g.:
> 
> 	switch (pci_p2pdma_supported(pgmap->pci_p2pdma_provider, client)) {
> 	case P2PDMA_HOST_BRIDGE:
> 		return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
> 	case P2PDMA_SWITCH:
> 		return __pci_p2pdma_map_sg(pgmap, dev, sg, nents);
> 	default:
> 		WARN_ON_ONCE(1);
> 		return 0;
> 	}

Ok, will change for v2.

Thanks,

Logan
Jason Gunthorpe July 25, 2019, 4:34 p.m. UTC | #5
On Thu, Jul 25, 2019 at 10:00:25AM -0600, Logan Gunthorpe wrote:
> 
> 
> On 2019-07-25 12:10 a.m., Christoph Hellwig wrote:
> > On Wed, Jul 24, 2019 at 09:58:59AM -0600, Logan Gunthorpe wrote:
> >>
> >>
> >> On 2019-07-24 12:32 a.m., Christoph Hellwig wrote:
> >>>>  	struct dev_pagemap *pgmap = sg_page(sg)->pgmap;
> >>>> +	struct pci_dev *client;
> >>>> +	int dist;
> >>>> +
> >>>> +	client = find_parent_pci_dev(dev);
> >>>> +	if (WARN_ON_ONCE(!client))
> >>>> +		return 0;
> >>>>  
> >>>> +	dist = upstream_bridge_distance(pgmap->pci_p2pdma_provider,
> >>>> +					client, NULL);
> >>>
> >>> Doing this on every mapping call sounds expensive..
> >>
> >> The result of this function is cached in an xarray (per patch 4), so on
> >> the hot path it should just be a single xa_load(), a relatively fast
> >> lookup of the kind already used in other hot paths.
> > 
> > We don't cache find_parent_pci_dev, though.  So we should probably
> > export find_parent_pci_dev with a properly namespaced name and cache
> > that in the caller.
> 
> Oh, yes, I'll take a look at this. Of the two callers, NVMe should be
> easy; we could just pass the PCI device instead of the struct device.
> RDMA is significantly more unclear: would you add a pci_dev to struct
> ib_device? Or maybe we should be able to simply rely on the fact that
> the DMA device *must* be a PCI device and just use to_pci_dev() directly?

AFAIK you need to use the ib_device->dma_device and add some kind of
is_pci_dev check to make it safe.

Jason
Logan Gunthorpe July 25, 2019, 5:22 p.m. UTC | #6
On 2019-07-25 10:34 a.m., Jason Gunthorpe wrote:
> On Thu, Jul 25, 2019 at 10:00:25AM -0600, Logan Gunthorpe wrote:
>>
>>
>> On 2019-07-25 12:10 a.m., Christoph Hellwig wrote:
>>> On Wed, Jul 24, 2019 at 09:58:59AM -0600, Logan Gunthorpe wrote:
>>>>
>>>>
>>>> On 2019-07-24 12:32 a.m., Christoph Hellwig wrote:
>>>>>>  	struct dev_pagemap *pgmap = sg_page(sg)->pgmap;
>>>>>> +	struct pci_dev *client;
>>>>>> +	int dist;
>>>>>> +
>>>>>> +	client = find_parent_pci_dev(dev);
>>>>>> +	if (WARN_ON_ONCE(!client))
>>>>>> +		return 0;
>>>>>>  
>>>>>> +	dist = upstream_bridge_distance(pgmap->pci_p2pdma_provider,
>>>>>> +					client, NULL);
>>>>>
>>>>> Doing this on every mapping call sounds expensive..
>>>>
>>>> The result of this function is cached in an xarray (per patch 4), so on
>>>> the hot path it should just be a single xa_load(), a relatively fast
>>>> lookup of the kind already used in other hot paths.
>>>
>>> We don't cache find_parent_pci_dev, though.  So we should probably
>>> export find_parent_pci_dev with a properly namespaced name and cache
>>> that in the caller.
>>
>> Oh, yes, I'll take a look at this. Of the two callers, NVMe should be
>> easy; we could just pass the PCI device instead of the struct device.
>> RDMA is significantly more unclear: would you add a pci_dev to struct
>> ib_device? Or maybe we should be able to simply rely on the fact that
>> the DMA device *must* be a PCI device and just use to_pci_dev() directly?
> 
> AFAIK you need to use the ib_device->dma_device and add some kind of
> is_pci_dev check to make it safe.

Yes, that's my thinking. The dma_device *should* be a PCI device. We can
make sure by checking dev_is_pci() and failing the mapping if it is not.

So I *think* we should be able to simply replace the
find_parent_pci_dev() call with:

	if (!dev_is_pci(dev))
		return 0;

	client = to_pci_dev(dev);

Which should be fast and reliable.

The alternative is to push this out into the caller, which may have a
bit more information (like the nvme driver does).
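
Putting the two suggestions together, the map path could then look
roughly like this (a sketch only; pci_p2pdma_supported() is the cached
helper from Christoph's example above, not an existing function):

	int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
			int nents, enum dma_data_direction dir, unsigned long attrs)
	{
		struct dev_pagemap *pgmap = sg_page(sg)->pgmap;
		struct pci_dev *client;

		/* the DMA device must be a PCI device for P2PDMA */
		if (WARN_ON_ONCE(!dev_is_pci(dev)))
			return 0;
		client = to_pci_dev(dev);

		switch (pci_p2pdma_supported(pgmap->pci_p2pdma_provider, client)) {
		case P2PDMA_HOST_BRIDGE:
			/* traverses the host bridge: map through the IOMMU */
			return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
		case P2PDMA_SWITCH:
			/* stays below a common switch: use bus addresses */
			return __pci_p2pdma_map_sg(pgmap, dev, sg, nents);
		default:
			WARN_ON_ONCE(1);
			return 0;
		}
	}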

Logan
Jason Gunthorpe July 25, 2019, 6:58 p.m. UTC | #7
On Mon, Jul 22, 2019 at 05:08:56PM -0600, Logan Gunthorpe wrote:
> Any requests that traverse the host bridge will need to be mapped into
> the IOMMU, so call dma_map_sg() inside pci_p2pdma_map_sg() when
> appropriate.
> 
> Similarly, call dma_unmap_sg() inside pci_p2pdma_unmap_sg().
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>  drivers/pci/p2pdma.c | 31 ++++++++++++++++++++++++++++++-
>  1 file changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 5f43f92f9336..76f51678342c 100644
> +++ b/drivers/pci/p2pdma.c
> @@ -830,8 +830,22 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>  		int nents, enum dma_data_direction dir, unsigned long attrs)
>  {
>  	struct dev_pagemap *pgmap = sg_page(sg)->pgmap;
> +	struct pci_dev *client;
> +	int dist;
> +
> +	client = find_parent_pci_dev(dev);
> +	if (WARN_ON_ONCE(!client))
> +		return 0;
>  
> -	return __pci_p2pdma_map_sg(pgmap, dev, sg, nents);
> +	dist = upstream_bridge_distance(pgmap->pci_p2pdma_provider,
> +					client, NULL);
> +	if (WARN_ON_ONCE(dist & P2PDMA_NOT_SUPPORTED))
> +		return 0;
> +
> +	if (dist & P2PDMA_THRU_HOST_BRIDGE)
> +		return dma_map_sg_attrs(dev, sg, nents, dir, attrs);

IIRC at this point the SG will have struct page references to the BAR
memory - so (all?) the IOMMU drivers are able to handle P2P setup in
this case?

Didn't you also have a patch to remove the struct page's for the BAR
memory?  

Confused what the plan is here

Jason
Logan Gunthorpe July 25, 2019, 7:17 p.m. UTC | #8
On 2019-07-25 12:58 p.m., Jason Gunthorpe wrote:
> On Mon, Jul 22, 2019 at 05:08:56PM -0600, Logan Gunthorpe wrote:
>> Any requests that traverse the host bridge will need to be mapped into
>> the IOMMU, so call dma_map_sg() inside pci_p2pdma_map_sg() when
>> appropriate.
>>
>> Similarly, call dma_unmap_sg() inside pci_p2pdma_unmap_sg().
>>
>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>>  drivers/pci/p2pdma.c | 31 ++++++++++++++++++++++++++++++-
>>  1 file changed, 30 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
>> index 5f43f92f9336..76f51678342c 100644
>> +++ b/drivers/pci/p2pdma.c
>> @@ -830,8 +830,22 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>>  		int nents, enum dma_data_direction dir, unsigned long attrs)
>>  {
>>  	struct dev_pagemap *pgmap = sg_page(sg)->pgmap;
>> +	struct pci_dev *client;
>> +	int dist;
>> +
>> +	client = find_parent_pci_dev(dev);
>> +	if (WARN_ON_ONCE(!client))
>> +		return 0;
>>  
>> -	return __pci_p2pdma_map_sg(pgmap, dev, sg, nents);
>> +	dist = upstream_bridge_distance(pgmap->pci_p2pdma_provider,
>> +					client, NULL);
>> +	if (WARN_ON_ONCE(dist & P2PDMA_NOT_SUPPORTED))
>> +		return 0;
>> +
>> +	if (dist & P2PDMA_THRU_HOST_BRIDGE)
>> +		return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
> 
> IIRC at this point the SG will have struct page references to the BAR
> memory - so (all?) the IOMMU drivers are able to handle P2P setup in
> this case?

Yes. The IOMMU drivers refer to the physical address of the BAR, which
they can get from the struct page. And this works fine today.
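
A rough sketch of why this falls out naturally (details vary per IOMMU
driver; the function name is made up for illustration):

	/* sg_phys() is page_to_phys(sg_page(sg)) + sg->offset, so a P2P
	 * page hands back the BAR physical address directly */
	static int map_one_p2p_sg(struct iommu_domain *domain,
				  unsigned long iova, struct scatterlist *sg)
	{
		return iommu_map(domain, iova, sg_phys(sg), sg->length,
				 IOMMU_READ | IOMMU_WRITE);
	}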

> Didn't you also have a patch to remove the struct page's for the BAR
> memory?  

Well, that was an RFC that's not going anywhere in its current form. I'd
still like to remove struct pages, but that's going to take a while, and
when that happens this will have to be reworked, probably to use
dma_map_resource() instead, though that will depend on the form the
removal takes.

Logan
Jason Gunthorpe July 25, 2019, 7:29 p.m. UTC | #9
On Thu, Jul 25, 2019 at 01:17:02PM -0600, Logan Gunthorpe wrote:
> 
> 
> On 2019-07-25 12:58 p.m., Jason Gunthorpe wrote:
> > On Mon, Jul 22, 2019 at 05:08:56PM -0600, Logan Gunthorpe wrote:
> >> Any requests that traverse the host bridge will need to be mapped into
> >> the IOMMU, so call dma_map_sg() inside pci_p2pdma_map_sg() when
> >> appropriate.
> >>
> >> Similarly, call dma_unmap_sg() inside pci_p2pdma_unmap_sg().
> >>
> >> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> >>  drivers/pci/p2pdma.c | 31 ++++++++++++++++++++++++++++++-
> >>  1 file changed, 30 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> >> index 5f43f92f9336..76f51678342c 100644
> >> +++ b/drivers/pci/p2pdma.c
> >> @@ -830,8 +830,22 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
> >>  		int nents, enum dma_data_direction dir, unsigned long attrs)
> >>  {
> >>  	struct dev_pagemap *pgmap = sg_page(sg)->pgmap;
> >> +	struct pci_dev *client;
> >> +	int dist;
> >> +
> >> +	client = find_parent_pci_dev(dev);
> >> +	if (WARN_ON_ONCE(!client))
> >> +		return 0;
> >>  
> >> -	return __pci_p2pdma_map_sg(pgmap, dev, sg, nents);
> >> +	dist = upstream_bridge_distance(pgmap->pci_p2pdma_provider,
> >> +					client, NULL);

Isn't it a bit of a leap to assume that the pgmap is uniform across
all the sgs?

> >> +	if (WARN_ON_ONCE(dist & P2PDMA_NOT_SUPPORTED))
> >> +		return 0;
> >> +
> >> +	if (dist & P2PDMA_THRU_HOST_BRIDGE)
> >> +		return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
> > 
> > IIRC at this point the SG will have struct page references to the BAR
> > memory - so (all?) the IOMMU drivers are able to handle P2P setup in
> > this case?
> 
> Yes. The IOMMU drivers refer to the physical address of the BAR, which
> they can get from the struct page. And this works fine today.

Interesting.

So, for the places where we already map BAR memory to userspace, if I
were to make struct pages for those BARs and use vm_insert_page()
instead of io_remap_pfn_range(), then the main thing missing in RDMA
to actually do P2P DMA is a way to get those struct pages out of
get_user_pages and know to call the pci_p2pdma_map_sg version (ie in
ib_umem_get())?

Jason
Logan Gunthorpe July 25, 2019, 7:36 p.m. UTC | #10
On 2019-07-25 1:29 p.m., Jason Gunthorpe wrote:
> On Thu, Jul 25, 2019 at 01:17:02PM -0600, Logan Gunthorpe wrote:
>>
>>
>> On 2019-07-25 12:58 p.m., Jason Gunthorpe wrote:
>>> On Mon, Jul 22, 2019 at 05:08:56PM -0600, Logan Gunthorpe wrote:
>>>> Any requests that traverse the host bridge will need to be mapped into
>>>> the IOMMU, so call dma_map_sg() inside pci_p2pdma_map_sg() when
>>>> appropriate.
>>>>
>>>> Similarly, call dma_unmap_sg() inside pci_p2pdma_unmap_sg().
>>>>
>>>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>>>>  drivers/pci/p2pdma.c | 31 ++++++++++++++++++++++++++++++-
>>>>  1 file changed, 30 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
>>>> index 5f43f92f9336..76f51678342c 100644
>>>> +++ b/drivers/pci/p2pdma.c
>>>> @@ -830,8 +830,22 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>>>>  		int nents, enum dma_data_direction dir, unsigned long attrs)
>>>>  {
>>>>  	struct dev_pagemap *pgmap = sg_page(sg)->pgmap;
>>>> +	struct pci_dev *client;
>>>> +	int dist;
>>>> +
>>>> +	client = find_parent_pci_dev(dev);
>>>> +	if (WARN_ON_ONCE(!client))
>>>> +		return 0;
>>>>  
>>>> -	return __pci_p2pdma_map_sg(pgmap, dev, sg, nents);
>>>> +	dist = upstream_bridge_distance(pgmap->pci_p2pdma_provider,
>>>> +					client, NULL);
> 
> Isn't it a bit of a leap to assume that the pgmap is uniform across
> all the sgs?

This is definitely a wart, but there's not much we can do about it.
Currently we can't support mixing p2p pages with regular pages; in the
same way, we can't support mixing p2p pages from different sources. No
current users do that, and it would be weird for them to want to at this
point.

>>>> +	if (WARN_ON_ONCE(dist & P2PDMA_NOT_SUPPORTED))
>>>> +		return 0;
>>>> +
>>>> +	if (dist & P2PDMA_THRU_HOST_BRIDGE)
>>>> +		return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
>>>
>>> IIRC at this point the SG will have struct page references to the BAR
>>> memory - so (all?) the IOMMU drivers are able to handle P2P setup in
>>> this case?
>>
>> Yes. The IOMMU drivers refer to the physical address of the BAR, which
>> they can get from the struct page. And this works fine today.
> 
> Interesting.
> 
> So, for the places where we already map BAR memory to userspace, if I
> were to make struct pages for those BARs and use vm_insert_page()
> instead of io_remap_pfn_range(), then the main thing missing in RDMA
> to actually do P2P DMA is a way to get those struct pages out of
> get_user_pages and know to call the pci_p2pdma_map_sg version (ie in
> ib_umem_get())?

Yes, we've been doing that for a long time with hacky code that would
never get upstream.

Essentially, if you expose those pages to userspace, we also need to
ensure that all other users of GUP either reject those pages or map
them correctly.
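
In code terms, a GUP consumer would need something like the following
(a sketch; is_pci_p2pdma_page() is the existing helper for this check,
and it assumes the SGL isn't a mix of P2P and regular pages, per the
uniformity caveat above):

	/* P2P BAR memory needs the P2PDMA-aware mapping; regular
	 * system memory takes the normal DMA API path */
	if (is_pci_p2pdma_page(sg_page(sg)))
		nents = pci_p2pdma_map_sg(dev, sg, nents, dir);
	else
		nents = dma_map_sg(dev, sg, nents, dir);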

Logan

Patch

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 5f43f92f9336..76f51678342c 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -830,8 +830,22 @@  int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
 		int nents, enum dma_data_direction dir, unsigned long attrs)
 {
 	struct dev_pagemap *pgmap = sg_page(sg)->pgmap;
+	struct pci_dev *client;
+	int dist;
+
+	client = find_parent_pci_dev(dev);
+	if (WARN_ON_ONCE(!client))
+		return 0;
 
-	return __pci_p2pdma_map_sg(pgmap, dev, sg, nents);
+	dist = upstream_bridge_distance(pgmap->pci_p2pdma_provider,
+					client, NULL);
+	if (WARN_ON_ONCE(dist & P2PDMA_NOT_SUPPORTED))
+		return 0;
+
+	if (dist & P2PDMA_THRU_HOST_BRIDGE)
+		return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
+	else
+		return __pci_p2pdma_map_sg(pgmap, dev, sg, nents);
 }
 EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg_attrs);
 
@@ -847,6 +861,21 @@  EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg_attrs);
 void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
 		int nents, enum dma_data_direction dir, unsigned long attrs)
 {
+	struct dev_pagemap *pgmap = sg_page(sg)->pgmap;
+	struct pci_dev *client;
+	int dist;
+
+	client = find_parent_pci_dev(dev);
+	if (!client)
+		return;
+
+	dist = upstream_bridge_distance(pgmap->pci_p2pdma_provider,
+					client, NULL);
+	if (dist & P2PDMA_NOT_SUPPORTED)
+		return;
+
+	if (dist & P2PDMA_THRU_HOST_BRIDGE)
+		dma_unmap_sg_attrs(dev, sg, nents, dir, attrs);
 }
 EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);