[01/20] kernel/dma/direct: take DMA offset into account in dma_direct_supported

Message ID 20180730163824.10064-2-hch@lst.de
State Not Applicable
Headers show
Series
  • [01/20] kernel/dma/direct: take DMA offset into account in dma_direct_supported
Related show

Checks

Context Check Description
snowpatch_ozlabs/checkpatch success Test checkpatch on branch next
snowpatch_ozlabs/apply_patch success next/apply_patch Successfully applied

Commit Message

Christoph Hellwig July 30, 2018, 4:38 p.m.
When a device has a DMA offset the dma capable result will change due
to the difference between the physical and DMA address.  Take that into
account.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/dma/direct.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Benjamin Herrenschmidt Aug. 8, 2018, 11:44 p.m. | #1
On Mon, 2018-07-30 at 18:38 +0200, Christoph Hellwig wrote:
> When a device has a DMA offset the dma capable result will change due
> to the difference between the physical and DMA address.  Take that into
> account.

The patch in itself makes sense.

However, there are a number of things in that dma_direct.c file that I
don't quite get:

 - looking more generally at what that function does, I worry about the
switch of ppc32 to this later on:

We do have the occasional device with things like 31-bit DMA
limitation. We know they happens to work because those systems
can't have enough memory to be a problem. This is why our current
DMA direct ops in powerpc just unconditionally return true on ppc32.

The test against a full 32-bit mask here will break them I think.

Thing is, I'm not sure I still have access to one of these things
to test, I'll have to dig (from memory things like b43 wifi).

Also those platforms don't have an iommu.

 - What is this trying to achieve ?

	/*
	 * Various PCI/PCIe bridges have broken support for > 32bit DMA even
	 * if the device itself might support it.
	 */
	if (dev->dma_32bit_limit && mask > phys_to_dma(dev, DMA_BIT_MASK(32)))
		return 0;

IE, if the device has a 32-bit limit, we fail an attempt at checking
if a >32-bit mask works ? That doesn't quite seem to be the right thing
to do... Shouldn't this be in dma_set_mask() and just clamp the mask down ?

IE, dma_set_mask() is what a driver uses to establish the device capability,
so it makes sense tot have dma_32bit_limit just reduce that capability, not
fail because the device can do more than what the bridge can.... 

Sorry if I'm a bit confused here.

 - How is that file supposed to work on 64-bit platforms ? From what I can
tell, dma_supported() will unconditionally return true if the mask is
32-bit or larger (appart from the above issue). This doesn't look right,
the mask needs to be compared to the max memory address. There are a bunch
of devices out there with masks anywhere bettween 40 and 64 bits, and
some of these will not work "out of the box" if the offseted top
of memory is beyond the mask limit. Or am I missing something ?

Cheers,
Ben.

> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

> ---
>  kernel/dma/direct.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 8be8106270c2..d32d4f0d2c0c 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -167,7 +167,7 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
>  int dma_direct_supported(struct device *dev, u64 mask)
>  {
>  #ifdef CONFIG_ZONE_DMA
> -	if (mask < DMA_BIT_MASK(ARCH_ZONE_DMA_BITS))
> +	if (mask < phys_to_dma(dev, DMA_BIT_MASK(ARCH_ZONE_DMA_BITS)))
>  		return 0;
>  #else
>  	/*
> @@ -176,14 +176,14 @@ int dma_direct_supported(struct device *dev, u64 mask)
>  	 * memory, or by providing a ZONE_DMA32.  If neither is the case, the
>  	 * architecture needs to use an IOMMU instead of the direct mapping.
>  	 */
> -	if (mask < DMA_BIT_MASK(32))
> +	if (mask < phys_to_dma(dev, DMA_BIT_MASK(32)))
>  		return 0;
>  #endif
>  	/*
>  	 * Various PCI/PCIe bridges have broken support for > 32bit DMA even
>  	 * if the device itself might support it.
>  	 */
> -	if (dev->dma_32bit_limit && mask > DMA_BIT_MASK(32))
> +	if (dev->dma_32bit_limit && mask > phys_to_dma(dev, DMA_BIT_MASK(32)))
>  		return 0;
>  	return 1;
>  }
Christoph Hellwig Aug. 22, 2018, 6:53 a.m. | #2
On Thu, Aug 09, 2018 at 09:44:18AM +1000, Benjamin Herrenschmidt wrote:
> We do have the occasional device with things like 31-bit DMA
> limitation. We know they happens to work because those systems
> can't have enough memory to be a problem. This is why our current
> DMA direct ops in powerpc just unconditionally return true on ppc32.
> 
> The test against a full 32-bit mask here will break them I think.
> 
> Thing is, I'm not sure I still have access to one of these things
> to test, I'll have to dig (from memory things like b43 wifi).

Yeah, the other platforms that support these devices support ZONE_DMA
to reliably handle these devices. But there is two other ways the
current code would actually handle these fine despite the dma_direct
checks:

 1) if the device only has physical addresses up to 31-bit anyway
 2) by trying again to find a lower address.  But this only works
    for coherent allocations and not streaming maps (unless we have
    swiotlb with a buffer below 31-bits).

It seems powerpc can have ZONE_DMA, though and we will cover these
devices just fine.  If it didn't have that the current powerpc
code would not work either.

>  - What is this trying to achieve ?
> 
> 	/*
> 	 * Various PCI/PCIe bridges have broken support for > 32bit DMA even
> 	 * if the device itself might support it.
> 	 */
> 	if (dev->dma_32bit_limit && mask > phys_to_dma(dev, DMA_BIT_MASK(32)))
> 		return 0;
> 
> IE, if the device has a 32-bit limit, we fail an attempt at checking
> if a >32-bit mask works ? That doesn't quite seem to be the right thing
> to do... Shouldn't this be in dma_set_mask() and just clamp the mask down ?
> 
> IE, dma_set_mask() is what a driver uses to establish the device capability,
> so it makes sense tot have dma_32bit_limit just reduce that capability, not
> fail because the device can do more than what the bridge can.... 

If your PCI bridge / PCIe root port doesn't support dma to addresses
larger than 32-bit the device capabilities above that don't matter, it
just won't work.  We have this case at least for some old VIA x86 chipsets
and some relatively modern Xilinx FPGAs with PCIe.

>  - How is that file supposed to work on 64-bit platforms ? From what I can
> tell, dma_supported() will unconditionally return true if the mask is
> 32-bit or larger (appart from the above issue). This doesn't look right,
> the mask needs to be compared to the max memory address. There are a bunch
> of devices out there with masks anywhere bettween 40 and 64 bits, and
> some of these will not work "out of the box" if the offseted top
> of memory is beyond the mask limit. Or am I missing something ?

Your are not missing anything except for the history of this code.

Your observation is right, but there always has been the implicit
assumption that architectures with more than 4GB of physical address
space must either support and iommu or swiotlb and use that.  It's
never been document anywhere, but I'm working on integrating all
this code to make more sense.
Benjamin Herrenschmidt Aug. 22, 2018, 11:59 p.m. | #3
On Wed, 2018-08-22 at 08:53 +0200, Christoph Hellwig wrote:
> On Thu, Aug 09, 2018 at 09:44:18AM +1000, Benjamin Herrenschmidt wrote:
> > We do have the occasional device with things like 31-bit DMA
> > limitation. We know they happens to work because those systems
> > can't have enough memory to be a problem. This is why our current
> > DMA direct ops in powerpc just unconditionally return true on ppc32.
> > 
> > The test against a full 32-bit mask here will break them I think.
> > 
> > Thing is, I'm not sure I still have access to one of these things
> > to test, I'll have to dig (from memory things like b43 wifi).
> 
> Yeah, the other platforms that support these devices support ZONE_DMA
> to reliably handle these devices. But there is two other ways the
> current code would actually handle these fine despite the dma_direct
> checks:
> 
>  1) if the device only has physical addresses up to 31-bit anyway
>  2) by trying again to find a lower address.  But this only works
>     for coherent allocations and not streaming maps (unless we have
>     swiotlb with a buffer below 31-bits).
> 
> It seems powerpc can have ZONE_DMA, though and we will cover these
> devices just fine.  If it didn't have that the current powerpc
> code would not work either.

Not exactly. powerpc has ZONE_DMA covering all of system memory.

What happens in ppc32 is that we somewhat "know" that none of the
systems with those stupid 31-bit limited pieces of HW is capable of
having more than 2GB of memory anyway.

So we get away with just returning "1".

> 
> >  - What is this trying to achieve ?
> > 
> > 	/*
> > 	 * Various PCI/PCIe bridges have broken support for > 32bit DMA even
> > 	 * if the device itself might support it.
> > 	 */
> > 	if (dev->dma_32bit_limit && mask > phys_to_dma(dev, DMA_BIT_MASK(32)))
> > 		return 0;
> > 
> > IE, if the device has a 32-bit limit, we fail an attempt at checking
> > if a >32-bit mask works ? That doesn't quite seem to be the right thing
> > to do... Shouldn't this be in dma_set_mask() and just clamp the mask down ?
> > 
> > IE, dma_set_mask() is what a driver uses to establish the device capability,
> > so it makes sense tot have dma_32bit_limit just reduce that capability, not
> > fail because the device can do more than what the bridge can.... 
> 
> If your PCI bridge / PCIe root port doesn't support dma to addresses
> larger than 32-bit the device capabilities above that don't matter, it
> just won't work.  We have this case at least for some old VIA x86 chipsets
> and some relatively modern Xilinx FPGAs with PCIe.

Hrm... that's the usual confusion dma_capable() vs. dma_set_mask().

It's always been perfectly fine for a driver to do a dma_set_mask(64-
bit) on a system where the bridge can only do 32-bits ...

We shouldn't fail there, we should instead "clamp" the mask to 32-bit,
see what I mean ? It doesn't matter that the device itself is capable
of issuing >32 addresses, I agree, but what we need to express is that
the combination device+bridge doesn't want addresses above 32-bit, so
it's equivalent to making the device do a set_mask(32-bit).

This will succeed if the system can limit the addresses (for example
because memory is never above 32-bit) and will fail if the system
can't.

So that's equivalent of writing

	if (dev->dma_32bit_limit && mask > phys_to_dma(dev, DMA_BIT_MASK(32)))
		mask = phys_to_dma(dev, DMA_BIT_MASK(32)); 

Effectively meaning "don't give me addresses aboe 32-bit".

Still, your code doesn't check the mask against the memory size. Which
means it will fail for 32-bit masks even on systems that do not have
memory above 4G.

> >  - How is that file supposed to work on 64-bit platforms ? From what I can
> > tell, dma_supported() will unconditionally return true if the mask is
> > 32-bit or larger (appart from the above issue). This doesn't look right,
> > the mask needs to be compared to the max memory address. There are a bunch
> > of devices out there with masks anywhere bettween 40 and 64 bits, and
> > some of these will not work "out of the box" if the offseted top
> > of memory is beyond the mask limit. Or am I missing something ?
> 
> Your are not missing anything except for the history of this code.
> 
> Your observation is right, but there always has been the implicit
> assumption that architectures with more than 4GB of physical address
> space must either support and iommu or swiotlb and use that.  It's
> never been document anywhere, but I'm working on integrating all
> this code to make more sense.

Well, iommus can have bypass regions, which we also use for
performance, so we do at dma_set_mask() time "swap" the ops around, and
in that case, we do want to check the mask against the actual top of
memory...

Cheers,
Ben.
Christoph Hellwig Aug. 23, 2018, 5:24 a.m. | #4
On Thu, Aug 23, 2018 at 09:59:18AM +1000, Benjamin Herrenschmidt wrote:
> > Yeah, the other platforms that support these devices support ZONE_DMA
> > to reliably handle these devices. But there is two other ways the
> > current code would actually handle these fine despite the dma_direct
> > checks:
> > 
> >  1) if the device only has physical addresses up to 31-bit anyway
> >  2) by trying again to find a lower address.  But this only works
> >     for coherent allocations and not streaming maps (unless we have
> >     swiotlb with a buffer below 31-bits).
> > 
> > It seems powerpc can have ZONE_DMA, though and we will cover these
> > devices just fine.  If it didn't have that the current powerpc
> > code would not work either.
> 
> Not exactly. powerpc has ZONE_DMA covering all of system memory.
> 
> What happens in ppc32 is that we somewhat "know" that none of the
> systems with those stupid 31-bit limited pieces of HW is capable of
> having more than 2GB of memory anyway.
> 
> So we get away with just returning "1".

I think I can up with a proper way of handling that by checking
the actual amount of physical memory present instead of the hard coded
32-bit.

> > If your PCI bridge / PCIe root port doesn't support dma to addresses
> > larger than 32-bit the device capabilities above that don't matter, it
> > just won't work.  We have this case at least for some old VIA x86 chipsets
> > and some relatively modern Xilinx FPGAs with PCIe.
> 
> Hrm... that's the usual confusion dma_capable() vs. dma_set_mask().
> 
> It's always been perfectly fine for a driver to do a dma_set_mask(64-
> bit) on a system where the bridge can only do 32-bits ...

No, it hasn't.  That's why we have this pattern of trying a 64-bit
mask first and then setting a 32-bit mask if that fails all over
drivers/.  However with all the work we've done over the last month
we are getting really close to a world where:

 - the driver just does one dma_set_mask for the capabilities and
   stores that in the dma_mask
 - other limitations go elsewhere and will be automatically taken
   into account.

Which is I guess what you always wanted, but which wasn't how things
actually worked before.

> We shouldn't fail there, we should instead "clamp" the mask to 32-bit,
> see what I mean ? It doesn't matter that the device itself is capable
> of issuing >32 addresses, I agree, but what we need to express is that
> the combination device+bridge doesn't want addresses above 32-bit, so
> it's equivalent to making the device do a set_mask(32-bit).

As said, we'll get there (but with the new separate bus_dma_mask in 4.19),
but this is not how things currently work.

> > Your observation is right, but there always has been the implicit
> > assumption that architectures with more than 4GB of physical address
> > space must either support and iommu or swiotlb and use that.  It's
> > never been document anywhere, but I'm working on integrating all
> > this code to make more sense.
> 
> Well, iommus can have bypass regions, which we also use for
> performance, so we do at dma_set_mask() time "swap" the ops around, and
> in that case, we do want to check the mask against the actual top of
> memory...

That is a bit of a powerpc special case (we also had one other arch
doing that, but it got removed in the great purge, can't rember which
one right now).  Everyone else has one set of ops, and they just switch
to the direct mapping inside the iommu ops.
Benjamin Herrenschmidt Aug. 23, 2018, 5:24 a.m. | #5
On Thu, 2018-08-23 at 07:24 +0200, Christoph Hellwig wrote:
> > Well, iommus can have bypass regions, which we also use for
> > performance, so we do at dma_set_mask() time "swap" the ops around, and
> > in that case, we do want to check the mask against the actual top of
> > memory...
> 
> That is a bit of a powerpc special case (we also had one other arch
> doing that, but it got removed in the great purge, can't rember which
> one right now).  Everyone else has one set of ops, and they just switch
> to the direct mapping inside the iommu ops.

We more or less do that too in some of ours these days bcs of the whole
coherent_mask vs mask where a given device might need either depending
on the type of mapping.

Ben.

Patch

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 8be8106270c2..d32d4f0d2c0c 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -167,7 +167,7 @@  int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
 int dma_direct_supported(struct device *dev, u64 mask)
 {
 #ifdef CONFIG_ZONE_DMA
-	if (mask < DMA_BIT_MASK(ARCH_ZONE_DMA_BITS))
+	if (mask < phys_to_dma(dev, DMA_BIT_MASK(ARCH_ZONE_DMA_BITS)))
 		return 0;
 #else
 	/*
@@ -176,14 +176,14 @@  int dma_direct_supported(struct device *dev, u64 mask)
 	 * memory, or by providing a ZONE_DMA32.  If neither is the case, the
 	 * architecture needs to use an IOMMU instead of the direct mapping.
 	 */
-	if (mask < DMA_BIT_MASK(32))
+	if (mask < phys_to_dma(dev, DMA_BIT_MASK(32)))
 		return 0;
 #endif
 	/*
 	 * Various PCI/PCIe bridges have broken support for > 32bit DMA even
 	 * if the device itself might support it.
 	 */
-	if (dev->dma_32bit_limit && mask > DMA_BIT_MASK(32))
+	if (dev->dma_32bit_limit && mask > phys_to_dma(dev, DMA_BIT_MASK(32)))
 		return 0;
 	return 1;
 }