[02/20] kernel/dma/direct: refine dma_direct_alloc zone selection

Message ID 20180730163824.10064-3-hch@lst.de
State Not Applicable
Headers show
Series
  • [01/20] kernel/dma/direct: take DMA offset into account in dma_direct_supported
Related show

Checks

Context Check Description
snowpatch_ozlabs/checkpatch warning Test checkpatch on branch next
snowpatch_ozlabs/apply_patch success next/apply_patch Successfully applied

Commit Message

Christoph Hellwig July 30, 2018, 4:38 p.m.
We need to take the DMA offset and encryption bit into account when selecting
a zone.  Add a helper that takes those into account and use it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/dma/direct.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

Comments

Benjamin Herrenschmidt Aug. 8, 2018, 11:54 p.m. | #1
On Mon, 2018-07-30 at 18:38 +0200, Christoph Hellwig wrote:
> We need to take the DMA offset and encryption bit into account when selecting
> a zone.  Add a helper that takes those into account and use it.

That whole "encryption" stuff seems to be completely specific to the
way x86 does memory encryption, or am I mistaken ? It's not clear to me
what that does in practice and how it relates to DMA mappings.

I'm also not sure about that whole business with ZONE_DMA and
ARCH_ZONE_DMA_BITS...

On ppc64, unless you enable swiotlb (which we only do currently on
some embedded platforms), you have all of memory in ZONE_DMA.

[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000000000000-0x0000001fffffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   empty
[    0.000000]   Device   empty

I'm not sure how this will work with that dma direct code.

I also see a number of tests against a 64-bit mask rather than the
top of memory...

Ben.

> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  kernel/dma/direct.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index d32d4f0d2c0c..c2c1df8827f2 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -58,6 +58,14 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
>  	return addr + size - 1 <= dev->coherent_dma_mask;
>  }
>  
> +static bool dma_coherent_below(struct device *dev, u64 mask)
> +{
> +	dma_addr_t addr = force_dma_unencrypted() ?
> +		__phys_to_dma(dev, mask) : phys_to_dma(dev, mask);
> +
> +	return dev->coherent_dma_mask <= addr;
> +}
> +
>  void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
>  		gfp_t gfp, unsigned long attrs)
>  {
> @@ -70,9 +78,9 @@ void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
>  	gfp &= ~__GFP_ZERO;
>  
>  	/* GFP_DMA32 and GFP_DMA are no ops without the corresponding zones: */
> -	if (dev->coherent_dma_mask <= DMA_BIT_MASK(ARCH_ZONE_DMA_BITS))
> +	if (dma_coherent_below(dev, DMA_BIT_MASK(ARCH_ZONE_DMA_BITS)))
>  		gfp |= GFP_DMA;
> -	if (dev->coherent_dma_mask <= DMA_BIT_MASK(32) && !(gfp & GFP_DMA))
> +	if (dma_coherent_below(dev, DMA_BIT_MASK(32) && !(gfp & GFP_DMA)))
>  		gfp |= GFP_DMA32;
>  
>  again:
> @@ -92,14 +100,14 @@ void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
>  		page = NULL;
>  
>  		if (IS_ENABLED(CONFIG_ZONE_DMA32) &&
> -		    dev->coherent_dma_mask < DMA_BIT_MASK(64) &&
> +		    dma_coherent_below(dev, DMA_BIT_MASK(64)) &&
>  		    !(gfp & (GFP_DMA32 | GFP_DMA))) {
>  			gfp |= GFP_DMA32;
>  			goto again;
>  		}
>  
>  		if (IS_ENABLED(CONFIG_ZONE_DMA) &&
> -		    dev->coherent_dma_mask < DMA_BIT_MASK(32) &&
> +		    dma_coherent_below(dev, DMA_BIT_MASK(32)) &&
>  		    !(gfp & GFP_DMA)) {
>  			gfp = (gfp & ~GFP_DMA32) | GFP_DMA;
>  			goto again;
Christoph Hellwig Aug. 22, 2018, 6:58 a.m. | #2
On Thu, Aug 09, 2018 at 09:54:33AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2018-07-30 at 18:38 +0200, Christoph Hellwig wrote:
> > We need to take the DMA offset and encryption bit into account when selecting
> > a zone.  Add a helper that takes those into account and use it.
> 
> That whole "encryption" stuff seems to be completely specific to the
> way x86 does memory encryption, or am I mistaken ? It's not clear to me
> what that does in practice and how it relates to DMA mappings.

Not even all of x86, but AMD in particular, Intel does it yet another
way.  But it still is easier to take this into the core with a few
overrides than duplicating all the code.

> I'm also not sure about that whole business with ZONE_DMA and
> ARCH_ZONE_DMA_BITS...

ZONE_DMA usually (but not always) maps to 24-bits of address space,
if it doesn't (I mostly through about s390 with it's odd 31-bits)
the architecture can override it if it cares).

> On ppc64, unless you enable swiotlb (which we only do currently on
> some embedded platforms), you have all of memory in ZONE_DMA.
> 
> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x0000000000000000-0x0000001fffffffff]
> [    0.000000]   DMA32    empty
> [    0.000000]   Normal   empty
> [    0.000000]   Device   empty

This is really weird.  Why would you wire up ZONE_DMA like this?
The general scheme that architectures should implement is:

ZONE_DMA:	Any memory below a magic threshold that is lower than
		32-bit.  Only enabled if actually required (usually
		either 24-bit for ISA, or some other weird architecture
		specific value like 32-bit for S/390)
ZONE_DMA32:	Memory <= 32-bit if the architecture supports more than
		32-bits worth of physical address space.  Should generally
		be enabled on all 64-bit architectures unless you have
		a very good reason not to.
ZONE_NORMAL:	Everything above 32-bit not falling into HIGHMEM or
		MOVEABLE.
Benjamin Herrenschmidt Aug. 23, 2018, 12:01 a.m. | #3
On Wed, 2018-08-22 at 08:58 +0200, Christoph Hellwig wrote:
> On Thu, Aug 09, 2018 at 09:54:33AM +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2018-07-30 at 18:38 +0200, Christoph Hellwig wrote:
> > > We need to take the DMA offset and encryption bit into account when selecting
> > > a zone.  Add a helper that takes those into account and use it.
> > 
> > That whole "encryption" stuff seems to be completely specific to the
> > way x86 does memory encryption, or am I mistaken ? It's not clear to me
> > what that does in practice and how it relates to DMA mappings.
> 
> Not even all of x86, but AMD in particular, Intel does it yet another
> way.  But it still is easier to take this into the core with a few
> overrides than duplicating all the code.
> 
> > I'm also not sure about that whole business with ZONE_DMA and
> > ARCH_ZONE_DMA_BITS...
> 
> ZONE_DMA usually (but not always) maps to 24-bits of address space,
> if it doesn't (I mostly through about s390 with it's odd 31-bits)
> the architecture can override it if it cares).
> 
> > On ppc64, unless you enable swiotlb (which we only do currently on
> > some embedded platforms), you have all of memory in ZONE_DMA.
> > 
> > [    0.000000] Zone ranges:
> > [    0.000000]   DMA      [mem 0x0000000000000000-0x0000001fffffffff]
> > [    0.000000]   DMA32    empty
> > [    0.000000]   Normal   empty
> > [    0.000000]   Device   empty
> 
> This is really weird.  Why would you wire up ZONE_DMA like this?

We always did :-) It predates my involvement and I think it predates
even Pauls. It's quite silly actually since the first powerpc machines
actually had ISA devices in them, but that's how it's been for ever. I
suppose we could change it but that would mean digging out some old
stuff to test.

> The general scheme that architectures should implement is:
> 
> ZONE_DMA:	Any memory below a magic threshold that is lower than
> 		32-bit.  Only enabled if actually required (usually
> 		either 24-bit for ISA, or some other weird architecture
> 		specific value like 32-bit for S/390)

It should have been ZONE_ISA_DMA :-)

> ZONE_DMA32:	Memory <= 32-bit if the architecture supports more than
> 		32-bits worth of physical address space.  Should generally
> 		be enabled on all 64-bit architectures unless you have
> 		a very good reason not to.

Yeah so we sort-of enable the config option but only populate the zone
on platforms using swiotlb (freescale stuff). It's a bit messy at the
moment I must admit.

> ZONE_NORMAL:	Everything above 32-bit not falling into HIGHMEM or
> 		MOVEABLE.

Cheers,
Ben.
Christoph Hellwig Aug. 23, 2018, 5:26 a.m. | #4
On Thu, Aug 23, 2018 at 10:01:45AM +1000, Benjamin Herrenschmidt wrote:
> > The general scheme that architectures should implement is:
> > 
> > ZONE_DMA:	Any memory below a magic threshold that is lower than
> > 		32-bit.  Only enabled if actually required (usually
> > 		either 24-bit for ISA, or some other weird architecture
> > 		specific value like 32-bit for S/390)
> 
> It should have been ZONE_ISA_DMA :-)

For most of these use cases it should have been indeed, and that
would avoid a lot of confusion where people use GFP_DMA just because
they do DMA.

Anyway, switching powerpc to this scheme would be great, but I don't
think it is required - GFP_KERNEL will silently fall back to ZONE_DMA,
so except for an additional GFP_DMA fallback allocation when the
GFP_KERNEL one fails the code should just work.

Patch

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index d32d4f0d2c0c..c2c1df8827f2 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -58,6 +58,14 @@  static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
 	return addr + size - 1 <= dev->coherent_dma_mask;
 }
 
+static bool dma_coherent_below(struct device *dev, u64 mask)
+{
+	dma_addr_t addr = force_dma_unencrypted() ?
+		__phys_to_dma(dev, mask) : phys_to_dma(dev, mask);
+
+	return dev->coherent_dma_mask <= addr;
+}
+
 void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
 		gfp_t gfp, unsigned long attrs)
 {
@@ -70,9 +78,9 @@  void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
 	gfp &= ~__GFP_ZERO;
 
 	/* GFP_DMA32 and GFP_DMA are no ops without the corresponding zones: */
-	if (dev->coherent_dma_mask <= DMA_BIT_MASK(ARCH_ZONE_DMA_BITS))
+	if (dma_coherent_below(dev, DMA_BIT_MASK(ARCH_ZONE_DMA_BITS)))
 		gfp |= GFP_DMA;
-	if (dev->coherent_dma_mask <= DMA_BIT_MASK(32) && !(gfp & GFP_DMA))
+	if (dma_coherent_below(dev, DMA_BIT_MASK(32) && !(gfp & GFP_DMA)))
 		gfp |= GFP_DMA32;
 
 again:
@@ -92,14 +100,14 @@  void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
 		page = NULL;
 
 		if (IS_ENABLED(CONFIG_ZONE_DMA32) &&
-		    dev->coherent_dma_mask < DMA_BIT_MASK(64) &&
+		    dma_coherent_below(dev, DMA_BIT_MASK(64)) &&
 		    !(gfp & (GFP_DMA32 | GFP_DMA))) {
 			gfp |= GFP_DMA32;
 			goto again;
 		}
 
 		if (IS_ENABLED(CONFIG_ZONE_DMA) &&
-		    dev->coherent_dma_mask < DMA_BIT_MASK(32) &&
+		    dma_coherent_below(dev, DMA_BIT_MASK(32)) &&
 		    !(gfp & GFP_DMA)) {
 			gfp = (gfp & ~GFP_DMA32) | GFP_DMA;
 			goto again;