[2/7,v2] powerpc/dma-mapping: override dma_get_page_shift

Message ID 20151023205718.GC10197@linux.vnet.ibm.com
State Not Applicable
Delegated to: David Miller

Commit Message

Nishanth Aravamudan Oct. 23, 2015, 8:57 p.m. UTC
On Power, the kernel's page size can differ from the IOMMU's page size,
so we need to override the generic implementation, which always returns
the kernel's page size. Look up the IOMMU's page size from struct
iommu_table, if available. Fall back to the kernel's page size,
otherwise.

Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/dma-mapping.h | 3 +++
 arch/powerpc/kernel/dma.c              | 9 +++++++++
 2 files changed, 12 insertions(+)

Comments

Alexey Kardashevskiy Oct. 27, 2015, 6:02 a.m. UTC | #1
On 10/24/2015 07:57 AM, Nishanth Aravamudan wrote:
> On Power, the kernel's page size can differ from the IOMMU's page size,
> so we need to override the generic implementation, which always returns
> the kernel's page size. Lookup the IOMMU's page size from struct
> iommu_table, if available. Fallback to the kernel's page size,
> otherwise.
>
> Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
> ---
>   arch/powerpc/include/asm/dma-mapping.h | 3 +++
>   arch/powerpc/kernel/dma.c              | 9 +++++++++
>   2 files changed, 12 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
> index 7f522c0..c5638f4 100644
> --- a/arch/powerpc/include/asm/dma-mapping.h
> +++ b/arch/powerpc/include/asm/dma-mapping.h
> @@ -125,6 +125,9 @@ static inline void set_dma_offset(struct device *dev, dma_addr_t off)
>   #define HAVE_ARCH_DMA_SET_MASK 1
>   extern int dma_set_mask(struct device *dev, u64 dma_mask);
>
> +#define HAVE_ARCH_DMA_GET_PAGE_SHIFT 1
> +extern unsigned long dma_get_page_shift(struct device *dev);
> +
>   #include <asm-generic/dma-mapping-common.h>
>
>   extern int __dma_set_mask(struct device *dev, u64 dma_mask);
> diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
> index 59503ed..e805af2 100644
> --- a/arch/powerpc/kernel/dma.c
> +++ b/arch/powerpc/kernel/dma.c
> @@ -335,6 +335,15 @@ int dma_set_mask(struct device *dev, u64 dma_mask)
>   }
>   EXPORT_SYMBOL(dma_set_mask);
>
> +unsigned long dma_get_page_shift(struct device *dev)
> +{
> +	struct iommu_table *tbl = get_iommu_table_base(dev);
> +	if (tbl)
> +		return tbl->it_page_shift;


All PCI devices have this initialized on POWER (at least on our, IBM's, POWER),
so 4K will always be returned here, while in the case of
(get_dma_ops(dev)==&dma_direct_ops) it could actually return PAGE_SHIFT. Is
4K still the preferred value to return here?



> +	return PAGE_SHIFT;
> +}
> +EXPORT_SYMBOL(dma_get_page_shift);
> +
>   u64 __dma_get_required_mask(struct device *dev)
>   {
>   	struct dma_map_ops *dma_ops = get_dma_ops(dev);
>
Keith Busch Oct. 27, 2015, 2:06 p.m. UTC | #2
On Tue, Oct 27, 2015 at 05:02:16PM +1100, Alexey Kardashevskiy wrote:
> >+unsigned long dma_get_page_shift(struct device *dev)
> >+{
> >+	struct iommu_table *tbl = get_iommu_table_base(dev);
> >+	if (tbl)
> >+		return tbl->it_page_shift;
> 
> 
> All PCI devices have this initialized on POWER (at least, our, IBM's
> POWER) so 4K will always be returned here while in the case of
> (get_dma_ops(dev)==&dma_direct_ops) it could actually return
> PAGE_SHIFT. Is 4K still preferred value to return here?

4k is always a safe option to return, but ideally you want to return the
highest guaranteed DMA address alignment. The driver just needs to know
which bits to mask from virtual addresses such that the offset is the
same as the DMA address.
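
To illustrate the masking point, here is a minimal userspace sketch. The
helper names dma_page_offset() and offsets_agree() are hypothetical and not
part of any kernel API; the shift is whatever dma_get_page_shift() would
report for the device.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: offset of addr within a DMA page of size 1 << shift. */
static inline uint64_t dma_page_offset(uint64_t addr, unsigned long shift)
{
	assert(shift < 64);
	return addr & ((UINT64_C(1) << shift) - 1);
}

/*
 * Hypothetical check: a shift is safe for a mapping only if the CPU
 * address and the DMA address share the same offset below that shift.
 */
static inline int offsets_agree(uint64_t cpu_addr, uint64_t dma_addr,
				unsigned long shift)
{
	return dma_page_offset(cpu_addr, shift) ==
	       dma_page_offset(dma_addr, shift);
}
```

With a 4K-granular mapping the low 12 bits match, but nothing guarantees
that bits 12..15 do, which is why a driver must not assume a 64K shift when
the IOMMU maps in 4K units.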
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nishanth Aravamudan Oct. 27, 2015, 10:27 p.m. UTC | #3
On 27.10.2015 [17:02:16 +1100], Alexey Kardashevskiy wrote:
> On 10/24/2015 07:57 AM, Nishanth Aravamudan wrote:
> >On Power, the kernel's page size can differ from the IOMMU's page size,
> >so we need to override the generic implementation, which always returns
> >the kernel's page size. Lookup the IOMMU's page size from struct
> >iommu_table, if available. Fallback to the kernel's page size,
> >otherwise.
> >
> >Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
> >---
> >  arch/powerpc/include/asm/dma-mapping.h | 3 +++
> >  arch/powerpc/kernel/dma.c              | 9 +++++++++
> >  2 files changed, 12 insertions(+)
> >
> >diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
> >index 7f522c0..c5638f4 100644
> >--- a/arch/powerpc/include/asm/dma-mapping.h
> >+++ b/arch/powerpc/include/asm/dma-mapping.h
> >@@ -125,6 +125,9 @@ static inline void set_dma_offset(struct device *dev, dma_addr_t off)
> >  #define HAVE_ARCH_DMA_SET_MASK 1
> >  extern int dma_set_mask(struct device *dev, u64 dma_mask);
> >
> >+#define HAVE_ARCH_DMA_GET_PAGE_SHIFT 1
> >+extern unsigned long dma_get_page_shift(struct device *dev);
> >+
> >  #include <asm-generic/dma-mapping-common.h>
> >
> >  extern int __dma_set_mask(struct device *dev, u64 dma_mask);
> >diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
> >index 59503ed..e805af2 100644
> >--- a/arch/powerpc/kernel/dma.c
> >+++ b/arch/powerpc/kernel/dma.c
> >@@ -335,6 +335,15 @@ int dma_set_mask(struct device *dev, u64 dma_mask)
> >  }
> >  EXPORT_SYMBOL(dma_set_mask);
> >
> >+unsigned long dma_get_page_shift(struct device *dev)
> >+{
> >+	struct iommu_table *tbl = get_iommu_table_base(dev);
> >+	if (tbl)
> >+		return tbl->it_page_shift;
> 
> 
> All PCI devices have this initialized on POWER (at least, our, IBM's
> POWER) so 4K will always be returned here while in the case of
> (get_dma_ops(dev)==&dma_direct_ops) it could actually return
> PAGE_SHIFT. Is 4K still preferred value to return here?

Right, so the logic of my series goes like this:

a) We currently are assuming DMA_PAGE_SHIFT (conceptual constant) is
PAGE_SHIFT everywhere, including Power.

b) After 2/7, the Power code will return either the IOMMU table's shift
value, if set, or PAGE_SHIFT (I guess this would be the case if
get_dma_ops(dev) == &dma_direct_ops, as you said). That is no different
from what we have now, except we can return the accurate IOMMU value if
available.

c) After 3/7, the platform can override the generic Power
dma_get_page_shift().

d) After 4/7, pseries will return the DDW value, if available, then
fall back to the IOMMU table's value. I think in the case of
get_dma_ops(dev)==&dma_direct_ops, the only way that can happen is if we
are using DDW, right?
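
The intended precedence after 4/7 can be sketched in userspace as below.
This is only a model under stated assumptions: pick_dma_page_shift() and
MODEL_PAGE_SHIFT are hypothetical names, a shift of 0 stands for "not
available", and the 64K kernel page size is just an example value.

```c
#include <assert.h>

/* Assumption for the model only: a kernel built with 64K pages. */
#define MODEL_PAGE_SHIFT 16UL

/*
 * Hypothetical model of the lookup order described above:
 * DDW value first, then the IOMMU table's value, then PAGE_SHIFT.
 * A shift of 0 means "this source is not available".
 */
static unsigned long pick_dma_page_shift(unsigned long ddw_shift,
					 unsigned long tbl_shift)
{
	unsigned long shift = MODEL_PAGE_SHIFT;

	if (ddw_shift)
		shift = ddw_shift;
	else if (tbl_shift)
		shift = tbl_shift;

	assert(shift < 64);
	return shift;
}
```

For example, pick_dma_page_shift(0, 12) models a device with only a 4K
IOMMU table, and pick_dma_page_shift(0, 0) models the fallback to the
kernel's page size.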

-Nish

Alexey Kardashevskiy Oct. 28, 2015, 1 a.m. UTC | #4
On 10/28/2015 09:27 AM, Nishanth Aravamudan wrote:
> On 27.10.2015 [17:02:16 +1100], Alexey Kardashevskiy wrote:
>> On 10/24/2015 07:57 AM, Nishanth Aravamudan wrote:
>>> On Power, the kernel's page size can differ from the IOMMU's page size,
>>> so we need to override the generic implementation, which always returns
>>> the kernel's page size. Lookup the IOMMU's page size from struct
>>> iommu_table, if available. Fallback to the kernel's page size,
>>> otherwise.
>>>
>>> Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/include/asm/dma-mapping.h | 3 +++
>>>   arch/powerpc/kernel/dma.c              | 9 +++++++++
>>>   2 files changed, 12 insertions(+)
>>>
>>> diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
>>> index 7f522c0..c5638f4 100644
>>> --- a/arch/powerpc/include/asm/dma-mapping.h
>>> +++ b/arch/powerpc/include/asm/dma-mapping.h
>>> @@ -125,6 +125,9 @@ static inline void set_dma_offset(struct device *dev, dma_addr_t off)
>>>   #define HAVE_ARCH_DMA_SET_MASK 1
>>>   extern int dma_set_mask(struct device *dev, u64 dma_mask);
>>>
>>> +#define HAVE_ARCH_DMA_GET_PAGE_SHIFT 1
>>> +extern unsigned long dma_get_page_shift(struct device *dev);
>>> +
>>>   #include <asm-generic/dma-mapping-common.h>
>>>
>>>   extern int __dma_set_mask(struct device *dev, u64 dma_mask);
>>> diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
>>> index 59503ed..e805af2 100644
>>> --- a/arch/powerpc/kernel/dma.c
>>> +++ b/arch/powerpc/kernel/dma.c
>>> @@ -335,6 +335,15 @@ int dma_set_mask(struct device *dev, u64 dma_mask)
>>>   }
>>>   EXPORT_SYMBOL(dma_set_mask);
>>>
>>> +unsigned long dma_get_page_shift(struct device *dev)
>>> +{
>>> +	struct iommu_table *tbl = get_iommu_table_base(dev);
>>> +	if (tbl)
>>> +		return tbl->it_page_shift;
>>
>>
>> All PCI devices have this initialized on POWER (at least, our, IBM's
>> POWER) so 4K will always be returned here while in the case of
>> (get_dma_ops(dev)==&dma_direct_ops) it could actually return
>> PAGE_SHIFT. Is 4K still preferred value to return here?
>
> Right, so the logic of my series, goes like this:
>
> a) We currently are assuming DMA_PAGE_SHIFT (conceptual constant) is
> PAGE_SHIFT everywhere, including Power.
>
> b) After 2/7, the Power code will return either the IOMMU table's shift
> value, if set, or PAGE_SHIFT (I guess this would be the case if
> get_dma_ops(dev) == &dma_direct_ops, as you said). That is no different
> than we have now, except we can return the accurate IOMMU value if
> available.

If it is not available, then something went wrong and BUG_ON(!tbl ||
!tbl->it_page_shift) makes more sense here than pretending that this
function can ever return PAGE_SHIFT. imho.
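
Roughly what I mean, as a userspace sketch rather than a tested kernel
change: struct iommu_table_model and strict_dma_get_page_shift() are
made-up names, and assert() stands in for BUG_ON(). It assumes every
device reaching this path really has a table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the one field of struct iommu_table used here. */
struct iommu_table_model {
	unsigned long it_page_shift;
};

/*
 * Strict variant: a missing table or a zero shift is treated as a bug
 * instead of silently falling back to PAGE_SHIFT.
 */
static unsigned long
strict_dma_get_page_shift(const struct iommu_table_model *tbl)
{
	/* Userspace stand-in for BUG_ON(!tbl || !tbl->it_page_shift). */
	assert(tbl != NULL && tbl->it_page_shift != 0);
	return tbl->it_page_shift;
}
```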


>
> 3) After 3/7, the platform can override the generic Power
> get_dma_page_shift().
>
> 4) After 4/7, pseries will return the DDW value, if available, then
> fallback to the IOMMU table's value. I think in the case of
> get_dma_ops(dev)==&dma_direct_ops, the only way that can happen is if we
> are using DDW, right?

This is for pseries guests; for the powernv host it is a "bypass" mode
which does 64-bit direct DMA mapping and there is no additional window for
that (i.e. DIRECT64_PROPNAME, etc).
Nishanth Aravamudan Oct. 28, 2015, 1:54 a.m. UTC | #5
On 28.10.2015 [12:00:20 +1100], Alexey Kardashevskiy wrote:
> On 10/28/2015 09:27 AM, Nishanth Aravamudan wrote:
> >On 27.10.2015 [17:02:16 +1100], Alexey Kardashevskiy wrote:
> >>On 10/24/2015 07:57 AM, Nishanth Aravamudan wrote:
> >>>On Power, the kernel's page size can differ from the IOMMU's page size,
> >>>so we need to override the generic implementation, which always returns
> >>>the kernel's page size. Lookup the IOMMU's page size from struct
> >>>iommu_table, if available. Fallback to the kernel's page size,
> >>>otherwise.
> >>>
> >>>Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
> >>>---
> >>>  arch/powerpc/include/asm/dma-mapping.h | 3 +++
> >>>  arch/powerpc/kernel/dma.c              | 9 +++++++++
> >>>  2 files changed, 12 insertions(+)
> >>>
> >>>diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
> >>>index 7f522c0..c5638f4 100644
> >>>--- a/arch/powerpc/include/asm/dma-mapping.h
> >>>+++ b/arch/powerpc/include/asm/dma-mapping.h
> >>>@@ -125,6 +125,9 @@ static inline void set_dma_offset(struct device *dev, dma_addr_t off)
> >>>  #define HAVE_ARCH_DMA_SET_MASK 1
> >>>  extern int dma_set_mask(struct device *dev, u64 dma_mask);
> >>>
> >>>+#define HAVE_ARCH_DMA_GET_PAGE_SHIFT 1
> >>>+extern unsigned long dma_get_page_shift(struct device *dev);
> >>>+
> >>>  #include <asm-generic/dma-mapping-common.h>
> >>>
> >>>  extern int __dma_set_mask(struct device *dev, u64 dma_mask);
> >>>diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
> >>>index 59503ed..e805af2 100644
> >>>--- a/arch/powerpc/kernel/dma.c
> >>>+++ b/arch/powerpc/kernel/dma.c
> >>>@@ -335,6 +335,15 @@ int dma_set_mask(struct device *dev, u64 dma_mask)
> >>>  }
> >>>  EXPORT_SYMBOL(dma_set_mask);
> >>>
> >>>+unsigned long dma_get_page_shift(struct device *dev)
> >>>+{
> >>>+	struct iommu_table *tbl = get_iommu_table_base(dev);
> >>>+	if (tbl)
> >>>+		return tbl->it_page_shift;
> >>
> >>
> >>All PCI devices have this initialized on POWER (at least, our, IBM's
> >>POWER) so 4K will always be returned here while in the case of
> >>(get_dma_ops(dev)==&dma_direct_ops) it could actually return
> >>PAGE_SHIFT. Is 4K still preferred value to return here?
> >
> >Right, so the logic of my series, goes like this:
> >
> >a) We currently are assuming DMA_PAGE_SHIFT (conceptual constant) is
> >PAGE_SHIFT everywhere, including Power.
> >
> >b) After 2/7, the Power code will return either the IOMMU table's shift
> >value, if set, or PAGE_SHIFT (I guess this would be the case if
> >get_dma_ops(dev) == &dma_direct_ops, as you said). That is no different
> >than we have now, except we can return the accurate IOMMU value if
> >available.
> 
> If it is not available, then something went wrong and BUG_ON(!tbl ||
> !tbl->it_page_shift) make more sense here than pretending that this
> function can ever return PAGE_SHIFT. imho.

That's a good point, thanks!

> >3) After 3/7, the platform can override the generic Power
> >get_dma_page_shift().
> >
> >4) After 4/7, pseries will return the DDW value, if available, then
> >fallback to the IOMMU table's value. I think in the case of
> >get_dma_ops(dev)==&dma_direct_ops, the only way that can happen is if we
> >are using DDW, right?
> 
> This is for pseries guests; for the powernv host it is a "bypass"
> mode which does 64bit direct DMA mapping and there is no additional
> window for that (i.e. DIRECT64_PROPNAME, etc).

You're right! I should update the code to handle both cases.

In "bypass" mode, what TCE size is used? Is it guaranteed to be 4K?

Seems like this would be a different platform implementation I'd put in
for 'powernv', is that right?

My apologies for missing that, and thank you for the review!

-Nish

Benjamin Herrenschmidt Oct. 28, 2015, 2:20 a.m. UTC | #6
On Tue, 2015-10-27 at 18:54 -0700, Nishanth Aravamudan wrote:
> 
> In "bypass" mode, what TCE size is used? Is it guaranteed to be 4K?

None :-) The TCEs are completely bypassed. You get a N:M linear mapping
of all memory starting at 1<<59 PCI side.
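
As a rough userspace sketch of what that means for addresses, assuming the
bypass window is a plain offset of 1<<59 added to the physical address
(BYPASS_BASE and bypass_dma_addr() are names made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed start of the bypass window on the PCI side, per the text above. */
#define BYPASS_BASE (UINT64_C(1) << 59)

/*
 * Hypothetical model of a bypass mapping: no TCE lookup, the DMA address
 * is the physical address shifted into the window by a constant offset.
 */
static inline uint64_t bypass_dma_addr(uint64_t phys)
{
	assert(phys < BYPASS_BASE);
	return BYPASS_BASE + phys;
}
```

Since the offset is a constant with only bit 59 set, every bit below it is
preserved, so there is no TCE page size limiting the alignment.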

> Seems like this would be a different platform implementation I'd put in
> for 'powernv', is that right?
> 
> My apologies for missing that, and thank you for the review!

Cheers,
Ben.

Nishanth Aravamudan Oct. 28, 2015, 2:30 a.m. UTC | #7
On 28.10.2015 [11:20:05 +0900], Benjamin Herrenschmidt wrote:
> On Tue, 2015-10-27 at 18:54 -0700, Nishanth Aravamudan wrote:
> > 
> > In "bypass" mode, what TCE size is used? Is it guaranteed to be 4K?
> 
> None :-) The TCEs are completely bypassed. You get a N:M linear mapping
> of all memory starting at 1<<59 PCI side.

Err, duh, sorry! Ok, so in that case, DMA page shift is PAGE_SHIFT,
then?

-Nish

Benjamin Herrenschmidt Oct. 28, 2015, 3:20 a.m. UTC | #8
On Tue, 2015-10-27 at 19:30 -0700, Nishanth Aravamudan wrote:
> On 28.10.2015 [11:20:05 +0900], Benjamin Herrenschmidt wrote:
> > On Tue, 2015-10-27 at 18:54 -0700, Nishanth Aravamudan wrote:
> > > 
> > > In "bypass" mode, what TCE size is used? Is it guaranteed to be
> > > 4K?
> > 
> > None :-) The TCEs are completely bypassed. You get a N:M linear
> > mapping
> > of all memory starting at 1<<59 PCI side.
> 
> Err, duh, sorry! Ok, so in that case, DMA page shift is PAGE_SHIFT,
> then?

I think so.

Cheers,
Ben.


Patch

diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index 7f522c0..c5638f4 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -125,6 +125,9 @@  static inline void set_dma_offset(struct device *dev, dma_addr_t off)
 #define HAVE_ARCH_DMA_SET_MASK 1
 extern int dma_set_mask(struct device *dev, u64 dma_mask);
 
+#define HAVE_ARCH_DMA_GET_PAGE_SHIFT 1
+extern unsigned long dma_get_page_shift(struct device *dev);
+
 #include <asm-generic/dma-mapping-common.h>
 
 extern int __dma_set_mask(struct device *dev, u64 dma_mask);
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 59503ed..e805af2 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -335,6 +335,15 @@  int dma_set_mask(struct device *dev, u64 dma_mask)
 }
 EXPORT_SYMBOL(dma_set_mask);
 
+unsigned long dma_get_page_shift(struct device *dev)
+{
+	struct iommu_table *tbl = get_iommu_table_base(dev);
+	if (tbl)
+		return tbl->it_page_shift;
+	return PAGE_SHIFT;
+}
+EXPORT_SYMBOL(dma_get_page_shift);
+
 u64 __dma_get_required_mask(struct device *dev)
 {
 	struct dma_map_ops *dma_ops = get_dma_ops(dev);