[1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask

Message ID 1483044304-2085-1-git-send-email-nikita.yoush@cogentembedded.com
State Not Applicable

Commit Message

Nikita Yushchenko Dec. 29, 2016, 8:45 p.m. UTC
A PCI device may support 64-bit DMA addressing, so its driver sets the
device's dma_mask to DMA_BIT_MASK(64), while the PCI host bridge has
limitations on inbound transaction addressing. An example of such a setup
is an NVMe SSD connected to the R-Car PCIe controller.

Previously there was an attempt to handle this via a bus notifier: after
the driver is attached to the PCI device, the bridge driver gets a notifier
callback and resets dma_mask from there. However, this is racy: the PCI
device driver may already have allocated buffers and/or started I/O in its
probe routine. In the NVMe case, I/O is started in workqueue context, and
this race gives a "sometimes works, sometimes not" effect.

A proper solution should make the driver's dma_set_mask() call fail if the
host bridge can't support the mask being set.

This patch makes __swiotlb_dma_supported() check the mask being set for a
PCI device against the dma_mask of the struct device corresponding to the
PCI host bridge (the one named "pciXXXX:YY"), if that dma_mask is set.

This is the least destructive approach: the dma_mask of that device
object is currently not used at all, so all existing setups will work as
before, and modification is required only in the components actually
affected: the driver of the particular PCI host bridge and the dma_map_ops
of the particular platform.
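
The mask comparison described above can be tried outside the kernel. The
following is a minimal userspace sketch, not the patch itself:
dma_bit_mask() is a stand-in for the kernel's DMA_BIT_MASK() macro, and
mask_fits_bridge() is a hypothetical helper name that mirrors the condition
the patch adds to __swiotlb_dma_supported().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's DMA_BIT_MASK(n); valid for 1 <= n <= 64. */
static uint64_t dma_bit_mask(unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	return bits == 64 ? ~UINT64_C(0) : (UINT64_C(1) << bits) - 1;
}

/*
 * Hypothetical helper mirroring the check in the patch: a requested mask is
 * acceptable only if it sets no address bit outside the bridge's mask.
 * A bridge mask of 0 is treated as "no limit recorded", as in the patch.
 */
static bool mask_fits_bridge(uint64_t mask, uint64_t bridge_mask)
{
	if (bridge_mask == 0)
		return true;
	return (mask & bridge_mask) == mask;
}
```

With a bridge limited to 32 address bits, a request for DMA_BIT_MASK(64)
is rejected, and the driver is then free to retry with a narrower mask.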

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
---
 arch/arm64/mm/dma-mapping.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Arnd Bergmann Dec. 29, 2016, 9:18 p.m. UTC | #1
On Thursday, December 29, 2016 11:45:03 PM CET Nikita Yushchenko wrote:
> 
>  static int __swiotlb_dma_supported(struct device *hwdev, u64 mask)
>  {
> +#ifdef CONFIG_PCI
> +       if (dev_is_pci(hwdev)) {
> +               struct pci_dev *pdev = to_pci_dev(hwdev);
> +               struct pci_host_bridge *br = pci_find_host_bridge(pdev->bus);
> +
> +               if (br->dev.dma_mask && (*br->dev.dma_mask) &&
> +                               (mask & (*br->dev.dma_mask)) != mask)
> +                       return 0;
> +       }
> +#endif
>         if (swiotlb)
>                 return swiotlb_dma_supported(hwdev, mask);
>         return 1;
> 

I think it's wrong to make this a special case for PCI.

Instead, we should follow the dma-ranges properties during dma_set_mask()
to ensure we don't set a mask that any of the parents up to the root
cannot support.
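
One way to picture the parent walk suggested here is the userspace sketch
below. It is only an illustration under loud assumptions: struct node and
both function names are hypothetical, and each ancestor's constraint is
reduced to a simple bit mask, whereas real dma-ranges properties describe
address ranges and translations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified device node; not the kernel's struct device. */
struct node {
	const struct node *parent;	/* NULL at the root */
	uint64_t dma_limit;		/* addressable-bits mask; 0 = no limit known */
};

/* Intersect the limits of every ancestor between dev and the root. */
static uint64_t effective_dma_limit(const struct node *dev)
{
	uint64_t limit = ~UINT64_C(0);

	assert(dev != NULL);
	for (const struct node *n = dev->parent; n != NULL; n = n->parent)
		if (n->dma_limit != 0)
			limit &= n->dma_limit;
	return limit;
}

/* A mask is supported only if every ancestor can address all of its bits. */
static bool mask_supported(const struct node *dev, uint64_t mask)
{
	return (mask & effective_dma_limit(dev)) == mask;
}
```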

	Arnd
Sergei Shtylyov Dec. 30, 2016, 9:46 a.m. UTC | #2
Hello!

On 12/29/2016 11:45 PM, Nikita Yushchenko wrote:

> It is possible that PCI device supports 64-bit DMA addressing, and thus
> it's driver sets device's dma_mask to DMA_BIT_MASK(64), however PCI host

    Its.

> bridge has limitations on inbound transactions addressing. Example of
> such setup is NVME

    Isn't it called NVMe?

> SSD device connected to RCAR PCIe controller.

    R=Car.

> Previously there was attempt to handle this via bus notifier: after
> driver is attached to PCI device, bridge driver gets notifier callback,
> and resets dma_mask from there. However, this is racy: PCI device driver
> could already allocate buffers and/or start i/o in probe routine.
> In NVME case, i/o is started in workqueue context, and this race gives
> "sometimes works, sometimes not" effect.
>
> Proper solution should make driver's dma_set_mask() call to fail if host
> bridge can't support mask being set.
>
> This patch makes __swiotlb_dma_supported() to check mask being set for

    "To" not needed here.

> PCI device against dma_mask of struct device corresponding to PCI host
> bridge (one with name "pciXXXX:YY"), if that dma_mask is set.
>
> This is the least destructive approach: currently dma_mask of that device
> object is not used anyhow, thus all existing setups will work as before,
> and modification is required only in actually affected components -
> driver of particular PCI host bridge, and dma_map_ops of particular
> platform.
>
> Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
> ---
>  arch/arm64/mm/dma-mapping.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 290a84f..49645277 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
[...]
> @@ -347,6 +348,16 @@ static int __swiotlb_get_sgtable(struct device *dev, struct sg_table *sgt,
>
>  static int __swiotlb_dma_supported(struct device *hwdev, u64 mask)
>  {
> +#ifdef CONFIG_PCI
> +	if (dev_is_pci(hwdev)) {
> +		struct pci_dev *pdev = to_pci_dev(hwdev);
> +		struct pci_host_bridge *br = pci_find_host_bridge(pdev->bus);
> +
> +		if (br->dev.dma_mask && (*br->dev.dma_mask) &&
> +				(mask & (*br->dev.dma_mask)) != mask)

    Hum, inner parens not necessary?

[...]

MBR, Sergei

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sergei Shtylyov Dec. 30, 2016, 10:06 a.m. UTC | #3
On 12/30/2016 12:46 PM, Sergei Shtylyov wrote:

>> It is possible that PCI device supports 64-bit DMA addressing, and thus
>> it's driver sets device's dma_mask to DMA_BIT_MASK(64), however PCI host
>
>    Its.
>
>> bridge has limitations on inbound transactions addressing. Example of
>> such setup is NVME
>
>    Isn't it called NVMe?
>
>> SSD device connected to RCAR PCIe controller.
>
>    R=Car.

    Sorry, R-Car. :-)

[...]

MBR, Sergei

Will Deacon Jan. 3, 2017, 6:44 p.m. UTC | #4
On Thu, Dec 29, 2016 at 11:45:03PM +0300, Nikita Yushchenko wrote:
> It is possible that PCI device supports 64-bit DMA addressing, and thus
> it's driver sets device's dma_mask to DMA_BIT_MASK(64), however PCI host
> bridge has limitations on inbound transactions addressing. Example of
> such setup is NVME SSD device connected to RCAR PCIe controller.
> 
> Previously there was attempt to handle this via bus notifier: after
> driver is attached to PCI device, bridge driver gets notifier callback,
> and resets dma_mask from there. However, this is racy: PCI device driver
> could already allocate buffers and/or start i/o in probe routine.
> In NVME case, i/o is started in workqueue context, and this race gives
> "sometimes works, sometimes not" effect.
> 
> Proper solution should make driver's dma_set_mask() call to fail if host
> bridge can't support mask being set.
> 
> This patch makes __swiotlb_dma_supported() to check mask being set for
> PCI device against dma_mask of struct device corresponding to PCI host
> bridge (one with name "pciXXXX:YY"), if that dma_mask is set.
> 
> This is the least destructive approach: currently dma_mask of that device
> object is not used anyhow, thus all existing setups will work as before,
> and modification is required only in actually affected components -
> driver of particular PCI host bridge, and dma_map_ops of particular
> platform.
> 
> Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
> ---
>  arch/arm64/mm/dma-mapping.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 290a84f..49645277 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -28,6 +28,7 @@
>  #include <linux/dma-contiguous.h>
>  #include <linux/vmalloc.h>
>  #include <linux/swiotlb.h>
> +#include <linux/pci.h>
>  
>  #include <asm/cacheflush.h>
>  
> @@ -347,6 +348,16 @@ static int __swiotlb_get_sgtable(struct device *dev, struct sg_table *sgt,
>  
>  static int __swiotlb_dma_supported(struct device *hwdev, u64 mask)
>  {
> +#ifdef CONFIG_PCI
> +	if (dev_is_pci(hwdev)) {
> +		struct pci_dev *pdev = to_pci_dev(hwdev);
> +		struct pci_host_bridge *br = pci_find_host_bridge(pdev->bus);
> +
> +		if (br->dev.dma_mask && (*br->dev.dma_mask) &&
> +				(mask & (*br->dev.dma_mask)) != mask)
> +			return 0;
> +	}
> +#endif

Hmm, but this makes it look like the problem is both arm64 and swiotlb
specific, when in reality it's not. Perhaps another hack you could try
would be to register a PCI bus notifier in the host bridge looking for
BUS_NOTIFY_BIND_DRIVER, then you could proxy the DMA ops for each child
device before the driver has probed, but adding a dma_set_mask callback
to limit the mask to what you need?

I agree that it would be better if dma_set_mask handled all of this
transparently, but it's all based on the underlying ops rather than the
bus type.
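
The proxying idea above could look roughly like the following userspace
model. Everything in it is hypothetical: struct model_dma_ops is a one-hook
stand-in for the kernel's DMA ops table, bridge_limit is an invented field,
and the sketch rejects an oversized mask, which is only one reading of
"limit the mask to what you need".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_dev;

/* Hypothetical one-hook ops table; the kernel's DMA ops have many more hooks. */
struct model_dma_ops {
	int (*set_dma_mask)(struct model_dev *dev, uint64_t mask);
};

struct model_dev {
	const struct model_dma_ops *ops;
	uint64_t dma_mask;
	uint64_t bridge_limit;	/* hypothetical field, filled in by the bridge driver */
};

/* Original behaviour: accept whatever mask the driver asks for. */
static int plain_set_dma_mask(struct model_dev *dev, uint64_t mask)
{
	dev->dma_mask = mask;
	return 0;
}

/* Proxy: refuse masks with bits the bridge cannot address, else delegate. */
static int limited_set_dma_mask(struct model_dev *dev, uint64_t mask)
{
	if ((mask & dev->bridge_limit) != mask)
		return -1;	/* a kernel version would return a negative errno */
	return plain_set_dma_mask(dev, mask);
}

static const struct model_dma_ops plain_ops = { .set_dma_mask = plain_set_dma_mask };
static const struct model_dma_ops limited_ops = { .set_dma_mask = limited_set_dma_mask };

/* What a bind-time notifier would do, before the driver's probe routine runs. */
static void model_on_bind_driver(struct model_dev *dev, uint64_t bridge_limit)
{
	assert(dev != NULL);
	dev->bridge_limit = bridge_limit;
	dev->ops = &limited_ops;
}
```

Because the ops are swapped before probe, the driver's own mask request
goes through the proxy, so there is no window in which an unsupported mask
is in effect.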

Will
Nikita Yushchenko Jan. 3, 2017, 7:01 p.m. UTC | #5
>> It is possible that PCI device supports 64-bit DMA addressing, and thus
>> it's driver sets device's dma_mask to DMA_BIT_MASK(64), however PCI host
>> bridge has limitations on inbound transactions addressing. Example of
>> such setup is NVME SSD device connected to RCAR PCIe controller.
>>
>> Previously there was attempt to handle this via bus notifier: after
>> driver is attached to PCI device, bridge driver gets notifier callback,
>> and resets dma_mask from there. However, this is racy: PCI device driver
>> could already allocate buffers and/or start i/o in probe routine.
>> In NVME case, i/o is started in workqueue context, and this race gives
>> "sometimes works, sometimes not" effect.
>>
>> Proper solution should make driver's dma_set_mask() call to fail if host
>> bridge can't support mask being set.
>>
>> This patch makes __swiotlb_dma_supported() to check mask being set for
>> PCI device against dma_mask of struct device corresponding to PCI host
>> bridge (one with name "pciXXXX:YY"), if that dma_mask is set.
>>
>> This is the least destructive approach: currently dma_mask of that device
>> object is not used anyhow, thus all existing setups will work as before,
>> and modification is required only in actually affected components -
>> driver of particular PCI host bridge, and dma_map_ops of particular
>> platform.
>>
>> Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
>> ---
>>  arch/arm64/mm/dma-mapping.c | 11 +++++++++++
>>  1 file changed, 11 insertions(+)
>>
>> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
>> index 290a84f..49645277 100644
>> --- a/arch/arm64/mm/dma-mapping.c
>> +++ b/arch/arm64/mm/dma-mapping.c
>> @@ -28,6 +28,7 @@
>>  #include <linux/dma-contiguous.h>
>>  #include <linux/vmalloc.h>
>>  #include <linux/swiotlb.h>
>> +#include <linux/pci.h>
>>  
>>  #include <asm/cacheflush.h>
>>  
>> @@ -347,6 +348,16 @@ static int __swiotlb_get_sgtable(struct device *dev, struct sg_table *sgt,
>>  
>>  static int __swiotlb_dma_supported(struct device *hwdev, u64 mask)
>>  {
>> +#ifdef CONFIG_PCI
>> +	if (dev_is_pci(hwdev)) {
>> +		struct pci_dev *pdev = to_pci_dev(hwdev);
>> +		struct pci_host_bridge *br = pci_find_host_bridge(pdev->bus);
>> +
>> +		if (br->dev.dma_mask && (*br->dev.dma_mask) &&
>> +				(mask & (*br->dev.dma_mask)) != mask)
>> +			return 0;
>> +	}
>> +#endif
> 
> Hmm, but this makes it look like the problem is both arm64 and swiotlb
> specific, when in reality it's not. Perhaps another hack you could try
> would be to register a PCI bus notifier in the host bridge looking for
> BUS_NOTIFY_BIND_DRIVER, then you could proxy the DMA ops for each child
> device before the driver has probed, but adding a dma_set_mask callback
> to limit the mask to what you need?

This is what Renesas BSP tries to do and it does not work.

BUS_NOTIFY_BIND_DRIVER arrives after driver's probe routine exits, but
i/o can be started before that.
Grygorii Strashko Jan. 3, 2017, 8:13 p.m. UTC | #6
On 01/03/2017 01:01 PM, Nikita Yushchenko wrote:
>>> It is possible that PCI device supports 64-bit DMA addressing, and thus
>>> it's driver sets device's dma_mask to DMA_BIT_MASK(64), however PCI host
>>> bridge has limitations on inbound transactions addressing. Example of
>>> such setup is NVME SSD device connected to RCAR PCIe controller.
>>>
>>> Previously there was attempt to handle this via bus notifier: after
>>> driver is attached to PCI device, bridge driver gets notifier callback,
>>> and resets dma_mask from there. However, this is racy: PCI device driver
>>> could already allocate buffers and/or start i/o in probe routine.
>>> In NVME case, i/o is started in workqueue context, and this race gives
>>> "sometimes works, sometimes not" effect.
>>>
>>> Proper solution should make driver's dma_set_mask() call to fail if host
>>> bridge can't support mask being set.
>>>
>>> This patch makes __swiotlb_dma_supported() to check mask being set for
>>> PCI device against dma_mask of struct device corresponding to PCI host
>>> bridge (one with name "pciXXXX:YY"), if that dma_mask is set.
>>>
>>> This is the least destructive approach: currently dma_mask of that device
>>> object is not used anyhow, thus all existing setups will work as before,
>>> and modification is required only in actually affected components -
>>> driver of particular PCI host bridge, and dma_map_ops of particular
>>> platform.
>>>
>>> Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
>>> ---
>>>  arch/arm64/mm/dma-mapping.c | 11 +++++++++++
>>>  1 file changed, 11 insertions(+)
>>>
>>> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
>>> index 290a84f..49645277 100644
>>> --- a/arch/arm64/mm/dma-mapping.c
>>> +++ b/arch/arm64/mm/dma-mapping.c
>>> @@ -28,6 +28,7 @@
>>>  #include <linux/dma-contiguous.h>
>>>  #include <linux/vmalloc.h>
>>>  #include <linux/swiotlb.h>
>>> +#include <linux/pci.h>
>>>  
>>>  #include <asm/cacheflush.h>
>>>  
>>> @@ -347,6 +348,16 @@ static int __swiotlb_get_sgtable(struct device *dev, struct sg_table *sgt,
>>>  
>>>  static int __swiotlb_dma_supported(struct device *hwdev, u64 mask)
>>>  {
>>> +#ifdef CONFIG_PCI
>>> +	if (dev_is_pci(hwdev)) {
>>> +		struct pci_dev *pdev = to_pci_dev(hwdev);
>>> +		struct pci_host_bridge *br = pci_find_host_bridge(pdev->bus);
>>> +
>>> +		if (br->dev.dma_mask && (*br->dev.dma_mask) &&
>>> +				(mask & (*br->dev.dma_mask)) != mask)
>>> +			return 0;
>>> +	}
>>> +#endif
>>
>> Hmm, but this makes it look like the problem is both arm64 and swiotlb
>> specific, when in reality it's not. Perhaps another hack you could try
>> would be to register a PCI bus notifier in the host bridge looking for
>> BUS_NOTIFY_BIND_DRIVER, then you could proxy the DMA ops for each child
>> device before the driver has probed, but adding a dma_set_mask callback
>> to limit the mask to what you need?
> 
> This is what Renesas BSP tries to do and it does not work.
> 
> BUS_NOTIFY_BIND_DRIVER arrives after driver's probe routine exits, but
> i/o can be started before that.

Hm. This is a strange statement:
 really_probe
 |-> driver_sysfs_add
 |   |-> blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 |                                    BUS_NOTIFY_BIND_DRIVER, dev);
 ...
 |-> ret = drv->probe(dev);
 ...
 |-> driver_bound(dev);
     |-> blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
                                      BUS_NOTIFY_BOUND_DRIVER, dev);

Am I missing something?
Nikita Yushchenko Jan. 3, 2017, 8:23 p.m. UTC | #7
>>>> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
>>>> index 290a84f..49645277 100644
>>>> --- a/arch/arm64/mm/dma-mapping.c
>>>> +++ b/arch/arm64/mm/dma-mapping.c
>>>> @@ -28,6 +28,7 @@
>>>>  #include <linux/dma-contiguous.h>
>>>>  #include <linux/vmalloc.h>
>>>>  #include <linux/swiotlb.h>
>>>> +#include <linux/pci.h>
>>>>  
>>>>  #include <asm/cacheflush.h>
>>>>  
>>>> @@ -347,6 +348,16 @@ static int __swiotlb_get_sgtable(struct device *dev, struct sg_table *sgt,
>>>>  
>>>>  static int __swiotlb_dma_supported(struct device *hwdev, u64 mask)
>>>>  {
>>>> +#ifdef CONFIG_PCI
>>>> +	if (dev_is_pci(hwdev)) {
>>>> +		struct pci_dev *pdev = to_pci_dev(hwdev);
>>>> +		struct pci_host_bridge *br = pci_find_host_bridge(pdev->bus);
>>>> +
>>>> +		if (br->dev.dma_mask && (*br->dev.dma_mask) &&
>>>> +				(mask & (*br->dev.dma_mask)) != mask)
>>>> +			return 0;
>>>> +	}
>>>> +#endif
>>>
>>> Hmm, but this makes it look like the problem is both arm64 and swiotlb
>>> specific, when in reality it's not. Perhaps another hack you could try
>>> would be to register a PCI bus notifier in the host bridge looking for
>>> BUS_NOTIFY_BIND_DRIVER, then you could proxy the DMA ops for each child
>>> device before the driver has probed, but adding a dma_set_mask callback
>>> to limit the mask to what you need?
>>
>> This is what Renesas BSP tries to do and it does not work.
>>
>> BUS_NOTIFY_BIND_DRIVER arrives after driver's probe routine exits, but
>> i/o can be started before that.
> 
> Hm. This is strange statement:
>  really_probe
>  |->driver_sysfs_add
>     |-> blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
>  					     BUS_NOTIFY_BIND_DRIVER, dev);
> ...
>  |- ret = drv->probe(dev);
> ...
>  |- driver_bound(dev);
>     |- blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
> 					     BUS_NOTIFY_BOUND_DRIVER, dev);
> 
> Am I missing smth?

I misinterpreted your message, sorry.

The BSP attaches to BUS_NOTIFY_BOUND_DRIVER, not to BUS_NOTIFY_BIND_DRIVER,
and simply overwrites the device's dma_mask there.  You are suggesting
something completely different.
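
Why overwriting the mask at "bound" time is too late can be shown with a
small userspace model. It is a sketch with hypothetical names only; it
assumes the probe routine requests a 64-bit mask and starts I/O at once,
which is the deterministic worst case (with NVMe the I/O starts from a
workqueue, so the outcome varies from run to run).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_dev {
	uint64_t dma_mask;		/* mask currently in effect */
	uint64_t mask_seen_by_io;	/* mask in effect when probe started I/O */
};

/* Stand-in for a probe routine that asks for 64-bit DMA and starts I/O. */
static void model_probe(struct model_dev *dev)
{
	dev->dma_mask = ~UINT64_C(0);
	dev->mask_seen_by_io = dev->dma_mask;
}

/* Stand-in for a BUS_NOTIFY_BOUND_DRIVER handler that overwrites the mask. */
static void model_bound_notifier(struct model_dev *dev, uint64_t bridge_limit)
{
	dev->dma_mask = bridge_limit;
}

/* Same order as really_probe(): probe first, "bound" notification afterwards. */
static void model_really_probe(struct model_dev *dev, uint64_t bridge_limit)
{
	assert(dev != NULL);
	model_probe(dev);
	model_bound_notifier(dev, bridge_limit);
}
```

After model_really_probe() the device ends up with the limited mask, yet
the I/O started during probe was set up under the 64-bit one.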

I'll check if your approach is practical.


The powerpc architecture currently implements one more approach: it uses
the pci_controller structure provided by the host bridge driver, which has
a set_dma_mask() hook. Extending this beyond powerpc might be a good idea.
However, that would require changing quite a few host bridge drivers,
without any gain for most of them...
Patch

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 290a84f..49645277 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -28,6 +28,7 @@ 
 #include <linux/dma-contiguous.h>
 #include <linux/vmalloc.h>
 #include <linux/swiotlb.h>
+#include <linux/pci.h>
 
 #include <asm/cacheflush.h>
 
@@ -347,6 +348,16 @@  static int __swiotlb_get_sgtable(struct device *dev, struct sg_table *sgt,
 
 static int __swiotlb_dma_supported(struct device *hwdev, u64 mask)
 {
+#ifdef CONFIG_PCI
+	if (dev_is_pci(hwdev)) {
+		struct pci_dev *pdev = to_pci_dev(hwdev);
+		struct pci_host_bridge *br = pci_find_host_bridge(pdev->bus);
+
+		if (br->dev.dma_mask && (*br->dev.dma_mask) &&
+				(mask & (*br->dev.dma_mask)) != mask)
+			return 0;
+	}
+#endif
 	if (swiotlb)
 		return swiotlb_dma_supported(hwdev, mask);
 	return 1;