diff mbox series

[03/25] dma-direct: take dma-ranges/offsets into account in resource mapping

Message ID 20220324014836.19149-4-Sergey.Semin@baikalelectronics.ru
State New
Headers show
Series dmaengine: dw-edma: Add RP/EP local DMA controllers support | expand

Commit Message

Serge Semin March 24, 2022, 1:48 a.m. UTC
A basic device-specific linear memory mapping was introduced back in
commit 25f1e1887088 ("dma: Take into account dma_pfn_offset") as a
single-valued offset preserved in the device.dma_pfn_offset field, which
was initialized for instance by means of the "dma-ranges" DT property.
Afterwards the functionality was extended to support more than one
device-specific region defined in the device.dma_range_map list of maps.
But all of these improvements concerned the single-pointer, page and sg
DMA-mapping methods, while the system resource mapping function turned
out to miss the corresponding modification. Thus the
dma_direct_map_resource() method now just casts the CPU physical address
to the device DMA address without taking the dma-ranges-based mapping
into account, which is obviously wrong. Let's fix it by using the
phys_to_dma_direct() method to get the device-specific bus address from
the passed memory resource for the case of the directly mapped DMA.

Fixes: 25f1e1887088 ("dma: Take into account dma_pfn_offset")
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
---
 kernel/dma/direct.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Serge Semin April 17, 2022, 10:44 p.m. UTC | #1
Hello Robin.

Sorry for the delayed answer. My comments are below.

On Thu, Mar 24, 2022 at 11:30:38AM +0000, Robin Murphy wrote:
> On 2022-03-24 01:48, Serge Semin wrote:
> > A basic device-specific linear memory mapping was introduced back in
> > commit ("dma: Take into account dma_pfn_offset") as a single-valued offset
> > preserved in the device.dma_pfn_offset field, which was initialized for
> > instance by means of the "dma-ranges" DT property. Afterwards the
> > functionality was extended to support more than one device-specific region
> > defined in the device.dma_range_map list of maps. But all of these
> > improvements concerned a single pointer, page or sg DMA-mapping methods,
> > while the system resource mapping function turned to miss the
> > corresponding modification. Thus the dma_direct_map_resource() method now
> > just casts the CPU physical address to the device DMA address with no
> > dma-ranges-based mapping taking into account, which is obviously wrong.
> > Let's fix it by using the phys_to_dma_direct() method to get the
> > device-specific bus address from the passed memory resource for the case
> > of the directly mapped DMA.
> 

> It may not have been well-documented at the time, but this was largely
> intentional. The assumption based on known systems was that where
> dma_pfn_offset existed, it would *not* apply to peer MMIO addresses.

Well, I'd say it wasn't documented or even discussed at all. At least
after a fairly comprehensive retrospective search I failed to find any
note explaining why all the dma_direct_map*() methods were converted to
support the dma_pfn_offset/dma_range_map ranges while the
dma_direct_map_resource() method was left out of that conversion. Nor is
it immediately inferable from the method usage and its prototype that it
is supposed to be used for DMA memory addresses rather than CPU ones.

> 
> For instance, DTs for TI Keystone 2 platforms only describe an offset for
> RAM:
> 
> 	dma-ranges = <0x80000000 0x8 0x00000000 0x80000000>;
> 
> but a DMA controller might also want to access something in the MMIO range
> 0x0-0x7fffffff, of which it still has an identical non-offset view. If a
> driver was previously using dma_map_resource() for that, it would now start
> getting DMA_MAPPING_ERROR because the dma_range_map exists but doesn't
> describe the MMIO region. I agree that in hindsight it's not an ideal
> situation, but it's how things have ended up, so at this point I'm wary of
> making potentially-breaking changes.

Hmm, what if the driver was previously using, for instance, the
dma_direct_map_sg() method for that? Following this logic you would have
had to reject the whole dma_pfn_offset/dma_range_map ranges support,
since the dma_direct_map_sg(), dma_direct_map_page() and
dma_direct_alloc*() methods do take the offsets into account. What we
can see now is that the same physical address will be mapped differently
by the dma_map_resource() and, for instance, dma_map_sg() methods. All
of these methods expect to be passed a "phys_addr_t" address, which is
the CPU address, not the DMA one. Doesn't that look erroneous? IIUC, in
accordance with the common kernel definition the "resource" suffix
indicates a CPU-visible address (like a struct resource range), not a
DMA address, no matter whether it's used to describe a RAM or an MMIO
range.

AFAICS the dma_range_map just defines the offset-based DMA-to-CPU
mapping for the particular bus/device. If the device driver already
knows the DMA address, why does it need to map it at all? I see some
contradiction here.
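
Just to be clear about what I mean by the offset-based mapping, here is
a simplified sketch of the kind of lookup the dma_range_map implies for
the rest of the dma_direct_map_*() methods (not a verbatim copy of the
kernel helper, just the gist of the translation):

#include <linux/device.h>
#include <linux/dma-map-ops.h>		/* struct bus_dma_region */
#include <linux/dma-mapping.h>		/* DMA_MAPPING_ERROR */

/* Sketch: translate a CPU physical address to the device-visible DMA
 * address using the per-device dma-ranges list. */
static dma_addr_t sketch_phys_to_dma(struct device *dev, phys_addr_t paddr)
{
	const struct bus_dma_region *m;

	if (!dev->dma_range_map)
		return (dma_addr_t)paddr;	/* 1:1 mapping */

	for (m = dev->dma_range_map; m->size; m++)
		if (paddr >= m->cpu_start && paddr - m->cpu_start < m->size)
			return m->dma_start + (paddr - m->cpu_start);

	return DMA_MAPPING_ERROR;		/* range not described */
}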

> 
> May I ask what exactly your setup looks like, if you have a DMA controller
> with an offset view of its "own" MMIO space?

I don't have such a setup. But what I see here is either a wrong
dma_direct_map_resource() implementation or a redundant mapping
performed in some platform/DMA-device drivers. Indeed, judging by the
dma_map_resource() method declaration, it expects to be passed a CPU
address, which will be mapped in accordance with the "dma-ranges"-based
DMA-to-CPU memory mapping in the same way as in the rest of the
dma_direct-family methods. If the DMA address is already known, then it
is supposed to be used as-is, with no additional remapping performed.

Last but not least, regarding DMA controllers and the dma_map_resource()
usage: the dma_slave_config structure was converted to having the
CPU-physical src/dst addresses specified in commit 9575632052ba
("dmaengine: make slave address physical"). So the DMA client drivers
now have to set the slave source and destination addresses defined in
the CPU address space, while the DMA engine driver needs to map them in
accordance with the platform/device-specific configuration.
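
For example, a purely hypothetical client snippet showing the intent of
that commit: the FIFO address is given in the CPU physical address
space, and it's then up to the engine driver to produce the
device-visible address (e.g. via dma_map_resource()):

#include <linux/dmaengine.h>

/* Hypothetical DMA client setup: fifo_phys is the CPU physical address
 * of the peripheral FIFO register. */
static int sketch_setup_tx(struct dma_chan *chan, phys_addr_t fifo_phys)
{
	struct dma_slave_config cfg = { };

	cfg.direction = DMA_MEM_TO_DEV;
	/* Note: the field is still typed dma_addr_t, which is exactly the
	 * kind of type confusion being discussed here. */
	cfg.dst_addr = fifo_phys;
	cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
	cfg.dst_maxburst = 4;

	return dmaengine_slave_config(chan, &cfg);
}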

To sum up, as I see it the problem with the dma_map_resource()
semantics still exists. The semantics aren't documented anywhere, while
the implementation looks confusing. You say that the method expects to
be passed the DMA address, but at the same time it has the phys_addr
argument of the phys_addr_t type. If it had the dma_addr_t type instead,
that would have been much less confusing. Could you clarify whether my
considerations above are wrong, and in what respect?

-Sergey

> 
> Thanks,
> Robin.
> 
> > Fixes: 25f1e1887088 ("dma: Take into account dma_pfn_offset")
> > Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
> > ---
> >   kernel/dma/direct.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> > index 50f48e9e4598..9ce8192b29ab 100644
> > --- a/kernel/dma/direct.c
> > +++ b/kernel/dma/direct.c
> > @@ -497,7 +497,7 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
> >   dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
> >   		size_t size, enum dma_data_direction dir, unsigned long attrs)
> >   {
> > -	dma_addr_t dma_addr = paddr;
> > +	dma_addr_t dma_addr = phys_to_dma_direct(dev, paddr);
> >   	if (unlikely(!dma_capable(dev, dma_addr, size, false))) {
> >   		dev_err_once(dev,
Christoph Hellwig April 20, 2022, 7:12 a.m. UTC | #2
On Mon, Apr 18, 2022 at 01:44:27AM +0300, Serge Semin wrote:
> > but a DMA controller might also want to access something in the MMIO range
> > 0x0-0x7fffffff, of which it still has an identical non-offset view. If a
> > driver was previously using dma_map_resource() for that, it would now start
> > getting DMA_MAPPING_ERROR because the dma_range_map exists but doesn't
> > describe the MMIO region. I agree that in hindsight it's not an ideal
> > situation, but it's how things have ended up, so at this point I'm wary of
> > making potentially-breaking changes.
> 
> Hmm, what if the driver was previously using for instance the
> dma_direct_map_sg() method for it?

dma_map_resource is for mapping MMIO space, and must not be called on
memory in the kernel map.  For dma_map_sg (or all the other dma_map_*
interface except for dma_map_resource), the reverse is true.
Serge Semin April 20, 2022, 8:32 a.m. UTC | #3
On Wed, Apr 20, 2022 at 09:12:17AM +0200, Christoph Hellwig wrote:
> On Mon, Apr 18, 2022 at 01:44:27AM +0300, Serge Semin wrote:
> > > but a DMA controller might also want to access something in the MMIO range
> > > 0x0-0x7fffffff, of which it still has an identical non-offset view. If a
> > > driver was previously using dma_map_resource() for that, it would now start
> > > getting DMA_MAPPING_ERROR because the dma_range_map exists but doesn't
> > > describe the MMIO region. I agree that in hindsight it's not an ideal
> > > situation, but it's how things have ended up, so at this point I'm wary of
> > > making potentially-breaking changes.
> > 
> > Hmm, what if the driver was previously using for instance the
> > dma_direct_map_sg() method for it?
> 

> dma_map_resource is for mapping MMIO space, and must not be called on
> memory in the kernel map.  For dma_map_sg (or all the other dma_map_*
> interface except for dma_map_resource), the reverse is true.

I got that from Robin's comment. Exactly that part seems very confusing
to me, because what you say doesn't square with the passed address type.
If the passed address belongs to the MMIO space and is a part of the CPU
physical address space, then it is supposed to be visible to the CPU
as-is (see the very first diagram in [1]). So the mapping performed in
the dma_map_resource() and dma_map_sg() methods is supposed to match.
Otherwise the spaces you are talking about are different and as such
need to be described by different types. Since what you are talking
about seems more like a DMA address space, the dma_map_resource()
address should have the dma_addr_t type instead of phys_addr_t.

BTW, here is the clearest example of a system which contradicts the
MMIO-specific mapping semantics you are talking about (it actually
matches what we've got, except for some interconnect implementation
peculiarities):

              +-----+
              | DDR |
              +--+--+
                 |
  +-----+ +------+-------+ +---------+
  | CPU +-+ Interconnect +-+ DEVs... |
  +-----+ +-----^-+------+ +---------+
     dma-ranges-| |-ranges
              +-+-v-+
              | PCI |
              +-----+

See, if I map a virtual memory address to be accessible by any PCIe
peripheral device, then the dma_map_sg/dma_map_page/etc procedures will
take the PCIe host controller dma-ranges into account. It will work as
expected and the PCIe devices will see the memory I specified. But if I
pass the physical address of the same page, or the physical address of
some device from the DEVs space, to dma_map_resource(), then the PCIe
dma-ranges won't be taken into account, and the resulting mapping will
be incorrect. That's why the current dma_map_resource() implementation
seems very confusing to me. As I see it, phys_addr_t is the type of the
Interconnect address space, while dma_addr_t describes the PCIe and
DEVs address spaces.
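
To make the mismatch concrete, here is a hypothetical snippet (assume
paddr is a RAM page covered by the PCIe controller dma-ranges, purely
for the sake of comparison): the very same CPU physical address ends up
with two different device-visible values depending on the API used:

#include <linux/dma-mapping.h>
#include <linux/mm.h>
#include <linux/pfn.h>

static void sketch_mismatch(struct device *dev, phys_addr_t paddr, size_t len)
{
	struct page *page = pfn_to_page(PHYS_PFN(paddr));

	/* Takes dev->dma_range_map into account (dma-ranges offset applied). */
	dma_addr_t a = dma_map_page(dev, page, offset_in_page(paddr), len,
				    DMA_TO_DEVICE);

	/* Currently returns paddr unchanged (no dma-ranges translation). */
	dma_addr_t b = dma_map_resource(dev, paddr, len, DMA_TO_DEVICE, 0);

	dev_info(dev, "map_page: %pad, map_resource: %pad\n", &a, &b);

	dma_unmap_page(dev, a, len, DMA_TO_DEVICE);
	dma_unmap_resource(dev, b, len, DMA_TO_DEVICE, 0);
}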

Based on what I said here and in my previous email, could you explain
what I get wrong?

[1] Documentation/core-api/dma-api-howto.rst

-Sergey
Christoph Hellwig April 20, 2022, 8:47 a.m. UTC | #4
I can't really comment on the dma-ranges exclusion for P2P mappings,
as that predates my involvement, however:

On Wed, Apr 20, 2022 at 11:32:07AM +0300, Serge Semin wrote:
> See, if I get to map a virtual memory address to be accessible by any
> PCIe peripheral device, then the dma_map_sg/dma_map_page/etc
> procedures will take the PCIe host controller dma-ranges into account.
> It will work as expected and the PCIe devices will see the memory what
> I specified. But if I get to pass the physical address of the same
> page or a physical address of some device of the DEVs space to the
> dma_map_resource(), then the PCIe dma-ranges won't be taken into
> account, and the result mapping will be incorrect. That's why the
> current dma_map_resource() implementation seems very confusing to me.
> As I see it phys_addr_t is the type of the Interconnect address space,
> meanwhile dma_addr_t describes the PCIe, DEVs address spaces.
> 
> Based on what I said here and in my previous email could you explain
> what do I get wrong?

You simply must not use dma_map_resource for normal kernel memory.
So while the exclusion might be somewhat confusing, that confusion
really should not matter for any proper use of the API.
Serge Semin April 20, 2022, 8:55 a.m. UTC | #5
On Wed, Apr 20, 2022 at 10:47:46AM +0200, Christoph Hellwig wrote:
> I can't really comment on the dma-ranges exlcusion for P2P mappings,
> as that predates my involvedment, however:

My example wasn't specific to the PCIe P2P transfers, but about PCIe
devices reaching some platform devices over the system interconnect
bus.

> 
> On Wed, Apr 20, 2022 at 11:32:07AM +0300, Serge Semin wrote:
> > See, if I get to map a virtual memory address to be accessible by any
> > PCIe peripheral device, then the dma_map_sg/dma_map_page/etc
> > procedures will take the PCIe host controller dma-ranges into account.
> > It will work as expected and the PCIe devices will see the memory what
> > I specified. But if I get to pass the physical address of the same
> > page or a physical address of some device of the DEVs space to the
> > dma_map_resource(), then the PCIe dma-ranges won't be taken into
> > account, and the result mapping will be incorrect. That's why the
> > current dma_map_resource() implementation seems very confusing to me.
> > As I see it phys_addr_t is the type of the Interconnect address space,
> > meanwhile dma_addr_t describes the PCIe, DEVs address spaces.
> > 
> > Based on what I said here and in my previous email could you explain
> > what do I get wrong?
> 

> You simply must not use dma_map_resource for normal kernel memory.
> So while the exclusion might be somewhat confusing, that confusion
> really should not matter for any proper use of the API.

What if I have the physical address of a platform device and want that
device to be accessed by a PCIe peripheral device? dma_map_resource()
seemed very much suitable for that, but considering what you say, it
isn't.

-Sergey
Christoph Hellwig April 21, 2022, 2:45 p.m. UTC | #6
On Wed, Apr 20, 2022 at 11:55:38AM +0300, Serge Semin wrote:
> On Wed, Apr 20, 2022 at 10:47:46AM +0200, Christoph Hellwig wrote:
> > I can't really comment on the dma-ranges exlcusion for P2P mappings,
> > as that predates my involvedment, however:
> 
> My example wasn't specific to the PCIe P2P transfers, but about PCIe
> devices reaching some platform devices over the system interconnect
> bus.

So strike PCIe, but this is our definition of Peer to Peer accesses.

> What if I get to have a physical address of a platform device and want
> have that device being accessed by a PCIe peripheral device? The
> dma_map_resource() seemed very much suitable for that. But considering
> what you say it isn't.

dma_map_resource is the right thing for that.  But the physical address
of MMIO ranges in the platform device should not have struct pages
allocated for it, and thus the other dma_map_* APIs should not work on
it to start with.
Serge Semin April 21, 2022, 5:35 p.m. UTC | #7
On Thu, Apr 21, 2022 at 04:45:36PM +0200, Christoph Hellwig wrote:
> On Wed, Apr 20, 2022 at 11:55:38AM +0300, Serge Semin wrote:
> > On Wed, Apr 20, 2022 at 10:47:46AM +0200, Christoph Hellwig wrote:
> > > I can't really comment on the dma-ranges exlcusion for P2P mappings,
> > > as that predates my involvedment, however:
> > 
> > My example wasn't specific to the PCIe P2P transfers, but about PCIe
> > devices reaching some platform devices over the system interconnect
> > bus.
> 
> So strike PCIe, but this our definition of Peer to Peer accesses.
> 
> > What if I get to have a physical address of a platform device and want
> > have that device being accessed by a PCIe peripheral device? The
> > dma_map_resource() seemed very much suitable for that. But considering
> > what you say it isn't.
> 

> dma_map_resource is the right thing for that.  But the physical address
> of MMIO ranges in the platform device should not have struct pages
> allocated for it, and thus the other dma_map_* APIs should not work on
> it to start with.

The problem is that the dma_map_resource() won't work for that, but
presumably the dma_map_sg()-like methods will (after some hacking with
the phys address, but anyway). Consider the system diagram in my
previous email. Here is what I would do to initialize a DMA
transaction between a platform device and a PCIe peripheral device:

1) struct resource *rsc = platform_get_resource(plat_dev, IORESOURCE_MEM, 0);

2) dma_addr_t dar = dma_map_resource(&pci_dev->dev, rsc->start, rsc->end - rsc->start + 1,
                                      DMA_FROM_DEVICE, 0);

3) dma_addr_t sar;
   void *tmp = dma_alloc_coherent(&pci_dev->dev, PAGE_SIZE, &sar, GFP_KERNEL);
   memset(tmp, 0xaa, PAGE_SIZE);

4) PCIe device: DMA.DAR=dar, DMA.SAR=sar. RUN.

If there is no dma-ranges specified in the PCIe Host controller
DT-node, the PCIe peripheral devices will see the rest of the system
memory as is (no offsets or remappings). But if dma-ranges is specified
with some specific system settings, it may affect the PCIe MRd/MWr TLP
address translation, including the addresses targeting the MMIO space.
In that case the mapping performed in step 2) will return a wrong
DMA address, since the corresponding dma_direct_map_resource() just
returns the passed physical address, missing the
'pci_dev->dma_range_map'-based mapping performed in
translate_phys_to_dma().

Note the mapping in step 3) works correctly because it calls the
translate_phys_to_dma() of the direct DMA interface, thus taking the
PCIe dma-ranges into account.
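
For instance (purely hypothetical numbers), if the host controller's
dma-ranges translated CPU addresses 0x00000000..0x3fffffff to PCIe bus
addresses 0x80000000..0xbfffffff, then for rsc->start == 0x1fa00000
step 2) ought to return 0x9fa00000, while the current
dma_direct_map_resource() returns 0x1fa00000 and the standard error
check doesn't even notice:

	/* Hypothetical continuation of step 2) above. */
	if (dma_mapping_error(&pci_dev->dev, dar))
		return -ENOMEM;
	/*
	 * With the current implementation this check passes, yet
	 * dar == rsc->start, i.e. the CPU-side address, not the PCIe bus
	 * address the iATU-translated MRd/MWr TLPs actually need.
	 */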

To sum up, as I see it, either restricting dma_map_resource() to
mapping just intra-bus addresses was wrong, or there must be some
additional mapping infrastructure for the systems in question. Though I
don't see a way dma_map_resource() could be fixed to suit every
considered case.

-Sergey
Robin Murphy April 21, 2022, 8:51 p.m. UTC | #8
On 2022-04-21 18:35, Serge Semin wrote:
> On Thu, Apr 21, 2022 at 04:45:36PM +0200, Christoph Hellwig wrote:
>> On Wed, Apr 20, 2022 at 11:55:38AM +0300, Serge Semin wrote:
>>> On Wed, Apr 20, 2022 at 10:47:46AM +0200, Christoph Hellwig wrote:
>>>> I can't really comment on the dma-ranges exlcusion for P2P mappings,
>>>> as that predates my involvedment, however:
>>>
>>> My example wasn't specific to the PCIe P2P transfers, but about PCIe
>>> devices reaching some platform devices over the system interconnect
>>> bus.
>>
>> So strike PCIe, but this our definition of Peer to Peer accesses.
>>
>>> What if I get to have a physical address of a platform device and want
>>> have that device being accessed by a PCIe peripheral device? The
>>> dma_map_resource() seemed very much suitable for that. But considering
>>> what you say it isn't.
>>
> 
>> dma_map_resource is the right thing for that.  But the physical address
>> of MMIO ranges in the platform device should not have struct pages
>> allocated for it, and thus the other dma_map_* APIs should not work on
>> it to start with.
> 
> The problem is that the dma_map_resource() won't work for that, but
> presumably the dma_map_sg()-like methods will (after some hacking with
> the phys address, but anyway). Consider the system diagram in my
> previous email. Here is what I would do to initialize a DMA
> transaction between a platform device and a PCIe peripheral device:
> 
> 1) struct resource *rsc = platform_get_resource(plat_dev, IORESOURCE_MEM, 0);
> 
> 2) dma_addr_t dar = dma_map_resource(&pci_dev->dev, rsc->start, rsc->end - rsc->start + 1,
>                                        DMA_FROM_DEVICE, 0);
> 
> 3) dma_addr_t sar;
>     void *tmp = dma_alloc_coherent(&pci_dev->dev, PAGE_SIZE, &sar, GFP_KERNEL);
>     memset(tmp, 0xaa, PAGE_SIZE);
> 
> 4) PCIe device: DMA.DAR=dar, DMA.SAR=sar. RUN.
> 
> If there is no dma-ranges specified in the PCIe Host controller
> DT-node, the PCIe peripheral devices will see the rest of the system
> memory as is (no offsets and remappings). But if there is dma-ranges
> with some specific system settings it may affect the PCIe MRd/MWr TLPs
> address translation including the addresses targeted to the MMIO
> space. In that case the mapping performed on step 2) will return a
> wrong DMA-address since the corresponding dma_direct_map_resource()
> just returns the passed physical address missing the
> 'pci_dev->dma_range_map'-based mapping performed in
> translate_phys_to_dma().
> 
> Note the mapping on step 3) works correctly because it calls the
> translate_phys_to_dma() of the direct DMA interface thus taking the
> PCie dma-ranges into account.
> 
> To sum up as I see it either restricting dma_map_resource() to map
> just the intra-bus addresses was wrong or there must be some
> additional mapping infrastructure for the denoted systems. Though I
> don't see a way the dma_map_resource() could be fixed to be suitable
> for each considered cases.

FWIW the current semantics of dma_map_resource() are basically just to 
insert IOMMU awareness where dmaengine drivers were previously just 
casting phys_addr_t to dma_addr_t (or u32, or whatever else they put 
into their descriptor/register/etc.) IIRC there was a bit of a question 
whether it really belonged in the DMA API at all, since it's not really 
a "DMA" operation in the conventional sense, and convenience was the 
only real deciding argument. The relevant drivers at the time were not 
taking dma_pfn_offset into account when consuming physical addresses 
directly, so the new API didn't either.

That's just how things got to where they are today. Once again, I'm not 
saying that what we have now is necessarily right, or that your change 
is necessarily wrong, I just really want to understand specifically 
*why* you need to make it, so we can evaluate the risk of possible 
breakage either way. Theoretical "if"s aren't really enough.

Robin.
Serge Semin April 24, 2022, 9:46 p.m. UTC | #9
On Thu, Apr 21, 2022 at 09:51:31PM +0100, Robin Murphy wrote:
> On 2022-04-21 18:35, Serge Semin wrote:
> > On Thu, Apr 21, 2022 at 04:45:36PM +0200, Christoph Hellwig wrote:
> > > On Wed, Apr 20, 2022 at 11:55:38AM +0300, Serge Semin wrote:
> > > > On Wed, Apr 20, 2022 at 10:47:46AM +0200, Christoph Hellwig wrote:
> > > > > I can't really comment on the dma-ranges exlcusion for P2P mappings,
> > > > > as that predates my involvedment, however:
> > > > 
> > > > My example wasn't specific to the PCIe P2P transfers, but about PCIe
> > > > devices reaching some platform devices over the system interconnect
> > > > bus.
> > > 
> > > So strike PCIe, but this our definition of Peer to Peer accesses.
> > > 
> > > > What if I get to have a physical address of a platform device and want
> > > > have that device being accessed by a PCIe peripheral device? The
> > > > dma_map_resource() seemed very much suitable for that. But considering
> > > > what you say it isn't.
> > > 
> > 
> > > dma_map_resource is the right thing for that.  But the physical address
> > > of MMIO ranges in the platform device should not have struct pages
> > > allocated for it, and thus the other dma_map_* APIs should not work on
> > > it to start with.
> > 
> > The problem is that the dma_map_resource() won't work for that, but
> > presumably the dma_map_sg()-like methods will (after some hacking with
> > the phys address, but anyway). Consider the system diagram in my
> > previous email. Here is what I would do to initialize a DMA
> > transaction between a platform device and a PCIe peripheral device:
> > 
> > 1) struct resource *rsc = platform_get_resource(plat_dev, IORESOURCE_MEM, 0);
> > 
> > 2) dma_addr_t dar = dma_map_resource(&pci_dev->dev, rsc->start, rsc->end - rsc->start + 1,
> >                                        DMA_FROM_DEVICE, 0);
> > 
> > 3) dma_addr_t sar;
> >     void *tmp = dma_alloc_coherent(&pci_dev->dev, PAGE_SIZE, &sar, GFP_KERNEL);
> >     memset(tmp, 0xaa, PAGE_SIZE);
> > 
> > 4) PCIe device: DMA.DAR=dar, DMA.SAR=sar. RUN.
> > 
> > If there is no dma-ranges specified in the PCIe Host controller
> > DT-node, the PCIe peripheral devices will see the rest of the system
> > memory as is (no offsets and remappings). But if there is dma-ranges
> > with some specific system settings it may affect the PCIe MRd/MWr TLPs
> > address translation including the addresses targeted to the MMIO
> > space. In that case the mapping performed on step 2) will return a
> > wrong DMA-address since the corresponding dma_direct_map_resource()
> > just returns the passed physical address missing the
> > 'pci_dev->dma_range_map'-based mapping performed in
> > translate_phys_to_dma().
> > 
> > Note the mapping on step 3) works correctly because it calls the
> > translate_phys_to_dma() of the direct DMA interface thus taking the
> > PCie dma-ranges into account.
> > 
> > To sum up as I see it either restricting dma_map_resource() to map
> > just the intra-bus addresses was wrong or there must be some
> > additional mapping infrastructure for the denoted systems. Though I
> > don't see a way the dma_map_resource() could be fixed to be suitable
> > for each considered cases.
> 

> FWIW the current semantics of dma_map_resource() are basically just to
> insert IOMMU awareness where dmaengine drivers were previously just casting
> phys_addr_t to dma_addr_t (or u32, or whatever else they put into their
> descriptor/register/etc.) IIRC there was a bit of a question whether it
> really belonged in the DMA API at all, since it's not really a "DMA"
> operation in the conventional sense, and convenience was the only real
> deciding argument. The relevant drivers at the time were not taking
> dma_pfn_offset into account when consuming physical addresses directly, so
> the new API didn't either.
> 
> That's just how things got to where they are today. 

I see. Thanks for the clarification. Right, IOMMU is the only reason
to have the current dma_map_resource() implementation.

> Once again, I'm not
> saying that what we have now is necessarily right, or that your change is
> necessarily wrong, I just really want to understand specifically *why* you
> need to make it, so we can evaluate the risk of possible breakage either
> way. Theoretical "if"s aren't really enough.

As I already said, our SoC has the following structure (obviously the
diagram is very much simplified, but the gist is the same):
              +-----+
              | DDR |
              +--+--+
                 |
  +-----+ +------+-------+ +---------+
  | CPU +-+ Interconnect +-+ DEVs... |
  +-----+ +-----^-+------+ +---------+
     dma-ranges-| |-ranges
              +-+-v-+
              | PCI |
              +-----+
The PCIe peripheral devices are connected to the rest of the system
via the DW PCIe Host controller. If the controller has the inbound
iATU configured to re-map the system memory (RAM, IOMEM) in a
non-one-to-one way (using the dma-ranges DT property of the PCIe host
controller), then all the PCIe bus MRd/MWr TLP addresses will be
translated accordingly on their way to all the slave devices connected
to the interconnect, including the MMIO devices. At the moment the
kernel DMA API only provides methods to get the PCIe-bus-visible RAM
addresses, while physical addresses (for instance, of the MMIO devices)
can't be correctly translated for such a case. I thought that
dma_map_resource() could do the trick, but it turned out it doesn't
take the dma-ranges mapping into account.

To be fully honest, currently we don't really have any platform with a
strong requirement for doing DMA from the PCIe peripheral devices to
the platform devices. But since the PCIe bus is an extendable bus
(cold- and hot-pluggable), such a requirement may arise in practice,
for instance on a platform with a PCIe NTB device attached to the PCIe
bus and configured to access the system MMIO devices via the bridge. I
find that part potentially problematic, seeing that the practical use
case is unsupported just due to the incomplete API. Moreover, the
dma_direct_map_resource() semantics being different from the rest of
the direct DMA-mapping methods doesn't seem right from the usability
point of view. Finally, as you can see, having dma_direct_map_resource()
defined as MMIO-specific doesn't mean that the dma_pfn_offset-based
mapping isn't supposed to be taken into account.

-Sergey

> 
> Robin.
diff mbox series

Patch

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 50f48e9e4598..9ce8192b29ab 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -497,7 +497,7 @@  int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
 dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-	dma_addr_t dma_addr = paddr;
+	dma_addr_t dma_addr = phys_to_dma_direct(dev, paddr);
 
 	if (unlikely(!dma_capable(dev, dma_addr, size, false))) {
 		dev_err_once(dev,