mbox series

[v4,0/7] iommu, dma-mapping: Simplify arch_setup_dma_ops()

Message ID cover.1713523152.git.robin.murphy@arm.com
Headers show
Series iommu, dma-mapping: Simplify arch_setup_dma_ops() | expand

Message

Robin Murphy April 19, 2024, 4:54 p.m. UTC
v3: https://lore.kernel.org/linux-iommu/cover.1707493264.git.robin.murphy@arm.com/

Hi all,

Since this ended up missing the boat for 6.9, here's a rebase and resend
with the additional tags from v3 collected.

Cheers,
Robin.


Robin Murphy (7):
  OF: Retire dma-ranges mask workaround
  OF: Simplify DMA range calculations
  ACPI/IORT: Handle memory address size limits as limits
  dma-mapping: Add helpers for dma_range_map bounds
  iommu/dma: Make limit checks self-contained
  iommu/dma: Centralise iommu_setup_dma_ops()
  dma-mapping: Simplify arch_setup_dma_ops()

 arch/arc/mm/dma.c               |  3 +--
 arch/arm/mm/dma-mapping-nommu.c |  3 +--
 arch/arm/mm/dma-mapping.c       | 16 +++++++------
 arch/arm64/mm/dma-mapping.c     |  5 +---
 arch/loongarch/kernel/dma.c     |  9 ++-----
 arch/mips/mm/dma-noncoherent.c  |  3 +--
 arch/riscv/mm/dma-noncoherent.c |  3 +--
 drivers/acpi/arm64/dma.c        | 17 ++++---------
 drivers/acpi/arm64/iort.c       | 20 ++++++++--------
 drivers/acpi/scan.c             |  7 +-----
 drivers/hv/hv_common.c          |  6 +----
 drivers/iommu/amd/iommu.c       |  8 -------
 drivers/iommu/dma-iommu.c       | 39 ++++++++++++------------------
 drivers/iommu/dma-iommu.h       | 14 +++++------
 drivers/iommu/intel/iommu.c     |  7 ------
 drivers/iommu/iommu.c           | 20 ++++++----------
 drivers/iommu/s390-iommu.c      |  6 -----
 drivers/iommu/virtio-iommu.c    | 10 --------
 drivers/of/device.c             | 42 ++++++---------------------------
 include/linux/acpi_iort.h       |  4 ++--
 include/linux/dma-direct.h      | 18 ++++++++++++++
 include/linux/dma-map-ops.h     |  6 ++---
 include/linux/iommu.h           |  7 ------
 23 files changed, 89 insertions(+), 184 deletions(-)

Comments

Christoph Hellwig April 22, 2024, 6:37 a.m. UTC | #1
On Fri, Apr 19, 2024 at 05:54:39PM +0100, Robin Murphy wrote:
> v3: https://lore.kernel.org/linux-iommu/cover.1707493264.git.robin.murphy@arm.com/
> 
> Hi all,
> 
> Since this ended up missing the boat for 6.9, here's a rebase and resend
> with the additional tags from v3 collected.

And just to clarify:  I expect this to go in through the iommu tree.
If you need further action from me, just let me know.
Catalin Marinas April 23, 2024, 9:57 a.m. UTC | #2
On Fri, Apr 19, 2024 at 05:54:45PM +0100, Robin Murphy wrote:
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 61886e43e3a1..313d8938a2f0 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -58,8 +58,6 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>  		   ARCH_DMA_MINALIGN, cls);
>  
>  	dev->dma_coherent = coherent;
> -	if (device_iommu_mapped(dev))
> -		iommu_setup_dma_ops(dev, dma_base, dma_base + size - 1);
>  
>  	xen_setup_dma_ops(dev);
>  }

In case you need an ack for the arm64 changes:

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Joerg Roedel April 26, 2024, 10:07 a.m. UTC | #3
On Fri, Apr 19, 2024 at 05:54:39PM +0100, Robin Murphy wrote:
> Since this ended up missing the boat for 6.9, here's a rebase and resend
> with the additional tags from v3 collected.

Applied, thanks.
Dmitry Baryshkov April 29, 2024, 4:31 p.m. UTC | #4
On Fri, Apr 19, 2024 at 05:54:45PM +0100, Robin Murphy wrote:
> It's somewhat hard to see, but arm64's arch_setup_dma_ops() should only
> ever call iommu_setup_dma_ops() after a successful iommu_probe_device(),
> which means there should be no harm in achieving the same order of
> operations by running it off the back of iommu_probe_device() itself.
> This then puts it in line with the x86 and s390 .probe_finalize bodges,
> letting us pull it all into the main flow properly. As a bonus this lets
> us fold in and de-scope the PCI workaround setup as well.
> 
> At this point we can also then pull the call up inside the group mutex,
> and avoid having to think about whether iommu_group_store_type() could
> theoretically race and free the domain if iommu_setup_dma_ops() ran just
> *before* iommu_device_use_default_domain() claims it... Furthermore we
> replace one .probe_finalize call completely, since the only remaining
> implementations are now one which only needs to run once for the initial
> boot-time probe, and two which themselves render that path unreachable.
> 
> This leaves us a big step closer to realistically being able to unpick
> the variety of different things that iommu_setup_dma_ops() has been
> muddling together, and further streamline iommu-dma into core API flows
> in future.
> 
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> # For Intel IOMMU
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Tested-by: Hanjun Guo <guohanjun@huawei.com>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
> v2: Shuffle around to make sure the iommu_group_do_probe_finalize() case
>     is covered as well, with bonus side-effects as above.
> v3: *Really* do that, remembering the other two probe_finalize sites too.
> ---
>  arch/arm64/mm/dma-mapping.c  |  2 --
>  drivers/iommu/amd/iommu.c    |  8 --------
>  drivers/iommu/dma-iommu.c    | 18 ++++++------------
>  drivers/iommu/dma-iommu.h    | 14 ++++++--------
>  drivers/iommu/intel/iommu.c  |  7 -------
>  drivers/iommu/iommu.c        | 20 +++++++-------------
>  drivers/iommu/s390-iommu.c   |  6 ------
>  drivers/iommu/virtio-iommu.c | 10 ----------
>  include/linux/iommu.h        |  7 -------
>  9 files changed, 19 insertions(+), 73 deletions(-)

This patch breaks UFS on Qualcomm SC8180X Primus platform:


[    3.846856] arm-smmu 15000000.iommu: Unhandled context fault: fsr=0x402, iova=0x1032db3e0, fsynr=0x130000, cbfrsynra=0x300, cb=4
[    3.846880] ufshcd-qcom 1d84000.ufshc: ufshcd_check_errors: saved_err 0x20000 saved_uic_err 0x0
[    3.846929] host_regs: 00000000: 1587031f 00000000 00000300 00000000
[    3.846935] host_regs: 00000010: 01000000 00010217 00000000 00000000
[    3.846941] host_regs: 00000020: 00000000 00070ef5 00000000 00000000
[    3.846946] host_regs: 00000030: 0000000f 00000001 00000000 00000000
[    3.846951] host_regs: 00000040: 00000000 00000000 00000000 00000000
[    3.846956] host_regs: 00000050: 032db000 00000001 00000000 00000000
[    3.846962] host_regs: 00000060: 00000000 80000000 00000000 00000000
[    3.846967] host_regs: 00000070: 032dd000 00000001 00000000 00000000
[    3.846972] host_regs: 00000080: 00000000 00000000 00000000 00000000
[    3.846977] host_regs: 00000090: 00000016 00000000 00000000 0000000c
[    3.847074] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler started; HBA state eh_fatal; powered 1; shutting down 0; saved_err = 131072; saved_uic_err = 0; force_reset = 0
[    4.406550] ufshcd-qcom 1d84000.ufshc: ufshcd_verify_dev_init: NOP OUT failed -11
[    4.417953] ufshcd-qcom 1d84000.ufshc: ufshcd_async_scan failed: -11
Dmitry Baryshkov April 29, 2024, 9:26 p.m. UTC | #5
On Mon, 29 Apr 2024 at 19:31, Dmitry Baryshkov
<dmitry.baryshkov@linaro.org> wrote:
>
> On Fri, Apr 19, 2024 at 05:54:45PM +0100, Robin Murphy wrote:
> > It's somewhat hard to see, but arm64's arch_setup_dma_ops() should only
> > ever call iommu_setup_dma_ops() after a successful iommu_probe_device(),
> > which means there should be no harm in achieving the same order of
> > operations by running it off the back of iommu_probe_device() itself.
> > This then puts it in line with the x86 and s390 .probe_finalize bodges,
> > letting us pull it all into the main flow properly. As a bonus this lets
> > us fold in and de-scope the PCI workaround setup as well.
> >
> > At this point we can also then pull the call up inside the group mutex,
> > and avoid having to think about whether iommu_group_store_type() could
> > theoretically race and free the domain if iommu_setup_dma_ops() ran just
> > *before* iommu_device_use_default_domain() claims it... Furthermore we
> > replace one .probe_finalize call completely, since the only remaining
> > implementations are now one which only needs to run once for the initial
> > boot-time probe, and two which themselves render that path unreachable.
> >
> > This leaves us a big step closer to realistically being able to unpick
> > the variety of different things that iommu_setup_dma_ops() has been
> > muddling together, and further streamline iommu-dma into core API flows
> > in future.
> >
> > Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> # For Intel IOMMU
> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > Tested-by: Hanjun Guo <guohanjun@huawei.com>
> > Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> > ---
> > v2: Shuffle around to make sure the iommu_group_do_probe_finalize() case
> >     is covered as well, with bonus side-effects as above.
> > v3: *Really* do that, remembering the other two probe_finalize sites too.
> > ---
> >  arch/arm64/mm/dma-mapping.c  |  2 --
> >  drivers/iommu/amd/iommu.c    |  8 --------
> >  drivers/iommu/dma-iommu.c    | 18 ++++++------------
> >  drivers/iommu/dma-iommu.h    | 14 ++++++--------
> >  drivers/iommu/intel/iommu.c  |  7 -------
> >  drivers/iommu/iommu.c        | 20 +++++++-------------
> >  drivers/iommu/s390-iommu.c   |  6 ------
> >  drivers/iommu/virtio-iommu.c | 10 ----------
> >  include/linux/iommu.h        |  7 -------
> >  9 files changed, 19 insertions(+), 73 deletions(-)
>
> This patch breaks UFS on Qualcomm SC8180X Primus platform:
>
>
> [    3.846856] arm-smmu 15000000.iommu: Unhandled context fault: fsr=0x402, iova=0x1032db3e0, fsynr=0x130000, cbfrsynra=0x300, cb=4
> [    3.846880] ufshcd-qcom 1d84000.ufshc: ufshcd_check_errors: saved_err 0x20000 saved_uic_err 0x0
> [    3.846929] host_regs: 00000000: 1587031f 00000000 00000300 00000000
> [    3.846935] host_regs: 00000010: 01000000 00010217 00000000 00000000
> [    3.846941] host_regs: 00000020: 00000000 00070ef5 00000000 00000000
> [    3.846946] host_regs: 00000030: 0000000f 00000001 00000000 00000000
> [    3.846951] host_regs: 00000040: 00000000 00000000 00000000 00000000
> [    3.846956] host_regs: 00000050: 032db000 00000001 00000000 00000000
> [    3.846962] host_regs: 00000060: 00000000 80000000 00000000 00000000
> [    3.846967] host_regs: 00000070: 032dd000 00000001 00000000 00000000
> [    3.846972] host_regs: 00000080: 00000000 00000000 00000000 00000000
> [    3.846977] host_regs: 00000090: 00000016 00000000 00000000 0000000c
> [    3.847074] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler started; HBA state eh_fatal; powered 1; shutting down 0; saved_err = 131072; saved_uic_err = 0; force_reset = 0
> [    4.406550] ufshcd-qcom 1d84000.ufshc: ufshcd_verify_dev_init: NOP OUT failed -11
> [    4.417953] ufshcd-qcom 1d84000.ufshc: ufshcd_async_scan failed: -11

Just to confirm: reverting f091e93306e0 ("dma-mapping: Simplify
arch_setup_dma_ops()") and b67483b3c44e ("iommu/dma: Centralise
iommu_setup_dma_ops()" fixes the issue for me. Please ping me if you'd
like me to test a fix.
Robin Murphy April 29, 2024, 10:26 p.m. UTC | #6
On 2024-04-29 5:31 pm, Dmitry Baryshkov wrote:
> On Fri, Apr 19, 2024 at 05:54:45PM +0100, Robin Murphy wrote:
>> It's somewhat hard to see, but arm64's arch_setup_dma_ops() should only
>> ever call iommu_setup_dma_ops() after a successful iommu_probe_device(),
>> which means there should be no harm in achieving the same order of
>> operations by running it off the back of iommu_probe_device() itself.
>> This then puts it in line with the x86 and s390 .probe_finalize bodges,
>> letting us pull it all into the main flow properly. As a bonus this lets
>> us fold in and de-scope the PCI workaround setup as well.
>>
>> At this point we can also then pull the call up inside the group mutex,
>> and avoid having to think about whether iommu_group_store_type() could
>> theoretically race and free the domain if iommu_setup_dma_ops() ran just
>> *before* iommu_device_use_default_domain() claims it... Furthermore we
>> replace one .probe_finalize call completely, since the only remaining
>> implementations are now one which only needs to run once for the initial
>> boot-time probe, and two which themselves render that path unreachable.
>>
>> This leaves us a big step closer to realistically being able to unpick
>> the variety of different things that iommu_setup_dma_ops() has been
>> muddling together, and further streamline iommu-dma into core API flows
>> in future.
>>
>> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> # For Intel IOMMU
>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>> Tested-by: Hanjun Guo <guohanjun@huawei.com>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>> v2: Shuffle around to make sure the iommu_group_do_probe_finalize() case
>>      is covered as well, with bonus side-effects as above.
>> v3: *Really* do that, remembering the other two probe_finalize sites too.
>> ---
>>   arch/arm64/mm/dma-mapping.c  |  2 --
>>   drivers/iommu/amd/iommu.c    |  8 --------
>>   drivers/iommu/dma-iommu.c    | 18 ++++++------------
>>   drivers/iommu/dma-iommu.h    | 14 ++++++--------
>>   drivers/iommu/intel/iommu.c  |  7 -------
>>   drivers/iommu/iommu.c        | 20 +++++++-------------
>>   drivers/iommu/s390-iommu.c   |  6 ------
>>   drivers/iommu/virtio-iommu.c | 10 ----------
>>   include/linux/iommu.h        |  7 -------
>>   9 files changed, 19 insertions(+), 73 deletions(-)
> 
> This patch breaks UFS on Qualcomm SC8180X Primus platform:
> 
> 
> [    3.846856] arm-smmu 15000000.iommu: Unhandled context fault: fsr=0x402, iova=0x1032db3e0, fsynr=0x130000, cbfrsynra=0x300, cb=4

Hmm, a context fault implies that the device did get attached to a DMA 
domain, thus has successfully been through __iommu_probe_device(), yet 
somehow still didn't get the right DMA ops (since that "IOVA" looks more 
like a PA to me). Do you see the "Adding to IOMMU group..." message for 
this device, and/or any other relevant messages or errors before this 
point? I'm guessing there's a fair chance probe deferral might be 
involved as well. I'd like to understand what path(s) this ends up 
taking through __iommu_probe_device() and of_dma_configure(), or at 
least the number and order of probe attempts between the UFS and SMMU 
drivers.

I'll stare at the code in the morning and see if I can spot any 
overlooked ways in which what I think might be happening could happen, 
but any more info to help narrow it down would be much appreciated.

Thanks,
Robin.

> [    3.846880] ufshcd-qcom 1d84000.ufshc: ufshcd_check_errors: saved_err 0x20000 saved_uic_err 0x0
> [    3.846929] host_regs: 00000000: 1587031f 00000000 00000300 00000000
> [    3.846935] host_regs: 00000010: 01000000 00010217 00000000 00000000
> [    3.846941] host_regs: 00000020: 00000000 00070ef5 00000000 00000000
> [    3.846946] host_regs: 00000030: 0000000f 00000001 00000000 00000000
> [    3.846951] host_regs: 00000040: 00000000 00000000 00000000 00000000
> [    3.846956] host_regs: 00000050: 032db000 00000001 00000000 00000000
> [    3.846962] host_regs: 00000060: 00000000 80000000 00000000 00000000
> [    3.846967] host_regs: 00000070: 032dd000 00000001 00000000 00000000
> [    3.846972] host_regs: 00000080: 00000000 00000000 00000000 00000000
> [    3.846977] host_regs: 00000090: 00000016 00000000 00000000 0000000c
> [    3.847074] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler started; HBA state eh_fatal; powered 1; shutting down 0; saved_err = 131072; saved_uic_err = 0; force_reset = 0
> [    4.406550] ufshcd-qcom 1d84000.ufshc: ufshcd_verify_dev_init: NOP OUT failed -11
> [    4.417953] ufshcd-qcom 1d84000.ufshc: ufshcd_async_scan failed: -11
>
Dmitry Baryshkov April 30, 2024, 12:41 a.m. UTC | #7
On Tue, 30 Apr 2024 at 01:26, Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2024-04-29 5:31 pm, Dmitry Baryshkov wrote:
> > On Fri, Apr 19, 2024 at 05:54:45PM +0100, Robin Murphy wrote:
> >> It's somewhat hard to see, but arm64's arch_setup_dma_ops() should only
> >> ever call iommu_setup_dma_ops() after a successful iommu_probe_device(),
> >> which means there should be no harm in achieving the same order of
> >> operations by running it off the back of iommu_probe_device() itself.
> >> This then puts it in line with the x86 and s390 .probe_finalize bodges,
> >> letting us pull it all into the main flow properly. As a bonus this lets
> >> us fold in and de-scope the PCI workaround setup as well.
> >>
> >> At this point we can also then pull the call up inside the group mutex,
> >> and avoid having to think about whether iommu_group_store_type() could
> >> theoretically race and free the domain if iommu_setup_dma_ops() ran just
> >> *before* iommu_device_use_default_domain() claims it... Furthermore we
> >> replace one .probe_finalize call completely, since the only remaining
> >> implementations are now one which only needs to run once for the initial
> >> boot-time probe, and two which themselves render that path unreachable.
> >>
> >> This leaves us a big step closer to realistically being able to unpick
> >> the variety of different things that iommu_setup_dma_ops() has been
> >> muddling together, and further streamline iommu-dma into core API flows
> >> in future.
> >>
> >> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> # For Intel IOMMU
> >> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> >> Tested-by: Hanjun Guo <guohanjun@huawei.com>
> >> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> >> ---
> >> v2: Shuffle around to make sure the iommu_group_do_probe_finalize() case
> >>      is covered as well, with bonus side-effects as above.
> >> v3: *Really* do that, remembering the other two probe_finalize sites too.
> >> ---
> >>   arch/arm64/mm/dma-mapping.c  |  2 --
> >>   drivers/iommu/amd/iommu.c    |  8 --------
> >>   drivers/iommu/dma-iommu.c    | 18 ++++++------------
> >>   drivers/iommu/dma-iommu.h    | 14 ++++++--------
> >>   drivers/iommu/intel/iommu.c  |  7 -------
> >>   drivers/iommu/iommu.c        | 20 +++++++-------------
> >>   drivers/iommu/s390-iommu.c   |  6 ------
> >>   drivers/iommu/virtio-iommu.c | 10 ----------
> >>   include/linux/iommu.h        |  7 -------
> >>   9 files changed, 19 insertions(+), 73 deletions(-)
> >
> > This patch breaks UFS on Qualcomm SC8180X Primus platform:
> >
> >
> > [    3.846856] arm-smmu 15000000.iommu: Unhandled context fault: fsr=0x402, iova=0x1032db3e0, fsynr=0x130000, cbfrsynra=0x300, cb=4
>
> Hmm, a context fault implies that the device did get attached to a DMA
> domain, thus has successfully been through __iommu_probe_device(), yet
> somehow still didn't get the right DMA ops (since that "IOVA" looks more
> like a PA to me). Do you see the "Adding to IOMMU group..." message for
> this device, and/or any other relevant messages or errors before this
> point?

No, nothing relevant.

[    8.372395] ufshcd-qcom 1d84000.ufshc: Adding to iommu group 6

(please ignore the timestamp, it comes before ufshc being probed).

> I'm guessing there's a fair chance probe deferral might be
> involved as well. I'd like to understand what path(s) this ends up
> taking through __iommu_probe_device() and of_dma_configure(), or at
> least the number and order of probe attempts between the UFS and SMMU
> drivers.

__iommu_probe_device() gets called twice and returns early because ops is NULL.

Then finally of_dma_configure_id() is called. The following branches are taken:

np == dev->of_node
of_dma_get_range() returned 0
bus_dma_limit and dma_range_map are set
__iommu_probe_device() is called, using the `!group->default_domain &&
!group_lis` case, then group->default_domain() is not NULL,
In the end, iommu_setup_dma_ops() is called.

Then the ufshc probe defers (most likely the PHY is not present or
some other device is not there yet).

On the next (succeeding) try, of_dma_configure_id() is called again.
The call trace is more or less the same, except that
__iommu_probe_device() is not called

> I'll stare at the code in the morning and see if I can spot any
> overlooked ways in which what I think might be happening could happen,
> but any more info to help narrow it down would be much appreciated.
>
> Thanks,
> Robin.
>
> > [    3.846880] ufshcd-qcom 1d84000.ufshc: ufshcd_check_errors: saved_err 0x20000 saved_uic_err 0x0
> > [    3.846929] host_regs: 00000000: 1587031f 00000000 00000300 00000000
> > [    3.846935] host_regs: 00000010: 01000000 00010217 00000000 00000000
> > [    3.846941] host_regs: 00000020: 00000000 00070ef5 00000000 00000000
> > [    3.846946] host_regs: 00000030: 0000000f 00000001 00000000 00000000
> > [    3.846951] host_regs: 00000040: 00000000 00000000 00000000 00000000
> > [    3.846956] host_regs: 00000050: 032db000 00000001 00000000 00000000
> > [    3.846962] host_regs: 00000060: 00000000 80000000 00000000 00000000
> > [    3.846967] host_regs: 00000070: 032dd000 00000001 00000000 00000000
> > [    3.846972] host_regs: 00000080: 00000000 00000000 00000000 00000000
> > [    3.846977] host_regs: 00000090: 00000016 00000000 00000000 0000000c
> > [    3.847074] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler started; HBA state eh_fatal; powered 1; shutting down 0; saved_err = 131072; saved_uic_err = 0; force_reset = 0
> > [    4.406550] ufshcd-qcom 1d84000.ufshc: ufshcd_verify_dev_init: NOP OUT failed -11
> > [    4.417953] ufshcd-qcom 1d84000.ufshc: ufshcd_async_scan failed: -11
> >
Robin Murphy April 30, 2024, 10:20 a.m. UTC | #8
On 2024-04-30 1:41 am, Dmitry Baryshkov wrote:
> On Tue, 30 Apr 2024 at 01:26, Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> On 2024-04-29 5:31 pm, Dmitry Baryshkov wrote:
>>> On Fri, Apr 19, 2024 at 05:54:45PM +0100, Robin Murphy wrote:
>>>> It's somewhat hard to see, but arm64's arch_setup_dma_ops() should only
>>>> ever call iommu_setup_dma_ops() after a successful iommu_probe_device(),
>>>> which means there should be no harm in achieving the same order of
>>>> operations by running it off the back of iommu_probe_device() itself.
>>>> This then puts it in line with the x86 and s390 .probe_finalize bodges,
>>>> letting us pull it all into the main flow properly. As a bonus this lets
>>>> us fold in and de-scope the PCI workaround setup as well.
>>>>
>>>> At this point we can also then pull the call up inside the group mutex,
>>>> and avoid having to think about whether iommu_group_store_type() could
>>>> theoretically race and free the domain if iommu_setup_dma_ops() ran just
>>>> *before* iommu_device_use_default_domain() claims it... Furthermore we
>>>> replace one .probe_finalize call completely, since the only remaining
>>>> implementations are now one which only needs to run once for the initial
>>>> boot-time probe, and two which themselves render that path unreachable.
>>>>
>>>> This leaves us a big step closer to realistically being able to unpick
>>>> the variety of different things that iommu_setup_dma_ops() has been
>>>> muddling together, and further streamline iommu-dma into core API flows
>>>> in future.
>>>>
>>>> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> # For Intel IOMMU
>>>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>>>> Tested-by: Hanjun Guo <guohanjun@huawei.com>
>>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>>> ---
>>>> v2: Shuffle around to make sure the iommu_group_do_probe_finalize() case
>>>>       is covered as well, with bonus side-effects as above.
>>>> v3: *Really* do that, remembering the other two probe_finalize sites too.
>>>> ---
>>>>    arch/arm64/mm/dma-mapping.c  |  2 --
>>>>    drivers/iommu/amd/iommu.c    |  8 --------
>>>>    drivers/iommu/dma-iommu.c    | 18 ++++++------------
>>>>    drivers/iommu/dma-iommu.h    | 14 ++++++--------
>>>>    drivers/iommu/intel/iommu.c  |  7 -------
>>>>    drivers/iommu/iommu.c        | 20 +++++++-------------
>>>>    drivers/iommu/s390-iommu.c   |  6 ------
>>>>    drivers/iommu/virtio-iommu.c | 10 ----------
>>>>    include/linux/iommu.h        |  7 -------
>>>>    9 files changed, 19 insertions(+), 73 deletions(-)
>>>
>>> This patch breaks UFS on Qualcomm SC8180X Primus platform:
>>>
>>>
>>> [    3.846856] arm-smmu 15000000.iommu: Unhandled context fault: fsr=0x402, iova=0x1032db3e0, fsynr=0x130000, cbfrsynra=0x300, cb=4
>>
>> Hmm, a context fault implies that the device did get attached to a DMA
>> domain, thus has successfully been through __iommu_probe_device(), yet
>> somehow still didn't get the right DMA ops (since that "IOVA" looks more
>> like a PA to me). Do you see the "Adding to IOMMU group..." message for
>> this device, and/or any other relevant messages or errors before this
>> point?
> 
> No, nothing relevant.
> 
> [    8.372395] ufshcd-qcom 1d84000.ufshc: Adding to iommu group 6
> 
> (please ignore the timestamp, it comes before ufshc being probed).
> 
>> I'm guessing there's a fair chance probe deferral might be
>> involved as well. I'd like to understand what path(s) this ends up
>> taking through __iommu_probe_device() and of_dma_configure(), or at
>> least the number and order of probe attempts between the UFS and SMMU
>> drivers.
> 
> __iommu_probe_device() gets called twice and returns early because ops is NULL.
> 
> Then finally of_dma_configure_id() is called. The following branches are taken:
> 
> np == dev->of_node
> of_dma_get_range() returned 0
> bus_dma_limit and dma_range_map are set
> __iommu_probe_device() is called, using the `!group->default_domain &&
> !group_lis` case, then group->default_domain() is not NULL,
> In the end, iommu_setup_dma_ops() is called.
> 
> Then the ufshc probe defers (most likely the PHY is not present or
> some other device is not there yet).

Ah good, probe deferral. And indeed the half-formed hunch from last 
night grew into a pretty definite idea by this morning... patch incoming.

Thanks,
Robin.

> On the next (succeeding) try, of_dma_configure_id() is called again.
> The call trace is more or less the same, except that
> __iommu_probe_device() is not called
> 
>> I'll stare at the code in the morning and see if I can spot any
>> overlooked ways in which what I think might be happening could happen,
>> but any more info to help narrow it down would be much appreciated.
>>
>> Thanks,
>> Robin.
>>
>>> [    3.846880] ufshcd-qcom 1d84000.ufshc: ufshcd_check_errors: saved_err 0x20000 saved_uic_err 0x0
>>> [    3.846929] host_regs: 00000000: 1587031f 00000000 00000300 00000000
>>> [    3.846935] host_regs: 00000010: 01000000 00010217 00000000 00000000
>>> [    3.846941] host_regs: 00000020: 00000000 00070ef5 00000000 00000000
>>> [    3.846946] host_regs: 00000030: 0000000f 00000001 00000000 00000000
>>> [    3.846951] host_regs: 00000040: 00000000 00000000 00000000 00000000
>>> [    3.846956] host_regs: 00000050: 032db000 00000001 00000000 00000000
>>> [    3.846962] host_regs: 00000060: 00000000 80000000 00000000 00000000
>>> [    3.846967] host_regs: 00000070: 032dd000 00000001 00000000 00000000
>>> [    3.846972] host_regs: 00000080: 00000000 00000000 00000000 00000000
>>> [    3.846977] host_regs: 00000090: 00000016 00000000 00000000 0000000c
>>> [    3.847074] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler started; HBA state eh_fatal; powered 1; shutting down 0; saved_err = 131072; saved_uic_err = 0; force_reset = 0
>>> [    4.406550] ufshcd-qcom 1d84000.ufshc: ufshcd_verify_dev_init: NOP OUT failed -11
>>> [    4.417953] ufshcd-qcom 1d84000.ufshc: ufshcd_async_scan failed: -11
>>>
> 
> 
>
Dmitry Baryshkov April 30, 2024, 10:56 a.m. UTC | #9
On Tue, 30 Apr 2024 at 13:20, Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2024-04-30 1:41 am, Dmitry Baryshkov wrote:
> > On Tue, 30 Apr 2024 at 01:26, Robin Murphy <robin.murphy@arm.com> wrote:
> >>
> >> On 2024-04-29 5:31 pm, Dmitry Baryshkov wrote:
> >>> On Fri, Apr 19, 2024 at 05:54:45PM +0100, Robin Murphy wrote:
> >>>> It's somewhat hard to see, but arm64's arch_setup_dma_ops() should only
> >>>> ever call iommu_setup_dma_ops() after a successful iommu_probe_device(),
> >>>> which means there should be no harm in achieving the same order of
> >>>> operations by running it off the back of iommu_probe_device() itself.
> >>>> This then puts it in line with the x86 and s390 .probe_finalize bodges,
> >>>> letting us pull it all into the main flow properly. As a bonus this lets
> >>>> us fold in and de-scope the PCI workaround setup as well.
> >>>>
> >>>> At this point we can also then pull the call up inside the group mutex,
> >>>> and avoid having to think about whether iommu_group_store_type() could
> >>>> theoretically race and free the domain if iommu_setup_dma_ops() ran just
> >>>> *before* iommu_device_use_default_domain() claims it... Furthermore we
> >>>> replace one .probe_finalize call completely, since the only remaining
> >>>> implementations are now one which only needs to run once for the initial
> >>>> boot-time probe, and two which themselves render that path unreachable.
> >>>>
> >>>> This leaves us a big step closer to realistically being able to unpick
> >>>> the variety of different things that iommu_setup_dma_ops() has been
> >>>> muddling together, and further streamline iommu-dma into core API flows
> >>>> in future.
> >>>>
> >>>> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> # For Intel IOMMU
> >>>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> >>>> Tested-by: Hanjun Guo <guohanjun@huawei.com>
> >>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> >>>> ---
> >>>> v2: Shuffle around to make sure the iommu_group_do_probe_finalize() case
> >>>>       is covered as well, with bonus side-effects as above.
> >>>> v3: *Really* do that, remembering the other two probe_finalize sites too.
> >>>> ---
> >>>>    arch/arm64/mm/dma-mapping.c  |  2 --
> >>>>    drivers/iommu/amd/iommu.c    |  8 --------
> >>>>    drivers/iommu/dma-iommu.c    | 18 ++++++------------
> >>>>    drivers/iommu/dma-iommu.h    | 14 ++++++--------
> >>>>    drivers/iommu/intel/iommu.c  |  7 -------
> >>>>    drivers/iommu/iommu.c        | 20 +++++++-------------
> >>>>    drivers/iommu/s390-iommu.c   |  6 ------
> >>>>    drivers/iommu/virtio-iommu.c | 10 ----------
> >>>>    include/linux/iommu.h        |  7 -------
> >>>>    9 files changed, 19 insertions(+), 73 deletions(-)
> >>>
> >>> This patch breaks UFS on Qualcomm SC8180X Primus platform:
> >>>
> >>>
> >>> [    3.846856] arm-smmu 15000000.iommu: Unhandled context fault: fsr=0x402, iova=0x1032db3e0, fsynr=0x130000, cbfrsynra=0x300, cb=4
> >>
> >> Hmm, a context fault implies that the device did get attached to a DMA
> >> domain, thus has successfully been through __iommu_probe_device(), yet
> >> somehow still didn't get the right DMA ops (since that "IOVA" looks more
> >> like a PA to me). Do you see the "Adding to IOMMU group..." message for
> >> this device, and/or any other relevant messages or errors before this
> >> point?
> >
> > No, nothing relevant.
> >
> > [    8.372395] ufshcd-qcom 1d84000.ufshc: Adding to iommu group 6
> >
> > (please ignore the timestamp, it comes before ufshc being probed).
> >
> >> I'm guessing there's a fair chance probe deferral might be
> >> involved as well. I'd like to understand what path(s) this ends up
> >> taking through __iommu_probe_device() and of_dma_configure(), or at
> >> least the number and order of probe attempts between the UFS and SMMU
> >> drivers.
> >
> > __iommu_probe_device() gets called twice and returns early because ops is NULL.
> >
> > Then finally of_dma_configure_id() is called. The following branches are taken:
> >
> > np == dev->of_node
> > of_dma_get_range() returned 0
> > bus_dma_limit and dma_range_map are set
> > __iommu_probe_device() is called, using the `!group->default_domain &&
> > !group_lis` case, then group->default_domain() is not NULL,
> > In the end, iommu_setup_dma_ops() is called.
> >
> > Then the ufshc probe defers (most likely the PHY is not present or
> > some other device is not there yet).
>
> Ah good, probe deferral. And indeed the half-formed hunch from last
> night grew into a pretty definite idea by this morning... patch incoming.

Thanks a lot for the quick fix!
Konrad Dybcio April 30, 2024, 12:23 p.m. UTC | #10
On 29.04.2024 11:26 PM, Dmitry Baryshkov wrote:
> On Mon, 29 Apr 2024 at 19:31, Dmitry Baryshkov
> <dmitry.baryshkov@linaro.org> wrote:
>>
>> On Fri, Apr 19, 2024 at 05:54:45PM +0100, Robin Murphy wrote:
>>> It's somewhat hard to see, but arm64's arch_setup_dma_ops() should only
>>> ever call iommu_setup_dma_ops() after a successful iommu_probe_device(),
>>> which means there should be no harm in achieving the same order of
>>> operations by running it off the back of iommu_probe_device() itself.
>>> This then puts it in line with the x86 and s390 .probe_finalize bodges,
>>> letting us pull it all into the main flow properly. As a bonus this lets
>>> us fold in and de-scope the PCI workaround setup as well.
>>>
>>> At this point we can also then pull the call up inside the group mutex,
>>> and avoid having to think about whether iommu_group_store_type() could
>>> theoretically race and free the domain if iommu_setup_dma_ops() ran just
>>> *before* iommu_device_use_default_domain() claims it... Furthermore we
>>> replace one .probe_finalize call completely, since the only remaining
>>> implementations are now one which only needs to run once for the initial
>>> boot-time probe, and two which themselves render that path unreachable.
>>>
>>> This leaves us a big step closer to realistically being able to unpick
>>> the variety of different things that iommu_setup_dma_ops() has been
>>> muddling together, and further streamline iommu-dma into core API flows
>>> in future.
>>>
>>> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> # For Intel IOMMU
>>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>>> Tested-by: Hanjun Guo <guohanjun@huawei.com>
>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>> ---
>>> v2: Shuffle around to make sure the iommu_group_do_probe_finalize() case
>>>     is covered as well, with bonus side-effects as above.
>>> v3: *Really* do that, remembering the other two probe_finalize sites too.
>>> ---
>>>  arch/arm64/mm/dma-mapping.c  |  2 --
>>>  drivers/iommu/amd/iommu.c    |  8 --------
>>>  drivers/iommu/dma-iommu.c    | 18 ++++++------------
>>>  drivers/iommu/dma-iommu.h    | 14 ++++++--------
>>>  drivers/iommu/intel/iommu.c  |  7 -------
>>>  drivers/iommu/iommu.c        | 20 +++++++-------------
>>>  drivers/iommu/s390-iommu.c   |  6 ------
>>>  drivers/iommu/virtio-iommu.c | 10 ----------
>>>  include/linux/iommu.h        |  7 -------
>>>  9 files changed, 19 insertions(+), 73 deletions(-)
>>
>> This patch breaks UFS on Qualcomm SC8180X Primus platform:
>>
>>
>> [    3.846856] arm-smmu 15000000.iommu: Unhandled context fault: fsr=0x402, iova=0x1032db3e0, fsynr=0x130000, cbfrsynra=0x300, cb=4
>> [    3.846880] ufshcd-qcom 1d84000.ufshc: ufshcd_check_errors: saved_err 0x20000 saved_uic_err 0x0
>> [    3.846929] host_regs: 00000000: 1587031f 00000000 00000300 00000000
>> [    3.846935] host_regs: 00000010: 01000000 00010217 00000000 00000000
>> [    3.846941] host_regs: 00000020: 00000000 00070ef5 00000000 00000000
>> [    3.846946] host_regs: 00000030: 0000000f 00000001 00000000 00000000
>> [    3.846951] host_regs: 00000040: 00000000 00000000 00000000 00000000
>> [    3.846956] host_regs: 00000050: 032db000 00000001 00000000 00000000
>> [    3.846962] host_regs: 00000060: 00000000 80000000 00000000 00000000
>> [    3.846967] host_regs: 00000070: 032dd000 00000001 00000000 00000000
>> [    3.846972] host_regs: 00000080: 00000000 00000000 00000000 00000000
>> [    3.846977] host_regs: 00000090: 00000016 00000000 00000000 0000000c
>> [    3.847074] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler started; HBA state eh_fatal; powered 1; shutting down 0; saved_err = 131072; saved_uic_err = 0; force_reset = 0
>> [    4.406550] ufshcd-qcom 1d84000.ufshc: ufshcd_verify_dev_init: NOP OUT failed -11
>> [    4.417953] ufshcd-qcom 1d84000.ufshc: ufshcd_async_scan failed: -11
> 
> Just to confirm: reverting f091e93306e0 ("dma-mapping: Simplify
> arch_setup_dma_ops()") and b67483b3c44e ("iommu/dma: Centralise
> iommu_setup_dma_ops()" fixes the issue for me. Please ping me if you'd
> like me to test a fix.

This also triggers a different issue (that also comes down to "ufs bad") on
another QC platform (SM8550):

[    4.282098] scsi host0: ufshcd
[    4.315970] ufshcd-qcom 1d84000.ufs: ufshcd_check_errors: saved_err 0x20000 saved_uic_err 0x0
[    4.330155] host_regs: 00000000: 3587031f 00000000 00000400 00000000
[    4.343955] host_regs: 00000010: 01000000 00010217 00000000 00000000
[    4.356027] host_regs: 00000020: 00000000 00070ef5 00000000 00000000
[    4.370136] host_regs: 00000030: 0000000f 00000003 00000000 00000000
[    4.376662] host_regs: 00000040: 00000000 00000000 00000000 00000000
[    4.383192] host_regs: 00000050: 85109000 00000008 00000000 00000000
[    4.389719] host_regs: 00000060: 00000000 80000000 00000000 00000000
[    4.396245] host_regs: 00000070: 8510a000 00000008 00000000 00000000
[    4.402773] host_regs: 00000080: 00000000 00000000 00000000 00000000
[    4.409298] host_regs: 00000090: 00000016 00000000 00000000 0000000c
[    4.415900] arm-smmu 15000000.iommu: Unhandled context fault: fsr=0x402, iova=0x8851093e0, fsynr=0x3b0001, cbfrsynra=0x60, cb=2
[    4.416135] ufshcd-qcom 1d84000.ufs: ufshcd_err_handler started; HBA state eh_fatal; powered 1; shutting down 0; saved_err = 131072; saved_uic_err = 0; force_reset = 0
[    4.951750] ufshcd-qcom 1d84000.ufs: ufshcd_verify_dev_init: NOP OUT failed -11
[    4.960644] ufshcd-qcom 1d84000.ufs: ufshcd_async_scan failed: -11

Reverting the commits Dmitry mentioned also fixes this.

Konrad
Robin Murphy April 30, 2024, 12:33 p.m. UTC | #11
On 30/04/2024 1:23 pm, Konrad Dybcio wrote:
> On 29.04.2024 11:26 PM, Dmitry Baryshkov wrote:
>> On Mon, 29 Apr 2024 at 19:31, Dmitry Baryshkov
>> <dmitry.baryshkov@linaro.org> wrote:
>>>
>>> On Fri, Apr 19, 2024 at 05:54:45PM +0100, Robin Murphy wrote:
>>>> It's somewhat hard to see, but arm64's arch_setup_dma_ops() should only
>>>> ever call iommu_setup_dma_ops() after a successful iommu_probe_device(),
>>>> which means there should be no harm in achieving the same order of
>>>> operations by running it off the back of iommu_probe_device() itself.
>>>> This then puts it in line with the x86 and s390 .probe_finalize bodges,
>>>> letting us pull it all into the main flow properly. As a bonus this lets
>>>> us fold in and de-scope the PCI workaround setup as well.
>>>>
>>>> At this point we can also then pull the call up inside the group mutex,
>>>> and avoid having to think about whether iommu_group_store_type() could
>>>> theoretically race and free the domain if iommu_setup_dma_ops() ran just
>>>> *before* iommu_device_use_default_domain() claims it... Furthermore we
>>>> replace one .probe_finalize call completely, since the only remaining
>>>> implementations are now one which only needs to run once for the initial
>>>> boot-time probe, and two which themselves render that path unreachable.
>>>>
>>>> This leaves us a big step closer to realistically being able to unpick
>>>> the variety of different things that iommu_setup_dma_ops() has been
>>>> muddling together, and further streamline iommu-dma into core API flows
>>>> in future.
>>>>
>>>> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> # For Intel IOMMU
>>>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>>>> Tested-by: Hanjun Guo <guohanjun@huawei.com>
>>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>>> ---
>>>> v2: Shuffle around to make sure the iommu_group_do_probe_finalize() case
>>>>      is covered as well, with bonus side-effects as above.
>>>> v3: *Really* do that, remembering the other two probe_finalize sites too.
>>>> ---
>>>>   arch/arm64/mm/dma-mapping.c  |  2 --
>>>>   drivers/iommu/amd/iommu.c    |  8 --------
>>>>   drivers/iommu/dma-iommu.c    | 18 ++++++------------
>>>>   drivers/iommu/dma-iommu.h    | 14 ++++++--------
>>>>   drivers/iommu/intel/iommu.c  |  7 -------
>>>>   drivers/iommu/iommu.c        | 20 +++++++-------------
>>>>   drivers/iommu/s390-iommu.c   |  6 ------
>>>>   drivers/iommu/virtio-iommu.c | 10 ----------
>>>>   include/linux/iommu.h        |  7 -------
>>>>   9 files changed, 19 insertions(+), 73 deletions(-)
>>>
>>> This patch breaks UFS on Qualcomm SC8180X Primus platform:
>>>
>>>
>>> [    3.846856] arm-smmu 15000000.iommu: Unhandled context fault: fsr=0x402, iova=0x1032db3e0, fsynr=0x130000, cbfrsynra=0x300, cb=4
>>> [    3.846880] ufshcd-qcom 1d84000.ufshc: ufshcd_check_errors: saved_err 0x20000 saved_uic_err 0x0
>>> [    3.846929] host_regs: 00000000: 1587031f 00000000 00000300 00000000
>>> [    3.846935] host_regs: 00000010: 01000000 00010217 00000000 00000000
>>> [    3.846941] host_regs: 00000020: 00000000 00070ef5 00000000 00000000
>>> [    3.846946] host_regs: 00000030: 0000000f 00000001 00000000 00000000
>>> [    3.846951] host_regs: 00000040: 00000000 00000000 00000000 00000000
>>> [    3.846956] host_regs: 00000050: 032db000 00000001 00000000 00000000
>>> [    3.846962] host_regs: 00000060: 00000000 80000000 00000000 00000000
>>> [    3.846967] host_regs: 00000070: 032dd000 00000001 00000000 00000000
>>> [    3.846972] host_regs: 00000080: 00000000 00000000 00000000 00000000
>>> [    3.846977] host_regs: 00000090: 00000016 00000000 00000000 0000000c
>>> [    3.847074] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler started; HBA state eh_fatal; powered 1; shutting down 0; saved_err = 131072; saved_uic_err = 0; force_reset = 0
>>> [    4.406550] ufshcd-qcom 1d84000.ufshc: ufshcd_verify_dev_init: NOP OUT failed -11
>>> [    4.417953] ufshcd-qcom 1d84000.ufshc: ufshcd_async_scan failed: -11
>>
>> Just to confirm: reverting f091e93306e0 ("dma-mapping: Simplify
>> arch_setup_dma_ops()") and b67483b3c44e ("iommu/dma: Centralise
>> iommu_setup_dma_ops()" fixes the issue for me. Please ping me if you'd
>> like me to test a fix.
> 
> This also triggers a different issue (that also comes down to "ufs bad") on
> another QC platform (SM8550):
> 
> [    4.282098] scsi host0: ufshcd
> [    4.315970] ufshcd-qcom 1d84000.ufs: ufshcd_check_errors: saved_err 0x20000 saved_uic_err 0x0
> [    4.330155] host_regs: 00000000: 3587031f 00000000 00000400 00000000
> [    4.343955] host_regs: 00000010: 01000000 00010217 00000000 00000000
> [    4.356027] host_regs: 00000020: 00000000 00070ef5 00000000 00000000
> [    4.370136] host_regs: 00000030: 0000000f 00000003 00000000 00000000
> [    4.376662] host_regs: 00000040: 00000000 00000000 00000000 00000000
> [    4.383192] host_regs: 00000050: 85109000 00000008 00000000 00000000
> [    4.389719] host_regs: 00000060: 00000000 80000000 00000000 00000000
> [    4.396245] host_regs: 00000070: 8510a000 00000008 00000000 00000000
> [    4.402773] host_regs: 00000080: 00000000 00000000 00000000 00000000
> [    4.409298] host_regs: 00000090: 00000016 00000000 00000000 0000000c
> [    4.415900] arm-smmu 15000000.iommu: Unhandled context fault: fsr=0x402, iova=0x8851093e0, fsynr=0x3b0001, cbfrsynra=0x60, cb=2
> [    4.416135] ufshcd-qcom 1d84000.ufs: ufshcd_err_handler started; HBA state eh_fatal; powered 1; shutting down 0; saved_err = 131072; saved_uic_err = 0; force_reset = 0
> [    4.951750] ufshcd-qcom 1d84000.ufs: ufshcd_verify_dev_init: NOP OUT failed -11
> [    4.960644] ufshcd-qcom 1d84000.ufs: ufshcd_async_scan failed: -11
> 
> Reverting the commits Dmitry mentioned also fixes this.

Yeah, It'll be the same thing - doesn't really matter exactly *how* the 
UFS goes wrong due to the SMMU blocking it, the issue is that the SMMU 
is erroneously blocking it in the first place due to a DMA ops mixup. 
Fix is now here:

https://lore.kernel.org/linux-iommu/d4cc20cbb0c45175e98dd76bf187e2ad6421296d.1714472573.git.robin.murphy@arm.com/

Thanks,
Robin.
Niklas Schnelle April 30, 2024, 2:39 p.m. UTC | #12
On Fri, 2024-04-19 at 17:54 +0100, Robin Murphy wrote:
> It's somewhat hard to see, but arm64's arch_setup_dma_ops() should only
> ever call iommu_setup_dma_ops() after a successful iommu_probe_device(),
> which means there should be no harm in achieving the same order of
> operations by running it off the back of iommu_probe_device() itself.
> This then puts it in line with the x86 and s390 .probe_finalize bodges,
> letting us pull it all into the main flow properly. As a bonus this lets
> us fold in and de-scope the PCI workaround setup as well.
> 
> At this point we can also then pull the call up inside the group mutex,
> and avoid having to think about whether iommu_group_store_type() could
> theoretically race and free the domain if iommu_setup_dma_ops() ran just
> *before* iommu_device_use_default_domain() claims it... Furthermore we
> replace one .probe_finalize call completely, since the only remaining
> implementations are now one which only needs to run once for the initial
> boot-time probe, and two which themselves render that path unreachable.
> 
> This leaves us a big step closer to realistically being able to unpick
> the variety of different things that iommu_setup_dma_ops() has been
> muddling together, and further streamline iommu-dma into core API flows
> in future.
> 
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> # For Intel IOMMU
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Tested-by: Hanjun Guo <guohanjun@huawei.com>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
---8<---
> diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
> index 9a5196f523de..d8eaa7ea380b 100644
> --- a/drivers/iommu/s390-iommu.c
> +++ b/drivers/iommu/s390-iommu.c
> @@ -695,11 +695,6 @@ static size_t s390_iommu_unmap_pages(struct iommu_domain *domain,
>  	return size;
>  }
>  
> -static void s390_iommu_probe_finalize(struct device *dev)
> -{
> -	iommu_setup_dma_ops(dev, 0, U64_MAX);
> -}
> -
>  struct zpci_iommu_ctrs *zpci_get_iommu_ctrs(struct zpci_dev *zdev)
>  {
>  	if (!zdev || !zdev->s390_domain)
> @@ -785,7 +780,6 @@ static const struct iommu_ops s390_iommu_ops = {
>  	.capable = s390_iommu_capable,
>  	.domain_alloc_paging = s390_domain_alloc_paging,
>  	.probe_device = s390_iommu_probe_device,
> -	.probe_finalize = s390_iommu_probe_finalize,
>  	.release_device = s390_iommu_release_device,
>  	.device_group = generic_device_group,
>  	.pgsize_bitmap = SZ_4K,

I gave this whole series a test boot on s390 and also tried running a
KVM guest with vfio-pci pass-through. For the s390 part feel free to
add my.

Acked-by: Niklas Schnelle <schnelle@linux.ibm.com> # for s390
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> # for s390

Thanks,
Niklas
Jon Hunter May 14, 2024, 1:27 p.m. UTC | #13
Hi Robin,

On 19/04/2024 17:54, Robin Murphy wrote:
> It's now easy to retrieve the device's DMA limits if we want to check
> them against the domain aperture, so do that ourselves instead of
> relying on them being passed through the callchain.
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Tested-by: Hanjun Guo <guohanjun@huawei.com>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>   drivers/iommu/dma-iommu.c | 21 +++++++++------------
>   1 file changed, 9 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index a3039005b696..f542eabaefa4 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -660,19 +660,16 @@ static void iommu_dma_init_options(struct iommu_dma_options *options,
>   /**
>    * iommu_dma_init_domain - Initialise a DMA mapping domain
>    * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
> - * @base: IOVA at which the mappable address space starts
> - * @limit: Last address of the IOVA space
>    * @dev: Device the domain is being initialised for
>    *
> - * @base and @limit + 1 should be exact multiples of IOMMU page granularity to
> - * avoid rounding surprises. If necessary, we reserve the page at address 0
> + * If the geometry and dma_range_map include address 0, we reserve that page
>    * to ensure it is an invalid IOVA. It is safe to reinitialise a domain, but
>    * any change which could make prior IOVAs invalid will fail.
>    */
> -static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
> -				 dma_addr_t limit, struct device *dev)
> +static int iommu_dma_init_domain(struct iommu_domain *domain, struct device *dev)
>   {
>   	struct iommu_dma_cookie *cookie = domain->iova_cookie;
> +	const struct bus_dma_region *map = dev->dma_range_map;
>   	unsigned long order, base_pfn;
>   	struct iova_domain *iovad;
>   	int ret;
> @@ -684,18 +681,18 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>   
>   	/* Use the smallest supported page size for IOVA granularity */
>   	order = __ffs(domain->pgsize_bitmap);
> -	base_pfn = max_t(unsigned long, 1, base >> order);
> +	base_pfn = 1;
>   
>   	/* Check the domain allows at least some access to the device... */
> -	if (domain->geometry.force_aperture) {
> +	if (map) {
> +		dma_addr_t base = dma_range_map_min(map);
>   		if (base > domain->geometry.aperture_end ||
> -		    limit < domain->geometry.aperture_start) {
> +		    dma_range_map_max(map) < domain->geometry.aperture_start) {
>   			pr_warn("specified DMA range outside IOMMU capability\n");
>   			return -EFAULT;
>   		}
>   		/* ...then finally give it a kicking to make sure it fits */
> -		base_pfn = max_t(unsigned long, base_pfn,
> -				domain->geometry.aperture_start >> order);
> +		base_pfn = max(base, domain->geometry.aperture_start) >> order;
>   	}
>   
>   	/* start_pfn is always nonzero for an already-initialised domain */
> @@ -1760,7 +1757,7 @@ void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit)
>   	 * underlying IOMMU driver needs to support via the dma-iommu layer.
>   	 */
>   	if (iommu_is_dma_domain(domain)) {
> -		if (iommu_dma_init_domain(domain, dma_base, dma_limit, dev))
> +		if (iommu_dma_init_domain(domain, dev))
>   			goto out_err;
>   		dev->dma_ops = &iommu_dma_ops;
>   	}


I have noticed some random test failures on Tegra186 and Tegra194 and 
bisect is pointing to this commit. Reverting this along with the various 
dependencies does fix the problem. On Tegra186 CPU hotplug is failing 
and on Tegra194 suspend is failing. Unfortunately, on neither platform 
do I see any particular crash but the boards hang somewhere.

If you have any ideas on things we can try let me know.

Cheers
Jon
Robin Murphy May 15, 2024, 2:59 p.m. UTC | #14
Hi Jon,

On 2024-05-14 2:27 pm, Jon Hunter wrote:
> Hi Robin,
> 
> On 19/04/2024 17:54, Robin Murphy wrote:
>> It's now easy to retrieve the device's DMA limits if we want to check
>> them against the domain aperture, so do that ourselves instead of
>> relying on them being passed through the callchain.
>>
>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>> Tested-by: Hanjun Guo <guohanjun@huawei.com>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>>   drivers/iommu/dma-iommu.c | 21 +++++++++------------
>>   1 file changed, 9 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index a3039005b696..f542eabaefa4 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -660,19 +660,16 @@ static void iommu_dma_init_options(struct 
>> iommu_dma_options *options,
>>   /**
>>    * iommu_dma_init_domain - Initialise a DMA mapping domain
>>    * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
>> - * @base: IOVA at which the mappable address space starts
>> - * @limit: Last address of the IOVA space
>>    * @dev: Device the domain is being initialised for
>>    *
>> - * @base and @limit + 1 should be exact multiples of IOMMU page 
>> granularity to
>> - * avoid rounding surprises. If necessary, we reserve the page at 
>> address 0
>> + * If the geometry and dma_range_map include address 0, we reserve 
>> that page
>>    * to ensure it is an invalid IOVA. It is safe to reinitialise a 
>> domain, but
>>    * any change which could make prior IOVAs invalid will fail.
>>    */
>> -static int iommu_dma_init_domain(struct iommu_domain *domain, 
>> dma_addr_t base,
>> -                 dma_addr_t limit, struct device *dev)
>> +static int iommu_dma_init_domain(struct iommu_domain *domain, struct 
>> device *dev)
>>   {
>>       struct iommu_dma_cookie *cookie = domain->iova_cookie;
>> +    const struct bus_dma_region *map = dev->dma_range_map;
>>       unsigned long order, base_pfn;
>>       struct iova_domain *iovad;
>>       int ret;
>> @@ -684,18 +681,18 @@ static int iommu_dma_init_domain(struct 
>> iommu_domain *domain, dma_addr_t base,
>>       /* Use the smallest supported page size for IOVA granularity */
>>       order = __ffs(domain->pgsize_bitmap);
>> -    base_pfn = max_t(unsigned long, 1, base >> order);
>> +    base_pfn = 1;
>>       /* Check the domain allows at least some access to the device... */
>> -    if (domain->geometry.force_aperture) {
>> +    if (map) {
>> +        dma_addr_t base = dma_range_map_min(map);
>>           if (base > domain->geometry.aperture_end ||
>> -            limit < domain->geometry.aperture_start) {
>> +            dma_range_map_max(map) < domain->geometry.aperture_start) {
>>               pr_warn("specified DMA range outside IOMMU capability\n");
>>               return -EFAULT;
>>           }
>>           /* ...then finally give it a kicking to make sure it fits */
>> -        base_pfn = max_t(unsigned long, base_pfn,
>> -                domain->geometry.aperture_start >> order);
>> +        base_pfn = max(base, domain->geometry.aperture_start) >> order;
>>       }
>>       /* start_pfn is always nonzero for an already-initialised domain */
>> @@ -1760,7 +1757,7 @@ void iommu_setup_dma_ops(struct device *dev, u64 
>> dma_base, u64 dma_limit)
>>        * underlying IOMMU driver needs to support via the dma-iommu 
>> layer.
>>        */
>>       if (iommu_is_dma_domain(domain)) {
>> -        if (iommu_dma_init_domain(domain, dma_base, dma_limit, dev))
>> +        if (iommu_dma_init_domain(domain, dev))
>>               goto out_err;
>>           dev->dma_ops = &iommu_dma_ops;
>>       }
> 
> 
> I have noticed some random test failures on Tegra186 and Tegra194 and 
> bisect is pointing to this commit. Reverting this along with the various 
> dependencies does fix the problem. On Tegra186 CPU hotplug is failing 
> and on Tegra194 suspend is failing. Unfortunately, on neither platform 
> do I see any particular crash but the boards hang somewhere.

That is... thoroughly bemusing :/ Not only is there supposed to be no 
real functional change here - we should merely be recalculating the same 
information from dev->dma_range_map that the callers were already doing 
to generate the base/limit arguments - but the act of initially setting 
up a default domain for a device behind an IOMMU should have no 
connection whatsoever to suspend and especially not to CPU hotplug.

> If you have any ideas on things we can try let me know.

Since the symptom seems inexplicable, I'd throw the usual memory 
debugging stuff like KASAN at it first. I'd also try 
"no_console_suspend" to check whether any late output is being missed in 
the suspend case (and if it's already broken, then any additional issues 
that may be caused by the console itself hopefully shouldn't matter).

For more base-covering, do you have the "arm64: Properly clean up 
iommu-dma remnants" fix in there already as well? That bug has bisected 
to patch #6 each time though, so I do still suspect that what you're 
seeing is likely something else. It does seem potentially significant 
that those Tegra platforms are making fairly wide use of dma-ranges, but 
there's no clear idea forming out of that observation just yet...

Thanks,
Robin.