diff mbox series

[RESEND,v2,4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

Message ID 20201125101013.14953-5-nicoleotsuka@gmail.com
State Deferred
Headers show
Series iommu/tegra-smmu: Some pending reviewed changes | expand

Commit Message

Nicolin Chen Nov. 25, 2020, 10:10 a.m. UTC
The bus_set_iommu() in tegra_smmu_probe() enumerates all clients
to call in tegra_smmu_probe_device() where each client searches
its DT node for smmu pointer and swgroup ID, so as to configure
an fwspec. But this requires a valid smmu pointer even before mc
and smmu drivers are probed. So in tegra_smmu_probe() we added a
line of code to fill mc->smmu, marking "a bit of a hack".

This works for most of clients in the DTB, however, doesn't work
for a client that doesn't exist in DTB, a PCI device for example.

Actually, if we return ERR_PTR(-ENODEV) in ->probe_device() when
it's called from bus_set_iommu(), iommu core will let everything
carry on. Then when a client gets probed, of_iommu_configure() in
iommu core will search DTB for swgroup ID and call ->of_xlate()
to prepare an fwspec, similar to tegra_smmu_probe_device() and
tegra_smmu_configure(). Then it'll call tegra_smmu_probe_device()
again, and this time we shall return smmu->iommu pointer properly.

So we can get rid of tegra_smmu_find() and tegra_smmu_configure()
along with DT polling code by letting the iommu core handle every
thing, except a problem that we search iommus property in DTB not
only for swgroup ID but also for mc node to get mc->smmu pointer
to call dev_iommu_priv_set() and return the smmu->iommu pointer.
So we'll need to find another way to get smmu pointer.

Referencing the implementation of sun50i-iommu driver, of_xlate()
has client's dev pointer, mc node and swgroup ID. This means that
we can call dev_iommu_priv_set() in of_xlate() instead, so we can
simply get smmu pointer in ->probe_device().

This patch reworks tegra_smmu_probe_device() by:
1) Removing mc->smmu hack in tegra_smmu_probe() so as to return
   ERR_PTR(-ENODEV) in tegra_smmu_probe_device() during stage of
   tegra_smmu_probe/tegra_mc_probe().
2) Moving dev_iommu_priv_set() to of_xlate() so we can get smmu
   pointer in tegra_smmu_probe_device() to replace DTB polling.
3) Removing tegra_smmu_configure() accordingly since iommu core
   takes care of it.

This also fixes a problem that previously we could add clients to
iommu groups before iommu core initializes its default domain:
    ubuntu@jetson:~$ dmesg | grep iommu
    platform 50000000.host1x: Adding to iommu group 1
    platform 57000000.gpu: Adding to iommu group 2
    iommu: Default domain type: Translated
    platform 54200000.dc: Adding to iommu group 3
    platform 54240000.dc: Adding to iommu group 3
    platform 54340000.vic: Adding to iommu group 4

Though it works fine with IOMMU_DOMAIN_UNMANAGED, but will have
warnings if switching to IOMMU_DOMAIN_DMA:
    iommu: Failed to allocate default IOMMU domain of type 0 for
           group (null) - Falling back to IOMMU_DOMAIN_DMA
    iommu: Failed to allocate default IOMMU domain of type 0 for
           group (null) - Falling back to IOMMU_DOMAIN_DMA

Now, bypassing the first probe_device() call from bus_set_iommu()
fixes the sequence:
    ubuntu@jetson:~$ dmesg | grep iommu
    iommu: Default domain type: Translated
    tegra-host1x 50000000.host1x: Adding to iommu group 0
    tegra-dc 54200000.dc: Adding to iommu group 1
    tegra-dc 54240000.dc: Adding to iommu group 1
    tegra-vic 54340000.vic: Adding to iommu group 2
    nouveau 57000000.gpu: Adding to iommu group 3

Note that dmesg log above is testing with IOMMU_DOMAIN_UNMANAGED.

Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
---
 drivers/iommu/tegra-smmu.c | 96 ++++++--------------------------------
 1 file changed, 15 insertions(+), 81 deletions(-)

Comments

Guillaume Tucker Feb. 4, 2021, 11:10 a.m. UTC | #1
Hi Nicolin,

A regression was detected by kernelci.org in IGT's drm_read tests
on mainline, it was first seen on 17th December 2020.  You can
find some details here:

  https://kernelci.org/test/case/id/600b82dc1e3208f123d3dffc/

Then an automated bisection was run and it landed on this
patch (v5.10-rc3-4-g25938c73cd79 on mainline).  Normally, an
email is generated automatically but I had to start this one by
hand as there were issues getting it to complete.

You can see the failing test cases with this patch:

  https://lava.collabora.co.uk/results/3126405/0_igt-kms-tegra

Some errors are seen around this point in the log:

  https://lava.collabora.co.uk/scheduler/job/3126405#L1005

[    3.029729] tegra-mc 70019000.memory-controller: display0a: read @0xfe000000: EMEM address decode error (SMMU translation error [--S])
[    3.042058] tegra-mc 70019000.memory-controller: display0a: read @0xfe000000: Page fault (SMMU translation error [--S])


Here's the same test passing with this patch reverted:

  https://lava.collabora.co.uk/results/3126570/0_igt-kms-tegra
  

For completeness, you can see all the test jobs run by the
automated bisection here:

  https://lava.collabora.co.uk/scheduler/device_type/tegra124-nyan-big?dt_length=25&dt_search=bisection-gtucker-12#dt_


Please let us know if you need any help debugging this issue or
to try a fix on this platform.

Best wishes,
Guillaume

On 25/11/2020 10:10, Nicolin Chen wrote:
> The bus_set_iommu() in tegra_smmu_probe() enumerates all clients
> to call in tegra_smmu_probe_device() where each client searches
> its DT node for smmu pointer and swgroup ID, so as to configure
> an fwspec. But this requires a valid smmu pointer even before mc
> and smmu drivers are probed. So in tegra_smmu_probe() we added a
> line of code to fill mc->smmu, marking "a bit of a hack".
> 
> This works for most of clients in the DTB, however, doesn't work
> for a client that doesn't exist in DTB, a PCI device for example.
> 
> Actually, if we return ERR_PTR(-ENODEV) in ->probe_device() when
> it's called from bus_set_iommu(), iommu core will let everything
> carry on. Then when a client gets probed, of_iommu_configure() in
> iommu core will search DTB for swgroup ID and call ->of_xlate()
> to prepare an fwspec, similar to tegra_smmu_probe_device() and
> tegra_smmu_configure(). Then it'll call tegra_smmu_probe_device()
> again, and this time we shall return smmu->iommu pointer properly.
> 
> So we can get rid of tegra_smmu_find() and tegra_smmu_configure()
> along with DT polling code by letting the iommu core handle every
> thing, except a problem that we search iommus property in DTB not
> only for swgroup ID but also for mc node to get mc->smmu pointer
> to call dev_iommu_priv_set() and return the smmu->iommu pointer.
> So we'll need to find another way to get smmu pointer.
> 
> Referencing the implementation of sun50i-iommu driver, of_xlate()
> has client's dev pointer, mc node and swgroup ID. This means that
> we can call dev_iommu_priv_set() in of_xlate() instead, so we can
> simply get smmu pointer in ->probe_device().
> 
> This patch reworks tegra_smmu_probe_device() by:
> 1) Removing mc->smmu hack in tegra_smmu_probe() so as to return
>    ERR_PTR(-ENODEV) in tegra_smmu_probe_device() during stage of
>    tegra_smmu_probe/tegra_mc_probe().
> 2) Moving dev_iommu_priv_set() to of_xlate() so we can get smmu
>    pointer in tegra_smmu_probe_device() to replace DTB polling.
> 3) Removing tegra_smmu_configure() accordingly since iommu core
>    takes care of it.
> 
> This also fixes a problem that previously we could add clients to
> iommu groups before iommu core initializes its default domain:
>     ubuntu@jetson:~$ dmesg | grep iommu
>     platform 50000000.host1x: Adding to iommu group 1
>     platform 57000000.gpu: Adding to iommu group 2
>     iommu: Default domain type: Translated
>     platform 54200000.dc: Adding to iommu group 3
>     platform 54240000.dc: Adding to iommu group 3
>     platform 54340000.vic: Adding to iommu group 4
> 
> Though it works fine with IOMMU_DOMAIN_UNMANAGED, but will have
> warnings if switching to IOMMU_DOMAIN_DMA:
>     iommu: Failed to allocate default IOMMU domain of type 0 for
>            group (null) - Falling back to IOMMU_DOMAIN_DMA
>     iommu: Failed to allocate default IOMMU domain of type 0 for
>            group (null) - Falling back to IOMMU_DOMAIN_DMA
> 
> Now, bypassing the first probe_device() call from bus_set_iommu()
> fixes the sequence:
>     ubuntu@jetson:~$ dmesg | grep iommu
>     iommu: Default domain type: Translated
>     tegra-host1x 50000000.host1x: Adding to iommu group 0
>     tegra-dc 54200000.dc: Adding to iommu group 1
>     tegra-dc 54240000.dc: Adding to iommu group 1
>     tegra-vic 54340000.vic: Adding to iommu group 2
>     nouveau 57000000.gpu: Adding to iommu group 3
> 
> Note that dmesg log above is testing with IOMMU_DOMAIN_UNMANAGED.
> 
> Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
> Tested-by: Dmitry Osipenko <digetx@gmail.com>
> Acked-by: Thierry Reding <treding@nvidia.com>
> Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
> ---
>  drivers/iommu/tegra-smmu.c | 96 ++++++--------------------------------
>  1 file changed, 15 insertions(+), 81 deletions(-)
> 
> diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
> index 297d49f3f80e..f45ed43cf8db 100644
> --- a/drivers/iommu/tegra-smmu.c
> +++ b/drivers/iommu/tegra-smmu.c
> @@ -797,75 +797,9 @@ static phys_addr_t tegra_smmu_iova_to_phys(struct iommu_domain *domain,
>  	return SMMU_PFN_PHYS(pfn) + SMMU_OFFSET_IN_PAGE(iova);
>  }
>  
> -static struct tegra_smmu *tegra_smmu_find(struct device_node *np)
> -{
> -	struct platform_device *pdev;
> -	struct tegra_mc *mc;
> -
> -	pdev = of_find_device_by_node(np);
> -	if (!pdev)
> -		return NULL;
> -
> -	mc = platform_get_drvdata(pdev);
> -	if (!mc)
> -		return NULL;
> -
> -	return mc->smmu;
> -}
> -
> -static int tegra_smmu_configure(struct tegra_smmu *smmu, struct device *dev,
> -				struct of_phandle_args *args)
> -{
> -	const struct iommu_ops *ops = smmu->iommu.ops;
> -	int err;
> -
> -	err = iommu_fwspec_init(dev, &dev->of_node->fwnode, ops);
> -	if (err < 0) {
> -		dev_err(dev, "failed to initialize fwspec: %d\n", err);
> -		return err;
> -	}
> -
> -	err = ops->of_xlate(dev, args);
> -	if (err < 0) {
> -		dev_err(dev, "failed to parse SW group ID: %d\n", err);
> -		iommu_fwspec_free(dev);
> -		return err;
> -	}
> -
> -	return 0;
> -}
> -
>  static struct iommu_device *tegra_smmu_probe_device(struct device *dev)
>  {
> -	struct device_node *np = dev->of_node;
> -	struct tegra_smmu *smmu = NULL;
> -	struct of_phandle_args args;
> -	unsigned int index = 0;
> -	int err;
> -
> -	while (of_parse_phandle_with_args(np, "iommus", "#iommu-cells", index,
> -					  &args) == 0) {
> -		smmu = tegra_smmu_find(args.np);
> -		if (smmu) {
> -			err = tegra_smmu_configure(smmu, dev, &args);
> -			of_node_put(args.np);
> -
> -			if (err < 0)
> -				return ERR_PTR(err);
> -
> -			/*
> -			 * Only a single IOMMU master interface is currently
> -			 * supported by the Linux kernel, so abort after the
> -			 * first match.
> -			 */
> -			dev_iommu_priv_set(dev, smmu);
> -
> -			break;
> -		}
> -
> -		of_node_put(args.np);
> -		index++;
> -	}
> +	struct tegra_smmu *smmu = dev_iommu_priv_get(dev);
>  
>  	if (!smmu)
>  		return ERR_PTR(-ENODEV);
> @@ -873,10 +807,7 @@ static struct iommu_device *tegra_smmu_probe_device(struct device *dev)
>  	return &smmu->iommu;
>  }
>  
> -static void tegra_smmu_release_device(struct device *dev)
> -{
> -	dev_iommu_priv_set(dev, NULL);
> -}
> +static void tegra_smmu_release_device(struct device *dev) {}
>  
>  static const struct tegra_smmu_group_soc *
>  tegra_smmu_find_group(struct tegra_smmu *smmu, unsigned int swgroup)
> @@ -953,8 +884,21 @@ static struct iommu_group *tegra_smmu_device_group(struct device *dev)
>  static int tegra_smmu_of_xlate(struct device *dev,
>  			       struct of_phandle_args *args)
>  {
> +	struct platform_device *iommu_pdev = of_find_device_by_node(args->np);
> +	struct tegra_mc *mc = platform_get_drvdata(iommu_pdev);
>  	u32 id = args->args[0];
>  
> +	/*
> +	 * Note: we are here releasing the reference of &iommu_pdev->dev, which
> +	 * is mc->dev. Although some functions in tegra_smmu_ops may keep using
> +	 * its private data beyond this point, it's still safe to do so because
> +	 * the SMMU parent device is the same as the MC, so the reference count
> +	 * isn't strictly necessary.
> +	 */
> +	put_device(&iommu_pdev->dev);
> +
> +	dev_iommu_priv_set(dev, mc->smmu);
> +
>  	return iommu_fwspec_add_ids(dev, &id, 1);
>  }
>  
> @@ -1079,16 +1023,6 @@ struct tegra_smmu *tegra_smmu_probe(struct device *dev,
>  	if (!smmu)
>  		return ERR_PTR(-ENOMEM);
>  
> -	/*
> -	 * This is a bit of a hack. Ideally we'd want to simply return this
> -	 * value. However the IOMMU registration process will attempt to add
> -	 * all devices to the IOMMU when bus_set_iommu() is called. In order
> -	 * not to rely on global variables to track the IOMMU instance, we
> -	 * set it here so that it can be looked up from the .probe_device()
> -	 * callback via the IOMMU device's .drvdata field.
> -	 */
> -	mc->smmu = smmu;
> -
>  	size = BITS_TO_LONGS(soc->num_asids) * sizeof(long);
>  
>  	smmu->asids = devm_kzalloc(dev, size, GFP_KERNEL);
>
Nicolin Chen Feb. 5, 2021, 5:24 a.m. UTC | #2
Hi Guillaume,

On Thu, Feb 04, 2021 at 11:10:15AM +0000, Guillaume Tucker wrote:
> Hi Nicolin,
> 
> A regression was detected by kernelci.org in IGT's drm_read tests
> on mainline, it was first seen on 17th December 2020.  You can
> find some details here:
> 
>   https://kernelci.org/test/case/id/600b82dc1e3208f123d3dffc/

Thanks for reporting the issue. We did test on Tegra210 and Tegra30
yet not on Tegra124. I am wondering what could go wrong...

> Please let us know if you need any help debugging this issue or
> to try a fix on this platform.

Yes, I don't have any Tegra124 platform to run. It'd be very nice
if you can run some debugging patch (I can provide you) and a fix
after I root cause the issue.

Thanks
Nicolin
Nicolin Chen Feb. 5, 2021, 9:45 a.m. UTC | #3
Hi Guillaume,

On Thu, Feb 04, 2021 at 09:24:23PM -0800, Nicolin Chen wrote:
> > Please let us know if you need any help debugging this issue or
> > to try a fix on this platform.
> 
> Yes, I don't have any Tegra124 platform to run. It'd be very nice
> if you can run some debugging patch (I can provide you) and a fix
> after I root cause the issue.

Would it be possible for you to run with the given debugging patch?

It'd be nicer if I can get both logs of the vanilla kernel (failing)
and the commit-reverted version (passing), each applying this patch.

Thanks in advance!
Nicolin
Guillaume Tucker Feb. 6, 2021, 1:40 p.m. UTC | #4
On 05/02/2021 09:45, Nicolin Chen wrote:
> Hi Guillaume,
> 
> On Thu, Feb 04, 2021 at 09:24:23PM -0800, Nicolin Chen wrote:
>>> Please let us know if you need any help debugging this issue or
>>> to try a fix on this platform.
>>
>> Yes, I don't have any Tegra124 platform to run. It'd be very nice
>> if you can run some debugging patch (I can provide you) and a fix
>> after I root cause the issue.
> 
> Would it be possible for you to run with the given debugging patch?
> 
> It'd be nicer if I can get both logs of the vanilla kernel (failing)
> and the commit-reverted version (passing), each applying this patch.

Sure, I've run 3 jobs:

* v5.11-rc6 as a reference, to see the original issue:
  https://lava.collabora.co.uk/scheduler/job/3187848

* + your debug patch:
  https://lava.collabora.co.uk/scheduler/job/3187849

* + the "breaking" commit reverted, passing the tests:
  https://lava.collabora.co.uk/scheduler/job/3187851


You can see the history of the test branch I'm using here, with
the 3 revisions mentioned above:

  https://gitlab.collabora.com/gtucker/linux/-/commits/linux-5.11-rc6-nyan-big-drm-read/


Hope that helps,
Guillaume
Nicolin Chen Feb. 10, 2021, 8:20 a.m. UTC | #5
Hi Guillaume,

On Sat, Feb 06, 2021 at 01:40:13PM +0000, Guillaume Tucker wrote:
> > It'd be nicer if I can get both logs of the vanilla kernel (failing)
> > and the commit-reverted version (passing), each applying this patch.
> 
> Sure, I've run 3 jobs:
> 
> * v5.11-rc6 as a reference, to see the original issue:
>   https://lava.collabora.co.uk/scheduler/job/3187848
> 
> * + your debug patch:
>   https://lava.collabora.co.uk/scheduler/job/3187849
> 
> * + the "breaking" commit reverted, passing the tests:
>   https://lava.collabora.co.uk/scheduler/job/3187851

Thanks for the help!

I am able to figure out what's probably wrong, yet not so sure
about the best solution at this point.

Would it be possible for you to run one more time with another
debugging patch? I'd like to see the same logs as previous:
1. Vanilla kernel + debug patch
2. Vanilla kernel + Reverted + debug patch

Thank you
Nicolin
Guillaume Tucker Feb. 11, 2021, 3:50 p.m. UTC | #6
On 10/02/2021 08:20, Nicolin Chen wrote:
> Hi Guillaume,
> 
> On Sat, Feb 06, 2021 at 01:40:13PM +0000, Guillaume Tucker wrote:
>>> It'd be nicer if I can get both logs of the vanilla kernel (failing)
>>> and the commit-reverted version (passing), each applying this patch.
>>
>> Sure, I've run 3 jobs:
>>
>> * v5.11-rc6 as a reference, to see the original issue:
>>   https://lava.collabora.co.uk/scheduler/job/3187848
>>
>> * + your debug patch:
>>   https://lava.collabora.co.uk/scheduler/job/3187849
>>
>> * + the "breaking" commit reverted, passing the tests:
>>   https://lava.collabora.co.uk/scheduler/job/3187851
> 
> Thanks for the help!
> 
> I am able to figure out what's probably wrong, yet not so sure
> about the best solution at this point.
> 
> Would it be possible for you to run one more time with another
> debugging patch? I'd like to see the same logs as previous:
> 1. Vanilla kernel + debug patch
> 2. Vanilla kernel + Reverted + debug patch

As it turns out, next-20210210 is passing all the tests again so
it looks like this got fixed in the meantime:

  https://lava.collabora.co.uk/scheduler/job/3210192
  https://lava.collabora.co.uk/results/3210192/0_igt-kms-tegra

And here's a more extensive list of IGT tests on next-20210211,
all the regressions have been fixed:

  https://kernelci.org/test/plan/id/60254c42f51df36be53abe62/


I haven't run a reversed bisection to find the fix, but I guess
it wouldn't be too hard to find out what happened by hand anyway.
I see the drm/tegra/for-5.12-rc1 tag has been merged into
linux-next, maybe that solved the issue?

FYI I've also run some jobs with your debug patch and with the
breaking patch reverted:

  https://lava.collabora.co.uk/scheduler/job/3210245
  https://lava.collabora.co.uk/scheduler/job/3210596

Meanwhile I'll see what can be done to improve the automated
bisection so if there are new IGT regressions they would get
reported earlier.  I guess it would have saved us all some time
if it had been bisected in December.

Thanks,
Guillaume
Nicolin Chen Feb. 18, 2021, 10:35 a.m. UTC | #7
Hi Guillaume,

Thank you for the test results! And sorry for my belated reply.

On Thu, Feb 11, 2021 at 03:50:05PM +0000, Guillaume Tucker wrote:
> > On Sat, Feb 06, 2021 at 01:40:13PM +0000, Guillaume Tucker wrote:
> >>> It'd be nicer if I can get both logs of the vanilla kernel (failing)
> >>> and the commit-reverted version (passing), each applying this patch.
> >>
> >> Sure, I've run 3 jobs:
> >>
> >> * v5.11-rc6 as a reference, to see the original issue:
> >>   https://lava.collabora.co.uk/scheduler/job/3187848
> >>
> >> * + your debug patch:
> >>   https://lava.collabora.co.uk/scheduler/job/3187849
> >>
> >> * + the "breaking" commit reverted, passing the tests:
> >>   https://lava.collabora.co.uk/scheduler/job/3187851
> > 
> > Thanks for the help!
> > 
> > I am able to figure out what's probably wrong, yet not so sure
> > about the best solution at this point.
> > 
> > Would it be possible for you to run one more time with another
> > debugging patch? I'd like to see the same logs as previous:
> > 1. Vanilla kernel + debug patch
> > 2. Vanilla kernel + Reverted + debug patch
> 
> As it turns out, next-20210210 is passing all the tests again so
> it looks like this got fixed in the meantime:
> 
>   https://lava.collabora.co.uk/scheduler/job/3210192

I checked this passing log, however, found that the regression is
still there though test passed, as the prints below aren't normal:
  tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
	 EMEM address decode error (SMMU translation error [--S])
  tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
	 Page fault (SMMU translation error [--S])

I was trying to think of a simpler solution than a revert. However,
given the fact that the callback sequence could change -- guessing
likely a recent change in iommu core, I feel it safer to revert my
previous change, not necessarily being a complete revert though.

I attached my partial reverting change in this email. Would it be
possible for you to run one more test for me to confirm it? It'd
keep the tests passing while eliminating all error prints above.

If the fix works, I'll re-send it to mail list by adding a commit
message.

Thanks!
Nicolin
Guillaume Tucker Feb. 18, 2021, 8:38 p.m. UTC | #8
On 18/02/2021 10:35, Nicolin Chen wrote:
> Hi Guillaume,
> 
> Thank you for the test results! And sorry for my belated reply.

No worries :)

> On Thu, Feb 11, 2021 at 03:50:05PM +0000, Guillaume Tucker wrote:
>>> On Sat, Feb 06, 2021 at 01:40:13PM +0000, Guillaume Tucker wrote:
>>>>> It'd be nicer if I can get both logs of the vanilla kernel (failing)
>>>>> and the commit-reverted version (passing), each applying this patch.
>>>>
>>>> Sure, I've run 3 jobs:
>>>>
>>>> * v5.11-rc6 as a reference, to see the original issue:
>>>>   https://lava.collabora.co.uk/scheduler/job/3187848
>>>>
>>>> * + your debug patch:
>>>>   https://lava.collabora.co.uk/scheduler/job/3187849
>>>>
>>>> * + the "breaking" commit reverted, passing the tests:
>>>>   https://lava.collabora.co.uk/scheduler/job/3187851
>>>
>>> Thanks for the help!
>>>
>>> I am able to figure out what's probably wrong, yet not so sure
>>> about the best solution at this point.
>>>
>>> Would it be possible for you to run one more time with another
>>> debugging patch? I'd like to see the same logs as previous:
>>> 1. Vanilla kernel + debug patch
>>> 2. Vanilla kernel + Reverted + debug patch
>>
>> As it turns out, next-20210210 is passing all the tests again so
>> it looks like this got fixed in the meantime:
>>
>>   https://lava.collabora.co.uk/scheduler/job/3210192
> 
> I checked this passing log, however, found that the regression is
> still there though test passed, as the prints below aren't normal:
>   tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
> 	 EMEM address decode error (SMMU translation error [--S])
>   tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
> 	 Page fault (SMMU translation error [--S])

Ah yes sorry, there are other KernelCI checks for kernel errors
but that wasn't enabled in the bisection so I didn't notice them.

> I was trying to think of a simpler solution than a revert. However,
> given the fact that the callback sequence could change -- guessing
> likely a recent change in iommu core, I feel it safer to revert my
> previous change, not necessarily being a complete revert though.
> 
> I attached my partial reverting change in this email. Would it be
> possible for you to run one more test for me to confirm it? It'd
> keep the tests passing while eliminating all error prints above.
> 
> If the fix works, I'll re-send it to mail list by adding a commit
> message.

Sure, here's next-20210218 as a reference:

  https://lava.collabora.co.uk/scheduler/job/3241236

and here with your patch applied on top of it:

  https://lava.collabora.co.uk/scheduler/job/3241246

The git branch I've used where your patch is applied:

  https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210218-nyan-big-drm-read/

The errors seem to have disappeared but I'll let you double check
that things are all back to a working state.

BTW: This thread is a good example of how having an "on-demand"
KernelCI service to let developers re-run tests with extra
patches would allow them to fix issues independently.  We'll keep
that in mind for the future.

Best wishes,
Guillaume
diff mbox series

Patch

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 297d49f3f80e..f45ed43cf8db 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -797,75 +797,9 @@  static phys_addr_t tegra_smmu_iova_to_phys(struct iommu_domain *domain,
 	return SMMU_PFN_PHYS(pfn) + SMMU_OFFSET_IN_PAGE(iova);
 }
 
-static struct tegra_smmu *tegra_smmu_find(struct device_node *np)
-{
-	struct platform_device *pdev;
-	struct tegra_mc *mc;
-
-	pdev = of_find_device_by_node(np);
-	if (!pdev)
-		return NULL;
-
-	mc = platform_get_drvdata(pdev);
-	if (!mc)
-		return NULL;
-
-	return mc->smmu;
-}
-
-static int tegra_smmu_configure(struct tegra_smmu *smmu, struct device *dev,
-				struct of_phandle_args *args)
-{
-	const struct iommu_ops *ops = smmu->iommu.ops;
-	int err;
-
-	err = iommu_fwspec_init(dev, &dev->of_node->fwnode, ops);
-	if (err < 0) {
-		dev_err(dev, "failed to initialize fwspec: %d\n", err);
-		return err;
-	}
-
-	err = ops->of_xlate(dev, args);
-	if (err < 0) {
-		dev_err(dev, "failed to parse SW group ID: %d\n", err);
-		iommu_fwspec_free(dev);
-		return err;
-	}
-
-	return 0;
-}
-
 static struct iommu_device *tegra_smmu_probe_device(struct device *dev)
 {
-	struct device_node *np = dev->of_node;
-	struct tegra_smmu *smmu = NULL;
-	struct of_phandle_args args;
-	unsigned int index = 0;
-	int err;
-
-	while (of_parse_phandle_with_args(np, "iommus", "#iommu-cells", index,
-					  &args) == 0) {
-		smmu = tegra_smmu_find(args.np);
-		if (smmu) {
-			err = tegra_smmu_configure(smmu, dev, &args);
-			of_node_put(args.np);
-
-			if (err < 0)
-				return ERR_PTR(err);
-
-			/*
-			 * Only a single IOMMU master interface is currently
-			 * supported by the Linux kernel, so abort after the
-			 * first match.
-			 */
-			dev_iommu_priv_set(dev, smmu);
-
-			break;
-		}
-
-		of_node_put(args.np);
-		index++;
-	}
+	struct tegra_smmu *smmu = dev_iommu_priv_get(dev);
 
 	if (!smmu)
 		return ERR_PTR(-ENODEV);
@@ -873,10 +807,7 @@  static struct iommu_device *tegra_smmu_probe_device(struct device *dev)
 	return &smmu->iommu;
 }
 
-static void tegra_smmu_release_device(struct device *dev)
-{
-	dev_iommu_priv_set(dev, NULL);
-}
+static void tegra_smmu_release_device(struct device *dev) {}
 
 static const struct tegra_smmu_group_soc *
 tegra_smmu_find_group(struct tegra_smmu *smmu, unsigned int swgroup)
@@ -953,8 +884,21 @@  static struct iommu_group *tegra_smmu_device_group(struct device *dev)
 static int tegra_smmu_of_xlate(struct device *dev,
 			       struct of_phandle_args *args)
 {
+	struct platform_device *iommu_pdev = of_find_device_by_node(args->np);
+	struct tegra_mc *mc = platform_get_drvdata(iommu_pdev);
 	u32 id = args->args[0];
 
+	/*
+	 * Note: we are here releasing the reference of &iommu_pdev->dev, which
+	 * is mc->dev. Although some functions in tegra_smmu_ops may keep using
+	 * its private data beyond this point, it's still safe to do so because
+	 * the SMMU parent device is the same as the MC, so the reference count
+	 * isn't strictly necessary.
+	 */
+	put_device(&iommu_pdev->dev);
+
+	dev_iommu_priv_set(dev, mc->smmu);
+
 	return iommu_fwspec_add_ids(dev, &id, 1);
 }
 
@@ -1079,16 +1023,6 @@  struct tegra_smmu *tegra_smmu_probe(struct device *dev,
 	if (!smmu)
 		return ERR_PTR(-ENOMEM);
 
-	/*
-	 * This is a bit of a hack. Ideally we'd want to simply return this
-	 * value. However the IOMMU registration process will attempt to add
-	 * all devices to the IOMMU when bus_set_iommu() is called. In order
-	 * not to rely on global variables to track the IOMMU instance, we
-	 * set it here so that it can be looked up from the .probe_device()
-	 * callback via the IOMMU device's .drvdata field.
-	 */
-	mc->smmu = smmu;
-
 	size = BITS_TO_LONGS(soc->num_asids) * sizeof(long);
 
 	smmu->asids = devm_kzalloc(dev, size, GFP_KERNEL);