diff mbox series

gpu: host1x: Do not output error message for deferred probe

Message ID 20190604153150.22265-1-thierry.reding@gmail.com
State Accepted
Headers show
Series gpu: host1x: Do not output error message for deferred probe | expand

Commit Message

Thierry Reding June 4, 2019, 3:31 p.m. UTC
From: Thierry Reding <treding@nvidia.com>

When deferring probe, avoid logging a confusing error message. While at
it, make the error message more informational.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/host1x/dev.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Dmitry Osipenko June 4, 2019, 4:07 p.m. UTC | #1
04.06.2019 18:31, Thierry Reding пишет:
> From: Thierry Reding <treding@nvidia.com>
> 
> When deferring probe, avoid logging a confusing error message. While at
> it, make the error message more informational.
> 
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
>  drivers/gpu/host1x/dev.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
> index c55e2d634887..5a3f797240d4 100644
> --- a/drivers/gpu/host1x/dev.c
> +++ b/drivers/gpu/host1x/dev.c
> @@ -247,8 +247,11 @@ static int host1x_probe(struct platform_device *pdev)
>  
>  	host->clk = devm_clk_get(&pdev->dev, NULL);
>  	if (IS_ERR(host->clk)) {
> -		dev_err(&pdev->dev, "failed to get clock\n");
>  		err = PTR_ERR(host->clk);
> +
> +		if (err != -EPROBE_DEFER)
> +			dev_err(&pdev->dev, "failed to get clock: %d\n", err);
> +
>  		return err;
>  	}

The clock driver should be available at the time of host1x's probing on
all Tegra's because it is one of essential core drivers that become
available early during boot.

I guess you're making this change for T186, is it because the BPMP
driver's probe getting deferred? If yes, won't it be possible to fix the
defer of the clock driver instead of making such changes in the affected
drivers?
Dmitry Osipenko June 4, 2019, 6:35 p.m. UTC | #2
04.06.2019 18:31, Thierry Reding пишет:
> From: Thierry Reding <treding@nvidia.com>
> 
> When deferring probe, avoid logging a confusing error message. While at
> it, make the error message more informational.
> 
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
>  drivers/gpu/host1x/dev.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
> index c55e2d634887..5a3f797240d4 100644
> --- a/drivers/gpu/host1x/dev.c
> +++ b/drivers/gpu/host1x/dev.c
> @@ -247,8 +247,11 @@ static int host1x_probe(struct platform_device *pdev)
>  
>  	host->clk = devm_clk_get(&pdev->dev, NULL);
>  	if (IS_ERR(host->clk)) {
> -		dev_err(&pdev->dev, "failed to get clock\n");
>  		err = PTR_ERR(host->clk);
> +
> +		if (err != -EPROBE_DEFER)
> +			dev_err(&pdev->dev, "failed to get clock: %d\n", err);
> +
>  		return err;
>  	}

The clock driver should be available at the time of host1x's probing on
all Tegra's because it is one of essential core drivers that become
available early during boot.

I guess you're making this change for T186, is it because the BPMP
driver's probe getting deferred? If yes, won't it be possible to fix the
defer of the clock driver instead of making such changes in all of the
affected drivers?

[it appeared to me that first email got dropped by gmail, so I'm
re-sending it second time just in a case]
Thierry Reding June 5, 2019, 8:28 a.m. UTC | #3
On Tue, Jun 04, 2019 at 07:07:42PM +0300, Dmitry Osipenko wrote:
> 04.06.2019 18:31, Thierry Reding пишет:
> > From: Thierry Reding <treding@nvidia.com>
> > 
> > When deferring probe, avoid logging a confusing error message. While at
> > it, make the error message more informational.
> > 
> > Signed-off-by: Thierry Reding <treding@nvidia.com>
> > ---
> >  drivers/gpu/host1x/dev.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
> > index c55e2d634887..5a3f797240d4 100644
> > --- a/drivers/gpu/host1x/dev.c
> > +++ b/drivers/gpu/host1x/dev.c
> > @@ -247,8 +247,11 @@ static int host1x_probe(struct platform_device *pdev)
> >  
> >  	host->clk = devm_clk_get(&pdev->dev, NULL);
> >  	if (IS_ERR(host->clk)) {
> > -		dev_err(&pdev->dev, "failed to get clock\n");
> >  		err = PTR_ERR(host->clk);
> > +
> > +		if (err != -EPROBE_DEFER)
> > +			dev_err(&pdev->dev, "failed to get clock: %d\n", err);
> > +
> >  		return err;
> >  	}
> 
> The clock driver should be available at the time of host1x's probing on
> all Tegra's because it is one of essential core drivers that become
> available early during boot.

That's the currently baked-in assumption. However, there can be any
number of reasons for why the clocks may not show up as early as
expected, as evidenced in the case of Tegra186.

> I guess you're making this change for T186, is it because the BPMP
> driver's probe getting deferred? If yes, won't it be possible to fix the
> defer of the clock driver instead of making such changes in the affected
> drivers?

The reason why this is now happening on Tegra186 is because the BPMP is
bound to an IOMMU to avoid getting faults from the new no-bypass policy
that the ARM SMMU driver is implementing as of v5.2-rc1.

As a result of binding to an IOMMU, the first probe of the BPMP driver
will get deferred, so any driver trying to request a clock after that
and before BPMP gets probed successfully the next time, any clk_get()
calls will fail with -EPROBE_DEFER.

This is a bit unfortunate, but like I said, these kinds of things can
happen, and probe deferral was designed specifically to deal with that
kind of situation so that we wouldn't have to rely on all of these
built-in assumptions that occasionally break.

The driver also already handles deferred probe properly. The only thing
that this patch really changes is to no longer consider -EPROBE_DEFER an
error. It's in fact a pretty common situation in many drivers and should
not warrant a kernel log message.

Thierry
Dmitry Osipenko June 5, 2019, 11:25 a.m. UTC | #4
05.06.2019 11:28, Thierry Reding пишет:
> On Tue, Jun 04, 2019 at 07:07:42PM +0300, Dmitry Osipenko wrote:
>> 04.06.2019 18:31, Thierry Reding пишет:
>>> From: Thierry Reding <treding@nvidia.com>
>>>
>>> When deferring probe, avoid logging a confusing error message. While at
>>> it, make the error message more informational.
>>>
>>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>>> ---
>>>  drivers/gpu/host1x/dev.c | 5 ++++-
>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
>>> index c55e2d634887..5a3f797240d4 100644
>>> --- a/drivers/gpu/host1x/dev.c
>>> +++ b/drivers/gpu/host1x/dev.c
>>> @@ -247,8 +247,11 @@ static int host1x_probe(struct platform_device *pdev)
>>>  
>>>  	host->clk = devm_clk_get(&pdev->dev, NULL);
>>>  	if (IS_ERR(host->clk)) {
>>> -		dev_err(&pdev->dev, "failed to get clock\n");
>>>  		err = PTR_ERR(host->clk);
>>> +
>>> +		if (err != -EPROBE_DEFER)
>>> +			dev_err(&pdev->dev, "failed to get clock: %d\n", err);
>>> +
>>>  		return err;
>>>  	}
>>
>> The clock driver should be available at the time of host1x's probing on
>> all Tegra's because it is one of essential core drivers that become
>> available early during boot.
> 
> That's the currently baked-in assumption. However, there can be any
> number of reasons for why the clocks may not show up as early as
> expected, as evidenced in the case of Tegra186.
> 
>> I guess you're making this change for T186, is it because the BPMP
>> driver's probe getting deferred? If yes, won't it be possible to fix the
>> defer of the clock driver instead of making such changes in the affected
>> drivers?
> 
> The reason why this is now happening on Tegra186 is because the BPMP is
> bound to an IOMMU to avoid getting faults from the new no-bypass policy
> that the ARM SMMU driver is implementing as of v5.2-rc1.
> 
> As a result of binding to an IOMMU, the first probe of the BPMP driver
> will get deferred, so any driver trying to request a clock after that
> and before BPMP gets probed successfully the next time, any clk_get()
> calls will fail with -EPROBE_DEFER.
> 
> This is a bit unfortunate, but like I said, these kinds of things can
> happen, and probe deferral was designed specifically to deal with that
> kind of situation so that we wouldn't have to rely on all of these
> built-in assumptions that occasionally break.
> 
> The driver also already handles deferred probe properly. The only thing
> that this patch really changes is to no longer consider -EPROBE_DEFER an
> error. It's in fact a pretty common situation in many drivers and should
> not warrant a kernel log message.

You're trying to mask symptoms instead of curing the decease and it looks
like the decease could be cured.

Won't something like this work for you?

From fbeabba5f1151e96edc38620db67593585558ca0 Mon Sep 17 00:00:00 2001
From: Dmitry Osipenko <digetx@gmail.com>
Date: Wed, 5 Jun 2019 14:02:00 +0300
Subject: [PATCH 1/2] iommu/arm-smmu: Move driver registration to subsys level

On some platforms there is a dependency on the IOMMU availability that
comes up early during of the boot process. One example is NVIDIA Tegra186
which uses firmware, called BPMP, which manages lots of core functions
like system clocks for example. That firmware driver require IOMMU
functionality and hence the driver's probing is getting deferred because
the ARM's SMMU driver is probed later, thus all the drivers that depend
on the BPMP availability are also getting deferred on Tegra186. Let's move
SMMU driver's registration to an earlier boot stage, allowing drivers like
BPMP to probe successfully without the defer.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/iommu/arm-smmu.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 5e54cc0a28b3..08919f2fdf04 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -2410,4 +2410,9 @@ static struct platform_driver arm_smmu_driver = {
 	.probe	= arm_smmu_device_probe,
 	.shutdown = arm_smmu_device_shutdown,
 };
-builtin_platform_driver(arm_smmu_driver);
+
+static int __init arm_smmu_init(void)
+{
+	return platform_driver_register(&arm_smmu_driver);
+}
+subsys_initcall(arm_smmu_init);
Thierry Reding June 5, 2019, 12:32 p.m. UTC | #5
On Wed, Jun 05, 2019 at 02:25:43PM +0300, Dmitry Osipenko wrote:
> 05.06.2019 11:28, Thierry Reding пишет:
> > On Tue, Jun 04, 2019 at 07:07:42PM +0300, Dmitry Osipenko wrote:
> >> 04.06.2019 18:31, Thierry Reding пишет:
> >>> From: Thierry Reding <treding@nvidia.com>
> >>>
> >>> When deferring probe, avoid logging a confusing error message. While at
> >>> it, make the error message more informational.
> >>>
> >>> Signed-off-by: Thierry Reding <treding@nvidia.com>
> >>> ---
> >>>  drivers/gpu/host1x/dev.c | 5 ++++-
> >>>  1 file changed, 4 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
> >>> index c55e2d634887..5a3f797240d4 100644
> >>> --- a/drivers/gpu/host1x/dev.c
> >>> +++ b/drivers/gpu/host1x/dev.c
> >>> @@ -247,8 +247,11 @@ static int host1x_probe(struct platform_device *pdev)
> >>>  
> >>>  	host->clk = devm_clk_get(&pdev->dev, NULL);
> >>>  	if (IS_ERR(host->clk)) {
> >>> -		dev_err(&pdev->dev, "failed to get clock\n");
> >>>  		err = PTR_ERR(host->clk);
> >>> +
> >>> +		if (err != -EPROBE_DEFER)
> >>> +			dev_err(&pdev->dev, "failed to get clock: %d\n", err);
> >>> +
> >>>  		return err;
> >>>  	}
> >>
> >> The clock driver should be available at the time of host1x's probing on
> >> all Tegra's because it is one of essential core drivers that become
> >> available early during boot.
> > 
> > That's the currently baked-in assumption. However, there can be any
> > number of reasons for why the clocks may not show up as early as
> > expected, as evidenced in the case of Tegra186.
> > 
> >> I guess you're making this change for T186, is it because the BPMP
> >> driver's probe getting deferred? If yes, won't it be possible to fix the
> >> defer of the clock driver instead of making such changes in the affected
> >> drivers?
> > 
> > The reason why this is now happening on Tegra186 is because the BPMP is
> > bound to an IOMMU to avoid getting faults from the new no-bypass policy
> > that the ARM SMMU driver is implementing as of v5.2-rc1.
> > 
> > As a result of binding to an IOMMU, the first probe of the BPMP driver
> > will get deferred, so any driver trying to request a clock after that
> > and before BPMP gets probed successfully the next time, any clk_get()
> > calls will fail with -EPROBE_DEFER.
> > 
> > This is a bit unfortunate, but like I said, these kinds of things can
> > happen, and probe deferral was designed specifically to deal with that
> > kind of situation so that we wouldn't have to rely on all of these
> > built-in assumptions that occasionally break.
> > 
> > The driver also already handles deferred probe properly. The only thing
> > that this patch really changes is to no longer consider -EPROBE_DEFER an
> > error. It's in fact a pretty common situation in many drivers and should
> > not warrant a kernel log message.
> 
> You're trying to mask symptoms instead of curing the decease and it looks
> like the decease could be cured.

There's nothing here to cure. -EPROBE_DEFER was designed specifically to
avoid having to play these kinds of games with initcall levels.

What this patch tries to do is just to avoid printing an error message
for something that is not an error. -EPROBE_DEFER is totally expected to
happen, it's normal, it's not something that we should bother users with
because things end up sorting themselves out in the end.

> Won't something like this work for you?

I'm sure we could find a number of ways to fix this. But there's no need
to fix this because it's not broken. What is broken is that we output an
error message when this happens and make an elephant out of a fly.

Thierry

> From fbeabba5f1151e96edc38620db67593585558ca0 Mon Sep 17 00:00:00 2001
> From: Dmitry Osipenko <digetx@gmail.com>
> Date: Wed, 5 Jun 2019 14:02:00 +0300
> Subject: [PATCH 1/2] iommu/arm-smmu: Move driver registration to subsys level
> 
> On some platforms there is a dependency on the IOMMU availability that
> comes up early during of the boot process. One example is NVIDIA Tegra186
> which uses firmware, called BPMP, which manages lots of core functions
> like system clocks for example. That firmware driver require IOMMU
> functionality and hence the driver's probing is getting deferred because
> the ARM's SMMU driver is probed later, thus all the drivers that depend
> on the BPMP availability are also getting deferred on Tegra186. Let's move
> SMMU driver's registration to an earlier boot stage, allowing drivers like
> BPMP to probe successfully without the defer.
> 
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/iommu/arm-smmu.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 5e54cc0a28b3..08919f2fdf04 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -2410,4 +2410,9 @@ static struct platform_driver arm_smmu_driver = {
>  	.probe	= arm_smmu_device_probe,
>  	.shutdown = arm_smmu_device_shutdown,
>  };
> -builtin_platform_driver(arm_smmu_driver);
> +
> +static int __init arm_smmu_init(void)
> +{
> +	return platform_driver_register(&arm_smmu_driver);
> +}
> +subsys_initcall(arm_smmu_init);
> -- 
> 2.21.0
> 
> From 12ec90e22405b6d5574cbfcbd33b92736ad6bfe4 Mon Sep 17 00:00:00 2001
> From: Dmitry Osipenko <digetx@gmail.com>
> Date: Wed, 5 Jun 2019 14:13:15 +0300
> Subject: [PATCH 2/2] firmware/tegra: Move driver registration to subsys-sync
>  level
> 
> The BPMP driver depends on the ARM SMMU driver which is now getting
> probed from the subsys level, hence let's move the BPMP's probing to
> the subsys-sync level in order to avoid probe deferring of the BPMP's
> driver.
> 
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/firmware/tegra/bpmp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/firmware/tegra/bpmp.c b/drivers/firmware/tegra/bpmp.c
> index dd775e8ba5a0..3f649e12f9f6 100644
> --- a/drivers/firmware/tegra/bpmp.c
> +++ b/drivers/firmware/tegra/bpmp.c
> @@ -883,4 +883,4 @@ static int __init tegra_bpmp_init(void)
>  {
>  	return platform_driver_register(&tegra_bpmp_driver);
>  }
> -core_initcall(tegra_bpmp_init);
> +subsys_initcall_sync(tegra_bpmp_init);
> -- 
> 2.21.0
> 
>
Daniel Vetter June 5, 2019, 12:40 p.m. UTC | #6
On Tue, Jun 04, 2019 at 05:31:50PM +0200, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
> 
> When deferring probe, avoid logging a confusing error message. While at
> it, make the error message more informational.
> 
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
>  drivers/gpu/host1x/dev.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
> index c55e2d634887..5a3f797240d4 100644
> --- a/drivers/gpu/host1x/dev.c
> +++ b/drivers/gpu/host1x/dev.c
> @@ -247,8 +247,11 @@ static int host1x_probe(struct platform_device *pdev)
>  
>  	host->clk = devm_clk_get(&pdev->dev, NULL);
>  	if (IS_ERR(host->clk)) {
> -		dev_err(&pdev->dev, "failed to get clock\n");
>  		err = PTR_ERR(host->clk);
> +
> +		if (err != -EPROBE_DEFER)
> +			dev_err(&pdev->dev, "failed to get clock: %d\n", err);

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> +
>  		return err;
>  	}
>  
> -- 
> 2.21.0
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Dmitry Osipenko June 5, 2019, 12:40 p.m. UTC | #7
05.06.2019 15:32, Thierry Reding пишет:
> On Wed, Jun 05, 2019 at 02:25:43PM +0300, Dmitry Osipenko wrote:
>> 05.06.2019 11:28, Thierry Reding пишет:
>>> On Tue, Jun 04, 2019 at 07:07:42PM +0300, Dmitry Osipenko wrote:
>>>> 04.06.2019 18:31, Thierry Reding пишет:
>>>>> From: Thierry Reding <treding@nvidia.com>
>>>>>
>>>>> When deferring probe, avoid logging a confusing error message. While at
>>>>> it, make the error message more informational.
>>>>>
>>>>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>>>>> ---
>>>>>  drivers/gpu/host1x/dev.c | 5 ++++-
>>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
>>>>> index c55e2d634887..5a3f797240d4 100644
>>>>> --- a/drivers/gpu/host1x/dev.c
>>>>> +++ b/drivers/gpu/host1x/dev.c
>>>>> @@ -247,8 +247,11 @@ static int host1x_probe(struct platform_device *pdev)
>>>>>  
>>>>>  	host->clk = devm_clk_get(&pdev->dev, NULL);
>>>>>  	if (IS_ERR(host->clk)) {
>>>>> -		dev_err(&pdev->dev, "failed to get clock\n");
>>>>>  		err = PTR_ERR(host->clk);
>>>>> +
>>>>> +		if (err != -EPROBE_DEFER)
>>>>> +			dev_err(&pdev->dev, "failed to get clock: %d\n", err);
>>>>> +
>>>>>  		return err;
>>>>>  	}
>>>>
>>>> The clock driver should be available at the time of host1x's probing on
>>>> all Tegra's because it is one of essential core drivers that become
>>>> available early during boot.
>>>
>>> That's the currently baked-in assumption. However, there can be any
>>> number of reasons for why the clocks may not show up as early as
>>> expected, as evidenced in the case of Tegra186.
>>>
>>>> I guess you're making this change for T186, is it because the BPMP
>>>> driver's probe getting deferred? If yes, won't it be possible to fix the
>>>> defer of the clock driver instead of making such changes in the affected
>>>> drivers?
>>>
>>> The reason why this is now happening on Tegra186 is because the BPMP is
>>> bound to an IOMMU to avoid getting faults from the new no-bypass policy
>>> that the ARM SMMU driver is implementing as of v5.2-rc1.
>>>
>>> As a result of binding to an IOMMU, the first probe of the BPMP driver
>>> will get deferred, so any driver trying to request a clock after that
>>> and before BPMP gets probed successfully the next time, any clk_get()
>>> calls will fail with -EPROBE_DEFER.
>>>
>>> This is a bit unfortunate, but like I said, these kinds of things can
>>> happen, and probe deferral was designed specifically to deal with that
>>> kind of situation so that we wouldn't have to rely on all of these
>>> built-in assumptions that occasionally break.
>>>
>>> The driver also already handles deferred probe properly. The only thing
>>> that this patch really changes is to no longer consider -EPROBE_DEFER an
>>> error. It's in fact a pretty common situation in many drivers and should
>>> not warrant a kernel log message.
>>
>> You're trying to mask symptoms instead of curing the decease and it looks
>> like the decease could be cured.
> 
> There's nothing here to cure. -EPROBE_DEFER was designed specifically to
> avoid having to play these kinds of games with initcall levels.
> 
> What this patch tries to do is just to avoid printing an error message
> for something that is not an error. -EPROBE_DEFER is totally expected to
> happen, it's normal, it's not something that we should bother users with
> because things end up sorting themselves out in the end.
> 
>> Won't something like this work for you?
> 
> I'm sure we could find a number of ways to fix this. But there's no need
> to fix this because it's not broken. What is broken is that we output an
> error message when this happens and make an elephant out of a fly.

Sure, this is absolutely not critical and deferred probe is doing its job.
But don't you agree that it's better to fix the root of the annoyance once
and for all?

Would be awesome if you could give a whirl to the patches that I'm suggesting.
Thierry Reding June 5, 2019, 1:04 p.m. UTC | #8
On Wed, Jun 05, 2019 at 03:40:28PM +0300, Dmitry Osipenko wrote:
> 05.06.2019 15:32, Thierry Reding пишет:
> > On Wed, Jun 05, 2019 at 02:25:43PM +0300, Dmitry Osipenko wrote:
> >> 05.06.2019 11:28, Thierry Reding пишет:
> >>> On Tue, Jun 04, 2019 at 07:07:42PM +0300, Dmitry Osipenko wrote:
> >>>> 04.06.2019 18:31, Thierry Reding пишет:
> >>>>> From: Thierry Reding <treding@nvidia.com>
> >>>>>
> >>>>> When deferring probe, avoid logging a confusing error message. While at
> >>>>> it, make the error message more informational.
> >>>>>
> >>>>> Signed-off-by: Thierry Reding <treding@nvidia.com>
> >>>>> ---
> >>>>>  drivers/gpu/host1x/dev.c | 5 ++++-
> >>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
> >>>>> index c55e2d634887..5a3f797240d4 100644
> >>>>> --- a/drivers/gpu/host1x/dev.c
> >>>>> +++ b/drivers/gpu/host1x/dev.c
> >>>>> @@ -247,8 +247,11 @@ static int host1x_probe(struct platform_device *pdev)
> >>>>>  
> >>>>>  	host->clk = devm_clk_get(&pdev->dev, NULL);
> >>>>>  	if (IS_ERR(host->clk)) {
> >>>>> -		dev_err(&pdev->dev, "failed to get clock\n");
> >>>>>  		err = PTR_ERR(host->clk);
> >>>>> +
> >>>>> +		if (err != -EPROBE_DEFER)
> >>>>> +			dev_err(&pdev->dev, "failed to get clock: %d\n", err);
> >>>>> +
> >>>>>  		return err;
> >>>>>  	}
> >>>>
> >>>> The clock driver should be available at the time of host1x's probing on
> >>>> all Tegra's because it is one of essential core drivers that become
> >>>> available early during boot.
> >>>
> >>> That's the currently baked-in assumption. However, there can be any
> >>> number of reasons for why the clocks may not show up as early as
> >>> expected, as evidenced in the case of Tegra186.
> >>>
> >>>> I guess you're making this change for T186, is it because the BPMP
> >>>> driver's probe getting deferred? If yes, won't it be possible to fix the
> >>>> defer of the clock driver instead of making such changes in the affected
> >>>> drivers?
> >>>
> >>> The reason why this is now happening on Tegra186 is because the BPMP is
> >>> bound to an IOMMU to avoid getting faults from the new no-bypass policy
> >>> that the ARM SMMU driver is implementing as of v5.2-rc1.
> >>>
> >>> As a result of binding to an IOMMU, the first probe of the BPMP driver
> >>> will get deferred, so any driver trying to request a clock after that
> >>> and before BPMP gets probed successfully the next time, any clk_get()
> >>> calls will fail with -EPROBE_DEFER.
> >>>
> >>> This is a bit unfortunate, but like I said, these kinds of things can
> >>> happen, and probe deferral was designed specifically to deal with that
> >>> kind of situation so that we wouldn't have to rely on all of these
> >>> built-in assumptions that occasionally break.
> >>>
> >>> The driver also already handles deferred probe properly. The only thing
> >>> that this patch really changes is to no longer consider -EPROBE_DEFER an
> >>> error. It's in fact a pretty common situation in many drivers and should
> >>> not warrant a kernel log message.
> >>
> >> You're trying to mask symptoms instead of curing the decease and it looks
> >> like the decease could be cured.
> > 
> > There's nothing here to cure. -EPROBE_DEFER was designed specifically to
> > avoid having to play these kinds of games with initcall levels.
> > 
> > What this patch tries to do is just to avoid printing an error message
> > for something that is not an error. -EPROBE_DEFER is totally expected to
> > happen, it's normal, it's not something that we should bother users with
> > because things end up sorting themselves out in the end.
> > 
> >> Won't something like this work for you?
> > 
> > I'm sure we could find a number of ways to fix this. But there's no need
> > to fix this because it's not broken. What is broken is that we output an
> > error message when this happens and make an elephant out of a fly.
> 
> Sure, this is absolutely not critical and deferred probe is doing its job.
> But don't you agree that it's better to fix the root of the annoyance once
> and for all?

From my point of view deferred probe is the once and for all fix. Back
before we had deferred probe, doing these kinds of initcall reordering
tricks was fairly common and while such a change may fix one setup, it
often ended up breaking others.

Sorry, this is a lesson that we already learned a couple of years ago,
no need to rehash it.

Thierry
Dmitry Osipenko June 5, 2019, 1:16 p.m. UTC | #9
05.06.2019 16:04, Thierry Reding пишет:
> On Wed, Jun 05, 2019 at 03:40:28PM +0300, Dmitry Osipenko wrote:
>> 05.06.2019 15:32, Thierry Reding пишет:
>>> On Wed, Jun 05, 2019 at 02:25:43PM +0300, Dmitry Osipenko wrote:
>>>> 05.06.2019 11:28, Thierry Reding пишет:
>>>>> On Tue, Jun 04, 2019 at 07:07:42PM +0300, Dmitry Osipenko wrote:
>>>>>> 04.06.2019 18:31, Thierry Reding пишет:
>>>>>>> From: Thierry Reding <treding@nvidia.com>
>>>>>>>
>>>>>>> When deferring probe, avoid logging a confusing error message. While at
>>>>>>> it, make the error message more informational.
>>>>>>>
>>>>>>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>>>>>>> ---
>>>>>>>  drivers/gpu/host1x/dev.c | 5 ++++-
>>>>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
>>>>>>> index c55e2d634887..5a3f797240d4 100644
>>>>>>> --- a/drivers/gpu/host1x/dev.c
>>>>>>> +++ b/drivers/gpu/host1x/dev.c
>>>>>>> @@ -247,8 +247,11 @@ static int host1x_probe(struct platform_device *pdev)
>>>>>>>  
>>>>>>>  	host->clk = devm_clk_get(&pdev->dev, NULL);
>>>>>>>  	if (IS_ERR(host->clk)) {
>>>>>>> -		dev_err(&pdev->dev, "failed to get clock\n");
>>>>>>>  		err = PTR_ERR(host->clk);
>>>>>>> +
>>>>>>> +		if (err != -EPROBE_DEFER)
>>>>>>> +			dev_err(&pdev->dev, "failed to get clock: %d\n", err);
>>>>>>> +
>>>>>>>  		return err;
>>>>>>>  	}
>>>>>>
>>>>>> The clock driver should be available at the time of host1x's probing on
>>>>>> all Tegra's because it is one of essential core drivers that become
>>>>>> available early during boot.
>>>>>
>>>>> That's the currently baked-in assumption. However, there can be any
>>>>> number of reasons for why the clocks may not show up as early as
>>>>> expected, as evidenced in the case of Tegra186.
>>>>>
>>>>>> I guess you're making this change for T186, is it because the BPMP
>>>>>> driver's probe getting deferred? If yes, won't it be possible to fix the
>>>>>> defer of the clock driver instead of making such changes in the affected
>>>>>> drivers?
>>>>>
>>>>> The reason why this is now happening on Tegra186 is because the BPMP is
>>>>> bound to an IOMMU to avoid getting faults from the new no-bypass policy
>>>>> that the ARM SMMU driver is implementing as of v5.2-rc1.
>>>>>
>>>>> As a result of binding to an IOMMU, the first probe of the BPMP driver
>>>>> will get deferred, so any driver trying to request a clock after that
>>>>> and before BPMP gets probed successfully the next time, any clk_get()
>>>>> calls will fail with -EPROBE_DEFER.
>>>>>
>>>>> This is a bit unfortunate, but like I said, these kinds of things can
>>>>> happen, and probe deferral was designed specifically to deal with that
>>>>> kind of situation so that we wouldn't have to rely on all of these
>>>>> built-in assumptions that occasionally break.
>>>>>
>>>>> The driver also already handles deferred probe properly. The only thing
>>>>> that this patch really changes is to no longer consider -EPROBE_DEFER an
>>>>> error. It's in fact a pretty common situation in many drivers and should
>>>>> not warrant a kernel log message.
>>>>
>>>> You're trying to mask symptoms instead of curing the decease and it looks
>>>> like the decease could be cured.
>>>
>>> There's nothing here to cure. -EPROBE_DEFER was designed specifically to
>>> avoid having to play these kinds of games with initcall levels.
>>>
>>> What this patch tries to do is just to avoid printing an error message
>>> for something that is not an error. -EPROBE_DEFER is totally expected to
>>> happen, it's normal, it's not something that we should bother users with
>>> because things end up sorting themselves out in the end.
>>>
>>>> Won't something like this work for you?
>>>
>>> I'm sure we could find a number of ways to fix this. But there's no need
>>> to fix this because it's not broken. What is broken is that we output an
>>> error message when this happens and make an elephant out of a fly.
>>
>> Sure, this is absolutely not critical and deferred probe is doing its job.
>> But don't you agree that it's better to fix the root of the annoyance once
>> and for all?
> 
> From my point of view deferred probe is the once and for all fix. Back
> before we had deferred probe, doing these kinds of initcall reordering
> tricks was fairly common and while such a change may fix one setup, it
> often ended up breaking others.
> 
> Sorry, this is a lesson that we already learned a couple of years ago,
> no need to rehash it.

We had a success story in similar case of pinctrl vs GPIO driver probe
order, reordering the probe order fixed a lot of deferred probings and
improved boot up time a tad. Oh, that reminded me that the Stefan's
patch was never merged [0] :(

I'm trying to help, but seems not much I could do since you're in oppose ;)

[0] https://patchwork.ozlabs.org/patch/949776/
diff mbox series

Patch

diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index c55e2d634887..5a3f797240d4 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -247,8 +247,11 @@  static int host1x_probe(struct platform_device *pdev)
 
 	host->clk = devm_clk_get(&pdev->dev, NULL);
 	if (IS_ERR(host->clk)) {
-		dev_err(&pdev->dev, "failed to get clock\n");
 		err = PTR_ERR(host->clk);
+
+		if (err != -EPROBE_DEFER)
+			dev_err(&pdev->dev, "failed to get clock: %d\n", err);
+
 		return err;
 	}