diff mbox series

[5/5] PCI: iproc: Properly handle optional PHYs

Message ID 20190828163636.12967-5-thierry.reding@gmail.com
State Superseded
Headers show
Series [1/5] PCI: exynos: Properly handle optional PHYs | expand

Commit Message

Thierry Reding Aug. 28, 2019, 4:36 p.m. UTC
From: Thierry Reding <treding@nvidia.com>

devm_phy_get() can fail for a number of resides besides probe deferral.
It can for example return -ENOMEM if it runs out of memory as it tries
to allocate devres structures. Propagating only -EPROBE_DEFER is
problematic because it results in these legitimately fatal errors being
treated as "PHY not specified in DT".

What we really want is to ignore the optional PHYs only if they have not
been specified in DT. devm_phy_optional_get() is a function that exactly
does what's required here, so use that instead.

Cc: Ray Jui <rjui@broadcom.com>
Cc: Scott Branden <sbranden@broadcom.com>
Cc: bcm-kernel-feedback-list@broadcom.com
Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/pci/controller/pcie-iproc-platform.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

Comments

Andrew Murray Aug. 28, 2019, 9:26 p.m. UTC | #1
On Wed, Aug 28, 2019 at 06:36:36PM +0200, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
> 
> devm_phy_get() can fail for a number of resides besides probe deferral.
> It can for example return -ENOMEM if it runs out of memory as it tries
> to allocate devres structures. Propagating only -EPROBE_DEFER is
> problematic because it results in these legitimately fatal errors being
> treated as "PHY not specified in DT".
> 
> What we really want is to ignore the optional PHYs only if they have not
> been specified in DT. devm_phy_optional_get() is a function that exactly
> does what's required here, so use that instead.
> 
> Cc: Ray Jui <rjui@broadcom.com>
> Cc: Scott Branden <sbranden@broadcom.com>
> Cc: bcm-kernel-feedback-list@broadcom.com
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
>  drivers/pci/controller/pcie-iproc-platform.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/pci/controller/pcie-iproc-platform.c b/drivers/pci/controller/pcie-iproc-platform.c
> index 5a3550b6bb29..9ee6200a66f4 100644
> --- a/drivers/pci/controller/pcie-iproc-platform.c
> +++ b/drivers/pci/controller/pcie-iproc-platform.c
> @@ -93,12 +93,9 @@ static int iproc_pcie_pltfm_probe(struct platform_device *pdev)
>  	pcie->need_ib_cfg = of_property_read_bool(np, "dma-ranges");
>  
>  	/* PHY use is optional */
> -	pcie->phy = devm_phy_get(dev, "pcie-phy");
> -	if (IS_ERR(pcie->phy)) {
> -		if (PTR_ERR(pcie->phy) == -EPROBE_DEFER)
> -			return -EPROBE_DEFER;
> -		pcie->phy = NULL;
> -	}
> +	pcie->phy = devm_phy_optional_get(dev, "pcie-phy");
> +	if (IS_ERR(pcie->phy))
> +		return PTR_ERR(pcie->phy);

Once you've applied Bjorn's feedback you can add:

Reviewed-by: Andrew Murray <andrew.murray@arm.com>

I initially thought that you forgot to check for -ENODEV - though I can see
that the implementation of devm_phy_optional_get very helpfully does this for
us and returns NULL instead of an error.

What is also confusing is that devm_regulator_get_optional, despite its
_optional suffix doesn't do this and returns an error. I wonder if
devm_phy_optional_get should be changed to return NULL instead of an error
instead of -ENODEV. I've copied Liam/Mark for feedback.

Thanks,

Andrew Murray

>  
>  	ret = devm_of_pci_get_host_bridge_resources(dev, 0, 0xff, &resources,
>  						    &iobase);
> -- 
> 2.22.0
>
Mark Brown Aug. 28, 2019, 9:49 p.m. UTC | #2
On Wed, Aug 28, 2019 at 10:26:55PM +0100, Andrew Murray wrote:

> I initially thought that you forgot to check for -ENODEV - though I can see
> that the implementation of devm_phy_optional_get very helpfully does this for
> us and returns NULL instead of an error.

> What is also confusing is that devm_regulator_get_optional, despite its
> _optional suffix doesn't do this and returns an error. I wonder if
> devm_phy_optional_get should be changed to return NULL instead of an error
> instead of -ENODEV. I've copied Liam/Mark for feedback.

The regulator API has an assumption that people will write bad DTs and
not describe all the regulators in the system, this is especially likely
in cases where consumer drivers initially don't have regulator support
and then get it added since people often only describe supplies actively
used by drivers.  In order to handle this gracefully the API will
substitute in a dummy regulator if it sees that the regulator just isn't
described in the system but a consumer requests it, this will ensure
that for most simple uses the consumer will function fine even if the DT
is not fully described.  Since most devices won't physically work if
some of their supplies are missing this is a good default assumption.  

If a consumer could genuinely have some missing supplies (some devices
do support this for various reasons) then this support would mean that
the consumer would have to have some extra property to say that the
regulator is intentionally missing which would be bad.  Instead what we
do is let the consumer say that real systems could actually be missing
the regulator and that the dummy shouldn't be used so that the consumer
can handle this.
Andrew Murray Aug. 29, 2019, 10:09 a.m. UTC | #3
On Wed, Aug 28, 2019 at 10:49:01PM +0100, Mark Brown wrote:
> On Wed, Aug 28, 2019 at 10:26:55PM +0100, Andrew Murray wrote:
> 
> > I initially thought that you forgot to check for -ENODEV - though I can see
> > that the implementation of devm_phy_optional_get very helpfully does this for
> > us and returns NULL instead of an error.
> 
> > What is also confusing is that devm_regulator_get_optional, despite its
> > _optional suffix doesn't do this and returns an error. I wonder if
> > devm_phy_optional_get should be changed to return NULL instead of an error
> > instead of -ENODEV. I've copied Liam/Mark for feedback.
> 
> The regulator API has an assumption that people will write bad DTs and
> not describe all the regulators in the system, this is especially likely
> in cases where consumer drivers initially don't have regulator support
> and then get it added since people often only describe supplies actively
> used by drivers.  In order to handle this gracefully the API will
> substitute in a dummy regulator if it sees that the regulator just isn't
> drescribed in the system but a consumer requests it, this will ensure
> that for most simple uses the consumer will function fine even if the DT
> is not fully described.  Since most devices won't physically work if
> some of their supplies are missing this is a good default assumption.  

Right, if I understand correctly this is the behaviour when regulator_get
is called (e.g. NORMAL_GET) - you get a dummy instead of an error.

> 
> If a consumer could genuinely have some missing supplies (some devices
> do support this for various reasons) then this support would mean that
> the consumer would have to have some extra property to say that the
> regulator is intentionally missing which would be bad.  Instead what we
> do is let the consumer say that real systems could actually be missing
> the regulator and that the dummy shouldn't be used so that the consumer
> can handle this.

And if I understand correctly this is the behaviour when
regulator_get_optional is called (e.g. OPTIONAL_GET) - you get -ENODEV
instead of a dummy.

But why do we return -ENODEV and not NULL for OPTIONAL_GET?

Looking at some of the consumer drivers I can see that lots of them don't
correctly handle the return value of regulator_get_optional:

 - some fail their probes and return upon IS_ERR(ret) - for example even
   if -ENODEV is returned.

 - some don't fail their probes and assume the regulator isn't present upon
   IS_ERR(ret) - yet this may not be correct as the regulator may be present
   but -ENOMEM was returned.

Given that a common pattern is to set a consumer regulator pointer to NULL
upon -ENODEV - if regulator_get_optional did this for them, then it would
be more difficult for consumer drivers to get the error handling wrong and
would remove some consumer boiler plate code.

(Of course some consumers won't set a regulator pointer to NULL and instead
test it against IS_ERR instead of NULL everywhere (IS_ERR(NULL) is false) -
but such a change may be a reason to not use IS_ERR everywhere).

As I understand, if a consumer wants to fail upon an absent regulator
it seems the only way they can do this is call regulator_get_optional (which
seems odd) and test for -ENODEV. I'm not sure if there is actually a use-case
for this.

I guess I'm looking here for something that can simplify consumer error
handling - it's easy to get wrong and it seems that many drivers may be wrong.

Thanks,

Andrew Murray
Thierry Reding Aug. 29, 2019, 10:48 a.m. UTC | #4
On Thu, Aug 29, 2019 at 11:09:34AM +0100, Andrew Murray wrote:
> On Wed, Aug 28, 2019 at 10:49:01PM +0100, Mark Brown wrote:
> > On Wed, Aug 28, 2019 at 10:26:55PM +0100, Andrew Murray wrote:
> > 
> > > I initially thought that you forgot to check for -ENODEV - though I can see
> > > that the implementation of devm_phy_optional_get very helpfully does this for
> > > us and returns NULL instead of an error.
> > 
> > > What is also confusing is that devm_regulator_get_optional, despite its
> > > _optional suffix doesn't do this and returns an error. I wonder if
> > > devm_phy_optional_get should be changed to return NULL instead of an error
> > > instead of -ENODEV. I've copied Liam/Mark for feedback.
> > 
> > The regulator API has an assumption that people will write bad DTs and
> > not describe all the regulators in the system, this is especially likely
> > in cases where consumer drivers initially don't have regulator support
> > and then get it added since people often only describe supplies actively
> > used by drivers.  In order to handle this gracefully the API will
> > substitute in a dummy regulator if it sees that the regulator just isn't
> > drescribed in the system but a consumer requests it, this will ensure
> > that for most simple uses the consumer will function fine even if the DT
> > is not fully described.  Since most devices won't physically work if
> > some of their supplies are missing this is a good default assumption.  
> 
> Right, if I understand correctly this is the behaviour when regulator_get
> is called (e.g. NORMAL_GET) - you get a dummy instead of an error.
> 
> > 
> > If a consumer could genuinely have some missing supplies (some devices
> > do support this for various reasons) then this support would mean that
> > the consumer would have to have some extra property to say that the
> > regulator is intentionally missing which would be bad.  Instead what we
> > do is let the consumer say that real systems could actually be missing
> > the regulator and that the dummy shouldn't be used so that the consumer
> > can handle this.
> 
> And if I understand correctly this is the behaviour when
> regulator_get_optional is called (e.g. OPTIONAL_GET) - you get -ENODEV
> instead of a dummy.
> 
> But why do we return -ENODEV and not NULL for OPTIONAL_GET?
> 
> Looking at some of the consumer drivers I can see that lots of them don't
> correctly handle the return value of regulator_get_optional:
> 
>  - some fail their probes and return upon IS_ERR(ret) - for example even
>    if -ENODEV is returned.
> 
>  - some don't fail their probes and assume the regulator isn't present upon
>    IS_ERR(ret) - yet this may not be correct as the regulator may be present
>    but -ENOMEM was returned.
> 
> Given that a common pattern is to set a consumer regulator pointer to NULL
> upon -ENODEV - if regulator_get_optional did this for them, then it would
> be more difficult for consumer drivers to get the error handling wrong and
> would remove some consumer boiler plate code.
> 
> (Of course some consumers won't set a regulator pointer to NULL and instead
> test it against IS_ERR instead of NULL everywhere (IS_ERR(NULL) is false) -
> but such a change may be a reason to not use IS_ERR everywhere).
> 
> As I understand, if a consumer wants to fail upon an absent regulator
> it seems the only way they can do this is call regulator_get_optional (which
> seems odd) and test for -ENODEV. I'm not sure if there is actually a use-case
> for this.
> 
> I guess I'm looking here for something that can simplify consumer error
> handling - it's easy to get wrong and it seems that many drivers may be wrong.

Agreed. However, this requires a thorough audit of all callers of
regulator_get_optional() to make sure they behave in a sane way. To
further complicate things, unless we want to convert all ~100 callers
in a single patch we need to convert all of them to set the regulator
pointer to NULL on -ENODEV. After that we can make the change to
regulator_get_optional() and only then can we remove the now obsolete
boilerplate from those ~100 callers. Not impossible, but pretty time-
consuming.

While at it, we could also add optional variants to some of the
*phy*_get() functions to convert those as well. Currently there's only
optional variants for phy_get() and devm_phy_get(), but a bunch of
drivers use of_phy_get() or of_phy_get_by_index(). Though especially the
latter isn't very common with optional PHYs, I think.

I also noticed a slightly similar pattern for GPIOs. Perhaps this would
be a good task for someone with good semantic patch skills. Or perhaps
something to add to the janitors' TODO list? Not sure if that's still a
thing, though.

Thierry
Mark Brown Aug. 29, 2019, 11:17 a.m. UTC | #5
On Thu, Aug 29, 2019 at 11:09:34AM +0100, Andrew Murray wrote:

> But why do we return -ENODEV and not NULL for OPTIONAL_GET?

Why would we return NULL?  The caller is going to have to check the
error code anyway since they need to handle -EPROBE_DEFER and NULL is
never a valid thing to use with the regulator API.

> Looking at some of the consumer drivers I can see that lots of them don't
> correctly handle the return value of regulator_get_optional:

This API is *really* commonly abused, especially in the graphics
subsystem which for some reason has lots of users even though I don't
think I've ever seen a sensible use of the API there.  As far as I can
tell the graphics subsystem mostly suffers from people cut'n'pasting
from the Qualcomm BSP which is full of really bad practice.  It's really
frustrating since I will intermittently go through and clean things up
but the uses just keep coming back into the subsystem.

> Given that a common pattern is to set a consumer regulator pointer to NULL
> upon -ENODEV - if regulator_get_optional did this for them, then it would
> be more difficult for consumer drivers to get the error handling wrong and
> would remove some consumer boiler plate code.

No, they'd all still be broken for probe deferral (which is commonly
caused by off-chip devices) and they'd crash as soon as they try to call
any further regulator API functions.

> I guess I'm looking here for something that can simplify consumer error
> handling - it's easy to get wrong and it seems that many drivers may be wrong.

The overwhelming majority of the users just shouldn't be using this API.
If there weren't devices that absolutely *need* this API it wouldn't be
there in the first place since it seems like a magent for bad code, it's
so disappointing how bad the majority of the consumer code is.
Thierry Reding Aug. 29, 2019, 11:46 a.m. UTC | #6
On Thu, Aug 29, 2019 at 12:17:29PM +0100, Mark Brown wrote:
> On Thu, Aug 29, 2019 at 11:09:34AM +0100, Andrew Murray wrote:
> 
> > But why do we return -ENODEV and not NULL for OPTIONAL_GET?
> 
> Why would we return NULL?  The caller is going to have to check the
> error code anyway since they need to handle -EPROBE_DEFER and NULL is
> never a valid thing to use with the regulator API.

I think the idea is that consumers would do something like this:

	supply = regulator_get_optional(dev, "foo");
	if (IS_ERR(supply)) {
		/* -EPROBE_DEFER handled as part of this */
		return PTR_ERR(supply);
	}

	/*
	 * Optional supply is NULL here, making it "easier" to check
	 * whether or not to use it.
	 */

I suppose checking using IS_ERR() is equally simple, so this mostly
boils down to taste...

The PHY subsystem, for instance, uses a similar approach but it goes one
step further. All APIs can take NULL as their struct phy * argument and
return success in that case, which makes it really trivial to deal with
optional PHYs. You really don't have to care about them at all after you
get NULL from phy_get_optional().

> > Looking at some of the consumer drivers I can see that lots of them don't
> > correctly handle the return value of regulator_get_optional:
> 
> This API is *really* commonly abused, especially in the graphics
> subsystem which for some reason has lots of users even though I don't
> think I've ever seen a sensible use of the API there.  As far as I can
> tell the graphics subsystem mostly suffers from people cut'n'pasting
> from the Qualcomm BSP which is full of really bad practice.  It's really
> frustrating since I will intermittently go through and clean things up
> but the uses just keep coming back into the subsystem.

The intention here is to make it more difficult to abuse. Obviously if
people keep copying abuses from one driver to another that's a different
story. But if there was no way to abuse the API and if it automatically
did the right thing, even copy/paste abuse would be difficult to pull
off.

> > Given that a common pattern is to set a consumer regulator pointer to NULL
> > upon -ENODEV - if regulator_get_optional did this for them, then it would
> > be more difficult for consumer drivers to get the error handling wrong and
> > would remove some consumer boiler plate code.
> 
> No, they'd all still be broken for probe deferral (which is commonly
> caused by off-chip devices) and they'd crash as soon as they try to call
> any further regulator API functions.

I think what Andrew was suggesting here was to make it easier for people
to determine whether or not an optional regulator was in fact requested
successfully or not. Many consumers already set the optional supply to
NULL and then check for that before calling any regulator API.

If regulator_get_optional() returned NULL for absent optional supplies,
this could be unified across all drivers. And it would allow treating
NULL regulators special, if that's something you'd be willing to do.

In either case, the number of abuses shows that people clearly don't
understand how to use this. So there are two options: a) fix abuse every
time we come across it or b) try to change the API to make it more
difficult to abuse.

> > I guess I'm looking here for something that can simplify consumer error
> > handling - it's easy to get wrong and it seems that many drivers may be wrong.
> 
> The overwhelming majority of the users just shouldn't be using this API.
> If there weren't devices that absolutely *need* this API it wouldn't be
> there in the first place since it seems like a magent for bad code, it's
> so disappointing how bad the majority of the consumer code is.

Yeah, I guess in many cases this API is used as a cheap way to model
always-on, fixed-voltage regulators. It's pretty hard to change those
after the fact, though, because they're usually happening as part of
device tree bindings implementation, so by the time we notice them,
they've often become ABI...

Thierry
Andrew Murray Aug. 29, 2019, 12:08 p.m. UTC | #7
On Thu, Aug 29, 2019 at 01:46:03PM +0200, Thierry Reding wrote:
> On Thu, Aug 29, 2019 at 12:17:29PM +0100, Mark Brown wrote:
> > On Thu, Aug 29, 2019 at 11:09:34AM +0100, Andrew Murray wrote:
> > 
> > > But why do we return -ENODEV and not NULL for OPTIONAL_GET?
> > 
> > Why would we return NULL?  The caller is going to have to check the
> > error code anyway since they need to handle -EPROBE_DEFER and NULL is
> > never a valid thing to use with the regulator API.
> 
> I think the idea is that consumers would do something like this:
> 
> 	supply = regulator_get_optional(dev, "foo");
> 	if (IS_ERR(supply)) {
> 		/* -EPROBE_DEFER handled as part of this */
> 		return PTR_ERR(supply);
> 	}
> 
> 	/*
> 	 * Optional supply is NULL here, making it "easier" to check
> 	 * whether or not to use it.
> 	 */

Indeed. This way the error path is only for errors (if you consider that
requesting an optional regulator that doesn't exist isn't an error - and
if you also consider that -EPROBE_DEFER is an error, or at least a reason
to bail).

> 
> I suppose checking using IS_ERR() is equally simple, so this mostly
> boils down to taste...
> 
> The PHY subsystem, for instance, uses a similar approach but it goes one
> step further. All APIs can take NULL as their struct phy * argument and
> return success in that case, which makes it really trivial to deal with
> optional PHYs. You really don't have to care about them at all after you
> get NULL from phy_get_optional().

I can see how this makes everything very convenient for the driver, though
this doesn't smell right.

> 
> > > Looking at some of the consumer drivers I can see that lots of them don't
> > > correctly handle the return value of regulator_get_optional:
> > 
> > This API is *really* commonly abused, especially in the graphics
> > subsystem which for some reason has lots of users even though I don't
> > think I've ever seen a sensible use of the API there.  As far as I can
> > tell the graphics subsystem mostly suffers from people cut'n'pasting
> > from the Qualcomm BSP which is full of really bad practice.  It's really
> > frustrating since I will intermittently go through and clean things up
> > but the uses just keep coming back into the subsystem.
> 
> The intention here is to make it more difficult to abuse. Obviously if
> people keep copying abuses from one driver to another that's a different
> story. But if there was no way to abuse the API and if it automatically
> did the right thing, even copy/paste abuse would be difficult to pull
> off.

That's the motativation here.

> 
> > > Given that a common pattern is to set a consumer regulator pointer to NULL
> > > upon -ENODEV - if regulator_get_optional did this for them, then it would
> > > be more difficult for consumer drivers to get the error handling wrong and
> > > would remove some consumer boiler plate code.
> > 
> > No, they'd all still be broken for probe deferral (which is commonly
> > caused by off-chip devices) and they'd crash as soon as they try to call
> > any further regulator API functions.

regulator_get_optional would still return -EPROBE_DEFER and this would be
caught in the above example set out by Thierry.

> 
> I think what Andrew was suggesting here was to make it easier for people
> to determine whether or not an optional regulator was in fact requested
> successfully or not. Many consumers already set the optional supply to
> NULL and then check for that before calling any regulator API.
> 
> If regulator_get_optional() returned NULL for absent optional supplies,
> this could be unified across all drivers. And it would allow treating
> NULL regulators special, if that's something you'd be willing to do.
> 
> In either case, the number of abuses shows that people clearly don't
> understand how to use this. So there are two options: a) fix abuse every
> time we come across it or b) try to change the API to make it more
> difficult to abuse.

Sure. I think we end up with something like:

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index e0c0cf462004..67e2a6d7abf6 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -1868,6 +1868,9 @@ struct regulator *_regulator_get(struct device *dev, const char *id,
                }
 
                switch (get_type) {
+               case OPTIONAL_GET:
+                       return NULL;
+
                case NORMAL_GET:
                        /*
                         * Assume that a regulator is physically present and


Of course there may be other approaches.

> 
> > > I guess I'm looking here for something that can simplify consumer error
> > > handling - it's easy to get wrong and it seems that many drivers may be wrong.
> > 
> > The overwhelming majority of the users just shouldn't be using this API.
> > If there weren't devices that absolutely *need* this API it wouldn't be
> > there in the first place since it seems like a magent for bad code, it's
> > so disappointing how bad the majority of the consumer code is.
> 
> Yeah, I guess in many cases this API is used as a cheap way to model
> always-on, fixed-voltage regulators. It's pretty hard to change those
> after the fact, though, because they're usually happening as part of
> device tree bindings implementation, so by the time we notice them,
> they've often become ABI...

Thanks,

Andrew Murray

> 
> Thierry
Andrew Murray Aug. 29, 2019, 12:13 p.m. UTC | #8
On Thu, Aug 29, 2019 at 12:48:06PM +0200, Thierry Reding wrote:
> On Thu, Aug 29, 2019 at 11:09:34AM +0100, Andrew Murray wrote:
> > On Wed, Aug 28, 2019 at 10:49:01PM +0100, Mark Brown wrote:
> > > On Wed, Aug 28, 2019 at 10:26:55PM +0100, Andrew Murray wrote:
> > > 
> > > > I initially thought that you forgot to check for -ENODEV - though I can see
> > > > that the implementation of devm_phy_optional_get very helpfully does this for
> > > > us and returns NULL instead of an error.
> > > 
> > > > What is also confusing is that devm_regulator_get_optional, despite its
> > > > _optional suffix doesn't do this and returns an error. I wonder if
> > > > devm_phy_optional_get should be changed to return NULL instead of an error
> > > > instead of -ENODEV. I've copied Liam/Mark for feedback.
> > > 
> > > The regulator API has an assumption that people will write bad DTs and
> > > not describe all the regulators in the system, this is especially likely
> > > in cases where consumer drivers initially don't have regulator support
> > > and then get it added since people often only describe supplies actively
> > > used by drivers.  In order to handle this gracefully the API will
> > > substitute in a dummy regulator if it sees that the regulator just isn't
> > > drescribed in the system but a consumer requests it, this will ensure
> > > that for most simple uses the consumer will function fine even if the DT
> > > is not fully described.  Since most devices won't physically work if
> > > some of their supplies are missing this is a good default assumption.  
> > 
> > Right, if I understand correctly this is the behaviour when regulator_get
> > is called (e.g. NORMAL_GET) - you get a dummy instead of an error.
> > 
> > > 
> > > If a consumer could genuinely have some missing supplies (some devices
> > > do support this for various reasons) then this support would mean that
> > > the consumer would have to have some extra property to say that the
> > > regulator is intentionally missing which would be bad.  Instead what we
> > > do is let the consumer say that real systems could actually be missing
> > > the regulator and that the dummy shouldn't be used so that the consumer
> > > can handle this.
> > 
> > And if I understand correctly this is the behaviour when
> > regulator_get_optional is called (e.g. OPTIONAL_GET) - you get -ENODEV
> > instead of a dummy.
> > 
> > But why do we return -ENODEV and not NULL for OPTIONAL_GET?
> > 
> > Looking at some of the consumer drivers I can see that lots of them don't
> > correctly handle the return value of regulator_get_optional:
> > 
> >  - some fail their probes and return upon IS_ERR(ret) - for example even
> >    if -ENODEV is returned.
> > 
> >  - some don't fail their probes and assume the regulator isn't present upon
> >    IS_ERR(ret) - yet this may not be correct as the regulator may be present
> >    but -ENOMEM was returned.
> > 
> > Given that a common pattern is to set a consumer regulator pointer to NULL
> > upon -ENODEV - if regulator_get_optional did this for them, then it would
> > be more difficult for consumer drivers to get the error handling wrong and
> > would remove some consumer boiler plate code.
> > 
> > (Of course some consumers won't set a regulator pointer to NULL and instead
> > test it against IS_ERR instead of NULL everywhere (IS_ERR(NULL) is false) -
> > but such a change may be a reason to not use IS_ERR everywhere).
> > 
> > As I understand, if a consumer wants to fail upon an absent regulator
> > it seems the only way they can do this is call regulator_get_optional (which
> > seems odd) and test for -ENODEV. I'm not sure if there is actually a use-case
> > for this.
> > 
> > I guess I'm looking here for something that can simplify consumer error
> > handling - it's easy to get wrong and it seems that many drivers may be wrong.
> 
> Agreed. However, this requires a thorough audit of all callers of
> regulator_get_optional() to make sure they behave in a sane way. To
> further complicate things, unless we want to convert all ~100 callers
> in a single patch we need to convert all of them to set the regulator
> pointer to NULL on -ENODEV. After that we can make the change to
> regulator_get_optional() and only then can we remove the now obsolete
> boilerplate from those ~100 callers. Not impossible, but pretty time-
> consuming.

This makes sense.

> 
> While at it, we could also add optional variants to some of the
> *phy*_get() functions to convert those as well. Currently there's only
> optional variants for phy_get() and devm_phy_get(), but a bunch of
> drivers use of_phy_get() or of_phy_get_by_index(). Though especially the
> latter isn't very common with optional PHYs, I think.
> 
> I also noticed a slightly similar pattern for GPIOs. Perhaps this would
> be a good task for someone with good semantic patch skills. Or perhaps
> something to add to the janitors' TODO list? Not sure if that's still a
> thing, though.

Yeah I'm pretty sure this applies to several APIs.

Janitors is still around - http://vger.kernel.org/vger-lists.html#kernel-janitors

I think there are others too, such as the Linux Kernel Mentorship Program

Thanks,

Andrew Murray

> 
> Thierry
Mark Brown Aug. 29, 2019, 1:03 p.m. UTC | #9
On Thu, Aug 29, 2019 at 01:46:03PM +0200, Thierry Reding wrote:
> On Thu, Aug 29, 2019 at 12:17:29PM +0100, Mark Brown wrote:

> > Why would we return NULL?  The caller is going to have to check the
> > error code anyway since they need to handle -EPROBE_DEFER and NULL is
> > never a valid thing to use with the regulator API.

> I think the idea is that consumers would do something like this:

> 	supply = regulator_get_optional(dev, "foo");
> 	if (IS_ERR(supply)) {
> 		/* -EPROBE_DEFER handled as part of this */
> 		return PTR_ERR(supply);
> 	}

> 	/*
> 	 * Optional supply is NULL here, making it "easier" to check
> 	 * whether or not to use it.
> 	 */

> I suppose checking using IS_ERR() is equally simple, so this mostly
> boils down to taste...

Or setting a flag saying which mode to drive the chip in for example.

> The PHY subsystem, for instance, uses a similar approach but it goes one
> step further. All APIs can take NULL as their struct phy * argument and
> return success in that case, which makes it really trivial to deal with
> optional PHYs. You really don't have to care about them at all after you
> get NULL from phy_get_optional().

That case really doesn't exist for the regulator API, that's the case
which is handled by the dummy regulator in the regulator API - it really
is rare to see devices that operate without power.  I suspect the PHY
case is similar in that there always is a PHY of some kind.  If a
consumer can be written like that then it just shouldn't be using
regulator_get_optional() in the first place.

> > > Looking at some of the consumer drivers I can see that lots of them don't
> > > correctly handle the return value of regulator_get_optional:

> > This API is *really* commonly abused, especially in the graphics
> > subsystem which for some reason has lots of users even though I don't
> > think I've ever seen a sensible use of the API there.  As far as I can
> > tell the graphics subsystem mostly suffers from people cut'n'pasting
> > from the Qualcomm BSP which is full of really bad practice.  It's really
> > frustrating since I will intermittently go through and clean things up
> > but the uses just keep coming back into the subsystem.

> The intention here is to make it more difficult to abuse. Obviously if
> people keep copying abuses from one driver to another that's a different
> story. But if there was no way to abuse the API and if it automatically
> did the right thing, even copy/paste abuse would be difficult to pull
> off.

I fail to see how returning NULL would change anything with regard to
the API being abused, if anything I think it'd make it more likely to
have people write things that break for probe deferral since it's not
reminding people about error codes (that's very marginal though).

> > No, they'd all still be broken for probe deferral (which is commonly
> > caused by off-chip devices) and they'd crash as soon as they try to call
> > any further regulator API functions.

> I think what Andrew was suggesting here was to make it easier for people
> to determine whether or not an optional regulator was in fact requested
> successfully or not. Many consumers already set the optional supply to
> NULL and then check for that before calling any regulator API.

I think that really is very marginal.

> If regulator_get_optional() returned NULL for absent optional supplies,
> this could be unified across all drivers. And it would allow treating
> NULL regulators special, if that's something you'd be willing to do.

Not really, no.  What we're doing at the minute at least mitigates
against the problems we used to have with people just not handling
errors at all and having the dummy regulator gives us some opportunity
to do checks on API usage in the consumers that would otherwise not get
run which helps robustness for the systems where there are real,
controllable regulators providing those supplies.

> In either case, the number of abuses shows that people clearly don't
> understand how to use this. So there are two options: a) fix abuse every
> time we come across it or b) try to change the API to make it more
> difficult to abuse.

I don't think there's much that can be done to avoid abuse of the API, 
I've thought of things like having a #define around the prototype for
"yes I really meant to use this" which consumers would have to define in
an effort to try to flag up to developers and reviewers that it's not
normal.

> > > I guess I'm looking here for something that can simplify consumer error
> > > handling - it's easy to get wrong and it seems that many drivers may be wrong.

> > The overwhelming majority of the users just shouldn't be using this API.
> > If there weren't devices that absolutely *need* this API it wouldn't be
> > there in the first place since it seems like a magent for bad code, it's
> > so disappointing how bad the majority of the consumer code is.

> Yeah, I guess in many cases this API is used as a cheap way to model
> always-on, fixed-voltage regulators. It's pretty hard to change those
> after the fact, though, because they're usually happening as part of
> device tree bindings implementation, so by the time we notice them,
> they've often become ABI...

I don't follow here?  Such users are going to be the common case for the
regulator API and shouldn't have any problems using normal regulator_get().  
When I say users shouldn't be using this API I mean _get_optional()
specifically.
Mark Brown Aug. 29, 2019, 1:16 p.m. UTC | #10
On Thu, Aug 29, 2019 at 01:08:35PM +0100, Andrew Murray wrote:
> On Thu, Aug 29, 2019 at 01:46:03PM +0200, Thierry Reding wrote:

> > If regulator_get_optional() returned NULL for absent optional supplies,
> > this could be unified across all drivers. And it would allow treating
> > NULL regulators special, if that's something you'd be willing to do.

> > In either case, the number of abuses shows that people clearly don't
> > understand how to use this. So there are two options: a) fix abuse every
> > time we come across it or b) try to change the API to make it more
> > difficult to abuse.

> Sure. I think we end up with something like:

> diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
> index e0c0cf462004..67e2a6d7abf6 100644
> --- a/drivers/regulator/core.c
> +++ b/drivers/regulator/core.c
> @@ -1868,6 +1868,9 @@ struct regulator *_regulator_get(struct device *dev, const char *id,
>                 }
>  
>                 switch (get_type) {
> +               case OPTIONAL_GET:
> +                       return NULL;
> +

Implementing returning NULL is not hard.  How returning NULL discourages
people from using regulator_get_optional() when they shouldn't be using
it in the first place is not clear to me.
Andrew Murray Aug. 29, 2019, 1:43 p.m. UTC | #11
On Thu, Aug 29, 2019 at 02:16:03PM +0100, Mark Brown wrote:
> On Thu, Aug 29, 2019 at 01:08:35PM +0100, Andrew Murray wrote:
> > On Thu, Aug 29, 2019 at 01:46:03PM +0200, Thierry Reding wrote:
> 
> > > If regulator_get_optional() returned NULL for absent optional supplies,
> > > this could be unified across all drivers. And it would allow treating
> > > NULL regulators special, if that's something you'd be willing to do.
> 
> > > In either case, the number of abuses shows that people clearly don't
> > > understand how to use this. So there are two options: a) fix abuse every
> > > time we come across it or b) try to change the API to make it more
> > > difficult to abuse.
> 
> > Sure. I think we end up with something like:
> 
> > diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
> > index e0c0cf462004..67e2a6d7abf6 100644
> > --- a/drivers/regulator/core.c
> > +++ b/drivers/regulator/core.c
> > @@ -1868,6 +1868,9 @@ struct regulator *_regulator_get(struct device *dev, const char *id,
> >                 }
> >  
> >                 switch (get_type) {
> > +               case OPTIONAL_GET:
> > +                       return NULL;
> > +
> 
> Implementing returning NULL is not hard.  How returning NULL discourages
> people from using regulator_get_optional() when they shouldn't be using
> it in the first place is not clear to me.

I think this is the part I haven't understood until now.

There are many consumer drivers that will not have a regulator specified in
the DT - this may be because they are optional (possibly a rare thing) or
because they don't need to be specified (because they are always on and
require no software interaction)...

Where they are not specified, because there is really no reason for them to
be described in the DT - then these drivers should use regulator_get and
be happy with a dummy regulator. This gives a benefit as if another hardware
version uses the same driver but does have a regulator that needs software
control then we can be confident it will work.

Where regulators are really optional, then regulator_get_optional is used
and -ENODEV can be used by the caller to perform a different course of action
if required. (Does this use-case actually exist?)

I guess I interpreted _optional as 'it's OK if you can't provide a regulator',
whereas the meaning is really 'get me a regulator that may not exist'.

Is my understanding correct? If so I guess another course of action would
be to clean-up users of _optional and convert them to regulator_get where
appropriate?

Thanks,

Andrew Murray
Thierry Reding Aug. 29, 2019, 2:58 p.m. UTC | #12
On Thu, Aug 29, 2019 at 02:03:23PM +0100, Mark Brown wrote:
> On Thu, Aug 29, 2019 at 01:46:03PM +0200, Thierry Reding wrote:
> > On Thu, Aug 29, 2019 at 12:17:29PM +0100, Mark Brown wrote:
> 
> > > Why would we return NULL?  The caller is going to have to check the
> > > error code anyway since they need to handle -EPROBE_DEFER and NULL is
> > > never a valid thing to use with the regulator API.
> 
> > I think the idea is that consumers would do something like this:
> 
> > 	supply = regulator_get_optional(dev, "foo");
> > 	if (IS_ERR(supply)) {
> > 		/* -EPROBE_DEFER handled as part of this */
> > 		return PTR_ERR(supply);
> > 	}
> 
> > 	/*
> > 	 * Optional supply is NULL here, making it "easier" to check
> > 	 * whether or not to use it.
> > 	 */
> 
> > I suppose checking using IS_ERR() is equally simple, so this mostly
> > boils down to taste...
> 
> Or setting a flag saying which mode to drive the chip in for example.
> 
> > The PHY subsystem, for instance, uses a similar approach but it goes one
> > step further. All APIs can take NULL as their struct phy * argument and
> > return success in that case, which makes it really trivial to deal with
> > optional PHYs. You really don't have to care about them at all after you
> > get NULL from phy_get_optional().
> 
> That case really doesn't exist for the regulator API, that's the case
> which is handled by the dummy regulator in the regulator API - it really
> is rare to see devices that operate without power.  I suspect the PHY
> case is similar in that there always is a PHY of some kind.  If a
> consumer can be written like that then it just shouldn't be using
> regulator_get_optional() in the first place.

One recent example, which is actually what spawned this series of
cleanups, that comes to mind here is the case where we need regulators
to supply power to a PCI device. The PCI device that consumes power is
itself not listed in the device tree, because it is usually enumerated
via the PCI bus. However, we still have to provide power to a PCI slot
to make sure the device powers up and can be enumerated in the first
place. For this particular case it's pretty obvious that the supplies
are in fact required.

However, the same PCI controller can also be used with onboard devices
where no standard 3.3 V and 12 V need to be supplied (as in the PCIe
slot case above). Instead there are usually different power supplies
to power the device. Also, since these supplies are usually required to
bring the device up and make it detectable on the PCI bus, these
supplies are typically always on. Also, since the PCI controller is not
the consumer of the power supplies, it's impossible to specify which
supplies exactly are needed by the device (they'd have to be described
by a binding for these devices, but drivers couldn't be requesting them
because without the supplies being enabled the device would never show
up on the PCI bus and the driver would never bind).

Do you think those 3.3 V and 12 V regulators would qualify for use of
the regulator_get_optional() API? Because in the case where the PCIe
controller doesn't drive a PCIe slot, the 3.3 V and 12 V supplies really
are not there.

> > > > Looking at some of the consumer drivers I can see that lots of them don't
> > > > correctly handle the return value of regulator_get_optional:
> 
> > > This API is *really* commonly abused, especially in the graphics
> > > subsystem which for some reason has lots of users even though I don't
> > > think I've ever seen a sensible use of the API there.  As far as I can
> > > tell the graphics subsystem mostly suffers from people cut'n'pasting
> > > from the Qualcomm BSP which is full of really bad practice.  It's really
> > > frustrating since I will intermittently go through and clean things up
> > > but the uses just keep coming back into the subsystem.
> 
> > The intention here is to make it more difficult to abuse. Obviously if
> > people keep copying abuses from one driver to another that's a different
> > story. But if there was no way to abuse the API and if it automatically
> > did the right thing, even copy/paste abuse would be difficult to pull
> > off.
> 
> I fail to see how returning NULL would change anything with regard to
> the API being abused, if anything I think it'd make it more likely to
> have people write things that break for probe deferral since it's not
> reminding people about error codes (that's very marginal though).

People would of course still need to check the return value for errors.
Returning NULL would only make sure that if there's really no regulator
there's a standard way to signal that. And drivers could always use
similar code patterns to deal with optional regulators. Right now there
are two patterns: set regulator to NULL if the regulator is not
available (and use !supply checks before calling regulator APIs) or
leave the regulator set to whatever error pointer was returned (and use
!IS_ERR() checks before calling regulator APIs).

In many cases above, the drivers will continue without a regulator
irrespective of the error code returned. If we return NULL in exactly
only the case where the regulator is not there and an ERR_PTR()
otherwise, it becomes much clearer that all errors should just be
propagated to the caller (including -EPROBE_DEFER) and otherwise we can
continue with appropriate NULL checks.

Again, yeah, this might be marginal.

> > > No, they'd all still be broken for probe deferral (which is commonly
> > > caused by off-chip devices) and they'd crash as soon as they try to call
> > > any further regulator API functions.
> 
> > I think what Andrew was suggesting here was to make it easier for people
> > to determine whether or not an optional regulator was in fact requested
> > successfully or not. Many consumers already set the optional supply to
> > NULL and then check for that before calling any regulator API.
> 
> I think that really is very marginal.
> 
> > If regulator_get_optional() returned NULL for absent optional supplies,
> > this could be unified across all drivers. And it would allow treating
> > NULL regulators special, if that's something you'd be willing to do.
> 
> Not really, no.  What we're doing at the minute at least mitigates
> against the problems we used to have with people just not handling
> errors at all and having the dummy regulator gives us some opportunity
> to do checks on API usage in the consumers that would otherwise not get
> run which helps robustness for the systems where there are real,
> controllable regulators providing those supplies.
> 
> > In either case, the number of abuses shows that people clearly don't
> > understand how to use this. So there are two options: a) fix abuse every
> > time we come across it or b) try to change the API to make it more
> > difficult to abuse.
> 
> I don't think there's much that can be done to avoid abuse of the API, 
> I've thought of things like having a #define around the prototype for
> "yes I really meant to use this" which consumers would have to define in
> an effort to try to flag up to developers and reviewers that it's not
> normal.
> 
> > > > I guess I'm looking here for something that can simplify consumer error
> > > > handling - it's easy to get wrong and it seems that many drivers may be wrong.
> 
> > > The overwhelming majority of the users just shouldn't be using this API.
> > > If there weren't devices that absolutely *need* this API it wouldn't be
> > > there in the first place since it seems like a magent for bad code, it's
> > > so disappointing how bad the majority of the consumer code is.
> 
> > Yeah, I guess in many cases this API is used as a cheap way to model
> > always-on, fixed-voltage regulators. It's pretty hard to change those
> > after the fact, though, because they're usually happening as part of
> > device tree bindings implementation, so by the time we notice them,
> > they've often become ABI...
> 
> I don't follow here?  Such users are going to be the common case for the
> regulator API and shouldn't have any problems using normal regulator_get().  
> When I say users shouldn't be using this API I mean _get_optional()
> specifically.

True, even if the regulator is specified as optional in the bindings,
using regulator_get() would effectively make it optional for the driver
given that it returns the dummy.

Thierry
Mark Brown Aug. 29, 2019, 3:25 p.m. UTC | #13
On Thu, Aug 29, 2019 at 02:43:46PM +0100, Andrew Murray wrote:

> Where they are not specified, because there is really no reason for them to
> be described in the DT - then these drivers should use regulator_get and
> be happy with a dummy regulator. This gives a benefit as if another hardware
> version uses the same driver but does have a regulator that needs software
> control then we can be confident it will work.

The common use case for this is that some boards have flexible power
control which allows them save energy by powering off chips that are not
in use, the driver doesn't super care if it actually gets powered off or
not but it's nice to be able to do so.

> Where regulators are really optional, then regulator_get_optional is used
> and -ENODEV can be used by the caller to perform a different course of action
> if required. (Does this use-case actually exist?)

Yes.  There are two main use cases.  One is for things like MMC where
there's optional supplies that can be used for some more advanced use
cases but their use needs to be negotiated between the host and device,
these typically enable lower power or higher performance modes at the
cost of complexity.  The other is for devices which have the option of
using an internal regulator but can use an external one.  This is
typically used either for performance reasons (the external regulator
might perform better but will increase board cost) or if some systems
need multiple devices to be operating with the same reference voltage.

> I guess I interpreted _optional as 'it's OK if you can't provide a regulator',
> whereas the meaning is really 'get me a regulator that may not exist'.

> Is my understanding correct? If so I guess another course of action would

Yes.

> be to clean-up users of _optional and convert them to regulator_get where
> appropriate?

Yes, I keep doing that intermittently (hence my frustration with more
usages continually popping up in graphics drivers).
Mark Brown Aug. 29, 2019, 5:55 p.m. UTC | #14
On Thu, Aug 29, 2019 at 04:58:20PM +0200, Thierry Reding wrote:

> However, the same PCI controller can also be used with onboard devices
> where no standard 3.3 V and 12 V need to be supplied (as in the PCIe
> slot case above). Instead there are usually different power supplies
> to power the device. Also, since these supplies are usually required to
> bring the device up and make it detectable on the PCI bus, these
> supplies are typically always on. Also, since the PCI controller is not
> the consumer of the power supplies, it's impossible to specify which
> supplies exactly are needed by the device (they'd have to be described
> by a binding for these devices, but drivers couldn't be requesting them
> because without the supplies being enabled the device would never show
> up on the PCI bus and the driver would never bind).

These on board devices that might have non-standard supplies are more of
a problem as you say.  This is a general problem with enumerable buses
that get deployed in embedded contexts, things like clocks and GPIOs can
also be required for enumeration.  Ideally we'd have some pre-probe
callback that could sort these things out but that's core device model
work I've never found the time/enthusiasm to work on.  IIRC there is
already a PCI binding for DT so I guess we could do something like say
it's up to the driver for the PCI device and just call probe() even
without enumeration and require the device to power things up but that
feels very messy.

> Do you think those 3.3 V and 12 V regulators would qualify for use of
> the regulator_get_optional() API? Because in the case where the PCIe
> controller doesn't drive a PCIe slot, the 3.3 V and 12 V supplies really
> are not there.

It doesn't fill me with joy.  I think the main thing is working out
where to attach the supply.

The cleanest and most idiomatic thing from a regulator perspective for
the physical slots would be for the supplies to be attached to the PCI
slot rather than the controller in the DT, even though they will get
driven by the controller driver in practice.  This would mean that
controllers would have optional slot objects with mandatory regulators,
the controller would then have to cope with powering the slot up before
enumerating it but would be able to power it off if nothing's plugged in
and save a bit of power, it also copes with the case where individual
slots have separately controllable power.  I'm not sure how this plays
more generally but since the slots are a physical thing you can point to
on the board it doesn't seem unreasonable.  Every time I see these
supplies attached to the controller for a bus it always feels a bit
wrong, especially in interactions with soldered down devices.

That feels cleaner than saying that the controller has a couple of
optional supplies, it plays a bit better with soldered down devices and
with slots having distinct supplies and it generally feels a bit more
consistent.  I'm not super sure I'm *happy* with this approach though,
it feels a bit awkward with the device model.

> > When I say users shouldn't be using this API I mean _get_optional()
> > specifically.

> True, even if the regulator is specified as optional in the bindings,
> using regulator_get() would effectively make it optional for the driver
> given that it returns the dummy.

Unless the hardware can genuinely cope with there being literally no
power supply there (as opposed to some fixed voltage thing) it probably
shouldn't be specifying the supply as optional in the bindings either :/
diff mbox series

Patch

diff --git a/drivers/pci/controller/pcie-iproc-platform.c b/drivers/pci/controller/pcie-iproc-platform.c
index 5a3550b6bb29..9ee6200a66f4 100644
--- a/drivers/pci/controller/pcie-iproc-platform.c
+++ b/drivers/pci/controller/pcie-iproc-platform.c
@@ -93,12 +93,9 @@  static int iproc_pcie_pltfm_probe(struct platform_device *pdev)
 	pcie->need_ib_cfg = of_property_read_bool(np, "dma-ranges");
 
 	/* PHY use is optional */
-	pcie->phy = devm_phy_get(dev, "pcie-phy");
-	if (IS_ERR(pcie->phy)) {
-		if (PTR_ERR(pcie->phy) == -EPROBE_DEFER)
-			return -EPROBE_DEFER;
-		pcie->phy = NULL;
-	}
+	pcie->phy = devm_phy_optional_get(dev, "pcie-phy");
+	if (IS_ERR(pcie->phy))
+		return PTR_ERR(pcie->phy);
 
 	ret = devm_of_pci_get_host_bridge_resources(dev, 0, 0xff, &resources,
 						    &iobase);