[tpmdd-devel] base/platform: fix panic when probe function is NULL
diff mbox

Message ID 1448564494-23218-1-git-send-email-martin.wilck@ts.fujitsu.com
State New
Headers show

Commit Message

Martin Wilck Nov. 26, 2015, 7:01 p.m. UTC
From: Martin Wilck <Martin.Wilck@ts.fujitsu.com>

Since b8b2c7d845d5, platform_drv_probe() is called for all platform
devices. If drv->probe is NULL, and dev_pm_domain_attach() fails,
platform_drv_probe() will return the error code from dev_pm_domain_attach().

This causes real_probe() to enter the "probe_failed" path and set
dev->driver to NULL. Before b8b2c7d845d5, real_probe() would assume
success if both dev->bus->probe and drv->probe are missing.

This may cause a panic later. For example, inserting the tpm_tis
driver with parameter "force=1" (i.e. registering tpm_tis as a platform
driver) will panic in tpmm_chip_alloc() because dev->driver is NULL:

     chip->cdev.owner = chip->pdev->driver->owner;

This patch fixes this by returning success in platform_drv_probe() if
"just" dev_pm_domain_attach() had failed. This restores the semantics
of platform_device_register_XXX() if the associated platform driver has
no "probe" function.

Fixes: b8b2c7d845d5 ("base/platform: assert that dev_pm_domain
callbacks are called unconditionally")

Signed-off-by: Martin Wilck <Martin.Wilck@ts.fujitsu.com>
---
 drivers/base/platform.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

Comments

Jason Gunthorpe Nov. 26, 2015, 8:30 p.m. UTC | #1
On Thu, Nov 26, 2015 at 08:01:34PM +0100, martin.wilck@ts.fujitsu.com wrote:
> From: Martin Wilck <Martin.Wilck@ts.fujitsu.com>
> 
> Since b8b2c7d845d5, platform_drv_probe() is called for all platform
> devices. If drv->probe is NULL, and dev_pm_domain_attach() fails,
> platform_drv_probe() will return the error code from dev_pm_domain_attach().
> 
> This causes real_probe() to enter the "probe_failed" path and set
> dev->driver to NULL. Before b8b2c7d845d5, real_probe() would assume
> success if both dev->bus->probe and drv->probe are missing.
> 
> This may cause a panic later. For example, inserting the tpm_tis
> driver with parameter "force=1" (i.e. registering tpm_tis as a platform
> driver) will panic in tpmm_chip_alloc() because dev->driver is NULL:
> 
>      chip->cdev.owner = chip->pdev->driver->owner;

Is this happening because tpm_tis is not creating the platform device
properly? ie it just calls platform_device_register_simple and then
force initializes it via tpm_tis_init, which expects to be called from
a probe function with an attached driver.

Instead we should setup a proper platform device with the default
IO range for x86 and let the driver core call tpm_tis_init via
tis_drv.probe.

Would changing things in this way fix the problem you've observed?

I have some patches to do this that are part of my OF enablement
series, but I can make something simpler that would deal with this
fairly quickly if you can test.

Jason

------------------------------------------------------------------------------
Go from Idea to Many App Stores Faster with Intel(R) XDK
Give your users amazing mobile app experiences with Intel(R) XDK.
Use one codebase in this all-in-one HTML5 development environment.
Design, debug & build mobile apps & 2D/3D high-impact games for multiple OSs.
http://pubads.g.doubleclick.net/gampad/clk?id=254741551&iu=/4140
Martin Wilck Nov. 27, 2015, 7:32 a.m. UTC | #2
On Do, 2015-11-26 at 13:30 -0700, Jason Gunthorpe wrote:
> On Thu, Nov 26, 2015 at 08:01:34PM +0100, martin.wilck@ts.fujitsu.com wrote:
> > From: Martin Wilck <Martin.Wilck@ts.fujitsu.com>
> > 
> > Since b8b2c7d845d5, platform_drv_probe() is called for all platform
> > devices. If drv->probe is NULL, and dev_pm_domain_attach() fails,
> > platform_drv_probe() will return the error code from dev_pm_domain_attach().
> > 
> > This causes real_probe() to enter the "probe_failed" path and set
> > dev->driver to NULL. Before b8b2c7d845d5, real_probe() would assume
> > success if both dev->bus->probe and drv->probe are missing.
> > 
> > This may cause a panic later. For example, inserting the tpm_tis
> > driver with parameter "force=1" (i.e. registering tpm_tis as a platform
> > driver) will panic in tpmm_chip_alloc() because dev->driver is NULL:
> > 
> >      chip->cdev.owner = chip->pdev->driver->owner;
> 
> Is this happening because tpm_tis is not creating the platform device
> properly? ie it just calls platform_device_register_simple and then
> force initializes it via tpm_tis_init, which expects to be called from
> a probe function with an attached driver.
> 
> Instead we should setup a proper platform device with the default
> IO range for x86 and let the driver core call tpm_tis_init via
> tis_drv.probe.
> 
> Would changing things in this way fix the problem you've observed?

I think so. Nonetheless, patch b8b2c7d845d5 introduced a change in the
way platform device registration behaves. The platform device code seems
to be prepared for cases where platform_driver->probe == NULL, so that
case should be handled gracefully. Otherwise, failure should occur
earlier, e.g. when platform_driver_register() is called with 
platform_driver->probe == NULL. tpm_tis may not be the only driver that
uses platform_device_register_simple() in this way.

> I have some patches to do this that are part of my OF enablement
> series, but I can make something simpler that would deal with this
> fairly quickly if you can test.

Let's first wait what the platform guys say.

Martin

------------------------------------------------------------------------------
Uwe Kleine-König Nov. 27, 2015, 10:11 a.m. UTC | #3
Hello Martin,

On Thu, Nov 26, 2015 at 08:01:34PM +0100, martin.wilck@ts.fujitsu.com wrote:
> From: Martin Wilck <Martin.Wilck@ts.fujitsu.com>
> 
> Since b8b2c7d845d5, platform_drv_probe() is called for all platform
> devices. If drv->probe is NULL, and dev_pm_domain_attach() fails,
> platform_drv_probe() will return the error code from dev_pm_domain_attach().

Correct, this is an unintended change of behaviour introduced in
b8b2c7d845d5.

> This causes real_probe() to enter the "probe_failed" path and set
> dev->driver to NULL. Before b8b2c7d845d5, real_probe() would assume
> success if both dev->bus->probe and drv->probe are missing.
> 
> This may cause a panic later. For example, inserting the tpm_tis
> driver with parameter "force=1" (i.e. registering tpm_tis as a platform
> driver) will panic in tpmm_chip_alloc() because dev->driver is NULL:
> 
>      chip->cdev.owner = chip->pdev->driver->owner;

This sounds like a separate issue though. Looking at init_tis there is:

        rc = platform_driver_register(&tis_drv);
        if (rc < 0)
                return rc;
        pdev = platform_device_register_simple("tpm_tis", -1, NULL, 0);
        if (IS_ERR(pdev)) {
                rc = PTR_ERR(pdev);
                goto err_dev;
        }
        rc = tpm_tis_init(&pdev->dev, &tis_default_info, NULL);

tpm_tis_init calls tpmm_chip_alloc which barfs when pdev (i.e. the return value
of platform_device_register_simple above) isn't bound. It is not allowed
to assume that the device is bound after the above function calls.

So I'd say drop the paragraph about tpm_tis and the change is fine.

> This patch fixes this by returning success in platform_drv_probe() if
> "just" dev_pm_domain_attach() had failed. This restores the semantics
> of platform_device_register_XXX() if the associated platform driver has
> no "probe" function.
> 
> Fixes: b8b2c7d845d5 ("base/platform: assert that dev_pm_domain
> callbacks are called unconditionally")
> 

I think line breaks in the Fixes: line are frowned on. Also usually
there is no empty line between Fixes: and S-o-b:.

> Signed-off-by: Martin Wilck <Martin.Wilck@ts.fujitsu.com>
> ---
>  drivers/base/platform.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index 1dd6d3b..c994e76 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -513,10 +513,14 @@ static int platform_drv_probe(struct device *_dev)
>  		return ret;
>  
>  	ret = dev_pm_domain_attach(_dev, true);
> -	if (ret != -EPROBE_DEFER && drv->probe) {
> -		ret = drv->probe(dev);
> -		if (ret)
> -			dev_pm_domain_detach(_dev, true);
> +	if (ret != -EPROBE_DEFER) {
> +		if (drv->probe) {
> +			ret = drv->probe(dev);
> +			if (ret)
> +				dev_pm_domain_detach(_dev, true);
> +		} else
> +			/* don't fail if just dev_pm_domain_attach failed */
> +			ret = 0;

An else that has a } should also have a {, according to 
checkpatch and Documentation/CodingStyle. You can write it
alternatively as:

	if (ret != -EPROBE_DEFER) {
		if (drv->probe)
			ret = drv->probe(dev);
		else
			ret = 0;

		if (ret)
			dev_pm_domain_detach(_dev, true);
	}

.

Best regards
Uwe
Jarkko Sakkinen Nov. 28, 2015, 4:34 p.m. UTC | #4
On Thu, Nov 26, 2015 at 08:01:34PM +0100, martin.wilck@ts.fujitsu.com wrote:
> From: Martin Wilck <Martin.Wilck@ts.fujitsu.com>
> 
> Since b8b2c7d845d5, platform_drv_probe() is called for all platform
> devices. If drv->probe is NULL, and dev_pm_domain_attach() fails,
> platform_drv_probe() will return the error code from dev_pm_domain_attach().
> 
> This causes real_probe() to enter the "probe_failed" path and set
> dev->driver to NULL. Before b8b2c7d845d5, real_probe() would assume
> success if both dev->bus->probe and drv->probe are missing.
> 
> This may cause a panic later. For example, inserting the tpm_tis
> driver with parameter "force=1" (i.e. registering tpm_tis as a platform
> driver) will panic in tpmm_chip_alloc() because dev->driver is NULL:
> 
>      chip->cdev.owner = chip->pdev->driver->owner;
> 
> This patch fixes this by returning success in platform_drv_probe() if
> "just" dev_pm_domain_attach() had failed. This restores the semantics
> of platform_device_register_XXX() if the associated platform driver has
> no "probe" function.
> 
> Fixes: b8b2c7d845d5 ("base/platform: assert that dev_pm_domain
> callbacks are called unconditionally")
> 
> Signed-off-by: Martin Wilck <Martin.Wilck@ts.fujitsu.com>

Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>

> ---
>  drivers/base/platform.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index 1dd6d3b..c994e76 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -513,10 +513,14 @@ static int platform_drv_probe(struct device *_dev)
>  		return ret;
>  
>  	ret = dev_pm_domain_attach(_dev, true);
> -	if (ret != -EPROBE_DEFER && drv->probe) {
> -		ret = drv->probe(dev);
> -		if (ret)
> -			dev_pm_domain_detach(_dev, true);
> +	if (ret != -EPROBE_DEFER) {
> +		if (drv->probe) {
> +			ret = drv->probe(dev);
> +			if (ret)
> +				dev_pm_domain_detach(_dev, true);
> +		} else
> +			/* don't fail if just dev_pm_domain_attach failed */
> +			ret = 0;
>  	}
>  
>  	if (drv->prevent_deferred_probe && ret == -EPROBE_DEFER) {
> -- 
> 1.8.3.1
> 
> 


------------------------------------------------------------------------------
Jarkko Sakkinen Nov. 28, 2015, 4:40 p.m. UTC | #5
On Thu, Nov 26, 2015 at 01:30:31PM -0700, Jason Gunthorpe wrote:
> On Thu, Nov 26, 2015 at 08:01:34PM +0100, martin.wilck@ts.fujitsu.com wrote:
> > From: Martin Wilck <Martin.Wilck@ts.fujitsu.com>
> > 
> > Since b8b2c7d845d5, platform_drv_probe() is called for all platform
> > devices. If drv->probe is NULL, and dev_pm_domain_attach() fails,
> > platform_drv_probe() will return the error code from dev_pm_domain_attach().
> > 
> > This causes real_probe() to enter the "probe_failed" path and set
> > dev->driver to NULL. Before b8b2c7d845d5, real_probe() would assume
> > success if both dev->bus->probe and drv->probe are missing.
> > 
> > This may cause a panic later. For example, inserting the tpm_tis
> > driver with parameter "force=1" (i.e. registering tpm_tis as a platform
> > driver) will panic in tpmm_chip_alloc() because dev->driver is NULL:
> > 
> >      chip->cdev.owner = chip->pdev->driver->owner;
> 
> Is this happening because tpm_tis is not creating the platform device
> properly? ie it just calls platform_device_register_simple and then
> force initializes it via tpm_tis_init, which expects to be called from
> a probe function with an attached driver.

Agreed. We should have a probe callback.

> Instead we should setup a proper platform device with the default
> IO range for x86 and let the driver core call tpm_tis_init via
> tis_drv.probe.
> 
> Would changing things in this way fix the problem you've observed?
> 
> I have some patches to do this that are part of my OF enablement
> series, but I can make something simpler that would deal with this
> fairly quickly if you can test.

Does the patch set that you sent include the fix or not? I haven't yet
reviewed them properly.

> Jason

/Jarkko

------------------------------------------------------------------------------
Jarkko Sakkinen Nov. 28, 2015, 4:49 p.m. UTC | #6
On Sat, Nov 28, 2015 at 06:40:03PM +0200, Jarkko Sakkinen wrote:
> On Thu, Nov 26, 2015 at 01:30:31PM -0700, Jason Gunthorpe wrote:
> > On Thu, Nov 26, 2015 at 08:01:34PM +0100, martin.wilck@ts.fujitsu.com wrote:
> > > From: Martin Wilck <Martin.Wilck@ts.fujitsu.com>
> > > 
> > > Since b8b2c7d845d5, platform_drv_probe() is called for all platform
> > > devices. If drv->probe is NULL, and dev_pm_domain_attach() fails,
> > > platform_drv_probe() will return the error code from dev_pm_domain_attach().
> > > 
> > > This causes real_probe() to enter the "probe_failed" path and set
> > > dev->driver to NULL. Before b8b2c7d845d5, real_probe() would assume
> > > success if both dev->bus->probe and drv->probe are missing.
> > > 
> > > This may cause a panic later. For example, inserting the tpm_tis
> > > driver with parameter "force=1" (i.e. registering tpm_tis as a platform
> > > driver) will panic in tpmm_chip_alloc() because dev->driver is NULL:
> > > 
> > >      chip->cdev.owner = chip->pdev->driver->owner;
> > 
> > Is this happening because tpm_tis is not creating the platform device
> > properly? ie it just calls platform_device_register_simple and then
> > force initializes it via tpm_tis_init, which expects to be called from
> > a probe function with an attached driver.
> 
> Agreed. We should have a probe callback.
> 
> > Instead we should setup a proper platform device with the default
> > IO range for x86 and let the driver core call tpm_tis_init via
> > tis_drv.probe.
> > 
> > Would changing things in this way fix the problem you've observed?
> > 
> > I have some patches to do this that are part of my OF enablement
> > series, but I can make something simpler that would deal with this
> > fairly quickly if you can test.
> 
> Does the patch set that you sent include the fix or not? I haven't yet
> reviewed them properly.

Another question: does you patch series include an alternative fix for
the probe bug or should I just pick Martins fix? As I sad previously I
was seriously lost with the race but now I understand what you and
Martin were saying (and feel utterly stupid + ashamed!). Now I'm just
thinking, which fix I should pick.

Anyway, I'll try to go through your code ASAP.

/Jarkko

------------------------------------------------------------------------------
Jason Gunthorpe Nov. 28, 2015, 10:52 p.m. UTC | #7
On Sat, Nov 28, 2015 at 06:40:03PM +0200, Jarkko Sakkinen wrote:
> > I have some patches to do this that are part of my OF enablement
> > series, but I can make something simpler that would deal with this
> > fairly quickly if you can test.
> 
> Does the patch set that you sent include the fix or not? I haven't yet
> reviewed them properly.

No fixing probe is another task. I can send some patches for that when
we are done with the IRQ stuff. That is something we should fix no
matter what..

BTW, please test my IRQ series, I forgot to mention I was unable to
test it properly here...

Jason

------------------------------------------------------------------------------
Uwe Kleine-König Nov. 29, 2015, 9:54 a.m. UTC | #8
Hello Jarkko,

On Sat, Nov 28, 2015 at 06:34:47PM +0200, Jarkko Sakkinen wrote:
> On Thu, Nov 26, 2015 at 08:01:34PM +0100, martin.wilck@ts.fujitsu.com wrote:
> > From: Martin Wilck <Martin.Wilck@ts.fujitsu.com>
> > 
> > Since b8b2c7d845d5, platform_drv_probe() is called for all platform
> > devices. If drv->probe is NULL, and dev_pm_domain_attach() fails,
> > platform_drv_probe() will return the error code from dev_pm_domain_attach().
> > 
> > This causes real_probe() to enter the "probe_failed" path and set
> > dev->driver to NULL. Before b8b2c7d845d5, real_probe() would assume
> > success if both dev->bus->probe and drv->probe are missing.
> > 
> > This may cause a panic later. For example, inserting the tpm_tis
> > driver with parameter "force=1" (i.e. registering tpm_tis as a platform
> > driver) will panic in tpmm_chip_alloc() because dev->driver is NULL:
> > 
> >      chip->cdev.owner = chip->pdev->driver->owner;
> > 
> > This patch fixes this by returning success in platform_drv_probe() if
> > "just" dev_pm_domain_attach() had failed. This restores the semantics
> > of platform_device_register_XXX() if the associated platform driver has
> > no "probe" function.
> > 
> > Fixes: b8b2c7d845d5 ("base/platform: assert that dev_pm_domain
> > callbacks are called unconditionally")
> > 
> > Signed-off-by: Martin Wilck <Martin.Wilck@ts.fujitsu.com>
> 
> Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>

While the patch is fine, the commit log is not. It blames b8b2c7d845d5
to be responsible for a panic, but in fact it only breaks the wrong
assumption of the tpm_tis driver.

So I'm not sure how to interpret your Ack, IMHO it should not make
gregkh pick up the patch as is.

Best regards
Uwe
Martin Wilck Nov. 30, 2015, 7:42 a.m. UTC | #9
Hello Uwe,

thanks for your review.

> This may cause a panic later. For example, inserting the tpm_tis
> > driver with parameter "force=1" (i.e. registering tpm_tis as a platform
> > driver) will panic in tpmm_chip_alloc() because dev->driver is NULL:
> > 
> >      chip->cdev.owner = chip->pdev->driver->owner;
> 
> This sounds like a separate issue though. Looking at init_tis there is:
> 
>         rc = platform_driver_register(&tis_drv);
>         if (rc < 0)
>                 return rc;
>         pdev = platform_device_register_simple("tpm_tis", -1, NULL, 0);
>         if (IS_ERR(pdev)) {
>                 rc = PTR_ERR(pdev);
>                 goto err_dev;
>         }
>         rc = tpm_tis_init(&pdev->dev, &tis_default_info, NULL);
> 
> tpm_tis_init calls tpmm_chip_alloc which barfs when pdev (i.e. the return value
> of platform_device_register_simple above) isn't bound. It is not allowed
> to assume that the device is bound after the above function calls.

I agree that the TPM platform device code deserves improvement. Jason
wrote that he has already some patches available for that.

I lack the knowledge to judge whether or not tpm_is_init's assumption
was correct. But, maybe just by luck, this assumption used to be *true*
until patch b8b2c7d845d5. Driver and device were matched by name
("tpm_tis") by the platform driver probing code, and device and driver
were actually bound to each other after this sequence of calls. 

> So I'd say drop the paragraph about tpm_tis and the change is fine.

I didn't mean to blame your patch. But a note about the panic might be
helpful just in case someone else runs into the same problem. The
connection between your patch and tpm_tis loading is far from obvious.
I mentioned the panic in order to clarify that this wasn't just a
theoretical issue.

Anyway, I'll resubmit with your style hints applied and will try to find
a wording for the commit message that we can agree upon.

Best Regards,
Martin

> 
> > This patch fixes this by returning success in platform_drv_probe() if
> > "just" dev_pm_domain_attach() had failed. This restores the semantics
> > of platform_device_register_XXX() if the associated platform driver has
> > no "probe" function.
> > 
> > Fixes: b8b2c7d845d5 ("base/platform: assert that dev_pm_domain
> > callbacks are called unconditionally")
> > 
> 
> I think line breaks in the Fixes: line are frowned on. Also usually
> there is no empty line between Fixes: and S-o-b:.
> 
> > Signed-off-by: Martin Wilck <Martin.Wilck@ts.fujitsu.com>
> > ---
> >  drivers/base/platform.c | 12 ++++++++----
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> > index 1dd6d3b..c994e76 100644
> > --- a/drivers/base/platform.c
> > +++ b/drivers/base/platform.c
> > @@ -513,10 +513,14 @@ static int platform_drv_probe(struct device *_dev)
> >  		return ret;
> >  
> >  	ret = dev_pm_domain_attach(_dev, true);
> > -	if (ret != -EPROBE_DEFER && drv->probe) {
> > -		ret = drv->probe(dev);
> > -		if (ret)
> > -			dev_pm_domain_detach(_dev, true);
> > +	if (ret != -EPROBE_DEFER) {
> > +		if (drv->probe) {
> > +			ret = drv->probe(dev);
> > +			if (ret)
> > +				dev_pm_domain_detach(_dev, true);
> > +		} else
> > +			/* don't fail if just dev_pm_domain_attach failed */
> > +			ret = 0;
> 
> An else that has a } should also have a {, according to 
> checkpatch and Documentation/CodingStyle. You can write it
> alternatively as:
> 
> 	if (ret != -EPROBE_DEFER) {
> 		if (drv->probe)
> 			ret = drv->probe(dev);
> 		else
> 			ret = 0;
> 
> 		if (ret)
> 			dev_pm_domain_detach(_dev, true);
> 	}
> 
> .
> 
> Best regards
> Uwe
> 
------------------------------------------------------------------------------
Go from Idea to Many App Stores Faster with Intel(R) XDK
Give your users amazing mobile app experiences with Intel(R) XDK.
Use one codebase in this all-in-one HTML5 development environment.
Design, debug & build mobile apps & 2D/3D high-impact games for multiple OSs.
http://pubads.g.doubleclick.net/gampad/clk?id=254741551&iu=/4140
Jarkko Sakkinen Nov. 30, 2015, 12:50 p.m. UTC | #10
On Sat, Nov 28, 2015 at 03:52:51PM -0700, Jason Gunthorpe wrote:
> On Sat, Nov 28, 2015 at 06:40:03PM +0200, Jarkko Sakkinen wrote:
> > > I have some patches to do this that are part of my OF enablement
> > > series, but I can make something simpler that would deal with this
> > > fairly quickly if you can test.
> > 
> > Does the patch set that you sent include the fix or not? I haven't yet
> > reviewed them properly.
> 
> No fixing probe is another task. I can send some patches for that when
> we are done with the IRQ stuff. That is something we should fix no
> matter what..
> 
> BTW, please test my IRQ series, I forgot to mention I was unable to
> test it properly here...

Got you. I need to at least test insmod/rmod (maybe couple of times in a
cycle).  Do you see any other code paths that could easily break because
of your patches?

> Jason

/Jarkko

------------------------------------------------------------------------------
Go from Idea to Many App Stores Faster with Intel(R) XDK
Give your users amazing mobile app experiences with Intel(R) XDK.
Use one codebase in this all-in-one HTML5 development environment.
Design, debug & build mobile apps & 2D/3D high-impact games for multiple OSs.
http://pubads.g.doubleclick.net/gampad/clk?id=254741551&iu=/4140
Jarkko Sakkinen Nov. 30, 2015, 12:56 p.m. UTC | #11
Hi Uwe,

On Sun, Nov 29, 2015 at 10:54:11AM +0100, Uwe Kleine-König wrote:
> Hello Jarkko,
> 
> On Sat, Nov 28, 2015 at 06:34:47PM +0200, Jarkko Sakkinen wrote:
> > On Thu, Nov 26, 2015 at 08:01:34PM +0100, martin.wilck@ts.fujitsu.com wrote:
> > > From: Martin Wilck <Martin.Wilck@ts.fujitsu.com>
> > > 
> > > Since b8b2c7d845d5, platform_drv_probe() is called for all platform
> > > devices. If drv->probe is NULL, and dev_pm_domain_attach() fails,
> > > platform_drv_probe() will return the error code from dev_pm_domain_attach().
> > > 
> > > This causes real_probe() to enter the "probe_failed" path and set
> > > dev->driver to NULL. Before b8b2c7d845d5, real_probe() would assume
> > > success if both dev->bus->probe and drv->probe are missing.
> > > 
> > > This may cause a panic later. For example, inserting the tpm_tis
> > > driver with parameter "force=1" (i.e. registering tpm_tis as a platform
> > > driver) will panic in tpmm_chip_alloc() because dev->driver is NULL:
> > > 
> > >      chip->cdev.owner = chip->pdev->driver->owner;
> > > 
> > > This patch fixes this by returning success in platform_drv_probe() if
> > > "just" dev_pm_domain_attach() had failed. This restores the semantics
> > > of platform_device_register_XXX() if the associated platform driver has
> > > no "probe" function.
> > > 
> > > Fixes: b8b2c7d845d5 ("base/platform: assert that dev_pm_domain
> > > callbacks are called unconditionally")
> > > 
> > > Signed-off-by: Martin Wilck <Martin.Wilck@ts.fujitsu.com>
> > 
> > Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
> 
> While the patch is fine, the commit log is not. It blames b8b2c7d845d5
> to be responsible for a panic, but in fact it only breaks the wrong
> assumption of the tpm_tis driver.
> 
> So I'm not sure how to interpret your Ack, IMHO it should not make
> gregkh pick up the patch as is.

Alright. I don't think you can speak about *wrong assumptions* if the
semantics allowed not to have it before. *Where* it should be fixed is
another question. I'd keep the Fixes tag in all cases.

Jason, you had the fix for this issue directly to tpm_tis driver that
you haven't yet posted, right? Just double-checking this.

> Best regards
> Uwe
> 
> -- 
> Pengutronix e.K.                           | Uwe Kleine-König            |
> Industrial Linux Solutions                 | http://www.pengutronix.de/  |

/Jarkko

------------------------------------------------------------------------------
Go from Idea to Many App Stores Faster with Intel(R) XDK
Give your users amazing mobile app experiences with Intel(R) XDK.
Use one codebase in this all-in-one HTML5 development environment.
Design, debug & build mobile apps & 2D/3D high-impact games for multiple OSs.
http://pubads.g.doubleclick.net/gampad/clk?id=254741551&iu=/4140
Jarkko Sakkinen Nov. 30, 2015, 1:06 p.m. UTC | #12
On Mon, Nov 30, 2015 at 02:56:31PM +0200, Jarkko Sakkinen wrote:
> Hi Uwe,
> 
> On Sun, Nov 29, 2015 at 10:54:11AM +0100, Uwe Kleine-König wrote:
> > Hello Jarkko,
> > 
> > On Sat, Nov 28, 2015 at 06:34:47PM +0200, Jarkko Sakkinen wrote:
> > > On Thu, Nov 26, 2015 at 08:01:34PM +0100, martin.wilck@ts.fujitsu.com wrote:
> > > > From: Martin Wilck <Martin.Wilck@ts.fujitsu.com>
> > > > 
> > > > Since b8b2c7d845d5, platform_drv_probe() is called for all platform
> > > > devices. If drv->probe is NULL, and dev_pm_domain_attach() fails,
> > > > platform_drv_probe() will return the error code from dev_pm_domain_attach().
> > > > 
> > > > This causes real_probe() to enter the "probe_failed" path and set
> > > > dev->driver to NULL. Before b8b2c7d845d5, real_probe() would assume
> > > > success if both dev->bus->probe and drv->probe are missing.
> > > > 
> > > > This may cause a panic later. For example, inserting the tpm_tis
> > > > driver with parameter "force=1" (i.e. registering tpm_tis as a platform
> > > > driver) will panic in tpmm_chip_alloc() because dev->driver is NULL:
> > > > 
> > > >      chip->cdev.owner = chip->pdev->driver->owner;
> > > > 
> > > > This patch fixes this by returning success in platform_drv_probe() if
> > > > "just" dev_pm_domain_attach() had failed. This restores the semantics
> > > > of platform_device_register_XXX() if the associated platform driver has
> > > > no "probe" function.
> > > > 
> > > > Fixes: b8b2c7d845d5 ("base/platform: assert that dev_pm_domain
> > > > callbacks are called unconditionally")
> > > > 
> > > > Signed-off-by: Martin Wilck <Martin.Wilck@ts.fujitsu.com>
> > > 
> > > Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
> > 
> > While the patch is fine, the commit log is not. It blames b8b2c7d845d5
> > to be responsible for a panic, but in fact it only breaks the wrong
> > assumption of the tpm_tis driver.
> > 
> > So I'm not sure how to interpret your Ack, IMHO it should not make
> > gregkh pick up the patch as is.
> 
> Alright. I don't think you can speak about *wrong assumptions* if the
> semantics allowed not to have it before. *Where* it should be fixed is
> another question. I'd keep the Fixes tag in all cases.
> 
> Jason, you had the fix for this issue directly to tpm_tis driver that
> you haven't yet posted, right? Just double-checking this.

Uwe, please ignore this :) Saw your more in-depth comment about platform
driver creation. Thank you. I somehow have missed it before.

/Jarkko

------------------------------------------------------------------------------
Go from Idea to Many App Stores Faster with Intel(R) XDK
Give your users amazing mobile app experiences with Intel(R) XDK.
Use one codebase in this all-in-one HTML5 development environment.
Design, debug & build mobile apps & 2D/3D high-impact games for multiple OSs.
http://pubads.g.doubleclick.net/gampad/clk?id=254741551&iu=/4140
Martin Wilck Dec. 1, 2015, 10:41 a.m. UTC | #13
Hello Uwe,

> This sounds like a separate issue though. Looking at init_tis there is:
> 
>         rc = platform_driver_register(&tis_drv);
>         if (rc < 0)
>                 return rc;
>         pdev = platform_device_register_simple("tpm_tis", -1, NULL, 0);
>         if (IS_ERR(pdev)) {
>                 rc = PTR_ERR(pdev);
>                 goto err_dev;
>         }
>         rc = tpm_tis_init(&pdev->dev, &tis_default_info, NULL);
> 
> tpm_tis_init calls tpmm_chip_alloc which barfs when pdev (i.e. the return value
> of platform_device_register_simple above) isn't bound. It is not allowed
> to assume that the device is bound after the above function calls.

Can you please explain again why you think that assumption is invalid? 
As far as I understand the code, the assumption would be correct in
4.3.0 and earlier:

platform_driver_register() registers a platform driver with name
"tpm_tis". platform_device_register_simple() registers a device with the
same name. This will call platform_device_add()/device_add() and start
probing for a platform device. Platform bus probing in platform_match()
falls back to a simple match between driver and device name if all else
fails. That match succeeds for the "tpm_tis" driver. Thus
driver_probe_device() will be called, and in the absence of a
driver-specific probe routine, will succeed. Thus after
platform_device_register_simple() returns, device and driver will be
bound. This matches also actual behavior of the pre-4.4 code. 

Please explain what I am overlooking. I am just trying to understand.
As far as tpm_tis is concerned, Jason's current patch set is going to
fix this for good anyway.

Regards
Martin

------------------------------------------------------------------------------
Go from Idea to Many App Stores Faster with Intel(R) XDK
Give your users amazing mobile app experiences with Intel(R) XDK.
Use one codebase in this all-in-one HTML5 development environment.
Design, debug & build mobile apps & 2D/3D high-impact games for multiple OSs.
http://pubads.g.doubleclick.net/gampad/clk?id=254741911&iu=/4140
Uwe Kleine-König Dec. 1, 2015, 1:24 p.m. UTC | #14
Hello Martin,

On Tue, Dec 01, 2015 at 11:41:53AM +0100, Wilck, Martin wrote:
> > This sounds like a separate issue though. Looking at init_tis there is:
> > 
> >         rc = platform_driver_register(&tis_drv);
> >         if (rc < 0)
> >                 return rc;
> >         pdev = platform_device_register_simple("tpm_tis", -1, NULL, 0);
> >         if (IS_ERR(pdev)) {
> >                 rc = PTR_ERR(pdev);
> >                 goto err_dev;
> >         }
> >         rc = tpm_tis_init(&pdev->dev, &tis_default_info, NULL);
> > 
> > tpm_tis_init calls tpmm_chip_alloc which barfs when pdev (i.e. the return value
> > of platform_device_register_simple above) isn't bound. It is not allowed
> > to assume that the device is bound after the above function calls.
> 
> Can you please explain again why you think that assumption is invalid? 

You can unbind a device from a driver via sysfs, you can also prevent
binding somehow I think, probing can fail for different reasons, probing
might wait for userspace interaction to load firmware which wasn't
scheduled yet. I'm sure there are still more things that break the
assumption.

Best regards
Uwe
Martin Wilck Dec. 1, 2015, 3:19 p.m. UTC | #15
> > > tpm_tis_init calls tpmm_chip_alloc which barfs when pdev (i.e. the return value
> > > of platform_device_register_simple above) isn't bound. It is not allowed
> > > to assume that the device is bound after the above function calls.
> > 
> > Can you please explain again why you think that assumption is invalid? 
> 
> You can unbind a device from a driver via sysfs, you can also prevent
> binding somehow I think, probing can fail for different reasons, probing
> might wait for userspace interaction to load firmware which wasn't
> scheduled yet. I'm sure there are still more things that break the
> assumption.

Thanks. Out of these, "prevent binding somehow" would be the only
problem that applies to tpm_tis, as probing can't fail (no probe()
routine), there's no FW to load, and unbinding via sysfs would require
nearly impossible timing (not sure if it could be done with udev).

Anyway, the Right Thing to do is to create a probe() routine and that's
what Jason did.

Thanks again,
Martin

------------------------------------------------------------------------------
Go from Idea to Many App Stores Faster with Intel(R) XDK
Give your users amazing mobile app experiences with Intel(R) XDK.
Use one codebase in this all-in-one HTML5 development environment.
Design, debug & build mobile apps & 2D/3D high-impact games for multiple OSs.
http://pubads.g.doubleclick.net/gampad/clk?id=254741911&iu=/4140
Jason Gunthorpe Dec. 1, 2015, 5:25 p.m. UTC | #16
On Tue, Dec 01, 2015 at 04:19:25PM +0100, Wilck, Martin wrote:
> > > > tpm_tis_init calls tpmm_chip_alloc which barfs when pdev (i.e. the return value
> > > > of platform_device_register_simple above) isn't bound. It is not allowed
> > > > to assume that the device is bound after the above function calls.
> > > 
> > > Can you please explain again why you think that assumption is invalid? 
> > 
> > You can unbind a device from a driver via sysfs, you can also prevent
> > binding somehow I think, probing can fail for different reasons, probing
> > might wait for userspace interaction to load firmware which wasn't
> > scheduled yet. I'm sure there are still more things that break the
> > assumption.
> 
> Thanks. Out of these, "prevent binding somehow" would be the only
> problem that applies to tpm_tis, as probing can't fail (no probe()
> routine), there's no FW to load, and unbinding via sysfs would require
> nearly impossible timing (not sure if it could be done with udev).
> 
> Anyway, the Right Thing to do is to create a probe() routine and that's
> what Jason did.

That fixes tpm_tis, but there are other ancient TPM drivers that use
the old, now broken way.

So, we still need to do something here. Either fixup b8b2c7d845d5 as
you have proposed, remove the now broken obsolete TPM drivers, or try
and fix them..

Jason

------------------------------------------------------------------------------
Go from Idea to Many App Stores Faster with Intel(R) XDK
Give your users amazing mobile app experiences with Intel(R) XDK.
Use one codebase in this all-in-one HTML5 development environment.
Design, debug & build mobile apps & 2D/3D high-impact games for multiple OSs.
http://pubads.g.doubleclick.net/gampad/clk?id=254741911&iu=/4140
Peter Hüwe Dec. 1, 2015, 6:26 p.m. UTC | #17
Am 1. Dezember 2015 09:25:37 PST, schrieb Jason Gunthorpe <jgunthorpe@obsidianresearch.com>:
>On Tue, Dec 01, 2015 at 04:19:25PM +0100, Wilck, Martin wrote:
>> > > > tpm_tis_init calls tpmm_chip_alloc which barfs when pdev (i.e.
>the return value
>> > > > of platform_device_register_simple above) isn't bound. It is
>not allowed
>> > > > to assume that the device is bound after the above function
>calls.
>> > > 
>> > > Can you please explain again why you think that assumption is
>invalid? 
>> > 
>> > You can unbind a device from a driver via sysfs, you can also
>prevent
>> > binding somehow I think, probing can fail for different reasons,
>probing
>> > might wait for userspace interaction to load firmware which wasn't
>> > scheduled yet. I'm sure there are still more things that break the
>> > assumption.
>> 
>> Thanks. Out of these, "prevent binding somehow" would be the only
>> problem that applies to tpm_tis, as probing can't fail (no probe()
>> routine), there's no FW to load, and unbinding via sysfs would
>require
>> nearly impossible timing (not sure if it could be done with udev).
>> 
>> Anyway, the Right Thing to do is to create a probe() routine and
>that's
>> what Jason did.
>
>That fixes tpm_tis, but there are other ancient TPM drivers that use
>the old, now broken way.
>
>So, we still need to do something here. Either fixup b8b2c7d845d5 as
>you have proposed, remove the now broken obsolete TPM drivers, or try
>and fix them..

How broken are they and since when?

I thought multiple times about deprecating and finally removing the 1.1b stuff - tpm 1.2 is out for 10? years now? With an expected life span of a TPM of roughly 5years... 
And also unfortunately the 1.1b legacy drivers usually get loaded first :( (atleast for slb9635)

Mark them as obsolete , default them to No and remove them by 4.10 if there are no objections?

Peter
Jason Gunthorpe Dec. 1, 2015, 6:40 p.m. UTC | #18
On Tue, Dec 01, 2015 at 10:26:17AM -0800, Peter Huewe wrote:
> 
> 
> Am 1. Dezember 2015 09:25:37 PST, schrieb Jason Gunthorpe <jgunthorpe@obsidianresearch.com>:
> >On Tue, Dec 01, 2015 at 04:19:25PM +0100, Wilck, Martin wrote:
> >> > > > tpm_tis_init calls tpmm_chip_alloc which barfs when pdev (i.e.
> >the return value
> >> > > > of platform_device_register_simple above) isn't bound. It is
> >not allowed
> >> > > > to assume that the device is bound after the above function
> >calls.
> >> > > 
> >> > > Can you please explain again why you think that assumption is
> >invalid? 
> >> > 
> >> > You can unbind a device from a driver via sysfs, you can also
> >prevent
> >> > binding somehow I think, probing can fail for different reasons,
> >probing
> >> > might wait for userspace interaction to load firmware which wasn't
> >> > scheduled yet. I'm sure there are still more things that break the
> >> > assumption.
> >> 
> >> Thanks. Out of these, "prevent binding somehow" would be the only
> >> problem that applies to tpm_tis, as probing can't fail (no probe()
> >> routine), there's no FW to load, and unbinding via sysfs would
> >require
> >> nearly impossible timing (not sure if it could be done with udev).
> >> 
> >> Anyway, the Right Thing to do is to create a probe() routine and
> >that's
> >> what Jason did.
> >
> >That fixes tpm_tis, but there are other ancient TPM drivers that use
> >the old, now broken way.
> >
> >So, we still need to do something here. Either fixup b8b2c7d845d5 as
> >you have proposed, remove the now broken obsolete TPM drivers, or try
> >and fix them..
> 
> How broken are they and since when?

oops the kernel broken, since 4.4-rc1 apparently, so not released yet.

> Mark them as obsolete , default them to No and remove them by 4.10
> if there are no objections?

Greg KH has been advising just to delete stuff right away. It is easy
to undelete something if someone comes around with hardware and is
willing to test

Jason

------------------------------------------------------------------------------
Go from Idea to Many App Stores Faster with Intel(R) XDK
Give your users amazing mobile app experiences with Intel(R) XDK.
Use one codebase in this all-in-one HTML5 development environment.
Design, debug & build mobile apps & 2D/3D high-impact games for multiple OSs.
http://pubads.g.doubleclick.net/gampad/clk?id=254741911&iu=/4140
Peter Hüwe Dec. 1, 2015, 6:54 p.m. UTC | #19
> > >That fixes tpm_tis, but there are other ancient TPM drivers that use
> > >the old, now broken way.
> > >
> > >So, we still need to do something here. Either fixup b8b2c7d845d5 as
> > >you have proposed, remove the now broken obsolete TPM drivers, or try
> > >and fix them..
> >
> > How broken are they and since when?

> oops the kernel broken, since 4.4-rc1 apparently, so not released yet.
damn, I was hoping MUCH longer :)

> > Mark them as obsolete , default them to No and remove them by 4.10
> > if there are no objections?

> Greg KH has been advising just to delete stuff right away. It is easy
> to undelete something if someone comes around with hardware and is
> willing to test

Can you point to that discussion or was it offline?
I mean for staging, yes there it is clear, but for stuff that is officially "upstream", I'm not 100% sure.

Thanks,
Peter



------------------------------------------------------------------------------
Go from Idea to Many App Stores Faster with Intel(R) XDK
Give your users amazing mobile app experiences with Intel(R) XDK.
Use one codebase in this all-in-one HTML5 development environment.
Design, debug & build mobile apps & 2D/3D high-impact games for multiple OSs.
http://pubads.g.doubleclick.net/gampad/clk?id=254741911&iu=/4140
Jason Gunthorpe Dec. 1, 2015, 7:03 p.m. UTC | #20
On Tue, Dec 01, 2015 at 07:54:59PM +0100, Peter Huewe wrote:
> Can you point to that discussion or was it offline?

Apparently it was brought up at the most recent kernel summit

http://www.spinics.net/lists/linux-rdma/msg29985.html

Jason

------------------------------------------------------------------------------
Go from Idea to Many App Stores Faster with Intel(R) XDK
Give your users amazing mobile app experiences with Intel(R) XDK.
Use one codebase in this all-in-one HTML5 development environment.
Design, debug & build mobile apps & 2D/3D high-impact games for multiple OSs.
http://pubads.g.doubleclick.net/gampad/clk?id=254741911&iu=/4140

Patch
diff mbox

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 1dd6d3b..c994e76 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -513,10 +513,14 @@  static int platform_drv_probe(struct device *_dev)
 		return ret;
 
 	ret = dev_pm_domain_attach(_dev, true);
-	if (ret != -EPROBE_DEFER && drv->probe) {
-		ret = drv->probe(dev);
-		if (ret)
-			dev_pm_domain_detach(_dev, true);
+	if (ret != -EPROBE_DEFER) {
+		if (drv->probe) {
+			ret = drv->probe(dev);
+			if (ret)
+				dev_pm_domain_detach(_dev, true);
+		} else
+			/* don't fail if just dev_pm_domain_attach failed */
+			ret = 0;
 	}
 
 	if (drv->prevent_deferred_probe && ret == -EPROBE_DEFER) {