diff mbox series

net: phy: mdio_bus: add missing device_del() in mdiobus_register() error handling

Message ID 20190116095358.28354-1-thomas.petazzoni@bootlin.com
State Accepted
Delegated to: David Miller
Headers show
Series net: phy: mdio_bus: add missing device_del() in mdiobus_register() error handling | expand

Commit Message

Thomas Petazzoni Jan. 16, 2019, 9:53 a.m. UTC
The current code in __mdiobus_register() doesn't properly handle
failures returned by the devm_gpiod_get_optional() call: it returns
immediately, without unregistering the device that was added by the
call to device_register() earlier in the function.

This leaves a stale device, which then causes a NULL pointer
dereference in the code that handles deferred probing:

[    1.489982] Unable to handle kernel NULL pointer dereference at virtual address 00000074
[    1.498110] pgd = (ptrval)
[    1.500838] [00000074] *pgd=00000000
[    1.504432] Internal error: Oops: 17 [#1] SMP ARM
[    1.509133] Modules linked in:
[    1.512192] CPU: 1 PID: 51 Comm: kworker/1:3 Not tainted 4.20.0-00039-g3b73a4cc8b3e-dirty #99
[    1.520708] Hardware name: Xilinx Zynq Platform
[    1.525261] Workqueue: events deferred_probe_work_func
[    1.530403] PC is at klist_next+0x10/0xfc
[    1.534403] LR is at device_for_each_child+0x40/0x94
[    1.539361] pc : [<c0683fbc>]    lr : [<c0455d90>]    psr: 200e0013
[    1.545628] sp : ceeefe68  ip : 00000001  fp : ffffe000
[    1.550863] r10: 00000000  r9 : c0c66790  r8 : 00000000
[    1.556079] r7 : c0457d44  r6 : 00000000  r5 : ceeefe8c  r4 : cfa2ec78
[    1.562604] r3 : 00000064  r2 : c0457d44  r1 : ceeefe8c  r0 : 00000064
[    1.569129] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[    1.576263] Control: 18c5387d  Table: 0ed7804a  DAC: 00000051
[    1.582013] Process kworker/1:3 (pid: 51, stack limit = 0x(ptrval))
[    1.588280] Stack: (0xceeefe68 to 0xceef0000)
[    1.592630] fe60:                   cfa2ec78 c0c03c08 00000000 c0457d44 00000000 c0c66790
[    1.600814] fe80: 00000000 c0455d90 ceeefeac 00000064 00000000 0d7a542e cee9d494 cfa2ec78
[    1.608998] fea0: cfa2ec78 00000000 c0457d44 c0457d7c cee9d494 c0c03c08 00000000 c0455dac
[    1.617182] fec0: cf98ba44 cf926a00 cee9d494 0d7a542e 00000000 cf935a10 cf935a10 cf935a10
[    1.625366] fee0: c0c4e9b8 c0457d7c c0c4e80c 00000001 cf935a10 c0457df4 cf935a10 c0c4e99c
[    1.633550] ff00: c0c4e99c c045a27c c0c4e9c4 ced63f80 cfde8a80 cfdebc00 00000000 c013893c
[    1.641734] ff20: cfde8a80 cfde8a80 c07bd354 ced63f80 ced63f94 cfde8a80 00000008 c0c02d00
[    1.649936] ff40: cfde8a98 cfde8a80 ffffe000 c0139a30 ffffe000 c0c6624a c07bd354 00000000
[    1.658120] ff60: ffffe000 cee9e780 ceebfe00 00000000 ceeee000 ced63f80 c0139788 cf8cdea4
[    1.666304] ff80: cee9e79c c013e598 00000001 ceebfe00 c013e44c 00000000 00000000 00000000
[    1.674488] ffa0: 00000000 00000000 00000000 c01010e8 00000000 00000000 00000000 00000000
[    1.682671] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    1.690855] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[    1.699058] [<c0683fbc>] (klist_next) from [<c0455d90>] (device_for_each_child+0x40/0x94)
[    1.707241] [<c0455d90>] (device_for_each_child) from [<c0457d7c>] (device_reorder_to_tail+0x38/0x88)
[    1.716476] [<c0457d7c>] (device_reorder_to_tail) from [<c0455dac>] (device_for_each_child+0x5c/0x94)
[    1.725692] [<c0455dac>] (device_for_each_child) from [<c0457d7c>] (device_reorder_to_tail+0x38/0x88)
[    1.734927] [<c0457d7c>] (device_reorder_to_tail) from [<c0457df4>] (device_pm_move_to_tail+0x28/0x40)
[    1.744235] [<c0457df4>] (device_pm_move_to_tail) from [<c045a27c>] (deferred_probe_work_func+0x58/0x8c)
[    1.753746] [<c045a27c>] (deferred_probe_work_func) from [<c013893c>] (process_one_work+0x210/0x4fc)
[    1.762888] [<c013893c>] (process_one_work) from [<c0139a30>] (worker_thread+0x2a8/0x5c0)
[    1.771072] [<c0139a30>] (worker_thread) from [<c013e598>] (kthread+0x14c/0x154)
[    1.778482] [<c013e598>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
[    1.785689] Exception stack(0xceeeffb0 to 0xceeefff8)
[    1.790739] ffa0:                                     00000000 00000000 00000000 00000000
[    1.798923] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    1.807107] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    1.813724] Code: e92d47f0 e1a05000 e8900048 e1a00003 (e5937010)
[    1.819844] ---[ end trace 3c2c0c8b65399ec9 ]---

The actual error that we had from devm_gpiod_get_optional() was
-EPROBE_DEFER, due to the GPIO being provided by a driver that is
probed later than the Ethernet controller driver.

To fix this, we simply add the missing device_del() invocation in the
error path.

Fixes: 69226896ad636 ("mdio_bus: Issue GPIO RESET to PHYs")
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
---
 drivers/net/phy/mdio_bus.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Andrew Lunn Jan. 16, 2019, 2:48 p.m. UTC | #1
> The actual error that we had from devm_gpiod_get_optional() was
> -EPROBE_DEFER, due to the GPIO being provided by a driver that is
> probed later than the Ethernet controller driver.
> 
> To fix this, we simply add the missing device_del() invocation in the
> error path.

Hi Thomas

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

However, i wounder if it makes sense to add a label before the
existing device_del() at the end of the function, and convert this,
and the case above into a goto? That might scale better, avoiding the
same issue in the future?

Thanks

    Andrew
Thomas Petazzoni Jan. 16, 2019, 3:18 p.m. UTC | #2
Hello,

On Wed, 16 Jan 2019 15:48:29 +0100, Andrew Lunn wrote:

> Reviewed-by: Andrew Lunn <andrew@lunn.ch>
> 
> However, i wounder if it makes sense to add a label before the
> existing device_del() at the end of the function, and convert this,
> and the case above into a goto? That might scale better, avoiding the
> same issue in the future?

That's another option indeed.

Hmm, now that I looked at it, I think we should use device_unregister()
instead. device_unregister() does both device_del() and put_device().

According to the comment above device_register(), put_device() should
always be called: "Always use put_device() to give up the reference
initialized in this function instead.".

What do you think?

Thomas
Andrew Lunn Jan. 16, 2019, 3:44 p.m. UTC | #3
On Wed, Jan 16, 2019 at 04:18:55PM +0100, Thomas Petazzoni wrote:
> Hello,
> 
> On Wed, 16 Jan 2019 15:48:29 +0100, Andrew Lunn wrote:
> 
> > Reviewed-by: Andrew Lunn <andrew@lunn.ch>
> > 
> > However, i wounder if it makes sense to add a label before the
> > existing device_del() at the end of the function, and convert this,
> > and the case above into a goto? That might scale better, avoiding the
> > same issue in the future?
> 
> That's another option indeed.
> 
> Hmm, now that I looked at it, I think we should use device_unregister()
> instead. device_unregister() does both device_del() and put_device().

Hi Thomas

device_unregister() does seem symmetrical with device_register() which
is what we are trying to undo.

   Andrew
David Miller Jan. 18, 2019, 7:08 p.m. UTC | #4
From: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Date: Wed, 16 Jan 2019 10:53:58 +0100

> The current code in __mdiobus_register() doesn't properly handle
> failures returned by the devm_gpiod_get_optional() call: it returns
> immediately, without unregistering the device that was added by the
> call to device_register() earlier in the function.
> 
> This leaves a stale device, which then causes a NULL pointer
> dereference in the code that handles deferred probing:
 ...
> The actual error that we had from devm_gpiod_get_optional() was
> -EPROBE_DEFER, due to the GPIO being provided by a driver that is
> probed later than the Ethernet controller driver.
> 
> To fix this, we simply add the missing device_del() invocation in the
> error path.
> 
> Fixes: 69226896ad636 ("mdio_bus: Issue GPIO RESET to PHYs")
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>

Applied and queued up for -stable.
Thomas Petazzoni Feb. 11, 2019, 2:31 p.m. UTC | #5
Hello Andrew,

On Wed, 16 Jan 2019 16:44:39 +0100
Andrew Lunn <andrew@lunn.ch> wrote:

> > On Wed, 16 Jan 2019 15:48:29 +0100, Andrew Lunn wrote:
> >   
> > > Reviewed-by: Andrew Lunn <andrew@lunn.ch>
> > > 
> > > However, i wounder if it makes sense to add a label before the
> > > existing device_del() at the end of the function, and convert this,
> > > and the case above into a goto? That might scale better, avoiding the
> > > same issue in the future?  
> > 
> > That's another option indeed.
> > 
> > Hmm, now that I looked at it, I think we should use device_unregister()
> > instead. device_unregister() does both device_del() and put_device().  
> 
> Hi Thomas
> 
> device_unregister() does seem symmetrical with device_register() which
> is what we are trying to undo.

Even if DaveM already merged my simple fix, I had a further look at
whether we should be using device_unregister(), and in fact we should
not, but not really for a good reason: because the mdio API is not very
symmetrical.

The typical flow is:

	probe() {
		bus = mdiobus_alloc();
		if (!bus)
			return -ENOMEM;

		ret = mdiobus_register(&bus);
		if (ret) {
			mdiobus_free(bus);

		...
	}

	remove() {
		mdiobus_unregister();
		mdiobus_free();
	}

mdiobus_alloc() only does memory allocation, i.e it has no side effects
on the device model data structures.

mdiobus_register() does a device_register(). If it fails, it only
cleans up with a device_del(), i.e it doesn't do the put_device() that
it should do to fully "undo" its effect.

mdiobus_unregister() does a device_del(), i.e it also doesn't do the
opposite of mdiobus_register(), which should be device_del() +
put_device() (device_unregister() is a shortcut for both).

mdiobus_free() does the put_device()

So:

 * mdiobus_alloc() / mdiobus_free() are not symmetrical in terms of
   their interaction with the device model data structures

 * On error, mdiobus_register() leaves a non-zero reference count to the
   bus->dev structure, which will be freed up by mdiobus_free()

 * mdiobus_unregister() leaves a non-zero reference count to the
   bus->dev structure, which will be freed up by mdiobus_free()

So, if we were to use device_unregister() in the error path of
mdiobus_register() and in mdiobus_unregister(), it would break how
mdiobus_free() works.

Best regards,

Thomas
Andrew Lunn Feb. 15, 2019, 1:23 a.m. UTC | #6
On Mon, Feb 11, 2019 at 03:31:59PM +0100, Thomas Petazzoni wrote:
> Even if DaveM already merged my simple fix, I had a further look at
> whether we should be using device_unregister(), and in fact we should
> not, but not really for a good reason: because the mdio API is not very
> symmetrical.
> 
> The typical flow is:
> 
> 	probe() {
> 		bus = mdiobus_alloc();
> 		if (!bus)
> 			return -ENOMEM;
> 
> 		ret = mdiobus_register(&bus);
> 		if (ret) {
> 			mdiobus_free(bus);
> 
> 		...
> 	}
> 
> 	remove() {
> 		mdiobus_unregister();
> 		mdiobus_free();
> 	}
> 
> mdiobus_alloc() only does memory allocation, i.e it has no side effects
> on the device model data structures.
> 
> mdiobus_register() does a device_register(). If it fails, it only
> cleans up with a device_del(), i.e it doesn't do the put_device() that
> it should do to fully "undo" its effect.
> 
> mdiobus_unregister() does a device_del(), i.e it also doesn't do the
> opposite of mdiobus_register(), which should be device_del() +
> put_device() (device_unregister() is a shortcut for both).
> 
> mdiobus_free() does the put_device()

Hi Thomas

You made some simplifications here. There is a state variable involved
as well. So if you do mdiobus_alloc() ; mdiobus_free(), it will not do
a put_device() but actually call free(). In that case, it is
symmetrical. 

> 
> So:
> 
>  * mdiobus_alloc() / mdiobus_free() are not symmetrical in terms of
>    their interaction with the device model data structures
> 
>  * On error, mdiobus_register() leaves a non-zero reference count to the
>    bus->dev structure, which will be freed up by mdiobus_free()
> 
>  * mdiobus_unregister() leaves a non-zero reference count to the
>    bus->dev structure, which will be freed up by mdiobus_free()
> 
> So, if we were to use device_unregister() in the error path of
> mdiobus_register() and in mdiobus_unregister(), it would break how
> mdiobus_free() works.

I compared mdiobus with alloc_netdev(), register_netdev(),
unregister_netdev() and free_netdev(). The code is actually very
similar, both have a state variable indicating UNITITIALIZED,
UNREGISTERED, REGISTERED, RELEASED, etc.  Both have asymmetric
operations.

I think the real issue here is that you cannot destroy the memory
until all references to it are released. So free_netdev() /
mdiobus_free() cannot actual use free() if the device has been
registered so there could be references to it. The free() has to
happen as part of the put_device() once the reference count reaches
zero.  If you were to do the put_device() in mdiobus_unregister()
call, by the time you called mdiobus_free(), the structure could of
already been freed, so you cannot look at the state variable to know
if it was ever registered. If it was never registered, you do need to
free it.

So the internals are asymmetric. Which is messy. But the usage of the
API by a driver is symmetric. That i think is more important.

    Andrew
diff mbox series

Patch

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index 2e59a8419b17..66b9cfe692fc 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -390,6 +390,7 @@  int __mdiobus_register(struct mii_bus *bus, struct module *owner)
 	if (IS_ERR(gpiod)) {
 		dev_err(&bus->dev, "mii_bus %s couldn't get reset GPIO\n",
 			bus->id);
+		device_del(&bus->dev);
 		return PTR_ERR(gpiod);
 	} else	if (gpiod) {
 		bus->reset_gpiod = gpiod;