diff mbox series

[net-next] net: phy: Ensure scheduled work is cancelled during removal

Message ID: 1559330150-30099-2-git-send-email-hancock@sedsystems.ca
State: Not Applicable
Delegated to: David Miller
Series [net-next] net: phy: Ensure scheduled work is cancelled during removal

Commit Message

Robert Hancock May 31, 2019, 7:15 p.m. UTC
It is possible that scheduled work started by the PHY driver is still
outstanding when phy_device_remove is called if the PHY was initially
started but never connected, and therefore phy_disconnect is never
called. phy_stop does not guarantee that the scheduled work is stopped
because it is called under rtnl_lock. This can cause an oops due to
use-after-free if the delayed work fires after freeing the PHY device.

Ensure that the state_queue work is cancelled in both phy_device_remove
and phy_remove paths.

Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
---
 drivers/net/phy/phy_device.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Andrew Lunn May 31, 2019, 8:54 p.m. UTC | #1
Robert

Please make sure you Cc: PHY patches to the PHY maintainers.

Heiner, this one is for you.

	Andrew

On Fri, May 31, 2019 at 01:15:50PM -0600, Robert Hancock wrote:
> It is possible that scheduled work started by the PHY driver is still
> outstanding when phy_device_remove is called if the PHY was initially
> started but never connected, and therefore phy_disconnect is never
> called. phy_stop does not guarantee that the scheduled work is stopped
> because it is called under rtnl_lock. This can cause an oops due to
> use-after-free if the delayed work fires after freeing the PHY device.
> 
> Ensure that the state_queue work is cancelled in both phy_device_remove
> and phy_remove paths.
> 
> Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
> ---
>  drivers/net/phy/phy_device.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 2c879ba..1c90b33 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -877,6 +877,8 @@ int phy_device_register(struct phy_device *phydev)
>   */
>  void phy_device_remove(struct phy_device *phydev)
>  {
> +	cancel_delayed_work_sync(&phydev->state_queue);
> +
>  	device_del(&phydev->mdio.dev);
>  
>  	/* Assert the reset signal */
> -- 
> 1.8.3.1
>

Heiner Kallweit May 31, 2019, 9:26 p.m. UTC | #2
On 31.05.2019 22:54, Andrew Lunn wrote:
> Robert
> 
> Please make sure you Cc: PHY patches to the PHY maintainers.
> 
> Heiner, this one is for you.
> 
> 	Andrew
> 
> On Fri, May 31, 2019 at 01:15:50PM -0600, Robert Hancock wrote:
>> It is possible that scheduled work started by the PHY driver is still
>> outstanding when phy_device_remove is called if the PHY was initially
>> started but never connected, and therefore phy_disconnect is never
>> called. phy_stop does not guarantee that the scheduled work is stopped
>> because it is called under rtnl_lock. This can cause an oops due to
>> use-after-free if the delayed work fires after freeing the PHY device.
>>
The patch itself at least shouldn't do any harm. However the justification
isn't fully convincing yet.
PHY drivers don't start any scheduled work. This queue is used by the
phylib state machine. phy_stop usually isn't called under rtnl_lock,
and it calls phy_stop_machine that cancels pending work.
Did you experience such an oops? Can you provide a call chain where
your described scenario could happen?

>> Ensure that the state_queue work is cancelled in both phy_device_remove
>> and phy_remove paths.
>>
>> Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
>> ---
>>  drivers/net/phy/phy_device.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>> index 2c879ba..1c90b33 100644
>> --- a/drivers/net/phy/phy_device.c
>> +++ b/drivers/net/phy/phy_device.c
>> @@ -877,6 +877,8 @@ int phy_device_register(struct phy_device *phydev)
>>   */
>>  void phy_device_remove(struct phy_device *phydev)
>>  {
>> +	cancel_delayed_work_sync(&phydev->state_queue);
>> +
>>  	device_del(&phydev->mdio.dev);
>>  
>>  	/* Assert the reset signal */
>> -- 
>> 1.8.3.1
>>
>

Robert Hancock June 1, 2019, 3:22 a.m. UTC | #3
> On 31.05.2019 22:54, Andrew Lunn wrote:
>>> It is possible that scheduled work started by the PHY driver is still
>>> outstanding when phy_device_remove is called if the PHY was initially
>>> started but never connected, and therefore phy_disconnect is never
>>> called. phy_stop does not guarantee that the scheduled work is stopped
>>> because it is called under rtnl_lock. This can cause an oops due to
>>> use-after-free if the delayed work fires after freeing the PHY device.
>>>
> The patch itself at least shouldn't do any harm. However the justification
> isn't fully convincing yet.
> PHY drivers don't start any scheduled work. This queue is used by the
> phylib state machine. phy_stop usually isn't called under rtnl_lock,
> and it calls phy_stop_machine that cancels pending work.
> Did you experience such an oops? Can you provide a call chain where
> your described scenario could happen?

Upon further investigation, it appears that this change is no longer
needed in the mainline. Previously (such as in 4.19 kernels as we are
using), phy_stop did not call phy_stop_machine; only phy_disconnect did.
So if the PHY was started but was never connected and then disconnected
before being stopped, the delayed work was not cancelled. That sequence
didn't occur often, but it could happen in some failure cases, which I
believe is what I ran into during development when this change was
originally made.

It looks like this was fixed in commit
cbfd12b3e8c3542e8142aa041714ed614d3f67b0 "net: phy: ensure phylib state
machine is stopped after calling phy_stop". So my patch can be dropped -
but maybe that patch should be added to stable?

Andrew Lunn June 1, 2019, 3:46 p.m. UTC | #4
On Fri, May 31, 2019 at 09:22:16PM -0600, hancock@sedsystems.ca wrote:
> > On 31.05.2019 22:54, Andrew Lunn wrote:
> >>> It is possible that scheduled work started by the PHY driver is still
> >>> outstanding when phy_device_remove is called if the PHY was initially
> >>> started but never connected, and therefore phy_disconnect is never
> >>> called. phy_stop does not guarantee that the scheduled work is stopped
> >>> because it is called under rtnl_lock. This can cause an oops due to
> >>> use-after-free if the delayed work fires after freeing the PHY device.
> >>>
> > The patch itself at least shouldn't do any harm. However the justification
> > isn't fully convincing yet.
> > PHY drivers don't start any scheduled work. This queue is used by the
> > phylib state machine. phy_stop usually isn't called under rtnl_lock,
> > and it calls phy_stop_machine that cancels pending work.
> > Did you experience such an oops? Can you provide a call chain where
> > your described scenario could happen?
> 
> Upon further investigation, it appears that this change is no longer
> needed in the mainline. Previously (such as in 4.19 kernels as we are
> using),

Hi Robert

Please do all your testing on net-next. 4.19 is dead, in terms of
development. There is no point in developing and testing patches
intended for mainline on it.

     Andrew

Patch

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 2c879ba..1c90b33 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -877,6 +877,8 @@  int phy_device_register(struct phy_device *phydev)
  */
 void phy_device_remove(struct phy_device *phydev)
 {
+	cancel_delayed_work_sync(&phydev->state_queue);
+
 	device_del(&phydev->mdio.dev);
 
 	/* Assert the reset signal */
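
The ordering problem discussed in this thread can be illustrated with a small
user-space model. This is only a sketch and not kernel code: struct model_phy
and every model_* function are hypothetical names invented for the example. A
boolean stands in for the queued state_queue work, and another boolean stands
in for the PHY device having been freed. The "4.19" and "mainline" variants of
phy_stop follow the behaviour described in comment #3, not an inspection of
either tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space model only: none of these names are kernel API. */
struct model_phy {
	bool work_pending;	/* stands in for state_queue being queued */
	bool freed;		/* set once the model object is "freed" */
};

/* Starting the PHY queues the state machine work. */
static void model_phy_start(struct model_phy *phy)
{
	assert(phy != NULL);
	phy->work_pending = true;
}

/* phy_stop as described for 4.19: nothing cancels the queued work. */
static void model_phy_stop_4_19(struct model_phy *phy)
{
	assert(phy != NULL);
	/* The state would change here, but the work stays queued. */
}

/* phy_stop as described for mainline: the state machine is stopped too. */
static void model_phy_stop_mainline(struct model_phy *phy)
{
	assert(phy != NULL);
	phy->work_pending = false;
}

/*
 * Removal. cancel_first models the cancel_delayed_work_sync() call that
 * the patch adds at the top of phy_device_remove().
 */
static void model_phy_remove(struct model_phy *phy, bool cancel_first)
{
	assert(phy != NULL);
	if (cancel_first)
		phy->work_pending = false;
	phy->freed = true;
}

/* True if queued work would still fire against a freed object. */
static bool model_use_after_free(const struct model_phy *phy)
{
	return phy->freed && phy->work_pending;
}

/* Run start -> stop -> remove and report whether stale work remains. */
bool model_run(bool mainline_stop, bool cancel_in_remove)
{
	struct model_phy phy = { .work_pending = false, .freed = false };

	model_phy_start(&phy);
	if (mainline_stop)
		model_phy_stop_mainline(&phy);
	else
		model_phy_stop_4_19(&phy);
	model_phy_remove(&phy, cancel_in_remove);
	return model_use_after_free(&phy);
}
```

In this model, model_run(false, false) is the only combination that leaves
work pending on a freed object. Either the cancel in the remove path (this
patch) or a phy_stop that stops the state machine (the commit cited in
comment #3) is enough to clear it, which matches the conclusion that the
patch is redundant on mainline but would have mattered on 4.19.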