diff mbox

[RFC] net: don't set __LINK_STATE_START until after dev->open() call

Message ID 20170807222421.11897-1-jacob.e.keller@intel.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Keller, Jacob E Aug. 7, 2017, 10:24 p.m. UTC
Fix an issue with relying on netif_running() which could be true during
when dev->open() handler is being called, even if it would exit with
a failure. This ensures the state does not get set and removed with
a narrow race for other callers to read it as open when infact it never
finished opening.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
I found this as a result of debugging a race condition in the i40evf
driver, in which we assumed that netif_running() would not be true until
after dev->open() had been called and succeeded. Unfortunately we can't
hold the rtnl_lock() while checking netif_running() because it would
cause a deadlock between our reset task and our ndo_open handler.

I am wondering whether the proposed change is acceptable here, or
whether some ndo_open handlers rely on __LINK_STATE_START being true
prior to their being called?

 net/core/dev.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

Comments

David Miller Aug. 9, 2017, 1:16 a.m. UTC | #1
From: Jacob Keller <jacob.e.keller@intel.com>
Date: Mon,  7 Aug 2017 15:24:21 -0700

> Fix an issue with relying on netif_running() which could be true during
> when dev->open() handler is being called, even if it would exit with
> a failure. This ensures the state does not get set and removed with
> a narrow race for other callers to read it as open when infact it never
> finished opening.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> ---
> I found this as a result of debugging a race condition in the i40evf
> driver, in which we assumed that netif_running() would not be true until
> after dev->open() had been called and succeeded. Unfortunately we can't
> hold the rtnl_lock() while checking netif_running() because it would
> cause a deadlock between our reset task and our ndo_open handler.
> 
> I am wondering whether the proposed change is acceptable here, or
> whether some ndo_open handlers rely on __LINK_STATE_START being true
> prior to their being called?

I think this has the potential to break a bunch of drivers, but I
cannot prove this.

A lot of drivers have several pieces of state setup when they bring
the device up.  And these routines are also invoked from other code
paths like suspend/resume, PCI-E error recovery, etc. and they
probably do netif_running() calls here and there.

This behavior has been this way for a very long time, so the risk is
quite high I think.
Keller, Jacob E Aug. 9, 2017, 8 p.m. UTC | #2
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On
> Behalf Of David Miller
> Sent: Tuesday, August 08, 2017 6:17 PM
> To: Keller, Jacob E <jacob.e.keller@intel.com>
> Cc: netdev@vger.kernel.org
> Subject: Re: [RFC PATCH] net: don't set __LINK_STATE_START until after dev->open()
> call
> 
> From: Jacob Keller <jacob.e.keller@intel.com>
> Date: Mon,  7 Aug 2017 15:24:21 -0700
> > I am wondering whether the proposed change is acceptable here, or
> > whether some ndo_open handlers rely on __LINK_STATE_START being true
> > prior to their being called?
> 
> I think this has the potential to break a bunch of drivers, but I
> cannot prove this.
> 
> A lot of drivers have several pieces of state setup when they bring
> the device up.  And these routines are also invoked from other code
> paths like suspend/resume, PCI-E error recovery, etc. and they
> probably do netif_running() calls here and there.
> 
> This behavior has been this way for a very long time, so the risk is
> quite high I think.

That's what I am worried about. However, I think there are problems with leaving it. A lot of drivers rely on netif_running() to determine whether or not the device is open, but they may be using it relying on all the changes caused by the .ndo_open() handler are finished. The current system there is a race, since you set the __LINK_STATE_START before .ndo_open is called.

This is what I found when attempting to debug a race in i40evf_open and i40evf_reset. The reset ended up running at the same time as open and the two flows collided because reset relied on netif_running() under the assumption that it would not return true until the device was actually open. However, it was running at the same time as open was, so it was trying to shutdown things at the same time as the open call was trying to bring them up.

Any location which uses rtnl_lock() will be safe, since the dev_open call is under rtnl_lock() so all callers of netif_running() under lock are serialized to see only the flag before or after dev_open completes. This is the majority of the callers I think.

Unfortunately, I can't really hold the rtnl_lock() during reset, since this caused a deadlock when I tried it due to lock ordering problems. I'm not sure if I could fix that or not.

I think there are other places which are incorrect, but haven't yet run into an issue. The only place I can see where the functionality might be changed for the worse is if a .dev_open handler actually relies on checking netif_running() during its call, which seems incredibly silly to me....

Thanks,
Jake

P.S. I apologize for the typo in this patch, if there is more discussion I can send a v2 which fixes it.
David Miller Aug. 9, 2017, 9:24 p.m. UTC | #3
From: "Keller, Jacob E" <jacob.e.keller@intel.com>
Date: Wed, 9 Aug 2017 20:00:55 +0000

> That's what I am worried about. However, I think there are problems
> with leaving it. A lot of drivers rely on netif_running() to
> determine whether or not the device is open, but they may be using
> it relying on all the changes caused by the .ndo_open() handler are
> finished. The current system there is a race, since you set the
> __LINK_STATE_START before .ndo_open is called.

I think this is only a half-accurate description.

What drivers want to know is if initialization phase X of ndo_open()
has completed.

And honestly it must be like this because this is what one would
use to guide the teardown during failure paths of ndo_open(), right?

So I would really rather drivers internally track this "I initialized
X" state, as most drivers do, rather than take the risk of changing
the netif_running() behavior which can impact ~500 drivers :-)
diff mbox

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 1d75499add72..11953af90427 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1362,8 +1362,6 @@  static int __dev_open(struct net_device *dev)
 	if (ret)
 		return ret;
 
-	set_bit(__LINK_STATE_START, &dev->state);
-
 	if (ops->ndo_validate_addr)
 		ret = ops->ndo_validate_addr(dev);
 
@@ -1372,9 +1370,8 @@  static int __dev_open(struct net_device *dev)
 
 	netpoll_poll_enable(dev);
 
-	if (ret)
-		clear_bit(__LINK_STATE_START, &dev->state);
-	else {
+	if (!ret)
+		set_bit(__LINK_STATE_START, &dev->state);
 		dev->flags |= IFF_UP;
 		dev_set_rx_mode(dev);
 		dev_activate(dev);