
[net] be2net: Signal that the device cannot transmit during reconfiguration

Message ID 20190716081655.7676-1-bpoirier@suse.com
State Accepted
Delegated to: David Miller

Commit Message

Benjamin Poirier July 16, 2019, 8:16 a.m. UTC
While changing the number of interrupt channels, be2net stops adapter
operation (including netif_tx_disable()) but it doesn't signal that it
cannot transmit. This may lead dev_watchdog() to falsely trigger during
that time.

Add the missing call to netif_carrier_off(), following the pattern used in
many other drivers. netif_carrier_on() is already taken care of in
be_open().

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
---
 drivers/net/ethernet/emulex/benet/be_main.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
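
For readers unfamiliar with the watchdog, the gating described in the commit message can be sketched as a small userspace model. This is only an illustration: `struct model_dev` and `model_watchdog_fires()` are hypothetical names, not kernel API, and the model assumes that dev_watchdog() skips its timeout check entirely unless the device is running and reports carrier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few pieces of net_device state involved. */
struct model_dev {
	bool running;                 /* models netif_running() */
	bool carrier_ok;              /* models netif_carrier_ok() */
	bool txq_stopped;             /* models a stopped tx queue */
	unsigned long trans_start;    /* tick of the last transmit */
	unsigned long watchdog_timeo; /* allowed stall, in ticks */
};

/*
 * Models the decision made by dev_watchdog(): a stalled queue is only
 * reported as a tx timeout while the device is running and has carrier.
 * The plain comparison ignores tick wraparound, which the kernel handles.
 */
static bool model_watchdog_fires(const struct model_dev *dev, unsigned long now)
{
	assert(dev != NULL);

	if (!dev->running || !dev->carrier_ok)
		return false;

	return dev->txq_stopped &&
	       now > dev->trans_start + dev->watchdog_timeo;
}
```

With `running`, `carrier_ok` and `txq_stopped` all set, `trans_start = 100`, `watchdog_timeo = 50` and `now = 200`, the function returns true. Clearing `carrier_ok`, which is what netif_carrier_off() does in this model, makes it return false for the same stalled queue.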

Comments

David Miller July 16, 2019, 7:41 p.m. UTC | #1
From: Benjamin Poirier <bpoirier@suse.com>
Date: Tue, 16 Jul 2019 17:16:55 +0900

> While changing the number of interrupt channels, be2net stops adapter
> operation (including netif_tx_disable()) but it doesn't signal that it
> cannot transmit. This may lead dev_watchdog() to falsely trigger during
> that time.
> 
> Add the missing call to netif_carrier_off(), following the pattern used in
> many other drivers. netif_carrier_on() is already taken care of in
> be_open().
> 
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>

Applied.
Firo Yang July 17, 2019, 4:23 a.m. UTC | #2
I think there is a problem if dev_watchdog() is triggered before netif_carrier_off(). dev_watchdog() might call ->ndo_tx_timeout(), i.e. be_tx_timeout(), if a txq timeout happens. Thus be_tx_timeout() could still access memory that is being freed by be_update_queues().

Thanks,
Firo
Benjamin Poirier July 17, 2019, 8:23 a.m. UTC | #3
On 2019/07/17 13:23, Firo Yang wrote:
> I think there is a problem if dev_watchdog() is triggered before netif_carrier_off(). dev_watchdog() might call ->ndo_tx_timeout(), i.e. be_tx_timeout(), if a txq timeout happens. Thus be_tx_timeout() could still access memory that is being freed by be_update_queues().

Good point. That's a separate problem which would occur in case of a real
tx timeout. How about this follow-up change:

--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -4698,8 +4698,13 @@ int be_update_queues(struct be_adapter *adapter)
 	int status;
 
 	if (netif_running(netdev)) {
+		/* be_tx_timeout() must not run concurrently with this
+		 * function, synchronize with an already-running dev_watchdog
+		 */
+		netif_tx_lock_bh(netdev);
 		/* device cannot transmit now, avoid dev_watchdog timeouts */
 		netif_carrier_off(netdev);
+		netif_tx_unlock_bh(netdev);
 
 		be_close(netdev);
 	}
Firo Yang July 17, 2019, 8:56 a.m. UTC | #4
I don't think this change would fix the problem, because on SMP dev_watchdog() could run on a different CPU.

Thanks,
Firo
Benjamin Poirier July 17, 2019, 9:32 a.m. UTC | #5
On 2019/07/17 17:56, Firo Yang wrote:
> I don't think this change would fix the problem, because on SMP dev_watchdog() could run on a different CPU.

Hmm, SMP is clearly part of the picture here. The change I proposed
relies on the synchronization offered by dev->tx_global_lock:

We have
\ dev_watchdog
	\ netif_tx_lock
		spin_lock(&dev->tx_global_lock);
	...
	\ netif_tx_unlock

and

\ be_update_queues
	\ netif_tx_lock_bh
		\ netif_tx_lock
			spin_lock(&dev->tx_global_lock);

Both paths take the same spinlock, so they serialize even when they run
on different CPUs. Makes sense?
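
The two call chains above can be modelled in userspace as a rough sketch. Everything here is an assumption made for illustration: a pthread mutex stands in for the dev->tx_global_lock spinlock, `tx_stats` stands in for the queue memory that be_update_queues() frees, and all `model_*` names are hypothetical. The point it tries to show is that once the carrier flag is cleared under the lock, a watchdog that was already running has finished, and any later one sees carrier off and never reaches the timeout handler.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_dev {
	pthread_mutex_t tx_global_lock; /* models dev->tx_global_lock */
	bool carrier_ok;                /* models netif_carrier_ok() */
	int *tx_stats;                  /* models memory freed on reconfiguration */
	int timeouts_handled;
};

/* Models dev_watchdog(): the check and the handler both run under the lock. */
static void model_watchdog(struct model_dev *dev)
{
	pthread_mutex_lock(&dev->tx_global_lock);
	if (dev->carrier_ok) {
		/* models ->ndo_tx_timeout() touching queue memory */
		assert(dev->tx_stats != NULL);
		dev->tx_stats[0]++;
		dev->timeouts_handled++;
	}
	pthread_mutex_unlock(&dev->tx_global_lock);
}

/* Models the proposed be_update_queues() prologue, then the teardown. */
static void model_update_queues(struct model_dev *dev)
{
	pthread_mutex_lock(&dev->tx_global_lock);
	dev->carrier_ok = false;
	pthread_mutex_unlock(&dev->tx_global_lock);

	/* No watchdog can be inside the handler from here on. */
	free(dev->tx_stats);
	dev->tx_stats = NULL;
}

static void *model_watchdog_thread(void *arg)
{
	struct model_dev *dev = arg;

	for (int i = 0; i < 1000; i++)
		model_watchdog(dev);
	return NULL;
}

/* Runs the watchdog on another thread while reconfiguring; returns the
 * number of timeouts handled before carrier went off, or -1 on error. */
static int model_race_once(void)
{
	struct model_dev dev = { .carrier_ok = true };
	pthread_t tid;

	pthread_mutex_init(&dev.tx_global_lock, NULL);
	dev.tx_stats = calloc(1, sizeof(*dev.tx_stats));
	if (!dev.tx_stats)
		return -1;
	if (pthread_create(&tid, NULL, model_watchdog_thread, &dev) != 0) {
		free(dev.tx_stats);
		return -1;
	}

	model_update_queues(&dev);
	pthread_join(tid, NULL);
	pthread_mutex_destroy(&dev.tx_global_lock);
	return dev.timeouts_handled;
}
```

How many timeouts `model_race_once()` reports depends on scheduling, but the assertion inside `model_watchdog()` should never trip. In the kernel the _bh variant matters as well, since dev_watchdog() runs from a timer in softirq context; the model has no equivalent of that.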
Firo Yang July 17, 2019, 10:25 a.m. UTC | #6
Crystal clear. Many thanks.

// Firo

Patch

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 82015c8a5ed7..b7a246b33599 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -4697,8 +4697,12 @@  int be_update_queues(struct be_adapter *adapter)
 	struct net_device *netdev = adapter->netdev;
 	int status;
 
-	if (netif_running(netdev))
+	if (netif_running(netdev)) {
+		/* device cannot transmit now, avoid dev_watchdog timeouts */
+		netif_carrier_off(netdev);
+
 		be_close(netdev);
+	}
 
 	be_cancel_worker(adapter);