
[net] be2net: Synchronize be_update_queues with dev_watchdog

Message ID 20190718014218.16610-1-bpoirier@suse.com
State Accepted
Delegated to: David Miller
Series [net] be2net: Synchronize be_update_queues with dev_watchdog

Commit Message

Benjamin Poirier July 18, 2019, 1:42 a.m. UTC
As pointed out by Firo Yang, a netdev tx timeout may trigger just before an
ethtool set_channels operation is started. be_tx_timeout(), which dumps
some queue structures, is not written to run concurrently with
be_update_queues(), which frees/allocates those queue structures. Add some
synchronization between the two.

Message-id: <CH2PR18MB31898E033896F9760D36BFF288C90@CH2PR18MB3189.namprd18.prod.outlook.com>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
---
 drivers/net/ethernet/emulex/benet/be_main.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Florian Fainelli July 18, 2019, 5:23 p.m. UTC | #1
On 7/17/19 6:42 PM, Benjamin Poirier wrote:
> As pointed out by Firo Yang, a netdev tx timeout may trigger just before an
> ethtool set_channels operation is started. be_tx_timeout(), which dumps
> some queue structures, is not written to run concurrently with
> be_update_queues(), which frees/allocates those queues structures. Add some
> synchronization between the two.
> 
> Message-id: <CH2PR18MB31898E033896F9760D36BFF288C90@CH2PR18MB3189.namprd18.prod.outlook.com>
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>

Would not moving the netif_tx_disable() in be_close() further up in the
function resolve that problem as well?
Benjamin Poirier July 19, 2019, 5:26 a.m. UTC | #2
On 2019/07/18 10:23, Florian Fainelli wrote:
> On 7/17/19 6:42 PM, Benjamin Poirier wrote:
> > As pointed out by Firo Yang, a netdev tx timeout may trigger just before an
> > ethtool set_channels operation is started. be_tx_timeout(), which dumps
> > some queue structures, is not written to run concurrently with
> > be_update_queues(), which frees/allocates those queues structures. Add some
> > synchronization between the two.
> > 
> > Message-id: <CH2PR18MB31898E033896F9760D36BFF288C90@CH2PR18MB3189.namprd18.prod.outlook.com>
> > Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> 
> Would not moving the netif_tx_disable() in be_close() further up in the
> function resolve that problem as well?

Thanks for your review, Florian,

No, netif_tx_disable() doesn't provide mutual exclusion with
dev_watchdog(). You can have:

cpu0                               cpu1
\ dev_watchdog
       \ netif_tx_lock
              \ be_tx_timeout
                     ...
                                   \ be_set_channels
                                          \ be_update_queues
                                                 \ netif_carrier_off
                                                 \ netif_tx_disable
                                                 ...
                                                 \ be_clear_queues
                     still running in
                     be_tx_timeout(),
                     boom!
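The race in the diagram, and why the patch closes it, can be modelled in userspace. This is a minimal sketch under stated assumptions: a pthread mutex stands in for netif_tx_lock(), and, as in net/sched/sch_generic.c, dev_watchdog() is assumed to check netif_carrier_ok() and invoke the driver's timeout handler while holding that lock. The names struct model, watchdog(), update_queues() and run_model() are hypothetical and are not kernel APIs. The point it illustrates is that once netif_carrier_off() is done under the tx lock, any timeout handler that was already running has finished, and any later watchdog pass sees the carrier down and skips the handler, so freeing the queues afterwards is safe.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel APIs.
 * tx_lock models netif_tx_lock(), carrier_ok models netif_carrier_ok(),
 * queues models the adapter's queue structures. */
struct model {
	pthread_mutex_t tx_lock;
	bool carrier_ok;
	int *queues;
};

/* Models dev_watchdog(): the carrier check and the timeout handler
 * both run with tx_lock held. */
static void *watchdog(void *arg)
{
	struct model *m = arg;

	for (int i = 0; i < 100000; i++) {
		pthread_mutex_lock(&m->tx_lock);
		if (m->carrier_ok) {
			/* Models be_tx_timeout() dumping queue state:
			 * the queues must still be allocated here. */
			assert(m->queues != NULL);
			volatile int head = m->queues[0];
			(void)head;
		}
		pthread_mutex_unlock(&m->tx_lock);
	}
	return NULL;
}

/* Models be_update_queues() with the patch applied. */
static void update_queues(struct model *m)
{
	/* Taking tx_lock waits out a handler that is already running... */
	pthread_mutex_lock(&m->tx_lock);
	m->carrier_ok = false;	/* models netif_carrier_off() */
	pthread_mutex_unlock(&m->tx_lock);

	/* ...and every later watchdog pass sees carrier_ok == false, so
	 * freeing the queues here cannot overlap with the handler. */
	free(m->queues);
	m->queues = NULL;
}

/* Runs the watchdog concurrently with update_queues(); returns 0 if the
 * model ends with the carrier off and the queues released. */
int run_model(void)
{
	struct model m = { .carrier_ok = true };
	pthread_t t;

	pthread_mutex_init(&m.tx_lock, NULL);
	m.queues = calloc(4, sizeof(*m.queues));
	if (m.queues == NULL) {
		pthread_mutex_destroy(&m.tx_lock);
		return -1;
	}

	if (pthread_create(&t, NULL, watchdog, &m) != 0) {
		free(m.queues);
		pthread_mutex_destroy(&m.tx_lock);
		return -1;
	}
	update_queues(&m);
	pthread_join(t, NULL);
	pthread_mutex_destroy(&m.tx_lock);

	return (!m.carrier_ok && m.queues == NULL) ? 0 : -1;
}
```

In this model the lock is held only around the carrier update, mirroring the patch, where netif_tx_lock_bh()/netif_tx_unlock_bh() bracket just netif_carrier_off() and be_close() then runs without the tx lock. Moving the flag update outside the lock (roughly what relying on netif_tx_disable() alone amounts to here) would let a watchdog pass that already saw carrier_ok == true still be inside the handler when free() runs.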
David Miller July 21, 2019, 8:22 p.m. UTC | #3
From: Benjamin Poirier <bpoirier@suse.com>
Date: Thu, 18 Jul 2019 10:42:18 +0900

> As pointed out by Firo Yang, a netdev tx timeout may trigger just before an
> ethtool set_channels operation is started. be_tx_timeout(), which dumps
> some queue structures, is not written to run concurrently with
> be_update_queues(), which frees/allocates those queues structures. Add some
> synchronization between the two.
> 
> Message-id: <CH2PR18MB31898E033896F9760D36BFF288C90@CH2PR18MB3189.namprd18.prod.outlook.com>
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>

Applied and queued up for -stable, thanks.

Patch

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index b7a246b33599..2edb86ec9fe9 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -4698,8 +4698,13 @@  int be_update_queues(struct be_adapter *adapter)
 	int status;
 
 	if (netif_running(netdev)) {
+		/* be_tx_timeout() must not run concurrently with this
+		 * function, synchronize with an already-running dev_watchdog
+		 */
+		netif_tx_lock_bh(netdev);
 		/* device cannot transmit now, avoid dev_watchdog timeouts */
 		netif_carrier_off(netdev);
+		netif_tx_unlock_bh(netdev);
 
 		be_close(netdev);
 	}