diff mbox series

[net,1/2] ionic: add LIF_READY state to close probe-open race

Message ID 20200429183739.56540-2-snelson@pensando.io
State Changes Requested
Delegated to: David Miller
Headers show
Series ionic: fw upgrade bug fixes | expand

Commit Message

Shannon Nelson April 29, 2020, 6:37 p.m. UTC
Add a bit of state to the lif to signify when the queues are
ready to be used.  This closes an ionic_probe()/ionic_open()
race condition where the driver has registered the netdev
and signaled Link Up while running under ionic_probe(),
which NetworkManager or other user processes can see and then
try to bring up the device before the initial pass through
ionic_link_status_check() has finished.  NetworkManager's
thread can get into __dev_open() and set __LINK_STATE_START
in dev->state before the ionic_probe() thread makes it to the
netif_running() check, which results in the ionic_probe() thread
trying to start the queues before the queues have completed
their initialization.

Adding a LIF_QREADY flag allows us to prevent this condition by
signaling whether the Tx/Rx queues are initialized and ready.

Fixes: c672412f6172 ("ionic: remove lifs on fw reset")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
---
 drivers/net/ethernet/pensando/ionic/ionic_lif.c |  6 ++++++
 drivers/net/ethernet/pensando/ionic/ionic_lif.h | 11 +++++++++++
 2 files changed, 17 insertions(+)

Comments

David Miller April 29, 2020, 7:38 p.m. UTC | #1
From: Shannon Nelson <snelson@pensando.io>
Date: Wed, 29 Apr 2020 11:37:38 -0700

> Add a bit of state to the lif to signify when the queues are
> ready to be used.  This closes an ionic_probe()/ionic_open()
> race condition where the driver has registered the netdev
> and signaled Link Up while running under ionic_probe(),
> which NetworkManager or other user processes can see and then
> try to bring up the device before the initial pass through
> ionic_link_status_check() has finished.  NetworkManager's
> thread can get into __dev_open() and set __LINK_STATE_START
> in dev->state before the ionic_probe() thread makes it to the
> netif_running() check, which results in the ionic_probe() thread
> trying to start the queues before the queues have completed
> their initialization.
> 
> Adding a LIF_QREADY flag allows us to prevent this condition by
> signaling whether the Tx/Rx queues are initialized and ready.
> 
> Fixes: c672412f6172 ("ionic: remove lifs on fw reset")
> Signed-off-by: Shannon Nelson <snelson@pensando.io>

I would rather you fix this by adjusting the visibility.

You should only call register_netdevice() when all of the device
setup and software state initialization is complete.

This improper ordering is the true cause of this bug and adding
this new boolean state is simply papering over the core issue.

Thanks.
Shannon Nelson April 29, 2020, 9:43 p.m. UTC | #2
On 4/29/20 12:38 PM, David Miller wrote:
> From: Shannon Nelson <snelson@pensando.io>
> Date: Wed, 29 Apr 2020 11:37:38 -0700
>
>> Add a bit of state to the lif to signify when the queues are
>> ready to be used.  This closes an ionic_probe()/ionic_open()
>> race condition where the driver has registered the netdev
>> and signaled Link Up while running under ionic_probe(),
>> which NetworkManager or other user processes can see and then
>> try to bring up the device before the initial pass through
>> ionic_link_status_check() has finished.  NetworkManager's
>> thread can get into __dev_open() and set __LINK_STATE_START
>> in dev->state before the ionic_probe() thread makes it to the
>> netif_running() check, which results in the ionic_probe() thread
>> trying to start the queues before the queues have completed
>> their initialization.
>>
>> Adding a LIF_QREADY flag allows us to prevent this condition by
>> signaling whether the Tx/Rx queues are initialized and ready.
>>
>> Fixes: c672412f6172 ("ionic: remove lifs on fw reset")
>> Signed-off-by: Shannon Nelson <snelson@pensando.io>
> I would rather you fix this by adjusting the visibility.
>
> You should only call register_netdevice() when all of the device
> setup and software state initialization is complete.
>
> This improper ordering is the true cause of this bug and adding
> this new boolean state is simply papering over the core issue.
>
> Thanks.

It seems my impatience to see the Link Up message is a problem.  It 
looka as if I push the link check out of the probe thread and let the 
watchdog discover it later, all is much better.  I'll followup with a v2.

sln
diff mbox series

Patch

diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.c b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
index 5acf4f46c268..2dc513f43fd4 100644
--- a/drivers/net/ethernet/pensando/ionic/ionic_lif.c
+++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
@@ -1627,6 +1627,9 @@  static int ionic_start_queues(struct ionic_lif *lif)
 {
 	int err;
 
+	if (!test_bit(IONIC_LIF_F_QREADY, lif->state))
+		return 0;
+
 	if (test_and_set_bit(IONIC_LIF_F_UP, lif->state))
 		return 0;
 
@@ -1652,6 +1655,7 @@  int ionic_open(struct net_device *netdev)
 	err = ionic_txrx_init(lif);
 	if (err)
 		goto err_out;
+	set_bit(IONIC_LIF_F_QREADY, lif->state);
 
 	/* don't start the queues until we have link */
 	if (netif_carrier_ok(netdev)) {
@@ -1663,6 +1667,7 @@  int ionic_open(struct net_device *netdev)
 	return 0;
 
 err_txrx_deinit:
+	clear_bit(IONIC_LIF_F_QREADY, lif->state);
 	ionic_txrx_deinit(lif);
 err_out:
 	ionic_txrx_free(lif);
@@ -1685,6 +1690,7 @@  int ionic_stop(struct net_device *netdev)
 	if (test_bit(IONIC_LIF_F_FW_RESET, lif->state))
 		return 0;
 
+	clear_bit(IONIC_LIF_F_QREADY, lif->state);
 	ionic_stop_queues(lif);
 	ionic_txrx_deinit(lif);
 	ionic_txrx_free(lif);
diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.h b/drivers/net/ethernet/pensando/ionic/ionic_lif.h
index 5d4ffda5c05f..a7bf85c17354 100644
--- a/drivers/net/ethernet/pensando/ionic/ionic_lif.h
+++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.h
@@ -121,9 +121,20 @@  struct ionic_lif_sw_stats {
 	u64 rx_csum_error;
 };
 
+/**
+ * enum ionic_lif_state_flags - driver LIF states
+ * @IONIC_LIF_F_INITED:		LIF configured, adminq running
+ * @IONIC_LIF_F_SW_DEBUG_STATS:	Ethtool printing of extra stats is enabled
+ * @IONIC_LIF_F_QREADY:		LIF queues are configured and ready for UP
+ * @IONIC_LIF_F_UP:		LIF is fully UP and running
+ * @IONIC_LIF_F_LINK_CHECK_REQUESTED:	Link check has been requested
+ * @IONIC_LIF_F_QUEUE_RESET:	LIF is resetting all queues
+ * @IONIC_LIF_F_FW_RESET:	FW is going through reset
+ */
 enum ionic_lif_state_flags {
 	IONIC_LIF_F_INITED,
 	IONIC_LIF_F_SW_DEBUG_STATS,
+	IONIC_LIF_F_QREADY,
 	IONIC_LIF_F_UP,
 	IONIC_LIF_F_LINK_CHECK_REQUESTED,
 	IONIC_LIF_F_QUEUE_RESET,