[net-next,0/8] ibmvnic: Failover hardening

Message ID 1527100682-23099-1-git-send-email-tlfalcon@linux.vnet.ibm.com

Message

Thomas Falcon May 23, 2018, 6:37 p.m. UTC
Introduce additional transport event hardening to handle
events that arrive during device reset. In the driver's current
state, a transport event received during device reset can leave
the device unresponsive, because the driver keeps processing
operations that are no longer valid once the backing device
context has changed. After a transport event, the device expects
a request to begin the initialization process. If the driver is
still processing a previously queued device reset at that point,
the reset is likely to fail, since firmware will reject any
command other than the one that initializes the client driver's
Command-Response Queue.
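
The firmware behavior described above can be modeled roughly as follows.
This is only an illustrative sketch, not code from the series: the
sketch_* names, the command enum, and the -1 return value are all
hypothetical and do not correspond to real ibmvnic opcodes or return
codes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical command codes; NOT the real ibmvnic CRQ opcodes. */
enum sketch_cmd {
	SKETCH_CMD_CRQ_INIT,
	SKETCH_CMD_LOGIN,
	SKETCH_CMD_SET_LINK,
};

/* Hypothetical model of the firmware end of the queue. */
struct sketch_fw {
	/* Set by a transport event, cleared once CRQ init is received. */
	bool awaiting_init;
};

/* A transport event puts the modeled firmware back into "expect init". */
void sketch_transport_event(struct sketch_fw *fw)
{
	assert(fw != NULL);
	fw->awaiting_init = true;
}

/*
 * Returns 0 if the modeled firmware accepts the command, or -1 if it
 * rejects it because only CRQ initialization is allowed right now.
 */
int sketch_send_cmd(struct sketch_fw *fw, enum sketch_cmd cmd)
{
	assert(fw != NULL);

	if (cmd == SKETCH_CMD_CRQ_INIT) {
		fw->awaiting_init = false;
		return 0;
	}

	/* Any other command fails until initialization has been requested. */
	return fw->awaiting_init ? -1 : 0;
}
```

In this model, a reset that was queued before the transport event keeps
issuing non-init commands and sees every one of them rejected, which is
why checking the return code of each command matters.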

Instead of failing and becoming dormant, the driver will make
one more attempt to recover and continue operation. This is
achieved by setting a state flag which, when true, directs
the driver to clean up all allocated resources and perform
a hard reset in an attempt to bring the driver back to an
operational state.
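
The recovery flow described above might look roughly like the sketch
below. It is a simplified user-space model under stated assumptions, not
the driver's implementation: struct sketch_adapter, its fields, and the
sketch_* functions are hypothetical names chosen for illustration, and
the "transport event" is simulated with a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical adapter state; field names are illustrative only. */
struct sketch_adapter {
	bool transport_event_pending; /* simulates an event arriving mid-reset */
	bool force_hard_reset;        /* the state flag from the description */
	bool resources_allocated;
	bool operational;
	int hard_resets;              /* counts hard resets, for inspection */
};

/* Ordinary reset: fails if a transport event interrupts it. */
int sketch_soft_reset(struct sketch_adapter *a)
{
	if (a->transport_event_pending) {
		a->transport_event_pending = false;
		a->force_hard_reset = true; /* request one recovery attempt */
		return -1;
	}
	a->operational = true;
	return 0;
}

/* Hard reset: drop everything and rebuild from scratch. */
int sketch_hard_reset(struct sketch_adapter *a)
{
	a->resources_allocated = false; /* clean up all allocated resources */
	a->resources_allocated = true;  /* reallocate and reinitialize */
	a->operational = true;
	a->hard_resets++;
	return 0;
}

/* Try the ordinary reset; fall back to a single hard reset if flagged. */
int sketch_handle_reset(struct sketch_adapter *a)
{
	int rc;

	assert(a != NULL);
	a->operational = false;

	rc = sketch_soft_reset(a);
	if (rc != 0 && a->force_hard_reset) {
		a->force_hard_reset = false; /* only one more attempt */
		rc = sketch_hard_reset(a);
	}
	return rc;
}
```

Clearing the flag before the hard reset keeps the fallback to a single
extra attempt, matching the "one more attempt" wording above; how the
real driver bounds its retries is not shown here.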

Thomas Falcon (8):
  ibmvnic: Mark NAPI flag as disabled when released
  ibmvnic: Introduce active CRQ state
  ibmvnic: Check CRQ command return codes
  ibmvnic: Return error code if init interrupted by transport event
  ibmvnic: Handle error case when setting link state
  ibmvnic: Create separate initialization routine for resets
  ibmvnic: Set resetting state at earliest possible point
  ibmvnic: Introduce hard reset recovery

 drivers/net/ethernet/ibm/ibmvnic.c | 223 +++++++++++++++++++++++++++++++++----
 drivers/net/ethernet/ibm/ibmvnic.h |   2 +
 2 files changed, 202 insertions(+), 23 deletions(-)

Comments

David Miller May 25, 2018, 2:19 a.m. UTC | #1
From: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Date: Wed, 23 May 2018 13:37:54 -0500

Series applied.