diff mbox

[net] bnx2x: don't reset chip on cleanup if PCI function is offline

Message ID 1470687227-21024-1-git-send-email-gpiccoli@linux.vnet.ibm.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Guilherme G. Piccoli Aug. 8, 2016, 8:13 p.m. UTC
When PCI error is detected, in some architectures (like PowerPC) a slot
reset is performed - the driver's error handlers are in charge of "disable"
device before the reset, and re-enable it after a successful slot reset.

There are two cases though that another path is taken on the code: if the
slot reset is not successful or if too many errors already happened in the
specific adapter (meaning that possibly the device is experiencing a HW
failure that slot reset is not able to solve), the core PCI error mechanism
(called EEH in PowerPC) will remove the adapter from the system, since it
will consider this as a permanent failure on device. In this case, a path
is taken that leads to bnx2x_chip_cleanup() calling bnx2x_reset_hw(), which
then tries to perform a HW reset on chip. This reset won't succeed since
the HW is in a fault state, which can be seen by multiple messages on
kernel log like below:

	bnx2x: [bnx2x_issue_dmae_with_comp:552(eth1)]DMAE timeout!
	bnx2x: [bnx2x_write_dmae:600(eth1)]DMAE returned failure -1

After some time, the PCI error mechanism gives up on waiting the driver's
correct removal procedure and forcibly remove the adapter from the system.
We can see soft lockup while core PCI error mechanism is waiting for driver
to accomplish the right removal process.

This patch adds a verification to avoid a chip reset whenever the function
is in PCI error state - since this case is only reached when we have a
device being removed because of a permanent failure, the HW chip reset is
not expected to work fine neither is necessary.

Also, we avoid the MCP information dump in case of non-recoverable PCI
error (when adapter is about to be removed), since it will certainly fail.

Reported-by: Harsha Thyagaraja <hathyaga@in.ibm.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

Comments

Yuval Mintz Aug. 9, 2016, 12:14 p.m. UTC | #1
> When PCI error is detected, in some architectures (like PowerPC) a slot reset is
> performed - the driver's error handlers are in charge of "disable"
> device before the reset, and re-enable it after a successful slot reset.
> 
> There are two cases though that another path is taken on the code: if the slot
> reset is not successful or if too many errors already happened in the specific
> adapter (meaning that possibly the device is experiencing a HW failure that slot
> reset is not able to solve), the core PCI error mechanism (called EEH in PowerPC)
> will remove the adapter from the system, since it will consider this as a
> permanent failure on device. 
Why would the published resume()  from pci_error_handlers be called in this scenario?

> Also, we avoid the MCP information dump in case of non-recoverable PCI error
> (when adapter is about to be removed), since it will certainly fail.

We should probably avoid several things here; Why specifically only this?
 
> +	if (unlikely(pci_channel_offline(bp->pdev))) {
> +		BNX2X_ERR("Cannot dump MCP info while in PCI error\n");
> +		return;
> +	}
> +
Nitpicky, but I don't think there's reason to add 'unlikely' to a purely slowpath
Configuration.

>  	val = REG_RD(bp, MCP_REG_MCPR_CPU_PROGRAM_COUNTER);
>  	if (val == REG_RD(bp, MCP_REG_MCPR_CPU_PROGRAM_COUNTER))
>  		BNX2X_ERR("%s" "MCP PC at 0x%x\n", lvl, val); @@ -9415,10
> +9420,16 @@ unload_error:


> -	/* Reset the chip */
> -	rc = bnx2x_reset_hw(bp, reset_code);
> -	if (rc)
> -		BNX2X_ERR("HW_RESET failed\n");
> +	/* Reset the chip, unless PCI function is offline. If we reach this
> +	 * point following a PCI error handling, it means device is really
> +	 * in a bad state and we're about to remove it, so reset the chip
> +	 * is not a good idea.
> +	 */
> +	if (!pci_channel_offline(bp->pdev)) {
> +		rc = bnx2x_reset_hw(bp, reset_code);
> +		if (rc)
> +			BNX2X_ERR("HW_RESET failed\n");
> +	}

Why not simply check this at the beginning of the function?
Guilherme G. Piccoli Aug. 9, 2016, 1:27 p.m. UTC | #2
On 08/09/2016 09:14 AM, Yuval Mintz wrote:
>> When PCI error is detected, in some architectures (like PowerPC) a slot reset is
>> performed - the driver's error handlers are in charge of "disable"
>> device before the reset, and re-enable it after a successful slot reset.
>>
>> There are two cases though that another path is taken on the code: if the slot
>> reset is not successful or if too many errors already happened in the specific
>> adapter (meaning that possibly the device is experiencing a HW failure that slot
>> reset is not able to solve), the core PCI error mechanism (called EEH in PowerPC)
>> will remove the adapter from the system, since it will consider this as a
>> permanent failure on device.
> Why would the published resume()  from pci_error_handlers be called in this scenario?

It isn't. That's why I specifically commented on commit message: "There 
are two cases though that another path is taken on the code".

The code path reach bnx2x_chip_cleanup() on device removal from the 
system, as seen in the below call trace:

bnx2x_chip_cleanup+0x3c0/0x910 [bnx2x]
bnx2x_nic_unload+0x268/0xaf0 [bnx2x]
bnx2x_close+0x34/0x50 [bnx2x]
__dev_close_many+0xd4/0x150
dev_close_many+0xa8/0x160
rollback_registered_many+0x174/0x3f0
rollback_registered+0x40/0x70
unregister_netdevice_queue+0x98/0x110
unregister_netdev+0x34/0x50
__bnx2x_remove+0xa8/0x3a0 [bnx2x]
pci_device_remove+0x70/0x110


>
>> Also, we avoid the MCP information dump in case of non-recoverable PCI error
>> (when adapter is about to be removed), since it will certainly fail.
>
> We should probably avoid several things here; Why specifically only this?

For example, we shouldn't execute bnx2x_timer() in this scenario. But I 
thought it'd be too much to check every call of a timer function against 
PCI channel state just to avoid it's execution on this scenario, so I 
just let it execute, since it seems harmless.

On the other hand, MCP dump is, as you said below, a slowpath and surely 
will fail in the following conditional (since addresses are all like 
0xFFF...):

        /* sanity */
         if (trace_shmem_base < MCPR_SCRATCH_BASE(bp) + 
MCPR_TRACE_BUFFER_SIZE ||
             trace_shmem_base >= MCPR_SCRATCH_BASE(bp) +
                                 SCRATCH_BUFFER_SIZE(bp))

So, I added the check to at least be more informative on logs on why it 
failed. If you desire, we can avoid the timer execution (maybe a 
del_timer() instead of checking for PCI state?) and bnx2x_period_task() 
run in case of PCI permanent failure. I'm open to suggestions!


>> +	if (unlikely(pci_channel_offline(bp->pdev))) {
>> +		BNX2X_ERR("Cannot dump MCP info while in PCI error\n");
>> +		return;
>> +	}
>> +
> Nitpicky, but I don't think there's reason to add 'unlikely' to a purely slowpath
> Configuration.

OK, your call. I can remove it on v2, no problem.


>
>>   	val = REG_RD(bp, MCP_REG_MCPR_CPU_PROGRAM_COUNTER);
>>   	if (val == REG_RD(bp, MCP_REG_MCPR_CPU_PROGRAM_COUNTER))
>>   		BNX2X_ERR("%s" "MCP PC at 0x%x\n", lvl, val); @@ -9415,10
>> +9420,16 @@ unload_error:
>
>
>> -	/* Reset the chip */
>> -	rc = bnx2x_reset_hw(bp, reset_code);
>> -	if (rc)
>> -		BNX2X_ERR("HW_RESET failed\n");
>> +	/* Reset the chip, unless PCI function is offline. If we reach this
>> +	 * point following a PCI error handling, it means device is really
>> +	 * in a bad state and we're about to remove it, so reset the chip
>> +	 * is not a good idea.
>> +	 */
>> +	if (!pci_channel_offline(bp->pdev)) {
>> +		rc = bnx2x_reset_hw(bp, reset_code);
>> +		if (rc)
>> +			BNX2X_ERR("HW_RESET failed\n");
>> +	}
>
> Why not simply check this at the beginning of the function?

Because I wasn't sure if I could drop the entire execution of 
chip_cleanup(). I tried to keep the most of this function aiming to 
shutdown the module in a gentle way, like cleaning MAC, stopping 
queues...but again, I'm open to suggestions and gladly will change this 
in v2 if you think it's for the best.

Thanks for your review,


Guilherme
Yuval Mintz Aug. 10, 2016, 7:59 a.m. UTC | #3
> > Why would the published resume()  from pci_error_handlers be called in this
> scenario?
> 
> It isn't. That's why I specifically commented on commit message: "There are two
> cases though that another path is taken on the code".
> 
> The code path reach bnx2x_chip_cleanup() on device removal from the system,
> as seen in the below call trace:
> 
> bnx2x_chip_cleanup+0x3c0/0x910 [bnx2x]
> bnx2x_nic_unload+0x268/0xaf0 [bnx2x]
> bnx2x_close+0x34/0x50 [bnx2x]
> __dev_close_many+0xd4/0x150
> dev_close_many+0xa8/0x160
> rollback_registered_many+0x174/0x3f0
> rollback_registered+0x40/0x70
> unregister_netdevice_queue+0x98/0x110
> unregister_netdev+0x34/0x50
> __bnx2x_remove+0xa8/0x3a0 [bnx2x]
> pci_device_remove+0x70/0x110

Makes sense.

> >> Also, we avoid the MCP information dump in case of non-recoverable
> >> PCI error (when adapter is about to be removed), since it will certainly fail.
> >
> > We should probably avoid several things here; Why specifically only this?
> 
> For example, we shouldn't execute bnx2x_timer() in this scenario. But I thought
> it'd be too much to check every call of a timer function against PCI channel state
> just to avoid it's execution on this scenario, so I just let it execute, since it seems
> harmless.
> 
> >> +	/* Reset the chip, unless PCI function is offline. If we reach this
> >> +	 * point following a PCI error handling, it means device is really
> >> +	 * in a bad state and we're about to remove it, so reset the chip
> >> +	 * is not a good idea.
> >> +	 */
> >> +	if (!pci_channel_offline(bp->pdev)) {
> >> +		rc = bnx2x_reset_hw(bp, reset_code);
> >> +		if (rc)
> >> +			BNX2X_ERR("HW_RESET failed\n");
> >> +	}
> >
> > Why not simply check this at the beginning of the function?
> 
> Because I wasn't sure if I could drop the entire execution of chip_cleanup(). I
> tried to keep the most of this function aiming to shutdown the module in a
> gentle way, like cleaning MAC, stopping queues...but again, I'm open to
> suggestions and gladly will change this in v2 if you think it's for the best.

Problem is I won't be able to have a more thorough review of this in the next
couple of days - and other than code-review I won't have a reasonable way
of testing this [I can use aer_inject, but I don't have your magical EEH
error injections, and I'm not at all certain it would suffice for a good testing ].

I agree that even as-is, what you're suggesting is an improvement to the
existing flow - so it's basically up to dave, i.e., whether to take a half fix
or wait for a more thorough one.
Guilherme G. Piccoli Aug. 10, 2016, 1:28 p.m. UTC | #4
On 08/10/2016 04:59 AM, Yuval Mintz wrote:
>>> Why would the published resume()  from pci_error_handlers be called in this
>> scenario?
>>
>> It isn't. That's why I specifically commented on commit message: "There are two
>> cases though that another path is taken on the code".
>>
>> The code path reach bnx2x_chip_cleanup() on device removal from the system,
>> as seen in the below call trace:
>>
>> bnx2x_chip_cleanup+0x3c0/0x910 [bnx2x]
>> bnx2x_nic_unload+0x268/0xaf0 [bnx2x]
>> bnx2x_close+0x34/0x50 [bnx2x]
>> __dev_close_many+0xd4/0x150
>> dev_close_many+0xa8/0x160
>> rollback_registered_many+0x174/0x3f0
>> rollback_registered+0x40/0x70
>> unregister_netdevice_queue+0x98/0x110
>> unregister_netdev+0x34/0x50
>> __bnx2x_remove+0xa8/0x3a0 [bnx2x]
>> pci_device_remove+0x70/0x110
>
> Makes sense.
>
>>>> Also, we avoid the MCP information dump in case of non-recoverable
>>>> PCI error (when adapter is about to be removed), since it will certainly fail.
>>>
>>> We should probably avoid several things here; Why specifically only this?
>>
>> For example, we shouldn't execute bnx2x_timer() in this scenario. But I thought
>> it'd be too much to check every call of a timer function against PCI channel state
>> just to avoid it's execution on this scenario, so I just let it execute, since it seems
>> harmless.
>>
>>>> +	/* Reset the chip, unless PCI function is offline. If we reach this
>>>> +	 * point following a PCI error handling, it means device is really
>>>> +	 * in a bad state and we're about to remove it, so reset the chip
>>>> +	 * is not a good idea.
>>>> +	 */
>>>> +	if (!pci_channel_offline(bp->pdev)) {
>>>> +		rc = bnx2x_reset_hw(bp, reset_code);
>>>> +		if (rc)
>>>> +			BNX2X_ERR("HW_RESET failed\n");
>>>> +	}
>>>
>>> Why not simply check this at the beginning of the function?
>>
>> Because I wasn't sure if I could drop the entire execution of chip_cleanup(). I
>> tried to keep the most of this function aiming to shutdown the module in a
>> gentle way, like cleaning MAC, stopping queues...but again, I'm open to
>> suggestions and gladly will change this in v2 if you think it's for the best.
>
> Problem is I won't be able to have a more thorough review of this in the next
> couple of days - and other than code-review I won't have a reasonable way
> of testing this [I can use aer_inject, but I don't have your magical EEH
> error injections, and I'm not at all certain it would suffice for a good testing ].
>
> I agree that even as-is, what you're suggesting is an improvement to the
> existing flow - so it's basically up to dave, i.e., whether to take a half fix
> or wait for a more thorough one.
>

Thanks for your consideration. The point is: the important part of this 
patch is avoiding the reset_hw() path, since it will clearly fail and 
generate soft lockups. This is the fix per se, the other part (regarding 
the MCP dump) is just an improvement; surely we have more potential 
improvements to explore, but they wouldn't be fixes, only code improvements.

So, I wouldn't call this a half fix, but yet, a completely fix with a 
small improvement as a bonus :)

Cheers,


Guilherme
Guilherme G. Piccoli Aug. 16, 2016, 5:51 p.m. UTC | #5
On 08/10/2016 10:28 AM, Guilherme G. Piccoli wrote:
> On 08/10/2016 04:59 AM, Yuval Mintz wrote:
>>>> Why would the published resume()  from pci_error_handlers be called
>>>> in this
>>> scenario?
>>>
>>> It isn't. That's why I specifically commented on commit message:
>>> "There are two
>>> cases though that another path is taken on the code".
>>>
>>> The code path reach bnx2x_chip_cleanup() on device removal from the
>>> system,
>>> as seen in the below call trace:
>>>
>>> bnx2x_chip_cleanup+0x3c0/0x910 [bnx2x]
>>> bnx2x_nic_unload+0x268/0xaf0 [bnx2x]
>>> bnx2x_close+0x34/0x50 [bnx2x]
>>> __dev_close_many+0xd4/0x150
>>> dev_close_many+0xa8/0x160
>>> rollback_registered_many+0x174/0x3f0
>>> rollback_registered+0x40/0x70
>>> unregister_netdevice_queue+0x98/0x110
>>> unregister_netdev+0x34/0x50
>>> __bnx2x_remove+0xa8/0x3a0 [bnx2x]
>>> pci_device_remove+0x70/0x110
>>
>> Makes sense.
>>
>>>>> Also, we avoid the MCP information dump in case of non-recoverable
>>>>> PCI error (when adapter is about to be removed), since it will
>>>>> certainly fail.
>>>>
>>>> We should probably avoid several things here; Why specifically only
>>>> this?
>>>
>>> For example, we shouldn't execute bnx2x_timer() in this scenario. But
>>> I thought
>>> it'd be too much to check every call of a timer function against PCI
>>> channel state
>>> just to avoid it's execution on this scenario, so I just let it
>>> execute, since it seems
>>> harmless.
>>>
>>>>> +    /* Reset the chip, unless PCI function is offline. If we reach
>>>>> this
>>>>> +     * point following a PCI error handling, it means device is
>>>>> really
>>>>> +     * in a bad state and we're about to remove it, so reset the chip
>>>>> +     * is not a good idea.
>>>>> +     */
>>>>> +    if (!pci_channel_offline(bp->pdev)) {
>>>>> +        rc = bnx2x_reset_hw(bp, reset_code);
>>>>> +        if (rc)
>>>>> +            BNX2X_ERR("HW_RESET failed\n");
>>>>> +    }
>>>>
>>>> Why not simply check this at the beginning of the function?
>>>
>>> Because I wasn't sure if I could drop the entire execution of
>>> chip_cleanup(). I
>>> tried to keep the most of this function aiming to shutdown the module
>>> in a
>>> gentle way, like cleaning MAC, stopping queues...but again, I'm open to
>>> suggestions and gladly will change this in v2 if you think it's for
>>> the best.
>>
>> Problem is I won't be able to have a more thorough review of this in
>> the next
>> couple of days - and other than code-review I won't have a reasonable way
>> of testing this [I can use aer_inject, but I don't have your magical EEH
>> error injections, and I'm not at all certain it would suffice for a
>> good testing ].
>>
>> I agree that even as-is, what you're suggesting is an improvement to the
>> existing flow - so it's basically up to dave, i.e., whether to take a
>> half fix
>> or wait for a more thorough one.
>>
>
> Thanks for your consideration. The point is: the important part of this
> patch is avoiding the reset_hw() path, since it will clearly fail and
> generate soft lockups. This is the fix per se, the other part (regarding
> the MCP dump) is just an improvement; surely we have more potential
> improvements to explore, but they wouldn't be fixes, only code
> improvements.
>
> So, I wouldn't call this a half fix, but yet, a completely fix with a
> small improvement as a bonus :)
>
> Cheers,
>
>
> Guilherme
>

David, sorry to bother you - maybe you didn't notice this.
Any comments?

Thanks in advance,


Guilherme
diff mbox

Patch

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 97e8925..e6329dc 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -772,6 +772,11 @@  void bnx2x_fw_dump_lvl(struct bnx2x *bp, const char *lvl)
 		(bp->common.bc_ver & 0xff00) >> 8,
 		(bp->common.bc_ver & 0xff));
 
+	if (unlikely(pci_channel_offline(bp->pdev))) {
+		BNX2X_ERR("Cannot dump MCP info while in PCI error\n");
+		return;
+	}
+
 	val = REG_RD(bp, MCP_REG_MCPR_CPU_PROGRAM_COUNTER);
 	if (val == REG_RD(bp, MCP_REG_MCPR_CPU_PROGRAM_COUNTER))
 		BNX2X_ERR("%s" "MCP PC at 0x%x\n", lvl, val);
@@ -9415,10 +9420,16 @@  unload_error:
 	/* Release IRQs */
 	bnx2x_free_irq(bp);
 
-	/* Reset the chip */
-	rc = bnx2x_reset_hw(bp, reset_code);
-	if (rc)
-		BNX2X_ERR("HW_RESET failed\n");
+	/* Reset the chip, unless PCI function is offline. If we reach this
+	 * point following a PCI error handling, it means device is really
+	 * in a bad state and we're about to remove it, so reset the chip
+	 * is not a good idea.
+	 */
+	if (!pci_channel_offline(bp->pdev)) {
+		rc = bnx2x_reset_hw(bp, reset_code);
+		if (rc)
+			BNX2X_ERR("HW_RESET failed\n");
+	}
 
 	/* Report UNLOAD_DONE to MCP */
 	bnx2x_send_unload_done(bp, keep_link);