diff mbox

sky2: detect and prevent kernel panic from a possible faulty hardware

Message ID 1374957354-10142-1-git-send-email-sam8641@gmail.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Samuel Williams July 27, 2013, 8:35 p.m. UTC
A possible faulty hardware might interrupt with a status of 0xffffffff which
may kernel panic if sky2 driver tries to handle it. Detecting this problem
may avoid kernel panic.

Signed-off-by: Samuel Williams <sam8641@gmail.com>
---
 drivers/net/ethernet/marvell/sky2.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

David Miller July 30, 2013, 11:19 p.m. UTC | #1
From: Samuel Williams <sam8641@gmail.com>
Date: Sat, 27 Jul 2013 15:35:54 -0500

> A possible faulty hardware might interrupt with a status of 0xffffffff which
> may kernel panic if sky2 driver tries to handle it. Detecting this problem
> may avoid kernel panic.
> 
> Signed-off-by: Samuel Williams <sam8641@gmail.com>

Sure, but then the device is basically just going to stop.

I think this change needs to be more comprehensive, for example it
needs to fully reset the device and try to get it operational again.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stephen Hemminger July 30, 2013, 11:32 p.m. UTC | #2
On Tue, 30 Jul 2013 16:19:06 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Samuel Williams <sam8641@gmail.com>
> Date: Sat, 27 Jul 2013 15:35:54 -0500
> 
> > A possible faulty hardware might interrupt with a status of 0xffffffff which
> > may kernel panic if sky2 driver tries to handle it. Detecting this problem
> > may avoid kernel panic.
> > 
> > Signed-off-by: Samuel Williams <sam8641@gmail.com>
> 
> Sure, but then the device is basically just going to stop.
> 
> I think this change needs to be more comprehensive, for example it
> needs to fully reset the device and try to get it operational again.

Not only that it is going to assert IRQ until the device is hard
reset. Plus it is not clear that cleaning up after sick hardware
is possible at all.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oliver Neukum July 31, 2013, 8:18 a.m. UTC | #3
On Tue, 2013-07-30 at 16:19 -0700, David Miller wrote:
> From: Samuel Williams <sam8641@gmail.com>
> Date: Sat, 27 Jul 2013 15:35:54 -0500
> 
> > A possible faulty hardware might interrupt with a status of 0xffffffff which
> > may kernel panic if sky2 driver tries to handle it. Detecting this problem
> > may avoid kernel panic.
> > 
> > Signed-off-by: Samuel Williams <sam8641@gmail.com>
> 
> Sure, but then the device is basically just going to stop.
> 
> I think this change needs to be more comprehensive, for example it
> needs to fully reset the device and try to get it operational again.

But sky2 is used in some hotremovable cards. You can hit the window
for this race. It is unlikely but still this simple check fixes a bug.

	Regards
		Oliver


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c
index e09a8c6..43fe2f3 100644
--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -3049,8 +3049,17 @@  static int sky2_poll(struct napi_struct *napi, int work_limit)
 	int work_done = 0;
 	u16 idx;
 
-	if (unlikely(status & Y2_IS_ERROR))
+	if (unlikely(status & Y2_IS_ERROR)) {
+		if (status == 0xFFFFFFFF) {
+			dev_err(&hw->pdev->dev,
+				"fatal hardware inturrupt error\n");
+			napi_complete(napi);
+			napi_disable(&hw->napi);
+			return 0;
+		}
 		sky2_err_intr(hw, status);
+	}
+
 
 	if (status & Y2_IS_IRQ_PHY1)
 		sky2_phy_intr(hw, 0);