Patchwork sky2: Fix oops in sky2_xmit_frame() after TX timeout

login
register
mail settings
Submitter Jarek Poplawski
Date Jan. 4, 2010, 1:49 p.m.
Message ID <20100104134904.GA18583@ff.dom.local>
Download mbox | patch
Permalink /patch/42065/
State Changes Requested
Delegated to: David Miller
Headers show

Comments

Jarek Poplawski - Jan. 4, 2010, 1:49 p.m.
On Sun, Jan 03, 2010 at 07:44:28PM -0700, Berck E. Nash wrote:
> Jarek Poplawski wrote:
> > Yes, it seems this lock might be needed around this place, but I'd
> > like to check another idea: if it's not about awakening too early.
> > Berck, could you try this patch?
> 
> Okay, after running with that for several days I have not gotten the
> oops.  That doesn't mean that I won't tomorrow, but I've gotten several
> of these:
> 
> [45621.704025] sky2 eth0: hung mac 124:21 fifo 195 (127:122)
> [45621.704027] sky2 eth0: receiver hang detected
> [45621.708524] sky2 eth0: disabling interface
> [45621.715229] sky2 eth0: enabling interface
> [45624.862111] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow
> control both
> [61024.704036] sky2 eth0: hung mac 124:75 fifo 195 (133:128)
> [61024.704039] sky2 eth0: receiver hang detected
> [61024.708487] sky2 eth0: disabling interface
> [61024.714791] sky2 eth0: enabling interface
> [61027.864288] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow
> control both
> 
> And it hasn't crashed.  The "receiver hang detected" would often (but
> not always) be followed by the oops before.
> 
> Berck

OK, here it is with some cosmetics; let David decide if it needs more
testing.

Thanks everybody,
Jarek P.
------------->

During TX timeout procedure dev could be awaken too early, e.g. by
sky2_complete_tx() called from sky2_down(). Then sky2_xmit_frame()
can run while buffers are freed causing an oops. This patch fixes it
by adding netif_device_present() test in sky2_tx_complete().

Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=14925

With debugging by: Mike McCormack <mikem@ring3k.org>

Reported-by: Berck E. Nash <flyboy@gmail.com>
Tested-by: Berck E. Nash <flyboy@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>

---

 drivers/net/sky2.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
stephen hemminger - Jan. 4, 2010, 6:26 p.m.
On Mon, 4 Jan 2010 13:49:04 +0000
Jarek Poplawski <jarkao2@gmail.com> wrote:

> On Sun, Jan 03, 2010 at 07:44:28PM -0700, Berck E. Nash wrote:
> > Jarek Poplawski wrote:
> > > Yes, it seems this lock might be needed around this place, but I'd
> > > like to check another idea: if it's not about awakening too early.
> > > Berck, could you try this patch?
> > 
> > Okay, after running with that for several days I have not gotten the
> > oops.  That doesn't mean that I won't tomorrow, but I've gotten several
> > of these:
> > 
> > [45621.704025] sky2 eth0: hung mac 124:21 fifo 195 (127:122)
> > [45621.704027] sky2 eth0: receiver hang detected
> > [45621.708524] sky2 eth0: disabling interface
> > [45621.715229] sky2 eth0: enabling interface
> > [45624.862111] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow
> > control both
> > [61024.704036] sky2 eth0: hung mac 124:75 fifo 195 (133:128)
> > [61024.704039] sky2 eth0: receiver hang detected
> > [61024.708487] sky2 eth0: disabling interface
> > [61024.714791] sky2 eth0: enabling interface
> > [61027.864288] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow
> > control both
> > 
> > And it hasn't crashed.  The "receiver hang detected" would often (but
> > not always) be followed by the oops before.
> > 
> > Berck
> 
> OK, here it is with some cosmetics; let David decide if it needs more
> testing.
> 
> Thanks everybody,
> Jarek P.
> ------------->
> 
> During TX timeout procedure dev could be awaken too early, e.g. by
> sky2_complete_tx() called from sky2_down(). Then sky2_xmit_frame()
> can run while buffers are freed causing an oops. This patch fixes it
> by adding netif_device_present() test in sky2_tx_complete().
> 
> Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=14925
> 
> With debugging by: Mike McCormack <mikem@ring3k.org>
> 
> Reported-by: Berck E. Nash <flyboy@gmail.com>
> Tested-by: Berck E. Nash <flyboy@gmail.com>
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
> 
> ---
> 
>  drivers/net/sky2.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
> index 1c01b96..2f32fab 100644
> --- a/drivers/net/sky2.c
> +++ b/drivers/net/sky2.c
> @@ -1844,7 +1844,9 @@ static void sky2_tx_complete(struct sky2_port *sky2, u16 done)
>  	sky2->tx_cons = idx;
>  	smp_mb();
>  
> -	if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
> +	/* Wake unless it's detached, and called e.g. from sky2_down() */
> +	if (tx_avail(sky2) > MAX_SKB_TX_LE + 4 &&
> +	    likely(netif_device_present(dev)))
>  		netif_wake_queue(dev);
>  }
>

The likely() seems unnecessary noise here. It can't possibly matter that
much.

Patch

diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index 1c01b96..2f32fab 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -1844,7 +1844,9 @@  static void sky2_tx_complete(struct sky2_port *sky2, u16 done)
 	sky2->tx_cons = idx;
 	smp_mb();
 
-	if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
+	/* Wake unless it's detached, and called e.g. from sky2_down() */
+	if (tx_avail(sky2) > MAX_SKB_TX_LE + 4 &&
+	    likely(netif_device_present(dev)))
 		netif_wake_queue(dev);
 }