diff mbox

[Bug,#14925] sky2 panic under load

Message ID 20100111084504.7adb5a1e@nehalam
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

stephen hemminger Jan. 11, 2010, 4:45 p.m. UTC
On Mon, 11 Jan 2010 23:03:41 +0900
Mike McCormack <mikem@ring3k.org> wrote:

> Perhaps only sky2_tx_done should wake the queue?
> Does the patch below fix the problem too?
> 
> thanks,
> 
> Mike

The idea is good, but what if transmit queue was full (stopped),
and TX FIFO gets stuck.  Then Transmit timeout happens and
the reset logic clears the queue.  What will start the queue?

Something like this:
-----------------------------------------------------------

Comments

Jarek Poplawski Jan. 11, 2010, 10:07 p.m. UTC | #1
On Mon, Jan 11, 2010 at 08:45:04AM -0800, Stephen Hemminger wrote:
> On Mon, 11 Jan 2010 23:03:41 +0900
> Mike McCormack <mikem@ring3k.org> wrote:
> 
> > Perhaps only sky2_tx_done should wake the queue?
> > Does the patch below fix the problem too?
> > 
> > thanks,
> > 
> > Mike
> 
> The idea is good,

I don't agree: you both try to make this check more specific.
Actually, since this problem is quite tricky, I wondered about
making it more genaral and visible for other maintainers by
adding something like netif_wake_queue_present() with the
check and some comment.

Jarek P.

> but what if transmit queue was full (stopped),
> and TX FIFO gets stuck.  Then Transmit timeout happens and
> the reset logic clears the queue.  What will start the queue?
> 
> Something like this:
> -----------------------------------------------------------
> 
> 
> 
> --- a/drivers/net/sky2.c	2010-01-11 08:36:42.617426300 -0800
> +++ b/drivers/net/sky2.c	2010-01-11 08:42:51.295237661 -0800
> @@ -1843,9 +1843,6 @@ static void sky2_tx_complete(struct sky2
>  
>  	sky2->tx_cons = idx;
>  	smp_mb();
> -
> -	if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
> -		netif_wake_queue(dev);
>  }
>  
>  static void sky2_tx_reset(struct sky2_hw *hw, unsigned port)
> @@ -2416,8 +2413,12 @@ static inline void sky2_tx_done(struct n
>  {
>  	struct sky2_port *sky2 = netdev_priv(dev);
>  
> -	if (netif_running(dev))
> +	if (netif_running(dev) && netif_device_present(dev)) {
>  		sky2_tx_complete(sky2, last);
> +
> +		if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
> +			netif_wake_queue(dev);
> +	}
>  }
>  
>  static inline void sky2_skb_rx(const struct sky2_port *sky2,
> @@ -3197,6 +3198,7 @@ static int sky2_reattach(struct net_devi
>  		} else {
>  			netif_device_attach(dev);
>  			sky2_set_multicast(dev);
> +			netif_start_queue(dev);
>  		}
>  	}
>  
> 
> -- 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski Jan. 11, 2010, 10:31 p.m. UTC | #2
On Mon, Jan 11, 2010 at 08:45:04AM -0800, Stephen Hemminger wrote:
> On Mon, 11 Jan 2010 23:03:41 +0900
> Mike McCormack <mikem@ring3k.org> wrote:
> 
> > Perhaps only sky2_tx_done should wake the queue?
> > Does the patch below fix the problem too?
> > 
> > thanks,
> > 
> > Mike
> 
> The idea is good, but what if transmit queue was full (stopped),
> and TX FIFO gets stuck.  Then Transmit timeout happens and
> the reset logic clears the queue.  What will start the queue?
> 
> Something like this:
> -----------------------------------------------------------
> 
> 
> 
> --- a/drivers/net/sky2.c	2010-01-11 08:36:42.617426300 -0800
> +++ b/drivers/net/sky2.c	2010-01-11 08:42:51.295237661 -0800
> @@ -1843,9 +1843,6 @@ static void sky2_tx_complete(struct sky2
>  
>  	sky2->tx_cons = idx;
>  	smp_mb();
> -
> -	if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
> -		netif_wake_queue(dev);
>  }
>  
>  static void sky2_tx_reset(struct sky2_hw *hw, unsigned port)
> @@ -2416,8 +2413,12 @@ static inline void sky2_tx_done(struct n
>  {
>  	struct sky2_port *sky2 = netdev_priv(dev);
>  
> -	if (netif_running(dev))
> +	if (netif_running(dev) && netif_device_present(dev)) {
>  		sky2_tx_complete(sky2, last);
> +
> +		if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
> +			netif_wake_queue(dev);
> +	}
>  }
>  
>  static inline void sky2_skb_rx(const struct sky2_port *sky2,
> @@ -3197,6 +3198,7 @@ static int sky2_reattach(struct net_devi
>  		} else {
>  			netif_device_attach(dev);
>  			sky2_set_multicast(dev);
> +			netif_start_queue(dev);

Why netif_device_attach() is not enough?

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Jan. 12, 2010, 12:14 a.m. UTC | #3
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Mon, 11 Jan 2010 23:07:54 +0100

> I don't agree: you both try to make this check more specific.
> Actually, since this problem is quite tricky, I wondered about
> making it more genaral and visible for other maintainers by
> adding something like netif_wake_queue_present() with the
> check and some comment.

This detracts from the real problem.

The issue is that we have a code path bringing a device down, which
uses helper routines which are meant to be executed when the device is
up and functioning normally.

That's the bug.

No other driver does silly things like call helper routines which wake
the TX queue when taking the chip down.

Fix that, not the immediate symptoms.  Write a routine that
unconditionally clears the TX queue, frees the packets, etc. and
has none of the wake logic.

That's what most other driver do, and those that don't should be
fixed similarly.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski Jan. 12, 2010, 7:50 a.m. UTC | #4
On Mon, Jan 11, 2010 at 04:14:19PM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Mon, 11 Jan 2010 23:07:54 +0100
> 
> > I don't agree: you both try to make this check more specific.
> > Actually, since this problem is quite tricky, I wondered about
> > making it more genaral and visible for other maintainers by
> > adding something like netif_wake_queue_present() with the
> > check and some comment.
> 
> This detracts from the real problem.
> 
> The issue is that we have a code path bringing a device down, which
> uses helper routines which are meant to be executed when the device is
> up and functioning normally.
> 
> That's the bug.
> 
> No other driver does silly things like call helper routines which wake
> the TX queue when taking the chip down.
> 
> Fix that, not the immediate symptoms.  Write a routine that
> unconditionally clears the TX queue, frees the packets, etc. and
> has none of the wake logic.
> 
> That's what most other driver do, and those that don't should be
> fixed similarly.

As I wrote it's tricky and could be hard to track even if you "heard"
about it, e.g. Mike moves netif_wake_queue() to another place in his
last proposal, but it still can run after netif_device_detach() until 
napi_synchronize() in sky2_down().

I think, I can see similar problems e.g. in gianfar or netxen, where
napi_disable() is done after netif_device_detach(), especially in
suspend procedures (there might be less severe (than oops) effects
yet). IMHO, it all looks simply error prone (sometime you have to
know a driver well to track all possible paths to say it's really
safe).

In the meantime users are endangered with known oopses or at least
suspend problems, while we know the reasons, can fix it (even
temporarily) with one-liners, but wait for the right fixes. (Btw,
such fixes might require a significant rewrite, risking inserting
new bugs, so I doubt they are "right" for -stable.) Anyway, I'm
OK with this (I hope what should be done will be done, and I
don't have sky2 ;-)

Cheers,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Jan. 12, 2010, 8:08 a.m. UTC | #5
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Tue, 12 Jan 2010 07:50:59 +0000

> I think, I can see similar problems e.g. in gianfar or netxen, where
> napi_disable() is done after netif_device_detach(), especially in
> suspend procedures (there might be less severe (than oops) effects
> yet). IMHO, it all looks simply error prone (sometime you have to
> know a driver well to track all possible paths to say it's really
> safe).

Then that's an even larger bug.

Until you do napi_disable(), the device can be touched.

Asynchronous paths outside of the driver's control, even
with interrupts disabled, can call back into the driver
and touch the chip.

F.e. netpoll via netconsole output on another cpu

So it therefore must be done before doing the actual work of bringing
the device down or suspending it.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski Jan. 12, 2010, 8:56 a.m. UTC | #6
On Tue, Jan 12, 2010 at 12:08:04AM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Tue, 12 Jan 2010 07:50:59 +0000
> 
> > I think, I can see similar problems e.g. in gianfar or netxen, where
> > napi_disable() is done after netif_device_detach(), especially in
> > suspend procedures (there might be less severe (than oops) effects
> > yet). IMHO, it all looks simply error prone (sometime you have to
> > know a driver well to track all possible paths to say it's really
> > safe).
> 
> Then that's an even larger bug.
> 
> Until you do napi_disable(), the device can be touched.
> 
> Asynchronous paths outside of the driver's control, even
> with interrupts disabled, can call back into the driver
> and touch the chip.
> 
> F.e. netpoll via netconsole output on another cpu
> 
> So it therefore must be done before doing the actual work of bringing
> the device down or suspending it.

Maybe I miss something, but once more: this patch mentioned by Berck
Nash has been tested by at least two users, Berck himself, and
probably even more intensively by Michael Breuer, during af_packet
debugging. Both guys acknowledged it helped, so it can't be that bad.

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Jan. 12, 2010, 9:42 a.m. UTC | #7
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Tue, 12 Jan 2010 08:56:34 +0000

> Maybe I miss something, but once more: this patch mentioned by Berck
> Nash has been tested by at least two users, Berck himself, and
> probably even more intensively by Michael Breuer, during af_packet
> debugging. Both guys acknowledged it helped, so it can't be that bad.

I agree, as a short term fix, this patch is fine.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski Jan. 12, 2010, 10:31 a.m. UTC | #8
On Tue, Jan 12, 2010 at 01:42:18AM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Tue, 12 Jan 2010 08:56:34 +0000
> 
> > Maybe I miss something, but once more: this patch mentioned by Berck
> > Nash has been tested by at least two users, Berck himself, and
> > probably even more intensively by Michael Breuer, during af_packet
> > debugging. Both guys acknowledged it helped, so it can't be that bad.
> 
> I agree, as a short term fix, this patch is fine.

And I agree real problems might be deeper.

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Jan. 12, 2010, 10:56 a.m. UTC | #9
From: David Miller <davem@davemloft.net>
Date: Tue, 12 Jan 2010 01:42:18 -0800 (PST)

> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Tue, 12 Jan 2010 08:56:34 +0000
> 
>> Maybe I miss something, but once more: this patch mentioned by Berck
>> Nash has been tested by at least two users, Berck himself, and
>> probably even more intensively by Michael Breuer, during af_packet
>> debugging. Both guys acknowledged it helped, so it can't be that bad.
> 
> I agree, as a short term fix, this patch is fine.

In case it isn't painfully obvious, I'm applying Jarek's "v2"
fix and will also queue that up for -stable.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski Jan. 12, 2010, 11:04 a.m. UTC | #10
On Tue, Jan 12, 2010 at 02:56:20AM -0800, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Tue, 12 Jan 2010 01:42:18 -0800 (PST)
> 
> > From: Jarek Poplawski <jarkao2@gmail.com>
> > Date: Tue, 12 Jan 2010 08:56:34 +0000
> > 
> >> Maybe I miss something, but once more: this patch mentioned by Berck
> >> Nash has been tested by at least two users, Berck himself, and
> >> probably even more intensively by Michael Breuer, during af_packet
> >> debugging. Both guys acknowledged it helped, so it can't be that bad.
> > 
> > I agree, as a short term fix, this patch is fine.
> 
> In case it isn't painfully obvious, I'm applying Jarek's "v2"
> fix and will also queue that up for -stable.
> 

Thanks!
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stephen Hemminger Jan. 12, 2010, 3:39 p.m. UTC | #11
Please hold off applying anything. I am testing a cleaner solution

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- a/drivers/net/sky2.c	2010-01-11 08:36:42.617426300 -0800
+++ b/drivers/net/sky2.c	2010-01-11 08:42:51.295237661 -0800
@@ -1843,9 +1843,6 @@  static void sky2_tx_complete(struct sky2
 
 	sky2->tx_cons = idx;
 	smp_mb();
-
-	if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
-		netif_wake_queue(dev);
 }
 
 static void sky2_tx_reset(struct sky2_hw *hw, unsigned port)
@@ -2416,8 +2413,12 @@  static inline void sky2_tx_done(struct n
 {
 	struct sky2_port *sky2 = netdev_priv(dev);
 
-	if (netif_running(dev))
+	if (netif_running(dev) && netif_device_present(dev)) {
 		sky2_tx_complete(sky2, last);
+
+		if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
+			netif_wake_queue(dev);
+	}
 }
 
 static inline void sky2_skb_rx(const struct sky2_port *sky2,
@@ -3197,6 +3198,7 @@  static int sky2_reattach(struct net_devi
 		} else {
 			netif_device_attach(dev);
 			sky2_set_multicast(dev);
+			netif_start_queue(dev);
 		}
 	}