Panic at tcp_xmit_retransmit_queue

Message ID alpine.DEB.2.00.1002151520250.7063@wel-95.cs.helsinki.fi
State RFC, archived
Delegated to: David Miller

Commit Message

Ilpo Järvinen Feb. 15, 2010, 1:21 p.m. UTC
On Wed, 3 Feb 2010, Ilpo Järvinen wrote:

> On Mon, 1 Feb 2010, sbs wrote:
> 
> > Actually, removing netconsole from the kernel didn't help.
> > I found many people with the same problem but with different hardware
> > configurations here:
> > 
> > freez in TCP stack :
> > http://bugzilla.kernel.org/show_bug.cgi?id=14470
> > 
> > Is there someone who can investigate it?
> > 
> > 
> > On Tue, Jan 19, 2010 at 7:13 PM, sbs <gexlie@gmail.com> wrote:
> > > We are hitting kernel panics on servers with nVidia MCP55 NICs once a day;
> > > it usually appears under high network traffic (around 10000Mbit/s), but
> > > that is not a rule: it has happened even under low traffic.
> > >
> > > The servers are used for nginx serving static content.
> > > On 2 identical servers this panic happens approx. 2 times a day depending
> > > on network load. The machine freezes completely until the netconsole
> > > reboots.
> > >
> > > Kernel: 2.6.32.3
> > >
> > > What can it be? What's wrong with the tcp_xmit_retransmit_queue() function?
> > > Can anyone explain or fix it?
> 
> You might want to try the debug patch below. It might even let the
> box survive the event (if I got it coded right).

Here is what should be a better version of the debug patch; hopefully the
infinite looping is now gone.

Comments

Bruno Prémont Feb. 18, 2010, 10:28 a.m. UTC | #1
On Mon, 15 Feb 2010 15:21:58 "Ilpo Järvinen" wrote:
> Here is what should be a better version of the debug patch; hopefully the
> infinite looping is now gone.

I can reproduce the freeze pretty easily, even on an idle server;
all I need is netconsole enabled, an SSH connection to the server, and
permission to write to /proc/sysrq-trigger.

With netconsole enabled, the following command, executed via SSH,
freezes the system:
  echo t > /proc/sysrq-trigger
Doing the same from the local console has no negative effect (idle
system).
Unfortunately I can't get any useful information out of the system, as
nothing reaches the VGA console and interaction with the system is no
longer possible (the cursor is still blinking on the VGA console).

Unfortunately I currently have no setup here to analyze the dead system
via a kexec crash kernel that would be run by the watchdog.

The system I'm using is an HP ProLiant DL360 G5 (4 logical CPUs, two
sockets) with a bnx2 NIC.
Eventually I will try to reproduce this on some other system as well
(to rule out the NIC driver).

Any hints on how to get pertinent data out of that system would be
really nice!

Regards,
Bruno
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
sbs March 2, 2010, 1:16 p.m. UTC | #2
Thank you very much. The server has been running stably for a week and
it seems to work like a charm now; I haven't detected any panics since
I applied the patch. However, it seems that the problem has stopped
occurring, because I don't see any debug information through netconsole.

On Mon, Feb 15, 2010 at 4:21 PM, Ilpo Järvinen
<ilpo.jarvinen@helsinki.fi> wrote:
> Here is what should be a better version of the debug patch; hopefully the
> infinite looping is now gone.

Patch

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 383ce23..4672a30 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2186,6 +2186,42 @@  static int tcp_can_forward_retransmit(struct sock *sk)
 	return 1;
 }
 
+static void print_queue(struct sock *sk, struct sk_buff *old, struct sk_buff *hole)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	struct sk_buff *skb, *prev;
+
+	skb = tcp_write_queue_head(sk);
+	prev = (struct sk_buff *)(&sk->sk_write_queue);
+
+	if (skb == NULL) {
+		printk("NULL head, pkts %u\n", tp->packets_out);
+		return;
+	}
+	printk("head %p tail %p sendhead %p oldhint %p now %p hole %p high %u\n",
+	       tcp_write_queue_head(sk), tcp_write_queue_tail(sk),
+	       tcp_send_head(sk), old, tp->retransmit_skb_hint, hole,
+	       tp->retransmit_high);
+
+	while (skb) {
+		printk("skb %p (%u-%u) next %p prev %p sacked %u\n",
+		       skb, TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb)->end_seq,
+		       skb->next, skb->prev, TCP_SKB_CB(skb)->sacked);
+		if (prev != skb->prev)
+			printk("Inconsistent prev\n");
+
+		if (skb == tcp_write_queue_tail(sk)) {
+			if (skb->next != (struct sk_buff *)(&sk->sk_write_queue))
+				printk("Improper next at tail\n");
+			return;
+		}
+
+		prev = skb;
+		skb = skb->next;
+	}
+	printk("Encountered unexpected NULL\n");
+}
+
 /* This gets called after a retransmit timeout, and the initially
  * retransmitted data is acknowledged.  It tries to continue
  * resending the rest of the retransmit queue, until either
@@ -2194,12 +2230,15 @@  static int tcp_can_forward_retransmit(struct sock *sk)
  * based retransmit packet might feed us FACK information again.
  * If so, we use it to avoid unnecessarily retransmissions.
  */
+static int caught_it = 0;
+
 void tcp_xmit_retransmit_queue(struct sock *sk)
 {
 	const struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *skb;
 	struct sk_buff *hole = NULL;
+	struct sk_buff *old = tp->retransmit_skb_hint;
 	u32 last_lost;
 	int mib_idx;
 	int fwd_rexmitting = 0;
@@ -2217,6 +2256,16 @@  void tcp_xmit_retransmit_queue(struct sock *sk)
 		last_lost = tp->snd_una;
 	}
 
+checknull:
+	if (skb == NULL) {
+		if (!caught_it)
+			print_queue(sk, old, hole);
+		caught_it++;
+		if (net_ratelimit())
+			printk("Errors caught so far %u\n", caught_it);
+		return;
+	}
+
 	tcp_for_write_queue_from(skb, sk) {
 		__u8 sacked = TCP_SKB_CB(skb)->sacked;
 
@@ -2257,7 +2306,7 @@  begin_fwd:
 		} else if (!(sacked & TCPCB_LOST)) {
 			if (hole == NULL && !(sacked & (TCPCB_SACKED_RETRANS|TCPCB_SACKED_ACKED)))
 				hole = skb;
-			continue;
+			goto cont;
 
 		} else {
 			last_lost = TCP_SKB_CB(skb)->end_seq;
@@ -2268,7 +2317,7 @@  begin_fwd:
 		}
 
 		if (sacked & (TCPCB_SACKED_ACKED|TCPCB_SACKED_RETRANS))
-			continue;
+			goto cont;
 
 		if (tcp_retransmit_skb(sk, skb))
 			return;
@@ -2278,6 +2327,9 @@  begin_fwd:
 			inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
 						  inet_csk(sk)->icsk_rto,
 						  TCP_RTO_MAX);
+cont:
+		skb = skb->next;
+		goto checknull;
 	}
 }
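
For anyone who wants to experiment with the consistency checks from
print_queue() outside the kernel, here is a minimal user-space sketch. It
is not kernel code: struct node, struct queue, queue_init(),
queue_add_tail() and queue_check() are hypothetical names made up for this
illustration, standing in for struct sk_buff and the sentinel-headed
sk_write_queue. Unlike the debug patch, which prints every finding and
keeps walking, this sketch simply returns the first problem it sees.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: only the list links. */
struct node {
	struct node *next;
	struct node *prev;
};

/* Circular list with a sentinel head, in the spirit of sk_write_queue.
 * An empty queue has head.next == head.prev == &head.
 */
struct queue {
	struct node head;
};

enum walk_result {
	WALK_OK,		/* reached the tail, all links consistent */
	WALK_BAD_PREV,		/* like "Inconsistent prev" */
	WALK_BAD_TAIL_NEXT,	/* like "Improper next at tail" */
	WALK_NULL_LINK		/* like "Encountered unexpected NULL" */
};

static void queue_init(struct queue *q)
{
	q->head.next = &q->head;
	q->head.prev = &q->head;
}

static void queue_add_tail(struct queue *q, struct node *n)
{
	n->next = &q->head;
	n->prev = q->head.prev;
	q->head.prev->next = n;
	q->head.prev = n;
}

/* Walk from the first element to the tail and report the first
 * inconsistency. A NULL link is never valid in a sentinel-headed list,
 * so running into one means the queue has been corrupted.
 */
static enum walk_result queue_check(const struct queue *q)
{
	const struct node *sentinel = &q->head;
	const struct node *tail = q->head.prev;
	const struct node *prev = sentinel;
	const struct node *n = q->head.next;

	if (n == sentinel)
		return WALK_OK;	/* empty queue */

	while (n != NULL) {
		if (n->prev != prev)
			return WALK_BAD_PREV;

		if (n == tail) {
			if (n->next != sentinel)
				return WALK_BAD_TAIL_NEXT;
			return WALK_OK;
		}

		prev = n;
		n = n->next;
	}
	return WALK_NULL_LINK;
}
```

To try it, build a queue with queue_init() and a few queue_add_tail()
calls, deliberately break one link (for example set a middle node's next
to NULL), and look at what queue_check() returns.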