Patchwork oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15

Submitter Tejun Heo
Date July 19, 2010, 2:57 p.m.
Message ID <4C4467E0.9080907@kernel.org>
Permalink /patch/59205/
State RFC
Delegated to: David Miller

Comments

Tejun Heo - July 19, 2010, 2:57 p.m.
Hello,

On 07/16/2010 02:02 PM, Ilpo Järvinen wrote:
> Besides, Tejun has also found that it's hint->next ptr which is NULL in 
> his case so this won't solve his case anyway. Tejun, can you confirm 
> whether it was retransmit_skb_hint->next being NULL on _entry time_ to 
> tcp_xmit_retransmit_queue() or later on in the loop after the updates done 
> by the loop itself to the hint (or that your testing didn't conclude 
> either)?

Sorry about the delay.  I was traveling last week.  Unfortunately, I
don't know whether ->next was NULL on entry or not.  I hacked up the
following ugly patch for the next test run.  It should have everything
which has come up till now + list and hint sanity checking before
starting processing them.  I'm planning on deploying it w/ crashdump
enabled in several days.  If I've missed something, please let me
know.

Thanks.
Ilpo Järvinen - July 20, 2010, 8:41 a.m.
On Mon, 19 Jul 2010, Tejun Heo wrote:

> Hello,
> 
> On 07/16/2010 02:02 PM, Ilpo Järvinen wrote:
> > Besides, Tejun has also found that it's hint->next ptr which is NULL in 
> > his case so this won't solve his case anyway. Tejun, can you confirm 
> > whether it was retransmit_skb_hint->next being NULL on _entry time_ to 
> > tcp_xmit_retransmit_queue() or later on in the loop after the updates done 
> > by the loop itself to the hint (or that your testing didn't conclude 
> > either)?
> 
> Sorry about the delay.  I was traveling last week.  Unfortunately, I
> don't know whether ->next was NULL on entry or not.  I hacked up the
> following ugly patch for the next test run.  It should have everything
> which has come up till now + list and hint sanity checking before
> starting processing them.  I'm planning on deploying it w/ crashdump
> enabled in several days.  If I've missed something, please let me
> know.

One thing that complicates things further is the fact that the write queue 
can change beneath us (i.e., in tcp_retrans_try_collapse and tcp_fragment).
I've read through them multiple times and always found them innocent, so 
this might be just a for-the-record type of note (at least I hope so).
Ilpo Järvinen - Sept. 8, 2010, 9:32 a.m.
On Mon, 19 Jul 2010, Tejun Heo wrote:

> Hello,
> 
> On 07/16/2010 02:02 PM, Ilpo Järvinen wrote:
> > Besides, Tejun has also found that it's hint->next ptr which is NULL in 
> > his case so this won't solve his case anyway. Tejun, can you confirm 
> > whether it was retransmit_skb_hint->next being NULL on _entry time_ to 
> > tcp_xmit_retransmit_queue() or later on in the loop after the updates done 
> > by the loop itself to the hint (or that your testing didn't conclude 
> > either)?
> 
> Sorry about the delay.  I was traveling last week.  Unfortunately, I
> don't know whether ->next was NULL on entry or not.  I hacked up the
> following ugly patch for the next test run.  It should have everything
> which has come up till now + list and hint sanity checking before
> starting processing them.  I'm planning on deploying it w/ crashdump
> enabled in several days.  If I've missed something, please let me
> know.

Any news on this one?
Tejun Heo - Sept. 8, 2010, 10:25 a.m.
Hello,

On 09/08/2010 11:32 AM, Ilpo Järvinen wrote:
>> Sorry about the delay.  I was traveling last week.  Unfortunately, I
>> don't know whether ->next was NULL on entry or not.  I hacked up the
>> following ugly patch for the next test run.  It should have everything
>> which has come up till now + list and hint sanity checking before
>> starting processing them.  I'm planning on deploying it w/ crashdump
>> enabled in several days.  If I've missed something, please let me
>> know.
> 
> Any news on this one?

Unfortunately, we haven't been able to reproduce the problem anymore.
It could be (though not likely, given that none of the debugging
messages is triggering) that I was mistaken and the previously posted
patch fixed the issue.  The network used by the cluster went through
some changes at the time and there have been issues with packet
losses.  Given that the problem needs packet losses to trigger, it's
likely that the patterns of packet losses which trigger the problem
aren't happening anymore.  (Carsten, Henning, please feel free to
fill in if I'm missing something).

Thanks.
Ilpo Järvinen - Sept. 8, 2010, 10:34 a.m.
On Wed, 8 Sep 2010, Tejun Heo wrote:

> On 09/08/2010 11:32 AM, Ilpo Järvinen wrote:
> >> Sorry about the delay.  I was traveling last week.  Unfortunately, I
> >> don't know whether ->next was NULL on entry or not.  I hacked up the
> >> following ugly patch for the next test run.  It should have everything
> >> which has come up till now + list and hint sanity checking before
> >> starting processing them.  I'm planning on deploying it w/ crashdump
> >> enabled in several days.  If I've missed something, please let me
> >> know.
> > 
> > Any news on this one?
> 
> Unfortunately, we haven't been able to reproduce the problem anymore.

With my debug patch or not at all?

> It could be (but not likely given that none of the debugging messages
> is triggering) that I was mistaken and the previously posted fixed the
> issue. The network used by the cluster went through some changes at
> the time and there have been issues with packet losses.  Given that
> the problem needs packet losses to trigger, it's likely that packet
> loss pattern here changed such that the patterns of packet losses
> which trigger the problem aren't happening anymore.  (Carsten,
> Henning, please feel free to fill in if I'm missing something).

That might well be true; however, you're already the second person who 
cannot reproduce it with the debug patch, so I would not rule out other 
possibilities unless you've also tried without the debug patch since the 
changes?
Tejun Heo - Sept. 9, 2010, 10:27 a.m.
Hello,

On 09/08/2010 12:34 PM, Ilpo Järvinen wrote:
>> Unfortunately, we haven't been able to reproduce the problem anymore.
> 
> With my debug patch or not at all?

With the ugly merged patch I posted previously in this thread, which
prints debug messages if any of the worked-around conditions triggers.

>> It could be (but not likely given that none of the debugging messages
>> is triggering) that I was mistaken and the previously posted fixed the
>> issue. The network used by the cluster went through some changes at
>> the time and there have been issues with packet losses.  Given that
>> the problem needs packet losses to trigger, it's likely that packet
>> loss pattern here changed such that the patterns of packet losses
>> which trigger the problem aren't happening anymore.  (Carsten,
>> Henning, please feel free to fill in if I'm missing something).
> 
> That might well be true, however, you're already a second guy who 
> cannot reproduce it with the debug patch so I would not rule out other 
> possibilities unless you've tried without debug patch too since the 
> changes?

Unfortunately, I can't really tell one way or the other at this point.
Carsten will be back in a few days.  I'll ask him for more details.

Thanks.
Ilpo Järvinen - Sept. 9, 2010, 10:45 a.m.
On Thu, 9 Sep 2010, Tejun Heo wrote:

> On 09/08/2010 12:34 PM, Ilpo Järvinen wrote:
> >> Unfortunately, we haven't been able to reproduce the problem anymore.
> > 
> > With my debug patch or not at all?
> 
> With the ugly merged patch I posted previously in this thread which
> contained debug messages if any of the worked around condition
> triggers.
> 
> >> It could be (but not likely given that none of the debugging messages
> >> is triggering) that I was mistaken and the previously posted fixed the
> >> issue. The network used by the cluster went through some changes at
> >> the time and there have been issues with packet losses.  Given that
> >> the problem needs packet losses to trigger, it's likely that packet
> >> loss pattern here changed such that the patterns of packet losses
> >> which trigger the problem aren't happening anymore.  (Carsten,
> >> Henning, please feel free to fill in if I'm missing something).
> > 
> > That might well be true, however, you're already a second guy who 
> > cannot reproduce it with the debug patch so I would not rule out other 
> > possibilities unless you've tried without debug patch too since the 
> > changes?
> 
> Unfortunately, I can't really tell one way or the other at this point.
> Carsten will be back in a few days.  I'll ask him for more details.

Once you get the info, if not yet done, I'd recommend you try without the 
debug patch (assuming a possible crash isn't too devastating for the actual 
stuff you're doing with the machines :-)). ...If it crashes without it, then 
it's time to start looking into compiler versions, etc.

Patch

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b4ed957..1c8b1e0 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2190,6 +2190,53 @@  static int tcp_can_forward_retransmit(struct sock *sk)
 	return 1;
 }

+static void print_queue(struct sock *sk, struct sk_buff *old, struct sk_buff *hole)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	struct sk_buff *skb, *prev;
+	bool do_panic = false;
+
+	skb = tcp_write_queue_head(sk);
+	prev = (struct sk_buff *)(&sk->sk_write_queue);
+
+	if (skb == NULL) {
+		printk("XXX NULL head, pkts %u\n", tp->packets_out);
+		do_panic = true;
+	}
+
+	printk("XXX head %p tail %p sendhead %p oldhint %p now %p hole %p high %u\n",
+	       tcp_write_queue_head(sk), tcp_write_queue_tail(sk),
+	       tcp_send_head(sk), old, tp->retransmit_skb_hint, hole,
+	       tp->retransmit_high);
+
+	while (skb) {
+		printk("XXX skb %p (%u-%u) next %p prev %p sacked %u\n",
+		       skb, TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb)->end_seq,
+		       skb->next, skb->prev, TCP_SKB_CB(skb)->sacked);
+		if (prev != skb->prev) {
+			printk("XXX Inconsistent prev\n");
+			do_panic = true;
+		}
+
+		if (skb == tcp_write_queue_tail(sk)) {
+			if (skb->next != (struct sk_buff *)(&sk->sk_write_queue)) {
+				printk("XXX Improper next at tail\n");
+				do_panic = true;
+			}
+			break;
+		}
+
+		prev = skb;
+		skb = skb->next;
+	}
+	if (!skb) {
+		printk("XXX Encountered unexpected NULL\n");
+		do_panic = true;
+	}
+	if (do_panic)
+		panic("XXX panicking");
+}
+
 /* This gets called after a retransmit timeout, and the initially
  * retransmitted data is acknowledged.  It tries to continue
  * resending the rest of the retransmit queue, until either
@@ -2198,19 +2245,53 @@  static int tcp_can_forward_retransmit(struct sock *sk)
  * based retransmit packet might feed us FACK information again.
  * If so, we use it to avoid unnecessarily retransmissions.
  */
+static unsigned int caught_it;
+
 void tcp_xmit_retransmit_queue(struct sock *sk)
 {
 	const struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
-	struct sk_buff *skb;
+	struct sk_buff *skb, *prev;
 	struct sk_buff *hole = NULL;
+	struct sk_buff *old = tp->retransmit_skb_hint;
 	u32 last_lost;
 	int mib_idx;
 	int fwd_rexmitting = 0;
+	bool saw_hint = false;
+
+	if (!tp->packets_out) {
+		if (net_ratelimit())
+			printk("XXX !tp->packets_out, retransmit_skb_hint=%p, write_queue_head=%p\n",
+			       tp->retransmit_skb_hint, tcp_write_queue_head(sk));
+		return;
+	}

 	if (!tp->lost_out)
 		tp->retransmit_high = tp->snd_una;

+	for (skb = tcp_write_queue_head(sk),
+	     prev = (struct sk_buff *)&sk->sk_write_queue;
+	     skb != (struct sk_buff *)&sk->sk_write_queue;
+	     prev = skb, skb = skb->next) {
+		if (prev != skb->prev) {
+			printk("XXX sanity check: prev corrupt\n");
+			print_queue(sk, old, hole);
+		}
+		if (skb == tp->retransmit_skb_hint)
+			saw_hint = true;
+		if (skb == tcp_write_queue_tail(sk) &&
+		    skb->next != (struct sk_buff *)(&sk->sk_write_queue)) {
+			printk("XXX sanity check: end corrupt\n");
+			print_queue(sk, old, hole);
+		}
+	}
+	if (tp->retransmit_skb_hint && !saw_hint) {
+		printk("XXX sanity check: retransmit_skb_hint=%p is not on list, clearing hint\n",
+		       tp->retransmit_skb_hint);
+		print_queue(sk, old, hole);
+		tp->retransmit_skb_hint = NULL;
+	}
+
 	if (tp->retransmit_skb_hint) {
 		skb = tp->retransmit_skb_hint;
 		last_lost = TCP_SKB_CB(skb)->end_seq;
@@ -2218,7 +2299,17 @@  void tcp_xmit_retransmit_queue(struct sock *sk)
 			last_lost = tp->retransmit_high;
 	} else {
 		skb = tcp_write_queue_head(sk);
-		last_lost = tp->snd_una;
+		if (skb)
+			last_lost = tp->snd_una;
+	}
+
+checknull:
+	if (skb == NULL) {
+		print_queue(sk, old, hole);
+		caught_it++;
+		if (net_ratelimit())
+			printk("XXX Errors caught so far %u\n", caught_it);
+		return;
 	}

 	tcp_for_write_queue_from(skb, sk) {
@@ -2261,7 +2352,7 @@  begin_fwd:
 		} else if (!(sacked & TCPCB_LOST)) {
 			if (hole == NULL && !(sacked & (TCPCB_SACKED_RETRANS|TCPCB_SACKED_ACKED)))
 				hole = skb;
-			continue;
+			goto cont;

 		} else {
 			last_lost = TCP_SKB_CB(skb)->end_seq;
@@ -2272,7 +2363,7 @@  begin_fwd:
 		}

 		if (sacked & (TCPCB_SACKED_ACKED|TCPCB_SACKED_RETRANS))
-			continue;
+			goto cont;

 		if (tcp_retransmit_skb(sk, skb))
 			return;
@@ -2282,6 +2373,9 @@  begin_fwd:
 			inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
 						  inet_csk(sk)->icsk_rto,
 						  TCP_RTO_MAX);
+cont:
+		skb = skb->next;
+		goto checknull;
 	}
 }