Message ID | 1433551592-3596877-1-git-send-email-kafai@fb.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
On Fri, 2015-06-05 at 17:46 -0700, Martin KaFai Lau wrote: > The problem is caught by this WARN_ON(len > skb->len) in tcp_fragment(): > > [<ffffffff810510ca>] warn_slowpath_null+0x1a/0x20 > [<ffffffff8160ec90>] tcp_fragment+0x2a0/0x2b0 > [<ffffffff81604e06>] tcp_mark_head_lost+0x196/0x230 > [<ffffffff8160585d>] tcp_update_scoreboard+0x4d/0x80 > [<ffffffff8160a9ac>] tcp_fastretrans_alert+0x6ac/0xa90 > [<ffffffff8160b834>] tcp_ack+0x9d4/0x10e0 > [<ffffffff8160c699>] tcp_rcv_established+0x309/0x7e0 > > The WARN_ON pointed out that tcp_skb_pcount (i.e. > TCP_SKB_CB(skb)->tcp_gso_segs) and skb->len is inconsistent. > > The WARN_ON stack goes away after setting net.ipv4.tcp_mtu_probing to 0. > > v2 > - Replace the skb slicing codes by the existing tcp_trim_head(), > suggested by Eric Dumazet. > > v1 > - Call tcp_set_skb_tso_segs() for all slicing cases. > > Signed-off-by: Martin KaFai Lau <kafai@fb.com> > Reported-by: Grant Zhang <gzhang@fastly.com> > Cc: Grant Zhang <gzhang@fastly.com> > Cc: Eric Dumazet <edumazet@google.com> > Cc: Neal Cardwell <ncardwell@google.com> > Cc: Yuchung Cheng <ycheng@google.com> > --- > net/ipv4/tcp_output.c | 12 ++---------- > 1 file changed, 2 insertions(+), 10 deletions(-) > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index a369e8a..4ae4f0c 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -1977,16 +1977,8 @@ static int tcp_mtu_probe(struct sock *sk) > } else { > TCP_SKB_CB(nskb)->tcp_flags |= TCP_SKB_CB(skb)->tcp_flags & > ~(TCPHDR_FIN|TCPHDR_PSH); > - if (!skb_shinfo(skb)->nr_frags) { > - skb_pull(skb, copy); > - if (skb->ip_summed != CHECKSUM_PARTIAL) > - skb->csum = csum_partial(skb->data, > - skb->len, 0); > - } else { > - __pskb_trim_head(skb, copy); > - tcp_set_skb_tso_segs(sk, skb, mss_now); > - } > - TCP_SKB_CB(skb)->seq += copy; > + tcp_skb_pcount_set(skb, 0); > + tcp_trim_head(sk, skb, copy); > } > > len += copy; I think the invariant should be that if a packet had been never sent, its pcount should be already 0. (cleared in do_tcp_sendpages() and tcp_sendmsg() : it seems we hacked these functions already in the past :( ) So we might need to track places where we violate this rule, then get rid of the tcp_skb_pcount_set(skb, 0); done in do_tcp_sendpages() and tcp_sendmsg(). Here, trimming a packet that was never sent (by definition) should not force pcount to 0, it should already be the case. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jun 05, 2015 at 06:11:33PM -0700, Eric Dumazet wrote: > On Fri, 2015-06-05 at 17:46 -0700, Martin KaFai Lau wrote: > > The problem is caught by this WARN_ON(len > skb->len) in tcp_fragment(): > > > > [<ffffffff810510ca>] warn_slowpath_null+0x1a/0x20 > > [<ffffffff8160ec90>] tcp_fragment+0x2a0/0x2b0 > > [<ffffffff81604e06>] tcp_mark_head_lost+0x196/0x230 > > [<ffffffff8160585d>] tcp_update_scoreboard+0x4d/0x80 > > [<ffffffff8160a9ac>] tcp_fastretrans_alert+0x6ac/0xa90 > > [<ffffffff8160b834>] tcp_ack+0x9d4/0x10e0 > > [<ffffffff8160c699>] tcp_rcv_established+0x309/0x7e0 > > > > The WARN_ON pointed out that tcp_skb_pcount (i.e. > > TCP_SKB_CB(skb)->tcp_gso_segs) and skb->len is inconsistent. > > > > The WARN_ON stack goes away after setting net.ipv4.tcp_mtu_probing to 0. > > > > v2 > > - Replace the skb slicing codes by the existing tcp_trim_head(), > > suggested by Eric Dumazet. > > > > v1 > > - Call tcp_set_skb_tso_segs() for all slicing cases. > > > > Signed-off-by: Martin KaFai Lau <kafai@fb.com> > > Reported-by: Grant Zhang <gzhang@fastly.com> > > Cc: Grant Zhang <gzhang@fastly.com> > > Cc: Eric Dumazet <edumazet@google.com> > > Cc: Neal Cardwell <ncardwell@google.com> > > Cc: Yuchung Cheng <ycheng@google.com> > > --- > > net/ipv4/tcp_output.c | 12 ++---------- > > 1 file changed, 2 insertions(+), 10 deletions(-) > > > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > > index a369e8a..4ae4f0c 100644 > > --- a/net/ipv4/tcp_output.c > > +++ b/net/ipv4/tcp_output.c > > @@ -1977,16 +1977,8 @@ static int tcp_mtu_probe(struct sock *sk) > > } else { > > TCP_SKB_CB(nskb)->tcp_flags |= TCP_SKB_CB(skb)->tcp_flags & > > ~(TCPHDR_FIN|TCPHDR_PSH); > > - if (!skb_shinfo(skb)->nr_frags) { > > - skb_pull(skb, copy); > > - if (skb->ip_summed != CHECKSUM_PARTIAL) > > - skb->csum = csum_partial(skb->data, > > - skb->len, 0); > > - } else { > > - __pskb_trim_head(skb, copy); > > - tcp_set_skb_tso_segs(sk, skb, mss_now); > > - } > > - TCP_SKB_CB(skb)->seq += copy; > > + tcp_skb_pcount_set(skb, 0); > > + tcp_trim_head(sk, skb, copy); > > } > > > > len += copy; > > > I think the invariant should be that if a packet had been never sent, > its pcount should be already 0. > > (cleared in do_tcp_sendpages() and tcp_sendmsg() : it seems we hacked > these functions already in the past :( ) > > So we might need to track places where we violate this rule, then get > rid of the tcp_skb_pcount_set(skb, 0); done in do_tcp_sendpages() and > tcp_sendmsg(). > > Here, trimming a packet that was never sent (by definition) should not > force pcount to 0, it should already be the case. It seems the invariant does not hold at this point also. Should the invariant fix be something for net-next? or Would you like to post a patch for it? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2015-06-08 at 10:58 -0700, Martin KaFai Lau wrote: > It seems the invariant does not hold at this point also. > Should the invariant fix be something for net-next? or Would you like > to post a patch for it? This patch definitely can target net-next, it is not a new regression. Once fully tested, we can ask David to submit a stable version later. Yes, I can provide a v3, probably at the end of today. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2015-06-08 at 11:11 -0700, Eric Dumazet wrote: > On Mon, 2015-06-08 at 10:58 -0700, Martin KaFai Lau wrote: > > > It seems the invariant does not hold at this point also. > > Should the invariant fix be something for net-next? or Would you like > > to post a patch for it? > > This patch definitely can target net-next, it is not a new regression. > > Once fully tested, we can ask David to submit a stable version later. > > Yes, I can provide a v3, probably at the end of today. I've been working on this, but still can get the bug triggering in tcp_fragment(), no matter what (Neal patch , yours, mine...) I suspect a problem in skb_shift() or tcp_shifted_skb() and am doing additional tests. [ 1806.206345] ------------[ cut here ]------------ [ 1806.206359] WARNING: CPU: 19 PID: 0 at net/ipv4/tcp_output.c:1159 tcp_fragment+0x2b0/0x2d0() [ 1806.206360] Modules linked in: ip_gre gre sit ip_tunnel tunnel4 bonding w1_therm wire cdc_acm ehci_pci ehci_hcd i2c_dev i2c_i801 i2c_core msr cpuid ib_uverbs mlx4_ib ib_sa ib_mad ib_core ib_addr ip6table_mangle ip6_tables ipv6 [ 1806.206376] CPU: 19 PID: 0 Comm: swapper/19 Tainted: G W 4.1.0-smp-DEV #1762 [ 1806.206378] ffffffff8196ba4b ffff88407fae38f8 ffffffff81643125 00000000000000dd [ 1806.206380] 0000000000000000 ffff88407fae3938 ffffffff810a8c97 ffff88407fae3998 [ 1806.206382] ffff883f75498200 ffff881fe1d5f800 000000000000000d 0000000000004802 [ 1806.206384] Call Trace: [ 1806.206385] <IRQ> [<ffffffff81643125>] dump_stack+0x45/0x57 [ 1806.206395] [<ffffffff810a8c97>] warn_slowpath_common+0x97/0xe0 [ 1806.206396] [<ffffffff810a8cfa>] warn_slowpath_null+0x1a/0x20 [ 1806.206398] [<ffffffff815df0c0>] tcp_fragment+0x2b0/0x2d0 [ 1806.206400] [<ffffffff815d62e1>] tcp_mark_head_lost+0x191/0x280 [ 1806.206402] [<ffffffff815d641d>] tcp_update_scoreboard+0x4d/0x80 [ 1806.206404] [<ffffffff815daa24>] tcp_fastretrans_alert+0x6e4/0xc20 [ 1806.206406] [<ffffffff815dbcef>] tcp_ack+0xb0f/0x13c0 [ 1806.206408] [<ffffffff815dcc31>] tcp_rcv_established+0x311/0x820 [ 1806.206410] [<ffffffff815e79cb>] tcp_v4_do_rcv+0x14b/0x3d0 [ 1806.206414] [<ffffffff812f15e6>] ? security_sock_rcv_skb+0x16/0x20 [ 1806.206415] [<ffffffff815e8e46>] tcp_v4_rcv+0x8c6/0x9d0 [ 1806.206418] [<ffffffff815c1dec>] ip_local_deliver_finish+0xac/0x230 [ 1806.206419] [<ffffffff815c216a>] ip_local_deliver+0xaa/0xc0 [ 1806.206421] [<ffffffff815c1d40>] ? ip_rcv_finish+0x380/0x380 [ 1806.206422] [<ffffffff815c1a48>] ip_rcv_finish+0x88/0x380 [ 1806.206423] [<ffffffff815c2487>] ip_rcv+0x307/0x420 [ 1806.206426] [<ffffffff815c19c0>] ? inet_add_protocol+0x50/0x50 [ 1806.206431] [<ffffffff8157cc0c>] __netif_receive_skb_core+0x5bc/0xa30 [ 1806.206432] [<ffffffff8157d0a6>] __netif_receive_skb+0x26/0x70 [ 1806.206434] [<ffffffff8157d180>] process_backlog+0x90/0x130 [ 1806.206436] [<ffffffff8157d9a7>] net_rx_action+0x157/0x330 [ 1806.206439] [<ffffffff810ac9c7>] __do_softirq+0xe7/0x270 [ 1806.206441] [<ffffffff810acd85>] irq_exit+0xa5/0xb0 [ 1806.206445] [<ffffffff8108aa65>] smp_call_function_single_interrupt+0x35/0x40 [ 1806.206448] [<ffffffff8164ae2b>] call_function_single_interrupt+0x6b/0x70 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jun 09, 2015 at 10:06:25AM -0700, Eric Dumazet wrote: > I've been working on this, but still can get the bug triggering in > tcp_fragment(), no matter what (Neal patch , yours, mine...) Can you describe the test case that can reproduce it? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2015-06-09 at 10:45 -0700, Martin KaFai Lau wrote: > On Tue, Jun 09, 2015 at 10:06:25AM -0700, Eric Dumazet wrote: > > I've been working on this, but still can get the bug triggering in > > tcp_fragment(), no matter what (Neal patch , yours, mine...) > Can you describe the test case that can reproduce it? This might not be easy : I used a 40Gb testbed and following. Warning triggers in about 30 minutes, but this might be related to some unrelated traffic. echo 4 >/proc/sys/net/ipv4/tcp_min_tso_segs echo 0 >/proc/sys/kernel/timer_migration echo fq >/proc/sys/net/core/default_qdisc tc qd replace dev eth1 root mq ./super_netperf 1500 --google-pacing-rate 3028000 -H lpaa24 -l 10000 & You might implement the pacing stuff using regular netperf with for ETH in eth1 do tc qd del dev $ETH root 2>/dev/null tc qd add dev $ETH root handle 1: mq for i in `seq 1 16` do slot=$( printf %x $(( i )) ) tc qd add dev $ETH parent 1:$slot fq maxrate 3028000 done done (Change eth1 by eth0 or something, and 16 by number of TX queues.) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index a369e8a..4ae4f0c 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1977,16 +1977,8 @@ static int tcp_mtu_probe(struct sock *sk) } else { TCP_SKB_CB(nskb)->tcp_flags |= TCP_SKB_CB(skb)->tcp_flags & ~(TCPHDR_FIN|TCPHDR_PSH); - if (!skb_shinfo(skb)->nr_frags) { - skb_pull(skb, copy); - if (skb->ip_summed != CHECKSUM_PARTIAL) - skb->csum = csum_partial(skb->data, - skb->len, 0); - } else { - __pskb_trim_head(skb, copy); - tcp_set_skb_tso_segs(sk, skb, mss_now); - } - TCP_SKB_CB(skb)->seq += copy; + tcp_skb_pcount_set(skb, 0); + tcp_trim_head(sk, skb, copy); } len += copy;
The problem is caught by this WARN_ON(len > skb->len) in tcp_fragment(): [<ffffffff810510ca>] warn_slowpath_null+0x1a/0x20 [<ffffffff8160ec90>] tcp_fragment+0x2a0/0x2b0 [<ffffffff81604e06>] tcp_mark_head_lost+0x196/0x230 [<ffffffff8160585d>] tcp_update_scoreboard+0x4d/0x80 [<ffffffff8160a9ac>] tcp_fastretrans_alert+0x6ac/0xa90 [<ffffffff8160b834>] tcp_ack+0x9d4/0x10e0 [<ffffffff8160c699>] tcp_rcv_established+0x309/0x7e0 The WARN_ON pointed out that tcp_skb_pcount (i.e. TCP_SKB_CB(skb)->tcp_gso_segs) and skb->len is inconsistent. The WARN_ON stack goes away after setting net.ipv4.tcp_mtu_probing to 0. v2 - Replace the skb slicing codes by the existing tcp_trim_head(), suggested by Eric Dumazet. v1 - Call tcp_set_skb_tso_segs() for all slicing cases. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reported-by: Grant Zhang <gzhang@fastly.com> Cc: Grant Zhang <gzhang@fastly.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> --- net/ipv4/tcp_output.c | 12 ++---------- 1 file changed, 2 insertions(+), 10 deletions(-)