Message ID | 30f38b3c-8c5c-7fab-e424-985e63ad900a@oracle.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Fri, 2017-01-13 at 18:01 +0300, Alexey Kodanev wrote: > Hi, > > Got the issue when running LTP/netstress test on localhost with mss > greater than the send window advertised by client (right after 3WHS). > Here is the testscenario that can reproduce this: Hi Alexey So this is a combination of Fastopen + small window + large MSS ? I would rather not force burning tons of coal or other fossil fuel, by making each tcp_sendmsg() done by billions of linux devices more expensive, only to accommodate for some LTP test doing something not sensible ;) Fact that you removed one condition in the BUG_ON() might hide another issue later in the path. I would suggest to clamp MSS to half the initial window, but I guess this is impractical since window in SYN/SYNACK are not scaled. Care to send a packetdrill test so that we have a clear picture of what is going on ? Thanks.
Hi Eric, On 13.01.2017 18:35, Eric Dumazet wrote: > On Fri, 2017-01-13 at 18:01 +0300, Alexey Kodanev wrote: >> Hi, >> >> Got the issue when running LTP/netstress test on localhost with mss >> greater than the send window advertised by client (right after 3WHS). >> Here is the testscenario that can reproduce this: > Hi Alexey > > So this is a combination of Fastopen + small window + large MSS ? Yeah, this happens only in the beginning, after first ack from client. Later window gets lager than mss and it doesn't happen. > > I would rather not force burning tons of coal or other fossil fuel, > by making each tcp_sendmsg() done by billions of linux devices more > expensive, only to accommodate for some LTP test doing something not > sensible ;) > > Fact that you removed one condition in the BUG_ON() might hide another > issue later in the path. > > I would suggest to clamp MSS to half the initial window, but I guess > this is impractical since window in SYN/SYNACK are not scaled. > Care to send a packetdrill test so that we have a clear picture of what > is going on ? Is it capable of making two connections in the single test, one after another? Thanks, Alexey
On Fri, Jan 13, 2017 at 9:07 AM, Alexey Kodanev <alexey.kodanev@oracle.com> wrote: > Hi Eric, > On 13.01.2017 18:35, Eric Dumazet wrote: >> Care to send a packetdrill test so that we have a clear picture of what >> is going on ? > > Is it capable of making two connections in the single test, one after > another? Absolutely. Neal, Yuchung would you be kind enough to send a Fastopen tpacketdrill template showing a typical fastopen test running on an upstream kernel ? Thanks !
On Fri, Jan 13, 2017 at 12:14 PM, Eric Dumazet <edumazet@google.com> wrote: > > On Fri, Jan 13, 2017 at 9:07 AM, Alexey Kodanev > <alexey.kodanev@oracle.com> wrote: > > Hi Eric, > > On 13.01.2017 18:35, Eric Dumazet wrote: > > >> Care to send a packetdrill test so that we have a clear picture of what > >> is going on ? > > > > Is it capable of making two connections in the single test, one after > > another? > > Absolutely. > > Neal, Yuchung would you be kind enough to send a Fastopen tpacketdrill > template showing a typical fastopen test > running on an upstream kernel ? > > Thanks ! Sure, here is an example packetdrill script, IIRC written by Yuchung, which demonstrates TCP fast open and consecutive active connections: `sysctl -q net.ipv4.tcp_timestamps=0` // Cache warmup: send a Fast Open cookie request 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0 +0 sendto(3, ..., 0, MSG_FASTOPEN, ..., ...) = -1 EINPROGRESS (Operation is now in progress) +0 > S 0:0(0) <mss 1460,nop,nop,sackOK,nop,wscale 8,FO,nop,nop> +.010 < S. 123:123(0) ack 1 win 5840 <mss 1040,nop,nop,sackOK,nop,wscale 6,FO abcd1234,nop,nop> +0 > . 1:1(0) ack 1 +0 close(3) = 0 +0 > F. 1:1(0) ack 1 +.010 < F. 1:1(0) ack 2 win 92 +0 > . 2:2(0) ack 2 // // TEST1: Servers sends SYN-ACK with data and another two data packets // +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4 +0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0 +0 sendto(4, ..., 1000, MSG_FASTOPEN, ..., ...) = 1000 +0 > S 0:1000(1000) <mss 1460,nop,nop,sackOK,nop,wscale 8,FO abcd1234,nop,nop> +.010 < S. 1000000:1001400(1400) ack 1001 win 5840 <mss 1040,nop,nop,sackOK,nop,wscale 6> +0 < . 1401:2801(1400) ack 1001 win 257 +0 < P. 2801:3001(200) ack 1001 win 257 neal
On Fri, 2017-01-13 at 12:32 -0500, Neal Cardwell wrote: > On Fri, Jan 13, 2017 at 12:14 PM, Eric Dumazet <edumazet@google.com> wrote: > > > > On Fri, Jan 13, 2017 at 9:07 AM, Alexey Kodanev > > <alexey.kodanev@oracle.com> wrote: > > > Hi Eric, > > > On 13.01.2017 18:35, Eric Dumazet wrote: > > > > >> Care to send a packetdrill test so that we have a clear picture of what > > >> is going on ? > > > > > > Is it capable of making two connections in the single test, one after > > > another? > > > > Absolutely. > > > > Neal, Yuchung would you be kind enough to send a Fastopen tpacketdrill > > template showing a typical fastopen test > > running on an upstream kernel ? > > > > Thanks ! > > Sure, here is an example packetdrill script, IIRC written by Yuchung, > which demonstrates TCP fast open and consecutive active connections: > > `sysctl -q net.ipv4.tcp_timestamps=0` > > // Cache warmup: send a Fast Open cookie request > 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 > +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0 > +0 sendto(3, ..., 0, MSG_FASTOPEN, ..., ...) = -1 EINPROGRESS > (Operation is now in progress) > +0 > S 0:0(0) <mss 1460,nop,nop,sackOK,nop,wscale 8,FO,nop,nop> > +.010 < S. 123:123(0) ack 1 win 5840 <mss > 1040,nop,nop,sackOK,nop,wscale 6,FO abcd1234,nop,nop> > +0 > . 1:1(0) ack 1 > +0 close(3) = 0 > +0 > F. 1:1(0) ack 1 > +.010 < F. 1:1(0) ack 2 win 92 > +0 > . 2:2(0) ack 2 > > // > // TEST1: Servers sends SYN-ACK with data and another two data packets > // > +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4 > +0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0 > +0 sendto(4, ..., 1000, MSG_FASTOPEN, ..., ...) = 1000 > +0 > S 0:1000(1000) <mss 1460,nop,nop,sackOK,nop,wscale 8,FO > abcd1234,nop,nop> > +.010 < S. 1000000:1001400(1400) ack 1001 win 5840 <mss > 1040,nop,nop,sackOK,nop,wscale 6> > +0 < . 1401:2801(1400) ack 1001 win 257 > +0 < P. 2801:3001(200) ack 1001 win 257 > > neal Thanks Neal Also worth adding that packetdrill has the following option to tune the MTU on the tun device : --mtu=xxxxx
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 4a04496..3d3bd97 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -860,7 +860,12 @@ static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now, size_goal = tp->gso_segs * mss_now; } - return max(size_goal, mss_now); + size_goal = max(size_goal, mss_now); + + if (tp->snd_wnd > TCP_MSS_DEFAULT) + return min(tp->snd_wnd, size_goal); + + return size_goal; } static int tcp_send_mss(struct sock *sk, int *size_goal, int flags) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 1d5331a..0ac133f 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2445,7 +2445,7 @@ void tcp_push_one(struct sock *sk, unsigned int mss_now) { struct sk_buff *skb = tcp_send_head(sk); - BUG_ON(!skb || skb->len < mss_now); + BUG_ON(!skb); tcp_write_xmit(sk, mss_now, TCP_NAGLE_PUSH, 1, sk->sk_allocation);