Message ID | 1336144442.3752.348.camel@edumazet-glaptop |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
On Fri, May 4, 2012 at 11:14 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> It appears some networks play bad games with the two bits reserved for
> ECN. This can trigger false congestion notifications and very slow
> transfers.
>
> Since RFC 3168 (6.1.1) forbids SYN packets to carry CT bits, we can
> disable TCP ECN negotiation if it happens we receive mangled CT bits in
> the SYN packet.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Perry Lorier <perryl@google.com>
> Cc: Matt Mathis <mattmathis@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Wilmer van der Gaast <wilmer@google.com>
> Cc: Ankur Jain <jankur@google.com>
> Cc: Tom Herbert <therbert@google.com>
> Cc: Dave Täht <dave.taht@bufferbloat.net>
> ---
>  include/net/tcp.h   | 23 ++++++++++++++++-------
>  net/ipv4/tcp_ipv4.c |  2 +-
>  net/ipv6/tcp_ipv6.c |  2 +-
>  3 files changed, 18 insertions(+), 9 deletions(-)

Acked-by: Neal Cardwell <ncardwell@google.com>

neal
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 04 May 2012 17:14:02 +0200

> From: Eric Dumazet <edumazet@google.com>
>
> It appears some networks play bad games with the two bits reserved for
> ECN. This can trigger false congestion notifications and very slow
> transfers.
>
> Since RFC 3168 (6.1.1) forbids SYN packets to carry CT bits, we can
> disable TCP ECN negotiation if it happens we receive mangled CT bits in
> the SYN packet.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.
On 05/04/2012 08:14 AM, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> It appears some networks play bad games with the two bits reserved for
> ECN. This can trigger false congestion notifications and very slow
> transfers.
>
> Since RFC 3168 (6.1.1) forbids SYN packets to carry CT bits, we can
> disable TCP ECN negotiation if it happens we receive mangled CT bits in
> the SYN packet.

What sort of networks were these? Any chance it was some sort of attempt to add ECN to FastOpen?

rick jones
On Fri, 2012-05-04 at 11:09 -0700, Rick Jones wrote:
> On 05/04/2012 08:14 AM, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > It appears some networks play bad games with the two bits reserved for
> > ECN. This can trigger false congestion notifications and very slow
> > transfers.
> >
> > Since RFC 3168 (6.1.1) forbids SYN packets to carry CT bits, we can
> > disable TCP ECN negotiation if it happens we receive mangled CT bits in
> > the SYN packet.
>
> What sort of networks were these? Any chance it was some sort of
> attempt to add ECN to FastOpen?

Nothing to do with fastopen.

Just take a look at a random http server and sample all SYN packets it receives.

Some of them have TOS bits 0 or 1 set, or even both bits set.
On 05/04/2012 11:23 AM, Eric Dumazet wrote:
> On Fri, 2012-05-04 at 11:09 -0700, Rick Jones wrote:
>> What sort of networks were these? Any chance it was some sort of
>> attempt to add ECN to FastOpen?
>
> Nothing to do with fastopen.
>
> Just take a look at a random http server and sample all SYN packets it
> receives.
>
> Some of them have TOS bits 0 or 1 set, or even both bits set.

I'll fire-up tcpdump on netperf.org:

tcpdump -i eth0 -vvv '(tcp[tcpflags] & tcp-syn != 0) && (ip[1] != 0x0)'

and see what appears.

rick
On Fri, 2012-05-04 at 11:48 -0700, Rick Jones wrote:
> On 05/04/2012 11:23 AM, Eric Dumazet wrote:
> > On Fri, 2012-05-04 at 11:09 -0700, Rick Jones wrote:
> >> What sort of networks were these? Any chance it was some sort of
> >> attempt to add ECN to FastOpen?
> >
> > Nothing to do with fastopen.
> >
> > Just take a look at a random http server and sample all SYN packets it
> > receives.
> >
> > Some of them have TOS bits 0 or 1 set, or even both bits set.
>
> I'll fire-up tcpdump on netperf.org:
>
> tcpdump -i eth0 -vvv '(tcp[tcpflags] & tcp-syn != 0) && (ip[1] != 0x0)'
>
> and see what appears.
>
> rick

or (ip[1] & 3 != 0)

Note that you could catch SYNACK with this filter (if your machine initiates some active TCP sessions), since SYNACK might have ECT bits, if some stacks implemented:

http://tools.ietf.org/html/draft-kuzmanovic-ecn-syn-00
(Adding Explicit Congestion Notification (ECN) Capability to TCP's SYN/ACK Packets)

http://tools.ietf.org/id/draft-ietf-tcpm-ecnsyn-04.txt
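For readers following along, here is a small C sketch of what the narrower `ip[1] & 3` test selects. It assumes the RFC 3168 layout, where the ECN field is the two low-order bits of the IPv4 TOS byte. The enum, macro and function names are invented for illustration and are not kernel identifiers.

```c
#include <stdint.h>

/* ECN codepoints per RFC 3168: the two low-order bits of the
 * IPv4 TOS byte. All names here are illustrative only. */
enum ecn_codepoint {
	ECN_CP_NOT_ECT = 0x0,	/* 00: not ECN-capable transport */
	ECN_CP_ECT_1   = 0x1,	/* 01: ECT(1) */
	ECN_CP_ECT_0   = 0x2,	/* 10: ECT(0) */
	ECN_CP_CE      = 0x3,	/* 11: congestion experienced */
};

#define ECN_CP_MASK 0x3

/* Extract the ECN codepoint from a TOS byte (ip[1] in pcap terms). */
static inline enum ecn_codepoint tos_to_ecn(uint8_t tos)
{
	return (enum ecn_codepoint)(tos & ECN_CP_MASK);
}

/* Same test as the pcap expression (ip[1] & 3 != 0). */
static inline int tos_has_ecn_bits(uint8_t tos)
{
	return (tos & ECN_CP_MASK) != 0;
}
```

With this, a TOS byte of 0x03 classifies as CE, while a DSCP-only value such as 0x10 has no ECN bits set: the broader `ip[1] != 0x0` filter would match it, the narrower one would not.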
On 05/04/2012 12:05 PM, Eric Dumazet wrote:
> On Fri, 2012-05-04 at 11:48 -0700, Rick Jones wrote:
>> I'll fire-up tcpdump on netperf.org:
>>
>> tcpdump -i eth0 -vvv '(tcp[tcpflags] & tcp-syn != 0) && (ip[1] != 0x0)'
>>
>> and see what appears.
>>
>> rick
>
> or (ip[1] & 3 != 0)

True, I'm looking at more than the ECN bits, but in the 90 minutes the tcpdump has been running there have been no packets with any of the 8 bits at ip[1] being 1 anyway :) Netperf.org doesn't get a massive quantity of traffic. It may go the entire week-end or longer without seeing such a packet.

> Note that you could catch SYNACK with this filter (if your machine
> initiates some active TCP sessions), since SYNACK might have ECT bits,
> if some stacks implemented :
>
> http://tools.ietf.org/html/draft-kuzmanovic-ecn-syn-00 ( Adding
> Explicit Congestion Notification (ECN) Capability to TCP's SYN/ACK
> Packets )
>
> http://tools.ietf.org/id/draft-ietf-tcpm-ecnsyn-04.txt

True. I suspect that 99 times out of 10, the outbound connections established by netperf.org are in response to traffic to netperf-talk, which is itself a rather quiet list, so I'm not too worried about the output being cluttered with false hits.

rick
On 05/04/2012 01:20 PM, Rick Jones wrote:
> True, I'm looking at more than the ECN bits, but in the 90 minutes the
> tcpdump has been running there have been no packets with any of the
> 8 bits at ip[1] being 1 anyway :) Netperf.org doesn't get a massive
> quantity of traffic. It may go the entire week-end or longer without
> seeing such a packet.

I see fate is working as intended, or someone decided to try to feed me my words :) for within 6 minutes of my sending the above I got:

13:26:16.866007 IP (tos 0x3,CE, ttl 41, id 28850, offset 0, flags [DF], proto TCP (6), length 64)
    somesystemin.de.55363 > www.netperf.org.www: Flags [S], cksum 0x4cfc (correct), seq 304457158, win 65535, options [mss 1460,nop,wscale 3,nop,nop,TS val 288116308 ecr 0,sackOK,eol], length 0
13:26:17.831880 IP (tos 0x3,CE, ttl 41, id 6911, offset 0, flags [DF], proto TCP (6), length 64)
    somesystemin.de.55367 > www.netperf.org.www: Flags [S], cksum 0x17aa (correct), seq 586073737, win 65535, options [mss 1460,nop,wscale 3,nop,nop,TS val 288117270 ecr 0,sackOK,eol], length 0
13:26:17.831929 IP (tos 0x3,CE, ttl 41, id 28924, offset 0, flags [DF], proto TCP (6), length 64)
    somesystemin.de.55368 > www.netperf.org.www: Flags [S], cksum 0x07cc (correct), seq 1513398047, win 65535, options [mss 1460,nop,wscale 3,nop,nop,TS val 288117271 ecr 0,sackOK,eol], length 0
13:26:17.831952 IP (tos 0x3,CE, ttl 41, id 2494, offset 0, flags [DF], proto TCP (6), length 64)
    somesystemin.de.55366 > www.netperf.org.www: Flags [S], cksum 0x75f4 (correct), seq 1153058420, win 65535, options [mss 1460,nop,wscale 3,nop,nop,TS val 288117270 ecr 0,sackOK,eol], length 0
13:26:17.832177 IP (tos 0x3,CE, ttl 41, id 6854, offset 0, flags [DF], proto TCP (6), length 64)
    somesystemin.de.55365 > www.netperf.org.www: Flags [S], cksum 0xfca0 (correct), seq 2332522875, win 65535, options [mss 1460,nop,wscale 3,nop,nop,TS val 288117270 ecr 0,sackOK,eol], length 0
13:26:17.832239 IP (tos 0x3,CE, ttl 41, id 64733, offset 0, flags [DF], proto TCP (6), length 64)
    somesystemin.de.55364 > www.netperf.org.www: Flags [S], cksum 0x7414 (correct), seq 1544827132, win 65535, options [mss 1460,nop,wscale 3,nop,nop,TS val 288117270 ecr 0,sackOK,eol], length 0
13:26:38.649126 IP (tos 0x3,CE, ttl 41, id 9860, offset 0, flags [DF], proto TCP (6), length 64)
    somesystemin.de.55369 > www.netperf.org.www: Flags [S], cksum 0x6270 (correct), seq 683091230, win 65535, options [mss 1460,nop,wscale 3,nop,nop,TS val 288137968 ecr 0,sackOK,eol], length 0
13:26:39.417589 IP (tos 0x3,CE, ttl 41, id 13478, offset 0, flags [DF], proto TCP (6), length 64)
    somesystemin.de.55370 > www.netperf.org.www: Flags [S], cksum 0x2862 (correct), seq 3168323595, win 65535, options [mss 1460,nop,wscale 3,nop,nop,TS val 288138734 ecr 0,sackOK,eol], length 0

rick
On Fri, 2012-05-04 at 13:36 -0700, Rick Jones wrote:
> I see fate is working as intended, or someone decided to try to feed me
> my words :) for within 6 minutes of my sending the above I got:
>
> 13:26:16.866007 IP (tos 0x3,CE, ttl 41, id 28850, offset 0, flags [DF],
> proto TCP (6), length 64)
> [...]
>
> rick

Interesting indeed ;)

Did you check if it was spoofed?

(did the 3WHS really complete)

>
> Interesting indeed ;)
>
> Did you check if it was spoofed?
>
> (did the 3WHS really complete)

Well, the tcpdump command was still:

tcpdump -i eth0 -vvv '(tcp[tcpflags] & tcp-syn != 0) && (ip[1] != 0x0)'

I didn't see any SYN|ACKs go out, but netperf.org would have had to set ECT for me to see a SYN|ACK going out. FWIW, this is on a 2.6.31-15 (Ubuntu) kernel with net.ipv4.tcp_ecn = 2 and I don't think the SYNs themselves were negotiating ECN:

13:26:16.866007 IP (tos 0x3,CE, ttl 41, id 28850, offset 0, flags [DF], proto TCP (6), length 64)
    somesystemin.de.55363 > www.netperf.org.www: Flags [S], cksum 0x4cfc (correct), seq 304457158, win 65535, options [mss 1460,nop,wscale

rick
On Fri, 2012-05-04 at 14:01 -0700, Rick Jones wrote:
> >
> > Interesting indeed ;)
> >
> > Did you check if it was spoofed?
> >
> > (did the 3WHS really complete)
>
> Well, the tcpdump command was still:
>
> tcpdump -i eth0 -vvv '(tcp[tcpflags] & tcp-syn != 0) && (ip[1] != 0x0)'
>
> I didn't see any SYN|ACKs go out, but netperf.org would have had to set
> ECT for me to see a SYN|ACK going out. FWIW, this is on a 2.6.31-15
> (Ubuntu) kernel with net.ipv4.tcp_ecn = 2 and I don't think the SYNs
> themselves were negotiating ECN:
>
> 13:26:16.866007 IP (tos 0x3,CE, ttl 41, id 28850, offset 0, flags [DF],
> proto TCP (6), length 64)
> somesystemin.de.55363 > www.netperf.org.www: Flags [S], cksum
> 0x4cfc (correct), seq 304457158, win 65535, options [mss 1460,nop,wscale

Probably not, or else you would see:

13:26:16.866007 IP (tos 0x3,CE, ttl 41, id 28850, offset 0, flags [DF], proto TCP (6), length 64)
    somesystemin.de.55363 > www.netperf.org.www: Flags [SEW], cksum ...
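As a side note on the `[S]` versus `[SEW]` distinction: tcpdump prints `E` for ECE and `W` for CWR, and an ECN-setup SYN carries both alongside SYN. A minimal C sketch of that check on the raw TCP flags byte follows. The macro and function names are made up for this example; only the bit positions come from the TCP header layout.

```c
#include <stdint.h>

/* Bits of the TCP flags byte (offset 13 in the TCP header).
 * Macro names are local to this sketch. */
#define TF_SYN 0x02
#define TF_ACK 0x10
#define TF_ECE 0x40
#define TF_CWR 0x80

/* An ECN-setup SYN (RFC 3168, 6.1.1) has SYN, ECE and CWR set and
 * ACK clear; tcpdump shows it as Flags [SEW]. */
static inline int is_ecn_setup_syn(uint8_t flags)
{
	const uint8_t want = TF_SYN | TF_ECE | TF_CWR;

	return (flags & (want | TF_ACK)) == want;
}
```

A plain SYN (flags byte 0x02, shown as `[S]`) fails this test, which is why the captured packets do not look like ECN negotiation attempts.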
diff --git a/include/net/tcp.h b/include/net/tcp.h
index c826ed7..92faa6a 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -367,13 +367,6 @@ static inline void tcp_dec_quickack_mode(struct sock *sk,
 #define	TCP_ECN_DEMAND_CWR	4
 #define	TCP_ECN_SEEN		8
 
-static __inline__ void
-TCP_ECN_create_request(struct request_sock *req, struct tcphdr *th)
-{
-	if (sysctl_tcp_ecn && th->ece && th->cwr)
-		inet_rsk(req)->ecn_ok = 1;
-}
-
 enum tcp_tw_status {
 	TCP_TW_SUCCESS = 0,
 	TCP_TW_RST = 1,
@@ -671,6 +664,22 @@ struct tcp_skb_cb {
 
 #define TCP_SKB_CB(__skb)	((struct tcp_skb_cb *)&((__skb)->cb[0]))
 
+/* RFC3168 : 6.1.1 SYN packets must not have ECT/ECN bits set
+ *
+ * If we receive a SYN packet with these bits set, it means a network is
+ * playing bad games with TOS bits. In order to avoid possible false congestion
+ * notifications, we disable TCP ECN negociation.
+ */
+static inline void
+TCP_ECN_create_request(struct request_sock *req, const struct sk_buff *skb)
+{
+	const struct tcphdr *th = tcp_hdr(skb);
+
+	if (sysctl_tcp_ecn && th->ece && th->cwr &&
+	    INET_ECN_is_not_ect(TCP_SKB_CB(skb)->ip_dsfield))
+		inet_rsk(req)->ecn_ok = 1;
+}
+
 /* Due to TSO, an SKB can be composed of multiple actual
  * packets. To keep these tracked properly, we use this.
  */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index cf97e98..4ff5e1f 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1368,7 +1368,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 		goto drop_and_free;
 
 	if (!want_cookie || tmp_opt.tstamp_ok)
-		TCP_ECN_create_request(req, tcp_hdr(skb));
+		TCP_ECN_create_request(req, skb);
 
 	if (want_cookie) {
 		isn = cookie_v4_init_sequence(sk, skb, &req->mss);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 57b2109..078d039 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1140,7 +1140,7 @@ static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
 	treq->rmt_addr = ipv6_hdr(skb)->saddr;
 	treq->loc_addr = ipv6_hdr(skb)->daddr;
 	if (!want_cookie || tmp_opt.tstamp_ok)
-		TCP_ECN_create_request(req, tcp_hdr(skb));
+		TCP_ECN_create_request(req, skb);
 
 	treq->iif = sk->sk_bound_dev_if;
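To make the behavioural change easier to see outside the kernel tree, the following is a rough userspace model of the old and new acceptance checks. It is only a sketch: `struct syn_info`, `old_ecn_ok()`, `new_ecn_ok()` and `dsfield_is_not_ect()` are hypothetical stand-ins (the last one assumes `INET_ECN_is_not_ect()` tests that the two low-order DS-field bits are zero), and the sysctl is reduced to a boolean.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of an incoming SYN. */
struct syn_info {
	bool    ece;		/* ECE flag from the TCP header */
	bool    cwr;		/* CWR flag from the TCP header */
	uint8_t ip_dsfield;	/* TOS / traffic class byte */
};

/* Assumed equivalent of INET_ECN_is_not_ect(): ECN field is 00. */
static inline bool dsfield_is_not_ect(uint8_t dsfield)
{
	return (dsfield & 0x3) == 0;
}

/* Check before the patch: only the TCP flags are examined. */
static inline bool old_ecn_ok(bool sysctl_ecn, const struct syn_info *syn)
{
	return sysctl_ecn && syn->ece && syn->cwr;
}

/* Check after the patch: also require a clean ECN field in the IP header. */
static inline bool new_ecn_ok(bool sysctl_ecn, const struct syn_info *syn)
{
	return sysctl_ecn && syn->ece && syn->cwr &&
	       dsfield_is_not_ect(syn->ip_dsfield);
}
```

Under this model, a SYN with ECE and CWR set but a TOS byte of 0x03 would have been accepted for ECN by the old check and is refused by the new one, while a SYN with a zero ECN field is treated the same by both.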