Message ID | 1316757739.2560.12.camel@edumazet-laptop |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Fri, 23 Sep 2011 08:02:19 +0200 > While playing with a new ADSL box at home, I discovered that ECN > blackhole can trigger suboptimal quickack mode on linux : We send one > ACK for each incoming data frame, without any delay and eventual > piggyback. > > This is because TCP_ECN_check_ce() considers that if no ECT is seen on a > segment, this is because this segment was a retransmit. > > Refine this heuristic and apply it only if we seen ECT in a previous > segment, to detect ECN blackhole at IP level. > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > CC: Jamal Hadi Salim <jhs@mojatatu.com> > CC: Jerry Chu <hkchu@google.com> > CC: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> > CC: Jim Gettys <jg@freedesktop.org> > CC: Dave Taht <dave.taht@gmail.com> > --- > Another possibility is to remove this (not in RFC 3168) heuristic, what > do you think ? Jamal, please give this a look over. Thanks! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 23 Sep 2011, Eric Dumazet wrote: > While playing with a new ADSL box at home, I discovered that ECN > blackhole can trigger suboptimal quickack mode on linux : We send one > ACK for each incoming data frame, without any delay and eventual > piggyback. > > This is because TCP_ECN_check_ce() considers that if no ECT is seen on a > segment, this is because this segment was a retransmit. > > Refine this heuristic and apply it only if we seen ECT in a previous > segment, to detect ECN blackhole at IP level. This seems sensible thing to do. Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
On Fri, 2011-09-23 at 08:02 +0200, Eric Dumazet wrote: > While playing with a new ADSL box at home, I discovered that ECN > blackhole can trigger suboptimal quickack mode on linux : We send one > ACK for each incoming data frame, without any delay and eventual > piggyback. > > This is because TCP_ECN_check_ce() considers that if no ECT is seen on a > segment, this is because this segment was a retransmit. If you have Linux as the sender, that assumption is legit (i.e we never set ECT on retransmits). > Refine this heuristic and apply it only if we seen ECT in a previous > segment, to detect ECN blackhole at IP level. So small confusion for me, Eric: Is theres some middlebox clearing the ECT but allowing the TCP header ECN flags to be set during SYN echange? If yes, why would not allowing quickack help? cheers, jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le vendredi 23 septembre 2011 à 20:55 -0400, Jamal Hadi Salim a écrit : > So small confusion for me, Eric: > Is theres some middlebox clearing the ECT but allowing the TCP header > ECN flags to be set during SYN echange? If yes, why would not allowing > quickack help? As you may know, the SYN and SYNACK packets dont have ECT bits, as mandated by RFC. So there is no way to detect a TOS blackhole in TCP session init. On following trace taken from my laptop connecting to a ssh server (192.168.2.1), you can see that my laptop sends ECT messages, but never receive ECT bits. Each sides believe the session is ECN enabled, as the SYN and SYNACK messages have the appropriate ECN markers. So we have somewhere a box that mask the TOS bits. This is a well known effect on some networks, unfortunately. The 'linux funny extension' is wrong in this case, since I have no retransmits, no losses, yet my laptop is in quickack mode (it sends an ACK for every incoming data packet), because you assumed receiving a packet with no ECT bits means this packet was a retransmit. 21:14:44.665405 IP (tos 0x0, ttl 64, id 63689, offset 0, flags [DF], proto TCP (6), length 60) 192.168.2.32.39529 > 192.168.2.1.22: Flags [SEW], cksum 0xf2ea (incorrect -> 0x2151), seq 1589453915, win 14600, options [mss 1460,sackOK,TS val 56466 ecr 0,nop,wscale 7], length 0 21:14:44.773643 IP (tos 0x0, ttl 46, id 0, offset 0, flags [DF], proto TCP (6), length 60) 192.168.2.1.22 > 192.168.2.32.39529: Flags [S.E], cksum 0xbce1 (correct), seq 1422101564, ack 1589453916, win 14480, options [mss 1452,sackOK,TS val 176781781 ecr 56466,nop,wscale 7], length 0 21:14:44.773691 IP (tos 0x0, ttl 64, id 63690, offset 0, flags [DF], proto TCP (6), length 52) 192.168.2.32.39529 > 192.168.2.1.22: Flags [.], cksum 0xf2e2 (incorrect -> 0x23f8), ack 1, win 115, options [nop,nop,TS val 56477 ecr 176781781], length 0 21:14:44.888804 IP (tos 0x0, ttl 46, id 62941, offset 0, flags [DF], proto TCP (6), length 91) 192.168.2.1.22 > 192.168.2.32.39529: Flags [P.], cksum 0x64d1 (correct), seq 1:40, ack 1, win 114, options [nop,nop,TS val 176781792 ecr 56477], length 39 21:14:44.888842 IP (tos 0x0, ttl 64, id 63691, offset 0, flags [DF], proto TCP (6), length 52) 192.168.2.32.39529 > 192.168.2.1.22: Flags [.], cksum 0xf2e2 (incorrect -> 0x23ba), ack 40, win 115, options [nop,nop,TS val 56489 ecr 176781792], length 0 21:14:44.888929 IP (tos 0x2,ECT(0), ttl 64, id 63692, offset 0, flags [DF], proto TCP (6), length 91) 192.168.2.32.39529 > 192.168.2.1.22: Flags [P.], cksum 0x649d (correct), seq 1:40, ack 40, win 115, options [nop,nop,TS val 56489 ecr 176781792], length 39 21:14:44.993714 IP (tos 0x0, ttl 46, id 62942, offset 0, flags [DF], proto TCP (6), length 52) 192.168.2.1.22 > 192.168.2.32.39529: Flags [.], cksum 0x2389 (correct), ack 40, win 114, options [nop,nop,TS val 176781803 ecr 56489], length 0 21:14:44.993758 IP (tos 0x2,ECT(0), ttl 64, id 63693, offset 0, flags [DF], proto TCP (6), length 1196) 192.168.2.32.39529 > 192.168.2.1.22: Flags [P.], cksum 0x1e1b (correct), seq 40:1184, ack 40, win 115, options [nop,nop,TS val 56499 ecr 176781803], length 1144 21:14:44.994447 IP (tos 0x0, ttl 46, id 62943, offset 0, flags [DF], proto TCP (6), length 900) 192.168.2.1.22 > 192.168.2.32.39529: Flags [P.], cksum 0xd3d4 (correct), seq 40:888, ack 40, win 114, options [nop,nop,TS val 176781803 ecr 56489], length 848 21:14:44.994463 IP (tos 0x0, ttl 64, id 63694, offset 0, flags [DF], proto TCP (6), length 52) 192.168.2.32.39529 > 192.168.2.1.22: Flags [.], cksum 0xf2e2 (incorrect -> 0x1ba9), ack 888, win 128, options [nop,nop,TS val 56499 ecr 176781803], length 0 21:14:45.546332 IP (tos 0x0, ttl 64, id 63695, offset 0, flags [DF], proto TCP (6), length 1196) 192.168.2.32.39529 > 192.168.2.1.22: Flags [P.], cksum 0x1a8c (correct), seq 40:1184, ack 888, win 128, options [nop,nop,TS val 56549 ecr 176781803], length 1144 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, 2011-09-25 at 21:24 +0200, Eric Dumazet wrote: > > So we have somewhere a box that mask the TOS bits. This is a well known > effect on some networks, unfortunately. ok > The 'linux funny extension' is wrong in this case, since I have no > retransmits, no losses, yet my laptop is in quickack mode (it sends an > ACK for every incoming data packet), because you assumed receiving a > packet with no ECT bits means this packet was a retransmit. No objection from me - only one corner case i can think of is if expected behavior happens and the first data packet was a retransmit (i.e you wouldnt have seen an ECT which you need for your heuristic), but that is probably not a big a deal. In regards to the 'linux funny extension' - this was a brilliant idea in my opinion back then from Alexey; lots of discussions happened but I cant remember if it made it in some RFC or not (I will try to search some old archives). cheers, jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, 2011-09-25 at 21:07 -0400, jamal wrote: > In regards to the 'linux funny extension' - this was a brilliant idea > in my opinion back then from Alexey; lots of discussions happened but > I cant remember if it made it in some RFC or not (I will try to search > some old archives). Wasnt hard: https://tools.ietf.org/id/draft-ietf-tsvwg-tcp-ecn-00.txt It doesnt seem to have made it to RFC and i cant remember why. [We dont wanna bring Sally out of retirement but we can ask KK if it bugs you;->] cheers, jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le dimanche 25 septembre 2011 à 21:13 -0400, jamal a écrit : > On Sun, 2011-09-25 at 21:07 -0400, jamal wrote: > > > In regards to the 'linux funny extension' - this was a brilliant idea > > in my opinion back then from Alexey; lots of discussions happened but > > I cant remember if it made it in some RFC or not (I will try to search > > some old archives). > > Wasnt hard: > https://tools.ietf.org/id/draft-ietf-tsvwg-tcp-ecn-00.txt > It doesnt seem to have made it to RFC and i cant remember why. > [We dont wanna bring Sally out of retirement but we can ask KK if > it bugs you;->] Thanks ! This refers to additions to RFC 2481 : This was refined by RFC 3168, and the retransmitted TCP packets requirement is part of the final RFC : 6.1.5. Retransmitted TCP packets This document specifies ECN-capable TCP implementations MUST NOT set either ECT codepoint (ECT(0) or ECT(1)) in the IP header for retransmitted data packets, ... Followed by a very long description :) BTW, the ECN+ proposal (RFC 5562 : Adding Explicit Congestion Notification (ECN) Capability to TCP's SYN/ACK Packets) would allow the client (receiving SYNACK message with ECT flags) to set the TCP_ECN_SEEN I added in my patch, allowing even the first (retransmitted) data packet to trigger quickack mode. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2011-09-26 at 10:26 +0200, Eric Dumazet wrote: > This refers to additions to RFC 2481 : This was refined by RFC 3168, and > the retransmitted TCP packets requirement is part of the final RFC : Ah, yes - thats where it went. > > BTW, the ECN+ proposal (RFC 5562 : Adding Explicit Congestion > Notification (ECN) Capability to TCP's SYN/ACK Packets) would allow the > client (receiving SYNACK message with ECT flags) to set the TCP_ECN_SEEN > I added in my patch, allowing even the first (retransmitted) data packet > to trigger quickack mode. Excellent ;-> Are you going to add 5562 support? cheers, jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le lundi 26 septembre 2011 à 08:00 -0400, jamal a écrit : > On Mon, 2011-09-26 at 10:26 +0200, Eric Dumazet wrote: > > > This refers to additions to RFC 2481 : This was refined by RFC 3168, and > > the retransmitted TCP packets requirement is part of the final RFC : > > Ah, yes - thats where it went. > > > > > BTW, the ECN+ proposal (RFC 5562 : Adding Explicit Congestion > > Notification (ECN) Capability to TCP's SYN/ACK Packets) would allow the > > client (receiving SYNACK message with ECT flags) to set the TCP_ECN_SEEN > > I added in my patch, allowing even the first (retransmitted) data packet > > to trigger quickack mode. > > Excellent ;-> > Are you going to add 5562 support? I sent a private mail to Adam Langley this morning asking him if he planned an official submission of a prior patch : http://www.ietf.org/mail-archive/web/tcpm/current/msg03988.html If Adam is too busy or not anymore interested, I would like to spend some cycles on this. Thanks -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Fri, 23 Sep 2011 08:02:19 +0200 > While playing with a new ADSL box at home, I discovered that ECN > blackhole can trigger suboptimal quickack mode on linux : We send one > ACK for each incoming data frame, without any delay and eventual > piggyback. > > This is because TCP_ECN_check_ce() considers that if no ECT is seen on a > segment, this is because this segment was a retransmit. > > Refine this heuristic and apply it only if we seen ECT in a previous > segment, to detect ECN blackhole at IP level. > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Applied to net-next, thanks Eric. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/net/tcp.h b/include/net/tcp.h index f357bef..702aefc 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -356,6 +356,7 @@ static inline void tcp_dec_quickack_mode(struct sock *sk, #define TCP_ECN_OK 1 #define TCP_ECN_QUEUE_CWR 2 #define TCP_ECN_DEMAND_CWR 4 +#define TCP_ECN_SEEN 8 static __inline__ void TCP_ECN_create_request(struct request_sock *req, struct tcphdr *th) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index a5d01b1..5a4408c 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -217,16 +217,25 @@ static inline void TCP_ECN_withdraw_cwr(struct tcp_sock *tp) tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR; } -static inline void TCP_ECN_check_ce(struct tcp_sock *tp, struct sk_buff *skb) +static inline void TCP_ECN_check_ce(struct tcp_sock *tp, const struct sk_buff *skb) { - if (tp->ecn_flags & TCP_ECN_OK) { - if (INET_ECN_is_ce(TCP_SKB_CB(skb)->flags)) - tp->ecn_flags |= TCP_ECN_DEMAND_CWR; + if (!(tp->ecn_flags & TCP_ECN_OK)) + return; + + switch (TCP_SKB_CB(skb)->flags & INET_ECN_MASK) { + case INET_ECN_NOT_ECT: /* Funny extension: if ECT is not set on a segment, - * it is surely retransmit. It is not in ECN RFC, - * but Linux follows this rule. */ - else if (INET_ECN_is_not_ect((TCP_SKB_CB(skb)->flags))) + * and we already seen ECT on a previous segment, + * it is probably a retransmit. + */ + if (tp->ecn_flags & TCP_ECN_SEEN) tcp_enter_quickack_mode((struct sock *)tp); + break; + case INET_ECN_CE: + tp->ecn_flags |= TCP_ECN_DEMAND_CWR; + /* fallinto */ + default: + tp->ecn_flags |= TCP_ECN_SEEN; } }
While playing with a new ADSL box at home, I discovered that ECN blackhole can trigger suboptimal quickack mode on linux : We send one ACK for each incoming data frame, without any delay and eventual piggyback. This is because TCP_ECN_check_ce() considers that if no ECT is seen on a segment, this is because this segment was a retransmit. Refine this heuristic and apply it only if we seen ECT in a previous segment, to detect ECN blackhole at IP level. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jamal Hadi Salim <jhs@mojatatu.com> CC: Jerry Chu <hkchu@google.com> CC: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> CC: Jim Gettys <jg@freedesktop.org> CC: Dave Taht <dave.taht@gmail.com> --- Another possibility is to remove this (not in RFC 3168) heuristic, what do you think ? include/net/tcp.h | 1 + net/ipv4/tcp_input.c | 23 ++++++++++++++++------- 2 files changed, 17 insertions(+), 7 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html