diff mbox

tcp: ECN blackhole should not force quickack mode

Message ID 1316757739.2560.12.camel@edumazet-laptop
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Sept. 23, 2011, 6:02 a.m. UTC
While playing with a new ADSL box at home, I discovered that ECN
blackhole can trigger suboptimal quickack mode on linux : We send one
ACK for each incoming data frame, without any delay and eventual
piggyback.

This is because TCP_ECN_check_ce() considers that if no ECT is seen on a
segment, this is because this segment was a retransmit.

Refine this heuristic and apply it only if we seen ECT in a previous
segment, to detect ECN blackhole at IP level.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jamal Hadi Salim <jhs@mojatatu.com>
CC: Jerry Chu <hkchu@google.com>
CC: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
CC: Jim Gettys <jg@freedesktop.org>
CC: Dave Taht <dave.taht@gmail.com>
---
Another possibility is to remove this (not in RFC 3168) heuristic, what
do you think ?

 include/net/tcp.h    |    1 +
 net/ipv4/tcp_input.c |   23 ++++++++++++++++-------
 2 files changed, 17 insertions(+), 7 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller Sept. 23, 2011, 5:47 p.m. UTC | #1
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 23 Sep 2011 08:02:19 +0200

> While playing with a new ADSL box at home, I discovered that ECN
> blackhole can trigger suboptimal quickack mode on linux : We send one
> ACK for each incoming data frame, without any delay and eventual
> piggyback.
> 
> This is because TCP_ECN_check_ce() considers that if no ECT is seen on a
> segment, this is because this segment was a retransmit.
> 
> Refine this heuristic and apply it only if we seen ECT in a previous
> segment, to detect ECN blackhole at IP level.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Jamal Hadi Salim <jhs@mojatatu.com>
> CC: Jerry Chu <hkchu@google.com>
> CC: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> CC: Jim Gettys <jg@freedesktop.org>
> CC: Dave Taht <dave.taht@gmail.com>
> ---
> Another possibility is to remove this (not in RFC 3168) heuristic, what
> do you think ?

Jamal, please give this a look over.  Thanks!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ilpo Järvinen Sept. 23, 2011, 7:17 p.m. UTC | #2
On Fri, 23 Sep 2011, Eric Dumazet wrote:

> While playing with a new ADSL box at home, I discovered that ECN
> blackhole can trigger suboptimal quickack mode on linux : We send one
> ACK for each incoming data frame, without any delay and eventual
> piggyback.
> 
> This is because TCP_ECN_check_ce() considers that if no ECT is seen on a
> segment, this is because this segment was a retransmit.
> 
> Refine this heuristic and apply it only if we seen ECT in a previous
> segment, to detect ECN blackhole at IP level.

This seems sensible thing to do.

Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Jamal Hadi Salim Sept. 24, 2011, 12:55 a.m. UTC | #3
On Fri, 2011-09-23 at 08:02 +0200, Eric Dumazet wrote:
> While playing with a new ADSL box at home, I discovered that ECN
> blackhole can trigger suboptimal quickack mode on linux : We send one
> ACK for each incoming data frame, without any delay and eventual
> piggyback.
>
> This is because TCP_ECN_check_ce() considers that if no ECT is seen on a
> segment, this is because this segment was a retransmit.

If you have Linux as the sender, that assumption is legit (i.e we never
set ECT on retransmits).

> Refine this heuristic and apply it only if we seen ECT in a previous
> segment, to detect ECN blackhole at IP level.

So  small confusion for me, Eric:
Is theres some middlebox clearing the ECT but allowing the TCP header
ECN flags to be set during SYN echange?  If yes, why would not allowing
quickack help?

cheers,
jamal





--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Sept. 25, 2011, 7:24 p.m. UTC | #4
Le vendredi 23 septembre 2011 à 20:55 -0400, Jamal Hadi Salim a écrit :

> So  small confusion for me, Eric:
> Is theres some middlebox clearing the ECT but allowing the TCP header
> ECN flags to be set during SYN echange?  If yes, why would not allowing
> quickack help?

As you may know, the SYN and SYNACK packets dont have ECT bits, as
mandated by RFC. So there is no way to detect a TOS blackhole in TCP
session init.

On following trace taken from my laptop connecting to a ssh server
(192.168.2.1), you can see that my laptop sends ECT messages, but never
receive ECT bits. Each sides believe the session is ECN enabled, as the
SYN and SYNACK messages have the appropriate ECN markers.

So we have somewhere a box that mask the TOS bits. This is a well known
effect on some networks, unfortunately.

The 'linux funny extension' is wrong in this case, since I have no
retransmits, no losses, yet my laptop is in quickack mode (it sends an
ACK for every incoming data packet), because you assumed receiving a
packet with no ECT bits means this packet was a retransmit.



21:14:44.665405 IP (tos 0x0, ttl 64, id 63689, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.2.32.39529 > 192.168.2.1.22: Flags [SEW], cksum 0xf2ea (incorrect -> 0x2151), seq 1589453915, win 14600, options [mss 1460,sackOK,TS val 56466 ecr 0,nop,wscale 7], length 0

21:14:44.773643 IP (tos 0x0, ttl 46, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.2.1.22 > 192.168.2.32.39529: Flags [S.E], cksum 0xbce1 (correct), seq 1422101564, ack 1589453916, win 14480, options [mss 1452,sackOK,TS val 176781781 ecr 56466,nop,wscale 7], length 0

21:14:44.773691 IP (tos 0x0, ttl 64, id 63690, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.2.32.39529 > 192.168.2.1.22: Flags [.], cksum 0xf2e2 (incorrect -> 0x23f8), ack 1, win 115, options [nop,nop,TS val 56477 ecr 176781781], length 0

21:14:44.888804 IP (tos 0x0, ttl 46, id 62941, offset 0, flags [DF], proto TCP (6), length 91)
    192.168.2.1.22 > 192.168.2.32.39529: Flags [P.], cksum 0x64d1 (correct), seq 1:40, ack 1, win 114, options [nop,nop,TS val 176781792 ecr 56477], length 39

21:14:44.888842 IP (tos 0x0, ttl 64, id 63691, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.2.32.39529 > 192.168.2.1.22: Flags [.], cksum 0xf2e2 (incorrect -> 0x23ba), ack 40, win 115, options [nop,nop,TS val 56489 ecr 176781792], length 0

21:14:44.888929 IP (tos 0x2,ECT(0), ttl 64, id 63692, offset 0, flags [DF], proto TCP (6), length 91)
    192.168.2.32.39529 > 192.168.2.1.22: Flags [P.], cksum 0x649d (correct), seq 1:40, ack 40, win 115, options [nop,nop,TS val 56489 ecr 176781792], length 39

21:14:44.993714 IP (tos 0x0, ttl 46, id 62942, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.2.1.22 > 192.168.2.32.39529: Flags [.], cksum 0x2389 (correct), ack 40, win 114, options [nop,nop,TS val 176781803 ecr 56489], length 0

21:14:44.993758 IP (tos 0x2,ECT(0), ttl 64, id 63693, offset 0, flags [DF], proto TCP (6), length 1196)
    192.168.2.32.39529 > 192.168.2.1.22: Flags [P.], cksum 0x1e1b (correct), seq 40:1184, ack 40, win 115, options [nop,nop,TS val 56499 ecr 176781803], length 1144

21:14:44.994447 IP (tos 0x0, ttl 46, id 62943, offset 0, flags [DF], proto TCP (6), length 900)
    192.168.2.1.22 > 192.168.2.32.39529: Flags [P.], cksum 0xd3d4 (correct), seq 40:888, ack 40, win 114, options [nop,nop,TS val 176781803 ecr 56489], length 848

21:14:44.994463 IP (tos 0x0, ttl 64, id 63694, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.2.32.39529 > 192.168.2.1.22: Flags [.], cksum 0xf2e2 (incorrect -> 0x1ba9), ack 888, win 128, options [nop,nop,TS val 56499 ecr 176781803], length 0

21:14:45.546332 IP (tos 0x0, ttl 64, id 63695, offset 0, flags [DF], proto TCP (6), length 1196)
    192.168.2.32.39529 > 192.168.2.1.22: Flags [P.], cksum 0x1a8c (correct), seq 40:1184, ack 888, win 128, options [nop,nop,TS val 56549 ecr 176781803], length 1144



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
jamal Sept. 26, 2011, 1:07 a.m. UTC | #5
On Sun, 2011-09-25 at 21:24 +0200, Eric Dumazet wrote:

> 
> So we have somewhere a box that mask the TOS bits. This is a well known
> effect on some networks, unfortunately.

ok

> The 'linux funny extension' is wrong in this case, since I have no
> retransmits, no losses, yet my laptop is in quickack mode (it sends an
> ACK for every incoming data packet), because you assumed receiving a
> packet with no ECT bits means this packet was a retransmit.

No objection from me - only one corner case i can think of is if
expected behavior happens and the first data packet was a retransmit
(i.e you wouldnt have seen an ECT which you need for your heuristic),
but that is probably not a big a deal. 

In regards to the 'linux funny extension' - this was a brilliant idea
in my opinion back then from Alexey; lots of discussions happened but
I cant remember if it made it in some RFC or not (I will try to search
some old archives).

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
jamal Sept. 26, 2011, 1:13 a.m. UTC | #6
On Sun, 2011-09-25 at 21:07 -0400, jamal wrote:

> In regards to the 'linux funny extension' - this was a brilliant idea
> in my opinion back then from Alexey; lots of discussions happened but
> I cant remember if it made it in some RFC or not (I will try to search
> some old archives).

Wasnt hard:
https://tools.ietf.org/id/draft-ietf-tsvwg-tcp-ecn-00.txt
It doesnt seem to have made it to RFC and i cant remember why.
[We dont wanna bring Sally out of retirement but we can ask KK if
it bugs you;->]

cheers,
jamal


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Sept. 26, 2011, 8:26 a.m. UTC | #7
Le dimanche 25 septembre 2011 à 21:13 -0400, jamal a écrit :
> On Sun, 2011-09-25 at 21:07 -0400, jamal wrote:
> 
> > In regards to the 'linux funny extension' - this was a brilliant idea
> > in my opinion back then from Alexey; lots of discussions happened but
> > I cant remember if it made it in some RFC or not (I will try to search
> > some old archives).
> 
> Wasnt hard:
> https://tools.ietf.org/id/draft-ietf-tsvwg-tcp-ecn-00.txt
> It doesnt seem to have made it to RFC and i cant remember why.
> [We dont wanna bring Sally out of retirement but we can ask KK if
> it bugs you;->]

Thanks !

This refers to additions to RFC 2481 : This was refined by RFC 3168, and
the retransmitted TCP packets requirement is part of the final RFC :

6.1.5.  Retransmitted TCP packets

   This document specifies ECN-capable TCP implementations MUST NOT set
   either ECT codepoint (ECT(0) or ECT(1)) in the IP header for
   retransmitted data packets, ...

Followed by a very long description :)

BTW, the ECN+ proposal (RFC 5562 :  Adding Explicit Congestion
Notification (ECN) Capability to TCP's SYN/ACK Packets) would allow the
client (receiving SYNACK message with ECT flags) to set the TCP_ECN_SEEN
I added in my patch, allowing even the first (retransmitted) data packet
to trigger quickack mode.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
jamal Sept. 26, 2011, noon UTC | #8
On Mon, 2011-09-26 at 10:26 +0200, Eric Dumazet wrote:

> This refers to additions to RFC 2481 : This was refined by RFC 3168, and
> the retransmitted TCP packets requirement is part of the final RFC :

Ah, yes - thats where it went.

> 
> BTW, the ECN+ proposal (RFC 5562 :  Adding Explicit Congestion
> Notification (ECN) Capability to TCP's SYN/ACK Packets) would allow the
> client (receiving SYNACK message with ECT flags) to set the TCP_ECN_SEEN
> I added in my patch, allowing even the first (retransmitted) data packet
> to trigger quickack mode.

Excellent ;->
Are you going to add 5562 support?

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Sept. 26, 2011, 12:44 p.m. UTC | #9
Le lundi 26 septembre 2011 à 08:00 -0400, jamal a écrit :
> On Mon, 2011-09-26 at 10:26 +0200, Eric Dumazet wrote:
> 
> > This refers to additions to RFC 2481 : This was refined by RFC 3168, and
> > the retransmitted TCP packets requirement is part of the final RFC :
> 
> Ah, yes - thats where it went.
> 
> > 
> > BTW, the ECN+ proposal (RFC 5562 :  Adding Explicit Congestion
> > Notification (ECN) Capability to TCP's SYN/ACK Packets) would allow the
> > client (receiving SYNACK message with ECT flags) to set the TCP_ECN_SEEN
> > I added in my patch, allowing even the first (retransmitted) data packet
> > to trigger quickack mode.
> 
> Excellent ;->
> Are you going to add 5562 support?

I sent a private mail to Adam Langley this morning asking him if he
planned an official submission of a prior patch :  
http://www.ietf.org/mail-archive/web/tcpm/current/msg03988.html

If Adam is too busy or not anymore interested, I would like to spend
some cycles on this.

Thanks


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Sept. 27, 2011, 4:59 a.m. UTC | #10
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 23 Sep 2011 08:02:19 +0200

> While playing with a new ADSL box at home, I discovered that ECN
> blackhole can trigger suboptimal quickack mode on linux : We send one
> ACK for each incoming data frame, without any delay and eventual
> piggyback.
> 
> This is because TCP_ECN_check_ce() considers that if no ECT is seen on a
> segment, this is because this segment was a retransmit.
> 
> Refine this heuristic and apply it only if we seen ECT in a previous
> segment, to detect ECN blackhole at IP level.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied to net-next, thanks Eric.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/net/tcp.h b/include/net/tcp.h
index f357bef..702aefc 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -356,6 +356,7 @@  static inline void tcp_dec_quickack_mode(struct sock *sk,
 #define	TCP_ECN_OK		1
 #define	TCP_ECN_QUEUE_CWR	2
 #define	TCP_ECN_DEMAND_CWR	4
+#define	TCP_ECN_SEEN		8
 
 static __inline__ void
 TCP_ECN_create_request(struct request_sock *req, struct tcphdr *th)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a5d01b1..5a4408c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -217,16 +217,25 @@  static inline void TCP_ECN_withdraw_cwr(struct tcp_sock *tp)
 	tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR;
 }
 
-static inline void TCP_ECN_check_ce(struct tcp_sock *tp, struct sk_buff *skb)
+static inline void TCP_ECN_check_ce(struct tcp_sock *tp, const struct sk_buff *skb)
 {
-	if (tp->ecn_flags & TCP_ECN_OK) {
-		if (INET_ECN_is_ce(TCP_SKB_CB(skb)->flags))
-			tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
+	if (!(tp->ecn_flags & TCP_ECN_OK))
+		return;
+
+	switch (TCP_SKB_CB(skb)->flags & INET_ECN_MASK) {
+	case INET_ECN_NOT_ECT:
 		/* Funny extension: if ECT is not set on a segment,
-		 * it is surely retransmit. It is not in ECN RFC,
-		 * but Linux follows this rule. */
-		else if (INET_ECN_is_not_ect((TCP_SKB_CB(skb)->flags)))
+		 * and we already seen ECT on a previous segment,
+		 * it is probably a retransmit.
+		 */
+		if (tp->ecn_flags & TCP_ECN_SEEN)
 			tcp_enter_quickack_mode((struct sock *)tp);
+		break;
+	case INET_ECN_CE:
+		tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
+		/* fallinto */
+	default:
+		tp->ecn_flags |= TCP_ECN_SEEN;
 	}
 }