Message ID | cc0be24818d7009208dd0d8f73cc35939957f134.1432059419.git.daniel@iogearbox.net |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
On Tue, 2015-05-19 at 21:33 +0200, Daniel Borkmann wrote:
...
>
> Reference: https://www.ietf.org/proceedings/92/slides/slides-92-iccrg-1.pdf
> Reference: https://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> Signed-off-by: Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch>
> Signed-off-by: Brian Trammell <trammell@tik.ee.ethz.ch>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Dave Täht <dave.taht@gmail.com>
> ---
> v2 -> v3:
>  - Very sorry. Typo happened in Dave's name since v1, getting it right
>    this time, no bad intentions. ;)
> v1 -> v2:
>  - Added suggestion from Eric to let ecn_flags be cleared eventually in
>    tcp_ecn_rcv_synack(), thanks!
>  - Rest as is.

Acked-by: Eric Dumazet <edumazet@google.com>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Tue, 19 May 2015 21:33:42 +0200

> This work as a follow-up of commit f7b3bec6f516 ("net: allow setting ecn
> via routing table") and adds RFC3168 section 6.1.1.1. fallback for outgoing
> ECN connections. In other words, this work adds a retry with a non-ECN
> setup SYN packet, as suggested from the RFC on the first timeout:
>
>   [...] A host that receives no reply to an ECN-setup SYN within the
>   normal SYN retransmission timeout interval MAY resend the SYN and
>   any subsequent SYN retransmissions with CWR and ECE cleared. [...]
 ...
> Reference: https://www.ietf.org/proceedings/92/slides/slides-92-iccrg-1.pdf
> Reference: https://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> Signed-off-by: Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch>
> Signed-off-by: Brian Trammell <trammell@tik.ee.ethz.ch>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Dave Täht <dave.taht@gmail.com>
> ---
> v2 -> v3:
>  - Very sorry. Typo happened in Dave's name since v1, getting it right
>    this time, no bad intentions. ;)
> v1 -> v2:
>  - Added suggestion from Eric to let ecn_flags be cleared eventually in
>    tcp_ecn_rcv_synack(), thanks!
>  - Rest as is.

Applied, thanks everyone.
Hi Daniel,

With this commit, ifconfig does not show any of the interfaces and I
don't have any connectivity as a result.
Can you double check this?

Thanks!
Vijay

On 19 May 2015 at 13:54, David Miller <davem@davemloft.net> wrote:
> From: Daniel Borkmann <daniel@iogearbox.net>
> Date: Tue, 19 May 2015 21:33:42 +0200
>
>> This work as a follow-up of commit f7b3bec6f516 ("net: allow setting ecn
>> via routing table") and adds RFC3168 section 6.1.1.1. fallback for outgoing
>> ECN connections. In other words, this work adds a retry with a non-ECN
>> setup SYN packet, as suggested from the RFC on the first timeout:
>>
>>   [...] A host that receives no reply to an ECN-setup SYN within the
>>   normal SYN retransmission timeout interval MAY resend the SYN and
>>   any subsequent SYN retransmissions with CWR and ECE cleared. [...]
>  ...
>> Reference: https://www.ietf.org/proceedings/92/slides/slides-92-iccrg-1.pdf
>> Reference: https://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>> Signed-off-by: Florian Westphal <fw@strlen.de>
>> Signed-off-by: Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch>
>> Signed-off-by: Brian Trammell <trammell@tik.ee.ethz.ch>
>> Cc: Eric Dumazet <edumazet@google.com>
>> Cc: Dave Täht <dave.taht@gmail.com>
>> ---
>> v2 -> v3:
>>  - Very sorry. Typo happened in Dave's name since v1, getting it right
>>    this time, no bad intentions. ;)
>> v1 -> v2:
>>  - Added suggestion from Eric to let ecn_flags be cleared eventually in
>>    tcp_ecn_rcv_synack(), thanks!
>>  - Rest as is.
>
> Applied, thanks everyone.
On Wed, 2015-05-20 at 11:13 -0700, Vijay Subramanian wrote:
> Hi Daniel,
>
> With this commit, ifconfig does not show any of the interfaces and I
> don't have any connectivity as a result.
> Can you double check this?

Please do not top post.

No problem here. I do not see obvious reasons for breaking your setup.
>
> Please do not top post.

My fault. Will be more careful in future.

>
> No problem here. I do not see obvious reasons for breaking your setup.
>

It was a problem with my network driver not getting installed due to
some symbol mismatch after a compile. It got sorted out after I cleaned up
everything. This was a false alarm. Apologies for the noise.

Vijay
diff --git a/Documentation/networking/dctcp.txt b/Documentation/networking/dctcp.txt
index 0d5dfbc..cd9d3eb 100644
--- a/Documentation/networking/dctcp.txt
+++ b/Documentation/networking/dctcp.txt
@@ -8,6 +8,7 @@ the data center network to provide multi-bit feedback to the end hosts.
 To enable it on end hosts:
 
   sysctl -w net.ipv4.tcp_congestion_control=dctcp
+  sysctl -w net.ipv4.tcp_ecn_fallback=0 (optional)
 
 All switches in the data center network running DCTCP must support ECN
 marking and be configured for marking when reaching defined switch buffer
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 5095c63..cb083e0 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -267,6 +267,15 @@ tcp_ecn - INTEGER
 		  but do not request ECN on outgoing connections.
 	Default: 2
 
+tcp_ecn_fallback - BOOLEAN
+	If the kernel detects that ECN connection misbehaves, enable fall
+	back to non-ECN. Currently, this knob implements the fallback
+	from RFC3168, section 6.1.1.1., but we reserve that in future,
+	additional detection mechanisms could be implemented under this
+	knob. The value is not used, if tcp_ecn or per route (or congestion
+	control) ECN settings are disabled.
+	Default: 1 (fallback enabled)
+
 tcp_fack - BOOLEAN
 	Enable FACK congestion avoidance and fast retransmission.
 	The value is not used, if tcp_sack is not enabled.
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 614a49b..6848b8b 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -77,6 +77,8 @@ struct netns_ipv4 {
 	struct local_ports ip_local_ports;
 
 	int sysctl_tcp_ecn;
+	int sysctl_tcp_ecn_fallback;
+
 	int sysctl_ip_no_pmtu_disc;
 	int sysctl_ip_fwd_use_pmtu;
 	int sysctl_ip_nonlocal_bind;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7ace6ac..3275f93 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -702,6 +702,8 @@ static inline u32 tcp_skb_timestamp(const struct sk_buff *skb)
 #define TCPHDR_ECE 0x40
 #define TCPHDR_CWR 0x80
 
+#define TCPHDR_SYN_ECN	(TCPHDR_SYN | TCPHDR_ECE | TCPHDR_CWR)
+
 /* This is what the send packet queuing engine uses to pass
  * TCP per-packet control information to the transmission code.
  * We also store the host-order sequence numbers in here too.
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index c3852a7..841de32 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -821,6 +821,13 @@ static struct ctl_table ipv4_net_table[] = {
 		.proc_handler	= proc_dointvec
 	},
 	{
+		.procname	= "tcp_ecn_fallback",
+		.data		= &init_net.ipv4.sysctl_tcp_ecn_fallback,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
+	{
 		.procname	= "ip_local_port_range",
 		.maxlen		= sizeof(init_net.ipv4.ip_local_ports.range),
 		.data		= &init_net.ipv4.ip_local_ports.range,
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 91cb476..0cc4b5a 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2411,12 +2411,15 @@ static int __net_init tcp_sk_init(struct net *net)
 			goto fail;
 		*per_cpu_ptr(net->ipv4.tcp_sk, cpu) = sk;
 	}
+
 	net->ipv4.sysctl_tcp_ecn = 2;
+	net->ipv4.sysctl_tcp_ecn_fallback = 1;
+
 	net->ipv4.sysctl_tcp_base_mss = TCP_BASE_MSS;
 	net->ipv4.sysctl_tcp_probe_threshold = TCP_PROBE_THRESHOLD;
 	net->ipv4.sysctl_tcp_probe_interval = TCP_PROBE_INTERVAL;
-	return 0;
 
+	return 0;
 fail:
 	tcp_sk_exit(net);
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 7386d32..a057054 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -350,6 +350,15 @@ static void tcp_ecn_send_syn(struct sock *sk, struct sk_buff *skb)
 	}
 }
 
+static void tcp_ecn_clear_syn(struct sock *sk, struct sk_buff *skb)
+{
+	if (sock_net(sk)->ipv4.sysctl_tcp_ecn_fallback)
+		/* tp->ecn_flags are cleared at a later point in time when
+		 * SYN ACK is ultimatively being received.
+		 */
+		TCP_SKB_CB(skb)->tcp_flags &= ~(TCPHDR_ECE | TCPHDR_CWR);
+}
+
 static void
 tcp_ecn_make_synack(const struct request_sock *req, struct tcphdr *th,
 		    struct sock *sk)
@@ -2615,6 +2624,10 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
 		}
 	}
 
+	/* RFC3168, section 6.1.1.1. ECN fallback */
+	if ((TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN_ECN) == TCPHDR_SYN_ECN)
+		tcp_ecn_clear_syn(sk, skb);
+
 	tcp_retrans_try_collapse(sk, skb, cur_mss);
 
 	/* Make a copy, if the first transmission SKB clone we made
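
For readers who want to try the flag handling outside the kernel, here is a minimal userspace sketch of the check added to __tcp_retransmit_skb(). It is not part of the patch. The TCPHDR_* values mirror include/net/tcp.h; the function name retransmit_flags() and its `fallback` parameter (standing in for the tcp_ecn_fallback sysctl) are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Flag values as defined in include/net/tcp.h. */
#define TCPHDR_SYN	0x02
#define TCPHDR_ECE	0x40
#define TCPHDR_CWR	0x80
#define TCPHDR_SYN_ECN	(TCPHDR_SYN | TCPHDR_ECE | TCPHDR_CWR)

/* Hypothetical stand-in for the retransmit-path check: returns the
 * flags a retransmitted segment would carry. `fallback` plays the
 * role of net.ipv4.tcp_ecn_fallback (non-zero means enabled).
 */
static uint8_t retransmit_flags(uint8_t flags, int fallback)
{
	/* Only an ECN-setup SYN (SYN, ECE and CWR all set) qualifies;
	 * every other segment is passed through unchanged.
	 */
	if ((flags & TCPHDR_SYN_ECN) == TCPHDR_SYN_ECN && fallback)
		flags &= (uint8_t)~(TCPHDR_ECE | TCPHDR_CWR);

	return flags;
}
```

Calling retransmit_flags(TCPHDR_SYN_ECN, 1) yields a plain SYN, which matches the RFC3168 section 6.1.1.1 retry; with `fallback` set to 0 the ECN-setup SYN is resent unchanged. The sketch only covers the SYN flags. As the comment in tcp_ecn_clear_syn() notes, tp->ecn_flags is cleared later, when the SYN ACK arrives.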