Patchwork [net-next] tcp: add a global sysctl to control TCP delayed ack

login
register
mail settings
Submitter Amerigo Wang
Date April 4, 2013, 10:16 a.m.
Message ID <1365070560-11544-1-git-send-email-amwang@redhat.com>
Download mbox | patch
Permalink /patch/233700/
State Rejected
Delegated to: David Miller
Headers show

Comments

Amerigo Wang - April 4, 2013, 10:16 a.m.
From: Cong Wang <amwang@redhat.com>

Change from RFC:
* make the sysctl per netns

According to previous discussion, it seems there is no
reasonable heuristics.

Similar to TCP_QUICK_ACK option, but for people who can't
modify the source code and still wants to control
TCP delayed ACK behavior.

David, do you still have any objection?

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Rick Jones <rick.jones2@hp.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
CC: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Cong Wang <amwang@redhat.com>

---
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Frederic Sowa - April 4, 2013, 10:48 p.m.
On Thu, Apr 04, 2013 at 06:16:00PM +0800, Cong Wang wrote:
> From: Cong Wang <amwang@redhat.com>
> 
> Change from RFC:
> * make the sysctl per netns
> 
> According to previous discussion, it seems there is no
> reasonable heuristics.
> 
> Similar to TCP_QUICK_ACK option, but for people who can't
> modify the source code and still wants to control
> TCP delayed ACK behavior.
> 
> David, do you still have any objection?

I totally understand the objections that were given regarding this
patch. But for defense of this patch we also provide a knob to disable
slow start after idle, which from my point of view is as "evil" as
this change.

Btw I think we should make tcp_slow_start_after_idle namespace aware, too.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - April 4, 2013, 11:25 p.m.
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Fri, 5 Apr 2013 00:48:10 +0200

> I totally understand the objections that were given regarding this
> patch. But for defense of this patch we also provide a knob to disable
> slow start after idle, which from my point of view is as "evil" as
> this change.

I completely disagree, slow start after idle is way too aggressively
throwing past history away, so turning that off is much safer.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Greear - April 4, 2013, 11:39 p.m.
On 04/04/2013 04:25 PM, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Fri, 5 Apr 2013 00:48:10 +0200
>
>> I totally understand the objections that were given regarding this
>> patch. But for defense of this patch we also provide a knob to disable
>> slow start after idle, which from my point of view is as "evil" as
>> this change.
>
> I completely disagree, slow start after idle is way too aggressively
> throwing past history away, so turning that off is much safer.

I've been carrying a per-socket way to control delayed ack in my tree for a while,
and it is a big performance gain on wifi networks where the network is
basically half-duplex.  If I recall correctly, it's worth 50+Mbps
throughput improvement in some cases.  If it matters, I can post more
detailed numbers.

There are drawbacks when you set delayed ack too high:  The ramp-up time
for TCP takes a good bit longer.  But, some applications may want to trade
slower startup time for better bulk transport, and at moderate delayed-ack
values, the ramp up time is not noticeably impaired.

Thanks,
Ben
Hagen Paul Pfeifer - April 6, 2013, 3:38 p.m.
* Cong Wang | 2013-04-04 18:16:00 [+0800]:

>According to previous discussion, it seems there is no
>reasonable heuristics.
>
>Similar to TCP_QUICK_ACK option, but for people who can't
>modify the source code and still wants to control
>TCP delayed ACK behavior.
>
>David, do you still have any objection?

Maybe:

http://kerneltrap.org/mailarchive/linux-netdev/2010/8/23/6283640

I submitted a identical patch 3 years back. The consensus was to implement
this on a per-route tunable - not on a global sysctl tunable. OK, consensus
is not the correct wording ... ;-)

Hagen
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - April 7, 2013, 9:09 p.m.
From: Cong Wang <amwang@redhat.com>
Date: Thu,  4 Apr 2013 18:16:00 +0800

> From: Cong Wang <amwang@redhat.com>
> 
> Change from RFC:
> * make the sysctl per netns
> 
> According to previous discussion, it seems there is no
> reasonable heuristics.
> 
> Similar to TCP_QUICK_ACK option, but for people who can't
> modify the source code and still wants to control
> TCP delayed ACK behavior.
> 
> David, do you still have any objection?
> 
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Rick Jones <rick.jones2@hp.com>
> Cc: Stephen Hemminger <shemminger@vyatta.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Thomas Graf <tgraf@suug.ch>
> CC: David Laight <David.Laight@ACULAB.COM>
> Signed-off-by: Cong Wang <amwang@redhat.com>

I'm not applying a patch that adds a global parameter for
an attribute which has per-path scope.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Amerigo Wang - April 8, 2013, 1:45 a.m.
On Sun, 2013-04-07 at 17:09 -0400, David Miller wrote:
> 
> I'm not applying a patch that adds a global parameter for
> an attribute which has per-path scope.

Ok, I will make it per-route.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rick Jones - April 8, 2013, 4:39 p.m.
On 04/07/2013 06:45 PM, Cong Wang wrote:
> On Sun, 2013-04-07 at 17:09 -0400, David Miller wrote:
>>
>> I'm not applying a patch that adds a global parameter for
>> an attribute which has per-path scope.
>
> Ok, I will make it per-route.

Is setting a per-route attribute to cover the backside of an application 
for which source code is not available in this day and age really all 
that much easier than hitting that application with an LD_PRELOAD 
library to make the desired setsockopt() call?

rick jones
having flashbacks to similar conversations in other contexts in the 1990s...
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index f98ca63..9b39681 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -572,6 +572,11 @@  tcp_challenge_ack_limit - INTEGER
 	in RFC 5961 (Improving TCP's Robustness to Blind In-Window Attacks)
 	Default: 100
 
+tcp_quick_ack - BOOLEAN
+	Globally enables or disables TCP delayed ACK. The applications
+	can still change the quick ACK mode by TCP_QUICK_ACK option.
+	Default: off
+
 UDP variables:
 
 udp_mem - vector of 3 INTEGERs: min, pressure, max
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 2ba9de8..f03298a 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -63,6 +63,7 @@  struct netns_ipv4 {
 	int sysctl_icmp_errors_use_inbound_ifaddr;
 
 	int sysctl_tcp_ecn;
+	int sysctl_tcp_quick_ack;
 
 	kgid_t sysctl_ping_group_range[2];
 	long sysctl_tcp_mem[3];
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index fa2f63f..1db1780 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -837,6 +837,13 @@  static struct ctl_table ipv4_net_table[] = {
 		.mode		= 0644,
 		.proc_handler	= ipv4_tcp_mem,
 	},
+	{
+		.procname	= "tcp_quick_ack",
+		.data		= &init_net.ipv4.sysctl_tcp_quick_ack,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
 	{ }
 };
 
@@ -866,6 +873,8 @@  static __net_init int ipv4_sysctl_init_net(struct net *net)
 			&net->ipv4.sysctl_ping_group_range;
 		table[7].data =
 			&net->ipv4.sysctl_tcp_ecn;
+		table[9].data =
+			&net->ipv4.sysctl_tcp_quick_ack;
 
 		/* Don't export sysctls to unprivileged users */
 		if (net->user_ns != &init_user_ns)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 6d9ca35..a1b44f3 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3781,7 +3781,8 @@  static void tcp_fin(struct sock *sk)
 	case TCP_ESTABLISHED:
 		/* Move to CLOSE_WAIT */
 		tcp_set_state(sk, TCP_CLOSE_WAIT);
-		inet_csk(sk)->icsk_ack.pingpong = 1;
+		if (!sock_net(sk)->ipv4.sysctl_tcp_quick_ack)
+			inet_csk(sk)->icsk_ack.pingpong = 1;
 		break;
 
 	case TCP_CLOSE_WAIT:
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index af354c98..130fc99 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -169,8 +169,9 @@  static void tcp_event_data_sent(struct tcp_sock *tp,
 	/* If it is a reply for ato after last received
 	 * packet, enter pingpong mode.
 	 */
-	if ((u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato)
-		icsk->icsk_ack.pingpong = 1;
+	if ((u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato &&
+	    !sock_net(sk)->ipv4.sysctl_tcp_quick_ack)
+			icsk->icsk_ack.pingpong = 1;
 }
 
 /* Account for an ACK we sent. */