diff mbox

[v2] net:Add sysctl_tcp_sg_max_skb_frags

Message ID 1453900829-3384-1-git-send-email-hans.westgaard.ry@oracle.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Hans Westgaard Ry Jan. 27, 2016, 1:20 p.m. UTC
Devices may have limits on the number of fragments in an skb they support.
Current codebase uses a constant as maximum for number of fragments one
skb can hold and use.
When enabling scatter/gather and running traffic with many small messages
the codebase uses the maximum number of fragments and may thereby violate
the max for certain devices.
The patch introduces a global variable as max number of fragments in
scatter/gather.

Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: HÃ¥kon Bugge <haakon.bugge@oracle.com>

---
 include/net/tcp.h          |  2 ++
 net/ipv4/sysctl_net_ipv4.c | 10 ++++++++++
 net/ipv4/tcp.c             |  8 +++++---
 3 files changed, 17 insertions(+), 3 deletions(-)

Comments

Eric Dumazet Jan. 27, 2016, 3:15 p.m. UTC | #1
On Wed, 2016-01-27 at 14:20 +0100, Hans Westgaard Ry wrote:
> Devices may have limits on the number of fragments in an skb they support.
> Current codebase uses a constant as maximum for number of fragments one
> skb can hold and use.
> When enabling scatter/gather and running traffic with many small messages
> the codebase uses the maximum number of fragments and may thereby violate
> the max for certain devices.
> The patch introduces a global variable as max number of fragments in
> scatter/gather.


Principle looks good, but we have to ask if other skb providers [1] will
add other sysctl, or if we could share a common one ?

If it is a common one, it should be /proc/sys/net/core/... instead
of /proc/sys/net/ipv4/tcp_....



Other providers include :

1) GRO stack
2) callers of sock_alloc_send_pskb(), alloc_skb_with_frags(),
sock_alloc_send_skb() ..

Thanks !
Hannes Frederic Sowa Jan. 27, 2016, 6:12 p.m. UTC | #2
On 27.01.2016 16:15, Eric Dumazet wrote:
> On Wed, 2016-01-27 at 14:20 +0100, Hans Westgaard Ry wrote:
>> Devices may have limits on the number of fragments in an skb they support.
>> Current codebase uses a constant as maximum for number of fragments one
>> skb can hold and use.
>> When enabling scatter/gather and running traffic with many small messages
>> the codebase uses the maximum number of fragments and may thereby violate
>> the max for certain devices.
>> The patch introduces a global variable as max number of fragments in
>> scatter/gather.
>
>
> Principle looks good, but we have to ask if other skb providers [1] will
> add other sysctl, or if we could share a common one ?
>
> If it is a common one, it should be /proc/sys/net/core/... instead
> of /proc/sys/net/ipv4/tcp_....
 >
> Other providers include :
>
> 1) GRO stack
> 2) callers of sock_alloc_send_pskb(), alloc_skb_with_frags(),
> sock_alloc_send_skb() ..

I agree, this knob should get a generic name and live in a generic net/ 
directory to control this globally, so things don't break during 
forwarding etc.

It does not solve the problem completely, e.g. when VMs send gso packets 
through a vhost-net onto IPoIB, no?

Thanks,
Hannes
David Miller Jan. 27, 2016, 8:13 p.m. UTC | #3
From: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Date: Wed, 27 Jan 2016 14:20:29 +0100

> Devices may have limits on the number of fragments in an skb they support.

Then this belongs as a global networking setting not a protocol
specfic one.
Hans Westgaard Ry Feb. 1, 2016, 1:12 p.m. UTC | #4
On 01/27/2016 07:12 PM, Hannes Frederic Sowa wrote:
> On 27.01.2016 16:15, Eric Dumazet wrote:
>>
>> If it is a common one, it should be /proc/sys/net/core/... instead
>> of /proc/sys/net/ipv4/tcp_....
> >
>> Other providers include :
>>
>> 1) GRO stack
>> 2) callers of sock_alloc_send_pskb(), alloc_skb_with_frags(),
>> sock_alloc_send_skb() ..
>
> I agree, this knob should get a generic name and live in a generic 
> net/ directory to control this globally, so things don't break during 
> forwarding etc.
>
> It does not solve the problem completely, e.g. when VMs send gso 
> packets through a vhost-net onto IPoIB, no?
>
> Thanks,
> Hannes
>
I have understood more of the problem Hannes raises and I realize that 
we need a slowpath in IPoIB to handle the problem
in question. I will make a new version of this patch with a more correct 
position/name of the sysctl-variable.
I'll also make a separate patch adding the slowpath i IPoIB.

Hans
diff mbox

Patch

diff --git a/include/net/tcp.h b/include/net/tcp.h
index f80e74c..8d18df3 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -285,6 +285,8 @@  extern int sysctl_tcp_invalid_ratelimit;
 extern int sysctl_tcp_pacing_ss_ratio;
 extern int sysctl_tcp_pacing_ca_ratio;
 
+extern int sysctl_tcp_sg_max_skb_frags;
+
 extern atomic_long_t tcp_memory_allocated;
 extern struct percpu_counter tcp_sockets_allocated;
 extern int tcp_memory_pressure;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index a0bd7a5..498dcf9 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -40,6 +40,7 @@  static int ip_ttl_min = 1;
 static int ip_ttl_max = 255;
 static int tcp_syn_retries_min = 1;
 static int tcp_syn_retries_max = MAX_TCP_SYNCNT;
+static int tcp_sg_max_skb_frags = MAX_SKB_FRAGS;
 static int ip_ping_group_range_min[] = { 0, 0 };
 static int ip_ping_group_range_max[] = { GID_T_MAX, GID_T_MAX };
 
@@ -761,6 +762,15 @@  static struct ctl_table ipv4_table[] = {
 		.proc_handler	= proc_dointvec_ms_jiffies,
 	},
 	{
+		.procname	= "tcp_sg_max_skb_frags",
+		.data		= &sysctl_tcp_sg_max_skb_frags,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &one,
+		.extra2		= &tcp_sg_max_skb_frags,
+	},
+	{
 		.procname	= "icmp_msgs_per_sec",
 		.data		= &sysctl_icmp_msgs_per_sec,
 		.maxlen		= sizeof(int),
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index c82cca1..928d2c7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -300,7 +300,9 @@  EXPORT_SYMBOL(sysctl_tcp_wmem);
 
 atomic_long_t tcp_memory_allocated;	/* Current allocated memory. */
 EXPORT_SYMBOL(tcp_memory_allocated);
-
+/* maximum number of fragments for tcp scatter/gather */
+int sysctl_tcp_sg_max_skb_frags __read_mostly = MAX_SKB_FRAGS;
+EXPORT_SYMBOL(sysctl_tcp_sg_max_skb_frags);
 /*
  * Current number of TCP sockets.
  */
@@ -938,7 +940,7 @@  new_segment:
 
 		i = skb_shinfo(skb)->nr_frags;
 		can_coalesce = skb_can_coalesce(skb, i, page, offset);
-		if (!can_coalesce && i >= MAX_SKB_FRAGS) {
+		if (!can_coalesce && i >= sysctl_tcp_sg_max_skb_frags) {
 			tcp_mark_push(tp, skb);
 			goto new_segment;
 		}
@@ -1211,7 +1213,7 @@  new_segment:
 
 			if (!skb_can_coalesce(skb, i, pfrag->page,
 					      pfrag->offset)) {
-				if (i == MAX_SKB_FRAGS || !sg) {
+				if (i == sysctl_tcp_sg_max_skb_frags || !sg) {
 					tcp_mark_push(tp, skb);
 					goto new_segment;
 				}