ipv4: Namespaceify tcp_max_orphans knob

Message ID 1504753808-13266-1-git-send-email-yanhaishuang@cmss.chinamobile.com
State Deferred, archived
Delegated to: David Miller

Commit Message

Haishuang Yan Sept. 7, 2017, 3:10 a.m. UTC
Applications in different network namespaces might require different maximum
numbers of orphaned TCP sockets, independently of the host.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
 include/net/netns/ipv4.h   |  1 +
 include/net/tcp.h          |  5 +++--
 net/ipv4/sysctl_net_ipv4.c | 14 +++++++-------
 net/ipv4/tcp.c             |  3 ---
 net/ipv4/tcp_input.c       |  1 -
 net/ipv4/tcp_ipv4.c        |  1 +
 6 files changed, 12 insertions(+), 13 deletions(-)

Comments

Cong Wang Sept. 8, 2017, 10:13 p.m. UTC | #1
On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan
<yanhaishuang@cmss.chinamobile.com> wrote:
> Different namespace application might require different maximal number
> of TCP sockets independently of the host.

So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
in a whole system, right? This just makes OOM easier to trigger.
Haishuang Yan Sept. 9, 2017, 1:25 a.m. UTC | #2
> On Sep 9, 2017, at 6:13 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> 
> On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan
> <yanhaishuang@cmss.chinamobile.com> wrote:
>> Different namespace application might require different maximal number
>> of TCP sockets independently of the host.
> 
> So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
> in a whole system, right? This just makes OOM easier to trigger.
> 

As I understand it, before the patch we had N * net->ipv4.sysctl_tcp_max_orphans,
and after the patch we could have ns1.sysctl_tcp_max_orphans + ns2.sysctl_tcp_max_orphans
+ ns3.sysctl_tcp_max_orphans. Is that right? Thanks for reviewing.
Cong Wang Sept. 9, 2017, 4:35 a.m. UTC | #3
On Fri, Sep 8, 2017 at 6:25 PM, 严海双 <yanhaishuang@cmss.chinamobile.com> wrote:
>
>
>> On Sep 9, 2017, at 6:13 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>
>> On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan
>> <yanhaishuang@cmss.chinamobile.com> wrote:
>>> Different namespace application might require different maximal number
>>> of TCP sockets independently of the host.
>>
>> So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
>> in a whole system, right? This just makes OOM easier to trigger.
>>
>
> From my understanding, before the patch, we had N * net->ipv4.sysctl_tcp_max_orphans,
> and after the patch, we could have ns1.sysctl_tcp_max_orphans + ns2.sysctl_tcp_max_orphans
> + ns3.sysctl_tcp_max_orphans, is that right? Thanks for your reviewing.

Nope, by N I mean the number of containers. Before your patch the limit
is global; after your patch it is per container.
Haishuang Yan Sept. 9, 2017, 5:09 a.m. UTC | #4
> On Sep 9, 2017, at 12:35 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> 
> On Fri, Sep 8, 2017 at 6:25 PM, 严海双 <yanhaishuang@cmss.chinamobile.com> wrote:
>> 
>> 
>>> On Sep 9, 2017, at 6:13 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>> 
>>> On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan
>>> <yanhaishuang@cmss.chinamobile.com> wrote:
>>>> Different namespace application might require different maximal number
>>>> of TCP sockets independently of the host.
>>> 
>>> So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
>>> in a whole system, right? This just makes OOM easier to trigger.
>>> 
>> 
>> From my understanding, before the patch, we had N * net->ipv4.sysctl_tcp_max_orphans,
>> and after the patch, we could have ns1.sysctl_tcp_max_orphans + ns2.sysctl_tcp_max_orphans
>> + ns3.sysctl_tcp_max_orphans, is that right? Thanks for your reviewing.
> 
> Nope, by N I mean the number of containers. Before your patch, the limit
> is global, after your patch it is per container.
> 

Yes. For example, if there are N containers, what I meant is that before the patch the limit is:

	N * net->ipv4.sysctl_tcp_max_orphans

After the patch, the limit is:

	ns1.net->ipv4.sysctl_tcp_max_orphans + ns2.net->ipv4.sysctl_tcp_max_orphans + …
David Miller Sept. 9, 2017, 5:16 a.m. UTC | #5
From: 严海双 <yanhaishuang@cmss.chinamobile.com>
Date: Sat, 9 Sep 2017 13:09:57 +0800

>
>
>> On Sep 9, 2017, at 12:35 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>
>> On Fri, Sep 8, 2017 at 6:25 PM, 严海双 <yanhaishuang@cmss.chinamobile.com> wrote:
>>>
>>>
>>>> On Sep 9, 2017, at 6:13 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>>
>>>> On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan
>>>> <yanhaishuang@cmss.chinamobile.com> wrote:
>>>>> Different namespace application might require different maximal number
>>>>> of TCP sockets independently of the host.
>>>>
>>>> So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
>>>> in a whole system, right? This just makes OOM easier to trigger.
>>>>
>>>
>>> From my understanding, before the patch, we had N * net->ipv4.sysctl_tcp_max_orphans,
>>> and after the patch, we could have ns1.sysctl_tcp_max_orphans + ns2.sysctl_tcp_max_orphans
>>> + ns3.sysctl_tcp_max_orphans, is that right? Thanks for your reviewing.
>>
>> Nope, by N I mean the number of containers. Before your patch, the limit
>> is global, after your patch it is per container.
>>
>
> Yeah, for example, if there is N containers, before the patch, I mean the limit is:
>
> 	N * net->ipv4.sysctl_tcp_max_orphans
>
> After the patch, the limit is:
>
> 	ns1. net->ipv4.sysctl_tcp_max_orphans + ns2. net->ipv4.sysctl_tcp_max_orphans + …

Not true.

Please remove "N" from your equation of the current situation.

"sysctl_tcp_max_orphans" applies to the entire system. It is a global limit:
one limit is compared against all orphans in the system, so there is no N.
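
The pre-patch behaviour described here can be pictured with a small user-space sketch. It is only an illustration and not kernel code: the `orphan_state` struct and `too_many_orphans_global()` are hypothetical names, and the real check additionally re-reads an exact per-CPU sum before deciding. What it tries to show is that one limit is compared against one system-wide count, so the number of namespaces never appears in the expression.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's global state before the patch. */
struct orphan_state {
	int orphans;      /* all orphaned TCP sockets in the system */
	int max_orphans;  /* the single, global sysctl_tcp_max_orphans value */
};

/*
 * Simplified pre-patch check: the shifted global count is compared with the
 * one global limit. No per-namespace term is involved, hence "no N".
 * The cast to long long only avoids overflow in this model.
 */
static bool too_many_orphans_global(const struct orphan_state *st, int shift)
{
	return ((long long)st->orphans << shift) > st->max_orphans;
}
```

With `orphans = 100` and `max_orphans = 128`, the check passes at `shift = 0` and trips at `shift = 1`, regardless of how many namespaces exist.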
Haishuang Yan Sept. 9, 2017, 10:21 a.m. UTC | #6
> On Sep 9, 2017, at 1:16 PM, David Miller <davem@davemloft.net> wrote:
> 
> From: 严海双 <yanhaishuang@cmss.chinamobile.com>
> Date: Sat, 9 Sep 2017 13:09:57 +0800
> 
>> 
>> 
>>> On Sep 9, 2017, at 12:35 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>> 
>>> On Fri, Sep 8, 2017 at 6:25 PM, 严海双 <yanhaishuang@cmss.chinamobile.com> wrote:
>>>> 
>>>> 
>>>>> On Sep 9, 2017, at 6:13 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>>> 
>>>>> On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan
>>>>> <yanhaishuang@cmss.chinamobile.com> wrote:
>>>>>> Different namespace application might require different maximal number
>>>>>> of TCP sockets independently of the host.
>>>>> 
>>>>> So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
>>>>> in a whole system, right? This just makes OOM easier to trigger.
>>>>> 
>>>> 
>>>> From my understanding, before the patch, we had N * net->ipv4.sysctl_tcp_max_orphans,
>>>> and after the patch, we could have ns1.sysctl_tcp_max_orphans + ns2.sysctl_tcp_max_orphans
>>>> + ns3.sysctl_tcp_max_orphans, is that right? Thanks for your reviewing.
>>> 
>>> Nope, by N I mean the number of containers. Before your patch, the limit
>>> is global, after your patch it is per container.
>>> 
>> 
>> Yeah, for example, if there is N containers, before the patch, I mean the limit is:
>> 
>> 	N * net->ipv4.sysctl_tcp_max_orphans
>> 
>> After the patch, the limit is:
>> 
>> 	ns1. net->ipv4.sysctl_tcp_max_orphans + ns2. net->ipv4.sysctl_tcp_max_orphans + …
> 
> Not true.
> 
> Please remove "N" from your equation of the current situation.
> 
> "sysctl_tcp_max_orphans" applies to entire system, it is a global limit,
> comparing one limit against all orphans in the system, there is no N.

Yes, you are right. I went through the source code and found that it is a
global limit. Sorry for my mistake.

Thanks David and Cong.
Patch

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 20d061c..305e031 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -127,6 +127,7 @@  struct netns_ipv4 {
 	int sysctl_tcp_timestamps;
 	struct inet_timewait_death_row tcp_death_row;
 	int sysctl_max_syn_backlog;
+	int sysctl_tcp_max_orphans;
 
 #ifdef CONFIG_NET_L3_MASTER_DEV
 	int sysctl_udp_l3mdev_accept;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index b510f28..ac2d998 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -320,10 +320,11 @@  static inline bool tcp_too_many_orphans(struct sock *sk, int shift)
 {
 	struct percpu_counter *ocp = sk->sk_prot->orphan_count;
 	int orphans = percpu_counter_read_positive(ocp);
+	int tcp_max_orphans = sock_net(sk)->ipv4.sysctl_tcp_max_orphans;
 
-	if (orphans << shift > sysctl_tcp_max_orphans) {
+	if (orphans << shift > tcp_max_orphans) {
 		orphans = percpu_counter_sum_positive(ocp);
-		if (orphans << shift > sysctl_tcp_max_orphans)
+		if (orphans << shift > tcp_max_orphans)
 			return true;
 	}
 	return false;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 0d3c038..4f26c8d3 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -394,13 +394,6 @@  static int proc_tcp_available_ulp(struct ctl_table *ctl,
 		.proc_handler	= proc_dointvec
 	},
 	{
-		.procname	= "tcp_max_orphans",
-		.data		= &sysctl_tcp_max_orphans,
-		.maxlen		= sizeof(int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec
-	},
-	{
 		.procname	= "tcp_fastopen",
 		.data		= &sysctl_tcp_fastopen,
 		.maxlen		= sizeof(int),
@@ -1085,6 +1078,13 @@  static int proc_tcp_available_ulp(struct ctl_table *ctl,
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.procname	= "tcp_max_orphans",
+		.data		= &init_net.ipv4.sysctl_tcp_max_orphans,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	{
 		.procname	= "fib_multipath_use_neigh",
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5091402..39187ac 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3522,9 +3522,6 @@  void __init tcp_init(void)
 	}
 
 
-	cnt = tcp_hashinfo.ehash_mask + 1;
-	sysctl_tcp_max_orphans = cnt / 2;
-
 	tcp_init_mem();
 	/* Set per-socket limits to no more than 1/128 the pressure threshold */
 	limit = nr_free_buffer_pages() << (PAGE_SHIFT - 7);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c5d7656..0230509 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -88,7 +88,6 @@ 
 
 int sysctl_tcp_stdurg __read_mostly;
 int sysctl_tcp_rfc1337 __read_mostly;
-int sysctl_tcp_max_orphans __read_mostly = NR_FILE;
 int sysctl_tcp_frto __read_mostly = 2;
 int sysctl_tcp_min_rtt_wlen __read_mostly = 300;
 int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index a63486a..4b17a91 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2468,6 +2468,7 @@  static int __net_init tcp_sk_init(struct net *net)
 	net->ipv4.tcp_death_row.hashinfo = &tcp_hashinfo;
 
 	net->ipv4.sysctl_max_syn_backlog = max(128, cnt / 256);
+	net->ipv4.sysctl_tcp_max_orphans = cnt / 2;
 	net->ipv4.sysctl_tcp_sack = 1;
 	net->ipv4.sysctl_tcp_window_scaling = 1;
 	net->ipv4.sysctl_tcp_timestamps = 1;
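
For readers following the include/net/tcp.h hunk, the patched check can be modelled roughly in user space as below. This is a sketch under assumptions, not the kernel implementation: `model_counter`, `model_netns` and `model_too_many_orphans()` are hypothetical names, and the `approx`/`exact` fields merely stand in for `percpu_counter_read_positive()` and `percpu_counter_sum_positive()`. As in the hunk, only the limit is taken from the namespace; the counter itself is not.

```c
#include <stdbool.h>

/* Hypothetical model of a per-CPU counter: a cheap, possibly stale
 * reading and an exact but more expensive sum. */
struct model_counter {
	int approx;
	int exact;
};

/* Hypothetical model of the per-namespace knob added by the patch. */
struct model_netns {
	int tcp_max_orphans; /* stands in for net->ipv4.sysctl_tcp_max_orphans */
};

static bool model_too_many_orphans(const struct model_counter *ocp,
				   const struct model_netns *ns, int shift)
{
	long long limit = ns->tcp_max_orphans;

	/* Fast path: consult the exact sum only if the cheap read is over. */
	if (((long long)ocp->approx << shift) > limit)
		return ((long long)ocp->exact << shift) > limit;
	return false;
}
```

In this model the same counter value can be over the limit for a namespace configured with 256 and under it for one configured with 1024, which is the behavioural change the review thread discusses.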