Patchwork Lucid SRU - UBUNTU: SAUCE: netns: Add quota for number of NET_NS instances.

login
register
mail settings
Submitter Tim Gardner
Date Dec. 1, 2011, 9:48 p.m.
Message ID <4ED7F629.8070408@canonical.com>
Download mbox | patch
Permalink /patch/128771/
State New
Headers show

Comments

Tim Gardner - Dec. 1, 2011, 9:48 p.m.
Please consider this (untested) patch for inclusion in Lucid. See the 
discussion in http://bugs.launchpad.net/bugs/790863 for arguments 
proposing to restore CONFIG_NET_NS.

I'll post a test kernel to the bug in awhile.

One of the issues I have with this patch is that it appears that any 
consumer of network name spaces will have to initially write a non-zero 
value to netns_max before _any_ name spaces can be successfully 
allocated. If copy_net_ns() fails in create_new_namespaces(), then it 
seems the whole allocation is buggered.

rtg
Brad Figg - Dec. 1, 2011, 10:39 p.m.
On 12/01/2011 01:48 PM, Tim Gardner wrote:
> Please consider this (untested) patch for inclusion in Lucid. See the discussion in http://bugs.launchpad.net/bugs/790863 for arguments proposing to restore CONFIG_NET_NS.
>
> I'll post a test kernel to the bug in awhile.
>
> One of the issues I have with this patch is that it appears that any consumer of network name spaces will have to initially write a non-zero value to netns_max before _any_ name spaces can be successfully allocated. If copy_net_ns() fails in
> create_new_namespaces(), then it seems the whole allocation is buggered.
>
> rtg
>
>

Tim,

If you follow the thread that starts at:
http://www.spinics.net/lists/netdev/msg180263.html
you will see that Tetsuo actually proposed a modified
version of this patch: http://www.spinics.net/lists/netdev/msg180360.html.

Brad
Tim Gardner - Dec. 2, 2011, 1:05 a.m.
On 12/01/2011 03:39 PM, Brad Figg wrote:
> On 12/01/2011 01:48 PM, Tim Gardner wrote:
>> Please consider this (untested) patch for inclusion in Lucid. See the
>> discussion in http://bugs.launchpad.net/bugs/790863 for arguments
>> proposing to restore CONFIG_NET_NS.
>>
>> I'll post a test kernel to the bug in awhile.
>>
>> One of the issues I have with this patch is that it appears that any
>> consumer of network name spaces will have to initially write a
>> non-zero value to netns_max before _any_ name spaces can be
>> successfully allocated. If copy_net_ns() fails in
>> create_new_namespaces(), then it seems the whole allocation is buggered.
>>
>> rtg
>>
>>
>
> Tim,
>
> If you follow the thread that starts at:
> http://www.spinics.net/lists/netdev/msg180263.html
> you will see that Tetsuo actually proposed a modified
> version of this patch: http://www.spinics.net/lists/netdev/msg180360.html.
>
> Brad

I did see the second version, but its more complicated and I'm not 
convinced that it solves the OOM better then the first (simpler) patch.

This is my (perhaps incorrect) model of the problem.

Consider the workload that first brought this issue to light. vsftpd 
receives a login request for which it forks a process and indirectly 
allocates a network name space. Eventually the login process terminates 
and synchronously frees all of its resources except the network 
namespace (which is now on an RCU list to be freed later). Now imagine 
this happening at a sufficiently high rate that the lower priority RCU 
thread never gets to run and free its list elements. Eventually all slab 
space is exhausted and the OOM killer cranks up.

So, the first patch simply synchronously returns an error if the number 
of network name spaces exceeds the specified maximum. This happens 
within the context of the fork, the login process is aborted, and the 
remote user is told to buzz off.

With the second patch, once the maximum number of network name spaces 
has been reached, the fork _waits_ until a name space is free (having 
already consumed some non-zero amount of task structure memory). In the 
meantime login requests continue to pour in and vsftpd attempts to fork 
still more processes which consume still more memory. If the login 
attempt rate is sufficiently high, then I think the forks will 
eventually start to fail when they cannot allocate task structure memory.

Of course, with either patch failure recovery is deferred to user space, 
but I'm not convinced that the end result is any different.

With both patches, vsftpd fails a login attempt when there are 
insufficient resources, so why not use the simpler approach ?

rtg
Tetsuo Handa - Dec. 2, 2011, 4:36 a.m.
Tim Gardner wrote:
> So, the first patch simply synchronously returns an error if the number 
> of network name spaces exceeds the specified maximum. This happens 
> within the context of the fork, the login process is aborted, and the 
> remote user is told to buzz off.

According to comment #24 of bug #790863, vsftpd in Lucid was updated to use
Debian's 10-remote-dos.patch 2.3.4-1 patch. So, we no longer need to worry
about vsftpd users, don't we?

I guess normal lxr containers will not start/terminate as frequent as ftp
clients. Thus, I think the first patch (give up immediately version) is fine.
Just setting initial quota value to 512 or so?
Tim Gardner - Dec. 19, 2011, 1:50 p.m.
On 12/01/2011 02:48 PM, Tim Gardner wrote:
> Please consider this (untested) patch for inclusion in Lucid. See the
> discussion in http://bugs.launchpad.net/bugs/790863 for arguments
> proposing to restore CONFIG_NET_NS.
>
> I'll post a test kernel to the bug in awhile.
>
> One of the issues I have with this patch is that it appears that any
> consumer of network name spaces will have to initially write a non-zero
> value to netns_max before _any_ name spaces can be successfully
> allocated. If copy_net_ns() fails in create_new_namespaces(), then it
> seems the whole allocation is buggered.
>
> rtg
>

At least 2 testers have reported good results with this patch. Is there 
any dissent ? Otherwise I shall apply it to master-next.

rtg
Leann Ogasawara - Dec. 19, 2011, 2:39 p.m.
On Mon, 2011-12-19 at 06:50 -0700, Tim Gardner wrote:
> On 12/01/2011 02:48 PM, Tim Gardner wrote:
> > Please consider this (untested) patch for inclusion in Lucid. See the
> > discussion in http://bugs.launchpad.net/bugs/790863 for arguments
> > proposing to restore CONFIG_NET_NS.
> >
> > I'll post a test kernel to the bug in awhile.
> >
> > One of the issues I have with this patch is that it appears that any
> > consumer of network name spaces will have to initially write a non-zero
> > value to netns_max before _any_ name spaces can be successfully
> > allocated. If copy_net_ns() fails in create_new_namespaces(), then it
> > seems the whole allocation is buggered.
> >
> > rtg
> >
> 
> At least 2 testers have reported good results with this patch. Is there 
> any dissent ? Otherwise I shall apply it to master-next.

Given your simpler approach has generated positive testing feedback, I
think you should apply it.

Acked-by: Leann Ogasawara <leann.ogasawara@canonical.com>
Serge Hallyn - Dec. 19, 2011, 3:32 p.m.
Quoting Brad Figg (brad.figg@canonical.com):
> On 12/01/2011 01:48 PM, Tim Gardner wrote:
> >Please consider this (untested) patch for inclusion in Lucid. See the discussion in http://bugs.launchpad.net/bugs/790863 for arguments proposing to restore CONFIG_NET_NS.
> >
> >I'll post a test kernel to the bug in awhile.
> >
> >One of the issues I have with this patch is that it appears that any consumer of network name spaces will have to initially write a non-zero value to netns_max before _any_ name spaces can be successfully allocated. If copy_net_ns() fails in
> >create_new_namespaces(), then it seems the whole allocation is buggered.
> >
> >rtg
> >
> >
> 
> Tim,
> 
> If you follow the thread that starts at:
> http://www.spinics.net/lists/netdev/msg180263.html
> you will see that Tetsuo actually proposed a modified
> version of this patch: http://www.spinics.net/lists/netdev/msg180360.html.

(Shouldn't used_netns_count default to 1?  :)

It looks good, I'd only ask that a warning be printed, even if only
printk_once(), when the limit is hit.  Otherwise we risk mysterious
bugs reported against other software.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

thanks,
-serge

Patch

From be54a2184e5fa1d01cf62bf4de5f75bafa795a8f Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Thu, 1 Dec 2011 14:28:04 -0700
Subject: [PATCH] UBUNTU: SAUCE: netns: Add quota for number of NET_NS instances.

CONFIG_NET_NS support in 2.6.32 has a problem that leads to OOM killer when
clone(CLONE_NEWNET) is called instantly.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/720095
But disabling CONFIG_NET_NS broke lxc containers.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/790863

This patch introduces /proc/sys/net/core/netns_max interface that limits
max number of network namespace instances.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
---
 include/net/sock.h         |    4 ++++
 net/core/net_namespace.c   |    9 +++++++++
 net/core/sysctl_net_core.c |   10 ++++++++++
 3 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 4babe89..3cd628c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1620,4 +1620,8 @@  extern int sysctl_optmem_max;
 extern __u32 sysctl_wmem_default;
 extern __u32 sysctl_rmem_default;
 
+#ifdef CONFIG_NET_NS
+extern int max_netns_count;
+#endif
+
 #endif	/* _SOCK_H */
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 1c1af27..4020874 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -81,12 +81,18 @@  static struct net_generic *net_alloc_generic(void)
 #ifdef CONFIG_NET_NS
 static struct kmem_cache *net_cachep;
 static struct workqueue_struct *netns_wq;
+static atomic_t used_netns_count = ATOMIC_INIT(0);
+unsigned int max_netns_count;
 
 static struct net *net_alloc(void)
 {
 	struct net *net = NULL;
 	struct net_generic *ng;
 
+	atomic_inc(&used_netns_count);
+	if (atomic_read(&used_netns_count) > max_netns_count)
+		goto out;
+
 	ng = net_alloc_generic();
 	if (!ng)
 		goto out;
@@ -96,7 +102,9 @@  static struct net *net_alloc(void)
 		goto out_free;
 
 	rcu_assign_pointer(net->gen, ng);
+	return net;
 out:
+	atomic_dec(&used_netns_count);
 	return net;
 
 out_free:
@@ -115,6 +123,7 @@  static void net_free(struct net *net)
 #endif
 	kfree(net->gen);
 	kmem_cache_free(net_cachep, net);
+	atomic_dec(&used_netns_count);
 }
 
 static struct net *net_create(void)
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 7db1de0..81c79df 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -89,6 +89,16 @@  static struct ctl_table net_core_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+#ifdef CONFIG_NET_NS
+	{
+		.ctl_name       = CTL_UNNUMBERED,
+		.procname       = "netns_max",
+		.data           = &max_netns_count,
+		.maxlen         = sizeof(int),
+		.mode           = 0644,
+		.proc_handler   = proc_dointvec,
+	},
+#endif
 #endif /* CONFIG_NET */
 	{
 		.ctl_name	= NET_CORE_BUDGET,
-- 
1.7.0.4