[RESEND,net-next,v2] net: ipv4: allocate ipv4_devconf memory for init_net

Message ID 1547628802-10902-1-git-send-email-xiangxia.m.yue@gmail.com
State Not Applicable
Delegated to: David Miller

Commit Message

Tonghao Zhang Jan. 16, 2019, 8:53 a.m. UTC
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

The devconf settings of init_net affect other network
namespaces when they are created. For example:

$ cat /proc/sys/net/ipv4/conf/all/rp_filter
0
$ echo 2 > /proc/sys/net/ipv4/conf/all/rp_filter
$ cat /proc/sys/net/ipv4/conf/all/rp_filter
2

$ ip netns add ns100
$ ip netns exec ns100 bash
$ cat /proc/sys/net/ipv4/conf/all/rp_filter
2

The value of rp_filter in ns100 should be the default, 0,
but it is 2, the same as in _init_net_.

In some cases a host runs many containers of different types, with
different applications. We don't know what network configuration the
users will set, so new containers should not inherit our host configuration.

* The host _init_net_ may be configured for a complex network (rp_filter,
  arp_ignore and arp_announce may be set), but containers should start from
  the default values. Host network configuration should not affect containers.
* Container and host network configurations should be completely isolated.

To fix this and initialize devconf to its default values, we allocate
memory for every namespace (including init_net). This memory
stores each namespace's own settings, and we also
allocate memory for registering the sysctl tables.

IPv6 does that in the same way.
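
The allocation scheme described above can be modelled in userspace. The
following is only a sketch under stated assumptions, not the kernel code:
struct devconf, struct netns, dup_bytes() and the netns_devconf_*() helpers
are hypothetical stand-ins for struct ipv4_devconf, struct net, kmemdup()
and devinet_init_net(). It shows that when every namespace, including the
initial one, gets a private copy of the compiled-in template, a write in one
namespace cannot leak into namespaces created later.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct ipv4_devconf: a single rp_filter slot. */
struct devconf {
	int rp_filter;
};

/* Compiled-in defaults; plays the role of the global ipv4_devconf template. */
static const struct devconf devconf_template = { .rp_filter = 0 };

/* Hypothetical stand-in for struct net. */
struct netns {
	struct devconf *all;
};

/* Userspace equivalent of kmemdup(): copy len bytes of src to the heap. */
static void *dup_bytes(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}

/*
 * Give the namespace a private copy of the template. No special case
 * exists for the initial namespace. Returns 0 on success, -1 on failure.
 */
static int netns_devconf_init(struct netns *ns)
{
	assert(ns != NULL);
	ns->all = dup_bytes(&devconf_template, sizeof(devconf_template));
	return ns->all ? 0 : -1;
}

/* Release the private copy; unconditional, as in the patched error path. */
static void netns_devconf_exit(struct netns *ns)
{
	free(ns->all);
	ns->all = NULL;
}
```

Calling netns_devconf_init() for a first namespace, setting its rp_filter
to 2, and then initializing a second namespace leaves the second one at the
template value, because the template itself is never written.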

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
v1->v2: add more commit info.
---
 net/ipv4/devinet.c | 40 ++++++++++++++++------------------------
 1 file changed, 16 insertions(+), 24 deletions(-)

Comments

Nicolas Dichtel Jan. 17, 2019, 8:25 a.m. UTC | #1
On 16/01/2019 at 09:53, xiangxia.m.yue@gmail.com wrote:
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> 
> The devconf setting on the init_net will affect other
> namespace when them created. For example:
> 
> $ cat /proc/sys/net/ipv4/conf/all/rp_filter
> 0
> $ echo 2 > /proc/sys/net/ipv4/conf/all/rp_filter
> $ cat /proc/sys/net/ipv4/conf/all/rp_filter
> 2
> 
> $ ip netns add ns100
> $ ip netns exec ns100 bash
> $ cat /proc/sys/net/ipv4/conf/all/rp_filter
> 2
> 
> The value of rp_filter in the ns100, should be 0 as
> default, but it is 2 same as _init_net_.
This problem has already been discussed in the past. We cannot change the
current behavior anymore; some users rely on it.


Regards,
Nicolas
Tonghao Zhang Jan. 17, 2019, 11:12 a.m. UTC | #2
On Thu, Jan 17, 2019 at 4:25 PM Nicolas Dichtel
<nicolas.dichtel@6wind.com> wrote:
>
> On 16/01/2019 at 09:53, xiangxia.m.yue@gmail.com wrote:
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > The devconf setting on the init_net will affect other
> > namespace when them created. For example:
> >
> > $ cat /proc/sys/net/ipv4/conf/all/rp_filter
> > 0
> > $ echo 2 > /proc/sys/net/ipv4/conf/all/rp_filter
> > $ cat /proc/sys/net/ipv4/conf/all/rp_filter
> > 2
> >
> > $ ip netns add ns100
> > $ ip netns exec ns100 bash
> > $ cat /proc/sys/net/ipv4/conf/all/rp_filter
> > 2
> >
> > The value of rp_filter in the ns100, should be 0 as
> > default, but it is 2 same as _init_net_.
> This problem has already been discussed in the past. We cannot change anymore
> the current behavior, some users are relying on it.
Sorry, I didn't know that some users rely on it, so I sent the patch
again. In our case, a host runs many containers of different types, with
different applications. We don't know what network configuration the
users will set, so new containers should not inherit our host configuration.

* The host _init_net_ may be configured for a complex network (rp_filter,
  arp_ignore and arp_announce may be set), but containers should start from
  the default values. Host network configuration should not affect containers.
* Container and host network configurations should be completely isolated.

So, should we let the user configure whether to inherit the _init_net_
values or not?

>
> Regards,
> Nicolas
Cong Wang Jan. 17, 2019, 6 p.m. UTC | #3
On Thu, Jan 17, 2019 at 3:12 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> so, we should support configuration for user, inherit _init_net_ value or not ?

I already told you to introduce a new proc to control this behavior,
you ignored my advice... :(

I will send out a patch then, as I know what I want. :)

Thanks.
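
The control suggested in this reply can be pictured with a small userspace
sketch. Everything in it is an assumption made for illustration: the enum
values, struct conf_vals and conf_for_new_ns() are hypothetical names, and
the thread does not say what the eventual interface looks like. The point is
only that a single mode value can select the copy source for a new namespace,
so existing users keep the inheriting behavior while others opt for defaults.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-namespace devconf values. */
struct conf_vals {
	int rp_filter;
};

/* Hypothetical modes for the proposed control. */
enum inherit_mode {
	INHERIT_INIT_NET,	/* copy the live init_net values (current behavior) */
	RESET_TO_DEFAULTS,	/* copy the compiled-in defaults (what the patch wants) */
};

/* Compiled-in defaults, never written at run time. */
static const struct conf_vals conf_defaults = { .rp_filter = 0 };

/*
 * Build the configuration for a new namespace. The mode decides whether
 * the copy starts from the host's current values or from the defaults.
 * Returns NULL if the allocation fails.
 */
static struct conf_vals *conf_for_new_ns(const struct conf_vals *init_net_conf,
					 enum inherit_mode mode)
{
	const struct conf_vals *src;
	struct conf_vals *copy;

	assert(init_net_conf != NULL);
	src = (mode == INHERIT_INIT_NET) ? init_net_conf : &conf_defaults;

	copy = malloc(sizeof(*copy));
	if (copy)
		memcpy(copy, src, sizeof(*copy));
	return copy;
}
```

With a host value of rp_filter = 2, the first mode yields 2 in the new
namespace and the second yields 0, which matches the two behaviors argued
for in this thread.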
Tonghao Zhang Jan. 18, 2019, 1:43 a.m. UTC | #4
On Fri, Jan 18, 2019 at 2:00 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Thu, Jan 17, 2019 at 3:12 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> >
> > so, we should support configuration for user, inherit _init_net_ value or not ?
>
> I already told you to introduce a new proc to control this behavior,
> you ignored my advice... :(
Sorry, I could not find a reason why we should inherit from
_init_net_, so I sent that patch.
> I will send out a patch then, as I know what I want. :)
Good, please Cc me.
> Thanks.

Patch

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 5b9b6d4..2edf0f8 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2555,33 +2555,28 @@  static __net_init int devinet_init_net(struct net *net)
 	int err;
 	struct ipv4_devconf *all, *dflt;
 #ifdef CONFIG_SYSCTL
-	struct ctl_table *tbl = ctl_forward_entry;
+	struct ctl_table *tbl;
 	struct ctl_table_header *forw_hdr;
 #endif
 
 	err = -ENOMEM;
-	all = &ipv4_devconf;
-	dflt = &ipv4_devconf_dflt;
-
-	if (!net_eq(net, &init_net)) {
-		all = kmemdup(all, sizeof(ipv4_devconf), GFP_KERNEL);
-		if (!all)
-			goto err_alloc_all;
+	all = kmemdup(&ipv4_devconf, sizeof(ipv4_devconf), GFP_KERNEL);
+	if (!all)
+		goto err_alloc_all;
 
-		dflt = kmemdup(dflt, sizeof(ipv4_devconf_dflt), GFP_KERNEL);
-		if (!dflt)
-			goto err_alloc_dflt;
+	dflt = kmemdup(&ipv4_devconf_dflt, sizeof(ipv4_devconf_dflt), GFP_KERNEL);
+	if (!dflt)
+		goto err_alloc_dflt;
 
 #ifdef CONFIG_SYSCTL
-		tbl = kmemdup(tbl, sizeof(ctl_forward_entry), GFP_KERNEL);
-		if (!tbl)
-			goto err_alloc_ctl;
+	tbl = kmemdup(ctl_forward_entry, sizeof(ctl_forward_entry), GFP_KERNEL);
+	if (!tbl)
+		goto err_alloc_ctl;
 
-		tbl[0].data = &all->data[IPV4_DEVCONF_FORWARDING - 1];
-		tbl[0].extra1 = all;
-		tbl[0].extra2 = net;
+	tbl[0].data = &all->data[IPV4_DEVCONF_FORWARDING - 1];
+	tbl[0].extra1 = all;
+	tbl[0].extra2 = net;
 #endif
-	}
 
 #ifdef CONFIG_SYSCTL
 	err = __devinet_sysctl_register(net, "all", NETCONFA_IFINDEX_ALL, all);
@@ -2610,15 +2605,12 @@  static __net_init int devinet_init_net(struct net *net)
 err_reg_dflt:
 	__devinet_sysctl_unregister(net, all, NETCONFA_IFINDEX_ALL);
 err_reg_all:
-	if (tbl != ctl_forward_entry)
-		kfree(tbl);
+	kfree(tbl);
 err_alloc_ctl:
 #endif
-	if (dflt != &ipv4_devconf_dflt)
-		kfree(dflt);
+	kfree(dflt);
 err_alloc_dflt:
-	if (all != &ipv4_devconf)
-		kfree(all);
+	kfree(all);
 err_alloc_all:
 	return err;
 }