Message ID | CAFFEFTXT_fkF2pPSxDEEgic80NVWLqBWtFuvs6W9uDUW2aCnqw@mail.gmail.com |
---|---|
State | Not Applicable |
Headers | show |
canqun zhang <canqunzhang@gmail.com> writes: > Hi all > As discussed above,if the host machine create several linux > containers, there will be several net namespaces.Resources with "nf > conntrack" are registered or unregistered on the first net > namespace(init_net),But init_net is not unregistered lastly,so > cleanuping other net namespaces will triger painic. > If net namespaces are created with the order of 1,2,...n,they should > be cleaned with the order of n,...2,1,so in this case init_net will be > unregistered lastly. No. Network namespaces in general can be cleaned up in any order. In particular you should never ever expect to see the order n,n-1,n-2,...,2,1. It may make sense to special case init_net in the cleanup order but I would really rather not. Now init_net is special and really should never be cleaned up for non-modular code. So it almost makes sense to special case init_net. Does anyone know why Alexy decided to do this only for init_net? My inclination is that Gao Feng is on the rigt path by just removing the strange init_net special case and performing the work once per module load, and once per module unload. > I fixed it up (see below). I have taken a lot of test! Thank you. It is nice to see that we have exposed this mis-assumption. I am inclined to leave the order of this list as is so that other assumptions of network namespace unregistration order are exposed. Unless there is a truly good reason to perform magic on init_net. Eric > diff -r 6a1a258923f5 -r 2667e89e6f50 net/core/net_namespace.c > --- a/net/core/net_namespace.c Fri Dec 28 11:01:17 2012 +0800 > +++ b/net/core/net_namespace.c Fri Dec 28 11:05:12 2012 +0800 > @@ -450,7 +450,7 @@ > > list_del(&ops->list); > for_each_net(net) > - list_add_tail(&net->exit_list, &net_exit_list); > + list_add(&net->exit_list, &net_exit_list); > ops_exit_list(ops, &net_exit_list); > ops_free_list(ops, &net_exit_lis > -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
yes,Network namespaces in general can be cleaned up in any order,but when doing /etc/ini.d/iptables restart, the system need cleaning up all net namespace,and init_net should be cleanup lastly.init_net is the first namespace,other net namespace is copied for it ,and it is diuty for Initializing resources,so It in itself is special. 2012/12/28 Eric W. Biederman <ebiederm@xmission.com>: > canqun zhang <canqunzhang@gmail.com> writes: > >> Hi all >> As discussed above,if the host machine create several linux >> containers, there will be several net namespaces.Resources with "nf >> conntrack" are registered or unregistered on the first net >> namespace(init_net),But init_net is not unregistered lastly,so >> cleanuping other net namespaces will triger painic. >> If net namespaces are created with the order of 1,2,...n,they should >> be cleaned with the order of n,...2,1,so in this case init_net will be >> unregistered lastly. > > No. Network namespaces in general can be cleaned up in any order. > > In particular you should never ever expect to see the order > n,n-1,n-2,...,2,1. > > It may make sense to special case init_net in the cleanup order > but I would really rather not. > > Now init_net is special and really should never be cleaned up > for non-modular code. So it almost makes sense to special > case init_net. > > Does anyone know why Alexy decided to do this only for init_net? > > My inclination is that Gao Feng is on the rigt path by just removing > the strange init_net special case and performing the work once > per module load, and once per module unload. > >> I fixed it up (see below). I have taken a lot of test! > > Thank you. > > It is nice to see that we have exposed this mis-assumption. > > I am inclined to leave the order of this list as is so that > other assumptions of network namespace unregistration order > are exposed. > > Unless there is a truly good reason to perform magic on init_net. > > Eric > >> diff -r 6a1a258923f5 -r 2667e89e6f50 net/core/net_namespace.c >> --- a/net/core/net_namespace.c Fri Dec 28 11:01:17 2012 +0800 >> +++ b/net/core/net_namespace.c Fri Dec 28 11:05:12 2012 +0800 >> @@ -450,7 +450,7 @@ >> >> list_del(&ops->list); >> for_each_net(net) >> - list_add_tail(&net->exit_list, &net_exit_list); >> + list_add(&net->exit_list, &net_exit_list); >> ops_exit_list(ops, &net_exit_list); >> ops_free_list(ops, &net_exit_lis >> -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
canqun zhang <canqunzhang@gmail.com> writes: > yes,Network namespaces in general can be cleaned up in any order,but > when doing /etc/ini.d/iptables restart, the system need cleaning up > all net namespace,and init_net should be cleanup lastly.init_net is > the first namespace,other net namespace is copied for it ,and it is > diuty for Initializing resources,so It in itself is special. "other net namespaces is copied for it" I don't have a clue what you mean by that. Every network namespace starts out in a default state not in a copied state. Nowhere else in the network stack does &init_net have the duty of initializing or cleaning up resources. That /etc/init.d/iptables restart removes modules in general is a little dubious. That /etc/init.d/iptables restart removes modules when there are other existing network namespaces using those modules is down right dangerous. Dangerous in the anyone can ssh into the machine way. I suspect it has taken 5 years for this bug to show up because it is so idiotic to remove code that someone else is using. I won't argue that making it so that &init_net is the last network namespace to go will solve this problem. But I can't see how adding the guarantee that &init_net will always be cleaned up last is a good long term solution. Removing the init_net special case gives a simpler mental model, and less to learn and maintain about network namespaces. Eric -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 12/28/12 11:52, canqun zhang wrote: > Hi all > As discussed above,if the host machine create several linux > containers, there will be several net namespaces.Resources with "nf > conntrack" are registered or unregistered on the first net > namespace(init_net),But init_net is not unregistered lastly,so > cleanuping other net namespaces will triger painic. > If net namespaces are created with the order of 1,2,...n,they should > be cleaned with the order of n,...2,1,so in this case init_net will be > unregistered lastly. > I fixed it up (see below). I have taken a lot of test! > I thinks this BUG is a netfilter BUG,not a netns BUG. Other subsystems implemented netns support don't use init_net to do some special works((un)register/(un)set). In fact,we can't use init_net to do this job well.such as function nf_conntrack_clean,we shoud set ip_ct_attach to NULL before any netns doing cleanup jobs, and set nf_ct_destroy to NULL after all of netns finish these cleanup jobs. So I think finally we still need this patchset,And this is a regular way to fix this problem. Can you help me to test if the panic bug is fixed by this patchset? and then give me your tested-by? thank you very much! -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
ok, I can help you take a test, please send a big patch container this patchset to my email. 2012/12/28 Gao feng <gaofeng@cn.fujitsu.com>: > On 12/28/12 11:52, canqun zhang wrote: >> Hi all >> As discussed above,if the host machine create several linux >> containers, there will be several net namespaces.Resources with "nf >> conntrack" are registered or unregistered on the first net >> namespace(init_net),But init_net is not unregistered lastly,so >> cleanuping other net namespaces will triger painic. >> If net namespaces are created with the order of 1,2,...n,they should >> be cleaned with the order of n,...2,1,so in this case init_net will be >> unregistered lastly. >> I fixed it up (see below). I have taken a lot of test! >> > > I thinks this BUG is a netfilter BUG,not a netns BUG. > Other subsystems implemented netns support don't use init_net to > do some special works((un)register/(un)set). > > In fact,we can't use init_net to do this job well.such as function > nf_conntrack_clean,we shoud set ip_ct_attach to NULL before any > netns doing cleanup jobs, and set nf_ct_destroy to NULL after all of > netns finish these cleanup jobs. > > So I think finally we still need this patchset,And this is a regular > way to fix this problem. > > Can you help me to test if the panic bug is fixed by this patchset? > and then give me your tested-by? > > thank you very much! -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Dec 27, 2012 at 08:48:44PM -0800, Eric W. Biederman wrote: [...] > Does anyone know why Alexy decided to do this only for init_net? Not myself. [...] > My inclination is that Gao Feng is on the rigt path by just removing > the strange init_net special case and performing the work once > per module load, and once per module unload. That makes sense to me, I'll check Gao's patchset asap to come back to him with some feedback. Thanks. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi canqun, On 2012/12/28 16:48, canqun zhang wrote: > ok, I can help you take a test, please send a big patch container this > patchset to my email. > Can you give me your tested-by to this patchset besides patch [19/19]? Thanks! > > 2012/12/28 Gao feng <gaofeng@cn.fujitsu.com>: >> On 12/28/12 11:52, canqun zhang wrote: >>> Hi all >>> As discussed above,if the host machine create several linux >>> containers, there will be several net namespaces.Resources with "nf >>> conntrack" are registered or unregistered on the first net >>> namespace(init_net),But init_net is not unregistered lastly,so >>> cleanuping other net namespaces will triger painic. >>> If net namespaces are created with the order of 1,2,...n,they should >>> be cleaned with the order of n,...2,1,so in this case init_net will be >>> unregistered lastly. >>> I fixed it up (see below). I have taken a lot of test! >>> >> >> I thinks this BUG is a netfilter BUG,not a netns BUG. >> Other subsystems implemented netns support don't use init_net to >> do some special works((un)register/(un)set). >> >> In fact,we can't use init_net to do this job well.such as function >> nf_conntrack_clean,we shoud set ip_ct_attach to NULL before any >> netns doing cleanup jobs, and set nf_ct_destroy to NULL after all of >> netns finish these cleanup jobs. >> >> So I think finally we still need this patchset,And this is a regular >> way to fix this problem. >> >> Can you help me to test if the panic bug is fixed by this patchset? >> and then give me your tested-by? >> >> thank you very much! > -- > To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff -r 6a1a258923f5 -r 2667e89e6f50 net/core/net_namespace.c --- a/net/core/net_namespace.c Fri Dec 28 11:01:17 2012 +0800 +++ b/net/core/net_namespace.c Fri Dec 28 11:05:12 2012 +0800 @@ -450,7 +450,7 @@ list_del(&ops->list); for_each_net(net) - list_add_tail(&net->exit_list, &net_exit_list); + list_add(&net->exit_list, &net_exit_list); ops_exit_list(ops, &net_exit_list); ops_free_list(ops, &net_exit_lis