Message ID | m1hbhgq1v1.fsf@fess.ebiederm.org |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Thu, 2010-09-23 at 01:51 -0700, Eric W. Biederman wrote: > Take advantage of the new abstraction and allow network devices > to be placed in any network namespace that we have a fd to talk > about. > So ... why just netdevice? could you allow migration of other net "items" eg a route table since they are all tagged by netns? cheers, jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 09/23/2010 04:51 AM, Eric W. Biederman wrote: > > Take advantage of the new abstraction and allow network devices > to be placed in any network namespace that we have a fd to talk > about. > ... > +struct net *get_net_ns_by_fd(int fd) > +{ > + struct proc_inode *ei; > + struct file *file; > + struct net *net; > + > + file = NULL; No need to initialize this. > + net = ERR_PTR(-EINVAL); or this? > + file = proc_ns_fget(fd); > + if (!fd) > + goto out; > + return ERR_PTR(-EINVAL); Shouldn't this be: if (!file) And the "goto" seems wrong, especially without a {} here. Unless you meant to keep the "goto" and branch below? -Brian > + > + ei = PROC_I(file->f_dentry->d_inode); > + if (ei->ns_ops != &netns_operations) > + goto out; > + > + net = get_net(ei->ns); > +out: > + if (file) > + fput(file); > + return net; > +} -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 23, 2010 at 07:22:06AM -0400, jamal wrote: > On Thu, 2010-09-23 at 01:51 -0700, Eric W. Biederman wrote: > > Take advantage of the new abstraction and allow network devices > > to be placed in any network namespace that we have a fd to talk > > about. > > So ... why just netdevice? could you allow migration of other > net "items" eg a route table since they are all tagged by > netns? migrating route table entries makes no sense because a) they refer to devices and configuration that does not exist in the target namespace; they only make sense within their netns context b) they are purely virtual and you get the same result from deleting and recreating them. Network devices are special because they may have something attached to them, be it hardware or some daemon. -David -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
jamal <hadi@cyberus.ca> writes: > On Thu, 2010-09-23 at 01:51 -0700, Eric W. Biederman wrote: >> Take advantage of the new abstraction and allow network devices >> to be placed in any network namespace that we have a fd to talk >> about. >> > > So ... why just netdevice? could you allow migration of other > net "items" eg a route table since they are all tagged by > netns? For this patchset because we only support migrating physical network devices between network namespaces today. In the bigger picture migrating things between network namespaces is race prone. Fixing those races probably would reduce network stack performance and increase code complexity for not particularly good reason. Network devices are special because they are physical hardware and in combination with the rule that all packets coming a network device go to a single network namespace we have to implement migration for network devices. Eric -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Brian Haley <brian.haley@hp.com> writes: > On 09/23/2010 04:51 AM, Eric W. Biederman wrote: >> >> Take advantage of the new abstraction and allow network devices >> to be placed in any network namespace that we have a fd to talk >> about. >> > ... >> +struct net *get_net_ns_by_fd(int fd) >> +{ >> + struct proc_inode *ei; >> + struct file *file; >> + struct net *net; >> + >> + file = NULL; > > No need to initialize this. > >> + net = ERR_PTR(-EINVAL); > > or this? > >> + file = proc_ns_fget(fd); >> + if (!fd) >> + goto out; >> + return ERR_PTR(-EINVAL); > > Shouldn't this be: > > if (!file) > > And the "goto" seems wrong, especially without a {} here. Unless you > meant to keep the "goto" and branch below? I think I changed my mind half way through writing the code and never did anything about it. Oops. Thanks fixed. It is now: struct net *get_net_ns_by_fd(int fd) { struct proc_inode *ei; struct file *file; struct net *net; net = ERR_PTR(-EINVAL); file = proc_ns_fget(fd); if (!file) goto out; ei = PROC_I(file->f_dentry->d_inode); if (ei->ns_ops != &netns_operations) goto out; net = get_net(ei->ns); out: if (file) fput(file); return net; } Which at least makes sense. Now to test it to double check it does what it should do. Eric -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2010-09-23 at 16:58 +0200, David Lamparter wrote: > migrating route table entries makes no sense because > a) they refer to devices and configuration that does not exist in the > target namespace; they only make sense within their netns context > b) they are purely virtual and you get the same result from deleting and > recreating them. > > Network devices are special because they may have something attached to > them, be it hardware or some daemon. Routes functionally reside on top of netdevices, point to nexthop neighbors across these netdevices etc. Underlying assumption is you take care of that dependency when migrating. We are talking about FIB entries here not the route cache; moving a few pointers within the kernel is a hell lot faster than recreating a subset of BGP entries from user space. Eric, I didnt follow the exposed-races arguement: Why would it involve more than just some basic locking only while you change the struct net pointer to the new namespace for these sub-subsystems? cheers, jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Sep 24, 2010 at 07:51:24AM -0400, jamal wrote: > > migrating route table entries makes no sense because > > a) they refer to devices and configuration that does not exist in the > > target namespace; they only make sense within their netns context > > b) they are purely virtual and you get the same result from deleting and > > recreating them. > > > > Network devices are special because they may have something attached to > > them, be it hardware or some daemon. > > Routes functionally reside on top of netdevices, point to nexthop > neighbors across these netdevices etc. Underlying assumption is you take > care of that dependency when migrating. > We are talking about FIB entries here not the route cache; moving a few > pointers within the kernel is a hell lot faster than recreating a subset > of BGP entries from user space. No. While you sure could associate routes with devices, they don't *functionally* reside on top of network devices. They reside on top of the entire IP configuration, and in case of BGP they even reside on top of your set of peerings and their data. Even if you could "move" routes together with a network device, the result would be utter nonsense. The routes depend on your BGP view, and if your set of interfaces (and peers) changes, your routes will change. Your bgpd will, either way, need to set up new peerings and redo best path evaluations. (On an unrelated note, how often are you planning to move stuff between namespaces? I don't expect to be moving stuff except on configuration events...) -David -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2010-09-24 at 14:57 +0200, David Lamparter wrote: > No. While you sure could associate routes with devices, they don't > *functionally* reside on top of network devices. They reside on top of > the entire IP configuration, I think i am not clearly making my point. There are data dependencies; If you were to move routes, youd need everything that routes depend on. IOW, if i was to draw a functional graph, routes would appear on top of netdevs (I dont care what other functional blocks you put in between or sideways to them). > and in case of BGP they even reside on top > of your set of peerings and their data. > Even if you could "move" routes together with a network device, the > result would be utter nonsense. You could argue that moving a netdevice where some of its fundamental properties such as an ifindex change is utter nonsense. But you can work around it. > The routes depend on your BGP view, and > if your set of interfaces (and peers) changes, your routes will change. > Your bgpd will, either way, need to set up new peerings and redo best > path evaluations. Worst case scenario, yes. I am beginning to get a feeling we are trying to achieve different goals maybe? Why are you even migrating netdevs? > (On an unrelated note, how often are you planning to move stuff between > namespaces? I don't expect to be moving stuff except on configuration > events...) Triggering on config events is useful and it is likely the only possibility if you assumed the other namespace is remote. But if could send a single command to migrate several things in the kernel (in my case to recover state to a different ns), then that is much simpler and uses the least resources (memory, cpu, bandwidth). I admit it is very hard to do in most cases where the underlying dependencies are evolving and synchronizing via user space is the best approach. The example of route table i pointed to is simple. Besides that: dynamic state created in the kernel that doesnt have to be recreated by the next arriving 100K packets helps to improve recovery. cheers, jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 09/23/2010 10:51 AM, Eric W. Biederman wrote: > > Take advantage of the new abstraction and allow network devices > to be placed in any network namespace that we have a fd to talk > about. > > Signed-off-by: Eric W. Biederman<ebiederm@xmission.com> > --- [ ... ] > +struct net *get_net_ns_by_fd(int fd) > +{ > + struct proc_inode *ei; > + struct file *file; > + struct net *net; > + > + file = NULL; > + net = ERR_PTR(-EINVAL); > + file = proc_ns_fget(fd); > + if (!fd) > + goto out; > + return ERR_PTR(-EINVAL); > + > + ei = PROC_I(file->f_dentry->d_inode); > + if (ei->ns_ops !=&netns_operations) > + goto out; Is this check necessary here ? proc_ns_fget checks "file->f_op != &ns_file_operations", no ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Sep 24, 2010 at 09:32:53AM -0400, jamal wrote: > On Fri, 2010-09-24 at 14:57 +0200, David Lamparter wrote: > > No. While you sure could associate routes with devices, they don't > > *functionally* reside on top of network devices. They reside on top of > > the entire IP configuration, > > I think i am not clearly making my point. There are data dependencies; > If you were to move routes, youd need everything that routes depend on. > IOW, if i was to draw a functional graph, routes would appear on top > of netdevs (I dont care what other functional blocks you put in between > or sideways to them). I understood your point. What I'm saying is that that functional graph you're describing is too simplistic do be a workable model. Your graph allows for what you're trying to do, yes. But your graph is not modeling the reality. > > The routes depend on your BGP view, and > > if your set of interfaces (and peers) changes, your routes will change. > > Your bgpd will, either way, need to set up new peerings and redo best > > path evaluations. > > Worst case scenario, yes. I am beginning to get a feeling we are trying > to achieve different goals maybe? Why are you even migrating netdevs? Err... I'm migrating netdevs to assign them to namespaces to allow them to use them? Setup, basically. Either way a device move only happens as result of some administrative action; be it creating a new namespace or changing the physical/logical network setup. > > (On an unrelated note, how often are you planning to move stuff between > > namespaces? I don't expect to be moving stuff except on configuration > > events...) > > Triggering on config events is useful and it is likely the only > possibility if you assumed the other namespace is remote. wtf is a "remote" namespace? > But if could > send a single command to migrate several things in the kernel (in my > case to recover state to a different ns), then that is much simpler and > uses the least resources (memory, cpu, bandwidth). I admit it is very > hard to do in most cases where the underlying dependencies are evolving > and synchronizing via user space is the best approach. The example > of route table i pointed to is simple. > Besides that: dynamic state created in the kernel that doesnt have to be > recreated by the next arriving 100K packets helps to improve recovery. Can you please describe your application that requires moving possibly several network devices together with "their" routes to a different namespace? -David -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2010-09-24 at 16:09 +0200, David Lamparter wrote: > I understood your point. What I'm saying is that that functional graph > you're describing is too simplistic do be a workable model. Your graph > allows for what you're trying to do, yes. But your graph is not modeling > the reality. How about we put this specific point to rest by agreeing to disagree? ;-> > Err... I'm migrating netdevs to assign them to namespaces to allow them > to use them? Setup, basically. Either way a device move only happens as > result of some administrative action; be it creating a new namespace or > changing the physical/logical network setup. > Ok, different need. You have a much more basic requirement than i do. > wtf is a "remote" namespace? > A namespace that is remotely located on another machine/hardware ;-> > Can you please describe your application that requires moving possibly > several network devices together with "their" routes to a different > namespace? scaling and availability are the driving requirements. cheers, jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/if_link.h b/include/linux/if_link.h index 2fc66dd..ae73d5e 100644 --- a/include/linux/if_link.h +++ b/include/linux/if_link.h @@ -116,6 +116,7 @@ enum { IFLA_STATS64, IFLA_VF_PORTS, IFLA_PORT_SELF, + IFLA_NET_NS_FD, __IFLA_MAX }; diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index bd10a79..68672ce 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -114,6 +114,7 @@ static inline struct net *copy_net_ns(unsigned long flags, struct net *net_ns) extern struct list_head net_namespace_list; extern struct net *get_net_ns_by_pid(pid_t pid); +extern struct net *get_net_ns_by_fd(int pid); #ifdef CONFIG_NET_NS extern void __put_net(struct net *net); diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index 581a088..a9b54a7 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -8,6 +8,8 @@ #include <linux/idr.h> #include <linux/rculist.h> #include <linux/nsproxy.h> +#include <linux/proc_fs.h> +#include <linux/file.h> #include <net/net_namespace.h> #include <net/netns/generic.h> @@ -341,6 +343,30 @@ struct net *get_net_ns_by_pid(pid_t pid) } EXPORT_SYMBOL_GPL(get_net_ns_by_pid); +struct net *get_net_ns_by_fd(int fd) +{ + struct proc_inode *ei; + struct file *file; + struct net *net; + + file = NULL; + net = ERR_PTR(-EINVAL); + file = proc_ns_fget(fd); + if (!fd) + goto out; + return ERR_PTR(-EINVAL); + + ei = PROC_I(file->f_dentry->d_inode); + if (ei->ns_ops != &netns_operations) + goto out; + + net = get_net(ei->ns); +out: + if (file) + fput(file); + return net; +} + static int __init net_ns_init(void) { struct net_generic *ng; diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index f78d821..771d8be 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1003,6 +1003,8 @@ struct net *rtnl_link_get_net(struct net *src_net, struct nlattr *tb[]) */ if (tb[IFLA_NET_NS_PID]) net = get_net_ns_by_pid(nla_get_u32(tb[IFLA_NET_NS_PID])); + else if (tb[IFLA_NET_NS_FD]) + net = get_net_ns_by_fd(nla_get_u32(tb[IFLA_NET_NS_FD])); else net = get_net(src_net); return net; @@ -1077,7 +1079,7 @@ static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm, int send_addr_notify = 0; int err; - if (tb[IFLA_NET_NS_PID]) { + if (tb[IFLA_NET_NS_PID] || tb[IFLA_NET_NS_FD]) { struct net *net = rtnl_link_get_net(dev_net(dev), tb); if (IS_ERR(net)) { err = PTR_ERR(net);
Take advantage of the new abstraction and allow network devices to be placed in any network namespace that we have a fd to talk about. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- include/linux/if_link.h | 1 + include/net/net_namespace.h | 1 + net/core/net_namespace.c | 26 ++++++++++++++++++++++++++ net/core/rtnetlink.c | 4 +++- 4 files changed, 31 insertions(+), 1 deletions(-)