Message ID | 20180118202124.21616-2-christian.brauner@ubuntu.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
Series | rtnetlink: request RTM_GETLINK by pid or fd | expand |
On Thu, 18 Jan 2018 21:21:24 +0100, Christian Brauner wrote: > In such scenarios setting a netns id property is > not really wanted Why? I think that's what you should do if you want to avoid setns. Just use netnsid. I don't see any problem with that and you didn't provide any explanation. Jiri
On Thu, Jan 18, 2018 at 09:29:14PM +0100, Jiri Benc wrote: > On Thu, 18 Jan 2018 21:21:24 +0100, Christian Brauner wrote: > > In such scenarios setting a netns id property is > > not really wanted > > Why? I think that's what you should do if you want to avoid setns. Just > use netnsid. I don't see any problem with that and you didn't provide > any explanation. Ah, sorry, maybe I was to brief. When creating and destroying a lot of short-lived network namespaces it seems unnecessary to give them all label/netns id. Using a netns id makes much more sense when you want a persistent, long-living network namespace. For example, iproute2 where you want to create a persistent network namespace that sticks around via ip netns add bla && ip netns set bla 5. A more concrete scenario is creating a network namespace, moving a device into it via RTM_SETLINK which also supports IFLA_NET_NS_{FD,PID} and then wanting to query the device. This would be very easy to do if one could reuse the IFLA_NET_NS_{FD,PID} without having to set a netnsid. Christian
On Thu, 18 Jan 2018 21:55:53 +0100, Christian Brauner wrote: > A more concrete scenario is creating a network namespace, moving a > device into it via RTM_SETLINK which also supports IFLA_NET_NS_{FD,PID} > and then wanting to query the device. This would be very easy to do if > one could reuse the IFLA_NET_NS_{FD,PID} without having to set a > netnsid. Wouldn't be a better solution to have a way to request the netnsid allocation (and return it) right away when creating the name space, then? Jiri
On Mon, Jan 22, 2018 at 10:00:46PM +0100, Jiri Benc wrote: > On Thu, 18 Jan 2018 21:55:53 +0100, Christian Brauner wrote: > > A more concrete scenario is creating a network namespace, moving a > > device into it via RTM_SETLINK which also supports IFLA_NET_NS_{FD,PID} > > and then wanting to query the device. This would be very easy to do if > > one could reuse the IFLA_NET_NS_{FD,PID} without having to set a > > netnsid. > > Wouldn't be a better solution to have a way to request the netnsid > allocation (and return it) right away when creating the name space, > then? That is certainly a good idea and I'm happy to send a follow-up patch for that! But there's still value in being able to use IFLA_NET_NS_{FD,PID} in scenarios where the network namespace has been created by another process. In this case we don't know what its netnsid is and not even if it had been assigned one at creation time or not. In this case it would be useful to refer to the netns via a pid or fd. A more concrete and frequent example is querying a network namespace of a (sorry for the buzzword :)) container for all defined network interfaces.
On Mon, 22 Jan 2018 22:23:54 +0100, Christian Brauner wrote: > That is certainly a good idea and I'm happy to send a follow-up patch > for that! Note that I haven't looked into that and I don't know whether it is easily possible. I'll appreciate if you could try that. > But there's still value in being able to use > IFLA_NET_NS_{FD,PID} in scenarios where the network namespace has been > created by another process. In this case we don't know what its netnsid > is and not even if it had been assigned one at creation time or not. In > this case it would be useful to refer to the netns via a pid or fd. A > more concrete and frequent example is querying a network namespace of a > (sorry for the buzzword :)) container for all defined network > interfaces. That's what spurred my original comment. If you don't know the netnsid in such case, we're missing something in uAPI but at a different point than RTM_GETLINK. When you find yourself in a need to query an interface in another netns, you had to learn about that interface in the first place. Meaning you got its ifindex (or ifname, perhaps) somehow. My point is, you should have learned the netnsid at the same time. ifindex alone is not an unique identifier of an interface. The (ifindex, netnsid) pair is. (Also, note that ifindex can change when moving the interface to a different netns.) So you should never get ifindex alone when the interface is in another netns than the current one. If that happens, that is the uAPI that needs to be fixed. You have to always get the (ifindex, netnsid) pair. And that's also the way how it should operate in the other direction: (ifindex, netnsid) is the identifier to query interface in another netns. Note that many APIs in networking are based around netnsid - look at NETLINK_LISTEN_ALL_NSID. It allows you to keep track of interfaces as they are moved from name spaces to name spaces using a single socket: you get notifications about interfaces disappearing or appearing in "watched" name spaces. The name spaces are, of course, referenced by their netnsid. In order to add another "watched" name space, just assign it (or, more typically, let the kernel assign) a netnsid. Btw, we have one missing piece here: when an interface is moved to a name space that does not have netnsid attached, we want to find out where the interface was moved to. But there's currently no reliable way to do it. For veth, the other end can be used to get the netnsid (note that RTM_GETLINK will return the correct link netnsid even when the queried veth interface is in a different name space), for openvswitch, we can now use genetlink, etc., but using different ways for different interface types is not the best API ever and for standalone interfaces we have nothing. I'd like to see something added to uAPI to cover this in a generic way. But as for this patch, I don't think it's the correct way. We do have missing pieces in uAPI wrt. netns support but I think they're at different places. If you have a counter example, please speak up. Thanks, Jiri
On Mon, Jan 22, 2018 at 11:06:16PM +0100, Jiri Benc wrote: > On Mon, 22 Jan 2018 22:23:54 +0100, Christian Brauner wrote: > > That is certainly a good idea and I'm happy to send a follow-up patch > > for that! > > Note that I haven't looked into that and I don't know whether it is > easily possible. I'll appreciate if you could try that. > > > But there's still value in being able to use > > IFLA_NET_NS_{FD,PID} in scenarios where the network namespace has been > > created by another process. In this case we don't know what its netnsid > > is and not even if it had been assigned one at creation time or not. In > > this case it would be useful to refer to the netns via a pid or fd. A > > more concrete and frequent example is querying a network namespace of a > > (sorry for the buzzword :)) container for all defined network > > interfaces. > > That's what spurred my original comment. If you don't know the netnsid > in such case, we're missing something in uAPI but at a different point > than RTM_GETLINK. > > When you find yourself in a need to query an interface in another > netns, you had to learn about that interface in the first place. > Meaning you got its ifindex (or ifname, perhaps) somehow. My point is, > you should have learned the netnsid at the same time. <snip> > different places. If you have a counter example, please speak up. This is not necessarily true in scenarios where I move a network device via RTM_NEWLINK + IFLA_NET_NS_PID into a network namespace I haven't created. Here is an example: nlmsghdr->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK; nlmsghdr->nlmsg_type = RTM_NEWLINK; /* move to network namespace of pid */ nla_put_u32(nlmsg, IFLA_NET_NS_PID, pid) /* give interface new name */ nla_put_string(nlmsg, IFLA_IFNAME, ifname) The only thing I have is the pid that identifies the network namespace. There's no non-syscall way to learn the netnsid. In this case I just want to be able to do the analogue to RTM_NEWLINK + IFLA_NET_NS_PID aka RTM_GETLINK + IFLA_NET_NS_PID to e.g. retrieve the list of all network devices in the network namespace identified by pid. Christian
On Mon, 22 Jan 2018 23:25:41 +0100, Christian Brauner wrote: > This is not necessarily true in scenarios where I move a network device > via RTM_NEWLINK + IFLA_NET_NS_PID into a network namespace I haven't > created. Here is an example: > > nlmsghdr->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK; > nlmsghdr->nlmsg_type = RTM_NEWLINK; > /* move to network namespace of pid */ > nla_put_u32(nlmsg, IFLA_NET_NS_PID, pid) > /* give interface new name */ > nla_put_string(nlmsg, IFLA_IFNAME, ifname) > > The only thing I have is the pid that identifies the network namespace. How do you know the interface did not get renamed in the new netns? This is racy and won't work reliably. You really need to know the netnsid before moving the interface to the netns to be able to do meaningful queries. You may argue that for your case, you're fine with the race. But I know about use cases where it matters a lot: those are tools that show network topology including changes in real time, such as Skydive. It's important to have the uAPI designed right. And we don't want two different APIs for the same thing. If you want to do any watching over the interfaces (as opposed to "move to the netns and forget"), you really have to work with netnsids. Let's focus on how to do that more easily. We don't return netnsid at all places where we should return it and we need to fix that. > There's no non-syscall way to learn the netnsid. And that is the primary problem. Jiri
On Tue, Jan 23, 2018 at 10:30:09AM +0100, Jiri Benc wrote: > On Mon, 22 Jan 2018 23:25:41 +0100, Christian Brauner wrote: > > This is not necessarily true in scenarios where I move a network device > > via RTM_NEWLINK + IFLA_NET_NS_PID into a network namespace I haven't > > created. Here is an example: > > > > nlmsghdr->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK; > > nlmsghdr->nlmsg_type = RTM_NEWLINK; > > /* move to network namespace of pid */ > > nla_put_u32(nlmsg, IFLA_NET_NS_PID, pid) > > /* give interface new name */ > > nla_put_string(nlmsg, IFLA_IFNAME, ifname) > > > > The only thing I have is the pid that identifies the network namespace. > > How do you know the interface did not get renamed in the new netns? > > This is racy and won't work reliably. You really need to know the > netnsid before moving the interface to the netns to be able to do > meaningful queries. Even if you know the netnsid, do the mentioned watches work for nested/child namespaces if eg. a container creates new namespace before and/or after the watch was established and moves interfaces to these child namespaces, would you just see them disappear, or can you keep track of them later on as well? Even if that works, from what the documentation tells me netlink is an unreliable protocol, so if my watcher's socket buffer is full, wouldn't I be losing important tracking information? I think one possible solution to tracking interfaces would be to have a unique identifier that never changes (even if it's just a simple uint64_t incremented whenever an interface is created). But since they're not local to the current namespace that may require a lot of extra permission checks (but I'm just speculating here...). In any case, IFLA_NET_NS_FD/PID are already there and I had been wondering previously why they couldn't be used with RTM_GETLINK, it would just make sense.
On Tue, 23 Jan 2018 11:26:58 +0100, Wolfgang Bumiller wrote: > Even if you know the netnsid, do the mentioned watches work for > nested/child namespaces if eg. a container creates new namespace before > and/or after the watch was established and moves interfaces to these > child namespaces, would you just see them disappear, or can you keep > track of them later on as well? What do you mean by "nested namespaces"? There's no such thing for net name spaces. As for missing API to get netnsid of the netns the interface is moved to, see my previous emails in this thread. This needs to be added. > Even if that works, from what the documentation tells me netlink is an > unreliable protocol, so if my watcher's socket buffer is full, wouldn't > I be losing important tracking information? Sure. But that's fundamentally unfixable independently on netlink, the kernel needs to take an action if a program is not reading its messages. Either some messages get dropped or the program is killed or infinite amount of memory is consumed. This has nothing to do with uAPI design. > I think one possible solution to tracking interfaces would be to have a > unique identifier that never changes (even if it's just a simple > uint64_t incremented whenever an interface is created). But since > they're not local to the current namespace that may require a lot of > extra permission checks (but I'm just speculating here...). You'll get a hard NACK from CRIU folks if you try to propose this. > In any case, IFLA_NET_NS_FD/PID are already there and I had been > wondering previously why they couldn't be used with RTM_GETLINK, it > would just make sense. Those predate netnsids and we can't get rid of them now, since they're part of uAPI. But we can (and should) make sure we don't add more of those. Jiri
(Christian, I'm adding back the netdev list, there's no reason not to have the discussion in open.) On Tue, 23 Jan 2018 12:42:19 +0100, Christian Brauner wrote: > Thanks for the comments and discussion. Sorry, for not going through the > list but I just have a quick question that doesn't deserve to spam the > whole netdev list. :) I have written another set of patches that would > add support for IFLA_IF_NETNSID to RTM_{N,S}ETLINK and friends which I > planned to send out after the pid and fd ones. Would you be interested > in those if I sent them out over the next week? Yes, please CC me. I will have limited time next week (whole day meetings for the most of the week) but I'll try to review. In general, I see value in adding netnsid support to setlink, newlink and dellink. It's a bit of minefield, though. Look for and be sure to resolve: - Backwards compatibility with old kernels. Old kernels will ignore the IFLA_IF_NETNSID attribute that is unknown to them. And it's certainly undesirable for an application to attempt to set/create an interface in a given netns and instead the kernel sets/creates an interface in the current netns. In 79e1ad148c844, I put into place means to handle that: any kernel that understands IFLA_IF_NETNSID but does not support setlink/newlink/dellink will return -EOPNOTSUPP. This reduces the problem to detecting whether the kernel understands IFLA_IF_NETNSID. This can be done by invoking getlink with IFLA_IF_NETNSID and looking whether the reply contains IFLA_IF_NETNSID. The current getlink calls are also limited to CAP_NET_ADMIN in the *target* netns. If/when this is relaxed in the future, we still need to stay backwards compatible. So, the application needs to do this: 1. call RTM_GETLINK with IFLA_IF_NETNSID on lo (or do a dump) 2. if getting EACCESS, assume IFLA_IF_NETNSID is not supported => fall back to other means 2. if the reply does not contain IFLA_IF_NETNSID, the kernel does not support IFLA_IF_NETNSID => fall back to other means 3. try RTM_SETLINK/NEWLINK/DELLINK and check for EOPNOTSUPP => fall back to other means What is important for your kernel patches is to not break this. If you implement only partial support for some of the operation, think about the future extensibility. Applications have to be able to reliably detect what is supported on the running kernel, regardless of the kernel version. - Access rights. This is security sensitive stuff. I took the big hammer and limited everything to CAP_NET_ADMIN in the target netns. If you want to relax this, be sure to get review from people involved in name space security. Jiri
Le 22/01/2018 à 23:06, Jiri Benc a écrit : [snip] > Btw, we have one missing piece here: when an interface is moved to a > name space that does not have netnsid attached, we want to find out > where the interface was moved to. But there's currently no reliable way > to do it. For veth, the other end can be used to get the netnsid (note > that RTM_GETLINK will return the correct link netnsid even when the > queried veth interface is in a different name space), for openvswitch, > we can now use genetlink, etc., but using different ways for different > interface types is not the best API ever and for standalone interfaces > we have nothing. I'd like to see something added to uAPI to cover this > in a generic way. When a virtual interface moves to another netns, the netlink RTM_DELLINK message contains the attribute IFLA_NEW_NETNSID, which identifies where the interface moves. The nsid may be allocated if needed. I don't know if it's acceptable to also allocate an nsid in case of a physical interface. Regards, Nicolas
Le 23/01/2018 à 11:26, Wolfgang Bumiller a écrit : > On Tue, Jan 23, 2018 at 10:30:09AM +0100, Jiri Benc wrote: >> On Mon, 22 Jan 2018 23:25:41 +0100, Christian Brauner wrote: >>> This is not necessarily true in scenarios where I move a network device >>> via RTM_NEWLINK + IFLA_NET_NS_PID into a network namespace I haven't >>> created. Here is an example: >>> >>> nlmsghdr->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK; >>> nlmsghdr->nlmsg_type = RTM_NEWLINK; >>> /* move to network namespace of pid */ >>> nla_put_u32(nlmsg, IFLA_NET_NS_PID, pid) >>> /* give interface new name */ >>> nla_put_string(nlmsg, IFLA_IFNAME, ifname) >>> >>> The only thing I have is the pid that identifies the network namespace. >> >> How do you know the interface did not get renamed in the new netns? >> >> This is racy and won't work reliably. You really need to know the >> netnsid before moving the interface to the netns to be able to do >> meaningful queries. > > Even if you know the netnsid, do the mentioned watches work for > nested/child namespaces if eg. a container creates new namespace before > and/or after the watch was established and moves interfaces to these > child namespaces, would you just see them disappear, or can you keep > track of them later on as well? nsid can be monitored (see ip monitor nsid). > > Even if that works, from what the documentation tells me netlink is an > unreliable protocol, so if my watcher's socket buffer is full, wouldn't > I be losing important tracking information? You can track socket error statistics. In case of error, you can start a dump to ensure that you have the right view of the system. > > I think one possible solution to tracking interfaces would be to have a > unique identifier that never changes (even if it's just a simple > uint64_t incremented whenever an interface is created). But since > they're not local to the current namespace that may require a lot of > extra permission checks (but I'm just speculating here...). It's not possible to have unique identifiers. With CRIU, you need to be able to reassign all existing ids.
Le 23/01/2018 à 13:22, Jiri Benc a écrit : > (Christian, I'm adding back the netdev list, there's no reason not to > have the discussion in open.) > > On Tue, 23 Jan 2018 12:42:19 +0100, Christian Brauner wrote: >> Thanks for the comments and discussion. Sorry, for not going through the >> list but I just have a quick question that doesn't deserve to spam the >> whole netdev list. :) I have written another set of patches that would >> add support for IFLA_IF_NETNSID to RTM_{N,S}ETLINK and friends which I >> planned to send out after the pid and fd ones. Would you be interested >> in those if I sent them out over the next week? > > Yes, please CC me. I will have limited time next week (whole day > meetings for the most of the week) but I'll try to review. CC me also please. Regards, Nicolas
On Tue, 23 Jan 2018 17:37:11 +0100, Nicolas Dichtel wrote: > When a virtual interface moves to another netns, the netlink RTM_DELLINK message > contains the attribute IFLA_NEW_NETNSID, which identifies where the interface > moves. The nsid may be allocated if needed. The problem is that ifindex may change and it's not announced. The only way is to track both ifindex and ifname, watch for the ifname to appear in the target netns and update the application's view of ifindex. It would be much better if the whole (ifindex, netnsid) pair was returned. I think it could be added. > I don't know if it's acceptable to also allocate an nsid in case of a physical > interface. I think we need that. Thanks, Jiri
On Tue, Jan 23, 2018 at 05:55:27PM +0100, Nicolas Dichtel wrote: > Le 23/01/2018 à 13:22, Jiri Benc a écrit : > > (Christian, I'm adding back the netdev list, there's no reason not to > > have the discussion in open.) > > > > On Tue, 23 Jan 2018 12:42:19 +0100, Christian Brauner wrote: > >> Thanks for the comments and discussion. Sorry, for not going through the > >> list but I just have a quick question that doesn't deserve to spam the > >> whole netdev list. :) I have written another set of patches that would > >> add support for IFLA_IF_NETNSID to RTM_{N,S}ETLINK and friends which I > >> planned to send out after the pid and fd ones. Would you be interested > >> in those if I sent them out over the next week? > > > > Yes, please CC me. I will have limited time next week (whole day > > meetings for the most of the week) but I'll try to review. > CC me also please. Of course, will do. I plan on sending out the patches for RTM_SETLINK and RTM_DELLINK tomorrow after I've done some more testing on it. Thanks! Christian
Le 23/01/2018 à 18:08, Jiri Benc a écrit : > On Tue, 23 Jan 2018 17:37:11 +0100, Nicolas Dichtel wrote: >> When a virtual interface moves to another netns, the netlink RTM_DELLINK message >> contains the attribute IFLA_NEW_NETNSID, which identifies where the interface >> moves. The nsid may be allocated if needed. > > The problem is that ifindex may change and it's not announced. The only > way is to track both ifindex and ifname, watch for the ifname to appear > in the target netns and update the application's view of ifindex. Yes, you're right. > > It would be much better if the whole (ifindex, netnsid) pair was > returned. I think it could be added. Sure. Do you plan to send a patch? > >> I don't know if it's acceptable to also allocate an nsid in case of a physical >> interface. > > I think we need that. If you agree, I can send a patch to remove this limitation. Regards, Nicolas
On Wed, 24 Jan 2018 11:53:11 +0100, Nicolas Dichtel wrote: > Le 23/01/2018 à 18:08, Jiri Benc a écrit : > > It would be much better if the whole (ifindex, netnsid) pair was > > returned. I think it could be added. > Sure. Do you plan to send a patch? I can do that but it will take a while. I'll be happy to leave that to someone else but if there's no one, I'll eventually do it. > > I think we need that. > If you agree, I can send a patch to remove this limitation. That would be great. Thanks! Jiri
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 16d644a4f974..5210448dcf1f 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1509,7 +1509,8 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev, struct net *src_net, int type, u32 pid, u32 seq, u32 change, unsigned int flags, u32 ext_filter_mask, - u32 event, int *new_nsid, int tgt_netnsid) + u32 event, int *new_nsid, int tgt_netnsid, + int tgt_netnsid_type) { struct ifinfomsg *ifm; struct nlmsghdr *nlh; @@ -1527,8 +1528,23 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, ifm->ifi_flags = dev_get_flags(dev); ifm->ifi_change = change; - if (tgt_netnsid >= 0 && nla_put_s32(skb, IFLA_IF_NETNSID, tgt_netnsid)) - goto nla_put_failure; + if (tgt_netnsid >= 0) { + u32 err = -1; + + switch (tgt_netnsid_type) { + case IFLA_IF_NETNSID: + err = nla_put_s32(skb, IFLA_IF_NETNSID, tgt_netnsid); + break; + case IFLA_NET_NS_PID: + err = nla_put_s32(skb, IFLA_NET_NS_PID, tgt_netnsid); + break; + case IFLA_NET_NS_FD: + err = nla_put_s32(skb, IFLA_NET_NS_FD, tgt_netnsid); + break; + } + if (err) + goto nla_put_failure; + } if (nla_put_string(skb, IFLA_IFNAME, dev->name) || nla_put_u32(skb, IFLA_TXQLEN, dev->tx_queue_len) || @@ -1791,6 +1807,7 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb) unsigned int flags = NLM_F_MULTI; int master_idx = 0; int netnsid = -1; + int netnsid_type = -1; int err; int hdrlen; @@ -1810,12 +1827,22 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb) if (nlmsg_parse(cb->nlh, hdrlen, tb, IFLA_MAX, ifla_policy, NULL) >= 0) { if (tb[IFLA_IF_NETNSID]) { + netnsid_type = IFLA_IF_NETNSID; netnsid = nla_get_s32(tb[IFLA_IF_NETNSID]); tgt_net = get_target_net(skb->sk, netnsid); - if (IS_ERR(tgt_net)) { - tgt_net = net; - netnsid = -1; - } + } else if (tb[IFLA_NET_NS_PID]) { + netnsid_type = IFLA_NET_NS_PID; + netnsid = nla_get_s32(tb[IFLA_NET_NS_PID]); + tgt_net = get_net_ns_by_pid(netnsid); + } else if (tb[IFLA_NET_NS_FD]) { + netnsid_type = IFLA_NET_NS_FD; + netnsid = nla_get_s32(tb[IFLA_NET_NS_FD]); + tgt_net = get_net_ns_by_fd(netnsid); + } + if (IS_ERR(tgt_net)) { + tgt_net = net; + netnsid = -1; + netnsid_type = -1; } if (tb[IFLA_EXT_MASK]) @@ -1845,7 +1872,7 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb) cb->nlh->nlmsg_seq, 0, flags, ext_filter_mask, 0, NULL, - netnsid); + netnsid, netnsid_type); if (err < 0) { if (likely(skb->len)) @@ -2984,6 +3011,7 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh, struct net_device *dev = NULL; struct sk_buff *nskb; int netnsid = -1; + int netnsid_type = -1; int err; u32 ext_filter_mask = 0; @@ -2992,11 +3020,20 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh, return err; if (tb[IFLA_IF_NETNSID]) { + netnsid_type = IFLA_IF_NETNSID; netnsid = nla_get_s32(tb[IFLA_IF_NETNSID]); tgt_net = get_target_net(NETLINK_CB(skb).sk, netnsid); - if (IS_ERR(tgt_net)) - return PTR_ERR(tgt_net); - } + } else if (tb[IFLA_NET_NS_PID]) { + netnsid_type = IFLA_NET_NS_PID; + netnsid = nla_get_s32(tb[IFLA_NET_NS_PID]); + tgt_net = get_net_ns_by_pid(netnsid); + } else if (tb[IFLA_NET_NS_FD]) { + netnsid_type = IFLA_NET_NS_FD; + netnsid = nla_get_s32(tb[IFLA_NET_NS_FD]); + tgt_net = get_net_ns_by_fd(netnsid); + } + if (IS_ERR(tgt_net)) + return PTR_ERR(tgt_net); if (tb[IFLA_IFNAME]) nla_strlcpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ); @@ -3025,7 +3062,7 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh, err = rtnl_fill_ifinfo(nskb, dev, net, RTM_NEWLINK, NETLINK_CB(skb).portid, nlh->nlmsg_seq, 0, 0, ext_filter_mask, - 0, NULL, netnsid); + 0, NULL, netnsid, netnsid_type); if (err < 0) { /* -EMSGSIZE implies BUG in if_nlmsg_size */ WARN_ON(err == -EMSGSIZE); @@ -3134,7 +3171,7 @@ struct sk_buff *rtmsg_ifinfo_build_skb(int type, struct net_device *dev, err = rtnl_fill_ifinfo(skb, dev, dev_net(dev), type, 0, 0, change, 0, 0, event, - new_nsid, -1); + new_nsid, -1, -1); if (err < 0) { /* -EMSGSIZE implies BUG in if_nlmsg_size() */ WARN_ON(err == -EMSGSIZE);
This makes it possible to identify the target network namespace of a RTM_GETLINK message by pid or fd. Often userspace tools that make heavy use of network namespaces need a simple and cheap way of querying network devices and network device properties. This becomes even more crucial when the network namespaces in question are transient. In such scenarios setting a netns id property is not really wanted and it is preferable to avoid the hit of (possibly multiple) setns() syscalls (e.g. attaching to the target network namespace and back to the original network namespace.). This commit lets userspace set the IFLA_NET_NS_{FD,PID} property to identify a target network namespace where the device in question is to be queried. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> --- net/core/rtnetlink.c | 63 +++++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 50 insertions(+), 13 deletions(-)