Message ID | 1358046374.20249.1789.camel@edumazet-glaptop |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
On Sat, Jan 12, 2013 at 07:06:14PM -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> act_mirred needs to find the current net_ns, and struct net
> pointer is not provided in the call chain. We run in process
> context and current->nsproxy->net_ns is the needed pointer.
...

I don't think this is correct. Going by the call chain, tcf_action_add can
be called because of a netlink message, and that netlink message may not be
in the same "struct net" as the current process. It looks like the ->init
operation is going to need to have the namespace passed in for this to work
correctly.

		-ben

> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> index 9c0fd0c..f5a7e18 100644
> --- a/net/sched/act_mirred.c
> +++ b/net/sched/act_mirred.c
> @@ -88,7 +88,7 @@ static int tcf_mirred_init(struct nlattr *nla, struct nlattr *est,
>  		return -EINVAL;
>  	}
>  	if (parm->ifindex) {
> -		dev = __dev_get_by_index(&init_net, parm->ifindex);
> +		dev = __dev_get_by_index(current->nsproxy->net_ns, parm->ifindex);
>  		if (dev == NULL)
>  			return -ENODEV;
>  		switch (dev->type) {
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, 2013-01-12 at 22:50 -0500, Benjamin LaHaise wrote:
> On Sat, Jan 12, 2013 at 07:06:14PM -0800, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > act_mirred needs to find the current net_ns, and struct net
> > pointer is not provided in the call chain. We run in process
> > context and current->nsproxy->net_ns is the needed pointer.
> ...
>
> I don't think this is correct. Going by the call chain, tcf_action_add can
> be called because of a netlink message, and that netlink message may not be
> in the same "struct net" as the current process. It looks like the ->init
> operation is going to need to have the namespace passed in for this to work
> correctly.

But it is working in my tests.

I added a WARN and the call stack is:

[  701.522282]  [<ffffffff8108634f>] warn_slowpath_common+0x7f/0xc0
[  701.522284]  [<ffffffff810863aa>] warn_slowpath_null+0x1a/0x20
[  701.522286]  [<ffffffffa00e00e3>] tcf_mirred_init+0x43/0x340 [act_mirred]
[  701.522289]  [<ffffffff81506385>] ? __rtnl_unlock+0x15/0x20
[  701.522293]  [<ffffffff815195e8>] tcf_action_init_1+0x198/0x1e0
[  701.522295]  [<ffffffff815196c8>] tcf_action_init+0x98/0x100
[  701.522298]  [<ffffffff81517f30>] tcf_exts_validate+0x90/0xb0
[  701.522300]  [<ffffffff8151e55b>] u32_set_parms.isra.11+0x3b/0x270
[  701.522303]  [<ffffffff812ffdf0>] ? nla_parse+0x90/0xe0
[  701.522304]  [<ffffffff8151ed16>] u32_change+0x2e6/0x4c0
[  701.522306]  [<ffffffff81518432>] tc_ctl_tfilter+0x4e2/0x720
[  701.522308]  [<ffffffff815064ad>] rtnetlink_rcv_msg+0x11d/0x310
[  701.522310]  [<ffffffff81506390>] ? __rtnl_unlock+0x20/0x20
[  701.522312]  [<ffffffff81522d39>] netlink_rcv_skb+0xa9/0xd0
[  701.522314]  [<ffffffff81503875>] rtnetlink_rcv+0x25/0x40
[  701.522316]  [<ffffffff81522681>] netlink_unicast+0x1b1/0x230
[  701.522317]  [<ffffffff815229fe>] netlink_sendmsg+0x2fe/0x3b0
[  701.522321]  [<ffffffff814dbdf2>] sock_sendmsg+0xd2/0xf0
[  701.522323]  [<ffffffff814dbca0>] ? sock_recvmsg+0xe0/0x100
[  701.522326]  [<ffffffff814dd0f0>] __sys_sendmsg+0x380/0x390
[  701.522329]  [<ffffffff815b20b4>] ? __do_page_fault+0x214/0x460
[  701.522331]  [<ffffffff814df4e9>] sys_sendmsg+0x49/0x90
[  701.522334]  [<ffffffff815b68c2>] system_call_fastpath+0x16/0x1b

Could you elaborate on what could be the problem?

We hold the RTNL, so I don't think another process could possibly call
tcf_mirred_init()
On 13-01-13 12:49 AM, Eric Dumazet wrote:

> Could you elaborate on what could be the problem?
>
> We hold the RTNL, so I don't think another process could possibly call
> tcf_mirred_init()
>

Eric, the point Ben was probably trying to make is not about
synchronization but about which namespace has the right to that
action config. Your change is correct for the common use of actions
but does not fix the larger picture.
At the moment a dev is owned by a specific namespace; the dev owns a tc
filter, which in turn owns an action. So there is no problem with the
change you make if all configuration follows those rules, i.e. something
along the lines of:

===
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
match ip dst 10.0.0.229/32 flowid 1:10 \
action mirred egress redirect dev ifb0
=====

I would say most people use the above syntax.
However, there is another way to configure actions so they can be
shared[1]; example control syntax:
----
tc actions add \
action police rate 1kbit burst 90k drop index 3 \
action mirred egress mirror dev eth0 index 5
----

In such a case, the "tc actions" netlink path may be entered from a
different namespace than the one that is using the action. Then
current->nsproxy->net_ns is no longer correct.
To correct this, I think what Ben points out, passing init() the
correct namespace, seems like the way to go. Feel free to make that
change - otherwise I will get to it and fix it.

cheers,
jamal

[1] You can then have multiple filters use this action like so:
===
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
match ip dst 10.0.0.229/32 flowid 1:10 \
action police index 3
#
tc filter add dev eth0 parent ffff: protocol ip prio 6 u32 \
match ip src 10.0.0.21/32 flowid 1:16 \
action police index 3 action mirred egress mirror dev eth0
=====
On Sat, Jan 12, 2013 at 09:49:59PM -0800, Eric Dumazet wrote:
> But it is working in my tests.
>
> I added a WARN and the call stack is:
...
> Could you elaborate on what could be the problem?
>
> We hold the RTNL, so I don't think another process could possibly call
> tcf_mirred_init()

The locking isn't the issue; how the network namespace is selected is.
I've implemented some virtual router functionality using network namespaces,
and prior to having the setns() syscall, the only way to manipulate other
network namespaces was via socket passing between threads in different
namespaces. One of the optimizations in using this technique was to open a
netlink socket in another namespace, then pass that file descriptor back to
the main daemon. The code could then add routes and manipulate other areas
of the network stack via that netlink socket.

I think this technique is a valid approach for making use of network
namespaces. It also has the benefit of avoiding the use of setns() for the
vast majority of operations.

		-ben
On Sun, 2013-01-13 at 09:59 -0500, Benjamin LaHaise wrote:

>
> The locking isn't the issue; how the network namespace is selected is.
> I've implemented some virtual router functionality using network namespaces,
> and prior to having the setns() syscall, the only way to manipulate other
> network namespaces was via socket passing between threads in different
> namespaces. One of the optimizations in using this technique was to open a
> netlink socket in another namespace, then pass that file descriptor back to
> the main daemon. The code could then add routes and manipulate other areas
> of the network stack via that netlink socket.
>

OK, that's evil, I'll pass the net pointer then.

Thanks
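The disagreement above comes down to which namespace an ifindex is resolved in. The following is only a toy user-space sketch of that point, not kernel code: every type and function name in it (toy_net, toy_task, toy_sock, toy_handle_request, and so on) is hypothetical and merely stands in for struct net, current->nsproxy->net_ns, and the namespace a netlink socket was opened in. It assumes the scenario Ben describes, where a daemon in one namespace sends a request over a socket that was created in another.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; all names here are hypothetical. */
struct toy_dev {
	int ifindex;
	const char *name;
};

struct toy_net {			/* models one network namespace */
	const char *label;
	const struct toy_dev *devs;
	size_t ndevs;
};

struct toy_task {			/* models current->nsproxy->net_ns */
	struct toy_net *net_ns;
};

struct toy_sock {			/* models the namespace a socket was opened in */
	struct toy_net *net;
};

/* Look up a device by ifindex inside exactly one namespace. */
static const struct toy_dev *toy_dev_get_by_index(const struct toy_net *net,
						   int ifindex)
{
	assert(net != NULL);
	for (size_t i = 0; i < net->ndevs; i++)
		if (net->devs[i].ifindex == ifindex)
			return &net->devs[i];
	return NULL;
}

/* Approach in the posted patch: use the calling task's namespace. */
static const struct toy_dev *toy_init_from_current(const struct toy_task *task,
						    int ifindex)
{
	return toy_dev_get_by_index(task->net_ns, ifindex);
}

/* Approach suggested in review: the caller hands the namespace in. */
static const struct toy_dev *toy_init_with_net(const struct toy_net *net,
						int ifindex)
{
	return toy_dev_get_by_index(net, ifindex);
}

/* The request handler derives the namespace from the socket, not the task. */
static const struct toy_dev *toy_handle_request(const struct toy_sock *sk,
						 int ifindex)
{
	return toy_init_with_net(sk->net, ifindex);
}
```

With a task living in one toy namespace and a socket opened in another, toy_init_from_current() and toy_handle_request() can return different devices for the same ifindex, or one of them can find nothing at all. That is the mismatch Ben and jamal describe, and it is why the single-namespace test with the WARN cannot expose it.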
diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 344dceb..8216438 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -90,7 +90,7 @@ static void ri_tasklet(unsigned long dev)
 		u64_stats_update_end(&dp->tsync);
 
 		rcu_read_lock();
-		skb->dev = dev_get_by_index_rcu(&init_net, skb->skb_iif);
+		skb->dev = dev_get_by_index_rcu(dev_net(_dev), skb->skb_iif);
 		if (!skb->dev) {
 			rcu_read_unlock();
 			dev_kfree_skb(skb);
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 9c0fd0c..f5a7e18 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -88,7 +88,7 @@ static int tcf_mirred_init(struct nlattr *nla, struct nlattr *est,
 		return -EINVAL;
 	}
 	if (parm->ifindex) {
-		dev = __dev_get_by_index(&init_net, parm->ifindex);
+		dev = __dev_get_by_index(current->nsproxy->net_ns, parm->ifindex);
 		if (dev == NULL)
 			return -ENODEV;
 		switch (dev->type) {