diff mbox

[net-next] pkt_sched: namespace aware ifb

Message ID 1358046374.20249.1789.camel@edumazet-glaptop
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Eric Dumazet Jan. 13, 2013, 3:06 a.m. UTC
From: Eric Dumazet <edumazet@google.com>

act_mirred needs to find the current net_ns, and struct net
pointer is not provided in the call chain. We run in process
context and current->nsproxy->net_ns is the needed pointer.

For ifb, things are easier, as the current ifb device can provide
the net pointer immediately.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ifb.c      |    2 +-
 net/sched/act_mirred.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Benjamin LaHaise Jan. 13, 2013, 3:50 a.m. UTC | #1
On Sat, Jan 12, 2013 at 07:06:14PM -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> act_mirred needs to find the current net_ns, and struct net
> pointer is not provided in the call chain. We run in process
> context and current->nsproxy->net_ns is the needed pointer.
...

I don't think this is correct.  Going by the call chain, tcf_action_add can 
be called because of a netlink message, and that netlink message may not be 
in the same "struct net" as the current process.  It looks like the ->init 
operation is going to need to have the namespace passed in for this to work 
correctly.

		-ben

> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> index 9c0fd0c..f5a7e18 100644
> --- a/net/sched/act_mirred.c
> +++ b/net/sched/act_mirred.c
> @@ -88,7 +88,7 @@ static int tcf_mirred_init(struct nlattr *nla, struct nlattr *est,
>  		return -EINVAL;
>  	}
>  	if (parm->ifindex) {
> -		dev = __dev_get_by_index(&init_net, parm->ifindex);
> +		dev = __dev_get_by_index(current->nsproxy->net_ns, parm->ifindex);
>  		if (dev == NULL)
>  			return -ENODEV;
>  		switch (dev->type) {
Eric Dumazet Jan. 13, 2013, 5:49 a.m. UTC | #2
On Sat, 2013-01-12 at 22:50 -0500, Benjamin LaHaise wrote:
> On Sat, Jan 12, 2013 at 07:06:14PM -0800, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > act_mirred needs to find the current net_ns, and struct net
> > pointer is not provided in the call chain. We run in process
> > context and current->nsproxy->net_ns is the needed pointer.
> ...
> 
> I don't think this is correct.  Going by the call chain, tcf_action_add can 
> be called because of a netlink message, and that netlink message may not be 
> in the same "struct net" as the current process.  It looks like the ->init 
> operation is going to need to have the namespace passed in for this to work 
> correctly.

But it is working in my tests.

I added a WARN and the call stack is :

[  701.522282]  [<ffffffff8108634f>] warn_slowpath_common+0x7f/0xc0
[  701.522284]  [<ffffffff810863aa>] warn_slowpath_null+0x1a/0x20
[  701.522286]  [<ffffffffa00e00e3>] tcf_mirred_init+0x43/0x340 [act_mirred]
[  701.522289]  [<ffffffff81506385>] ? __rtnl_unlock+0x15/0x20
[  701.522293]  [<ffffffff815195e8>] tcf_action_init_1+0x198/0x1e0
[  701.522295]  [<ffffffff815196c8>] tcf_action_init+0x98/0x100
[  701.522298]  [<ffffffff81517f30>] tcf_exts_validate+0x90/0xb0
[  701.522300]  [<ffffffff8151e55b>] u32_set_parms.isra.11+0x3b/0x270
[  701.522303]  [<ffffffff812ffdf0>] ? nla_parse+0x90/0xe0
[  701.522304]  [<ffffffff8151ed16>] u32_change+0x2e6/0x4c0
[  701.522306]  [<ffffffff81518432>] tc_ctl_tfilter+0x4e2/0x720
[  701.522308]  [<ffffffff815064ad>] rtnetlink_rcv_msg+0x11d/0x310
[  701.522310]  [<ffffffff81506390>] ? __rtnl_unlock+0x20/0x20
[  701.522312]  [<ffffffff81522d39>] netlink_rcv_skb+0xa9/0xd0
[  701.522314]  [<ffffffff81503875>] rtnetlink_rcv+0x25/0x40
[  701.522316]  [<ffffffff81522681>] netlink_unicast+0x1b1/0x230
[  701.522317]  [<ffffffff815229fe>] netlink_sendmsg+0x2fe/0x3b0
[  701.522321]  [<ffffffff814dbdf2>] sock_sendmsg+0xd2/0xf0
[  701.522323]  [<ffffffff814dbca0>] ? sock_recvmsg+0xe0/0x100
[  701.522326]  [<ffffffff814dd0f0>] __sys_sendmsg+0x380/0x390
[  701.522329]  [<ffffffff815b20b4>] ? __do_page_fault+0x214/0x460
[  701.522331]  [<ffffffff814df4e9>] sys_sendmsg+0x49/0x90
[  701.522334]  [<ffffffff815b68c2>] system_call_fastpath+0x16/0x1b

Could you elaborate on what could be the problem ?

We hold the RTNL, so I don't think another process could possibly call
tcf_mirred_init()



Jamal Hadi Salim Jan. 13, 2013, 2:44 p.m. UTC | #3
On 13-01-13 12:49 AM, Eric Dumazet wrote:

> Could you elaborate on what could be the problem ?
>
> We hold the RTNL, so I don't think another process could possibly call
> tcf_mirred_init()
>

Eric, the point Ben was probably trying to make is not about
synchronization but about which namespace has the right to that action
config. Your change is correct for the common use of actions,
but it does not fix the larger picture.

At the moment a dev is owned by a specific namespace; the dev owns a tc
filter, which in turn owns an action. So there is no problem with the
change you make if all configuration follows those rules, i.e. something
along the lines of:
===
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
match ip dst 10.0.0.229/32 flowid 1:10 \
action mirred egress redirect dev ifb0
=====

I would say most people use the above syntax.
However, there is another way to configure actions so they can be 
shared[1], example control syntax:
----
tc actions add \
action police rate 1kbit burst 90k drop index 3 \
action mirred egress mirror dev eth0 index 5
----

In such a case, the "tc actions" netlink path may be
entered from a different namespace than the one that is
using it. Then current->nsproxy->net_ns is no longer correct.

To correct this, I think what Ben points out, passing the correct
namespace to init(), seems like the way to go. Feel free to make
that change; otherwise I will get to it and fix it.

cheers,
jamal

[1]
You can then have multiple filters use this action like so:
===
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
match ip dst 10.0.0.229/32 flowid 1:10 \
action police index 3
#
tc filter add dev eth0 parent ffff: protocol ip prio 6 u32 \
match ip src 10.0.0.21/32 flowid 1:16 \
action police index 3 action mirred egress mirror dev eth0
=====
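
The difference between resolving the ifindex against the caller's namespace and against a namespace handed to init() can be sketched in a small userspace model. This is only an illustration under stated assumptions: every toy_* type and both init_using_* functions are hypothetical names invented here and are not kernel APIs. In the kernel, rtnetlink handlers commonly derive the namespace from the receiving socket with sock_net(skb->sk); the model's toy_sock.net field stands in for that.

```c
#include <stddef.h>

#define TOY_MAX_DEVS 4

/* Hypothetical stand-ins for struct net_device and struct net. */
struct toy_dev { int ifindex; const char *name; };
struct toy_net { struct toy_dev devs[TOY_MAX_DEVS]; int ndevs; };

/* Namespace a netlink socket was created in (models sock_net(sk)). */
struct toy_sock { struct toy_net *net; };

/* Namespace of the running process (models current->nsproxy->net_ns). */
struct toy_task { struct toy_net *net_ns; };

/* Look up a device by ifindex inside one namespace only. */
static const struct toy_dev *toy_dev_get_by_index(const struct toy_net *net,
						  int ifindex)
{
	for (int i = 0; i < net->ndevs; i++)
		if (net->devs[i].ifindex == ifindex)
			return &net->devs[i];
	return NULL;
}

/* Models the posted patch: resolve against the caller's namespace. */
static const struct toy_dev *init_using_current(const struct toy_task *task,
						int ifindex)
{
	return toy_dev_get_by_index(task->net_ns, ifindex);
}

/* Models the review suggestion: the caller passes the namespace in. */
static const struct toy_dev *init_using_net(const struct toy_net *net,
					    int ifindex)
{
	return toy_dev_get_by_index(net, ifindex);
}
```

If a process living in one namespace sends its request through a socket that belongs to another, init_using_current() searches the wrong device table, while init_using_net(sock.net, ifindex) searches the table of the namespace the request was addressed to.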

Benjamin LaHaise Jan. 13, 2013, 2:59 p.m. UTC | #4
On Sat, Jan 12, 2013 at 09:49:59PM -0800, Eric Dumazet wrote:
> But it is working in my tests.
> 
> I added a WARN and the call stack is :
...
> Could you elaborate on what could be the problem ?
> 
> We hold the RTNL, so I don't think another process could possibly call
> tcf_mirred_init()

The locking isn't the issue, but how the network namespace is selected is.  
I've implemented some virtual router functionality using network namespaces, 
and prior to having the setns() syscall, the only way to manipulate other 
network namespaces was via socket passing between threads in different 
namespaces.  One of the optimizations in using this technique was to open a 
netlink socket in another namespace, then pass that file descriptor back to 
the main daemon.  The code could then add routes and manipulate other areas 
of the network stack via that netlink socket.

I think this technique is a valid approach for making use of network 
namespaces.  It also has the benefit of avoiding the use of setns() for the 
vast majority of operations.

		-ben
Eric Dumazet Jan. 13, 2013, 4:35 p.m. UTC | #5
On Sun, 2013-01-13 at 09:59 -0500, Benjamin LaHaise wrote:

> 
> The locking isn't the issue, but how the network namespace is selected is.  
> I've implemented some virtual router functionality using network namespaces, 
> and prior to having the setns() syscall, the only way to manipulate other 
> network namespaces was via socket passing between threads in different 
> namespaces.  One of the optimizations in using this technique was to open a 
> netlink socket in another namespace, then pass that file descriptor back to 
> the main daemon.  The code could then add routes and manipulate other areas 
> of the network stack via that netlink socket.
> 

OK, that's evil. I'll pass the net pointer then.

Thanks


Patch

diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 344dceb..8216438 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -90,7 +90,7 @@ static void ri_tasklet(unsigned long dev)
 		u64_stats_update_end(&dp->tsync);
 
 		rcu_read_lock();
-		skb->dev = dev_get_by_index_rcu(&init_net, skb->skb_iif);
+		skb->dev = dev_get_by_index_rcu(dev_net(_dev), skb->skb_iif);
 		if (!skb->dev) {
 			rcu_read_unlock();
 			dev_kfree_skb(skb);
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 9c0fd0c..f5a7e18 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -88,7 +88,7 @@ static int tcf_mirred_init(struct nlattr *nla, struct nlattr *est,
 		return -EINVAL;
 	}
 	if (parm->ifindex) {
-		dev = __dev_get_by_index(&init_net, parm->ifindex);
+		dev = __dev_get_by_index(current->nsproxy->net_ns, parm->ifindex);
 		if (dev == NULL)
 			return -ENODEV;
 		switch (dev->type) {