diff mbox

[net] netns: avoid allocating idr when dumping info

Message ID 1425018776-14725-1-git-send-email-xiyou.wangcong@gmail.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Cong Wang Feb. 27, 2015, 6:32 a.m. UTC
We can allocate the peer netns id when creating the link
instead of when dumping the link.

This fixes the following kernel warning:

 ===============================
 [ INFO: suspicious RCU usage. ]
 3.19.0+ #805 Tainted: G        W
 -------------------------------
 include/linux/rcupdate.h:538 Illegal context switch in RCU read-side critical section!

 other info that might help us debug this:

 rcu_scheduler_active = 1, debug_locks = 0
 2 locks held by ip/771:
  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff8182b8f4>] netlink_dump+0x21/0x26c
  #1:  (rcu_read_lock){......}, at: [<ffffffff817d785b>] rcu_read_lock+0x0/0x6e

 stack backtrace:
 CPU: 3 PID: 771 Comm: ip Tainted: G        W       3.19.0+ #805
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  0000000000000001 ffff8800d51e7718 ffffffff81a27457 0000000029e729e6
  ffff8800d6108000 ffff8800d51e7748 ffffffff810b539b ffffffff820013dd
  00000000000001c8 0000000000000000 ffff8800d7448088 ffff8800d51e7758
 Call Trace:
  [<ffffffff81a27457>] dump_stack+0x4c/0x65
  [<ffffffff810b539b>] lockdep_rcu_suspicious+0x107/0x110
  [<ffffffff8109796f>] rcu_preempt_sleep_check+0x45/0x47
  [<ffffffff8109e457>] ___might_sleep+0x1d/0x1cb
  [<ffffffff8109e67d>] __might_sleep+0x78/0x80
  [<ffffffff814b9b1f>] idr_alloc+0x45/0xd1
  [<ffffffff810cb7ab>] ? rcu_read_lock_held+0x3b/0x3d
  [<ffffffff814b9f9d>] ? idr_for_each+0x53/0x101
  [<ffffffff817c1383>] alloc_netid+0x61/0x69
  [<ffffffff817c14c3>] __peernet2id+0x79/0x8d
  [<ffffffff817c1ab7>] peernet2id+0x13/0x1f
  [<ffffffff817d8673>] rtnl_fill_ifinfo+0xa8d/0xc20
  [<ffffffff810b17d9>] ? __lock_is_held+0x39/0x52
  [<ffffffff817d894f>] rtnl_dump_ifinfo+0x149/0x213
  [<ffffffff8182b9c2>] netlink_dump+0xef/0x26c
  [<ffffffff8182bcba>] netlink_recvmsg+0x17b/0x2c5
  [<ffffffff817b0adc>] __sock_recvmsg+0x4e/0x59
  [<ffffffff817b1b40>] sock_recvmsg+0x3f/0x51
  [<ffffffff817b1f9a>] ___sys_recvmsg+0xf6/0x1d9
  [<ffffffff8115dc67>] ? handle_pte_fault+0x6e1/0xd3d
  [<ffffffff8100a3a0>] ? native_sched_clock+0x35/0x37
  [<ffffffff8109f45b>] ? sched_clock_local+0x12/0x72
  [<ffffffff8109f6ac>] ? sched_clock_cpu+0x9e/0xb7
  [<ffffffff810cb7ab>] ? rcu_read_lock_held+0x3b/0x3d
  [<ffffffff811abde8>] ? __fcheck_files+0x4c/0x58
  [<ffffffff811ac556>] ? __fget_light+0x2d/0x52
  [<ffffffff817b376f>] __sys_recvmsg+0x42/0x60
  [<ffffffff817b379f>] SyS_recvmsg+0x12/0x1c

Fixes: commit d37512a277dfb2cef ("rtnl: add link netns id to interface messages")
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 include/net/net_namespace.h | 14 +++++++++++++-
 net/core/net_namespace.c    | 14 ++------------
 net/core/rtnetlink.c        | 11 +++++++++--
 3 files changed, 24 insertions(+), 15 deletions(-)

Comments

Eric Dumazet Feb. 27, 2015, 8:01 a.m. UTC | #1
On Thu, 2015-02-26 at 22:32 -0800, Cong Wang wrote:
> We can allocate the peer netns id when creating the link
> instead of when dumping the link.
> 
> This fixes the following kernel warning:
> 
>  ===============================
>  [ INFO: suspicious RCU usage. ]
>  3.19.0+ #805 Tainted: G        W
>  -------------------------------
>  include/linux/rcupdate.h:538 Illegal context switch in RCU read-side critical section!
> 
>  other info that might help us debug this:

This looks very complicated, why even bother ?

I gave an obvious patch fixing root cause, I have no idea why you prefer
over engineering this.

If you believe allocating peer netns id is required at link creation
time, you should explain why.

BTW, (void)peernet2id(dev_net(dev), src_net) can fail, and your fix is
not complete anyway.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicolas Dichtel Feb. 27, 2015, 4:25 p.m. UTC | #2
Le 27/02/2015 09:01, Eric Dumazet a écrit :
> On Thu, 2015-02-26 at 22:32 -0800, Cong Wang wrote:
>> We can allocate the peer netns id when creating the link
>> instead of when dumping the link.
>>
>> This fixes the following kernel warning:
>>
>>   ===============================
>>   [ INFO: suspicious RCU usage. ]
>>   3.19.0+ #805 Tainted: G        W
>>   -------------------------------
>>   include/linux/rcupdate.h:538 Illegal context switch in RCU read-side critical section!
>>
>>   other info that might help us debug this:
>
> This looks very complicated, why even bother ?
>
> I gave an obvious patch fixing root cause, I have no idea why you prefer
> over engineering this.
>
> If you believe allocating peer netns id is required at link creation
> time, you should explain why.
>
> BTW, (void)peernet2id(dev_net(dev), src_net) can fail, and your fix is
> not complete anyway.
That's true. The patch does not cover the following case:
ip link add foo type bar
ip link set foo netns netns1

In this case, the second command will call dev_change_net_namespace() and the
id will not be allocated.

I think Eric's patch is simpler and less error-prone.

FWIW, I will be off for one week and probably without any internet access ;-)

Regards,
Nicolas
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Feb. 27, 2015, 4:55 p.m. UTC | #3
On Fri, 2015-02-27 at 17:25 +0100, Nicolas Dichtel wrote:

> That's true. The patch does not cover the following case:
> ip link add foo type bar
> ip link set foo netns netns1
> 
> In this case, the second command will call dev_change_net_namespace() and the
> id will not be allocated.
> 
> I think Eric's patch is simpler and less error-prone.
> 
> FWIW, I will be off for one week and probably without any internet access ;-)

I am cooking an official patch right now, thanks !


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cong Wang Feb. 27, 2015, 5:28 p.m. UTC | #4
On Fri, Feb 27, 2015 at 8:25 AM, Nicolas Dichtel
<nicolas.dichtel@6wind.com> wrote:
> Le 27/02/2015 09:01, Eric Dumazet a écrit :
>
>> On Thu, 2015-02-26 at 22:32 -0800, Cong Wang wrote:
>>>
>>> We can allocate the peer netns id when creating the link
>>> instead of when dumping the link.
>>>
>>> This fixes the following kernel warning:
>>>
>>>   ===============================
>>>   [ INFO: suspicious RCU usage. ]
>>>   3.19.0+ #805 Tainted: G        W
>>>   -------------------------------
>>>   include/linux/rcupdate.h:538 Illegal context switch in RCU read-side
>>> critical section!
>>>
>>>   other info that might help us debug this:
>>
>>
>> This looks very complicated, why even bother ?
>>
>> I gave an obvious patch fixing root cause, I have no idea why you prefer
>> over engineering this.
>>
>> If you believe allocating peer netns id is required at link creation
>> time, you should explain why.


I don't understand why you think this is over engineering,
clearly it is normal to allocate resources in newlink rather than
dumping. Look at other resources we allocate in newlink.


>>
>> BTW, (void)peernet2id(dev_net(dev), src_net) can fail, and your fix is
>> not complete anyway.
>
> That's true. The patch does not cover the following case:
> ip link add foo type bar
> ip link set foo netns netns1
>
> In this case, the second command will call dev_change_net_namespace() and
> the
> id will not be allocated.

Then call it in do_setlink(). This is not a reason we _have to_
allocate it in dumping.

>
> I think Eric's patch is simpler and less error-prone.
>

Sure, he introduced this rcu read lock from the very beginning.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicolas Dichtel Feb. 27, 2015, 5:48 p.m. UTC | #5
Le 27/02/2015 18:28, Cong Wang a écrit :
> On Fri, Feb 27, 2015 at 8:25 AM, Nicolas Dichtel
> <nicolas.dichtel@6wind.com> wrote:
>> Le 27/02/2015 09:01, Eric Dumazet a écrit :
>>
>>> On Thu, 2015-02-26 at 22:32 -0800, Cong Wang wrote:
[snip]
>>>
>>> BTW, (void)peernet2id(dev_net(dev), src_net) can fail, and your fix is
>>> not complete anyway.
>>
>> That's true. The patch does not cover the following case:
>> ip link add foo type bar
>> ip link set foo netns netns1
>>
>> In this case, the second command will call dev_change_net_namespace() and
>> the
>> id will not be allocated.
>
> Then call it in do_setlink(). This is not a reason we _have to_
> allocate it in dumping.
Sure, but that just shows the 'over engineering' side ;-)
The reason to allocate it in dumping is that you need it in dumping, not before ;-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Feb. 27, 2015, 6:04 p.m. UTC | #6
On Fri, 2015-02-27 at 09:28 -0800, Cong Wang wrote:

> I don't understand why you think this is over engineering,
> clearly it is normal to allocate resources in newlink rather than
> dumping. Look at other resources we allocate in newlink.

This has nothing to do with the splat you gave.

If you believe there is a _different_ bug, then submit another patch.

But claiming your patch solves the rcu thing is misleading,
you gave already 2 wrong patches, trying to hide the real fix.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cong Wang Feb. 28, 2015, 12:48 a.m. UTC | #7
On Fri, Feb 27, 2015 at 10:04 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2015-02-27 at 09:28 -0800, Cong Wang wrote:
>
>> I don't understand why you think this is over engineering,
>> clearly it is normal to allocate resources in newlink rather than
>> dumping. Look at other resources we allocate in newlink.
>
> This has nothing to do with the splat you gave.


Why not? If we from the beginning can avoid allocate any
memory in dumping, such rcu warning would not even trigger.

>
> If you believe there is a _different_ bug, then submit another patch.

I am not 100% sure it is a bug yet, but it does signal some bad sign.

>
> But claiming your patch solves the rcu thing is misleading,
> you gave already 2 wrong patches, trying to hide the real fix.
>

If rcu read lock were the only problem, I wouldn't even want to
reply, too trivial to worthy a reply.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cong Wang Feb. 28, 2015, 12:56 a.m. UTC | #8
On Fri, Feb 27, 2015 at 9:48 AM, Nicolas Dichtel
<nicolas.dichtel@6wind.com> wrote:
> Le 27/02/2015 18:28, Cong Wang a écrit :
>> Then call it in do_setlink(). This is not a reason we _have to_
>> allocate it in dumping.
>
> Sure, but that just shows the 'over engineering' side ;-)
> The reason to allocate it in dumping is that you need it in dumping, not
> before ;-)

Why? This relies on rtnetlink to dump the link after creation,
which is not obvious at all. IOW, imagine if rtnetlink didn't do
the dump after new link, what would happen?

You are hiding it too deep, without a valid reason. It sounds
like you want to defer the allocation as late as possible,
but again it makes no sense: 1) we don't allocate too much,
just one id in idr; 2) this loses the code readability and hides
potential bugs.

You are over engineering, not me who wants to make it sane.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cong Wang Feb. 28, 2015, 6:17 a.m. UTC | #9
(Removing Eric from discussion since he expressed he doesn't
care this code at all in a private email.)


On Fri, Feb 27, 2015 at 4:56 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> Why? This relies on rtnetlink to dump the link after creation,
> which is not obvious at all. IOW, imagine if rtnetlink didn't do
> the dump after new link, what would happen?
>
> You are hiding it too deep, without a valid reason. It sounds
> like you want to defer the allocation as late as possible,
> but again it makes no sense: 1) we don't allocate too much,
> just one id in idr; 2) this loses the code readability and hides
> potential bugs.
>

Here we go:

This breaks veth, where veth peers are completely equal,
therefore they are links for each other. So we expect
link-netnsid shown on both side.

Your current impl. relies on rtnl_configure_link() to dump
the link to allocate the netns id after creating a link.
Unfortunately for veth, rtnl_configure_link() is called
before veth->peer is set, therefore the dump of the "peer"
does _not_ actually allocate the netns id, since its
->get_link_net() returns the same net.

I will send a patch together after I figure out another bug
I saw.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 36faf49..25ac2e0 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -27,6 +27,7 @@ 
 #include <net/netns/nftables.h>
 #include <net/netns/xfrm.h>
 #include <linux/ns_common.h>
+#include <uapi/linux/net_namespace.h>
 
 struct user_namespace;
 struct proc_dir_entry;
@@ -291,7 +292,18 @@  static inline struct net *read_pnet(struct net * const *pnet)
 #define __net_initconst	__initconst
 #endif
 
-int peernet2id(struct net *net, struct net *peer);
+int __peernet2id(struct net *net, struct net *peer, bool alloc);
+
+/* This function returns the id of a peer netns. If no id is assigned, one will
+ * be allocated and returned.
+ */
+static inline int peernet2id(struct net *net, struct net *peer)
+{
+	int id = __peernet2id(net, peer, true);
+
+	return id >= 0 ? id : NETNSA_NSID_NOT_ASSIGNED;
+}
+
 struct net *get_net_ns_by_id(struct net *net, int id);
 
 struct pernet_operations {
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index cb5290b..9ff2164 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -175,7 +175,7 @@  static int net_eq_idr(int id, void *net, void *peer)
 	return 0;
 }
 
-static int __peernet2id(struct net *net, struct net *peer, bool alloc)
+int __peernet2id(struct net *net, struct net *peer, bool alloc)
 {
 	int id = idr_for_each(&net->netns_ids, net_eq_idr, peer);
 
@@ -192,17 +192,7 @@  static int __peernet2id(struct net *net, struct net *peer, bool alloc)
 
 	return -ENOENT;
 }
-
-/* This function returns the id of a peer netns. If no id is assigned, one will
- * be allocated and returned.
- */
-int peernet2id(struct net *net, struct net *peer)
-{
-	int id = __peernet2id(net, peer, true);
-
-	return id >= 0 ? id : NETNSA_NSID_NOT_ASSIGNED;
-}
-EXPORT_SYMBOL(peernet2id);
+EXPORT_SYMBOL(__peernet2id);
 
 struct net *get_net_ns_by_id(struct net *net, int id)
 {
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 1385de0..a4348ac 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1175,8 +1175,10 @@  static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 		struct net *link_net = dev->rtnl_link_ops->get_link_net(dev);
 
 		if (!net_eq(dev_net(dev), link_net)) {
-			int id = peernet2id(dev_net(dev), link_net);
+			int id = __peernet2id(dev_net(dev), link_net, false);
 
+			if (id < 0)
+				id = NETNSA_NSID_NOT_ASSIGNED;
 			if (nla_put_s32(skb, IFLA_LINK_NETNSID, id))
 				goto nla_put_failure;
 		}
@@ -2142,7 +2144,12 @@  replay:
 		dev->ifindex = ifm->ifi_index;
 
 		if (ops->newlink) {
-			err = ops->newlink(link_net ? : net, dev, tb, data);
+			struct net *src_net = link_net ?: net;
+
+			if (ops->get_link_net && !net_eq(src_net, dev_net(dev)))
+				(void)peernet2id(dev_net(dev), src_net);
+
+			err = ops->newlink(src_net, dev, tb, data);
 			/* Drivers should call free_netdev() in ->destructor
 			 * and unregister it on failure after registration
 			 * so that device could be finally freed in rtnl_unlock.