Patchwork [net] rtnetlink: call rtnl_lock_unregistering() in rtnl_link_unregister()

login
register
mail settings
Submitter WANG Cong
Date May 9, 2014, 5:47 p.m.
Message ID <1399657653-4909-1-git-send-email-xiyou.wangcong@gmail.com>
Download mbox | patch
Permalink /patch/347496/
State Changes Requested
Delegated to: David Miller
Headers show

Comments

WANG Cong - May 9, 2014, 5:47 p.m.
From: Cong Wang <cwang@twopensource.com>

commit 50624c934db18ab90 (net: Delay default_device_exit_batch until no
devices are unregistering) introduced rtnl_lock_unregistering() for
default_device_exit_batch(). Same race could happen we when rmmod a driver
which calls rtnl_link_unregister() as we call dev->destructor without rtnl
lock.

I know making rtnl_lock_unregistering() a macro is ugly, but I don't find
any less ugly way to fix it unless we duplicate the code. For long term,
I think we should clean up the mess of netdev_run_todo() and net namespce
exit code.

Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
---
 include/linux/rtnetlink.h | 31 +++++++++++++++++++++++++++++++
 net/core/dev.c            | 32 ++------------------------------
 net/core/rtnetlink.c      |  2 +-
 3 files changed, 34 insertions(+), 31 deletions(-)
stephen hemminger - May 9, 2014, 6:03 p.m.
On Fri,  9 May 2014 10:47:33 -0700
Cong Wang <xiyou.wangcong@gmail.com> wrote:

> I know making rtnl_lock_unregistering() a macro is ugly, but I don't find
> any less ugly way to fix it unless we duplicate the code. For long term,
> I think we should clean up the mess of netdev_run_todo() and net namespce
> exit code.

Huh, why isn't a real function going to work.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
WANG Cong - May 9, 2014, 6:10 p.m.
On Fri, May 9, 2014 at 11:03 AM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Fri,  9 May 2014 10:47:33 -0700
> Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
>> I know making rtnl_lock_unregistering() a macro is ugly, but I don't find
>> any less ugly way to fix it unless we duplicate the code. For long term,
>> I think we should clean up the mess of netdev_run_todo() and net namespce
>> exit code.
>
> Huh, why isn't a real function going to work.

Because in its callers:

+       rtnl_lock_unregistering(net_list, exit_list);
+       rtnl_lock_unregistering(&net_namespace_list, list);

struct net is linked via ->exit_list in net_list and is linked
via ->list in net_namespace_list.

This can't be done without a macro (at least for me). Or
we have to duplicate the code.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
stephen hemminger - May 9, 2014, 7:02 p.m.
On Fri, 9 May 2014 11:10:36 -0700
Cong Wang <xiyou.wangcong@gmail.com> wrote:

> On Fri, May 9, 2014 at 11:03 AM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> > On Fri,  9 May 2014 10:47:33 -0700
> > Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> >> I know making rtnl_lock_unregistering() a macro is ugly, but I don't find
> >> any less ugly way to fix it unless we duplicate the code. For long term,
> >> I think we should clean up the mess of netdev_run_todo() and net namespce
> >> exit code.
> >
> > Huh, why isn't a real function going to work.
> 
> Because in its callers:
> 
> +       rtnl_lock_unregistering(net_list, exit_list);
> +       rtnl_lock_unregistering(&net_namespace_list, list);
> 
> struct net is linked via ->exit_list in net_list and is linked
> via ->list in net_namespace_list.
> 
> This can't be done without a macro (at least for me). Or
> we have to duplicate the code.

macro method is too ugly, figure out a better way.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
WANG Cong - May 9, 2014, 7:22 p.m.
On Fri, May 9, 2014 at 12:02 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Fri, 9 May 2014 11:10:36 -0700
> Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
>>
>> This can't be done without a macro (at least for me). Or
>> we have to duplicate the code.
>
> macro method is too ugly, figure out a better way.

That's what I am going to do for -net-next. This patch is for -net,
it's an ugly but minimum change I can find.

Of course, if you insist we should clean up it for -net as well,
I can do that.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman - May 9, 2014, 11:05 p.m.
Cong Wang <xiyou.wangcong@gmail.com> writes:

> On Fri, May 9, 2014 at 12:02 PM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
>> On Fri, 9 May 2014 11:10:36 -0700
>> Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>
>>>
>>> This can't be done without a macro (at least for me). Or
>>> we have to duplicate the code.
>>
>> macro method is too ugly, figure out a better way.
>
> That's what I am going to do for -net-next. This patch is for -net,
> it's an ugly but minimum change I can find.
>
> Of course, if you insist we should clean up it for -net as well,
> I can do that.

It would be no worse to rename the existing function
rtnl_lock_unregistering_list

And add a second function rtnl_lock_unregistering that does
the same thing but uses the global list.

Of course this begs the question what happens if the network
device we want to destroy is a network namespace that is currently
exiting and not on the global list.

It looks like we need to grab the net_mutex to get a state where network
namespaces are not exiting...

Eric

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
WANG Cong - May 12, 2014, 5:17 a.m.
On Fri, May 9, 2014 at 4:05 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Cong Wang <xiyou.wangcong@gmail.com> writes:
>
>> On Fri, May 9, 2014 at 12:02 PM, Stephen Hemminger
>> <stephen@networkplumber.org> wrote:
>>> On Fri, 9 May 2014 11:10:36 -0700
>>> Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>
>>>>
>>>> This can't be done without a macro (at least for me). Or
>>>> we have to duplicate the code.
>>>
>>> macro method is too ugly, figure out a better way.
>>
>> That's what I am going to do for -net-next. This patch is for -net,
>> it's an ugly but minimum change I can find.
>>
>> Of course, if you insist we should clean up it for -net as well,
>> I can do that.
>
> It would be no worse to rename the existing function
> rtnl_lock_unregistering_list
>
> And add a second function rtnl_lock_unregistering that does
> the same thing but uses the global list.
>
> Of course this begs the question what happens if the network
> device we want to destroy is a network namespace that is currently
> exiting and not on the global list.
>

OK, so we have to duplicate the code.

> It looks like we need to grab the net_mutex to get a state where network
> namespaces are not exiting...
>

Hmm, for me it looks like we need net_mutex only we change pernet ops,
here rtnl lock is enough. No?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman - May 12, 2014, 9:18 a.m.
Cong Wang <xiyou.wangcong@gmail.com> writes:

> On Fri, May 9, 2014 at 4:05 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Cong Wang <xiyou.wangcong@gmail.com> writes:
>>
>>> On Fri, May 9, 2014 at 12:02 PM, Stephen Hemminger
>>> <stephen@networkplumber.org> wrote:
>>>> On Fri, 9 May 2014 11:10:36 -0700
>>>> Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>>
>>>>>
>>>>> This can't be done without a macro (at least for me). Or
>>>>> we have to duplicate the code.
>>>>
>>>> macro method is too ugly, figure out a better way.
>>>
>>> That's what I am going to do for -net-next. This patch is for -net,
>>> it's an ugly but minimum change I can find.
>>>
>>> Of course, if you insist we should clean up it for -net as well,
>>> I can do that.
>>
>> It would be no worse to rename the existing function
>> rtnl_lock_unregistering_list
>>
>> And add a second function rtnl_lock_unregistering that does
>> the same thing but uses the global list.
>>
>> Of course this begs the question what happens if the network
>> device we want to destroy is a network namespace that is currently
>> exiting and not on the global list.
>>
>
> OK, so we have to duplicate the code.
>
>> It looks like we need to grab the net_mutex to get a state where network
>> namespaces are not exiting...
>>
>
> Hmm, for me it looks like we need net_mutex only we change pernet ops,
> here rtnl lock is enough. No?

Look at net/core/net_namespace.c:cleanup_net

While namespaces are being torn down the net_mutex is held.  At the same
time the rtnl_mutex is not held, and those namespaces are not on the
net_namespace_list.

So while holding the rtnl_lock is enough to traverse the net_namespace
list without it changing.  You need the net_mutex to make certain there
are not network namespaces in the final stages of being cleaned up
that are not on the net_namespaces list.

Those namespaces in the final stages of being cleaned up are as
problematic as network devices that are being cleaned up that live
on the netdev todo list.  for_each_net in rtnl_link_unregister
can not see them.

Eric

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 8e3e66a..e70218d 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -4,6 +4,7 @@ 
 
 #include <linux/mutex.h>
 #include <linux/netdevice.h>
+#include <linux/wait.h>
 #include <uapi/linux/rtnetlink.h>
 
 extern int rtnetlink_send(struct sk_buff *skb, struct net *net, u32 pid, u32 group, int echo);
@@ -22,6 +23,36 @@  extern void rtnl_lock(void);
 extern void rtnl_unlock(void);
 extern int rtnl_trylock(void);
 extern int rtnl_is_locked(void);
+
+extern wait_queue_head_t netdev_unregistering_wq;
+/* Return with the rtnl_lock held when there are no network
+ * devices unregistering in any network namespace in net_list.
+ */
+#define rtnl_lock_unregistering(net_list, member)			\
+do {									\
+	struct net *net;						\
+	bool unregistering;						\
+	DEFINE_WAIT(wait);						\
+									\
+	for (;;) {							\
+		prepare_to_wait(&netdev_unregistering_wq, &wait,	\
+				TASK_UNINTERRUPTIBLE);			\
+		unregistering = false;					\
+		rtnl_lock();						\
+		list_for_each_entry(net, net_list, member) {		\
+			if (net->dev_unreg_count > 0) {			\
+				unregistering = true;			\
+				break;					\
+			}						\
+		}							\
+		if (!unregistering)					\
+			break;						\
+		__rtnl_unlock();					\
+		schedule();						\
+	}								\
+	finish_wait(&netdev_unregistering_wq, &wait);			\
+} while (0)
+
 #ifdef CONFIG_PROVE_LOCKING
 extern int lockdep_rtnl_is_held(void);
 #else
diff --git a/net/core/dev.c b/net/core/dev.c
index c619b86..25f7ed2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5541,7 +5541,7 @@  static int dev_new_index(struct net *net)
 
 /* Delayed registration/unregisteration */
 static LIST_HEAD(net_todo_list);
-static DECLARE_WAIT_QUEUE_HEAD(netdev_unregistering_wq);
+DECLARE_WAIT_QUEUE_HEAD(netdev_unregistering_wq);
 
 static void net_set_todo(struct net_device *dev)
 {
@@ -6926,34 +6926,6 @@  static void __net_exit default_device_exit(struct net *net)
 	rtnl_unlock();
 }
 
-static void __net_exit rtnl_lock_unregistering(struct list_head *net_list)
-{
-	/* Return with the rtnl_lock held when there are no network
-	 * devices unregistering in any network namespace in net_list.
-	 */
-	struct net *net;
-	bool unregistering;
-	DEFINE_WAIT(wait);
-
-	for (;;) {
-		prepare_to_wait(&netdev_unregistering_wq, &wait,
-				TASK_UNINTERRUPTIBLE);
-		unregistering = false;
-		rtnl_lock();
-		list_for_each_entry(net, net_list, exit_list) {
-			if (net->dev_unreg_count > 0) {
-				unregistering = true;
-				break;
-			}
-		}
-		if (!unregistering)
-			break;
-		__rtnl_unlock();
-		schedule();
-	}
-	finish_wait(&netdev_unregistering_wq, &wait);
-}
-
 static void __net_exit default_device_exit_batch(struct list_head *net_list)
 {
 	/* At exit all network devices most be removed from a network
@@ -6976,7 +6948,7 @@  static void __net_exit default_device_exit_batch(struct list_head *net_list)
 	 * will run in the rtnl_unlock() at the end of
 	 * default_device_exit_batch.
 	 */
-	rtnl_lock_unregistering(net_list);
+	rtnl_lock_unregistering(net_list, exit_list);
 	list_for_each_entry(net, net_list, exit_list) {
 		for_each_netdev_reverse(net, dev) {
 			if (dev->rtnl_link_ops)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 9837beb..fd33a83 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -359,7 +359,7 @@  EXPORT_SYMBOL_GPL(__rtnl_link_unregister);
  */
 void rtnl_link_unregister(struct rtnl_link_ops *ops)
 {
-	rtnl_lock();
+	rtnl_lock_unregistering(&net_namespace_list, list);
 	__rtnl_link_unregister(ops);
 	rtnl_unlock();
 }