net: fix rtnl event race in register_netdevice()

Message ID 20110429172634.27130.25375.stgit@x201
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Kalle Valo April 29, 2011, 5:26 p.m. UTC
From: Kalle Valo <kalle.valo@atheros.com>

There's a race in register_netdevice() so that the rtnl event is sent before
the device is actually ready. This was visible with flimflam, the Chrome OS
connection manager:

00:21:35 roska flimflamd[2598]: src/udev.c:add_net_device()
00:21:35 roska flimflamd[2598]: connman_inet_ifname: SIOCGIFNAME(index
   4): No such device
00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message() buf
   0xbfefda3c len 1004
00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message()
   NEWLINK len 1004 type 16 flags 0x0000 seq 0

So the kobject is visible in udev before the device is ready.

(Ignore the 10 s delay; I added it to make the issue easy to reproduce.)

The issue is reported here:

https://bugzilla.kernel.org/show_bug.cgi?id=15606

The fix is to call netdev_register_kobject() after the device is added
to the list.

Signed-off-by: Kalle Valo <kalle.valo@atheros.com>
---
 net/core/dev.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller April 29, 2011, 8:53 p.m. UTC | #1
From: Kalle Valo <kvalo@adurom.com>
Date: Fri, 29 Apr 2011 20:26:34 +0300

> From: Kalle Valo <kalle.valo@atheros.com>
> 
> There's a race in register_netdevice so that the rtnl event is sent before
> the device is actually ready. This was visible with flimflam, chrome os
> connection manager:
> 
> 00:21:35 roska flimflamd[2598]: src/udev.c:add_net_device()
> 00:21:35 roska flimflamd[2598]: connman_inet_ifname: SIOCGIFNAME(index
>    4): No such device
> 00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message() buf
>    0xbfefda3c len 1004
> 00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message()
>    NEWLINK len 1004 type 16 flags 0x0000 seq 0
> 
> So the kobject is visible in udev before the device is ready.
> 
> (ignore the 10 s delay, I added that to reproduce the issue easily)
> 
> The issue is reported here:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=15606
> 
> The fix is to call netdev_register_kobject() after the device is added
> to the list.
> 
> Signed-off-by: Kalle Valo <kalle.valo@atheros.com>

This is not correct.

If you move the kobject registry around, you have to change the
error handling cleanup to match.

This change will leave the netdevice on all sorts of lists, and it will
also leak a reference to the device.
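
A minimal userspace sketch of the cleanup mismatch described above, not
kernel code. The struct and every `fake_*` name are hypothetical stand-ins:
`refcnt` models dev_hold(), `listed` models list_netdevice(), and
`kobj_ready` models a successful netdev_register_kobject(). It only shows
that a failure after the hold and list steps needs an unwind path that
undoes both; it does not show that the reordering itself is safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a net device; not the kernel's struct net_device. */
struct fake_netdev {
	int refcnt;      /* models the reference taken by dev_hold() */
	bool listed;     /* models membership in the device lists */
	bool kobj_ready; /* models a registered kobject */
};

/* Simulated kobject registration; fails on request. */
static int fake_register_kobject(struct fake_netdev *dev, bool fail)
{
	if (fail)
		return -1;
	dev->kobj_ready = true;
	return 0;
}

/* Registration in the reordered sequence, with a matching unwind path. */
static int fake_register(struct fake_netdev *dev, bool fail_kobject)
{
	int ret;

	assert(dev != NULL);

	dev->refcnt++;      /* models dev_hold() */
	dev->listed = true; /* models list_netdevice() */

	ret = fake_register_kobject(dev, fail_kobject);
	if (ret)
		goto err_unlist;

	return 0;

err_unlist:
	/* Undo the two steps above in reverse order. Jumping to a label
	 * that predates them would skip this and leave the device listed
	 * with an extra reference. */
	dev->listed = false;
	dev->refcnt--;
	return ret;
}
```

With `fail_kobject` set, the model ends with `refcnt == 0` and
`listed == false`; dropping the two lines under `err_unlist` reproduces the
stale list entry and the leaked reference.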

I also think this points to a fundamental problem with this change, in
that you can't register the kobject after the device is added to
the various lists in list_netdevice().

Once it's in those lists, any thread of control can find the device
and those threads of control may try to get at the data backed by
the kobject and therefore they really expect it to be there by
then.

What you can do instead is try to delay the NETREG_REGISTERED
setting, and block the problematic notifications by testing
reg_state or similar.
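
One way to picture this suggestion is the userspace model below. It is a
sketch under assumptions, not kernel code: the `model_*` names, the enum,
and the event counter are all hypothetical, and where such a check would
actually live in the kernel is left open. The idea shown is only that a
notification path tests the registration state and stays silent until
registration has finished.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the registration state; the names echo the
 * kernel's reg_state values but this enum is not the kernel's. */
enum model_reg_state {
	MODEL_NETREG_UNINITIALIZED = 0,
	MODEL_NETREG_REGISTERED,
};

struct model_netdev {
	enum model_reg_state reg_state;
	int events_sent; /* notifications that were actually delivered */
};

/* Hypothetical notification hook: suppress events for a device that
 * has not finished registering. Returns true if an event was sent. */
static bool model_notify(struct model_netdev *dev)
{
	assert(dev != NULL);

	if (dev->reg_state != MODEL_NETREG_REGISTERED)
		return false;

	dev->events_sent++;
	return true;
}

/* Last step of registration in this model: only now may events flow. */
static void model_finish_registration(struct model_netdev *dev)
{
	assert(dev != NULL);
	dev->reg_state = MODEL_NETREG_REGISTERED;
}
```

In this model, calling model_notify() before model_finish_registration()
sends nothing, so a listener never sees a device it cannot yet query.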

Kalle Valo May 3, 2011, 2:38 a.m. UTC | #2
David Miller <davem@davemloft.net> writes:

> From: Kalle Valo <kvalo@adurom.com>
> Date: Fri, 29 Apr 2011 20:26:34 +0300
>
>> From: Kalle Valo <kalle.valo@atheros.com>
>> 
>> There's a race in register_netdevice so that the rtnl event is sent before
>> the device is actually ready. This was visible with flimflam, chrome os
>> connection manager:

[...]

>> The fix is to call netdev_register_kobject() after the device is added
>> to the list.
>> 
>> Signed-off-by: Kalle Valo <kalle.valo@atheros.com>
>
> This is not correct.
>
> If you move the kobject registry around, you have to change the
> error handling cleanup to match.
>
> This change will leave the netdevice on all sorts of lists, it will
> also leak a reference to the device.
>
> I also think this points to a fundamental problem with this change, in
> that you can't register the kobject after the device is added to
> the various lists in list_netdevice().
>
> Once it's in those lists, any thread of control can find the device
> and those threads of control may try to get at the data backed by
> the kobject and therefore they really expect it to be there by
> then.
>
> What you can do instead is try to delay the NETREG_REGISTERED
> setting, and block the problematic notifications by testing
> reg_state or similar.

Thanks for the review. I'll investigate this further and send v2
once I have found a better solution.

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 956d3b0..f2afbe6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5425,11 +5425,6 @@  int register_netdevice(struct net_device *dev)
 	if (ret)
 		goto err_uninit;
 
-	ret = netdev_register_kobject(dev);
-	if (ret)
-		goto err_uninit;
-	dev->reg_state = NETREG_REGISTERED;
-
 	netdev_update_features(dev);
 
 	/*
@@ -5443,6 +5438,11 @@  int register_netdevice(struct net_device *dev)
 	dev_hold(dev);
 	list_netdevice(dev);
 
+	ret = netdev_register_kobject(dev);
+	if (ret)
+		goto err_uninit;
+	dev->reg_state = NETREG_REGISTERED;
+
 	/* Notify protocols, that a new device appeared. */
 	ret = call_netdevice_notifiers(NETDEV_REGISTER, dev);
 	ret = notifier_to_errno(ret);