Patchwork [net-next,v4] netpoll: fix a rtnl lock assertion failure

Submitter Amerigo Wang
Date Jan. 17, 2013, 3:30 a.m.
Message ID <1358393418.3855.3.camel@cr0>
Permalink /patch/213134/
State RFC
Delegated to: David Miller

Comments

Amerigo Wang - Jan. 17, 2013, 3:30 a.m.
On Wed, 2013-01-16 at 17:24 -0800, Eric Dumazet wrote:
> On Tue, 2013-01-15 at 17:34 +0800, Cong Wang wrote:
> > From: Cong Wang <amwang@redhat.com>
> > 
> > v4: hold rtnl lock for the whole netpoll_setup()
> > v3: remove the comment
> > v2: use RCU read lock
> > 
> > This patch fixes the following warning:
> > 
> > [   72.013864] RTNL: assertion failed at net/core/dev.c (4955)
> > [   72.017758] Pid: 668, comm: netpoll-prep-v6 Not tainted 3.8.0-rc1+ #474
> > [   72.019582] Call Trace:
> > [   72.020295]  [<ffffffff8176653d>] netdev_master_upper_dev_get+0x35/0x58
> > [   72.022545]  [<ffffffff81784edd>] netpoll_setup+0x61/0x340
> > [   72.024846]  [<ffffffff815d837e>] store_enabled+0x82/0xc3
> > [   72.027466]  [<ffffffff815d7e51>] netconsole_target_attr_store+0x35/0x37
> > [   72.029348]  [<ffffffff811c3479>] configfs_write_file+0xe2/0x10c
> > [   72.030959]  [<ffffffff8115d239>] vfs_write+0xaf/0xf6
> > [   72.032359]  [<ffffffff81978a05>] ? sysret_check+0x22/0x5d
> > [   72.033824]  [<ffffffff8115d453>] sys_write+0x5c/0x84
> > [   72.035328]  [<ffffffff819789d9>] system_call_fastpath+0x16/0x1b
> > 
> > In case of other races, hold rtnl lock for the entire netpoll_setup() function.
> > 
> > Cc: Eric Dumazet <eric.dumazet@gmail.com>
> > Cc: Jiri Pirko <jiri@resnulli.us>
> > Cc: David S. Miller <davem@davemloft.net>
> > Signed-off-by: Cong Wang <amwang@redhat.com>
> > ---
> > diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> 
> ...
> 
> >  	if (np->dev_name)
> > -		ndev = dev_get_by_name(&init_net, np->dev_name);
> > +		ndev = __dev_get_by_name(&init_net, np->dev_name);
> 
> This change brings interesting bugs.

Hmm, I didn't realize __dev_get_by_name() doesn't hold the device, so
just call dev_hold() after this?



> 
> All the "goto put;" are basically wrong, and the section waiting for the
> carrier and releasing/getting rtnl is buggy.

Either we have to sleep for a few seconds with the rtnl lock held, or
leave it as it is. The original code doesn't hold the rtnl lock there either.

Thanks!

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Jan. 17, 2013, 3:54 a.m.
From: Cong Wang <amwang@redhat.com>
Date: Thu, 17 Jan 2013 11:30:18 +0800

> On Wed, 2013-01-16 at 17:24 -0800, Eric Dumazet wrote:
>> On Tue, 2013-01-15 at 17:34 +0800, Cong Wang wrote:
>> >  	if (np->dev_name)
>> > -		ndev = dev_get_by_name(&init_net, np->dev_name);
>> > +		ndev = __dev_get_by_name(&init_net, np->dev_name);
>> 
>> This change brings interesting bugs.
> 
> Hmm, I didn't realize __dev_get_by_name() doesn't hold the device, so
> just call dev_hold() after this?

Why not just... call dev_get_by_name()?  It doesn't hurt to over-RCU
lock.

Amerigo Wang - Jan. 17, 2013, 4:18 a.m.
On Wed, 2013-01-16 at 22:54 -0500, David Miller wrote:
> From: Cong Wang <amwang@redhat.com>
> Date: Thu, 17 Jan 2013 11:30:18 +0800
> 
> > On Wed, 2013-01-16 at 17:24 -0800, Eric Dumazet wrote:
> >> On Tue, 2013-01-15 at 17:34 +0800, Cong Wang wrote:
> >> >  	if (np->dev_name)
> >> > -		ndev = dev_get_by_name(&init_net, np->dev_name);
> >> > +		ndev = __dev_get_by_name(&init_net, np->dev_name);
> >> 
> >> This change brings interesting bugs.
> > 
> > Hmm, I didn't realize __dev_get_by_name() doesn't hold the device, so
> > just call dev_hold() after this?
> 
> Why not just... call dev_get_by_name()?  It doesn't hurt to over-RCU
> lock.
> 

Just that taking RCU read lock while having rtnl lock is unnecessary, no
other reason.


Eric Dumazet - Jan. 17, 2013, 4:53 a.m.
On Thu, 2013-01-17 at 12:18 +0800, Cong Wang wrote:

> > Why not just... call dev_get_by_name()?  It doesn't hurt to over-RCU
> > lock.
> > 
> 
> Just that taking RCU read lock while having rtnl lock is unnecessary, no
> other reason.

Calling dev_get_by_name() would be just fine, and would generate less
code.

It's not a fast path...

Anyway David already applied your patch.
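
Editorial sketch: the two options discussed above (dev_get_by_name()
versus __dev_get_by_name() followed by dev_hold()) differ only in who takes
the device reference. The following is a user-space model of that reference
counting, not kernel code. All names prefixed with model_ are hypothetical
stand-ins invented for illustration, and the model ignores RCU and the rtnl
lock entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct net_device: a name and a refcount. */
struct model_dev {
	const char *name;
	int refcnt;
};

/* Hypothetical device table standing in for the namespace's device list. */
struct model_dev model_devs[] = {
	{ "eth0", 0 },
	{ "eth1", 0 },
};

/* Models dev_hold(): take one reference. */
void model_dev_hold(struct model_dev *dev)
{
	dev->refcnt++;
}

/* Models dev_put(): drop one reference. */
void model_dev_put(struct model_dev *dev)
{
	assert(dev->refcnt > 0);
	dev->refcnt--;
}

/* Models __dev_get_by_name(): plain lookup, no reference is taken. */
struct model_dev *model_lookup_nohold(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(model_devs) / sizeof(model_devs[0]); i++)
		if (strcmp(model_devs[i].name, name) == 0)
			return &model_devs[i];
	return NULL;
}

/* Models dev_get_by_name(): lookup that returns with a reference held. */
struct model_dev *model_lookup_hold(const char *name)
{
	struct model_dev *dev = model_lookup_nohold(name);

	if (dev)
		model_dev_hold(dev);
	return dev;
}

/*
 * Models the start of a setup path. With use_hold_lookup == 0 the caller
 * takes the reference itself, as the dev_hold() hunk in the patch does.
 * fail_later simulates a later error that must drop the reference.
 */
int model_setup(const char *name, int use_hold_lookup, int fail_later)
{
	struct model_dev *dev;

	if (use_hold_lookup) {
		dev = model_lookup_hold(name);
	} else {
		dev = model_lookup_nohold(name);
		if (dev)
			model_dev_hold(dev);
	}
	if (!dev)
		return -ENODEV;

	if (fail_later) {
		model_dev_put(dev);	/* error path: balance the hold */
		return -EINVAL;
	}
	return 0;
}
```

In this model both lookup variants leave the device with exactly one extra
reference on success and none after a failed setup, which is the point made
in the thread: once the hold is added, the remaining difference is only
whether an RCU read-side section is also taken during the lookup.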



Patch

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index a5ad1c1..a9b1004 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -1056,6 +1056,7 @@  int netpoll_setup(struct netpoll *np)
                err = -ENODEV;
                goto unlock;
        }
+       dev_hold(ndev);
 
        if (netdev_master_upper_dev_get(ndev)) {
                np_err(np, "%s is a slave device, aborting\n",