Message ID | 1376987338.13829.7.camel@jlt4.sipsolutions.net |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Tue, Aug 20, 2013 at 10:28:58AM +0200, Johannes Berg wrote:
> Something like the patch below, perhaps? Completely untested so far.
Yeah, this one seems to fix it here (I was seeing the same lockdep splat
as Hugh).
Thanks.
On Tue, 20 Aug 2013, Borislav Petkov wrote: > On Tue, Aug 20, 2013 at 10:28:58AM +0200, Johannes Berg wrote: > > Something like the patch below, perhaps? Completely untested so far. > > Yeah, this one seems to fix it here (I was seeing the same lockdep splat > as Hugh). Not so good for me: it fixed the original splat, but a few seconds later: [ 4.073542] e1000e 0000:00:19.0: irq 43 for MSI/MSI-X [ 4.175849] e1000e 0000:00:19.0: irq 43 for MSI/MSI-X [ 4.176223] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready [ 4.182322] iwlwifi 0000:03:00.0: L1 Enabled; Disabling L0S [ 4.182537] iwlwifi 0000:03:00.0: Radio type=0x0-0x3-0x1 [ 4.405766] iwlwifi 0000:03:00.0: L1 Enabled; Disabling L0S [ 4.405973] iwlwifi 0000:03:00.0: Radio type=0x0-0x3-0x1 [ 4.504441] IPv6: ADDRCONF(NETDEV_UP): wlan1: link is not ready [ 6.204569] input: PS/2 Generic Mouse as /devices/platform/i8042/serio1/serio2/input/input8 [ 7.662343] [ 7.662361] ====================================================== [ 7.662393] [ INFO: possible circular locking dependency detected ] [ 7.662426] 3.11.0-rc6 #5 Not tainted [ 7.662462] ------------------------------------------------------- [ 7.662500] wpa_supplicant/418 is trying to acquire lock: [ 7.662533] (nlk->cb_mutex){+.+.+.}, at: [<ffffffff81480122>] __netlink_dump_start+0xae/0x14e [ 7.662603] [ 7.662603] but task is already holding lock: [ 7.662638] (genl_mutex){+.+.+.}, at: [<ffffffff8148204d>] genl_lock+0x12/0x14 [ 7.662695] [ 7.662695] which lock already depends on the new lock. [ 7.662695] [ 7.662743] [ 7.662743] the existing dependency chain (in reverse order) is: [ 7.662788] [ 7.662788] -> #1 (genl_mutex){+.+.+.}: [ 7.662827] [<ffffffff810b34d2>] __lock_acquire+0x865/0x956 [ 7.662870] [<ffffffff810b39fc>] lock_acquire+0x57/0x6d [ 7.662909] [<ffffffff81583e52>] mutex_lock_nested+0x5e/0x345 [ 7.662951] [<ffffffff8148204d>] genl_lock+0x12/0x14 [ 7.662989] [<ffffffff814822cc>] ctrl_dumpfamily+0x2b/0xea [ 7.663029] [<ffffffff8147f1b4>] netlink_dump+0x88/0x1d7 [ 7.663069] [<ffffffff81480187>] __netlink_dump_start+0x113/0x14e [ 7.663113] [<ffffffff81482143>] genl_rcv_msg+0xf4/0x252 [ 7.663152] [<ffffffff81481742>] netlink_rcv_skb+0x3e/0x8c [ 7.663192] [<ffffffff8148199b>] genl_rcv+0x24/0x34 [ 7.663228] [<ffffffff814811ca>] netlink_unicast+0xed/0x17a [ 7.663270] [<ffffffff814815d4>] netlink_sendmsg+0x2fb/0x345 [ 7.663311] [<ffffffff814503f7>] sock_sendmsg+0x79/0x8e [ 7.663351] [<ffffffff81450707>] ___sys_sendmsg+0x231/0x2be [ 7.663391] [<ffffffff81453228>] __sys_sendmsg+0x3d/0x5e [ 7.663431] [<ffffffff81453256>] SyS_sendmsg+0xd/0x19 [ 7.663469] [<ffffffff81587c12>] system_call_fastpath+0x16/0x1b [ 7.663513] [ 7.663513] -> #0 (nlk->cb_mutex){+.+.+.}: [ 7.663554] [<ffffffff810b1fb0>] validate_chain.isra.21+0x836/0xe8e [ 7.663599] [<ffffffff810b34d2>] __lock_acquire+0x865/0x956 [ 7.663640] [<ffffffff810b39fc>] lock_acquire+0x57/0x6d [ 7.663679] [<ffffffff81583e52>] mutex_lock_nested+0x5e/0x345 [ 7.663722] [<ffffffff81480122>] __netlink_dump_start+0xae/0x14e [ 7.663765] [<ffffffff81482143>] genl_rcv_msg+0xf4/0x252 [ 7.663804] [<ffffffff81481742>] netlink_rcv_skb+0x3e/0x8c [ 7.663844] [<ffffffff8148199b>] genl_rcv+0x24/0x34 [ 7.663881] [<ffffffff814811ca>] netlink_unicast+0xed/0x17a [ 7.663921] [<ffffffff814815d4>] netlink_sendmsg+0x2fb/0x345 [ 7.663961] [<ffffffff814503f7>] sock_sendmsg+0x79/0x8e [ 7.664000] [<ffffffff81450707>] ___sys_sendmsg+0x231/0x2be [ 7.664041] [<ffffffff81453228>] __sys_sendmsg+0x3d/0x5e [ 7.664081] [<ffffffff81453256>] SyS_sendmsg+0xd/0x19 [ 7.664119] [<ffffffff81587c12>] system_call_fastpath+0x16/0x1b [ 7.664161] [ 7.664161] other info that might help us debug this: [ 7.664161] [ 7.666543] Possible unsafe locking scenario: [ 7.666543] [ 7.668877] CPU0 CPU1 [ 7.670032] ---- ---- [ 7.671165] lock(genl_mutex); [ 7.672298] lock(nlk->cb_mutex); [ 7.673424] lock(genl_mutex); [ 7.674532] lock(nlk->cb_mutex); [ 7.675607] [ 7.675607] *** DEADLOCK *** [ 7.675607] [ 7.678696] 2 locks held by wpa_supplicant/418: [ 7.679704] #0: (cb_lock){++++++}, at: [<ffffffff8148198c>] genl_rcv+0x15/0x34 [ 7.680772] #1: (genl_mutex){+.+.+.}, at: [<ffffffff8148204d>] genl_lock+0x12/0x14 [ 7.681840] [ 7.681840] stack backtrace: [ 7.683933] CPU: 0 PID: 418 Comm: wpa_supplicant Not tainted 3.11.0-rc6 #5 [ 7.685022] Hardware name: LENOVO 4174EH1/4174EH1, BIOS 8CET51WW (1.31 ) 11/29/2011 [ 7.686118] ffffffff81cc8750 ffff88022b8117b8 ffffffff8157cf90 0000000000000006 [ 7.687230] ffffffff81d0a450 ffff88022b811808 ffffffff8157a8a8 0000000000000001 [ 7.688347] ffff880230d0a080 ffff880230d0a080 ffff880230d0a778 ffff880230d0a080 [ 7.689461] Call Trace: [ 7.690551] [<ffffffff8157cf90>] dump_stack+0x4f/0x84 [ 7.691657] [<ffffffff8157a8a8>] print_circular_bug+0x2ad/0x2be [ 7.692775] [<ffffffff810b1fb0>] validate_chain.isra.21+0x836/0xe8e [ 7.693893] [<ffffffff810b34d2>] __lock_acquire+0x865/0x956 [ 7.695012] [<ffffffff81480122>] ? __netlink_dump_start+0xae/0x14e [ 7.696137] [<ffffffff810b39fc>] lock_acquire+0x57/0x6d [ 7.697275] [<ffffffff81480122>] ? __netlink_dump_start+0xae/0x14e [ 7.698404] [<ffffffff81583e52>] mutex_lock_nested+0x5e/0x345 [ 7.699541] [<ffffffff81480122>] ? __netlink_dump_start+0xae/0x14e [ 7.700696] [<ffffffff81586c32>] ? _raw_read_unlock+0x2d/0x4a [ 7.701840] [<ffffffff81480122>] __netlink_dump_start+0xae/0x14e [ 7.702991] [<ffffffff81482143>] genl_rcv_msg+0xf4/0x252 [ 7.704143] [<ffffffff81536380>] ? nl80211_dump_mpath+0x10d/0x10d [ 7.705317] [<ffffffff8148204f>] ? genl_lock+0x14/0x14 [ 7.706464] [<ffffffff81481742>] netlink_rcv_skb+0x3e/0x8c [ 7.707614] [<ffffffff8148199b>] genl_rcv+0x24/0x34 [ 7.708756] [<ffffffff814811ca>] netlink_unicast+0xed/0x17a [ 7.709890] [<ffffffff814815d4>] netlink_sendmsg+0x2fb/0x345 [ 7.711018] [<ffffffff814503f7>] sock_sendmsg+0x79/0x8e [ 7.712144] [<ffffffff810f86fa>] ? might_fault+0x52/0xa2 [ 7.713280] [<ffffffff81450707>] ___sys_sendmsg+0x231/0x2be [ 7.714400] [<ffffffff810fd37b>] ? handle_mm_fault+0x47d/0x49d [ 7.715525] [<ffffffff81087aa5>] ? up_read+0x1b/0x32 [ 7.716663] [<ffffffff81053c34>] ? __do_page_fault+0x370/0x414 [ 7.717789] [<ffffffff811488e4>] ? fget_light+0x115/0x377 [ 7.718922] [<ffffffff810f86fa>] ? might_fault+0x52/0xa2 [ 7.720052] [<ffffffff81453228>] __sys_sendmsg+0x3d/0x5e [ 7.721192] [<ffffffff81453256>] SyS_sendmsg+0xd/0x19 [ 7.722309] [<ffffffff81587c12>] system_call_fastpath+0x16/0x1b [ 11.044520] wlan1: authenticate with c0:3f:0e:ad:ff:ee [ 11.096583] wlan1: send auth to c0:3f:0e:ad:ff:ee (try 1/3) [ 11.099448] wlan1: authenticated [ 11.100813] wlan1: associate with c0:3f:0e:ad:ff:ee (try 1/3) [ 11.105771] wlan1: RX AssocResp from c0:3f:0e:ad:ff:ee (capab=0x411 status=0 aid=6) [ 11.114801] IPv6: ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready [ 11.115884] wlan1: associated -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2013-08-20 at 10:28 +0200, Johannes Berg wrote: > The only way to fix this that I see right now (that doesn't rewrite the > locking completely) would be to make genetlink use parallel_ops itself, > thereby removing the genl_lock() in genl_rcv_msg() and breaking all > those lock chains that lockdep reported. After that, it should be safe > to use genl_lock() inside all the operations. Something like the patch > below, perhaps? Completely untested so far. Tested now, and it still causes lockdep to complain, though that's a lockdep issue I believe, it thinks that genl_mutex and nlk->cb_mutex can be inverted although nlk->cb_mutex exists per family, so we need to annotate lockdep there. johannes -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2013-08-20 at 21:02 +0200, Johannes Berg wrote: > On Tue, 2013-08-20 at 10:28 +0200, Johannes Berg wrote: > > > The only way to fix this that I see right now (that doesn't rewrite the > > locking completely) would be to make genetlink use parallel_ops itself, > > thereby removing the genl_lock() in genl_rcv_msg() and breaking all > > those lock chains that lockdep reported. After that, it should be safe > > to use genl_lock() inside all the operations. Something like the patch > > below, perhaps? Completely untested so far. > > Tested now, and it still causes lockdep to complain, though that's a > lockdep issue I believe, it thinks that genl_mutex and nlk->cb_mutex can > be inverted although nlk->cb_mutex exists per family, so we need to > annotate lockdep there. No, lockdep is correct - generic netlink uses the same cb_mutex for all families, obviously, since it's all the same netlink family. I'll just convert it to RCU. johannes -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c index f85f8a2..dea9586 100644 --- a/net/netlink/genetlink.c +++ b/net/netlink/genetlink.c @@ -665,6 +665,7 @@ static struct genl_family genl_ctrl = { .version = 0x2, .maxattr = CTRL_ATTR_MAX, .netnsok = true, + .parallel_ops = true, }; static int ctrl_fill_info(struct genl_family *family, u32 portid, u32 seq, @@ -789,10 +790,8 @@ static int ctrl_dumpfamily(struct sk_buff *skb, struct netlink_callback *cb) struct net *net = sock_net(skb->sk); int chains_to_skip = cb->args[0]; int fams_to_skip = cb->args[1]; - bool need_locking = chains_to_skip || fams_to_skip; - if (need_locking) - genl_lock(); + genl_lock(); for (i = chains_to_skip; i < GENL_FAM_TAB_SIZE; i++) { n = 0; @@ -814,8 +813,7 @@ errout: cb->args[0] = i; cb->args[1] = n; - if (need_locking) - genl_unlock(); + genl_unlock(); return skb->len; } @@ -870,6 +868,8 @@ static int ctrl_getfamily(struct sk_buff *skb, struct genl_info *info) struct genl_family *res = NULL; int err = -EINVAL; + genl_lock(); + if (info->attrs[CTRL_ATTR_FAMILY_ID]) { u16 id = nla_get_u16(info->attrs[CTRL_ATTR_FAMILY_ID]); res = genl_family_find_byid(id); @@ -896,19 +896,25 @@ static int ctrl_getfamily(struct sk_buff *skb, struct genl_info *info) } if (res == NULL) - return err; + goto out_unlock; if (!res->netnsok && !net_eq(genl_info_net(info), &init_net)) { /* family doesn't exist here */ - return -ENOENT; + err = -ENOENT; + goto out_unlock; } msg = ctrl_build_family_msg(res, info->snd_portid, info->snd_seq, CTRL_CMD_NEWFAMILY); - if (IS_ERR(msg)) - return PTR_ERR(msg); + if (IS_ERR(msg)) { + err = PTR_ERR(msg); + goto out_unlock; + } - return genlmsg_reply(msg, info); + err = genlmsg_reply(msg, info); +out_unlock: + genl_unlock(); + return err; } static int genl_ctrl_event(int event, void *data)