Message ID | 1465430127-22993-2-git-send-email-edumazet@google.com |
---|---|
State | Superseded, archived |
Delegated to: | David Miller |
On 6/8/16 5:55 PM, Eric Dumazet wrote:
> In case a qdisc is used on a vrf device, we need to use different
> lockdep classes to avoid false positives.
>
> Fixes: f9eb8aea2a1e ("net_sched: transform qdisc running bit into a seqcount")
> Reported-by: David Ahern <dsa@cumulusnetworks.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  drivers/net/vrf.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
> index 1b214ea4619a..ee6bc1c5c1ce 100644
> --- a/drivers/net/vrf.c
> +++ b/drivers/net/vrf.c
> @@ -657,7 +657,7 @@ static int vrf_dev_init(struct net_device *dev)
>
>  	/* similarly, oper state is irrelevant; set to up to avoid confusion */
>  	dev->operstate = IF_OPER_UP;
> -
> +	netdev_lockdep_set_classes(dev);
>  	return 0;
>
>  out_rth:
>

Still see the problem; all 4 patches applied, make clean followed by a build just to make sure.

[ 90.956522]
[ 90.956952] ======================================================
[ 90.958441] [ INFO: possible circular locking dependency detected ]
[ 90.959820] 4.7.0-rc1+ #271 Not tainted
[ 90.960672] -------------------------------------------------------
[ 90.962051] ping/1585 is trying to acquire lock:
[ 90.962997] (&(&list->lock)->rlock#3){+.-...}, at: [<ffffffff8140827c>] __dev_queue_xmit+0x3d3/0x751
[ 90.965033]
[ 90.965033] but task is already holding lock:
[ 90.966200] (dev->qdisc_running_key ?: &qdisc_running_key#2){+.....}, at: [<ffffffff814082d1>] __dev_queue_xmit+0x428/0x751
[ 90.968591]
[ 90.968591] which lock already depends on the new lock.
[ 90.968591]
[ 90.970014]
[ 90.970014] the existing dependency chain (in reverse order) is:
[ 90.971287]
-> #1 (dev->qdisc_running_key ?: &qdisc_running_key#2){+.....}:
[ 90.972611] [<ffffffff810853b7>] __lock_acquire+0x5e4/0x690
[ 90.973712] [<ffffffff81085a32>] lock_acquire+0x140/0x1d8
[ 90.974668] [<ffffffff8140259d>] write_seqcount_begin+0x21/0x24
[ 90.975673] [<ffffffff814082d1>] __dev_queue_xmit+0x428/0x751
[ 90.976608] [<ffffffff81408605>] dev_queue_xmit+0xb/0xd
[ 90.977484] [<ffffffff81411f40>] neigh_resolve_output+0x113/0x12e
[ 90.978497] [<ffffffff81397881>] dst_neigh_output.isra.20+0x13b/0x148
[ 90.979548] [<ffffffff81397992>] vrf_finish_output+0x104/0x139
[ 90.980504] [<ffffffff81397b5d>] vrf_output+0x5c/0xc1
[ 90.981365] [<ffffffff81441ae3>] dst_output+0x2b/0x30
[ 90.982240] [<ffffffff8144355a>] ip_local_out+0x2a/0x31
[ 90.983109] [<ffffffff814443cb>] ip_send_skb+0x14/0x38
[ 90.983974] [<ffffffff8144441d>] ip_push_pending_frames+0x2e/0x31
[ 90.984887] [<ffffffff814650ca>] raw_sendmsg+0x788/0x9d2
[ 90.985658] [<ffffffff81472438>] inet_sendmsg+0x35/0x5c
[ 90.986452] [<ffffffff813ed311>] sock_sendmsg_nosec+0x12/0x1d
[ 90.987275] [<ffffffff813ee1c5>] ___sys_sendmsg+0x1b1/0x21f
[ 90.988094] [<ffffffff813ee4ea>] __sys_sendmsg+0x40/0x5e
[ 90.988882] [<ffffffff813ee51c>] SyS_sendmsg+0x14/0x16
[ 90.989647] [<ffffffff814f69bc>] entry_SYSCALL_64_fastpath+0x1f/0xbd
[ 90.990590]
-> #0 (&(&list->lock)->rlock#3){+.-...}:
[ 90.991368] [<ffffffff8108435a>] validate_chain.isra.37+0x7c8/0xa5b
[ 90.992268] [<ffffffff810853b7>] __lock_acquire+0x5e4/0x690
[ 90.993114] [<ffffffff81085a32>] lock_acquire+0x140/0x1d8
[ 90.993955] [<ffffffff814f5ee6>] _raw_spin_lock+0x2f/0x65
[ 90.994751] [<ffffffff8140827c>] __dev_queue_xmit+0x3d3/0x751
[ 90.995574] [<ffffffff81408605>] dev_queue_xmit+0xb/0xd
[ 90.996338] [<ffffffff8146ac86>] arp_xmit+0x32/0x7e
[ 90.997062] [<ffffffff8146ad0e>] arp_send_dst+0x3c/0x42
[ 90.997834] [<ffffffff8146af8f>] arp_solicit+0x27b/0x28e
[ 90.998631] [<ffffffff8140f08f>] neigh_probe+0x4a/0x5e
[ 90.999407] [<ffffffff8141066f>] __neigh_event_send+0x1d0/0x21b
[ 91.000264] [<ffffffff814106e5>] neigh_event_send+0x2b/0x2d
[ 91.001075] [<ffffffff81411e45>] neigh_resolve_output+0x18/0x12e
[ 91.001956] [<ffffffff814421e8>] ip_finish_output2+0x3c0/0x41c
[ 91.002803] [<ffffffff81442a7b>] ip_finish_output+0x132/0x13e
[ 91.003630] [<ffffffff81442aa8>] NF_HOOK_COND.constprop.43+0x21/0x8a
[ 91.004550] [<ffffffff81443c19>] ip_output+0x65/0x6a
[ 91.005293] [<ffffffff81441ae3>] dst_output+0x2b/0x30
[ 91.006066] [<ffffffff8144355a>] ip_local_out+0x2a/0x31
[ 91.006821] [<ffffffff813985f1>] vrf_xmit+0x1f6/0x417
[ 91.007554] [<ffffffff81407ba8>] dev_hard_start_xmit+0x154/0x337
[ 91.008410] [<ffffffff81427f15>] sch_direct_xmit+0x98/0x16c
[ 91.009211] [<ffffffff814082fc>] __dev_queue_xmit+0x453/0x751
[ 91.010043] [<ffffffff81408605>] dev_queue_xmit+0xb/0xd
[ 91.010806] [<ffffffff81411f40>] neigh_resolve_output+0x113/0x12e
[ 91.011690] [<ffffffff81397881>] dst_neigh_output.isra.20+0x13b/0x148
[ 91.012600] [<ffffffff81397992>] vrf_finish_output+0x104/0x139
[ 91.013431] [<ffffffff81397b5d>] vrf_output+0x5c/0xc1
[ 91.014172] [<ffffffff81441ae3>] dst_output+0x2b/0x30
[ 91.014906] [<ffffffff8144355a>] ip_local_out+0x2a/0x31
[ 91.015659] [<ffffffff814443cb>] ip_send_skb+0x14/0x38
[ 91.016398] [<ffffffff8144441d>] ip_push_pending_frames+0x2e/0x31
[ 91.017259] [<ffffffff814650ca>] raw_sendmsg+0x788/0x9d2
[ 91.018033] [<ffffffff81472438>] inet_sendmsg+0x35/0x5c
[ 91.018797] [<ffffffff813ed311>] sock_sendmsg_nosec+0x12/0x1d
[ 91.019615] [<ffffffff813ee1c5>] ___sys_sendmsg+0x1b1/0x21f
[ 91.020417] [<ffffffff813ee4ea>] __sys_sendmsg+0x40/0x5e
[ 91.021180] [<ffffffff813ee51c>] SyS_sendmsg+0x14/0x16
[ 91.021927] [<ffffffff814f69bc>] entry_SYSCALL_64_fastpath+0x1f/0xbd
[ 91.022820]
[ 91.022820] other info that might help us debug this:
[ 91.022820]
[ 91.023848] Possible unsafe locking scenario:
[ 91.023848]
[ 91.024595]        CPU0                    CPU1
[ 91.025173]        ----                    ----
[ 91.025751]   lock(dev->qdisc_running_key ?: &qdisc_running_key#2);
[ 91.026600]                                lock(&(&list->lock)->rlock#3);
[ 91.027505]                                lock(dev->qdisc_running_key ?: &qdisc_running_key#2);
[ 91.028657]   lock(&(&list->lock)->rlock#3);
[ 91.029257]
[ 91.029257] *** DEADLOCK ***
[ 91.029257]
[ 91.030008] 6 locks held by ping/1585:
[ 91.030485] #0: (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff8146506b>] raw_sendmsg+0x729/0x9d2
[ 91.031643] #1: (rcu_read_lock_bh){......}, at: [<ffffffff81397076>] rcu_lock_acquire+0x0/0x20
[ 91.032834] #2: (rcu_read_lock_bh){......}, at: [<ffffffff813ffc04>] rcu_lock_acquire+0x0/0x20
[ 91.034040] #3: (dev->qdisc_running_key ?: &qdisc_running_key#2){+.....}, at: [<ffffffff814082d1>] __dev_queue_xmit+0x428/0x751
[ 91.035622] #4: (rcu_read_lock_bh){......}, at: [<ffffffff814418d3>] rcu_lock_acquire+0x0/0x20
[ 91.036823] #5: (rcu_read_lock_bh){......}, at: [<ffffffff813ffc04>] rcu_lock_acquire+0x0/0x20
[ 91.038016]
[ 91.038016] stack backtrace:
[ 91.038574] CPU: 6 PID: 1585 Comm: ping Not tainted 4.7.0-rc1+ #271
[ 91.039366] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 91.040634] 0000000000000000 ffff8800b84c72d0 ffffffff8127e395 ffffffff8254b8c0
[ 91.041632] ffffffff8254b8c0 ffff8800b84c7310 ffffffff81083461 ffff8800b9b5e240
[ 91.042645] ffff8800b9b5ea60 0000000000000004 ffff8800b9b5ea98 ffff8800b9b5e240
[ 91.043645] Call Trace:
[ 91.043968] [<ffffffff8127e395>] dump_stack+0x81/0xb6
[ 91.044620] [<ffffffff81083461>] print_circular_bug+0x1f6/0x204
[ 91.045377] [<ffffffff8108435a>] validate_chain.isra.37+0x7c8/0xa5b
[ 91.046193] [<ffffffff8101c65c>] ? paravirt_sched_clock+0x9/0xd
[ 91.046951] [<ffffffff810853b7>] __lock_acquire+0x5e4/0x690
[ 91.047666] [<ffffffff810853b7>] ? __lock_acquire+0x5e4/0x690
[ 91.048404] [<ffffffff81085a32>] lock_acquire+0x140/0x1d8
[ 91.049104] [<ffffffff8140827c>] ? __dev_queue_xmit+0x3d3/0x751
[ 91.049868] [<ffffffff814f5ee6>] _raw_spin_lock+0x2f/0x65
[ 91.050564] [<ffffffff8140827c>] ? __dev_queue_xmit+0x3d3/0x751
[ 91.051322] [<ffffffff8140827c>] __dev_queue_xmit+0x3d3/0x751
[ 91.052061] [<ffffffff813f5620>] ? __alloc_skb+0xae/0x19c
[ 91.052760] [<ffffffff81408605>] dev_queue_xmit+0xb/0xd
[ 91.053431] [<ffffffff8146ac86>] arp_xmit+0x32/0x7e
[ 91.054069] [<ffffffff8146ad0e>] arp_send_dst+0x3c/0x42
[ 91.054741] [<ffffffff8146af8f>] arp_solicit+0x27b/0x28e
[ 91.055423] [<ffffffff8140f08f>] neigh_probe+0x4a/0x5e
[ 91.056097] [<ffffffff8141066f>] __neigh_event_send+0x1d0/0x21b
[ 91.056853] [<ffffffff814106e5>] neigh_event_send+0x2b/0x2d
[ 91.057569] [<ffffffff81411e45>] neigh_resolve_output+0x18/0x12e
[ 91.058341] [<ffffffff814421e8>] ip_finish_output2+0x3c0/0x41c
[ 91.059087] [<ffffffff81442a7b>] ip_finish_output+0x132/0x13e
[ 91.059820] [<ffffffff81442aa8>] NF_HOOK_COND.constprop.43+0x21/0x8a
[ 91.060634] [<ffffffff81434c61>] ? rcu_read_unlock+0x5d/0x5f
[ 91.061362] [<ffffffff81434e96>] ? nf_hook_slow+0x94/0x9e
[ 91.062061] [<ffffffff81443c19>] ip_output+0x65/0x6a
[ 91.062703] [<ffffffff81441ae3>] dst_output+0x2b/0x30
[ 91.063350] [<ffffffff8144355a>] ip_local_out+0x2a/0x31
[ 91.064025] [<ffffffff813985f1>] vrf_xmit+0x1f6/0x417
[ 91.064675] [<ffffffff81081976>] ? __lock_is_held+0x38/0x50
[ 91.065393] [<ffffffff81407ba8>] dev_hard_start_xmit+0x154/0x337
[ 91.066170] [<ffffffff81427f15>] sch_direct_xmit+0x98/0x16c
[ 91.066916] [<ffffffff814082fc>] __dev_queue_xmit+0x453/0x751
[ 91.067663] [<ffffffff81427011>] ? eth_header+0x27/0xaf
[ 91.068353] [<ffffffff81408605>] dev_queue_xmit+0xb/0xd
[ 91.069031] [<ffffffff81411f40>] neigh_resolve_output+0x113/0x12e
[ 91.069834] [<ffffffff81397881>] dst_neigh_output.isra.20+0x13b/0x148
[ 91.070655] [<ffffffff81397992>] vrf_finish_output+0x104/0x139
[ 91.071400] [<ffffffff81397b5d>] vrf_output+0x5c/0xc1
[ 91.072055] [<ffffffff81443525>] ? __ip_local_out+0x9e/0xa9
[ 91.072769] [<ffffffff81441ae3>] dst_output+0x2b/0x30
[ 91.073419] [<ffffffff8144355a>] ip_local_out+0x2a/0x31
[ 91.074093] [<ffffffff814443cb>] ip_send_skb+0x14/0x38
[ 91.074756] [<ffffffff8144441d>] ip_push_pending_frames+0x2e/0x31
[ 91.075533] [<ffffffff814650ca>] raw_sendmsg+0x788/0x9d2
[ 91.076224] [<ffffffff8101c65c>] ? paravirt_sched_clock+0x9/0xd
[ 91.076984] [<ffffffff81030008>] ? native_cpu_up+0x214/0x7c1
[ 91.077710] [<ffffffff81085cd9>] ? lock_release+0x20f/0x4c4
[ 91.078446] [<ffffffff81472438>] inet_sendmsg+0x35/0x5c
[ 91.079135] [<ffffffff813ed311>] sock_sendmsg_nosec+0x12/0x1d
[ 91.079873] [<ffffffff813ee1c5>] ___sys_sendmsg+0x1b1/0x21f
[ 91.080607] [<ffffffff81085cd9>] ? lock_release+0x20f/0x4c4
[ 91.081333] [<ffffffff814f421b>] ? __mutex_unlock_slowpath+0x152/0x15f
[ 91.082191] [<ffffffff814f4231>] ? mutex_unlock+0x9/0xb
[ 91.082872] [<ffffffff81085cd9>] ? lock_release+0x20f/0x4c4
[ 91.083601] [<ffffffff81173107>] ? __fget_light+0x48/0x6f
[ 91.084306] [<ffffffff813ee4ea>] __sys_sendmsg+0x40/0x5e
[ 91.084995] [<ffffffff813ee4ea>] ? __sys_sendmsg+0x40/0x5e
[ 91.085704] [<ffffffff813ee51c>] SyS_sendmsg+0x14/0x16
[ 91.086371] [<ffffffff814f69bc>] entry_SYSCALL_64_fastpath+0x1f/0xbd
[ 91.087206] [<ffffffff81081420>] ? trace_hardirqs_off_caller+0xbc/0x122
On Wed, Jun 8, 2016 at 6:26 PM, David Ahern <dsa@cumulusnetworks.com> wrote:
> On 6/8/16 5:55 PM, Eric Dumazet wrote:
>> [...]
>
> Still see the problem; all 4 patches applied, make clean followed by a build
> just to make sure.
>
> [...]

For this one, it looks vrf misses the _xmit_lock lockdep support.

We might need to factorize the code found for example in
bond_set_lockdep_class_one()

Have you run lockdep before ? Strange that these issues were not
spotted earlier.
On 6/8/16 7:43 PM, Eric Dumazet wrote:
> For this one, it looks vrf misses the _xmit_lock lockdep support.
>
> We might need to factorize the code found for example in
> bond_set_lockdep_class_one()
>
> Have you run lockdep before ? Strange that these issues were not
> spotted earlier.

Standard config for non-performance builds has the locking and rcu debugs enabled. You posted the patches on Jun 6; this morning was the first build with them and first run hit it.
On Wed, Jun 8, 2016 at 6:49 PM, David Ahern <dsa@cumulusnetworks.com> wrote:
> On 6/8/16 7:43 PM, Eric Dumazet wrote:
>>
>> For this one, it looks vrf misses the _xmit_lock lockdep support.
>>
>> We might need to factorize the code found for example in
>> bond_set_lockdep_class_one()
>>
>> Have you run lockdep before ? Strange that these issues were not
>> spotted earlier.
>
>
> Standard config for non-performance builds has the locking and rcu debugs
> enabled. You posted the patches on Jun 6; this morning was the first build
> with them and first run hit it.

Then you were simply avoiding the deadlock detection for some reason.

Two ->_xmit_more with same class would detect an issue.

This has nothing to do with my patches.

(Note they are all false positives... of course)
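The point that locks sharing one lockdep class can produce a false report may be easier to see with a small user-space model. This is only a sketch and not lockdep's implementation: `struct validator`, the class ids and `stacked_xmit_reports_cycle()` are hypothetical names introduced here, and the acquisition order is a simplified reading of the trace above (root lock, then the running seqcount, then the lower device's root lock while the upper running seqcount is still held).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 8
#define MAX_HELD    8

/* Toy validator: it tracks lock *classes*, never individual lock instances. */
struct validator {
	bool dep[MAX_CLASSES][MAX_CLASSES]; /* dep[a][b]: b was taken while a was held */
	int held[MAX_HELD];
	int nheld;
};

/* Is class 'to' reachable from class 'from' along recorded edges? */
static bool reachable(const struct validator *v, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int n = 0; n < MAX_CLASSES; n++)
		if (v->dep[from][n] && !seen[n] && reachable(v, n, to, seen))
			return true;
	return false;
}

/* Record the acquisition; return true if it closes a cycle between classes. */
static bool acquire(struct validator *v, int cls)
{
	bool report = false;

	assert(cls >= 0 && cls < MAX_CLASSES && v->nheld < MAX_HELD);
	for (int i = 0; i < v->nheld; i++) {
		bool seen[MAX_CLASSES] = { false };
		int h = v->held[i];

		/* The new edge is h -> cls; it is a cycle if cls already leads to h. */
		if (reachable(v, cls, h, seen))
			report = true;
		v->dep[h][cls] = true;
	}
	v->held[v->nheld++] = cls;
	return report;
}

/* Drop the most recent acquisition of 'cls' from the held list. */
static void release(struct validator *v, int cls)
{
	for (int i = v->nheld - 1; i >= 0; i--) {
		if (v->held[i] == cls) {
			memmove(&v->held[i], &v->held[i + 1],
				(size_t)(v->nheld - i - 1) * sizeof(v->held[0]));
			v->nheld--;
			return;
		}
	}
	assert(!"releasing a class that is not held");
}

/*
 * Replay a stacked transmit: the upper device's locks first, then the lower
 * device's. With share_classes set, both devices use the same class ids.
 */
bool stacked_xmit_reports_cycle(bool share_classes)
{
	enum { UPPER_ROOT, UPPER_RUNNING, LOWER_ROOT, LOWER_RUNNING };
	int lower_root = share_classes ? UPPER_ROOT : LOWER_ROOT;
	int lower_running = share_classes ? UPPER_RUNNING : LOWER_RUNNING;
	struct validator v;
	bool report = false;

	memset(&v, 0, sizeof(v));
	report |= acquire(&v, UPPER_ROOT);    /* upper qdisc root lock */
	report |= acquire(&v, UPPER_RUNNING); /* upper running seqcount */
	release(&v, UPPER_ROOT);              /* dropped before the driver xmit */
	report |= acquire(&v, lower_root);    /* nested xmit on the lower device */
	report |= acquire(&v, lower_running);
	return report;
}
```

In this model `stacked_xmit_reports_cycle(true)` returns true and `stacked_xmit_reports_cycle(false)` returns false: the locks involved are distinct instances in both runs, and only the shared class ids make the dependency graph look circular.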
From: Eric Dumazet <edumazet@google.com>
Date: Wed, 8 Jun 2016 16:55:25 -0700

> In case a qdisc is used on a vrf device, we need to use different
> lockdep classes to avoid false positives.
>
> Fixes: f9eb8aea2a1e ("net_sched: transform qdisc running bit into a seqcount")
> Reported-by: David Ahern <dsa@cumulusnetworks.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 1b214ea4619a..ee6bc1c5c1ce 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -657,7 +657,7 @@ static int vrf_dev_init(struct net_device *dev)
 
 	/* similarly, oper state is irrelevant; set to up to avoid confusion */
 	dev->operstate = IF_OPER_UP;
-
+	netdev_lockdep_set_classes(dev);
 	return 0;
 
 out_rth:
In case a qdisc is used on a vrf device, we need to use different
lockdep classes to avoid false positives.

Fixes: f9eb8aea2a1e ("net_sched: transform qdisc running bit into a seqcount")
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/vrf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)