Message ID | 1368447851.13473.119.camel@edumazet-glaptop |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
Hi Thank you for superfast answer :) It is changed a bit, but not disappeared. Relevant part from new dmesg: [ 3.936191] ================================= [ 3.936279] [ INFO: inconsistent lock state ] [ 3.936367] 3.9.2-build-0064 #6 Not tainted [ 3.936453] --------------------------------- [ 3.936540] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. [ 3.936630] rmmod/1843 [HC0[0]:SC0[0]:HE1:SE1] takes: [ 3.936718] (&(&app->lock)->rlock#2){+.?...}, at: [<f862bb5b>] mrp_uninit_applicant+0x69/0xba [mrp] [ 3.936967] {IN-SOFTIRQ-W} state was registered at: [ 3.937055] [<c01611d8>] __lock_acquire+0x284/0xd52 [ 3.937058] [<c0162068>] lock_acquire+0x71/0x85 [ 3.937058] [<c03e4cae>] _raw_spin_lock_bh+0x38/0x45 [ 3.937058] [<f862b5dc>] mrp_join_timer+0x11/0x3d [mrp] [ 3.937058] [<c0136102>] call_timer_fn+0x69/0xdc [ 3.937058] [<c01362e6>] run_timer_softirq+0x171/0x1ab [ 3.937058] [<c0131963>] __do_softirq+0x98/0x153 [ 3.937058] [<c0131afa>] irq_exit+0x47/0x82 [ 3.937058] [<c011c274>] smp_apic_timer_interrupt+0x64/0x71 [ 3.937058] [<c03e5616>] apic_timer_interrupt+0x32/0x38 [ 3.937058] [<c03e3464>] mutex_lock_nested+0x20/0x22 [ 3.937058] [<c019a272>] unlink_file_vma+0x25/0x75 [ 3.937058] [<c0195dfc>] free_pgtables+0x55/0x99 [ 3.937058] [<c019b4a2>] exit_mmap+0x83/0xd5 [ 3.937058] [<c0129fe7>] mmput+0x4c/0xc4 [ 3.937058] [<c013004d>] do_exit+0x2ac/0x730 [ 3.937058] [<c0130615>] do_group_exit+0x60/0x83 [ 3.937058] [<c013064b>] sys_exit_group+0x13/0x17 [ 3.937058] [<c03e5334>] syscall_call+0x7/0xb [ 3.937058] irq event stamp: 4683 [ 3.937058] hardirqs last enabled at (4683): [<c03e50bb>] _raw_spin_unlock_irqrestore+0x36/0x3c [ 3.937058] hardirqs last disabled at (4682): [<c03e4b92>] _raw_spin_lock_irqsave+0x16/0x50 [ 3.937058] softirqs last enabled at (4646): [<c0379c22>] neigh_ifdown+0x95/0xb5 [ 3.937058] softirqs last disabled at (4644): [<c03e4cc9>] _raw_write_lock_bh+0xe/0x45 [ 3.937058] [ 3.937058] other info that might help us debug this: [ 3.937058] Possible unsafe locking scenario: [ 3.937058] [ 3.937058] CPU0 [ 3.937058] ---- [ 3.937058] lock(&(&app->lock)->rlock#2); [ 3.937058] <Interrupt> [ 3.937058] lock(&(&app->lock)->rlock#2); [ 3.937058] [ 3.937058] *** DEADLOCK *** [ 3.937058] [ 3.937058] 3 locks held by rmmod/1843: [ 3.937058] #0: (&__lockdep_no_validate__){......}, at: [<c02c195c>] device_lock+0xd/0xf [ 3.937058] #1: (&__lockdep_no_validate__){......}, at: [<c02c195c>] device_lock+0xd/0xf [ 3.937058] #2: (rtnl_mutex){+.+.+.}, at: [<c037d71b>] rtnl_lock+0xf/0x11 [ 3.937058] [ 3.937058] stack backtrace: [ 3.937058] Pid: 1843, comm: rmmod Not tainted 3.9.2-build-0064 #6 [ 3.937058] Call Trace: [ 3.937058] [<c0160d8e>] valid_state+0x1f6/0x201 [ 3.937058] [<c0160e6a>] mark_lock+0xd1/0x1bb [ 3.937058] [<c01606b1>] ? check_usage_forwards+0x77/0x77 [ 3.937058] [<c016124c>] __lock_acquire+0x2f8/0xd52 [ 3.937058] [<c0160d00>] ? valid_state+0x168/0x201 [ 3.937058] [<c0160dbf>] ? mark_lock+0x26/0x1bb [ 3.937058] [<c016230e>] ? mark_held_locks+0x5d/0x7b [ 3.937058] [<c03e50bb>] ? _raw_spin_unlock_irqrestore+0x36/0x3c [ 3.937058] [<c0162068>] lock_acquire+0x71/0x85 [ 3.937058] [<f862bb5b>] ? mrp_uninit_applicant+0x69/0xba [mrp] [ 3.937058] [<c03e4ad5>] _raw_spin_lock+0x33/0x40 [ 3.937058] [<f862bb5b>] ? mrp_uninit_applicant+0x69/0xba [mrp] [ 3.937058] [<f862bb5b>] mrp_uninit_applicant+0x69/0xba [mrp] [ 3.937058] [<c0172e67>] ? kfree_call_rcu+0x17/0x19 [ 3.937058] [<f8647bf6>] vlan_mvrp_uninit_applicant+0xd/0xf [8021q] [ 3.937058] [<f8646198>] unregister_vlan_dev+0xb8/0xe2 [8021q] [ 3.937058] [<c03d1ae8>] ? packet_mm_close+0x1f/0x1f [ 3.937058] [<f86464c9>] vlan_device_event+0x307/0x35a [8021q] [ 3.937058] [<c03d1ce7>] ? rcu_read_unlock+0x17/0x19 [ 3.937058] [<c0146215>] notifier_call_chain+0x26/0x48 [ 3.937058] [<c0146283>] raw_notifier_call_chain+0x1a/0x1c [ 3.937058] [<c0370310>] call_netdevice_notifiers+0x44/0x4b [ 3.937058] [<c0370604>] rollback_registered_many+0xe2/0x185 [ 3.937058] [<c037070d>] rollback_registered+0x1f/0x2c [ 3.937058] [<c0370777>] unregister_netdevice_queue+0x5d/0x87 [ 3.937058] [<c03e3464>] ? mutex_lock_nested+0x20/0x22 [ 3.937058] [<c037d71b>] ? rtnl_lock+0xf/0x11 [ 3.937058] [<c0370851>] unregister_netdev+0x18/0x1f [ 3.937058] [<f85cb017>] rtl_remove_one+0x7f/0xf9 [r8169] [ 3.937058] [<c02c9117>] ? __pm_runtime_resume+0x48/0x50 [ 3.937058] [<c026cea9>] pci_device_remove+0x27/0x76 [ 3.937058] [<c02c1b70>] __device_release_driver+0x66/0xaa [ 3.937058] [<c02c20a3>] driver_detach+0x62/0x83 [ 3.937058] [<c02c1919>] bus_remove_driver+0x69/0x88 [ 3.937058] [<c02c23be>] driver_unregister+0x53/0x5a [ 3.937058] [<c016243a>] ? trace_hardirqs_on_caller+0x10e/0x13f [ 3.937058] [<c026cf87>] pci_unregister_driver+0x10/0x5a [ 3.937058] [<f85cd8fb>] rtl8169_pci_driver_exit+0xd/0xf [r8169] [ 3.937058] [<c01688cf>] sys_delete_module+0x175/0x1c1 [ 3.937058] [<c03e5367>] ? restore_all+0xf/0xf [ 3.937058] [<c03e5334>] syscall_call+0x7/0xb On 2013-05-13 15:24, Eric Dumazet wrote: > From: Eric Dumazet <edumazet@google.com> > > On Mon, 2013-05-13 at 14:59 +0300, Denys Fedoryshchenko wrote: > Hi > > Noticed on 32-builds of my systems that r8169 is not working anymore at > all. > Problem started to occur somewhere after 3.6 kernels, but it is > difficult to try other kernels on this semi-embedded system, and i am > not completely sure about that. > Once i used another OS build, but 64-bit build with same latest kernel, > and it worked. > It spits netdev watchdog timeouts, and constantly shows interface > coming > up. > > Here is max debug output(dmesg) i got, with some inline comments: > > > Here i am trying to remove module > > [ 19.735147] ================================= > [ 19.735235] [ INFO: inconsistent lock state ] > [ 19.735324] 3.9.2-build-0063 #4 Not tainted > [ 19.735412] --------------------------------- > [ 19.735500] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. > [ 19.735592] rmmod/1840 [HC0[0]:SC0[0]:HE1:SE1] takes: > [ 19.735682] (&(&app->lock)->rlock#2){+.?...}, at: [<f862bb5b>] > mrp_uninit_applicant+0x69/0xba [mrp] > [ 19.735932] {IN-SOFTIRQ-W} state was registered at: > [ 19.736002] [<c01611d8>] __lock_acquire+0x284/0xd52 > [ 19.736002] [<c0162068>] lock_acquire+0x71/0x85 > [ 19.736002] [<c03e4ad5>] _raw_spin_lock+0x33/0x40 > [ 19.736002] [<f862b5dc>] mrp_join_timer+0x11/0x3d [mrp] > [ 19.736002] [<c0136102>] call_timer_fn+0x69/0xdc > [ 19.736002] [<c01362e6>] run_timer_softirq+0x171/0x1ab > [ 19.736002] [<c0131963>] __do_softirq+0x98/0x153 > [ 19.736002] [<c0131afa>] irq_exit+0x47/0x82 > [ 19.736002] [<c011c274>] smp_apic_timer_interrupt+0x64/0x71 > [ 19.736002] [<c03e5616>] apic_timer_interrupt+0x32/0x38 > [ 19.736002] [<c010902c>] cpu_idle+0x55/0x6f > [ 19.736002] [<c03d7d19>] rest_init+0xa1/0xa7 > [ 19.736002] [<c057792f>] start_kernel+0x300/0x307 > [ 19.736002] [<c05772a3>] i386_start_kernel+0x79/0x7d > [ 19.736002] irq event stamp: 4379 > [ 19.736002] hardirqs last enabled at (4379): [<c03e50bb>] > _raw_spin_unlock_irqrestore+0x36/0x3c > [ 19.736002] hardirqs last disabled at (4378): [<c03e4b92>] > _raw_spin_lock_irqsave+0x16/0x50 > [ 19.736002] softirqs last enabled at (4342): [<c0379c22>] > neigh_ifdown+0x95/0xb5 > [ 19.736002] softirqs last disabled at (4340): [<c03e4cc9>] > _raw_write_lock_bh+0xe/0x45 > [ 19.736002] > [ 19.736002] other info that might help us debug this: > [ 19.736002] Possible unsafe locking scenario: > [ 19.736002] > [ 19.736002] CPU0 > [ 19.736002] ---- > [ 19.736002] lock(&(&app->lock)->rlock#2); > [ 19.736002] <Interrupt> > [ 19.736002] lock(&(&app->lock)->rlock#2); > [ 19.736002] > [ 19.736002] *** DEADLOCK *** > [ 19.736002] > [ 19.736002] 3 locks held by rmmod/1840: > [ 19.736002] #0: (&__lockdep_no_validate__){......}, at: > [<c02c195c>] device_lock+0xd/0xf > [ 19.736002] #1: (&__lockdep_no_validate__){......}, at: > [<c02c195c>] device_lock+0xd/0xf > [ 19.736002] #2: (rtnl_mutex){+.+.+.}, at: [<c037d71b>] > rtnl_lock+0xf/0x11 > [ 19.736002] > [ 19.736002] stack backtrace: > [ 19.736002] Pid: 1840, comm: rmmod Not tainted 3.9.2-build-0063 #4 > [ 19.736002] Call Trace: > [ 19.736002] [<c0160d8e>] valid_state+0x1f6/0x201 > [ 19.736002] [<c0160e6a>] mark_lock+0xd1/0x1bb > [ 19.736002] [<c01606b1>] ? check_usage_forwards+0x77/0x77 > [ 19.736002] [<c016124c>] __lock_acquire+0x2f8/0xd52 > [ 19.736002] [<c0160d00>] ? valid_state+0x168/0x201 > [ 19.736002] [<c0160dbf>] ? mark_lock+0x26/0x1bb > [ 19.736002] [<c016230e>] ? mark_held_locks+0x5d/0x7b > [ 19.736002] [<c03e50bb>] ? _raw_spin_unlock_irqrestore+0x36/0x3c > [ 19.736002] [<c0162068>] lock_acquire+0x71/0x85 > [ 19.736002] [<f862bb5b>] ? mrp_uninit_applicant+0x69/0xba [mrp] > [ 19.736002] [<c03e4ad5>] _raw_spin_lock+0x33/0x40 > [ 19.736002] [<f862bb5b>] ? mrp_uninit_applicant+0x69/0xba [mrp] > [ 19.736002] [<f862bb5b>] mrp_uninit_applicant+0x69/0xba [mrp] > [ 19.736002] [<c0172e67>] ? kfree_call_rcu+0x17/0x19 > [ 19.736002] [<f8647bf6>] vlan_mvrp_uninit_applicant+0xd/0xf [8021q] > [ 19.736002] [<f8646198>] unregister_vlan_dev+0xb8/0xe2 [8021q] > [ 19.736002] [<c03d1ae8>] ? packet_mm_close+0x1f/0x1f > [ 19.736002] [<f86464c9>] vlan_device_event+0x307/0x35a [8021q] > [ 19.736002] [<c03d1ce7>] ? rcu_read_unlock+0x17/0x19 > [ 19.736002] [<c0146215>] notifier_call_chain+0x26/0x48 > [ 19.736002] [<c0146283>] raw_notifier_call_chain+0x1a/0x1c > [ 19.736002] [<c0370310>] call_netdevice_notifiers+0x44/0x4b > [ 19.736002] [<c0370604>] rollback_registered_many+0xe2/0x185 > [ 19.736002] [<c037070d>] rollback_registered+0x1f/0x2c > [ 19.736002] [<c0370777>] unregister_netdevice_queue+0x5d/0x87 > [ 19.736002] [<c03e3464>] ? mutex_lock_nested+0x20/0x22 > [ 19.736002] [<c037d71b>] ? rtnl_lock+0xf/0x11 > [ 19.736002] [<c0370851>] unregister_netdev+0x18/0x1f > [ 19.736002] [<f85cb017>] rtl_remove_one+0x7f/0xf9 [r8169] > [ 19.736002] [<c02c9117>] ? __pm_runtime_resume+0x48/0x50 > [ 19.736002] [<c026cea9>] pci_device_remove+0x27/0x76 > [ 19.736002] [<c02c1b70>] __device_release_driver+0x66/0xaa > [ 19.736002] [<c02c20a3>] driver_detach+0x62/0x83 > [ 19.736002] [<c02c1919>] bus_remove_driver+0x69/0x88 > [ 19.736002] [<c02c23be>] driver_unregister+0x53/0x5a > [ 19.736002] [<c016243a>] ? trace_hardirqs_on_caller+0x10e/0x13f > [ 19.736002] [<c026cf87>] pci_unregister_driver+0x10/0x5a > [ 19.736002] [<f85cd8fb>] rtl8169_pci_driver_exit+0xd/0xf [r8169] > [ 19.736002] [<c01688cf>] sys_delete_module+0x175/0x1c1 > [ 19.736002] [<c03e5367>] ? restore_all+0xf/0xf > [ 19.736002] [<c03e5334>] syscall_call+0x7/0xb > > Thanks for the report. Could you please try the following fix for the > module unload problem. > > [PATCH] net/802/mrp: fix lockdep splat > > commit fb745e9a037895 ("net/802/mrp: fix possible race condition when > calling mrp_pdu_queue()") introduced a lockdep splat. > > [ 19.735147] ================================= > [ 19.735235] [ INFO: inconsistent lock state ] > [ 19.735324] 3.9.2-build-0063 #4 Not tainted > [ 19.735412] --------------------------------- > [ 19.735500] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. > [ 19.735592] rmmod/1840 [HC0[0]:SC0[0]:HE1:SE1] takes: > [ 19.735682] (&(&app->lock)->rlock#2){+.?...}, at: [<f862bb5b>] > mrp_uninit_applicant+0x69/0xba [mrp] > > app->lock is normally taken under softirq context, so disable BH to > avoid the splat. > > Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: David Ward <david.ward@ll.mit.edu> > Cc: Cong Wang <amwang@redhat.com> > --- > net/802/mrp.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/net/802/mrp.c b/net/802/mrp.c > index e085bcc..1eb05d8 100644 > --- a/net/802/mrp.c > +++ b/net/802/mrp.c > @@ -871,10 +871,10 @@ void mrp_uninit_applicant(struct net_device > *dev, struct mrp_application *appl) > */ > del_timer_sync(&app->join_timer); > > - spin_lock(&app->lock); > + spin_lock_bh(&app->lock); > mrp_mad_event(app, MRP_EVENT_TX); > mrp_pdu_queue(app); > - spin_unlock(&app->lock); > + spin_unlock_bh(&app->lock); > > mrp_queue_xmit(app); --- Denys Fedoryshchenko, Network Engineer, Virtual ISP S.A.L. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2013-05-13 at 05:24 -0700, Eric Dumazet wrote: > del_timer_sync(&app->join_timer); > > - spin_lock(&app->lock); > + spin_lock_bh(&app->lock); But we just have removed the join timer before taking this spin lock, so it is not possible to race with timer, but we still have race condition with mrp_rcv() which is called in BH context too. However, this is not what Denys reported, so I have no idea on this. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2013-05-14 at 09:12 +0800, Cong Wang wrote: > On Mon, 2013-05-13 at 05:24 -0700, Eric Dumazet wrote: > > del_timer_sync(&app->join_timer); > > > > - spin_lock(&app->lock); > > + spin_lock_bh(&app->lock); > > But we just have removed the join timer before taking this spin lock, so > it is not possible to race with timer, but we still have race condition > with mrp_rcv() which is called in BH context too. However, this is not > what Denys reported, so I have no idea on this. > Thats a lockdep splat. lockdep cannot know join timer wont fire again. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Denys Fedoryshchenko <denys@visp.net.lb> Date: Mon, 13 May 2013 15:53:30 +0300 > Thank you for superfast answer :) > It is changed a bit, but not disappeared. Relevant part from new > dmesg: Can you double check that you really had Eric's patch applied? lockdep appears to be complaining about the same thing in your log dump, as if the patch was not really applied. It's saying that app->lock can be taken from the join timer in softirq, but mrp_uninit_applicant() takes it without disabling softirqs. Eric's patch explicitly fixes this, by making sure that spin_lock_bh() is used there. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2013-05-14 at 12:07 -0700, David Miller wrote: > Can you double check that you really had Eric's patch applied? > lockdep appears to be complaining about the same thing in your > log dump, as if the patch was not really applied. > > It's saying that app->lock can be taken from the join timer in > softirq, but mrp_uninit_applicant() takes it without disabling > softirqs. > > Eric's patch explicitly fixes this, by making sure that spin_lock_bh() > is used there. I am going to test this myself, it seems quite simple. I'll add a Tested-by: tag once done. Thanks -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2013-05-14 at 12:21 -0700, Eric Dumazet wrote: > On Tue, 2013-05-14 at 12:07 -0700, David Miller wrote: > > > Can you double check that you really had Eric's patch applied? > > lockdep appears to be complaining about the same thing in your > > log dump, as if the patch was not really applied. > > > > It's saying that app->lock can be taken from the join timer in > > softirq, but mrp_uninit_applicant() takes it without disabling > > softirqs. > > > > Eric's patch explicitly fixes this, by making sure that spin_lock_bh() > > is used there. > > I am going to test this myself, it seems quite simple. > I'll add a Tested-by: tag once done. I definitely could trigger the bug easily without my patch : [ 296.523421] 8021q: 802.1Q VLAN Support v1.8 [ 296.523478] 8021q: adding VLAN 0 to HW filter on device eth4 [ 322.070382] [ 322.071882] ================================= [ 322.076234] [ INFO: inconsistent lock state ] [ 322.080588] 3.10.0-dbg-DEV #383 Not tainted [ 322.084769] --------------------------------- [ 322.089122] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. [ 322.095124] ip/19081 [HC0[0]:SC0[0]:HE1:SE1] takes: [ 322.099995] (&(&app->lock)->rlock){+.?...}, at: [<ffffffffa0240804>] mrp_uninit_applicant+0x54/0x110 [mrp] [ 322.109805] {IN-SOFTIRQ-W} state was registered at: [ 322.114676] [<ffffffff810dbcb7>] __lock_acquire+0x747/0x1c50 [ 322.120531] [<ffffffff810dd7c7>] lock_acquire+0x97/0x150 [ 322.126028] [<ffffffff815fc5d1>] _raw_spin_lock+0x41/0x50 [ 322.131622] [<ffffffffa02408de>] mrp_join_timer+0x1e/0x70 [mrp] [ 322.137727] [<ffffffff81096c3f>] call_timer_fn+0x7f/0x1e0 [ 322.143323] [<ffffffff81097a94>] run_timer_softirq+0x214/0x2d0 [ 322.149340] [<ffffffff8108f6a5>] __do_softirq+0xe5/0x2c0 [ 322.154838] [<ffffffff8108fa1e>] irq_exit+0x9e/0xc0 [ 322.159903] [<ffffffff816073ee>] smp_apic_timer_interrupt+0x6e/0x99 [ 322.166353] [<ffffffff8160652f>] apic_timer_interrupt+0x6f/0x80 [ 322.172458] [<ffffffff81055f2e>] arch_cpu_idle+0x1e/0x30 [ 322.177966] [<ffffffff810cdb3f>] cpu_startup_entry+0x14f/0x290 [ 322.183992] [<ffffffff815e33e6>] rest_init+0xd6/0xe0 [ 322.189145] [<ffffffff81cf6e49>] start_kernel+0x3c0/0x3cd [ 322.194729] [<ffffffff81cf65a6>] x86_64_start_reservations+0x2a/0x2c [ 322.201267] [<ffffffff81cf668d>] x86_64_start_kernel+0xe5/0xec [ 322.207287] irq event stamp: 2539 [ 322.210598] hardirqs last enabled at (2539): [<ffffffff815fce1f>] _raw_spin_unlock_irqrestore+0x3f/0x70 [ 322.220084] hardirqs last disabled at (2538): [<ffffffff815fc6dc>] _raw_spin_lock_irqsave+0x1c/0x60 [ 322.229136] softirqs last enabled at (2482): [<ffffffffa00082b7>] inet6_fill_ifla6_attrs+0x277/0x2a0 [ipv6] [ 322.238977] softirqs last disabled at (2480): [<ffffffff815fcbc6>] _raw_read_lock_bh+0x16/0x60 [ 322.247599] [ 322.247599] other info that might help us debug this: [ 322.254118] Possible unsafe locking scenario: [ 322.254118] [ 322.260033] CPU0 [ 322.262478] ---- [ 322.264924] lock(&(&app->lock)->rlock); [ 322.268965] <Interrupt> [ 322.271584] lock(&(&app->lock)->rlock); [ 322.275799] [ 322.275799] *** DEADLOCK *** [ 322.275799] [ 322.281714] 1 lock held by ip/19081: [ 322.285286] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff81540787>] rtnl_lock+0x17/0x20 [ 322.293107] [ 322.293107] stack backtrace: [ 322.297461] CPU: 2 PID: 19081 Comm: ip Not tainted 3.10.0-dbg-DEV #383 [ 322.311279] ffffffff82233cc0 ffff880c126e5628 ffffffff815f618c ffff880c126e5688 [ 322.318738] ffffffff815f3c1a 0000000000000000 ffff880c00000001 ffff880c00000001 [ 322.326194] ffffffff8105a70f ffff880c126e56b8 0000000000000004 ffff880c526bacd8 [ 322.333650] Call Trace: [ 322.336098] [<ffffffff815f618c>] dump_stack+0x19/0x1b [ 322.341234] [<ffffffff815f3c1a>] print_usage_bug+0x1fc/0x20d [ 322.346981] [<ffffffff8105a70f>] ? save_stack_trace+0x2f/0x50 [ 322.352808] [<ffffffff810db50c>] mark_lock+0x28c/0x2f0 [ 322.358026] [<ffffffff810daa50>] ? check_usage_forwards+0x150/0x150 [ 322.364375] [<ffffffff810dbd1e>] __lock_acquire+0x7ae/0x1c50 [ 322.370115] [<ffffffff810de024>] ? mark_held_locks+0x74/0x140 [ 322.375951] [<ffffffff81096fa8>] ? lock_timer_base.isra.34+0x38/0x70 [ 322.382396] [<ffffffff810de024>] ? mark_held_locks+0x74/0x140 [ 322.388231] [<ffffffffa0240804>] ? mrp_uninit_applicant+0x54/0x110 [mrp] [ 322.395021] [<ffffffff810dd7c7>] lock_acquire+0x97/0x150 [ 322.400415] [<ffffffffa0240804>] ? mrp_uninit_applicant+0x54/0x110 [mrp] [ 322.407206] [<ffffffff810de2cd>] ? trace_hardirqs_on+0xd/0x10 [ 322.413041] [<ffffffff815fc5d1>] _raw_spin_lock+0x41/0x50 [ 322.418523] [<ffffffffa0240804>] ? mrp_uninit_applicant+0x54/0x110 [mrp] [ 322.425304] [<ffffffffa0240804>] mrp_uninit_applicant+0x54/0x110 [mrp] [ 322.431914] [<ffffffffa0254775>] vlan_mvrp_uninit_applicant+0x15/0x20 [8021q] [ 322.439134] [<ffffffffa02521f8>] unregister_vlan_dev+0x128/0x150 [8021q] [ 322.445914] [<ffffffff815422cc>] rtnl_dellink+0xac/0x110 [ 322.451308] [<ffffffff81543a35>] rtnetlink_rcv_msg+0x165/0x250 [ 322.457221] [<ffffffff815f922c>] ? mutex_lock_nested+0x28c/0x380 [ 322.463308] [<ffffffff81540787>] ? rtnl_lock+0x17/0x20 [ 322.468529] [<ffffffff815438d0>] ? __rtnl_unlock+0x20/0x20 [ 322.474106] [<ffffffff81561f99>] netlink_rcv_skb+0xa9/0xd0 [ 322.479672] [<ffffffff815407b5>] rtnetlink_rcv+0x25/0x40 [ 322.485065] [<ffffffff81561939>] netlink_unicast+0x169/0x1e0 [ 322.490805] [<ffffffff81561c5e>] netlink_sendmsg+0x2ae/0x360 [ 322.496545] [<ffffffff81517382>] sock_sendmsg+0xc2/0xe0 [ 322.501854] [<ffffffff8116ba23>] ? might_fault+0x53/0xb0 [ 322.507247] [<ffffffff8116ba23>] ? might_fault+0x53/0xb0 [ 322.512643] [<ffffffff81517722>] __sys_sendmsg+0x382/0x390 [ 322.518210] [<ffffffff810b36c3>] ? up_read+0x23/0x40 [ 322.523256] [<ffffffff81600e84>] ? __do_page_fault+0x254/0x4d0 [ 322.529170] [<ffffffff81175b9c>] ? do_brk+0x1fc/0x360 [ 322.534305] [<ffffffff8151aa09>] SyS_sendmsg+0x49/0x90 [ 322.539535] [<ffffffff816058c2>] system_call_fastpath+0x16/0x1b And with the patch, there is no lockdep splat anymore. Tested-by: Eric Dumazet <edumazet@google.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2013-05-14 22:07, David Miller wrote: > From: Denys Fedoryshchenko <denys@visp.net.lb> > Date: Mon, 13 May 2013 15:53:30 +0300 > > Thank you for superfast answer :) > It is changed a bit, but not disappeared. Relevant part from new > dmesg: > > Can you double check that you really had Eric's patch applied? > lockdep appears to be complaining about the same thing in your > log dump, as if the patch was not really applied. > > It's saying that app->lock can be taken from the join timer in > softirq, but mrp_uninit_applicant() takes it without disabling > softirqs. > > Eric's patch explicitly fixes this, by making sure that spin_lock_bh() > is used there. I will reapply patch on clean kernel and try again. --- Denys Fedoryshchenko, Network Engineer, Virtual ISP S.A.L. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue, 14 May 2013 12:40:01 -0700 > On Tue, 2013-05-14 at 12:21 -0700, Eric Dumazet wrote: >> On Tue, 2013-05-14 at 12:07 -0700, David Miller wrote: >> >> > Can you double check that you really had Eric's patch applied? >> > lockdep appears to be complaining about the same thing in your >> > log dump, as if the patch was not really applied. >> > >> > It's saying that app->lock can be taken from the join timer in >> > softirq, but mrp_uninit_applicant() takes it without disabling >> > softirqs. >> > >> > Eric's patch explicitly fixes this, by making sure that spin_lock_bh() >> > is used there. >> >> I am going to test this myself, it seems quite simple. >> I'll add a Tested-by: tag once done. > > I definitely could trigger the bug easily without my patch : ... > And with the patch, there is no lockdep splat anymore. > > Tested-by: Eric Dumazet <edumazet@google.com> Applied, thanks Eric. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2013-05-14 22:40, Eric Dumazet wrote: > > I definitely could trigger the bug easily without my patch : > > And with the patch, there is no lockdep splat anymore. > > Tested-by: Eric Dumazet <edumazet@google.com> Yes, it seems after depclean i cannot reproduce anymore this bug. My bad, seems i should recompile kernel completely next time. --- Denys Fedoryshchenko, Network Engineer, Virtual ISP S.A.L. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2013-05-15 at 17:50 +0300, Denys Fedoryshchenko wrote: > On 2013-05-14 22:40, Eric Dumazet wrote: > > > > I definitely could trigger the bug easily without my patch : > > > > > > And with the patch, there is no lockdep splat anymore. > > > > Tested-by: Eric Dumazet <edumazet@google.com> > > Yes, it seems after depclean i cannot reproduce anymore this bug. My > bad, seems i should recompile kernel completely next time. > Thanks for testing again, I was quite mystified by your original results ;) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/15/2013 11:22 AM, Eric Dumazet wrote: > Re: [PATCH] net/802/mrp: fix lockdep splat > > On Wed, 2013-05-15 at 17:50 +0300, Denys Fedoryshchenko wrote: > > On 2013-05-14 22:40, Eric Dumazet wrote: > > > > > > I definitely could trigger the bug easily without my patch : > > > > > > > > > > And with the patch, there is no lockdep splat anymore. > > > > > > Tested-by: Eric Dumazet <edumazet@google.com> > > > > Yes, it seems after depclean i cannot reproduce anymore this bug. My > > bad, seems i should recompile kernel completely next time. > > > > Thanks for testing again, I was quite mystified by your original > results ;) > FYI - I believe GARP (net/802/garp.c) needs the same fix. Thechange that introduced this issue was originally copied from there. David
On 05/15/2013 12:34 PM, Ward, David - 0663 - MITLL wrote: > On 05/15/2013 11:22 AM, Eric Dumazet wrote: >> Re: [PATCH] net/802/mrp: fix lockdep splat >> >> On Wed, 2013-05-15 at 17:50 +0300, Denys Fedoryshchenko wrote: >> > On 2013-05-14 22:40, Eric Dumazet wrote: >> > > >> > > I definitely could trigger the bug easily without my patch : >> > >> > >> > > >> > > And with the patch, there is no lockdep splat anymore. >> > > >> > > Tested-by: Eric Dumazet <edumazet@google.com> >> > >> > Yes, it seems after depclean i cannot reproduce anymore this bug. My >> > bad, seems i should recompile kernel completely next time. >> > >> >> Thanks for testing again, I was quite mystified by your original >> results ;) >> > > FYI - I believe GARP (net/802/garp.c) needs the same fix. Thechange > that introduced this issue was originally copied from there. > > David > My bad... I copied the patch from GARP to MRP wrong... that's what created this problem in the first place. Sorry. David
diff --git a/net/802/mrp.c b/net/802/mrp.c index e085bcc..1eb05d8 100644 --- a/net/802/mrp.c +++ b/net/802/mrp.c @@ -871,10 +871,10 @@ void mrp_uninit_applicant(struct net_device *dev, struct mrp_application *appl) */ del_timer_sync(&app->join_timer); - spin_lock(&app->lock); + spin_lock_bh(&app->lock); mrp_mad_event(app, MRP_EVENT_TX); mrp_pdu_queue(app); - spin_unlock(&app->lock); + spin_unlock_bh(&app->lock); mrp_queue_xmit(app);