Message ID | 20090702093031.GC4954@ff.dom.local |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Thu, Jul 02, 2009 at 09:30:31AM +0000, Jarek Poplawski wrote: > On Thu, Jul 02, 2009 at 02:37:24AM +0200, Andres Freund wrote: > ... > > So I tried - and I did not catch any lockdep output before the crash. > > Unfortunately I do not have another machine on the same local network to > > catch any messages after the crash... So I could be missing some warning > > (I did synchronous logging though). > > Will check with netconsole tomorrow. > > Could you try if this patch changes anything? ...and maybe CONFIG_PACKET_MMAP turned off. Jarek P. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Im getting this one too. On any .31-rc1-gitXX, it pops up a few seconds after i turn traffic shaping on with tc/htb (no netem here). Its easily reproducible on my end, turn on traffic shaping, get traffic flowing, box freezes. No error message, nothing on logs, although -sometimes- i get lockdep error messages on screen, but scrolls too fast and box is otherwise unresponsive, so i can't get the error out. This does not happen on 2.6.30 at all. Hope this helps. (i tried both CONFIG_PACKET_MMAP off and the patch, still happens). Joao Correia Centro de Informatica Universidade da Beira Interior Portugal (snip firewall script. TCBIN is just tc, OUTIF is just the outside interface (gigabit D-Link using r8169 driver)) $TCBIN class add dev $OUTIF parent 1: classid 1:1 htb rate ${UPLINK}kbit ceil ${UPLINK}kbit $TCBIN class add dev $OUTIF parent 1:1 classid 1:10 htb rate $[30*$UPLINK/100]kbit ceil $[30*$UPLINK/100]kbit prio 0 $TCBIN class add dev $OUTIF parent 1:1 classid 1:11 htb rate $[30*$UPLINK/100]kbit ceil ${UPLINK}kbit prio 1 $TCBIN class add dev $OUTIF parent 1:1 classid 1:12 htb rate $[8*$UPLINK/100]kbit ceil ${UPLINK}kbit prio 2 $TCBIN class add dev $OUTIF parent 1:1 classid 1:13 htb rate $[8*$UPLINK/100]kbit ceil ${UPLINK}kbit prio 2 $TCBIN class add dev $OUTIF parent 1:1 classid 1:14 htb rate $[10*$UPLINK/100]kbit ceil ${UPLINK}kbit prio 7 quantum 50000 $TCBIN class add dev $OUTIF parent 1:1 classid 1:15 htb rate $[13*$UPLINK/100]kbit ceil ${UPLINK}kbit prio 8 $TCBIN qdisc add dev $OUTIF parent 1:12 handle 120: sfq perturb 10 $TCBIN qdisc add dev $OUTIF parent 1:13 handle 130: sfq perturb 10 $TCBIN qdisc add dev $OUTIF parent 1:14 handle 140: sfq perturb 10 $TCBIN qdisc add dev $OUTIF parent 1:15 handle 150: sfq perturb 10 $TCBIN filter add dev $OUTIF parent 1:0 protocol ip prio 1 handle 1 fw classid 1:10 $TCBIN filter add dev $OUTIF parent 1:0 protocol ip prio 2 handle 2 fw classid 1:11 $TCBIN filter add dev $OUTIF parent 1:0 protocol ip prio 3 handle 3 fw classid 1:12 $TCBIN filter add dev $OUTIF parent 1:0 protocol ip prio 4 handle 4 fw classid 1:13 $TCBIN filter add dev $OUTIF parent 1:0 protocol ip prio 5 handle 5 fw classid 1:14 $TCBIN filter add dev $OUTIF parent 1:0 protocol ip prio 6 handle 6 fw classid 1:15 (snip firewall script) On Thu, Jul 2, 2009 at 11:12 AM, Jarek Poplawski<jarkao2@gmail.com> wrote: > On Thu, Jul 02, 2009 at 09:30:31AM +0000, Jarek Poplawski wrote: >> On Thu, Jul 02, 2009 at 02:37:24AM +0200, Andres Freund wrote: >> ... >> > So I tried - and I did not catch any lockdep output before the crash. >> > Unfortunately I do not have another machine on the same local network to >> > catch any messages after the crash... So I could be missing some warning >> > (I did synchronous logging though). >> > Will check with netconsole tomorrow. >> >> Could you try if this patch changes anything? > > ...and maybe CONFIG_PACKET_MMAP turned off. > > Jarek P. > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jul 02, 2009 at 11:51:56AM +0100, Joao Correia wrote: > Im getting this one too. On any .31-rc1-gitXX, it pops up a few > seconds after i turn traffic shaping on with tc/htb (no netem here). > Its easily reproducible on my end, turn on traffic shaping, get > traffic flowing, box freezes. No error message, nothing on logs, > although -sometimes- i get lockdep error messages on screen, but > scrolls too fast and box is otherwise unresponsive, so i can't get the > error out. > > This does not happen on 2.6.30 at all. > > Hope this helps. > > (i tried both CONFIG_PACKET_MMAP off and the patch, still happens). It helps for sure - to save on useless checking, and maybe reproduce too. Thank you very much Joao, Jarek P. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/02/2009 12:12 PM, Jarek Poplawski wrote: > On Thu, Jul 02, 2009 at 09:30:31AM +0000, Jarek Poplawski wrote: >> On Thu, Jul 02, 2009 at 02:37:24AM +0200, Andres Freund wrote: >> ... >>> So I tried - and I did not catch any lockdep output before the crash. >>> Unfortunately I do not have another machine on the same local network to >>> catch any messages after the crash... So I could be missing some warning >>> (I did synchronous logging though). >>> Will check with netconsole tomorrow. >> >> Could you try if this patch changes anything? > > ...and maybe CONFIG_PACKET_MMAP turned off. Ok. Removed the skb_orphan and turned of CONFIG_PACKET_MMAP. Seemingly the same game. I now had another computer to catch the netconsole output. Still no lockdep warnings. Unfortunately the other computer was a windows machine with its strange terminal, so long lines are wrapped at 80cols, but that shouldn't be too bad. Andres [ 215.208044] netem: version 1.2 [ 350.040136] BUG: soft lockup - CPU#1 stuck for 61s! [openvpn:4248] [ 350.040136] Modules linked in: sch_netem sbs sbshc pcmcia snd_hda_codec_conex ant yenta_socket rsrc_nonstatic snd_hda_intel snd_hda_codec thinkpad_acpi iwlagn pcmcia_core btusb snd_hwdep ehci_hcd uhci_hcd [ 350.040136] irq event stamp: 149925 [ 350.040136] hardirqs last enabled at (149924): [<ffffffff81036a10>] restore_ args+0x0/0x30 [ 350.040136] hardirqs last disabled at (149925): [<ffffffff81035d3a>] save_arg s+0x6a/0x70 [ 350.040136] softirqs last enabled at (19946): [<ffffffff815528ad>] lock_sock _nested+0x8d/0x130 [ 350.040136] softirqs last disabled at (19952): [<ffffffff815627a8>] dev_queue _xmit+0x58/0x4b0 [ 350.040136] CPU 1: [ 350.040136] Modules linked in: sch_netem sbs sbshc pcmcia snd_hda_codec_conex ant yenta_socket rsrc_nonstatic snd_hda_intel snd_hda_codec thinkpad_acpi iwlagn pcmcia_core btusb snd_hwdep ehci_hcd uhci_hcd [ 350.040136] Pid: 4248, comm: openvpn Not tainted 2.6.31-rc1-andres-00437-gde7 327a-dirty #61 208252G [ 350.040136] RIP: 0010:[<ffffffff8103e276>] [<ffffffff8103e276>] native_read_ tsc+0x6/0x20 [ 350.040136] RSP: 0018:ffff8801271c78b8 EFLAGS: 00000206 [ 350.040136] RAX: 000000000ae20bd8 RBX: ffff8801271c78b8 RCX: 000000000ae20b00 [ 350.040136] RDX: 00000000000000b0 RSI: 0000000000006040 RDI: 0000000000000001 [ 350.040136] RBP: ffffffff81036b6e R08: ffffffff82175180 R09: 0000000000000000 [ 350.040136] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801271c6000 [ 350.040136] R13: 0000000000000000 R14: ffff88002efec400 R15: 0000000000000000 [ 350.040136] FS: 00007f98ddc646f0(0000) GS:ffff88002efde000(0000) knlGS:00000 00000000000 [ 350.040136] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 350.040136] CR2: 0000000005593008 CR3: 000000012742f000 CR4: 00000000000026e0 [ 350.040136] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 350.040136] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 350.040136] Call Trace: [ 350.040136] [<ffffffff812a96c2>] ? delay_tsc+0x22/0x80 [ 350.040136] [<ffffffff812a95da>] ? __delay+0xa/0x10 [ 350.040136] [<ffffffff812addbd>] ? _raw_spin_lock+0xfd/0x170 [ 350.040136] [<ffffffff816e92f1>] ? _spin_lock+0x51/0x70 [ 350.040136] [<ffffffff81562836>] ? dev_queue_xmit+0xe6/0x4b0 [ 350.040136] [<ffffffff81562836>] ? dev_queue_xmit+0xe6/0x4b0 [ 350.040136] [<ffffffff815627a3>] ? dev_queue_xmit+0x53/0x4b0 [ 350.040136] [<ffffffff81594bac>] ? ip_finish_output+0x13c/0x320 [ 350.040136] [<ffffffff81594e0b>] ? ip_output+0x7b/0xd0 [ 350.040136] [<ffffffff81593bf0>] ? ip_local_out+0x20/0x30 [ 350.040136] [<ffffffff815943c5>] ? ip_queue_xmit+0x165/0x3b0 [ 350.040136] [<ffffffff815a8d49>] ? tcp_transmit_skb+0x3e9/0x780 [ 350.040136] [<ffffffff815ab3b7>] ? tcp_write_xmit+0x1e7/0x9d0 [ 350.040136] [<ffffffff815abc0b>] ? __tcp_push_pending_frames+0x2b/0xd0 [ 350.040136] [<ffffffff8159e327>] ? tcp_sendmsg+0x887/0xb90 [ 350.040136] [<ffffffff8154fa86>] ? sock_sendmsg+0x126/0x140 [ 350.040136] [<ffffffff81097b60>] ? autoremove_wake_function+0x0/0x40 [ 350.040136] [<ffffffff81097b60>] ? autoremove_wake_function+0x0/0x40 [ 350.040136] [<ffffffff810ab0e7>] ? mark_held_locks+0x67/0x90 [ 350.040136] [<ffffffff816e90bb>] ? _spin_unlock_irqrestore+0x3b/0x70 [ 350.040136] [<ffffffff815509c0>] ? sys_sendto+0xf0/0x130 [ 350.040136] [<ffffffff810ab3fd>] ? trace_hardirqs_on_caller+0x14d/0x190 [ 350.040136] [<ffffffff810ab44d>] ? trace_hardirqs_on+0xd/0x10 [ 350.040136] [<ffffffff810a1c27>] ? getnstimeofday+0x57/0xe0 [ 350.040136] [<ffffffff8109bbf1>] ? ktime_get_ts+0x51/0x70 [ 350.040136] [<ffffffff81035ec2>] ? system_call_fastpath+0x16/0x1b [ 415.538136] BUG: soft lockup - CPU#1 stuck for 61s! [openvpn:4248] [ 415.538136] Modules linked in: sch_netem sbs sbshc pcmcia snd_hda_codec_conex ant yenta_socket rsrc_nonstatic snd_hda_intel snd_hda_codec thinkpad_acpi iwlagn pcmcia_core btusb snd_hwdep ehci_hcd uhci_hcd [ 415.538136] irq event stamp: 281051 [ 415.538136] hardirqs last enabled at (281050): [<ffffffff81036a10>] restore_ args+0x0/0x30 [ 415.538136] hardirqs last disabled at (281051): [<ffffffff81035d3a>] save_arg s+0x6a/0x70 [ 415.538136] softirqs last enabled at (19946): [<ffffffff815528ad>] lock_sock _nested+0x8d/0x130 [ 415.538136] softirqs last disabled at (19952): [<ffffffff815627a8>] dev_queue _xmit+0x58/0x4b0 [ 415.538136] CPU 1: [ 415.538136] Modules linked in: sch_netem sbs sbshc pcmcia snd_hda_codec_conex ant yenta_socket rsrc_nonstatic snd_hda_intel snd_hda_codec thinkpad_acpi iwlagn pcmcia_core btusb snd_hwdep ehci_hcd uhci_hcd [ 415.538136] Pid: 4248, comm: openvpn Not tainted 2.6.31-rc1-andres-00437-gde7 327a-dirty #61 208252G [ 415.538136] RIP: 0010:[<ffffffff8103e276>] [<ffffffff8103e276>] native_read_ tsc+0x6/0x20 [ 415.538136] RSP: 0018:ffff8801271c78b8 EFLAGS: 00000206 [ 415.538136] RAX: 000000008cf9059c RBX: ffff8801271c78b8 RCX: 000000008cf9050c [ 415.538136] RDX: 00000000000000d4 RSI: 0000000000006040 RDI: 0000000000000001 [ 415.538136] RBP: ffffffff81036b6e R08: ffffffff82175180 R09: 0000000000000000 [ 415.538136] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000003fed [ 415.538136] R13: ffff88002efde000 R14: ffff8801271c6000 R15: 0000000000000000 [ 415.538136] FS: 00007f98ddc646f0(0000) GS:ffff88002efde000(0000) knlGS:00000 00000000000 [ 415.538136] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 415.538136] CR2: 0000000005593008 CR3: 000000012742f000 CR4: 00000000000026e0 [ 415.538136] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 415.538136] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 415.538136] Call Trace: [ 415.538136] [<ffffffff812a96ea>] ? delay_tsc+0x4a/0x80 [ 415.538136] [<ffffffff812a95da>] ? __delay+0xa/0x10 [ 415.538136] [<ffffffff812addbd>] ? _raw_spin_lock+0xfd/0x170 [ 415.538136] [<ffffffff816e92f1>] ? _spin_lock+0x51/0x70 [ 415.538136] [<ffffffff81562836>] ? dev_queue_xmit+0xe6/0x4b0 [ 415.538136] [<ffffffff81562836>] ? dev_queue_xmit+0xe6/0x4b0 [ 415.538136] [<ffffffff815627a3>] ? dev_queue_xmit+0x53/0x4b0 [ 415.538136] [<ffffffff81594bac>] ? ip_finish_output+0x13c/0x320 [ 415.538136] [<ffffffff81594e0b>] ? ip_output+0x7b/0xd0 [ 415.538136] [<ffffffff81593bf0>] ? ip_local_out+0x20/0x30 [ 415.538136] [<ffffffff815943c5>] ? ip_queue_xmit+0x165/0x3b0 [ 415.538136] [<ffffffff815a8d49>] ? tcp_transmit_skb+0x3e9/0x780 [ 415.538136] [<ffffffff815ab3b7>] ? tcp_write_xmit+0x1e7/0x9d0 [ 415.538136] [<ffffffff815abc0b>] ? __tcp_push_pending_frames+0x2b/0xd0 [ 415.538136] [<ffffffff8159e327>] ? tcp_sendmsg+0x887/0xb90 [ 415.538136] [<ffffffff8154fa86>] ? sock_sendmsg+0x126/0x140 [ 415.538136] [<ffffffff81097b60>] ? autoremove_wake_function+0x0/0x40 [ 415.538136] [<ffffffff81097b60>] ? autoremove_wake_function+0x0/0x40 [ 415.538136] [<ffffffff810ab0e7>] ? mark_held_locks+0x67/0x90 [ 415.538136] [<ffffffff816e90bb>] ? _spin_unlock_irqrestore+0x3b/0x70 [ 415.538136] [<ffffffff815509c0>] ? sys_sendto+0xf0/0x130 [ 415.538136] [<ffffffff810ab3fd>] ? trace_hardirqs_on_caller+0x14d/0x190 [ 415.538136] [<ffffffff810ab44d>] ? trace_hardirqs_on+0xd/0x10 [ 415.538136] [<ffffffff810a1c27>] ? getnstimeofday+0x57/0xe0 [ 415.538136] [<ffffffff8109bbf1>] ? ktime_get_ts+0x51/0x70 [ 415.538136] [<ffffffff81035ec2>] ? system_call_fastpath+0x16/0x1b
On Thu, Jul 02, 2009 at 01:11:56PM +0200, Andres Freund wrote: > On 07/02/2009 12:12 PM, Jarek Poplawski wrote: >> On Thu, Jul 02, 2009 at 09:30:31AM +0000, Jarek Poplawski wrote: >>> On Thu, Jul 02, 2009 at 02:37:24AM +0200, Andres Freund wrote: >>> ... >>>> So I tried - and I did not catch any lockdep output before the crash. >>>> Unfortunately I do not have another machine on the same local network to >>>> catch any messages after the crash... So I could be missing some warning >>>> (I did synchronous logging though). >>>> Will check with netconsole tomorrow. >>> >>> Could you try if this patch changes anything? >> >> ...and maybe CONFIG_PACKET_MMAP turned off. > Ok. Removed the skb_orphan and turned of CONFIG_PACKET_MMAP. Seemingly > the same game. > I now had another computer to catch the netconsole output. Still no > lockdep warnings. > > Unfortunately the other computer was a windows machine with its strange > terminal, so long lines are wrapped at 80cols, but that shouldn't be too > bad. It's very nice ;-) Alas we can see here only one side, so there is still some searching needed. I hope I'll be able to reproduce this later. Thanks for your work Andres, Jarek P. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/02/2009 01:11 PM, Andres Freund wrote: > On 07/02/2009 12:12 PM, Jarek Poplawski wrote: >> On Thu, Jul 02, 2009 at 09:30:31AM +0000, Jarek Poplawski wrote: >>> On Thu, Jul 02, 2009 at 02:37:24AM +0200, Andres Freund wrote: >>> ... >>>> So I tried - and I did not catch any lockdep output before the crash. >>>> Unfortunately I do not have another machine on the same local >>>> network to >>>> catch any messages after the crash... So I could be missing some >>>> warning >>>> (I did synchronous logging though). >>>> Will check with netconsole tomorrow. >>> >>> Could you try if this patch changes anything? >> >> ...and maybe CONFIG_PACKET_MMAP turned off. > Ok. Removed the skb_orphan and turned of CONFIG_PACKET_MMAP. Seemingly > the same game. > I now had another computer to catch the netconsole output. Still no > lockdep warnings. > > Unfortunately the other computer was a windows machine with its strange > terminal, so long lines are wrapped at 80cols, but that shouldn't be too > bad. I will start trying to place the issue by testing with existing kernels between 2.6.30 and now. Andres -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jul 02, 2009 at 01:43:49PM +0200, Andres Freund wrote: ... > I will start trying to place the issue by testing with existing kernels > between 2.6.30 and now. If you can afford your time of course this would be very helpful. Thanks, Jarek P. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/02/2009 01:54 PM, Jarek Poplawski wrote: > On Thu, Jul 02, 2009 at 01:43:49PM +0200, Andres Freund wrote: ... >> I will start trying to place the issue by testing with existing >> kernels between 2.6.30 and now. > If you can afford your time of course this would be very helpful. Well. Waiting for the issue to resolve itself would cost time as well ;-) I wont be able to finish this today, but perhaps some reduction of the search space will be enough. I placed it between 2.6.30 and 03347e2592078a90df818670fddf97a33eec70fb (v2.6.30-5415-g03347e2) so far. Andres -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c index 2b88295..6001ad9 100644 --- a/net/sched/sch_netem.c +++ b/net/sched/sch_netem.c @@ -176,8 +176,6 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch) return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; } - skb_orphan(skb); - /* * If we need to duplicate packet, then re-insert at top of the * qdisc tree, since parent queuer expects that only one