| Message ID | CAN-7y0oY=LyeLTxhC5L+Vjy-NivmRJSBpqPP9+NYT7a=NFV0_A@mail.gmail.com |
|---|---|
| State | Rejected, archived |
| Delegated to: | David Miller |
| Headers | show |
On Fri, May 19, 2017 at 10:22 AM, Akshay Narayan <akshayn@mit.edu> wrote: > netlink_broadcast_filtered() calls yield() when a slow listener causes > the buffer to fill. yield() is the wrong choice here, as pointed out by > Commit 8e3fabfde4 (sched: Update yield() docs); in some cases, its use causes > "BUG: scheduling while atomic" and, when fewer cores are available, > kernel hangs: I don't want to defend the use of yield() but it looks like there is other problem. > > (note: "ccp" is a kernel module which multicasts netlink messages upon > certain TCP events) Does this module call netlink_broadcast() with __GFP_DIRECT_RECLAIM in IRQ context? If so you should adjust the gfp flags.
> I don't want to defend the use of yield() but it looks like there is other > problem. I believe this use of yield() should be replaced with cond_resched() even if it turns out there is an unrelated problem. > Does this module call netlink_broadcast() with __GFP_DIRECT_RECLAIM > in IRQ context? If so you should adjust the gfp flags. The module only calls netlink_broadcast() from a pluggable TCP function; from my understanding this is not in the IRQ context. Full trace, perhaps more clear, attached below. May 19 14:30:44 ccp kernel: [ 178.885546] BUG: scheduling while atomic: mm-link/3105/0x00000200 May 19 14:30:44 ccp kernel: [ 178.885552] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat libcrc32c xt_connmark nf_conntrack ccp(OE) crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_intel8x0 pcbc snd_ac97_codec joydev ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi aesni_intel snd_seq aes_x86_64 crypto_simd snd_seq_device snd_timer snd input_leds i2c_piix4 glue_helper cryptd so undcore mac_hid serio_raw vboxvideo ttm drm_kms_helper drm fb_sys_fops syscopyarea sysfillrect sysimgblt vboxguest intel_rapl_perf parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid hid e1000 ahci libahci psmouse fjes pata_acpi video May 19 14:30:44 ccp kernel: [ 178.885665] CPU: 0 PID: 3105 Comm: mm-link Tainted: G W OE 4.10.0-21-generic #23-Ubuntu May 19 14:30:44 ccp kernel: [ 178.885666] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 May 19 14:30:44 ccp kernel: [ 178.885667] Call Trace: May 19 14:30:44 ccp kernel: [ 178.885674] dump_stack+0x63/0x81 May 19 14:30:44 ccp kernel: [ 178.885678] __schedule_bug+0x54/0x70 May 19 14:30:44 ccp kernel: [ 178.885682] __schedule+0x536/0x6f0 May 19 14:30:44 ccp kernel: [ 178.885685] schedule+0x36/0x80 May 19 14:30:44 ccp kernel: [ 178.885687] sys_sched_yield+0x4f/0x60 May 19 14:30:44 ccp kernel: [ 178.885688] yield+0x33/0x40 May 19 14:30:44 ccp kernel: [ 178.885691] netlink_broadcast_filtered+0x29b/0x3c0 May 19 14:30:44 ccp kernel: [ 178.885692] netlink_broadcast+0x1d/0x20 May 19 14:30:44 ccp kernel: [ 178.885697] nl_sendmsg+0xb8/0x664 [ccp] May 19 14:30:44 ccp kernel: [ 178.885699] nl_send_ack_notif+0x7d/0x90 [ccp] May 19 14:30:44 ccp kernel: [ 178.885702] tcp_ccp_cong_avoid+0x69/0x70 [ccp] May 19 14:30:44 ccp kernel: [ 178.885704] tcp_ack+0x980/0xa60 May 19 14:30:44 ccp kernel: [ 178.885708] tcp_rcv_state_process+0x2be/0xda0 May 19 14:30:44 ccp kernel: [ 178.885712] ? security_sock_rcv_skb+0x3b/0x50 May 19 14:30:44 ccp kernel: [ 178.885715] ? sk_filter_trim_cap+0x3b/0x270 May 19 14:30:44 ccp kernel: [ 178.885717] tcp_v4_do_rcv+0xb2/0x200 May 19 14:30:44 ccp kernel: [ 178.885719] tcp_v4_rcv+0x90a/0xa00 May 19 14:30:44 ccp kernel: [ 178.885722] ip_local_deliver_finish+0x96/0x1c0 May 19 14:30:44 ccp kernel: [ 178.885725] ip_local_deliver+0x6f/0xe0 May 19 14:30:44 ccp kernel: [ 178.885727] ? ip_rcv_finish+0x3f0/0x3f0 May 19 14:30:44 ccp kernel: [ 178.885730] ip_rcv_finish+0x118/0x3f0 May 19 14:30:44 ccp kernel: [ 178.885732] ip_rcv+0x282/0x390 May 19 14:30:44 ccp kernel: [ 178.885735] ? inet_del_offload+0x40/0x40 May 19 14:30:44 ccp kernel: [ 178.885737] __netif_receive_skb_core+0x514/0xa40 May 19 14:30:44 ccp kernel: [ 178.885740] ? __check_object_size+0x10/0x1d7 May 19 14:30:44 ccp kernel: [ 178.885742] __netif_receive_skb+0x18/0x60 May 19 14:30:44 ccp kernel: [ 178.885744] netif_receive_skb_internal+0x32/0xa0 May 19 14:30:44 ccp kernel: [ 178.885746] netif_receive_skb+0x1c/0x70 May 19 14:30:44 ccp kernel: [ 178.885749] tun_get_user+0x425/0x800 May 19 14:30:44 ccp kernel: [ 178.885751] tun_chr_write_iter+0x57/0x70 May 19 14:30:44 ccp kernel: [ 178.885752] new_sync_write+0xd5/0x130 May 19 14:30:44 ccp kernel: [ 178.885754] __vfs_write+0x26/0x40 May 19 14:30:44 ccp kernel: [ 178.885756] vfs_write+0xb5/0x1a0 May 19 14:30:44 ccp kernel: [ 178.885757] SyS_write+0x55/0xc0 May 19 14:30:44 ccp kernel: [ 178.885760] entry_SYSCALL_64_fastpath+0x1e/0xad May 19 14:30:44 ccp kernel: [ 178.885762] RIP: 0033:0x7f8d9abbf670 May 19 14:30:44 ccp kernel: [ 178.885763] RSP: 002b:00007ffc2f16d8b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 May 19 14:30:44 ccp kernel: [ 178.885765] RAX: ffffffffffffffda RBX: 00007ffc2f16dde0 RCX: 00007f8d9abbf670 May 19 14:30:44 ccp kernel: [ 178.885767] RDX: 0000000000000038 RSI: 0000557403fd86e0 RDI: 0000000000000008 May 19 14:30:44 ccp kernel: [ 178.885768] RBP: 00007ffc2f16dcd0 R08: 0000557403fd8d40 R09: 0000557403fd8690 May 19 14:30:44 ccp kernel: [ 178.885769] R10: 000e20d41e89a5c5 R11: 0000000000000246 R12: 0000000000000108 May 19 14:30:44 ccp kernel: [ 178.885770] R13: 000000000000145c R14: 00007ffc2f16dd98 R15: 0000000000000003
On Fri, 2017-05-19 at 14:47 -0400, Akshay Narayan wrote: > > I don't want to defend the use of yield() but it looks like there is other > > problem. > > I believe this use of yield() should be replaced with cond_resched() > even if it turns out there is an unrelated problem. > > > Does this module call netlink_broadcast() with __GFP_DIRECT_RECLAIM > > in IRQ context? If so you should adjust the gfp flags. > > The module only calls netlink_broadcast() from a pluggable TCP > function; from my understanding this is not in the IRQ context. Full > trace, perhaps more clear, attached below. > > May 19 14:30:44 ccp kernel: [ 178.885546] BUG: scheduling while > atomic: mm-link/3105/0x00000200 > May 19 14:30:44 ccp kernel: [ 178.885552] Modules linked in: > ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 > nf_defrag_ipv4 nf_nat_ipv4 nf_nat libcrc32c xt_connmark nf_conntrack > ccp(OE) crct10dif_pclmul crc32_pclmul > ghash_clmulni_intel snd_intel8x0 pcbc snd_ac97_codec joydev ac97_bus > snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi aesni_intel > snd_seq aes_x86_64 crypto_simd snd_seq_device snd_timer snd input_leds > i2c_piix4 glue_helper cryptd so > undcore mac_hid serio_raw vboxvideo ttm drm_kms_helper drm fb_sys_fops > syscopyarea sysfillrect sysimgblt vboxguest intel_rapl_perf parport_pc > ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid hid > e1000 ahci libahci psmouse > fjes pata_acpi video > May 19 14:30:44 ccp kernel: [ 178.885665] CPU: 0 PID: 3105 Comm: > mm-link Tainted: G W OE 4.10.0-21-generic #23-Ubuntu > May 19 14:30:44 ccp kernel: [ 178.885666] Hardware name: innotek GmbH > VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 > May 19 14:30:44 ccp kernel: [ 178.885667] Call Trace: > May 19 14:30:44 ccp kernel: [ 178.885674] dump_stack+0x63/0x81 > May 19 14:30:44 ccp kernel: [ 178.885678] __schedule_bug+0x54/0x70 > May 19 14:30:44 ccp kernel: [ 178.885682] __schedule+0x536/0x6f0 > May 19 14:30:44 ccp kernel: [ 178.885685] schedule+0x36/0x80 > May 19 14:30:44 ccp kernel: [ 178.885687] sys_sched_yield+0x4f/0x60 > May 19 14:30:44 ccp kernel: [ 178.885688] yield+0x33/0x40 > May 19 14:30:44 ccp kernel: [ 178.885691] > netlink_broadcast_filtered+0x29b/0x3c0 > May 19 14:30:44 ccp kernel: [ 178.885692] netlink_broadcast+0x1d/0x20 > May 19 14:30:44 ccp kernel: [ 178.885697] nl_sendmsg+0xb8/0x664 [ccp] > May 19 14:30:44 ccp kernel: [ 178.885699] nl_send_ack_notif+0x7d/0x90 [ccp] > May 19 14:30:44 ccp kernel: [ 178.885702] tcp_ccp_cong_avoid+0x69/0x70 [ccp] > May 19 14:30:44 ccp kernel: [ 178.885704] tcp_ack+0x980/0xa60 > May 19 14:30:44 ccp kernel: [ 178.885708] tcp_rcv_state_process+0x2be/0xda0 > May 19 14:30:44 ccp kernel: [ 178.885712] ? security_sock_rcv_skb+0x3b/0x50 > May 19 14:30:44 ccp kernel: [ 178.885715] ? sk_filter_trim_cap+0x3b/0x270 No idea what ccp is, it is not in upstream kernel, and it looks buggy. Please do not send patches that are not needed in upstream kernel.
On Fri, May 19, 2017 at 11:47 AM, Akshay Narayan <akshayn@mit.edu> wrote: >> I don't want to defend the use of yield() but it looks like there is other >> problem. > > I believe this use of yield() should be replaced with cond_resched() > even if it turns out there is an unrelated problem. Yeah, it is a different problem, because cond_resched() itself could sleep too so it doesn't fix the schedule-in-atomic problem, not to mention the kmalloc() would sleep. > >> Does this module call netlink_broadcast() with __GFP_DIRECT_RECLAIM >> in IRQ context? If so you should adjust the gfp flags. > > The module only calls netlink_broadcast() from a pluggable TCP > function; from my understanding this is not in the IRQ context. Full > trace, perhaps more clear, attached below. It is process context but with a spinlock (bh_lock_sock) held, so you still can't sleep. IOW, you have to pass a proper gfp flag to reflect this.
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 161b628..f70716c 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -1441,7 +1441,7 @@ int netlink_broadcast_filtered(struct sock *ssk, struct sk_buff *skb, u32 portid if (info.delivered) { if (info.congested && gfpflags_allow_blocking(allocation)) - yield(); + cond_resched(); return 0; }
netlink_broadcast_filtered() calls yield() when a slow listener causes the buffer to fill. yield() is the wrong choice here, as pointed out by Commit 8e3fabfde4 (sched: Update yield() docs); in some cases, its use causes "BUG: scheduling while atomic" and, when fewer cores are available, kernel hangs: (note: "ccp" is a kernel module which multicasts netlink messages upon certain TCP events) May 17 17:33:56 ccp kernel: [ 394.493978] BUG: scheduling while atomic: iperf/4744/0x00000101 May 17 17:33:56 ccp kernel: [ 394.493979] Modules linked in: ccp(OE) snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq joydev pcbc snd_seq _device snd_timer snd vboxvideo aesni_intel aes_x86_64 i2c_piix4 input_leds crypto_simd glue_helper cryptd soundcore mac_hid vboxguest intel_rapl_perf serio_raw ttm drm_kms_helper drm fb_sys_fops syscopyarea sysfillrect sysimgblt parpor t_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid hid psmouse ahci e1000 libahci pata_acpi fjes video [last unloaded: ccp] May 17 17:33:56 ccp kernel: [ 394.493997] CPU: 0 PID: 4744 Comm: iperf Tainted: G W OE 4.10.0-21-generic #23-Ubuntu May 17 17:33:56 ccp kernel: [ 394.493997] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 May 17 17:33:56 ccp kernel: [ 394.493997] Call Trace: May 17 17:33:56 ccp kernel: [ 394.493998] <IRQ> May 17 17:33:56 ccp kernel: [ 394.494000] dump_stack+0x63/0x81 May 17 17:33:56 ccp kernel: [ 394.494001] __schedule_bug+0x54/0x70 May 17 17:33:56 ccp kernel: [ 394.494002] __schedule+0x536/0x6f0 May 17 17:33:56 ccp kernel: [ 394.494004] schedule+0x36/0x80 May 17 17:33:56 ccp kernel: [ 394.494005] sys_sched_yield+0x4f/0x60 May 17 17:33:56 ccp kernel: [ 394.494005] yield+0x33/0x40 May 17 17:33:56 ccp kernel: [ 394.494006] netlink_broadcast_filtered+0x29b/0x3c0 May 17 17:33:56 ccp kernel: [ 394.494007] netlink_broadcast+0x1d/0x20 ... May 17 17:33:56 ccp kernel: [ 394.494080] softirq: huh, entered softirq 3 NET_RX ffffffff85fb5c60 with preempt_count 00000101, exited with 00000000? Changing this call to cond_resched() prevents this. --- net/netlink/af_netlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) return -ESRCH; -- 2.7.4