Message ID | 20181108012927epcms1p47f719c1908da64a378690362901644ee@epcms1p4
---|---
State | Changes Requested, archived
Delegated to: | David Miller
Series | [Kernel,NET] Bug report on packet defragmenting
On 11/07/2018 05:29 PM, 배석진 wrote:
> If ipv6_defrag hook is not executed simultaneously, then it's ok.
> ipv6_defrag hook can handle that. [exam 3]

This seems wrong.

This is the root cause; we should not try to work around it but fix it.

There is no guarantee that RSS/RPS/RFS can help here: packets can sit in per-cpu backlogs long enough to reproduce the issue if RX queue interrupts are spread over many cpus.
--------- Original Message ---------
Sender : Eric Dumazet <eric.dumazet@gmail.com>
Date : 2018-11-08 10:44 (GMT+9)
Title : Re: [Kernel][NET] Bug report on packet defragmenting

> This is the root cause; we should not try to work around it but fix it.
>
> There is no guarantee that RSS/RPS/RFS can help here: packets can sit in per-cpu
> backlogs long enough to reproduce the issue if RX queue interrupts are spread
> over many cpus.

Dear Dumazet,

Even if the RX irqs are spread over all cpus, the hash is made from the src/dst addresses, so the fragments will get the same hash and the same cpu. Is that not enough?
Or did you mean that we need a complete solution covering all steering methods, not just RPS?

Best regards.
On 11/07/2018 06:05 PM, 배석진 wrote:
> Even if the RX irqs are spread over all cpus, the hash is made from the src/dst
> addresses, so the fragments will get the same hash and the same cpu. Is that not enough?
> Did you mean that we need a complete solution covering all steering methods, not just RPS?

The IPv6 defrag unit should work all the time, even if 10 cpus have to feed fragments of the same datagram at the same time.

RPS is just a hint to spread packets on different cpus.

Basically we could have the following patch and everything must still work properly (presumably at lower performance, if RPS/RFS is any good, of course):

diff --git a/net/core/dev.c b/net/core/dev.c
index 0ffcbdd55fa9ee545c807f2ed3fc178830e3075a..c1269bb0d6c86b097cfff2d8395d8cbf2d596537 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4036,7 +4036,7 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 		goto done;
 
 	skb_reset_network_header(skb);
-	hash = skb_get_hash(skb);
+	hash = prandom_u32();
 	if (!hash)
 		goto done;

Sure, it is better if RPS is smarter, but if there is a bug in the IPv6 defrag unit we must investigate and root-cause it.
On 11/07/2018 07:24 PM, Eric Dumazet wrote:
> Sure, it is better if RPS is smarter, but if there is a bug in IPv6 defrag unit
> we must investigate and root-cause it.

BTW, IPv4 defrag seems to have the same issue.
> --------- Original Message ---------
> Sender : Eric Dumazet <eric.dumazet@gmail.com>
> Date : 2018-11-08 12:57 (GMT+9)
> Title : Re: (2) [Kernel][NET] Bug report on packet defragmenting
>
> BTW, IPv4 defrag seems to have the same issue.

Yes, it could be; the key point isn't limited to IPv6.

Maybe because of faster air networks and modems it now occurs more often, and that is how we noticed it.

Anyway, we'll apply our patch to resolve this problem.

Best regards, :)
On 11/07/2018 08:10 PM, 배석진 wrote:
> Yes, it could be; the key point isn't limited to IPv6.
>
> Anyway, we'll apply our patch to resolve this problem.

Yeah, and I will fix the defrag units.

We can not rely on other layers doing proper no-reorder logic for us.

The problem here is that multiple cpus attempt concurrent rhashtable_insert_fast() and do not properly recover in case -EEXIST is returned.

This is silly, of course :/
On 11/07/2018 08:26 PM, Eric Dumazet wrote:
> Problem here is that multiple cpus attempt concurrent rhashtable_insert_fast()
> and do not properly recover in case -EEXIST is returned.

Patch would be https://patchwork.ozlabs.org/patch/994658/
> --------- Original Message ---------
> Sender : Eric Dumazet <eric.dumazet@gmail.com>
> Date : 2018-11-08 15:13 (GMT+9)
> Title : Re: (2) (2) [Kernel][NET] Bug report on packet defragmenting
>
> Patch would be https://patchwork.ozlabs.org/patch/994658/

Dear Dumazet,

With your patch, the kernel panics when a packet is received.
I double-checked after disabling your patch, and then there is no problem.

<6>[ 119.702054] I[3: kworker/u18:1: 1705] LNK-RX(1464): 6b 80 00 00 05 90 2c 3e 20 01 44 30 00 05 04 01 ...
<6>[ 119.702120] I[3: kworker/u18:1: 1705] __skb_flow_dissect: ports: 77500000
<6>[ 119.702153] I[3: kworker/u18:1: 1705] get_rps_cpu: cpu:2, hash:2055028308
<6>[ 119.702203] I[3: kworker/u18:1: 1705] LNK-RX(1212): 6b 80 00 00 04 94 2c 3e 20 01 44 30 00 05 04 01 ...
<6>[ 119.702231] I[3: kworker/u18:1: 1705] __skb_flow_dissect: ports: 3c7e2c6b
<6>[ 119.702258] I[3: kworker/u18:1: 1705] get_rps_cpu: cpu:1, hash:671343869
<6>[ 119.702365] I[1: Binder:11369_2:11382] ipv6_rcv +++
<6>[ 119.702375] I[2: swapper/2: 0] ipv6_rcv +++
<6>[ 119.702406] I[2: swapper/2: 0] ipv6_defrag +++
<6>[ 119.702425] I[1: Binder:11369_2:11382] ipv6_defrag +++
<6>[ 119.702494] I[2: swapper/2: 0] ipv6_defrag: EINPROGRESS
<6>[ 119.702522] I[2: swapper/2: 0] ipv6_rcv ---
<6>[ 119.702628] I[1: Binder:11369_2:11382] ipv6_defrag ---
<6>[ 119.702892] I[1: Binder:11369_2:11382] ipv6_defrag +++
<6>[ 119.702922] I[1: Binder:11369_2:11382] ipv6_defrag ---
<6>[ 119.702966] I[1: Binder:11369_2:11382] ipv6_rcv ---
<0>[ 119.703792] [1: Binder:11369_2:11382] BUG: sleeping function called from invalid context at arch/arm64/mm/fault.c:518
<3>[ 119.703826] [1: Binder:11369_2:11382] in_atomic(): 0, irqs_disabled(): 0, pid: 11382, name: Binder:11369_2
<3>[ 119.703854] [1: Binder:11369_2:11382] Preemption disabled at:
<4>[ 119.703888] [1: Binder:11369_2:11382] [<ffffff80080b13d4>] __do_softirq+0x68/0x3c4
<4>[ 119.703934] [1: Binder:11369_2:11382] CPU: 1 PID: 11382 Comm: Binder:11369_2 Tainted: G S W 4.14.75-20181108-163447-eng #0
<4>[ 119.703960] [1: Binder:11369_2:11382] Hardware name: Samsung BEYOND2LTE KOR SINGLE 19 board based on EXYNOS9820 (DT)
<4>[ 119.703987] [1: Binder:11369_2:11382] Call trace:
<4>[ 119.704015] [1: Binder:11369_2:11382] [<ffffff80080bd87c>] dump_backtrace+0x0/0x280
<4>[ 119.704045] [1: Binder:11369_2:11382] [<ffffff80080bddd4>] show_stack+0x18/0x24
<4>[ 119.704074] [1: Binder:11369_2:11382] [<ffffff80090bb3f8>] dump_stack+0xb8/0xf8
<4>[ 119.704104] [1: Binder:11369_2:11382] [<ffffff800811f180>] ___might_sleep+0x16c/0x178
<4>[ 119.704132] [1: Binder:11369_2:11382] [<ffffff800811efdc>] __might_sleep+0x4c/0x84
<4>[ 119.704164] [1: Binder:11369_2:11382] [<ffffff80090dcf60>] do_page_fault+0x2e8/0x4b8
<4>[ 119.704193] [1: Binder:11369_2:11382] [<ffffff80090dcbf4>] do_translation_fault+0x7c/0x100
<4>[ 119.704219] [1: Binder:11369_2:11382] [<ffffff80080b0d70>] do_mem_abort+0x4c/0x12c
<4>[ 119.704243] [1: Binder:11369_2:11382] Exception stack(0xffffff8038bf3ec0 to 0xffffff8038bf4000)
<4>[ 119.704266] [1: Binder:11369_2:11382] 3ec0: 00000077b8262600 00000077b1bd0800 00000000708fcae0 0000000000000018
...
<4>[ 119.704459] [1: Binder:11369_2:11382] 3fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
<4>[ 119.704480] [1: Binder:11369_2:11382] [<ffffff80080b33d0>] el0_da+0x20/0x24
<4>[ 119.704509] [1: Binder:11369_2:11382] ------------[ cut here ]------------
<0>[ 119.704541] [1: Binder:11369_2:11382] kernel BUG at kernel/sched/core.c:6152!
<2>[ 119.704563] [1: Binder:11369_2:11382] sec_debug_set_extra_info_fault = BUG / 0xffffff800811f180
<0>[ 119.704603] [1: Binder:11369_2:11382] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
On 11/07/2018 11:58 PM, 배석진 wrote:
> Dear Dumazet,
>
> With your patch, the kernel panics when a packet is received.
> I double-checked after disabling your patch, and then there is no problem.
>
> [panic log snipped]

Thanks for testing.

This is not a pristine net-next tree; this dump seems unrelated to the patch?
> Thanks for testing.
>
> This is not a pristine net-next tree, this dump seems unrelated to the patch ?

Yes, it looks like that.
But the panic came only with your patch applied, right after a packet is received.
Without it, there's no problem other than the defrag issue. It's odd.. :p
I couldn't debug it further since we have other problems.
On 11/08/2018 04:42 PM, 배석진 wrote:
> but only when using your patch, panic came. even right after packet recieving..
> without that, there's no problem except defrag issue. it's odd.. :p
> I couldn't more debugging since have other problems.

You might need to backport some fixes (check all changes to lib/rhashtable.c).
> You might need to backport some fixes (check all changes to lib/rhashtable.c)

I tried to backport the updates to my tree, but there are too many files related to lib/rhashtable.c.. I give up ;(
Thank you for your help!
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 676f3ad629f9..928df25129ba 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -1166,8 +1166,8 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 		break;
 	}
 
-	if (dissector_uses_key(flow_dissector,
-			       FLOW_DISSECTOR_KEY_PORTS)) {
+	if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_PORTS)
+	    && !(key_control->flags & FLOW_DIS_IS_FRAGMENT)) {
 		key_ports = skb_flow_dissector_target(flow_dissector,
 						      FLOW_DISSECTOR_KEY_PORTS,
 						      target_container);