diff mbox

Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

Message ID 3230b360-528b-0ae0-8731-7906e57ee993@gmail.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

David Ahern June 14, 2017, 2:27 a.m. UTC
On 6/13/17 3:42 PM, Cong Wang wrote:
> On Tue, Jun 13, 2017 at 1:16 PM, Ben Greear <greearb@candelatech.com> wrote:
>> On 06/09/2017 02:25 PM, Eric Dumazet wrote:
>>>
>>> On Fri, 2017-06-09 at 07:27 -0600, David Ahern wrote:
>>>>
>>>> On 6/8/17 11:55 PM, Cong Wang wrote:
>>>>> Apparently fn->parent is NULL here for some reason, but
>>>>> I don't know if that is expected or not. If a simple NULL check
>>>>> is not enough here, we have to trace why it is NULL.
>>>>
>>>>
>>>> From my understanding, parent should not be null hence the attempts to
>>>> fix access to table nodes under a lock. ie., figuring out why it is null
>>>> here.
>>
>>
>> If someone has more suggestions, I'll be happy to test.
> 
> Can you enable RT6_TRACE() by changing RT6_DEBUG
> from 2 to 3? We may collect some useful log with it.
> 
> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> index d4bf2c6..1941595 100644
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -37,7 +37,7 @@
>  #include <net/ip6_fib.h>
>  #include <net/ip6_route.h>
> 
> -#define RT6_DEBUG 2
> +#define RT6_DEBUG 3
> 
>  #if RT6_DEBUG >= 3
>  #define RT6_TRACE(x...) pr_debug(x)
> 

Let's try a targeted debug patch. See attached

Comments

Ben Greear June 14, 2017, 10:23 p.m. UTC | #1
On 06/13/2017 07:27 PM, David Ahern wrote:

> Let's try a targeted debug patch. See attached

I had to change it to pr_err so it would go to our serial console
since the system locked hard on crash,
and that appears to be enough to change the timing where we can no longer
reproduce the problem.

Thanks,
Ben
David Ahern June 14, 2017, 10:25 p.m. UTC | #2
On 6/14/17 4:23 PM, Ben Greear wrote:
> On 06/13/2017 07:27 PM, David Ahern wrote:
> 
>> Let's try a targeted debug patch. See attached
> 
> I had to change it to pr_err so it would go to our serial console
> since the system locked hard on crash,
> and that appears to be enough to change the timing where we can no longer
> reproduce the problem.


ok, let's figure out which one is doing that. There are 3 debug
statements. I suspect fib6_del_route is the one setting the state to
FWS_U. Can you remove the debug prints in fib6_repair_tree and
fib6_walk_continue and try again?
Ben Greear June 20, 2017, 2:12 p.m. UTC | #3
On 06/14/2017 03:25 PM, David Ahern wrote:
> On 6/14/17 4:23 PM, Ben Greear wrote:
>> On 06/13/2017 07:27 PM, David Ahern wrote:
>>
>>> Let's try a targeted debug patch. See attached
>>
>> I had to change it to pr_err so it would go to our serial console
>> since the system locked hard on crash,
>> and that appears to be enough to change the timing where we can no longer
>> reproduce the problem.
>
>
> ok, let's figure out which one is doing that. There are 3 debug
> statements. I suspect fib6_del_route is the one setting the state to
> FWS_U. Can you remove the debug prints in fib6_repair_tree and
> fib6_walk_continue and try again?

We cannot reproduce with just that one printf in the kernel either.  It
must change the timing too much to trigger the bug.

Thanks,
Ben
Michal Kubecek June 20, 2017, 6:05 p.m. UTC | #4
On Tue, Jun 20, 2017 at 07:12:27AM -0700, Ben Greear wrote:
> On 06/14/2017 03:25 PM, David Ahern wrote:
> >On 6/14/17 4:23 PM, Ben Greear wrote:
> >>On 06/13/2017 07:27 PM, David Ahern wrote:
> >>
> >>>Let's try a targeted debug patch. See attached
> >>
> >>I had to change it to pr_err so it would go to our serial console
> >>since the system locked hard on crash,
> >>and that appears to be enough to change the timing where we can no longer
> >>reproduce the problem.
> >
> >
> >ok, let's figure out which one is doing that. There are 3 debug
> >statements. I suspect fib6_del_route is the one setting the state to
> >FWS_U. Can you remove the debug prints in fib6_repair_tree and
> >fib6_walk_continue and try again?
> 
> We cannot reproduce with just that one printf in the kernel either.  It
> must change the timing too much to trigger the bug.

You might try trace_printk() which should have less impact (don't forget
to enable /proc/sys/kernel/ftrace_dump_on_oops).

Michal Kubecek
Ben Greear June 20, 2017, 9:41 p.m. UTC | #5
On 06/20/2017 11:05 AM, Michal Kubecek wrote:
> On Tue, Jun 20, 2017 at 07:12:27AM -0700, Ben Greear wrote:
>> On 06/14/2017 03:25 PM, David Ahern wrote:
>>> On 6/14/17 4:23 PM, Ben Greear wrote:
>>>> On 06/13/2017 07:27 PM, David Ahern wrote:
>>>>
>>>>> Let's try a targeted debug patch. See attached
>>>>
>>>> I had to change it to pr_err so it would go to our serial console
>>>> since the system locked hard on crash,
>>>> and that appears to be enough to change the timing where we can no longer
>>>> reproduce the problem.
>>>
>>>
>>> ok, let's figure out which one is doing that. There are 3 debug
>>> statements. I suspect fib6_del_route is the one setting the state to
>>> FWS_U. Can you remove the debug prints in fib6_repair_tree and
>>> fib6_walk_continue and try again?
>>
>> We cannot reproduce with just that one printf in the kernel either.  It
>> must change the timing too much to trigger the bug.
>
> You might try trace_printk() which should have less impact (don't forget
> to enable /proc/sys/kernel/ftrace_dump_on_oops).

We cannot reproduce with trace_printk() either.

Thanks,
Ben

>
> Michal Kubecek
>
David Ahern June 21, 2017, 3:03 a.m. UTC | #6
On 6/20/17 5:41 PM, Ben Greear wrote:
> On 06/20/2017 11:05 AM, Michal Kubecek wrote:
>> On Tue, Jun 20, 2017 at 07:12:27AM -0700, Ben Greear wrote:
>>> On 06/14/2017 03:25 PM, David Ahern wrote:
>>>> On 6/14/17 4:23 PM, Ben Greear wrote:
>>>>> On 06/13/2017 07:27 PM, David Ahern wrote:
>>>>>
>>>>>> Let's try a targeted debug patch. See attached
>>>>>
>>>>> I had to change it to pr_err so it would go to our serial console
>>>>> since the system locked hard on crash,
>>>>> and that appears to be enough to change the timing where we can no
>>>>> longer
>>>>> reproduce the problem.
>>>>
>>>>
>>>> ok, let's figure out which one is doing that. There are 3 debug
>>>> statements. I suspect fib6_del_route is the one setting the state to
>>>> FWS_U. Can you remove the debug prints in fib6_repair_tree and
>>>> fib6_walk_continue and try again?
>>>
>>> We cannot reproduce with just that one printf in the kernel either.  It
>>> must change the timing too much to trigger the bug.
>>
>> You might try trace_printk() which should have less impact (don't forget
>> to enable /proc/sys/kernel/ftrace_dump_on_oops).
> 
> We cannot reproduce with trace_printk() either.

I think that suggests the walker state is set to FWS_U in
fib6_del_route, and it is the FWS_U case in fib6_walk_continue that
triggers the fault -- the null parent (pn = fn->parent). So we have the
2 areas of code that are interacting.

I'm on a road trip through the end of this week with little time to
focus on this problem. I'll get back to you another suggestion when I can.
Ben Greear Jan. 24, 2018, 11:59 p.m. UTC | #7
On 06/20/2017 08:03 PM, David Ahern wrote:
> On 6/20/17 5:41 PM, Ben Greear wrote:
>> On 06/20/2017 11:05 AM, Michal Kubecek wrote:
>>> On Tue, Jun 20, 2017 at 07:12:27AM -0700, Ben Greear wrote:
>>>> On 06/14/2017 03:25 PM, David Ahern wrote:
>>>>> On 6/14/17 4:23 PM, Ben Greear wrote:
>>>>>> On 06/13/2017 07:27 PM, David Ahern wrote:
>>>>>>
>>>>>>> Let's try a targeted debug patch. See attached
>>>>>>
>>>>>> I had to change it to pr_err so it would go to our serial console
>>>>>> since the system locked hard on crash,
>>>>>> and that appears to be enough to change the timing where we can no
>>>>>> longer
>>>>>> reproduce the problem.
>>>>>
>>>>>
>>>>> ok, let's figure out which one is doing that. There are 3 debug
>>>>> statements. I suspect fib6_del_route is the one setting the state to
>>>>> FWS_U. Can you remove the debug prints in fib6_repair_tree and
>>>>> fib6_walk_continue and try again?
>>>>
>>>> We cannot reproduce with just that one printf in the kernel either.  It
>>>> must change the timing too much to trigger the bug.
>>>
>>> You might try trace_printk() which should have less impact (don't forget
>>> to enable /proc/sys/kernel/ftrace_dump_on_oops).
>>
>> We cannot reproduce with trace_printk() either.
>
> I think that suggests the walker state is set to FWS_U in
> fib6_del_route, and it is the FWS_U case in fib6_walk_continue that
> triggers the fault -- the null parent (pn = fn->parent). So we have the
> 2 areas of code that are interacting.
>
> I'm on a road trip through the end of this week with little time to
> focus on this problem. I'll get back to you another suggestion when I can.

So, though I don't know the right way to fix it, the patch below appears
to make the system not crash.


diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 68b9cc7..bf19a14 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1614,6 +1614,12 @@ static int fib6_walk_continue(struct fib6_walker *w)
                         pn = fn->parent;
                         w->node = pn;
  #ifdef CONFIG_IPV6_SUBTREES
+                       if (WARN_ON_ONCE(!pn)) {
+                               pr_err("FWS-U, w: %p  fn: %p  pn: %p\n",
+                                      w, fn, pn);
+                               /* Attempt to work around crash that has been here forever. --Ben */
+                               return 0;
+                       }
                         if (FIB6_SUBTREE(pn) == fn) {
                                 WARN_ON(!(fn->fn_flags & RTN_ROOT));
                                 w->state = FWS_L;



The printout looks like this (when adding 4000 mac-vlans, so it is pretty rare).  PN is definitely NULL sometimes:

[root@2u-6n ~]# journalctl -f|grep FWS
Jan 24 15:48:05 2u-6n kernel: IPv6: FWS-U, w: ffff8807ea121ba0  fn: ffff880856a09260  pn:           (null)
Jan 24 15:51:15 2u-6n kernel: IPv6: FWS-U, w: ffff8807e3963de0  fn: ffff880856a09260  pn:           (null)
Jan 24 15:51:15 2u-6n kernel: IPv6: FWS-U, w: ffff88081ac22de0  fn: ffff880856a09260  pn:           (null)
Jan 24 15:53:13 2u-6n kernel: IPv6: FWS-U, w: ffff8808290c69c0  fn: ffff8807e369f920  pn:           (null)
Jan 24 15:53:24 2u-6n kernel: IPv6: FWS-U, w: ffff8807ea3156c0  fn: ffff88082d1eeb60  pn:           (null)



8066 Jan 24 15:48:04 2u-6n kernel: 8021q: adding VLAN 0 to HW filter on device eth2#1006
  8067 Jan 24 15:48:05 2u-6n kernel: ------------[ cut here ]------------
  8068 Jan 24 15:48:05 2u-6n kernel: WARNING: CPU: 5 PID: 3346 at /home/greearb/git/linux-4.13.dev.y/net/ipv6/ip6_fib.c:1617 fib6_walk_continue+ 
0x154/0x1b0 [ipv6]
  8069 Jan 24 15:48:05 2u-6n kernel: Modules linked in: 8021q garp mrp stp llc fuse macvlan wanlink(O) pktgen ipmi_ssif coretemp intel_rapl            sb_edac 
x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm ath9k irqbypass iTCO_wdt ath9k_common iTCO_vendor_support ath9k_hw ath              i2c_i801 mac80211 joydev 
lpc_ich cfg80211 ioatdma shpchp tpm_tis tpm_tis_core wmi tpm ipmi_si ipmi_devintf ipmi_msghandler acpi_pad             acpi_power_meter nfsd auth_rpcgss nfs_acl 
sch_fq_codel lockd grace sunrpc ast drm_kms_helper ttm drm igb hwmon ptp pps_core dca                 i2c_algo_bit i2c_core ipv6 crc_ccitt
  8070 Jan 24 15:48:05 2u-6n kernel: CPU: 5 PID: 3346 Comm: ip Tainted: G           O    4.13.16+ #22
  8071 Jan 24 15:48:05 2u-6n kernel: Hardware name: Iron_Systems,Inc CS-CAD-2U-A02/X10SRL-F, BIOS 2.0b 05/02/2017
  8072 Jan 24 15:48:05 2u-6n kernel: task: ffff8807e9ef1dc0 task.stack: ffffc9002083c000
  8073 Jan 24 15:48:05 2u-6n kernel: RIP: 0010:fib6_walk_continue+0x154/0x1b0 [ipv6]
  8074 Jan 24 15:48:05 2u-6n kernel: RSP: 0018:ffffc9002083fbc0 EFLAGS: 00010246
  8075 Jan 24 15:48:05 2u-6n kernel: RAX: 0000000000000000 RBX: ffff8807ea121ba0 RCX: 0000000000000000
  8076 Jan 24 15:48:05 2u-6n kernel: RDX: ffff880856a09260 RSI: ffffc9002083fc00 RDI: ffffffff81ef2140
  8077 Jan 24 15:48:05 2u-6n kernel: RBP: ffffc9002083fbc8 R08: 0000000000000008 R09: ffff8807e36f6b25
  8078 Jan 24 15:48:05 2u-6n kernel: R10: ffffc9002083fb70 R11: 0000000000000000 R12: 0000000000000002
  8079 Jan 24 15:48:05 2u-6n kernel: R13: 0000000000000002 R14: ffff8807ea121ba0 R15: ffff8807ebcc8d80
  8080 Jan 24 15:48:05 2u-6n kernel: FS:  00007f77a5d0f700(0000) GS:ffff88087fd40000(0000) knlGS:0000000000000000
  8081 Jan 24 15:48:05 2u-6n kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  8082 Jan 24 15:48:05 2u-6n kernel: CR2: 0000000003d56c88 CR3: 00000007f3106000 CR4: 00000000003406e0
  8083 Jan 24 15:48:05 2u-6n kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  8084 Jan 24 15:48:05 2u-6n kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  8085 Jan 24 15:48:05 2u-6n kernel: Call Trace:
  8086 Jan 24 15:48:05 2u-6n kernel:  inet6_dump_fib+0x1ab/0x2a0 [ipv6]
  8087 Jan 24 15:48:05 2u-6n kernel:  netlink_dump+0x22c/0x2b0
  8088 Jan 24 15:48:05 2u-6n kernel:  netlink_recvmsg+0x260/0x3f0
  8089 Jan 24 15:48:05 2u-6n kernel:  sock_recvmsg+0x38/0x40
  8090 Jan 24 15:48:05 2u-6n kernel:  ___sys_recvmsg+0xe9/0x230
  8091 Jan 24 15:48:05 2u-6n kernel:  ? alloc_pages_vma+0x83/0x1e0
  8092 Jan 24 15:48:05 2u-6n kernel:  ? page_add_new_anon_rmap+0x88/0xc0
  8093 Jan 24 15:48:05 2u-6n kernel:  ? lru_cache_add_active_or_unevictable+0x31/0xb0
  8094 Jan 24 15:48:05 2u-6n kernel:  ? __handle_mm_fault+0x5e5/0xfa0
  8095 Jan 24 15:48:05 2u-6n kernel:  __sys_recvmsg+0x3d/0x70
  8096 Jan 24 15:48:05 2u-6n kernel:  ? __sys_recvmsg+0x3d/0x70
  8097 Jan 24 15:48:05 2u-6n kernel:  SyS_recvmsg+0xd/0x20
  8098 Jan 24 15:48:05 2u-6n kernel:  do_syscall_64+0x56/0xc0
  8099 Jan 24 15:48:05 2u-6n kernel:  entry_SYSCALL64_slow_path+0x25/0x25
  8100 Jan 24 15:48:05 2u-6n kernel: RIP: 0033:0x7f77a5644030
  8101 Jan 24 15:48:05 2u-6n kernel: RSP: 002b:00007ffc3e783e68 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
  8102 Jan 24 15:48:05 2u-6n kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f77a5644030
  8103 Jan 24 15:48:05 2u-6n kernel: RDX: 0000000000000000 RSI: 00007ffc3e783ed0 RDI: 0000000000000004
  8104 Jan 24 15:48:05 2u-6n kernel: RBP: 00007ffc3e787ef4 R08: 0000000000003fe4 R09: 0
8105 Jan 24 15:48:05 2u-6n kernel: R10: 00007ffc3e783f10 R11: 0000000000000246 R12: 000000000064f360
  8106 Jan 24 15:48:05 2u-6n kernel: R13: 00007ffc3e787f60 R14: 0000000000003fe4 R15: 0000000000000000
  8107 Jan 24 15:48:05 2u-6n kernel: Code: ff 24 c5 a8 e5 04 a0 f6 42 2a 02 74 68 c7 43 28 01 00 00 00 48 89 c2 e9 c7 fe ff ff c7 43 28 02 00 00       00 48 89 
c2 e9 b8 fe ff ff <0f> ff 31 c9 48 89 de 48 c7 c7 78 36 05 a0 e8 65 e4 14 e1 31 c0
  8108 Jan 24 15:48:05 2u-6n kernel: ---[ end trace 1d1c7028c9dec459 ]---
  8109 Jan 24 15:48:05 2u-6n kernel: IPv6: FWS-U, w: ffff8807ea121ba0  fn: ffff880856a09260  pn:           (null)
  8110 Jan 24 15:48:05 2u-6n kernel: 8021q: adding VLAN 0 to HW filter on device eth2#1008
  8111 Jan 24 15:48:05 2u-6n kernel: 8021q: adding VLAN 0 to HW filter on device eth2#1009
....

Thanks,
Ben
Ben Greear April 17, 2018, 11:29 p.m. UTC | #8
On 01/24/2018 03:59 PM, Ben Greear wrote:
> On 06/20/2017 08:03 PM, David Ahern wrote:
>> On 6/20/17 5:41 PM, Ben Greear wrote:
>>> On 06/20/2017 11:05 AM, Michal Kubecek wrote:
>>>> On Tue, Jun 20, 2017 at 07:12:27AM -0700, Ben Greear wrote:
>>>>> On 06/14/2017 03:25 PM, David Ahern wrote:
>>>>>> On 6/14/17 4:23 PM, Ben Greear wrote:
>>>>>>> On 06/13/2017 07:27 PM, David Ahern wrote:
>>>>>>>
>>>>>>>> Let's try a targeted debug patch. See attached
>>>>>>>
>>>>>>> I had to change it to pr_err so it would go to our serial console
>>>>>>> since the system locked hard on crash,
>>>>>>> and that appears to be enough to change the timing where we can no
>>>>>>> longer
>>>>>>> reproduce the problem.
>>>>>>
>>>>>>
>>>>>> ok, let's figure out which one is doing that. There are 3 debug
>>>>>> statements. I suspect fib6_del_route is the one setting the state to
>>>>>> FWS_U. Can you remove the debug prints in fib6_repair_tree and
>>>>>> fib6_walk_continue and try again?
>>>>>
>>>>> We cannot reproduce with just that one printf in the kernel either.  It
>>>>> must change the timing too much to trigger the bug.
>>>>
>>>> You might try trace_printk() which should have less impact (don't forget
>>>> to enable /proc/sys/kernel/ftrace_dump_on_oops).
>>>
>>> We cannot reproduce with trace_printk() either.
>>
>> I think that suggests the walker state is set to FWS_U in
>> fib6_del_route, and it is the FWS_U case in fib6_walk_continue that
>> triggers the fault -- the null parent (pn = fn->parent). So we have the
>> 2 areas of code that are interacting.
>>
>> I'm on a road trip through the end of this week with little time to
>> focus on this problem. I'll get back to you another suggestion when I can.

FYI, problem still happens in 4.16.  I'm going to re-enable my hack below
for this kernel as well...I had hopes it might be fixed...

BUG: unable to handle kernel NULL pointer dereference at 8
IP: fib6_walk_continue+0x5b/0x140 [ipv6]
PGD 80000007dfc0c067 P4D 80000007dfc0c067 PUD 7e66ff067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 libcrc32c vrf]
CPU: 3 PID: 15117 Comm: ip Tainted: G           O     4.16.0+ #5
Hardware name: Iron_Systems,Inc CS-CAD-2U-A02/X10SRL-F, BIOS 2.0b 05/02/2017
RIP: 0010:fib6_walk_continue+0x5b/0x140 [ipv6]
RSP: 0018:ffffc90008c3bc10 EFLAGS: 00010287
RAX: ffff88085ac45050 RBX: ffff8807e03008a0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffc90008c3bc48 RDI: ffffffff8232b240
RBP: ffff880819167600 R08: 0000000000000008 R09: ffff8807dff10071
R10: ffffc90008c3bbd0 R11: 0000000000000000 R12: ffff8807e03008a0
R13: 0000000000000002 R14: ffff8807e05744c8 R15: ffff8807e08ef000
FS:  00007f2f04342700(0000) GS:ffff88087fcc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 00000007e0556002 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  inet6_dump_fib+0x14b/0x2c0 [ipv6]
  netlink_dump+0x216/0x2a0
  netlink_recvmsg+0x254/0x400
  ? copy_msghdr_from_user+0xb5/0x110
  ___sys_recvmsg+0xe9/0x230
  ? find_held_lock+0x3b/0xb0
  ? __handle_mm_fault+0x617/0x1180
  ? __audit_syscall_entry+0xb3/0x110
  ? __sys_recvmsg+0x39/0x70
  __sys_recvmsg+0x39/0x70
  do_syscall_64+0x63/0x120
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7f2f03a72030
RSP: 002b:00007fffab3de508 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
RAX: ffffffffffffffda RBX: 00007fffab3e641c RCX: 00007f2f03a72030
RDX: 0000000000000000 RSI: 00007fffab3de570 RDI: 0000000000000004
RBP: 0000000000000000 R08: 0000000000007e6c R09: 00007fffab3e63a8
R10: 00007fffab3de5b0 R11: 0000000000000246 R12: 00007fffab3e6608
R13: 000000000066b460 R14: 0000000000007e6c R15: 0000000000000000
Code: 85 d2 74 17 f6 40 2a 04 74 11 8b 53 2c 85 d2 0f 84 d7 00 00 00 83 ea 01 89 53 2c c7 4
RIP: fib6_walk_continue+0x5b/0x140 [ipv6] RSP: ffffc90008c3bc10
CR2: 0000000000000008
---[ end trace bd03458864eb266c ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
Rebooting in 10 seconds..
ACPI MEMORY or I/O RESET_REG.

>
> So, though I don't know the right way to fix it, the patch below appears
> to make the system not crash.
>
>
> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> index 68b9cc7..bf19a14 100644
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -1614,6 +1614,12 @@ static int fib6_walk_continue(struct fib6_walker *w)
>                         pn = fn->parent;
>                         w->node = pn;
>  #ifdef CONFIG_IPV6_SUBTREES
> +                       if (WARN_ON_ONCE(!pn)) {
> +                               pr_err("FWS-U, w: %p  fn: %p  pn: %p\n",
> +                                      w, fn, pn);
> +                               /* Attempt to work around crash that has been here forever. --Ben */
> +                               return 0;
> +                       }
>                         if (FIB6_SUBTREE(pn) == fn) {
>                                 WARN_ON(!(fn->fn_flags & RTN_ROOT));
>                                 w->state = FWS_L;
>
>
>
> The printout looks like this (when adding 4000 mac-vlans, so it is pretty rare).  PN is definitely NULL sometimes:
>
> [root@2u-6n ~]# journalctl -f|grep FWS
> Jan 24 15:48:05 2u-6n kernel: IPv6: FWS-U, w: ffff8807ea121ba0  fn: ffff880856a09260  pn:           (null)
> Jan 24 15:51:15 2u-6n kernel: IPv6: FWS-U, w: ffff8807e3963de0  fn: ffff880856a09260  pn:           (null)
> Jan 24 15:51:15 2u-6n kernel: IPv6: FWS-U, w: ffff88081ac22de0  fn: ffff880856a09260  pn:           (null)
> Jan 24 15:53:13 2u-6n kernel: IPv6: FWS-U, w: ffff8808290c69c0  fn: ffff8807e369f920  pn:           (null)
> Jan 24 15:53:24 2u-6n kernel: IPv6: FWS-U, w: ffff8807ea3156c0  fn: ffff88082d1eeb60  pn:           (null)
>
>
>
> 8066 Jan 24 15:48:04 2u-6n kernel: 8021q: adding VLAN 0 to HW filter on device eth2#1006
>  8067 Jan 24 15:48:05 2u-6n kernel: ------------[ cut here ]------------
>  8068 Jan 24 15:48:05 2u-6n kernel: WARNING: CPU: 5 PID: 3346 at /home/greearb/git/linux-4.13.dev.y/net/ipv6/ip6_fib.c:1617 fib6_walk_continue+ 0x154/0x1b0 [ipv6]
>  8069 Jan 24 15:48:05 2u-6n kernel: Modules linked in: 8021q garp mrp stp llc fuse macvlan wanlink(O) pktgen ipmi_ssif coretemp intel_rapl            sb_edac
> x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm ath9k irqbypass iTCO_wdt ath9k_common iTCO_vendor_support ath9k_hw ath              i2c_i801 mac80211 joydev
> lpc_ich cfg80211 ioatdma shpchp tpm_tis tpm_tis_core wmi tpm ipmi_si ipmi_devintf ipmi_msghandler acpi_pad             acpi_power_meter nfsd auth_rpcgss nfs_acl
> sch_fq_codel lockd grace sunrpc ast drm_kms_helper ttm drm igb hwmon ptp pps_core dca                 i2c_algo_bit i2c_core ipv6 crc_ccitt
>  8070 Jan 24 15:48:05 2u-6n kernel: CPU: 5 PID: 3346 Comm: ip Tainted: G           O    4.13.16+ #22
>  8071 Jan 24 15:48:05 2u-6n kernel: Hardware name: Iron_Systems,Inc CS-CAD-2U-A02/X10SRL-F, BIOS 2.0b 05/02/2017
>  8072 Jan 24 15:48:05 2u-6n kernel: task: ffff8807e9ef1dc0 task.stack: ffffc9002083c000
>  8073 Jan 24 15:48:05 2u-6n kernel: RIP: 0010:fib6_walk_continue+0x154/0x1b0 [ipv6]
>  8074 Jan 24 15:48:05 2u-6n kernel: RSP: 0018:ffffc9002083fbc0 EFLAGS: 00010246
>  8075 Jan 24 15:48:05 2u-6n kernel: RAX: 0000000000000000 RBX: ffff8807ea121ba0 RCX: 0000000000000000
>  8076 Jan 24 15:48:05 2u-6n kernel: RDX: ffff880856a09260 RSI: ffffc9002083fc00 RDI: ffffffff81ef2140
>  8077 Jan 24 15:48:05 2u-6n kernel: RBP: ffffc9002083fbc8 R08: 0000000000000008 R09: ffff8807e36f6b25
>  8078 Jan 24 15:48:05 2u-6n kernel: R10: ffffc9002083fb70 R11: 0000000000000000 R12: 0000000000000002
>  8079 Jan 24 15:48:05 2u-6n kernel: R13: 0000000000000002 R14: ffff8807ea121ba0 R15: ffff8807ebcc8d80
>  8080 Jan 24 15:48:05 2u-6n kernel: FS:  00007f77a5d0f700(0000) GS:ffff88087fd40000(0000) knlGS:0000000000000000
>  8081 Jan 24 15:48:05 2u-6n kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  8082 Jan 24 15:48:05 2u-6n kernel: CR2: 0000000003d56c88 CR3: 00000007f3106000 CR4: 00000000003406e0
>  8083 Jan 24 15:48:05 2u-6n kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  8084 Jan 24 15:48:05 2u-6n kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  8085 Jan 24 15:48:05 2u-6n kernel: Call Trace:
>  8086 Jan 24 15:48:05 2u-6n kernel:  inet6_dump_fib+0x1ab/0x2a0 [ipv6]
>  8087 Jan 24 15:48:05 2u-6n kernel:  netlink_dump+0x22c/0x2b0
>  8088 Jan 24 15:48:05 2u-6n kernel:  netlink_recvmsg+0x260/0x3f0
>  8089 Jan 24 15:48:05 2u-6n kernel:  sock_recvmsg+0x38/0x40
>  8090 Jan 24 15:48:05 2u-6n kernel:  ___sys_recvmsg+0xe9/0x230
>  8091 Jan 24 15:48:05 2u-6n kernel:  ? alloc_pages_vma+0x83/0x1e0
>  8092 Jan 24 15:48:05 2u-6n kernel:  ? page_add_new_anon_rmap+0x88/0xc0
>  8093 Jan 24 15:48:05 2u-6n kernel:  ? lru_cache_add_active_or_unevictable+0x31/0xb0
>  8094 Jan 24 15:48:05 2u-6n kernel:  ? __handle_mm_fault+0x5e5/0xfa0
>  8095 Jan 24 15:48:05 2u-6n kernel:  __sys_recvmsg+0x3d/0x70
>  8096 Jan 24 15:48:05 2u-6n kernel:  ? __sys_recvmsg+0x3d/0x70
>  8097 Jan 24 15:48:05 2u-6n kernel:  SyS_recvmsg+0xd/0x20
>  8098 Jan 24 15:48:05 2u-6n kernel:  do_syscall_64+0x56/0xc0
>  8099 Jan 24 15:48:05 2u-6n kernel:  entry_SYSCALL64_slow_path+0x25/0x25
>  8100 Jan 24 15:48:05 2u-6n kernel: RIP: 0033:0x7f77a5644030
>  8101 Jan 24 15:48:05 2u-6n kernel: RSP: 002b:00007ffc3e783e68 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
>  8102 Jan 24 15:48:05 2u-6n kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f77a5644030
>  8103 Jan 24 15:48:05 2u-6n kernel: RDX: 0000000000000000 RSI: 00007ffc3e783ed0 RDI: 0000000000000004
>  8104 Jan 24 15:48:05 2u-6n kernel: RBP: 00007ffc3e787ef4 R08: 0000000000003fe4 R09: 0
> 8105 Jan 24 15:48:05 2u-6n kernel: R10: 00007ffc3e783f10 R11: 0000000000000246 R12: 000000000064f360
>  8106 Jan 24 15:48:05 2u-6n kernel: R13: 00007ffc3e787f60 R14: 0000000000003fe4 R15: 0000000000000000
>  8107 Jan 24 15:48:05 2u-6n kernel: Code: ff 24 c5 a8 e5 04 a0 f6 42 2a 02 74 68 c7 43 28 01 00 00 00 48 89 c2 e9 c7 fe ff ff c7 43 28 02 00 00       00 48 89
> c2 e9 b8 fe ff ff <0f> ff 31 c9 48 89 de 48 c7 c7 78 36 05 a0 e8 65 e4 14 e1 31 c0
>  8108 Jan 24 15:48:05 2u-6n kernel: ---[ end trace 1d1c7028c9dec459 ]---
>  8109 Jan 24 15:48:05 2u-6n kernel: IPv6: FWS-U, w: ffff8807ea121ba0  fn: ffff880856a09260  pn:           (null)
>  8110 Jan 24 15:48:05 2u-6n kernel: 8021q: adding VLAN 0 to HW filter on device eth2#1008
>  8111 Jan 24 15:48:05 2u-6n kernel: 8021q: adding VLAN 0 to HW filter on device eth2#1009
> ....
>
> Thanks,
> Ben
>
David Ahern April 18, 2018, 12:38 a.m. UTC | #9
On 4/17/18 5:29 PM, Ben Greear wrote:
> 
> FYI, problem still happens in 4.16.  I'm going to re-enable my hack below
> for this kernel as well...I had hopes it might be fixed...

Interesting. I was hoping the same.

> 
> BUG: unable to handle kernel NULL pointer dereference at 8
> IP: fib6_walk_continue+0x5b/0x140 [ipv6]
> PGD 80000007dfc0c067 P4D 80000007dfc0c067 PUD 7e66ff067 PMD 0
> Oops: 0000 [#1] PREEMPT SMP PTI
> Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink
> nf_defrag_ipv4 libcrc32c vrf]
> CPU: 3 PID: 15117 Comm: ip Tainted: G           O     4.16.0+ #5
> Hardware name: Iron_Systems,Inc CS-CAD-2U-A02/X10SRL-F, BIOS 2.0b
> 05/02/2017
> RIP: 0010:fib6_walk_continue+0x5b/0x140 [ipv6]
> RSP: 0018:ffffc90008c3bc10 EFLAGS: 00010287
> RAX: ffff88085ac45050 RBX: ffff8807e03008a0 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffffc90008c3bc48 RDI: ffffffff8232b240
> RBP: ffff880819167600 R08: 0000000000000008 R09: ffff8807dff10071
> R10: ffffc90008c3bbd0 R11: 0000000000000000 R12: ffff8807e03008a0
> R13: 0000000000000002 R14: ffff8807e05744c8 R15: ffff8807e08ef000
> FS:  00007f2f04342700(0000) GS:ffff88087fcc0000(0000)
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000008 CR3: 00000007e0556002 CR4: 00000000003606e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  inet6_dump_fib+0x14b/0x2c0 [ipv6]
>  netlink_dump+0x216/0x2a0
>  netlink_recvmsg+0x254/0x400
>  ? copy_msghdr_from_user+0xb5/0x110
>  ___sys_recvmsg+0xe9/0x230
>  ? find_held_lock+0x3b/0xb0
>  ? __handle_mm_fault+0x617/0x1180
>  ? __audit_syscall_entry+0xb3/0x110
>  ? __sys_recvmsg+0x39/0x70
>  __sys_recvmsg+0x39/0x70
>  do_syscall_64+0x63/0x120
>  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> RIP: 0033:0x7f2f03a72030
> RSP: 002b:00007fffab3de508 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
> RAX: ffffffffffffffda RBX: 00007fffab3e641c RCX: 00007f2f03a72030
> RDX: 0000000000000000 RSI: 00007fffab3de570 RDI: 0000000000000004
> RBP: 0000000000000000 R08: 0000000000007e6c R09: 00007fffab3e63a8
> R10: 00007fffab3de5b0 R11: 0000000000000246 R12: 00007fffab3e6608
> R13: 000000000066b460 R14: 0000000000007e6c R15: 0000000000000000
> Code: 85 d2 74 17 f6 40 2a 04 74 11 8b 53 2c 85 d2 0f 84 d7 00 00 00 83
> ea 01 89 53 2c c7 4
> RIP: fib6_walk_continue+0x5b/0x140 [ipv6] RSP: ffffc90008c3bc10
> CR2: 0000000000000008
> ---[ end trace bd03458864eb266c ]---
> Kernel panic - not syncing: Fatal exception in interrupt
> Kernel Offset: disabled
> Rebooting in 10 seconds..
> ACPI MEMORY or I/O RESET_REG.
> 


Since you can reproduce, would you mind trying
    https://github.com/dsahern/linux.git ipv6/fib6-change-v2

Hopefully these will be committed upstream soon. It changes the game a
bit with the FIB walker. Would be interesting to know if this problem
goes away.
diff mbox

Patch

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index deea901746c8..367f1284f05b 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1396,6 +1396,8 @@  static struct fib6_node *fib6_repair_tree(struct net *net,
 						RT6_TRACE("W %p adjusted by delnode 2, s=%d\n", w, w->state);
 						w->state = w->state >= FWS_C ? FWS_U : FWS_INIT;
 					}
+					if (w->state == FWS_U)
+						pr_warn("fib6_repair_tree: W %p adjusted by delnode 2, state FWS_U\n", w, w->state);
 				}
 			}
 		}
@@ -1447,8 +1449,10 @@  static void fib6_del_route(struct fib6_node *fn, struct rt6_info **rtp,
 		if (w->state == FWS_C && w->leaf == rt) {
 			RT6_TRACE("walker %p adjusted by delroute\n", w);
 			w->leaf = rt->dst.rt6_next;
-			if (!w->leaf)
+			if (!w->leaf) {
+				pr_warn("fib6_del_route: walker %p adjusted by delroute - state FWS_U\n", w);
 				w->state = FWS_U;
+			}
 		}
 	}
 	read_unlock(&net->ipv6.fib6_walker_lock);
@@ -1591,6 +1595,7 @@  static int fib6_walk_continue(struct fib6_walker *w)
 				continue;
 			}
 skip:
+			pr_warn("fib6_walk_continue: set state to FWS_U\n");
 			w->state = FWS_U;
 		case FWS_U:
 			if (fn == w->root)