diff mbox series

net: Work around crash in ipv6 fib-walk-continue

Message ID 1524160886-20401-1-git-send-email-greearb@candelatech.com
State RFC, archived
Delegated to: David Miller

Commit Message

Ben Greear April 19, 2018, 6:01 p.m. UTC
From: Ben Greear <greearb@candelatech.com>

This keeps us from crashing in certain test cases where we
bring up many (1000, for instance) mac-vlans with IPv6
enabled in the kernel.  This bug has been around for a
very long time.

Until a real fix is found (and for stable), maybe it
is better to return an incomplete fib walk instead
of crashing.

BUG: unable to handle kernel NULL pointer dereference at 8
IP: fib6_walk_continue+0x5b/0x140 [ipv6]
PGD 80000007dfc0c067 P4D 80000007dfc0c067 PUD 7e66ff067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 libcrc32c vrf]
CPU: 3 PID: 15117 Comm: ip Tainted: G           O     4.16.0+ #5
Hardware name: Iron_Systems,Inc CS-CAD-2U-A02/X10SRL-F, BIOS 2.0b 05/02/2017
RIP: 0010:fib6_walk_continue+0x5b/0x140 [ipv6]
RSP: 0018:ffffc90008c3bc10 EFLAGS: 00010287
RAX: ffff88085ac45050 RBX: ffff8807e03008a0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffc90008c3bc48 RDI: ffffffff8232b240
RBP: ffff880819167600 R08: 0000000000000008 R09: ffff8807dff10071
R10: ffffc90008c3bbd0 R11: 0000000000000000 R12: ffff8807e03008a0
R13: 0000000000000002 R14: ffff8807e05744c8 R15: ffff8807e08ef000
FS:  00007f2f04342700(0000) GS:ffff88087fcc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 00000007e0556002 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 inet6_dump_fib+0x14b/0x2c0 [ipv6]
 netlink_dump+0x216/0x2a0
 netlink_recvmsg+0x254/0x400
 ? copy_msghdr_from_user+0xb5/0x110
 ___sys_recvmsg+0xe9/0x230
 ? find_held_lock+0x3b/0xb0
 ? __handle_mm_fault+0x617/0x1180
 ? __audit_syscall_entry+0xb3/0x110
 ? __sys_recvmsg+0x39/0x70
 __sys_recvmsg+0x39/0x70
 do_syscall_64+0x63/0x120
 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7f2f03a72030
RSP: 002b:00007fffab3de508 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
RAX: ffffffffffffffda RBX: 00007fffab3e641c RCX: 00007f2f03a72030
RDX: 0000000000000000 RSI: 00007fffab3de570 RDI: 0000000000000004
RBP: 0000000000000000 R08: 0000000000007e6c R09: 00007fffab3e63a8
R10: 00007fffab3de5b0 R11: 0000000000000246 R12: 00007fffab3e6608
R13: 000000000066b460 R14: 0000000000007e6c R15: 0000000000000000
Code: 85 d2 74 17 f6 40 2a 04 74 11 8b 53 2c 85 d2 0f 84 d7 00 00 00 83 ea 01 89 53 2c c7 4
RIP: fib6_walk_continue+0x5b/0x140 [ipv6] RSP: ffffc90008c3bc10
CR2: 0000000000000008
---[ end trace bd03458864eb266c ]---

Signed-off-by: Ben Greear <greearb@candelatech.com>
---

* This patch is against 4.16+, but a similar patch fixes the same issue
  on older kernels.  Perhaps David Ahern's fib6 changes will resolve
  this in newer kernels, but I guess those won't be backported, so
  maybe this patch is still useful either way.

 net/ipv6/ip6_fib.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

David Ahern May 4, 2018, 5:47 p.m. UTC | #1
On 4/19/18 12:01 PM, greearb@candelatech.com wrote:
> From: Ben Greear <greearb@candelatech.com>
> 
> This keeps us from crashing in certain test cases where we
> bring up many (1000, for instance) mac-vlans with IPv6
> enabled in the kernel.  This bug has been around for a
> very long time.
> 
> Until a real fix is found (and for stable), maybe it
> is better to return an incomplete fib walk instead
> of crashing.
> 
> Signed-off-by: Ben Greear <greearb@candelatech.com>
> ---
> 

Does your use case that triggers this involve replacing routes? I just
noticed the route delete code in fib6_add_rt2node does not have the
'Adjust walkers' code that is in fib6_del_route.

Further, the adjust walkers code in fib6_del_route looks suspicious in
its timing with route deletes. If you have a reliable reproducer we can
try a few things with fib6_del_route and the walker code.
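
For readers unfamiliar with the idea, the following is a minimal userspace sketch of what "adjusting walkers" on route delete could look like: when a route is unlinked, any paused walker still pointing at it is moved to the next entry, or told to go back up if there is none. Every name here (toy_route, toy_walker, toy_del_route, the TOY_* states) is hypothetical, and the model is deliberately simplified; it is not the code in net/ipv6/ip6_fib.c and it ignores locking and RCU entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Toy route entry in a singly linked leaf list (hypothetical type). */
struct toy_route {
	struct toy_route *next;
	int id;
};

/* TOY_CONTINUE: resume at w->leaf; TOY_UP: nothing left here, go up. */
enum toy_state { TOY_CONTINUE, TOY_UP };

/* Toy walker that may be paused while pointing at a route. */
struct toy_walker {
	struct toy_walker *next_walker;
	struct toy_route *leaf;
	enum toy_state state;
};

/*
 * Unlink 'rt' from the list at '*head', then fix up every paused
 * walker that still points at it so none resumes on a stale entry.
 * Returns the number of walkers that were adjusted.
 */
static int toy_del_route(struct toy_route **head, struct toy_route *rt,
			 struct toy_walker *walkers)
{
	struct toy_route **pp;
	struct toy_walker *w;
	int adjusted = 0;

	assert(head && rt);

	/* Unlink rt if it is on the list. */
	for (pp = head; *pp; pp = &(*pp)->next) {
		if (*pp == rt) {
			*pp = rt->next;
			break;
		}
	}

	/* Adjust walkers: skip past the deleted entry. */
	for (w = walkers; w; w = w->next_walker) {
		if (w->state == TOY_CONTINUE && w->leaf == rt) {
			w->leaf = rt->next;
			if (!w->leaf)
				w->state = TOY_UP;
			adjusted++;
		}
	}
	return adjusted;
}
```

In this model, a delete path that unlinks an entry without the second loop would leave a paused walker holding a pointer to a removed entry, which is the kind of gap the question above is probing for.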

Ben Greear May 4, 2018, 5:57 p.m. UTC | #2
On 05/04/2018 10:47 AM, David Ahern wrote:
> On 4/19/18 12:01 PM, greearb@candelatech.com wrote:
>> From: Ben Greear <greearb@candelatech.com>
>>
>> This keeps us from crashing in certain test cases where we
>> bring up many (1000, for instance) mac-vlans with IPv6
>> enabled in the kernel.  This bug has been around for a
>> very long time.
>>
>> Until a real fix is found (and for stable), maybe it
>> is better to return an incomplete fib walk instead
>> of crashing.
>>
>> Signed-off-by: Ben Greear <greearb@candelatech.com>
>> ---
>>
>
> Does your use case that triggers this involve replacing routes? I just
> noticed the route delete code in fib6_add_rt2node does not have the
> 'Adjust walkers' code that is in fib6_del_route.
>
> Further, the adjust walkers code in fib6_del_route looks suspicious in
> its timing with route deletes. If you have a reliable reproducer we can
> try a few things with fib6_del_route and the walker code.

Yes, we replace routes, and yes we can reliably reproduce it and will
be happy to test patches.

Thanks,
Ben

Patch

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 92b8d8c..afef362 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1855,6 +1855,12 @@  static int fib6_walk_continue(struct fib6_walker *w)
 			if (fn == w->root)
 				return 0;
 			pn = rcu_dereference_protected(fn->parent, 1);
+			if (WARN_ON_ONCE(!pn)) {
+				pr_err("FWS-U, w: %p  fn: %p  pn: %p\n",
+				       w, fn, pn);
+				/* Attempt to work around crash that has been here forever. --Ben */
+				return 0;
+			}
 			left = rcu_dereference_protected(pn->left, 1);
 			right = rcu_dereference_protected(pn->right, 1);
 			w->node = pn;
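
The guard in the hunk above can be tried out in isolation with a small userspace sketch. It climbs parent pointers towards a root and gives up, instead of dereferencing, when it meets a node whose parent is NULL. The names toy_node and toy_walk_up are hypothetical stand-ins, not kernel symbols. Unlike the patch, which returns 0, the sketch returns -1 so that a caller can tell an abandoned walk from a finished one. The toy struct places `left` directly after `parent`, so on a typical 64-bit build a read of `pn->left` through a NULL `pn` would fault at address 8; that would be consistent with the CR2 value in the oops if the real struct is laid out similarly, which is an assumption here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-in for a tree node; NOT the kernel's struct fib6_node. */
struct toy_node {
	struct toy_node *parent;
	struct toy_node *left;
	struct toy_node *right;
};

/*
 * Climb from 'fn' towards 'root', counting the steps taken.
 * Returns 0 when the root was reached, or -1 when a node with a
 * NULL parent was met first (the walk is abandoned, not crashed).
 */
static int toy_walk_up(const struct toy_node *root,
		       const struct toy_node *fn, int *steps)
{
	assert(root && fn && steps);

	*steps = 0;
	while (fn != root) {
		const struct toy_node *pn = fn->parent;

		if (!pn) {
			/* Report once per call and bail out. */
			fprintf(stderr, "toy walk: node %p has no parent\n",
				(const void *)fn);
			return -1;
		}
		/* From here on it is safe to read pn->left and pn->right. */
		fn = pn;
		(*steps)++;
	}
	return 0;
}
```

As in the patch, the check only hides the symptom: the walk result is incomplete whenever the guard fires, and the orphaned node still needs explaining.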