Message ID | 20180106013835.127556-1-tracywwnj@gmail.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
Series | [net] ipv6: remove null_entry before adding default route | expand |
On Fri, Jan 05, 2018 at 05:38:35PM -0800, Wei Wang wrote: > From: Wei Wang <weiwan@google.com> > > In the current code, when creating a new fib6 table, tb6_root.leaf gets > initialized to net->ipv6.ip6_null_entry. > If a default route is being added with rt->rt6i_metric = 0xffffffff, > fib6_add() will add this route after net->ipv6.ip6_null_entry. As > null_entry is shared, it could cause problem. > > In order to fix it, set fn->leaf to NULL before calling > fib6_add_rt2node() when trying to add the first default route. > And reset fn->leaf to null_entry when adding fails or when deleting the > last default route. > > syzkaller reported the following issue which is fixed by this commit: > ============================= > WARNING: suspicious RCU usage > 4.15.0-rc5+ #171 Not tainted > ----------------------------- > net/ipv6/ip6_fib.c:1702 suspicious rcu_dereference_protected() usage! > > other info that might help us debug this: > > rcu_scheduler_active = 2, debug_locks = 1 > 4 locks held by swapper/0/0: > #0: ((&net->ipv6.ip6_fib_timer)){+.-.}, at: [<00000000d43f631b>] lockdep_copy_map include/linux/lockdep.h:178 [inline] > #0: ((&net->ipv6.ip6_fib_timer)){+.-.}, at: [<00000000d43f631b>] call_timer_fn+0x1c6/0x820 kernel/time/timer.c:1310 > #1: (&(&net->ipv6.fib6_gc_lock)->rlock){+.-.}, at: [<000000002ff9d65c>] spin_lock_bh include/linux/spinlock.h:315 [inline] > #1: (&(&net->ipv6.fib6_gc_lock)->rlock){+.-.}, at: [<000000002ff9d65c>] fib6_run_gc+0x9d/0x3c0 net/ipv6/ip6_fib.c:2007 > #2: (rcu_read_lock){....}, at: [<0000000091db762d>] __fib6_clean_all+0x0/0x3a0 net/ipv6/ip6_fib.c:1560 > #3: (&(&tb->tb6_lock)->rlock){+.-.}, at: [<000000009e503581>] spin_lock_bh include/linux/spinlock.h:315 [inline] > #3: (&(&tb->tb6_lock)->rlock){+.-.}, at: [<000000009e503581>] __fib6_clean_all+0x1d0/0x3a0 net/ipv6/ip6_fib.c:1948 > > stack backtrace: > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc5+ #171 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > Call Trace: > <IRQ> > __dump_stack lib/dump_stack.c:17 [inline] > dump_stack+0x194/0x257 lib/dump_stack.c:53 > lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4585 > fib6_del+0xcaa/0x11b0 net/ipv6/ip6_fib.c:1701 > fib6_clean_node+0x3aa/0x4f0 net/ipv6/ip6_fib.c:1892 > fib6_walk_continue+0x46c/0x8a0 net/ipv6/ip6_fib.c:1815 > fib6_walk+0x91/0xf0 net/ipv6/ip6_fib.c:1863 > fib6_clean_tree+0x1e6/0x340 net/ipv6/ip6_fib.c:1933 > __fib6_clean_all+0x1f4/0x3a0 net/ipv6/ip6_fib.c:1949 > fib6_clean_all net/ipv6/ip6_fib.c:1960 [inline] > fib6_run_gc+0x16b/0x3c0 net/ipv6/ip6_fib.c:2016 > fib6_gc_timer_cb+0x20/0x30 net/ipv6/ip6_fib.c:2033 > call_timer_fn+0x228/0x820 kernel/time/timer.c:1320 > expire_timers kernel/time/timer.c:1357 [inline] > __run_timers+0x7ee/0xb70 kernel/time/timer.c:1660 > run_timer_softirq+0x4c/0xb0 kernel/time/timer.c:1686 > __do_softirq+0x2d7/0xb85 kernel/softirq.c:285 > invoke_softirq kernel/softirq.c:365 [inline] > irq_exit+0x1cc/0x200 kernel/softirq.c:405 > exiting_irq arch/x86/include/asm/apic.h:540 [inline] > smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052 > apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:904 > </IRQ> > > Reported-by: syzbot <syzkaller@googlegroups.com> > Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table") > Signed-off-by: Wei Wang <weiwan@google.com> > --- > net/ipv6/ip6_fib.c | 45 +++++++++++++++++++++++++++++++++++---------- > 1 file changed, 35 insertions(+), 10 deletions(-) > > diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c > index d11a5578e4f8..37cb4ad1ea29 100644 > --- a/net/ipv6/ip6_fib.c > +++ b/net/ipv6/ip6_fib.c > @@ -640,6 +640,11 @@ static struct fib6_node *fib6_add_1(struct net *net, > if (!(fn->fn_flags & RTN_RTINFO)) { > RCU_INIT_POINTER(fn->leaf, NULL); > rt6_release(leaf); > + /* remove null_entry in the root node */ > + } else if (fn->fn_flags & RTN_TL_ROOT && > + rcu_access_pointer(fn->leaf) == > + net->ipv6.ip6_null_entry) { > + RCU_INIT_POINTER(fn->leaf, NULL); It seems the reader side could see tb6_root.leaf == NULL after this change and I think it should be fine? If it is, instead of switching betwen NULL and ip6_null_entry, would it be simpler to always set tb6_root.leaf to NULL until a legit default route is added? > } > > return fn; > @@ -1270,14 +1275,27 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt, > return err; > > failure: > - /* fn->leaf could be NULL if fn is an intermediate node and we > - * failed to add the new route to it in both subtree creation > - * failure and fib6_add_rt2node() failure case. > - * In both cases, fib6_repair_tree() should be called to fix > + /* fn->leaf could be NULL if: > + * 1. fn is the root node in the table and we fail to add the default > + * route to it. > + * In this case, we put fn->leaf back to net->ipv6.ip6_null_entry as > + * the way the table was created. > + * 2. fn is an intermediate node and we failed to add the new > + * route to it in both subtree creation failure and fib6_add_rt2node() > + * failure case. > + * In this case, fib6_repair_tree() should be called to fix > * fn->leaf. > */ > - if (fn && !(fn->fn_flags & (RTN_RTINFO|RTN_ROOT))) > - fib6_repair_tree(info->nl_net, table, fn); > + if (fn) { > + if (fn->fn_flags & RTN_TL_ROOT) { > + if (!rcu_access_pointer(fn->leaf)) { > + rcu_assign_pointer(fn->leaf, > + info->nl_net->ipv6.ip6_null_entry); > + } > + } else if (!(fn->fn_flags & (RTN_RTINFO|RTN_ROOT))) { > + fib6_repair_tree(info->nl_net, table, fn); > + } > + } > /* Always release dst as dst->__refcnt is guaranteed > * to be taken before entering this function > */ > @@ -1685,11 +1703,18 @@ static void fib6_del_route(struct fib6_table *table, struct fib6_node *fn, > } > read_unlock(&net->ipv6.fib6_walker_lock); > > - /* If it was last route, expunge its radix tree node */ > + /* If it was last route: > + * 1. For root node, put back null_entry as how the table was created. > + * 2. For other nodes, expunge its radix tree node. > + */ > if (!rcu_access_pointer(fn->leaf)) { > - fn->fn_flags &= ~RTN_RTINFO; > - net->ipv6.rt6_stats->fib_route_nodes--; > - fn = fib6_repair_tree(net, table, fn); > + if (fn->fn_flags & RTN_TL_ROOT) { > + rcu_assign_pointer(fn->leaf, net->ipv6.ip6_null_entry); > + } else { > + fn->fn_flags &= ~RTN_RTINFO; > + net->ipv6.rt6_stats->fib_route_nodes--; > + fn = fib6_repair_tree(net, table, fn); > + } > } > > fib6_purge_rt(rt, fn, net); > -- > 2.16.0.rc0.223.g4a4ac83678-goog >
On Fri, Jan 5, 2018 at 11:42 PM, Martin KaFai Lau <kafai@fb.com> wrote: > On Fri, Jan 05, 2018 at 05:38:35PM -0800, Wei Wang wrote: >> From: Wei Wang <weiwan@google.com> >> >> In the current code, when creating a new fib6 table, tb6_root.leaf gets >> initialized to net->ipv6.ip6_null_entry. >> If a default route is being added with rt->rt6i_metric = 0xffffffff, >> fib6_add() will add this route after net->ipv6.ip6_null_entry. As >> null_entry is shared, it could cause problem. >> >> In order to fix it, set fn->leaf to NULL before calling >> fib6_add_rt2node() when trying to add the first default route. >> And reset fn->leaf to null_entry when adding fails or when deleting the >> last default route. >> >> syzkaller reported the following issue which is fixed by this commit: >> ============================= >> WARNING: suspicious RCU usage >> 4.15.0-rc5+ #171 Not tainted >> ----------------------------- >> net/ipv6/ip6_fib.c:1702 suspicious rcu_dereference_protected() usage! >> >> other info that might help us debug this: >> >> rcu_scheduler_active = 2, debug_locks = 1 >> 4 locks held by swapper/0/0: >> #0: ((&net->ipv6.ip6_fib_timer)){+.-.}, at: [<00000000d43f631b>] lockdep_copy_map include/linux/lockdep.h:178 [inline] >> #0: ((&net->ipv6.ip6_fib_timer)){+.-.}, at: [<00000000d43f631b>] call_timer_fn+0x1c6/0x820 kernel/time/timer.c:1310 >> #1: (&(&net->ipv6.fib6_gc_lock)->rlock){+.-.}, at: [<000000002ff9d65c>] spin_lock_bh include/linux/spinlock.h:315 [inline] >> #1: (&(&net->ipv6.fib6_gc_lock)->rlock){+.-.}, at: [<000000002ff9d65c>] fib6_run_gc+0x9d/0x3c0 net/ipv6/ip6_fib.c:2007 >> #2: (rcu_read_lock){....}, at: [<0000000091db762d>] __fib6_clean_all+0x0/0x3a0 net/ipv6/ip6_fib.c:1560 >> #3: (&(&tb->tb6_lock)->rlock){+.-.}, at: [<000000009e503581>] spin_lock_bh include/linux/spinlock.h:315 [inline] >> #3: (&(&tb->tb6_lock)->rlock){+.-.}, at: [<000000009e503581>] __fib6_clean_all+0x1d0/0x3a0 net/ipv6/ip6_fib.c:1948 >> >> stack backtrace: >> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc5+ #171 >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 >> Call Trace: >> <IRQ> >> __dump_stack lib/dump_stack.c:17 [inline] >> dump_stack+0x194/0x257 lib/dump_stack.c:53 >> lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4585 >> fib6_del+0xcaa/0x11b0 net/ipv6/ip6_fib.c:1701 >> fib6_clean_node+0x3aa/0x4f0 net/ipv6/ip6_fib.c:1892 >> fib6_walk_continue+0x46c/0x8a0 net/ipv6/ip6_fib.c:1815 >> fib6_walk+0x91/0xf0 net/ipv6/ip6_fib.c:1863 >> fib6_clean_tree+0x1e6/0x340 net/ipv6/ip6_fib.c:1933 >> __fib6_clean_all+0x1f4/0x3a0 net/ipv6/ip6_fib.c:1949 >> fib6_clean_all net/ipv6/ip6_fib.c:1960 [inline] >> fib6_run_gc+0x16b/0x3c0 net/ipv6/ip6_fib.c:2016 >> fib6_gc_timer_cb+0x20/0x30 net/ipv6/ip6_fib.c:2033 >> call_timer_fn+0x228/0x820 kernel/time/timer.c:1320 >> expire_timers kernel/time/timer.c:1357 [inline] >> __run_timers+0x7ee/0xb70 kernel/time/timer.c:1660 >> run_timer_softirq+0x4c/0xb0 kernel/time/timer.c:1686 >> __do_softirq+0x2d7/0xb85 kernel/softirq.c:285 >> invoke_softirq kernel/softirq.c:365 [inline] >> irq_exit+0x1cc/0x200 kernel/softirq.c:405 >> exiting_irq arch/x86/include/asm/apic.h:540 [inline] >> smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052 >> apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:904 >> </IRQ> >> >> Reported-by: syzbot <syzkaller@googlegroups.com> >> Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table") >> Signed-off-by: Wei Wang <weiwan@google.com> >> --- >> net/ipv6/ip6_fib.c | 45 +++++++++++++++++++++++++++++++++++---------- >> 1 file changed, 35 insertions(+), 10 deletions(-) >> >> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c >> index d11a5578e4f8..37cb4ad1ea29 100644 >> --- a/net/ipv6/ip6_fib.c >> +++ b/net/ipv6/ip6_fib.c >> @@ -640,6 +640,11 @@ static struct fib6_node *fib6_add_1(struct net *net, >> if (!(fn->fn_flags & RTN_RTINFO)) { >> RCU_INIT_POINTER(fn->leaf, NULL); >> rt6_release(leaf); >> + /* remove null_entry in the root node */ >> + } else if (fn->fn_flags & RTN_TL_ROOT && >> + rcu_access_pointer(fn->leaf) == >> + net->ipv6.ip6_null_entry) { >> + RCU_INIT_POINTER(fn->leaf, NULL); > It seems the reader side could see tb6_root.leaf == NULL after > this change and I think it should be fine? If it is, instead > of switching betwen NULL and ip6_null_entry, would it be simpler > to always set tb6_root.leaf to NULL until a legit default route > is added? > Agree that reader side could see tb6_root.leaf == NULL and it should be fine cause the lookup code path takes care of NULL case. If we init tb6_root to NULL, it does seem simpler. The only place we need to take care is that fib6_del() should not try to expunge fn if is root node if it does not contain any route. But does it seem weird that tb6_root.fn_flags is marked with RTN_RTINFO without containing any route? >> } >> >> return fn; >> @@ -1270,14 +1275,27 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt, >> return err; >> >> failure: >> - /* fn->leaf could be NULL if fn is an intermediate node and we >> - * failed to add the new route to it in both subtree creation >> - * failure and fib6_add_rt2node() failure case. >> - * In both cases, fib6_repair_tree() should be called to fix >> + /* fn->leaf could be NULL if: >> + * 1. fn is the root node in the table and we fail to add the default >> + * route to it. >> + * In this case, we put fn->leaf back to net->ipv6.ip6_null_entry as >> + * the way the table was created. >> + * 2. fn is an intermediate node and we failed to add the new >> + * route to it in both subtree creation failure and fib6_add_rt2node() >> + * failure case. >> + * In this case, fib6_repair_tree() should be called to fix >> * fn->leaf. >> */ >> - if (fn && !(fn->fn_flags & (RTN_RTINFO|RTN_ROOT))) >> - fib6_repair_tree(info->nl_net, table, fn); >> + if (fn) { >> + if (fn->fn_flags & RTN_TL_ROOT) { >> + if (!rcu_access_pointer(fn->leaf)) { >> + rcu_assign_pointer(fn->leaf, >> + info->nl_net->ipv6.ip6_null_entry); >> + } >> + } else if (!(fn->fn_flags & (RTN_RTINFO|RTN_ROOT))) { >> + fib6_repair_tree(info->nl_net, table, fn); >> + } >> + } >> /* Always release dst as dst->__refcnt is guaranteed >> * to be taken before entering this function >> */ >> @@ -1685,11 +1703,18 @@ static void fib6_del_route(struct fib6_table *table, struct fib6_node *fn, >> } >> read_unlock(&net->ipv6.fib6_walker_lock); >> >> - /* If it was last route, expunge its radix tree node */ >> + /* If it was last route: >> + * 1. For root node, put back null_entry as how the table was created. >> + * 2. For other nodes, expunge its radix tree node. >> + */ >> if (!rcu_access_pointer(fn->leaf)) { >> - fn->fn_flags &= ~RTN_RTINFO; >> - net->ipv6.rt6_stats->fib_route_nodes--; >> - fn = fib6_repair_tree(net, table, fn); >> + if (fn->fn_flags & RTN_TL_ROOT) { >> + rcu_assign_pointer(fn->leaf, net->ipv6.ip6_null_entry); >> + } else { >> + fn->fn_flags &= ~RTN_RTINFO; >> + net->ipv6.rt6_stats->fib_route_nodes--; >> + fn = fib6_repair_tree(net, table, fn); >> + } >> } >> >> fib6_purge_rt(rt, fn, net); >> -- >> 2.16.0.rc0.223.g4a4ac83678-goog >>
On Sat, Jan 06, 2018 at 05:41:28PM -0800, Wei Wang wrote: > On Fri, Jan 5, 2018 at 11:42 PM, Martin KaFai Lau <kafai@fb.com> wrote: > > On Fri, Jan 05, 2018 at 05:38:35PM -0800, Wei Wang wrote: > >> From: Wei Wang <weiwan@google.com> > >> > >> In the current code, when creating a new fib6 table, tb6_root.leaf gets > >> initialized to net->ipv6.ip6_null_entry. > >> If a default route is being added with rt->rt6i_metric = 0xffffffff, > >> fib6_add() will add this route after net->ipv6.ip6_null_entry. As > >> null_entry is shared, it could cause problem. > >> > >> In order to fix it, set fn->leaf to NULL before calling > >> fib6_add_rt2node() when trying to add the first default route. > >> And reset fn->leaf to null_entry when adding fails or when deleting the > >> last default route. > >> > >> syzkaller reported the following issue which is fixed by this commit: > >> ============================= > >> WARNING: suspicious RCU usage > >> 4.15.0-rc5+ #171 Not tainted > >> ----------------------------- > >> net/ipv6/ip6_fib.c:1702 suspicious rcu_dereference_protected() usage! > >> > >> other info that might help us debug this: > >> > >> rcu_scheduler_active = 2, debug_locks = 1 > >> 4 locks held by swapper/0/0: > >> #0: ((&net->ipv6.ip6_fib_timer)){+.-.}, at: [<00000000d43f631b>] lockdep_copy_map include/linux/lockdep.h:178 [inline] > >> #0: ((&net->ipv6.ip6_fib_timer)){+.-.}, at: [<00000000d43f631b>] call_timer_fn+0x1c6/0x820 kernel/time/timer.c:1310 > >> #1: (&(&net->ipv6.fib6_gc_lock)->rlock){+.-.}, at: [<000000002ff9d65c>] spin_lock_bh include/linux/spinlock.h:315 [inline] > >> #1: (&(&net->ipv6.fib6_gc_lock)->rlock){+.-.}, at: [<000000002ff9d65c>] fib6_run_gc+0x9d/0x3c0 net/ipv6/ip6_fib.c:2007 > >> #2: (rcu_read_lock){....}, at: [<0000000091db762d>] __fib6_clean_all+0x0/0x3a0 net/ipv6/ip6_fib.c:1560 > >> #3: (&(&tb->tb6_lock)->rlock){+.-.}, at: [<000000009e503581>] spin_lock_bh include/linux/spinlock.h:315 [inline] > >> #3: (&(&tb->tb6_lock)->rlock){+.-.}, at: [<000000009e503581>] __fib6_clean_all+0x1d0/0x3a0 net/ipv6/ip6_fib.c:1948 > >> > >> stack backtrace: > >> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc5+ #171 > >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > >> Call Trace: > >> <IRQ> > >> __dump_stack lib/dump_stack.c:17 [inline] > >> dump_stack+0x194/0x257 lib/dump_stack.c:53 > >> lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4585 > >> fib6_del+0xcaa/0x11b0 net/ipv6/ip6_fib.c:1701 > >> fib6_clean_node+0x3aa/0x4f0 net/ipv6/ip6_fib.c:1892 > >> fib6_walk_continue+0x46c/0x8a0 net/ipv6/ip6_fib.c:1815 > >> fib6_walk+0x91/0xf0 net/ipv6/ip6_fib.c:1863 > >> fib6_clean_tree+0x1e6/0x340 net/ipv6/ip6_fib.c:1933 > >> __fib6_clean_all+0x1f4/0x3a0 net/ipv6/ip6_fib.c:1949 > >> fib6_clean_all net/ipv6/ip6_fib.c:1960 [inline] > >> fib6_run_gc+0x16b/0x3c0 net/ipv6/ip6_fib.c:2016 > >> fib6_gc_timer_cb+0x20/0x30 net/ipv6/ip6_fib.c:2033 > >> call_timer_fn+0x228/0x820 kernel/time/timer.c:1320 > >> expire_timers kernel/time/timer.c:1357 [inline] > >> __run_timers+0x7ee/0xb70 kernel/time/timer.c:1660 > >> run_timer_softirq+0x4c/0xb0 kernel/time/timer.c:1686 > >> __do_softirq+0x2d7/0xb85 kernel/softirq.c:285 > >> invoke_softirq kernel/softirq.c:365 [inline] > >> irq_exit+0x1cc/0x200 kernel/softirq.c:405 > >> exiting_irq arch/x86/include/asm/apic.h:540 [inline] > >> smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052 > >> apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:904 > >> </IRQ> > >> > >> Reported-by: syzbot <syzkaller@googlegroups.com> > >> Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table") > >> Signed-off-by: Wei Wang <weiwan@google.com> > >> --- > >> net/ipv6/ip6_fib.c | 45 +++++++++++++++++++++++++++++++++++---------- > >> 1 file changed, 35 insertions(+), 10 deletions(-) > >> > >> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c > >> index d11a5578e4f8..37cb4ad1ea29 100644 > >> --- a/net/ipv6/ip6_fib.c > >> +++ b/net/ipv6/ip6_fib.c > >> @@ -640,6 +640,11 @@ static struct fib6_node *fib6_add_1(struct net *net, > >> if (!(fn->fn_flags & RTN_RTINFO)) { > >> RCU_INIT_POINTER(fn->leaf, NULL); > >> rt6_release(leaf); > >> + /* remove null_entry in the root node */ > >> + } else if (fn->fn_flags & RTN_TL_ROOT && > >> + rcu_access_pointer(fn->leaf) == > >> + net->ipv6.ip6_null_entry) { > >> + RCU_INIT_POINTER(fn->leaf, NULL); > > It seems the reader side could see tb6_root.leaf == NULL after > > this change and I think it should be fine? If it is, instead > > of switching betwen NULL and ip6_null_entry, would it be simpler > > to always set tb6_root.leaf to NULL until a legit default route > > is added? > > > > Agree that reader side could see tb6_root.leaf == NULL and it should > be fine cause the lookup code path takes care of NULL case. > If we init tb6_root to NULL, it does seem simpler. The only place we > need to take care is that fib6_del() should not try to expunge fn if > is root node if it does not contain any route. But does it seem weird > that tb6_root.fn_flags is marked with RTN_RTINFO without containing > any route? Good point on the writer's side. I think we also need to modify fib6_add_1() to expect fn->leaf == NULL. Also, agree that RTN_RTINFO needs to be set/clear accordingly. We also need to make changes to the CONFIG_IPV6_SUBTREES...... It should be doable but requires more changes to break the writer's side assumption. net-next may be a better fit for those changes. For this patch, I have one inline comment below: > > >> } > >> > >> return fn; > >> @@ -1270,14 +1275,27 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt, > >> return err; > >> > >> failure: > >> - /* fn->leaf could be NULL if fn is an intermediate node and we > >> - * failed to add the new route to it in both subtree creation > >> - * failure and fib6_add_rt2node() failure case. > >> - * In both cases, fib6_repair_tree() should be called to fix > >> + /* fn->leaf could be NULL if: > >> + * 1. fn is the root node in the table and we fail to add the default > >> + * route to it. > >> + * In this case, we put fn->leaf back to net->ipv6.ip6_null_entry as > >> + * the way the table was created. > >> + * 2. fn is an intermediate node and we failed to add the new > >> + * route to it in both subtree creation failure and fib6_add_rt2node() > >> + * failure case. > >> + * In this case, fib6_repair_tree() should be called to fix > >> * fn->leaf. > >> */ > >> - if (fn && !(fn->fn_flags & (RTN_RTINFO|RTN_ROOT))) > >> - fib6_repair_tree(info->nl_net, table, fn); > >> + if (fn) { > >> + if (fn->fn_flags & RTN_TL_ROOT) { > >> + if (!rcu_access_pointer(fn->leaf)) { > >> + rcu_assign_pointer(fn->leaf, > >> + info->nl_net->ipv6.ip6_null_entry); > >> + } > >> + } else if (!(fn->fn_flags & (RTN_RTINFO|RTN_ROOT))) { > >> + fib6_repair_tree(info->nl_net, table, fn); fib6_repair_tree() is responsible to ensure intermediate fn->leaf is not NULL (or free the fn if possible). In this case, we want tb6_root.leaf to be not NULL. Can we consolidate the tb6_root.leaf fixing codes to fib6_repair_tree() also such that all tree repairing logic is in one place? > >> + } > >> + } > >> /* Always release dst as dst->__refcnt is guaranteed > >> * to be taken before entering this function > >> */ > >> @@ -1685,11 +1703,18 @@ static void fib6_del_route(struct fib6_table *table, struct fib6_node *fn, > >> } > >> read_unlock(&net->ipv6.fib6_walker_lock); > >> > >> - /* If it was last route, expunge its radix tree node */ > >> + /* If it was last route: > >> + * 1. For root node, put back null_entry as how the table was created. > >> + * 2. For other nodes, expunge its radix tree node. > >> + */ > >> if (!rcu_access_pointer(fn->leaf)) { > >> - fn->fn_flags &= ~RTN_RTINFO; > >> - net->ipv6.rt6_stats->fib_route_nodes--; > >> - fn = fib6_repair_tree(net, table, fn); > >> + if (fn->fn_flags & RTN_TL_ROOT) { > >> + rcu_assign_pointer(fn->leaf, net->ipv6.ip6_null_entry); > >> + } else { > >> + fn->fn_flags &= ~RTN_RTINFO; > >> + net->ipv6.rt6_stats->fib_route_nodes--; > >> + fn = fib6_repair_tree(net, table, fn); > >> + } > >> } > >> > >> fib6_purge_rt(rt, fn, net); > >> -- > >> 2.16.0.rc0.223.g4a4ac83678-goog > >>
On Sat, Jan 6, 2018 at 10:16 PM, Martin KaFai Lau <kafai@fb.com> wrote: > On Sat, Jan 06, 2018 at 05:41:28PM -0800, Wei Wang wrote: >> On Fri, Jan 5, 2018 at 11:42 PM, Martin KaFai Lau <kafai@fb.com> wrote: >> > On Fri, Jan 05, 2018 at 05:38:35PM -0800, Wei Wang wrote: >> >> From: Wei Wang <weiwan@google.com> >> >> >> >> In the current code, when creating a new fib6 table, tb6_root.leaf gets >> >> initialized to net->ipv6.ip6_null_entry. >> >> If a default route is being added with rt->rt6i_metric = 0xffffffff, >> >> fib6_add() will add this route after net->ipv6.ip6_null_entry. As >> >> null_entry is shared, it could cause problem. >> >> >> >> In order to fix it, set fn->leaf to NULL before calling >> >> fib6_add_rt2node() when trying to add the first default route. >> >> And reset fn->leaf to null_entry when adding fails or when deleting the >> >> last default route. >> >> >> >> syzkaller reported the following issue which is fixed by this commit: >> >> ============================= >> >> WARNING: suspicious RCU usage >> >> 4.15.0-rc5+ #171 Not tainted >> >> ----------------------------- >> >> net/ipv6/ip6_fib.c:1702 suspicious rcu_dereference_protected() usage! >> >> >> >> other info that might help us debug this: >> >> >> >> rcu_scheduler_active = 2, debug_locks = 1 >> >> 4 locks held by swapper/0/0: >> >> #0: ((&net->ipv6.ip6_fib_timer)){+.-.}, at: [<00000000d43f631b>] lockdep_copy_map include/linux/lockdep.h:178 [inline] >> >> #0: ((&net->ipv6.ip6_fib_timer)){+.-.}, at: [<00000000d43f631b>] call_timer_fn+0x1c6/0x820 kernel/time/timer.c:1310 >> >> #1: (&(&net->ipv6.fib6_gc_lock)->rlock){+.-.}, at: [<000000002ff9d65c>] spin_lock_bh include/linux/spinlock.h:315 [inline] >> >> #1: (&(&net->ipv6.fib6_gc_lock)->rlock){+.-.}, at: [<000000002ff9d65c>] fib6_run_gc+0x9d/0x3c0 net/ipv6/ip6_fib.c:2007 >> >> #2: (rcu_read_lock){....}, at: [<0000000091db762d>] __fib6_clean_all+0x0/0x3a0 net/ipv6/ip6_fib.c:1560 >> >> #3: (&(&tb->tb6_lock)->rlock){+.-.}, at: [<000000009e503581>] spin_lock_bh include/linux/spinlock.h:315 [inline] >> >> #3: (&(&tb->tb6_lock)->rlock){+.-.}, at: [<000000009e503581>] __fib6_clean_all+0x1d0/0x3a0 net/ipv6/ip6_fib.c:1948 >> >> >> >> stack backtrace: >> >> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc5+ #171 >> >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 >> >> Call Trace: >> >> <IRQ> >> >> __dump_stack lib/dump_stack.c:17 [inline] >> >> dump_stack+0x194/0x257 lib/dump_stack.c:53 >> >> lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4585 >> >> fib6_del+0xcaa/0x11b0 net/ipv6/ip6_fib.c:1701 >> >> fib6_clean_node+0x3aa/0x4f0 net/ipv6/ip6_fib.c:1892 >> >> fib6_walk_continue+0x46c/0x8a0 net/ipv6/ip6_fib.c:1815 >> >> fib6_walk+0x91/0xf0 net/ipv6/ip6_fib.c:1863 >> >> fib6_clean_tree+0x1e6/0x340 net/ipv6/ip6_fib.c:1933 >> >> __fib6_clean_all+0x1f4/0x3a0 net/ipv6/ip6_fib.c:1949 >> >> fib6_clean_all net/ipv6/ip6_fib.c:1960 [inline] >> >> fib6_run_gc+0x16b/0x3c0 net/ipv6/ip6_fib.c:2016 >> >> fib6_gc_timer_cb+0x20/0x30 net/ipv6/ip6_fib.c:2033 >> >> call_timer_fn+0x228/0x820 kernel/time/timer.c:1320 >> >> expire_timers kernel/time/timer.c:1357 [inline] >> >> __run_timers+0x7ee/0xb70 kernel/time/timer.c:1660 >> >> run_timer_softirq+0x4c/0xb0 kernel/time/timer.c:1686 >> >> __do_softirq+0x2d7/0xb85 kernel/softirq.c:285 >> >> invoke_softirq kernel/softirq.c:365 [inline] >> >> irq_exit+0x1cc/0x200 kernel/softirq.c:405 >> >> exiting_irq arch/x86/include/asm/apic.h:540 [inline] >> >> smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052 >> >> apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:904 >> >> </IRQ> >> >> >> >> Reported-by: syzbot <syzkaller@googlegroups.com> >> >> Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table") >> >> Signed-off-by: Wei Wang <weiwan@google.com> >> >> --- >> >> net/ipv6/ip6_fib.c | 45 +++++++++++++++++++++++++++++++++++---------- >> >> 1 file changed, 35 insertions(+), 10 deletions(-) >> >> >> >> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c >> >> index d11a5578e4f8..37cb4ad1ea29 100644 >> >> --- a/net/ipv6/ip6_fib.c >> >> +++ b/net/ipv6/ip6_fib.c >> >> @@ -640,6 +640,11 @@ static struct fib6_node *fib6_add_1(struct net *net, >> >> if (!(fn->fn_flags & RTN_RTINFO)) { >> >> RCU_INIT_POINTER(fn->leaf, NULL); >> >> rt6_release(leaf); >> >> + /* remove null_entry in the root node */ >> >> + } else if (fn->fn_flags & RTN_TL_ROOT && >> >> + rcu_access_pointer(fn->leaf) == >> >> + net->ipv6.ip6_null_entry) { >> >> + RCU_INIT_POINTER(fn->leaf, NULL); >> > It seems the reader side could see tb6_root.leaf == NULL after >> > this change and I think it should be fine? If it is, instead >> > of switching betwen NULL and ip6_null_entry, would it be simpler >> > to always set tb6_root.leaf to NULL until a legit default route >> > is added? >> > >> >> Agree that reader side could see tb6_root.leaf == NULL and it should >> be fine cause the lookup code path takes care of NULL case. >> If we init tb6_root to NULL, it does seem simpler. The only place we >> need to take care is that fib6_del() should not try to expunge fn if >> is root node if it does not contain any route. But does it seem weird >> that tb6_root.fn_flags is marked with RTN_RTINFO without containing >> any route? > Good point on the writer's side. I think we also need to modify > fib6_add_1() to expect fn->leaf == NULL. Also, agree that > RTN_RTINFO needs to be set/clear accordingly. We also need to > make changes to the CONFIG_IPV6_SUBTREES...... > It should be doable but requires more changes to break the writer's side > assumption. net-next may be a better fit for those changes. > Understood. Yes. Agree. I will give it a try on net-next. > For this patch, I have one inline comment below: > >> >> >> } >> >> >> >> return fn; >> >> @@ -1270,14 +1275,27 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt, >> >> return err; >> >> >> >> failure: >> >> - /* fn->leaf could be NULL if fn is an intermediate node and we >> >> - * failed to add the new route to it in both subtree creation >> >> - * failure and fib6_add_rt2node() failure case. >> >> - * In both cases, fib6_repair_tree() should be called to fix >> >> + /* fn->leaf could be NULL if: >> >> + * 1. fn is the root node in the table and we fail to add the default >> >> + * route to it. >> >> + * In this case, we put fn->leaf back to net->ipv6.ip6_null_entry as >> >> + * the way the table was created. >> >> + * 2. fn is an intermediate node and we failed to add the new >> >> + * route to it in both subtree creation failure and fib6_add_rt2node() >> >> + * failure case. >> >> + * In this case, fib6_repair_tree() should be called to fix >> >> * fn->leaf. >> >> */ >> >> - if (fn && !(fn->fn_flags & (RTN_RTINFO|RTN_ROOT))) >> >> - fib6_repair_tree(info->nl_net, table, fn); >> >> + if (fn) { >> >> + if (fn->fn_flags & RTN_TL_ROOT) { >> >> + if (!rcu_access_pointer(fn->leaf)) { >> >> + rcu_assign_pointer(fn->leaf, >> >> + info->nl_net->ipv6.ip6_null_entry); >> >> + } >> >> + } else if (!(fn->fn_flags & (RTN_RTINFO|RTN_ROOT))) { >> >> + fib6_repair_tree(info->nl_net, table, fn); > fib6_repair_tree() is responsible to ensure intermediate fn->leaf is > not NULL (or free the fn if possible). In this case, we want tb6_root.leaf > to be not NULL. Can we consolidate the tb6_root.leaf fixing codes to > fib6_repair_tree() also such that all tree repairing logic is > in one place? > Yes. Good idea. I will send out v2 for this. >> >> + } >> >> + } >> >> /* Always release dst as dst->__refcnt is guaranteed >> >> * to be taken before entering this function >> >> */ >> >> @@ -1685,11 +1703,18 @@ static void fib6_del_route(struct fib6_table *table, struct fib6_node *fn, >> >> } >> >> read_unlock(&net->ipv6.fib6_walker_lock); >> >> >> >> - /* If it was last route, expunge its radix tree node */ >> >> + /* If it was last route: >> >> + * 1. For root node, put back null_entry as how the table was created. >> >> + * 2. For other nodes, expunge its radix tree node. >> >> + */ >> >> if (!rcu_access_pointer(fn->leaf)) { >> >> - fn->fn_flags &= ~RTN_RTINFO; >> >> - net->ipv6.rt6_stats->fib_route_nodes--; >> >> - fn = fib6_repair_tree(net, table, fn); >> >> + if (fn->fn_flags & RTN_TL_ROOT) { >> >> + rcu_assign_pointer(fn->leaf, net->ipv6.ip6_null_entry); >> >> + } else { >> >> + fn->fn_flags &= ~RTN_RTINFO; >> >> + net->ipv6.rt6_stats->fib_route_nodes--; >> >> + fn = fib6_repair_tree(net, table, fn); >> >> + } >> >> } >> >> >> >> fib6_purge_rt(rt, fn, net); >> >> -- >> >> 2.16.0.rc0.223.g4a4ac83678-goog >> >>
============================= WARNING: suspicious RCU usage 4.15.0-rc5+ #171 Not tainted ----------------------------- net/ipv6/ip6_fib.c:1702 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 4 locks held by swapper/0/0: #0: ((&net->ipv6.ip6_fib_timer)){+.-.}, at: [<00000000d43f631b>] lockdep_copy_map include/linux/lockdep.h:178 [inline] #0: ((&net->ipv6.ip6_fib_timer)){+.-.}, at: [<00000000d43f631b>] call_timer_fn+0x1c6/0x820 kernel/time/timer.c:1310 #1: (&(&net->ipv6.fib6_gc_lock)->rlock){+.-.}, at: [<000000002ff9d65c>] spin_lock_bh include/linux/spinlock.h:315 [inline] #1: (&(&net->ipv6.fib6_gc_lock)->rlock){+.-.}, at: [<000000002ff9d65c>] fib6_run_gc+0x9d/0x3c0 net/ipv6/ip6_fib.c:2007 #2: (rcu_read_lock){....}, at: [<0000000091db762d>] __fib6_clean_all+0x0/0x3a0 net/ipv6/ip6_fib.c:1560 #3: (&(&tb->tb6_lock)->rlock){+.-.}, at: [<000000009e503581>] spin_lock_bh include/linux/spinlock.h:315 [inline] #3: (&(&tb->tb6_lock)->rlock){+.-.}, at: [<000000009e503581>] __fib6_clean_all+0x1d0/0x3a0 net/ipv6/ip6_fib.c:1948 stack backtrace: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc5+ #171 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4585 fib6_del+0xcaa/0x11b0 net/ipv6/ip6_fib.c:1701 fib6_clean_node+0x3aa/0x4f0 net/ipv6/ip6_fib.c:1892 fib6_walk_continue+0x46c/0x8a0 net/ipv6/ip6_fib.c:1815 fib6_walk+0x91/0xf0 net/ipv6/ip6_fib.c:1863 fib6_clean_tree+0x1e6/0x340 net/ipv6/ip6_fib.c:1933 __fib6_clean_all+0x1f4/0x3a0 net/ipv6/ip6_fib.c:1949 fib6_clean_all net/ipv6/ip6_fib.c:1960 [inline] fib6_run_gc+0x16b/0x3c0 net/ipv6/ip6_fib.c:2016 fib6_gc_timer_cb+0x20/0x30 net/ipv6/ip6_fib.c:2033 call_timer_fn+0x228/0x820 kernel/time/timer.c:1320 expire_timers kernel/time/timer.c:1357 [inline] __run_timers+0x7ee/0xb70 kernel/time/timer.c:1660 run_timer_softirq+0x4c/0xb0 kernel/time/timer.c:1686 __do_softirq+0x2d7/0xb85 kernel/softirq.c:285 invoke_softirq kernel/softirq.c:365 [inline] irq_exit+0x1cc/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:540 [inline] smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052 apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:904 </IRQ> Reported-by: syzbot <syzkaller@googlegroups.com> Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table") Signed-off-by: Wei Wang <weiwan@google.com> --- net/ipv6/ip6_fib.c | 45 +++++++++++++++++++++++++++++++++++---------- 1 file changed, 35 insertions(+), 10 deletions(-) diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index d11a5578e4f8..37cb4ad1ea29 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -640,6 +640,11 @@ static struct fib6_node *fib6_add_1(struct net *net, if (!(fn->fn_flags & RTN_RTINFO)) { RCU_INIT_POINTER(fn->leaf, NULL); rt6_release(leaf); + /* remove null_entry in the root node */ + } else if (fn->fn_flags & RTN_TL_ROOT && + rcu_access_pointer(fn->leaf) == + net->ipv6.ip6_null_entry) { + RCU_INIT_POINTER(fn->leaf, NULL); } return fn; @@ -1270,14 +1275,27 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt, return err; failure: - /* fn->leaf could be NULL if fn is an intermediate node and we - * failed to add the new route to it in both subtree creation - * failure and fib6_add_rt2node() failure case. - * In both cases, fib6_repair_tree() should be called to fix + /* fn->leaf could be NULL if: + * 1. fn is the root node in the table and we fail to add the default + * route to it. + * In this case, we put fn->leaf back to net->ipv6.ip6_null_entry as + * the way the table was created. + * 2. fn is an intermediate node and we failed to add the new + * route to it in both subtree creation failure and fib6_add_rt2node() + * failure case. + * In this case, fib6_repair_tree() should be called to fix * fn->leaf. */ - if (fn && !(fn->fn_flags & (RTN_RTINFO|RTN_ROOT))) - fib6_repair_tree(info->nl_net, table, fn); + if (fn) { + if (fn->fn_flags & RTN_TL_ROOT) { + if (!rcu_access_pointer(fn->leaf)) { + rcu_assign_pointer(fn->leaf, + info->nl_net->ipv6.ip6_null_entry); + } + } else if (!(fn->fn_flags & (RTN_RTINFO|RTN_ROOT))) { + fib6_repair_tree(info->nl_net, table, fn); + } + } /* Always release dst as dst->__refcnt is guaranteed * to be taken before entering this function */ @@ -1685,11 +1703,18 @@ static void fib6_del_route(struct fib6_table *table, struct fib6_node *fn, } read_unlock(&net->ipv6.fib6_walker_lock); - /* If it was last route, expunge its radix tree node */ + /* If it was last route: + * 1. For root node, put back null_entry as how the table was created. + * 2. For other nodes, expunge its radix tree node. + */ if (!rcu_access_pointer(fn->leaf)) { - fn->fn_flags &= ~RTN_RTINFO; - net->ipv6.rt6_stats->fib_route_nodes--; - fn = fib6_repair_tree(net, table, fn); + if (fn->fn_flags & RTN_TL_ROOT) { + rcu_assign_pointer(fn->leaf, net->ipv6.ip6_null_entry); + } else { + fn->fn_flags &= ~RTN_RTINFO; + net->ipv6.rt6_stats->fib_route_nodes--; + fn = fib6_repair_tree(net, table, fn); + } } fib6_purge_rt(rt, fn, net);
From: Wei Wang <weiwan@google.com> In the current code, when creating a new fib6 table, tb6_root.leaf gets initialized to net->ipv6.ip6_null_entry. If a default route is being added with rt->rt6i_metric = 0xffffffff, fib6_add() will add this route after net->ipv6.ip6_null_entry. As null_entry is shared, it could cause problem. In order to fix it, set fn->leaf to NULL before calling fib6_add_rt2node() when trying to add the first default route. And reset fn->leaf to null_entry when adding fails or when deleting the last default route. syzkaller reported the following issue which is fixed by this commit: