Message ID | 1567142596-25923-1-git-send-email-subashab@codeaurora.org |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Headers | show |
Series | [net] dev: Delay the free of the percpu refcount | expand |
On 8/30/19 7:23 AM, Subash Abhinov Kasiviswanathan wrote: > While running stress-ng on an ARM64 kernel, the following oops > was observedi - > > 44837.761523: <6> Unable to handle kernel paging request at > virtual address 0000004a88287000 > 44837.761651: <2> pc : in_dev_finish_destroy+0x4c/0xc8 > 44837.761654: <2> lr : in_dev_finish_destroy+0x2c/0xc8 > 44837.762393: <2> Call trace: > 44837.762398: <2> in_dev_finish_destroy+0x4c/0xc8 > 44837.762404: <2> in_dev_rcu_put+0x24/0x30 > 44837.762412: <2> rcu_nocb_kthread+0x43c/0x468 > 44837.762418: <2> kthread+0x118/0x128 > 44837.762424: <2> ret_from_fork+0x10/0x1c > > Prior to this, it appeared as if some of the inet6_dev allocations > were failing. From the memory dump, the last operation performed > was dev_put(), however the pcpu_refcnt was NULL while the > reg_state = NETREG_RELEASED. Effectively, the refcount memory was > freed in free_netdev() before the last reference was dropped. > > Fix this by freeing the memory after all references are dropped and > before the dev memory itself is freed. > > Fixes: 29b4433d991c ("net: percpu net_device refcount") > Cc: Sean Tranchetti <stranche@codeaurora.org> > Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> > --- > net/core/dev.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 49589ed..bce40d8 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -9128,6 +9128,9 @@ void netdev_freemem(struct net_device *dev) > { > char *addr = (char *)dev - dev->padded; > > + free_percpu(dev->pcpu_refcnt); > + dev->pcpu_refcnt = NULL; > + > kvfree(addr); > } > > @@ -9272,9 +9275,6 @@ void free_netdev(struct net_device *dev) > list_for_each_entry_safe(p, n, &dev->napi_list, dev_list) > netif_napi_del(p); > > - free_percpu(dev->pcpu_refcnt); > - dev->pcpu_refcnt = NULL; > - > /* Compatibility with error handling in drivers */ > if (dev->reg_state == NETREG_UNINITIALIZED) { > netdev_freemem(dev); > This looks bogus. Whatever layer tries to access dev refcnt after free_netdev() has been called is buggy. I would rather trap early and fix the root cause. Untested patch : diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index b5d28dadf964..8080f1305417 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3723,6 +3723,7 @@ void netdev_run_todo(void); */ static inline void dev_put(struct net_device *dev) { + BUG_ON(!dev->pcpu_refcnt); this_cpu_dec(*dev->pcpu_refcnt); } @@ -3734,6 +3735,7 @@ static inline void dev_put(struct net_device *dev) */ static inline void dev_hold(struct net_device *dev) { + BUG_ON(!dev->pcpu_refcnt); this_cpu_inc(*dev->pcpu_refcnt); }
> This looks bogus. > > Whatever layer tries to access dev refcnt after free_netdev() has been > called is buggy. > > I would rather trap early and fix the root cause. > > Untested patch : > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index b5d28dadf964..8080f1305417 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -3723,6 +3723,7 @@ void netdev_run_todo(void); > */ > static inline void dev_put(struct net_device *dev) > { > + BUG_ON(!dev->pcpu_refcnt); > this_cpu_dec(*dev->pcpu_refcnt); > } > > @@ -3734,6 +3735,7 @@ static inline void dev_put(struct net_device > *dev) > */ > static inline void dev_hold(struct net_device *dev) > { > + BUG_ON(!dev->pcpu_refcnt); > this_cpu_inc(*dev->pcpu_refcnt); > } Hello Eric I am seeing a similar crash with your patch as well. The NULL dev->pcpu_refcnt was caught by the BUG you added. 786.510217: <6> kernel BUG at include/linux/netdevice.h:3633! 786.510263: <2> pc : in_dev_finish_destroy+0xcc/0xd0 786.510267: <2> lr : in_dev_finish_destroy+0x2c/0xd0 786.511220: <2> Call trace: 786.511225: <2> in_dev_finish_destroy+0xcc/0xd0 786.511230: <2> in_dev_rcu_put+0x24/0x30 786.511237: <2> rcu_nocb_kthread+0x43c/0x468 786.511243: <2> kthread+0x118/0x128 786.511249: <2> ret_from_fork+0x10/0x1c This seems to be happening when there is an allocation failure in the IPv6 notifier callback only. I had added some additional debug to narrow down the refcount validity along the callers of the dev_put/dev_hold. refcnt valid below shows that the pointer dev->pcpu_refcnt is valid while refcnt null shows the case where dev->pcpu_refcnt is NULL. The last dev_put happens after free_netdev leading to the dev->pcpu_refcnt to be accessed when NULL. 309.908501: <6> dev_hold() ffffffe13c9df000 ip6_vti0 refcnt valid setup_net+0xa0/0x210 -> ops_init+0x88/0x110 309.908674: <6> dev_hold() ffffffe13c9df000 ip6_vti0 refcnt valid register_netdevice+0x29c/0x5b0 -> netdev_register_kobject+0xd8/0x150 309.908696: <6> dev_hold() ffffffe13c9df000 ip6_vti0 refcnt valid register_netdevice+0x29c/0x5b0 -> netdev_register_kobject+0x100/0x150 309.908717: <6> dev_hold() ffffffe13c9df000 ip6_vti0 refcnt valid vti6_init_net+0x188/0x1c0 -> register_netdev+0x28/0x40 309.908763: <6> neighbour: dev_hold() ffffffe13c9df000 ip6_vti0 refcnt valid inetdev_event+0x43c/0x528 -> inetdev_init+0x80/0x1e0 309.908835: <6> dev_hold() ffffffe13c9df000 ip6_vti0 refcnt valid raw_notifier_call_chain+0x3c/0x68 -> inetdev_event+0x43c/0x528 309.908882: <6> neighbour: dev_hold() ffffffe13c9df000 ip6_vti0 refcnt valid addrconf_notify+0x42c/0xe58 -> ipv6_add_dev+0xe4/0x588 309.908890: <6> IPv6: dev_hold() ffffffe13c9df000 ip6_vti0 refcnt valid raw_notifier_call_chain+0x3c/0x68 -> addrconf_notify+0x42c/0xe58 309.908906: <6> stress-ng-clone: page allocation failure: order:0, mode:0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null) 309.908910: <6> stress-ng-clone cpuset=foreground mems_allowed=0 309.908925: <2> Call trace: 309.908931: <2> dump_backtrace+0x0/0x158 309.908934: <2> show_stack+0x14/0x20 309.908939: <2> dump_stack+0xc4/0xfc 309.908944: <2> warn_alloc+0xf8/0x168 309.908947: <2> __alloc_pages_nodemask+0xff4/0x1018 309.908955: <2> new_slab+0x128/0x5b8 309.908958: <2> ___slab_alloc+0x4cc/0x5f8 309.908960: <2> kmem_cache_alloc_trace+0x2a4/0x2c0 309.908963: <2> ipv6_add_dev+0x220/0x588 309.908966: <2> addrconf_notify+0x42c/0xe58 309.908969: <2> raw_notifier_call_chain+0x3c/0x68 309.908972: <2> register_netdevice+0x3c4/0x5b0 309.908974: <2> register_netdev+0x28/0x40 309.908978: <2> vti6_init_net+0x188/0x1c0 309.908981: <2> ops_init+0x88/0x110 309.908983: <2> setup_net+0xa0/0x210 309.908986: <2> copy_net_ns+0xa8/0x130 309.908990: <2> create_new_namespaces+0x138/0x170 309.908993: <2> unshare_nsproxy_namespaces+0x68/0x90 309.908999: <2> ksys_unshare+0x17c/0x248 309.909001: <2> __arm64_sys_unshare+0x10/0x20 309.909004: <2> el0_svc_common+0xa0/0x158 309.909007: <2> el0_svc_handler+0x6c/0x88 309.909010: <2> el0_svc+0x8/0xc 309.909021: <6> neighbour: dev_put() ffffffe13c9df000 ip6_vti0 refcnt valid addrconf_notify+0x42c/0xe58 -> ipv6_add_dev+0x400/0x588 309.909030: <6> IPv6: dev_put() ffffffe13c9df000 ip6_vti0 refcnt valid raw_notifier_call_chain+0x3c/0x68 -> addrconf_notify+0x42c/0xe58 309.918097: <6> neighbour: dev_put() ffffffe13c9df000 ip6_vti0 refcnt valid raw_notifier_call_chain+0x3c/0x68 -> inetdev_event+0x290/0x528 309.918249: <6> dev_put() ffffffe13c9df000 ip6_vti0 refcnt valid register_netdevice+0x3f8/0x5b0 -> rollback_registered_many+0x488/0x658 309.918318: <6> dev_put() ffffffe13c9df000 ip6_vti0 refcnt valid net_rx_queue_update_kobjects+0x1ec/0x238 -> kobject_put+0x7c/0xc0 309.918405: <6> dev_put() ffffffe13c9df000 ip6_vti0 refcnt valid netdev_queue_update_kobjects+0x1dc/0x228 -> kobject_put+0x7c/0xc0 309.918759: <6> dev_put() ffffffe13c9df000 ip6_vti0 refcnt valid register_netdev+0x28/0x40 -> register_netdevice+0x3f8/0x5b0 309.918778: <6> free_netdev() ffffffe13c9df000 ip6_vti0 refcnt valid ops_init+0x88/0x110 -> vti6_init_net+0x1ac/0x1c0 309.980671: <6> dev_put() ffffffe13c9df000 ip6_vti0 refcnt null rcu_nocb_kthread+0x43c/0x468 -> in_dev_rcu_put+0x24/0x30 309.980838: <6> kernel BUG at include/linux/netdevice.h:3636!
On 2019-08-30 15:03, Subash Abhinov Kasiviswanathan wrote: >> This looks bogus. >> >> Whatever layer tries to access dev refcnt after free_netdev() has been >> called is buggy. >> >> I would rather trap early and fix the root cause. >> >> Untested patch : >> >> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h >> index b5d28dadf964..8080f1305417 100644 >> --- a/include/linux/netdevice.h >> +++ b/include/linux/netdevice.h >> @@ -3723,6 +3723,7 @@ void netdev_run_todo(void); >> */ >> static inline void dev_put(struct net_device *dev) >> { >> + BUG_ON(!dev->pcpu_refcnt); >> this_cpu_dec(*dev->pcpu_refcnt); >> } >> >> @@ -3734,6 +3735,7 @@ static inline void dev_put(struct net_device >> *dev) >> */ >> static inline void dev_hold(struct net_device *dev) >> { >> + BUG_ON(!dev->pcpu_refcnt); >> this_cpu_inc(*dev->pcpu_refcnt); >> } > > Hello Eric > > I am seeing a similar crash with your patch as well. > The NULL dev->pcpu_refcnt was caught by the BUG you added. > > 786.510217: <6> kernel BUG at include/linux/netdevice.h:3633! > 786.510263: <2> pc : in_dev_finish_destroy+0xcc/0xd0 > 786.510267: <2> lr : in_dev_finish_destroy+0x2c/0xd0 > 786.511220: <2> Call trace: > 786.511225: <2> in_dev_finish_destroy+0xcc/0xd0 > 786.511230: <2> in_dev_rcu_put+0x24/0x30 > 786.511237: <2> rcu_nocb_kthread+0x43c/0x468 > 786.511243: <2> kthread+0x118/0x128 > 786.511249: <2> ret_from_fork+0x10/0x1c > > This seems to be happening when there is an allocation failure > in the IPv6 notifier callback only. > > I had added some additional debug to narrow down the refcount > validity along the callers of the dev_put/dev_hold. > refcnt valid below shows that the pointer dev->pcpu_refcnt is valid > while refcnt null shows the case where dev->pcpu_refcnt is NULL. > The last dev_put happens after free_netdev leading to the > dev->pcpu_refcnt to be accessed when NULL. > > 309.908501: <6> dev_hold() ffffffe13c9df000 ip6_vti0 refcnt valid > setup_net+0xa0/0x210 -> ops_init+0x88/0x110 > 309.908674: <6> dev_hold() ffffffe13c9df000 ip6_vti0 refcnt valid > register_netdevice+0x29c/0x5b0 -> netdev_register_kobject+0xd8/0x150 > 309.908696: <6> dev_hold() ffffffe13c9df000 ip6_vti0 refcnt valid > register_netdevice+0x29c/0x5b0 -> netdev_register_kobject+0x100/0x150 > 309.908717: <6> dev_hold() ffffffe13c9df000 ip6_vti0 refcnt valid > vti6_init_net+0x188/0x1c0 -> register_netdev+0x28/0x40 > 309.908763: <6> neighbour: dev_hold() ffffffe13c9df000 ip6_vti0 > refcnt valid inetdev_event+0x43c/0x528 -> inetdev_init+0x80/0x1e0 > 309.908835: <6> dev_hold() ffffffe13c9df000 ip6_vti0 refcnt valid > raw_notifier_call_chain+0x3c/0x68 -> inetdev_event+0x43c/0x528 > 309.908882: <6> neighbour: dev_hold() ffffffe13c9df000 ip6_vti0 > refcnt valid addrconf_notify+0x42c/0xe58 -> ipv6_add_dev+0xe4/0x588 > 309.908890: <6> IPv6: dev_hold() ffffffe13c9df000 ip6_vti0 refcnt > valid raw_notifier_call_chain+0x3c/0x68 -> addrconf_notify+0x42c/0xe58 > 309.908906: <6> stress-ng-clone: page allocation failure: order:0, > mode:0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null) > 309.908910: <6> stress-ng-clone cpuset=foreground mems_allowed=0 > 309.908925: <2> Call trace: > 309.908931: <2> dump_backtrace+0x0/0x158 > 309.908934: <2> show_stack+0x14/0x20 > 309.908939: <2> dump_stack+0xc4/0xfc > 309.908944: <2> warn_alloc+0xf8/0x168 > 309.908947: <2> __alloc_pages_nodemask+0xff4/0x1018 > 309.908955: <2> new_slab+0x128/0x5b8 > 309.908958: <2> ___slab_alloc+0x4cc/0x5f8 > 309.908960: <2> kmem_cache_alloc_trace+0x2a4/0x2c0 > 309.908963: <2> ipv6_add_dev+0x220/0x588 > 309.908966: <2> addrconf_notify+0x42c/0xe58 > 309.908969: <2> raw_notifier_call_chain+0x3c/0x68 > 309.908972: <2> register_netdevice+0x3c4/0x5b0 > 309.908974: <2> register_netdev+0x28/0x40 > 309.908978: <2> vti6_init_net+0x188/0x1c0 > 309.908981: <2> ops_init+0x88/0x110 > 309.908983: <2> setup_net+0xa0/0x210 > 309.908986: <2> copy_net_ns+0xa8/0x130 > 309.908990: <2> create_new_namespaces+0x138/0x170 > 309.908993: <2> unshare_nsproxy_namespaces+0x68/0x90 > 309.908999: <2> ksys_unshare+0x17c/0x248 > 309.909001: <2> __arm64_sys_unshare+0x10/0x20 > 309.909004: <2> el0_svc_common+0xa0/0x158 > 309.909007: <2> el0_svc_handler+0x6c/0x88 > 309.909010: <2> el0_svc+0x8/0xc > 309.909021: <6> neighbour: dev_put() ffffffe13c9df000 ip6_vti0 > refcnt valid addrconf_notify+0x42c/0xe58 -> ipv6_add_dev+0x400/0x588 > 309.909030: <6> IPv6: dev_put() ffffffe13c9df000 ip6_vti0 refcnt > valid raw_notifier_call_chain+0x3c/0x68 -> addrconf_notify+0x42c/0xe58 > 309.918097: <6> neighbour: dev_put() ffffffe13c9df000 ip6_vti0 > refcnt valid raw_notifier_call_chain+0x3c/0x68 -> > inetdev_event+0x290/0x528 > 309.918249: <6> dev_put() ffffffe13c9df000 ip6_vti0 refcnt valid > register_netdevice+0x3f8/0x5b0 -> rollback_registered_many+0x488/0x658 > 309.918318: <6> dev_put() ffffffe13c9df000 ip6_vti0 refcnt valid > net_rx_queue_update_kobjects+0x1ec/0x238 -> kobject_put+0x7c/0xc0 > 309.918405: <6> dev_put() ffffffe13c9df000 ip6_vti0 refcnt valid > netdev_queue_update_kobjects+0x1dc/0x228 -> kobject_put+0x7c/0xc0 > 309.918759: <6> dev_put() ffffffe13c9df000 ip6_vti0 refcnt valid > register_netdev+0x28/0x40 -> register_netdevice+0x3f8/0x5b0 > 309.918778: <6> free_netdev() ffffffe13c9df000 ip6_vti0 refcnt valid > ops_init+0x88/0x110 -> vti6_init_net+0x1ac/0x1c0 > 309.980671: <6> dev_put() ffffffe13c9df000 ip6_vti0 refcnt null > rcu_nocb_kthread+0x43c/0x468 -> in_dev_rcu_put+0x24/0x30 > 309.980838: <6> kernel BUG at include/linux/netdevice.h:3636! Hello Eric Could you please review this? The main concern here is that dev_put() is getting called after free_netdev() causing a NULL dereference. 309.918778: <6> free_netdev() ffffffe13c9df000 ip6_vti0 refcnt valid ops_init+0x88/0x110 -> vti6_init_net+0x1ac/0x1c0 309.980671: <6> dev_put() ffffffe13c9df000 ip6_vti0 refcnt null rcu_nocb_kthread+0x43c/0x468 -> in_dev_rcu_put+0x24/0x30 I am able to reproduce this consistently and open to testing out patches.
diff --git a/net/core/dev.c b/net/core/dev.c index 49589ed..bce40d8 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -9128,6 +9128,9 @@ void netdev_freemem(struct net_device *dev) { char *addr = (char *)dev - dev->padded; + free_percpu(dev->pcpu_refcnt); + dev->pcpu_refcnt = NULL; + kvfree(addr); } @@ -9272,9 +9275,6 @@ void free_netdev(struct net_device *dev) list_for_each_entry_safe(p, n, &dev->napi_list, dev_list) netif_napi_del(p); - free_percpu(dev->pcpu_refcnt); - dev->pcpu_refcnt = NULL; - /* Compatibility with error handling in drivers */ if (dev->reg_state == NETREG_UNINITIALIZED) { netdev_freemem(dev);
While running stress-ng on an ARM64 kernel, the following oops was observedi - 44837.761523: <6> Unable to handle kernel paging request at virtual address 0000004a88287000 44837.761651: <2> pc : in_dev_finish_destroy+0x4c/0xc8 44837.761654: <2> lr : in_dev_finish_destroy+0x2c/0xc8 44837.762393: <2> Call trace: 44837.762398: <2> in_dev_finish_destroy+0x4c/0xc8 44837.762404: <2> in_dev_rcu_put+0x24/0x30 44837.762412: <2> rcu_nocb_kthread+0x43c/0x468 44837.762418: <2> kthread+0x118/0x128 44837.762424: <2> ret_from_fork+0x10/0x1c Prior to this, it appeared as if some of the inet6_dev allocations were failing. From the memory dump, the last operation performed was dev_put(), however the pcpu_refcnt was NULL while the reg_state = NETREG_RELEASED. Effectively, the refcount memory was freed in free_netdev() before the last reference was dropped. Fix this by freeing the memory after all references are dropped and before the dev memory itself is freed. Fixes: 29b4433d991c ("net: percpu net_device refcount") Cc: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> --- net/core/dev.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)