Message ID | 1449653896-5236-10-git-send-email-luis.henriques@canonical.com |
---|---|
State | New |
Headers | show |
On Wed, 2015-12-09 at 09:36 +0000, Luis Henriques wrote: > 3.16.7-ckt21 -stable review patch. If anyone has any objections, > please let me know. > > ------------------ > > From: Eric Dumazet <edumazet@google.com> > > commit 4ece9009774596ee3df0acba65a324b7ea79387c upstream. > > sit0 device allocates its percpu storage twice : > - One time in ipip6_tunnel_init() > - One time in ipip6_fb_tunnel_init() > > Thus we leak 48 bytes per possible cpu per network namespace > dismantle. > > ipip6_fb_tunnel_init() can be much simpler and does not > return an error, and should be called after register_netdev() [...] Doesn't this introduce a race condition when sit is a module? There seems to be nothing to prevent access to the partially initialised device after calling register_netdev(), if sit_init_net() is called during module loading rather than during namespace creation. Ben.
On Sat, Dec 12, 2015 at 04:18:26AM +0000, Ben Hutchings wrote: > On Wed, 2015-12-09 at 09:36 +0000, Luis Henriques wrote: > > 3.16.7-ckt21 -stable review patch. If anyone has any objections, > > please let me know. > > > > ------------------ > > > > From: Eric Dumazet <edumazet@google.com> > > > > commit 4ece9009774596ee3df0acba65a324b7ea79387c upstream. > > > > sit0 device allocates its percpu storage twice : > > - One time in ipip6_tunnel_init() > > - One time in ipip6_fb_tunnel_init() > > > > Thus we leak 48 bytes per possible cpu per network namespace > > dismantle. > > > > ipip6_fb_tunnel_init() can be much simpler and does not > > return an error, and should be called after register_netdev() > [...] > > Doesn't this introduce a race condition when sit is a module? There > seems to be nothing to prevent access to the partially initialised > device after calling register_netdev(), if sit_init_net() is called > during module loading rather than during namespace creation. > This seems to be an upstream issue, not specific to the 3.16.y-ckt stable kernel. If that is the case, I guess I'll just keep this patch and later apply the fix. Or do you think this race is really likely to be a worst problem than then issue the patch is trying to fix? Cheers, -- Luís > Ben. > > -- > Ben Hutchings > Knowledge is power. France is bacon.
On Sun, 2015-12-13 at 18:54 +0000, Luis Henriques wrote: > On Sat, Dec 12, 2015 at 04:18:26AM +0000, Ben Hutchings wrote: > > On Wed, 2015-12-09 at 09:36 +0000, Luis Henriques wrote: > > > 3.16.7-ckt21 -stable review patch. If anyone has any objections, > > > please let me know. > > > > > > ------------------ > > > > > > From: Eric Dumazet <edumazet@google.com> > > > > > > commit 4ece9009774596ee3df0acba65a324b7ea79387c upstream. > > > > > > sit0 device allocates its percpu storage twice : > > > - One time in ipip6_tunnel_init() > > > - One time in ipip6_fb_tunnel_init() > > > > > > Thus we leak 48 bytes per possible cpu per network namespace > > > dismantle. > > > > > > ipip6_fb_tunnel_init() can be much simpler and does not > > > return an error, and should be called after register_netdev() > > [...] > > > > Doesn't this introduce a race condition when sit is a module? There > > seems to be nothing to prevent access to the partially initialised > > device after calling register_netdev(), if sit_init_net() is called > > during module loading rather than during namespace creation. > > > > This seems to be an upstream issue, not specific to the 3.16.y-ckt > stable kernel. If that is the case, I guess I'll just keep this patch > and later apply the fix. Or do you think this race is really likely > to be a worst problem than then issue the patch is trying to fix? It seems worse than the problem being fixed. Ben.
On Sun, 2015-12-13 at 20:20 +0000, Ben Hutchings wrote: > On Sun, 2015-12-13 at 18:54 +0000, Luis Henriques wrote: > > On Sat, Dec 12, 2015 at 04:18:26AM +0000, Ben Hutchings wrote: > > > On Wed, 2015-12-09 at 09:36 +0000, Luis Henriques wrote: > > > > 3.16.7-ckt21 -stable review patch. If anyone has any objections, > > > > please let me know. > > > > > > > > ------------------ > > > > > > > > From: Eric Dumazet <edumazet@google.com> > > > > > > > > commit 4ece9009774596ee3df0acba65a324b7ea79387c upstream. > > > > > > > > sit0 device allocates its percpu storage twice : > > > > - One time in ipip6_tunnel_init() > > > > - One time in ipip6_fb_tunnel_init() > > > > > > > > Thus we leak 48 bytes per possible cpu per network namespace > > > > dismantle. > > > > > > > > ipip6_fb_tunnel_init() can be much simpler and does not > > > > return an error, and should be called after register_netdev() > > > [...] > > > > > > Doesn't this introduce a race condition when sit is a module? There > > > seems to be nothing to prevent access to the partially initialised > > > device after calling register_netdev(), if sit_init_net() is called > > > during module loading rather than during namespace creation. > > > > > > > This seems to be an upstream issue, not specific to the 3.16.y-ckt > > stable kernel. If that is the case, I guess I'll just keep this patch > > and later apply the fix. Or do you think this race is really likely > > to be a worst problem than then issue the patch is trying to fix? > > It seems worse than the problem being fixed. 1) Sorry Ben, I do not understand the problem you mention. What is a partially initialized device exactly ? 2) I have no idea why this patch is even backported to 3.16, since it is fixing a problem added in 3.18 : # git describe --contains ebe084aafb7e v3.18-rc5~22^2~42^2~1 If your 3.16 kernel survives this loop without consuming memory like crazy, then the backport is not needed. modprobe sit while : do ip netns add ns1 ip netns delete ns1 done
On Sun, 2015-12-13 at 12:43 -0800, Eric Dumazet wrote: > On Sun, 2015-12-13 at 20:20 +0000, Ben Hutchings wrote: > > On Sun, 2015-12-13 at 18:54 +0000, Luis Henriques wrote: > > > On Sat, Dec 12, 2015 at 04:18:26AM +0000, Ben Hutchings wrote: > > > > On Wed, 2015-12-09 at 09:36 +0000, Luis Henriques wrote: > > > > > 3.16.7-ckt21 -stable review patch. If anyone has any objections, > > > > > please let me know. > > > > > > > > > > ------------------ > > > > > > > > > > From: Eric Dumazet <edumazet@google.com> > > > > > > > > > > commit 4ece9009774596ee3df0acba65a324b7ea79387c upstream. > > > > > > > > > > sit0 device allocates its percpu storage twice : > > > > > - One time in ipip6_tunnel_init() > > > > > - One time in ipip6_fb_tunnel_init() > > > > > > > > > > Thus we leak 48 bytes per possible cpu per network namespace > > > > > dismantle. > > > > > > > > > > ipip6_fb_tunnel_init() can be much simpler and does not > > > > > return an error, and should be called after register_netdev() > > > > [...] > > > > > > > > Doesn't this introduce a race condition when sit is a module? There > > > > seems to be nothing to prevent access to the partially initialised > > > > device after calling register_netdev(), if sit_init_net() is called > > > > during module loading rather than during namespace creation. > > > > > > > > > > This seems to be an upstream issue, not specific to the 3.16.y-ckt > > > stable kernel. If that is the case, I guess I'll just keep this patch > > > and later apply the fix. Or do you think this race is really likely > > > to be a worst problem than then issue the patch is trying to fix? > > > > It seems worse than the problem being fixed. > > 1) Sorry Ben, I do not understand the problem you mention. > What is a partially initialized device exactly ? A tunnel device which is registered but hasn't had its private structure fully initialised yet. > 2) I have no idea why this patch is even backported to 3.16, > since it is fixing a problem added in 3.18 : > > # git describe --contains ebe084aafb7e > v3.18-rc5~22^2~42^2~1 > > If your 3.16 kernel survives this loop without consuming memory like > crazy, then the backport is not needed. > > modprobe sit > while : > do > ip netns add ns1 > ip netns delete ns1 > done I can't detect a memory leak when doing this. Ben.
On Sun, Dec 13, 2015 at 12:43:10PM -0800, Eric Dumazet wrote: > On Sun, 2015-12-13 at 20:20 +0000, Ben Hutchings wrote: > > On Sun, 2015-12-13 at 18:54 +0000, Luis Henriques wrote: > > > On Sat, Dec 12, 2015 at 04:18:26AM +0000, Ben Hutchings wrote: > > > > On Wed, 2015-12-09 at 09:36 +0000, Luis Henriques wrote: > > > > > 3.16.7-ckt21 -stable review patch. If anyone has any objections, > > > > > please let me know. > > > > > > > > > > ------------------ > > > > > > > > > > From: Eric Dumazet <edumazet@google.com> > > > > > > > > > > commit 4ece9009774596ee3df0acba65a324b7ea79387c upstream. > > > > > > > > > > sit0 device allocates its percpu storage twice : > > > > > - One time in ipip6_tunnel_init() > > > > > - One time in ipip6_fb_tunnel_init() > > > > > > > > > > Thus we leak 48 bytes per possible cpu per network namespace > > > > > dismantle. > > > > > > > > > > ipip6_fb_tunnel_init() can be much simpler and does not > > > > > return an error, and should be called after register_netdev() > > > > [...] > > > > > > > > Doesn't this introduce a race condition when sit is a module? There > > > > seems to be nothing to prevent access to the partially initialised > > > > device after calling register_netdev(), if sit_init_net() is called > > > > during module loading rather than during namespace creation. > > > > > > > > > > This seems to be an upstream issue, not specific to the 3.16.y-ckt > > > stable kernel. If that is the case, I guess I'll just keep this patch > > > and later apply the fix. Or do you think this race is really likely > > > to be a worst problem than then issue the patch is trying to fix? > > > > It seems worse than the problem being fixed. > > 1) Sorry Ben, I do not understand the problem you mention. > What is a partially initialized device exactly ? > > 2) I have no idea why this patch is even backported to 3.16, > since it is fixing a problem added in 3.18 : > > # git describe --contains ebe084aafb7e > v3.18-rc5~22^2~42^2~1 > Right, but this commit has been included in several stable git trees (I can see it at least in 3.14, 3.13 and 3.10, but it's probably in some more). Cheers, -- Luís > If your 3.16 kernel survives this loop without consuming memory like > crazy, then the backport is not needed. > > modprobe sit > while : > do > ip netns add ns1 > ip netns delete ns1 > done > > > >
On Sun, 2015-12-13 at 21:22 +0000, Ben Hutchings wrote: > On Sun, 2015-12-13 at 12:43 -0800, Eric Dumazet wrote: > > 1) Sorry Ben, I do not understand the problem you mention. > > What is a partially initialized device exactly ? > > A tunnel device which is registered but hasn't had its private > structure fully initialised yet. And you see this happening after my patch ? I am blind. I am referring to current linux kernel, not to a backport to pre 3.18 kernels, that was not considered when I wrote this patch. By the time ipip6_fb_tunnel_init() is called, dev->tstats had been already allocated in ipip6_tunnel_init(), so what is missing ? Thanks.
On Sun, 2015-12-13 at 13:44 -0800, Eric Dumazet wrote: > On Sun, 2015-12-13 at 21:22 +0000, Ben Hutchings wrote: > > On Sun, 2015-12-13 at 12:43 -0800, Eric Dumazet wrote: > > > > 1) Sorry Ben, I do not understand the problem you mention. > > > What is a partially initialized device exactly ? > > > > A tunnel device which is registered but hasn't had its private > > structure fully initialised yet. > > And you see this happening after my patch ? I am blind. > > I am referring to current linux kernel, not to a backport to pre 3.18 > kernels, that was not considered when I wrote this patch. > > By the time ipip6_fb_tunnel_init() is called, dev->tstats had been > already allocated in ipip6_tunnel_init(), so what is missing ? You moved this initialisation below the registration: > ipip6_tunnel_clone_6rd(sitn->fb_tunnel_dev, sitn); > ipip6_fb_tunnel_init(sitn->fb_tunnel_dev); Ben.
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Sun, 13 Dec 2015 12:43:10 -0800 > 2) I have no idea why this patch is even backported to 3.16, > since it is fixing a problem added in 3.18 : Because someone on the path to some of the -stable trees aren't even checking the Fixes: tag in the patches they are backporting. And that's really bad...
On Sun, Dec 13, 2015 at 05:14:06PM -0500, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Sun, 13 Dec 2015 12:43:10 -0800 > > > 2) I have no idea why this patch is even backported to 3.16, > > since it is fixing a problem added in 3.18 : > > Because someone on the path to some of the -stable trees aren't even > checking the Fixes: tag in the patches they are backporting. > > And that's really bad... As I already said before, this patch has been included in 3.16 because the SHA1 in the "Fixes:" *is* in 3.16 (as it is in other even older stable trees such as 3.12 and 3.14). Cheers, -- Luís
On Sun, 2015-12-13 at 21:49 +0000, Ben Hutchings wrote: > On Sun, 2015-12-13 at 13:44 -0800, Eric Dumazet wrote: > > On Sun, 2015-12-13 at 21:22 +0000, Ben Hutchings wrote: > > > On Sun, 2015-12-13 at 12:43 -0800, Eric Dumazet wrote: > > > > > > 1) Sorry Ben, I do not understand the problem you mention. > > > > What is a partially initialized device exactly ? > > > > > > A tunnel device which is registered but hasn't had its private > > > structure fully initialised yet. > > > > And you see this happening after my patch ? I am blind. > > > > I am referring to current linux kernel, not to a backport to pre 3.18 > > kernels, that was not considered when I wrote this patch. > > > > By the time ipip6_fb_tunnel_init() is called, dev->tstats had been > > already allocated in ipip6_tunnel_init(), so what is missing ? > > You moved this initialisation below the registration: > > > ipip6_tunnel_clone_6rd(sitn->fb_tunnel_dev, sitn); > > ipip6_fb_tunnel_init(sitn->fb_tunnel_dev); Okay, so what is the exact problem you are seeing Ben ? ipip6_tunnel_clone_6rd() looks to not contain a fatal race or mem leak. Note that ipip6_tunnel_clone_6rd() can be called later from ioctl() path. ipip6_fb_tunnel_init() must be done once device is ready, as it publishes state for packet processing. rcu_assign_pointer(sitn->tunnels_wc[0], tunnel); Looks like a rather correct way to register a device : init all fields, then publish the RCU protected pointer for packets to catch it. Really, I do not see a problem, please elaborate.
On Sun, 2015-12-13 at 14:45 -0800, Eric Dumazet wrote: > On Sun, 2015-12-13 at 21:49 +0000, Ben Hutchings wrote: > > On Sun, 2015-12-13 at 13:44 -0800, Eric Dumazet wrote: > > > On Sun, 2015-12-13 at 21:22 +0000, Ben Hutchings wrote: > > > > On Sun, 2015-12-13 at 12:43 -0800, Eric Dumazet wrote: > > > > > > > > 1) Sorry Ben, I do not understand the problem you mention. > > > > > What is a partially initialized device exactly ? > > > > > > > > A tunnel device which is registered but hasn't had its private > > > > structure fully initialised yet. > > > > > > And you see this happening after my patch ? I am blind. > > > > > > I am referring to current linux kernel, not to a backport to pre 3.18 > > > kernels, that was not considered when I wrote this patch. > > > > > > By the time ipip6_fb_tunnel_init() is called, dev->tstats had been > > > already allocated in ipip6_tunnel_init(), so what is missing ? > > > > You moved this initialisation below the registration: > > > > > ipip6_tunnel_clone_6rd(sitn->fb_tunnel_dev, sitn); > > > ipip6_fb_tunnel_init(sitn->fb_tunnel_dev); > > Okay, so what is the exact problem you are seeing Ben ? > > ipip6_tunnel_clone_6rd() looks to not contain a fatal race or mem leak. > > Note that ipip6_tunnel_clone_6rd() can be called later from ioctl() > path. That holds the rtnl_lock, though. > ipip6_fb_tunnel_init() must be done once device is ready, as it > publishes state for packet processing. OK. > rcu_assign_pointer(sitn->tunnels_wc[0], tunnel); > > Looks like a rather correct way to register a device : init all fields, > then publish the RCU protected pointer for packets to catch it. > > Really, I do not see a problem, please elaborate. Maybe there isn't one in this case. Ben.
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c index 051e9c508933..b5bdd2aeb2f8 100644 --- a/net/ipv6/sit.c +++ b/net/ipv6/sit.c @@ -1374,34 +1374,20 @@ static int ipip6_tunnel_init(struct net_device *dev) return 0; } -static int __net_init ipip6_fb_tunnel_init(struct net_device *dev) +static void __net_init ipip6_fb_tunnel_init(struct net_device *dev) { struct ip_tunnel *tunnel = netdev_priv(dev); struct iphdr *iph = &tunnel->parms.iph; struct net *net = dev_net(dev); struct sit_net *sitn = net_generic(net, sit_net_id); - tunnel->dev = dev; - tunnel->net = dev_net(dev); - iph->version = 4; iph->protocol = IPPROTO_IPV6; iph->ihl = 5; iph->ttl = 64; - dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats); - if (!dev->tstats) - return -ENOMEM; - - tunnel->dst_cache = alloc_percpu(struct ip_tunnel_dst); - if (!tunnel->dst_cache) { - free_percpu(dev->tstats); - return -ENOMEM; - } - dev_hold(dev); rcu_assign_pointer(sitn->tunnels_wc[0], tunnel); - return 0; } static int ipip6_validate(struct nlattr *tb[], struct nlattr *data[]) @@ -1738,23 +1724,18 @@ static int __net_init sit_init_net(struct net *net) */ sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL; - err = ipip6_fb_tunnel_init(sitn->fb_tunnel_dev); - if (err) - goto err_dev_free; - - ipip6_tunnel_clone_6rd(sitn->fb_tunnel_dev, sitn); - if ((err = register_netdev(sitn->fb_tunnel_dev))) goto err_reg_dev; + ipip6_tunnel_clone_6rd(sitn->fb_tunnel_dev, sitn); + ipip6_fb_tunnel_init(sitn->fb_tunnel_dev); + t = netdev_priv(sitn->fb_tunnel_dev); strcpy(t->parms.name, sitn->fb_tunnel_dev->name); return 0; err_reg_dev: - dev_put(sitn->fb_tunnel_dev); -err_dev_free: ipip6_dev_free(sitn->fb_tunnel_dev); err_alloc_dev: return err;