Message ID | 20090226084924.16cb3e08@nehalam |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
Stephen Hemminger <shemminger@vyatta.com> writes: > What about something like this: > > Subject: [PATCH] Avoid race between network down and sysfs As far as solutions go. That looks like the easiest correct solution. So. Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Will -ERESTARTSYS trigger the in kernel restart logic in this case? There are a lot more cases to cover, and I don't I like it long term. Spinning waiting for rtnl_lock feels wrong. Plus it does not help with discovering the problem in new sysfs, sysctl, or proc files. It has the major advantage that we can fix things now. > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> > > --- a/net/core/net-sysfs.c 2009-02-26 08:36:18.000000000 -0800 > +++ b/net/core/net-sysfs.c 2009-02-26 08:37:51.000000000 -0800 > @@ -77,7 +77,9 @@ static ssize_t netdev_store(struct devic > if (endp == buf) > goto err; > > - rtnl_lock(); > + if (!rtnl_trylock()) > + return -ERESTARTSYS; > + > if (dev_isalive(net)) { > if ((ret = (*set)(net, new)) == 0) > ret = len; -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 26 Feb 2009 11:01:41 -0800 ebiederm@xmission.com (Eric W. Biederman) wrote: > Stephen Hemminger <shemminger@vyatta.com> writes: > > > What about something like this: > > > > Subject: [PATCH] Avoid race between network down and sysfs > > As far as solutions go. That looks like the easiest correct solution. > So. > Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> > > > Will -ERESTARTSYS trigger the in kernel restart logic in this case? > > There are a lot more cases to cover, and I don't I like it long > term. Spinning waiting for rtnl_lock feels wrong. Plus it does > not help with discovering the problem in new sysfs, sysctl, or > proc files. > > It has the major advantage that we can fix things now. > I haven't tested it, but it should restart in VFS. Spinning is not that big a deal, and it also handles the case where the name changed or some other race occurred during processing. When syscall is re-entered, the name will no longer be found and -ENOENT will be returned. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Feb 26, 2009 at 08:49:24AM -0800, Stephen Hemminger wrote: > > - rtnl_lock(); > + if (!rtnl_trylock()) > + return -ERESTARTSYS; This is going to spin instead of sleep on contention, is that intended? Cheers,
On Fri, 27 Feb 2009 08:59:45 +0800 Herbert Xu <herbert@gondor.apana.org.au> wrote: > On Thu, Feb 26, 2009 at 08:49:24AM -0800, Stephen Hemminger wrote: > > > > - rtnl_lock(); > > + if (!rtnl_trylock()) > > + return -ERESTARTSYS; > > This is going to spin instead of sleep on contention, is that > intended? > > Cheers, It walks all the way back out to VFS, which is what you have to do since the sysctl may move. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Stephen Hemminger wrote: > Subject: [PATCH] Avoid race between network down and sysfs > > > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> > > --- a/net/core/net-sysfs.c 2009-02-26 08:36:18.000000000 -0800 > +++ b/net/core/net-sysfs.c 2009-02-26 08:37:51.000000000 -0800 > @@ -77,7 +77,9 @@ static ssize_t netdev_store(struct devic > if (endp == buf) > goto err; > > - rtnl_lock(); > + if (!rtnl_trylock()) > + return -ERESTARTSYS; > + > if (dev_isalive(net)) { > if ((ret = (*set)(net, new)) == 0) > ret = len; I can test this to see if it fixes my problem. Are the above lines the entirety of the patch? I think I'll add a printk if we do the ERESTARTSYS to make sure that I hit the race (but still recover). Thanks, Ben
On Fri, 27 Feb 2009 10:26:43 -0800 Ben Greear <greearb@candelatech.com> wrote: > Stephen Hemminger wrote: > > > Subject: [PATCH] Avoid race between network down and sysfs > > > > > > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> > > > > --- a/net/core/net-sysfs.c 2009-02-26 08:36:18.000000000 -0800 > > +++ b/net/core/net-sysfs.c 2009-02-26 08:37:51.000000000 -0800 > > @@ -77,7 +77,9 @@ static ssize_t netdev_store(struct devic > > if (endp == buf) > > goto err; > > > > - rtnl_lock(); > > + if (!rtnl_trylock()) > > + return -ERESTARTSYS; > > + > > if (dev_isalive(net)) { > > if ((ret = (*set)(net, new)) == 0) > > ret = len; > > I can test this to see if it fixes my problem. Are the above lines the > entirety of the patch? yes -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Stephen Hemminger wrote: > On Fri, 27 Feb 2009 10:26:43 -0800 > Ben Greear <greearb@candelatech.com> wrote: > >> Stephen Hemminger wrote: >> >>> Subject: [PATCH] Avoid race between network down and sysfs >>> >>> >>> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> >>> >>> --- a/net/core/net-sysfs.c 2009-02-26 08:36:18.000000000 -0800 >>> +++ b/net/core/net-sysfs.c 2009-02-26 08:37:51.000000000 -0800 >>> @@ -77,7 +77,9 @@ static ssize_t netdev_store(struct devic >>> if (endp == buf) >>> goto err; >>> >>> - rtnl_lock(); >>> + if (!rtnl_trylock()) >>> + return -ERESTARTSYS; >>> + >>> if (dev_isalive(net)) { >>> if ((ret = (*set)(net, new)) == 0) >>> ret = len; >> I can test this to see if it fixes my problem. Are the above lines the >> entirety of the patch? > > yes We should be able to avoid the restart looping in most cases, we only need to do this while unregistration is in progress. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Patrick McHardy wrote: > Stephen Hemminger wrote: >>>> @@ -77,7 +77,9 @@ static ssize_t netdev_store(struct devic >>>> if (endp == buf) >>>> goto err; >>>> >>>> - rtnl_lock(); >>>> + if (!rtnl_trylock()) >>>> + return -ERESTARTSYS; >>>> + >>>> if (dev_isalive(net)) { >>>> if ((ret = (*set)(net, new)) == 0) >>>> ret = len; >>> I can test this to see if it fixes my problem. Are the above lines the >>> entirety of the patch? >> >> yes > > We should be able to avoid the restart looping in most cases, > we only need to do this while unregistration is in progress. It doesn't seem to work. The idea was something like this: static int addrconf_fixup_forwarding(struct ctl_table *table, int *p, int old) { struct inet6_dev *idev; struct net *net; net = (struct net *)table->extra2; if (p == &net->ipv6.devconf_dflt->forwarding) return 0; /* Unregistration in progress */ idev = table->extra1; if (idev->cnf.sysctl == NULL) return -ERESTARTSYS; rtnl_lock(); ... but the process might have entered this function while the rtnl is already held, but the unregistration hasn't been triggered yet. So it would still deadlock. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Stephen Hemminger wrote: > On Fri, 27 Feb 2009 10:26:43 -0800 > Ben Greear <greearb@candelatech.com> wrote: > >> Stephen Hemminger wrote: >> >>> Subject: [PATCH] Avoid race between network down and sysfs >>> >>> >>> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> >>> >>> --- a/net/core/net-sysfs.c 2009-02-26 08:36:18.000000000 -0800 >>> +++ b/net/core/net-sysfs.c 2009-02-26 08:37:51.000000000 -0800 >>> @@ -77,7 +77,9 @@ static ssize_t netdev_store(struct devic >>> if (endp == buf) >>> goto err; >>> >>> - rtnl_lock(); >>> + if (!rtnl_trylock()) >>> + return -ERESTARTSYS; >>> + >>> if (dev_isalive(net)) { >>> if ((ret = (*set)(net, new)) == 0) >>> ret = len; >> I can test this to see if it fixes my problem. Are the above lines the >> entirety of the patch? > > yes With both of Stephen's patches included in the latest -rc6 source, I re-ran the test and it seems to be working (I added printks so that I would know the new code was being exercised) I had 2000 or so mac-vlans configured, with 10 of them being re-configured concurrently, while also deleting groups of 20-100 mac-vlans in my test. This was locking up reliably before, and now it seems to be working fine. Here's the kernel log showing the ERESTARTSYS in action. I don't have an easy way to check to see if the VFS (or whatever) retried the call properly, but will let you all know if I see any indication that isn't working. I only saw the ipv6 fixup in my logs, but maybe my test case just doesn't hit the other... Mar 2 13:37:41 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock. Mar 2 13:37:41 simech-1 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready Mar 2 13:37:42 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock. Mar 2 13:38:13 simech-1 kernel:last message repeated 42 times Mar 2 13:39:20 simech-1 kernel:last message repeated 71 times Mar 2 13:40:13 simech-1 kernel:last message repeated 68 times Mar 2 13:40:13 simech-1 dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 8 Mar 2 13:40:13 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock. Mar 2 13:40:21 simech-1 kernel:last message repeated 11 times Mar 2 13:40:21 simech-1 dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 13 Mar 2 13:40:21 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock. Mar 2 13:40:28 simech-1 kernel:last message repeated 16 times Mar 2 13:40:28 simech-1 kernel: __ratelimit: 3352 callbacks suppressed Mar 2 13:40:28 simech-1 kernel: Neighbour table overflow. Mar 2 13:40:28 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock. Mar 2 13:40:34 simech-1 kernel:last message repeated 8 times Mar 2 13:40:34 simech-1 dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 20 Mar 2 13:40:34 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock. Mar 2 13:40:54 simech-1 kernel:last message repeated 42 times Mar 2 13:40:54 simech-1 dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 7 Mar 2 13:40:54 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock. Mar 2 13:41:01 simech-1 kernel:last message repeated 13 times Mar 2 13:41:01 simech-1 dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 13 Mar 2 13:41:01 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock. Mar 2 13:41:14 simech-1 kernel:last message repeated 26 times Mar 2 13:41:14 simech-1 dhclient: No DHCPOFFERS received. Mar 2 13:41:14 simech-1 dhclient: No working leases in persistent database - sleeping. Mar 2 13:41:15 simech-1 dhclient: receive_packet failed on eth1: Network is down Mar 2 13:41:16 simech-1 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready Mar 2 13:41:17 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock. Mar 2 13:41:24 simech-1 kernel:last message repeated 16 times Mar 2 13:41:24 simech-1 kernel: Neighbour table overflow. Mar 2 13:41:25 simech-1 kernel:last message repeated 8 times Mar 2 13:41:25 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock. Mar 2 13:41:30 simech-1 kernel:last message repeated 11 times Mar 2 13:41:30 simech-1 kernel: __ratelimit: 7059 callbacks suppressed Mar 2 13:41:30 simech-1 kernel: Neighbour table overflow. Mar 2 13:41:30 simech-1 kernel:last message repeated 9 times Mar 2 13:41:30 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock. Mar 2 13:41:35 simech-1 kernel:last message repeated 10 times Mar 2 13:41:35 simech-1 kernel: __ratelimit: 7973 callbacks suppressed Mar 2 13:41:35 simech-1 kernel: Neighbour table overflow. Mar 2 13:41:36 simech-1 kernel:last message repeated 9 times Mar 2 13:41:36 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock. Mar 2 13:41:40 simech-1 kernel:last message repeated 7 times Mar 2 13:41:40 simech-1 kernel: __ratelimit: 10890 callbacks suppressed Mar 2 13:41:40 simech-1 kernel: Neighbour table overflow. Mar 2 13:41:40 simech-1 kernel:last message repeated 9 times Mar 2 13:41:40 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock. Mar 2 13:41:46 simech-1 kernel:last message repeated 13 times Mar 2 13:41:46 simech-1 kernel: __ratelimit: 6571 callbacks suppressed Mar 2 13:41:46 simech-1 kernel: Neighbour table overflow. Mar 2 13:41:46 simech-1 kernel:last message repeated 9 times Mar 2 13:41:46 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock. Mar 2 13:41:51 simech-1 kernel:last message repeated 7 times Mar 2 13:41:51 simech-1 kernel: __ratelimit: 15491 callbacks suppressed Mar 2 13:41:51 simech-1 kernel: Neighbour table overflow. Mar 2 13:41:51 simech-1 kernel:last message repeated 9 times Mar 2 13:41:51 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock. Mar 2 13:41:56 simech-1 kernel:last message repeated 9 times Mar 2 13:41:56 simech-1 kernel: __ratelimit: 9658 callbacks suppressed Mar 2 13:41:56 simech-1 kernel: Neighbour table overflow. Mar 2 13:41:56 simech-1 kernel:last message repeated 9 times Mar 2 13:41:56 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock. Mar 2 13:42:01 simech-1 kernel:last message repeated 10 times Mar 2 13:42:01 simech-1 kernel: __ratelimit: 12678 callbacks suppressed Mar 2 13:42:01 simech-1 kernel: Neighbour table overflow. Mar 2 13:42:01 simech-1 kernel:last message repeated 9 times Mar 2 13:42:01 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock. Mar 2 13:42:06 simech-1 kernel:last message repeated 7 times Mar 2 13:42:06 simech-1 kernel: __ratelimit: 13401 callbacks suppressed Mar 2 13:42:06 simech-1 kernel: Neighbour table overflow. Mar 2 13:42:06 simech-1 kernel:last message repeated 9 times Mar 2 13:42:06 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock. Mar 2 13:42:11 simech-1 kernel:last message repeated 10 times Mar 2 13:42:11 simech-1 kernel: __ratelimit: 11875 callbacks suppressed Mar 2 13:42:11 simech-1 kernel: Neighbour table overflow. Mar 2 13:42:11 simech-1 kernel:last message repeated 9 times Mar 2 13:42:11 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock. Mar 2 13:42:16 simech-1 kernel:last message repeated 9 times Mar 2 13:42:16 simech-1 kernel: __ratelimit: 9362 callbacks suppressed Mar 2 13:42:16 simech-1 kernel: Neighbour table overflow. Mar 2 13:42:17 simech-1 kernel:last message repeated 9 times Mar 2 13:42:17 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock. Thanks, Ben
Ben Greear wrote: > With both of Stephen's patches included in the latest -rc6 source, > I re-ran the test and it seems to be working (I added printks so > that I would know the new code was being exercise > > I had 2000 or so mac-vlans configured, with 10 of them being > re-configured concurrently, while also deleting groups of 20-100 > mac-vlans in my test. This was locking up reliably before, > and now it seems to be working fine. > > Here's the kernel log showing the ERESTARTSYS in action. I don't > have an easy way to check to see if the VFS (or whatever) retried > the call properly, but will let you all know if I see any indication > that isn't working. > > I only saw the ipv6 fixup in my logs, but maybe my test case just > doesn't hit the other... This looks like its working fine. Despite the non-desirable active spinning, this seems like the best fix (actually much simpler than I expected to be possible) at this time. If we just could avoid the spinning when unnecessary, it would be perfect :) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Patrick McHardy <kaber@trash.net> Date: Mon, 02 Mar 2009 23:20:49 +0100 > Ben Greear wrote: > > With both of Stephen's patches included in the latest -rc6 source, > > I re-ran the test and it seems to be working (I added printks so > > that I would know the new code was being exercise > > I had 2000 or so mac-vlans configured, with 10 of them being > > re-configured concurrently, while also deleting groups of 20-100 > > mac-vlans in my test. This was locking up reliably before, > > and now it seems to be working fine. > > Here's the kernel log showing the ERESTARTSYS in action. I don't > > have an easy way to check to see if the VFS (or whatever) retried > > the call properly, but will let you all know if I see any indication > > that isn't working. > > I only saw the ipv6 fixup in my logs, but maybe my test case just > > doesn't hit the other... > > This looks like its working fine. Despite the non-desirable active > spinning, this seems like the best fix (actually much simpler than > I expected to be possible) at this time. If we just could avoid > the spinning when unnecessary, it would be perfect :) Could you give that "not actually in-progress" detection a shot? I don't like the spinning either. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Miller wrote: > From: Patrick McHardy <kaber@trash.net> > Date: Mon, 02 Mar 2009 23:20:49 +0100 > >> This looks like its working fine. Despite the non-desirable active >> spinning, this seems like the best fix (actually much simpler than >> I expected to be possible) at this time. If we just could avoid >> the spinning when unnecessary, it would be perfect :) > > Could you give that "not actually in-progress" detection a shot? > > I don't like the spinning either. I tried this morning, the problem is that its always the sysctl handler which will run into the deadlock first, but there is no reliable indication to avoid it other than that the RTNL is already held. The problem is that the sysctl interface puts the process holding the RTNL to sleep and allows a process requiring it to run. Any different synchronization attempt will have the same problem, it seems you simply can't hold any locks while unregistering sysctls. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Patrick McHardy <kaber@trash.net> Date: Tue, 03 Mar 2009 00:03:00 +0100 > David Miller wrote: > > From: Patrick McHardy <kaber@trash.net> > > Date: Mon, 02 Mar 2009 23:20:49 +0100 > > > >> This looks like its working fine. Despite the non-desirable active > >> spinning, this seems like the best fix (actually much simpler than > >> I expected to be possible) at this time. If we just could avoid > >> the spinning when unnecessary, it would be perfect :) > > Could you give that "not actually in-progress" detection a shot? > > I don't like the spinning either. > > I tried this morning, the problem is that its always the sysctl > handler which will run into the deadlock first, but there is no > reliable indication to avoid it other than that the RTNL is > already held. The problem is that the sysctl interface puts the > process holding the RTNL to sleep and allows a process requiring > it to run. Any different synchronization attempt will have the > same problem, it seems you simply can't hold any locks while > unregistering sysctls. Ok, I applied Stephen's patches then... -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Miller <davem@davemloft.net> writes: > From: Patrick McHardy <kaber@trash.net> > Date: Tue, 03 Mar 2009 00:03:00 +0100 > >> David Miller wrote: >> > From: Patrick McHardy <kaber@trash.net> >> > Date: Mon, 02 Mar 2009 23:20:49 +0100 >> > >> >> This looks like its working fine. Despite the non-desirable active >> >> spinning, this seems like the best fix (actually much simpler than >> >> I expected to be possible) at this time. If we just could avoid >> >> the spinning when unnecessary, it would be perfect :) >> > Could you give that "not actually in-progress" detection a shot? >> > I don't like the spinning either. >> >> I tried this morning, the problem is that its always the sysctl >> handler which will run into the deadlock first, but there is no >> reliable indication to avoid it other than that the RTNL is >> already held. The problem is that the sysctl interface puts the >> process holding the RTNL to sleep and allows a process requiring >> it to run. Any different synchronization attempt will have the >> same problem, it seems you simply can't hold any locks while >> unregistering sysctls. > > Ok, I applied Stephen's patches then... From the further brainstorming department... It appears we are using the wait for the completion for two distinct purposes. Waiting until it is safe to free storage. Blocking module unregistration until the all of the users of the code are gone. I'm wondering if we could move the memory freeing. And consolidate the waiting ultimately moving the wait for completion into netdev_run_todo after we drop the lock. The reason I care is that it looks like to get hotplug handling sane, I'm going to need to implement a generic version of what sysfs, proc and sysctl are doing, and I expect that generic version belongs in the vfs ultimately giving us the capability to implement revoke. So I am going to be revisiting how the code works, and if I can come up with a cleaner solution to the networking stack it would be a good time to implement it. Eric -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
--- a/net/core/net-sysfs.c 2009-02-26 08:36:18.000000000 -0800 +++ b/net/core/net-sysfs.c 2009-02-26 08:37:51.000000000 -0800 @@ -77,7 +77,9 @@ static ssize_t netdev_store(struct devic if (endp == buf) goto err; - rtnl_lock(); + if (!rtnl_trylock()) + return -ERESTARTSYS; + if (dev_isalive(net)) { if ((ret = (*set)(net, new)) == 0) ret = len;