Message ID | 20100422.185400.71096585.davem@davemloft.net |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
Headers | show |
On Thu, Apr 22, 2010 at 06:54:00PM -0700, David Miller wrote: > From: Jiri Bohac <jbohac@suse.cz> > Date: Thu, 22 Apr 2010 17:49:08 +0200 > > > I still don't see why __ipv6_ifa_notify() needs to call > > dst_free(). Shouldn't that be dst_release() instead, to drop the > > reference obtained by dst_hold(&ifp->rt->u.dst)? > > It likely wants to do both. Actually I don't think the problem is in __ipv6_ifa_notify. The fact is none of this stuff is meant to be idempotent. So it's up to the entity that is requesting the deletion to make sure that a single object is not deleted more than once. Yes the original symptom was in __ipv6_ifa_notify, but it is merely pointing out that we have a problem further up. My patch is indeed not sufficient as Jiri pointed out, because I didn't deal with the case of an administrative deletion of autmatically generated IPv6 addresses. I will post an updated patch later today to deal with that. Cheers,
On Fri, Apr 23, 2010 at 10:10:00AM +0800, Herbert Xu wrote: > > I will post an updated patch later today to deal with that. Just got back from a business trip. This stuff is more broken than I thought. For example, we perform a number of actions when DAD succeeds, e.g., joining an anycast group. However, this is not synchronised with respect to address deletion at all, so if DAD succeeds just as someone deletes the address, you can easily get stuck on that anycast group. I will try to untangle this mess tomorrow. Cheers,
On Thu, Apr 22, 2010 at 06:54:00PM -0700, David Miller wrote: > From: Jiri Bohac <jbohac@suse.cz> > Date: Thu, 22 Apr 2010 17:49:08 +0200 > > > I still don't see why __ipv6_ifa_notify() needs to call > > dst_free(). Shouldn't that be dst_release() instead, to drop the > > reference obtained by dst_hold(&ifp->rt->u.dst)? > > It likely wants to do both. > > Just doing dst_release() doesn't mark the 'dst' object as obsolete, > and therefore it won't get force garbage collected. Sure. So If I understand it correctly, there are two problems: - the reference taken by dst_hold() just above the ip6_del_rt() is never dropped if ip6_del_rt() fails; so shouldn't the code be like this?: dst_hold(&ifp->rt->u.dst); if (ip6_del_rt(ifp->rt)) { dst_release(&ifp->rt->u.dst); dst_free(&ifp->rt->u.dst); } - if ip6_del_rt() fails because it races with something else deleting the address, dst_free() will be called twice. This is what Herbert is fixing with additional locking. However -- even when he fixes that -- how can ip6_del_rt() fail with the ifp->rt still needing a dst_free()? AFAICS, it can fail by: - __ip6_del_rt() finding that (rt == net->ipv6.ip6_null_entry); we don't want to call dst_free() on net->ipv6.ip6_null_entry, do we? - fib6_del() returning -ENOENT for multiple reasons; but doesn't that mean that something else has removed the route already and called dst_free on it? In either case, the dst_free() looks like not being needed. And it only does no harm in most cases, because these events are rare and it usually finds (obsolete > 2) and does nothing. I think that what Herbert is doing is only going to enforce that the ip6_del_rt() is never going to fail, so the dst_free() won't ever be called anyway, right? Thanks,
On Tue, Apr 27, 2010 at 05:50:34PM +0200, Jiri Bohac wrote: > > I think that what Herbert is doing is only going to enforce that > the ip6_del_rt() is never going to fail, so the dst_free() > won't ever be called anyway, right? Yes I think you're right. But let's fix the bigger problem first, and then we can audit this and possibly turn it into a WARN_ON. Thanks,
On Tue, Apr 27, 2010 at 11:55:33PM +0800, Herbert Xu wrote: > On Tue, Apr 27, 2010 at 05:50:34PM +0200, Jiri Bohac wrote: > > > > I think that what Herbert is doing is only going to enforce that > > the ip6_del_rt() is never going to fail, so the dst_free() > > won't ever be called anyway, right? > > Yes I think you're right. But let's fix the bigger problem first, > and then we can audit this and possibly turn it into a WARN_ON. Sorry for not fixing this sooner, but I'm back on the case. Also, the dst_free really is needed. It's there for the case where we delete the address before DAD completion. In that case the route object is allocated, but not inserted in the routing table. So we must manually dst_free it. Cheers,
On Fri, Apr 23, 2010 at 11:05:40PM +0800, Herbert Xu wrote: > > This stuff is more broken than I thought. For example, we perform > a number of actions when DAD succeeds, e.g., joining an anycast > group. However, this is not synchronised with respect to address > deletion at all, so if DAD succeeds just as someone deletes the > address, you can easily get stuck on that anycast group. > > I will try to untangle this mess tomorrow. Tomorrow took a while to arrive :) Here is a first batch of patches. Note that this is by no means a comprehensive fix for all the ndisc/addrconf race conditions. It is just a first step in trying to address the problems. The patchset revolves around a new lock, ifp->state_lock. I added it instead of trying to reuse the existing ifp->lock because the latter has serious nesting issues that prevent it from easily being used. My long term plan is to restructure the locking and eventually phase out ifp->lock in favour of ifp->state_lock. Cheers,
On Tue, 18 May 2010 21:02:43 +1000 Herbert Xu <herbert@gondor.apana.org.au> wrote: > On Fri, Apr 23, 2010 at 11:05:40PM +0800, Herbert Xu wrote: > > > > This stuff is more broken than I thought. For example, we perform > > a number of actions when DAD succeeds, e.g., joining an anycast > > group. However, this is not synchronised with respect to address > > deletion at all, so if DAD succeeds just as someone deletes the > > address, you can easily get stuck on that anycast group. > > > > I will try to untangle this mess tomorrow. > > Tomorrow took a while to arrive :) > > Here is a first batch of patches. Note that this is by no means > a comprehensive fix for all the ndisc/addrconf race conditions. > It is just a first step in trying to address the problems. > > The patchset revolves around a new lock, ifp->state_lock. I > added it instead of trying to reuse the existing ifp->lock because > the latter has serious nesting issues that prevent it from easily > being used. My long term plan is to restructure the locking and > eventually phase out ifp->lock in favour of ifp->state_lock. I wonder if so many fine grained locks are really necessary at all. Everything but timers looks like it is under RTNL mutex already. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Stephen Hemminger <shemminger@vyatta.com> Date: Tue, 18 May 2010 10:25:50 -0700 > I wonder if so many fine grained locks are really necessary at > all. Everything but timers looks like it is under RTNL mutex > already. I can't see any reasonable alternative, and it took weeks to get a fix at all. So I'm going to apply Herbert's fixes with the __KERNEL__ header bit fixed up. Thanks Herbert! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 18, 2010 at 03:27:59PM -0700, David Miller wrote: > From: Stephen Hemminger <shemminger@vyatta.com> > Date: Tue, 18 May 2010 10:25:50 -0700 > > > I wonder if so many fine grained locks are really necessary at > > all. Everything but timers looks like it is under RTNL mutex > > already. > > I can't see any reasonable alternative, and it took weeks to get > a fix at all. Right, the issue is with forcing an immediate effect after an NDISC packet triggers an address addition/deletion. From what I can gather, the hard case was with NDISC address addition with DAD disabled. That is, you have an NDISC packet that causes an addition followed immediately by a packet destined (or otherwise relying on) that new address. In that case, we simply have no chance of using the RTNL. Cheers,
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index d328d59..1db5048 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -3321,9 +3321,7 @@ static void __ipv6_ifa_notify(int event, struct inet6_ifaddr *ifp) switch (event) { case RTM_NEWADDR: - dst_hold(&ifp->rt->u.dst); - if (ip6_ins_rt(ifp->rt, NULL, NULL, NULL)) - dst_release(&ifp->rt->u.dst); + ip6_ins_rt(ifp->rt, NULL, NULL, NULL); if (ifp->idev->cnf.forwarding) addrconf_join_anycast(ifp); break; @@ -3334,8 +3332,6 @@ static void __ipv6_ifa_notify(int event, struct inet6_ifaddr *ifp) dst_hold(&ifp->rt->u.dst); if (ip6_del_rt(ifp->rt, NULL, NULL, NULL)) dst_free(&ifp->rt->u.dst); - else - dst_release(&ifp->rt->u.dst); break; } }