unregister_netdevice: waiting for eth0 to become free. Usage count = 1

Submitted by Wei Wang on Aug. 12, 2017, 12:10 a.m.

Details

Message ID CAEA6p_COwXWP97wSHvzokmm8ckQ2QEd7=dfNH04ttDPm_z3bcg@mail.gmail.com
State RFC
Delegated to: David Miller
Headers show

Commit Message

Wei Wang Aug. 12, 2017, 12:10 a.m.
> If after Cong's fix, the issue still happens, could you help try the
> patch attached and collect all logs when you try the reproduce the
> issue? It would be great to have logs for both success case and the
> failure case.
>
> Thanks so much for your help.
>

I think we have a potential fix for this issue.
Martin and I found that when addrconf_dst_alloc() creates a rt6, it is
possible that rt6->dst.dev points to loopback device while
rt6->rt6i_idev->dev points to a real device.
When the real device goes down, the current fib6 clean up code only
checks for rt6->dst.dev and assumes rt6->rt6i_idev->dev is the same.
That leaves unreleased refcnt on the real device if rt6->dst.dev
points to loopback dev.

The attached potential fix is tested by Martin and made sure it fixes his issue.

John,
It will be great if you can also give it a try and see if it fixes the
issue on your side before I submit an official patch.

Thanks very much for the help from everyone.

Wei

On Fri, Aug 11, 2017 at 10:25 AM, Wei Wang <weiwan@google.com> wrote:
> On Fri, Aug 11, 2017 at 9:48 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> Hi,
>>
>> On Thu, Aug 10, 2017 at 11:12 AM, John Stultz <john.stultz@linaro.org> wrote:
>>> On Wed, Aug 9, 2017 at 10:41 PM, Wei Wang <weiwan@google.com> wrote:
>>>> Hi John,
>>>>
>>>> Is it possible to try the attached patch?
>>>
>>> Thanks so much for the quick turn around!
>>>
>>> So I dropped all the reverts you suggested, and applied this one
>>> against 4.13-rc4, but I'm still seeing the problematic behavior.
>>
>> Does the following one-line fix make a difference?
>>
>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>> index a640fbcba15d..c145a35763a0 100644
>> --- a/net/ipv6/route.c
>> +++ b/net/ipv6/route.c
>> @@ -141,7 +141,7 @@ static void rt6_uncached_list_del(struct rt6_info *rt)
>>                 struct uncached_list *ul = rt->rt6i_uncached_list;
>>
>>                 spin_lock_bh(&ul->lock);
>> -               list_del(&rt->rt6i_uncached);
>> +               list_del_init(&rt->rt6i_uncached);
>>                 spin_unlock_bh(&ul->lock);
>>         }
>>  }
>
>
> Thanks a lot Cong for proposing this fix.
>
> For the last few days, John has been helping me running debug image
> and we found out that the leaked dst is probably in addrconf.c.
> Martin and I are looking through the code and trying to put more debugs.
>
> John,
>
> If after Cong's fix, the issue still happens, could you help try the
> patch attached and collect all logs when you try the reproduce the
> issue? It would be great to have logs for both success case and the
> failure case.
>
> Thanks so much for your help.
>
> Wei

Comments

David Ahern Aug. 12, 2017, 12:19 a.m.
On 8/11/17 6:10 PM, Wei Wang wrote:
> I think we have a potential fix for this issue.
> Martin and I found that when addrconf_dst_alloc() creates a rt6, it is
> possible that rt6->dst.dev points to loopback device while
> rt6->rt6i_idev->dev points to a real device.
> When the real device goes down, the current fib6 clean up code only
> checks for rt6->dst.dev and assumes rt6->rt6i_idev->dev is the same.
> That leaves unreleased refcnt on the real device if rt6->dst.dev
> points to loopback dev.

Yes, host routes and anycast routes.

I have a patch to fix that but it is held up on a few VRF test cases
failing. Hopefully I can get that figured out next week. These unrelated
routes against the loopback device have been a source of a number of
problems (e.g. take down 'lo' and all of IPv6 networking stops for that
namespace).
Wei Wang Aug. 12, 2017, 12:25 a.m.
On Fri, Aug 11, 2017 at 5:19 PM, David Ahern <dsahern@gmail.com> wrote:
> On 8/11/17 6:10 PM, Wei Wang wrote:
>> I think we have a potential fix for this issue.
>> Martin and I found that when addrconf_dst_alloc() creates a rt6, it is
>> possible that rt6->dst.dev points to loopback device while
>> rt6->rt6i_idev->dev points to a real device.
>> When the real device goes down, the current fib6 clean up code only
>> checks for rt6->dst.dev and assumes rt6->rt6i_idev->dev is the same.
>> That leaves unreleased refcnt on the real device if rt6->dst.dev
>> points to loopback dev.
>
> Yes, host routes and anycast routes.
>
> I have a patch to fix that but it is held up on a few VRF test cases
> failing. Hopefully I can get that figured out next week. These unrelated
> routes against the loopback device have been a source of a number of
> problems (e.g. take down 'lo' and all of IPv6 networking stops for that
> namespace).

Thanks David.
By "a patch to fix that" do you mean after your patch, for every rt6,
rt6->rt6i_idev will be the same as rt6->dst.dev?
John Stultz Aug. 12, 2017, 12:31 a.m.
On Fri, Aug 11, 2017 at 5:10 PM, Wei Wang <weiwan@google.com> wrote:
>> If after Cong's fix, the issue still happens, could you help try the
>> patch attached and collect all logs when you try the reproduce the
>> issue? It would be great to have logs for both success case and the
>> failure case.
>>
>> Thanks so much for your help.
>>
>
> I think we have a potential fix for this issue.
> Martin and I found that when addrconf_dst_alloc() creates a rt6, it is
> possible that rt6->dst.dev points to loopback device while
> rt6->rt6i_idev->dev points to a real device.
> When the real device goes down, the current fib6 clean up code only
> checks for rt6->dst.dev and assumes rt6->rt6i_idev->dev is the same.
> That leaves unreleased refcnt on the real device if rt6->dst.dev
> points to loopback dev.
>
> The attached potential fix is tested by Martin and made sure it fixes his issue.
>
> John,
> It will be great if you can also give it a try and see if it fixes the
> issue on your side before I submit an official patch.

So yes, sorry I haven't been able to get back quicker on the other
patches sent, was mucking about in other work.

So yea, this patch  (potential fix for unregister_netdevice()) seems
to avoid the issue.

I'm going to do some further testing, but its looking good so far.

Do you still want feedback on the previous changes?

thanks
-john
Wei Wang Aug. 12, 2017, 12:46 a.m.
> So yes, sorry I haven't been able to get back quicker on the other
> patches sent, was mucking about in other work.
>
> So yea, this patch  (potential fix for unregister_netdevice()) seems
> to avoid the issue.
>
> I'm going to do some further testing, but its looking good so far.
>

Great. Thanks.

> Do you still want feedback on the previous changes?

If this patch is good, then I don't really need the previous feedback.

Thanks.
Wei


On Fri, Aug 11, 2017 at 5:31 PM, John Stultz <john.stultz@linaro.org> wrote:
> On Fri, Aug 11, 2017 at 5:10 PM, Wei Wang <weiwan@google.com> wrote:
>>> If after Cong's fix, the issue still happens, could you help try the
>>> patch attached and collect all logs when you try the reproduce the
>>> issue? It would be great to have logs for both success case and the
>>> failure case.
>>>
>>> Thanks so much for your help.
>>>
>>
>> I think we have a potential fix for this issue.
>> Martin and I found that when addrconf_dst_alloc() creates a rt6, it is
>> possible that rt6->dst.dev points to loopback device while
>> rt6->rt6i_idev->dev points to a real device.
>> When the real device goes down, the current fib6 clean up code only
>> checks for rt6->dst.dev and assumes rt6->rt6i_idev->dev is the same.
>> That leaves unreleased refcnt on the real device if rt6->dst.dev
>> points to loopback dev.
>>
>> The attached potential fix is tested by Martin and made sure it fixes his issue.
>>
>> John,
>> It will be great if you can also give it a try and see if it fixes the
>> issue on your side before I submit an official patch.
>
> So yes, sorry I haven't been able to get back quicker on the other
> patches sent, was mucking about in other work.
>
> So yea, this patch  (potential fix for unregister_netdevice()) seems
> to avoid the issue.
>
> I'm going to do some further testing, but its looking good so far.
>
> Do you still want feedback on the previous changes?
>
> thanks
> -john
John Stultz Aug. 12, 2017, 3:07 a.m.
On Fri, Aug 11, 2017 at 5:31 PM, John Stultz <john.stultz@linaro.org> wrote:
> On Fri, Aug 11, 2017 at 5:10 PM, Wei Wang <weiwan@google.com> wrote:
>>> If after Cong's fix, the issue still happens, could you help try the
>>> patch attached and collect all logs when you try the reproduce the
>>> issue? It would be great to have logs for both success case and the
>>> failure case.
>>>
>>> Thanks so much for your help.
>>>
>>
>> I think we have a potential fix for this issue.
>> Martin and I found that when addrconf_dst_alloc() creates a rt6, it is
>> possible that rt6->dst.dev points to loopback device while
>> rt6->rt6i_idev->dev points to a real device.
>> When the real device goes down, the current fib6 clean up code only
>> checks for rt6->dst.dev and assumes rt6->rt6i_idev->dev is the same.
>> That leaves unreleased refcnt on the real device if rt6->dst.dev
>> points to loopback dev.
>>
>> The attached potential fix is tested by Martin and made sure it fixes his issue.
>>
>> John,
>> It will be great if you can also give it a try and see if it fixes the
>> issue on your side before I submit an official patch.
>
> So yes, sorry I haven't been able to get back quicker on the other
> patches sent, was mucking about in other work.
>
> So yea, this patch  (potential fix for unregister_netdevice()) seems
> to avoid the issue.
>
> I'm going to do some further testing, but its looking good so far.

Looks good so far! I've not hit the issue yet.

Thanks so much for sorting out a fix!

If its useful:
Tested-by: John Stultz <john.stultz@linaro.org>

thanks again
-john
David Ahern Aug. 12, 2017, 3:37 a.m.
On 8/11/17 6:25 PM, Wei Wang wrote:
> By "a patch to fix that" do you mean after your patch, for every rt6,
> rt6->rt6i_idev will be the same as rt6->dst.dev?

FIB entries should have them the same device with my patch.

The copies done (ip6_rt_cache_alloc and ip6_rt_pcpu_alloc) will have to
set dst.dev to loopback or VRF device for RTF_LOCAL routes; it's the
only way to get local traffic to work and this is similar to what IPv4 does.
Ido Schimmel Aug. 12, 2017, 6:01 p.m.
Hi Wei,

On Fri, Aug 11, 2017 at 05:10:02PM -0700, Wei Wang wrote:
> I think we have a potential fix for this issue.
> Martin and I found that when addrconf_dst_alloc() creates a rt6, it is
> possible that rt6->dst.dev points to loopback device while
> rt6->rt6i_idev->dev points to a real device.
> When the real device goes down, the current fib6 clean up code only
> checks for rt6->dst.dev and assumes rt6->rt6i_idev->dev is the same.
> That leaves unreleased refcnt on the real device if rt6->dst.dev
> points to loopback dev.

[...]

> From 2d8861808c2029013f6b6e86120ba6902329145b Mon Sep 17 00:00:00 2001
> From: Wei Wang <weiwan@google.com>
> Date: Fri, 11 Aug 2017 16:36:04 -0700
> Subject: [PATCH 1/2] potential fix for unregister_netdevice()
> 
> Change-Id: I5d5f6f7a7ad0f5dd769f33487db17ff2570d52ea
> ---
>  net/ipv6/route.c | 17 ++++++++---------
>  1 file changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 4d30c96a819d..105922903932 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -417,14 +417,12 @@ static void ip6_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
>  	struct net_device *loopback_dev =
>  		dev_net(dev)->loopback_dev;
>  
> -	if (dev != loopback_dev) {
> -		if (idev && idev->dev == dev) {
> -			struct inet6_dev *loopback_idev =
> -				in6_dev_get(loopback_dev);
> -			if (loopback_idev) {
> -				rt->rt6i_idev = loopback_idev;
> -				in6_dev_put(idev);
> -			}
> +	if (idev && idev->dev != loopback_dev) {
> +		struct inet6_dev *loopback_idev =
> +			in6_dev_get(loopback_dev);
> +		if (loopback_idev) {
> +			rt->rt6i_idev = loopback_idev;
> +			in6_dev_put(idev);
>  		}
>  	}
>  }
> @@ -2789,7 +2787,8 @@ static int fib6_ifdown(struct rt6_info *rt, void *arg)
>  	const struct arg_dev_net *adn = arg;
>  	const struct net_device *dev = adn->dev;
>  
> -	if ((rt->dst.dev == dev || !dev) &&
> +	if ((rt->dst.dev == dev || !dev ||
> +	     rt->rt6i_idev->dev == dev) &&

Can you please explain why this line is needed? While host routes aren't
removed from the FIB by rt6_ifdown() (when dst.dev goes down), they are
removed later on in addrconf_ifdown().

With your patch, if I check the return value of ip6_del_rt() in
__ipv6_ifa_notify() I see that -ENONET is returned. Because the host
route was already removed by rt6_ifdown(). When the line in question is
removed from the patch I don't get the error anymore.

Is it possible that in John's case the host route was correctly removed
from the FIB and that the unreleased reference was due to a wrong check
in ip6_dst_ifdown() (which you patched correctly AFAICT)?

Thanks

>  	    rt != adn->net->ipv6.ip6_null_entry &&
>  	    (rt->rt6i_nsiblings == 0 ||
>  	     (dev && netdev_unregistering(dev)) ||
> -- 
> 2.14.0.434.g98096fd7a8-goog
>
Wei Wang Aug. 12, 2017, 7:28 p.m.
On Fri, Aug 11, 2017 at 8:07 PM, John Stultz <john.stultz@linaro.org> wrote:
> On Fri, Aug 11, 2017 at 5:31 PM, John Stultz <john.stultz@linaro.org> wrote:
>> On Fri, Aug 11, 2017 at 5:10 PM, Wei Wang <weiwan@google.com> wrote:
>>>> If after Cong's fix, the issue still happens, could you help try the
>>>> patch attached and collect all logs when you try the reproduce the
>>>> issue? It would be great to have logs for both success case and the
>>>> failure case.
>>>>
>>>> Thanks so much for your help.
>>>>
>>>
>>> I think we have a potential fix for this issue.
>>> Martin and I found that when addrconf_dst_alloc() creates a rt6, it is
>>> possible that rt6->dst.dev points to loopback device while
>>> rt6->rt6i_idev->dev points to a real device.
>>> When the real device goes down, the current fib6 clean up code only
>>> checks for rt6->dst.dev and assumes rt6->rt6i_idev->dev is the same.
>>> That leaves unreleased refcnt on the real device if rt6->dst.dev
>>> points to loopback dev.
>>>
>>> The attached potential fix is tested by Martin and made sure it fixes his issue.
>>>
>>> John,
>>> It will be great if you can also give it a try and see if it fixes the
>>> issue on your side before I submit an official patch.
>>
>> So yes, sorry I haven't been able to get back quicker on the other
>> patches sent, was mucking about in other work.
>>
>> So yea, this patch  (potential fix for unregister_netdevice()) seems
>> to avoid the issue.
>>
>> I'm going to do some further testing, but its looking good so far.
>
> Looks good so far! I've not hit the issue yet.
>
> Thanks so much for sorting out a fix!
>
> If its useful:
> Tested-by: John Stultz <john.stultz@linaro.org>
>
> thanks again
> -john
Wei Wang Aug. 12, 2017, 7:29 p.m.
> Looks good so far! I've not hit the issue yet.
>

Great. I will prepare an official patch then.

> Thanks so much for sorting out a fix!
>
> If its useful:
> Tested-by: John Stultz <john.stultz@linaro.org>

Sure will do.

Thanks.
Wei

On Fri, Aug 11, 2017 at 8:07 PM, John Stultz <john.stultz@linaro.org> wrote:
> On Fri, Aug 11, 2017 at 5:31 PM, John Stultz <john.stultz@linaro.org> wrote:
>> On Fri, Aug 11, 2017 at 5:10 PM, Wei Wang <weiwan@google.com> wrote:
>>>> If after Cong's fix, the issue still happens, could you help try the
>>>> patch attached and collect all logs when you try the reproduce the
>>>> issue? It would be great to have logs for both success case and the
>>>> failure case.
>>>>
>>>> Thanks so much for your help.
>>>>
>>>
>>> I think we have a potential fix for this issue.
>>> Martin and I found that when addrconf_dst_alloc() creates a rt6, it is
>>> possible that rt6->dst.dev points to loopback device while
>>> rt6->rt6i_idev->dev points to a real device.
>>> When the real device goes down, the current fib6 clean up code only
>>> checks for rt6->dst.dev and assumes rt6->rt6i_idev->dev is the same.
>>> That leaves unreleased refcnt on the real device if rt6->dst.dev
>>> points to loopback dev.
>>>
>>> The attached potential fix is tested by Martin and made sure it fixes his issue.
>>>
>>> John,
>>> It will be great if you can also give it a try and see if it fixes the
>>> issue on your side before I submit an official patch.
>>
>> So yes, sorry I haven't been able to get back quicker on the other
>> patches sent, was mucking about in other work.
>>
>> So yea, this patch  (potential fix for unregister_netdevice()) seems
>> to avoid the issue.
>>
>> I'm going to do some further testing, but its looking good so far.
>
> Looks good so far! I've not hit the issue yet.
>
> Thanks so much for sorting out a fix!
>
> If its useful:
> Tested-by: John Stultz <john.stultz@linaro.org>
>
> thanks again
> -john
Wei Wang Aug. 12, 2017, 7:29 p.m.
On Fri, Aug 11, 2017 at 8:37 PM, David Ahern <dsahern@gmail.com> wrote:
> On 8/11/17 6:25 PM, Wei Wang wrote:
>> By "a patch to fix that" do you mean after your patch, for every rt6,
>> rt6->rt6i_idev will be the same as rt6->dst.dev?
>
> FIB entries should have them the same device with my patch.
>
> The copies done (ip6_rt_cache_alloc and ip6_rt_pcpu_alloc) will have to
> set dst.dev to loopback or VRF device for RTF_LOCAL routes; it's the
> only way to get local traffic to work and this is similar to what IPv4 does.

Got it. Thanks David.

Wei
Wei Wang Aug. 12, 2017, 7:42 p.m.
Hi Ido,

>> -     if ((rt->dst.dev == dev || !dev) &&
>> +     if ((rt->dst.dev == dev || !dev ||
>> +          rt->rt6i_idev->dev == dev) &&
>
> Can you please explain why this line is needed? While host routes aren't
> removed from the FIB by rt6_ifdown() (when dst.dev goes down), they are
> removed later on in addrconf_ifdown().
>

Yes.. Agree. But one difference is that if the route is removed from
addrconf_ifdown(), dst_dev_put() won't be called to release the
devices before doing dst_release(). It is OK if dst_release() sees the
refcnt on dst already drops to 0 and directly destroys the dst. But I
think it will cause problem if at the time, the dst is still held by
some other users because then the refcnt on the device going down will
not get released.
That's why I think we should remove the dst with either dst->dev ==
going down dev or rt6->rt6i_idev->dev == going down dev from the fib6
tree always because there, we always call dst_dev_put() to release the
device.

> With your patch, if I check the return value of ip6_del_rt() in
> __ipv6_ifa_notify() I see that -ENONET is returned. Because the host
> route was already removed by rt6_ifdown(). When the line in question is
> removed from the patch I don't get the error anymore.
>

Right. That is expected as the route is already removed from the tree.

> Is it possible that in John's case the host route was correctly removed
> from the FIB and that the unreleased reference was due to a wrong check
> in ip6_dst_ifdown() (which you patched correctly AFAICT)?
>

Yes. possible. But as I explained earlier, I still think we should
also remove routes with rt6->rt6i_idev->dev == going down dev from the
tree.

Thanks.
Wei

On Sat, Aug 12, 2017 at 11:01 AM, Ido Schimmel <idosch@idosch.org> wrote:
> Hi Wei,
>
> On Fri, Aug 11, 2017 at 05:10:02PM -0700, Wei Wang wrote:
>> I think we have a potential fix for this issue.
>> Martin and I found that when addrconf_dst_alloc() creates a rt6, it is
>> possible that rt6->dst.dev points to loopback device while
>> rt6->rt6i_idev->dev points to a real device.
>> When the real device goes down, the current fib6 clean up code only
>> checks for rt6->dst.dev and assumes rt6->rt6i_idev->dev is the same.
>> That leaves unreleased refcnt on the real device if rt6->dst.dev
>> points to loopback dev.
>
> [...]
>
>> From 2d8861808c2029013f6b6e86120ba6902329145b Mon Sep 17 00:00:00 2001
>> From: Wei Wang <weiwan@google.com>
>> Date: Fri, 11 Aug 2017 16:36:04 -0700
>> Subject: [PATCH 1/2] potential fix for unregister_netdevice()
>>
>> Change-Id: I5d5f6f7a7ad0f5dd769f33487db17ff2570d52ea
>> ---
>>  net/ipv6/route.c | 17 ++++++++---------
>>  1 file changed, 8 insertions(+), 9 deletions(-)
>>
>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>> index 4d30c96a819d..105922903932 100644
>> --- a/net/ipv6/route.c
>> +++ b/net/ipv6/route.c
>> @@ -417,14 +417,12 @@ static void ip6_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
>>       struct net_device *loopback_dev =
>>               dev_net(dev)->loopback_dev;
>>
>> -     if (dev != loopback_dev) {
>> -             if (idev && idev->dev == dev) {
>> -                     struct inet6_dev *loopback_idev =
>> -                             in6_dev_get(loopback_dev);
>> -                     if (loopback_idev) {
>> -                             rt->rt6i_idev = loopback_idev;
>> -                             in6_dev_put(idev);
>> -                     }
>> +     if (idev && idev->dev != loopback_dev) {
>> +             struct inet6_dev *loopback_idev =
>> +                     in6_dev_get(loopback_dev);
>> +             if (loopback_idev) {
>> +                     rt->rt6i_idev = loopback_idev;
>> +                     in6_dev_put(idev);
>>               }
>>       }
>>  }
>> @@ -2789,7 +2787,8 @@ static int fib6_ifdown(struct rt6_info *rt, void *arg)
>>       const struct arg_dev_net *adn = arg;
>>       const struct net_device *dev = adn->dev;
>>
>> -     if ((rt->dst.dev == dev || !dev) &&
>> +     if ((rt->dst.dev == dev || !dev ||
>> +          rt->rt6i_idev->dev == dev) &&
>
> Can you please explain why this line is needed? While host routes aren't
> removed from the FIB by rt6_ifdown() (when dst.dev goes down), they are
> removed later on in addrconf_ifdown().
>
> With your patch, if I check the return value of ip6_del_rt() in
> __ipv6_ifa_notify() I see that -ENONET is returned. Because the host
> route was already removed by rt6_ifdown(). When the line in question is
> removed from the patch I don't get the error anymore.
>
> Is it possible that in John's case the host route was correctly removed
> from the FIB and that the unreleased reference was due to a wrong check
> in ip6_dst_ifdown() (which you patched correctly AFAICT)?
>
> Thanks
>
>>           rt != adn->net->ipv6.ip6_null_entry &&
>>           (rt->rt6i_nsiblings == 0 ||
>>            (dev && netdev_unregistering(dev)) ||
>> --
>> 2.14.0.434.g98096fd7a8-goog
>>
David Ahern Aug. 13, 2017, 4:24 p.m.
On 8/12/17 1:42 PM, Wei Wang wrote:
> Hi Ido,
> 
>>> -     if ((rt->dst.dev == dev || !dev) &&
>>> +     if ((rt->dst.dev == dev || !dev ||
>>> +          rt->rt6i_idev->dev == dev) &&
>>
>> Can you please explain why this line is needed? While host routes aren't
>> removed from the FIB by rt6_ifdown() (when dst.dev goes down), they are
>> removed later on in addrconf_ifdown().
>>
> 
> Yes.. Agree. But one difference is that if the route is removed from
> addrconf_ifdown(), dst_dev_put() won't be called to release the
> devices before doing dst_release(). It is OK if dst_release() sees the
> refcnt on dst already drops to 0 and directly destroys the dst. But I
> think it will cause problem if at the time, the dst is still held by
> some other users because then the refcnt on the device going down will
> not get released.
> That's why I think we should remove the dst with either dst->dev ==
> going down dev or rt6->rt6i_idev->dev == going down dev from the fib6
> tree always because there, we always call dst_dev_put() to release the
> device.
> 
>> With your patch, if I check the return value of ip6_del_rt() in
>> __ipv6_ifa_notify() I see that -ENONET is returned. Because the host
>> route was already removed by rt6_ifdown(). When the line in question is
>> removed from the patch I don't get the error anymore.
>>
> 
> Right. That is expected as the route is already removed from the tree.
> 
>> Is it possible that in John's case the host route was correctly removed
>> from the FIB and that the unreleased reference was due to a wrong check
>> in ip6_dst_ifdown() (which you patched correctly AFAICT)?
>>
> 
> Yes. possible. But as I explained earlier, I still think we should
> also remove routes with rt6->rt6i_idev->dev == going down dev from the
> tree.

Looking at my patch to move host routes from loopback to device with the
address, I have this:

@@ -2789,7 +2808,8 @@ static int fib6_ifdown(struct rt6_info *rt, void *arg)
        const struct arg_dev_net *adn = arg;
        const struct net_device *dev = adn->dev;

-       if ((rt->dst.dev == dev || !dev) &&
+       if ((rt->dst.dev == dev || !dev ||
+            (netdev_unregistering(dev) && rt->rt6i_idev->dev == dev)) &&
            rt != adn->net->ipv6.ip6_null_entry &&
            (rt->rt6i_nsiblings == 0 ||
             (dev && netdev_unregistering(dev)) ||
Wei Wang Aug. 13, 2017, 8:56 p.m.
> Looking at my patch to move host routes from loopback to device with the
> address, I have this:
>
> @@ -2789,7 +2808,8 @@ static int fib6_ifdown(struct rt6_info *rt, void *arg)
>         const struct arg_dev_net *adn = arg;
>         const struct net_device *dev = adn->dev;
>
> -       if ((rt->dst.dev == dev || !dev) &&
> +       if ((rt->dst.dev == dev || !dev ||
> +            (netdev_unregistering(dev) && rt->rt6i_idev->dev == dev)) &&
>             rt != adn->net->ipv6.ip6_null_entry &&
>             (rt->rt6i_nsiblings == 0 ||
>              (dev && netdev_unregistering(dev)) ||

As you explained earlier, after your patch, all entries in the fib6
tree will have rt->dst.dev be the same as rt->rt6i_idev->dev except
those ones created by p6_rt_cache_alloc() and ip6_rt_pcpu_alloc().
Then the above newly added check is mainly to catch those cached dst
entries (created by ip6_rt_cached_alloc()). right?
And it is required because __ipv6_ifa_notify() -> ip6_del_rt() won't
take care of those cached dst entries.

Then I think I should wait for your patches to get merged before
submitting my patch?

Thanks.
Wei


On Sun, Aug 13, 2017 at 9:24 AM, David Ahern <dsahern@gmail.com> wrote:
> On 8/12/17 1:42 PM, Wei Wang wrote:
>> Hi Ido,
>>
>>>> -     if ((rt->dst.dev == dev || !dev) &&
>>>> +     if ((rt->dst.dev == dev || !dev ||
>>>> +          rt->rt6i_idev->dev == dev) &&
>>>
>>> Can you please explain why this line is needed? While host routes aren't
>>> removed from the FIB by rt6_ifdown() (when dst.dev goes down), they are
>>> removed later on in addrconf_ifdown().
>>>
>>
>> Yes.. Agree. But one difference is that if the route is removed from
>> addrconf_ifdown(), dst_dev_put() won't be called to release the
>> devices before doing dst_release(). It is OK if dst_release() sees the
>> refcnt on dst already drops to 0 and directly destroys the dst. But I
>> think it will cause problem if at the time, the dst is still held by
>> some other users because then the refcnt on the device going down will
>> not get released.
>> That's why I think we should remove the dst with either dst->dev ==
>> going down dev or rt6->rt6i_idev->dev == going down dev from the fib6
>> tree always because there, we always call dst_dev_put() to release the
>> device.
>>
>>> With your patch, if I check the return value of ip6_del_rt() in
>>> __ipv6_ifa_notify() I see that -ENONET is returned. Because the host
>>> route was already removed by rt6_ifdown(). When the line in question is
>>> removed from the patch I don't get the error anymore.
>>>
>>
>> Right. That is expected as the route is already removed from the tree.
>>
>>> Is it possible that in John's case the host route was correctly removed
>>> from the FIB and that the unreleased reference was due to a wrong check
>>> in ip6_dst_ifdown() (which you patched correctly AFAICT)?
>>>
>>
>> Yes. possible. But as I explained earlier, I still think we should
>> also remove routes with rt6->rt6i_idev->dev == going down dev from the
>> tree.
>
> Looking at my patch to move host routes from loopback to device with the
> address, I have this:
>
> @@ -2789,7 +2808,8 @@ static int fib6_ifdown(struct rt6_info *rt, void *arg)
>         const struct arg_dev_net *adn = arg;
>         const struct net_device *dev = adn->dev;
>
> -       if ((rt->dst.dev == dev || !dev) &&
> +       if ((rt->dst.dev == dev || !dev ||
> +            (netdev_unregistering(dev) && rt->rt6i_idev->dev == dev)) &&
>             rt != adn->net->ipv6.ip6_null_entry &&
>             (rt->rt6i_nsiblings == 0 ||
>              (dev && netdev_unregistering(dev)) ||
>
>
David Ahern Aug. 13, 2017, 11:08 p.m.
On 8/13/17 2:56 PM, Wei Wang wrote:
>> Looking at my patch to move host routes from loopback to device with the
>> address, I have this:
>>
>> @@ -2789,7 +2808,8 @@ static int fib6_ifdown(struct rt6_info *rt, void *arg)
>>         const struct arg_dev_net *adn = arg;
>>         const struct net_device *dev = adn->dev;
>>
>> -       if ((rt->dst.dev == dev || !dev) &&
>> +       if ((rt->dst.dev == dev || !dev ||
>> +            (netdev_unregistering(dev) && rt->rt6i_idev->dev == dev)) &&
>>             rt != adn->net->ipv6.ip6_null_entry &&
>>             (rt->rt6i_nsiblings == 0 ||
>>              (dev && netdev_unregistering(dev)) ||
> 
> As you explained earlier, after your patch, all entries in the fib6
> tree will have rt->dst.dev be the same as rt->rt6i_idev->dev except
> those ones created by p6_rt_cache_alloc() and ip6_rt_pcpu_alloc().
> Then the above newly added check is mainly to catch those cached dst
> entries (created by ip6_rt_cached_alloc()). right?
> And it is required because __ipv6_ifa_notify() -> ip6_del_rt() won't
> take care of those cached dst entries.
> 
> Then I think I should wait for your patches to get merged before
> submitting my patch?

no. your patch will need to go back to 4.12; my changes will not be
appropriate for that.

Patch hide | download patch | download mbox

From 2d8861808c2029013f6b6e86120ba6902329145b Mon Sep 17 00:00:00 2001
From: Wei Wang <weiwan@google.com>
Date: Fri, 11 Aug 2017 16:36:04 -0700
Subject: [PATCH 1/2] potential fix for unregister_netdevice()

Change-Id: I5d5f6f7a7ad0f5dd769f33487db17ff2570d52ea
---
 net/ipv6/route.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 4d30c96a819d..105922903932 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -417,14 +417,12 @@  static void ip6_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
 	struct net_device *loopback_dev =
 		dev_net(dev)->loopback_dev;
 
-	if (dev != loopback_dev) {
-		if (idev && idev->dev == dev) {
-			struct inet6_dev *loopback_idev =
-				in6_dev_get(loopback_dev);
-			if (loopback_idev) {
-				rt->rt6i_idev = loopback_idev;
-				in6_dev_put(idev);
-			}
+	if (idev && idev->dev != loopback_dev) {
+		struct inet6_dev *loopback_idev =
+			in6_dev_get(loopback_dev);
+		if (loopback_idev) {
+			rt->rt6i_idev = loopback_idev;
+			in6_dev_put(idev);
 		}
 	}
 }
@@ -2789,7 +2787,8 @@  static int fib6_ifdown(struct rt6_info *rt, void *arg)
 	const struct arg_dev_net *adn = arg;
 	const struct net_device *dev = adn->dev;
 
-	if ((rt->dst.dev == dev || !dev) &&
+	if ((rt->dst.dev == dev || !dev ||
+	     rt->rt6i_idev->dev == dev) &&
 	    rt != adn->net->ipv6.ip6_null_entry &&
 	    (rt->rt6i_nsiblings == 0 ||
 	     (dev && netdev_unregistering(dev)) ||
-- 
2.14.0.434.g98096fd7a8-goog