diff mbox

[v2,net] net: ipv6: regenerate host route if moved to gc list

Message ID 1492879237-31566-1-git-send-email-dsa@cumulusnetworks.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

David Ahern April 22, 2017, 4:40 p.m. UTC
Taking down the loopback device wreaks havoc on IPv6 routes. By
extension, taking a VRF device wreaks havoc on its table.

Dmitry and Andrey both reported heap out-of-bounds reports in the IPv6
FIB code while running syzkaller fuzzer. The root cause is a dead dst
that is on the garbage list gets reinserted into the IPv6 FIB. While on
the gc (or perhaps when it gets added to the gc list) the dst->next is
set to an IPv4 dst. A subsequent walk of the ipv6 tables causes the
out-of-bounds access.

Andrey's reproducer was the key to getting to the bottom of this.

With IPv6, host routes for an address have the dst->dev set to the
loopback device. When the 'lo' device is taken down, rt6_ifdown initiates
a walk of the fib evicting routes with the 'lo' device which means all
host routes are removed. That process moves the dst which is attached to
an inet6_ifaddr to the gc list and marks it as dead.

The recent change to keep global IPv6 addresses added a new function
fixup_permanent_addr that is called on admin up. That function restarts
dad for an inet6_ifaddr and when it completes the host route attached
to it is inserted into the fib. Since the route was marked dead and
moved to the gc list, we get the reported out-of-bounds accesses. If
the device with the address is taken down or the address is removed, the
WARN_ON in fib6_del is triggered.

All of those faults are fixed by regenerating the host route of the
existing one has been moved to the gc list, something that can be
determined by checking if the rt6i_ref counter is 0.

Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
v2
- change ifp->rt under spinlock vs cmpxchg
- add comment about rt6i_ref == 0

Dmitry / Andrey: can you guys add this patch to your tree and run
syzkaller tests? I'd like to confirm that all of the fib traces
are fixed. Thanks.

 net/ipv6/addrconf.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

Comments

Martin KaFai Lau April 22, 2017, 10 p.m. UTC | #1
On Sat, Apr 22, 2017 at 09:40:37AM -0700, David Ahern wrote:
[...]
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 08f9e8ea7a81..97e86158bbcb 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -3303,14 +3303,24 @@ static void addrconf_gre_config(struct net_device *dev)
>  static int fixup_permanent_addr(struct inet6_dev *idev,
>  				struct inet6_ifaddr *ifp)
>  {
> -	if (!ifp->rt) {
> -		struct rt6_info *rt;
> +	/* rt6i_ref == 0 means the host route was removed from the
> +	 * FIB, for example, if 'lo' device is taken down. In that
> +	 * case regenerate the host route.
> +	 */
> +	if (!ifp->rt || !atomic_read(&ifp->rt->rt6i_ref)) {
> +		struct rt6_info *rt, *prev;
>
>  		rt = addrconf_dst_alloc(idev, &ifp->addr, false);
The rt regernation makes sense.

>  		if (unlikely(IS_ERR(rt)))
>  			return PTR_ERR(rt);
>
> +		spin_lock(&ifp->lock);
> +		prev = ifp->rt;
>  		ifp->rt = rt;
I am still missing something on the new spin_lock:
1) Is there an existing race in the existing
   ifp->rt modification ('ipf->rt = rt') which is
   not related to this bug?
2) If there is a race in ifp->rt, is the above if-checks
   on ifp->rt racy and need protection also? F.e. 'ifp->rt->rt6i_ref'
   since ifp->rt could be NULL or ifp->rt->rt6i_ref
   may not be zero later if there is concurrent
   modification on ifp->rt?

> +		spin_unlock(&ifp->lock);
> +
> +		if (prev)
> +			ip6_rt_put(prev);
Nit. ip6_rt_put() takes NULL.

>  	}
>
>  	if (!(ifp->flags & IFA_F_NOPREFIXROUTE)) {
> --
> 2.1.4
>
David Ahern April 23, 2017, 1:12 a.m. UTC | #2
On 4/22/17 4:00 PM, Martin KaFai Lau wrote:
> On Sat, Apr 22, 2017 at 09:40:37AM -0700, David Ahern wrote:
> [...]
>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>> index 08f9e8ea7a81..97e86158bbcb 100644
>> --- a/net/ipv6/addrconf.c
>> +++ b/net/ipv6/addrconf.c
>> @@ -3303,14 +3303,24 @@ static void addrconf_gre_config(struct net_device *dev)
>>  static int fixup_permanent_addr(struct inet6_dev *idev,
>>  				struct inet6_ifaddr *ifp)
>>  {
>> -	if (!ifp->rt) {
>> -		struct rt6_info *rt;
>> +	/* rt6i_ref == 0 means the host route was removed from the
>> +	 * FIB, for example, if 'lo' device is taken down. In that
>> +	 * case regenerate the host route.
>> +	 */
>> +	if (!ifp->rt || !atomic_read(&ifp->rt->rt6i_ref)) {
>> +		struct rt6_info *rt, *prev;
>>
>>  		rt = addrconf_dst_alloc(idev, &ifp->addr, false);
> The rt regernation makes sense.
> 
>>  		if (unlikely(IS_ERR(rt)))
>>  			return PTR_ERR(rt);
>>
>> +		spin_lock(&ifp->lock);
>> +		prev = ifp->rt;
>>  		ifp->rt = rt;
> I am still missing something on the new spin_lock:
> 1) Is there an existing race in the existing
>    ifp->rt modification ('ipf->rt = rt') which is
>    not related to this bug?
> 2) If there is a race in ifp->rt, is the above if-checks
>    on ifp->rt racy and need protection also? F.e. 'ifp->rt->rt6i_ref'
>    since ifp->rt could be NULL or ifp->rt->rt6i_ref
>    may not be zero later if there is concurrent
>    modification on ifp->rt?

As I understand it:
- rt6i_ref is modified by the fib code (adding and removing to tree) and
always under RTNL.
- ifp->rt is only *set* under RTNL, but is accessed without (dad via
workqueue and sysctl).

The code path to fixup_permanent_addr is under RTNL, so the if check on
ifp->rt and rt6i_ref is ok -- neither can be changed since RTNL is held.

Since ifp->rt can be accessed outside of RTNL, the spinlock is needed to
change its value. Arguably only 'ifp->rt = rt;' needs the spinlock.

Let me know if I am missing something. There are many twists and turns
with the ipv6 code.

> 
>> +		spin_unlock(&ifp->lock);
>> +
>> +		if (prev)
>> +			ip6_rt_put(prev);
> Nit. ip6_rt_put() takes NULL.

ok.
Martin KaFai Lau April 23, 2017, 2:28 a.m. UTC | #3
On Sat, Apr 22, 2017 at 07:12:34PM -0600, David Ahern wrote:
> On 4/22/17 4:00 PM, Martin KaFai Lau wrote:
> > On Sat, Apr 22, 2017 at 09:40:37AM -0700, David Ahern wrote:
> > [...]
> >> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> >> index 08f9e8ea7a81..97e86158bbcb 100644
> >> --- a/net/ipv6/addrconf.c
> >> +++ b/net/ipv6/addrconf.c
> >> @@ -3303,14 +3303,24 @@ static void addrconf_gre_config(struct net_device *dev)
> >>  static int fixup_permanent_addr(struct inet6_dev *idev,
> >>  				struct inet6_ifaddr *ifp)
> >>  {
> >> -	if (!ifp->rt) {
> >> -		struct rt6_info *rt;
> >> +	/* rt6i_ref == 0 means the host route was removed from the
> >> +	 * FIB, for example, if 'lo' device is taken down. In that
> >> +	 * case regenerate the host route.
> >> +	 */
> >> +	if (!ifp->rt || !atomic_read(&ifp->rt->rt6i_ref)) {
> >> +		struct rt6_info *rt, *prev;
> >>
> >>  		rt = addrconf_dst_alloc(idev, &ifp->addr, false);
> > The rt regernation makes sense.
> >
> >>  		if (unlikely(IS_ERR(rt)))
> >>  			return PTR_ERR(rt);
> >>
> >> +		spin_lock(&ifp->lock);
> >> +		prev = ifp->rt;
> >>  		ifp->rt = rt;
> > I am still missing something on the new spin_lock:
> > 1) Is there an existing race in the existing
> >    ifp->rt modification ('ipf->rt = rt') which is
> >    not related to this bug?
> > 2) If there is a race in ifp->rt, is the above if-checks
> >    on ifp->rt racy and need protection also? F.e. 'ifp->rt->rt6i_ref'
> >    since ifp->rt could be NULL or ifp->rt->rt6i_ref
> >    may not be zero later if there is concurrent
> >    modification on ifp->rt?
>
> As I understand it:
> - rt6i_ref is modified by the fib code (adding and removing to tree) and
> always under RTNL.
> - ifp->rt is only *set* under RTNL, but is accessed without (dad via
> workqueue and sysctl).
>
> The code path to fixup_permanent_addr is under RTNL, so the if check on
> ifp->rt and rt6i_ref is ok -- neither can be changed since RTNL is held.
>
> Since ifp->rt can be accessed outside of RTNL, the spinlock is needed to
> change its value.
Got it. It is to protect the readers which are not under RTNL.
Many thanks for pointing out what I was missing.  It all makes sense now.

> Arguably only 'ifp->rt = rt;' needs the spinlock.
It still seems like the existing 'ifp->rt = rt;' needs protection
anyway regardless of the rt regeneration change.  It would be nice to
explain it in the commit log or even better separating it out
into another patch.

>
> There are many twists and turns with the ipv6 code.
Nod Nod :)

>
> >
> >> +		spin_unlock(&ifp->lock);
> >> +
> >> +		if (prev)
> >> +			ip6_rt_put(prev);
> > Nit. ip6_rt_put() takes NULL.
>
> ok.
>
David Ahern April 23, 2017, 2:08 p.m. UTC | #4
On 4/22/17 8:28 PM, Martin KaFai Lau wrote:
>> The code path to fixup_permanent_addr is under RTNL, so the if check on
>> ifp->rt and rt6i_ref is ok -- neither can be changed since RTNL is held.
>>
>> Since ifp->rt can be accessed outside of RTNL, the spinlock is needed to
>> change its value.
> Got it. It is to protect the readers which are not under RTNL.
> Many thanks for pointing out what I was missing.  It all makes sense now.
> 
>> Arguably only 'ifp->rt = rt;' needs the spinlock.
> It still seems like the existing 'ifp->rt = rt;' needs protection
> anyway regardless of the rt regeneration change.  It would be nice to
> explain it in the commit log or even better separating it out
> into another patch.

I'll add a comment to the commit log when I send a v3 tomorrow morning.
Andrey Konovalov April 24, 2017, 2:03 p.m. UTC | #5
On Sat, Apr 22, 2017 at 6:40 PM, David Ahern <dsa@cumulusnetworks.com> wrote:
> Taking down the loopback device wreaks havoc on IPv6 routes. By
> extension, taking a VRF device wreaks havoc on its table.
>
> Dmitry and Andrey both reported heap out-of-bounds reports in the IPv6
> FIB code while running syzkaller fuzzer. The root cause is a dead dst
> that is on the garbage list gets reinserted into the IPv6 FIB. While on
> the gc (or perhaps when it gets added to the gc list) the dst->next is
> set to an IPv4 dst. A subsequent walk of the ipv6 tables causes the
> out-of-bounds access.
>
> Andrey's reproducer was the key to getting to the bottom of this.
>
> With IPv6, host routes for an address have the dst->dev set to the
> loopback device. When the 'lo' device is taken down, rt6_ifdown initiates
> a walk of the fib evicting routes with the 'lo' device which means all
> host routes are removed. That process moves the dst which is attached to
> an inet6_ifaddr to the gc list and marks it as dead.
>
> The recent change to keep global IPv6 addresses added a new function
> fixup_permanent_addr that is called on admin up. That function restarts
> dad for an inet6_ifaddr and when it completes the host route attached
> to it is inserted into the fib. Since the route was marked dead and
> moved to the gc list, we get the reported out-of-bounds accesses. If
> the device with the address is taken down or the address is removed, the
> WARN_ON in fib6_del is triggered.
>
> All of those faults are fixed by regenerating the host route of the
> existing one has been moved to the gc list, something that can be
> determined by checking if the rt6i_ref counter is 0.
>
> Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Reported-by: Andrey Konovalov <andreyknvl@google.com>
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> ---
> v2
> - change ifp->rt under spinlock vs cmpxchg
> - add comment about rt6i_ref == 0
>
> Dmitry / Andrey: can you guys add this patch to your tree and run
> syzkaller tests? I'd like to confirm that all of the fib traces
> are fixed. Thanks.

Tried fuzzing with your patch, I don't see any fib6 related reports
any more, so it should be good. I'll let you know if I see some
possibly related crashes.

Thanks!

>
>  net/ipv6/addrconf.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 08f9e8ea7a81..97e86158bbcb 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -3303,14 +3303,24 @@ static void addrconf_gre_config(struct net_device *dev)
>  static int fixup_permanent_addr(struct inet6_dev *idev,
>                                 struct inet6_ifaddr *ifp)
>  {
> -       if (!ifp->rt) {
> -               struct rt6_info *rt;
> +       /* rt6i_ref == 0 means the host route was removed from the
> +        * FIB, for example, if 'lo' device is taken down. In that
> +        * case regenerate the host route.
> +        */
> +       if (!ifp->rt || !atomic_read(&ifp->rt->rt6i_ref)) {
> +               struct rt6_info *rt, *prev;
>
>                 rt = addrconf_dst_alloc(idev, &ifp->addr, false);
>                 if (unlikely(IS_ERR(rt)))
>                         return PTR_ERR(rt);
>
> +               spin_lock(&ifp->lock);
> +               prev = ifp->rt;
>                 ifp->rt = rt;
> +               spin_unlock(&ifp->lock);
> +
> +               if (prev)
> +                       ip6_rt_put(prev);
>         }
>
>         if (!(ifp->flags & IFA_F_NOPREFIXROUTE)) {
> --
> 2.1.4
>
diff mbox

Patch

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 08f9e8ea7a81..97e86158bbcb 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3303,14 +3303,24 @@  static void addrconf_gre_config(struct net_device *dev)
 static int fixup_permanent_addr(struct inet6_dev *idev,
 				struct inet6_ifaddr *ifp)
 {
-	if (!ifp->rt) {
-		struct rt6_info *rt;
+	/* rt6i_ref == 0 means the host route was removed from the
+	 * FIB, for example, if 'lo' device is taken down. In that
+	 * case regenerate the host route.
+	 */
+	if (!ifp->rt || !atomic_read(&ifp->rt->rt6i_ref)) {
+		struct rt6_info *rt, *prev;
 
 		rt = addrconf_dst_alloc(idev, &ifp->addr, false);
 		if (unlikely(IS_ERR(rt)))
 			return PTR_ERR(rt);
 
+		spin_lock(&ifp->lock);
+		prev = ifp->rt;
 		ifp->rt = rt;
+		spin_unlock(&ifp->lock);
+
+		if (prev)
+			ip6_rt_put(prev);
 	}
 
 	if (!(ifp->flags & IFA_F_NOPREFIXROUTE)) {