diff mbox

[v2] net IPv6 : Fix broken IPv6 routing table after loopback down-up

Message ID CAHPKR9KE4LT_AEzUZ8cuJwBeMEEu0Ppe+jjynFQk7+N-A4YUmQ@mail.gmail.com
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Balakumaran Kannan March 28, 2013, 1:27 p.m. UTC
IPv6 Routing table becomes broken once we do ifdown, ifup of the loopback(lo)
interface. After down-up, routes of other interface's IPv6 addresses through
'lo' are lost.

IPv6 addresses assigned to all interfaces are routed through 'lo' for internal
communication. Once 'lo' is down, those routing entries are removed from
routing table. But those removed entries are not being re-created properly when
'lo' is brought up. So IPv6 addresses of other interfaces becomes unreachable
from the same machine. Also this breaks communication with other machines
because of NDISC packet processing failure.

This patch fixes this issue by reading all interface's IPv6 addresses and
adding them to IPv6 routing table while bringing up 'lo'.

Patch is prepared for Linux-3.9.rc4 kernel.

Signed-off-by: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com>
Signed-off-by: Maruthi Thotad <Maruthi.Thotad@ap.sm.sony.com>
---
This is version-2 of the patch sent earlier.
[Ref: http://marc.info/?l=linux-netdev&m=136437848103355&w=2]

Change Log:
 1. Used 'for_each_netdev_rcu' instead of while loop.
 2. Added required locks.
 3. Skipping IPv6 addresses with 'IFA_F_DADFAILED' or 'IFA_F_TENTATIVE' flags

==Testing==
Before applying the patch:
$ route -A inet6
Kernel IPv6 routing table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
2000::20/128                   ::                         Un   0   1     0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128  ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$ sudo ifdown lo
$ sudo ifup lo
$ route -A inet6
Kernel IPv6 routing table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$

After applying the patch:
$ route -A inet6
Kernel IPv6 routing
table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
2000::20/128                   ::                         Un   0   1     0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128  ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$ sudo ifdown lo
$ sudo ifup lo
$ route -A inet6
Kernel IPv6 routing table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
2000::20/128                   ::                         Un   0   1     0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128  ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$
---
struct in6_addr *addr)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet March 28, 2013, 1:44 p.m. UTC | #1
On Thu, 2013-03-28 at 18:57 +0530, Balakumaran Kannan wrote:
> IPv6 Routing table becomes broken once we do ifdown, ifup of the loopback(lo)
> interface. After down-up, routes of other interface's IPv6 addresses through
> 'lo' are lost.
> 

>  	add_addr(idev, &in6addr_loopback, 128, IFA_HOST);
> +
> +	/* Add routes to other interface's IPv6 addresses */
> +	rcu_read_lock();
> +	for_each_netdev_rcu(dev_net(dev), sp_dev) {
> +

We hold RTNL at this point, so why use RCU at all, and adding potential
long latencies ?

Just use for_each_netdev()

This way, a preemption is still allowed.

Also, I am not sure we need ipv6_find_idev()

__in6_dev_get() should be OK.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet March 28, 2013, 1:45 p.m. UTC | #2
On Thu, 2013-03-28 at 06:44 -0700, Eric Dumazet wrote:

> __in6_dev_get() should be OK.

more exactly : __in_dev_get_rtnl()



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Balakumaran Kannan March 28, 2013, 2:04 p.m. UTC | #3
On Thu, Mar 28, 2013 at 7:15 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2013-03-28 at 06:44 -0700, Eric Dumazet wrote:
>
>> __in6_dev_get() should be OK.
>
> more exactly : __in_dev_get_rtnl()
>
>
>

Thank you for your comments. I'll update the patch and send it.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Balakumaran Kannan March 28, 2013, 2:47 p.m. UTC | #4
On Thu, Mar 28, 2013 at 7:34 PM, Balakumaran Kannan
<kumaran.4353@gmail.com> wrote:
> On Thu, Mar 28, 2013 at 7:15 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Thu, 2013-03-28 at 06:44 -0700, Eric Dumazet wrote:
>>
>>> __in6_dev_get() should be OK.
>>
>> more exactly : __in_dev_get_rtnl()
>>
>>
>>
>
> Thank you for your comments. I'll update the patch and send it.

As __in_dev_get_rtnl returns IPv4 specific data (ip_ptr), we have to
use __in6_dev_get to get IPv6 specific data (ip6_ptr).
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney March 28, 2013, 6:23 p.m. UTC | #5
On Thu, Mar 28, 2013 at 06:44:25AM -0700, Eric Dumazet wrote:
> On Thu, 2013-03-28 at 18:57 +0530, Balakumaran Kannan wrote:
> > IPv6 Routing table becomes broken once we do ifdown, ifup of the loopback(lo)
> > interface. After down-up, routes of other interface's IPv6 addresses through
> > 'lo' are lost.
> > 
> 
> >  	add_addr(idev, &in6addr_loopback, 128, IFA_HOST);
> > +
> > +	/* Add routes to other interface's IPv6 addresses */
> > +	rcu_read_lock();
> > +	for_each_netdev_rcu(dev_net(dev), sp_dev) {
> > +
> 
> We hold RTNL at this point, so why use RCU at all, and adding potential
> long latencies ?
> 
> Just use for_each_netdev()
> 
> This way, a preemption is still allowed.

Agreed, there is no point in using RCU when it is not needed.

That said...

Since v3.1, in CONFIG_PREEMPT=y kernels, rcu_read_lock() does not
disable preemption.  In CONFIG_PREEMPT=n kernels, rcu_read_lock() does
disable preemption, but to no effect because preemption is already
disabled anyway.

The net effect is that rcu_read_lock() has no effect on preemption.

							Thanx, Paul

> Also, I am not sure we need ipv6_find_idev()
> 
> __in6_dev_get() should be OK.
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet March 28, 2013, 6:49 p.m. UTC | #6
On Thu, 2013-03-28 at 11:23 -0700, Paul E. McKenney wrote:

> 
> Agreed, there is no point in using RCU when it is not needed.
> 
> That said...
> 
> Since v3.1, in CONFIG_PREEMPT=y kernels, rcu_read_lock() does not
> disable preemption.  In CONFIG_PREEMPT=n kernels, rcu_read_lock() does
> disable preemption, but to no effect because preemption is already
> disabled anyway.
> 
> The net effect is that rcu_read_lock() has no effect on preemption.
> 

Good point, but this patch might be a stable candidate.

Rule of thumb for networking is 

1) Control path : RTNL mutex

2) Data path : RTNL cant be taken (from softirq), so use RCU if
possible.

Thanks !


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney March 28, 2013, 7:39 p.m. UTC | #7
On Thu, Mar 28, 2013 at 11:49:37AM -0700, Eric Dumazet wrote:
> On Thu, 2013-03-28 at 11:23 -0700, Paul E. McKenney wrote:
> 
> > 
> > Agreed, there is no point in using RCU when it is not needed.
> > 
> > That said...
> > 
> > Since v3.1, in CONFIG_PREEMPT=y kernels, rcu_read_lock() does not
> > disable preemption.  In CONFIG_PREEMPT=n kernels, rcu_read_lock() does
> > disable preemption, but to no effect because preemption is already
> > disabled anyway.
> > 
> > The net effect is that rcu_read_lock() has no effect on preemption.
> > 
> 
> Good point, but this patch might be a stable candidate.
> 
> Rule of thumb for networking is 
> 
> 1) Control path : RTNL mutex
> 
> 2) Data path : RTNL cant be taken (from softirq), so use RCU if
> possible.

Stable candidate and rules of thumb sound good to me!

							Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- linux-3.9-rc4/net/ipv6/addrconf.c.orig	2013-03-27 10:40:26.382569527 +0530
+++ linux-3.9-rc4/net/ipv6/addrconf.c	2013-03-28 18:29:00.492241840 +0530
@@ -2529,6 +2529,9 @@  static void sit_add_v4_addrs(struct inet
 static void init_loopback(struct net_device *dev)
 {
 	struct inet6_dev  *idev;
+	struct net_device *sp_dev;
+	struct inet6_ifaddr *sp_ifa;
+	struct rt6_info *sp_rt;

 	/* ::1 */

@@ -2540,6 +2543,33 @@  static void init_loopback(struct net_dev
 	}

 	add_addr(idev, &in6addr_loopback, 128, IFA_HOST);
+
+	/* Add routes to other interface's IPv6 addresses */
+	rcu_read_lock();
+	for_each_netdev_rcu(dev_net(dev), sp_dev) {
+
+		if (!strcmp(sp_dev->name, dev->name))
+			continue;
+
+		idev = ipv6_find_idev(sp_dev);
+		if (NULL == idev)
+			continue;
+
+		read_lock_bh(&idev->lock);
+		list_for_each_entry(sp_ifa, &idev->addr_list, if_list) {
+
+			if (sp_ifa->flags & (IFA_F_DADFAILED | IFA_F_TENTATIVE))
+				continue;
+
+			sp_rt = addrconf_dst_alloc(idev, &sp_ifa->addr, 0);
+
+			/* Failure cases are ignored */
+			if (!IS_ERR(sp_rt))
+				ip6_ins_rt(sp_rt);
+		}
+		read_unlock_bh(&idev->lock);
+	}
+	rcu_read_unlock();
 }

 static void addrconf_add_linklocal(struct inet6_dev *idev, const