Patchwork tunnels: Fix tunnels change rcu protection

Submitter Pavel Emelyanov
Date Oct. 27, 2010, 3:43 p.m.
Message ID <4CC848B9.1060406@parallels.com>
Permalink /patch/69367/
State Accepted
Delegated to: David Miller

Comments

Pavel Emelyanov - Oct. 27, 2010, 3:43 p.m.
The conversion of the tunnels (ipip, gre, sit and ip6) to RCU protection
introduced a bug into the SIOCCHGTUNNEL code.

The tunnel is first unlinked, then its addresses are changed, then it is
linked back, probably into another bucket. But while the parms are being
changed, the hash table is not locked against readers, so they can look up
the wrong tunnel.

Respective commits are b7285b79 (ipip: get rid of ipip_lock), 1507850b
(gre: get rid of ipgre_lock), 3a43be3c (sit: get rid of ipip6_lock) and
94767632 (ip6tnl: get rid of ip6_tnl_lock).

The quick fix is to wait for a quiescent state to pass after unlinking,
but if that is inappropriate I can come up with something better, just
let me know.
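
The ordering described above (unlink, wait, change, relink) can be
sketched as a small userspace model. This is only an illustration under
stated assumptions: struct tunnel, hash_addr(), wait_for_readers() and the
other names are hypothetical and are not the kernel's own; the real
drivers use their own hash functions and call synchronize_net() where this
model merely counts the wait.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_SIZE 16

/* Hypothetical, simplified stand-in for struct ip_tunnel. */
struct tunnel {
	uint32_t saddr;
	uint32_t daddr;
	struct tunnel *next;
};

static struct tunnel *buckets[HASH_SIZE];
static int grace_periods_waited;

/* Toy hash for illustration only. */
static unsigned int hash_addr(uint32_t saddr, uint32_t daddr)
{
	return (saddr ^ daddr) & (HASH_SIZE - 1);
}

static void tunnel_link(struct tunnel *t)
{
	unsigned int h = hash_addr(t->saddr, t->daddr);

	t->next = buckets[h];
	buckets[h] = t;
}

/* t->next is left intact so a reader already on t can keep walking. */
static void tunnel_unlink(struct tunnel *t)
{
	struct tunnel **pp = &buckets[hash_addr(t->saddr, t->daddr)];

	for (; *pp; pp = &(*pp)->next) {
		if (*pp == t) {
			*pp = t->next;
			break;
		}
	}
}

/*
 * Stand-in for synchronize_net(). In the kernel this blocks until all
 * readers that might still see the unlinked tunnel have finished; this
 * single-threaded model only records that the wait took place.
 */
static void wait_for_readers(void)
{
	grace_periods_waited++;
}

static struct tunnel *tunnel_lookup(uint32_t saddr, uint32_t daddr)
{
	struct tunnel *t;

	for (t = buckets[hash_addr(saddr, daddr)]; t; t = t->next)
		if (t->saddr == saddr && t->daddr == daddr)
			return t;
	return NULL;
}

/* The addresses are only touched once no reader can still hold t. */
static void tunnel_change(struct tunnel *t, uint32_t saddr, uint32_t daddr)
{
	tunnel_unlink(t);
	wait_for_readers();
	t->saddr = saddr;
	t->daddr = daddr;
	tunnel_link(t);
}
```

Without the wait, a reader that found the tunnel in its old bucket could
observe the addresses half-changed; with it, the fields are rewritten only
while the tunnel is unreachable.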

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Oct. 27, 2010, 7:02 p.m.
On Wednesday, 27 October 2010 at 19:43 +0400, Pavel Emelyanov wrote:
> The conversion of the tunnels (ipip, gre, sit and ip6) to RCU protection
> introduced a bug into the SIOCCHGTUNNEL code.
> 
> The tunnel is first unlinked, then its addresses are changed, then it is
> linked back, probably into another bucket. But while the parms are being
> changed, the hash table is not locked against readers, so they can look up
> the wrong tunnel.
> 
> Respective commits are b7285b79 (ipip: get rid of ipip_lock), 1507850b
> (gre: get rid of ipgre_lock), 3a43be3c (sit: get rid of ipip6_lock) and
> 94767632 (ip6tnl: get rid of ip6_tnl_lock).
> 
> The quick fix is to wait for a quiescent state to pass after unlinking,
> but if that is inappropriate I can come up with something better, just
> let me know.
> 
> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

Good catch, I missed that a change was possible at all :(

I guess some setups could scream with this fix, since this adds a
synchronize_net() call while holding RTNL ...

Hmm, maybe we should allocate a "struct ip_tunnel_parm" instead of using
an embedded one (in struct ip_tunnel), and stick an rcu_head in it to
delay its freeing...
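
One possible reading of this suggestion, sketched as a userspace model
rather than kernel code: the parameters live in a separately allocated
object, an update publishes a fresh copy, and the old copy is freed only
later. Everything here is an assumption made for illustration: struct
parms, tunnel_set_addrs() and grace_period_elapsed() are hypothetical
names, the free_next field stands in for an rcu_head, and C11 atomics
stand in for the kernel's RCU pointer primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a separately allocated struct ip_tunnel_parm. */
struct parms {
	uint32_t saddr;
	uint32_t daddr;
	struct parms *free_next;	/* plays the role of the rcu_head */
};

struct tunnel {
	_Atomic(struct parms *) parms;
};

/* Old copies waiting for a grace period before they may be freed. */
static struct parms *deferred;

/* Reader side: returns one complete snapshot, never a half-updated one. */
static struct parms *tunnel_parms(struct tunnel *t)
{
	return atomic_load_explicit(&t->parms, memory_order_acquire);
}

/*
 * Writer side: build a new copy, publish it, queue the old copy for
 * deferred freeing. Returns 0 on success, -1 if allocation fails.
 */
static int tunnel_set_addrs(struct tunnel *t, uint32_t saddr, uint32_t daddr)
{
	struct parms *fresh = malloc(sizeof(*fresh));
	struct parms *old;

	if (!fresh)
		return -1;
	fresh->saddr = saddr;
	fresh->daddr = daddr;
	fresh->free_next = NULL;

	old = atomic_exchange_explicit(&t->parms, fresh, memory_order_acq_rel);
	if (old) {
		old->free_next = deferred;
		deferred = old;
	}
	return 0;
}

/*
 * Stand-in for the callback that would run after a grace period.
 * Frees every queued copy and returns how many were freed.
 */
static int grace_period_elapsed(void)
{
	int freed = 0;

	while (deferred) {
		struct parms *p = deferred;

		deferred = p->free_next;
		free(p);
		freed++;
	}
	return freed;
}
```

In this model the writer never blocks, which is the property that would
avoid a synchronous wait under RTNL; a reader holding an old snapshot
keeps seeing consistent addresses until the deferred free runs.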




Eric Dumazet - Oct. 27, 2010, 7:06 p.m.
On Wednesday, 27 October 2010 at 21:02 +0200, Eric Dumazet wrote:


> Hmm, maybe we should allocate a "struct ip_tunnel_parm" instead of using
> an embedded one (in struct ip_tunnel), and stick an rcu_head in it to
> delay its freeing...
> 

I forgot to Ack your patch, of course.

We can implement something better when net-next-2.6 re-opens.

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>



David Miller - Oct. 27, 2010, 9:21 p.m.
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 27 Oct 2010 21:06:12 +0200

> On Wednesday, 27 October 2010 at 21:02 +0200, Eric Dumazet wrote:
> 
> 
>> Hmm, maybe we should allocate a "struct ip_tunnel_parm" instead of using
>> an embedded one (in struct ip_tunnel), and stick an rcu_head in it to
>> delay its freeing...
>> 
> 
> I forgot to Ack your patch, of course.
> 
> We can implement something better when net-next-2.6 re-opens.
> 
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

Patch

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index d0ffcbe..01087e0 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -1072,6 +1072,7 @@  ipgre_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 					break;
 				}
 				ipgre_tunnel_unlink(ign, t);
+				synchronize_net();
 				t->parms.iph.saddr = p.iph.saddr;
 				t->parms.iph.daddr = p.iph.daddr;
 				t->parms.i_key = p.i_key;
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index e9b816e..cd300aa 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -676,6 +676,7 @@  ipip_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 				}
 				t = netdev_priv(dev);
 				ipip_tunnel_unlink(ipn, t);
+				synchronize_net();
 				t->parms.iph.saddr = p.iph.saddr;
 				t->parms.iph.daddr = p.iph.daddr;
 				memcpy(dev->dev_addr, &p.iph.saddr, 4);
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 38b9a56..2a59610 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1284,6 +1284,7 @@  ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 				t = netdev_priv(dev);
 
 			ip6_tnl_unlink(ip6n, t);
+			synchronize_net();
 			err = ip6_tnl_change(t, &p);
 			ip6_tnl_link(ip6n, t);
 			netdev_state_change(dev);
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 367a6cc..d6bfaec 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -963,6 +963,7 @@  ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 				}
 				t = netdev_priv(dev);
 				ipip6_tunnel_unlink(sitn, t);
+				synchronize_net();
 				t->parms.iph.saddr = p.iph.saddr;
 				t->parms.iph.daddr = p.iph.daddr;
 				memcpy(dev->dev_addr, &p.iph.saddr, 4);