diff mbox

[net-next] ipv4: fix a dst leak in tunnels

Message ID 1389919279.31367.439.camel@edumazet-glaptop2.roam.corp.google.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Jan. 17, 2014, 12:41 a.m. UTC
From: Eric Dumazet <edumazet@google.com>

This patch :

1) Remove a dst leak if DST_NOCACHE was set on dst
   Fix this by holding a reference only if dst really cached.

2) Remove a lockdep warning in __tunnel_dst_set()
    This was reported by Cong Wang.

3) Remove usage of a spinlock where xchg() is enough

4) Remove some spurious inline keywords.
   Let compiler decide for us.

Fixes: 7d442fab0a67 ("ipv4: Cache dst in tunnels")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
---
 include/net/ip_tunnels.h |    1 -
 net/ipv4/ip_tunnel.c     |   34 ++++++++++++++--------------------
 2 files changed, 14 insertions(+), 21 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller Jan. 17, 2014, 1:13 a.m. UTC | #1
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 16 Jan 2014 16:41:19 -0800

> This patch :
> 
> 1) Remove a dst leak if DST_NOCACHE was set on dst
>    Fix this by holding a reference only if dst really cached.

Can you explain this problem a little bit more?

Why do we have to handle DST_NOCACHE specially?  We hold a reference
and dst_release() knows what to do with DST_NOCACHE routes.

Or is it semantically undesirable for tunnels to cache these routes?
If so, why do we leave sockets caching DST_NOCACHE routes just fine?

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Jan. 17, 2014, 1:29 a.m. UTC | #2
On Thu, 2014-01-16 at 17:13 -0800, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 16 Jan 2014 16:41:19 -0800
> 
> > This patch :
> > 
> > 1) Remove a dst leak if DST_NOCACHE was set on dst
> >    Fix this by holding a reference only if dst really cached.
> 
> Can you explain this problem a little bit more?
> 
> Why do we have to handle DST_NOCACHE specially?  We hold a reference
> and dst_release() knows what to do with DST_NOCACHE routes.
> 
> Or is it semantically undesirable for tunnels to cache these routes?
> If so, why do we leave sockets caching DST_NOCACHE routes just fine?

Previous code was doing in the callers of __tunnel_dst_set() a
dst_clone(dst)

Then, we were doing in __tunnel_dst_set() :

       if (dst && (dst->flags & DST_NOCACHE))
               dst = NULL;

When setting dst to NULL here, we forgot to release the reference we
took in callers.

After my patch, we instead do the dst_clone(dst) only in the case we are
not clearing dst :

+       if (dst) {
+               if (dst->flags & DST_NOCACHE)
+                       dst = NULL;
+               else
+                       dst_clone(dst);
+       }



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Jan. 17, 2014, 1:32 a.m. UTC | #3
On Thu, 2014-01-16 at 17:13 -0800, David Miller wrote:

> Why do we have to handle DST_NOCACHE specially?  We hold a reference
> and dst_release() knows what to do with DST_NOCACHE routes.
> 
> Or is it semantically undesirable for tunnels to cache these routes?
> If so, why do we leave sockets caching DST_NOCACHE routes just fine?

If DST_NOCACHE is set on a dst, this dst cannot be used by rcu users,
because dst_release() will immediately free the dst, without rcu grace
period.




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Jan. 17, 2014, 1:40 a.m. UTC | #4
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 16 Jan 2014 17:32:30 -0800

> On Thu, 2014-01-16 at 17:13 -0800, David Miller wrote:
> 
>> Why do we have to handle DST_NOCACHE specially?  We hold a reference
>> and dst_release() knows what to do with DST_NOCACHE routes.
>> 
>> Or is it semantically undesirable for tunnels to cache these routes?
>> If so, why do we leave sockets caching DST_NOCACHE routes just fine?
> 
> If DST_NOCACHE is set on a dst, this dst cannot be used by rcu users,
> because dst_release() will immediately free the dst, without rcu grace
> period.

Ok, if we ever start using DST_NOCACHE in ipv6 we will have to modify
ipv6 tunnels as they cache similarly.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cong Wang Jan. 17, 2014, 10:18 p.m. UTC | #5
On Thu, Jan 16, 2014 at 4:41 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> This patch :
>
> 1) Remove a dst leak if DST_NOCACHE was set on dst
>    Fix this by holding a reference only if dst really cached.
>
> 2) Remove a lockdep warning in __tunnel_dst_set()
>     This was reported by Cong Wang.
>
> 3) Remove usage of a spinlock where xchg() is enough
>
> 4) Remove some spurious inline keywords.
>    Let compiler decide for us.

5) remove the unless NULL'ing for percpu pointers. :)

Looks good.

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Jan. 18, 2014, 2:37 a.m. UTC | #6
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 16 Jan 2014 16:41:19 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> This patch :
> 
> 1) Remove a dst leak if DST_NOCACHE was set on dst
>    Fix this by holding a reference only if dst really cached.
> 
> 2) Remove a lockdep warning in __tunnel_dst_set()
>     This was reported by Cong Wang.
> 
> 3) Remove usage of a spinlock where xchg() is enough
> 
> 4) Remove some spurious inline keywords.
>    Let compiler decide for us.
> 
> Fixes: 7d442fab0a67 ("ipv4: Cache dst in tunnels")
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index cd729becbb07..48ed75c21260 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -40,7 +40,6 @@  struct ip_tunnel_prl_entry {
 
 struct ip_tunnel_dst {
 	struct dst_entry __rcu 		*dst;
-	spinlock_t			lock;
 };
 
 struct ip_tunnel {
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index d3929a69f008..432c28ab3197 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -68,27 +68,27 @@  static unsigned int ip_tunnel_hash(struct ip_tunnel_net *itn,
 			 IP_TNL_HASH_BITS);
 }
 
-static inline void __tunnel_dst_set(struct ip_tunnel_dst *idst,
-				    struct dst_entry *dst)
+static void __tunnel_dst_set(struct ip_tunnel_dst *idst,
+			     struct dst_entry *dst)
 {
 	struct dst_entry *old_dst;
 
-	if (dst && (dst->flags & DST_NOCACHE))
-		dst = NULL;
-
-	spin_lock_bh(&idst->lock);
-	old_dst = rcu_dereference(idst->dst);
-	rcu_assign_pointer(idst->dst, dst);
+	if (dst) {
+		if (dst->flags & DST_NOCACHE)
+			dst = NULL;
+		else
+			dst_clone(dst);
+	}
+	old_dst = xchg((__force struct dst_entry **)&idst->dst, dst);
 	dst_release(old_dst);
-	spin_unlock_bh(&idst->lock);
 }
 
-static inline void tunnel_dst_set(struct ip_tunnel *t, struct dst_entry *dst)
+static void tunnel_dst_set(struct ip_tunnel *t, struct dst_entry *dst)
 {
 	__tunnel_dst_set(this_cpu_ptr(t->dst_cache), dst);
 }
 
-static inline void tunnel_dst_reset(struct ip_tunnel *t)
+static void tunnel_dst_reset(struct ip_tunnel *t)
 {
 	tunnel_dst_set(t, NULL);
 }
@@ -101,7 +101,7 @@  static void tunnel_dst_reset_all(struct ip_tunnel *t)
 		__tunnel_dst_set(per_cpu_ptr(t->dst_cache, i), NULL);
 }
 
-static inline struct dst_entry *tunnel_dst_get(struct ip_tunnel *t)
+static struct dst_entry *tunnel_dst_get(struct ip_tunnel *t)
 {
 	struct dst_entry *dst;
 
@@ -413,7 +413,7 @@  static int ip_tunnel_bind_dev(struct net_device *dev)
 
 		if (!IS_ERR(rt)) {
 			tdev = rt->dst.dev;
-			tunnel_dst_set(tunnel, dst_clone(&rt->dst));
+			tunnel_dst_set(tunnel, &rt->dst);
 			ip_rt_put(rt);
 		}
 		if (dev->type != ARPHRD_ETHER)
@@ -668,7 +668,7 @@  void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 			goto tx_error;
 		}
 		if (connected)
-			tunnel_dst_set(tunnel, dst_clone(&rt->dst));
+			tunnel_dst_set(tunnel, &rt->dst);
 	}
 
 	if (rt->dst.dev == dev) {
@@ -1066,12 +1066,6 @@  int ip_tunnel_init(struct net_device *dev)
 		return -ENOMEM;
 	}
 
-	for_each_possible_cpu(i) {
-		struct ip_tunnel_dst *idst = per_cpu_ptr(tunnel->dst_cache, i);
-		idst-> dst = NULL;
-		spin_lock_init(&idst->lock);
-	}
-
 	err = gro_cells_init(&tunnel->gro_cells, dev);
 	if (err) {
 		free_percpu(tunnel->dst_cache);