Message ID | 1389919279.31367.439.camel@edumazet-glaptop2.roam.corp.google.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Thu, 16 Jan 2014 16:41:19 -0800 > This patch : > > 1) Remove a dst leak if DST_NOCACHE was set on dst > Fix this by holding a reference only if dst really cached. Can you explain this problem a little bit more? Why do we have to handle DST_NOCACHE specially? We hold a reference and dst_release() knows what to do with DST_NOCACHE routes. Or is it semantically undesirable for tunnels to cache these routes? If so, why do we leave sockets caching DST_NOCACHE routes just fine? Thanks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2014-01-16 at 17:13 -0800, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Thu, 16 Jan 2014 16:41:19 -0800 > > > This patch : > > > > 1) Remove a dst leak if DST_NOCACHE was set on dst > > Fix this by holding a reference only if dst really cached. > > Can you explain this problem a little bit more? > > Why do we have to handle DST_NOCACHE specially? We hold a reference > and dst_release() knows what to do with DST_NOCACHE routes. > > Or is it semantically undesirable for tunnels to cache these routes? > If so, why do we leave sockets caching DST_NOCACHE routes just fine? Previous code was doing in the callers of __tunnel_dst_set() a dst_clone(dst) Then, we were doing in __tunnel_dst_set() : if (dst && (dst->flags & DST_NOCACHE)) dst = NULL; When setting dst to NULL here, we forgot to release the reference we took in callers. After my patch, we instead do the dst_clone(dst) only in the case we are not clearing dst : + if (dst) { + if (dst->flags & DST_NOCACHE) + dst = NULL; + else + dst_clone(dst); + } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2014-01-16 at 17:13 -0800, David Miller wrote: > Why do we have to handle DST_NOCACHE specially? We hold a reference > and dst_release() knows what to do with DST_NOCACHE routes. > > Or is it semantically undesirable for tunnels to cache these routes? > If so, why do we leave sockets caching DST_NOCACHE routes just fine? If DST_NOCACHE is set on a dst, this dst cannot be used by rcu users, because dst_release() will immediately free the dst, without rcu grace period. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Thu, 16 Jan 2014 17:32:30 -0800 > On Thu, 2014-01-16 at 17:13 -0800, David Miller wrote: > >> Why do we have to handle DST_NOCACHE specially? We hold a reference >> and dst_release() knows what to do with DST_NOCACHE routes. >> >> Or is it semantically undesirable for tunnels to cache these routes? >> If so, why do we leave sockets caching DST_NOCACHE routes just fine? > > If DST_NOCACHE is set on a dst, this dst cannot be used by rcu users, > because dst_release() will immediately free the dst, without rcu grace > period. Ok, if we ever start using DST_NOCACHE in ipv6 we will have to modify ipv6 tunnels as they cache similarly. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 16, 2014 at 4:41 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > From: Eric Dumazet <edumazet@google.com> > > This patch : > > 1) Remove a dst leak if DST_NOCACHE was set on dst > Fix this by holding a reference only if dst really cached. > > 2) Remove a lockdep warning in __tunnel_dst_set() > This was reported by Cong Wang. > > 3) Remove usage of a spinlock where xchg() is enough > > 4) Remove some spurious inline keywords. > Let compiler decide for us. 5) remove the unless NULL'ing for percpu pointers. :) Looks good. Thanks! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Thu, 16 Jan 2014 16:41:19 -0800 > From: Eric Dumazet <edumazet@google.com> > > This patch : > > 1) Remove a dst leak if DST_NOCACHE was set on dst > Fix this by holding a reference only if dst really cached. > > 2) Remove a lockdep warning in __tunnel_dst_set() > This was reported by Cong Wang. > > 3) Remove usage of a spinlock where xchg() is enough > > 4) Remove some spurious inline keywords. > Let compiler decide for us. > > Fixes: 7d442fab0a67 ("ipv4: Cache dst in tunnels") > Signed-off-by: Eric Dumazet <edumazet@google.com> Applied, thanks Eric. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h index cd729becbb07..48ed75c21260 100644 --- a/include/net/ip_tunnels.h +++ b/include/net/ip_tunnels.h @@ -40,7 +40,6 @@ struct ip_tunnel_prl_entry { struct ip_tunnel_dst { struct dst_entry __rcu *dst; - spinlock_t lock; }; struct ip_tunnel { diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c index d3929a69f008..432c28ab3197 100644 --- a/net/ipv4/ip_tunnel.c +++ b/net/ipv4/ip_tunnel.c @@ -68,27 +68,27 @@ static unsigned int ip_tunnel_hash(struct ip_tunnel_net *itn, IP_TNL_HASH_BITS); } -static inline void __tunnel_dst_set(struct ip_tunnel_dst *idst, - struct dst_entry *dst) +static void __tunnel_dst_set(struct ip_tunnel_dst *idst, + struct dst_entry *dst) { struct dst_entry *old_dst; - if (dst && (dst->flags & DST_NOCACHE)) - dst = NULL; - - spin_lock_bh(&idst->lock); - old_dst = rcu_dereference(idst->dst); - rcu_assign_pointer(idst->dst, dst); + if (dst) { + if (dst->flags & DST_NOCACHE) + dst = NULL; + else + dst_clone(dst); + } + old_dst = xchg((__force struct dst_entry **)&idst->dst, dst); dst_release(old_dst); - spin_unlock_bh(&idst->lock); } -static inline void tunnel_dst_set(struct ip_tunnel *t, struct dst_entry *dst) +static void tunnel_dst_set(struct ip_tunnel *t, struct dst_entry *dst) { __tunnel_dst_set(this_cpu_ptr(t->dst_cache), dst); } -static inline void tunnel_dst_reset(struct ip_tunnel *t) +static void tunnel_dst_reset(struct ip_tunnel *t) { tunnel_dst_set(t, NULL); } @@ -101,7 +101,7 @@ static void tunnel_dst_reset_all(struct ip_tunnel *t) __tunnel_dst_set(per_cpu_ptr(t->dst_cache, i), NULL); } -static inline struct dst_entry *tunnel_dst_get(struct ip_tunnel *t) +static struct dst_entry *tunnel_dst_get(struct ip_tunnel *t) { struct dst_entry *dst; @@ -413,7 +413,7 @@ static int ip_tunnel_bind_dev(struct net_device *dev) if (!IS_ERR(rt)) { tdev = rt->dst.dev; - tunnel_dst_set(tunnel, dst_clone(&rt->dst)); + tunnel_dst_set(tunnel, &rt->dst); ip_rt_put(rt); } if (dev->type != ARPHRD_ETHER) @@ -668,7 +668,7 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev, goto tx_error; } if (connected) - tunnel_dst_set(tunnel, dst_clone(&rt->dst)); + tunnel_dst_set(tunnel, &rt->dst); } if (rt->dst.dev == dev) { @@ -1066,12 +1066,6 @@ int ip_tunnel_init(struct net_device *dev) return -ENOMEM; } - for_each_possible_cpu(i) { - struct ip_tunnel_dst *idst = per_cpu_ptr(tunnel->dst_cache, i); - idst-> dst = NULL; - spin_lock_init(&idst->lock); - } - err = gro_cells_init(&tunnel->gro_cells, dev); if (err) { free_percpu(tunnel->dst_cache);