Message ID | alpine.DEB.2.02.1405140233210.16893@dtop |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Wed, 2014-05-14 at 02:57 -0700, dormando wrote: > Hi, > > Given a machine with frequently changing routes (ie; a router with an > active internet BGP table and multiple interfaces), there're at least > several places where obsolete dst's are handled improperly. If I pause the > route changes, the crashes appear to stop. This first one has a crash > utility we've made, so I was able to more quickly find a patch and test > it. The others take time to reproduce. > > I'm testing against 3.10.39, but I think if these were fixed they'd be > backported to stable? I've also had recent 3.12's running that have > crashed in the same spots. Anyway correct me if I'm wrong... Is this a vanilla kernel ? I never had any issues like that. I wonder if you have some RCU issues. static inline struct dst_entry * sk_dst_get(struct sock *sk) { struct dst_entry *dst; rcu_read_lock(); dst = rcu_dereference(sk->sk_dst_cache); if (dst) dst_hold(dst); rcu_read_unlock(); return dst; } static inline void __sk_dst_set(struct sock *sk, struct dst_entry *dst) { struct dst_entry *old_dst; sk_tx_queue_clear(sk); /* * This can be called while sk is owned by the caller only, * with no state that can be checked in a rcu_dereference_check() cond */ old_dst = rcu_dereference_raw(sk->sk_dst_cache); rcu_assign_pointer(sk->sk_dst_cache, dst); dst_release(old_dst); } static inline void sk_dst_set(struct sock *sk, struct dst_entry *dst) { spin_lock(&sk->sk_dst_lock); __sk_dst_set(sk, dst); spin_unlock(&sk->sk_dst_lock); } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 14, 2014, at 2:57, dormando wrote: > Given a machine with frequently changing routes (ie; a router with an > active internet BGP table and multiple interfaces), there're at least > several places where obsolete dst's are handled improperly. If I pause > the > route changes, the crashes appear to stop. This first one has a crash > utility we've made, so I was able to more quickly find a patch and test > it. The others take time to reproduce. > > I'm testing against 3.10.39, but I think if these were fixed they'd be > backported to stable? I've also had recent 3.12's running that have > crashed in the same spots. Anyway correct me if I'm wrong... Just a hunch: You use macvlan? Could you somehow try without? Maybe... some ref overflow? (You could add some testing code in dst_hold with atomic_inc_return and WARN_ON). dst_release already contains such a check, so I am not sure at all if that could happen. Bye, Hannes -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On Wed, 2014-05-14 at 02:57 -0700, dormando wrote: > > Hi, > > > > Given a machine with frequently changing routes (ie; a router with an > > active internet BGP table and multiple interfaces), there're at least > > several places where obsolete dst's are handled improperly. If I pause the > > route changes, the crashes appear to stop. This first one has a crash > > utility we've made, so I was able to more quickly find a patch and test > > it. The others take time to reproduce. > > > > I'm testing against 3.10.39, but I think if these were fixed they'd be > > backported to stable? I've also had recent 3.12's running that have > > crashed in the same spots. Anyway correct me if I'm wrong... > > Is this a vanilla kernel ? I never had any issues like that. > > I wonder if you have some RCU issues. > > static inline struct dst_entry * > sk_dst_get(struct sock *sk) > { > struct dst_entry *dst; > > rcu_read_lock(); > dst = rcu_dereference(sk->sk_dst_cache); > if (dst) > dst_hold(dst); > rcu_read_unlock(); > return dst; > } > > static inline void > __sk_dst_set(struct sock *sk, struct dst_entry *dst) > { > struct dst_entry *old_dst; > > sk_tx_queue_clear(sk); > /* > * This can be called while sk is owned by the caller only, > * with no state that can be checked in a rcu_dereference_check() cond > */ > old_dst = rcu_dereference_raw(sk->sk_dst_cache); > rcu_assign_pointer(sk->sk_dst_cache, dst); > dst_release(old_dst); > } > > static inline void > sk_dst_set(struct sock *sk, struct dst_entry *dst) > { > spin_lock(&sk->sk_dst_lock); > __sk_dst_set(sk, dst); > spin_unlock(&sk->sk_dst_lock); > } > > > > We have some minor patches, but I've removed them before and they still happen. I'd crashed a vanilla 3.12 + just the stable patches recently I think. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On Wed, May 14, 2014, at 2:57, dormando wrote: > > Given a machine with frequently changing routes (ie; a router with an > > active internet BGP table and multiple interfaces), there're at least > > several places where obsolete dst's are handled improperly. If I pause > > the > > route changes, the crashes appear to stop. This first one has a crash > > utility we've made, so I was able to more quickly find a patch and test > > it. The others take time to reproduce. > > > > I'm testing against 3.10.39, but I think if these were fixed they'd be > > backported to stable? I've also had recent 3.12's running that have > > crashed in the same spots. Anyway correct me if I'm wrong... > > Just a hunch: > You use macvlan? Could you somehow try without? > Maybe... some ref overflow? (You could add some testing code in dst_hold > with atomic_inc_return and WARN_ON). > > dst_release already contains such a check, so I am not sure at all if > that could happen. > > Bye, > > Hannes > We've seen the crashes with macvlan removed. Don't think I've explicitly removed it recently or for the udp crash, but I'm sorta doubting that'd make a difference. and yeah, pretty weird right? it's like the RCU isn't working.. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 14, 2014 at 04:40:18AM -0700, Eric Dumazet wrote: > On Wed, 2014-05-14 at 02:57 -0700, dormando wrote: > > Hi, > > > > Given a machine with frequently changing routes (ie; a router with an > > active internet BGP table and multiple interfaces), there're at least > > several places where obsolete dst's are handled improperly. If I pause the > > route changes, the crashes appear to stop. This first one has a crash > > utility we've made, so I was able to more quickly find a patch and test > > it. The others take time to reproduce. > > > > I'm testing against 3.10.39, but I think if these were fixed they'd be > > backported to stable? I've also had recent 3.12's running that have > > crashed in the same spots. Anyway correct me if I'm wrong... > > Is this a vanilla kernel ? I never had any issues like that. > > I wonder if you have some RCU issues. > > static inline struct dst_entry * > sk_dst_get(struct sock *sk) > { > struct dst_entry *dst; > > rcu_read_lock(); > dst = rcu_dereference(sk->sk_dst_cache); > if (dst) > dst_hold(dst); > rcu_read_unlock(); > return dst; > } > > static inline void > __sk_dst_set(struct sock *sk, struct dst_entry *dst) > { > struct dst_entry *old_dst; > > sk_tx_queue_clear(sk); > /* > * This can be called while sk is owned by the caller only, > * with no state that can be checked in a rcu_dereference_check() cond > */ > old_dst = rcu_dereference_raw(sk->sk_dst_cache); > rcu_assign_pointer(sk->sk_dst_cache, dst); > dst_release(old_dst); It is probably just be me getting lost in the code, but I am not seeing a synchronize_rcu(), call_rcu(), or synchronize_net() anywhere in dst_release() or the things that it calls. If there really isn't such a call, then I don't see how the above code is safe in the case where __sk_dst_set() is invoked on one CPU just after sk_dst_get() executes the rcu_dereference() on some other CPU. Thanx, Paul > } > > static inline void > sk_dst_set(struct sock *sk, struct dst_entry *dst) > { > spin_lock(&sk->sk_dst_lock); > __sk_dst_set(sk, dst); > spin_unlock(&sk->sk_dst_lock); > } > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2014-06-06 at 11:12 -0700, Paul E. McKenney wrote: > It is probably just be me getting lost in the code, but I am not seeing > a synchronize_rcu(), call_rcu(), or synchronize_net() anywhere in > dst_release() or the things that it calls. If there really isn't such > a call, then I don't see how the above code is safe in the case where > __sk_dst_set() is invoked on one CPU just after sk_dst_get() executes > the rcu_dereference() on some other CPU. Well, this part is fine, dst_release() do not free dst that are potentially stored in sk_dst_cache Only the refcount is decremented. The bug is elsewhere, we had another thread raising this issue on netdev this morning. I am cooking a patch to clear the mess. Thanks -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jun 06, 2014 at 11:57:10AM -0700, Eric Dumazet wrote: > On Fri, 2014-06-06 at 11:12 -0700, Paul E. McKenney wrote: > > > It is probably just be me getting lost in the code, but I am not seeing > > a synchronize_rcu(), call_rcu(), or synchronize_net() anywhere in > > dst_release() or the things that it calls. If there really isn't such > > a call, then I don't see how the above code is safe in the case where > > __sk_dst_set() is invoked on one CPU just after sk_dst_get() executes > > the rcu_dereference() on some other CPU. > > Well, this part is fine, dst_release() do not free dst that are > potentially stored in sk_dst_cache > > Only the refcount is decremented. > > The bug is elsewhere, we had another thread raising this issue on netdev > this morning. > > I am cooking a patch to clear the mess. Ah, thank you, sorry for the noise! Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index cd124c4..b9acfab 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -953,8 +953,12 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, } else if (!ipc.oif) ipc.oif = inet->uc_index; - if (connected) + if (connected) { + lock_sock(sk); rt = (struct rtable *)sk_dst_check(sk, 0); + if (rt) + release_sock(sk); + } if (rt == NULL) { struct net *net = sock_net(sk); @@ -972,15 +976,22 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, rt = NULL; if (err == -ENETUNREACH) IP_INC_STATS(net, IPSTATS_MIB_OUTNOROUTES); + if (connected) + release_sock(sk); goto out; } err = -EACCES; if ((rt->rt_flags & RTCF_BROADCAST) && - !sock_flag(sk, SOCK_BROADCAST)) + !sock_flag(sk, SOCK_BROADCAST)) { + if (connected) + release_sock(sk); goto out; - if (connected) + } + if (connected) { sk_dst_set(sk, dst_clone(&rt->dst)); + release_sock(sk); + } } if (msg->msg_flags&MSG_CONFIRM)