Message ID | 1348092082.31352.51.camel@edumazet-glaptop |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Thu, Sep 20, 2012 at 12:01:22AM +0200, Eric Dumazet wrote: > I spent some time trying to track this bug, but found nothing so far. > > The timer->function are never cleared by TCP stack at tear down, and > should be set before fd is installed and can be caught by other threads. > > Most likely its a refcounting issue... > > Following debugging patch might trigger a bug sooner ? 4 hours so far, nothing.. I'll leave it run overnight, but it doesn't seem to have made much difference if any. Dave -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 19, 2012 at 10:02:23PM -0400, Dave Jones wrote: > On Thu, Sep 20, 2012 at 12:01:22AM +0200, Eric Dumazet wrote: > > > I spent some time trying to track this bug, but found nothing so far. > > > > The timer->function are never cleared by TCP stack at tear down, and > > should be set before fd is installed and can be caught by other threads. > > > > Most likely its a refcounting issue... > > > > Following debugging patch might trigger a bug sooner ? > > 4 hours so far, nothing.. I'll leave it run overnight, but it doesn't seem to have > made much difference if any. One of my over-weekend runs hit this again, but it didn't trigger the WARN that your patch added :-/ Dave -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2012-09-24 at 11:39 -0400, Dave Jones wrote: > On Wed, Sep 19, 2012 at 10:02:23PM -0400, Dave Jones wrote: > > On Thu, Sep 20, 2012 at 12:01:22AM +0200, Eric Dumazet wrote: > > > > > I spent some time trying to track this bug, but found nothing so far. > > > > > > The timer->function are never cleared by TCP stack at tear down, and > > > should be set before fd is installed and can be caught by other threads. > > > > > > Most likely its a refcounting issue... > > > > > > Following debugging patch might trigger a bug sooner ? > > > > 4 hours so far, nothing.. I'll leave it run overnight, but it doesn't seem to have > > made much difference if any. > > One of my over-weekend runs hit this again, but it didn't trigger the WARN that > your patch added :-/ > > Dave > OK, I believe I found the reason. I Will post a patch. open a raw socket AF_INET, TCP_PROTO + connect() ->sk_state set to TCP_ESTABLISHED + setsockopt( SO_KEEPALIVE, &on) -> crash Thanks -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/net/sock.h b/include/net/sock.h index 84bdaec..5d3ad5b 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -511,18 +511,18 @@ static inline void sock_hold(struct sock *sk) */ static inline void __sock_put(struct sock *sk) { - atomic_dec(&sk->sk_refcnt); + int newcnt = atomic_dec_return(&sk->sk_refcnt); + + WARN_ON(newcnt <= 0); } static inline bool sk_del_node_init(struct sock *sk) { bool rc = __sk_del_node_init(sk); - if (rc) { - /* paranoid for a while -acme */ - WARN_ON(atomic_read(&sk->sk_refcnt) == 1); + if (rc) __sock_put(sk); - } + return rc; } #define sk_del_node_init_rcu(sk) sk_del_node_init(sk) @@ -1620,7 +1620,10 @@ static inline void sk_filter_charge(struct sock *sk, struct sk_filter *fp) /* Ungrab socket and destroy it, if it was the last reference. */ static inline void sock_put(struct sock *sk) { - if (atomic_dec_and_test(&sk->sk_refcnt)) + int newcnt = atomic_dec_return(&sk->sk_refcnt); + + WARN_ON(newcnt < 0); + if (newcnt == 0) sk_free(sk); }