Message ID | 1363455366.29475.66.camel@edumazet-glaptop |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Sat, 2013-03-16 at 10:36 -0700, Eric Dumazet wrote: > On Fri, 2013-03-15 at 00:19 +0100, Eric Dumazet wrote: > > > Thanks thats really useful, we might miss to increment socket refcount > > in a timer setup. > > > > Hmm, please add following debugging patch as well > > diff --git a/include/net/sock.h b/include/net/sock.h > index 14f6e9d..fe7c8a6 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -530,7 +530,9 @@ static inline void sock_hold(struct sock *sk) > */ > static inline void __sock_put(struct sock *sk) > { > - atomic_dec(&sk->sk_refcnt); > + int newref = atomic_dec_return(&sk->sk_refcnt); > + > + BUG_ON(newref <= 0); > } > > static inline bool sk_del_node_init(struct sock *sk) > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c > index 786d97a..a445e15 100644 > --- a/net/ipv4/inet_connection_sock.c > +++ b/net/ipv4/inet_connection_sock.c > @@ -739,7 +739,7 @@ void inet_csk_prepare_forced_close(struct sock *sk) > { > /* sk_clone_lock locked the socket and set refcnt to 2 */ > bh_unlock_sock(sk); > - sock_put(sk); > + __sock_put(sk); > > /* The below has to be done to allow calling inet_csk_destroy_sock */ > sock_set_flag(sk, SOCK_DEAD); > @@ -835,13 +835,13 @@ void inet_csk_listen_stop(struct sock *sk) > * tcp_v4_destroy_sock(). > */ > tcp_sk(child)->fastopen_rsk = NULL; > - sock_put(sk); > + __sock_put(sk); > } > inet_csk_destroy_sock(child); > > bh_unlock_sock(child); > local_bh_enable(); > - sock_put(child); > + __sock_put(child); > Please don't include the last line : this should stay as sock_put(child); -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On Sat, 2013-03-16 at 10:36 -0700, Eric Dumazet wrote: > > On Fri, 2013-03-15 at 00:19 +0100, Eric Dumazet wrote: > > > > > Thanks thats really useful, we might miss to increment socket refcount > > > in a timer setup. > > > > > > > Hmm, please add following debugging patch as well > > > > diff --git a/include/net/sock.h b/include/net/sock.h > > index 14f6e9d..fe7c8a6 100644 > > --- a/include/net/sock.h > > +++ b/include/net/sock.h > > @@ -530,7 +530,9 @@ static inline void sock_hold(struct sock *sk) > > */ > > static inline void __sock_put(struct sock *sk) > > { > > - atomic_dec(&sk->sk_refcnt); > > + int newref = atomic_dec_return(&sk->sk_refcnt); > > + > > + BUG_ON(newref <= 0); > > } > > > > static inline bool sk_del_node_init(struct sock *sk) > > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c > > index 786d97a..a445e15 100644 > > --- a/net/ipv4/inet_connection_sock.c > > +++ b/net/ipv4/inet_connection_sock.c > > @@ -739,7 +739,7 @@ void inet_csk_prepare_forced_close(struct sock *sk) > > { > > /* sk_clone_lock locked the socket and set refcnt to 2 */ > > bh_unlock_sock(sk); > > - sock_put(sk); > > + __sock_put(sk); > > > > /* The below has to be done to allow calling inet_csk_destroy_sock */ > > sock_set_flag(sk, SOCK_DEAD); > > @@ -835,13 +835,13 @@ void inet_csk_listen_stop(struct sock *sk) > > * tcp_v4_destroy_sock(). > > */ > > tcp_sk(child)->fastopen_rsk = NULL; > > - sock_put(sk); > > + __sock_put(sk); > > } > > inet_csk_destroy_sock(child); > > > > bh_unlock_sock(child); > > local_bh_enable(); > > - sock_put(child); > > + __sock_put(child); > > > > Please don't include the last line : this should stay as > > sock_put(child); > thanks! Will take at least 24 hours to get the trace. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Mar 16, 2013 at 10:36:06AM -0700, Eric Dumazet wrote: > On Fri, 2013-03-15 at 00:19 +0100, Eric Dumazet wrote: > > > Thanks thats really useful, we might miss to increment socket refcount > > in a timer setup. > > > > Hmm, please add following debugging patch as well > > diff --git a/include/net/sock.h b/include/net/sock.h > index 14f6e9d..fe7c8a6 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -530,7 +530,9 @@ static inline void sock_hold(struct sock *sk) > */ > static inline void __sock_put(struct sock *sk) > { > - atomic_dec(&sk->sk_refcnt); > + int newref = atomic_dec_return(&sk->sk_refcnt); > + > + BUG_ON(newref <= 0); > } Couldn't it also be a free from sock_wfree where the wmem accounting went wrong? It does not care about reference counts there. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Mar 17, 2013 at 07:39:48AM +0100, Hannes Frederic Sowa wrote: > On Sat, Mar 16, 2013 at 10:36:06AM -0700, Eric Dumazet wrote: > > On Fri, 2013-03-15 at 00:19 +0100, Eric Dumazet wrote: > > > > > Thanks thats really useful, we might miss to increment socket refcount > > > in a timer setup. > > > > > > > Hmm, please add following debugging patch as well > > > > diff --git a/include/net/sock.h b/include/net/sock.h > > index 14f6e9d..fe7c8a6 100644 > > --- a/include/net/sock.h > > +++ b/include/net/sock.h > > @@ -530,7 +530,9 @@ static inline void sock_hold(struct sock *sk) > > */ > > static inline void __sock_put(struct sock *sk) > > { > > - atomic_dec(&sk->sk_refcnt); > > + int newref = atomic_dec_return(&sk->sk_refcnt); > > + > > + BUG_ON(newref <= 0); > > } > > Couldn't it also be a free from sock_wfree where the wmem accounting went > wrong? It does not care about reference counts there. nvm, it had to be in the stacktrace then. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On Sat, 2013-03-16 at 10:36 -0700, Eric Dumazet wrote: > > On Fri, 2013-03-15 at 00:19 +0100, Eric Dumazet wrote: > > > > > Thanks thats really useful, we might miss to increment socket refcount > > > in a timer setup. > > > > > > > Hmm, please add following debugging patch as well > > > > diff --git a/include/net/sock.h b/include/net/sock.h > > index 14f6e9d..fe7c8a6 100644 > > --- a/include/net/sock.h > > +++ b/include/net/sock.h > > @@ -530,7 +530,9 @@ static inline void sock_hold(struct sock *sk) > > */ > > static inline void __sock_put(struct sock *sk) > > { > > - atomic_dec(&sk->sk_refcnt); > > + int newref = atomic_dec_return(&sk->sk_refcnt); > > + > > + BUG_ON(newref <= 0); > > } > > > > static inline bool sk_del_node_init(struct sock *sk) > > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c > > index 786d97a..a445e15 100644 > > --- a/net/ipv4/inet_connection_sock.c > > +++ b/net/ipv4/inet_connection_sock.c > > @@ -739,7 +739,7 @@ void inet_csk_prepare_forced_close(struct sock *sk) > > { > > /* sk_clone_lock locked the socket and set refcnt to 2 */ > > bh_unlock_sock(sk); > > - sock_put(sk); > > + __sock_put(sk); > > > > /* The below has to be done to allow calling inet_csk_destroy_sock */ > > sock_set_flag(sk, SOCK_DEAD); > > @@ -835,13 +835,13 @@ void inet_csk_listen_stop(struct sock *sk) > > * tcp_v4_destroy_sock(). > > */ > > tcp_sk(child)->fastopen_rsk = NULL; > > - sock_put(sk); > > + __sock_put(sk); > > } > > inet_csk_destroy_sock(child); > > > > bh_unlock_sock(child); > > local_bh_enable(); > > - sock_put(child); > > + __sock_put(child); > > > > Please don't include the last line : this should stay as > > sock_put(child); Hope you don't mind a screenshot: http://www.dormando.me/p/3.8.2-trace-crash.jpg (I put the patches on 3.8.2). box is on another continent so screenshot via IPMI is what I get. If this isn't enough or isn't right I'll try harder to get the trace logged, I guess? Thanks! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, 2013-03-17 at 02:21 -0700, dormando wrote: > Hope you don't mind a screenshot: > http://www.dormando.me/p/3.8.2-trace-crash.jpg > > (I put the patches on 3.8.2). box is on another continent so screenshot > via IPMI is what I get. If this isn't enough or isn't right I'll try > harder to get the trace logged, I guess? Thanks a lot, this gives another useful input, no need for more traces for the moment. Could you send me the disassembly of tcp_release_cb() ? (objdump -d vmlinux | filter to get tcp_release_cb() body) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, 2013-03-17 at 09:33 -0700, Eric Dumazet wrote: > On Sun, 2013-03-17 at 02:21 -0700, dormando wrote: > > > Hope you don't mind a screenshot: > > http://www.dormando.me/p/3.8.2-trace-crash.jpg > > > > (I put the patches on 3.8.2). box is on another continent so screenshot > > via IPMI is what I get. If this isn't enough or isn't right I'll try > > harder to get the trace logged, I guess? > > Thanks a lot, this gives another useful input, no need for more traces > for the moment. > > Could you send me the disassembly of tcp_release_cb() ? > > (objdump -d vmlinux | filter to get tcp_release_cb() body) > > It seems we can have a refcnt imbalance because of atomic_set(&newsk->sk_refcnt, 2); we do in sk_clone_lock() Somehow, something is wrong, because at this point the socket should not be found. We perform a sock_hold() somewhere while the socket is already dead. I'll send a patch asap -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/net/sock.h b/include/net/sock.h index 14f6e9d..fe7c8a6 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -530,7 +530,9 @@ static inline void sock_hold(struct sock *sk) */ static inline void __sock_put(struct sock *sk) { - atomic_dec(&sk->sk_refcnt); + int newref = atomic_dec_return(&sk->sk_refcnt); + + BUG_ON(newref <= 0); } static inline bool sk_del_node_init(struct sock *sk) diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 786d97a..a445e15 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -739,7 +739,7 @@ void inet_csk_prepare_forced_close(struct sock *sk) { /* sk_clone_lock locked the socket and set refcnt to 2 */ bh_unlock_sock(sk); - sock_put(sk); + __sock_put(sk); /* The below has to be done to allow calling inet_csk_destroy_sock */ sock_set_flag(sk, SOCK_DEAD); @@ -835,13 +835,13 @@ void inet_csk_listen_stop(struct sock *sk) * tcp_v4_destroy_sock(). */ tcp_sk(child)->fastopen_rsk = NULL; - sock_put(sk); + __sock_put(sk); } inet_csk_destroy_sock(child); bh_unlock_sock(child); local_bh_enable(); - sock_put(child); + __sock_put(child); sk_acceptq_removed(sk); __reqsk_free(req);