Message ID: 20100810083040.GB6801@basil.fritz.box
State: RFC, archived
Delegated to: David Miller
On Tue, Aug 10, 2010 at 10:30:40AM +0200, Andi Kleen wrote:
> This simple patch demonstrates double destroy. I have patches for showing
> the more complicated case too, but they're much more ugly.

Andi, I know you're seeing the problem, but I need to understand
why, and this patch doesn't really answer the why part :)

So did you figure out who was calling it first (I presume you know
who called it the second time, since you've got the backtrace)?

Thanks,
On Tue, Aug 10, 2010 at 06:24:21AM -0400, Herbert Xu wrote:
> On Tue, Aug 10, 2010 at 10:30:40AM +0200, Andi Kleen wrote:
> > This simple patch demonstrates double destroy. I have patches for showing
> > the more complicated case too, but they're much more ugly.
>
> Andi, I know you're seeing the problem, but I need to understand
> why, and this patch doesn't really answer the why part :)
>
> So did you figure out who was calling it first (I presume you know
> who called it the second time, since you've got the backtrace)?

Yes, I stored the backtrace of the first caller in the ugly debug
patches and dumped it on the second destroy. It was tcp_done the
first time.

Also did the same for tcp_sk() and there it was the fin sending.

I agree that tcp_close() should skip it in theory, but I saw it
anyway :/

-Andi
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
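[Editor's note: the "store the first caller's backtrace, dump it on the second destroy" technique Andi describes can be sketched in userspace C. The sock_dbg struct and destroy_sock() function are hypothetical stand-ins for the kernel's tcp_sock and tcp_v4_destroy_sock; the kernel debug patch would use the in-kernel stacktrace API rather than glibc's backtrace().]

/* Sketch: remember the backtrace of the first destroy so that a
 * second destroy can report who did the first one. */
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define TRACE_DEPTH 16

struct sock_dbg {                       /* hypothetical stand-in for tcp_sock */
	int destroyed;
	void *first_trace[TRACE_DEPTH];
	int first_depth;
};

static void destroy_sock(struct sock_dbg *sk)
{
	if (sk->destroyed) {
		/* Second destroy: dump the trace recorded the first time. */
		fprintf(stderr, "double destroy! first destroy was at:\n");
		backtrace_symbols_fd(sk->first_trace, sk->first_depth, 2);
		abort();
	}
	sk->destroyed = 1;
	sk->first_depth = backtrace(sk->first_trace, TRACE_DEPTH);
	/* ... actual teardown would go here ... */
}

int main(void)
{
	struct sock_dbg sk = { 0 };

	destroy_sock(&sk);              /* first destroy: trace recorded */
	printf("first destroy ok, trace depth %d\n", sk.first_depth);
	/* destroy_sock(&sk); */        /* a second call would trip the check */
	return 0;
}

[In the kernel the same effect is had with save_stack_trace() into a per-socket buffer; glibc's backtrace() is used here only so the sketch is runnable standalone.]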
On Tue, Aug 10, 2010 at 12:32:06PM +0200, Andi Kleen wrote:
> Yes, I stored the backtrace of the first caller in the ugly debug
> patches and dumped it on the second destroy. It was tcp_done the
> first time.
>
> Also did the same for tcp_sk() and there it was the fin sending.
>
> I agree that tcp_close() should skip it in theory, but I saw it
> anyway :/

So I presume the second caller was tcp_close? That means we have a
serious bug in our stack: if the socket is already in the CLOSE state,
tcp_close should have short-circuited. This means that something is
changing the TCP socket state after it has gone into CLOSE.

Can you try adding your debug backtrace patch to tcp_set_state to see
whether anybody is indeed changing the socket state after it goes into
CLOSE (and, more importantly, who is changing it)?

Thanks,
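[Editor's note: Herbert's suggestion, instrumenting the state-change path to catch anyone moving a socket out of CLOSE, can be sketched as below. The enum values, sock_dbg struct, and dbg_set_state() helper are illustrative only; in the kernel the check would sit inside tcp_set_state() itself.]

/* Sketch: flag (with a backtrace) any state transition out of CLOSE. */
#include <execinfo.h>
#include <stdio.h>

enum tcp_state { TCP_ESTABLISHED = 1, TCP_FIN_WAIT1, TCP_CLOSE };

struct sock_dbg {                       /* hypothetical stand-in for a socket */
	enum tcp_state state;
};

static void dbg_set_state(struct sock_dbg *sk, enum tcp_state new_state)
{
	if (sk->state == TCP_CLOSE && new_state != TCP_CLOSE) {
		void *trace[16];
		int depth = backtrace(trace, 16);

		/* This is the suspicious case Herbert wants caught:
		 * someone reviving a socket that already reached CLOSE. */
		fprintf(stderr, "state change %d -> %d after CLOSE, caller:\n",
			sk->state, new_state);
		backtrace_symbols_fd(trace, depth, 2);
	}
	sk->state = new_state;
}

int main(void)
{
	struct sock_dbg sk = { .state = TCP_ESTABLISHED };

	dbg_set_state(&sk, TCP_CLOSE);      /* normal teardown */
	dbg_set_state(&sk, TCP_FIN_WAIT1);  /* suspicious: flagged on stderr */
	printf("done\n");
	return 0;
}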
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index a778ee0..bfbb06b 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -459,6 +459,8 @@ struct tcp_sock {
 	 * contains related tcp_cookie_transactions fields.
 	 */
 	struct tcp_cookie_values  *cookie_values;
+
+	int	destroyed;
 };
 
 static inline struct tcp_sock *tcp_sk(const struct sock *sk)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 0207662..25bf80a 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1919,6 +1919,9 @@ void tcp_v4_destroy_sock(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
+	BUG_ON(tp->destroyed != 0);
+	tp->destroyed = 1;
+
 	tcp_clear_xmit_timers(sk);
 
 	tcp_cleanup_congestion_control(sk);