From patchwork Wed Jul 3 01:04:27 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 256526
X-Patchwork-Delegate: davem@davemloft.net
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
 by ozlabs.org (Postfix) with ESMTP id 9A3982C007A
 for ; Wed, 3 Jul 2013 11:04:34 +1000 (EST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1752552Ab3GCBEa (ORCPT ); Tue, 2 Jul 2013 21:04:30 -0400
Received: from mail-pd0-f176.google.com ([209.85.192.176]:64843 "EHLO
 mail-pd0-f176.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1751641Ab3GCBEa (ORCPT );
 Tue, 2 Jul 2013 21:04:30 -0400
Received: by mail-pd0-f176.google.com with SMTP id t12so4047551pdi.7
 for ; Tue, 02 Jul 2013 18:04:29 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=gmail.com; s=20120113;
 h=message-id:subject:from:to:cc:date:in-reply-to:references
 :content-type:x-mailer:content-transfer-encoding:mime-version;
 bh=5zvnRLptgTVNHdPrZjImXzAviv5aeuXJw4X1oqASILM=;
 b=HIGh5Kyx2gRahovh6NDEszBPSQKQDTk6j5pbumEYArxxaI0etF2lVA+lgTOoas4D5i
 xQ+PtCvVeQVB/HicfSmlaCijx3ISZYuKimnQQeGvdIADPqCV4H8yJ2XOtJWGDhl3hhiZ
 2Ji+rNaXRLrDtjao3rJ6Gq2LEOZyCMcCCyh3rFczsLb9b0lBHTp+E45ifRRXooc+0QIE
 FSGmjg1PtU0+l4rPNQLE8+jLk9GxjjV0/3P+FL9YyeiDL+YQPOXzaBeCPS17sbAPj5/E
 NK/lz/UVRiWhRAuVJUbcvYLqu7PyKIVwm9q4MZVDpjTid5obZ0fPpqK7RRkfq1IJeWAv
 vmAA==
X-Received: by 10.66.121.195 with SMTP id lm3mr254142pab.116.1372813469596;
 Tue, 02 Jul 2013 18:04:29 -0700 (PDT)
Received: from [172.19.241.215] ([172.19.241.215])
 by mx.google.com with ESMTPSA id dg3sm29377941pbc.24.2013.07.02.18.04.28
 for (version=SSLv3 cipher=RC4-SHA bits=128/128);
 Tue, 02 Jul 2013 18:04:29 -0700 (PDT)
Message-ID: <1372813467.4979.46.camel@edumazet-glaptop>
Subject: Re: 3.9.5+: Crash in tcp_input.c:4810.
From: Eric Dumazet
To: Ben Greear
Cc: netdev
Date: Tue, 02 Jul 2013 18:04:27 -0700
In-Reply-To: <51D1C620.8030007@candelatech.com>
References: <51BF50B3.1080403@candelatech.com>
 <1371493059.3252.200.camel@edumazet-glaptop>
 <51D1C620.8030007@candelatech.com>
X-Mailer: Evolution 3.2.3-0ubuntu6
Mime-Version: 1.0
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

On Mon, 2013-07-01 at 11:10 -0700, Ben Greear wrote:

> offset: -1459 start: -1146162927 seq: -1146161468 size: 16047 copy: 3576
> ...
>
> There were 80 total splats of this nature grouped together, and then
> the system recovered and continue to function normally as far as I
> can tell. The later splats are a bit farther apart...maybe the
> TCP connection is dying.
>
> It appears my 'work-around' is poor at best, but I'd rather kill
> a TCP connection and spam the logs than crash the OS.
>
> I'd be more than happy to add more/different debugging code.

It would be nice to pinpoint the origin of the bug. Really.

This BUG_ON() is at least 7 years old. I do not think the invariant has
changed?

Sure, we can avoid crashes, but it looks like we could randomly corrupt
TCP payload or other kernel memory if it turns out it's caused by a
buggy driver.

Is it happening while collapsing the receive queue, or the ofo queue?

In the receive queue, every skb2 that follows an skb1 must satisfy

    TCP_SKB_CB(skb1)->end_seq >= TCP_SKB_CB(skb2)->seq

Only on the ofo queue can this be violated, and that case should be
handled properly in tcp_collapse_ofo_queue().

---
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 28af45a..d77f1f0 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4457,7 +4457,12 @@ restart:
 			int offset = start - TCP_SKB_CB(skb)->seq;
 			int size = TCP_SKB_CB(skb)->end_seq - start;
 
-			BUG_ON(offset < 0);
+			if (unlikely(offset < 0)) {
+				pr_err("tcp_collapse() bug on %s offset:%d size:%d copy:%d skb->len %u truesize %u, nskb->len %u\n",
+				       list == &sk->sk_receive_queue ? "receive_queue" : "ofo_queue",
+				       offset, size, copy, skb->len, skb->truesize, nskb->len);
+				return;
+			}
 			if (size > 0) {
 				size = min(copy, size);
 				if (skb_copy_bits(skb, offset, skb_put(nskb, size), size))