From patchwork Fri Jul 16 13:19:46 2010
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?q?Ilpo_J=C3=A4rvinen?=
X-Patchwork-Id: 59104
X-Patchwork-Delegate: davem@davemloft.net
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id 0F13BB7086
	for ; Fri, 16 Jul 2010 23:20:23 +1000 (EST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S965615Ab0GPNTu (ORCPT ); Fri, 16 Jul 2010 09:19:50 -0400
Received: from courier.cs.helsinki.fi ([128.214.9.1]:46479 "EHLO
	mail.cs.helsinki.fi" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S965605Ab0GPNTs (ORCPT );
	Fri, 16 Jul 2010 09:19:48 -0400
Received: from melkinpaasi.cs.helsinki.fi (melkinpaasi.cs.helsinki.fi [128.214.9.14])
	(TLS: TLSv1/SSLv3,256bits,AES256-SHA)
	by mail.cs.helsinki.fi with esmtp; Fri, 16 Jul 2010 16:19:47 +0300
	id 00093F50.4C405C73.00003DBD
Date: Fri, 16 Jul 2010 16:19:46 +0300 (EEST)
From: "=?ISO-8859-15?Q?Ilpo_J=E4rvinen?="
X-X-Sender: ijjarvin@melkinpaasi.cs.helsinki.fi
To: Lennart Schulte, "David S. Miller"
cc: Eric Dumazet, Tejun Heo, lkml, "netdev@vger.kernel.org",
	"Fehrmann, Henning", Carsten Aulbert
Subject: Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15
In-Reply-To: <4C404FC5.6040107@nets.rwth-aachen.de>
Message-ID:
References: <4C358AAA.9080400@kernel.org>
	<4C3EF7EA.2040900@nets.rwth-aachen.de>
	<1279195528.2496.2.camel@edumazet-laptop>
	<4C3F053F.7090704@nets.rwth-aachen.de>
	<4C404FC5.6040107@nets.rwth-aachen.de>
User-Agent: Alpine 2.00 (DEB 1167 2008-08-23)
MIME-Version: 1.0
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

On Fri, 16 Jul 2010, Lennart Schulte wrote:

> On 16.07.2010 14:02, Ilpo Järvinen wrote:
> >
> > > > > [ 2754.413150] NULL head, pkts 0
> > > > > [ 2754.413156] Errors caught so far 1
> > > > >
> > Thanks for reporting the results.
> >
> > Could you post the oops too or double check do the timestamps really match
> > (and there wasn't more "Errors caught" prints in between)? Since this
> > condition doesn't seem to crash the kernel as also send_head should be
> > NULL, which saves the day here exiting the loop (unless send head would
> > too be corrupt).

Doh, I think we already deref skb to get sacked (that wouldn't be
absolutely necessary, but it is better not to trust side effects), so it
certainly is bad even with the send_head exit.

> I can try to do some more testing, perhaps then I will get other results. But
> until now I've always gotten something like above.

It might then be useful to remove the if (!caught_it) check, which was there
to prevent infinite printout in case the problem persisted forever (now
w/o the crash); there's no evidence of that so far.

> With the debug patch the kernel doesn't crash, but I have an oops from a run
> before the patch:

Right, no crash of course, stupid me :-).
Let's start with this (I'm not sure if this helps Tejun's case, but I much
doubt it does):

---
[PATCH] tcp: fix crash in tcp_xmit_retransmit_queue

It can happen that there are no packets in queue while calling
tcp_xmit_retransmit_queue(). tcp_write_queue_head() then returns
NULL and that gets deref'ed to get sacked into a local var.

There is no work to do if no packets are outstanding so we just
exit early.

There may still be another bug affecting this same function.

Signed-off-by: Ilpo Järvinen
Reported-by: Lennart Schulte
---
 net/ipv4/tcp_output.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b4ed957..7ed9dc1 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2208,6 +2208,9 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 	int mib_idx;
 	int fwd_rexmitting = 0;
 
+	if (!tp->packets_out)
+		return;
+
 	if (!tp->lost_out)
 		tp->retransmit_high = tp->snd_una;
 
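
For anyone following along without a kernel tree at hand, the effect of the
guard can be sketched in user space. This is only a simplified model, not
kernel code: struct model_skb, struct model_sock, model_queue_head() and
model_retransmit_queue() are hypothetical stand-ins invented for illustration,
and the only thing taken from the patch is the idea of returning early when
packets_out is zero, before the queue head is dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; NOT kernel definitions. */
struct model_skb {
	unsigned int sacked;       /* models the per-skb sacked flags */
	struct model_skb *next;    /* next skb in the write queue, or NULL */
};

struct model_sock {
	struct model_skb *head;    /* NULL when the write queue is empty */
	unsigned int packets_out;  /* models tp->packets_out */
};

/* Models a queue-head lookup that returns NULL on an empty queue. */
static const struct model_skb *model_queue_head(const struct model_sock *sk)
{
	assert(sk != NULL);
	return sk->head;
}

/*
 * Walks the queue and returns how many skbs were visited.
 * Returns 0 immediately when nothing is outstanding.
 */
static int model_retransmit_queue(const struct model_sock *sk)
{
	const struct model_skb *skb;
	unsigned int sacked;
	int visited = 0;

	/* The guard: nothing outstanding means no work and no head to read. */
	if (!sk->packets_out)
		return 0;

	skb = model_queue_head(sk);
	/* Without the guard this read would hit NULL on an empty queue. */
	sacked = skb->sacked;
	(void)sacked;

	for (; skb != NULL; skb = skb->next)
		visited++;

	return visited;
}
```

Calling model_retransmit_queue() on a socket with head == NULL and
packets_out == 0 returns 0 without ever touching the head pointer; with two
queued skbs and packets_out == 2 it walks both and returns 2.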