From patchwork Tue Sep 21 15:58:25 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Eric Dumazet X-Patchwork-Id: 65329 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3EB38B70A3 for ; Wed, 22 Sep 2010 01:59:16 +1000 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757075Ab0IUP6e (ORCPT ); Tue, 21 Sep 2010 11:58:34 -0400 Received: from mail-fx0-f46.google.com ([209.85.161.46]:41235 "EHLO mail-fx0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753469Ab0IUP6c (ORCPT ); Tue, 21 Sep 2010 11:58:32 -0400 Received: by fxm3 with SMTP id 3so1507649fxm.19 for ; Tue, 21 Sep 2010 08:58:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:subject:from:to:cc :in-reply-to:references:content-type:date:message-id:mime-version :x-mailer:content-transfer-encoding; bh=i7vR5ZlmeHTyIvCLn3AcG91g0iv0dFCPc2C1QHi2iDo=; b=Cvvho9TpWfpheaXf/FoSYwNgFYhOs+Xd9v4tQylUDUzbP2BklAjQW6VO1Al22kysHB aXAGHPSe2EEA4bOtnkFPildwjwZRB7Xf/3UhrBC626nj76uEAPyf7gj306PbrQ1Ds1Li AvUcm9Dw0YTrgIkOIV2cybPItuoAtUxOWC8lg= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=subject:from:to:cc:in-reply-to:references:content-type:date :message-id:mime-version:x-mailer:content-transfer-encoding; b=FSM1wH+Nv3W05Y8XnrmmL3aAUfIxu3hL2CKmox8kWWMon6c3iOKWAupGLaVQy5RqNo j8FOGloRa/+vYkTnjKMIN2PdeCYokMjooQ3TmG8qyr2YCze1qdCLzwM2aKn1DA69m1Wp ExGfRFnSEYaqtwH5mqQmFvdyMhwes+0+5VTkQ= Received: by 10.223.104.135 with SMTP id p7mr6404906fao.33.1285084710215; Tue, 21 Sep 2010 08:58:30 -0700 (PDT) Received: from [10.150.51.210] (gw0.net.jmsp.net [212.23.165.14]) by mx.google.com with ESMTPS id r10sm3655966faq.5.2010.09.21.08.58.27 (version=SSLv3 cipher=RC4-MD5); Tue, 21 Sep 2010 08:58:28 -0700 (PDT) Subject: [PATCH v3] ip: fix truesize mismatch in ip fragmentation From: Eric Dumazet To: Nick Bowler Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org, "David S. Miller" , Jarek Poplawski , Patrick McHardy In-Reply-To: <1285078613.2617.503.camel@edumazet-laptop> References: <20100920174443.GA5515@elliptictech.com> <1285006844.2323.17.camel@edumazet-laptop> <20100920195256.GA14330@elliptictech.com> <1285013853.2323.148.camel@edumazet-laptop> <1285018272.2323.243.camel@edumazet-laptop> <20100921140501.GA21572@elliptictech.com> <1285078613.2617.503.camel@edumazet-laptop> Date: Tue, 21 Sep 2010 17:58:25 +0200 Message-ID: <1285084705.2617.636.camel@edumazet-laptop> Mime-Version: 1.0 X-Mailer: Evolution 2.28.3 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Le mardi 21 septembre 2010 à 16:16 +0200, Eric Dumazet a écrit : > Le mardi 21 septembre 2010 à 10:05 -0400, Nick Bowler a écrit : > > > This hunk introduces some whitespace damage. > > > > Anyway, I tried this with ESP on both IPv4 and IPv6 and it appears to > > correct the issue. Thanks! > > > > Indeed good catch. > > Here is an updated patch, I added your Tested-by > > Thanks for testing ! > > [PATCH] ip : fix truesize mismatch in ip fragmentation > > We should not set frag->destructor to sock_wkfree() until we are sure we > dont hit slow path in ip_fragment(). Or we risk uncharging > frag->truesize twice, and in the end, having negative socket > sk_wmem_alloc counter, or even freeing socket sooner than expected. > > Many thanks to Nick Bowler, who provided a very clean bug report and > test program. > > While Nick bisection pointed to commit 2b85a34e911bf483 (net: No more > expensive sock_hold()/sock_put() on each tx), underlying bug is older. > Hmm, while looking at git history, I found commit from Patrick b2722b1c3a893e (ip_fragment: also adjust skb->truesize for packets not owned by a socket) As my patch reverts it, we probably want a more polished patch. (Also port Patrick work to ipv6) So here is a V3 [PATCH v3] ip: fix truesize mismatch in ip fragmentation Special care should be taken when slow path is hit in ip_fragment() : When walking through frags, we transfert truesize ownership from skb to frags. Then if we hit a slow_path condition, we must undo this or risk uncharging frags->truesize twice, and in the end, having negative socket sk_wmem_alloc counter, or even freeing socket sooner than expected. Many thanks to Nick Bowler, who provided a very clean bug report and test program. Thanks to Jarek for reviewing my first patch and providing a V2 While Nick bisection pointed to commit 2b85a34e911 (net: No more expensive sock_hold()/sock_put() on each tx), underlying bug is older (2.6.12-rc5) A side effect is to extend work done in commit b2722b1c3a893e (ip_fragment: also adjust skb->truesize for packets not owned by a socket) to ipv6 as well. Reported-and-bisected-by: Nick Bowler Tested-by: Nick Bowler Signed-off-by: Eric Dumazet CC: Jarek Poplawski CC: Patrick McHardy --- net/ipv4/ip_output.c | 19 +++++++++++++------ net/ipv6/ip6_output.c | 18 +++++++++++++----- 2 files changed, 26 insertions(+), 11 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 04b6989..a643f7a 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -488,9 +488,8 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *)) * we can switch to copy when see the first bad fragment. */ if (skb_has_frags(skb)) { - struct sk_buff *frag; + struct sk_buff *frag, *frag2; int first_len = skb_pagelen(skb); - int truesizes = 0; if (first_len - hlen > mtu || ((first_len - hlen) & 7) || @@ -503,18 +502,18 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *)) if (frag->len > mtu || ((frag->len & 7) && frag->next) || skb_headroom(frag) < hlen) - goto slow_path; + goto slow_path_undo; /* Partially cloned skb? */ if (skb_shared(frag)) - goto slow_path; + goto slow_path_undo; BUG_ON(frag->sk); if (skb->sk) { frag->sk = skb->sk; frag->destructor = sock_wfree; } - truesizes += frag->truesize; + skb->truesize -= frag->truesize; } /* Everything is OK. Generate! */ @@ -524,7 +523,6 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *)) frag = skb_shinfo(skb)->frag_list; skb_frag_list_init(skb); skb->data_len = first_len - skb_headlen(skb); - skb->truesize -= truesizes; skb->len = first_len; iph->tot_len = htons(first_len); iph->frag_off = htons(IP_MF); @@ -576,6 +574,15 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *)) } IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGFAILS); return err; + +slow_path_undo: + skb_walk_frags(skb, frag2) { + if (frag2 == frag) + break; + frag2->sk = NULL; + frag2->destructor = NULL; + skb->truesize += frag2->truesize; + } } slow_path: diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index d40b330..ca7ba44 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -639,7 +639,7 @@ static int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *)) if (skb_has_frags(skb)) { int first_len = skb_pagelen(skb); - int truesizes = 0; + struct sk_buff *frag2; if (first_len - hlen > mtu || ((first_len - hlen) & 7) || @@ -651,18 +651,18 @@ static int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *)) if (frag->len > mtu || ((frag->len & 7) && frag->next) || skb_headroom(frag) < hlen) - goto slow_path; + goto slow_path_undo; /* Partially cloned skb? */ if (skb_shared(frag)) - goto slow_path; + goto slow_path_undo; BUG_ON(frag->sk); if (skb->sk) { frag->sk = skb->sk; frag->destructor = sock_wfree; - truesizes += frag->truesize; } + skb->truesize -= frag->truesize; } err = 0; @@ -693,7 +693,6 @@ static int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *)) first_len = skb_pagelen(skb); skb->data_len = first_len - skb_headlen(skb); - skb->truesize -= truesizes; skb->len = first_len; ipv6_hdr(skb)->payload_len = htons(first_len - sizeof(struct ipv6hdr)); @@ -756,6 +755,15 @@ static int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *)) IPSTATS_MIB_FRAGFAILS); dst_release(&rt->dst); return err; + +slow_path_undo: + skb_walk_frags(skb, frag2) { + if (frag2 == frag) + break; + frag2->sk = NULL; + frag2->destructor = NULL; + skb->truesize += frag2->truesize; + } } slow_path: