From patchwork Fri Jul 23 05:09:08 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Eric Dumazet X-Patchwork-Id: 59732 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 8BC861007D6 for ; Fri, 23 Jul 2010 15:09:20 +1000 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752545Ab0GWFJP (ORCPT ); Fri, 23 Jul 2010 01:09:15 -0400 Received: from mail-wy0-f174.google.com ([74.125.82.174]:61696 "EHLO mail-wy0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752357Ab0GWFJO (ORCPT ); Fri, 23 Jul 2010 01:09:14 -0400 Received: by wyf19 with SMTP id 19so1220028wyf.19 for ; Thu, 22 Jul 2010 22:09:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:subject:from:to:cc :in-reply-to:references:content-type:date:message-id:mime-version :x-mailer:content-transfer-encoding; bh=xlbVyxi4pFbYauYBeMQ2B1yARbHF1huogudEw0Zu9MQ=; b=tTuSZP014Nix+7fSLHcbfEWSPz2FphqNKPOK8hw5Teez9lWBxMxHBuvHAyGumvAxvo nbFJhU3EAwBGmr+s+TKyZKbIVqjQv/RFI1Y5uFJHVWmHZ1mqDU/0zfcZYezwD9LcdXEv WNf89FnNKGEIwLRbmInxUJ0etmSYCxCu0B1qI= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=subject:from:to:cc:in-reply-to:references:content-type:date :message-id:mime-version:x-mailer:content-transfer-encoding; b=oUtXW01Poj2AkCjHyE3bvqW9xlLTFelU6Yf/aVaq3z8BVx24D/3jx4O5XGNJFqhQ0i THqWlOhH6pmSfqf0Rmnm6K+mKt7HVr/RCG3ULWQ7rbApDSRUVVMg8S11HnB8yKVA7L8+ R+SXw3BK+AVDVoWbTr7n2O6RKe6OGm4as0zaw= Received: by 10.227.38.162 with SMTP id b34mr2915560wbe.129.1279861752078; Thu, 22 Jul 2010 22:09:12 -0700 (PDT) Received: from [127.0.0.1] ([85.17.35.125]) by mx.google.com with ESMTPS id e31sm55081287wbe.17.2010.07.22.22.09.10 (version=SSLv3 cipher=RC4-MD5); Thu, 22 Jul 2010 22:09:11 -0700 (PDT) Subject: [PATCH net-next-2.6] net: pskb_expand_head() optimization From: Eric Dumazet To: Andrea Shepard , David Miller Cc: netdev In-Reply-To: <20100722191234.GA832@cronus.persephoneslair.org> References: <20100722191234.GA832@cronus.persephoneslair.org> Date: Fri, 23 Jul 2010 07:09:08 +0200 Message-ID: <1279861748.2482.13.camel@edumazet-laptop> Mime-Version: 1.0 X-Mailer: Evolution 2.28.3 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Le jeudi 22 juillet 2010 à 12:12 -0700, Andrea Shepard a écrit : > Make pskb_expand_head() check ip_summed to make sure csum_start is really > csum_start and not csum before adjusting it. > > This fixes a bug I encountered using a Sun Quad-Fast Ethernet card and VLANs. > On my configuration, the sunhme driver produces skbs with differing amounts > of headroom on receive depending on the packet size. See line 2030 of > drivers/net/sunhme.c; packets smaller than RX_COPY_THRESHOLD have 52 bytes > of headroom but packets larger than that cutoff have only 20 bytes. > > When these packets reach the VLAN driver, vlan_check_reorder_header() > calls skb_cow(), which, if the packet has less than NET_SKB_PAD (== 32) bytes > of headroom, uses pskb_expand_head() to make more. > > Then, pskb_expand_head() needs to adjust a lot of offsets into the skb, > including csum_start. Since csum_start is a union with csum, if the packet > has a valid csum value this will corrupt it, which was the effect I observed. > The sunhme hardware computes receive checksums, so the skbs would be created > by the driver with ip_summed == CHECKSUM_COMPLETE and a valid csum field, and > then pskb_expand_head() would corrupt the csum field, leading to an "hw csum > error" message later on, for example in icmp_rcv() for pings larger than the > sunhme RX_COPY_THRESHOLD. > > On the basis of the comment at the beginning of include/linux/skbuff.h, > I believe that the csum_start skb field is only meaningful if ip_csummed is > CSUM_PARTIAL, so this patch makes pskb_expand_head() adjust it only in that > case to avoid corrupting a valid csum value. > > Please see my more in-depth disucssion of tracking down this bug for > more details if you like: > > http://puellavulnerata.livejournal.com/112186.html > http://puellavulnerata.livejournal.com/112567.html > http://puellavulnerata.livejournal.com/112891.html > http://puellavulnerata.livejournal.com/113096.html > http://puellavulnerata.livejournal.com/113591.html > > I am not subscribed to this list, so please CC me on replies. > Excellent Changelog Andrea, thanks ! Reviewing pskb_expand_head(), I see it copy the whole struct skb_shared_info, even if source contains garbage (uninitialized data). I wonder why it is not triggering kmemcheck alarms [PATCH net-next-2.6] net: pskb_expand_head() optimization Move frags[] at the end of struct skb_shared_info, and make pskb_expand_head() copy only the used part of it instead of whole array. This should avoid kmemcheck warnings and speedup pskb_expand_head() as well, avoiding a lot of cache misses. Signed-off-by: Eric Dumazet --- include/linux/skbuff.h | 3 ++- net/core/skbuff.c | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index f5aa87e..d89876b 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -202,10 +202,11 @@ struct skb_shared_info { */ atomic_t dataref; - skb_frag_t frags[MAX_SKB_FRAGS]; /* Intermediate layers must ensure that destructor_arg * remains valid until skb destructor */ void * destructor_arg; + /* must be last field, see pskb_expand_head() */ + skb_frag_t frags[MAX_SKB_FRAGS]; }; /* We divide dataref into two halves. The higher 16 bits hold references diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 76d33ca..7da58a2 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -817,7 +817,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, memcpy(data + nhead, skb->head, skb->tail - skb->head); #endif memcpy(data + size, skb_end_pointer(skb), - sizeof(struct skb_shared_info)); + offsetof(struct skb_shared_info, frags[skb_shinfo(skb)->nr_frags])); for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) get_page(skb_shinfo(skb)->frags[i].page);