From patchwork Fri Mar 21 13:47:05 2014
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 332655
X-Patchwork-Delegate: davem@davemloft.net
Message-ID: <1395409625.6441.4.camel@edumazet-glaptop2.roam.corp.google.com>
Subject: Re: [RFC] csum experts, csum_replace2() is too expensive
From: Eric Dumazet
To: Andi Kleen
Cc: "H. Peter Anvin", Patrick McHardy, Herbert Xu, "H.K. Jerry Chu",
 Michael Dalton, netdev, "linux-kernel@vger.kernel.org"
Date: Fri, 21 Mar 2014 06:47:05 -0700
In-Reply-To: <1395408778.6441.2.camel@edumazet-glaptop2.roam.corp.google.com>
References: <1395341341.9114.93.camel@edumazet-glaptop2.roam.corp.google.com>
 <87a9cknwk4.fsf@tassilo.jf.intel.com>
 <1395406250.9114.142.camel@edumazet-glaptop2.roam.corp.google.com>
 <1395408778.6441.2.camel@edumazet-glaptop2.roam.corp.google.com>
X-Mailing-List: netdev@vger.kernel.org

On Fri, 2014-03-21 at 06:32 -0700, Eric Dumazet wrote:
> On Fri, 2014-03-21 at 05:50 -0700, Eric Dumazet wrote:
> > > Or the fact that we mix 16 bit stores and 32bit loads ?
> > 
> > iph->tot_len = newlen;
> > iph->check = 0;
> > iph->check = ip_fast_csum(iph, 5);
> 
> Yep definitely. Using 16 bit loads in ip_fast_csum() totally removes the
> stall. I no longer see inet_gro_complete() in perf top...
> 
> +	if (__builtin_constant_p(ihl) && ihl == 5) {
> +		asm(" movw (%[iph]), %w[sum]\n"	/* ihl/version/tos */
> +		    " addw 2(%[iph]), %w[sum]\n"	/* tot_len */
> +		    " adcw 8(%[iph]), %w[sum]\n"	/* ttl/protocol */
> +		    " adcw 10(%[iph]), %w[sum]\n"	/* check */
> +		    " adcl 4(%[iph]), %[sum]\n"	/* id/frag_off */
> +		    " adcl 12(%[iph]), %[sum]\n"	/* saddr */
> +		    " adcl 16(%[iph]), %[sum]\n"	/* daddr */
> +		    " adcl $0, %[sum]\n"
> +		    : [sum] "=r" (sum)
> +		    : [iph] "r" (iph)
> +		    );
> +		return csum_fold(sum);
> 

Another idea would be to move the ip_fast_csum() call to the end of
inet_gro_complete().

I'll try this:

---
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 8c54870db792..0ca8f350a532 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1434,8 +1434,8 @@ static int inet_gro_complete(struct sk_buff *skb, int nhoff)
 	int proto = iph->protocol;
 	int err = -ENOSYS;
 
-	csum_replace2(&iph->check, iph->tot_len, newlen);
 	iph->tot_len = newlen;
+	iph->check = 0;
 
 	rcu_read_lock();
 	ops = rcu_dereference(inet_offloads[proto]);
@@ -1447,6 +1447,7 @@ static int inet_gro_complete(struct sk_buff *skb, int nhoff)
 	 * inet_gro_receive().
 	 */
 	err = ops->callbacks.gro_complete(skb, nhoff + sizeof(*iph));
+	iph->check = ip_fast_csum((u8 *)iph, 5);
 
 out_unlock:
 	rcu_read_unlock();
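For readers following the arithmetic, here is a minimal user-space sketch (not kernel code, and not the x86 asm above) of the two strategies the patch swaps between: patching the stored checksum incrementally, in the HC' = ~(~HC + ~m + m') form of RFC 1624 that csum_replace2() is essentially built on, versus zeroing check and re-summing all ten 16-bit words of a 20-byte header as ip_fast_csum(iph, 5) does. The helper names (fold16, hdr_csum_full, hdr_csum_replace, gro_update_agrees) and the sample header are illustrative assumptions. The sketch only shows that both paths give the same checksum; it says nothing about the load/store-width stall discussed in the thread, which depends on the CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit ones' complement accumulator to 16 bits and invert it,
 * roughly what the kernel's csum_fold() does (hypothetical helper). */
static uint16_t fold16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);	/* add the carries back in */
	sum = (sum & 0xffff) + (sum >> 16);	/* absorb a carry from the first fold */
	return (uint16_t)~sum;
}

/* Full recompute over a 20-byte (ihl == 5) header, read as ten big-endian
 * 16-bit words, mirroring "iph->check = 0; ip_fast_csum(iph, 5)". */
static uint16_t hdr_csum_full(const uint8_t hdr[20])
{
	uint32_t sum = 0;

	assert(hdr[10] == 0 && hdr[11] == 0);	/* check field must be zeroed */
	for (int i = 0; i < 20; i += 2)
		sum += (uint32_t)hdr[i] << 8 | hdr[i + 1];
	return fold16(sum);
}

/* Incremental update, RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m').
 * All values are the big-endian field contents read as plain numbers. */
static uint16_t hdr_csum_replace(uint16_t check, uint16_t from, uint16_t to)
{
	uint32_t sum = (uint16_t)~check;

	sum += (uint16_t)~from;
	sum += to;
	return fold16(sum);
}

/* Hypothetical GRO-style scenario: rewrite tot_len in a sample header and
 * return 1 if the incremental and full-recompute checksums agree. */
static int gro_update_agrees(uint16_t new_len)
{
	uint8_t hdr[20] = {
		0x45, 0x00, 0x00, 0x3c, 0x1c, 0x46, 0x40, 0x00,
		0x40, 0x06, 0x00, 0x00, 0xac, 0x10, 0x0a, 0x63,
		0xac, 0x10, 0x0a, 0x0c,
	};
	uint16_t check = hdr_csum_full(hdr);	/* original checksum */
	uint16_t old_len = (uint16_t)(hdr[2] << 8 | hdr[3]);

	/* Old code path: patch the stored checksum for the tot_len change. */
	uint16_t incr = hdr_csum_replace(check, old_len, new_len);

	/* Patched path: store tot_len, keep check at 0, recompute. */
	hdr[2] = (uint8_t)(new_len >> 8);
	hdr[3] = (uint8_t)(new_len & 0xff);
	uint16_t full = hdr_csum_full(hdr);	/* hdr[10..11] are still 0 */

	return incr == full;
}
```

For the sample header above, hdr_csum_full() yields 0xb1e6, and after changing tot_len from 0x003c to 0x05dc both paths produce 0xac46, so gro_update_agrees(0x05dc) returns 1. The choice between the two is therefore purely about cost, which is the point of the thread.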