
[v2,net-next] tcp: remove a bogus TSO split

Message ID 1386958426.19078.162.camel@edumazet-glaptop2.roam.corp.google.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Eric Dumazet Dec. 13, 2013, 6:13 p.m. UTC
From: Eric Dumazet <edumazet@google.com>

While investigating performance problems on small RPC workloads,
I noticed the Linux TCP stack was always splitting the last TSO skb
into two parts (skbs): one a multiple of MSS, and a small one
carrying the Push flag. This split is done even if TCP_NODELAY is set.

Example with a request/response workload of 4K/4K:

IP A > B: . ack 68432 win 2783 <nop,nop,timestamp 6524593 6525001>
IP A > B: . 65537:68433(2896) ack 69632 win 2783 <nop,nop,timestamp 6524593 6525001>
IP A > B: P 68433:69633(1200) ack 69632 win 2783 <nop,nop,timestamp 6524593 6525001>
IP B > A: . ack 68433 win 2768 <nop,nop,timestamp 6525001 6524593>
IP B > A: . 69632:72528(2896) ack 69633 win 2768 <nop,nop,timestamp 6525001 6524593>
IP B > A: P 72528:73728(1200) ack 69633 win 2768 <nop,nop,timestamp 6525001 6524593>
IP A > B: . ack 72528 win 2783 <nop,nop,timestamp 6524593 6525001>
IP A > B: . 69633:72529(2896) ack 73728 win 2783 <nop,nop,timestamp 6524593 6525001>
IP A > B: P 72529:73729(1200) ack 73728 win 2783 <nop,nop,timestamp 6524593 6525001>
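
(With TCP timestamps the payload MSS here is 1448 bytes: each 4096-byte
write goes out as 2 * 1448 = 2896 bytes plus the 4096 % 1448 = 1200 byte
leftover carrying the Push flag, exactly the 2896/1200 pattern above.)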

We think this is not needed, and is probably a leftover from the initial
TSO support: the TCP stack used to have issues with mss changes, but those
have since been fixed.

All the Nagle/window tests are done at this point.

Note: if some NIC has trouble sending TSO packets with a partial
last segment, we will find out and add a device feature or something.

tcp_minshall_update() is moved to tcp_output.c and updated, thanks Neal!

This patch tremendously improves performance, as the traffic now looks like:

IP A > B: . ack 98304 win 2783 <nop,nop,timestamp 6834277 6834685>
IP A > B: P 94209:98305(4096) ack 98304 win 2783 <nop,nop,timestamp 6834277 6834685>
IP B > A: . ack 98305 win 2768 <nop,nop,timestamp 6834686 6834277>
IP B > A: P 98304:102400(4096) ack 98305 win 2768 <nop,nop,timestamp 6834686 6834277>
IP A > B: . ack 102400 win 2783 <nop,nop,timestamp 6834279 6834686>
IP A > B: P 98305:102401(4096) ack 102400 win 2783 <nop,nop,timestamp 6834279 6834686>
IP B > A: . ack 102401 win 2768 <nop,nop,timestamp 6834687 6834279>
IP B > A: P 102400:106496(4096) ack 102401 win 2768 <nop,nop,timestamp 6834687 6834279>
IP A > B: . ack 106496 win 2783 <nop,nop,timestamp 6834280 6834687>
IP A > B: P 102401:106497(4096) ack 106496 win 2783 <nop,nop,timestamp 6834280 6834687>
IP B > A: . ack 106497 win 2768 <nop,nop,timestamp 6834688 6834280>
IP B > A: P 106496:110592(4096) ack 106497 win 2768 <nop,nop,timestamp 6834688 6834280>
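
(Note how each 4096-byte response now leaves as a single Push-flagged
packet; the 2896/1200 split is gone.)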


Before:

lpq83:~# nstat >/dev/null;perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K
280774

 Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K':

     205719.049006 task-clock                #    9.278 CPUs utilized          
         8,449,968 context-switches          #    0.041 M/sec                  
         1,935,997 CPU-migrations            #    0.009 M/sec                  
           160,541 page-faults               #    0.780 K/sec                  
   548,478,722,290 cycles                    #    2.666 GHz                     [83.20%]
   455,240,670,857 stalled-cycles-frontend   #   83.00% frontend cycles idle    [83.48%]
   272,881,454,275 stalled-cycles-backend    #   49.75% backend  cycles idle    [66.73%]
   166,091,460,030 instructions              #    0.30  insns per cycle        
                                             #    2.74  stalled cycles per insn [83.39%]
    29,150,229,399 branches                  #  141.699 M/sec                   [83.30%]
     1,943,814,026 branch-misses             #    6.67% of all branches         [83.32%]

      22.173517844 seconds time elapsed

lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets"
IpOutRequests                   16851063           0.0
IpExtOutOctets                  23878580777        0.0

After patch:

lpq83:~# nstat >/dev/null;perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K
280877

 Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K':

     107496.071918 task-clock                #    4.847 CPUs utilized          
         5,635,458 context-switches          #    0.052 M/sec                  
         1,374,707 CPU-migrations            #    0.013 M/sec                  
           160,920 page-faults               #    0.001 M/sec                  
   281,500,010,924 cycles                    #    2.619 GHz                     [83.28%]
   228,865,069,307 stalled-cycles-frontend   #   81.30% frontend cycles idle    [83.38%]
   142,462,742,658 stalled-cycles-backend    #   50.61% backend  cycles idle    [66.81%]
    95,227,712,566 instructions              #    0.34  insns per cycle        
                                             #    2.40  stalled cycles per insn [83.43%]
    16,209,868,171 branches                  #  150.795 M/sec                   [83.20%]
       874,252,952 branch-misses             #    5.39% of all branches         [83.37%]

      22.175821286 seconds time elapsed

lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets"
IpOutRequests                   11239428           0.0
IpExtOutOctets                  23595191035        0.0

Indeed, the average occupancy of tx skbs (IpExtOutOctets/IpOutRequests) is
higher: 2099 bytes instead of 1417, which also helps GRO be more efficient
when the FQ packet scheduler is used.
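
(Worked out from the nstat counters above: 23878580777 / 16851063 ~= 1417
bytes per IP packet before the patch, and 23595191035 / 11239428 ~= 2099
after it.)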


Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Van Jacobson <vanj@google.com>
---
v2: changed tcp_minshall_update() as Neal pointed out.

 include/net/tcp.h     |    7 ------
 net/ipv4/tcp_output.c |   41 +++++++++++++++++++++-------------------
 2 files changed, 22 insertions(+), 26 deletions(-)




Comments

Neal Cardwell Dec. 13, 2013, 6:17 p.m. UTC | #1
On Fri, Dec 13, 2013 at 1:13 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> While investigating performance problems on small RPC workloads,
> I noticed the Linux TCP stack was always splitting the last TSO skb
> into two parts (skbs): one a multiple of MSS, and a small one
> carrying the Push flag. This split is done even if TCP_NODELAY is set.
>
...
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Nandita Dukkipati <nanditad@google.com>
> Cc: Van Jacobson <vanj@google.com>
> ---
> v2: changed tcp_minshall_update() as Neal pointed out.

It occurred to me after staring at this code a little longer that
although the Nagle test is already done before the call to
tcp_mss_split_point(), tcp_nagle_check() is only run if
tso_segs == 1 and only checks whether skb->len < mss_now. So the
Nagle code currently makes an implicit assumption: if there is an skb
that is not an exact multiple of MSS, then tcp_mss_split_point()
will chop off the odd bytes at the end and we'll loop back to the top
of the tcp_write_xmit() loop to make a Nagle decision on an skb that
has tso_segs == 1.
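
For reference, tcp_nagle_check() in kernels of this era looks roughly like
the following (a paraphrased sketch of net/ipv4/tcp_output.c, not a
verbatim copy):

static inline bool tcp_nagle_check(const struct tcp_sock *tp,
				   const struct sk_buff *skb,
				   unsigned int mss_now, int nonagle)
{
	/* Defer only sub-MSS skbs, and only when corking, or when plain
	 * Nagle applies and a small packet is already in flight
	 * (Minshall's variant).
	 */
	return skb->len < mss_now &&
		((nonagle & TCP_NAGLE_CORK) ||
		 (!nonagle && tp->packets_out && tcp_minshall_check(tp)));
}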

So if we just make this improvement to tcp_mss_split_point() and the
adjustment to tcp_minshall_update() (as in patch v2), we will still
have broken Nagle behavior in a scenario like:

- suppose MSS=1460 and Nagle is enabled (TCP_NODELAY is not set)

- suppose there is an outstanding un-ACKed small packet, so that
tcp_minshall_check() returns true

- user writes 2000 bytes

- tso_segs is 2, so we do not call tcp_nagle_test()

- new version of tcp_mss_split_point() sends out the full 2000 bytes,
  leading to having 2 packets smaller than an MSS un-ACKed in the network,
  violating the invariant the Minshall/Nagle code is trying to maintain
  of having only one such packet un-ACKed in the network.

Previously, tcp_mss_split_point() would have split off the 2000-1460
bytes into a new skb; we would have looped back and executed the
tcp_nagle_test() code and decided not to send that small sub-MSS
2000-1460 packet.
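
To make the before/after behavior concrete, here is a small userspace
sketch of the split computation (hypothetical helper names; the arithmetic
mirrors the old and new bodies of tcp_mss_split_point() in this patch, and
assumes the max_len cap is an MSS multiple and not the limiting factor):

#include <stdio.h>

/* Old behavior: unless capped by max_len, trim the sendable amount down
 * to an exact multiple of MSS, so the sub-MSS tail gets reconsidered by
 * the Nagle test on the next tcp_write_xmit() iteration.
 */
static unsigned int split_old(unsigned int len, unsigned int window,
			      unsigned int mss, unsigned int max_len)
{
	unsigned int needed = len < window ? len : window;

	if (max_len <= needed)
		return max_len;
	return needed - needed % mss;
}

/* New behavior: send everything allowed, partial tail included. */
static unsigned int split_new(unsigned int len, unsigned int window,
			      unsigned int mss, unsigned int max_len)
{
	unsigned int needed = len < window ? len : window;

	return max_len < needed ? max_len : needed;
}

int main(void)
{
	unsigned int mss = 1460, len = 2000, window = 65535;
	unsigned int max_len = 64 * mss;

	printf("old split: %u bytes\n", split_old(len, window, mss, max_len)); /* 1460 */
	printf("new split: %u bytes\n", split_new(len, window, mss, max_len)); /* 2000 */
	return 0;
}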

neal

Eric Dumazet Dec. 13, 2013, 6:38 p.m. UTC | #2
On Fri, 2013-12-13 at 13:17 -0500, Neal Cardwell wrote:

> It occurred to me after staring at this code a little longer that
> although the Nagle test is already done before the call to
> tcp_mss_split_point(), tcp_nagle_check() is only run if
> tso_segs == 1 and only checks whether skb->len < mss_now.
...
> Previously, tcp_mss_split_point() would have split off the 2000-1460
> bytes into a new skb; we would have looped back and executed the
> tcp_nagle_test() code and decided not to send that small sub-MSS
> 2000-1460 packet.

Hmm, it seems we need to refactor a bit.

tcp_snd_test() does not seem to care about TSO being partial.
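
For reference, tcp_snd_test() in this era looks roughly like the following
(a paraphrased sketch of net/ipv4/tcp_output.c, not a verbatim copy); note
how every test is applied to the skb as a whole, with nothing special done
when its last segment is partial:

static bool tcp_snd_test(const struct sock *sk, struct sk_buff *skb,
			 unsigned int cur_mss, int nonagle)
{
	const struct tcp_sock *tp = tcp_sk(sk);
	unsigned int cwnd_quota;

	tcp_init_tso_segs(sk, skb, cur_mss);

	/* Nagle/Minshall decision */
	if (!tcp_nagle_test(tp, skb, cur_mss, nonagle))
		return false;

	/* congestion window, then receiver window */
	cwnd_quota = tcp_cwnd_test(tp, skb);
	if (cwnd_quota && !tcp_snd_wnd_test(tp, skb, cur_mss))
		cwnd_quota = 0;

	return cwnd_quota;
}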




Patch

diff --git a/include/net/tcp.h b/include/net/tcp.h
index f7e1ab2139ef..9cd62bc09055 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -978,13 +978,6 @@  static inline u32 tcp_wnd_end(const struct tcp_sock *tp)
 }
 bool tcp_is_cwnd_limited(const struct sock *sk, u32 in_flight);
 
-static inline void tcp_minshall_update(struct tcp_sock *tp, unsigned int mss,
-				       const struct sk_buff *skb)
-{
-	if (skb->len < mss)
-		tp->snd_sml = TCP_SKB_CB(skb)->end_seq;
-}
-
 static inline void tcp_check_probe_timer(struct sock *sk)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 2a69f42e51ca..2c5c0029d1a0 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1384,20 +1384,11 @@  static void tcp_cwnd_validate(struct sock *sk)
 	}
 }
 
-/* Returns the portion of skb which can be sent right away without
- * introducing MSS oddities to segment boundaries. In rare cases where
- * mss_now != mss_cache, we will request caller to create a small skb
- * per input skb which could be mostly avoided here (if desired).
- *
- * We explicitly want to create a request for splitting write queue tail
- * to a small skb for Nagle purposes while avoiding unnecessary modulos,
- * thus all the complexity (cwnd_len is always MSS multiple which we
- * return whenever allowed by the other factors). Basically we need the
- * modulo only when the receiver window alone is the limiting factor or
- * when we would be allowed to send the split-due-to-Nagle skb fully.
- */
-static unsigned int tcp_mss_split_point(const struct sock *sk, const struct sk_buff *skb,
-					unsigned int mss_now, unsigned int max_segs)
+/* Returns the portion of skb which can be sent right away */
+static unsigned int tcp_mss_split_point(const struct sock *sk,
+					const struct sk_buff *skb,
+					unsigned int mss_now,
+					unsigned int max_segs)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
 	u32 needed, window, max_len;
@@ -1410,10 +1401,7 @@  static unsigned int tcp_mss_split_point(const struct sock *sk, const struct sk_b
 
 	needed = min(skb->len, window);
 
-	if (max_len <= needed)
-		return max_len;
-
-	return needed - needed % mss_now;
+	return min(max_len, needed);
 }
 
 /* Can at least one segment of SKB be sent right now, according to the
@@ -1454,12 +1442,27 @@  static int tcp_init_tso_segs(const struct sock *sk, struct sk_buff *skb,
 }
 
 /* Minshall's variant of the Nagle send check. */
-static inline bool tcp_minshall_check(const struct tcp_sock *tp)
+static bool tcp_minshall_check(const struct tcp_sock *tp)
 {
 	return after(tp->snd_sml, tp->snd_una) &&
 		!after(tp->snd_sml, tp->snd_nxt);
 }
 
+/* Update snd_sml if this skb is under mss
+ * Note that a TSO packet might end with a sub-mss segment
+ * The test is really :
+ * if ((skb->len % mss_now) != 0)
+ *        tp->snd_sml = TCP_SKB_CB(skb)->end_seq;
+ * But we can avoid doing the divide again given we already have
+ *  skb_pcount = skb->len / mss_now
+ */
+static void tcp_minshall_update(struct tcp_sock *tp, unsigned int mss_now,
+				const struct sk_buff *skb)
+{
+	if (skb->len < tcp_skb_pcount(skb) * mss_now)
+		tp->snd_sml = TCP_SKB_CB(skb)->end_seq;
+}
+
 /* Return false, if packet can be sent now without violation Nagle's rules:
  * 1. It is full sized.
  * 2. Or it contains FIN. (already checked by caller)