From patchwork Thu Dec 12 19:28:43 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 300744
X-Patchwork-Delegate: davem@davemloft.net
Message-ID: <1386876523.19078.93.camel@edumazet-glaptop2.roam.corp.google.com>
Subject: [PATCH net-next] tcp: remove a bogus TSO split
From: Eric Dumazet
To: David Miller
Cc: netdev, Yuchung Cheng, Neal Cardwell, Nandita Dukkipati, Van Jacobson
Date: Thu, 12 Dec 2013 11:28:43 -0800
X-Mailing-List: netdev@vger.kernel.org

From: Eric Dumazet

While investigating performance problems on small RPC workloads,
I noticed linux TCP stack was always splitting the last TSO skb
into two parts (skbs). One being a multiple of MSS, and a small one
with the Push flag. This split is done even if TCP_NODELAY is set.

Example with request/response of 4K/4K

IP A > B: . ack 68432 win 2783
IP A > B: . 65537:68433(2896) ack 69632 win 2783
IP A > B: P 68433:69633(1200) ack 69632 win 2783
IP B > A: . ack 68433 win 2768
IP B > A: . 69632:72528(2896) ack 69633 win 2768
IP B > A: P 72528:73728(1200) ack 69633 win 2768
IP A > B: . ack 72528 win 2783
IP A > B: . 69633:72529(2896) ack 73728 win 2783
IP A > B: P 72529:73729(1200) ack 73728 win 2783

We think this is not needed.

All the Nagle/window tests are done at this point.

This patch tremendously improves performance, as the traffic now looks
like :

IP A > B: . ack 98304 win 2783
IP A > B: P 94209:98305(4096) ack 98304 win 2783
IP B > A: . ack 98305 win 2768
IP B > A: P 98304:102400(4096) ack 98305 win 2768
IP A > B: . ack 102400 win 2783
IP A > B: P 98305:102401(4096) ack 102400 win 2783
IP B > A: . ack 102401 win 2768
IP B > A: P 102400:106496(4096) ack 102401 win 2768
IP A > B: . ack 106496 win 2783
IP A > B: P 102401:106497(4096) ack 106496 win 2783
IP B > A: . ack 106497 win 2768
IP B > A: P 106496:110592(4096) ack 106497 win 2768

Before :

lpq83:~# nstat >/dev/null;perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K
280774

 Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K':

     205719.049006 task-clock                #    9.278 CPUs utilized
         8,449,968 context-switches          #    0.041 M/sec
         1,935,997 CPU-migrations            #    0.009 M/sec
           160,541 page-faults               #    0.780 K/sec
   548,478,722,290 cycles                    #    2.666 GHz                     [83.20%]
   455,240,670,857 stalled-cycles-frontend   #   83.00% frontend cycles idle    [83.48%]
   272,881,454,275 stalled-cycles-backend    #   49.75% backend  cycles idle    [66.73%]
   166,091,460,030 instructions              #    0.30  insns per cycle
                                             #    2.74  stalled cycles per insn [83.39%]
    29,150,229,399 branches                  #  141.699 M/sec                   [83.30%]
     1,943,814,026 branch-misses             #    6.67% of all branches         [83.32%]

      22.173517844 seconds time elapsed

lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets"
IpOutRequests                   16851063           0.0
IpExtOutOctets                  23878580777        0.0

After patch :

lpq83:~# nstat >/dev/null;perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K
280877

 Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K':

     107496.071918 task-clock                #    4.847 CPUs utilized
         5,635,458 context-switches          #    0.052 M/sec
         1,374,707 CPU-migrations            #    0.013 M/sec
           160,920 page-faults               #    0.001 M/sec
   281,500,010,924 cycles                    #    2.619 GHz                     [83.28%]
   228,865,069,307 stalled-cycles-frontend   #   81.30% frontend cycles idle    [83.38%]
   142,462,742,658 stalled-cycles-backend    #   50.61% backend  cycles idle    [66.81%]
    95,227,712,566 instructions              #    0.34  insns per cycle
                                             #    2.40  stalled cycles per insn [83.43%]
    16,209,868,171 branches                  #  150.795 M/sec                   [83.20%]
       874,252,952 branch-misses             #    5.39% of all branches         [83.37%]

      22.175821286 seconds time elapsed

lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets"
IpOutRequests                   11239428           0.0
IpExtOutOctets                  23595191035        0.0

Indeed, the occupancy of tx skbs (IpExtOutOctets/IpOutRequests) is higher :
2099 instead of 1417, thus helping GRO to be more efficient when using FQ
packet scheduler.

Signed-off-by: Eric Dumazet
Cc: Yuchung Cheng
Cc: Neal Cardwell
Cc: Nandita Dukkipati
Cc: Van Jacobson
---
 net/ipv4/tcp_output.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 2a69f42e51ca..335e110e86ba 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1410,10 +1410,7 @@ static unsigned int tcp_mss_split_point(const struct sock *sk, const struct sk_b
 
 	needed = min(skb->len, window);
 
-	if (max_len <= needed)
-		return max_len;
-
-	return needed - needed % mss_now;
+	return min(max_len, needed);
 }
 
 /* Can at least one segment of SKB be sent right now, according to the
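
For readers who want to check the arithmetic behind the traces, the tail of tcp_mss_split_point() before and after this change can be sketched as a standalone C fragment. This is only an illustration: the helper names (min_u32, split_point_before, split_point_after) are hypothetical, the inputs are passed as plain integers rather than read from the socket and skb, and the earlier fast-path return in the real function is left out.

```c
/* Hypothetical stand-in for the kernel's min() macro. */
static inline unsigned int min_u32(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Tail of the function before the patch: when the payload is smaller than
 * the TSO limit, round it down to a whole number of MSS-sized segments,
 * which leaves the sub-MSS remainder for a second skb.
 * mss_now is assumed to be non-zero.
 */
static inline unsigned int split_point_before(unsigned int skb_len,
					      unsigned int window,
					      unsigned int max_len,
					      unsigned int mss_now)
{
	unsigned int needed = min_u32(skb_len, window);

	if (max_len <= needed)
		return max_len;

	return needed - needed % mss_now;
}

/* Tail of the function after the patch: send everything that fits in both
 * the send window and the TSO limit, with no rounding to the MSS.
 */
static inline unsigned int split_point_after(unsigned int skb_len,
					     unsigned int window,
					     unsigned int max_len)
{
	unsigned int needed = min_u32(skb_len, window);

	return min_u32(max_len, needed);
}
```

Assuming an MSS of 1448 bytes (inferred from the 2896-byte segments in the first trace, not stated in the message) and a window and TSO limit of 65536 bytes chosen only for illustration, split_point_before(4096, 65536, 65536, 1448) gives 2896, leaving the 1200-byte tail seen in the trace, while split_point_after(4096, 65536, 65536) gives 4096.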