From patchwork Fri Dec 13 14:15:37 2013
X-Patchwork-Submitter: Neal Cardwell
X-Patchwork-Id: 301045
X-Patchwork-Delegate: davem@davemloft.net
Date: Fri, 13 Dec 2013 09:15:37 -0500
In-Reply-To: <1386876523.19078.93.camel@edumazet-glaptop2.roam.corp.google.com>
References: <1386876523.19078.93.camel@edumazet-glaptop2.roam.corp.google.com>
Subject: Re: [PATCH net-next] tcp: remove a bogus TSO split
From: Neal Cardwell
To: Eric Dumazet
Cc: David Miller, netdev, Yuchung Cheng, Nandita Dukkipati, Van Jacobson
X-Mailing-List: netdev@vger.kernel.org

On Thu, Dec 12, 2013 at 2:28 PM, Eric Dumazet wrote:
> From: Eric Dumazet
>
> While investigating performance problems on small RPC workloads,
> I noticed the Linux TCP stack was always splitting the last TSO skb
> into two parts (skbs): one a multiple of MSS, and a small one
> carrying the Push flag. This split is done even if TCP_NODELAY is set.
>
> Example with a 4K/4K request/response workload:
>
> IP A > B: . ack 68432 win 2783
> IP A > B: . 65537:68433(2896) ack 69632 win 2783
> IP A > B: P 68433:69633(1200) ack 69632 win 2783
> IP B > A: . ack 68433 win 2768
> IP B > A: . 69632:72528(2896) ack 69633 win 2768
> IP B > A: P 72528:73728(1200) ack 69633 win 2768
> IP A > B: . ack 72528 win 2783
> IP A > B: . 69633:72529(2896) ack 73728 win 2783
> IP A > B: P 72529:73729(1200) ack 73728 win 2783
>
> We think this split is not needed.
>
> All the Nagle/window tests are done at this point.
>
> This patch tremendously improves performance, as the traffic now looks
> like:
>
> IP A > B: . ack 98304 win 2783
> IP A > B: P 94209:98305(4096) ack 98304 win 2783
> IP B > A: . ack 98305 win 2768
> IP B > A: P 98304:102400(4096) ack 98305 win 2768
> IP A > B: . ack 102400 win 2783
> IP A > B: P 98305:102401(4096) ack 102400 win 2783
> IP B > A: . ack 102401 win 2768
> IP B > A: P 102400:106496(4096) ack 102401 win 2768
> IP A > B: . ack 106496 win 2783
> IP A > B: P 102401:106497(4096) ack 106496 win 2783
> IP B > A: . ack 106497 win 2768
> IP B > A: P 106496:110592(4096) ack 106497 win 2768
>
> Before:
>
> lpq83:~# nstat >/dev/null; perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K
> 280774
>
> Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K':
>
>     205719.049006 task-clock                #    9.278 CPUs utilized
>         8,449,968 context-switches          #    0.041 M/sec
>         1,935,997 CPU-migrations            #    0.009 M/sec
>           160,541 page-faults               #    0.780 K/sec
>   548,478,722,290 cycles                    #    2.666 GHz                     [83.20%]
>   455,240,670,857 stalled-cycles-frontend   #   83.00% frontend cycles idle    [83.48%]
>   272,881,454,275 stalled-cycles-backend    #   49.75% backend cycles idle     [66.73%]
>   166,091,460,030 instructions              #    0.30  insns per cycle
>                                             #    2.74  stalled cycles per insn [83.39%]
>    29,150,229,399 branches                  #  141.699 M/sec                   [83.30%]
>     1,943,814,026 branch-misses             #    6.67% of all branches         [83.32%]
>
>      22.173517844 seconds time elapsed
>
> lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets"
> IpOutRequests                   16851063           0.0
> IpExtOutOctets                  23878580777        0.0
>
> After patch:
>
> lpq83:~# nstat >/dev/null; perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K
> 280877
>
> Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K':
>
>     107496.071918 task-clock                #    4.847 CPUs utilized
>         5,635,458 context-switches          #    0.052 M/sec
>         1,374,707 CPU-migrations            #    0.013 M/sec
>           160,920 page-faults               #    0.001 M/sec
>   281,500,010,924 cycles                    #    2.619 GHz                     [83.28%]
>   228,865,069,307 stalled-cycles-frontend   #   81.30% frontend cycles idle    [83.38%]
>   142,462,742,658 stalled-cycles-backend    #   50.61% backend cycles idle     [66.81%]
>    95,227,712,566 instructions              #    0.34  insns per cycle
>                                             #    2.40  stalled cycles per insn [83.43%]
>    16,209,868,171 branches                  #  150.795 M/sec                   [83.20%]
>       874,252,952 branch-misses             #    5.39% of all branches         [83.37%]
>
>      22.175821286 seconds time elapsed
>
> lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets"
> IpOutRequests                   11239428           0.0
> IpExtOutOctets                  23595191035        0.0
>
> Indeed, the occupancy of tx skbs (IpExtOutOctets/IpOutRequests) is
> higher: 2099 bytes instead of 1417, thus helping GRO to be more
> efficient when using the FQ packet scheduler.
>
> Signed-off-by: Eric Dumazet
> Cc: Yuchung Cheng
> Cc: Neal Cardwell
> Cc: Nandita Dukkipati
> Cc: Van Jacobson
> ---
>  net/ipv4/tcp_output.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 2a69f42e51ca..335e110e86ba 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1410,10 +1410,7 @@ static unsigned int tcp_mss_split_point(const struct sock *sk, const struct sk_b
>
>  	needed = min(skb->len, window);
>
> -	if (max_len <= needed)
> -		return max_len;
> -
> -	return needed - needed % mss_now;
> +	return min(max_len, needed);
>  }
>
>  /* Can at least one segment of SKB be sent right now, according to the

Seems like a nice improvement. But if we apply this patch then AFAICT,
to get the Nagle-enabled case right, we also have to update
tcp_minshall_update() to notice these new non-MSS-aligned segments
going out, and count them as non-full-size segments for the
Minshall-Nagle check (to ensure we have no more than one outstanding
un-ACKed sub-MSS packet).
Maybe something like this (please excuse the formatting):

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 70e55d2..a2ec237 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -980,7 +980,8 @@ bool tcp_is_cwnd_limited(const struct sock *sk, u32 in_flight);
 static inline void tcp_minshall_update(struct tcp_sock *tp, unsigned int mss,
                                        const struct sk_buff *skb)
 {
-	if (skb->len < mss)
+	if (skb->len < mss ||
+	    tcp_skb_pcount(skb) * tcp_skb_mss(skb) > skb->len)
 		tp->snd_sml = TCP_SKB_CB(skb)->end_seq;
 }

neal