From patchwork Mon Apr 23 09:38:54 2012
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 154376
Subject: [PATCH 2/2 net-next] tcp: sk_add_backlog() is too aggressive for TCP
From: Eric Dumazet
To: David Miller
Cc: netdev, Tom Herbert, Neal Cardwell, Maciej Żenczykowski,
    Yuchung Cheng, Ilpo Järvinen, Rick Jones
Date: Mon, 23 Apr 2012 11:38:54 +0200
Message-ID: <1335173934.3293.84.camel@edumazet-glaptop>

From: Eric Dumazet

While investigating TCP performance problems on 10Gb+ links, we found
that a TCP sender was dropping a lot of incoming ACKs because of the
sk_rcvbuf limit in sk_add_backlog(), especially if the receiver doesn't
use GRO/LRO and sends one ACK every two MSS segments.

A sender usually tweaks sk_sndbuf, but sk_rcvbuf stays at its default
value (87380 bytes), allowing too small a backlog.

A TCP ACK, even a small one, can consume nearly the same truesize space
as an outgoing packet. Using sk_rcvbuf + sk_sndbuf as the limit makes
sense and is fast to compute.

Performance results with netperf, single flow, receiver with GRO/LRO
disabled: 7500 Mbit/s instead of 6050 Mbit/s, and no more
TCPBacklogDrop increments at the sender.
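For context on where the drops come from: the limit argument passed to
sk_add_backlog() is checked against the truesize of everything already
sitting in the socket backlog. The following is a reduced, standalone C
sketch of that admission test, modeled on the sk_rcvqueues_full() and
sk_add_backlog() helpers in include/net/sock.h of this era; the struct
definitions are cut-down stand-ins for illustration, not the kernel's
real types.

#include <errno.h>
#include <stdbool.h>

/* Cut-down stand-ins for the kernel structures, for illustration only. */
struct sk_buff {
	unsigned int truesize;	/* skb data plus allocation overhead */
};

struct sock {
	struct {
		unsigned int len;	/* truesize queued while socket is owned */
	} sk_backlog;
	unsigned int sk_rmem_alloc;	/* memory charged to the receive queue */
};

/* Modeled on sk_rcvqueues_full(): the backlog is considered full once
 * the queued truesize exceeds the limit the protocol passes in.  Before
 * this patch TCP passed sk->sk_rcvbuf alone, so a sender whose sk_rcvbuf
 * was left at its default had very little room for a burst of ACKs.
 */
static bool sk_rcvqueues_full(const struct sock *sk, unsigned int limit)
{
	return sk->sk_backlog.len + sk->sk_rmem_alloc > limit;
}

/* Modeled on sk_add_backlog(): each queued skb charges its full truesize
 * against the limit, which is why small ACKs can fill the backlog almost
 * as quickly as full-sized data segments would.
 */
static int sk_add_backlog(struct sock *sk, struct sk_buff *skb,
			  unsigned int limit)
{
	if (sk_rcvqueues_full(sk, limit))
		return -ENOBUFS; /* caller bumps TCPBacklogDrop, drops the skb */

	sk->sk_backlog.len += skb->truesize;
	return 0;
}
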
Signed-off-by: Eric Dumazet
Cc: Neal Cardwell
Cc: Tom Herbert
Cc: Maciej Żenczykowski
Cc: Yuchung Cheng
Cc: Ilpo Järvinen
Cc: Rick Jones
---
 net/ipv4/tcp_ipv4.c | 3 ++-
 net/ipv6/tcp_ipv6.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 917607e..cf97e98 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1752,7 +1752,8 @@ process:
 			if (!tcp_prequeue(sk, skb))
 				ret = tcp_v4_do_rcv(sk, skb);
 		}
-	} else if (unlikely(sk_add_backlog(sk, skb, sk->sk_rcvbuf))) {
+	} else if (unlikely(sk_add_backlog(sk, skb,
+					   sk->sk_rcvbuf + sk->sk_sndbuf))) {
 		bh_unlock_sock(sk);
 		NET_INC_STATS_BH(net, LINUX_MIB_TCPBACKLOGDROP);
 		goto discard_and_relse;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index b04e6d8..5fb19d3 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1654,7 +1654,8 @@ process:
 			if (!tcp_prequeue(sk, skb))
 				ret = tcp_v6_do_rcv(sk, skb);
 		}
-	} else if (unlikely(sk_add_backlog(sk, skb, sk->sk_rcvbuf))) {
+	} else if (unlikely(sk_add_backlog(sk, skb,
+					   sk->sk_rcvbuf + sk->sk_sndbuf))) {
 		bh_unlock_sock(sk);
 		NET_INC_STATS_BH(net, LINUX_MIB_TCPBACKLOGDROP);
 		goto discard_and_relse;
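
As a rough back-of-the-envelope check on the new limit (the per-ACK
truesize and the autotuned sk_sndbuf below are assumptions chosen for
illustration, not measurements from the netperf run above):

#include <stdio.h>

int main(void)
{
	unsigned int rcvbuf = 87380;		/* default sk_rcvbuf quoted above */
	unsigned int sndbuf = 4u * 1024 * 1024;	/* assumed autotuned sk_sndbuf on a 10Gb+ sender */
	unsigned int ack_truesize = 1024;	/* assumed truesize of one ACK; driver dependent */

	/* Old limit: sk_rcvbuf alone bounds the backlog. */
	printf("old backlog capacity: ~%u ACKs\n", rcvbuf / ack_truesize);

	/* New limit: sk_rcvbuf + sk_sndbuf scales with the send side. */
	printf("new backlog capacity: ~%u ACKs\n",
	       (rcvbuf + sndbuf) / ack_truesize);
	return 0;
}

Under these assumptions the old limit admits roughly 85 ACKs; with one
ACK per two MSS segments that acknowledges only about 250 KB of data,
far less than a 10Gb+ flow keeps in flight, while the combined limit
grows with whatever sk_sndbuf has been tuned or autotuned to.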