Subject: [PATCH net-next] tcp: give prequeue mode some care
From: Eric Dumazet
To: David Miller
Cc: netdev
Date: Wed, 27 Apr 2016 10:12:25 -0700
Message-ID: <1461777145.5535.77.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet

The goal of the TCP prequeue is to defer processing of incoming packets to the
user space thread currently blocked in a recvmsg() system call.

The intent is to spend less time processing these packets in the softirq
handler, since softirq processing bypasses normal process scheduler decisions
and may interrupt threads that do not even use networking.
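As a rough illustration of that deferral path, here is a heavily simplified
user-space sketch (not kernel code: the names only loosely mirror
struct tcp_sock / sk_buff, and locking, socket lookup and the actual wakeup
of the sleeper are omitted):

	/*
	 * User-space model of the prequeue deferral decision.
	 * struct pkt / struct prequeue are hypothetical stand-ins for
	 * struct sk_buff and tp->ucopy.
	 */
	#include <stdbool.h>
	#include <stddef.h>

	struct pkt {                    /* stand-in for struct sk_buff */
		struct pkt *next;
		size_t truesize;        /* memory footprint of this packet */
	};

	struct prequeue {               /* stand-in for tp->ucopy */
		struct pkt *head, *tail;
		size_t memory;          /* sum of truesize, like tp->ucopy.memory */
		bool reader_blocked;    /* a thread is sleeping in recvmsg() */
	};

	/*
	 * Called for each incoming packet in "softirq" context. Returns true
	 * if the packet was deferred to the blocked reader, false if it has
	 * to be processed immediately on the normal receive path.
	 */
	static bool prequeue_add(struct prequeue *pq, struct pkt *p)
	{
		if (!pq->reader_blocked)
			return false;   /* nobody is waiting: process in softirq */

		p->next = NULL;         /* append to the prequeue */
		if (pq->tail)
			pq->tail->next = p;
		else
			pq->head = p;
		pq->tail = p;
		pq->memory += p->truesize;

		/* the real code would now wake the recvmsg() sleeper so the
		 * queue is drained in process context */
		return true;
	}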
The current prequeue implementation has the following issues:

1) It only checks the size of the prequeue against sk_rcvbuf.

   That was fine 15 years ago when sk_rcvbuf was in the 64KB vicinity, but we
   now have ~8MB values to cope with modern networking needs. We have to add
   sk_rmem_alloc to the equation, since out-of-order packets can definitely
   use up to sk_rcvbuf memory themselves.

2) Even with a fixed memory truesize check, the prequeue can be filled by
   thousands of packets. When the prequeue needs to be flushed, either from
   softirq context (in tcp_prequeue() or timer code) or from process context
   (in tcp_prequeue_process()), this adds a latency spike which is often not
   desirable.

   I added a fixed limit of 32 packets, as this translated to a maximum flush
   time of 60 us on my test hosts.

   Also note that packets in the prequeue are not accounted against tcp_mem,
   since they are not charged against sk_forward_alloc at this point. This is
   probably not a big deal.

Note that this might increase LINUX_MIB_TCPPREQUEUEDROPPED counts, which is
misnamed, as packets are not dropped at all, but rather pushed to the stack
(where they can be either consumed or dropped).

Signed-off-by: Eric Dumazet
---
 net/ipv4/tcp_ipv4.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index d2a5763e5abc..58bcf5e001e7 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1506,16 +1506,16 @@ bool tcp_prequeue(struct sock *sk, struct sk_buff *skb)
 
 	__skb_queue_tail(&tp->ucopy.prequeue, skb);
 	tp->ucopy.memory += skb->truesize;
-	if (tp->ucopy.memory > sk->sk_rcvbuf) {
+	if (skb_queue_len(&tp->ucopy.prequeue) >= 32 ||
+	    tp->ucopy.memory + atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf) {
 		struct sk_buff *skb1;
 
 		BUG_ON(sock_owned_by_user(sk));
+		NET_ADD_STATS_BH(sock_net(sk), LINUX_MIB_TCPPREQUEUEDROPPED,
+				 skb_queue_len(&tp->ucopy.prequeue));
 
-		while ((skb1 = __skb_dequeue(&tp->ucopy.prequeue)) != NULL) {
+		while ((skb1 = __skb_dequeue(&tp->ucopy.prequeue)) != NULL)
 			sk_backlog_rcv(sk, skb1);
-			NET_INC_STATS_BH(sock_net(sk),
-					 LINUX_MIB_TCPPREQUEUEDROPPED);
-		}
 
 		tp->ucopy.memory = 0;
 	} else if (skb_queue_len(&tp->ucopy.prequeue) == 1) {
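To summarize the new flush trigger, here is a small user-space sketch; only
the 32-packet constant and the condition come from the diff above, while
struct sock_model and prequeue_must_flush() are hypothetical names used for
illustration, not kernel APIs:

	#include <stdbool.h>
	#include <stddef.h>

	#define PREQUEUE_MAX_PKTS 32            /* fixed packet cap from the patch */

	struct sock_model {                     /* hypothetical stand-in for struct sock */
		size_t sk_rcvbuf;               /* receive buffer limit */
		size_t sk_rmem_alloc;           /* memory held by out-of-order packets */
		size_t prequeue_memory;         /* tp->ucopy.memory */
		unsigned int prequeue_len;      /* skb_queue_len(&tp->ucopy.prequeue) */
	};

	/*
	 * Flush when either bound is hit: the 32-packet cap bounds the flush
	 * latency (about 60 us measured on the test hosts), and the memory
	 * test now also charges out-of-order bytes against sk_rcvbuf.
	 */
	static bool prequeue_must_flush(const struct sock_model *sk)
	{
		return sk->prequeue_len >= PREQUEUE_MAX_PKTS ||
		       sk->prequeue_memory + sk->sk_rmem_alloc > sk->sk_rcvbuf;
	}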