From patchwork Wed Dec 7 17:19:33 2016
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 703674
X-Patchwork-Delegate: davem@davemloft.net
Message-ID: <1481131173.4930.36.camel@edumazet-glaptop3.roam.corp.google.com>
Subject: [PATCH net-next] udp: under rx pressure, try to condense skbs
From: Eric Dumazet
To: David Miller
Cc: netdev, Paolo Abeni
Date: Wed, 07 Dec 2016 09:19:33 -0800
X-Mailing-List: netdev@vger.kernel.org

From: Eric Dumazet

Under UDP flood, many softirq producers try to add packets to the UDP
receive queue, and one user thread is burning one CPU trying to dequeue
packets as fast as possible.

Two parts of the per-packet cost are :
 - copying payload from kernel space to user space,
 - freeing memory pieces associated with the skb.

If the socket is under pressure, softirq handler(s) can try to pull the
payload of the packet into skb->head if it fits.
Meaning the softirq handler(s) can free/reuse the page fragment
immediately, instead of letting udp_recvmsg() do this hundreds of usec
later, possibly from another node.

Additional gains :
 - We reduce skb->truesize and thus can store more packets per SO_RCVBUF.
 - We avoid cache line misses at copyout() time and consume_skb() time,
   and avoid one put_page() with potential alien freeing on NUMA hosts.

This comes at the cost of a copy, bounded to the available tail room,
which is usually small. (We might have to fix GRO_MAX_HEAD, which looks
bigger than necessary.)

This patch gave me about a 5% increase in throughput in my tests.

The skb_condense() helper could probably be used in other contexts.

Signed-off-by: Eric Dumazet
Cc: Paolo Abeni
---
 include/linux/skbuff.h |  2 ++
 net/core/skbuff.c      | 28 ++++++++++++++++++++++++++++
 net/ipv4/udp.c         | 12 +++++++++++-
 3 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 9c535fbccf2c7dbfae04cee393460e86d588c26b..0cd92b0f2af5fe5a7c153435b8dc758338180ae3 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1966,6 +1966,8 @@ static inline int pskb_may_pull(struct sk_buff *skb, unsigned int len)
 	return __pskb_pull_tail(skb, len - skb_headlen(skb)) != NULL;
 }
 
+void skb_condense(struct sk_buff *skb);
+
 /**
  *	skb_headroom - bytes at buffer head
  *	@skb: buffer to check
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b45cd1494243fc99686016949f4546dbba11f424..84151cf40aebb973bad5bee3ee4be0758084d83c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4931,3 +4931,31 @@ struct sk_buff *pskb_extract(struct sk_buff *skb, int off,
 	return clone;
 }
 EXPORT_SYMBOL(pskb_extract);
+
+/**
+ * skb_condense - try to get rid of fragments/frag_list if possible
+ * @skb: buffer
+ *
+ * Can be used to save memory before skb is added to a busy queue.
+ * If packet has bytes in frags and enough tail room in skb->head,
+ * pull all of them, so that we can free the frags right now and adjust
+ * truesize.
+ * Notes:
+ *	We do not reallocate skb->head thus can not fail.
+ *	Caller must re-evaluate skb->truesize if needed.
+ */
+void skb_condense(struct sk_buff *skb)
+{
+	if (!skb->data_len ||
+	    skb->data_len > skb->end - skb->tail ||
+	    skb_cloned(skb))
+		return;
+
+	/* Nice, we can free page frag(s) right now */
+	__pskb_pull_tail(skb, skb->data_len);
+
+	/* Now adjust skb->truesize, since __pskb_pull_tail() does
+	 * not do this.
+	 */
+	skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
+}
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 16d88ba9ff1c402f77063cfb5eea2708d86da2fc..f5628ada47b53f0d92d08210e5d7e4132a107f73 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1199,7 +1199,7 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
 {
 	struct sk_buff_head *list = &sk->sk_receive_queue;
 	int rmem, delta, amt, err = -ENOMEM;
-	int size = skb->truesize;
+	int size;
 
 	/* try to avoid the costly atomic add/sub pair when the receive
 	 * queue is full; always allow at least a packet
@@ -1208,6 +1208,16 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
 	if (rmem > sk->sk_rcvbuf)
 		goto drop;
 
+	/* Under mem pressure, it might be helpful to help udp_recvmsg()
+	 * having linear skbs :
+	 * - Reduce memory overhead and thus increase receive queue capacity
+	 * - Less cache line misses at copyout() time
+	 * - Less work at consume_skb() (less alien page frag freeing)
+	 */
+	if (rmem > (sk->sk_rcvbuf >> 1))
+		skb_condense(skb);
+	size = skb->truesize;
+
 	/* we drop only if the receive buf is full and the receive
 	 * queue contains some other skb
 	 */
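
For illustration only (not part of this patch): a minimal sketch of how
another receive path might reuse skb_condense() in the same way, per the
"other contexts" remark above. example_enqueue() is a made-up function,
and locking/error handling are elided; only the ordering of the calls is
the point.

#include <linux/skbuff.h>
#include <net/sock.h>

/* Hypothetical enqueue path, not part of this patch. */
static void example_enqueue(struct sock *sk, struct sk_buff *skb)
{
	/* Only pay the copy when the receive queue is already half full,
	 * mirroring the udp.c heuristic above.
	 */
	if (atomic_read(&sk->sk_rmem_alloc) > (sk->sk_rcvbuf >> 1))
		skb_condense(skb);

	/* skb_set_owner_r() charges skb->truesize to the socket, so it
	 * must run after skb_condense() has possibly reduced truesize.
	 */
	skb_set_owner_r(skb, sk);
	skb_queue_tail(&sk->sk_receive_queue, skb);
	sk->sk_data_ready(sk);
}

The same ordering constraint is why the udp.c hunk above reads
skb->truesize into 'size' only after the skb_condense() call.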