{"id":808152,"url":"http://patchwork.ozlabs.org/api/patches/808152/?format=json","web_url":"http://patchwork.ozlabs.org/project/netdev/patch/150417481955.28907.15567119824187929000.stgit@firesoul/","project":{"id":7,"url":"http://patchwork.ozlabs.org/api/projects/7/?format=json","name":"Linux network development","link_name":"netdev","list_id":"netdev.vger.kernel.org","list_email":"netdev@vger.kernel.org","web_url":null,"scm_url":null,"webscm_url":null,"list_archive_url":"","list_archive_url_format":"","commit_url_format":""},"msgid":"<150417481955.28907.15567119824187929000.stgit@firesoul>","list_archive_url":null,"date":"2017-08-31T10:20:19","name":"[RFC] net: frag limit checks need to use percpu_counter_compare","commit_ref":null,"pull_url":null,"state":"rfc","archived":true,"hash":"6e08bdc9af345fb2c91138fd3721331011cbea41","submitter":{"id":13625,"url":"http://patchwork.ozlabs.org/api/people/13625/?format=json","name":"Jesper Dangaard Brouer","email":"brouer@redhat.com"},"delegate":{"id":34,"url":"http://patchwork.ozlabs.org/api/users/34/?format=json","username":"davem","first_name":"David","last_name":"Miller","email":"davem@davemloft.net"},"mbox":"http://patchwork.ozlabs.org/project/netdev/patch/150417481955.28907.15567119824187929000.stgit@firesoul/mbox/","series":[{"id":784,"url":"http://patchwork.ozlabs.org/api/series/784/?format=json","web_url":"http://patchwork.ozlabs.org/project/netdev/list/?series=784","date":"2017-08-31T10:20:19","name":"[RFC] net: frag limit checks need to use percpu_counter_compare","version":1,"mbox":"http://patchwork.ozlabs.org/series/784/mbox/"}],"comments":"http://patchwork.ozlabs.org/api/patches/808152/comments/","check":"pending","checks":"http://patchwork.ozlabs.org/api/patches/808152/checks/","tags":{},"related":[],"headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none 
(mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ext-mx10.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx10.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=brouer@redhat.com"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xjdg00sqVz9sNr\n\tfor <patchwork-incoming@ozlabs.org>;\n\tThu, 31 Aug 2017 20:20:28 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751419AbdHaKU0 (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tThu, 31 Aug 2017 06:20:26 -0400","from mx1.redhat.com ([209.132.183.28]:52218 \"EHLO mx1.redhat.com\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S1750880AbdHaKUY (ORCPT <rfc822;netdev@vger.kernel.org>);\n\tThu, 31 Aug 2017 06:20:24 -0400","from smtp.corp.redhat.com\n\t(int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id 9317361474;\n\tThu, 31 Aug 2017 10:20:24 +0000 (UTC)","from firesoul.localdomain (ovpn-200-42.brq.redhat.com\n\t[10.40.200.42])\n\tby smtp.corp.redhat.com (Postfix) with ESMTP id 0F4FBB32A2;\n\tThu, 31 Aug 2017 10:20:22 +0000 (UTC)","from [192.168.5.1] (localhost [IPv6:::1])\n\tby firesoul.localdomain (Postfix) with ESMTP id A51D43073EC87;\n\tThu, 31 Aug 2017 12:20:19 +0200 (CEST)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com 9317361474","Subject":"[RFC PATCH] net: frag limit checks need to use\n\tpercpu_counter_compare","From":"Jesper Dangaard Brouer <brouer@redhat.com>","To":"liujian56@huawei.com","Cc":"netdev@vger.kernel.org, Florian Westphal <fw@strlen.de>,\n\tJesper Dangaard Brouer <brouer@redhat.com>","Date":"Thu, 31 Aug 2017 12:20:19 
+0200","Message-ID":"<150417481955.28907.15567119824187929000.stgit@firesoul>","User-Agent":"StGit/0.17.1-dirty","MIME-Version":"1.0","Content-Type":"text/plain; charset=\"utf-8\"","Content-Transfer-Encoding":"7bit","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.14","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.39]);\n\tThu, 31 Aug 2017 10:20:24 +0000 (UTC)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"},"content":"To: Liujian can you please test this patch?\n I want to understand if using __percpu_counter_compare() solves\n the problem correctness-wise (even though this will be slower\n than using a simple atomic_t on your big system).\n\nFix a bug in the fragmentation code's use of the percpu_counter API that\ncauses issues on systems with many CPUs.\n\nThe frag_mem_limit() just reads the global counter (fbc->count),\nwithout considering that other CPUs can have up to batch size (130K) that\nhaven't been subtracted yet.  
Due to the 3MBytes lower thresh limit,\nthis becomes dangerous at >=24 CPUs (3*1024*1024/130000=24).\n\nThe __percpu_counter_compare() does the right thing, and takes into\naccount the number of (online) CPUs and batch size, to account for\nthis and calls __percpu_counter_sum() when needed.\n\nOn systems with many CPUs this will unfortunately always result in the\nheavier fully locked __percpu_counter_sum() which touches the\nper_cpu_ptr of all (online) CPUs.\n\nOn systems with a smaller number of CPUs this solution is also not\noptimal, because __percpu_counter_compare()/__percpu_counter_sum()\ndoesn't help synchronize the global counter.\n Florian Westphal has an idea of adding some counter sync points,\nwhich should help address this issue.\n---\n include/net/inet_frag.h  |   16 ++++++++++++++--\n net/ipv4/inet_fragment.c |    6 +++---\n 2 files changed, 17 insertions(+), 5 deletions(-)","diff":"diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h\nindex 6fdcd2427776..b586e320783d 100644\n--- a/include/net/inet_frag.h\n+++ b/include/net/inet_frag.h\n@@ -147,9 +147,21 @@ static inline bool inet_frag_evicting(struct inet_frag_queue *q)\n  */\n static unsigned int frag_percpu_counter_batch = 130000;\n \n-static inline int frag_mem_limit(struct netns_frags *nf)\n+static inline bool frag_mem_over_limit(struct netns_frags *nf, int thresh)\n {\n-\treturn percpu_counter_read(&nf->mem);\n+\t/* When reading counter here, __percpu_counter_compare() call\n+\t * will invoke __percpu_counter_sum() when needed.  
Which\n+\t * depends on num_online_cpus()*batch size, as each CPU can\n+\t * potentially hold a batch count.\n+\t *\n+\t * With many CPUs this heavier sum operation will\n+\t * unfortunately always occur.\n+\t */\n+\tif (__percpu_counter_compare(&nf->mem, thresh,\n+\t\t\t\t     frag_percpu_counter_batch) > 0)\n+\t\treturn true;\n+\telse\n+\t\treturn false;\n }\n \n static inline void sub_frag_mem_limit(struct netns_frags *nf, int i)\ndiff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c\nindex 96e95e83cc61..ee2cf56900e6 100644\n--- a/net/ipv4/inet_fragment.c\n+++ b/net/ipv4/inet_fragment.c\n@@ -120,7 +120,7 @@ static void inet_frag_secret_rebuild(struct inet_frags *f)\n static bool inet_fragq_should_evict(const struct inet_frag_queue *q)\n {\n \treturn q->net->low_thresh == 0 ||\n-\t       frag_mem_limit(q->net) >= q->net->low_thresh;\n+\t\tfrag_mem_over_limit(q->net, q->net->low_thresh);\n }\n \n static unsigned int\n@@ -355,7 +355,7 @@ static struct inet_frag_queue *inet_frag_alloc(struct netns_frags *nf,\n {\n \tstruct inet_frag_queue *q;\n \n-\tif (!nf->high_thresh || frag_mem_limit(nf) > nf->high_thresh) {\n+\tif (!nf->high_thresh || frag_mem_over_limit(nf, nf->high_thresh)) {\n \t\tinet_frag_schedule_worker(f);\n \t\treturn NULL;\n \t}\n@@ -396,7 +396,7 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,\n \tstruct inet_frag_queue *q;\n \tint depth = 0;\n \n-\tif (frag_mem_limit(nf) > nf->low_thresh)\n+\tif (frag_mem_over_limit(nf, nf->low_thresh))\n \t\tinet_frag_schedule_worker(f);\n \n \thash &= (INETFRAGS_HASHSZ - 1);\n","prefixes":["RFC"]}