[{"id":1761138,"web_url":"http://patchwork.ozlabs.org/comment/1761138/","msgid":"<20170831085802.2d4cdc87@xeon-e3>","list_archive_url":null,"date":"2017-08-31T15:58:02","subject":"Re: [RFC PATCH] net: frag limit checks need to use\n\tpercpu_counter_compare","submitter":{"id":21389,"url":"http://patchwork.ozlabs.org/api/people/21389/","name":"Stephen Hemminger","email":"stephen@networkplumber.org"},"content":"On Thu, 31 Aug 2017 12:20:19 +0200\nJesper Dangaard Brouer <brouer@redhat.com> wrote:\n\n> +static inline bool frag_mem_over_limit(struct netns_frags *nf, int thresh)\n>  {\n> -\treturn percpu_counter_read(&nf->mem);\n> +\t/* When reading counter here, __percpu_counter_compare() call\n> +\t * will invoke __percpu_counter_sum() when needed.  Which\n> +\t * depend on num_online_cpus()*batch size, as each CPU can\n> +\t * potentential can hold a batch count.\n> +\t *\n> +\t * With many CPUs this heavier sum operation will\n> +\t * unfortunately always occur.\n> +\t */\n> +\tif (__percpu_counter_compare(&nf->mem, thresh,\n> +\t\t\t\t     frag_percpu_counter_batch) > 0)\n> +\t\treturn true;\n> +\telse\n> +\t\treturn false;\n\nYou don't need an if() here.\n\n\treturn __percpu_counter_compare(&nf->mem, thresh,\n\t\t\t\t     frag_percpu_counter_batch) > 0;","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=networkplumber-org.20150623.gappssmtp.com\n\theader.i=@networkplumber-org.20150623.gappssmtp.com\n\theader.b=\"NeQKuZO+\"; dkim-atps=neutral"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xjn8f24gVz9s7G\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri,  1 Sep 2017 01:58:10 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751856AbdHaP6I (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tThu, 31 Aug 2017 11:58:08 -0400","from mail-pf0-f171.google.com ([209.85.192.171]:35291 \"EHLO\n\tmail-pf0-f171.google.com\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S1751770AbdHaP6G (ORCPT\n\t<rfc822;netdev@vger.kernel.org>); Thu, 31 Aug 2017 11:58:06 -0400","by mail-pf0-f171.google.com with SMTP id g13so131846pfm.2\n\tfor <netdev@vger.kernel.org>; Thu, 31 Aug 2017 08:58:06 -0700 (PDT)","from xeon-e3 (76-14-207-240.or.wavecable.com. [76.14.207.240])\n\tby smtp.gmail.com with ESMTPSA id\n\ta21sm87704pfj.89.2017.08.31.08.58.05\n\t(version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256);\n\tThu, 31 Aug 2017 08:58:05 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=networkplumber-org.20150623.gappssmtp.com; s=20150623;\n\th=date:from:to:cc:subject:message-id:in-reply-to:references\n\t:mime-version:content-transfer-encoding;\n\tbh=LNuyWdp664I1wgyXodNc3eA6RgOsnh5FKX2oW7jHrEE=;\n\tb=NeQKuZO+5YfI2kc2AeOPY/OYZO/b3TJ9nkzA4usBT+m/MaFXiVZWbv9TglgLpPxkmg\n\t2c7Tx80d5UFwdU1bU0N0+L2UqdZQ7izOSBVd/oKDNaRQIQVe0wfWk2gA7LHAXbLXo7su\n\th10pKJzwQZTXgELARHeQIvat7uwZ/PA8WGt50Rj6Rhj1Oo8qX43R+acmwcm0n5UDMzMY\n\txY28nFv4RNM3dVKk1rBeNXz1gJbcFxizqoZmbLCbbOsjtUOo78JV62riia6MUCaqrfMf\n\tQZJZ3/MaU9jGoKzFeWo/oeaWNzgTiflNpdyc3dDPfde8joCorcE3CpNDsmYHPFQqpDnd\n\t0Aug==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to\n\t:references:mime-version:content-transfer-encoding;\n\tbh=LNuyWdp664I1wgyXodNc3eA6RgOsnh5FKX2oW7jHrEE=;\n\tb=qOdkYOK2iClsS5pFUrZkFiodkG/xQT7jt7toEZ0YE8dccF6ZclRVyvTBt+Bi1hu+gt\n\tBcPz5UCoac9Tyvcj47Bhhj+6EiEkaIDfcdyj24OGOGsVdM040vEC5V2vtxlDkB1CrFTt\n\tOiMmST8nA2Dpz/sqFuq1y2iVCjYlA96Wi+WhDoWrXW1IokwkCUASGzWZ23JShhLC/+0i\n\tOgbVa1wXC0GYgMk4rPYCpqrQleNHuwSLszPd+Rbj0FzgEPLAZPpPTgUjaOztWH6ILa0R\n\tbcjBYqrx0GY4LLC9U0b4GbL70FpwiTvdMaV3GPPfN41lvsw3mq+ZLg9zuXWSi4SuKFmp\n\tRXvg==","X-Gm-Message-State":"AHYfb5gxzD4jOtUUctbKAQW6a/O07oKu6SB0hZBkpq4dZ/Wi14AaUVeI\n\tti1WkvEPW2b5lGCTdzKadQ==","X-Google-Smtp-Source":"ADKCNb7Pfl7P+AVUNvj0JKmr/wIRLVOy7xc4itoZ1+5YzueYbaI07mDxdDRgMqHRQQrwFizStyGZIg==","X-Received":"by 10.99.2.18 with SMTP id 18mr2990496pgc.70.1504195085347;\n\tThu, 31 Aug 2017 08:58:05 -0700 (PDT)","Date":"Thu, 31 Aug 2017 08:58:02 -0700","From":"Stephen Hemminger <stephen@networkplumber.org>","To":"Jesper Dangaard Brouer <brouer@redhat.com>","Cc":"liujian56@huawei.com, netdev@vger.kernel.org,\n\tFlorian Westphal <fw@strlen.de>","Subject":"Re: [RFC PATCH] net: frag limit checks need to use\n\tpercpu_counter_compare","Message-ID":"<20170831085802.2d4cdc87@xeon-e3>","In-Reply-To":"<150417481955.28907.15567119824187929000.stgit@firesoul>","References":"<150417481955.28907.15567119824187929000.stgit@firesoul>","MIME-Version":"1.0","Content-Type":"text/plain; charset=US-ASCII","Content-Transfer-Encoding":"7bit","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1761167,"web_url":"http://patchwork.ozlabs.org/comment/1761167/","msgid":"<20170831162349.k3qnkfgkygdh2zqw@unicorn.suse.cz>","list_archive_url":null,"date":"2017-08-31T16:23:49","subject":"Re: [RFC PATCH] net: frag limit checks need to use\n\tpercpu_counter_compare","submitter":{"id":11892,"url":"http://patchwork.ozlabs.org/api/people/11892/","name":"Michal Kubecek","email":"mkubecek@suse.cz"},"content":"On Thu, Aug 31, 2017 at 12:20:19PM +0200, Jesper Dangaard Brouer wrote:\n> To: Liujian can you please test this patch?\n>  I want to understand if using __percpu_counter_compare() solves\n>  the problem correctness wise (even-though this will be slower\n>  than using a simple atomic_t on your big system).\n> \n> Fix bug in fragmentation codes use of the percpu_counter API, that\n> cause issues on systems with many CPUs.\n> \n> The frag_mem_limit() just reads the global counter (fbc->count),\n> without considering other CPUs can have upto batch size (130K) that\n> haven't been subtracted yet.  Due to the 3MBytes lower thresh limit,\n> this become dangerous at >=24 CPUs (3*1024*1024/130000=24).\n> \n> The __percpu_counter_compare() does the right thing, and takes into\n> account the number of (online) CPUs and batch size, to account for\n> this and call __percpu_counter_sum() when needed.\n> \n> On systems with many CPUs this will unfortunately always result in the\n> heavier fully locked __percpu_counter_sum() which touch the\n> per_cpu_ptr of all (online) CPUs.\n> \n> On systems with a smaller number of CPUs this solution is also not\n> optimal, because __percpu_counter_compare()/__percpu_counter_sum()\n> doesn't help synchronize the global counter.\n>  Florian Westphal have an idea of adding some counter sync points,\n> which should help address this issue.\n> ---\n>  include/net/inet_frag.h  |   16 ++++++++++++++--\n>  net/ipv4/inet_fragment.c |    6 +++---\n>  2 files changed, 17 insertions(+), 5 deletions(-)\n> \n> diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h\n> index 6fdcd2427776..b586e320783d 100644\n> --- a/include/net/inet_frag.h\n> +++ b/include/net/inet_frag.h\n> @@ -147,9 +147,21 @@ static inline bool inet_frag_evicting(struct inet_frag_queue *q)\n>   */\n>  static unsigned int frag_percpu_counter_batch = 130000;\n>  \n> -static inline int frag_mem_limit(struct netns_frags *nf)\n> +static inline bool frag_mem_over_limit(struct netns_frags *nf, int thresh)\n>  {\n> -\treturn percpu_counter_read(&nf->mem);\n> +\t/* When reading counter here, __percpu_counter_compare() call\n> +\t * will invoke __percpu_counter_sum() when needed.  Which\n> +\t * depend on num_online_cpus()*batch size, as each CPU can\n> +\t * potentential can hold a batch count.\n> +\t *\n> +\t * With many CPUs this heavier sum operation will\n> +\t * unfortunately always occur.\n> +\t */\n> +\tif (__percpu_counter_compare(&nf->mem, thresh,\n> +\t\t\t\t     frag_percpu_counter_batch) > 0)\n> +\t\treturn true;\n> +\telse\n> +\t\treturn false;\n>  }\n>  \n>  static inline void sub_frag_mem_limit(struct netns_frags *nf, int i)\n> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c\n> index 96e95e83cc61..ee2cf56900e6 100644\n> --- a/net/ipv4/inet_fragment.c\n> +++ b/net/ipv4/inet_fragment.c\n> @@ -120,7 +120,7 @@ static void inet_frag_secret_rebuild(struct inet_frags *f)\n>  static bool inet_fragq_should_evict(const struct inet_frag_queue *q)\n>  {\n>  \treturn q->net->low_thresh == 0 ||\n> -\t       frag_mem_limit(q->net) >= q->net->low_thresh;\n> +\t\tfrag_mem_over_limit(q->net, q->net->low_thresh);\n>  }\n>  \n>  static unsigned int\n> @@ -355,7 +355,7 @@ static struct inet_frag_queue *inet_frag_alloc(struct netns_frags *nf,\n>  {\n>  \tstruct inet_frag_queue *q;\n>  \n> -\tif (!nf->high_thresh || frag_mem_limit(nf) > nf->high_thresh) {\n> +\tif (!nf->high_thresh || frag_mem_over_limit(nf, nf->high_thresh)) {\n>  \t\tinet_frag_schedule_worker(f);\n>  \t\treturn NULL;\n>  \t}\n\nIf we go this way (which would IMHO require some benchmarks to make sure\nit doesn't harm performance too much) we can drop the explicit checks\nfor zero thresholds which were added to work around the unreliability of\nfast checks of percpu counters (or at least the second one was by commit\n30759219f562 (\"net: disable fragment reassembly if high_thresh is zero\").\n \nMichal Kubecek\n\n> @@ -396,7 +396,7 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,\n>  \tstruct inet_frag_queue *q;\n>  \tint depth = 0;\n>  \n> -\tif (frag_mem_limit(nf) > nf->low_thresh)\n> +\tif (frag_mem_over_limit(nf, nf->low_thresh))\n>  \t\tinet_frag_schedule_worker(f);\n>  \n>  \thash &= (INETFRAGS_HASHSZ - 1);\n>","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xjnkL4x9Qz9ryk\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri,  1 Sep 2017 02:23:54 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751814AbdHaQXw (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tThu, 31 Aug 2017 12:23:52 -0400","from mx2.suse.de ([195.135.220.15]:34595 \"EHLO mx1.suse.de\"\n\trhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP\n\tid S1751752AbdHaQXv (ORCPT <rfc822;netdev@vger.kernel.org>);\n\tThu, 31 Aug 2017 12:23:51 -0400","from relay1.suse.de (charybdis-ext.suse.de [195.135.220.254])\n\tby mx1.suse.de (Postfix) with ESMTP id 229AFAC17;\n\tThu, 31 Aug 2017 16:23:50 +0000 (UTC)","by unicorn.suse.cz (Postfix, from userid 1000)\n\tid 510F3A0BE8; Thu, 31 Aug 2017 18:23:49 +0200 (CEST)"],"X-Virus-Scanned":"by amavisd-new at test-mx.suse.de","Date":"Thu, 31 Aug 2017 18:23:49 +0200","From":"Michal Kubecek <mkubecek@suse.cz>","To":"Jesper Dangaard Brouer <brouer@redhat.com>","Cc":"liujian56@huawei.com, netdev@vger.kernel.org,\n\tFlorian Westphal <fw@strlen.de>","Subject":"Re: [RFC PATCH] net: frag limit checks need to use\n\tpercpu_counter_compare","Message-ID":"<20170831162349.k3qnkfgkygdh2zqw@unicorn.suse.cz>","References":"<150417481955.28907.15567119824187929000.stgit@firesoul>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<150417481955.28907.15567119824187929000.stgit@firesoul>","User-Agent":"NeoMutt/20170421 (1.8.2)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1761438,"web_url":"http://patchwork.ozlabs.org/comment/1761438/","msgid":"<4F88C5DDA1E80143B232E89585ACE27D018F6A9B@DGGEMA502-MBX.china.huawei.com>","list_archive_url":null,"date":"2017-09-01T02:25:32","subject":"RE: [RFC PATCH] net: frag limit checks need to use\n\tpercpu_counter_compare","submitter":{"id":71967,"url":"http://patchwork.ozlabs.org/api/people/71967/","name":"liujian \\(CE\\)","email":"liujian56@huawei.com"},"content":"Best Regards,\nliujian\n\n\n> -----Original Message-----\n> From: Michal Kubecek [mailto:mkubecek@suse.cz]\n> Sent: Friday, September 01, 2017 12:24 AM\n> To: Jesper Dangaard Brouer\n> Cc: liujian (CE); netdev@vger.kernel.org; Florian Westphal\n> Subject: Re: [RFC PATCH] net: frag limit checks need to use\n> percpu_counter_compare\n> \n> On Thu, Aug 31, 2017 at 12:20:19PM +0200, Jesper Dangaard Brouer wrote:\n> > To: Liujian can you please test this patch?\n> >  I want to understand if using __percpu_counter_compare() solves  the\n> > problem correctness wise (even-though this will be slower  than using\n> > a simple atomic_t on your big system).\n\nI have test the patch, it can work. \n1. make sure frag_mem_limit reach to thresh\n  ===>FRAG: inuse 0 memory 0 frag_mem_limit 5386864\n2. change NIC rx irq's affinity to a fixed CPU\n3. iperf -u -c 9.83.1.41 -l 10000 -i 1 -t 1000 -P 10 -b 20M\n  And check /proc/net/snmp, there are no ReasmFails.\n\nAnd I think it is a better way that adding some counter sync points as you said.\n\n> > Fix bug in fragmentation codes use of the percpu_counter API, that\n> > cause issues on systems with many CPUs.\n> >\n> > The frag_mem_limit() just reads the global counter (fbc->count),\n> > without considering other CPUs can have upto batch size (130K) that\n> > haven't been subtracted yet.  Due to the 3MBytes lower thresh limit,\n> > this become dangerous at >=24 CPUs (3*1024*1024/130000=24).\n> >\n> > The __percpu_counter_compare() does the right thing, and takes into\n> > account the number of (online) CPUs and batch size, to account for\n> > this and call __percpu_counter_sum() when needed.\n> >\n> > On systems with many CPUs this will unfortunately always result in the\n> > heavier fully locked __percpu_counter_sum() which touch the\n> > per_cpu_ptr of all (online) CPUs.\n> >\n> > On systems with a smaller number of CPUs this solution is also not\n> > optimal, because __percpu_counter_compare()/__percpu_counter_sum()\n> > doesn't help synchronize the global counter.\n> >  Florian Westphal have an idea of adding some counter sync points,\n> > which should help address this issue.\n> > ---\n> >  include/net/inet_frag.h  |   16 ++++++++++++++--\n> >  net/ipv4/inet_fragment.c |    6 +++---\n> >  2 files changed, 17 insertions(+), 5 deletions(-)\n> >\n> > diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h index\n> > 6fdcd2427776..b586e320783d 100644\n> > --- a/include/net/inet_frag.h\n> > +++ b/include/net/inet_frag.h\n> > @@ -147,9 +147,21 @@ static inline bool inet_frag_evicting(struct\n> inet_frag_queue *q)\n> >   */\n> >  static unsigned int frag_percpu_counter_batch = 130000;\n> >\n> > -static inline int frag_mem_limit(struct netns_frags *nf)\n> > +static inline bool frag_mem_over_limit(struct netns_frags *nf, int\n> > +thresh)\n> >  {\n> > -\treturn percpu_counter_read(&nf->mem);\n> > +\t/* When reading counter here, __percpu_counter_compare() call\n> > +\t * will invoke __percpu_counter_sum() when needed.  Which\n> > +\t * depend on num_online_cpus()*batch size, as each CPU can\n> > +\t * potentential can hold a batch count.\n> > +\t *\n> > +\t * With many CPUs this heavier sum operation will\n> > +\t * unfortunately always occur.\n> > +\t */\n> > +\tif (__percpu_counter_compare(&nf->mem, thresh,\n> > +\t\t\t\t     frag_percpu_counter_batch) > 0)\n> > +\t\treturn true;\n> > +\telse\n> > +\t\treturn false;\n> >  }\n> >\n> >  static inline void sub_frag_mem_limit(struct netns_frags *nf, int i)\n> > diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c index\n> > 96e95e83cc61..ee2cf56900e6 100644\n> > --- a/net/ipv4/inet_fragment.c\n> > +++ b/net/ipv4/inet_fragment.c\n> > @@ -120,7 +120,7 @@ static void inet_frag_secret_rebuild(struct\n> > inet_frags *f)  static bool inet_fragq_should_evict(const struct\n> > inet_frag_queue *q)  {\n> >  \treturn q->net->low_thresh == 0 ||\n> > -\t       frag_mem_limit(q->net) >= q->net->low_thresh;\n> > +\t\tfrag_mem_over_limit(q->net, q->net->low_thresh);\n> >  }\n> >\n> >  static unsigned int\n> > @@ -355,7 +355,7 @@ static struct inet_frag_queue\n> > *inet_frag_alloc(struct netns_frags *nf,  {\n> >  \tstruct inet_frag_queue *q;\n> >\n> > -\tif (!nf->high_thresh || frag_mem_limit(nf) > nf->high_thresh) {\n> > +\tif (!nf->high_thresh || frag_mem_over_limit(nf, nf->high_thresh)) {\n> >  \t\tinet_frag_schedule_worker(f);\n> >  \t\treturn NULL;\n> >  \t}\n> \n> If we go this way (which would IMHO require some benchmarks to make sure it\n> doesn't harm performance too much) we can drop the explicit checks for zero\n> thresholds which were added to work around the unreliability of fast checks of\n> percpu counters (or at least the second one was by commit\n> 30759219f562 (\"net: disable fragment reassembly if high_thresh is zero\").\n> \n> Michal Kubecek\n> \n> > @@ -396,7 +396,7 @@ struct inet_frag_queue *inet_frag_find(struct\n> netns_frags *nf,\n> >  \tstruct inet_frag_queue *q;\n> >  \tint depth = 0;\n> >\n> > -\tif (frag_mem_limit(nf) > nf->low_thresh)\n> > +\tif (frag_mem_over_limit(nf, nf->low_thresh))\n> >  \t\tinet_frag_schedule_worker(f);\n> >\n> >  \thash &= (INETFRAGS_HASHSZ - 1);\n> >","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xk3520tDKz9s7p\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri,  1 Sep 2017 12:25:58 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751109AbdIACZp convert rfc822-to-8bit (ORCPT\n\t<rfc822;patchwork-incoming@ozlabs.org>);\n\tThu, 31 Aug 2017 22:25:45 -0400","from szxga02-in.huawei.com ([45.249.212.188]:11824 \"EHLO\n\tszxga02-in.huawei.com\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S1750925AbdIACZo (ORCPT\n\t<rfc822;netdev@vger.kernel.org>); Thu, 31 Aug 2017 22:25:44 -0400","from 172.30.72.56 (EHLO DGGEMA406-HUB.china.huawei.com)\n\t([172.30.72.56])\n\tby dggrg02-dlp.huawei.com (MOS 4.4.6-GA FastPath queued)\n\twith ESMTP id AUN84250; Fri, 01 Sep 2017 10:25:38 +0800 (CST)","from DGGEMA502-MBX.china.huawei.com ([169.254.2.74]) by\n\tDGGEMA406-HUB.china.huawei.com ([10.3.20.47]) with mapi id\n\t14.03.0301.000; Fri, 1 Sep 2017 10:25:32 +0800"],"From":"\"liujian (CE)\" <liujian56@huawei.com>","To":"Michal Kubecek <mkubecek@suse.cz>,\n\tJesper Dangaard Brouer <brouer@redhat.com>","CC":"\"netdev@vger.kernel.org\" <netdev@vger.kernel.org>,\n\tFlorian Westphal <fw@strlen.de>","Subject":"RE: [RFC PATCH] net: frag limit checks need to use\n\tpercpu_counter_compare","Thread-Topic":"[RFC PATCH] net: frag limit checks need to use\n\tpercpu_counter_compare","Thread-Index":"AQHTIkLFoX16/cFnu0CeI7eoxSaP9qKeIGuAgAEsVNA=","Date":"Fri, 1 Sep 2017 02:25:32 +0000","Message-ID":"<4F88C5DDA1E80143B232E89585ACE27D018F6A9B@DGGEMA502-MBX.china.huawei.com>","References":"<150417481955.28907.15567119824187929000.stgit@firesoul>\n\t<20170831162349.k3qnkfgkygdh2zqw@unicorn.suse.cz>","In-Reply-To":"<20170831162349.k3qnkfgkygdh2zqw@unicorn.suse.cz>","Accept-Language":"en-US","Content-Language":"zh-CN","X-MS-Has-Attach":"","X-MS-TNEF-Correlator":"","x-originating-ip":"[10.177.97.126]","Content-Type":"text/plain; charset=\"us-ascii\"","Content-Transfer-Encoding":"8BIT","MIME-Version":"1.0","X-CFilter-Loop":"Reflected","X-Mirapoint-Virus-RAPID-Raw":"score=unknown(0),\n\trefid=str=0001.0A090205.59A8C522.0088, ss=1, re=0.000, recu=0.000,\n\treip=0.000, cl=1, cld=1, fgs=0, \n\tip=169.254.2.74, so=2014-11-16 11:51:01,\n\tdmn=2013-03-21 17:37:32","X-Mirapoint-Loop-Id":"6bc09b1bb4f4850795e9801f9fd8ce28","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1761496,"web_url":"http://patchwork.ozlabs.org/comment/1761496/","msgid":"<20170901091641.4c62af06@redhat.com>","list_archive_url":null,"date":"2017-09-01T07:16:41","subject":"Re: [RFC PATCH] net: frag limit checks need to use\n\tpercpu_counter_compare","submitter":{"id":13625,"url":"http://patchwork.ozlabs.org/api/people/13625/","name":"Jesper Dangaard Brouer","email":"brouer@redhat.com"},"content":"On Fri, 1 Sep 2017 02:25:32 +0000 \"liujian (CE)\" <liujian56@huawei.com> wrote:\n\n> > -----Original Message-----\n> > From: Michal Kubecek [mailto:mkubecek@suse.cz]\n> > Sent: Friday, September 01, 2017 12:24 AM\n> > To: Jesper Dangaard Brouer\n> > Cc: liujian (CE); netdev@vger.kernel.org; Florian Westphal\n> > Subject: Re: [RFC PATCH] net: frag limit checks need to use\n> > percpu_counter_compare\n> > \n> > On Thu, Aug 31, 2017 at 12:20:19PM +0200, Jesper Dangaard Brouer wrote:  \n> > > To: Liujian can you please test this patch?\n> > >  I want to understand if using __percpu_counter_compare() solves  the\n> > > problem correctness wise (even-though this will be slower  than using\n> > > a simple atomic_t on your big system).  \n> \n> I have test the patch, it can work. \n\nThanks for confirming this.\n\n> 1. make sure frag_mem_limit reach to thresh\n>   ===>FRAG: inuse 0 memory 0 frag_mem_limit 5386864  \n> 2. change NIC rx irq's affinity to a fixed CPU\n\nIf you pin the NIC RX queue to a single CPU, then the error issue\nbasically cannot happen.  Different CPU need to have a chance to \"own\"\npart of the percpu_counter.  I guess default setup with irqbalance\ncould eventually screw the percpu_counter enough given enough CPUs, or\na network load with enough different L2-headers to high different RX\nqueues.\n\n> 3. iperf -u -c 9.83.1.41 -l 10000 -i 1 -t 1000 -P 10 -b 20M\n>   And check /proc/net/snmp, there are no ReasmFails.\n\nMy quick check command is:\n nstat > /dev/null && sleep 1 && nstat && grep FRAG /proc/net/sockstat\n\n> And I think it is a better way that adding some counter sync points\n> as you said.\n\nI've discussed this offlist with Florian, while it is doable, we are\nadding too much complexity for something that can be solved much\nsimpler with an atomic_t (as before my patch).  Thus, I'm now looking\nat reverting my original change (commit 6d7b857d541e (\"net: use\nlib/percpu_counter API for fragmentation mem accounting\")).","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ext-mx07.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx07.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=brouer@redhat.com"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xk9Xg34sJz9s7C\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri,  1 Sep 2017 17:16:51 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751386AbdIAHQs (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tFri, 1 Sep 2017 03:16:48 -0400","from mx1.redhat.com ([209.132.183.28]:41378 \"EHLO mx1.redhat.com\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S1751237AbdIAHQr (ORCPT <rfc822;netdev@vger.kernel.org>);\n\tFri, 1 Sep 2017 03:16:47 -0400","from smtp.corp.redhat.com\n\t(int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id E8A13C0467D7;\n\tFri,  1 Sep 2017 07:16:46 +0000 (UTC)","from localhost (ovpn-200-27.brq.redhat.com [10.40.200.27])\n\tby smtp.corp.redhat.com (Postfix) with ESMTP id 299D2A397C;\n\tFri,  1 Sep 2017 07:16:42 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com E8A13C0467D7","Date":"Fri, 1 Sep 2017 09:16:41 +0200","From":"Jesper Dangaard Brouer <brouer@redhat.com>","To":"\"liujian (CE)\" <liujian56@huawei.com>","Cc":"Michal Kubecek <mkubecek@suse.cz>,\n\t\"netdev@vger.kernel.org\" <netdev@vger.kernel.org>,\n\tFlorian Westphal <fw@strlen.de>, brouer@redhat.com","Subject":"Re: [RFC PATCH] net: frag limit checks need to use\n\tpercpu_counter_compare","Message-ID":"<20170901091641.4c62af06@redhat.com>","In-Reply-To":"<4F88C5DDA1E80143B232E89585ACE27D018F6A9B@DGGEMA502-MBX.china.huawei.com>","References":"<150417481955.28907.15567119824187929000.stgit@firesoul>\n\t<20170831162349.k3qnkfgkygdh2zqw@unicorn.suse.cz>\n\t<4F88C5DDA1E80143B232E89585ACE27D018F6A9B@DGGEMA502-MBX.china.huawei.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=US-ASCII","Content-Transfer-Encoding":"7bit","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.13","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.31]);\n\tFri, 01 Sep 2017 07:16:47 +0000 (UTC)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1761503,"web_url":"http://patchwork.ozlabs.org/comment/1761503/","msgid":"<20170901094156.12e6e0a3@redhat.com>","list_archive_url":null,"date":"2017-09-01T07:41:56","subject":"Re: [RFC PATCH] net: frag limit checks need to use\n\tpercpu_counter_compare","submitter":{"id":13625,"url":"http://patchwork.ozlabs.org/api/people/13625/","name":"Jesper Dangaard Brouer","email":"brouer@redhat.com"},"content":"On Thu, 31 Aug 2017 18:23:49 +0200 Michal Kubecek <mkubecek@suse.cz> wrote:\n\n> If we go this way (which would IMHO require some benchmarks to make sure\n> it doesn't harm performance too much) we can drop the explicit checks\n> for zero thresholds which were added to work around the unreliability of\n> fast checks of percpu counters (or at least the second one was by commit\n> 30759219f562 (\"net: disable fragment reassembly if high_thresh is zero\").\n  \nAfter much considerations, together with Florian, I'm now instead\nlooking at reverting the use of percpu_counter for this memory\naccounting use-case.  The complexity and maintenance cost is not worth\nit.  And I'm of-cause testing the perf effect, and currently I'm _not_\nseeing any perf regression on my 10G + 100G testlab (although this is\nnot a NUMA system, which were my original optimization case).","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ext-mx08.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx08.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=brouer@redhat.com"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xkB5n4Ns1z9s81\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri,  1 Sep 2017 17:42:05 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751488AbdIAHmD (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tFri, 1 Sep 2017 03:42:03 -0400","from mx1.redhat.com ([209.132.183.28]:46088 \"EHLO mx1.redhat.com\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S1751022AbdIAHmC (ORCPT <rfc822;netdev@vger.kernel.org>);\n\tFri, 1 Sep 2017 03:42:02 -0400","from smtp.corp.redhat.com\n\t(int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id 63DBEC056819;\n\tFri,  1 Sep 2017 07:42:02 +0000 (UTC)","from localhost (ovpn-200-27.brq.redhat.com [10.40.200.27])\n\tby smtp.corp.redhat.com (Postfix) with ESMTP id 3BDF060BE1;\n\tFri,  1 Sep 2017 07:41:57 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com 63DBEC056819","Date":"Fri, 1 Sep 2017 09:41:56 +0200","From":"Jesper Dangaard Brouer <brouer@redhat.com>","To":"Michal Kubecek <mkubecek@suse.cz>","Cc":"liujian56@huawei.com, netdev@vger.kernel.org,\n\tFlorian Westphal <fw@strlen.de>, brouer@redhat.com","Subject":"Re: [RFC PATCH] net: frag limit checks need to use\n\tpercpu_counter_compare","Message-ID":"<20170901094156.12e6e0a3@redhat.com>","In-Reply-To":"<20170831162349.k3qnkfgkygdh2zqw@unicorn.suse.cz>","References":"<150417481955.28907.15567119824187929000.stgit@firesoul>\n\t<20170831162349.k3qnkfgkygdh2zqw@unicorn.suse.cz>","MIME-Version":"1.0","Content-Type":"text/plain; charset=US-ASCII","Content-Transfer-Encoding":"7bit","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.12","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.32]);\n\tFri, 01 Sep 2017 07:42:02 +0000 (UTC)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1761509,"web_url":"http://patchwork.ozlabs.org/comment/1761509/","msgid":"<20170901081006.6hfftamapa56m2jv@unicorn.suse.cz>","list_archive_url":null,"date":"2017-09-01T08:10:06","subject":"Re: [RFC PATCH] net: frag limit checks need to use\n\tpercpu_counter_compare","submitter":{"id":11892,"url":"http://patchwork.ozlabs.org/api/people/11892/","name":"Michal Kubecek","email":"mkubecek@suse.cz"},"content":"On Fri, Sep 01, 2017 at 09:41:56AM +0200, Jesper Dangaard Brouer wrote:\n> On Thu, 31 Aug 2017 18:23:49 +0200 Michal Kubecek <mkubecek@suse.cz> wrote:\n> \n> > If we go this way (which would IMHO require some benchmarks to make sure\n> > it doesn't harm performance too much) we can drop the explicit checks\n> > for zero thresholds which were added to work around the unreliability of\n> > fast checks of percpu counters (or at least the second one was by commit\n> > 30759219f562 (\"net: disable fragment reassembly if high_thresh is zero\").\n>   \n> After much considerations, together with Florian, I'm now instead\n> looking at reverting the use of percpu_counter for this memory\n> accounting use-case.  The complexity and maintenance cost is not worth\n> it.  And I'm of-cause testing the perf effect, and currently I'm _not_\n> seeing any perf regression on my 10G + 100G testlab (although this is\n> not a NUMA system, which were my original optimization case).\n\nThis sounds reasonable to me. It is indeed questionable if percpu\ncounters are still worth the complexity if all checks have to be changed\nto the exact version.\n\nPerhaps there would be some gain for many CPUs if thresholds are large\nenough to (almost) always avoid the need to calculate the sum. But once\nwe leave that safe area, I would be surprised if simple atomic_t\nwouldn't be more efficient.\n\nMichal Kubecek","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xkBkF3MGhz9s81\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri,  1 Sep 2017 18:10:13 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751463AbdIAIKK (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tFri, 1 Sep 2017 04:10:10 -0400","from mx2.suse.de ([195.135.220.15]:44017 \"EHLO mx1.suse.de\"\n\trhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP\n\tid S1751022AbdIAIKI (ORCPT <rfc822;netdev@vger.kernel.org>);\n\tFri, 1 Sep 2017 04:10:08 -0400","from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254])\n\tby mx1.suse.de (Postfix) with ESMTP id 3E631AB9B;\n\tFri,  1 Sep 2017 08:10:07 +0000 (UTC)","by unicorn.suse.cz (Postfix, from userid 1000)\n\tid 955C2A0BE8; Fri,  1 Sep 2017 10:10:06 +0200 (CEST)"],"X-Virus-Scanned":"by amavisd-new at test-mx.suse.de","Date":"Fri, 1 Sep 2017 10:10:06 +0200","From":"Michal Kubecek <mkubecek@suse.cz>","To":"Jesper Dangaard Brouer <brouer@redhat.com>","Cc":"liujian56@huawei.com, netdev@vger.kernel.org,\n\tFlorian Westphal <fw@strlen.de>","Subject":"Re: [RFC PATCH] net: frag limit checks need to use\n\tpercpu_counter_compare","Message-ID":"<20170901081006.6hfftamapa56m2jv@unicorn.suse.cz>","References":"<150417481955.28907.15567119824187929000.stgit@firesoul>\n\t<20170831162349.k3qnkfgkygdh2zqw@unicorn.suse.cz>\n\t<20170901094156.12e6e0a3@redhat.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<20170901094156.12e6e0a3@redhat.com>","User-Agent":"NeoMutt/20170421 (1.8.2)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}}]