[{"id":1757548,"web_url":"http://patchwork.ozlabs.org/comment/1757548/","msgid":"<E0D909EE5BB15A4699798539EA149D7F07793B3E@ORSMSX103.amr.corp.intel.com>","list_archive_url":null,"date":"2017-08-25T15:36:22","subject":"Re: [RFC PATCH] net: limit maximum number of packets to mark with\n\txmit_more","submitter":{"id":72228,"url":"http://patchwork.ozlabs.org/api/people/72228/","name":"Waskiewicz Jr, Peter","email":"peter.waskiewicz.jr@intel.com"},"content":"On 8/25/17 11:25 AM, Jacob Keller wrote:\n> Under some circumstances, such as with many stacked devices, it is\n> possible that dev_hard_start_xmit will bundle many packets together, and\n> mark them all with xmit_more.\n> \n> Most drivers respond to xmit_more by skipping tail bumps on packet\n> rings, or similar behavior as long as xmit_more is set. This is\n> a performance win since it means drivers can avoid notifying hardware of\n> new packets repeatedly, and thus avoid wasting unnecessary PCIe or other\n> bandwidth.\n> \n> This use of xmit_more comes with a trade off because bundling too many\n> packets can increase latency of the Tx packets. To avoid this, we should\n> limit the maximum number of packets with xmit_more.\n> \n> Driver authors could modify their drivers to check for some determined\n> limit, but this requires all drivers to be modified in order to gain\n> advantage.\n> \n> Instead, add a sysctl \"xmit_more_max\" which can be used to configure the\n> maximum number of xmit_more skbs to send in a sequence. This ensures\n> that all drivers benefit, and allows system administrators the option to\n> tune the value to their environment.\n> \n> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>\n> ---\n> \n> Stray thoughts and further questions....\n> \n> Is this the right approach? Did I miss any other places where we should\n> limit? Does the limit make sense? Should it instead be a per-device\n> tuning nob instead of a global? 
Is 32 a good default?\n\nI actually like the idea of a per-device knob.  A xmit_more_max that's \nglobal in a system with 1GbE devices along with a 25/50GbE or more just \ndoesn't make much sense to me.  Or having heterogeneous vendor devices \nin the same system that have different HW behaviors could mask issues \nwith latency.\n\nThis seems like another incarnation of possible buffer-bloat if the max \nis too high...\n\n> \n>   Documentation/sysctl/net.txt |  6 ++++++\n>   include/linux/netdevice.h    |  2 ++\n>   net/core/dev.c               | 10 +++++++++-\n>   net/core/sysctl_net_core.c   |  7 +++++++\n>   4 files changed, 24 insertions(+), 1 deletion(-)\n> \n> diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt\n> index b67044a2575f..3d995e8f4448 100644\n> --- a/Documentation/sysctl/net.txt\n> +++ b/Documentation/sysctl/net.txt\n> @@ -230,6 +230,12 @@ netdev_max_backlog\n>   Maximum number  of  packets,  queued  on  the  INPUT  side, when the interface\n>   receives packets faster than kernel can process them.\n>   \n> +xmit_more_max\n> +-------------\n> +\n> +Maximum number of packets in a row to mark with skb->xmit_more. A value of zero\n> +indicates no limit.\n\nWhat defines \"packet?\"  MTU-sized packets, or payloads coming down from \nthe stack (e.g. 
TSO's)?\n\n-PJ","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xf4yS1bmGz9sP5\n\tfor <patchwork-incoming@ozlabs.org>;\n\tSat, 26 Aug 2017 01:36:32 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S933747AbdHYPg3 convert rfc822-to-8bit (ORCPT\n\t<rfc822;patchwork-incoming@ozlabs.org>);\n\tFri, 25 Aug 2017 11:36:29 -0400","from mga05.intel.com ([192.55.52.43]:10931 \"EHLO mga05.intel.com\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S933386AbdHYPg2 (ORCPT <rfc822;netdev@vger.kernel.org>);\n\tFri, 25 Aug 2017 11:36:28 -0400","from fmsmga002.fm.intel.com ([10.253.24.26])\n\tby fmsmga105.fm.intel.com with ESMTP; 25 Aug 2017 08:36:23 -0700","from orsmsx109.amr.corp.intel.com ([10.22.240.7])\n\tby fmsmga002.fm.intel.com with ESMTP; 25 Aug 2017 08:36:23 -0700","from orsmsx161.amr.corp.intel.com (10.22.240.84) by\n\tORSMSX109.amr.corp.intel.com (10.22.240.7) with Microsoft SMTP Server\n\t(TLS) id 14.3.319.2; Fri, 25 Aug 2017 08:36:23 -0700","from orsmsx103.amr.corp.intel.com ([169.254.5.176]) by\n\tORSMSX161.amr.corp.intel.com ([169.254.4.147]) with mapi id\n\t14.03.0319.002; Fri, 25 Aug 2017 08:36:23 -0700"],"X-ExtLoop1":"1","X-IronPort-AV":"E=Sophos;i=\"5.41,426,1498546800\"; d=\"scan'208\";a=\"1210437639\"","From":"\"Waskiewicz Jr, Peter\" <peter.waskiewicz.jr@intel.com>","To":"\"Keller, Jacob E\" <jacob.e.keller@intel.com>,\n\t\"netdev@vger.kernel.org\" <netdev@vger.kernel.org>","Subject":"Re: [RFC PATCH] net: limit maximum number of packets to mark with\n\txmit_more","Thread-Topic":"[RFC 
PATCH] net: limit maximum number of packets to mark with\n\txmit_more","Thread-Index":"AQHTHbZdwOG/pgeoO0mW5u5krC5yCQ==","Date":"Fri, 25 Aug 2017 15:36:22 +0000","Message-ID":"<E0D909EE5BB15A4699798539EA149D7F07793B3E@ORSMSX103.amr.corp.intel.com>","References":"<20170825152449.29790-1-jacob.e.keller@intel.com>","Accept-Language":"en-US","Content-Language":"en-US","X-MS-Has-Attach":"","X-MS-TNEF-Correlator":"","x-originating-ip":"[10.254.121.22]","Content-Type":"text/plain; charset=\"us-ascii\"","Content-Transfer-Encoding":"8BIT","MIME-Version":"1.0","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1757569,"web_url":"http://patchwork.ozlabs.org/comment/1757569/","msgid":"<20170825085816.3425a70c@xeon-e3>","list_archive_url":null,"date":"2017-08-25T15:58:16","subject":"Re: [RFC PATCH] net: limit maximum number of packets to mark with\n\txmit_more","submitter":{"id":21389,"url":"http://patchwork.ozlabs.org/api/people/21389/","name":"Stephen Hemminger","email":"stephen@networkplumber.org"},"content":"On Fri, 25 Aug 2017 15:36:22 +0000\n\"Waskiewicz Jr, Peter\" <peter.waskiewicz.jr@intel.com> wrote:\n\n> On 8/25/17 11:25 AM, Jacob Keller wrote:\n> > Under some circumstances, such as with many stacked devices, it is\n> > possible that dev_hard_start_xmit will bundle many packets together, and\n> > mark them all with xmit_more.\n> > \n> > Most drivers respond to xmit_more by skipping tail bumps on packet\n> > rings, or similar behavior as long as xmit_more is set. This is\n> > a performance win since it means drivers can avoid notifying hardware of\n> > new packets repeatedly, and thus avoid wasting unnecessary PCIe or other\n> > bandwidth.\n> > \n> > This use of xmit_more comes with a trade off because bundling too many\n> > packets can increase latency of the Tx packets. 
To avoid this, we should\n> > limit the maximum number of packets with xmit_more.\n> > \n> > Driver authors could modify their drivers to check for some determined\n> > limit, but this requires all drivers to be modified in order to gain\n> > advantage.\n> > \n> > Instead, add a sysctl \"xmit_more_max\" which can be used to configure the\n> > maximum number of xmit_more skbs to send in a sequence. This ensures\n> > that all drivers benefit, and allows system administrators the option to\n> > tune the value to their environment.\n> > \n> > Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>\n> > ---\n> > \n> > Stray thoughts and further questions....\n> > \n> > Is this the right approach? Did I miss any other places where we should\n> > limit? Does the limit make sense? Should it instead be a per-device\n> > tuning nob instead of a global? Is 32 a good default?  \n> \n> I actually like the idea of a per-device knob.  A xmit_more_max that's \n> global in a system with 1GbE devices along with a 25/50GbE or more just \n> doesn't make much sense to me.  
Or having heterogeneous vendor devices \n> in the same system that have different HW behaviors could mask issues \n> with latency.\n> \n> This seems like another incarnation of possible buffer-bloat if the max \n> is too high...\n> \n> > \n> >   Documentation/sysctl/net.txt |  6 ++++++\n> >   include/linux/netdevice.h    |  2 ++\n> >   net/core/dev.c               | 10 +++++++++-\n> >   net/core/sysctl_net_core.c   |  7 +++++++\n> >   4 files changed, 24 insertions(+), 1 deletion(-)\n> > \n> > diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt\n> > index b67044a2575f..3d995e8f4448 100644\n> > --- a/Documentation/sysctl/net.txt\n> > +++ b/Documentation/sysctl/net.txt\n> > @@ -230,6 +230,12 @@ netdev_max_backlog\n> >   Maximum number  of  packets,  queued  on  the  INPUT  side, when the interface\n> >   receives packets faster than kernel can process them.\n> >   \n> > +xmit_more_max\n> > +-------------\n> > +\n> > +Maximum number of packets in a row to mark with skb->xmit_more. A value of zero\n> > +indicates no limit.  \n> \n> What defines \"packet?\"  MTU-sized packets, or payloads coming down from \n> the stack (e.g. TSO's)?\n\nxmit_more is only a hint to the device. The device driver should ignore it unless\nthere are hardware advantages. 
The device driver is the place with HW specific\nknowledge (like 4 Tx descriptors is equivalent to one PCI transaction on this device).\n\nAnything that pushes that optimization out to the user is only useful for benchmarks\nand embedded devices.","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=networkplumber-org.20150623.gappssmtp.com\n\theader.i=@networkplumber-org.20150623.gappssmtp.com\n\theader.b=\"h0PSNepM\"; dkim-atps=neutral"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xf5Rm0TBVz9s8J\n\tfor <patchwork-incoming@ozlabs.org>;\n\tSat, 26 Aug 2017 01:58:28 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S934048AbdHYP6Z (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tFri, 25 Aug 2017 11:58:25 -0400","from mail-pg0-f46.google.com ([74.125.83.46]:33935 \"EHLO\n\tmail-pg0-f46.google.com\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S933926AbdHYP6Y (ORCPT\n\t<rfc822;netdev@vger.kernel.org>); Fri, 25 Aug 2017 11:58:24 -0400","by mail-pg0-f46.google.com with SMTP id a7so1172870pgn.1\n\tfor <netdev@vger.kernel.org>; Fri, 25 Aug 2017 08:58:24 -0700 (PDT)","from xeon-e3 (76-14-207-240.or.wavecable.com. 
[76.14.207.240])\n\tby smtp.gmail.com with ESMTPSA id\n\tz83sm12299572pfd.10.2017.08.25.08.58.23\n\t(version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256);\n\tFri, 25 Aug 2017 08:58:23 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=networkplumber-org.20150623.gappssmtp.com; s=20150623;\n\th=date:from:to:cc:subject:message-id:in-reply-to:references\n\t:mime-version:content-transfer-encoding;\n\tbh=S3q/qdDFgUWzmmtnP6DC5H3COaL+DGmLGYWINGhzsrs=;\n\tb=h0PSNepMhXjAl1sdSo/CR/+aNLBQxBwNUwzMYkGfjBtQ/14aKm/L0vh+zMVxcB6wuG\n\tNW3ostN5pho0T4FNJv87zO5Je5iSLZYPpnc9xh2UA5xEhNfiVkOd3L3HXtVrQ0cFHAIc\n\tfN6ajDC6pHKxQrjja4XNLI9O7sTh6SlBVHcDPECj5ljyw2cjWSP3OSwNx5hA4ushoPJF\n\tTllacWqP53VUqsmf6Irkhra8iBEZCJi6tziHVVNBAyh7ysexbAhvW/dijSav5COVRQ2W\n\tKRrPoMGpfNL4pLaUAeau5YWnmwQ+eYXNazmIGe4Vs11DfdGyiyfi0TMRXHviA+RFKKRC\n\tQkLQ==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to\n\t:references:mime-version:content-transfer-encoding;\n\tbh=S3q/qdDFgUWzmmtnP6DC5H3COaL+DGmLGYWINGhzsrs=;\n\tb=d2B1Rm4k8bf5O3Tb3zibbGx57oMzfl4N+MRkKCjAWzB4SWVS3opME23VcPgQc99z3E\n\tFPRpyqXADSyqnwHyCGGYAzPNJkEkx2aGYOvUR4fpQNH0Oy9hMC/DKze23GynO+nsUjDp\n\tgtzsci+A6udmr1VSKK8c33ixIpiwc5KvDFeot4CdvWNIqlIvmMdLZN0PhcPU33L34/dV\n\tpcvZ2vm9hQbd4A02qjy4sN5v57nSKWj10/Mlzib2bwSI0PIFa4DAK0yygs+64OLcFxpe\n\tsZOJIE0xUvMfp8Z6UG3ZWzSuGtQorW/GGd9i8AesdqYoS6X5WOjOT5Z0A9hCazScs1V6\n\tieaQ==","X-Gm-Message-State":"AHYfb5ihuivVSwr8MdbPSSx6TnunKXHe02CoMeZWIJKpj/LrjTfbp7g6\n\tkowmyeTFBvcV1SGn11+MIw==","X-Received":"by 10.99.98.3 with SMTP id w3mr10322067pgb.350.1503676703624;\n\tFri, 25 Aug 2017 08:58:23 -0700 (PDT)","Date":"Fri, 25 Aug 2017 08:58:16 -0700","From":"Stephen Hemminger <stephen@networkplumber.org>","To":"\"Waskiewicz Jr, Peter\" <peter.waskiewicz.jr@intel.com>","Cc":"\"Keller, Jacob E\" <jacob.e.keller@intel.com>,\n\t\"netdev@vger.kernel.org\" 
<netdev@vger.kernel.org>","Subject":"Re: [RFC PATCH] net: limit maximum number of packets to mark with\n\txmit_more","Message-ID":"<20170825085816.3425a70c@xeon-e3>","In-Reply-To":"<E0D909EE5BB15A4699798539EA149D7F07793B3E@ORSMSX103.amr.corp.intel.com>","References":"<20170825152449.29790-1-jacob.e.keller@intel.com>\n\t<E0D909EE5BB15A4699798539EA149D7F07793B3E@ORSMSX103.amr.corp.intel.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=US-ASCII","Content-Transfer-Encoding":"7bit","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1757591,"web_url":"http://patchwork.ozlabs.org/comment/1757591/","msgid":"<02874ECE860811409154E81DA85FBB5882A8A790@ORSMSX115.amr.corp.intel.com>","list_archive_url":null,"date":"2017-08-25T16:24:05","subject":"RE: [RFC PATCH] net: limit maximum number of packets to mark with\n\txmit_more","submitter":{"id":9784,"url":"http://patchwork.ozlabs.org/api/people/9784/","name":"Jacob Keller","email":"jacob.e.keller@intel.com"},"content":"> -----Original Message-----\n> From: Stephen Hemminger [mailto:stephen@networkplumber.org]\n> Sent: Friday, August 25, 2017 8:58 AM\n> To: Waskiewicz Jr, Peter <peter.waskiewicz.jr@intel.com>\n> Cc: Keller, Jacob E <jacob.e.keller@intel.com>; netdev@vger.kernel.org\n> Subject: Re: [RFC PATCH] net: limit maximum number of packets to mark with\n> xmit_more\n> \n> xmit_more is only a hint to the device. The device driver should ignore it unless\n> there are hardware advantages. The device driver is the place with HW specific\n> knowledge (like 4 Tx descriptors is equivalent to one PCI transaction on this\n> device).\n> \n> Anything that pushes that optimization out to the user is only useful for\n> benchmarks\n> and embedded devices.\n\nRight so most drivers I've seen simply take it as a \"avoid bumping tail of a ring\" whenever they see xmit_more. 
But unfortunately in some circumstances, this results in potentially several hundred packets being set with xmit_more in a row, and then the driver doesn't bump the tail for a long time, resulting in high latency spikes.\n\nI was trying to find a way to fix this potentially in multiple drivers, rather than just a single driver, since I figured the same sort of code might be needed.\n\nSo you're suggesting we should just perform some check in the device driver, even if it might be duplication?\n\nWe could also instead make it a setting in the netdev struct or something which would be set by the driver and then tell stack code to limit how many it sends at once (so that we don't need to duplicate that checking code in every driver?)\n\nThanks,\nJake","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xf61T3xrpz9rxm\n\tfor <patchwork-incoming@ozlabs.org>;\n\tSat, 26 Aug 2017 02:24:13 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S934281AbdHYQYL convert rfc822-to-8bit (ORCPT\n\t<rfc822;patchwork-incoming@ozlabs.org>);\n\tFri, 25 Aug 2017 12:24:11 -0400","from mga01.intel.com ([192.55.52.88]:45430 \"EHLO mga01.intel.com\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S934196AbdHYQYK (ORCPT <rfc822;netdev@vger.kernel.org>);\n\tFri, 25 Aug 2017 12:24:10 -0400","from orsmga005.jf.intel.com ([10.7.209.41])\n\tby fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;\n\t25 Aug 2017 09:24:10 -0700","from orsmsx109.amr.corp.intel.com ([10.22.240.7])\n\tby orsmga005.jf.intel.com with 
ESMTP; 25 Aug 2017 09:24:05 -0700","from orsmsx115.amr.corp.intel.com ([169.254.4.44]) by\n\tORSMSX109.amr.corp.intel.com ([169.254.11.118]) with mapi id\n\t14.03.0319.002; Fri, 25 Aug 2017 09:24:05 -0700"],"X-ExtLoop1":"1","X-IronPort-AV":"E=Sophos;i=\"5.41,426,1498546800\"; d=\"scan'208\";a=\"142014613\"","From":"\"Keller, Jacob E\" <jacob.e.keller@intel.com>","To":"Stephen Hemminger <stephen@networkplumber.org>,\n\t\"Waskiewicz Jr, Peter\" <peter.waskiewicz.jr@intel.com>","CC":"\"netdev@vger.kernel.org\" <netdev@vger.kernel.org>","Subject":"RE: [RFC PATCH] net: limit maximum number of packets to mark with\n\txmit_more","Thread-Topic":"[RFC PATCH] net: limit maximum number of packets to mark with\n\txmit_more","Thread-Index":"AQHTHbZSzzEGT3+EHUKfvXGnhfFugKKVr9kA//+QQcA=","Date":"Fri, 25 Aug 2017 16:24:05 +0000","Message-ID":"<02874ECE860811409154E81DA85FBB5882A8A790@ORSMSX115.amr.corp.intel.com>","References":"<20170825152449.29790-1-jacob.e.keller@intel.com>\n\t<E0D909EE5BB15A4699798539EA149D7F07793B3E@ORSMSX103.amr.corp.intel.com>\n\t<20170825085816.3425a70c@xeon-e3>","In-Reply-To":"<20170825085816.3425a70c@xeon-e3>","Accept-Language":"en-US","Content-Language":"en-US","X-MS-Has-Attach":"","X-MS-TNEF-Correlator":"","x-titus-metadata-40":"eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiMWUyNzdiMWYtNjQ3Ni00Y2YzLTljZmItYjg2YjRiNWFmOWFhIiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX0lDIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE2LjUuOS4zIiwiVHJ1c3RlZExhYmVsSGFzaCI6Ik5UYnBhRVN1XC9tTGlIXC9yMFZlUVYrTDZwNmJIcU1jQXo0SnA0bWU0YnBcL3M9In0=","x-ctpclassification":"CTP_IC","dlp-product":"dlpe-windows","dlp-version":"11.0.0.116","dlp-reaction":"no-action","x-originating-ip":"[10.22.254.138]","Content-Type":"text/plain; 
charset=\"us-ascii\"","Content-Transfer-Encoding":"8BIT","MIME-Version":"1.0","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1757710,"web_url":"http://patchwork.ozlabs.org/comment/1757710/","msgid":"<20170825153418.53864810@cakuba>","list_archive_url":null,"date":"2017-08-25T19:34:18","subject":"Re: [RFC PATCH] net: limit maximum number of packets to mark with\n\txmit_more","submitter":{"id":17220,"url":"http://patchwork.ozlabs.org/api/people/17220/","name":"Jakub Kicinski","email":"kubakici@wp.pl"},"content":"On Fri, 25 Aug 2017 08:24:49 -0700, Jacob Keller wrote:\n> Under some circumstances, such as with many stacked devices, it is\n> possible that dev_hard_start_xmit will bundle many packets together, and\n> mark them all with xmit_more.\n\nExcuse my ignorance but what are those stacked devices?  Could they\nperhaps be fixed somehow?  My intuition was that long xmit_more\nsequences can only happen if NIC and/or BQL are back pressuring, and\ntherefore we shouldn't be seeing a long xmit_more \"train\" arriving at\nan empty device ring...","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (1024-bit key;\n\tunprotected) header.d=wp.pl header.i=@wp.pl header.b=\"n+zA9ZPy\";\n\tdkim-atps=neutral"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xfBF02VJfz9sPm\n\tfor <patchwork-incoming@ozlabs.org>;\n\tSat, 26 Aug 2017 05:34:28 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1758351AbdHYTeZ (ORCPT 
<rfc822;patchwork-incoming@ozlabs.org>);\n\tFri, 25 Aug 2017 15:34:25 -0400","from mx4.wp.pl ([212.77.101.12]:13419 \"EHLO mx4.wp.pl\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S1758345AbdHYTeZ (ORCPT <rfc822;netdev@vger.kernel.org>);\n\tFri, 25 Aug 2017 15:34:25 -0400","(wp-smtpd smtp.wp.pl 40953 invoked from network);\n\t25 Aug 2017 21:34:22 +0200","from 50-225-197-140-static.hfc.comcastbusiness.net (HELO cakuba)\n\t(kubakici@wp.pl@[50.225.197.140])\n\t(envelope-sender <kubakici@wp.pl>)\n\tby smtp.wp.pl (WP-SMTPD) with ECDHE-RSA-AES256-GCM-SHA384 encrypted\n\tSMTP for <jacob.e.keller@intel.com>; 25 Aug 2017 21:34:22 +0200"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=wp.pl; s=1024a;\n\tt=1503689662; bh=PaUasK7ZUB7Twrarre5NZVcwCoVJiR+J4FSnqF/p5Zo=;\n\th=From:To:Cc:Subject;\n\tb=n+zA9ZPyt01NuWp0UIk+TXgS1NiOIaoJh3REt4klSkDvD+ub+omHV/UGO3Aoi2mK5\n\tfi4tDp8SX8lYKsa346Zjhjfhd6HWt8IWegaMq+34QsSGPkXM6voQzJ6MDeq5G5Hymt\n\tFfADBOEIs64afMpEowq2OmsOxbFIeY/gV/yGXJHU=","Date":"Fri, 25 Aug 2017 15:34:18 -0400","From":"Jakub Kicinski <kubakici@wp.pl>","To":"Jacob Keller <jacob.e.keller@intel.com>","Cc":"netdev@vger.kernel.org","Subject":"Re: [RFC PATCH] net: limit maximum number of packets to mark with\n\txmit_more","Message-ID":"<20170825153418.53864810@cakuba>","In-Reply-To":"<20170825152449.29790-1-jacob.e.keller@intel.com>","References":"<20170825152449.29790-1-jacob.e.keller@intel.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=US-ASCII","Content-Transfer-Encoding":"7bit","X-WP-MailID":"65278eb1453874049bc9f4335d6ecaae","X-WP-AV":"skaner antywirusowy Poczty Wirtualnej Polski","X-WP-SPAM":"NO 0000000 [gWMU]                               
","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1757821,"web_url":"http://patchwork.ozlabs.org/comment/1757821/","msgid":"<CAKgT0UdC_8hTbc-p6cQPNy4=gbFnodZ3ZyKts329Yu8kiv-ZWw@mail.gmail.com>","list_archive_url":null,"date":"2017-08-25T22:33:48","subject":"Re: [RFC PATCH] net: limit maximum number of packets to mark with\n\txmit_more","submitter":{"id":252,"url":"http://patchwork.ozlabs.org/api/people/252/","name":"Alexander Duyck","email":"alexander.duyck@gmail.com"},"content":"On Fri, Aug 25, 2017 at 8:58 AM, Stephen Hemminger\n<stephen@networkplumber.org> wrote:\n> On Fri, 25 Aug 2017 15:36:22 +0000\n> \"Waskiewicz Jr, Peter\" <peter.waskiewicz.jr@intel.com> wrote:\n>\n>> On 8/25/17 11:25 AM, Jacob Keller wrote:\n>> > Under some circumstances, such as with many stacked devices, it is\n>> > possible that dev_hard_start_xmit will bundle many packets together, and\n>> > mark them all with xmit_more.\n>> >\n>> > Most drivers respond to xmit_more by skipping tail bumps on packet\n>> > rings, or similar behavior as long as xmit_more is set. This is\n>> > a performance win since it means drivers can avoid notifying hardware of\n>> > new packets repeat daily, and thus avoid wasting unnecessary PCIe or other\n>> > bandwidth.\n>> >\n>> > This use of xmit_more comes with a trade off because bundling too many\n>> > packets can increase latency of the Tx packets. To avoid this, we should\n>> > limit the maximum number of packets with xmit_more.\n>> >\n>> > Driver authors could modify their drivers to check for some determined\n>> > limit, but this requires all drivers to be modified in order to gain\n>> > advantage.\n>> >\n>> > Instead, add a sysctl \"xmit_more_max\" which can be used to configure the\n>> > maximum number of xmit_more skbs to send in a sequence. 
This ensures\n>> > that all drivers benefit, and allows system administrators the option to\n>> > tune the value to their environment.\n>> >\n>> > Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>\n>> > ---\n>> >\n>> > Stray thoughts and further questions....\n>> >\n>> > Is this the right approach? Did I miss any other places where we should\n>> > limit? Does the limit make sense? Should it instead be a per-device\n>> > tuning nob instead of a global? Is 32 a good default?\n>>\n>> I actually like the idea of a per-device knob.  A xmit_more_max that's\n>> global in a system with 1GbE devices along with a 25/50GbE or more just\n>> doesn't make much sense to me.  Or having heterogeneous vendor devices\n>> in the same system that have different HW behaviors could mask issues\n>> with latency.\n>>\n>> This seems like another incarnation of possible buffer-bloat if the max\n>> is too high...\n>>\n>> >\n>> >   Documentation/sysctl/net.txt |  6 ++++++\n>> >   include/linux/netdevice.h    |  2 ++\n>> >   net/core/dev.c               | 10 +++++++++-\n>> >   net/core/sysctl_net_core.c   |  7 +++++++\n>> >   4 files changed, 24 insertions(+), 1 deletion(-)\n>> >\n>> > diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt\n>> > index b67044a2575f..3d995e8f4448 100644\n>> > --- a/Documentation/sysctl/net.txt\n>> > +++ b/Documentation/sysctl/net.txt\n>> > @@ -230,6 +230,12 @@ netdev_max_backlog\n>> >   Maximum number  of  packets,  queued  on  the  INPUT  side, when the interface\n>> >   receives packets faster than kernel can process them.\n>> >\n>> > +xmit_more_max\n>> > +-------------\n>> > +\n>> > +Maximum number of packets in a row to mark with skb->xmit_more. A value of zero\n>> > +indicates no limit.\n>>\n>> What defines \"packet?\"  MTU-sized packets, or payloads coming down from\n>> the stack (e.g. TSO's)?\n>\n> xmit_more is only a hint to the device. The device driver should ignore it unless\n> there are hardware advantages. 
The device driver is the place with HW specific\n> knowledge (like 4 Tx descriptors is equivalent to one PCI transaction on this device).\n>\n> Anything that pushes that optimization out to the user is only useful for benchmarks\n> and embedded devices.\n\nActually I think I might have an idea what is going on here and I\nagree that this is probably something that needs to be fixed in the\ndrivers. Especially since the problem isn't so much the skbs but\ndescriptors in the descriptor ring.\n\nIf I am not mistaken the issue is most drivers will honor the\nxmit_more unless the ring cannot enqueue another packet. The problem\nis if the clean-up is occurring on a different CPU than transmit we\ncan cause the clean-up CPU/device DMA to go idle by not providing any\nnotifications to the device that new packets are present. What we\nshould probably do is look at adding another condition which is to\nforce us to flush the packet if we have used over half of the\ndescriptors in a given ring without notifying the device. 
Then that\nway we can be filling half while the device is processing the other\nhalf which should result in us operating smoothly.\n\n- Alex","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"JKLq5OJE\"; dkim-atps=neutral"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xfGD068NXz9t3k\n\tfor <patchwork-incoming@ozlabs.org>;\n\tSat, 26 Aug 2017 08:33:52 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S935387AbdHYWdu (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tFri, 25 Aug 2017 18:33:50 -0400","from mail-qk0-f196.google.com ([209.85.220.196]:38473 \"EHLO\n\tmail-qk0-f196.google.com\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S934274AbdHYWdt (ORCPT\n\t<rfc822;netdev@vger.kernel.org>); Fri, 25 Aug 2017 18:33:49 -0400","by mail-qk0-f196.google.com with SMTP id o63so1001156qkb.5\n\tfor <netdev@vger.kernel.org>; Fri, 25 Aug 2017 15:33:49 -0700 (PDT)","by 10.140.88.139 with HTTP; Fri, 25 Aug 2017 15:33:48 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=gmail.com; s=20161025;\n\th=mime-version:in-reply-to:references:from:date:message-id:subject:to\n\t:cc; 
bh=p4AAN/XMeLLhoP11XHstqgyELM37VGiVMAAqiufAxp4=;\n\tb=JKLq5OJE8yc7pocnmdtYo/WDzZXcqTvUtZ4F3G3plt5SXcnVaMiO+TQ2wdcLoKRaOU\n\tIvBfSFZnXRFZQIplqGOU7uDQxzNXM1+HkEySttrIGer/QgA2xIk6kbcWcBA+pmg+uU9v\n\tBB07Tp8dq6vqqp5YihYA141EK52cqQEGr6IjdiqDEVQ6sCU5B4McqazSRy1EtP/GwCue\n\tD0QZeLjtJ4omdPWQSNNiFWHsXYCCWALcOXhqxnT6V+minxkF4WzXpuhQoERwGbIVmTXA\n\tq9bkLA1H7bNnSHRm9P63FD6084ylQoflLXgfhGOe5frW369qkohRKnU5IdkGo1kw71qd\n\ttmSQ==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:mime-version:in-reply-to:references:from:date\n\t:message-id:subject:to:cc;\n\tbh=p4AAN/XMeLLhoP11XHstqgyELM37VGiVMAAqiufAxp4=;\n\tb=icg6TjNWr993wc/oAwiDdrp+gREjw1OgtUL+CdlmlV47c0JRiUg27V8RlcZdB7Ru8H\n\tU7O5WG+5WmqXKFN+lzEcqpOcyYoZfxO22rJee8+atQomYT/JsiyIUF4EB0oWw2QFtSpO\n\tZ1+7ChnP2YkfAGqGB0oMrgQdvCOyp9224K7NgVpqWF4FkcJIA+tXSocpN7m8fiS04NOc\n\t3hAInjahZd5TWcAJRxQXBL5xViV76beyPvLZnSgKkrvReHm3saaFaiHp7F/tW++6qduT\n\t6qShzjMr39t+LLs4eB4MrVw1RyXYs+ZxP1r0wg/KzSZP7SDJvRS4UmzOjpsY/WPOkOV0\n\tFxeg==","X-Gm-Message-State":"AHYfb5ia6WDd46AGLLyq2JlIRXx4YcQpTBM+N8xJj9KdT21xZ8u5lhKr\n\tsj3rhwLN2gtnHWj+n6eW/Nc4hhrDbA==","X-Received":"by 10.55.21.129 with SMTP id 1mr1809005qkv.45.1503700428751; Fri,\n\t25 Aug 2017 15:33:48 -0700 (PDT)","MIME-Version":"1.0","In-Reply-To":"<20170825085816.3425a70c@xeon-e3>","References":"<20170825152449.29790-1-jacob.e.keller@intel.com>\n\t<E0D909EE5BB15A4699798539EA149D7F07793B3E@ORSMSX103.amr.corp.intel.com>\n\t<20170825085816.3425a70c@xeon-e3>","From":"Alexander Duyck <alexander.duyck@gmail.com>","Date":"Fri, 25 Aug 2017 15:33:48 -0700","Message-ID":"<CAKgT0UdC_8hTbc-p6cQPNy4=gbFnodZ3ZyKts329Yu8kiv-ZWw@mail.gmail.com>","Subject":"Re: [RFC PATCH] net: limit maximum number of packets to mark with\n\txmit_more","To":"Stephen Hemminger <stephen@networkplumber.org>","Cc":"\"Waskiewicz Jr, Peter\" <peter.waskiewicz.jr@intel.com>,\n\t\"Keller, Jacob E\" <jacob.e.keller@intel.com>,\n\t\"netdev@vger.kernel.org\" 
<netdev@vger.kernel.org>","Content-Type":"text/plain; charset=\"UTF-8\"","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1758838,"web_url":"http://patchwork.ozlabs.org/comment/1758838/","msgid":"<02874ECE860811409154E81DA85FBB5882A8FFD9@ORSMSX115.amr.corp.intel.com>","list_archive_url":null,"date":"2017-08-28T20:46:25","subject":"RE: [RFC PATCH] net: limit maximum number of packets to mark with\n\txmit_more","submitter":{"id":9784,"url":"http://patchwork.ozlabs.org/api/people/9784/","name":"Jacob Keller","email":"jacob.e.keller@intel.com"},"content":"> -----Original Message-----\r\n> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On\r\n> Behalf Of Alexander Duyck\r\n> Sent: Friday, August 25, 2017 3:34 PM\r\n> To: Stephen Hemminger <stephen@networkplumber.org>\r\n> Cc: Waskiewicz Jr, Peter <peter.waskiewicz.jr@intel.com>; Keller, Jacob E\r\n> <jacob.e.keller@intel.com>; netdev@vger.kernel.org\r\n> Subject: Re: [RFC PATCH] net: limit maximum number of packets to mark with\r\n> xmit_more\r\n> \r\n> On Fri, Aug 25, 2017 at 8:58 AM, Stephen Hemminger\r\n> <stephen@networkplumber.org> wrote:\r\n> > On Fri, 25 Aug 2017 15:36:22 +0000\r\n> > \"Waskiewicz Jr, Peter\" <peter.waskiewicz.jr@intel.com> wrote:\r\n> >\r\n> >> On 8/25/17 11:25 AM, Jacob Keller wrote:\r\n> >> > Under some circumstances, such as with many stacked devices, it is\r\n> >> > possible that dev_hard_start_xmit will bundle many packets together, and\r\n> >> > mark them all with xmit_more.\r\n> >> >\r\n> >> > Most drivers respond to xmit_more by skipping tail bumps on packet\r\n> >> > rings, or similar behavior as long as xmit_more is set. 
This is\r\n> >> > a performance win since it means drivers can avoid notifying hardware of\r\n> >> > new packets repeatedly, and thus avoid wasting unnecessary PCIe or other\r\n> >> > bandwidth.\r\n> >> >\r\n> >> > This use of xmit_more comes with a trade off because bundling too many\r\n> >> > packets can increase latency of the Tx packets. To avoid this, we should\r\n> >> > limit the maximum number of packets with xmit_more.\r\n> >> >\r\n> >> > Driver authors could modify their drivers to check for some determined\r\n> >> > limit, but this requires all drivers to be modified in order to gain\r\n> >> > advantage.\r\n> >> >\r\n> >> > Instead, add a sysctl \"xmit_more_max\" which can be used to configure the\r\n> >> > maximum number of xmit_more skbs to send in a sequence. This ensures\r\n> >> > that all drivers benefit, and allows system administrators the option to\r\n> >> > tune the value to their environment.\r\n> >> >\r\n> >> > Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>\r\n> >> > ---\r\n> >> >\r\n> >> > Stray thoughts and further questions....\r\n> >> >\r\n> >> > Is this the right approach? Did I miss any other places where we should\r\n> >> > limit? Does the limit make sense? Should it instead be a per-device\r\n> >> > tuning knob instead of a global? Is 32 a good default?\r\n> >>\r\n> >> I actually like the idea of a per-device knob.  A xmit_more_max that's\r\n> >> global in a system with 1GbE devices along with a 25/50GbE or more just\r\n> >> doesn't make much sense to me.  
Or having heterogeneous vendor devices\r\n> >> in the same system that have different HW behaviors could mask issues\r\n> >> with latency.\r\n> >>\r\n> >> This seems like another incarnation of possible buffer-bloat if the max\r\n> >> is too high...\r\n> >>\r\n> >> >\r\n> >> >   Documentation/sysctl/net.txt |  6 ++++++\r\n> >> >   include/linux/netdevice.h    |  2 ++\r\n> >> >   net/core/dev.c               | 10 +++++++++-\r\n> >> >   net/core/sysctl_net_core.c   |  7 +++++++\r\n> >> >   4 files changed, 24 insertions(+), 1 deletion(-)\r\n> >> >\r\n> >> > diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt\r\n> >> > index b67044a2575f..3d995e8f4448 100644\r\n> >> > --- a/Documentation/sysctl/net.txt\r\n> >> > +++ b/Documentation/sysctl/net.txt\r\n> >> > @@ -230,6 +230,12 @@ netdev_max_backlog\r\n> >> >   Maximum number  of  packets,  queued  on  the  INPUT  side, when the\r\n> interface\r\n> >> >   receives packets faster than kernel can process them.\r\n> >> >\r\n> >> > +xmit_more_max\r\n> >> > +-------------\r\n> >> > +\r\n> >> > +Maximum number of packets in a row to mark with skb->xmit_more. A value\r\n> of zero\r\n> >> > +indicates no limit.\r\n> >>\r\n> >> What defines \"packet?\"  MTU-sized packets, or payloads coming down from\r\n> >> the stack (e.g. TSO's)?\r\n> >\r\n> > xmit_more is only a hint to the device. The device driver should ignore it unless\r\n> > there are hardware advantages. The device driver is the place with HW specific\r\n> > knowledge (like 4 Tx descriptors is equivalent to one PCI transaction on this\r\n> device).\r\n> >\r\n> > Anything that pushes that optimization out to the user is only useful for\r\n> benchmarks\r\n> > and embedded devices.\r\n> \r\n> Actually I think I might have an idea what is going on here and I\r\n> agree that this is probably something that needs to be fixed in the\r\n> drivers. 
Especially since the problem isn't so much the skbs but\r\n> descriptors in the descriptor ring.\r\n> \r\n> If I am not mistaken the issue is most drivers will honor the\r\n> xmit_more unless the ring cannot enqueue another packet. The problem\r\n> is if the clean-up is occurring on a different CPU than transmit we\r\n> can cause the clean-up CPU/device DMA to go idle by not providing any\r\n> notifications to the device that new packets are present. What we\r\n> should probably do is look at adding another condition which is to\r\n> force us to flush the packet if we have used over half of the\r\n> descriptors in a given ring without notifying the device. Then that\r\n> way we can be filling half while the device is processing the other\r\n> half which should result in us operating smoothly.\r\n> \r\n> - Alex\r\n\r\nOk, and that definitely is driver specific, so I would be comfortable leaving that up to driver implementation. I'll look at creating a patch to do something like this for i40e.\r\n\r\nThanks,\r\nJake","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xh3jB4V3Wz9sMN\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue, 29 Aug 2017 06:46:54 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751368AbdH1Uq1 (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tMon, 28 Aug 2017 16:46:27 -0400","from mga14.intel.com ([192.55.52.115]:59971 \"EHLO mga14.intel.com\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S1751208AbdH1Uq0 (ORCPT <rfc822;netdev@vger.kernel.org>);\n\tMon, 28 Aug 2017 16:46:26 
-0400","from fmsmga004.fm.intel.com ([10.253.24.48])\n\tby fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;\n\t28 Aug 2017 13:46:26 -0700","from orsmsx104.amr.corp.intel.com ([10.22.225.131])\n\tby fmsmga004.fm.intel.com with ESMTP; 28 Aug 2017 13:46:26 -0700","from orsmsx115.amr.corp.intel.com ([169.254.4.44]) by\n\tORSMSX104.amr.corp.intel.com ([169.254.4.142]) with mapi id\n\t14.03.0319.002; Mon, 28 Aug 2017 13:46:25 -0700"],"X-ExtLoop1":"1","X-IronPort-AV":"E=Sophos;i=\"5.41,442,1498546800\"; d=\"scan'208\";a=\"305338929\"","From":"\"Keller, Jacob E\" <jacob.e.keller@intel.com>","To":"Alexander Duyck <alexander.duyck@gmail.com>,\n\tStephen Hemminger <stephen@networkplumber.org>","CC":"\"Waskiewicz Jr, Peter\" <peter.waskiewicz.jr@intel.com>,\n\t\"netdev@vger.kernel.org\" <netdev@vger.kernel.org>","Subject":"RE: [RFC PATCH] net: limit maximum number of packets to mark with\n\txmit_more","Thread-Topic":"[RFC PATCH] net: limit maximum number of packets to mark with\n\txmit_more","Thread-Index":"AQHTHbZSzzEGT3+EHUKfvXGnhfFugKKVr9kAgABuggCABCMxAA==","Date":"Mon, 28 Aug 2017 20:46:25 
+0000","Message-ID":"<02874ECE860811409154E81DA85FBB5882A8FFD9@ORSMSX115.amr.corp.intel.com>","References":"<20170825152449.29790-1-jacob.e.keller@intel.com>\n\t<E0D909EE5BB15A4699798539EA149D7F07793B3E@ORSMSX103.amr.corp.intel.com>\n\t<20170825085816.3425a70c@xeon-e3>\n\t<CAKgT0UdC_8hTbc-p6cQPNy4=gbFnodZ3ZyKts329Yu8kiv-ZWw@mail.gmail.com>","In-Reply-To":"<CAKgT0UdC_8hTbc-p6cQPNy4=gbFnodZ3ZyKts329Yu8kiv-ZWw@mail.gmail.com>","Accept-Language":"en-US","Content-Language":"en-US","X-MS-Has-Attach":"","X-MS-TNEF-Correlator":"","x-titus-metadata-40":"eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiZDNlNzcxZGYtNDUyYS00MDNkLWI1NzMtNDI1OTg5OTIxMGI2IiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX0lDIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE2LjUuOS4zIiwiVHJ1c3RlZExhYmVsSGFzaCI6IisyXC9aM1FIcUhhTExHd29NQWNPbkwxaGRvNzhBQVZ3QU1ubXpGVktPMExrPSJ9","x-ctpclassification":"CTP_IC","dlp-product":"dlpe-windows","dlp-version":"11.0.0.116","dlp-reaction":"no-action","x-originating-ip":"[10.22.254.138]","Content-Type":"text/plain; charset=\"utf-8\"","Content-Transfer-Encoding":"base64","MIME-Version":"1.0","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1758841,"web_url":"http://patchwork.ozlabs.org/comment/1758841/","msgid":"<02874ECE860811409154E81DA85FBB5882A91059@ORSMSX115.amr.corp.intel.com>","list_archive_url":null,"date":"2017-08-28T20:56:17","subject":"RE: [RFC PATCH] net: limit maximum number of packets to mark with\n\txmit_more","submitter":{"id":9784,"url":"http://patchwork.ozlabs.org/api/people/9784/","name":"Jacob Keller","email":"jacob.e.keller@intel.com"},"content":"> -----Original Message-----\n> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On\n> Behalf Of Jakub Kicinski\n> Sent: Friday, August 25, 2017 12:34 PM\n> To: Keller, Jacob E 
<jacob.e.keller@intel.com>\n> Cc: netdev@vger.kernel.org\n> Subject: Re: [RFC PATCH] net: limit maximum number of packets to mark with\n> xmit_more\n> \n> On Fri, 25 Aug 2017 08:24:49 -0700, Jacob Keller wrote:\n> > Under some circumstances, such as with many stacked devices, it is\n> > possible that dev_hard_start_xmit will bundle many packets together, and\n> > mark them all with xmit_more.\n> \n> Excuse my ignorance but what are those stacked devices?  Could they\n> perhaps be fixed somehow?  My intuition was that long xmit_more\n> sequences can only happen if NIC and/or BQL are back pressuring, and\n> therefore we shouldn't be seeing a long xmit_more \"train\" arriving at\n> an empty device ring...\n\na veth device connecting a VM to the host, then connected to a bridge, which is connected to a vlan interface connected to a bond, which is hooked in active-backup to a physical device.\n\nSorry if I don't really know the correct way to refer to these, I just think of them as devices stacked on top of each other.\n\nDuring root cause investigation I found that we (the i40e driver) sometimes received up to 100 or more SKBs in a row with xmit_more set. We were incorrectly also using xmit_more as a hint for not marking packets to get writebacks, which caused significant throughput issues. 
Additionally there was concern that so many packets in a row without a tail bump would cause latency issues, so I thought maybe it was best to simply guarantee that the stack didn't send us too many packets marked with xmit_more at once.\n\nBased on the discussion, it seems that it should be up to the driver to determine exactly how to handle the xmit_more hint and to decide when it actually is helpful, so I do not think this patch makes sense now.\n\nThanks,\nJake","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xh3w734glz9s9Y\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue, 29 Aug 2017 06:56:23 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751231AbdH1U4U convert rfc822-to-8bit (ORCPT\n\t<rfc822;patchwork-incoming@ozlabs.org>);\n\tMon, 28 Aug 2017 16:56:20 -0400","from mga01.intel.com ([192.55.52.88]:33406 \"EHLO mga01.intel.com\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S1751182AbdH1U4T (ORCPT <rfc822;netdev@vger.kernel.org>);\n\tMon, 28 Aug 2017 16:56:19 -0400","from fmsmga002.fm.intel.com ([10.253.24.26])\n\tby fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;\n\t28 Aug 2017 13:56:19 -0700","from orsmsx101.amr.corp.intel.com ([10.22.225.128])\n\tby fmsmga002.fm.intel.com with ESMTP; 28 Aug 2017 13:56:19 -0700","from orsmsx154.amr.corp.intel.com (10.22.226.12) by\n\tORSMSX101.amr.corp.intel.com (10.22.225.128) with Microsoft SMTP\n\tServer (TLS) id 14.3.319.2; Mon, 28 Aug 2017 13:56:18 -0700","from orsmsx115.amr.corp.intel.com ([169.254.4.44]) 
by\n\tORSMSX154.amr.corp.intel.com ([169.254.11.235]) with mapi id\n\t14.03.0319.002; Mon, 28 Aug 2017 13:56:18 -0700"],"X-ExtLoop1":"1","X-IronPort-AV":"E=Sophos;i=\"5.41,442,1498546800\"; d=\"scan'208\";a=\"1211586887\"","From":"\"Keller, Jacob E\" <jacob.e.keller@intel.com>","To":"Jakub Kicinski <kubakici@wp.pl>","CC":"\"netdev@vger.kernel.org\" <netdev@vger.kernel.org>","Subject":"RE: [RFC PATCH] net: limit maximum number of packets to mark with\n\txmit_more","Thread-Topic":"[RFC PATCH] net: limit maximum number of packets to mark with\n\txmit_more","Thread-Index":"AQHTHbZSzzEGT3+EHUKfvXGnhfFugKKV7DQAgARXSyA=","Date":"Mon, 28 Aug 2017 20:56:17 +0000","Message-ID":"<02874ECE860811409154E81DA85FBB5882A91059@ORSMSX115.amr.corp.intel.com>","References":"<20170825152449.29790-1-jacob.e.keller@intel.com>\n\t<20170825153418.53864810@cakuba>","In-Reply-To":"<20170825153418.53864810@cakuba>","Accept-Language":"en-US","Content-Language":"en-US","X-MS-Has-Attach":"","X-MS-TNEF-Correlator":"","x-titus-metadata-40":"eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiNWI0YzBkMTQtNGVlMi00OWNlLTkwOTYtMDBiZTY1YzU1YWYxIiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX0lDIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE2LjUuOS4zIiwiVHJ1c3RlZExhYmVsSGFzaCI6Ilp0MExTK0hyQ2FRQUw2bnhTT0hraWQ3VXNRWFRiNG8rY05rYWtpd1dsc1k9In0=","x-ctpclassification":"CTP_IC","dlp-product":"dlpe-windows","dlp-version":"11.0.0.116","dlp-reaction":"no-action","x-originating-ip":"[10.22.254.138]","Content-Type":"text/plain; 
charset=\"us-ascii\"","Content-Transfer-Encoding":"8BIT","MIME-Version":"1.0","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1759342,"web_url":"http://patchwork.ozlabs.org/comment/1759342/","msgid":"<063D6719AE5E284EB5DD2968C1650D6DD00684AC@AcuExch.aculab.com>","list_archive_url":null,"date":"2017-08-29T13:35:49","subject":"RE: [RFC PATCH] net: limit maximum number of packets to mark with\n\txmit_more","submitter":{"id":6689,"url":"http://patchwork.ozlabs.org/api/people/6689/","name":"David Laight","email":"David.Laight@ACULAB.COM"},"content":"From: Jakub Kicinski\n> Sent: 25 August 2017 20:34\n>\n> On Fri, 25 Aug 2017 08:24:49 -0700, Jacob Keller wrote:\n> > Under some circumstances, such as with many stacked devices, it is\n> > possible that dev_hard_start_xmit will bundle many packets together, and\n> > mark them all with xmit_more.\n> \n> Excuse my ignorance but what are those stacked devices?  Could they\n> perhaps be fixed somehow?  
My intuition was that long xmit_more\n> sequences can only happen if NIC and/or BQL are back pressuring, and\n> therefore we shouldn't be seeing a long xmit_more \"train\" arriving at\n> an empty device ring...\n\nI also suspect that the packets could be coming from multiple sources.\nSo getting the sources to limit the number of packets with XMIT_MORE\nset won't really solve any problem.\n\nAt some point the driver for the physical device will have to give it\na kick to start the transmits.\n\nOn the systems I've got (desktop x86) PCIe writes aren't really very\nexpensive.\nReads are a different matter entirely (2us into our fpga target).\n\n\tDavid.","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xhV5Y72VZz9t2v\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue, 29 Aug 2017 23:36:01 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751625AbdH2Nf7 convert rfc822-to-8bit (ORCPT\n\t<rfc822;patchwork-incoming@ozlabs.org>);\n\tTue, 29 Aug 2017 09:35:59 -0400","from smtp-out6.electric.net ([192.162.217.182]:57021 \"EHLO\n\tsmtp-out6.electric.net\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S1751368AbdH2Nf6 (ORCPT\n\t<rfc822;netdev@vger.kernel.org>); Tue, 29 Aug 2017 09:35:58 -0400","from 1dmggL-0005nV-TB by out6a.electric.net with emc1-ok (Exim\n\t4.87) (envelope-from <David.Laight@ACULAB.COM>)\n\tid 1dmggL-0005ro-V4; Tue, 29 Aug 2017 06:35:53 -0700","by emcmailer; Tue, 29 Aug 2017 06:35:53 -0700","from [156.67.243.126] (helo=AcuExch.aculab.com)\n\tby out6a.electric.net with esmtps 
(TLSv1:AES128-SHA:128)\n\t(Exim 4.87) (envelope-from <David.Laight@ACULAB.COM>)\n\tid 1dmggL-0005nV-TB; Tue, 29 Aug 2017 06:35:53 -0700","from ACUEXCH.Aculab.com ([::1]) by AcuExch.aculab.com ([::1]) with\n\tmapi id 14.03.0123.003; Tue, 29 Aug 2017 14:35:50 +0100"],"From":"David Laight <David.Laight@ACULAB.COM>","To":"'Jakub Kicinski' <kubakici@wp.pl>,\n\tJacob Keller <jacob.e.keller@intel.com>","CC":"\"netdev@vger.kernel.org\" <netdev@vger.kernel.org>","Subject":"RE: [RFC PATCH] net: limit maximum number of packets to mark with\n\txmit_more","Thread-Topic":"[RFC PATCH] net: limit maximum number of packets to mark with\n\txmit_more","Thread-Index":"AQHTHdkrVF6OItdRtkG8lRFYS/tVNqKbWlfg","Date":"Tue, 29 Aug 2017 13:35:49 +0000","Message-ID":"<063D6719AE5E284EB5DD2968C1650D6DD00684AC@AcuExch.aculab.com>","References":"<20170825152449.29790-1-jacob.e.keller@intel.com>\n\t<20170825153418.53864810@cakuba>","In-Reply-To":"<20170825153418.53864810@cakuba>","Accept-Language":"en-GB, en-US","Content-Language":"en-US","X-MS-Has-Attach":"","X-MS-TNEF-Correlator":"","x-originating-ip":"[10.202.99.200]","Content-Type":"text/plain; charset=\"Windows-1252\"","Content-Transfer-Encoding":"8BIT","MIME-Version":"1.0","X-Outbound-IP":"156.67.243.126","X-Env-From":"David.Laight@ACULAB.COM","X-Proto":"esmtps","X-Revdns":"","X-HELO":"AcuExch.aculab.com","X-TLS":"TLSv1:AES128-SHA:128","X-Authenticated_ID":"","X-PolicySMART":"3396946, 3397078","X-Virus-Status":["Scanned by VirusSMART (c)","Scanned by VirusSMART (s)"],"Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}}]