Patch Detail
get:
Show a patch.
patch:
Update a patch.
put:
Update a patch.
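A minimal sketch of how these three verbs might be called from a client, using only the Python standard library. The helper names, the token-style `Authorization` header, and the choice of `state` as the field to update are assumptions for illustration; check the Patchwork instance you are talking to for the fields and credentials it actually accepts. The functions only build request objects and send nothing.

```python
import json
import urllib.request

# Base URL of the instance queried on this page.
BASE_URL = "http://patchwork.ozlabs.org/api"


def build_show_request(patch_id):
    """Build (but do not send) a GET request that shows one patch."""
    return urllib.request.Request(
        f"{BASE_URL}/patches/{patch_id}/",
        headers={"Accept": "application/json"},
        method="GET",
    )


def build_update_request(patch_id, fields, token, partial=True):
    """Build (but do not send) an update request for one patch.

    partial=True uses PATCH (send only the changed fields);
    partial=False uses PUT. The token header format is an assumption.
    """
    return urllib.request.Request(
        f"{BASE_URL}/patches/{patch_id}/",
        data=json.dumps(fields).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {token}",
        },
        method="PATCH" if partial else "PUT",
    )


show = build_show_request(584324)
update = build_update_request(584324, {"state": "accepted"}, token="PLACEHOLDER")
```

Either request object can then be passed to `urllib.request.urlopen()`; updates are expected to need an account with sufficient rights on the project.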
GET /api/patches/584324/?format=api
{ "id": 584324, "url": "http://patchwork.ozlabs.org/api/patches/584324/?format=api", "web_url": "http://patchwork.ozlabs.org/project/intel-wired-lan/patch/20160217190302.10339.18783.stgit@localhost.localdomain/", "project": { "id": 46, "url": "http://patchwork.ozlabs.org/api/projects/46/?format=api", "name": "Intel Wired Ethernet development", "link_name": "intel-wired-lan", "list_id": "intel-wired-lan.osuosl.org", "list_email": "intel-wired-lan@osuosl.org", "web_url": "", "scm_url": "", "webscm_url": "", "list_archive_url": "", "list_archive_url_format": "", "commit_url_format": "" }, "msgid": "<20160217190302.10339.18783.stgit@localhost.localdomain>", "list_archive_url": null, "date": "2016-02-17T19:03:02", "name": "[next,4/4] i40e/i40evf: Allow up to 12K bytes of data per Tx descriptor instead of 8K", "commit_ref": null, "pull_url": null, "state": "changes-requested", "archived": false, "hash": "6c9005925e95e12fc1d8f568f2f28dc52915372b", "submitter": { "id": 67293, "url": "http://patchwork.ozlabs.org/api/people/67293/?format=api", "name": "Alexander Duyck", "email": "aduyck@mirantis.com" }, "delegate": { "id": 68, "url": "http://patchwork.ozlabs.org/api/users/68/?format=api", "username": "jtkirshe", "first_name": "Jeff", "last_name": "Kirsher", "email": "jeffrey.t.kirsher@intel.com" }, "mbox": "http://patchwork.ozlabs.org/project/intel-wired-lan/patch/20160217190302.10339.18783.stgit@localhost.localdomain/mbox/", "series": [], "comments": "http://patchwork.ozlabs.org/api/patches/584324/comments/", "check": "pending", "checks": "http://patchwork.ozlabs.org/api/patches/584324/checks/", "tags": {}, "related": [], "headers": { "Return-Path": "<intel-wired-lan-bounces@lists.osuosl.org>", "X-Original-To": [ "incoming@patchwork.ozlabs.org", "intel-wired-lan@lists.osuosl.org" ], "Delivered-To": [ "patchwork-incoming@bilbo.ozlabs.org", "intel-wired-lan@lists.osuosl.org" ], "Received": [ "from fraxinus.osuosl.org (smtp4.osuosl.org [140.211.166.137])\n\tby ozlabs.org 
(Postfix) with ESMTP id 93DBF1401CA\n\tfor <incoming@patchwork.ozlabs.org>;\n\tThu, 18 Feb 2016 06:03:08 +1100 (AEDT)", "from localhost (localhost [127.0.0.1])\n\tby fraxinus.osuosl.org (Postfix) with ESMTP id DB755A5ED2;\n\tWed, 17 Feb 2016 19:03:07 +0000 (UTC)", "from fraxinus.osuosl.org ([127.0.0.1])\n\tby localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024)\n\twith ESMTP id EtV9lqAvg7BC; Wed, 17 Feb 2016 19:03:06 +0000 (UTC)", "from ash.osuosl.org (ash.osuosl.org [140.211.166.34])\n\tby fraxinus.osuosl.org (Postfix) with ESMTP id CEED6A5E42;\n\tWed, 17 Feb 2016 19:03:06 +0000 (UTC)", "from whitealder.osuosl.org (smtp1.osuosl.org [140.211.166.138])\n\tby ash.osuosl.org (Postfix) with ESMTP id 069741C0BC2\n\tfor <intel-wired-lan@lists.osuosl.org>;\n\tWed, 17 Feb 2016 19:03:05 +0000 (UTC)", "from localhost (localhost [127.0.0.1])\n\tby whitealder.osuosl.org (Postfix) with ESMTP id 01C0A9219D\n\tfor <intel-wired-lan@lists.osuosl.org>;\n\tWed, 17 Feb 2016 19:03:05 +0000 (UTC)", "from whitealder.osuosl.org ([127.0.0.1])\n\tby localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024)\n\twith ESMTP id TrKjm46F-h9C for <intel-wired-lan@lists.osuosl.org>;\n\tWed, 17 Feb 2016 19:03:04 +0000 (UTC)", "from mail-pa0-f53.google.com (mail-pa0-f53.google.com\n\t[209.85.220.53])\n\tby whitealder.osuosl.org (Postfix) with ESMTPS id 233CF921A9\n\tfor <intel-wired-lan@lists.osuosl.org>;\n\tWed, 17 Feb 2016 19:03:04 +0000 (UTC)", "by mail-pa0-f53.google.com with SMTP id yy13so15834868pab.3\n\tfor <intel-wired-lan@lists.osuosl.org>;\n\tWed, 17 Feb 2016 11:03:04 -0800 (PST)", "from localhost.localdomain\n\t(static-50-53-29-36.bvtn.or.frontiernet.net. 
[50.53.29.36])\n\tby smtp.gmail.com with ESMTPSA id\n\tk65sm4454680pfb.30.2016.02.17.11.03.03\n\tfor <intel-wired-lan@lists.osuosl.org>\n\t(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\n\tWed, 17 Feb 2016 11:03:03 -0800 (PST)" ], "Authentication-Results": "ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (1024-bit key;\n\tunprotected) header.d=mirantis.com header.i=@mirantis.com\n\theader.b=hPaKkMdO; dkim-atps=neutral", "X-Virus-Scanned": [ "amavisd-new at osuosl.org", "amavisd-new at osuosl.org" ], "X-Greylist": "from auto-whitelisted by SQLgrey-1.7.6", "DKIM-Signature": "v=1; a=rsa-sha256; c=relaxed/relaxed; d=mirantis.com;\n\ts=google; \n\th=subject:from:to:date:message-id:in-reply-to:references:user-agent\n\t:mime-version:content-type:content-transfer-encoding;\n\tbh=c6nrseJqNWylF9ptuSBsOjE9RqpepbdAQ52kg5MVaUM=;\n\tb=hPaKkMdOMjUNvXc76DID2xwvMXYBDTHQpxHrWb6ewcAL1ujwsfD24WUyRulaFT/C1l\n\t9aOzFqp7dgw/cFjj5yfzouZB4WvI5Pi4SpoDdeRf4aW60zxQFy81e7QX/lz6TOR4vEe7\n\tZicAkDkO32qagTliL6uPbaeobszgsaucHCVeM=", "X-Google-DKIM-Signature": "v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20130820;\n\th=x-gm-message-state:subject:from:to:date:message-id:in-reply-to\n\t:references:user-agent:mime-version:content-type\n\t:content-transfer-encoding;\n\tbh=c6nrseJqNWylF9ptuSBsOjE9RqpepbdAQ52kg5MVaUM=;\n\tb=kvZxDlJToNyiXSNcX2H5lkZe6gs5+xYjeba3m1OPZ4pwaFZ7uY9PjwPfzs4Y6R1It5\n\tn6S0M2UdEfv0+3721LKKsDONrselEzYL5zQjWqB+00b5pUxWG2Mja5r0D20gsEZRSIqX\n\tUAgdMZhekpCGYGPgldb/BA3RqXwuVcCfj5JWYy6UZaRlykt3h9hsqns+meQLtbSO42og\n\tXeJK4iS0IHI8RaZsKC72VKvpf8IvTyoQHuk78D+cwc9U0yNF6XrSOYqBBT8R98juQiX6\n\t3ds7h4WMuCFdrRgJ5C6St20WzUc0Qkm9FlzWV93Tg4dGKr/UiFRlsb7bPkFFMrttLQvo\n\t97IA==", "X-Gm-Message-State": "AG10YORWLJYpB9TG6KpYPnk+at7R4MHDMqsYvcUCrGaMBJtZifB8zLHYp+MmNL/wThdvbljw", "X-Received": "by 10.66.120.200 with SMTP id le8mr4377554pab.61.1455735783844; \n\tWed, 17 Feb 2016 11:03:03 -0800 (PST)", "From": "Alexander Duyck <aduyck@mirantis.com>", "To": 
"intel-wired-lan@lists.osuosl.org", "Date": "Wed, 17 Feb 2016 11:03:02 -0800", "Message-ID": "<20160217190302.10339.18783.stgit@localhost.localdomain>", "In-Reply-To": "<20160217185838.10339.68543.stgit@localhost.localdomain>", "References": "<20160217185838.10339.68543.stgit@localhost.localdomain>", "User-Agent": "StGit/0.17.1-dirty", "MIME-Version": "1.0", "Subject": "[Intel-wired-lan] [next PATCH 4/4] i40e/i40evf: Allow up to 12K\n\tbytes of data per Tx descriptor instead of 8K", "X-BeenThere": "intel-wired-lan@lists.osuosl.org", "X-Mailman-Version": "2.1.18-1", "Precedence": "list", "List-Id": "Intel Wired Ethernet Linux Kernel Driver Development\n\t<intel-wired-lan.lists.osuosl.org>", "List-Unsubscribe": "<http://lists.osuosl.org/mailman/options/intel-wired-lan>, \n\t<mailto:intel-wired-lan-request@lists.osuosl.org?subject=unsubscribe>", "List-Archive": "<http://lists.osuosl.org/pipermail/intel-wired-lan/>", "List-Post": "<mailto:intel-wired-lan@lists.osuosl.org>", "List-Help": "<mailto:intel-wired-lan-request@lists.osuosl.org?subject=help>", "List-Subscribe": "<http://lists.osuosl.org/mailman/listinfo/intel-wired-lan>, \n\t<mailto:intel-wired-lan-request@lists.osuosl.org?subject=subscribe>", "Content-Type": "text/plain; charset=\"us-ascii\"", "Content-Transfer-Encoding": "7bit", "Errors-To": "intel-wired-lan-bounces@lists.osuosl.org", "Sender": "\"Intel-wired-lan\" <intel-wired-lan-bounces@lists.osuosl.org>" }, "content": ">From what I can tell the practial limitation on the size of the Tx data\nbuffer is the fact that the Tx descriptor is limited to 14 bits. As such\nwe cannot use 16K as is typically used on the other Intel drivers. However\nartificially limiting ourselves to 8K can be expensive as this means that\nwe will consume up to 10 descriptors (1 context, 1 for header, and 9 for\nan payload, non-8K aligned) in a single send.\n\nI propose that we can reduce this by increasing the maximum data for a 4K\naligned block to 12K. 
We can reduce the descriptors used for a 32K aligned\nblock by 1 by increasing the size like this. In addition we still have the\n4K - 1 of space that is still unused. We can use this as a bit of extra\npadding when dealing with data that is not aligned to 4K.\n\nBy aligning the descriptors after the first to 4K we can improve the\neffiency of PCIe accesses as we can avoid using byte enables and can fetch\nfull TLP transactions after the first fetch of the buffer. This helps to\nimprove PCIe efficiency. Below is the results of testing before and after\nwith this patch:\n\nRecv Send Send Utilization Service Demand\nSocket Socket Message Elapsed Send Recv Send Recv\nSize Size Size Time Throughput local remote local remote\nbytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB\nBefore:\n87380 16384 16384 10.00 33682.24 20.27 -1.00 0.592 -1.00\nAfter:\n87380 16384 16384 10.00 34204.08 20.54 -1.00 0.590 -1.00\n\nSo the net result of this patch is that we have a small gain in throughput\ndue to a reduction in overhead for putting together the frame.\n\nSigned-off-by: Alexander Duyck <aduyck@mirantis.com>\n---\n drivers/net/ethernet/intel/i40e/i40e_txrx.c | 13 ++++++---\n drivers/net/ethernet/intel/i40e/i40e_txrx.h | 35 +++++++++++++++++++++++--\n drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 13 ++++++---\n drivers/net/ethernet/intel/i40evf/i40e_txrx.h | 35 +++++++++++++++++++++++--\n 4 files changed, 82 insertions(+), 14 deletions(-)", "diff": "diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c\nindex cb52f39d514a..f870b8da4551 100644\n--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c\n+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c\n@@ -2716,6 +2716,8 @@ static inline void i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,\n \ttx_bi = first;\n \n \tfor (frag = &skb_shinfo(skb)->frags[0];; frag++) {\n+\t\tunsigned int max_data = I40E_MAX_DATA_PER_TXD_ALIGNED;\n+\n \t\tif (dma_mapping_error(tx_ring->dev, 
dma))\n \t\t\tgoto dma_error;\n \n@@ -2723,12 +2725,14 @@ static inline void i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,\n \t\tdma_unmap_len_set(tx_bi, len, size);\n \t\tdma_unmap_addr_set(tx_bi, dma, dma);\n \n+\t\t/* align size to end of page */\n+\t\tmax_data += -dma & (I40E_MAX_READ_REQ_SIZE - 1);\n \t\ttx_desc->buffer_addr = cpu_to_le64(dma);\n \n \t\twhile (unlikely(size > I40E_MAX_DATA_PER_TXD)) {\n \t\t\ttx_desc->cmd_type_offset_bsz =\n \t\t\t\tbuild_ctob(td_cmd, td_offset,\n-\t\t\t\t\t I40E_MAX_DATA_PER_TXD, td_tag);\n+\t\t\t\t\t max_data, td_tag);\n \n \t\t\ttx_desc++;\n \t\t\ti++;\n@@ -2739,9 +2743,10 @@ static inline void i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,\n \t\t\t\ti = 0;\n \t\t\t}\n \n-\t\t\tdma += I40E_MAX_DATA_PER_TXD;\n-\t\t\tsize -= I40E_MAX_DATA_PER_TXD;\n+\t\t\tdma += max_data;\n+\t\t\tsize -= max_data;\n \n+\t\t\tmax_data = I40E_MAX_DATA_PER_TXD_ALIGNED;\n \t\t\ttx_desc->buffer_addr = cpu_to_le64(dma);\n \t\t}\n \n@@ -2891,7 +2896,7 @@ static netdev_tx_t i40e_xmit_frame_ring(struct sk_buff *skb,\n \tif (i40e_chk_linearize(skb, count)) {\n \t\tif (__skb_linearize(skb))\n \t\t\tgoto out_drop;\n-\t\tcount = TXD_USE_COUNT(skb->len);\n+\t\tcount = i40e_txd_use_count(skb->len);\n \t\ttx_ring->tx_stats.tx_linearize++;\n \t}\n \ndiff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h\nindex 8a3a163cc475..8b049cd77064 100644\n--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h\n+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h\n@@ -146,10 +146,39 @@ enum i40e_dyn_idx_t {\n \n #define I40E_MAX_BUFFER_TXD\t8\n #define I40E_MIN_TX_LEN\t\t17\n-#define I40E_MAX_DATA_PER_TXD\t8192\n+\n+/* The size limit for a transmit buffer in a descriptor is (16K - 1).\n+ * In order to align with the read requests we will align the value to\n+ * the nearest 4K which represents our maximum read request size.\n+ */\n+#define I40E_MAX_READ_REQ_SIZE\t\t4096\n+#define 
I40E_MAX_DATA_PER_TXD\t\t(16 * 1024 - 1)\n+#define I40E_MAX_DATA_PER_TXD_ALIGNED \\\n+\t(I40E_MAX_DATA_PER_TXD & ~(I40E_MAX_READ_REQ_SIZE - 1))\n+\n+/* This ugly bit of math is equivilent to DIV_ROUNDUP(size, X) where X is\n+ * the value I40E_MAX_DATA_PER_TXD_ALIGNED. It is needed due to the fact\n+ * that 12K is not a power of 2 and division is expensive. It is used to\n+ * approximate the number of descriptors used per linear buffer. Note\n+ * that this will overestimate in some cases as it doesn't account for the\n+ * fact that we will add up to 4K - 1 in aligning the 12K buffer, however\n+ * the error should not impact things much as large buffers usually mean\n+ * we will use fewer descriptors then there are frags in an skb.\n+ */\n+static inline unsigned int i40e_txd_use_count(unsigned int size)\n+{\n+\tconst unsigned int max = I40E_MAX_DATA_PER_TXD_ALIGNED;\n+\tconst unsigned int reciprocal = ((1ull << 32) - 1 + (max / 2)) / max;\n+\tunsigned int adjust = ~(u32)0;\n+\n+\t/* if we rounded up on the reciprprocal pull down the adjustment */\n+\tif ((max * reciprocal) > adjust)\n+\t\tadjust = ~(u32)(reciprocal - 1);\n+\n+\treturn (u32)((((u64)size * reciprocal) + adjust) >> 32);\n+}\n \n /* Tx Descriptors needed, worst case */\n-#define TXD_USE_COUNT(S) DIV_ROUND_UP((S), I40E_MAX_DATA_PER_TXD)\n #define DESC_NEEDED (MAX_SKB_FRAGS + 4)\n #define I40E_MIN_DESC_PENDING\t4\n \n@@ -369,7 +398,7 @@ static inline int i40e_xmit_descriptor_count(struct sk_buff *skb)\n \tint count = 0, size = skb_headlen(skb);\n \n \tfor (;;) {\n-\t\tcount += TXD_USE_COUNT(size);\n+\t\tcount += i40e_txd_use_count(size);\n \n \t\tif (!nr_frags--)\n \t\t\tbreak;\ndiff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c\nindex 686a95fe48bd..31466fe8dca1 100644\n--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c\n+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c\n@@ -1934,6 +1934,8 @@ static inline void i40evf_tx_map(struct i40e_ring 
*tx_ring, struct sk_buff *skb,\n \ttx_bi = first;\n \n \tfor (frag = &skb_shinfo(skb)->frags[0];; frag++) {\n+\t\tunsigned int max_data = I40E_MAX_DATA_PER_TXD_ALIGNED;\n+\n \t\tif (dma_mapping_error(tx_ring->dev, dma))\n \t\t\tgoto dma_error;\n \n@@ -1941,12 +1943,14 @@ static inline void i40evf_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,\n \t\tdma_unmap_len_set(tx_bi, len, size);\n \t\tdma_unmap_addr_set(tx_bi, dma, dma);\n \n+\t\t/* align size to end of page */\n+\t\tmax_data += -dma & (I40E_MAX_READ_REQ_SIZE - 1);\n \t\ttx_desc->buffer_addr = cpu_to_le64(dma);\n \n \t\twhile (unlikely(size > I40E_MAX_DATA_PER_TXD)) {\n \t\t\ttx_desc->cmd_type_offset_bsz =\n \t\t\t\tbuild_ctob(td_cmd, td_offset,\n-\t\t\t\t\t I40E_MAX_DATA_PER_TXD, td_tag);\n+\t\t\t\t\t max_data, td_tag);\n \n \t\t\ttx_desc++;\n \t\t\ti++;\n@@ -1957,9 +1961,10 @@ static inline void i40evf_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,\n \t\t\t\ti = 0;\n \t\t\t}\n \n-\t\t\tdma += I40E_MAX_DATA_PER_TXD;\n-\t\t\tsize -= I40E_MAX_DATA_PER_TXD;\n+\t\t\tdma += max_data;\n+\t\t\tsize -= max_data;\n \n+\t\t\tmax_data = I40E_MAX_DATA_PER_TXD_ALIGNED;\n \t\t\ttx_desc->buffer_addr = cpu_to_le64(dma);\n \t\t}\n \n@@ -2108,7 +2113,7 @@ static netdev_tx_t i40e_xmit_frame_ring(struct sk_buff *skb,\n \tif (i40e_chk_linearize(skb, count)) {\n \t\tif (__skb_linearize(skb))\n \t\t\tgoto out_drop;\n-\t\tcount = TXD_USE_COUNT(skb->len);\n+\t\tcount = i40e_txd_use_count(skb->len);\n \t\ttx_ring->tx_stats.tx_linearize++;\n \t}\n \ndiff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h\nindex 8c5da4f89fd0..34096b1e4782 100644\n--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h\n+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h\n@@ -146,10 +146,39 @@ enum i40e_dyn_idx_t {\n \n #define I40E_MAX_BUFFER_TXD\t8\n #define I40E_MIN_TX_LEN\t\t17\n-#define I40E_MAX_DATA_PER_TXD\t8192\n+\n+/* The size limit for a transmit buffer in a descriptor is (16K - 
1).\n+ * In order to align with the read requests we will align the value to\n+ * the nearest 4K which represents our maximum read request size.\n+ */\n+#define I40E_MAX_READ_REQ_SIZE\t\t4096\n+#define I40E_MAX_DATA_PER_TXD\t\t(16 * 1024 - 1)\n+#define I40E_MAX_DATA_PER_TXD_ALIGNED \\\n+\t(I40E_MAX_DATA_PER_TXD & ~(I40E_MAX_READ_REQ_SIZE - 1))\n+\n+/* This ugly bit of math is equivilent to DIV_ROUNDUP(size, X) where X is\n+ * the value I40E_MAX_DATA_PER_TXD_ALIGNED. It is needed due to the fact\n+ * that 12K is not a power of 2 and division is expensive. It is used to\n+ * approximate the number of descriptors used per linear buffer. Note\n+ * that this will overestimate in some cases as it doesn't account for the\n+ * fact that we will add up to 4K - 1 in aligning the 12K buffer, however\n+ * the error should not impact things much as large buffers usually mean\n+ * we will use fewer descriptors then there are frags in an skb.\n+ */\n+static inline unsigned int i40e_txd_use_count(unsigned int size)\n+{\n+\tconst unsigned int max = I40E_MAX_DATA_PER_TXD_ALIGNED;\n+\tconst unsigned int reciprocal = ((1ull << 32) - 1 + (max / 2)) / max;\n+\tunsigned int adjust = ~(u32)0;\n+\n+\t/* if we rounded up on the reciprprocal pull down the adjustment */\n+\tif ((max * reciprocal) > adjust)\n+\t\tadjust = ~(u32)(reciprocal - 1);\n+\n+\treturn (u32)((((u64)size * reciprocal) + adjust) >> 32);\n+}\n \n /* Tx Descriptors needed, worst case */\n-#define TXD_USE_COUNT(S) DIV_ROUND_UP((S), I40E_MAX_DATA_PER_TXD)\n #define DESC_NEEDED (MAX_SKB_FRAGS + 4)\n #define I40E_MIN_DESC_PENDING\t4\n \n@@ -359,7 +388,7 @@ static inline int i40e_xmit_descriptor_count(struct sk_buff *skb)\n \tint count = 0, size = skb_headlen(skb);\n \n \tfor (;;) {\n-\t\tcount += TXD_USE_COUNT(size);\n+\t\tcount += i40e_txd_use_count(size);\n \n \t\tif (!nr_frags--)\n \t\t\tbreak;\n", "prefixes": [ "next", "4/4" ] }
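The arithmetic described in the "content" and "diff" fields above can be cross-checked with a small Python model. This is a sketch, not the driver code: `txd_use_count` mirrors `i40e_txd_use_count` with explicit 32-bit masking, `chunk_sizes` is a hypothetical helper that imitates the splitting loop in `i40e_tx_map`, and the DMA addresses are made-up values.

```python
MAX_READ_REQ_SIZE = 4096
MAX_DATA_PER_TXD = 16 * 1024 - 1                      # 14-bit size field limit
MAX_DATA_PER_TXD_ALIGNED = MAX_DATA_PER_TXD & ~(MAX_READ_REQ_SIZE - 1)

U32 = 0xFFFFFFFF


def txd_use_count(size):
    """Model of i40e_txd_use_count(): divide by 12K via a reciprocal."""
    max_ = MAX_DATA_PER_TXD_ALIGNED
    reciprocal = (((1 << 32) - 1 + max_ // 2) // max_) & U32
    adjust = U32
    # The C code multiplies two 32-bit values, so the product is masked here.
    # For the 12K constant this branch is not taken.
    if ((max_ * reciprocal) & U32) > adjust:
        adjust = ~(reciprocal - 1) & U32
    return ((size * reciprocal + adjust) >> 32) & U32


def chunk_sizes(dma, size):
    """Per-descriptor byte counts for one buffer, following the tx_map loop."""
    chunks = []
    # The first chunk is stretched so the next one starts on a 4K boundary.
    max_data = MAX_DATA_PER_TXD_ALIGNED + (-dma & (MAX_READ_REQ_SIZE - 1))
    while size > MAX_DATA_PER_TXD:
        chunks.append(max_data)
        dma += max_data
        size -= max_data
        max_data = MAX_DATA_PER_TXD_ALIGNED
    chunks.append(size)  # the remainder fits in a single descriptor
    return chunks


aligned = chunk_sizes(0x1000, 32 * 1024)    # 4K-aligned 32K buffer
unaligned = chunk_sizes(0x1234, 32 * 1024)  # same size, unaligned start
```

In this model a 4K-aligned 32K buffer is split into three descriptors (12288, 12288 and 8192 bytes) where the old 8K limit would need four, which matches the one-descriptor saving claimed in the commit message. The reciprocal form agrees with a rounded-up division by 12288 for the sizes checked here (0 up to 70000 bytes).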