Patch Detail
get:
Show a patch.
patch:
Update a patch.
put:
Update a patch.
GET /api/patches/692500/?format=api
{ "id": 692500, "url": "http://patchwork.ozlabs.org/api/patches/692500/?format=api", "web_url": "http://patchwork.ozlabs.org/project/intel-wired-lan/patch/1478639119-14656-11-git-send-email-bimmy.pujari@intel.com/", "project": { "id": 46, "url": "http://patchwork.ozlabs.org/api/projects/46/?format=api", "name": "Intel Wired Ethernet development", "link_name": "intel-wired-lan", "list_id": "intel-wired-lan.osuosl.org", "list_email": "intel-wired-lan@osuosl.org", "web_url": "", "scm_url": "", "webscm_url": "", "list_archive_url": "", "list_archive_url_format": "", "commit_url_format": "" }, "msgid": "<1478639119-14656-11-git-send-email-bimmy.pujari@intel.com>", "list_archive_url": null, "date": "2016-11-08T21:05:14", "name": "[next,S52-V2,10/15] i40e: simplify txd use count calculation", "commit_ref": null, "pull_url": null, "state": "accepted", "archived": false, "hash": "d645457ad66e5b3d12ed4771255abe03117a5f4b", "submitter": { "id": 68919, "url": "http://patchwork.ozlabs.org/api/people/68919/?format=api", "name": "Pujari, Bimmy", "email": "bimmy.pujari@intel.com" }, "delegate": { "id": 68, "url": "http://patchwork.ozlabs.org/api/users/68/?format=api", "username": "jtkirshe", "first_name": "Jeff", "last_name": "Kirsher", "email": "jeffrey.t.kirsher@intel.com" }, "mbox": "http://patchwork.ozlabs.org/project/intel-wired-lan/patch/1478639119-14656-11-git-send-email-bimmy.pujari@intel.com/mbox/", "series": [], "comments": "http://patchwork.ozlabs.org/api/patches/692500/comments/", "check": "pending", "checks": "http://patchwork.ozlabs.org/api/patches/692500/checks/", "tags": {}, "related": [], "headers": { "Return-Path": "<intel-wired-lan-bounces@lists.osuosl.org>", "X-Original-To": [ "incoming@patchwork.ozlabs.org", "intel-wired-lan@lists.osuosl.org" ], "Delivered-To": [ "patchwork-incoming@bilbo.ozlabs.org", "intel-wired-lan@lists.osuosl.org" ], "Received": [ "from fraxinus.osuosl.org (smtp4.osuosl.org [140.211.166.137])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA 
(256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3tD21p1GHsz9t1d\n\tfor <incoming@patchwork.ozlabs.org>;\n\tWed, 9 Nov 2016 08:07:09 +1100 (AEDT)", "from localhost (localhost [127.0.0.1])\n\tby fraxinus.osuosl.org (Postfix) with ESMTP id 0F7ABC1299;\n\tTue, 8 Nov 2016 21:07:08 +0000 (UTC)", "from fraxinus.osuosl.org ([127.0.0.1])\n\tby localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024)\n\twith ESMTP id WDQF5vSYgR-3; Tue, 8 Nov 2016 21:07:04 +0000 (UTC)", "from ash.osuosl.org (ash.osuosl.org [140.211.166.34])\n\tby fraxinus.osuosl.org (Postfix) with ESMTP id 7BF09C13E1;\n\tTue, 8 Nov 2016 21:07:03 +0000 (UTC)", "from silver.osuosl.org (smtp3.osuosl.org [140.211.166.136])\n\tby ash.osuosl.org (Postfix) with ESMTP id 588FB1C22F9\n\tfor <intel-wired-lan@lists.osuosl.org>;\n\tTue, 8 Nov 2016 21:07:00 +0000 (UTC)", "from localhost (localhost [127.0.0.1])\n\tby silver.osuosl.org (Postfix) with ESMTP id 4EF0F2E50D\n\tfor <intel-wired-lan@lists.osuosl.org>;\n\tTue, 8 Nov 2016 21:07:00 +0000 (UTC)", "from silver.osuosl.org ([127.0.0.1])\n\tby localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024)\n\twith ESMTP id Vf0Z+paE5ol9 for <intel-wired-lan@lists.osuosl.org>;\n\tTue, 8 Nov 2016 21:06:55 +0000 (UTC)", "from mga05.intel.com (mga05.intel.com [192.55.52.43])\n\tby silver.osuosl.org (Postfix) with ESMTPS id 38A9631BBA\n\tfor <intel-wired-lan@lists.osuosl.org>;\n\tTue, 8 Nov 2016 21:06:55 +0000 (UTC)", "from fmsmga002.fm.intel.com ([10.253.24.26])\n\tby fmsmga105.fm.intel.com with ESMTP; 08 Nov 2016 13:06:54 -0800", "from bimmy.jf.intel.com (HELO bimmy.linux1.jf.intel.com)\n\t([134.134.2.167])\n\tby fmsmga002.fm.intel.com with ESMTP; 08 Nov 2016 13:06:54 -0800" ], "X-Virus-Scanned": [ "amavisd-new at osuosl.org", "amavisd-new at osuosl.org" ], "X-Greylist": "domain auto-whitelisted by SQLgrey-1.7.6", "X-ExtLoop1": "1", "X-IronPort-AV": "E=Sophos; i=\"5.31,611,1473145200\"; d=\"scan'208\";\n\ta=\"1082486418\"", 
"From": "Bimmy Pujari <bimmy.pujari@intel.com>", "To": "intel-wired-lan@lists.osuosl.org", "Date": "Tue, 8 Nov 2016 13:05:14 -0800", "Message-Id": "<1478639119-14656-11-git-send-email-bimmy.pujari@intel.com>", "X-Mailer": "git-send-email 2.4.11", "In-Reply-To": "<1478639119-14656-1-git-send-email-bimmy.pujari@intel.com>", "References": "<1478639119-14656-1-git-send-email-bimmy.pujari@intel.com>", "Subject": "[Intel-wired-lan] [next PATCH S52-V2 10/15] i40e: simplify txd use\n\tcount calculation", "X-BeenThere": "intel-wired-lan@lists.osuosl.org", "X-Mailman-Version": "2.1.18-1", "Precedence": "list", "List-Id": "Intel Wired Ethernet Linux Kernel Driver Development\n\t<intel-wired-lan.lists.osuosl.org>", "List-Unsubscribe": "<http://lists.osuosl.org/mailman/options/intel-wired-lan>, \n\t<mailto:intel-wired-lan-request@lists.osuosl.org?subject=unsubscribe>", "List-Archive": "<http://lists.osuosl.org/pipermail/intel-wired-lan/>", "List-Post": "<mailto:intel-wired-lan@lists.osuosl.org>", "List-Help": "<mailto:intel-wired-lan-request@lists.osuosl.org?subject=help>", "List-Subscribe": "<http://lists.osuosl.org/mailman/listinfo/intel-wired-lan>, \n\t<mailto:intel-wired-lan-request@lists.osuosl.org?subject=subscribe>", "MIME-Version": "1.0", "Content-Type": "text/plain; charset=\"us-ascii\"", "Content-Transfer-Encoding": "7bit", "Errors-To": "intel-wired-lan-bounces@lists.osuosl.org", "Sender": "\"Intel-wired-lan\" <intel-wired-lan-bounces@lists.osuosl.org>" }, "content": "From: Mitch Williams <mitch.a.williams@intel.com>\n\nThe i40e_txd_use_count function was fast but confusing. In the comments,\nit even admits that it's ugly. 
So replace it with a new function that is\n(very) slightly faster and has extensive commenting to help the thicker\namong us (including the author, who will forget in a week) understand\nhow it works.\n\nSigned-off-by: Mitch Williams <mitch.a.williams@intel.com>\nSigned-off-by: Alexander Duyck <alexander.h.duyck@intel.com>\nChange-ID: Ifb533f13786a0bf39cb29f77969a5be2c83d9a87\n---\nTesting Hints : Lots and lots of TSO with various\ndata sizes. This should operate exactly as before, with no noticeable\nperformance change.\n\n drivers/net/ethernet/intel/i40e/i40e_txrx.h | 45 +++++++++++++++++----------\n drivers/net/ethernet/intel/i40evf/i40e_txrx.h | 45 +++++++++++++++++----------\n 2 files changed, 56 insertions(+), 34 deletions(-)", "diff": "diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h\nindex de8550f..e065321 100644\n--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h\n+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h\n@@ -173,26 +173,37 @@ static inline bool i40e_test_staterr(union i40e_rx_desc *rx_desc,\n #define I40E_MAX_DATA_PER_TXD_ALIGNED \\\n \t(I40E_MAX_DATA_PER_TXD & ~(I40E_MAX_READ_REQ_SIZE - 1))\n \n-/* This ugly bit of math is equivalent to DIV_ROUNDUP(size, X) where X is\n- * the value I40E_MAX_DATA_PER_TXD_ALIGNED. It is needed due to the fact\n- * that 12K is not a power of 2 and division is expensive. It is used to\n- * approximate the number of descriptors used per linear buffer. 
Note\n- * that this will overestimate in some cases as it doesn't account for the\n- * fact that we will add up to 4K - 1 in aligning the 12K buffer, however\n- * the error should not impact things much as large buffers usually mean\n- * we will use fewer descriptors then there are frags in an skb.\n+/**\n+ * i40e_txd_use_count - estimate the number of descriptors needed for Tx\n+ * @size: transmit request size in bytes\n+ *\n+ * Due to hardware alignment restrictions (4K alignment), we need to\n+ * assume that we can have no more than 12K of data per descriptor, even\n+ * though each descriptor can take up to 16K - 1 bytes of aligned memory.\n+ * Thus, we need to divide by 12K. But division is slow! Instead,\n+ * we decompose the operation into shifts and one relatively cheap\n+ * multiply operation.\n+ *\n+ * To divide by 12K, we first divide by 4K, then divide by 3:\n+ * To divide by 4K, shift right by 12 bits\n+ * To divide by 3, multiply by 85, then divide by 256\n+ * (Divide by 256 is done by shifting right by 8 bits)\n+ * Finally, we add one to round up. Because 256 isn't an exact multiple of\n+ * 3, we'll underestimate near each multiple of 12K. This is actually more\n+ * accurate as we have 4K - 1 of wiggle room that we can fit into the last\n+ * segment. 
For our purposes this is accurate out to 1M which is orders of\n+ * magnitude greater than our largest possible GSO size.\n+ *\n+ * This would then be implemented as:\n+ * return (((size >> 12) * 85) >> 8) + 1;\n+ *\n+ * Since multiplication and division are commutative, we can reorder\n+ * operations into:\n+ * return ((size * 85) >> 20) + 1;\n */\n static inline unsigned int i40e_txd_use_count(unsigned int size)\n {\n-\tconst unsigned int max = I40E_MAX_DATA_PER_TXD_ALIGNED;\n-\tconst unsigned int reciprocal = ((1ull << 32) - 1 + (max / 2)) / max;\n-\tunsigned int adjust = ~(u32)0;\n-\n-\t/* if we rounded up on the reciprocal pull down the adjustment */\n-\tif ((max * reciprocal) > adjust)\n-\t\tadjust = ~(u32)(reciprocal - 1);\n-\n-\treturn (u32)((((u64)size * reciprocal) + adjust) >> 32);\n+\treturn ((size * 85) >> 20) + 1;\n }\n \n /* Tx Descriptors needed, worst case */\ndiff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h\nindex a586e19..a5fc789 100644\n--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h\n+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h\n@@ -173,26 +173,37 @@ static inline bool i40e_test_staterr(union i40e_rx_desc *rx_desc,\n #define I40E_MAX_DATA_PER_TXD_ALIGNED \\\n \t(I40E_MAX_DATA_PER_TXD & ~(I40E_MAX_READ_REQ_SIZE - 1))\n \n-/* This ugly bit of math is equivalent to DIV_ROUNDUP(size, X) where X is\n- * the value I40E_MAX_DATA_PER_TXD_ALIGNED. It is needed due to the fact\n- * that 12K is not a power of 2 and division is expensive. It is used to\n- * approximate the number of descriptors used per linear buffer. 
Note\n- * that this will overestimate in some cases as it doesn't account for the\n- * fact that we will add up to 4K - 1 in aligning the 12K buffer, however\n- * the error should not impact things much as large buffers usually mean\n- * we will use fewer descriptors then there are frags in an skb.\n+/**\n+ * i40e_txd_use_count - estimate the number of descriptors needed for Tx\n+ * @size: transmit request size in bytes\n+ *\n+ * Due to hardware alignment restrictions (4K alignment), we need to\n+ * assume that we can have no more than 12K of data per descriptor, even\n+ * though each descriptor can take up to 16K - 1 bytes of aligned memory.\n+ * Thus, we need to divide by 12K. But division is slow! Instead,\n+ * we decompose the operation into shifts and one relatively cheap\n+ * multiply operation.\n+ *\n+ * To divide by 12K, we first divide by 4K, then divide by 3:\n+ * To divide by 4K, shift right by 12 bits\n+ * To divide by 3, multiply by 85, then divide by 256\n+ * (Divide by 256 is done by shifting right by 8 bits)\n+ * Finally, we add one to round up. Because 256 isn't an exact multiple of\n+ * 3, we'll underestimate near each multiple of 12K. This is actually more\n+ * accurate as we have 4K - 1 of wiggle room that we can fit into the last\n+ * segment. 
For our purposes this is accurate out to 1M which is orders of\n+ * magnitude greater than our largest possible GSO size.\n+ *\n+ * This would then be implemented as:\n+ * return (((size >> 12) * 85) >> 8) + 1;\n+ *\n+ * Since multiplication and division are commutative, we can reorder\n+ * operations into:\n+ * return ((size * 85) >> 20) + 1;\n */\n static inline unsigned int i40e_txd_use_count(unsigned int size)\n {\n-\tconst unsigned int max = I40E_MAX_DATA_PER_TXD_ALIGNED;\n-\tconst unsigned int reciprocal = ((1ull << 32) - 1 + (max / 2)) / max;\n-\tunsigned int adjust = ~(u32)0;\n-\n-\t/* if we rounded up on the reciprocal pull down the adjustment */\n-\tif ((max * reciprocal) > adjust)\n-\t\tadjust = ~(u32)(reciprocal - 1);\n-\n-\treturn (u32)((((u64)size * reciprocal) + adjust) >> 32);\n+\treturn ((size * 85) >> 20) + 1;\n }\n \n /* Tx Descriptors needed, worst case */\n", "prefixes": [ "next", "S52-V2", "10/15" ] }
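The kernel-doc comment in the "diff" field above explains how the patch replaces a divide by 12K with one multiply and one shift. Below is a minimal user-space sketch of that arithmetic, compared against plain round-up division. Only the expression `((size * 85) >> 20) + 1` comes from the patch. The function names, the `MAX_DATA_PER_TXD_ALIGNED` constant (assumed to be 12 * 1024, following the "12K" in the comment), and the self-check helper are hypothetical names introduced for illustration.

```c
#include <assert.h>

/* Assumed value: the patch comment speaks of "12K" of data per descriptor. */
#define MAX_DATA_PER_TXD_ALIGNED (12u * 1024u)

/* Shift-and-multiply estimate, copied from the "+" lines of the diff. */
unsigned int txd_use_count_approx(unsigned int size)
{
	return ((size * 85) >> 20) + 1;
}

/* Hypothetical reference: exact round-up division by 12K. */
unsigned int txd_use_count_exact(unsigned int size)
{
	return (size + MAX_DATA_PER_TXD_ALIGNED - 1) / MAX_DATA_PER_TXD_ALIGNED;
}

/* Hypothetical helper: spot checks around the first multiple of 12K. */
void txd_use_count_self_check(void)
{
	/* At exactly 12K both versions report one descriptor. */
	assert(txd_use_count_approx(12288) == 1);
	assert(txd_use_count_exact(12288) == 1);

	/* Just past 12K the estimate stays at 1 while exact division gives 2. */
	assert(txd_use_count_approx(12289) == 1);
	assert(txd_use_count_exact(12289) == 2);

	/* 85 / 2^20 is slightly below 1 / 12288, so the step to 2 comes at 12337. */
	assert(txd_use_count_approx(12336) == 1);
	assert(txd_use_count_approx(12337) == 2);

	/* For a 64K request both versions agree on six descriptors. */
	assert(txd_use_count_approx(65536) == 6);
	assert(txd_use_count_exact(65536) == 6);
}
```

The checks show the behaviour the comment describes: the estimate lags exact round-up division for a few dozen bytes after each multiple of 12K, which the patch author considers acceptable given the 4K - 1 bytes of slack in the last segment.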