Patch Detail
get:
Show a patch.
patch:
Update a patch.
put:
Update a patch.
GET /api/patches/908071/?format=api
{ "id": 908071, "url": "http://patchwork.ozlabs.org/api/patches/908071/?format=api", "web_url": "http://patchwork.ozlabs.org/project/intel-wired-lan/patch/20180503035931.22439-3-pasha.tatashin@oracle.com/", "project": { "id": 46, "url": "http://patchwork.ozlabs.org/api/projects/46/?format=api", "name": "Intel Wired Ethernet development", "link_name": "intel-wired-lan", "list_id": "intel-wired-lan.osuosl.org", "list_email": "intel-wired-lan@osuosl.org", "web_url": "", "scm_url": "", "webscm_url": "", "list_archive_url": "", "list_archive_url_format": "", "commit_url_format": "" }, "msgid": "<20180503035931.22439-3-pasha.tatashin@oracle.com>", "list_archive_url": null, "date": "2018-05-03T03:59:31", "name": "[2/2] drivers core: multi-threading device shutdown", "commit_ref": null, "pull_url": null, "state": "superseded", "archived": false, "hash": "8bbf2ff912d9d54767db5c8cf88daff7c134456d", "submitter": { "id": 71010, "url": "http://patchwork.ozlabs.org/api/people/71010/?format=api", "name": "Pavel Tatashin", "email": "pasha.tatashin@oracle.com" }, "delegate": null, "mbox": "http://patchwork.ozlabs.org/project/intel-wired-lan/patch/20180503035931.22439-3-pasha.tatashin@oracle.com/mbox/", "series": [ { "id": 42367, "url": "http://patchwork.ozlabs.org/api/series/42367/?format=api", "web_url": "http://patchwork.ozlabs.org/project/intel-wired-lan/list/?series=42367", "date": "2018-05-03T03:59:29", "name": "multi-threading device shutdown", "version": 1, "mbox": "http://patchwork.ozlabs.org/series/42367/mbox/" } ], "comments": "http://patchwork.ozlabs.org/api/patches/908071/comments/", "check": "pending", "checks": "http://patchwork.ozlabs.org/api/patches/908071/checks/", "tags": {}, "related": [], "headers": { "Return-Path": "<intel-wired-lan-bounces@osuosl.org>", "X-Original-To": [ "incoming@patchwork.ozlabs.org", "intel-wired-lan@lists.osuosl.org" ], "Delivered-To": [ "patchwork-incoming@bilbo.ozlabs.org", "intel-wired-lan@lists.osuosl.org" ], "Authentication-Results": 
[ "ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=osuosl.org\n\t(client-ip=140.211.166.138; helo=whitealder.osuosl.org;\n\tenvelope-from=intel-wired-lan-bounces@osuosl.org;\n\treceiver=<UNKNOWN>)", "ozlabs.org;\n\tdmarc=fail (p=none dis=none) header.from=oracle.com", "ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (2048-bit key;\n\tunprotected) header.d=oracle.com header.i=@oracle.com\n\theader.b=\"Zq9/FM3v\"; dkim-atps=neutral" ], "Received": [ "from whitealder.osuosl.org (smtp1.osuosl.org [140.211.166.138])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 40cHyG52cDz9s4t\n\tfor <incoming@patchwork.ozlabs.org>;\n\tFri, 4 May 2018 00:45:58 +1000 (AEST)", "from localhost (localhost [127.0.0.1])\n\tby whitealder.osuosl.org (Postfix) with ESMTP id 3D5E088BF4;\n\tThu, 3 May 2018 14:45:57 +0000 (UTC)", "from whitealder.osuosl.org ([127.0.0.1])\n\tby localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024)\n\twith ESMTP id GHf1HUEe7jEP; Thu, 3 May 2018 14:45:52 +0000 (UTC)", "from ash.osuosl.org (ash.osuosl.org [140.211.166.34])\n\tby whitealder.osuosl.org (Postfix) with ESMTP id D819388B80;\n\tThu, 3 May 2018 14:45:50 +0000 (UTC)", "from silver.osuosl.org (smtp3.osuosl.org [140.211.166.136])\n\tby ash.osuosl.org (Postfix) with ESMTP id D020A1C07E1\n\tfor <intel-wired-lan@lists.osuosl.org>;\n\tThu, 3 May 2018 03:59:44 +0000 (UTC)", "from localhost (localhost [127.0.0.1])\n\tby silver.osuosl.org (Postfix) with ESMTP id CD43D2EB58\n\tfor <intel-wired-lan@lists.osuosl.org>;\n\tThu, 3 May 2018 03:59:44 +0000 (UTC)", "from silver.osuosl.org ([127.0.0.1])\n\tby localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024)\n\twith ESMTP id txxud9ccKGzp for <intel-wired-lan@lists.osuosl.org>;\n\tThu, 3 May 2018 03:59:43 +0000 (UTC)", "from userp2120.oracle.com (userp2120.oracle.com [156.151.31.85])\n\tby silver.osuosl.org (Postfix) with ESMTPS id 
C9A2E2EB30\n\tfor <intel-wired-lan@lists.osuosl.org>;\n\tThu, 3 May 2018 03:59:43 +0000 (UTC)", "from pps.filterd (userp2120.oracle.com [127.0.0.1])\n\tby userp2120.oracle.com (8.16.0.22/8.16.0.22) with SMTP id\n\tw433pIiX026867; Thu, 3 May 2018 03:59:41 GMT", "from userv0021.oracle.com (userv0021.oracle.com [156.151.31.71])\n\tby userp2120.oracle.com with ESMTP id 2hmhmfqq9v-1\n\t(version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256\n\tverify=OK); Thu, 03 May 2018 03:59:41 +0000", "from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72])\n\tby userv0021.oracle.com (8.14.4/8.14.4) with ESMTP id w433xfpF012156\n\t(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256\n\tverify=OK); Thu, 3 May 2018 03:59:41 GMT", "from abhmp0002.oracle.com (abhmp0002.oracle.com [141.146.116.8])\n\tby userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id w433xe1l023290; \n\tThu, 3 May 2018 03:59:40 GMT", "from xakep.us.oracle.com (/10.154.188.87)\n\tby default (Oracle Beehive Gateway v4.0)\n\twith ESMTP ; Wed, 02 May 2018 20:59:40 -0700" ], "X-Virus-Scanned": [ "amavisd-new at osuosl.org", "amavisd-new at osuosl.org" ], "X-Greylist": "domain auto-whitelisted by SQLgrey-1.7.6", "DKIM-Signature": "v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com;\n\th=from : to : subject :\n\tdate : message-id : in-reply-to : references; s=corp-2017-10-26;\n\tbh=v1eLTNmRblmbIeK87KBSgfE23uqpqRFXf6g/3DvYgz4=;\n\tb=Zq9/FM3vrVCUeCYpw9oDnJhsO26WYhVsSxbxQselTgdhxreMK5CjUmeQFALdHsiZtrwP\n\t3nNkSUxLfNMG0llEPcDLLxXIBAXvVlyjkl3xP91owHdTMGQDnyzDAk9UAu46wR0501Tu\n\t38/i3zECcQ5YjCtkyi8X69bpBmWXQLjtOxNYlbxQytDTx5jozSq1c9dgPoPq95kweYh/\n\t/RvGS1sngTJ5rhI1y80lyGHSijoTLiEGV+ipnwdM6qkrp7GQNONk/VyecuquJmGqXaXt\n\ttad9VgfPBr73caVsrhpsSSWg6q77VDvum8TjLFepjAz16RoGE+EvgWU9tgo9gbmXtgeK\n\teQ== ", "From": "Pavel Tatashin <pasha.tatashin@oracle.com>", "To": "pasha.tatashin@oracle.com, steven.sistare@oracle.com,\n\tdaniel.m.jordan@oracle.com, linux-kernel@vger.kernel.org,\n\tjeffrey.t.kirsher@intel.com, 
intel-wired-lan@lists.osuosl.org,\n\tnetdev@vger.kernel.org, gregkh@linuxfoundation.org", "Date": "Wed, 2 May 2018 23:59:31 -0400", "Message-Id": "<20180503035931.22439-3-pasha.tatashin@oracle.com>", "X-Mailer": "git-send-email 2.17.0", "In-Reply-To": "<20180503035931.22439-1-pasha.tatashin@oracle.com>", "References": "<20180503035931.22439-1-pasha.tatashin@oracle.com>", "X-Proofpoint-Virus-Version": "vendor=nai engine=5900 definitions=8881\n\tsignatures=668698", "X-Proofpoint-Spam-Details": "rule=notspam policy=default score=0 suspectscore=2\n\tmalwarescore=0\n\tphishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999\n\tadultscore=0 classifier=spam adjust=0 reason=mlx scancount=1\n\tengine=8.0.1-1711220000 definitions=main-1805030034", "X-Mailman-Approved-At": "Thu, 03 May 2018 14:45:49 +0000", "Subject": "[Intel-wired-lan] [PATCH 2/2] drivers core: multi-threading device\n\tshutdown", "X-BeenThere": "intel-wired-lan@osuosl.org", "X-Mailman-Version": "2.1.24", "Precedence": "list", "List-Id": "Intel Wired Ethernet Linux Kernel Driver Development\n\t<intel-wired-lan.osuosl.org>", "List-Unsubscribe": "<https://lists.osuosl.org/mailman/options/intel-wired-lan>, \n\t<mailto:intel-wired-lan-request@osuosl.org?subject=unsubscribe>", "List-Archive": "<http://lists.osuosl.org/pipermail/intel-wired-lan/>", "List-Post": "<mailto:intel-wired-lan@osuosl.org>", "List-Help": "<mailto:intel-wired-lan-request@osuosl.org?subject=help>", "List-Subscribe": "<https://lists.osuosl.org/mailman/listinfo/intel-wired-lan>, \n\t<mailto:intel-wired-lan-request@osuosl.org?subject=subscribe>", "MIME-Version": "1.0", "Content-Type": "text/plain; charset=\"us-ascii\"", "Content-Transfer-Encoding": "7bit", "Errors-To": "intel-wired-lan-bounces@osuosl.org", "Sender": "\"Intel-wired-lan\" <intel-wired-lan-bounces@osuosl.org>" }, "content": "When system is rebooted, halted or kexeced device_shutdown() is\ncalled.\n\nThis function shuts down every single device by calling 
either:\n\tdev->bus->shutdown(dev)\n\tdev->driver->shutdown(dev)\n\nEven on a machine just with a moderate amount of devices, device_shutdown()\nmay take multiple seconds to complete. Because many devices require a\nspecific delays to perform this operation.\n\nHere is sample analysis of time it takes to call device_shutdown() on\ntwo socket Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz machine.\n\ndevice_shutdown\t\t2.95s\n mlx4_shutdown\t\t1.14s\n megasas_shutdown\t0.24s\n ixgbe_shutdown\t\t0.37s x 4 (four ixgbe devices on my machine).\n the rest\t\t0.09s\n\nIn mlx4 we spent the most time, but that is because there is a 1 second\nsleep:\nmlx4_shutdown\n mlx4_unload_one\n mlx4_free_ownership\n msleep(1000)\n\nWith megasas we spend quoter of second, but sometimes longer (up-to 0.5s)\nin this path:\n\n megasas_shutdown\n megasas_flush_cache\n megasas_issue_blocked_cmd\n wait_event_timeout\n\nFinally, with ixgbe_shutdown() it takes 0.37 for each device, but that time\nis spread all over the place, with bigger offenders:\n\n ixgbe_shutdown\n __ixgbe_shutdown\n ixgbe_close_suspend\n ixgbe_down\n ixgbe_init_hw_generic\n ixgbe_reset_hw_X540\n msleep(100); 0.104483472\n ixgbe_get_san_mac_addr_generic 0.048414851\n ixgbe_get_wwn_prefix_generic 0.048409893\n ixgbe_start_hw_X540\n ixgbe_start_hw_generic\n ixgbe_clear_hw_cntrs_generic 0.048581502\n ixgbe_setup_fc_generic 0.024225800\n\n All the ixgbe_*generic functions end-up calling:\n ixgbe_read_eerd_X540()\n ixgbe_acquire_swfw_sync_X540\n usleep_range(5000, 6000);\n ixgbe_release_swfw_sync_X540\n usleep_range(5000, 6000);\n\nWhile these are short sleeps, they end-up calling them over 24 times!\n24 * 0.0055s = 0.132s. 
Adding-up to 0.528s for four devices.\n\nWhile we should keep optimizing the individual device drivers, in some\ncases this is simply a hardware property that forces a specific delay, and\nwe must wait.\n\nSo, the solution for this problem is to shutdown devices in parallel.\nHowever, we must shutdown children before shutting down parents, so parent\ndevice must wait for its children to finish.\n\nWith this patch, on the same machine devices_shutdown() takes 1.142s, and\nwithout mlx4 one second delay only 0.38s\n\nSigned-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>\n---\n drivers/base/core.c | 238 +++++++++++++++++++++++++++++++++++---------\n 1 file changed, 189 insertions(+), 49 deletions(-)", "diff": "diff --git a/drivers/base/core.c b/drivers/base/core.c\nindex b610816eb887..f370369a303b 100644\n--- a/drivers/base/core.c\n+++ b/drivers/base/core.c\n@@ -25,6 +25,7 @@\n #include <linux/netdevice.h>\n #include <linux/sched/signal.h>\n #include <linux/sysfs.h>\n+#include <linux/kthread.h>\n \n #include \"base.h\"\n #include \"power/power.h\"\n@@ -2102,6 +2103,59 @@ const char *device_get_devnode(struct device *dev,\n \treturn *tmp = s;\n }\n \n+/**\n+ * device_children_count - device children count\n+ * @parent: parent struct device.\n+ *\n+ * Returns number of children for this device or 0 if nonde.\n+ */\n+static int device_children_count(struct device *parent)\n+{\n+\tstruct klist_iter i;\n+\tint children = 0;\n+\n+\tif (!parent->p)\n+\t\treturn 0;\n+\n+\tklist_iter_init(&parent->p->klist_children, &i);\n+\twhile (next_device(&i))\n+\t\tchildren++;\n+\tklist_iter_exit(&i);\n+\n+\treturn children;\n+}\n+\n+/**\n+ * device_get_child_by_index - Return child using the provide index.\n+ * @parent: parent struct device.\n+ * @index: Index of the child, where 0 is the first child in the children list,\n+ * and so on.\n+ *\n+ * Returns child or NULL if child with this index is not present.\n+ */\n+static struct device *\n+device_get_child_by_index(struct device 
*parent, int index)\n+{\n+\tstruct klist_iter i;\n+\tstruct device *dev = NULL, *d;\n+\tint child_index = 0;\n+\n+\tif (!parent->p || index < 0)\n+\t\treturn NULL;\n+\n+\tklist_iter_init(&parent->p->klist_children, &i);\n+\twhile ((d = next_device(&i)) != NULL) {\n+\t\tif (child_index == index) {\n+\t\t\tdev = d;\n+\t\t\tbreak;\n+\t\t}\n+\t\tchild_index++;\n+\t}\n+\tklist_iter_exit(&i);\n+\n+\treturn dev;\n+}\n+\n /**\n * device_for_each_child - device child iterator.\n * @parent: parent struct device.\n@@ -2765,71 +2819,157 @@ int device_move(struct device *dev, struct device *new_parent,\n }\n EXPORT_SYMBOL_GPL(device_move);\n \n+/*\n+ * device_shutdown_one - call ->shutdown() for the device passed as\n+ * argument.\n+ */\n+static void device_shutdown_one(struct device *dev)\n+{\n+\t/* Don't allow any more runtime suspends */\n+\tpm_runtime_get_noresume(dev);\n+\tpm_runtime_barrier(dev);\n+\n+\tif (dev->class && dev->class->shutdown_pre) {\n+\t\tif (initcall_debug)\n+\t\t\tdev_info(dev, \"shutdown_pre\\n\");\n+\t\tdev->class->shutdown_pre(dev);\n+\t}\n+\tif (dev->bus && dev->bus->shutdown) {\n+\t\tif (initcall_debug)\n+\t\t\tdev_info(dev, \"shutdown\\n\");\n+\t\tdev->bus->shutdown(dev);\n+\t} else if (dev->driver && dev->driver->shutdown) {\n+\t\tif (initcall_debug)\n+\t\t\tdev_info(dev, \"shutdown\\n\");\n+\t\tdev->driver->shutdown(dev);\n+\t}\n+\n+\t/* Release device lock, and decrement the reference counter */\n+\tdevice_unlock(dev);\n+\tput_device(dev);\n+}\n+\n+static DECLARE_COMPLETION(device_root_tasks_complete);\n+static void device_shutdown_tree(struct device *dev);\n+static atomic_t device_root_tasks;\n+\n+/*\n+ * Passed as an argument to to device_shutdown_task().\n+ * child_next_index\tthe next available child index.\n+ * tasks_running\tnumber of tasks still running. 
Each tasks decrements it\n+ *\t\t\twhen job is finished and the last tasks signals that the\n+ *\t\t\tjob is complete.\n+ * complete\t\tUsed to signal job competition.\n+ * parent\t\tParent device.\n+ */\n+struct device_shutdown_task_data {\n+\tatomic_t\t\tchild_next_index;\n+\tatomic_t\t\ttasks_running;\n+\tstruct completion\tcomplete;\n+\tstruct device\t\t*parent;\n+};\n+\n+static int device_shutdown_task(void *data)\n+{\n+\tstruct device_shutdown_task_data *tdata =\n+\t\t(struct device_shutdown_task_data *)data;\n+\tint child_idx = atomic_inc_return(&tdata->child_next_index) - 1;\n+\tstruct device *dev = device_get_child_by_index(tdata->parent,\n+\t\t\t\t\t\t child_idx);\n+\n+\tif (dev)\n+\t\tdevice_shutdown_tree(dev);\n+\tif (atomic_dec_return(&tdata->tasks_running) == 0)\n+\t\tcomplete(&tdata->complete);\n+\treturn 0;\n+}\n+\n+/*\n+ * Shutdown device tree with root started in dev. If dev has no children\n+ * simply shutdown only this device. If dev has children recursively shutdown\n+ * children first, and only then the parent. 
For performance reasons children\n+ * are shutdown in parallel using kernel threads.\n+ */\n+static void device_shutdown_tree(struct device *dev)\n+{\n+\tint children_count = device_children_count(dev);\n+\n+\tif (children_count) {\n+\t\tstruct device_shutdown_task_data tdata;\n+\t\tint i;\n+\n+\t\tinit_completion(&tdata.complete);\n+\t\tatomic_set(&tdata.child_next_index, 0);\n+\t\tatomic_set(&tdata.tasks_running, children_count);\n+\t\ttdata.parent = dev;\n+\n+\t\tfor (i = 0; i < children_count; i++) {\n+\t\t\tkthread_run(device_shutdown_task,\n+\t\t\t\t &tdata, \"device_shutdown.%s\",\n+\t\t\t\t dev_name(dev));\n+\t\t}\n+\t\twait_for_completion(&tdata.complete);\n+\t}\n+\tdevice_shutdown_one(dev);\n+}\n+\n+/*\n+ * On shutdown each root device (the one that does not have a parent) goes\n+ * through this function.\n+ */\n+static int\n+device_shutdown_root_task(void *data)\n+{\n+\tstruct device *dev = (struct device *)data;\n+\n+\tdevice_shutdown_tree(dev);\n+\tif (atomic_dec_return(&device_root_tasks) == 0)\n+\t\tcomplete(&device_root_tasks_complete);\n+\treturn 0;\n+}\n+\n /**\n * device_shutdown - call ->shutdown() on each device to shutdown.\n */\n void device_shutdown(void)\n {\n-\tstruct device *dev, *parent;\n+\tstruct list_head *pos, *next;\n+\tint root_devices = 0;\n+\tstruct device *dev;\n \n \tspin_lock(&devices_kset->list_lock);\n \t/*\n-\t * Walk the devices list backward, shutting down each in turn.\n-\t * Beware that device unplug events may also start pulling\n-\t * devices offline, even as the system is shutting down.\n+\t * Prepare devices for shutdown: lock, and increment references in every\n+\t * devices. 
Remove child devices from the list, and count number of root\n+\t * devices.\n \t */\n-\twhile (!list_empty(&devices_kset->list)) {\n-\t\tdev = list_entry(devices_kset->list.prev, struct device,\n-\t\t\t\tkobj.entry);\n+\tlist_for_each_safe(pos, next, &devices_kset->list) {\n+\t\tdev = list_entry(pos, struct device, kobj.entry);\n \n-\t\t/*\n-\t\t * hold reference count of device's parent to\n-\t\t * prevent it from being freed because parent's\n-\t\t * lock is to be held\n-\t\t */\n-\t\tparent = get_device(dev->parent);\n \t\tget_device(dev);\n-\t\t/*\n-\t\t * Make sure the device is off the kset list, in the\n-\t\t * event that dev->*->shutdown() doesn't remove it.\n-\t\t */\n-\t\tlist_del_init(&dev->kobj.entry);\n-\t\tspin_unlock(&devices_kset->list_lock);\n-\n-\t\t/* hold lock to avoid race with probe/release */\n-\t\tif (parent)\n-\t\t\tdevice_lock(parent);\n \t\tdevice_lock(dev);\n \n-\t\t/* Don't allow any more runtime suspends */\n-\t\tpm_runtime_get_noresume(dev);\n-\t\tpm_runtime_barrier(dev);\n-\n-\t\tif (dev->class && dev->class->shutdown_pre) {\n-\t\t\tif (initcall_debug)\n-\t\t\t\tdev_info(dev, \"shutdown_pre\\n\");\n-\t\t\tdev->class->shutdown_pre(dev);\n-\t\t}\n-\t\tif (dev->bus && dev->bus->shutdown) {\n-\t\t\tif (initcall_debug)\n-\t\t\t\tdev_info(dev, \"shutdown\\n\");\n-\t\t\tdev->bus->shutdown(dev);\n-\t\t} else if (dev->driver && dev->driver->shutdown) {\n-\t\t\tif (initcall_debug)\n-\t\t\t\tdev_info(dev, \"shutdown\\n\");\n-\t\t\tdev->driver->shutdown(dev);\n-\t\t}\n-\n-\t\tdevice_unlock(dev);\n-\t\tif (parent)\n-\t\t\tdevice_unlock(parent);\n-\n-\t\tput_device(dev);\n-\t\tput_device(parent);\n-\n+\t\tif (!dev->parent)\n+\t\t\troot_devices++;\n+\t\telse\n+\t\t\tlist_del_init(&dev->kobj.entry);\n+\t}\n+\tatomic_set(&device_root_tasks, root_devices);\n+\t/*\n+\t * Shutdown the root devices in parallel. 
The children are going to be\n+\t * shutdown first.\n+\t */\n+\tlist_for_each_safe(pos, next, &devices_kset->list) {\n+\t\tdev = list_entry(pos, struct device, kobj.entry);\n+\t\tlist_del_init(&dev->kobj.entry);\n+\t\tspin_unlock(&devices_kset->list_lock);\n+\t\tkthread_run(device_shutdown_root_task,\n+\t\t\t dev, \"device_root_shutdown.%s\",\n+\t\t\t dev_name(dev));\n \t\tspin_lock(&devices_kset->list_lock);\n \t}\n \tspin_unlock(&devices_kset->list_lock);\n+\twait_for_completion(&device_root_tasks_complete);\n }\n \n /*\n", "prefixes": [ "2/2" ] }
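The ordering rule stated in the patch's commit message (children are shut down before their parent, and siblings are shut down in parallel) can be illustrated with a small user-space sketch. This is not the kernel code from the diff. The names `shutdown_tree` and `shutdown_order` and the toy device tree are hypothetical. Python threads stand in for the kernel threads started with `kthread_run()`, and `Thread.join()` stands in for the `tasks_running` counter and the completion.

```python
import threading


def shutdown_tree(node, children, log, lock):
    """Shut down all child subtrees of `node` in parallel, then `node` itself.

    `children` maps a device name to a list of its child device names.
    `log` records the order in which devices were shut down.
    """
    workers = [
        threading.Thread(target=shutdown_tree, args=(child, children, log, lock))
        for child in children.get(node, [])
    ]
    for worker in workers:
        worker.start()
    # The parent must wait for every child subtree to finish first.
    for worker in workers:
        worker.join()
    # Stand-in for calling the device's ->shutdown() callback.
    with lock:
        log.append(node)


def shutdown_order(children, root):
    """Return the order in which the devices under `root` were shut down."""
    log = []
    shutdown_tree(root, children, log, threading.Lock())
    return log


if __name__ == "__main__":
    # Hypothetical device tree, used only for illustration.
    tree = {"pci0": ["eth0", "eth1"], "eth0": ["eth0-phy"]}
    print(shutdown_order(tree, "pci0"))
```

The order of siblings in the result varies between runs, but a parent always appears after all of its descendants, which is the invariant the commit message requires.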