[{"id":1756761,"web_url":"http://patchwork.ozlabs.org/comment/1756761/","msgid":"<20170824202111.GS31858@bhelgaas-glaptop.roam.corp.google.com>","list_archive_url":null,"date":"2017-08-24T20:21:12","subject":"Re: [PATCH v5 04/10] PCI: rockchip: fix system hang up if activating\n\tCONFIG_DEBUG_SHIRQ","submitter":{"id":67298,"url":"http://patchwork.ozlabs.org/api/people/67298/","name":"Bjorn Helgaas","email":"helgaas@kernel.org"},"content":"[+cc Tejun, Dmitry, Michael, Stephen, linux-clk for devm/clk questions]\n\nOn Wed, Aug 23, 2017 at 03:02:38PM +0800, Shawn Lin wrote:\n> With CONFIG_DEBUG_SHIRQ enabled, the irq tear down routine\n> would still access the irq handler registed as a shard irq.\n> Per the comment within the function of __free_irq, it says\n> \"It's a shared IRQ -- the driver ought to be prepared for\n> an IRQ event to happen even now it's being freed\". However\n> when failing to probe the driver, it may disable the clock\n> for accessing the register and the following check for shared\n> irq state would call the irq handler which accesses the register\n> w/o the clk enabled. 
That will hang the system forever.\n> \n> With adding some dump_stack we could see how that happened.\n> \n> calling  rockchip_pcie_driver_init+0x0/0x28 @ 1\n> rockchip-pcie f8000000.pcie: no vpcie3v3 regulator found\n> rockchip-pcie f8000000.pcie: no vpcie1v8 regulator found\n> rockchip-pcie f8000000.pcie: no vpcie0v9 regulator found\n> rockchip-pcie f8000000.pcie: PCIe link training gen1 timeout!\n> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc3-next-20170807-ARCH+ #189\n> Hardware name: Firefly-RK3399 Board (DT)\n> Call trace:\n> [<ffff000008089bf0>] dump_backtrace+0x0/0x250\n> [<ffff000008089eb0>] show_stack+0x20/0x28\n> [<ffff000008c3313c>] dump_stack+0x90/0xb0\n> [<ffff000008632ad4>] rockchip_pcie_read.isra.11+0x54/0x58\n> [<ffff0000086334fc>] rockchip_pcie_client_irq_handler+0x30/0x1a0\n> [<ffff00000813ce98>] __free_irq+0x1c8/0x2dc\n> [<ffff00000813d044>] free_irq+0x44/0x74\n> [<ffff0000081415fc>] devm_irq_release+0x24/0x2c\n> [<ffff00000877429c>] release_nodes+0x1d8/0x30c\n> [<ffff000008774838>] devres_release_all+0x3c/0x5c\n> [<ffff00000876f19c>] driver_probe_device+0x244/0x494\n> [<ffff00000876f50c>] __driver_attach+0x120/0x124\n> [<ffff00000876cb80>] bus_for_each_dev+0x6c/0xac\n> [<ffff00000876e984>] driver_attach+0x2c/0x34\n> [<ffff00000876e3a4>] bus_add_driver+0x244/0x2b0\n> [<ffff000008770264>] driver_register+0x70/0x110\n> [<ffff0000087718b4>] platform_driver_register+0x60/0x6c\n> [<ffff0000091eb108>] rockchip_pcie_driver_init+0x20/0x28\n> [<ffff000008083a2c>] do_one_initcall+0xc8/0x130\n> [<ffff0000091a0ea8>] kernel_init_freeable+0x1a0/0x238\n> [<ffff000008c461cc>] kernel_init+0x18/0x108\n> [<ffff0000080836c0>] ret_from_fork+0x10/0x50\n> \n> In order to fix this, we remove all the clock-disabling from\n> the error handle path and driver's remove function. And replying\n> on the devm_add_action_or_reset to fire the clock-disabling at\n> the appropriate time. 
Also split out rockchip_pcie_setup_irq\n> and move requesting irq after enabling clks to avoid this kind\n\nThanks for splitting out the refactoring stuff.  That really makes\nthis patch much simpler.\n\nIIUC, this really has nothing to do with CONFIG_DEBUG_SHIRQ.  It may\nbe true that you've only *seen* the problem with CONFIG_DEBUG_SHIRQ\nenabled, but all that config option does is take a situation that\ncould happen at any time (another device sharing the IRQ generating an\ninterrupt), and force it to happen.  So it's just a way to expose an\nexisting driver problem.\n\nThe real problem is apparently that rockchip_pcie_subsys_irq_handler()\nrelies on some clock being enabled, but we're leaving it registered at\na time when the clock has already been disabled.\n\nYou fixed that by using devm_add_action_or_reset() to tell devm to\ndisable the clocks *after* releasing the IRQ.\n\nThat sort of makes sense, but devm_add_action_or_reset() is a little\nobscure, and this feels like a hole in the devm framework.  Seems like\nit would be nice if there were some sort of devm wrapper for\nclk_prepare_enable() so this would happen automatically.\n\nThis pattern:\n\n  clk = devm_clk_get(...);\n  if (IS_ERR(clk)) {\n    dev_warn(\"no clock for ...\");\n    return PTR_ERR(clk);\n  }\n\n  ret = clk_prepare_enable(clk);\n  if (ret) {\n    dev_warn(\"failed to enable ...\");\n    return err;\n  }\n\nis quite common (\"git grep -A10 devm_clk_get | grep clk_prepare_enable\n | wc -l\" finds over 400 occurrences).  Should there be something to\nsimplify this a little?\n\nI also wonder about other PCI host drivers that use both\nclk_prepare_enable() and devm_request_irq().  Maybe Rockchip is\n\"special\" in that it seems the driver must turn on a clock before it\ncan even talk to the host controller, whereas maybe other drivers can\nalways talk to the host controller, but need to turn on clocks\ndownstream from the controller.  
I didn't audit them, but I'm\nconcerned that some of them might have this same problem.\n\n> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>\n> \n> ---\n> \n> Changes in v5:\n> - rebase on former reconstrtion patches suggested by Bjorn\n> \n> Changes in v4:\n> - split out rockchip_pcie_enable_clocks and reuse\n>   rockchip_pcie_enable_clocks and rockchip_pcie_disable_clocks\n>   for elsewhere suggested by Jeffy\n> \n> Changes in v3:\n> - check the return value of devm_add_action_or_reset and spilt out\n>   rockchip_pcie_setup_irq in order to move requesting irq after\n>   enabling clks.\n> \n> Changes in v2:\n> - use devm_add_action_or_reset to fix this ordering suggested by\n>   Heiko and Jeffy. Thanks!\n> \n>  drivers/pci/host/pcie-rockchip.c | 22 +++++++++++++---------\n>  1 file changed, 13 insertions(+), 9 deletions(-)\n> \n> diff --git a/drivers/pci/host/pcie-rockchip.c b/drivers/pci/host/pcie-rockchip.c\n> index 971d22b..891b60a 100644\n> --- a/drivers/pci/host/pcie-rockchip.c\n> +++ b/drivers/pci/host/pcie-rockchip.c\n> @@ -1099,10 +1099,6 @@ static int rockchip_pcie_parse_dt(struct rockchip_pcie *rockchip)\n>  \t\treturn PTR_ERR(rockchip->clk_pcie_pm);\n>  \t}\n>  \n> -\terr = rockchip_pcie_setup_irq(rockchip);\n> -\tif (err)\n> -\t\treturn err;\n> -\n>  \trockchip->vpcie12v = devm_regulator_get_optional(dev, \"vpcie12v\");\n>  \tif (IS_ERR(rockchip->vpcie12v)) {\n>  \t\tif (PTR_ERR(rockchip->vpcie12v) == -EPROBE_DEFER)\n> @@ -1525,10 +1521,22 @@ static int rockchip_pcie_probe(struct platform_device *pdev)\n>  \tif (err)\n>  \t\treturn err;\n>  \n> +\terr = devm_add_action_or_reset(dev,\n> +\t\t\t\t       rockchip_pcie_disable_clocks,\n> +\t\t\t\t       rockchip);\n> +\tif (err) {\n> +\t\tdev_err(dev, \"unable to add action or reset\\n\");\n> +\t\treturn err;\n> +\t}\n> +\n> +\terr = rockchip_pcie_setup_irq(rockchip);\n> +\tif (err)\n> +\t\treturn err;\n> +\n>  \terr = rockchip_pcie_set_vpcie(rockchip);\n>  \tif (err) {\n>  \t\tdev_err(dev, \"failed to 
set vpcie regulator\\n\");\n> -\t\tgoto err_set_vpcie;\n> +\t\treturn err;\n>  \t}\n>  \n>  \terr = rockchip_pcie_init_port(rockchip);\n> @@ -1625,8 +1633,6 @@ static int rockchip_pcie_probe(struct platform_device *pdev)\n>  \t\tregulator_disable(rockchip->vpcie1v8);\n>  \tif (!IS_ERR(rockchip->vpcie0v9))\n>  \t\tregulator_disable(rockchip->vpcie0v9);\n> -err_set_vpcie:\n> -\trockchip_pcie_disable_clocks(rockchip);\n>  \treturn err;\n>  }\n>  \n> @@ -1648,8 +1654,6 @@ static int rockchip_pcie_remove(struct platform_device *pdev)\n>  \t\tphy_exit(rockchip->phys[i]);\n>  \t}\n>  \n> -\trockchip_pcie_disable_clocks(rockchip);\n> -\n>  \tif (!IS_ERR(rockchip->vpcie12v))\n>  \t\tregulator_disable(rockchip->vpcie12v);\n>  \tif (!IS_ERR(rockchip->vpcie3v3))\n> -- \n> 1.9.1\n> \n>","headers":{"Return-Path":"<linux-pci-owner@vger.kernel.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=linux-pci-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","mail.kernel.org;\n\tdmarc=none (p=none dis=none) header.from=kernel.org","mail.kernel.org;\n\tspf=none smtp.mailfrom=helgaas@kernel.org"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xdbKS24Gwz9sRV\n\tfor <incoming@patchwork.ozlabs.org>;\n\tFri, 25 Aug 2017 06:21:16 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1753201AbdHXUVO (ORCPT <rfc822;incoming@patchwork.ozlabs.org>);\n\tThu, 24 Aug 2017 16:21:14 -0400","from mail.kernel.org ([198.145.29.99]:38138 \"EHLO mail.kernel.org\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S1752855AbdHXUVN (ORCPT <rfc822;linux-pci@vger.kernel.org>);\n\tThu, 24 Aug 2017 16:21:13 -0400","from localhost (unknown [69.55.156.165])\n\t(using TLSv1.2 with cipher 
DHE-RSA-AES128-SHA (128/128 bits))\n\t(No client certificate requested)\n\tby mail.kernel.org (Postfix) with ESMTPSA id 25A1D21A1B;\n\tThu, 24 Aug 2017 20:21:13 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mail.kernel.org 25A1D21A1B","Date":"Thu, 24 Aug 2017 15:21:12 -0500","From":"Bjorn Helgaas <helgaas@kernel.org>","To":"Shawn Lin <shawn.lin@rock-chips.com>","Cc":"Bjorn Helgaas <bhelgaas@google.com>, linux-pci@vger.kernel.org,\n\tlinux-rockchip@lists.infradead.org,\n\tBrian Norris <briannorris@chromium.org>,\n\tJeffy Chen <jeffy.chen@rock-chips.com>, Tejun Heo <tj@kernel.org>,\n\tDmitry Torokhov <dmitry.torokhov@gmail.com>,\n\tMichael Turquette <mturquette@baylibre.com>,\n\tStephen Boyd <sboyd@codeaurora.org>, linux-clk@vger.kernel.org","Subject":"Re: [PATCH v5 04/10] PCI: rockchip: fix system hang up if activating\n\tCONFIG_DEBUG_SHIRQ","Message-ID":"<20170824202111.GS31858@bhelgaas-glaptop.roam.corp.google.com>","References":"<1503471673-69478-1-git-send-email-shawn.lin@rock-chips.com>\n\t<1503471758-73904-1-git-send-email-shawn.lin@rock-chips.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<1503471758-73904-1-git-send-email-shawn.lin@rock-chips.com>","User-Agent":"Mutt/1.5.21 (2010-09-15)","Sender":"linux-pci-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<linux-pci.vger.kernel.org>","X-Mailing-List":"linux-pci@vger.kernel.org"}},{"id":1756793,"web_url":"http://patchwork.ozlabs.org/comment/1756793/","msgid":"<CAKdAkRRFxsAfJrzr=rjo_mtMzP0y9-cRz9Vz+M92AhbYd5B=ww@mail.gmail.com>","list_archive_url":null,"date":"2017-08-24T21:10:52","subject":"Re: [PATCH v5 04/10] PCI: rockchip: fix system hang up if\n\tactivating CONFIG_DEBUG_SHIRQ","submitter":{"id":695,"url":"http://patchwork.ozlabs.org/api/people/695/","name":"Dmitry Torokhov","email":"dmitry.torokhov@gmail.com"},"content":"On Thu, Aug 24, 2017 at 1:21 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:\n> [+cc Tejun, Dmitry, 
Michael, Stephen, linux-clk for devm/clk questions]\n>\n> On Wed, Aug 23, 2017 at 03:02:38PM +0800, Shawn Lin wrote:\n>> With CONFIG_DEBUG_SHIRQ enabled, the irq tear down routine\n>> would still access the irq handler registed as a shard irq.\n>> Per the comment within the function of __free_irq, it says\n>> \"It's a shared IRQ -- the driver ought to be prepared for\n>> an IRQ event to happen even now it's being freed\". However\n>> when failing to probe the driver, it may disable the clock\n>> for accessing the register and the following check for shared\n>> irq state would call the irq handler which accesses the register\n>> w/o the clk enabled. That will hang the system forever.\n>>\n>> With adding some dump_stack we could see how that happened.\n>>\n>> calling  rockchip_pcie_driver_init+0x0/0x28 @ 1\n>> rockchip-pcie f8000000.pcie: no vpcie3v3 regulator found\n>> rockchip-pcie f8000000.pcie: no vpcie1v8 regulator found\n>> rockchip-pcie f8000000.pcie: no vpcie0v9 regulator found\n>> rockchip-pcie f8000000.pcie: PCIe link training gen1 timeout!\n>> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc3-next-20170807-ARCH+ #189\n>> Hardware name: Firefly-RK3399 Board (DT)\n>> Call trace:\n>> [<ffff000008089bf0>] dump_backtrace+0x0/0x250\n>> [<ffff000008089eb0>] show_stack+0x20/0x28\n>> [<ffff000008c3313c>] dump_stack+0x90/0xb0\n>> [<ffff000008632ad4>] rockchip_pcie_read.isra.11+0x54/0x58\n>> [<ffff0000086334fc>] rockchip_pcie_client_irq_handler+0x30/0x1a0\n>> [<ffff00000813ce98>] __free_irq+0x1c8/0x2dc\n>> [<ffff00000813d044>] free_irq+0x44/0x74\n>> [<ffff0000081415fc>] devm_irq_release+0x24/0x2c\n>> [<ffff00000877429c>] release_nodes+0x1d8/0x30c\n>> [<ffff000008774838>] devres_release_all+0x3c/0x5c\n>> [<ffff00000876f19c>] driver_probe_device+0x244/0x494\n>> [<ffff00000876f50c>] __driver_attach+0x120/0x124\n>> [<ffff00000876cb80>] bus_for_each_dev+0x6c/0xac\n>> [<ffff00000876e984>] driver_attach+0x2c/0x34\n>> [<ffff00000876e3a4>] bus_add_driver+0x244/0x2b0\n>> 
[<ffff000008770264>] driver_register+0x70/0x110\n>> [<ffff0000087718b4>] platform_driver_register+0x60/0x6c\n>> [<ffff0000091eb108>] rockchip_pcie_driver_init+0x20/0x28\n>> [<ffff000008083a2c>] do_one_initcall+0xc8/0x130\n>> [<ffff0000091a0ea8>] kernel_init_freeable+0x1a0/0x238\n>> [<ffff000008c461cc>] kernel_init+0x18/0x108\n>> [<ffff0000080836c0>] ret_from_fork+0x10/0x50\n>>\n>> In order to fix this, we remove all the clock-disabling from\n>> the error handle path and driver's remove function. And replying\n>> on the devm_add_action_or_reset to fire the clock-disabling at\n>> the appropriate time. Also split out rockchip_pcie_setup_irq\n>> and move requesting irq after enabling clks to avoid this kind\n>\n> Thanks for splitting out the refactoring stuff.  That really makes\n> this patch much simpler.\n>\n> IIUC, this really has nothing to do with CONFIG_DEBUG_SHIRQ.  It may\n> be true that you've only *seen* the problem with CONFIG_DEBUG_SHIRQ\n> enabled, but all that config option does is take a situation that\n> could happen at any time (another device sharing the IRQ generating an\n> interrupt), and force it to happen.  So it's just a way to expose an\n> existing driver problem.\n>\n> The real problem is apparently that rockchip_pcie_subsys_irq_handler()\n> relies on some clock being enabled, but we're leaving it registered at\n> a time when the clock has already been disabled.\n>\n> You fixed that by using devm_add_action_or_reset() to tell devm to\n> disable the clocks *after* releasing the IRQ.\n>\n> That sort of makes sense, but devm_add_action_or_reset() is a little\n> obscure, and this feels like a hole in the devm framework.  
Seems like\n> it would be nice if there were some sort of devm wrapper for\n> clk_prepare_enable() so this would happen automatically.\n>\n> This pattern:\n>\n>   clk = devm_clk_get(...);\n>   if (IS_ERR(clk)) {\n>     dev_warn(\"no clock for ...\");\n>     return PTR_ERR(clk);\n>   }\n>\n>   ret = clk_prepare_enable(clk);\n>   if (ret) {\n>     dev_warn(\"failed to enable ...\");\n>     return err;\n>   }\n>\n> is quite common (\"git grep -A10 devm_clk_get | grep clk_prepare_enable\n>  | wc -l\" finds over 400 occurrences).  Should there be something to\n> simplify this a little?\n>\n> I also wonder about other PCI host drivers that use both\n> clk_prepare_enable() and devm_request_irq().  Maybe Rockchip is\n> \"special\" in that it seems the driver must turn on a clock before it\n> can even talk to the host controller, whereas maybe other drivers can\n> always talk to the host controller, but need to turn on clocks\n> downstream from the controller.  I didn't audit them, but I'm\n> concerned that some of them might have this same problem.\n\nI proposed devm_clk_prepare_enable() and friends (see\nhttps://lkml.org/lkml/2017/2/14/544), but Stephen did not like it and\nmentioned that he and Mike were working on a different solution where\nclk_put() would drop all enables. I have not seen any updates on that\nthough. 
Maybe we should revisit devm approach?\n\nThanks.","headers":{"Return-Path":"<linux-pci-owner@vger.kernel.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=linux-pci-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"L3edrsFM\"; dkim-atps=neutral"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xdcQm3zcrz9sRq\n\tfor <incoming@patchwork.ozlabs.org>;\n\tFri, 25 Aug 2017 07:10:56 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1753217AbdHXVKz (ORCPT <rfc822;incoming@patchwork.ozlabs.org>);\n\tThu, 24 Aug 2017 17:10:55 -0400","from mail-ua0-f196.google.com ([209.85.217.196]:37557 \"EHLO\n\tmail-ua0-f196.google.com\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S1753142AbdHXVKx (ORCPT\n\t<rfc822; linux-pci@vger.kernel.org>); Thu, 24 Aug 2017 17:10:53 -0400","by mail-ua0-f196.google.com with SMTP id e10so51579uah.4;\n\tThu, 24 Aug 2017 14:10:53 -0700 (PDT)","by 10.176.19.242 with HTTP; Thu, 24 Aug 2017 14:10:52 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=gmail.com; s=20161025;\n\th=mime-version:in-reply-to:references:from:date:message-id:subject:to\n\t:cc; bh=XopqZZJXoxR644N+NHAU3ct+XAMrpIcXRB0d6WpdW9k=;\n\tb=L3edrsFMFOhE+spMaN0JwlHuB1G1F8DbqWHyPIamJ9Z2HMd1eL089aX/29TDSp3BR3\n\tV46lRU05m+3eTMPqWfMSFCAvDaSnf4B5hZvBAT2CLYAdD6VmJlDjaXGA3qaSf4izyXJL\n\tmJkIWiE+YP3yWiPp9w+i2dynR/+bUXDzNOSJlxGHudqu6ouWNlvV9s1yjN8ZYfElS60S\n\tUcjdMzKK2FxuMDy83QnvQ5WvuAknG0M1l4a3n95w5R83r7Uz6J7R4Y++6QlGJzpzixSi\n\tZnDD6ygBcbNKaww/qAxnOtr01+w6wUQE9TjbrWv5IkfKro9BvVD7LYi2wA+fje1Hp1Hv\n\t/0eg==","X-Google-DKIM-Signature":"v=1; 
a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:mime-version:in-reply-to:references:from:date\n\t:message-id:subject:to:cc;\n\tbh=XopqZZJXoxR644N+NHAU3ct+XAMrpIcXRB0d6WpdW9k=;\n\tb=lqAMw1bcQ7mF1EfusC8ICZszjFsbSOYfT9MuiDv3JW4/p4KBDOxeJy/xDjRqP2LntH\n\tSTL8HyJ9oX+cvQbEiJosywmZvm0REWTGu7fQVTKeX4IDYZX8P2G5IEm6q+EWaOKBXPk8\n\tKx0CF5Oj2piG3p1ra8lLfJUqrxDKDvehj/lEdYYf0eCZZdklviYObW6GXxEIUTjA1/FV\n\tqxIJsPRTV6TBWbrpv0TWhaWjGlcKcOfD+m6C6Ly8qjajn9kza8A+tyA83I48fmoniWs4\n\t84cI95wybu2LHKJkiPSvJPHzEEu2Y3xOr5SNarjZw2hZiU8eQ/38w5CWkqbJvEBngzd1\n\t2EQw==","X-Gm-Message-State":"AHYfb5jIwkOXyRP66gMFv/4ZT2r3z1t4E1hwwyv3FjoDWhL3K2T9di19\n\t2ThYHgdKaSrehtVrHeJSpxcDgTZ3Hw==","X-Received":"by 10.176.22.215 with SMTP id g23mr2705985uaf.125.1503609052586; \n\tThu, 24 Aug 2017 14:10:52 -0700 (PDT)","MIME-Version":"1.0","In-Reply-To":"<20170824202111.GS31858@bhelgaas-glaptop.roam.corp.google.com>","References":"<1503471673-69478-1-git-send-email-shawn.lin@rock-chips.com>\n\t<1503471758-73904-1-git-send-email-shawn.lin@rock-chips.com>\n\t<20170824202111.GS31858@bhelgaas-glaptop.roam.corp.google.com>","From":"Dmitry Torokhov <dmitry.torokhov@gmail.com>","Date":"Thu, 24 Aug 2017 14:10:52 -0700","Message-ID":"<CAKdAkRRFxsAfJrzr=rjo_mtMzP0y9-cRz9Vz+M92AhbYd5B=ww@mail.gmail.com>","Subject":"Re: [PATCH v5 04/10] PCI: rockchip: fix system hang up if\n\tactivating CONFIG_DEBUG_SHIRQ","To":"Bjorn Helgaas <helgaas@kernel.org>","Cc":"Shawn Lin <shawn.lin@rock-chips.com>,\n\tBjorn Helgaas <bhelgaas@google.com>,\n\tLinux PCI <linux-pci@vger.kernel.org>,\n\t\"open list:ARM/Rockchip SoC...\" <linux-rockchip@lists.infradead.org>, \n\tBrian Norris <briannorris@chromium.org>,\n\tJeffy Chen <jeffy.chen@rock-chips.com>, Tejun Heo <tj@kernel.org>,\n\tMichael Turquette <mturquette@baylibre.com>,\n\tStephen Boyd <sboyd@codeaurora.org>, linux-clk@vger.kernel.org","Content-Type":"text/plain; 
charset=\"UTF-8\"","Sender":"linux-pci-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<linux-pci.vger.kernel.org>","X-Mailing-List":"linux-pci@vger.kernel.org"}},{"id":1756921,"web_url":"http://patchwork.ozlabs.org/comment/1756921/","msgid":"<599F77E7.4040604@rock-chips.com>","list_archive_url":null,"date":"2017-08-25T01:05:43","subject":"Re: [PATCH v5 04/10] PCI: rockchip: fix system hang up if activating\n\tCONFIG_DEBUG_SHIRQ","submitter":{"id":67754,"url":"http://patchwork.ozlabs.org/api/people/67754/","name":"Jeffy Chen","email":"jeffy.chen@rock-chips.com"},"content":"Hi Bjorn,\n\nOn 08/25/2017 04:21 AM, Bjorn Helgaas wrote:\n>> >In order to fix this, we remove all the clock-disabling from\n>> >the error handle path and driver's remove function. And replying\n>> >on the devm_add_action_or_reset to fire the clock-disabling at\n>> >the appropriate time. Also split out rockchip_pcie_setup_irq\n>> >and move requesting irq after enabling clks to avoid this kind\n> Thanks for splitting out the refactoring stuff.  That really makes\n> this patch much simpler.\n>\n> IIUC, this really has nothing to do with CONFIG_DEBUG_SHIRQ.  It may\n> be true that you've only*seen*  the problem with CONFIG_DEBUG_SHIRQ\n> enabled, but all that config option does is take a situation that\n> could happen at any time (another device sharing the IRQ generating an\n> interrupt), and force it to happen.  So it's just a way to expose an\n> existing driver problem.\nyes, and i'm wondering would it make more sense to somehow ignore those \nirqs(triggered by other devices, and we don't really need to care since \nwe already unregistered) than trying to hold all needed resources(clks & \npower domains & some other resources maybe) for that?\n\nmaybe we can just make sure the irq handler unregistered when we stop \ncaring about the irqs? 
or maybe add a flag to tell the irq handler to \nstop processing them?\n\n>\n> The real problem is apparently that rockchip_pcie_subsys_irq_handler()\n> relies on some clock being enabled, but we're leaving it registered at\n> a time when the clock has already been disabled.\n>\n> You fixed that by using devm_add_action_or_reset() to tell devm to\n> disable the clocks*after*  releasing the IRQ.\n>\n> That sort of makes sense, but devm_add_action_or_reset() is a little\n> obscure, and this feels like a hole in the devm framework.  Seems like\n> it would be nice if there were some sort of devm wrapper for\n> clk_prepare_enable() so this would happen automatically.\n>\n> This pattern:\n>\n>    clk = devm_clk_get(...);\n>    if (IS_ERR(clk)) {\n>      dev_warn(\"no clock for ...\");\n>      return PTR_ERR(clk);\n>    }\n>\n>    ret = clk_prepare_enable(clk);\n>    if (ret) {\n>      dev_warn(\"failed to enable ...\");\n>      return err;\n>    }\n>\n> is quite common (\"git grep -A10 devm_clk_get | grep clk_prepare_enable\n>   | wc -l\" finds over 400 occurrences).  Should there be something to\n> simplify this a little?\n>\n> I also wonder about other PCI host drivers that use both\n> clk_prepare_enable() and devm_request_irq().  Maybe Rockchip is\n> \"special\" in that it seems the driver must turn on a clock before it\n> can even talk to the host controller, whereas maybe other drivers can\n> always talk to the host controller, but need to turn on clocks\n> downstream from the controller.  
I didn't audit them, but I'm\n> concerned that some of them might have this same problem.\n>","headers":{"Return-Path":"<linux-pci-owner@vger.kernel.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=linux-pci-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xdjdy3fSCz9t3Z\n\tfor <incoming@patchwork.ozlabs.org>;\n\tFri, 25 Aug 2017 11:05:58 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1753891AbdHYBF4 (ORCPT <rfc822;incoming@patchwork.ozlabs.org>);\n\tThu, 24 Aug 2017 21:05:56 -0400","from regular1.263xmail.com ([211.150.99.138]:33810 \"EHLO\n\tregular1.263xmail.com\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S1753556AbdHYBF4 (ORCPT\n\t<rfc822; linux-pci@vger.kernel.org>); Thu, 24 Aug 2017 21:05:56 -0400","from jeffy.chen?rock-chips.com (unknown [192.168.167.243])\n\tby regular1.263xmail.com (Postfix) with ESMTP id 168DA79C7;\n\tFri, 25 Aug 2017 09:05:52 +0800 (CST)","from [172.16.22.78] (localhost [127.0.0.1])\n\tby smtp.263.net (Postfix) with ESMTPA id 25D71363;\n\tFri, 25 Aug 2017 09:05:44 +0800 (CST)","from [172.16.22.78] (unknown [103.29.142.67])\n\tby smtp.263.net (Postfix) whith ESMTP id 2716RS6LXG;\n\tFri, 25 Aug 2017 09:05:51 +0800 (CST)"],"X-263anti-spam":"KSV:0;","X-MAIL-GRAY":"0","X-MAIL-DELIVERY":"1","X-KSVirus-check":"0","X-ABS-CHECKED":"4","X-RL-SENDER":"jeffy.chen@rock-chips.com","X-FST-TO":"helgaas@kernel.org","X-SENDER-IP":"103.29.142.67","X-LOGIN-NAME":"jeffy.chen@rock-chips.com","X-UNIQUE-TAG":"<8e65c773fcde3c34905a27550a464896>","X-ATTACHMENT-NUM":"0","X-SENDER":"cjf@rock-chips.com","X-DNS-TYPE":"0","Message-ID":"<599F77E7.4040604@rock-chips.com>","Date":"Fri, 25 Aug 
2017 09:05:43 +0800","From":"jeffy <jeffy.chen@rock-chips.com>","User-Agent":"Mozilla/5.0 (X11; Linux x86_64;\n\trv:19.0) Gecko/20130126 Thunderbird/19.0","MIME-Version":"1.0","To":"Bjorn Helgaas <helgaas@kernel.org>, Shawn Lin <shawn.lin@rock-chips.com>","CC":"Bjorn Helgaas <bhelgaas@google.com>, linux-pci@vger.kernel.org,\n\tlinux-rockchip@lists.infradead.org,\n\tBrian Norris <briannorris@chromium.org>, Tejun Heo <tj@kernel.org>,\n\tDmitry Torokhov <dmitry.torokhov@gmail.com>,\n\tMichael Turquette <mturquette@baylibre.com>,\n\tStephen Boyd <sboyd@codeaurora.org>, linux-clk@vger.kernel.org","Subject":"Re: [PATCH v5 04/10] PCI: rockchip: fix system hang up if activating\n\tCONFIG_DEBUG_SHIRQ","References":"<1503471673-69478-1-git-send-email-shawn.lin@rock-chips.com>\n\t<1503471758-73904-1-git-send-email-shawn.lin@rock-chips.com>\n\t<20170824202111.GS31858@bhelgaas-glaptop.roam.corp.google.com>","In-Reply-To":"<20170824202111.GS31858@bhelgaas-glaptop.roam.corp.google.com>","Content-Type":"text/plain; charset=UTF-8; format=flowed","Content-Transfer-Encoding":"7bit","Sender":"linux-pci-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<linux-pci.vger.kernel.org>","X-Mailing-List":"linux-pci@vger.kernel.org"}},{"id":1756956,"web_url":"http://patchwork.ozlabs.org/comment/1756956/","msgid":"<860f5928-6e63-ed7f-852d-7a5a90ce1652@rock-chips.com>","list_archive_url":null,"date":"2017-08-25T01:38:56","subject":"Re: [PATCH v5 04/10] PCI: rockchip: fix system hang up if activating\n\tCONFIG_DEBUG_SHIRQ","submitter":{"id":66993,"url":"http://patchwork.ozlabs.org/api/people/66993/","name":"Shawn Lin","email":"shawn.lin@rock-chips.com"},"content":"Hi Bjorn,\n\nOn 2017/8/25 4:21, Bjorn Helgaas wrote:\n> [+cc Tejun, Dmitry, Michael, Stephen, linux-clk for devm/clk questions]\n> \n> On Wed, Aug 23, 2017 at 03:02:38PM +0800, Shawn Lin wrote:\n>> With CONFIG_DEBUG_SHIRQ enabled, the irq tear down routine\n>> would still access the irq handler registed as a shard irq.\n>> Per the 
comment within the function of __free_irq, it says\n>> \"It's a shared IRQ -- the driver ought to be prepared for\n>> an IRQ event to happen even now it's being freed\". However\n>> when failing to probe the driver, it may disable the clock\n>> for accessing the register and the following check for shared\n>> irq state would call the irq handler which accesses the register\n>> w/o the clk enabled. That will hang the system forever.\n>>\n>> With adding some dump_stack we could see how that happened.\n>>\n>> calling  rockchip_pcie_driver_init+0x0/0x28 @ 1\n>> rockchip-pcie f8000000.pcie: no vpcie3v3 regulator found\n>> rockchip-pcie f8000000.pcie: no vpcie1v8 regulator found\n>> rockchip-pcie f8000000.pcie: no vpcie0v9 regulator found\n>> rockchip-pcie f8000000.pcie: PCIe link training gen1 timeout!\n>> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc3-next-20170807-ARCH+ #189\n>> Hardware name: Firefly-RK3399 Board (DT)\n>> Call trace:\n>> [<ffff000008089bf0>] dump_backtrace+0x0/0x250\n>> [<ffff000008089eb0>] show_stack+0x20/0x28\n>> [<ffff000008c3313c>] dump_stack+0x90/0xb0\n>> [<ffff000008632ad4>] rockchip_pcie_read.isra.11+0x54/0x58\n>> [<ffff0000086334fc>] rockchip_pcie_client_irq_handler+0x30/0x1a0\n>> [<ffff00000813ce98>] __free_irq+0x1c8/0x2dc\n>> [<ffff00000813d044>] free_irq+0x44/0x74\n>> [<ffff0000081415fc>] devm_irq_release+0x24/0x2c\n>> [<ffff00000877429c>] release_nodes+0x1d8/0x30c\n>> [<ffff000008774838>] devres_release_all+0x3c/0x5c\n>> [<ffff00000876f19c>] driver_probe_device+0x244/0x494\n>> [<ffff00000876f50c>] __driver_attach+0x120/0x124\n>> [<ffff00000876cb80>] bus_for_each_dev+0x6c/0xac\n>> [<ffff00000876e984>] driver_attach+0x2c/0x34\n>> [<ffff00000876e3a4>] bus_add_driver+0x244/0x2b0\n>> [<ffff000008770264>] driver_register+0x70/0x110\n>> [<ffff0000087718b4>] platform_driver_register+0x60/0x6c\n>> [<ffff0000091eb108>] rockchip_pcie_driver_init+0x20/0x28\n>> [<ffff000008083a2c>] do_one_initcall+0xc8/0x130\n>> [<ffff0000091a0ea8>] 
kernel_init_freeable+0x1a0/0x238\n>> [<ffff000008c461cc>] kernel_init+0x18/0x108\n>> [<ffff0000080836c0>] ret_from_fork+0x10/0x50\n>>\n>> In order to fix this, we remove all the clock-disabling from\n>> the error handle path and driver's remove function. And replying\n>> on the devm_add_action_or_reset to fire the clock-disabling at\n>> the appropriate time. Also split out rockchip_pcie_setup_irq\n>> and move requesting irq after enabling clks to avoid this kind\n> \n> Thanks for splitting out the refactoring stuff.  That really makes\n> this patch much simpler.\n> \n> IIUC, this really has nothing to do with CONFIG_DEBUG_SHIRQ.  It may\n> be true that you've only *seen* the problem with CONFIG_DEBUG_SHIRQ\n> enabled, but all that config option does is take a situation that\n> could happen at any time (another device sharing the IRQ generating an\n> interrupt), and force it to happen.  So it's just a way to expose an\n> existing driver problem.\n\nRight.\n\n> \n> The real problem is apparently that rockchip_pcie_subsys_irq_handler()\n> relies on some clock being enabled, but we're leaving it registered at\n> a time when the clock has already been disabled.\n> \n> You fixed that by using devm_add_action_or_reset() to tell devm to\n> disable the clocks *after* releasing the IRQ.\n> \n> That sort of makes sense, but devm_add_action_or_reset() is a little\n> obscure, and this feels like a hole in the devm framework.  
Seems like\n> it would be nice if there were some sort of devm wrapper for\n> clk_prepare_enable() so this would happen automatically.\n\nYes, I would appreciate it if we had a devm wrapper for\nclk_prepare_enable so that we don't have to resort to devm_add_action_or_reset.\n\n> \n> This pattern:\n> \n>    clk = devm_clk_get(...);\n>    if (IS_ERR(clk)) {\n>      dev_warn(\"no clock for ...\");\n>      return PTR_ERR(clk);\n>    }\n> \n>    ret = clk_prepare_enable(clk);\n>    if (ret) {\n>      dev_warn(\"failed to enable ...\");\n>      return err;\n>    }\n> \n> is quite common (\"git grep -A10 devm_clk_get | grep clk_prepare_enable\n>   | wc -l\" finds over 400 occurrences).  Should there be something to\n> simplify this a little?\n> \n> I also wonder about other PCI host drivers that use both\n> clk_prepare_enable() and devm_request_irq().  Maybe Rockchip is\n> \"special\" in that it seems the driver must turn on a clock before it\n> can even talk to the host controller, whereas maybe other drivers can\n\nIIRC, some of the other ARM SoCs have the same problem.\n\n> always talk to the host controller, but need to turn on clocks\n> downstream from the controller.  I didn't audit them, but I'm\n> concerned that some of them might have this same problem.\n\nSo that is my concern as well. But I have to say we may face a worse\nsituation, as I found by randomly searching the DT:\n\narch/arm64/boot/dts/renesas/r8a7795.dtsi includes a power-domains\nproperty for pcie-rcar, and pcie-rcar registers a shared irq as well. So the\npower-domain would be powered off immediately when probe fails or ->remove()\nis called, even *before* devm cleanup runs. In other words, I'm not\nconfident that the Renesas CPU can access the PCIe IP w/o the power\ndomain in the 'on' state.\n\nI posted a relevant patch to fix this in the driver core but haven't got\nany input on it (https://lkml.org/lkml/2017/8/15/146). 
That doesn't\naffect pcie-rockchip *now*, as we don't have a power domain for it, but\nit's highly relevant to the problem we are discussing.\n\nFinally, as a last resort, if we can't reach agreement on\neither adding a devm clk_prepare_enable wrapper or adjusting the\nsequence of powering off the power domain, we will have to stop using\ndevm_request_irq and use request_irq/free_irq instead in all\nthe potentially problematic drivers...\n\n\n> \n>> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>\n>>\n>> ---\n>>\n>> Changes in v5:\n>> - rebase on former reconstruction patches suggested by Bjorn\n>>\n>> Changes in v4:\n>> - split out rockchip_pcie_enable_clocks and reuse\n>>    rockchip_pcie_enable_clocks and rockchip_pcie_disable_clocks\n>>    for elsewhere suggested by Jeffy\n>>\n>> Changes in v3:\n>> - check the return value of devm_add_action_or_reset and split out\n>>    rockchip_pcie_setup_irq in order to move requesting irq after\n>>    enabling clks.\n>>\n>> Changes in v2:\n>> - use devm_add_action_or_reset to fix this ordering suggested by\n>>    Heiko and Jeffy. 
Thanks!\n>>\n>>   drivers/pci/host/pcie-rockchip.c | 22 +++++++++++++---------\n>>   1 file changed, 13 insertions(+), 9 deletions(-)\n>>\n>> diff --git a/drivers/pci/host/pcie-rockchip.c b/drivers/pci/host/pcie-rockchip.c\n>> index 971d22b..891b60a 100644\n>> --- a/drivers/pci/host/pcie-rockchip.c\n>> +++ b/drivers/pci/host/pcie-rockchip.c\n>> @@ -1099,10 +1099,6 @@ static int rockchip_pcie_parse_dt(struct rockchip_pcie *rockchip)\n>>   \t\treturn PTR_ERR(rockchip->clk_pcie_pm);\n>>   \t}\n>>   \n>> -\terr = rockchip_pcie_setup_irq(rockchip);\n>> -\tif (err)\n>> -\t\treturn err;\n>> -\n>>   \trockchip->vpcie12v = devm_regulator_get_optional(dev, \"vpcie12v\");\n>>   \tif (IS_ERR(rockchip->vpcie12v)) {\n>>   \t\tif (PTR_ERR(rockchip->vpcie12v) == -EPROBE_DEFER)\n>> @@ -1525,10 +1521,22 @@ static int rockchip_pcie_probe(struct platform_device *pdev)\n>>   \tif (err)\n>>   \t\treturn err;\n>>   \n>> +\terr = devm_add_action_or_reset(dev,\n>> +\t\t\t\t       rockchip_pcie_disable_clocks,\n>> +\t\t\t\t       rockchip);\n>> +\tif (err) {\n>> +\t\tdev_err(dev, \"unable to add action or reset\\n\");\n>> +\t\treturn err;\n>> +\t}\n>> +\n>> +\terr = rockchip_pcie_setup_irq(rockchip);\n>> +\tif (err)\n>> +\t\treturn err;\n>> +\n>>   \terr = rockchip_pcie_set_vpcie(rockchip);\n>>   \tif (err) {\n>>   \t\tdev_err(dev, \"failed to set vpcie regulator\\n\");\n>> -\t\tgoto err_set_vpcie;\n>> +\t\treturn err;\n>>   \t}\n>>   \n>>   \terr = rockchip_pcie_init_port(rockchip);\n>> @@ -1625,8 +1633,6 @@ static int rockchip_pcie_probe(struct platform_device *pdev)\n>>   \t\tregulator_disable(rockchip->vpcie1v8);\n>>   \tif (!IS_ERR(rockchip->vpcie0v9))\n>>   \t\tregulator_disable(rockchip->vpcie0v9);\n>> -err_set_vpcie:\n>> -\trockchip_pcie_disable_clocks(rockchip);\n>>   \treturn err;\n>>   }\n>>   \n>> @@ -1648,8 +1654,6 @@ static int rockchip_pcie_remove(struct platform_device *pdev)\n>>   \t\tphy_exit(rockchip->phys[i]);\n>>   \t}\n>>   \n>> 
-\trockchip_pcie_disable_clocks(rockchip);\n>> -\n>>   \tif (!IS_ERR(rockchip->vpcie12v))\n>>   \t\tregulator_disable(rockchip->vpcie12v);\n>>   \tif (!IS_ERR(rockchip->vpcie3v3))\n>> -- \n>> 1.9.1\n>>\n>>\n> \n> \n>","headers":{"Return-Path":"<linux-pci-owner@vger.kernel.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=linux-pci-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xdkNN16ldz9s8P\n\tfor <incoming@patchwork.ozlabs.org>;\n\tFri, 25 Aug 2017 11:39:16 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1754256AbdHYBjO (ORCPT <rfc822;incoming@patchwork.ozlabs.org>);\n\tThu, 24 Aug 2017 21:39:14 -0400","from lucky1.263xmail.com ([211.157.147.132]:47776 \"EHLO\n\tlucky1.263xmail.com\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S1754066AbdHYBjN (ORCPT\n\t<rfc822; linux-pci@vger.kernel.org>); Thu, 24 Aug 2017 21:39:13 -0400","from shawn.lin?rock-chips.com (unknown [192.168.167.159])\n\tby lucky1.263xmail.com (Postfix) with ESMTP id 07C5A643E4;\n\tFri, 25 Aug 2017 09:39:02 +0800 (CST)","from [172.16.12.30] (localhost [127.0.0.1])\n\tby smtp.263.net (Postfix) with ESMTPA id 582B13D7;\n\tFri, 25 Aug 2017 09:38:55 +0800 (CST)","from [172.16.12.30] (unknown [58.22.7.114])\n\tby smtp.263.net (Postfix) whith ESMTP id 182525TS1T;\n\tFri, 25 Aug 2017 09:38:57 +0800 
(CST)"],"X-263anti-spam":"KSV:0;","X-MAIL-GRAY":"1","X-MAIL-DELIVERY":"0","X-KSVirus-check":"0","X-ABS-CHECKED":"4","X-RL-SENDER":"shawn.lin@rock-chips.com","X-FST-TO":"linux-clk@vger.kernel.org","X-SENDER-IP":"58.22.7.114","X-LOGIN-NAME":"shawn.lin@rock-chips.com","X-UNIQUE-TAG":"<4b8e3c391cbaa5637632eb8d15def4b4>","X-ATTACHMENT-NUM":"0","X-SENDER":"lintao@rock-chips.com","X-DNS-TYPE":"0","Cc":"shawn.lin@rock-chips.com, Bjorn Helgaas <bhelgaas@google.com>,\n\tlinux-pci@vger.kernel.org, linux-rockchip@lists.infradead.org,\n\tBrian Norris <briannorris@chromium.org>,\n\tJeffy Chen <jeffy.chen@rock-chips.com>, Tejun Heo <tj@kernel.org>,\n\tDmitry Torokhov <dmitry.torokhov@gmail.com>,\n\tMichael Turquette <mturquette@baylibre.com>,\n\tStephen Boyd <sboyd@codeaurora.org>, linux-clk@vger.kernel.org","Subject":"Re: [PATCH v5 04/10] PCI: rockchip: fix system hang up if activating\n\tCONFIG_DEBUG_SHIRQ","To":"Bjorn Helgaas <helgaas@kernel.org>","References":"<1503471673-69478-1-git-send-email-shawn.lin@rock-chips.com>\n\t<1503471758-73904-1-git-send-email-shawn.lin@rock-chips.com>\n\t<20170824202111.GS31858@bhelgaas-glaptop.roam.corp.google.com>","From":"Shawn Lin <shawn.lin@rock-chips.com>","Message-ID":"<860f5928-6e63-ed7f-852d-7a5a90ce1652@rock-chips.com>","Date":"Fri, 25 Aug 2017 09:38:56 +0800","User-Agent":"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101\n\tThunderbird/52.3.0","MIME-Version":"1.0","In-Reply-To":"<20170824202111.GS31858@bhelgaas-glaptop.roam.corp.google.com>","Content-Type":"text/plain; charset=gbk; format=flowed","Content-Transfer-Encoding":"8bit","Sender":"linux-pci-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<linux-pci.vger.kernel.org>","X-Mailing-List":"linux-pci@vger.kernel.org"}},{"id":1756958,"web_url":"http://patchwork.ozlabs.org/comment/1756958/","msgid":"<20170825014403.GA100450@google.com>","list_archive_url":null,"date":"2017-08-25T01:44:05","subject":"Re: [PATCH v5 04/10] PCI: rockchip: fix system hang up if 
activating\n\tCONFIG_DEBUG_SHIRQ","submitter":{"id":67074,"url":"http://patchwork.ozlabs.org/api/people/67074/","name":"Brian Norris","email":"briannorris@chromium.org"},"content":"On Thu, Aug 24, 2017 at 02:10:52PM -0700, Dmitry Torokhov wrote:\n> On Thu, Aug 24, 2017 at 1:21 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:\n> > [+cc Tejun, Dmitry, Michael, Stephen, linux-clk for devm/clk questions]\n> >\n> > On Wed, Aug 23, 2017 at 03:02:38PM +0800, Shawn Lin wrote:\n> >> With CONFIG_DEBUG_SHIRQ enabled, the irq tear down routine\n> >> would still access the irq handler registed as a shard irq.\n> >> Per the comment within the function of __free_irq, it says\n> >> \"It's a shared IRQ -- the driver ought to be prepared for\n> >> an IRQ event to happen even now it's being freed\". However\n> >> when failing to probe the driver, it may disable the clock\n> >> for accessing the register and the following check for shared\n> >> irq state would call the irq handler which accesses the register\n> >> w/o the clk enabled. That will hang the system forever.\n\nSide note: why is this driver even requesting a shared IRQ? This is for\nrk3399, and the IRQ is a dedicated GIC interrupt for the PCIe\ncontroller. It shouldn't need to be 'shared'.\n\nThe problem still might not be *only* theoretical though, since it's\nstill possible for this non-shared interrupt to\n(a) trigger\n(b) concurrently, we remove/tear down (including disable clocks)\n(c) we service the IRQ      <-- dead, because clock is disabled\n(d) if we ever got here... 
free_irq()\n\nBrian","headers":{"Return-Path":"<linux-pci-owner@vger.kernel.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=linux-pci-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (1024-bit key;\n\tunprotected) header.d=chromium.org header.i=@chromium.org\n\theader.b=\"jA7zRrBU\"; dkim-atps=neutral"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xdkV31qdyz9t3m\n\tfor <incoming@patchwork.ozlabs.org>;\n\tFri, 25 Aug 2017 11:44:11 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1754285AbdHYBoJ (ORCPT <rfc822;incoming@patchwork.ozlabs.org>);\n\tThu, 24 Aug 2017 21:44:09 -0400","from mail-pg0-f44.google.com ([74.125.83.44]:36830 \"EHLO\n\tmail-pg0-f44.google.com\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S1754191AbdHYBoI (ORCPT\n\t<rfc822; linux-pci@vger.kernel.org>); Thu, 24 Aug 2017 21:44:08 -0400","by mail-pg0-f44.google.com with SMTP id r133so6552875pgr.3\n\tfor <linux-pci@vger.kernel.org>; Thu, 24 Aug 2017 18:44:08 -0700 (PDT)","from google.com ([2620:0:1000:1600:fc53:4f69:5880:26ca])\n\tby smtp.gmail.com with ESMTPSA id\n\to18sm10497301pgd.51.2017.08.24.18.44.07\n\t(version=TLS1_2 cipher=AES128-SHA bits=128/128);\n\tThu, 24 Aug 2017 18:44:07 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=chromium.org; 
s=google;\n\th=date:from:to:cc:subject:message-id:references:mime-version\n\t:content-disposition:in-reply-to:user-agent;\n\tbh=2XCP04+F2Nn7bp2V+HdIPoBKcE9r2U/04Yryd3KUIrs=;\n\tb=jA7zRrBUL+CGbIn7khqrMKJBefd/BIBaXMJ52BAwdwZ+7za2x55pZUArgK1v6j9UEb\n\tokjBNQCdGYIy5bV7smLmi5L3ioCU/RmJb5f1JqvJowZpDZZff6Cmt5EYS02yA5WD3WOq\n\tk9mUpH3Fzg+GCZAvMozXZv/7RijJb5rXyVRSE=","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:date:from:to:cc:subject:message-id:references\n\t:mime-version:content-disposition:in-reply-to:user-agent;\n\tbh=2XCP04+F2Nn7bp2V+HdIPoBKcE9r2U/04Yryd3KUIrs=;\n\tb=VzxBY9oFUOoxZMZmI7vvSnzNUKcb64lK3ehcbRjMh/qMF7f14QGSZdApH+M2O03R2e\n\tgJ+P/Hf6fqfxGUP45lzuueDYGmABY7/fVitM8tUneCBUShgZkMG/2Jv6itkrAAy0uPvh\n\tF6zrPOd9SWTzcYIVs+h76oLIIfznLFD+IfAGf1IyJGoz5wVBL7P2gOaADIE/DhIGf7HQ\n\t6RGsFFOPYtyoOvhZ89Qr0rQ0LAixZsFEuWWz9oGsjoE9d/iCCWl8pDbXXI/YyiN5gw9Y\n\tpBkMBlu6vViVQ7K5Y4oFgVTFx5uXKVaQ5va1Wfo7izc/xRpTWOb+dBcHmmU3+qjilIf9\n\t5xrQ==","X-Gm-Message-State":"AHYfb5gkGRw2oyd/tMQZ/n2JZw7tIF072IMIe9tdr5rvLP4HiRlr72up\n\tms65JLbbPMubJkBJ","X-Received":"by 10.84.225.146 with SMTP id u18mr8955175plj.64.1503625448083; \n\tThu, 24 Aug 2017 18:44:08 -0700 (PDT)","Date":"Thu, 24 Aug 2017 18:44:05 -0700","From":"Brian Norris <briannorris@chromium.org>","To":"Dmitry Torokhov <dmitry.torokhov@gmail.com>","Cc":"Bjorn Helgaas <helgaas@kernel.org>, Shawn Lin <shawn.lin@rock-chips.com>,\n\tBjorn Helgaas <bhelgaas@google.com>,\n\tLinux PCI <linux-pci@vger.kernel.org>,\n\t\"open list:ARM/Rockchip SoC...\" <linux-rockchip@lists.infradead.org>, \n\tJeffy Chen <jeffy.chen@rock-chips.com>, Tejun Heo <tj@kernel.org>,\n\tMichael Turquette <mturquette@baylibre.com>,\n\tStephen Boyd <sboyd@codeaurora.org>, linux-clk@vger.kernel.org","Subject":"Re: [PATCH v5 04/10] PCI: rockchip: fix system hang up if 
activating\n\tCONFIG_DEBUG_SHIRQ","Message-ID":"<20170825014403.GA100450@google.com>","References":"<1503471673-69478-1-git-send-email-shawn.lin@rock-chips.com>\n\t<1503471758-73904-1-git-send-email-shawn.lin@rock-chips.com>\n\t<20170824202111.GS31858@bhelgaas-glaptop.roam.corp.google.com>\n\t<CAKdAkRRFxsAfJrzr=rjo_mtMzP0y9-cRz9Vz+M92AhbYd5B=ww@mail.gmail.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<CAKdAkRRFxsAfJrzr=rjo_mtMzP0y9-cRz9Vz+M92AhbYd5B=ww@mail.gmail.com>","User-Agent":"Mutt/1.5.21 (2010-09-15)","Sender":"linux-pci-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<linux-pci.vger.kernel.org>","X-Mailing-List":"linux-pci@vger.kernel.org"}}]