From patchwork Wed Oct 23 03:15:09 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: AceLan Kao X-Patchwork-Id: 1181758 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="hBHtc8gk"; dkim-atps=neutral Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 46yb9Y5mzGz9sPV; Wed, 23 Oct 2019 14:15:45 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1iN77d-00065l-R2; Wed, 23 Oct 2019 03:15:41 +0000 Received: from mail-pg1-f194.google.com ([209.85.215.194]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1iN77V-0005zo-7S for kernel-team@lists.ubuntu.com; Wed, 23 Oct 2019 03:15:33 +0000 Received: by mail-pg1-f194.google.com with SMTP id w3so11233512pgt.5 for ; Tue, 22 Oct 2019 20:15:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:to:subject:date:message-id:in-reply-to:references; bh=WhzA30+O6vi3mpud9oa2vqDa/ohOALuxoyQ737q1pOg=; b=hBHtc8gkx7dZF3GddHkMYPYcQmi1Yh0+aIw06O+NkFsaAAbAXcZFPGp6P4vTLAWOZ8 ZyTrwwxX+5MWwhuMPvxgZNQ96D8Uh8UrNo8TROdWUpwi57OtTtu04cZkGd6Zo4Sfhdfn iADEScagILUUcLnhgY7qQjdtiRlLpIs0NHR+fzYNhE2KqJNFRJr9zXJqGCK8de+jWhVV /6UrbHwn6c2F633UPPlCBrKENwE1YUd1M4rT+TBrxMY7cmVZCLyg1at5dRCHXxlH/tXI S2rTczYeanT6fnCxVhFM7q0Izas1ssLzt7krovLOA9JpuMb0acgH8us3ghgdsraYhOWi 4ZFQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:from:to:subject:date:message-id :in-reply-to:references; bh=WhzA30+O6vi3mpud9oa2vqDa/ohOALuxoyQ737q1pOg=; b=d24gcFnmjWVzFkyBxZewpJalsbJp8cRqcaC0p03VumhLl+I8TjeiEl42GtaFRMNn1R 8Ypde+lFGH/PsUn6oiBjOU0RhxlUwDEMiJWvEeYHsaWhh+Mfo4xMTn8s5V3vxWMmb67d LAoPa/6/TSFIHXhVWK6AYcNXT2uJqK/j0tdG621xR6M3L3FXhCfAD4TiTJ4kjCw2WEas YQfEeJKh9JvM1h4PSv9HZeC9adaorJZUH8P7NMpxO3kvmb0x9PbCopuy0fLrllHm21J6 GEU0VviS/UJAJrjNAdO8MlxX/eBmZefk3KQ55zYPfBGk34QcdNy1Twru8rfdLP375DUa nc/Q== X-Gm-Message-State: APjAAAWDV2XjhdZsi7Ia0LM3wxEJ37JHG1aWDrJCeMY13H7DZPp4F+lK 2lUGeX1bok3wBOKeXsT4O8A0/gQZqxk= X-Google-Smtp-Source: APXvYqyokhuUnsvbd32D0UwhArVukFTaYW2Yineq3QdBqLUA/M44ok7IxAhaCb6+CHFDt+8MY4rfrA== X-Received: by 2002:a63:1666:: with SMTP id 38mr2555189pgw.93.1571800530773; Tue, 22 Oct 2019 20:15:30 -0700 (PDT) Received: from localhost (61-220-137-37.HINET-IP.hinet.net. [61.220.137.37]) by smtp.gmail.com with ESMTPSA id 81sm5330704pfx.142.2019.10.22.20.15.29 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 22 Oct 2019 20:15:30 -0700 (PDT) From: AceLan Kao To: kernel-team@lists.ubuntu.com Subject: [PATCH 2/2][SRU][UNSTABLE] UBUNTU: SAUCE: PCI: pciehp: Prevent deadlock on disconnect Date: Wed, 23 Oct 2019 11:15:09 +0800 Message-Id: <20191023031509.23554-11-acelan.kao@canonical.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20191023031509.23554-1-acelan.kao@canonical.com> References: <20191023031509.23554-1-acelan.kao@canonical.com> X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Mika Westerberg BugLink: https://bugs.launchpad.net/bugs/1849269 If there are more than one PCIe switch with hotplug downstream ports hot-removing them leads to a following deadlock: INFO: task irq/126-pciehp:198 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. irq/126-pciehp D 0 198 2 0x80000000 Call Trace: __schedule+0x2a2/0x880 schedule+0x2c/0x80 schedule_timeout+0x246/0x350 ? ttwu_do_activate+0x67/0x90 wait_for_completion+0xb7/0x140 ? wake_up_q+0x80/0x80 kthread_stop+0x49/0x110 __free_irq+0x15c/0x2a0 free_irq+0x32/0x70 pcie_shutdown_notification+0x2f/0x50 pciehp_remove+0x27/0x50 pcie_port_remove_service+0x36/0x50 device_release_driver_internal+0x18c/0x250 device_release_driver+0x12/0x20 bus_remove_device+0xec/0x160 device_del+0x13b/0x350 ? pcie_port_find_device+0x60/0x60 device_unregister+0x1a/0x60 remove_iter+0x1e/0x30 device_for_each_child+0x56/0x90 pcie_port_device_remove+0x22/0x40 pcie_portdrv_remove+0x20/0x60 pci_device_remove+0x3e/0xc0 device_release_driver_internal+0x18c/0x250 device_release_driver+0x12/0x20 pci_stop_bus_device+0x6f/0x90 pci_stop_bus_device+0x31/0x90 pci_stop_and_remove_bus_device+0x12/0x20 pciehp_unconfigure_device+0x88/0x140 pciehp_disable_slot+0x6a/0x110 pciehp_handle_presence_or_link_change+0x263/0x400 pciehp_ist+0x1c9/0x1d0 ? irq_forced_thread_fn+0x80/0x80 irq_thread_fn+0x24/0x60 irq_thread+0xeb/0x190 ? irq_thread_fn+0x60/0x60 kthread+0x120/0x140 ? irq_thread_check_affinity+0xf0/0xf0 ? kthread_park+0x90/0x90 ret_from_fork+0x35/0x40 INFO: task irq/190-pciehp:2288 blocked for more than 120 seconds. irq/190-pciehp D 0 2288 2 0x80000000 Call Trace: __schedule+0x2a2/0x880 schedule+0x2c/0x80 schedule_preempt_disabled+0xe/0x10 __mutex_lock.isra.9+0x2e0/0x4d0 ? __mutex_lock_slowpath+0x13/0x20 __mutex_lock_slowpath+0x13/0x20 mutex_lock+0x2c/0x30 pci_lock_rescan_remove+0x15/0x20 pciehp_unconfigure_device+0x4d/0x140 pciehp_disable_slot+0x6a/0x110 pciehp_handle_presence_or_link_change+0x263/0x400 pciehp_ist+0x1c9/0x1d0 ? irq_forced_thread_fn+0x80/0x80 irq_thread_fn+0x24/0x60 irq_thread+0xeb/0x190 ? irq_thread_fn+0x60/0x60 kthread+0x120/0x140 ? irq_thread_check_affinity+0xf0/0xf0 ? kthread_park+0x90/0x90 ret_from_fork+0x35/0x40 What happens here is that the whole hierarchy is runtime resumed and the parent PCIe downstream port, who got the hot-remove event, starts removing devices below it taking pci_lock_rescan_remove() lock. When the child PCIe port is runtime resumed it calls pciehp_check_presence() which ends up calling pciehp_card_present() and pciehp_check_link_active(). Both of these read their parts of PCIe config space by calling helper function pcie_capability_read_word(). Now, this function notices that the underlying device is already gone and returns PCIBIOS_DEVICE_NOT_FOUND with the capability value set to 0. When pciehp gets this value it thinks that its child device is also hot-removed and schedules its IRQ thread to handle the event. The deadlock happens when the child's IRQ thread runs and tries to acquire pci_lock_rescan_remove() which is already taken by the parent and the parent waits for the child's IRQ thread to finish. We can prevent this from happening by checking the return value of pcie_capability_read_word() and if it is PCIBIOS_DEVICE_NOT_FOUND stop performing any hot-removal activities. Link: https://patchwork.kernel.org/patch/11089973/ Signed-off-by: Mika Westerberg Signed-off-by: AceLan Kao --- drivers/pci/hotplug/pciehp.h | 6 +++--- drivers/pci/hotplug/pciehp_core.c | 11 ++++++++--- drivers/pci/hotplug/pciehp_ctrl.c | 4 ++-- drivers/pci/hotplug/pciehp_hpc.c | 32 +++++++++++++++++++++++-------- 4 files changed, 37 insertions(+), 16 deletions(-) diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h index 654c972b8ea0..afea59a3aad2 100644 --- a/drivers/pci/hotplug/pciehp.h +++ b/drivers/pci/hotplug/pciehp.h @@ -172,10 +172,10 @@ void pciehp_set_indicators(struct controller *ctrl, int pwr, int attn); void pciehp_get_latch_status(struct controller *ctrl, u8 *status); int pciehp_query_power_fault(struct controller *ctrl); -bool pciehp_card_present(struct controller *ctrl); -bool pciehp_card_present_or_link_active(struct controller *ctrl); +int pciehp_card_present(struct controller *ctrl); +int pciehp_card_present_or_link_active(struct controller *ctrl); int pciehp_check_link_status(struct controller *ctrl); -bool pciehp_check_link_active(struct controller *ctrl); +int pciehp_check_link_active(struct controller *ctrl); void pciehp_release_ctrl(struct controller *ctrl); int pciehp_sysfs_enable_slot(struct hotplug_slot *hotplug_slot); diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c index 56daad828c9e..312cc45c44c7 100644 --- a/drivers/pci/hotplug/pciehp_core.c +++ b/drivers/pci/hotplug/pciehp_core.c @@ -139,10 +139,15 @@ static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value) { struct controller *ctrl = to_ctrl(hotplug_slot); struct pci_dev *pdev = ctrl->pcie->port; + int ret; pci_config_pm_runtime_get(pdev); - *value = pciehp_card_present_or_link_active(ctrl); + ret = pciehp_card_present_or_link_active(ctrl); pci_config_pm_runtime_put(pdev); + if (ret < 0) + return ret; + + *value = ret; return 0; } @@ -158,13 +163,13 @@ static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value) */ static void pciehp_check_presence(struct controller *ctrl) { - bool occupied; + int occupied; down_read(&ctrl->reset_lock); mutex_lock(&ctrl->state_lock); occupied = pciehp_card_present_or_link_active(ctrl); - if ((occupied && (ctrl->state == OFF_STATE || + if ((occupied > 0 && (ctrl->state == OFF_STATE || ctrl->state == BLINKINGON_STATE)) || (!occupied && (ctrl->state == ON_STATE || ctrl->state == BLINKINGOFF_STATE))) diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c index 21af7b16d7a4..c760a13ec7b1 100644 --- a/drivers/pci/hotplug/pciehp_ctrl.c +++ b/drivers/pci/hotplug/pciehp_ctrl.c @@ -226,7 +226,7 @@ void pciehp_handle_disable_request(struct controller *ctrl) void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events) { - bool present, link_active; + int present, link_active; /* * If the slot is on and presence or link has changed, turn it off. @@ -257,7 +257,7 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events) mutex_lock(&ctrl->state_lock); present = pciehp_card_present(ctrl); link_active = pciehp_check_link_active(ctrl); - if (!present && !link_active) { + if (present <= 0 && link_active <= 0) { mutex_unlock(&ctrl->state_lock); return; } diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index 1a522c1c4177..028bad8d5728 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -201,13 +201,16 @@ static void pcie_write_cmd_nowait(struct controller *ctrl, u16 cmd, u16 mask) pcie_do_write_cmd(ctrl, cmd, mask, false); } -bool pciehp_check_link_active(struct controller *ctrl) +int pciehp_check_link_active(struct controller *ctrl) { struct pci_dev *pdev = ctrl_dev(ctrl); u16 lnk_status; - bool ret; + int ret; + + ret = pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status); + if (ret == PCIBIOS_DEVICE_NOT_FOUND) + return -ENODEV; - pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status); ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA); if (ret) @@ -373,13 +376,17 @@ void pciehp_get_latch_status(struct controller *ctrl, u8 *status) *status = !!(slot_status & PCI_EXP_SLTSTA_MRLSS); } -bool pciehp_card_present(struct controller *ctrl) +int pciehp_card_present(struct controller *ctrl) { struct pci_dev *pdev = ctrl_dev(ctrl); u16 slot_status; + int ret; - pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status); - return slot_status & PCI_EXP_SLTSTA_PDS; + ret = pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status); + if (ret == PCIBIOS_DEVICE_NOT_FOUND) + return -ENODEV; + + return !!(slot_status & PCI_EXP_SLTSTA_PDS); } /** @@ -390,10 +397,19 @@ bool pciehp_card_present(struct controller *ctrl) * Presence Detect State bit, this helper also returns true if the Link Active * bit is set. This is a concession to broken hotplug ports which hardwire * Presence Detect State to zero, such as Wilocity's [1ae9:0200]. + * + * Returns: %1 if the slot is occupied and %0 if it is not. If the hotplug + * port is not present anymore returns %-ENODEV. */ -bool pciehp_card_present_or_link_active(struct controller *ctrl) +int pciehp_card_present_or_link_active(struct controller *ctrl) { - return pciehp_card_present(ctrl) || pciehp_check_link_active(ctrl); + int ret; + + ret = pciehp_card_present(ctrl); + if (ret) + return ret; + + return pciehp_check_link_active(ctrl); } int pciehp_query_power_fault(struct controller *ctrl)