From patchwork Sat Aug 18 06:51:09 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sinan Kaya
X-Patchwork-Id: 959464
X-Patchwork-Delegate: bhelgaas@google.com
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=none (mailfrom)
	smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67;
	helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org;
	receiver=)
Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none)
	header.from=kernel.org
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected)
	header.d=kernel.org header.i=@kernel.org header.b="HQj2gAWf";
	dkim-atps=neutral
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id 41ttDx0BHFz9s47
	for ; Mon, 20 Aug 2018 09:19:28 +1000 (AEST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1726518AbeHTCch (ORCPT ); Sun, 19 Aug 2018 22:32:37 -0400
Received: from mail.kernel.org ([198.145.29.99]:48772 "EHLO mail.kernel.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1726397AbeHTCch (ORCPT ); Sun, 19 Aug 2018 22:32:37 -0400
Received: from localhost.localdomain (cpe-174-109-247-98.nc.res.rr.com [174.109.247.98])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPSA id D551B20C0E;
	Sun, 19 Aug 2018 23:19:22 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=default; t=1534720765;
	bh=MCW6fbfEOEa5jefK6Th7M3Bchw2CVW/sNu7qXNcXuqc=;
	h=From:To:Cc:Subject:Date:From;
	b=HQj2gAWfqXFusbUdKtsjaO85s8KZ6tDCNj8Nlt/osxR4nN1Fp2waL8ccnTaugFKFP
	l6qOFDlglxFA0Bo8n4furbY2cAlM6CZqHFoonWlFU/PyLR0/c88TMO77x+JruTLIgn
	FBxwzBVVtS10Os1NHoLHU9OivOc8IAHTXPVoPQh4=
From: Sinan Kaya
To: linux-pci@vger.kernel.org
Cc: Sinan Kaya, Bjorn Helgaas, Lukas Wunner
	, Mika Westerberg, "Gustavo A. R. Silva", Oza Pawandeep, Keith Busch,
	linux-kernel@vger.kernel.org (open list)
Subject: [PATCH v8 1/2] PCI: pciehp: Ignore link events when there is a fatal error pending
Date: Fri, 17 Aug 2018 23:51:09 -0700
Message-Id: <20180818065126.77912-1-okaya@kernel.org>
X-Mailer: git-send-email 2.17.1
Sender: linux-pci-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-pci@vger.kernel.org

An AER/DPC reset is a warm reset. HP (hotplug) link recovery is a cold
reset, performed by issuing power-off and power-on commands to the PCI
slot.

When a link event arrives in the middle of a warm-reset operation
(AER/DPC), the hotplug driver currently:

1. turns off the slot power. Slot power needs to be kept on in order
   for recovery to succeed.
2. performs a cold reset, causing Fatal Error recovery to fail.

If the link goes down due to a DPC event, it should be recovered by the
DPC status trigger. Injecting a cold reset in the middle can cause a HW
lockup, as the behavior is undefined.

Similarly, if the link goes down due to an AER secondary bus reset, it
should be recovered by HW. Injecting a cold reset in the middle of a
secondary bus reset can cause a HW lockup, as the behavior is undefined.

With this patch:

1. HP ISR observes the link down interrupt.
2. HP ISR checks whether a fatal error is pending; if so, it doesn't
   touch the link.
3. HP ISR waits until link recovery happens.
4. HP ISR calls the read vendor id function.
5. If all fails, try the cold-reset approach.

If a fatal error is pending and a fatal error service such as DPC or AER
is running, it is the responsibility of the fatal error service to
recover the link.
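The flow listed above can be modeled in user space. What follows is a minimal sketch, not kernel code: `struct model_port`, the `model_*` functions, and the `MODEL_*` values are hypothetical names introduced only for illustration, and the sleep is simulated by decrementing a counter. Only the 1000 ms budget and the 20 ms step are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a hotplug port; not a kernel structure. */
struct model_port {
	int polls_until_clear;	/* reads for which the fatal error bit stays set */
	bool link_up;		/* link state once the error service has finished */
};

enum model_action {
	MODEL_IGNORE_EVENT,	/* warm reset recovered the link, nothing to do */
	MODEL_HOTPLUG_HANDLING,	/* continue with the normal hotplug (cold reset) path */
};

/* Models one read of the Fatal Error Detected bit. */
static bool model_fatal_error_pending(struct model_port *port)
{
	assert(port != NULL);
	if (port->polls_until_clear > 0) {
		port->polls_until_clear--;
		return true;
	}
	return false;
}

/* Poll in 20 ms steps against a 1000 ms budget; time is only simulated. */
static bool model_wait_fatal_error_clear(struct model_port *port)
{
	int timeout = 1000;

	for (;;) {
		if (!model_fatal_error_pending(port))
			return true;
		if (timeout <= 0)
			break;
		timeout -= 20;	/* stands in for msleep(20) */
	}
	return false;
}

/* Decide what the link event handler should do for this port. */
static enum model_action model_handle_link_event(struct model_port *port)
{
	if (model_fatal_error_pending(port)) {
		/* Give the error service time to finish, then check the link. */
		if (model_wait_fatal_error_clear(port) && port->link_up)
			return MODEL_IGNORE_EVENT;
	}
	return MODEL_HOTPLUG_HANDLING;
}
```

In this model the wait loop performs at most 51 reads and 50 simulated sleeps, which corresponds to roughly one second before the handler falls back to the hotplug path.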

Signed-off-by: Sinan Kaya
---
 drivers/pci/hotplug/pciehp_ctrl.c | 18 ++++++++++++++++
 drivers/pci/pci.h                 |  2 ++
 drivers/pci/pcie/err.c            | 34 +++++++++++++++++++++++++++++++
 3 files changed, 54 insertions(+)

diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index da7c72372ffc..22354b6850c3 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -222,9 +222,27 @@ void pciehp_handle_disable_request(struct slot *slot)
 void pciehp_handle_presence_or_link_change(struct slot *slot, u32 events)
 {
 	struct controller *ctrl = slot->ctrl;
+	struct pci_dev *pdev = ctrl->pcie->port;
 	bool link_active;
 	u8 present;
 
+	/* If a fatal error is pending, wait for AER or DPC to handle it. */
+	if (pcie_fatal_error_pending(pdev)) {
+		bool recovered;
+
+		recovered = pcie_wait_fatal_error_clear(pdev);
+
+		/* If the fatal error is gone and the link is up, return */
+		if (recovered && pcie_wait_for_link(pdev, true)) {
+			ctrl_info(ctrl, "Slot(%s): Ignoring Link event due to successful fatal error recovery\n",
+				  slot_name(slot));
+			return;
+		}
+
+		ctrl_info(ctrl, "Slot(%s): Fatal error recovery failed for Link event, trying hotplug reset\n",
+			  slot_name(slot));
+	}
+
 	/*
 	 * If the slot is on and presence or link has changed, turn it off.
 	 * Even if it's occupied again, we cannot assume the card is the same.
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index c358e7a07f3f..e2d98654630b 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -356,6 +356,8 @@ void pci_enable_acs(struct pci_dev *dev);
 /* PCI error reporting and recovery */
 void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service);
 void pcie_do_nonfatal_recovery(struct pci_dev *dev);
+bool pcie_fatal_error_pending(struct pci_dev *pdev);
+bool pcie_wait_fatal_error_clear(struct pci_dev *pdev);
 bool pcie_wait_for_link(struct pci_dev *pdev, bool active);
 
 #ifdef CONFIG_PCIEASPM
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index f7ce0cb0b0b7..b1b5604cb00b 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 
 #include "portdrv.h"
 #include "../pci.h"
@@ -386,3 +387,36 @@ void pcie_do_nonfatal_recovery(struct pci_dev *dev)
 	/* TODO: Should kernel panic here? */
 	pci_info(dev, "AER: Device recovery failed\n");
 }
+
+bool pcie_fatal_error_pending(struct pci_dev *pdev)
+{
+	u16 err_status = 0;
+	int rc;
+
+	if (!pci_is_pcie(pdev))
+		return false;
+
+	rc = pcie_capability_read_word(pdev, PCI_EXP_DEVSTA, &err_status);
+	if (rc)
+		return false;
+
+	return !!(err_status & PCI_EXP_DEVSTA_FED);
+}
+
+bool pcie_wait_fatal_error_clear(struct pci_dev *pdev)
+{
+	int timeout = 1000;
+	bool ret;
+
+	for (;;) {
+		ret = pcie_fatal_error_pending(pdev);
+		if (ret == false)
+			return true;
+		if (timeout <= 0)
+			break;
+		msleep(20);
+		timeout -= 20;
+	}
+
+	return false;
+}

From patchwork Sat Aug 18 06:51:10 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sinan Kaya
X-Patchwork-Id: 959465
X-Patchwork-Delegate: bhelgaas@google.com
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=none (mailfrom)
	smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67;
	helo=vger.kernel.org;
	envelope-from=linux-pci-owner@vger.kernel.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none)
	header.from=kernel.org
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected)
	header.d=kernel.org header.i=@kernel.org header.b="jeG2iHxo";
	dkim-atps=neutral
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id 41ttF53s47z9s8T
	for ; Mon, 20 Aug 2018 09:19:37 +1000 (AEST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1726593AbeHTCcl (ORCPT ); Sun, 19 Aug 2018 22:32:41 -0400
Received: from mail.kernel.org ([198.145.29.99]:48890 "EHLO mail.kernel.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1726397AbeHTCck (ORCPT ); Sun, 19 Aug 2018 22:32:40 -0400
Received: from localhost.localdomain (cpe-174-109-247-98.nc.res.rr.com [174.109.247.98])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPSA id C2A0C20C51;
	Sun, 19 Aug 2018 23:19:27 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=default; t=1534720769;
	bh=44y+9eoZ90iQR+zBtghnOOSlPhHHwCSUKtXKQt94PbY=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=jeG2iHxoUOcD5d7Id3mTuI5Tnoz0Akfeq/gyq3kk9WbB75MkTOmQPdNLAWzWtduRU
	cUZ+87gPMKVAJDiEaPXNszLixyUweFWQIrOkyT6o5wkKVSp9rrIZO9txIzNAkKiTCi
	pC2mmypc7mbrLvaGpiv/cKvBKhdjDr3VR2hHScSg=
From: Sinan Kaya
To: linux-pci@vger.kernel.org
Cc: Sinan Kaya, Bjorn Helgaas, Lukas Wunner, Andy Shevchenko,
	Mika Westerberg, linux-kernel@vger.kernel.org (open list)
Subject: [PATCH v8 2/2] PCI: pciehp: Mask AER surprise link down error if hotplug is enabled
Date: Fri, 17 Aug 2018 23:51:10 -0700
Message-Id: <20180818065126.77912-2-okaya@kernel.org>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20180818065126.77912-1-okaya@kernel.org>
References: <20180818065126.77912-1-okaya@kernel.org>
Sender: linux-pci-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-pci@vger.kernel.org

PCIe Spec 3.0, section 7.10.2, Uncorrectable Error Status Register
(Offset 04h), reports link down errors as an AER error in bit 5,
Surprise Down Error Status.

If hotplug is supported by a particular port, we want the hotplug driver
to handle link down/up conditions via the Data Link Layer Active
interrupt rather than the AER error interrupt.

Mask the Surprise Down Error when the hotplug driver probes the port and
unmask it again on remove.

Signed-off-by: Sinan Kaya
---
 drivers/pci/hotplug/pciehp_core.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index ec48c9433ae5..8322db8f369a 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -229,6 +229,29 @@ static void pciehp_check_presence(struct controller *ctrl)
 	up_read(&ctrl->reset_lock);
 }
 
+static int pciehp_control_surprise_error(struct controller *ctrl, bool enable)
+{
+	struct pci_dev *pdev = ctrl->pcie->port;
+	u32 reg32;
+	int pos;
+
+	if (!pci_is_pcie(pdev))
+		return -ENODEV;
+
+	pos = pdev->aer_cap;
+	if (!pos)
+		return -ENODEV;
+
+	pci_read_config_dword(pdev, pos + PCI_ERR_UNCOR_MASK, &reg32);
+	if (enable)
+		reg32 &= ~PCI_ERR_UNC_SURPDN;
+	else
+		reg32 |= PCI_ERR_UNC_SURPDN;
+	pci_write_config_dword(pdev, pos + PCI_ERR_UNCOR_MASK, reg32);
+
+	return 0;
+}
+
 static int pciehp_probe(struct pcie_device *dev)
 {
 	int rc;
@@ -280,6 +303,9 @@ static int pciehp_probe(struct pcie_device *dev)
 
 	pciehp_check_presence(ctrl);
 
+	/* We want exclusive control of link down events in hotplug driver */
+	pciehp_control_surprise_error(ctrl, false);
+
 	return 0;
 
 err_out_shutdown_notification:
@@ -298,6 +324,7 @@ static void pciehp_remove(struct pcie_device *dev)
 	pci_hp_del(ctrl->slot->hotplug_slot);
 	pcie_shutdown_notification(ctrl);
 	cleanup_slot(ctrl);
+	pciehp_control_surprise_error(ctrl, true);
 	pciehp_release_ctrl(ctrl);
 }
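
The read-modify-write in pciehp_control_surprise_error() can be tried out in user space. This is a rough sketch under stated assumptions, not part of the patch: `model_update_surprise_mask()` and `MODEL_ERR_UNC_SURPDN` are hypothetical names, and the register is represented by a plain 32-bit value instead of a config space access. The bit position (bit 5) follows the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the Surprise Down bit (bit 5) of the mask register. */
#define MODEL_ERR_UNC_SURPDN 0x00000020u

/*
 * Return the new Uncorrectable Error Mask value.
 * enable == true  clears the mask bit, so AER reports Surprise Down again.
 * enable == false sets the mask bit, so link down events are left to hotplug.
 * All other mask bits are preserved.
 */
static uint32_t model_update_surprise_mask(uint32_t mask, bool enable)
{
	if (enable)
		mask &= ~MODEL_ERR_UNC_SURPDN;
	else
		mask |= MODEL_ERR_UNC_SURPDN;
	return mask;
}
```

Note the polarity: because this is a mask register, passing `false` (as probe does) sets the bit, and passing `true` (as remove does) clears it.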