From patchwork Fri Aug 4 02:18:26 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dongdong Liu
X-Patchwork-Id: 797557
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=none (mailfrom)
	smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67;
	helo=vger.kernel.org;
	envelope-from=linux-pci-owner@vger.kernel.org; receiver=)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id 3xNqdw61r1z9s1h
	for ; Fri, 4 Aug 2017 11:51:16 +1000 (AEST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751918AbdHDBvP (ORCPT ); Thu, 3 Aug 2017 21:51:15 -0400
Received: from szxga02-in.huawei.com ([45.249.212.188]:11303 "EHLO
	szxga02-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751811AbdHDBvP (ORCPT );
	Thu, 3 Aug 2017 21:51:15 -0400
Received: from 172.30.72.55 (EHLO DGGEML403-HUB.china.huawei.com)
	([172.30.72.55]) by dggrg02-dlp.huawei.com (MOS 4.4.6-GA FastPath
	queued) with ESMTP id AST97652; Fri, 04 Aug 2017 09:51:09 +0800 (CST)
Received: from linux-ioko.site (10.71.200.31) by
	DGGEML403-HUB.china.huawei.com (10.3.17.33) with Microsoft SMTP
	Server id 14.3.301.0; Fri, 4 Aug 2017 09:50:59 +0800
From: Dongdong Liu
To:
CC: , , , , Dongdong Liu
Subject: [PATCH] PCIe AER: report non fatal errors only to the functions of the same device
Date: Fri, 4 Aug 2017 10:18:26 +0800
Message-ID: <1501813106-89672-1-git-send-email-liudongdong3@huawei.com>
X-Mailer: git-send-email 1.9.1
MIME-Version: 1.0
X-Originating-IP: [10.71.200.31]
X-CFilter-Loop: Reflected
X-Mirapoint-Virus-RAPID-Raw: score=unknown(0),
	refid=str=0001.0A020202.5983D30D.00F0, ss=1, re=0.000, recu=0.000,
	reip=0.000, cl=1, cld=1, fgs=0, ip=0.0.0.0,
	so=2014-11-16 11:51:01, dmn=2013-03-21 17:37:32
X-Mirapoint-Loop-Id: 10e1c2cd59a8ef76ab148ce42eb6bcea
Sender: linux-pci-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-pci@vger.kernel.org

From: Gabriele Paoloni

Currently, if an uncorrectable error is reported by an EP, the AER driver
walks over all the devices connected to the upstream port bus and in turn
calls the report_error_detected() callback on each of them. If any of the
devices connected to the bus does not implement
dev->driver->err_handler->error_detected(), do_recovery() will fail.

However, for non-fatal errors the PCIe link should not be considered
compromised, so it makes sense to report the error only to the functions
of the multifunction device that reported it. This patch implements this
new behaviour for non-fatal errors.

Signed-off-by: Gabriele Paoloni
Signed-off-by: Dongdong Liu
---
 drivers/pci/bus.c                  | 38 ++++++++++++++++++++++++++++++++++++++
 drivers/pci/pcie/aer/aerdrv_core.c | 13 ++++++++++++-
 include/linux/pci.h                |  3 ++-
 3 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index bc56cf1..bc8f8b2 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -364,6 +364,44 @@ void pci_bus_add_devices(const struct pci_bus *bus)
 }
 EXPORT_SYMBOL(pci_bus_add_devices);
 
+/** pci_walk_mf_dev - walk all functions of a multi-function
+ *  device calling callback.
+ *  @dev      a function in a multi-function device
+ *  @cb       callback to be called for each device found
+ *  @userdata arbitrary pointer to be passed to callback.
+ *
+ *  Walk, on a given bus, only the adjacent functions of a
+ *  multi-function device. Call the provided callback on each
+ *  device found.
+ *
+ *  We check the return of @cb each time. If it returns anything
+ *  other than 0, we break out.
+ *
+ */
+void pci_walk_mf_dev(struct pci_dev *dev, int (*cb)(struct pci_dev *, void *),
+		  void *userdata)
+{
+	int retval;
+	struct pci_bus *bus;
+	struct pci_dev *pdev;
+	int ndev;
+
+	bus = dev->bus;
+	ndev = PCI_SLOT(dev->devfn);
+
+	down_read(&pci_bus_sem);
+	/* call cb for all the functions of the mf device */
+	list_for_each_entry(pdev, &bus->devices, bus_list) {
+		if (PCI_SLOT(pdev->devfn) == ndev) {
+			retval = cb(pdev, userdata);
+			if (retval)
+				break;
+		}
+	}
+	up_read(&pci_bus_sem);
+}
+EXPORT_SYMBOL_GPL(pci_walk_mf_dev);
+
 /** pci_walk_bus - walk devices on/under bus, calling callback.
  *  @top      bus whose devices should be walked
  *  @cb       callback to be called for each device found
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index b1303b3..67c3dc0 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -390,7 +390,18 @@ static pci_ers_result_t broadcast_error_message(struct pci_dev *dev,
 		 * If the error is reported by an end point, we think this
 		 * error is related to the upstream link of the end point.
 		 */
-		pci_walk_bus(dev->bus, cb, &result_data);
+		if ((state == pci_channel_io_normal) &&
+		    (!pci_ari_enabled(dev->bus)))
+			/*
+			 * the error is non fatal so the bus is ok, just walk
+			 * through all the functions in a multifunction device.
+			 * if ARI is enabled on the bus then there can be only
+			 * one device under that bus (so walk all the functions
+			 * under the bus).
+			 */
+			pci_walk_mf_dev(dev, cb, &result_data);
+		else
+			pci_walk_bus(dev->bus, cb, &result_data);
 	}
 
 	return result_data.result;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4869e66..69e77bb 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1269,7 +1269,8 @@ const struct pci_device_id *pci_match_id(const struct pci_device_id *ids,
 					 struct pci_dev *dev);
 int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max,
 		    int pass);
-
+void pci_walk_mf_dev(struct pci_dev *dev, int (*cb)(struct pci_dev *, void *),
+		  void *userdata);
 void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
 		  void *userdata);
 int pci_cfg_space_size(struct pci_dev *dev);
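
For reviewers who want to check the slot filtering in isolation, below is
a minimal user-space sketch of the walk performed by pci_walk_mf_dev().
It is not kernel code and not part of the patch. Everything prefixed with
sk_ or SK_ is a hypothetical stand-in invented for illustration; the two
macros are assumed to mirror the kernel's PCI_DEVFN() and PCI_SLOT()
encoding (5-bit device number, 3-bit function number). Locking with
pci_bus_sem and the real bus device list are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to match the kernel's devfn encoding: slot in bits 7:3,
 * function in bits 2:0. */
#define SK_PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define SK_PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)

/* Hypothetical stand-in for struct pci_dev; only devfn matters here. */
struct sk_dev {
	unsigned int devfn;
	int visited;
};

/* Hypothetical callback: mark the device, count it, never abort. */
static int sk_mark(struct sk_dev *dev, void *userdata)
{
	int *count = userdata;

	dev->visited = 1;
	(*count)++;
	return 0;
}

/* Call cb only for the entries of bus[] that share dev's slot number,
 * stopping early if cb returns non-zero (same rule as in the patch). */
static void sk_walk_mf_dev(struct sk_dev *bus, size_t n,
			   const struct sk_dev *dev,
			   int (*cb)(struct sk_dev *, void *), void *userdata)
{
	unsigned int slot = SK_PCI_SLOT(dev->devfn);
	size_t i;

	assert(cb != NULL);
	for (i = 0; i < n; i++) {
		if (SK_PCI_SLOT(bus[i].devfn) != slot)
			continue;
		if (cb(&bus[i], userdata))
			break;
	}
}

/* Toy bus: slot 0 has functions 0 and 1, slot 1 has function 0.
 * Returns how many devices are visited when the walk starts at 00.1. */
static int sk_demo(void)
{
	struct sk_dev bus[] = {
		{ SK_PCI_DEVFN(0, 0), 0 },
		{ SK_PCI_DEVFN(0, 1), 0 },
		{ SK_PCI_DEVFN(1, 0), 0 },
	};
	int count = 0;

	sk_walk_mf_dev(bus, 3, &bus[1], sk_mark, &count);
	return count;
}
```

In this sketch sk_demo() visits the two functions in slot 0 and skips the
device in slot 1, which is the behaviour the patch aims for on a bus
without ARI when the error is non fatal.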