From patchwork Mon Sep 17 14:34:28 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Gavin Shan X-Patchwork-Id: 184425 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from ozlabs.org (localhost [IPv6:::1]) by ozlabs.org (Postfix) with ESMTP id 685642C01C2 for ; Tue, 18 Sep 2012 00:35:09 +1000 (EST) Received: by ozlabs.org (Postfix) id DF9C62C008A; Tue, 18 Sep 2012 00:34:43 +1000 (EST) Delivered-To: linuxppc-dev@ozlabs.org Received: from e9.ny.us.ibm.com (e9.ny.us.ibm.com [32.97.182.139]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "e9.ny.us.ibm.com", Issuer "GeoTrust SSL CA" (not verified)) by ozlabs.org (Postfix) with ESMTPS id CDBD52C0086 for ; Tue, 18 Sep 2012 00:34:41 +1000 (EST) Received: from /spool/local by e9.ny.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Mon, 17 Sep 2012 10:34:37 -0400 Received: from d01relay04.pok.ibm.com (9.56.227.236) by e9.ny.us.ibm.com (192.168.1.109) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Mon, 17 Sep 2012 10:34:34 -0400 Received: from d03av01.boulder.ibm.com (d03av01.boulder.ibm.com [9.17.195.167]) by d01relay04.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q8HEYXYq041804 for ; Mon, 17 Sep 2012 10:34:33 -0400 Received: from d03av01.boulder.ibm.com (loopback [127.0.0.1]) by d03av01.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q8HEYVo2032037 for ; Mon, 17 Sep 2012 08:34:31 -0600 Received: from shangw ([9.125.24.222]) by d03av01.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVin) with ESMTP id q8HEYUrT031968; Mon, 17 Sep 2012 08:34:30 -0600 Received: by shangw (Postfix, from userid 1000) id CBB6B3021DD; Mon, 17 Sep 2012 22:34:29 +0800 (CST) From: Gavin Shan To: linuxppc-dev@ozlabs.org Subject: [PATCH 2/2] ppc/eeh: fix crash on converting OF node to edev Date: Mon, 17 Sep 2012 22:34:28 +0800 Message-Id: <1347892468-25818-2-git-send-email-shangw@linux.vnet.ibm.com> X-Mailer: git-send-email 1.7.5.4 In-Reply-To: <1347892468-25818-1-git-send-email-shangw@linux.vnet.ibm.com> References: <1347892468-25818-1-git-send-email-shangw@linux.vnet.ibm.com> x-cbid: 12091714-7182-0000-0000-000002A49BA9 Cc: Gavin Shan , stable@vger.kernel.org X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" The kernel crash was reported by Alexy. He was testing some feature with private kernel, in which Alexy added some code in pci_pm_reset() to read the CSR after writting it. The bug could be reproduced on Fiber Channel card (Fibre Channel: Emulex Corporation Saturn-X: LightPulse Fibre Channel Host Adapter (rev 03)) by the following commands. # echo 1 > /sys/devices/pci0004:01/0004:01:00.0/reset # rmmod lpfc # modprobe lpfc The history behind the test case is that those additional config space reading operations in pci_pm_reset() would cause EEH error, but we didn't detect EEH error until "modprobe lpfc". For the case, all the PCI devices on PCI bus (0004:01) were removed and added after PE reset. Then the EEH devices would be figured out again based on the OF nodes. Unfortunately, there were some child OF nodes under PCI device (0004:01:00.0), but they didn't have attached PCI_DN since they're invisible from PCI domain. However, we were still trying to convert OF node to EEH device without checking on the attached PCI_DN. Eventually, it caused the kernel crash as follows: Unable to handle kernel paging request for data at address 0x00000030 Faulting instruction address: 0xc00000000004d888 cpu 0x0: Vector: 300 (Data Access) at [c000000fc797b950] pc: c00000000004d888: .eeh_add_device_tree_early+0x78/0x140 lr: c00000000004d880: .eeh_add_device_tree_early+0x70/0x140 sp: c000000fc797bbd0 msr: 8000000000009032 dar: 30 dsisr: 40000000 current = 0xc000000fc78d9f70 paca = 0xc00000000edb0000 softe: 0 irq_happened: 0x00 pid = 2951, comm = eehd enter ? for help [c000000fc797bc50] c00000000004d848 .eeh_add_device_tree_early+0x38/0x140 [c000000fc797bcd0] c00000000004d848 .eeh_add_device_tree_early+0x38/0x140 [c000000fc797bd50] c000000000051b54 .pcibios_add_pci_devices+0x34/0x190 [c000000fc797bde0] c00000000004fb10 .eeh_reset_device+0x100/0x160 [c000000fc797be70] c0000000000502dc .eeh_handle_event+0x19c/0x300 [c000000fc797bf00] c000000000050570 .eeh_event_handler+0x130/0x1a0 [c000000fc797bf90] c000000000020138 .kernel_thread+0x54/0x70 The patch changes of_node_to_eeh_dev() and just returns NULL if the passed OF node doesn't have attached PCI_DN. Cc: stable@vger.kernel.org Reported-by: Alexey Kardashevskiy Signed-off-by: Gavin Shan --- arch/powerpc/include/asm/pci-bridge.h | 8 ++++++++ arch/powerpc/platforms/pseries/eeh.c | 2 +- 2 files changed, 9 insertions(+), 1 deletions(-) diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index a059cb9..025a130 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -182,6 +182,14 @@ static inline int pci_device_from_OF_node(struct device_node *np, #if defined(CONFIG_EEH) static inline struct eeh_dev *of_node_to_eeh_dev(struct device_node *dn) { + /* + * For those OF nodes whose parent isn't PCI bridge, they + * don't have PCI_DN actually. So we have to skip them for + * any EEH operations. + */ + if (!dn || !PCI_DN(dn)) + return NULL; + return PCI_DN(dn)->edev; } #else diff --git a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c index 43f6ed4..9a04322 100644 --- a/arch/powerpc/platforms/pseries/eeh.c +++ b/arch/powerpc/platforms/pseries/eeh.c @@ -728,7 +728,7 @@ static void eeh_add_device_early(struct device_node *dn) { struct pci_controller *phb; - if (!dn || !of_node_to_eeh_dev(dn)) + if (!of_node_to_eeh_dev(dn)) return; phb = of_node_to_eeh_dev(dn)->phb;