From patchwork Wed Aug 16 19:33:03 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Thierry Reding
X-Patchwork-Id: 802224
Date: Wed, 16 Aug 2017 21:33:03 +0200
From: Thierry Reding
To: Bjorn Helgaas
Cc: Ding Tianhong, mark.rutland@arm.com, gabriele.paoloni@huawei.com,
	asit.k.mallick@intel.com, catalin.marinas@arm.com, will.deacon@arm.com,
	linuxarm@huawei.com, alexander.duyck@gmail.com, ashok.raj@intel.com,
	eric.dumazet@gmail.com, jeffrey.t.kirsher@intel.com,
	linux-pci@vger.kernel.org, ganeshgr@chelsio.com, Bob.Shaw@amd.com,
	leedom@chelsio.com, patrick.j.cramer@intel.com, bhelgaas@google.com,
	werner@chelsio.com, linux-arm-kernel@lists.infradead.org,
	amira@mellanox.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	David.Laight@aculab.com, Suravee.Suthikulpanit@amd.com,
	robin.murphy@arm.com, davem@davemloft.net, l.stach@pengutronix.de
Subject: Re: [PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device
Message-ID: <20170816193303.GA14147@ulmo>
References: <1502810688-12420-1-git-send-email-dingtianhong@huawei.com>
	<20170815170331.GA4099@bhelgaas-glaptop.roam.corp.google.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20170815170331.GA4099@bhelgaas-glaptop.roam.corp.google.com>
User-Agent: Mutt/1.8.3 (2017-05-23)
Sender: linux-pci-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-pci@vger.kernel.org

On Tue, Aug 15, 2017 at 12:03:31PM -0500, Bjorn Helgaas wrote:
> On Tue, Aug 15, 2017 at 11:24:48PM +0800, Ding Tianhong wrote:
> > Eric report a oops when booting the system after applying
> > the commit a99b646afa8a ("PCI: Disable PCIe Relaxed..."):
> > ...
>
> > It looks like the pci_find_pcie_root_port() was trying to
> > find the Root Port for the PCI device which is the Root
> > Port already, it will return NULL and trigger the problem,
> > so check the highest_pcie_bridge to fix thie problem.
>
> The problem was actually with a Root Complex Integrated Endpoint that
> has no upstream PCIe device:
>
>   00:05.2 System peripheral: Intel Corporation Device 0e2a (rev 04)
> 	Subsystem: Intel Corporation Device 0e2a
> 	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
> 	Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
> 		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
> 			ExtTag- RBE- FLReset-
> 		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal+ Unsupported+
> 			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
> 			MaxPayload 128 bytes, MaxReadReq 128 bytes

I've started seeing this crash on Tegra K1 as well.
Here's the device for which it oopses:

	00:02.0 PCI bridge: NVIDIA Corporation TegraK1 PCIe x1 Bridge (rev a1) (prog-if 00 [Normal decode])
		Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
		Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
		Capabilities: [40] Subsystem: NVIDIA Corporation TegraK1 PCIe x1 Bridge
		Capabilities: [48] Power Management version 3
			Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
			Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
		Capabilities: [50] MSI: Enable+ Count=1/2 Maskable- 64bit+
			Address: 000000fcfffff000  Data: 0000
		Capabilities: [60] HyperTransport: MSI Mapping Enable- Fixed-
			Mapping Address Base: 00000000fee00000
		Capabilities: [80] Express (v2) Root Port (Slot+), MSI 00
			DevCap:	MaxPayload 128 bytes, PhantFunc 0
				ExtTag+ RBE+
			DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
				RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
				MaxPayload 128 bytes, MaxReadReq 512 bytes
			DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
			LnkCap:	Port #1, Speed 5GT/s, Width x1, ASPM L0s, Exit Latency L0s <512ns
				ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
			LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
				ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
			LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
			SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
				Slot #0, PowerLimit 0.000W; Interlock- NoCompl-
			SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
				Control: AttnInd Off, PwrInd On, Power- Interlock-
			SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
				Changed: MRL- PresDet+ LinkState+
			RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
			RootCap: CRSVisible-
			RootSta: PME ReqID 0000, PMEStatus- PMEPending-
			DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
				 AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
			DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
				 AtomicOpsCtl: ReqEn- EgressBlck-
			LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
				 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
				 Compliance De-emphasis: -6dB
			LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
				 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
		Kernel driver in use: pcieport

> > Fixes: a99b646afa8a ("PCI: Disable PCIe Relaxed Ordering if unsupported")
>
> This also
>
>   Fixes: c56d4450eb68 ("PCI: Turn off Request Attributes to avoid Chelsio T5 Completion erratum")
>
> which added pci_find_pcie_root_port().  Prior to this Relaxed Ordering
> series, we only used pci_find_pcie_root_port() in a Chelsio quirk that
> only applied to non-integrated endpoints, so we didn't trip over the
> bug.
>
> > Reported-by: Eric Dumazet
> > Signed-off-by: Eric Dumazet
> > Signed-off-by: Ding Tianhong
> > ---
> >  drivers/pci/pci.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index af0cc34..7e2022f 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -522,7 +522,8 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
> >  		bridge = pci_upstream_bridge(bridge);
> >  	}
> >
> > -	if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
> > +	if (highest_pcie_bridge &&
> > +	    pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
> >  		return NULL;
> >
> >  	return highest_pcie_bridge;
> > --
>
> I think structuring the fix as follows is a little more readable:
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index af0cc3456dc1..587cd7623ed8 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -522,10 +522,11 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
>  		bridge = pci_upstream_bridge(bridge);
>  	}
>
> -	if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
> -		return NULL;
> +	if (highest_pcie_bridge &&
> +	    pci_pcie_type(highest_pcie_bridge) == PCI_EXP_TYPE_ROOT_PORT)
> +		return highest_pcie_bridge;
>
> -	return highest_pcie_bridge;
> +	return NULL;
>  }
>  EXPORT_SYMBOL(pci_find_pcie_root_port);

In case of Tegra, dev actually points to the root port. Now if I read the
above code correctly, highest_pcie_bridge will still be NULL in that
case, which in turn will return NULL from pci_find_pcie_root_port(). But
shouldn't it really return dev?

The patch that I used to fix the issue is this:

--->8---
--->8---

That works correctly if this function ends up being called on the PCIe
root port, though perhaps that's not what this function is supposed to
do. It's somewhat unclear from the kerneldoc what the function should
be doing when called on a root port device itself.

Thierry

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 2c712dcfd37d..dd56c1c05614 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -514,7 +514,7 @@ EXPORT_SYMBOL(pci_find_resource);
  */
 struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
 {
-	struct pci_dev *bridge, *highest_pcie_bridge = NULL;
+	struct pci_dev *bridge, *highest_pcie_bridge = dev;
 
 	bridge = pci_upstream_bridge(dev);
 	while (bridge && pci_is_pcie(bridge)) {