From patchwork Wed Sep 27 21:42:16 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Govindarajulu Varadarajan
X-Patchwork-Id: 819325
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=)
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=cisco.com header.i=@cisco.com header.b="CbXxmC8E"; dkim-atps=neutral
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3y2Wkw5S0wz9t67 for ; Thu, 28 Sep 2017 07:52:24 +1000 (AEST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752092AbdI0VwX (ORCPT ); Wed, 27 Sep 2017 17:52:23 -0400
Received: from rcdn-iport-4.cisco.com ([173.37.86.75]:6641 "EHLO rcdn-iport-4.cisco.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751905AbdI0VwW (ORCPT ); Wed, 27 Sep 2017 17:52:22 -0400
X-Greylist: delayed 563 seconds by postgrey-1.27 at vger.kernel.org; Wed, 27 Sep 2017 17:52:22 EDT
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=cisco.com; i=@cisco.com; l=2066; q=dns/txt; s=iport; t=1506549142; x=1507758742; h=from:to:cc:subject:date:message-id; bh=B4XupCfpP5J1cTVeghWBD/DwSnoP6Gmur28KI/bnY98=; b=CbXxmC8E07Enh/hn9RxZmo2NdGu0Ka4qyIK9qecuLIaDn9XwBDSSPUYE m96XBB7GIvNJKOa4b/YF3nEMBtSl0JI/7Yfq/mNGOu0H+vXMIhi3ZuYOP bpf9z6j1c7SGiSFuAq8bv7hGqctigw+Zc85poQxpZo7sh44ErykTLw2cR w=;
X-IronPort-AV: E=Sophos;i="5.42,446,1500940800"; d="scan'208";a="300794013"
Received: from alln-core-6.cisco.com ([173.36.13.139]) by rcdn-iport-4.cisco.com with ESMTP/TLS/DHE-RSA-AES256-SHA; 27 Sep 2017 21:42:57 +0000
Received: from a6.cisco.com (arch-kvm-vm.cisco.com [10.193.184.6]) (authenticated bits=0) by alln-core-6.cisco.com
	(8.14.5/8.14.5) with ESMTP id v8RLgpwe021066 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 27 Sep 2017 21:42:57 GMT
From: Govindarajulu Varadarajan
To: benve@cisco.com, bhelgaas@google.com, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, jlbec@evilplan.org, hch@lst.de, mingo@redhat.com, peterz@infradead.org
Cc: Govindarajulu Varadarajan
Subject: [PATCH 0/4] pci aer: fix deadlock in do_recovery
Date: Wed, 27 Sep 2017 14:42:16 -0700
Message-Id: <20170927214220.41216-1-gvaradar@cisco.com>
X-Mailer: git-send-email 2.14.1
X-Authenticated-User: gvaradar@cisco.com
Sender: linux-pci-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-pci@vger.kernel.org

I am seeing a deadlock while loading the enic driver with SR-IOV enabled.

       CPU0                                        CPU1
---------------------------------------------------------------------
__driver_attach()
device_lock(&dev->mutex) <--- device mutex lock here
driver_probe_device()
pci_enable_sriov()
pci_iov_add_virtfn()
pci_device_add()
                                    aer_isr()               <--- pci aer error
                                    do_recovery()
                                    broadcast_error_message()
                                    pci_walk_bus()
                                    down_read(&pci_bus_sem) <--- rd sem
down_write(&pci_bus_sem) <-- stuck on wr sem
                                    report_error_detected()
                                    device_lock(&dev->mutex)<--- DEAD LOCK

CPU0 holds dev->mutex and waits for pci_bus_sem as a writer, while CPU1
holds pci_bus_sem as a reader and waits for the same dev->mutex.

This can also happen when an AER error occurs while
pci_dev->sriov_config() is being called.

The only fix I could think of is to take pci_bus_sem and then try to lock
every device->mutex under that pci_bus. If any trylock fails, unlock all
the device->mutex locks taken so far, release pci_bus_sem, and try again.
This approach seems hackish, and I do not have a better solution. I would
like to open the discussion on this.

Patches 1 and 2 refactor the PCI locking API. Patch 3 fixes the issue.

With the current fix, we hold the mutex of the parent device and of all
the devices under the bus. This can exceed the size of held_locks in
lockdep if the number of devices (VFs) exceeds 48. Patch 4 extends this to
63, the maximum supported by lockdep.

Govindarajulu Varadarajan (4):
  pci: introduce __pci_walk_bus for caller with pci_bus_sem held
  pci: code refactor pci_bus_lock/unlock/trylock
  pci aer: fix deadlock in do_recovery
  lockdep: make MAX_LOCK_DEPTH configurable from Kconfig

 drivers/pci/bus.c                  | 13 ++++++++--
 drivers/pci/pci.c                  | 38 ++++++++++++++++++++---------
 drivers/pci/pcie/aer/aerdrv_core.c | 50 ++++++++++++++++++++++++++++++--------
 fs/configfs/inode.c                |  2 +-
 include/linux/pci.h                | 18 ++++++++++++++
 include/linux/sched.h              |  3 +--
 kernel/locking/lockdep.c           | 13 +++++-----
 lib/Kconfig.debug                  | 10 ++++++++
 8 files changed, 115 insertions(+), 32 deletions(-)