{"id":819325,"url":"http://patchwork.ozlabs.org/api/covers/819325/?format=json","web_url":"http://patchwork.ozlabs.org/project/linux-pci/cover/20170927214220.41216-1-gvaradar@cisco.com/","project":{"id":28,"url":"http://patchwork.ozlabs.org/api/projects/28/?format=json","name":"Linux PCI development","link_name":"linux-pci","list_id":"linux-pci.vger.kernel.org","list_email":"linux-pci@vger.kernel.org","web_url":null,"scm_url":null,"webscm_url":null,"list_archive_url":"","list_archive_url_format":"","commit_url_format":""},"msgid":"<20170927214220.41216-1-gvaradar@cisco.com>","list_archive_url":null,"date":"2017-09-27T21:42:16","name":"[0/4] pci aer: fix deadlock in do_recovery","submitter":{"id":46073,"url":"http://patchwork.ozlabs.org/api/people/46073/?format=json","name":"Govindarajulu Varadarajan","email":"gvaradar@cisco.com"},"mbox":"http://patchwork.ozlabs.org/project/linux-pci/cover/20170927214220.41216-1-gvaradar@cisco.com/mbox/","series":[{"id":5455,"url":"http://patchwork.ozlabs.org/api/series/5455/?format=json","web_url":"http://patchwork.ozlabs.org/project/linux-pci/list/?series=5455","date":"2017-09-27T21:42:16","name":"pci aer: fix deadlock in do_recovery","version":1,"mbox":"http://patchwork.ozlabs.org/series/5455/mbox/"}],"comments":"http://patchwork.ozlabs.org/api/covers/819325/comments/","headers":{"Return-Path":"<linux-pci-owner@vger.kernel.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=linux-pci-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (1024-bit key;\n\tunprotected) header.d=cisco.com header.i=@cisco.com\n\theader.b=\"CbXxmC8E\"; dkim-atps=neutral"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3y2Wkw5S0wz9t67\n\tfor 
<incoming@patchwork.ozlabs.org>;\n\tThu, 28 Sep 2017 07:52:24 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1752092AbdI0VwX (ORCPT <rfc822;incoming@patchwork.ozlabs.org>);\n\tWed, 27 Sep 2017 17:52:23 -0400","from rcdn-iport-4.cisco.com ([173.37.86.75]:6641 \"EHLO\n\trcdn-iport-4.cisco.com\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S1751905AbdI0VwW (ORCPT\n\t<rfc822; linux-pci@vger.kernel.org>); Wed, 27 Sep 2017 17:52:22 -0400","from alln-core-6.cisco.com ([173.36.13.139])\n\tby rcdn-iport-4.cisco.com with ESMTP/TLS/DHE-RSA-AES256-SHA;\n\t27 Sep 2017 21:42:57 +0000","from a6.cisco.com (arch-kvm-vm.cisco.com [10.193.184.6])\n\t(authenticated bits=0)\n\tby alln-core-6.cisco.com (8.14.5/8.14.5) with ESMTP id v8RLgpwe021066\n\t(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);\n\tWed, 27 Sep 2017 21:42:57 GMT"],"X-Greylist":"delayed 563 seconds by postgrey-1.27 at vger.kernel.org;\n\tWed, 27 Sep 2017 17:52:22 EDT","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/simple;\n\td=cisco.com; i=@cisco.com; l=2066; q=dns/txt; s=iport;\n\tt=1506549142; x=1507758742;\n\th=from:to:cc:subject:date:message-id;\n\tbh=B4XupCfpP5J1cTVeghWBD/DwSnoP6Gmur28KI/bnY98=;\n\tb=CbXxmC8E07Enh/hn9RxZmo2NdGu0Ka4qyIK9qecuLIaDn9XwBDSSPUYE\n\tm96XBB7GIvNJKOa4b/YF3nEMBtSl0JI/7Yfq/mNGOu0H+vXMIhi3ZuYOP\n\tbpf9z6j1c7SGiSFuAq8bv7hGqctigw+Zc85poQxpZo7sh44ErykTLw2cR w=;","X-IronPort-AV":"E=Sophos;i=\"5.42,446,1500940800\"; d=\"scan'208\";a=\"300794013\"","From":"Govindarajulu Varadarajan <gvaradar@cisco.com>","To":"benve@cisco.com, bhelgaas@google.com, linux-pci@vger.kernel.org,\n\tlinux-kernel@vger.kernel.org, jlbec@evilplan.org, hch@lst.de,\n\tmingo@redhat.com, peterz@infradead.org","Cc":"Govindarajulu Varadarajan <gvaradar@cisco.com>","Subject":"[PATCH 0/4] pci aer: fix deadlock in do_recovery","Date":"Wed, 27 Sep 2017 14:42:16 -0700","Message-Id":"<20170927214220.41216-1-gvaradar@cisco.com>","X-Mailer":"git-send-email 
2.14.1","X-Authenticated-User":"gvaradar@cisco.com","Sender":"linux-pci-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<linux-pci.vger.kernel.org>","X-Mailing-List":"linux-pci@vger.kernel.org"},"content":"I am seeing a deadlock while loading the enic driver with SR-IOV enabled.\n\nCPU0\t\t\t\t\tCPU1\n---------------------------------------------------------------------\n__driver_attach()\ndevice_lock(&dev->mutex) <--- device mutex lock here\ndriver_probe_device()\npci_enable_sriov()\npci_iov_add_virtfn()\npci_device_add()\n\t\t\t\t\taer_isr()\t\t<--- pci aer error\n\t\t\t\t\tdo_recovery()\n\t\t\t\t\tbroadcast_error_message()\n\t\t\t\t\tpci_walk_bus()\n\t\t\t\t\tdown_read(&pci_bus_sem) <--- rd sem\ndown_write(&pci_bus_sem) <-- stuck on wr sem\n\t\t\t\t\treport_error_detected()\n\t\t\t\t\tdevice_lock(&dev->mutex)<--- DEAD LOCK\n\nThis can also happen when an AER error occurs while pci_dev->sriov_config() is\ncalled.\n\nThe only fix I could think of is to lock &pci_bus_sem and try locking every\ndevice->mutex under that pci_bus. If that fails, unlock all device->mutex\nand &pci_bus_sem and try again. This approach seems hackish, and I\ndo not have a better solution. I would like to open the discussion on\nthis.\n\nPatches 1 and 2 refactor the PCI locking API. Patch 3 fixes the\nissue.\n\nWith the current fix, we hold the mutex lock of the parent device and all the\ndevices under the bus. This can exceed the size of held_locks in lockdep\nif the number of devices (VFs) exceeds 48. 
Patch 4 extends this to 63, the maximum\nsupported by lockdep.\n\nGovindarajulu Varadarajan (4):\n  pci: introduce __pci_walk_bus for caller with pci_bus_sem held\n  pci: code refactor pci_bus_lock/unlock/trylock\n  pci aer: fix deadlock in do_recovery\n  lockdep: make MAX_LOCK_DEPTH configurable from Kconfig\n\n drivers/pci/bus.c                  | 13 ++++++++--\n drivers/pci/pci.c                  | 38 ++++++++++++++++++++---------\n drivers/pci/pcie/aer/aerdrv_core.c | 50 ++++++++++++++++++++++++++++++--------\n fs/configfs/inode.c                |  2 +-\n include/linux/pci.h                | 18 ++++++++++++++\n include/linux/sched.h              |  3 +--\n kernel/locking/lockdep.c           | 13 +++++-----\n lib/Kconfig.debug                  | 10 ++++++++\n 8 files changed, 115 insertions(+), 32 deletions(-)"}