[resend,v2] PCI: pciehp: Use down_read/write_nested(reset_lock) to fix lockdep errors

Use down_read_nested() and down_write_nested() when taking the
ctrl->reset_lock rw-sem, passing the number of PCIe hotplug controllers in
the path to the PCI root bus as the lock subclass parameter. This fixes the
following false-positive lockdep report when unplugging a Lenovo X1C8 from
a Lenovo 2nd gen TB3 dock:

[   28.583853] pcieport 0000:06:01.0: pciehp: Slot(1): Link Down
[   28.583891] pcieport 0000:06:01.0: pciehp: Slot(1): Card not present
[   28.584849] ============================================
[   28.584854] WARNING: possible recursive locking detected
[   28.584858] 5.16.0-rc2+ #621 Not tainted
[   28.584864] --------------------------------------------
[   28.584867] irq/124-pciehp/86 is trying to acquire lock:
[   28.584873] ffff8e5ac4299ef8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_check_presence+0x23/0x80
[   28.584904]
               but task is already holding lock:
[   28.584908] ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180
[   28.584929]
               other info that might help us debug this:
[   28.584933]  Possible unsafe locking scenario:

[   28.584936]        CPU0
[   28.584939]        ----
[   28.584942]   lock(&ctrl->reset_lock);
[   28.584949]   lock(&ctrl->reset_lock);
[   28.584955]
                *** DEADLOCK ***

[   28.584959]  May be due to missing lock nesting notation

[   28.584963] 3 locks held by irq/124-pciehp/86:
[   28.584970]  #0: ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180
[   28.584991]  #1: ffffffffa3b024e8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pciehp_unconfigure_device+0x31/0x110
[   28.585012]  #2: ffff8e5ac1ee2248 (&dev->mutex){....}-{3:3}, at: device_release_driver+0x1c/0x40
[   28.585037]
               stack backtrace:
[   28.585042] CPU: 4 PID: 86 Comm: irq/124-pciehp Not tainted 5.16.0-rc2+ #621
[   28.585052] Hardware name: LENOVO 20U90SIT19/20U90SIT19, BIOS N2WET30W (1.20 ) 08/26/2021
[   28.585059] Call Trace:
[   28.585064]  <TASK>
[   28.585073]  dump_stack_lvl+0x59/0x73
[   28.585087]  __lock_acquire.cold+0xc5/0x2c6
[   28.585106]  ? find_held_lock+0x2b/0x80
[   28.585124]  lock_acquire+0xb5/0x2b0
[   28.585132]  ? pciehp_check_presence+0x23/0x80
[   28.585144]  ? lock_is_held_type+0xa8/0x120
[   28.585161]  down_read+0x3e/0x50
[   28.585172]  ? pciehp_check_presence+0x23/0x80
[   28.585183]  pciehp_check_presence+0x23/0x80
[   28.585194]  pciehp_runtime_resume+0x5c/0xa0
[   28.585206]  ? pci_msix_init+0x60/0x60
[   28.585214]  device_for_each_child+0x45/0x70
[   28.585227]  pcie_port_device_runtime_resume+0x20/0x30
[   28.585236]  pci_pm_runtime_resume+0xa7/0xc0
[   28.585246]  ? pci_pm_freeze_noirq+0x100/0x100
[   28.585257]  __rpm_callback+0x41/0x110
[   28.585271]  ? pci_pm_freeze_noirq+0x100/0x100
[   28.585281]  rpm_callback+0x59/0x70
[   28.585293]  rpm_resume+0x512/0x7b0
[   28.585309]  __pm_runtime_resume+0x4a/0x90
[   28.585322]  __device_release_driver+0x28/0x240
[   28.585338]  device_release_driver+0x26/0x40
[   28.585351]  pci_stop_bus_device+0x68/0x90
[   28.585363]  pci_stop_bus_device+0x2c/0x90
[   28.585373]  pci_stop_and_remove_bus_device+0xe/0x20
[   28.585384]  pciehp_unconfigure_device+0x6c/0x110
[   28.585396]  ? __pm_runtime_resume+0x58/0x90
[   28.585409]  pciehp_disable_slot+0x5b/0xe0
[   28.585421]  pciehp_handle_presence_or_link_change+0xc3/0x2f0
[   28.585436]  pciehp_ist+0x179/0x180
[   28.585449]  ? disable_irq_nosync+0x10/0x10
[   28.585460]  irq_thread_fn+0x1d/0x60
[   28.585470]  ? irq_thread+0x81/0x1a0
[   28.585480]  irq_thread+0xcb/0x1a0
[   28.585491]  ? irq_thread_fn+0x60/0x60
[   28.585502]  ? irq_thread_check_affinity+0xb0/0xb0
[   28.585514]  kthread+0x165/0x190
[   28.585522]  ? set_kthread_struct+0x40/0x40
[   28.585531]  ret_from_fork+0x1f/0x30
[   28.585554]  </TASK>

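This lockdep warning is triggered because, with Thunderbolt, hotplug ports
are nested. When removing multiple devices in a daisy chain, each hotplug
port's reset_lock may be acquired while an upstream port's reset_lock is
already held, which lockdep sees as recursive locking. It is never the same
lock (note the two different addresses above), but all reset_locks share
one lockdep class, so the lockdep splat is a false positive.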

Because locks at the same hierarchy level are never acquired recursively,
a per-level lockdep subclass is sufficient to fix the lockdep warning.

The choice to use one lockdep subclass per PCIe hotplug controller in the
path to the root bus was made to conserve class keys: their number is
limited, and according to Documentation/locking/lockdep-design.rst the
complexity grows quadratically with the number of keys.

Link: https://lore.kernel.org/linux-pci/20190402021933.GA2966@mit.edu/
Link: https://lore.kernel.org/linux-pci/de684a28-9038-8fc6-27ca-3f6f2f6400d7@redhat.com/
Cc: stable@vger.kernel.org
Reported-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
Changes in v2:
- Only use a subclass for each hotplug-capable port/parent in the path to
  the PCI root bus, instead of one for each level in the PCI hierarchy,
  to avoid hitting MAX_LOCKDEP_SUBCLASSES
- Drop the "PCI: Add a pci_dev_depth() helper function" patch, since we now
  need a special version of this that only counts hotplug ports
- Various commit message improvements
---
 drivers/pci/hotplug/pciehp.h      |  3 +++
 drivers/pci/hotplug/pciehp_core.c |  2 +-
 drivers/pci/hotplug/pciehp_hpc.c  | 21 ++++++++++++++++++---
 3 files changed, 22 insertions(+), 4 deletions(-)

Message ID	20211217141709.379663-1-hdegoede@redhat.com
State	New
Headers
Return-Path: <linux-pci-owner@vger.kernel.org>
From: Hans de Goede <hdegoede@redhat.com>
To: Bjorn Helgaas <bhelgaas@google.com>
Cc: Hans de Goede <hdegoede@redhat.com>, linux-pci@vger.kernel.org, stable@vger.kernel.org, "Theodore Ts'o" <tytso@mit.edu>, Lukas Wunner <lukas@wunner.de>
Subject: [PATCH resend v2] PCI: pciehp: Use down_read/write_nested(reset_lock) to fix lockdep errors
Date: Fri, 17 Dec 2021 15:17:09 +0100
Message-Id: <20211217141709.379663-1-hdegoede@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk