Message ID | 20190816165101.911-2-s.miroshnichenko@yadro.com |
---|---|
State | Superseded |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | PCI: Allow BAR movement during hotplug | expand |
----- On 16 Aug, 2019, at 18:50, Sergey Miroshnichenko s.miroshnichenko@yadro.com wrote: > This is a yet another approach to fix an old [1-2] concurrency issue, when: > - two or more devices are being hot-added into a bridge which was > initially empty; > - a bridge with two or more devices is being hot-added; > - during boot, if BIOS/bootloader/firmware doesn't pre-enable bridges. > > The problem is that a bridge is reported as enabled before the MEM/IO bits > are actually written to the PCI_COMMAND register, so another driver thread > starts memory requests through the not-yet-enabled bridge: > > CPU0 CPU1 > > pci_enable_device_mem() pci_enable_device_mem() > pci_enable_bridge() pci_enable_bridge() > pci_is_enabled() > return false; > atomic_inc_return(enable_cnt) > Start actual enabling the bridge > ... pci_is_enabled() > ... return true; > ... Start memory requests <-- FAIL > ... > Set the PCI_COMMAND_MEMORY bit <-- Must wait for this > > Protect the pci_enable/disable_device() and pci_enable_bridge(), which is > similar to the previous solution from commit 40f11adc7cd9 ("PCI: Avoid race > while enabling upstream bridges"), but adding a per-device mutexes and > preventing the dev->enable_cnt from from incrementing early. > > CC: Srinath Mannam <srinath.mannam@broadcom.com> > CC: Marta Rybczynska <mrybczyn@kalray.eu> > Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com> > > [1] > https://lore.kernel.org/linux-pci/1501858648-22228-1-git-send-email-srinath.mannam@broadcom.com/T/#u > [RFC PATCH v3] pci: Concurrency issue during pci enable bridge > > [2] > https://lore.kernel.org/linux-pci/744877924.5841545.1521630049567.JavaMail.zimbra@kalray.eu/T/#u > [RFC PATCH] nvme: avoid race-conditions when enabling devices > --- > drivers/pci/pci.c | 26 ++++++++++++++++++++++---- > drivers/pci/probe.c | 1 + > include/linux/pci.h | 1 + > 3 files changed, 24 insertions(+), 4 deletions(-) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index 1b27b5af3d55..e7f8c354e644 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -1645,6 +1645,8 @@ static void pci_enable_bridge(struct pci_dev *dev) > struct pci_dev *bridge; > int retval; > > + mutex_lock(&dev->enable_mutex); > + > bridge = pci_upstream_bridge(dev); > if (bridge) > pci_enable_bridge(bridge); > @@ -1652,6 +1654,7 @@ static void pci_enable_bridge(struct pci_dev *dev) > if (pci_is_enabled(dev)) { > if (!dev->is_busmaster) > pci_set_master(dev); > + mutex_unlock(&dev->enable_mutex); > return; > } > This code is used by numerous drivers and when we've seen that issue I was wondering if there are some use-cases when this (or pci_disable_device) is called with interrupts disabled. It seems that it shouldn't be, but a BUG_ON or error when someone calls it this way would be helpful when debugging. Marta
On Fri, Aug 16, 2019 at 07:50:39PM +0300, Sergey Miroshnichenko wrote: > This is a yet another approach to fix an old [1-2] concurrency issue, when: > - two or more devices are being hot-added into a bridge which was > initially empty; > - a bridge with two or more devices is being hot-added; > - during boot, if BIOS/bootloader/firmware doesn't pre-enable bridges. > > The problem is that a bridge is reported as enabled before the MEM/IO bits > are actually written to the PCI_COMMAND register, so another driver thread > starts memory requests through the not-yet-enabled bridge: > > CPU0 CPU1 > > pci_enable_device_mem() pci_enable_device_mem() > pci_enable_bridge() pci_enable_bridge() > pci_is_enabled() > return false; > atomic_inc_return(enable_cnt) > Start actual enabling the bridge > ... pci_is_enabled() > ... return true; > ... Start memory requests <-- FAIL > ... > Set the PCI_COMMAND_MEMORY bit <-- Must wait for this > > Protect the pci_enable/disable_device() and pci_enable_bridge(), which is > similar to the previous solution from commit 40f11adc7cd9 ("PCI: Avoid race > while enabling upstream bridges"), but adding a per-device mutexes and > preventing the dev->enable_cnt from from incrementing early. This isn't directly related to the movable BARs functionality; is it here because you see the problem more frequently when moving BARs?
Hello Bjorn, On 9/28/19 12:59 AM, Bjorn Helgaas wrote: > On Fri, Aug 16, 2019 at 07:50:39PM +0300, Sergey Miroshnichenko wrote: >> This is a yet another approach to fix an old [1-2] concurrency issue, when: >> - two or more devices are being hot-added into a bridge which was >> initially empty; >> - a bridge with two or more devices is being hot-added; >> - during boot, if BIOS/bootloader/firmware doesn't pre-enable bridges. >> >> The problem is that a bridge is reported as enabled before the MEM/IO bits >> are actually written to the PCI_COMMAND register, so another driver thread >> starts memory requests through the not-yet-enabled bridge: >> >> CPU0 CPU1 >> >> pci_enable_device_mem() pci_enable_device_mem() >> pci_enable_bridge() pci_enable_bridge() >> pci_is_enabled() >> return false; >> atomic_inc_return(enable_cnt) >> Start actual enabling the bridge >> ... pci_is_enabled() >> ... return true; >> ... Start memory requests <-- FAIL >> ... >> Set the PCI_COMMAND_MEMORY bit <-- Must wait for this >> >> Protect the pci_enable/disable_device() and pci_enable_bridge(), which is >> similar to the previous solution from commit 40f11adc7cd9 ("PCI: Avoid race >> while enabling upstream bridges"), but adding a per-device mutexes and >> preventing the dev->enable_cnt from from incrementing early. > > This isn't directly related to the movable BARs functionality; is it > here because you see the problem more frequently when moving BARs? > First two patches of this series (including this one) are fixes for the boot and for the hotplug, not related to movable BARs. Before these fixes, we were suffering from this issue on PowerNV until commit db2173198b9513f7add8009f225afa1f1c79bcc6 "powerpc/powernv/pci: Work around races in PCI bridge enabling" was backported to distros: NVMEs randomly failed to start during system boot. So we've tested the fixes with that commit reverted. On x86 the BIOS does pre-enable the bridges, but they were still prone to races when hot-added or was initially "empty". Serge
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 1b27b5af3d55..e7f8c354e644 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1645,6 +1645,8 @@ static void pci_enable_bridge(struct pci_dev *dev) struct pci_dev *bridge; int retval; + mutex_lock(&dev->enable_mutex); + bridge = pci_upstream_bridge(dev); if (bridge) pci_enable_bridge(bridge); @@ -1652,6 +1654,7 @@ static void pci_enable_bridge(struct pci_dev *dev) if (pci_is_enabled(dev)) { if (!dev->is_busmaster) pci_set_master(dev); + mutex_unlock(&dev->enable_mutex); return; } @@ -1660,11 +1663,14 @@ static void pci_enable_bridge(struct pci_dev *dev) pci_err(dev, "Error enabling bridge (%d), continuing\n", retval); pci_set_master(dev); + mutex_unlock(&dev->enable_mutex); } static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags) { struct pci_dev *bridge; + /* Enable-locking of bridges is performed within the pci_enable_bridge() */ + bool need_lock = !dev->subordinate; int err; int i, bars = 0; @@ -1680,8 +1686,13 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags) dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK); } - if (atomic_inc_return(&dev->enable_cnt) > 1) + if (need_lock) + mutex_lock(&dev->enable_mutex); + if (pci_is_enabled(dev)) { + if (need_lock) + mutex_unlock(&dev->enable_mutex); return 0; /* already enabled */ + } bridge = pci_upstream_bridge(dev); if (bridge) @@ -1696,8 +1707,10 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags) bars |= (1 << i); err = do_pci_enable_device(dev, bars); - if (err < 0) - atomic_dec(&dev->enable_cnt); + if (err >= 0) + atomic_inc(&dev->enable_cnt); + if (need_lock) + mutex_unlock(&dev->enable_mutex); return err; } @@ -1941,15 +1954,20 @@ void pci_disable_device(struct pci_dev *dev) if (dr) dr->enabled = 0; + mutex_lock(&dev->enable_mutex); dev_WARN_ONCE(&dev->dev, atomic_read(&dev->enable_cnt) <= 0, "disabling already-disabled device"); - if (atomic_dec_return(&dev->enable_cnt) != 0) + if (atomic_dec_return(&dev->enable_cnt) != 0) { + mutex_unlock(&dev->enable_mutex); return; + } do_pci_disable_device(dev); dev->is_busmaster = 0; + + mutex_unlock(&dev->enable_mutex); } EXPORT_SYMBOL(pci_disable_device); diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index a3c7338fad86..2e58ece820e8 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -2427,6 +2427,7 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus) INIT_LIST_HEAD(&dev->bus_list); dev->dev.type = &pci_dev_type; dev->bus = pci_bus_get(bus); + mutex_init(&dev->enable_mutex); return dev; } diff --git a/include/linux/pci.h b/include/linux/pci.h index 9e700d9f9f28..d3a72159722d 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -425,6 +425,7 @@ struct pci_dev { unsigned int no_vf_scan:1; /* Don't scan for VFs after IOV enablement */ pci_dev_flags_t dev_flags; atomic_t enable_cnt; /* pci_enable_device has been called */ + struct mutex enable_mutex; u32 saved_config_space[16]; /* Config space saved at suspend time */ struct hlist_head saved_cap_space;
This is a yet another approach to fix an old [1-2] concurrency issue, when: - two or more devices are being hot-added into a bridge which was initially empty; - a bridge with two or more devices is being hot-added; - during boot, if BIOS/bootloader/firmware doesn't pre-enable bridges. The problem is that a bridge is reported as enabled before the MEM/IO bits are actually written to the PCI_COMMAND register, so another driver thread starts memory requests through the not-yet-enabled bridge: CPU0 CPU1 pci_enable_device_mem() pci_enable_device_mem() pci_enable_bridge() pci_enable_bridge() pci_is_enabled() return false; atomic_inc_return(enable_cnt) Start actual enabling the bridge ... pci_is_enabled() ... return true; ... Start memory requests <-- FAIL ... Set the PCI_COMMAND_MEMORY bit <-- Must wait for this Protect the pci_enable/disable_device() and pci_enable_bridge(), which is similar to the previous solution from commit 40f11adc7cd9 ("PCI: Avoid race while enabling upstream bridges"), but adding a per-device mutexes and preventing the dev->enable_cnt from from incrementing early. CC: Srinath Mannam <srinath.mannam@broadcom.com> CC: Marta Rybczynska <mrybczyn@kalray.eu> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com> [1] https://lore.kernel.org/linux-pci/1501858648-22228-1-git-send-email-srinath.mannam@broadcom.com/T/#u [RFC PATCH v3] pci: Concurrency issue during pci enable bridge [2] https://lore.kernel.org/linux-pci/744877924.5841545.1521630049567.JavaMail.zimbra@kalray.eu/T/#u [RFC PATCH] nvme: avoid race-conditions when enabling devices --- drivers/pci/pci.c | 26 ++++++++++++++++++++++---- drivers/pci/probe.c | 1 + include/linux/pci.h | 1 + 3 files changed, 24 insertions(+), 4 deletions(-)