Message ID | 1351061654-8339-1-git-send-email-ying.huang@intel.com |
---|---|
State | Accepted |
Headers | show |
On Wednesday 24 of October 2012 14:54:13 Huang Ying wrote: > If a PCI device and its parents are put into D3cold, unbinding the > device will trigger deadlock as follow: > > - driver_unbind > - device_release_driver > - device_lock(dev) <--- previous lock here > - __device_release_driver > - pm_runtime_get_sync > ... > - rpm_resume(dev) > - rpm_resume(dev->parent) > ... > - pci_pm_runtime_resume > ... > - pci_set_power_state > - __pci_start_power_transition > - pci_wakeup_bus(dev->parent->subordinate) > - pci_walk_bus > - device_lock(dev) <--- dead lock here > > > If we do not do device_lock in pci_walk_bus, we can avoid dead lock. > Device_lock in pci_walk_bus is introduced in commit: > d71374dafbba7ec3f67371d3b7e9f6310a588808, corresponding email thread > is: https://lkml.org/lkml/2006/5/26/38. The patch author Zhang Yanmin > said device_lock is added to pci_walk_bus because: > > Some error handling functions call pci_walk_bus. For example, PCIe > aer. Here we lock the device, so the driver wouldn't detach from the > device, as the cb might call driver's callback function. > > So I fixed the dead lock as follow: > > - remove device_lock from pci_walk_bus > - add device_lock into callback if callback will call driver's callback > > I checked pci_walk_bus users one by one, and found only PCIe aer needs > device lock. > > Signed-off-by: Huang Ying <ying.huang@intel.com> > Cc: Zhang Yanmin <yanmin.zhang@intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > drivers/pci/bus.c | 3 --- > drivers/pci/pcie/aer/aerdrv_core.c | 20 ++++++++++++++++---- > 2 files changed, 16 insertions(+), 7 deletions(-) > > --- a/drivers/pci/bus.c > +++ b/drivers/pci/bus.c > @@ -320,10 +320,7 @@ void pci_walk_bus(struct pci_bus *top, i > } else > next = dev->bus_list.next; > > - /* Run device routines with the device locked */ > - device_lock(&dev->dev); > retval = cb(dev, userdata); > - device_unlock(&dev->dev); > if (retval) > break; > } > --- a/drivers/pci/pcie/aer/aerdrv_core.c > +++ b/drivers/pci/pcie/aer/aerdrv_core.c > @@ -213,6 +213,7 @@ static int report_error_detected(struct > struct aer_broadcast_data *result_data; > result_data = (struct aer_broadcast_data *) data; > > + device_lock(&dev->dev); > dev->error_state = result_data->state; > > if (!dev->driver || > @@ -231,12 +232,14 @@ static int report_error_detected(struct > dev->driver ? > "no AER-aware driver" : "no driver"); > } > - return 0; > + goto out; > } > > err_handler = dev->driver->err_handler; > vote = err_handler->error_detected(dev, result_data->state); > result_data->result = merge_result(result_data->result, vote); > +out: > + device_unlock(&dev->dev); > return 0; > } > > @@ -247,14 +250,17 @@ static int report_mmio_enabled(struct pc > struct aer_broadcast_data *result_data; > result_data = (struct aer_broadcast_data *) data; > > + device_lock(&dev->dev); > if (!dev->driver || > !dev->driver->err_handler || > !dev->driver->err_handler->mmio_enabled) > - return 0; > + goto out; > > err_handler = dev->driver->err_handler; > vote = err_handler->mmio_enabled(dev); > result_data->result = merge_result(result_data->result, vote); > +out: > + device_unlock(&dev->dev); > return 0; > } > > @@ -265,14 +271,17 @@ static int report_slot_reset(struct pci_ > struct aer_broadcast_data *result_data; > result_data = (struct aer_broadcast_data *) data; > > + device_lock(&dev->dev); > if (!dev->driver || > !dev->driver->err_handler || > !dev->driver->err_handler->slot_reset) > - return 0; > + goto out; > > err_handler = dev->driver->err_handler; > vote = err_handler->slot_reset(dev); > result_data->result = merge_result(result_data->result, vote); > +out: > + device_unlock(&dev->dev); > return 0; > } > > @@ -280,15 +289,18 @@ static int report_resume(struct pci_dev > { > const struct pci_error_handlers *err_handler; > > + device_lock(&dev->dev); > dev->error_state = pci_channel_io_normal; > > if (!dev->driver || > !dev->driver->err_handler || > !dev->driver->err_handler->resume) > - return 0; > + goto out; > > err_handler = dev->driver->err_handler; > err_handler->resume(dev); > +out: > + device_unlock(&dev->dev); > return 0; > } > >
On Wed, Oct 24, 2012 at 12:54 AM, Huang Ying <ying.huang@intel.com> wrote: > If a PCI device and its parents are put into D3cold, unbinding the > device will trigger deadlock as follow: > > - driver_unbind > - device_release_driver > - device_lock(dev) <--- previous lock here > - __device_release_driver > - pm_runtime_get_sync > ... > - rpm_resume(dev) > - rpm_resume(dev->parent) > ... > - pci_pm_runtime_resume > ... > - pci_set_power_state > - __pci_start_power_transition > - pci_wakeup_bus(dev->parent->subordinate) > - pci_walk_bus > - device_lock(dev) <--- dead lock here > > > If we do not do device_lock in pci_walk_bus, we can avoid dead lock. > Device_lock in pci_walk_bus is introduced in commit: > d71374dafbba7ec3f67371d3b7e9f6310a588808, corresponding email thread > is: https://lkml.org/lkml/2006/5/26/38. The patch author Zhang Yanmin > said device_lock is added to pci_walk_bus because: > > Some error handling functions call pci_walk_bus. For example, PCIe > aer. Here we lock the device, so the driver wouldn't detach from the > device, as the cb might call driver's callback function. > > So I fixed the dead lock as follow: > > - remove device_lock from pci_walk_bus > - add device_lock into callback if callback will call driver's callback > > I checked pci_walk_bus users one by one, and found only PCIe aer needs > device lock. Is there a problem report or bugzilla for this issue? > Signed-off-by: Huang Ying <ying.huang@intel.com> > Cc: Zhang Yanmin <yanmin.zhang@intel.com> Should this go to stable as well? The D3cold support appeared in v3.6, so my guess is that this fix could go to v3.6.x. > --- > drivers/pci/bus.c | 3 --- > drivers/pci/pcie/aer/aerdrv_core.c | 20 ++++++++++++++++---- > 2 files changed, 16 insertions(+), 7 deletions(-) > > --- a/drivers/pci/bus.c > +++ b/drivers/pci/bus.c > @@ -320,10 +320,7 @@ void pci_walk_bus(struct pci_bus *top, i > } else > next = dev->bus_list.next; > > - /* Run device routines with the device locked */ > - device_lock(&dev->dev); > retval = cb(dev, userdata); > - device_unlock(&dev->dev); > if (retval) > break; > } > --- a/drivers/pci/pcie/aer/aerdrv_core.c > +++ b/drivers/pci/pcie/aer/aerdrv_core.c > @@ -213,6 +213,7 @@ static int report_error_detected(struct > struct aer_broadcast_data *result_data; > result_data = (struct aer_broadcast_data *) data; > > + device_lock(&dev->dev); > dev->error_state = result_data->state; > > if (!dev->driver || > @@ -231,12 +232,14 @@ static int report_error_detected(struct > dev->driver ? > "no AER-aware driver" : "no driver"); > } > - return 0; > + goto out; > } > > err_handler = dev->driver->err_handler; > vote = err_handler->error_detected(dev, result_data->state); > result_data->result = merge_result(result_data->result, vote); > +out: > + device_unlock(&dev->dev); > return 0; > } > > @@ -247,14 +250,17 @@ static int report_mmio_enabled(struct pc > struct aer_broadcast_data *result_data; > result_data = (struct aer_broadcast_data *) data; > > + device_lock(&dev->dev); > if (!dev->driver || > !dev->driver->err_handler || > !dev->driver->err_handler->mmio_enabled) > - return 0; > + goto out; > > err_handler = dev->driver->err_handler; > vote = err_handler->mmio_enabled(dev); > result_data->result = merge_result(result_data->result, vote); > +out: > + device_unlock(&dev->dev); > return 0; > } > > @@ -265,14 +271,17 @@ static int report_slot_reset(struct pci_ > struct aer_broadcast_data *result_data; > result_data = (struct aer_broadcast_data *) data; > > + device_lock(&dev->dev); > if (!dev->driver || > !dev->driver->err_handler || > !dev->driver->err_handler->slot_reset) > - return 0; > + goto out; > > err_handler = dev->driver->err_handler; > vote = err_handler->slot_reset(dev); > result_data->result = merge_result(result_data->result, vote); > +out: > + device_unlock(&dev->dev); > return 0; > } > > @@ -280,15 +289,18 @@ static int report_resume(struct pci_dev > { > const struct pci_error_handlers *err_handler; > > + device_lock(&dev->dev); > dev->error_state = pci_channel_io_normal; > > if (!dev->driver || > !dev->driver->err_handler || > !dev->driver->err_handler->resume) > - return 0; > + goto out; > > err_handler = dev->driver->err_handler; > err_handler->resume(dev); > +out: > + device_unlock(&dev->dev); > return 0; > } > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday, November 02, 2012 10:50:50 AM Bjorn Helgaas wrote: > On Wed, Oct 24, 2012 at 12:54 AM, Huang Ying <ying.huang@intel.com> wrote: > > If a PCI device and its parents are put into D3cold, unbinding the > > device will trigger deadlock as follow: > > > > - driver_unbind > > - device_release_driver > > - device_lock(dev) <--- previous lock here > > - __device_release_driver > > - pm_runtime_get_sync > > ... > > - rpm_resume(dev) > > - rpm_resume(dev->parent) > > ... > > - pci_pm_runtime_resume > > ... > > - pci_set_power_state > > - __pci_start_power_transition > > - pci_wakeup_bus(dev->parent->subordinate) > > - pci_walk_bus > > - device_lock(dev) <--- dead lock here > > > > > > If we do not do device_lock in pci_walk_bus, we can avoid dead lock. > > Device_lock in pci_walk_bus is introduced in commit: > > d71374dafbba7ec3f67371d3b7e9f6310a588808, corresponding email thread > > is: https://lkml.org/lkml/2006/5/26/38. The patch author Zhang Yanmin > > said device_lock is added to pci_walk_bus because: > > > > Some error handling functions call pci_walk_bus. For example, PCIe > > aer. Here we lock the device, so the driver wouldn't detach from the > > device, as the cb might call driver's callback function. > > > > So I fixed the dead lock as follow: > > > > - remove device_lock from pci_walk_bus > > - add device_lock into callback if callback will call driver's callback > > > > I checked pci_walk_bus users one by one, and found only PCIe aer needs > > device lock. > > Is there a problem report or bugzilla for this issue? > > > Signed-off-by: Huang Ying <ying.huang@intel.com> > > Cc: Zhang Yanmin <yanmin.zhang@intel.com> > > Should this go to stable as well? The D3cold support appeared in > v3.6, so my guess is that this fix could go to v3.6.x. That's correct. Thanks, Rafael > > --- > > drivers/pci/bus.c | 3 --- > > drivers/pci/pcie/aer/aerdrv_core.c | 20 ++++++++++++++++---- > > 2 files changed, 16 insertions(+), 7 deletions(-) > > > > --- a/drivers/pci/bus.c > > +++ b/drivers/pci/bus.c > > @@ -320,10 +320,7 @@ void pci_walk_bus(struct pci_bus *top, i > > } else > > next = dev->bus_list.next; > > > > - /* Run device routines with the device locked */ > > - device_lock(&dev->dev); > > retval = cb(dev, userdata); > > - device_unlock(&dev->dev); > > if (retval) > > break; > > } > > --- a/drivers/pci/pcie/aer/aerdrv_core.c > > +++ b/drivers/pci/pcie/aer/aerdrv_core.c > > @@ -213,6 +213,7 @@ static int report_error_detected(struct > > struct aer_broadcast_data *result_data; > > result_data = (struct aer_broadcast_data *) data; > > > > + device_lock(&dev->dev); > > dev->error_state = result_data->state; > > > > if (!dev->driver || > > @@ -231,12 +232,14 @@ static int report_error_detected(struct > > dev->driver ? > > "no AER-aware driver" : "no driver"); > > } > > - return 0; > > + goto out; > > } > > > > err_handler = dev->driver->err_handler; > > vote = err_handler->error_detected(dev, result_data->state); > > result_data->result = merge_result(result_data->result, vote); > > +out: > > + device_unlock(&dev->dev); > > return 0; > > } > > > > @@ -247,14 +250,17 @@ static int report_mmio_enabled(struct pc > > struct aer_broadcast_data *result_data; > > result_data = (struct aer_broadcast_data *) data; > > > > + device_lock(&dev->dev); > > if (!dev->driver || > > !dev->driver->err_handler || > > !dev->driver->err_handler->mmio_enabled) > > - return 0; > > + goto out; > > > > err_handler = dev->driver->err_handler; > > vote = err_handler->mmio_enabled(dev); > > result_data->result = merge_result(result_data->result, vote); > > +out: > > + device_unlock(&dev->dev); > > return 0; > > } > > > > @@ -265,14 +271,17 @@ static int report_slot_reset(struct pci_ > > struct aer_broadcast_data *result_data; > > result_data = (struct aer_broadcast_data *) data; > > > > + device_lock(&dev->dev); > > if (!dev->driver || > > !dev->driver->err_handler || > > !dev->driver->err_handler->slot_reset) > > - return 0; > > + goto out; > > > > err_handler = dev->driver->err_handler; > > vote = err_handler->slot_reset(dev); > > result_data->result = merge_result(result_data->result, vote); > > +out: > > + device_unlock(&dev->dev); > > return 0; > > } > > > > @@ -280,15 +289,18 @@ static int report_resume(struct pci_dev > > { > > const struct pci_error_handlers *err_handler; > > > > + device_lock(&dev->dev); > > dev->error_state = pci_channel_io_normal; > > > > if (!dev->driver || > > !dev->driver->err_handler || > > !dev->driver->err_handler->resume) > > - return 0; > > + goto out; > > > > err_handler = dev->driver->err_handler; > > err_handler->resume(dev); > > +out: > > + device_unlock(&dev->dev); > > return 0; > > } > > >
On Fri, 2012-11-02 at 10:50 -0600, Bjorn Helgaas wrote: > On Wed, Oct 24, 2012 at 12:54 AM, Huang Ying <ying.huang@intel.com> wrote: > > If a PCI device and its parents are put into D3cold, unbinding the > > device will trigger deadlock as follow: > > > > - driver_unbind > > - device_release_driver > > - device_lock(dev) <--- previous lock here > > - __device_release_driver > > - pm_runtime_get_sync > > ... > > - rpm_resume(dev) > > - rpm_resume(dev->parent) > > ... > > - pci_pm_runtime_resume > > ... > > - pci_set_power_state > > - __pci_start_power_transition > > - pci_wakeup_bus(dev->parent->subordinate) > > - pci_walk_bus > > - device_lock(dev) <--- dead lock here > > > > > > If we do not do device_lock in pci_walk_bus, we can avoid dead lock. > > Device_lock in pci_walk_bus is introduced in commit: > > d71374dafbba7ec3f67371d3b7e9f6310a588808, corresponding email thread > > is: https://lkml.org/lkml/2006/5/26/38. The patch author Zhang Yanmin > > said device_lock is added to pci_walk_bus because: > > > > Some error handling functions call pci_walk_bus. For example, PCIe > > aer. Here we lock the device, so the driver wouldn't detach from the > > device, as the cb might call driver's callback function. > > > > So I fixed the dead lock as follow: > > > > - remove device_lock from pci_walk_bus > > - add device_lock into callback if callback will call driver's callback > > > > I checked pci_walk_bus users one by one, and found only PCIe aer needs > > device lock. > > Is there a problem report or bugzilla for this issue? No. I found this issue when I developed kexec fixing. Best Regards, Huang Ying > > Signed-off-by: Huang Ying <ying.huang@intel.com> > > Cc: Zhang Yanmin <yanmin.zhang@intel.com> > > Should this go to stable as well? The D3cold support appeared in > v3.6, so my guess is that this fix could go to v3.6.x. > > > --- > > drivers/pci/bus.c | 3 --- > > drivers/pci/pcie/aer/aerdrv_core.c | 20 ++++++++++++++++---- > > 2 files changed, 16 insertions(+), 7 deletions(-) > > > > --- a/drivers/pci/bus.c > > +++ b/drivers/pci/bus.c > > @@ -320,10 +320,7 @@ void pci_walk_bus(struct pci_bus *top, i > > } else > > next = dev->bus_list.next; > > > > - /* Run device routines with the device locked */ > > - device_lock(&dev->dev); > > retval = cb(dev, userdata); > > - device_unlock(&dev->dev); > > if (retval) > > break; > > } > > --- a/drivers/pci/pcie/aer/aerdrv_core.c > > +++ b/drivers/pci/pcie/aer/aerdrv_core.c > > @@ -213,6 +213,7 @@ static int report_error_detected(struct > > struct aer_broadcast_data *result_data; > > result_data = (struct aer_broadcast_data *) data; > > > > + device_lock(&dev->dev); > > dev->error_state = result_data->state; > > > > if (!dev->driver || > > @@ -231,12 +232,14 @@ static int report_error_detected(struct > > dev->driver ? > > "no AER-aware driver" : "no driver"); > > } > > - return 0; > > + goto out; > > } > > > > err_handler = dev->driver->err_handler; > > vote = err_handler->error_detected(dev, result_data->state); > > result_data->result = merge_result(result_data->result, vote); > > +out: > > + device_unlock(&dev->dev); > > return 0; > > } > > > > @@ -247,14 +250,17 @@ static int report_mmio_enabled(struct pc > > struct aer_broadcast_data *result_data; > > result_data = (struct aer_broadcast_data *) data; > > > > + device_lock(&dev->dev); > > if (!dev->driver || > > !dev->driver->err_handler || > > !dev->driver->err_handler->mmio_enabled) > > - return 0; > > + goto out; > > > > err_handler = dev->driver->err_handler; > > vote = err_handler->mmio_enabled(dev); > > result_data->result = merge_result(result_data->result, vote); > > +out: > > + device_unlock(&dev->dev); > > return 0; > > } > > > > @@ -265,14 +271,17 @@ static int report_slot_reset(struct pci_ > > struct aer_broadcast_data *result_data; > > result_data = (struct aer_broadcast_data *) data; > > > > + device_lock(&dev->dev); > > if (!dev->driver || > > !dev->driver->err_handler || > > !dev->driver->err_handler->slot_reset) > > - return 0; > > + goto out; > > > > err_handler = dev->driver->err_handler; > > vote = err_handler->slot_reset(dev); > > result_data->result = merge_result(result_data->result, vote); > > +out: > > + device_unlock(&dev->dev); > > return 0; > > } > > > > @@ -280,15 +289,18 @@ static int report_resume(struct pci_dev > > { > > const struct pci_error_handlers *err_handler; > > > > + device_lock(&dev->dev); > > dev->error_state = pci_channel_io_normal; > > > > if (!dev->driver || > > !dev->driver->err_handler || > > !dev->driver->err_handler->resume) > > - return 0; > > + goto out; > > > > err_handler = dev->driver->err_handler; > > err_handler->resume(dev); > > +out: > > + device_unlock(&dev->dev); > > return 0; > > } > > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Nov 2, 2012 at 11:06 PM, Huang Ying <ying.huang@intel.com> wrote: > On Fri, 2012-11-02 at 10:50 -0600, Bjorn Helgaas wrote: >> On Wed, Oct 24, 2012 at 12:54 AM, Huang Ying <ying.huang@intel.com> wrote: >> > If a PCI device and its parents are put into D3cold, unbinding the >> > device will trigger deadlock as follow: >> > >> > - driver_unbind >> > - device_release_driver >> > - device_lock(dev) <--- previous lock here >> > - __device_release_driver >> > - pm_runtime_get_sync >> > ... >> > - rpm_resume(dev) >> > - rpm_resume(dev->parent) >> > ... >> > - pci_pm_runtime_resume >> > ... >> > - pci_set_power_state >> > - __pci_start_power_transition >> > - pci_wakeup_bus(dev->parent->subordinate) >> > - pci_walk_bus >> > - device_lock(dev) <--- dead lock here >> > >> > >> > If we do not do device_lock in pci_walk_bus, we can avoid dead lock. >> > Device_lock in pci_walk_bus is introduced in commit: >> > d71374dafbba7ec3f67371d3b7e9f6310a588808, corresponding email thread >> > is: https://lkml.org/lkml/2006/5/26/38. The patch author Zhang Yanmin >> > said device_lock is added to pci_walk_bus because: >> > >> > Some error handling functions call pci_walk_bus. For example, PCIe >> > aer. Here we lock the device, so the driver wouldn't detach from the >> > device, as the cb might call driver's callback function. >> > >> > So I fixed the dead lock as follow: >> > >> > - remove device_lock from pci_walk_bus >> > - add device_lock into callback if callback will call driver's callback >> > >> > I checked pci_walk_bus users one by one, and found only PCIe aer needs >> > device lock. >> >> Is there a problem report or bugzilla for this issue? > > No. I found this issue when I developed kexec fixing. > > Best Regards, > Huang Ying > >> > Signed-off-by: Huang Ying <ying.huang@intel.com> >> > Cc: Zhang Yanmin <yanmin.zhang@intel.com> >> >> Should this go to stable as well? The D3cold support appeared in >> v3.6, so my guess is that this fix could go to v3.6.x. What about the stable tree? >> > --- >> > drivers/pci/bus.c | 3 --- >> > drivers/pci/pcie/aer/aerdrv_core.c | 20 ++++++++++++++++---- >> > 2 files changed, 16 insertions(+), 7 deletions(-) >> > >> > --- a/drivers/pci/bus.c >> > +++ b/drivers/pci/bus.c >> > @@ -320,10 +320,7 @@ void pci_walk_bus(struct pci_bus *top, i >> > } else >> > next = dev->bus_list.next; >> > >> > - /* Run device routines with the device locked */ >> > - device_lock(&dev->dev); >> > retval = cb(dev, userdata); >> > - device_unlock(&dev->dev); >> > if (retval) >> > break; >> > } >> > --- a/drivers/pci/pcie/aer/aerdrv_core.c >> > +++ b/drivers/pci/pcie/aer/aerdrv_core.c >> > @@ -213,6 +213,7 @@ static int report_error_detected(struct >> > struct aer_broadcast_data *result_data; >> > result_data = (struct aer_broadcast_data *) data; >> > >> > + device_lock(&dev->dev); >> > dev->error_state = result_data->state; >> > >> > if (!dev->driver || >> > @@ -231,12 +232,14 @@ static int report_error_detected(struct >> > dev->driver ? >> > "no AER-aware driver" : "no driver"); >> > } >> > - return 0; >> > + goto out; >> > } >> > >> > err_handler = dev->driver->err_handler; >> > vote = err_handler->error_detected(dev, result_data->state); >> > result_data->result = merge_result(result_data->result, vote); >> > +out: >> > + device_unlock(&dev->dev); >> > return 0; >> > } >> > >> > @@ -247,14 +250,17 @@ static int report_mmio_enabled(struct pc >> > struct aer_broadcast_data *result_data; >> > result_data = (struct aer_broadcast_data *) data; >> > >> > + device_lock(&dev->dev); >> > if (!dev->driver || >> > !dev->driver->err_handler || >> > !dev->driver->err_handler->mmio_enabled) >> > - return 0; >> > + goto out; >> > >> > err_handler = dev->driver->err_handler; >> > vote = err_handler->mmio_enabled(dev); >> > result_data->result = merge_result(result_data->result, vote); >> > +out: >> > + device_unlock(&dev->dev); >> > return 0; >> > } >> > >> > @@ -265,14 +271,17 @@ static int report_slot_reset(struct pci_ >> > struct aer_broadcast_data *result_data; >> > result_data = (struct aer_broadcast_data *) data; >> > >> > + device_lock(&dev->dev); >> > if (!dev->driver || >> > !dev->driver->err_handler || >> > !dev->driver->err_handler->slot_reset) >> > - return 0; >> > + goto out; >> > >> > err_handler = dev->driver->err_handler; >> > vote = err_handler->slot_reset(dev); >> > result_data->result = merge_result(result_data->result, vote); >> > +out: >> > + device_unlock(&dev->dev); >> > return 0; >> > } >> > >> > @@ -280,15 +289,18 @@ static int report_resume(struct pci_dev >> > { >> > const struct pci_error_handlers *err_handler; >> > >> > + device_lock(&dev->dev); >> > dev->error_state = pci_channel_io_normal; >> > >> > if (!dev->driver || >> > !dev->driver->err_handler || >> > !dev->driver->err_handler->resume) >> > - return 0; >> > + goto out; >> > >> > err_handler = dev->driver->err_handler; >> > err_handler->resume(dev); >> > +out: >> > + device_unlock(&dev->dev); >> > return 0; >> > } >> > > > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, 2012-11-03 at 11:22 -0600, Bjorn Helgaas wrote: > On Fri, Nov 2, 2012 at 11:06 PM, Huang Ying <ying.huang@intel.com> wrote: > > On Fri, 2012-11-02 at 10:50 -0600, Bjorn Helgaas wrote: > >> On Wed, Oct 24, 2012 at 12:54 AM, Huang Ying <ying.huang@intel.com> wrote: > >> > If a PCI device and its parents are put into D3cold, unbinding the > >> > device will trigger deadlock as follow: > >> > > >> > - driver_unbind > >> > - device_release_driver > >> > - device_lock(dev) <--- previous lock here > >> > - __device_release_driver > >> > - pm_runtime_get_sync > >> > ... > >> > - rpm_resume(dev) > >> > - rpm_resume(dev->parent) > >> > ... > >> > - pci_pm_runtime_resume > >> > ... > >> > - pci_set_power_state > >> > - __pci_start_power_transition > >> > - pci_wakeup_bus(dev->parent->subordinate) > >> > - pci_walk_bus > >> > - device_lock(dev) <--- dead lock here > >> > > >> > > >> > If we do not do device_lock in pci_walk_bus, we can avoid dead lock. > >> > Device_lock in pci_walk_bus is introduced in commit: > >> > d71374dafbba7ec3f67371d3b7e9f6310a588808, corresponding email thread > >> > is: https://lkml.org/lkml/2006/5/26/38. The patch author Zhang Yanmin > >> > said device_lock is added to pci_walk_bus because: > >> > > >> > Some error handling functions call pci_walk_bus. For example, PCIe > >> > aer. Here we lock the device, so the driver wouldn't detach from the > >> > device, as the cb might call driver's callback function. > >> > > >> > So I fixed the dead lock as follow: > >> > > >> > - remove device_lock from pci_walk_bus > >> > - add device_lock into callback if callback will call driver's callback > >> > > >> > I checked pci_walk_bus users one by one, and found only PCIe aer needs > >> > device lock. > >> > >> Is there a problem report or bugzilla for this issue? > > > > No. I found this issue when I developed kexec fixing. > > > > Best Regards, > > Huang Ying > > > >> > Signed-off-by: Huang Ying <ying.huang@intel.com> > >> > Cc: Zhang Yanmin <yanmin.zhang@intel.com> > >> > >> Should this go to stable as well? The D3cold support appeared in > >> v3.6, so my guess is that this fix could go to v3.6.x. > > What about the stable tree? You are right. This fix should go to v3.6.x stable tree. Best Regards, Huang Ying > >> > --- > >> > drivers/pci/bus.c | 3 --- > >> > drivers/pci/pcie/aer/aerdrv_core.c | 20 ++++++++++++++++---- > >> > 2 files changed, 16 insertions(+), 7 deletions(-) > >> > > >> > --- a/drivers/pci/bus.c > >> > +++ b/drivers/pci/bus.c > >> > @@ -320,10 +320,7 @@ void pci_walk_bus(struct pci_bus *top, i > >> > } else > >> > next = dev->bus_list.next; > >> > > >> > - /* Run device routines with the device locked */ > >> > - device_lock(&dev->dev); > >> > retval = cb(dev, userdata); > >> > - device_unlock(&dev->dev); > >> > if (retval) > >> > break; > >> > } > >> > --- a/drivers/pci/pcie/aer/aerdrv_core.c > >> > +++ b/drivers/pci/pcie/aer/aerdrv_core.c > >> > @@ -213,6 +213,7 @@ static int report_error_detected(struct > >> > struct aer_broadcast_data *result_data; > >> > result_data = (struct aer_broadcast_data *) data; > >> > > >> > + device_lock(&dev->dev); > >> > dev->error_state = result_data->state; > >> > > >> > if (!dev->driver || > >> > @@ -231,12 +232,14 @@ static int report_error_detected(struct > >> > dev->driver ? > >> > "no AER-aware driver" : "no driver"); > >> > } > >> > - return 0; > >> > + goto out; > >> > } > >> > > >> > err_handler = dev->driver->err_handler; > >> > vote = err_handler->error_detected(dev, result_data->state); > >> > result_data->result = merge_result(result_data->result, vote); > >> > +out: > >> > + device_unlock(&dev->dev); > >> > return 0; > >> > } > >> > > >> > @@ -247,14 +250,17 @@ static int report_mmio_enabled(struct pc > >> > struct aer_broadcast_data *result_data; > >> > result_data = (struct aer_broadcast_data *) data; > >> > > >> > + device_lock(&dev->dev); > >> > if (!dev->driver || > >> > !dev->driver->err_handler || > >> > !dev->driver->err_handler->mmio_enabled) > >> > - return 0; > >> > + goto out; > >> > > >> > err_handler = dev->driver->err_handler; > >> > vote = err_handler->mmio_enabled(dev); > >> > result_data->result = merge_result(result_data->result, vote); > >> > +out: > >> > + device_unlock(&dev->dev); > >> > return 0; > >> > } > >> > > >> > @@ -265,14 +271,17 @@ static int report_slot_reset(struct pci_ > >> > struct aer_broadcast_data *result_data; > >> > result_data = (struct aer_broadcast_data *) data; > >> > > >> > + device_lock(&dev->dev); > >> > if (!dev->driver || > >> > !dev->driver->err_handler || > >> > !dev->driver->err_handler->slot_reset) > >> > - return 0; > >> > + goto out; > >> > > >> > err_handler = dev->driver->err_handler; > >> > vote = err_handler->slot_reset(dev); > >> > result_data->result = merge_result(result_data->result, vote); > >> > +out: > >> > + device_unlock(&dev->dev); > >> > return 0; > >> > } > >> > > >> > @@ -280,15 +289,18 @@ static int report_resume(struct pci_dev > >> > { > >> > const struct pci_error_handlers *err_handler; > >> > > >> > + device_lock(&dev->dev); > >> > dev->error_state = pci_channel_io_normal; > >> > > >> > if (!dev->driver || > >> > !dev->driver->err_handler || > >> > !dev->driver->err_handler->resume) > >> > - return 0; > >> > + goto out; > >> > > >> > err_handler = dev->driver->err_handler; > >> > err_handler->resume(dev); > >> > +out: > >> > + device_unlock(&dev->dev); > >> > return 0; > >> > } > >> > > > > > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Nov 3, 2012 at 9:38 PM, Huang Ying <ying.huang@intel.com> wrote: > On Sat, 2012-11-03 at 11:22 -0600, Bjorn Helgaas wrote: >> On Fri, Nov 2, 2012 at 11:06 PM, Huang Ying <ying.huang@intel.com> wrote: >> > On Fri, 2012-11-02 at 10:50 -0600, Bjorn Helgaas wrote: >> >> On Wed, Oct 24, 2012 at 12:54 AM, Huang Ying <ying.huang@intel.com> wrote: >> >> > If a PCI device and its parents are put into D3cold, unbinding the >> >> > device will trigger deadlock as follow: >> >> > >> >> > - driver_unbind >> >> > - device_release_driver >> >> > - device_lock(dev) <--- previous lock here >> >> > - __device_release_driver >> >> > - pm_runtime_get_sync >> >> > ... >> >> > - rpm_resume(dev) >> >> > - rpm_resume(dev->parent) >> >> > ... >> >> > - pci_pm_runtime_resume >> >> > ... >> >> > - pci_set_power_state >> >> > - __pci_start_power_transition >> >> > - pci_wakeup_bus(dev->parent->subordinate) >> >> > - pci_walk_bus >> >> > - device_lock(dev) <--- dead lock here >> >> > >> >> > >> >> > If we do not do device_lock in pci_walk_bus, we can avoid dead lock. >> >> > Device_lock in pci_walk_bus is introduced in commit: >> >> > d71374dafbba7ec3f67371d3b7e9f6310a588808, corresponding email thread >> >> > is: https://lkml.org/lkml/2006/5/26/38. The patch author Zhang Yanmin >> >> > said device_lock is added to pci_walk_bus because: >> >> > >> >> > Some error handling functions call pci_walk_bus. For example, PCIe >> >> > aer. Here we lock the device, so the driver wouldn't detach from the >> >> > device, as the cb might call driver's callback function. >> >> > >> >> > So I fixed the dead lock as follow: >> >> > >> >> > - remove device_lock from pci_walk_bus >> >> > - add device_lock into callback if callback will call driver's callback >> >> > >> >> > I checked pci_walk_bus users one by one, and found only PCIe aer needs >> >> > device lock. >> >> >> >> Is there a problem report or bugzilla for this issue? >> > >> > No. I found this issue when I developed kexec fixing. >> > >> > Best Regards, >> > Huang Ying >> > >> >> > Signed-off-by: Huang Ying <ying.huang@intel.com> >> >> > Cc: Zhang Yanmin <yanmin.zhang@intel.com> >> >> >> >> Should this go to stable as well? The D3cold support appeared in >> >> v3.6, so my guess is that this fix could go to v3.6.x. >> >> What about the stable tree? > > You are right. This fix should go to v3.6.x stable tree. I applied these two patches to my for-linus branch as v3.7 material. Thanks! -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
--- a/drivers/pci/bus.c +++ b/drivers/pci/bus.c @@ -320,10 +320,7 @@ void pci_walk_bus(struct pci_bus *top, i } else next = dev->bus_list.next; - /* Run device routines with the device locked */ - device_lock(&dev->dev); retval = cb(dev, userdata); - device_unlock(&dev->dev); if (retval) break; } --- a/drivers/pci/pcie/aer/aerdrv_core.c +++ b/drivers/pci/pcie/aer/aerdrv_core.c @@ -213,6 +213,7 @@ static int report_error_detected(struct struct aer_broadcast_data *result_data; result_data = (struct aer_broadcast_data *) data; + device_lock(&dev->dev); dev->error_state = result_data->state; if (!dev->driver || @@ -231,12 +232,14 @@ static int report_error_detected(struct dev->driver ? "no AER-aware driver" : "no driver"); } - return 0; + goto out; } err_handler = dev->driver->err_handler; vote = err_handler->error_detected(dev, result_data->state); result_data->result = merge_result(result_data->result, vote); +out: + device_unlock(&dev->dev); return 0; } @@ -247,14 +250,17 @@ static int report_mmio_enabled(struct pc struct aer_broadcast_data *result_data; result_data = (struct aer_broadcast_data *) data; + device_lock(&dev->dev); if (!dev->driver || !dev->driver->err_handler || !dev->driver->err_handler->mmio_enabled) - return 0; + goto out; err_handler = dev->driver->err_handler; vote = err_handler->mmio_enabled(dev); result_data->result = merge_result(result_data->result, vote); +out: + device_unlock(&dev->dev); return 0; } @@ -265,14 +271,17 @@ static int report_slot_reset(struct pci_ struct aer_broadcast_data *result_data; result_data = (struct aer_broadcast_data *) data; + device_lock(&dev->dev); if (!dev->driver || !dev->driver->err_handler || !dev->driver->err_handler->slot_reset) - return 0; + goto out; err_handler = dev->driver->err_handler; vote = err_handler->slot_reset(dev); result_data->result = merge_result(result_data->result, vote); +out: + device_unlock(&dev->dev); return 0; } @@ -280,15 +289,18 @@ static int report_resume(struct pci_dev { const struct pci_error_handlers *err_handler; + device_lock(&dev->dev); dev->error_state = pci_channel_io_normal; if (!dev->driver || !dev->driver->err_handler || !dev->driver->err_handler->resume) - return 0; + goto out; err_handler = dev->driver->err_handler; err_handler->resume(dev); +out: + device_unlock(&dev->dev); return 0; }
If a PCI device and its parents are put into D3cold, unbinding the device will trigger deadlock as follow: - driver_unbind - device_release_driver - device_lock(dev) <--- previous lock here - __device_release_driver - pm_runtime_get_sync ... - rpm_resume(dev) - rpm_resume(dev->parent) ... - pci_pm_runtime_resume ... - pci_set_power_state - __pci_start_power_transition - pci_wakeup_bus(dev->parent->subordinate) - pci_walk_bus - device_lock(dev) <--- dead lock here If we do not do device_lock in pci_walk_bus, we can avoid dead lock. Device_lock in pci_walk_bus is introduced in commit: d71374dafbba7ec3f67371d3b7e9f6310a588808, corresponding email thread is: https://lkml.org/lkml/2006/5/26/38. The patch author Zhang Yanmin said device_lock is added to pci_walk_bus because: Some error handling functions call pci_walk_bus. For example, PCIe aer. Here we lock the device, so the driver wouldn't detach from the device, as the cb might call driver's callback function. So I fixed the dead lock as follow: - remove device_lock from pci_walk_bus - add device_lock into callback if callback will call driver's callback I checked pci_walk_bus users one by one, and found only PCIe aer needs device lock. Signed-off-by: Huang Ying <ying.huang@intel.com> Cc: Zhang Yanmin <yanmin.zhang@intel.com> --- drivers/pci/bus.c | 3 --- drivers/pci/pcie/aer/aerdrv_core.c | 20 ++++++++++++++++---- 2 files changed, 16 insertions(+), 7 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html