Patchwork [RFC,v1,06/22] PCI: use a global lock to serialize PCI root bridge hotplug operations

Submitter Jiang Liu
Date Aug. 7, 2012, 4:10 p.m.
Message ID <1344355862-2726-7-git-send-email-jiang.liu@huawei.com>
Permalink /patch/175718/
State Not Applicable

Comments

Jiang Liu - Aug. 7, 2012, 4:10 p.m.
Currently there's no mechanism to protect the global pci_root_buses list
from dynamic change at runtime. That means PCI root bridge hotplug
operations, which dynamically change the pci_root_buses list, may cause
invalid memory accesses.

So introduce a global lock to serialize accesses to the pci_root_buses
list and serialize PCI host bridge hotplug operations.

Be careful: never try to acquire this global lock from PCI device drivers;
that may cause deadlocks.

Signed-off-by: Jiang Liu <liuj97@gmail.com>
---
 drivers/acpi/pci_root.c           |    8 +++++++-
 drivers/edac/i7core_edac.c        |   16 +++++++---------
 drivers/gpu/drm/drm_fops.c        |    6 +++++-
 drivers/pci/host-bridge.c         |   19 +++++++++++++++++++
 drivers/pci/hotplug/sgi_hotplug.c |    2 ++
 drivers/pci/pci-sysfs.c           |    2 ++
 drivers/pci/probe.c               |    5 ++++-
 drivers/pci/search.c              |    9 ++++++++-
 include/linux/pci.h               |    8 ++++++++
 9 files changed, 62 insertions(+), 13 deletions(-)
Bjorn Helgaas - Sept. 11, 2012, 10:57 p.m.
On Tue, Aug 7, 2012 at 10:10 AM, Jiang Liu <liuj97@gmail.com> wrote:
> Currently there's no mechanism to protect the global pci_root_buses list
> from dynamic change at runtime. That means, PCI root bridge hotplug
> operations, which dynamically change the pci_root_buses list, may cause
> invalid memory accesses.
>
> So introduce a global lock to serialize accesses to the pci_root_buses
> list and serialize PCI host bridge hotplug operations.
>
> Be careful, never try to acquire this global lock from PCI device drivers,
> that may cause deadlocks.
>
> Signed-off-by: Jiang Liu <liuj97@gmail.com>
> ---
>  drivers/acpi/pci_root.c           |    8 +++++++-
>  drivers/edac/i7core_edac.c        |   16 +++++++---------
>  drivers/gpu/drm/drm_fops.c        |    6 +++++-
>  drivers/pci/host-bridge.c         |   19 +++++++++++++++++++
>  drivers/pci/hotplug/sgi_hotplug.c |    2 ++
>  drivers/pci/pci-sysfs.c           |    2 ++
>  drivers/pci/probe.c               |    5 ++++-
>  drivers/pci/search.c              |    9 ++++++++-
>  include/linux/pci.h               |    8 ++++++++
>  9 files changed, 62 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> index 7aff631..6bd0e32 100644
> --- a/drivers/acpi/pci_root.c
> +++ b/drivers/acpi/pci_root.c
> @@ -463,6 +463,8 @@ static int __devinit acpi_pci_root_add(struct acpi_device *device)
>         if (!root)
>                 return -ENOMEM;
>
> +       pci_host_bridge_hotplug_lock();

Here's where I get lost.  This is an ACPI driver's .add() routine,
which is analogous to a PCI driver's .probe() routine.  PCI driver
.probe() routines don't need to be concerned with PCI device hotplug.
All the hotplug-related locking is handled by the PCI core, not by
individual drivers.  So why do we need it here?

I'm not suggesting that the existing locking is correct.  I'm just not
convinced this is the right way to fix it.

The commit log says we need protection for the global pci_root_buses
list.  But even with this whole series, we still traverse the list
without protection in places like pcibios_resource_survey() and
pci_assign_unassigned_resources().

Maybe we can make progress on this by identifying specific failures
that can happen in a couple of these paths, e.g., acpi_pci_root_add()
and i7core_xeon_pci_fixup().  If we look at those paths, we might find a
way to fix this in a more general fashion than throwing in lock/unlock
pairs.

It might also help to know what the rule is for when we need to use
pci_host_bridge_hotplug_lock() and pci_host_bridge_hotplug_unlock().
Apparently it is not as simple as protecting every reference to the
pci_root_buses list.

> diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
> index 123de28..f559b5b 100644
> --- a/drivers/gpu/drm/drm_fops.c
> +++ b/drivers/gpu/drm/drm_fops.c
> @@ -344,9 +344,13 @@ static int drm_open_helper(struct inode *inode, struct file *filp,
>                         pci_dev_put(pci_dev);
>                 }
>                 if (!dev->hose) {
> -                       struct pci_bus *b = pci_bus_b(pci_root_buses.next);
> +                       struct pci_bus *b;
> +
> +                       pci_host_bridge_hotplug_lock();
> +                       b = pci_find_next_bus(NULL);

Here's another case I don't understand.  We know already that
pci_find_next_bus() is unsafe with respect to hotplug because it
doesn't hold a reference on the struct pci_bus it returns.  Can't we
replace this with some variety of pci_get_next_bus() that *does*
acquire a reference?

Actually, I looked at the callers of pci_find_next_bus(), and most of
them are unsafe in an even deeper way: they're doing device setup in
initcalls, so that setup won't be done for hot-added devices.  For
example, I can pick on sba_init() because I think I wrote it back in
the dark ages.  sba_init() is a subsys_initcall that calls
sba_connect_bus() for every bus we know about at boot-time, and it
sets the host bridge's iommu pointer.  If we were to hot-add a host
bridge, we would never set the iommu pointer.

I'm not sure why you didn't add a pci_host_bridge_hotplug_lock() in
the sba_init() path, since it looks similar to the drm_open_helper()
path above.  But in any case, I think that would be the wrong thing to
do because it would fix the superficial problem while leaving the
deeper problem of host bridge hot-add not setting the iommu pointer.

>                         if (b)
>                                 dev->hose = b->sysdata;
> +                       pci_host_bridge_hotplug_unlock();
>                 }
>         }
>  #endif
...
> diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> index 993d4a0..f1147a7 100644
> --- a/drivers/pci/search.c
> +++ b/drivers/pci/search.c
> @@ -100,6 +100,13 @@ struct pci_bus * pci_find_bus(int domain, int busnr)
>   * initiated by passing %NULL as the @from argument.  Otherwise if
>   * @from is not %NULL, searches continue from next device on the
>   * global list.
> + *
> + * Please don't call this function at runtime if possible.
> + * It's designed to be called at boot time only because it's unsafe
> + * to PCI root bridge hotplug operations. But some drivers do invoke
> + * it at runtime and it's hard to fix those drivers. In such cases,
> + * use pci_host_bridge_hotplug_{lock|unlock}() to protect the PCI root
> + * bus list, but you need to be really careful to avoid deadlock.

I'm not convinced that it's too hard to fix these drivers :)  There
are only six callers, and the only ones that could possibly be at
runtime are drm_open_helper(), sn_pci_hotplug_init(), and
bus_rescan_store().

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiang Liu - Sept. 12, 2012, 3:42 p.m.
On 09/12/2012 06:57 AM, Bjorn Helgaas wrote:
> On Tue, Aug 7, 2012 at 10:10 AM, Jiang Liu <liuj97@gmail.com> wrote:
>> Currently there's no mechanism to protect the global pci_root_buses list
>> from dynamic change at runtime. That means, PCI root bridge hotplug
>> operations, which dynamically change the pci_root_buses list, may cause
>> invalid memory accesses.
>>
>> So introduce a global lock to serialize accesses to the pci_root_buses
>> list and serialize PCI host bridge hotplug operations.
>>
>> Be careful, never try to acquire this global lock from PCI device drivers,
>> that may cause deadlocks.
>>
>> Signed-off-by: Jiang Liu <liuj97@gmail.com>
>> ---
>>  drivers/acpi/pci_root.c           |    8 +++++++-
>>  drivers/edac/i7core_edac.c        |   16 +++++++---------
>>  drivers/gpu/drm/drm_fops.c        |    6 +++++-
>>  drivers/pci/host-bridge.c         |   19 +++++++++++++++++++
>>  drivers/pci/hotplug/sgi_hotplug.c |    2 ++
>>  drivers/pci/pci-sysfs.c           |    2 ++
>>  drivers/pci/probe.c               |    5 ++++-
>>  drivers/pci/search.c              |    9 ++++++++-
>>  include/linux/pci.h               |    8 ++++++++
>>  9 files changed, 62 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
>> index 7aff631..6bd0e32 100644
>> --- a/drivers/acpi/pci_root.c
>> +++ b/drivers/acpi/pci_root.c
>> @@ -463,6 +463,8 @@ static int __devinit acpi_pci_root_add(struct acpi_device *device)
>>         if (!root)
>>                 return -ENOMEM;
>>
>> +       pci_host_bridge_hotplug_lock();
> 
> Here's where I get lost.  This is an ACPI driver's .add() routine,
> which is analogous to a PCI driver's .probe() routine.  PCI driver
> .probe() routines don't need to be concerned with PCI device hotplug.
> All the hotplug-related locking is handled by the PCI core, not by
> individual drivers.  So why do we need it here?
> 
> I'm not suggesting that the existing locking is correct.  I'm just not
> convinced this is the right way to fix it.
> 
> The commit log says we need protection for the global pci_root_buses
> list.  But even with this whole series, we still traverse the list
> without protection in places like pcibios_resource_survey() and
> pci_assign_unassigned_resources().
> 
> Maybe we can make progress on this by identifying specific failures
> that can happen in a couple of these paths, e.g., acpi_pci_root_add()
> and i7core_xeon_pci_fixup().  If we look at those paths, we might find a
> way to fix this in a more general fashion than throwing in lock/unlock
> pairs.
> 
> It might also help to know what the rule is for when we need to use
> pci_host_bridge_hotplug_lock() and pci_host_bridge_hotplug_unlock().
> Apparently it is not as simple as protecting every reference to the
> pci_root_buses list.
Hi Bjorn,
	It's really challenging work to protect the pci_root_buses list:)
All evils are caused by the pci_find_next_bus() interface, which is designed
to be called at boot time only. I have tried several other solutions but
failed.
	First I tried "pci_get_next_bus()", which holds a reference to the
returned root bus "pci_bus". But that doesn't help, because pci_bus could
be removed from the pci_root_buses list even if you hold a reference to
pci_bus. And it will cause trouble when you call pci_get_next_bus(pci_bus)
again, because pci_bus->node.next is invalid by then.
	Then I tried RCU, and that also failed because callers of
pci_get_next_bus() may sleep.
	And at last I arrived at the global host bridge hotplug lock solution.
The rules for locking are:
	1) No need for locking when accessing the pci_root_buses list during
the system initialization stages. (It's system initialization rather than
driver initialization here, because a driver's initialization code may run
at runtime when the driver is loaded.) System initialization is
single-threaded, and no hotplug can happen during it.
	2) The global lock must be acquired when accessing the pci_root_buses
list at runtime.

	I have done several rounds of scanning to identify accesses to the
pci_root_buses list at runtime, but there may still be something missed:(

	I think the best solution is to get rid of pci_find_next_bus(),
but I'm not sure whether we can achieve that.

> 
>> diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
>> index 123de28..f559b5b 100644
>> --- a/drivers/gpu/drm/drm_fops.c
>> +++ b/drivers/gpu/drm/drm_fops.c
>> @@ -344,9 +344,13 @@ static int drm_open_helper(struct inode *inode, struct file *filp,
>>                         pci_dev_put(pci_dev);
>>                 }
>>                 if (!dev->hose) {
>> -                       struct pci_bus *b = pci_bus_b(pci_root_buses.next);
>> +                       struct pci_bus *b;
>> +
>> +                       pci_host_bridge_hotplug_lock();
>> +                       b = pci_find_next_bus(NULL);
> 
> Here's another case I don't understand.  We know already that
> pci_find_next_bus() is unsafe with respect to hotplug because it
> doesn't hold a reference on the struct pci_bus it returns.  Can't we
> replace this with some variety of pci_get_next_bus() that *does*
> acquire a reference?
> 
> Actually, I looked at the callers of pci_find_next_bus(), and most of
> them are unsafe in an even deeper way: they're doing device setup in
> initcalls, so that setup won't be done for hot-added devices.  For
> example, I can pick on sba_init() because I think I wrote it back in
> the dark ages.  sba_init() is a subsys_initcall that calls
> sba_connect_bus() for every bus we know about at boot-time, and it
> sets the host bridge's iommu pointer.  If we were to hot-add a host
> bridge, we would never set the iommu pointer.
That's a more fundamental issue, another big topic for us:(

> 
> I'm not sure why you didn't add a pci_host_bridge_hotplug_lock() in
> the sba_init() path, since it looks similar to the drm_open_helper()
> path above.  But in any case, I think that would be the wrong thing to
> do because it would fix the superficial problem while leaving the
> deeper problem of host bridge hot-add not setting the iommu pointer.
sba_init() is called during the system initialization stages through
subsys_initcall, so it needs no extra protection.

>>                         if (b)
>>                                 dev->hose = b->sysdata;
>> +                       pci_host_bridge_hotplug_unlock();
>>                 }
>>         }
>>  #endif
> ...
>> diff --git a/drivers/pci/search.c b/drivers/pci/search.c
>> index 993d4a0..f1147a7 100644
>> --- a/drivers/pci/search.c
>> +++ b/drivers/pci/search.c
>> @@ -100,6 +100,13 @@ struct pci_bus * pci_find_bus(int domain, int busnr)
>>   * initiated by passing %NULL as the @from argument.  Otherwise if
>>   * @from is not %NULL, searches continue from next device on the
>>   * global list.
>> + *
>> + * Please don't call this function at runtime if possible.
>> + * It's designed to be called at boot time only because it's unsafe
>> + * to PCI root bridge hotplug operations. But some drivers do invoke
>> + * it at runtime and it's hard to fix those drivers. In such cases,
>> + * use pci_host_bridge_hotplug_{lock|unlock}() to protect the PCI root
>> + * bus list, but you need to be really careful to avoid deadlock.
> 
> I'm not convinced that it's too hard to fix these drivers :)  There
> are only six callers, and the only ones that could possibly be at
> runtime are drm_open_helper(), sn_pci_hotplug_init(), and
> bus_rescan_store().
The same issue exists for i7core_xeon_pci_fixup() in the i7core_edac driver
too. I'll think about this approach.

--Gerry

> 
> Bjorn
> 

Bjorn Helgaas - Sept. 12, 2012, 4:51 p.m.
On Wed, Sep 12, 2012 at 9:42 AM, Jiang Liu <liuj97@gmail.com> wrote:
> On 09/12/2012 06:57 AM, Bjorn Helgaas wrote:
>> On Tue, Aug 7, 2012 at 10:10 AM, Jiang Liu <liuj97@gmail.com> wrote:
>>> Currently there's no mechanism to protect the global pci_root_buses list
>>> from dynamic change at runtime. That means, PCI root bridge hotplug
>>> operations, which dynamically change the pci_root_buses list, may cause
>>> invalid memory accesses.
>>>
>>> So introduce a global lock to serialize accesses to the pci_root_buses
>>> list and serialize PCI host bridge hotplug operations.

>>> @@ -463,6 +463,8 @@ static int __devinit acpi_pci_root_add(struct acpi_device *device)
>>>         if (!root)
>>>                 return -ENOMEM;
>>>
>>> +       pci_host_bridge_hotplug_lock();
>>
>> Here's where I get lost.  This is an ACPI driver's .add() routine,
>> which is analogous to a PCI driver's .probe() routine.  PCI driver
>> .probe() routines don't need to be concerned with PCI device hotplug.
>> All the hotplug-related locking is handled by the PCI core, not by
>> individual drivers.  So why do we need it here?
>>
>> I'm not suggesting that the existing locking is correct.  I'm just not
>> convinced this is the right way to fix it.
>>
>> The commit log says we need protection for the global pci_root_buses
>> list.  But even with this whole series, we still traverse the list
>> without protection in places like pcibios_resource_survey() and
>> pci_assign_unassigned_resources().
>>
>> Maybe we can make progress on this by identifying specific failures
>> that can happen in a couple of these paths, e.g., acpi_pci_root_add()
>> and i7core_xeon_pci_fixup().  If we look at those paths, we might find a
>> way to fix this in a more general fashion than throwing in lock/unlock
>> pairs.
>>
>> It might also help to know what the rule is for when we need to use
>> pci_host_bridge_hotplug_lock() and pci_host_bridge_hotplug_unlock().
>> Apparently it is not as simple as protecting every reference to the
>> pci_root_buses list.
> Hi Bjorn,
>         It's really a challenge work to protect the pci_root_buses list:)

Yes.  IIRC, your last patch was to unexport pci_root_buses, which I
think is a great idea.

> All evils are caused by the pci_find_next_bus() interface, which is designed
> to be called at boot time only. I have tried several other solutions but
> failed.
>         First I tried "pci_get_next_bus()" which holds a reference to the
> returned root bus "pci_bus". But that doesn't help because pci_bus could
> be removed from the pci_root_buses list even you hold a reference to
> pci_bus. And it will cause trouble when you call pci_get_next_bus(pci_bus)
> again because pci_bus->node.next is invalid now.

That sounds like a bug.  If an interface returns a structure after
acquiring a reference, the caller should be able to rely on the
structure remaining valid.  Adding extra locks doesn't feel like the
right solution for that problem.

In the big picture, I'm not sure how much sense all the
pci_find_bus(), pci_find_next_bus(), pci_get_bus(),
pci_get_next_bus(), etc., interfaces really make.  There really aren't
very many callers, and most of them look a bit hacky to me.  Usually
they're quirks trying to locate a device or drivers for device A
trying to locate companion device B or something similar.  I wonder if
we could figure out some entirely new interface that wouldn't involve
traversing so much of the hierarchy and therefore could be safer.

>         Then I tried RCU and also failed because caller of pci_get_next_bus()
> may sleep.
>         And at last the global host bridge hotplug lock solution. The rules
> for locking are:
>         1) No need for locking when accessing the pci_root_buses list at
> system initialization stages. (It's system initialization instead of driver
> initialization here because driver's initialization code may be called
> at runtime when loading the driver.) It's single-threaded and no hotplug
> during system initialization stages.
>         2) Should acquire the global lock when accessing the pci_root_buses
> list at runtime.
>
>         I have done several rounds of scanning to identify accessing to
> the pci_root_buses list at runtime. But there may still be something missed:(

That's part of what makes me uneasy.  We have to look at a lot of code
outside drivers/pci to analyze correctness, which is difficult.  It
would be much better if we could do something in the core, where we
only have to analyze drivers/pci.  I know this is probably much harder
and probably involves replacing or removing some of these interfaces
that cause problems.

>         I think the best solution is to get rid of the pci_find_next_bus().
> but not sure whether we could achieve that.

>> Actually, I looked at the callers of pci_find_next_bus(), and most of
>> them are unsafe in an even deeper way: they're doing device setup in
>> initcalls, so that setup won't be done for hot-added devices.  For
>> example, I can pick on sba_init() because I think I wrote it back in
>> the dark ages.  sba_init() is a subsys_initcall that calls
>> sba_connect_bus() for every bus we know about at boot-time, and it
>> sets the host bridge's iommu pointer.  If we were to hot-add a host
>> bridge, we would never set the iommu pointer.

> That's a more fundamental issue, another big topic for us:(

>> I'm not sure why you didn't add a pci_host_bridge_hotplug_lock() in
>> the sba_init() path, since it looks similar to the drm_open_helper()
>> path above.  But in any case, I think that would be the wrong thing to
>> do because it would fix the superficial problem while leaving the
>> deeper problem of host bridge hot-add not setting the iommu pointer.

> sba_init is called during system initialization stages through subsys_initcall,
> so no extra protection for it.

OK, I see your reasoning.  But I don't agree :)  All the users of an
interface should use the same locking scheme, even if they're at
boot-time where we "know" we don't need it.  It's too hard to analyze
differences, and code gets copied from one place to somewhere else
where it might not be appropriate.

Bjorn
Jiang Liu - Sept. 17, 2012, 3:51 p.m.
>>> I'm not sure why you didn't add a pci_host_bridge_hotplug_lock() in
>>> the sba_init() path, since it looks similar to the drm_open_helper()
>>> path above.  But in any case, I think that would be the wrong thing to
>>> do because it would fix the superficial problem while leaving the
>>> deeper problem of host bridge hot-add not setting the iommu pointer.
> 
>> sba_init is called during system initialization stages through subsys_initcall,
>> so no extra protection for it.
> 
> OK, I see your reasoning.  But I don't agree :)  All the users of an
> interface should use the same locking scheme, even if they're at
> boot-time where we "know" we don't need it.  It's too hard to analyze
> differences, and code gets copied from one place to somewhere else
> where it might not be appropriate.
Hi Bjorn,
	"All the users of an interface should use the same locking scheme" is a
really good design principle to follow. It's so easy for somebody to remove
the __init modifier without noticing the design limitation.
	--Gerry
Paul E. McKenney - Sept. 20, 2012, 6:49 p.m.
On Wed, Sep 12, 2012 at 10:51:05AM -0600, Bjorn Helgaas wrote:
> On Wed, Sep 12, 2012 at 9:42 AM, Jiang Liu <liuj97@gmail.com> wrote:
> > On 09/12/2012 06:57 AM, Bjorn Helgaas wrote:
> >> On Tue, Aug 7, 2012 at 10:10 AM, Jiang Liu <liuj97@gmail.com> wrote:
> >>> Currently there's no mechanism to protect the global pci_root_buses list
> >>> from dynamic change at runtime. That means, PCI root bridge hotplug
> >>> operations, which dynamically change the pci_root_buses list, may cause
> >>> invalid memory accesses.
> >>>
> >>> So introduce a global lock to serialize accesses to the pci_root_buses
> >>> list and serialize PCI host bridge hotplug operations.
> 
> >>> @@ -463,6 +463,8 @@ static int __devinit acpi_pci_root_add(struct acpi_device *device)
> >>>         if (!root)
> >>>                 return -ENOMEM;
> >>>
> >>> +       pci_host_bridge_hotplug_lock();
> >>
> >> Here's where I get lost.  This is an ACPI driver's .add() routine,
> >> which is analogous to a PCI driver's .probe() routine.  PCI driver
> >> .probe() routines don't need to be concerned with PCI device hotplug.
> >> All the hotplug-related locking is handled by the PCI core, not by
> >> individual drivers.  So why do we need it here?
> >>
> >> I'm not suggesting that the existing locking is correct.  I'm just not
> >> convinced this is the right way to fix it.
> >>
> >> The commit log says we need protection for the global pci_root_buses
> >> list.  But even with this whole series, we still traverse the list
> >> without protection in places like pcibios_resource_survey() and
> >> pci_assign_unassigned_resources().
> >>
> >> Maybe we can make progress on this by identifying specific failures
> >> that can happen in a couple of these paths, e.g., acpi_pci_root_add()
> >> and i7core_xeon_pci_fixup().  If we look at those paths, we might find a
> >> way to fix this in a more general fashion than throwing in lock/unlock
> >> pairs.
> >>
> >> It might also help to know what the rule is for when we need to use
> >> pci_host_bridge_hotplug_lock() and pci_host_bridge_hotplug_unlock().
> >> Apparently it is not as simple as protecting every reference to the
> >> pci_root_buses list.
> > Hi Bjorn,
> >         It's really a challenge work to protect the pci_root_buses list:)
> 
> Yes.  IIRC, your last patch was to unexport pci_root_buses, which I
> think is a great idea.
> 
> > All evils are caused by the pci_find_next_bus() interface, which is designed
> > to be called at boot time only. I have tried several other solutions but
> > failed.
> >         First I tried "pci_get_next_bus()" which holds a reference to the
> > returned root bus "pci_bus". But that doesn't help because pci_bus could
> > be removed from the pci_root_buses list even you hold a reference to
> > pci_bus. And it will cause trouble when you call pci_get_next_bus(pci_bus)
> > again because pci_bus->node.next is invalid now.
> 
> That sounds like a bug.  If an interface returns a structure after
> acquiring a reference, the caller should be able to rely on the
> structure remaining valid.  Adding extra locks doesn't feel like the
> right solution for that problem.
> 
> In the big picture, I'm not sure how much sense all the
> pci_find_bus(), pci_find_next_bus(), pci_get_bus(),
> pci_get_next_bus(), etc., interfaces really make.  There really aren't
> very many callers, and most of them look a bit hacky to me.  Usually
> they're quirks trying to locate a device or drivers for device A
> trying to locate companion device B or something similar.  I wonder if
> we could figure out some entirely new interface that wouldn't involve
> traversing so much of the hierarchy and therefore could be safer.
> 
> >         Then I tried RCU and also failed because caller of pci_get_next_bus()
> > may sleep.

On the unlikely off-chance that it helps, SRCU does allow sleeping
readers.

							Thanx, Paul

> >         And at last the global host bridge hotplug lock solution. The rules
> > for locking are:
> >         1) No need for locking when accessing the pci_root_buses list at
> > system initialization stages. (It's system initialization instead of driver
> > initialization here because driver's initialization code may be called
> > at runtime when loading the driver.) It's single-threaded and no hotplug
> > during system initialization stages.
> >         2) Should acquire the global lock when accessing the pci_root_buses
> > list at runtime.
> >
> >         I have done several rounds of scanning to identify accessing to
> > the pci_root_buses list at runtime. But there may still be something missed:(
> 
> That's part of what makes me uneasy.  We have to look at a lot of code
> outside drivers/pci to analyze correctness, which is difficult.  It
> would be much better if we could do something in the core, where we
> only have to analyze drivers/pci.  I know this is probably much harder
> and probably involves replacing or removing some of these interfaces
> that cause problems.
> 
> >         I think the best solution is to get rid of the pci_find_next_bus().
> > but not sure whether we could achieve that.
> 
> >> Actually, I looked at the callers of pci_find_next_bus(), and most of
> >> them are unsafe in an even deeper way: they're doing device setup in
> >> initcalls, so that setup won't be done for hot-added devices.  For
> >> example, I can pick on sba_init() because I think I wrote it back in
> >> the dark ages.  sba_init() is a subsys_initcall that calls
> >> sba_connect_bus() for every bus we know about at boot-time, and it
> >> sets the host bridge's iommu pointer.  If we were to hot-add a host
> >> bridge, we would never set the iommu pointer.
> 
> > That's a more fundamental issue, another big topic for us:(
> 
> >> I'm not sure why you didn't add a pci_host_bridge_hotplug_lock() in
> >> the sba_init() path, since it looks similar to the drm_open_helper()
> >> path above.  But in any case, I think that would be the wrong thing to
> >> do because it would fix the superficial problem while leaving the
> >> deeper problem of host bridge hot-add not setting the iommu pointer.
> 
> > sba_init is called during system initialization stages through subsys_initcall,
> > so no extra protection for it.
> 
> OK, I see your reasoning.  But I don't agree :)  All the users of an
> interface should use the same locking scheme, even if they're at
> boot-time where we "know" we don't need it.  It's too hard to analyze
> differences, and code gets copied from one place to somewhere else
> where it might not be appropriate.
> 
> Bjorn


Patch

diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index 7aff631..6bd0e32 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -463,6 +463,8 @@  static int __devinit acpi_pci_root_add(struct acpi_device *device)
 	if (!root)
 		return -ENOMEM;
 
+	pci_host_bridge_hotplug_lock();
+
 	segment = 0;
 	status = acpi_evaluate_integer(device->handle, METHOD_NAME__SEG, NULL,
 				       &segment);
@@ -516,7 +518,6 @@  static int __devinit acpi_pci_root_add(struct acpi_device *device)
 	 * TBD: Need PCI interface for enumeration/configuration of roots.
 	 */
 
-	/* TBD: Locking */
 	list_add_tail(&root->node, &acpi_pci_roots);
 
 	printk(KERN_INFO PREFIX "%s [%s] (domain %04x %pR)\n",
@@ -622,11 +623,14 @@  static int __devinit acpi_pci_root_add(struct acpi_device *device)
 	if (device->wakeup.flags.run_wake)
 		device_set_run_wake(root->bus->bridge, true);
 
+	pci_host_bridge_hotplug_unlock();
+
 	return 0;
 
 end:
 	if (!list_empty(&root->node))
 		list_del(&root->node);
+	pci_host_bridge_hotplug_unlock();
 	kfree(root);
 	return result;
 }
@@ -643,8 +647,10 @@  static int acpi_pci_root_remove(struct acpi_device *device, int type)
 {
 	struct acpi_pci_root *root = acpi_driver_data(device);
 
+	pci_host_bridge_hotplug_lock();
 	device_set_run_wake(root->bus->bridge, false);
 	pci_acpi_remove_bus_pm_notifier(device);
+	pci_host_bridge_hotplug_unlock();
 
 	kfree(root);
 	return 0;
diff --git a/drivers/edac/i7core_edac.c b/drivers/edac/i7core_edac.c
index d27778f..8e6f177 100644
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -1196,6 +1196,7 @@  static void __init i7core_xeon_pci_fixup(const struct pci_id_table *table)
 	 * aren't announced by acpi. So, we need to use a legacy scan probing
 	 * to detect them
 	 */
+	pci_host_bridge_hotplug_lock();
 	while (table && table->descr) {
 		pdev = pci_get_device(PCI_VENDOR_ID_INTEL, table->descr[0].dev_id, NULL);
 		if (unlikely(!pdev)) {
@@ -1205,19 +1206,16 @@  static void __init i7core_xeon_pci_fixup(const struct pci_id_table *table)
 		pci_dev_put(pdev);
 		table++;
 	}
+	pci_host_bridge_hotplug_unlock();
 }
 
 static unsigned i7core_pci_lastbus(void)
 {
-	int last_bus = 0, bus;
-	struct pci_bus *b = NULL;
-
-	while ((b = pci_find_next_bus(b)) != NULL) {
-		bus = b->number;
-		debugf0("Found bus %d\n", bus);
-		if (bus > last_bus)
-			last_bus = bus;
-	}
+	int last_bus = 0;
+
+	for (last_bus = 255; last_bus >= 0; last_bus--)
+		if (pci_find_bus(0, last_bus))
+			break;
 
 	debugf0("Last bus %d\n", last_bus);
 
diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
index 123de28..f559b5b 100644
--- a/drivers/gpu/drm/drm_fops.c
+++ b/drivers/gpu/drm/drm_fops.c
@@ -344,9 +344,13 @@  static int drm_open_helper(struct inode *inode, struct file *filp,
 			pci_dev_put(pci_dev);
 		}
 		if (!dev->hose) {
-			struct pci_bus *b = pci_bus_b(pci_root_buses.next);
+			struct pci_bus *b;
+
+			pci_host_bridge_hotplug_lock();
+			b = pci_find_next_bus(NULL);
 			if (b)
 				dev->hose = b->sysdata;
+			pci_host_bridge_hotplug_unlock();
 		}
 	}
 #endif
diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
index a68dc61..28d5557 100644
--- a/drivers/pci/host-bridge.c
+++ b/drivers/pci/host-bridge.c
@@ -94,3 +94,22 @@  void pcibios_bus_to_resource(struct pci_dev *dev, struct resource *res,
 	res->end = region->end + offset;
 }
 EXPORT_SYMBOL(pcibios_bus_to_resource);
+
+static DEFINE_MUTEX(pci_host_bridge_lock);
+
+/*
+ * Acquire the lock that serializes PCI host bridge hotplug operations.
+ * This must not be called from PCI device drivers, since doing so may
+ * deadlock when removing a host bridge.
+ */
+void pci_host_bridge_hotplug_lock(void)
+{
+	mutex_lock(&pci_host_bridge_lock);
+}
+EXPORT_SYMBOL(pci_host_bridge_hotplug_lock);
+
+void pci_host_bridge_hotplug_unlock(void)
+{
+	mutex_unlock(&pci_host_bridge_lock);
+}
+EXPORT_SYMBOL(pci_host_bridge_hotplug_unlock);
diff --git a/drivers/pci/hotplug/sgi_hotplug.c b/drivers/pci/hotplug/sgi_hotplug.c
index f64ca92..e0e5d13 100644
--- a/drivers/pci/hotplug/sgi_hotplug.c
+++ b/drivers/pci/hotplug/sgi_hotplug.c
@@ -690,6 +690,7 @@  static int __init sn_pci_hotplug_init(void)
 
 	INIT_LIST_HEAD(&sn_hp_list);
 
+	pci_host_bridge_hotplug_lock();
 	while ((pci_bus = pci_find_next_bus(pci_bus))) {
 		if (!pci_bus->sysdata)
 			continue;
@@ -709,6 +710,7 @@  static int __init sn_pci_hotplug_init(void)
 			break;
 		}
 	}
+	pci_host_bridge_hotplug_unlock();
 
 	return registered == 1 ? 0 : -ENODEV;
 }
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 86c63fe..99fefbe 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -295,10 +295,12 @@  static ssize_t bus_rescan_store(struct bus_type *bus, const char *buf,
 		return -EINVAL;
 
 	if (val) {
+		pci_host_bridge_hotplug_lock();
 		mutex_lock(&pci_remove_rescan_mutex);
 		while ((b = pci_find_next_bus(b)) != NULL)
 			pci_rescan_bus(b);
 		mutex_unlock(&pci_remove_rescan_mutex);
+		pci_host_bridge_hotplug_unlock();
 	}
 	return count;
 }
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ad77ae5..1f64e8d 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -23,7 +23,10 @@  struct resource busn_resource = {
 	.flags	= IORESOURCE_BUS,
 };
 
-/* Ugh.  Need to stop exporting this to modules. */
+/*
+ * Ugh.  Need to stop exporting this to modules.
+ * Protected by pci_host_bridge_hotplug_{lock,unlock}().
+ */
 LIST_HEAD(pci_root_buses);
 EXPORT_SYMBOL(pci_root_buses);
 
diff --git a/drivers/pci/search.c b/drivers/pci/search.c
index 993d4a0..f1147a7 100644
--- a/drivers/pci/search.c
+++ b/drivers/pci/search.c
@@ -100,6 +100,13 @@  struct pci_bus * pci_find_bus(int domain, int busnr)
  * initiated by passing %NULL as the @from argument.  Otherwise if
  * @from is not %NULL, searches continue from next device on the
  * global list.
+ *
+ * Please avoid calling this function at runtime if possible.
+ * It is designed to be called at boot time only, because it is unsafe
+ * with respect to PCI root bridge hotplug operations. Some drivers do
+ * invoke it at runtime and are hard to fix; in such cases, use
+ * pci_host_bridge_hotplug_{lock,unlock}() to protect the PCI root
+ * bus list, but be really careful to avoid deadlocks.
  */
 struct pci_bus * 
 pci_find_next_bus(const struct pci_bus *from)
@@ -115,6 +122,7 @@  pci_find_next_bus(const struct pci_bus *from)
 	up_read(&pci_bus_sem);
 	return b;
 }
+EXPORT_SYMBOL(pci_find_next_bus);
 
 /**
  * pci_get_slot - locate PCI device for a given PCI slot
@@ -353,7 +361,6 @@  EXPORT_SYMBOL(pci_dev_present);
 
 /* For boot time work */
 EXPORT_SYMBOL(pci_find_bus);
-EXPORT_SYMBOL(pci_find_next_bus);
 /* For everyone */
 EXPORT_SYMBOL(pci_get_device);
 EXPORT_SYMBOL(pci_get_subsys);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 21fa79e..e02f130 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -388,6 +388,8 @@  struct pci_host_bridge {
 void pci_set_host_bridge_release(struct pci_host_bridge *bridge,
 		     void (*release_fn)(struct pci_host_bridge *),
 		     void *release_data);
+void pci_host_bridge_hotplug_lock(void);
+void pci_host_bridge_hotplug_unlock(void);
 
 /*
  * The first PCI_BRIDGE_RESOURCE_NUM PCI bus resources (those that correspond
@@ -1359,6 +1361,12 @@  static inline void pci_unblock_cfg_access(struct pci_dev *dev)
 static inline struct pci_bus *pci_find_next_bus(const struct pci_bus *from)
 { return NULL; }
 
+static inline void pci_host_bridge_hotplug_lock(void)
+{ }
+
+static inline void pci_host_bridge_hotplug_unlock(void)
+{ }
+
 static inline struct pci_dev *pci_get_slot(struct pci_bus *bus,
 						unsigned int devfn)
 { return NULL; }