Patchwork [4/5] PCI/IOV: simplify code by hotplug safe pci_get_domain_bus_and_slot()

login
register
mail settings
Submitter Jiang Liu
Date Aug. 28, 2012, 3:43 p.m.
Message ID <1346168638-32724-5-git-send-email-jiang.liu@huawei.com>
Download mbox | patch
Permalink /patch/180512/
State Accepted
Headers show

Comments

Jiang Liu - Aug. 28, 2012, 3:43 p.m.
Following code has a race window between pci_find_bus() and pci_get_slot()
if PCI hotplug operation happens between them which removes the pci_bus.
So use PCI hotplug safe interface pci_get_domain_bus_and_slot() instead,
which also reduces code complexity.

struct pci_bus *pci_bus = pci_find_bus(domain, busno);
struct pci_dev *pci_dev = pci_get_slot(pci_bus, devfn);

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
---
 drivers/pci/iov.c |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)
Yinghai Lu - Sept. 20, 2012, 8:38 p.m.
On Tue, Aug 28, 2012 at 8:43 AM, Jiang Liu <liuj97@gmail.com> wrote:
> Following code has a race window between pci_find_bus() and pci_get_slot()
> if PCI hotplug operation happens between them which removes the pci_bus.
> So use PCI hotplug safe interface pci_get_domain_bus_and_slot() instead,
> which also reduces code complexity.
>
> struct pci_bus *pci_bus = pci_find_bus(domain, busno);
> struct pci_dev *pci_dev = pci_get_slot(pci_bus, devfn);
>
> Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
> ---
>  drivers/pci/iov.c |    8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index aeccc91..b0fe771 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -152,15 +152,11 @@ failed1:
>  static void virtfn_remove(struct pci_dev *dev, int id, int reset)
>  {
>         char buf[VIRTFN_ID_LEN];
> -       struct pci_bus *bus;
>         struct pci_dev *virtfn;
>         struct pci_sriov *iov = dev->sriov;
>
> -       bus = pci_find_bus(pci_domain_nr(dev->bus), virtfn_bus(dev, id));
> -       if (!bus)
> -               return;
> -
> -       virtfn = pci_get_slot(bus, virtfn_devfn(dev, id));
> +       virtfn = pci_get_domain_bus_and_slot(pci_domain_nr(dev->bus),
> +                       virtfn_bus(dev, id), virtfn_devfn(dev, id));
>         if (!virtfn)
>                 return;
>

Hi,

This one cause IOV regression, when remove bridge with pci devices under that.

in that case, VFs  are stopped before PF, so they are not in device
tree anymore.
so pci_get_domain_bus_and_slot will not find those VFs.

So the reference to PF is not released. Also vit_bus may not be released too.

So you have to rework
    pci_get_domain_bus_and_slot to make it work on pci devices get stopped only.

or just drop this from the tree.

BTW, Bjorn,
for the similar reason,  you need to apply
http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=commitdiff;h=e5a50aa3dfca1331c7c783412b1524bea06d2752

 PCI: Split out stop_bus_device and remove_bus_dev again.

 So later could use them for pci root bus hotplug support.

Also restore old behavoir: stop all at first then remove all.

-v2: only split the functions.

the reason is:

We stop all VFs at first , stop PF

before we stop PF, we can not remove VFs,

otherwise virtfn_remove does not work properly to remove reference and
virtfn bus while can not find vf.

I would like to have
 PCI: Split out stop_bus_device and remove_bus_dev again.
fold into

http://git.kernel.org/?p=linux/kernel/git/helgaas/pci.git;a=commitdiff;h=282e1d655fe7c7c2e6b0dd8166c4c6b7c2a1219b

-Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas - Sept. 20, 2012, 11:59 p.m.
On Thu, Sep 20, 2012 at 2:38 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Tue, Aug 28, 2012 at 8:43 AM, Jiang Liu <liuj97@gmail.com> wrote:
>> Following code has a race window between pci_find_bus() and pci_get_slot()
>> if PCI hotplug operation happens between them which removes the pci_bus.
>> So use PCI hotplug safe interface pci_get_domain_bus_and_slot() instead,
>> which also reduces code complexity.
>>
>> struct pci_bus *pci_bus = pci_find_bus(domain, busno);
>> struct pci_dev *pci_dev = pci_get_slot(pci_bus, devfn);
>>
>> Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
>> ---
>>  drivers/pci/iov.c |    8 ++------
>>  1 file changed, 2 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>> index aeccc91..b0fe771 100644
>> --- a/drivers/pci/iov.c
>> +++ b/drivers/pci/iov.c
>> @@ -152,15 +152,11 @@ failed1:
>>  static void virtfn_remove(struct pci_dev *dev, int id, int reset)
>>  {
>>         char buf[VIRTFN_ID_LEN];
>> -       struct pci_bus *bus;
>>         struct pci_dev *virtfn;
>>         struct pci_sriov *iov = dev->sriov;
>>
>> -       bus = pci_find_bus(pci_domain_nr(dev->bus), virtfn_bus(dev, id));
>> -       if (!bus)
>> -               return;
>> -
>> -       virtfn = pci_get_slot(bus, virtfn_devfn(dev, id));
>> +       virtfn = pci_get_domain_bus_and_slot(pci_domain_nr(dev->bus),
>> +                       virtfn_bus(dev, id), virtfn_devfn(dev, id));
>>         if (!virtfn)
>>                 return;
>>
>
> Hi,
>
> This one cause IOV regression, when remove bridge with pci devices under that.
>
> in that case, VFs  are stopped before PF, so they are not in device
> tree anymore.
> so pci_get_domain_bus_and_slot will not find those VFs.
>
> So the reference to PF is not released. Also vit_bus may not be released too.
>
> So you have to rework
>     pci_get_domain_bus_and_slot to make it work on pci devices get stopped only.
>
> or just drop this from the tree.

pci_find_bus() is a broken interface (because there's no reference
counting or safety with respect to hot-plug), and if the design
depends on it, that means the design is broken, too.  I don't think
reworking pci_get_domain_bus_and_slot() is the right answer.

It's not clear to me why we need the split between stopping and
removing devices.  That split leads to these zombie devices that have
been stopped and are no longer findable by bus_find_device() (which is
used by pci_get_domain_bus_and_slot()), but still "exist" in some
fashion until they're removed.  It's unreasonable for random PCI and
driver code to have to worry about that zombie state.

I'll revert this patch for now to fix the regression.  Of course, that
means we will still have the old problem of using the unsafe
pci_find_bus().

> BTW, Bjorn,
> for the similar reason,  you need to apply
> http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=commitdiff;h=e5a50aa3dfca1331c7c783412b1524bea06d2752
> ...
> the reason is:
>
> We stop all VFs at first , stop PF
> before we stop PF, we can not remove VFs,
>
> otherwise virtfn_remove does not work properly to remove reference and
> virtfn bus while can not find vf.

I'll apply this one, too, for now.  I put them both on the
pci/yinghai-revert-pci_find_bus-and-remove-cleanup branch.  Let me
know if that's not what you expected.

I'm not happy about either reverting Jiang's patch or splitting
stop/remove again.  It complicates the design and the code.  I'll
apply them because they're regressions, and we don't have time for
redesign before 3.7.  But I encourage you to think about how to do
this more cleanly.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiang Liu - Sept. 21, 2012, 12:02 a.m.
On 09/21/2012 07:59 AM, Bjorn Helgaas wrote:
> On Thu, Sep 20, 2012 at 2:38 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Tue, Aug 28, 2012 at 8:43 AM, Jiang Liu <liuj97@gmail.com> wrote:
>>> Following code has a race window between pci_find_bus() and pci_get_slot()
>>> if PCI hotplug operation happens between them which removes the pci_bus.
>>> So use PCI hotplug safe interface pci_get_domain_bus_and_slot() instead,
>>> which also reduces code complexity.
>>>
>>> struct pci_bus *pci_bus = pci_find_bus(domain, busno);
>>> struct pci_dev *pci_dev = pci_get_slot(pci_bus, devfn);
>>>
>>> Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
>>> ---
>>>  drivers/pci/iov.c |    8 ++------
>>>  1 file changed, 2 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>>> index aeccc91..b0fe771 100644
>>> --- a/drivers/pci/iov.c
>>> +++ b/drivers/pci/iov.c
>>> @@ -152,15 +152,11 @@ failed1:
>>>  static void virtfn_remove(struct pci_dev *dev, int id, int reset)
>>>  {
>>>         char buf[VIRTFN_ID_LEN];
>>> -       struct pci_bus *bus;
>>>         struct pci_dev *virtfn;
>>>         struct pci_sriov *iov = dev->sriov;
>>>
>>> -       bus = pci_find_bus(pci_domain_nr(dev->bus), virtfn_bus(dev, id));
>>> -       if (!bus)
>>> -               return;
>>> -
>>> -       virtfn = pci_get_slot(bus, virtfn_devfn(dev, id));
>>> +       virtfn = pci_get_domain_bus_and_slot(pci_domain_nr(dev->bus),
>>> +                       virtfn_bus(dev, id), virtfn_devfn(dev, id));
>>>         if (!virtfn)
>>>                 return;
>>>
>>
>> Hi,
>>
>> This one cause IOV regression, when remove bridge with pci devices under that.
>>
>> in that case, VFs  are stopped before PF, so they are not in device
>> tree anymore.
>> so pci_get_domain_bus_and_slot will not find those VFs.
>>
>> So the reference to PF is not released. Also vit_bus may not be released too.
>>
>> So you have to rework
>>     pci_get_domain_bus_and_slot to make it work on pci devices get stopped only.
>>
>> or just drop this from the tree.
> 
> pci_find_bus() is a broken interface (because there's no reference
> counting or safety with respect to hot-plug), and if the design
> depends on it, that means the design is broken, too.  I don't think
> reworking pci_get_domain_bus_and_slot() is the right answer.
> 
> It's not clear to me why we need the split between stopping and
> removing devices.  That split leads to these zombie devices that have
> been stopped and are no longer findable by bus_find_device() (which is
> used by pci_get_domain_bus_and_slot()), but still "exist" in some
> fashion until they're removed.  It's unreasonable for random PCI and
> driver code to have to worry about that zombie state.
> 
> I'll revert this patch for now to fix the regression.  Of course, that
> means we will still have the old problem of using the unsafe
> pci_find_bus().
Hi Bjorn,
	I'm working on to enhance unsafe calling of pci_find_bus().
	--Gerry

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu - Sept. 21, 2012, 1:51 a.m.
On Thu, Sep 20, 2012 at 4:59 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Thu, Sep 20, 2012 at 2:38 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> in that case, VFs  are stopped before PF, so they are not in device
>> tree anymore.
>> so pci_get_domain_bus_and_slot will not find those VFs.
>>
>> So the reference to PF is not released. Also vit_bus may not be released too.
>>
>> So you have to rework
>>     pci_get_domain_bus_and_slot to make it work on pci devices get stopped only.
>>
>> or just drop this from the tree.
>
> pci_find_bus() is a broken interface (because there's no reference
> counting or safety with respect to hot-plug), and if the design
> depends on it, that means the design is broken, too.  I don't think
> reworking pci_get_domain_bus_and_slot() is the right answer.
>
> It's not clear to me why we need the split between stopping and
> removing devices.  That split leads to these zombie devices that have
> been stopped and are no longer findable by bus_find_device() (which is
> used by pci_get_domain_bus_and_slot()), but still "exist" in some
> fashion until they're removed.  It's unreasonable for random PCI and
> driver code to have to worry about that zombie state.

That is not zombie state. that is driver unloaded, and not in /sys, /proc.
that pci device only can be found under bus->devices.

just like we have pci_device_add and pci_bus_add_device
or acpi add and acpi start.

Splitting stop and remove should be more clean than mixing stop and remove.

>
> I'll revert this patch for now to fix the regression.  Of course, that
> means we will still have the old problem of using the unsafe
> pci_find_bus().
>
>> BTW, Bjorn,
>> for the similar reason,  you need to apply
>> http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=commitdiff;h=e5a50aa3dfca1331c7c783412b1524bea06d2752
>> ...
>> the reason is:
>>
>> We stop all VFs at first , stop PF
>> before we stop PF, we can not remove VFs,
>>
>> otherwise virtfn_remove does not work properly to remove reference and
>> virtfn bus while can not find vf.
>
> I'll apply this one, too, for now.  I put them both on the
> pci/yinghai-revert-pci_find_bus-and-remove-cleanup branch.  Let me
> know if that's not what you expected.

Yes, they are good.

>
> I'm not happy about either reverting Jiang's patch or splitting
> stop/remove again.  It complicates the design and the code.  I'll
> apply them because they're regressions, and we don't have time for
> redesign before 3.7.  But I encourage you to think about how to do
> this more cleanly.

That will need to redesign sriov implementation.

Also that pci root bus add/start, stop/remove will need special
sequence to make ioapic
and dmar to be started early before normal pci device drivers and
stopped after normal pci device drivers.

-Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas - Sept. 21, 2012, 2:56 a.m.
On Thu, Sep 20, 2012 at 7:51 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Thu, Sep 20, 2012 at 4:59 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Thu, Sep 20, 2012 at 2:38 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>> in that case, VFs  are stopped before PF, so they are not in device
>>> tree anymore.
>>> so pci_get_domain_bus_and_slot will not find those VFs.
>>>
>>> So the reference to PF is not released. Also vit_bus may not be released too.
>>>
>>> So you have to rework
>>>     pci_get_domain_bus_and_slot to make it work on pci devices get stopped only.
>>>
>>> or just drop this from the tree.
>>
>> pci_find_bus() is a broken interface (because there's no reference
>> counting or safety with respect to hot-plug), and if the design
>> depends on it, that means the design is broken, too.  I don't think
>> reworking pci_get_domain_bus_and_slot() is the right answer.
>>
>> It's not clear to me why we need the split between stopping and
>> removing devices.  That split leads to these zombie devices that have
>> been stopped and are no longer findable by bus_find_device() (which is
>> used by pci_get_domain_bus_and_slot()), but still "exist" in some
>> fashion until they're removed.  It's unreasonable for random PCI and
>> driver code to have to worry about that zombie state.
>
> That is not zombie state. that is driver unloaded, and not in /sys, /proc.
> that pci device only can be found under bus->devices.

It doesn't matter whether we call this a "zombie state" or just refer
to it as "devices not in /sys & /proc but still in bus->devices."  The
point is that this state is not very useful, and code outside the PCI
core should not have to know that it exists.

> just like we have pci_device_add and pci_bus_add_device
> or acpi add and acpi start.

The fact that ACPI drivers have both .add() and .start() methods is
another artifact of poor design, in my opinion.  No other subsystem
has that split, as far as I know.  The ACPI split exists because of a
messed-up ACPI hotplug implementation.  That doesn't mean we should
copy it.

>> I'm not happy about either reverting Jiang's patch or splitting
>> stop/remove again.  It complicates the design and the code.  I'll
>> apply them because they're regressions, and we don't have time for
>> redesign before 3.7.  But I encourage you to think about how to do
>> this more cleanly.
>
> That will need to redesign sriov implementation.

That's right.  If we can improve the situation by redesigning, that's
what we should do.  The present situation, where we keep adding
special cases because "that's the way the rest of the system works" is
not sustainable in the long term.

> Also that pci root bus add/start, stop/remove will need special
> sequence to make ioapic
> and dmar to be started early before normal pci device drivers and
> stopped after normal pci device drivers.

This is another thing I'm curious about.  How do you handle this
situation today (before host bridge hot-add)?

The DMAR I'm not so worried about because as far as I know, there's no
such thing as a DMAR that's discovered by PCI enumeration.  We should
discover it via ACPI, and that can happen before we enumerate anything
behind a host bridge, so I don't really see any ordering problem
between the DMAR and the PCI devices that would use it.

However, I know there *are* IOAPICs that are enumerated as PCI
devices, and I don't know whether we can deduce a relationship between
the IOAPIC and the devices that use it.  Don't we have this problem
already?  I assume that even without hot-adding a host bridge, we
might discover a PCI IOAPIC that was present at boot, and we'd have to
make sure to bind a driver to it before we use any of the PCI devices
connected to it.  How does that work?

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu - Sept. 21, 2012, 6:22 a.m.
On Thu, Sep 20, 2012 at 7:56 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> This is another thing I'm curious about.  How do you handle this
> situation today (before host bridge hot-add)?
>
> The DMAR I'm not so worried about because as far as I know, there's no
> such thing as a DMAR that's discovered by PCI enumeration.  We should
> discover it via ACPI, and that can happen before we enumerate anything
> behind a host bridge, so I don't really see any ordering problem
> between the DMAR and the PCI devices that would use it.

only need to have pci devices on that root bus scanned, and current intel iommu
maintain one device scope to drhd with pointer to pci device... that
need to be fixed
too.

>
> However, I know there *are* IOAPICs that are enumerated as PCI
> devices, and I don't know whether we can deduce a relationship between
> the IOAPIC and the devices that use it.  Don't we have this problem
> already?  I assume that even without hot-adding a host bridge, we
> might discover a PCI IOAPIC that was present at boot, and we'd have to
> make sure to bind a driver to it before we use any of the PCI devices
> connected to it.  How does that work?

I converted it to acpi way to discover it, and it could handle that case.

will search _GSB and try to get pci device, if there is pci device
will try to get BAR as ioapic base.

http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=blob;f=drivers/pci/ioapic.c;h=504ca93ac692646a7754fff83a04e3d07d98f648;hb=refs/heads/for-x86-irq

something like:

static void handle_ioapic_add(acpi_handle handle, struct pci_dev **pdev,
				 u32 *pgsi_base)
{
	acpi_status status;
	unsigned long long gsb;
	struct pci_dev *dev;
	u32 gsi_base;
	int ret;
	char *type;
	struct resource r;
	struct resource *res = &r;
	char objname[64];
	struct acpi_buffer buffer = {sizeof(objname), objname};

	*pdev = NULL;
	*pgsi_base = 0;

	status = acpi_evaluate_integer(handle, "_GSB", NULL, &gsb);
	if (ACPI_FAILURE(status) || !gsb)
		return;

	dev = acpi_get_pci_dev(handle);
	if (!dev) {
		struct acpi_device_info *info;
		char *hid = NULL;

		status = acpi_get_object_info(handle, &info);
		if (ACPI_FAILURE(status))
			return;
		if (info->valid & ACPI_VALID_HID)
			hid = info->hardware_id.string;
		if (!hid || strcmp(hid, "ACPI0009")) {
			kfree(info);
			return;
		}
		kfree(info);
		memset(res, 0, sizeof(*res));
		acpi_walk_resources(handle, METHOD_NAME__CRS, setup_res, res);
		if (!res->flags)
			return;
	}

	acpi_get_name(handle, ACPI_FULL_PATHNAME, &buffer);

	gsi_base = gsb;
	type = "IOxAPIC";
	if (dev) {
		ret = pci_enable_device(dev);
		if (ret < 0)
			goto exit_put;

		pci_set_master(dev);

		if (dev->class == PCI_CLASS_SYSTEM_PIC_IOAPIC)
			type = "IOAPIC";

		if (pci_request_region(dev, 0, type))
			goto exit_disable;

		res = &dev->resource[0];
	}

	if (acpi_register_ioapic(handle, res->start, gsi_base)) {
		if (dev)
			goto exit_release;
		return;
	}

	printk(KERN_INFO "%s %s %s at %pR, GSI %u\n",
		dev ? dev_name(&dev->dev) : "", objname, type,
		res, gsi_base);

	*pdev = dev;
	*pgsi_base = gsi_base;
	return;

exit_release:
	pci_release_region(dev, 0);
exit_disable:
	pci_disable_device(dev);
exit_put:
	pci_dev_put(dev);
}
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Don Dutile - Sept. 21, 2012, 2:15 p.m.
On 09/21/2012 02:22 AM, Yinghai Lu wrote:
> On Thu, Sep 20, 2012 at 7:56 PM, Bjorn Helgaas<bhelgaas@google.com>  wrote:
>> This is another thing I'm curious about.  How do you handle this
>> situation today (before host bridge hot-add)?
>>
>> The DMAR I'm not so worried about because as far as I know, there's no
>> such thing as a DMAR that's discovered by PCI enumeration.  We should
>> discover it via ACPI, and that can happen before we enumerate anything
>> behind a host bridge, so I don't really see any ordering problem
>> between the DMAR and the PCI devices that would use it.
>
> only need to have pci devices on that root bus scanned, and current intel iommu
> maintain one device scope to drhd with pointer to pci device... that
> need to be fixed
> too.
>
translation: you have an ACPI-DMAR setup bug?  a drhd can have multiple device
scopes, one of which can be "all devices under bus X uses this IOMMU".
If (dynamic) DMARs are scanned at root hot-plug time in ACPI hot-plug,
the proper dmar-init should be completed before any PCI devs are scanned
(and put into the proper iommu domain).

>>
>> However, I know there *are* IOAPICs that are enumerated as PCI
>> devices, and I don't know whether we can deduce a relationship between
>> the IOAPIC and the devices that use it.  Don't we have this problem
>> already?  I assume that even without hot-adding a host bridge, we
>> might discover a PCI IOAPIC that was present at boot, and we'd have to
>> make sure to bind a driver to it before we use any of the PCI devices
>> connected to it.  How does that work?
>
> I converted it to acpi way to discover it, and it could handle that case.
>
> will search _GSB and try to get pci device, if there is pci device
> will try to get BAR as ioapic base.
>
> http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=blob;f=drivers/pci/ioapic.c;h=504ca93ac692646a7754fff83a04e3d07d98f648;hb=refs/heads/for-x86-irq
>
> something like:
>
> static void handle_ioapic_add(acpi_handle handle, struct pci_dev **pdev,
> 				 u32 *pgsi_base)
> {
> 	acpi_status status;
> 	unsigned long long gsb;
> 	struct pci_dev *dev;
> 	u32 gsi_base;
> 	int ret;
> 	char *type;
> 	struct resource r;
> 	struct resource *res =&r;
> 	char objname[64];
> 	struct acpi_buffer buffer = {sizeof(objname), objname};
>
> 	*pdev = NULL;
> 	*pgsi_base = 0;
>
> 	status = acpi_evaluate_integer(handle, "_GSB", NULL,&gsb);
> 	if (ACPI_FAILURE(status) || !gsb)
> 		return;
>
> 	dev = acpi_get_pci_dev(handle);
> 	if (!dev) {
> 		struct acpi_device_info *info;
> 		char *hid = NULL;
>
> 		status = acpi_get_object_info(handle,&info);
> 		if (ACPI_FAILURE(status))
> 			return;
> 		if (info->valid&  ACPI_VALID_HID)
> 			hid = info->hardware_id.string;
> 		if (!hid || strcmp(hid, "ACPI0009")) {
> 			kfree(info);
> 			return;
> 		}
> 		kfree(info);
> 		memset(res, 0, sizeof(*res));
> 		acpi_walk_resources(handle, METHOD_NAME__CRS, setup_res, res);
> 		if (!res->flags)
> 			return;
> 	}
>
> 	acpi_get_name(handle, ACPI_FULL_PATHNAME,&buffer);
>
> 	gsi_base = gsb;
> 	type = "IOxAPIC";
> 	if (dev) {
> 		ret = pci_enable_device(dev);
> 		if (ret<  0)
> 			goto exit_put;
>
> 		pci_set_master(dev);
>
> 		if (dev->class == PCI_CLASS_SYSTEM_PIC_IOAPIC)
> 			type = "IOAPIC";
>
> 		if (pci_request_region(dev, 0, type))
> 			goto exit_disable;
>
> 		res =&dev->resource[0];
> 	}
>
> 	if (acpi_register_ioapic(handle, res->start, gsi_base)) {
> 		if (dev)
> 			goto exit_release;
> 		return;
> 	}
>
> 	printk(KERN_INFO "%s %s %s at %pR, GSI %u\n",
> 		dev ? dev_name(&dev->dev) : "", objname, type,
> 		res, gsi_base);
>
> 	*pdev = dev;
> 	*pgsi_base = gsi_base;
> 	return;
>
> exit_release:
> 	pci_release_region(dev, 0);
> exit_disable:
> 	pci_disable_device(dev);
> exit_put:
> 	pci_dev_put(dev);
> }

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu - Sept. 21, 2012, 3:11 p.m.
On Fri, Sep 21, 2012 at 7:15 AM, Don Dutile <ddutile@redhat.com> wrote:
> On 09/21/2012 02:22 AM, Yinghai Lu wrote:
>>
>> On Thu, Sep 20, 2012 at 7:56 PM, Bjorn Helgaas<bhelgaas@google.com>
>> wrote:
>>>
>>> This is another thing I'm curious about.  How do you handle this
>>> situation today (before host bridge hot-add)?
>>>
>>> The DMAR I'm not so worried about because as far as I know, there's no
>>> such thing as a DMAR that's discovered by PCI enumeration.  We should
>>> discover it via ACPI, and that can happen before we enumerate anything
>>> behind a host bridge, so I don't really see any ordering problem
>>> between the DMAR and the PCI devices that would use it.
>>
>>
>> only need to have pci devices on that root bus scanned, and current intel
>> iommu
>> maintain one device scope to drhd with pointer to pci device... that
>> need to be fixed
>> too.
>>
> translation: you have an ACPI-DMAR setup bug?  a drhd can have multiple
> device
> scopes, one of which can be "all devices under bus X uses this IOMMU".
> If (dynamic) DMARs are scanned at root hot-plug time in ACPI hot-plug,
> the proper dmar-init should be completed before any PCI devs are scanned
> (and put into the proper iommu domain).

if you can check for-iommu branch,

http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=shortlog;h=refs/heads/for-iommu
current implementation:

after root bus plugged in,
drivers/acpi/pci_root_hp.c::handle_root_bridge_insert() ==>
acpi_bus_add ==> drivers/acpi/pci_root.c::acpi_pci_root_add ==>
arch/x86/pci/acpi.c::pci_acpi_scan_root
so all pci devices get scanned.

later
handl_root_bridge_insert ==>acpi_bus_add ==>
drivers/acpi/pci_root.c::acpi_pci_root_start==>
pcibios_reserouce_survey_bus/pci_assign_unassigned_bus_resource
and call iommu acpi_pci_driver .add aka
drivers/iommu/dmar.c::acpi_pci_iommu_add ==> register_iommu ==>
handle_iommu_add ==> dmar_parse_dev ==> dmar_parse_dev_scope ==>
dmar_parse_one_dev_scope

that dmar_parse_one_dev_scope will find pci device pointer and put it
back into dmaru device pointer array.

in the boot path, it the same, dev_scope is parsed later after pci
devices get scanned.

We really should remove that cache array for device pointer...

-Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index aeccc91..b0fe771 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -152,15 +152,11 @@  failed1:
 static void virtfn_remove(struct pci_dev *dev, int id, int reset)
 {
 	char buf[VIRTFN_ID_LEN];
-	struct pci_bus *bus;
 	struct pci_dev *virtfn;
 	struct pci_sriov *iov = dev->sriov;
 
-	bus = pci_find_bus(pci_domain_nr(dev->bus), virtfn_bus(dev, id));
-	if (!bus)
-		return;
-
-	virtfn = pci_get_slot(bus, virtfn_devfn(dev, id));
+	virtfn = pci_get_domain_bus_and_slot(pci_domain_nr(dev->bus),
+			virtfn_bus(dev, id), virtfn_devfn(dev, id));
 	if (!virtfn)
 		return;