diff mbox

[08/18] PCI, powerpc: Register busn_res for root buses

Message ID 1330395009-29260-9-git-send-email-yinghai@kernel.org (mailing list archive)
State Superseded
Headers show

Commit Message

Yinghai Lu Feb. 28, 2012, 2:09 a.m. UTC
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/kernel/pci-common.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

Comments

Bjorn Helgaas Feb. 28, 2012, 5:36 a.m. UTC | #1
On Mon, Feb 27, 2012 at 7:09 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: linuxppc-dev@lists.ozlabs.org
> ---
>  arch/powerpc/kernel/pci-common.c |    7 ++++++-
>  1 files changed, 6 insertions(+), 1 deletions(-)
>
> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> index 910b9de..ae5ae5f 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -1660,6 +1660,8 @@ void __devinit pcibios_scan_phb(struct pci_controller *hose)
>        bus->secondary = hose->first_busno;
>        hose->bus = bus;
>
> +       pci_bus_insert_busn_res(bus, hose->first_busno, hose->last_busno);
> +
>        /* Get probe mode and perform scan */
>        mode = PCI_PROBE_NORMAL;
>        if (node && ppc_md.pci_probe_mode)
> @@ -1670,8 +1672,11 @@ void __devinit pcibios_scan_phb(struct pci_controller *hose)
>                of_scan_bus(node, bus);
>        }
>
> -       if (mode == PCI_PROBE_NORMAL)
> +       if (mode == PCI_PROBE_NORMAL) {
> +               pci_bus_update_busn_res_end(bus, 255);
>                hose->last_busno = bus->subordinate = pci_scan_child_bus(bus);
> +               pci_bus_update_busn_res_end(bus, bus->subordinate);
> +       }

There's a lot of powerpc code that does this:

    bus_range = of_get_property(pcictrl, "bus-range", &len);
    hose->first_busno = bus_range[0];
    hose->last_busno = bus_range[1];

That *looks* like it is discovering the bus number aperture.  Is it?
If it is, why are we using the largest bus number found by
pci_scan_child_bus() rather than "last_busno"?

Bjorn
Benjamin Herrenschmidt Feb. 28, 2012, 8:54 a.m. UTC | #2
On Mon, 2012-02-27 at 22:36 -0700, Bjorn Helgaas wrote:
> 
> There's a lot of powerpc code that does this:
> 
>     bus_range = of_get_property(pcictrl, "bus-range", &len);
>     hose->first_busno = bus_range[0];
>     hose->last_busno = bus_range[1];
> 
> That *looks* like it is discovering the bus number aperture.  Is it?
> If it is, why are we using the largest bus number found by
> pci_scan_child_bus() rather than "last_busno"?

We do that but we somewhat -also- rely on the core bumping it if it
needs to make room :-)

As I said, we are swimming in dirty waters between reverse engineered
stuff we don't know 100% and "designed" stuff.

I think we should have ways to more explicitely define what we want tho,
ie whether hose->last_busno is just what happens to be the "current" bus
number assigned by the firmware or the hard max. Maybe a pci flag ?

On the other hand some platforms (all the ppc4xx ones for example) set
the flag to reassign all busses ... but have limit on bus numbers simply
because they have a memory mapped only config space and we don't have
enough address space to ioremap it all on 32-bit.

We need to fix them to use a fixmap entry to do atomic on-demand mapping
of the config space and lift that restriction, but that isn't done yet.

So I think those patches will need really careful handling on our side.

Cheers,
Ben.
Bjorn Helgaas Feb. 28, 2012, 11:31 p.m. UTC | #3
On Mon, Feb 27, 2012 at 10:36 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Mon, Feb 27, 2012 at 7:09 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> Cc: Paul Mackerras <paulus@samba.org>
>> Cc: linuxppc-dev@lists.ozlabs.org
>> ---
>>  arch/powerpc/kernel/pci-common.c |    7 ++++++-
>>  1 files changed, 6 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
>> index 910b9de..ae5ae5f 100644
>> --- a/arch/powerpc/kernel/pci-common.c
>> +++ b/arch/powerpc/kernel/pci-common.c
>> @@ -1660,6 +1660,8 @@ void __devinit pcibios_scan_phb(struct pci_controller *hose)
>>        bus->secondary = hose->first_busno;
>>        hose->bus = bus;
>>
>> +       pci_bus_insert_busn_res(bus, hose->first_busno, hose->last_busno);
>> +
>>        /* Get probe mode and perform scan */
>>        mode = PCI_PROBE_NORMAL;
>>        if (node && ppc_md.pci_probe_mode)
>> @@ -1670,8 +1672,11 @@ void __devinit pcibios_scan_phb(struct pci_controller *hose)
>>                of_scan_bus(node, bus);
>>        }
>>
>> -       if (mode == PCI_PROBE_NORMAL)
>> +       if (mode == PCI_PROBE_NORMAL) {
>> +               pci_bus_update_busn_res_end(bus, 255);
>>                hose->last_busno = bus->subordinate = pci_scan_child_bus(bus);
>> +               pci_bus_update_busn_res_end(bus, bus->subordinate);
>> +       }
>
> There's a lot of powerpc code that does this:
>
>    bus_range = of_get_property(pcictrl, "bus-range", &len);
>    hose->first_busno = bus_range[0];
>    hose->last_busno = bus_range[1];
>
> That *looks* like it is discovering the bus number aperture.  Is it?
> If it is, why are we using the largest bus number found by
> pci_scan_child_bus() rather than "last_busno"?

Sorry, I missed the earlier hunk of the patch where you *do* use last_busno:

>> +       pci_bus_insert_busn_res(bus, hose->first_busno, hose->last_busno);

I still think this part is wrong:

+       if (mode == PCI_PROBE_NORMAL) {
+               pci_bus_update_busn_res_end(bus, 255);
               hose->last_busno = bus->subordinate = pci_scan_child_bus(bus);
+               pci_bus_update_busn_res_end(bus, bus->subordinate);

I think there are two problems:

1) We can enumerate devices under the wrong PHB.  Assume this:

PCI host bridge A to [bus 00]
pci 0000:00:01.0: PCI bridge
PCI host bridge B to [bus 01]
pci 0000:01:01.0: PCI endpoint

The P2P bridge at 00:01.0 has no devices below it, but of course we
can't tell that until we look behind it.  To do that, we'll have to
assign a bus number, and since we forced the bus number aperture to
[bus 00-ff] instead of the correct [bus 00], we'll probably allocate
bus number 01 as the secondary bus.  Then we'll generate a config
cycle for 01:01.0, which discovers a device.  But we can't tell that
the cycle was actually claimed by host bridge B, not A.  So now we
wrongly think that 01:01.0 is under A, so we can't handle its
resources correctly.

I think we should have failed when allocating a secondary bus number
for 00:01.0 and just skipped looking behind it.

2) We preclude hot-add in some cases.  For example, if we scan this topology:

PCI host bridge C to [bus 00-7f]
pci 0000:00:01.0: PCI bridge to [bus 01]
pci 0000:01:01.0: PCI endpoint

we set the root bus's subordinate bus number to 01 (the highest bus
number we discovered), so we now think host bridge C leads only to
[bus 00-01].  Now let's remove 01:01.0 and plug in a card with a
bridge on it, e.g.,

pci 0000:01:01.0: PCI bridge to ...
pci 0000:xx:01.0: PCI endpoint

We can't allocate a bus number for 01:01.0's secondary bus because we
think we're out of space.  But we're really not; the true bus number
aperture for C is [bus 00-7f], not [bus 00-01].

We may need mechanism to say "don't trust this info from the
firmware," but we should be able to figure out a way that doesn't
penalize platforms that do everything correctly.  The current patch
breaks these scenarios even when the platform firmware is 100%
correct.

Bjorn
Benjamin Herrenschmidt Feb. 28, 2012, 11:41 p.m. UTC | #4
On Tue, 2012-02-28 at 16:31 -0700, Bjorn Helgaas wrote:
> We may need mechanism to say "don't trust this info from the
> firmware," but we should be able to figure out a way that doesn't
> penalize platforms that do everything correctly.  The current patch
> breaks these scenarios even when the platform firmware is 100%
> correct.

On the other hand, our firmwares tend not to be and the vast majority of
our platforms have separate bus number domains (In fact I'm not sure
whether we have one that actually splits bus numbers or not, maybe some
ancient Apple gear, I need to double check).

We did use to force renumbering on macs to avoid bus number collisions
between domains because of ancient X servers that didn't do domains
properly but I think we dropped that.

Cheers,
Ben.
Milton Miller April 2, 2012, 9:58 a.m. UTC | #5
[Apollogies if I left anyone off cc.  I am not subscribed and the archives
go to great pains to hide the to and cc lists, and then truncate it for size]

On Tue Feb 28 2012 about 18:42:03 EST, Benjamin Herrenschmidt wrote:
> On Tue, 2012-02-28 at 16:31 -0700, Bjorn Helgaas wrote:
> > We may need mechanism to say "don't trust this info from the
> > firmware," but we should be able to figure out a way that doesn't
> > penalize platforms that do everything correctly. The current patch
> > breaks these scenarios even when the platform firmware is 100%
> > correct.
> 
> On the other hand, our firmwares tend not to be and the vast majority of
> our platforms have separate bus number domains (In fact I'm not sure
> whether we have one that actually splits bus numbers or not, maybe some
> ancient Apple gear, I need to double check).
> 

In the POWER3 era we had several boxes that split the pci bus number
space across domains and RTAS used the bus number to find the correct
PHB.   This contineed to the first RS64 boxes.  By S80 and RS64-III
it was obvious that we didn't have enough numbers to continue this
ilusion and we added the presently used RTAS calls that take the pci
domain as well as bus, device, and function numbers.

The bus numbers split across pci domains started with the F50
F50 chrp 32 bit platforms.

> We did use to force renumbering on macs to avoid bus number collisions
> between domains because of ancient X servers that didn't do domains
> properly but I think we dropped that.
> 
> Cheers,
> Ben.

milton
Milton Miller April 2, 2012, 10:19 a.m. UTC | #6
[Apollogies if I left anyone off cc.  I am not subscribed and the archives
go to great pains to hide the to and cc lists, and then truncate it for size]
{fix a comma in cc}

On Tue Feb 28 2012 about 18:42:03 EST, Benjamin Herrenschmidt wrote:
> On Tue, 2012-02-28 at 16:31 -0700, Bjorn Helgaas wrote:
> > We may need mechanism to say "don't trust this info from the
> > firmware," but we should be able to figure out a way that doesn't
> > penalize platforms that do everything correctly. The current patch
> > breaks these scenarios even when the platform firmware is 100%
> > correct.
> 
> On the other hand, our firmwares tend not to be and the vast majority of
> our platforms have separate bus number domains (In fact I'm not sure
> whether we have one that actually splits bus numbers or not, maybe some
> ancient Apple gear, I need to double check).
> 

In the POWER3 era we had several boxes that split the pci bus number
space across domains and RTAS used the bus number to find the correct
PHB.   This contineed to the first RS64 boxes.  By S80 and RS64-III
it was obvious that we didn't have enough numbers to continue this
ilusion and we added the presently used RTAS calls that take the pci
domain as well as bus, device, and function numbers.

The bus numbers split across pci domains started with the F50
F50 chrp 32 bit platforms.

> We did use to force renumbering on macs to avoid bus number collisions
> between domains because of ancient X servers that didn't do domains
> properly but I think we dropped that.
> 
> Cheers,
> Ben.

milton
Benjamin Herrenschmidt April 2, 2012, 9:22 p.m. UTC | #7
On Mon, 2012-04-02 at 05:19 -0500, Milton Miller wrote:
> 
> In the POWER3 era we had several boxes that split the pci bus number
> space across domains and RTAS used the bus number to find the correct
> PHB.   This contineed to the first RS64 boxes.  By S80 and RS64-III
> it was obvious that we didn't have enough numbers to continue this
> ilusion and we added the presently used RTAS calls that take the pci
> domain as well as bus, device, and function numbers.
> 
> The bus numbers split across pci domains started with the F50
> F50 chrp 32 bit platforms. 

Ok, so old CHRPs would have a split as well...

I think the best is to honor the bus-range property of the PHB as min &
max values, and if we feel a need to allow the kernel to assign busses
beyond that, we can always quirk the appropriate platform code.

Cheers,
Ben.
diff mbox

Patch

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 910b9de..ae5ae5f 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1660,6 +1660,8 @@  void __devinit pcibios_scan_phb(struct pci_controller *hose)
 	bus->secondary = hose->first_busno;
 	hose->bus = bus;
 
+	pci_bus_insert_busn_res(bus, hose->first_busno, hose->last_busno);
+
 	/* Get probe mode and perform scan */
 	mode = PCI_PROBE_NORMAL;
 	if (node && ppc_md.pci_probe_mode)
@@ -1670,8 +1672,11 @@  void __devinit pcibios_scan_phb(struct pci_controller *hose)
 		of_scan_bus(node, bus);
 	}
 
-	if (mode == PCI_PROBE_NORMAL)
+	if (mode == PCI_PROBE_NORMAL) {
+		pci_bus_update_busn_res_end(bus, 255);
 		hose->last_busno = bus->subordinate = pci_scan_child_bus(bus);
+		pci_bus_update_busn_res_end(bus, bus->subordinate);
+	}
 
 	/* Platform gets a chance to do some global fixups before
 	 * we proceed to resource allocation